Inspector cleanup wave 1: architectural decomposition and contract hardening

Problem

Plan and execute a deeper inspector cleanup pass that decomposes oversized inspector modules, tightens internal APIs, improves naming and helper boundaries, and strengthens contract-oriented regression coverage for upcoming open-source release quality.

Context

  • The first maintainability pass reduced TableInspector._inspect_table_inner() into a smaller orchestration flow, but the inspector subsystem still has several oversized hotspots that will attract scrutiny during open-source review.
  • Current high-value complexity targets from a local scan:
      • dataface/core/inspect/semantic_detector.py: the detector catalog is large and highly branchy, especially _detect_ip_address, _detect_region, and _detect_status.
      • dataface/core/inspect/connection.py: get_table_enrichment() centralizes multiple warehouse-specific behaviors in one method.
      • dataface/core/inspect/query_builder.py: build_approximate_stats_query() and _build_comprehensive_stats_query() still carry a lot of dialect-specific branching and query-shape logic.
      • dataface/core/inspect/quality_detector.py: classify() and _detect_role() still own a dense set of heuristics and output mutations.
  • The open-source quality bar here is not just “works”; the inspector internals should read like a clean reference implementation with clear boundaries, stable contracts, and obvious extension points.
  • Constraints:
      • Preserve existing inspect.json/API behavior unless a contract change is explicitly planned and documented.
      • Avoid deep behavior changes mixed with structural refactors in the same patch set.
      • Prefer small, reviewable cleanup waves with contract tests over a single repo-wide rewrite.
      • Keep the data-shape boundary intact: detection/inspection code owns semantics and profiling policy; renderers consume structured outputs only.

Possible Solutions

  • Big-bang inspector rewrite across modules in one branch.
      • Pros: maximum freedom to redesign boundaries.
      • Cons: high regression risk, difficult review, and poor fit for a subsystem that already has broad integration coverage and external consumers.
  • Wave-based cleanup initiative with contract-first refactors across the main hotspots. (Recommended)
      • Pros: lets us isolate architectural improvements by seam, preserve behavior with targeted contract tests, and keep each cleanup pass understandable to future open-source contributors.
      • Cons: requires discipline to keep each wave bounded and avoid letting “cleanup” turn into feature work.
  • Documentation-only hardening without code cleanup.
      • Pros: low immediate risk.
      • Cons: does not address the actual readability/extensibility debt in the subsystem internals.

Plan

  • Treat this as wave 1 of a broader inspector cleanup initiative.
  • Scope the first deep refactor pass around the most review-sensitive seams:
      • detector organization and repeated pattern helpers in semantic_detector.py
      • warehouse enrichment boundary cleanup in connection.py
      • query assembly decomposition in query_builder.py
      • classification pipeline cleanup in quality_detector.py
  • For each selected area:
      • identify the contract surface and current regression coverage
      • extract repeated logic into narrowly named helpers or internal structures
      • improve naming, helper locality, and mutation boundaries
      • add focused tests that lock down the refactor seam before or alongside the code change
  • Keep task execution on a fresh branch/worktree from latest main after the current PR is resolved.
  • Update this task worksheet during execution with concrete wave selection, touched files, and validation results.
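
As a rough illustration of what "lock down the refactor seam" could look like, the sketch below runs a legacy function and its refactored counterpart over the same fixed cases and compares canonical JSON output. The helpers canonical and contract_diffs, the two classify_* stand-ins, and the column fields (name, distinct_ratio, role) are all hypothetical; the real classify() contract and inspect.json shape are not reproduced here.

```python
import json
from typing import Any, Callable, Iterable


def canonical(payload: Any) -> str:
    """Serialize with sorted keys so dict ordering never counts as a contract change."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def contract_diffs(
    legacy: Callable[[Any], Any],
    refactored: Callable[[Any], Any],
    cases: Iterable[Any],
) -> list[str]:
    """Return one message per case where the two implementations disagree."""
    diffs = []
    for case in cases:
        before = canonical(legacy(case))
        after = canonical(refactored(case))
        if before != after:
            diffs.append(f"{case!r}: {before} != {after}")
    return diffs


# Hypothetical stand-in for a pre-refactor function that mutates its output in place.
def classify_legacy(column: dict) -> dict:
    result = {"name": column["name"], "role": "unknown"}
    if column["name"].endswith("_id"):
        result["role"] = "identifier"
    elif column["distinct_ratio"] < 0.05:
        result["role"] = "category"
    return result


# Hypothetical refactor: the role heuristic is extracted and the output is built once.
def _detect_role(column: dict) -> str:
    if column["name"].endswith("_id"):
        return "identifier"
    if column["distinct_ratio"] < 0.05:
        return "category"
    return "unknown"


def classify_refactored(column: dict) -> dict:
    return {"role": _detect_role(column), "name": column["name"]}


CASES = [
    {"name": "user_id", "distinct_ratio": 1.0},
    {"name": "country", "distinct_ratio": 0.01},
    {"name": "notes", "distinct_ratio": 0.9},
]


def test_classify_contract_is_preserved() -> None:
    assert contract_diffs(classify_legacy, classify_refactored, CASES) == []
```

Once the legacy implementation is removed, the same cases could be pinned against recorded outputs instead, so the test keeps guarding the contract rather than the old code.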

Implementation Progress

  • 2026-03-13: Created under initiative inspector-cleanup-and-open-source-hardening to capture the broader refactor program rather than another isolated cleanup.
  • 2026-03-13: Initial scoping identified the next major cleanup candidates as semantic_detector.py, connection.py, query_builder.py, and quality_detector.py.
  • 2026-03-13: Deferred code execution until the current inspector refactor PR flow is fully resolved so the next wave can start from a clean branch and latest main.

Review Feedback

  • Pending execution.

  • Review cleared