Inspector cleanup wave 1: architectural decomposition and contract hardening

Problem

Plan and execute a deeper inspector cleanup pass that decomposes oversized inspector modules, tightens internal APIs, improves naming and helper boundaries, and strengthens contract-oriented regression coverage, so that the subsystem meets the quality bar for the upcoming open-source release.

Context

The first maintainability pass reduced TableInspector._inspect_table_inner() into a smaller orchestration flow, but the inspector subsystem still has several oversized hotspots that will attract scrutiny during open-source review.
Current high-value complexity targets from a local scan:
dataface/core/inspect/semantic_detector.py — detector catalog is large and highly branchy, especially _detect_ip_address, _detect_region, and _detect_status.
dataface/core/inspect/connection.py — get_table_enrichment() centralizes multiple warehouse-specific behaviors in one method.
dataface/core/inspect/query_builder.py — build_approximate_stats_query() and _build_comprehensive_stats_query() still carry a lot of dialect-specific branching and query-shape logic.
dataface/core/inspect/quality_detector.py — classify() and _detect_role() still own a dense set of heuristics and output mutations.
The open-source quality bar here is not just “works”; the inspector internals should read like a clean reference implementation with clear boundaries, stable contracts, and obvious extension points.
Constraints:
Preserve existing inspect.json/API behavior unless a contract change is explicitly planned and documented.
Avoid deep behavior changes mixed with structural refactors in the same patch set.
Prefer small, reviewable cleanup waves with contract tests over a single repo-wide rewrite.
Keep the data-shape boundary intact: detection/inspection code owns semantics and profiling policy; renderers consume structured outputs only.
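One way to make the "preserve existing inspect.json/API behavior" constraint checkable is a shape-level contract test. The sketch below is illustrative only: `contract_shape` and the payload fields (`columns`, `name`, `role`, `null_ratio`, `row_count`) are hypothetical and are not taken from the actual inspect.json schema. It reduces a JSON-like payload to its keys and value types, so a structural refactor can be compared before and after without pinning concrete profiling values.

```python
from typing import Any


def contract_shape(value: Any) -> Any:
    """Reduce a JSON-like payload to its structure: keys plus value type names."""
    if isinstance(value, dict):
        return {key: contract_shape(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        # Keep each distinct element shape once, in first-seen order.
        shapes = []
        for item in value:
            shape = contract_shape(item)
            if shape not in shapes:
                shapes.append(shape)
        return shapes
    return type(value).__name__


# Hypothetical payloads standing in for inspector output before and after a refactor.
before = {
    "columns": [{"name": "ip", "role": "dimension", "null_ratio": 0.0}],
    "row_count": 10,
}
after = {
    "columns": [{"name": "ip", "role": "dimension", "null_ratio": 0.25}],
    "row_count": 12,
}

# Values may differ between runs; the structural contract must not.
assert contract_shape(before) == contract_shape(after)
```

A check like this complements exact-value regression tests rather than replacing them: it fails only when a key disappears, appears, or changes type, which is the kind of drift external consumers would notice first.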

Possible Solutions

Big-bang inspector rewrite across modules in one branch.
Pros: maximum freedom to redesign boundaries.
Cons: high regression risk, difficult review, and poor fit for a subsystem that already has broad integration coverage and external consumers.
Wave-based cleanup initiative with contract-first refactors across the main hotspots. (Recommended)
Pros: lets us isolate architectural improvements by seam, preserve behavior with targeted contract tests, and keep each cleanup pass understandable to future open-source contributors.
Cons: requires discipline to keep each wave bounded and avoid letting “cleanup” turn into feature work.
Documentation-only hardening without code cleanup.
Pros: low immediate risk.
Cons: does not address the actual readability/extensibility debt in the subsystem internals.

Plan

Treat this as wave 1 of a broader inspector cleanup initiative.
Scope the first deep refactor pass around the most review-sensitive seams:
detector organization and repeated pattern helpers in semantic_detector.py
warehouse enrichment boundary cleanup in connection.py
query assembly decomposition in query_builder.py
classification pipeline cleanup in quality_detector.py
For each selected area:
identify the contract surface and current regression coverage
extract repeated logic into narrowly named helpers or internal structures
improve naming, helper locality, and mutation boundaries
add focused tests that lock down the refactor seam before or alongside the code change
Run the task on a fresh branch/worktree cut from the latest main once the current PR is resolved.
Update this task worksheet during execution with concrete wave selection, touched files, and validation results.
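The per-area steps above (identify the contract, extract narrowly named helpers, lock the seam with tests) can be sketched on a toy example. Everything here is hypothetical: `legacy_distinct_expr`, `distinct_expr`, the dialect names, and the SQL fragments are illustrative stand-ins and do not reflect the real code in query_builder.py. The point is the workflow: keep the branchy original around, move each branch into a small named helper behind a lookup table, and assert that both versions agree before deleting the old one.

```python
from typing import Callable, Dict


def legacy_distinct_expr(dialect: str, column: str) -> str:
    """Stand-in for a branchy, dialect-aware builder prior to cleanup."""
    if dialect == "bigquery":
        return f"APPROX_COUNT_DISTINCT({column})"
    elif dialect == "snowflake":
        return f"APPROX_COUNT_DISTINCT({column})"
    elif dialect == "duckdb":
        return f"approx_count_distinct({column})"
    else:
        return f"COUNT(DISTINCT {column})"


# Narrowly named helpers: one per distinct output form.
def _approx_distinct_upper(column: str) -> str:
    return f"APPROX_COUNT_DISTINCT({column})"


def _approx_distinct_lower(column: str) -> str:
    return f"approx_count_distinct({column})"


def _exact_distinct(column: str) -> str:
    return f"COUNT(DISTINCT {column})"


# The lookup table is the extension point: a new dialect is one new entry.
_DISTINCT_EXPR_BY_DIALECT: Dict[str, Callable[[str], str]] = {
    "bigquery": _approx_distinct_upper,
    "snowflake": _approx_distinct_upper,
    "duckdb": _approx_distinct_lower,
}


def distinct_expr(dialect: str, column: str) -> str:
    """Refactored builder: table lookup with an exact-count fallback."""
    builder = _DISTINCT_EXPR_BY_DIALECT.get(dialect, _exact_distinct)
    return builder(column)


# Seam check: the refactor must reproduce the legacy output exactly,
# including for a dialect that falls through to the default branch.
for dialect in ("bigquery", "snowflake", "duckdb", "postgres"):
    assert distinct_expr(dialect, "user_id") == legacy_distinct_expr(dialect, "user_id")
```

Keeping the equivalence loop in the test suite for the duration of the wave makes the patch purely structural by construction; any intended behavior change would then go in a separate, explicitly documented patch.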

Implementation Progress

2026-03-13: Created under initiative inspector-cleanup-and-open-source-hardening to capture the broader refactor program rather than another isolated cleanup.
2026-03-13: Initial scoping identified the next major cleanup candidates as semantic_detector.py, connection.py, query_builder.py, and quality_detector.py.
2026-03-13: Deferred code execution until the current inspector refactor PR flow is fully resolved so the next wave can start from a clean branch and latest main.

Review Feedback

Pending execution.
Review cleared