Inspector cleanup wave 1: architectural decomposition and contract hardening
Problem
Plan and execute a deeper inspector cleanup pass that decomposes oversized inspector modules, tightens internal APIs, improves naming and helper boundaries, and strengthens contract-oriented regression coverage, so the subsystem meets the quality bar for the upcoming open-source release.
Context
- The first maintainability pass reduced `TableInspector._inspect_table_inner()` into a smaller orchestration flow, but the inspector subsystem still has several oversized hotspots that will attract scrutiny during open-source review.
- Current high-value complexity targets from a local scan:
  - `dataface/core/inspect/semantic_detector.py`: detector catalog is large and highly branchy, especially `_detect_ip_address`, `_detect_region`, and `_detect_status`.
  - `dataface/core/inspect/connection.py`: `get_table_enrichment()` centralizes multiple warehouse-specific behaviors in one method.
  - `dataface/core/inspect/query_builder.py`: `build_approximate_stats_query()` and `_build_comprehensive_stats_query()` still carry a lot of dialect-specific branching and query-shape logic.
  - `dataface/core/inspect/quality_detector.py`: `classify()` and `_detect_role()` still own a dense set of heuristics and output mutations.
- The open-source quality bar here is not just “works”; the inspector internals should read like a clean reference implementation with clear boundaries, stable contracts, and obvious extension points.
- Constraints:
  - Preserve existing `inspect.json`/API behavior unless a contract change is explicitly planned and documented.
  - Avoid deep behavior changes mixed with structural refactors in the same patch set.
  - Prefer small, reviewable cleanup waves with contract tests over a single repo-wide rewrite.
  - Keep the data-shape boundary intact: detection/inspection code owns semantics and profiling policy; renderers consume structured outputs only.
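As a rough illustration of what flattening a branchy detector catalog could look like, the sketch below replaces per-type `if`/`elif` chains with a declarative rule table. Everything in it is an assumption made for illustration: `DetectorRule`, `RULES`, `detect_semantic_type`, and the name and value patterns are hypothetical and are not the heuristics that `semantic_detector.py` actually uses.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DetectorRule:
    """One declarative semantic-type rule (hypothetical structure)."""

    semantic_type: str
    name_pattern: re.Pattern  # matched against the lowercased column name
    value_check: Callable[[Sequence[str]], bool]  # run on sampled values


def _all_match(pattern: str) -> Callable[[Sequence[str]], bool]:
    """Build a check that passes only if every sampled value fully matches."""
    compiled = re.compile(pattern)
    return lambda values: bool(values) and all(
        compiled.fullmatch(v) is not None for v in values
    )


# Illustrative rules only. The IPv4 pattern checks shape, not octet range.
RULES: Tuple[DetectorRule, ...] = (
    DetectorRule(
        semantic_type="ip_address",
        name_pattern=re.compile(r"(^|_)ip(_|$)|ip_addr"),
        value_check=_all_match(r"(\d{1,3}\.){3}\d{1,3}"),
    ),
    DetectorRule(
        semantic_type="status",
        name_pattern=re.compile(r"(^|_)(status|state)$"),
        value_check=lambda values: 0 < len(set(values)) <= 10,
    ),
)


def detect_semantic_type(
    column_name: str, sample_values: Sequence[str]
) -> Optional[str]:
    """Return the first rule whose name and value checks both pass."""
    name = column_name.lower()
    for rule in RULES:
        if rule.name_pattern.search(name) and rule.value_check(sample_values):
            return rule.semantic_type
    return None
```

With a layout like this, adding a detector means appending a rule rather than editing a shared branch, and the function returns a plain value that renderers can consume without knowing how it was derived.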
Possible Solutions
- Big-bang inspector rewrite across modules in one branch.
  - Pros: maximum freedom to redesign boundaries.
  - Cons: high regression risk, difficult review, and poor fit for a subsystem that already has broad integration coverage and external consumers.
- Wave-based cleanup initiative with contract-first refactors across the main hotspots. (Recommended)
  - Pros: lets us isolate architectural improvements by seam, preserve behavior with targeted contract tests, and keep each cleanup pass understandable to future open-source contributors.
  - Cons: requires discipline to keep each wave bounded and avoid letting “cleanup” turn into feature work.
- Documentation-only hardening without code cleanup.
  - Pros: low immediate risk.
  - Cons: does not address the actual readability/extensibility debt in the subsystem internals.
Plan
- Treat this as wave 1 of a broader inspector cleanup initiative.
- Scope the first deep refactor pass around the most review-sensitive seams:
  - detector organization and repeated pattern helpers in `semantic_detector.py`
  - warehouse enrichment boundary cleanup in `connection.py`
  - query assembly decomposition in `query_builder.py`
  - classification pipeline cleanup in `quality_detector.py`
- For each selected area:
  - identify the contract surface and current regression coverage
  - extract repeated logic into narrowly named helpers or internal structures
  - improve naming, helper locality, and mutation boundaries
  - add focused tests that lock down the refactor seam before or alongside the code change
- Keep task execution on a fresh branch/worktree from latest `main` after the current PR is resolved.
- Update this task worksheet during execution with concrete wave selection, touched files, and validation results.
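The "lock down the refactor seam" step could be approached with a characterization test that runs the old and new code paths over the same inputs and compares their serialized output. The sketch below is a minimal version of that idea under stated assumptions: `toy_classify_legacy`, `toy_classify_refactored`, the helper names, and the input cases are hypothetical stand-ins, not the real `quality_detector.py` functions or their output shape.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def normalize_contract(payload: Dict[str, Any]) -> str:
    """Serialize with sorted keys so key order does not count as drift."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def assert_same_contract(
    legacy: Callable[..., Dict[str, Any]],
    refactored: Callable[..., Dict[str, Any]],
    cases: Iterable[Dict[str, Any]],
) -> None:
    """Fail on the first input where the two implementations disagree."""
    for case in cases:
        expected = normalize_contract(legacy(**case))
        actual = normalize_contract(refactored(**case))
        assert actual == expected, f"contract drift for {case!r}"


# Hypothetical "before": one function that mutates a result dict in place.
def toy_classify_legacy(column: str, null_ratio: float) -> Dict[str, Any]:
    result: Dict[str, Any] = {"column": column, "flags": []}
    if null_ratio > 0.5:
        result["flags"].append("mostly_null")
    if column.endswith("_id"):
        result["role"] = "identifier"
    else:
        result["role"] = "attribute"
    return result


# Hypothetical "after": narrowly named helpers, no shared mutation.
def _role_for(column: str) -> str:
    return "identifier" if column.endswith("_id") else "attribute"


def _quality_flags(null_ratio: float) -> List[str]:
    return ["mostly_null"] if null_ratio > 0.5 else []


def toy_classify_refactored(column: str, null_ratio: float) -> Dict[str, Any]:
    return {
        "column": column,
        "role": _role_for(column),
        "flags": _quality_flags(null_ratio),
    }


CASES = [
    {"column": "user_id", "null_ratio": 0.0},
    {"column": "notes", "null_ratio": 0.9},
    {"column": "order_id", "null_ratio": 0.6},
]

assert_same_contract(toy_classify_legacy, toy_classify_refactored, CASES)
```

In a real wave the legacy function would only need to live alongside the refactor until the comparison passes; after that, the recorded outputs could be kept as fixtures so the contract stays pinned once the old code is removed.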
Implementation Progress
- 2026-03-13: Created under initiative `inspector-cleanup-and-open-source-hardening` to capture the broader refactor program rather than another isolated cleanup.
- 2026-03-13: Initial scoping identified the next major cleanup candidates as `semantic_detector.py`, `connection.py`, `query_builder.py`, and `quality_detector.py`.
- 2026-03-13: Deferred code execution until the current inspector refactor PR flow is fully resolved so the next wave can start from a clean branch and latest `main`.
Review Feedback
- Pending execution.
- Review cleared