Skip to content

Increase profiler semantic coverage

Problem

The profiler's semantic type detection covers 27 detectors as of M1, but significant gaps remain: geographic coordinates, JSON-structured strings, encoded IDs (UUIDs in non-standard formats), currency codes, and domain-specific patterns (e.g., Fivetran connector IDs, Stripe charge IDs) are all classified as generic string or numeric types. For analysts, these misclassifications mean the profile report provides no useful context for columns that are semantically rich — a column of lat/long pairs looks the same as a column of random floats. For AI agent workflows that consume profile data to generate queries or suggest charts, low semantic coverage means the agent operates on impoverished metadata. Expanding detector coverage and improving detection confidence for ambiguous columns is necessary for the profiler to deliver meaningful value beyond basic statistics.

Context

Possible Solutions

Plan

Implementation Progress

  • Confirm scope and acceptance with milestone owner.

  • Milestone readiness signal is updated.

  • Track blockers and mitigation owner.

Review Feedback

  • Review cleared