Architecture that provides a data profile computation technique which
employs key profile computation and data pattern profile computation. Key
profile computation in a data table covers both exact keys and approximate
keys, and is based on key strengths. A key strength of 100% indicates an
exact key, and any lower percentage indicates an approximate key. The key
strength is estimated based on the number of table rows that have
duplicated attribute values. Only column sets whose key strength exceeds a
threshold value are returned. Pattern profiling identifies a small set of regular
expression patterns which best describe the patterns within a given set
of attribute values. Pattern profiling includes three phases: a first
phase for determining token regular expressions, a second phase for
determining candidate regular expressions, and a third phase for
selecting, from among the candidates, the regular expressions that best
match the attribute values.
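The key-strength computation described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the described architecture itself: rows are dictionaries; key strength is assumed to be the fraction of rows whose values in a column set are not duplicated in any other row (so 1.0 is an exact key); candidate column sets are enumerated exhaustively up to a hypothetical `max_size`; and the names `key_strength` and `key_profile` are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def key_strength(rows: Sequence[Row], columns: Tuple[str, ...]) -> float:
    """Assumed definition: share of rows whose values in `columns` occur only once."""
    if not rows:
        return 1.0  # an empty table has no duplicates; treat the column set as an exact key
    projections = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(projections)
    # Every row that shares its projected values with another row counts as duplicated.
    duplicated_rows = sum(n for n in counts.values() if n > 1)
    return 1.0 - duplicated_rows / len(rows)


def key_profile(
    rows: Sequence[Row],
    columns: Sequence[str],
    threshold: float,
    max_size: int = 2,  # hypothetical cap on column-set size
) -> List[Tuple[Tuple[str, ...], float]]:
    """Return (column set, strength) pairs whose strength exceeds `threshold`."""
    results = []
    for size in range(1, max_size + 1):
        for column_set in combinations(columns, size):
            strength = key_strength(rows, column_set)
            if strength > threshold:  # 1.0 is an exact key; lower values are approximate
                results.append((column_set, strength))
    return results


# Hypothetical example table.
rows = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Oslo"},
    {"id": 3, "name": "Ann", "city": "Rome"},
    {"id": 4, "name": "Cid", "city": "Rome"},
]

for column_set, strength in key_profile(rows, ["id", "name", "city"], threshold=0.4):
    kind = "exact" if strength == 1.0 else "approximate"
    print(column_set, f"{strength:.0%}", kind)
```

Enumerating all column combinations grows combinatorially with the number of columns, which is why the sketch caps the column-set size with `max_size`.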
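The three pattern-profiling phases can likewise be sketched in Python, again as a minimal illustration under assumptions the text does not state. Phase 1 assumes tokens are runs of digits, runs of ASCII letters, or single other characters; phase 2 assumes each value yields one exact-length candidate and one generalized candidate; phase 3 substitutes a greedy heuristic for "best", repeatedly choosing the candidate that fully matches the most not-yet-described values and breaking ties toward the exact-length form. The function names and the `max_patterns` limit are hypothetical.

```python
import re
from typing import Dict, List, Sequence, Set, Tuple

# Assumed token classes: a run of digits, a run of ASCII letters, or any single
# other character (kept as an escaped literal).
TOKEN_RE = re.compile(r"(?P<digits>\d+)|(?P<alpha>[A-Za-z]+)|(?P<other>.)", re.DOTALL)

Token = Tuple[str, int, bool]  # (regex fragment, token length, is character class?)


def token_regexes(value: str) -> List[Token]:
    """Phase 1: describe each token of `value` by a token regular expression."""
    tokens: List[Token] = []
    for m in TOKEN_RE.finditer(value):
        if m.lastgroup == "digits":
            tokens.append((r"\d", len(m.group()), True))
        elif m.lastgroup == "alpha":
            tokens.append(("[A-Za-z]", len(m.group()), True))
        else:
            tokens.append((re.escape(m.group()), 1, False))
    return tokens


def render(tokens: Sequence[Token], general: bool) -> str:
    """Concatenate token regexes; `general` replaces exact lengths with `+`."""
    parts = []
    for fragment, length, is_class in tokens:
        if not is_class:
            parts.append(fragment)
        elif general:
            parts.append(fragment + "+")
        else:
            parts.append(f"{fragment}{{{length}}}")
    return "".join(parts)


def candidate_regexes(values: Sequence[str]) -> Dict[str, bool]:
    """Phase 2: map each candidate regex to True if it is the generalized form."""
    candidates: Dict[str, bool] = {}
    for value in values:
        tokens = token_regexes(value)
        candidates.setdefault(render(tokens, general=False), False)
        candidates.setdefault(render(tokens, general=True), True)
    return candidates


def best_regexes(values: Sequence[str], max_patterns: int = 3) -> List[Tuple[str, int]]:
    """Phase 3 (greedy heuristic): pick candidates that fully match the most
    values not yet described, preferring exact-length forms on ties."""
    candidates = candidate_regexes(values)
    compiled = {p: re.compile(p) for p in sorted(candidates)}  # sorted for determinism
    uncovered: Set[int] = set(range(len(values)))  # indices, so duplicates each count

    def matched(pattern: str) -> Set[int]:
        return {i for i in uncovered if compiled[pattern].fullmatch(values[i])}

    chosen: List[Tuple[str, int]] = []
    while uncovered and len(chosen) < max_patterns:
        best = max(compiled, key=lambda p: (len(matched(p)), not candidates[p]))
        hits = matched(best)
        if not hits:
            break
        chosen.append((best, len(hits)))
        uncovered -= hits
    return chosen


print(best_regexes(["555-1234", "555-9876", "AB-12", "n/a"]))
print(best_regexes(["12", "345", "6789"]))
```

The `max_patterns` cap keeps the profile small. On `["12", "345", "6789"]` the generalized `\d+` is selected because it alone matches all three values, while each exact-length candidate matches only one.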