A system and method determine numerical representations for categorical data fields by taking advantage of the redundancy of the data records to allow automatic discovery of an order of the categories. A categorical data field is recoded by creating separate tables for each numerical data field occurring in the data records. The separate tables are sorted according to the numerical values of the respective data fields. The recoding of the categories is performed based on the average sort order of occurrences of the category in a specific sorted table. The standard deviation of the numerical codes provided by the categories is calculated for each of the separate recoding tables. The recoding table with the maximum standard deviation is selected as the recoding table to perform the recoding of the categories contained in the respective categorical data field of the data records. A plausibility check is performed for the selected recoding table by excluding the numerical data field that has formed the basis for the sorting of the respective table and recreating the recoding table from the data records. The resulting recoding table and the original recoding table are compared. Resulting recoding tables that are similar indicate a high level of confidence that the originally selected recoding table is optimal.

 
Web www.patentalert.com

> System and method of monitoring and controlling application files

~ 00378