A method and system for ensuring statistical disclosure limitation (SDL)
of categorical or continuous microdata, while maintaining the analytical
quality of the microdata. The new SDL methodology exploits the analogy
between (1) taking a sample, instead of a census, along with some
adjustments for missing information, including imputation, and (2)
releasing a subset, instead of the original data set, along with some
adjustments for records still at disclosure risk. Survey sampling reduces
monetary cost in comparison to a census, but entails some loss of
information. Similarly, releasing a subset reduces disclosure cost in
comparison to the full database, but entails some loss of information.
Thus, optimal survey sampling methods can be used for statistical
disclosure limitation. The method includes partitioning the database into
risk strata, optimal probabilistic substitution, optimal probabilistic
subsampling, and optimal sampling weight calibration.
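
The four steps named above can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the claimed method: the function names (risk_stratum, sdl_sketch), the frequency-based risk rule, the random-donor substitution, and the simple ratio adjustment used for calibration are all hypothetical, and the substitution and sampling rates are supplied by the caller rather than derived by any optimization.

```python
import random
from collections import Counter, defaultdict


def risk_stratum(count):
    """Hypothetical rule: records with rarer key combinations are riskier."""
    if count == 1:
        return "high"
    if count == 2:
        return "medium"
    return "low"


def sdl_sketch(records, key_vars, calib_var, sub_rate, samp_rate, seed=0):
    """Illustrative pipeline: stratify, substitute, subsample, calibrate.

    sub_rate and samp_rate map each stratum name to a probability; here
    they are plain inputs (assumed), not optimized values.
    """
    rng = random.Random(seed)

    # Step 1: partition into risk strata by frequency of the key combination.
    keys = [tuple(r[v] for v in key_vars) for r in records]
    freq = Counter(keys)
    strata = [risk_stratum(freq[k]) for k in keys]

    released = []
    for rec, stratum in zip(records, strata):
        out = dict(rec)  # copy so the original database is not modified

        # Step 2: probabilistic substitution of the identifying variables
        # with the values of a randomly chosen donor record.
        if rng.random() < sub_rate[stratum]:
            donor = rng.choice(records)
            for v in key_vars:
                out[v] = donor[v]

        # Step 3: probabilistic subsampling; a kept record carries the
        # inverse of its selection probability as its sampling weight.
        p = samp_rate[stratum]
        if rng.random() < p:
            out["weight"] = 1.0 / p
            released.append(out)

    # Step 4: weight calibration by ratio adjustment, so that weighted counts
    # per calib_var category in the released subset match the original counts.
    target = Counter(r[calib_var] for r in records)
    weighted = defaultdict(float)
    for out in released:
        weighted[out[calib_var]] += out["weight"]
    for out in released:
        category = out[calib_var]
        out["weight"] *= target[category] / weighted[category]

    return released


# Example call with made-up data and made-up rates.
database = [
    {"age": 30, "sex": "F", "region": "N"},
    {"age": 30, "sex": "F", "region": "N"},
    {"age": 30, "sex": "F", "region": "S"},
    {"age": 71, "sex": "M", "region": "S"},
]
subset = sdl_sketch(
    database,
    key_vars=["age", "sex"],
    calib_var="region",
    sub_rate={"high": 0.9, "medium": 0.5, "low": 0.1},
    samp_rate={"high": 0.5, "medium": 0.8, "low": 1.0},
)
```

In this sketch, higher-risk strata receive higher substitution rates and lower sampling rates, which mirrors the trade-off between disclosure cost and information loss described above. Calibration only restores control totals for categories that still appear in the released subset.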