A method for data shuffling to preserve data confidentiality is provided.
The method comprises masking particular attributes of a dataset that
are to be kept confidential, followed by a shuffling step
comprising sorting the transformed dataset and a transformed confidential
attribute in accordance with the same rank order criteria. For normally
distributed datasets, transformation may be achieved by general additive
data perturbation, followed by generating a normalized perturbed value of
the confidential attribute using a conditional distribution of the
confidential and non-confidential attributes. In another aspect, a
software program for accomplishing the method of the present invention is
provided. The method of the invention provides greater security and
utility for the data, and increases user comfort by allowing use of the
actual data without identifying its origin.
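
The two steps described above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes one confidential attribute and one non-confidential attribute that are jointly normal, and the function name, parameter names, and sample values are hypothetical. A perturbed value is drawn from the estimated conditional distribution of the confidential attribute given the non-confidential attribute, and each record then receives the original confidential value holding the same rank as its perturbed value.

```python
import random
from statistics import mean


def shuffle_confidential(x, s, seed=0):
    """Return the values of x reassigned to records by rank of a perturbed copy.

    x: confidential attribute, s: non-confidential attribute (equal length).
    Hypothetical helper; assumes x and s are roughly bivariate normal.
    """
    rng = random.Random(seed)
    n = len(x)
    mx, ms = mean(x), mean(s)

    # Sample variances and covariance (n - 1 denominator).
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_s = sum((b - ms) ** 2 for b in s) / (n - 1)
    cov_xs = sum((a - mx) * (b - ms) for a, b in zip(x, s)) / (n - 1)

    # Conditional distribution of X given S under the normality assumption:
    # mean = mx + beta * (s - ms), variance = var_x - beta * cov_xs.
    beta = cov_xs / var_s
    cond_sd = max(var_x - beta * cov_xs, 0.0) ** 0.5

    # Perturbation step: one draw from the conditional distribution per record.
    perturbed = [mx + beta * (b - ms) + rng.gauss(0.0, cond_sd) for b in s]

    # Shuffling step: the record whose perturbed value has rank k receives
    # the original confidential value of rank k.
    order = sorted(range(n), key=perturbed.__getitem__)
    sorted_x = sorted(x)
    shuffled = [0.0] * n
    for rank, idx in enumerate(order):
        shuffled[idx] = sorted_x[rank]
    return shuffled


# Illustrative values only.
salary = [52.0, 61.0, 48.0, 75.0, 66.0]   # confidential
age = [30.0, 41.0, 27.0, 55.0, 46.0]      # non-confidential
masked = shuffle_confidential(salary, age, seed=1)
```

Because the released values are a permutation of the originals, the marginal distribution of the confidential attribute is unchanged, while the link between any value and its originating record is broken.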