A data hashing method, a data processing method, and a data processing
system are provided, each using a similarity-based hashing (SBH) algorithm
in which the same hash value is calculated for the same data and, the more
similar the data, the smaller the difference between the generated hash
values. The data hashing method
includes receiving computerized data and generating a hash value of the
computerized data using the SBH algorithm, in which two data items are the
same if their calculated hash values are the same and are similar if the
difference between their calculated hash values is small. Search,
comparison, and classification of data may therefore be processed quickly,
within a time complexity of O(1) or O(n), since the similarity or closeness
of data content is quantified by the component values of the corresponding
generated hash values.
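
The property described above can be illustrated with a minimal sketch. It is not the SBH algorithm itself, whose construction is not given here; it is a toy stand-in that assumes a hash made of a few component values (a coarse byte histogram) and a distance defined as the sum of absolute component differences. The names sbh_hash, hash_distance, and NUM_COMPONENTS are hypothetical.

```python
from typing import Tuple

# Hypothetical choice: number of component values in each hash.
NUM_COMPONENTS = 8


def sbh_hash(data: bytes) -> Tuple[int, ...]:
    """Toy similarity-preserving hash (an assumption, not the SBH algorithm).

    Each component counts the bytes that fall into one of NUM_COMPONENTS
    equal-width value ranges, so a small edit changes few components.
    """
    components = [0] * NUM_COMPONENTS
    for byte in data:
        components[byte * NUM_COMPONENTS // 256] += 1
    return tuple(components)


def hash_distance(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Difference between two hashes: sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


original = b"similarity-based hashing"
near_copy = b"similarity-based hashinG"   # one byte changed
unrelated = b"\x00\xff" * 12              # same length, very different content

h_original = sbh_hash(original)
h_near = sbh_hash(near_copy)
h_unrelated = sbh_hash(unrelated)

# Identical data gives identical hashes, so a dict lookup finds it in
# constant time on average.
index = {h_original: "original"}
found = index.get(sbh_hash(original))

# Similar data gives a small difference, dissimilar data a large one.
near_distance = hash_distance(h_original, h_near)
far_distance = hash_distance(h_original, h_unrelated)
```

In this sketch, finding the closest stored item to a query takes one pass over n stored hashes, which corresponds to the O(n) case. Unlike the SBH algorithm as described, the toy histogram gives the same value to any two inputs with the same byte distribution, so equal hashes here suggest identical data without proving it.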