Character strings in sample data are classified into groups that share
the same leading n characters (for example, with n = 3, all character
strings beginning with "abc"). Then,
one character string with the highest appearance frequency (the most
frequently appearing character string) in the sample data is extracted
from each group. The most frequently appearing character strings
extracted from each group are registered in a dictionary as initial
values in descending order of appearance frequency. Alternatively, the
character strings in the sample data are classified into groups whose
leading n characters have the same hash value, the most frequently
appearing character string is detected in each group, and each such
character string is registered in the dictionary as an initial value.
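
The two grouping variants described above can be sketched in Python as follows. This is only an illustrative sketch, not the claimed implementation: the names build_initial_dictionary and prefix_hash, the rule that strings shorter than n are skipped, and the toy hash function are all assumptions introduced for the example.

```python
from collections import Counter, defaultdict


def prefix_hash(prefix):
    """Toy hash of a prefix (hypothetical; any hash function could be used)."""
    return sum(ord(c) for c in prefix) % 256


def build_initial_dictionary(samples, n, key=None):
    """Return the most frequent string of each group, most frequent first.

    samples: iterable of character strings (the sample data)
    n:       number of leading characters used for grouping
    key:     optional function mapping the n-character prefix to a group
             key; None groups by the prefix itself, prefix_hash groups by
             the hash value of the prefix.
    """
    if key is None:
        key = lambda prefix: prefix

    # One frequency counter per group.
    groups = defaultdict(Counter)
    for s in samples:
        if len(s) < n:
            continue  # assumption: strings shorter than n are ignored
        groups[key(s[:n])][s] += 1

    # Take the single most frequent string (and its count) from each group.
    winners = [counter.most_common(1)[0] for counter in groups.values()]

    # Register the winners in descending order of appearance frequency.
    winners.sort(key=lambda item: item[1], reverse=True)
    return [s for s, _count in winners]


if __name__ == "__main__":
    data = ["abcde", "abcde", "abcde", "abcxy", "xyz12", "xyz12", "xyz99"]
    print(build_initial_dictionary(data, 3))
    print(build_initial_dictionary(data, 3, key=prefix_hash))
```

With prefix grouping, "abcde" and "abcxy" fall into the same group and only the more frequent of the two is registered. With hash grouping, prefixes whose hash values collide also share a group, so the dictionary holds at most one initial entry per hash value.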