For migration or de-duplication of a file system having a large number of
files, a utility program traverses the file system to create a log of
file-specific information about the file system. To identify
duplicates, the utility program produces a signature for each file.
Respective instances of the utility program are started on multiple nodes
upon which the file system is mounted. A fully qualified pathname for each
file is compiled during transfer of the log to a database. Multiple databases can
be produced for the file system such that each database contains the
file-specific information for a specified range of inode numbers. Each
database also maintains a classification state for each file. For example,
for a migration or replication process, the classification state
indicates whether the file is untouched or has been copied, linked,
secondary-ized, source deleted, or modified.
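The traversal-and-signature step can be illustrated with a minimal sketch. It assumes that a file's signature is a SHA-256 hash of its contents and that two regular files are duplicates when their size and signature both match. The abstract does not name a hash function. The names `LogRecord`, `file_signature`, `traverse`, and `find_duplicates` are hypothetical. For brevity, each record stores the path directly instead of compiling it later.

```python
import hashlib
import os
import stat
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class LogRecord(NamedTuple):
    """Hypothetical per-file log entry produced by the traversal."""
    inode: int
    size: int
    signature: str
    path: str


def file_signature(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file contents in chunks; SHA-256 is an assumed signature choice."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def traverse(root: str) -> List[LogRecord]:
    """Walk the tree under root and log one record per regular file."""
    log: List[LogRecord] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, device files, sockets, etc.
            log.append(LogRecord(st.st_ino, st.st_size, file_signature(path), path))
    return log


def find_duplicates(log: List[LogRecord]) -> Dict[Tuple[int, str], List[str]]:
    """Group files by (size, signature); hard links to one inode count once."""
    groups: Dict[Tuple[int, str], Dict[int, str]] = defaultdict(dict)
    for rec in log:
        groups[(rec.size, rec.signature)].setdefault(rec.inode, rec.path)
    return {key: sorted(by_inode.values())
            for key, by_inode in groups.items() if len(by_inode) > 1}
```

Comparing sizes before signatures costs nothing once both are in the log. Collapsing entries that share an inode keeps hard links, which already share storage, from being reported as de-duplication candidates.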
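The transfer of the log into per-inode-range databases can be sketched as follows. Several assumptions apply. Each hypothetical log entry records `(inode, parent_inode, name, signature)`, with the root marked by a parent of `None`. Each inode has a single name, so hard links are ignored. Each database is an in-memory SQLite table. The fully qualified pathname is compiled during the transfer by following parent links up to the root. Every file starts in the "untouched" classification state. The names `State`, `full_path`, `load`, and `set_state` are illustrative only.

```python
import bisect
import sqlite3
from enum import Enum
from typing import Dict, List, Optional, Tuple


class State(Enum):
    """Classification states listed in the abstract."""
    UNTOUCHED = "untouched"
    COPIED = "copied"
    LINKED = "linked"
    SECONDARY_IZED = "secondary-ized"
    SOURCE_DELETED = "source deleted"
    MODIFIED = "modified"


# Hypothetical log entry: (inode, parent_inode, name, signature); root has parent None.
LogEntry = Tuple[int, Optional[int], str, str]


def full_path(inode: int, names: Dict[int, Tuple[Optional[int], str]]) -> str:
    """Compile a fully qualified pathname by walking parent links to the root."""
    parts: List[str] = []
    current: Optional[int] = inode
    while current is not None:
        parent, name = names[current]
        if parent is not None:  # the root contributes no name component
            parts.append(name)
        current = parent
    return "/" + "/".join(reversed(parts))


def load(log: List[LogEntry], bounds: List[int]) -> List[sqlite3.Connection]:
    """Transfer the log into one database per inode range [bounds[i], bounds[i+1])."""
    names = {ino: (parent, name) for ino, parent, name, _sig in log}
    dbs = []
    for _ in range(len(bounds) - 1):
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE files (inode INTEGER PRIMARY KEY,"
                   " path TEXT, signature TEXT, state TEXT)")
        dbs.append(db)
    for ino, _parent, _name, sig in log:
        idx = bisect.bisect_right(bounds, ino) - 1  # which range holds this inode
        if not 0 <= idx < len(dbs):
            raise ValueError(f"inode {ino} is outside the configured ranges")
        dbs[idx].execute("INSERT INTO files VALUES (?, ?, ?, ?)",
                         (ino, full_path(ino, names), sig, State.UNTOUCHED.value))
    for db in dbs:
        db.commit()
    return dbs


def set_state(db: sqlite3.Connection, inode: int, state: State) -> None:
    """Record a classification-state transition, e.g. after a file is copied."""
    db.execute("UPDATE files SET state = ? WHERE inode = ?", (state.value, inode))
    db.commit()
```

Splitting by inode range means each database stays a manageable size. Separate processes can also load or query different ranges without contending for a single table.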