Mapping cacheable memory pages from other processes in a parallel job
provides an efficient mechanism for inter-process communication. Once the
pages are mapped, a trivial address computation yields a virtual address
through which ordinary cacheable loads and stores can directly read or
update the memory of other processes in the job for communication
purposes. When an interconnection network permits cacheable access to
one host's memory from another host in the cluster, kernel and library
software can map memory from processes on other hosts in addition to
memory from processes on the same host. This mapping can be done at the
start of a parallel job using a system library interface. A function in an
application programming interface provides a fast, user-level lookup of
virtual addresses that reference data regions residing in any of the
processes of a parallel job running across multiple hosts.
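
The lookup described above can be illustrated with a minimal C sketch. It assumes, purely for illustration, that each process's shared region is mapped into one contiguous window at a fixed per-process stride, so that the address computation reduces to base + rank * region size + offset. The names peer_map_t, peer_map_init, peer_addr, and peer_map_free are hypothetical and do not come from the text. A single heap allocation stands in for the pages that kernel and library software would map at job start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor for the window into which every process's
 * shared region has been mapped (assumed: contiguous, equal-sized slots). */
typedef struct {
    unsigned char *base;   /* start of the mapped window */
    size_t region_size;    /* bytes reserved per process */
    int nprocs;            /* number of processes in the job */
} peer_map_t;

/* Stand-in for the job-start mapping step: one zeroed heap allocation
 * plays the role of the mapped remote pages so the sketch runs anywhere.
 * Returns 0 on success, -1 if the allocation fails. */
static int peer_map_init(peer_map_t *m, int nprocs, size_t region_size)
{
    assert(nprocs > 0 && region_size > 0);
    m->base = calloc((size_t)nprocs, region_size);
    m->region_size = region_size;
    m->nprocs = nprocs;
    return m->base ? 0 : -1;
}

/* User-level lookup: return the local virtual address of `offset` bytes
 * into the region owned by process `rank`, or NULL if out of range.
 * No system call is involved, only an address computation. */
static void *peer_addr(const peer_map_t *m, int rank, size_t offset)
{
    if (rank < 0 || rank >= m->nprocs || offset >= m->region_size)
        return NULL;
    return m->base + (size_t)rank * m->region_size + offset;
}

/* Release the stand-in window. */
static void peer_map_free(peer_map_t *m)
{
    free(m->base);
    m->base = NULL;
}
```

With such a window in place, a process would call peer_addr(&m, rank, offset) and then use an ordinary store or load through the returned pointer to update or read the other process's data. A real implementation would also need whatever synchronization and memory ordering the interconnect requires, which this sketch omits.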