Described herein is an operator-based approach to representing dataflows.
A dataflow is a set of one or more operations and one or more flows of
data that are processed successively by the set of operations. A dataflow
is described by a generic description in which operations in a dataflow
are represented by operators. An operator defines a primitive operation
(e.g., join, filter), specifying not only the type of operation but also
the inputs and outputs, rules, and criteria that govern the operation.
From the generic description, a code implementation is generated that
may be executed entirely on a source database system and a target data
warehouse, without the need for an intermediate system, such as a data
movement engine, to participate in executing the code implementation.
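The following Python sketch shows one way operators could be represented and turned into a code implementation that runs inside the database. It rests on several assumptions that are not part of the described approach. The `Operator` class, its fields, and the `generate_sql` and `run_demo` helpers are hypothetical names. Each operator is translated into a SQL `CREATE VIEW` statement. A single in-memory SQLite database stands in for both the source database system and the target data warehouse.

```python
import sqlite3
from dataclasses import dataclass
from typing import List

@dataclass
class Operator:
    # Hypothetical operator: the type of primitive operation plus the
    # inputs, output, and criteria that govern it.
    kind: str           # "filter" or "join"
    inputs: List[str]   # names of input tables or views
    output: str         # name of the view this operator produces
    criteria: str       # WHERE predicate (filter) or ON condition (join)
    columns: str = "*"  # projected columns

def generate_sql(operators: List[Operator]) -> List[str]:
    """Translate a generic operator description into SQL statements,
    one CREATE VIEW per operator, in dataflow order."""
    statements = []
    for op in operators:
        if op.kind == "filter":
            (source,) = op.inputs
            body = f"SELECT {op.columns} FROM {source} WHERE {op.criteria}"
        elif op.kind == "join":
            left, right = op.inputs
            body = f"SELECT {op.columns} FROM {left} JOIN {right} ON {op.criteria}"
        else:
            raise ValueError(f"unsupported operator: {op.kind}")
        statements.append(f"CREATE VIEW {op.output} AS {body}")
    return statements

def run_demo() -> list:
    # Assumption: one in-memory SQLite database stands in for both the
    # source system and the target warehouse; no separate engine moves rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 10, 50.0), (2, 10, 150.0), (3, 20, 300.0)])
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(10, "east"), (20, "west")])

    # Generic description of the dataflow: a filter feeding a join.
    flow = [
        Operator("filter", ["orders"], "big_orders", "amount > 100"),
        Operator("join", ["big_orders", "customers"], "result",
                 "big_orders.customer_id = customers.customer_id",
                 "big_orders.order_id, big_orders.amount, customers.region"),
    ]
    for stmt in generate_sql(flow):
        conn.execute(stmt)  # the generated code runs inside the database itself
    rows = conn.execute("SELECT * FROM result ORDER BY order_id").fetchall()
    conn.close()
    return rows

print(run_demo())  # [(2, 150.0, 'east'), (3, 300.0, 'west')]
```

Emitting views keeps each operator's output addressable by name, so a downstream operator can consume it as an ordinary table. A real generator targeting two separate systems would also need statements that move data from source to target, which this sketch does not model.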