Parallel Processing Support In ETL
ETL Integrator lets a user invoke the ETL runtime concurrently so that different data sets are processed in parallel. For example, a user can submit multiple flat files to the same ETL collaboration, and ETLSE will process each file concurrently. Alternatively, the user can chunk the source data set by applying a filter and then submit the resulting data sets, which are then processed in parallel. The following points apply:
- The user must provide separate database spaces for the monitor log and the ETL engine instance.
- The user must not use the same file as the target of more than one concurrent process, since ETL does not lock the file. (In the future, the external flat files that a target table points to could be protected by a locking mechanism: for example, ETL could create a lock file named filename.ext.etllock and check for its existence to prevent multiple processes from writing to the same file.)
- ETLSE protects the monitor log files from conflicting updates by different processes (e.g. a user attempting to truncate the log tables while an ETL instance is writing to the log file).
- During code generation, ETL puts a placeholder for the database directory into the AXION JDBC connection URL. The placeholder is resolved to a unique directory every time ETL is invoked for a given collaboration. Under the application root directory, ETL creates two sub-directories: one for ETLMonitor, which is the root database directory of the ETLMonitor database instance shared by the ETL monitor application and all instances of a given collaboration, and one for ETLEngine. Under ETLEngine, ETL creates a sub-directory for each collaboration, and every ETL engine instance then creates its own sub-directory there, which serves as the database root directory for that instance. This works as long as the user makes sure that the data sets are mutually exclusive.
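The directory layout and the proposed lock file described above can be sketched roughly as follows. This is a minimal illustration, not ETL's actual code: the class and method names, the `{DB_DIR}` token, the example URL shape, and the use of a random UUID as the instance directory name are all assumptions made for the example. The lock helper only demonstrates the "future" idea of a `filename.ext.etllock` file; ETL does not currently do this.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class EtlInstanceLayout {

    /** Shared monitor database root: <appRoot>/ETLMonitor. */
    public static Path monitorDir(Path appRoot) throws IOException {
        return Files.createDirectories(appRoot.resolve("ETLMonitor"));
    }

    /** Unique engine database root: <appRoot>/ETLEngine/<collaboration>/<instanceId>. */
    public static Path newInstanceDir(Path appRoot, String collaboration) throws IOException {
        Path collabDir = Files.createDirectories(
                appRoot.resolve("ETLEngine").resolve(collaboration));
        // Assumption: a random UUID stands in for whatever unique name the engine generates.
        return Files.createDirectory(collabDir.resolve(UUID.randomUUID().toString()));
    }

    /** Replaces the hypothetical {DB_DIR} placeholder in a connection URL template. */
    public static String resolveUrl(String template, Path instanceDir) {
        return template.replace("{DB_DIR}", instanceDir.toAbsolutePath().toString());
    }

    /**
     * Sketch of the proposed lock file: tries to create <target>.etllock next to the target.
     * Returns true if this caller created the lock, false if it already existed.
     */
    public static boolean tryLockTarget(Path target) throws IOException {
        Path lock = target.resolveSibling(target.getFileName() + ".etllock");
        try {
            Files.createFile(lock); // atomic check-and-create; fails if the file exists
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path appRoot = Files.createTempDirectory("etl-app");
        System.out.println("monitor db dir : " + monitorDir(appRoot));

        // Two invocations of the same collaboration get separate database roots.
        Path first = newInstanceDir(appRoot, "orders_collab");
        Path second = newInstanceDir(appRoot, "orders_collab");
        // The URL template is illustrative only, not the string ETL's code generator emits.
        String template = "jdbc:axiondb:etl:{DB_DIR}";
        System.out.println(resolveUrl(template, first));
        System.out.println(resolveUrl(template, second));

        Path target = appRoot.resolve("out.csv");
        System.out.println("first writer locked : " + tryLockTarget(target));
        System.out.println("second writer locked: " + tryLockTarget(target));
    }
}
```

A writer that obtains the lock would be expected to delete the `.etllock` file when it finishes; handling of stale locks left by crashed processes is left out of this sketch.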
ETL can be run in parallel when the user application receives mutually exclusive data segments that need to be processed. The user can also define a range and create virtual data segments by defining an “Extraction Condition” that uses runtime arguments as operands in the condition. At runtime, these arguments supply the range values, so each invocation selects its own virtual data segment and the segments can be processed in parallel.
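The range-based approach can be sketched as below. This is only an illustration under stated assumptions: the `RangeSegments` class, the `Segment` type, the `ORDER_ID` column, and the `Function` standing in for an ETL collaboration invocation are all hypothetical, since the real invocation API is not shown here. The sketch splits a key range into non-overlapping half-open segments and runs one task per segment, passing the bounds as the runtime arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class RangeSegments {

    /** Half-open range [low, high) handed to one ETL invocation as runtime arguments. */
    public static final class Segment {
        public final long low;
        public final long high;
        Segment(long low, long high) { this.low = low; this.high = high; }
    }

    /** Splits [low, high) into at most 'parts' contiguous, mutually exclusive segments. */
    public static List<Segment> split(long low, long high, int parts) {
        if (parts <= 0 || high < low) {
            throw new IllegalArgumentException("need parts > 0 and low <= high");
        }
        List<Segment> segments = new ArrayList<>();
        long total = high - low;
        long base = total / parts;   // minimum size of each segment
        long extra = total % parts;  // the first 'extra' segments get one more element
        long start = low;
        for (int i = 0; i < parts && start < high; i++) {
            long size = base + (i < extra ? 1 : 0);
            segments.add(new Segment(start, start + size));
            start += size;
        }
        return segments;
    }

    /** Runs one invocation per segment in parallel and collects results in segment order. */
    public static <R> List<R> runAll(List<Segment> segments, Function<Segment, R> invoke)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, segments.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (Segment s : segments) {
                Callable<R> task = () -> invoke.apply(s);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // rethrows a task failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Segment> segments = split(0, 100, 4);
        // Stand-in for invoking the collaboration: build the condition each instance would apply.
        List<String> conditions = runAll(segments,
                s -> "ORDER_ID >= " + s.low + " AND ORDER_ID < " + s.high);
        conditions.forEach(System.out::println);
    }
}
```

Half-open ranges are used so that adjacent segments never share a boundary value, which keeps the virtual data segments mutually exclusive.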