What is the DataIntegrator / ETLSE?
ETL stands for extract, transform, and load. That is, ETL programs periodically extract data from source systems, transform the data into a common format, and then load the data into the target data store, usually a data warehouse. ETL processes bring together and combine data from multiple source systems into a data warehouse, enabling all users to work off a single, integrated set of data a single version of the truth.
- Extract -- the process of reading data from a specified source database and extracting a desired subset of data.
- Transform -- the process of converting the extracted/acquired data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining with other data.
- Load -- the process of writing/loading the data into the target database.
ETL Integrator is a Data Integration tool which can be used to preform extract - transform - load to build DataWarehouse or can be used for Data Migration.
ETL Integrator is designed to manage and orchestrate high-volume, high-performance data transformation from within the SOA tier. ETL Integrator, along with Open ESB platform, offers a comprehensive enterprise integration infrastructure. ETL Integrator is an enterprise module optimized for extracting, transforming, and loading bulk data between files and databases. It provides connectivity to a vast range of heterogeneous and diversified data sources including non relational data sources. It provides an ETL development and runtime environment that is fully integrated into Open ESB and Netbeans Enterprise Pack and optimized for handling very large record sets.
Why use ETL Integrator?
- Simplify the process of migrating data.
- Standardize the method of data migration.
- Store all data transformation logic/rules in one place.
- Enable users, managers and architects to understand, review, and modify the various interfaces.
- Reduce cost and effort associated with building interfaces (custom coding in conventional languages could cost 2 - 5 times as much).
- Require little database expertise to build high performing ETL processes.
- Metadata auto discovery enables user to design ETL processes faster.
- Ability to integrate with a wider variety of source data system including HTML, XML and RSS.
- Convert data into a consistent, standardized form to enable load to conformed target databases.
- Maintain data quality and data integrity.
- Robust Error handler to ensure data quality. Comprehensive system for reporting and responding to all ETL error events.
- Avail enterprise development environment, ETL Editor is fully integrated with NetBeans 5.5 and NetBeans 6
- Use a robust change management functions or "versioning" integrated with NetBeans.
- Design concurrent/parallel processing of multiple source data streams.
- Execute Full refresh and incremental extraction.
- Orchestrate Business Process where ETL can participate as partner.
- Avail a high degree of connectivity using extended array of binding components.
- Ability to specify complex transformations using built-in transformation objects.
Using the DataIntegrator / ETLSE
- See Also: ETL Features
- ETL Editor has many predefined transformations as well as cleansing operators/functions. It also allows the user to add user defined functions. This editor is used as design time component which lets the user design the ETL collaborations and create artifacts that can be deployed as ETL Service Engine.
- Supported Data Sources
- Oracle (Oracle 8 and higher)
- DB2 (Version 5 and higher)
- SQL Server
- HTML/Web Tables
- RSS/ATOM feeds
- Flat files (CSV, Delimited and Fixed Width)
- Other Databases like Access, Foxpro etc. through JDBC driver.
System requirements and dependencies
No special requirements
Please post questions on the OpenESB mailing list
. File bugs in the OpenESB issue tracker
; use the DataIntegrator / ETLSE category.
Evolution of the DataIntegrator / ETLSE
What is new?
- New Improved Data View: Right Click on the header section of any table. From the popup menu, choose "Show Data". The Data Panel now contains lot of new user options like "Insert a record", "Update a record" and also "Delete a record". The Data panel also has improved navigation capabilities.
- Show Data for Join View: Right Click on the header area of the Join table. The popup menu contains a new menu item named "Show Join Data". Using this feature, you can see the data for the join view even if the join tables are from heterogeneous databases.
- New Mashup Database Menu: Now creating a flatfile table in Axion is very easy and simple. Go to Tools menu -> Choose Mashup Database. From this menu you can create a new Mashup database as well as add tables to it from non-relational data sources. These tables can then be used in ETL Collaborations.
- New Web Table: Now ETL gives you an option to extract any table from any webpage. It also has tag depth based filtering capabilities which helps is searching for the required table. This table can then be used to create an Axion table which in turn can be used in ETL Collaborations.
- New Spreadsheet Table: Now ETL provides the user an option to create an axion table from .xls (Spreadsheet) Files. You can navigate and choose any .xls file and the wizard lets you create an axion table which can be used in any ETL Collaborations.
- New RSS/ATOM Feed table: Now ETL gives you an option to expose RRS/ATOM feed as table. It can then be aggregated or filtered to create more meaningful presentation.
Note:Check ETL Tutorials page for more help.
Here is what we have been thinking ETL Future Ideas
, please do share your thoughts. We may incorporate these in future releases.
- Developers working on the DataIntegrator / ETLSE: Ahimanikya Satapathy, Srinivasan R, Nilesh Apte, Sivashankari Rajmohan, Jawed Akhtar
- Past Contributors: Ritesh Adval, Wei Han, Girish Patil, Rahul Dwivedi, Rupesh Ramachandran, Sudhi Seshachala, Amrish Lal, Jonathan Giron, Chris Johnston, Sujit Biswas, Karthik S, Nithya Radhakrishnan
- People Who helped the Product to evolve: Sureh Sharma, Prabhu Balashanmugam, Kristine Doizaki, Jeroen Vanheeringen, Arthur Lathrop, Evan Carlsen, Siddiq Patel, Mark Foster, Andrea Kendall
Developing the DataIntegrator / ETLSE
ETL Architecture Overview
Process of extraction, transformation and loading in ETL Integrator engine can be classified as below.
- Direct or Simple (No Extraction or No Load) - All the source tables and the Target table reside in the same database.
- One Pass (No Extraction Required, Transform and then Load) - All the source tables are in the same database.
- Staging (Extract Load and Transform - ELT) - Source tables are scattered across different data sources.
- Pipeline (Federated database) - Uses an embedded virtual database to give a single view to different data sources.
Apart from Standard SQL operators/function, ETL Integrator support Java Operators and Custom/User Defined operators.
Example include, cleansing and validation operators such as parsing business name and address, Zip code, SSN data validation and date transformation operators.
To ensure data quality
ETL Integrator allows user to configure data validation logic and user can view the rejected data thru web console and resubmit
DataIntegrator / ETLSE Source Code
ETL Source Code