Enterprise Data Mashup
The goal of this project is to provide an open-source implementation of a JBI-compliant data mashup service engine that gives a single view of data from heterogeneous sources within the enterprise. The engine can source data from static web pages and from tabular data exposed as web services, join and aggregate it, cleanse it, and generate a response in the WebRowSet schema.
The Enterprise Data Mashup Service Engine project aims to build an open-source, JBI-compliant service engine whose features include:
Wikipedia defines a mashup as “Mashup (web application hybrid), a website or web application that combines content from more than one source”. No discussion of mashups is complete without a reference to Google Maps: chicagocrime.org combines Google Maps with statistics from a publicly available database (the Chicago Police Department's Citizen ICAM) to show where in Chicago different kinds of crimes were committed. I would also cite eBay, which exposed APIs to its partners and built an ecosystem around its auction service. Examples of eBay mashups include markovic.com (eBay and Virtual Earth), auctionmapper.com and 2RealEstateAuctions.com. In short, mashups have business value; they are not just a cool technology.
Dion Hinchcliffe's blog gives a useful classification of mashup styles, viz. Presentation Mashup, Client-Side Data Mashup, Client-Side Software Mashup, Server-Side Software Mashup and Server-Side Data Mashup. He defines Server-Side Data Mashup as follows:
Databases have been linking and connecting data for decades, and as such, they have relatively powerful mechanisms to join or mashup data under the covers, on the server-side. While it’s still harder to mashup up data across databases from different vendors, products like Microsoft SQL Server increasingly make it much easier to do. This points out that many applications we have today are early forms of mashups, despite the term. Of course, the more interesting and newer aspects of mashups happen above this level.
The mashup concept is making its way into businesses in the form of Enterprise Mashup Services (EMS), which pull data from enterprise search engines, web services and other data stores, mix and match it, and serve it up to users. EMS sounds like EII (Enterprise Information Integration), but the two are different technologies. Naturally, EII vendors tend to believe that enterprises that flock to bleeding-edge mashups may get squashed. As they put it: “With EMS, users must pull data from multiple sources and then burn CPU cycles on their PCs while the client combines the data before presenting it. It's a waste of resources and bandwidth.”
We want to look at this in the light of the question “how do we help create mashups and participate in this phenomenon?” chicagocrime.org used a publicly available information source, and that is the part of the mashup problem we are addressing: how do we make information sources available for mashups, and how do we perform the data mashup under the covers to reduce client-side processing? The following paragraphs describe the EDMS architecture, which tries to address this problem domain.
The core of the EDMS architecture is the ability to aggregate relational data stores represented as a JDBC ResultSet, flat files mapped to relational tables, and XQuery rowsets mapped to relational tables. The aggregation is done by the Federated Query Server, also called the Mashup Database Engine, which is optimized for read access; its virtual table support establishes a relational representation for flat-file sources, WebRowSets and XQuery rowsets. The architecture features the following core components:
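To make the aggregation step concrete, here is a minimal Java sketch of an inner equi-join over two sources that are assumed to have already been mapped to relational rows (the role the virtual tables play). The class name `VirtualTableJoin` and the row representation (a `Map` of column name to value) are hypothetical, and a hash join is only one possible join strategy; this is not a description of how the Mashup Database Engine actually executes joins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each source (JDBC result, flat file, XQuery rowset) is assumed
// to have been mapped already to rows of column name -> value.
final class VirtualTableJoin {
    private VirtualTableJoin() {}

    /** Inner equi-join: emits one merged row for each pair where left.leftKey equals right.rightKey. */
    static List<Map<String, String>> hashJoin(List<Map<String, String>> left, String leftKey,
                                              List<Map<String, String>> right, String rightKey) {
        // Build phase: index the right-hand rows by join-key value (null keys never match, as in SQL).
        Map<String, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> r : right) {
            String key = r.get(rightKey);
            if (key != null) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
            }
        }
        // Probe phase: look up each left row's key and merge it with every matching right row.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> l : left) {
            String key = l.get(leftKey);
            if (key == null) {
                continue;
            }
            for (Map<String, String> r : index.getOrDefault(key, List.of())) {
                Map<String, String> merged = new LinkedHashMap<>(l);
                merged.putAll(r); // on a column-name clash the right-hand value wins
                result.add(merged);
            }
        }
        return result;
    }
}
```

Building the hash index on one side and probing with the other keeps the join linear in the total number of rows, which matters when one side is a large flat file.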
Enterprise Data Mashup SE is a JBI-compliant service engine that provides enterprise data federation (mashup) services. The design-time tool, Mashup Editor (a NetBeans plug-in), can be used to browse heterogeneous data sources and build join conditions. The service engine aggregates the data sources and produces the result set as the response. The output can be further woven with, for example, the XSLT Service Engine to produce different output formats, enabling multi-channel deployment.
The Mashup Database Engine is the core kernel of the EII engine, as it performs the core function of executing the federated query. The Mashup Database will be administered in a NetBeans environment.
The DataMashupServices Manager will be the interface to the mashup service. It will read the configuration of the given mashup, extract the federated query from the .view file, use the Query Manager to execute the query, and generate the response in the expected format. In short, the DataMashupServices Manager will completely decouple the JBI adapter from the mashup so that the core capabilities can be shared with other components in the JBI ecosystem.
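The decoupling described above can be sketched with a couple of small interfaces. In this hypothetical Java sketch, `QueryManager`, `ResponseFormatter`, `CsvFormatter` and the `query`/`outputFormat` keys (standing in for a parsed .view file) are all illustrative assumptions; the point is only that the JBI adapter calls `handle` and never depends on how the query is executed.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical contract for the Query Manager: federated query in, rows out.
interface QueryManager {
    List<List<String>> execute(String federatedQuery);
}

// Hypothetical contract for rendering rows in the format a view asks for.
interface ResponseFormatter {
    String format(List<List<String>> rows);
}

// Example formatter: one comma-separated line per row (no quoting, for brevity).
final class CsvFormatter implements ResponseFormatter {
    @Override
    public String format(List<List<String>> rows) {
        StringJoiner out = new StringJoiner("\n");
        for (List<String> row : rows) {
            out.add(String.join(",", row));
        }
        return out.toString();
    }
}

/**
 * Sketch of the DataMashupServices Manager. A JBI adapter would call only handle(...),
 * so query execution and formatting can be reused by other components.
 */
final class DataMashupServicesManager {
    private final QueryManager queryManager;
    private final Map<String, ResponseFormatter> formatters;

    DataMashupServicesManager(QueryManager queryManager, Map<String, ResponseFormatter> formatters) {
        this.queryManager = queryManager;
        this.formatters = formatters;
    }

    /** viewConfig stands in for a parsed .view file; the key names are assumptions. */
    String handle(Map<String, String> viewConfig) {
        String query = viewConfig.get("query");
        ResponseFormatter formatter = formatters.get(viewConfig.getOrDefault("outputFormat", "csv"));
        if (query == null || formatter == null) {
            throw new IllegalArgumentException("view must define a query and a known output format");
        }
        return formatter.format(queryManager.execute(query));
    }
}
```

Because the adapter sees only `handle`, swapping the Query Manager implementation or adding output formats does not touch the JBI-facing code.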
The Query Manager will receive the federated query as input and generate a JDBC ResultSet as output. Before executing the query, it will use the underlying Cache Manager to check whether the results are already available and valid. It will also use the Strategy Builder to determine whether the query is really federated or homogeneous; for a homogeneous query it will use the native database to execute the query (or use content-based routing to execute the query through jdbcbc and get the response).
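A minimal sketch of the Query Manager's control flow, assuming hypothetical types: `ResultCache` stands in for the Cache Manager, a `Function` returning `Route` stands in for the Strategy Builder, and two more `Function`s stand in for native execution (for example, routed through jdbcbc) and federated execution in the Mashup Database Engine. Rows are simplified to lists of strings rather than a JDBC ResultSet.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical routing decision produced by the Strategy Builder.
enum Route { NATIVE, FEDERATED }

// Hypothetical view of the Cache Manager: an empty Optional means "missing or no longer valid".
interface ResultCache {
    Optional<List<List<String>>> lookup(String query);
    void store(String query, List<List<String>> rows);
}

// Trivial cache with no expiry, just enough to exercise the flow.
final class MapResultCache implements ResultCache {
    private final Map<String, List<List<String>>> entries = new HashMap<>();

    @Override
    public Optional<List<List<String>>> lookup(String query) {
        return Optional.ofNullable(entries.get(query));
    }

    @Override
    public void store(String query, List<List<String>> rows) {
        entries.put(query, rows);
    }
}

final class FederatedQueryManager {
    private final ResultCache cache;
    private final Function<String, Route> strategyBuilder;
    private final Function<String, List<List<String>>> nativeExecutor;    // e.g. routed via jdbcbc
    private final Function<String, List<List<String>>> federatedExecutor; // Mashup Database Engine

    FederatedQueryManager(ResultCache cache, Function<String, Route> strategyBuilder,
                          Function<String, List<List<String>>> nativeExecutor,
                          Function<String, List<List<String>>> federatedExecutor) {
        this.cache = cache;
        this.strategyBuilder = strategyBuilder;
        this.nativeExecutor = nativeExecutor;
        this.federatedExecutor = federatedExecutor;
    }

    List<List<String>> execute(String query) {
        // 1. Serve a valid cached result without touching any data source.
        Optional<List<List<String>>> cached = cache.lookup(query);
        if (cached.isPresent()) {
            return cached.get();
        }
        // 2. Let the strategy decide between the native database and the federated engine.
        List<List<String>> rows = strategyBuilder.apply(query) == Route.NATIVE
                ? nativeExecutor.apply(query)
                : federatedExecutor.apply(query);
        // 3. Keep the result for later requests.
        cache.store(query, rows);
        return rows;
    }
}
```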
The Cache Manager will manage the persistence and retrieval of result sets. It could create a temporary table in the Mashup Database and run a SELECT to load the result set back into memory (this design decision is deferred to a later stage).
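Since the persistence strategy is still open, the following Java sketch keeps results in memory instead of a temporary table, and assumes a simple time-to-live rule for deciding whether a cached result is still "valid". `TtlResultCache` and the TTL policy are hypothetical; the injectable clock only makes the expiry logic easy to test.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongSupplier;

// Hypothetical in-memory Cache Manager with time-to-live validity.
final class TtlResultCache {
    private static final class Entry {
        final List<List<String>> rows;
        final long storedAtMillis;

        Entry(List<List<String>> rows, long storedAtMillis) {
            this.rows = rows;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in production

    TtlResultCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized void store(String query, List<List<String>> rows) {
        // Copy the outer list so later changes by the caller do not leak into the cache.
        entries.put(query, new Entry(List.copyOf(rows), clock.getAsLong()));
    }

    synchronized Optional<List<List<String>>> lookup(String query) {
        Entry entry = entries.get(query);
        if (entry == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() - entry.storedAtMillis >= ttlMillis) {
            entries.remove(query); // stale: evict so the query is executed again
            return Optional.empty();
        }
        return Optional.of(entry.rows);
    }
}
```

A temporary-table implementation could keep the same `store`/`lookup` contract, which is one way to keep that design decision deferrable.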
The Strategy Builder will generate an execution strategy for the query, determining whether the query can be executed in a single target database or must use pipelined execution in the Mashup Database Engine.
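The classification can be illustrated with a hypothetical Java sketch that assumes a design-time catalog mapping each table name to the data source that owns it. If every table in the query lives in one data source, the query can be pushed down to that database; otherwise it needs pipelined execution in the Mashup Database Engine. A real strategy builder would also have to consider SQL dialects and function support, which this sketch ignores.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the Strategy Builder's classification step.
final class StrategyBuilder {
    /** Either push the whole query down to one data source, or pipeline it in the engine. */
    static final class Strategy {
        final boolean pipeline;
        final String targetDataSource; // null when pipeline is true

        Strategy(boolean pipeline, String targetDataSource) {
            this.pipeline = pipeline;
            this.targetDataSource = targetDataSource;
        }
    }

    private final Map<String, String> tableToDataSource; // assumed design-time catalog

    StrategyBuilder(Map<String, String> tableToDataSource) {
        this.tableToDataSource = tableToDataSource;
    }

    Strategy build(List<String> tablesInQuery) {
        if (tablesInQuery.isEmpty()) {
            throw new IllegalArgumentException("query references no tables");
        }
        Set<String> sources = new HashSet<>();
        for (String table : tablesInQuery) {
            String source = tableToDataSource.get(table);
            if (source == null) {
                throw new IllegalArgumentException("unknown table: " + table);
            }
            sources.add(source);
        }
        if (sources.size() == 1) {
            // Homogeneous query: the owning database can execute it natively.
            return new Strategy(false, sources.iterator().next());
        }
        // Tables span several sources: join them in the Mashup Database Engine.
        return new Strategy(true, null);
    }
}
```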
The XQuery processor will parse and execute queries against an XML data source and generate a rowset. We need to explore existing open-source implementations of the XQuery specification; for consistency, we will reuse the XQuery implementation of the XQuery SE. The first version will only address XML sources that can be mapped at design time; dynamic mapping of XML rowsets to relational tables will be addressed at a later stage.
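To illustrate design-time mapping of XML to a rowset, the sketch below uses the JDK's XPath API (`javax.xml.xpath`) as a stand-in for a full XQuery processor; XPath is a much smaller language than XQuery, so this is a simplification, not the XQuery SE's implementation. `XmlRowsetMapper`, the row path and the column-to-path mapping are hypothetical: each node matched by the row path becomes a row, and each column is a relative XPath evaluated against that node.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: maps repeating XML elements to relational-style rows,
// using XPath as a simplified stand-in for XQuery.
final class XmlRowsetMapper {
    private XmlRowsetMapper() {}

    static List<Map<String, String>> toRows(String xml, String rowPath, Map<String, String> columnPaths) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so untrusted feeds cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rowNodes = (NodeList) xpath.evaluate(rowPath, doc, XPathConstants.NODESET);
            List<Map<String, String>> rows = new ArrayList<>();
            for (int i = 0; i < rowNodes.getLength(); i++) {
                Node rowNode = rowNodes.item(i);
                Map<String, String> row = new LinkedHashMap<>();
                for (Map.Entry<String, String> column : columnPaths.entrySet()) {
                    // Relative path evaluated against the row node; a missing node yields "".
                    row.put(column.getKey(), xpath.evaluate(column.getValue(), rowNode));
                }
                rows.add(row);
            }
            return rows;
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("could not map XML to rows", e);
        }
    }
}
```

Because the row path and column paths are fixed up front, this corresponds to the "mapped at design time" case; dynamic mapping would need to discover them from the document instead.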
The Mashup Editor is a NetBeans 6.0 plug-in that walks the user through the view definition process. It represents the view in Source and Design views and keeps the source and graphical views synchronized. It also features a Join Builder for building join conditions. The editor will also configure the cache, output format, etc.
The NetBeans environment provides a platform for plugging in menus that help create mashup databases, browse mashup tables, view data, and build and execute SELECT queries; hence it is called the database browser. It can also be used to manage life-cycle operations.
The Mashup SE has to be injected with aspects, so the Service Assembly has to be composed in the CASA Editor and deployed as a composite application. We intend to reuse the existing infrastructure from the Aspect SE and CASA as much as possible. Monitoring could be provided by CAM.
Using the CASA Editor, it is possible to weave (Composite Weaving pattern) the Mashup SE and the XSLT SE and create multiple Service Assemblies with different stylesheets for different device channels such as browsers and PDAs, making multi-channel deployment possible.
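The multi-channel idea can be sketched with the JDK's standard JAXP transformation API (`javax.xml.transform`): the same mashup result is rendered through a different stylesheet per channel. `ChannelRenderer` is a hypothetical helper, and the simplified `<rows>/<row>` input used with it is an assumption for illustration, not the actual WebRowSet schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: renders one mashup result for one channel-specific stylesheet.
final class ChannelRenderer {
    private ChannelRenderer() {}

    static String render(String resultXml, String stylesheet) {
        try {
            // Compile the channel's stylesheet, then apply it to the mashup result.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(resultXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }
}
```

For example, a browser stylesheet might emit `<ul>`/`<li>` markup, while a PDA stylesheet declares `<xsl:output method="text"/>` and emits compact plain text from the same input.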
Enterprise software development is complex not so much because of the limitations of technology but because of the endless and conflicting demands of consumers. The search for the perfect solution has finally turned away from building an all-in-one solution and toward building infrastructure that can support a multitude of solutions, limited only by the imagination of the consumers. Web 2.0 for the Enterprise is an effort in that direction.
Jonathan Schwartz, Sun Microsystems CEO, describes this as an age in which -
Meg Withgott, Director of Engineering, Network Communities at Sun Microsystems says -
Web 2.0 is more a reflection of an industry trend than the conceptualization of a new phenomenon; hence, it is a reality. Tim O’Reilly gives a concrete set of axioms that define the Web 2.0 paradigm.
While these axioms are mostly stated in the context of service companies like Google and eBay, the principles can also be applied in the enterprise.
Andrew McAfee, a Harvard academic, wrote: "Enterprise 2.0 is the use of emergent social software platforms within companies, or between companies and their partners or customers."
Emergence is what happens when the whole is smarter than the sum of its parts. It's what happens when you have a system of relatively simple-minded component parts -- often there are thousands or millions of them -- and they interact in relatively simple ways. And yet somehow out of all this interaction some higher level structure or intelligence appears, usually without any master planner calling the shots. These kinds of systems tend to evolve from the ground up.
This feature is particularly useful for creating a database table by extracting an HTML table from any web page (e.g., a web catalog). In the tutorial, we show how to select the "Truck Light Boxes" catalog for "Light Accessories" from http://www.buyersproducts.com. The wizard lists the tables available on the web page; we search for the required table using HTML tag/depth-based filtering and choose it. We also choose a field delimiter, record delimiter and text qualifier for parsing the extracted data into CSV format. We then modify the extracted data and finally use the modified data to create a new flat-file Axion table.
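The parsing step can be sketched in Java as follows. This is a hypothetical, regex-based sketch that, in line with the note on nested tables, only handles simple non-nested tables; it selects a table by its position on the page instead of the wizard's tag/depth filtering, strips inner markup from cells, and does not decode HTML entities such as `&amp;`. Fields containing the field delimiter, record delimiter or text qualifier are wrapped in the qualifier, with embedded qualifiers doubled (the usual CSV convention).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: converts one simple (non-nested) HTML table into CSV text.
final class HtmlTableToCsv {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern TABLE = Pattern.compile("<table\\b[^>]*>(.*?)</table>", FLAGS);
    private static final Pattern ROW = Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", FLAGS);
    private static final Pattern CELL = Pattern.compile("<t[dh]\\b[^>]*>(.*?)</t[dh]>", FLAGS);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    private HtmlTableToCsv() {}

    /** Converts the table at zero-based position tableIndex on the page. */
    static String convert(String html, int tableIndex, String fieldDelimiter,
                          String recordDelimiter, char textQualifier) {
        Matcher table = TABLE.matcher(html);
        for (int i = 0; i <= tableIndex; i++) {
            if (!table.find()) {
                throw new IllegalArgumentException("no table at index " + tableIndex);
            }
        }
        if (tableIndex < 0) {
            throw new IllegalArgumentException("tableIndex must be >= 0");
        }
        List<String> records = new ArrayList<>();
        Matcher row = ROW.matcher(table.group(1));
        while (row.find()) {
            List<String> fields = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop inner markup such as <b> or <a>, keeping only the cell text.
                String text = TAG.matcher(cell.group(1)).replaceAll("").trim();
                fields.add(qualify(text, fieldDelimiter, recordDelimiter, textQualifier));
            }
            records.add(String.join(fieldDelimiter, fields));
        }
        return String.join(recordDelimiter, records);
    }

    /** Wraps a field in the text qualifier only when it would otherwise break parsing. */
    private static String qualify(String text, String fieldDelimiter, String recordDelimiter, char qualifier) {
        String q = String.valueOf(qualifier);
        if (!text.contains(fieldDelimiter) && !text.contains(recordDelimiter) && !text.contains(q)) {
            return text;
        }
        return q + text.replace(q, q + q) + q;
    }
}
```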
HTML table extraction is done on demand, which makes extracting tables from the web page much faster. The wizard can also clip an HTML table from any HTML file, whether it is on the local file system or at a URL. In addition, the wizard lets the user choose a file name for the new CSV file; if the table has a caption, the caption is used as the default file name.
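A small hypothetical helper shows how a caption could become the default file name. The sanitizing rule (replacing runs of characters other than letters, digits, `-` and `_` with `_`) and the fallback base name are assumptions, not the wizard's documented behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives a default CSV file name from a table's <caption>.
final class CsvFileNames {
    private static final Pattern CAPTION =
            Pattern.compile("<caption\\b[^>]*>(.*?)</caption>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private CsvFileNames() {}

    static String defaultFileName(String tableHtml, String fallbackBaseName) {
        Matcher m = CAPTION.matcher(tableHtml);
        if (m.find()) {
            String caption = m.group(1).replaceAll("<[^>]+>", "").trim();
            // Assumed rule: collapse anything other than letters, digits, '-' and '_' into '_'.
            String safe = caption.replaceAll("[^A-Za-z0-9_-]+", "_").replaceAll("^_+|_+$", "");
            if (!safe.isEmpty()) {
                return safe + ".csv";
            }
        }
        return fallbackBaseName + ".csv";
    }
}
```

With this rule, a table captioned "Truck Light Boxes" would default to `Truck_Light_Boxes.csv`, and a table without a caption would fall back to the supplied base name.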
NOTE: This feature works for simple HTML tables without nesting. The result is unpredictable if nested tables are used.
In this tutorial, we create a table from an .xls spreadsheet. The wizard lets the user choose a table from the list of tables available in the spreadsheet, where each sheet is treated as a table. The example shows two tables for the chosen spreadsheet because the spreadsheet has two sheets. We then create an Axion table for this spreadsheet.
Contributors: Ahimanikya Satapathy, Srinivasan Rengarajan, Karthik S