
Enterprise Data Mashup Services

The goal of this project is to provide an open-source implementation of a JBI-compliant data mashup service engine. The engine gives a single view of data from heterogeneous sources within the enterprise: it can source data from static web pages and from tabular data exposed as web services, join and aggregate that data, cleanse it, and generate a response in the WebRowSet schema.

The Concept

The Enterprise Data Mashup Service Engine project aims to build an open-source, JBI-compliant service engine with the following features:

  • Integrating information from heterogeneous sources (relational databases, flat files, DCOM documents, spreadsheets, XML, HTML, RSS/Atom, XQuery rowsets) to provide a unified view.
  • Creating a data mashup services capability in a JBI-based SOA using OpenESB and the NetBeans Enterprise Pack infrastructure.
  • Exposing the aggregation of enterprise data sources to mashup client frameworks, making it an enabling technology for Web 2.0 style applications in the enterprise.
  • Time-bound view caching for improved response times; this can be thought of as a virtual materialized view.
  • Applying the Composite Weaving pattern to provide multiple views of the response through XSLT transformation.
  • Extensibility through the ability to consume JBI services.
  • Transforming the response to various formats by weaving the output with an XSLT Service Engine, which enables multi-channel deployment.
  • Reusing the NetBeans Database Explorer to browse source tables, which can be dragged and dropped into the Mashup Editor to define join conditions.
  • Viewing the result set in the Mashup Editor and managing the mashup view cache.
  • Creating a data services layer in a true service-oriented architecture.
  • Enabling integration on demand.
The engine thus provides mashed-up views of enterprise data from heterogeneous sources. These pre-canned, materialized views served by the EDM SE can be used by clients to build highly responsive and interactive Ajax-powered Web 2.0 style enterprise applications using existing client-side frameworks.

Introduction

The terms Web 2.0 and mashup are doing the rounds, and offerings are naturally gravitating around them. Still, it is worth going beyond the hype to look at the reality. Only such a reality check can lead to offerings that solve real problems and bring real value to customers. This document intends to do just that.

Wikipedia defines a mashup as “Mashup (web application hybrid), a website or web application that combines content from more than one source”. No discussion of mashups is complete without a reference to Google Maps. chicagocrime.org uses Google Maps and statistics from a publicly available database (the Chicago Police Department's Citizen ICAM) to show where in Chicago different kinds of crimes were committed. Another example is eBay, which exposed APIs to its partners and built an ecosystem around its auction service. Examples of eBay mashups include markovic.com (eBay and Virtual Earth), auctionmapper.com and 2RealEstateAuctions.com. In short, mashups have business value and are not just a cool technology.

Dion Hinchcliffe's blog gives a nice classification of mashup styles: Presentation Mashup, Client-Side Data Mashup, Client-Side Software Mashup, Server-Side Software Mashup and Server-Side Data Mashup. Server-Side Data Mashup is defined as follows.

Server-Side Data Mashup

Databases have been linking and connecting data for decades, and as such, they have relatively powerful mechanisms to join or mash up data under the covers, on the server-side. While it’s still harder to mash up data across databases from different vendors, products like Microsoft SQL Server increasingly make it much easier to do. This points out that many applications we have today are early forms of mashups, despite the term. Of course, the more interesting and newer aspects of mashups happen above this level.

The mashup concept is making its way into businesses in the form of Enterprise Mashup Services (EMS), which pull data from enterprise search engines, web services and other storehouses, mix and match it, and serve it up to users. EMS sounds like EII (Enterprise Information Integration), but EMS and EII are different technologies. EII vendors naturally tend to believe that enterprises that flock to bleeding-edge mashups may get squashed. They say: “With EMS, users must pull data from multiple sources and then burn CPU cycles on their PCs while the client combines the data before presenting it. It's a waste of resources and bandwidth.”

What do we intend to do?

We look at this in the light of “How do we help the creation of mashups and participate in this phenomenon?” chicagocrime.org used an information source that was publicly available, and we are addressing that part of the mashup problem: how do we expose information sources so that they are available for mashups, and how do we do the data mashup under the covers to reduce client processing? The following sections describe the EDMS architecture, which tries to address this problem domain.

Enterprise Data Mashup Services Architecture

EDMS Architecture Overview

The core of the EDMS architecture is the ability to aggregate relational data stores represented as JDBC result sets, flat files mapped to relational tables, and XQuery rowsets mapped to relational tables. The aggregation is done by the Federated Query Server, also called the Mashup Database Engine. It is optimized for read access, and its virtual table support establishes a relational representation for flat-file sources, web rowsets and XQuery rowsets. The architecture features the following core set of components.

Enterprise Data Mashup SE is a JBI-compliant service engine that provides enterprise data federation (mashup) services. The design-time tool, the Mashup Editor (a NetBeans plug-in), can be used to browse heterogeneous data sources and build join conditions. The service engine aggregates the data sources and produces the result set as the response. The output can be further woven with, say, the XSLT Service Engine to produce different output formats, enabling multi-channel deployment.

Federated query server alias Mashup Database

This component is the kernel of the EII engine, as it performs the core function of executing the federated query. The Mashup Database will be administered in a NetBeans environment.

  • Design time should support wizards to create tables from flat-file sources such as CSV (fixed width/delimited), tabular XML sources, spreadsheets (MS & ODT) etc.
  • Design time should allow the user to visualize tables and export the content
  • Mapping of XQuery rowsets to virtual tables in the Mashup Database
  • Design time should support viewing data and exporting tabular data to flat-file formats
  • Optimized XQuery execution and merge with relational data
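
The virtual table idea can be illustrated with a minimal sketch. It assumes a delimited flat file with a header row, maps it to rows keyed by column name, and joins it with rows that stand in for a relational source. The class and method names (VirtualTableSketch, fromDelimited, join) are hypothetical and are not part of the Mashup Database API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: illustrative names only, not the Mashup Database API.
class VirtualTableSketch {

    // Builds one row from alternating column names and values.
    static Map<String, String> row(String... namesAndValues) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            row.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return row;
    }

    // Maps delimited text (first line = header) to rows keyed by column name,
    // the way a flat file could be exposed as a read-only virtual table.
    static List<Map<String, String>> fromDelimited(String text, String delimiter) {
        String[] lines = text.trim().split("\\R");
        String splitter = Pattern.quote(delimiter);
        String[] header = lines[0].split(splitter, -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] cells = lines[i].split(splitter, -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < header.length; c++) {
                row.put(header[c].trim(), c < cells.length ? cells[c].trim() : null);
            }
            rows.add(row);
        }
        return rows;
    }

    // Inner equi-join of two row lists on a shared column (simple hash join).
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right,
                                          String key) {
        Map<String, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> r : right) {
            index.computeIfAbsent(r.get(key), k -> new ArrayList<>()).add(r);
        }
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> l : left) {
            List<Map<String, String>> matches = index.get(l.get(key));
            if (matches == null) {
                continue; // no matching row on the right side
            }
            for (Map<String, String> r : matches) {
                Map<String, String> merged = new LinkedHashMap<>(l);
                merged.putAll(r);
                joined.add(merged);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Stand-in for rows that would come from a relational source over JDBC.
        List<Map<String, String>> customers = new ArrayList<>();
        customers.add(row("ID", "1", "NAME", "Acme"));
        customers.add(row("ID", "2", "NAME", "Globex"));
        // Stand-in for the content of a delimited flat file.
        String ordersCsv = "ID,TOTAL\n1,250\n1,75\n3,10\n";
        System.out.println(join(customers, fromDelimited(ordersCsv, ","), "ID"));
    }
}
```

A real engine would push this work into SQL over external tables; the in-memory join here only shows the relational view that the mapping makes possible.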

DataMashupServices Manager

This will be the interface to the mashup service. It will interpret the configuration for the given mashup, extract the federated query from the .view file, use the Query Manager to execute the query, and generate the response in the expected format. The DataMashupServices Manager will completely decouple the JBI adapter from the mashup, so that the core capabilities can be shared with other components in the JBI ecosystem.

Query Manager

This will receive the federated query as input and generate a JDBC result set as output. Before executing the query, it will use the underlying Cache Manager to check whether the results are already available and still valid. It will also use the Strategy Builder to determine whether the query is really a federated query or a homogeneous one; in the latter case it will use the native database to execute the query (or use content-based routing to execute the query through the JDBC binding component, jdbcbc, and get the response).
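
The control flow described for the Query Manager can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the ResultCache interface, the predicate that stands in for the Strategy Builder, and the two executor functions are all hypothetical names, and rows are simplified to string arrays instead of a JDBC result set.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the Query Manager flow; every name is illustrative.
class QueryManagerSketch {

    // Stand-in for the Cache Manager contract.
    interface ResultCache {
        Optional<List<String[]>> lookup(String query); // empty if missing or no longer valid
        void store(String query, List<String[]> rows);
    }

    // Trivial cache without expiry, just enough to exercise the flow.
    static ResultCache inMemoryCache() {
        Map<String, List<String[]>> entries = new HashMap<>();
        return new ResultCache() {
            public Optional<List<String[]>> lookup(String query) {
                return Optional.ofNullable(entries.get(query));
            }
            public void store(String query, List<String[]> rows) {
                entries.put(query, rows);
            }
        };
    }

    private final ResultCache cache;
    private final Predicate<String> isFederated;                 // stands in for the Strategy Builder
    private final Function<String, List<String[]>> nativeExecutor;    // runs in the native database
    private final Function<String, List<String[]>> federatedExecutor; // runs in the mashup database

    QueryManagerSketch(ResultCache cache,
                       Predicate<String> isFederated,
                       Function<String, List<String[]>> nativeExecutor,
                       Function<String, List<String[]>> federatedExecutor) {
        this.cache = cache;
        this.isFederated = isFederated;
        this.nativeExecutor = nativeExecutor;
        this.federatedExecutor = federatedExecutor;
    }

    List<String[]> execute(String query) {
        // 1. Serve from the cache when a valid result is available.
        Optional<List<String[]>> cached = cache.lookup(query);
        if (cached.isPresent()) {
            return cached.get();
        }
        // 2. Otherwise pick an executor based on the strategy decision.
        Function<String, List<String[]>> executor =
                isFederated.test(query) ? federatedExecutor : nativeExecutor;
        List<String[]> rows = executor.apply(query);
        // 3. Remember the result for later calls.
        cache.store(query, rows);
        return rows;
    }
}
```

Keeping the cache and the strategy decision behind small interfaces is one way to let the two concerns evolve separately, which matches the component split described in this section.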

Cache Manager

This will manage the persistence and retrieval of the result set. It could potentially create a temporary table in the Mashup Database and run a select to load the result set back into memory (this design decision is deferred to a later stage).
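
Since the storage design is deferred, the following is only a sketch of the time-bound validity check that such a cache needs, kept in memory for simplicity. The class name TimeBoundCacheSketch and its injected clock are assumptions made for illustration; the clock is passed in so that expiry can be exercised without waiting.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical in-memory sketch of a time-bound view cache.
class TimeBoundCacheSketch<V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TimeBoundCacheSketch(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Stores a result and stamps it with its expiry time.
    void put(String viewName, V result) {
        entries.put(viewName, new Entry<>(result, clock.getAsLong() + ttlMillis));
    }

    // Returns the cached result only while it is still within its time bound.
    Optional<V> get(String viewName) {
        Entry<V> entry = entries.get(viewName);
        if (entry == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(viewName, entry); // drop the stale entry
            return Optional.empty();
        }
        return Optional.of(entry.value);
    }

    public static void main(String[] args) {
        TimeBoundCacheSketch<String> cache =
                new TimeBoundCacheSketch<>(60_000L, System::currentTimeMillis);
        cache.put("customer_orders", "cached rows");
        System.out.println(cache.get("customer_orders").isPresent());
    }
}
```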

Strategy Builder

This will generate an execution strategy for the query. It will determine whether the query can be executed entirely in a target database or must use the pipeline execution strategy in the Mashup Database Engine.
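
One simple way to make that decision, shown here as a sketch, is to look up which data source owns each table in the query: if all tables live in one source, the query can be pushed down to it; otherwise it needs the federated pipeline. The catalog map and the names StrategySketch, NATIVE_PUSHDOWN and FEDERATED_PIPELINE are assumptions for illustration, and extracting the table names from the SQL text is left out.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the Strategy Builder decision; names are illustrative.
class StrategySketch {

    enum Strategy { NATIVE_PUSHDOWN, FEDERATED_PIPELINE }

    // tableToSource is an assumed catalog: table name -> owning data source.
    static Strategy choose(Collection<String> tablesInQuery, Map<String, String> tableToSource) {
        Set<String> sources = new HashSet<>();
        for (String table : tablesInQuery) {
            String source = tableToSource.get(table);
            if (source == null) {
                throw new IllegalArgumentException("Unknown table: " + table);
            }
            sources.add(source);
        }
        // A single source means the query is homogeneous and can run natively.
        return sources.size() <= 1 ? Strategy.NATIVE_PUSHDOWN : Strategy.FEDERATED_PIPELINE;
    }

    public static void main(String[] args) {
        Map<String, String> catalog = new HashMap<>();
        catalog.put("ORDERS", "salesdb");
        catalog.put("CUSTOMERS", "salesdb");
        catalog.put("PRICES_CSV", "flatfile");
        System.out.println(choose(Arrays.asList("ORDERS", "CUSTOMERS"), catalog));
        System.out.println(choose(Arrays.asList("ORDERS", "PRICES_CSV"), catalog));
    }
}
```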

XQuery engine

It will parse and execute queries against an XML data source and generate a rowset. We need to explore using an existing open-source implementation of the XQuery specification; for consistency, the plan is to reuse the XQuery implementation of the XQuery SE. The first version will only address XML sources that can be mapped at design time; dynamic mapping of XML rowsets to relational tables will have to be addressed at a later stage.

Mashup Editor (Design Time)

The Mashup Editor is a NetBeans 6.0 plug-in that walks the user through the view definition process. It presents the view in Source and Design views and keeps the source and graphical views synchronized. It also features a join builder to build join conditions. The editor is also used to configure the cache, the output format and so on.

Mashup Database Browser (Design Time)

The NetBeans environment provides the platform to plug in menus that help create mashup databases, browse the mashup tables, view data, and build and execute select queries; hence the name database browser. It can also be used to manage life-cycle operations.

Mashup SE

  • JBI compliant service engine
  • The implementation of the contracts is based on the Common Runtime Library, which provides base classes for JBI contracts.
  • Represents the query result set in the response as a WebRowSet (JSR-114)
  • Implements the in-out MEP (message exchange pattern)
  • Can be monitored using a Composite Application Manager
  • The run-time environment includes GlassFish 9.1, which features an Open-ESB runtime.
  • The EDM service engine can be configured to cache the join result set. Caching can be achieved by either:
    • Using the ResultSet Cache in the mashup Database
    • Injecting a caching advice for the service
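
To give a feel for the WebRowSet response format, here is a minimal sketch that uses the standard javax.sql.rowset API (JSR-114) to serialize a small rowset to XML. It fills the rowset in memory with made-up CITY and INCIDENTS columns so that it needs no database; an engine holding a real JDBC result set would more likely call populate(resultSet) instead. The class and method names are hypothetical.

```java
import java.io.StringWriter;
import java.sql.SQLException;
import java.sql.Types;
import javax.sql.rowset.RowSetMetaDataImpl;
import javax.sql.rowset.RowSetProvider;
import javax.sql.rowset.WebRowSet;

// Hypothetical sketch: serializes in-memory rows as WebRowSet XML.
class WebRowSetSketch {

    // Each input row is {city, incidentCount}; the columns are illustrative.
    static String toWebRowSetXml(String[][] rows) {
        try {
            RowSetMetaDataImpl meta = new RowSetMetaDataImpl();
            meta.setColumnCount(2);
            meta.setColumnName(1, "CITY");
            meta.setColumnType(1, Types.VARCHAR);
            meta.setColumnName(2, "INCIDENTS");
            meta.setColumnType(2, Types.INTEGER);

            WebRowSet rowSet = RowSetProvider.newFactory().createWebRowSet();
            rowSet.setMetaData(meta);
            for (String[] row : rows) {
                rowSet.moveToInsertRow();
                rowSet.updateString(1, row[0]);
                rowSet.updateInt(2, Integer.parseInt(row[1]));
                rowSet.insertRow();
                rowSet.moveToCurrentRow();
            }

            // writeXml emits properties, metadata and data sections.
            StringWriter out = new StringWriter();
            rowSet.writeXml(out);
            return out.toString();
        } catch (SQLException e) {
            throw new IllegalStateException("Could not build WebRowSet XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toWebRowSetXml(new String[][] {{"Chicago", "12"}, {"Evanston", "3"}}));
    }
}
```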

Mashup Database Engine

  • Based on the Axion database engine, a small, fast, open-source relational database system (RDBMS) written in and for the Java programming language, supporting SQL and JDBC.
  • Support for db links and external tables, which enables the federated query capability
  • Support for extensibility through pluggable table types, functions, index types and data types.
  • Features wizards to extract HTML tables from web pages, create XML tables from RSS feeds etc., and thus fits well as a mashup infrastructure
  • Supports Data Quality functions
  • Supports normalization and standardization functions

EDMS Ecosystem

  • EDMS lives in the Open-ESB/Open-JBI-Components ecosystem and hence can leverage its functional and non-functional attributes
  • Leverages the XQuery SE to access XML data sources
  • DCOM BC to extract data from Office documents
  • CASA Editor for service composition
  • Aspect SE for injecting caching/logging etc. using the Facade/Composite Weaving patterns.
  • NetBeans infrastructure
  • Sun Web Developer pack to build mashup clients

Deployment

The Mashup SE has to be injected with aspects, so the service assembly has to be composed in the CASA Editor and deployed as a composite application. We intend to use the existing infrastructure from the Aspect SE and CASA as much as possible. Monitoring could be done from CAM.

Multi-Channel Deployment

Using the CASA Editor, it is possible to weave the Mashup SE and the XSLT SE together (Composite Weaving pattern) and create multiple service assemblies with different stylesheets for different device channels such as browsers and PDAs. This makes multi-channel deployment possible.
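
The effect of applying a different stylesheet per channel can be sketched with the standard javax.xml.transform API. In the sketch below the input XML is a simplified stand-in rather than the WebRowSet schema, both stylesheets are made up for illustration, and the transformation runs in-process instead of through the XSLT SE.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: one response, rendered per channel by a different stylesheet.
class MultiChannelSketch {

    // Simplified response document; NOT the WebRowSet schema.
    static final String ROWS =
            "<rows><row><city>Chicago</city><count>12</count></row></rows>";

    // Illustrative stylesheet for a browser channel (HTML list).
    static final String BROWSER_XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html' indent='no'/>"
            + "<xsl:template match='/rows'><ul><xsl:for-each select='row'>"
            + "<li><xsl:value-of select='city'/>: <xsl:value-of select='count'/></li>"
            + "</xsl:for-each></ul></xsl:template>"
            + "</xsl:stylesheet>";

    // Illustrative stylesheet for a constrained device channel (plain text).
    static final String TEXT_XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/rows'><xsl:for-each select='row'>"
            + "<xsl:value-of select='city'/>=<xsl:value-of select='count'/>;"
            + "</xsl:for-each></xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the given stylesheet to the XML and returns the rendered output.
    static String render(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(ROWS, BROWSER_XSL));
        System.out.println(render(ROWS, TEXT_XSL));
    }
}
```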

EDMS SDLC

  • Create tables in Axion
  • Create a project of type Enterprise Data Mashup
  • Use the Data Mashup Editor to browse tables, define join conditions and visualize the join view; this creates a .edm file
  • Use the project system to build the project, which generates mashup_engine.xml containing the DDLs and the view queries
  • Add the service unit to a composite application project to deploy the view as a service
  • Invoke the service to test the output

Enterprise software development is complex not so much because of the limitations of technology as because of the endless and conflicting demands of consumers. The search for the perfect solution has finally turned away from building an all-in-one solution and towards building infrastructure that can support a multitude of solutions limited only by the imagination of the consumers. Web 2.0 for the enterprise is an effort in that direction.

Jonathan Schwartz, Sun Microsystems CEO, describes this as an age in which -

an open and competitive network fuels growing opportunities for everyone — not simply to draw data or shift work around the world, but to participate, to create value and independence. If the Information Age was passive, the Participation Age is active.

Meg Withgott, Director of Engineering, Network Communities at Sun Microsystems says -

The human wish to communicate and create has deep roots, and new collaborative business models have tapped right into them. Millions of people are now creating content in an uncoordinated way that builds significant value. This is an emergent process, like the way termites or ants build without an architectural blueprint. These “ants” and their communities will in all likelihood tunnel around standard computing models looking for businesses that offer paths to participation.

Cool thoughts on Web 2.0, 3.0 ... Visualization Age

  • The browser on the desktop sucks. I want to rotate Google Earth and then see 3D surface plots instead of Google Maps and lines. HTML will be surpassed and complemented by VRML.
  • Search should be more intelligent and capture the context of the user. A chemist searching for a drug should not end up getting dogs in his search results.
  • Natural language processing. Why type? Just tell the computer in your native tongue and let it process it.
  • Content creation should include search indexing. The moment I save a document, it should publish itself to a search engine.
  • Priority-based services. Someone doing casual searches should wait longer than someone doing critical work.

Web 2.0: hype or reality?

Web 2.0 is more a reflection of an industry trend than the conceptualization of a new phenomenon. Hence, it is a reality. Tim O’Reilly gives a concrete set of axioms that characterize the Web 2.0 paradigm.

What is Web 2.0?

  • Web as a Platform: Shifting away from a proprietary operating system as the foundation, the new development model uses the web as a platform and open standards as a backbone to build services rather than packaged applications.
  • Harnessing Collective Intelligence: A good illustration is Amazon, which did not stop at displaying product information in the catalog but went on to collect rich user feedback as part of each item.
  • Data is the Next Intel Inside: Hal Varian remarked, “SQL is the new HTML”. Database management is a core competency of Web 2.0 companies. For the Googles and eBays to succeed, managing monstrous databases and keeping them up to date is clearly a bottom-line problem.
  • End of Software Life Cycle: Clearly the Googles and eBays don't release software but roll it into production often, in short cycles. Cal Henderson, lead developer of Flickr, revealed that they roll out every half hour. Such is the dynamism of the next-generation web.
  • Lightweight Programming Models: Think syndication, not coordination. Simply put, syndicate data outwards and don't worry about what happens at the receiving end.
  • Software Above the Level of a Single Device: The development of the web as a platform extends this idea to synthetic applications composed of services provided by multiple computers.
  • Rich User Experiences: Using Ajax frameworks to build interactive, desktop-like clients.
  • Architecture of Participation: Andrew McAfee's remark about emergence probably serves to explain the architecture of participation: “Another important difference is that Web 2.0 has accelerated the rate of emergence on the public Internet. I think of Web 2.0 tools and technologies as accomplishing two important goals: increasing the number of people who are contributing content (and the ease with which they can do it), and increasing the number of ways to let content creators (and consumers) interact with each other. These new interactions are the further mechanisms, beyond linking, for emergence -- for letting patterns and structure emerge from low-level behavior.”
  • Architectures through Emergence

Two themes emerge:

  • Software as a service ( SAAS )
  • User as a Co-Developer

While this is mostly discussed in the context of companies like Google and eBay, which deliver services, these principles can also be applied in the context of the enterprise.

Andrew McAfee, a Harvard academic, wrote: "Enterprise 2.0 is the use of emergent social software platforms within companies, or between companies and their partners or customers."

Emergence is what happens when the whole is smarter than the sum of its parts. It's what happens when you have a system of relatively simple-minded component parts -- often there are thousands or millions of them -- and they interact in relatively simple ways. And yet somehow out of all this interaction some higher level structure or intelligence appears, usually without any master planner calling the shots. These kinds of systems tend to evolve from the ground up.

Enterprise Trend

[Figure: Enterprise Mashup (Source: Dion Hinchcliffe Blog Entry -- http://web2.wsj2.com/)]

Enterprise 2.0 involves the following concepts

  • Decentralization of hierarchy through an Architecture of Participation
  • User as a co-developer

Web Table

This tutorial shows how to source a table from any web page and use that table in a data mashup.

This feature is particularly useful for creating a database table by extracting an HTML table from any web page (for example, a web catalog). In the tutorial, we show how to select the "Truck Light Boxes" catalog for "Light Accessories" from http://www.buyersproducts.com. The wizard lists the tables available on the web page. We search for the required table using HTML tag depth-based filtering and choose it. We also choose a field delimiter, a record delimiter and a text qualifier for parsing the extracted data into CSV format. We then make some modifications to the extracted data and finally use the modified data to create a new flat-file Axion table.

HTML table extraction is done on demand, which makes extracting tables from the web page much faster. The wizard can also clip an HTML table from any HTML file, whether on the local file system or at a URL. In addition, the wizard lets the user choose a file name for the new CSV file. If the table has a caption, the caption is used as the default file name.


NOTE: This feature works well for simple HTML tables without any nesting. The result is unpredictable if nested tables are used.
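
The extraction step can be approximated with the following sketch, which pulls the cells out of a simple, non-nested HTML table and writes them as delimited text using a field delimiter, record delimiter and text qualifier. It is not the wizard's implementation: the class and method names are hypothetical, the regular expressions only cope with plain tr/td/th markup, and HTML entities are not decoded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: simple HTML table -> delimited text. Nested tables are not handled.
class HtmlTableToCsvSketch {

    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern ROW = Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", FLAGS);
    private static final Pattern CELL = Pattern.compile("<t[dh]\\b[^>]*>(.*?)</t[dh]>", FLAGS);

    // Returns the text of every cell, row by row, with inner markup stripped.
    static List<List<String>> extractRows(String tableHtml) {
        List<List<String>> rows = new ArrayList<>();
        Matcher rowMatcher = ROW.matcher(tableHtml);
        while (rowMatcher.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cellMatcher = CELL.matcher(rowMatcher.group(1));
            while (cellMatcher.find()) {
                cells.add(cellMatcher.group(1).replaceAll("<[^>]+>", "").trim());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    // Joins rows into delimited text, qualifying cells that contain special characters.
    static String toCsv(List<List<String>> rows, String fieldDelimiter,
                        String recordDelimiter, char textQualifier) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    out.append(fieldDelimiter);
                }
                out.append(qualify(row.get(i), fieldDelimiter, recordDelimiter, textQualifier));
            }
            out.append(recordDelimiter);
        }
        return out.toString();
    }

    // Wraps a cell in the qualifier when needed and doubles embedded qualifiers.
    private static String qualify(String cell, String fieldDelimiter,
                                  String recordDelimiter, char textQualifier) {
        String q = String.valueOf(textQualifier);
        boolean needsQualifier = cell.contains(fieldDelimiter)
                || cell.contains(recordDelimiter)
                || cell.contains(q);
        return needsQualifier ? q + cell.replace(q, q + q) + q : cell;
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>Part</th><th>Price</th></tr>"
                + "<tr><td>Light Box, Large</td><td><b>19.99</b></td></tr></table>";
        System.out.print(toCsv(extractRows(html), ",", "\n", '"'));
    }
}
```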

Spreadsheet Table

This tutorial shows how to source a table from an .xls spreadsheet and use that table in a data mashup.

In this tutorial, we create a table for an .xls spreadsheet. The wizard offers a choice from the list of tables available in the spreadsheet, where each sheet in the spreadsheet is treated as a table. The example shows two tables for the chosen spreadsheet because it contains two sheets. We then create an Axion table for this spreadsheet.

Contributors: Ahimanikya Satapathy, Srinivasan Rengarajan, Karthik S

This page (revision-26) was last changed on 24-Mar-09 12:02 PM, -0700 by FrankKieviet