Why ETL just isn’t good enough for Reference Data Distribution
Warren Buckley, CTO PolarLake
ETL as a concept sounds very well suited to the world of Reference Data Distribution. Extraction, Transformation and Loading all sound like good things to do when distributing Reference Data, which is why so many ETL vendors show up at the usual Reference Data conferences. It all sounds logical. And lots of Financial Institutions are using ETL for Reference Data Distribution, usually because a license is already available and the tool is notionally “free” to use. So it should all be straightforward?
However, as the Reference Data Management discipline has matured, IT staff have discovered that real-world implementations differ dramatically from what seemed straightforward at the outset of an EDM implementation. Take, for example, a common problem in Reference Data Distribution. The Business comes to IT with a seemingly simple request: “I need a golden copy feed for a new trading system, same as the standard feed except five additional fields, and I need four existing fields overridden.”
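To make the shape of that request concrete, here is a minimal sketch in Python of a feed variant defined as "the standard feed plus extra fields plus overrides". Everything in it is an assumption for illustration: the field names, the `FeedVariant` class and the idea that an override points a field at an alternative source attribute are hypothetical, and the example uses fewer fields than the request above to stay short.

```python
from dataclasses import dataclass, field

# Hypothetical standard golden copy feed layout (illustrative field names only).
STANDARD_FEED_FIELDS = ["isin", "currency", "issuer", "maturity_date"]


@dataclass
class FeedVariant:
    """A downstream feed described as a delta against the standard feed."""
    name: str
    # Fields the consumer wants in addition to the standard layout.
    additional_fields: list[str] = field(default_factory=list)
    # Standard field -> alternative golden copy attribute to take its value from.
    overrides: dict[str, str] = field(default_factory=dict)

    def build_record(self, golden_record: dict) -> dict:
        """Project one golden copy record into this variant's layout."""
        wanted = STANDARD_FEED_FIELDS + self.additional_fields
        record = {name: golden_record.get(name) for name in wanted}
        for target, source in self.overrides.items():
            if target not in STANDARD_FEED_FIELDS:
                raise ValueError(f"cannot override non-standard field: {target}")
            record[target] = golden_record.get(source)
        return record


# Usage: a trading system that wants two extra fields and one override.
golden = {
    "isin": "XS0000000000",
    "currency": "EUR",
    "settlement_currency": "USD",
    "issuer": "Example Corp",
    "maturity_date": "2030-01-15",
    "coupon": 4.25,
    "day_count": "30/360",
}
trading_feed = FeedVariant(
    name="new_trading_system",
    additional_fields=["coupon", "day_count"],
    overrides={"currency": "settlement_currency"},
)
print(trading_feed.build_record(golden))
```

One variant like this is trivial. The trouble starts when every downstream system asks for its own delta and each one has to be maintained as a separate mapping.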
Sounds straightforward, right? Unfortunately that’s not the case. The above request, while appearing to be an exception, has become the rule. Everyone wants something just that little bit different when it comes to Reference Data. And this is where ETL falls down. A product designed for high-speed data synchronization of a small number of customer, product and supplier databases of similar dimensions, for all industries, is poorly suited to the complex world of Security Masters, Corporate Actions, SSIs, Counterparty data and so on. Add to that multiple feeds, asset classes and downstream systems, and the problem multiplies in complexity.
ETL vendors are not specialists in the Financial Services industry, so they assume the simplicity and static data structures of many other industries. ETL is great at shifting manufacturing data from SAP into a Business Objects data warehouse with static data structures that may never change. But we all know Reference Data in the Financial Services sector is all about change: new feeds, asset classes and downstream systems are the norm. The danger of ETL is that it demos very well and so gives everyone a false sense of security.
However, the ETL vendors’ simplistic view is tolerated less and less as the Reference Data Management discipline matures. When Reference Data IT staff see an ETL line-drawing demo in 2009, they tend to be more cynical and to push hard on how it will scale and stay manageable once exceptions become the rule. They have the experience of dealing with non-stop exceptions and change. They also have the unpleasant job of maintaining the existing mappings in ETL tools. The beautiful new golden copy data warehouse has been surrounded by an ugly new legacy: a spider’s web of integration mappings.
Add to that the limited support in ETL tools for request / reply processing, data subscriptions (pub / sub), real-time data delivery, message prioritization, data reconciliations and so on, and you can see why the assumed fit with the Reference Data world does not hold up in practice.
2010