Sunday, April 7, 2013

A Battle of ETL Tools: PowerCenter Informatica Versus Microsoft SSIS

Technology has made it easier for businesses to organize and manipulate data to get a clearer picture of what’s going on with their business. Notably, ETL tools have made managing huge amounts of data significantly easier and faster, boosting many organizations’ business intelligence operations.

There are many third-party vendors offering ETL solutions, but two of the most popular are PowerCenter Informatica and Microsoft SSIS (SQL Server Integration Services). Each technology has its advantages but there are also similarities on how they carry out the extract-transform-load processes and only differ in terminologies.

If you’re in the process of choosing ETL tools and PowerCenter Informatica and Microsoft SSIS made it to your shortlist, here is a short comparative discussion detailing the differences between the two, as well as their benefits.

Package Configuration

Most enterprise data integration projects would require the capacity to develop a solution in one platform and test and deploy it in a separate environment without having to manually change the established workflow. In order to achieve this seamless movement between two environments, your ETL technology should allow the dynamic update of the project’s properties using the content or a parameter file or configuration.

Both Informatica and SSIS support this functionality using different methodologies. In Informatica, every session can have more than one source and one or more destination connections. There are different kinds of connections the primary being relational connections. Every session can be set up dynamically by changing parameters contained in a parameter file.

The same thing can be achieved in SSIS via Configurations. Using the SSIS Configuration Wizard, configuration data is saved in XML files. Unlike Informatica wherein there can be multiple connections, SSIS only allows a single defined connection that can be applied across all tasks in a package.

In short, Informatica parameters are defined at the session level while SSIS configurations are set at the package level.

Data Staging

When you use SSIS, you will use the Connection Manager to generate a connection defining the physical location of the file. Multiple files that need to be loaded from multiple connections would require multiple connections. All information set in the connection manager can be incorporated in the Configuration File and can be dynamically updated during run-time.

On the other hand, if you’re using Informatica, you will use the Workflow Manager tool to assign a location to each file. Every task that needs to access that file can be configured with the location and name of that specific file.

Value Extraction

One of the main functions of ETL tools is being able to extract meaning from the information that is currently being ran or to supplement that information with extra information obtained from the current information in the data processing pipeline.

Both SSIS and Informatica have this functionality through the use of derived columns or the capacity to draw new information from existing data. Informatica does this via its expression transformation component while Microsoft SSIS does this via the Derived Column transformation.

The logic used to complete both operations is the same and the syntax involved is also identical. The difference between the two technologies lies in the expression language utilized to obtain the new data and the notation style involved. Informatica uses Character, Conversion, Data, Numerical, Scientific, Special and Test. On the other hand, SSIS uses Mathematical, String, Date/Time, NULL, Type Casts and Operators.

Sorting

Simply defined, sorting is having the ability to sort information into a chronological data set. While the order of the information may appear to be immaterial for loading into a relational data warehouse or database, it may matter for the other tasks later on in the transformation process.

The difference on how SSIS and Informatica carry out this functionality cannot be subtler. Informatica’s Sorter and SSIS’ Sort can both chronologically organize data and eliminate duplicate data. In SSIS, de-duplication can be done by setting the eliminate duplicates option to TRUE. For Informatica, this can be done by selecting the distinct option.


Detection of Data Quality Issues

Similar to all data integration solutions, ETL technologies can be susceptible to data quality problems. Fortunately for users of Microsoft SSIS, it allows for the creation of checkpoints within the data transformation process that can reveal and repair data quality problems. SSIS has a feature called Fuzzy Lookup transform that pairs incoming “dirty” information – unexpected abbreviations, null fields, inserted or missing tokens, truncations, misspellings and other data abnormalities – with clean records contained in a reference table. There is also the Fuzzy Grouping tool that finds similarities among input rows and unites duplicate data.

Unfortunately, Informatica does not have the same functionality out of the box. If you want to recreate this capability in Informatica, it would require human intervention to make a mapping table that contains every referrer value that came across the input stream and their equivalent mapping values.


However, take note that even if you’re using SSIS, you may still need to manually intervene to detect and repair data quality issues. Even the most advanced algorithm may miss something so you still need to manually check for the accuracy and integrity of your data.

Modularity

Modularity is concerned with the manner in which the work units that make up an end-to-end ETL solution are created and reused.

There’s a slight difference between PowerCenter Informatica and Microsoft SSIS on how they build modular data integration and ETL solutions.

Informatica involves a bottom-up framework to ETL implementation by permitting a library of components – mappings, mapplets, transformations, targets, and sources – that can be employed across numerous worklets in the solution. A worklet is composed of a sequence of mapping instances.

On the contrary, SSIS uses a top-down approach in wherein a general sequence of tasks is defined before setting the specifics on how these tasks are going to be carried out. Reusability of ETL components is achieved by creating libraries of packages which can then be implemented together with a master package. A package is the counterpart of Informatica’s worklet.

Tracking Changes in Slowly Changing Dimensions

Slowly changing dimensions address the issue of capturing and documenting a history of modifications or changes to entities within a database that are not reflected in a System of Record for that particular data. A common example of slowly changing dimensions is an item moving to another product category in a department store. This will modify the said product’s category attribute but its SKU will remain unchanged.

Informatica and SSIS both have the functionality to track these changes with very similar features. Informatica is equipped with a “slowly changing dimension wizard” that will allow you to create the sources, transformations, ports and destinations that are pertinent to accommodate these slowly changing requirements. SSIS also comes with a slowly changing dimension wizard that works similarly. Aside from the ability to keep track of slowly changing attributes, it can also recognize changes to attributes that are not supposed to change. These are known as fixed attributes. You have the option to enable the wizard to raise an error warning accordingly when this happens.

Dimension Loading

In terms of dimension loading, a surrogate key is necessary. As a substitute to a natural key, a surrogate key is where every join between fact tables and dimension tables are based.

Informatica and SSIS have varying ways on how they generate surrogate keys.

Out of the box, Informatica PowerCenter comes with a component dubbed as Sequence transformation that has the capacity to create a surrogate key. It produces an incremental value for every row in the pipeline which can then be incorporate into a destination table via a surrogate key column.

SSIS does not have a Sequence transformation component. Instead, it uses its Script transform component to generate surrogate keys.

Fact Table Loading

Populating fact tables usually involved two processes: (1) aggregating the data to the needed granularity and (2) retrieval of dimensional surrogate keys.

Informatica PowerCenter carries out these operations via a transformation dubbed as “Aggregator.” This component cuts across groupings of values from chosen input columns. SSIS has the same capability through the component called “Aggregate.” The slight difference between the two is that SSIS only has the most used functions such as Minimum and Maximum, Average, Count Distinct, Count and Sum. Informatica has all these and other extra capabilities such as Variance, Standard Deviation, Percentile, Median, First and Last.

Which ETL Technology is Right for Your Business?

There are other aspects that illustrate the difference between Informatica PowerCenter and Microsoft SSIS such as Design Time Debugging, Collaborative Development, Security, Integration with Packaged Applications, and Web Services & Service Oriented Architecture. However, the things discussed above cover the basic concepts in ETL technology.

As businesses encounter bigger challenges to synergize data from a constantly increasing number of different systems, your choice of an ETL solution to fit your needs is more crucial than ever.

As for the choice between Informatica PowerCenter and Microsoft SSIS, many analysts consider Informatica as the leader in ETL technology while reputable research firm Forrester firm once called SSIS’ price-to-performance ration “downright seductive.”

However, proclaiming a winner in this battle between two ETL technology giants would greatly depend on your business requirements. Of course, there are pricing differences between the two technologies and notable difference in their features, capabilities and differences as well as their level of usability. It’s for you to analyze which technology is the perfect fit.

23 comments:

  1. Some of this blog is obsolete with version 2012 of SSIS. Configuration files are deprecated and no longer fully supported - which is massive. They were a big pain and eliminated with the new job and package parameters.

    ReplyDelete
  2. Informatica Training |Informatica Online Training| Informatica ...
    www.21cssindia.com/courses/informatica-online-training-31.html
    Informatica Training , Informatica Online Training, Informatica Corporate Training, Expert Informatica Training, Expert Informatica online Training, Best ...
    Informatica Data Quality Online Training
    www.21cssindia.com/courses/informatica-idq-online-training-98.html
    Online training informatica data quality, online informatica data quality training, informatica data quality online training from India , informatica data quality course ...
    Informatica ILM Training| Informatica ILM Online Training ...
    www.21cssindia.com/courses/informatica-ilm-online-training-245.html
    Informatica ILM Online Training. Click Here For Enquiry. Introduction to ILM Data Archive. Describe Informatica's approach to data archive; Introduction to the ILM ...
    informatica mdm online training| informatica mdm training ...
    www.21cssindia.com/courses/informatica-mdm-online-training-99.html
    Online training informatica mdm, online informatica mdm training, informatica mdm online training, informatica mdm training, informatica mdm enquiry, ...

    ReplyDelete
  3. Thanks for sharing this information. I would like to say:
    The ‘ETL’ shortcut comes from 'Extract, transform, and load' – the words that describe the idea of the system. The ETL tools were created to improve and facilitate data warehousing.
    The Etl process consists of the following steps:
    Initiation
    Build reference data
    Extract from sources
    Validate
    Transform
    Load into stages tables
    Audit reports
    Publish
    Archive
    Clean up
    Sometimes those steps are supervised and performed indirectly but its very time-consuming and may be not so accurate.
    The purpose of using ETL Tools is to save the time and make the whole process more reliable.
    https://intellipaat.com/informatica-training/

    ReplyDelete
  4. It was very nice article and it is very useful to Testing tools learners.We also provide Testing tools online training

    ReplyDelete
  5. Thanks for sharing such useful information on the blog and refer the link Android Training in Chennai

    ReplyDelete
  6. This is the information that I was looking for.. Thanks for the efforts you put to gather such a nice content and posted here.
    Informatica training | Microsoft Dynamics CRM Training

    ReplyDelete
  7. This comment has been removed by the author.

    ReplyDelete
  8. Informatica PowerCenter is a more powerful and robust tool than SSIS and most of the competing ETL tools on the market, 97% of Fortune 100 use Informatica PowerCenter in their business intelligence services and data warehouse teams. Our organization offers, Informatica Consulting, Informatica training and support for large companies in the US, UK, Scandinavia and Europe. Our business intelligence services are offered both onsite or via a licensed virtual environment. For more information, contact us sales@existbi.com or website is: www.existbi.com. Best regards, Big Data Consulting Director, Max T. Russ

    ReplyDelete
    Replies
    1. I would like to share one of my opinions. To transform any data into any database fast and easily ETL tools are required. There are so many useful tools. People from all over the word can use whenever they need to transform any data into any database faster. So, I think ETL tools is the best tools ever in the word.
      http://www.etl-tools.com

      Delete
  9. nformatica PowerCenter is a more powerful and robust tool than SSIS and most of the competing ETL tools on the market, 97% of Fortune 100 use Informatica PowerCenter in their business intelligence services and data warehouse teams. Our organization offers, Informatica Consulting, Informatica training and support for large companies in the US, UK, Scandinavia and Europe. Our business intelligence services are offered both onsite or via a licensed virtual environment. For more information, contact us sales@existbi.com or website is: www.existbi.com. Best regards, Big Data Consulting Director, Max T. Russ

    ReplyDelete
  10. You have stated definite points about the technology that is discussed above. The content published here derives a valuable inspiration to technology geeks like me. Moreover you are running a great blog. Many thanks for sharing this in here.

    Salesforce Training
    Salesforce training in chennai
    Salesforce training institutes in chennai

    ReplyDelete
  11. Truly enjoyed surfing around your weblog posts.If u r interested visit
    http://www.tekclasses.com/

    ReplyDelete
  12. Excellent blog you’ve got here..

    http://www.tekclasses.com/

    ReplyDelete
  13. Its a very good site to visit.

    http://www.tekclasses.com/

    ReplyDelete
  14. Nice thanks for sharing with us

    visit- http://tekclasses.in/

    ReplyDelete
  15. too good piece of information, I had come to know about your site from my friend sajid, bangalore,i have read atleast 7 posts of yours by now, and let me tell you, your web-page gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new post, once again hats off to you! Thanks a lot once again, Regards, informatica mdm training in hyderabad

    ReplyDelete
  16. nice posts..
    Informatica training, in the recent times has acquired a wide scope of popularity amongst the youngsters at the forefront of their career.
    Informatica online training in hyderabad


    ReplyDelete
  17. Informatica MDM uses a flexible business model-driven MDM approach to address the enterprise’s unique MDM business requirements. Informatica MDM training at TekSlate is well taught with real time scenarios.

    ReplyDelete
  18. Thank you for sharing such a nice and interesting blog with us. i have seen that all will say the same thing repeatedly. But in your blog, I had a chance to get some useful and unique information
    We at Colan Infotech Private Limited best web design company in chennai,is Situated in US and India, will provide you best service in
    Professional Website Design Services and Colan Infotech has a group of exceedingly dedicated,
    inventive and creative experts with an energy for delivering exciting , helpful and stylish Web and Mobile Applications,
    We provide all sort of web designing services in chennai and
    of course we stepped in bangalore too we are best website designers in bangalore can provide web design services in bangalore, hire web designer india

    ReplyDelete

  19. This is excellent information. It is amazing and wonderful to visit your site.Thanks for sharng this information,this is useful to me...
    Android training in chennai
    Ios training in chennai

    ReplyDelete
  20. Thanks for posting useful information.You have provided an nice article, Thank you very much for this one. And i hope this will be useful for many people.. and i am waiting for your next post keep on updating these kinds of knowledgeable things...Really it was an awesome article...very interesting to read..please sharing like this information......
    Web Design Development Company
    Mobile App Development Company


    ReplyDelete