- Assess whether the information can be used to meet the
needs of the organization.
- Assess the data
quality
- Check the potential for using the data in a new or
different application.
- Assess the risks that any kind of data processing,
storage, or management might have on the data.
- Have an overall view of managing and governing data
for the warehouse.
If you stay in
the ETL business, you will find that data profiling often becomes the backbone
of your work in planning, designing, implementing, and managing a data
warehouse. In order to guide you in your projects, you can follow these top
four data profiling best practices, all very helpful tips in this line of work.
They can be applied to your circumstances, too.
Data Profiling Best Practice 1: Do it before data
warehousing
Do data
profiling before data warehousing. This is probably the first and foremost data
profiling best practice that you should know. When you find that your company
or organization needs to set up a data warehouse, then your first order of
business is to profile the information that you now have, before embarking on
any of the extract-transform-load processes. This first step allows you to look
long and hard objectively into the information that you have and to see what
its quality is. This first data profiling step provides metadata that you will
usually need when designing the rest of your data warehouse. This data
profiling best practice will ensure that your ETL best practices are as
efficient as they can be. In most cases, a data profiling done before the
actual data warehousing project, will even tell you if the project is worth
pursuing—saving you thousands of dollars in manpower hours.
Data Profiling Best Practice 2: Know the full array of
tools
It is
necessary for you to know the full array of tools that you can use to create
your data profiling. Best practice is to get an idea of what programs or
platforms are in the market then compare their functionality, price points, and
usage so that you will have an idea of which tool is best suited for your data
profiling purposes. Some tools available in the market are DataFlux, Trillium, and
Informatica and even the basic data profiling tools that come with SQL Server.
Currently, some providers have also released automated profiling tools. Most
ETL experts agree that the last option is already suitable for basic, light
data profiling work. For the more advanced kinds of projects, however, a tool
dedicated or mainly designed for data profiling work is best. Aside from the
kind of work, settling on a data profiling tool should, of course, be based on
the level of standardization that you want. What you should do as a data
profiling best practice is to get on to the Internet and faithfully do your
research about each of these tools and read user/programmer feedback. Remember
that choosing a data profiling tool really is a commitment to a kind of
standard and style for data profiling.
Data Profiling Best Practice 3: Combine data profiling with
cross-checking
If you really
want to get the most out of your data profiles, you need to remember that the
information you’ll be working with does not only come from inside, but must
also be validated by outside sources. That means that you have to cross-check
the facts that you have from data profiling with the feedback and other
information that you are getting from other sources, which may not be so easy
to tabulate and input in a data warehouse. These outside ways of checking data
profiles may come from customer feedback, simple questionnaires for your
stakeholders, trends relating or corresponding to the items in your data
profile, or maybe a general survey of what your industry benchmarks are. This
data profiling best practice can help you check the results of your inside
work, as well as possibly point out errors or open up new ways of looking at
your available data—all for the better understanding of information about your
company.
Data Profiling Best Practice 4: Review segmentation
As a last data
profiling best practice, you must continue running this process even after you
have designed your data warehouse. Profiling data regularly can help you look
for new ways of interpreting the data that you have, which usually brings a
fresh insight into how the company or organization is doing, revealing changing
trends and sounding off on any need to innovate or adapt so that the company
will perform better. More than just regular, periodic data profiling, however,
you also have to regularly review the way you segment the information for your
data profiling work. This means that you have to cut through your data in new
ways so that you get fresh insights into the same information. This will help
you broaden or focus your data segmentation so that the metadata will give you
either a widened or a specified look into your market and your business
transactions. Sometimes, adjusting your data profile segmentation can mean just
doing the data profile on a relevant part of information within your data
warehouse. This is done so that you can be more efficient at studying the
different related information or information chains. Data profiling best
practices will make the data identification process faster and will yield more
information about the more important aspects of your matrix of data.