Data Warehouse or Data Lake?

by Terence Bennett • November 20, 2019

There are essentially two paths to strategic data storage. The path you choose before you bring in the data will determine what's possible in your future. Although your company's objectives and resources will normally suggest the most reasonable path, it's important to establish a good working knowledge of both paths now, especially as new technologies and capabilities gain wider acceptance.

We'll name these paths for their destinations: The Warehouse or the Lake. As you stand here at the fork in the data road considering which way to go, we've assembled a key to what these paths represent and a map to what could be waiting at the end of each road.

The Warehouse

This well-worn path leads to a massive database ready for analysis. It's characterized by the Extract-Transform-Load (ETL) data process. This is the preferred option for rapid access to and analysis of data, and it is also the only option for highly regulated industries where certain types of private customer data must be masked or tightly controlled.

Data transformation prior to loading is the key here. In the past, the transformation piece or even the entire ETL process would have to be hand-coded by developers, but it’s more common now for businesses to deploy pre-built server-based solutions or cloud-based platforms with graphical interfaces that provide more control for process managers. Transformation improves the quality and reliability of the information through data cleansing or scrubbing, removing duplicates, record fragments, and syntax errors.
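As a rough illustration of transformation before loading, here is a minimal Python sketch of the cleansing steps described above. Everything in it is an assumption made for the example: the field names (`customer_id`, `email`, `ssn`), the `transform`, `mask`, and `load` helpers, and the in-memory SQLite database standing in for a warehouse. The hash-based masking is illustrative only and is not a substitute for a real masking or tokenization scheme.

```python
import hashlib
import sqlite3

# Hypothetical schema: fields every complete record must carry.
REQUIRED_FIELDS = ("customer_id", "email", "ssn")


def mask(value: str) -> str:
    """Replace a sensitive value with a short one-way hash (illustrative only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def transform(records):
    """Cleanse raw records before loading: drop fragments, reject bad syntax,
    remove duplicates, and mask the sensitive field."""
    seen = set()
    cleaned = []
    for rec in records:
        # Record fragment: a required field is missing or empty.
        if any(not rec.get(field) for field in REQUIRED_FIELDS):
            continue
        email = rec["email"].strip().lower()
        # Very rough syntax check: exactly one "@" in the address.
        if email.count("@") != 1:
            continue
        # Duplicate: this customer_id has already been kept.
        if rec["customer_id"] in seen:
            continue
        seen.add(rec["customer_id"])
        cleaned.append((rec["customer_id"], email, mask(rec["ssn"])))
    return cleaned


def load(rows, conn):
    """Load already-transformed rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id TEXT PRIMARY KEY, email TEXT, ssn_masked TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


raw = [
    {"customer_id": "c1", "email": " Ann@Example.com ", "ssn": "111-22-3333"},
    {"customer_id": "c1", "email": "ann@example.com", "ssn": "111-22-3333"},     # duplicate
    {"customer_id": "c2", "email": "bob-at-example.com", "ssn": "222-33-4444"},  # syntax error
    {"customer_id": "c3", "email": "cy@example.com"},                            # fragment
    {"customer_id": "c4", "email": "dee@example.com", "ssn": "333-44-5555"},
]

warehouse = sqlite3.connect(":memory:")
rows = transform(raw)
load(rows, warehouse)
```

Because the cleansing and masking happen before `load` is called, the raw sensitive values never reach the warehouse table in this sketch, which is the property regulated industries tend to care about.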

The Lake

This new path has only recently begun to open up for wider use thanks to the massive storage and processing power of cloud providers. Raw, unstructured, incompatible data streams of all types can pool together for maximum flexibility in handling that data at a later point. It is characterized by the Extract-Load-Transform (ELT) data process.

The delay in transformation can afford your team a much wider scope of possibilities in terms of data mining. Data mining introduces many of the tools at the edge of artificial intelligence, such as unsupervised learning algorithms, neural networks, and natural language processing (NLP), to serendipitously discover new insights hidden in unstructured data. At the same time, securing the talent and the software you need to refine raw data into information using the ELT process can still be a challenge. That is beginning to change as ELT becomes better understood and cloud providers make the process more affordable.
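The load-first, transform-later order can be sketched in a few lines of Python. This is only a toy under stated assumptions: an in-memory SQLite table named `raw_events` stands in for the lake, and the `load_raw` and `transform_on_read` helpers, the source names, and the `amount` field are all hypothetical. Real lakes typically sit on cloud object storage with far richer tooling.

```python
import json
import sqlite3


def load_raw(conn, source, payloads):
    """Load step: store payloads exactly as received, with no cleansing."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (source TEXT, payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?)",
        [(source, payload) for payload in payloads],
    )
    conn.commit()


def transform_on_read(conn):
    """Transform step, run later: total the 'amount' field per source,
    skipping payloads that cannot be interpreted for this question."""
    totals = {}
    for source, payload in conn.execute("SELECT source, payload FROM raw_events"):
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # unparseable rows stay in the lake untouched
        if not isinstance(event, dict):
            continue
        amount = event.get("amount")
        if isinstance(amount, (int, float)):
            totals[source] = totals.get(source, 0) + amount
    return totals


lake = sqlite3.connect(":memory:")
load_raw(lake, "web", ['{"amount": 10}', '{"amount": 5.5, "note": "promo"}', "not json"])
load_raw(lake, "pos", ['{"amount": 20}', '{"clicks": 3}'])

totals = transform_on_read(lake)
```

Nothing is discarded at load time, so a different question later (for example, counting `clicks`) only needs a new transform over the same raw rows.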

Choosing the Right Path

To go deeper into all of these terms and strategies, consult our friends over at Integrate.io: ETL vs ELT: Top Differences. You'll find a nuts-and-bolts discussion and supporting illustrations that compare the two approaches in categories such as "Costs," "Availability of tools and experts," and "Hardware requirements." The most important takeaway is that the way we handle data is evolving along with the velocity and volume of what is available. Making the best call early on will have significant repercussions across both your market strategy and financial performance in the end.

Written by Luke Marshall

Check out our other blogs to learn more about data management options: Data Marts and Data Mesh. If you'd like to learn more about Integrate.io or DreamFactory, be sure to book a time here.

Terence Bennett

Terence Bennett, CEO of DreamFactory, has a wealth of experience in government IT systems and Google Cloud. His impressive background includes being a former U.S. Navy Intelligence Officer and a former member of Google's Red Team. Prior to becoming CEO, he served as COO at DreamFactory Software.