A data lake is usually defined as 'a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data.' A data lake is especially useful for data science and investigative analytics. But does it really have to be a physical repository of data? Isn't it sufficient that users can reach all the data through a single system? In other words, why not a virtual data lake? Data virtualization servers are mature enough to be used to develop data lakes, and a virtual approach would avoid copying massive amounts of big data from their sources to the data lake. But what is the difference between a virtual data lake and a logical data warehouse? They are really two sides of the same coin. This tutorial presents one integrated architecture that covers both concepts.
Target Audience: BI specialists and DW designers looking to learn the pros and cons of the logical data lake and logical data warehouse; data scientists, data analysts, and business analysts; technology planners and architects; database developers and administrators.
Prerequisites: Some general knowledge of data warehousing and business intelligence.