Moving the work of data scientists from their analytics notebooks to production is often a slow, manual and error-prone process.
In our talk we present how the principles of Continuous Integration and Continuous Delivery can be applied to data and machine learning applications. We will describe how to manage a development environment consisting of automated pipelines for code and data, so that the work of your rare and expensive data scientists can be moved into your production environment continuously, reliably and at scale. We call this Continuous Intelligence. We will illustrate it with our experience from several customer projects and some live demonstrations.
Target Audience: Architects, Developers, BI Engineers, Data Scientists, Machine Learning Engineers
Prerequisites: Basic understanding of software development processes. Basic understanding of how data scientists work.
Data scientists working on a company's big data usually work in their own environment, consisting of data sources, statistical modeling languages, machine learning frameworks and personal analytics notebooks. Moving their hand-crafted and optimized data models and algorithms into production usually isn't their business. Data engineers, developers and operations engineers take over to deploy the work of the data scientists, sometimes reimplementing everything. And when data scientists update their models with new data, the whole process starts again from the beginning. This takes a long time and is cumbersome and error-prone.
At ThoughtWorks, we're pioneers in Continuous Delivery, where tools and processes ensure that software under development can be reliably released at any time and with high frequency. In our talk we present how the principles of Continuous Integration (CI) and Continuous Delivery (CD) can be applied to data and machine learning applications. We will describe how CI/CD tools like GoCD and data versioning tools like DVC (Data Version Control) can work together to manage a development environment consisting of automated pipelines for code and data. With this technique, the valuable work of your rare and expensive data scientists can be moved continuously into your production environment and deployed at any time, reliably and at scale. We call this Continuous Intelligence.
In this talk, we will introduce you to the principles of Continuous Intelligence. We will demonstrate how the principles work in practice with some live demonstrations and illustrate the talk with our experience in real customer projects.