Using Docker to Reproduce Environments
While most of this website focuses on coding-specific content, there are other technologies that can greatly increase your effectiveness. One of these is Docker, which has a variety of use cases for data scientists and quants.
What is Docker?
One of the toughest challenges when you begin to work on projects with multiple people is that not everyone has the same setup. Forget having the same versions of packages installed; they may not even have the same operating system. Docker is free software that essentially allows you to create an isolated system within your system, which is referred to as a container. A container is built from a file that describes its contents, so the same file produces the same container on different operating systems.
An Example Use Case
To make this idea more concrete, let’s think of an example. Say you are building a website with Flask, but you and the friend working on it with you have different library versions and different operating systems. To make sure you are always running the same website, you could create a Docker container that installs Python and then installs specified versions of the libraries. Then, when you test the website inside that container, you can be confident that the results won’t differ between the two computers.
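To see the kind of mismatch a container is meant to remove, here is a minimal sketch that compares the library versions installed on a machine against a set of agreed versions. The pinned version and the function name are assumptions made for illustration, not something prescribed by Docker or Flask.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical pin for illustration; use whatever versions your project agrees on.
PINNED = {"flask": "3.0.0"}


def find_version_drift(pinned):
    """Return {package: (wanted, installed)} for every package that does not match its pin."""
    drift = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            drift[name] = (wanted, installed)
    return drift


if __name__ == "__main__":
    for name, (wanted, installed) in find_version_drift(PINNED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

If you and your friend run this on your own machines and get different output, that is exactly the situation where running the site inside one shared container definition helps.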
How to Begin
Creating a container and covering the full functionality of Docker would be a bit too much for this first post on the topic, so I will refer you to this link: https://www.docker.com/blog/containerized-python-development-part-1/. Essentially, by following the directions there you can specify which libraries to install and create a container for your purposes.
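As a rough idea of what "specifying which libraries to install" looks like, the sketch below writes a requirements file and a small Dockerfile from Python. The base image tag, the pinned Flask version, and the file name app.py are all assumptions picked for illustration; the linked guide remains the place to follow for a real setup.

```python
from pathlib import Path

# Hypothetical pin, base image, and file names, chosen only for illustration.
REQUIREMENTS = ["flask==3.0.0"]
BASE_IMAGE = "python:3.11-slim"


def render_dockerfile(base_image=BASE_IMAGE, app_file="app.py"):
    """Return the text of a small Dockerfile that installs pinned libraries and runs one script."""
    lines = [
        f"FROM {base_image}",                                   # start from an image with Python installed
        "WORKDIR /app",                                         # work inside /app in the container
        "COPY requirements.txt .",                              # copy the list of pinned libraries
        "RUN pip install --no-cache-dir -r requirements.txt",   # install exactly those versions
        f"COPY {app_file} .",                                   # copy the application code
        f'CMD ["python", "{app_file}"]',                        # command run when the container starts
    ]
    return "\n".join(lines) + "\n"


def write_build_files(target_dir="."):
    """Write requirements.txt and Dockerfile into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    (target / "requirements.txt").write_text("\n".join(REQUIREMENTS) + "\n")
    (target / "Dockerfile").write_text(render_dockerfile())


if __name__ == "__main__":
    write_build_files()
```

With those two files next to your app.py, running `docker build -t flask-site .` should build the image and `docker run -p 5000:5000 flask-site` should start it, where flask-site is just a made-up image name. The port mapping assumes the app listens on port 5000 and binds to 0.0.0.0 inside the container.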
What are some extensions?
Besides being a great way to ensure you are always running code the same way, Docker can be extended in many directions. For example, it can be used to parallelize tasks (discussed in a later post), so that if you have a queue of work to be done, it can be split among containers. Coordinating many containers like this is usually handled by an orchestration tool such as Kubernetes. Another extension is that you can deploy containers on many cloud services. For example, Azure allows you to deploy a container to run a website or to run computation in the cloud. If you find you don’t have enough memory or computing power, you can use containers in the cloud to take some of the load off your local machine and potentially speed up your work by deploying multiple containers.
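The idea of splitting a queue of work among containers can be pictured with a small local stand-in. In the sketch below each worker thread plays the role of one container, and the process function is a hypothetical placeholder for the real job; an actual deployment would hand this coordination to an orchestrator rather than to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def process(task):
    """Placeholder for the real work one container would do on one queue item."""
    return task * task


def run_queue(tasks, workers=4):
    """Spread the tasks over a fixed pool of workers and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, tasks))


if __name__ == "__main__":
    print(run_queue(range(10)))
```

The shape is the same in the container version: a list of independent tasks, a fixed number of workers, and each worker pulling the next item when it is free.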