The Problem

Python is a very reasonable "default" language of choice for a variety of tasks. At least on desktop platforms, it makes sense to start with Python and see how it goes. Unfortunately, along its way to popularity it lost much of the simplicity that made it fun in the first place. An especially egregious aspect of Python is its poor backward compatibility. In practice it means that your codebase will only reliably work with a specific version of Python interpreter.

Even worse, the same holds for many popular libraries. Some Python codebase will only work with a certain library version 5.0.0 or lower, while another codebase requires 5.0.1 or higher. Thus, working with a single combination of Python interpreter and a bunch of libraries is simply not an option beyond the most basic experiments.

Working with third-party code is also not straightforward. Python has an official package repository, where any reasonable non-standard library is supposed to be present. Thus, code snippets found online often make no distinction between the standard library and third-party packages. For example, the following snippet will throw an exception on a freshly installed Python:

import json
import dotmap

print("hello, world")

The problem is simple: json is a standard module, while dotmap is not, so it is supposed to be installed in advance. It is easy to install dotmap by typing

pip install dotmap

However, it simply installs the most recent version of dotmap globally for the current Python interpreter, which might cause issues with other codebases, requiring a certain different version of dotmap.

Summing up, there are several separate issues here:

  1. We should be able to ensure the right version of Python interpreter for our code.

  2. We should be able to ensure the right combination of versions of third-party libraries and their presence in the system.

  3. We should be able to isolate different setups from each other.

In other words, the basic problem is to recreate a Python environment, where our system is being developed, on another machine.

Using Poetry

The issues described above are well known in Python world, and there are standard or semi-standard tools for dealing with them. However, they typically solve individual aspects of the problem rather than tackling it as a whole. I believe that nearly every Python project needs to deal with the full pack, and Poetry comes close to being an "all-purpose" solution.

Poetry can be installed via command line as follows:

# Bash
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -

# Windows PowerShell
(Invoke-WebRequest -Uri https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py -UseBasicParsing).Content | python -

Adding Poetry to Python Code

As discussed above, it is almost always desirable to make a local Python environment reproducible, so it makes sense to include Poetry configuration into the project from the very start. However, it can also be done any time later. To do it, navigate to the project directory and run

poetry init

This command simply asks a few questions in an interactive mode and generates a plain text file named pyproject.toml. It can also be created manually without Poetry, but the shell tool ensures that all the required elements are present.

The most important sections are [tool.poetry.dependencies] listing both the main project dependencies and the required Python version, and [tool.poetry.dev-dependencies], responsible for the development dependencies. The command line tool asks about dependencies, but they can be also added any time later.

When adding dependencies, it might be important to set constraints to their version numbers. By default, Poetry suggests the most recent version from PyPi, but it is also possible to add a library not listed on PyPi (such as a Git repository or a binary file). Dependency specification syntax is quite elaborate, and requires understanding of the semantic versioning system. I would say that in most cases the default option would work fine.

Adding a new main dependency is easy:

poetry add <name1> [... <nameN>]

A development dependency can be added in a similar manner:

poetry add --dev <name1> [... <nameN>]

"Development" dependencies are assumed to be necessary for project development only. In practice it means we can skip their installation by instructing the system to do so.

Removing dependencies is done similarly via poetry remove command.

One may ask how to find out the list of dependencies for an existing project. Suppose we already have a Python environment with a number of installed libraries, and use it for some of our projects. How do we know which libraries are needed for any given project? Tools like pipreqs may help, but they are not 100% reliable, so, in general, some manual work is needed.

Handling Virtual Environments

There is one major difference between adding a dependency manually and using poetry add: the command-line tool immediately installs the dependency. Since dependencies are installed into an isolated project-specific Python environment, we have to discuss these environments first.

It is important to note that Poetry will not install Python interpreters specified in project requirements. It is expected that the required Python version is already present in the system.

Thus, for example, running

poetry add dotmap

will fail if Python 3.6 is not installed, while pyproject.toml has a dependency

python = "3.6"

Calling any command that installs/updates/removes project dependencies forces Poetry to modify the isolated virtual environment, associated with the current project. If this environment doesn’t exist yet, it will be created.

By default, Poetry uses the Python version found in PATH. However, suppose that we need to use some other Python, present in the system. In this case, we have to trigger the creation of a virtual environment prior to poetry add (or poetry install) calls.

The most straightforward way to do it is to run poetry env use with a full path to the required Python version:

poetry env use C:\Python3.6\python.exe

Poetry remembers this setup, and will use Python 3.6 every time we run poetry add and similar commands from the project directory.

Technically, a virtual environment is just a directory where all environment-specific settings and libraries are stored. If it breaks for some reason, we can simply delete this directory and call poetry install or poetry env use to recreate it.

By default, all virtual environment directories are kept in the user home location. It is also possible to instruct Poetry to create virtual environments right inside the respective projects by calling

poetry config virtualenvs.in-project true

I think this setup is actually more convenient. If I want to delete a project, it will delete the corresponding virtual environment as well. The only real downside I see for now is cluttering of the project directory. When I search for a file inside a project, I have to exclude .venv every time, which is annoying.

Installing Dependencies

Having a pyproject.toml file inside the project is enough to be able to run

poetry install [--no-dev]

and get all the necessary dependencies installed inside the corresponding virtual environment. As mentioned above, Poetry will also automatically create the environment if it is not present (alternatively, use poetry env use to choose the right Python version). Thus, this is the minimal working setup for preparing the system for an existing project: navigate to the project directory on the disk and run poetry install. The --no-dev option instructs Poetry to skip development dependencies.

Poetry authors recommend to make one extra step, though. When Poetry installs dependencies, it writes down their exact version numbers into a file called poetry.lock. If we keep it as a part of the project (i.e., add it to version control), Poetry will make sure to install these specific versions during poetry install.

Let’s consider an example. Suppose we have a project that depends on the library dotmap. We add it with a command

poetry add dotmap@1.*

Poetry actually installs dotmap version 1.3.24 (the most recent version at the time of writing) and writes down this information into poetry.lock. Suppose that after some time I pull this project from the repository on another machine without poetry.lock. If I run poetry install, it will install the most recent dotmap version matching the mask 1.*. However, if I keep poetry.lock as well, dotmap version 1.3.24 will be installed.

So, to prevent possible issues it is recommended to place poetry.lock under version control and update the libraries listed there manually. This is done by calling

poetry update [--no-dev]

The effect of this command is equivalent to deleting poetry.lock and running poetry install.

Running Project Code

The explicit way to run code inside a project-specific virtual environment is to use poetry run command:

poetry run python main.py

Basically, poetry run executes any given command inside the project’s environment. It is also possible to open a command line shell to work with environment’s tools without having to prefix them with poetry run all the time:

poetry shell

For now, I prefer the "explicit" version. It feels a bit overly verbose to type poetry run python instead of python, but in practice it is all done inside batch files or Visual Studio Code. Also I tend to think that explicit is good in this case.