The Problem
Python is a very reasonable "default" language for a variety of tasks. At least on desktop platforms, it makes sense to start with Python and see how it goes. Unfortunately, on its way to popularity it lost much of the simplicity that made it fun in the first place. An especially egregious aspect of Python is its poor backward compatibility. In practice, this means that your codebase will only reliably work with a specific version of the Python interpreter.
Even worse, the same holds for many popular libraries. One Python codebase may only work with version 5.0.0 or lower of a certain library, while another codebase requires 5.0.1 or higher. Thus, working with a single combination of a Python interpreter and a set of libraries is simply not an option beyond the most basic experiments.
Working with third-party code is also not straightforward. Python has an official package repository (PyPI), where any reasonable non-standard library is supposed to be present. As a result, code snippets found online often make no distinction between the standard library and third-party packages. For example, the following snippet raises a ModuleNotFoundError on a freshly installed Python:
import json
import dotmap
print("hello, world")
The problem is simple: json is a standard module, while dotmap is not, so it is supposed to be installed in advance. It is easy to install dotmap by typing
pip install dotmap
However, this simply installs the most recent version of dotmap globally for the current Python interpreter, which might cause issues with other codebases that require a different version of dotmap.
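As a minimal sketch of the distinction, the check below asks the current interpreter which of the imported names it can actually locate. The helper name missing_modules is my own and is used only for illustration; it relies on the standard importlib.util.find_spec call.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # json ships with Python; dotmap is found only if it has been installed.
    print(missing_modules(["json", "dotmap"]))
```

On a fresh installation the list is expected to contain dotmap, and it becomes empty once the package is installed for that interpreter.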
Summing up, there are several separate issues here:
We should be able to ensure the right version of the Python interpreter for our code.
We should be able to ensure the right combination of versions of third-party libraries and their presence in the system.
We should be able to isolate different setups from each other.
In other words, the basic problem is to recreate, on another machine, the Python environment in which our system is being developed.
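To make "environment" a bit more concrete, here is a rough sketch that records what would have to be recreated: the interpreter version and the installed distributions. The function name environment_snapshot is hypothetical; the code uses only the standard importlib.metadata module.

```python
import sys
from importlib import metadata


def environment_snapshot():
    """Collect the interpreter version and the installed distributions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            packages[name] = version
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    snapshot = environment_snapshot()
    print("Python", snapshot["python"])
    for name, version in snapshot["packages"].items():
        print(f"{name}=={version}")
```

Such a listing only describes one environment; it does not isolate it or rebuild it elsewhere, which is the part the tools discussed next take care of.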
Using Poetry
The issues described above are well known in the Python world, and there are standard or semi-standard tools for dealing with them. However, they typically solve individual aspects of the problem rather than tackling it as a whole. I believe that nearly every Python project needs to deal with all of these issues at once, and Poetry comes close to being an "all-purpose" solution.
Poetry can be installed via command line as follows:
# Bash
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
# Windows PowerShell
(Invoke-WebRequest -Uri https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py -UseBasicParsing).Content | python -
Adding Poetry to Python Code
As discussed above, it is almost always desirable to make a local Python environment reproducible, so it makes sense to include a Poetry configuration in the project from the very start. However, it can also be added at any later time. To do it, navigate to the project directory and run
poetry init
This command simply asks a few questions in an interactive mode and generates a plain text file named pyproject.toml. It can also be created manually without Poetry, but the shell tool ensures that all the required elements are present.
The most important sections are [tool.poetry.dependencies], listing both the main project dependencies and the required Python version, and [tool.poetry.dev-dependencies], responsible for the development dependencies. The command line tool asks about dependencies, but they can also be added any time later.
When adding dependencies, it might be important to set constraints on their version numbers. By default, Poetry suggests the most recent version from PyPI, but it is also possible to add a library not listed on PyPI (such as a Git repository or a binary file). The dependency specification syntax is quite elaborate and requires an understanding of the semantic versioning system. I would say that in most cases the default option works fine.
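As one example of that syntax, a caret constraint such as ^1.3.24 allows updates that keep the leftmost non-zero version component unchanged. The sketch below reflects my reading of that rule for full MAJOR.MINOR.PATCH versions only; caret_range is a hypothetical helper, not a part of Poetry.

```python
def caret_range(version):
    """Return (lower, upper) bounds implied by a caret constraint.

    Simplified: handles only full MAJOR.MINOR.PATCH versions without suffixes.
    The lower bound is inclusive, the upper bound is exclusive.
    """
    parts = [int(p) for p in version.split(".")]
    if len(parts) != 3:
        raise ValueError("this sketch only handles MAJOR.MINOR.PATCH versions")
    # Find the leftmost non-zero component (or the last one if all are zero).
    index = next((i for i, part in enumerate(parts) if part != 0), 2)
    upper = parts[:index] + [parts[index] + 1] + [0] * (2 - index)
    return version, ".".join(str(p) for p in upper)


if __name__ == "__main__":
    for v in ("1.3.24", "0.2.3", "0.0.3"):
        lower, upper = caret_range(v)
        print(f"^{v} means >={lower}, <{upper}")
```

The real specification also covers partial versions, tilde and wildcard constraints, and pre-releases, so this is only meant to show the general idea.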
Adding a new main dependency is easy:
poetry add <name1> [... <nameN>]
A development dependency can be added in a similar manner:
poetry add --dev <name1> [... <nameN>]
"Development" dependencies are assumed to be necessary for project development only. In practice it means we can skip their installation by instructing the system to do so.
Removing dependencies is done similarly via the poetry remove command.
One may ask how to find out the list of dependencies for an existing project. Suppose we already have a Python environment with a number of installed libraries, and use it for some of our projects. How do we know which libraries are needed for any given project? Tools like pipreqs may help, but they are not 100% reliable, so, in general, some manual work is needed.
Handling Virtual Environments
There is one major difference between adding a dependency manually and using poetry add: the command-line tool immediately installs the dependency. Since dependencies are installed into an isolated project-specific Python environment, we have to discuss these environments first.
It is important to note that Poetry will not install Python interpreters specified in project requirements. It is expected that the required Python version is already present in the system.
Thus, for example, running
poetry add dotmap
will fail if Python 3.6 is not installed while pyproject.toml has the dependency
python = "3.6"
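The kind of check involved can be sketched as follows. This is a simplified comparison of version prefixes written for illustration; interpreter_matches is a hypothetical helper and does not reproduce Poetry's actual constraint handling.

```python
import sys


def interpreter_matches(required, actual=None):
    """Check that the interpreter version starts with the required components."""
    if actual is None:
        actual = sys.version_info
    wanted = tuple(int(p) for p in required.split("."))
    return tuple(actual[:len(wanted)]) == wanted


if __name__ == "__main__":
    print("running on 3.6:", interpreter_matches("3.6"))
```

If no interpreter on the machine passes such a check, there is nothing for Poetry to build the environment from, hence the failure.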
Calling any command that installs, updates, or removes project dependencies forces Poetry to modify the isolated virtual environment associated with the current project. If this environment doesn't exist yet, it will be created.
By default, Poetry uses the Python version found in PATH. However, suppose that we need to use some other Python present in the system. In this case, we have to trigger the creation of a virtual environment prior to poetry add (or poetry install) calls.
The most straightforward way to do it is to run poetry env use with a full path to the required Python version:
poetry env use C:\Python3.6\python.exe
Poetry remembers this setup, and will use Python 3.6 every time we run poetry add and similar commands from the project directory.
Technically, a virtual environment is just a directory where all environment-specific settings and libraries are stored. If it breaks for some reason, we can simply delete this directory and call poetry install or poetry env use to recreate it.
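The "just a directory" point can be demonstrated with the standard venv module. This is only an analogy under my own assumptions: Poetry manages its environments itself, and recreate_environment is a hypothetical helper.

```python
import shutil
import tempfile
import venv
from pathlib import Path


def recreate_environment(env_dir):
    """Delete the environment directory if it exists and create it from scratch."""
    env_dir = Path(env_dir)
    if env_dir.exists():
        shutil.rmtree(env_dir)
    # with_pip=False keeps the sketch fast; a real environment usually needs pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = recreate_environment(Path(tmp) / ".venv")
        # The directory holds the configuration file and the interpreter launchers.
        print(sorted(p.name for p in env.iterdir()))
```

Nothing outside that directory is touched, which is why deleting and recreating it is a safe way to recover from a broken environment.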
By default, all virtual environment directories are kept in the user home location. It is also possible to instruct Poetry to create virtual environments right inside the respective projects by calling
poetry config virtualenvs.in-project true
I think this setup is actually more convenient. If I delete a project, the corresponding virtual environment is deleted along with it. The only real downside I see for now is the cluttering of the project directory. When I search for a file inside a project, I have to exclude .venv every time, which is annoying.
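When the search is scripted, the exclusion is at least easy to automate. Below is a small sketch with a hypothetical helper, project_files, that walks a project tree and prunes the .venv directory.

```python
import os


def project_files(root, excluded=(".venv",)):
    """Yield paths of files under root, skipping the excluded directory names."""
    for current, dirs, files in os.walk(root):
        # Pruning dirs in place stops os.walk from descending into them.
        dirs[:] = [d for d in dirs if d not in excluded]
        for name in files:
            yield os.path.join(current, name)


if __name__ == "__main__":
    for path in project_files("."):
        print(path)
```

Most editors and search tools offer an equivalent ignore setting, so this matters mainly for custom scripts.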
Installing Dependencies
Having a pyproject.toml file inside the project is enough to be able to run
poetry install [--no-dev]
and get all the necessary dependencies installed inside the corresponding virtual environment. As mentioned above, Poetry will also automatically create the environment if it is not present (alternatively, use poetry env use to choose the right Python version). Thus, this is the minimal working setup for preparing the system for an existing project: navigate to the project directory on the disk and run poetry install. The --no-dev option instructs Poetry to skip development dependencies.
The Poetry authors recommend taking one extra step, though. When Poetry installs dependencies, it writes down their exact version numbers into a file called poetry.lock. If we keep it as a part of the project (i.e., add it to version control), Poetry will make sure to install these specific versions during poetry install.
Let’s consider an example. Suppose we have a project that depends on the library dotmap. We add it with the command
poetry add dotmap@1.*
Poetry actually installs dotmap version 1.3.24 (the most recent version at the time of writing) and writes this information into poetry.lock. Suppose that after some time I pull this project from the repository on another machine without poetry.lock. If I run poetry install, it will install the most recent dotmap version matching the mask 1.*. However, if I keep poetry.lock as well, dotmap version 1.3.24 will be installed.
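The difference can be modelled in a few lines. This is a toy model rather than Poetry's resolver: pick_version is a hypothetical function, and versions 1.3.25 and 2.0.0 are invented to stand in for releases published after the lock file was written.

```python
from fnmatch import fnmatchcase


def pick_version(available, mask, locked=None):
    """Pick the version to install: the locked one if given, else the newest match."""
    if locked is not None:
        return locked
    candidates = [v for v in available if fnmatchcase(v, mask)]
    if not candidates:
        raise LookupError(f"no version matches {mask}")
    # Compare versions numerically, component by component.
    return max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")))


if __name__ == "__main__":
    available = ["1.3.24", "1.3.25", "2.0.0"]  # hypothetical later releases
    print("without poetry.lock:", pick_version(available, "1.*"))
    print("with poetry.lock:   ", pick_version(available, "1.*", locked="1.3.24"))
```

Without the lock file, two machines installing at different times may end up with different versions even though pyproject.toml is identical.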
So, to prevent possible issues, it is recommended to place poetry.lock under version control and update the libraries listed there manually. This is done by calling
poetry update [--no-dev]
The effect of this command is equivalent to deleting poetry.lock and running poetry install.
Running Project Code
The explicit way to run code inside a project-specific virtual environment is to use the poetry run command:
poetry run python main.py
Basically, poetry run executes any given command inside the project’s environment. It is also possible to open a command line shell to work with the environment’s tools without having to prefix them with poetry run all the time:
poetry shell
For now, I prefer the "explicit" version. It feels a bit verbose to type poetry run python instead of python, but in practice it is all done inside batch files or Visual Studio Code. I also tend to think that being explicit is good in this case.
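A quick way to confirm which interpreter a command actually picked up is a small script like the one below, saved for example as a hypothetical main.py. It relies on the standard sys.prefix and sys.base_prefix attributes, which differ inside a virtual environment.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment:", in_virtual_environment())
```

Running it once with python main.py and once with poetry run python main.py should show the difference between the global interpreter and the project-specific one.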