Python virtualenv and venv dos and don’ts
One of Python’s biggest draws is its expansive ecosystem of third-party packages. If there is a task you want to pull off — file format conversion, scraping and restructuring web pages, linear regression, you name it — odds are that one or more packages in the Python Package Index will fill your need.
The hard part is managing the accumulation of packages in a given Python installation. It’s all too easy to thoughtlessly install dozens of packages and in time end up with a Python environment fraught with conflicts between older and newer versions of tools, making work harder than it needs to be.
Python comes with an automated system for keeping a package set local to a given Python project. Virtual environments — courtesy of the virtualenv tool in Python 2 and venv in Python 3 — can be used to create a separate, isolated instance of the Python runtime for a project, with its own complement of packages.
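Getting started takes only a couple of commands. Here is a minimal sketch, assuming Python 3 and a hypothetical environment directory named myproject-env:

```sh
# Create an isolated environment with Python 3's built-in venv module:
python3 -m venv myproject-env

# With Python 2, the virtualenv tool does the same job:
#   virtualenv myproject-env

# Activate it (POSIX shells) and install packages into it alone:
source myproject-env/bin/activate
pip install requests     # example package; lands in the venv, not system-wide
```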
Do use Python virtual environments
The first common mistake Python programmers make with virtualenv or venv is to just not bother with it. If all you’re doing is throwing together a quick-and-dirty script to do one little thing, why bother setting up a virtual environment at all?
The trouble is, that “one little thing” often turns out to be much, much more. As your mastery of Python grows, you’ll inevitably end up pulling in more third-party modules to accomplish more sophisticated work. What’s more, you’ll find it increasingly difficult to deal with dependencies on earlier versions of packages, one of the key problems virtual environments were created to solve.
Some people also wrinkle their noses at using virtualenv or venv because each virtual environment is its own little copy of the Python runtime, taking up about 25MB. But disk space is ridiculously cheap these days, and removing a virtual environment is as blissfully simple as deleting its directory (no side effects). Plus, if you have multiple tasks that share a common set of packages, you can use the same virtual environment for those tasks. (However, this only works if that package set is perfectly consistent and never changes; more on this later.)
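If you want to see just how low the overhead is, here is a rough sketch of cleaning one up, assuming a venv directory named myproject-env:

```sh
# Deactivate the venv if it's active in the current shell...
deactivate
# ...then delete its directory. Nothing outside it is affected.
rm -rf myproject-env      # on Windows: rmdir /s /q myproject-env
```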
Do use virtualenvwrapper to manage Python virtual environments
One way to make virtual environments less burdensome is to use virtualenvwrapper. This tool allows you to manage all the virtual environments in your workspace from a single, central command-line application.
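A minimal sketch of that workflow, assuming virtualenvwrapper has been installed and its shell startup script sourced as described in its documentation:

```sh
# Create a managed virtual environment (stored centrally, outside the project):
mkvirtualenv myproject-env

# Switch to it later from any directory:
workon myproject-env

# List all managed environments, and remove one when it's no longer needed:
lsvirtualenv
rmvirtualenv myproject-env
```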
A word of advice on creating virtual environments: Don’t name the directory of your virtual environment venv — or, for that matter, give it the name of any other package you want to use in the virtual environment. This can have unpredictable effects on imports later. Use a name that describes your project unambiguously.
Don’t share virtual environments between projects
If you have multiple projects that have roughly the same requirements, it might seem like a good idea to create a single virtual environment that both projects can share. Don’t do it.
This is a bad idea for plenty of reasons, but two will suffice. One, it’s all too likely that one of the projects in question will suddenly have requirements that break the other project. The whole point of virtual environments is to isolate each project from other projects and their quirks.
Two, the convenience and saved disk space will be marginal. If your project has requirements.txt or Pipfile files, it’s trivially easy to set up a virtual environment for the project and install what it needs with a couple of commands. These installs are one-time costs, so there’s not much point in trying to ameliorate them.
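For instance, standing up a fresh environment for a project with a requirements.txt file looks roughly like this, assuming a POSIX shell and a hypothetical venv name:

```sh
# One-time setup per project: create the venv, activate it,
# and install the project's pinned dependencies into it.
python3 -m venv myproject-env
source myproject-env/bin/activate
pip install -r requirements.txt

# Projects that use a Pipfile would run `pipenv install` instead.
```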
If you have multiple projects that use the same versions of the same packages, and do so consistently, then you may be able to share a venv between them without ill effects. But that requires that you stay on top of the requirements for those projects as a group.
Do share big packages across environments — but carefully
Here’s a problem that’s growing more common in this age of installing big Python packages like TensorFlow. What if we have multiple projects that all need to share the same version of some package that happens to be really large — say, hundreds of megabytes?
One way to handle this is to take advantage of a not-widely-known feature of Python virtual environments. Normally, when created, they don’t use the underlying Python installation’s site-packages directory. Apps that run in the venv can only “see” packages installed in the venv itself.
However, if you create a venv with the --system-site-packages option, programs running in that venv will also be able to see packages installed in the underlying Python installation. Normally, venvs don’t do this, as a way to keep their package namespaces clean. But if you enable this feature, you can install a few key packages in the underlying installation and share them across your venvs. If a given venv needs a newer version of one of those packages, you can install it locally to that venv, since the venv’s own package versions supersede what’s in the underlying installation.
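Here is a sketch of that arrangement, using TensorFlow as the large shared package and hypothetical names and version numbers:

```sh
# Install the big package once, into the underlying Python installation:
pip install tensorflow

# Create a venv that can also see the base installation's packages:
python3 -m venv --system-site-packages myproject-env

# Inside the activated venv, a locally installed version takes precedence
# over whatever the base installation provides:
source myproject-env/bin/activate
pip install tensorflow==2.16.1    # example pin, local to this venv only
```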
Bear in mind, this solution works best only when:
- The package in question is really large — again, hundreds of megabytes — and it’s not practical to install it into multiple projects.
- You’re dealing with multiple projects that all need that package.
- You plan on keeping the versioning for that package consistent across all those projects. This is generally the hardest criterion to satisfy, since a project’s requirements can theoretically change at any time. But if they do, you can always install a local copy of the package at the needed version in that project’s venv.
Don’t place project files inside a Python virtual environment
When you set up a virtual environment, the directory it lives in isn’t meant to hold anything but the virtual environment itself. Your project belongs in its own separate directory tree. There are many good reasons for this:
- Your project directory tree might well have a naming convention that collides with elements of the virtual environment.
- The easy way to remove a virtual environment is to delete the directory. Mingling project files with the virtual environment means you must first disentangle the two.
- It’s too easy to overwrite something in the virtual environment without knowing it.
- Multiple projects may use the same virtual environment. (Unlikely, as above, but possible, and so worth being aware of.)
One way to organize things would be to create one top-level directory that holds your virtual environments and another that holds your projects; what matters is that the two are kept separate. A more common approach, for convenience and consistency, is to keep the venv inside the project directory as its own top-level subdirectory, never the other way around.
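One possible layout, sketched with hypothetical names:

```sh
# Project files and the virtual environment sit side by side; the venv is
# just one subdirectory that can be deleted and recreated at any time:
#
#   ~/projects/myproject/
#   ├── app.py               <- project code stays outside the venv
#   ├── requirements.txt
#   └── myproject-env/       <- nothing but the virtual environment in here
#
mkdir -p ~/projects/myproject
cd ~/projects/myproject
python3 -m venv myproject-env
```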
Don’t forget to activate your Python virtual environment
Another common mistake people make with virtual environments is forgetting to activate them, or not activating the right one.
Before a virtual environment can be used in a particular shell session, it has to be activated by way of a script named activate, which lives in the virtual environment’s Scripts directory on Microsoft Windows and in its bin directory on POSIX systems like Linux and macOS. Upon activation, the virtual environment is treated as the default Python instance until you deactivate it (by running the deactivate command) or until the shell session closes.
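In practice, activation looks like this, assuming a venv directory named myproject-env:

```sh
# POSIX systems (Linux, macOS): the script lives in the bin directory.
source myproject-env/bin/activate

# Windows: the script lives in the Scripts directory instead.
#   cmd.exe:     myproject-env\Scripts\activate.bat
#   PowerShell:  myproject-env\Scripts\Activate.ps1

# The shell prompt typically changes to show the active venv.
# When you're done, return to the system's default Python:
deactivate
```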
It’s easy to forget this step at first, both because it’s a habit that needs to be acquired and because the activation script is one level down in the virtual environment directory. A couple of tricks come in handy here:
- Create shortcuts to the activation/deactivation scripts in the root directory of your project. You can name those shortcuts something simple like act and deact to make them less obnoxious to type.
- For projects that you work on from an IDE and not a command line, create a project launcher — a batch file or shell script — for the Python application in question (a minimal sketch follows this list). This lets you call the activation script, then run your own script afterward. You generally don’t need to deactivate the virtual environment after the run, because the session will terminate on its own anyway.
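A minimal sketch of such a launcher for POSIX shells, with hypothetical names for the venv directory and the entry-point script (a Windows batch file would call myproject-env\Scripts\activate.bat the same way):

```sh
#!/bin/sh
# run.sh (hypothetical): activate the project's venv, then run the app.
# The launcher runs in its own shell, so there's no need to deactivate afterward.
. myproject-env/bin/activate
python app.py "$@"
```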
This last trick underscores an important point about virtual environment activations: They apply only to the shell session they run in.
For instance, if you launch two command-line sessions and activate a virtual environment in one, the other command-line session will use the system’s default Python installation, not the virtual environment. You’re not activating the virtual environment for the system as a whole, but only for the specific session.
Many IDEs now support the automatic detection and activation of a virtual environment with a project. Microsoft Visual Studio Code, with Microsoft’s own Python extension, does this. When you open a project and then kick open a console in VS Code, any virtual environment associated with that project will be automatically activated in that console.
Don’t use >= for package version pinning in a Python virtual environment
This tip is useful outside of virtual environments as well. When you have an application with a requirements.txt file, you should specify packages with an exact version number. Use a definition like mypackage==2.2, not mypackage>=2.2.
Here’s why: One of the chief reasons to use a virtual environment is to ensure the use of specific versions of packages. If you use >= instead of ==, there is no guarantee you—or someone else—will end up with the same version if the environment needs to be recreated for that project. Use an exact version number. You, a future you, and whoever else comes after you will thank you.
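A sketch of what that looks like in practice, with example version numbers:

```sh
# Record the exact versions currently installed in the activated venv:
pip freeze > requirements.txt

# requirements.txt then pins every package, for example:
#   mypackage==2.2
#   requests==2.31.0
#
# Recreating the environment later reproduces those exact versions:
pip install -r requirements.txt
```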