PDM Internals (1)

Lock file

In order to answer some frequently asked questions and to help future contributors, I plan to write a series of articles on the internals of PDM, starting with this one. This article introduces PDM's lock file, based on the latest version, 2.12. This English version was translated with the assistance of an LLM. You can read the original post in Chinese.

What is a lock file?

A lock file stores pinned package versions: it records a project's dependencies and the versions they are locked to. Lock files are common among package managers, for example yarn.lock, Cargo.lock and go.sum. In the Python ecosystem, Pipenv and Poetry also have their own lock files. PDM is likewise a package manager with a lock file, which distinguishes it from package managers without one, such as pip. Having a lock file changes some behaviors, and many people ask in PDM's issue tracker why a package can be installed with pip but not with PDM. Hopefully this article answers that question.

That said, we first need to define what a lock file does. This is not as easy as it sounds. In PDM, at least, a lock file constrains the versions of all packages that might be installed during installation, as well as their sources, checksums and so on, with the goal of providing a reproducible Python environment. You can generate a lock file by running pdm lock. PDM also makes sure that the lock file exists and is valid when you run pdm install, and generates it if necessary. A new round of discussion on a lock file proposal is currently underway. The thread is quite lengthy, but if you have the time, it shows how differently people understand lock files and what they expect from them.

How to generate a lock file?

Cross-version lock and lock for current environment

Initially, Python's package managers did not come with a lock file, but dependency resolution was still a necessary process. So what happens when pip installs a package foo?

  1. Access https://pypi.org/simple/foo/ to get all versions of foo.
  2. Starting from the latest version that meets the requirements, check each package file one by one to see if it satisfies the current environment and Python version. If it does, select this version for the next step.
  3. Get the dependency list from this file. Dependencies may also have environment requirements, so each dependency needs to be checked individually to see if it meets the current environment and Python version. If it does, record this dependency.
  4. For recorded dependencies whose versions have not been pinned yet, repeat from step 1.
  5. If no matching version is found, go back to step 2 and select the next file that meets the requirements.
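The five steps above can be sketched as a small backtracking resolver. This is a minimal illustration, not pip's actual code. The in-memory INDEX, the tuple-based versions, and the use of Python callables for version specifiers and markers are all assumptions made to keep the example short. Note that requires-python and markers are checked against one concrete env, which is exactly what makes the result valid only for that environment.

```python
# Hypothetical in-memory index: name -> version -> metadata.
# Each dependency is (name, version predicate, marker predicate on the environment).
INDEX = {
    "foo": {
        (2, 0): {"requires_python": (3, 9), "deps": [("bar", lambda v: v >= (1, 0), lambda env: True)]},
        (1, 0): {"requires_python": (3, 8), "deps": [("bar", lambda v: True, lambda env: env["sys_platform"] == "win32")]},
    },
    "bar": {
        (1, 5): {"requires_python": (3, 8), "deps": []},
    },
}


def resolve_for_current_env(pending, pinned, env):
    """Return {name: version} for this env, or None so the caller backtracks."""
    if not pending:
        return pinned
    (name, wanted), rest = pending[0], pending[1:]
    if name in pinned:
        # Already pinned: the pin must also satisfy this requirement.
        return resolve_for_current_env(rest, pinned, env) if wanted(pinned[name]) else None
    # Steps 1-2: try versions from the latest down, checking the current Python.
    for version in sorted(INDEX[name], reverse=True):
        meta = INDEX[name][version]
        if not wanted(version) or env["python_version"] < meta["requires_python"]:
            continue
        # Step 3: keep only dependencies whose markers match *this* environment.
        deps = [(dep, spec) for dep, spec, marker in meta["deps"] if marker(env)]
        # Step 4: continue with the newly recorded dependencies.
        result = resolve_for_current_env(rest + deps, {**pinned, name: version}, env)
        if result is not None:
            return result
    # Step 5: nothing fits; the caller moves on to its next candidate.
    return None


print(resolve_for_current_env([("foo", lambda v: True)], {}, {"python_version": (3, 8), "sys_platform": "linux"}))
# -> {'foo': (1, 0)}
print(resolve_for_current_env([("foo", lambda v: True)], {}, {"python_version": (3, 10), "sys_platform": "linux"}))
# -> {'foo': (2, 0), 'bar': (1, 5)}
```

The same request produces different pins on Python 3.8 and 3.10, and bar silently disappears on Linux. That is fine for an on-the-fly install, but it is not something you could reuse elsewhere.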

Note the emphasis on the current environment and Python version: the resolver takes both into account when checking whether a condition is met. The result is a lock that is valid only for the current environment. At the time of writing, this is how dependency resolution works in most package managers, with the exception of Poetry and PDM.

A package manager that doesn't generate a lock file resolves dependencies on the fly at every installation, taking into account the current environment and nothing else. A lock file, however, exists to reproduce the environment, and the installation may be performed on a different Python version or operating system. This calls for a cross-version lock. You could generate a separate lock file for each target environment, but PDM chooses to record all package versions, together with their environment information, in a single lock file.

requires-python

requires-python is a metadata field defined in PEP 621 and written in the [project] table. In fact, a similar concept was introduced much earlier: setuptools.setup() has a python_requires parameter, which serves the same purpose of restricting which Python versions the package can be installed on. After the disruptive transition to Python 3, every package or Python project should in theory declare this field to indicate the range of Python versions it supports, or at least the minimum one.

This field plays a critical role in PDM's dependency resolution. To illustrate its mechanism, let's look at an example:

[project]
name = "foo"
requires-python = ">=3.8"

Now run pdm add numpy. You will see the following output:

Output
Adding packages to default dependencies: numpy
${SITE_PACKAGES}/pdm/resolver/providers.py:196: PackageWarning: Skipping [email protected] because it requires Python>=3.9 but the
project claims to work with Python>=3.8. Instead, another version of numpy that supports Python>=3.8 will be used.
If you want to install [email protected], narrow down the `requires-python` range to include this version. For example, ">=3.9" should work.
  return self.repository.find_candidates(
${SITE_PACKAGES}/pdm/resolver/providers.py:196: PackageWarning: Skipping [email protected] because it requires Python>=3.9 but the
project claims to work with Python>=3.8. Instead, another version of numpy that supports Python>=3.8 will be used.
If you want to install [email protected], narrow down the `requires-python` range to include this version. For example, ">=3.9" should work.
  return self.repository.find_candidates(
${SITE_PACKAGES}/pdm/resolver/providers.py:196: PackageWarning: Skipping [email protected] because it requires Python>=3.9 but the
project claims to work with Python>=3.8. Instead, another version of numpy that supports Python>=3.8 will be used.
If you want to install [email protected], narrow down the `requires-python` range to include this version. For example, ">=3.9" should work.
  return self.repository.find_candidates(
${SITE_PACKAGES}/pdm/resolver/providers.py:196: PackageWarning: Skipping [email protected] because it requires Python<3.13,>=3.9
but the project claims to work with Python>=3.8. Instead, another version of numpy that supports Python>=3.8 will be used.
If you want to install [email protected], narrow down the `requires-python` range to include this version. For example, "<3.13,>=3.9" should work.
  return self.repository.find_candidates(
${SITE_PACKAGES}/pdm/resolver/providers.py:196: PackageWarning: Skipping [email protected] because it requires Python<3.13,>=3.9
but the project claims to work with Python>=3.8. Instead, another version of numpy that supports Python>=3.8 will be used.
If you want to install [email protected], narrow down the `requires-python` range to include this version. For example, "<3.13,>=3.9" should work.
  return self.repository.find_candidates(
${SITE_PACKAGES}/pdm/resolver/providers.py:196: PackageWarning: Skipping [email protected] because it requires Python>=3.9 but the
project claims to work with Python>=3.8. Instead, another version of numpy that supports Python>=3.8 will be used.
If you want to install [email protected], narrow down the `requires-python` range to include this version. For example, ">=3.9" should work.
  return self.repository.find_candidates(
${SITE_PACKAGES}/pdm/resolver/providers.py:196: PackageWarning: Skipping [email protected] because it requires Python>=3.9 but the
project claims to work with Python>=3.8. Instead, another version of numpy that supports Python>=3.8 will be used.
If you want to install [email protected], narrow down the `requires-python` range to include this version. For example, ">=3.9" should work.
  return self.repository.find_candidates(
${SITE_PACKAGES}/pdm/resolver/providers.py:196: PackageWarning: Skipping [email protected] because it requires Python>=3.9 but the
project claims to work with Python>=3.8. Instead, another version of numpy that supports Python>=3.8 will be used.
If you want to install [email protected], narrow down the `requires-python` range to include this version. For example, ">=3.9" should work.
  return self.repository.find_candidates(
INFO: Use `-q/--quiet` to suppress these warnings, or ignore them per-package with `ignore_package_warnings` config in [tool.pdm] table.
🔒 Lock successful
Changes are written to pyproject.toml.
Synchronizing working set with resolved packages: 1 to add, 0 to update, 0 to remove

  ✔️ Install numpy 1.24.4 success

As you can see, it rejects a bunch of newer numpy versions[1] because they require Python 3.9 or later, while your project declares a minimum Python version of 3.8. So if you try to install a package that depends on numpy>=1.26, PDM will fail to resolve it and report an error. Many users have asked: why does PDM still reject these new package versions when I'm using Python 3.10?

The answer is that PDM always tries to make the lock file work on all the Python versions you specify. It doesn't take into account which Python version you are currently using.

The reason is that if your project specifies requires-python = ">=3.8" and you share it, you are explicitly allowing users to install it with Python 3.8. In that case every dependency must support Python 3.8, and the newer numpy releases that require Python>=3.9 clearly do not. So in practice, the range specified by requires-python in your project must be a subset of the requires-python range of every dependency. PDM computes a suitable value for you and shows it in the warning message, as in the example above.
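The subset rule can be checked mechanically. The sketch below is an approximation under stated assumptions. It supports only the ordered operators >=, <=, > and <. Instead of reasoning about ranges symbolically, as a real resolver would, it probes a fixed list of Python 3.x versions. parse_spec, contains, is_subset and PROBES are hypothetical names, not PDM's API.

```python
import operator

# Only ordered comparisons are supported in this sketch; "==", "!=" and
# wildcards such as "!=3.0.*" are deliberately left out.
OPERATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def parse_version(text):
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))  # "3.9" -> (3, 9, 0)


def parse_spec(spec):
    clauses = []
    for clause in spec.split(","):
        clause = clause.strip()
        # Check two-character operators before their one-character prefixes.
        for symbol in (">=", "<=", ">", "<"):
            if clause.startswith(symbol):
                clauses.append((OPERATORS[symbol], parse_version(clause[len(symbol):])))
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return clauses


def contains(spec, version):
    return all(compare(version, bound) for compare, bound in parse_spec(spec))


# A finite set of probe versions standing in for "every Python 3.x".
PROBES = [(3, minor, 0) for minor in range(20)]


def is_subset(project_spec, dependency_spec):
    """True if every probed Python allowed by the project is allowed by the dependency."""
    return all(contains(dependency_spec, v) for v in PROBES if contains(project_spec, v))


print(is_subset(">=3.8", ">=3.9"))              # False: Python 3.8 would be left out
print(is_subset(">=3.9", "<3.13,>=3.9"))        # False: 3.13 and later would be left out
print(is_subset("<3.13,>=3.9", "<3.13,>=3.9"))  # True
```

The last two calls mirror the warning output above. For a numpy release that declares "<3.13,>=3.9", even ">=3.9" is not narrow enough, which is why the suggested range keeps the upper bound.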

Markers

Another concept related to environment restrictions is environment markers, which come from the PEP 508 specification. A marker is a conditional expression that specifies when a package should be installed. For example, foo>=1.0; sys_platform == "win32" means that foo will only be installed on Windows. With a lock for the current environment, if the resolver encounters such an expression and the current environment does not satisfy the condition, the package is rejected; otherwise, foo>=1.0 is accepted. With a cross-version lock, the expression cannot be evaluated at lock time. Instead, the whole requirement foo>=1.0; sys_platform == "win32" is recorded and the resolver goes on to resolve the dependencies of foo. The expression is then evaluated at installation time to decide whether to install the package.
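The difference between the two approaches comes down to when the marker is evaluated. The following is a minimal sketch. It assumes each marker is a single name == "value" or name != "value" clause; real PEP 508 markers also support and/or, parentheses and version comparisons. The helper names are hypothetical.

```python
from __future__ import annotations

import re

# A single clause such as: sys_platform == "win32"
CLAUSE = re.compile(r'^\s*(\w+)\s*(==|!=)\s*"([^"]*)"\s*$')


def split_requirement(line: str) -> tuple[str, str | None]:
    requirement, separator, marker = line.partition(";")
    return requirement.strip(), (marker.strip() if separator else None)


def evaluate(marker: str, env: dict[str, str]) -> bool:
    match = CLAUSE.match(marker)
    if match is None:
        raise ValueError(f"unsupported marker in this sketch: {marker!r}")
    name, op, value = match.groups()
    return env[name] == value if op == "==" else env[name] != value


def lock_for_current_env(lines: list[str], env: dict[str, str]) -> list[str]:
    """Evaluate markers now and drop requirements that don't apply here."""
    kept = []
    for line in lines:
        requirement, marker = split_requirement(line)
        if marker is None or evaluate(marker, env):
            kept.append(requirement)
    return kept


def lock_cross_platform(lines: list[str]) -> list[str]:
    """Keep each requirement together with its marker for install time."""
    return [line.strip() for line in lines]


requirements = ['foo>=1.0; sys_platform == "win32"', "bar<2"]
print(lock_for_current_env(requirements, {"sys_platform": "linux"}))  # ['bar<2']
print(lock_cross_platform(requirements))
```

In the cross-version case foo stays in the result with its marker attached, so a resolver following this approach would keep resolving foo's own dependencies as well.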

[Figure: PDM lock graph]

Note, however, that the dependencies of a constrained package must inherit the same markers: if foo depends on bar, then bar should also be installed only on Windows. This is known as marker propagation. If the requirement on bar carries an environment marker of its own, the two markers are combined with logical AND. Conversely, the same dependency may be required by several parent packages, so the markers it inherits from each parent are combined with logical OR.
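The AND/OR rules can be sketched by representing each marker as a disjunction of conjunctions over opaque marker strings and pushing markers down the dependency graph in topological order. This illustrates the rules only and is not how PDM implements them. It performs no simplification or contradiction detection and assumes an acyclic graph. atom, and_, or_, propagate and the example graph are hypothetical.

```python
from graphlib import TopologicalSorter

# A marker in disjunctive normal form: a set of conjunctions, each a set of atoms.
TRUE = frozenset({frozenset()})  # one empty conjunction: always satisfied
FALSE = frozenset()              # no conjunctions: never satisfied


def atom(text):
    return frozenset({frozenset({text})})


def and_(a, b):
    # (a1 or a2) and (b1 or b2) == (a1 and b1) or (a1 and b2) or ...
    return frozenset(x | y for x in a for y in b)


def or_(a, b):
    return a | b


def propagate(edges, root):
    """edges: parent -> [(child, marker on that requirement)]; graph must be acyclic."""
    parents = {root: set()}
    for parent, children in edges.items():
        parents.setdefault(parent, set())
        for child, _ in children:
            parents.setdefault(child, set()).add(parent)
    markers = {node: FALSE for node in parents}
    markers[root] = TRUE
    # Parents are visited before children, so a parent's marker is final when pushed down.
    for node in TopologicalSorter(parents).static_order():
        for child, edge_marker in edges.get(node, []):
            markers[child] = or_(markers[child], and_(markers[node], edge_marker))
    return markers


def to_text(marker):
    return " or ".join(sorted("(" + " and ".join(sorted(c)) + ")" for c in marker))


graph = {
    "project": [("foo", atom('sys_platform == "win32"')), ("baz", TRUE)],
    "foo": [("bar", atom('python_version < "3.9"'))],
    "baz": [("bar", atom('sys_platform == "darwin"'))],
}
markers = propagate(graph, "project")
print(to_text(markers["bar"]))
# (python_version < "3.9" and sys_platform == "win32") or (sys_platform == "darwin")
```

bar inherits foo's Windows marker combined with AND, and the branch through baz is combined with OR, exactly as described above.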

[Figure: marker propagation]

In pdm.lock, the final result of this marker calculation is recorded in the marker field of each package. During installation, PDM only needs to read this field to decide whether the package should be installed, without traversing the dependency tree to collect markers from its ancestors.

For example, here is the lock file produced for rich:
# This file is @generated by PDM.
# It is not intended for manual editing.

[metadata]
groups = ["default"]
strategy = ["cross_platform", "inherit_metadata"]
lock_version = "4.4.1"
content_hash = "sha256:37d2aae470ae5f416baf9366fdba3a83f22de379a8ab288ec6077f4ce3b0ec59"

[[package]]
name = "markdown-it-py"
version = "3.0.0"
requires_python = ">=3.8"
summary = "Python port of markdown-it. Markdown parsing, done right!"
groups = ["default"]
dependencies = [
    "mdurl~=0.1",
]
files = [
    {file = "markdown-it-py-3.0.0.tar.gz", hash = "sha256:e3f60a94fa066dc52ec76661e37c851cb232d92f9886b15cb560aaada2df8feb"},
    {file = "markdown_it_py-3.0.0-py3-none-any.whl", hash = "sha256:355216845c60bd96232cd8d8c40e8f9765cc86f46880e43a8fd22dc1a1a8cab1"},
]

[[package]]
name = "mdurl"
version = "0.1.2"
requires_python = ">=3.7"
summary = "Markdown URL utilities"
groups = ["default"]
files = [
    {file = "mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8"},
    {file = "mdurl-0.1.2.tar.gz", hash = "sha256:bb413d29f5eea38f31dd4754dd7377d4465116fb207585f97bf925588687c1ba"},
]

[[package]]
name = "pygments"
version = "2.17.2"
requires_python = ">=3.7"
summary = "Pygments is a syntax highlighting package written in Python."
groups = ["default"]
files = [
    {file = "pygments-2.17.2-py3-none-any.whl", hash = "sha256:b27c2826c47d0f3219f29554824c30c5e8945175d888647acd804ddd04af846c"},
    {file = "pygments-2.17.2.tar.gz", hash = "sha256:da46cec9fd2de5be3a8a784f434e4c4ab670b4ff54d605c4c2717e9d49c4c367"},
]

[[package]]
name = "rich"
version = "13.7.1"
requires_python = ">=3.7.0"
summary = "Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal"
groups = ["default"]
dependencies = [
    "markdown-it-py>=2.2.0",
    "pygments<3.0.0,>=2.13.0",
    "typing-extensions<5.0,>=4.0.0; python_version < \"3.9\"",
]
files = [
    {file = "rich-13.7.1-py3-none-any.whl", hash = "sha256:4edbae314f59eb482f54e9e30bf00d33350aaa94f4bfcd4e9e3110e64d0d7222"},
    {file = "rich-13.7.1.tar.gz", hash = "sha256:9be308cb1fe2f1f57d67ce99e95af38a1e2bc71ad9813b0e247cf7ffbcc3a432"},
]

[[package]]
name = "typing-extensions"
version = "4.10.0"
requires_python = ">=3.8"
summary = "Backported and Experimental Type Hints for Python 3.8+"
groups = ["default"]
marker = "python_version < \"3.9\""
files = [
    {file = "typing_extensions-4.10.0-py3-none-any.whl", hash = "sha256:69b1a937c3a517342112fb4c6df7e72fc39a38e7891a5730ed4985b5214b5475"},
    {file = "typing_extensions-4.10.0.tar.gz", hash = "sha256:b0abd7c89e8fb96f98db18d86106ff1d90ab692004eb746cf6eda2682f91b3cb"},
]

Note how the environment marker attached to rich's typing-extensions dependency is propagated to the typing-extensions package entry. PDM implements this with dep-logic, another library I wrote, which provides logical operations on markers.

Metadata

A package file's dependency list and the Python versions it supports (requires-python) are both part of the package's metadata. PDM has two ways to obtain the metadata of a package. The first is to download the file and read the contents of its METADATA/PKG-INFO file. The second is to use the metadata link standardized in PEP 658 to request the metadata on its own.
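Both approaches can be sketched with the standard library. A wheel is a zip archive whose *.dist-info/METADATA file uses an email-header format. Under PEP 658, the same METADATA content is served at the distribution file's URL with .metadata appended. The in-memory demo wheel, the example.org URL and the function names are hypothetical. sdists, which need PKG-INFO or a build step, are not covered.

```python
import io
import zipfile
from email.parser import HeaderParser


def metadata_url(file_url):
    # PEP 658: metadata is served at the distribution file's URL plus ".metadata".
    return file_url + ".metadata"


def read_wheel_metadata(wheel_bytes):
    """Read requires-python and dependencies from a wheel's .dist-info/METADATA."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        path = next(
            name for name in wheel.namelist()
            if name.endswith(".dist-info/METADATA") and name.count("/") == 1
        )
        headers = HeaderParser().parsestr(wheel.read(path).decode("utf-8"))
    return {
        "requires_python": headers.get("Requires-Python"),
        "requires_dist": headers.get_all("Requires-Dist", []),
    }


def build_demo_wheel():
    # A fake, minimal wheel built in memory so the example runs offline.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as wheel:
        wheel.writestr("demo/__init__.py", "")
        wheel.writestr(
            "demo-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\n"
            "Name: demo\n"
            "Version: 1.0\n"
            "Requires-Python: >=3.8\n"
            "Requires-Dist: mdurl~=0.1\n"
            'Requires-Dist: typing-extensions; python_version < "3.9"\n',
        )
    return buffer.getvalue()


print(read_wheel_metadata(build_demo_wheel()))
print(metadata_url("https://files.example.org/demo-1.0-py3-none-any.whl"))
```

The first approach downloads the whole file just to read a few header lines. This is why the PEP 658 link matters when a cross-version lock has to inspect many files.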

However, because PDM's lock file is cross-version, many more package files need to be resolved and recorded. For example, a single version of charset-normalizer contains 90 files! Since not all package indexes support PEP 658, inspecting that many files is impractical. So PDM makes a trade-off and introduces an assumption:

The metadata of different files for the same version are the same.

This is not always true, as no standard requires it. In fact, a package can declare completely different dependencies in different files. For an sdist, the metadata even has to be generated by running the build process, and the result can be completely unpredictable; it may even differ from one moment to the next. However, the assumption holds in most cases, so PDM makes it in exchange for performance. PDM is not alone here: Poetry and uv work in a similar way. As a result, PDM's lock file records metadata per package version, and currently PDM can lock only one version of each package. In other words, PDM locks versions rather than specific files. During installation, it picks the right file to download and install for the locked version. This introduces another, somewhat inaccurate assumption:

Each version is complete and contains package files corresponding to all platforms supported by the current version.
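Because the lock records one version with all its files, the installer has to pick one of those files for the running interpreter. The following is a simplified sketch under two assumptions:

- Wheel filenames follow the usual name-version[-build]-python-abi-platform.whl layout.
- The caller supplies a preference-ordered list of compatible tags.

The tag list and function names are hypothetical. The last call shows what happens when the second assumption fails: the locked version has only a Windows wheel and no sdist.

```python
from itertools import product


def wheel_tags(filename):
    """Expand a wheel filename's (possibly compressed) tags, e.g. py2.py3-none-any."""
    python, abi, platform = filename[: -len(".whl")].split("-")[-3:]
    return set(product(python.split("."), abi.split("."), platform.split(".")))


def choose_file(files, supported_tags):
    """Pick a file for the locked version; supported_tags is ordered by preference."""
    wheels = [f for f in files if f.endswith(".whl")]
    for tag in supported_tags:
        for wheel in wheels:
            if tag in wheel_tags(wheel):
                return wheel
    sdists = [f for f in files if f.endswith((".tar.gz", ".zip"))]
    if sdists:
        return sdists[0]  # building from source is the fallback
    raise LookupError("no file of this locked version fits the current platform")


# Hypothetical tag list for CPython 3.12 on x86-64 Linux (real lists are much longer).
LINUX_CP312 = [
    ("cp312", "cp312", "manylinux_2_17_x86_64"),
    ("cp312", "abi3", "manylinux_2_17_x86_64"),
    ("py3", "none", "any"),
]

rich_files = ["rich-13.7.1-py3-none-any.whl", "rich-13.7.1.tar.gz"]
print(choose_file(rich_files, LINUX_CP312))  # rich-13.7.1-py3-none-any.whl

try:
    choose_file(["demo-1.0-cp312-cp312-win_amd64.whl"], LINUX_CP312)
except LookupError as error:
    print(error)  # the "complete version" assumption does not hold here
```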

This is a trade-off: sometimes you have to sacrifice some correctness for performance. In the corner cases these assumptions don't cover, PDM will likely fail to resolve.

Another important feature of PDM's lock file is its support for various lock strategies, which will be covered in the next article. Thanks for reading.

Footnotes

  1. You might think there are too many warnings, but I worry that without this information users would be even more confused about why they got this result. Fortunately, these warnings can be suppressed by adding --quiet.
