Python¶
The information below is complementary to the Python page in the Alliance technical documentation.
Modules¶
To load a Python version compatible with the default software environment:
[alice@ip09 ~]$ module avail python
------------------------------------- Core Modules --------------------------------------
ipython-kernel/3.10 python-build-bundle/2024a (D)
ipython-kernel/3.11 (D) python/3.10.13 (t,3.10)
ipython-kernel/3.12 python/3.11.5 (t,D:3.11)
python-build-bundle/2023b python/3.12.4 (t)
...
[alice@ip09 ~]$ module load python/3.11.5
The recommended version (default module) is indicated by (D). If you need a version which is not available in the default software environment, use module spider python to list all available versions.
In addition to modules for Python itself, the Alliance software environment contains modules for Scientific Python. Named scipy-stack, they provide scipy, but also numpy, pandas, matplotlib, etc. Once your module for Python itself is loaded:
[alice@ip09 ~]$ module avail scipy-stack
------------------------------------- Core Modules --------------------------------------
scipy-stack/2023b (math) scipy-stack/2024a (math) scipy-stack/2024b (math,D)
...
[alice@ip09 ~]$ module load scipy-stack/2024b
Virtual environments¶
As explained in the next section, Python packages (other than Scientific Python) can be installed with the pip command, either by downloading them from PyPI or by choosing from a collection of packages precompiled by the Alliance software team. However, we recommend never installing Python packages without first activating a virtual environment.
A Python virtual environment is a directory containing a set of packages installed for a given purpose. For instance, you could have one virtual environment to simulate quantum systems with QuTiP and another for machine learning with TensorFlow. You could even have several virtual environments for different QuTiP or TensorFlow versions, as required by various projects.
By systematically using a virtual environment for your Python jobs, you facilitate software installation, debugging, and research reproducibility. Conversely, if you install all your Python packages in the default location, you will encounter more software compatibility issues, and installing or updating a package can cause previously successful jobs to stop working. In addition, you will not be able to switch between different versions of a given package. Finally, it will be more difficult to rerun a job on another computer or to provide instructions to reproduce your software environment and, therefore, your research.
To conclude this long explanation, here is a short demonstration. First, load the modules for Python and Scientific Python:
[alice@ip09 ~]$ module load python/3.11.5
[alice@ip09 ~]$ module load scipy-stack/2024b
Then, create a virtual environment:
[alice@ip09 ~]$ virtualenv --no-download $HOME/venv/qutip
Activate the environment:
[alice@ip09 ~]$ source $HOME/venv/qutip/bin/activate
Notice that the command prompt changes to show the active virtual environment. All actions performed by the pip command (installing, uninstalling, updating packages) will now target the $HOME/venv/qutip directory.
The first thing to do is to update pip:
(qutip) [alice@ip09 ~]$ pip install --no-index --upgrade pip
Then, install packages, such as QuTiP:
(qutip) [alice@ip09 ~]$ pip install --no-index qutip==5.0.1
Finally, deactivate the environment:
(qutip) [alice@ip09 ~]$ deactivate
Once the environment has been created, it can be reused simply by activating it again; there is no need to reinstall any packages. For example, the above environment can be used in a job script with:
module purge
module load python/3.11.5
module load scipy-stack/2024b
source $HOME/venv/qutip/bin/activate
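For instance, a complete job script using this environment might look like the following sketch; the account name, resource values, and script name are placeholders to adapt to your own allocation and code:

```shell
#!/bin/bash
#SBATCH --account=def-someuser   # placeholder account
#SBATCH --cpus-per-task=4        # placeholder core count
#SBATCH --mem=8G                 # placeholder memory request
#SBATCH --time=01:00:00          # placeholder time limit

module purge
module load python/3.11.5
module load scipy-stack/2024b
source $HOME/venv/qutip/bin/activate

python my_simulation.py          # hypothetical script
```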
Installing outside a virtual environment¶
If you try to install Python packages without first activating a virtual environment, you will get the following error:
[alice@ip09 ~]$ pip install --no-index numpy
ERROR: Could not find an activated virtualenv (required).
If you nonetheless wish to install a package outside a virtual environment, you can do it with:
[alice@ip09 ~]$ PIP_REQUIRE_VIRTUALENV=false pip install --no-index numpy
Note
This behaviour differs from that of Alliance clusters, where it is possible by default to install Python packages outside a virtual environment.
Precompiled Python packages¶
The avail_wheels command lists Python software packages precompiled by the Alliance software team. These packages are optimised for HPC. For instance, to search for Qiskit:
[alice@ip09 ~]$ avail_wheels qiskit
name version python arch
------ --------- -------- -------
qiskit 1.2.4 cp38 generic
To install this precompiled version in an active virtual environment:
(qiskit) [alice@ip09 ~]$ pip install --no-index qiskit==1.2.4
Parallel computing with Python¶
Python code is typically not parallel. As a consequence, asking for more than one CPU core will not automatically accelerate your jobs! You first need to parallelise your code, either explicitly or by using parallelised library functions, such as some of those in NumPy or SciPy.
Due to an intrinsic limitation, the “global interpreter lock”, Python code cannot be parallelised using the shared memory model. However, there are alternatives. One is to create a C/C++ Python extension using a parallel programming library such as OpenMP. Another is to use the distributed memory model with multiple Python processes. To do so, you can use the multiprocessing module, or a library such as mpi4py (message passing) or Dask (distributed computing).
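As a minimal illustration of the distributed memory model with the multiprocessing module, the following sketch maps a function over a range of inputs in parallel; the square function is just a stand-in for real work:

```python
from multiprocessing import Pool

def square(x):
    # Stand-in for a real computation; runs in a separate worker process.
    return x * x

if __name__ == '__main__':
    # Four worker processes, each receiving a share of the inputs.
    with Pool(4) as pool:
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because each worker is a separate process with its own interpreter, this approach sidesteps the global interpreter lock entirely.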
Thread oversubscription¶
A common problem when dealing with parallelism in Python is thread oversubscription: the number of execution threads started in a job is greater than the number of allocated CPU cores. The multiprocessing module, in particular, starts by default as many worker processes as there are CPU cores on the node, with no regard to whether or not these cores are accessible. For example, by default, multiprocessing would start 64 worker processes when used in a job allocated to an IQ HPC Platform CPU node, even if you requested only 2, 4, or 8 cores.
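You can observe this mismatch from inside a job: os.cpu_count(), which multiprocessing uses for its default, reports every core on the node, while on Linux os.sched_getaffinity(0) reports the cores the process may actually run on, which inside a Slurm job typically matches the allocation:

```python
import os

# Total number of cores on the node, regardless of the job's allocation.
print(os.cpu_count())

# Cores this process may actually run on (Linux only); inside a Slurm job,
# this typically matches the number of allocated cores.
print(len(os.sched_getaffinity(0)))
```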
This problem is compounded when using parallelised functions that also start as many threads as there are cores (for instance scipy.sparse.linalg.eigsh). To build on the above example, in a job that uses both multiprocessing and eigsh, 4096 execution threads (64 processes × 64 threads) would be started by default, even if the job only has access to 2, 4, or 8 cores. Performance is thus drastically reduced.
To mitigate this problem, you must instruct SciPy, multiprocessing, Dask, etc. to use the right number of execution threads. By adding the following instructions to your job script (before your actual calculation), you disable implicit parallelism in most functions, including those in SciPy, which use OpenMP or Intel MKL in the background:
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
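If you prefer to set these variables from Python rather than in the job script, do so before NumPy or SciPy is first imported, since the underlying threading libraries generally read them when they are loaded:

```python
import os

# Must run before the first `import numpy` or `import scipy`.
# setdefault keeps any value already exported by the job script.
os.environ.setdefault('OMP_NUM_THREADS', '1')
os.environ.setdefault('MKL_NUM_THREADS', '1')
```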
To control the number of processes started by multiprocessing
:
from multiprocessing import Pool
from os import environ
nprocesses = int(environ.get('SLURM_CPUS_PER_TASK', default=1))
pool = Pool(nprocesses)
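To continue this sketch, the pool can then distribute work across exactly the allocated cores; cube is a hypothetical work function standing in for your computation:

```python
from multiprocessing import Pool
from os import environ

def cube(x):
    # Hypothetical work function executed in a worker process.
    return x ** 3

if __name__ == '__main__':
    # Match the number of workers to the cores Slurm allocated.
    nprocesses = int(environ.get('SLURM_CPUS_PER_TASK', default=1))
    with Pool(nprocesses) as pool:
        print(pool.map(cube, range(5)))  # [0, 1, 8, 27, 64]
```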
With Dask:
from os import environ
from dask.distributed import LocalCluster
nprocesses = int(environ.get('SLURM_CPUS_PER_TASK', default=1))
cluster = LocalCluster(n_workers=nprocesses)
Conversely, if you do not use multiprocessing, Dask, etc., but would rather take advantage of SciPy’s parallel functions, set the number of execution threads with:
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
See also
This FAQ entry discusses threads and performance issues in general.