Python ====== The information below is complementary to the `Python `_ page in the Alliance technical documentation. Modules ------- To load a Python version compatible with the default software environment: .. code-block:: console [alice@ip09 ~]$ module avail python ------------------------------------- Core Modules -------------------------------------- ipython-kernel/3.10 python-build-bundle/2024a (D) ipython-kernel/3.11 (D) python/3.10.13 (t,3.10) ipython-kernel/3.12 python/3.11.5 (t,D:3.11) python-build-bundle/2023b python/3.12.4 (t) ... [alice@ip09 ~]$ module load python/3.11.5 The recommended version (default module) is indicated by ``(D)``. If you need a version which is not available in the default software environment, use `module spider python` to list all available versions. In addition to modules for Python itself, the Alliance software contains modules for `Scientific Python`. Named ``scipy-stack``, they provide ``scipy`` but also ``numpy``, ``pandas``, ``matplotlib``, etc. Once your module for Python itself is loaded: .. code-block:: console [alice@ip09 ~]$ module avail scipy-stack ------------------------------------- Core Modules -------------------------------------- scipy-stack/2023b (math) scipy-stack/2024a (math) scipy-stack/2024b (math,D) ... [alice@ip09 ~]$ module load scipy-stack/2024b Virtual environments -------------------- As explained in the next section, Python packages (other than `Scientific Python`) can be installed with the ``pip`` command, either by downloading them from PyPI or by choosing from a collection of packages precompiled by the Alliance software team. However, we recommend to never install Python packages without first activating a virtual environment. Virtual Python environment are directories containing a set of packages installed for a given job. For instance, you could have a virtual environment to simulate quantum systems with QuTiP and another for machine learning with TensorFlow. You could even have several virtual environments for different QuTiP or TensorFlow versions, as required for various projects. By systematically using a virtual environment for your Python jobs, you facilitate software installation, debugging, and research reproducibility. Conversely, if you install all your Python packages in the default location, you will encounter more software compatibility issues, and installing or updating a package can cause previously successful jobs to stop working. In addition, you will not be able to switch between different versions of a given package. Finally, it will be more difficult to rerun a job on another computer or to provide instructions to reproduce your software environment and, therefore, your research. To conclude this long explanation, here is a short demonstration. First, load the modules for Python and `Scientific Python`: .. code-block:: console [alice@ip09 ~]$ module load python/3.11.5 [alice@ip09 ~]$ module load scipy-stack/2024b Then, create a virtual environment: .. code-block:: console [alice@ip09 ~]$ virtualenv --no-download $HOME/venv/qutip Activate the environment: .. code-block:: console [alice@ip09 ~]$ source $HOME/venv/qutip/bin/activate Notice that the command prompt changes to show the active virtual environment. All actions performed by the ``pip`` command (installing, uninstalling, updating packages) will now target the ``$HOME/venv/qutip`` directory. The first thing to do is update ``pip``: .. code-block:: console (qutip) [alice@ip09 ~]$ pip install --no-index --upgrade pip Then, install packages, such as QuTiP : .. code-block:: console (qutip) [alice@ip09 ~]$ pip install --no-index qutip==5.0.1 Finally, deactivate the environment. .. code-block:: console (qutip) [alice@ip09 ~]$ deactivate Once the environment has been created, it can be reused simply by activating it again; there is no need to reinstall any packages. For example, the above environment can be used in a job script with: .. code-block:: bash module purge module load python/3.11.5 module load scipy-stack/2024b source $HOME/venv/qutip/bin/activate Installing outside a virtual environment '''''''''''''''''''''''''''''''''''''''' If you try to install Python packages without first activating a virtual environment, you will get the following error: .. code-block:: console [alice@ip09 ~]$ pip install --no-index numpy ERROR: Could not find an activated virtualenv (required). If you nonetheless wish to install a package outside a virtual environment, you can do it with: .. code-block:: console [alice@ip09 ~]$ PIP_REQUIRE_VIRTUALENV=false pip install --no-index numpy .. note:: This behaviour differs from that of Alliance clusters, where it is possible by default to install Python packages outside a virtual environment. Precompiled Python packages --------------------------- The ``avail_wheels`` command lists Python software packages precompiled by the Alliance software team. These packages are optimised for HPC. For instance, to search for Qiskit: .. code-block:: console [alice@ip09 ~]$ avai l_wheels qiskit name version python arch ------ --------- -------- ------- qiskit 1.2.4 cp38 generic To install this precompiled version in an active virtual environment: .. code-block:: console (qiskit) [alice@ip09 ~]$ pip install --no-index qiskit==1.2.4 Parallel computing with Python ------------------------------ Python code is typically not parallel. As a consequence, asking for more than one CPU core will not automatically accelerate your jobs! You first need to parallelise your code, either explicitly or by using parallelised library functions, such as some of those in NumPy or SciPy. Due to an intrinsic limitation, the “global interpreter lock”, Python code cannot be parallelised using the shared memory model. However, there are alternatives. One is to create a C/C++ Python extension using a parallel programming library such as OpenMP. Another is to use the distributed memory model with multiple Python processes. To do so, you can use the ``multiprocessing`` module, or a library such as `mpi4py `_ (message passing) or `Dask `_ (distributed computing). .. _python-fils-label: Thread oversubscription ''''''''''''''''''''''' A common problem when dealing with parallelism in Python is thread oversubscription: the number of execution threads started in a job is greater than the number of allocated CPU cores. The ``multiprocessing`` module, in particular, starts by default as many threads as there are CPU cores, with no regards to whether or not these cores are accessible. For example, by default, ``multiprocessing`` would start 64 execution threads when used in a job allocated to an IQ HPC Platform CPU node, even if you requested only 2, 4, or 8 cores. This problem is compounded when using parallelised functions that also start as many threads as there are cores (for instance ``scipy.sparse.linalg.eigsh``). To build on the above example, in a job that uses both ``multiprocessing`` and ``eigsh``, 4096 execution threads (64 × 64) would be started by default, even if the job only has access to 2, 4, or 8 cores. Performance is thus drastically reduced. To paliate this problem, you must instruct SciPy, ``multiprocessing``, Dask, etc. to use the right number of execution threads. By adding the following instructions to your job script (before your actual calculation), you disable implicit parallelism in most functions, including those in SciPy, which use OpenMP or Intel MKL in the background: .. code-block:: bash export OMP_NUM_THREADS=1 export MKL_NUM_THREADS=1 To control the number of processes started by ``multiprocessing``: .. code-block:: python from multiprocessing import Pool from os import environ nprocesses = int(environ.get('SLURM_CPUS_PER_TASK', default=1)) pool = Pool(nprocesses) With Dask: .. code-block:: python from os import environ from dask.distributed import LocalCluster nprocesses = int(environ.get('SLURM_CPUS_PER_TASK', default=1)) cluster = LocalCluster(n_workers=nprocesses) Conversely, if you do not use ``multiprocessing``, Dask, etc. but would rather take advantage of SciPy’s parallel functions, set the number of execution threads with: .. code-block:: bash export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1} export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1} .. seealso:: - :ref:`This FAQ entry ` discusses threads and performance issues in general.