FAQs
General¶
How do I access the cluster?¶
You first have to log in to the frontend node and then to the submitter node, using your account `<user>`:
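A minimal sketch of the two hops, assuming `<frontend_address>` stands for the frontend address the admins gave you (the submitter IP is the one mentioned in the VPN answer below):

```bash
# Hop 1: log in to the frontend node.
ssh <user>@<frontend_address>

# Hop 2: from the frontend, log in to the submitter node.
ssh <user>@192.168.0.102
```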
Once on the submitter you can use Slurm.
For details, please refer to this page.
Why do I get "connection timed out" when connecting to the cluster?¶
Probably you're not connected to the Sapienza network/VPN.
Additionally, remember that if you're connected through the Department's VPN you're already on the frontend network, so you can connect straight to the submitter node using `ssh <user>@192.168.0.102`.
How do I get an account?¶
Contact an admin at cluster.di@uniroma1.it, stating your role (student, PhD student, researcher, professor, etc.).
Alternatively, if you have the `group_leaders` role, you can create a new account yourself using the `add_user_hpc` command from the frontend.
How do I create an account for a student?¶
If you have the `group_leaders` role, you can create a new account using the `add_user_hpc` command from the frontend.
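A sketch of the invocation; the exact arguments, if any, are site-specific, so the bare call below is an assumption (ask an admin if it expects parameters):

```bash
# Run on the frontend; requires the group_leaders role.
add_user_hpc
```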
How do I get the `department` or `group_leaders` role?¶
Contact an admin at cluster.di@uniroma1.it, stating your role (student, PhD student, researcher, professor, etc.).
If you are a PhD student, a researcher, or a professor at our Department, then you're eligible for the `department` role.
If you are a professor and you lead a laboratory, then you're eligible for the `group_leaders` role.
How do I change my password?¶
Once on the frontend, use the `change_password` command.
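For example, from a shell on the frontend:

```bash
change_password
```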
Storage¶
How do I transfer files from/to the cluster?¶
Please take a look at this page of the docs.
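As a quick sketch of the usual approach (the linked page is authoritative; `<frontend_address>` is a placeholder for the frontend's address), both `scp` and `rsync` work over SSH:

```bash
# Copy a single file from your machine to your home on the cluster.
scp results.tar.gz <user>@<frontend_address>:~/

# Sync a whole directory; resumable, with progress output.
rsync -avP datasets/ <user>@<frontend_address>:~/datasets/
```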
Jobs¶
How do I check the resources of each node?¶
You can do this using the `sinfo` command with custom formatting.
The basic command is simply `sinfo`, but its default output is not very detailed.
If you need more specific information about each node, you can pass a custom format string to `sinfo`. The following format specifiers show node names, partitions, CPUs per node, memory per node (in MB), GPUs per node, and time limit, with the numbers specifying how many characters each column gets:
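A command along these lines produces the table shown below; the format specifiers are Slurm's standard ones, while the exact column widths are a guess:

```bash
sinfo -o "%38N %16P %5c %8m %22G %11l"
```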
For a list of all the available formatters, please refer to Slurm's official docs. The result will then look like this:
```
NODELIST                              PARTITION        CPUS  MEMORY   GRES                   TIMELIMIT
node120                               admin            64    257566   gpu:quadro_rtx_6000:2  infinite
node121                               admin            64    257566   (null)                 infinite
node122                               admin            64    257566   gpu:quadro_rtx_6000:1  infinite
node[103,114,116,130,132-135,137-143] department_only  64    257566+  (null)                 3-00:00:00
node[112-113,123-124,126]             department_only  64    257566   gpu:quadro_rtx_6000:2  3-00:00:00
node[106-109,145-149,151]             students*        32+   257566+  (null)                 1-00:00:00
node110                               students*        64    257566   gpu:quadro_rtx_6000:1  1-00:00:00
node111                               students*        64    257566   gpu:quadro_rtx_6000:2  1-00:00:00
```
What are the fairshare politics of the cluster?¶
Please refer to the priorities and fairshare sections of the cluster's current `slurm.conf` file (`slurm/slurm.conf`) and to the official Slurm documentation.
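For orientation only, Slurm's fairshare behaviour is driven by the multifactor priority plugin, configured with parameters like the ones below. These values are generic illustrations, not the cluster's actual settings:

```
# Illustrative slurm.conf excerpt -- NOT the cluster's real values.
PriorityType=priority/multifactor   # enable multifactor job priority
PriorityDecayHalfLife=7-0           # past usage decays with a 7-day half-life
PriorityWeightFairshare=10000       # weight of the fairshare factor
PriorityWeightAge=1000              # weight of time spent waiting in the queue
PriorityWeightPartition=1000        # weight of the partition's priority
```

You can inspect your own fairshare standing with the `sshare` command.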
How do I use NVCC?¶
To use NVCC (the NVIDIA CUDA Compiler) you need to be on a node with a GPU. Once there, append some lines to your `~/.bashrc` file:
echo "export PATH=/usr/local/cuda-12.8/bin:$PATH" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH" >> ~/.bashrc
source ~/.bashrc
After that, `nvcc` will be available. You can check its version with the `nvcc --version` command:
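With CUDA 12.8 installed, the output will look roughly like this (abbreviated):

```bash
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
...
Cuda compilation tools, release 12.8, ...
```

As a further sketch, you can compile and run a CUDA source file on a GPU node through Slurm (`hello.cu` is a placeholder; the `students` partition is reused from the container examples on this page):

```bash
srun --gpus=1 -p students bash -c 'nvcc hello.cu -o hello && ./hello'
```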
How do I use MPI?¶
Every node has OpenMPI 5.0.7 installed. You can check the details using the `ompi_info` command.
Please refer to Slurm's official MPI documentation about how to use it.
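A minimal sketch of running an MPI program through Slurm, assuming the installed OpenMPI was built with PMIx support (you can verify the available methods with `srun --mpi=list`); `./my_mpi_program` is a placeholder for your compiled binary:

```bash
# Show details of the OpenMPI installation.
ompi_info | head

# Run a 4-task MPI program under Slurm using PMIx.
srun --mpi=pmix -n 4 ./my_mpi_program
```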
How do I launch a container?¶
You can launch containers using either Docker or Singularity. You're encouraged to use Singularity, as Docker is known to cause issues on HPC systems.
To launch a non-interactive container from image `<image_name>` on Docker Hub with command `<command>`, simply run:
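The general form, reconstructed from the example below (add `--nv` to expose the node's GPUs inside the container):

```bash
srun singularity exec docker://<image_name> <command>
```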
Example
For example, to get to know the GPU characteristics of a node from inside a container:
```bash
srun --gpus=1 -p students singularity exec --nv docker://pytorch/pytorch:2.6.0-cuda11.8-cudnn9-devel nvidia-smi
```
It will take a while, since the image is big and Singularity must convert it from the Docker format to the Singularity (SIF) format:
```
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
INFO:    Fetching OCI image...
INFO:    Extracting OCI image...
INFO:    Inserting Singularity configuration...
INFO:    Creating SIF file...
Tue Mar 25 11:04:36 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 6000                Off |   00000000:41:00.0 Off |                  Off |
| 32%   36C    P0             62W / 260W  |      1MiB / 24576MiB   |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
To launch an interactive container from image `<image_name>` on Docker Hub, simply run:
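Again reconstructed from the example below, the general form is (add `--nv` after `shell` to expose the GPUs):

```bash
srun singularity shell docker://<image_name>
```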
Example
For example, to connect to bash on a node with a GPU:
```bash
srun --gpus=1 -p students singularity shell --nv docker://pytorch/pytorch:2.6.0-cuda11.8-cudnn9-devel
```
It will take a while, since the image is big and Singularity must convert it from the Docker format to the Singularity format. Once the conversion finishes you are dropped into a shell inside the container, from which you can, for example, start a Python interpreter.
How do I launch a Visual Studio Code instance?¶
There is a dedicated command for this; for the command itself and more details, please take a look at this page of the docs.
Why does Visual Studio Code keep disconnecting?¶
Visual Studio Code is intended for interactive use. Thus, if your local instance disconnects from the network, the remote instance will be terminated as well.