
Custom commands

For ease of use, you can create custom commands to automate common actions. For example, you can define a custom command drain_node that drains a Slurm node of all its jobs and returns it to the idle state, so you don't have to remember the exact commands each time.

How to add a custom command

To add a custom command, create a new script file in ~/bin. Note that this location is not shared between nodes or between users, so it has to be set up separately for each user on every node where the command is needed. Once ~/bin is on your PATH and the file has the right permissions, the command is immediately available.

This is a step-by-step guide to create a custom command:

  1. Create a folder that will contain all your custom scripts. It can be located anywhere, but for ease of use we will create it in our home directory:

    mkdir -p ~/bin
    
  2. Add this folder to your PATH by adding this line at the end of your ~/.bashrc (e.g. using nano ~/.bashrc):

    ~/.bashrc
    ...
    export PATH="$HOME/bin:$PATH"
    

    Then either run source ~/.bashrc or open a new shell to apply the change.

  3. Create the new script inside ~/bin. Start the file with the line #!/bin/bash (the shebang), which tells the system to run it with Bash. The following script, called custom_command, simply prints a string:

    ~/bin/custom_command
    #!/bin/bash
    echo "Hello world! I'm a custom command!"
    
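  4. Make sure you own the script and can execute it. For simplicity, you can set the permissions for the entire folder at once (744 gives you read, write and execute permission, and everyone else read-only access):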

    chown -R $USER ~/bin && chmod -R 744 ~/bin
    

Now you can run custom_command from your shell to print the string.
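The four steps above can be bundled into a single function. This is a minimal sketch, assuming Bash and the ~/bin layout described here; the name setup_custom_bin and its optional home-directory argument are hypothetical, added so the setup can be tried in a scratch directory first. It appends the PATH line only if it is not already present, so running it twice is harmless, and it uses chmod u+rwx instead of 744, which only adds owner permissions and leaves the group and other bits unchanged.

```shell
# Hypothetical helper: performs steps 1-4 above in one go.
# $1 is the home directory to use (defaults to $HOME); pass a scratch
# directory to try it without touching your real setup.
setup_custom_bin() {
    local home_dir="${1:-$HOME}"
    local bin_dir="$home_dir/bin"
    local rc_file="$home_dir/.bashrc"
    # Literal line to add to .bashrc ($HOME and $PATH expand at login, not here)
    local path_line='export PATH="$HOME/bin:$PATH"'

    # Step 1: create the folder (no error if it already exists)
    mkdir -p "$bin_dir"

    # Step 2: append the PATH line only if an identical line is not there yet
    touch "$rc_file"
    if ! grep -qxF "$path_line" "$rc_file"; then
        echo "$path_line" >> "$rc_file"
    fi

    # Step 3: create the example command
    cat > "$bin_dir/custom_command" <<'EOF'
#!/bin/bash
echo "Hello world! I'm a custom command!"
EOF

    # Step 4: give the owner read, write and execute permission on everything
    chmod -R u+rwx "$bin_dir"
}
```

After calling setup_custom_bin with no argument, run source ~/.bashrc and custom_command becomes available in the current shell.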

List of implemented admin custom commands

Connections

connect_to_node

Connect to a node through ssh. Requires a node ID as a parameter (e.g. 102), which is the last number of the node's IP address 192.168.0.<ID>.

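#!/bin/bash
set -e

# Node ID = last octet of the node's IP address (e.g. 102 -> 192.168.0.102)
NODE_ID=${1:?Usage: connect_to_node <node_id>}

ssh "guest@192.168.0.$NODE_ID"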
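The script relies on the convention that node <ID> answers on 192.168.0.<ID>, and it passes the argument to ssh unchecked, so a typo such as 1O2 only fails at connection time. Below is a minimal sketch of a hypothetical node_ip helper that validates the ID first, assuming IDs are plain integers from 1 to 254 (the usable host range of a /24 network, which is itself an assumption about this cluster).

```shell
# Hypothetical helper: map a node ID to its IP address, assuming the
# 192.168.0.<ID> scheme used by connect_to_node.
node_ip() {
    local id="$1"
    # Accept only digits without a leading zero, and at most 254
    if [[ ! "$id" =~ ^[1-9][0-9]*$ ]] || (( id > 254 )); then
        echo "invalid node ID: '$id' (expected 1-254)" >&2
        return 1
    fi
    echo "192.168.0.$id"
}
```

Inside connect_to_node it could be used as ip=$(node_ip "$1") || exit 1 followed by ssh "guest@$ip".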

connect_to_submitter

Connect to the submitter node through ssh.

#!/bin/bash
set -e

connect_to_node 102

connect_to_controller

Connect to the controller node through ssh.

#!/bin/bash
set -e

connect_to_node 115
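connect_to_submitter and connect_to_controller differ only in the node ID they pass to connect_to_node (102 and 115). If more roles are added later, one option is a single dispatcher. The sketch below is hypothetical (connect_to_role and role_to_node_id are not existing commands) and assumes connect_to_node is on your PATH.

```shell
# Hypothetical mapping from role name to node ID.
# IDs 102 and 115 come from connect_to_submitter and connect_to_controller.
role_to_node_id() {
    case "$1" in
        submitter)  echo 102 ;;
        controller) echo 115 ;;
        *)
            echo "unknown role: '$1' (expected submitter or controller)" >&2
            return 1
            ;;
    esac
}

# Hypothetical dispatcher: connect_to_role controller == connect_to_controller
connect_to_role() {
    local id
    # Declared separately so the exit status of role_to_node_id is kept
    id=$(role_to_node_id "$1") || return 1
    connect_to_node "$id"
}
```

Adding a new role then means adding one line to the case statement instead of a new script.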

Containers

docker_kill_old

Kill all containers running for more than 30 minutes. This command is automatically executed every minute using crontab.

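#!/bin/bash
set -e

MAX_AGE_MINUTES=30
CURRENT_TIME=$(date +%s)

echo "Checking containers at $(date -u)"
# Iterate over all running containers
for id in $(docker ps -q); do
    # Container start time (RFC 3339); skip it if it exited in the meantime
    started_at=$(docker inspect -f '{{.State.StartedAt}}' "$id" 2>/dev/null) || continue
    [[ -n "$started_at" ]] || continue

    # Convert to seconds since the epoch and skip the container if parsing fails.
    # "|| continue" is required: with set -e a failed assignment would abort the run.
    started_epoch=$(date --date="$started_at" +%s 2>/dev/null) || continue

    # Calculate age in minutes
    age=$(((CURRENT_TIME - started_epoch) / 60))
    if ((age > MAX_AGE_MINUTES)); then
        # A failed kill (e.g. container already gone) must not stop the loop
        if docker kill "$id" >/dev/null; then
            echo "Killed $id that was running for $age minutes"
        fi
    fi
done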
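The age check can be tried without Docker by isolating the date arithmetic. This is a minimal sketch, assuming GNU date, which parses the RFC 3339 timestamps that docker inspect prints for .State.StartedAt; the function name age_minutes is hypothetical.

```shell
# Hypothetical helper mirroring the age calculation in docker_kill_old.
# $1: start time as printed by docker inspect, e.g. 2024-01-01T00:00:00Z
# $2: current time in seconds since the epoch (date +%s)
age_minutes() {
    local started_epoch
    # GNU date turns an empty string into today's midnight, so reject it
    [[ -n "$1" ]] || return 1
    started_epoch=$(date --date="$1" +%s 2>/dev/null) || return 1
    # Integer division: anything under a full minute is rounded down
    echo $(( ($2 - started_epoch) / 60 ))
}
```

Because of the rounding down and the strict > comparison, a container is killed only once it has been running for at least 31 full minutes.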

Slurm

drain_node

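Drains a Slurm node of all its jobs and returns it to the idle state. It first sets the node to DOWN, which terminates its jobs, then resumes it. It then resets the UID of the slurm user on the node to 1217, restarts slurmd over ssh and prints the cluster state with sinfo. Requires a node ID as a parameter (e.g. 102).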

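#!/bin/bash
set -e

# Node ID, e.g. 102 for node102 at 192.168.0.102
NODE_ID=${1:?Usage: drain_node <node_id>}

# Setting the node DOWN terminates its jobs; RESUME returns it to service
scontrol update NodeName="node$NODE_ID" State=DOWN Reason="undraining"
scontrol update NodeName="node$NODE_ID" State=RESUME
# Reset the slurm user's UID on the node and restart the Slurm daemon
ssh -t "guest@192.168.0.$NODE_ID" "usermod -u 1217 slurm && systemctl restart slurmd"
# Give slurmd a moment to re-register, then show the cluster state
sleep 2
sinfo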
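drain_node waits a fixed two seconds before printing sinfo, so the node may not have reached its final state yet. A hypothetical alternative is to poll the node's compact state with sinfo -h -n <node> -o %t until it reports the expected value or a timeout expires. The function name wait_for_node_state, the one-second poll interval and the 30-second default timeout are all assumptions. Note that %t can carry a suffix such as * when the node is not responding; in that case this sketch keeps waiting.

```shell
# Hypothetical helper: poll Slurm until a node reports the expected state.
# $1: node name (e.g. node102), $2: expected compact state (e.g. idle),
# $3: timeout in seconds (default 30)
wait_for_node_state() {
    local node="$1" expected="$2" timeout="${3:-30}"
    local state="" elapsed=0
    while (( elapsed < timeout )); do
        # -h: no header, -n: only this node, -o %t: compact state.
        # A node in several partitions prints one line each; keep the first.
        state=$(sinfo -h -n "$node" -o %t 2>/dev/null | head -n 1)
        if [[ "$state" == "$expected" ]]; then
            echo "$node is $state"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out after ${timeout}s: $node is '$state', expected '$expected'" >&2
    return 1
}
```

In drain_node, sleep 2 could then be replaced with wait_for_node_state "node$NODE_ID" idle, keeping in mind that a node which immediately picks up new jobs will report a state other than idle.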