Oracle HPC

Note: As of January 2024, the Oracle HPC is being phased out. This page remains for historical reference only.

System layout / architecture 

The Biociphers Oracle HPC consists of two small 'bastion' nodes, one 'zfs/data storage' node, and a variable number of compute nodes, depending on the currently submitted jobs. The bastion nodes are small, so please be courteous and don't run intensive processes on them. There is a separate bastion machine for each type of job that can be submitted: one for CPU-bound tasks and one for GPU-bound tasks.


Storage

Primary storage is provided by Oracle in increments of 10 TB. Based on discussion with Yoseph, we started the system with a total size of 60 TB after paring down some storage from the intermediate legacy Oracle system.

In the background, the space is configured as a ZFS stripe over these volumes (with a higher-performance volume for cache). Please note that compression is enabled on the filesystem, so files will appear smaller than they would if transferred to a different system. To see the uncompressed (logical) size, use "du" with the option "--apparent-size". For example:

$ du -sh /data/home

109G    /data/home

$ du -sh --apparent-size /data/home

114G    /data/home
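
If the zfs tools are available and you have sufficient permissions, you can also inspect the compression settings directly. A sketch; the pool/dataset name "data" here is an assumption, so substitute the actual one:

$ zfs get compression,compressratio data

This reports the compression algorithm in use and the ratio actually achieved on the dataset.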


Access/Connection

Currently, the lab developer(s) (Paul Jewell, at the time of writing) can grant you access to the system. This access will come in the form of a username, a private key file, and a public key file. You will use the private key file (which by default has no file extension and the same name as your username) to log into the HPC.

The hostnames of the nodes to log into are:

hpc-cpu.biociphers.org (132.145.32.106)

hpc-gpu.biociphers.org (132.145.42.27)


For example, using standard openssh:

$ ssh -i /path/to/keyfile username@hpc-cpu.biociphers.org

If you get a message along the lines of "unprotected private key file", you will need to first run:

$ chmod 600 /path/to/keyfile

And try again. 
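
To avoid typing the key path every time, you can add entries to your local ~/.ssh/config. A sketch; substitute your actual username and key path:

Host hpc-cpu
    HostName hpc-cpu.biociphers.org
    User username
    IdentityFile /path/to/keyfile

Host hpc-gpu
    HostName hpc-gpu.biociphers.org
    User username
    IdentityFile /path/to/keyfile

After that, "$ ssh hpc-cpu" is enough to log in.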

Viewing and submitting jobs

The Oracle HPC uses Slurm as its workload manager, similar to CHOP's cluster. Ample supplementary manuals are available online covering the many options for the viewing and submission commands. As with other batch managers, you can either create a bash script for your job and submit it, or run interactively. Some simple use cases are described below:

Overview of available resources:

$ squeue

$ sinfo

These show an overview of jobs waiting or running, as well as the current number and type of nodes created.
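
A couple of variations that are often handy (standard Slurm switches):

$ squeue -u $USER

$ sinfo -N -l

The first limits the listing to your own jobs; the second prints one line per node, with its state and resources.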

General Workflow:

On this system, no compute nodes run by default. When you submit a batch or interactive job, appropriate nodes are created for you, which takes 10-15 minutes. After the interactive session ends, or when all batch jobs finish, the nodes are deleted after some idle time (between 15 minutes and an hour or so; Slurm decides), and you will need to wait through the warm-up time again when submitting a new task.

Note that there is a separation between *clusters* and *nodes*. Each cluster has nodes of the same size/type. Functionally, creating many one-node clusters runs your analysis the same as one cluster with many nodes; however, if you require many nodes, it is generally much faster to create them all in one cluster, because the system needs less overhead to add/remove nodes that way. There are also fewer clusters available than nodes. An example of requesting several nodes in a single cluster is sketched below.
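
For example, to bring up four nodes in a single cluster rather than four one-node clusters, request all four in one submission. A sketch using the same switches explained in the batch-job section below; job.sh is a placeholder script name:

$ sbatch -n 4 --constraint VM.Standard2.1 --ntasks-per-node 1 --exclusive --job-name four_node_job job.sh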

If you need GPUs, remember you *must* log into the GPU bastion to start them, not the CPU bastion!

Available CPU Oracle node shapes (Default): 

(note: all commands here show launching *a single node* in each cluster)

These nodes are temporarily disabled while an issue with Oracle is investigated. You can get equivalent power by using the node sizes below.

Available CPU Oracle node shapes (Custom):

New as of 19-Apr-2022

In addition to the fixed-size Intel shapes above, it is also possible to allocate a node with a specific number of CPU cores and a specific memory size, so that you use only what you need and memory does not have to scale with core count as it does above. Note that the combinations are not fully arbitrary: cores and memory each come only in the fixed increments listed below.

Available core counts:

(2, 4, 8, 16, 32, 64, 128)

Available Memory sizes:

(1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)

Note that these will be newer AMD chips instead of Intel. To use these sizes, pass a constraint string of the form hpc-<cores>c<memory_in_GB>g; for example, a node with 4 cores and 64 GB of memory:

hpc-4c64g

For example, the full launch command would look like:

$ sbatch -n 1 --constraint hpc-32c4g --ntasks-per-node 1 --exclusive --job-name sleep_job sleep.sh

Available GPU Oracle node shapes:

At the time of writing, Oracle seems to be out of stock for hpc-gpu31, 32, and 34 -- that is, _only_ hpc-gpu38 is available to use. 


Note that the core counts listed for these shapes include hyperthreading. If for some reason your software doesn't perform well with hyperthreading, halve the listed core count.
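
Once logged into a node, you can compare logical and physical core counts to decide whether halving is needed (standard Linux tools):

$ lscpu | grep -E 'Thread\(s\) per core|^CPU\(s\)'

$ nproc

"nproc" reports the logical CPUs visible to your job; if "Thread(s) per core" is 2, half of those are hyperthreads.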

Submit a batch job:

$ sbatch -n 1 --constraint VM.Standard2.1 --ntasks-per-node 1 --exclusive --job-name sleep_job sleep.sh

Explanation:

-n 1 : number of total cpu cores to use

--constraint VM.Standard2.1 : the type of Oracle instance shape to launch, explained above

--ntasks-per-node 1 : the number of cpu cores to assign to each node. As we don't usually work with MPI-based software, this should generally be the same as the '-n 1' total cpu count above

--exclusive : don't share the created nodes with other users. In our new deployment, where you allocate only what you need, this is generally what you want.

--job-name sleep_job : a name so you can remember which running job is which

sleep.sh : the batch script to run. In this case we execute a bash script with bash by having the first line read "#!/bin/bash"; a minimal example script is sketched after this list

Other useful switches: -o name_of_output_log_file and -e name_of_error_log_file (otherwise, log file names are chosen automatically)
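
A minimal example of what sleep.sh might look like (a sketch; the #SBATCH lines are optional and mirror the command-line switches above):

#!/bin/bash
#SBATCH -o sleep_job.out    # standard output log (optional, as above)
#SBATCH -e sleep_job.err    # standard error log (optional, as above)

# Placeholder workload; replace with your actual commands.
sleep 60
echo "job finished on $(hostname)"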

Submitting an interactive session:

$ srun -n 1 --pty --constraint VM.Standard2.1 --ntasks-per-node 1 --exclusive --job-name interactive_job bash -i

Note this uses much the same switches as the batch job. As usual, make sure to start a screen session first ($ screen -S name_of_session) so that you can get back to your job if you lose connectivity. This command creates the required node type and then logs you into a shell running on it, so you can run tasks interactively if needed.
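
Putting it together, a typical interactive workflow might look like the following (a sketch; the session and job names are arbitrary):

$ screen -S grid_session

$ srun -n 1 --pty --constraint VM.Standard2.1 --ntasks-per-node 1 --exclusive --job-name interactive_job bash -i

If the connection drops, log back into the bastion and reattach:

$ screen -r grid_session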


Software installation

If a program is large, has many system dependencies, or would be equally useful to all users, please ask the system administrator for suggestions on its installation before attempting the installation yourself.

In general, there is no issue with maintaining a virtualenv for majiq, moccasin, or other such frequently updated tools inside your home directory.
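
For example, a typical setup might look like the following (a sketch; the directory layout and package name are assumptions, adjust to taste):

$ python3 -m venv ~/venvs/majiq

$ source ~/venvs/majiq/bin/activate

$ pip install --upgrade pip

$ pip install <package>

Here <package> stands in for whatever release of majiq, moccasin, or another tool you need; keeping one virtualenv per tool under your home directory makes upgrades independent of each other.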