This page details how jobs are charged, and how to work out how many UCs your jobs consume.
The central concept for understanding how jobs are charged is the Computational Unit (UC, from its acronym in Catalan).
Every group with access to our HPC service will be granted a number of UCs. That is the upper limit of consumption for a group in a given year, and both the assigned limit and the number of UCs consumed so far can be checked with the command
`consum`
This information is also printed when logging into our cluster.
Our cluster is heterogeneous, and in order to both rationalise resource usage and establish an accounting equivalence between different architectures, all charges are translated from computational hours (HCs) into computational units (UCs).
| Partition | Conversion factor |
|---|---|
| std | 1 UC/HC |
| std-fat | 1.5 UC/HC |
| mem | 2 UC/HC |
| gpu | 1 UC/HC |
| knl | 0.5 UC/HC |
In simple terms, UCs act as a virtual currency in which jobs are charged, and conversion factors are the price of a single computational hour (1 CPU * 1 hour) for each architecture.
The only resource that is charged for is CPU time. The cost of any job, in any architecture, is determined by the number of CPUs it uses times the wall-clock length of the job.
Note that this means that no other resources, such as memory usage or the GPUs themselves, are taken into account when charging a job.
The formula to find out how much a job costs you is therefore very simple:
UCs charged = CPUs assigned to the job * Job length in hours * Conversion factor of the partition you're using
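The formula above can be sketched as a small Python helper. This is an illustrative snippet, not a tool provided on the cluster; the function name `uc_cost` and the dictionary are assumptions, while the conversion factors come from the table above.

```python
# Conversion factors per partition, taken from the table above (UC per HC).
CONVERSION_FACTOR = {
    "std": 1.0,
    "std-fat": 1.5,
    "mem": 2.0,
    "gpu": 1.0,
    "knl": 0.5,
}

def uc_cost(cpus: int, hours: float, partition: str) -> float:
    """UCs charged = CPUs assigned * wall-clock hours * partition factor."""
    return cpus * hours * CONVERSION_FACTOR[partition]

# Example: a 48-core job running for 10 hours on the std partition.
print(uc_cost(48, 10, "std"))  # 480.0
```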
The standard partition is the reference for all conversion factors. As a result, jobs that run on the std partition are charged at a rate of 1 UC per HC, which is to say, 1 UC per core per hour.
The standard-fat partition contains "fat" nodes with identical CPUs but twice as much memory per core as those in the standard partition. To encourage a rational use of those resources for jobs that actually require them, the conversion factor for this partition is 1.5 UC/HC - meaning that 1.5 UC is charged per core per hour.
Note that this is less expensive than running the same job on twice as many cores in the standard partition just to gain access to the same amount of memory.
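The comparison in the previous sentence can be checked with a quick calculation. This sketch uses a hypothetical `uc_cost` helper (not a site tool) and illustrative job sizes:

```python
# Two ways to double the memory available to a 24-core job running 10 hours:
# run it on std-fat (1.5 UC/HC), or request 48 cores on std (1 UC/HC)
# purely to reach the same total memory.
def uc_cost(cpus, hours, factor):
    return cpus * hours * factor

fat_cost = uc_cost(24, 10, 1.5)  # 24 cores on std-fat
std_cost = uc_cost(48, 10, 1.0)  # 48 cores on std for the same memory
print(fat_cost, std_cost)  # 360.0 480.0, so std-fat is the cheaper option
```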
The shared memory partition corresponds to canigo1 and canigo2, our shared memory machines. These machines have a considerably larger memory-per-core ratio, up to 24 GB/core. Again, in order to encourage a responsible use of these resources, jobs there are charged at a rate of 2 UCs per core per hour.
The GPU partition contains four nodes with identical CPUs and memory to standard nodes, but which additionally contain two Tesla P100 GPGPUs each. The GPUs themselves are free of charge, but to ensure a responsible use of resources, each GPU is only allocated together with an entire CPU socket containing 24 cores (see How to request GPUs for more details). The conversion factor is 1 UC per core per hour, but since jobs in the gpu partition are only assigned multiples of 24 cores, in practice these jobs are charged at an effective rate of 24 UCs per set of P100 GPU plus 24-core socket per hour.
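The effective GPU rate follows directly from the allocation rule. A minimal sketch, where `gpu_job_cost` is a hypothetical helper:

```python
# Each GPU is allocated with a full 24-core socket, billed at 1 UC/HC,
# so a GPU job's cost is driven entirely by its CPU allocation.
CORES_PER_GPU = 24
GPU_FACTOR = 1.0  # UC per core-hour on the gpu partition

def gpu_job_cost(num_gpus, hours):
    return num_gpus * CORES_PER_GPU * hours * GPU_FACTOR

print(gpu_job_cost(2, 5))  # 240.0 UCs for two GPU+socket sets over 5 hours
```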
This partition contains four Intel Knights Landing (KNL) nodes with specialty, highly parallel CPUs tailored to certain tasks. To ensure an efficient use of those resources, jobs can only be assigned whole nodes of 68 physical cores. Since the conversion factor is 0.5 UC/HC, in practice these jobs are charged at an effective rate of 34 UCs per node per hour.
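The per-node KNL rate works out the same way. Again a sketch with a hypothetical helper, `knl_job_cost`:

```python
# KNL jobs are allocated whole 68-core nodes, billed at 0.5 UC/HC,
# giving 68 * 0.5 = 34 UCs per node per hour.
CORES_PER_NODE = 68
KNL_FACTOR = 0.5  # UC per core-hour on the knl partition

def knl_job_cost(nodes, hours):
    return nodes * CORES_PER_NODE * hours * KNL_FACTOR

print(knl_job_cost(1, 1))  # 34.0 UCs for one node over one hour
```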