HPC

What is this?

HPC (High-Performance Computing) refers to shared computing clusters designed to run large-scale, compute-intensive workloads that exceed the capacity of local machines or small virtual machines.

These systems typically consist of many interconnected nodes, each with its own CPUs and often GPUs, managed by a scheduler that lets users run jobs in parallel at scale.

When should you use it?

You need to run large batches of LLM experiments
You need high-memory or multi-GPU jobs
You need parallel evaluation across many datasets or models
You are fine-tuning models or running large-scale benchmarking
You need reproducible, large-scale pipelines

When should you NOT use it?

For quick exploratory prompting
For small workloads that run well locally or on a VM
When your workflow requires real-time interaction
When your team cannot maintain job scripts and queue-based workflows

How it works (simple explanation)

You submit jobs to a scheduler (commonly Slurm) using batch scripts. The scheduler queues your job and executes it when the requested resources become available.

Jobs / batch scripts: Non-interactive scripts specifying what to run and required resources
Scheduler: Software (e.g., Slurm) that manages job submission, scheduling, and execution
Queue: A waiting line where jobs are held until resources become available; priority may depend on policies or quotas
Resources: Requested compute such as CPUs, GPUs, memory, runtime, and number of nodes
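
To make the four terms above concrete, here is a minimal sketch that renders a Slurm batch script from a resource request. The JobSpec class, the render_batch_script helper, the job name, and the evaluate.py command are all hypothetical and chosen only for illustration. The #SBATCH options are standard Slurm directives, but clusters differ in how GPUs are requested (some sites prefer --gpus over --gres) and may require a partition or account, so check your cluster's documentation.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Resources and command for one batch job (all values are illustrative)."""
    name: str
    command: str
    nodes: int = 1
    cpus: int = 4
    gpus: int = 1
    memory: str = "32G"
    time_limit: str = "02:00:00"  # HH:MM:SS


def render_batch_script(spec: JobSpec) -> str:
    """Return the text of a Slurm batch script for the given job spec."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={spec.name}",
        f"#SBATCH --nodes={spec.nodes}",
        f"#SBATCH --cpus-per-task={spec.cpus}",
        f"#SBATCH --mem={spec.memory}",
        f"#SBATCH --time={spec.time_limit}",
        "#SBATCH --output=%x-%j.out",  # %x = job name, %j = job ID
    ]
    if spec.gpus > 0:
        # Assumption: this cluster exposes GPUs as a generic resource (gres).
        lines.append(f"#SBATCH --gres=gpu:{spec.gpus}")
    # All #SBATCH directives must appear before the first executable command.
    lines += ["", "set -euo pipefail", spec.command, ""]
    return "\n".join(lines)


# Hypothetical job: evaluate.py and its arguments are placeholders.
gpu_job = JobSpec(name="llm-eval", command="python evaluate.py --model my-model")
script = render_batch_script(gpu_job)
cpu_script = render_batch_script(JobSpec(name="prep", command="python prep.py", gpus=0))
print(script)
```

Saving the printed text as a file (for example job.sh) and passing it to sbatch places the job in the queue. The scheduler starts it once the requested CPUs, GPU, memory, and time slot are available.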

Concrete examples (tools/platforms)

Academic and national HPC

University HPC clusters
National infrastructure such as SURF (e.g., Snellius)

See INSTITUTIONAL RESOURCES/HPC for more.

Commercial HPC (HPC-as-a-Service)

AWS (EC2 + ParallelCluster): scalable HPC clusters with GPU support
Microsoft Azure HPC: GPU clusters with high-speed interconnects
Google Cloud HPC: distributed workloads with GPUs/TPUs
Oracle Cloud Infrastructure (OCI) HPC: HPC with low-latency networking
OVHcloud HPC: European HPC and GPU clusters (GDPR-friendly option)

Example workflow (step-by-step)

Prepare a reproducible environment (modules, container, or environment file).
Write a batch script specifying CPU/GPU, memory, and time requirements.
Submit the job via the scheduler (e.g., sbatch).
Monitor job status and logs.
Validate outputs and iterate with adjusted parameters.
Archive results, logs, and configuration for reproducibility.
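
The submit, monitor, and archive steps above can be sketched in Python as thin wrappers around the Slurm command-line tools. This is an illustrative outline, not a complete pipeline: the function names, the run-001 directory, and the manifest layout are assumptions. The submit and job_state functions need a machine with Slurm installed, so only the parsing and archiving parts run in the demo at the bottom.

```python
import json
import re
import subprocess
import tempfile
from pathlib import Path


def parse_job_id(sbatch_output: str) -> str:
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return match.group(1)


def submit(script_path: Path) -> str:
    """Submit a batch script and return its job ID (requires Slurm)."""
    result = subprocess.run(
        ["sbatch", str(script_path)], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)


def job_state(job_id: str) -> str:
    """Return the state shown by squeue (e.g. PENDING, RUNNING).

    An empty string means the job is no longer in the queue (requires Slurm).
    """
    result = subprocess.run(
        ["squeue", "-h", "-j", job_id, "-o", "%T"], capture_output=True, text=True
    )
    return result.stdout.strip()


def archive_run(run_dir: Path, job_id: str, config: dict, script_text: str) -> Path:
    """Store the batch script and a JSON manifest next to the run's outputs."""
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "job.sh").write_text(script_text)
    manifest = run_dir / "manifest.json"
    manifest.write_text(
        json.dumps({"job_id": job_id, "config": config}, indent=2, sort_keys=True)
    )
    return manifest


# Demo without a scheduler: parse a sample sbatch message and archive a run.
with tempfile.TemporaryDirectory() as tmp:
    job_id = parse_job_id("Submitted batch job 12345\n")
    manifest = archive_run(
        Path(tmp) / "run-001", job_id, {"model": "my-model", "seed": 0}, "#!/bin/bash\n"
    )
    saved = json.loads(manifest.read_text())
```

Keeping the exact batch script and configuration alongside the job ID makes it possible to match logs to the run that produced them and to resubmit the same job later.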

Pros and cons

Pros	Cons
Handles large-scale and parallel workloads	Steeper learning curve than chat or API use
Access to high-end compute and storage	Queue times can delay iteration
Strong fit for reproducible batch pipelines	Requires planning and resource estimation
Enables multi-GPU and distributed workloads	Less suitable for interactive exploration

Learning resources

Batch jobs and Slurm basics

Running LLM workloads on HPC

Accelerate: Hugging Face library and documentation for scaling PyTorch workflows from a local machine to distributed environments such as multi-GPU servers, cloud instances, or HPC clusters. Its lightweight Accelerator abstraction lets the same code run across different hardware setups with minimal changes, handling device placement, parallelism, and mixed precision under the hood. The documentation includes tutorials and how-to guides on training and inference for beginners and advanced users, such as the Distributed Data Parallel tutorial and DeepSpeed integration.
vLLM: library for scalable LLM inference (often used on HPC clusters)

Environment management and reproducibility

Docker: containerization for consistent environments
Apptainer: HPC-friendly containers (formerly Singularity)
