Remote deployment

What is this?

Remote deployment means running LLM workloads on external compute infrastructure instead of your local machine. Common options include virtual machines (VMs) and Virtual Research Environments (VREs).

When should you use it?

Your local machine is not powerful enough
You need shareable environments for a team
You need a controlled but flexible compute setup
You want to avoid the overhead of HPC but need more than local resources
You want to test models before scaling to HPC
You want to transition from local prototyping to more scalable environments

When should you NOT use it?

When your workload is small and can run locally
When governance rules prohibit the selected provider
When your team cannot maintain remote environments
When your models are so large that they need multi-GPU or HPC-scale infrastructure
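
To judge whether a model is "very large" for a given machine, a common rule of thumb is that the weights alone need roughly the parameter count times the bytes per parameter. The sketch below applies that rule. The function names and the 1.2 overhead factor (a stand-in for activations and cache) are illustrative assumptions, not measured values, so treat the result as a first estimate only.

```python
# Rough estimate of the memory needed to hold model weights.
# Bytes per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str = "fp16") -> float:
    """Approximate weight memory in GB (1 GB = 10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_single_gpu(
    n_params: float,
    gpu_memory_gb: float,
    dtype: str = "fp16",
    overhead: float = 1.2,  # assumed multiplier, not a measured value
) -> bool:
    """Return True if the estimated footprint fits in the given GPU memory."""
    return weight_memory_gb(n_params, dtype) * overhead <= gpu_memory_gb


if __name__ == "__main__":
    for n_params in (7e9, 70e9):
        print(n_params, weight_memory_gb(n_params), fits_on_single_gpu(n_params, 24))
```

If the estimate exceeds the memory of the largest single GPU available to you, the workload is likely a candidate for multi-GPU or HPC-scale infrastructure.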

How it works (simple explanation)

You provision remote compute, connect securely (e.g., via SSH or browser), install the required tools, and run inference or fine-tuning tasks there. Data and compute remain within the remote environment.

Depending on the setup:
- VMs are typically persistent (you manage them over time)
- VREs are often ephemeral or semi-managed (sessions may reset or be preconfigured)

Concrete examples (tools/platforms)

Virtual machines (VMs)

Exoscale: European cloud provider with VM instances
AWS EC2: on-demand cloud virtual machines
Google Cloud Compute Engine: scalable VM infrastructure
Microsoft Azure Virtual Machines: enterprise cloud VM platform

Virtual Research Environments (VREs)

Google Colab: managed notebook environment (free tier with limitations)
Azure ML notebooks: managed notebook environments within Azure
JupyterHub: institution-hosted multi-user notebook server
Binder: temporary environments (no GPU, limited for LLM use)
Institution-provided notebook environments (see INSTITUTIONAL RESOURCES/Remote deployment)

VM versus VRE: key difference

VM: full control over the machine, operating system, and installed services. More flexibility, but more responsibility.
VRE: managed research workspace with preconfigured tools and interfaces. Less control, faster onboarding.

Example workflow (step-by-step)

Select a VM or a VRE based on the level of control you need.
Create or request access to the environment.
For VMs, configure access (e.g., SSH keys, firewall rules).
Install runtime dependencies (Python, libraries, models).
Upload or mount data.
Run scripts or notebooks for inference or experimentation.
Monitor usage, logs, and costs.
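
After the environment is set up, a quick check of what the remote machine offers can save time before you install models or upload data. The following is a minimal sketch using only the Python standard library. The function name `environment_report` is hypothetical, and the presence of the `nvidia-smi` executable is used only as a hint that an NVIDIA GPU driver is installed, not as proof that a GPU is usable.

```python
import platform
import shutil


def environment_report(path: str = ".") -> dict:
    """Collect basic facts about the machine this script runs on."""
    usage = shutil.disk_usage(path)  # total, used, free in bytes
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        # Hint only: the tool being on PATH suggests an NVIDIA driver is installed.
        "has_nvidia_smi": shutil.which("nvidia-smi") is not None,
        "free_disk_gb": round(usage.free / 1e9, 1),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this once on the VM or inside the VRE session, and keeping the output with your project notes, gives a simple record of the environment a result was produced in.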

Typical SSH workflow for VMs

Generate or load your SSH key pair.
Add your public key to the VM configuration.
Connect using ssh user@host.
Use tools like tmux or screen for long-running jobs.
Keep scripts version-controlled for reproducibility.
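
The connection and long-running-job steps above can be sketched as a small helper that only builds the commands, so they can be reviewed before anything is executed. The host name, user name, key path, and script name below are placeholders, and the helper names are hypothetical. The sketch assumes `ssh` is installed locally and `tmux` is installed on the VM.

```python
import shlex
from typing import List, Optional


def tmux_wrap(session: str, command: str) -> str:
    """Build a remote command that starts `command` in a detached tmux session."""
    return f"tmux new-session -d -s {shlex.quote(session)} {shlex.quote(command)}"


def build_ssh_command(
    user: str,
    host: str,
    key_path: Optional[str] = None,
    remote_command: Optional[str] = None,
) -> List[str]:
    """Build the argument list for an ssh call (nothing is executed here)."""
    cmd = ["ssh"]
    if key_path:
        cmd += ["-i", key_path]  # -i selects the private key file
    cmd.append(f"{user}@{host}")
    if remote_command:
        cmd.append(remote_command)
    return cmd


if __name__ == "__main__":
    # Placeholder values; replace with your own VM details.
    job = tmux_wrap("inference", "python run_inference.py")
    print(build_ssh_command("alice", "vm.example.org", "~/.ssh/id_ed25519", job))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would start the job on the VM. Because the job runs inside tmux, it keeps running if the SSH connection drops, and you can reattach later with `tmux attach -t inference`.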

Pros and cons

Pros	Cons
More compute flexibility than local setups	Ongoing cost and operational overhead
Suitable for team collaboration	Requires setup of security and access controls
Good balance between control and usability	Risk of configuration drift without automation

Learning resources

SSH tutorial by DigitalOcean: practical SSH basics
vLLM documentation: scalable inference on remote GPUs