
CloudTune Build Log

CloudTune: Distributed GPU Broker Playbook

When I started CloudTune, I did not want another model-training dashboard. I wanted the missing piece of the LLM stack: a one-click GPU broker that can fine-tune, serve, and attest runs across AWS, GCP, and Azure with the rigor that regulated industries expect. The hard part was never the GPUs themselves; it was the volatility of spot markets, which behave more like high-frequency trading venues than cloud APIs.

CloudTune is engineered for survival under volatility. Everything in the platform, from Terraform modules to LoRA checkpoints, assumes GPUs can vanish and budgets can spike mid-run. That forced me to design the system like a resilient market infrastructure, not a single-cloud script.

Terraform modules that respect reality

Most IaC GPU templates assume capacity is stable. CloudTune ships three Terraform modules (one each for AWS, GCP, and Azure) that expose a GPU bidder wrapped in safeguards against spot volatility.

With this layer, the platform can slide workloads off AWS when H100 pools disappear and onto GCP or Azure when prices reset. Every allocation event is logged and signed so downstream compliance tooling (Axiom OS) can prove where training actually happened.
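To make the safeguards concrete, here is a simplified sketch of the decision that layer has to encode: cap the bid, fall back across clouds, and sign the allocation record. The quote fields and the HMAC-based signing helper are illustrative stand-ins, not the production module interface.

```python
# Illustrative sketch of the bidder safeguards; field names and the signing
# scheme are stand-ins, not the production Terraform contract.
import hashlib
import hmac
import json
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpotQuote:
    provider: str        # "aws", "gcp", or "azure"
    gpu: str             # e.g. "h100"
    hourly_price: float  # current spot price in USD
    available: bool      # whether the pool currently has capacity

def pick_pool(quotes: list[SpotQuote], max_hourly: float) -> Optional[SpotQuote]:
    """Cheapest available pool under the price ceiling, falling back across clouds."""
    eligible = [q for q in quotes if q.available and q.hourly_price <= max_hourly]
    return min(eligible, key=lambda q: q.hourly_price) if eligible else None

def signed_allocation_event(quote: SpotQuote, secret: bytes) -> dict:
    """Allocation log entry, HMAC-signed so compliance tooling can verify provenance."""
    event = {
        "provider": quote.provider,
        "gpu": quote.gpu,
        "hourly_price": quote.hourly_price,
        "timestamp": time.time(),
    }
    body = json.dumps(event, sort_keys=True).encode()
    event["signature"] = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return event
```

The point of the sketch is the ordering: cap the bid first, fall back across clouds second, and only then emit a verifiable record of what was actually allocated.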

FastAPI scheduler as a market-aware broker

The CloudTune control plane is a FastAPI service that behaves like a lightweight matching engine. Incoming jobs are normalized into a common schema that captures GPU requirements, SLA targets, and evidence requirements.

The scheduler selects the cheapest GPU that satisfies both the SLA and the evidence requirements. Training is never fire-and-forget: each run produces a cryptographically linked bundle (via Axiom) that attests to the dataset, recipe, container hash, GPU topology, and final checkpoints. That is how CloudTune earns trust with enterprises that refuse to adopt black-box tooling.
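Here is a simplified sketch of the schema and matching step using FastAPI and Pydantic; the field names and the /jobs route are illustrative, not the production API.

```python
# Hypothetical job schema and matching endpoint, sketched with FastAPI and
# Pydantic. Field names and the /jobs route are illustrative, not the real API.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

class JobSpec(BaseModel):
    base_model: str           # e.g. "llama-3-8b"
    gpu_type: str             # e.g. "h100"
    max_hourly_usd: float     # budget ceiling for the run
    evidence_level: str       # "receipt" for fully attested runs

class GpuOffer(BaseModel):
    provider: str
    gpu_type: str
    hourly_usd: float
    supports_attestation: bool

app = FastAPI()
OFFERS: list[GpuOffer] = []   # refreshed elsewhere by per-cloud price pollers

@app.post("/jobs")
def schedule(job: JobSpec) -> GpuOffer:
    """Pick the cheapest offer that satisfies both the budget and the evidence requirement."""
    # SLA feasibility (throughput, deadline) is elided to keep the sketch short.
    eligible = [
        o for o in OFFERS
        if o.gpu_type == job.gpu_type
        and o.hourly_usd <= job.max_hourly_usd
        and (job.evidence_level != "receipt" or o.supports_attestation)
    ]
    if not eligible:
        raise HTTPException(status_code=409, detail="no GPU satisfies the job constraints")
    return min(eligible, key=lambda o: o.hourly_usd)
```

Whatever the real schema looks like, the important property is that evidence requirements are first-class constraints in matching, not something bolted onto logging after the fact.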

LoRA / QLoRA that does not fall apart under preemption

LoRA is light, but QLoRA with 4- and 8-bit quantization can crumble when a GPU is reclaimed mid-run. We kept runs stable with a set of preemption-aware techniques.

Together these techniques trim fine-tuning cost by roughly 20 to 35 percent across 7B to 13B families while staying resilient to cloud turbulence.
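A simplified sketch of the kind of checkpointing this requires is below: save only the small adapter tensors plus optimizer state, atomically and often, so a reclaimed GPU costs minutes of work rather than hours. The paths and helper names are stand-ins, not the production trainer.

```python
# Sketch of preemption-tolerant (Q)LoRA checkpointing. Directory and helper
# names are hypothetical; the technique is: checkpoint only the adapter
# tensors and optimizer state, atomically and frequently.
import os
from typing import Optional

import torch

CKPT_DIR = "/mnt/shared/ckpts"   # assumed durable volume that outlives the spot instance

def save_adapter_checkpoint(step: int, adapter_state: dict, optimizer: torch.optim.Optimizer) -> None:
    """Persist the LoRA adapter tensors and optimizer state, never the frozen base model."""
    path = os.path.join(CKPT_DIR, f"step_{step:08d}.pt")
    tmp = path + ".tmp"
    torch.save({"step": step, "adapter": adapter_state, "optimizer": optimizer.state_dict()}, tmp)
    os.replace(tmp, path)        # atomic rename: a preemption never leaves a half-written checkpoint

def load_latest_checkpoint() -> Optional[dict]:
    """After a GPU is reclaimed, resume from the newest complete checkpoint."""
    ckpts = sorted(f for f in os.listdir(CKPT_DIR) if f.endswith(".pt"))
    if not ckpts:
        return None
    return torch.load(os.path.join(CKPT_DIR, ckpts[-1]), weights_only=False)
```

Because the adapters are a tiny fraction of the base model, checkpointing every few hundred steps is cheap, which is what keeps a reclaimed GPU from turning into a restarted run.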

Observability as a first-class citizen

CloudTune ships with Prometheus, Grafana, and OpenTelemetry from day one. Prometheus scrapes the broker, trainers, and GPUs. Grafana highlights real-time burn, throughput, and failure domains. OpenTelemetry traces cross-cloud scheduling decisions, and all request-scoped logs are compressed and shipped to S3 for audit trails. The observability stack powers a "time-to-receipt" metric that measures how quickly a user request moves from run -> train -> validate -> receipt.

Inference receipts stay under 60 seconds. Training receipts consume less than 5 percent of total wall-clock runtime. Those guarantees let founders and researchers operate CloudTune in production without losing visibility.
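For readers wondering how a time-to-receipt metric gets wired up, here is a minimal sketch using prometheus_client; the metric name, labels, and bucket edges are illustrative, not the exact ones in the production dashboards.

```python
# Minimal sketch of exporting "time-to-receipt". The metric name, labels,
# and bucket edges are illustrative choices, not the production config.
import time
from prometheus_client import Histogram

TIME_TO_RECEIPT = Histogram(
    "cloudtune_time_to_receipt_seconds",
    "Latency from user request to signed receipt (run -> train -> validate -> receipt)",
    labelnames=["kind"],                        # "inference" or "training"
    buckets=(5, 15, 30, 60, 300, 1800, 7200),   # a 60 s bucket makes the inference target visible
)

def record_receipt(kind: str, started_at: float) -> None:
    """Observe how long a request took to reach its receipt."""
    TIME_TO_RECEIPT.labels(kind=kind).observe(time.time() - started_at)
```

A Grafana panel over the histogram quantiles is then enough to see at a glance whether the 60-second inference-receipt target is holding.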

Lessons learned

LLM infrastructure teams ultimately face two options: write single-cloud scripts and hope spot capacity holds, or treat volatility as a design constraint and build market-grade infrastructure around it.

CloudTune chooses the second. The mission is simple: let anyone fine-tune and serve LLMs with reliability, reproducibility, and compliance baked into the workflow. GPUs are only the beginning; the same playbook can manage inference clusters, evaluation fleets, and agentic pipelines that demand receipts.