Infrastructure
for the inference
age.

We run the silicon, serve the models, and operate the network that connects them. Train and infer across the US, Singapore, Japan, and Malaysia — with more locations coming.

CDNA 4
8 XCD · 256 CU
288GB HBM3E
90+
MW Deployed
2,000+
GPUs Deployed
17B
Monthly Tokens
104+
Frontier Models · 10 Providers

Three layers,
one vertical.

We don't resell capacity. We provision the silicon, serve the models, and operate the fabric that connects them — so performance, pricing, and roadmap are ours to set, not yours to chase across vendors.

PRODUCT / 01 CLOUD

GPU Cloud

Bare-metal and GPU-accelerated VM capacity on AMD Instinct MI325X and MI355X systems. Reserve clusters for training; spin up VMs for inference, dev, and experimentation.

GPUsMI325X · MI355X
provisioningbare-metal + VM
storageparallel FS, NVMe
PRODUCT / 02 TOKENS

AI Tokens

Frontier open-weight models served on our own AMD Instinct silicon — plus a unified gateway to 104+ models across every major provider. One OpenAI-compatible API for both, billed per million tokens.

self-hostedLlama · Qwen · DeepSeek
gateway104+ models · 10 providers
APIOpenAI-compatible
PRODUCT / 03 FABRIC

Network Fabric

A high-throughput, any-to-any network connecting our GPU sites across regions. Unlimited 400G ports — pay a flat port fee, run unlimited traffic. No per-bit egress, no transit surprises.

ports400G · unlimited usage
reach44 cities · multi-DC each
billingflat port fee

Reserve silicon,
not promises.

Dedicated GPU clusters for training, GPU-accelerated VMs for everything else. Reserve bare-metal where you need maximum performance, spin up VMs where you need flexibility. Transparent per-GPU-hour pricing, no committed-spend gymnastics.

TIER · CAPACITY
AMD Instinct MI325X
$1.99/hr
per GPU · reserved
  • vRAM256 GB HBM3E
  • Bandwidth6 TB/s
  • ArchitectureCDNA 3
  • Low precisionFP16 · FP8
  • Infinity Fabric4th gen
  • Use casetrillion-param models

Open weights
on our silicon.

We serve frontier open-weight models — Llama, Qwen, DeepSeek, Mixtral, and more — on AMD Instinct GPUs we own and operate. And through the same gateway, we route to 104+ closed and open models across every major provider. One API, one bill, one auth.

SELF-HOSTED · ON A3CLOUD.AI GPUs SERVING
Llama family
Meta · open weights
MI355X
Qwen family
Alibaba · open weights
MI325X / MI355X
DeepSeek family
DeepSeek · open weights
MI355X
Mixtral family
Mistral · open weights
MI325X
GATEWAY · ROUTED TO PROVIDER 104+ MODELS
OpenAI · Anthropic · Google
Closed frontier models
gateway
Grok · Zhipu · Vidu
Specialty & regional models
gateway
Moonshot · Minimax
Long-context & multimodal
gateway

One API.
Every modality.

LLMs, embeddings, text-to-speech, speech-to-text, image generation, image editing, video generation — all behind a single OpenAI-compatible endpoint. Switch model classes with a single field change in your request.

For open-weight models, your tokens are generated on our GPUs, on our fabric — no third-party API in the path. For closed models, the gateway handles auth, billing, and version routing across every major provider.

The network is
the computer.

AI workloads don't stop at the rack. Our network fabric reaches 44 cities — with multiple data centers wired up in each — so a training job, a dataset, and an inference endpoint can live in different buildings, or different countries, without the network becoming the bottleneck.

  • F.01

    Unlimited 400G ports

    Pay a flat port fee, run unlimited traffic. No per-gigabyte egress meters, no transit surcharges between sites. Move datasets, checkpoints, and inference traffic as freely as your budget says you should.

  • F.02

    Any-to-any across regions

    GPU clusters across data centers, edge POPs, and customer cages share one flat address space. No NAT, no overlay tax, no MPLS-era complexity — workloads move sites without re-architecting.

  • F.03

    Built for AI traffic patterns

    Tuned for the bursty, east-west, loss-sensitive traffic that AI workloads actually generate. Inference flows don't fight training flows; storage doesn't compete with checkpoints.

  • F.04

    Telemetry to the millisecond

    Per-flow visibility into queue depth, latency, and link utilization. When a training run stalls, we know which span to look at before you do.

REGIONAL CORE AGGREGATION REGIONS US-WEST ACTIVE SINGAPORE ACTIVE TOKYO ACTIVE KUALA L. ACTIVE PORT SPEED 400G BILLING FLAT FEE REACH 44 CITIES

Four layers,
one operator.

We control every layer from the raised floor to the API. That vertical integration is why we can hold latency, throughput, and availability commitments end-to-end — and why our roadmap is ours, not a hyperscaler's.

L04 / API
Token Serving Plane
OpenAI-compatible endpoints · streaming · auth · rate-limit · per-token billing · model router
L03 / RUNTIME
Inference Runtime
vLLM · SGLang · ROCm runtime · continuous batching · paged KV-cache · speculative decode
L02 / FABRIC
Cross-Region Network & Storage
Any-to-any network · unlimited 400G ports · cross-region reach · parallel filesystem · NVMe object cache
L01 / SILICON
GPU Capacity Layer
AMD Instinct MI325X · MI355X · CDNA 3/4 · UBB platforms · US · SG · JP · MY · 44-city fabric reach
Silicon partner
AMD Instinct™
MI355X & MI325X
MI355X288 GB · 8 TB/s · CDNA 4
MI325X256 GB · 6 TB/s · CDNA 3
SoftwareROCm™ · open stack
LIVE IN PRODUCTION

Built on AMD Instinct.

Every token we serve and every cluster we reserve runs on AMD Instinct accelerators. The flagship MI355X pairs CDNA 4 Matrix Cores with native FP8/FP6/FP4 production compute and 8 TB/s of bandwidth — purpose-built for high-concurrency LLM inference.

For the largest workloads, the MI325X brings 256 GB of HBM3E per GPU, so trillion-parameter models fit in memory with far less sharding. Paired with the open ROCm™ stack and native PyTorch support, you get frontier capacity without proprietary lock-in.

— Get started

Run on the
integrated stack.

Reserve compute, get an API key, or talk to an engineer about your training schedule. We answer within one business day.