Infrastructure
for the inference
age.

We run the silicon, serve the models, and operate the network that connects them. Train and infer across the US, Singapore, Japan, and Malaysia — with more locations coming.

Talk to sales View architecture

CDNA 4

8 XCD · 256 CU

288GB HBM3E

90+

MW Deployed

2,000+

GPUs Deployed

17B

Monthly Tokens

104+

Frontier Models · 10 Providers

01 · Stack

Three layers,
one vertical.

We don't resell capacity. We provision the silicon, serve the models, and operate the fabric that connects them — so performance, pricing, and roadmap are ours to set, not yours to chase across vendors.

PRODUCT / 01 CLOUD

GPU Cloud

Bare-metal and GPU-accelerated VM capacity on AMD Instinct MI325X and MI355X systems. Reserve clusters for training; spin up VMs for inference, dev, and experimentation.

GPUsMI325X · MI355X

provisioningbare-metal + VM

storageparallel FS, NVMe

PRODUCT / 02 TOKENS

AI Tokens

Frontier open-weight models served on our own AMD Instinct silicon — plus a unified gateway to 104+ models across every major provider. One OpenAI-compatible API for both, billed per million tokens.

self-hostedLlama · Qwen · DeepSeek

gateway104+ models · 10 providers

APIOpenAI-compatible

PRODUCT / 03 FABRIC

Network Fabric

A high-throughput, any-to-any network connecting our GPU sites across regions. Unlimited 400G ports — pay a flat port fee, run unlimited traffic. No per-bit egress, no transit surprises.

ports400G · unlimited usage

reach44 cities · multi-DC each

billingflat port fee

02 · Cloud

Reserve silicon,
not promises.

Dedicated GPU clusters for training, GPU-accelerated VMs for everything else. Reserve bare-metal where you need maximum performance, spin up VMs where you need flexibility. Transparent per-GPU-hour pricing, no committed-spend gymnastics.

TIER · FLAGSHIP

AMD Instinct MI355X

$2.99/hr

per GPU · reserved

vRAM288 GB HBM3E
Bandwidth8 TB/s
ArchitectureCDNA 4
Low precisionFP8 · FP6 · FP4
Infinity Fabric153 GB/s
Use casehigh-concurrency LLMs

TIER · CAPACITY

AMD Instinct MI325X

$1.99/hr

per GPU · reserved

vRAM256 GB HBM3E
Bandwidth6 TB/s
ArchitectureCDNA 3
Low precisionFP16 · FP8
Infinity Fabric4th gen
Use casetrillion-param models

03 · Tokens

Open weights
on our silicon.

We serve frontier open-weight models — Llama, Qwen, DeepSeek, Mixtral, and more — on AMD Instinct GPUs we own and operate. And through the same gateway, we route to 104+ closed and open models across every major provider. One API, one bill, one auth.

SELF-HOSTED · ON A3CLOUD.AI GPUs SERVING

Llama family

Meta · open weights

MI355X

Qwen family

Alibaba · open weights

MI325X / MI355X

DeepSeek family

DeepSeek · open weights

MI355X

Mixtral family

Mistral · open weights

MI325X

GATEWAY · ROUTED TO PROVIDER 104+ MODELS

OpenAI · Anthropic · Google

Closed frontier models

gateway

Grok · Zhipu · Vidu

Specialty & regional models

gateway

Moonshot · Minimax

Long-context & multimodal

gateway

One API.
Every modality.

LLMs, embeddings, text-to-speech, speech-to-text, image generation, image editing, video generation — all behind a single OpenAI-compatible endpoint. Switch model classes with a single field change in your request.

For open-weight models, your tokens are generated on our GPUs, on our fabric — no third-party API in the path. For closed models, the gateway handles auth, billing, and version routing across every major provider.

Request API access

04 · Fabric

The network is
the computer.

AI workloads don't stop at the rack. Our network fabric reaches 44 cities — with multiple data centers wired up in each — so a training job, a dataset, and an inference endpoint can live in different buildings, or different countries, without the network becoming the bottleneck.

F.01

Unlimited 400G ports

Pay a flat port fee, run unlimited traffic. No per-gigabyte egress meters, no transit surcharges between sites. Move datasets, checkpoints, and inference traffic as freely as your budget says you should.
F.02

Any-to-any across regions

GPU clusters across data centers, edge POPs, and customer cages share one flat address space. No NAT, no overlay tax, no MPLS-era complexity — workloads move sites without re-architecting.
F.03

Built for AI traffic patterns

Tuned for the bursty, east-west, loss-sensitive traffic that AI workloads actually generate. Inference flows don't fight training flows; storage doesn't compete with checkpoints.
F.04

Telemetry to the millisecond

Per-flow visibility into queue depth, latency, and link utilization. When a training run stalls, we know which span to look at before you do.

05 · Architecture

Four layers,
one operator.

We control every layer from the raised floor to the API. That vertical integration is why we can hold latency, throughput, and availability commitments end-to-end — and why our roadmap is ours, not a hyperscaler's.

L04 / API

Token Serving Plane

OpenAI-compatible endpoints · streaming · auth · rate-limit · per-token billing · model router

L03 / RUNTIME

Inference Runtime

vLLM · SGLang · ROCm runtime · continuous batching · paged KV-cache · speculative decode

L02 / FABRIC

Cross-Region Network & Storage

Any-to-any network · unlimited 400G ports · cross-region reach · parallel filesystem · NVMe object cache

L01 / SILICON

GPU Capacity Layer

AMD Instinct MI325X · MI355X · CDNA 3/4 · UBB platforms · US · SG · JP · MY · 44-city fabric reach

Silicon partner

AMD Instinct™
MI355X & MI325X

MI355X288 GB · 8 TB/s · CDNA 4

MI325X256 GB · 6 TB/s · CDNA 3

SoftwareROCm™ · open stack

LIVE IN PRODUCTION

Built on AMD Instinct.

Every token we serve and every cluster we reserve runs on AMD Instinct accelerators. The flagship MI355X pairs CDNA 4 Matrix Cores with native FP8/FP6/FP4 production compute and 8 TB/s of bandwidth — purpose-built for high-concurrency LLM inference.

For the largest workloads, the MI325X brings 256 GB of HBM3E per GPU, so trillion-parameter models fit in memory with far less sharding. Paired with the open ROCm™ stack and native PyTorch support, you get frontier capacity without proprietary lock-in.

Infrastructure for the inference age.

Three layers,one vertical.

GPU Cloud

AI Tokens

Network Fabric

Reserve silicon,not promises.

Open weightson our silicon.

One API.Every modality.

The network isthe computer.

Unlimited 400G ports

Any-to-any across regions

Built for AI traffic patterns

Telemetry to the millisecond

Four layers,one operator.

Built on AMD Instinct.

Run on theintegrated stack.

Infrastructure
for the inference
age.

Three layers,
one vertical.

Reserve silicon,
not promises.

Open weights
on our silicon.

One API.
Every modality.

The network is
the computer.

Four layers,
one operator.

Run on the
integrated stack.