TECHNICAL BRIEFING

Disaggregated All-Flash Storage
meets NVIDIA’s inference paradigm

ZK-Storage WS5000 × NVIDIA Dynamo / KVBM / NIXL / GPUDirect Storage — same paradigm, different layers: an objective, verifiable, non-disparaging comparison and a complementary positioning.

300 GB/s aggregate bandwidth (S9)
85× peak inference-load speedup (S38)
90.9% median reduction, 7 metrics
WS5000 in mass production · sovereign
AGENDA

How this briefing flows

Shared bottleneck first, then each side’s tech, then an objective comparison and complementary positioning

Module | Takeaway
01 Shared bottleneck | GPUs are starved by slow IO: NVIDIA’s judgment and ours
02 ZK-Storage stack | Disaggregation + four core technologies
03 Mapping to NVIDIA | Disaggregation / KV-cache offload / GPUDirect / data path
04 Comparison table | Row by row, with sources and conventions
05 Complementary & validation | Third-party benchmark + sovereign positioning
THE BOTTLENECK

Shared view: faster GPUs are starved by slow IO

NVIDIA (GPUDirect): “As AI, HPC, and data analytics datasets continue to increase in size, the time spent loading data begins to impact application performance. Fast GPUs are increasingly starved by slow IO.”
NVIDIA Developer · GPUDirect

This matches our view: in the LLM era the real bottleneck is on the data-supply side — model load, checkpoint I/O and KV-cache scheduling — not raw compute alone.

<60%
avg. compute-center utilization
large headroom (S11)
30–50%
effective GPU util. when IO-bound
research (S4)
2–3×
uplift via storage acceleration
research (S4)
~74%
peak KV-cache offload savings
online workloads (S5)
01
ZK-STORAGE STACK

The ZK-Storage stack

Disaggregation at the core — turning storage from a bit player into a compute amplifier.

ARCHITECTURE

Disaggregation: compute pool ⟷ lossless fabric ⟷ all-flash pool

Compute pool: GPU / NPU nodes · Huawei Ascend Atlas 910B · training / inference (transparent)
Lossless fabric: NVMe-oF over RDMA / RoCE · GPUDirect path · 2×200GbE line-rate
All-flash pool: EBOF flash array · CPFS parallel file system · KV-cache tiered scheduling
Core idea
Decouple storage media from compute into an independent all-flash pool, linked to GPUs over a lossless fabric; compute and capacity scale independently, with no change to upper-layer frameworks.
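As a rough sizing aid for the fabric figures above, the sketch below converts the 2×200GbE link rate into bytes per second and estimates how many such dual-port attachments it would take to carry the 300 GB/s aggregate bandwidth quoted in the vendor spec (S9). It assumes ideal line rate with no protocol overhead and decimal units; the function names are illustrative and not part of any product API.

```python
import math


def link_gbytes_per_s(ports: int, gbit_per_port: float) -> float:
    """Raw line rate in GB/s (decimal), ignoring protocol overhead."""
    return ports * gbit_per_port / 8.0


def links_needed(target_gbytes_per_s: float, per_link_gbytes_per_s: float) -> int:
    """Smallest number of attachments whose summed line rate meets the target."""
    return math.ceil(target_gbytes_per_s / per_link_gbytes_per_s)


# Assumption: "2x200GbE" describes one dual-port attachment to the fabric.
per_attachment = link_gbytes_per_s(ports=2, gbit_per_port=200)
attachments = links_needed(300, per_attachment)

print(f"{per_attachment:.0f} GB/s per attachment, {attachments} attachments for 300 GB/s")
```

Real deployments would land below line rate, so the attachment count here is a lower bound rather than a design figure.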
FOUR PILLARS

Four core technologies

01 NVMe-oF over RDMA / RoCE
Carry NVMe over RDMA, bypassing redundant copies to approach local-disk performance.
02 GPUDirect path
Data moves directly between storage and GPU memory, shortening the path and cutting CPU and latency overhead.
03 All-flash EBOF
Controller-less, high-density flash pool; bandwidth and IOPS scale near-linearly with capacity, at lower power.
04 KV-cache tiered scheduling
Offload and reuse KV cache for long-context / high-switch inference, lifting effective GPU utilization.
02
VS. NVIDIA

Mapping to NVIDIA: same paradigm, different layers

NVIDIA’s software / IO frameworks define disaggregated inference + tiered KV-cache offload + a direct storage path; ZK-Storage brings the same engineering ideas to sovereign compute at the storage-base layer.

NVIDIA PARADIGM

NVIDIA’s inference paradigm (official)

NVIDIA Dynamo composes three core techniques: Disaggregated Serving, KV Cache-Aware Routing and KV Cache Offloading, underpinned by the low-latency transfer layer NIXL.

NVIDIA, verbatim (excerpt)
“KV cache offloading moves KV cache from HBM to cheaper storage tiers such as host memory, local disk, or remote storage. Reusing precomputed state improves TTFT, reduces TCO, and allows for longer context.”
MAPPING ①

Disaggregation ↔ Disaggregated Serving

ZK-Storage
Hardware-disaggregated EBOF
Decouple storage and compute into an independent all-flash pool, linked to the GPU pool over NVMe-oF/RoCE; compute and capacity scale independently.
NVIDIA
NVIDIA Dynamo · Disaggregated Serving
“Disaggregated serving runs prefill and decode on different devices so each can be scaled and parallelized independently. It required three capabilities: scheduling, memory management for KV cache offloading and onboarding, and low-latency data transfer to move KV cache between nodes and across the memory hierarchy.”
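The prefill/decode split described in the quote can be pictured with a toy sketch. It is not Dynamo code: `prefill`, `decode`, `KVHandoff` and the in-process queue are hypothetical stand-ins, and the "KV blocks" are plain strings rather than tensors. The point is only that the two phases communicate through a KV-cache handoff, so each side could be scaled separately.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass
class KVHandoff:
    """KV cache produced by prefill and consumed by decode (toy stand-in)."""
    request_id: int
    kv_blocks: list


def prefill(request_id: int, prompt_tokens: list) -> KVHandoff:
    # Prefill processes the whole prompt once and emits one KV entry per token.
    return KVHandoff(request_id, [f"kv({t})" for t in prompt_tokens])


def decode(handoff: KVHandoff, max_new_tokens: int):
    # Decode extends the received KV cache one generated token at a time.
    kv = list(handoff.kv_blocks)
    generated = []
    for i in range(max_new_tokens):
        token = f"tok{i}"
        kv.append(f"kv({token})")
        generated.append(token)
    return generated, len(kv)


# The queue stands in for the low-latency transfer layer between worker pools.
transfer = Queue()
transfer.put(prefill(1, ["hello", "world"]))
tokens, kv_len = decode(transfer.get(), max_new_tokens=3)
print(tokens, kv_len)
```

In a real system the handoff would cross devices or nodes, which is where the data path and storage tiers discussed in this briefing come in.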
MAPPING ②

KV-cache offload ↔ KVBM tiers

ZK-Storage
KV-cache tiered scheduling
For long-context / multi-model switching, offload and reuse KV cache between GPU memory and all-flash — extend context and concurrency without buying more GPUs.
NVIDIA
NVIDIA Dynamo KVBM · tiered KV-cache offload
“The KV Block Manager (KVBM) offers a unified memory API spanning GPU memory, pinned host memory, remote RDMA-accessible memory, local/distributed SSDs, and remote file/object/cloud storage. Offloading KV cache from HBM to cheaper tiers (G1 GPU → G2 CPU → G3 SSD → G4 remote) improves TTFT, reduces TCO and enables longer context.”
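To make the G1 → G4 tiering concrete, here is a minimal sketch of a tiered cache that demotes the least recently used block to the next cheaper tier when a tier fills up, and promotes a block back to the top tier on reuse. It is an illustration under stated assumptions, not the KVBM API or the ZK-Storage scheduler: the class `TieredKVCache`, the LRU policy and the per-tier block capacities are all hypothetical.

```python
from collections import OrderedDict

# Tier names follow the G1..G4 labels in the quote; the last tier is unbounded here.
TIERS = ["G1_gpu", "G2_host", "G3_ssd", "G4_remote"]


class TieredKVCache:
    def __init__(self, capacities):
        # capacities[i] = max blocks held in tier i (the final entry is ignored).
        self.capacities = capacities
        self.tiers = [OrderedDict() for _ in TIERS]

    def put(self, block_id, data, level=0):
        tier = self.tiers[level]
        tier[block_id] = data
        tier.move_to_end(block_id)  # mark as most recently used
        # If the tier overflows, demote its least recently used block one tier down.
        if level < len(TIERS) - 1 and len(tier) > self.capacities[level]:
            old_id, old_data = tier.popitem(last=False)
            self.put(old_id, old_data, level + 1)

    def get(self, block_id):
        # Search from the fastest tier down; on a hit, promote the block to G1.
        for level, tier in enumerate(self.tiers):
            if block_id in tier:
                data = tier.pop(block_id)
                self.put(block_id, data, 0)
                return data, TIERS[level]
        return None, None


cache = TieredKVCache([1, 1, 1, None])
for block in ("a", "b", "c"):
    cache.put(block, f"kv-{block}")

first_hit = cache.get("a")   # "a" had been pushed down two tiers
second_hit = cache.get("a")  # now served from the top tier
print(first_hit, second_hit)
```

A hit in a lower tier avoids recomputing the block during prefill, which is the reuse benefit the quote attributes to offloading; the actual gain depends on how fast each tier can return the data.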
MAPPING ③

GPUDirect path ↔ GPUDirect Storage

ZK-Storage
GPUDirect path + NVMe-oF
Move data on a direct DMA path between all-flash storage and GPU memory, bypassing the CPU to shorten the path and cut latency.
NVIDIA
NVIDIA GPUDirect Storage (Magnum IO GDS)
“GPUDirect Storage enables a direct data path between local or remote storage, such as NVMe or NVMe over Fabric (NVMe-oF), and GPU memory. It avoids extra copies through a bounce buffer in the CPU’s memory, enabling a DMA engine near the NIC or storage to move data on a direct path into or out of GPU memory — all without burdening the CPU.”
MAPPING ④

Data path ↔ NIXL

ZK-Storage
NVMe-oF over RDMA/RoCE
Carry NVMe over lossless RDMA, providing a low-latency, high-bandwidth data path across GPU memory / host memory / all-flash.
NVIDIA
NVIDIA NIXL · inference data transfer library
“NIXL (NVIDIA Inference Xfer Library) provides a non-blocking API for high-performance, vendor-agnostic data movement, transferring KV caches across GPU memory, CPU memory and storage tiers (SSD / remote) for use cases such as disaggregated KV cache movement, long-context storage and model-weight transfer.”
COMPARISON

Objective comparison (fair, non-disparaging)

Dimension | ZK-Storage WS5000 | NVIDIA equivalent (official)
Layer | All-flash storage appliance (hardware base) | Inference / IO software framework (Dynamo · NIXL · GDS)
Disaggregation | Hardware EBOF + NVMe-oF/RoCE | Dynamo Disaggregated Serving (prefill/decode split)
KV-cache offload | KV-cache tiered scheduling (mem↔flash) | KVBM tiers G1→G4 (GPU→CPU→SSD→remote)
GPU direct path | GPUDirect path + NVMe-oF | GPUDirect Storage (GPU↔NVMe/NVMe-oF DMA)
Primary compute fit | Domestic GPU / Ascend, 90%+ coverage (S9) | Mainly the NVIDIA GPU ecosystem
Data sovereignty | Strong (self-controlled) | Assess per deployment / compliance
Third-party benchmark | Yes (Beijing Information Science and Technology University, Ascend 910B, S38) | Per official / partner materials
Relationship | Complementary: a sovereign storage base for the paradigm | Open to third-party storage (WEKA / Dell, etc.)
How to read this
ZK figures are labeled vendor spec (S9) / third-party benchmark (S38); NVIDIA capabilities are quoted from official docs (see Sources). This table is an objective dimension-by-dimension reference, not a disparagement of any third party; refer to each party’s latest official information.
COMPLEMENTARY

Complementary, not a replacement

NVIDIA’s KVBM / NIXL are open to third-party storage. Per NVIDIA’s own updates: “Dell integrates PowerScale with Dynamo’s NIXL for 19x faster TTFT” and “WEKA partners with NVIDIA on KV cache storage for Dynamo.”

Where ZK-Storage fits
This confirms that a disaggregated all-flash storage base is a key part of the disaggregated-inference / KV-cache-offload paradigm. ZK-Storage provides that base for sovereign compute (Ascend / domestic GPUs) — mass-producible, independently benchmarked, with data residency.
Interoperable, not adversarial · Sovereign compute base · Data residency · Mass production, validated
03
VALIDATION

Validation & positioning

Let a reproducible third-party benchmark speak, with an honest positioning.

INDEPENDENT TEST

Third-party benchmark: Beijing Information Science and Technology University · Ascend 910B

Model | ZK-Storage load | NFS load | Load speedup | Service speedup
DeepSeek-32B | 6.62 s | 563.85 s | 85.2× | 6.17×
DeepSeek-70B | 35.38 s | 1284.66 s | 36.3× | 9.33×
Key result (reproducible)
Against an NFS over TCP/10GbE baseline, ZK-Storage over NVMe-oF (RDMA/RoCE) delivers a peak inference-load speedup of 85.17×; at 40 model switches/day, effective token throughput rises by 356.9%; the median reduction across 7 metrics is 90.9%. All figures come from a single source and are reproducible and verifiable (S38).
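The load-speedup column can be re-derived from the two load times in the benchmark table. The short check below does only that arithmetic; the helper names are illustrative, and the service speedup, throughput and seven-metric median figures are not recomputed because their inputs are not given in this briefing.

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated load is than the baseline."""
    return baseline_s / accelerated_s


def reduction_pct(baseline_s: float, accelerated_s: float) -> float:
    """Percentage of baseline load time that is eliminated."""
    return (1 - accelerated_s / baseline_s) * 100


# (NFS load, ZK-Storage load) in seconds, copied from the benchmark table.
loads = {
    "DeepSeek-32B": (563.85, 6.62),
    "DeepSeek-70B": (1284.66, 35.38),
}

for model, (nfs_s, zk_s) in loads.items():
    print(f"{model}: {speedup(nfs_s, zk_s):.2f}x faster load, "
          f"{reduction_pct(nfs_s, zk_s):.1f}% less load time")
```

The ratios come out at 85.17× and roughly 36.3×, matching the table. The per-model load-time reductions printed here are a different quantity from the 90.9% figure, which is a median over seven metrics.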
POSITIONING

An honest positioning

  • Same paradigm: with NVIDIA, we agree slow IO is the hidden bottleneck of LLM compute.
  • Different layers: NVIDIA provides software / IO frameworks; ZK-Storage provides a mass-producible all-flash storage base.
  • Complementary: a disaggregated all-flash base is part of the disaggregated-inference / KV-offload paradigm.
  • Sovereign: deeply tuned for Ascend / domestic GPUs, with data residency, third-party validation and mass production.
In one line
Make every GPU earn its keep — whichever compute lineage it comes from.
SOURCES

Sources & conventions (verifiable)

ZK-Storage perf / spec | Vendor spec (S9): 300 GB/s aggregate bandwidth, 50M random IOPS, 20 μs latency, 90%+ domestic-GPU coverage, 48–72 h deployment, ~40% lower total cost.
ZK-Storage validation | Beijing Information Science and Technology University on Huawei Ascend Atlas 910B, NFS baseline (S38): DeepSeek-32B load 563.85 s → 6.62 s (85.17×); 90.9% median reduction across 7 metrics. From business_plan/outputs/results.json, reproducible.
KV-cache offload savings | Industry research: up to ~73.7% online-workload cost reduction (S5).
NVIDIA GPUDirect Storage (Magnum IO GDS) | NVIDIA Developer · GPUDirect · GPUDirect Storage Overview Guide
NVIDIA Dynamo · Disaggregated Serving | NVIDIA Dynamo · Introduction · ai-dynamo/dynamo (GitHub)
NVIDIA Dynamo KVBM · tiered KV-cache offload | NVIDIA Dynamo · KVBM
NVIDIA NIXL · inference data transfer library | NVIDIA Technical Blog · NIXL · ai-dynamo/nixl (GitHub)

Last updated: 2026-06-28 · ZK figures from business_plan/outputs/results.json (S-codes on the site’s “Data Sources” page); NVIDIA descriptions and links are its official public materials.

THANK YOU

Make every GPU earn its keep

ZK-Storage WS5000 · disaggregated all-flash accelerated storage appliance · Shenzhen Zhongke Hangxing Technology Co., Ltd.

Q & A: technical discussion welcome
PoC: demo units in stock
Interop: building on sovereign compute

ZK-Storage vs NVIDIA · Technical Briefing Deck