
Jimmy Song | 宋净超

Tetrate evangelist, founder of the Cloud Native Community, CNCF Ambassador, and cloud native technology expert.

RSS preview of Jimmy Song | 宋净超

Kubernetes's Anxiety and Rebirth in the AI Wave

2026-04-03 13:20:28

Kubernetes hasn’t been replaced by AI, but it’s being redefined by it. Anxiety is the prelude to rebirth.

After attending KubeCon EU 2026 in Amsterdam, I’ve been pondering a key question: Kubernetes isn’t obsolete, but it’s no longer “enough”; it hasn’t been replaced by AI, but it’s being redefined by AI.

Figure 1: KubeCon EU 2026 slogan: Keep Cloud Native Moving. This event had over 13,000 registrations, making it the largest KubeCon to date.

This was my third time attending KubeCon in Europe. Over the past few years, you can actually see the community’s mindset shift through the event slogans:

  • 2024 Paris: La vie en Cloud Native

    → Cloud Native has become a “way of life,” the default state

  • 2025 London: No slogan, just the 10th anniversary

    → Kubernetes reached a milestone, focusing on retrospection rather than moving forward

  • 2026 Amsterdam: Keep Cloud Native Moving

    → But the question is: where is it moving?

The absence of a slogan in 2025 was a signal in itself:

When an ecosystem starts commemorating the past instead of defining the future, it’s already at an inflection point.

This article doesn’t recap the talks, but instead distills my observations at KubeCon into insights about Kubernetes’ anxiety and rebirth in the AI wave.

The Root of Anxiety: Is Kubernetes Facing a “Crisis”?

The biggest change at KubeCon was that AI has completely replaced traditional cloud native topics. The focus shifted from service optimization and microservices management to how to deploy and manage AI workloads on Kubernetes, especially inference tasks and GPU scheduling.

Figure 2: Before KubeCon officially started, the Maintainer Summit was all about AI.

Kubernetes, as the foundational infrastructure, was once the core of the cloud native world. With the explosive growth of AI models, the question now is whether Kubernetes can still serve as a “universal” platform for everything, which has become a new source of anxiety.

The AI boom brings real challenges: Can Kubernetes’ “universality” adapt to the complexity of AI workloads?

The Focus Brought by the AI Boom

The AI boom has shifted the cloud native spotlight entirely to artificial intelligence. AI coding, OpenClaw, large language models, and generative models have all drawn widespread attention. AI has become the core computing demand in the real world.

This surge in demand raises the question: Can Kubernetes continue to serve as the infrastructure platform for complex tasks? Especially with issues like GPU sharing, inference model scheduling, VRAM allocation, and device attribute selection, is the traditional Kubernetes resource model sufficient?

In the past, Kubernetes handled compute, storage, and networking as foundational infrastructure. But with the rapid development of AI, its “universality” is being challenged. Particularly for inference tasks, Kubernetes’ model appears thin.

Comparing with OpenStack: Will Kubernetes Repeat History?

OpenStack once aimed to be a complete open-source cloud platform, but ultimately failed to sustain growth due to complexity and a lack of flexibility in adapting to new technologies.

Will Kubernetes follow the same path? I believe Kubernetes has different strengths: as a container and microservices orchestration platform, it’s widely adopted and has strong community and vendor support. It doesn’t try to replace all cloud provider capabilities but serves as an infrastructure control plane to help users manage resources.

Figure 3: Cloud native contributors remain active. The crowd at the KubeCon EU 2026 Maintainer Summit shows the community’s vitality.

However, as AI workloads become mainstream, Kubernetes must find a new position to avoid being replaced by “AI-optimized platforms.”

Kubernetes’ Challenge: The GPU Resource Management Gap

At KubeCon, NVIDIA announced the donation of the GPU DRA (Dynamic Resource Allocation) driver to the CNCF, marking the upstreaming of GPU resource management. GPU sharing and scheduling have become urgent issues for Kubernetes.

Traditionally, Kubernetes relied on the Device Plugin model to schedule GPUs, only supporting allocation by device count (e.g., nvidia.com/gpu: 1). But for AI inference tasks, more information is needed for resource scheduling, such as VRAM size, GPU topology, and sharing strategies. NVIDIA DRA makes GPU resource management more flexible and intelligent, gradually easing the “GPU resource crunch” in AI workloads.
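To make the contrast concrete, here is a hedged sketch. The first snippet is the classic device-plugin request mentioned above; the second shows roughly what a DRA ResourceClaim looks like. The field layout follows the pre-GA v1beta1 API shape, and the device class name `gpu.example.com` is a placeholder, so check the DRA documentation for your Kubernetes version before relying on it:

```yaml
# Classic device-plugin model: an opaque count, with no way to express
# VRAM size, topology, or sharing strategy.
resources:
  limits:
    nvidia.com/gpu: 1
---
# DRA sketch (v1beta1 shape, placeholder class name): claims reference a
# device class, which can carry richer selection logic.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: inference-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com
```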

This shift means Kubernetes is no longer just a “container orchestration platform,” but is becoming the infrastructure layer for AI-specific resource scheduling.

Against this backdrop, both the community and industry are exploring finer-grained GPU resource abstraction and scheduling mechanisms. For example, the open-source project HAMi is building a GPU resource management layer for AI workloads on top of Kubernetes, supporting GPU sharing, VRAM-level allocation, and heterogeneous device scheduling.
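As a sketch of what VRAM-level sharing looks like from the user's side, HAMi-style requests extend the Pod's resource limits with memory and compute-share quantities. The resource names below follow the HAMi project's documented conventions, but exact names, units, and availability depend on your deployment, so treat this as illustrative:

```yaml
# Hedged sketch of a HAMi shared-GPU request: one vGPU slice with
# 3000 MB of VRAM and roughly 30% of a physical GPU's compute.
resources:
  limits:
    nvidia.com/gpu: 1        # number of vGPU slices
    nvidia.com/gpumem: 3000  # VRAM in MB
    nvidia.com/gpucores: 30  # percentage of GPU compute
```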

Figure 4: HAMi demo at KubeCon EU 2026 Keynote

These efforts are not about replacing Kubernetes, but about filling the resource model gaps for the AI era. In the long run, this layer may evolve into a “GPU Abstraction Layer” similar to CNI/CSI, becoming a key part of AI-native infrastructure.

The Production “Gap”: Many AI PoCs, Few in Production

A common post-event summary was: Many PoCs, but “everyday production deployments” are still rare. Pulumi summarized it as:

lots of working demos, very few production setups people trust

This shows that while many AI workload solutions succeed in technical demos, the transition from experimentation to production remains difficult. From GPU resource sharing to inference request scheduling, whether Kubernetes as the foundation can support this shift remains an open question.

The Rise of Inference Systems: Kubernetes’ Scheduling Boundaries Are Challenged

Another major event at this KubeCon was llm-d being contributed to the CNCF as a Sandbox project.

If GPU DRA represents the upstreaming of device resource models, then llm-d represents another critical evolution: Distributed LLM inference capabilities are moving from proprietary engineering implementations to standardized, community-driven collaboration in cloud native.

This is significant not just because it’s another open-source project, but because it shows that Kubernetes’ challenges in the AI era are no longer just about “how to schedule GPUs,” but also “how to host inference systems themselves.” As prefill/decode separation, request routing, KV cache management, and throughput optimization move into the infrastructure layer, Kubernetes’ boundaries are being redefined.

Traditionally, the Kubernetes scheduler focused on Pod scheduling. But in AI inference scenarios, scheduling is not just about picking a node—it’s about selecting the most suitable inference instance based on request characteristics. Factors like model state, request queue depth, and cache hit rate all need to be considered. This process is increasingly managed by inference runtimes, forming new “request-level scheduling” systems.
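The idea of request-level scheduling can be sketched in a few lines of Python. This is an illustrative toy, not the actual llm-d or Kubernetes logic; the `Instance` fields and the scoring weights are assumptions chosen only to mirror the factors named above (model state, queue depth, cache hit rate):

```python
from dataclasses import dataclass

@dataclass
class Instance:
    name: str
    model_loaded: bool      # target model already resident in VRAM
    queue_depth: int        # outstanding requests on this replica
    cache_hit_ratio: float  # fraction of the prompt prefix in KV cache (0..1)

def score(inst: Instance) -> float:
    """Higher is better. Weights are illustrative, not from any real scheduler."""
    s = 0.0
    if inst.model_loaded:
        s += 100.0                    # avoid a cold model load
    s += 50.0 * inst.cache_hit_ratio  # prefer warm KV caches
    s -= 10.0 * inst.queue_depth      # penalize long queues
    return s

def pick(instances: list[Instance]) -> Instance:
    """Route the request to the highest-scoring replica."""
    return max(instances, key=score)
```

A node-level scheduler never sees these signals; this is why the text describes inference runtimes growing their own request-level scheduling layer.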

This leads to an overlap between the Kubernetes scheduler and inference systems, forcing Kubernetes to rethink its role: should it keep expanding, or collaborate with inference systems?

AI-Native Infrastructure: The Key Challenge for Production

At the AI Native Summit, the real needs for AI-native infrastructure were especially clear. The focus was no longer “can it run on Kubernetes,” but how to make AI workloads routine, stable, and production-ready on Kubernetes.

Figure 5: At the AI Native Summit after KubeCon, Linux Foundation Chairman Jonathan said cloud native is entering the AI-native era.

The core challenge is delivery. Unlike traditional apps, AI model weights are often huge—tens of GB or even TB—making model delivery and data management extremely complex. Traditional container delivery systems (like image layers) struggle with such massive data and complex versioning.

A key direction for Kubernetes is to standardize model weight and data delivery, using ImageVolume and OCI artifacts to solve AI model delivery and version management on Kubernetes. This not only reduces “cold start” times but also provides infrastructure support for multi-tenancy and compliance.
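A hedged sketch of the pattern: the `image` volume source (a feature-gated Kubernetes capability introduced as alpha in 1.31) mounts an OCI artifact containing model weights into a serving container, separate from the runtime image. The image references below are hypothetical placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-server
spec:
  containers:
  - name: server
    image: inference-runtime:latest            # hypothetical serving image
    volumeMounts:
    - name: model-weights
      mountPath: /models
      readOnly: true
  volumes:
  - name: model-weights
    image:                                     # ImageVolume source (feature-gated)
      reference: registry.example.com/models/my-llm:v1  # OCI artifact with weights
      pullPolicy: IfNotPresent
```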

Summary

Kubernetes won’t be replaced by AI, but it is being reshaped as the core of infrastructure. This anxiety is the force driving its evolution: it’s moving from a “general-purpose infrastructure platform” to a “multifunctional foundation for AI”. Some even call it the AI operating system.

In the future, Kubernetes’ core competitiveness will no longer be just container management, but how effectively it can schedule and manage AI workloads, and how it can make AI a routine part of operations. This was my biggest takeaway from the AI Native Summit and KubeCon, and it’s what I look forward to in the Kubernetes ecosystem over the next few years.


Day One in Amsterdam: Kubernetes Is Rethinking AI

2026-03-23 04:41:19

Today marks my first day at KubeCon Europe 2026.

Figure 11: Jimmy on the first day of KubeCon EU 2026

One strong impression stands out:

The world is big, but this circle is really small.

Old Friends, New Cycle

At the Maintainer Summit, I met many familiar faces—

Colleagues from Ant Group, friends from Tetrate, and some people I’ve known for nearly a decade. Together, we’ve journeyed from the early days of Kubernetes, Service Mesh, and cloud native infrastructure to today.

In a sense, this generation has fully experienced:

  • The rise of Kubernetes
  • The standardization of Cloud Native
  • The microservices and service mesh boom
  • And now, the era of AI Infrastructure

This isn’t about “new people entering the field,” but rather—

The same group stepping into a new technology cycle.

What Is the Maintainer Summit Discussing?

If you ask:

What is the Kubernetes community most concerned about right now?

Today’s answer is very clear:

👉 How to run AI workloads better on Kubernetes

Figure 12: The Maintainer Summit’s main topic is AI Infra

Many topics at the Maintainer Summit revolved around:

  • Scheduling models for LLM / AI workloads
  • GPU / accelerator resource management
  • Integrating inference systems with Kubernetes
  • Redefining the roles of data plane vs. control plane
  • How observability tools like OTel monitor AI workloads

In other words:

Kubernetes hasn’t been replaced by AI; it’s actively “absorbing” AI.

Key Signal: GPUs Are Becoming the “Infrastructure Layer”

Today, I had in-depth discussions with the CNCF TOC, Red Hat, and the vLLM community.

The core question was:

How should GPUs be “platformized”?

Some consensus is already clear:

  • GPUs are no longer just devices
  • They are now a schedulable, partitionable, and shareable resource layer
Figure 13: TOC meeting discussing GPU resource management and LLM Serving integration

At the Maintainer Summit in Amsterdam, we had deep discussions with CNCF TOC, Red Hat, and the vLLM community about GPU resource management and LLM Serving integration in Kubernetes scenarios, and explored potential collaboration between vLLM and HAMi.

Behind this is a major paradigm shift:

| Past | Now |
| --- | --- |
| GPU = node resource | GPU = infrastructure layer |
| Exclusive use | Multi-tenant sharing |
| Static binding | Dynamic scheduling |
| Managed within frameworks | Unified management at the platform layer |

This is exactly what we’ve been working on in HAMi.

HAMi: From “Project” to “Reference Pattern”

Another interesting change today:

HAMi is no longer just a “community project”—it’s becoming:

A reference implementation (reference pattern) for AI Infra

Figure 14: Li Mengxuan, CTO of Dynamia, sharing HAMi’s design and practice at KubeCon EU 2026 Maintainer Summit

This is reflected in several ways:

  • Invited to present at the Maintainer Summit
  • Participating in CNCF TOC discussions
  • Demoing as part of CNCF Incubating review discussions
  • Exploring joint content with the vLLM community (even discussing a joint blog 👀)

Especially in conversations with Red Hat and vLLM, a clear trend emerged:

GPU resource management and LLM serving are becoming coupled

That is:

  • Upper layer: vLLM / inference frameworks
  • Lower layer: GPU scheduling / sharing

A new “interface layer” is gradually forming.

This is a direction worth betting on.

Figure 15: At the TAG Workshop, HAMi was discussed as an Incubating demo

A Caution: The AI Infra Startup Boom Hasn’t Really Begun

At the same time, I have a somewhat “counterintuitive” observation:

We haven’t yet seen a large wave of AI Infra (K8s-focused) startups.

Most companies I saw today fall into a few groups:

  • Some pivoting from CI/CD, Service Mesh, or Gateway products
  • Some traditional cloud vendors extending into AI
  • Some working on models, agents, or even lower-level technology

But those truly focused on:

“Making AI workloads run better on Kubernetes”

are still rare at this layer.

This could mean two things:

1) This Layer Isn’t Fully Formed Yet

Currently, most activity is at:

  • The model layer (LLM / foundation models)
  • The application layer (Agent / Copilot)

But not at:

  • The scheduling layer
  • The resource layer
  • The runtime layer

2) Or, the Barrier to Entry Is Very High

Because at its core, this is:

The intersection of Cloud Native × GPU × AI workload

It’s not just “wrapping AI,” but a fundamental re-architecture at the infrastructure level.

My Take

If we break down the AI technology stack:

  • Agent / Application
  • LLM Serving (vLLM, etc.)
  • AI Runtime / Scheduling
  • GPU Resource Layer
  • Hardware

Most innovation today is concentrated in:

  • The top two layers (Agent / LLM)

But the real long-term moat lies in:

  • The middle two layers (Runtime + Resource Layer)

And Kubernetes is very likely to remain:

The default platform for this middle layer

Summary

Today’s takeaway:

Kubernetes is not obsolete; it’s being redefined.

And our generation is shifting from:

“Cloud Native Builders”

to:

“AI Infrastructure Builders”

More to come tomorrow.

HAMi Website Refactor: Why HAMi Docs and Website Underwent a Complete Redesign

2026-03-17 08:55:52

This redesign is more than a style update—it’s a step toward clearer technical communication and better user experience. Try the new HAMi website at https://project-hami.io and submit issues here.

Over the past two months, I conducted a thorough refactor of the documentation website (see GitHub). Externally, it looks like a “visual redesign”, but from the perspective of community maintainers and content builders, it’s a comprehensive upgrade of information architecture, content system, and frontend experience.

This article aims to systematically explain three things: why we did this refactor, what exactly changed, and what these changes mean for the HAMi community.

Why Refactor the Website and Documentation

HAMi is a CNCF-hosted open source project initiated and contributed by Dynamia, with growing influence in GPU virtualization, heterogeneous compute scheduling, and AI infrastructure. The community content is expanding, and user types are becoming more diverse: from first-time visitors to engineers and enterprise users seeking deployment docs, architecture diagrams, case studies, and ecosystem information.

The original site was functional, but as content grew, several issues became apparent:

  • The homepage lacked information density, making it hard to quickly grasp the project’s overall value.
  • Connections between docs, blogs, and community info were not smooth; content entry points were scattered.
  • Search experience was unstable; external solutions were not ideal in practice.
  • Mobile experience had many details needing improvement, especially navigation, card layouts, and footer areas.
  • Visual style was inconsistent, making it hard to convey community influence and engineering maturity.

For a fast-evolving open source community, the website is not just a “place for docs”, but the public interface of the community. It needs to serve as project introduction, knowledge gateway, adoption proof, community connector, and brand expression.

So the goal of this refactor was clear: not just superficial beautification, but to truly upgrade the website into HAMi’s systematic community entry point.

What Was Done in This Refactor

This update was not a single-point change, but a series of systematic improvements.

Homepage Redesign and Complete Information Architecture Overhaul

The most obvious change is the homepage.

We redesigned the homepage structure, moving away from simply stacking content blocks, and instead organizing the page around the main narrative: “Project Positioning → Core Capabilities → Ecosystem Entry → Content Accumulation → Community Trust”.

Specifically, the homepage received several key upgrades:

  • Rebuilt the Hero section to strengthen first-screen information delivery and action entry.
  • Optimized CTA design so users can quickly access docs, blogs, and resources.
  • Added and enhanced multiple homepage sections to showcase project value and community reach in a more structured way.
  • Adjusted visual hierarchy, background atmosphere, and scroll rhythm, transforming the homepage from a “content list” into a “narrative page”.

These changes include Hero animations and atmosphere layers, research/story sections, new resource entry sections, refreshed CTAs, unified background design, and ongoing reduction of visual noise. Together, they solve a core problem: enabling visitors to understand what HAMi is and why it’s worth exploring further within seconds.

Architecture Diagrams

Key diagrams were redrawn for clearer technical communication. This helps users grasp HAMi’s role in AI infrastructure.

Figure 1: HAMi website homepage architecture diagram

For HAMi, this change is critical. The community faces not just a single feature, but a set of system-level challenges involving Kubernetes, schedulers, GPU Operators, heterogeneous devices, and enterprise platforms. Improved diagrams make the website a better technical entry point.

Added Case Studies, Community, and Ecosystem Sections to Make Impact Visible

Another important direction was strengthening the “community proof” layer.

Many open source project sites fall into the trap of having complete docs, but users can’t tell if the project is truly adopted, if the community is active, or if the ecosystem is expanding. The HAMi website redesign consciously addresses this.

Figure 2: HAMi ecosystem and device support
Figure 3: HAMi adopters
Figure 4: HAMi contributor organizations

Blog & Reading Experience

Blog cards, lists, and metadata were unified for easier reading and sharing. Blogs are now a core communication layer.

Figure 5: HAMi website blog list page

Mobile Optimization

Navigation, card layouts, footer, and search were improved for smoother mobile browsing.

Figure 6: HAMi website mobile view

Footer & Search

Footer layout was enhanced for better navigation and credibility. Built-in search replaced unreliable external solutions, improving content accessibility.

Figure 7: HAMi website footer
Figure 8: HAMi website built-in search

What This Redesign Means for the HAMi Community

From screenshots, it looks like “the website looks better”. But from a community-building perspective, its significance is deeper.

First, HAMi’s external expression is more systematic.

The website is no longer just a collection of scattered pages, but is forming a complete narrative chain: users can understand project value from the homepage, capability details from docs, practical paths from blogs, and community impact from ecosystem modules.

Second, community content assets are reorganized.

Previously, valuable articles, diagrams, and explanations existed but were hard to find. Now, through homepage sections, navigation, and search refactor, these contents are more effectively connected.

Third, HAMi’s community image is more mature.

A mature open source project needs not just an active code repository, but clear, stable, and sustainable website expression. Structure, style, and usability are part of the community’s engineering capability.

Fourth, this lays the foundation for expanding case studies, adopters, contributors, and ecosystem content.

With the framework sorted, adding more case studies, collaboration entry points, or showcasing more adopters and partners will be more natural and easier for users to understand.

As a Community Contributor, My Top Three Takeaways from This Redesign

In summary, I believe this refactor got three things right:

  • Upgraded the website from a “content dump” to a “community gateway”.
  • Combined visual optimization with information architecture adjustment, not just a skin change.
  • Improved basic experiences like search, mobile, navigation, and footer.

These may not be as flashy as launching a new feature, but they directly impact content dissemination, user comprehension, and the project’s long-term image.

For infrastructure projects like HAMi, technical capability is fundamental, but clearly communicating, organizing, and continuously presenting that capability is also a form of infrastructure.

Summary

This HAMi documentation and website refactor is essentially an upgrade to the community’s “expression layer” infrastructure.

It improves the visual and reading experience while reorganizing the content system, homepage narrative, search paths, mobile access, and community signal display. The homepage redesign, redrawn architecture diagrams, unified blog style, mobile optimization, enhanced footer, and the switch from external to built-in search together constitute a true “refactor”.

Externally, it helps more people quickly understand HAMi; internally, it provides a stable platform for the community to accumulate case studies, expand the ecosystem, and serve adopters and contributors.

The website is not an accessory to the open source community, but part of its long-term influence. HAMi’s redesign is about taking this seriously.

If you’re interested in Kubernetes GPU virtualization, add me on WeChat jimmysong or scan the QR code below.

GTC 2026 Eve: AI is Becoming the New Infrastructure

2026-03-15 11:34:06

AI is quietly reshaping the infrastructure landscape, and GTC 2026 may become a key node in this transformation.

Next week, one of the most important technology conferences in the AI industry, NVIDIA GTC 2026, will be held in San Jose, USA.

For many people, GTC is just a GPU technology conference. But if you follow the development of the AI industry over the past few years, you’ll find an interesting phenomenon:

Many important narratives about AI infrastructure are gradually taking shape at GTC.

From CUDA, DGX, to AI Factory, and most recently Jensen Huang’s proposed AI Five-Layer Cake, NVIDIA is constantly attempting to redefine the computing infrastructure of the AI era.

This is why many people call GTC:

AI’s “Woodstock.”

Figure 1: NVIDIA GTC Conference

This year’s GTC (March 16-19) is expected to cover various levels of the AI stack, including:

  • AI Chips
  • AI Data Centers
  • AI Agents
  • Robotics
  • Inference Computing

According to NVIDIA’s official blog, this year’s keynote will focus on the complete AI stack from chips to applications.

If we put these signals together, we can actually see a larger trend:

AI is transforming from an “applied technology” into “infrastructure.”

The Perspective of Industrial Revolutions

From a longer time scale, the technological revolutions in human history are essentially infrastructure revolutions.

We usually divide industrial revolutions into four times.

In the table below, you can see the infrastructure corresponding to each industrial revolution:

| Industrial Revolution | Infrastructure |
| --- | --- |
| Steam Revolution | Steam engine |
| Electrical Revolution | Power grid |
| Digital Revolution | Computer |
| Internet Era | Network |

Table 1: Industrial Revolutions and Corresponding Infrastructure

First Industrial Revolution: Steam

The steam engine allowed humans to utilize mechanical power on a large scale for the first time. Production no longer relied on human or animal power, but on machines.

Second Industrial Revolution: Electricity

Electricity changed not only the source of power, but also the organization of production. Assembly lines, large-scale manufacturing, and modern industrial systems are all built on the foundation of the power grid.

Third Industrial Revolution: Computers

Computers allowed information to be processed digitally. Software became a production tool.

Fourth Industrial Revolution: Internet and Intelligence

The internet connects all computers together. Cloud computing transforms computing resources into infrastructure. And AI gives machines a certain degree of “cognitive ability.”

The True Significance of AI

If we observe these industrial revolutions, we discover a pattern:

Each industrial revolution produces a new General Purpose Infrastructure.

And AI is likely to become the next-generation infrastructure.

NVIDIA even directly stated in a recent article:

AI is essential infrastructure, like electricity and the internet.

In other words:

AI is no longer just an applied technology, but a new factor of production.

NVIDIA’s Five-Layer Cake

Recently, Jensen Huang proposed a very interesting concept: AI Five-Layer Cake.

Figure 2: AI Five Layer Cake (Image source: NVIDIA)

AI is broken down into five layers:

  1. Energy
  2. Chips
  3. AI Infrastructure
  4. Models
  5. Applications

This model actually illustrates one thing:

AI is a complete industrial system.

Jensen Huang even described AI at Davos as:

“One of the largest-scale infrastructure constructions in human history.”

Signals GTC 2026 May Release

This year’s GTC is expected to release several important directions.

Inference Computing

The focus of AI in the past was training. But the main load of AI in the future is likely to be Inference.

Analysts expect that by 2030, 75% of computing demand in the AI data center market will come from inference.

Agentic AI

The past AI model was:

User → Model → Answer

The Agent model is more complex:

User → Agent → Tools → Model → Action

The flowchart below shows the main interaction paths in the Agent model:

Figure 3: Agentic AI Interaction Flow

AI is no longer just answering questions, but executing tasks.
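The User → Agent → Tools → Model → Action loop above can be sketched in a few lines of Python. Everything here is illustrative: `run_agent`, `fake_model`, and the `calc` tool are hypothetical stand-ins, and a real agent would call an LLM instead of the hand-written model:

```python
# Minimal agent-loop sketch: User -> Agent -> Tools -> Model -> Action.
# All names here are illustrative, not any real framework's API.

def run_agent(user_goal, model, tools, max_steps=5):
    """Loop until the model answers or the step budget runs out."""
    context = [f"goal: {user_goal}"]
    for _ in range(max_steps):
        decision = model(context)      # ("tool", name, arg) or ("answer", text)
        if decision[0] == "answer":
            return decision[1]
        _, name, arg = decision
        result = tools[name](arg)      # execute the tool: the "Action" step
        context.append(f"{name}({arg}) -> {result}")
    return None

def fake_model(context):
    # Pretend LLM: first asks for the calculator, then reads its result.
    for entry in context:
        if entry.startswith("calc("):
            return ("answer", entry.split("-> ")[1])
    return ("tool", "calc", "2+2")

tools = {"calc": lambda expr: str(eval(expr))}
answer = run_agent("what is 2+2?", fake_model, tools)
```

The point of the sketch is the shape of the workload: each user request fans out into multiple model calls and tool executions, which is exactly why Agents change the computing profile discussed below.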

Agent Platform

Recent media reports suggest that NVIDIA may launch a new Agent platform: NemoClaw, aimed at helping enterprises deploy AI Agents.

If this project is truly released, it means NVIDIA’s stack will become the following structure:

Figure 4: NVIDIA Agent Platform Architecture

This is actually a complete AI stack.

Agents Change Computing Workloads

The emergence of Agents brings new computing workload issues.

Past AI workloads were mainly:

  • Training
  • Inference

But Agents bring a third type of workload:

Agent Workloads

The figure below shows the diverse workload types related to Agents:

Figure 5: Agent Workloads Structure

This workload is highly fragmented: GPUs are no longer occupied for long stretches but instead serve many small, short-lived requests. This poses new challenges for infrastructure.

AI-Native Infrastructure

For the past few years, I’ve been thinking about a question:

What is AI-native infrastructure?

It is clearly not just “Kubernetes with GPUs.” I’m more inclined to believe it needs to possess several characteristics.

GPU as a First-Class Resource

In the cloud computing era, CPU is the core resource. In the AI era, GPU is the core resource.

Heterogeneous Computing

Real-world AI chips are not limited to NVIDIA:

  • NVIDIA
  • Ascend
  • Cambricon
  • Metax
  • Moore Threads

Future AI infrastructure must be able to manage heterogeneous computing.

GPU Sharing

GPU is a very expensive resource. If it cannot be shared, utilization will be very low. This is why GPU virtualization and slicing are becoming increasingly important.

AI Scheduling

AI scheduling includes not only traditional CPU and Memory, but also:

  • GPU
  • VRAM
  • Topology
  • Bandwidth
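The dimensions above turn scheduling from a single-number comparison into a multi-dimensional fit check. Here is a deliberately tiny Python sketch of that idea; real schedulers (Kubernetes with DRA, HAMi, etc.) use far richer models, and the dimension names and capacities below are made up for illustration:

```python
# Toy multi-dimensional GPU fit check: a device fits only if every
# requested dimension (VRAM, bandwidth, ...) is within its capacity.
def first_fit(gpus: dict, request: dict):
    """Return the name of the first GPU whose capacity covers the request."""
    for name, capacity in gpus.items():
        if all(capacity.get(dim, 0) >= need for dim, need in request.items()):
            return name
    return None

# Hypothetical inventory and request for illustration only.
gpus = {
    "gpu-0": {"vram_gb": 24, "bandwidth_gbps": 600},
    "gpu-1": {"vram_gb": 80, "bandwidth_gbps": 2000},
}
request = {"vram_gb": 40, "bandwidth_gbps": 1000}
```

With a count-only resource model, both devices look identical; only once VRAM and bandwidth become schedulable dimensions can the request be matched to the right card.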

A Possible AI Tech Stack

Combining the above trends, the future AI stack may present the following structure:

Figure 6: AI Tech Stack Evolution

This structure is very close to NVIDIA’s Five-Layer Cake.

My Judgment

Combining signals from GTC, AI Factory, Agents, and AI Five-Layer Cake, we can see a very obvious trend:

AI is rewriting computing infrastructure.

Future competition may not just be “who has the best model,” but:

Who has the best AI Infrastructure.

Just like the past few decades:

  • Electricity determines industrial capability
  • Internet determines information capability
  • Cloud computing determines software capability

The future may be:

AI Infrastructure determines intelligence capability.

Summary

If we stretch the time scale a bit longer, we may be in a new historical stage.

AI is no longer just a technological tool. It is becoming new infrastructure.

Just like:

  • Electricity
  • Internet
  • Cloud computing

And AI-native infrastructure is likely to become one of the most important technology directions for the next decade.

When GPUs Move Toward Open Scheduling: Structural Shifts in AI Native Infrastructure

2026-02-13 22:32:46

The future of GPU scheduling isn’t about whose implementation is more “black-box”—it’s about who can standardize device resource contracts into something governable.

Figure 1: GPU Open Scheduling

Introduction

Have you ever wondered: why are GPUs so expensive, yet overall utilization often hovers around 10–20%?

Figure 2: GPU Utilization Problem: Expensive GPUs with only 10-20% utilization

This isn’t a problem you solve with “better scheduling algorithms.” It’s a structural problem: GPU scheduling is undergoing a shift from “proprietary implementation” to “open scheduling,” similar to how networking converged on CNI and storage converged on CSI.

In the HAMi 2025 Annual Review, we noted: “HAMi 2025 is no longer just about GPU sharing tools—it’s a more structural signal: GPUs are moving toward open scheduling.”

By 2025, the signals of this shift became visible: Kubernetes Dynamic Resource Allocation (DRA) graduated to GA and became enabled by default, NVIDIA GPU Operator started defaulting to CDI (Container Device Interface), and HAMi’s production-grade case studies under CNCF are moving “GPU sharing” from experimental capability to operational excellence.

This post analyzes this structural shift from an AI Native Infrastructure perspective, and what it means for Dynamia and the industry.

Why “Open Scheduling” Matters

In multi-cloud and hybrid cloud environments, GPU model diversity significantly amplifies operational costs. One large internet company’s platform spans H200/H100/A100/V100/4090 GPUs across five clusters. If you can only allocate “whole GPUs,” resource misalignment becomes inevitable.

“Open scheduling” isn’t a slogan—it’s a set of engineering contracts being solidified into the mainstream stack.

Standardized Resource Expression

Before: GPUs were extended resources. The scheduler couldn’t tell whether they represented memory, compute, or a device type.

Figure 3: Open Scheduling Standardization Evolution

Now: Kubernetes DRA provides objects like DeviceClass, ResourceClaim, and ResourceSlice. This lets drivers and cluster administrators define device categories and selection logic (including CEL-based selectors), while Kubernetes handles the full loop: match devices → bind claims → place Pods onto nodes with access to allocated devices.

Even more importantly, in Kubernetes 1.34 the core APIs in the resource.k8s.io group graduated to GA: DRA is now stable and enabled by default, and the community has committed to avoiding breaking changes going forward. This means the ecosystem can invest with confidence in a stable, standard API.
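As a concrete sketch, the two sides of this contract look roughly like the following manifests. Field names follow the resource.k8s.io/v1 schema that went GA in Kubernetes 1.34 and may differ slightly in your version; the driver name gpu.example.com and the class/claim names are placeholders:

```yaml
# DeviceClass: the administrator defines a category of devices,
# selected via a CEL expression over driver-published attributes.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: example-gpu
spec:
  selectors:
    - cel:
        expression: device.driver == "gpu.example.com"
---
# ResourceClaim: a workload asks for exactly one device of that class;
# Kubernetes matches devices, binds the claim, and places the Pod.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
      - name: gpu
        exactly:
          deviceClassName: example-gpu
```

A Pod then references the claim under spec.resourceClaims and consumes it via resources.claims in its container spec, closing the match → bind → place loop described above.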

Standardized Device Injection

Before: Device injection relied on vendor-specific hooks and runtime class patterns.

Now: The Container Device Interface (CDI) abstracts device injection into an open specification. NVIDIA’s Container Toolkit explicitly describes CDI as an open specification for container runtimes, and NVIDIA GPU Operator 25.10.0 defaults to enabling CDI on install/upgrade—directly leveraging runtime-native CDI support (containerd, CRI-O, etc.) for GPU injection.

This means “devices into containers” is also moving toward replaceable, standardized interfaces.
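Under the hood, a CDI spec is just a JSON (or YAML) file the runtime reads to learn which device nodes, mounts, and environment variables to inject. A minimal illustrative spec, with device name and paths as placeholders:

```json
{
  "cdiVersion": "0.6.0",
  "kind": "nvidia.com/gpu",
  "devices": [
    {
      "name": "gpu0",
      "containerEdits": {
        "deviceNodes": [
          { "path": "/dev/nvidia0" }
        ],
        "env": [
          "NVIDIA_VISIBLE_DEVICES=0"
        ]
      }
    }
  ]
}
```

A runtime with native CDI support (containerd, CRI-O) resolves a device request like nvidia.com/gpu=gpu0 against such a file directly, with no vendor-specific hook binary in the path.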

HAMi: From “Sharing Tool” to “Governable Data Plane”

On this standardization path, HAMi’s role needs redefinition: it’s not about replacing Kubernetes—it’s about turning GPU virtualization and slicing into a declarative, schedulable, governable data plane.

Data Plane Perspective

HAMi’s core contribution is to expand the allocatable unit from whole GPUs to finer-grained shares of memory and compute, forming a complete allocation chain:

  1. Device discovery: Identify available GPU devices and models
  2. Scheduling placement: Use Scheduler Extender to make native schedulers “understand” vGPU resource models (Filter/Score/Bind phases)
  3. In-container enforcement: Inject share constraints into container runtime
  4. Metric export: Provide observable metrics for utilization, isolation, and more

This transforms “sharing” from ad-hoc “it runs” experimentation into engineering capability that can be declared in YAML, scheduled by policy, and validated by metrics.
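In practice, that declaration is just resource limits on a Pod. A sketch using the vGPU resource names from Project-HAMi’s documentation (the image and the specific values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-inference
spec:
  containers:
    - name: worker
      image: nvidia/cuda:12.4.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1        # one vGPU slice
          nvidia.com/gpumem: 8000  # device memory cap, in MB
          nvidia.com/gpucores: 30  # compute cap, percent of one GPU
```

The scheduler places this Pod only on a card with enough free memory and compute headroom, and the in-container enforcement layer holds the workload to both caps.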

Scheduling Mechanism: Enhancement, Not Replacement

HAMi’s scheduling doesn’t replace Kubernetes—it uses a Scheduler Extender pattern to let the native scheduler understand vGPU resource models:

  • Filter: Filter nodes based on memory, compute, device type, topology, and other constraints
  • Score: Apply configurable policies like binpack, spread, topology-aware scoring
  • Bind: Complete final device-to-Pod binding

This architecture positions HAMi naturally as an execution layer under higher-level “AI control planes” (queuing, quotas, priorities)—working alongside Volcano, Kueue, Koordinator, and others.

Figure 4: HAMi Scheduling Architecture (Filter → Score → Bind)
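To make the Filter/Score mechanics concrete, here is a deliberately simplified model of extender-style vGPU placement. This is not HAMi’s actual code, only the shape of the logic; the node names, capacities, and the 24 GiB normalization constant are invented:

```python
from dataclasses import dataclass

@dataclass
class VGPU:
    mem_free: int    # MiB of device memory still unallocated
    cores_free: int  # percent of compute still unallocated

def fits(mem, cores, gpu):
    """Filter predicate: can this GPU host the requested share?"""
    return gpu.mem_free >= mem and gpu.cores_free >= cores

def filter_phase(mem, cores, nodes):
    """Filter: keep nodes with at least one vGPU satisfying the request."""
    return [name for name, gpus in nodes.items()
            if any(fits(mem, cores, g) for g in gpus)]

def binpack_score(mem, cores, gpus):
    """Score (binpack policy): prefer the fullest GPU that still fits,
    so fragments concentrate on busy cards and whole GPUs stay free."""
    best = min((g for g in gpus if fits(mem, cores, g)),
               key=lambda g: g.mem_free)
    return 100 - best.mem_free * 100 // 24_000  # normalize vs. a 24 GiB card

nodes = {
    "node-a": [VGPU(mem_free=20_000, cores_free=90)],  # nearly idle
    "node-b": [VGPU(mem_free=10_000, cores_free=40)],  # half full
    "node-c": [VGPU(mem_free=2_000, cores_free=10)],   # too little memory
}
feasible = filter_phase(8_000, 30, nodes)              # drops node-c
ranked = sorted(feasible, reverse=True,
                key=lambda n: binpack_score(8_000, 30, nodes[n]))
print(ranked)  # binpack ranks the half-full node-b first
```

The Bind phase (not shown) would then reserve the chosen GPU’s share and annotate the Pod so the node agent can enforce it; a spread policy would simply invert the scoring.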

Production Evidence: From “Can We Share?” to “Can We Operate?”

CNCF public case studies provide concrete answers: in a hybrid, multi-cloud platform built on Kubernetes and HAMi, 10,000+ Pods run concurrently, and GPU utilization improves from 13% to 37% (nearly 3×).

Figure 5: CNCF Production Case Studies: Ke Holdings 13%→37%, DaoCloud 80%+ utilization, SF Technology 57% savings

Here are highlights from several cases:

Case Study 1: Ke Holdings (February 5, 2026)

  • Environment: 5 clusters spanning public and private clouds
  • GPU models: H200/H100/A100/V100/4090 and more
  • Architecture: Separate “GPU clusters” for large training tasks (dedicated allocation) vs “vGPU clusters” with HAMi fine-grained memory slicing for high-density inference
  • Concurrent scale: 10,000+ Pods
  • Outcome: Overall GPU utilization improved from 13% to 37% (nearly 3×)

Case Study 2: DaoCloud (December 2, 2025)

  • Hard constraints: Must remain cloud-native, vendor-agnostic, and compatible with CNCF toolchain
  • Adoption outcomes:
    • Average GPU utilization: 80%+
    • GPU-related operating cost reduction: 20–30%
    • Coverage: 10+ data centers, 10,000+ GPUs
  • Explicit benefit: Unified abstraction layer across NVIDIA and domestic GPUs, reducing vendor dependency

Case Study 3: Prep EDU (August 20, 2025)

  • Negative experience: Isolation failures in other GPU-sharing approaches caused memory conflicts and instability
  • Positive outcome: HAMi’s vGPU scheduling, GPU type/UUID targeting, and compatibility with NVIDIA GPU Operator and RKE2 became decisive factors for production adoption
  • Environment: Heterogeneous RTX 4070/4090 cluster

Case Study 4: SF Technology (September 18, 2025)

  • Project: EffectiveGPU (built on HAMi)
  • Use cases: Large model inference, test services, speech recognition, domestic AI hardware (Huawei Ascend, Baidu Kunlun, etc.)
  • Outcomes:
    • GPU savings: Large model inference runs 65 services on 28 GPUs (37 saved); test cluster runs 19 services on 6 GPUs (13 saved)
    • Overall savings: Up to 57% GPU savings for production and test clusters
    • Utilization improvement: Up to 100% GPU utilization improvement with GPU virtualization
  • Highlights: Cross-node collaborative scheduling, priority-based preemption, memory over-subscription
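The “up to 57%” figure follows directly from the consolidation arithmetic in the large-model inference cluster:

```python
# SF Technology, large-model inference cluster: 65 services that would
# each need a dedicated GPU are consolidated onto 28 shared GPUs.
before, after = 65, 28
saved = before - after                      # GPUs freed by sharing
savings_pct = round(saved / before * 100)   # fraction of fleet saved
print(saved, savings_pct)                   # 37 GPUs, 57% savings
```

The test cluster follows the same pattern (19 services on 6 GPUs, 13 saved), which is why the post reports savings “up to” rather than exactly 57%.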

These cases demonstrate a consistent pattern: GPU virtualization becomes economically meaningful only when it participates in a governable contract—where utilization, isolation, and policy can be expressed, measured, and improved over time.

Strategic Implications for Dynamia

From Dynamia’s perspective (and as VP of Open Source Ecosystem), the strategic value of HAMi becomes clear:

Two-Layer Architecture: Open Source vs Commercial

  • HAMi (CNCF open source project): Responsible for “adoption and trust,” focused on GPU virtualization and compute efficiency
  • Dynamia enterprise products and services: Responsible for “production and scale,” providing commercial distributions and enterprise services built on HAMi

Figure 6: Dynamia Dual Mechanism: Open Source vs Commercial

This boundary is the foundation for long-term trust—project and company offerings remain separate, with commercial distributions and services built on the open source project.

Global Narrative Strategy

The internal alignment memo recommends a bilingual approach:

First layer: Lead globally with “GPU virtualization / sharing / utilization” (in Chinese, “GPU virtualization and heterogeneous scheduling” works directly, but the English first layer should avoid leading with “heterogeneous”)

Second layer: When users discuss mixed GPUs or workload diversity, introduce “heterogeneous” to confirm capability boundaries—never as the opening hook

Core anchor: Maintain “HAMi (project and community) ≠ company products” as the non-negotiable baseline for long-term positioning

The Right Commercialization Landing

DaoCloud’s case study already set vendor-agnostic and CNCF toolchain compatibility as hard constraints, framing vendor dependency reduction as a business and operational benefit—not just a technical detail. Project-HAMi’s official documentation lists “avoid vendor lock” as a core value proposition.

In this context, the right commercialization landing isn’t “closed-source scheduling”—it’s productizing capabilities around real enterprise complexity:

  • Systematic compatibility matrix
  • SLO and tail-latency governance
  • Metering for billing
  • RBAC, quotas, multi-cluster governance
  • Upgrade and rollback safety
  • Faster path-to-production for DRA/CDI and other standardization efforts

Forward View: The Next 2–3 Years

My strong judgment: over the next 2–3 years, GPU scheduling competition will shift from “whose implementation is more black-box” to “whose contract is more open.”

The reasons are practical:

Hardware Form Factors and Supply Chains Are Diversifying

  • OpenAI’s February 12, 2026 “GPT‑5.3‑Codex‑Spark” release emphasizes ultra-low latency serving, including persistent WebSockets and a dedicated serving tier on Cerebras hardware
  • Large-scale GPU-backed financing announcements (for pan-European deployments) illustrate the infrastructure scale and financial engineering surrounding accelerator fleets

These signals suggest that heterogeneity will grow: mixed accelerators, mixed clouds, mixed workload types.

Low-Latency Inference Tiers Will Force Systematic Scheduling

Low-latency inference tiers (beyond just GPUs) will force resource scheduling toward “multi-accelerator, multi-layer cache, multi-class node” architectural design—scheduling must inherently be heterogeneous.

Open Scheduling Is Risk Management, Not Idealism

In this world, “open scheduling” isn’t idealism—it’s risk management. Building “control plane + data plane” combinations around DRA/CDI and other solidifying open interfaces—pluggable, governable across tenants, and able to co-evolve with the ecosystem—looks like the truly sustainable path for AI Native Infrastructure.

The next battleground isn’t “whose scheduling is smarter”—it’s “who can standardize device resource contracts into something governable.”

Conclusion

When you place HAMi 2025 back in the broader AI Native Infrastructure context, it’s no longer just the year of “GPU sharing tools”—it’s a more structural signal: GPUs are moving toward open scheduling.

Figure 7: Open Scheduling Future Vision

The driving forces come from both ends:

  • Upstream: Standards like DRA/CDI continue to solidify
  • Downstream: Scale and diversity (multi-cloud, multi-model, even accelerators beyond GPUs)

For Dynamia, HAMi’s significance has transcended “GPU sharing tool”: it turns GPU virtualization and slicing into declarative, schedulable, measurable data planes—letting queues, quotas, priorities, and multi-tenancy actually close the governance loop.

AI Learning Resources: 44 Curated Collections from Our Cleanup

2026-02-08 20:20:05

“The best way to learn AI is to start building. These resources will guide your journey.”

Figure 1: AI Learning Resources Collection

In my ongoing effort to keep the AI Resources list focused on production-ready tools and frameworks, I’ve removed 44 collection-type projects—courses, tutorials, awesome lists, and cookbooks.

These resources aren’t gone—they’ve been moved here. This post is a curated collection of those educational materials, organized by type and topic. Whether you’re a complete beginner or an experienced practitioner, you’ll find something valuable here.

Why Remove Collections from AI Resources?

My AI Resources list now focuses on concrete tools and frameworks—projects you can directly use in production. Collections, while valuable, serve a different purpose: education and discovery.

By separating them, I:

  • Keep the resources list actionable and focused
  • Create a dedicated space for learning materials
  • Make it easier to find what you need

📚 Awesome Lists (14 Collections)

Awesome lists are community-curated collections of the best resources. They’re perfect for discovering new tools and staying updated.

Must-Explore Awesome Lists

Awesome Generative AI

  • Models, tools, tutorials, and research papers
  • Great for: Comprehensive overview of generative AI landscape

Awesome LLM

  • LLM resources: papers, tools, datasets, applications
  • Great for: Deep dive into large language models

Awesome AI Apps

  • Practical LLM applications, RAG examples, agent implementations
  • Great for: Real-world implementation examples

Awesome Claude Code

  • Claude Code commands, files, and workflows
  • Great for: Maximizing Claude Code productivity

Awesome MCP Servers

  • MCP servers for modular AI backend systems
  • Great for: Building with Model Context Protocol

Specialized Awesome Lists


🎓 Courses & Tutorials (9 Curricula)

Structured learning paths from universities and tech companies.

Microsoft’s AI Curriculum

AI for Beginners

  • 12 weeks, 24 lessons covering neural networks, deep learning, CV, NLP
  • Great for: Complete AI foundation
  • Format: Lessons, quizzes, projects

Machine Learning for Beginners

  • 12-week, 26-lesson curriculum on classic ML
  • Great for: ML fundamentals without deep math
  • Format: Project-based exercises

Generative AI for Beginners

  • 18 lessons on building GenAI applications
  • Great for: Practical GenAI development
  • Format: Hands-on projects

AI Agents for Beginners

  • 11 lessons on agent systems
  • Great for: Understanding autonomous agents
  • Format: Project-driven learning

EdgeAI for Beginners

  • Optimization, deployment, and real-world Edge AI
  • Great for: On-device AI applications
  • Format: Practical tutorials

MCP for Beginners

  • Model Context Protocol curriculum
  • Great for: Building with MCP
  • Format: Cross-language examples and labs

Official Platform Courses

Hugging Face Learn Center

  • Free courses on LLMs, deep RL, CV, audio
  • Great for: Hands-on Hugging Face ecosystem
  • Format: Interactive notebooks

OpenAI Cookbook

  • Runnable examples using OpenAI API
  • Great for: OpenAI API best practices
  • Format: Code examples and guides

PyTorch Tutorials

  • Basics to advanced deep learning
  • Great for: PyTorch mastery
  • Format: Comprehensive tutorials

🍳 Cookbooks & Example Collections (5 Collections)

Practical code examples and recipes.

Claude Cookbooks

  • Notebooks and examples for building with Claude
  • Great for: Anthropic Claude integration
  • Format: Jupyter notebooks

Hugging Face Cookbook

  • Practical AI cookbook with Jupyter notebooks
  • Great for: Open models and tools
  • Format: Hands-on examples

Tinker Cookbook

  • Training and fine-tuning examples
  • Great for: Fine-tuning workflows
  • Format: Platform-specific recipes

E2B Cookbook

  • Examples for building LLM apps
  • Great for: LLM application development
  • Format: Recipes and tutorials

arXiv Paper Curator

  • 6-week course on RAG systems
  • Great for: Production-ready RAG
  • Format: Project-based learning

📖 Guides & Handbooks (5 Resources)

In-depth guides on specific topics.

Prompt Engineering Guide

  • Comprehensive prompt engineering resources
  • Great for: Mastering prompt design
  • Format: Guides, papers, lectures, notebooks

Evaluation Guidebook

  • LLM evaluation best practices from Hugging Face
  • Great for: Assessing LLM performance
  • Format: Practical guide

Context Engineering

  • Design and optimize context beyond prompt engineering
  • Great for: Advanced context management
  • Format: Practical handbook

Context Engineering Intro

  • Template and guide for context engineering
  • Great for: Providing project context to AI assistants
  • Format: Template + guide

Vibe-Coding Workflow

  • 5-step prompt template for building MVPs with LLMs
  • Great for: Rapid prototyping with AI
  • Format: Workflow template

🗂️ Template & Workflow Collections

Reusable templates and workflows.

Claude Code Templates

  • Code templates for various programming scenarios
  • Great for: Claude AI development
  • Format: Template collection

n8n Workflows

  • 2,000+ professionally organized n8n workflows
  • Great for: Workflow automation
  • Format: Searchable catalog

N8N Workflows Catalog

  • Community-driven reusable workflow templates
  • Great for: Workflow import and versioning
  • Format: Template catalog

📊 Research & Evaluation

Academic and evaluation resources.

LLMSys PaperList

  • Curated list of LLM systems papers
  • Great for: Research on training, inference, serving
  • Format: Paper collection

Free LLM API Resources

  • LLM providers with free/trial API access
  • Great for: Experimentation without cost
  • Format: Provider list

🎨 Other Notable Resources

System Prompts and Models of AI Tools

  • Community-curated collection of system prompts and AI tool examples
  • Great for: Prompt and agent engineering
  • Format: Resource collection

ML Course CS-433

  • EPFL Machine Learning Course
  • Great for: Academic ML foundation
  • Format: Lectures, labs, projects

Machine Learning Engineering

  • ML engineering open-book: compute, storage, networking
  • Great for: Production ML systems
  • Format: Comprehensive guide

Realtime Phone Agents Course

  • Build low-latency voice agents
  • Great for: Voice AI applications
  • Format: Hands-on course

LLMs from Scratch

  • Build a working LLM from first principles
  • Great for: Understanding LLM internals
  • Format: Repository + book materials

💡 How to Use This Collection

For Complete Beginners

  1. Start with: Microsoft’s AI for Beginners
  2. Practice with: PyTorch Tutorials
  3. Explore: Awesome AI Apps for inspiration

For Developers

  1. Build skills: OpenAI Cookbook + Claude Cookbooks
  2. Find tools: Awesome Generative AI + Awesome LLM
  3. Learn workflows: n8n Workflows Catalog

For Researchers

  1. Stay updated: Awesome Generative AI + LLMSys PaperList
  2. Deep dive: Awesome LLM
  3. Implement: Hugging Face Cookbook

For Product Builders

  1. Find examples: Awesome AI Apps
  2. Learn workflows: n8n Workflows Catalog
  3. Study patterns: Awesome LLM Apps

🔄 What Was NOT Removed

Agent frameworks and production tools remain in the AI Resources list, including:

  • AutoGen - Microsoft’s multi-agent framework
  • CrewAI - High-performance multi-agent orchestration
  • LangGraph - Stateful multi-agent applications
  • Flowise - Visual agent platform
  • Langflow - Visual workflow builder
  • And 80+ more agent frameworks

These are functional tools you can use to build applications, not educational collections. They belong in the AI Resources list.


📝 Summary

I removed 44 collection-type projects from the AI Resources list to keep it focused on production tools:

  • 14 Awesome Lists - Discover new tools and stay updated
  • 9 Courses & Tutorials - Structured learning paths
  • 5 Cookbooks - Practical code examples
  • 5 Guides & Handbooks - In-depth resources
  • 4 Template Collections - Reusable workflows
  • 7 Other Resources - Research and evaluation

These resources remain incredibly valuable for learning and discovery. They just serve a different purpose than the production-focused tools in my AI Resources list.


Next Steps:

  1. Bookmark this post for future reference
  2. Explore the AI Resources list for production tools (agent frameworks, databases, etc.)
  3. Check out my blog for more AI engineering insights

Acknowledgments: This collection was compiled during my AI Resources cleanup initiative. Special thanks to all the maintainers of these awesome lists, courses, and collections for their invaluable contributions to the AI community.