
When GPUs Move Toward Open Scheduling: Structural Shifts in AI Native Infrastructure

2026-02-13 22:32:46

The future of GPU scheduling isn’t about whose implementation is more “black-box”—it’s about who can standardize device resource contracts into something governable.

Figure 1: GPU Open Scheduling

Introduction

Have you ever wondered: why are GPUs so expensive, yet overall utilization often hovers around 10–20%?

Figure 2: GPU Utilization Problem: Expensive GPUs with only 10-20% utilization

This isn’t a problem you solve with “better scheduling algorithms.” It’s a structural problem - GPU scheduling is undergoing a shift from “proprietary implementation” to “open scheduling,” similar to how networking converged on CNI and storage converged on CSI.

In the HAMi 2025 Annual Review, we noted: “HAMi 2025 is no longer just about GPU sharing tools—it’s a more structural signal: GPUs are moving toward open scheduling.”

By 2025, the signals of this shift became visible: Kubernetes Dynamic Resource Allocation (DRA) graduated to GA and became enabled by default, NVIDIA GPU Operator started defaulting to CDI (Container Device Interface), and HAMi’s production-grade case studies under CNCF are moving “GPU sharing” from experimental capability to operational excellence.

This post analyzes this structural shift from an AI Native Infrastructure perspective, and what it means for Dynamia and the industry.

Why “Open Scheduling” Matters

In multi-cloud and hybrid cloud environments, GPU model diversity significantly amplifies operational costs. One large internet company’s platform spans H200/H100/A100/V100/4090 GPUs across five clusters. If you can only allocate “whole GPUs,” resource misalignment becomes inevitable.

“Open scheduling” isn’t a slogan—it’s a set of engineering contracts being solidified into the mainstream stack.

Standardized Resource Expression

Before: GPUs were exposed as opaque extended resources. The scheduler couldn’t tell whether a request represented memory, compute, or a device type.

Figure 3: Open Scheduling Standardization Evolution

Now: Kubernetes DRA provides objects like DeviceClass, ResourceClaim, and ResourceSlice. This lets drivers and cluster administrators define device categories and selection logic (including CEL-based selectors), while Kubernetes handles the full loop: match devices → bind claims → place Pods onto nodes with access to allocated devices.

Even more importantly, with Kubernetes 1.34 the core APIs in the resource.k8s.io group graduated to GA: DRA is now stable and enabled by default, and the community has committed to avoiding breaking changes going forward. This means the ecosystem can invest with confidence in a stable, standard API.
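To make the new contract concrete, here is a minimal sketch of the DRA objects involved. The driver name (gpu.example.com), the class and claim names, and the container image are illustrative assumptions, and field names can differ slightly between DRA API versions, so treat this as the general shape rather than a copy-paste manifest:

```yaml
# Illustrative only: a DeviceClass selecting devices from a hypothetical driver,
# a ResourceClaim requesting one such device, and a Pod consuming the claim.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: example-gpu
spec:
  selectors:
    - cel:
        expression: device.driver == "gpu.example.com"   # CEL-based device selection
---
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: inference-gpu
spec:
  devices:
    requests:
      - name: gpu
        exactly:
          deviceClassName: example-gpu
          allocationMode: ExactCount
          count: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  resourceClaims:
    - name: gpu
      resourceClaimName: inference-gpu
  containers:
    - name: app
      image: registry.example.com/inference:latest       # placeholder image
      resources:
        claims:
          - name: gpu                                     # consume the allocated device
```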

Standardized Device Injection

Before: Device injection relied on vendor-specific hooks and runtime class patterns.

Now: The Container Device Interface (CDI) abstracts device injection into an open specification. NVIDIA’s Container Toolkit explicitly describes CDI as an open specification for container runtimes, and NVIDIA GPU Operator 25.10.0 defaults to enabling CDI on install/upgrade—directly leveraging runtime-native CDI support (containerd, CRI-O, etc.) for GPU injection.

This means “devices into containers” is also moving toward replaceable, standardized interfaces.
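For intuition, a CDI spec is just a declarative file (typically under /etc/cdi) that tells the runtime which device nodes, mounts, and hooks a named device needs. The sketch below is simplified, the paths and version string are illustrative, and in practice vendor tooling such as nvidia-ctk cdi generate produces the real spec:

```yaml
# Simplified CDI spec sketch (illustrative paths; real specs are generated by
# vendor tooling, e.g. "nvidia-ctk cdi generate").
cdiVersion: "0.6.0"
kind: nvidia.com/gpu
devices:
  - name: "0"                      # referenced by runtimes as nvidia.com/gpu=0
    containerEdits:
      deviceNodes:
        - path: /dev/nvidia0
containerEdits:                    # edits applied to any container using this kind
  deviceNodes:
    - path: /dev/nvidiactl
    - path: /dev/nvidia-uvm
  mounts:
    - hostPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
      containerPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
      options: ["ro", "bind"]
```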

HAMi: From “Sharing Tool” to “Governable Data Plane”

On this standardization path, HAMi’s role needs redefinition: it’s not about replacing Kubernetes—it’s about turning GPU virtualization and slicing into a declarative, schedulable, governable data plane.

Data Plane Perspective

HAMi’s core contribution expands the allocatable unit from “whole GPU integers” to finer-grained shares (memory and compute), forming a complete allocation chain:

  1. Device discovery: Identify available GPU devices and models
  2. Scheduling placement: Use Scheduler Extender to make native schedulers “understand” vGPU resource models (Filter/Score/Bind phases)
  3. In-container enforcement: Inject share constraints into container runtime
  4. Metric export: Provide observable metrics for utilization, isolation, and more

This transforms “sharing” from ad-hoc “it runs” experimentation into an engineering capability that can be declared in YAML, scheduled by policy, and validated by metrics.
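As a sketch of what that looks like from the user’s side, the Pod below requests a slice of a GPU using HAMi’s commonly documented vGPU resource names (nvidia.com/gpu, nvidia.com/gpumem in MB, nvidia.com/gpucores as a percentage). The exact names and units are configurable per deployment, and the image and values are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-inference
spec:
  containers:
    - name: app
      image: registry.example.com/inference:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1        # one vGPU slice
          nvidia.com/gpumem: 8000  # roughly 8 GB of device memory for this slice
          nvidia.com/gpucores: 30  # roughly 30% of the GPU's compute
```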

Scheduling Mechanism: Enhancement, Not Replacement

HAMi’s scheduling doesn’t replace Kubernetes—it uses a Scheduler Extender pattern to let the native scheduler understand vGPU resource models:

  • Filter: Filter nodes based on memory, compute, device type, topology, and other constraints
  • Score: Apply configurable policies like binpack, spread, topology-aware scoring
  • Bind: Complete final device-to-Pod binding

This architecture positions HAMi naturally as an execution layer under higher-level “AI control planes” (queuing, quotas, priorities)—working alongside Volcano, Kueue, Koordinator, and others.

Figure 4: HAMi Scheduling Architecture (Filter → Score → Bind)
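Mechanically, this is ordinary kube-scheduler extender wiring. The sketch below shows the general shape of such a configuration; the service URL, the list of managed resources, and the chosen verbs are assumptions for illustration rather than HAMi’s shipped defaults:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
extenders:
  - urlPrefix: https://hami-scheduler.kube-system.svc:443   # illustrative endpoint
    filterVerb: filter            # node filtering by memory/compute/type/topology
    bindVerb: bind                # final device-to-Pod binding
    enableHTTPS: true
    nodeCacheCapable: true
    weight: 1
    managedResources:             # extended resources handed off to the extender
      - name: nvidia.com/gpu
        ignoredByScheduler: true
      - name: nvidia.com/gpumem
        ignoredByScheduler: true
```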

Production Evidence: From “Can We Share?” to “Can We Operate?”

CNCF public case studies provide concrete answers: in a hybrid, multi-cloud platform built on Kubernetes and HAMi, 10,000+ Pods run concurrently, and overall GPU utilization improved from 13% to 37% (nearly 3×).

Figure 5: CNCF Production Case Studies: Ke Holdings 13%→37%, DaoCloud 80%+ utilization, SF Technology 57% savings

Here are highlights from several cases:

Case Study 1: Ke Holdings (February 5, 2026)

  • Environment: 5 clusters spanning public and private clouds
  • GPU models: H200/H100/A100/V100/4090 and more
  • Architecture: Separate “GPU clusters” for large training tasks (dedicated allocation) vs “vGPU clusters” with HAMi fine-grained memory slicing for high-density inference
  • Concurrent scale: 10,000+ Pods
  • Outcome: Overall GPU utilization improved from 13% to 37% (nearly 3×)

Case Study 2: DaoCloud (December 2, 2025)

  • Hard constraints: Must remain cloud-native, vendor-agnostic, and compatible with CNCF toolchain
  • Adoption outcomes:
    • Average GPU utilization: 80%+
    • GPU-related operating cost reduction: 20–30%
    • Coverage: 10+ data centers, 10,000+ GPUs
  • Explicit benefit: Unified abstraction layer across NVIDIA and domestic GPUs, reducing vendor dependency

Case Study 3: Prep EDU (August 20, 2025)

  • Negative experience: Isolation failures in other GPU-sharing approaches caused memory conflicts and instability
  • Positive outcome: HAMi’s vGPU scheduling, GPU type/UUID targeting, and compatibility with NVIDIA GPU Operator and RKE2 became decisive factors for production adoption
  • Environment: Heterogeneous RTX 4070/4090 cluster

Case Study 4: SF Technology (September 18, 2025)

  • Project: EffectiveGPU (built on HAMi)
  • Use cases: Large model inference, test services, speech recognition, domestic AI hardware (Huawei Ascend, Baidu Kunlun, etc.)
  • Outcomes:
    • GPU savings: Large model inference runs 65 services on 28 GPUs (37 saved); test cluster runs 19 services on 6 GPUs (13 saved)
    • Overall savings: Up to 57% GPU savings for production and test clusters
    • Utilization improvement: Up to 100% GPU utilization improvement with GPU virtualization
  • Highlights: Cross-node collaborative scheduling, priority-based preemption, memory over-subscription

These cases demonstrate a consistent pattern: GPU virtualization becomes economically meaningful only when it participates in a governable contract—where utilization, isolation, and policy can be expressed, measured, and improved over time.

Strategic Implications for Dynamia

From Dynamia’s perspective (and from my seat as VP of Open Source Ecosystem), the strategic value of HAMi becomes clear:

Two-Layer Architecture: Open Source vs Commercial

  • HAMi (CNCF open source project): Responsible for “adoption and trust,” focused on GPU virtualization and compute efficiency
  • Dynamia enterprise products and services: Responsible for “production and scale,” providing commercial distributions and enterprise services built on HAMi
Figure 6: Dynamia Dual Mechanism: Open Source vs Commercial

This boundary is the foundation for long-term trust—project and company offerings remain separate, with commercial distributions and services built on the open source project.

Global Narrative Strategy

The internal alignment memo recommends a bilingual approach:

First layer: Lead globally with “GPU virtualization / sharing / utilization” (the Chinese messaging can use “GPU virtualization and heterogeneous scheduling” directly, but the English first layer should avoid “heterogeneous” as a headline)

Second layer: When users discuss mixed GPUs or workload diversity, introduce “heterogeneous” to confirm capability boundaries—never as the opening hook

Core anchor: Maintain “HAMi (project and community) ≠ company products” as the non-negotiable baseline for long-term positioning

The Right Commercialization Landing

DaoCloud’s case study already set vendor-agnostic and CNCF toolchain compatibility as hard constraints, framing vendor dependency reduction as a business and operational benefit—not just a technical detail. Project-HAMi’s official documentation lists “avoid vendor lock” as a core value proposition.

In this context, the right commercialization landing isn’t “closed-source scheduling”—it’s productizing capabilities around real enterprise complexity:

  • Systematic compatibility matrix
  • SLO and tail-latency governance
  • Metering for billing
  • RBAC, quotas, multi-cluster governance
  • Upgrade and rollback safety
  • Faster path-to-production for DRA/CDI and other standardization efforts

Forward View: The Next 2–3 Years

My strong judgment: over the next 2–3 years, GPU scheduling competition will shift from “whose implementation is more black-box” to “whose contract is more open.”

The reasons are practical:

Hardware Form Factors and Supply Chains Are Diversifying

  • OpenAI’s February 12, 2026 “GPT‑5.3‑Codex‑Spark” release emphasizes ultra-low latency serving, including persistent WebSockets and a dedicated serving tier on Cerebras hardware
  • Large-scale GPU-backed financing announcements (for pan-European deployments) illustrate the infrastructure scale and financial engineering surrounding accelerator fleets

These signals suggest that heterogeneity will grow: mixed accelerators, mixed clouds, mixed workload types.

Low-Latency Inference Tiers Will Force Systematic Scheduling

Low-latency inference tiers (beyond just GPUs) will force resource scheduling toward “multi-accelerator, multi-layer cache, multi-class node” architectural design—scheduling must inherently be heterogeneous.

Open Scheduling Is Risk Management, Not Idealism

In this world, “open scheduling” isn’t idealism; it’s risk management. Building schedulable, governable “control plane + data plane” combinations around DRA/CDI and other solidifying open interfaces, combinations that are pluggable, multi-tenant aware, and able to co-evolve with the ecosystem, looks like the truly sustainable path for AI Native Infrastructure.

The next battleground isn’t “whose scheduling is smarter”—it’s “who can standardize device resource contracts into something governable.”

Conclusion

When you place HAMi 2025 back in the broader AI Native Infrastructure context, it’s no longer just the year of “GPU sharing tools”—it’s a more structural signal: GPUs are moving toward open scheduling.

Figure 7: Open Scheduling Future Vision

The driving forces come from both ends:

  • Upstream: Standards like DRA/CDI continue to solidify
  • Downstream: Scale and diversity (multi-cloud, multi-model, even accelerators beyond GPUs)

For Dynamia, HAMi’s significance has transcended “GPU sharing tool”: it turns GPU virtualization and slicing into declarative, schedulable, measurable data planes—letting queues, quotas, priorities, and multi-tenancy actually close the governance loop.

AI Learning Resources: 44 Curated Collections from Our Cleanup

2026-02-08 20:20:05

“The best way to learn AI is to start building. These resources will guide your journey.”

Figure 1: AI Learning Resources Collection

In my ongoing effort to keep the AI Resources list focused on production-ready tools and frameworks, I’ve removed 44 collection-type projects—courses, tutorials, awesome lists, and cookbooks.

These resources aren’t gone—they’ve been moved here. This post is a curated collection of those educational materials, organized by type and topic. Whether you’re a complete beginner or an experienced practitioner, you’ll find something valuable here.

Why Remove Collections from AI Resources?

My AI Resources list now focuses on concrete tools and frameworks—projects you can directly use in production. Collections, while valuable, serve a different purpose: education and discovery.

By separating them, I:

  • Keep the resources list actionable and focused
  • Create a dedicated space for learning materials
  • Make it easier to find what you need

📚 Awesome Lists (14 Collections)

Awesome lists are community-curated collections of the best resources. They’re perfect for discovering new tools and staying updated.

Must-Explore Awesome Lists

Awesome Generative AI

  • Models, tools, tutorials, and research papers
  • Great for: Comprehensive overview of generative AI landscape

Awesome LLM

  • LLM resources: papers, tools, datasets, applications
  • Great for: Deep dive into large language models

Awesome AI Apps

  • Practical LLM applications, RAG examples, agent implementations
  • Great for: Real-world implementation examples

Awesome Claude Code

  • Claude Code commands, files, and workflows
  • Great for: Maximizing Claude Code productivity

Awesome MCP Servers

  • MCP servers for modular AI backend systems
  • Great for: Building with Model Context Protocol

Specialized Awesome Lists


🎓 Courses & Tutorials (9 Curricula)

Structured learning paths from universities and tech companies.

Microsoft’s AI Curriculum

AI for Beginners

  • 12 weeks, 24 lessons covering neural networks, deep learning, CV, NLP
  • Great for: Complete AI foundation
  • Format: Lessons, quizzes, projects

Machine Learning for Beginners

  • 12-week, 26-lesson curriculum on classic ML
  • Great for: ML fundamentals without deep math
  • Format: Project-based exercises

Generative AI for Beginners

  • 18 lessons on building GenAI applications
  • Great for: Practical GenAI development
  • Format: Hands-on projects

AI Agents for Beginners

  • 11 lessons on agent systems
  • Great for: Understanding autonomous agents
  • Format: Project-driven learning

EdgeAI for Beginners

  • Optimization, deployment, and real-world Edge AI
  • Great for: On-device AI applications
  • Format: Practical tutorials

MCP for Beginners

  • Model Context Protocol curriculum
  • Great for: Building with MCP
  • Format: Cross-language examples and labs

Official Platform Courses

Hugging Face Learn Center

  • Free courses on LLMs, deep RL, CV, audio
  • Great for: Hands-on Hugging Face ecosystem
  • Format: Interactive notebooks

OpenAI Cookbook

  • Runnable examples using OpenAI API
  • Great for: OpenAI API best practices
  • Format: Code examples and guides

PyTorch Tutorials

  • Basics to advanced deep learning
  • Great for: PyTorch mastery
  • Format: Comprehensive tutorials

🍳 Cookbooks & Example Collections (5 Collections)

Practical code examples and recipes.

Claude Cookbooks

  • Notebooks and examples for building with Claude
  • Great for: Anthropic Claude integration
  • Format: Jupyter notebooks

Hugging Face Cookbook

  • Practical AI cookbook with Jupyter notebooks
  • Great for: Open models and tools
  • Format: Hands-on examples

Tinker Cookbook

  • Training and fine-tuning examples
  • Great for: Fine-tuning workflows
  • Format: Platform-specific recipes

E2B Cookbook

  • Examples for building LLM apps
  • Great for: LLM application development
  • Format: Recipes and tutorials

arXiv Paper Curator

  • 6-week course on RAG systems
  • Great for: Production-ready RAG
  • Format: Project-based learning

📖 Guides & Handbooks (5 Resources)

In-depth guides on specific topics.

Prompt Engineering Guide

  • Comprehensive prompt engineering resources
  • Great for: Mastering prompt design
  • Format: Guides, papers, lectures, notebooks

Evaluation Guidebook

  • LLM evaluation best practices from Hugging Face
  • Great for: Assessing LLM performance
  • Format: Practical guide

Context Engineering

  • Design and optimize context beyond prompt engineering
  • Great for: Advanced context management
  • Format: Practical handbook

Context Engineering Intro

  • Template and guide for context engineering
  • Great for: Providing project context to AI assistants
  • Format: Template + guide

Vibe-Coding Workflow

  • 5-step prompt template for building MVPs with LLMs
  • Great for: Rapid prototyping with AI
  • Format: Workflow template

🗂️ Template & Workflow Collections

Reusable templates and workflows.

Claude Code Templates

  • Code templates for various programming scenarios
  • Great for: Claude AI development
  • Format: Template collection

n8n Workflows

  • 2,000+ professionally organized n8n workflows
  • Great for: Workflow automation
  • Format: Searchable catalog

N8N Workflows Catalog

  • Community-driven reusable workflow templates
  • Great for: Workflow import and versioning
  • Format: Template catalog

📊 Research & Evaluation

Academic and evaluation resources.

LLMSys PaperList

  • Curated list of LLM systems papers
  • Great for: Research on training, inference, serving
  • Format: Paper collection

Free LLM API Resources

  • LLM providers with free/trial API access
  • Great for: Experimentation without cost
  • Format: Provider list

🎨 Other Notable Resources

System Prompts and Models of AI Tools

  • Community-curated collection of system prompts and AI tool examples
  • Great for: Prompt and agent engineering
  • Format: Resource collection

ML Course CS-433

  • EPFL Machine Learning Course
  • Great for: Academic ML foundation
  • Format: Lectures, labs, projects

Machine Learning Engineering

  • ML engineering open-book: compute, storage, networking
  • Great for: Production ML systems
  • Format: Comprehensive guide

Realtime Phone Agents Course

  • Build low-latency voice agents
  • Great for: Voice AI applications
  • Format: Hands-on course

LLMs from Scratch

  • Build a working LLM from first principles
  • Great for: Understanding LLM internals
  • Format: Repository + book materials

💡 How to Use This Collection

For Complete Beginners

  1. Start with: Microsoft’s AI for Beginners
  2. Practice with: PyTorch Tutorials
  3. Explore: Awesome AI Apps for inspiration

For Developers

  1. Build skills: OpenAI Cookbook + Claude Cookbooks
  2. Find tools: Awesome Generative AI + Awesome LLM
  3. Learn workflows: n8n Workflows Catalog

For Researchers

  1. Stay updated: Awesome Generative AI + LLMSys PaperList
  2. Deep dive: Awesome LLM
  3. Implement: Hugging Face Cookbook

For Product Builders

  1. Find examples: Awesome AI Apps
  2. Learn workflows: n8n Workflows Catalog
  3. Study patterns: Awesome LLM Apps

🔄 What Was NOT Removed

Agent frameworks and production tools remain in the AI Resources list, including:

  • AutoGen - Microsoft’s multi-agent framework
  • CrewAI - High-performance multi-agent orchestration
  • LangGraph - Stateful multi-agent applications
  • Flowise - Visual agent platform
  • Langflow - Visual workflow builder
  • And 80+ more agent frameworks

These are functional tools you can use to build applications, not educational collections. They belong in the AI Resources list.


📝 Summary

I removed 44 collection-type projects from the AI Resources list to keep it focused on production tools:

  • 14 Awesome Lists - Discover new tools and stay updated
  • 9 Courses & Tutorials - Structured learning paths
  • 5 Cookbooks - Practical code examples
  • 5 Guides & Handbooks - In-depth resources
  • 4 Template Collections - Reusable workflows
  • 7 Other Resources - Research and evaluation

These resources remain incredibly valuable for learning and discovery. They just serve a different purpose than the production-focused tools in my AI Resources list.


Next Steps:

  1. Bookmark this post for future reference
  2. Explore the AI Resources list for production tools (agent frameworks, databases, etc.)
  3. Check out my blog for more AI engineering insights

Acknowledgments: This collection was compiled during my AI Resources cleanup initiative. Special thanks to all the maintainers of these awesome lists, courses, and collections for their invaluable contributions to the AI community.

Standing on Giants' Shoulders: The Traditional Infrastructure Powering Modern AI

2026-02-08 16:00:00

“If I have seen further, it is by standing on the shoulders of giants.” — Isaac Newton

Figure 1: Standing on Giants’ Shoulders: The Traditional Infrastructure Powering Modern AI

In the excitement surrounding LLMs, vector databases, and AI agents, it’s easy to forget that modern AI didn’t emerge from a vacuum. Today’s AI revolution stands upon decades of infrastructure work—distributed systems, data pipelines, search engines, and orchestration platforms that were built long before “AI Native” became a buzzword.

This post is a tribute to those traditional open source projects that became the invisible foundation of AI infrastructure. They’re not “AI projects” per se, but without them, the AI revolution as we know it wouldn’t exist.

The Evolution: From Big Data to AI

| Era | Focus | Core Technologies | AI Connection |
| --- | --- | --- | --- |
| 2000s | Web Search & Indexing | Lucene, Elasticsearch | Semantic search foundations |
| 2010s | Big Data & Distributed Computing | Hadoop, Spark, Kafka | Data processing at scale |
| 2010s | Cloud Native | Docker, Kubernetes | Model deployment platforms |
| 2010s | Stream Processing | Flink, Storm, Pulsar | Real-time ML inference |
| 2020s | AI Native | Transformers, Vector DBs | Built on everything above |

Table 1: Evolution of Data Infrastructure

Big Data Frameworks: The Data Engines

Before we could train models on petabytes of data, we needed ways to store, process, and move that data.

Apache Hadoop (2006)

GitHub: https://github.com/apache/hadoop

Hadoop democratized big data by making distributed computing accessible. Its HDFS filesystem and MapReduce paradigm proved that commodity hardware could process web-scale datasets.

Why it matters for AI:

  • Modern ML training datasets live in HDFS-compatible storage
  • Data lakes built on Hadoop became training data reservoirs
  • Proved that distributed computing could scale horizontally

Apache Kafka (2011)

GitHub: https://github.com/apache/kafka

Kafka redefined data streaming with its log-based architecture. It became the nervous system for real-time data flows in enterprises worldwide.

Why it matters for AI:

  • Real-time feature pipelines for ML models
  • Event-driven architectures for AI agent systems
  • Streaming inference pipelines
  • Model telemetry and monitoring backbones

Apache Spark (2014)

GitHub: https://github.com/apache/spark

Spark brought in-memory computing to big data, making iterative algorithms (like ML training) practical at scale.

Why it matters for AI:

  • MLlib made ML accessible to data engineers
  • Distributed data processing for model training
  • Spark ML became the de facto standard for big data ML
  • Proved that in-memory computing could accelerate ML workloads

Search Engines: The Retrieval Foundation

Before RAG (Retrieval-Augmented Generation) became a buzzword, search engines were solving retrieval at scale.

Elasticsearch (2010)

GitHub: https://github.com/elastic/elasticsearch

Elasticsearch made full-text search accessible and scalable. Its distributed architecture and RESTful API became the standard for search.

Why it matters for AI:

  • Pioneered distributed inverted index structures
  • Proved that horizontal scaling was possible for search workloads
  • Many “AI search” systems actually use Elasticsearch under the hood
  • Query DSL influenced modern vector database query languages

OpenSearch (2021)

GitHub: https://github.com/opensearch-project/opensearch

When AWS forked Elasticsearch, it ensured search infrastructure remained truly open. OpenSearch continues the mission of accessible, scalable search.

Why it matters for AI:

  • Maintains open source innovation in search
  • Vector search capabilities added in 2023
  • Demonstrates community fork resilience

Databases: From SQL to Vectors

The evolution from relational databases to vector databases represents a paradigm shift—but both have AI relevance.

Traditional Databases That Paved the Way

  • Dgraph (2015) - Graph database proving that specialized data structures enable new use cases
  • TDengine (2019) - Time-series database for IoT ML workloads
  • OceanBase (2021) - Distributed database showing ACID transactions could scale

Why they matter for AI:

  • Proved that specialized database engines could outperform general-purpose ones
  • Database internals (indexing, sharding, replication) are now applied to vector databases
  • Multi-model databases (graph + vector + relational) are becoming the norm for AI apps

Cloud Native: The Runtime Foundation

When Docker and Kubernetes emerged, they weren’t built for AI—but AI couldn’t scale without them.

Docker (2013) & Kubernetes (2014)

GitHub: https://github.com/kubernetes/kubernetes

Kubernetes became the operating system for cloud-native applications. Its declarative API and controller pattern made it perfect for AI workloads.

Why it matters for AI:

  • Model deployment platforms (KServe, Seldon Core) run on K8s
  • GPU orchestration (NVIDIA GPU Operator, Volcano, HAMi) extends K8s
  • Kubeflow made K8s the standard for ML pipelines
  • Microservice patterns enable modular AI agent architectures

Service Mesh & Serverless

Istio (2016), Knative (2018) - Service mesh and serverless platforms that proved:

  • Network-level observability applies to AI model calls
  • Scale-to-zero is essential for cost-effective inference
  • Traffic splitting enables A/B testing of ML models

Why they matter for AI:

  • AI Gateway patterns evolved from API gateways + service mesh
  • Serverless inference platforms use Knative-style autoscaling
  • Observability patterns (tracing, metrics) are now standard for ML systems

API Gateways: From REST to LLM

API gateways weren’t designed for AI, but they became the foundation of AI Gateway patterns.

Kong, APISIX, KGateway

These API gateways solved rate limiting, auth, and routing at scale. When LLMs emerged, the same patterns applied:

AI Gateway Evolution:

| Traditional API Gateway (2010s) | AI Gateway (2024) |
| --- | --- |
| Rate Limiting | Token Bucket Rate Limiting |
| Auth | API Key + Organization Management |
| Routing | Model Routing (GPT-4 → Claude → Local Models) |
| Observability | LLM-specific Telemetry (token usage, cost) |

Why they matter for AI:

  • Proved that centralized API management scales
  • Plugin architectures enable LLM-specific features
  • Traffic management patterns apply to prompt routing
  • Security patterns (mTLS, JWT) now protect AI endpoints

Workflow Orchestration: The Pipeline Backbone

Data engineering needs pipelines. ML engineering needs pipelines. AI agents need workflows.

Apache Airflow (2015)

GitHub: https://github.com/apache/airflow

Airflow made pipeline orchestration accessible with its DAG-based approach. It became the standard for ETL and data engineering.

Why it matters for AI:

  • ML pipeline orchestration (feature engineering, training, evaluation)
  • Proved that DAG-based workflow definition works at scale
  • Prompt engineering pipelines use Airflow-style orchestration
  • Scheduler patterns are now applied to AI agent workflows

n8n, Prefect, Flyte

Modern workflow platforms that evolved from Airflow’s foundations:

  • n8n (2019) - Visual workflow automation with AI capabilities
  • Prefect (2018) - Python-native workflow orchestration for ML
  • Flyte (2019) - Kubernetes-native workflow orchestration for ML/data

Why they matter for AI:

  • Multi-modal agents need workflow orchestration
  • RAG pipelines are essentially ETL pipelines for embeddings
  • Prompt chaining is DAG-based orchestration

Data Formats: The Lakehouse Foundation

Before we could train on massive datasets, we needed formats that supported ACID transactions and schema evolution.

Delta Lake, Apache Iceberg, Apache Hudi

These table formats brought reliability to data lakes:

Why they matter for AI:

  • Training datasets need versioning and reproducibility
  • Feature stores use Delta/Iceberg as storage formats
  • Proved that “big data” could have transactional semantics
  • Schema evolution handles ML feature drift

The Invisible Thread: Why These Projects Matter

What do all these projects have in common?

  1. They solved scaling first - AI training/inference needs horizontal scaling
  2. They proved distributed systems work - Modern AI is fundamentally distributed
  3. They created ecosystem patterns - Plugin systems, extension points, APIs
  4. They established best practices - Observability, security, CI/CD
  5. They built developer habits - YAML configs, declarative APIs, CLI tools

The AI Native Continuum

Modern “AI Native” infrastructure didn’t replace these projects—it builds on them:

| Traditional Project | AI Native Evolution | Example |
| --- | --- | --- |
| Hadoop HDFS | Distributed model storage | HDFS for datasets, S3 for checkpoints |
| Kafka | Real-time feature pipelines | Kafka → Feature Store → Model Serving |
| Spark ML | Distributed ML training | MLlib → PyTorch Distributed |
| Elasticsearch | Vector search | ES → Weaviate/Qdrant/Milvus |
| Kubernetes | ML orchestration | K8s → Kubeflow/KServe |
| Istio | AI Gateway service mesh | Istio → LLM Gateway with mTLS |
| Airflow | ML pipeline orchestration | Airflow → Prefect/Flyte for ML |

Table 2: From Traditional to AI Native

Why We’re Removing Them from AI Resources List

This post honors these projects, but we’re also removing them from our AI Resources list. Here’s why:

They’re not “AI Projects”—they’re foundational infrastructure.

  • Hadoop, Kafka, Spark are data engineering tools, not ML frameworks
  • Elasticsearch is search, not semantic search
  • Kubernetes is general-purpose orchestration
  • API gateways serve REST/GraphQL, not just LLMs

But their absence doesn’t diminish their importance.

By removing them, we acknowledge that:

  1. AI has its own ecosystem - Transformers, vector DBs, LLM ops
  2. Traditional infra has its own domain - Data engineering, cloud native
  3. The intersection is where innovation happens - AI-native data platforms, LLM ops on K8s

The Giants We Stand On

The next time you:

  • Deploy a model on Kubernetes
  • Stream features through Kafka
  • Search embeddings with a vector database
  • Orchestrate a RAG pipeline with Prefect

Remember: You’re standing on the shoulders of Hadoop, Kafka, Elasticsearch, Kubernetes, and countless others. They built the roads we now drive on.

The Future: Building New Giants

Just as Hadoop and Kafka enabled modern AI, today’s AI infrastructure will become tomorrow’s foundation:

  • Vector databases may become the new standard for all search
  • LLM observability may evolve into general distributed tracing
  • AI agent orchestration may reinvent workflow automation
  • GPU scheduling may influence general-purpose resource management

The cycle continues. The giants of today will be the foundations of tomorrow.

Conclusion: Gratitude and Continuity

As we clean up our AI Resources list to focus on AI-native projects, we don’t forget where we came from. Traditional big data and cloud native infrastructure made the AI revolution possible.

To the Hadoop committers, Kafka maintainers, Kubernetes contributors, and all who built the foundation: Thank you.

Your work enabled ChatGPT, enabled Transformers, enabled everything we now call “AI.”

Standing on your shoulders, we see further.


Acknowledgments: This post was inspired by the need to refactor our AI Resources list. The 27 projects mentioned here are being removed—not because they’re unimportant, but because they deserve their own category: The Foundation.

My First Month at Dynamia: Why AI Native Infra Is Worth the Investment

2026-02-06 20:56:35

Time flies—it’s already been a month since I joined Dynamia. In this article, I want to share my observations from this past month: why AI Native Infra is a direction worth investing in, and some considerations for those thinking about their own career or technical direction.

Introduction

After nearly five years of remote work, I officially joined Dynamia last month as VP of Open Source Ecosystem. This decision was not sudden, but a natural extension of my journey from cloud native to AI Native Infra.

But this article is not just about my personal choice. I want to answer a more universal question: In the wave of AI infrastructure startups, why is compute governance a direction worth investing in?

For the past decade, I have worked continuously in the infrastructure space: from Kubernetes to Service Mesh, and now to AI Infra. I am increasingly convinced that the core challenge in the AI era is not “can the model run,” but “can compute resources be run efficiently, reliably, and in a controlled manner.” This conviction has only grown stronger through my observations and reflections during this first month at Dynamia.

This article answers three questions: What is AI Native Infra? Why is GPU virtualization a necessity? Why did I choose Dynamia and HAMi?

What Is AI Native Infra

The core of AI Native Infrastructure is not about adding another platform layer, but about redefining the governance target: expanding from “services and containers” to “model behaviors and compute assets.”

I summarize it as three key shifts:

  • Models as execution entities: Governance now includes not just processes, but also model behaviors.
  • Compute as a scarce asset: GPU, memory, and bandwidth must be scheduled and metered precisely.
  • Uncertainty as the default: Systems must remain observable and recoverable amid fluctuations.

In essence, AI Native Infra is about upgrading compute governance from “resource allocation” to “sustainable business capability.”

Why GPU Virtualization Is Essential

Many teams focus on model inference optimization, but in production, enterprises first encounter the problem of “underutilized GPUs.” This is where GPU virtualization delivers value.

  • Structural idleness: Small tasks monopolize large GPUs, leaving them idle for long periods.
  • Pseudo-isolation risks: Native sharing lacks hard boundaries, so a single task OOM can cause cascading failures.
  • Scheduling failures: Some users queue for GPUs while others occupy but do not use them, leading to both shortages and idleness.
  • Fragmentation waste: There may be enough total GPU, but not enough full cards, making efficient packing impossible.
  • Vendor lock-in anxiety: Proprietary, tightly coupled solutions make migration costs uncontrollable.

In short: GPUs must not only be allocatable, but also splittable, isolatable, schedulable, and governable.

The Relationship Between HAMi and Dynamia

This is the most frequently asked question. Here is the shortest answer:

  • HAMi: A CNCF-hosted open source project and community focused on GPU virtualization and heterogeneous compute scheduling.
  • Dynamia: The founding and leading company behind HAMi, providing enterprise-grade products and services based on HAMi.

Open source projects are not the same as company products, but the two evolve together. HAMi drives industry adoption and technical trust, while Dynamia brings these capabilities into enterprise production environments at scale. This “dual engine” approach is what makes Dynamia unique.

What HAMi Provides

HAMi (Heterogeneous AI Computing Virtualization Middleware) delivers three key capabilities on Kubernetes:

  • Virtualization and partitioning: Split physical GPUs into logical resources on demand to improve utilization.
  • Scheduling and topology awareness: Place workloads optimally based on topology to reduce communication bottlenecks.
  • Isolation and observability: Support quotas, policies, and monitoring to reduce production risks.

Currently, HAMi has attracted over 360 contributors from 16 countries, with more than 200 enterprise end users, and its international influence continues to grow.

Market Trends: The AI Infrastructure Startup Wave

AI infrastructure is experiencing a new wave of startups. The vLLM team’s company raised $150 million, SGLang’s commercial spin-off RadixArk is valued at $4 billion, and Databricks acquired MosaicML for $1.3 billion—all pointing to a consensus: Whoever helps enterprises run large models more efficiently and cost-effectively will hold the keys to next-generation AI infrastructure.

Against this backdrop, the positioning of Dynamia and HAMi is even clearer. Many teams focus on “model performance acceleration” and “inference optimization” (like vLLM, SGLang), while we focus on “resource scheduling and virtualization”—enabling better orchestration of existing accelerated hardware resources.

The two are complementary: the former makes individual models run faster and cheaper, while the latter ensures that compute allocation at the cluster level is efficient, fair, and controllable. This is similar to extending Kubernetes’ CPU/memory scheduling philosophy to GPU and heterogeneous compute management in the AI era.

Why AI Native Infra Is Worth the Investment

My observations this month have convinced me that compute governance is the most undervalued yet most promising area in AI infrastructure. If you are considering a career or technical investment, here is my assessment:

First, this is a real and urgent pain point

Model training and inference optimization attract a lot of attention, but in production, enterprises first encounter the problem of “underutilized GPUs”—structural idleness, scheduling failures, fragmentation waste, and vendor lock-in anxiety. Without solving these problems, even the fastest models cannot scale in production. GPU virtualization and heterogeneous compute scheduling are the “infrastructure below infrastructure” for enterprise AI transformation.

Second, this is a clear long-term track

Frameworks like vLLM and SGLang emerge constantly, making individual models run faster. But who ensures that compute allocation at the cluster level is efficient, fair, and controllable? This is similar to extending Kubernetes’ success in CPU/memory scheduling to GPU and heterogeneous compute management in the AI era. This is not something that can be finished in a year or two, but a direction for continuous construction over the next five to ten years.

Third, this is an open and verifiable path

Dynamia chose to build on HAMi as an open source foundation, first solving general capabilities, then supporting enterprise adoption. This means the technical direction is transparent and verifiable in the community. You can form your own judgment by participating in open source, observing adoption, and evaluating the ecosystem—rather than relying on the black-box promises of proprietary solutions.

Fourth, this is a window of opportunity that is opening now

AI infrastructure is being redefined. Investing in its construction today will continue to yield value in the coming years. The vLLM team’s company raised $150 million, SGLang’s commercial spin-off RadixArk is valued at $4 billion, Databricks acquired MosaicML for $1.3 billion—all validating the same trend: Whoever helps enterprises run large models more efficiently will hold the keys to next-generation AI infrastructure.

I hope to bring my experience in cloud native and open source communities to the next stage of HAMi and Dynamia: turning GPU resources from a “cost center” into an “operational asset.” This is not just my career choice, but my judgment and investment in the direction of next-generation infrastructure.

Join the HAMi Community
Add me on WeChat (jimmysong) to join the HAMi community focused on GPU virtualization and heterogeneous compute scheduling.

If you are also interested in HAMi, GPU virtualization, AI Native Infra, or Dynamia, feel free to reach out.

Summary

From cloud native to AI Native Infra, my observations this month have only strengthened my conviction: The true upper limit of AI applications is determined by the infrastructure’s ability to govern compute resources.

HAMi addresses the fundamental issues of GPU virtualization and heterogeneous compute scheduling, while Dynamia is driving these capabilities into large-scale production. If you are also looking for a technical direction worth long-term investment, AI Native Infra—especially compute governance and scheduling—is a track with real pain points, a clear path, an open ecosystem, and an opening window of opportunity.

Joining Dynamia is not just a career choice, but a commitment to building the next generation of infrastructure. I hope the observations and reflections in this article can provide some reference for you as you evaluate technical directions and career opportunities.

If you are also interested in HAMi, GPU virtualization, AI Native Infra, or Dynamia, feel free to reach out.

The True Inflection Point of ADD: When Spec Becomes the Core Asset of AI-Era Software

2026-01-20 15:51:36

The role of Spec is undergoing a fundamental transformation, becoming the governance anchor of engineering systems in the AI era.

The Essence of Software Engineering and the Cost Structure Shift Brought by AI

From first principles, software engineering has always been about one thing: stably, controllably, and reproducibly transforming human intent into executable systems.

Artificial Intelligence (AI) does not change this engineering essence, but it dramatically alters the cost structure:

  • Implementation costs plummet: Code, tests, and boilerplate logic are rapidly commoditized.
  • Consistency costs rise sharply: Intent drift, hidden conflicts, and cross-module inconsistencies become more frequent.
  • Governance costs are amplified: As agents can act directly, auditability, accountability, and explainability become hard constraints.

Therefore, in the era of Agent-Driven Development (ADD), the core issue is not “can agents do the work,” but how to maintain controllability and intent preservation in engineering systems under highly autonomous agents.

The ADD Era Inflection Point: Three Structural Preconditions

Many attribute the “explosion” of ADD to more mature multi-agent systems, stronger models, or more automated tools. In reality, the true structural inflection point arises only when these three conditions are met:

Agents have acquired multi-step execution capabilities

With frameworks like LangChain, LangGraph, and CrewAI, agents are no longer just prompt invocations, but long-lived entities capable of planning, decomposition, execution, and rollback.

Agents are entering real enterprise delivery pipelines

Once in enterprise R&D, the question shifts from “can it generate” to “who approved it, is it compliant, can it be rolled back.”

Traditional engineering tools lack a control plane for the agent era

Tools like Git, CI, and Issue Trackers were designed for “human developer collaboration,” not for “agent execution.”

When these three factors converge, ADD inevitably shifts from an “efficiency tool” to a “governance system.”

The Changing Role of Spec: From Documentation to System Constraint

In the context of ADD, Spec is undergoing a fundamental shift:

Spec is no longer “documentation for humans,” but “the source of constraints and facts for systems and agents to execute.”

Spec now serves at least three roles:

Verifiable expression of intent and boundaries

Requirements, acceptance criteria, and design principles are no longer just text, but objects that can be checked, aligned, and traced.

Stable contracts for organizational collaboration

When agents participate in delivery, verbal consensus and tacit knowledge quickly fail. Versioned, auditable artifacts become the foundation of collaboration.

Policy surface for agent execution

Agents can write code, modify configurations, and trigger pipelines. Spec must become the constraint on “what can and cannot be done.”

From this perspective, the status of Spec is approaching that of the Control Plane in AI-native infrastructure.
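To ground the idea of Spec as a policy surface, here is a purely hypothetical sketch of what a machine-checkable Spec fragment could look like. The schema, field names, and values below are invented for illustration and do not correspond to APOX or any specific tool:

```yaml
# Hypothetical Spec fragment: intent, acceptance criteria, and agent policy
# expressed as a versioned, reviewable artifact (schema invented for illustration).
spec: checkout-retry-policy
version: 3
intent: >
  Failed payment calls are retried at most twice with exponential backoff;
  the user is never charged more than once.
acceptance:
  - id: AC-1
    check: "integration test payments/retry_idempotency passes"
  - id: AC-2
    check: "p99 checkout latency stays under 1.5s in load tests"
agent_policy:
  allowed_paths: ["services/payments/**"]       # where agents may make changes
  forbidden_actions:
    - "modify database migrations"
    - "change public API schema"
  gates:
    - stage: implementation
      approver: human-reviewer                  # manual approval gate
      audit: required                           # every change leaves a trail
```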

The Reality of Multi-Agent Workflows: Orchestration and Governance First

In recent systems (such as APOX and other enterprise products), an industry consensus is emerging:

  • Multi-agent collaboration no longer pursues “full automation,” but is staged and gated.
  • Frameworks like LangGraph are used to build persistent, debuggable agent workflows.
  • RAG (e.g., based on Milvus) is used to accumulate historical Specs, decisions, and context as long-term memory.
  • The IDE mainly focuses on execution efficiency, not engineering governance.
Figure 1: APOX user interface

APOX (AI Product Orchestration eXtended) is a multi-agent collaboration workflow platform for enterprise software delivery. Its core goals are:

  • To connect the entire process from product requirements to executable code with a governable Agentflow and explicit engineering artifact chain.
  • To assign dedicated AI agents to each delivery stage (such as PRD, PO, Architecture, Developer, Implementation, Coding, etc.).
  • To embed manual approval gates and full audit trails at every step, solving the “intent drift and consistency” governance problem that traditional AI coding tools cannot address.
  • To provide a VS Code plugin for real-time sync between the local IDE and web artifacts, allowing Specs, code, tasks, and approval statuses to coexist in the repository.
  • To support assigning different base models to different agents according to enterprise needs.

APOX is not about simply speeding up code generation, but about elevating “Spec” from auxiliary documentation to a verifiable, constrainable, and traceable core asset in engineering—building a control plane and workflow governance system suitable for Agent-Driven Development.

Such systems emphasize:

  • An explicit artifact chain from PRD → Spec → Task → Implementation.
  • Manual confirmation and audit points at every stage.
  • Bidirectional sync between Spec, code, repository, and IDE.

This is not about “smarter AI,” but about engineering systems adapting to the agent era.

The Long-Term Value of Spec: The Core Anchor of Engineering Assets

This is not to devalue code, but to acknowledge reality:

  • There will always be long-term differentiation in algorithms and model capabilities.
  • General engineering implementation is rapidly homogenizing.
  • What is hard to replicate is: how to define problems, constrain systems, and govern change.

In the ADD era, the value of Spec is reflected in:

  • Determining what agents can and cannot do.
  • Carrying the organization’s long-term understanding of the system.
  • Serving as the anchor for audit, compliance, and accountability.

Code will be rewritten again and again; Spec is the long-term asset.

Risks and Challenges of ADD: Living Spec and Governance Constraints

ADD also faces significant risks:

Can Spec become a Living Spec

That is, when key implementation changes occur, can the system detect “intent changes” and prompt Spec updates, rather than allowing silent drift?

Can governance achieve low friction but strong constraints

If gates are too strict, teams will bypass them; if too loose, the system loses control.

These two factors determine whether ADD is “the next engineering paradigm” or “just another tool bubble.”

The Trend Toward Control Planes in Engineering Systems

From a broader perspective, ADD is the inevitable result of engineering systems becoming “control planes”:

Engineering systems are evolving from “human collaboration tools” to “control systems for agent execution.”

In this structure:

  • Agent / IDE is the execution plane.
  • RAG / Memory is the state and memory plane.
  • Spec is the intent and policy plane.
  • Gates, audit, and traceability form the governance loop.

This closely aligns with the evolution path of AI-native infrastructure.

Summary

The winners of the ADD era will not be the systems with “the most agents or the fastest generation,” but those that first upgrade Spec from documentation to a governable, auditable, and executable asset. As automation advances, the true scarcity is the long-term control of intent.

AI Voice Dictation Input Methods Are Becoming the New Shortcut Key for the Programming Era

2026-01-18 14:53:08

Voice input methods are not just about being “fast”—they are becoming a brand new gateway for developers to collaborate with AI.

Warning
On January 12, 2026, citing financial difficulties, the Miaoyan team announced that the project is ceasing operations and that the team has been disbanded. The application will no longer be updated or maintained, but existing installs continue to work on the current device and system, and the app does not store any audio or transcription content.
Figure 1: Can voice input become the new shortcut for developers? My in-depth comparison experience.

AI Voice Input Methods Are Becoming the “New Shortcut Key” in the Programming Era

I am increasingly convinced of one thing: PC-based AI voice input methods are evolving from mere “input tools” into the foundational interaction layer for the era of programming and AI collaboration.

It’s not just about typing faster—it determines how you deliver your intent to the system, whether you’re writing documentation, code, or collaborating with AI in IDEs, terminals, or chat windows.

Because of this, the differences in voice input method experiences are far more significant than they appear on the surface.

My Six Evaluation Criteria for AI Voice Input Methods

After long-term, high-frequency use, I have developed a set of criteria to assess the real-world performance of AI voice input methods:

  • Response speed: Does text appear quickly enough after pressing the shortcut to keep up with your thoughts?
  • Continuous input stability: Does it remain reliable during extended use, or does it suddenly fail or miss recognition?
  • Mixed Chinese-English and technical terms: Can it reliably handle code, paths, abbreviations, and product names?
  • Developer friendliness: Is it truly designed for command line, IDE, and automation scenarios?
  • Interaction restraint: Does it avoid introducing distracting features that interfere with input itself?
  • Subscription and cost structure: Is it a standalone paid product, or can it be bundled with existing tool subscriptions?

Based on these criteria, I focused on comparing Miaoyan, Shandianshuo, and Zhipu AI Voice Input Method.

Miaoyan: Currently the Most “Developer-Oriented” Domestic Product

Miaoyan was the first domestic AI voice input method I used extensively, and it remains the one I am most willing to use continuously.

Figure 2: Miaoyan is currently my most-used Mac voice input method.

Command Mode: The Key Differentiator for Developer Productivity

It’s important to clarify that Miaoyan’s command mode is not about editing text via voice. Instead:

You describe your need in natural language, and the system directly generates an executable command-line command.

This is crucial for developers:

  • It’s not just about input
  • It’s about turning voice into an automation entry point
  • Essentially, it connects voice to the CLI or toolchain

This design is clearly focused on engineering efficiency, not office document polishing.

Usage Experience Summary

  • Fast response, nearly instant
  • Output is relatively clean, with minimal guessing
  • Interaction design is restrained, with no unnecessary concepts
  • Developer-friendly mindset

But there are some practical limitations:

  • It is a completely standalone product
  • Requires a separate subscription
  • Still in relatively small-scale use

From a product strategy perspective, it feels more like a “pure tool” than part of an ecosystem.

Note
As noted in the warning at the top of this post, Miaoyan announced on January 12, 2026 that it is ceasing operations; existing installs keep working on the current device and system but will no longer be updated or maintained.

Shandianshuo: Local-First Approach, Developer Experience Depends on Your Setup

Shandianshuo takes a different approach: it treats voice input as a “local-first foundational capability,” emphasizing low latency and privacy (at least in its product narrative). The natural advantages of this approach are speed and controllable marginal costs, making it suitable as a “system capability” that’s always available, rather than a cloud service.

Figure 3: Shandianshuo settings page

However, from a developer’s perspective, its upper limit often depends on “how you implement enhanced capabilities”:

If you only use it for basic transcription, the experience is more like a high-quality local input tool. But if you want better mixed Chinese-English input, technical-term correction, and symbol and formatting handling, the common approach is to add optional AI correction or enhancement capabilities, which usually requires extra configuration (such as providing your own API key or subscribing to enhanced features). The key trade-off here is not “can it be used,” but “how much configuration cost are you willing to pay for enhanced capabilities.”

If you want voice input to be a “lightweight, stable, non-intrusive” foundation, Shandianshuo is worth considering. But if your goal is to make voice input part of your developer workflow (such as command generation or executable actions), it needs to offer stronger productized design at the “command layer” and in terms of controllability.

Zhipu AI Voice Input Method: Stable but with Friction

I also thoroughly tested the Zhipu AI Voice Input Method.

Figure 4: Zhipu Voice Input Method settings interface

Its strengths include:

  • More stable for long-term continuous input
  • Rarely becomes completely unresponsive
  • Good tolerance for longer Chinese input

But with frequent use, some issues stand out:

  • Idle misrecognition: If you press the shortcut but don’t speak, it may output random characters, disrupting your input flow
  • Occasionally messy output: Sometimes adds irrelevant words, making it less controllable than Miaoyan
  • Basic recognition errors: For example, the product’s own name “Zhipu” is sometimes transcribed with the wrong character, which is a trust issue for professional users
  • Feature-heavy design: Various tone and style features increase cognitive load

Subscription Bundling: Zhipu’s Practical Advantage

Although I prefer Miaoyan in terms of experience, Zhipu has a very practical advantage:

If you already subscribe to Zhipu’s programming package, the voice input method is included for free.

This means:

  • No need to pay separately for the input method
  • Lower psychological and decision-making cost
  • More likely to become the “default tool” that stays

From a business perspective, this is a very smart strategy.

Main Comparison Table

The following table compares the three products across key dimensions for quick reference.

| Dimension | Miaoyan | Shandianshuo | Zhipu AI Voice Input Method |
| --- | --- | --- | --- |
| Response Speed | Fast, nearly instant | Usually fast (local-first) | Slightly slower than Miaoyan |
| Continuous Stability | Stable | Depends on setup and environment | Very stable |
| Idle Misrecognition | Rare | Generally restrained (varies by version) | Obvious: outputs characters even if silent |
| Output Cleanliness/Control | High | More like an “input tool” | Occasionally messy |
| Developer Differentiator | Natural language → executable command | Local-first / optional enhancements | Ecosystem-attached capabilities |
| Subscription & Cost | Standalone, separate purchase | Basics usable; enhancements often require setup/subscription | Bundled free with programming package |
| My Current Preference | Best experience | More like a “foundation approach” | Easy to keep but not clean enough |

Table 1: Core Comparison of Miaoyan, Shandianshuo, and Zhipu AI Voice Input Methods

User Loyalty to AI Voice Input Methods

The switching cost for voice input methods is actually low: just a shortcut key and a habit of output.

What really determines whether users stick around is:

  • Whether the output is controllable
  • Whether it keeps causing annoying minor issues
  • Whether it integrates into your existing workflow and payment structure

For me personally:

  • The best and smoothest experience is still Miaoyan
  • The one most likely to stick around is probably Zhipu
  • Shandianshuo is more of a “foundation approach” and worth watching for how its enhancements evolve

These points are not contradictory.

Summary

  • Miaoyan is more mature in engineering orientation, command capabilities, and input control
  • Zhipu has practical advantages in stability and subscription bundling
  • Shandianshuo takes a local-first + optional enhancement approach, with the key being how it balances “basic capability” and “enhancement cost”
  • Who truly becomes the “default gateway” depends on reducing distractions, fixing frequent minor issues, and treating voice input as true “infrastructure” rather than an add-on feature

The competition among AI voice input methods is no longer about recognition accuracy, but about who can own the shortcut key you press every day.