Jimmy Song | 宋净超

Tetrate evangelist, founder of the Cloud Native Community, CNCF Ambassador, and cloud native technology expert.

The True Inflection Point of ADD: When Spec Becomes the Core Asset of AI-Era Software

2026-01-20 15:51:36

The role of Spec is undergoing a fundamental transformation, becoming the governance anchor of engineering systems in the AI era.

The Essence of Software Engineering and the Cost Structure Shift Brought by AI

From first principles, software engineering has always been about one thing: stably, controllably, and reproducibly transforming human intent into executable systems.

Artificial Intelligence (AI) does not change this engineering essence, but it dramatically alters the cost structure:

  • Implementation costs plummet: Code, tests, and boilerplate logic are rapidly commoditized.
  • Consistency costs rise sharply: Intent drift, hidden conflicts, and cross-module inconsistencies become more frequent.
  • Governance costs are amplified: As agents can act directly, auditability, accountability, and explainability become hard constraints.

Therefore, in the era of Agent-Driven Development (ADD), the core issue is not “can agents do the work,” but how to maintain controllability and intent preservation in engineering systems under highly autonomous agents.

The ADD Era Inflection Point: Three Structural Preconditions

Many attribute the “explosion” of ADD to more mature multi-agent systems, stronger models, or more automated tools. In reality, the true structural inflection point arises only when these three conditions are met:

Agents have acquired multi-step execution capabilities

With frameworks like LangChain, LangGraph, and CrewAI, agents are no longer just prompt invocations, but long-lived entities capable of planning, decomposition, execution, and rollback.
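
To ground what “multi-step” means here, below is a minimal plan-and-execute loop in LangGraph; the state fields and node bodies are placeholders of my own, not any product’s workflow.

```python
# Minimal plan -> execute loop with LangGraph; node logic is illustrative only.
from typing import List, TypedDict
from langgraph.graph import END, StateGraph

class AgentState(TypedDict):
    goal: str
    plan: List[str]   # steps still to run
    done: List[str]   # steps already executed

def plan(state: AgentState) -> AgentState:
    # A real agent would call an LLM here to decompose the goal into steps.
    return {**state, "plan": ["draft spec", "implement", "review"]}

def execute(state: AgentState) -> AgentState:
    step, *rest = state["plan"]
    return {**state, "plan": rest, "done": state["done"] + [step]}

def should_continue(state: AgentState) -> str:
    # Loop until the plan is exhausted; a real agent could also roll back here.
    return "execute" if state["plan"] else END

graph = StateGraph(AgentState)
graph.add_node("plan", plan)
graph.add_node("execute", execute)
graph.set_entry_point("plan")
graph.add_edge("plan", "execute")
graph.add_conditional_edges("execute", should_continue)
app = graph.compile()

print(app.invoke({"goal": "ship feature X", "plan": [], "done": []}))
```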

Agents are entering real enterprise delivery pipelines

Once in enterprise R&D, the question shifts from “can it generate” to “who approved it, is it compliant, can it be rolled back.”

Traditional engineering tools lack a control plane for the agent era

Tools like Git, CI, and Issue Trackers were designed for “human developer collaboration,” not for “agent execution.”

When these three factors converge, ADD inevitably shifts from an “efficiency tool” to a “governance system.”

The Changing Role of Spec: From Documentation to System Constraint

In the context of ADD, Spec is undergoing a fundamental shift:

Spec is no longer “documentation for humans,” but “the source of constraints and facts for systems and agents to execute.”

Spec now serves at least three roles:

Verifiable expression of intent and boundaries

Requirements, acceptance criteria, and design principles are no longer just text, but objects that can be checked, aligned, and traced.

Stable contracts for organizational collaboration

When agents participate in delivery, verbal consensus and tacit knowledge quickly fail. Versioned, auditable artifacts become the foundation of collaboration.

Policy surface for agent execution

Agents can write code, modify configurations, and trigger pipelines. Spec must become the constraint on “what can and cannot be done.”
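
To make “policy surface” concrete, here is a minimal sketch of a spec fragment enforced as a gate before an agent action runs; the schema and rule names are hypothetical illustrations of my own, not any product’s format.

```python
# Illustrative only: a spec fragment enforced as a pre-execution gate.
# The schema and rule names are hypothetical, not any product's format.
SPEC = {
    "allowed_paths": ["src/", "docs/"],               # where agents may write
    "forbidden_actions": {"drop_table", "force_push"},
    "requires_human_approval": {"deploy_prod"},
}

def is_permitted(action: str, target_path: str, approved: bool = False) -> bool:
    """Return True only when the spec allows the action on this path."""
    if action in SPEC["forbidden_actions"]:
        return False
    if action in SPEC["requires_human_approval"] and not approved:
        return False
    return any(target_path.startswith(p) for p in SPEC["allowed_paths"])

assert is_permitted("edit_file", "src/main.py")
assert not is_permitted("deploy_prod", "src/app.py")            # blocked until approved
assert is_permitted("deploy_prod", "src/app.py", approved=True)
```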

From this perspective, the status of Spec is approaching that of the Control Plane in AI-native infrastructure.

The Reality of Multi-Agent Workflows: Orchestration and Governance First

In recent systems (such as APOX and other enterprise products), an industry consensus is emerging:

  • Multi-agent collaboration no longer pursues “full automation,” but is staged and gated.
  • Frameworks like LangGraph are used to build persistent, debuggable agent workflows.
  • RAG (e.g., based on Milvus) is used to accumulate historical Specs, decisions, and context as long-term memory (see the sketch after this list).
  • The IDE mainly focuses on execution efficiency, not engineering governance.
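
As a sketch of the long-term memory pattern mentioned above, the snippet below stores and retrieves past Spec decisions with pymilvus; the collection layout and the stand-in embedding are assumptions for illustration.

```python
# Sketch: Milvus as long-term memory for past Specs and decisions.
# The collection layout and the toy embedding are illustrative assumptions.
import hashlib
from pymilvus import MilvusClient

client = MilvusClient("specs.db")  # Milvus Lite: a local, file-backed instance
if not client.has_collection("spec_memory"):
    client.create_collection("spec_memory", dimension=768)

def embed(text: str) -> list[float]:
    # Deterministic stand-in so the sketch runs; use a real embedding model in practice.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest] * 24  # 32 bytes * 24 = 768 dims

decision = "Auth service must use OIDC; sessions expire after 24h."
client.insert("spec_memory", [{"id": 1, "vector": embed(decision), "text": decision}])

# Later, an agent pulls relevant prior decisions back in as context.
hits = client.search("spec_memory", data=[embed("how do we handle login?")],
                     limit=3, output_fields=["text"])
print(hits)
```
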
Figure 1: APOX user interface

APOX (AI Product Orchestration eXtended) is a multi-agent collaboration workflow platform for enterprise software delivery. Its core goals are:

  • To connect the entire process from product requirements to executable code with a governable Agentflow and explicit engineering artifact chain.
  • To assign dedicated AI agents to each delivery stage (such as PRD, PO, Architecture, Developer, Implementation, Coding, etc.).
  • To embed manual approval gates and full audit trails at every step, solving the “intent drift and consistency” governance problem that traditional AI coding tools cannot address.
  • To provide a VS Code plugin for real-time sync between the local IDE and web artifacts, allowing Specs, code, tasks, and approval statuses to coexist in the repository.
  • To support assigning different base models to different agents according to enterprise needs.

APOX is not about simply speeding up code generation, but about elevating “Spec” from auxiliary documentation to a verifiable, constrainable, and traceable core asset in engineering—building a control plane and workflow governance system suitable for Agent-Driven Development.

Such systems emphasize:

  • An explicit artifact chain from PRD → Spec → Task → Implementation (sketched in code below).
  • Manual confirmation and audit points at every stage.
  • Bidirectional sync between Spec, code, repository, and IDE.
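
A toy version of such a gated chain, with the stage names from above and a code structure that is purely my own illustration, not APOX’s design:

```python
# Toy gated artifact chain: every stage transition passes a human approval gate
# and lands in an audit trail. Structure is illustrative, not APOX's design.
from dataclasses import dataclass, field

STAGES = ["PRD", "Spec", "Task", "Implementation"]

@dataclass
class Artifact:
    stage: str = "PRD"
    audit_log: list = field(default_factory=list)

    def advance(self, approver: str) -> None:
        """Move one stage forward, recording who approved the transition."""
        i = STAGES.index(self.stage)
        if i == len(STAGES) - 1:
            raise ValueError("already at the final stage")
        self.audit_log.append(f"{self.stage} -> {STAGES[i + 1]}: approved by {approver}")
        self.stage = STAGES[i + 1]

a = Artifact()
a.advance("alice")             # PRD -> Spec
a.advance("bob")               # Spec -> Task
print(a.stage, a.audit_log)    # Task, with a two-entry audit trail
```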

This is not about “smarter AI,” but about engineering systems adapting to the agent era.

The Long-Term Value of Spec: The Core Anchor of Engineering Assets

This is not to devalue code, but to acknowledge reality:

  • There will always be long-term differentiation in algorithms and model capabilities.
  • General engineering implementation is rapidly homogenizing.
  • What is hard to replicate is: how to define problems, constrain systems, and govern change.

In the ADD era, the value of Spec is reflected in:

  • Determining what agents can and cannot do.
  • Carrying the organization’s long-term understanding of the system.
  • Serving as the anchor for audit, compliance, and accountability.

Code will be rewritten again and again; Spec is the long-term asset.

Risks and Challenges of ADD: Living Spec and Governance Constraints

ADD also faces significant risks:

Can Spec become a Living Spec?

That is, when key implementation changes occur, can the system detect “intent changes” and prompt Spec updates, rather than allowing silent drift?
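
One low-tech approximation of a Living Spec is to pin each spec to the implementation files it governs and fail CI when those files change without a fresh spec review. A minimal sketch, assuming a hypothetical spec_pins.json mapping:

```python
# Sketch: flag silent drift by hashing implementation files pinned to a spec.
# The spec_pins.json format is a hypothetical example:
#   {"auth-spec.md": {"src/auth.py": "<sha256 recorded at last spec review>"}}
import hashlib, json, pathlib, sys

pins = json.loads(pathlib.Path("spec_pins.json").read_text())

drifted = []
for spec, files in pins.items():
    for path, recorded in files.items():
        current = hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()
        if current != recorded:
            drifted.append((spec, path))

for spec, path in drifted:
    print(f"{path} changed since {spec} was last reviewed: update the spec or re-pin")
if drifted:
    sys.exit(1)  # fail the CI gate instead of letting intent drift silently
```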

Can governance achieve low friction but strong constraints?

If gates are too strict, teams will bypass them; if too loose, the system loses control.

These two factors determine whether ADD is “the next engineering paradigm” or “just another tool bubble.”

The Trend Toward Control Planes in Engineering Systems

From a broader perspective, ADD is the inevitable result of engineering systems becoming “control planes”:

Engineering systems are evolving from “human collaboration tools” to “control systems for agent execution.”

In this structure:

  • Agent / IDE is the execution plane.
  • RAG / Memory is the state and memory plane.
  • Spec is the intent and policy plane.
  • Gates, audit, and traceability form the governance loop.

This closely aligns with the evolution path of AI-native infrastructure.

Summary

The winners of the ADD era will not be the systems with “the most agents or the fastest generation,” but those that first upgrade Spec from documentation to a governable, auditable, and executable asset. As automation advances, the true scarcity is the long-term control of intent.

AI Voice Dictation Input Methods Are Becoming the New Shortcut Key for the Programming Era

2026-01-18 14:53:08

Voice input methods are not just about being “fast”—they are becoming a brand new gateway for developers to collaborate with AI.

Warning
On January 12, 2026, citing financial difficulties, the Miaoyan project announced it was ceasing operations and that the team had been disbanded. The application will no longer be updated or maintained, but existing versions will keep working on the current device and system, and the app does not store any audio or transcription content.
Figure 1: Can voice input become the new shortcut for developers? My in-depth comparison experience.

AI Voice Input Methods Are Becoming the “New Shortcut Key” in the Programming Era

I am increasingly convinced of one thing: PC-based AI voice input methods are evolving from mere “input tools” into the foundational interaction layer for the era of programming and AI collaboration.

It’s not just about typing faster—it determines how you deliver your intent to the system, whether you’re writing documentation, code, or collaborating with AI in IDEs, terminals, or chat windows.

Because of this, the differences in voice input method experiences are far more significant than they appear on the surface.

My Six Evaluation Criteria for AI Voice Input Methods

After long-term, high-frequency use, I have developed a set of criteria to assess the real-world performance of AI voice input methods:

  • Response speed: Does text appear quickly enough after pressing the shortcut to keep up with your thoughts?
  • Continuous input stability: Does it remain reliable during extended use, or does it suddenly fail or miss recognition?
  • Mixed Chinese-English and technical terms: Can it reliably handle code, paths, abbreviations, and product names?
  • Developer friendliness: Is it truly designed for command line, IDE, and automation scenarios?
  • Interaction restraint: Does it avoid introducing distracting features that interfere with input itself?
  • Subscription and cost structure: Is it a standalone paid product, or can it be bundled with existing tool subscriptions?

Based on these criteria, I focused on comparing Miaoyan, Shandianshuo, and Zhipu AI Voice Input Method.

Miaoyan: Currently the Most “Developer-Oriented” Domestic Product

Miaoyan was the first domestic AI voice input method I used extensively, and it remains the one I am most willing to use continuously.

Figure 2: Miaoyan is currently my most-used Mac voice input method.

Command Mode: The Key Differentiator for Developer Productivity

It’s important to clarify that Miaoyan’s command mode is not about editing text via voice. Instead:

You describe your need in natural language, and the system directly generates an executable command-line command.

This is crucial for developers:

  • It’s not just about input
  • It’s about turning voice into an automation entry point
  • Essentially, it connects voice to the CLI or toolchain

This design is clearly focused on engineering efficiency, not office document polishing.
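
Miaoyan’s implementation is proprietary, so the following is only my sketch of the general pattern; transcribe() and suggest_command() are stand-ins for a speech model and an LLM, and the confirmation prompt is the human gate before anything executes:

```python
# Sketch of the "voice -> executable command" pattern; not Miaoyan's code.
# transcribe() and suggest_command() are stand-ins for real model calls.
import subprocess

def transcribe(audio: bytes) -> str:
    return "find all markdown files modified this week"   # stubbed speech result

def suggest_command(request: str) -> str:
    # A real system would prompt an LLM here; hard-coded for the sketch.
    return 'find . -name "*.md" -mtime -7'

request = transcribe(b"...")
command = suggest_command(request)
print(f"Request:  {request}\nProposed: {command}")
if input("Run it? [y/N] ").strip().lower() == "y":   # human gate before execution
    subprocess.run(command, shell=True, check=False)
```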

Usage Experience Summary

  • Fast response, nearly instant
  • Output is relatively clean, with minimal guessing
  • Interaction design is restrained, with no unnecessary concepts
  • Developer-friendly mindset

But there are some practical limitations:

  • It is a completely standalone product
  • Requires a separate subscription
  • Still in relatively small-scale use

From a product strategy perspective, it feels more like a “pure tool” than part of an ecosystem.

Note
On January 12, 2026, citing financial difficulties, the Miaoyan project announced it was ceasing operations and that the team had been disbanded. The application will no longer be updated or maintained, but existing versions will keep working on the current device and system, and the app does not store any audio or transcription content.

Shandianshuo: Local-First Approach, Developer Experience Depends on Your Setup

Shandianshuo takes a different approach: it treats voice input as a “local-first foundational capability,” emphasizing low latency and privacy (at least in its product narrative). The natural advantages of this approach are speed and controllable marginal costs, making it suitable as a “system capability” that’s always available, rather than a cloud service.

Figure 3: Shandianshuo settings page

However, from a developer’s perspective, its upper limit often depends on “how you implement enhanced capabilities”:

If you only use it for basic transcription, the experience is more like a high-quality local input tool. But if you want better mixed Chinese-English input, technical-term correction, and symbol and formatting handling, the common approach is to add optional AI correction or enhancement capabilities, which usually requires extra configuration (such as providing your own API key or subscribing to enhanced features). The key trade-off here is not “can it be used,” but “how much configuration cost are you willing to pay for enhanced capabilities.”

If you want voice input to be a “lightweight, stable, non-intrusive” foundation, Shandianshuo is worth considering. But if your goal is to make voice input part of your developer workflow (such as command generation or executable actions), it needs to offer stronger productized design at the “command layer” and in terms of controllability.

Zhipu AI Voice Input Method: Stable but with Friction

I also thoroughly tested the Zhipu AI Voice Input Method.

Figure 4: Zhipu Voice Input Method settings interface

Its strengths include:

  • More stable for long-term continuous input
  • Rarely becomes completely unresponsive
  • Good tolerance for longer Chinese input

But with frequent use, some issues stand out:

  • Idle misrecognition: If you press the shortcut but don’t speak, it may output random characters, disrupting your input flow
  • Occasionally messy output: Sometimes adds irrelevant words, making it less controllable than Miaoyan
  • Basic recognition errors: For example, the product’s own name “Zhipu” being transcribed with the wrong characters, which is a trust issue for professional users
  • Feature-heavy design: Various tone and style features increase cognitive load

Subscription Bundling: Zhipu’s Practical Advantage

Although I prefer Miaoyan in terms of experience, Zhipu has a very practical advantage:

If you already subscribe to Zhipu’s programming package, the voice input method is included for free.

This means:

  • No need to pay separately for the input method
  • Lower psychological and decision-making cost
  • More likely to become the “default tool” that stays

From a business perspective, this is a very smart strategy.

Main Comparison Table

The following table compares the three products across key dimensions for quick reference.

| Dimension | Miaoyan | Shandianshuo | Zhipu AI Voice Input Method |
|---|---|---|---|
| Response Speed | Fast, nearly instant | Usually fast (local-first) | Slightly slower than Miaoyan |
| Continuous Stability | Stable | Depends on setup and environment | Very stable |
| Idle Misrecognition | Rare | Generally restrained (varies by version) | Obvious: outputs characters even if silent |
| Output Cleanliness/Control | High | More like an “input tool” | Occasionally messy |
| Developer Differentiator | Natural language → executable command | Local-first / optional enhancements | Ecosystem-attached capabilities |
| Subscription & Cost | Standalone, separate purchase | Basics usable; enhancements often require setup/subscription | Bundled free with programming package |
| My Current Preference | Best experience | More like a “foundation approach” | Easy to keep but not clean enough |
Table 1: Core Comparison of Miaoyan, Shandianshuo, and Zhipu AI Voice Input Methods

User Loyalty to AI Voice Input Methods

The switching cost for voice input methods is actually low: all that ties you to one is a shortcut key and a habit.

What really determines whether users stick around is:

  • Whether the output is controllable
  • Whether it keeps causing annoying minor issues
  • Whether it integrates into your existing workflow and payment structure

For me personally:

  • The best and smoothest experience is still Miaoyan
  • The one most likely to stick around is probably Zhipu
  • Shandianshuo is more of a “foundation approach” and worth watching for how its enhancements evolve

These points are not contradictory.

Summary

  • Miaoyan is more mature in engineering orientation, command capabilities, and input control
  • Zhipu has practical advantages in stability and subscription bundling
  • Shandianshuo takes a local-first + optional enhancement approach, with the key being how it balances “basic capability” and “enhancement cost”
  • Who truly becomes the “default gateway” depends on reducing distractions, fixing frequent minor issues, and treating voice input as true “infrastructure” rather than an add-on feature

The competition among AI voice input methods is no longer about recognition accuracy, but about who can own the shortcut key you press every day.

From Spatial Data to AI Open Source: Technical Standards, Data Sovereignty, and the Global Divide

2026-01-11 11:29:28

The divide in technical standards and data sovereignty determines the global competitive landscape of infrastructure open source in the AI era.

In this article, I will use the differences in air quality data presentation in Apple Maps and Weather as a starting point to explore how technical standards and data sovereignty influence the open source paths of AI in different countries. I will further analyze why, in the AI era, infrastructure-level open source has become the key battleground for ecosystem dominance.

Author’s Note

This article originates from a very everyday observation: Why is air quality data in China shown as “points” in Apple Maps and Weather, while in other countries it is often displayed as “areas”?

Figure 1: Air quality map in Apple Weather, showing point-based data in China and area-based data in other countries

At first glance, it seems like a product experience difference. But when I reconsidered this issue in the context of engineering, standards, and system design, I realized it actually points to a much bigger question: how different countries understand the relationship between technology, standards, openness, and sovereignty.

As an engineer who has long worked in cloud native, AI infrastructure, and open source ecosystems, I gradually realized that this difference is not limited to air quality or map data. In the AI era, it is further amplified, directly affecting how we open source models, build infrastructure, and whether we can participate in the formulation of global rules.

Writing this article is not about judging right or wrong, but about using a concrete example to explain a structural difference and discuss the long-term impact and real opportunities this difference may bring in the AI era.

What is especially important: at the level of AI infrastructure and infra-level open source, the competition has just begun. China is not without opportunities, but the choice of path will become more critical than ever.

Differences in Air Quality Data Presentation: A Microcosm of Technical Standards and Sovereignty

The following image illustrates the divide between spatial data, AI open source, and technical standards. By comparing how air quality data is presented in Apple Maps and Weather in different countries, you can intuitively feel the differences in technical standards and sovereignty strategies behind the scenes.

Figure 2: The divide between spatial data, AI open source, and technical standards

If you regularly use global products such as maps, weather, traffic, or various data services, you may notice a recurring phenomenon that is rarely discussed seriously: the way data is presented in China often differs significantly from global mainstream standards.

A very intuitive example comes from the air quality display in Apple Maps or Weather. In China, air quality is usually shown as discrete points; in the US, Europe, Japan, and other countries, it is often rendered as continuous coverage areas.

At first glance, this seems like a product experience difference, and may even lead people to mistakenly believe that “China’s data is incomplete.” But if you treat it as an engineering or system design issue, you will find: this is not a matter of data capability, but a different choice in technical standards, data sovereignty, and openness strategies.

And this choice is not limited to air quality.

Air Quality Is Just a Slice: Greater Differences in Spatial Public Data

Air quality is just a highly visible and relatively low-risk example. Similar differences have long existed in broader spatial and public data domains.

  • Maps and coordinate systems
  • Surveying and high-precision spatial data
  • Real-time traffic and population movement
  • Remote sensing, environmental, and urban operation data

In global mainstream systems, such data is usually regarded as public information infrastructure. It is standardized, gridded, API-ified, allows interpolation, modeling, and redistribution, and is widely used in research, business, and product innovation.

In China, this data often takes another form: hierarchical, discrete, strictly defined, and with centralized interpretation authority.

This is not a technical preference in a single field, but a systemic logic of technology and governance.

Three Global Paths

Placing China in a global context, we can see that there are roughly three different paths worldwide regarding “how public data and technical standards are opened.”

Engineering-Open Type: Standards and Ecosystem First

Represented by the US and some European countries, the core features of this system are:

  • Public data prioritized as infrastructure
  • Standards and interfaces come first
  • Encourages engineering autonomy and ecosystem evolution
  • Tolerates model inference and uncertainty

This path directly shaped the global landscape of foundational software and infrastructure-level open source. Linux, Kubernetes, and the cloud native system are essentially products of openness at the rules layer.

Governance-Sovereignty Type: Control and Auditability First

Represented by China, this path emphasizes:

  • Sensitivity of spatial and public data
  • Data as part of governance capability
  • Standards, definitions, and release methods are highly bound
  • Emphasizes traceability, accountability, and controllability

In this system, “point data” is not a sign of technological backwardness, but a governable technical form. When a technical system is designed as a governance system, its primary goal is not reusability, but controllability.

Compromise-Coordinated Type: Cautious Openness, Engineering Internationalization

Some countries try to find a balance between the two, maintaining caution in spatial data while being highly internationalized in engineering and industry. This shows that the difference is not about being advanced or backward, but about different objective functions.

The following diagram compares the core characteristics, typical cases, and advantages/challenges of these three paths from a global perspective. The “Engineering-Open Type” on the left shapes the global infrastructure software landscape through standards and ecosystems; the “Governance-Sovereignty Type” in the middle emphasizes data sovereignty and security controllability but has limitations in influence at the rules layer; the “Compromise-Coordinated Type” on the right attempts to find a balance between security and openness. The divide between these three paths directly affects the infrastructure competition landscape of various countries in the AI era.

Figure 3: Global Perspective: Three Paths for Public Data and Technical Standards

The Essence of “Point” vs. “Area” in Air Quality

Among all spatial public data, air quality is an ideal observation window:

  • Does not directly involve military or core economic security
  • Highly visible, updated daily, and perceptible to everyone

China does not lack air quality data; on the contrary, the density of monitoring stations is among the highest in the world. The real difference lies in:

  • Whether interpolation is allowed
  • Whether model inference is allowed
  • Whether platforms are allowed to reinterpret the data

“Point” means authenticity and traceability; “area” means models, inference, and redistribution of interpretive authority. This is precisely the watershed between technical standards and data sovereignty.
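
The technical step between the two presentations is small. Given station readings, a simple scheme such as inverse-distance weighting, one common choice among many, is enough to render a continuous field; the coordinates and values below are made up:

```python
# Inverse-distance weighting: turning discrete station readings into an "area".
# Station coordinates and AQI readings are invented for illustration.
import numpy as np

stations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # station locations
aqi = np.array([80.0, 120.0, 60.0])                        # reading at each station

def idw(point: np.ndarray, power: float = 2.0, eps: float = 1e-9) -> float:
    """Estimate a value at `point` as a distance-weighted mean of the stations."""
    d = np.linalg.norm(stations - point, axis=1) + eps
    w = 1.0 / d**power
    return float(np.sum(w * aqi) / np.sum(w))

# Evaluating on a grid yields a continuous surface instead of three points.
print(idw(np.array([0.5, 0.5])))  # an inferred value where no station exists
```

Everything the interpolation produces between the stations is inference, not measurement; that inferred layer is exactly the interpretive authority the “point” presentation declines to delegate.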

The following diagram compares two different technical paths. The left side, “Governance-Sovereignty Type,” emphasizes data traceability and controllability, using discrete point-based data presentation. The right side, “Engineering-Open Type,” allows model interpolation and inference, providing more user-friendly experience through continuous area-based coverage. The essence of this difference lies not in the level of technical capability, but in the different choices made between data sovereignty, governance capability, and open ecosystems.

Figure 4: Technical Standards and Sovereignty Divide in Spatial Data Presentation

The Amplification Effect in the AI Era

With the above logic in mind, many phenomena in the AI era become less confusing.

For example:

  • Why are Chinese AI companies more willing to open source large language model (LLM) weights, while American companies have clearly shifted toward closed source in recent years?
  • Why is foundational software and infrastructure-level open source still mainly led by the US?

The key is not “whether to open source,” but “which layer is open sourced.”

  • Model weights are static, declarable assets
  • Infrastructure, runtimes, protocols, and standards are dynamic, evolving system rules

Open sourcing weights is essentially openness at the asset layer; infrastructure-level open source means relinquishing control over operating rules and interpretive authority.

The following diagram compares two different layers of AI open source. The left side shows “Model Weight Layer Open Source,” a typical feature of the Chinese path: opening static digital assets with low cost and controllable risk, but without involvement in rule-making. The right side shows “Infrastructure Layer Open Source,” a core strategy of the US path: by open sourcing development tools, protocol standards, runtimes, compute scheduling, and other infrastructure, it defines how AI is used, thereby controlling ecosystem rules and interpretive authority. Key insight: open sourcing model weights does not equal mastering the AI ecosystem, and the real competitive focus is shifting to the infrastructure layer of “how AI runs.”

Figure 5: Two Layers of AI Era Open Source: Model Weights vs Infrastructure

The US Approach: Focusing on Rules and Runtime Layers

In the past year or two, US-led AI open source and ecosystem initiatives have shown a highly consistent direction: not rushing to open source the strongest models, but focusing on defining “how AI is used.”

  • The Linux Foundation established AAIF (Agentic AI Foundation), focusing on AI infrastructure, standards, and toolchain collaboration
  • Protocols like MCP (Model Context Protocol) aim to define common interaction methods between agents and tools/systems
  • Major tech companies are generally focusing on APIs, platforms, runtimes, and ecosystem binding

The commonality of these actions: competing in model capability, but controlling the usage rules.
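
To see what openness at the usage-rules layer looks like in practice, here is a minimal MCP server using the official Python SDK’s FastMCP helper; the tool itself is a toy:

```python
# Minimal MCP server: the protocol standardizes how agents discover and call
# tools, which is rule-layer openness rather than model-layer openness.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a text snippet."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves the tool over MCP's standard transport (stdio by default)
```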

China’s Shift: From Model-Oriented to Infrastructure-Oriented

It is important to emphasize that this difference does not mean China is unaware of the issue.

Whether in policy discussions or within industry and research institutions, the risk of “only open sourcing models without controlling infrastructure and standard dominance” has been repeatedly discussed.

The real challenge lies in how to achieve a directional shift within the existing governance logic and risk framework. This shift has already appeared in some concrete practices.

Exploration and Practice at the Infrastructure Layer

In the AI era, infrastructure often starts with the most engineering-driven problems.

HAMi Project

Projects like HAMi do not focus on model capability, but on:

  • Abstraction, allocation, and isolation of GPU resources
  • How multi-tenant AI workloads are run
  • How computing power transitions from hardware assets to governable system resources

The significance of such projects is not about being “SOTA,” but about entering the domain of “how AI runs.”

AI Runtime Reconstruction from a System Software Perspective

Exploration at the research institution level is also noteworthy. The FlagOS initiative by the Beijing Academy of Artificial Intelligence is a clear signal: AI is being redefined as a system software issue, not just a model or algorithm problem.

Long-Term Tech Stack Investment by Industry Players

In the industry, Huawei’s strategy reflects a similar direction: not simply open sourcing models, but attempting to build a complete, controllable AI tech stack, from computing power to frameworks, platforms, and ecosystems. This is a slower, heavier, but more infrastructure-competitive path.

Realistic Assessment: The Starting Point of AI Infrastructure Competition

Taking a longer view, we find an easily overlooked fact:

At the level of AI infrastructure and infra-level open source, there is no settled pattern between China and the US.

The US advantage lies in:

  • Mature engineering culture
  • Standard organizations and foundation mechanisms
  • High proficiency in openness at the rules layer

China’s variables include:

  • Huge AI application scenarios
  • Extreme demand for computing power and system efficiency
  • Ongoing directional adjustments

The real uncertainty is not “whether we can catch up,” but whether it is possible to gradually open up space for engineering autonomy and standard co-construction while maintaining governance bottom lines.

Summary

The “points” and “areas” of air quality, open model weights and the rules by which systems run: behind these appearances lies not a simple dispute over technical routes, but how a country finds its own balance between openness, standards, and sovereignty.

In the AI era, this issue will not disappear, but will become more concrete and more engineering-driven. And this is precisely where there are still opportunities for China’s AI infrastructure open source.

Joining Dynamia: Embarking on a New Journey in AI Native Infrastructure

2026-01-07 15:49:21

Compute governance is the critical bottleneck for AI scaling. From hardware consumption to core asset, this long-undervalued path needs to be redefined.

Figure 1: Dynamia.ai

A New Beginning

I have officially joined Dynamia as Open Source Ecosystem VP, responsible for the long-term development of the company in open source, technical narrative, and AI Native Infrastructure ecosystem directions.

Why I Chose Dynamia

I chose to join Dynamia not because it’s a company trying to “solve all AI problems,” but precisely the opposite: Dynamia focuses intensely on one unavoidable yet long-undervalued core issue in AI Native Infrastructure. Compute, and especially the Graphics Processing Unit (GPU), is evolving from a “technical resource” into an infrastructure element that requires refined governance and economic management.

Through years of practice in cloud native, distributed systems, and AI infrastructure (AI Infra), I’ve formed a clear judgment: as Large Language Models (LLM) and AI Agents enter the stage of large-scale deployment, the real bottleneck limiting system scalability and sustainability is no longer just model capability itself, but how compute is measured, allocated, isolated, and scheduled, and how a governable, accountable, and optimizable operational mechanism is formed at the system level. From this perspective, the core challenge of AI infrastructure is essentially evolving into a “resource governance and Token economy” problem.

About Dynamia and HAMi

Dynamia is an AI-native infrastructure technology company rooted in open source DNA, driving efficiency leaps in heterogeneous compute through technological innovation. Its leading open source project, HAMi (Heterogeneous AI Computing Virtualization Middleware), is a Cloud Native Computing Foundation (CNCF) sandbox project that provides virtualization, sharing, isolation, and topology-aware scheduling for GPUs, NPUs, and other heterogeneous devices, and is widely adopted by 50+ enterprises and institutions.
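
As a concrete taste of “sharing and isolation,” HAMi exposes GPU slices as Kubernetes extended resources. The sketch below builds such a pod with the official Kubernetes Python client; the resource names follow HAMi’s documentation (nvidia.com/gpumem in MB, nvidia.com/gpucores as a percentage), but verify them against your own deployment:

```python
# Sketch: request a slice of one GPU through HAMi's extended resource names,
# built with the official Kubernetes Python client. Verify the resource names
# and units against your cluster's HAMi configuration before relying on them.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-share-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="cuda",
            image="nvidia/cuda:12.4.0-base-ubuntu22.04",
            command=["sleep", "3600"],
            resources=client.V1ResourceRequirements(limits={
                "nvidia.com/gpu": "1",        # one virtual GPU
                "nvidia.com/gpumem": "4096",  # ~4 GB of device memory
                "nvidia.com/gpucores": "30",  # ~30% of the GPU's compute
            }),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```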

Dynamia’s Technical Approach

In this context, Dynamia’s technical approach—starting from the GPU layer, which is the most expensive, scarcest, and least unified abstraction layer in AI systems, treating compute as a foundational resource that can be measured, partitioned, scheduled, governed, and even “tokenized” for refined accounting and optimization—aligns highly with my long-term judgment on AI-native infrastructure.

This path doesn’t use “model capabilities” or “application innovation” as selling points in the short term, nor is it easily packaged into simple stories. However, with rising compute costs, heterogeneous accelerators becoming the norm, and AI systems moving toward multi-tenant and large-scale operations, these infrastructure-level capabilities are gradually becoming prerequisites for the establishment and expansion of AI systems.

Future Focus

As Dynamia’s Open Source Ecosystem VP, I will focus on the technical narrative of AI-native infrastructure, open source ecosystem building, and global developer collaboration. The goal is to promote compute from a “hardware resource to be consumed” into a governable, measurable, and optimizable core asset of AI infrastructure, laying the foundation for the next stage of scaling and sustainable evolution of AI systems.

Summary

Joining Dynamia is an important milestone in my career and a concrete action demonstrating my long-term optimism about AI-native infrastructure. Compute governance is not a short-term trend that yields quick results, but an infrastructure proposition that cannot be bypassed for AI large-scale deployment. I look forward to exploring, building, and landing solutions on this long-undervalued path with global developers.

Running Parallel AI Agents on My Mac: Hands-On with Verdent's Standalone App

2026-01-04 10:25:48

I’ve been spending more time recently experimenting with vibe coding tools on real projects, not demos. One of those projects is my own website, where I constantly tweak content structure, navigation, and layout.

During this process, I started using Verdent’s standalone Mac app more seriously. What stood out was not any single feature, but how different the experience felt compared to traditional AI coding tools.

Figure 1: Verdent Standalone App UI

Verdent doesn’t behave like an assistant waiting for instructions. It behaves more like an environment where work happens in parallel.

A Different Starting Point: Tasks, Not Chats

Most AI coding tools begin with a conversation. Verdent begins with tasks.

When I opened my website repository in the Verdent app, I didn’t start with a long prompt. I created multiple tasks directly: one to rethink navigation and SEO structure, another to explore homepage layout improvements, and a third to review existing content organization.

Each task immediately spun up its own agent and workspace. From the beginning, the app encouraged me to think in parallel, the same way I normally would when sketching ideas on paper or jumping between files.

This framing alone changes how you work.

Built for Multitasking, Without Losing Context

Switching contexts is unavoidable in real development work. What usually breaks is continuity.

Verdent handles this well. Each task preserves its full context independently. I could stop one task mid-way, switch to another, and come back later without re-explaining the problem or reloading files.

For example, while one agent was analyzing my site’s navigation structure, another was exploring layout options. I moved between them freely. Nothing was lost. Each agent remembered exactly what it was doing.

This feels closer to how developers think than how chat-based tools operate.

Safe Parallel Coding with Workspaces

Parallel work only becomes truly safe when code changes are isolated. When parallelism moves from discussion to actual code modification, risk management becomes essential.

Verdent solves this with Workspaces. Each workspace is an isolated, independent code environment with its own change history, commit log, and branches. This isn’t just about separation—it’s about making concurrent code changes manageable.

What this means in practice:

  • Multiple tasks can write code simultaneously
  • Changes remain isolated from each other
  • If conflicts arise, they’re visible and cleanly resolvable

I intentionally let different agents operate on overlapping parts of my project: one modifying Markdown content and links, another adjusting CSS and layout logic. Both ran in parallel. No conflicts emerged. Later, I reviewed the diffs from each workspace and merged only what made sense.

This kind of isolation removes significant anxiety from AI-assisted coding. You stop worrying about breaking things and start experimenting more freely, knowing that each change exists in its own contained environment.

Parallel Agent Execution Feels Like Delegation

Parallelism doesn’t mean that all agents complete the same phase of work at the same time—instead, by isolating and overlapping phases, what was once a strictly sequential process is compressed into a more efficient, collaborative mode.

In Verdent, each agent runs in its own workspace, essentially an automatically managed branch or worktree. In practice, I often create multiple tasks with different responsibilities for the same requirement, such as planning, implementation, and review. But this doesn’t mean they all complete the same phase simultaneously.

These tasks are triggered as needed, each running for a period and producing clear artifacts as boundaries for collaboration. The planning task generates planning documents or constraint specifications; the implementation task advances code changes based on those documents and produces diffs; the review task, according to the established planning goals and audit criteria, performs staged reviews of the generated changes. By overlapping phases around artifacts, the originally strict sequential process is compressed into a workflow that more closely resembles team collaboration.
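
Verdent’s internals aren’t public, but the workspace model maps naturally onto git worktrees; the sketch below is my analogy for the isolation, not Verdent’s implementation:

```python
# Rough analogy: per-task isolation via git worktrees (my illustration,
# not Verdent's actual implementation).
import subprocess

def new_workspace(task: str) -> None:
    """Give a task its own branch and working directory for isolated edits."""
    subprocess.run(
        ["git", "worktree", "add", "-b", f"task/{task}", f"../ws-{task}"],
        check=True,
    )

new_workspace("navigation")  # one agent works in ../ws-navigation on task/navigation
new_workspace("layout")      # another works in ../ws-layout on task/layout
# Each worktree keeps its own diffs; merge only what survives review, e.g.:
#   git merge task/navigation
```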

The value of splitting into multiple tasks is not parallel execution, but parallel cognition and clear collaboration boundaries.

While it’s technically possible to put multiple roles into a single task, this causes planning, implementation, and review to share the same context, which weakens role isolation and the auditability of results.

Configurability and Design Trade-offs

Beyond the workflow model itself, Verdent exposes a surprisingly rich set of configurable capabilities.

It allows users to customize MCP settings, define subagents with configurable prompts, and create reusable commands via slash (/) shortcuts. Personal rules can be written to influence agent behavior and response style, and command-level permissions can be configured to enforce basic security boundaries. Verdent also supports multiple mainstream foundation models, including GPT, Claude, Gemini, and K2. For users who prefer a lightweight coding experience without a full IDE, Verdent offers DiffLens as an alternative review-oriented interface. Both subscription-based and credit-based pricing models are supported.

Figure 2: Verdent Settings

That said, Verdent makes a clear set of trade-offs. It is not built around tab-based code completion, nor does it offer a plugin system. If it did, it would start to resemble a traditional IDE - which does not seem to be its goal. Verdent is not designed for direct, fine-grained code manipulation; most changes are mediated through conversational tasks and agent-driven edits. This makes the experience clean and focused, but it also means that for large, highly complex codebases, Verdent may function better as a complementary orchestration layer rather than a full-time development environment.

Where Verdent Fits Today

There are many AI-assisted coding tools emerging right now. Some focus on smarter editors, others on faster generation.

Verdent feels different because it focuses on orchestration, not just assistance.

It doesn’t try to replace your editor. It sits one level above, coordinating planning, execution, and review across multiple agents.

That makes it particularly suitable for exploratory work, refactoring, and early-stage design - exactly the kind of work I was doing on my website.

Final Thoughts

Using Verdent’s standalone app didn’t just speed things up. It changed how I structured work.

Instead of doing everything sequentially, I started thinking in parallel again - and letting the system support that way of thinking.

Verdent feels less like an AI feature and more like an environment that assumes AI is already part of how development happens.

For developers experimenting with AI-native workflows, that shift is worth paying attention to.

2025 Annual Review: The Transformation Journey from Cloud Native to AI Native

2025-12-31 18:02:01

The waves of technology keep evolving; only by actively embracing change can we continue to create value. In 2025, I chose to move from Cloud Native to AI Native—this year marked a key turning point for personal growth and system reinvention.

2025 was a turning point for me. This year, I not only changed my technical direction but also the way I approach problems. Moving from Cloud Native infrastructure to AI Native Infrastructure was not just a migration of content, but an upgrade in mindset.

Figure 1: Farewell 2025!

This year, I conducted a large-scale refactoring of the website and systematically organized the content. Beyond the technical improvements, I want to share my thoughts and changes throughout the year.

A Bold Shift: Embracing the AI Native Era

At the beginning of 2025, I made an important decision: to reposition myself from a Cloud Native Evangelist to an AI Infrastructure Architect. This was not just a change in title, but a strategic transformation after careful consideration.

As I witnessed the surge of AI technologies and the rise of Agent-based applications reshaping software, I realized that clinging to the boundaries of Cloud Native might mean missing an era. So, I systematically adjusted the website’s content structure, shifting the focus toward AI Native Infrastructure.

This transformation was not about abandoning the past, but extending forward from the foundation of Cloud Native. Classic content like Kubernetes and Istio remains and is continuously updated, but new topics such as AI Agent and the AI OSS landscape have been added, forming a more complete knowledge map.

Content Creation: From Technical Details to Ecosystem Perspective

AI Agent: Building Systematic Knowledge

Agents represent a major evolution in software for the AI era. When I tried to understand Agent design principles, I found fragmented information everywhere but lacked a systematic knowledge base.

So I created content that analyzes the Agent context lifecycle and control loop mechanisms, summarizing several proven architectural patterns. To make complex knowledge easier to digest, I organized it into logical sections so readers can learn step by step.

AI Tool Ecosystem: Mapping the Open Source Landscape

AI tools and frameworks are emerging rapidly, with new projects appearing daily. To help readers quickly grasp the ecosystem, I built a comprehensive AI OSS database.

This database covers everything from Agent frameworks to development tools and deployment services. I not only included active projects but also established an archive mechanism, preserving detailed information on over 150 historical projects. More importantly, I developed a scoring system to objectively evaluate projects across dimensions like quality and sustainability, helping readers decide which tools are worth investing time in.
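
The scoring system itself is a simple weighted model; the dimensions and weights below are placeholders showing its shape, not the actual production values:

```python
# Shape of the project scoring model; dimensions and weights are placeholders.
WEIGHTS = {"activity": 0.5, "quality": 0.25, "sustainability": 0.25}

def score(ratings: dict[str, float]) -> float:
    """Combine per-dimension ratings (0-100 each) into one weighted score."""
    return round(sum(ratings[dim] * w for dim, w in WEIGHTS.items()), 1)

print(score({"activity": 90, "quality": 75, "sustainability": 60}))  # 78.8
```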

Blogging: Capturing Technology Trends Faster

In 2025, I wrote over 120 blog posts. Compared to previous years, these articles focused more on observing and reflecting on technology trends, rather than just technical tutorials.

I started paying attention to deeper questions: How will AI infrastructure evolve? What does Beijing’s open source initiative mean for the AI industry? What ripple effects might a tech acquisition trigger? These articles allowed me and my readers to not only see “what” technology is, but also “why” and “what’s next.”

User Experience: Making Knowledge Easier to Discover and Consume

No matter how good the content is, if it can’t be easily found and read, its value is greatly diminished. In 2025, I invested significant effort into website functionality, with one goal: to provide readers with a smoother reading experience.

Comprehensive Search Upgrade

As the volume of content grew, the original search function could no longer meet demand. I redesigned the search system to support fuzzy search and result scoring, and optimized index loading performance. More importantly, the new search interface is more user-friendly, supporting keyboard navigation and category filtering so users can find what they want faster.

Multi-Device Experience Optimization

Mobile reading experience has improved significantly. I refactored the mobile navigation and table of contents, making reading on phones much smoother. Dark mode is now more refined, fixing several display issues and ensuring images and diagrams look good on dark backgrounds.

Efficiency Revolution in Content Distribution

A major change was optimizing the WeChat Official Account publishing workflow. Previously, publishing website content to WeChat required manual handling of many details; now, it’s almost one-click export. This workflow automatically processes images, metadata, styles, and all details, reducing a half-hour task to just a few minutes.

Additionally, I added a glossary feature for technical term highlighting and tooltips; improved SEO and social sharing metadata; and cleaned up outdated content. These seemingly minor improvements quietly enhance the user experience.

Content Evolution: More Dimensional Knowledge Expression

Looking back at content creation in 2025, I found clear changes in several dimensions.

From Tutorials to Observations

Early content leaned toward technical tutorials and practical guides, showing “how to do.” This year, I focused more on “why” and “what are the trends.” I wrote more technology trend analyses, ecosystem maps, and in-depth case studies. These may not directly teach you how to use an API, but they help you understand the direction of technological evolution.

From Chinese to Bilingual

AI is a global wave and cannot be limited to the Chinese-speaking world. In 2025, I wrote bilingual documentation for almost all new AI tools, and important blog posts also have English versions. This increased the workload, but allowed the content to reach a broader audience.

From Text to Multimedia

Text is efficient, but not all knowledge is best expressed in words. This year, I used many architecture and schematic diagrams to explain complex concepts, adding 59 new charts. These visual elements lower the barrier to understanding, making abstract concepts more intuitive. I also optimized image display in dark mode to ensure consistent visual experience.

Development Approach: Embracing AI-Assisted Programming

2025 was not only a year of shifting content themes toward AI, but also a year of deep practice in AI-assisted programming.

I developed a VS Code plugin and created many prompts to automate repetitive tasks. I experimented with various AI programming tools and settled on a toolchain that suits me. I even migrated the website to Cloudflare Pages and used its edge computing services to develop a chatbot. These practices greatly improved development efficiency, giving me more time to focus on thinking and creating rather than mechanical coding.

This made me realize: AI will not replace developers, but developers who use AI well will replace those who do not. I also shared more insights to help others master AI-assisted programming.

Looking Ahead to 2026: Keep Moving Forward

Looking back at 2025, the site underwent a profound transformation—from a Cloud Native tech blog to an AI infrastructure knowledge base. But this is just the beginning, not the end.

Looking forward to 2026, I plan to continue deepening in several areas:

  • Enhancing the knowledge system: Continue to supplement GPU infrastructure and AI Agent content, especially practical cases and performance tuning knowledge.
  • Tracking ecosystem evolution: AI tools and frameworks iterate rapidly; I need to keep up with this fast-changing ecosystem and update content in a timely manner.
  • Deepening engineering practice: Share more practical AI engineering experience to help readers turn theory into practice.
  • Exploring knowledge connections: Consider building a knowledge graph to connect different content sections, providing smarter navigation and recommendations.

Summary

2025 was a year of change and growth. From Cloud Native to AI Native, from technical practice to ecosystem observation, both the content and functionality of the site have made qualitative leaps.

What makes me happiest is that this transformation allowed me and my readers to stand at the forefront of the technology wave. We are not just learning new technologies, but thinking about how technology changes the world and the way we write software.

The waves of technology keep evolving; only by actively embracing change can we continue to create value. Thank you to every reader for your companionship and support. I look forward to sharing more insights and practices in 2026.
