
Jimmy Song | 宋净超

Evangelist at Tetrate, founder of the Cloud Native Community, CNCF Ambassador, and cloud native technology expert.

Joining Dynamia: Embarking on a New Journey in AI Native Infrastructure

2026-01-07 15:49:21

Compute governance is the critical bottleneck for AI scaling. From hardware consumption to core asset, this long-undervalued path needs to be redefined.

Figure 1: Dynamia.ai

A New Beginning

I have officially joined Dynamia as VP of Open Source Ecosystem / Partner, responsible for the company’s long-term work in open source, technical narrative, and the AI Native Infrastructure ecosystem.

Why I Chose Dynamia

I chose to join Dynamia not because it’s a company trying to “solve all AI problems,” but precisely the opposite. It’s because Dynamia focuses intensely on one unavoidable yet long-undervalued core issue in AI Native Infrastructure: compute, especially Graphics Processing Units (GPUs), is evolving from a “technical resource” into an infrastructure element that requires refined governance and economic management.

Through years of practice in cloud native, distributed systems, and AI infrastructure (AI Infra), I’ve formed a clear judgment: as Large Language Models (LLMs) and AI Agents enter the stage of large-scale deployment, the real bottleneck limiting system scalability and sustainability is no longer just model capability itself, but how compute is measured, allocated, isolated, and scheduled, and how a governable, accountable, and optimizable operational mechanism is formed at the system level. From this perspective, the core challenge of AI infrastructure is essentially evolving into a “resource governance and Token economy” problem.

About Dynamia and HAMi

Dynamia is an AI native infrastructure technology company rooted in open source DNA, driving efficiency leaps in heterogeneous compute through technological innovation. Its flagship open source project, HAMi (Heterogeneous AI Computing Virtualization Middleware), is a Cloud Native Computing Foundation (CNCF) sandbox project that provides virtualization, sharing, isolation, and topology-aware scheduling for GPUs, NPUs, and other heterogeneous devices, and has been adopted by more than 50 enterprises and institutions.
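To make the sharing and isolation idea concrete, here is a minimal sketch of what requesting a fractional GPU slice can look like with the Kubernetes Python client. It is an illustration only; the HAMi-style extended resource names (nvidia.com/gpumem, nvidia.com/gpucores) and their units are assumptions that depend on how the device plugin is deployed and configured.

```python
# A minimal sketch (not an official HAMi example): a Pod that asks the
# scheduler for one GPU slice with a bounded memory and compute share.
# The resource names and units below are assumptions tied to the device-plugin setup.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="shared-gpu-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="worker",
                image="nvidia/cuda:12.4.0-base-ubuntu22.04",
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={
                        "nvidia.com/gpu": "1",       # one virtual GPU slice
                        "nvidia.com/gpumem": "4096",  # assumed: memory cap in MiB
                        "nvidia.com/gpucores": "30",  # assumed: ~30% share of compute
                    }
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The exact field names matter less than the shift they represent: the GPU stops being a whole device bound to one workload and becomes a measurable, schedulable quantity.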

Dynamia’s Technical Approach

In this context, Dynamia’s technical approach—starting from the GPU layer, which is the most expensive, scarcest, and least unified abstraction layer in AI systems, treating compute as a foundational resource that can be measured, partitioned, scheduled, governed, and even “tokenized” for refined accounting and optimization—aligns highly with my long-term judgment on AI native infrastructure.

This path doesn’t use “model capabilities” or “application innovation” as selling points in the short term, nor is it easily packaged into simple stories. However, with rising compute costs, heterogeneous accelerators becoming the norm, and AI systems moving toward multi-tenant and large-scale operations, these infrastructure-level capabilities are gradually becoming prerequisites for the establishment and expansion of AI systems.

Future Focus

As Dynamia’s VP of Open Source Ecosystem / Partner, I will focus on the technical narrative of AI native infrastructure, open source ecosystem building, and global developer collaboration, helping compute evolve from a “hardware resource to be consumed” into a governable, measurable, and optimizable core asset of AI infrastructure, and laying the foundation for the next stage of scaling and sustainable evolution of AI systems.

Summary

Joining Dynamia is an important milestone in my career and a concrete action demonstrating my long-term optimism about AI native infrastructure. Compute governance is not a short-term trend that yields quick results, but an infrastructure proposition that cannot be bypassed for AI large-scale deployment. I look forward to exploring, building, and landing solutions on this long-undervalued path with global developers.


Running Parallel AI Agents on My Mac: Hands-On with Verdent's Standalone App

2026-01-04 10:25:48

I’ve been spending more time recently experimenting with vibe coding tools on real projects, not demos. One of those projects is my own website, where I constantly tweak content structure, navigation, and layout.

During this process, I started using Verdent’s standalone Mac app more seriously. What stood out was not any single feature, but how different the experience felt compared to traditional AI coding tools.

Figure 1: Verdent Standalone App UI

Verdent doesn’t behave like an assistant waiting for instructions. It behaves more like an environment where work happens in parallel.

A Different Starting Point: Tasks, Not Chats

Most AI coding tools begin with a conversation. Verdent begins with tasks.

When I opened my website repository in the Verdent app, I didn’t start with a long prompt. I created multiple tasks directly: one to rethink navigation and SEO structure, another to explore homepage layout improvements, and a third to review existing content organization.

Each task immediately spun up its own agent and workspace. From the beginning, the app encouraged me to think in parallel, the same way I normally would when sketching ideas on paper or jumping between files.

This framing alone changes how you work.

Built for Multitasking, Without Losing Context

Switching contexts is unavoidable in real development work. What usually breaks is continuity.

Verdent handles this well. Each task preserves its full context independently. I could stop one task mid-way, switch to another, and come back later without re-explaining the problem or reloading files.

For example, while one agent was analyzing my site’s navigation structure, another was exploring layout options. I moved between them freely. Nothing was lost. Each agent remembered exactly what it was doing.

This feels closer to how developers think than how chat-based tools operate.

Safe Parallel Coding with Workspaces

Parallel work only becomes truly safe when code changes are isolated. When parallelism moves from discussion to actual code modification, risk management becomes essential.

Verdent solves this with Workspaces. Each workspace is an isolated, independent code environment with its own change history, commit log, and branches. This isn’t just about separation—it’s about making concurrent code changes manageable.

What this means in practice:

  • Multiple tasks can write code simultaneously
  • Changes remain isolated from each other
  • If conflicts arise, they’re visible and cleanly resolvable

I intentionally let different agents operate on overlapping parts of my project: one modifying Markdown content and links, another adjusting CSS and layout logic. Both ran in parallel. No conflicts emerged. Later, I reviewed the diffs from each workspace and merged only what made sense.

This kind of isolation removes significant anxiety from AI-assisted coding. You stop worrying about breaking things and start experimenting more freely, knowing that each change exists in its own contained environment.
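Verdent manages these workspaces automatically, but the underlying idea is familiar to anyone who has used git worktrees: each parallel task gets its own branch and working directory, and nothing touches the main checkout until you choose to merge. A rough analogy, assuming plain git and hypothetical task names rather than anything Verdent-specific:

```python
# A rough analogy only (not Verdent's implementation): give each parallel task
# its own branch and working directory via git worktrees, so changes stay
# isolated until you decide to review and merge them.
import subprocess
from pathlib import Path

REPO = Path("~/code/my-website").expanduser()                     # hypothetical repo path
TASKS = ["navigation-seo", "homepage-layout", "content-review"]   # hypothetical tasks

def run(*args: str) -> None:
    subprocess.run(args, cwd=REPO, check=True)

# One isolated checkout and branch per task.
for task in TASKS:
    worktree_dir = REPO.parent / f"my-website-{task}"
    run("git", "worktree", "add", "-b", f"task/{task}", str(worktree_dir), "main")

# Later: review each task's changes independently before merging anything.
for task in TASKS:
    run("git", "log", "--oneline", f"main..task/{task}")
```

In Verdent this bookkeeping is invisible, but the practical effect is the same: conflicts surface as reviewable diffs rather than surprise overwrites.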

Parallel Agent Execution Feels Like Delegation

Parallelism doesn’t mean that all agents complete the same phase of work at the same time—instead, by isolating and overlapping phases, what was once a strictly sequential process is compressed into a more efficient, collaborative mode.

In Verdent, each agent runs in its own workspace, essentially an automatically managed branch or worktree. In practice, I often create multiple tasks with different responsibilities for the same requirement, such as planning, implementation, and review. But this doesn’t mean they all complete the same phase simultaneously.

These tasks are triggered as needed, each running for a period and producing clear artifacts as boundaries for collaboration. The planning task generates planning documents or constraint specifications; the implementation task advances code changes based on those documents and produces diffs; the review task, according to the established planning goals and audit criteria, performs staged reviews of the generated changes. By overlapping phases around artifacts, the originally strict sequential process is compressed into a workflow that more closely resembles team collaboration.

The value of splitting into multiple tasks is not parallel execution, but parallel cognition and clear collaboration boundaries.

While it’s technically possible to put multiple roles into a single task, this causes planning, implementation, and review to share the same context, which weakens role isolation and the auditability of results.

Configurability and Design Trade-offs

Beyond the workflow model itself, Verdent exposes a surprisingly rich set of configurable capabilities.

It allows users to customize MCP settings, define subagents with configurable prompts, and create reusable commands via slash (/) shortcuts. Personal rules can be written to influence agent behavior and response style, and command-level permissions can be configured to enforce basic security boundaries. Verdent also supports multiple mainstream foundation models, including GPT, Claude, Gemini, and K2. For users who prefer a lightweight coding experience without a full IDE, Verdent offers DiffLens as an alternative review-oriented interface. Both subscription-based and credit-based pricing models are supported.

Figure 2: Verdent Settings

That said, Verdent makes a clear set of trade-offs. It is not built around tab-based code completion, nor does it offer a plugin system. If it did, it would start to resemble a traditional IDE - which does not seem to be its goal. Verdent is not designed for direct, fine-grained code manipulation; most changes are mediated through conversational tasks and agent-driven edits. This makes the experience clean and focused, but it also means that for large, highly complex codebases, Verdent may function better as a complementary orchestration layer rather than a full-time development environment.

Where Verdent Fits Today

There are many AI-assisted coding tools emerging right now. Some focus on smarter editors, others on faster generation.

Verdent feels different because it focuses on orchestration, not just assistance.

It doesn’t try to replace your editor. It sits one level above, coordinating planning, execution, and review across multiple agents.

That makes it particularly suitable for exploratory work, refactoring, and early-stage design - exactly the kind of work I was doing on my website.

Final Thoughts

Using Verdent’s standalone app didn’t just speed things up. It changed how I structured work.

Instead of doing everything sequentially, I started thinking in parallel again - and letting the system support that way of thinking.

Verdent feels less like an AI feature and more like an environment that assumes AI is already part of how development happens.

For developers experimenting with AI-native workflows, that shift is worth paying attention to.

2025 Annual Review: The Transformation Journey from Cloud Native to AI Native

2025-12-31 18:02:01

The waves of technology keep evolving; only by actively embracing change can we continue to create value. In 2025, I chose to move from Cloud Native to AI Native—this year marked a key turning point for personal growth and system reinvention.

2025 was a turning point for me. This year, I not only changed my technical direction but also the way I approach problems. Moving from Cloud Native infrastructure to AI Native infrastructure was not just a migration of content, but an upgrade in mindset.

Figure 1: Farewell 2025!

This year, I conducted a large-scale refactoring of the website and systematically organized the content. Beyond the technical improvements, I want to share my thoughts and changes throughout the year.

A Bold Shift: Embracing the AI Native Era

At the beginning of 2025, I made an important decision: to reposition myself from a Cloud Native Evangelist to an AI Infrastructure Architect. This was not just a change in title, but a strategic transformation after careful consideration.

As I witnessed the surge of AI technologies and the rise of Agent-based applications reshaping software, I realized that clinging to the boundaries of Cloud Native might mean missing an era. So, I systematically adjusted the website’s content structure, shifting the focus toward AI Native infrastructure.

This transformation was not about abandoning the past, but extending forward from the foundation of Cloud Native. Classic content like Kubernetes and Istio remains and is continuously updated, but new topics such as AI Agent and the AI OSS landscape have been added, forming a more complete knowledge map.

Content Creation: From Technical Details to Ecosystem Perspective

AI Agent: Building Systematic Knowledge

Agents represent a major evolution in software for the AI era. When I tried to understand Agent design principles, I found fragmented information everywhere but lacked a systematic knowledge base.

So I created content that analyzes the Agent context lifecycle and control loop mechanisms, summarizing several proven architectural patterns. To make complex knowledge easier to digest, I organized it into logical sections so readers can learn step by step.

AI Tool Ecosystem: Mapping the Open Source Landscape

AI tools and frameworks are emerging rapidly, with new projects appearing daily. To help readers quickly grasp the ecosystem, I built a comprehensive AI OSS database.

This database covers everything from Agent frameworks to development tools and deployment services. I not only included active projects but also established an archive mechanism, preserving detailed information on over 150 historical projects. More importantly, I developed a scoring system to objectively evaluate projects across dimensions like quality and sustainability, helping readers decide which tools are worth investing time in.
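The scoring itself is simple in spirit. As a minimal sketch, assuming made-up dimensions and weights rather than the site’s actual formula, a weighted score per project might look like this:

```python
# A minimal sketch of a weighted project score. The dimensions and weights are
# illustrative assumptions, not the site's actual scoring formula.
from dataclasses import dataclass

WEIGHTS = {"quality": 0.4, "sustainability": 0.3, "adoption": 0.2, "docs": 0.1}

@dataclass
class Project:
    name: str
    ratings: dict[str, float]  # each dimension rated 0-10

def overall_score(project: Project) -> float:
    return sum(weight * project.ratings.get(dim, 0.0) for dim, weight in WEIGHTS.items())

example = Project("some-agent-framework", {"quality": 8, "sustainability": 6, "adoption": 7, "docs": 5})
print(f"{example.name}: {overall_score(example):.1f}/10")  # 6.9/10
```

The value is less in the arithmetic than in forcing every project onto the same, comparable dimensions.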

Blogging: Capturing Technology Trends Faster

In 2025, I wrote over 120 blog posts. Compared to previous years, these articles focused more on observing and reflecting on technology trends, rather than just technical tutorials.

I started paying attention to deeper questions: How will AI infrastructure evolve? What does Beijing’s open source initiative mean for the AI industry? What ripple effects might a tech acquisition trigger? These articles allowed me and my readers to not only see “what” technology is, but also “why” and “what’s next.”

User Experience: Making Knowledge Easier to Discover and Consume

No matter how good the content is, if it can’t be easily found and read, its value is greatly diminished. In 2025, I invested significant effort into website functionality, with one goal: to provide readers with a smoother reading experience.

Comprehensive Search Upgrade

As the volume of content grew, the original search function could no longer meet demand. I redesigned the search system to support fuzzy search and result scoring, and optimized index loading performance. More importantly, the new search interface is more user-friendly, supporting keyboard navigation and category filtering so users can find what they want faster.
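The implementation details are specific to this site, but the core of fuzzy search plus result scoring is small. Purely as an illustration, here it is sketched with Python’s standard difflib; the real site search is a client-side index, not this code:

```python
# Illustration only: fuzzy matching with a similarity score and a cutoff,
# the same idea behind the site's search (whose real implementation differs).
from difflib import SequenceMatcher

DOCS = [
    "Kubernetes GPU scheduling in practice",
    "AI Agent context lifecycle and control loops",
    "Istio ambient mode deep dive",
]

def search(query: str, docs: list[str], threshold: float = 0.3) -> list[tuple[float, str]]:
    results = []
    for doc in docs:
        score = SequenceMatcher(None, query.lower(), doc.lower()).ratio()
        if score >= threshold:
            results.append((score, doc))
    return sorted(results, reverse=True)  # best matches first

for score, doc in search("gpu schedule", DOCS):
    print(f"{score:.2f}  {doc}")
```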

Multi-Device Experience Optimization

Mobile reading experience has improved significantly. I refactored the mobile navigation and table of contents, making reading on phones much smoother. Dark mode is now more refined, fixing several display issues and ensuring images and diagrams look good on dark backgrounds.

Efficiency Revolution in Content Distribution

A major change was optimizing the WeChat Official Account publishing workflow. Previously, publishing website content to WeChat required manual handling of many details; now, it’s almost one-click export. This workflow automatically processes images, metadata, styles, and all details, reducing a half-hour task to just a few minutes.

Additionally, I added a glossary feature for technical term highlighting and tooltips; improved SEO and social sharing metadata; and cleaned up outdated content. These seemingly minor improvements quietly enhance the user experience.

Content Evolution: More Dimensional Knowledge Expression

Looking back at content creation in 2025, I found clear changes in several dimensions.

From Tutorials to Observations

Early content leaned toward technical tutorials and practical guides, showing “how to do.” This year, I focused more on “why” and “what are the trends.” I wrote more technology trend analyses, ecosystem maps, and in-depth case studies. These may not directly teach you how to use an API, but they help you understand the direction of technological evolution.

From Chinese to Bilingual

AI is a global wave and cannot be limited to the Chinese-speaking world. In 2025, I wrote bilingual documentation for almost all new AI tools, and important blog posts also have English versions. This increased the workload, but allowed the content to reach a broader audience.

From Text to Multimedia

Text is efficient, but not all knowledge is best expressed in words. This year, I used many architecture and schematic diagrams to explain complex concepts, adding 59 new charts. These visual elements lower the barrier to understanding, making abstract concepts more intuitive. I also optimized image display in dark mode to ensure consistent visual experience.

Development Approach: Embracing AI-Assisted Programming

2025 was not only a year of shifting content themes toward AI, but also a year of deep practice in AI-assisted programming.

I developed a VS Code plugin and created many prompts to automate repetitive tasks. I experimented with various AI programming tools and settled on a toolchain that suits me. I even migrated the website to Cloudflare Pages and used its edge computing services to develop a chatbot. These practices greatly improved development efficiency, giving me more time to focus on thinking and creating rather than mechanical coding.

This made me realize: AI will not replace developers, but developers who use AI well will replace those who do not. I also shared more insights to help others master AI-assisted programming.

Looking Ahead to 2026: Keep Moving Forward

Looking back at 2025, the site underwent a profound transformation—from a Cloud Native tech blog to an AI infrastructure knowledge base. But this is just the beginning, not the end.

Looking forward to 2026, I plan to continue deepening in several areas:

  • Enhancing the knowledge system: Continue to supplement GPU infrastructure and AI Agent content, especially practical cases and performance tuning knowledge.
  • Tracking ecosystem evolution: AI tools and frameworks iterate rapidly; I need to keep up with this fast-changing ecosystem and update content in a timely manner.
  • Deepening engineering practice: Share more practical AI engineering experience to help readers turn theory into practice.
  • Exploring knowledge connections: Consider building a knowledge graph to connect different content sections, providing smarter navigation and recommendations.

Summary

2025 was a year of change and growth. From Cloud Native to AI Native, from technical practice to ecosystem observation, both the content and functionality of the site have made qualitative leaps.

What makes me happiest is that this transformation allowed me and my readers to stand at the forefront of the technology wave. We are not just learning new technologies, but thinking about how technology changes the world and the way we write software.

The waves of technology keep evolving; only by actively embracing change can we continue to create value. Thank you to every reader for your companionship and support. I look forward to sharing more insights and practices in 2026.


The Butterfly Effect After Manus Was Acquired by Meta

2025-12-30 11:30:51

The success or failure of AI applications often lies not in the technology itself, but in the ability to scale delivery and create a closed loop.

Figure 1: The Butterfly Effect After Manus Was Acquired by Meta

When “Those Who Discuss It” Are Not “Those Who Pay for It”

On December 30, 2025, a piece of news went viral: Manus was acquired by Meta for billions of dollars (Manus Joins Meta for Next Era of Innovation). This startup, founded in China and under pressure from tech giants since its inception, completed a whirlwind journey in less than a year—from explosive growth, relocating to Singapore, to being acquired by a global giant.

According to Manus’s official statement, its products and subscriptions will continue to be available via the app and website, and the company will remain operational in Singapore. The team will join Meta to provide general Agent capabilities for Meta’s consumer and enterprise products (including Meta AI).

Rather than focusing on “who won,” I’m more interested in the chain reaction this event triggered: it activated completely opposite judgment systems among different groups, and this split is reshaping the growth paths and strategies for AI applications and startups.

Two Public Opinion Arenas: Blessings and Doubts Coexist

After Manus was acquired, the mainstream sentiment in social circles was one of congratulations and excitement. Many saw it as a stellar example of a Chinese team going global—achieving remarkable results in the most competitive field in a very short time.

Meanwhile, the comment sections of WeChat official accounts became “venting valves for counter-narratives,” with skepticism centering on three main points:

  • Whether the technology has real barriers (e.g., “there are countless similar products,” “it’s not hard for big companies to build their own”).
  • Valuation and bubble concerns (e.g., “another case of the AI bubble”).
  • Distrust in the buyer’s judgment (e.g., “giants making desperate bets,” “history repeating itself”).

This divergence isn’t about who understands AI better, but about different evaluation frameworks: social circles focus on “trajectory and outcome,” while comment sections focus on “legitimacy and worthiness.”

Where Does the $100M ARR Come From: The Target Users Aren’t in Our Social Circles

Many people’s impression of Manus comes from its marketing buzz and controversies, which can breed skepticism. But if it achieved a “strict $100M ARR” in 10 months, one fact is clear: its revenue doesn’t depend on broad consensus, but comes from a highly concentrated group of global users with a strong willingness to pay.

Manus’s core user profile is closer to “individuals as production units”: freelancers, indie developers, independent researchers, and the people responsible for key deliverables in small and medium-sized businesses. They don’t care about debates over whether it is “just a wrapper”; they care about “can I deliver end-to-end tasks” and “can this help me hire one less person, work fewer late nights, or avoid juggling ten tools.”

This leads to a counterintuitive phenomenon: those who discuss the most may not pay, while those who pay steadily are often silent.

For these users, tools are not identity badges—they are profit levers.

Three Lessons for Entrepreneurs: The Growth Paradigm in the AI Application Era Has Changed

Based on the above, the Manus case offers three lessons for entrepreneurs:

Growth No Longer Equals Positive Reviews

AI applications can commercialize first and build consensus later. Public opinion can remain divided for a long time, but cash flow doesn’t wait for unified recognition.

“Heavy Marketing” Is Becoming a Capability, Not a Stigma

As foundational models and capabilities spread rapidly, differentiation is quickly erased. Being seen, understood, and paid for is itself part of the moat. Not all marketing deserves respect, but “distribution and mindshare” have become unavoidable battlegrounds for AI applications.

Globalization Is No Longer a Bonus, but May Be a Survival Strategy

From willingness to pay, compliance boundaries, and talent density to valuation systems, the market structure means many teams can only close the loop overseas. It’s not romantic, but it’s reality.

A Personal Reflection

As someone long engaged in cloud native and AI infrastructure, I’m used to evaluating products by their “technical barriers.” But cases like Manus remind me: at the AI application layer, barriers may not first appear in models or code, but often in organizational speed, productization capability, delivery loop, and distribution efficiency.

When a system can reliably turn “capability” into “results,” it has built a commercial moat—even if its tech stack doesn’t meet outsiders’ ideals of “purity.”

The biggest butterfly effect of Manus being acquired by Meta may not be the deal itself, but making more entrepreneurs realize: in the AI era, the winning move is shifting from “what model you use” to “whether you can deliver results at scale.”

Summary

The acquisition of Manus by Meta is not just a convergence of capital and technology, but also a microcosm of the changing growth paradigm in the AI application era. For entrepreneurs, understanding and mastering “user structure,” “distribution capability,” and “global closed loops” will be key to future competition.

AI Infra Open Source in China: Analysis of Beijing and Shanghai's Plans

2025-12-25 18:01:13

Institutionalized open source marks a new starting point for China’s AI Infra, but true breakthroughs and risks lie in the engineering and governance details.

Perspective on Beijing and Shanghai’s Open Source Plans

Using the simultaneous release of open source ecosystem plans by Beijing and Shanghai as a lens, and drawing on China’s past foundation practices and international open source governance experience, this article explores the real opportunities, structural constraints, and potential risks as AI infrastructure (AI Infra) enters a new phase of institutionalized open source.

Figure 1: Beijing and Shanghai successively launch open source ecosystem construction plans

Why Compare Beijing and Shanghai Together

It is rare for me to write an article solely because of a local policy document. However, during Christmas, both Beijing’s and Shanghai’s Bureaus of Economy and Information Technology released their respective open source ecosystem construction plans.

This time, the fact that both cities released their plans on the same day sends a signal worth serious attention: China is attempting to advance open source in a more systematic and institutionalized way, especially regarding open source capabilities related to AI Infra.

If you only look at Beijing’s plan, it is easy to interpret it as a local industrial policy upgrade. But when you consider both Beijing and Shanghai’s plans together, it looks more like a clearly defined “dual-center structure.”

The question is no longer whether to develop open source, but:

In the AI era, what institutional forms, engineering paths, and governance models will open source take?

Open Source as “Industrial Infrastructure Engineering”

Both Beijing and Shanghai’s plans reflect a highly consistent judgment:

Open source is no longer seen as a spontaneous community activity, but as an industrial infrastructure capability that requires systematic construction.

This is especially evident in the field of AI Infra.

Issues such as computing power scheduling, model evaluation, toolchains, data elements, license compliance, and supply chain security—previously hidden in “engineering details”—are now systematically incorporated into policy language for the first time. This at least shows that decision-makers have realized:

  • AI competition is not only about model parameter scale
  • It is even more about toolchains, infrastructure, evaluation systems, and engineering capabilities
  • These capabilities are naturally more suitable for building public foundations through open source

In this respect, Beijing and Shanghai are highly aligned.

Two Open Source Paths: Infra vs. Platform

When we zoom in, the differences between the two plans become clear.

Beijing: “Foundation-Oriented” Open Source Path for AI Infra

Beijing’s plan focuses on:

  • Heterogeneous computing power scheduling
  • Model evaluation toolchains
  • Data elements and data governance
  • RISC-V software-hardware collaboration
  • SBOM, license compatibility, open source compliance
  • Supply chain security and industrial resilience

This is a typical perspective of “treating AI as an infrastructure problem.”

It is less concerned with the number of projects or community size, and more with:

  • Whether reusable engineering capabilities can be formed
  • Whether these can be trusted by industry and government over the long term
  • Whether they can stand up to scrutiny in terms of security, compliance, and governance

To some extent, Beijing is answering the question:

How can open source become a “governable, auditable, and scalable public capability”?

Shanghai: “Scale and Internationalization” Path for AI Platform

In contrast, Shanghai’s plan has a different focus:

  • Building an international open source community for artificial intelligence
  • Covering the entire platform chain from development, training, testing, hosting, to operation
  • Overseas sites, multilingual support, international activities
  • Resource linkage through computing vouchers and model vouchers
  • “Open source platform first release / global simultaneous release” dual-release mechanism
  • Clear targets for community, enterprise, and developer scale

Shanghai cares more about:

  • How open source can achieve scale effects
  • How it can support the growth of commercial enterprises
  • How it can be seen and adopted globally

This is a path of “treating open source as a global digital product and platform capability.”

Together: A Complete but Tension-Filled Structure

When viewed together, Beijing and Shanghai’s plans form a more complete picture:

Beijing is responsible for “making open source solid,” while Shanghai is responsible for “taking open source global.”

Structurally, this is a clear division of labor:

  • Beijing focuses on institutions, governance, and foundational capabilities
  • Shanghai focuses on community, commercialization, and international communication

These two paths are not in conflict; in theory, they are even complementary. The real question is whether they can form positive feedback in practice, rather than operating in silos.

Cautious Attitude Toward “Institutionalized, Platformized Open Source”

Precisely because both plans are so “systematic,” I am even more cautious.

The reason is simple: this is not China’s first attempt to promote open source through foundations, associations, or platforms.

Over the past decade, we have seen similar paths repeatedly, and recurring structural problems:

  • The difficulty of establishing neutrality and multi-party trust is extremely high
  • There is a huge gap between showcase metrics (quantity, activities, certifications) and ecosystem strength
  • Commercialization and long-term maintenance mechanisms are hard to sustain

These problems will not disappear just because the plans are more comprehensive.

Four Risks to Watch Under the Dual Plans

If we are to “listen to their words and watch their actions,” I would focus on the following four risks:

Will Metrics Hijack Engineering Reality

When “internationally influential projects,” “star projects,” and “first-release projects” become hard metrics, will this induce packaging, migration, and short-term hype, rather than truly solving engineering problems?

Will It Slide Toward Platform Centralism

The long-term pattern of AI Infra is closer to a model that prioritizes protocols, standards, and interoperability. If it eventually evolves into “a few platforms concentrating resources and discourse power,” it may be efficient in the short term but will suppress external participation and international collaboration in the long run.

Is Internationalization Underestimated as an “Operational Issue”

True international collaboration is never just about language, sites, or events; it also involves governance structures, compliance boundaries, and supply chain trust.

Will Application Demonstrations Become One-Off Projects

If “first plans” and “computing vouchers” are just procurement tactics without continuous iteration and community feedback mechanisms, the long-term benefit to the ecosystem will be very limited.

What Are the “Hard Results” of AI Infra Open Source After Three Years

If we review the success of this round of institutionalized open source after three years, I would look for three types of results:

  • Whether de facto standards and interoperable ecosystems have emerged, including scheduling interfaces, evaluation benchmarks, Agent tool invocation protocols, and observability semantics.
  • Whether compliance and supply chain security have become public capabilities—SBOM, license compatibility, vulnerability monitoring—truly productized and service-oriented.
  • Whether a sustainable maintenance business mechanism has been established, allowing core maintainers to stay long-term, rather than relying on passion and subsidies.

If I were to use a North Star metric to measure the success of these plans, it would be the emergence of several outstanding open source commercial companies rooted in China and serving the world.

Summary

The open source ecosystem plans of Beijing and Shanghai mark a new phase of institutionalization and engineering for AI Infra open source in China. Over the next three years, the real achievements will not be about meeting targets, but about forming sustainable engineering capabilities, de facto standards, and maintenance mechanisms. Only through continuous participation and practice can open source become the public foundation of AI infrastructure.


From 2025 Onwards, Software Engineering Shifts from Code-Centric to Runtime and Cost-Centric

2025-12-24 22:59:11

In 2025, the core of software engineering is no longer just about code itself, but about runtime controllability and cost governance. This shift is fundamentally reshaping the industry’s underlying logic.

Looking back at 2025, I became increasingly aware that this year was not about “code becoming unimportant,” but rather that the value coordinates of engineering have shifted as a whole. For more than a decade, software engineering has focused on code quality, architectural evolution, and delivery efficiency. But starting in 2025, the key to system success is shifting—towards whether the runtime is controllable and whether costs are governable.

This is not just a slogan, but a conclusion repeatedly validated by my real-world experiences throughout the year.

My 2025: From “Platform Engineering” to “Runtime Challenges”

In my annual review, I noted a clear change: I spent less time on “how to write a good system,” and more time on “how to keep the system running stably, reliably, and affordably.”

This shift in focus is a natural extension of a decade of cloud native evolution.

The following timeline diagram illustrates how my focus has changed over recent years:

Figure 1: My Focus Shift Timeline

My focus shifted from cloud native platform engineering to LLM application engineering, then to AI infrastructure, and finally to Agentic Runtime with governance and cost control.

When AI workloads truly enter business scenarios, the core challenges engineers face also change:

  • Are inference, training, and evaluation competing for the same compute pool?
  • Is GPU utilization consistently below expectations?
  • Does cost scale linearly and uncontrollably with concurrency?
  • Does the system have failure isolation and replay capabilities?

These issues go far beyond the code level.

Industry Consensus: AI Is Shifting the Focus of Engineering

By 2025, an industry consensus is emerging: AI is rewriting software engineering. But the real change is not happening in the IDE or code completion speed—it is reflected in the migration of engineering complexity.

Previously, complexity was concentrated in code and interfaces, and problems were solved through abstraction, refactoring, and testing.

Now, complexity has shifted to the runtime, resource, and cost layers, and must be addressed through scheduling, isolation, observability, and governance.

This is why the same AI tools:

  • Serve as “accelerators” for junior engineers
  • But act as “magnifiers” for senior engineers

AI tools amplify whether you truly understand how systems run in production.

Why “Cost” Becomes a First Principle

In traditional cloud native systems, low CPU utilization is often just an efficiency issue; but in AI systems, low GPU utilization is often a cash flow problem.

In 2025, I repeatedly encountered scenarios like:

  • Resources “seem insufficient,” but utilization is not actually high
  • Scaling up to solve queuing issues ends up increasing unit costs
  • The system lacks clear budget and quota boundaries, so throttling becomes the only way to stop the bleeding

The root cause of these phenomena is not model selection, but the lack of a runtime and cost control plane tailored for AI workloads.

The following flowchart visually illustrates the cyclical relationship between GPU resources and cost pressures:

Figure 2: GPU Resource and Cost Cycle in AI Systems

In AI systems, limited GPU supply leads to queuing and waiting, which causes throughput to drop. Attempts to solve this through blind scaling only increase unit costs and create budget pressure, ultimately forcing the adoption of finer scheduling and governance strategies.

Engineering problems ultimately manifest as cost issues.
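To see why utilization is a cash-flow question rather than a tuning detail, a back-of-the-envelope calculation is enough. The price and throughput figures below are made-up assumptions; only the shape of the math matters:

```python
# Back-of-the-envelope inference economics. All numbers are illustrative
# assumptions; the point is that unit cost scales inversely with utilization.
GPU_HOUR_USD = 4.00          # assumed on-demand price for one GPU-hour
PEAK_TOKENS_PER_SEC = 2_500  # assumed sustained throughput at full load

def cost_per_million_tokens(utilization: float) -> float:
    """USD per one million generated tokens at a given average utilization (0-1)."""
    effective_tokens_per_hour = PEAK_TOKENS_PER_SEC * 3600 * utilization
    return GPU_HOUR_USD / effective_tokens_per_hour * 1_000_000

for u in (0.9, 0.5, 0.2):
    print(f"utilization {u:.0%}: ${cost_per_million_tokens(u):.2f} per 1M tokens")
# utilization 90%: $0.49 per 1M tokens
# utilization 50%: $0.89 per 1M tokens
# utilization 20%: $2.22 per 1M tokens
```

Same hardware, same model: letting average utilization slide from 90% to 20% multiplies the unit cost by 4.5. That is the cash flow problem hiding behind an efficiency metric.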

The Rise of Agents: The Real Challenge Is at Runtime

In 2025, Agents (intelligent agents) became a hot topic; by 2026, they will enter the “can it actually run” stage.

The challenge for Agents has never been about “how smart they are,” but rather:

  • Whether there are clear permission and data boundaries
  • Whether they run in an isolated execution environment
  • Whether they can be observed, evaluated, and replayed
  • Whether they are subject to explicit cost and budget constraints

These capabilities form the outline of the Agentic Runtime that I have been trying to clarify throughout the year.

The following flowchart shows the core capability layers of Agentic Runtime:

Figure 3: Agentic Runtime Capability Layers

Agentic Runtime builds from the foundation of Agents and workflows, connecting through orchestration and tool protocols, with the runtime managing state, memory, and evaluation. It provides secure execution environments (Sandbox and Policy), and ultimately implements a resource and cost control plane that unifies GPU, quota, and billing management.

Without a runtime, an Agent is just a demo; without cost constraints, an Agent is just a risk amplifier.
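One concrete slice of that control plane is a hard budget boundary around every agent step. A minimal sketch, assuming a hypothetical call_llm placeholder and per-task token budgets rather than any specific runtime’s API:

```python
# A minimal sketch of a per-task budget guard. `call_llm` and the pricing
# constant are hypothetical placeholders, not a specific runtime's API.
class BudgetExceeded(RuntimeError):
    pass

class TaskBudget:
    """Tracks token spend for one agent task and refuses work past the cap."""

    def __init__(self, max_tokens: int, usd_per_1k_tokens: float = 0.002):
        self.max_tokens = max_tokens
        self.usd_per_1k = usd_per_1k_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(f"{self.used + tokens}/{self.max_tokens} tokens")
        self.used += tokens

    @property
    def cost_usd(self) -> float:
        return self.used / 1000 * self.usd_per_1k

def call_llm(prompt: str) -> tuple[str, int]:
    """Placeholder for a real model call; returns (text, tokens used)."""
    return f"echo: {prompt}", len(prompt.split()) * 10

def run_step(budget: TaskBudget, prompt: str) -> str:
    text, tokens_used = call_llm(prompt)
    budget.charge(tokens_used)  # every step is accounted for before it lands
    return text

budget = TaskBudget(max_tokens=500)
try:
    while True:
        run_step(budget, "summarize the latest workspace diff")
except BudgetExceeded as err:
    print(f"stopped at the budget boundary ({err}); spent about ${budget.cost_usd:.4f}")
```

A real control plane does far more (quotas, isolation, replay), but without even this much, an agent’s cost is simply whatever it happens to spend.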

Outlook for 2026: The “Foundation” of Engineering Matters Again

Looking ahead to 2026, I remain cautiously optimistic.

I do not believe the future belongs to “those who write the best prompts,” but more likely to:

  • Those who understand runtime boundaries
  • Those who can govern compute as a constrained resource
  • Those who design AI systems as long-running systems

From 2025 onwards, software engineering is no longer code-centric, but runtime and cost-centric. This is not a regression, but a return: a return to being responsible for the whole system and for real-world constraints.

For me personally, this is both a year-end summary and the direction I will continue to invest in for the coming years.

Summary

In 2025, the focus of software engineering has shifted from code itself to runtime and cost governance. The rise of AI and Agents has not diminished the value of engineering, but has pushed complexity to a higher level. In the future, understanding runtime, managing compute and cost will become the new core competencies for engineers. I hope this year-end review provides some inspiration and reflection for fellow professionals.