2025-12-12 16:16:48
Goose is not a project that excites you at first glance in this wave of Agent innovation, but its entry into AAIF signals a deeper shift in how we think about Agentic Runtime and AI-Native infrastructure.
At first glance, Goose is not a project that excites people.

It doesn’t have flashy demos, nor does it showcase overwhelming multimodal capabilities, and it certainly doesn’t look like an AI product aimed at consumers. Yet, this seemingly “plain” project became one of the first donations to the Agentic AI Foundation (AAIF), standing alongside Anthropic’s MCP and OpenAI’s AGENTS.md.
This fact alone is worth a closer look.
This article does not aim to prove how powerful Goose is, but rather to answer three more practical questions: what Goose actually is, why Block built it this way, and why AAIF chose it.
If you only look at its surface features, Goose is easily mistaken for just another developer tool.
But inside Block, it was never designed as a “tool” from the start.
Goose’s origin is closely tied to Block’s engineering environment.
Block (formerly Square) is a classic engineering-driven company: complex systems, high automation needs, many internal tools, and very high execution costs in real production environments. In its recent AI transformation, Block did not focus on “which model to choose” or “which AI tool to introduce,” but directly targeted the engineering execution layer itself.
Goose was born in this context.
Its goal is not to “help people code faster,” but to enable models to stably and controllably take action: run tests, modify code, drive UIs, call internal systems, and operate reliably in real engineering environments.
In short:
Goose is more like an executable Agent Runtime than a conversation-centric product.
To understand Goose, you can’t ignore a key organizational shift at Block.
An interview with Block’s CTO made one signal very clear: the starting point for AI transformation was not buying tools or stacking models, but the organizational structure itself.
Block shifted from a business-line GM model to a more functionally oriented structure, making engineering and design the company’s core scheduling units again. This is essentially a proactive response to Conway’s Law.
If the organizational structure doesn’t allow technical capabilities to be orchestrated centrally, Agents will ultimately remain “personal assistants” or “engineering toys.”
From this perspective, Goose is not just a tool, but a cultural signal:
Every employee can use AI to build and execute real system behaviors.
This also explains a fact many overlook: Goose was not packaged as SaaS, nor was it rushed to commercialization, but was open-sourced and rapidly standardized.
Because its role inside Block is closer to an “operating system for execution models” than a product that can be sold separately.
This is what confuses outsiders the most.
If you only look at flashy features, model support, or community popularity, Goose doesn’t stand out. But AAIF’s choice was not about “maximum capability,” but about whether the positioning is right.
Looking at the first batch of AAIF projects, a clear chain emerges: MCP defines the protocol, Goose provides the runtime, and AGENTS.md sets the behavioral standard.
Goose’s role is not to set new protocols, but to serve as the practical carrier and reference implementation for these protocols.
It proves one thing: this protocol stack can operate in real production engineering environments, not just in demos.
From this angle, Goose’s “ordinariness” is actually an advantage.
It is not tied to Block’s business moat, nor does it have irreplaceable private APIs; it can be forked, replaced, audited—“boring” enough, and neutral enough.
And that is the most important trait of public infrastructure.
From a longer-term perspective, Goose’s value becomes clearer.
What we’re experiencing now is much like the early days of containers: Most Agent projects today are demos, IDE plugins, or workflow wrappers, but what’s really missing is a sustainable, schedulable, observable execution layer.
Goose is already moving in this direction.
Block’s metric for Goose’s success is straightforward: how much real execution work it actually takes over.
Behind this is a judgment I’m increasingly convinced of:
What enterprises truly need is not “smarter models,” but “cheaper execution.”
The long-term value of Agents is not in generation quality, but in execution substitution rate.
AAIF is not guaranteed to succeed in doing for Agents what CNCF did for cloud native.
But it at least marks a shift: Agents are no longer just application-layer innovations, but are beginning to enter the stage of infrastructure-layer collaboration.
As a reference implementation, Goose is likely to remain in this ecosystem for a long time—even if it is replaced, rewritten, or evolved in the future.
If you see Goose as a “product,” it is indeed not dazzling.
But if you place it in the long-term evolution path of Agentic AI, its significance becomes clear:
It is not the end, but a necessary intermediate state.
For me, the emergence of Goose further confirms one thing:
Agentic Runtime is not a conceptual problem, but an engineering and organizational one.
And that is one of the most worthwhile directions to invest energy in over the next few years.
2025-12-11 21:19:42
The ARK platform’s deep integration of cloud native and AI provides a new paradigm for engineering multi-agent systems.
AI Agents are moving from the “single agent demo” stage to “large-scale operation.” The real challenge does not lie in the model itself, but in engineering issues at runtime: model management, tool invocation, state maintenance, elastic scaling, team collaboration, observability, deployment, and upgrades. These are problems that traditional agent libraries struggle to solve.
ARK (Agentic Runtime for Kubernetes) provides a multi-agent operating system that is operable, observable, governable, and continuously deliverable. It is not a Python library, but a complete runtime platform.

Note: In this article, ARK refers to McKinsey’s open-source ARK Agent Runtime for Kubernetes.
This article, from an engineer’s perspective, reorganizes ARK’s core capabilities and answers the following questions: how ARK actually works, how it differs from frameworks like LangChain and CrewAI, and what its engineering value is.
The core idea of ARK is: An agent is not a script, but a schedulable, governable, and observable Kubernetes workload.
The following architecture diagram illustrates ARK’s underlying structure.
This diagram highlights ARK’s key design points.
ARK adopts the typical cloud-native Operator pattern and applies it to multi-agent systems.
Unlike traditional agent frameworks where “code is logic,” ARK uses CRDs (Custom Resource Definitions) to abstract the components of agent applications.
The main CRD types in ARK include Agent, Model, Query, Team, Tool, and Memory.
These CRDs correspond to all the key components of an agent system.
The following diagram shows the structure of the CRDs:
Through CRDs, ARK achieves version control, review, and GitOps-style continuous delivery of agent definitions, along with unified governance of their capability boundaries.
This is the core of ARK’s engineering-first DNA.
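To make the declarative model concrete, below is a minimal sketch of what an Agent resource could look like. The apiVersion, field names, and values are illustrative assumptions based on ARK’s public CRD model, not copied from its documentation:

```yaml
# Hypothetical ARK Agent manifest; group/version and field names are assumptions.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Agent
metadata:
  name: sre-assistant
spec:
  prompt: |
    You are an SRE assistant. Investigate alerts and propose safe fixes.
  modelRef:
    name: default-model        # references a Model CR
  tools:
    - name: kubectl-readonly   # references a Tool CR
```

Because the agent is just a resource, it can be reviewed in a pull request and rolled out like any other Kubernetes object.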
The following image shows how to view query details in the ARK Dashboard.

In ARK, the complete execution flow for an agent receiving a query is as follows: a Query resource is created, the controller reconciles it, resolves the target Agent along with its Model and Tools, executes the call, and writes the result back into the Query’s status.
This flow has a defining characteristic: every step is a Kubernetes state transition, so it can be watched, retried, audited, and traced like any other workload.
This makes ARK more like an “agent microservice platform.”
Below is a sketch of what a request and its response look like. The group/version, field names, and status values are assumptions based on ARK’s public CRD model, not copied from its documentation:
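```yaml
# Hypothetical ARK Query; field names follow the same assumed conventions as above.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Query
metadata:
  name: incident-triage-1
spec:
  input: "Summarize the open alerts in the payments namespace."
  targets:
    - type: agent
      name: sre-assistant
status:
  phase: done
  responses:
    - target: sre-assistant
      content: "3 open alerts: two CrashLoopBackOff pods and one failing readiness probe."
```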
ARK’s Team CRD allows multiple agents to be woven into a higher-level “system,” enabling multi-agent collaboration.
The following diagram shows the collaboration model of a multi-agent team:
The engineering value of Team is that the collaboration topology itself becomes a declarative resource rather than logic buried in code.
For enterprises, this means the “agent organizational structure” can be standardized, replayed, and tuned.
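The same declarative approach applies here. Below is a hedged sketch of a Team resource, with member and strategy fields that are assumptions rather than documented schema:

```yaml
# Hypothetical ARK Team; member and strategy fields are assumptions.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Team
metadata:
  name: release-review
spec:
  strategy: sequential          # e.g. run members one after another
  members:
    - type: agent
      name: code-reviewer
    - type: agent
      name: security-auditor
```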
Many engineers, upon first seeing ARK, may wonder:
“Is it just LangChain or CrewAI wrapped in Kubernetes?”
In fact, there are fundamental differences. The following diagram compares the structural differences between ARK and mainstream agent frameworks:
The table below further summarizes the key differences:
| Dimension | Traditional Agent Libraries | ARK |
|---|---|---|
| Core Pattern | Write Python code | Write CRDs (declarative) |
| Deployment | Local/Container | Kubernetes-native scheduling |
| State | Managed inside code | Memory CR + Service |
| Tools | Integrated at code level | Tool CR + MCP |
| Multi-Agent | Dialog managed in code | Team CR + A2A protocol |
| Observability | Almost none | OTel / Langfuse / Dashboard |
| Use Cases | Demo / Prototype / Single Agent | Enterprise production / Multi-Agent Systems |
In short:
LangChain is a “library for building agents,” while ARK is a “platform for running agents.”
The two are not in conflict and are, in fact, highly complementary.
To summarize ARK’s engineering value in simple terms, it marks one clear evolution path:
Agent → Service → Platform → Runtime → Operating System
ARK is currently positioned at the fourth stage: Runtime.
ARK provides three direct insights for building Agentic Runtimes:
- A unified scheduling system
- Declarative capability boundaries
- Observability
ARK demonstrates a direction:
Multi-agent systems are an engineering problem, not a prompt engineering problem.
If you only need to build a simple agent, frameworks like LangChain, CrewAI, and AutoGPT are sufficient.
But if you want to operate a system composed of dozens or hundreds of agents that need to collaborate, run long-term, and support continuous delivery and governance, runtimes like ARK are the inevitable trend.
It provides Agentic AI with what it has been missing: a schedulable runtime, governable semantics, and an engineering paradigm.
Therefore, ARK deserves to be regarded as an early model for engineering multi-agent systems.
2025-12-11 13:20:12
“Open source” in the AI era is no longer a trustworthy promise. Commercial projects can withdraw their code at any time, and developers must be wary of the gap between appearances and reality.
While updating the AI open source project library on my website, I ran into a situation that, for the first time, left me stunned: an “open-source AI tool” that still promotes itself, with an active website and commercial services, suddenly vanished from GitHub, its repository going straight to 404.
The project is called Lunary.
Original repository address: https://github.com/lunary-ai/lunary. It now returns 404 Not Found.
Notably, the official site lunary.ai remains online, but the core promise of an “open-source codebase” has disappeared.
Here is an overview of Lunary’s main features and positioning to help understand its role in the AI tool ecosystem.
Lunary claims to be an Observability and Evaluations platform for large language model (LLM) applications, focusing on tracing and logging LLM calls, managing prompts, and evaluating output quality.
Its overall positioning is clear: “Development and debugging tools for AI applications.”
In fact, products like this have emerged rapidly over the past year, forming a new AI DevTool track.
The core issue is not the tool itself, but its claim to be “open source.”
Lunary has consistently emphasized:
“Lunary is an open-source platform for developers.”
This statement is great for attracting users, as open source implies transparency, trustworthiness, self-hosting, and community participation.
But now the repository is gone, with only the website continuing its promotion—raising many questions.
Lunary is not a niche hobby project, but a commercial company-led initiative. If an individual suddenly deletes a repo, it’s not surprising, but for a company operating publicly, this move is extremely rare.
This is the first time I’ve truly seen a reality in the AI DevTools space: “Open source” is being used as a branding term, not a commitment.
Let’s analyze some common industry reasons for deleting a repository, to help understand the motivations behind such a move: relicensing ahead of commercialization, legal or acquisition-related review, or a quiet pivot to closed source.
Regardless of the reason, the impact on users is the same: it is no longer an “open-source product.”
The most noteworthy aspect is not Lunary itself, but the rapid spread of this phenomenon in the AI tool space.
Many projects use “open source” as a user acquisition strategy but lack open governance and long-term commitment.
High substitutability, homogeneity, and commercial pressure mean these DevTools have low survival rates.
When commercial teams lead open source, a single decision can make the repository disappear instantly.
In the cloud native era, we’ve already seen a wave of “pseudo open source.” In the AI era, this trend is accelerating.
Based on this case, here are three practical lessons for developers: do not treat the “open source” label as a guarantee; check who controls the repository and how the project is governed; and keep a fork or exit plan for any tool on your critical path.
After collecting hundreds of projects over the past two years, this is the first time I’ve encountered a “commercial open source project disappearing, official repo 404” case.
To me, this is an industry signal: the AI open source world is entering a period of drift, and commercial projects’ open source commitments are increasingly unstable.
It also reminds everyone making technical choices: in the AI era, open source is no longer a label you can automatically trust.
The disappearance of the Lunary repository is not an isolated incident, but a reflection of the “pseudo open source” phenomenon in the AI tool space. Developers should look past the “open source” label to the actual commitments behind it: project governance, community participation, and long-term sustainability. As the boundary between open source and commercial blurs further, similar incidents may become more frequent, and rational judgment and risk awareness will be essential to every technical decision.
2025-12-10 11:25:38
The standardization and open collaboration of the agent ecosystem is no longer a luxury, but the critical watershed for whether AI Native can be engineered and implemented.
Over the past decade, Cloud Native technologies like Kubernetes, Service Mesh, and microservices standardized “how applications run in the cloud”. But AI Native faces a completely different challenge: not “how to deploy a service”, but “how much of the system’s behavior can be handed over to agents to execute autonomously”.
CNCF’s Cloud Native AI (CNAI) addresses infrastructure-level issues: “How can model training/inference/RAG run at scale and securely on Kubernetes?”
But what AI Native truly lacks is another layer: How do agents collaborate, access tools, get governed, and audited?
This is exactly the gap AAIF aims to fill.
AAIF hosts three core technologies contributed by its founding members:
https://github.com/modelcontextprotocol
A “system call interface for agents”: a standard protocol through which agents discover and invoke tools, read resources, and exchange context with external systems.
It may not be the flashiest technology, but it could become the plumbing for the entire Agentic ecosystem.
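To ground the “system call” analogy: on the wire, MCP is JSON-RPC. A client listing the available tools and then invoking one looks roughly like the sketch below; `tools/list` and `tools/call` are real MCP methods, while the tool name and arguments are illustrative assumptions.

```json
{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }

{ "jsonrpc": "2.0", "id": 2, "method": "tools/call",
  "params": { "name": "run_tests", "arguments": { "path": "./src" } } }
```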
https://github.com/block/goose
Reference runtime for MCP: it demonstrates how an agent consumes MCP servers and executes real actions in an engineering environment.
A simple but effective standard: a plain Markdown file, checked into the repository, that tells agents how to build, test, and behave in that codebase.
This makes agent behavior more predictable and auditable.
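As an illustration, a minimal AGENTS.md might look like the following. The contents are hypothetical; the standard deliberately prescribes no fixed schema, only the convention of a Markdown file that agents read before acting.

```markdown
# AGENTS.md

## Setup
- Install dependencies with `pnpm install`.

## Testing
- Run `pnpm test` before proposing any change.

## Boundaries
- Never commit directly to `main`.
- Ask before modifying anything under `infra/`.
```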
Let’s compare with history: Kubernetes entered CNCF long before the cloud-native stack matured, and the ecosystem formed around the standard rather than the other way around.
AAIF is not about “mature technology entering a foundation”, but about staking out the key position early.
The reasons are practical: whoever defines the protocol layer early shapes everything built on top of it.
CNCF’s role:
“What infrastructure do agent workloads run on?” Kubernetes, Service Mesh, observability, AI Gateway, RAG Infra—all at this layer.
AAIF’s role:
“How do agents collaborate, invoke tools, and get governed?” Protocols, runtimes, and behavioral standards—all at this layer.
Analogy:
| Domain | Responsibilities |
|---|---|
| AAIF | Semantic and collaboration layer of Agentic Runtime |
| CNCF/CNAI | Resource and execution layer of AI Native Infra |
This matches the upper semantic and lower infrastructure layers in my ArkSphere architecture diagram.
In the long run, the two sides will be tightly coupled: CNCF’s KServe, KAgent, and AI Gateway will natively support MCP / AGENTS.md, while AAIF’s runtimes will run on Cloud Native infrastructure by default.
Most enterprises will get stuck not on the technology, but on permissions, accountability, and who owns agent behavior.
In other words, agent adoption is not a “technical migration”, but an “organizational migration”.
If AAIF cannot provide real governance mechanisms, certification, and a reference ecosystem that enterprises can actually adopt, it will be difficult for it to achieve the industry impact that CNCF did.
For me, the establishment of AAIF feels like:
“The battlefield boundaries of the agent world have finally been drawn. Now it’s up to the engineering community to make it work.”
CNCF solved “how to run Cloud Native”; AAIF is now trying to solve “how agents collaborate”.
In the next five years, whoever can truly connect these two worlds will stand at the gateway to the next generation of infrastructure.
That’s why I started a dedicated “Agentic Runtime + AI Native Infra” research track in ArkSphere.
Finally, a personal note—my thoughts on ArkSphere.
This diagram shows the three-layer structure of the AI Native era:
CNCF (bottom layer): Provides the Cloud Native foundation required for agent operation, including Kubernetes, Service Mesh, GPU scheduling, and security systems.
AAIF (middle layer): Defines the runtime semantics and standards for agents, including the MCP protocol, Goose reference runtime, and AGENTS.md behavioral standard.
ArkSphere (bridging layer): Aligns the “Agentic Runtime semantic layer” with the “AI Native Infra infrastructure layer”, forming an engineerable agent architecture standard.
In short:
Infra is responsible for “running”, Runtime for “how to act”, and ArkSphere for “how to assemble a system”.
2025-12-03 13:21:28
The shifting ownership of runtimes is reshaping the underlying logic of AI programming and infrastructure.
After the announcement of Bun’s acquisition by Anthropic, my focus was not on the deal itself, but on the structural signal it revealed: general-purpose language runtimes are now being drawn into the path dependencies of AI programming systems. This is not just “a JS project finding a home,” but “the first time a language runtime has been actively integrated into the unified engineering system of a leading large model company.”
This event deserves a deeper analysis.
Before examining Bun’s industry significance, let’s outline its runtime characteristics. Its main engineering capabilities: an all-in-one toolchain (runtime, bundler, test runner, and package manager), an engine built on JavaScriptCore and implemented largely in Zig, native TypeScript execution, Node.js compatibility, and fast startup and install times.
These capabilities have formed measurable performance barriers.
However, it should be noted that Bun currently lacks the core attributes of an AI Runtime, such as sandboxed execution semantics, built-in observability of agent actions, and scheduling or governance primitives.
Therefore, Bun’s “AI Native” properties have not yet been established, but Anthropic’s acquisition provides an opportunity for it to evolve in this direction.
Historically, it is not uncommon for model companies to acquire editors, plugins, or IDEs, but in known public cases, mainstream large model vendors have never directly acquired a mature general-purpose language runtime. Bun × Anthropic is the first clear event pulling the runtime into the AI programming system landscape. This move sends two engineering-level signals: model vendors now want to control the execution layer, not just the model; and the runtime is becoming part of the AI programming toolchain rather than a neutral commodity.
This is not a short-term business integration, but a manifestation of the trend toward compressed engineering pipelines.
Based on observations of agentic runtimes over the past year, runtime requirements in the AI coding era are diverging toward a few engineering abstractions: sandboxed code execution, long-running sessions with persistent state, tool and process orchestration, and fine-grained observability of what the model actually did.
These requirements are not unique to Bun, nor did Bun originate them, but Bun’s “monolithic and controllable” runtime structure is more conducive to evolving in this direction.
If Bun is seen merely as a Node.js replacement, the acquisition is of limited significance. But if it is viewed as the execution foundation for future AI coding systems, the logic becomes clearer: the model layer generates code, and a tightly controlled runtime executes, tests, and observes it in one loop.
This model is similar to the relationship between Chrome and V8: the execution engine and upper-layer system co-evolve over time, with performance and semantics advancing in sync.
Whether Bun can fulfill this role depends on Anthropic’s architectural choices, but the event itself has opened up possibilities in this direction.
Combining facts, signals, and engineering trends, a few directions can be anticipated: runtimes optimized for machine-generated code, execution layers with built-in observability, and ever tighter coupling between model vendors and their toolchains.
These trends will not all materialize in the short term, but they represent the inevitable path of engineering evolution.
The combination of Bun × Anthropic is not about “an open-source project being absorbed,” but about a language runtime being actively integrated into the engineering pipeline of a large model system for the first time. Competition at the model layer will continue, but what truly reshapes software is the structural transformation of AI-native runtimes. This is a foundational change worth long-term attention.
2025-12-02 20:07:45
The value of Agentic Runtime lies not in unified interfaces, but in semantic governance and the transformation of engineering paradigms. Ark is just a reflection of the trend; the future belongs to governable Agentic Workloads.
Recently, the ArkSphere community has been focusing on McKinsey’s open-source Ark (Agentic Runtime for Kubernetes). Although the project is still in technical preview, its architecture and semantic model have already become key indicators for the direction of AI Infra in 2026.
This article analyzes the engineering paradigm and semantic model of Ark, highlighting its industry implications. It avoids repeating the reasons for the failure of unified model APIs and generic infrastructure logic, instead focusing on the unique perspective of the ArkSphere community.
Ark’s greatest value is in making Agents first-class citizens in Kubernetes, closing the task loop through CRDs (Custom Resource Definitions) and controllers (reconcilers). This semantic abstraction not only enhances governance capabilities but also aligns closely with the Agentic Runtime strategies of major cloud providers.
Ark’s main resources include Agent, Model, Query, Team, Tool, and Memory.
The diagram below illustrates the semantic relationships in Agentic Runtime:
Ark’s architecture adopts a standard control plane system, emphasizing unified runtime semantics. The community is highly active, engineer-driven, and the codebase is well-structured, though production readiness is still being improved.
The emergence of Ark has clarified the boundaries of ArkSphere. ArkSphere does not aim for unified model interfaces, multi-cloud abstraction, a collection of miscellaneous tools, or a comprehensive framework layer. Instead, it focuses on runtime-level semantics, the governance of agent workloads, and the engineering paradigm around them.
ArkSphere is an ecosystem and engineering system at the runtime level, not a “model abstraction layer” or an “agent development framework.”
2026 will usher in the era of Agentic Runtime, where Agents are no longer just classes you import, but workloads that must be governed. Ark is just one example of this trend, and the direction is clear.
Ark’s realism teaches us that the future belongs to runtime, semantics, governability, and workload-level Agents. The industry will no longer pursue unified APIs or framework implementations, but will focus on governable runtime semantics and engineering paradigms.