RSS preview of Blog of HackerNoon

Your AI Agents Have Too Much Access. You Just Don't Know It Yet

2026-03-18 05:53:17

A few years ago, I was doing a security review at a mid-sized financial services company. They had a mature IAM program, a dedicated cloud security team, and had just completed a major access certification campaign. Clean bill of health.

Then we pulled the service account inventory.

Over 400 service accounts across their AWS and Snowflake environments. Roughly 60% hadn't been used in over 90 days. Several had admin-level privileges on production data pipelines. One, created for a PoC that never went to prod, had read access to their entire customer data warehouse. Nobody knew it existed.

This wasn't negligence. It was entropy. Access sprawl isn't a policy failure, it's what happens when you treat access as a configuration detail rather than a system that needs to be continuously understood.

That was before AI agents entered the picture. Now, that same pattern plays out 10x faster.

What "Shadow AI" Actually Looks Like in Practice

Shadow IT used to mean a rogue Dropbox account or an unauthorized SaaS subscription. Security teams learned to deal with it - DLP policies, CASB tooling, strongly-worded all-hands emails.

Shadow AI is different in kind, not just degree. It's not just ungoverned storage - it's ungoverned access plus action.

Here's what I've seen firsthand in enterprise environments over the last 18 months:

The "just for testing" OAuth grant that never died. A developer connects an AI coding assistant to the internal GitHub org "just to evaluate it." The OAuth grant gets broad repo access. The eval period ends; the integration doesn't get revoked. Six months later, nobody remembers it's there, but it still has read access to every private repo in the org.

The no-code automation with human-level credentials. A business analyst builds a workflow automation using their own credentials to call internal APIs. The workflow runs on a schedule, silently, long after the analyst has moved to a different team. The access was never designed to outlive the person's role.

The agent framework that caches more than you think. Open-source agent frameworks that retain conversation context and credentials between sessions, deployed by an engineering team that didn't read that part of the docs. Context windows contain API keys. Sessions persist. Nobody audited what was being stored or who could access it.

None of these show up in your IAM dashboard. None trigger your DLP policies. They live in the seam between "approved tooling" and "obvious incident."

The Non-Human Identity Problem Is Already Bigger Than Most Teams Realize

Here's a number that tends to land hard: in most enterprise environments, non-human identities already outnumber human users, often by a factor of 5 to 10.

Service accounts. Workload identities. CI/CD pipelines. API keys. And increasingly, AI agents with their own OAuth grants and session tokens. These identities accumulate privileges quietly. They don't hit MFA prompts. They don't get flagged in quarterly access reviews. They don't get offboarded when the system they were created for gets decommissioned.

And because they've historically been managed separately from human access (in a different tool, by a different team, on a different cadence), the governance model never developed the same muscle.

The compounding problem with AI agents specifically is that their access patterns are harder to reason about statically. A human user with access to a data warehouse and a Slack integration uses those things independently. An AI agent with the same access can chain them: pull from the warehouse, reason over the output, and post results somewhere, all in a single task execution that nobody designed end to end.

The blast radius of a misconfigured or compromised agent isn't just the permissions it holds. It's the transitive surface of everything it can reach and act on.

Access Is Now the Control Plane for AI Trust

The framing I keep coming back to: in an agentic world, access governance is the primary mechanism for controlling what AI systems can actually do.

Model alignment matters. Output filtering matters. But an agent that behaves perfectly and has access to everything is still a risk surface that most security teams aren't ready for. Prompt injection attacks, scope creep, compromised API keys: any of these becomes catastrophic if the blast radius isn't bounded.

The access model has to evolve from:

"Is this identity authorized?"

to:

"Is this access pattern consistent with what this identity should be doing right now?"

That's a meaningful shift. The first question is answered at provisioning time and revisited periodically. The second requires continuous evaluation against observed behavior, and it only works if you have a coherent model of what "normal" looks like for each identity.

This is where treating access as a graph rather than a list actually pays off. When users, roles, permissions, resources, service identities, and agents are modeled as interconnected entities, you can ask questions that siloed systems can't answer:

  • What can this agent actually reach if it follows transitive permission paths?
  • Which identities share an access path to this sensitive dataset?
  • If we add this new integration, what does the propagation look like before we deploy it?

That last one matters most. The shift from reactive cleanup to proactive blast-radius modeling is where access intelligence stops being a security function and starts being an architectural input.

What Responsible Agent Deployment Actually Requires

I've sat in enough post-mortems to know that AI-related access incidents rarely start with a dramatic breach. They start with access that was too broad, too static, and too invisible to catch before something downstream broke.

The pattern is almost always the same:

  1. Agent gets provisioned with "enough" access to do its job
  2. Access is never scoped down once the job is better understood
  3. Agent behavior drifts: new integrations, expanded use cases, changed prompts
  4. Nobody is watching the access surface, because the agent passes all the authentication checks

Fixing this isn't about adding more controls. It's about a few concrete practices:

Scope grants at deployment, not after. Define the minimum access an agent needs to complete its task before it goes anywhere near production. Treat over-provisioning as a deploy blocker, the same way you'd treat an unreviewed network rule.

Model human and non-human identities together. If agents are evaluated on a separate cadence, with separate tools, by a separate team, you will have blind spots. The governance model needs to cover both, consistently.

Baseline behavior, not just provisioned scope. What data sources does this agent actually access? What chaining behavior does it exhibit? Does it stay within the access patterns it was designed for? Anomalies in behavior are often the first signal of drift, compromise, or scope creep, but only if you're watching.

Design access before deployment, not after. The best time to understand the blast radius of a new agent or integration is before it's running in production. Modeling access propagation upfront is an order of magnitude cheaper than cleaning it up after.

The Access Problem Didn't Get Solved

I keep seeing organizations declare access "under control" after completing a certification campaign or deploying a new IAM tool. And then six months later, the same patterns re-emerge: stale permissions, over-privileged accounts, ungoverned machine identities.

The reason is structural: access governance has mostly been a point-in-time activity applied to a continuous problem. Cloud infrastructure, SaaS sprawl, and now AI agents don't slow down between your quarterly reviews.

The organizations that are actually getting ahead of this are the ones that stopped treating access as a configuration detail and started treating it as a living system, one that needs to be modeled, monitored, and reasoned about continuously across every kind of identity.

Okay, So What Do You Actually Do About It?

The pattern I've described isn't new to most security leaders. The harder question is always: where do we apply force? Here's a concrete action plan, organized by the three layers where access risk actually accumulates.

Layer 1: Get a Real Inventory of Who — and What — Has Access

You can't govern what you can't see. And most organizations have a visibility problem that's worse than they think.

Build a unified identity inventory — human and non-human together. Stop managing service accounts in a separate spreadsheet from human users. Pull every identity type (employees, contractors, service accounts, workload identities, API keys, OAuth grants, CI/CD pipeline credentials, and agent tokens) into a single view. The goal isn't a perfect CMDB. It's enough visibility to ask cross-identity questions: Which identities have access to this dataset? Who shares a permission path with this admin role?

Flag the orphans and the over-privileged immediately. Two filters that surface the highest-risk identities fast:

  • Any non-human identity unused in 60+ days with privileges still active
  • Any identity — human or machine — with permissions that cross more than two system boundaries (e.g., production database + cloud storage + external API)

These are your quickest wins and, more importantly, your most credible signal to leadership that the problem is real.
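As a rough sketch, those two filters can be run over even a minimal inventory export. Everything here (the record fields `kind`, `privileged`, `last_used`, `systems`, and the sample records) is hypothetical, not tied to any particular IAM tool:

```python
from datetime import datetime, timedelta, timezone

def high_risk(identities, now=None):
    """Apply the two quick-win filters to a unified identity inventory.

    Each record is a dict with illustrative fields:
      kind: "human" or "machine"
      privileged: whether any privilege is still active
      last_used: datetime of last observed activity
      systems: set of system boundaries its permissions touch
    """
    now = now or datetime.now(timezone.utc)
    stale_cutoff = now - timedelta(days=60)
    flagged = []
    for ident in identities:
        # Filter 1: non-human, still privileged, idle for 60+ days.
        stale_nhi = (ident["kind"] == "machine"
                     and ident["privileged"]
                     and ident["last_used"] < stale_cutoff)
        # Filter 2: permissions crossing more than two system boundaries.
        cross_system = len(ident["systems"]) > 2
        if stale_nhi or cross_system:
            flagged.append(ident["name"])
    return flagged

now = datetime(2026, 3, 18, tzinfo=timezone.utc)
inventory = [
    {"name": "etl-svc", "kind": "machine", "privileged": True,
     "last_used": now - timedelta(days=120), "systems": {"snowflake"}},
    {"name": "alice", "kind": "human", "privileged": True,
     "last_used": now - timedelta(days=1),
     "systems": {"prod-db", "s3", "external-api"}},
]
print(high_risk(inventory, now))  # ['etl-svc', 'alice']
```

Even this crude pass over an export tends to surface a shortlist worth taking to leadership.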

Map OAuth grants and agent tokens as first-class identities. This is the one most teams skip. Every AI tool, automation, and SaaS integration that authenticated against your internal systems left a token somewhere. Treat each one as an identity with an owner, a scope, and an expiration. If it doesn't have all three, that's a gap.
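One way to make the owner/scope/expiration rule mechanical is to model each token or grant as a record and flag anything with a gap. The schema below is a hypothetical sketch, not any vendor's format:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TokenIdentity:
    """A token or OAuth grant modeled as a first-class identity (illustrative schema)."""
    name: str
    owner: Optional[str]          # named human accountable for it
    scope: Optional[str]          # what it is allowed to touch
    expires: Optional[datetime]   # when it dies on its own

    def gaps(self):
        """Return whichever of the three required attributes is missing."""
        return [f for f in ("owner", "scope", "expires") if getattr(self, f) is None]

tokens = [
    TokenIdentity("ai-assistant-oauth", owner=None, scope="repo:read", expires=None),
    TokenIdentity("ci-deploy-key", owner="platform-team", scope="deploy",
                  expires=datetime(2026, 6, 30)),
]
for t in tokens:
    if t.gaps():
        print(f"{t.name}: missing {', '.join(t.gaps())}")
# ai-assistant-oauth: missing owner, expires
```

Any record that prints is, by the article's definition, a gap to close.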

Layer 2: Understand Access as a Graph, Not a List

The reason transitive access risk is so hard to manage is that most teams are querying flat permission lists when the actual risk lives in relationships between systems.

Model permissions as connected entities. When you represent users, roles, resources, and service identities as nodes in a graph, with grants, inheritance, and delegation as edges, questions that were previously unanswerable become straightforward:

  • If this service account is compromised, what's the blast radius?
  • What's the shortest path from this external-facing API to our most sensitive data store?
  • Which AI agents share an access path to this regulated dataset?

You don't necessarily need a dedicated graph database to start. Even mapping this in a structured document for your five highest-risk systems will surface things that have been invisible for years.
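A minimal sketch of that idea, assuming a hand-built adjacency list (all the node names here are made up), shows how transitive reach falls out of a plain breadth-first search:

```python
from collections import deque

def blast_radius(graph, start):
    """Everything transitively reachable from `start` by following grant edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Illustrative edges: identity/role -> things it can reach.
graph = {
    "report-agent": ["analyst-role"],
    "analyst-role": ["warehouse", "slack-webhook"],
    "warehouse": ["pii-bucket"],   # pipeline trust relationship, the "third hop"
}
print(sorted(blast_radius(graph, "report-agent")))
# ['analyst-role', 'pii-bucket', 'slack-webhook', 'warehouse']
```

The agent was only ever granted a role, yet the traversal shows it can reach a PII bucket three hops away, which is exactly the kind of path a flat permission list hides.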

Trace transitive permissions before they become incidents. Direct permissions are easy to audit. Inherited ones aren't. A role that grants access to a data pipeline that has a trust relationship with a storage bucket that contains PII - that's three hops, and most point-in-time reviews miss it entirely. Build the habit of tracing at least two levels of inheritance for any identity you're evaluating.

Identify cross-system access paths as a distinct risk category. Permissions that cross system boundaries, especially between a lower-trust environment and a higher-trust one, deserve their own review cadence. An agent that can read from a dev environment and write to a prod messaging queue is a different risk than one scoped to a single system, even if each individual permission looks benign in isolation.

Layer 3: Make Access a Design Input, Not an Afterthought

This is the hardest shift, but it's where the leverage is. The teams that are genuinely ahead of this problem didn't get there by cleaning up access faster; they got there by building access considerations into how systems get designed and deployed.

Require an access scope document before any new agent or automation goes to production. It doesn't need to be long. Four questions:

  1. What identities does this system use?
  2. What is the minimum access required for it to function?
  3. Who is the named owner responsible for its access surface?
  4. How does it get deprovisioned when it's no longer needed?

Making this a deploy requirement, the same way you'd require a security review for a new external endpoint, shifts the culture from reactive to proactive without adding significant overhead.
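The four questions can even be enforced mechanically in a CI step. A minimal sketch of such a deploy gate, with hypothetical field names:

```python
# Hypothetical field names mirroring the four questions above.
REQUIRED_FIELDS = ("identities", "minimum_access", "owner", "deprovision_plan")

def check_scope_doc(doc):
    """Deploy blocker: refuse promotion unless all four questions are answered."""
    missing = [f for f in REQUIRED_FIELDS if not doc.get(f)]
    if missing:
        raise ValueError(f"access scope doc incomplete, missing: {missing}")
    return True

doc = {
    "identities": ["invoice-agent-sa"],
    "minimum_access": ["billing-db:read"],
    "owner": "payments-team",
    "deprovision_plan": "revoke token when workflow is retired",
}
check_scope_doc(doc)   # passes; drop or blank any field and this raises
```

Wiring a check like this into the pipeline is what turns the scope document from a suggestion into a deploy requirement.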

Define blast radius as a first-class design constraint. Before deploying a new agent or integration, explicitly model the worst-case impact if it's compromised or behaves unexpectedly. What systems can it reach? What data can it exfiltrate or corrupt? What downstream automations could it trigger? This isn't a theoretical exercise; it's the same kind of threat modeling you'd do for a new network segment, applied to access scope.

Implement just-in-time access for high-privilege operations. Persistent broad access is the enemy. For operations that require elevated privileges (schema changes, bulk data exports, cross-environment access), move toward time-bounded grants that require explicit justification and expire automatically. This applies to AI agents as much as humans. An agent that needs elevated access to complete a specific task should get it for that task, not permanently.
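A toy sketch of a just-in-time grant object (a real system would persist grants and enforce them in the authorization layer; the names here are made up):

```python
from datetime import datetime, timedelta, timezone

class JITGrant:
    """A time-bounded elevated grant: explicit justification, automatic expiry."""

    def __init__(self, identity, permission, justification, ttl_minutes=30):
        if not justification:
            raise ValueError("elevated access requires a justification")
        self.identity = identity
        self.permission = permission
        self.justification = justification
        self.expires = datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes)

    def is_active(self):
        return datetime.now(timezone.utc) < self.expires

grant = JITGrant("export-agent", "warehouse:bulk_export",
                 "ticket DATA-1234: quarterly regulator export", ttl_minutes=15)
assert grant.is_active()                      # usable inside the window
grant.expires = datetime.now(timezone.utc)    # simulate the window closing
assert not grant.is_active()                  # and the grant is gone on its own
```

The point of the shape is that expiry is a property of the grant itself, so nobody has to remember to revoke it.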

Build behavioral baselines for your highest-risk non-human identities. What data sources does this agent normally access? What's its typical call volume? What systems does it chain together? Establish that baseline at deployment, and build alerting around meaningful deviations. Anomalous access behavior (new data sources, unusual chaining, access at unexpected times) is often the first signal of prompt injection, credential theft, or scope creep. But only if you defined "normal" first.
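A minimal sketch of what that comparison looks like, with hypothetical baseline fields (`sources`, `typical_calls`) and an arbitrary 3x volume threshold:

```python
def check_behavior(baseline, observed):
    """Flag access outside the recorded baseline or unusual call volume."""
    alerts = []
    new_sources = observed["sources"] - baseline["sources"]
    if new_sources:
        alerts.append(f"new data sources: {sorted(new_sources)}")
    # 3x typical volume is an arbitrary illustrative threshold.
    if observed["calls"] > 3 * baseline["typical_calls"]:
        alerts.append(
            f"call volume {observed['calls']} vs typical {baseline['typical_calls']}")
    return alerts

baseline = {"sources": {"crm-db", "email-api"}, "typical_calls": 200}
observed = {"sources": {"crm-db", "payroll-db"}, "calls": 950}
for alert in check_behavior(baseline, observed):
    print("ALERT:", alert)
```

In this toy run, both the new `payroll-db` source and the volume spike would fire, which is the "first signal" the paragraph above describes.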

Close the feedback loop between access and observed behavior. The final step, and the one that separates mature programs from everyone else: use actual usage data to continuously right-size access. If a service account hasn't touched a permission in 90 days, remove it. If an agent's observed behavior only ever touches two of the eight systems it has access to, scope it down. Access should reflect reality, not the broadest possible interpretation of what might someday be needed.
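The right-sizing step is essentially a set difference between what was provisioned and what usage data shows was actually exercised. A sketch with made-up permission names:

```python
def scope_down(provisioned, observed_used):
    """Permissions to remove: granted but never exercised in the review window."""
    return provisioned - observed_used

provisioned = {"s3:read", "s3:write", "sqs:send", "dynamodb:read"}
used_last_90d = {"s3:read", "sqs:send"}   # from access logs, illustrative
print(sorted(scope_down(provisioned, used_last_90d)))
# ['dynamodb:read', 's3:write']
```

Run on a cadence against real access logs, this is the feedback loop: the output is the scope-down list, not a report for a quarterly review.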

The Question Every Security Leader Should Be Asking Right Now

Most CISOs I talk to are focused on the right things: securing the AI models their organizations are adopting, managing cloud risk, keeping up with compliance mandates. But there's a question that doesn't come up often enough in those conversations:

Do you know what your AI agents can actually reach?

Not what access you intended to grant. What they can reach through transitive permissions, chained integrations, and OAuth grants your team approved six months ago and forgot about.

If you can't answer that with confidence today, you are already behind, because your teams are deploying agents faster than your governance model can track them. And the gap between "what we provisioned" and "what can actually be accessed" is exactly the surface that gets exploited.

The organizations that will navigate the agentic era without a major access-related incident aren't the ones with the most controls. They're the ones that decided, early, to treat access as a first-class system: modeled continuously, owned clearly, and designed before deployment rather than cleaned up after.

That decision starts at the top. The mandate to govern human and non-human identities together, to make access a design input for AI initiatives, and to hold engineering and business teams accountable for the access surface they create: that doesn't come from a security engineer. It comes from you.

The access problem didn't get solved. You just inherited a faster version of it. What you do in the next two quarters will determine whether your AI investments become a competitive advantage or your next hard conversation with the board.

Author: Priyanka Neelakrishnan is a Director of Product Management focused on Access Security, with experience building enterprise security and data platforms at Palo Alto Networks and Symantec. She is the author of Autonomous Data Security: Creating a Proactive Enterprise Protection Plan, which explores how organizations can move from policy-centric data protection toward adaptive, AI-enabled security architectures. Make the world better than how it was yesterday!

Good security is an act of respect - for users, for data, for the future.


Does the Adam Optimizer Exacerbate Catastrophic Forgetting?

2026-03-18 05:45:48

:::info Authors:

  1. Dylan R. Ashley
  2. Sina Ghiassian
  3. Richard S. Sutton

:::

TABLE OF LINKS

Abstract

1 Introduction

2 Related Work

3 Problem Formulation

4 Measuring Catastrophic Forgetting

5 Experimental Setup

6 Results

7 Discussion

8 Conclusion

9 Future Work and References


Abstract

Catastrophic forgetting remains a severe hindrance to the broad application of artificial neural networks (ANNs); however, it continues to be a poorly understood phenomenon. Despite the extensive amount of work on catastrophic forgetting, we argue that it is still unclear how exactly the phenomenon should be quantified, and, moreover, to what degree all of the choices we make when designing learning systems affect the amount of catastrophic forgetting. We use various testbeds from the reinforcement learning and supervised learning literature to (1) provide evidence that the choice of which modern gradient-based optimization algorithm is used to train an ANN has a significant impact on the amount of catastrophic forgetting and show that—surprisingly—in many instances classical algorithms such as vanilla SGD experience less catastrophic forgetting than the more modern algorithms such as Adam. We empirically compare four different existing metrics for quantifying catastrophic forgetting and (2) show that the degree to which the learning systems experience catastrophic forgetting is sufficiently sensitive to the metric used that a change from one principled metric to another is enough to change the conclusions of a study dramatically. Our results suggest that a much more rigorous experimental methodology is required when looking at catastrophic forgetting. Based on our results, we recommend inter-task forgetting in supervised learning must be measured with both retention and relearning metrics concurrently, and intra-task forgetting in reinforcement learning must—at the very least—be measured with pairwise interference.

1 Introduction

In online learning, catastrophic forgetting refers to the tendency for artificial neural networks (ANNs) to forget previously learned information when in the presence of new information (French, 1991, p. 173). Catastrophic forgetting presents a severe issue for the broad applicability of ANNs as many important learning problems, such as reinforcement learning, are online learning problems. Efficient online learning is also core to the continual—sometimes called lifelong (Chen and Liu, 2018, p. 55)—learning problem. The existence of catastrophic forgetting is of particular relevance now as ANNs have been responsible for a number of major artificial intelligence (AI) successes in recent years (e.g., Taigman et al. (2014), Mnih et al. (2015), Silver et al. (2016), Gatys et al. (2016), Vaswani et al. (2017), Radford et al. (2019), Senior et al. (2020)). Thus there is reason to believe that methods able to successfully mitigate catastrophic forgetting could lead to new breakthroughs in online learning problems.

The significance of the catastrophic forgetting problem means that it has attracted much attention from the AI community. It was first formally reported on in McCloskey and Cohen (1989) and, since then, numerous methods have been proposed to mitigate it (e.g., Kirkpatrick et al. (2017), Lee et al. (2017), Zenke et al. (2017), Masse et al. (2018), Sodhani et al. (2020)). Despite this, it continues to be an unsolved issue (Kemker et al., 2018). This may be partly because the phenomenon itself—and what contributes to it—is poorly understood, with recent work still uncovering fundamental connections (e.g., Mirzadeh et al. (2020)). This paper is offered as a step forward in our understanding of the phenomenon of catastrophic forgetting. In this work, we seek to improve our understanding of it by revisiting the fundamental questions of (1) how we should quantify catastrophic forgetting, and (2) to what degree do all of the choices we make when designing learning systems affect the amount of catastrophic forgetting. To answer the first question, we compare several different existing measures for catastrophic forgetting: retention, relearning, activation overlap, and pairwise interference. We discuss each of these metrics in detail in Section 4. We show that, despite each of these metrics providing a principled measure of catastrophic forgetting, the relative ranking of algorithms varies wildly between them. This result suggests that catastrophic forgetting is not a phenomenon that a single one of these metrics can effectively describe. As most existing research into methods to mitigate catastrophic forgetting rarely looks at more than one of these metrics, our results imply that a more rigorous experimental methodology is required in the research community. Based on our results, we recommend that work looking at inter-task forgetting in supervised learning must, at the very least, consider both retention and relearning metrics concurrently. For intra-task forgetting in reinforcement learning, our results suggest that pairwise interference may be a suitable metric, but that activation overlap should, in general, be avoided as a singular measure of catastrophic forgetting.

To address the question of to what degree all the choices we make when designing learning systems affect the amount of catastrophic forgetting, we look at how the choice of which modern gradient-based optimizer is used to train an ANN impacts the amount of catastrophic forgetting that occurs during training. We empirically compare vanilla SGD, SGD with Momentum (Qian, 1999; Rumelhart et al., 1986), RMSProp (Hinton et al., n.d.), and Adam (Kingma and Ba, 2014), under the different metrics and testbeds. Our results suggest that selecting one of these optimizers over another does indeed result in a significant change in the catastrophic forgetting experienced by the learning system. Furthermore, our results ground previous observations about why vanilla SGD is often favoured in continual learning settings (Mirzadeh et al., 2020, p. 6): namely that it frequently experiences less catastrophic forgetting than the more sophisticated gradient-based optimizers—with a particularly pronounced reduction when compared with Adam. To the best of our knowledge, this is the first work explicitly providing strong evidence of this. Importantly, in this work, we are trying to better understand the phenomenon of catastrophic forgetting itself, and not explicitly seeking to understand the relationship between catastrophic forgetting and performance. While that relation is important, it is not the focus of this work. Thus, we defer all discussion of that relation to Appendix C of our supplementary material. The source code for our experiments is available at https://github.com/dylanashley/catastrophic-forgetting/tree/arxiv.


:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::


What Happens If You Let AI Write Your Code for a Week?

2026-03-18 05:21:56

For years, developers handled every line of code by hand. Long hours. Debugging sessions that stretched late into the night. Coffee cups everywhere.

Now, something interesting is happening.

Many developers are letting AI tools write parts of their code. Not for a few minutes. Not just for a quick snippet. But for days at a time.

So, what actually happens if you let AI write your code for an entire week?

Does productivity explode? Do bugs multiply? Does it replace developers or simply change the way they work?

Let's talk about what really happens when teams try this experiment.

The First Day Feels Surprisingly Fast

The first thing most developers notice is speed.

Tasks that normally take 20 minutes can take five. Boilerplate code appears almost instantly. Functions that once required careful typing suddenly show up with a single prompt.

You ask for a login API. It generates one.

You need a form validation script. Done.

You want a quick database query. There it is.

At first, it feels like a cheat code. Developers spend less time typing and more time reviewing what appears on the screen.

And that shift matters.

Instead of building everything from scratch, the role begins to change. You guide the code rather than write every character.

Routine Work Almost Disappears

Developers spend a surprising amount of time on repetitive work.

Creating similar functions. Writing standard API calls. Building CRUD operations again and again.

After a few days of using AI tools, that repetitive work starts to fade away.

You can request things like:

  • Create an API endpoint for user profiles
  • Write a pagination function
  • Generate test cases for this module

The result appears quickly.

This does not mean the code is always perfect. Far from it. But it usually gets you halfway there.

Instead of spending an hour writing the structure, you spend ten minutes adjusting the details.

That shift saves real time.

Debugging Becomes a Different Experience

By day three or four, something interesting happens.

Developers begin using AI to fix bugs as well.

You paste an error message. It suggests possible fixes.

You share a block of code. It points out issues.

Sometimes, the solution works instantly. Other times it sends you in the wrong direction.

But even when the answer is not perfect, it often gives developers a starting point.

Think about how debugging normally works. You search documentation, read Stack Overflow threads, and test ideas.

Now you can ask for possible fixes instantly.

It becomes a brainstorming partner rather than just a coding assistant.

The Developer Still Does the Real Thinking

Here is the part that many people misunderstand.

Even if AI writes pieces of the code, developers still make the important decisions.

You decide the architecture.

You define how systems communicate.

You design the database.

You determine performance requirements.

AI can generate code, but it does not understand the business problem the way you do.

And that difference becomes obvious after a few days.

Developers who give vague instructions often receive messy code. Developers who give clear instructions get better results.

In other words, good thinking still matters.

Code Reviews Become More Important

When AI writes part of your code, reviewing becomes critical.

Developers cannot assume every line is correct.

Some generated code may be inefficient. Some may ignore edge cases. Occasionally, there may be security issues.

So teams spend more time reviewing and testing.

That sounds like extra work, but it balances out. Since the code appears faster, there is more time to inspect it carefully.

Many developers describe the process like this:

  • AI writes the first draft.
  • The developer edits it.

It feels similar to editing a document someone else started.

Small Projects Move Much Faster

After a full week of working this way, the speed difference becomes clear.

Small features can ship quickly.

Internal tools get built faster.

Prototypes appear in days instead of weeks.

Startups find this particularly helpful. When you're testing ideas quickly, speed can make a big difference.

Instead of spending weeks building a proof of concept, teams can assemble something functional in a few days.

That agility helps teams experiment more often.

Complex Systems Still Need Human Experience

Here is where reality sets in.

Large systems still require deep experience.

AI tools may generate individual pieces of code, but designing complex platforms still depends on human judgment.

Think about things like:

  • Microservice architecture
  • Security rules
  • Scaling strategy
  • Database design
  • API versioning

These decisions affect the entire product.

AI can help with smaller tasks inside those systems, but the big picture still belongs to experienced developers.

And that likely won't change anytime soon.

Developers Spend More Time Asking Good Questions

When developers rely on AI tools, one skill becomes very important.

Asking clear questions.

The quality of the answer often depends on how the request is written.

A vague request may produce vague results.

A detailed request usually leads to better code.

Developers who learn how to describe problems clearly get the most value from these tools.

That skill improves with practice.

By the end of the week, many developers notice they spend more time thinking about the request than typing the solution.

Costs Become a Serious Topic

Once teams see how much faster development can move, another conversation begins.

Cost.

Companies begin asking whether faster development reduces project budgets.

Some teams notice they need fewer development hours to build certain features. Others find they can build more features within the same budget.

Both situations change how companies think about software planning.

And this is where discussions about AI software development cost in 2026 become very relevant.

The tools themselves are not free. Teams must pay for subscriptions, cloud usage, and additional infrastructure.

But the productivity increase can offset those expenses.

If a feature takes half the time to build, the overall project cost may drop.

That potential shift is why many businesses are exploring how AI tools affect development budgets.
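As a rough illustration of that trade-off, here is a back-of-the-envelope break-even calculation. Every number (rate, hours, tooling cost) is hypothetical; swap in your own:

```python
# Hypothetical numbers for a single feature.
dev_rate = 90            # blended hourly cost, USD
baseline_hours = 80      # estimated effort without AI assistance
assisted_hours = 48      # estimated effort with AI assistance (incl. extra review)
tooling_cost = 400       # subscriptions + infra attributed to this feature

baseline_cost = dev_rate * baseline_hours
assisted_cost = dev_rate * assisted_hours + tooling_cost
savings = baseline_cost - assisted_cost

print(f"baseline: ${baseline_cost}, assisted: ${assisted_cost}")  # baseline: $7200, assisted: $4720
print(f"net savings: ${savings}")                                 # net savings: $2480
```

The arithmetic only favors the tooling when the saved hours outweigh its cost, which is why the experienced-team caveat at the end of this article matters.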

The Learning Curve Is Surprisingly Short

One surprising discovery is how quickly developers adapt.

Most tools work through simple prompts or code suggestions.

Developers experiment for a few hours and quickly find a rhythm.

By day five or six, the workflow starts to feel natural.

  1. Write a prompt.
  2. Review the code.
  3. Adjust it.
  4. Test it.

That cycle repeats throughout the day.

The process does not replace traditional coding. It simply adds another tool to the toolbox.

Creativity Starts to Increase

Less repetitive work means developers have more time to think about creative solutions.

Instead of focusing on small coding details, they can focus on improving the product.

User experience.

Feature ideas.

Performance improvements.

Developers often say the job becomes more interesting when routine tasks shrink.

And when teams move faster, experimentation becomes easier.

You can try new ideas without committing weeks of development time.

That freedom changes how teams build software.

Some Developers Feel Skeptical

Not every developer loves the experience right away.

Some worry about code quality.

Others feel uncomfortable relying on generated code.

There is also a fear that heavy reliance on AI could weaken core programming skills.

Those concerns are understandable.

But after a week of testing these tools, many developers realize something important.

AI is not replacing their skills.

It is amplifying them.

The developer still decides what gets built and how it should behave.

Teams Start Rethinking Development Workflows

When coding speed increases, team workflows often change.

Developers collaborate differently. Code reviews become more detailed. Planning sessions focus more on architecture and less on writing basic functions.

Product managers also notice something interesting.

Features move through development faster.

That can shorten release cycles and speed up product updates.

And when software updates reach users faster, companies can respond to feedback quickly.

Security and Testing Stay Critical

One thing that never changes is the need for careful testing.

Generated code must still go through security checks, performance tests, and QA review.

Companies cannot skip those steps.

If anything, automated code generation makes testing even more important.

Teams need to confirm that every component behaves correctly.

Developers who rely heavily on generated code usually increase their testing coverage to stay safe.

The Real Outcome After One Week

After seven days of letting AI write parts of the code, most developers reach the same conclusion.

It helps. A lot.

But it does not replace human developers.

Instead, it changes how they spend their time.

Less typing.

More reviewing.

More planning.

More problem-solving.

Developers move from writing every line to guiding the direction of the code.

And that shift can significantly affect productivity.

Why This Matters for Businesses

Companies building software care about two things.

Speed and cost.

If development teams can move faster without sacrificing quality, that affects project planning, hiring decisions, and product timelines.

It also changes conversations about AI software development cost in 2026.

Businesses want to know if these tools can reduce development expenses or help them ship products faster.

The answer often depends on how well teams use the technology.

Used correctly, it can accelerate development cycles.

Used poorly, it can create messy code that takes longer to fix.

The difference usually comes down to experienced developers guiding the process.

A Week Is Only the Beginning

Letting AI write your code for a week is just the start.

As tools continue to improve, developers will likely find new ways to use them.

Maybe for testing.

Maybe for documentation.

Maybe for code refactoring.

What is clear right now is this.

AI tools are not removing developers from the process. They are reshaping how development happens.

Developers who learn how to work alongside these tools may build software faster than ever before.

And companies paying attention to that shift will likely stay ahead of the competition.

I Built a Visual Workbench Because Managing Claude Code Skills Was Driving Me Crazy

2026-03-18 04:55:03

It started with a folder full of markdown files.

I'd been using Claude Code daily for months. It became my go-to coding partner pretty quickly. Early on, I discovered Skills: markdown files with YAML frontmatter that you drop into ~/.claude/skills/ to teach Claude how you want things done. Write a SKILL.md, describe when it should trigger, add your instructions, and Claude suddenly knows your deployment pipeline, your coding standards, and your project's weird quirks.
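For readers who haven't written one, a minimal SKILL.md might look like the sketch below. The `name` and `description` fields follow the frontmatter convention described above; the specific values and the checklist body are illustrative assumptions rather than the exact schema:

```markdown
---
name: code-review
description: Apply this team's code review checklist when reviewing pull requests
---

When reviewing code in this repository:

1. Flag functions longer than 50 lines.
2. Require error handling around every network call.
3. Check that new endpoints ship with matching tests.
```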

I went deep. I wrote skills for everything. Code review guidelines. Database migration patterns. Component scaffolding. API endpoint boilerplate. Test generation strategies. Each one made Claude Code sharper, more tuned to how I actually work.

Then the problems started. 😅

The Skill Management Problem Nobody Talks About 🤯

Five skills in a folder? Totally fine. Thirty skills spread across multiple projects, each with slightly different versions? Absolute nightmare.

I kept hitting the same walls. I'd edit a skill's YAML frontmatter, deploy it, then discover a typo broke the trigger pattern. No validation anywhere. I'd copy a skill between projects, tweak it, then completely forget which version was current. No version history. I wanted to test whether a skill actually produced the output I expected before shipping it. No testing sandbox.

Sharing was the worst part. A teammate would ask for my code review skill. I'd send the file over. They'd ask which model it was tuned for. I couldn't remember. They'd deploy it, get so-so results, and write off skills entirely.

I was spending more time managing skills than writing code. I was using an AI agent to be more productive, but the tooling around that agent kept dragging me back. 🙃

Building What I Needed 🔨

So, I did what any developer does when the tooling falls short: built my own.

The idea was straightforward. A visual editor where I can see YAML frontmatter and markdown instructions side by side. Real-time validation so I catch errors before deployment. A way to test a skill against actual models with streaming responses, so I can tweak the instructions until the output matches what I want. And when it's ready, one-click deploy instead of manually copying files around.

That project became uberSKILLS ⚡, an open-source visual workbench for designing, testing, and deploying agent skills.

The first version was rough. A Next.js app with a basic editor and a deploy button that wrote files to ~/.claude/skills/. But even that bare-bones version saved me hours. No more YAML syntax errors. No more blind deployments. No more wondering if a skill would actually work.

From Side Project to Multi-Agent Workbench 🚀

This is where things got interesting.

While I was building uberSKILLS for Claude Code, the agent ecosystem blew up. Cursor shipped their rules system. GitHub Copilot added custom instructions. Windsurf launched with its own skill format. Gemini CLI showed up with agent configuration. Codex, OpenCode, Antigravity… suddenly there were eight major code agents, all supporting some form of persistent instructions.

The problem I'd solved for Claude Code? It existed everywhere. Every agent had its own directory structure, its own conventions, its own deployment path. Developers using multiple agents were maintaining duplicate sets of instructions with zero shared tooling. 😩

So uberSKILLS grew. Today, it deploys to eight agents 🎯:

  • Claude Code
  • Cursor
  • GitHub Copilot
  • Windsurf
  • Gemini CLI
  • Codex
  • OpenCode
  • Antigravity

Write your skill once, pick your targets, deploy everywhere. The skill format is standardized (YAML frontmatter for metadata and triggers, markdown body for instructions), and the engine handles translation to each agent's expected structure.

This matters more than it might sound. If you've spent time crafting a detailed code review prompt that works great with Claude, you should be able to use that same work with Copilot or Cursor without rewriting anything. Your prompt engineering expertise should be portable. 🔄
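To make that translation step concrete, here is a toy TypeScript sketch of fanning one skill out to per-agent paths. The directory names and the per-agent file layout below are assumptions for illustration, not uberSKILLS's actual mapping:

```typescript
// Hypothetical sketch of the "write once, deploy everywhere" idea.
// Directory names are illustrative assumptions, NOT the real paths
// uberSKILLS writes to for each agent.
const targetDirs: Record<string, string> = {
  "claude-code": ".claude/skills",
  cursor: ".cursor/rules",
  copilot: ".github/instructions",
  windsurf: ".windsurf/skills",
};

// Resolve where one skill file would land for a given agent target.
function deployPath(agent: string, skillName: string): string {
  const dir = targetDirs[agent];
  if (dir === undefined) {
    throw new Error(`Unknown agent target: ${agent}`);
  }
  return `${dir}/${skillName}/SKILL.md`;
}

// Deploy one skill to several targets at once.
function deployAll(skillName: string, agents: string[]): string[] {
  return agents.map((agent) => deployPath(agent, skillName));
}
```

The point of the sketch is only the shape of the problem: one canonical skill, a per-agent lookup, and a fan-out over targets.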

What It Actually Does ⚙️

Three steps: create, test, deploy.

✏️ Create

You can go manual with the structured editor and fill in metadata fields (name, description, trigger patterns, tags, model preferences). Or open the AI chat, describe what you want in plain language, and let it generate a complete skill for you. The AI creation flow has a live preview panel so you can watch the SKILL.md update as you refine your description through conversation.

🧪 Test

This is where uberSKILLS really pays for itself. The multi-model sandbox lets you pick any model available through OpenRouter (Claude, GPT, Gemini, Llama, dozens more) and run your skill against it with streaming responses. You see output in real time, plus metrics: token counts, latency, and time to first token. Tweak the instructions, test again, compare outputs across models, and actually feel confident that a skill works before it touches a real project. Every test run gets saved too, so you can track how instruction changes affect output quality over time. 📊

🚀 Deploy

One click. Pick your target agents from a dropdown, hit deploy, and uberSKILLS writes the files to the correct directory for each agent. Status updates to "deployed" so you can see at a glance what's live and what's still in draft.

Beyond those three steps: there's a skills library with search, status filtering, and sorting. Version history tracks every edit so you can roll back to any revision. Import and export lets you pull skills from zip files or directories and share them with your team. The settings panel covers API key management, theme preferences, and data backup. 📦

The Technical Choices 🛠️

For the curious, here's the stack: Turborepo monorepo with pnpm. Next.js 15 on the App Router with React 19, shadcn/ui, and Tailwind CSS v4. SQLite through Drizzle ORM for the database, so no external database server is needed. Everything runs locally. AI integration uses the Vercel AI SDK with the OpenRouter provider for multi-model support.

SQLite was a deliberate choice. uberSKILLS is local-first. Your skills, test history, API keys… all of it stays on your machine. The API key gets encrypted with AES-256-GCM before storage. No cloud dependency, no account to create, no data leaving your laptop. 🔒

Getting started is one command:

npx @uberskillsdev/uberskills

It creates a ~/.uberskills/data/ directory, sets up the database, runs migrations, generates an encryption secret, and launches at localhost:3000. No Docker, no cloning, no configuration ceremony. ✨

Why Skills Are the Multiplier Most Developers Ignore 💡

I talk to developers every week who use Claude Code or Copilot and have never written a single skill. They're leaving a ton of productivity on the table.

A well-written skill turns a general-purpose agent into a specialist. Without skills, you repeat the same context in every conversation. With skills, that context loads automatically based on trigger patterns. Your agent already knows your database conventions, your error handling patterns, your test philosophy, your deployment checklist… before you type a word.

The developers getting the most out of code agents are the ones who invest time teaching them. Skills are how you do the teaching. uberSKILLS is how you manage all that teaching without going crazy. 🧠

What's Next 🗺️

uberSKILLS is open source under MIT and free forever. The roadmap includes a community skill marketplace where developers can share and discover skills, collaborative editing for teams, and deeper integrations as new agents keep showing up.

The agent ecosystem moves fast. New agents ship every month, and existing ones pick up new capabilities every week. But one thing stays consistent: developers who customize their agents outperform those who don't. A proper workbench for that customization isn't optional anymore. It's infrastructure.

If you're still managing agent skills by hand-editing markdown files and copying them between directories, try uberSKILLS. Your future self, the one who isn't debugging YAML indentation at midnight, will appreciate it. 😄

How to Deploy Your Own 24/7 AI Agent Using OpenClaw

2026-03-18 04:32:50

OpenClaw is a self-hosted AI assistant designed to run under your control instead of inside a hosted SaaS platform.

It can connect to messaging interfaces, local tools, and model providers while keeping execution and data closer to your own infrastructure.

The project is actively developed, and the current ecosystem revolves around a CLI-driven setup flow, onboarding wizard, and multiple deployment paths ranging from local installs to containerised or cloud-hosted setups.

This article explains how to deploy your own instance of OpenClaw from a practical systems perspective. We will look at how to deploy it on your local machine as well as a PaaS provider like Sevalla.

The goal is not just to “make it run,” but to understand deployment choices, architecture implications, and operational tradeoffs so you can run a stable instance long term.

:::warning Warning: It is dangerous to give an AI system full control of your system. Make sure you understand the risks before running it on your machine.

:::

Understanding What You Are Deploying

Before touching installation commands, it helps to understand the runtime model.

OpenClaw is essentially a local-first AI assistant that runs as a service and exposes interaction through chat interfaces and a gateway architecture.

The gateway acts as the operational core, handling communication between messaging platforms, models, and local capabilities.

In practical terms, deploying OpenClaw means deploying three layers.

The first layer is the CLI and runtime, which launches and manages the assistant.

The second layer is configuration and onboarding, where you select model providers and integrations.

The third layer is persistence and execution context, which determines whether OpenClaw runs on your laptop, a VPS, or inside a container.

Because OpenClaw runs with access to local resources, deployment decisions are not only about convenience but also about security boundaries. Treat it as an administrative system, not just a chatbot.

Deploying on a Local Machine

OpenClaw supports multiple deployment approaches, and the right one depends on your goals.

The simplest route is to install it directly on a local machine. This is ideal for experimentation, private workflows, or development because onboarding is fast and maintenance is minimal.

The installer script handles environment detection, dependency setup, and launching the onboarding wizard.

The fastest way to install OpenClaw is via the official installer script. The installer downloads the CLI, installs it globally through npm, and launches onboarding automatically.

curl -fsSL https://openclaw.ai/install.cmd -o install.cmd && install.cmd && del install.cmd

This method abstracts away most environmental complexity and is recommended for first-time deployments.

If you already maintain a Node environment, you can install it directly using npm.

npm i -g openclaw

The CLI is then used to run onboarding and optionally install a daemon for persistent background execution. This approach gives you more control over versioning and update cadence.

openclaw onboard

Regardless of installation path, verify that the CLI is discoverable in your shell. Environment path issues are common when global npm packages are installed under custom Node managers.

The Onboarding Process

Once installed, OpenClaw relies heavily on onboarding to bootstrap configuration.

Openclaw CLI

During onboarding you will select an AI provider, configure authentication, and choose how you want to interact with the assistant. This process establishes the core runtime state and generates local configuration files used by the gateway.

Onboarding also allows you to connect messaging channels such as Telegram or Discord. These integrations transform OpenClaw from a local CLI tool into an always-accessible assistant.

From a deployment perspective, this is the moment where availability requirements change. If you connect external chat platforms, your instance must remain online consistently.

You can skip certain onboarding steps and configure integrations later, but for production deployments, it is better to complete the initial configuration so you can validate end-to-end functionality immediately.

Once you add an OpenAI API key or Claude key, you can choose to open the web UI.

Openclaw Options

Go to localhost:18789 to interact with OpenClaw.

Deploying on the Cloud using Sevalla

A second approach is to deploy to a VPS or cloud instance. This model gives you always-on availability and makes it possible to interact with OpenClaw from anywhere.

A third approach is containerised deployment using Docker or similar tooling. This provides reproducibility and cleaner dependency isolation.

Docker setups are particularly useful if you want predictable upgrades or easy migration between machines. OpenClaw’s repository includes scripts and compose configurations that support container execution workflows.
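As an illustration, a minimal Compose file for this kind of setup could look like the sketch below. The service name, port (taken from the local install's localhost:18789 UI), environment variable, and volume path are assumptions for illustration, not OpenClaw's official compose configuration:

```yaml
# Hypothetical sketch — not OpenClaw's official compose file.
services:
  openclaw:
    image: manishmshiva/openclaw
    ports:
      - "127.0.0.1:18789:18789"   # bind to localhost only; tunnel in for remote access
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - openclaw-data:/data        # assumed path; persist config between restarts
    restart: unless-stopped
volumes:
  openclaw-data:
```

Binding the published port to 127.0.0.1 keeps the gateway off the public interface, which matches the security guidance later in this article.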

I have set up a custom Docker image to load OpenClaw into a PaaS platform like Sevalla.

Sevalla is a developer-friendly PaaS provider. It offers application hosting, database, object storage, and static site hosting for your projects.

Log in to Sevalla and click “Create application”. Choose “Docker image” as the application source instead of a GitHub repository. Use manishmshiva/openclaw as the Docker image, and it will be pulled automatically from DockerHub.

Sevalla New Application

Click “Create application” and go to the environment variables. Add an environment variable named ANTHROPIC_API_KEY. Then go to “Deployments” and click “Deploy now”.

OpenClaw Deployment

Once the deployment is successful, you can click “Visit app” and interact with the UI at the Sevalla-provided URL.

OpenClaw Dashboard

Interacting with the Agent

There are many ways to interact with the agent once you set up OpenClaw. You can configure a Telegram bot to talk to your agent. The agent will (try to) handle tasks much like a human assistant would; its capabilities depend on how much access you give it.

You can ask it to clean your inbox, watch a website for new articles, and perform many other tasks. Please note that providing OpenClaw access to your critical apps or files is not ideal or secure. This is still a system in its early stages, and the risk of it making a mistake or exposing your private information is high.

Here are some of the ways people are using OpenClaw.

Security and Operational Considerations

Because OpenClaw can execute tasks and access system resources, deployment security is not optional. The safest baseline is to bind services to localhost and access them through secure tunnels when remote control is required. This significantly reduces exposure risk.

When deploying on a VPS, harden the host like any administrative service. Use non-root users, keep packages updated, restrict inbound ports, and monitor logs. If you are integrating messaging channels, treat tokens and API keys as sensitive secrets and avoid storing them in plaintext configuration where possible.

Containerization helps isolate dependencies but does not eliminate risk. The container still executes code on your host, so network and volume permissions should be carefully scoped.

Updating and Maintaining Your Instance

OpenClaw evolves quickly, with frequent releases and feature changes. Keeping your instance updated is important not only for features but also for stability and compatibility with integrations.

For npm-based installations, updates are straightforward, but you should test upgrades in a staging environment if your assistant handles important workflows. For source-based deployments, pull changes and rebuild consistently rather than mixing old build artifacts with new code.

Monitoring is another overlooked aspect. Even simple log inspection can reveal integration failures early. If your deployment is mission-critical, consider external uptime checks or process supervisors.

Conclusion

Deploying your own OpenClaw agent is ultimately about taking control of how your AI assistant works, where it runs, and how it fits into your daily workflows. While the setup process is straightforward, the real value comes from understanding the choices you make along the way, whether you run it locally for privacy, host it in the cloud for constant availability, or use containers for consistency and portability.

As the ecosystem around self-hosted AI continues to evolve, tools like OpenClaw make it possible to move beyond relying entirely on third-party platforms. Running your own agent gives you flexibility, ownership, and the freedom to shape the experience around your needs.

Start small, experiment safely, and gradually build confidence in how your assistant operates. Over time, what begins as a simple deployment can become a dependable, personalized system that works the way you want, under your control.

Hope you enjoyed this article. Learn more about me by visiting my website.

Indonesia Guards Against “Digital Colonization” as It Pushes to Grow Its AI Ecosystem

2026-03-18 04:11:25

As Southeast Asia’s largest economy continues accelerating its digital transformation, Indonesia finds itself at a critical juncture: it wants to build a robust artificial intelligence (AI) ecosystem without becoming overreliant on foreign powers.

This tension was brought onto the world stage on February 10, 2026, at the South China Morning Post’s China Conference: Southeast Asia 2026. There, Vikram Sinha, president director and CEO at Indosat Ooredoo Hutchison, declared that “digital colonization is the biggest threat for any country,” highlighting how China’s focus on open-source technology could offer Indonesia greater digital infrastructure sovereignty.

That ecosystem includes global industry heavyweights like Hangzhou-based DeepSeek AI and Alibaba Cloud, whose open-source models and offerings are often marketed as cost-effective and customizable alternatives compared to their more expensive and proprietary Western counterparts. For Indonesia, these platforms are appealing because they provide access to advanced AI capabilities without reliance on U.S.-dominated technology stacks.

However, open-source technologies do not automatically translate to digital sovereignty. Despite the need for foreign partnerships, Indonesia must determine whether leveraging Chinese platforms enhances its own autonomy or simply consolidates its dependencies, a particularly important nuance given the dual-use nature of artificial intelligence and the increasingly intertwined relationship between technological advancement and national security.

Domestic Indonesian efforts to support AI development

Indonesia has launched an aggressive campaign to internalize and grow its critical AI tech capabilities, with a focus on key infrastructure like semiconductors and on human capital.

On February 1, 2026, Indonesia’s Ministry of Industry (Kemenperin) announced the launch of a semiconductor industry roadmap focused on chip design and talent development. Kemenperin also highlighted its establishment of the Indonesia Chip Design Collaborative Center, bringing together private industry participants and chip experts from 13 universities to reduce import dependence and strengthen Indonesia’s role in the global supply chain.

Just several months prior, in August 2025, Indonesia launched its AI Talent Factory, which partners with universities to develop AI practitioners and meet Indonesia’s growing demand for AI talent.

Taken together, these initiatives reflect a broader industrial policy shift: Indonesia wants to move up the global AI value chain from its current position as a downstream consumer of AI technologies.

The importance of strategic foreign partnerships

Despite its desire to establish AI independence, Indonesia knows that it cannot build an entire technological ecosystem without assistance from other countries. China stands out as a crucial partner for Indonesia, though Indonesia is currently attempting to diversify its AI capabilities through other regional partners as well.

Indonesia is increasingly seeking involvement from foreign partners like China in developing its emerging technologies, including energy, smart grids, and data centers, all key components of critical AI infrastructure. The Digital Silk Road is playing a huge role in this respect, with Chinese technology being deployed within Indonesia to position the country for digital economic growth.

Chinese firms like Huawei Cloud and Alibaba Cloud are increasingly investing in Indonesia’s digital infrastructure, positioning themselves as long-term partners in the country’s AI development. Their ability to bundle hardware, cloud services, and even technical training alongside their open-source AI systems highlights an integrated model that appeals to emerging economies seeking rapid scalability and ease of implementation.

This integration carries significant national security implications for Indonesia. Digital infrastructure encompassing AI and cloud technologies is inherently dual-use. As a result, reliance on Chinese-built data centers, algorithms, hardware, and software introduces potential vulnerabilities from the data sovereignty and cybersecurity perspectives. Additionally, deep integration into exclusively Chinese digital ecosystems risks complicating Indonesia’s partnerships with the United States and other non-Chinese partners, particularly from the military and intelligence perspectives.

Recognizing this, Indonesia has also been expanding its technological relationships beyond China. Manda Royal Hospital Puri, a private hospital in Banten just a stone’s throw west of Jakarta, adopted Roen Surgical’s Zamenix system, an AI-based robotic system from South Korea for kidney stone surgery. Koltiva, a Swiss-Indonesian agritech company, launched an AI-enabled traceability pilot in Indonesia on February 11, 2026, under AI Singapore’s AIAP for Industry initiative, a nine-month program training individuals to become AI engineers.

This diversification of partnerships highlights how Indonesia is attempting to balance interdependence among different partners to achieve the growth it seeks in the AI space while addressing potential strategic vulnerabilities.

Future challenges for regulating the unknown

As AI adoption increases, the Indonesian government is moving quickly to establish the legal frameworks needed to manage the societal and national security risks of AI.

On January 21, 2026, Indonesia’s Ministry of Communication and Digital announced that it would soon issue two national-level regulations to provide a comprehensive legal framework for responsible AI development within the archipelago nation. These two regulations would establish a clear national AI roadmap and AI security/ethics guidelines.

The necessity of this governance is easily highlighted through the recent controversies surrounding AI in the region. On February 2, 2026, Indonesia’s Ministry of Communication and Digital Affairs reversed its ban on X’s (formerly Twitter) Grok AI chatbot, which was previously imposed alongside Malaysia just one month prior over concerns that it could create sexually explicit deepfakes involving women and children. This has secondary implications for information operations, with dis- and mis-information campaigns playing an increasingly large role in both domestic and international politics.

This rapid reversal suggests that Indonesia’s regulatory environment is still finding its footing, highlighting the need to keep communication channels and technologies open.

Conclusion

Indonesia’s journey to becoming a digital power is defined by its current predicament: balancing its AI ambitions with foreign partnerships. Its simultaneous investment in physical and digital infrastructure and in human capital, set against its partnerships with giants like China, highlights the delicate balance Indonesia is striking to secure its future.

The real test will be whether Indonesia can leverage Chinese technological scale and financing without creating new forms of dependency and compromising its national security. In an era increasingly shaped by strategic competition between the U.S. and China, Indonesia’s approach may offer a preview of how middle powers attempt to chart a path between competing digital spheres of influence.