2025-12-21 23:39:40
Verba is a new platform that lets you build AI characters that come alive across your favourite services. Instead of coding complicated bots, you can define a personality, voice and knowledge base in minutes.
With Verba, once your character is created, you can deploy it on Discord, X (Twitter) and Bluesky so users can chat with it anywhere.
If you’re curious to try it out, visit verba.ink to sign up and start building your first AI character. We’re excited to see what you create!
2025-12-21 23:35:04
When you hear AI agents, what comes to mind?
For many people, it’s sophisticated bots running around on their own, something straight out of a sci-fi movie. And while there is a tiny bit of truth in that, the reality is far less dramatic and far more useful.
At their core, AI agents are simply software systems that can think, decide, and act on their own with little to no human intervention.
Think of them as smart assistants, not robots.
Imagine telling your assistant:
"I need a weekly report."
Instead of just giving suggestions, the agent pulls the data, compiles the report, and sends it to the right people.
That’s an AI agent in action.
This is where most confusion happens.
Chatbots are reactive.
They respond to prompts and guide you through tasks.
A chatbot can draft the report, suggest a structure, and answer your questions.
But you still do the execution: copying, pasting, sending, deciding the next step.
AI agents go a step further.
They don’t just assist; they execute.
An AI agent can take the same request and carry it out end to end, without waiting for you at each step.
Chatbots talk. AI agents act.
AI agents work because of a few key capabilities working together:
AI agents can remember past interactions, data, or outcomes.
This allows them to learn from previous tasks and improve over time.
They can break down complex tasks into smaller, logical steps.
Instead of being told every single action, they figure out how to get things done.
AI agents can use external tools, such as APIs, databases, calendars, email, and code execution environments.
This is what allows them to actually do things, not just suggest them.
Once given a goal, an AI agent can work independently, planning and executing the steps without constant check-ins.
This autonomy is what separates agents from traditional automation.
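To make this concrete, here is a minimal, illustrative Python sketch of how memory, planning, tool use, and autonomy fit together in a single loop. Nothing in it is tied to a specific product: `call_llm`, `send_email`, and `query_sales_db` are hypothetical stand-ins for a real model call and real integrations.

```python
# Illustrative only: `call_llm` and the two tools are hypothetical stand-ins for a
# real model API and real integrations (email, databases, calendars, and so on).

def send_email(to: str, body: str) -> str:
    return f"email sent to {to}"                     # placeholder side effect

def query_sales_db(week: str) -> str:
    return f"revenue for {week}: 42,000"             # placeholder data

TOOLS = {"send_email": send_email, "query_sales_db": query_sales_db}

def call_llm(goal: str, memory: list[str]) -> dict:
    # Toy stand-in for a model call: fetch data first, then email it, then stop.
    if not memory:
        return {"tool": "query_sales_db", "args": {"week": "2025-W51"}}
    if len(memory) == 1:
        return {"tool": "send_email",
                "args": {"to": "team@example.com", "body": memory[0]}}
    return {"done": True}

def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    memory: list[str] = []                           # memory: what has happened so far
    for _ in range(max_steps):                       # autonomy: loop without human input
        action = call_llm(goal, memory)              # planning: decide the next step
        if action.get("done"):
            break
        result = TOOLS[action["tool"]](**action["args"])   # tool use: act in the world
        memory.append(f"{action['tool']} -> {result}")
    return memory

print(run_agent("send me a weekly sales report"))
```

In a real system the toy `call_llm` would be an actual model call, but the shape of the loop stays the same.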
Here’s where it really clicks.
Scenario: Scheduling a team meeting
With a chatbot, you ask for suggested times, then check calendars, send the invites, and resolve conflicts yourself.
With an AI agent, it checks everyone’s availability, picks a slot, sends the invites, and reschedules if someone declines.
You don’t manage the steps; the agent does.
AI agents can take over the repetitive, multi-step busywork that slows teams down.
This frees humans to focus on judgment, creativity, and the decisions that actually need a person.
AI agents come in different forms depending on what they’re built to do:
Task Agents - Focus on a single job (e.g., sending reports, answering tickets).
Workflow Agents - Handle longer chains of steps and decisions.
Multi-Agent Systems - Multiple agents working together, each with a role, like a team.
Embedded Agents - Built directly into apps or systems to quietly assist in the background.
You can think of an AI agent’s workflow like this:
Request → Think → Decide → Act → Observe → Improve
No sci-fi. Just structured decision-making.
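For readers who think in code, the same loop can be sketched in a few lines of Python. The stage methods below mirror the diagram; their bodies are deliberately trivial placeholders for a real model call and real tools.

```python
# The stages mirror the diagram above; the bodies are placeholders, not a real product.

class Agent:
    def __init__(self):
        self.lessons: list[str] = []              # carried forward between requests

    def think(self, request: str) -> str:
        return f"plan for: {request}"             # e.g. ask a model to draft a plan

    def decide(self, plan: str) -> str:
        return plan.split(":")[-1].strip()        # pick the next concrete step

    def act(self, step: str) -> str:
        return f"executed '{step}'"               # call a tool, API, or script

    def observe(self, outcome: str) -> bool:
        return "executed" in outcome              # did it work?

    def improve(self, outcome: str, ok: bool) -> None:
        self.lessons.append(f"{outcome} -> {'ok' if ok else 'retry differently'}")

    def handle(self, request: str) -> str:
        plan = self.think(request)
        step = self.decide(plan)
        outcome = self.act(step)
        self.improve(outcome, self.observe(outcome))
        return outcome

print(Agent().handle("schedule the weekly team meeting"))
```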
AI agents aren’t here to replace humans or take over the world.
They’re here to handle the busy work: the repetitive, time-consuming tasks that slow teams down.
Think of them as reliable digital coworkers:
quiet, tireless, and surprisingly practical.
2025-12-21 23:33:17
Google’s Gemini 3 Pro arrived as a headline-grabbing multimodal model that Google positions as a major step forward in reasoning, agentic workflows, and coding assistance. In this long-form piece I set out to answer one clear question: Is Gemini 3 Pro good for coding? Short answer: Yes — with important caveats. Below you’ll find evidence, use-cases, limitations, and concrete adoption advice so teams and individual developers can decide how to use Gemini 3 Pro effectively and safely.
Currently, CometAPI (which aggregates over 500 AI models from leading providers) integrates the Gemini 3 Pro and Gemini 3 Flash APIs, and its API discounts are very cost-effective. You can first test the coding capabilities of Gemini 3 Pro in the CometAPI interactive window, or script a quick check as sketched below.
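If you prefer to script that first test, a sketch like the following should work against any OpenAI-compatible chat endpoint. The base URL and the `gemini-3-pro` model identifier below are assumptions; confirm both in your CometAPI dashboard before relying on them.

```python
# Assumptions: CometAPI exposes an OpenAI-compatible endpoint at this base_url and
# lists the model as "gemini-3-pro"; verify both in your CometAPI dashboard.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cometapi.com/v1",   # assumed endpoint
    api_key="YOUR_COMETAPI_KEY",
)

resp = client.chat.completions.create(
    model="gemini-3-pro",                     # assumed model identifier
    messages=[{
        "role": "user",
        "content": "Write a Python function that parses an ISO-8601 duration string "
                   "into a datetime.timedelta, plus unit tests for the edge cases.",
    }],
)
print(resp.choices[0].message.content)
```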
Gemini 3 Pro is the flagship release in Google’s Gemini 3 family — a multimodal (text, code, image, audio, video) model series built to improve depth of reasoning and agentic capabilities. Google launched Gemini 3 Pro in mid-November 2025 and positioned it explicitly as their “best vibe coding model yet,” making strong claims about reasoning, multimodal understanding, and integration into developer toolchains.
Why it matters: unlike earlier assistants that were optimized primarily for natural-language assistance or shorter code snippets, Gemini 3 Pro was designed from the ground up for deeper, longer-form reasoning and more autonomous agent-style coding — e.g., generating multi-file projects, running terminal-like operations via agents, and integrating with IDEs and CI systems. For teams that want an AI to do more than patch single functions — to scaffold applications, propose architecture changes, and handle multi-step development tasks — Gemini 3 Pro signals a new capability tier.
Three specs stand out for coding workflows: a very large context window, native multimodal input, and deeper step-by-step reasoning.
These characteristics are promising on paper for coding tasks: large context reduces the need to compress or summarize repositories, multimodality helps when debugging from error screenshots or log attachments, and better reasoning helps with architecture and complex bug triage.
Gemini 3 Pro consistently produces idiomatic code and — importantly — shows an improved ability to reason about architecture and multi-file projects. Several hands-on reports demonstrate that it can generate scaffolded applications (frontend + backend), translate designs into working prototypes, and refactor larger codebases with fewer context-limitation problems than earlier models. However, real-world correctness still depends on prompt quality and human review: the model can still introduce subtle logical errors or make unsafe assumptions about environment state.
One of Gemini 3 Pro’s headline features is agentic or autonomous coding — the ability to reason about tasks, run through multi-step workflows, and interact with tools (via API or a sandboxed execution environment). Benchmarks such as Terminal-Bench show that the model is substantially better at tasks requiring command-line navigation, dependency management, and debugging sequences. For developers who use AI to triage bugs, create debugging scripts, or automate deployment tasks, Gemini 3 Pro’s agentic abilities are a major plus. But caution: those features require secure gating and careful sandboxing before giving the model access to production systems.
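In practice, that gating can be a thin wrapper that every agent-proposed command must pass through. The sketch below shows one simple, generic approach (an allowlist plus interactive approval); it is not a Gemini-specific API, and a container or dedicated sandbox is stronger still for anything beyond read-only commands.

```python
# Generic gate for agent-proposed shell commands: allowlist + human approval.
# Not a Gemini-specific API; a container or dedicated sandbox is stronger still.
import shlex
import subprocess

ALLOWED_BINARIES = {"ls", "cat", "grep", "pytest", "pip"}   # tune to your workflow

def run_agent_command(command: str, require_approval: bool = True) -> str:
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_BINARIES:
        raise PermissionError(f"{command!r} is not on the allowlist")
    if require_approval:
        answer = input(f"Agent wants to run {command!r}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return "skipped by reviewer"
    # No shell=True, bounded runtime, captured output.
    result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr
```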
While Gemini 3 Pro’s reasoning strength is excellent for larger tasks, latency can be higher than some competitors when making small iterative edits (fixes, micro-refactors). For workflows that need rapid, repeated edit cycles (e.g., pair programming with instant suggestions), models optimized for low-latency completions may still feel snappier.
A major caveat: independent evaluations focused on factual accuracy show that even top models struggle with absolute factual correctness in some contexts. Google’s own FACTS-style benchmarks show non-trivial error rates when models are asked to retrieve or assert factual information, and Gemini 3 Pro scored around 69% accuracy on a new FACTS benchmark designed by Google researchers — indicating meaningful room for improvement in absolute reliability. For code, that means the model can confidently produce plausible but incorrect code (or incorrect citations, commands, or dependency versions). Always plan for human review and automated testing.
When a model generates dependency updates, bash commands, or infrastructure-as-code, it can introduce supply-chain risks (e.g., suggesting a vulnerable package version) or misconfigure access controls. Because of Gemini 3 Pro’s agentic reach, organizations must add policy controls, code-scanning, and restricted execution sandboxes before integrating the model into CI/CD or deploy pipelines.
Gemini 3 Pro can be used as a pre-commit reviewer or as part of code-review automation to flag potential bugs, propose refactors, or generate test cases. Early adopters reported it helped generate unit tests and end-to-end test skeletons quickly. Still, automated acceptance criteria should include human verification and failing builds for any model-suggested changes that affect security or architecture.
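As a concrete (and deliberately generic) illustration, a pre-commit hook can route the staged diff through a model and abort the commit when the reviewer returns blocking findings. The `review_with_model` function below is a stub; wire it up to whichever review model and prompt you settle on.

```python
# Sketch of a pre-commit gate: review the staged diff with a model and abort the
# commit on blocking findings. `review_with_model` is a stub for your actual API call.
import subprocess
import sys

def staged_diff() -> str:
    return subprocess.run(["git", "diff", "--cached"],
                          capture_output=True, text=True, check=True).stdout

def review_with_model(diff: str) -> list[str]:
    """Return a list of blocking findings for the given diff.
    Stubbed here; prompt your reviewer model to return only concrete blockers."""
    return []

if __name__ == "__main__":
    findings = review_with_model(staged_diff())
    for finding in findings:
        print(f"BLOCKING: {finding}")
    sys.exit(1 if findings else 0)   # a non-zero exit aborts the commit
```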
By many measures, Gemini 3 Pro is a top-tier contender. Public comparisons and trackers show it outranking many prior models on reasoning and long-context tasks, and often matching or edging out competitors on coding benchmarks. That said, the model ecosystem in late-2025 is highly competitive: OpenAI released newer GPT models (e.g., GPT-5.2) with explicit improvements to coding and long-context tasks in direct response to competitor progress. The market is therefore fast-moving, and “best” is a moving target.
SWE-Bench is designed to evaluate real-world software engineering tasks: given a code repository + failing tests or an issue, can a model produce a correct patch that fixes the problem?
Data table:
| Model | SWE-Bench Verified Score |
|---|---|
| Claude Opus 4.5 | ~80.9% (highest among competitors) |
| GPT-5.2 (standard) | ~80.0% (close competitor) |
| Gemini 3 Pro | ~74.2–76.2% (slightly behind the others) |
Terminal-Bench 2.0 evaluates a model’s ability to complete multi-step coding tasks that approximate real developer agent behavior (file edits, tests, shell commands).
| Model & Variant | Terminal-Bench 2.0 Score (%) |
|---|---|
| Claude Opus 4.5 | ~63.1% |
| Gemini 3 Pro (Stanford Terminus 2) | ~54.2% |
| GPT-5.2 (Stanford Terminus 2) | ~54.0% |
τ2-bench (tau-2) and similar tool-use evals measure an agent’s ability to orchestrate tools (APIs, Python execution, external services) to complete higher-level tasks (telecom/retail automations, multi-step workflows). Toolathlon, OSWorld, Vending-Bench, and other specialized arenas measure domain-specific automation, long-horizon agentic competence, or environment interaction.
Gemini 3 Pro: DeepMind reports very high τ2-bench / agentic tool-use numbers (e.g., τ2-bench ≈ 85.4% in their table) and strong long-horizon results on some vendor tests (Vending-Bench mean net worth numbers).
LiveCodeBench Pro focuses on algorithmic / competitive programming problems (Codeforces-style), often reported as Elo ratings derived from pass@1 / pass@k comparisons and pairwise matches. This benchmark emphasizes algorithm design, reasoning about edge cases, and concise, correct implementations.
Gemini 3 Pro (DeepMind): DeepMind reports a LiveCodeBench Pro Elo ≈ 2,439 for Gemini 3 Pro (their published performance table). Gemini 3 Pro shows particularly strong competition/algorithmic performance in DeepMind’s published numbers (high Elo), which aligns with anecdotal and independent tests that Google’s model is strong on algorithmic problems and coding puzzles.
The best, most-relevant benchmarks for judging coding capability today are SWE-Bench (Verified and Pro) for real repo fixes, Terminal-Bench 2.0 for agentic terminal workflows, and LiveCodeBench Pro for algorithmic / competition skill. Vendor disclosures place Claude Opus 4.5 and GPT-5.2 at the top of SWE-Bench Verified (~80% range) while Gemini 3 Pro shows especially strong algorithmic and agentic numbers in DeepMind’s published table (high LiveCodeBench Elo and solid Terminal-Bench performance).
All three vendors highlight agentic / tool-use competence as a primary advancement. Reported scores vary by task: Gemini is emphasized for tool chaining & long context / multimodal reasoning, Anthropic for robust code+agent workflows, and OpenAI for long-context and multi-tool reliability.
Gemini 3 Pro excels at large-context, multi-file reasoning, agentic terminal workflows, algorithmic problem solving, and multimodal debugging from screenshots, logs, and designs.
It may be less attractive when you need rapid, low-latency edit cycles, the highest SWE-Bench-style patch accuracy, or factual reliability without human review.
Google has rolled out integrations and guidance that make Gemini 3 Pro useful inside real development environments, from IDE assistants to CLI and CI tooling.
These integrations are what make Gemini 3 Pro particularly useful: they allow end-to-end loops where the model can propose changes and then run tests or linters to confirm behavior.
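A minimal version of that end-to-end loop, independent of any particular IDE or vendor tooling, looks something like the sketch below: apply a model-proposed patch, keep it only if the test suite still passes, and roll it back otherwise. The function names and repo layout are illustrative.

```python
# Illustrative test-gated loop: a model-proposed patch is kept only if the suite
# still passes. Function names and repo layout are assumptions, not vendor APIs.
import subprocess

def apply_patch(patch: str) -> None:
    subprocess.run(["git", "apply", "-"], input=patch, text=True, check=True)

def tests_pass() -> bool:
    return subprocess.run(["pytest", "-q"]).returncode == 0

def try_model_patch(patch: str) -> bool:
    apply_patch(patch)
    if tests_pass():
        return True                                       # keep it for human review
    subprocess.run(["git", "checkout", "--", "."], check=True)   # roll the tree back
    return False
```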
Gemini 3 Pro is very good for coding if you treat it as a powerful, multimodal assistant integrated into an engineering workflow that includes execution, tests, and human review. Its combination of reasoning, multimodal input, and agentic tool support elevates it beyond a mere autocomplete; it can act like a junior engineer that drafts, tests, and explains changes. But it is not a replacement for experienced developers — rather, a force multiplier that lets your team focus on design, architecture, and edge cases while it handles scaffolding, iteration, and routine fixes.
2025-12-21 23:31:05
OpenAI announced GPT Image 1.5, the company’s new flagship image-generation and editing model, and shipped a refreshed “ChatGPT Images” experience across ChatGPT and the API. OpenAI markets the release as a step toward production-grade image creation: stronger instruction following, more precise edits that preserve important details (faces, lighting, logos), output that’s up to 4× faster, and lower image input/output costs in the API. The good news is that CometAPI has integrated GPT Image 1.5 (gpt-image-1.5) and offers a lower price than OpenAI.
GPT Image 1.5 is OpenAI’s latest generation image model, released as the engine behind a rebuilt ChatGPT Images experience and made available through the OpenAI API as gpt-image-1.5. OpenAI positions it not just as a novelty art tool but as a production-ready creative studio: it aims to make precise, repeatable edits and to support workflows like ecommerce catalogs, brand asset variant generation, creative asset pipelines, and fast prototyping. OpenAI explicitly highlights advances in preserving important image details—faces, logos, lighting—and in following step-by-step editing instructions.
Two operational details to remember: GPT Image 1.5 renders images up to four times faster than its predecessor, and image inputs/outputs are ~20% cheaper in the API compared with GPT Image 1.0 — both important for teams that iterate a lot. The new ChatGPT Images UI also adds a dedicated sidebar workspace, preset filters and trending prompts, and a one-time “likeness” upload for repeated personalizations.
OpenAI’s image line has moved from DALL·E → multiple internal image experiments → GPT Image 1 (and smaller variants). Compared with earlier OpenAI image models (e.g., GPT-image-1 and earlier ChatGPT image stacks), 1.5 is explicitly optimized for precise, repeatable edits, preservation of faces, logos, and lighting, stronger instruction following, and more legible in-image text.
In short: instead of treating image generation as a one-off “art toy,” OpenAI is pushing image models toward predictable, repeatable tools for creative teams and enterprise workflows.
GPT Image 1.5 is performing strongly across several image-generation and editing leaderboards published since launch. LMArena reports GPT Image 1.5 ranking at or near the top of text-to-image and image-editing leaderboards, sometimes narrowly ahead of competitors like Google’s Nano Banana Pro.
One of the headline features for GPT Image 1.5 is precise editing that preserves “what matters”: when you ask the model to change a particular object or attribute it aims to change only that element while keeping composition, lighting, and people’s appearance consistent across edits. For brands and ecommerce teams this translates to fewer manual touchups after automated edits.
OpenAI reports that image generation in ChatGPT Images is up to 4× faster than before, with ~20% cheaper image I/O costs in the API compared to GPT Image 1. That’s a product-level claim: faster render time means you can iterate more images in the same session, start additional generations while others are still processing, and reduce friction in exploratory workflows. Faster inference not only reduces latency for end-users, it also lowers energy per request and operational cost for deployments. Note: “up to” means real-world gains will depend on prompt complexity, image size, and system load.
Stronger instruction following versus GPT Image 1.0: the model is better at interpreting multi-step prompts and retaining user intent across chained edits. OpenAI also highlights improved text rendering (legible text embedded in images) and better small-face rendering, while still flagging multilingual/text rendering limits in some edge cases; overall, the model aims to close the longstanding gap where generated images produced illegible or nonsensical signage.
Nano Banana Pro (branded in Google’s Gemini family as Gemini 3 Pro Image / Nano Banana Pro) is Google/DeepMind’s studio-grade image model. Google emphasizes excellent text rendering, multi-image composition (blend many images into one), and integration with broader Gemini capabilities (search grounding, locale-aware translations, and enterprise workflows in Vertex AI). Nano Banana Pro aims to be production-ready for designers who need high fidelity and predictable text layout inside images.
Qwen-Image (from the Qwen/Tongyi family) is an image model released by Alibaba that has been evaluated across academic and public benchmarks. The Qwen team’s technical report documents strong cross-benchmark performance (GenEval, DPG, OneIG-Bench) and highlights particular strengths in prompt understanding, multilingual text rendering (notably Chinese), and robust editing. Qwen-Image is often discussed as one of the leading open-source / enterprise-friendly options outside the US hyperscalers.
There are two ways to access GPT Image 1.5:
ChatGPT (UI) — GPT Image 1.5 powers the new ChatGPT Images experience (Images tab). Use it to generate from text, upload images and make edits, or iterate interactively.
API — Use the Image API (/v1/images/generations and /v1/images/edits) to generate and edit images with gpt-image-1.5. Responses are base64-encoded images for GPT image models.
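For orientation, here is a hedged sketch of both endpoints using the OpenAI Python SDK. The calls shown (`images.generate` and `images.edit`) exist in the SDK, but the exact options supported by `gpt-image-1.5` (sizes, quality flags) should be checked against the current API reference.

```python
# Sketch using the OpenAI Python SDK; verify supported sizes/options for gpt-image-1.5.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Text-to-image via /v1/images/generations
result = client.images.generate(
    model="gpt-image-1.5",
    prompt="A product photo of a matte-black water bottle on a marble countertop",
    size="1024x1024",
)
with open("bottle.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))  # GPT image models return base64

# Targeted edit via /v1/images/edits: change one attribute, preserve the rest
edit = client.images.edit(
    model="gpt-image-1.5",
    image=open("bottle.png", "rb"),
    prompt="Make the bottle forest green; keep the lighting and composition unchanged",
)
with open("bottle_green.png", "wb") as f:
    f.write(base64.b64decode(edit.data[0].b64_json))
```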
As noted above, CometAPI has integrated GPT Image 1.5 (gpt-image-1.5) at a lower price than OpenAI, and you can use it to run and compare Nano Banana Pro and Qwen-Image side by side.
Pin to a dated model snapshot where one is published (e.g., gpt-image-1.5-YYYY-MM-DD). Faster generation can reduce latency and (depending on pricing) cost per image, but enterprise usage should measure both throughput and token/compute pricing.
GPT Image 1.5 reduces certain failure modes (bad edits, inconsistent faces) but does not eliminate hallucinated or biased outputs. Like other generative models, it can reproduce cultural biases or produce inaccurate depictions if prompts are poorly specified. Implement guardrails: content filters, human review, and test suites that reflect expected edge cases.
If your project needs high-quality image generation or robust, iterative editing within conversational workflows (for example: marketing creatives, product mockups, virtual try-ons, or an image-enabled SaaS product), GPT Image 1.5 is a strong candidate.
2025-12-21 23:30:03
Builders often define themselves by what they create and the tools they use. It's natural - after all, that's where we spend our time and direct our focus. But what if this attachment to building is actually getting in the way of our primary job: solving problems?
In DEV347, Ben Kehoe, Distinguished Engineer at Siemens and AWS Serverless Hero, delivered a thought-provoking session that challenges how we think about being builders. His central thesis: "A builder's true job is solving problems, which may or may not require building something."
Watch the Full Session:
Ben identified three common mindsets that can limit our effectiveness:
Identifying with your tools - Whether it's VI versus Emacs, SQL versus NoSQL, or any other technology choice, when tools become part of our identity, disagreement feels like a personal threat. This leads to flame wars and poor decision-making.
Identifying with your output - When we view our code as "our baby," not-invented-here syndrome takes hold. If someone suggests our creation isn't useful, it feels like an attack on ourselves.
Isolation from greater purpose - When we narrow our focus to just delivering to the next team without understanding the broader organizational context, we lose sight of why customers would care about our work.
These mindsets share a common problem: they focus on outputs (the things we build) rather than outcomes (the problems we solve).
Ben drew powerful analogies from software development practices to illustrate why examining our mindset is valuable:
Continuous Deployment - The value isn't just faster time to market. It's that continuous deployment forces you to be good at testing, rollback, and mean time to resolution. It creates accountability and requires humility - you're acknowledging you won't get it perfect, so you need robust safety mechanisms.
You Build It, You Run It - When the same team is responsible for both development and operations, they're held accountable for software quality. If it's hard to run, that's their problem to solve.
Blameless Postmortems - We don't assign blame not because there's nobody to blame, but because blaming people prevents us from understanding why things went wrong. We prioritize introspection over finger-pointing.
As Ben noted: "Humility and self-confidence are not mutually exclusive. Humility and ego are." Self-confidence means believing in your abilities while remaining open to being wrong and always improving.
Ben's core argument centers on a crucial distinction: outcomes are assets, implementations are liabilities.
"When we talk about a software feature, the feature is the asset, but the implementation is a liability. At best, in a perfect world, it does exactly what you expect it to do. The only thing your implementation can do is subtract value from the feature."
This reframes everything. Every line of code you write is a potential source of bugs, maintenance burden, and technical debt. The feature itself - the problem it solves - is what has value.
This perspective also helps us understand our place in the organization. When you focus on outputs, you only see the next step in the value chain. When you focus on outcomes, you can trace how your work connects to customer needs through multiple teams and layers.
Ben referenced the classic saying: "People don't buy drills, they buy holes." Understanding the outcome (the hole) rather than just the output (the drill) allows you to work with customers to find better solutions than what they initially requested.
View tools as means to an end - Don't identify with your technology choices. Understand yourself as adaptable. If your dev setup has to be exactly right or you can't work, that's brittle - and we don't want brittle systems.
Focus on total cost of ownership - The AWS bill is only part of the cost. A system that runs cheaply but requires constant operational attention may be more expensive than one with higher infrastructure costs but minimal maintenance. Consider costs across you, your team, your organization, and your company.
Optimize for runtime over development time - Code runs far more often than it's built. Optimize for ongoing maintenance and operations costs rather than development speed.
Understand technical debt strategically - Ben made a brilliant point: if you ask your finance team whether your company should have zero debt, they'll say no. Debt is leverage. The same applies to technical debt. "If you have none of it, you are not moving fast enough." Take on technical debt strategically when it provides value, but have a plan to pay both the interest (keeping it running) and the principal (fixing it properly).
Here's an uncomfortable truth: "The feeling of productivity is a poor proxy for actual productivity."
When you're in flow state writing code, that feeling is about your outputs - how effectively you're creating code. But whether that code is the right thing to create isn't something you feel in the moment.
Ben shared a story about AWS Step Functions in 2017. Back then, Step Functions was hard to use - writing JSON, wrestling with JSONPath, poor documentation. It felt unproductive and frustrating. But if you spent that afternoon struggling with it, you often ended up with something you never had to touch again. It just worked, scaled automatically, and ceased to occupy mental space.
Compare that to building the same workflow in a different way that felt more productive in the moment but required ongoing maintenance, updates, and occasional firefighting. The total cost of ownership was much higher, even though it felt better to build.
Throwing away code should be a joyous act - When a feature is no longer needed, or when a third-party service or AWS service now does what you built, celebrate. That implementation was always a liability. Now it's off your balance sheet and you can create something else valuable.
This requires not identifying with your output. That code you poured blood, sweat, and tears into isn't you. If it's no longer useful, let it go and do something else useful.
When your organization changes - through reorgs, new strategies, or different ways of working - understanding your place through outcomes rather than outputs makes you resilient.
If you're not your tools, then changing what tools you use or what you're building doesn't threaten your identity. You can understand how you're serving the organization in new ways, potentially moving up the value chain or into different value chains entirely.
Ben addressed the elephant in the room: how does AI change this mindset?
Accountability still lies with you - Currently, if AI gets it wrong, you're accountable. This means you need to know what good looks like to catch and fix problems.
Apply the same thinking to AI-generated code - If AI is generating boilerplate, ask why that boilerplate is necessary. Reducing boilerplate means less code that can be wrong, whether written by humans or AI.
The senior/junior engineer gap - Historically, new technologies allowed junior engineers to compete with senior engineers by eliminating things seniors had to worry about. AI works differently. It's more effective in the hands of senior engineers who can evaluate its output and fix problems. Junior engineers using AI may not develop the skills they need because AI is doing the work for them. This creates a training challenge we must address.
Quality versus cost curves are shifting - AI can create something that works at extraordinarily low cost (vibe coding). But for very high-quality software, AI lowers costs much less dramatically. This changes business decisions about acceptable quality levels. "High quality software is not the point." The appropriate level of quality is a business decision based on needs, not an absolute standard.
We are not artisans - Software isn't craft work requiring perfection. It's easy to make software (though not easy to make good software). We're not creating objects of beauty; we're solving business problems.
Focus on outcomes over outputs - Features are assets. Implementations are liabilities. The less you have to build to achieve an outcome, the better.
Don't identify with your tools or output - You are not your technology choices. You are not your code. These are means to an end.
Understand total cost of ownership - Consider all costs: development, operations, maintenance, opportunity cost. Balance them intelligently.
Be strategically comfortable with technical debt - Like financial debt, technical debt can be leverage. Use it wisely.
Embrace change and obsolescence - When your code becomes unnecessary, celebrate. When the organization shifts, adapt by focusing on the outcomes you provide.
Builders are problem solvers - The thing that makes us builders is that our mechanism for solving problems is to build. But building is the tool, not the job.
As Ben concluded: "When we ask who am I, and we acknowledge that we are not our tools, that we are not our output, we can answer that we are problem solvers. That is the thing that we do. We are asked to solve problems in interesting ways. The thing that makes us different from other people, where we get that label, is that the mechanism by which we solve problems is to build, and that is what makes us builders."
This post is part of DEV Track Spotlight, a series highlighting the incredible sessions from the AWS re:Invent 2025 Developer Community (DEV) track.
The DEV track featured 60 unique sessions delivered by 93 speakers from the AWS Community - including AWS Heroes, AWS Community Builders, and AWS User Group Leaders - alongside speakers from AWS and Amazon, covering a wide range of cutting-edge topics.
Each post in this series dives deep into one session, sharing key insights, practical takeaways, and links to the full recordings. Whether you attended re:Invent or are catching up remotely, these sessions represent the best of our developer community sharing real code, real demos, and real learnings.
Follow along as we spotlight these amazing sessions and celebrate the speakers who made the DEV track what it was!