2026-01-26 01:36:58
It was 2 AM on a Tuesday, and I was staring at my browser's network tab again. The numbers were brutal:
TTFB: 847ms
For the third client project in a row, I was hitting the same wall. The headless WordPress site looked beautiful—modern React frontend, slick animations, perfect design. But the performance? Unacceptable.
"Just use WPGraphQL," everyone said. "It's the standard for headless WordPress."
So we did. And it was killing our Core Web Vitals.
This is the story of why we built Headless Bridge, and why WPGraphQL's approach to headless WordPress APIs is fundamentally flawed for most use cases.
When WPGraphQL launched, it was revolutionary. Finally, a proper GraphQL API for WordPress! No more wrestling with the clunky REST API. You could query exactly what you needed, nest relationships, and build truly decoupled WordPress sites.
The promise was incredible. But the reality looked more like this:
# A simple query to get 10 blog posts
query {
  posts(first: 10) {
    edges {
      node {
        id
        title
        excerpt
        featuredImage {
          node {
            sourceUrl
            mediaDetails {
              width
              height
            }
          }
        }
        author {
          node {
            name
          }
        }
        categories {
          edges {
            node {
              name
            }
          }
        }
      }
    }
  }
}
This "simple" query to fetch 10 blog posts would trigger more than a dozen database queries—plus all the PHP work to resolve and serialize them—on every single request.
And this was on a good day.
The turning point came with a high-traffic client project. A content publisher with 50,000+ posts and millions of monthly visitors.
Week 1: Everything seemed fine in development.
Week 2: Staging environment started showing cracks. API responses were hitting 1-2 seconds.
Week 3: Launch day. Within hours, the site was crawling. TTFB spiked to 3+ seconds during peak traffic.
Week 4: Emergency client meeting. "Why is our $50,000 headless WordPress site slower than our old WordPress theme?"
We tried every optimization we could find.
Result: Marginal improvement. TTFB dropped from 3 seconds to 800ms. Still terrible.
The client threatened to cancel the project and revert to their old WordPress theme.
After weeks of profiling, benchmarking, and digging through WPGraphQL's internals, I finally understood the fundamental issue:
Every single API request parses the GraphQL query, resolves the schema, runs its database queries, and builds the JSON response from scratch.
Every. Single. Request.
Think about that. Your blog post content doesn't change between requests. Your featured images don't change. Your author names don't change. But WPGraphQL recomputes everything from scratch for every request, as if the data is constantly changing.
It's like going to a restaurant where the chef shops for ingredients, cooks your meal from scratch, and washes dishes after every single order—even though you ordered the same dish as the person before you.
I was complaining about this to a friend over coffee when he asked a simple question:
"Why does the API need to compute anything at request time? Your content only changes when an editor hits 'Save', right?"
That's when it hit me.
Blog posts don't change at request time. They change at save time.
What if we pre-compiled the JSON response when content is saved, instead of computing it when requested?
This wasn't a new idea. Static site generators like Gatsby do this. But nobody was doing it inside WordPress for the API layer itself.
That weekend, I started prototyping.
The core concept was simple: do the expensive work once, when content is saved, and store the finished JSON so every request is just a lookup.
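Conceptually, the split looks something like this. Headless Bridge itself is a WordPress plugin (so the real implementation is PHP); this is only a language-agnostic sketch in TypeScript with invented names like payloadStore, not the plugin's actual code:

```typescript
// Conceptual sketch only — invented names, not the actual plugin code.
type CompiledPost = {
  id: string;
  title: string;
  content: string;
  excerpt: string;
  author: { name: string };
  categories: string[];
};

// Stand-in for wherever the pre-built JSON lives (custom table, object cache, ...).
const payloadStore = new Map<string, string>();

// Runs ONCE, when an editor hits "Save" — all the expensive work happens here.
function onPostSaved(post: CompiledPost): void {
  const json = JSON.stringify(post); // resolve relationships, serialize, store
  payloadStore.set(post.id, json);
}

// Runs on EVERY request — no resolvers, no joins, just a single lookup.
function handleApiRequest(postId: string): string {
  return payloadStore.get(postId) ?? JSON.stringify({ error: "not found" });
}
```

All the heavy lifting moves to save time; the request path does almost nothing.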
The first benchmark results were shocking:
| Metric | WPGraphQL | Headless Bridge (v0.1) |
|---|---|---|
| TTFB | 487ms | 52ms |
| DB Queries | 12 | 1 |
| Response Size | 15.3KB | 8.1KB |
| Speed Improvement | Baseline | 9.4x faster |
Nearly 10x faster with just a prototype.
Of course, there are tradeoffs. Nothing is free in software.
With WPGraphQL, you can query exactly what you want:
query {
  posts {
    title  # Just the title
  }
}
With Headless Bridge, you get a fixed JSON structure:
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "My Blog Post",
  "content": "...",
  "excerpt": "...",
  "featured_image": {...},
  "author": {...},
  "categories": [...]
}
But here's the thing: in practice, 95% of headless WordPress projects run the same handful of standard queries—latest posts, a single post by slug, posts in a category, and static pages.
You almost never need GraphQL's complex querying capabilities. And when you do, you're usually better off implementing that logic in your frontend or using a dedicated search service like Algolia.
Trade query flexibility for 10x performance? That's a deal most developers will take.
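On the frontend, a fixed response shape means the data layer collapses to a typed fetch. A minimal sketch—the endpoint path and field types here are assumptions based on the JSON shape above, so check the plugin docs for the real routes:

```typescript
// Endpoint path and types are assumptions for illustration, not the documented API.
const API_BASE = "https://example.com/wp-json/headless-bridge/v1";

type BridgePost = {
  id: string;
  title: string;
  content: string;
  excerpt: string;
  featured_image: Record<string, unknown> | null;
  author: { name: string };
  categories: string[];
};

// One flat, predictable shape: one fetch, no query documents to maintain.
async function getLatestPosts(): Promise<BridgePost[]> {
  const res = await fetch(`${API_BASE}/posts?per_page=10`);
  if (!res.ok) throw new Error(`API error: ${res.status}`);
  return (await res.json()) as BridgePost[];
}
```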
With WPGraphQL, changes appear immediately in the API. With Headless Bridge, there's a small delay (typically 5-30 seconds) while content is recompiled in the background using WordPress's Action Scheduler.
But again: For 99% of content sites, instant updates don't matter. Blog posts, marketing sites, documentation—content changes a few times per day at most. A 10-second delay is totally acceptable in exchange for 10x faster performance.
If you need real-time updates (like a live sports site or stock ticker), Headless Bridge isn't for you. But most sites don't need that.
I rebuilt that failing client project using the Headless Bridge prototype. The before (WPGraphQL) and after (Headless Bridge) numbers spoke for themselves.
The client's exact words: "I don't know what you did, but this is exactly what I wanted from headless WordPress in the first place."
That prototype worked, but it wasn't production-ready. Over the next few months, we hardened it into a proper plugin.
We ran comprehensive benchmarks against the WordPress REST API, WPGraphQL, and Headless Bridge across three scenarios of increasing content volume—the tables below run from the smallest site to the largest:
| API | TTFB | DB Queries | Response Size |
|---|---|---|---|
| WordPress REST API | 245ms | 8 | 12KB |
| WPGraphQL | 387ms | 12 | 15KB |
| Headless Bridge | 48ms | 1 | 8KB |
| API | TTFB | DB Queries | Response Size |
|---|---|---|---|
| WordPress REST API | 512ms | 9 | 12KB |
| WPGraphQL | 847ms | 14 | 16KB |
| Headless Bridge | 51ms | 1 | 8KB |
| API | TTFB | DB Queries | Response Size |
|---|---|---|---|
| WordPress REST API | 1,240ms | 11 | 13KB |
| WPGraphQL | 2,150ms | 18 | 17KB |
| Headless Bridge | 53ms | 1 | 8KB |
The key insight: Headless Bridge performance stays flat regardless of content volume. WPGraphQL and REST API degrade significantly.
Headless Bridge is a great fit for:
Content-focused sites - Blogs, marketing sites, documentation, portfolios
High-traffic sites - News sites, magazines, publishers
Agency projects - Client sites with standard requirements
It's the wrong tool for:
Real-time applications - Live sports scores, stock tickers, chat apps
Complex data relationships - E-commerce with complex filters
Directory sites - Listings with thousands of search combinations
We decided to make Headless Bridge free and open source with optional Pro features.
The split is simple:
99% of personal projects can use the free version. Professional projects that need ACF or webhooks upgrade to Pro. Agencies with multiple clients get Agency licenses.
Building Headless Bridge taught us several lessons:
Just because WPGraphQL is the "standard" doesn't mean it's the best solution. Sometimes the best approach is to go back to first principles and rethink the problem.
Developers love flexible tools. But users don't care about GraphQL. They care about fast websites. Trade flexibility for performance every time.
For content that doesn't change often, pre-compilation is almost always faster than runtime computation. This applies beyond WordPress APIs.
Nested JSON structures are elegant in theory but painful in practice. Flat structures are easier to work with, smaller in size, and faster to parse.
Headless Bridge is available now.
Install it, run a benchmark against your current WPGraphQL setup, and see the difference for yourself.
We're actively developing new features, with more coming soon and a longer-term roadmap.
Want to contribute? Open a GitHub issue or PR. We're building this in public.
WPGraphQL is an impressive piece of engineering. For applications that need GraphQL's query flexibility, it's still a solid choice.
But for the vast majority of headless WordPress projects—blogs, marketing sites, documentation, portfolios—you don't need GraphQL's complexity. You need speed.
That's why we built Headless Bridge.
If you're frustrated with slow TTFB, degrading performance at scale, or server costs that keep climbing, give Headless Bridge a try. It might just save your project—like it saved ours.
Ready to 10x your headless WordPress API?
Download Headless Bridge Free →
Questions? Comments? Find me on Twitter @HBridgeWP or email [email protected]
Andy Ryan is a full-stack developer who specializes in headless WordPress and modern JavaScript frameworks. After years of frustration with WPGraphQL performance, he built Headless Bridge to solve the speed problem once and for all. When not coding, you can find him enjoying nature while rock climbing and hiking.
2026-01-26 01:35:28
Ever wanted to build an AI that can actually answer phone calls and have a real conversation? 🎙️
In this Part 2 of my AI Voice Agent series, I walk you through connecting ElevenLabs Conversational AI to Twilio to create a fully functional voice agent that can TALK BACK!
This tutorial covers building a bidirectional audio bridge: Twilio streams the caller's audio to the agent, and ElevenLabs streams the agent's spoken replies back to the call.
It's like having your own AI assistant that can answer calls 24/7!
If you found this helpful, drop a ❤️ and follow me for more AI & web dev content!
Have questions? Let me know in the comments below! 👇
2026-01-26 01:34:15
If you've been pasting entire src/ folders into ChatGPT and praying to the Silicon Gods, stop it. Get some help.
Enter Model-Context-Protocol (MCP).
It’s not just a fancy acronym used to impress your Product Manager (though it will do that). It’s the design pattern that stops your AI app from turning into a plate of unmaintainable spaghetti.
(Image: Your codebase right now. Don't lie.)
Let's look at how most people build their first AI app. It usually looks something like this disaster:
// classic_beginner_mistake.js
async function askAI(question) {
  // 🚩 RED FLAG: Hardcoded logic mixed with DB calls
  const context = await db.getUserHistory();

  // 🚩 RED FLAG: String bashing hell
  const prompt = `You are a helpful assistant. Here is history: ${JSON.stringify(context)}. User asks: ${question}`;

  // 🚩 RED FLAG: Married to OpenAI forever
  const response = await openAI.chat.completions.create({ model: "gpt-4", prompt });
  return response;
}
Why this sucks:
- Mixed concerns: database access, prompt construction, and the API call are all welded into one function.
- String bashing: the prompt is a fragile template literal that breaks the moment requirements change.
- Vendor lock-in: you're married to OpenAI forever—till 503 Service Unavailable do us part.

MCP separates these concerns into three distinct layers. Think of it like a fancy Michelin-star restaurant, but instead of food, we serve functions.
The Chef (Model) doesn't care who the customer is. They just know how to cook (generate text/code).
The Waiter (Context Manager) gathers what's relevant. They don't give the Chef the customer's entire life story including their childhood trauma. They say, "Table 5, allergy to peanuts, wants spicy."
The Menu (Protocol) is the standardized language everyone speaks. The customer points to item #4. The waiter writes "Item #4". The Chef cooks "Item #4".
Here is a pseudo-code example of what an MCP architecture looks like. Notice how it sparks joy?
// 1. Define the Protocol (The Contract)
interface AIRequest {
  task: "summarize" | "translate" | "generate_code";
  data: string;
  constraints: string[];
}

// 2. The Context Provider (The Waiter)
class ContextManager {
  getRelevantContext(userId: string): string {
    // Smart logic to only get what matters
    // "User prefers Python over JavaScript because they have taste."
    return "User prefers Python.";
  }
}

// 3. The Model Adapter (The Chef Wrapper)
class ModelAdapter {
  constructor(private provider: "openai" | "anthropic") {}

  async execute(request: AIRequest, context: string) {
    // Handles the weird specific API details here
    // So your main app can live in blissful ignorance
    if (this.provider === "openai") {
      return callOpenAI(request, context);
    } // ...
  }
}
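Wiring the three pieces together might look like this (callOpenAI is assumed to exist, as in the adapter above):

```typescript
// Usage sketch: the app only speaks the protocol; the adapter hides the vendor.
const ctx = new ContextManager();
const adapter = new ModelAdapter("openai");

const result = await adapter.execute(
  { task: "summarize", data: "Quarterly report text...", constraints: ["under 100 words"] },
  ctx.getRelevantContext("user-42"),
);
```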
By adopting the MCP pattern, you're not just over-engineering; you're building for the future.
This is just the tip of the iceberg. We haven't even talked about Agentic Workflows or Tool Use yet (which are basically MCP on steroids and caffeine).
In the next posts, we'll dive deeper into both.
Stay tuned, and remember: Always structure your prompts, or your prompts will structure you.
2026-01-26 01:33:57
Act 1: You discover the OpenAI API. You're drunk with power. "I can build Jarvis!" you scream into the void. You build a chatbot in 20 lines.
Act 2: Your PM asks for "just a few more features." You add them. Then more. Then you add "PDF support" which is just regex hoping for the best.
Act 3: You're staring at 2,000 lines of spaghetti, the context window is overflowing, the AI is hallucinating company policies that involve free pizza, and you've forgotten what happiness feels like.
(Image: A live look at your server logs)
This is the journey of every developer who touches LLMs. I'm here to tell you: it's not your fault, and there's a way out.
Here's how it starts. Twenty lines of beautiful, naive code:
// The honeymoon phase
import OpenAI from 'openai';
const openai = new OpenAI();
async function askAI(question: string) {
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' }, // Minimalist art
{ role: 'user', content: question }
]
});
return response.choices[0].message.content;
}
// It works! Ship it!
console.log(await askAI("What's the weather like?"));
You show your PM. They're impressed. You're a genius. Life is good. Ideally, you should stop here and retire.
Then the requests come: "Can it remember the conversation?" "Can it see the user's orders and open tickets?" "Can it book meetings and send emails?" "Can it match our brand voice?"
And you, the naive optimist, say "Sure!"
// Three weeks later... (Viewer discretion advised)
async function askAI(question: string, userId: string) {
  // Get conversation history (Loading... loading...)
  const history = await db.getConversationHistory(userId);

  // Get user context (All of it. Just in case.)
  const user = await db.getUser(userId);
  const recentOrders = await db.getRecentOrders(userId);
  const tickets = await supportSystem.getOpenTickets(userId); // Why do we need tickets? Who knows!

  // Build the mega-prompt from hell
  const systemPrompt = `
    You are a helpful assistant for ${COMPANY_NAME}.
    Current user: ${user.name} (${user.tier} tier)
    Recent orders: ${JSON.stringify(recentOrders)}
    Open tickets: ${JSON.stringify(tickets)}

    Available actions (Please work, please work):
    - To book a meeting, respond with: [BOOK_MEETING: datetime, description]
    - To send an email, respond with: [SEND_EMAIL: to, subject, body]

    Brand voice guidelines:
    ${BRAND_VOICE_DOCUMENT} // <- Goodbye, token budget

    Remember: Never mention competitors. Always be helpful. Be funny but not too funny.
  `;

  // ... (API Call) ...

  // Parse the response for actions using reliable technology: REGEX
  if (content.includes('[BOOK_MEETING:')) {
    // 60% of the time, it works every time
    const match = content.match(/\[BOOK_MEETING: (.*?), (.*?)\]/);
    if (match) {
      // ...
    }
  }
}
This code "works," but you're now dealing with three problems.
Problem 1: Token bloat. Your system prompt is 3,000 tokens. User history is 2,000. Customer data is 1,000. You're spending $5 per question to ask "Hi".
Problem 2: Fragile parsing. You're using regex to parse natural language. The model writes [BOOK MEETING] without the underscore and your app crashes.
Problem 3: Hallucination. The model confidently tells users about orders that don't exist because it's completing the pattern. "Your order of 500 Rubber Ducks is on the way!" (User ordered 1 pen.)
Here's the good news: these problems have solutions. Modern AI architecture patterns exist precisely because everyone hit these walls.
The key principles: separate concerns, let each tool own its validation and auth, and load only the context a request actually needs.
Here's what the same feature set looks like with proper architecture:
// With MCP-style architecture
const agent = new Agent({
  model: 'gpt-4',
  tools: [
    bookingTool, // Handles its own validation
    emailTool,   // Handles its own auth
  ],
  context: dynamicContextLoader(userId), // Loads what's needed
});

const response = await agent.run(question);
// That's it. Go home.
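The Agent, bookingTool, and dynamicContextLoader above are pseudo-code, but "tools handle their own validation" is a concrete idea. Here's a rough sketch of what a bookingTool might look like—the shapes and names are invented for illustration, not any specific SDK's API:

```typescript
// Invented shapes for illustration — not a specific framework's tool API.
type ToolResult = { ok: true; message: string } | { ok: false; error: string };

const bookingTool = {
  name: "book_meeting",
  description: "Book a meeting at a given ISO datetime with a short description.",
  // The tool owns its validation — no regex archaeology on free-form model output.
  async run(args: { datetime: string; description: string }): Promise<ToolResult> {
    const when = new Date(args.datetime);
    if (Number.isNaN(when.getTime())) {
      return { ok: false, error: `Invalid datetime: ${args.datetime}` };
    }
    if (!args.description.trim()) {
      return { ok: false, error: "Description is required." };
    }
    // await calendar.createEvent(when, args.description); // real integration goes here
    return { ok: true, message: `Meeting booked for ${when.toISOString()}` };
  },
};
```

The model only has to emit structured arguments; the tool decides whether they're valid.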
Next up: "MCP: The Secret Sauce (That Isn't Ranch) for AI Apps" → where we finally learn the architecture that fixes all of this.
2026-01-26 01:33:24
(Image: You, after reading this article)
You've learned what LLMs are and how they work. Now comes the actual skill: making them do what you want.
This is harder than it sounds. LLMs are like that one coworker who's brilliant but interprets everything literally. Say "make it better" and they'll add sparkles. Say "fix the bug" and they'll delete the file.
Let's learn how to communicate properly.
Every effective prompt has these components:
[ROLE] Who should the AI pretend to be?
[CONTEXT] What does it need to know?
[TASK] What should it actually do?
[FORMAT] How should the output look?
[CONSTRAINTS] What should it avoid?
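If you build prompts in application code rather than typing them into a chat box, those five components map naturally onto a tiny helper. A sketch with made-up field names, not a library API:

```typescript
// Made-up helper for illustration: assemble the five components into one prompt string.
interface PromptParts {
  role: string;
  context: string;
  task: string;
  format: string;
  constraints: string[];
}

function buildPrompt(p: PromptParts): string {
  return [
    `You are ${p.role}.`,
    `Context:\n${p.context}`,
    `Task: ${p.task}`,
    `Format: ${p.format}`,
    `Constraints:\n${p.constraints.map((c) => `- ${c}`).join("\n")}`,
  ].join("\n\n");
}

const prompt = buildPrompt({
  role: "a senior frontend developer specializing in React and TypeScript",
  context: "B2B SaaS dashboard using React 18, Tailwind, and React Hook Form.",
  task: "Create a login page component with email and password fields.",
  format: "A complete component file with proper TypeScript types.",
  constraints: ["Use the existing AuthContext", "Show a loading state during submission"],
});
```

Now compare a lazy prompt with a structured one.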
Write me some code for a login page.
Why it sucks: No context, no constraints, no format. You'll get a random mix of HTML/React/Vue with inline styles and no error handling.
You are a senior frontend developer specializing in React and TypeScript.
Context: I'm building a B2B SaaS dashboard. We use:
- React 18 with TypeScript
- Tailwind CSS for styling
- React Hook Form for forms
- Our existing AuthContext for state
Task: Create a login page component with email and password fields.
Requirements:
- Use our existing AuthContext's login() function
- Show loading state during submission
- Display API errors below the form
- Redirect to /dashboard on success
Format: Provide the complete component file with proper TypeScript types.
Why it works: Clear role, specific context, defined requirements, expected format.
(Image: The difference is night and day)
When your prompts aren't working, use RICE:
| Letter | Meaning | Question to Ask |
|---|---|---|
| R | Role | Who is the AI being? |
| I | Instructions | What exactly should it do? |
| C | Context | What background info does it need? |
| E | Examples | Can I show what I want? |
Nothing beats a good example. LLMs are pattern-matching machines—show them the pattern.
Convert these sentences to the passive voice.
Example:
- Input: "The cat ate the fish."
- Output: "The fish was eaten by the cat."
Now convert:
- "The developer wrote the code."
- "The manager approved the request."
This works 10x better than explaining grammatical rules.
(Image: Step by step, like a robot learning to dance)
For complex reasoning, tell the model to think step by step:
Solve this problem. Think through it step by step before giving your final answer.
Problem: A store has 3 types of items. Type A costs $5, Type B costs $8,
Type C costs $12. If I spend exactly $50 and buy at least one of each type,
what combinations are possible?
Without "step by step," models often jump to wrong conclusions. With it, they show their work and catch errors.
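(For the record, that toy problem has exactly two answers. If you want to check whatever the model produces, a brute-force search is a few lines:)

```typescript
// Brute-force check of the toy problem: 5a + 8b + 12c = 50 with a, b, c >= 1.
const solutions: Array<[number, number, number]> = [];
for (let a = 1; a * 5 <= 50; a++) {
  for (let b = 1; a * 5 + b * 8 <= 50; b++) {
    for (let c = 1; a * 5 + b * 8 + c * 12 <= 50; c++) {
      if (a * 5 + b * 8 + c * 12 === 50) solutions.push([a, b, c]);
    }
  }
}
console.log(solutions); // [[2, 2, 2], [6, 1, 1]]
```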
Give 2-3 examples before your actual request:
Classify the sentiment of these reviews:
Review: "This product changed my life! Best purchase ever!"
Sentiment: Positive
Review: "Arrived broken. Customer service was unhelpful."
Sentiment: Negative
Review: "It's okay. Does what it says, nothing special."
Sentiment: Neutral
Now classify:
Review: "Decent quality for the price, but shipping took forever."
Sentiment:
For critical tasks, ask the model to solve the problem multiple ways and check if answers agree:
Solve this problem using two different approaches.
If your answers differ, explain which one is correct and why.
Combine perspectives for better output:
You are three experts collaborating:
1. A security engineer who spots vulnerabilities
2. A UX designer who ensures usability
3. A performance engineer who optimizes speed
Review this authentication flow and provide feedback from all three perspectives.
Make it better.
Fix: Be specific about what "better" means.
Improve this code's readability by:
- Adding TypeScript types
- Extracting magic numbers into named constants
- Adding JSDoc comments to public functions
Why isn't this working?
[pastes 500 lines of code]
Fix: Explain the expected vs actual behavior.
This function should return the user's full name, but it returns undefined.
Expected: "John Doe"
Actual: undefined
Here's the relevant code:
[paste only the relevant 20 lines]
Give me some API endpoints for a todo app.
Fix: Specify the output format.
Design REST API endpoints for a todo app.
Format your response as a markdown table with columns:
| Method | Endpoint | Description | Request Body | Response |
Analyze this data and provide insights.
Fix: Tell it what to do when uncertain.
Analyze this data and provide insights.
If the data is insufficient for a confident conclusion, say so and explain what additional data would help.
Here are battle-tested templates for common tasks:
Review this [LANGUAGE] code as a senior developer. Focus on:
1. Bugs or potential runtime errors
2. Security vulnerabilities
3. Performance issues
4. Readability improvements
For each issue, explain:
- What's wrong
- Why it matters
- How to fix it (with code example)
Code:
[YOUR CODE]
Explain [CONCEPT] to me as if I'm a [SKILL LEVEL] developer.
Use:
- Simple analogies
- Practical examples
- Code snippets where helpful
Avoid:
- Jargon without explanation
- Overly academic language
I have a bug in my [LANGUAGE] code.
Expected behavior: [WHAT SHOULD HAPPEN]
Actual behavior: [WHAT HAPPENS INSTEAD]
Error message (if any): [ERROR]
Relevant code:
[CODE SNIPPET]
What I've tried:
[LIST ATTEMPTS]
Help me identify the root cause and fix it.
Here's a cheat code—ask the AI to help you write better prompts:
I want to use an LLM to [YOUR GOAL].
Help me create an effective prompt by:
1. Asking clarifying questions about my requirements
2. Suggesting an appropriate role for the AI
3. Identifying context the AI might need
4. Proposing a clear output format
Then iterate. Good prompts are rarely written on the first try.
Let's peek under the hood at why these techniques actually work.
LLMs generate tokens by sampling from a probability distribution. Temperature controls how "creative" (random) this sampling is.
$$
P(token_i) = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}}
$$
Where $z_i$ is the raw score (logit) the model assigns to token $i$, and $T$ is the temperature: lower values sharpen the distribution, higher values flatten it.
Why specificity matters: A vague prompt creates a flat distribution—many tokens are roughly equally likely. A specific prompt concentrates probability on the "right" tokens.
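To see the temperature term in action, here's a toy implementation of that softmax over a few made-up logits—purely illustrative, not how a real inference stack is written:

```typescript
// Temperature-scaled softmax over toy logits (scores for three candidate tokens).
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const scaled = logits.map((z) => z / temperature);
  const maxZ = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((z) => Math.exp(z - maxZ));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.1];
console.log(softmaxWithTemperature(logits, 0.5)); // ~[0.86, 0.12, 0.02] — sharp, nearly deterministic
console.log(softmaxWithTemperature(logits, 2.0)); // ~[0.50, 0.30, 0.19] — flat, more "creative"
```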
When you provide examples (few-shot prompting), you're essentially updating the model's behavior without changing its weights. The attention mechanism lets the model pick up the pattern in your examples and apply it to the new input.
This is why example format matters so much—the model literally pattern-matches against your examples.
LLMs generate tokens one at a time, conditioning on all previous tokens:
$$
P(output) = \prod_{i=1}^{n} P(token_i | token_1, ..., token_{i-1})
$$
When you force the model to "think step by step," you're adding intermediate tokens that break the problem into smaller pieces, carry partial results forward as explicit context, and let later tokens condition on that reasoning.
Without CoT, the model tries to jump directly from question to answer—skipping reasoning that might have corrected errors.
When you say "You are a senior security engineer," you're biasing the model's hidden states toward a region of embedding space associated with security terminology, formal technical writing, and cautious, detail-oriented analysis.
The first few tokens heavily influence the trajectory through the model's latent space. A good role prompt puts you on the right "track."
Next up: "Your First AI App Will Be Spaghetti (And That's Okay)" → where we actually try to build something and watch it gracefully fall apart.
2026-01-26 01:33:07
What happens when you type "Write me a poem about pizza" into ChatGPT?
If you said "it understands your deep yearning for pepperoni and crafts a creative response," I have bad news: you've been lied to.
LLMs don't understand anything. They don't think. They don't know what pizza is. They've never tasted cheese. They're just really, really good at one thing: predicting the next word.
Remember your phone's keyboard suggestions? The ones that turn "I'm on my" into "I'm on my way"?
LLMs are that, but on steroids. And Red Bull. And trained on the entire internet.
Here's the mental model:
Input: "The capital of France is"
LLM thinking: "Based on 45,000 Wikipedia articles, the next word is 99.9% likely to be..."
Output: "Paris"
It's not looking up facts. It's not reasoning. It's pattern matching at an absurd scale.
LLMs don't read words—they read tokens. A token is roughly 3-4 characters, or "a chunk of a word."
| Text | Tokens |
|---|---|
| "Hello" | 1 token |
| "ChatGPT" | 2 tokens: "Chat" + "GPT" |
| "Supercalifragilisticexpialidocious" | 7 tokens (and a headache) |
Every LLM has a context window—a maximum amount of text it can hold in its "brain" at once.
When your conversation exceeds this limit, the model literally forgets the beginning. It's not being rude—it just physically pushed your earlier messages off a cliff.
(Image: The LLM forgetting your name after 4000 tokens)
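This is also why chat apps quietly trim old messages. A minimal sketch of that trimming, assuming the first message is the system prompt and using the crude "~4 characters per token" estimate (real tokenizers such as tiktoken count differently):

```typescript
// Rough heuristic only: ~4 characters per token. Real tokenizers differ.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

interface Message { role: "system" | "user" | "assistant"; content: string; }

// Keep the system prompt, then as many of the most RECENT messages as fit the budget.
function trimToContextWindow(messages: Message[], maxTokens: number): Message[] {
  const [system, ...rest] = messages; // assumes messages[0] is the system prompt
  const kept: Message[] = [];
  let used = estimateTokens(system.content);
  for (const msg of [...rest].reverse()) { // walk backwards from the newest message
    const cost = estimateTokens(msg.content);
    if (used + cost > maxTokens) break;    // older messages fall off the cliff here
    kept.unshift(msg);
    used += cost;
  }
  return [system, ...kept];
}
```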
So how does "next word prediction" produce coherent essays? The secret sauce is Attention.
Imagine you're at a loud cocktail party. You can hear everyone, but you pay attention only to the person saying your name.
LLMs do this with words. When generating a response, the model looks back at all previous tokens and decides which ones are "relevant" to the current word it's trying to spit out.
If I say: "The doctor took her stethoscope..."
The model connects "her" to "doctor". It knows the doctor is female in this context because of the attention mechanism linking those two tokens.
Here's the uncomfortable truth: LLMs don't know what they don't know.
When you ask an LLM about something it wasn't trained on, it doesn't say "I don't know." Instead, it predicts the most statistically likely series of words.
You: "Who is the CEO of The Made Up Company Inc?"
LLM: "The CEO of The Made Up Company Inc is John Smith, appointed in 2021."
Why?! Because "John Smith" and "appointed in" are words that frequently appear near "CEO" in its training data. It's not lying; it's improv.
Warning: The following section contains linear algebra. Proceed at your own risk.
The core of transformer-based LLMs is the self-attention mechanism.
$$
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
$$
Translation for humans: $Q$ (query) is what the current token is looking for, $K$ (key) is what each other token advertises about itself, and $V$ (value) is the information each token actually carries.
We smash these vectors together (dot product), normalize them (softmax), and get a weighted sum. It's basically a giant, mathematical matchmaking service for words.
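If you want to see that matchmaking in code, here's a toy single-head version of the formula for tiny matrices—illustrative only, nothing like an optimized implementation:

```typescript
// Toy single-head attention. Each row of Q, K, V is one token's vector.
type Matrix = number[][];

const matmul = (a: Matrix, b: Matrix): Matrix =>
  a.map((row) => b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0)));

const transpose = (m: Matrix): Matrix => m[0].map((_, j) => m.map((row) => row[j]));

const softmaxRow = (row: number[]): number[] => {
  const max = Math.max(...row);
  const exps = row.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
};

// Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V
function attention(Q: Matrix, K: Matrix, V: Matrix): Matrix {
  const dk = K[0].length;
  const scores = matmul(Q, transpose(K)).map((row) => row.map((v) => v / Math.sqrt(dk)));
  const weights = scores.map(softmaxRow); // how much each token attends to every other token
  return matmul(weights, V);              // weighted sum of the value vectors
}
```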
Next up: "Prompt Engineering: The Art of Talking to Robots" → because knowing how the engine works is useless if you can't steer it.