{"slug":"nextjs-rsc-flight-format-ai-crawler","locale":"en","isFallback":false,"translationAvailable":["en","id"],"translationUrls":{"en":"/api/notes/nextjs-rsc-flight-format-ai-crawler?locale=en","id":"/api/notes/nextjs-rsc-flight-format-ai-crawler?locale=id"},"title":"Why AI Crawlers Can't Read Next.js App Router Sites","description":"Next.js App Router outputs RSC Flight payload, not plain HTML. Plain HTTP crawlers like Claude's web_fetch get structure but miss content. Here's the fix.","date":"2026-03-31","updated":null,"tags":["nextjs","web","debugging","ai","linux"],"content":"\nI was testing something with Claude — I asked it to fetch one of my SnipGeek articles directly from its URL. It came back with just the title tag. The article body was completely empty.\n\nMy first instinct was to blame my own code.\n\n## First Diagnosis: Client-Side Rendering?\n\nThe obvious suspect: maybe the article pages were still client-side rendered, sending only a shell HTML and injecting content via JavaScript after load. This is a classic Next.js mistake when `\"use client\"` ends up on a page component by accident.\n\nI asked Antigravity to audit the full codebase. The result was surprisingly clean:\n\n- `[locale]/blog/[slug]/page.tsx` → ✅ Server Component\n- `[locale]/notes/[slug]/page.tsx` → ✅ Server Component\n- MDX compiled server-side via `next-mdx-remote/rsc` → ✅\n- `generateStaticParams` present → ✅\n\nEverything was correct. So why was the content missing?\n\n## Second Diagnosis: RSC Flight Format\n\nI ran a deeper diagnostic directly against the live URL:\n\n```bash\ncurl -s https://snipgeek.com/notes/how-to-read-ai-build-failed-logs | grep -i 'article\\|content\\|body\\|prose' | head -20\n```\n\nThe response was **101KB** — not an empty shell. Keywords like `content`, `article`, and `prose` appeared hundreds of times. 
But when I dug into the actual content, this is what I found:\n\n```\n{\"className\":\"text-lg text-foreground/80 prose-content\",\"children\":\"$L1d\"}\n```\n\n`$L1d` is not article text. It's a reference to a **React Server Component chunk** — Next.js App Router's RSC Flight streaming format. The full article content is there, but encoded as a payload that requires the React runtime to decode into readable HTML.\n\nConfirmation:\n\n```bash\ncurl -s https://snipgeek.com/notes/how-to-read-ai-build-failed-logs | grep -c '<p>'\n# 0\ncurl -s https://snipgeek.com/notes/how-to-read-ai-build-failed-logs | grep -c '<h2>'\n# 0\n```\n\nZero traditional HTML tags. The content is entirely inside the RSC payload.\n\n## This Isn't a Bug — It's an Architecture Trade-off\n\nThe old Pages Router emitted raw HTML: `<p>`, `<h2>`, full readable content in the HTTP response. App Router switched to RSC Flight — a streaming format optimised for hydration performance, but unreadable without a React runtime.\n\nFor SEO, this is fine:\n\n| Crawler | Can Read Content? | Reason |\n|---|---|---|\n| Googlebot | ✅ | Headless Chrome, full JS render |\n| Bingbot | ✅ | Same — full JS render |\n| AI crawlers (GPTBot, ClaudeBot) | ⚠️ | Depends — some render JS, some don't |\n| Claude via `web_fetch` | ❌ | Plain HTTP fetch, no JS execution |\n\nGoogle can read everything. 
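What a crawler *without* a JavaScript runtime is left with can be simulated in a few lines of TypeScript. The fragment below is a made-up stand-in for a real App Router response body, not actual SnipGeek output:

```typescript
// Simulate a crawler that cannot execute JavaScript: all it can do is
// pattern-match the raw response body. This fragment is an illustrative
// stand-in for what an App Router page returns.
const body = '{"className":"prose-content","children":"$L1d"}';

// Readable HTML paragraphs vs. RSC Flight chunk references like "$L1d".
const paragraphTags = (body.match(/<p[\s>]/g) ?? []).length;
const flightRefs = (body.match(/\$L[0-9a-f]+/gi) ?? []).length;

console.log(paragraphTags, flightRefs); // 0 1
```

That pattern-match is roughly all a plain HTTP fetch can do, which is why the page structure shows up but the prose never does.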
The problem is specific to crawlers that rely on plain HTTP without JavaScript rendering.\n\n## The Fix: A Plain JSON API Route\n\nI added Route Handlers in Next.js that serve article content as plain JSON — no RSC format, no JavaScript required:\n\n```\nGET /api/posts/[slug]?locale=en   → English article JSON\nGET /api/posts/[slug]?locale=id   → Indonesian article JSON\nGET /api/notes/[slug]?locale=en   → English note JSON\nGET /api/notes/[slug]?locale=id   → Indonesian note JSON\n```\n\nA few decisions I made during implementation:\n\n- **Locale fallback** — if an `id` version doesn't exist, it falls back to `en` with `isFallback: true` in the response.\n- **`X-Robots-Tag: noindex`** — prevents Google from indexing the API route as a duplicate of the main page.\n- **`Cache-Control: public, max-age=3600`** — caches responses to avoid repeated serverless invocations.\n- **`translationUrls`** — a field listing the full API URL for each available locale, useful for tools consuming the API.\n\nAfter deploying, a quick test:\n\n```bash\ncurl -s \"https://snipgeek.com/api/posts/ubuntu-26-04-beta-sudah-bisa-didownload?locale=id\"\n```\n\nResponse:\n\n```json\n{\n  \"slug\": \"ubuntu-26-04-beta-sudah-bisa-didownload\",\n  \"locale\": \"id\",\n  \"isFallback\": false,\n  \"translationAvailable\": [\"en\", \"id\"],\n  \"translationUrls\": {\n    \"en\": \"/api/posts/ubuntu-26-04-beta-sudah-bisa-didownload?locale=en\",\n    \"id\": \"/api/posts/ubuntu-26-04-beta-sudah-bisa-didownload?locale=id\"\n  },\n  \"title\": \"Ubuntu 26.04 Beta Sudah Rilis — Tapi Jangan Buru-Buru Install\",\n  \"description\": \"...\",\n  \"date\": \"2026-03-30\",\n  \"tags\": [\"ubuntu\", \"linux\", \"beta\"],\n  \"content\": \"\\nSaya nunggu beta Ubuntu 26.04 ini sambil setengah semangat...\"\n}\n```\n\nFull article content, readable as plain text. 
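For reference, the overall shape of such a Route Handler can be sketched like this. The file path, the `loadNote` stub, and the trimmed response fields are illustrative, not the actual SnipGeek implementation:

```typescript
// Hypothetical sketch of app/api/notes/[slug]/route.ts.
// loadNote stands in for whatever reads MDX + frontmatter from disk.
type Note = { slug: string; locale: string; title: string; content: string };

async function loadNote(slug: string, locale: string): Promise<Note | null> {
  // Stub data; the real handler would read the content directory.
  const notes: Record<string, Note> = {
    'demo:en': { slug: 'demo', locale: 'en', title: 'Demo', content: 'Hello' },
  };
  return notes[`${slug}:${locale}`] ?? null;
}

export async function GET(
  req: Request,
  { params }: { params: { slug: string } }
) {
  const locale = new URL(req.url).searchParams.get('locale') ?? 'en';

  // Locale fallback: serve the en version with isFallback: true.
  let note = await loadNote(params.slug, locale);
  const isFallback = !note && locale !== 'en';
  if (!note) note = await loadNote(params.slug, 'en');
  if (!note) return Response.json({ error: 'not found' }, { status: 404 });

  return Response.json(
    { ...note, isFallback },
    {
      headers: {
        'X-Robots-Tag': 'noindex',               // keep the API out of search results
        'Cache-Control': 'public, max-age=3600', // avoid repeated serverless invocations
      },
    }
  );
}
```

Route Handlers speak the standard web `Request`/`Response` API, so `Response.json` with custom headers is all the handler needs.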
No browser, no JavaScript needed.\n\n<Callout variant=\"info\" title=\"Safe Change\">\n  This API route lives entirely under `/api/*` — a separate namespace that cannot conflict with or break any existing page routing. It's a **purely additive** change.\n</Callout>\n\n## What's Next\n\nThe next step I'm planning: implement [`llms.txt`](https://llmstxt.org) — an emerging standard (similar to `robots.txt` but for AI) that lists all SnipGeek content URLs in a format that LLM crawlers can process easily.\n\nFor the curious, the relevant specs are in the [Next.js Route Handlers docs](https://nextjs.org/docs/app/building-your-application/routing/route-handlers) and the [React Server Components reference](https://react.dev/reference/rsc/server-components).\n\nIf you hit this same wall with your own Next.js site, adding a plain JSON API route is probably the fastest fix. Let me know if it works for you.\n\n### References\n1. [Next.js Route Handlers — Next.js Docs](https://nextjs.org/docs/app/building-your-application/routing/route-handlers)\n2. [React Server Components — React Docs](https://react.dev/reference/rsc/server-components)\n3. [llms.txt — Emerging Standard for AI Crawlers](https://llmstxt.org)\n"}