Optimizing Googlebot Crawl Budget for Next.js on Firebase
Iwan Efendi · 3 min read
Cache your sitemap, set stable lastModified dates, and add s-maxage headers to protect Googlebot's crawl budget without blocking article re-crawls.
Someone asked me whether deploying the site several times a day would drain Googlebot's crawl budget — and whether doing that repeatedly could eventually get the site de-prioritized in search. My honest first instinct was to say "probably not," but I realized I'd never actually verified the setup from first principles. So I audited it properly. The answer turned out to be reassuring, but three things still needed fixing.
How Crawl Budget Actually Works

Googlebot doesn't re-crawl everything every time you push a new deploy. What it actually watches are:

- `lastModified` in your sitemap — if the date hasn't changed for a URL, Googlebot doesn't treat it as a priority
- Server response time — if pages are slow or the origin is under load, Google backs off the crawl rate
- Content change signals — over time, pages that rarely change get visited less often

What Was Already Protecting the Budget

The SSG setup with `generateStaticParams` and `dynamicParams = false` on blog and note pages was already the best possible starting point. All pages are pre-rendered at build time, so Googlebot gets immediate HTML without waiting for server compute. Fast response = more efficient crawl.

The sitemap was also using real frontmatter dates (`updated || date`) for every blog post and note. That means Googlebot gets an accurate freshness signal for content pages. When a post actually changes and you update the `updated` field, the date changes in the sitemap and a re-crawl gets queued automatically.

The `/_next/` path was also correctly disallowed in `robots.txt`, so Googlebot never wastes budget on thousands of JS chunks and build assets.
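As a concrete illustration, the `updated || date` fallback can be sketched as a small mapping function. The `Post` shape and the example domain below are assumptions for the sketch, not the site's real types:

```typescript
// Sketch of the `updated || date` freshness signal. `Post` and the domain
// are illustrative stand-ins, not the site's actual content types.
interface Post {
  slug: string;
  frontmatter: { date: string; updated?: string };
}

function toSitemapEntry(post: Post) {
  return {
    url: `https://example.com/blog/${post.slug}`,
    // Prefer the explicit `updated` date; fall back to the publish date so
    // the value only changes when the content actually changes.
    lastModified: new Date(post.frontmatter.updated || post.frontmatter.date),
  };
}
```

Because the value is derived from frontmatter rather than build time, redeploying without touching a post leaves its `lastModified` stable.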
What Needed Fixing

Three issues were worth addressing.

No cache on `sitemap.xml`. The sitemap route reads every MDX file, computes tag data, and builds the full entry list on every request. With multiple crawlers fetching it regularly (Googlebot, GPTBot, ClaudeBot, etc.), that's avoidable repeated origin work.

No `Cache-Control` for HTML pages. Without an explicit CDN-level cache header, Firebase App Hosting's default behavior for pre-rendered pages is uncertain. If the CDN isn't caching those responses, every Googlebot visit is an origin hit.

Info pages marked with a "weekly" change frequency. Pages like `/about`, `/privacy`, and `/disclaimer` almost never change. Signaling "weekly" is inaccurate and nudges Google to allocate budget toward pages that rarely need re-indexing.
The Three Fixes
1. Cache the sitemap with ISR

Added `export const revalidate = 3600` to `src/app/sitemap.ts`. Next.js now serves a cached version for up to one hour before recomputing. The full MDX scan runs at most once per hour, regardless of how many bots fetch `/sitemap.xml`.

```ts
// Cache sitemap for 1 hour to avoid recomputing on every crawler request
export const revalidate = 3600;
```
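The behavior `revalidate` buys here is essentially time-based memoization. A minimal sketch of the same idea in plain TypeScript (this is not Next.js internals, just an illustration of the semantics):

```typescript
// Illustration of what `revalidate` means for the sitemap route: the
// expensive computation runs at most once per window, no matter how many
// crawlers request the result. NOT Next.js internals, just the idea.
function cachedFor<T>(windowMs: number, compute: () => T): () => T {
  let value: T | undefined;
  let expiresAt = 0;
  return () => {
    if (value === undefined || Date.now() >= expiresAt) {
      value = compute(); // e.g. the full MDX scan
      expiresAt = Date.now() + windowMs;
    }
    return value;
  };
}
```

With something like `const getSitemap = cachedFor(3_600_000, buildSitemapEntries)`, a burst of bot requests inside the hour triggers a single scan.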
2. Correct change frequency for info pages

Split the routes in `sitemap.ts` into two groups. Discovery pages (`/`, `/blog`, `/notes`, `/tags`) stay at "weekly". Info pages move to "monthly" with a lower priority of 0.5.

```ts
const contentRoutes = ["", "/blog", "/notes", "/tags"];
const infoRoutes = ["/about", "/contact", "/privacy", "/terms", "/disclaimer"];
```
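To show how the two groups feed the sitemap, here is a sketch of expanding them into entries. The domain and the 0.8 priority for discovery pages are assumptions; the "monthly" frequency and 0.5 priority for info pages come from the change above:

```typescript
// Sketch: expanding the two route groups into sitemap entries with different
// change-frequency signals. Domain and content-route priority are assumed.
const contentRoutes = ["", "/blog", "/notes", "/tags"];
const infoRoutes = ["/about", "/contact", "/privacy", "/terms", "/disclaimer"];

const routeEntries = [
  ...contentRoutes.map((route) => ({
    url: `https://example.com${route}`,
    changeFrequency: "weekly" as const,
    priority: 0.8, // assumed value; discovery pages rank above info pages
  })),
  ...infoRoutes.map((route) => ({
    url: `https://example.com${route}`,
    changeFrequency: "monthly" as const,
    priority: 0.5, // these rarely need re-indexing
  })),
];
```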
3. Add CDN-level cache headers for HTML pages

Added a new rule in `next.config.ts` that sets `Cache-Control` for all non-API routes. This tells Cloud CDN (Firebase App Hosting) to cache pre-rendered HTML for 1 hour, then serve it stale while refreshing in the background for up to 24 hours.

```ts
{
  source: "/((?!api/).*)",
  headers: [
    {
      key: "Cache-Control",
      value: "public, s-maxage=3600, stale-while-revalidate=86400",
    },
  ],
},
```

API routes explicitly set their own `Cache-Control` in their response handlers, so they're not affected.

Triggering Re-Crawl for Updated Articles
This was already working — it just wasn't documented anywhere. When you update an article, add or update the `updated` field in frontmatter:

```yaml
date: "2026-01-15"
updated: "2026-04-19"
```

The sitemap reads `post.frontmatter.updated || post.frontmatter.date` for `lastModified`. When Googlebot next fetches the sitemap and sees the new date for that URL, it schedules a re-crawl. No manual action, no Search Console submission needed.
Tool Pages

Tool pages like `/tools/spin-wheel` use a hardcoded `STATIC_LAST_MODIFIED` constant in `sitemap.ts`. Update that date manually when a tool changes significantly — otherwise Googlebot has no signal to re-visit it.
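A sketch of how such a constant might feed the sitemap. Only the constant's name comes from the actual `sitemap.ts`; the date, domain, and entry shape here are illustrative:

```typescript
// Sketch: tool pages share one manually-bumped date so the sitemap carries a
// freshness signal for them at all. Date and domain are illustrative.
const STATIC_LAST_MODIFIED = "2026-04-19"; // bump when a tool changes

const toolRoutes = ["/tools/spin-wheel"];
const toolEntries = toolRoutes.map((route) => ({
  url: `https://example.com${route}`,
  lastModified: new Date(STATIC_LAST_MODIFIED),
}));
```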
Key Takeaways

- Deploy frequency doesn't directly drain crawl budget — `lastModified` stability is what matters
- Cache the sitemap to prevent repeated full MDX scans on every bot request
- Set `s-maxage` on HTML pages — don't rely on Firebase App Hosting's default CDN behavior
- Differentiate `changeFrequency` between discovery pages and static info pages
- Use the `updated:` frontmatter field to explicitly signal content changes to Googlebot