Free Tools · 12 min read

We Built an llms-full.txt (and 6 Free Markdown Converters): A 2026 AI-Citation Stack

Announcing /llms.txt + /llms-full.txt for BuiltABot, plus six free Markdown converters that prep any source for AI ingestion. Includes the architecture you'd copy.


BuiltABot Team

AI & Automation Expert

What we shipped: a fresh /llms.txt plus a brand-new /llms-full.txt that auto-regenerates from our blog and tool catalogs, along with 6 free Markdown converters that prep any source for AI ingestion. This post walks through the architecture so you can ship the same stack on your site.

Quick answer

AI agents such as Anthropic web fetch, Perplexity, and ChatGPT browse increasingly look for /llms.txt and /llms-full.txt at the site root before scraping individual pages. Early evidence suggests that sites publishing both get cited more often.

We finally shipped ours, plus the 6 free Markdown converters our customers use to prep their own content. Both stacks are linked below. Try them, then read on for the architecture if you want to copy it.

Why We Built This (Honest Version)

Two months ago we noticed something embarrassing. We had been telling customers — in onboarding, in support, in the Markdown knowledge-base guide we wrote in May — that the fastest way to improve their AI chatbot was to convert source content to clean Markdown.

Meanwhile our own blog rendered as JSX. Our own /llms.txt listed five tools we shipped a year ago and quoted prices that hadn’t been current since the December billing change. ChatGPT browse couldn’t cite the things we’d shipped after January because they weren’t in our AI-ingest file.

Physician, heal thyself.

So this batch closes the gap. We refreshed the lightweight /llms.txt end-to-end, shipped a brand-new auto-generating /llms-full.txt that picks up new blog posts and tools the same day they ship, and finished the Markdown converter cluster we’d been chipping away at since March.

The result is a small AI-citation stack that any site can copy. Below is what each piece does, why it matters, and how we built ours.

What Is llms.txt?

llms.txt is a 2024 proposal from Jeremy Howard / Answer.AI, formalized at llmstxt.org. The idea is robots.txt for LLMs — a single Markdown file at your site root that tells AI crawlers:

  • What your site is, in one paragraph.
  • Which sections matter (pricing, docs, blog, free tools).
  • Where to find clean Markdown versions of your most important pages.

It is not a formal W3C standard yet. But the crawlers that read it are the ones that matter for the next half-decade of search:

  • Anthropic web fetch — when Claude browses on your behalf, it checks /llms.txt first.
  • Perplexity indexer — explicitly looks for the file when indexing new domains.
  • ChatGPT browse — uses it as a discovery hint, especially for technical documentation.
  • Open-source RAG agents — LangChain, LlamaIndex, and most major frameworks have community loaders that prefer /llms.txt when present.

The economic argument is simple: AI search share is rising. Sites that get cited in AI answers earn mind share even when the citation produces no click. Publishing /llms.txt costs almost nothing, and the upside compounds.
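To make the "index" role concrete, here is a minimal sketch of how an agent might read an llms.txt file: take the first H1 as the site name, the first blockquote as the summary, and collect the links listed under each H2. The type and function names (`LlmsIndex`, `parseLlmsTxt`) are our own illustrations, not part of any crawler or library, and real agents may parse the file differently.

```typescript
// Sketch only: names and behavior are illustrative assumptions.
interface LlmsLink {
  title: string;
  url: string;
}

interface LlmsIndex {
  title: string | null; // text of the first H1
  summary: string | null; // text of the first blockquote line
  sections: Record<string, LlmsLink[]>; // H2 heading -> links listed under it
}

function parseLlmsTxt(markdown: string): LlmsIndex {
  const index: LlmsIndex = { title: null, summary: null, sections: {} };
  let current: string | null = null;

  for (const raw of markdown.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("## ")) {
      // A new H2 opens a new section.
      current = line.slice(3).trim();
      index.sections[current] = [];
    } else if (line.startsWith("# ") && index.title === null) {
      index.title = line.slice(2).trim();
    } else if (line.startsWith("> ") && index.summary === null) {
      index.summary = line.slice(2).trim();
    } else if (current !== null) {
      // Collect list items shaped like "- [Title](https://example.com/page)".
      const match = /^[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)/.exec(line);
      if (match) {
        index.sections[current].push({ title: match[1], url: match[2] });
      }
    }
  }
  return index;
}
```

Anything that is not a heading, blockquote, or Markdown link is ignored here, which is one reason link-style list items are easier for a machine to use than bare URLs.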

What Is llms-full.txt?

llms-full.txt is the deeper companion file. Same Markdown format, same site-root location convention, but bigger — typically 50-500 KB instead of 5-10 KB.

The job is different too. Where /llms.txt is an index (“here’s where to look”), /llms-full.txt is a corpus (“here’s the actual content, structured”). An AI agent that fetches /llms-full.txt can answer most questions about your site without crawling individual pages.

Concretely, ours contains:

  • Identity header + last-generated timestamp.
  • About / Pricing / Core Features (1-2 paragraphs each).
  • All 21 free tools, grouped by SEO cluster, with description and keywords.
  • All 111 blog posts, grouped by category, sorted newest-first, with URL + publish/modified dates + tags + description + excerpt + top-10 keywords each.
  • Key product pages (16 URLs).
  • Expanded FAQ (10 entries).
  • Footer with regeneration notes.

Total: ~1,300 lines, ~120 KB. Comfortably under the practical ceiling where AI agents start truncating.
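A quick way to keep an eye on those numbers is to measure the generated body in a build step. The sketch below is an assumption about how you might do that, not something from our codebase: `measureIngestFile` is a hypothetical helper, and the 500 KB default is the rough ceiling we suggest in the FAQ at the end of this post.

```typescript
// Sketch only: a hypothetical size check for a generated ingest file.
interface SizeReport {
  lines: number;
  bytes: number;
  kilobytes: number;
  withinBudget: boolean;
}

function measureIngestFile(body: string, maxBytes: number = 500 * 1024): SizeReport {
  // Count UTF-8 bytes, not characters: non-ASCII text takes more than one byte.
  const bytes = new TextEncoder().encode(body).length;
  // Ignore a single trailing newline so it does not count as an extra line.
  const trimmed = body.replace(/\n$/, "");
  const lines = trimmed === "" ? 0 : trimmed.split("\n").length;
  return {
    lines,
    bytes,
    kilobytes: Math.round(bytes / 1024),
    withinBudget: bytes <= maxBytes,
  };
}
```

Failing the build when `withinBudget` is false is one option; logging the report and reviewing it by hand is another.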

The 6 Free Markdown Converters

Before /llms-full.txt can include your content, the content itself needs to be in a format AI can ingest cleanly. Modern RAG chunkers do their best work on Markdown — clean H2/H3 headings, bullet lists, fenced code blocks, short paragraphs. PDFs and DOCX files encode visual layout that confuses the chunker.
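To see why headings matter, here is a deliberately small sketch of heading-based chunking: it starts a new chunk at every H2 or H3 and leaves fenced code blocks intact. It is an illustration of the idea, not the chunker that BuiltABot or any particular RAG framework uses; production chunkers typically also cap chunk size and add overlap. The names `Chunk` and `chunkByHeading` are ours.

```typescript
// Sketch only: split Markdown into one chunk per H2/H3 section.
interface Chunk {
  heading: string;
  body: string;
}

function chunkByHeading(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "(intro)"; // label for text that appears before the first heading
  let buffer: string[] = [];
  let inFence = false;

  const flush = (): void => {
    const body = buffer.join("\n").trim();
    if (body !== "") chunks.push({ heading, body });
    buffer = [];
  };

  for (const line of markdown.split(/\r?\n/)) {
    // Track fenced code blocks so "## ..." inside code is not treated as a heading.
    if (line.trimStart().startsWith("```")) inFence = !inFence;
    const match = inFence ? null : /^(#{2,3})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      heading = match[2].trim();
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

A PDF that has been flattened to text offers no equivalent of the `##` lines, so a chunker has to guess at boundaries from whitespace and font changes instead.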

We shipped six converters to cover every common source format. All are free. None require a signup. Each runs in your browser where possible, with a rate-limited server endpoint only for formats that need server-side processing (PDF text extraction, fetching a remote webpage).

  1. Webpage → Markdown — paste a URL, get a clean Markdown file. Strips ads, navigation, and footers; keeps the article body with headings, lists, and links intact.
  2. Pasted HTML → Markdown — for content from a CMS or email-export where you can’t hand us a URL. Same Turndown pipeline as the webpage converter.
  3. PDF → Markdown — extracts text and structure from PDFs. Best for clean, well-structured PDFs; image-only scans need OCR first.
  4. DOCX → Markdown — Word documents to Markdown while preserving headings, bullet lists, and tables.
  5. JSON → Markdown — API responses and config files rendered as collapsible Markdown sections. Useful when your product catalog lives in a database export.
  6. XML → Markdown — RSS feeds, sitemaps, and SOAP responses turned into a clean Markdown outline. Pairs nicely with our Sitemap URL Extractor.
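For a feel of what the simpler conversions involve, here is a minimal JSON to Markdown sketch. It is not the code behind our converter (which renders collapsible sections); this version just walks the JSON value and emits a nested bullet outline. The `Json` type and `jsonToMarkdown` function are illustrative names.

```typescript
// Sketch only: render a JSON value as a nested Markdown bullet list.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function jsonToMarkdown(value: Json, depth: number = 0): string {
  const indent = "  ".repeat(depth);

  // Primitives become a single bullet.
  if (value === null || typeof value !== "object") {
    return `${indent}- ${String(value)}`;
  }

  // Arrays render each item at the current depth.
  if (Array.isArray(value)) {
    return value.map((item) => jsonToMarkdown(item, depth)).join("\n");
  }

  // Objects render "key: value" bullets, nesting one level for containers.
  return Object.entries(value)
    .map(([key, child]) => {
      if (child === null || typeof child !== "object") {
        return `${indent}- **${key}**: ${String(child)}`;
      }
      const nested = jsonToMarkdown(child, depth + 1);
      return nested === ""
        ? `${indent}- **${key}**: (empty)`
        : `${indent}- **${key}**\n${nested}`;
    })
    .join("\n");
}
```

Calling `jsonToMarkdown(JSON.parse(text))` on a product-catalog export gives an outline you can tidy into headings by hand.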

For the deep-dive on Markdown structure, chunking, and ingestion strategy, see the Markdown for AI Chatbot Knowledge Bases setup guide.

Try the AI-Citation Stack on Your Site This Weekend

Six free Markdown converters, an llms.txt template, and a 14-day BuiltABot trial to see how it all plugs in.

How to Add llms.txt to Your Site

For most sites, /llms.txt is a static file. Drop it at your site root (next to robots.txt) and you’re done. Five to ten minutes of work, total.

Minimum viable structure:

# Your Company

> One-sentence elevator pitch. Plain English, no marketing fluff.

Site: https://yourdomain.com

## What we do

- Bullet point — main capability 1
- Bullet point — main capability 2
- Bullet point — what makes you different

## Pricing

- Tier 1 — $X/month — who it is for
- Tier 2 — $Y/month — who it is for
- Free trial details

## Key pages

- [Pricing](https://yourdomain.com/pricing)
- [Docs](https://yourdomain.com/docs)
- [Blog](https://yourdomain.com/blog)
- [Free tools](https://yourdomain.com/tools)

## FAQ

### Question phrased the way users actually ask it
Concise factual answer in 1-3 sentences.

Then point to it from robots.txt alongside your sitemap. robots.txt has no standard directive for llms.txt, so the entries go in as comments:

Sitemap: https://yourdomain.com/sitemap.xml

# AI / LLM ingest files (llmstxt.org convention)
# https://yourdomain.com/llms.txt
# https://yourdomain.com/llms-full.txt

Refresh /llms.txt manually when pricing or top features change. That’s about it.
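If you would rather keep those details as data in version control and render the file from them, a small function is enough. The sketch below is one possible shape, not how we maintain ours: `LlmsTxtConfig` and `renderLlmsTxt` are hypothetical names, and key pages are emitted in the `[title](url)` link form that llmstxt.org describes.

```typescript
// Sketch only: render a minimal llms.txt from a typed config object.
interface LlmsTxtConfig {
  name: string;
  summary: string;
  capabilities: string[];
  keyPages: { title: string; url: string }[];
  faq: { question: string; answer: string }[];
}

function renderLlmsTxt(config: LlmsTxtConfig): string {
  const lines: string[] = [
    `# ${config.name}`, // exactly one H1: the site or company name
    "",
    `> ${config.summary}`, // one-sentence summary as a blockquote
    "",
    "## What we do",
    "",
    ...config.capabilities.map((item) => `- ${item}`),
    "",
    "## Key pages",
    "",
    ...config.keyPages.map((page) => `- [${page.title}](${page.url})`),
    "",
    "## FAQ",
    "",
    ...config.faq.flatMap((entry) => [`### ${entry.question}`, entry.answer, ""]),
  ];
  // Normalize to a single trailing newline.
  return lines.join("\n").trimEnd() + "\n";
}
```

The output can be written to the site root by whatever build step you already have; the manual part is then limited to editing the config when pricing or features change.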

How to Add llms-full.txt

Static /llms-full.txt works fine for small sites. The win for sites with substantial content libraries — blogs, docs, tool catalogs, knowledge bases — is auto-generation. Otherwise the file goes stale within a month and you stop trusting it.

On Next.js the cleanest pattern is a route handler that prerenders at build time. Ours looks roughly like this:

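// app/llms-full.txt/route.ts
import { buildLlmsFullTxt } from '@/lib/seo/llms-content';

// Prerender once at build time instead of running on every request.
export const dynamic = 'force-static';
// Never revalidate in the background; a new deploy regenerates the file.
export const revalidate = false;

export async function GET(): Promise<Response> {
  // buildLlmsFullTxt() returns the full Markdown body as a string.
  return new Response(buildLlmsFullTxt(), {
    status: 200,
    headers: {
      // Served as plain text so agents and browsers display it inline.
      'Content-Type': 'text/plain; charset=utf-8',
      // Browsers cache for 1 hour, shared caches for 1 day, and stale
      // copies may be served for up to 7 days while revalidating.
      'Cache-Control':
        'public, max-age=3600, s-maxage=86400, stale-while-revalidate=604800',
      // Explicitly allow indexing of this file.
      'X-Robots-Tag': 'all',
    },
  });
}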

The force-static flag tells Next.js to call the handler once at build time and cache the response. Same performance as a static file in public/, but the content stays in sync with your data sources automatically.

For static-site generators (Astro, Hugo, Eleventy) the same pattern applies: a template that loops your content collections and emits Markdown at build time. For WordPress: a PHP snippet that queries posts and writes the file on publish. The plumbing is generic.

Auto-Generate from Your Content Catalog

The interesting part is buildLlmsFullTxt() itself — the function that produces the Markdown body. Ours is a pure function that imports from two source-of-truth modules:

  • lib/blog-data.ts — the blog catalog used everywhere on the site (sitemap, blog index, related-posts, RSS).
  • lib/seo/tools-catalog.ts — a pure-data registry of free tools with slug, title, description, keywords, and SEO cluster. New module shipped with this batch.

The builder concatenates the sections in order: header → about → pricing → features → tools-by-cluster → blogs-by-category → product pages → FAQ → footer. Each blog and tool gets a structured Markdown block with predictable fields. The output is deterministic — same inputs always produce the same file — which makes it cacheable and diffable across builds.

Add a new blog post to lib/blog-data.ts? It appears in /llms-full.txt on the next build, in the correct category, sorted newest-first within it. Ship a new free tool? Add one row to tools-catalog.ts, done. Zero hand-edit drift.
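The grouping and ordering described above can be sketched as a pure function. This is an illustration under assumptions, not our actual `buildLlmsFullTxt()`: the `BlogPost` shape, the `renderBlogSection` name, and the exact field labels are made up for the example. The part worth copying is the determinism, which comes from sorting with explicit, locale-independent comparisons.

```typescript
// Sketch only: render the blogs-by-category section deterministically.
interface BlogPost {
  title: string;
  url: string;
  category: string;
  publishedAt: string; // ISO date such as "2026-05-28", so string order equals date order
  description: string;
  tags: string[];
}

// Plain code-unit comparison; avoids locale-dependent ordering.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

function renderBlogSection(posts: BlogPost[]): string {
  // Group posts by category.
  const byCategory = new Map<string, BlogPost[]>();
  for (const post of posts) {
    const group = byCategory.get(post.category) ?? [];
    group.push(post);
    byCategory.set(post.category, group);
  }

  const out: string[] = ["## Blog posts", ""];
  // Sort category names so output does not depend on input order.
  for (const category of Array.from(byCategory.keys()).sort(compareStrings)) {
    out.push(`### ${category}`, "");
    // Newest first; break ties by URL to keep the order stable across builds.
    const sorted = [...(byCategory.get(category) ?? [])].sort(
      (a, b) => compareStrings(b.publishedAt, a.publishedAt) || compareStrings(a.url, b.url)
    );
    for (const post of sorted) {
      out.push(
        `#### ${post.title}`,
        `- URL: ${post.url}`,
        `- Published: ${post.publishedAt}`,
        `- Tags: ${post.tags.join(", ")}`,
        `- Description: ${post.description}`,
        ""
      );
    }
  }
  return out.join("\n");
}
```

Because nothing in the function reads the clock or depends on input order, two builds from the same catalog produce byte-identical sections, which is what makes the file diffable.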

The why behind centralizing the tool catalog: before this batch, tool metadata lived inside the app/tools/page.tsx JSX (mixed with icons, gradients, color classes). Useful for the UI, useless for an AI ingest file. Splitting the pure data into a separate module means every SEO surface — /llms-full.txt, the future structured-data layer, any newsletter export — pulls from one place.
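A rough sketch of that split: the catalog holds only the fields listed earlier (slug, title, description, keywords, SEO cluster), while icons and color classes live in a separate map keyed by slug. The rows, cluster names, style values, and the `/tools/<slug>` path below are placeholders invented for illustration, not our real catalog.

```typescript
// Sketch only: pure tool data kept apart from UI presentation.
interface ToolEntry {
  slug: string;
  title: string;
  description: string;
  keywords: string[];
  cluster: string; // SEO cluster used for grouping
}

// Placeholder rows; the real catalog would list every free tool.
const TOOLS: ToolEntry[] = [
  { slug: "json-to-markdown", title: "JSON to Markdown", description: "Render JSON as Markdown.", keywords: ["json", "markdown"], cluster: "Markdown converters" },
  { slug: "sitemap-url-extractor", title: "Sitemap URL Extractor", description: "List the URLs in a sitemap.", keywords: ["sitemap"], cluster: "SEO utilities" },
];

// UI-only details, joined to the data by slug when the tools page renders.
const TOOL_STYLES: Record<string, { icon: string; colorClass: string }> = {
  "json-to-markdown": { icon: "braces", colorClass: "text-blue-600" },
};

function styleFor(slug: string): { icon: string; colorClass: string } {
  return TOOL_STYLES[slug] ?? { icon: "wrench", colorClass: "text-gray-600" };
}

// The ingest file reads only the pure data and never touches TOOL_STYLES.
function renderToolsSection(tools: ToolEntry[]): string {
  const clusters = Array.from(new Set(tools.map((tool) => tool.cluster))).sort();
  const out: string[] = ["## Free tools", ""];
  for (const cluster of clusters) {
    out.push(`### ${cluster}`, "");
    for (const tool of tools.filter((t) => t.cluster === cluster)) {
      out.push(`- ${tool.title} (/tools/${tool.slug}): ${tool.description} Keywords: ${tool.keywords.join(", ")}`);
    }
    out.push("");
  }
  return out.join("\n");
}
```

The tools page would call `styleFor(tool.slug)` while rendering cards; every other surface imports `TOOLS` alone.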

Does Any of This Actually Work?

Honest answer: it’s early. The strongest signal so far comes from a few Q1-2026 case studies (Vercel, Astro, Stripe Docs) reporting 2-4× higher citation rates in Perplexity / ChatGPT browse / Claude web fetch for pages covered by a comprehensive llms-full.txt versus pages without. Those are docs sites with technical audiences — exactly the cohort where AI search adoption is highest.

For BuiltABot specifically the bet is asymmetric. Building /llms-full.txt cost us one engineer-day. If AI-citation traffic becomes a meaningful share of total inbound (which many analysts project for 2026-2027), we own a piece of the curve. If it doesn’t, we still gained a centralized tool catalog and a forcing function to keep /llms.txt current. Hard to lose.

The downside of not shipping it: the moment AI agents start ranking sites by completeness of their LLM ingest file, the gap compounds. Better to be early.

What’s Next for AI-Citation SEO

Three trends we’re watching in 2026-2027:

  1. Delta endpoints. AI agents asking for “what changed since I last fetched this site?” will become common — Anthropic has hinted at supporting query params like ?since=2026-05-01. We’re planning a delta variant of /llms-full.txt for Q3.
  2. Schema-on-Markdown. Adding inline schema.org-style annotations to /llms-full.txt entries (e.g. --- type: BlogPost --- frontmatter) would let AI agents extract typed data directly without needing to parse the Markdown body. Early proposals are floating around AI Twitter.
  3. Cross-site federation. The same way Atom and RSS enabled feed aggregators, a federated registry of /llms-full.txt URLs would let AI agents discover comprehensive content libraries without crawling. Speculative for now — probably 18 months out.
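As a thought experiment for the first trend, a delta variant mostly comes down to filtering the catalog by modified date. The sketch below is speculative in the same way the trend is: there is no agreed `?since=` convention today, and `CatalogEntry` and `entriesSince` are hypothetical names, not something we have shipped.

```typescript
// Sketch only: filter catalog entries for a hypothetical ?since= delta request.
interface CatalogEntry {
  url: string;
  modifiedAt: string; // ISO 8601 date or timestamp
}

function entriesSince(entries: CatalogEntry[], since: string | null): CatalogEntry[] {
  // No parameter means the caller wants the full file.
  if (since === null) return entries;

  const cutoff = Date.parse(since);
  // An unparseable date falls back to the full list rather than an empty one.
  if (Number.isNaN(cutoff)) return entries;

  // Keep entries modified at or after the cutoff.
  return entries.filter((entry) => Date.parse(entry.modifiedAt) >= cutoff);
}
```

In a route handler, the `since` value would come from `new URL(request.url).searchParams.get("since")`, and a delta route like this could no longer be fully prerendered, since the response depends on the query string.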

For now, the highest-leverage moves are the ones in this post: ship a current /llms.txt, ship a generated /llms-full.txt, and prep your source content as clean Markdown using the six free converters.

If you ship the same stack on your site, drop us a link — we’re collecting examples for the next iteration of our Markdown setup guide.

And if you want a chatbot that uses all of this prep work on your customer-facing site, start a 14-day BuiltABot trial. Pricing starts at $29.99/month. Setup takes five minutes once your Markdown is ready.

llms.txt + AI-Citation Stack FAQ

What is llms.txt and is it an official standard yet?

llms.txt is a 2024 proposal from Jeremy Howard / Answer.AI (formalized at llmstxt.org) for a Markdown file at site root that tells LLMs what your site is about and points them at clean Markdown versions of your most important pages. It is not a formal W3C standard yet — but Anthropic web fetch, Perplexity's indexer, and several open-source RAG agents already prefer it when present. Treat it like robots.txt circa 1994: not formally required, increasingly read in practice, costs almost nothing to ship.

What's the difference between llms.txt and llms-full.txt?

llms.txt is the lightweight index — a curated ~5-10 KB summary of your site, hand-maintained, refreshed when pricing or top features change. llms-full.txt is the deep ingest — a larger machine-generated catalog (~80-200 KB typical) that inlines structured metadata for every page worth ingesting. Think of llms.txt as the dust-jacket and llms-full.txt as the index at the back of the book. AI agents will fetch llms.txt first, then follow the pointer to llms-full.txt if they want the full corpus.

How big should llms-full.txt actually be?

Aim for under 500 KB; above that, most AI agents will skip or truncate the file. Our generated file is ~120 KB and covers 21 free tools plus 111 blog posts with structured metadata for each (title, URL, dates, tags, description, excerpt, top 10 keywords), which leaves plenty of headroom under that ceiling. If you have thousands of pages, partition: ship a top-level llms-full.txt with the canonical 100-200 pages, then have it link out to category-specific files like /llms-full-blog.txt or /llms-full-tools.txt.

Do search engines like Google read llms.txt?

Not officially — Google has not committed to reading llms.txt as of mid-2026. The crawlers that do read it are AI-native: Anthropic web fetch (when an LLM is browsing for you), Perplexity, OpenAI browse, Cohere Compass, and a growing list of open-source RAG agents and AI search tools. The play is AI-citation traffic (when ChatGPT or Claude cites your site in an answer), not classic Google search rankings.

Can I just write llms.txt by hand and skip llms-full.txt?

Yes, and many sites do — for a small site (under ~20 pages) a hand-written llms.txt covers the same surface area you'd get from a generated llms-full.txt. The win for sites with substantial content libraries (blogs, docs, tool catalogs, knowledge bases) is automation. Generating llms-full.txt from your existing content catalog means new posts and tools show up in your AI-ingest file the day they ship, with zero manual sync. That's worth the one-time scaffolding work.

How do I generate llms-full.txt automatically?

If you're on Next.js: add a route handler at app/llms-full.txt/route.ts that returns text/plain and builds its body from your blog/tools metadata catalog. We marked our handler force-static so it prerenders at build time, which gives the same performance as a static file while the content stays in sync. For static-site generators (Astro, Hugo, Eleventy) the same idea applies: write a template that loops over your content collections and emits Markdown. For WordPress: most sites can use a PHP snippet that queries posts and writes the file on publish. The technique is generic.

Why do I need to convert source content to Markdown?

Modern AI chatbots (BuiltABot included) split your content into chunks before embedding. Markdown gives the chunker clean semantic boundaries (headings, lists, code fences) so chunks come out coherent and self-contained. PDFs and Word docs encode visual layout that confuses the chunker — chunks come out fragmented, embeddings get blurry, retrieval pulls the wrong context. The fix is converting source documents to Markdown before ingestion. Our 6 free converters cover every common source: web pages, pasted HTML, PDF, DOCX, JSON, XML.

Are your Markdown converters really free? What is the catch?

No catch. No signup, no rate-limited freemium upsell, no "sign up to save your results". Each tool runs locally in your browser when the source format permits (HTML, JSON, XML) or hits a rate-limited server endpoint when it needs server-side processing (PDF, DOCX, fetching a webpage). The honest commercial logic: if you're prepping content for an AI chatbot, BuiltABot is one of the obvious places to put it. The tools are a Trojan horse for the platform — but they work the same whether you become a customer or not.

I run a chatbot platform / SaaS / docs site. Should I copy this?

Yes, and we wrote this post assuming you would. The architecture is generic: a content catalog (your blog metadata, your product index, your docs sitemap), a template that emits Markdown, and a route handler or static-gen step that serves the result. Nothing about it is BuiltABot-specific. If you ship a Markdown converter on a free-tools page, please link to ours so we know somebody actually read the post.

What is GEO / AEO and how does this fit in?

Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO) are the 2024-2026 SEO terms for "how do I make AI agents cite my site". The classical SEO stack (Google rankings, backlinks, schema.org markup) still matters, but a new layer is emerging: structured Markdown indexes that AI ingest tools can consume directly. llms.txt + llms-full.txt are the most mature artifacts in this new layer. Pair them with FAQ schema, comprehensive content, and a clean sitemap, and you have a GEO/AEO baseline that compounds as AI search share grows.


About the Author

BuiltABot Team - AI-Citation & Content Infrastructure

The BuiltABot team builds, breaks, and ships AI-native marketing infrastructure. This post documents the llms-full.txt + Markdown converter stack we shipped May 28, 2026: the actual files generated by the architecture described above.

Ship the AI-Citation Stack This Weekend

14-day free trial. Six free Markdown converters, two AI-ingest files, and a chatbot that finally shows up in AI search answers.

14-day free trial · Cancel anytime · 5-minute setup