An ad server has milliseconds to decide what to show. Drop a large language model into the pipeline and it can read the page like a person — understanding the topic, judging brand safety, and scoring relevance — then attach all of it to the bid before the auction.
Amit W · 15 June 2025 · 9 min read
An ad server has a few milliseconds to decide what to put in front of a reader. For years that decision leaned on cheap signals — keywords, URL patterns, page metadata — that only approximate what a page is really about. Large language models change the economics of understanding. Put one in the serving pipeline and the server can read a page the way a person would: grasp the topic, judge whether it’s safe for a brand, and score how relevant a given ad actually is — then attach all of that to the bid before it ever reaches the DSP.
This post walks through one practical architecture for an LLM-based ad server, and the four jobs the model does inside it: content understanding, category extraction, brand safety, and ad relevance scoring. It closes with the engineering reality of keeping a slow, expensive model out of a fast auction.
01 — ARCHITECTURE: The serving pipeline
The shape is a standard programmatic stack with one new component bolted in. A Publisher sends the page and request context to the Ad Server. Instead of going straight to auction, the server consults an LLM Context Engine that turns raw page content into structured signals. Those signals flow into Bid Enrichment, which attaches them to the OpenRTB bid request, and the enriched request goes out to the DSP Auction — where buyers now bid with far more context than a bare URL.
Figure: The LLM-based serving pipeline. The LLM Context Engine is the only new piece; everything around it is a normal programmatic stack.
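As a rough illustration of the Bid Enrichment step, here is a minimal Python sketch that attaches pre-computed signals to a bid request represented as a plain dict. The `ContextSignals` class, the `llm_context` key and the shape of the safety payload are hypothetical names chosen for this example. It also assumes categories are carried in `site.content.cat` and custom signals in an `ext` object, so check the OpenRTB version you target before relying on either.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextSignals:
    """Hypothetical output of the LLM Context Engine for one page."""
    categories: list = field(default_factory=list)  # taxonomy labels or IDs
    safety: dict = field(default_factory=dict)      # e.g. {"violence": "none"}


def enrich_bid_request(bid_request: dict, signals: Optional[ContextSignals]) -> dict:
    """Return a copy of the bid request with context signals attached."""
    enriched = copy.deepcopy(bid_request)
    if signals is None:
        # No enrichment available: send the request unchanged.
        return enriched
    content = enriched.setdefault("site", {}).setdefault("content", {})
    content["cat"] = list(signals.categories)
    # Custom signals live under ext; the "llm_context" key is an assumption.
    ext = content.setdefault("ext", {})
    ext["llm_context"] = {"safety": dict(signals.safety)}
    return enriched


request = {"id": "req-1", "site": {"page": "https://example.com/ev-review"}}
signals = ContextSignals(categories=["Automotive"], safety={"violence": "none"})
enriched = enrich_bid_request(request, signals)
```

The function copies the request rather than mutating it, so the original stays available if enrichment has to be discarded. Relevance scores could ride along under the same `ext` object.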
02 — UNDERSTANDING: GPT for content understanding
The first job is comprehension. A GPT-style model reads the page’s title, headings and body and builds a semantic picture: what the article is about, the entities it mentions, the tone. This is where it has a clear edge over keyword matching: it can tell “Apple” the company from the fruit, and “shooting a scene” from a crime report, because it interprets words in context instead of matching isolated keywords.
In practice you don’t hand the model a raw HTML dump. You strip boilerplate (nav, ads, footers), keep the main content, truncate to a sensible token budget, and prompt for a structured response. The output of this stage is a compact semantic profile that the next three stages consume.
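The preprocessing described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not a production extractor: the set of tags treated as boilerplate, the word-count limit used as a crude stand-in for a token budget, and the prompt wording are all choices made for this example.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as boilerplate (an assumption for this sketch).
SKIP_TAGS = {"nav", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html: str, max_words: int = 600) -> str:
    """Strip boilerplate and truncate; words approximate tokens here."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).split()
    return " ".join(words[:max_words])


def build_prompt(title: str, body_text: str) -> str:
    """Ask for a structured response; the key names are illustrative."""
    return (
        "Summarise the page below as JSON with keys "
        '"topic", "entities" and "tone".\n'
        f"Title: {title}\n"
        f"Body: {body_text}"
    )
```

A real pipeline would count tokens with the tokenizer of the model in use and would likely need a more robust main-content extractor. The shape stays the same: clean, truncate, then prompt for structure.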
The trick isn’t calling an LLM during the auction. It’s having already called it — and serving the answer from cache.
03 — CATEGORY EXTRACTION: Mapping content to a taxonomy
Understanding only becomes useful when it’s machine-readable. So the next step maps the semantic profile onto a fixed taxonomy — the IAB Content Taxonomy is the common choice — as multi-label categories with confidence scores. A page might come back as Automotive 0.91, Electric Vehicles 0.78.
The engineering tip that matters here: constrain the output to the taxonomy. Give the model the allowed category list (or an enum / JSON schema) and validate what comes back, so you never enrich a bid with a hallucinated category. Pass the top few categories downstream; drop anything below a confidence floor.
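A minimal sketch of that validation step follows. It assumes the model was asked to return a JSON array of objects with `category` and `confidence` keys. That response shape, the tiny `ALLOWED_CATEGORIES` set, and the default floor and top-k values are placeholders for illustration, not part of any taxonomy specification.

```python
import json

# Stand-in for the real taxonomy; a production list would be loaded from the
# taxonomy file in use.
ALLOWED_CATEGORIES = {"Automotive", "Electric Vehicles", "Travel", "Personal Finance"}


def parse_categories(raw_model_output: str, floor: float = 0.5, top_k: int = 3):
    """Return up to top_k (category, confidence) pairs that pass validation."""
    try:
        items = json.loads(raw_model_output)
    except json.JSONDecodeError:
        return []  # unparseable output: enrich with nothing rather than guess
    if not isinstance(items, list):
        return []
    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        name = item.get("category")
        score = item.get("confidence")
        if not isinstance(name, str) or name not in ALLOWED_CATEGORIES:
            continue  # hallucinated or off-taxonomy label
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            continue
        if not floor <= score <= 1.0:
            continue  # below the confidence floor or out of range
        valid.append((name, float(score)))
    valid.sort(key=lambda pair: pair[1], reverse=True)
    return valid[:top_k]
```

Anything that fails a check is dropped silently here. In practice you would probably log rejected labels as well, since a rising rejection rate is an early sign that the prompt or the model has drifted.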
04 — BRAND SAFETY: Flagging unsafe content before the bid
The same model can classify a page against a brand-safety framework (for example the GARM floor and suitability tiers): violence, hate, adult, illegal activity, and so on, each with a severity. The ad server can then block unsafe inventory outright, down-weight it, or simply pass a safety signal so each advertiser’s DSP applies its own thresholds.
This catches what blocklists miss — sarcasm, quotation, and context that a banned-words list reads as unsafe when it isn’t (and vice-versa). The caveat is calibration: over-blocking quietly destroys yield, so tune your thresholds and keep a human-review path for the grey area rather than trusting a single score.
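One way to turn per-category severities into a decision, with an explicit grey zone routed to human review, is sketched below. The four-level `Severity` scale, the action names and the default thresholds are hypothetical. They are not the GARM tier definitions, and the thresholds are exactly the values you would tune against yield.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; not the GARM tier names."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def safety_action(flags, block_at=Severity.HIGH, review_at=Severity.MEDIUM):
    """Map per-category severities to 'block', 'review' or 'allow'."""
    # The worst category drives the decision; no flags means nothing to block.
    worst = max(flags.values(), default=Severity.NONE)
    if worst >= block_at:
        return "block"
    if worst >= review_at:
        # Grey area: queue for a human instead of auto-blocking.
        return "review"
    return "allow"


page_flags = {"violence": Severity.MEDIUM, "adult": Severity.NONE}
action = safety_action(page_flags)  # "review" with the default thresholds
```

Keeping the thresholds as parameters makes it straightforward to apply a stricter or looser policy per advertiser, or to pass the raw flags through and let each DSP decide.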
05 — RELEVANCE: Ad relevance scoring
Finally, the engine scores fit: given the page’s semantic profile and a candidate creative or advertiser category, how relevant is this ad, on a 0–1 scale? That score rides along in bid enrichment so the bidder can lean into strong matches and ease off weak ones.
You don’t need a full LLM call per candidate to do this. Embed the page and the creative once, compare them with cosine similarity, and you get a fast, cheap relevance score that scales to the candidate set — reserving the heavyweight model for the understanding and safety stages.
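The embedding comparison can be sketched in plain Python. The vectors are assumed to come from whatever embedding model you already use; none is called here. Clamping negative similarities to zero to land on the 0–1 scale is one possible convention rather than the only reasonable one, and `rank_candidates` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def relevance_score(page_vec, ad_vec) -> float:
    """Clamp cosine similarity into [0, 1] (an assumed mapping)."""
    return max(0.0, min(1.0, cosine_similarity(page_vec, ad_vec)))


def rank_candidates(page_vec, ad_vecs: dict):
    """Return (ad_id, score) pairs sorted from most to least relevant."""
    scored = [(ad_id, relevance_score(page_vec, vec)) for ad_id, vec in ad_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the page embedding is computed once and each creative embedding is computed once, scoring a candidate set is a handful of dot products per request. At larger scale the same comparison is usually vectorised or delegated to a vector index.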
06 — ENGINEERING: Keeping the model off the hot path
Here’s the constraint that shapes the whole design: an LLM call typically takes hundreds of milliseconds and costs money, while the auction budget is under 100 ms. You cannot call GPT synchronously inside the bid request. So you don’t.
Pre-compute and cache. Run the LLM Context Engine at crawl or first-seen time, key the result by URL or content hash, and refresh only when the content changes.
Serve from cache. The auction reads the pre-computed signals synchronously; the model never sits in the request path.
Use light models on the hot path. Embeddings and small distilled classifiers handle anything that must run live; the large model runs offline.
Fail safe. Set timeouts and degrade gracefully to keyword/URL signals if enrichment is missing, so a slow model never blocks a bid.
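The four rules above can be sketched as two separate code paths: an offline refresh that calls the model only when content changes, and a hot-path lookup that never does. Everything here is illustrative. `SignalStore`, its method names and the URL-keyword fallback are hypothetical, and the in-memory dict stands in for whatever cache you actually run.

```python
import hashlib
from typing import Callable
from urllib.parse import urlparse


class SignalStore:
    """In-memory stand-in for the signal cache, keyed by URL."""

    def __init__(self) -> None:
        self._entries = {}  # url -> (content_hash, signals)

    def refresh_if_changed(self, url: str, content: str,
                           analyse: Callable[[str], dict]) -> bool:
        """Offline path: call the (slow) analyser only when content changed."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        cached = self._entries.get(url)
        if cached is not None and cached[0] == digest:
            return False  # unchanged content: keep the cached signals
        self._entries[url] = (digest, analyse(content))
        return True

    def signals_for_auction(self, url: str) -> dict:
        """Hot path: a lookup only, with a cheap fallback on a miss."""
        cached = self._entries.get(url)
        if cached is None:
            return url_keyword_fallback(url)
        return {"source": "cache", **cached[1]}


def url_keyword_fallback(url: str) -> dict:
    """Degrade to crude keywords taken from the URL path."""
    path = urlparse(url).path.replace("-", "/").replace("_", "/")
    words = [part for part in path.split("/") if part.isalpha()]
    return {"source": "fallback", "keywords": words}
```

With a networked cache, the lookup in `signals_for_auction` would also need its own timeout, falling back the same way when the cache is slow. The `analyse` callable is where the LLM Context Engine would be plugged in, and it is only ever reached from the offline path.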
Semantic: Understands what a page means, not just the words it contains.
Off-path: The LLM runs at crawl time; the auction serves signals from cache.
Brand-safe: Unsafe and unsuitable content is flagged before a bid is placed.
In the bid: Category, safety and relevance ride along in the OpenRTB request.
07 — OUTCOME: Richer requests, better auctions
An LLM-based ad server doesn’t replace the auction; it feeds it. Content understanding, category extraction, brand safety and relevance scoring turn a thin bid request into a rich one, and richer requests can clear at better prices for publishers and deliver better outcomes for advertisers. The engineering art is simply keeping the model where it belongs: understand offline, enrich in advance, and serve the answer from cache.
Key takeaways
An LLM Context Engine sits between the ad server and the auction, turning page content into structured signals.
GPT-style models read context that keyword and URL signals miss.
Constrain category output to a fixed taxonomy (e.g. IAB) with confidence scores so it’s machine-usable — and validate against hallucinations.
Use the model for brand-safety classification, but calibrate thresholds to avoid over-blocking and killing yield.
Score ad–page relevance with embeddings for speed; reserve the big model for understanding and safety.
Keep the LLM off the synchronous auction path: pre-compute, cache, and fail safe.