How ChatGPT pulls hotel prices: it scrapes the OTAs, then cites Reddit
TL;DR: Ask ChatGPT what a hotel costs and it goes to the live web 98.8% of the time. It fetches online travel agencies (Booking, Expedia, Klook) more than every other classified source combined, but cites them the least (11% of OTA fetches). It reads the price off those pages, then footnotes a more readable source: Reddit gets cited 100% of the time it's fetched. And the price you see almost always rides one licensed retrieval tier (labrador), not the Bright Data scraper tier.
Executive Summary
ChatGPT treats online travel agencies as price oracles to read from — not as the sources it shows you.
We asked ChatGPT 30 hotel-price questions — 12 named hotels across four segments and three cities, plus open “cheapest hotel” queries — and ran each eight times through Bright Data. For every answer we parsed ChatGPT's raw response stream down to its network-source layer: each document it fetched, the retrieval tier that served it (result_source), and whether it was actually cited. That's 240 captures and 3,092 fetched documents.
The plumbing is consistent and a little surprising. A price question almost always hits the live web. ChatGPT casts a wide net over the OTAs to read the number, then cites a much smaller, more human-readable set: once fetched, forums and editorial are the likeliest to win the footnote, official sites hold their own for chains and luxury, and the cited price rides the licensed labrador tier rather than the Bright Data scraper tier the trade press associates with shopping.
A price question almost always reaches the web
Before searching, ChatGPT files each query into a bucket (turn_use_case) that decides whether the web is touched. For generic hotel questions, a large share land in text: answered from memory, with no retrieval. Attach a price and that bucket disappears: every price turn was routed to a live-search bucket.
| turn_use_case | Share of price turns | Hits the web? |
|---|---|---|
| instant search | 85% | Yes |
| local | 15% | Yes (maps / places) |
| text | 0% | No — none landed here |
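The routing table is a tally of one turn_use_case label per price turn. A minimal Python sketch of that tally follows, assuming each capture has already been reduced to a dict with a turn_use_case string and a count of fetched documents; those field names, the toy data, and the bucket-to-web mapping are illustrative assumptions, not ChatGPT's schema. It keeps "routed to a search bucket" and "fetched at least one document" as separate measures, since a turn routed to search could in principle still come back without a fetch.

```python
from collections import Counter

# Assumed mapping from turn_use_case bucket to "touches the live web",
# mirroring the table above; not an official ChatGPT definition.
HITS_WEB = {"instant search": True, "local": True, "text": False}


def routing_summary(captures):
    """Return (share per bucket, share routed to a web bucket, share that fetched)."""
    total = len(captures)
    buckets = Counter(c["turn_use_case"] for c in captures)
    shares = {bucket: n / total for bucket, n in buckets.items()}
    routed = sum(n for b, n in buckets.items() if HITS_WEB.get(b, False)) / total
    # Routing and fetching are measured separately: a search-bucket turn
    # that returned no documents counts as routed but not fetched.
    fetched = sum(1 for c in captures if c["fetched"] > 0) / total
    return shares, routed, fetched


# Toy captures (hypothetical, not the study's data).
captures = (
    [{"turn_use_case": "instant search", "fetched": 12}] * 16
    + [{"turn_use_case": "instant search", "fetched": 0}]
    + [{"turn_use_case": "local", "fetched": 5}] * 3
)
shares, routed, fetched = routing_summary(captures)
print(shares)           # {'instant search': 0.85, 'local': 0.15}
print(routed, fetched)  # 1.0 0.95
```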
It fetches the OTAs the most and cites them the least
Here is every classified document ChatGPT fetched for a price question, grouped by source class, against how often a fetched document of that class was actually cited. The gap between the two is the whole story.
| Source class | Fetched | Cited | Cite rate |
|---|---|---|---|
| Forum / social (Reddit) | 51 | 51 | 100% |
| Editorial | 29 | 17 | 59% |
| Deal / loyalty / points | 170 | 54 | 32% |
| Metasearch (Kayak, Trivago, Google) | 477 | 89 | 19% |
| Official hotel site | 554 | 87 | 16% |
| OTA (Booking, Expedia, Klook) | 1,434 | 154 | 11% |
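The cite-rate column is simply cited divided by fetched. The short Python check below reproduces it using only the counts in the table, and makes the OTA comparison explicit. The six classes account for 2,715 of the 3,092 fetched documents, so the claim that OTAs outnumber all other sources combined holds among classified documents (1,434 OTA fetches against 1,281 for the other five classes).

```python
# Fetched and cited counts per source class, copied from the table above.
COUNTS = {
    "Forum / social": (51, 51),
    "Editorial": (29, 17),
    "Deal / loyalty / points": (170, 54),
    "Metasearch": (477, 89),
    "Official hotel site": (554, 87),
    "OTA": (1434, 154),
}


def cite_rate(fetched, cited):
    """Share of fetched documents that were cited, as a whole percentage."""
    return round(100 * cited / fetched)


rates = {cls: cite_rate(f, c) for cls, (f, c) in COUNTS.items()}
for cls, rate in rates.items():
    print(f"{cls:<24} {rate:>3}%")

# The OTA comparison, restricted to the six classified groups.
ota = COUNTS["OTA"][0]
others = sum(f for cls, (f, _) in COUNTS.items() if cls != "OTA")
classified = ota + others
print(ota, others, classified)  # 1434 1281 2715
```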
Reddit gets cited every single time
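The flip side of the OTA gap: when ChatGPT fetches a forum thread about what a hotel really costs, it cites it every time, and it cites a magazine piece more often than not (59%). These are the most-cited individual price domains; booking.com still leads on raw volume because OTAs are fetched so heavily, with Reddit second.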
| Domain | Class | Times cited |
|---|---|---|
| booking.com | OTA | 64 |
| reddit.com | Forum / social | 51 |
| kayak.com | Metasearch | 37 |
| expedia.com | OTA | 36 |
| hilton.com | Official | 32 |
| klook.com | OTA | 20 |
| saverrooms.co.uk | Deal / loyalty | 14 |
| google.com | Metasearch | 11 |
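The domain table counts cited documents per domain, and the Study Design notes that source classes were curated from observed domains. A minimal Python sketch of both steps follows. The DOMAIN_CLASS lookup is a hypothetical excerpt built from the domains in the table, and folding subdomains (for example secure.booking.com) into their curated parent is an assumption about normalization, not the study's documented rule.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical excerpt of a hand-curated domain -> source-class lookup,
# built from the domains in the table above (not the study's full list).
DOMAIN_CLASS = {
    "booking.com": "OTA",
    "expedia.com": "OTA",
    "klook.com": "OTA",
    "reddit.com": "Forum / social",
    "kayak.com": "Metasearch",
    "google.com": "Metasearch",
    "hilton.com": "Official",
    "saverrooms.co.uk": "Deal / loyalty",
}


def normalize(url):
    """Map a URL to its curated parent domain if one matches, else its bare hostname."""
    host = (urlsplit(url).hostname or "").removeprefix("www.")
    labels = host.split(".")
    # Try 'secure.booking.com', then 'booking.com' (never a bare last label like 'com').
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_CLASS:
            return candidate
    return host


def top_cited(docs, n=8):
    """(domain, class, times cited) for the n most-cited domains."""
    counts = Counter(normalize(d["url"]) for d in docs if d["cited"])
    return [(dom, DOMAIN_CLASS.get(dom, "Unclassified"), k) for dom, k in counts.most_common(n)]


docs = [
    {"url": "https://www.booking.com/hotel/gb/savoy.html", "cited": True},
    {"url": "https://secure.booking.com/book.html", "cited": True},
    {"url": "https://www.reddit.com/r/travel/comments/abc", "cited": True},
    {"url": "https://www.expedia.com/h123", "cited": False},
]
print(top_cited(docs))  # [('booking.com', 'OTA', 2), ('reddit.com', 'Forum / social', 1)]
```

Requires Python 3.9+ for str.removeprefix.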
The cited price rides the licensed tier, not the scraper
ChatGPT stamps each fetched page with result_source — the pipeline that served it. The trade press associates the bright (Bright Data) tier with shopping and finance. For hotels it does show up in the fetch layer — but it nearly vanishes from what gets cited.
| Retrieval tier | Fetched | Cited |
|---|---|---|
| labrador (licensed / quality-gated) | 2,447 | 392 |
| bright (Bright Data datasets) | 170 | 3 |
| serp (open-web baseline) | 4 | 0 |
| untagged (footnote-only) | 471 | 76 |
bright appears in the fetch layer (170 documents) but is cited just 3 times. Of the 395 cited documents we can tier, 392 resolve to labrador. So whatever the “Bright Data dominates shopping” story says about product queries, for hotels bright is a minor fetch source (170 of 3,092 documents) and almost never the cited one: the price you read came through the licensed tier. (The untagged-but-cited rows are Reddit and editorial, which arrive via a separate footnote path and carry no tier.)
Where the price comes from depends on the hotel
“Where does the price come from” has no single answer; it appears to track how much of a hotel's inventory the OTAs control. Cited source-class share, by hotel segment (editorial and unclassified citations are not shown, so rows need not sum to 100%):
| Segment | OTA | Official | Deal / loyalty | Metasearch | Forum |
|---|---|---|---|---|---|
| Palace (Ritz, Savoy, Plaza) | 21% | 30% | 23% | 11% | 7% |
| Global chain (Hilton) | 32% | 31% | 10% | 12% | 15% |
| Boutique indie | 37% | 15% | 10% | 19% | 10% |
| Budget chain (ibis, Premier Inn, Pod) | 30% | 17% | 15% | 27% | 11% |
| City-level open queries | 44% | ~0% | — | 25% | 12% |
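Each row of the segment table is a normalized count: cited documents of one class divided by all cited documents for that segment. The Python sketch below assumes each cited document has already been tagged with hypothetical segment and source_class fields; the denominator is every cited document in the segment, whatever its class.

```python
from collections import Counter, defaultdict


def segment_shares(cited_docs):
    """Percent of each segment's cited documents that fall in each source class."""
    by_segment = defaultdict(Counter)
    for doc in cited_docs:
        by_segment[doc["segment"]][doc["source_class"]] += 1
    shares = {}
    for segment, counts in by_segment.items():
        total = sum(counts.values())  # every cited document in the segment
        shares[segment] = {cls: round(100 * k / total) for cls, k in counts.items()}
    return shares


# Toy rows (hypothetical): 4 cited documents for a boutique hotel, 2 for a palace.
rows = (
    [{"segment": "Boutique indie", "source_class": "OTA"}] * 2
    + [{"segment": "Boutique indie", "source_class": "Official"}]
    + [{"segment": "Boutique indie", "source_class": "Editorial"}]
    + [{"segment": "Palace", "source_class": "Official"}] * 2
)
print(segment_shares(rows))
# {'Boutique indie': {'OTA': 50, 'Official': 25, 'Editorial': 25}, 'Palace': {'Official': 100}}
```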
Official sites hold their own for palaces and global chains (30% and 31% of citations). Boutique independents are a different story: they are priced off an OTA 37% of the time and off their own site only 15%, even when they run a strong direct-booking site. Open “cheapest hotel” questions lean hardest on aggregators of all (44% OTA, 25% metasearch).
What it actually quoted, and which numbers to distrust
The per-night figure ChatGPT typically quoted for each named hotel (median across 16 captures), next to the source it leaned on. The source is a tell: when it's the official site, the number tends to hold; when it's a deal blog or an unrelated chain, distrust it.
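The Study Design says prices were extracted from the prose answer of each capture. One way to do that is sketched below in Python: a regular expression picks the first currency amount (or range) in each answer, and the per-hotel figure is the median low and high end across captures. The pattern, the first-price rule, and the single-currency assumption are illustrative choices, not the study's extractor; a real pass would also need to separate nightly rates from stay totals and taxes.

```python
import re
from statistics import median

# Illustrative pattern: a currency symbol, an amount ("1,250" or "235"),
# and an optional range end after a hyphen or en dash.
AMOUNT = r"(\d{1,3}(?:,\d{3})+|\d+)"
PRICE = re.compile(rf"([$€£])\s?{AMOUNT}(?:\s?[–-]\s?[$€£]?\s?{AMOUNT})?")


def first_quote(answer):
    """(currency, low, high) for the first price in a prose answer, or None."""
    m = PRICE.search(answer)
    if m is None:
        return None
    low = int(m.group(2).replace(",", ""))
    high = int(m.group(3).replace(",", "")) if m.group(3) else low
    return m.group(1), low, high


def typical_quote(answers):
    """Median low and high end across captures; assumes one currency per hotel."""
    quotes = [q for q in map(first_quote, answers) if q is not None]
    if not quotes:
        return None
    return quotes[0][0], median(q[1] for q in quotes), median(q[2] for q in quotes)


answers = [
    "Expect roughly $1,250–$1,900 per night at The Plaza.",
    "Rates usually run $1,300 - $1,800 a night.",
    "I couldn't find live rates for those dates.",
    "Typically $1,200–$2,000 per night before taxes.",
]
print(typical_quote(answers))  # ('$', 1250, 1900)
```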
| Hotel | Segment | Typical / night | Dominant cited source |
|---|---|---|---|
| Hilton Times Square | Global chain | ~$235 | hilton.com (official) |
| The Plaza | Palace | $1,250–$1,900 | theplazany.com (official) |
| Hilton Paris Opéra | Global chain | €260–€350 | hilton.com + Kayak |
| The Savoy | Palace | £1,000–£1,900 | Booking.com |
| Ritz Paris | Palace | ~$2,900 | Travelzoo, Amex FHR, points blogs |
| The Greenwich Hotel | Boutique indie | $1,200–$1,800 | Kayak + official |
| Hilton London Park Lane | Global chain | £250–£500 | Booking.com + hilton.com |
| Grand Hôtel du Palais Royal | Boutique indie | ~$800–$850 | Expedia / Booking ⚠ |
| Hazlitt’s | Boutique indie | £450–£600 | hotelpricewatch, Expedia, deal blogs |
| ibis Paris Tour Eiffel | Budget chain | €160–€170 | Klook (unusual channel) ⚠ |
| Pod 51 | Budget chain | $105–$130 | Booking, Google, Kayak |
| Premier Inn County Hall | Budget chain | £100–£260 | saverrooms.co.uk (deal blog) ⚠ |
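The rule of thumb for which quotes to trust can be written down as a small heuristic. The Python sketch below is a hypothetical encoding of it: distrust a quote anchored on a deal blog or one that cites a different property's official site, trust one led by the hotel's own site, and mark everything else for a manual check. It would catch the Premier Inn and Grand Hôtel du Palais Royal cases, but not ibis priced off Klook, because Klook is an OTA rather than a deal blog; that one still needs judgment.

```python
def flag_quote(official_domain, cited_domains, domain_class):
    """Return 'trust', 'distrust' or 'check' for one hotel's quoted price.

    official_domain: the hotel's own site; cited_domains: domains cited for
    its price, most-cited first; domain_class: domain -> source-class lookup.
    """
    if not cited_domains:
        return "check"
    top = cited_domains[0]
    if domain_class.get(top) == "Deal / loyalty":
        return "distrust"  # price anchored on a third-party deal blog
    if any(domain_class.get(d) == "Official" and d != official_domain for d in cited_domains):
        return "distrust"  # wrong-entity citation: another property's official site
    if top == official_domain:
        return "trust"
    return "check"


# Hypothetical lookup; the '.example' domains are placeholders for sites not named here.
CLASSES = {
    "hilton.com": "Official",
    "saverrooms.co.uk": "Deal / loyalty",
    "booking.com": "OTA",
    "expedia.com": "OTA",
    "google.com": "Metasearch",
}
print(flag_quote("hilton.com", ["hilton.com"], CLASSES))            # trust
print(flag_quote("premierinn.com", ["saverrooms.co.uk"], CLASSES))  # distrust
print(flag_quote("palais-royal.example", ["expedia.com", "booking.com", "hilton.com"], CLASSES))  # distrust
print(flag_quote("pod51.example", ["booking.com", "google.com"], CLASSES))  # check
```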
Three figures deserve suspicion. (1) Premier Inn County Hall was priced off saverrooms.co.uk, a third-party deal blog, not premierinn.com, which publishes hard, JavaScript-light rates of its own; the £100–£260 spread is suspiciously wide. (2) ibis Paris was priced off Klook, a Southeast-Asia-centric OTA, rather than Accor's own site. (3) Grand Hôtel du Palais Royal, an independent, cited hilton.com: a wrong-entity citation, so its ~$800 may be anchored to an unrelated property. By contrast, the official-sourced figures (Hilton Times Square, The Plaza) should track reality closely.
Study Design
Data Collection
- 30 frozen price prompts: 24 for the 12 named hotels (two per hotel; 4 segments × 3 cities) plus 6 open city-level questions, each run 8× via Bright Data, country US, 25 Jun 2026, for 240 captures.
- Each capture parsed from the raw SSE stream into its network-source layer: every fetched document with its result_source tier and cited flag (3,092 documents, 471 cited).
- Prices extracted from the prose answer per capture; source classes curated from observed domains.
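The parsing step works on the raw server-sent events (SSE) stream. Below is a minimal Python sketch: a small SSE reader that joins each event's data: lines and decodes them as JSON, then a collector that builds one row per fetched URL with its result_source tier and a cited flag. The event payload shape (a documents list with url, result_source and cited keys) is a hypothetical stand-in; only result_source and the cited flag come from this report, and ChatGPT's actual field layout may differ.

```python
import json


def sse_events(raw):
    """Yield the decoded JSON payload of each SSE event in a raw stream.

    Events end at a blank line; multi-line data is joined with newlines,
    as in the SSE format. Other fields and ':' comments are ignored.
    """
    data = []
    for line in raw.splitlines() + [""]:  # trailing "" flushes the last event
        if line.startswith("data:"):
            value = line[5:]
            data.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data:
            payload = "\n".join(data)
            data = []
            if payload != "[DONE]":  # end-of-stream sentinel, not JSON
                yield json.loads(payload)


def network_sources(raw):
    """One row per fetched URL: result_source tier (None if untagged) and cited flag.

    Assumes a hypothetical payload shape: events may carry a 'documents' list
    whose items have 'url', optional 'result_source', and optional 'cited'.
    """
    rows = {}
    for event in sse_events(raw):
        if not isinstance(event, dict):
            continue
        for item in event.get("documents", []):
            row = rows.setdefault(
                item["url"], {"url": item["url"], "result_source": None, "cited": False}
            )
            if row["result_source"] is None:  # keep the first tier we see
                row["result_source"] = item.get("result_source")
            row["cited"] = row["cited"] or bool(item.get("cited"))
    return list(rows.values())


raw = (
    'data: {"documents": [{"url": "https://www.booking.com/a", "result_source": "labrador"}]}\n'
    "\n"
    'data: {"documents": [{"url": "https://www.booking.com/a", "cited": true},\n'
    'data:  {"url": "https://www.reddit.com/r/travel/1", "cited": true}]}\n'
    "\n"
    "data: [DONE]\n"
)
for row in network_sources(raw):
    print(row)
```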
Caveats
- N = 240, one capture day, US proxy, English only — a snapshot, not a trend.
- Cited documents arrive via a footnote path with different URL strings than the tier-tagged fetches, so a cited source's tier is imputed from its domain's modal tier. All raw rows are stored.
- The structured shopping block never fired for hotels (it is a products surface); prices come via the map panel and prose.
- oxylabs did not appear in this hotel run; serp appeared 4× and was never cited.
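The caveat about imputed tiers describes a concrete procedure: for each domain, take the most common result_source among its tier-tagged fetches, then give every cited document from that domain that tier. Below is a Python sketch under that reading, with toy rows. The row fields and the tie-breaking rule (first tier seen wins, which is how Counter.most_common orders equal counts) are assumptions.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def domain(url):
    """Bare hostname without a leading 'www.'."""
    return (urlsplit(url).hostname or "").removeprefix("www.")


def modal_tiers(fetched):
    """Most common result_source per domain, counting tier-tagged fetches only.

    Ties go to the tier seen first (Counter.most_common keeps first-encountered order).
    """
    counts = defaultdict(Counter)
    for doc in fetched:
        if doc["result_source"]:
            counts[domain(doc["url"])][doc["result_source"]] += 1
    return {dom: c.most_common(1)[0][0] for dom, c in counts.items()}


def impute_cited_tiers(cited_urls, fetched):
    """Give each cited URL its domain's modal tier, or 'untagged' if the domain never had one."""
    tiers = modal_tiers(fetched)
    return {url: tiers.get(domain(url), "untagged") for url in cited_urls}


# Toy rows (hypothetical): cited URL strings differ from the fetched ones.
fetched = [
    {"url": "https://www.booking.com/hotel/a", "result_source": "labrador"},
    {"url": "https://www.booking.com/hotel/b", "result_source": "labrador"},
    {"url": "https://www.booking.com/hotel/c", "result_source": "bright"},
    {"url": "https://www.reddit.com/r/travel/1", "result_source": None},
]
cited = ["https://booking.com/hotel/a?aid=1", "https://www.reddit.com/r/travel/1"]
imputed = impute_cited_tiers(cited, fetched)
print(imputed)
```

Requires Python 3.9+ for str.removeprefix.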
Open data. Headline stats and the underlying tables are published as CSV: summary.csv, fetch_vs_cite.csv, tier_distribution.csv, by_segment.csv, top_cited_domains.csv, quoted_prices.csv.