June 2026 · AI Search · Pricing

How ChatGPT pulls hotel prices: it scrapes the OTAs, then cites Reddit

TL;DR: Ask ChatGPT what a hotel costs and it goes to the live web 98.8% of the time. It fetches online travel agencies (Booking, Expedia, Klook) more than any other source class, at 46% of all fetched documents, but cites them least (11%). It reads the price off those pages, then footnotes something more readable: Reddit is cited 100% of the time it is fetched. And the price you see almost always rides one licensed retrieval tier (labrador), not the scraper tier.

Nicolas Sitter
Published June 26, 2026
240 price captures · 3,092 documents fetched · 11% OTA cite rate · 100% Reddit cite rate

Executive Summary

ChatGPT treats online travel agencies as price oracles to read from — not as the sources it shows you.

We asked ChatGPT 30 hotel-price questions — 12 named hotels across four segments and three cities, plus open “cheapest hotel” queries — and ran each eight times through Bright Data. For every answer we parsed ChatGPT's raw response stream down to its network-source layer: each document it fetched, the retrieval tier that served it (result_source), and whether it was actually cited. That's 240 captures and 3,092 fetched documents.

The plumbing is consistent and a little surprising. A price question almost always hits the live web. ChatGPT casts a wide net over the OTAs to read the number, then cites a much smaller, more human-readable set — forums and editorial win the footnote, official sites win for chains and luxury, and the cited price rides the licensed labrador tier rather than the Bright Data scraper tier the trade press associates with shopping.

Section 1

A price question almost always reaches the web

Before searching, ChatGPT files each query into a bucket (turn_use_case) that decides whether the web is touched. For generic hotel questions a large share land in text — answered from memory, no retrieval. Attach a price and that disappears: every price prompt routed to a live-search bucket.

Query routing across 240 ChatGPT hotel-price captures. 98.8% triggered a web search.

| turn_use_case | Share of price turns | Hits the web? |
| --- | --- | --- |
| instant search | 85% | Yes |
| local | 15% | Yes (maps / places) |
| text | 0% | No (none landed here) |

Asking for a price forces ChatGPT onto the live web. Where a generic “best hotels in Paris” can be answered from training data, “how much is a room at the Ritz on these dates” is treated as freshness-sensitive — so the answer is only as good as what it can fetch and read in that moment.
Section 2

It fetches the OTAs the most and cites them the least

Here is every document ChatGPT fetched for a price question, grouped by source class, against how often a fetched document of that class was actually cited. The gap between the two is the whole story.

Fetched vs cited per source class across 3,092 documents (471 cited). OTAs are 46% of everything fetched.

| Source class | Fetched | Cited | Cite rate |
| --- | --- | --- | --- |
| Forum / social (Reddit) | 51 | 51 | 100% |
| Editorial | 29 | 17 | 59% |
| Deal / loyalty / points | 170 | 54 | 32% |
| Metasearch (Kayak, Trivago, Google) | 477 | 89 | 19% |
| Official hotel site | 554 | 87 | 16% |
| OTA (Booking, Expedia, Klook) | 1,434 | 154 | 11% |

OTAs are 46% of every document ChatGPT fetches — nearly as much as all other classes combined — yet they have the lowest cite rate of any real source class (11%). ChatGPT scrapes a wide OTA spread to read the live nightly rate off the page, then footnotes a much smaller, more human-readable set. The aggregator is the price oracle; it is not the citation.
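
The cite rate in the table above is cited divided by fetched for each class, and the fetch share is a class's fetched count divided by all 3,092 documents. A minimal Python sketch that recomputes both from the published figures; the dictionary layout and the function names `cite_rate` and `fetch_share` are illustrative assumptions, not part of the study's tooling.

```python
# Fetched and cited counts per source class, copied from the table above.
COUNTS = {
    "Forum / social": (51, 51),
    "Editorial": (29, 17),
    "Deal / loyalty / points": (170, 54),
    "Metasearch": (477, 89),
    "Official hotel site": (554, 87),
    "OTA": (1434, 154),
}
TOTAL_FETCHED = 3092  # all documents, including classes not listed in the table


def cite_rate(fetched: int, cited: int) -> float:
    """Share of fetched documents that were also cited."""
    return cited / fetched if fetched else 0.0


def fetch_share(fetched: int, total: int = TOTAL_FETCHED) -> float:
    """Share of all fetched documents that belong to one class."""
    return fetched / total


if __name__ == "__main__":
    # Print classes from most-cited-per-fetch to least.
    ranked = sorted(COUNTS.items(), key=lambda kv: cite_rate(*kv[1]), reverse=True)
    for name, (fetched, cited) in ranked:
        print(
            f"{name:<26} fetch share {fetch_share(fetched):6.1%}"
            f"  cite rate {cite_rate(fetched, cited):6.1%}"
        )
```

Sorting by cite rate rather than by volume makes the inversion visible at a glance: the class with the largest fetch share lands at the bottom of the list.
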
Section 3

Reddit gets cited every single time

The flip side of the OTA gap: when ChatGPT pulls a forum thread about what a hotel really costs, it cites it every time (51 of 51), and a fetched magazine piece is cited more often than not (17 of 29). These are the most-cited individual price domains.

Most-cited price domains across the study. Reddit is the #2 most-cited domain after Booking.com.

| Domain | Class | Times cited |
| --- | --- | --- |
| booking.com | OTA | 64 |
| reddit.com | Forum / social | 51 |
| kayak.com | Metasearch | 37 |
| expedia.com | OTA | 36 |
| hilton.com | Official | 32 |
| klook.com | OTA | 20 |
| saverrooms.co.uk | Deal / loyalty | 14 |
| google.com | Metasearch | 11 |

Two different games. Being on Booking gets your number read; being discussed on Reddit or written up in a magazine gets you shown. For a hotel chasing visibility in price answers, earned forum and editorial coverage is far more likely to be cited once fetched than another OTA listing.
Section 4

The cited price rides the licensed tier, not the scraper

ChatGPT stamps each fetched page with result_source — the pipeline that served it. The trade press associates the bright (Bright Data) tier with shopping and finance. For hotels it does show up in the fetch layer — but it nearly vanishes from what gets cited.

result_source tier of fetched vs cited documents. Nearly every tier-tagged cited price source (392 of 395) resolves to labrador.

| Retrieval tier | Fetched | Cited |
| --- | --- | --- |
| labrador (licensed / quality-gated) | 2,447 | 392 |
| bright (Bright Data datasets) | 170 | 3 |
| serp (open-web baseline) | 4 | 0 |
| untagged (footnote-only) | 471 | 76 |

bright appears in the fetch layer (170 documents, about 5.5% of everything fetched) but is cited just 3 times. Of the 395 cited documents that can be assigned a tier, 392 resolve to labrador. So the “Bright Data dominates shopping” story is visible, at most, in what ChatGPT fetches; for hotels, the price you read came through the licensed tier. (The untagged-but-cited rows are Reddit and editorial, which arrive via a separate footnote path and carry no tier.)
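
The tier split can be checked directly from the table above. The sketch below recomputes how much of the tier-tagged citation volume each tier accounts for, leaving out the untagged footnote path; the data structure and the helper name `tagged_cited_share` are assumptions made for illustration.

```python
# Fetched and cited counts per result_source tier, copied from the table above.
TIERS = {
    "labrador": {"fetched": 2447, "cited": 392},
    "bright": {"fetched": 170, "cited": 3},
    "serp": {"fetched": 4, "cited": 0},
    "untagged": {"fetched": 471, "cited": 76},
}


def tagged_cited_share(tiers: dict, tier: str) -> float:
    """Share of tier-tagged citations (untagged excluded) served by `tier`."""
    tagged_total = sum(v["cited"] for k, v in tiers.items() if k != "untagged")
    return tiers[tier]["cited"] / tagged_total


if __name__ == "__main__":
    for name in ("labrador", "bright", "serp"):
        print(f"{name:<9} {tagged_cited_share(TIERS, name):.1%} of tier-tagged citations")
```

The column totals also reconcile with the headline figures: fetched sums to 3,092 and cited sums to 471.
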
Section 5

Where the price comes from depends on the hotel

“Where does the price come from” has no single answer — it tracks how much of a hotel's inventory the OTAs control. Cited source-class share, by hotel segment:

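Cited source-class share by hotel segment. Each row is the mix of sources ChatGPT cited for that group's price.

| Segment | OTA | Official | Deal / loyalty | Metasearch | Forum |
| --- | --- | --- | --- | --- | --- |
| Palace (Ritz, Savoy, Plaza) | 21% | 30% | 23% | 11% | 7% |
| Global chain (Hilton) | 32% | 31% | 10% | 12% | 15% |
| Boutique indie | 37% | 15% | 10% | 19% | 10% |
| Budget chain (ibis, Premier Inn, Pod) | 30% | 17% | 15% | 27% | 11% |

City-level open queries: 44%, ~0%, 25%, 12% (the source gives four values for the five columns, so the column assignment is not recoverable here).
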
Luxury is priced off official and loyalty channels; independents off OTAs. Palaces lean on their own site plus Amex Fine Hotels & Resorts and points blogs (OTAs only 21%). Global chains see official roughly tie with OTAs — strong brand.com holds its own. But boutique independents get OTA-priced 37% of the time versus their own site only 15%, even when they run a strong direct-booking site. Open “cheapest hotel” questions lean hardest on aggregators of all.
Section 6

What it actually quoted — and which numbers to distrust

The per-night figure ChatGPT typically quoted for each named hotel (median across 16 captures), next to the source it leaned on. The source is a tell: when it's the official site, the number tends to hold; when it's a deal blog or an unrelated chain, distrust it.

Median per-night price ChatGPT quoted per named hotel for the same dates, with its dominant cited source. Figures are estimates; the source predicts reliability.

| Hotel | Segment | Typical / night | Dominant cited source |
| --- | --- | --- | --- |
| Hilton Times Square | Global chain | ~$235 | hilton.com (official) |
| The Plaza | Palace | $1,250–$1,900 | theplazany.com (official) |
| Hilton Paris Opéra | Global chain | €260–€350 | hilton.com + Kayak |
| The Savoy | Palace | £1,000–£1,900 | Booking.com |
| Ritz Paris | Palace | ~$2,900 | Travelzoo, Amex FHR, points blogs |
| The Greenwich Hotel | Boutique indie | $1,200–$1,800 | Kayak + official |
| Hilton London Park Lane | Global chain | £250–£500 | Booking.com + hilton.com |
| Grand Hôtel du Palais Royal | Boutique indie | ~$800–$850 | Expedia / Booking ⚠ |
| Hazlitt’s | Boutique indie | £450–£600 | hotelpricewatch, Expedia, deal blogs |
| ibis Paris Tour Eiffel | Budget chain | €160–€170 | Klook (unusual channel) |
| Pod 51 | Budget chain | $105–$130 | Booking, Google, Kayak |
| Premier Inn County Hall | Budget chain | £100–£260 | saverrooms.co.uk (deal blog) ⚠ |

Three tells where the source flags a shaky number. (1) Premier Inn was priced almost entirely from saverrooms.co.uk, a third-party deal blog — not premierinn.com, which publishes hard, JavaScript-light rates of its own; the £100–£260 spread is suspiciously wide. (2) ibis Paris was priced off Klook, a Southeast-Asia-centric OTA, over Accor's own site. (3) Grand Hôtel du Palais Royal, an independent, cited hilton.com — a wrong-entity citation, so its ~$800 may be anchored to an unrelated property. By contrast the official-sourced figures (Hilton Times Square, The Plaza) should track reality closely.
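
One way to make the "suspiciously wide" judgment concrete is the ratio between the high and low end of each quoted range. The sketch below parses a few ranges from the table above and flags any whose ratio exceeds a cutoff. The cutoff `MAX_RATIO = 2.5` and the helper names are hypothetical, chosen only for illustration; the study does not state a threshold.

```python
import re

# Quoted ranges copied from the table above; single-figure quotes are omitted.
QUOTES = {
    "The Plaza": "$1,250–$1,900",
    "Hilton Paris Opéra": "€260–€350",
    "The Savoy": "£1,000–£1,900",
    "Hilton London Park Lane": "£250–£500",
    "Premier Inn County Hall": "£100–£260",
}

MAX_RATIO = 2.5  # hypothetical cutoff, not a figure from the study


def parse_range(text: str) -> tuple[float, float]:
    """Extract the low and high figures from a quote such as '£100–£260' or '~$235'."""
    numbers = [
        float(n.replace(",", ""))
        for n in re.findall(r"\d[\d,]*(?:\.\d+)?", text)
    ]
    if not numbers:
        raise ValueError(f"no price found in {text!r}")
    return min(numbers), max(numbers)


def spread_ratio(text: str) -> float:
    """High end divided by low end; 1.0 means a single-figure quote."""
    low, high = parse_range(text)
    return high / low


def flag_wide(quotes: dict[str, str], max_ratio: float = MAX_RATIO) -> list[str]:
    """Names of hotels whose quoted range is wider than `max_ratio`."""
    return [name for name, quote in quotes.items() if spread_ratio(quote) > max_ratio]


if __name__ == "__main__":
    for name, quote in QUOTES.items():
        print(f"{name:<26} {quote:<16} ratio {spread_ratio(quote):.2f}")
    print("flagged:", flag_wide(QUOTES))
```

Because the ratio is unitless, ranges in dollars, euros and pounds can be compared without currency conversion. A wide ratio is only a prompt to check the cited source, not proof that the figure is wrong.
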
Methodology

Study Design

Data Collection

  • 30 frozen price prompts: 12 named hotels (4 segments × 3 cities) plus open city-level questions, each run 8× via Bright Data, country US, 25 Jun 2026 — 240 captures.
  • Each capture parsed from the raw SSE stream into its network-source layer: every fetched document with result_source tier and cited flag — 3,092 documents, 471 cited.
  • Prices extracted from the prose answer per capture; source classes curated from observed domains.
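
The parsing step described above can be sketched roughly as follows. This is an assumption-heavy illustration, not the study's parser: it assumes each event arrives as JSON on a single `data:` line, that a `[DONE]` sentinel may close the stream, and that fetched documents appear somewhere in the payload as objects carrying a `result_source` key. The `url` key in the usage example is likewise hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[Any]:
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip event names, comments and blank separators
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue  # assumed end-of-stream sentinel
        try:
            yield json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore keep-alives or other non-JSON data lines


def find_documents(node: Any) -> Iterator[dict]:
    """Recursively yield every dict that carries a `result_source` key."""
    if isinstance(node, dict):
        if "result_source" in node:
            yield node
        for value in node.values():
            yield from find_documents(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_documents(item)


if __name__ == "__main__":
    # Hypothetical stream fragment; real payloads will be shaped differently.
    sample = [
        "event: delta",
        'data: {"v": [{"url": "https://example.com/a", "result_source": "labrador"}]}',
        "",
        "data: [DONE]",
    ]
    for event in iter_sse_json(sample):
        for doc in find_documents(event):
            print(doc["result_source"], doc.get("url"))
```

Walking the payload recursively avoids hard-coding a nesting path, which is useful when the exact location of the document objects is not known in advance.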

Caveats

  • N = 240, one capture day, US proxy, English only — a snapshot, not a trend.
  • Cited documents arrive via a footnote path with different URL strings than the tier-tagged fetches, so a cited source's tier is imputed from its domain's modal tier. All raw rows are stored.
  • The structured shopping block never fired for hotels (it is a products surface); prices come via the map panel and prose.
  • oxylabs did not appear in this hotel run; serp appeared 4× and was never cited.
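
The modal-tier imputation mentioned in the caveats could look something like the sketch below: count, per domain, which tier served its tier-tagged fetches, then assign each cited URL the most common tier for its domain. The function names, the `(url, result_source)` input shape and the `www.` stripping are assumptions for illustration; the study's own matching rules are not published in this section.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host of a URL without a leading 'www.' (other subdomains are kept)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def modal_tier_by_domain(fetched: list[tuple[str, str]]) -> dict[str, str]:
    """Map each domain to the tier that served it most often.

    `fetched` is a list of (url, result_source) pairs from the tier-tagged fetch layer.
    Ties go to the tier seen first for that domain.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for url, tier in fetched:
        counts[domain_of(url)][tier] += 1
    return {domain: c.most_common(1)[0][0] for domain, c in counts.items()}


def impute_tier(cited_url: str, modal: dict[str, str]) -> str:
    """Tier for a cited URL, or 'untagged' when its domain was never tier-tagged."""
    return modal.get(domain_of(cited_url), "untagged")


if __name__ == "__main__":
    # Hypothetical rows, not data from the study.
    fetched = [
        ("https://www.booking.com/hotel/a", "labrador"),
        ("https://www.booking.com/hotel/b", "labrador"),
        ("https://booking.com/hotel/c", "bright"),
    ]
    modal = modal_tier_by_domain(fetched)
    print(impute_tier("https://www.booking.com/x?ref=1", modal))
    print(impute_tier("https://www.reddit.com/r/travel", modal))
```

Matching on domain rather than full URL is what lets a footnote URL with different query strings inherit a tier, at the cost of assuming one domain is served mostly by one tier.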

Open data. Headline stats and the underlying tables are published as CSV: summary.csv, fetch_vs_cite.csv, tier_distribution.csv, by_segment.csv, top_cited_domains.csv, quoted_prices.csv.
