Where does ChatGPT get hotel prices from?

From the open web, in real time: a hotel price question triggers a web search 98.8% of the time. ChatGPT fetches online travel agencies (Booking.com, Expedia, Klook and similar) more than any other kind of source: OTAs are 46% of all documents it pulls. It reads the price off those pages, but it cites a fetched OTA page only 11% of the time, preferring to footnote more human-readable sources such as forums and editorial pieces.

Does ChatGPT use the hotel’s own website for the price?

Sometimes, but less than you would expect. Official hotel and brand sites are 18% of the documents ChatGPT fetches for price questions, and it cites a fetched official page only 16% of the time. Whether the official site wins depends on the hotel: luxury palaces are priced largely from official and loyalty channels (Amex Fine Hotels & Resorts, points blogs), and global chains see official roughly tie with OTAs. For independent boutique hotels, however, OTAs make up 37% of cited sources versus only 15% for the hotel's own site, even when it runs a strong direct-booking site.

Is the price ChatGPT quotes accurate?

Treat it as an estimate, and let the source tell you how much to trust it. When the cited source is the official hotel or brand site, the figure tends to track reality. When it comes from a third-party deal blog, an unusual aggregator, or — in one case in this study — a citation to an entirely unrelated hotel chain, the number is much shakier. ChatGPT also reads prices off the page text directly, so anything behind JavaScript or inside an image is invisible to it.

What is the result_source / labrador tier, and why does it matter for prices?

result_source is an undocumented field ChatGPT stamps on each retrieved page naming the pipeline that fetched it: labrador (a licensed, quality-gated tier), bright (Bright Data structured datasets), oxylabs (scraped open web) or serp (open-web baseline). For hotel prices, the bright tier does appear in the fetch layer (170 of 3,092 documents) but almost never in what gets cited (3 of 471). Of the 395 cited price sources that carry a tier tag, 392 came through labrador. So the licensed tier, not the scraper tiers, supplies the price you read.

Why does ChatGPT fetch Booking.com so much but rarely cite it?

It uses OTAs as price oracles, not as the sources it shows you. ChatGPT pulls a wide spread of OTA pages for a hotel to read the live nightly rate off them, then footnotes a smaller, more readable set. OTAs are fetched the most (46% of documents) but cited the least (11%), while Reddit is cited 100% of the time it is fetched and editorial 59%. Being on Booking gets your number read; being discussed on Reddit or in a magazine gets you shown.

How should hotels optimize for ChatGPT price answers?

Keep your price in plain, fetchable HTML text — never behind JavaScript or inside an image — because ChatGPT reads the number straight off the page. Maintain OTA presence so the number exists to be read, but invest in earned forum and editorial coverage, which is what actually gets cited. For chains and luxury properties a strong official site can win the price citation; for independents, the OTA-plus-earned-coverage combination is the realistic path.
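One way to sanity-check the "plain, fetchable HTML" advice is to look at the markup a server returns before any JavaScript runs and see whether a price-like string shows up in its visible text. The sketch below uses only Python's standard library. The price pattern, the function name and the two sample pages are illustrative assumptions, and it does not claim to reproduce how ChatGPT itself reads a page.

```python
import re
from html.parser import HTMLParser

# Illustrative pattern (an assumption): currency symbol, then an amount such as "$235" or "€1,250.00".
PRICE_RE = re.compile(r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d{2})?")


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def prices_in_static_html(html: str) -> list[str]:
    """Return price-like strings found in the static, visible text of the markup."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return PRICE_RE.findall(" ".join(parser.chunks))


# Hypothetical sample pages: one with the rate in the markup, one that injects it with JavaScript.
STATIC_PAGE = '<p>From <span class="rate">$235</span> per night</p>'
JS_ONLY_PAGE = '<div id="rate"></div><script>rate.textContent = "$235";</script>'

print(prices_in_static_html(STATIC_PAGE))
print(prices_in_static_html(JS_ONLY_PAGE))
```

An empty result for a page whose rate is visible in the browser suggests the number is being injected client-side or rendered as an image, which is the case the advice above warns about.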

June 2026 · AI Search · Pricing

How ChatGPT pulls hotel prices: it scrapes the OTAs, then cites Reddit

Name: How ChatGPT Sources Hotel Prices — Network-Source Forensics 2026
Creator: Nicolas Sitter
Published: 2026-06-26
License: https://creativecommons.org/licenses/by/4.0/

TL;DR: Ask ChatGPT what a hotel costs and it goes to the live web 98.8% of the time. It fetches online travel agencies (Booking, Expedia, Klook) more than any other source class (46% of all documents) but cites them least (11%). It reads the price off those pages, then footnotes something more readable: Reddit gets cited 100% of the time it's fetched. And the price you see almost always rides one licensed retrieval tier (labrador), not the scraper tier.

Nicolas Sitter

Published June 26, 2026

240 price captures · 3,092 documents fetched · 11% OTA cite rate · 100% Reddit cite rate

Executive Summary

ChatGPT treats online travel agencies as price oracles to read from — not as the sources it shows you.

We asked ChatGPT 30 hotel-price questions — 12 named hotels across four segments and three cities, plus open “cheapest hotel” queries — and ran each eight times through Bright Data. For every answer we parsed ChatGPT's raw response stream down to its network-source layer: each document it fetched, the retrieval tier that served it (result_source), and whether it was actually cited. That's 240 captures and 3,092 fetched documents.

The plumbing is consistent and a little surprising. A price question almost always hits the live web. ChatGPT casts a wide net over the OTAs to read the number, then cites a much smaller, more human-readable set — forums and editorial win the footnote, official sites win for chains and luxury, and the cited price rides the licensed labrador tier rather than the Bright Data scraper tier the trade press associates with shopping.

Section 1

A price question almost always reaches the web

Before searching, ChatGPT files each query into a bucket (turn_use_case) that decides whether the web is touched. For generic hotel questions, a large share lands in text: answered from memory, with no retrieval. Attach a price and that bucket empties: every price prompt was routed to a live-search bucket, and 98.8% of captures went on to trigger a web search.

Query routing across 240 ChatGPT hotel-price captures. 98.8% triggered a web search.

turn_use_case	Share of price turns	Hits the web?
instant search	85%	Yes
local	15%	Yes (maps / places)
text	0%	No — none landed here

Asking for a price forces ChatGPT onto the live web. Where a generic “best hotels in Paris” can be answered from training data, “how much is a room at the Ritz on these dates” is treated as freshness-sensitive — so the answer is only as good as what it can fetch and read in that moment.

Section 2

It fetches the OTAs the most and cites them the least

Here is every document ChatGPT fetched for a price question, grouped by source class, against how often a fetched document of that class was actually cited. The gap between the two is the whole story.

Fetched vs cited per source class across 3,092 documents (471 cited). OTAs are 46% of everything fetched.

Source class	Fetched	Cited	Cite rate
Forum / social (Reddit)	51	51	100%
Editorial	29	17	59%
Deal / loyalty / points	170	54	32%
Metasearch (Kayak, Trivago, Google)	477	89	19%
Official hotel site	554	87	16%
OTA (Booking, Expedia, Klook)	1,434	154	11%

OTAs are 46% of all documents ChatGPT fetches, nearly as much as all other classes combined, yet they have the lowest cite rate of any source class in the table (11%). ChatGPT scrapes a wide OTA spread to read the live nightly rate off the page, then footnotes a much smaller, more human-readable set. The aggregator is the price oracle; it is not the citation.
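The two percentages in this section come from different denominators: fetch share divides by all 3,092 fetched documents, while cite rate divides by the documents fetched within one class. A minimal sketch of that arithmetic follows, with counts copied from the table above. The dictionary and function names are illustrative, not taken from the study's code. The six listed classes sum to 2,715 fetched documents, so the remainder of the 3,092 is not broken out in the table.

```python
# (fetched, cited) counts copied from the fetch-vs-cite table.
FETCH_VS_CITE = {
    "Forum / social (Reddit)": (51, 51),
    "Editorial": (29, 17),
    "Deal / loyalty / points": (170, 54),
    "Metasearch": (477, 89),
    "Official hotel site": (554, 87),
    "OTA": (1434, 154),
}
TOTAL_FETCHED = 3092  # every fetched document in the study, including unclassified ones


def fetch_share(fetched: int, total: int = TOTAL_FETCHED) -> float:
    """Share of all fetched documents that belong to one class."""
    return fetched / total


def cite_rate(fetched: int, cited: int) -> float:
    """Share of a class's fetched documents that ended up cited."""
    return cited / fetched


for name, (fetched, cited) in FETCH_VS_CITE.items():
    print(f"{name:<26} fetch share {fetch_share(fetched):6.1%}   cite rate {cite_rate(fetched, cited):6.1%}")
```

Rounded to whole percentages, the output matches the table's cite-rate column and the 46% OTA fetch share.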

Section 3

Reddit gets cited every single time

The flip side of the OTA gap: when ChatGPT pulled a forum thread about what a hotel really costs, it cited it every time in this study (51 of 51), and it cited a fetched editorial piece 59% of the time. These are the most-cited individual price domains.

Most-cited price domains across the study. Reddit is the #2 most-cited domain after Booking.com.

Domain	Class	Times cited
booking.com	OTA	64
reddit.com	Forum / social	51
kayak.com	Metasearch	37
expedia.com	OTA	36
hilton.com	Official	32
klook.com	OTA	20
saverrooms.co.uk	Deal / loyalty	14
google.com	Metasearch	11

Two different games. Being on Booking gets your number read; being discussed on Reddit or written up in a magazine gets you shown. For a hotel chasing visibility in price answers, earned forum and editorial coverage converts into citations far more reliably than another OTA listing.

Section 4

The cited price rides the licensed tier, not the scraper

ChatGPT stamps each fetched page with result_source — the pipeline that served it. The trade press associates the bright (Bright Data) tier with shopping and finance. For hotels it does show up in the fetch layer — but it nearly vanishes from what gets cited.

result_source tier of fetched vs cited documents. Nearly every tier-tagged cited price source (392 of 395) resolves to labrador.

Retrieval tier	Fetched	Cited
labrador (licensed / quality-gated)	2,447	392
bright (Bright Data datasets)	170	3
serp (open-web baseline)	4	0
untagged (footnote-only)	471	76

bright appears in the fetch layer (170 documents) but is cited just 3 times. Of the 395 cited price sources we can tier, 392 resolve to labrador. So the “Bright Data dominates shopping” story shows up, at most, in what ChatGPT fetches; for hotels, the price you read came through the licensed tier. (The untagged-but-cited rows are Reddit and editorial, which arrive via a separate footnote path and carry no tier.)
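To make the tier comparison concrete, the short sketch below recomputes each tier's share of tier-tagged citations from the table above, leaving the untagged footnote-only rows out of the denominator. The names are illustrative and the counts are copied from the table.

```python
# (fetched, cited) counts copied from the retrieval-tier table.
TIERS = {
    "labrador": (2447, 392),
    "bright": (170, 3),
    "serp": (4, 0),
    "untagged": (471, 76),
}


def cited_share_among_tagged(tiers: dict, tier: str) -> float:
    """Share of tier-tagged citations (untagged rows excluded) that came through `tier`."""
    tagged_cited = sum(cited for name, (_, cited) in tiers.items() if name != "untagged")
    return tiers[tier][1] / tagged_cited


for tier in ("labrador", "bright", "serp"):
    print(f"{tier:<9} {cited_share_among_tagged(TIERS, tier):.1%} of tier-tagged citations")
```

The fetched column sums to 3,092 and the cited column to 471, which matches the study totals.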

Section 5

Where the price comes from depends on the hotel

“Where does the price come from” has no single answer — it tracks how much of a hotel's inventory the OTAs control. Cited source-class share, by hotel segment:

Cited source-class share by hotel segment. Each row is the mix of sources ChatGPT cited for that group's price.

Segment	OTA	Official	Deal / loyalty	Metasearch	Forum
Palace (Ritz, Savoy, Plaza)	21%	30%	23%	11%	7%
Global chain (Hilton)	32%	31%	10%	12%	15%
Boutique indie	37%	15%	10%	19%	10%
Budget chain (ibis, Premier Inn, Pod)	30%	17%	15%	27%	11%
City-level open queries	44%	~0%	—	25%	12%

Luxury is priced off official and loyalty channels; independents off OTAs. Palaces lean on their own site plus Amex Fine Hotels & Resorts and points blogs (OTAs only 21%). Global chains see official roughly tie with OTAs: a strong brand.com holds its own. But for boutique independents, OTAs make up 37% of cited sources versus only 15% for their own site, even when they run a strong direct-booking site. Open “cheapest hotel” questions lean hardest of all on aggregators.

Section 6

What it actually quoted — and which numbers to distrust

The per-night figure ChatGPT typically quoted for each named hotel (median across 16 captures), next to the source it leaned on. The source is a tell: when it's the official site, the number tends to hold; when it's a deal blog or an unrelated chain, distrust it.

Median per-night price ChatGPT quoted per named hotel for the same dates, with its dominant cited source. Figures are estimates; the source predicts reliability.

Hotel	Segment	Typical / night	Dominant cited source
Hilton Times Square	Global chain	~$235	hilton.com (official)
The Plaza	Palace	$1,250–$1,900	theplazany.com (official)
Hilton Paris Opéra	Global chain	€260–€350	hilton.com + Kayak
The Savoy	Palace	£1,000–£1,900	Booking.com
Ritz Paris	Palace	~$2,900	Travelzoo, Amex FHR, points blogs
The Greenwich Hotel	Boutique indie	$1,200–$1,800	Kayak + official
Hilton London Park Lane	Global chain	£250–£500	Booking.com + hilton.com
Grand Hôtel du Palais Royal	Boutique indie	~$800–$850	Expedia / Booking ⚠
Hazlitt’s	Boutique indie	£450–£600	hotelpricewatch, Expedia, deal blogs
ibis Paris Tour Eiffel	Budget chain	€160–€170	Klook (unusual channel)
Pod 51	Budget chain	$105–$130	Booking, Google, Kayak
Premier Inn County Hall	Budget chain	£100–£260	saverrooms.co.uk (deal blog) ⚠

Three cases where the source flags a shaky number. (1) Premier Inn was priced almost entirely from saverrooms.co.uk, a third-party deal blog, rather than premierinn.com, which publishes its own firm rates on JavaScript-light pages; the £100–£260 spread is suspiciously wide. (2) ibis Paris was priced off Klook, an Asia-focused OTA, rather than Accor's own site. (3) Grand Hôtel du Palais Royal, an independent, drew a citation to hilton.com, a wrong-entity citation, so its ~$800 may be anchored to an unrelated property. By contrast, the official-sourced figures (Hilton Times Square, The Plaza) should track reality closely.

Methodology

Study Design

Data Collection

30 frozen price prompts: 12 named hotels (4 segments × 3 cities) plus open city-level questions, each run 8× via Bright Data, country US, 25 Jun 2026 — 240 captures.
Each capture parsed from the raw SSE stream into its network-source layer: every fetched document with result_source tier and cited flag — 3,092 documents, 471 cited.
Prices extracted from the prose answer per capture; source classes curated from observed domains.
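The parsing step above can be sketched roughly as follows. Only the result_source field name comes from this study. The "url" key, the nesting of the payload and the "[DONE]" sentinel are assumptions made for illustration, since the stream format is undocumented. Instead of relying on a fixed path, the sketch walks each decoded event and collects any object that carries a result_source key. Matching cited documents is left out, because they arrive via a separate footnote path.

```python
import json


def iter_sse_payloads(raw: str):
    """Yield the decoded JSON payload of each SSE event, skipping non-JSON data lines."""
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        data_lines = [
            line[5:].removeprefix(" ")  # drop "data:" and the single optional space after it
            for line in block.split("\n")
            if line.startswith("data:")
        ]
        if not data_lines:
            continue
        try:
            yield json.loads("\n".join(data_lines))
        except json.JSONDecodeError:
            continue  # e.g. a terminal sentinel that is not valid JSON


def find_tagged_docs(node):
    """Recursively yield every dict that carries a result_source key."""
    if isinstance(node, dict):
        if "result_source" in node:
            yield node
        for value in node.values():
            yield from find_tagged_docs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_tagged_docs(item)


def network_source_layer(raw: str) -> list[dict]:
    """Flatten a raw SSE capture into one row per tier-tagged document."""
    rows = []
    for payload in iter_sse_payloads(raw):
        for doc in find_tagged_docs(payload):
            # "url" is an assumed key name; .get() leaves it as None when absent.
            rows.append({"url": doc.get("url"), "tier": doc["result_source"]})
    return rows
```

A real capture may repeat the same document across several events, so deduplicating rows by URL would be a reasonable next step.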

Caveats

N = 240, one capture day, US proxy, English only — a snapshot, not a trend.
Cited documents arrive via a footnote path with different URL strings than the tier-tagged fetches, so a cited source's tier is imputed from its domain's modal tier. All raw rows are stored.
The structured shopping block never fired for hotels (it is a products surface); prices come via the map panel and prose.
oxylabs did not appear in this hotel run; serp appeared 4× and was never cited.
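The modal-tier imputation described in the caveats could look roughly like the sketch below: count the tiers seen for each domain in the tier-tagged fetch layer, then assign a cited URL the most common tier for its domain. The function names, the (url, tier) input shape and the "untagged" fallback are assumptions for illustration and are not taken from the study's code.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Hostname with a leading "www." removed; other subdomains are kept as-is."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def modal_tier_by_domain(fetched) -> dict:
    """fetched: iterable of (url, tier) pairs from the tier-tagged fetch layer."""
    counts = defaultdict(Counter)
    for url, tier in fetched:
        counts[domain_of(url)][tier] += 1
    # most_common(1) returns the modal tier; ties go to the tier seen first.
    return {domain: c.most_common(1)[0][0] for domain, c in counts.items()}


def impute_tier(cited_url: str, modal: dict) -> str:
    """Assign a cited URL its domain's modal tier, or "untagged" if the domain was never tier-tagged."""
    return modal.get(domain_of(cited_url), "untagged")
```

Because the match is on domain rather than exact URL, a domain served by more than one tier will have all of its citations attributed to the majority tier, which is worth keeping in mind when reading the cited column of the tier table.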

Open data. Headline stats and the underlying tables are published as CSV: summary.csv, fetch_vs_cite.csv, tier_distribution.csv, by_segment.csv, top_cited_domains.csv, quoted_prices.csv.