How Dirty Is Google Maps Hotel Data?

We analyzed 179K Google Maps hotel listings across 11 countries. 16.4% fail basic quality checks. 8,167 are OYO vacation rentals. Belgium loses 54% of listings after cleaning. And this is the data that powers the map recommendations in ChatGPT's UI.

179K hotels analyzed | 16.4% fail QA | 7.8% zero reviews

TL;DR

Google Maps lets anyone create a "Hotel" listing with barely any verification — if you have a website that looks like a hotel and type an address, you can be live in minutes. The result: 16.4% of listings fail basic quality checks, 7.8% have zero reviews, and a single company — OYO — has polluted the dataset with 8,167 vacation rentals disguised as hotels. In Belgium, 37% of "hotels" on Google Maps are Belvilla holiday homes. This matters because Google Maps is the primary data source for ChatGPT, Gemini, and Perplexity hotel recommendations. Dirty data in, dirty recommendations out.

By Nicolas Sitter | April 2026 | 178,647 listings across 11 countries

Summary

We analyzed 178,647 Google Maps listings categorized as hotels across 11 countries. After deduplication, we worked with 148,923 unique listings. Applying progressive quality filters (website presence, review count, address validation, domain checks) reduced the dataset to 124,537 clean listings (83.6%). The remaining 16.4% are noise: vacation rentals, restaurants, zero-review placeholders, and OTA redirect pages.

The single largest source of pollution is OYO's European vacation rental brands: Belvilla (5,047 listings) and Traum-Ferienwohnungen (3,120 listings) together account for 8,167 fake hotel entries — 5.5% of the entire dataset. In Belgium, Belvilla alone is 37.3% of all "hotel" listings. These properties have an average of 0.1 Google reviews.

This matters for AI. Google Maps is the primary data source powering hotel recommendations in ChatGPT (88.8% of map entities), Gemini, and Perplexity. When a user asks "best hotels in Brussels," the AI draws from a pool where over a third of "hotels" are sheep farms and holiday apartments.

16.4% fail QA | 8,167 OYO fake hotels | 7.8% zero reviews | 54% Belgium drop rate

Raw Data Problems

Before even looking at content quality, the raw data has structural issues. Here are the problems we found across 148,923 deduplicated listings.

  • Zero reviews: 11,688 (7.8% of all listings)
  • 10 or fewer reviews: 21,272 (14.3% of all listings)
  • Non-hotel website: 11,510 (7.8% link to OTAs, social media, etc.)

Review Count Distribution

| Review Bucket | Listings | Share |
|---|---|---|
| 0 reviews | 11,688 | 7.8% |
| 1-5 | 6,034 | 4.1% |
| 6-10 | 3,550 | 2.4% |
| 11-25 | 8,662 | 5.8% |
| 26-50 | 10,557 | 7.1% |
| 51-100 | 15,631 | 10.5% |
| 101-500 | 56,739 | 38.1% |
| 501-1,000 | 19,845 | 13.3% |
| 1,000+ | 16,217 | 10.9% |

Rating Distribution

| Rating Range | Listings | Share |
|---|---|---|
| 0.0-2.9 | 1,902 | 1.4% |
| 3.0-3.4 | 4,641 | 3.4% |
| 3.5-3.9 | 14,682 | 10.7% |
| 4.0-4.2 | 24,586 | 17.9% |
| 4.3-4.5 | 40,203 | 29.3% |
| 4.6-4.8 | 36,918 | 26.9% |
| 4.9-5.0 | 14,303 | 10.4% |

The sweet spot is 101-500 reviews (38.1%).

Most legitimate hotels fall in the 101-500 review range. The long tail at 10 reviews or fewer (14.3%) is disproportionately fake listings, vacation rentals, and newly created placeholder profiles.
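
The bucketing behind the review distribution can be sketched as follows. This is a minimal illustration, assuming each listing carries an integer review count. The function names `review_bucket` and `bucket_shares` are hypothetical and not taken from the study's code.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bound of every bucket except the open-ended last one.
UPPER_BOUNDS = [0, 5, 10, 25, 50, 100, 500, 1000]
LABELS = ["0 reviews", "1-5", "6-10", "11-25", "26-50",
          "51-100", "101-500", "501-1,000", "1,000+"]

def review_bucket(review_count: int) -> str:
    """Map a review count to its bucket label."""
    # bisect_left returns the index of the first bound >= review_count;
    # counts above every bound fall through to the last label.
    return LABELS[bisect_left(UPPER_BOUNDS, review_count)]

def bucket_shares(review_counts):
    """Return {label: (listings, share_percent)} for a non-empty list of counts."""
    counts = Counter(review_bucket(n) for n in review_counts)
    total = len(review_counts)
    return {label: (counts[label], round(100 * counts[label] / total, 1))
            for label in LABELS}
```

Keeping the bounds in a sorted list makes it easy to change bucket edges without touching the lookup logic.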

Case Study: Belvilla & OYO

OYO Rooms (India) acquired two European vacation rental platforms in 2019: Belvilla and Traum-Ferienwohnungen. Both list individual holiday homes on Google Maps categorized as "Hotel." They are the single largest source of non-hotel pollution in our dataset.

OYO Vacation Rental Brands on Google Maps

| Brand | Type | Fake hotel listings | Avg Google reviews | Domains |
|---|---|---|---|---|
| Belvilla by OYO | Vacation rental marketplace | 5,047 | 0.1 | belvilla.nl, belvilla.de, belvilla.es, belvilla.com |
| Traum-Ferienwohnungen | Vacation rental marketplace (Germany) | 3,120 | 0.2 | traum-ferienwohnungen.de |

Total OYO vacation rentals listed as "Hotel": 8,167 (5.5% of the dataset)

Belvilla Pollution by Country

The damage is concentrated in small markets. In Belgium, more than a third of all "hotel" listings on Google Maps are Belvilla vacation rentals. In the Netherlands, it's nearly one in five.

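Belvilla Listings as % of All Google Maps Hotels

| Country | Belvilla Listings | Total Hotels | Belvilla % |
|---|---|---|---|
| Belgium | 1,755 | 4,710 | 37.3% |
| Netherlands | 1,379 | 7,245 | 19% |
| Austria | 516 | 9,431 | 5.5% |
| Spain | 760 | 23,952 | 3.2% |
| Germany | 568 | 33,360 | 1.7% |
| Switzerland | 65 | 5,383 | 1.2% |
| Italy | 3 | 37,861 | 0% |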

What These Listings Look Like

These are real Google Maps "hotel" listing titles from Belgium:

  • Huisje op schapenboerderij met gelateria - Belvilla by Oyo
  • Vakantiehuis in Virton met privezwembad - Belvilla by Oyo
  • Modern vakantiehuis in Senzeille met tuin - Belvilla by Oyo
  • Heerlijk vakantiehuis in Libramont-Chevigny met tuin - Belvilla by Oyo

These are individual holiday homes — sheep farms, garden cottages, pool villas — categorized as "Hotel" with 0-1 Google reviews.

Belvilla Domain Breakdown

| Domain | Listings |
|---|---|
| belvilla.nl | 3,123 |
| belvilla.de | 1,081 |
| belvilla.es | 760 |
| belvilla.com | 70 |
| belvilla.fr | 8 |
| belvilla.it | 5 |

A single review threshold eliminates 99.9% of Belvilla.

Requiring just > 10 reviews drops Belvilla from 5,047 listings to 3. At > 50, it drops to zero. This is the strongest evidence that review count is the single most effective quality filter for Google Maps hotel data.

Beyond OYO: Other Data Polluters

OYO is the biggest offender, but 7.8% of all listings (11,510) have non-hotel website domains. These fall into five categories.

Non-Hotel Website Types in Google Maps Hotel Data

| Type | Examples | Count | Share |
|---|---|---|---|
| Vacation rental platforms | belvilla.nl, traum-ferienwohnungen.de | 8,155 | 5.5% |
| Redirects & aggregators | tripcombined.com, traveleto.com, google.com | 1,603 | 1.1% |
| OTA pages | booking.com, expedia.com, tripadvisor.com | 699 | 0.5% |
| Social media | facebook.com, instagram.com | 697 | 0.5% |
| Free website builders | wixsite.com, wordpress.com | 356 | 0.2% |

Top Non-Hotel Domains Found in Hotel Listings

| Domain | Listings | Type |
|---|---|---|
| booking.com | 692 | OTA redirect |
| facebook.com | 528 | Social media |
| google.com | 289 | Google redirect / placeholder |
| traveleto.com | 224 | Booking redirect |
| tripcombined.com | 212 | Meta-search redirect |
| wixsite.com | 184 | Free website builder |
| instagram.com | 169 | Social media |
| wordpress.com | 82 | Free website builder |

692 "hotels" have booking.com as their website.

These are hotels with no direct website — their Google Business Profile links to their Booking.com page. Others link to Facebook (528), Instagram (169), or free website builders like Wix (184). None of these are indicators of a professional hotel operation.
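
One way to tag listings by website type is a simple lookup on the root domain. The sketch below is an assumption about how such a classifier could look, not the study's code. The category keys and the `classify_domain` name are hypothetical, and the domain sets contain only the examples named in this article.

```python
from typing import Optional

# Illustrative domain sets built from the examples in this article;
# a production list would be much longer.
NON_HOTEL_DOMAINS = {
    "vacation_rental": {"belvilla.nl", "belvilla.de", "belvilla.es",
                        "belvilla.com", "belvilla.fr", "belvilla.it",
                        "traum-ferienwohnungen.de"},
    "redirect_aggregator": {"tripcombined.com", "traveleto.com", "google.com"},
    "ota": {"booking.com", "expedia.com", "tripadvisor.com"},
    "social_media": {"facebook.com", "instagram.com"},
    "free_website_builder": {"wixsite.com", "wordpress.com"},
}

def classify_domain(root_domain: str) -> Optional[str]:
    """Return the non-hotel category for a root domain, or None if unlisted."""
    domain = root_domain.strip().lower()
    for category, domains in NON_HOTEL_DOMAINS.items():
        if domain in domains:
            return category
    return None
```

A return value of `None` only means the domain is not on the blocklist. It does not confirm that the site belongs to a real hotel.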

Cleaning the Data: Filter Effectiveness

How much noise can progressive filtering remove? We applied filters cumulatively and measured the impact. The review threshold is by far the most effective single filter.

Cumulative Filter Pipeline

| Filter Applied | Remaining | Share |
|---|---|---|
| Raw (no filter) | 148,923 | 100% |
| Has website | 147,367 | 99% |
| + reviews > 0 | 136,375 | 91.6% |
| + reviews > 10 | 127,404 | 85.6% |
| + has street address | 126,712 | 85.1% |
| + name > 7 chars | 125,447 | 84.2% |
| + exclude non-hotel domains | 124,537 | 83.6% |
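
A cumulative pipeline of this kind can be sketched as an ordered list of predicates, where each stage only sees the survivors of the previous one. The `Listing` fields, the `NON_HOTEL` blocklist, and `run_pipeline` are assumptions made for illustration; the study's actual schema is not published here.

```python
from dataclasses import dataclass

@dataclass
class Listing:
    name: str
    website: str         # empty string when no website is listed
    root_domain: str     # e.g. "belvilla.nl"
    reviews: int
    street_address: str  # empty string when missing

# Hypothetical short blocklist, standing in for the full non-hotel domain list.
NON_HOTEL = {"belvilla.nl", "booking.com", "facebook.com"}

# Ordered (label, predicate) pairs mirroring the rows of the pipeline table.
FILTERS = [
    ("Has website", lambda item: bool(item.website)),
    ("+ reviews > 0", lambda item: item.reviews > 0),
    ("+ reviews > 10", lambda item: item.reviews > 10),
    ("+ has street address", lambda item: bool(item.street_address)),
    ("+ name > 7 chars", lambda item: len(item.name) > 7),
    ("+ exclude non-hotel domains", lambda item: item.root_domain not in NON_HOTEL),
]

def run_pipeline(listings):
    """Apply filters cumulatively; return [(label, remaining, share_percent)]."""
    total = len(listings)
    rows = [("Raw (no filter)", total, 100.0)]
    remaining = list(listings)
    for label, keep in FILTERS:
        remaining = [item for item in remaining if keep(item)]
        rows.append((label, len(remaining), round(100 * len(remaining) / total, 1)))
    return rows
```

Because the filters are cumulative, the order matters for the per-row counts but not for the final clean set.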

Review Threshold vs Belvilla Survival

The review threshold is surgical. It eliminates fake listings while preserving real hotels. Here is how Belvilla listings survive at each threshold:

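Review Threshold Effectiveness Against Belvilla

| Threshold | Total Remaining | Remaining % | Belvilla Surviving |
|---|---|---|---|
| > 0 | 137,235 | 92.2% | 181 |
| > 1 | 135,307 | 90.9% | 36 |
| > 5 | 131,201 | 88.1% | 8 |
| > 10 | 127,651 | 85.7% | 3 |
| > 25 | 118,989 | 79.9% | 1 |
| > 50 | 108,432 | 72.8% | 0 |
| > 100 | 92,801 | 62.3% | 0 |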

The recommended filter: > 10 reviews.

At > 10 reviews, you keep 85.7% of listings and eliminate 99.9% of Belvilla spam (from 5,047 to 3). Going higher (> 50, > 100) starts cutting legitimate small hotels. The > 10 threshold offers the best precision-recall tradeoff for hotel data cleaning.
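
The percentages quoted for each threshold follow directly from the counts in the threshold table. The short check below recomputes them. The counts are copied from this article, while the `summarize` helper is only an illustrative name.

```python
TOTAL_LISTINGS = 148_923   # deduplicated listings
BELVILLA_TOTAL = 5_047     # Belvilla listings before any review filter

# (threshold, total remaining, Belvilla surviving), copied from the table.
ROWS = [(0, 137_235, 181), (1, 135_307, 36), (5, 131_201, 8), (10, 127_651, 3),
        (25, 118_989, 1), (50, 108_432, 0), (100, 92_801, 0)]

def summarize(threshold, remaining, belvilla_left):
    """Return (threshold, % of listings kept, % of Belvilla removed)."""
    kept_pct = round(100 * remaining / TOTAL_LISTINGS, 1)
    removed_pct = round(100 * (1 - belvilla_left / BELVILLA_TOTAL), 2)
    return threshold, kept_pct, removed_pct

for row in ROWS:
    t, kept, removed = summarize(*row)
    print(f"> {t:<3} keeps {kept}% of listings, removes {removed}% of Belvilla")
```

At the > 10 threshold this gives 85.7% of listings kept and 99.94% of Belvilla removed, which matches the rounded 99.9% figure.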

Country-by-Country Comparison

Data quality varies dramatically by country. Belgium and the Netherlands lose over 50% of listings after cleaning, driven in large part by Belvilla. Greece, the USA, and Italy have the lowest drop rates.

Google Maps Hotel Listings: Raw vs Clean by Country

| Country | Raw | After Cleaning | Drop % | Notes |
|---|---|---|---|---|
| France (FR) | 28,890 | 23,310 | 19.3% | 170% of official hotel count |
| Italy (IT) | 37,861 | 34,194 | 9.7% | Lowest drop rate among European countries without pre-filtering |
| Germany (DE) | 33,360 | 26,944 | 19.2% | 3,120 Traum-Ferienwohnungen inflate count |
| Spain (ES) | 23,953 | 20,906 | 12.7% | |
| USA (US) | 15,499 | 14,222 | 8.2% | Motel / extended stay noise |
| Greece (GR) | 11,340 | 11,160 | 1.6% | Pre-filtered export, cleanest |
| Austria (AT) | 9,431 | 7,731 | 18% | 516 Belvilla listings |
| Netherlands (NL) | 7,245 | 3,616 | 50.1% | 1,379 Belvilla = 19% of total |
| Switzerland (CH) | 5,385 | 4,412 | 18.1% | |
| Belgium (BE) | 4,713 | 2,189 | 53.6% | 1,755 Belvilla = 37% of total |

Worst: Belgium (53.6% drop)

1,755 Belvilla listings make up 37.3% of all Belgian "hotels." After cleaning, Belgium goes from 4,713 to 2,189 listings. More than half the dataset is fake.

Netherlands (50.1% drop)

1,379 Belvilla listings (19% of total). Drops from 7,245 to 3,616. Without Belvilla, the drop rate would fall to roughly 31%.

France (19.3% drop)

France has 170% of its official hotel count on Google Maps (28,890 vs ~17,000 real hotels). Mix of Belvilla, accor.com domain listings, and vacation rentals.

Best: Greece (1.6% drop)

Greece has the cleanest data in our sample — only 1.6% of listings fail QA. This is partly due to pre-filtered export from our scraping setup.

Without Belvilla, data quality improves dramatically.

Remove one company's listings and Belgium's drop rate goes from 53.6% to ~16%. Netherlands goes from 50.1% to ~31%. The pollution is concentrated, not distributed — which means it's fixable. Google could solve half the problem by validating one brand.
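
The ~16% and ~31% figures are consistent with subtracting Belvilla's share from each country's drop rate, which keeps the original raw count as the denominator. That reading is an assumption, as is the premise that every Belvilla listing was removed by cleaning. The sketch below computes that figure alongside a stricter variant that also removes Belvilla from the denominator. The function name is hypothetical.

```python
def drop_rates_without_brand(raw, clean, brand):
    """Return (original drop %, non-brand drops / raw %, non-brand drops / non-brand raw %)."""
    dropped = raw - clean
    other_dropped = dropped - brand  # assumes every brand listing was dropped
    return (round(100 * dropped / raw, 1),
            round(100 * other_dropped / raw, 1),
            round(100 * other_dropped / (raw - brand), 1))

# Counts taken from the country table in this article.
belgium = drop_rates_without_brand(raw=4_713, clean=2_189, brand=1_755)
netherlands = drop_rates_without_brand(raw=7_245, clean=3_616, brand=1_379)
print(belgium)      # (53.6, 16.3, 26.0)
print(netherlands)  # (50.1, 31.1, 38.4)
```

Under the stricter denominator the non-Belvilla drop rates are higher (about 26% and 38%), but the conclusion holds either way: one brand accounts for a large part of the noise in both countries.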

Chain Hotels vs Independent Hotels

Chain hotels are inherently cleaner data. They actively manage their Google Business Profiles, have dedicated digital teams, and rarely have zero reviews. Independent hotels are where the noise concentrates.

| Segment | Listings | Share of dataset | Avg reviews | Zero reviews |
|---|---|---|---|---|
| Chain Hotels | 10,897 | 7.3% | 1,210 | 1% |
| Independent Hotels | 138,026 | 92.7% | 395 | 8.4% |

Top Hotel Chains by Listing Count

| Chain | Parent Company | Listings |
|---|---|---|
| Marriott | Marriott International | 1,858 |
| Hilton | Hilton Worldwide | 1,496 |
| Wyndham | Wyndham Hotels & Resorts | 1,074 |
| IHG | InterContinental Hotels Group | 1,006 |
| Best Western | Best Western International | 824 |
| Accor | Accor SA | 750 |
| Choice | Choice Hotels International | 734 |
| B&B Hotels | Goldman Sachs / B&B Hotels | 466 |
| Motel 6 | G6 Hospitality (Blackstone) | 286 |
| Hyatt | Hyatt Hotels Corporation | 285 |

Chain hotels have 3x more reviews and are 8x less likely to have zero.

1,210 avg reviews vs 395. 1.0% zero reviews vs 8.4%. Chain data is inherently more reliable because chains actively manage their profiles. For AI systems, weighting chain data higher is a reasonable quality heuristic — but it disadvantages legitimate independent hotels who simply don't manage their Google profiles.

Fun fact: if Belvilla were a hotel chain...

With 5,047 listings on Google Maps, Belvilla would be the largest hotel chain in Europe, by far. Accor has 750 in our dataset. Marriott has 1,858. OYO's combined 8,167 vacation rental listings dwarf every actual hotel group. Two vacation rental marketplaces that nobody in hospitality takes seriously have more Google Maps "hotel" listings than Marriott, Hilton, IHG, and Accor combined (5,110).

Why This Matters for AI

Google Maps is not just a consumer tool — it is the foundational data layer for AI hotel recommendations. ChatGPT uses Google Places for 88.8% of its hotel entity cards. Gemini uses Google Maps directly. Perplexity queries Google. When the source data is dirty, everything downstream inherits the noise.

AI recommends non-existent properties

A zero-review Belvilla "hotel" in rural Belgium can appear in ChatGPT results. The AI has no way to distinguish it from a real hotel based on Google Maps data alone.

AI inflates hotel counts per city

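Brussels has ~200 real hotels. Google Maps lists 4,710 "hotels" for Belgium as a whole, more than a third of them Belvilla rentals. An AI asked "how many hotels in Brussels?" that counts from this data will give a wildly wrong answer.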

AI uses reviews from fake listings

When computing average ratings or competitive analysis, fake listings with 0-1 reviews skew the statistics that AI uses to rank hotels.

Anyone can influence AI output

Creating a Google Business Profile as a "Hotel" is free and unverified. This is an open vector for manipulating AI hotel recommendations.

The fix is simple but nobody's doing it.

A > 10 review filter eliminates 99.9% of spam while keeping 85.7% of real hotels. OpenAI, Google, and other AI providers could dramatically improve hotel recommendation quality with a single threshold. The fact that they don't suggests they prioritize coverage over accuracy — or simply haven't audited the data.

Methodology

Data Collection

178,647 Google Maps listings categorized as hotels, scraped via Apify Google Maps scraper across 10 European countries + USA. After deduplication on Google Place ID: 148,923 unique listings.
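
Deduplication on Place ID can be sketched in a few lines. This assumes each scraped record is a dict and that the ID lives under a key called `place_id`; that key name is hypothetical, and the scraper's real field name may differ.

```python
def dedupe_by_place_id(listings):
    """Keep the first listing seen for each Google Place ID, preserving order."""
    seen = set()
    unique = []
    for listing in listings:
        place_id = listing["place_id"]  # hypothetical field name
        if place_id not in seen:
            seen.add(place_id)
            unique.append(listing)
    return unique
```

Keeping the first occurrence is an arbitrary choice. If duplicates differ in freshness, keeping the most recently scraped record would be a reasonable alternative.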

Domain Extraction

Website URLs parsed using tldextract for accurate root domain identification. This enables matching vacation rental platforms (belvilla.nl, belvilla.de, etc.) and detecting non-hotel websites (booking.com, facebook.com).
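
To show what root domain identification involves, here is a simplified standard-library stand-in. It is not tldextract and not the study's code: tldextract consults the full Public Suffix List, whereas this sketch hard-codes a tiny, illustrative set of multi-label suffixes. The `root_domain` name is hypothetical.

```python
from urllib.parse import urlparse

# Tiny illustrative subset of multi-label public suffixes. A real
# implementation should rely on the full Public Suffix List.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.at"}

def root_domain(url: str) -> str:
    """Return the registrable domain of a URL, e.g. 'belvilla.nl'."""
    # urlparse only fills in the host when a scheme or leading '//' is present.
    if "//" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) < 2:
        return host
    # Keep three labels when the last two form a known multi-label suffix.
    has_long_suffix = ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES
    keep = 3 if has_long_suffix and len(labels) >= 3 else 2
    return ".".join(labels[-keep:])
```

Reducing every URL to its root domain is what lets `www.belvilla.nl/...` and `belvilla.nl` count as the same platform.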

Chain Detection

Chain hotels identified via domain matching (marriott.com, hilton.com, etc.) and name pattern matching. 10,897 chain hotels detected (7.3% of dataset). Unmatched listings classified as independent.
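
A two-step matcher along these lines could look like the sketch below: check the root domain first, then fall back to name patterns. The lookup tables are deliberately short and hypothetical, since the study's actual domain list and patterns are not published in this article.

```python
import re

# Hypothetical lookup tables for illustration only.
CHAIN_DOMAINS = {"marriott.com": "Marriott", "hilton.com": "Hilton"}
CHAIN_NAME_PATTERNS = {
    "Marriott": re.compile(r"\bmarriott\b", re.IGNORECASE),
    "Hilton": re.compile(r"\b(hilton|hampton inn)\b", re.IGNORECASE),
    "Best Western": re.compile(r"\bbest western\b", re.IGNORECASE),
}

def detect_chain(name: str, root_domain: str):
    """Return the chain name, or None to classify the listing as independent."""
    chain = CHAIN_DOMAINS.get(root_domain.lower())
    if chain:
        return chain
    for chain_name, pattern in CHAIN_NAME_PATTERNS.items():
        if pattern.search(name):
            return chain_name
    return None
```

Anything the matcher misses is counted as independent, which is why the limitations section notes that chain detection misses some brands.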

Quality Filters

Progressive filters applied: website presence, review count > 10, street address present, name > 7 characters, exclude known non-hotel domains. Final clean dataset: 124,537 (83.6%).

Limitations

Scraping coverage varies by country (Greece was pre-filtered). Official hotel counts are estimates. Chain detection misses some brands. The "clean" dataset still contains some noise — our filters optimize for recall (keeping real hotels) over precision (excluding all fakes).

Related Research

See our ChatGPT Map Providers Study for how this data flows into AI recommendations, and our Anatomy of ChatGPT Hotel Search for the full technical architecture.
