Why is Instagram cited so heavily for Marseille specialty coffee?

Instagram returned 237 citations across 3 of the 5 engines — by far the highest single social-platform signal in any of our cross-vertical studies so far. In the dataset, Marseille coffee behaves like an Instagram-native scene: many venues document themselves on Instagram more than on a website (Nua is the cleanest case — no website, only an Instagram account), and AI engines reflect that. In Paris yoga, Berlin yoga, Amsterdam bikes and Tokyo bookstores the dominant social channel was Reddit; in Marseille, Instagram joins it as a co-equal source.

What is the Gemini Barista Magazine pattern?

When a city has no dense local specialty press, Gemini reaches for the global trade press instead. For Marseille, baristamagazine.com returned 95 citations — about 34% of Gemini's Marseille citations, more than any shop website. The lesson generalises: Gemini's source preference is "an authoritative editorial outlet for this domain"; which editorial it picks is locale-dependent. In Berlin the equivalent slot went to Urban Sports Club's blog; in Tokyo it went to the local-guide web; in Marseille, with neither, Gemini reaches for the global specialty trade press.

Which Marseille specialty coffee shops does AI recommend most?

By the metric that matters — does the engine actually name the brand in its answer — Deep is the consensus winner, named across all five engines and in 59.3% of ChatGPT captures. A separate cite-counted score ranks Nua #1, but that is an artifact: Nua has no website, so its citations resolve to a generic instagram.com key, while in actual answer text it appears in just one capture. The article ranks the leaderboard by text mentions. Below Deep come 7VB Café, Black Bird Coffee, Café Lauca, Boujou Coffee and Ben Mouture.

Does asking in French vs English change the Marseille coffee shops AI recommends?

Yes, more sharply than in any other city we have measured. On the control prompt ("best specialty coffee in Marseille"), EN and FR top-5 overlap is just 11% — only 1 of the top 5 shops matches. Berlin's control was 25%, Paris yoga's was 25%. The cause: with so little global specialty press covering Marseille, English prompts pull from a tiny English-language source set (Reddit, Wanderlog, Barista Magazine) while French prompts pull from the dense Marseille local-blog web. Two languages, two largely disjoint citation universes.

Why is Google AI Mode missing from the French-language data?

The AI Mode x FR-proxy batch was rejected at the Bright Data trigger layer (HTTP-level, no snapshot ID returned). The same FR-proxy block hit Paris yoga in May; AI Mode x DE worked in the Berlin replication; AI Mode x FR failed again in Marseille. The rejection is FR-specific to AI Mode, not a one-off scrape failure. Every AI Mode percentage on this page is over the US-proxy batch only, and we flag the absence rather than imputing French numbers.

June 2026AI Search Studies

AI Search for Specialty Coffee in Marseille (2026):When Instagram and local blogs beat café websites

TL;DR: Marseille is the first non-Tokyo case in this series where ChatGPT mostly ignores the businesses’ own websites. Only 10% of its citations go to café websites, versus the ~32% we measured in Paris yoga, Berlin yoga and Amsterdam bikes (Tokyo bookstores was already an exception, at 8%). Instead it pulls from Reddit, list & review aggregators, French micro-guides and, unusually, Instagram — which returns 237 cites across 3 of the 5 engines, the highest social signal in any of our studies. By the “does the engine actually name the brand” metric, Deep is the consensus winner — named in 59.3% of ChatGPT captures and across all five engines. The lesson: in Marseille, AI search mirrors the city’s real discovery layer — and that layer isn’t websites. It’s Instagram, local guides and social proof.

Published June 15, 2026

AI engines

Prompt templates

280

Cafés

EN + FR

Both languages

Read the Report

Summary 1. Source Mix 2. ChatGPT Inversion 3. Instagram 4. Leaderboard 5. Top Sources 6. Language Split 7. Gemini’s Substitute 8. TLD Limit 9. Geography 10. Vs. 4 Cities 11. AI Behaviour For Café Owners Conclusion Methodology FAQ

Executive Summary

Marseille is the second city in this series where the entity-engine pattern breaks — and the first in a Western market. Marseille coffee runs on third-party sources, not café websites.

By “entity-engine,” I mean the pattern from prior studies where ChatGPT often grounds a local recommendation in the business’s own website, not only in third-party guides. Across Paris yoga, Berlin yoga and Amsterdam bikes, it cited the entity’s own site around a third of the time — what we’d been treating as a consistent baseline. Tokyo bookstores already broke it, at 8%. For Marseille coffee the figure is 10%. The missing citations don’t go to a single source — they go to a trio: 31% Reddit, 32% list & review aggregators (Wanderlog, Mapstr), 14% French micro-guides (marseille.love-spots.com, marseillesecrete.com, tarpin-bien.com, lescachotteriesdemarseille.com). That’s 77% of ChatGPT’s Marseille answers grounded in third-party content.

The likely mechanism is the same one Berlin hinted at: the source mix bends toward a city’s local commercial and editorial infrastructure. Marseille just has an unusual one — a small specialty-coffee scene with a thin owned-web footprint (only 114 of 280 venues have a website at all), a heavy Instagram-native discovery layer (237 cites, our highest social signal anywhere), and a dense thicket of French micro-guides. AI engines mirror what’s out there; where the documentation lives, the citations follow.

And there’s a structural reason the owned web is so thin here: a café has no online transaction. A yoga studio sells class bookings on its site; a hotel sells rooms. That gives the website a job — so it gets built, linked and crawled. A coffee shop sells nothing online, so the website is optional, and the scene documents itself where people actually look: Instagram, Reddit and local guides. The AI source mix just follows that.

One distinction throughout: citations are the URLs an engine used as support; mentions are the cafés actually named in the visible answer. The source-mix numbers are about citations (where AI search is grounded); the leaderboard later is about mentions (who users actually see). They don’t always agree — which is the point.

A ChatGPT answer to “best specialty coffee marseille”: a Google-Maps panel listing Deep, Ben Mouture, Café Corto and La Brûlerie MÖKA, then a written shortlist that leads with Deep (4.6, coffee shop) and Ben Mouture (4.8, café). — A real answer for our control prompt, “best specialty coffee in Marseille.” Both the map panel and the written shortlist lead with **Deep** and **Ben Mouture** — the consensus this study measures. Captured June 2026.

Section 1

Source mix by platform

For every cited URL we bucketed the source into one of ten categories. The mix per engine is the cleanest way to see why Marseille breaks the pattern — especially compared to the consistent shop/studio-website share we measured in three earlier cities.

source-mix-by-platform-marseille-coffee

Copilot

Still entity-engine — but at 83%

The 95–97% Copilot range we measured in yoga and bikes drops to 83% in Marseille. Even the engine that goes hardest on shop websites loses 12 points here. The weak digital footprint of Marseille coffee — most venues without a website — likely pushes Copilot into the long tail it usually ignores.

Gemini

Swings to global trade press: 34%

Gemini seems to look for an authoritative editorial layer; when the local specialty-coffee layer is thin, that slot gets filled by global trade press. baristamagazine.com alone returned 95 citations — more than any shop website on Gemini, and roughly 34% of Gemini’s Marseille citations.

Perplexity

Leads with local editorial: 27%

Perplexity leans on the French local-guide ecosystem more than any other engine — 27% of its Marseille citations go to marseille.love-spots.com, marseillesecrete.com, tarpin-bien.com and similar French micro-guides. Where Marseille is documented in French, Perplexity finds it.

The mechanism is the same one Berlin showed: source mix bends to local commercial and editorial infrastructure. In Berlin that infrastructure was a fitness-marketplace blog (Urban Sports Club). Here in Marseille the “infrastructure” is a Reddit thread, an Instagram feed and a thicket of French micro-guides — so the AI engines mirror exactly that. The platform personalities don’t change; the substrate does.

Section 2

The ChatGPT inversion

One number tells the whole story of this article. ChatGPT’s shop-website citation share for Marseille specialty coffee is the lowest we’ve measured in five cross-vertical replications — and the gap to the other “city + scene” cases is what makes it interesting.

10%

ChatGPT’s share of citations going to specialty-coffee shop websites in Marseille.

vs the ~32% baseline we measured in Paris yoga, Berlin yoga and Amsterdam bikes.

chatgpt-shop-website-share-by-city

ChatGPT shop/studio website citation share across five cross-vertical AI-search studies.

City + vertical	ChatGPT shop-site %	Signature non-shop source
Paris yoga	32%	Reddit (16% of all ChatGPT URLs)
Berlin yoga	32%	Urban Sports Club blog
Amsterdam bikes	42%	Reddit
Tokyo bookstores	8%	whenin.tokyo + Tokyo Weekender
Marseille coffee	10%	Reddit + Instagram + FR local blogs

This doesn’t look like a simple measurement artifact. Tokyo bookstores landed at 8% for what looks like the same reason Marseille lands at 10%: when a scene’s documentation lives off-site — on list & review aggregators, in local-guide blogs, or on social — the AI engines find it there. The ~32% entity-engine baseline is what we saw when shops do own their digital footprint. In Marseille, only 114 of 280 venues have a website at all. The 10% fits the underlying source ecosystem rather than fighting it.

Section 3

The Instagram signal

One domain stands out so far above the rest of the social layer that it deserves its own section.

237

Instagram citations

across 3 platforms — the highest single-domain count anywhere in this study except for AI Mode’s Google self-citation firehose.

No other city/vertical pushes Instagram this hard.

In Paris yoga, Berlin yoga, Amsterdam bikes and Tokyo bookstores, the dominant social signal was Reddit — Instagram barely registered (single digits to low double digits at most). For Marseille specialty coffee, Instagram joins Reddit as a co-equal source: Reddit at 175 cites, Instagram at 237.

In the dataset, Marseille coffee behaves like an Instagram-native scene — and it isn’t a single-engine artifact: Instagram appears across 3 of the 5 engines. Many specialty shops here document themselves there in preference to a website —Nua is the cleanest example: no website, only an Instagram account, and ChatGPT’s answers cite that Instagram URL as the primary source. AI engines reflect this by surfacing Instagram URLs as primary citations rather than social-proof addenda. If you’re ranking a Marseille café for AI visibility, “own your Instagram” may be closer to the AI-search truth than the generic “own your website” playbook.

The reverse holds too: in cities where Instagram doesn’t drive citations, optimising it as your primary AI-discovery surface would underperform. The platform that matters depends on where the city’s scene actually documents itself — not on a generic GEO playbook.

Section 4

The Marseille Specialty Coffee AI Leaderboard

Aggregating shop mentions across the engines (chain locations merged), these are the most-cited Marseille cafés. “Engines” is the number of the five AI engines that surfaced the shop at all — a breadth signal.

ai-favourite-marseille-coffee-shops-2026

Top 12 Marseille specialty cafés ranked by text-mention total (engines naming the brand in the visible answer text), with the data session's citation-counted score alongside for comparison. Engines = number of the five engines that named the brand at least once in answer text. Brand-aggregated — AI answers say 'Café Lauca,' not 'Café Lauca Vieux-Port,' so the relevant unit is the brand, not the address.

Rank	Shop	Text mentions	Cite score	Engines
#1	Deep	200	131	5 / 5
#2	7VB Café	141	66	5 / 5
#3	Black Bird Coffee	122	42	5 / 5
#4	Café Lauca	114	101	5 / 5
#5	Boujou Coffee	103	54	5 / 5
#6	Ben Mouture	100	30	5 / 5

The same leaderboard, split by engine

Reading across a row: the share of each engine’s prompt-captures whose visible answer text names the brand (raw count in parentheses). Because each engine answered a different number of prompts (ChatGPT 91, Gemini / Copilot / Perplexity 92, AI Mode 46 — AI Mode × FR failed at the Bright Data trigger), raw counts aren’t comparable across columns, so we render them as rates.

Deep (the café at deep.coffee) dominates the heatmap: named in 59.3% of ChatGPT’s captures, 46.7% of Gemini, 48.9% of Copilot, 45.7% of Perplexity and 34.8% of AI Mode — a universal recommendation. 7VB Café and Boujou Coffee are similarly broad. Black Bird Coffee (#3) and Ben Mouture (#6) land high despite being absent from the original Google Maps seed — the engines named them often enough that we recovered and verified them via Google Places (see methodology). Their presence is the reminder that AI search surfaces specialty shops a Maps scrape can miss.

#	Shop	AI Mode46 prompts	ChatGPT91 prompts	Perplexity92 prompts	Gemini92 prompts	Copilot92 prompts
1	Deep	34.8%(16)	59.3%(54)	45.7%(42)	46.7%(43)	48.9%(45)
2	7VB Café	39.1%(18)	33%(30)	27.2%(25)	41.3%(38)	32.6%(30)
3	Black Bird Coffee	39.1%(18)	42.9%(39)	18.5%(17)	8.7%(8)	43.5%(40)
4	Café Lauca	30.4%(14)	46.2%(42)	12%(11)	4.3%(4)	46.7%(43)
5	Boujou Coffee	39.1%(18)	40.7%(37)	15.2%(14)	14.1%(13)	22.8%(21)
6	Ben Mouture	28.3%(13)	39.6%(36)	5.4%(5)	26.1%(24)	23.9%(22)
7	Coffee&Bakery	37%(17)	33%(30)	12%(11)	0(0)	31.5%(29)
8	La Brûlerie MÖKA	17.4%(8)	19.8%(18)	18.5%(17)	35.9%(33)	8.7%(8)
9	Risette	37%(17)	40.7%(37)	5.4%(5)	3.3%(3)	21.7%(20)
10	Coogee	13%(6)	8.8%(8)	19.6%(18)	33.7%(31)	13%(12)
11	Petrin Couchette	13%(6)	15.4%(14)	1.1%(1)	35.9%(33)	0(0)
12	Tarlata Café	4.3%(2)	9.9%(9)	19.6%(18)	1.1%(1)	10.9%(10)

Cell = % of that engine’s captures in which the brand was named in the visible answer text; raw count in parentheses. Rows ordered by total text mentions (matching the leaderboard above). Colour scales with the table maximum (Deep on ChatGPT). Zeros greyed. Hover a cell for the underlying counts.

How this leaderboard is built — and corrected. Each engine’s answer runs through a deterministic (temperature 0) extraction pass that pulls the shops it actually recommends; a brand counts once per answer, so a name repeated in one reply doesn’t inflate it. Mentions are then resolved against the Google Maps seed. Two of the most-recommended shops — Black Bird Coffee and Ben Mouture — weren’t in that seed at all, so an earlier pass discarded them as unresolved. We now recover such names via Google Places (confirming each is a real Marseille café before counting it), which is why both appear here. The cite-counted score column is shown alongside text mentions for contrast, but the rank order is the text-mention one — the honest “did the engine say the name” metric.

Where the winners are

The 12 leaderboard brands plotted on the map — chains show all their locations (Café Lauca, Boujou and La Brûlerie MÖKA have two each). Click a marker for the per-engine text-mention breakdown.

Top 12 most-cited brands — locations, popups show per-engine text mentions

1Deep27VB Café3Black Bird Coffee4Café Lauca5Boujou Coffee6Ben Mouture7Coffee&Bakery8La Brûlerie MÖKA9Risette10Coogee11Petrin Couchette12Tarlata Café

Section 5

Top cited sources, cross-platform

Every domain cited by at least 3 of 5 platforms with 30+ cites. Read this as the working infrastructure of AI search for Marseille specialty coffee.

Top cross-platform cited domains for Marseille specialty coffee, ≥3 platforms and ≥30 cites.

Platforms	Cites	Bucket	Domain
4 / 5	175	social	reddit.com
4 / 5	93	editorial — local	marseille.love-spots.com
4 / 5	76	review aggregator	wanderlog.com
4 / 5	54	entity website	cafelauca.com
4 / 5	36	editorial — local	marseillesecrete.com
3 / 5	237	social	instagram.com

instagram.com at 237 cites is the row to notice. It shows up on 3 of 5 platforms with the highest single-domain count outside AI Mode’s Google self-citations — and no prior city or vertical in this series has surfaced Instagram even close to that volume. Reddit (175 cites, 4 platforms) is the more “expected” social signal; Instagram is the Marseille-specific one.

Section 6

EN vs FR — the sharpest split we’ve seen

Same intent, different language, different shops. For Marseille the divergence is the largest in the cross-vertical series so far.

EN vs FR top-5 shop overlap per prompt template, ChatGPT, FR proxy.

Template	EN vs FR top-5 overlap
dist_cours_julien	67%
price_cheap	43%
dist_panier	43%
brew_pourover	25%
brew_espresso	25%
brew_coldbrew	25%

The control prompt overlaps just 11% — only 1 of the top 5 shops matches between English and French. For comparison, Berlin’s control was 25% and Paris yoga’s was 25%. Marseille is roughly half as language-stable as the previous two non-anglophone replications.

The cause is structural and follows directly from the source mix: with so little global specialty-coffee press covering Marseille, English prompts pull from a small English-language source set (Reddit threads, Wanderlog, Barista Magazine) while French prompts pull from the dense Marseille local-blog web (marseille.love-spots.com, marseillesecrete.com, tarpin-bien.com). Two languages, two largely disjoint citation universes — and two largely disjoint top-5 shop lists as the downstream consequence.

For a Marseille café, this is the most actionable single finding in the study: on the control prompt the EN and FR top-5 lists overlap by only 11%, so an EN-only or FR-only audit gives a very incomplete picture of what the other language’s users see. The two languages pull from different libraries.

There’s a structural tilt underneath this. The web these engines train on, and the retrievers that rank sources at query time, both skew heavily toward English — so for a French city with little global coverage, the English-language “library” is thin to begin with. That cuts two ways: anglophone travellers get a shallower, more Reddit-and-aggregator-driven answer for Marseille, and a café with even one credible English source (its own English page, a Barista Magazine write-up) can punch well above that thin field. The lever isn’t abandoning your .fr local-blog presence — it’s adding a genuine English edition so you sit in both libraries rather than only the French one.

Section 7

Gemini’s substitute

When the local layer Gemini usually leans on doesn’t exist, it reaches one rung up.

baristamagazine.com citations on Gemini

~34% of Gemini’s entire Marseille citation pool — more than any shop website.

The generalisable lesson

Gemini’s source preference is consistent across our studies, but it’s a shape more than a fixed set: “an authoritative editorial outlet for this domain.” The identity of that outlet is locale-dependent.

·Berlin yoga → the Urban Sports Club blog filled the slot.
·Tokyo bookstores → the local-guide web (whenin.tokyo, Tokyo Weekender) filled it.
·Marseille coffee → with no obvious Marseille-specific specialty-coffee authority, Gemini reaches for the global specialty trade press.

For a domain-aware editorial outlet, the strategic implication is the inverse of the local-blog story: in cities where no local equivalent exists, a global trade publication can become the single largest Gemini citation source for that locale. Barista Magazine isn’t “about” Marseille — but on Gemini, for Marseille, it’s the source.

Section 8

TLD note: too thin to read

We tried to measure a .com vs .fr citation split by prompt language. Only ~22 of the 280 venues have a .fr website, so the .fr sample sits below our trust threshold — we read no pattern from it. Flagging it as a measurement limit, not a finding.

Section 9

Geography: useful map, unusable quartier test

Marseille’s coffee venues cluster in a handful of central quartiers. The map is useful; the per-district AI accuracy test is not — and that’s a measurement limit, not an AI behaviour.

Marseille’s coffee venues, mapped

Every geocoded coffee venue (302 in all). Answers were scored against the 280-venue registry — the subset left after filtering out bakeries, brunch spots and restaurants. Hover a dot for the name and district. The density follows the central waterfront and creative-quarter neighbourhoods (Vieux-Port, Notre-Dame-du-Mont, Cours Julien) rather than the outlying arrondissements.

All 302 verified Marseille specialty cafés

The supply side: where Marseille coffee actually is

The same distribution as counts — central quartiers dominate, the outlying arrondissements barely show up.

Bar chart of specialty cafe counts by Marseille district showing concentration in central quartiers including Le Panier, Cours Julien, Notre-Dame-du-Mont and the Vieux-Port area. — Coffee-venue counts across Marseille’s central quartiers, from the 280-venue registry.

The quartier accuracy test reads 0% across all six tested districts — but that’s a seed-data artifact, not AI getting Marseille wrong. The Google Maps seed stores every venue’s city as “Marseille” with no quartier field, so a shop an engine returns can’t string-match a quartier target. Paris yoga, with an arrondissement-tagged registry, was the one case we could measure this cleanly.

Section 10

Where Marseille fits in the four-city picture

One row per city in the cross-vertical series, on the two axes that matter most for the entity-engine pattern: ChatGPT’s shop-website citation share, and the signature non-shop source the engine leans on instead.

Five cross-vertical replications of the AI-search methodology, scored on ChatGPT’s entity-engine vs third-party split.

City + vertical	ChatGPT shop-site %	Signature non-shop source
Paris yoga	32%	Reddit
Berlin yoga	32%	Urban Sports Club
Amsterdam bikes	42%	Reddit
Tokyo bookstores	8%	whenin.tokyo + Tokyo Weekender
Marseille coffee	10%	Reddit + Instagram + French local blogs

What still generalises

Per-engine personalities hold. Copilot is still the entity engine; Perplexity still the most diverse mix; Gemini still concentrates on one editorial vein; AI Mode still self-cites Google.
Source mix bends to local infrastructure. Berlin (Urban Sports Club), Tokyo (local-guide web), Marseille (Reddit + Instagram + FR micro-guides) — same mechanism, different substrate.
Language and proxy reshape the answer. EN/FR divergence widens further here, but the direction is the same as Paris and Berlin.

What Marseille adds

The entity-engine baseline isn’t a law. The ~32% ChatGPT shop-site share we’d been treating as a constant is conditional on shops owning a digital footprint. When they don’t, it can drop to 10%.
Instagram can be a primary AI citation source. Not just social proof — a top-3 cited domain on 3 of 5 engines.
Gemini will substitute upward. No local specialty press → global trade press fills the slot. The shape of the source preference is fixed; the identity isn’t.

Read: AI Search for Yoga Studios in Paris All AI Search Studies

Section 11

AI search behaviour — and an operational caveat

The plumbing notes for this dataset, including a reproducible Bright Data quirk that bit two of our studies in a row.

413

Captures across 9 of 10 platform×proxy batches. Roughly even per-engine, weighted toward platforms with multiple language/proxy combinations that resolved.

3,442

Cited URLs logged, bucketed into the 10-bucket source taxonomy. Marseille’s French local-blog tail bucketed cleanly — “other” tightened from 54% (pre-taxonomy pass) to 9%.

786

Distinct map entities surfaced and resolved through the NER pipeline to the 280-shop registry.

AI Mode × FR proxy failed at the Bright Data trigger. HTTP-level rejection, no snapshot ID returned. This is the same FR-proxy block that hit our Paris yoga study in May; AI Mode × DE worked cleanly in the Berlin replication, so the rejection is FR-specific to AI Mode — not a one-off scrape failure. Every AI Mode percentage on this page is over the US-proxy batch only. We flag the absence rather than imputing French numbers.

For café owners

What this means for a Marseille café

The playbook isn’t “website or Instagram.” It’s show up where Marseille coffee is already documented.

Your website still matters — for clean entity resolution, and for Copilot, which stays website-first. Just don’t expect it to carry the whole job.
Your Instagram is an AI citation surface, not just a brand channel. With no booking transaction to host on a website, Instagram is where the scene documents itself — and the engines cite it.
French micro-guides drive the French prompts. Getting into love-spots, marseillesecrete and the local-guide tail is worth more than another homepage tweak.
English-language visibility is thin — so one good English page or write-up can overperform for tourist prompts.
Don’t audit only one language. The EN and FR result sets overlap just 11% — check both.

Conclusion

A useful exception, not a new rule

The engines didn’t change personality. Copilot still prefers café websites, Perplexity still spreads across sources, Gemini still reaches for an editorial authority, and ChatGPT still blends recommendations with open-web citations. What changed is the local substrate.

In Marseille coffee the owned-web layer is thin — and there’s a clean reason why: there’s no transaction to host. Yoga and hotels sell bookings on their sites; a café sells nothing online, so its useful documentation lives elsewhere — Instagram, Reddit, French micro-guides and a few coffee-trade publications. So the engines go there.

The broader lesson: AI search doesn’t reward one universal content playbook. It reflects the shape of the local web. In Marseille, that shape is less “homepage + schema” and more “Instagram post + local blog + Reddit thread + enough English-language surface to be understood by tourists.”

So for hotels, restaurants, cafés or studios, the question isn’t “what does GEO say I should publish?” It’s: where does your category already get documented in your city? In Marseille coffee, the answer is clear — the AI-search ecosystem smells less like a sitemap and more like a terrasse.

Methodology

Study design

Data collection

23 prompt templates × 2 languages (EN/FR) × 2 proxy countries (US/FR) × 5 AI engines = 460 theoretical; 413 captured (AI Mode × FR rejected, −46; one empty ChatGPT run, −1)
Engines: ChatGPT, Perplexity, Gemini, Copilot, Google AI Mode
Captured 2026-05-28 via Bright Data
413 captures, 3,442 cited URLs, 786 distinct map entities
9 of 10 platform×proxy batches resolved — AI Mode × FR rejected

What we measured

Shops named per answer (chain-aggregated leaderboard)
Cited URLs bucketed into a 10-bucket source taxonomy
EN vs FR top-5 overlap per prompt template
.com vs .fr citation balance by prompt language (sample-limited)
Per-platform social and review-aggregator share
Cross-city comparison vs Paris yoga, Berlin yoga, Amsterdam bikes, Tokyo bookstores

How we turned answers into shops: the NER pipeline

AI answers are free text — “Deep is the obvious one, then maybe Black Bird up on Cours Julien, or Tarlata…” — not a clean list of businesses. To count anything, we first had to extract the shop mentions. Each answer (and each citation’s anchor text) ran through a named-entity-recognition (NER) pass that works in four steps:

Span detection. A transformer NER model tags candidate spans — organisation/business names and the location phrases attached to them (quartier names, street addresses, “Vieux-Port,” “Cours Julien”). A coffee-specific gazetteer (“café,” “brûlerie,” “coffee,” “roasters”) boosts recall on names the base model would otherwise miss.
Normalisation. Each candidate is lower-cased and stripped of boilerplate — the word “Marseille,” quartier suffixes, trademark glyphs and punctuation — so “Black Bird Coffee Marseille” and “Black Bird” collapse toward the same key.
Entity resolution. Normalised mentions are matched to the 280-venue coffee registry — a 285-row Apify Google Maps seed expanded with Google Places recovery (302 geocoded venues) and filtered to coffee-led spots. Fuzzy string similarity plus a domain match when the answer cited the shop’s own website. Ambiguous or sub-threshold spans are dropped. A frequently-recommended name that doesn’t match the seed is checked against Google Places: if it’s a real Marseille café the seed simply missed, it’s added and counted (this is how Black Bird Coffee and Ben Mouture entered the leaderboard); otherwise it’s flagged as a hallucination and excluded.
Chain aggregation. Resolved entities that belong to the same brand are merged so a multi-location shop isn’t double-counted, while per-location coordinates are retained for the map.

A taxonomy pass also tightened the source bucketing: pre-pass, “other” held 54% of Marseille citations because the French local-blog tail was unfamiliar to the default classifier. Post-pass, “other” sits at 9% — the French micro-guide ecosystem bucketed cleanly into editorial_local.

Caveats

AI Mode × FR proxy is missing. Bright Data rejected the batch at the trigger layer (HTTP-level, no snapshot ID). The same FR-proxy block hit Paris yoga; AI Mode × DE worked in Berlin. FR-specific to AI Mode, not a one-off.
.fr TLD sample is too thin to read. Only ~22 of 280 Marseille coffee venues carry a .fr website; .fr citation counts sit below our trust threshold. The TLD section is rendered as a measurement limitation, not a finding.
District-targeting null is a seed-granularity artifact. The Apify seed labels every shop city = “Marseille” with no quartier field; the 0% accuracy across all six quartiers is methodological, not behavioural. Same Amsterdam/Berlin pattern.
NER resolution is high-precision by design: mentions the pipeline can’t confidently map to the registry are dropped, so counts are conservative lower bounds.
Google AI Mode’s heavy reliance on google.com URLs may inflate its citation count relative to other engines.
Disclosure: no personal affiliation with any Marseille specialty coffee shop.

FAQ

Frequently Asked Questions

Summarize with AI

Continue Reading

More on how AI search surfaces local businesses.

AI Search for Yoga Studios in Paris (2026)All AI Search Studies

All Research