Built for the dealer chat retrieval layer — Vectorize + Workers AI + AI Gateway, the stack we discussed
AI-POWERED DEMO

Dealer chat retrieval, on Cloudflare.

A working reference for the dealer chat workload we discussed: per-dealer knowledge bases on Vectorize, edge inference on Workers AI, cost-controlled and audited through AI Gateway. AWS data lake (Athena, Redshift) stays put — this supplements the inference + retrieval path.

Vectorize · AI Search · AI Gateway · Workers AI · Code Mode · Durable Objects
Live sandbox uses real Workers, R2, D1, Vectorize, and Workers AI
dealer chat · retrieval path
// shopper asks dealer chatbot
Worker: request enters edge
→ Durable Object: session state
→ Workers AI: embed · ~60ms
→ Vectorize: query per-dealer KB · <100ms
→ AI Gateway: cache · rate limit · audit
→ Workers AI: inference · edge LLM
→ D1: transcript · lead
AWS data lake: unchanged
Per-tenant retrieval: on edge
Vectorize + AI Search — the retrieval layer for dealer chat

LeadVenture powers 40,000+ dealers across 6 brands and 9 industries

40K+ Dealers
6 Brands
9 Verticals
Multi-Tenant Scale
LIVE — NO SIGNUP REQUIRED

Skip the slide deck.
Try it live.

The search above is a preview. The Live Sandbox is the AI feature layer your dealer chatbots will run on — upload sample inventory photos, watch Workers AI auto-caption them, see Vectorize index them in real time, then search with natural language. Same primitive, every dealer, isolated by Workers for Platforms.

Your own isolated session
Real Workers AI inference
Reset anytime
sandbox-api.workers.dev
POST /api/upload
→ R2 20ms
→ Workers AI caption 1.2s
→ BGE embed 60ms
→ Vectorize index 50ms
POST /api/search
→ Embed query 60ms
→ Vectorize query 12ms
→ D1 lookup 5ms
Total ~77ms
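The search path above can be sketched as a single Worker handler. Binding names (`AI`, `VECTORS`, `DB`), the embedding model, and the `units` table are assumptions for illustration, not the sandbox's actual source:

```typescript
// Sketch of the /api/search hot path. Binding names (AI, VECTORS, DB),
// the embedding model, and the `units` table are illustrative.
interface Env {
  AI: { run(model: string, input: unknown): Promise<any> };
  VECTORS: {
    query(
      vector: number[],
      opts: { namespace: string; topK: number },
    ): Promise<{ matches: { id: string; score: number }[] }>;
  };
  DB: {
    prepare(sql: string): {
      bind(...args: unknown[]): { all(): Promise<{ results: unknown[] }> };
    };
  };
}

export async function handleSearch(env: Env, dealerId: string, query: string) {
  // 1. Embed the shopper's query at the edge (~60ms in the demo).
  const emb = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });

  // 2. Vector query scoped to this dealer's namespace (~12ms in the demo).
  const { matches } = await env.VECTORS.query(emb.data[0], {
    namespace: `dealer-${dealerId}`,
    topK: 5,
  });
  if (matches.length === 0) return [];

  // 3. Hydrate the matched unit IDs from D1 (~5ms in the demo).
  const ids = matches.map((m) => m.id);
  const rows = await env.DB.prepare(
    `SELECT * FROM units WHERE id IN (${ids.map(() => "?").join(",")})`,
  )
    .bind(...ids)
    .all();
  return rows.results;
}
```

The three awaits map to the three timed hops: embed, namespace-scoped vector query, D1 hydration.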

A dealer's inventory, embedded

Each unit is auto-captioned by Workers AI at ingest, embedded into Vectorize with a per-dealer namespace, and surfaced through the dealer's chatbot. Photos served from R2 at zero egress.
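A minimal sketch of that ingest path, assuming a LLaVA-style vision model for captioning and BGE for embeddings; binding names and model IDs are illustrative stand-ins, not the demo's actual source:

```typescript
// Ingest sketch: caption the photo, embed the caption, upsert into the
// dealer's namespace. Binding names (AI, VECTORS) and both model IDs
// are assumptions.
interface Env {
  AI: { run(model: string, input: unknown): Promise<any> };
  VECTORS: {
    upsert(
      vectors: {
        id: string;
        values: number[];
        namespace?: string;
        metadata?: Record<string, unknown>;
      }[],
    ): Promise<unknown>;
  };
}

export async function ingestUnit(
  env: Env,
  dealerId: string,
  unitId: string,
  image: ArrayBuffer,
): Promise<string> {
  // Vision model writes the caption no human has to.
  const cap = await env.AI.run("@cf/llava-hf/llava-1.5-7b-hf", {
    image: [...new Uint8Array(image)],
    prompt: "Describe this vehicle for an inventory listing.",
  });

  // Embed with the same model the query path uses, so the spaces match.
  const emb = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [cap.description],
  });

  // Per-dealer namespace keeps retrieval tenant-isolated.
  await env.VECTORS.upsert([
    {
      id: unitId,
      values: emb.data[0],
      namespace: `dealer-${dealerId}`,
      metadata: { dealerId, caption: cap.description },
    },
  ]);
  return cap.description;
}
```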

22ft Pontoon
2023 · White Hull
Marine
Sun Tracker Party Barge
North Lake Marine
$38,995
Touring Bike
2022 · 8,400 mi
Powersports
Honda Gold Wing
Ridgeline Motorsports
$14,750
Class C RV
2024 · Sleeps 6
RV
Forest River Sunseeker
Open Roads RV
$98,500
Compact Tractor
2023 · 38hp
Agriculture
Kubota L3902 + Loader
Heartland Equipment
$32,400
HERO LAYER — PER-DEALER KNOWLEDGE BASE

Per-dealer retrieval, two ways

For the dealer chat workload, the retrieval layer is the architecture decision that matters most. Cloudflare gives you two paths depending on what kind of data each dealer's knowledge base actually contains.

Documents → AI Search

Drop dealer policies, FAQs, manuals, service docs (markdown / PDF / HTML) into an R2 bucket. AI Search auto-generates and maintains a Vectorize index per dealer. No embedding pipeline to build.

R2 + AI Search · per-tenant out of the box
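Assuming one AI Search instance per dealer's doc bucket, the query side collapses to roughly this; the binding shape follows Cloudflare's published AutoRAG Workers binding, and the per-dealer instance naming is our assumption:

```typescript
// Assumed shape of the AI Search (AutoRAG) Workers binding; one search
// instance per dealer doc bucket is an illustrative naming choice.
interface Env {
  AI: {
    autorag(instance: string): {
      aiSearch(opts: { query: string }): Promise<{ response: string }>;
    };
  };
}

export async function answerFromDocs(env: Env, dealerId: string, question: string) {
  // The index behind this instance is generated and kept fresh from R2
  // automatically; there is no embedding pipeline to call here.
  const result = await env.AI
    .autorag(`dealer-docs-${dealerId}`)
    .aiSearch({ query: question });
  return result.response;
}
```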

Structured data → Vectorize direct

Live inventory feeds, transactional data, anything you want fine-grained control over. Embed with Workers AI, write to Vectorize per-dealer namespace, query with sub-100ms latency at the edge.

Workers AI + Vectorize · full control
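The tenant boundary here is a namespace string, so it is worth deriving it in exactly one place. A tiny illustrative helper; the option names mirror Vectorize's query options, while the id format rule and `dealer-` prefix are our assumptions:

```typescript
// The tenant boundary is a namespace string; derive it in one place.
// The id validation rule and `dealer-` prefix are illustrative.
export function dealerQueryOptions(dealerId: string, topK = 5) {
  if (!/^[a-z0-9-]+$/.test(dealerId)) {
    throw new Error(`bad dealer id: ${dealerId}`);
  }
  return {
    namespace: `dealer-${dealerId}`, // one namespace per dealer = hard isolation
    topK,
    returnMetadata: "indexed" as const,
  };
}
```

Every Vectorize call then takes `dealerQueryOptions(dealerId)` and cannot accidentally query across tenants.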
AWS data lake stays put. Athena and Redshift remain the system of record for inventory and analytics. Cloudflare supplements with the per-tenant inference + retrieval path; the data layer doesn't move.

Architecture: LeadVenture on Cloudflare

Every component mapped to a Cloudflare product. Zero origin servers required.

Per-Dealer Knowledge Base — Vectorize

Each dealer's inventory, FAQs, and service docs embedded in a per-dealer namespace. Sub-100ms semantic retrieval at the edge. Cost discussion to have: Vectorize at platform scale vs. AWS-based vector DBs.

THE OPEN QUESTION

Document Retrieval — AI Search

Drop docs (markdown / PDF / HTML) into R2; AI Search auto-generates and maintains a Vectorize index. Per-tenant search, no embedding pipeline to build.

ZERO-OPS RETRIEVAL

LLM Control Plane — AI Gateway

Caching, rate limits, fallback, and full audit logs across every dealer bot. Already in use internally; production beta tracking for end of year. Per-Worker fine-grained permissions in flight.

ALREADY ADOPTED
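Swapping providers through AI Gateway is a base-URL change. A sketch of the URL scheme (account and gateway IDs are placeholders):

```typescript
// AI Gateway fronts each provider behind one URL scheme:
//   https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/{provider}/...
// Same control plane and audit log regardless of which model answers.
type Provider = "openai" | "anthropic" | "workers-ai";

export function gatewayUrl(
  accountId: string,
  gatewayId: string,
  provider: Provider,
  path: string,
): string {
  const clean = path.replace(/^\//, "");
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}/${clean}`;
}

// e.g. fetch(gatewayUrl(ACCOUNT_ID, "dealer-chat", "openai", "chat/completions"), ...)
// where ACCOUNT_ID and "dealer-chat" are placeholders.
```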

Edge Inference — Workers AI

Llama, embeddings, vision models running in 330+ cities. Or proxy to OpenAI / Anthropic via AI Gateway — same control plane, same audit, swappable per use case.

BYO model via AI Gateway

Tool-Call Path — Code Mode + Agents SDK

For the dealer chat agent. Code Mode shrinks the tool-call context window and restricts the execute tool's outbound access to only the specific tool being invoked. Same pattern Cloudflare uses internally.

Pairs with Durable Objects

Chat Session State — Durable Objects

One DO per active chat session. Strongly consistent, sticky to user, runs at the edge. Holds session tokens for the tool-call path securely.

Replaces: Redis + sticky LB
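A sketch of the session object, written as a plain class so it runs anywhere; in a real Worker it would be a Durable Object class backed by durable storage. Field names and the 40-message cap are illustrative:

```typescript
// Per-session state sketch. One instance per active chat; strongly
// consistent because only one instance exists per session id.
export class ChatSession {
  private history: { role: "user" | "assistant"; text: string }[] = [];
  private toolToken: string | null = null;

  append(role: "user" | "assistant", text: string): void {
    this.history.push({ role, text });
    // Keep the LLM context bounded per session (cap is illustrative).
    if (this.history.length > 40) this.history.shift();
  }

  context() {
    return this.history;
  }

  // The session object is the only place the tool-call token lives;
  // it never reaches the browser.
  setToolToken(t: string): void {
    this.toolToken = t;
  }

  hasToolToken(): boolean {
    return this.toolToken !== null;
  }
}
```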

Transcripts & Config — D1

Per-dealer chat history, lead capture, bot config (tone, hours, escalation). Athena / Redshift stay as the analytical data lake — D1 holds the operational chat data only.

Operational store, not data lake
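An illustrative D1 shape for the operational chat data; table and column names are assumptions, not an agreed schema:

```typescript
// Operational schema sketch: transcripts live in D1, analytics stay in
// Athena/Redshift. All names here are illustrative.
export const TRANSCRIPT_DDL = `
  CREATE TABLE IF NOT EXISTS transcripts (
    id         TEXT PRIMARY KEY,
    dealer_id  TEXT NOT NULL,
    session_id TEXT NOT NULL,
    role       TEXT CHECK (role IN ('user','assistant')),
    body       TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
  )`;

interface D1Like {
  prepare(sql: string): { bind(...args: unknown[]): { run(): Promise<unknown> } };
}

export async function logTurn(
  db: D1Like,
  dealerId: string,
  sessionId: string,
  role: "user" | "assistant",
  body: string,
): Promise<string> {
  const id = `t-${Date.now()}-${Math.random().toString(36).slice(2)}`;
  // One row per chat turn, written off the response's critical path.
  await db
    .prepare(
      "INSERT INTO transcripts (id, dealer_id, session_id, role, body) VALUES (?,?,?,?,?)",
    )
    .bind(id, dealerId, sessionId, role, body)
    .run();
  return id;
}
```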

Per-Dealer Compute — Workers for Platforms

When dealer chat logic needs to be customized per dealer (custom prompts, escalation rules, integrations), dispatch namespaces give isolated execution without per-dealer deployments. Optional, depending on customization needs.

If per-dealer customization required
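The dispatch pattern, sketched with generic request/response types; the `get(scriptName).fetch()` shape mirrors Workers for Platforms dispatch namespaces, while the binding name and fallback behavior are illustrative choices:

```typescript
// Route a chat request to a dealer's customized Worker when one exists,
// else fall back to the shared default logic.
interface DispatchNamespace<Req, Res> {
  get(script: string): { fetch(req: Req): Promise<Res> };
}

export async function routeChat<Req, Res>(
  ns: DispatchNamespace<Req, Res>,
  dealerId: string,
  req: Req,
  fallback: (r: Req) => Promise<Res>,
): Promise<Res> {
  try {
    // One isolated Worker per customized dealer, addressed by script
    // name; no per-dealer deployment pipeline on our side.
    return await ns.get(`dealer-${dealerId}`).fetch(req);
  } catch {
    // Dealer has no custom Worker uploaded: shared logic handles it.
    return fallback(req);
  }
}
```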

Lead Fan-out — Queues

Qualified chat leads pushed to dealer CRMs without blocking the bot response. Retries, DLQ, batching built in. Plays well with existing CRM webhook patterns.

Async fan-out, non-blocking
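Lead capture off the hot path, sketched with an assumed queue binding and message shape; the bot returns as soon as the message is accepted, and the queue consumer handles CRM delivery with retries and the DLQ:

```typescript
// Non-blocking lead fan-out sketch. Binding name and Lead fields are
// illustrative assumptions.
interface QueueLike<T> {
  send(msg: T): Promise<void>;
}

export interface Lead {
  dealerId: string;
  sessionId: string;
  name: string;
  intent: string;
}

export async function captureLead(
  leads: QueueLike<Lead>,
  lead: Lead,
): Promise<{ queued: true }> {
  // send() resolves once the message is accepted; CRM delivery happens
  // in the consumer, off the chat hot path.
  await leads.send(lead);
  return { queued: true };
}
```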

The business case

The dealer chat workload sits at the intersection of three things LeadVenture already cares about: per-tenant retrieval, controlled LLM spend, and edge latency. Cloudflare's pieces compose for exactly this.

<100ms
Edge Retrieval

Per-dealer Vectorize namespace lookups served from the nearest of 330+ cities. Beats round-tripping a centralized vector DB.

1
AI Cost Control Plane

AI Gateway caches identical FAQ requests, rate-limits abusive sessions, and audits every LLM call. Already in flight internally; same primitive applies to dealer chat.

0
Embedding Pipelines to Build

If dealer KBs are mostly documents, AI Search auto-generates and maintains the Vectorize index from R2. Skip the ETL.

What this supplements, not replaces

AWS data lake stays. Athena and Redshift remain the system of record for inventory and analytics across the brands. Cloudflare adds the per-tenant inference + retrieval layer that sits in front of it.
AI Gateway is already in use. The internal LLM-key consolidation and Langfuse integration carry over: same gateway, new use case.
Code Mode pattern carries over. The same primitive being implemented elsewhere fits the dealer chat agent's tool-call path naturally.
Pilot scoped to one brand. Pick one of the brands, one cluster of dealers. Measure latency + cost vs. the AWS-based path. Concrete numbers before commitment.

Where to next?

A scoped working session on Vectorize cost vs. your current AWS path, AI Search for per-tenant document retrieval, and Code Mode for the dealer chat agent. Concrete numbers, not architecture diagrams.

Built with Cloudflare Pages, Vectorize, AI Search, Workers AI, AI Gateway, Durable Objects, D1, Workers for Platforms, Queues, and R2.
Deployed globally to 330+ cities. AWS data lake (Athena, Redshift) unchanged.