A working reference for the dealer chat workload we discussed: per-dealer knowledge bases on Vectorize, edge inference on Workers AI, cost-controlled and audited through AI Gateway. AWS data lake (Athena, Redshift) stays put — this supplements the inference + retrieval path.
LeadVenture powers 40,000+ dealers across 6 brands and 9 industries
The retrieval primitive at the heart of dealer chat. Shopper describes what they want; Workers AI generates embeddings; Vectorize queries the dealer's per-tenant namespace; bot grounds its response in real inventory in <100ms.
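That retrieval path might be sketched like this in a Worker. Binding names (`AI`, `DEALER_INDEX`) and the per-dealer namespace scheme are assumptions; the embedding model is a real Workers AI model.

```typescript
// Minimal sketch of the dealer-chat retrieval path.
interface EmbeddingResult { data: number[][] }
interface VectorizeMatch { id: string; score: number; metadata?: Record<string, unknown> }
interface VectorizeIndex {
  query(
    vector: number[],
    opts: { topK: number; namespace: string; returnMetadata?: boolean }
  ): Promise<{ matches: VectorizeMatch[] }>;
}
interface Ai { run(model: string, input: unknown): Promise<EmbeddingResult> }
interface Env { AI: Ai; DEALER_INDEX: VectorizeIndex }

// One Vectorize namespace per dealer keeps tenants isolated.
function namespaceFor(dealerId: string): string {
  return `dealer-${dealerId}`;
}

async function searchInventory(env: Env, dealerId: string, query: string) {
  // Embed the shopper's query at the edge.
  const emb = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });
  // Query only this dealer's namespace; matches carry inventory metadata.
  return env.DEALER_INDEX.query(emb.data[0], {
    topK: 5,
    namespace: namespaceFor(dealerId),
    returnMetadata: true,
  });
}
```

The bot then grounds its reply in the returned matches rather than free-generating inventory claims.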
The search above is a preview. The Live Sandbox is the AI feature layer your dealer chatbots will run on — upload sample inventory photos, watch Workers AI auto-caption them, see Vectorize index them in real time, then search with natural language. Same primitive, every dealer, isolated by Workers for Platforms.
Each unit is auto-captioned by Workers AI at ingest, embedded into Vectorize with a per-dealer namespace, and surfaced through the dealer's chatbot. Photos served from R2 at zero egress.
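A sketch of that ingest pipeline, under stated assumptions: the model names are real Workers AI models, but the binding names (`PHOTOS`, `DEALER_INDEX`), key layout, and prompt are illustrative.

```typescript
// Sketch: photo ingest — store in R2, caption, embed, index per dealer.
interface Ai { run(model: string, input: unknown): Promise<any> }
interface VectorizeIndex {
  upsert(v: { id: string; values: number[]; namespace: string; metadata?: Record<string, unknown> }[]): Promise<unknown>;
}
interface R2Bucket { put(key: string, body: ArrayBuffer): Promise<unknown> }
interface Env { AI: Ai; DEALER_INDEX: VectorizeIndex; PHOTOS: R2Bucket }

const photoKey = (dealerId: string, unitId: string) => `photos/${dealerId}/${unitId}.jpg`;
const vectorId = (dealerId: string, unitId: string) => `${dealerId}:${unitId}`;

async function ingestUnitPhoto(env: Env, dealerId: string, unitId: string, image: ArrayBuffer) {
  // Serve photos from R2 at zero egress.
  await env.PHOTOS.put(photoKey(dealerId, unitId), image);
  // Auto-caption at ingest with a vision model.
  const caption = await env.AI.run("@cf/llava-hf/llava-1.5-7b-hf", {
    image: [...new Uint8Array(image)],
    prompt: "Describe this vehicle for inventory search.",
  });
  // Embed the caption and index it in the dealer's namespace.
  const emb = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [caption.description] });
  await env.DEALER_INDEX.upsert([{
    id: vectorId(dealerId, unitId),
    values: emb.data[0],
    namespace: `dealer-${dealerId}`,
    metadata: { unitId, caption: caption.description },
  }]);
}
```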
For the dealer chat workload, the retrieval layer is the architecture decision that matters most. Cloudflare gives you two paths depending on what kind of data each dealer's knowledge base actually contains.
Drop dealer policies, FAQs, manuals, service docs (markdown / PDF / HTML) into an R2 bucket. AI Search auto-generates and maintains a Vectorize index per dealer. No embedding pipeline to build.
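The managed path might look like this from a Worker, assuming the AI Search (AutoRAG) binding on `env.AI`; the one-instance-per-dealer naming scheme is an assumption, not the only way to scope tenants.

```typescript
// Sketch: query a dealer's managed document index via AI Search.
interface AutoRag { aiSearch(opts: { query: string }): Promise<unknown> }
interface Ai { autorag(name: string): AutoRag }
interface Env { AI: Ai }

const ragInstanceFor = (dealerId: string) => `dealer-docs-${dealerId}`;

async function askDealerDocs(env: Env, dealerId: string, question: string) {
  // AI Search keeps the Vectorize index in sync with the dealer's R2 docs;
  // one call retrieves matching passages and generates a grounded answer.
  return env.AI.autorag(ragInstanceFor(dealerId)).aiSearch({ query: question });
}
```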
Live inventory feeds, transactional data, anything you want fine-grained control over. Embed with Workers AI, write to Vectorize per-dealer namespace, query with sub-100ms latency at the edge.
Every component mapped to a Cloudflare product. Zero origin servers required.
Each dealer's inventory, FAQs, and service docs embedded in a per-dealer namespace. Sub-100ms semantic retrieval at the edge. Includes a cost discussion: Vectorize vs. AWS-based vector DBs at platform scale.
Drop docs (markdown / PDF / HTML) into R2; AI Search auto-generates and maintains a Vectorize index. Per-tenant search, no embedding pipeline to build.
Caching, rate limits, fallback, and full audit logs across every dealer bot. Already in use internally; production beta targeted for end of year. Per-Worker fine-grained permissions in flight.
Llama, embeddings, vision models running in 330+ cities. Or proxy to OpenAI / Anthropic via AI Gateway — same control plane, same audit, swappable per use case.
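Proxying a third-party model through that same control plane is a URL change, not a rewrite. A sketch, with `ACCOUNT_ID`/`GATEWAY_ID` as placeholders for a real gateway:

```typescript
// Sketch: route an OpenAI chat call through AI Gateway so it gets
// the same caching, rate limits, and audit logs as Workers AI calls.
const GATEWAY = "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID";

function gatewayUrl(provider: "openai" | "anthropic", path: string): string {
  return `${GATEWAY}/${provider}${path}`;
}

async function chatViaGateway(apiKey: string, messages: object[]) {
  return fetch(gatewayUrl("openai", "/chat/completions"), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: "gpt-4o-mini", messages }),
  });
}
```

Swapping providers per use case means changing the provider segment and payload, while the audit trail stays in one place.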
For the dealer chat agent. Code Mode shrinks the tool-call context window and restricts the execute tool's outbound calls to only the specific tools intended. Same pattern Cloudflare uses internally.
One DO per active chat session. Strongly consistent, sticky to user, runs at the edge. Securely holds session tokens for the tool-call path.
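A minimal sketch of that session object. The class name, storage keys, and turn shape are assumptions; the point is that each session's history lives in one strongly consistent object near the user.

```typescript
// Sketch: one Durable Object per active chat session.
interface DOStorage {
  get<T>(key: string): Promise<T | undefined>;
  put(key: string, value: unknown): Promise<void>;
}
interface DOState { storage: DOStorage }

class ChatSession {
  constructor(private state: DOState) {}

  // Append a turn to session-local, strongly consistent history.
  async addTurn(role: "user" | "assistant", text: string) {
    const turns = (await this.state.storage.get<object[]>("turns")) ?? [];
    turns.push({ role, text, at: Date.now() });
    await this.state.storage.put("turns", turns);
  }
}

// One DO per session: derive the object's name from dealer + session ids.
function sessionName(dealerId: string, sessionId: string): string {
  return `${dealerId}:${sessionId}`;
}
```

The Worker resolves `sessionName(...)` to a DO id, so repeat messages from the same shopper always land on the same object.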
Per-dealer chat history, lead capture, bot config (tone, hours, escalation). Athena / Redshift stay as the analytical data lake — D1 holds the operational chat data only.
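The operational split might look like the following D1 sketch. Table and column names are assumptions; anything analytical still flows to Athena / Redshift downstream.

```typescript
// Sketch: operational chat data in D1 (schema is illustrative).
const SCHEMA = `
CREATE TABLE IF NOT EXISTS chat_messages (
  dealer_id  TEXT NOT NULL,
  session_id TEXT NOT NULL,
  role       TEXT NOT NULL,
  body       TEXT NOT NULL,
  created_at INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_dealer_session
  ON chat_messages (dealer_id, session_id);
`;

interface D1Database {
  prepare(sql: string): { bind(...values: unknown[]): { all(): Promise<unknown> } };
}

// Pull recent context for a session, scoped to the dealer.
function recentMessages(db: D1Database, dealerId: string, sessionId: string) {
  return db
    .prepare(
      "SELECT role, body FROM chat_messages WHERE dealer_id = ?1 AND session_id = ?2 ORDER BY created_at DESC LIMIT 20"
    )
    .bind(dealerId, sessionId)
    .all();
}
```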
When dealer chat logic needs to be customized per dealer (custom prompts, escalation rules, integrations), dispatch namespaces give isolated execution without per-dealer deployments. Optional, depending on customization needs.
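Dispatching to a customized dealer Worker might look like this. The binding name (`DEALER_DISPATCH`) and per-dealer Worker naming are assumptions; request/response types are elided to keep the sketch self-contained.

```typescript
// Sketch: route a chat request to a per-dealer Worker in a
// Workers for Platforms dispatch namespace.
interface WorkerStub { fetch(req: unknown): Promise<unknown> }
interface DispatchNamespace { get(name: string): WorkerStub }
interface Env { DEALER_DISPATCH: DispatchNamespace }

function dealerWorkerName(dealerId: string): string {
  return `dealer-bot-${dealerId}`;
}

async function routeToDealer(env: Env, dealerId: string, req: unknown) {
  // Isolated execution: one uploaded Worker per customized dealer,
  // with no redeploy of the core chat platform.
  return env.DEALER_DISPATCH.get(dealerWorkerName(dealerId)).fetch(req);
}
```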
Qualified chat leads pushed to dealer CRMs without blocking the bot response. Retries, DLQ, batching built in. Plays well with existing CRM webhook patterns.
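The non-blocking handoff might be sketched as follows; the queue binding, lead shape, and CRM URL pattern are assumptions.

```typescript
// Sketch: enqueue a qualified lead without blocking the bot's reply.
interface Queue<T> { send(body: T): Promise<void> }
interface Lead { dealerId: string; name: string; phone: string; interest: string }
interface Env { LEAD_QUEUE: Queue<Lead> }

async function captureLead(env: Env, lead: Lead) {
  // Returns immediately; delivery to the dealer CRM happens in the
  // queue consumer, with retries, batching, and a DLQ handled by Queues.
  await env.LEAD_QUEUE.send(lead);
}

// Consumer side: target the dealer's existing CRM webhook.
function crmWebhookUrl(base: string, dealerId: string): string {
  return `${base}/dealers/${dealerId}/leads`;
}
```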
The dealer chat workload sits at the intersection of three things LeadVenture already cares about: per-tenant retrieval, controlled LLM spend, and edge latency. Cloudflare's pieces compose for exactly this.
Per-dealer Vectorize namespace lookups served from the nearest of 330+ cities. Beats round-tripping a centralized vector DB.
AI Gateway caches identical FAQ requests, rate-limits abusive sessions, and audits every LLM call. Already in flight internally; same primitive applies to dealer chat.
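Per-request cache control is a header on the gateway call; a small sketch, with the header name per AI Gateway's request-header conventions and the TTL value an assumption:

```typescript
// Sketch: headers for an LLM call routed through AI Gateway,
// asking the gateway to cache identical FAQ completions.
function faqHeaders(apiKey: string, ttlSeconds = 3600): Record<string, string> {
  return {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
    // Serve repeat identical requests from the gateway cache for an hour.
    "cf-aig-cache-ttl": String(ttlSeconds),
  };
}
```

Rate limits and audit logging are configured on the gateway itself, so the calling Worker stays unchanged.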
If dealer KBs are mostly documents, AI Search auto-generates and maintains the Vectorize index from R2. Skip the ETL.
A scoped working session on Vectorize cost vs. your current AWS path, AI Search for per-tenant document retrieval, and Code Mode for the dealer chat agent. Concrete numbers, not architecture diagrams.
Built with Cloudflare Pages, Vectorize, AI Search, Workers AI, AI Gateway, Durable Objects, D1, Workers for Platforms, Queues, and R2.
Deployed globally to 330+ cities. AWS data lake (Athena, Redshift) unchanged.