A working reference for the dealer chat workload we discussed: per-dealer knowledge bases on Vectorize, edge inference on Workers AI, cost-controlled and audited through AI Gateway. AWS data lake (Athena, Redshift) stays put — this supplements the inference + retrieval path.
LeadVenture powers 40,000+ dealers across 6 brands and 9 industries
The retrieval primitive at the heart of dealer chat. Shopper describes what they want; Workers AI generates embeddings; Vectorize queries the dealer's per-tenant namespace; bot grounds its response in real inventory in <100ms.
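That retrieval path might be sketched like this in a Worker. Binding names (`AI`, `DEALER_INDEX`) and the per-dealer namespace scheme are assumptions; the embedding model is a real Workers AI model.

```typescript
// Minimal sketch of the dealer-chat retrieval path.
interface EmbeddingResult { data: number[][] }
interface VectorizeMatch { id: string; score: number; metadata?: Record<string, unknown> }
interface VectorizeIndex {
  query(
    vector: number[],
    opts: { topK: number; namespace: string; returnMetadata?: boolean }
  ): Promise<{ matches: VectorizeMatch[] }>;
}
interface Ai { run(model: string, input: unknown): Promise<EmbeddingResult> }
interface Env { AI: Ai; DEALER_INDEX: VectorizeIndex }

// One Vectorize namespace per dealer keeps tenants isolated.
function namespaceFor(dealerId: string): string {
  return `dealer-${dealerId}`;
}

async function searchInventory(env: Env, dealerId: string, query: string) {
  // Embed the shopper's query at the edge.
  const emb = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });
  // Query only this dealer's namespace; matches carry inventory metadata.
  return env.DEALER_INDEX.query(emb.data[0], {
    topK: 5,
    namespace: namespaceFor(dealerId),
    returnMetadata: true,
  });
}
```

The bot then grounds its reply in the returned matches rather than free-generating inventory claims.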
The search above is a preview. The Live Sandbox is the AI feature layer your dealer chatbots will run on — upload sample inventory photos, watch Workers AI auto-caption them, see Vectorize index them in real time, then search with natural language. Same primitive, every dealer, isolated by Workers for Platforms.
Each unit is auto-captioned by Workers AI at ingest, embedded into Vectorize with a per-dealer namespace, and surfaced through the dealer's chatbot. Photos served from R2 at zero egress.
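A sketch of that ingest pipeline, under stated assumptions: the model names are real Workers AI models, but the binding names (`PHOTOS`, `DEALER_INDEX`), key layout, and prompt are illustrative.

```typescript
// Sketch: photo ingest — store in R2, caption, embed, index per dealer.
interface Ai { run(model: string, input: unknown): Promise<any> }
interface VectorizeIndex {
  upsert(v: { id: string; values: number[]; namespace: string; metadata?: Record<string, unknown> }[]): Promise<unknown>;
}
interface R2Bucket { put(key: string, body: ArrayBuffer): Promise<unknown> }
interface Env { AI: Ai; DEALER_INDEX: VectorizeIndex; PHOTOS: R2Bucket }

const photoKey = (dealerId: string, unitId: string) => `photos/${dealerId}/${unitId}.jpg`;
const vectorId = (dealerId: string, unitId: string) => `${dealerId}:${unitId}`;

async function ingestUnitPhoto(env: Env, dealerId: string, unitId: string, image: ArrayBuffer) {
  // Serve photos from R2 at zero egress.
  await env.PHOTOS.put(photoKey(dealerId, unitId), image);
  // Auto-caption at ingest with a vision model.
  const caption = await env.AI.run("@cf/llava-hf/llava-1.5-7b-hf", {
    image: [...new Uint8Array(image)],
    prompt: "Describe this vehicle for inventory search.",
  });
  // Embed the caption and index it in the dealer's namespace.
  const emb = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [caption.description] });
  await env.DEALER_INDEX.upsert([{
    id: vectorId(dealerId, unitId),
    values: emb.data[0],
    namespace: `dealer-${dealerId}`,
    metadata: { unitId, caption: caption.description },
  }]);
}
```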
For the dealer chat workload, the retrieval layer is the architecture decision that matters most. Cloudflare gives you two paths depending on what kind of data each dealer's knowledge base actually contains.
Drop dealer policies, FAQs, manuals, service docs (markdown / PDF / HTML) into an R2 bucket. AI Search auto-generates and maintains a Vectorize index per dealer. No embedding pipeline to build.
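The managed path might look like this from a Worker, assuming the AI Search (AutoRAG) binding on `env.AI`; the one-instance-per-dealer naming scheme is an assumption, not the only way to scope tenants.

```typescript
// Sketch: query a dealer's managed document index via AI Search.
interface AutoRag { aiSearch(opts: { query: string }): Promise<unknown> }
interface Ai { autorag(name: string): AutoRag }
interface Env { AI: Ai }

const ragInstanceFor = (dealerId: string) => `dealer-docs-${dealerId}`;

async function askDealerDocs(env: Env, dealerId: string, question: string) {
  // AI Search keeps the Vectorize index in sync with the dealer's R2 docs;
  // one call retrieves matching passages and generates a grounded answer.
  return env.AI.autorag(ragInstanceFor(dealerId)).aiSearch({ query: question });
}
```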
Live inventory feeds, transactional data, anything you want fine-grained control over. Embed with Workers AI, write to Vectorize per-dealer namespace, query with sub-100ms latency at the edge.
Every component mapped to a Cloudflare product. Zero origin servers required.
Each dealer's inventory, FAQs, and service docs embedded in a per-dealer namespace. Sub-100ms semantic retrieval at the edge. Includes a cost discussion: Vectorize vs. AWS-based vector DBs at platform scale.
Drop docs (markdown / PDF / HTML) into R2; AI Search auto-generates and maintains a Vectorize index. Per-tenant search, no embedding pipeline to build.
Caching, rate limits, fallback, and full audit logs across every dealer bot. Already in use internally; production beta targeted for end of year. Per-Worker fine-grained permissions in flight.
Llama, embeddings, vision models running in 330+ cities. Or proxy to OpenAI / Anthropic via AI Gateway — same control plane, same audit, swappable per use case.
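Proxying a third-party model through that same control plane is a URL change, not a rewrite. A sketch, with `ACCOUNT_ID`/`GATEWAY_ID` as placeholders for a real gateway:

```typescript
// Sketch: route an OpenAI chat call through AI Gateway so it gets
// the same caching, rate limits, and audit logs as Workers AI calls.
const GATEWAY = "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID";

function gatewayUrl(provider: "openai" | "anthropic", path: string): string {
  return `${GATEWAY}/${provider}${path}`;
}

async function chatViaGateway(apiKey: string, messages: object[]) {
  return fetch(gatewayUrl("openai", "/chat/completions"), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: "gpt-4o-mini", messages }),
  });
}
```

Swapping providers per use case means changing the provider segment and payload, while the audit trail stays in one place.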
For the dealer chat agent. Code Mode shrinks the tool-call context window and restricts the execute tool's outbound calls to only the specific tools intended. Same pattern Cloudflare uses internally.
One DO per active chat session. Strongly consistent, sticky to user, runs at the edge. Securely holds session tokens for the tool-call path.
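A minimal sketch of that session object. The class name, storage keys, and turn shape are assumptions; the point is that each session's history lives in one strongly consistent object near the user.

```typescript
// Sketch: one Durable Object per active chat session.
interface DOStorage {
  get<T>(key: string): Promise<T | undefined>;
  put(key: string, value: unknown): Promise<void>;
}
interface DOState { storage: DOStorage }

class ChatSession {
  constructor(private state: DOState) {}

  // Append a turn to session-local, strongly consistent history.
  async addTurn(role: "user" | "assistant", text: string) {
    const turns = (await this.state.storage.get<object[]>("turns")) ?? [];
    turns.push({ role, text, at: Date.now() });
    await this.state.storage.put("turns", turns);
  }
}

// One DO per session: derive the object's name from dealer + session ids.
function sessionName(dealerId: string, sessionId: string): string {
  return `${dealerId}:${sessionId}`;
}
```

The Worker resolves `sessionName(...)` to a DO id, so repeat messages from the same shopper always land on the same object.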
Per-dealer chat history, lead capture, bot config (tone, hours, escalation). Athena / Redshift stay as the analytical data lake — D1 holds the operational chat data only.
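The operational split might look like the following D1 sketch. Table and column names are assumptions; anything analytical still flows to Athena / Redshift downstream.

```typescript
// Sketch: operational chat data in D1 (schema is illustrative).
const SCHEMA = `
CREATE TABLE IF NOT EXISTS chat_messages (
  dealer_id  TEXT NOT NULL,
  session_id TEXT NOT NULL,
  role       TEXT NOT NULL,
  body       TEXT NOT NULL,
  created_at INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_dealer_session
  ON chat_messages (dealer_id, session_id);
`;

interface D1Database {
  prepare(sql: string): { bind(...values: unknown[]): { all(): Promise<unknown> } };
}

// Pull recent context for a session, scoped to the dealer.
function recentMessages(db: D1Database, dealerId: string, sessionId: string) {
  return db
    .prepare(
      "SELECT role, body FROM chat_messages WHERE dealer_id = ?1 AND session_id = ?2 ORDER BY created_at DESC LIMIT 20"
    )
    .bind(dealerId, sessionId)
    .all();
}
```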
When dealer chat logic needs to be customized per dealer (custom prompts, escalation rules, integrations), dispatch namespaces give isolated execution without per-dealer deployments. Optional, depending on customization needs.
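Dispatching to a customized dealer Worker might look like this. The binding name (`DEALER_DISPATCH`) and per-dealer Worker naming are assumptions; request/response types are elided to keep the sketch self-contained.

```typescript
// Sketch: route a chat request to a per-dealer Worker in a
// Workers for Platforms dispatch namespace.
interface WorkerStub { fetch(req: unknown): Promise<unknown> }
interface DispatchNamespace { get(name: string): WorkerStub }
interface Env { DEALER_DISPATCH: DispatchNamespace }

function dealerWorkerName(dealerId: string): string {
  return `dealer-bot-${dealerId}`;
}

async function routeToDealer(env: Env, dealerId: string, req: unknown) {
  // Isolated execution: one uploaded Worker per customized dealer,
  // with no redeploy of the core chat platform.
  return env.DEALER_DISPATCH.get(dealerWorkerName(dealerId)).fetch(req);
}
```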
Qualified chat leads pushed to dealer CRMs without blocking the bot response. Retries, DLQ, batching built in. Plays well with existing CRM webhook patterns.
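The non-blocking handoff might be sketched as follows; the queue binding, lead shape, and CRM URL pattern are assumptions.

```typescript
// Sketch: enqueue a qualified lead without blocking the bot's reply.
interface Queue<T> { send(body: T): Promise<void> }
interface Lead { dealerId: string; name: string; phone: string; interest: string }
interface Env { LEAD_QUEUE: Queue<Lead> }

async function captureLead(env: Env, lead: Lead) {
  // Returns immediately; delivery to the dealer CRM happens in the
  // queue consumer, with retries, batching, and a DLQ handled by Queues.
  await env.LEAD_QUEUE.send(lead);
}

// Consumer side: target the dealer's existing CRM webhook.
function crmWebhookUrl(base: string, dealerId: string): string {
  return `${base}/dealers/${dealerId}/leads`;
}
```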
The dealer chat workload sits at the intersection of three things LeadVenture already cares about: per-tenant retrieval, controlled LLM spend, and edge latency. Cloudflare's pieces compose for exactly this.
Per-dealer Vectorize namespace lookups served from the nearest of 330+ cities. Beats round-tripping a centralized vector DB.
AI Gateway caches identical FAQ requests, rate-limits abusive sessions, and audits every LLM call. Already in flight internally; same primitive applies to dealer chat.
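Per-request cache control is a header on the gateway call; a small sketch, with the header name per AI Gateway's request-header conventions and the TTL value an assumption:

```typescript
// Sketch: headers for an LLM call routed through AI Gateway,
// asking the gateway to cache identical FAQ completions.
function faqHeaders(apiKey: string, ttlSeconds = 3600): Record<string, string> {
  return {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
    // Serve repeat identical requests from the gateway cache for an hour.
    "cf-aig-cache-ttl": String(ttlSeconds),
  };
}
```

Rate limits and audit logging are configured on the gateway itself, so the calling Worker stays unchanged.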
If dealer KBs are mostly documents, AI Search auto-generates and maintains the Vectorize index from R2. Skip the ETL.
A scoped working session on Vectorize cost vs. your current AWS path, AI Search for per-tenant document retrieval, and Code Mode for the dealer chat agent. Concrete numbers, not architecture diagrams.
Built with Cloudflare Pages, Vectorize, AI Search, Workers AI, AI Gateway, Durable Objects, D1, Workers for Platforms, Queues, and R2.
Deployed globally to 330+ cities. AWS data lake (Athena, Redshift) unchanged.