Here's a common pattern in e-commerce feedback (and these are realistic for most websites, not edge cases):
Launch product → 10,000 customers buy → 10–20% leave reviews → 20–30% of those contain actionable insights.
For example, apparel tends to sit toward the higher end of review volume, consumer electronics toward the lower end, and SaaS higher on volume but uneven on quality.
Product decisions often rely on feedback from roughly 2–6% of the shopper base. The silent 94–98% remain largely unknown. Writing reviews requires effort, which naturally filters out customers without strong opinions.
What businesses see:
What users experience:
What if we could extract high-signal feedback effortlessly—driving product improvement, building platform trust, and making it easy for users to shape better products?
Instead of asking users to write reviews, we interview them with hyper-personalized MCQs. Not generic forms. Not hardcoded decision trees. Adaptive, context-aware conversations where each question is shaped by:
The core insight: MCQs collapse cognitive load while preserving signal. One-click responses can generate the same depth as written reviews—if the questions are smart enough.
The potential value:
Introducing Survey Sensei—a multi-agent system that implements this approach with four specialized agents:
The full workflow (end-to-end example):
```
Step 1 (Product Intelligence): Analyze 213 reviews of the purchased laptop
  → Battery: 67% complain "dies mid-afternoon"
  → Keyboard: 82% praise "excellent typing experience"
  → Performance: 45% mention "handles multitasking well"

Step 2 (User Intelligence): Pull purchase history for this customer
  → Bought 3 laptops in past 2 years (power user pattern)
  → Reviews 85% of purchases, critical but fair (3.6★ average)
  → Detail-oriented: past reviews averaged 120 words

Step 3 (Adaptive Question 1): "You've purchased 3 laptops recently. What drove this upgrade?"
  ○ Better performance
  ● Longer battery life        ← USER SELECTED
  ○ Lighter/more portable
  ○ Other: [text]

Step 4 (Adaptive Follow-up): "How long does the battery last on a typical workday?"
  ○ All day (12+ hours)
  ● 4-8 hours                  ← USER SELECTED
  ○ Less than 4 hours

Step 5 (Probing Deeper): "Does this meet your battery expectations?"
  ○ Exceeds expectations
  ● Falls slightly short       ← USER SELECTED
  ○ Major disappointment

[Agent continues for 10-12 total questions, probing keyboard quality, performance,
 portability based on this user's priorities...]

Step 6 (Review Synthesis): Convert MCQ selections → natural language
  "Upgraded hoping for better battery. Lasts 4-8 hours—falls short of all-day claims,
   but manageable for office work. Keyboard is outstanding for typing. Performance
   handles multitasking well."
```
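To make the data the agents pass around more concrete, here is a minimal sketch of how an answered MCQ and the running transcript might be represented. The class and field names are illustrative assumptions, not the repository's actual types:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MCQ:
    """One adaptive multiple-choice question shown to the user."""
    question: str
    options: List[str]
    selected: Optional[str] = None   # set once the user clicks an option

@dataclass
class SurveyTranscript:
    """Accumulated answers that Step 6 turns into a natural-language review."""
    product_id: str
    user_id: str
    answers: List[MCQ] = field(default_factory=list)

# The battery thread from Steps 3-5 above
transcript = SurveyTranscript(product_id="laptop-123", user_id="user-456")
transcript.answers.append(MCQ(
    question="You've purchased 3 laptops recently. What drove this upgrade?",
    options=["Better performance", "Longer battery life", "Lighter/more portable", "Other"],
    selected="Longer battery life",
))
```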
Contrast with generic tools:
The convergence of cheaper, more intelligent models with rapidly declining token costs has made AI-powered personalization economically viable at scale.
2020 (GPT-3 era):
2025 (GPT-4/GPT-5 era and evolving):
1. Per-user personalization:
2. Adaptive vs. static workflows:
3. Natural language synthesis:
4. Economic accessibility:
Before diving into the details, it's critical to understand how the project is structured. The diagram below shows the complete system architecture—from the UI layer through the orchestrator to the multi-agent framework, along with data pipelines and database schema:
Purpose: Development scaffolding—test the core system without production data.
MockDataOrchestrator creates semi-realistic e-commerce ecosystems:
In production: Skip this entirely and integrate with real e-commerce databases and pipelines.
Purpose: The actual product—adaptive survey generation + authentic review synthesis.
This is the heart of Survey Sensei. The system decomposes into four specialized agents, each with a focused responsibility:
Build a mental model of the product before generating questions.
Three-path adaptive logic:
Output schema:
```python
class ProductContext:
    key_features: List[str]
    major_concerns: List[str]
    pros: List[str]
    cons: List[str]
    common_use_cases: List[str]
    context_type: str
    confidence_score: float
```
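As a rough illustration of how a schema like this can be filled from raw review text, here is a hedged sketch using LangChain's structured-output helper with a Pydantic model. The prompt, model choice, and wiring are assumptions for illustration, not the repository's exact agent code:

```python
from typing import List
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class ProductContext(BaseModel):          # mirrors the output schema above
    key_features: List[str]
    major_concerns: List[str]
    pros: List[str]
    cons: List[str]
    common_use_cases: List[str]
    context_type: str
    confidence_score: float

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
extractor = llm.with_structured_output(ProductContext)   # returns a ProductContext instance

reviews_blob = "\n".join([
    "Battery dies mid-afternoon, really frustrating.",
    "Excellent typing experience, best keyboard I've used.",
])
product_context = extractor.invoke(
    f"Summarize these product reviews into the requested fields:\n{reviews_blob}"
)
print(product_context.major_concerns)
```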
Build behavioral profiles to personalize question depth and tone.
Three-path adaptive logic:
Output schema:
```python
class CustomerContext:
    purchase_patterns: List[str]
    review_behavior: List[str]
    product_preferences: List[str]
    primary_concerns: List[str]
    expectations: List[str]
    pain_points: List[str]
    engagement_level: str        # highly_engaged | moderately_engaged | passive_buyer | new_user
    sentiment_tendency: str      # positive | critical | balanced | polarized | neutral
    review_engagement_rate: float
    confidence_score: float
```
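For the quantitative fields, a simple pre-computation step can run before the LLM fills in the qualitative ones. The thresholds below are illustrative assumptions, not values taken from the codebase:

```python
def classify_engagement(total_purchases: int, total_reviews: int) -> tuple[str, float]:
    """Map raw counts to an engagement_level label and a review_engagement_rate."""
    rate = total_reviews / total_purchases if total_purchases else 0.0
    if total_purchases == 0:
        level = "new_user"
    elif rate >= 0.6:
        level = "highly_engaged"
    elif rate >= 0.25:
        level = "moderately_engaged"
    else:
        level = "passive_buyer"
    return level, round(rate, 3)

# The power user from the walkthrough reviews ~85% of purchases
print(classify_engagement(total_purchases=20, total_reviews=17))   # ('highly_engaged', 0.85)
```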
Personalization:
Conducts adaptive surveys where questions evolve based on answers. Uses LangGraph StateGraph for conversation state.
Performance optimization: Survey state is cached in-memory during the survey (no database writes on every answer). State is only persisted to the database at two points:
- At survey start: the product_context and customer_context JSONB columns
- At survey completion: questions_and_answers JSONB, plus the complete state to session_context JSONB

All intermediate answers are logged asynchronously to the survey_details table for analytics (fire-and-forget, non-blocking).
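A minimal sketch of that persistence strategy, assuming an in-process dict as the cache and a generic async database client (db.update / db.insert are hypothetical helpers, not Supabase SDK calls):

```python
import asyncio
from typing import Any, Dict

_session_cache: Dict[str, Dict[str, Any]] = {}   # survey state lives here between answers

async def start_survey(db, session_id: str, product_context: dict, customer_context: dict) -> None:
    # Persistence point 1: write the two agent contexts once, at survey start.
    await db.update("survey_sessions", session_id,
                    {"product_context": product_context, "customer_context": customer_context})
    _session_cache[session_id] = {"qa": [], "state": {}}

async def record_answer(db, session_id: str, question: str, answer: str) -> None:
    # No database write on the hot path: only the in-memory cache is touched...
    _session_cache[session_id]["qa"].append({"q": question, "a": answer})
    # ...plus a fire-and-forget analytics event to survey_details (not awaited).
    asyncio.create_task(db.insert("survey_details", {
        "session_id": session_id,
        "event_type": "answer",
        "event_detail": {"q": question, "a": answer},
    }))

async def complete_survey(db, session_id: str) -> None:
    # Persistence point 2: flush the full Q&A and conversation state at completion.
    cached = _session_cache.pop(session_id)
    await db.update("survey_sessions", session_id,
                    {"questions_and_answers": cached["qa"],
                     "session_context": cached["state"],
                     "status": "completed"})
```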
The interview flow:
```
┌─ Survey Start ─────────────────────────────────────────────┐
│                                                            │
│  1. Fetch contexts in parallel:                            │
│     ├─ ProductContextAgent → What to ask about             │
│     └─ CustomerContextAgent → How to ask it                │
│                                                            │
│  2. Generate initial MCQs (3 questions baseline)           │
│                                                            │
│  3. Stateful conversation loop:                            │
│     ┌───────────────────────────────────────────┐          │
│     │ Present MCQ                               │          │
│     │   ↓                                       │          │
│     │ Wait for user selection                   │          │
│     │   ↓                                       │          │
│     │ Process answer → Update internal state    │          │
│     │   ↓                                       │          │
│     │ Route decision:                           │          │
│     │   ├─ Need follow-up? → Generate adaptive  │          │
│     │   ├─ Move to next topic? → Next question  │          │
│     │   └─ Survey complete? → Save & exit       │          │
│     └───────────────────────────────────────────┘          │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
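In LangGraph terms, that loop can be wired roughly as below. Node names, the state fields, and the routing logic are assumptions for illustration; the LLM calls and the pause for user input are omitted:

```python
from typing import List, TypedDict
from langgraph.graph import END, StateGraph

class SurveyState(TypedDict):
    qa: List[dict]           # answered questions so far
    pending_question: dict   # MCQ currently shown to the user
    done: bool

def present_mcq(state: SurveyState) -> SurveyState:
    # Generate the next question from product/customer context + answers so far.
    return state

def process_answer(state: SurveyState) -> SurveyState:
    # Fold the user's selection into the accumulated Q&A and internal notes.
    return state

def route(state: SurveyState) -> str:
    # Mirrors the "Route decision" box: follow up / next topic / finish.
    return "finish" if state["done"] else "ask"

graph = StateGraph(SurveyState)
graph.add_node("ask", present_mcq)
graph.add_node("process", process_answer)
graph.set_entry_point("ask")
graph.add_edge("ask", "process")
graph.add_conditional_edges("process", route, {"ask": "ask", "finish": END})
survey_graph = graph.compile()
```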
Survey completion rules:
```yaml
initial_questions_count: 3     # Start with 3 baseline MCQs
min_answered_questions: 10     # User must answer ≥10
max_answered_questions: 15     # Hard stop at 15
max_survey_questions: 20       # Total questions asked
max_consecutive_skips: 3       # 3 consecutive skips → must answer to continue
```
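A hedged sketch of how these limits might be enforced inside the routing step (the helper names are illustrative):

```python
LIMITS = {
    "initial_questions_count": 3,
    "min_answered_questions": 10,
    "max_answered_questions": 15,
    "max_survey_questions": 20,
    "max_consecutive_skips": 3,
}

def must_finish(answered: int, asked: int) -> bool:
    """Hard stops: 15 answered or 20 asked."""
    return (answered >= LIMITS["max_answered_questions"]
            or asked >= LIMITS["max_survey_questions"])

def can_finish(answered: int) -> bool:
    """The agent may only wrap up once at least 10 questions are answered."""
    return answered >= LIMITS["min_answered_questions"]

def must_answer_next(consecutive_skips: int) -> bool:
    """After 3 consecutive skips the user must answer to continue."""
    return consecutive_skips >= LIMITS["max_consecutive_skips"]
```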
Adaptive questioning example:
```
Question 5: "How long does the battery last on a typical workday?"
  ● Less than 4 hours          ← USER SELECTED

  [Agent's internal state update:
   - Battery performance: Below average
   - Action: Generate follow-up to quantify impact]

Follow-up Question 6: "When does the battery typically die?"
  ● Mid-afternoon (2-4pm)      ← USER SELECTED

  [Agent's internal state update:
   - Specific pain point: Dies at 2-4pm (work hours)
   - Severity: High (impacts productivity)
   - Action: Probe importance for review weighting]

Follow-up Question 7: "How important is longer battery life to you?"
  ● Very important - major inconvenience        ← USER SELECTED
```
Why adaptive matters: Without adaptive AI, you'd build a rigid decision tree: "Battery life: Excellent | Good | Fair | Poor". That tells you what they think—but misses the critical detail: "Dies at 2pm during work hours, and it's a major inconvenience." The AI doesn't just branch—it regenerates the next question based on evolving context.
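One way to implement that regeneration is to feed the running transcript back into the model on every turn. The prompt below is an illustrative assumption, not the project's actual prompt:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)

def next_question(product_context: dict, qa_so_far: list[dict]) -> str:
    """Regenerate (not merely branch to) the next MCQ from the evolving context."""
    transcript = "\n".join(f"Q: {x['q']}\nA: {x['a']}" for x in qa_so_far)
    prompt = (
        "You are interviewing a customer about a product they bought.\n"
        f"Known product concerns: {product_context.get('major_concerns')}\n"
        f"Conversation so far:\n{transcript}\n\n"
        "Write the single most useful next multiple-choice question "
        "(question text plus 3-4 options), probing the most recent answer."
    )
    return llm.invoke(prompt).content
```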
Convert MCQ selections into natural-language reviews that match the user's writing style.
Three-stage synthesis:
The synthesized sentiment falls into one of three bands (good | okay | bad), which then maps to candidate star ratings:

```python
def _get_star_ratings(sentiment_band: str) -> List[int]:
    if sentiment_band == "good":
        return [5, 4]
    elif sentiment_band == "okay":
        return [4, 3, 2]
    else:  # bad
        return [2, 1]
```
Example output (sentiment: "okay", user: concise + critical):
[4-star] "Solid build quality and excellent screen. Battery dies around 3pm—acceptable for office use where I have charging access. Keyboard is comfortable for long typing. Performance handles multitasking well. Worth it on sale."

[3-star] "Mixed feelings. Build quality and screen are great, but battery is the main letdown—dies at 3pm despite 'all-day' claims. Keyboard is excellent. If battery isn't a dealbreaker, it's decent."

[2-star] "Disappointed with battery life. Product page advertised all-day battery, but it dies by 3pm daily with moderate use. Screen and keyboard are good, but battery is a major problem for anyone working away from chargers."
User picks framing, edits if needed, submits. 2 minutes of MCQ clicks → rich, authentic review.
Backend: FastAPI (Python 3.11), LangChain + LangGraph, OpenAI GPT-4o-mini ($0.15/1M tokens), Pydantic, Supabase/PostgreSQL + pgvector
Frontend: Next.js 14, TypeScript, Tailwind CSS, Supabase Client
AI/ML: OpenAI embeddings (1536-dim), batch generation (100 texts in 2-3s), IVFFlat indexes (2-3% recall loss for 100x speed)
```sql
-- 1. PRODUCTS: Catalog with semantic embeddings
products (
  item_id VARCHAR(20) PRIMARY KEY,
  title, brand, description, price, star_rating, num_ratings,
  review_count INTEGER,
  embeddings vector(1536),            -- Semantic search
  is_mock BOOLEAN
)

-- 2. USERS: Behavioral profiles
users (
  user_id UUID PRIMARY KEY,
  user_name, email_id, age, gender, base_location,
  embeddings vector(1536),
  total_purchases INTEGER,
  total_reviews INTEGER,
  review_engagement_rate DECIMAL(4,3),
  avg_review_rating DECIMAL(3,2),
  sentiment_tendency VARCHAR(20),
  engagement_level VARCHAR(30),
  is_main_user BOOLEAN
)

-- 3. TRANSACTIONS: Purchase history
transactions (
  transaction_id UUID PRIMARY KEY,
  item_id → products,
  user_id → users,
  order_date, delivery_date,
  original_price, retail_price,
  transaction_status
)

-- 4. REVIEWS: Multi-source feedback
reviews (
  review_id UUID PRIMARY KEY,
  item_id → products,
  user_id → users,
  transaction_id → transactions,
  review_title, review_text, review_stars,
  source VARCHAR(20),                 -- 'rapidapi' | 'agent_generated' | 'user_survey'
  embeddings vector(1536)
)

-- 5. SURVEY_SESSIONS: Stateful survey orchestration
survey_sessions (
  session_id UUID PRIMARY KEY,
  user_id, item_id, transaction_id,
  product_context JSONB,              -- Agent 1 output
  customer_context JSONB,             -- Agent 2 output
  session_context JSONB,              -- LangGraph state
  questions_and_answers JSONB,
  review_options JSONB,
  status VARCHAR(20)
)

-- 6. SURVEY_DETAILS: Event log
survey_details (
  detail_id UUID PRIMARY KEY,
  session_id → survey_sessions,
  event_type VARCHAR(50),
  event_detail JSONB,
  created_at TIMESTAMP
)
```
Design decisions:
- The reviews.source column tracks provenance: rapidapi (real) | agent_generated (mock) | user_survey (golden path)
- survey_details logs every interaction for debugging
- All text → 1536-dim embeddings via text-embedding-3-small
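A hedged sketch of how those embeddings might be generated in batches with the OpenAI SDK (the helper name and batch size are assumptions; the model and dimensionality match the stack above):

```python
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

def embed_texts(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    """Embed texts in batches with text-embedding-3-small (1536-dim vectors)."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        resp = client.embeddings.create(
            model="text-embedding-3-small",
            input=texts[i:i + batch_size],
        )
        vectors.extend(item.embedding for item in resp.data)
    return vectors

# Product descriptions, user profiles, and review texts all go through the same path
vecs = embed_texts(["Noise-canceling Bluetooth headphones", "Wireless earbuds with ANC"])
print(len(vecs[0]))   # 1536
```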
Find similar products:
```sql
SELECT item_id,
       title,
       1 - (embeddings <=> query_embedding) AS similarity
FROM products
WHERE 1 - (embeddings <=> query_embedding) > 0.7
ORDER BY similarity DESC
LIMIT 5;
```
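From Python, the same query can be run over a direct Postgres connection with psycopg and the pgvector adapter. This is a sketch under that assumption (the DSN is a placeholder); in practice Supabase can also expose it as an RPC function:

```python
import numpy as np
import psycopg
from openai import OpenAI
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://user:password@localhost:5432/postgres")  # placeholder DSN
register_vector(conn)   # lets psycopg send/receive vector(1536) values

client = OpenAI()
query_embedding = np.array(
    client.embeddings.create(
        model="text-embedding-3-small",
        input=["lightweight laptop with long battery life"],
    ).data[0].embedding
)

rows = conn.execute(
    """
    SELECT item_id, title, 1 - (embeddings <=> %s) AS similarity
    FROM products
    WHERE 1 - (embeddings <=> %s) > 0.7
    ORDER BY similarity DESC
    LIMIT 5
    """,
    (query_embedding, query_embedding),
).fetchall()

for item_id, title, similarity in rows:
    print(f"{similarity:.2f}  {title}")
```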
Why vectors beat traditional categories:
Traditional hierarchies (Electronics → Audio → Headphones → Wireless) are rigid. Vector embeddings cluster products by intent and use case:
Vector embeddings naturally cluster by intent rather than superficial attributes. "Noise-canceling Bluetooth headphones" is closer to "wireless earbuds with ANC" than to "studio monitor headphones"—even though all three are technically "headphones."
Performance benchmarks:
The API provides six endpoints that cover the end-to-end survey workflow:
Error handling and edge cases:
1. Session expiration:
2. Idempotency:
3. Pydantic validation:
The submitted answer must be one of the provided options, not arbitrary text.
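For example, a request model can enforce that constraint with a Pydantic v2 model validator; FastAPI then rejects invalid submissions with a 422 automatically. The field names here are illustrative, not necessarily the API's exact schema:

```python
from typing import List
from pydantic import BaseModel, model_validator

class AnswerRequest(BaseModel):
    session_id: str
    question_id: str
    options: List[str]   # the options that were presented for this question
    answer: str

    @model_validator(mode="after")
    def answer_must_be_an_option(self) -> "AnswerRequest":
        if self.answer not in self.options:
            raise ValueError(f"answer must be one of {self.options}, got {self.answer!r}")
        return self
```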
Source: github.com/arnavvj/survey-sensei
Prerequisites: Python 3.11+, Node.js 18+, Supabase account (free), OpenAI API key (~$5)
1. Clone the repo and create a Python environment:
```bash
git clone https://github.com/arnavvj/survey-sensei.git
cd survey-sensei/backend
conda env create -f environment.yml   # Installs all deps (FastAPI, LangChain, etc.)
conda activate survey-sensei
```
2. Configure environment variables:
cp .env.local.example .env.local
Edit .env.local with your credentials:
```bash
OPENAI_API_KEY=sk-proj-...                         # From platform.openai.com
SUPABASE_URL=https://xxxxx.supabase.co             # From Supabase dashboard
SUPABASE_SERVICE_ROLE_KEY=eyJhbGciOiJIUzI1NiIs...  # From Supabase Settings → API
RAPID_API_KEY=your_rapidapi_key                    # Optional: from rapidapi.com
```
3. Initialize database:
```bash
python database/init/apply_migrations.py   # Applies migrations
# Execute the SQL from `backend/database/_combined_migrations.sql` in your Supabase project
```
4. Start the backend:
uvicorn main:app --reload --port 8000
1. Navigate to frontend:
cd survey-sensei/frontend
2. Configure environment variables:
cp .env.local.example .env.local
Edit .env.local:
```bash
NEXT_PUBLIC_SUPABASE_URL=https://xxxxx.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJhbGciOiJIUzI1NiIs...
OPENAI_API_KEY=sk-proj-...
```
3. Install dependencies and start dev server:
```bash
npm install
npm run dev   # Open http://localhost:3000
```
Enter an Amazon product URL (must include ASIN) and generate mock data (takes 3-4 minutes).
The MockDataOrchestrator builds a realistic e-commerce simulation in your Supabase project. For example:
| Entity | Count | Composition | Purpose |
|----|----|----|----|
| Products | 11 | 1 real (RapidAPI) + 6 similar (LLM) + 4 diverse (LLM) | Market context for ProductContextAgent |
| Users | 13-25 | 1 main user + 12-24 mock personas (varied ages, locations, purchase patterns) | Behavioral diversity for CustomerContextAgent |
| Reviews | 30-100+ | 10-15 real (RapidAPI) + 20-85 LLM-generated (70% positive, 20% neutral, 10% negative) | Signal for ProductContextAgent analysis |
| Transactions | 80-170+ | Each review → 1 transaction; additional no-review purchases (40% sparsity); 1 "current" delivery (triggers survey) | Realistic purchase patterns |
| Embeddings | 200-300 | All entities → 1536-dim vectors (batch parallel via text-embedding-3-small) | Semantic similarity search |
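As one small example of how that composition might be produced, drawing the mock-review sentiment mix (70% positive / 20% neutral / 10% negative) is a one-liner; the function name and seeding are hypothetical:

```python
import random

def sample_review_sentiments(n: int, seed: int | None = None) -> list[str]:
    """Draw n mock-review sentiments with the 70/20/10 split used for LLM-generated reviews."""
    rng = random.Random(seed)
    return rng.choices(["positive", "neutral", "negative"], weights=[0.7, 0.2, 0.1], k=n)

print(sample_review_sentiments(10, seed=42))
```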
Click "Start Survey" and wait 3-5 seconds.
What's happening behind the scenes:
Answer 10-12 adaptive MCQ questions. Each response triggers follow-up questions that probe deeper into your concerns (e.g., "battery life" → "how long does it last?" → "does this meet expectations?").
ReviewGenAgent synthesizes your MCQ responses into 3 natural language review variations (different star ratings, same sentiment). Pick one, optionally edit, and submit.
Note: These are projections based on industry benchmarks and reasonable assumptions. Actual results will vary significantly based on implementation, industry vertical, and user behavior. The scenarios below illustrate potential impact, not guaranteed outcomes.
Baseline (Traditional Reviews):
With Survey Sensei (projected):
Potential financial impact:
ROI (assuming full impact realization):
Potential business model:
Projected unit economics:
Customer acquisition (estimated):
Month 0: Current MVP State
Month 1: Production Hardening + Initial Testing
Infrastructure improvements:
Batch data pipelines:
Early validation:
Month 2: Platform Integrations (If Early Metrics Look Promising)
Service layer architecture:
```python
# Embedded API integration example
import requests
from flask import Flask, request

app = Flask(__name__)

@app.route('/webhooks/order_delivered', methods=['POST'])
def handle_order_delivered():
    order_data = request.get_json()
    # Ask Survey Sensei to generate a personalized survey for this delivery
    response = requests.post('https://api.surveysensei.io/v1/surveys/generate', json={
        'transaction_id': order_data['id'],
        'user_id': order_data['customer_id'],
        'product_id': order_data['product_id'],
        'user_context': {...},
        'product_context': {...}
    })
    survey_url = response.json()['survey_url']
    # Deliver the survey link to the customer (send_email provided by the host platform)
    send_email(to=order_data['customer']['email'], body=survey_url)
    return '', 204
```
Initial connectors:
Early data patterns (if scale permits):
Month 3: Analytics Layer + Scale Testing
Basic intelligence features:
Scale validation:
Market positioning refinement:
Month 4+: Iterative Improvement
Realistic expectations:
Competitive considerations:
What needs ongoing work:
Survey Sensei demonstrates a practical path toward better customer feedback by combining modern AI capabilities with structured data collection:
What we built:
Improvements over traditional reviews:
What needs validation:
The system works today. Clone the repo, run the setup, and test a survey in 10 minutes. The architecture shows how multi-agent patterns handle complex, context-dependent workflows—not just for surveys, but for any system requiring personalization at scale.
If you're building customer feedback systems, recommendation engines, or personalization tools, this architecture offers a concrete reference implementation. The pattern (context gathering → adaptive decision-making → personalized output) generalizes well:
The shift toward specialized agents collaborating on tasks represents a practical middle ground between monolithic models and over-engineered microservices. It's early, but the economics and technical patterns are sound enough to build on.
Questions? Ideas? Feedback?
:::tip Unless marked as an illustrative sketch, code examples and architecture diagrams are from the Survey Sensei codebase.
:::


