Building an Intelligent Intent Classification System: From Pattern Matching to Self-Learning AI
TL;DR
We rebuilt Cereby's intent classification system from the ground up, transforming it from simple pattern matching into an intelligent, self-learning system. The new system:
- Uses multi-factor confidence scoring to know when it's uncertain
- Learns from mistakes through automatic feedback loops
- Employs semantic similarity to understand concepts beyond keywords
- Handles complex action sequences with dependency resolution
- Personalizes to each user's preferences and patterns
The Problem: When Simple Pattern Matching Isn't Enough
When we launched Cereby, our AI learning assistant, we started with a straightforward approach to intent classification based on simple keyword matching. This worked... until it didn't.
Real User Frustrations
Scenario 1: Ambiguity
A user asks about "cells". That could mean biology (cell structure), chemistry (electrochemical cells), or computer science (cellular automata). Simple keyword matching couldn't distinguish them.

Scenario 2: Complex Requests
User: "Analyze my weak points in physics then create a quiz on them"
System: Only analyzes weak points, ignoring the quiz part.

Scenario 3: Context Blindness
User: "Explain photosynthesis"
System: Explains the concept.
User: "Quiz me on it"
System: "Quiz on what? Please specify a topic."

These failures had a real cost:
- 23% of requests required clarification or correction
- Users had to rephrase 1 in 4 requests
- 15% abandonment rate when things went wrong
We needed a fundamental rethink.
The Vision: Intent Classification as a Learning System
Rather than incrementally patch our keyword matcher, we asked: "What if intent classification could be as intelligent as the AI it powers?"
Our goals:
- Know when it's uncertain (confidence scoring)
- Learn from mistakes (feedback loops)
- Understand semantics (embeddings, not just keywords)
- Handle complexity (multi-intent, dependencies)
- Personalize (remember user preferences)
- Converse naturally (multi-turn dialogue)
This became the Enhanced Intent Classification System.
Architecture: Building Intelligence in Layers
Layer 1: Multi-Factor Confidence Scoring
Instead of binary "matched/not matched", every classification gets a confidence score based on:
- Pattern Match — Keyword strength (how well query keywords match)
- Model Confidence — AI certainty from probability scores
- Historical Accuracy — Past success rate for similar requests
- Parameter Completeness — Have all required parameters?
- Context Alignment — Fits user's recent activity?
When confidence drops below 0.75, we trigger clarification instead of proceeding with uncertain classification.
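To make the idea concrete, here is a minimal sketch of multi-factor confidence scoring. The factor names mirror the list above, but the weights and the `overall_confidence` helper are illustrative assumptions, not Cereby's production values:

```python
# Illustrative sketch: weighted combination of per-factor scores.
# Weights are hypothetical; only the 0.75 threshold comes from the text.
FACTOR_WEIGHTS = {
    "pattern_match": 0.25,
    "model_confidence": 0.30,
    "historical_accuracy": 0.20,
    "parameter_completeness": 0.15,
    "context_alignment": 0.10,
}

CLARIFICATION_THRESHOLD = 0.75


def overall_confidence(factors: dict) -> float:
    """Weighted average of per-factor scores, each in [0, 1]."""
    return sum(FACTOR_WEIGHTS[name] * factors.get(name, 0.0)
               for name in FACTOR_WEIGHTS)


def needs_clarification(factors: dict) -> bool:
    """Trigger the clarification flow instead of acting on a shaky guess."""
    return overall_confidence(factors) < CLARIFICATION_THRESHOLD
```

A classification with uniformly middling scores (all 0.5) falls below the threshold and routes to clarification; a confident one (all 0.9) proceeds.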
Impact: Reduced misclassifications from 23% to 5%.
Layer 2: Automatic Learning from Feedback
The system tracks three types of feedback:
- Explicit corrections: "No, I wanted X not Y"
- Implicit corrections: User changes action during confirmation
- Cancellations: User abandons low-confidence classifications
The Pattern Learner analyzes feedback to improve future classifications:
- Identifies common misclassifications
- Learns user-specific vocabulary
- Enhances prompts with learned patterns
- Adapts to individual user preferences
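A bare-bones version of the Pattern Learner's bookkeeping might look like the following. This is a hypothetical in-memory sketch (the class name and `min_count` cutoff are assumptions); a real system would persist the tallies per user:

```python
from collections import Counter


class PatternLearner:
    """Tallies corrections so frequent mistakes can be fed back into
    the classification prompt. Illustrative sketch, not production code."""

    def __init__(self):
        # (predicted_intent, corrected_intent) -> how often it happened
        self.corrections = Counter()

    def record_correction(self, predicted: str, corrected: str) -> None:
        self.corrections[(predicted, corrected)] += 1

    def common_misclassifications(self, min_count: int = 3):
        """Pairs that recur often enough to act on."""
        return [(pair, n) for pair, n in self.corrections.most_common()
                if n >= min_count]

    def prompt_hints(self) -> str:
        """Turn learned patterns into lines appended to the classifier prompt."""
        return "\n".join(
            f'- Queries classified as "{p}" were often meant as "{c}"'
            for (p, c), _ in self.common_misclassifications()
        )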
Layer 3: Semantic Understanding with Embeddings
Keywords fail for ambiguous terms. Solution: semantic similarity.
Instead of exact keyword matching, we:
- Generate embeddings for topics
- Generate embeddings for query
- Calculate semantic similarity
- Find most similar subject/topic
This solves the ambiguity problem:
- "cells" + context → Chemistry or Biology (semantic disambiguation)
- "quantum entanglement" → Physics (even though not in keyword map)
- "derivative rules" → Mathematics (semantic similarity to "calculus")
Layer 4: Dependency Resolution for Complex Requests
Users don't think in single actions. They chain them:
"Analyze my weak points in physics then create a quiz on them"
This requires:
- Detecting the sequence ("then" indicates dependency)
- Executing in order
- Passing data between actions

The dependency resolver:
- Parses chained intents
- Builds execution graph (directed acyclic graph)
- Identifies dependencies
- Determines parallelism opportunities
- Executes in optimal order
User: "Explain photosynthesis, create notes, and quiz me"
Execution Graph:
Batch 1: Explain (must complete first)
Batch 2: Notes + Quiz (can run in parallel)
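The batching above falls out of a standard topological sort over the dependency graph, grouping all currently-unblocked actions into one parallel batch. A minimal sketch (action names are from the example; the function itself is illustrative):

```python
def execution_batches(deps: dict) -> list:
    """Group actions into batches: each batch depends only on earlier
    batches, so actions within a batch can run in parallel.
    deps maps action -> set of actions it depends on."""
    remaining = {action: set(d) for action, d in deps.items()}
    batches = []
    while remaining:
        # Everything with no unmet dependencies is ready now.
        ready = sorted(a for a, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cycle detected in action dependencies")
        batches.append(ready)
        for action in ready:
            del remaining[action]
        for d in remaining.values():
            d.difference_update(ready)
    return batches


# "Explain photosynthesis, create notes, and quiz me"
deps = {"explain": set(), "notes": {"explain"}, "quiz": {"explain"}}
# execution_batches(deps) -> [["explain"], ["notes", "quiz"]]
```

The cycle check matters in practice: a malformed chained request should fail fast rather than deadlock the executor.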
Impact: Handles 3x more complex requests without requiring users to break them down.
Layer 5: User Preference Learning
Every user has preferences. Why ask every time?
The strategy:
- Track successful action completions
- Store preferred parameters (quiz length, difficulty, etc.)
- Auto-fill missing parameters from learned preferences
- Override with user-provided params when present
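The override order described above reduces to a simple layered merge. A sketch (parameter names are illustrative):

```python
def resolve_parameters(user_params: dict, learned_prefs: dict,
                       defaults: dict) -> dict:
    """Explicit user params win, then learned preferences, then defaults.
    Later dict unpacking overrides earlier keys."""
    return {**defaults, **learned_prefs, **user_params}


# A user who always makes 15-question intermediate quizzes:
learned = {"quiz_length": 15, "difficulty": "intermediate"}
defaults = {"quiz_length": 10, "difficulty": "easy"}
# resolve_parameters({}, learned, defaults) uses the learned values;
# an explicit {"quiz_length": 5} in the request overrides them.
```

Keeping the merge in one place also makes the precedence rule easy to audit when a user asks "why did it pick that?"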
Layer 6: Natural Conversational Dialogue
When parameters are missing, ask naturally with contextual suggestions based on user's recent activity and available options.
Implementation: Key Technical Decisions
1. Singleton Pattern for All Services
Every component uses the singleton pattern for efficiency - avoid recreating expensive resources (AI clients, database connections, embedding caches).
2. Permanent Embedding Cache
Embeddings never change, so cache forever with no expiration.
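A minimal version of such a cache, keyed on normalized text. The `embed_fn` callable is a stand-in for the real embedding API call, and the hit/miss counters are illustrative additions:

```python
class EmbeddingCache:
    """Permanent in-memory cache: embeddings are deterministic for a
    given model, so entries never expire. Sketch only; the production
    system would back this with persistent storage."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # assumed: text -> vector
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, text: str):
        key = text.strip().lower()  # normalize so trivial variants share entries
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._embed_fn(key)
        return self._cache[key]
```

Normalizing the key before lookup is what pushes the hit rate up: "Photosynthesis" and "photosynthesis " resolve to one cached entry.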
Impact: 95% cache hit rate after first week, saving significant API costs.
3. Batch Processing for Embeddings
Don't generate embeddings one at a time - batch them together for parallel processing.
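Sketched below, assuming an embedding endpoint that accepts a list of texts and returns vectors in the same order (most embedding APIs do; `embed_api` and the batch size of 10 are illustrative):

```python
def embed_batch(texts, embed_api, batch_size=10):
    """One API request per batch of texts instead of one per text.
    embed_api is an assumed callable: list[str] -> list[vector],
    preserving input order."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(embed_api(texts[i:i + batch_size]))
    return vectors
```

For 25 topics this issues 3 requests instead of 25, which is where the latency win comes from.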
Impact: 10x reduction in API latency (1 request for 10 embeddings vs. 10 serial requests).
4. Database Views for Analytics
Pre-compute expensive analytics with database views for real-time dashboards without expensive aggregations.
5. Graceful Degradation
If embeddings fail, fall back to keywords with clear indication of reduced confidence.
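The fallback path can be expressed as a thin wrapper. This sketch is illustrative (the classifier callables and the 0.2 confidence penalty are assumptions, not Cereby's actual values):

```python
def classify(query, semantic_classifier, keyword_classifier):
    """Try semantic classification first; if the embedding path fails,
    fall back to keywords and flag the result as degraded so downstream
    confidence scoring is reduced accordingly."""
    try:
        return {"intent": semantic_classifier(query), "degraded": False}
    except Exception:
        return {
            "intent": keyword_classifier(query),
            "degraded": True,
            "confidence_penalty": 0.2,  # hypothetical penalty value
        }
```

Surfacing the `degraded` flag, rather than silently falling back, is what lets the confidence layer lower its score and ask for clarification more readily.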
Results: By the Numbers
Before vs. After
| Metric | Before | After | Change |
|---|---|---|---|
| Classification Accuracy | 77% | 95% | +18 pts |
| Clarification Rate | 23% | 7% | -70% |
| User Corrections | 15% | 3% | -80% |
| Avg Confidence Score | N/A | 0.87 | New |
| Complex Request Success | 45% | 89% | +98% |
| Cost per Classification | $0.003 | $0.001 | -67% |
Real User Impact
Sarah, Graduate Student:
"Before, I'd have to rephrase my requests 2-3 times. Now Cereby understands exactly what I want, even with complex requests. It feels like talking to a smart human tutor."

James, High School Teacher:
"The system learns my preferences. I always want 15-question quizzes at intermediate difficulty. Now it just knows. Saves me time per quiz creation."
Business Impact
- 23% increase in daily active users (better UX = more engagement)
- Significant cost reduction (batching, caching, fewer retries)
- Improved ratings across platforms
- 31% reduction in support tickets related to "AI doesn't understand me"
Lessons Learned
1. Confidence Scores Changed Everything
A system that knows its own uncertainty is transformative: it asks when unsure rather than proceeding with a wrong classification.
2. User-Specific Learning is Surprisingly Powerful
Generic AI is good. Personalized AI is magic. Users develop their own vocabulary, and learning these patterns significantly reduces errors.
3. Semantic Similarity > Keyword Expansion
Switching to semantic similarity with embeddings outperformed extensive keyword dictionaries because it handles:
- Synonyms automatically
- Multi-word concepts
- Domain-specific terminology
- Typos (embeddings are typo-tolerant)
4. The Long Tail Matters
80% of requests matched 20 patterns. But the remaining 20% of "weird" requests caused 60% of user frustration. Handling edge cases required conversation history context, ambiguity detection, and natural clarification dialogues.
5. Async > Sync, But Don't Overcomplicate
Users love responsiveness, but don't over-engineer. Simple instant feedback while processing in background provides most of the value.
6. Analytics Drove Prioritization
Real-time dashboard showing which action types have lowest confidence, which users need most clarification, and common misclassification patterns enabled data-driven prioritization of improvements.
Looking Ahead: What's Next
Phase 2 Features (Q1 2026)
1. Cross-User Learning: share patterns across users while preserving privacy through federated learning.
2. Multimodal Intent Classification: classify from voice input, screen context, and time-of-day patterns.
3. Proactive Suggestions: don't wait for user requests; suggest actions based on context.
4. Intent Synthesis: generate new actions on the fly for novel requests.

Conclusion
Building an intelligent intent classification system taught us that AI isn't just about accuracy—it's about knowing your limitations.
The key insights:
- Confidence scoring lets the system know when to ask for help
- Feedback loops turn every mistake into learning
- Semantic understanding beats keyword dictionaries
- User personalization compounds value over time
- Natural dialogue makes failures recoverable
The result: An AI assistant that feels genuinely intelligent, learns from experience, and gets better the more you use it.
Most importantly, it's production-ready and maintainable. The modular architecture means we can improve individual components without touching others. The analytics tell us exactly where to optimize next. The learning loops mean it continuously improves.
If you're building AI-powered features, invest in the intent classification layer. It's the foundation everything else builds on.
Resources
Demo: Try Cereby and see the system in action.
Questions? Reach out to our engineering team: engineering@cereby.ai
Tags: #AI #IntentClassification #MachineLearning #NLP #Embeddings #ProductEngineering #EdTech
Want to work on problems like this? We're hiring ML Engineers, Backend Engineers, and Product Engineers.
Visual Summary
```mermaid
flowchart TD
A[User Message] --> B[Embedding + Keyword Extraction]
B --> C[Candidate Intent Retrieval]
C --> D[Classifier Decision]
D --> E{Confidence High?}
E -->|Yes| F[Execute Intent]
E -->|No| G[Clarification Flow]
F --> H[Tool Action + Response]
```