Building Cereby AI: A Vertical AI Architecture for Personalized Learning
Introduction
At Cereby, we set out to solve a fundamental problem in educational technology: generic AI assistants don't understand the unique learning journey of each student. That's why we built Cereby AI — a vertical AI system specifically fine-tuned for academic learning that leverages comprehensive user context to deliver personalized, actionable study support.
After months of development, Cereby AI now powers intelligent note generation, concept explanations, adaptive quizzes, personalized learning paths, and performance analysis across our platform. This post explores the technical architecture, engineering challenges, and key design decisions that made it possible.
The Core Challenge: Context-Aware Intelligence
The biggest challenge wasn't building another chatbot — it was creating a system that truly understands each student's learning context. Unlike general-purpose AI assistants that treat each query in isolation, Cereby AI needed to:
- Aggregate heterogeneous data sources (quizzes, notes, calendar events, learning paths)
- Maintain persistent context about performance and weak points
- Generate domain-specific, pedagogically sound content that aligns with academic standards
- Act proactively rather than merely responding to queries
Architecture Overview: Plugin-Based Modular System
We implemented a plugin-based modular architecture that separates concerns cleanly and allows for independent scaling and easy extensibility:
```
CerebyAIController (Orchestration Layer)
├── ContextAggregator (Data Collection)
├── IntentClassifier (NLP Understanding)
│   ├── Conversation History Analysis
│   ├── Intent Detection (clear vs. unclear)
│   ├── Context Selection (data vs. conversation)
│   └── Tool Registry Integration (dynamic prompt generation)
├── ToolOrchestrator (Tool Execution)
│   ├── Parameter Validation
│   ├── Tool Execution
│   └── Error Handling
├── Tool Registry (Central Tool Management)
│   ├── Tool Definitions (Metadata)
│   ├── Tool Handlers (Business Logic)
│   └── Intent Classification Prompt Generation
└── Tools (Plugin Architecture)
    ├── GenerateQuizHandler
    ├── CreateNotesHandler
    ├── ExplainConceptsHandler
    ├── CreateLearningPathHandler
    ├── AnalyzePerformanceHandler
    ├── GenerateFlashcardsHandler
    ├── ScheduleSpacedRepetitionHandler
    └── GenerateExamHandler
```
Key Architectural Components:
- Tool Registry: Central registry managing all tool definitions and handlers
- Tool Orchestrator: Executes tools with validation and error handling
- Tool Definitions: Metadata describing each tool (parameters, examples, confirmations)
- Tool Handlers: Self-contained business logic implementations
- Topic Selector: Centralized topic formatting and deduplication module
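To make the registry pattern concrete, here is a minimal sketch of how tool definitions and handlers might be registered and how the registry could feed intent-classification prompts. All interface and method names here are illustrative assumptions, not Cereby's actual code.

```typescript
// Illustrative sketch of a tool registry; names are assumptions, not Cereby's actual API.
interface ToolDefinition {
  name: string;                       // unique tool identifier
  description: string;                // used for intent-classification prompt generation
  parameters: Record<string, string>; // parameter name -> human-readable description
  requiresConfirmation: boolean;      // e.g. destructive or long-running tools
}

interface ToolHandler {
  // Returns the names of missing or invalid parameters (empty array = valid).
  validate(params: Record<string, unknown>): string[];
  execute(params: Record<string, unknown>): string;
}

class ToolRegistry {
  private tools = new Map<string, { def: ToolDefinition; handler: ToolHandler }>();

  register(def: ToolDefinition, handler: ToolHandler): void {
    this.tools.set(def.name, { def, handler });
  }

  get(name: string) {
    return this.tools.get(name);
  }

  // Dynamic prompt generation: the classifier adapts to whatever tools are registered.
  classificationPrompt(): string {
    return [...this.tools.values()]
      .map(({ def }) => `- ${def.name}: ${def.description}`)
      .join("\n");
  }
}
```

Because the registry owns both metadata and handlers, adding a new tool is a single `register` call with no changes to core files.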
The Controller Layer
The orchestration layer manages the complete request lifecycle, including:
- Request lifecycle management — from user input to final response
- Context management — fetching, caching, and refreshing user context
- Error handling and fallbacks — graceful degradation when components fail
- Model selection — choosing between fine-tuned and base models
- Tool registration — automatically registers all tools on initialization
A typical request flows through the system as follows:
1. User sends a natural language request (with optional conversation history)
2. Controller fetches and validates cached user context
3. IntentClassifier processes the request with user data and conversation history
4. System determines whether the intent is clear or requires clarification
5. For clear intents, ToolOrchestrator validates parameters and executes the appropriate tool
6. Tool handler executes business logic and generates a response
7. Result is returned to the user with updated conversation history
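The clear-vs-unclear branching in that flow can be sketched as a small dispatch function. The `Intent` shape and function names are hypothetical, used only to illustrate the control flow.

```typescript
// Hypothetical sketch of the controller's clear/unclear branching; names are illustrative.
type Intent =
  | { kind: "clear"; tool: string; params: Record<string, unknown> }
  | { kind: "unclear"; clarification: string };

function handleRequest(
  classify: (message: string) => Intent,
  execute: (tool: string, params: Record<string, unknown>) => string,
  message: string
): string {
  const intent = classify(message);
  if (intent.kind === "unclear") {
    // No clear learning intent: ask a clarifying question instead of guessing.
    return intent.clarification;
  }
  // Clear intent: hand off to the orchestrator for validation and execution.
  return execute(intent.tool, intent.params);
}
```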
The Modular Tool System
Our plugin-based architecture allows each tool to be a self-contained module with:
- Tool Definition — Metadata describing capabilities, parameters, and examples
- Tool Handler — Business logic implementation for executing the tool
- Parameter Validation — Ensures required inputs are present and properly formatted
- Error Handling — Graceful failure modes and user-friendly error messages
This design provides:
- Single Source of Truth: Tool definitions centralized in the registry
- Easy Extensibility: Add new tools without modifying core files
- Type Safety: Full typing throughout the system
- Testability: Each handler can be tested independently
- Dynamic Prompt Generation: IntentClassifier adapts to available tools automatically
Vertical AI: Why We Chose Fine-Tuning
One of our most impactful decisions was to build Cereby AI as a vertical AI — a model fine-tuned specifically on academic content rather than using a general-purpose model.
The Fine-Tuning Strategy
We fine-tuned our model on a curated dataset of:
- Open-source textbooks (OpenStax, LibreTexts)
- Educational content across STEM, humanities, and social sciences
- Pedagogical frameworks and learning science research
- Structured Q&A pairs, exam questions, and academic materials
Technical Benefits
1. Reduced Prompt Engineering Overhead
Fine-tuned models understand academic context natively, allowing us to use focused, user-specific prompts rather than lengthy instructions about academic standards.
2. Consistent Academic Quality
Our fine-tuned model maintains consistency across all generated content:
- Proper terminology usage
- Alignment with standard curricula
- Pedagogically sound progression
- Accurate mathematical notation
3. Cost Efficiency
While fine-tuning had upfront costs, we've achieved:
- 30-40% reduction in input tokens per request
- Better first-attempt quality (fewer regeneration requests)
- Higher user satisfaction (less need for corrections)
Model Management
Our system handles:
- Version tracking and gradual rollouts
- Automatic fallback to base model if fine-tuned model is unavailable
- A/B testing between model versions
- Performance monitoring per model
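The fallback and A/B-split behavior can be illustrated with a small selection function. The config shape, model ids, and the idea of a per-request random roll are assumptions for the sketch, not Cereby's actual mechanism.

```typescript
// Sketch of model selection with automatic fallback and A/B split; all ids are made up.
interface ModelConfig {
  fineTunedId: string;
  baseId: string;
  abTestShare?: number; // fraction of traffic sent to the fine-tuned model during rollout
}

function selectModel(cfg: ModelConfig, fineTunedAvailable: boolean, roll: number): string {
  if (!fineTunedAvailable) return cfg.baseId;         // automatic fallback to base model
  const share = cfg.abTestShare ?? 1;
  return roll < share ? cfg.fineTunedId : cfg.baseId; // gradual rollout / A/B split
}
```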
Context Aggregation: The Data Challenge
Cereby AI's intelligence comes from aggregating data across multiple sources:
- Quiz Performance — scores, weak areas, time spent, difficulty levels
- Learning Paths — progress, topic mastery, completion rates
- Calendar Events — upcoming exams, study sessions, deadlines
- Notes Content — topics covered, embedded quizzes, flashcard performance
The system also leverages conversation history for:
- Reference resolution — Understanding "this", "that", "it" from previous messages
- Follow-up requests — "Quiz on this" after an explanation
- Contextual continuity — Maintaining conversation flow across multiple interactions
The Performance Challenge
The initial implementation was slow: each request triggered multiple database queries, and context aggregation took 2-3 seconds for users with extensive history.
Solution: Multi-Layered Caching
We implemented a sophisticated caching strategy:
1. Context Cache
- Stores complete user context in optimized format
- Short TTL for real-time accuracy
- Automatic expiration and refresh
2. Performance Metrics Cache
- Recalculated periodically (performance doesn't change minute-to-minute)
- Stores results with computed metrics
3. Query Optimization
- Combined queries using efficient patterns
- Indexed all frequently queried columns
- Used aggregation for nested data
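The TTL-with-automatic-expiration behavior of the context cache can be sketched as follows. The class name, generic shape, and the injected clock (for testability) are illustrative assumptions.

```typescript
// Minimal TTL cache sketch for aggregated user context; the clock is injected
// as a parameter so expiration is easy to test deterministically.
class ContextCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(userId: string, now: number): T | undefined {
    const e = this.entries.get(userId);
    if (!e || e.expiresAt <= now) {
      this.entries.delete(userId); // automatic expiration of stale context
      return undefined;
    }
    return e.value;
  }

  set(userId: string, value: T, now: number): void {
    this.entries.set(userId, { value, expiresAt: now + this.ttlMs });
  }
}
```

A short TTL keeps the cache close to real time: a miss simply triggers a fresh aggregation and a `set`.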
Intent Classification: Understanding Natural Language
The IntentClassifier uses advanced NLP to map natural language requests to specific actions. It leverages the Tool Registry to dynamically generate classification prompts. The system handles:
- Direct requests: "Create notes on my weak points in calculus"
- Concept explanations: "Explain the chain rule" or "What is photosynthesis?"
- Contextual requests: "Create a learning path for my physics exam"
- Conversation-aware requests: "Quiz on this" (references previous explanation)
- Casual conversation: "Hi, how are you?" (no specific intent)
Chat History Integration
One of our most impactful improvements was integrating conversation history into intent classification. The system maintains context across multiple messages, enabling natural follow-up conversations.
Conversation Context Extraction: The system extracts concepts, subjects, and topics from recent assistant messages to understand references in follow-up queries.
Reference Resolution: When users say "this", "that", or "it", the system resolves these references from conversation history.
Example Conversation Flow:
User: "Explain cell reproduction in biology"
Assistant: [Provides detailed explanation of cell reproduction]
User: "Quiz on this"
System Analysis:
- Detects "this" refers to previous explanation
- Extracts "cell reproduction" and "biology" from conversation history
- Generates quiz on "cell reproduction" in "biology"
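A simplified version of that resolution step can be sketched as a backwards walk over the history. The `Message` shape and the pronoun regex are assumptions; the real system uses an NLP classifier rather than pattern matching.

```typescript
// Illustrative reference-resolution sketch: when the user says "this"/"that"/"it",
// find the most recent assistant message that carried a concept. Not Cereby's actual NLP.
interface Message { role: "user" | "assistant"; concept?: string; subject?: string }

function resolveReference(
  query: string,
  history: Message[]
): { concept: string; subject: string } | undefined {
  if (!/\b(this|that|it)\b/i.test(query)) return undefined;
  // Walk backwards to the most recent assistant message with an extracted concept.
  for (let i = history.length - 1; i >= 0; i--) {
    const m = history[i];
    if (m.role === "assistant" && m.concept) {
      return { concept: m.concept, subject: m.subject ?? "general" };
    }
  }
  return undefined;
}
```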
Intent Detection and Casual Conversation
The system can detect when there's no clear learning intent and respond naturally with friendly, contextual responses. This approach gives us:
- Conversation continuity — Understands references across messages
- Smart context selection — Uses relevant data when needed, conversation history when appropriate
- Natural interaction — Handles greetings and casual conversation gracefully
- Structured extraction — Still extracts parameters when intent is clear
Tool Implementation
Note Creation: From Context-Rich to Zero-Context
Cereby AI's note creation capability works across the entire spectrum — from creating highly personalized notes based on weak points and performance data, to generating comprehensive study materials on entirely new subjects where no prior context exists.
The Multi-Tiered Approach: When a user requests notes, Cereby AI follows a cascading strategy:
- Check for learning path topics (most relevant)
- Check for weak points (personalized)
- Fallback to general notes (zero context)
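The cascade above can be sketched as a simple tiered selection. The `NoteContext` shape is an assumption made for the example.

```typescript
// Sketch of the cascading topic strategy: learning-path topics first, then weak
// points, then a zero-context fallback to the requested subject.
interface NoteContext { learningPathTopics: string[]; weakPoints: string[] }

function pickNoteTopics(requested: string, ctx: NoteContext): string[] {
  if (ctx.learningPathTopics.length > 0) return ctx.learningPathTopics; // most relevant
  if (ctx.weakPoints.length > 0) return ctx.weakPoints;                 // personalized
  return [requested];                                                   // zero-context fallback
}
```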
This multi-tiered approach ensures Cereby AI works for:
- New students exploring subjects for the first time
- Advanced learners diving into specialized topics
- Casual learners who haven't taken quizzes yet
- Anyone who wants to learn something completely new
By gracefully degrading from context-rich to zero-context generation, Cereby AI maintains its value proposition even when users have no prior interaction history.
Concept Explanation
The Explain Concepts capability provides detailed, structured explanations tailored to the user's learning level and context.
Structured Explanation Format:
- Core Definition — Clear, concise explanation of the concept
- Key Principles — Fundamental rules or theorems
- Step-by-Step Examples — Worked examples at the user's level
- Common Pitfalls — Mistakes to avoid (especially relevant for weak points)
- Related Concepts — Connections to other topics the user has studied
- Practice Applications — Real-world or exam-style applications
The system adjusts explanation depth based on user performance, providing more examples and detail for weak points, and advanced connections for mastered concepts.
Database Design
We extended our schema with several optimized tables:
Context Cache Table
Stores aggregated user context for fast retrieval with:
- User identification and scoping
- JSONB storage for flexible schema
- Timestamp tracking for expiration
- Optimized indexes for quick lookups
Flashcards Table
Implements spaced repetition with algorithm integration:
- User-scoped card storage
- Review scheduling metadata
- Performance tracking
- Algorithm parameters (ease factor, interval)
Concept Explanations Table
Stores generated explanations for quick reference:
- Subject and concept organization
- Difficulty level tracking
- Related concepts linking
- Access timestamp for analytics
Key Design Decisions
- JSONB for flexibility — Allows schema evolution without migrations
- Indexed user identification — Fast user-scoped queries
- Timestamp tracking — Enables time-based analysis
API Architecture
We implemented a dual-endpoint strategy:
1. Unified Chat Endpoint
Natural language interface for the UI: accepts free-form text and processes it through the full intent classification pipeline.
2. Direct Capability Endpoints
Programmatic access for integrations: structured requests that bypass intent classification for known operations.
This architecture gives us:
- Flexibility — Support both natural language and structured requests
- Backwards compatibility — Direct endpoints for existing integrations
- Rate limiting — Different limits per endpoint type
Performance Optimizations
1. Parallel Data Fetching
Context aggregation fetches data sources in parallel rather than sequentially, dramatically reducing latency.
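The parallel fetch is essentially a `Promise.all` over the data sources, so total latency approaches that of the slowest source rather than the sum of all of them. The source map shape below is an assumption for the sketch.

```typescript
// Parallel context aggregation sketch: fetch all sources concurrently instead of
// sequentially. Source names mirror the list above; the shapes are assumptions.
async function aggregateContext(
  userId: string,
  sources: Record<string, (id: string) => Promise<unknown>>
): Promise<Record<string, unknown>> {
  const names = Object.keys(sources);
  // Promise.all runs the fetches concurrently; latency ~= slowest source.
  const results = await Promise.all(names.map((n) => sources[n](userId)));
  return Object.fromEntries(names.map((n, i) => [n, results[i]]));
}
```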
2. Streaming Responses
For long-form content generation (exams, comprehensive notes, concept explanations), we stream responses to provide better perceived performance.
3. Request Deduplication
We cache similar requests within a short time window to avoid redundant AI calls.
Error Handling and Reliability
Graceful Degradation
When components fail, Cereby AI degrades gracefully with fallback strategies for model unavailability, partial context, and missing data.
Context Fallbacks
If context aggregation fails, the system:
- Uses cached context (even if slightly stale)
- Proceeds with partial context
- Requests user clarification if critical data is missing
One of our most important design decisions was ensuring Cereby AI can operate effectively even without any user context. This is valuable for new users, exploratory learning, and first-time topics.
Retry Logic
For transient failures:
- Exponential backoff for API rate limits
- Circuit breaker pattern for repeated failures
- Automatic retry with jitter
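The backoff-with-jitter delay can be computed as below; the base, cap, and the "equal jitter" variant are illustrative choices, not necessarily the exact parameters we run in production.

```typescript
// Backoff-with-jitter sketch: delay doubles per attempt up to a cap, with random
// jitter to avoid synchronized retries (thundering herd). Constants are illustrative.
function backoffDelayMs(
  attempt: number,
  baseMs = 250,
  capMs = 8_000,
  jitter = Math.random() // injected for deterministic testing
): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt); // exponential growth, capped
  return exp / 2 + jitter * (exp / 2);                // "equal jitter": in [exp/2, exp)
}
```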
Security and Privacy
Data Isolation
All queries are user-scoped with proper access controls.
Row-Level Security
Database policies ensure no cross-user data access.
AI Data Handling
- User data is sent to the AI provider only for processing
- No data retention (configured in API settings)
- Encrypted context cache in database
Monitoring and Analytics
We track comprehensive metrics:
Usage Metrics:
- Requests per user per day
- Capability distribution (which features are used most)
- Average response time per capability
Quality Metrics:
- User satisfaction scores
- Content regeneration rate (proxy for quality)
- Weak point improvement correlation
System Metrics:
- Cache hit rates
- Model selection (fine-tuned vs base)
- Error rates by type
- Cost per request
These metrics inform model retraining decisions, performance optimization priorities, and feature development roadmap.
Lessons Learned
1. Start with Caching Early
We initially focused on feature development and added caching later. Building caching from day one would have saved significant refactoring time.
2. Fine-Tuning is Worth the Investment
The upfront cost of fine-tuning was substantial, but the quality improvements and cost savings over time have more than justified it.
3. Context is King (But Not Always Required)
The quality of context aggregation directly impacts AI output quality when it exists. However, we also learned that graceful degradation is essential. Not every user interaction has rich context, and the system must work beautifully even with zero context.
4. Design for Extensibility
Our plugin-based modular architecture makes adding new capabilities straightforward. Each new tool follows a consistent pattern without requiring modifications to core files.
5. User Feedback Loops Matter
Early user testing revealed that academic accuracy was paramount. This guided our fine-tuning dataset curation and quality assurance processes.
6. Conversation History Transforms User Experience
Adding conversation history support was a game-changer. Users could finally have natural, flowing conversations instead of treating each message as isolated.
Future Enhancements
We're working on:
- Subject-specific fine-tuning — Separate models for STEM vs humanities
- Multi-modal understanding — Process diagrams, formulas, and images
- Proactive suggestions — AI-initiated study recommendations
- Collaborative features — Study groups with shared Cereby insights
- Voice interaction — Natural language voice commands
Conclusion
Building Cereby AI taught us that creating truly intelligent, context-aware AI systems requires:
- Vertical specialization — Fine-tuning for domain expertise
- Robust data aggregation — Comprehensive context is non-negotiable
- Conversation awareness — Understanding references and maintaining context across messages
- Smart intent detection — Knowing when to act vs. when to have a casual conversation
- Modular architecture — Plugin-based system enables clean separation and easy extensibility
- Performance optimization — Caching and parallelization are critical
- User-centric design — Quality metrics must align with user outcomes
Cereby AI is now used by thousands of students, generating personalized study materials that adapt to their unique learning journey. The technical foundation we built allows us to iterate quickly and add new capabilities without architectural changes.
For developers working on similar systems, our key takeaway is: invest in context aggregation (both structured data and conversation history) and vertical specialization from the start, but always design for graceful degradation. The system must provide value immediately for new users (zero-context operation) while leveraging rich context when available for personalization.
Want to learn more about Cereby AI or join our team? Check out our careers page or reach out on Twitter.
Visual Summary
```mermaid
flowchart TD
    A[User Request] --> B[CerebyAIController]
    B --> C[Context Aggregator]
    B --> D[Intent Classifier]
    D --> E{Intent Clear?}
    E -->|Yes| F[Tool Orchestrator]
    E -->|No| G[Clarification Response]
    F --> H[Tool Registry]
    H --> I[Selected Tool Handler]
    I --> J[AI Response + Updated Context]
```