Building Cereby AI: A Vertical AI Architecture for Personalized Learning
Introduction
At Cereby, we set out to solve a fundamental problem in educational technology: generic AI assistants don't understand the unique learning journey of each student. That's why we built Cereby AI — a vertical AI system specifically fine-tuned for academic learning that leverages comprehensive user context to deliver personalized, actionable study support.
After months of development, Cereby AI now powers intelligent note generation, concept explanations, adaptive quizzes, personalized learning paths, and performance analysis across our platform. This post explores the technical architecture, engineering challenges, and key design decisions that made it possible.
The Core Challenge: Context-Aware Intelligence
The biggest challenge wasn't building another chatbot — it was creating a system that truly understands each student's learning context. Unlike general-purpose AI assistants that treat each query in isolation, Cereby AI needed to:
- Aggregate heterogeneous data sources (quizzes, notes, calendar events, learning paths)
- Maintain persistent context about performance and weak points
- Generate domain-specific, pedagogically sound content that aligns with academic standards
- Act proactively rather than merely responding to queries
Architecture Overview: Plugin-Based Modular System
We implemented a plugin-based modular architecture that separates concerns cleanly and allows for independent scaling and easy extensibility:
```
CerebyAIController (Orchestration Layer)
├── ContextAggregator (Data Collection)
├── IntentClassifier (NLP Understanding)
│   ├── Conversation History Analysis
│   ├── Intent Detection (clear vs. unclear)
│   ├── Context Selection (data vs. conversation)
│   └── Tool Registry Integration (dynamic prompt generation)
├── ToolOrchestrator (Tool Execution)
│   ├── Parameter Validation
│   ├── Tool Execution
│   └── Error Handling
├── Tool Registry (Central Tool Management)
│   ├── Tool Definitions (Metadata)
│   ├── Tool Handlers (Business Logic)
│   └── Intent Classification Prompt Generation
└── Tools (Plugin Architecture)
    ├── GenerateQuizHandler
    ├── CreateNotesHandler
    ├── ExplainConceptsHandler
    ├── CreateLearningPathHandler
    ├── AnalyzePerformanceHandler
    ├── GenerateFlashcardsHandler
    ├── ScheduleSpacedRepetitionHandler
    └── GenerateExamHandler
```
Key Architectural Components:
- Tool Registry: Central registry managing all tool definitions and handlers
- Tool Orchestrator: Executes tools with validation and error handling
- Tool Definitions: Metadata describing each tool (parameters, examples, confirmations)
- Tool Handlers: Self-contained business logic implementations
- Topic Selector: Centralized topic formatting and deduplication module
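To make the registry pattern concrete, here is a minimal sketch of how tool definitions and handlers might be registered and how the registry could feed intent-classification prompts. All interface and method names here are illustrative assumptions, not Cereby's actual code.

```typescript
// Illustrative sketch of a tool registry; names are assumptions, not Cereby's actual API.
interface ToolDefinition {
  name: string;                       // unique tool identifier
  description: string;                // used for intent-classification prompt generation
  parameters: Record<string, string>; // parameter name -> human-readable description
  requiresConfirmation: boolean;      // e.g. destructive or long-running tools
}

interface ToolHandler {
  // Returns the names of missing or invalid parameters (empty array = valid).
  validate(params: Record<string, unknown>): string[];
  execute(params: Record<string, unknown>): string;
}

class ToolRegistry {
  private tools = new Map<string, { def: ToolDefinition; handler: ToolHandler }>();

  register(def: ToolDefinition, handler: ToolHandler): void {
    this.tools.set(def.name, { def, handler });
  }

  get(name: string) {
    return this.tools.get(name);
  }

  // Dynamic prompt generation: the classifier adapts to whatever tools are registered.
  classificationPrompt(): string {
    return [...this.tools.values()]
      .map(({ def }) => `- ${def.name}: ${def.description}`)
      .join("\n");
  }
}
```

Because the registry owns both metadata and handlers, adding a new tool is a single `register` call with no changes to core files.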
The Controller Layer
The orchestration layer manages the complete request lifecycle, including:
- Request lifecycle management — from user input to final response
- Context management — fetching, caching, and refreshing user context
- Error handling and fallbacks — graceful degradation when components fail
- Model selection — choosing between fine-tuned and base models
- Tool registration — automatically registers all tools on initialization
A typical request flows through the system as follows:
1. User sends a natural language request (with optional conversation history)
2. Controller fetches and validates cached user context
3. IntentClassifier processes the request with user data and conversation history
4. System determines whether the intent is clear or requires clarification
5. For clear intents, ToolOrchestrator validates parameters and executes the appropriate tool
6. Tool handler executes business logic and generates a response
7. Result is returned to the user with updated conversation history
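The clear-vs-unclear branching in that flow can be sketched as a small dispatch function. The `Intent` shape and function names are hypothetical, used only to illustrate the control flow.

```typescript
// Hypothetical sketch of the controller's clear/unclear branching; names are illustrative.
type Intent =
  | { kind: "clear"; tool: string; params: Record<string, unknown> }
  | { kind: "unclear"; clarification: string };

function handleRequest(
  classify: (message: string) => Intent,
  execute: (tool: string, params: Record<string, unknown>) => string,
  message: string
): string {
  const intent = classify(message);
  if (intent.kind === "unclear") {
    // No clear learning intent: ask a clarifying question instead of guessing.
    return intent.clarification;
  }
  // Clear intent: hand off to the orchestrator for validation and execution.
  return execute(intent.tool, intent.params);
}
```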
The Modular Tool System
Our plugin-based architecture allows each tool to be a self-contained module with:
- Tool Definition — Metadata describing capabilities, parameters, and examples
- Tool Handler — Business logic implementation for executing the tool
- Parameter Validation — Ensures required inputs are present and properly formatted
- Error Handling — Graceful failure modes and user-friendly error messages
This design provides:
- Single Source of Truth: Tool definitions centralized in the registry
- Easy Extensibility: Add new tools without modifying core files
- Type Safety: Full typing throughout the system
- Testability: Each handler can be tested independently
- Dynamic Prompt Generation: IntentClassifier adapts to available tools automatically
Vertical AI: Why We Chose Fine-Tuning
One of our most impactful decisions was to build Cereby AI as a vertical AI — a model fine-tuned specifically on academic content rather than using a general-purpose model.
The Fine-Tuning Strategy
We fine-tuned our model on a curated dataset of:
- Open-source textbooks (OpenStax, LibreTexts)
- Educational content across STEM, humanities, and social sciences
- Pedagogical frameworks and learning science research
- Structured Q&A pairs, exam questions, and academic materials
Technical Benefits
1. Reduced Prompt Engineering Overhead
Fine-tuned models understand academic context natively, allowing us to use focused, user-specific prompts rather than lengthy instructions about academic standards.
2. Consistent Academic Quality
Our fine-tuned model maintains consistency across all generated content:
- Proper terminology usage
- Alignment with standard curricula
- Pedagogically sound progression
- Accurate mathematical notation
3. Cost Efficiency
While fine-tuning had upfront costs, we've achieved:
- 30-40% reduction in input tokens per request
- Better first-attempt quality (fewer regeneration requests)
- Higher user satisfaction (less need for corrections)
Model Management
Our system handles:
- Version tracking and gradual rollouts
- Automatic fallback to base model if fine-tuned model is unavailable
- A/B testing between model versions
- Performance monitoring per model
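The fallback and A/B-split behavior can be illustrated with a small selection function. The config shape, model ids, and the idea of a per-request random roll are assumptions for the sketch, not Cereby's actual mechanism.

```typescript
// Sketch of model selection with automatic fallback and A/B split; all ids are made up.
interface ModelConfig {
  fineTunedId: string;
  baseId: string;
  abTestShare?: number; // fraction of traffic sent to the fine-tuned model during rollout
}

function selectModel(cfg: ModelConfig, fineTunedAvailable: boolean, roll: number): string {
  if (!fineTunedAvailable) return cfg.baseId;         // automatic fallback to base model
  const share = cfg.abTestShare ?? 1;
  return roll < share ? cfg.fineTunedId : cfg.baseId; // gradual rollout / A/B split
}
```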
Context Aggregation: The Data Challenge
Cereby AI's intelligence comes from aggregating data across multiple sources:
- Quiz Performance — scores, weak areas, time spent, difficulty levels
- Learning Paths — progress, topic mastery, completion rates
- Calendar Events — upcoming exams, study sessions, deadlines
- Notes Content — topics covered, embedded quizzes, flashcard performance
The system also leverages conversation history for:
- Reference resolution — Understanding "this", "that", "it" from previous messages
- Follow-up requests — "Quiz on this" after an explanation
- Contextual continuity — Maintaining conversation flow across multiple interactions
The Performance Challenge
The initial implementation was slow: each request triggered multiple database queries, and context aggregation took 2-3 seconds for users with extensive history.
Solution: Multi-Layered Caching
We implemented a sophisticated caching strategy:
1. Context Cache
- Stores complete user context in optimized format
- Short TTL for real-time accuracy
- Automatic expiration and refresh
2. Performance Metrics Cache
- Recalculated periodically (performance doesn't change minute-to-minute)
- Stores results with computed metrics
3. Query Optimization
- Combined queries using efficient patterns
- Indexed all frequently queried columns
- Used aggregation for nested data
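The TTL-with-automatic-expiration behavior of the context cache can be sketched as follows. The class name, generic shape, and the injected clock (for testability) are illustrative assumptions.

```typescript
// Minimal TTL cache sketch for aggregated user context; the clock is injected
// as a parameter so expiration is easy to test deterministically.
class ContextCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(userId: string, now: number): T | undefined {
    const e = this.entries.get(userId);
    if (!e || e.expiresAt <= now) {
      this.entries.delete(userId); // automatic expiration of stale context
      return undefined;
    }
    return e.value;
  }

  set(userId: string, value: T, now: number): void {
    this.entries.set(userId, { value, expiresAt: now + this.ttlMs });
  }
}
```

A short TTL keeps the cache close to real time: a miss simply triggers a fresh aggregation and a `set`.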
Intent Classification: Understanding Natural Language
The IntentClassifier uses advanced NLP to map natural language requests to specific actions. It leverages the Tool Registry to dynamically generate classification prompts. The system handles:
- Direct requests: "Create notes on my weak points in calculus"
- Concept explanations: "Explain the chain rule" or "What is photosynthesis?"
- Contextual requests: "Create a learning path for my physics exam"
- Conversation-aware requests: "Quiz on this" (references previous explanation)
- Casual conversation: "Hi, how are you?" (no specific intent)
Chat History Integration
One of our most impactful improvements was integrating conversation history into intent classification. The system maintains context across multiple messages, enabling natural follow-up conversations.
Conversation Context Extraction: The system extracts concepts, subjects, and topics from recent assistant messages to understand references in follow-up queries.
Reference Resolution: When users say "this", "that", or "it", the system resolves these references from conversation history.
Example Conversation Flow:
User: "Explain cell reproduction in biology"
Assistant: [Provides detailed explanation of cell reproduction]
User: "Quiz on this"
System Analysis:
- Detects "this" refers to previous explanation
- Extracts "cell reproduction" and "biology" from conversation history
- Generates quiz on "cell reproduction" in "biology"
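A simplified version of that resolution step can be sketched as a backwards walk over the history. The `Message` shape and the pronoun regex are assumptions; the real system uses an NLP classifier rather than pattern matching.

```typescript
// Illustrative reference-resolution sketch: when the user says "this"/"that"/"it",
// find the most recent assistant message that carried a concept. Not Cereby's actual NLP.
interface Message { role: "user" | "assistant"; concept?: string; subject?: string }

function resolveReference(
  query: string,
  history: Message[]
): { concept: string; subject: string } | undefined {
  if (!/\b(this|that|it)\b/i.test(query)) return undefined;
  // Walk backwards to the most recent assistant message with an extracted concept.
  for (let i = history.length - 1; i >= 0; i--) {
    const m = history[i];
    if (m.role === "assistant" && m.concept) {
      return { concept: m.concept, subject: m.subject ?? "general" };
    }
  }
  return undefined;
}
```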
Intent Detection and Casual Conversation
The system can detect when there's no clear learning intent and respond naturally with friendly, contextual responses. This approach gives us:
- Conversation continuity — Understands references across messages
- Smart context selection — Uses relevant data when needed, conversation history when appropriate
- Natural interaction — Handles greetings and casual conversation gracefully
- Structured extraction — Still extracts parameters when intent is clear
Tool Implementation
Note Creation: From Context-Rich to Zero-Context
Cereby AI's note creation capability works across the entire spectrum — from creating highly personalized notes based on weak points and performance data, to generating comprehensive study materials on entirely new subjects where no prior context exists.
The Multi-Tiered Approach: When a user requests notes, Cereby AI follows a cascading strategy:
- Check for learning path topics (most relevant)
- Check for weak points (personalized)
- Fallback to general notes (zero context)
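The cascade above can be sketched as a simple tiered selection. The `NoteContext` shape is an assumption made for the example.

```typescript
// Sketch of the cascading topic strategy: learning-path topics first, then weak
// points, then a zero-context fallback to the requested subject.
interface NoteContext { learningPathTopics: string[]; weakPoints: string[] }

function pickNoteTopics(requested: string, ctx: NoteContext): string[] {
  if (ctx.learningPathTopics.length > 0) return ctx.learningPathTopics; // most relevant
  if (ctx.weakPoints.length > 0) return ctx.weakPoints;                 // personalized
  return [requested];                                                   // zero-context fallback
}
```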
This multi-tiered approach ensures Cereby AI works for:
- New students exploring subjects for the first time
- Advanced learners diving into specialized topics
- Casual learners who haven't taken quizzes yet
- Anyone who wants to learn something completely new
By gracefully degrading from context-rich to zero-context generation, Cereby AI maintains its value proposition even when users have no prior interaction history.
Concept Explanation
The Explain Concepts capability provides detailed, structured explanations tailored to the user's learning level and context.
Structured Explanation Format:
- Core Definition — Clear, concise explanation of the concept
- Key Principles — Fundamental rules or theorems
- Step-by-Step Examples — Worked examples at the user's level
- Common Pitfalls — Mistakes to avoid (especially relevant for weak points)
- Related Concepts — Connections to other topics the user has studied
- Practice Applications — Real-world or exam-style applications
The system adjusts explanation depth based on user performance, providing more examples and detail for weak points, and advanced connections for mastered concepts.
Database Design
We extended our schema with several optimized tables:
Context Cache Table
Stores aggregated user context for fast retrieval with:
- User identification and scoping
- JSONB storage for flexible schema
- Timestamp tracking for expiration
- Optimized indexes for quick lookups
Flashcards Table
Implements spaced repetition with algorithm integration:
- User-scoped card storage
- Review scheduling metadata
- Performance tracking
- Algorithm parameters (ease factor, interval)
Concept Explanations Table
Stores generated explanations for quick reference:
- Subject and concept organization
- Difficulty level tracking
- Related concepts linking
- Access timestamp for analytics
Key Design Decisions
- JSONB for flexibility — Allows schema evolution without migrations
- Indexed user identification — Fast user-scoped queries
- Timestamp tracking — Enables time-based analysis
API Architecture
We implemented a dual-endpoint strategy:
1. Unified Chat Endpoint
Natural language interface for the UI: accepts free-form text and processes it through the full intent classification pipeline.
2. Direct Capability Endpoints
Programmatic access for integrations: structured requests that bypass intent classification for known operations.
This architecture gives us:
- Flexibility — Support both natural language and structured requests
- Backwards compatibility — Direct endpoints for existing integrations
- Rate limiting — Different limits per endpoint type
Performance Optimizations
1. Parallel Data Fetching
Context aggregation fetches data sources in parallel rather than sequentially, dramatically reducing latency.
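The parallel fetch is essentially a `Promise.all` over the data sources, so total latency approaches that of the slowest source rather than the sum of all of them. The source map shape below is an assumption for the sketch.

```typescript
// Parallel context aggregation sketch: fetch all sources concurrently instead of
// sequentially. Source names mirror the list above; the shapes are assumptions.
async function aggregateContext(
  userId: string,
  sources: Record<string, (id: string) => Promise<unknown>>
): Promise<Record<string, unknown>> {
  const names = Object.keys(sources);
  // Promise.all runs the fetches concurrently; latency ~= slowest source.
  const results = await Promise.all(names.map((n) => sources[n](userId)));
  return Object.fromEntries(names.map((n, i) => [n, results[i]]));
}
```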
2. Streaming Responses
For long-form content generation (exams, comprehensive notes, concept explanations), we stream responses to provide better perceived performance.
3. Request Deduplication
We cache similar requests within a short time window to avoid redundant AI calls.
Error Handling and Reliability
Graceful Degradation
When components fail, Cereby AI degrades gracefully with fallback strategies for model unavailability, partial context, and missing data.
Context Fallbacks
If context aggregation fails, the system:
- Uses cached context (even if slightly stale)
- Proceeds with partial context
- Requests user clarification if critical data is missing
One of our most important design decisions was ensuring Cereby AI can operate effectively even without any user context. This is valuable for new users, exploratory learning, and first-time topics.
Retry Logic
For transient failures:
- Exponential backoff for API rate limits
- Circuit breaker pattern for repeated failures
- Automatic retry with jitter
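The backoff-with-jitter delay can be computed as below; the base, cap, and the "equal jitter" variant are illustrative choices, not necessarily the exact parameters we run in production.

```typescript
// Backoff-with-jitter sketch: delay doubles per attempt up to a cap, with random
// jitter to avoid synchronized retries (thundering herd). Constants are illustrative.
function backoffDelayMs(
  attempt: number,
  baseMs = 250,
  capMs = 8_000,
  jitter = Math.random() // injected for deterministic testing
): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt); // exponential growth, capped
  return exp / 2 + jitter * (exp / 2);                // "equal jitter": in [exp/2, exp)
}
```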
Security and Privacy
Data Isolation
All queries are user-scoped with proper access controls.
Row-Level Security
Database policies ensure no cross-user data access.
AI Data Handling
- User data is sent to the AI provider only for processing
- No data retention (configured in API settings)
- Encrypted context cache in database
Monitoring and Analytics
We track comprehensive metrics:
Usage Metrics:
- Requests per user per day
- Capability distribution (which features are used most)
- Average response time per capability
Quality Metrics:
- User satisfaction scores
- Content regeneration rate (proxy for quality)
- Weak point improvement correlation
System Metrics:
- Cache hit rates
- Model selection (fine-tuned vs base)
- Error rates by type
- Cost per request
These metrics inform model retraining decisions, performance optimization priorities, and feature development roadmap.
Lessons Learned
1. Start with Caching Early
We initially focused on feature development and added caching later. Building caching from day one would have saved significant refactoring time.
2. Fine-Tuning is Worth the Investment
The upfront cost of fine-tuning was substantial, but the quality improvements and cost savings over time have more than justified it.
3. Context is King (But Not Always Required)
The quality of context aggregation directly impacts AI output quality when it exists. However, we also learned that graceful degradation is essential. Not every user interaction has rich context, and the system must work beautifully even with zero context.
4. Design for Extensibility
Our plugin-based modular architecture makes adding new capabilities straightforward. Each new tool follows a consistent pattern without requiring modifications to core files.
5. User Feedback Loops Matter
Early user testing revealed that academic accuracy was paramount. This guided our fine-tuning dataset curation and quality assurance processes.
6. Conversation History Transforms User Experience
Adding conversation history support was a game-changer. Users could finally have natural, flowing conversations instead of treating each message as isolated.
Future Enhancements
We're working on:
- Subject-specific fine-tuning — Separate models for STEM vs humanities
- Multi-modal understanding — Process diagrams, formulas, and images
- Proactive suggestions — AI-initiated study recommendations
- Collaborative features — Study groups with shared Cereby insights
- Voice interaction — Natural language voice commands
Conclusion
Building Cereby AI taught us that creating truly intelligent, context-aware AI systems requires:
- Vertical specialization — Fine-tuning for domain expertise
- Robust data aggregation — Comprehensive context is non-negotiable
- Conversation awareness — Understanding references and maintaining context across messages
- Smart intent detection — Knowing when to act vs. when to have a casual conversation
- Modular architecture — Plugin-based system enables clean separation and easy extensibility
- Performance optimization — Caching and parallelization are critical
- User-centric design — Quality metrics must align with user outcomes
Cereby AI is now used by thousands of students, generating personalized study materials that adapt to their unique learning journey. The technical foundation we built allows us to iterate quickly and add new capabilities without architectural changes.
For developers working on similar systems, our key takeaway is: invest in context aggregation (both structured data and conversation history) and vertical specialization from the start, but always design for graceful degradation. The system must provide value immediately for new users (zero-context operation) while leveraging rich context when available for personalization.
Want to learn more about Cereby AI or join our team? Check out our careers page or reach out on Twitter.
Visual Summary
```mermaid
flowchart TD
    A[User Request] --> B[CerebyAIController]
    B --> C[Context Aggregator]
    B --> D[Intent Classifier]
    D --> E{Intent Clear?}
    E -->|Yes| F[Tool Orchestrator]
    E -->|No| G[Clarification Response]
    F --> H[Tool Registry]
    H --> I[Selected Tool Handler]
    I --> J[AI Response + Updated Context]
```