Engineering
January 25, 2026
13 min read

Building Cereby AI: A Vertical AI Architecture for Personalized Learning

How we engineered a context-aware learning assistant that transforms how students study

Introduction

At Cereby, we set out to solve a fundamental problem in educational technology: generic AI assistants don't understand the unique learning journey of each student. That's why we built Cereby AI — a vertical AI system specifically fine-tuned for academic learning that leverages comprehensive user context to deliver personalized, actionable study support.

After months of development, Cereby AI now powers intelligent note generation, concept explanations, adaptive quizzes, personalized learning paths, and performance analysis across our platform. This post explores the technical architecture, engineering challenges, and key design decisions that made it possible.

The Core Challenge: Context-Aware Intelligence

The biggest challenge wasn't building another chatbot — it was creating a system that truly understands each student's learning context. Unlike general-purpose AI assistants that treat each query in isolation, Cereby AI needed to:

  1. Aggregate heterogeneous data sources (quizzes, notes, calendar events, learning paths)
  2. Maintain persistent context about performance and weak points
  3. Generate domain-specific, pedagogically sound content that aligns with academic standards
  4. Act proactively rather than merely responding to queries

Architecture Overview: Plugin-Based Modular System

We implemented a plugin-based modular architecture that separates concerns cleanly and allows for independent scaling and easy extensibility:

CerebyAIController (Orchestration Layer)
├── ContextAggregator (Data Collection)
├── IntentClassifier (NLP Understanding)
│   ├── Conversation History Analysis
│   ├── Intent Detection (clear vs. unclear)
│   ├── Context Selection (data vs. conversation)
│   └── Tool Registry Integration (dynamic prompt generation)
├── ToolOrchestrator (Tool Execution)
│   ├── Parameter Validation
│   ├── Tool Execution
│   └── Error Handling
└── Tool Registry (Central Tool Management)
    ├── Tool Definitions (Metadata)
    ├── Tool Handlers (Business Logic)
    ├── Intent Classification Prompt Generation
    └── Tools (Plugin Architecture)
        ├── GenerateQuizHandler
        ├── CreateNotesHandler
        ├── ExplainConceptsHandler
        ├── CreateLearningPathHandler
        ├── AnalyzePerformanceHandler
        ├── GenerateFlashcardsHandler
        ├── ScheduleSpacedRepetitionHandler
        └── GenerateExamHandler

Key Architectural Components:
  • Tool Registry: Central registry managing all tool definitions and handlers
  • Tool Orchestrator: Executes tools with validation and error handling
  • Tool Definitions: Metadata describing each tool (parameters, examples, confirmations)
  • Tool Handlers: Self-contained business logic implementations
  • Topic Selector: Centralized topic formatting and deduplication module
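To make the registry pattern concrete, here is a minimal Python sketch. Names such as `ToolDefinition`, `register`, and `classification_prompt` are illustrative, not our production API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolDefinition:
    """Metadata describing a tool: name, description, and required parameters."""
    name: str
    description: str
    required_params: list[str] = field(default_factory=list)

class ToolRegistry:
    """Central registry mapping tool definitions to their handlers."""
    def __init__(self):
        self._definitions: dict[str, ToolDefinition] = {}
        self._handlers: dict[str, Callable[..., dict]] = {}

    def register(self, definition: ToolDefinition, handler: Callable[..., dict]) -> None:
        self._definitions[definition.name] = definition
        self._handlers[definition.name] = handler

    def get_handler(self, name: str) -> Callable[..., dict]:
        return self._handlers[name]

    def classification_prompt(self) -> str:
        """Dynamically build the intent-classification prompt from registered tools."""
        lines = [f"- {d.name}: {d.description}" for d in self._definitions.values()]
        return "Available tools:\n" + "\n".join(lines)

registry = ToolRegistry()
registry.register(
    ToolDefinition("generate_quiz", "Create a quiz on a topic", ["topic"]),
    lambda topic: {"tool": "generate_quiz", "topic": topic},
)
```

Because the classification prompt is derived from the registry, adding a tool automatically teaches the intent classifier about it.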

The Controller Layer

The orchestration layer manages the complete request lifecycle, including:

  • Request lifecycle management — from user input to final response
  • Context management — fetching, caching, and refreshing user context
  • Error handling and fallbacks — graceful degradation when components fail
  • Model selection — choosing between fine-tuned and base models
  • Tool registration — automatically registers all tools on initialization

Request Flow:
  1. User sends natural language request (with optional conversation history)
  2. Controller fetches/validates cached user context
  3. IntentClassifier processes request with user data and conversation history
  4. System determines if intent is clear or requires clarification
  5. For clear intents: ToolOrchestrator validates parameters and executes appropriate tool
  6. Tool handler executes business logic and generates response
  7. Result returned to user with updated conversation history
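The flow above can be sketched as one small orchestration function. The collaborators here are injected stubs standing in for our real classes:

```python
from typing import Callable

def handle_request(
    message: str,
    history: list[dict],
    get_context: Callable[[], dict],
    classify: Callable[[str, list, dict], dict],
    execute_tool: Callable[[str, dict], dict],
) -> dict:
    """Steps 2-7 of the request flow, with collaborators injected for clarity."""
    context = get_context()                       # step 2: cached user context
    intent = classify(message, history, context)  # step 3: classify with data + history
    if not intent.get("clear"):                   # step 4: unclear -> ask for clarification
        return {"type": "clarification",
                "message": intent.get("question", "Could you tell me more about what you need?")}
    result = execute_tool(intent["tool"], intent["params"])  # steps 5-6: validate + execute
    updated = history + [{"role": "user", "content": message},
                         {"role": "assistant", "content": result}]
    return {"type": "tool_result", "result": result, "history": updated}  # step 7
```

Keeping the controller this thin is what lets every other concern live in its own module.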

The Modular Tool System

Our plugin-based architecture allows each tool to be a self-contained module with:

  • Tool Definition — Metadata describing capabilities, parameters, and examples
  • Tool Handler — Business logic implementation for executing the tool
  • Parameter Validation — Ensures required inputs are present and properly formatted
  • Error Handling — Graceful failure modes and user-friendly error messages

Benefits of This Architecture:
  • Single Source of Truth: Tool definitions centralized in the registry
  • Easy Extensibility: Add new tools without modifying core files
  • Type Safety: Full typing throughout the system
  • Testability: Each handler can be tested independently
  • Dynamic Prompt Generation: IntentClassifier adapts to available tools automatically
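A new tool then plugs in as one self-contained module. This hypothetical flashcards handler shows the shape (our production handlers differ in detail):

```python
class GenerateFlashcardsHandler:
    """Self-contained tool module: definition, validation, and business logic."""
    definition = {
        "name": "generate_flashcards",
        "description": "Create flashcards for a topic",
        "required_params": ["topic", "count"],
    }

    def validate(self, params: dict) -> list[str]:
        """Return the list of missing required parameters (empty means valid)."""
        return [p for p in self.definition["required_params"] if p not in params]

    def execute(self, params: dict) -> dict:
        missing = self.validate(params)
        if missing:
            # Graceful failure mode: a user-friendly error instead of an exception
            return {"error": f"Missing required parameters: {', '.join(missing)}"}
        cards = [{"front": f"{params['topic']} card {i + 1}", "back": "..."}
                 for i in range(params["count"])]
        return {"tool": "generate_flashcards", "cards": cards}
```

Registering the class is the only step that touches shared code; nothing in the controller or classifier changes.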

Vertical AI: Why We Chose Fine-Tuning

One of our most impactful decisions was to build Cereby AI as a vertical AI — a model fine-tuned specifically on academic content rather than using a general-purpose model.

The Fine-Tuning Strategy

We fine-tuned our model on a curated dataset of:

  • Open-source textbooks (OpenStax, LibreTexts)
  • Educational content across STEM, humanities, and social sciences
  • Pedagogical frameworks and learning science research
  • Structured Q&A pairs, exam questions, and academic materials

Technical Benefits

1. Reduced Prompt Engineering Overhead

Fine-tuned models understand academic context natively, allowing us to use focused, user-specific prompts rather than lengthy instructions about academic standards.

2. Consistent Academic Quality

Our fine-tuned model maintains consistency across all generated content:

  • Proper terminology usage
  • Alignment with standard curricula
  • Pedagogically sound progression
  • Accurate mathematical notation

3. Cost Optimization Through Efficiency

While fine-tuning had upfront costs, we've achieved:

  • 30-40% reduction in input tokens per request
  • Better first-attempt quality (fewer regeneration requests)
  • Higher user satisfaction (less need for corrections)

Model Management

Our system handles:

  • Version tracking and gradual rollouts
  • Automatic fallback to base model if fine-tuned model is unavailable
  • A/B testing between model versions
  • Performance monitoring per model
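A sketch of how fallback and gradual rollout might combine, assuming each user is hashed into a stable bucket (the `rollout_pct` knob and model names are illustrative):

```python
import hashlib

def select_model(user_id: str, fine_tuned_available: bool, rollout_pct: int = 50) -> str:
    """Pick a model version: gradual rollout by stable user bucket, with base-model fallback."""
    if not fine_tuned_available:
        return "base"  # automatic fallback when the fine-tuned model is unavailable
    # Hash the user id so each user lands in a stable 0-99 bucket across requests
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "fine_tuned" if bucket < rollout_pct else "base"
```

A stable bucket also makes A/B comparisons clean: a given user always sees the same model version during an experiment.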

Context Aggregation: The Data Challenge

Cereby AI's intelligence comes from aggregating data across multiple sources:

  • Quiz Performance — scores, weak areas, time spent, difficulty levels
  • Learning Paths — progress, topic mastery, completion rates
  • Calendar Events — upcoming exams, study sessions, deadlines
  • Notes Content — topics covered, embedded quizzes, flashcard performance

Conversation History as Context:

The system also leverages conversation history for:

  • Reference resolution — Understanding "this", "that", "it" from previous messages
  • Follow-up requests — "Quiz on this" after an explanation
  • Contextual continuity — Maintaining conversation flow across multiple interactions

The Performance Challenge

The initial implementation was slow: each request triggered multiple database queries, and context aggregation took 2-3 seconds for users with extensive history.

Solution: Multi-Layered Caching

We implemented a sophisticated caching strategy:

1. Context Cache
  • Stores complete user context in optimized format
  • Short TTL for real-time accuracy
  • Automatic expiration and refresh
2. Weak Points Cache
  • Recalculated periodically (performance doesn't change minute-to-minute)
  • Stores results with computed metrics
3. Query-Level Optimizations
  • Combined queries using efficient patterns
  • Indexed all frequently queried columns
  • Used aggregation for nested data

Result: Context aggregation now takes 200-400ms instead of 2-3 seconds.
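The context cache can be illustrated with a small TTL cache. The injectable clock exists only to make the sketch testable; it is not part of our actual interface:

```python
import time

class ContextCache:
    """TTL cache for aggregated user context; expired entries are reloaded on access."""
    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, dict]] = {}

    def get(self, user_id: str, loader):
        """Return cached context if still fresh, otherwise reload and cache it."""
        entry = self._store.get(user_id)
        if entry is not None and self.clock() - entry[0] < self.ttl:
            return entry[1]  # cache hit: skip the expensive aggregation
        value = loader()     # cache miss or expiry: run the aggregation once
        self._store[user_id] = (self.clock(), value)
        return value
```

The short TTL keeps context near real-time while absorbing the burst of requests a single study session generates.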

Intent Classification: Understanding Natural Language

The IntentClassifier uses advanced NLP to map natural language requests to specific actions. It leverages the Tool Registry to dynamically generate classification prompts. The system handles:

  • Direct requests: "Create notes on my weak points in calculus"
  • Concept explanations: "Explain the chain rule" or "What is photosynthesis?"
  • Contextual requests: "Create a learning path for my physics exam"
  • Conversation-aware requests: "Quiz on this" (references previous explanation)
  • Casual conversation: "Hi, how are you?" (no specific intent)
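In production this classification is an LLM call against the registry-generated prompt, but the mapping it performs can be illustrated with a toy rule-based stand-in:

```python
import re

def classify_intent(message: str) -> dict:
    """Toy stand-in for the LLM intent classifier: map a message to a tool or no intent."""
    m = message.lower()
    if re.search(r"\b(hi|hello|hey|how are you)\b", m):
        return {"intent": "casual", "clear": False}   # greeting, no learning intent
    if "quiz" in m:
        return {"intent": "generate_quiz", "clear": True}
    if m.startswith("explain") or m.startswith("what is"):
        return {"intent": "explain_concept", "clear": True}
    if "learning path" in m:
        return {"intent": "create_learning_path", "clear": True}
    if "notes" in m:
        return {"intent": "create_notes", "clear": True}
    return {"intent": "unknown", "clear": False}      # ask the user to clarify
```

The real classifier also extracts structured parameters (topic, subject, difficulty), which keyword rules cannot do reliably; that is precisely why we use an LLM here.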

Chat History Integration

One of our most impactful improvements was integrating conversation history into intent classification. The system maintains context across multiple messages, enabling natural follow-up conversations.

Conversation Context Extraction:

The system extracts concepts, subjects, and topics from recent assistant messages to understand references in follow-up queries.

Reference Resolution:

When users say "this", "that", or "it", the system resolves these references from conversation history.

Example Conversation Flow:
User: "Explain cell reproduction in biology"
Assistant: [Provides detailed explanation of cell reproduction]

User: "Quiz on this"

System Analysis:
  - Detects "this" refers to the previous explanation
  - Extracts "cell reproduction" and "biology" from conversation history
  - Generates a quiz on "cell reproduction" in "biology"
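A simplified sketch of that resolution step, assuming each assistant turn carries a hypothetical `topic` field extracted at generation time:

```python
import re

def resolve_references(message: str, history: list[dict]) -> str:
    """Replace 'this'/'that'/'it' with the topic of the latest assistant turn, if any."""
    if not re.search(r"\b(this|that|it)\b", message, re.IGNORECASE):
        return message  # nothing deictic to resolve
    for turn in reversed(history):
        # `topic` is a hypothetical metadata field stored alongside assistant replies
        if turn.get("role") == "assistant" and turn.get("topic"):
            return re.sub(r"\b(this|that|it)\b", turn["topic"], message, flags=re.IGNORECASE)
    return message  # no resolvable antecedent; leave the message unchanged
```

In practice the resolution happens inside the classifier itself, but the principle is the same: the antecedent comes from conversation history, not the current message.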

Intent Detection and Casual Conversation

The system can detect when there's no clear learning intent and respond naturally with a friendly, contextual reply. This approach gives us:

  • Conversation continuity — Understands references across messages
  • Smart context selection — Uses relevant data when needed, conversation history when appropriate
  • Natural interaction — Handles greetings and casual conversation gracefully
  • Structured extraction — Still extracts parameters when intent is clear

Tool Implementation

Note Creation: From Context-Rich to Zero-Context

Cereby AI's note creation capability works across the entire spectrum — from creating highly personalized notes based on weak points and performance data, to generating comprehensive study materials on entirely new subjects where no prior context exists.

The Multi-Tiered Approach:

When a user requests notes, Cereby AI follows a cascading strategy:

  1. Check for learning path topics (most relevant)
  2. Check for weak points (personalized)
  3. Fallback to general notes (zero context)
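The cascade reduces to a short selection function; field names like `learning_path_topics` are illustrative:

```python
def pick_note_topics(context: dict, requested_subject: str) -> tuple[str, list[str]]:
    """Cascading topic selection: learning path -> weak points -> zero-context fallback."""
    if context.get("learning_path_topics"):
        return "learning_path", context["learning_path_topics"]   # most relevant
    if context.get("weak_points"):
        return "weak_points", context["weak_points"]              # personalized
    return "general", [requested_subject]                         # zero context
```

Returning the tier alongside the topics lets the note generator adjust its prompt: personalized tiers reference past performance, while the general tier produces a broad overview.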

This multi-tiered approach ensures Cereby AI works for:

  • New students exploring subjects for the first time
  • Advanced learners diving into specialized topics
  • Casual learners who haven't taken quizzes yet
  • Anyone who wants to learn something completely new

By gracefully degrading from context-rich to zero-context generation, Cereby AI maintains its value proposition even when users have no prior interaction history.

Concept Explanation

The Explain Concepts capability provides detailed, structured explanations tailored to the user's learning level and context.

Structured Explanation Format:
  • Core Definition — Clear, concise explanation of the concept
  • Key Principles — Fundamental rules or theorems
  • Step-by-Step Examples — Worked examples at the user's level
  • Common Pitfalls — Mistakes to avoid (especially relevant for weak points)
  • Related Concepts — Connections to other topics the user has studied
  • Practice Applications — Real-world or exam-style applications

Adaptive Difficulty:

The system adjusts explanation depth based on user performance, providing more examples and detail for weak points, and advanced connections for mastered concepts.

Database Design

We extended our schema with several optimized tables:

Context Cache Table

Stores aggregated user context for fast retrieval with:

  • User identification and scoping
  • JSONB storage for flexible schema
  • Timestamp tracking for expiration
  • Optimized indexes for quick lookups

Flashcards Table

Implements spaced repetition with algorithm integration:

  • User-scoped card storage
  • Review scheduling metadata
  • Performance tracking
  • Algorithm parameters (ease factor, interval)
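Our scheduler is in this family. As an illustration, here is a generic SM-2-style update of the ease factor and interval (not necessarily our exact parameters):

```python
def sm2_update(ease: float, interval_days: int, quality: int) -> tuple[float, int]:
    """One review step of an SM-2-style scheduler; quality is graded 0-5."""
    # Ease-factor update from the classic SM-2 formula, floored at 1.3
    new_ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if quality < 3:
        return new_ease, 1          # failed recall: restart at a 1-day interval
    if interval_days == 0:
        return new_ease, 1          # first successful review
    if interval_days == 1:
        return new_ease, 6          # second successful review
    return new_ease, round(interval_days * new_ease)  # later reviews grow by ease
```

The ease factor and interval are exactly the two algorithm parameters the table stores per card.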

Concept Explanations Table

Stores generated explanations for quick reference:

  • Subject and concept organization
  • Difficulty level tracking
  • Related concepts linking
  • Access timestamp for analytics

Key Design Decisions:
  • JSONB for flexibility — Allows schema evolution without migrations
  • Indexed user identification — Fast user-scoped queries
  • Timestamp tracking — Enables time-based analysis

API Architecture

We implemented a dual-endpoint strategy:

1. Unified Chat Endpoint

Natural language interface for the UI: accepts free-form text and processes it through the full intent classification pipeline.

2. Direct Capability Endpoints

Programmatic access for integrations: structured requests that bypass intent classification for known operations.

This architecture gives us:

  • Flexibility — Support both natural language and structured requests
  • Backwards compatibility — Direct endpoints for existing integrations
  • Rate limiting — Different limits per endpoint type

Performance Optimizations

1. Parallel Data Fetching

Context aggregation fetches data sources in parallel rather than sequentially, dramatically reducing latency.
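A sketch with asyncio and stubbed fetches: `gather` awaits the three sources concurrently, so total latency tracks the slowest source rather than the sum (the function names and payloads here are illustrative):

```python
import asyncio

# Stubbed data-source fetches; in production these are database queries.
async def fetch_quiz_stats(user_id: str) -> dict:
    await asyncio.sleep(0.05)
    return {"avg_score": 0.72}

async def fetch_notes_summary(user_id: str) -> dict:
    await asyncio.sleep(0.05)
    return {"note_count": 12}

async def fetch_calendar(user_id: str) -> dict:
    await asyncio.sleep(0.05)
    return {"next_exam": "2026-02-01"}

async def aggregate_context(user_id: str) -> dict:
    # gather() runs all three fetches concurrently instead of one after another
    quizzes, notes, calendar = await asyncio.gather(
        fetch_quiz_stats(user_id), fetch_notes_summary(user_id), fetch_calendar(user_id)
    )
    return {"quizzes": quizzes, "notes": notes, "calendar": calendar}

context = asyncio.run(aggregate_context("user-123"))
```

With four or five sources each taking tens of milliseconds, this alone recovers most of the latency that sequential fetching wastes.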

2. Streaming Responses

For long-form content generation (exams, comprehensive notes, concept explanations), we stream responses to provide better perceived performance.

3. Request Deduplication

We cache similar requests within a short time window to avoid redundant AI calls.
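A minimal deduplicator keyed on a normalized (user, message) pair; the window length and normalization rules here are illustrative:

```python
import hashlib
import time

class RequestDeduplicator:
    """Collapse identical requests arriving within a short window into one AI call."""
    def __init__(self, window_seconds: float = 30.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._recent: dict[str, tuple[float, dict]] = {}

    def get_or_compute(self, user_id: str, message: str, compute):
        # Normalize so trivially different phrasings of the same request share a key
        key = hashlib.sha256(f"{user_id}:{message.strip().lower()}".encode()).hexdigest()
        entry = self._recent.get(key)
        if entry is not None and self.clock() - entry[0] < self.window:
            return entry[1]  # reuse the recent AI response
        result = compute()   # outside the window: make the AI call
        self._recent[key] = (self.clock(), result)
        return result
```

This mainly absorbs accidental double-submits and impatient retries, which would otherwise each cost a full model call.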

Error Handling and Reliability

Graceful Degradation

When components fail, Cereby AI degrades gracefully with fallback strategies for model unavailability, partial context, and missing data.

Context Fallbacks

If context aggregation fails, the system:

  • Uses cached context (even if slightly stale)
  • Proceeds with partial context
  • Requests user clarification if critical data is missing

Zero-Context Operation:

One of our most important design decisions was ensuring Cereby AI can operate effectively even without any user context. This is valuable for new users, exploratory learning, and first-time topics.

Retry Logic

For transient failures:

  • Exponential backoff for API rate limits
  • Circuit breaker pattern for repeated failures
  • Automatic retry with jitter
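The retry policy can be sketched as follows (circuit breaking omitted for brevity; delays and attempt counts are illustrative):

```python
import random
import time

def retry_with_backoff(fn, max_attempts: int = 4, base_delay: float = 0.5, sleep=time.sleep):
    """Retry a transient-failure-prone call with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff caps the delay; jitter spreads retries from many
            # clients so they don't hammer a rate-limited API in lockstep
            delay = base_delay * (2 ** attempt)
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is injectable only so the sketch can be tested without real waiting.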

Security and Privacy

Data Isolation

All queries are user-scoped with proper access controls.

Row-Level Security

Database policies ensure no cross-user data access.

AI Data Handling

  • User data is sent to the AI provider only for processing
  • No data retention (configured in API settings)
  • Encrypted context cache in database

Monitoring and Analytics

We track comprehensive metrics:

Usage Metrics:
  • Requests per user per day
  • Capability distribution (which features are used most)
  • Average response time per capability

Quality Metrics:
  • User satisfaction scores
  • Content regeneration rate (proxy for quality)
  • Weak point improvement correlation

Technical Metrics:
  • Cache hit rates
  • Model selection (fine-tuned vs base)
  • Error rates by type
  • Cost per request

These metrics inform model retraining decisions, performance optimization priorities, and feature development roadmap.

Lessons Learned

1. Start with Caching Early

We initially focused on feature development and added caching later. Building caching from day one would have saved significant refactoring time.

2. Fine-Tuning is Worth the Investment

The upfront cost of fine-tuning was substantial, but the quality improvements and cost savings over time have more than justified it.

3. Context is King (But Not Always Required)

The quality of context aggregation directly impacts AI output quality when it exists. However, we also learned that graceful degradation is essential. Not every user interaction has rich context, and the system must work beautifully even with zero context.

4. Design for Extensibility

Our plugin-based modular architecture makes adding new capabilities straightforward. Each new tool follows a consistent pattern without requiring modifications to core files.

5. User Feedback Loops Matter

Early user testing revealed that academic accuracy was paramount. This guided our fine-tuning dataset curation and quality assurance processes.

6. Conversation History Transforms User Experience

Adding conversation history support was a game-changer. Users could finally have natural, flowing conversations instead of treating each message as isolated.

Future Enhancements

We're working on:

  • Subject-specific fine-tuning — Separate models for STEM vs humanities
  • Multi-modal understanding — Process diagrams, formulas, and images
  • Proactive suggestions — AI-initiated study recommendations
  • Collaborative features — Study groups with shared Cereby insights
  • Voice interaction — Natural language voice commands

Conclusion

Building Cereby AI taught us that creating truly intelligent, context-aware AI systems requires:

  1. Vertical specialization — Fine-tuning for domain expertise
  2. Robust data aggregation — Comprehensive context is non-negotiable
  3. Conversation awareness — Understanding references and maintaining context across messages
  4. Smart intent detection — Knowing when to act vs. when to have a casual conversation
  5. Modular architecture — Plugin-based system enables clean separation and easy extensibility
  6. Performance optimization — Caching and parallelization are critical
  7. User-centric design — Quality metrics must align with user outcomes

Cereby AI is now used by thousands of students, generating personalized study materials that adapt to their unique learning journey. The technical foundation we built allows us to iterate quickly and add new capabilities without architectural changes.

For developers working on similar systems, our key takeaway is: invest in context aggregation (both structured data and conversation history) and vertical specialization from the start, but always design for graceful degradation. The system must provide value immediately for new users (zero-context operation) while leveraging rich context when available for personalization.


Want to learn more about Cereby AI or join our team? Check out our careers page or reach out on Twitter.

Visual Summary

```mermaid
flowchart TD
    A[User Request] --> B[CerebyAIController]
    B --> C[Context Aggregator]
    B --> D[Intent Classifier]
    D --> E{Intent Clear?}
    E -->|Yes| F[Tool Orchestrator]
    E -->|No| G[Clarification Response]
    F --> H[Tool Registry]
    H --> I[Selected Tool Handler]
    I --> J[AI Response + Updated Context]
```