Engineering
March 5, 2026
9 min read

Accurate Citations with Compressed Context: A Two-Stage Verification System

How we keep citations accurate even when the AI sees only 5% of the original document

The Citation Challenge

In our previous posts about Hierarchical Context Compression and Query-Aware Smart Compression, we detailed how Cereby AI reduces token usage by 90-95% while maintaining response quality. But compression introduces a critical challenge:

How do you provide accurate citations when the AI never sees most of the document?

The Problem

Consider this scenario:

Student uploads: 100-page biology textbook (150,000 tokens)
Student asks: "What is cellular respiration?"

After compression:
  ✅ AI receives: 8,000 tokens (5% of original)
  ✅ AI generates: "Cellular respiration is the process..."
  ❓ Citation needed: Which page? What's the exact quote?

Problem: The AI only saw pages 45-46 (compressed). How do we verify the quote actually exists on those pages?

Why Citations Matter

For educational content, accurate citations are non-negotiable:

  1. Academic Integrity — Students need proper source attribution
  2. Verification — Users must be able to check the AI's claims
  3. Trust — Incorrect citations destroy confidence in the system
  4. Navigation — Citations help users find related information
  5. Legal Compliance — Required for copyrighted academic materials

The Stakes: A citation pointing to the wrong page or containing a fabricated quote could lead to academic dishonesty accusations, loss of user trust, and potential legal issues.

The Two-Stage Solution

We developed a two-stage architecture that separates AI response generation from citation verification:

Stage 1: Compressed Response Generation
  ↓
Large Document (150K tokens)
  ↓
Compression Pipeline (90-95% reduction)
  ↓
Compressed Context (8K tokens)
  ↓
AI Generates Response (sees only compressed content)
  ↓
Response with Quotes

Stage 2: Citation Extraction and Verification
  ↓
Full Original Content (stored in database)
  ↓
Extract Quotes from Response
  ↓
Search Full Original Content
  ↓
Find Exact Locations
  ↓
Generate Verified Citations
  ↓
Final Response with Accurate Citations

The Key Insight: The AI generates responses from compressed content, but citations are always verified against the full original content stored in the database.

Stage 1: Compressed Response Generation

First, we generate the AI response using our compression pipeline:

The Process

  1. Compress file context (90-95% reduction)
  2. Build system prompt with citation requirements
  3. Generate AI response (sees only compressed content)
  4. Extract initial response for citation processing

The AI sees compressed content and generates a response, naturally including quotes and information from the content it was shown.
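The four steps above can be sketched as a minimal pipeline. Everything here is illustrative: `naive_compress` is a stand-in for the real 90-95% compression pipeline (here it just truncates to a budget), and `call_model` stands in for the actual LLM call.

```python
def naive_compress(full_text: str, budget_chars: int) -> str:
    # Step 1 placeholder: the real pipeline selects relevant pages/chunks;
    # this sketch simply truncates to a character budget.
    return full_text[:budget_chars]

def build_system_prompt(compressed: str) -> str:
    # Step 2: embed the citation requirement alongside the compressed context.
    return (
        "Answer using ONLY the context below. "
        'Wrap any verbatim source text in double quotes ("...").\n\n'
        f"CONTEXT:\n{compressed}"
    )

def generate_response(full_text: str, call_model, budget_chars: int = 8000) -> str:
    # Steps 3-4: generate a response (model sees only compressed content),
    # then hand the raw response off for citation processing.
    compressed = naive_compress(full_text, budget_chars)
    return call_model(build_system_prompt(compressed))

# Usage with a stubbed model call:
fake_model = lambda prompt: 'Cells "convert glucose into ATP energy".'
print(generate_response("glucose metabolism... " * 10_000, fake_model))
# → Cells "convert glucose into ATP energy".
```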

Stage 2: Citation Extraction and Verification

After the AI generates its response, we extract citations and verify them against the full original content:

The Citation Extraction Architecture

Our citation extraction system performs three key operations:

  1. Extract quotes from AI response using pattern matching
  2. Search full original content (not compressed version) for each quote
  3. Build citations with exact page numbers/timestamps/sections

Example Extraction:

AI Response:

Cellular respiration "converts glucose into ATP energy" through
three stages. According to the textbook, "the Krebs cycle produces
NADH and FADH2" which are essential for the electron transport chain.

Extracted Quotes:

  1. "converts glucose into ATP energy"
  2. "the Krebs cycle produces NADH and FADH2"
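The extraction step above can be approximated with simple pattern matching. This is a sketch, not our production extractor: it pulls double-quoted spans and drops very short ones (which would match too many locations in the source).

```python
import re

def extract_quotes(response: str, min_len: int = 15) -> list[str]:
    # Pull double-quoted spans out of the AI response; skip short quotes
    # because they are too ambiguous to locate reliably.
    candidates = re.findall(r'"([^"]+)"', response)
    return [q for q in candidates if len(q) >= min_len]

response = (
    'Cellular respiration "converts glucose into ATP energy" through '
    'three stages. According to the textbook, "the Krebs cycle produces '
    'NADH and FADH2" which are essential for the electron transport chain.'
)
print(extract_quotes(response))
# → ['converts glucose into ATP energy', 'the Krebs cycle produces NADH and FADH2']
```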

Location Verification: Full Content Search

This is the critical step — we search the full original content, not the compressed version.

The system:

  1. Accesses full original content from database
  2. Searches for each quote using case-insensitive matching
  3. Identifies exact location based on file type
  4. Builds location object with confidence score

The Magic: Even though the AI only saw 8,000 tokens of compressed content, we verify every quote against the full 150,000-token original document.
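A minimal sketch of the verification step, with hypothetical names (`Location`, `verify_quote`): the quote is matched case-insensitively against the full original content, and an exact hit yields a location object with top confidence.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Location:
    file_type: str    # "pdf", "video", or "web"
    locator: str      # page number, timestamp, or section heading
    confidence: float

def verify_quote(quote: str, full_content: str,
                 file_type: str, locator: str) -> Optional[Location]:
    # Search the FULL original content, never the compressed version
    # the model actually saw. Exact (case-insensitive) hits score 1.0;
    # fuzzy fallbacks (not shown here) would score lower.
    if quote.lower() in full_content.lower():
        return Location(file_type, locator, confidence=1.0)
    return None

full = "Page 45 text: cells convert glucose into ATP energy."
print(verify_quote("Convert Glucose into ATP energy", full, "pdf", "45"))
```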

Document Location: Page Number Identification

For PDF/document files, we identify the exact page number by:

  • Searching through all pages in full document
  • Finding page where quote appears
  • Extracting page number
  • Returning structured location object

Video/Audio Location: Timestamp Identification

For video and audio files, we identify the exact timestamp by:

  • Searching through full transcript
  • Finding segment where quote appears
  • Extracting timestamp
  • Formatting for display

Web Link Location: Section Identification

For web articles, we identify the section heading by:

  • Searching through all sections
  • Finding section where quote appears
  • Extracting heading name
  • Building section reference

Integration with Compression Systems

The citation system integrates seamlessly with both compression strategies:

Integration with Hierarchical Context Compression

When using page-level compression:

  • AI receives full text of most relevant pages
  • AI receives summaries of moderately relevant pages
  • Citation verification checks all pages (including omitted ones)

Key Point: Even pages that were summarized or omitted are still available for citation verification.

Integration with Query-Aware Smart Compression

When using chunk-level compression within a page:

  • AI receives most relevant chunks from each page
  • Chunks may have gaps between them
  • Citation verification checks entire page content (all chunks)

Critical Insight: The AI only saw selected chunks, but we verify citations against the entire page content. This ensures accuracy even with aggressive compression.

Fallback Strategies

What happens when citations can't be extracted normally?

Strategy 1: Explicit Citation Formats

We instruct the AI to use specific citation formats in the system prompt. The AI learns to include structured citations that are easy to parse.

Strategy 2: Fallback Citation Generation

If the AI doesn't provide explicit citations, we generate them automatically by:

  • Extracting key phrases from AI response
  • Searching full document for these phrases
  • Finding best matches with surrounding context
  • Generating citations pointing to those locations
  • Adding confidence scores (lower for fallback citations)
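The fallback steps above can be sketched as a sliding-window phrase search. This is illustrative only: `fallback_citations` and the 0.6 confidence value are assumptions, and real matching would need to tolerate punctuation differences (see the fuzzy-matching strategy below... handled separately).

```python
import re

def fallback_citations(response: str, full_text: str, window: int = 6) -> list[dict]:
    # Slide a `window`-word phrase over the AI response and keep phrases
    # that also appear (case-insensitively) in the full document.
    # Fallback hits get a lower confidence than explicit quotes.
    words = re.findall(r"[\w']+", response)
    haystack = " ".join(full_text.lower().split())
    hits = []
    for i in range(len(words) - window + 1):
        phrase = " ".join(words[i:i + window])
        if phrase.lower() in haystack:
            hits.append({"phrase": phrase, "confidence": 0.6})
    return hits

hits = fallback_citations(
    "The Krebs cycle produces NADH and FADH2 molecules.",
    "Background. The Krebs cycle produces NADH and FADH2 in the matrix.",
)
print([h["phrase"] for h in hits])
```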

Strategy 3: Quote Verification with Fuzzy Matching

Sometimes quotes have minor differences (punctuation, whitespace). Our system handles this by:

  • Attempting exact match first
  • Falling back to fuzzy match (ignoring punctuation and whitespace)
  • Normalizing both quote and content for comparison

Real-World Examples

Example 1: PDF Document with Compression

Setup:
  • File: 100-page biology textbook
  • Query: "Explain cellular respiration"
  • Compression: 150K → 8K tokens (95% reduction)

Compression Output:
  • Pages included (full): 45, 46
  • Pages included (summary): 12, 43, 44, 47
  • Pages omitted: 1-11, 13-42, 48-100

AI Response:

Cellular respiration is "the process by which cells convert glucose
into ATP energy" through three main stages. The Krebs cycle "produces
NADH and FADH2 which are used in the electron transport chain" to
generate the majority of ATP molecules.

Citation Extraction:
  • Quote 1: Searched full 150K-token document → Found on page 45
  • Quote 2: Searched full 150K-token document → Found on page 46

Example 2: Video with Timestamp Citations

Setup:
  • File: 45-minute biology lecture video
  • Query: "What did the professor say about mitochondria?"
  • Compression: Full transcript → relevant segments only

AI Response:

The professor explained that "mitochondria are the powerhouse of
the cell because they produce most of the ATP." Later in the lecture,
she noted that "mitochondrial dysfunction is linked to many diseases
including Parkinson's and Alzheimer's."

Citation Extraction:
  • Quote 1: Searched full 45-minute transcript → Found at 5:23
  • Quote 2: Searched full 45-minute transcript → Found at 38:12

Performance Characteristics

Citation Extraction Speed

Measured on 1,000 real queries:

| File Size | Pages/Length | Quotes | Extraction Time | Success Rate |
|-----------|--------------|--------|-----------------|--------------|
| Small     | 10 pages     | 2-3    | 50-100ms        | 98%          |
| Medium    | 50 pages     | 3-5    | 150-300ms       | 97%          |
| Large     | 200 pages    | 4-6    | 400-800ms       | 96%          |
| Video     | 60 min       | 3-4    | 200-400ms       | 95%          |

Key Insight: Even with 200-page documents, citation extraction completes in under 1 second.

Accuracy Metrics

We validated citation accuracy on 5,000 responses:

| Metric               | Result |
|----------------------|--------|
| Correct page numbers | 96.3%  |
| Correct timestamps   | 94.8%  |
| Correct sections     | 97.1%  |
| Quote accuracy       | 98.2%  |
| False citations      | 0.3%   |

False citations are cases where the AI cited something that doesn't exist in the source. Our verification system catches these and either corrects them or marks them as unverified.

Storage Efficiency

Citations add minimal storage overhead: approximately 350 bytes per citation. With an average of 3-5 citations per response, that totals roughly 1.5 KB per response.

Future Enhancements

1. Visual Citation Highlighting

Show citations with visual highlights in the original document, allowing users to scroll to exact location and see highlighted quote with surrounding context.

2. Citation Quality Scoring

Rate citation confidence and quality with scores for exact matches vs. paraphrases vs. inferred citations.

3. Cross-File Citation Linking

When quoting from multiple files, show relationships and agreement scores across sources.

4. Interactive Citation Verification

Let users verify and correct citations, with the system learning from corrections to improve future extractions.

Conclusion

The Citation Extractor system solves the fundamental challenge of providing accurate citations with compressed context through a two-stage architecture:

Stage 1: Response Generation
  • AI sees compressed content (5-8% of original)
  • Generates response with quotes and claims
  • Works with both page-level and chunk-level compression

Stage 2: Citation Verification
  • Extract quotes from AI response
  • Verify against full original content (100%)
  • Find exact page numbers / timestamps / sections
  • Generate accurate, verifiable citations

Key Results:

✅ 96.3% citation accuracy across 5,000 test responses
✅ <1 second extraction time even for 200-page documents
✅ Works with compression (no accuracy loss despite 95% token reduction)
✅ Multi-format support (PDFs, videos, audio, web articles)
✅ Fallback strategies ensure citations are always provided
✅ Minimal overhead (~1.5 KB per response)

The system demonstrates that aggressive compression and highly accurate citations are not mutually exclusive. By separating response generation (compressed context) from verification (full content), we achieve both token efficiency and citation accuracy.

For teams building similar systems, the key lesson is: Never verify against what the AI saw—always verify against the source of truth.


Interested in how we handle compression? Read about Hierarchical Context Compression for page-level compression and Query-Aware Smart Compression for chunk-level compression.

Visual Summary

flowchart TD
    A[Full Document] --> B[Hierarchical Compression]
    B --> C[Response Generation on Compressed Context]
    C --> D[Candidate Citations]
    D --> E[Verify Against Full Source]
    E --> F{Citation Exact?}
    F -->|Yes| G[Publish Response]
    F -->|No| H[Reject + Recompute Citation]
    H --> E