Auto-Grading System

Overview

The auto-grading system provides real-time feedback on story quality by automatically triggering grading when significant content changes are detected. This eliminates the need for users to manually request grading after every edit.


How It Works

Trigger Flow

  1. User updates scroll content via POST /api/story/updateScroll
  2. System checks conditions:
       • Is grading already in progress? If yes, skip
       • Has content changed significantly (word count delta check), OR do grading criteria differ from the last graded result?
  3. If grading is not in progress and either change condition holds, create an IN_PROGRESS grading entry
  4. Response is returned immediately with an isStoryGradingInProgress: boolean flag
  5. Grading executes asynchronously in the background (fire-and-forget)
  6. Frontend polls POST /api/story/grading/status to get results
  7. When grading completes, the status updates to COMPLETED and results are available
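
The condition checks in the flow above can be sketched as a pure function. This is a minimal illustration, not the actual implementation: `shouldTriggerGrading`, `TriggerInputs`, and its field names are hypothetical. Only the 50-word threshold and the two trigger conditions come from this document.

```typescript
// Hypothetical inputs for the trigger decision; field names are illustrative.
interface TriggerInputs {
  gradingInProgress: boolean; // latest grading entry is IN_PROGRESS
  maxWordCountDelta: number;  // largest per-scroll word count change
  criteriaMismatch: boolean;  // code criteria differ from last graded result
}

const AUTO_GRADING_WORD_COUNT_THRESHOLD = 50;

function shouldTriggerGrading(inputs: TriggerInputs): boolean {
  // Never start a second grading while one is already running.
  if (inputs.gradingInProgress) return false;

  const contentChanged =
    inputs.maxWordCountDelta >= AUTO_GRADING_WORD_COUNT_THRESHOLD;

  // Either condition on its own is enough to trigger grading.
  return contentChanged || inputs.criteriaMismatch;
}
```

Keeping the decision free of I/O makes it straightforward to unit test independently of the updateScroll handler.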

Content Change Detection

Algorithm: MAX Word Count Delta

The system compares current scroll content with the last graded snapshot:

  • For each scroll: Calculate absolute word count difference from snapshot
  • Take MAX: Use the largest delta across all scrolls
  • Decision: Trigger grading if maxWordCountDelta >= 50 words

Edge Cases

First Grading (No Snapshots):

  • Always triggers grading
  • Uses total word count as delta

New Scroll Added:

  • Scroll has no snapshot
  • Treats new scroll's word count as delta
  • Triggers if new scroll >= 50 words

Scroll Deleted:

  • Snapshot exists but scroll is missing
  • Uses deleted scroll's word count as delta
  • Triggers if the deleted scroll had >= 50 words, so the grade reflects the removal

Multiple Scrolls Changed:

  • Only MAX matters, not total
  • Example: Scroll A +10 words, Scroll B +60 words → Triggers (60 >= 50)
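
The MAX delta rule and its edge cases might look roughly like the sketch below. It assumes scroll text is already available as plain strings keyed by scroll ID and that words are separated by whitespace. `countWords`, `maxWordCountDelta`, and `TextByScrollId` are hypothetical names. The "always triggers" rule for a first grading is left to the caller; the function only returns the delta.

```typescript
// Hypothetical shape: scrollId -> plain text of that scroll.
type TextByScrollId = Record<string, string>;

function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function maxWordCountDelta(
  current: TextByScrollId,
  snapshots: TextByScrollId,
): number {
  const currentIds = Object.keys(current);
  const snapshotIds = Object.keys(snapshots);

  // First grading: no snapshots yet, so the total word count is the delta.
  if (snapshotIds.length === 0) {
    return currentIds.reduce(
      (sum, id) => sum + countWords(current[id] ?? ""),
      0,
    );
  }

  let maxDelta = 0;
  // Visiting an ID twice is harmless because max is idempotent.
  for (const id of currentIds.concat(snapshotIds)) {
    // A missing side counts as zero words (new or deleted scroll).
    const now = countWords(current[id] ?? "");
    const before = countWords(snapshots[id] ?? "");
    maxDelta = Math.max(maxDelta, Math.abs(now - before));
  }
  return maxDelta;
}
```

With this sketch, Scroll A growing by 10 words and Scroll B by 60 yields a delta of 60, not 70, matching the example above.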

Criteria Mismatch Detection

Purpose

When grading criteria are updated in code (e.g. a new criterion added), stories that were graded with the old criteria will be missing ratings for the new criterion. The frontend can break when it expects certain criteria to exist.

Algorithm

  • Compare criteriaId set from SharedConstants.STORY_GRADING_CRITERIA with the criteriaId set from the latest COMPLETED grading's ratings
  • If they differ (different count or different IDs), trigger re-grading on next scroll update

When This Triggers

  • New criteria added: Story was graded with [A, B, C], code now has [A, B, C, D] → Triggers
  • Criteria removed: Story was graded with [A, B, C, D], code now has [A, B, C] → Triggers
  • Criteria renamed/changed: Any ID difference → Triggers
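
The comparison amounts to a set-equality check on criteria IDs. A minimal sketch, assuming both sides are available as string arrays; `hasCriteriaMismatch` is a hypothetical name, and how the IDs are read from SharedConstants.STORY_GRADING_CRITERIA or from stored ratings is not shown.

```typescript
function hasCriteriaMismatch(
  currentCriteriaIds: string[],
  gradedCriteriaIds: string[],
): boolean {
  const current = new Set(currentCriteriaIds);
  const graded = new Set(gradedCriteriaIds);

  // Different counts means something was added or removed.
  if (current.size !== graded.size) return true;

  // Same count: any current ID missing from the graded set is a mismatch.
  return currentCriteriaIds.some((id) => !graded.has(id));
}
```

Order does not matter here: a story graded with [C, B, A] matches code criteria [A, B, C].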

Grading Status States

IN_PROGRESS

  • Grading is currently running
  • Created before async grading starts
  • Prevents concurrent grading for same story
  • Updated to COMPLETED or FAILED when done

COMPLETED

  • Grading finished successfully
  • Contains ratings array with criteria scores
  • Snapshots saved for next comparison
  • Displayed to user via polling API

FAILED

  • Grading encountered an error
  • Contains error message for debugging
  • Does NOT update snapshots
  • User can manually trigger grading again

Snapshot System

Purpose

Snapshots store the content state at the time of last successful grading. This enables accurate change detection.

Structure

{
  [scrollId]: {
    content: string,        // Plain text extracted from editorState
    snapshotTakenAt: Date   // Timestamp for debugging
  }
}

Lifecycle

Created:

  • After COMPLETED grading only (not on FAILED)
  • Only includes non-deleted scrolls
  • Stored in story.lastGradedScrollSnapshots

Updated:

  • Each successful grading replaces all snapshots
  • Deleted scrolls removed from snapshots
  • Ensures clean state for next comparison

Used:

  • Every scroll update triggers comparison
  • Deleted scroll snapshots ignored (not compared)
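
Building a fresh snapshot map after a successful grading could be sketched as follows. The snapshot shape matches the structure above; `ScrollLike` and its fields (`id`, `plainText`, `isDeleted`) are assumptions made for illustration, and extracting plain text from editorState is out of scope here.

```typescript
interface ScrollSnapshot {
  content: string;       // Plain text extracted from editorState
  snapshotTakenAt: Date; // Timestamp for debugging
}

type ScrollSnapshots = Record<string, ScrollSnapshot>;

// Hypothetical minimal view of a scroll.
interface ScrollLike {
  id: string;
  plainText: string;
  isDeleted: boolean;
}

function buildSnapshots(scrolls: ScrollLike[], now: Date): ScrollSnapshots {
  // Start from an empty map so each successful grading replaces all snapshots.
  const snapshots: ScrollSnapshots = {};
  for (const scroll of scrolls) {
    if (scroll.isDeleted) continue; // only non-deleted scrolls are kept
    snapshots[scroll.id] = { content: scroll.plainText, snapshotTakenAt: now };
  }
  return snapshots;
}
```

Rebuilding the whole map, rather than patching the old one, is what drops snapshots of deleted scrolls.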

Concurrency & Race Conditions

Prevention Mechanisms

Single IN_PROGRESS Check:

  • Before creating new IN_PROGRESS entry, check if one already exists
  • Uses latest entry sorted by createdAt (not array position)
  • Prevents multiple concurrent gradings
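
Selecting the latest entry by createdAt instead of array position might look like this. A sketch only: `GradingEntryLike`, `latestEntry`, and `isGradingInProgress` are hypothetical names, and real entries carry more fields than shown.

```typescript
type GradingStatus = "IN_PROGRESS" | "COMPLETED" | "FAILED";

// Hypothetical minimal view of a gradingHistory entry.
interface GradingEntryLike {
  status: GradingStatus;
  createdAt: Date;
}

function latestEntry<T extends GradingEntryLike>(history: T[]): T | undefined {
  let latest: T | undefined;
  for (const entry of history) {
    // Compare timestamps; never rely on where the entry sits in the array.
    if (!latest || entry.createdAt.getTime() > latest.createdAt.getTime()) {
      latest = entry;
    }
  }
  return latest;
}

function isGradingInProgress(history: GradingEntryLike[]): boolean {
  const latest = latestEntry(history);
  return latest !== undefined && latest.status === "IN_PROGRESS";
}
```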

MongoDB Array Filters:

  • Updates use arrayFilters: [{ 'elem.status': 'IN_PROGRESS' }]
  • Updates ALL IN_PROGRESS entries (handles stuck entries gracefully)
  • Atomic operations prevent race conditions
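
The completion update could be assembled roughly as below, using MongoDB's filtered positional operator `$[elem]` together with `arrayFilters`. This sketch only builds plain objects and does not talk to a database. The `ratings` and `completedAt` field paths and the function name `buildCompletionUpdate` are assumptions; the array filter and the `gradingHistory` / `lastGradedScrollSnapshots` field names come from this document.

```typescript
function buildCompletionUpdate(
  ratings: unknown[],
  snapshots: Record<string, unknown>,
  completedAt: Date,
) {
  return {
    update: {
      $set: {
        // $[elem] targets every array element matched by arrayFilters.
        "gradingHistory.$[elem].status": "COMPLETED",
        "gradingHistory.$[elem].ratings": ratings,
        "gradingHistory.$[elem].completedAt": completedAt, // hypothetical field
        // Snapshots are written in the same update as the status change.
        lastGradedScrollSnapshots: snapshots,
      },
    },
    options: {
      arrayFilters: [{ "elem.status": "IN_PROGRESS" }],
    },
  };
}
```

With the MongoDB Node.js driver, the two parts would typically be passed as the second and third arguments of `collection.updateOne(filter, update, options)`, where the filter selects the story document.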

Fire-and-Forget with Error Handling:

  • Async execution doesn't block API response
  • Errors captured and logged to New Relic
  • Request context automatically preserved via AsyncLocalStorage
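
A fire-and-forget helper with error capture can be as small as the sketch below. `fireAndForget` and the `onError` callback are hypothetical; the callback stands in for whatever reports to New Relic, and AsyncLocalStorage context propagation is not shown.

```typescript
function fireAndForget(
  task: () => Promise<void>,
  onError: (error: unknown) => void,
): void {
  // Starting from a resolved promise means a synchronous throw inside
  // `task` is captured as a rejection too, instead of escaping to the caller.
  Promise.resolve()
    .then(task)
    .catch(onError);
}
```

The caller returns immediately and never awaits the task, so the handler must be the only place where a background failure surfaces.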

Integration Points

Database Schema

Story Model Fields:

  • gradingHistory[] - Array of grading entries with status, ratings, timestamps
  • lastGradedScrollSnapshots - Map of scrollId to snapshot data

API Endpoints

POST /api/story/updateScroll

  • Response includes isStoryGradingInProgress flag
  • Frontend starts polling if true

POST /api/story/grading/status

  • Returns latest grading status
  • If IN_PROGRESS, returns previous COMPLETED result + flag to continue polling
  • If COMPLETED/FAILED, returns that result + flag to stop polling
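
How the status endpoint picks what to return might be sketched like this. The response field `result` and the names `resolveGradingStatus` and `HistoryEntry` are assumptions; only `isStoryGradingInProgress` and the three status values come from this document.

```typescript
type GradingStatus = "IN_PROGRESS" | "COMPLETED" | "FAILED";

// Hypothetical minimal view of a gradingHistory entry.
interface HistoryEntry {
  status: GradingStatus;
  createdAt: Date;
}

interface GradingStatusResponse<T> {
  result: T | null;                  // hypothetical field name
  isStoryGradingInProgress: boolean; // true = keep polling
}

function resolveGradingStatus<T extends HistoryEntry>(
  history: T[],
): GradingStatusResponse<T> {
  // Newest first, on a copy so the caller's array is left untouched.
  const sorted = history
    .slice()
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime());

  const latest = sorted[0];
  if (!latest) return { result: null, isStoryGradingInProgress: false };

  if (latest.status === "IN_PROGRESS") {
    // Show the most recent COMPLETED result while the new grading runs.
    const previous = sorted.filter((e) => e.status === "COMPLETED")[0];
    return { result: previous ?? null, isStoryGradingInProgress: true };
  }

  return { result: latest, isStoryGradingInProgress: false };
}
```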

LLM Integration

Guru Chat Context:

  • Latest COMPLETED grading result passed to LLM
  • Enables context-aware coaching based on story weaknesses
  • LLM focuses questions on areas with low ratings
  • Grading information not explicitly shown to user unless requested

Configuration

Constants

AUTO_GRADING_WORD_COUNT_THRESHOLD:

  • Default: 50 words
  • Controls sensitivity of change detection
  • Lower = more frequent grading, Higher = less frequent

Frontend Integration Requirements

Initial Page Load

  1. Fetch story data
  2. Call grading/status endpoint
  3. If isStoryGradingInProgress: true, start polling

On Scroll Update

  1. Call updateScroll endpoint
  2. Check isStoryGradingInProgress in response
  3. If true, start polling

Polling Strategy

  • Interval: 3 seconds recommended
  • Stop when isStoryGradingInProgress: false
  • Display previous results while waiting
  • Cleanup polling on component unmount
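
A framework-agnostic polling loop following these rules could look like the sketch below. `startGradingPolling` is a hypothetical helper; the caller supplies `fetchStatus` (for example, a wrapper around POST /api/story/grading/status) and `onUpdate`. The returned function is the cleanup to call on component unmount.

```typescript
interface StatusPayload {
  isStoryGradingInProgress: boolean;
}

function startGradingPolling<T extends StatusPayload>(
  fetchStatus: () => Promise<T>,
  onUpdate: (payload: T) => void,
  intervalMs = 3000,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    try {
      const payload = await fetchStatus();
      if (stopped) return; // stopped while the request was in flight
      onUpdate(payload);
      if (!payload.isStoryGradingInProgress) {
        stopped = true; // grading finished: stop polling
        return;
      }
    } catch {
      // A failed poll (or a throwing onUpdate) is not fatal;
      // try again on the next interval.
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };

  void tick(); // first check runs immediately

  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Scheduling the next poll only after the previous one settles avoids overlapping requests when the server is slow, which a fixed `setInterval` would not guarantee.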

Error Handling

Grading Failures

  • Entry updated to FAILED with error message
  • Snapshots NOT updated
  • User can manually trigger grading
  • Errors logged to New Relic with full context

Stuck IN_PROGRESS Entries

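  • Can occur if the server crashes during grading
  • The array filter handles these gracefully: the completion update applies to every IN_PROGRESS entry, so a stuck entry is updated along with the next grading that finishes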

Network Errors

  • Polling failures handled gracefully on frontend
  • Backend errors don't break user editing workflow
  • Fire-and-forget ensures API always responds quickly

Performance Considerations

Database Efficiency

  • Single DB call per updateScroll (get story)
  • Single atomic update on grading completion (status + snapshots)
  • Application-level scroll filtering minimizes query complexity

Computation

  • Word count calculation: O(n) where n = text length
  • Change detection: O(m) where m = number of scrolls
  • Runs in <5ms for typical stories (3-5 scrolls, 500-2000 words each)

API Response Time

  • updateScroll returns immediately (<200ms target)
  • Grading runs asynchronously (2-5 seconds)
  • No blocking operations in request chain

Monitoring & Debugging

Key Metrics to Track

  • Auto-grading trigger rate
  • Grading success vs failure rate
  • Average word count delta when triggered
  • Stuck IN_PROGRESS entry count
  • Average grading completion time

Error Types

  • FireAndForgetError - Background grading failure
  • No IN_PROGRESS grading entry found - Race condition or stuck entry
  • Both are logged to New Relic with full request context

Business Rules

When Auto-Grading Triggers

  ✅ Content change >= 50 words in any scroll
  ✅ New scroll added with >= 50 words
  ✅ Scroll deleted with >= 50 words
  ✅ Grading criteria in code differ from last graded result (e.g. new criterion added)
  ✅ No grading currently in progress (precondition for all of the above)

When Auto-Grading Skips

  ❌ Content change < 50 words AND criteria match last graded
  ❌ Grading already in progress
  ❌ Only canvas position or title changed (no editorState change)

Snapshot Behavior

  • Only saved on COMPLETED (not FAILED)
  • Only includes non-deleted scrolls
  • Deleted scrolls trigger grading once, then removed from snapshots
  • Ensures deleted content doesn't repeatedly trigger grading

Limitations & Future Improvements

Current Limitations

  • Simple word count threshold (doesn't detect semantic changes)
  • No debouncing (triggers on every qualifying update)
  • No user control (always enabled)