
Sketch AIDA (Analysis & Interactive Director Assistant)

Frontend Integration Points

1. Tool calls drive UI interactions

Every LLM response comes with a toolCalls array. The frontend must handle two tool types:

Tool call type | Frontend action
request_polygon_selection | Open the polygon drawing modal on the sketch image. The data object contains instruction, maxSelections, annotationLabelPrefix, and purpose (character_mapping or keep_region). This probably already works, but please double-check it: when the drawing modal appears, the chat is currently hard to read, so the user may not fully know what they are supposed to annotate.
finalize_sketch_analysis | Close the chat / mark the analysis complete. This tool call means the analysis is done. The frontend should refresh shot details (metadata was updated server-side) and close or collapse the AIDA panel.
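
As a rough illustration of the two cases above, a frontend dispatcher could look like the sketch below. The wire shape of a tool call (a `name` plus a `data` object) and the `AidaUi` hooks are assumptions made for this example; only the tool names and the fields inside `data` come from this document.

```typescript
// Assumed shapes: only the tool names and the data field names are taken
// from this document; everything else is hypothetical.
type PolygonPurpose = "character_mapping" | "keep_region";

interface PolygonSelectionData {
  instruction: string;
  maxSelections: number;
  annotationLabelPrefix: string;
  purpose: PolygonPurpose;
}

type ToolCall =
  | { name: "request_polygon_selection"; data: PolygonSelectionData }
  | { name: "finalize_sketch_analysis"; data?: unknown };

// Hypothetical UI hooks; wire these to the real modal and panel components.
interface AidaUi {
  openPolygonModal(data: PolygonSelectionData): void;
  refreshShotDetails(): void;
  collapsePanel(): void;
}

function handleToolCalls(toolCalls: ToolCall[], ui: AidaUi): void {
  for (const call of toolCalls) {
    switch (call.name) {
      case "request_polygon_selection":
        // Pass the instruction along so the modal can show what to annotate.
        ui.openPolygonModal(call.data);
        break;
      case "finalize_sketch_analysis":
        // Metadata was updated server-side, so refresh before collapsing.
        ui.refreshShotDetails();
        ui.collapsePanel();
        break;
    }
  }
}
```

Showing `data.instruction` inside the drawing modal itself is one possible way to address the readability concern noted for request_polygon_selection.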

2. Triggering the flow

Send a message to POST /api/sketchChat/message with:

  • userMessage: anything like "analyze this sketch"
  • sketchImageUrl: the uploaded reference (sketch, photo, or z-depth map)
  • storyId, sceneId, shotId
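
A minimal sketch of building that request follows. The field names and the endpoint path are taken from this section; the string ID types, the JSON content type, and the helper name `buildSketchChatRequest` are assumptions.

```typescript
// Field names come from this document; ID types are assumed to be strings.
interface SketchChatMessageRequest {
  userMessage: string;
  sketchImageUrl: string;
  storyId: string;
  sceneId: string;
  shotId: string;
}

interface ShotRef {
  storyId: string;
  sceneId: string;
  shotId: string;
}

// Hypothetical helper: returns the URL and request options instead of
// sending anything, so it can be unit-tested without a network.
function buildSketchChatRequest(
  shot: ShotRef,
  sketchImageUrl: string,
  userMessage: string = "analyze this sketch"
) {
  const payload: SketchChatMessageRequest = {
    userMessage,
    sketchImageUrl,
    storyId: shot.storyId,
    sceneId: shot.sceneId,
    shotId: shot.shotId,
  };
  return {
    url: "/api/sketchChat/message",
    init: {
      method: "POST",
      // Assumption: the endpoint accepts a JSON body.
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}
```

In a browser, the result can be passed straight to `fetch(url, init)`; any authentication headers the API requires are not covered here.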

The guru auto-detects the reference type (sketch, photo, depth map) and starts the guided Q&A.

3. What AIDA produces

The Q&A conversation results in two outputs:

  1. Shot detail updates — metadata fields like description, shotSize, actingInstructions, cameraAngle, etc. are updated directly on the shot document via finalize_sketch_analysis. The frontend must refresh shot details after the final tool call.

  2. Generation notes — stored in sketchAnalyses[] on the shot. When generateFromSketch runs later, it picks up the matching analysis (by sketchImageUrl) and feeds generationNotes + annotatedSketchImageUrl into the image prompt. This tells the image model how to use the reference (e.g., "PHOTO — composition/pose only, do NOT copy style").
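
The lookup described in item 2 could be sketched as below. This is not the actual generateFromSketch code: the `SketchAnalysisResult` fields shown are only the ones named in this document, the helper name is hypothetical, and falling back to the original sketch URL when no annotated image exists is an assumption.

```typescript
// Subset of SketchAnalysisResult, limited to fields named in this document.
interface SketchAnalysisResult {
  sketchImageUrl: string;
  generationNotes: string;
  annotatedSketchImageUrl?: string;
}

interface ReferenceInput {
  generationNotes: string;
  referenceImageUrl: string;
}

// Hypothetical helper: find the analysis for a given sketch and return
// what would be fed into the image prompt.
function pickReferenceInput(
  sketchAnalyses: SketchAnalysisResult[],
  sketchImageUrl: string
): ReferenceInput | undefined {
  for (const analysis of sketchAnalyses) {
    if (analysis.sketchImageUrl === sketchImageUrl) {
      return {
        generationNotes: analysis.generationNotes,
        // Assumption: use the original sketch if no annotated image exists.
        referenceImageUrl:
          analysis.annotatedSketchImageUrl ?? analysis.sketchImageUrl,
      };
    }
  }
  // No analysis was saved for this sketch.
  return undefined;
}
```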

How It Works

User uploads reference (sketch/photo/z-depth)
         │
         ▼
   SketchChatController
   - Loads/creates chat session
   - Builds system prompt with shot metadata context
   - Restores accumulated polygon state from previous turns
         │
         ▼
   SketchAnalysisGuru (extends AgenticGuru)
   - Sends image + conversation to GPT-5.1 (vision)
   - Single-pass mode (enableAgenticLoop: false)
   - Returns JSON with textResponse + toolCalls
         │
         ▼
   Multi-turn Q&A (one question per turn):
   1. Reference type identification (sketch? photo? depth map?)
   2. Shot size / framing comparison
   3. Character visibility and polygon mapping (prefix "C")
   4. Environment / setting mismatch
   5. Description mismatch
   6. Action / pose / physics mismatch
   7. Whole reference vs keep/use region polygon (prefix "R")
   8. Final review (contains "FINAL REVIEW — awaiting confirmation")
   9. User confirms → finalize_sketch_analysis tool call
         │
         ▼
   finalize_sketch_analysis:
   - Burns all accumulated polygon annotations onto the sketch image
   - Saves SketchAnalysisResult (generationNotes, characterMappings, keepRegion, etc.)
   - Applies confirmed metadata updates to the shot document
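
Steps 8 and 9 of the flow suggest a confirmation gate along the following lines. This is only a guess at the mechanism: the marker string and the finalReviewPresented flag appear in this document, but detecting the marker by substring match and ignoring finalize calls before the review was presented are assumptions, as are the function names.

```typescript
// Marker text copied from step 8 of the flow above.
const FINAL_REVIEW_MARKER = "FINAL REVIEW — awaiting confirmation";

interface AnalysisState {
  finalReviewPresented: boolean;
}

// Assumption: the flag flips once any assistant response contains the marker
// and stays set for the rest of the session.
function nextAnalysisState(
  state: AnalysisState,
  textResponse: string
): AnalysisState {
  const presentedNow = textResponse.indexOf(FINAL_REVIEW_MARKER) !== -1;
  return { finalReviewPresented: state.finalReviewPresented || presentedNow };
}

// Assumption: a finalize call only counts after the final review was shown.
function shouldFinalize(state: AnalysisState, toolCallNames: string[]): boolean {
  return (
    state.finalReviewPresented &&
    toolCallNames.indexOf("finalize_sketch_analysis") !== -1
  );
}
```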

Polygon Types

Purpose | Prefix | Max selections | When
character_mapping | C | N (one per visible character) | Multiple characters need identification
keep_region | R | 1 | User wants to use only part of the reference

Polygons accumulate across turns. All are burned onto the original sketch image at finalization, producing a single annotatedSketchImageUrl.
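
One way to picture the accumulation is the merge sketched below. The real merging logic lives in sketchAnalysisGuru.ts and may differ: the `PolygonAnnotation` shape, labels such as "C1" or "R1", and the rule that a later polygon replaces an earlier one with the same label are all assumptions for illustration.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical annotation shape; labels like "C1" or "R1" are assumed to be
// the prefix from the table above followed by a running number.
interface PolygonAnnotation {
  label: string;
  purpose: "character_mapping" | "keep_region";
  points: Point[];
}

// Merge this turn's polygons into the accumulated set. A polygon whose label
// already exists replaces the earlier one in place; new labels are appended.
function mergePolygons(
  accumulated: PolygonAnnotation[],
  incoming: PolygonAnnotation[]
): PolygonAnnotation[] {
  const merged = accumulated.slice(); // do not mutate the stored session state
  for (const polygon of incoming) {
    const index = merged.map((p) => p.label).indexOf(polygon.label);
    if (index === -1) {
      merged.push(polygon);
    } else {
      merged[index] = polygon;
    }
  }
  return merged;
}
```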

Key Files

  • src/workbench/sketchChat/sketchChat.controller.ts — controller, session management, state restoration
  • src/agenticGuru/sketchAnalysisGuru.ts — guru class, tool execution, polygon merging
  • src/agenticGuru/toolSchemas/sketchAnalysisTools.ts — finalize tool schema, allowed metadata fields
  • src/promptTemplates/agenticGuruSystemPrompts/sketchAnalysisGuruPrompt.ts — system prompt
  • src/workbench/shotImageGeneration/shotImageGeneration.service.ts — downstream consumer (generateFromSketch reads sketchAnalyses[])
  • src/models/storyVideos.model.ts — SketchAnalysisResult interface, upsertSketchAnalysis, updateShotFields

Session Persistence

Each sketch analysis session is keyed by (storyId, sceneId, shotId, sketchImageUrl). The session stores:

  • Full conversation history
  • Accumulated polygon selections
  • analysisState (tracks finalReviewPresented for the confirmation trigger)
  • activeSketchImageUrl

Multiple sketches per shot are supported — each gets its own session and SketchAnalysisResult in the sketchAnalyses[] array.
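
To illustrate the keying, here is a small in-memory sketch. The actual controller presumably persists sessions elsewhere; the `SketchChatSession` field names, the JSON-array key encoding, and the plain-object store are assumptions, and only the four key components and the stored items come from this document.

```typescript
interface SessionKeyParts {
  storyId: string;
  sceneId: string;
  shotId: string;
  sketchImageUrl: string;
}

// Assumed field names for the items this section says a session stores.
interface SketchChatSession {
  conversation: { role: "user" | "assistant"; content: string }[];
  polygons: unknown[];
  analysisState: { finalReviewPresented: boolean };
  activeSketchImageUrl: string;
}

// Encoding the tuple as a JSON array avoids collisions that a plain
// delimiter could cause when a URL contains that delimiter.
function sessionKey(parts: SessionKeyParts): string {
  return JSON.stringify([
    parts.storyId,
    parts.sceneId,
    parts.shotId,
    parts.sketchImageUrl,
  ]);
}

// Hypothetical in-memory store standing in for real persistence.
function loadOrCreateSession(
  store: Record<string, SketchChatSession>,
  parts: SessionKeyParts
): SketchChatSession {
  const key = sessionKey(parts);
  if (!store[key]) {
    store[key] = {
      conversation: [],
      polygons: [],
      analysisState: { finalReviewPresented: false },
      activeSketchImageUrl: parts.sketchImageUrl,
    };
  }
  return store[key];
}
```

Because the sketch URL is part of the key, two references uploaded for the same shot get separate sessions.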