# Sketch AIDA (Analysis & Interactive Director Assistant)

## Frontend Integration Points

### 1. Tool calls drive UI interactions
Every LLM response comes with a `toolCalls` array. The frontend must handle two tool types:
| Tool call type | Frontend action |
|---|---|
| `request_polygon_selection` | Open the polygon drawing modal on the sketch image. The `data` object contains `instruction`, `maxSelections`, `annotationLabelPrefix`, and `purpose` (`character_mapping` or `keep_region`). This handler is believed to work already but should be double-checked: currently, once the drawing modal opens, the chat becomes hard to read, so the user may not know what they are supposed to annotate. |
| `finalize_sketch_analysis` | Close the chat and mark the analysis complete. This tool call signals that the analysis is done: the frontend should refresh shot details (the metadata was already updated server-side) and close or collapse the AIDA panel. |
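Below is a minimal TypeScript sketch of how the frontend might dispatch these two tool calls. It assumes each entry in `toolCalls` carries a `type` discriminator and a `data` payload. Only the four `data` fields of `request_polygon_selection` come from this page. The UI callback names (`openPolygonModal`, `refreshShotDetails`, `collapseAidaPanel`) are hypothetical.

```typescript
// Hypothetical shapes: this page only states that each response carries a
// `toolCalls` array and that polygon requests have a `data` object with these fields.
type PolygonPurpose = "character_mapping" | "keep_region";

interface PolygonSelectionData {
  instruction: string;
  maxSelections: number;
  annotationLabelPrefix: string; // "C" or "R"
  purpose: PolygonPurpose;
}

// Assumption: the tool name is exposed as `type`; the real field name may differ.
type ToolCall =
  | { type: "request_polygon_selection"; data: PolygonSelectionData }
  | { type: "finalize_sketch_analysis"; data?: unknown };

// UI callbacks supplied by the frontend (hypothetical names).
interface AidaUiHandlers {
  openPolygonModal(data: PolygonSelectionData): void;
  refreshShotDetails(): Promise<void>;
  collapseAidaPanel(): void;
}

async function handleToolCalls(toolCalls: ToolCall[], ui: AidaUiHandlers): Promise<void> {
  for (const call of toolCalls) {
    switch (call.type) {
      case "request_polygon_selection":
        // Pass the instruction through so it can stay visible next to the
        // drawing canvas and the user knows what to annotate.
        ui.openPolygonModal(call.data);
        break;
      case "finalize_sketch_analysis":
        // Metadata was already written server-side; re-fetch before closing.
        await ui.refreshShotDetails();
        ui.collapseAidaPanel();
        break;
    }
  }
}
```

The code awaits the refresh before it collapses the panel. As a result, the shot view shows the updated metadata rather than a stale copy.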
### 2. Triggering the flow
Send a message to `POST /api/sketchChat/message` with:
- `userMessage`: the user's prompt, e.g. "analyze this sketch"
- `sketchImageUrl`: the URL of the uploaded reference (sketch, photo, or z-depth map)
- `storyId`, `sceneId`, `shotId`
The guru (`SketchAnalysisGuru`) auto-detects the reference type (sketch, photo, or depth map) and starts the guided Q&A.
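The request can be sketched with the standard `fetch` API. The five body fields are the ones listed above. Three details are assumptions: the body is sent as JSON, the IDs are strings, and the endpoint returns the guru's `textResponse` + `toolCalls` JSON directly. Authentication is omitted.

```typescript
// Body fields as listed above; string IDs are an assumption.
interface SketchChatRequest {
  userMessage: string;
  sketchImageUrl: string;
  storyId: string;
  sceneId: string;
  shotId: string;
}

// Assumed response shape, mirroring the guru's JSON output.
interface SketchChatResponse {
  textResponse: string;
  toolCalls: Array<{ type: string; data?: unknown }>;
}

async function sendSketchChatMessage(
  req: SketchChatRequest,
  baseUrl = "", // empty string = same origin
): Promise<SketchChatResponse> {
  const res = await fetch(`${baseUrl}/api/sketchChat/message`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) {
    throw new Error(`sketchChat request failed with status ${res.status}`);
  }
  return (await res.json()) as SketchChatResponse;
}
```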
### 3. What AIDA produces
The Q&A conversation results in two outputs:
- Shot detail updates: metadata fields such as `description`, `shotSize`, `actingInstructions`, `cameraAngle`, etc. are updated directly on the shot document via `finalize_sketch_analysis`. The frontend must refresh shot details after the final tool call.
- Generation notes: stored in `sketchAnalyses[]` on the shot. When `generateFromSketch` runs later, it picks up the matching analysis (by `sketchImageUrl`) and feeds `generationNotes` and `annotatedSketchImageUrl` into the image prompt. This tells the image model how to use the reference (e.g., "PHOTO — composition/pose only, do NOT copy style").
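The lookup that `generateFromSketch` performs can be sketched as follows. `SketchAnalysisSummary` is a hypothetical subset of `SketchAnalysisResult`; the real interface lives in `src/models/storyVideos.model.ts`. Falling back to the raw reference when no annotated image exists is an assumption.

```typescript
// Hypothetical subset of SketchAnalysisResult fields named on this page.
interface SketchAnalysisSummary {
  sketchImageUrl: string;
  generationNotes: string;
  annotatedSketchImageUrl?: string;
}

interface ReferenceGuidance {
  notes: string;    // tells the image model how to use the reference
  imageUrl: string; // annotated image if present, else the raw reference (assumption)
}

// Select the analysis whose sketchImageUrl matches the reference being used.
function findSketchGuidance(
  sketchAnalyses: SketchAnalysisSummary[],
  sketchImageUrl: string,
): ReferenceGuidance | undefined {
  const match = sketchAnalyses.find((a) => a.sketchImageUrl === sketchImageUrl);
  if (!match) return undefined;
  return {
    notes: match.generationNotes,
    imageUrl: match.annotatedSketchImageUrl ?? match.sketchImageUrl,
  };
}
```

The match is keyed on `sketchImageUrl`, so a shot with several references keeps each analysis separate.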
## How It Works
```text
User uploads reference (sketch/photo/z-depth)
│
▼
SketchChatController
- Loads/creates chat session
- Builds system prompt with shot metadata context
- Restores accumulated polygon state from previous turns
│
▼
SketchAnalysisGuru (extends AgenticGuru)
- Sends image + conversation to GPT-5.1 (vision)
- Single-pass mode (enableAgenticLoop: false)
- Returns JSON with textResponse + toolCalls
│
▼
Multi-turn Q&A (one question per turn):
1. Reference type identification (sketch? photo? depth map?)
2. Shot size / framing comparison
3. Character visibility and polygon mapping (prefix "C")
4. Environment / setting mismatch
5. Description mismatch
6. Action / pose / physics mismatch
7. Whole reference vs keep/use region polygon (prefix "R")
8. Final review (contains "FINAL REVIEW — awaiting confirmation")
9. User confirms → finalize_sketch_analysis tool call
│
▼
finalize_sketch_analysis:
- Burns all accumulated polygon annotations onto the sketch image
- Saves SketchAnalysisResult (generationNotes, characterMappings, keepRegion, etc.)
- Applies confirmed metadata updates to the shot document
```
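The confirmation step (steps 8 and 9) depends on the final-review marker string and on the `finalReviewPresented` flag kept in `analysisState`. Below is a minimal sketch of that gate. It assumes the flag is set by scanning the assistant's text for the marker. The regex that recognizes a confirming reply is purely illustrative, because this page does not say how confirmation is detected.

```typescript
// Marker the assistant's text contains at step 8 (copied from the flow above).
const FINAL_REVIEW_MARKER = "FINAL REVIEW — awaiting confirmation";

interface AnalysisState {
  finalReviewPresented: boolean;
}

// After each assistant turn, remember whether the final review has been shown.
function updateAnalysisState(state: AnalysisState, assistantText: string): AnalysisState {
  return {
    ...state,
    finalReviewPresented:
      state.finalReviewPresented || assistantText.includes(FINAL_REVIEW_MARKER),
  };
}

// Illustrative only: treat a reply starting with an affirmative word as confirmation.
const CONFIRM_PATTERN = /^\s*(yes|confirm(ed)?|looks good|ok(ay)?)\b/i;

// Finalization is allowed only once the review was presented AND the user confirmed.
function shouldFinalize(state: AnalysisState, userMessage: string): boolean {
  return state.finalReviewPresented && CONFIRM_PATTERN.test(userMessage);
}
```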
## Polygon Types

| Purpose | Prefix | Max selections | When |
|---|---|---|---|
| `character_mapping` | `C` | N (one per visible character) | Multiple characters need identification |
| `keep_region` | `R` | 1 | User wants to use only part of the reference |
Polygons accumulate across turns. At finalization, all of them are burned onto the original sketch image, producing a single `annotatedSketchImageUrl`.
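The following sketch shows how polygon selections could accumulate across turns under the rules in the table. Each label is the request's `annotationLabelPrefix` plus a running index (`C1`, `C2`, `R1`), and each request is capped at `maxSelections`. Several details are hypothetical: the point representation, the function name, and the rule that a new keep region replaces the previous one. The actual merging logic lives in `sketchAnalysisGuru.ts`.

```typescript
type PolygonPurpose = "character_mapping" | "keep_region";
type Point = [number, number]; // hypothetical: pixel coordinates on the sketch

interface LabeledPolygon {
  label: string; // prefix + running index, e.g. "C1", "C2", "R1"
  purpose: PolygonPurpose;
  points: Point[];
}

// Merge one turn's selections into the polygons accumulated so far.
function mergePolygonSelections(
  accumulated: LabeledPolygon[],
  shapes: Point[][],
  purpose: PolygonPurpose,
  prefix: string,
  maxSelections: number,
): LabeledPolygon[] {
  if (shapes.length > maxSelections) {
    throw new Error(`Expected at most ${maxSelections} polygon(s), got ${shapes.length}`);
  }
  if (shapes.some((pts) => pts.length < 3)) {
    throw new Error("A polygon needs at least 3 vertices");
  }
  // Assumption: keep_region allows one polygon overall, so a new one replaces the old.
  const kept =
    purpose === "keep_region"
      ? accumulated.filter((p) => p.purpose !== "keep_region")
      : accumulated;
  // Continue numbering from the polygons already carrying this prefix.
  let next = kept.filter((p) => p.label.startsWith(prefix)).length;
  const added = shapes.map((points) => ({ label: `${prefix}${++next}`, purpose, points }));
  return [...kept, ...added];
}
```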
## Key Files

- `src/workbench/sketchChat/sketchChat.controller.ts`: controller, session management, state restoration
- `src/agenticGuru/sketchAnalysisGuru.ts`: guru class, tool execution, polygon merging
- `src/agenticGuru/toolSchemas/sketchAnalysisTools.ts`: finalize tool schema, allowed metadata fields
- `src/promptTemplates/agenticGuruSystemPrompts/sketchAnalysisGuruPrompt.ts`: system prompt
- `src/workbench/shotImageGeneration/shotImageGeneration.service.ts`: downstream consumer (`generateFromSketch` reads `sketchAnalyses[]`)
- `src/models/storyVideos.model.ts`: `SketchAnalysisResult` interface, `upsertSketchAnalysis`, `updateShotFields`
## Session Persistence

Each sketch analysis session is keyed by (`storyId`, `sceneId`, `shotId`, `sketchImageUrl`). The session stores:
- Full conversation history
- Accumulated polygon selections
- `analysisState` (tracks `finalReviewPresented` for the confirmation trigger)
- `activeSketchImageUrl`
Multiple sketches per shot are supported: each gets its own session and its own `SketchAnalysisResult` in the `sketchAnalyses[]` array.
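Below is a minimal in-memory sketch of the session keying described above. The four-part key and the stored fields come from this page. The `InMemorySessionStore` class, the JSON-encoded key, and the message shape are hypothetical, and this page does not describe the controller's real storage backend.

```typescript
interface SessionKey {
  storyId: string;
  sceneId: string;
  shotId: string;
  sketchImageUrl: string;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Fields the session stores, per this page; types are illustrative.
interface SketchChatSession {
  history: ChatMessage[];
  polygons: unknown[]; // accumulated polygon selections
  analysisState: { finalReviewPresented: boolean };
  activeSketchImageUrl: string;
}

// JSON-encoding the tuple avoids collisions a plain separator join could cause.
function sessionKey(k: SessionKey): string {
  return JSON.stringify([k.storyId, k.sceneId, k.shotId, k.sketchImageUrl]);
}

// Hypothetical store: one session per (story, scene, shot, sketch) combination.
class InMemorySessionStore {
  private sessions = new Map<string, SketchChatSession>();

  loadOrCreate(k: SessionKey): SketchChatSession {
    const id = sessionKey(k);
    let session = this.sessions.get(id);
    if (!session) {
      session = {
        history: [],
        polygons: [],
        analysisState: { finalReviewPresented: false },
        activeSketchImageUrl: k.sketchImageUrl,
      };
      this.sessions.set(id, session);
    }
    return session;
  }
}
```

Because `sketchImageUrl` is part of the key, a second reference on the same shot opens a fresh session instead of reusing the first one's history and polygons.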