🎬 Dialogue System Flow - Complete Architecture¶

📋 Table of Contents¶

Overview
Creation Flow (Scene Generation)
Distribution Flow (Shot Generation)
Redistribution Flow (Shot Edits)
Data Structures
Key Design Principles
Complete Example
Dialogue Audio V2

Overview¶

The dialogue system follows a source-of-truth principle: scene-level dialogue (SceneDialogue[]) is canonical, and shot-level dialogue (ShotDialogue[]) is derived from it. This document explains the complete flow from dialogue creation through distribution and redistribution.

High-Level Flow¶

┌─────────────┐
│   SCRIPT    │ (Text with character dialogue)
└──────┬──────┘
       │
       ↓ [1. EXTRACTION]
┌─────────────────────────────────────┐
│   SCENE DIALOGUE (SceneDialogue[]) │
│   • dialogueId, speaker, text      │
│   • origin, timing, voiceDirection │
│   • Independent of shots           │
└──────┬──────────────────────────────┘
       │
       ↓ [2. DISTRIBUTION]
┌─────────────────────────────────────┐
│   SHOT DIALOGUE (ShotDialogue[])   │
│   • references SceneDialogue via   │
│     dialogueId + portion indices   │
│   • Derived from scene dialogue    │
└──────┬──────────────────────────────┘
       │
       ↓ [3. REDISTRIBUTION]
┌─────────────────────────────────────┐
│   UPDATED SHOT DIALOGUE             │
│   • Recalculated when shots change  │
│   • Fills gaps, maintains flow      │
└─────────────────────────────────────┘

1. Creation Flow (Scene Generation)¶

Trigger¶

User clicks "Generate Scenes" → generateScenesV2()

Step-by-Step Process¶

User Action: "Generate Scenes"
        ↓
┌───────────────────────────────────────────────────────┐
│ 1. LLM Call (generateScenesV2.template.ts)           │
├───────────────────────────────────────────────────────┤
│ Input:                                                │
│   • scriptContent (raw text)                          │
│   • characters[] (from story, with assetIds)          │
│   • environmentSettings[] (from story, with assetIds) │
│                                                       │
│ Prompt Instructions:                                  │
│   "Extract dialogue from script. Return:              │
│    - dialogue: Array of segments                      │
│    - Each segment: speaker, text, deliverySpeed,      │
│      voiceDirection, order"                           │
└───────────────────────────────────────────────────────┘
        ↓
┌───────────────────────────────────────────────────────┐
│ 2. LLM Output (Raw JSON)                              │
├───────────────────────────────────────────────────────┤
│ {                                                     │
│   scenes: [                                           │
│     {                                                 │
│       sceneIndex: 1,                                  │
│       title: "Koyal and Cuddles Reunion",            │
│       characters: [                                   │
│         {title: "Koyal", assetId: "char-uuid-1"},    │
│         {title: "Cuddles", assetId: "char-uuid-2"}   │
│       ],                                              │
│       dialogue: [  ← Raw LLM output                   │
│         {                                             │
│           speaker: {                                  │
│             type: "CHARACTER",                        │
│             characterName: "Koyal"                    │
│           },                                          │
│           text: "I was on my way to save you...",    │
│           deliverySpeed: "normal",                    │
│           voiceDirection: "apologetic, tired",        │
│           order: 1                                    │
│         },                                            │
│         {                                             │
│           speaker: {                                  │
│             type: "CHARACTER",                        │
│             characterName: "Cuddles"                  │
│           },                                          │
│           text: "You're a bird. You fly. What...",   │
│           deliverySpeed: "fast",                      │
│           voiceDirection: "sarcastic, annoyed",       │
│           order: 2                                    │
│         }                                             │
│       ]                                               │
│     }                                                 │
│   ]                                                   │
│ }                                                     │
└───────────────────────────────────────────────────────┘
        ↓
┌───────────────────────────────────────────────────────┐
│ 3. Backend Processing (workbench.service.ts)         │
│    convertLLMDialogueToSceneDialogue()               │
├───────────────────────────────────────────────────────┤
│ Step 3a: Map character names → assetIds              │
│   characterMap = {                                   │
│     "koyal": "char-uuid-1",                         │
│     "cuddles": "char-uuid-2"                        │
│   }                                                 │
│                                                      │
│ Step 3b: Generate dialogue UUIDs                    │
│   dialogueId = generateSegmentId()  // Backend UUID │
│                                                      │
│ Step 3c: Add assetId to speakers                    │
│   if (speaker.type === 'CHARACTER') {               │
│     speaker.assetId = characterMap.get(...)         │
│   }                                                 │
│                                                      │
│ Step 3d: Calculate timing (dialogue.utils.ts)       │
│   recalculateSceneDialogueTimings(dialogues)        │
│   • Loops through dialogues in order                │
│   • For each dialogue:                              │
│     - Base duration = text.length / CHARS_PER_SECOND│
│     - Add pauses for punctuation (., ?, !, etc.)    │
│     - startTime = previous.endTime                  │
│     - endTime = startTime + duration                │
└───────────────────────────────────────────────────────┘
        ↓
┌───────────────────────────────────────────────────────┐
│ 4. Final Scene Data (Saved to MongoDB)               │
├───────────────────────────────────────────────────────┤
│ scene: {                                              │
│   sceneId: "scene-uuid",                             │
│   sceneIndex: 1,                                      │
│   title: "Koyal and Cuddles Reunion",                │
│   characters: [...],  // AssetRef[]                  │
│   dialogues: [  ← CANONICAL DIALOGUE (SceneDialogue[])│
│     {                                                │
│       dialogueId: "dlg-abc-123",  ← Backend UUID    │
│       speaker: {                                     │
│         title: "Koyal",                              │
│         type: "CHARACTER",                           │
│         assetId: "char-uuid-1"  ← Mapped             │
│       },                                             │
│       text: "I was on my way to save you...",        │
│       origin: "SCRIPT",                              │
│       timing: {                                      │
│         startTime: 0,        ← Calculated            │
│         endTime: 3.99,       ← Calculated            │
│         deliverySpeed: "normal"                      │
│       },                                             │
│       voiceDirection: "apologetic, tired",           │
│       dialogueIndex: 1                               │
│     },                                               │
│     {                                                │
│       dialogueId: "dlg-def-456",                    │
│       speaker: {                                     │
│         title: "Cuddles",                            │
│         type: "CHARACTER",                           │
│         assetId: "char-uuid-2"                       │
│       },                                             │
│       text: "You're a bird. You fly. What...",       │
│       origin: "SCRIPT",                              │
│       timing: {                                      │
│         startTime: 3.99,     ← Sequential            │
│         endTime: 7.11,                               │
│         deliverySpeed: "fast"                        │
│       },                                             │
│       voiceDirection: "sarcastic, annoyed",          │
│       dialogueIndex: 2                               │
│     }                                                │
│   ],                                                 │
│   shots: []  ← Empty, shots not generated yet        │
│ }                                                     │
└───────────────────────────────────────────────────────┘
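The backend processing step (3a to 3d) can be sketched roughly as follows. This is a minimal illustration, not the actual `convertLLMDialogueToSceneDialogue()`: the `RawDialogue` and `SceneDialogueSketch` types, the `convertSketch` name, and the `makeId` parameter (a stand-in for `generateSegmentId()`) are assumptions made for the example, and the timing step ignores punctuation pauses.

```typescript
// Hypothetical shapes for illustration; the real types live in dialogue.types.ts.
type Speed = "slow" | "normal" | "fast";

interface RawDialogue {
  speaker: { type: "CHARACTER" | "NARRATOR"; characterName?: string };
  text: string;
  deliverySpeed: Speed;
  voiceDirection: string;
  order: number;
}

interface SceneDialogueSketch {
  dialogueId: string;
  speaker: { title: string; type: string; assetId?: string };
  text: string;
  origin: "SCRIPT";
  timing: { startTime: number; endTime: number; deliverySpeed: Speed };
  voiceDirection: string;
  dialogueIndex: number;
}

const CHARS_PER_SECOND: Record<Speed, number> = { slow: 12, normal: 18, fast: 26 };

function convertSketch(
  raw: RawDialogue[],
  characters: { title: string; assetId: string }[],
  makeId: () => string // stand-in for generateSegmentId()
): SceneDialogueSketch[] {
  // Step 3a: lower-cased character name -> assetId
  const characterMap: Record<string, string> = {};
  for (const c of characters) characterMap[c.title.toLowerCase()] = c.assetId;

  let clock = 0; // running scene time in seconds
  const ordered = raw.slice().sort((a, b) => a.order - b.order);
  return ordered.map((d): SceneDialogueSketch => {
    const name = d.speaker.characterName ?? d.speaker.type;
    const speaker: SceneDialogueSketch["speaker"] = { title: name, type: d.speaker.type };
    // Step 3c: only CHARACTER speakers get an assetId
    if (d.speaker.type === "CHARACTER") {
      const assetId = characterMap[name.toLowerCase()];
      if (assetId !== undefined) speaker.assetId = assetId;
    }
    // Step 3d (simplified): base duration only, punctuation pauses omitted
    const startTime = clock;
    clock = startTime + d.text.length / CHARS_PER_SECOND[d.deliverySpeed];
    return {
      dialogueId: makeId(), // Step 3b
      speaker,
      text: d.text,
      origin: "SCRIPT",
      timing: { startTime, endTime: clock, deliverySpeed: d.deliverySpeed },
      voiceDirection: d.voiceDirection,
      dialogueIndex: d.order,
    };
  });
}
```

Passing the id generator in as a parameter keeps the sketch deterministic in tests; each dialogue starts exactly where the previous one ends, which is the sequential behaviour shown in the final scene data.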

Code References¶

LLM Prompt: src/promptTemplates/workbenchV2/generateScenesV2.template.ts
Conversion Logic: src/workbench/workbench.service.ts (convertLLMDialogueToSceneDialogue)
Timing Calculation: src/shared/dialogue.utils.ts (recalculateSceneDialogueTimings)
UUID Generation: src/shared/dialogue.utils.ts (generateSegmentId)

2. Distribution Flow (Shot Generation)¶

Trigger¶

User generates shots (KEYS_ONLY mode) → generateShots()

Step-by-Step Process¶

User Action: "Generate Key Shots"
        ↓
┌───────────────────────────────────────────────────────┐
│ 1. LLM Call (generateShotsV2.template.ts)            │
├───────────────────────────────────────────────────────┤
│ Input:                                                │
│   • sceneData (with dialogues: SceneDialogue[])       │
│   • storyData                                         │
│   • mode: "KEYS_ONLY"                                 │
│                                                       │
│ Prompt Includes Scene Dialogue:                      │
│   "SCENE DIALOGUE (reference by index):               │
│    [                                                  │
│      {                                                │
│        segmentIndex: 0,  ← LLM uses this             │
│        dialogueIndex: 1,                             │
│        speaker: {...},                                │
│        text: "I was on my way...",                    │
│        timing: {startTime: 0, endTime: 3.99}          │
│      },                                               │
│      {                                                │
│        segmentIndex: 1,                               │
│        dialogueIndex: 2,                             │
│        speaker: {...},                                │
│        text: "You're a bird...",                      │
│        timing: {startTime: 3.99, endTime: 7.11}       │
│      }                                                │
│    ]                                                  │
│                                                       │
│    NOTE: Distribute dialogue across shots.            │
│    Use segmentIndex + portionStart/End."             │
└───────────────────────────────────────────────────────┘
        ↓
┌───────────────────────────────────────────────────────┐
│ 2. LLM Output (Raw JSON)                              │
├───────────────────────────────────────────────────────┤
│ {                                                     │
│   key_shots: [                                        │
│     {                                                 │
│       shotId: "K1",                                   │
│       cameraAngle: "MEDIUM_TWO_SHOT",                 │
│       duration: 4.5,                                  │
│       dialogueDistribution: {  ← Raw LLM output      │
│         segments: [                                   │
│           {                                           │
│             segmentIndex: 0,  ← References scene     │
│             portionStart: 0,   ← Char index          │
│             portionEnd: 66     ← Full segment        │
│           }                                           │
│         ]                                             │
│       }                                               │
│     },                                                │
│     {                                                 │
│       shotId: "K2",                                   │
│       cameraAngle: "CLOSE_UP",                        │
│       duration: 1.2,                                  │
│       dialogueDistribution: { segments: [] }  ← No dialogue│
│     },                                                │
│     {                                                 │
│       shotId: "K3",                                   │
│       cameraAngle: "MEDIUM_SINGLE",                   │
│       duration: 3.5,                                  │
│       dialogueDistribution: {                         │
│         segments: [                                   │
│           {                                           │
│             segmentIndex: 1,                          │
│             portionStart: 0,                          │
│             portionEnd: 38  ← Full segment           │
│           }                                           │
│         ]                                             │
│       }                                               │
│     }                                                 │
│   ]                                                   │
│ }                                                     │
└───────────────────────────────────────────────────────┘
        ↓
┌───────────────────────────────────────────────────────┐
│ 3. Backend Processing (testDebug.service.ts /        │
│    workbenchSceneGuru.convertDialogueDistribution)   │
├───────────────────────────────────────────────────────┤
│ For each shot:                                        │
│   For each segment in dialogueDistribution:           │
│     1. Get scene dialogue by segmentIndex             │
│     2. Create ShotDialogue entry via                  │
│        createShotDialogue(sceneDialogue,              │
│          portionStart, portionEnd, startOffset)       │
│                                                       │
│ Remove raw dialogueDistribution from shot             │
└───────────────────────────────────────────────────────┘
        ↓
┌───────────────────────────────────────────────────────┐
│ 4. Final Shot Data (Saved to scene.shots[])          │
├───────────────────────────────────────────────────────┤
│ shots: [                                              │
│   {                                                   │
│     shotId: "K1",                                     │
│     cameraAngle: "MEDIUM_TWO_SHOT",                   │
│     duration: 4.5,                                    │
│     dialogues: [  ← SHOT DIALOGUE (ShotDialogue[])    │
│       {                                               │
│         dialogueId: "dlg-abc-123", ← Links to scene  │
│         speaker: {...},  ← Copied from scene         │
│         portion: { start: 0, end: 66, text: "I was on my way to save you..." },│
│         timing: {                                     │
│           startOffset: 0,                             │
│           duration: 3.99                              │
│         },                                            │
│         voiceDirection: "apologetic, tired"           │
│       }                                               │
│     ]                                                 │
│   },                                                  │
│   {                                                   │
│     shotId: "K2",                                     │
│     duration: 1.2,                                    │
│     dialogues: []           ← Visual only             │
│   },                                                  │
│   {                                                   │
│     shotId: "K3",                                     │
│     duration: 3.5,                                    │
│     dialogues: [                                      │
│       {                                               │
│         dialogueId: "dlg-def-456",                   │
│         speaker: {...},                               │
│         portion: { start: 0, end: 38, text: "You're a bird. You fly. What..." },│
│         timing: {                                     │
│           startOffset: 0,                             │
│           duration: 3.12                              │
│         },                                            │
│         voiceDirection: "sarcastic, annoyed"          │
│       }                                               │
│     ]                                                 │
│   }                                                   │
│ ]                                                     │
└───────────────────────────────────────────────────────┘
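The conversion from the LLM's `dialogueDistribution.segments` to shot dialogue entries might look like the sketch below. It is not the real `convertDialogueDistribution` or `createShotDialogue`: the type names and `toShotDialogues` are hypothetical, the `speaker` copy is omitted for brevity, and the rule that a portion's duration equals its character share of the scene timing is an assumption.

```typescript
// Simplified, hypothetical shapes; field names follow the diagrams above.
interface SceneDlg {
  dialogueId: string;
  text: string;
  voiceDirection: string;
  timing: { startTime: number; endTime: number };
}
interface LlmSegment { segmentIndex: number; portionStart: number; portionEnd: number }
interface ShotDlg {
  dialogueId: string;
  portion: { start: number; end: number; text: string };
  timing: { startOffset: number; duration: number };
  voiceDirection: string;
}

function toShotDialogues(scene: SceneDlg[], segments: LlmSegment[]): ShotDlg[] {
  const result: ShotDlg[] = [];
  let startOffset = 0; // seconds already used inside this shot
  for (const seg of segments) {
    const source = scene[seg.segmentIndex];
    if (!source) continue; // the LLM referenced a segment that does not exist

    // Clamp the character range so a bad LLM index cannot overrun the text.
    const start = Math.min(Math.max(seg.portionStart, 0), source.text.length);
    const end = Math.min(Math.max(seg.portionEnd, start), source.text.length);
    if (end === start) continue; // empty portion, nothing to speak

    // Assumption: a portion's duration is its character share of the scene timing.
    const fullDuration = source.timing.endTime - source.timing.startTime;
    const duration = fullDuration * ((end - start) / source.text.length);

    result.push({
      dialogueId: source.dialogueId,
      portion: { start, end, text: source.text.slice(start, end) },
      timing: { startOffset, duration },
      voiceDirection: source.voiceDirection,
    });
    startOffset += duration;
  }
  return result;
}
```

Clamping and skipping invalid references is a defensive choice for the sketch; the actual post-processing may instead reject or log malformed LLM output.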

Key Point¶

Shot dialogue is DERIVED, not independent:
- Shots reference scene dialogue via dialogueId + portion.start/end
- Shots only cache the relevant substring and timing
- Full dialogue text and canonical timing live on the scene
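Because the cached substring can always be recomputed from the scene, a consistency check is cheap to write. The helpers below are a hypothetical sketch (`resolvePortionText` and `isPortionStale` are not names from the codebase) showing how a shot's cached `portion.text` could be compared against the canonical scene text.

```typescript
interface SceneDlg { dialogueId: string; text: string }
interface ShotPortion {
  dialogueId: string;
  portion: { start: number; end: number; text: string };
}

// Returns the text a shot portion should display, or null if the reference is broken.
function resolvePortionText(scene: SceneDlg[], shot: ShotPortion): string | null {
  for (const d of scene) {
    if (d.dialogueId !== shot.dialogueId) continue;
    const { start, end } = shot.portion;
    if (start < 0 || end > d.text.length || start >= end) return null;
    return d.text.slice(start, end);
  }
  return null; // no scene dialogue with that id
}

// A cached portion is stale when it no longer matches the canonical scene text.
function isPortionStale(scene: SceneDlg[], shot: ShotPortion): boolean {
  return resolvePortionText(scene, shot) !== shot.portion.text;
}
```

A stale or unresolvable portion is a signal to re-derive shot dialogue from the scene rather than to patch the shot in place.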

Code References¶

LLM Prompt: src/promptTemplates/workbenchV2/generateShotsV2.template.ts
Shot Generation: src/agenticGuru/workbenchSceneGuru.ts (generateShots)
Post-processing: src/testDebug/testDebug.service.ts (postProcessGenerateShots)

3. Redistribution Flow (Shot Edits)¶

Trigger¶

User adds filler shot → addFillerShot() → Dialogue must redistribute

Step-by-Step Process¶

User Action: "Add filler shot between K2 and K3"
        ↓
┌───────────────────────────────────────────────────────┐
│ 1. Current State (Before Filler)                      │
├───────────────────────────────────────────────────────┤
│ Scene Dialogue (UNCHANGED - source of truth):         │
│   dialogues: [                                        │
│     {dialogueId: "dlg-abc-123", text: "I was...", ...}│
│     {dialogueId: "dlg-def-456", text: "You're...", ...}│
│   ]                                                   │
│                                                       │
│ Shots:                                                │
│   K1 (4.5s): [dlg-abc-123: 0-66 chars]               │
│   K2 (1.2s): [] ← Visual only                        │
│   K3 (3.5s): [dlg-def-456: 0-38 chars]               │
│                                                       │
│ Problem: Shots total 9.2s; dialogue runs 7.11s.      │
│          dlg-def-456 sits wholly in K3 (38/38 chars).│
│          Adding a shot before K3 forces a re-split.  │
└───────────────────────────────────────────────────────┘
        ↓
┌───────────────────────────────────────────────────────┐
│ 2. Insert Filler Shot (workbenchSceneGuru.ts:1069)   │
├───────────────────────────────────────────────────────┤
│ New Shot Array:                                       │
│   K1 (4.5s)  ← Existing                              │
│   K2 (1.2s)  ← Existing                              │
│   F1 (1.0s)  ← NEW FILLER                            │
│   K3 (3.5s)  ← Existing                              │
│                                                       │
│ Total duration: 10.2s                                 │
│ Dialogue duration: 7.11s                              │
│                                                       │
│ Trigger Redistribution:                               │
│   redistributeDialogueAcrossShots(                    │
│     sceneDialogues: scene.dialogues,                  │
│     shotData: [                                       │
│       {shotId: "K1", timing: 4.5},                   │
│       {shotId: "K2", timing: 1.2},                   │
│       {shotId: "F1", timing: 1.0},  ← NEW            │
│       {shotId: "K3", timing: 3.5}                    │
│     ]                                                 │
│   )                                                   │
└───────────────────────────────────────────────────────┘
        ↓
┌───────────────────────────────────────────────────────┐
│ 3. Redistribution Algorithm (dialogue.utils.ts)      │
├───────────────────────────────────────────────────────┤
│ Input:                                                │
│   • Scene dialogues (canonical dialogue)              │
│   • Shot timings (available time)                     │
│                                                       │
│ Algorithm:                                            │
│   1. Sort dialogues by dialogueIndex                  │
│   2. Initialize: currentShotIndex = 0,                │
│                  currentShotTimeUsed = 0              │
│   3. For each dialogue:                               │
│      a. Walk through its text using character indices │
│      b. For each shot while text remains:             │
│         - Calculate how much text fits in remaining   │
│           shot time (charsPerSecond × time)           │
│         - Find a word boundary for the break point    │
│         - Create ShotDialogue for that portion        │
│         - Update currentShotTimeUsed and index        │
│                                                       │
│ Key Logic:                                            │
│   while (dialogueCharIndex < dialogueText.length      │
│          && currentShotIndex < shots.length) {        │
│     shotTimeRemaining = shot.timing - usedTime        │
│                                                       │
│     if (shotTimeRemaining fits remaining text) {      │
│       // Dialogue fits entirely in this shot          │
│       portionEnd = dialogueText.length                │
│     } else {                                          │
│       // Partial segment, break at word boundary      │
│       charsPerSec = CHARS_PER_SECOND[speed]           │
│       maxChars = shotTimeRemaining * charsPerSec      │
│       portionEnd = findWordBoundary(text, maxChars)   │
│     }                                                 │
│                                                       │
│     createShotDialogue(...)                           │
│     currentShotTimeUsed += portionDuration            │
│   }                                                   │
└───────────────────────────────────────────────────────┘
        ↓
┌───────────────────────────────────────────────────────┐
│ 4. New Shot Dialogue (After Redistribution)          │
├───────────────────────────────────────────────────────┤
│ K1 (4.5s): [dlg-abc-123: 0-66]  ← Full segment       │
│   "I was on my way to save you, but… traffic..."     │
│   Duration: 3.99s, fits in 4.5s ✓                    │
│                                                       │
│ K2 (1.2s): []  ← Visual only, no dialogue            │
│   Filler for reaction/establishing                    │
│                                                       │
│ F1 (1.0s): [dlg-def-456: 0-15]  ← NEW PORTION        │
│   "You're a bird."                                    │
│   Duration: ~0.8s, fits in 1.0s ✓                    │
│                                                       │
│ K3 (3.5s): [dlg-def-456: 15-38]  ← ADJUSTED          │
│   "You fly. What traffic??"                           │
│   Duration: ~2.3s, fits in 3.5s ✓                    │
│                                                       │
│ Result: All dialogue redistributed naturally          │
│         Breaks at word boundaries                     │
│         No gaps or overlaps                           │
└───────────────────────────────────────────────────────┘
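The algorithm in step 3 can be sketched as runnable code. This is a simplified illustration rather than the real `redistributeDialogueAcrossShots`: the types and `redistributeSketch` name are hypothetical, the `findWordBoundary` shown here is one possible implementation, and punctuation pauses are not modelled. Because of that, break points will not necessarily match the K1/K2/F1/K3 result shown in step 4.

```typescript
type Speed = "slow" | "normal" | "fast";
const CHARS_PER_SECOND: Record<Speed, number> = { slow: 12, normal: 18, fast: 26 };

interface SceneDlg { dialogueId: string; text: string; dialogueIndex: number; deliverySpeed: Speed }
interface Shot { shotId: string; timing: number } // timing = shot duration in seconds
interface Placement {
  shotId: string;
  dialogueId: string;
  start: number;
  end: number;
  text: string;
  startOffset: number;
  duration: number;
}

// End index for a portion starting at `from` that may not extend past `limit`.
// Breaks right after a space; returns `from` when no whole word fits.
function findWordBoundary(text: string, from: number, limit: number): number {
  if (limit >= text.length) return text.length;
  if (limit <= from) return from;
  const space = text.lastIndexOf(" ", limit - 1);
  return space + 1 > from ? space + 1 : from;
}

function redistributeSketch(dialogues: SceneDlg[], shots: Shot[]): Placement[] {
  const placements: Placement[] = [];
  const ordered = dialogues.slice().sort((a, b) => a.dialogueIndex - b.dialogueIndex);
  let shotIndex = 0;
  let timeUsed = 0; // seconds of dialogue already placed in the current shot

  for (const d of ordered) {
    const cps = CHARS_PER_SECOND[d.deliverySpeed];
    let pos = 0; // next unplaced character of this dialogue
    while (pos < d.text.length && shotIndex < shots.length) {
      const shot = shots[shotIndex];
      if (!shot) break;
      const remaining = shot.timing - timeUsed;
      const maxChars = Math.floor(remaining * cps + 1e-9); // tolerate float error
      const end = findWordBoundary(d.text, pos, pos + maxChars);
      if (end <= pos) {
        // Nothing more fits here: move on to the next shot.
        shotIndex += 1;
        timeUsed = 0;
        continue;
      }
      const duration = (end - pos) / cps;
      placements.push({
        shotId: shot.shotId, dialogueId: d.dialogueId, start: pos, end,
        text: d.text.slice(pos, end), startOffset: timeUsed, duration,
      });
      timeUsed += duration;
      pos = end;
    }
  }
  return placements;
}
```

As in the key logic above, the loop stops when shots run out, so text that does not fit anywhere is left unplaced; a production version would need a policy for that case.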

Key Points¶

Scene dialogue NEVER changes during redistribution - it's the source of truth
Shot dialogue is recalculated from scene segments + shot timings
Algorithm ensures:
Linear flow (no dialogue skipped)
Word boundary breaks (natural pacing)
No overlaps or gaps
Respects shot timing constraints

Code References¶

Redistribution Call: src/agenticGuru/workbenchSceneGuru.ts:1069-1095
Algorithm: src/shared/dialogue.utils.ts:144-215 (redistributeDialogueAcrossShots)
Portion Creation: src/shared/dialogue.utils.ts:98-139 (createShotDialoguePortion)

4. Data Structures¶

Scene Dialogue (Source of Truth)¶

// Scene-level dialogue segment (source of truth)
export interface SceneDialogue {
  dialogueId: string;
  speaker: {
    title: string;                  // Character display name
    type: 'CHARACTER' | 'NARRATOR' | 'VOICEOVER' | 'OFF_SCREEN';
    assetId?: string;               // Mapped from scene.characters
  };
  text: string;                     // FULL turn text
  origin: 'SCRIPT' | 'GENERATED';
  timing: {
    startTime: number;             // Seconds from scene start
    endTime: number;               // Calculated from text + speed
    deliverySpeed: 'slow' | 'normal' | 'fast';
  };
  voiceDirection: string;          // TTS guidance
  dialogueIndex: number;           // Sequence (1, 2, 3...)
  audios?: DialogueAudioEntry[];   // Generated TTS audio entries (history + active selection)
  isDeleted?: boolean;             // Soft-delete flag — true means excluded from UI and timing recalcs
}

// Audio generation entry for a dialogue segment
export interface DialogueAudioEntry {
  assetGenJobId: string;           // Reference to AssetGenJob — resolves audioUrl, userId, createdAt
  pitch: 'high' | 'low' | 'none'; // Voice variant used for this generation
  isSelected: boolean;             // Only one entry should be true at a time per dialogue
  text: string;                    // Snapshot of text at generation time (staleness detection)
}

Shot Dialogue (Derived)¶

// Shot-level dialogue (references SceneDialogue via dialogueId)
export interface ShotDialogue {
  dialogueId: string; // Reference to SceneDialogue.dialogueId
  speaker: SceneDialogue['speaker']; // Copied from scene for convenience
  portion: {
    start: number;    // Character index in SceneDialogue.text
    end: number;      // Character index in SceneDialogue.text
    text: string;     // Cached substring for display
  };
  timing: {
    startOffset: number; // Seconds into the shot when this dialogue starts
    duration: number;    // How long this dialogue takes in seconds
  };
  voiceDirection: string; // Copied from scene dialogue
}

Timing Constants¶

// Characters per second for different delivery speeds
export const CHARS_PER_SECOND = {
  slow: 12,    // Moody, dramatic, deliberate
  normal: 18,  // Standard conversation
  fast: 26,    // Excited, urgent, quick banter
} as const;

// Pause durations for punctuation (milliseconds)
export const PAUSE_DURATIONS_MS = {
  ',': { min: 80, max: 120 },     // Brief pause
  '.': { min: 180, max: 260 },    // Sentence end
  '...': { min: 250, max: 450 },  // Ellipsis, suspense
  '!': { min: 200, max: 320 },    // Exclamation
  '?': { min: 200, max: 320 },    // Question
  '—': { min: 180, max: 300 },    // Em dash
  '\n': { min: 250, max: 500 },   // Line break
} as const;
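To make the constants concrete, here is a sketch of how a duration estimate could combine them. It is not the real `recalculateSceneDialogueTimings`: the function names are hypothetical, and using the midpoint of each pause range is an assumption, since the source does not say how a value is chosen between `min` and `max`.

```typescript
type Speed = "slow" | "normal" | "fast";

const CHARS_PER_SECOND: Record<Speed, number> = { slow: 12, normal: 18, fast: 26 };

const PAUSE_DURATIONS_MS: Record<string, { min: number; max: number }> = {
  ",": { min: 80, max: 120 },
  ".": { min: 180, max: 260 },
  "...": { min: 250, max: 450 },
  "!": { min: 200, max: 320 },
  "?": { min: 200, max: 320 },
  "\u2014": { min: 180, max: 300 }, // em dash
  "\n": { min: 250, max: 500 },
};

// Assumption: take the midpoint of each pause range, converted to seconds.
function pauseSeconds(token: string): number {
  const range = PAUSE_DURATIONS_MS[token];
  return range ? (range.min + range.max) / 2 / 1000 : 0;
}

function estimateDurationSeconds(text: string, speed: Speed): number {
  // Base duration = text.length / CHARS_PER_SECOND
  let seconds = text.length / CHARS_PER_SECOND[speed];
  let i = 0;
  while (i < text.length) {
    if (text.slice(i, i + 3) === "...") {
      seconds += pauseSeconds("..."); // one ellipsis, not three periods
      i += 3;
    } else {
      seconds += pauseSeconds(text.charAt(i));
      i += 1;
    }
  }
  return seconds;
}
```

Under these assumptions, `estimateDurationSeconds("Hi.", "normal")` is 3 / 18 ≈ 0.167 s of speech plus a 0.22 s sentence-end pause, about 0.387 s in total. The single-character ellipsis `…` is not in the pause table, so this sketch adds no pause for it.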

5. Key Design Principles¶

✅ Single Source of Truth¶

Scene dialogue is canonical
Shot dialogue is derived and recalculable
Edits to scene dialogue automatically invalidate shot dialogue

✅ Referential Integrity¶

Shots reference scene dialogue by dialogueId (UUID)
Shots never own dialogue text; they only cache the substring they display
Always derive from scene

✅ Linear Flow¶

Dialogue flows sequentially across shots
No overlaps, no gaps, no skipped segments
dialogueIndex (order in the raw LLM output) ensures sequence
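The "no overlaps, no gaps, no skipped segments" guarantee can be checked mechanically. The validator below is a hypothetical sketch (`findFlowProblems` is not part of the codebase) that walks shots in order and confirms each scene dialogue is covered contiguously from character 0 to the end of its text.

```typescript
interface SceneDlg { dialogueId: string; text: string }
interface Portion { dialogueId: string; start: number; end: number }

// Returns human-readable problems; an empty array means the flow is linear.
function findFlowProblems(scene: SceneDlg[], shots: { dialogues: Portion[] }[]): string[] {
  const problems: string[] = [];
  for (const d of scene) {
    let cursor = 0; // first character of this dialogue not yet covered
    for (const shot of shots) {
      for (const p of shot.dialogues) {
        if (p.dialogueId !== d.dialogueId) continue;
        if (p.start > cursor) problems.push(`${d.dialogueId}: gap at ${cursor}-${p.start}`);
        if (p.start < cursor) problems.push(`${d.dialogueId}: overlap at ${p.start}-${cursor}`);
        cursor = Math.max(cursor, p.end);
      }
    }
    if (cursor < d.text.length) {
      problems.push(`${d.dialogueId}: characters ${cursor}-${d.text.length} not placed`);
    }
  }
  return problems;
}
```

A check like this fits naturally after any redistribution, since it depends only on scene text lengths and portion indices.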

✅ Timing Independence¶

Scene timing: Based on text + delivery speed
Shot timing: Based on shot duration
Can recalculate without affecting source

✅ Frontend vs Backend Responsibilities¶

Operation	Where	Why
Dialogue Extraction	Backend (LLM)	Requires AI understanding
UUID Generation	Backend	Security + consistency
Timing Calculation	Backend	Deterministic, complex
Initial Distribution	Backend (LLM)	Cinematic judgment needed
Redistribution	Backend	Affects multiple shots
Manual Edits	Frontend	User interactivity
Validation	Both	Frontend: UX, Backend: Data integrity

Complete Example¶

Let's trace a full user journey:

1. User writes script¶

KOYAL: I was on my way to save you, but… traffic in Ghatkopar was insane.
CUDDLES: You're a bird. You fly. What traffic??

2. Generates scenes¶

Backend creates:
- dlg-abc-123: "I was on my way..." (0-3.99s, normal delivery)
- dlg-def-456: "You're a bird..." (3.99-7.11s, fast delivery)

3. Generates shots¶

LLM distributes:
- K1 (4.5s): Full segment 1 - "I was on my way to save you..."
- K2 (1.2s): Visual only - Reaction shot
- K3 (3.5s): Full segment 2 - "You're a bird. You fly. What traffic??"

4. Adds filler F1¶

Backend redistributes:
- K1 (4.5s): dlg-abc-123 (full) - "I was on my way..."
- K2 (1.2s): (visual) - Reaction shot
- F1 (1.0s): dlg-def-456 (0-15) - "You're a bird."
- K3 (3.5s): dlg-def-456 (15-38) - "You fly. What traffic??"

Result: Natural pacing, no dialogue lost, cinematically sound! 🎬

Testing¶

The dialogue system is tested with 36 test cases across 3 fixtures (sdf3, sentinels1, voc5).

See python_eval/tests/test_definitions/dialogue_operations.py for:
- 7 deterministic tests for dialogue extraction
- 2 deterministic tests for dialogue distribution
- 3 G-Eval tests for semantic quality

Current Status: ✅ 100% passing (36/36)

6. Dialogue Audio V2 — TTS Generation & Management¶

Overview¶

Generates ElevenLabs TTS audio for scene-level dialogue entries. Audio lives at the SceneDialogue level as audios: DialogueAudioEntry[]. Each entry is a lean record — display data (audioUrl, userId, createdAt) is resolved at runtime from the AssetGenJob collection.
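Two behaviours implied by the `DialogueAudioEntry` fields, keeping a single selected entry and detecting stale audio from the text snapshot, can be sketched as below. The helper names are hypothetical and this is not the model's actual `selectDialogueAudio` method; enforcing exactly one selected entry is an assumption based on the comment on `isSelected`.

```typescript
interface DialogueAudioEntry {
  assetGenJobId: string;
  pitch: "high" | "low" | "none";
  isSelected: boolean;
  text: string; // snapshot of the dialogue text at generation time
}

// Returns a new array in which only the entry with the given job id is selected.
function selectAudioEntry(audios: DialogueAudioEntry[], assetGenJobId: string): DialogueAudioEntry[] {
  let found = false;
  for (const a of audios) {
    if (a.assetGenJobId === assetGenJobId) found = true;
  }
  if (!found) return audios; // unknown id: leave the current selection untouched
  return audios.map((a) => ({ ...a, isSelected: a.assetGenJobId === assetGenJobId }));
}

// Audio is stale when the dialogue text has changed since it was generated.
function isAudioStale(entry: DialogueAudioEntry, currentText: string): boolean {
  return entry.text !== currentText;
}
```

Returning a new array rather than mutating in place keeps the sketch easy to test; a database-backed version would more likely issue a single update.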

API Endpoints¶

Method	Endpoint	Description
`POST`	`/workbench/generateDialogueAudio`	Create TTS job + push audio entry onto dialogue
`PATCH`	`/workbench/selectDialogueAudio`	Toggle `isSelected` on a specific audio entry
`PATCH`	`/workbench/updateDialogueText`	Update text + recalculate timings + redistribute to shots
`PATCH`	`/workbench/deleteDialogue`	Soft-delete dialogue + recalculate timings + redistribute to shots

Soft Delete¶

Dialogues use the isDeleted: boolean pattern (matching shots/characters). On deletion:
1. dialogue.isDeleted = true is set in the DB
2. Remaining (non-deleted) dialogues get their timings recalculated
3. Dialogue portions are redistributed across shots
4. Deleted dialogues remain in the array for data preservation but are filtered out in queries and UI
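Steps 1, 2, and 4 can be illustrated in memory with the sketch below. It is not the model's `softDeleteDialogue` method: the names are hypothetical, the input is assumed to be in sequence order, and the recalculation simply re-packs each remaining dialogue's existing duration back to back instead of recomputing it from text.

```typescript
interface Dlg {
  dialogueId: string;
  timing: { startTime: number; endTime: number };
  isDeleted?: boolean;
}

// Marks one dialogue as deleted and re-packs the timings of the remaining ones.
function softDeleteSketch(dialogues: Dlg[], dialogueId: string): Dlg[] {
  let clock = 0; // running scene time in seconds
  return dialogues.map((d): Dlg => {
    const isDeleted = d.isDeleted === true || d.dialogueId === dialogueId;
    if (isDeleted) return { ...d, isDeleted: true }; // kept in the array, timing untouched
    const duration = d.timing.endTime - d.timing.startTime;
    const timing = { ...d.timing, startTime: clock, endTime: clock + duration };
    clock += duration;
    return { ...d, timing };
  });
}

// What queries and the UI would see.
function visibleDialogues(dialogues: Dlg[]): Dlg[] {
  return dialogues.filter((d) => d.isDeleted !== true);
}
```

Redistribution across shots (step 3) would then run against `visibleDialogues(...)` so that deleted text never reaches a shot.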

Integration Tests¶

Located at tests/workbench/dialogueAudio.test.ts — 19 supertest integration tests covering select, update text, and delete endpoints.

Related Files¶

Core Implementation¶

src/workbench/workbench.service.ts - Scene generation with dialogue + dialogue audio V2 endpoints
src/workbench/workbench.controller.ts - Route handlers for dialogue audio V2 (generate, select, update text, delete)
src/workbench/workbench.validator.ts - Zod validators for dialogue audio V2 requests
src/workbench/workbench.router.ts - Route definitions for dialogue audio V2
src/agenticGuru/workbenchSceneGuru.ts - Shot operations with redistribution
src/shared/dialogue.utils.ts - Timing and redistribution algorithms
src/shared/dialogue.types.ts - TypeScript type definitions (SceneDialogue, DialogueAudioEntry)
src/shared/dialogue.constants.ts - Timing constants
src/models/storyVideos.model.ts - MongoDB model methods (pushDialogueAudio, selectDialogueAudio, softDeleteDialogue)

Prompt Templates¶

src/promptTemplates/workbenchV2/generateScenesV2.template.ts - Scene dialogue extraction
src/promptTemplates/workbenchV2/generateShotsV2.template.ts - Shot dialogue distribution

Testing¶

python_eval/tests/test_definitions/dialogue_operations.py - Test definitions
python_eval/tests/test_definitions/utils/dialogue_validation.py - Test functions
python_eval/tests/test_definitions/utils/g_eval_metrics.py - G-Eval metrics
tests/workbench/dialogueAudio.test.ts - Supertest integration tests for dialogue audio API

Migration¶

For existing stories with old dialogue format, see src/shared/dialogue.migration.ts for on-demand migration utilities.
