
AssetGen Module

Voice Settings

Character voice assignment via ElevenLabs. Voices are stored as a lean VoiceEntry[] array on the project asset; display data is resolved at runtime from the source collections.

Generated Voice Flow

  1. User describes a voice -> POST /elevenLabs/text-to-voice/design -> returns 3 previews with audio_base_64
  2. User picks one -> POST /elevenLabs/text-to-voice -> permanent voice created in ElevenLabs + saved to voices collection

Library Voice Flow

  1. User picks a marketplace voice -> POST /elevenLabs/addVoice -> voice saved to voices collection
  2. TTS job runs with character's dialog text -> POST /createJob -> custom audio stored in AssetGenJob.outputLinks.result

Pitch Variation Flow (Remix)

  1. User selects pitch (high/low) + strength -> POST /elevenLabs/remix -> returns 3 remixed previews
  2. User picks one -> POST /elevenLabs/text-to-voice -> permanent voice saved to voices collection

Data Resolution (on voice settings open)

| Endpoint | Source | Returns |
| --- | --- | --- |
| POST /voices/getByIds | voices collection | name, preview_url, user_id, created_at |
| POST /getJobsByIds | AssetGenJob collection | outputLinks.result (library voice audio) |
| POST /users/resolveNames | users collection | userId -> display name map |

All batch endpoints are capped at 50 items per request (Zod validated). The frontend splits larger arrays into chunks of at most 50 to match.
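The frontend chunking described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the `chunk` helper and the `MAX_BATCH_SIZE` constant are hypothetical names introduced here.

```typescript
// Cap taken from the documented limit on batch endpoints.
// The constant name is hypothetical.
const MAX_BATCH_SIZE = 50;

// Split an array into consecutive slices of at most `size` items.
// Hypothetical helper; the real frontend may chunk differently.
function chunk<T>(items: readonly T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

Each resulting chunk would then be sent as its own request (for example to POST /voices/getByIds) and the responses merged on the client.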

Audio Resolution Logic

  • Generated voice -> voices collection preview_url (custom dialog audio from design step)
  • Library voice -> AssetGenJob outputLinks.result (custom dialog TTS on B2)
  • Variation voice -> voices collection preview_url (remix audio)
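The three rules above could be expressed as a single lookup. The sketch below is an assumption-laden illustration: the `VoiceKind` union, the `VoiceDoc` and `JobDoc` shapes, and the `resolveAudioUrl` function are hypothetical, and only the field names `preview_url` and `outputLinks.result` come from this document.

```typescript
// Hypothetical discriminator for the three voice types listed above.
type VoiceKind = "generated" | "library" | "variation";

// Minimal assumed shapes: only the fields this sketch reads.
interface VoiceDoc {
  preview_url?: string; // from the voices collection
}
interface JobDoc {
  outputLinks?: { result?: string }; // from the AssetGenJob collection
}

// Pick the audio URL according to the voice type.
// Returns undefined when the backing document or field is missing.
function resolveAudioUrl(
  kind: VoiceKind,
  voice: VoiceDoc | undefined,
  job: JobDoc | undefined,
): string | undefined {
  switch (kind) {
    case "library":
      // Library voices use the custom dialog TTS stored on the job.
      return job?.outputLinks?.result;
    case "generated":
    case "variation":
      // Generated and variation voices use the preview stored on the voice.
      return voice?.preview_url;
  }
}
```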

Collections

| Collection | Purpose |
| --- | --- |
| voices (MongoDB) | Local cache of ElevenLabs voice metadata (premade + user voices) |
| AssetGenJob | TTS job results; custom audio for library voices lives here, alongside other job types |
| Project asset voices[] | Lean VoiceEntry array: voiceId, selected, source, assetGenJobId?, variations? |
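As a rough picture of the lean entry, the sketch below types the fields named above. The field types are assumptions (the document lists only field names), the variation fields are borrowed from the select request body, and `findSelected` is a hypothetical helper.

```typescript
// Assumed shape of a stored variation; field types are guesses.
interface VoiceVariation {
  voiceId: string;
  pitch: string; // the remix flow mentions "high" and "low"
  strength: number; // numeric type is an assumption
}

// Assumed shape of a lean entry on the project asset.
interface VoiceEntry {
  voiceId: string;
  selected: boolean; // assumed to be a boolean flag
  source: string; // allowed values are not listed in this document
  assetGenJobId?: string;
  variations?: VoiceVariation[];
}

// Hypothetical helper: return the first entry flagged as selected, if any.
function findSelected(voices: readonly VoiceEntry[]): VoiceEntry | undefined {
  return voices.filter((v) => v.selected)[0];
}
```

Keeping the entry this small means names and audio URLs are never duplicated on the asset and always reflect the source collections.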

Dialog Preview

Each character has a dialogPreview string (min 100 chars) generated on character creation. Template: "My name is {title} and I'm the {role} in {project}." Padded with character description if under 100 chars. Used as the spoken text for all voice previews.
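A minimal sketch of how such a preview string could be built, assuming the padding step simply appends the character description after the template sentence. The `CharacterInfo` shape and `buildDialogPreview` name are hypothetical, and this sketch does not guarantee 100 characters when the description itself is short or missing.

```typescript
// Minimum preview length stated above.
const MIN_PREVIEW_LENGTH = 100;

// Hypothetical input shape; the real character model may differ.
interface CharacterInfo {
  title: string;
  role: string;
  project: string;
  description?: string;
}

// Fill the documented template, then append the description when
// the result is shorter than the minimum (assumed padding strategy).
function buildDialogPreview(c: CharacterInfo): string {
  const base = `My name is ${c.title} and I'm the ${c.role} in ${c.project}.`;
  if (base.length >= MIN_PREVIEW_LENGTH || !c.description) {
    return base;
  }
  return `${base} ${c.description.trim()}`;
}
```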

Voice Selection (Assign / Restore)

After creating a voice via the endpoints above, the FE selects it on the character via a dedicated endpoint in the projects module:

PATCH /projects/:projectId/assets/:assetId/voices/select
Body: { voiceId, source?, assetGenJobId?, variation?: { voiceId, pitch, strength } }

The BE determines the action automatically:

| source present? | variation present? | voiceId exists? | Action | Activity Logged |
| --- | --- | --- | --- | --- |
| yes | no | | Add new VoiceEntry, select it | ASSIGN_CHARACTER_VOICE |
| no | no | yes | Select existing voice | RESTORE_CHARACTER_VOICE |
| | yes (new) | | Add new Variation on parent | ASSIGN_CHARACTER_VOICE_VARIATION |
| | yes (existing) | yes | Select existing variation | RESTORE_CHARACTER_VOICE_VARIATION |
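One way to read the decision rules above is as a pure function over the request body and the IDs already stored on the asset. The sketch below is an interpretation, not the code in projects.service.ts: `determineActivity`, its parameters, and the field types are hypothetical, and cases the rules do not cover return undefined here.

```typescript
type Activity =
  | "ASSIGN_CHARACTER_VOICE"
  | "RESTORE_CHARACTER_VOICE"
  | "ASSIGN_CHARACTER_VOICE_VARIATION"
  | "RESTORE_CHARACTER_VOICE_VARIATION";

// Mirrors the documented request body; field types are assumptions.
interface SelectVoiceBody {
  voiceId: string;
  source?: string;
  assetGenJobId?: string;
  variation?: { voiceId: string; pitch: string; strength: number };
}

// existingVoiceIds: base voice IDs already on the asset.
// existingVariationIds: variation voice IDs already on the parent voice.
function determineActivity(
  body: SelectVoiceBody,
  existingVoiceIds: readonly string[],
  existingVariationIds: readonly string[],
): Activity | undefined {
  if (body.variation) {
    // Variation present: restore if it is already stored, otherwise add it.
    return existingVariationIds.indexOf(body.variation.voiceId) !== -1
      ? "RESTORE_CHARACTER_VOICE_VARIATION"
      : "ASSIGN_CHARACTER_VOICE_VARIATION";
  }
  if (body.source) {
    // Source present, no variation: treated as a new base voice.
    return "ASSIGN_CHARACTER_VOICE";
  }
  if (existingVoiceIds.indexOf(body.voiceId) !== -1) {
    // No source, no variation, known voiceId: re-select it.
    return "RESTORE_CHARACTER_VOICE";
  }
  // Not covered by the documented rules.
  return undefined;
}
```

Checking the variation first reflects the fact that the two variation rows leave the source column blank; how the real service orders these checks is not stated here.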

Activity logging is fire-and-forget. Display data (voiceName, audioUrl) is resolved from source collections before logging:

| Voice type | voiceName from | audioUrl from |
| --- | --- | --- |
| Generated | VoicesModel.getVoicesByIds() -> name | -> preview_url |
| Library | AssetJobModel.getJobsByIds() -> modelConfig.modelTitle | -> outputLinks.result |
| Variation | VoicesModel.getVoicesByIds() -> name | -> preview_url |

Key files: src/projects/projects.service.ts (selectVoice, handleBaseVoice, handleVariation), src/projects/projects.validator.ts (validateSelectVoiceRequest), src/projects/projects.controller.ts (selectVoice). See src/projectActivities/README.md for activity logging conventions.