Overview
Transcribes audio or video using AssemblyAI speech models (Universal-2 by default; Universal-3 Pro and Nano are also available). Returns a structured transcript with optional word timings, speaker labels, sentiment, entities, and content safety flags.
Request
Key parameters
| Parameter | Default | Description |
|---|---|---|
| source_url | required | URL of the audio or video file |
| language | auto | BCP-47 language code, e.g. en, de, es |
| speech_model | universal-2 | universal-2, universal-3-pro, or nano |
| word_timings | true | Include word-level timestamps |
| speaker_labels | false | Enable speaker diarisation |
| speakers_expected | null | Hint for the expected number of speakers (1–10) |
| sentiment_analysis | false | Per-sentence sentiment |
| entity_detection | false | Named entity recognition |
| auto_highlights | false | Extract key phrases |
| content_safety | false | Flag sensitive content |
| iab_categories | false | IAB taxonomy classification |
| word_boost | [] | Domain-specific terms to boost (max 200) |

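
The table above can be turned into a request body with client-side checks for the documented limits. This is a minimal sketch: build_transcription_request is a hypothetical helper, and only the field names, defaults, allowed speech_model values, the 1–10 range for speakers_expected, and the 200-term cap on word_boost come from this page. How the body is actually submitted is not described here, so the sketch stops at producing the dictionary.

```python
from typing import Any, Optional

# Allowed values and limits taken from the "Key parameters" table.
ALLOWED_SPEECH_MODELS = {"universal-2", "universal-3-pro", "nano"}
MAX_WORD_BOOST = 200


def build_transcription_request(
    source_url: str,
    *,
    language: str = "auto",
    speech_model: str = "universal-2",
    word_timings: bool = True,
    speaker_labels: bool = False,
    speakers_expected: Optional[int] = None,
    sentiment_analysis: bool = False,
    entity_detection: bool = False,
    auto_highlights: bool = False,
    content_safety: bool = False,
    iab_categories: bool = False,
    word_boost: Optional[list] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a request body and validate documented limits.

    Defaults mirror the table; the submission mechanism is not covered here.
    """
    if not source_url:
        raise ValueError("source_url is required")
    if speech_model not in ALLOWED_SPEECH_MODELS:
        raise ValueError(f"unknown speech_model: {speech_model!r}")
    if speakers_expected is not None and not 1 <= speakers_expected <= 10:
        raise ValueError("speakers_expected must be between 1 and 10")
    boost = list(word_boost or [])
    if len(boost) > MAX_WORD_BOOST:
        raise ValueError(f"word_boost accepts at most {MAX_WORD_BOOST} terms")

    # Every documented field is included, so the body is explicit about defaults.
    return {
        "source_url": source_url,
        "language": language,
        "speech_model": speech_model,
        "word_timings": word_timings,
        "speaker_labels": speaker_labels,
        "speakers_expected": speakers_expected,
        "sentiment_analysis": sentiment_analysis,
        "entity_detection": entity_detection,
        "auto_highlights": auto_highlights,
        "content_safety": content_safety,
        "iab_categories": iab_categories,
        "word_boost": boost,
    }
```

For example, build_transcription_request("https://example.com/call.mp3", speaker_labels=True, speakers_expected=2) yields a body with diarisation enabled and all other options at their defaults, while speakers_expected=11 raises ValueError before any request is made.
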
Output
The completed task’s output.url points to a JSON file containing the full AssemblyAI transcript response: the transcript text, words, utterances (with speaker labels), and the results of any requested analyses.
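
Reading the output file might look like the sketch below. It assumes that output.url can be fetched directly without extra authentication (this page does not say) and that the file follows AssemblyAI's transcript schema, where utterances is empty or null unless speaker labels were requested, speakers are labelled with letters such as "A", and start/end times are in milliseconds. The function names load_transcript and format_utterances are hypothetical.

```python
import json
import urllib.request
from typing import Any


def load_transcript(output_url: str) -> dict[str, Any]:
    """Download and parse the JSON transcript referenced by output.url.

    Assumes the URL needs no additional authentication headers.
    """
    with urllib.request.urlopen(output_url) as resp:
        return json.load(resp)


def format_utterances(transcript: dict[str, Any]) -> list[str]:
    """Render speaker-labelled utterances as 'mm:ss Speaker X: text' lines.

    Falls back to the full 'text' field when no utterances are present,
    e.g. when speaker_labels was left at its default of false.
    """
    utterances = transcript.get("utterances") or []
    if not utterances:
        return [transcript.get("text") or ""]
    lines = []
    for u in utterances:
        # Utterance start times are in milliseconds; convert to mm:ss.
        minutes, seconds = divmod(u["start"] // 1000, 60)
        lines.append(f"{minutes:02d}:{seconds:02d} Speaker {u['speaker']}: {u['text']}")
    return lines
```

Keeping the parsing separate from the download makes format_utterances easy to reuse on a transcript that has already been saved to disk.
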

