Overview

Transcribes audio or video using AssemblyAI speech models (Universal-2 by default). Returns a structured transcript with optional word timings, speaker labels, sentiment analysis, entity detection, and content safety flags.

Request

POST /v1/tasks/transcribe
Authorization: Bearer {api_key}
{
  "source_url": "https://cdn.example.com/interview.mp4",
  "language": "en",
  "speech_model": "universal-2",
  "word_timings": true,
  "speaker_labels": true,
  "speakers_expected": 2
}
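
The request above could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library: the base URL https://api.example.com and the helper name build_transcribe_request are placeholders, not part of this API, and the Content-Type header is an assumption. The response format is not described here, so the request is only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical base URL: replace with the real API host.
BASE_URL = "https://api.example.com"


def build_transcribe_request(api_key: str, source_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/tasks/transcribe request."""
    payload = {"source_url": source_url, **options}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tasks/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumed: the endpoint expects a JSON body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_transcribe_request(
    "YOUR_API_KEY",
    "https://cdn.example.com/interview.mp4",
    language="en",
    speech_model="universal-2",
    word_timings=True,
    speaker_labels=True,
    speakers_expected=2,
)
# To send it: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes the body easy to inspect or log before any network call is made.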

Key parameters

Parameter           Default      Description
source_url          required     URL of the audio or video file
language            auto         BCP-47 language code, e.g. en, de, es
speech_model        universal-2  universal-2, universal-3-pro, or nano
word_timings        true         Include word-level timestamps
speaker_labels      false        Enable speaker diarisation
speakers_expected   null         Hint for number of speakers (1–10)
sentiment_analysis  false        Per-sentence sentiment
entity_detection    false        Named entity recognition
auto_highlights     false        Extract key phrases
content_safety      false        Flag sensitive content
iab_categories      false        IAB taxonomy classification
word_boost          []           Domain terms to boost (max 200)
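
The constraints in the table can be checked on the client before a request is submitted. The sketch below is one possible approach, not part of the API: validate_request is a hypothetical helper, "max 200" is read as at most 200 terms, and the server may apply rules beyond those listed here.

```python
ALLOWED_MODELS = {"universal-2", "universal-3-pro", "nano"}
BOOLEAN_FLAGS = {
    "word_timings", "speaker_labels", "sentiment_analysis", "entity_detection",
    "auto_highlights", "content_safety", "iab_categories",
}
KNOWN_PARAMS = BOOLEAN_FLAGS | {
    "source_url", "language", "speech_model", "speakers_expected", "word_boost",
}


def validate_request(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []
    if not body.get("source_url"):
        problems.append("source_url is required")
    for name in sorted(set(body) - KNOWN_PARAMS):
        problems.append(f"unknown parameter: {name}")
    # universal-2 is the documented default when speech_model is omitted.
    model = body.get("speech_model", "universal-2")
    if model not in ALLOWED_MODELS:
        problems.append(f"unsupported speech_model: {model}")
    for flag in sorted(BOOLEAN_FLAGS & set(body)):
        if not isinstance(body[flag], bool):
            problems.append(f"{flag} must be true or false")
    speakers = body.get("speakers_expected")
    if speakers is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(speakers, bool) or not isinstance(speakers, int) or not 1 <= speakers <= 10:
            problems.append("speakers_expected must be an integer from 1 to 10")
    if len(body.get("word_boost", [])) > 200:
        problems.append("word_boost accepts at most 200 terms")
    return problems


print(validate_request({"speech_model": "best", "speakers_expected": 11}))
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues in a single pass.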

Output

The completed task’s output.url points to a JSON file containing the full AssemblyAI transcript response: text, words, utterances (with speaker labels), and the results of any analysis features enabled in the request. Timestamps (start, end) are in milliseconds.
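
A minimal sketch of retrieving the transcript once a task has completed. It assumes the task object is a dict with the URL nested as task["output"]["url"] and that the file can be fetched without extra authentication; neither detail is confirmed here, and load_transcript is a hypothetical helper name.

```python
import json
import urllib.request


def load_transcript(task: dict) -> dict:
    """Download and parse the transcript JSON referenced by a completed task."""
    # Assumed nesting: {"output": {"url": "..."}}
    url = task["output"]["url"]
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Usage, given a completed task object from the API:
# transcript = load_transcript(task)
# print(transcript["text"])
```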

Example output shape

{
  "text": "Welcome back to the show...",
  "words": [
    { "text": "Welcome", "start": 0, "end": 650, "confidence": 0.99, "speaker": "A" }
  ],
  "utterances": [
    { "speaker": "A", "text": "Welcome back to the show.", "start": 0, "end": 2500 }
  ]
}
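
Given output of this shape, the utterances can be turned into a speaker-labelled transcript. The sketch below assumes start and end are in milliseconds, as in AssemblyAI transcript responses; the helper names are illustrative only.

```python
def format_timestamp(ms: int) -> str:
    """Convert milliseconds to an MM:SS string."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes:02d}:{seconds:02d}"


def render_utterances(transcript: dict) -> list[str]:
    """One line per utterance: timestamp, speaker label, text."""
    return [
        f"[{format_timestamp(u['start'])}] Speaker {u['speaker']}: {u['text']}"
        for u in transcript.get("utterances", [])
    ]


def talk_time_ms(transcript: dict) -> dict[str, int]:
    """Total speaking time per speaker, in milliseconds."""
    totals: dict[str, int] = {}
    for u in transcript.get("utterances", []):
        totals[u["speaker"]] = totals.get(u["speaker"], 0) + (u["end"] - u["start"])
    return totals


transcript = {
    "text": "Welcome back to the show...",
    "words": [
        {"text": "Welcome", "start": 0, "end": 650, "confidence": 0.99, "speaker": "A"}
    ],
    "utterances": [
        {"speaker": "A", "text": "Welcome back to the show.", "start": 0, "end": 2500}
    ],
}

for line in render_utterances(transcript):
    print(line)  # [00:00] Speaker A: Welcome back to the show.
print(talk_time_ms(transcript))  # {'A': 2500}
```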