A TikTok-style caption stack: an opening title lands with a Difference blend motion effect, then each spoken line gets its own font, color treatment, and word-animation style, timed to word-level timestamps from transcription. One video layer carries the audio; caption elements swap in and out on the timeline.

Features used: difference motion, word_animation (color, box, glow)
{
  "width": 1080,
  "height": 1920,
  "elements": [
    {
      "type": "video",
      "id": "vid1",
      "source_url": "https://cdn-assets.framelane.io/shared/videos/clip1.mp4",
      "volume": 50
    },
    {
      "type": "text",
      "id": "t1",
      "text": "Albert",
      "font_family": "Alfa Slab One",
      "text_color": "#FFFFFF",
      "font_size": 260,
      "x": "50%",
      "y": "70%",
      "time": 0,
      "duration": 4.193,
      "motion": [
        {
          "type": "difference",
          "time": 0,
          "duration": 4.193
        }
      ]
    },
    {
      "type": "text",
      "id": "t1",
      "text": "When I came to you with those calculations",
      "font_family": "Komika Axis",
      "text_color": "#FFFFFF",
      "stroke_color": "#000000",
      "stroke_width": 0.1,
      "font_size": 100,
      "background_color": "#850000",
      "x": "50%",
      "y": "75%",
      "time": 4.496,
      "duration": 2.493,
      "word_animation": {
        "style": "color",
        "words": [
          { "text": "When", "start": 4.496, "end": 4.577 },
          { "text": "I", "start": 4.593, "end": 4.69 },
          { "text": "came", "start": 4.69, "end": 5.014 },
          { "text": "to", "start": 5.014, "end": 5.095 },
          { "text": "you", "start": 5.095, "end": 5.24 },
          { "text": "with", "start": 5.24, "end": 5.37 },
          { "text": "those", "start": 5.58, "end": 5.871 },
          { "text": "calculations,", "start": 5.984, "end": 6.793 }
        ]
      }
    },
    {
      "type": "text",
      "id": "t1",
      "text": "we thought we might start a chain reaction that would destroy the entire world",
      "font_family": "Bebas Neue",
      "text_color": "#FFFFFF",
      "font_size": 100,
      "background_color": "#47008E",
      "x": "50%",
      "y": "75%",
      "time": 7.682,
      "duration": 6.828,
      "word_animation": {
        "style": "box",
        "words": [
          { "text": "we", "start": 7.682, "end": 7.828 },
          { "text": "thought", "start": 7.828, "end": 8.07 },
          { "text": "we", "start": 8.07, "end": 8.183 },
          { "text": "might", "start": 8.183, "end": 8.491 },
          { "text": "start", "start": 8.491, "end": 8.814 },
          { "text": "a", "start": 8.814, "end": 8.895 },
          { "text": "chain", "start": 8.895, "end": 9.202 },
          { "text": "reaction", "start": 9.202, "end": 9.752 },
          { "text": "that", "start": 9.768, "end": 9.946 },
          { "text": "would", "start": 9.946, "end": 10.043 },
          { "text": "destroy", "start": 10.043, "end": 11.757 },
          { "text": "the", "start": 11.822, "end": 11.968 },
          { "text": "entire", "start": 11.968, "end": 12.226 },
          { "text": "world.", "start": 12.938, "end": 14.363 }
        ]
      }
    },
    {
      "type": "text",
      "id": "t1",
      "text": "I remember it well. What of it?",
      "font_family": "Lemon",
      "text_color": "#FFFFFF",
      "background": true,
      "background_color": "#000000",
      "background_opacity": 70,
      "x_padding": "3%",
      "y_padding": "1.5%",
      "font_size": 100,
      "x": "50%",
      "y": "75%",
      "time": 14.70,
      "duration": 2.719,
      "word_animation": {
        "style": "glow",
        "words": [
          { "text": "I", "start": 14.703, "end": 14.719 },
          { "text": "remember", "start": 14.719, "end": 15.205 },
          { "text": "it", "start": 15.205, "end": 15.367 },
          { "text": "well.", "start": 15.367, "end": 15.707 },
          { "text": "What", "start": 16.743, "end": 16.985 },
          { "text": "of", "start": 17.05, "end": 17.163 },
          { "text": "it.", "start": 17.163, "end": 17.406 }
        ]
      }
    },
    {
      "type": "text",
      "id": "t1",
      "text": "I believe we did!",
      "font_family": "Poppins",
      "text_color": "#FFFFFF",
      "stroke_color": "#000000",
      "background_color": "#FF5000",
      "stroke_width": 0.1,
      "shadow_color": "#000000",
      "shadow_x": "5%",
      "shadow_y": "8%",
      "font_size": 120,
      "x": "50%",
      "y": "75%",
      "time": 20.74,
      "duration": 3.593,
      "word_animation": {
        "style": "color",
        "words": [
          { "text": "I", "start": 21.74, "end": 21.844 },
          { "text": "believe", "start": 21.844, "end": 22.123 },
          { "text": "we", "start": 22.123, "end": 22.21 },
          { "text": "did.", "start": 22.21, "end": 22.437 }
        ]
      }
    }
  ]
}

How this request is structured

  • One video, many caption lines. The video element carries the footage and audio (volume: 50). Each caption is a separate text element with its own time and duration, so the captions appear sequentially, not all at once.
  • Opening title uses blend motion. The first text element ("Albert") has no word_animation. Instead it uses a motion preset with "type": "difference", the After Effects Difference blend mode, which inverts against the video backdrop per channel (|backdrop − color|). Set motion[].time and motion[].duration to match the element's time and duration so the effect runs for the full title window.
  • Reuse the same id. Every caption line uses "id": "t1". Because each element has a different time window, they never overlap on the timeline.
  • Word timestamps are absolute. Each word's start and end are seconds from the start of the composition, not relative to the text element's time. See Word Animation Examples for details.
  • Mix styles per line. After the opening title, this reel cycles through three karaoke styles:
| Line | Effect | Caption treatment |
| --- | --- | --- |
| "Albert" | motion (difference blend) | Large display type, no word animation |
| "When I came to you…" | word_animation (color) | Stroke + background_color as highlight color |
| "we thought we might…" | word_animation (box) | Colored box behind the active word |
| "I remember it well…" | word_animation (glow) | Semi-transparent background bar, active word at full opacity |
| "I believe we did!" | word_animation (color) | Stroke, shadow, and orange highlight |
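
Because word timestamps are absolute and every caption shares one id, two mistakes are easy to make: passing element-relative word times, and letting two same-id windows overlap. The sketch below is one possible pre-flight check in Python, run on the request body before sending it. The function name `check_caption_timing` and the `tolerance` parameter are hypothetical helpers introduced here for illustration; they are not part of the FrameLane API.

```python
from collections import defaultdict
from typing import Any


def check_caption_timing(elements: list[dict[str, Any]], tolerance: float = 0.01) -> list[str]:
    """Return a list of timing problems; an empty list means none were found."""
    problems: list[str] = []
    texts = [e for e in elements if e.get("type") == "text"]

    # Absolute word timestamps should fall inside [time, time + duration].
    for e in texts:
        start, end = e["time"], e["time"] + e["duration"]
        for w in e.get("word_animation", {}).get("words", []):
            if w["start"] < start - tolerance or w["end"] > end + tolerance:
                problems.append(
                    f'word "{w["text"]}" ({w["start"]}-{w["end"]}) falls outside '
                    f'element "{e.get("text", "")}" ({start:.3f}-{end:.3f})'
                )

    # Elements that share an id should occupy disjoint time windows.
    by_id: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for e in texts:
        by_id[e.get("id")].append(e)
    for element_id, group in by_id.items():
        group.sort(key=lambda e: e["time"])
        for prev, cur in zip(group, group[1:]):
            if prev["time"] + prev["duration"] - cur["time"] > tolerance:
                problems.append(
                    f'id "{element_id}": "{prev.get("text", "")}" overlaps "{cur.get("text", "")}"'
                )
    return problems


# Two elements taken from the request above (word list shortened).
elements = [
    {"type": "text", "id": "t1", "text": "Albert", "time": 0, "duration": 4.193},
    {
        "type": "text", "id": "t1", "text": "When I came to you with those calculations",
        "time": 4.496, "duration": 2.493,
        "word_animation": {"style": "color", "words": [
            {"text": "When", "start": 4.496, "end": 4.577},
            {"text": "calculations,", "start": 5.984, "end": 6.793},
        ]},
    },
]
print(check_caption_timing(elements))
```

A word list that starts at 0.0 for a caption whose time is 4.496 would be reported by the first check, which is the usual symptom of element-relative timestamps.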

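The Difference formula used by the opening title, |backdrop − color| per channel, can be worked through with a few pixel values. The Python sketch below applies that formula to 8-bit RGB values only; it assumes nothing about how the renderer handles alpha, color space, or anti-aliasing, and the helper names `hex_to_rgb` and `difference_blend` are made up for this illustration.

```python
def hex_to_rgb(value: str) -> tuple[int, int, int]:
    """Convert "#RRGGBB" to an (r, g, b) tuple of 0-255 integers."""
    value = value.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def difference_blend(backdrop: tuple[int, int, int], color: tuple[int, int, int]) -> tuple[int, int, int]:
    """Per-channel |backdrop - color| for 8-bit RGB values."""
    return tuple(abs(b - c) for b, c in zip(backdrop, color))


white = hex_to_rgb("#FFFFFF")  # the title's text_color

# White text inverts whatever is behind it: |b - 255| = 255 - b.
print(difference_blend((30, 60, 200), white))    # (225, 195, 55)
print(difference_blend((255, 255, 255), white))  # (0, 0, 0)
print(difference_blend((0, 0, 0), white))        # (255, 255, 255)
```

With white text, the title therefore reads as dark over bright footage and bright over dark footage.
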
Getting word timestamps

Word timestamps come from a speech-to-text or forced-alignment pipeline. Common sources:
  • WhisperX — word-level alignment on top of Whisper transcriptions
  • AssemblyAI / Deepgram — both return word-level timestamps in their transcription API response
  • The FrameLane Transcribe task returns word timestamps directly usable in word_animation.words
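
Whichever source is used, the words for one spoken line have to be turned into a text element shaped like the ones in the request above. The following Python sketch shows one way to do that. It assumes each input word is a dict with `word`, `start`, and `end` keys, with times in seconds from the start of the audio; that input shape is an assumption, since field names and units differ by provider (some report milliseconds), so adapt the mapping to the actual response. `caption_element` and its `hold` parameter are hypothetical, not part of any listed API.

```python
from typing import Any


def caption_element(
    words: list[dict[str, Any]],
    style: str = "color",
    element_id: str = "t1",
    hold: float = 0.0,
    **text_props: Any,
) -> dict[str, Any]:
    """Build one text element from the word timestamps of a single spoken line.

    `hold` keeps the caption on screen for extra seconds after the last word.
    Extra keyword arguments (font_family, text_color, ...) are copied through.
    """
    if not words:
        raise ValueError("at least one word is required")

    # Assumed input keys: "word", "start", "end" (seconds, absolute).
    cleaned = [
        {
            "text": w["word"].strip(),
            "start": round(float(w["start"]), 3),
            "end": round(float(w["end"]), 3),
        }
        for w in words
    ]
    start = cleaned[0]["start"]
    end = cleaned[-1]["end"]

    return {
        "type": "text",
        "id": element_id,
        "text": " ".join(w["text"] for w in cleaned),
        "time": start,
        "duration": round(end - start + hold, 3),
        **text_props,
        "word_animation": {"style": style, "words": cleaned},
    }


line = [
    {"word": " I", "start": 21.74, "end": 21.844},
    {"word": " believe", "start": 21.844, "end": 22.123},
    {"word": " we", "start": 22.123, "end": 22.21},
    {"word": " did.", "start": 22.21, "end": 22.437},
]
element = caption_element(line, style="color", font_family="Poppins", font_size=120)
print(element["text"], element["time"], element["duration"])
```

Deriving time and duration from the first and last word keeps the word timestamps absolute and inside the element's window by construction.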