This skill is available as a machine-readable YAML playbook via the Framelane MCP server.
Load it into any MCP-capable agent to get the complete workflow without writing integration code.
What this skill does
Given a source video URL and a time range, orchestrates:- Trim —
in_point/out_pointon the video element - Crop to vertical —
crop_left/crop_rightcenter-crop for 16:9 → 9:16 (auto-computed when omitted) - Transcribe —
POST /v1/tasks/transcribewith word-level timestamps - Segment — group words into caption lines (sentence breaks or pauses > 0.8s)
- Style — optional opening title with
differenceblend motion; caption lines cyclecolor→box→glow - Render —
POST /v1/rendersat 1080×1920 using the TikTok Captions recipe as the composition template
Example user prompt
Take the section from 00:22:43 to 00:44:56, crop to vertical, add an opening title with Difference blend, then burn in word-timed captions — mix color, box, and glow styles across different fonts.The agent converts timecodes to seconds (
1363 → 2696), runs this skill, and posts the resulting RenderRequest.
Load via MCP
Inputs
| Name | Required | Description |
|---|---|---|
source_url | required | URL of the source video |
start_seconds | required | Clip start in source seconds (convert HH:MM:SS first) |
end_seconds | required | Clip end in source seconds |
opening_title_text | optional | Display title at clip start with difference blend motion |
opening_title_duration | optional | Title visibility in seconds. Default: 3. |
crop_left | optional | Left crop fraction (0–1). Auto-computed for 16:9 when omitted (~0.342). |
crop_right | optional | Right crop fraction. Defaults to crop_left. |
source_aspect_ratio | optional | Used for auto-crop. Default: "16:9". |
Output
| Name | Description |
|---|---|
clip_url | Signed URL of the rendered 1080×1920 mp4 |
render_id | Render job ID for status polling |
Timecode conversion
ConvertHH:MM:SS to seconds before calling the skill:
Vertical crop (16:9 → 9:16)
Whencrop_left / crop_right are omitted, center-crop a landscape source:
crop_left = crop_right ≈ 0.342.
Caption line rules
After transcription, the agent:- Filters words to
[start_seconds, end_seconds]and offsets to composition time (0= clip start) - Groups words into lines — split on
.?!or a gap > 0.8s between words - Skips words inside
opening_title_durationwhen a title is set - Creates one
textelement per line — all reuse"id": "t1" - Sets each line’s
timeto the first word’sstartanddurationtolast_word.end − time - Cycles
word_animation.style:color→box→glow→color…
word_animation.words are absolute composition seconds — the same clock as each text element’s time. See Word Animation Examples.
Render composition template
Follow the TikTok Captions recipe for element shapes. Minimal skeleton:text element with the same "id": "t1" and the next style in the cycle.
