Why Seedance 2.0?
Text-to-video has been “almost there” for two years. What changed in 2026 is that three separate properties finally landed in a single model:
- Native audio. Synchronized SFX, ambient sound, and lip-sync come with the clip — no ElevenLabs pass, no manual mixing.
- Reference-to-video. You can pass multiple images, a video, and audio together. That's the unlock for character consistency, camera-move transfer, and voice-synced ads.
- Real 1080p at consumer prices. Earlier models forced you to upscale. Seedance renders natively, which matters when you're paying per second and iterating 20 times.
For a solo founder shipping an app marketing video, those three properties are the difference between a one-hour kit and a one-week kit.
Three modes, auto-picked
Seedance 2.0 is three models in a trenchcoat. Newly picks the right one for you based on your inputs — there is no mode toggle in the UI.
Text-to-video
Only a prompt. Best for mood openers, logo animations, abstract b-roll. Capped at ~10 seconds. Endpoint: bytedance/seedance-2.0/text-to-video.
Image-to-video
Exactly one reference image. Animate a hero shot, a UI screenshot, or a mascot. Supports an optional end-frame image for cinematic transitions (A → B). Endpoint: bytedance/seedance-2.0/image-to-video.
Reference-to-video
Two or more images, OR any video, OR any audio (audio must be paired with at least one image or video; see the media rules below). Reference them as @Image1, @Video1, @Audio1 in the prompt. This is how you keep a character consistent across three scenes. Endpoint: bytedance/seedance-2.0/reference-to-video.
The panel shows you which sub-model will run before you submit, so you can add or remove media to force a different mode if needed.
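If you're calling the model directly rather than through the panel, the same routing is easy to reproduce. Here's a minimal sketch using fal's Python client with the endpoint IDs listed above; the argument names (`prompt`, `image_url`) are illustrative guesses, so check the model card for the real schema before relying on them.

```python
import fal_client

def pick_endpoint(images=(), videos=(), audio=()):
    """Mirror the auto-pick rules: 2+ images or any video/audio ->
    reference-to-video; exactly one image -> image-to-video; else text-to-video."""
    if len(images) >= 2 or videos or audio:
        return "bytedance/seedance-2.0/reference-to-video"
    if len(images) == 1:
        return "bytedance/seedance-2.0/image-to-video"
    return "bytedance/seedance-2.0/text-to-video"

endpoint = pick_endpoint(images=["https://example.com/hero.png"])  # -> image-to-video
result = fal_client.subscribe(
    endpoint,
    arguments={
        # Argument names are assumptions, not the confirmed request schema.
        "prompt": "The mascot picks up the phone and smiles at the camera, slow dolly in",
        "image_url": "https://example.com/hero.png",
    },
)
print(result)
```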
Fast vs Standard
Each mode has a Fast variant. The trade-off:
Fast (amber toggle)
30–60 seconds per generation. Capped at 720p. ~3–5× cheaper. Perfect for iteration: write the prompt, generate, watch, tweak.
Standard
2–3 minutes per generation. Up to 1080p. Extra quality pass visible on faces, hair, and complex motion. Use for the final render before uploading to the App Store or a paid ads account.
Our recommended loop: draft in Fast, lock in Standard. Nine out of ten iterations happen in Fast mode; the last render goes to Standard with the final prompt.
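In code, the draft-then-lock loop is the same call with a different quality setting. The guide doesn't specify how the Fast variant is exposed, so the `variant` and `resolution` arguments below are placeholders; swap in whatever the Fast endpoints actually accept.

```python
import fal_client

ENDPOINT = "bytedance/seedance-2.0/text-to-video"

def generate(prompt: str, fast: bool = True):
    # "variant" and "resolution" are hypothetical argument names standing in
    # for the Fast/Standard toggle; Fast is capped at 720p per the notes above.
    return fal_client.subscribe(
        ENDPOINT,
        arguments={
            "prompt": prompt,
            "resolution": "720p" if fast else "1080p",
            "variant": "fast" if fast else "standard",
        },
    )

prompt = "Phone on a desk at golden hour, slow dolly in, app UI animates onto the screen"
draft = generate(prompt, fast=True)    # iterate here, roughly 30-60s per run
final = generate(prompt, fast=False)   # lock the final prompt in Standard
```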
Reference media rules
Images
Up to 9 JPEG / PNG / WebP. 30 MB each. Reference in prompt as @Image1, @Image2, etc. Use the same image across scenes to lock identity.
Videos
Up to 3 MP4 / MOV. 50 MB total. Combined duration 2–15 seconds. Great for transferring camera moves or emotional energy.
Audio
Up to 3 MP3 / WAV. 15 MB each, 15 seconds combined. Requires at least one image or video reference alongside it — audio alone is not enough.
Total ceiling
Up to 12 files across all modalities per generation. Past that, split the scene into two clips and stitch in post.
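If you're scripting uploads, it's worth a pre-flight check against these ceilings before you burn a generation. The sketch below encodes the limits above as guide-level numbers, not the API's own validation, and it only checks counts and sizes (not the 2-15 second video duration rule, which needs media probing).

```python
import os

MAX_IMAGES, MAX_IMAGE_MB = 9, 30
MAX_VIDEOS, MAX_VIDEO_TOTAL_MB = 3, 50
MAX_AUDIO, MAX_AUDIO_MB = 3, 15
MAX_TOTAL_FILES = 12

def _mb(path: str) -> float:
    return os.path.getsize(path) / (1024 * 1024)

def check_reference_media(images=(), videos=(), audio=()):
    """Return a list of problems; an empty list means the bundle fits the limits."""
    problems = []
    if len(images) > MAX_IMAGES:
        problems.append(f"{len(images)} images (max {MAX_IMAGES})")
    problems += [f"image over {MAX_IMAGE_MB} MB: {p}" for p in images if _mb(p) > MAX_IMAGE_MB]
    if len(videos) > MAX_VIDEOS:
        problems.append(f"{len(videos)} videos (max {MAX_VIDEOS})")
    if sum(_mb(p) for p in videos) > MAX_VIDEO_TOTAL_MB:
        problems.append(f"videos exceed {MAX_VIDEO_TOTAL_MB} MB combined")
    if len(audio) > MAX_AUDIO:
        problems.append(f"{len(audio)} audio files (max {MAX_AUDIO})")
    problems += [f"audio over {MAX_AUDIO_MB} MB: {p}" for p in audio if _mb(p) > MAX_AUDIO_MB]
    if audio and not (images or videos):
        problems.append("audio needs at least one image or video reference")
    if len(images) + len(videos) + len(audio) > MAX_TOTAL_FILES:
        problems.append(f"more than {MAX_TOTAL_FILES} files total; split the scene into two clips")
    return problems
```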
Prompt patterns that work
Prompt engineering for video is more boring than for text. You're not coaxing creativity, you're specifying a shot list. Four patterns:
Action first, setting second
“Woman jogging along a beach at sunrise, camera tracking beside her” works. “A beach. It’s morning. Someone runs” does not.
Name references explicitly
“@Image1 walks into the shot, picks up the product, smiles at the camera”. Avoid “the person in the image” — be explicit about @Image1.
Borrow language from film
Dolly in, crane up, whip pan, handheld, macro, drone, 35mm. Seedance was trained on labeled film data and responds to these terms.
End with a beat
For ads, end the prompt with the payoff: “...then the app UI animates in and the logo stamps in the final frame”. Clips that don’t end cleanly are unusable.
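The four patterns compress into a small template. Here's a sketch of a prompt builder that keeps the action first, names references explicitly, borrows film vocabulary, and always ends on a beat; this is a convention for this guide, not an API requirement.

```python
def build_shot_prompt(subject: str, action: str, setting: str, camera: str, payoff: str) -> str:
    """subject is an "@Image1"-style reference when media is attached, otherwise a plain noun."""
    return ", ".join([
        f"{subject} {action}",   # action first
        setting,                 # setting second
        camera,                  # film-language camera move
        f"then {payoff}",        # end with the payoff beat
    ])

print(build_shot_prompt(
    subject="@Image1",
    action="walks into the shot, picks up the product, smiles at the camera",
    setting="bright minimal studio at sunrise",
    camera="slow dolly in, 35mm",
    payoff="the app UI animates in and the logo stamps in the final frame",
))
```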
Ad formats for app launch
Instagram / TikTok vertical (9:16)
6–10 seconds, 1080p. Generate-audio ON. Hook in the first 0.5s, payoff at 4s, CTA card in post.
App Store preview (16:9 or 9:16)
Up to 30 seconds total. Generate three 10-second clips and stitch. Audio optional — most users watch muted.
Product Hunt loop (1:1)
4–6 seconds, no audio, seamless loop. Use image-to-video with the same start and end frame for a clean loop.
YouTube pre-roll (16:9)
6 or 15 seconds, 1080p. Use reference-to-video with a brand color card as @Image2 to anchor the palette.
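These specs fit naturally into a preset table you can merge into whichever endpoint you call. Keys such as `aspect_ratio`, `duration`, and `generate_audio` are illustrative names, not the confirmed request schema.

```python
AD_FORMATS = {
    "tiktok_vertical":   {"aspect_ratio": "9:16", "duration": 8,  "resolution": "1080p", "generate_audio": True},
    "app_store_preview": {"aspect_ratio": "9:16", "duration": 10, "resolution": "1080p", "generate_audio": False},  # stitch 3 clips to ~30s
    "product_hunt_loop": {"aspect_ratio": "1:1",  "duration": 5,  "resolution": "1080p", "generate_audio": False},  # same start/end frame
    "youtube_preroll":   {"aspect_ratio": "16:9", "duration": 6,  "resolution": "1080p", "generate_audio": True},
}

prompt = "@Image1 taps through the onboarding flow, whip pan to the paywall, logo stamps in the final frame"
arguments = {"prompt": prompt, **AD_FORMATS["tiktok_vertical"]}
```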
Pricing & performance
Ballparks as of April 2026, per generation:
- Fast, 5s, 720p: ~$0.15. Generation time: 30–60s.
- Fast, 10s, 720p: ~$0.30. Generation time: 45–90s.
- Standard, 5s, 1080p, audio on: ~$0.50. Generation time: 2m.
- Standard, 10s, 1080p, audio on: ~$1.00. Generation time: 3–4m.
A typical app launch kit (1 hero clip + 3 variants + 1 App Store preview) runs $2–$4 in compute. Cheap enough that you should generate 10 variants and pick the best two.
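Those ballparks reduce to roughly $0.03 per rendered second in Fast and $0.10 per second in Standard, which makes budgeting a session trivial. A rough estimator, with rates derived only from the list above, so treat it as a sketch rather than a quote:

```python
# Approximate per-second rates implied by the April 2026 ballparks above.
RATE_PER_SECOND = {"fast": 0.03, "standard": 0.10}

def estimate_cost(clips) -> float:
    """clips: iterable of (mode, seconds) tuples."""
    return sum(RATE_PER_SECOND[mode] * seconds for mode, seconds in clips)

# Ten 5-second Fast drafts plus one 10-second Standard final render:
session = [("fast", 5)] * 10 + [("standard", 10)]
print(f"~${estimate_cost(session):.2f}")  # ~$2.50
```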
Seedance vs Sora / Veo / Pika
vs Sora 2 (OpenAI)
Sora 2 edges ahead on cinematic prompts and long takes. Seedance wins on price, on image-to-video, and on native audio. For app ads, Seedance is almost always the right default.
vs Veo 3 (Google)
Veo 3 leads on photoreal humans. Seedance leads on reference-to-video (multi-image + video + audio combined). If you’re animating a stylized brand mascot, Seedance is better.
vs Pika 2 / Pika Turbo
Pika is faster and has fun style presets. Seedance has stricter prompt adherence and much better character consistency across scenes.
vs Runway Gen-4
Runway’s editor is better than ours — but Seedance 2.0’s raw model quality and reference-to-video flexibility are a step ahead, and inside Newly you get it integrated with images and outreach.
Sources & further reading
Official product pages, APIs, and background reading for models and tools mentioned in this guide. Newly is not affiliated with these vendors; links are for your own research.
- Fal — ByteDance Seedance 2.0 (text-to-video)
Official model card for the “prompt only” path described in the three-mode overview (~10s cap, mood openers, abstract b-roll).
- Fal — ByteDance Seedance 2.0 (image-to-video)
Single image input and optional end frame — matches the “hero to motion” and App Store–style ad loops in this guide.
- Fal — ByteDance Seedance 2.0 (reference-to-video)
Multi-image, video, and audio reference slots; this is the API surface behind character-consistent and multi-reference prompts.
- Fal — home
Pricing and latency for Seedance 2.0 and related models (the per-second cost ranges stated here track Fal’s published rates).
- OpenAI — Sora
OpenAI’s high-end generative video line; the article compares Sora 2’s cinematic quality vs Seedance 2.0 on price and modes.
- Google DeepMind — Veo
Google’s Veo 3 class of video models, cited for photorealism vs reference-heavy Seedance 2.0 use cases.
- Pika
Pika 2 / Turbo is discussed as a faster, style-preset–oriented alternative with looser character consistency than Seedance 2.0 in multi-scene work.
- Runway
Runway Gen-4 and its editor are compared to raw Seedance 2.0 model quality in the competitive section.
- Fal — Nano Banana 2 (for reference images)
Image model used in this guide to build character sheets before reference-to-video — pairs with the image generator article in the same cluster.
- Google AI for Developers — Image generation (Gemini API)
Related reading if you are generating stills in the same launch kit as the clips described here.