Infinite-Length Video Generation with Error Recycling
PS: The background video is generated by our Stable Video Infinity
⚠️ Note
All videos displayed on this website have been compressed for web delivery, which may reduce visual quality relative to the original generated content; the compression is necessary to keep loading times and bandwidth usage reasonable. In addition, all videos have been sped up from 16 FPS to 24 FPS for a smoother viewing experience.
⚖️ Copyright
Some materials and video sources are derived from real videos. The generated content is for academic use only; commercial use is not permitted.
This setting targets the needs of vloggers (e.g., on TikTok) for short-video creation, emphasizing moderate scene transitions.
This setting targets vlogger use cases (e.g., TikTok), emphasizing storytelling with plausible scene transitions and engaging content. All methods are conditioned on the same prompts. In the compared methods, accumulated errors manifest as (1) failure to follow the text prompt, (2) degraded motion, and (3) visual artifacts.
This setting aims to generate temporally coherent videos within a homogeneous scene controlled by a single text prompt, aligning with the standard long-video generation objective.
[1] Henschel, R., Khachatryan, L., Poghosyan, H., Hayrapetyan, D., Tadevosyan, V., Wang, Z., ... & Shi, H. (2025). StreamingT2V: Consistent, dynamic, and extendable long video generation from text. In CVPR 2025.
[2] Zhang, L., & Agrawala, M. (2025). Packing input frame context in next-frame prediction models for video generation. arXiv preprint arXiv:2504.12626.
[3] Wan, T., Wang, A., Ai, B., Wen, B., Mao, C., Xie, C. W., ... & Liu, Z. (2025). Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314.
[4] Kong, Z., Gao, F., Zhang, Y., Kang, Z., Wei, X., Cai, X., ... & Luo, W. (2025). Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation. In NeurIPS 2025.
[5] Wang, X., Zhang, S., Tang, L., Zhang, Y., Gao, C., Wang, Y., & Sang, N. (2025). UniAnimate-DiT: Human image animation with large-scale video diffusion transformer. arXiv preprint arXiv:2504.11289.