China’s ByteDance Reveals World’s First Video Generation Tool Trained on Raw Visual Data

Lv Qian

DATE: Feb 11 2025

SOURCE: Yicai

(Yicai) Feb. 11 -- Chinese tech giant ByteDance made public its experimental open-source video generation model VideoWorld yesterday. Unlike mainstream multimodal models such as US artificial intelligence firm OpenAI's Sora and DALL-E, which generate videos and images from text prompts, this AI tool is the first in the industry able to recognize and understand the world through pure visual input, such as unlabeled videos, without relying on text or language models.

Developed by ByteDance’s Doubao Large Language Model team, Beijing Jiaotong University and the University of Science and Technology of China, the AI tool is part of an academic research project that is exploring new technical approaches and has not yet been released as a finished product, company insiders told Yicai.

LLMs extract knowledge from video sequences far less efficiently than from text, mainly because videos contain a lot of redundant information, the Doubao team said. This led the team to develop VideoWorld, which achieves efficient video learning by retaining rich visual information while compressing the visual changes that result from key decisions and actions.

VideoWorld is not the first video-based AI tool developed by ByteDance. Last week, the company said it would soon release another multimodal video generation tool called OmniHuman, which can generate an AI video from just one picture and one audio clip. OmniHuman is a closed-source model developed in-house by ByteDance, the parent company of TikTok.

The Beijing-based firm has also previously released the text-to-video generative model MagicVideo-V2 and the general multimodal large model UniDoc.

Other internet behemoths such as Alibaba Group Holding, Tencent Holdings and Kuaishou Technology have also recently launched video generation tools and disclosed their developments in the multimodal domain.

Competition among leading developers of multimodal LLMs is fierce, according to a research report by CITIC Securities. Video better aligns with the entertainment needs of end users, especially given its strong compatibility with the short video industry, so it has greater potential to produce popular applications, although the models’ performance still needs to improve.

Editor: Kim Taylor

Keywords: ByteDance, Doubao, VideoWorld, LLM
