ByteDance Adds Image Recognition, 3D Generation to Doubao AI Assistant
Zhang Yushuo
DATE:  Dec 18 2024
/ SOURCE:  Yicai

(Yicai) Dec. 12 -- Chinese tech giant ByteDance has added image recognition and 3D generation features to its Doubao artificial intelligence assistant.

Doubao Visual Understanding Model boasts enhanced content recognition, visual understanding, reasoning, and text creation capabilities, Tan Dai, president of ByteDance's cloud-computing services unit Volcano Engine, said at the Volcano Engine Force Conference Winter 2024 today.

Doubao Visual Understanding Model costs only 0.003 yuan (0.0004 US dollars) per thousand tokens, which is 85 percent lower than the industry average, Tan noted.
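For a sense of scale, the quoted price can be worked through in a few lines. This is only an illustrative sketch: the function names are hypothetical, the US dollar figure is the rounded one given above, and the "industry average" is merely what an 85 percent discount would imply, not a number reported by Volcano Engine.

```python
# Illustrative arithmetic only; names are hypothetical and inputs are the
# rounded figures quoted in the article.
PRICE_USD_PER_1K_TOKENS = 0.0004  # quoted price per thousand tokens (rounded)
DISCOUNT_VS_AVERAGE = 0.85        # "85 percent lower than the industry average"


def cost_usd(tokens: int, price_per_1k: float = PRICE_USD_PER_1K_TOKENS) -> float:
    """Cost in US dollars of processing `tokens` tokens at the quoted price."""
    return tokens / 1000 * price_per_1k


def implied_average_per_1k(
    price_per_1k: float = PRICE_USD_PER_1K_TOKENS,
    discount: float = DISCOUNT_VS_AVERAGE,
) -> float:
    """Average price implied by the discount: price = average * (1 - discount)."""
    return price_per_1k / (1 - discount)


if __name__ == "__main__":
    print(f"1 million tokens: about ${cost_usd(1_000_000):.2f}")
    print(f"Implied average per 1K tokens: about ${implied_average_per_1k():.4f}")
```

On these rounded inputs, one million tokens would cost roughly 40 US cents, and the implied industry average works out to a little under 0.003 US dollars per thousand tokens.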

When used in tandem with Volcano Engine's digital twin platform veOmniverse, Doubao 3D Generation Model can efficiently handle intelligent training, data synthesis, and digital asset creation, with the pairing serving as a simulator that supports AI creation, Tan added.

Volcano Engine also released updates on the development progress of Doubao. Its Doubao General Model Pro is now on the same level as GPT-4o at one-eighth of the price, the Beijing-based company said.

Doubao's music generation model has evolved from creating simple structures of 60 seconds to producing complete works of up to three minutes.

Its text-to-image model v2.1 is the first of its kind to accurately generate Chinese characters and create images based on one-sentence instructions. Volcano Engine has already integrated it into Dreamina AI and the Doubao App.

Moreover, Volcano Engine announced it plans to launch a Doubao video generation model with extended generation capabilities next spring and an end-to-end real-time voice model soon, unlocking new abilities, such as multi-character performance and dialect conversion.

ByteDance released Doubao in August last year. By January, it had already been installed 25 million times, with daily usage exceeding four trillion tokens, up 33-fold from its initial release, according to market research institute QuestMobile. Last month, the Doubao App had nearly 60 million monthly active users, second only to ChatGPT.

Even though Doubao launched much later than ChatGPT, it has rapidly iterated and developed, becoming one of the most comprehensive and technologically advanced AI models in China, Tan said.

Shortly after the launch of video generation model Sora, OpenAI faced issues of insufficient computing power. Therefore, ByteDance will need a large-scale intelligent computing center before releasing Doubao's video generation model, Huatai Securities said in a recent research report.

At today's conference, Volcano Engine is also set to unveil the AI+ hardware leap plan, co-developed with Toycity and Espressif Systems to explore AI-driven toys.

Editor: Futura Costaglione

Keywords:   ByteDance,Doubao,LLM