Revolutionary AI Model Qwen3.5-Omni Can Write Code from Voice Commands and Videos

Huma Shazia · March 31, 2026 at 8:01 PM · 10 min read

Alibaba's latest AI model, Qwen3.5-Omni, has learned to write code from spoken instructions and videos without being specifically trained for that task. The omnimodal model processes text, images, audio, and video, and the Qwen team reports that it outperforms Google's Gemini 3.1 Pro on audio tasks. It also brings much broader language support and speech recognition coverage than its predecessor.

Key Takeaways

Qwen3.5-Omni can write code from spoken instructions and videos
The model outperforms Google's Gemini 3.1 Pro in audio tasks
It supports 74 languages and 39 Chinese dialects for speech recognition

Introduction to Qwen3.5-Omni

The latest AI model from Alibaba, Qwen3.5-Omni, is an omnimodal model: it can process text, images, audio, and video, and it was trained on over 100 million hours of audiovisual material.

Qwen3.5-Omni can handle contexts up to 256,000 tokens
It can process more than ten hours of audio and over 400 seconds of 720p video at one frame per second
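
As a back-of-the-envelope check on these figures, the sketch below works out how many tokens per video frame and per second of audio a 256,000-token window would allow. It assumes, hypothetically, that the whole window is spent on a single modality. The article does not say how Qwen3.5-Omni tokenizes audio or video, so the results are upper bounds under that assumption, not measured rates. All names in the code are illustrative.

```python
# Figures taken from the article.
CONTEXT_TOKENS = 256_000      # maximum context length in tokens
VIDEO_SECONDS = 400           # "over 400 seconds" of 720p video
FRAMES_PER_SECOND = 1         # sampling rate quoted for video
AUDIO_HOURS = 10              # "more than ten hours" of audio


def budget_per_unit(context_tokens: int, units: int) -> float:
    """Tokens available per unit if the whole context is spent on those units.

    ASSUMPTION: one modality fills the entire window; the article does not
    confirm this, so treat the result as an upper bound.
    """
    if units <= 0:
        raise ValueError("units must be positive")
    return context_tokens / units


frames = VIDEO_SECONDS * FRAMES_PER_SECOND
audio_seconds = AUDIO_HOURS * 3600

tokens_per_frame = budget_per_unit(CONTEXT_TOKENS, frames)
tokens_per_audio_second = budget_per_unit(CONTEXT_TOKENS, audio_seconds)

print(f"frames sampled: {frames}")
print(f"upper bound on tokens per frame: {tokens_per_frame:.0f}")
print(f"upper bound on tokens per audio second: {tokens_per_audio_second:.2f}")
```

Because the article gives the durations as minimums ("over", "more than"), longer inputs would push the per-unit budgets lower, not higher.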

Groundbreaking Capabilities of Qwen3.5-Omni

One of the most notable features of Qwen3.5-Omni is its ability to write code from spoken instructions and videos. In principle, this could let developers describe or demonstrate what they want built instead of typing it out.

Qwen3.5-Omni can generate speech output alongside text
It has outperformed Google's Gemini 3.1 Pro in overall audio comprehension, reasoning, recognition, translation, and dialog

Massive Language Support

Qwen3.5-Omni has expanded its language support significantly, with speech recognition now covering 74 languages and 39 Chinese dialects. This is a massive jump from its predecessor, which handled only 11 languages and 8 Chinese dialects.

The model supports 36 languages and dialects for voice output
It has 55 voices available, including user-defined, scenario-specific, dialectal, and multilingual options

Figure: Bar charts comparing Qwen3.5-Omni-Plus, Qwen3.5-Omni-Flash, and Gemini 3.1 Pro across five audiovisual benchmarks: DailyOmni (84.6 vs. 82.7), WorldSense (62.8 vs. 65.5), AVUT (85.0 vs. 85.6), QualcommInteractive (68.5 vs. 66.2), and Omni-Cloze (64.8 vs. 57.2).

Technical Achievements of Qwen3.5-Omni

According to the Qwen team, Qwen3.5-Omni achieves state-of-the-art results on several audio and audiovisual benchmarks.

The model has scored 82.2 in audio comprehension (MMAU) versus 81.1 for Gemini 3.1 Pro
It has achieved a word error rate of 6.24 (lower is better) on the tough 'seed-hard' test set
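
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a transcript and its reference (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the Qwen team's evaluation code. The function name and the example sentences are made up, and the result is scaled to a percentage on the assumption that scores like 6.24 are reported that way, which the article does not state.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word error rate as a percentage of reference words.

    Uses word-level Levenshtein distance: the minimum number of
    substitutions, deletions, and insertions that turn the reference
    word sequence into the hypothesis word sequence.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Illustrative example: one substitution ("sat" -> "sit") and one deletion
# (the second "the"), so 2 errors over 6 reference words.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {score:.2f}")
```

Because insertions count as errors, the rate can exceed 100 when the hypothesis is much longer than the reference.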

Future Implications of Qwen3.5-Omni

The release of Qwen3.5-Omni has implications both for AI research and for industries ranging from software development to customer service.

Qwen3.5-Omni can enable more efficient and effective human-computer interaction
Its language support and speech recognition capabilities can facilitate communication across languages and cultures

Conclusion and Future Outlook

In conclusion, Qwen3.5-Omni combines voice- and video-driven code generation, broad language support, and strong benchmark results, which makes it a notable development in AI research.

Writing code from voice and video could change how people interact with computers and develop software
Its release has significant implications for the future of AI research and development

“Qwen3.5-Omni-Plus claims state of the art across 215 audio benchmarks”

— Qwen team

Final Thoughts

It remains to be seen how Qwen3.5-Omni will be used across industries and applications. With its multimodal capabilities and broad language support, the model could change how people work with software, and further advances are likely as AI research continues.

Sources & Credits

Originally reported by Jonathan Kemper

Figure: Five bar charts comparing speech generation by Qwen3.5-Omni-Plus, Gemini 2.5 Pro, GPT-Audio, ElevenLabs, and Minimax. Metrics include custom voice hard stability, multilingual stability across 20 and 10 languages, voice clone stability, and voice clone similarity. Qwen3.5-Omni-Plus scores highest in four of five categories.

Figure: Architecture diagram of Qwen3.5-Omni. A vision encoder and audio transformer (AuT) at the bottom take in video frames and audio data. The hybrid MoE thinker processes the inputs and outputs text. The hybrid MoE talker above it receives hidden states from the thinker, interleaves text and speech tokens, and generates speech output through a streaming codec decoder.
