Revolutionary AI Model Qwen3.5-Omni Can Write Code from Voice Commands and Videos

Huma Shazia · 31 March 2026 at 8:01 pm · 10 min read

Alibaba's latest AI model, Qwen3.5-Omni, can write code from spoken instructions and videos, a skill it picked up without being specifically trained for it. The omnimodal model processes text, images, audio, and video, and outperforms Google's Gemini 3.1 Pro on audio tasks. Its speech recognition covers 74 languages and 39 Chinese dialects.

Key Takeaways

  • Qwen3.5-Omni can write code from spoken instructions and videos
  • The model outperforms Google's Gemini 3.1 Pro in audio tasks
  • It supports 74 languages and 39 Chinese dialects for speech recognition

In This Article

  • Introduction to Qwen3.5-Omni
  • Groundbreaking Capabilities of Qwen3.5-Omni
  • Massive Language Support
  • Technical Achievements of Qwen3.5-Omni
  • Future Implications of Qwen3.5-Omni
  • Conclusion and Future Outlook

Introduction to Qwen3.5-Omni

Qwen3.5-Omni is the latest AI model from Alibaba. The omnimodal model processes text, images, audio, and video, and was trained on more than 100 million hours of audiovisual material.

  • Qwen3.5-Omni can handle contexts up to 256,000 tokens
  • It can process more than ten hours of audio and over 400 seconds of 720p video at one frame per second
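
The limits above allow a rough back-of-the-envelope check. The sketch below assumes, purely for illustration, that the entire 256,000-token context is spent on a single input type; the article does not say how the model actually allocates tokens, and the helper name `tokens_per_unit` is hypothetical.

```python
CONTEXT_TOKENS = 256_000       # maximum context length quoted above
AUDIO_SECONDS = 10 * 60 * 60   # "more than ten hours" of audio, taken as exactly ten
VIDEO_SECONDS = 400            # "over 400 seconds" of video, taken as exactly 400
VIDEO_FPS = 1                  # one frame per second


def tokens_per_unit(total_tokens: int, units: int) -> float:
    """Upper bound on tokens per unit if the whole context went to one input type."""
    return total_tokens / units


video_frames = VIDEO_SECONDS * VIDEO_FPS
audio_rate = tokens_per_unit(CONTEXT_TOKENS, AUDIO_SECONDS)
frame_budget = tokens_per_unit(CONTEXT_TOKENS, video_frames)

print(f"video frames: {video_frames}")
print(f"audio: at most {audio_rate:.1f} tokens per second")
print(f"video: at most {frame_budget:.0f} tokens per frame")
```

Under these assumptions the budget works out to 400 frames, at most about 7.1 tokens per second of audio, and at most 640 tokens per video frame. These are upper bounds implied by the quoted limits, not measured properties of the model.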

[Figure: Bar charts comparing Qwen3.5-Omni-Plus, Qwen3.5-Omni-Flash, and Gemini 3.1 Pro across five audiovisual benchmarks: DailyOmni (84.6 vs. 82.7), WorldSense (62.8 vs. 65.5), AVUT (85.0 vs. 85.6), QualcommInteractive (68.5 vs. 66.2), and Omni-Cloze (64.8 vs. 57.2).]

Groundbreaking Capabilities of Qwen3.5-Omni

A headline feature of Qwen3.5-Omni is its ability to write code from spoken instructions and videos, which could change how developers interact with computers and build software.

  • Qwen3.5-Omni can generate speech output alongside text
  • It has outperformed Google's Gemini 3.1 Pro in overall audio comprehension, reasoning, recognition, translation, and dialog

[Figure: Five bar charts comparing speech generation by Qwen3.5-Omni-Plus, Gemini 2.5 Pro, GPT-Audio, ElevenLabs, and Minimax. Metrics include custom voice hard stability, multilingual stability across 20 and 10 languages, voice clone stability, and voice clone similarity. Qwen3.5-Omni-Plus scores highest in four of five categories.]

Massive Language Support

Speech recognition in Qwen3.5-Omni now covers 74 languages and 39 Chinese dialects, up from 11 languages and 8 Chinese dialects in its predecessor.

  • The model supports 36 languages and dialects for voice output
  • It has 55 voices available, including user-defined, scenario-specific, dialectal, and multilingual options

[Figure: Architecture diagram of Qwen3.5-Omni. A vision encoder and audio transformer (AuT) at the bottom take in video frames and audio data. The hybrid MoE thinker processes the inputs and outputs text. The hybrid MoE talker above it receives hidden states from the thinker, interleaves text and speech tokens, and generates speech output through a streaming codec decoder.]
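
The data flow described in the architecture caption can be sketched as a toy pipeline. Everything below is a hypothetical stand-in: the function names, the string "tokens", and the one-text-then-one-speech interleaving pattern are assumptions made for illustration and are not taken from Qwen's code or documentation. The sketch shows only the order in which the components hand data to one another.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class ThinkerOutput:
    text_tokens: List[str]
    hidden_states: List[float]  # placeholder; this toy talker does not use them


def encode_inputs(video_frames: List[str], audio_chunks: List[str]) -> List[str]:
    """Stand-in for the vision encoder and the audio transformer (AuT)."""
    return [f"vis({f})" for f in video_frames] + [f"aud({a})" for a in audio_chunks]


def thinker(features: List[str]) -> ThinkerOutput:
    """Stand-in for the hybrid MoE thinker: consumes features, emits text."""
    text = ["saw", str(len(features)), "inputs"]
    return ThinkerOutput(text_tokens=text, hidden_states=[0.0] * len(text))


def talker(out: ThinkerOutput) -> List[str]:
    """Stand-in for the hybrid MoE talker: interleaves text and speech tokens."""
    interleaved = []
    for token in out.text_tokens:
        interleaved.append(f"text:{token}")    # assumed pattern: one text token...
        interleaved.append(f"speech:{token}")  # ...followed by one speech token
    return interleaved


def codec_decoder(tokens: List[str]) -> Iterator[str]:
    """Stand-in for the streaming codec decoder: yields audio chunk by chunk."""
    for token in tokens:
        if token.startswith("speech:"):
            yield f"wave<{token[len('speech:'):]}>"


features = encode_inputs(["frame0", "frame1"], ["chunk0"])
thought = thinker(features)
audio_stream = list(codec_decoder(talker(thought)))
print(thought.text_tokens)
print(audio_stream)
```

Writing the decoder as a generator mirrors the "streaming" idea in the caption: audio chunks can be consumed as soon as the talker has produced the corresponding speech tokens, rather than after the full response is complete.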

Technical Achievements of Qwen3.5-Omni

Qwen3.5-Omni posts state-of-the-art results on several audio and audiovisual benchmarks.

  • The model has scored 82.2 in audio comprehension (MMAU) versus 81.1 for Gemini 3.1 Pro
  • It has achieved a word error rate of 6.24 on the tough 'seed-hard' test set
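
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is a minimal version of that standard definition. It is not the Qwen team's evaluation code, the example sentences are made up, and the article does not state which text normalization was applied or whether 6.24 is a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("code" -> "coat") and one deletion ("from").
wer = word_error_rate("write code from voice commands", "write coat voice commands")
print(f"{wer:.2%}")
```

Two errors against five reference words give a rate of 40% in this toy case. Because insertions count as errors too, the metric can exceed 100% when the output is much longer than the reference.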

Future Implications of Qwen3.5-Omni

The release of Qwen3.5-Omni matters beyond the benchmark tables. Its capabilities could affect industries from software development to customer service.

  • Qwen3.5-Omni can enable more efficient and effective human-computer interaction
  • Its language support and speech recognition capabilities can facilitate communication across languages and cultures

Conclusion and Future Outlook

Qwen3.5-Omni combines code generation from voice and video, broad language support, and strong benchmark results, which makes it a notable development in AI research.

  • Qwen3.5-Omni is set to revolutionize the way we interact with computers and develop software
  • Its release has significant implications for the future of AI research and development

According to the Qwen team, Qwen3.5-Omni-Plus reaches state of the art across 215 audio benchmarks.

Final Thoughts

As we move forward, it will be exciting to see how Qwen3.5-Omni is used in various industries and applications. With its unprecedented capabilities and massive language support, this AI model has the potential to revolutionize the way we live and work. As AI research continues to advance, we can expect to see even more innovative and groundbreaking developments in the future.

Sources & Credits

Originally reported by Jonathan Kemper

Huma Shazia

Senior AI & Tech Writer