Microsoft has announced three new open-source AI models in the Phi-3.5 series: Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct, and Phi-3.5-vision-instruct. Released under the permissive MIT license, the models give developers tools for a range of tasks, including reasoning, multilingual processing, and image and video analysis.
With 3.82 billion parameters, the Phi-3.5-mini-instruct model is optimized for fast, basic reasoning tasks. It is designed to run in memory- and compute-constrained environments, making it well suited for code generation, mathematical problem solving, and logic-based reasoning. Despite its compact size, Phi-3.5-mini-instruct outperforms larger models such as Meta’s Llama-3.1-8B-instruct and Mistral-7B-instruct on benchmarks such as RepoQA, which measures long-context code comprehension.
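For readers who want to try the model, a minimal sketch of loading it through the Hugging Face transformers library follows. It assumes the model is published on the Hub under the ID microsoft/Phi-3.5-mini-instruct; check the model card for the exact name, prompt format, and hardware requirements:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face Hub ID; verify against the published model card.
model_id = "microsoft/Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the ~3.8B model within modest GPU memory
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function that returns the n-th Fibonacci number."},
]

# apply_chat_template wraps the conversation in the model's expected prompt tokens
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Strip the prompt tokens before decoding the model's reply
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```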
With 41.9 billion parameters, the Phi-3.5-MoE-instruct model employs a mixture-of-experts (MoE) architecture: a router activates only a subset of the model’s expert parameters for each input, letting it handle more complex reasoning tasks while keeping per-token compute well below that of a dense model of the same size (a sketch of the technique follows below). The MoE model outperforms larger peers such as Google’s Gemini 1.5 Flash on a range of benchmarks, demonstrating advanced reasoning capabilities and making it a strong fit for applications that require deep, context-aware understanding and decision-making.
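To make the routing idea concrete, here is a minimal, illustrative top-k mixture-of-experts layer in PyTorch. This is a generic sketch of the technique, not Microsoft’s implementation; the expert count, top-k value, and layer sizes are hypothetical values chosen for readability:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative top-k mixture-of-experts feed-forward layer.

    A router scores each token, only the top-k experts run for that token,
    and their outputs are combined weighted by the router probabilities.
    """

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 16, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model) — route each token to its top-k experts
        scores = self.router(x)                      # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # (tokens, k)
        weights = F.softmax(weights, dim=-1)         # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e             # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

layer = TopKMoELayer(d_model=64, d_ff=256)
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

Because only k of the n_experts sub-networks execute per token, total parameter count grows with the number of experts while per-token compute stays roughly constant, which is the trade-off the Phi-3.5-MoE design exploits.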
The Phi-3.5-vision-instruct model has 4.15 billion parameters and integrates text and image processing. This multimodal design lets it handle a variety of tasks, including image understanding, optical character recognition, and video summarization. It supports a 128K-token context length, equipping it for complex multi-frame visual tasks. Trained on a combination of synthetic and public datasets, the model performs strongly on benchmarks such as TextVQA and ScienceQA, delivering high-quality visual analysis.
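A hedged usage sketch follows, based on the pattern documented for Phi-3 vision models on Hugging Face. The Hub ID microsoft/Phi-3.5-vision-instruct, the `<|image_1|>` placeholder convention, and the example image URL are assumptions to verify against the model card:

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"  # assumed Hub ID; verify on the model card

# trust_remote_code is needed when the vision model ships custom modeling code
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

url = "https://example.com/slide.png"  # hypothetical image URL, for illustration only
image = Image.open(requests.get(url, stream=True).raw)

# <|image_1|> marks where the first image is injected into the prompt
messages = [{"role": "user", "content": "<|image_1|>\nSummarize the text shown in this image."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens before decoding the answer
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```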
All three Phi-3.5 models were trained at considerable scale. Phi-3.5-mini-instruct was trained on 3.4 trillion tokens using 512 H100-80G GPUs over 10 days. Phi-3.5-MoE-instruct required a longer run, processing 4.9 trillion tokens over 23 days on the same number of GPUs. Phi-3.5-vision-instruct was trained on 500 billion tokens using 256 A100-80G GPUs over 6 days. This training regimen allows the Phi-3.5 models to perform strongly across numerous benchmarks, outperforming other leading AI models, including OpenAI’s GPT-4o, in several scenarios.
Benchmark comparison of Phi-3.5-mini-instruct with other leading AI models (Source: Hugging Face)
These benchmark results show how the Phi-3.5 models, especially Phi-3.5-mini-instruct, compare against other leading AI models such as Mistral, Llama, and Gemini across a range of tasks. The data highlights the models’ effectiveness on tasks ranging from general reasoning to more specific problem-solving scenarios.
Reactions from the AI community have focused on the technical capabilities of the Phi-3.5 series, especially in multilingual and vision tasks. On social media, users praised the models’ benchmark performance and expressed interest in their potential applications. For example, Dr. Turan Jafarzade commented on LinkedIn:
These advantages position the Phi-3.5 SLM (Small Language Model) as a competitive model for enterprise applications where efficiency and scalability are key.
Danny Penrose added:
This is an exciting development! Being able to convert Phi-3.5 to the Llama architecture without any performance degradation opens up real possibilities for model optimization. How do you think this will impact the wider adoption of these models in real-world applications?
The Phi-3.5 models are released under the MIT license, which allows developers to freely use, modify, and distribute the software for both commercial and non-commercial purposes. The license lowers the barrier to integrating these AI capabilities into applications and projects across a wide range of industries.