Granite-4.0-Tiny-Base-Preview is a 7-billion-parameter hybrid mixture-of-experts (MoE) language model developed by IBM's Granite Team. It pairs Mamba-2 state-space layers with softmax attention blocks for expressiveness, supports a 128,000-token context window, and omits positional encoding (NoPE) to improve length generalization.
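As a base model, it can be loaded through the standard Hugging Face Transformers APIs. The sketch below is a minimal inference example; the repo id `ibm-granite/granite-4.0-tiny-base-preview` is an assumption and should be verified on the Hub before use.

```python
# Minimal inference sketch using Hugging Face Transformers; the repo id
# below is assumed and should be verified on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-tiny-base-preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to cut memory use
    device_map="auto",           # spread layers across available devices
)

prompt = "Mamba-2 differs from softmax attention in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```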
Key Features and Functionality:
- Extensive Context Window: Supports up to 128,000 tokens, facilitating the processing of lengthy documents and complex tasks.
- Advanced Architecture: Combines Mamba-2 state-space layers with softmax attention blocks, trading off the efficiency of state-space models against the expressiveness of attention.
- Multilingual Support: Trained on 12 languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese, with the flexibility to be fine-tuned for additional languages.
- Versatile Applications: Designed for tasks such as summarization, text classification, extraction, question answering, and other long-context applications (see the summarization sketch after this list).
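As one illustration of a long-context task, the sketch below wraps the model in a completion-style summarization prompt, reusing `model` and `tokenizer` from the loading example above. The prompt template is an assumption suited to a base (non-instruct) model, not a documented format.

```python
# Hedged long-context summarization sketch; the "Document:/Summary:" prompt
# template is an assumption, and 128,000 tokens is the documented limit.
def summarize(document: str, max_new_tokens: int = 256) -> str:
    prompt = f"Document:\n{document}\n\nSummary:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Guard against exceeding the 128K-token context window.
    if inputs["input_ids"].shape[1] + max_new_tokens > 128_000:
        raise ValueError("Prompt too long for the 128K context window")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Return only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```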
Primary Value and User Solutions:
Granite-4.0-Tiny-Base-Preview addresses the need for a robust, multilingual language model that can handle long contexts. Its architecture and training enable effective text-to-text generation across a wide range of tasks, making it suitable for applications requiring language understanding and generation in multiple languages. Because it is a base model, it can also be fine-tuned (as sketched below), letting users adapt it to specific domains or to languages beyond the initial 12, which provides flexibility and scalability for diverse use cases.
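For such adaptation, parameter-efficient fine-tuning is one common route. The sketch below uses the peft library's LoRA adapters with the Transformers Trainer; the `target_modules` names and output path are assumptions that should be checked against the model's actual module layout (e.g., via `model.named_modules()`), and the toy corpus stands in for real domain data.

```python
# Minimal LoRA fine-tuning sketch with peft + transformers; reuses `model`
# and `tokenizer` from the loading example above.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Toy corpus for illustration; substitute your domain or language data.
train_dataset = Dataset.from_dict(
    {"text": ["Example domain sentence one.", "Example domain sentence two."]}
).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

lora_config = LoraConfig(
    r=16,                                 # low-rank adapter dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed names; inspect the model first
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(
        output_dir="granite-lora",        # hypothetical output path
        per_device_train_batch_size=1,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```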