
Improve the efficiency of neural network training with algorithmic methods that increase speed, boost quality, and reduce cost.
MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML. MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases (ALiBi). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA's FasterTransformer.
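Because MPT ships its own model code, serving it through standard HuggingFace tooling mainly requires enabling custom-code execution. Below is a minimal sketch, assuming the public mosaicml/mpt-7b checkpoint on the Hugging Face Hub and the EleutherAI GPT-NeoX-20B tokenizer that the model card pairs with it:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# MPT uses a custom architecture, so transformers must be allowed
# to run the model code distributed with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    trust_remote_code=True,
)

# MPT-7B was trained with the GPT-NeoX-20B tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# A standard text-generation pipeline works once the model and
# tokenizer are loaded; no MPT-specific handling is needed here.
generate = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generate("MosaicML is", max_new_tokens=30)[0]["generated_text"])
```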
MosaicML is a company dedicated to advancing artificial intelligence through innovative machine learning technology. Its primary focus is making machine learning models more efficient and accessible, with the aim of accelerating AI research and its application across industries. MosaicML combines research with practical engineering to build tools that optimize machine learning workflows and reduce computational costs, helping businesses and developers harness AI more efficiently and speeding the development and deployment of AI applications.