Published on December 10, 2023, 12:30 pm

Artificial Intelligence (AI) continues to evolve rapidly, and researchers are constantly exploring new ways to improve its performance. The Together AI team has introduced a new family of language models called StripedHyena, which offers an alternative to the widely used transformer architecture. With 7 billion parameters, these models aim to improve on the training and inference performance of transformer-based models such as GPT-4.

StripedHyena introduces a fresh set of AI architectures designed to be faster, more memory-efficient, and capable of processing contexts of up to 128,000 tokens. The release includes two models: StripedHyena-Hessian-7B (SH 7B), a base model, and StripedHyena-Nous-7B (SH-N 7B), a chat model. The research involved collaboration with institutions including HazyResearch, hessian.AI, Nous Research, MILA, HuggingFace, and the German Research Center for Artificial Intelligence (DFKI).

What sets StripedHyena apart is its ability to compete with the best open-source transformers available. According to Together AI, the base model achieves comparable performance to other top models like Llama-2, Yi, and Mistral 7B on OpenLLM leaderboard tasks while outperforming them in long context summarization.

A key component of the StripedHyena models is their state-space model (SSM) layers. Traditionally used to model complex sequences and time-series data with temporal dependencies, SSMs have proven useful in language modeling and other domains as well. What makes them especially attractive is efficiency: unlike attention, whose cost grows quadratically with sequence length, an SSM layer carries a fixed-size state through the sequence, so these models require less computing power and memory at long context lengths.
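To make the idea concrete, here is a minimal sketch of the discrete state-space recurrence that underlies SSM layers. This is a generic illustration, not StripedHyena's actual implementation; the matrices A, B, C and the toy input are invented for the example. Note that the state vector x has a fixed size regardless of how long the input sequence is, which is the source of the memory advantage over attention.

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """Run a discrete state-space recurrence over a 1-D input signal.

    x[t] = A @ x[t-1] + B * u[t]   (state update)
    y[t] = C @ x[t]                (readout)

    The state x has a fixed dimension, so memory use is constant
    in the sequence length, unlike attention's quadratic cost.
    """
    d = A.shape[0]
    x = np.zeros(d)
    ys = []
    for u_t in u:
        x = A @ x + B * u_t
        ys.append(C @ x)
    return np.array(ys)

# Toy 2-dimensional state processing a 5-step input.
A = np.array([[0.9, 0.0],
              [0.1, 0.8]])
B = np.array([1.0, 0.0])
C = np.array([0.0, 1.0])
u = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y = ssm_scan(u, A, B, C)
```

In practice, trained SSM layers parameterize these matrices and compute the scan with fast convolutional or parallel-scan algorithms rather than an explicit Python loop, but the recurrence above is the core operation.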

In terms of speed, StripedHyena surpasses conventional transformers by more than 30 percent in end-to-end training on sequences of 32,000 tokens. It trains over 50 percent faster on sequences of 64,000 tokens and 100 percent faster, meaning twice the throughput, on sequences of 128,000 tokens. This increased efficiency opens up new possibilities in AI research and application development.
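As a quick sanity check on how these figures translate into throughput, "X percent faster" corresponds to a multiplier of 1 + X/100 over the baseline. The snippet below just restates the reported numbers; the mapping from percentage to multiplier is the only computation.

```python
# Reported end-to-end training speedups over a transformer baseline,
# keyed by sequence length in tokens (figures from the announcement).
reported_pct_faster = {32_000: 30, 64_000: 50, 128_000: 100}

# "X percent faster" means 1 + X/100 times the baseline throughput.
multipliers = {seq: 1 + pct / 100 for seq, pct in reported_pct_faster.items()}
# 100 percent faster at 128k tokens is 2x the baseline training throughput.
```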

The primary goal of the StripedHyena models is to push the boundaries of architectural design beyond transformers. Moving forward, researchers plan to explore larger models with even longer contexts, multimodal support, further performance optimizations, and seamless integration of StripedHyena into retrieval pipelines. By leveraging the advantages of longer context utilization, StripedHyena aims to unlock new opportunities in AI research and real-world applications.

As the field of AI continues to progress rapidly, innovations like StripedHyena bring fresh perspectives and advancements that have the potential to revolutionize how we leverage generative AI models. Stay tuned for more updates on this exciting development in the world of Artificial Intelligence.

