Mixtral 8x7B — A Deep Dive
A detailed comparison of Mixtral 8x7B with LLaMA 2, and an implementation of an optimized Mixture of Experts (MoE) layer in PyTorch.
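The MoE layer is the core of what distinguishes Mixtral from LLaMA 2: each token is routed to a small subset of expert feed-forward networks rather than through one dense block. As a taste of the idea before opening the post, here is a minimal, unoptimized sketch of a sparse MoE layer with top-2 routing in PyTorch; the class name `SimpleMoELayer`, the hidden sizes, and the per-expert Python loop are illustrative assumptions, not the optimized implementation described in the post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoELayer(nn.Module):
    """Minimal sparse MoE layer: a router sends each token to its top-2 experts.

    Illustrative sketch only; sizes and structure are assumptions, not the
    optimized implementation from the post.
    """

    def __init__(self, d_model: int = 512, d_ff: int = 2048,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is a small feed-forward network (a plain MLP here for brevity).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for per-token routing.
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # Score every token against every expert and keep only the top-k experts.
        logits = self.router(tokens)                      # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # renormalize over chosen experts

        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            # Which tokens routed to this expert, and at which of their k slots.
            token_idx, slot_idx = (indices == expert_id).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot_idx, None] * expert(tokens[token_idx])

        return out.reshape(batch, seq_len, d_model)
```

A call like `SimpleMoELayer()(torch.randn(2, 16, 512))` returns a tensor of the same shape. The Python loop over experts keeps the sketch readable; it is exactly the kind of thing an optimized implementation would replace with batched gather/scatter operations.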
More deep dives into machine learning concepts with practical code implementations, all available to read on Substack:
- Step-by-step guide to building the LLaMA model from scratch in PyTorch, with in-depth explanations of each essential component.
- Understanding the evolution from Multi-Head Attention to modern inference optimizations.
- A comprehensive exploration of RoPE, with theoretical derivations from first principles and a PyTorch implementation (a minimal sketch of the rotation follows this list).
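As referenced in the last item above, here is a minimal sketch of the rotary embedding itself in PyTorch; the function name `rotary_embed`, the assumed tensor layout `(batch, seq_len, num_heads, head_dim)`, and the conventional base of 10000 are illustrative choices, not taken from the post, which derives the formulation from first principles.

```python
import torch


def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply Rotary Position Embeddings (RoPE) to a tensor of shape
    (batch, seq_len, num_heads, head_dim). Illustrative sketch only."""
    batch, seq_len, num_heads, head_dim = x.shape
    assert head_dim % 2 == 0, "RoPE rotates pairs of dimensions, so head_dim must be even"

    # Per-pair rotation frequencies: theta_i = base^(-2i / head_dim).
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)              # (seq_len, head_dim / 2)
    cos = angles.cos()[None, :, None, :]                   # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]

    # Split channels into even/odd pairs and rotate each pair by its position-dependent angle.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.empty_like(x)
    rotated[..., 0::2] = x1 * cos - x2 * sin
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return rotated
```

In a transformer this would typically be applied to the query and key tensors just before the attention scores are computed, e.g. `q, k = rotary_embed(q), rotary_embed(k)`.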