Technical Blogs & Projects

Deep dives into machine learning concepts with practical code implementations

Mixtral 8x7B: A Deep Dive

A detailed comparison of Mixtral 8x7B with LLaMA 2, and an implementation of an optimized Mixture of Experts (MoE) layer in PyTorch; a simplified sketch of the routing idea follows below.

Read on Substack
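
As a taste of the core idea (a minimal sketch, not the post's optimized implementation), here is a top-2 routed MoE layer in PyTorch; the dimensions, expert count, and class name are illustrative placeholders:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoELayer(nn.Module):
        """Sketch of a token-choice MoE: each token is routed to its top-k experts."""
        def __init__(self, dim=512, hidden=2048, num_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.gate = nn.Linear(dim, num_experts, bias=False)  # router
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
                for _ in range(num_experts)
            ])

        def forward(self, x):                     # x: (tokens, dim), batch*seq flattened
            logits = self.gate(x)                 # (tokens, num_experts)
            weights, idx = logits.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)  # renormalize over the chosen experts
            out = torch.zeros_like(x)
            for k in range(self.top_k):           # dispatch each routing slot
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e         # tokens sending slot k to expert e
                    if mask.any():
                        out[mask] += weights[mask, k, None] * expert(x[mask])
            return out

Only top_k of the experts run for any given token, which is how MoE models such as Mixtral keep per-token compute far below their total parameter count.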

Building the LLaMA Model: A Deep Dive

A step-by-step guide to building the LLaMA model from scratch in PyTorch, with in-depth explanations of each essential component; one such component is sketched below.

Read on Substack
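
As a taste of the kind of component the post builds, here is a minimal sketch of RMSNorm, the normalization LLaMA uses in place of LayerNorm (dim and eps below are placeholder parameters):

    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        def __init__(self, dim, eps=1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain

        def forward(self, x):
            # Scale by the reciprocal root-mean-square of the features;
            # unlike LayerNorm there is no mean-centering and no bias term.
            rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
            return x * rms * self.weight

Dropping the mean-centering and the bias makes RMSNorm slightly cheaper than LayerNorm while performing comparably in these models.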

Attention Mechanisms & KV Cache: A Deep Dive

Understanding how attention evolved from Multi-Head Attention to modern inference optimizations such as the KV cache; the caching idea is sketched below.

Read on Substack
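
To illustrate the inference optimization named in the title, here is a minimal sketch of one autoregressive decoding step with a KV cache; the decode_step name and tensor shapes are assumptions for the example, not the post's code:

    import torch
    import torch.nn.functional as F

    def decode_step(q_new, k_new, v_new, cache=None):
        # q_new, k_new, v_new: (batch, heads, 1, head_dim) for the newest token only.
        if cache is None:
            k_all, v_all = k_new, v_new
        else:
            k_all = torch.cat([cache[0], k_new], dim=2)  # append along the sequence axis
            v_all = torch.cat([cache[1], v_new], dim=2)
        out = F.scaled_dot_product_attention(q_new, k_all, v_all)
        return out, (k_all, v_all)                       # carry the grown cache forward

    # usage: cache = None, then for each generated token:
    #   out, cache = decode_step(q, k, v, cache)

Because the keys and values of past tokens never change, each step attends a single new query against the stored cache instead of re-running attention over the whole prefix.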

Rotary Positional Embedding: A Deep Dive

A comprehensive exploration of RoPE, with theoretical derivations from first principles and a PyTorch implementation; the core rotation is sketched below.

Read on Substack
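
To give a flavor of the implementation the post derives, here is a minimal sketch of the core rotation: pairs of channels are rotated by position-dependent angles using the common base-10000 frequency scheme (the rope name and shapes are illustrative):

    import torch

    def rope(x, positions, base=10000.0):
        # x: (..., seq, dim) with even dim; positions: (seq,) token indices.
        dim = x.shape[-1]
        freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
        angles = positions[:, None].float() * freqs[None, :]   # (seq, dim/2)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]                    # split channel pairs
        rotated = torch.stack([x1 * cos - x2 * sin,            # standard 2-D rotation
                               x1 * sin + x2 * cos], dim=-1)
        return rotated.flatten(-2)                             # re-interleave the pairs

In attention, this rotation is applied to the query and key projections before their dot product, which makes the resulting scores depend only on relative position.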
