Mixtral 8x7B — A Deep Dive
A detailed comparison of Mixtral 8x7B with LLaMA 2, and an implementation of an optimized Mixture of Experts (MoE) layer in PyTorch.
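As a taste of what the article covers, the core idea of a Mixtral-style MoE layer can be sketched as a top-2 gated router over a pool of feed-forward experts. The module below is an illustrative sketch only (class name, dimensions, and the SiLU expert MLP are assumptions for demonstration), not the optimized implementation discussed in the article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k gated Mixture-of-Experts layer (illustrative sketch)."""

    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for each token.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward network (SiLU MLP assumed here).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.gate(x)                    # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen experts
        out = torch.zeros_like(x)
        # Dispatch: each expert processes only the tokens routed to it.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out

x = torch.randn(4, 64)
y = MoELayer()(x)
print(y.shape)  # torch.Size([4, 64])
```

Only `top_k` of the `n_experts` feed-forward networks run per token, which is why an 8x7B model like Mixtral has roughly the inference cost of a much smaller dense model.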
Specializing in Large Language Models and RAG systems. Building production ML infrastructure and sharing deep technical implementations from first principles.
I'm a Machine Learning Engineer specializing in Large Language Models and Retrieval-Augmented Generation (RAG) systems. Currently at Verloop.io, I architect end-to-end RAG pipelines that power customer support for enterprise clients across e-commerce, banking, and healthcare sectors.
I believe in understanding AI systems at a fundamental level—from transformer architectures to retrieval optimization. Through my technical blog, I break down complex topics like Rotary Positional Embeddings (RoPE), attention mechanisms (MHA, GQA, MLA), and modern LLM architectures, providing complete PyTorch implementations from scratch.
My work bridges cutting-edge research and production systems. I focus on building with clarity, sharing knowledge openly, and making advanced ML techniques accessible to practitioners. Previously at Monsoon CreditTech, I built credit risk models that reduced delinquency rates by 40%.
Technologies and areas I work with
My professional journey and key achievements
Verloop.io
Building production RAG systems and LLM infrastructure for enterprise customer support automation.
Monsoon CreditTech
Developed ML models for credit risk assessment and fraud detection in fintech.
BML Munjal University
Major in Computer Science and Engineering. Published research in Nature Scientific Reports.
Latest technical articles combining theory and practice
Step-by-step guide to building the LLaMA model from scratch in PyTorch, with in-depth explanations of each essential component.
Understanding the evolution from Multi-Head Attention to modern inference optimizations.
I'm always interested in discussing LLMs, RAG systems, ML engineering, or potential collaborations. Feel free to reach out!