Mixtral 8x7B — A Deep Dive
A detailed comparison of Mixtral 8x7B with LLaMA 2, and an implementation of an optimized Mixture of Experts (MoE) layer in PyTorch.
Read on Substack
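The article walks through an optimized Mixture of Experts layer in PyTorch; as a taste of the core idea, here is a minimal, dependency-free sketch of top-k expert routing (Mixtral 8x7B routes each token to the top 2 of 8 experts). All names here (`moe_forward`, `gate_weights`) are illustrative, not from the article's implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts by gate score and combine
    their outputs, weighted by the renormalized gate probabilities."""
    # Gate logits: one score per expert (dot product with that expert's gate row).
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    # Select the top_k experts; only these are evaluated (the sparsity win).
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Weighted sum of the selected experts' outputs.
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        w = probs[i] / norm
        out = [o + w * yi for o, yi in zip(out, y)]
    return out
```

Because only `top_k` experts run per token, total parameters grow with the number of experts while per-token compute stays roughly constant; the article covers the batched PyTorch version.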
Specializing in Large Language Models and RAG systems. Building production ML infrastructure and sharing deep technical implementations from first principles.
I'm a Machine Learning Engineer specializing in Large Language Models and Retrieval-Augmented Generation (RAG) systems. Currently at Verloop.io, I architect end-to-end RAG pipelines that power customer support for enterprise clients across e-commerce, banking, and healthcare sectors.
I believe in understanding AI systems at a fundamental level—from transformer architectures to retrieval optimization. Through my technical blog, I break down complex topics like Rotary Positional Embeddings (RoPE), attention mechanisms (MHA, GQA, MLA), and modern LLM architectures, providing complete PyTorch implementations from scratch.
My work bridges cutting-edge research and production systems. I focus on building with clarity, sharing knowledge openly, and making advanced ML techniques accessible to practitioners. Previously at Monsoon CreditTech, I built credit risk models that achieved a 40% reduction in delinquency rates.
Technologies and areas I work with
My professional journey and key achievements
Verloop.io
Building production RAG systems and LLM infrastructure for enterprise customer support automation.
Monsoon CreditTech
Developed ML models for credit risk assessment and fraud detection in fintech.
BML Munjal University
Major in Computer Science and Engineering. Published research in Nature Scientific Reports.
Open-source projects spanning LLMs, ML, and data engineering
Autonomous research agent that decomposes queries, executes multi-turn tool calling with web search and arXiv, and validates completeness through self-reflection. Features real-time Streamlit UI.
Natural language interface for Weaviate vector databases through Claude using the Model Context Protocol. Enables intuitive database exploration through conversation, exposing 9 inspection tools.
Deep learning OCR system with ResNet encoder and Transformer decoder (14M parameters). Achieves 70% error reduction via augmentation. Deployed as FastAPI microservice on GCP with monitoring.
Latest technical articles combining theory and practice
A detailed comparison of Mixtral 8x7B with LLaMA 2, and an implementation of an optimized Mixture of Experts (MoE) layer in PyTorch.
Read on Substack

Step-by-step guide to building the LLaMA model from scratch in PyTorch, with in-depth explanations of each essential component.

Read on Substack

Understanding the evolution from Multi-Head Attention to modern inference optimizations.

Read on Substack

I'm always interested in discussing LLMs, RAG systems, ML engineering, or potential collaborations. Feel free to reach out!
Or send me an email directly at ashishgy77@gmail.com