Machine Learning Engineer
Specializing in Large Language Models and LLM inference optimization. Serving LLMs at scale and sharing deep technical implementations from first principles.
I'm a Machine Learning Engineer specializing in Large Language Models and LLM inference optimization. Currently at Verloop.io, I'm building autonomous customer support systems powered by LLMs to serve enterprise clients at scale.
I believe in understanding AI systems at a fundamental level—from transformer architectures to inference optimization. Through my technical blog, I break down complex topics like Rotary Positional Embeddings (RoPE), attention mechanisms (MHA, GQA, MLA), and modern LLM architectures, providing complete PyTorch implementations from scratch.
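As a flavor of the kind of from-scratch implementation described above, here is a minimal Rotary Positional Embeddings (RoPE) sketch in PyTorch, using the common "rotate-half" convention. The function name and shapes are illustrative only, not taken from any of the linked articles:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to x of shape (seq_len, dim), dim even.

    Each feature pair is rotated by a position-dependent angle, so relative
    position is encoded directly in the dot products between queries and keys.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies, decaying geometrically across dimensions.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)   # (half,)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs  # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied to each (x1_i, x2_i) pair.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because each step is a pure rotation, the transform preserves vector norms, and position 0 is left unchanged (all angles are zero).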
My work bridges cutting-edge research and production systems. I'm currently diving deep into LLM inference optimization — experimenting with serving runtimes like vLLM, TensorRT-LLM, and SGLang, and exploring techniques like quantization, speculative decoding, and KV cache optimization to serve models at scale efficiently. I focus on building with clarity, sharing knowledge openly, and making advanced ML techniques accessible to practitioners.
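To make one of those techniques concrete: the KV-cache optimization mentioned above amounts to storing past attention keys and values so each decode step only computes projections for the new token. A toy sketch follows; the class name and tensor shapes are my own for illustration and are not tied to vLLM, TensorRT-LLM, or SGLang internals:

```python
import torch

class KVCache:
    """Toy per-layer KV cache for autoregressive decoding.

    New keys/values are appended along the sequence axis each step, so
    attention over past tokens does not have to be recomputed.
    """
    def __init__(self) -> None:
        self.k: torch.Tensor | None = None
        self.v: torch.Tensor | None = None

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor):
        # k_new, v_new: (batch, heads, new_tokens, head_dim)
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)
            self.v = torch.cat([self.v, v_new], dim=2)
        return self.k, self.v
```

Production runtimes replace this naive concatenation with preallocated or paged memory (the idea behind vLLM's PagedAttention), but the interface is the same: feed in one token's K/V, get back the full history.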
Technologies and areas I work with
My professional journey and key achievements
Verloop.io
Building autonomous LLM-powered customer support systems and inference infrastructure for enterprise-scale automation.
Monsoon CreditTech
Developed ML models for credit risk assessment and fraud detection in fintech.
BML Munjal University
Major in Computer Science and Engineering. Published research in Nature Scientific Reports.
Open-source projects spanning LLMs, ML, and data engineering
Autonomous research agent that decomposes queries, executes multi-turn tool calling with web search and arXiv, and validates completeness through self-reflection. Features real-time Streamlit UI.
Natural language interface for Weaviate vector databases through Claude using Model Context Protocol. Enables intuitive database exploration via conversation with 9 inspection tools.
Deep learning OCR system with ResNet encoder and Transformer decoder (14M parameters). Achieves 70% error reduction via augmentation. Deployed as FastAPI microservice on GCP with monitoring.
Latest technical articles combining theory and practice
Mixtral 8x7B — A Deep Dive: a detailed comparison of Mixtral 8x7B with LLaMA 2, and an implementation of an optimized Mixture of Experts (MoE) layer in PyTorch.
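As a taste of the article's topic, here is a minimal top-k gated MoE layer in PyTorch. It follows Mixtral's scheme of softmax-normalizing the top-2 router logits, but the class and parameter names are illustrative, not the article's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k gated mixture-of-experts layer (illustrative sketch)."""
    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim) — a flattened batch of token embeddings.
        logits = self.gate(x)                                # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)           # pick top-k experts/token
        weights = F.softmax(weights, dim=-1)                 # renormalize over chosen k
        out = torch.zeros_like(x)
        # Route each token through only its selected experts (sparse compute).
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

The key property is that each token activates only `k` of the `n_experts` feed-forward blocks, which is why Mixtral's effective compute per token is far below its total parameter count; efficient implementations batch tokens per expert rather than looping as above.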
Read on Substack

Step-by-step guide to building the LLaMA model from scratch in PyTorch, with in-depth explanations of each essential component.
Read on Substack

Understanding the evolution from Multi-Head Attention to modern inference optimizations.
Read on Substack

I'm always interested in discussing LLMs, inference optimization, ML engineering, or potential collaborations. Feel free to reach out!
Or send me an email directly at ashishgy77@gmail.com