
Hi, I'm Ashish Goyal

Machine Learning Engineer

Specializing in Large Language Models and LLM inference optimization. Serving LLMs at scale and sharing deep technical implementations from first principles.

About Me

I'm a Machine Learning Engineer specializing in Large Language Models and LLM inference optimization. Currently at Verloop.io, I'm building autonomous customer support systems powered by LLMs to serve enterprise clients at scale.

I believe in understanding AI systems at a fundamental level—from transformer architectures to inference optimization. Through my technical blog, I break down complex topics like Rotary Positional Embeddings (RoPE), attention mechanisms (MHA, GQA, MLA), and modern LLM architectures, providing complete PyTorch implementations from scratch.
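To give a flavor of what "from scratch" means here, the core of RoPE fits in a few lines: each consecutive feature pair is rotated by a position-dependent angle, so the dot product between a query at position m and a key at position n depends only on m − n. This is an illustrative plain-Python sketch (function name and list-based representation are my own, not code from the blog):

```python
import math

def rope(x, pos, base=10000.0):
    """Apply Rotary Positional Embeddings to one token vector.

    x:   list of floats, even length (a query/key vector)
    pos: integer token position
    Each pair (x[2i], x[2i+1]) is rotated by pos * theta_i,
    where theta_i = base^(-2i/d) sets a per-pair frequency."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)           # frequency for this pair
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = x[i], x[i + 1]
        # standard 2-D rotation of the pair
        out.extend([x1 * c - x2 * s, x1 * s + x2 * c])
    return out
```

Because each step is a pure rotation, vector norms are preserved, and position 0 leaves the vector unchanged.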

My work bridges cutting-edge research and production systems. I'm currently diving deep into LLM inference optimization — experimenting with serving runtimes like vLLM, TensorRT-LLM, and SGLang, and exploring techniques like quantization, speculative decoding, and KV cache optimization to serve models at scale efficiently. I focus on building with clarity, sharing knowledge openly, and making advanced ML techniques accessible to practitioners.
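As a toy illustration of one of those techniques: a KV cache stores each layer's past key/value projections so that decoding a new token only computes projections for that token, instead of re-running attention inputs over the whole prefix. The sketch below shows just the bookkeeping, with illustrative names (real serving runtimes like vLLM manage this memory in paged GPU blocks):

```python
class KVCache:
    """Toy per-layer key/value cache for autoregressive decoding.

    Keys and values are appended one token at a time, so each decode
    step reuses all previously computed projections."""

    def __init__(self, num_layers):
        self.keys = [[] for _ in range(num_layers)]
        self.values = [[] for _ in range(num_layers)]

    def append(self, layer, k, v):
        # Called once per generated token, per layer.
        self.keys[layer].append(k)
        self.values[layer].append(v)

    def get(self, layer):
        # Full history this layer attends over.
        return self.keys[layer], self.values[layer]

    def seq_len(self):
        # Number of cached tokens (same for every layer).
        return len(self.keys[0])
```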

At a Glance

  • 4+ · Years of Experience
  • LLM Inference · Scale & Optimize
  • LLM · Production Infrastructure
  • LLMs Specialist · Production ML
  • Inference Engineering · Actively Exploring

Skills & Expertise

Technologies and areas I work with

LLM & AI

Large Language Models · Vector Databases · Embedding Models · Prompt Engineering · Transformer Architecture

ML Frameworks

PyTorch · Hugging Face Transformers · Scikit-learn · XGBoost · Weights & Biases · SHAP · Optuna

Infrastructure

Docker · Kubernetes · gRPC · Kafka · RabbitMQ · Redis · PostgreSQL · MySQL

Languages & Web Frameworks

Python · SQL · FastAPI · Django · Gradio

Cloud & DevOps

Google Cloud Platform · CI/CD · GitHub Actions · Git

Expertise

Deep Learning · Natural Language Processing · Mathematics · Microservices Architecture

LLM Inference & Serving

vLLM · TensorRT-LLM · SGLang · Quantization · Speculative Decoding · KV Cache Optimization · GPU Optimization

Experience & Education

My professional journey and key achievements

SDE - Machine Learning

Verloop.io

Oct 2024 - Present

Building autonomous LLM-powered customer support systems and inference infrastructure for enterprise-scale automation.

  • Improved RAG retrieval performance by ~15% in Recall and 5% in MRR through systematic evaluation of embedding models and rerankers.
  • Built a robust document processing pipeline for RAG systems to handle complex PDFs (tables, structured layouts), using OCR to extract content and convert it into structured markdown for improved contextual retrieval.
  • Developed production ML microservices using Python, Docker, Kubernetes, and gRPC, integrating Weaviate for vector search.
  • Optimized LLM context utilization and prompt pipelines to reduce latency and API costs while maintaining response quality.
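The retrieval metrics mentioned above take only a few lines each: Recall@k measures what fraction of the relevant documents appear in the top k results, and MRR averages the reciprocal rank of the first relevant hit. This is an illustrative sketch, not the evaluation code used at Verloop:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents retrieved in the top-k results."""
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / len(relevant_ids)

def mrr(queries):
    """Mean Reciprocal Rank over (ranked_ids, relevant_ids) pairs:
    1/rank of the first relevant result, averaged across queries.
    Queries with no relevant result contribute 0."""
    total = 0.0
    for ranked_ids, relevant_ids in queries:
        for rank, doc in enumerate(ranked_ids, start=1):
            if doc in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(queries)
```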

Machine Learning Engineer

Monsoon CreditTech

Feb 2022 - Aug 2024

Developed ML models for credit risk assessment and fraud detection in fintech.

  • Built credit risk models using XGBoost with Bayesian hyperparameter optimization, reducing loan delinquency rates by ~40%.
  • Developed a monitoring framework including data drift detection (PSI, KS tests) and model explainability using SHAP to ensure model reliability in production.
  • Designed and deployed scalable ML inference APIs using FastAPI and Django, containerized with Docker and deployed on GCP Cloud Run.
  • Built fraud detection models using Deep Isolation Forest to identify anomalous transactions, capturing ~30% of high-risk cases in the top risk decile.
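The Population Stability Index mentioned above compares a feature's binned distribution between a baseline sample and current production data. A minimal sketch, assuming both inputs are already binned into proportions (names and the epsilon guard are my own):

```python
import math

def psi(expected_props, actual_props, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected_props / actual_props: per-bin proportions, each summing to 1.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant drift."""
    total = 0.0
    for e, a in zip(expected_props, actual_props):
        e = max(e, eps)  # guard against log(0) on empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total
```

Identical distributions score 0, and the score grows as mass moves between bins, which makes PSI a cheap always-on drift alarm per feature.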

Bachelor of Technology in Computer Science & Engineering

BML Munjal University

Jul 2016 - Aug 2020

Major in Computer Science and Engineering. Published research in Nature Scientific Reports.

  • Published: "Machine learning predicts live-birth occurrence before IVF treatment" in Nature Scientific Reports (2020)
  • Focused on Machine Learning, Deep Learning, and Software Engineering

Get In Touch

I'm always interested in discussing LLMs, inference optimization, ML engineering, or potential collaborations. Feel free to reach out!

You can also email me directly at ashishgy77@gmail.com