
Kunj Shah

AI Agent Intern | LLM Developer | ML Researcher

LLM Research & Development

How LLMs Are Made

An all-in-one GitHub repo documenting my hands-on journey building and experimenting with LLMs, from GPT, DeepSeek, and Kimi architectures to advanced techniques like MoE, MoD, MHLA, and MLA. Includes code, experiments, insights, and resources.
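
For illustration only (a sketch, not code from the repo): a minimal top-k Mixture-of-Experts layer in PyTorch, where a learned router sends each token to its two highest-scoring expert MLPs. The class name and dimensions are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sparse Mixture-of-Experts layer: a learned router picks the top-k expert MLPs per token."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))                 # flatten (batch, seq) into tokens
        weights, idx = self.router(tokens).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # mixing weights over the chosen experts
        out = torch.zeros_like(tokens)
        for slot in range(self.k):                         # for each of the k routing slots...
            for e, expert in enumerate(self.experts):      # ...dispatch the tokens routed to expert e
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out.reshape_as(x)

layer = TopKMoE(d_model=64, d_ff=256)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```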

Technical Insights & Documentation
Full-Stack LLM Solutions
Kimi, GPT, DeepSeek Architectures
GatorGPT

A lightweight 63M-parameter transformer-based language model with a modern architecture, built for my university. Features Grouped Query Attention, Rotary Position Embeddings (RoPE), and SwiGLU activation. Deployed with vLLM and available as a Docker image (kunjcr2/gatorgpt).
Eval loss dropped from ~246 to ~1.503.

Fast inference with torch.compile and FlashAttention
Memory-efficient: Grouped Query Attention shrinks the KV cache (sketched below)
One-click Docker deployment with vLLM
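
As a rough sketch of the GQA idea (not GatorGPT's actual implementation; head counts and weights here are made up): query heads are grouped to share a smaller set of key/value heads, so the KV cache shrinks by the ratio n_heads / n_kv_heads.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_heads=8, n_kv_heads=2):
    """Toy GQA: many query heads attend over a smaller set of shared key/value heads."""
    b, t, d = x.shape
    hd = d // n_heads
    q = (x @ wq).view(b, t, n_heads, hd).transpose(1, 2)      # (b, n_heads, t, hd)
    k = (x @ wk).view(b, t, n_kv_heads, hd).transpose(1, 2)   # (b, n_kv_heads, t, hd)
    v = (x @ wv).view(b, t, n_kv_heads, hd).transpose(1, 2)
    group = n_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)                     # each KV head serves a group of query heads
    v = v.repeat_interleave(group, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, t, d)

d_model, n_heads, n_kv_heads = 64, 8, 2
x = torch.randn(1, 16, d_model)
wq = torch.randn(d_model, d_model)
wk = torch.randn(d_model, d_model // n_heads * n_kv_heads)    # smaller K/V projections
wv = torch.randn(d_model, d_model // n_heads * n_kv_heads)
print(grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads).shape)  # (1, 16, 64)
```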

Experience

Projects

Max - AI Voice Assistant
90% voice accuracy · 8 tools · LangChain/OpenAI

Developed a voice-activated AI assistant using LangChain, OpenAI, Hugging Face, and SpeechRecognition to automate tasks like web search, YouTube streaming, and emailing, enabling hands-free interaction.

GitHub
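
A minimal tool-calling sketch in the spirit of Max, assuming LangChain's OpenAI integration (`langchain_core`, `langchain_openai`) and an `OPENAI_API_KEY` in the environment; the tool stubs and model name are placeholders, not the assistant's real tools.

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def web_search(query: str) -> str:
    """Search the web and return a short summary (stub for illustration)."""
    return f"Top results for: {query}"

@tool
def play_on_youtube(video: str) -> str:
    """Open a YouTube video by title (stub for illustration)."""
    return f"Now playing: {video}"

# Bind the tools so the chat model can decide which one a voice command needs.
llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([web_search, play_on_youtube])

response = llm.invoke("Play lo-fi beats on YouTube")
print(response.tool_calls)  # e.g. a call to play_on_youtube with {'video': 'lo-fi beats'}
```
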
Llama-3.2-3B Fine-tuned on OpenHermes
~300k QA pairs · LoRA fine-tuning · Train loss 1.27 → 0.21 · vLLM + Docker

A Llama-3.2-3B base model instruction-tuned with LoRA on the OpenHermes dataset. The run turned the base model into an instruct-capable assistant while updating only ~0.75% of its parameters, keeping it lightweight and deployment-friendly; it is packaged as a Docker image (kunjcr2/llama-3.2-3b-openhermes) for reproducible serving with vLLM.
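
A minimal LoRA setup sketch with Hugging Face `peft`; the rank, target modules, and other hyperparameters are assumptions rather than the exact configuration behind this run, and the gated `meta-llama/Llama-3.2-3B` checkpoint requires Hugging Face access.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")

lora_cfg = LoraConfig(
    r=16,                       # low-rank adapter dimension (placeholder)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts and the trainable %
```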

Qwen2.5-0.5B SFT + DPO
85M tokens (SFT) · 1.48 val loss (SFT) · 66% reward accuracy (DPO)

A two-stage pipeline: the model was first trained on 85M tokens with supervised fine-tuning, reaching a validation loss of 1.48, and then optimized with Direct Preference Optimization (DPO) to achieve 66% reward accuracy. This demonstrates how foundational instruction tuning can be reinforced through preference optimization to improve reasoning quality.
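
A sketch of the DPO stage using Hugging Face `trl` (recent versions); the preference dataset and hyperparameters are placeholders, not the setup that produced the numbers above, and in practice the SFT checkpoint, rather than the raw base model, would be loaded.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B"                  # the SFT checkpoint would go here in practice
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# DPO trains on (prompt, chosen, rejected) preference pairs
prefs = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:1%]")

args = DPOConfig(
    output_dir="qwen2.5-0.5b-dpo",
    beta=0.1,                                    # strength of the preference constraint
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(model=model, args=args, train_dataset=prefs, processing_class=tokenizer)
trainer.train()
```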

More projects on GitHub

Technical Skills

Programming Languages

Python JavaScript Java C++ HTML/CSS SQL

AI Tools & Frameworks

LangChain LangFlow n8n RAG Pipelines OpenAI API Hugging Face Transformers MCP Servers Vector Databases Prompt Engineering

Machine Learning & Deep Learning

PyTorch TensorFlow Scikit-learn Keras OpenCV Pandas NumPy Matplotlib NLP Computer Vision LoRA Neural Networks Weights & Biases Encoder–Decoder Models Reinforcement Learning DPO PPO

MLOps & Deployment

Docker vLLM Serving Hugging Face Hub Model Deployment GPU Optimization Distributed Training Vertex AI Git

Web Development

Node.js React.js Flask TailwindCSS Express.js

Database & Development Tools

MongoDB MySQL Vertex AI Git Docker

LLM Architectures & Systems

Transformers Attention Mechanisms Pretraining Finetuning Tokenizers vLLM Optimization Mixture of Experts Mixture of Recursions Mixture of Depths Rotary Positional Encodings Multi-Token Prediction Flash Attention Sliding Window Attention Reasoning Models HRMs GPU Training Distributed Learning

Core CS & Problem Solving

Data Structures & Algorithms Binary Trees & BSTs Graph DFS/BFS Dynamic Programming SQL Querying

Hackathons

MCP AWS Agentic Challenge

Where: AWS Builder Loft, SF

When: July 25, 2025

Project: Nango Automation

Cal Hacks 11.0

Where: San Francisco, CA

When: October 18, 2024 – October 20, 2024

Project: Workout Web App

SacHacks

Where: Virtual Hackathon

When: March 2, 2025 – March 3, 2025

Project: Web Detective

HackMerced

Where: University of California, Merced

When: March 9, 2025 – March 11, 2025

Project: Web Detective (Updated)

Certificates

    • Programming for Everybody (Getting Started with Python) – University of Michigan
    • Python Data Structures – University of Michigan
    • Crash Course on Python – Google
    • Calculus through Data and Modelling: Series and Integration – Johns Hopkins University
    • Calculus through Data and Modelling: Techniques of Integration – Johns Hopkins University
    • Calculus through Data and Modelling: Integration Applications – Johns Hopkins University
    • Calculus through Data and Modelling: Vector Calculus – Johns Hopkins University
    • Introduction to Web Development – UC Davis
    • Understanding Einstein: The Special Theory of Relativity – Stanford University
    • Introduction to Complex Analysis – Wesleyan University
    • Understanding Basic SQL Syntax – Coursera Project Network
    • C++ Basics: Selection and Iteration – Codio
    • Building a Text-based Bank – Coursera Project Network
    • Create a Supermarket app using Java OOP – Coursera Project Network
    • Python 101: Develop Your First Python Program – Coursera Project Network
    • Letter of Recommendation by Duc Ta – CSC215
    • Letter of Recommendation by Maitra Shah – Internship Certificate

About Me

I am a second-year Computer Science student at San Francisco State University focused on AI/ML and full-stack development. As Tech Director at SparkSF, I work across machine learning, NLP, and MERN-stack development. My notable projects include AI-powered applications such as the 'theHelper' research assistant and the 'Max' voice assistant, and I've competed in hackathons including Cal Hacks 11.0 and SacHacks, building projects like a Workout Web App and Web Detective. With strong foundations in Python, JavaScript, and AI frameworks including Hugging Face Transformers and OpenCV, I combine academic excellence (JEE qualifier) with practical development experience to deliver impactful solutions.

Connect with Me