Sharing knowledge and insights
A collection of my talks, blog posts, videos, and workshops where I share experiences and insights about AI, developer relations, and building technology products.
Scaling LLM Training to 1000 GPUs
Deep dive into distributed training strategies, data parallelism, and pipeline parallelism for training large language models at scale.
Understanding Attention Mechanisms: A Visual Guide
An illustrated walkthrough of self-attention, multi-head attention, and cross-attention with interactive visualizations and PyTorch code.
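For a taste of the material, here is a minimal NumPy sketch of single-head scaled dot-product self-attention (illustrative only; the guide itself works through multi-head and cross-attention in PyTorch):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projections.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v                        # (seq_len, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
```

Multi-head attention simply runs several such projections in parallel and concatenates the per-head outputs.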
Hands-on LLM Fine-tuning with Hugging Face
A practical workshop on fine-tuning open-source LLMs using LoRA, QLoRA, and the Hugging Face ecosystem for domain-specific tasks.
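The core idea behind LoRA covered in the workshop can be sketched in a few lines: freeze the pretrained weight and learn only a low-rank update. The NumPy forward pass below is an illustration, not the Hugging Face PEFT implementation; names and shapes are assumptions for the example.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass of a LoRA-adapted linear layer (illustrative).

    W (d_out, d_in) is the frozen pretrained weight; A: (r, d_in) and
    B: (d_out, r) are the only trained parameters. The low-rank update
    B @ A is scaled by alpha / r, as in the LoRA paper.
    """
    r = A.shape[0]
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init: update starts at zero
x = rng.normal(size=(3, d_in))
y = lora_forward(x, W, A, B)
```

With `B` initialized to zero, the adapted layer starts out exactly equal to the frozen base layer, which is why LoRA training is stable from step one.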
Building Production ML Pipelines with Kubeflow
End-to-end tutorial on building automated ML pipelines with Kubeflow, from data ingestion to model serving with canary deployments.
The Future of Foundation Models
Discussing scaling laws, emergent capabilities, and the trajectory of foundation models with leading ML researchers.
MLOps Best Practices for Startups
Practical guide to implementing MLOps at startup scale: experiment tracking, model versioning, CI/CD for ML, and monitoring in production.

Efficient Inference for Large Language Models
Techniques for optimizing LLM inference: quantization, KV-cache optimization, speculative decoding, and continuous batching.
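As a small taste of the quantization material, here is a symmetric per-tensor int8 round-trip in NumPy (a sketch of the basic idea only; production stacks use per-channel scales and calibrated ranges):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8 quantization:
    # map [-max|w|, +max|w|] onto the integer range [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)     # 4x smaller than float32
w_hat = dequantize(q, scale)    # reconstruction error bounded by the step size
```

The int8 tensor is a quarter the size of the float32 original, and the elementwise reconstruction error is bounded by half a quantization step.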
From Research to Production: Deploying Transformer Models
A comprehensive guide covering model export (ONNX, TensorRT), serving architectures, load testing, and monitoring for transformer deployments.