Content & Speaking

Sharing knowledge and insights

A collection of my talks, blog posts, videos, and workshops where I share experiences and insights about AI, developer relations, and building technology products.

Scaling LLM Training to 1000 GPUs
talk
Dec 2024

Deep dive into distributed training strategies, data parallelism, and pipeline parallelism for training large language models at scale.

LLMs · Distributed Training · MLOps
800 attendees
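The key property behind the data-parallel strategies covered in this talk can be sketched in a few lines (an illustrative example, not the talk's actual code): sharding a batch across workers, computing per-shard gradients, and all-reducing (averaging) them recovers the full-batch gradient of a mean loss, which is why synchronous data parallelism matches single-device training.

```python
# Illustrative sketch: synchronous data parallelism for a 1-D linear model
# with mean squared error. Averaging equal-sized per-worker gradients
# equals the full-batch gradient.

def grad_mse_linear(w, batch):
    """Gradient of mean squared error for y_hat = w * x over a batch of (x, y)."""
    n = len(batch)
    return sum(2 * (w * x - y) * x for x, y in batch) / n

def data_parallel_grad(w, batch, num_workers):
    """Shard the batch round-robin, compute per-worker gradients, then
    all-reduce by averaging (what a real DDP setup does collectively)."""
    shards = [batch[i::num_workers] for i in range(num_workers)]
    shards = [s for s in shards if s]  # drop empty shards
    grads = [grad_mse_linear(w, s) for s in shards]
    return sum(grads) / len(grads)
```

With a batch size divisible by the worker count, the data-parallel gradient is exactly the single-device gradient; pipeline parallelism instead splits the model's layers across devices and is not shown here.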
Understanding Attention Mechanisms: A Visual Guide
blog
Nov 2024

An illustrated walkthrough of self-attention, multi-head attention, and cross-attention with interactive visualizations and PyTorch code.

Transformers · Attention · Deep Learning · +1 more
12,500 reads
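The self-attention operation the post walks through can be condensed to a single-head NumPy sketch (illustrative, not the post's actual PyTorch code): project the input to queries, keys, and values, scale the dot products, and apply a row-wise softmax.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)               # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights
```

Multi-head attention runs several such heads with separate projections and concatenates the outputs; cross-attention uses the same formula but takes queries from one sequence and keys/values from another.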
Hands-on LLM Fine-tuning with Hugging Face
workshop
Oct 2024

A practical workshop on fine-tuning open-source LLMs using LoRA, QLoRA, and the Hugging Face ecosystem for domain-specific tasks.

Hugging Face · Fine-tuning · LoRA · +1 more
300 attendees
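The LoRA idea at the heart of this workshop can be sketched minimally (an illustration, not the workshop's Hugging Face code): instead of updating the full weight matrix, train a low-rank update A @ B added to the frozen base weight, scaled by alpha / r.

```python
import numpy as np

def lora_linear(x, w_base, a, b, alpha=16):
    """LoRA-adapted linear layer: w_base stays frozen; only the low-rank
    factors a (d_in, r) and b (r, d_out) are trained. In practice a is
    initialized randomly and b to zeros, so training starts at the base model."""
    r = a.shape[1]
    return x @ (w_base + (alpha / r) * (a @ b))
```

QLoRA applies the same low-rank adapters on top of a 4-bit-quantized base model, which is what makes fine-tuning large models feasible on a single GPU.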
Building Production ML Pipelines with Kubeflow
video
Sep 2024

End-to-end tutorial on building automated ML pipelines with Kubeflow, from data ingestion to model serving with canary deployments.

KubeflowMLOpsPipelines
25,000 views
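The canary-deployment step at the end of the pipeline can be illustrated with a routing sketch (hypothetical code, not Kubeflow's API, which handles this at the serving layer): hash each request ID into [0, 1) and send a fixed fraction of traffic to the candidate model.

```python
import hashlib

def route_request(request_id, canary_fraction=0.1):
    """Deterministic canary routing: the same request ID always lands on the
    same model, and roughly `canary_fraction` of IDs hit the canary."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"
```

If the canary's error rate or latency regresses, the fraction is dialed back to 0; otherwise it is ramped up until the new model takes all traffic.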
The Future of Foundation Models
podcast
Aug 2024

Discussing scaling laws, emergent capabilities, and the trajectory of foundation models with leading ML researchers.

Foundation ModelsScaling LawsResearch
18,000 views
MLOps Best Practices for Startups
blog
Jul 2024

A practical guide to implementing MLOps at startup scale: experiment tracking, model versioning, CI/CD for ML, and monitoring in production.

MLOps · Startups · Best Practices
8,200 reads
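The experiment-tracking pattern the post advocates can be shown with a minimal file-based sketch (an illustration of the idea, not any specific tool's API): every run gets an immutable record tying hyperparameters to final metrics.

```python
import json
import time
import uuid
from pathlib import Path

def log_run(params, metrics, log_dir="runs"):
    """Write one JSON record per training run. Dedicated trackers add UIs,
    artifact storage, and comparison views on top of this same core idea."""
    run = {
        "run_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "params": params,    # hyperparameters, e.g. {"lr": 3e-4}
        "metrics": metrics,  # final results, e.g. {"val_loss": 0.21}
    }
    path = Path(log_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / f"{run['run_id']}.json").write_text(json.dumps(run, indent=2))
    return run
```

Because each record is a plain file, runs can be diffed, grepped, and versioned alongside the code that produced them.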
Efficient Inference for Large Language Models
talk
Jun 2024

Techniques for optimizing LLM inference: quantization, KV-cache optimization, speculative decoding, and continuous batching.

Inference · Optimization · LLMs
450 attendees
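Of the techniques covered, quantization is the easiest to sketch (a simplified illustration, not the talk's code: production LLM quantizers work per-channel or per-group and handle outliers specially): map floats to int8 with a single symmetric scale, then dequantize at compute time.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: floats map to [-127, 127]
    with one scale factor, shrinking weight memory ~4x versus float32."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid zero scale
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values; error is bounded by scale / 2."""
    return [v * scale for v in q]
```

KV-cache optimization, speculative decoding, and continuous batching attack the other inference bottlenecks (memory bandwidth, sequential decoding, and GPU utilization) and are orthogonal to weight quantization.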
From Research to Production: Deploying Transformer Models
blog
May 2024

A comprehensive guide covering model export (ONNX, TensorRT), serving architectures, load testing, and monitoring for transformer deployments.

Deployment · Transformers · Production ML
6,400 reads
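For the load-testing part of this guide, the metric that matters is tail latency; a nearest-rank percentile can be computed in a few lines (a simple sketch of the statistic itself; load-testing tools report it for you).

```python
import math

def latency_percentile(samples_ms, p):
    """Nearest-rank percentile of latency samples, e.g. p=95 for p95.
    Tail percentiles (p95/p99) expose slow requests that averages hide."""
    s = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(s))
    return s[max(0, rank - 1)]
```

Comparing p50 against p99 before and after an export to ONNX or TensorRT shows whether an optimization helped typical requests, the tail, or both.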