Sharing knowledge and insights
A collection of my talks, blog posts, videos, and workshops where I share experiences and insights about AI, developer relations, and building technology products.
Scaling LLM Training to 1000 GPUs
Deep dive into distributed training strategies, data parallelism, and pipeline parallelism for training large language models at scale.
Understanding Attention Mechanisms: A Visual Guide
An illustrated walkthrough of self-attention, multi-head attention, and cross-attention with interactive visualizations and PyTorch code.
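For a taste of the material, here is a minimal NumPy sketch of single-head scaled dot-product self-attention (illustrative only; the guide itself works through multi-head and cross-attention in PyTorch):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projections.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v                        # (seq_len, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
```

Multi-head attention simply runs several such projections in parallel and concatenates the per-head outputs.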
Hands-on LLM Fine-tuning with Hugging Face
A practical workshop on fine-tuning open-source LLMs using LoRA, QLoRA, and the Hugging Face ecosystem for domain-specific tasks.
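The core idea behind LoRA covered in the workshop can be sketched in a few lines: freeze the pretrained weight and learn only a low-rank update. The NumPy forward pass below is an illustration, not the Hugging Face PEFT implementation; names and shapes are assumptions for the example.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass of a LoRA-adapted linear layer (illustrative).

    W (d_out, d_in) is the frozen pretrained weight; A: (r, d_in) and
    B: (d_out, r) are the only trained parameters. The low-rank update
    B @ A is scaled by alpha / r, as in the LoRA paper.
    """
    r = A.shape[0]
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init: update starts at zero
x = rng.normal(size=(3, d_in))
y = lora_forward(x, W, A, B)
```

With `B` initialized to zero, the adapted layer starts out exactly equal to the frozen base layer, which is why LoRA training is stable from step one.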
Building Production ML Pipelines with Kubeflow
End-to-end tutorial on building automated ML pipelines with Kubeflow, from data ingestion to model serving with canary deployments.
The Future of Foundation Models
Discussing scaling laws, emergent capabilities, and the trajectory of foundation models with leading ML researchers.
MLOps Best Practices for Startups
Practical guide to implementing MLOps at startup scale: experiment tracking, model versioning, CI/CD for ML, and monitoring in production.

Efficient Inference for Large Language Models
Techniques for optimizing LLM inference: quantization, KV-cache optimization, speculative decoding, and continuous batching.
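As a small taste of the quantization material, here is a symmetric per-tensor int8 round-trip in NumPy (a sketch of the basic idea only; production stacks use per-channel scales and calibrated ranges):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8 quantization:
    # map [-max|w|, +max|w|] onto the integer range [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)     # 4x smaller than float32
w_hat = dequantize(q, scale)    # reconstruction error bounded by the step size
```

The int8 tensor is a quarter the size of the float32 original, and the elementwise reconstruction error is bounded by half a quantization step.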
From Research to Production: Deploying Transformer Models
A comprehensive guide covering model export (ONNX, TensorRT), serving architectures, load testing, and monitoring for transformer deployments.