Vidit-Ostwal

LLM Behavior & Sampling

KL Divergence — Made Visual

An interactive visual guide to KL divergence: distributions, the formula, live computation, and how it shows up in LLMs and PPO training.

Jun 12, 2026

How Does Temperature Change LLM Responses?

Effect of temperature on the next-token probability distribution.

Jul 9, 2025

Mixture of Experts

MoE Routing Calculation (Excel Walkthrough)

Step-by-step Excel breakdown of router logits → top-k selection → normalized expert probabilities. Includes top-k masking, −∞ replacement, softmax, and final expert routing weights.

Jan 25, 2026

Agentic Systems & Tooling

Building MakeMyDocsBot

Automated multi-language documentation sync across feature branches.

Dec 20, 2025

Transformers & LLM Internals

KV (Key-Value) Cache in Transformers

Reducing inference latency with a KV cache.

Jul 26, 2025

Masked Self-Attention

How masking enforces autoregressive generation in decoder-only transformers.

Jun 25, 2025

Self-Attention in Transformers

How queries, keys, and values compute attention weights, and why attention matters.

Jun 21, 2025

Training the Tokenizer

How tokenizers are trained and why the vocabulary choice shapes model behavior.

Jun 3, 2025
