Vidit Ostwal

Writing

I share what I'm learning.

Mixture of Experts

MoE Routing Calculation (Excel Walkthrough)

Step-by-step Excel breakdown of router logits → top-k selection → normalized expert probabilities. Includes top-k masking, −∞ replacement, softmax, and final expert routing weights.

Jan 25, 2026

Agentic Systems & Tooling

Building MakeMyDocsBot

Automated multi-language documentation sync across feature branches.

Dec 20, 2025

LLM Behavior & Sampling

How Does Temperature Change LLM Responses?

The effect of temperature on the next-token probability distribution.

Jul 9, 2025

Transformers & LLM Internals

KV (Key-Value) Cache in Transformers

Reducing inference latency with a KV cache.

Jul 26, 2025

Masked Self-Attention

How masking enforces autoregressive generation in decoder-only transformers.

Jun 25, 2025

Self-Attention in Transformers

How queries and keys compute attention weights, how those weights combine the values, and why it matters.

Jun 21, 2025

Training the Tokenizer

How tokenizers are trained and why the vocabulary choice shapes model behavior.

Jun 3, 2025
© 2026 Vidit Ostwal