Writing
I share what I'm learning.
Agentic Systems & Tooling
LLM Behavior & Sampling
Transformers & LLM Internals
KV (Key-Value) Cache in Transformers
Reducing inference latency with a KV cache.
Masked Self-Attention
How masking enforces autoregressive generation in decoder-only transformers.
Self-Attention in Transformers
How queries, keys, and values produce attention weights, and why that matters.
Training the Tokenizer
How tokenizers are trained and why vocabulary choice shapes model behavior.