Archives
All the articles I've archived.
Why AI Might Drive Interest Rates Down: A Contrarian Perspective
Advanced AI could push interest rates significantly lower over the next decade, not higher.
GSPO Gradient Derivation
Consider the Group Sequence Policy Optimization (GSPO) objective for reinforcement learning with large language models.
Importance Sampling (and why the name is misleading)
We call it 'importance,' but the algorithm never picks 'important' points; it re-weights ordinary ones.
Likelihood Ratio Trick
The trick you need when you have to differentiate through a sampling process, or in other words, through a stochastic node.
Softmaxing is actually Utility Maximization
In this post I show that the softmax function, which is used to convert logits to probabilities, is actually utility maximization (with a specific type of error on the utility).
Log-Derivative Trick
Just a simple application of the chain rule that we wrap up in a name.
Gumbel-Max Trick
Introducing the Gumbel-max trick and connecting it with the literature.
Softmax is actually a softer version of argmax.
How many different ways can we think of to interpret the softmax function?
Softmax to Gibbs
What kind of probability distribution do you get when you softmax your logits?