Archives

All the articles I've archived.

2025 1
July 1
  • GSPO Gradient Derivation

    Consider the Group Sequence Policy Optimization (GSPO) objective for reinforcement learning with large language models.

2024 3
May 1
April 1
  • Softmaxing is actually Utility Maximization

    In this post I show that the softmax function, which is used to convert logits to probabilities, is actually utility maximization (with a specific type of error on the utility).

January 1
2022 3
September 3