I am a fourth-year mathematics student at the University of Cambridge specializing in theoretical physics, and an incoming CS PhD student at UC Berkeley this fall. I am broadly interested in understanding intelligent behavior from the perspectives of both machine learning and neuroscience, and in using the resulting insights to design reliable AI systems.
In the context of language models (LMs), I am mainly interested in:
- Alignment: Designing AI systems to be reliable and safe to deploy, with a focus on misspecification (outer alignment) and misgeneralization (inner alignment).
- Evaluation: Assessing whether a given AI system is aligned and behaves as intended, via scalable oversight and comprehensive behavioral evaluations; findings from these evaluations (e.g. the discovery of unintended behavior) feed back into the alignment stage.
- Control: Mitigating the downstream effects of misalignment through robust deployment-time detection and intervention.
Notes on these topics can be found here.
Beyond LMs, I am interested in mathematical formalisms of intelligent behavior based on the framework of variational inference (as in this post), and more generally in deriving properties of intelligent behavior from simple mathematical principles, analogous to the construction of the Standard Model.
Posts
Constructing quantum field theories
Notes on prosaic alignment and control
Variational framework for perception and action
Architectures by symmetry
Free-energy
Understanding HTM
Brief notes on spiking neuron models
Credit assignment
subscribe via RSS