I am a fourth-year mathematics student at the University of Cambridge specializing in theoretical physics, and an incoming CS PhD student at UC Berkeley this fall. I am broadly interested in understanding intelligent behavior from the perspectives of both machine learning and neuroscience, and in using the resulting insights to design reliable AI systems.
In the context of language models (LMs), I am mainly interested in:
- Alignment: Designing AI systems to be reliable and safe to deploy, with a focus on misspecification (outer alignment) and misgeneralization (inner alignment).
- Evaluation: Assessing whether a given AI system is aligned and behaves as intended, via scalable oversight and comprehensive behavioral evaluations; findings from these evaluations (e.g. the discovery of unintended behavior) feed back into the alignment stage.
- Control: Mitigating the downstream effects of misalignment through robust deployment-time detection and intervention.
Notes on these topics can be found here.
Beyond LMs, I am interested in mathematical formalisms of intelligent behavior based on the framework of variational inference (as in this post), and more generally in deriving properties of intelligent behavior from simple mathematical principles, analogous to the construction of the Standard Model.
Posts
Constructing quantum field theories
Notes on prosaic alignment and control
Variational framework for perception and action
Architectures by symmetry
Free-energy
Understanding HTM
Brief notes on spiking neuron models
Credit assignment
subscribe via RSS