I am a fourth-year mathematics student at the University of Cambridge, specializing in theoretical physics, and an incoming CS PhD student at UC Berkeley this fall. I am broadly interested in understanding intelligent behavior from the perspectives of machine learning & neuroscience, and in using related insights to design reliable & robust AI systems. Topics of interest include:
- Alignment: Designing AI systems to be reliable and safe to deploy, with a focus on misspecification (outer alignment) and misgeneralization (inner alignment).
- Evaluation: Evaluating whether a given AI system is aligned and behaves as intended, via comprehensive behavioral evaluations and scalable oversight; findings from these evaluations (e.g. the discovery of unintended behavior) feed back into the alignment stage.
- Control: Mitigating the downstream effects of misalignment through robust deployment-time detection and intervention.
Notes on these topics in the context of current-day language models (LMs) can be found here. Beyond LMs, I am interested in how these topics relate to potential future scaled-up RL systems.
I am also interested in mathematical formalisms of intelligent behavior based on the framework of variational inference (as in this post), and more generally, in deriving properties of intelligent behavior from simple mathematical principles (analogous to the construction of the Standard Model).
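To illustrate the variational-inference framework mentioned above (this is the standard textbook formulation in my own notation, not a statement of any specific result from the linked post): for observations $x$ and latent states $z$, the variational free energy of an approximate posterior $q$ is

$$
F[q] = \mathbb{E}_{q(z)}\big[\log q(z) - \log p(x, z)\big]
     = D_{\mathrm{KL}}\big(q(z) \,\|\, p(z \mid x)\big) - \log p(x),
$$

so minimizing $F$ over $q$ yields approximate posterior inference (a model of perception), while minimizing $F$ over the agent's influence on $x$ yields a corresponding account of action.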
Posts
Constructing quantum field theories
Notes on prosaic alignment and control
Variational framework for perception and action
Architectures by symmetry
Free-energy
Understanding HTM
Brief notes on spiking neuron models
Credit assignment
subscribe via RSS