I am currently a fourth-year mathematics student at the University of Cambridge and an incoming EECS PhD student at UC Berkeley this fall. I am primarily interested in the development of reliable and safely deployable AI systems, with specific interests spanning:
- Alignment: Designing AI systems to be reliable and safe to deploy, with a focus on preventing reward misspecification (and subsequent reward hacking) as well as robustness failures due to misgeneralization.
- Evaluation: Assessing whether a given AI system is aligned and behaves as intended, through robust and scalable oversight and comprehensive behavioural evaluations, with findings (e.g. the discovery of problematic behaviour) feeding back into the alignment stage.
- Control: Mitigating the downstream effects of misalignment through robust inference-time detection and intervention pipelines during deployment.
Notes on these topics can be found here. I am also interested in mathematical formalisms of intelligent behaviour, such as those based on variational inference (see this post).
Posts
Notes on alignment and control
Variational framework for perception and action
Architectures by symmetry
Free-energy
Understanding HTM
Brief notes on spiking neuron models
Credit assignment
subscribe via RSS