I am currently a fourth-year mathematics student at the University of Cambridge and an incoming EECS PhD student at UC Berkeley this fall. My main interest is in working towards reliable and safely deployable AI systems, with specific interests spanning:
- Alignment: designing AI systems to be reliable and safe to deploy, focusing on misspecification, reward hacking, misgeneralization, and robustness.
- Evaluation: assessing whether a given AI system is aligned and behaves as intended, through scalable oversight and comprehensive behavioral evaluations (with findings then informing the alignment stage).
- Control: mitigating the downstream effects of misalignment through robust inference-time detection and intervention pipelines.
I am also interested in mathematical formalisms of intelligent behaviour, such as those based on variational inference (see this post).
Posts
Notes on alignment and control
Variational framework for perception and action
Architectures by symmetry
Free-energy
Understanding HTM
Brief notes on spiking neuron models
Credit assignment