I am currently a fourth-year mathematics student at the University of Cambridge and an incoming EECS PhD student at UC Berkeley this fall. I am primarily interested in the development of reliable and safely deployable AI systems, with specific interests spanning:

  • Alignment: Designing AI systems to be reliable and safe to deploy, with a focus on preventing reward misspecification (and subsequent reward hacking) as well as robustness failures due to misgeneralization.
  • Evaluation: Evaluating whether a given AI system is aligned and behaves as intended, through robust and scalable oversight and comprehensive behavioural evaluations, with findings from evaluations (e.g. the discovery of problematic behaviour) feeding back into the alignment stage.
  • Control: Mitigating the downstream effects of misalignment through robust inference-time detection and intervention pipelines during deployment.

Notes on these topics can be found here. I am also interested in mathematical formalisms of intelligent behaviour, such as those based on variational inference (see this post).

Posts
