I am a first-year CS student at UC Berkeley, broadly interested in understanding intelligent behavior from the perspectives of machine learning and neuroscience, and in using these insights to design reliable and robust AI systems. For the latter, topics of interest include:

  • Alignment: Designing AI systems to be reliable and safe to deploy, with a focus on misspecification (outer alignment) and misgeneralization (inner alignment).
  • Evaluation: Assessing whether a given AI system is aligned and behaves as intended via comprehensive behavioral evaluations and scalable oversight, with findings (e.g. the discovery of unintended behavior) informing the alignment stage.
  • Control: Mitigating the downstream effects of misalignment through robust deployment-time detection and intervention.

I am also interested in the mathematical aspects of the Standard Model (notes on this topic here).

Posts

subscribe via RSS