I am a first-year CS student at UC Berkeley, broadly interested in understanding intelligent behavior from the perspectives of machine learning & neuroscience, and in using related insights to design reliable & robust AI systems. For the latter, topics of interest include:
- Alignment: Designing AI systems to be reliable and safe to deploy, with a focus on misspecification (outer alignment) and misgeneralization (inner alignment).
- Evaluation: Evaluating whether a given AI system is aligned and behaves as intended, via comprehensive behavioral evaluations and scalable oversight; findings from evaluation (e.g. the discovery of unintended behavior) feed back into the alignment stage.
- Control: Mitigating the downstream effects of misalignment through robust deployment-time detection and intervention.
I am also interested in the mathematical aspects of the Standard Model (notes on this topic here).
Posts
- Constructing quantum field theories
- Notes on prosaic alignment and control
- Variational framework for perception and action
- Architectures by symmetry
- Free-energy
- Understanding HTM
- Brief notes on spiking neuron models
- Credit assignment
subscribe via RSS