I am a DPhil student at the University of Oxford, supervised by Shimon Whiteson. Previously, I obtained a Master's in Physics (1st class honours) from the University of Oxford. I am a former intern at NASA JPL and Facebook AI Research.
My research focuses on learning optimal sequential decision-making from data, primarily within the framework of reinforcement learning (RL). Key areas of focus include cooperative multi-agent RL, learning to plan, curriculum learning, and (any-order) gradient estimation.
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning. NeurIPS 2019.
Low-variance estimators of any-order derivatives for RL, with advantage estimation and more.
Multi-Agent Common Knowledge Reinforcement Learning. NeurIPS 2019. [paper]
A Survey of Reinforcement Learning Informed by Natural Language. IJCAI 2019. [paper]
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning. ICML 2017. [paper]