Greg Farquhar


DPhil student at University of Oxford, studying (deep) reinforcement learning.

Scholar | Twitter | CV


I am a DPhil student at the University of Oxford under the supervision of Shimon Whiteson. Previously, I obtained a Master's in Physics (first-class honours) from the University of Oxford. I have also interned at NASA JPL and Facebook AI Research.

My research concerns learning optimal sequential decision-making from data, primarily in the framework of reinforcement learning (RL). Key areas of focus include cooperative multi-agent RL, learning to plan, curriculum learning, and (any-order) gradient estimation.


Recent highlights

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning. NeurIPS 2019. [paper, code]
Low-variance estimators of any-order derivatives for RL, with advantage estimation and more.
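The core trick behind these DiCE-style estimators is the "magic-box" operator, which evaluates to 1 in the forward pass but injects the score-function gradient under differentiation, so higher-order derivatives come out of automatic differentiation for free. Below is a minimal, hedged sketch in JAX (not the paper's code; the toy objective and variable names are illustrative):

```python
import jax
import jax.numpy as jnp

def magic_box(tau):
    # MagicBox(tau) = exp(tau - stop_gradient(tau)):
    # forward value is exactly 1, but its gradient w.r.t. tau is 1,
    # so multiplying by it attaches the score-function term.
    return jnp.exp(tau - jax.lax.stop_gradient(tau))

def surrogate(theta):
    # Toy stand-in for a DiCE surrogate objective:
    # tau plays the role of a sum of log-probabilities of sampled
    # actions, and r plays the role of a sampled return.
    x, r = 2.0, 3.0
    tau = theta * x
    return magic_box(tau) * r

val = surrogate(1.5)              # forward pass: magic_box == 1, so val == r == 3.0
grad = jax.grad(surrogate)(1.5)   # gradient: r * d(tau)/d(theta) == 3.0 * 2.0 == 6.0
```

Because `jax.grad` can be nested, the same surrogate yields unbiased any-order derivatives; Loaded DiCE then trades bias for variance in these estimators via advantage baselines.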

TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning. ICLR 2018. [paper, code]
Building intuitions from model-based planning into architectures for deep RL.

Counterfactual Multi-Agent Policy Gradients. AAAI 2018 (Outstanding Student Paper Award). [paper, code]
A centralised critic with decentralised actors, using a per-agent counterfactual advantage estimate to reduce variance.

Growing Action Spaces. Under review. [paper, code]
Curriculum learning with progressively larger action spaces to shape exploration.

Other publications

Multi-Agent Common Knowledge Reinforcement Learning. NeurIPS 2019. [paper]

A Survey of Reinforcement Learning Informed by Natural Language. IJCAI 2019. [paper]

The StarCraft Multi-Agent Challenge. AAMAS 2019. [paper, code, blog]

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML 2018. [paper, code]

DiCE: The Infinitely Differentiable Monte-Carlo Estimator. ICML 2018. [paper, code, blog]

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning. ICML 2017. [paper]