Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.
"When I read the ProcGen paper from OpenAI, one thing I was also thinking in terms of the ordering was what if we just thought about the environment as your opponent? So what if we thought about single-agent RL as more of a two-agent problem where basically you had an adversarial environment that acted as your opponent."
"I believe in the value of model-based [RL]. I think that a lot of model-based work does focus on slightly toy settings. And it's toy not because of the environment, it's toy because of the premise of the studies, in the sense that a lot of times when you look at model-based papers, they're essentially learning a model of the RL environment. But the RL environment, by assumption, is already a model — you already have a perfect model for that domain […] in the form of a reinforcement learning environment simulator."
"One of the huge benefits of learning a model [is] that when you learn the model as a neural network you essentially get a differentiable simulator for free."
"In supervised learning [unlike RL], there is an exploration problem (that's how you got your data) but we just assume it's already solved. We assume that there is an outside process that did the exploring and collected all the data."
Thanks to Tessa Hall for editing the podcast.