Yash Sharma, MPI-IS: On generalizability, causality, and disentanglement


Yash Sharma is a Ph.D. student at the International Max Planck Research School for Intelligent Systems. He previously studied electrical engineering at Cooper Union and has spent time at Borealis AI and IBM Research. Yash's early work was on adversarial examples and his current research interests span a variety of topics in representation disentanglement. In this episode, we discuss robustness to adversarial examples, causality vs. correlation in data, and how to make deep learning models generalize better.

Some highlights from our conversation

“The way we train neural nets, the way we do supervised learning, it’s super convenient, and it’s gotten us very far. But the way we do it is so different from how humans learn. Neural nets are trained from scratch on IID images and one-hot labels. Humans learn on interactive, dynamic experience where their task is constantly changing and they’re always observing distribution shifts.”

“It’s difficult to think how academics can really contribute when they aren’t able to train at that kind of compute scale like Google or Facebook can. But if we study formal problems, where the problems can be studied at small scale, we can make progress.”

“The disentanglement definition, kind of what was put out by beta-VAE, was saying ‘we want each dimension to represent different information.’ But […] some things literally can’t be put into a single continuous latent. If I talk about 3D rotation, 3D rotation is correlated. So how am I going to put three-dimensional rotation into a single latent dimension?”

Referenced in this podcast

Key papers that inspired Yash’s interest in adversarial examples: Szegedy et al. 2014, Goodfellow et al. 2014, and Carlini & Wagner 2016
Matthias Bethge’s lab, where Yash is doing his PhD work
Bernhard Schölkopf and his work on causality
The Book of Why by Judea Pearl
Foundational work on disentangled systems by Irina Higgins and coauthors: beta-VAE and a more recent paper from 2018 working towards a definition of disentangled representations
Yoshua Bengio and his work on “factors of variation”
Locatello et al. 2018, which won best paper at ICML 2019
Invariant Risk Minimization (IRM) by Arjovsky et al. and a follow-up paper, In Search of Lost Domain Generalization, by Gulrajani & Lopez-Paz
A recent paper on non-linear IRM by Lu et al.
AI-generating algorithms (AI-GAs) by Jeff Clune

Further discussions

Yash brought up the dilemma of exploration vs. exploitation in research, explaining why he decided to switch his focus in grad school instead of continuing to build on his expertise in adversarial robustness. In particular, he noted that incoming grad students in an increasingly competitive admissions landscape are often expected to already have experience in whatever topic they plan to specialize in. How can new researchers optimally balance exploitation of previous experience with exploration of the broader field?
We discussed how the lack of robustness to adversarial examples provides a human-to-AI comparison that seems worth digging into, as it demonstrates a severe failure of out-of-distribution generalization. On the other hand, these are not naturally occurring distribution shifts. Beyond its practical security implications, does robustness to adversarial examples have anything to teach us about intelligence?

Thanks to Tessa Hall for editing the podcast.