Generally Intelligent #14: Yash Sharma, MPI-IS, on generalizability, causality, and disentanglement

Yash Sharma is a Ph.D. student at the International Max Planck Research School for Intelligent Systems. He previously studied electrical engineering at Cooper Union and has spent time at Borealis AI and IBM Research. Yash's early work was on adversarial examples, and his current research interests span a variety of topics in representation disentanglement. In this episode, we discuss robustness to adversarial examples, causality vs. correlation in data, and how to make deep learning models generalize better.

Some highlights from our conversation

"The way we train neural nets, the way we do supervised learning, it's super convenient, and it's gotten us very far. But the way we do it is so different from how humans learn. Neural nets are trained from scratch on IID images and one-hot labels. Humans learn on interactive, dynamic experience where their task is constantly changing and they're always observing distribution shifts."

"It's difficult to think how academics can really contribute when they aren't able to train at that kind of compute scale like Google or Facebook can. But if we study formal problems, where the problems can be studied at small scale, we can make progress."

"The disentanglement definition, kind of what was put out by beta-VAE, was saying 'we want each dimension to represent different information.' But [...] some things literally can't be put into a single continuous latent. If I talk about 3D rotation, 3D rotation is correlated. So how am I going to put three-dimensional rotation into a single latent dimension?"

Referenced in this podcast

Further discussions

  • Yash brought up the dilemma of exploration vs. exploitation in research, explaining why he decided to switch his focus in grad school instead of continuing to build on his expertise in adversarial robustness. In particular, he noted that incoming grad students in an increasingly competitive admissions landscape are often expected to already have experience in whatever topic they plan to specialize in. How can new researchers optimally balance exploitation of previous experience with exploration of the broader field?

  • We discussed how the lack of robustness to adversarial examples provides a human-to-AI comparison that seems worth digging into, as it demonstrates a severe failure of out-of-distribution generalization. On the other hand, these are not naturally occurring distribution shifts. Beyond its practical security implications, does robustness to adversarial examples have anything to teach us about intelligence?

Thanks to Tessa Hall for editing the podcast.