• Generally Intelligent #5: Katja Schwarz, MPI-IS, on GANs, implicit functions, and 3D scene understanding

    RSS · Spotify · Apple Podcasts · Pocket Casts

Katja Schwarz (Google Scholar) came to machine learning from physics and is now working on 3D geometric scene understanding at the Max Planck Institute for Intelligent Systems. Her most recent work, “Generative Radiance Fields for 3D-Aware Image Synthesis,” revealed that radiance fields are a powerful representation for generative image synthesis, leading to 3D-consistent models that render with high fidelity.

    We discuss the ideas in Katja’s work and more:

    🥦 the role 3D generation plays in conceptual understanding

    📝 tons of practical tips on GAN training

    〰 continuous functions as representations for 3D objects

    Some quotes we loved:

If you can generate something, or if you can create something yourself, like a prototype, it’s nice because you can see what your model learned from the data. On the other hand, if you reconstruct, it’s way harder to see if there’s a true understanding, or if it’s just reproducing things. I guess that’s why I favor this GAN approach: I feel that there’s a lot of potential for robustness as well. It’s closer to this understanding. We humans can also think of different versions.

    In our last project, we were actually thinking about using a VAE approach, just because of these nice stability properties. Then we decided against it because it meant that we would need to use posed images. Because if you want to reconstruct it, you need to know which pose it was taken from. And that’s an advantage then again for GANs where you don’t need to reconstruct exactly the same image.

    Below is the full transcript. As always, please feel free to reach out with feedback, ideas, and questions!

  • Generally Intelligent #4: Joel Lehman, OpenAI, on evolving intelligence, open-endedness, and reinforcement learning

    RSS · Spotify · Apple Podcasts · Pocket Casts

    Our fourth episode features Joel Lehman (Google Scholar), previously a founding member at Uber AI Labs and assistant professor at the IT University of Copenhagen. He’s now a research scientist at OpenAI, where he focuses on open-endedness, reinforcement learning, and AI safety.

    Joel’s PhD dissertation introduced the novelty search algorithm. That work inspired him to write the popular science book, “Why Greatness Cannot Be Planned”, with his PhD advisor Ken Stanley, which discusses what evolutionary algorithms imply for how individuals and society should think about objectives.

    A few of the topics we touch on in this episode:

    • How discovering novelty search totally changed Joel’s philosophy of life
    • Sometimes, can you reach your objective more quickly by not trying to reach it?
    • Better ways to evolve intelligence
    • Why reinforcement learning is a natural framework for open-endedness

    Some quotes we loved:

    Novelty search is a way of looking at the question of how ambitious objectives are reached: how do you accomplish ambitious things?

    What novelty search did was ask a seemingly zen-like question, which is, “Sometimes, can you reach your objective more quickly by not trying to reach it?”

    The easiest way to talk about open-endedness is to point to some processes that we’re familiar with that are open-ended. Biological evolution, for example, is an algorithm that’s run for billions of years and one run of biological evolution has produced all the amazing diversity of life, including ourselves. And so, it’s amazing that a volitionless process produced volition. That’s what I mean by open-ended. It’s continuing to create new things, diverse things, over time.

Imagine this SAT analogy problem: bird is to jet as evolution is to open-endedness. The idea is that birds’ flight is really interesting, it shows it’s possible to fly, and maybe we even tried to mimic that with ornithopters and those kinds of machines. But really, the history of flight took off when we had some hypothesis about the core principles of flight, which enabled us to engineer things that could more efficiently do things that maybe weren’t even possible for biology. You can’t create a bird that carries tons and tons of cargo across the ocean. Similarly, while we might take inspiration from biological evolution, what are the core principles of open-ended creativity? What if you could bottle creativity into a jar, into an algorithm, that could efficiently instantiate an open-ended process aimed towards whatever domain we want, in a way that’s not like biological evolution, but actually has capabilities that are much more impressive than biological evolution?

    Below is the full transcript. As always, please feel free to reach out with feedback, ideas, and questions!

  • Generally Intelligent #3: Cinjon Resnick, NYU, on activity and scene understanding

    RSS · Spotify · Apple Podcasts · Pocket Casts

On this episode of Generally Intelligent, we interview Cinjon Resnick (Google Scholar), formerly of Google Brain and now doing his PhD at NYU, about why he believes scene understanding is critical to out-of-distribution generalization, and how his theses have evolved since he started his PhD.

Some topics we cover:

    • How Cinjon started his research by trying to grow a baby through language and games, before running into a wall with this approach
    • How spending time at circuses 🎪 and with gymnasts 🤸🏽‍♂️ re-invigorated his research, and convinced him to focus on video, motion, and activity recognition
    • Why MetaSIM and MetaSIM II are underrated papers
    • Two research ideas Cinjon would like to see others work on

    Below is the full transcript. As always, please feel free to reach out with feedback, ideas, and questions!

  • Generally Intelligent #2: Sarah Jane Hong, Latent Space, on neural rendering & research process

    RSS · Spotify · Apple Podcasts · Pocket Casts

    Excited to release our second episode of Generally Intelligent! This time we’re featuring Sarah Jane Hong, co-founder of Latent Space, a startup building the first fully AI-rendered 3D engine in order to democratize creativity.

Sarah co-authored Low Distortion Block-Resampling with Spatially Stochastic Networks from NeurIPS 2020, a very cool method that lets you realistically resample part of a generated image. For example, maybe you’ve generated a castle, but you’d like to change the style of a tower; you could then resample that tower until you get something you like.

    In this episode, we touch on:

    • What it was like taking classes under Geoff Hinton in 2013
    • How not to read papers
    • The downsides of only storing information in model weights
    • Why using natural language prompts to render a scene is much harder than you’d expect
    • Why a model’s ability to scale is more important than getting state-of-the-art results

    Below is the full transcript. As always, please feel free to reach out with feedback, ideas, and questions!

  • Generally Intelligent #1: Kelvin Guu, Google AI, on language models & overlooked problems

    Generally Intelligent is a podcast made for deep learning researchers (you can learn more about it here).

    RSS · Spotify · Pocket Casts

    Our first guest is Kelvin Guu, a senior research scientist at Google AI, where he develops new methods for machine learning and language understanding. Kelvin is the co-author of REALM: Retrieval-Augmented Language Model Pretraining. The conversation is a wide-ranging tour of language models, how computers interact with world knowledge, and much more; here are a few of the questions we cover:

    • Why language models like GPT-3 seem to generalize so well to tasks beyond just predicting words
• How you can store knowledge in a database, in the weights of a model, or with a mix of both approaches
    • What interesting problems and data sets have been overlooked by the research community
    • Why cross-entropy might not be such a great objective function
    • Creative and impactful ways language and knowledge models might be used in the future

    Below is the full transcript. We love feedback, ideas, and questions, so please feel free to reach out!

  • Generally Intelligent #0: A podcast for deep learning researchers

    Over the past few years, we’ve been a part of countless conversations with various deep learning researchers about the hunches and processes that inform their work. These conversations happen as part of informal paper reading groups, lab discussions, or casual chats with friends, and they have proved critical to our own research.

    The intimacy and informality of these environments lends itself to stimulating conversations. Yet it often felt like a shame that the deeper understandings and hard-earned lessons from research failures were shared with just a select few, when so many others could use this knowledge to advance the frontier.

    That’s why, today, we’re launching “Generally Intelligent,” a publicly available podcast made by deep learning researchers, for deep learning researchers.

  • Understanding self-supervised and contrastive learning with "Bootstrap Your Own Latent" (BYOL)


Unlike prior work such as SimCLR and MoCo, the recent paper Bootstrap Your Own Latent (BYOL) from DeepMind demonstrates a state-of-the-art method for self-supervised learning of image representations without an explicitly contrastive loss function. This simplifies training by removing the need for negative examples in the loss. We highlight two surprising findings from our work on reproducing BYOL:

    (1) BYOL often performs no better than random when batch normalization is removed, and

    (2) the presence of batch normalization implicitly causes a form of contrastive learning.

    These findings highlight the importance of contrast between positive and negative examples when learning representations and help us gain a more fundamental understanding of how and why self-supervised learning works.
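The intuition behind finding (2) comes from how batch normalization works: each example's normalized output depends on the statistics of every other example in the batch, so representations are implicitly contrasted against the batch as a whole. Below is a minimal numpy sketch (an illustration, not code from our repository) showing this cross-batch dependence, and why a fully collapsed representation is degenerate under batch normalization:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature using statistics computed across the batch axis.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 4))  # 8 examples, 4 features

out_a = batch_norm(batch)

# Perturb a *different* example in the batch...
batch_modified = batch.copy()
batch_modified[3] += 10.0
out_b = batch_norm(batch_modified)

# ...and the first example's normalized output changes too: its
# representation depends on the other examples in the batch.
print(np.allclose(out_a[0], out_b[0]))  # False

# If every example collapses to the same vector, the batch variance is
# zero and batch norm maps everything to (exactly) zero, a degenerate
# output that the network is pushed away from during training.
collapsed = np.tile(batch[0], (8, 1))
print(np.abs(batch_norm(collapsed)).max())  # 0.0
```

This is only the mechanism in miniature; the post itself quantifies how removing batch normalization from BYOL's MLPs makes learned representations collapse to chance-level performance.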

    The code used for this post can be found at https://github.com/untitled-ai/self_supervised.

  • Appendix for "Understanding self-supervised and contrastive learning with 'Bootstrap Your Own Latent' (BYOL)"


This post contains the extra data and detail for our post “Understanding self-supervised and contrastive learning with ‘Bootstrap Your Own Latent’ (BYOL).”