A Theory of Deep Learning

(elonlit.com)

56 points | by elonlit 1 day ago

7 comments

ks2048 33 minutes ago
The relevant paper: "A Theory of Generalization in Deep Learning". https://arxiv.org/abs/2605.01172
prideout 1 hour ago
This is a fascinating mathematical framework, but the post title might be a bit of an overreach. I often wonder if "a theory of deep learning" could exist that could be stated succinctly and that could predict (1) scaling laws and (2) the surprising reliability of gradient descent.
Note that I said "predict" not "describe". It feels like we're still in the era of Kepler, not Newton.
smokel 33 minutes ago
This essay seems to be related to the paper "There Will Be a Scientific Theory of Deep Learning" [1] which was discussed here recently [2].
[1] https://arxiv.org/pdf/2604.21691
[2] https://news.ycombinator.com/item?id=47893779
airza 1 hour ago
A very fascinating read.
As a fellow tufte css enjoyer, Why is user select turned off on the sidenotes? I would like to be able to copy paste them quite badly.
[-]
- piskov 4 minutes ago
  Layout is fine but font is atrocious.
  Uppercase letters have different stroke width than lowercase ones — it’s like they are *B*old *L*ike this.
  Not only that: tracking, kerning is basically non-existent.
  Please don’t use that open-source font
  You need real Bembo, not that piece of shit
jdw64 57 minutes ago
Does anyone happen to know what font this site is using? It looks really elegant.
[-]
- airza 47 minutes ago
  It is a modified version of ET_Book called ET_Bembo:
  https://github.com/DavidBarts/ET_Bembo
  [-]
  - jdw64 42 minutes ago
    I love u. thanks!
- piskov 3 minutes ago
  Font is atrocious.
  Uppercase letters have different stroke width than lowercase ones — it’s like they are *B*old *L*ike this.
  Not only that: tracking, kerning is basically non-existent.
  Please don’t use that open-source font
  You need real paid Bembo, not that piece of shit.
- DataDaoDe 44 minutes ago
  apparently its the font used in Edward Tufte's books. Its on github: https://edwardtufte.github.io/et-book/
refulgentis 1 hour ago
This is a beautifully written way of saying “Some parts of what the network memorizes affect test behavior, and some don’t.” But that’s not a theory of deep learning, the grand unified theory would explain that.
We're given a signal channel and a reservoir. Signal lives in the channel, noise lives in the reservoir, and the reservoir supposedly doesn’t show up at test time.
Okay, but then we have: why would SGD put the right things in the right bucket?
If the answer is “because the reservoir is defined as the stuff that doesn’t transfer to test,” then this is close to circular.
The Borges/Lavoisier stuff is a tell. "We have unified the field” rhetoric should come after nontrivial predictions and results. Claiming to solve benign overfitting, double descent, grokking, implicit bias, risk of training on population, how to avoid a validation set, and last but not least, skipping training by analytically jumping to the end is 6 theory papers, 3 NeurIPS winners, and a $10B startup. Let's get some results before we tell everyone we unified the field. :) I hope you're right.
[-]
- dwrodri 1 hour ago
  Admittedly probably some aggrandized boasting here, but I think empirical verification of that Adam modification alone would be a meaningful contribution, unless that's prior work?