Show HN: I built a tiny LLM to demystify how language models work

(github.com)

151 points | by armanified 3 hours ago

13 comments

LeonTing1010 5 minutes ago
This is a great way to learn. Most people treat LLMs as black boxes, but building one from scratch — even tiny — forces you to understand attention, tokenization, and loss at a visceral level.
Curious what training data you used and how small you could go before the model stopped producing coherent output. There's an interesting cliff where models go from 'random tokens' to 'plausible text' and understanding where that threshold is teaches a lot about what these models actually learn.
ordinarily 1 hour ago
It's genuinely a great introduction to LLMs. I built my own awhile ago based off Milton's Paradise Lost: https://www.wvrk.org/works/milton
martmulx 27 minutes ago
How much training data did you end up needing for the fish personality to feel coherent? Curious what the minimum viable dataset looks like for something like this.
NyxVox 13 minutes ago
Hm, I can actually try the training on my GPU. One of the things I want to try next. Maybe a bit more complex than a fish :)
gnarlouse 28 minutes ago
I... wow, you made an LLM that can actually tell jokes?
cbdevidal 47 minutes ago
> you're my favorite big shape. my mouth are happy when you're here.
Laughed loudly :-D
nullbyte808 1 hour ago
Adorable! Maybe a personality that speaks in emojis?
SilentM68 1 hour ago
Would have been funny if it were called "DORY" due to memory recall issues of the fish vs LLMs similar recall issues :)
AndrewKemendo 2 hours ago
I love these kinds of educational implementations.
I want to really praise the (unintentional?) nod to Nagel, by limiting capabilities to representation of a fish, the user is immediately able to understand the constraints. It can only talk like a fish cause it’s very simple
Especially compared to public models, thats a really simple correspondence to grok intuitively (small LLM > only as verbose as a fish, larger LLM > more verbose) so kudos to the author for making that simple and fun.
[-]
- dvt 1 hour ago
  > the user is immediately able to understand the constraints
  Nagel's point was quite literally the opposite[1] of this, though. We can't understand what it must "be like to be a bat" because their mental model is so fundamentally different than ours. So using all the human language tokens in the world can't get us to truly understand what it's like to be a bat, or a guppy, or whatever. In fact, Nagel's point is arguably even stronger: there's no possible mental mapping between the experience of a bat and the experience of a human.
  [1] https://www.sas.upenn.edu/~cavitch/pdf-library/Nagel_Bat.pdf
  [-]
  - AndrewKemendo 1 hour ago
    Different argument
    I’m not going to argue other than to say that you need to view the point from a third party perspective evaluating “fish” vs “more verbose thing,” such that the composition is the determinant of the complexity of interaction (which has a unique qualia per nagel)
    Hence why it’s a “unintentional nod” not an instantiation
ethanmacavoy 25 minutes ago
[dead]
Morpheus_Matrix 1 hour ago
[dead]
weiyong1024 1 hour ago
[dead]