How does Shazam work?

(perthirtysix.com)

137 points | by datadrivenangel 2 days ago

15 comments

  • swyx 1 hour ago
    related comments from Shazamers

    - OG shazam paper https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf (he has a talk on youtube btw, look it up if you really care)

    - https://news.ycombinator.com/item?id=18069968 shazam employee blogpost

    - https://news.ycombinator.com/item?id=38538996 shazam cofounder endorsed explainer

    - go algo repro https://news.ycombinator.com/item?id=41127726

    as with all ML things... the code is much less % of the value than the data...

  • thakoppno 1 hour ago
    Perhaps obviously this is the same technique that enables ACR on TVs.

    It occurs to me that Shazam has such a better reputation online because the intent and consent of the user is honored.

    It makes me wonder if there couldn’t be an implementation on TVs that is similar and actually is a net positive for consumers. Basically would customers actually like TV ACR if the data wasn’t just going to sell more ads?

    • krustyburger 1 hour ago
      So the value-add would be the consumer would get to find out the name of the show or movie that’s playing, the same info that also pops up if they hit the pause button?
      • thakoppno 1 hour ago
        I was thinking more like interactive content. Do you remember when VH1 had a pop-up music video show?

        Shows could synchronize additional content that’d be visible when Shazam mode is enabled.

        • flymasterv 39 minutes ago
          We did this on Fire Phone for live sports and audio based X-Ray cast info. It was, like everything about that phone, a really fun tech demo.
        • w-ll 49 minutes ago
          Pop. Pop. Pop Up Video.
  • SilverElfin 3 minutes ago
    I feel like it does not work well. Shazam struggles to recognize music in real life environments that have some background noise, even with a lot of time. It’s much worse than the built in music recognition Google’s phones have, for example.
  • larodi 42 minutes ago
    There's an algo called dynamic time warping (DTW) that is very often overlooked. My wild guess is that it's at play at Shazam.
    • old_bayes 39 minutes ago
      Ayyy I used DTW to track bots on a certain social media site. They tend to act in herds so DTW helps smooth out delayed, repeat actions.
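
For readers who have not met DTW: below is a minimal sketch of the textbook dynamic-programming form, in plain Python. Nothing in this thread confirms that Shazam uses it; that is larodi's guess. The function name `dtw_distance` and the toy sequences are illustrative assumptions, not taken from any of the linked material.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two numeric sequences.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j], where each
    step may advance in a, in b, or in both (so either side can "stall").
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stalls
                cost[i][j - 1],      # b advances, a stalls
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy data (assumed): `delayed` is `original` with some samples repeated.
original = [0, 1, 2, 3, 2, 1, 0]
delayed = [0, 0, 1, 2, 3, 3, 2, 1, 0]
different = [3, 2, 1, 0, 1, 2, 3]

print(dtw_distance(original, delayed))    # 0.0, repeats are absorbed by warping
print(dtw_distance(original, different))  # greater than 0
```

This is also why it suits old_bayes's use case: a copy of a sequence that arrives late or repeats steps still aligns at low cost.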
  • Animats 1 hour ago
    Recognizing a recording isn't hard to do, because, for the same recording, the chords follow each other with precisely repeatable timing. That's been around for well over a decade. Recognizing a different recording, say, a cover version of the same song, is much more work.

    Audible Magic claims to be able to recognize multiple performances of the same songs, and even parodies.[1] Using, of course, "AI technology" and much more compute.

    [1] https://www.audiblemagic.com/2024/02/07/identifying-cover-so...

    • Gigachad 17 minutes ago
      "Isn't hard to do" is doing some heavy lifting. Obviously on a society level it's simple tech we managed ages ago. But I would bet if you tasked individual devs with building it without looking up the answer, very few could do it.
    • bitexploder 1 hour ago
      20 years at least. I remember seeing how Gracenote worked back in the day when I was consulting for them.
    • andai 1 hour ago
      Why is this harder than "delete timing information" ?
  • gnabgib 2 hours ago
    Again? Oh I see.. SCP (this domain is sus)

    From CameronMacLeod (2022) - and much more complete analysis (587 points, 2023, 155 comments) https://news.ycombinator.com/item?id=38531428

    Or Slate (2009) (50 points, 16 comments) https://news.ycombinator.com/item?id=893353

    • BLKNSLVR 2 hours ago
      Forgive my ignorance, but what does SCP mean in this context? (my normal go-to of 'secure copy' doesn't fit).

      Thanks for the other links, the question in this title is one I've day-dreamily thought about on occasion, but never dug into. Will have a read of all three.

      • Animats 1 hour ago
        Vaguely relevant pop-culture reference.[1]

        [1] https://scp-wiki.wikidot.com/glossary-of-terms

        • BLKNSLVR 1 hour ago
          I seem to have wandered into a parallel universe.

          I think it'll take me longer to understand WTF SCP is than it will to understand how Shazam works.

      • eichin 1 hour ago
        probably the HN-specific "Second Chance Pool" for resurfacing links
    • cyral 2 hours ago
      The interactive parts of this post are very cool though
  • dataviz1000 1 hour ago
    Add to my list of projects. Dinosaur game but with audible clucks to jump.
  • G_o_D 1 hour ago
    Out of curiosity, is it possible to prevent a Shazam-like app from detecting a track, maybe by adding noise or some other technique?
    • knodi123 36 minutes ago
      Not unless your noise is louder at certain dominant frequencies than the source. The article gives examples, but the algorithm basically throws away everything except frequency peaks, in order to make the lookups faster.
    • Gigachad 16 minutes ago
      Producers/DJs manage this one by just not releasing their music or edits for ages if ever.
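
knodi123's point (only frequency peaks are kept, so noise matters only when it is louder than the source at the dominant frequencies) can be sketched as a toy. This is a rough illustration assuming one peak per time frame and hashes built from peak pairs, loosely in the spirit of the paper swyx linked. All function names, the `fan_out` parameter, and the one-hot "spectrograms" are made up for the example; a real system would compute spectrograms from audio and keep several peaks per frame.

```python
from collections import Counter, defaultdict


def dominant_peaks(spectrogram):
    """Keep only the strongest frequency bin of each time frame."""
    return [(t, max(range(len(frame)), key=frame.__getitem__))
            for t, frame in enumerate(spectrogram)]


def pair_hashes(peaks, fan_out=3):
    """Pair each peak with the next few peaks: yields ((f1, f2, dt), t1)."""
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


def build_index(tracks):
    """Map every hash to the (track name, time) positions where it occurs."""
    index = defaultdict(list)
    for name, spectrogram in tracks.items():
        for h, t in pair_hashes(dominant_peaks(spectrogram)):
            index[h].append((name, t))
    return index


def identify(index, clip):
    """Vote on (track, time offset); a true match piles up on one offset."""
    votes = Counter()
    for h, t_clip in pair_hashes(dominant_peaks(clip)):
        for name, t_track in index.get(h, ()):
            votes[(name, t_track - t_clip)] += 1
    if not votes:
        return None
    (name, _offset), _count = votes.most_common(1)[0]
    return name


def one_hot(bins, n_bins=8):
    """Toy spectrogram: each frame has energy 1.0 in a single bin."""
    return [[1.0 if b == peak else 0.0 for b in range(n_bins)] for peak in bins]


tracks = {
    "song_a": one_hot([1, 5, 2, 7, 3, 6, 4, 1, 2, 5]),
    "song_b": one_hot([3, 3, 6, 1, 7, 2, 5, 4, 6, 1]),
}
index = build_index(tracks)
clip = tracks["song_a"][3:9]

# Quiet broadband noise raises every bin equally, so the peaks do not move.
quiet = [[v + 0.4 for v in frame] for frame in clip]
# Noise louder than the music in bin 0 replaces every peak.
loud = [[v + (2.0 if b == 0 else 0.0) for b, v in enumerate(frame)]
        for frame in clip]

print(identify(index, quiet))  # song_a
print(identify(index, loud))   # None
```

In this toy the quiet noise changes nothing because the peak positions survive, while the loud narrowband noise wipes out every hash, which matches the "louder at certain dominant frequencies" condition.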
  • cellular 2 hours ago
    I did this for a science project in 1986 on an Apple ][c computer!
  • krishna_dam 1 hour ago
    Surprised to see how they got it to work without all the "AI" bluff
  • flyuk 1 hour ago
    Nice article - enjoyed reading!
  • blackjackfoe 1 hour ago
    No "AI" required!
  • wood_spirit 1 hour ago
    Reminds me of Roy Van Rijn’s prototype that got a cease and desist letter! Lots of community disappointment at the time!

    https://hn.algolia.com/?q=royvanrijn

  • yawpitch 1 hour ago
    This has been explained so many times… a wizard imbued the kid with the powers of Solomon, Hercules, Atlas, Zeus, Achilles, and Mercury.
  • dackdel 1 hour ago
    voodoo