The Claude Code Leak

(build.ms)

79 points | by mergesort 3 hours ago

15 comments

  • thaumaturgy 1 hour ago
    I wonder what happened to the person that wrote "Coding as Creative Expression" (https://build.ms/2022/5/21/coding-as-creative-expression/)?

    I'm not (just) being glib. That earlier article displays some introspection and thoughtful consideration of an old debate. The writing style is clearly personal, human.

    Today's post, not so much. It has LLM fingerprints on it. It's longer, there are more words. But it doesn't strike me as having the same thoughtful consideration in it. I would venture to guess that the author tried to come up with some new angles on the news of the Claude Code leak, because it's a hot topic, and jotted some notes, and then let an LLM flesh it out.

    Writing styles of course change over time, but looking at these two posts side by side, the difference is stark.

    • mergesort 1 hour ago
      Hey there, author of the post here! I actually wrote this piece myself on my phone while I was out for a walk this morning. It was initially meant to be a quick note more than a full blog post, whereas Coding as Creative Expression took me a couple of days to write.

      I made a commitment to write more this year and put my thoughts out quicker than I used to, so that’s likely the primary reason it’s not as deep of a piece of writing as the post you’re referencing. But I do want to note that this wasn’t written using AI, it just wasn’t intended to be as rich of a post.

      The reason it came out longer is that I’ve honestly been thinking about these ideas for a while, and there is so much to say about this subject. I didn’t have any particular intention of hopping on a news cycle, but once I started writing the juices were flowing and I found myself coming up with five separate but interrelated thoughts around this story that I thought were worth sharing.

      • tpoacher 0 minutes ago
        Reminds me of the classic quote usually credited to Mark Twain (it actually comes from Blaise Pascal): "Apologies, I didn't have time to write a short letter, so I wrote a long one."
    • raincole 7 minutes ago
      What changed is you, the reader. In 2026 we treat the smallest signs as evidence of LLM writing. Too long? LLM. Too short? LLM. Too grammatically correct? Must be LLM.
    • grey-area 1 hour ago
      It does read as if it were written on a phone, but it doesn’t read like LLM text to me.

      What is interesting and has possibly bled over from heavy LLM use by the author is the style of simplistic bullet point titles for the argument with filler in between. It does read like they wrote the 5 bullet points then added the other text (by hand).

    • stbev 40 minutes ago
      Have you noticed that comments like "this post seems written with AI" are now appearing on all posts, even those written without AI?

      We're starting to become wary due to the abuse of AI and proliferation of sloppy content, but also because we often have trouble distinguishing authentic from sloppy content.

      Another feature of this AI era that I hate.

  • himata4113 1 hour ago
    I personally found it really amusing how they weaponized the legal system to DMCA all the Claude Code source code repositories. Code ingested into the model is not copyrightable, but produced code apparently is, even though by legal definition computer-generated code cannot be copyrighted, and that's one of their primary arguments in legal cases.
  • kstenerud 16 minutes ago
    > It should serve as a warning to developers that the code doesn’t seem to matter, even in a product built for developers.

    Code doesn't matter IN THE EARLY DAYS.

    This is similar to what I've observed over 25 years in the industry. In a startup, the code doesn't really matter; the market fit does.

    But as time goes on your codebase has to mature, or else you end up using more and more resources on maintenance rather than innovation.

    • trhway 6 minutes ago
      Alternatively, the code can go the way of "fast fashion", or even "3D-print your garments in the morning according to your feelings and the weather, and recycle them at the end of the day".

      If dealing with a functionality that is splittable into microfeatures/microservices, then anything that you need right now can potentially be vibe-coded, even on the fly (and deleted afterwards). Single-use code.

      >But as time goes on your codebase has to mature, or else you end up using more and more resources on maintenance rather than innovation.

      That's a tremendous resource sink in enterprise software. Solving it, or even just making it avoidable (maybe Anthropic goes that way and leads the others), would be a huge revolution.

  • leduyquang753 2 hours ago
    > Many software developers have argued that working like a pack of hyenas and shipping hundreds of commits a day without reading your code is an unsustainable way to build valuable software, but this leak suggests that maybe this isn’t true — bad code can build well-regarded products.

    The product hasn't been around long enough to decide whether such an approach is "sustainable". It is currently in a hype state and needs more time for that hype to die down and the true value to show up, as well as to see whether it becomes the 9th circle of hell to keep in working order.

    • mergesort 1 hour ago
      Hey there, author of the post here. I actually agree with this! That is in fact why I used the word maybe — my comment really was meant to be more speculative than definitive.
      • 59nadir 38 minutes ago
        I think one thing that goes unmentioned is that maybe code quality is really not that important for trivial things, because they can be trivially reproduced if need be. I would argue Claude Code is exactly such a project; coding agents are incredibly simple and rewriting CC wouldn't be much of a problem.

        Non-trivial things tend to be much more sensitive to code quality in my experience, and will by necessity be kept around for longer and thus be much more sensitive to maintenance issues.

        • rakel_rakel 11 minutes ago
          > maybe code quality is really not that important for trivial things

          I hear this narrative being pushed quite a bit, and it makes my spidey senses tingle every time. Secure programs are a subset of correct programs, and to write and maintain correct programs you need to have a quality mindset.

          A 0-day doesn't care if it's in a part of your computer you consider trivial or not.

  • anematode 2 hours ago
    > But then the clean room implementations started showing up. People had taken Anthropic’s source code and rewritten Claude Code from scratch in other languages like Python and Rust.

    Seems like the phrase "clean room" is the new "nonplussed"... how does this make any sense?

    • mergesort 1 hour ago
      Heya, post author here. I think I was just wrong about this assertion. I got into a discussion with a copyright lawyer over on Bluesky[^1] after I wrote this and came away reasonably convinced that this wouldn’t be a valid example of a clean room implementation.

      [^1]: https://bsky.app/profile/mergesort.me/post/3mihhaliils2y

    • aeternum 52 minutes ago
      The most fitting method would be to train an LLM on the Claude Code source code (among other data).

      Then use Anthropic's own argument that LLM output is original work and thus not subject to copyright.

    • recursive 2 hours ago
      I think it means you write a spec from the implementation. Then you write a new implementation from the spec. You might go so far as to do the second part in a "clean" room.
      • m132 1 hour ago
        Heh, the original being entirely vibed had me thinking of an interesting problem: if you used the same model to generate a specification, then reset the state and passed that specification back to it for implementation, the resulting code would by design be very close to the original. With enough luck (or engineering), you could even get the same exact files in some cases.

        Does this still count as clean-room? Or what if the model wasn't the same exact one, but one trained the same way on the same input material, which Anthropic never owned?

        This is going to be a decade of very interesting, and probably often hypocritical lawsuits.

      • roywiggins 2 hours ago
        right. that's not what people are doing here though, at all
      • john_strinlai 2 hours ago
        in a typical clean-room design, the person writing the new implementation is not supposed to have any knowledge of the original, they should only have knowledge of the specification.

        if one person writes the spec from the implementation, and then also writes the new implementation, it is not clean-room design.

        • post_below 2 hours ago
          I believe the argument is that LLMs are stateless. So if the session writing the code isn't the same session that wrote the spec, it's effectively a clean room implementation.

          There are other details of course (is the old code in the training data?) but I'm not trying to weigh in on the argument one way or the other.

  • twelfthnight 2 hours ago
    Seems equally valid to come out of this with the takeaway that code quality _does_ matter, because poor coding practices are what led to the leak.

    Sure, the weights are where the real value lives, but if the quality is so lax they leak their whole codebase, maybe they are just lucky they didn’t leak customer data or the model weights? If that did happen, the entire business might evaporate overnight.

  • Finbarr 7 minutes ago
    Who cares that the code is garbage? As the models get bigger and more powerful it will be trivial to fully refactor the whole codebase. It’s coming sooner than you think.
  • grey-area 1 hour ago
    Points from the article.

    1. The code is garbage and this means the end of software.

    Now try maintaining it.

    2. Code doesn’t matter (the same point restated).

    No, we shouldn’t accept garbage code that breaks e.g. login as an acceptable cost of business.

    3. It’s about product market fit.

    OK, but what happens after product market fit when your code is hot garbage that nobody understands?

    4. Anthropic can’t defend the copyright of their leaked code.

    This I agree with and they are hoist by their own petard. Would anyone want the garbage though?

    5. This leak doesn’t matter.

    I agree with the author but for different reasons - the value is the models, which are incredibly expensive to train, not the badly written scaffold surrounding it.

    We also should not mistake current market value for use value.

    Unlike the author, who seems to have fully signed up for the LLM hype train, I don’t see this as meaning code is dead. It’s an illustration of where fully relying on generative AI will take you: to a garbage, unmaintainable mess that must be a nightmare to work with, for humans or LLMs.

  • komali2 17 minutes ago
    > bad code can build well-regarded products.

    Yes, exactly. Products.

    It seems like all the engineers I've known, me included, have always had this established dichotomy: engineers, who want to write good code and think a lot about user needs, and project managers/executives/salespeople, who want to make the non-negative numbers on accounting documents larger.

    The truth is that to write "good software," you do need to take care, review code, not single-shot vibe code, and not let LLMs run rampant. The other truth is that good software is not necessarily good product, and the converse is also true: bad software doesn't necessarily mean bad product. In fact, there's not really a correlation, as this article points out: terrible software can be great product! If writing terrible software lets you shit out more features more quickly, you'll probably come out ahead of someone carefully writing good software but releasing more slowly, at least in the business world. That's because the priorities and incentives in the business world are often in contradiction to the priorities and incentives in the human world.

    I think this is hard to grasp for those of us who have been taught our whole lives that money is a good scorekeeper for quality and efficacy. In reality it's absolutely not. Money is Disney bucks recording who's doing Disney World in the most optimal way. Outside of Disney World, your optimal in-park behavior is often suboptimal for out-of-park needs. The problem is we've mistaken Disney World for all of reality, or, let Walt Disney enclose our globe within the boundaries of his park.

    > The object which labor produces confronts it as something alien, as a power independent of the producer.

  • slopinthebag 36 minutes ago
    Claude Code proves you don't need quality code — you just need hundreds of billions of dollars to produce a best-in-class LLM and then use your legal team to force the extremely subsidised usage of it through your own agent harness. Or, in other words, shitty software + massive moat = users.

    Seriously, if Anthropic were like oAI and let you use their subscription plans with any agent harness, how many users would CC instantly start bleeding? They're #39 on Terminal-Bench, and they get beaten by a harness that provides a single tool: tmux. You can literally get better results by giving Opus 4.6 only a tmux session and having it do everything with bash commands.

    It seems premature to make sweeping claims about code quality, especially since the main reason to desire a well architected codebase is for development over the long haul.

  • pregseahorses 1 hour ago
    They just said this was an April Fools joke.
    • gfosco 1 hour ago
      No, you fell for someone else's joke.
    • dodu_ 26 minutes ago
      jokesonthemiwasonlypretending.png