Formatting a 25M-line codebase overnight

(stripe.dev)

48 points | by r00k 1 hour ago

8 comments

  • CrzyLngPwd 22 minutes ago
    One of my first jobs was at a small software company that wrote software for a small number of clients in MS BASIC PDS.

    The lead developer didn't like to bother with formatting code, so I wrote a tool called makenice to format his nasty spaghetti gibberish into something with good indents and layout to make it easier for us normal people to parse.

    He was furious and literally spun in circles about it, right there in the office in front of everyone, so I wrote makenasty to format code the way he appeared to like it.

    I only shared makenasty/makenice with a couple of people on the team, who loved them, since they made it easy to convert between something readable and something the team lead liked.

    He never knew about makenasty.

  • hobofan 8 minutes ago
    I'm surprised they went with an all-at-once reformat. Even if it's done over a weekend, that's bound to mess with a lot of open PRs at their scale.

    I've had to introduce a formatter into a few sizeable codebases in the past (a few hundred thousand to a few million LOC), and I always did it incrementally, via a script that reformatted every file not touched by any open PR. The initial run reformatted 95% of all files. I then ran the script every day for about two weeks, which got coverage up to 99.5% of all files, and after that I reformatted the rest manually each time one of the remaining dozen or so long-running WIP PRs was merged.
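    The incremental rollout described above boils down to a daily job that formats only files no open PR touches. Below is a minimal Python sketch of that selection logic, assuming you have already fetched the list of repository files and a mapping from each open PR to the paths it changes (how you obtain those, e.g. from your code host's API, is out of scope). The names `files_safe_to_format` and `daily_run`, the PR IDs and the file names are all hypothetical, and `formatter` stands in for whatever actually rewrites a file.

```python
from __future__ import annotations

from typing import Callable, Iterable, Mapping


def files_safe_to_format(
    all_files: Iterable[str],
    open_pr_files: Mapping[str, Iterable[str]],
    already_formatted: set[str],
) -> list[str]:
    """Return files that are not yet formatted and not touched by any open PR."""
    # Union of every path touched by any open PR. Reformatting these now
    # would cause merge conflicts for their authors, so they wait.
    touched: set[str] = set()
    for paths in open_pr_files.values():
        touched.update(paths)
    return sorted(
        f for f in all_files if f not in touched and f not in already_formatted
    )


def daily_run(
    all_files: Iterable[str],
    open_pr_files: Mapping[str, Iterable[str]],
    already_formatted: set[str],
    formatter: Callable[[str], None],
) -> list[str]:
    """One pass of the incremental rollout; returns the files formatted in this pass."""
    batch = files_safe_to_format(all_files, open_pr_files, already_formatted)
    for path in batch:
        formatter(path)  # hypothetical hook: run the real formatter on this file
        already_formatted.add(path)
    return batch


# Usage with made-up data: b.rb and c.rb are held back by open PRs on day 1.
files = ["a.rb", "b.rb", "c.rb", "d.rb"]
prs = {"PR-1": ["b.rb"], "PR-2": ["c.rb", "b.rb"]}
done: set[str] = set()
day1 = daily_run(files, prs, done, formatter=lambda p: None)
print(day1)  # ['a.rb', 'd.rb']

# PR-2 merges, so on the next run c.rb becomes eligible; b.rb still waits on PR-1.
del prs["PR-2"]
day2 = daily_run(files, prs, done, formatter=lambda p: None)
print(day2)  # ['c.rb']
```

    Tracking `already_formatted` keeps each run cheap and makes it safe to rerun daily; the long tail of files stuck behind long-lived PRs is what ends up being handled by hand.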

  • varun_ch 1 hour ago
    I’m shocked at the 25M line part! That is a completely unfathomable amount of code for one codebase. I really want to know more about that.
    • bruckie 18 minutes ago
      Only 25 million? :) Google had billions a decade ago...

      https://research.google/pubs/why-google-stores-billions-of-l...

    • jsnell 58 minutes ago
      Right, where is the rest of the code?
    • mr_mitm 52 minutes ago
      They're up to 42 million now, as per the article
      • lukan 33 minutes ago
        That sounds even more insane to me, but I guess most of that code doesn't actually touch financial transactions; otherwise, being responsible for verifying it would be a nightmare.
  • burnte 28 minutes ago
    The floating spiral thing is so distracting I spent more time deleting it in Inspector than reading the article. I feel like they hate their readers. Awful.
  • CrzyLngPwd 18 minutes ago
    Surely it no longer needs to be human-readable, and the era of write-only code is finally upon us with the dawn of AI writing our meal tickets.

    Why bother formatting 25m lines of slop, and why is AI wasting tokens on making code look human-readable anyway?

  • hokkos 32 minutes ago
    Now it makes me wonder: are those 45M LoC untyped?
  • andrewstuart 1 hour ago
    A major financial processing company writes its money-handling systems in Ruby.

    Terrifying.

    • mbStavola 1 hour ago
      Considering that it's been doing so successfully at volume for just over 15 years, I think their language choice was fine.
    • sixo 35 minutes ago
      This ought to change your mind about Ruby!
    • skinfaxi 1 hour ago
      Why is that terrifying?
      • fantasizr 6 minutes ago
        I've yet to see a compelling elitist programming-language opinion, especially about languages used at big, successful companies. These companies don't function in spite of their technology choices.
      • mikedelago 19 minutes ago
        Some folks don't like shipping
      • Jtsummers 58 minutes ago
        It's not particularly terrifying. Some people really just don't like Ruby.
    • sikozu 1 hour ago
      The systems have to be written in some kind of programming language, and I think Ruby is a perfectly fine choice.
      • Imustaskforhelp 42 minutes ago
        Not denying that Ruby is a perfectly fine choice, but the article itself says that Stripe runs the world's largest Ruby codebase, so it may well be testing the limits of the language.

        What I'm curious about: Stripe presumably didn't always have this many LOC, so I'd like to know whether, at any point as the codebase grew, they looked at newer languages such as Go or Rust that might have been better suited to their work, and what their thinking was in deciding to stick with Ruby.

    • sunrunner 32 minutes ago
      Things can always be worse. It could be PHP, for example.
      • burnte 15 minutes ago
        Facebook runs on it, so I think the language itself is probably a fine choice.
    • semiquaver 43 minutes ago
      I’d hardly call Sorbet Ruby :)