I also bet my money on Azure. Someone who allegedly worked there recently posted an article here on the numerous problems with Azure. Sadly I didn’t bookmark it.
Or even both. In any kind of continuous deployment, you'd expect outages at the point of deployment, or shortly thereafter as the unintended consequences ripple.
Then the load during working days amplifies those ripples into outages.
Most outages are caused by changes made by humans ("actors"?). Very rarely is the cause "People just dig our stuff so much we can't keep up"; more often it's "We didn't think about this performance drawback when we built thing X, and now it's hurting us". And of course there are more outages when you try to fix those issues without fully considering the scope and impact.
I don't really understand why this is happening at this scale. It's not like they suddenly went broke and can't afford a proper server... Can someone explain?
Agents are shipping code faster all over the world, and in some cases 24 hours a day. Additionally, a significant number of non-developers are now developers, i.e. they are also shipping to GitHub regularly.
This is not limited to pushing code: all the bells and whistles that GitHub added as features, under the assumption of some predictable growth, are now exceeding the original plans.
I suspect a lot of their existing systems have to be re-architected for unanticipated scale, and it won't happen overnight for sure.
Pretty damning. It would also be interesting to see the number of commits overlaid. The graph tells a great story about the correlation with MS's takeover, but I wonder if, at the same time that uptime went to shit, MS was shifting large numbers of enterprise contracts over to GitHub. That would be a more complete story IMO.
None of which excuses this. Can you imagine someone's reaction in 2017 if you told them that github would be below 90% uptime in 2026? It would be unimaginable.
The faster you move, the more you screw up, and almost no company producing software has figured out how to move fast and not screw up. It's so hard that companies even used to boast about how little they cared about screwing up, as long as they moved fast.
Add in new "productivity" tools that help you move even faster, with even less regard for how much you screw up (even though the tools could be used to move at the same speed with fewer screw-ups), and an engineering culture that boils down to "Why not?", and you get platforms run by Microsoft that are unable to achieve two nines of reliability.
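For anyone who wants the "nines" turned into hours, here is a rough sketch of the arithmetic. The function name and the 365-day year are my own assumptions, and this is just the textbook availability formula, not how any status page computes its figures.

```python
def allowed_downtime_hours(availability_pct: float, period_days: float = 365.0) -> float:
    """Hours of downtime permitted over the period at the given availability.

    Hypothetical helper: downtime = (1 - availability) * length of the period.
    """
    return (1.0 - availability_pct / 100.0) * period_days * 24.0


if __name__ == "__main__":
    # 90% is "one nine", 99% is "two nines", 99.9% is "three nines".
    for pct in (90.0, 99.0, 99.9):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% availability allows about {hours:.1f} hours of downtime per year")
```

So two nines still leaves room for roughly 87.6 hours of downtime a year, and 90% allows ten times that.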
This website has no overused AI-generated animations and... I quite enjoy it. The original website[1] has a fade-in animation, big round cards, shadows; all the jazz you can think of, it's there.
This site is very readable, very honest and sober. I don't need to sift through buzzwords to figure out tiny details.
> Across 170 days with at least one incident · worst day Thu, Nov 20, 2025 (1.1 days)
1.1 days total: how is that possible? Scrolling over that day doesn't show the math behind the scenes, just a single bullet point of 1.3 hours.
Also, Nov 19 has a bullet point for a 1.3-day outage, but the total is 8.1 hours.
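I don't know how the site actually computes its totals, but one guess is that overlapping incidents get summed separately, which can push a single day's total past 24 hours. A minimal sketch of the difference between naively summing durations and merging overlapping intervals first; the incident times below are made up for illustration, not taken from the site:

```python
from datetime import datetime, timedelta


def naive_total(incidents):
    """Sum every incident's duration, counting overlapping time more than once."""
    return sum((end - start for start, end in incidents), timedelta())


def merged_total(incidents):
    """Merge overlapping intervals first, so each minute of downtime counts once."""
    total = timedelta()
    cur_start = cur_end = None
    for start, end in sorted(incidents):
        if cur_end is None or start > cur_end:
            # Gap before this incident: close out the previous merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current merged interval: extend it if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical incidents on a single made-up day.
day = datetime(2000, 1, 1)
incidents = [
    (day + timedelta(hours=0), day + timedelta(hours=12)),
    (day + timedelta(hours=6), day + timedelta(hours=18)),
    (day + timedelta(hours=10), day + timedelta(hours=14)),
]

if __name__ == "__main__":
    print("naive sum:", naive_total(incidents))
    print("merged:   ", merged_total(incidents))
```

With these made-up numbers the naive sum is 28 hours while the merged wall-clock downtime is 18 hours. Incidents that span midnight and get attributed to a single day could produce a similar mismatch, but that is equally a guess.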
YOU NEED TO USE MOAR AI!
https://isolveproblems.substack.com/p/how-microsoft-vaporize...
HN thread: https://news.ycombinator.com/item?id=47616242
Hosting forgejo is really easy as well. It being a single binary makes it really easy to handle with almost zero maintenance.
https://damrnelson.github.io/github-historical-uptime/
They’re making political decisions based on what they sell vs what’s actually useful for their use case.
It’s kind of impossible to find out if this is true though.
Similarly, I see Google releasing advancement after advancement in LLMs, yet I see the antigravity sub, where people are crying all the time.
Or just a combination of both.
It does look like Friday outages were a bit rarer, which could be due to having a "no deployments on Friday" rule.
Thank you, OP!
1: https://mrshu.github.io/github-statuses/
[0] https://news.ycombinator.com/item?id=22867803