GitHub experiences various partial outages/degradations

184 points by bhouston 8 hours ago on Hacker News | 55 comments

jmclnx | 8 hours ago

With LinkedIn down, I wonder if this is an Azure thing? IIRC GitHub is being moved to Azure; maybe the Azure piece was partially enabled?

CubsFan1060 | 8 hours ago

It is: https://azure.status.microsoft/en-us/status

"Impact statement: As early as 19:46 UTC on 2 February 2026, we are aware of an ongoing issue causing customers to receive error notifications when performing service management operations - such as create, delete, update, scaling, start, stop - for Virtual Machines (VMs) across multiple regions. These issues are also causing impact to services with dependencies on these service management operations - including Azure Arc Enabled Servers, Azure Batch, Azure DevOps, Azure Load Testing, and GitHub. For details on the latter, please see https://www.githubstatus.com."

llama052 | 8 hours ago

Looks like Azure as a platform just killed the ability to perform VM scale operations, due to a change to the ACL on a storage account that hosts VM extension packages. Wow... We noticed when GitHub Actions went down, then our self-hosted runners followed because we can't scale anymore.

Information

Active - Virtual Machines and dependent services - Service management issues in multiple regions

Impact statement: As early as 19:46 UTC on 2 February 2026, we are aware of an ongoing issue causing customers to receive error notifications when performing service management operations - such as create, delete, update, scaling, start, stop - for Virtual Machines (VMs) across multiple regions. These issues are also causing impact to services with dependencies on these service management operations - including Azure Arc Enabled Servers, Azure Batch, Azure DevOps, Azure Load Testing, and GitHub. For details on the latter, please see https://www.githubstatus.com.

Current status: We have determined that these issues were caused by a recent configuration change that affected public access to certain Microsoft‑managed storage accounts, used to host extension packages. We are actively working on mitigation, including updating configuration to restore relevant access permissions. We have applied this update in one region so far, and are assessing the extent to which this mitigates customer issues. Our next update will be provided by 22:30 UTC, approximately 60 minutes from now.

https://azure.status.microsoft/en-us/status
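
For anyone running self-hosted runners behind an autoscaler, incidents like this are a reminder that scale-out calls against the cloud's service-management APIs can fail for long stretches while existing VMs keep running. A minimal defensive sketch in Python, with a hypothetical scale_out() standing in for whatever SDK or CLI call you actually make (names and numbers here are illustrative, not GitHub's or Azure's tooling):

    import random
    import time

    class ControlPlaneUnavailable(Exception):
        """Raised when the cloud's management API rejects the call."""

    def scale_out(desired_capacity: int) -> None:
        # Hypothetical stand-in for a real VMSS/ASG scale call.
        raise ControlPlaneUnavailable("service management operations are failing")

    def request_scale(desired_capacity: int, attempts: int = 5) -> bool:
        """Try to scale out; on repeated failure, hold current capacity and alert."""
        for attempt in range(attempts):
            try:
                scale_out(desired_capacity)
                return True
            except ControlPlaneUnavailable:
                # Exponential backoff with jitter so every runner pool doesn't
                # hammer the management API the moment it recovers.
                time.sleep(min(60, 2 ** attempt) + random.random())
        # Fail safe: VMs already running are untouched, so the queue grows
        # but nothing in flight is torn down. Page a human instead.
        print("scale-out blocked; holding current capacity and alerting on-call")
        return False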

bob1029 | 7 hours ago

They've always been terrible at VM ops. I never get weird quota limits and errors in other places. It's almost as if Amazon wants me to be a customer and Microsoft does not.

arcdigital | 6 hours ago

Agreed...I've been waiting for months now to increase my quota for a specific Azure VM type by 20 cores. I get an email every two weeks saying my request is still backlogged because they don't have the physical hardware available. I haven't seen an issue like this with AWS before...

llama052 | 6 hours ago

We've run into that issue as well and ended up having to move regions entirely because nothing was changing in the current region. I believe it was westus1 at the time. It's a ton of fun to migrate everything over!

That was years ago; wild to see they still have the same issues.

llama052 | 6 hours ago

It's awful. Any other service in Azure that relies on the core systems seems to inherit their problems; I feel for those internal teams.

Ran into an issue upgrading an AKS cluster last week. It completely stalled and broke the entire cluster in a way that left our hands tied, since we can't see the control plane at all...

I submitted a severity A ticket, and 5 hours later I was told there was a known issue with the latest VM image that would break the control plane, leaving any cluster updated in that window to essentially kill itself and require manual intervention. Did they notify anyone? Nope. Did they stop anyone from killing their own clusters? Nope.

It seems like every time I'm forced to touch the Azure environment I'm basically playing Russian roulette hoping that something's not broken on the backend.

dgxyz | 6 hours ago

Amazon isn't much better there. Wait until you hit an EC2 quota limit and can't get anyone to look at it quickly (even under paid enterprise support), or they just say no.

Also had a few instance types that won't spin up in some regions/AZs recently. I assume these are capacity issues.

paulddraper | 4 hours ago

The cloud isn’t some infinite thing.

There’s a bunch of hardware, and they can’t run more servers than they have hardware. I don’t see a way around that.

ApolloFortyNine | an hour ago

I was surprised to hit one of these limits once, but it wasn't as if they were 100% out of servers; I just had to pick a different node type. I don't think they would ever post their numbers, but some of the more exotic types definitely have fewer in the pool.

everfrustrated | 4 hours ago

How is Azure still having faults that affect multiple regions? Clearly their region definition is bollocks.

ragall | an hour ago

All 3 hyperscalers have vulnerabilities in their control planes: they're either a single point of failure (like AWS with us-east-1) or global, meaning a faulty release can take the whole thing down. And they treat AZ resilience as meaning that existing compute will continue to work as before, while allocation of new resources might fail in multi-AZ or multi-region ways.

It means that any service designed to survive a control plane outage must statically allocate its compute resources and have enough slack that it never relies on auto scaling. True for AWS/GCP/Azure.
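
To make the "statically allocate with slack" point concrete, the sizing exercise is just arithmetic: hold enough permanently allocated capacity to cover peak load plus headroom, so losing the ability to allocate new resources for the length of an outage doesn't matter. A quick sketch in Python (the load figures and headroom factor are made-up illustrations, not anyone's real numbers):

    import math

    def static_fleet_size(peak_rps: float, rps_per_node: float, headroom: float = 0.3) -> int:
        """Nodes to keep permanently allocated so a control-plane outage
        (no new allocations possible) can be ridden out at peak load."""
        needed = peak_rps / rps_per_node
        return math.ceil(needed * (1 + headroom))

    # Illustrative numbers only.
    peak_rps = 12_000    # peak requests per second the service sees
    rps_per_node = 400   # throughput a single node sustains
    print(static_fleet_size(peak_rps, rps_per_node))  # 39 nodes held at all times, vs. 30 sized exactly to peak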

tbrownaw | an hour ago

> It means that any service designed to survive a control plane outage must statically allocate its compute resources and have enough slack that it never relies on auto scaling. True for AWS/GCP/Azure.

That sounds oddly similar to owning hardware.

ragall | an hour ago

In a way. It means that you can usually get new capacity, but the transition windows where a service gets resized (or mutated in general) have to be minimised and carefully controlled by ops.

everfrustrated | an hour ago

This outage notice describes what appears to be a VM control plane failure (it mentions stop not working) across multiple regions.

AWS has never had this type of outage in 20 years. Yet Azure constantly has them.

This is a total failure of engineering and has nothing to do with capacity. Azure is a joke of a cloud.

mirashii | an hour ago

AWS had an outage that blocked all EC2 operations just a few months ago: https://aws.amazon.com/message/101925/

ragall | an hour ago

I do agree that Azure seems to be a lot worse: its control plane(s) seem to be much more centralized than the other two's.

flykespice | 6 hours ago

Their AI probably hallucinated the configuration change
Copilot being down probably increased code quality

fbnszb | 8 hours ago

As an isolated event, this is not great, but when you see the stagnation (if not downward trajectory) of GitHub as a whole, it's even worse in my opinion.

edit: Before someone says something: I do understand that the underlying issue is with Azure.

llama052 | 7 hours ago

Sadly, GitHub moving further into Azure will expose the fragility of the cloud platform as a whole. We've been working around these rough edges for years. Maybe it will make someone wake up, but I don't think they have any motivation to.

cluckindan | 7 hours ago

> Azure

Which is again even worse.

estimator7292 | 5 hours ago

It really doesn't even matter why it failed. Shifting the blame to Azure doesn't change the fact that GitHub is becoming more and more unreliable.

I don't get how Microsoft views this level of service as acceptable.

Ronsenshi | 3 hours ago

Doesn't seem like Microsoft managers care - it's not their core business, so any time anyone complains about issues with GitHub they probably think something along the lines of "peasants whining again".

Must be nice to be a monopoly that holds most of the businesses in the world hostage.

Aeolun | an hour ago

At one point GitLab seemed like it wanted to compete, but then they killed all the personal and SMB plans, and now they're just out of the picture for a lot of people. Their team plan is more expensive than GH's enterprise plan.

falloutx | 7 hours ago

50% of code is written by AI; now let the AI handle this outage.

anematode | 7 hours ago

Catch-22, the AI runs on Azure...

maddmann | 7 hours ago

AI deploys itself to AWS, saving GitHub but destroying Microsoft’s cloud business — full circle

maddmann | 7 hours ago

This is why I come to Hacker News: a sanity check on why my jobs are failing.

[OP] bhouston | 7 hours ago

Exactly the same reason I posted. My GitHub Actions jobs were not being picked up.

nialv7 | 6 hours ago

better luck with your next job :)

suriya-ganesh | 7 hours ago

It is always a config problem, somewhere, someplace, in the mess of permissioning issues.
Tay.ai and Zoe AI Agents are probably running infra operations at GitHub and still arguing about how to deploy to production without hallucinating a config file and deploying a broken fix to address the issue.

Since there is no GitHub CEO (Satya is not bothered anymore) and the human employees aren't looking, Tay and Zoe are at the helm, ruining GitHub with their broken AI-generated fixes.

anematode | 5 hours ago

Hey, let them cook.

deepsun | 23 minutes ago

Hey, does the stock go up or down?

guywithabike | 7 hours ago

It's notable that they blame "our upstream provider" when it's quite literally the same company. I can't imagine GitHub engineers are very happy about the forced migration to Azure.

madeofpalk | 6 hours ago

I would imagine the majority of GitHub engineers there currently joined after the MS acquisition.

macintux | 3 hours ago

That doesn't necessarily mean they're happy about Azure as a backend.

debo_ | 2 hours ago

I've been a software "engineer" for over 20 years, and my personal experience is that software engineers are basically never happy.

macintux | 2 hours ago

True enough. The world is never as predictable as the computers we program, and the computers we program are never as predictable as we feel they should be.

VirusNewbie | 2 hours ago

Plenty of happy engineers at the other cloud. :)
I’ve used AWS for almost 20 years and I can tell you it’s more stable than Azure

tbrownaw | an hour ago

> personal experience is that software engineers are basically never happy.

Being happy means:

- you don't feel the need to automate more manual tasks (you lack laziness)

- you don't feel the need to make your system faster (you lack impatience)

- you don't feel the need to make your system better (you lack hubris)

So basically, happiness is a Sin.

gscho | 3 hours ago

Having worked there around 2020-2021, I can say there were many folks not happy about being forced to use Azure and to build GitHub Actions on top of Azure DevOps. Lots of AWS usage still existed at that time, but these days you bet it's mostly gone.

b00ty4breakfast | 3 hours ago

something about antifreeze in the dogfood

tbrownaw | an hour ago

> notable that they blame "our upstream provider" when it's quite literally the same company

As in why don't they mention Azure by name?

Or as in there shouldn't be isolated silos?

re-thc | 7 hours ago

Jobs get stuck. Minutes are being consumed. The problem isn't just the service being unavailable.

levkk | 7 hours ago

This happens routinely every other Monday or so.

locao | 6 hours ago

I was going to joke "so, it's Monday, right?" but I thought my memory was playing tricks on me.

fishgoesblub | 6 hours ago

Getting the monthly GitHub outage out of the way early, good work.

spooneybarger | 6 hours ago

well played sir. well played.

herpdyderp | 2 hours ago

Unfortunately that won’t clear up the weekly GitHub outages

focusgroup0 | 6 hours ago

Will paid users be credited for the wasted Actions minutes?

bandrami | 4 hours ago

In the Bad Old Days before GitHub (before SourceForge even), building and packaging sucked because of the hundred source tarballs you had to fetch; on any given day, 3 would be down (this is why Debian does the "_orig" tarballs the way they do). Now it sucks because on any given day either all of them are available or none of them are.