We've been operating Ceph in production for 12 years. Here's why we built NodeFoundry.
Our founding team ran Ceph clusters for major cloud providers and infrastructure-heavy businesses before building NodeFoundry. This is the tooling we always wished existed.
Before NodeFoundry, our team spent 12 years operating Ceph clusters for cloud providers, financial services firms, and data-intensive startups. We’ve run multi-petabyte clusters, handled 3 AM OSD failures, and written more Ansible playbooks for Ceph than we care to count.
The operational gap
Ceph is genuinely the right answer for software-defined storage. It’s battle-tested, it scales, and the community is excellent. But operating it requires expertise that’s hard to hire and institutional knowledge that lives in runbooks, not tooling.
Every team we talked to had some version of the same story: a senior engineer who “knows Ceph” carries the on-call burden, and when that person leaves, the knowledge walks out the door with them.
What we decided to build
NodeFoundry encodes what good Ceph operations look like into software. It is not a managed service: you run it on your hardware, in your network, and your data stays where it belongs. The operational intelligence is built in: the drain sequences, the upgrade orchestration, the SMART monitoring, and the failure domain awareness.
We’re building the tool we always wished existed.
Want to see it for yourself?
We're happy to walk you through it.
No pitch deck. Just a real conversation about your infrastructure, your cluster size, and whether NodeFoundry is the right fit. If it's not, we'll tell you.