Changelog

New releases, fixes, and improvements to NodeFoundry.

v0.14.0
Added
  • Rolling upgrade support for major Ceph version transitions (Reef → Squid). NodeFoundry drains each node's OSDs, upgrades the host OS and Ceph packages, and recommissions the node before moving on to the next.
  • IPMI power-cycle and hard reboot actions are now available directly from the node list view in the dashboard.
  • New nf cluster health --verbose flag shows per-OSD and per-placement-group status inline.
  • API endpoint POST /v1/clusters/{id}/upgrade with progress streaming via GET /v1/operations/{id}.
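The two endpoints above can be driven with a small client sketch. Only the endpoint paths are documented here; the base URL, Bearer auth scheme, and the operation_id response field are assumptions (the field name matches what the cluster-create API returns as of v0.12.0):

```python
import json
import urllib.request
from urllib.parse import quote

BASE = "https://api.nodefoundry.example"  # hypothetical API host

def upgrade_url(cluster_id: str) -> str:
    # Documented endpoint: POST /v1/clusters/{id}/upgrade
    return f"{BASE}/v1/clusters/{quote(cluster_id)}/upgrade"

def operation_url(operation_id: str) -> str:
    # Documented endpoint: GET /v1/operations/{id}
    return f"{BASE}/v1/operations/{quote(operation_id)}"

def start_upgrade(cluster_id: str, token: str) -> str:
    """POST the upgrade endpoint and return the operation id to poll.

    The Bearer header and the operation_id field are assumptions, not
    documented parts of the upgrade API.
    """
    req = urllib.request.Request(
        upgrade_url(cluster_id),
        data=b"{}",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["operation_id"]
```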
Changed
  • Dashboard cluster overview page redesigned with a 24-hour utilization timeline and IOPS sparklines per node.
  • API rate limits for Professional tier raised from 300 to 600 requests per minute.
  • The nf cluster create --wait flag now streams live progress instead of polling.
Fixed
  • S.M.A.R.T. polling failure on certain NVMe devices that report no temperature sensor.
  • nf node list now correctly shows nodes that are in maintenance mode rather than showing them as offline.
  • Race condition in the OSD activation sequence on nodes with more than 24 drives.
v0.13.2
Fixed
  • Race condition in OSD drain that could stall upgrade operations on clusters with more than 64 OSDs.
  • Alert webhook retries did not respect exponential backoff and could flood the endpoint on network errors.
  • Node registration timing issue on certain Supermicro BIOS versions that send DHCP requests before the NIC link is stable.
  • Off-by-one error in the PG count calculation when creating erasure-coded pools with k=4, m=2.
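For context on the PG count fix above, here is a sketch of the usual Ceph rule of thumb for sizing an erasure-coded pool: target roughly 100 PGs per OSD spread across k+m shards, then round up to a power of two. NodeFoundry's exact formula is not documented here, so this is the common guideline, not its implementation:

```python
def ec_pg_count(num_osds: int, k: int, m: int, target_per_osd: int = 100) -> int:
    """Rule-of-thumb PG count for an erasure-coded pool.

    Spreads ~target_per_osd PGs per OSD across k+m shards, then rounds
    up to the next power of two (Ceph's usual sizing guideline).
    """
    raw = (num_osds * target_per_osd) / (k + m)
    pgs = 1
    while pgs < raw:
        pgs *= 2
    return pgs

# 12 OSDs with k=4, m=2: 12*100/6 = 200, rounded up to 256
print(ec_pg_count(12, 4, 2))  # 256
```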
v0.13.0
Added
  • S.M.A.R.T. monitoring for HDDs and NVMe devices, with configurable alert thresholds per drive model. Predictive failure events generate dashboard warnings and optional webhook alerts.
  • Centralized log aggregation: OSD, monitor, and service logs from all nodes are indexed in Loki and searchable from the dashboard and via nf log search.
  • Webhook alerting: route cluster health events to Slack, PagerDuty, or any HTTP endpoint. Configure per-severity thresholds and retry policies.
  • New nf log search CLI command with --since, --until, --node, and --level flags.
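The --since and --until flags presumably accept relative durations such as "90m" or "2h"; the exact flag grammar is not documented, so the accepted units below are an assumption. A minimal parser for that style of value:

```python
import re

# Units this sketch accepts; the real nf flag grammar may differ.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(text: str) -> int:
    """Parse a relative duration like '90m' or '2h' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if not match:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]

print(parse_duration("2h"))   # 7200
print(parse_duration("90m"))  # 5400
```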
Changed
  • Minimum supported Ceph version raised to Pacific (16.2.x). Luminous and Nautilus clusters are no longer supported for new deployments.
  • iPXE bootstrap image now uses a read-only tmpfs overlay to improve boot reliability on nodes with marginal disks.
  • DHCP lease timeout reduced from 24 hours to 4 hours to improve re-registration after IP changes.
v0.12.0
Added
  • NVMe-oF gateway service provisioning. Deploy and manage NVMe-oF targets on top of RBD pools from the dashboard and CLI.
  • Multi-cluster selector in the dashboard: switch between clusters without navigating away.
  • nf pool create --erasure with support for named erasure code profiles and automatic PG count calculation.
  • Node maintenance mode: nf node maintenance <name> drains OSDs and suppresses alerts while you perform hardware work.
Changed
  • Cluster create API now returns an operation_id and supports streaming progress via SSE at /v1/operations/{id}/stream.
  • Dashboard metrics retention increased from 7 days to 30 days on Professional tier.
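The SSE stream at /v1/operations/{id}/stream can be consumed by accumulating "data:" lines until a blank line ends each event, per the standard event-stream format. The JSON payload shape in the example is an assumption:

```python
import json
from typing import Iterator

def sse_events(lines: Iterator[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event.

    'data:' lines are accumulated until a blank line terminates the
    event, per the text/event-stream format.
    """
    buf = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            buf.append(line[5:].lstrip())
        elif line == "" and buf:
            yield "\n".join(buf)
            buf = []
    if buf:
        yield "\n".join(buf)

# Example stream; the progress field is a hypothetical payload shape.
raw = 'data: {"progress": 40}\n\ndata: {"progress": 100}\n\n'
for payload in sse_events(iter(raw.splitlines(keepends=True))):
    print(json.loads(payload)["progress"])  # 40, then 100
```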
Fixed
  • CephFS MDS placement on single-rack topologies placed both the active and standby daemons on the same host.
  • CRUSH map generation failed on clusters where node hostnames contained uppercase letters.
v0.11.0
Added
  • General availability. NodeFoundry is out of beta.
  • Object Gateway (RGW) service provisioning with S3-compatible endpoint and configurable zonegroups.
  • API key management: create scoped read-only and read-write keys per account.
  • Ceph Squid (19.x) support.