Summary:
A regulated manufacturer ran 24/7 production across multiple sites on a 20-year-old Windows stack that introduced unacceptable operational risk. Manual deployments created inconsistencies and slowed outage diagnosis. Shadow-Soft assessed the platform and delivered a defensible, phased roadmap to Kubernetes to ensure business continuity.
With demand ramping up, the manufacturer had to expand capacity and support the workload over the next three to five years while running 24/7 production.
The operation ran with near-zero downtime tolerance, since even a few hours of disruption could scrap high-value batches and wipe out utilization.
A legacy Windows extension layer ran hundreds of scripts and database procedures that teams deployed manually at each site. That model created a brittle, inconsistent environment across facilities. With weak observability, any outage threatened to become a prolonged, high-impact event, posing a significant risk to utilization and revenue.
The organization needed a Kubernetes platform and operating model that met regulatory requirements and long-term cost constraints.
Meanwhile, internal teams pulled in different directions on standardization and pricing. They needed a forcing mechanism to converge on a platform decision before committing to a build.
We helped the client choose a Kubernetes platform (Red Hat OpenShift) and define the operating model for its use in regulated manufacturing.
The work covered platform selection, along with the supporting decisions that make the platform hold up in production: storage and backup, observability, security controls, and CI/CD.
We packaged that into a phased roadmap that moved from proof of concept to initial production and then to rollout across multiple facilities.
The roadmap also defined ownership and enablement: a central platform team runs the cluster baseline, application teams own services and delivery, and local site IT holds break-glass responsibilities, backed by training to standardize the operating rhythm.
We built the recommendation around day-two operations. The roadmap set standards for long-term management, vendor support expectations, drift control, and a GitOps model that keeps deployments consistent across sites.
We matched the plan to the workload profile and controlled complexity. Most target workloads skew stateless, so the roadmap starts with simpler storage and backup for proof of concept and early production, then defines explicit triggers for when requirements justify heavier storage platforms.
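A minimal sketch of how explicit storage triggers like these could be written down as checkable rules. The field names, the thresholds, and the three trigger conditions are illustrative assumptions, not the criteria the roadmap actually defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadProfile:
    """Hypothetical summary of what a cluster hosts at a given phase."""
    stateful_share: float      # fraction of workloads with persistent volumes, 0.0 to 1.0
    needs_cross_site_dr: bool  # must data be recoverable at another facility?
    storage_backends: int      # number of distinct storage backends in use


def storage_triggers(profile: WorkloadProfile,
                     max_stateful_share: float = 0.25,  # assumed threshold
                     max_backends: int = 1) -> list[str]:  # assumed threshold
    """Return the triggers that fired; an empty list means simple storage still fits."""
    fired = []
    if profile.stateful_share > max_stateful_share:
        fired.append("stateful share above threshold")
    if profile.needs_cross_site_dr:
        fired.append("cross-site DR required")
    if profile.storage_backends > max_backends:
        fired.append("multiple storage backends")
    return fired


# Illustrative profiles only; the numbers are placeholders.
poc = WorkloadProfile(stateful_share=0.10, needs_cross_site_dr=False, storage_backends=1)
multi_site = WorkloadProfile(stateful_share=0.40, needs_cross_site_dr=True, storage_backends=3)

print(storage_triggers(poc))         # no triggers fire for the mostly stateless profile
print(storage_triggers(multi_site))  # all three triggers fire
```

Writing the triggers as data-driven checks keeps the decision to adopt a heavier storage platform tied to observed requirements rather than to preference.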
Finally, we sequenced observability the same way: faster troubleshooting first, then a fuller multi-cluster stack once the team hardens operations.
We used our eight-step assessment framework to align cross-functional teams, run a defensible platform evaluation, and turn the decision into a phased plan that de-risks the move from proof of concept to 24/7 production.
Early interviews missed a few key stakeholders, so the team held the unresolved questions open rather than forcing early answers. A follow-up session brought the right owners into the room.
The group worked through the platform trade-offs using shared decision criteria, even when a supporting external architect pushed alternative recommendations.
Cost modeling slowed platform commitment because the team didn’t know the initial and future environment sizing early enough to pressure-test licensing and long-term run costs.
The team treated sizing uncertainty as a gating risk, expressed its assumptions as ranges, and kept the plan phased so the client could move forward without committing to economics that wouldn’t scale.
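One way to carry sizing assumptions as ranges through a cost model is sketched here. Every figure is a placeholder chosen for illustration, and the `Range` type, the phase names, and the per-node and fixed cost inputs are assumptions rather than numbers from the engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float


def annual_cost(nodes: Range, cost_per_node: Range, fixed: Range) -> Range:
    """Bound annual run cost for non-negative inputs.

    The low bound pairs all low estimates; the high bound pairs all high ones.
    """
    return Range(
        low=nodes.low * cost_per_node.low + fixed.low,
        high=nodes.high * cost_per_node.high + fixed.high,
    )


# Placeholder node counts per phase (hypothetical).
phases = {
    "proof of concept": Range(3, 6),
    "first production site": Range(6, 12),
    "multi-site rollout": Range(30, 60),
}
per_node = Range(1_000, 2_000)  # hypothetical licensing and support per node per year
fixed = Range(10_000, 20_000)   # hypothetical tooling and platform overhead per year

for phase, nodes in phases.items():
    cost = annual_cost(nodes, per_node, fixed)
    print(f"{phase}: {cost.low:,.0f} to {cost.high:,.0f} per year")
```

Comparing the high bound of the rollout phase against budget shows early whether a licensing model that looks affordable at proof-of-concept scale still holds at multi-site scale.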
Red Hat OpenShift: To reduce the day-two operations burden and meet regulated manufacturing support expectations across sites.
Portworx: To standardize persistent storage and support backup and DR across mixed storage backends and multiple facilities.
GitOps controller (Argo CD or Flux): To prevent site drift and enforce repeatable, auditable deployments across clusters.
Prometheus, Grafana, and Loki: To reduce diagnostic time and eliminate manual log hunting during incidents, with APM as an option for black-box workloads that demand it.
Velero plus CSI snapshots: To support Kubernetes-native backup and restore, with restore testing as a requirement.
Akeyless, with Vault: To provide a standardized secrets layer that avoids site-by-site variance and keeps GitOps workflows clean.
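To make the site-drift idea behind the GitOps entry above concrete, here is a conceptual sketch that compares each site's live configuration with the desired state held in Git. It is not how Argo CD or Flux implement reconciliation; the manifest fields, site names, and registry URL are hypothetical.

```python
import hashlib
import json


def fingerprint(manifest: dict) -> str:
    """Hash a manifest so that key order does not affect the result."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_sites(desired: dict, live_by_site: dict[str, dict]) -> list[str]:
    """Return, in sorted order, the sites whose live manifest differs from the desired one."""
    want = fingerprint(desired)
    return sorted(site for site, live in live_by_site.items()
                  if fingerprint(live) != want)


# Hypothetical desired state, as it might be declared in Git.
desired = {"image": "registry.example/api:1.4.2", "replicas": 3}

# Hypothetical live state per site; site-b runs an older image.
live = {
    "site-a": {"replicas": 3, "image": "registry.example/api:1.4.2"},
    "site-b": {"image": "registry.example/api:1.4.1", "replicas": 3},
}

print(drifted_sites(desired, live))  # ['site-b']
```

A GitOps controller performs this kind of comparison continuously and can reconcile the difference, which is what makes deployments repeatable and auditable across clusters.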
The engagement replaced an internal platform stalemate with a decision to run the PoC on Red Hat OpenShift, backed by decision criteria leaders could defend.
The client left phase one with a documented operating model that covers deployment, security, storage, backup, and observability for 24/7 regulated manufacturing.
Cost modeling forced a reality check before the build started.
The team tested licensing and run-cost scenarios as sizing firmed up, then delivered a phased roadmap from proof of concept to production and multi-site rollout, with gates that prevent a messy pilot from becoming the long-term platform.
With platform selection and cost modeling finalized, the client is kicking off a proof of concept to validate the target architecture and day-two operations in a regulated, 24/7 manufacturing context.
In parallel, the client will refactor hundreds of legacy scripts and stored procedures into a small set of containerized API services, treating the work as a greenfield refactoring rather than a lift-and-shift.
From there, they’ll rebuild the proof of concept into a production foundation, run a first site pilot, then scale to additional sites.