Summary: Large Manufacturer needed to reduce manual remediation toil, replace aging Puppet workflows, and connect their observability platform to their automation tooling. Shadow-Soft delivered a working Dynatrace→EDA→Ansible self-healing architecture in six weeks, with an enterprise governance framework built to scale to ~600 nodes across multiple plants.
Every time Dynatrace flagged a problem — a service degradation, disk pressure, an application going unstable — someone had to act on it. That meant pulling engineers from other work, introducing a window of exposure, and watching mean time to remediation climb.
The team was also carrying legacy Puppet workflows that had grown difficult to maintain as their infrastructure expanded. And when HashiCorp Terraform finished provisioning new infrastructure, the handoff to configuration management was still a manual step.
The client needed a platform that could close all three gaps at once: eliminate manual response for known incident classes, standardize configuration management without starting from scratch, and connect provisioning to configuration in a single pipeline — all with the governance controls a distributed manufacturing operation requires.
Shadow-Soft designed and delivered a solution centered on Red Hat Ansible Automation Platform 2.x with Event-Driven Ansible at the core. The goal was to prove real business value quickly — not with a broad rollout, but with a tightly scoped, outcome-tied engagement delivered in six weeks.
The architecture connected Dynatrace to EDA via webhook. When Dynatrace fires a problem event, an EDA rulebook evaluates it against defined conditions and automatically triggers the appropriate AAP job template — no human in the loop. We called it a closed-loop remediation flow.
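The match-and-dispatch step of that flow can be sketched roughly as follows. This is a minimal Python illustration of the logic only, not the client's rulebook (EDA rulebooks themselves are written in YAML). The payload fields (`state`, `problem_title`), the rule conditions, and the job template names are all hypothetical, since a Dynatrace webhook payload is defined by whoever configures the integration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """One remediation rule: a condition over the event and the job template to run."""
    name: str
    condition: Callable[[dict], bool]
    job_template: str  # hypothetical AAP job template name


# Hypothetical rules for two of the incident classes mentioned in the text.
RULES = [
    Rule(
        name="disk pressure",
        condition=lambda e: e.get("state") == "OPEN"
        and "disk" in e.get("problem_title", "").lower(),
        job_template="remediate-disk-pressure",
    ),
    Rule(
        name="service degradation",
        condition=lambda e: e.get("state") == "OPEN"
        and "degradation" in e.get("problem_title", "").lower(),
        job_template="restart-degraded-service",
    ),
]


def select_job_template(event: dict, rules=RULES) -> Optional[str]:
    """Return the job template for the first matching rule, or None if none match."""
    for rule in rules:
        if rule.condition(event):
            return rule.job_template
    return None
```

An event that matches no rule returns `None`, which in this sketch stands for an incident class that is not yet covered and still needs a human response.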
Alongside the self-healing use case, we validated a migration path off Puppet using an idempotent Ansible baseline role, and wired a Terraform pipeline to invoke AAP job templates via REST API for post-provision configuration.
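As a rough illustration of the Terraform handoff, the post-provision call might look something like the Python sketch below. It assumes the automation controller's `/api/v2/job_templates/<id>/launch/` endpoint and bearer-token authentication; the exact API path can differ by AAP version, and the hostname, template ID, token, and `extra_vars` keys are hypothetical placeholders rather than values from the engagement.

```python
import json
import urllib.request


def build_launch_request(
    controller_url: str, template_id: int, token: str, extra_vars: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request that launches a job template."""
    url = f"{controller_url.rstrip('/')}/api/v2/job_templates/{template_id}/launch/"
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def launch_job(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response from the controller."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A pipeline step could call `build_launch_request("https://aap.example.com", 42, token, {"target_host": new_host})` once provisioning completes and pass the result to `launch_job`. Note that the controller only honors `extra_vars` sent at launch if the job template is configured to prompt for variables.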
The engagement had a six-week delivery window, so we structured the work to prove value early and build governance in parallel.
The engagement scope was clear from day one, but the environment introduced one complexity that required careful handling: integrating EDA with Dynatrace's webhook meant validating the full event-to-remediation chain in a non-production environment that didn't perfectly mirror production alerting conditions. We worked with the client's team to simulate representative incident classes and confirm the rulebook logic held before the stakeholder demo.
The bigger challenge was governance. Building RBAC, credential segregation, and Automation Hub content controls that would actually hold at scale — across multiple teams and eventually multiple plants — required more architectural rigor than a typical PoC. We treated it like a production foundation, not a prototype.
Red Hat Ansible Automation Platform 2.x (Controller, Event-Driven Ansible, Automation Hub): The core automation engine. Controller ran job templates and workflows, EDA handled event-driven remediation, and Automation Hub provided the curated content governance layer.
Dynatrace: The observability source. Dynatrace problem notifications, delivered via webhook, triggered EDA rulebooks, closing the loop between detection and remediation.
Red Hat Enterprise Linux 8/9: Base OS for AAP nodes and target systems throughout the solution build.
HashiCorp Terraform: Existing provisioning pipeline. We wired it to invoke AAP job templates via REST API as a post-provision configuration step.
Active Directory: Integrated with AAP for SSO, LDAP-based RBAC, credential segregation, and audit trail.
Git: Source control for all AAP Projects and Execution Environments.
The client went from manual incident response to a closed-loop remediation architecture that detects, evaluates, and fixes covered incident classes without human intervention, in the time it takes a webhook to fire.
The Puppet replacement pilot gave the team a repeatable, idempotent Ansible baseline they can extend to the full node inventory without rebuilding from scratch for each host group. The Terraform integration eliminated the manual gap between provisioning and configuration. And the governance framework — AD/RBAC, Automation Hub, credential management, audit logging — is built to hold as the platform scales.
Key Results:
The client is using the solution build as the foundation for a broader rollout. The immediate priorities are extending EDA remediation to additional incident classes and onboarding more teams under the existing RBAC framework.
The longer-term roadmap includes migrating remaining Puppet-managed workloads to Ansible, scaling AAP toward the ~600-node target across multiple plants, and building out additional Terraform→AAP pipeline integrations as provisioning patterns expand.
Shadow-Soft remains engaged as the automation architecture grows.
The client is a Fortune 500 company operating dozens of manufacturing facilities across the United States.