While surprise is a major advantage in battle, it's a nightmare for application security (AppSec) teams. That's why they turn to chaos engineering. It introduces controlled failures into systems to identify vulnerabilities and build up the organization's resiliency. Simulating real-world attacks and disruptions lowers the risk of surprise, addresses potential weaknesses before they're exploited, and makes critical applications more reliable.
The roots of chaos engineering can be traced to practices pioneered by technology giants such as Netflix, which created a chaos engineering tool called Chaos Monkey. That tool randomly disables instances in a production environment, forcing developers to build systems that can gracefully handle failures. The result is a robust architecture that is resilient and capable of delivering consistent performance under duress.
To be effective, however, chaos engineering requires a deep understanding of the underlying systems in an organization. That is difficult for the parts of the environment that rely on commercial software. Those packages, especially enterprise software packages, are complex and are often black boxes by design.
Organizations frequently rely on a vendor's promise, typically given through questionnaires, that its software is reliable and secure, even though they lack a deep understanding of the package's inner workings. That gap can introduce significant risk when teams try to apply chaos engineering.
Here's what your team needs to know about leveraging modern software supply chain security (SSCS) to bolster the organization's proactive defenses by including commercial software in chaos engineering.
1. Lack of transparency leads to weak chaos engineering
The lack of transparency in commercial software heightens the weaknesses of chaos engineering. Chaos engineering often involves simulating network disruptions and hardware failures, which can be effective for identifying flaws in infrastructure components but less so in uncovering weaknesses in software. When software is opaque, it's difficult to design chaos experiments that pinpoint the root causes of failures. But by adopting SSCS practices, organizations can obtain the necessary visibility and control to effectively stress-test their commercial applications.
MJ Kaufmann, an author and instructor with O'Reilly Media, said that commercial software generally limits a user’s control over configuration and operational parameters. She said this restriction can make it difficult to inject faults or simulate the types of disruptions chaos engineering requires.
Itai Birenshtok, vice president of R&D at DoControl, said that presents a tough challenge.
“Unlike in-house-developed systems where we have full visibility and control, commercial applications are often black boxes. We can't easily peek inside or tweak their internals to test resilience.”
—Itai Birenshtok
SSCS tools are crucial to the chaos engineering process, Birenshtok contended. He said they give teams much-needed visibility into commercial software components, their dependencies, and potential vulnerabilities.
“With SSCS tools, we can map out the entire software supply chain, identifying weak points and critical paths that need stress-testing.”
—Itai Birenshtok
Kaufmann said that by automating the detection and application of patches, SSCS tools can help keep commercial software up to date, with fewer vulnerabilities.
“For chaos engineering, it ensures that the systems are secure and that any chaos experiments reflect the current state of the software environment, focusing on relevant and realistic threat scenarios.”
—MJ Kaufmann
Apporwa Verma, an AppSec engineer at Cobalt Labs, said SSCS tools also facilitate integrity verification, ensuring that stress tests are conducted on unaltered software artifacts. Additionally, the tools support runtime monitoring and behavior analysis, allowing for the detection of anomalies and security issues that may only emerge under high-load conditions, she said.
“By leveraging these capabilities, organizations can design more focused, realistic, and effective stress tests that account for the complex relationships within their software supply chain.”
—Apporwa Verma
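The integrity check Verma describes can be illustrated with a short Python sketch. It is not how any particular SSCS product works; it only assumes that a trusted SHA-256 digest for the vendor artifact is available from somewhere (the vendor or an SSCS tool), and the function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_is_unaltered(path: Path, expected_sha256: str) -> bool:
    """Compare the file's digest with a trusted value.

    Assumption: expected_sha256 comes from the vendor or an SSCS tool.
    """
    return hmac.compare_digest(sha256_of(path), expected_sha256.lower())
```

A team could run a check like this as a gate before each experiment, so that a stress test is skipped rather than run against a package that no longer matches the trusted digest.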
2. Supply chain security integration can change the game
Birenshtok said that integrating SSCS tools into chaos engineering can be a game changer.
“We can use SSCS data to inform our chaos experiments, targeting specific components or dependencies known to have vulnerabilities. We can also use SSCS tools to monitor the behavior of commercial software during chaos tests, looking for unexpected reactions or security issues that might only surface under stress.”
—Itai Birenshtok
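As a rough illustration of using SSCS data to choose experiment targets, the Python sketch below ranks components so that the most severe and most widely depended-on ones are tested first. The `Component` fields, the sample inventory, and the ranking rule are all assumptions made for this example, not the output format of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    known_vulnerabilities: int  # count reported by an SSCS tool (assumed field)
    max_severity: float         # highest severity score, 0.0 to 10.0 (assumed field)
    dependents: int             # number of services relying on it (assumed field)


def rank_chaos_targets(components, limit=3):
    """Order vulnerable components by severity, then by how widely they are used."""
    candidates = [c for c in components if c.known_vulnerabilities > 0]
    candidates.sort(
        key=lambda c: (c.max_severity, c.dependents, c.known_vulnerabilities),
        reverse=True,
    )
    return candidates[:limit]


# Hypothetical inventory, invented for illustration only.
inventory = [
    Component("vendor-sso-agent", 2, 9.1, 14),
    Component("reporting-plugin", 0, 0.0, 3),
    Component("pdf-renderer", 5, 7.4, 6),
    Component("queue-client", 1, 7.4, 11),
]

for target in rank_chaos_targets(inventory):
    print(target.name)
```

Components with no known vulnerabilities are filtered out here only to keep the example small; a real program would likely still test them, just at a lower priority.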
The tools can also be used to collect data before and after chaos engineering experiments to assess their impact on an application’s performance, reliability, and security and identify areas for improving and refining future experiments, he said.
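A before-and-after comparison of this kind might look like the following Python sketch. The `Snapshot` shape, the metric names, and the tolerance thresholds are illustrative assumptions; real values would come from whatever monitoring or SSCS tooling the team already uses.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Snapshot:
    """Metrics collected over one observation window (hypothetical shape)."""
    latencies_ms: tuple
    errors: int
    requests: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def assess_impact(before, after, max_latency_increase=0.20, max_error_rate=0.01):
    """Compare two snapshots against illustrative tolerance thresholds."""
    baseline = mean(before.latencies_ms)
    latency_change = (mean(after.latencies_ms) - baseline) / baseline
    return {
        "latency_change": latency_change,
        "error_rate_before": before.error_rate,
        "error_rate_after": after.error_rate,
        "within_tolerance": (
            latency_change <= max_latency_increase
            and after.error_rate <= max_error_rate
        ),
    }


# Made-up numbers for illustration only.
before = Snapshot(latencies_ms=(100, 110, 90), errors=1, requests=1000)
after = Snapshot(latencies_ms=(150, 160, 140), errors=40, requests=1000)
report = assess_impact(before, after)
print(report["within_tolerance"])
```

Keeping the report as plain data makes it easy to store alongside each experiment and to compare runs over time when refining future experiments.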
O'Reilly's Kaufmann said that SSCS tools also help automate the creation of test environments that mirror production, including the security vulnerabilities the tools have identified, so that chaos experiments are conducted in a controlled yet realistic setting.
Another benefit of pairing SSCS tools with chaos engineering, Verma said, is that they can help create supply chain-specific failure scenarios to serve as the hypotheses on which chaos experiments are based.
“These tools can simulate compromised dependencies, certificate revocations, or build-process anomalies, allowing teams to test their system's resilience against supply chain attacks. By incorporating dependency mapping and vulnerability data, chaos experiments can target critical components more effectively.”
—Apporwa Verma
Runtime-monitoring features from these tools can be used to detect unexpected behaviors from the introduced chaos during the experiment, she said. “This integration enables organizations to proactively identify and address vulnerabilities in their software supply chain while improving overall system resilience.”
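To make the idea of a supply chain failure scenario concrete, here is a minimal Python sketch of fault injection around a dependency. Everything in it is hypothetical: `fetch_plugin` stands in for downloading a vendor component, the two wrappers imitate a tampered payload and a revoked certificate, and `load_plugin` plays the system under test. It does not reproduce what a commercial SSCS tool does.

```python
import hashlib


class DependencyFault(Exception):
    """Raised by an injected fault to stand in for a supply chain failure."""


def fetch_plugin():
    """Hypothetical stand-in for downloading a vendor plugin."""
    return b"plugin-v1"


# Digest of the known-good payload, assumed to be recorded in advance.
TRUSTED_DIGEST = hashlib.sha256(b"plugin-v1").hexdigest()


def tampered(fetch):
    """Injected fault: the dependency returns altered bytes."""
    def wrapper():
        return fetch() + b"-backdoor"
    return wrapper


def revoked_certificate(fetch):
    """Injected fault: the download fails as if the certificate were revoked."""
    def wrapper():
        raise DependencyFault("certificate revoked")
    return wrapper


def load_plugin(fetch):
    """System under test: refuse anything that fails verification."""
    try:
        payload = fetch()
    except DependencyFault:
        return "fallback"
    if hashlib.sha256(payload).hexdigest() != TRUSTED_DIGEST:
        return "rejected"
    return "loaded"


# Hypothesis: every injected fault ends in a safe state, never "loaded".
for name, fault in [("tampered", tampered), ("revoked", revoked_certificate)]:
    print(name, load_plugin(fault(fetch_plugin)))
```

The experiment's hypothesis is stated as a property of the outcome, which mirrors how chaos experiments are usually framed: define the expected steady state first, then inject the fault and check whether it holds.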
3. A mindset shift is needed to modernize chaos engineering
To effectively stress-test commercial software and ensure robust performance in a chaotic environment, organizations must embrace modern SSCS practices.
“We see this intersection of chaos engineering and software supply chain security as increasingly important, especially in SaaS environments. As organizations rely more heavily on third-party SaaS applications, understanding and testing the resilience of these interconnected systems becomes critical.”
—Itai Birenshtok
The key is to shift the organization's mindset and see commercial software not as immutable black boxes, but as integral parts of systems that need to be understood, monitored, and tested just like any in-house component, Birenshtok said.
“By combining chaos engineering principles with robust software supply chain security tools, we can build more resilient systems, even when relying heavily on commercial software.”
—Itai Birenshtok