By David Geer

In his famous 2011 Wall Street Journal article, Marc Andreessen, co-creator of Mosaic, the first widely used graphical web browser, wrote, “Software is eating the world.” Digital transformation has since fueled software’s appetite, converting manual processes to automation and counting on code, rather than hardware alone, to do the heavy lifting.

Criminal actors excel at orchestrating failure conditions in software, driving systems to a state of insecurity, breaking applications and exfiltrating precious data such as intellectual property and customer databases.

Resilience engineering welcomes the insights and experiences of cybersecurity professionals to fortify software against the hammering of modern cyberattacks.

What is Resilience Engineering?
“In cybersecurity, resilience engineering designs and creates software that recovers or adjusts to unexpected conditions,” said Michael Lanham, virtual chief information security officer for Black Talon Security.

In resilience engineering, the vCISO explained, software development teams run progressive tests on the whole software system: they send multiple inputs into the system simultaneously and monitor it for failures. “When the teams see failures, they can redesign the system to cope with them,” he said.
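A minimal sketch of such a progressive test, in Python, might look like the following. The handle_request function and the payload sizes are hypothetical stand-ins for a real system under test; the point is simply to fire many concurrent inputs at the system and record which ones fail, so the team can redesign around them.

```python
import concurrent.futures
import random
import string


# Hypothetical system under test; in practice this would call the real
# application entry point or a staging API.
def handle_request(payload: str) -> str:
    if len(payload) > 64:
        raise ValueError("payload too large")
    return payload.upper()


def random_payload() -> str:
    # Vary payload size so some inputs push past expected limits.
    size = random.randint(1, 128)
    return "".join(random.choices(string.ascii_letters, k=size))


def progressive_test(total_inputs: int = 1000, workers: int = 50) -> list:
    """Send many inputs concurrently and collect any failures."""
    failures = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        payloads = [random_payload() for _ in range(total_inputs)]
        futures = {pool.submit(handle_request, p): p for p in payloads}
        for future in concurrent.futures.as_completed(futures):
            try:
                future.result()
            except Exception as exc:  # record the failure for redesign work
                failures.append((futures[future], repr(exc)))
    return failures


if __name__ == "__main__":
    observed = progressive_test()
    print(f"{len(observed)} failures observed out of 1000 inputs")
    for payload, error in observed[:5]:
        print(f"  input of length {len(payload)} -> {error}")
```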

Brute-force attacks offer a practical example of how resilience engineering works. These attacks try many username and password combinations in rapid, automated succession to log in to a system. Resilience engineering enables the system to drop the connections responsible for the attacks or to limit the rate of login attempts, frustrating malicious actors.
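As an illustration only, a sliding-window rate limiter of the kind such a design might use could track login attempts per source address and reject bursts that exceed a threshold. The class name, the thresholds and the check_credentials stub below are hypothetical, not any particular product’s implementation.

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Sliding-window limiter: allow at most max_attempts logins
    from one source address within window_seconds."""

    def __init__(self, max_attempts: int = 5, window_seconds: int = 60):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.attempts = defaultdict(deque)  # source ip -> attempt timestamps

    def allow(self, source_ip: str) -> bool:
        now = time.monotonic()
        window = self.attempts[source_ip]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.max_attempts:
            return False  # drop the connection or delay the response
        window.append(now)
        return True


def check_credentials(username: str, password: str) -> bool:
    # Stand-in for a real credential check.
    return False


limiter = LoginRateLimiter(max_attempts=5, window_seconds=60)


def attempt_login(source_ip: str, username: str, password: str) -> str:
    if not limiter.allow(source_ip):
        return "rate limited"  # frustrate automated brute-force tools
    if check_credentials(username, password):
        return "success"
    return "invalid credentials"


if __name__ == "__main__":
    # Simulate a burst of guesses from one source address.
    for guess in range(8):
        print(guess + 1, attempt_login("203.0.113.7", "admin", f"guess{guess}"))
```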

What is Not Resilience Engineering?
Resilience engineering is critical to recovery and continuity in the face of debilitating attacks. But some activities don’t fit the definition, and cybersecurity professionals must make the distinction. Good cyber hygiene, for example, doesn’t qualify.

“Building identification, authentication and access control into systems and software is a foundation for making systems cyber-resilient,” said Deborah Bodeau, senior principal cybersecurity engineer for MITRE, “but it’s not resilience engineering.” MITRE is a nonprofit organization that engages with the National Institute of Standards and Technology (NIST) in cybersecurity endeavors for the public and private sectors.

Secure software development falls outside the definition, too. “Secure programming and security testing aren’t preemptive fault-management methods. Resilience engineering differs in that it remedies system faults as they occur,” Kevin Curran, professor of cybersecurity at Ulster University, Northern Ireland, said in an email exchange.

How to Leverage Resilience Engineering in Your Daily Practice
Cybersecurity professionals have several opportunities to feed what they observe back into resilience engineering. Security analysts, for example, file incident reports that point to unexpected failures.

“Logging is essential to resilience engineering,” Curran said. Log analysis—that is, identifying security events and system failures and their relationships—and security incident documentation cycle back into progressive testing and the software design, engineering and development processes where resilience takes shape.

Cybersecurity professionals aggregate and analyze log data using centralized log managers and security information and event management (SIEM) tools. Alerts from these tools point to security events and system failures that demand the security analyst’s investigation and remediation. Security incident reports record the impact of the incident, the sensitivity of the data involved and the steps the analyst took in response. According to the NIST publication Developing Cyber-Resilient Systems, these reports should feed into cyber resiliency analysis.
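The kind of pattern a log manager or SIEM alert surfaces can be sketched in a few lines. The log format and the ALERT_THRESHOLD value below are illustrative assumptions rather than any particular product’s schema; the sketch simply counts failed logins per source and flags sources that cross a threshold, the sort of signal an analyst would then investigate and document.

```python
from collections import Counter

# Illustrative log lines; real SIEM or log-manager formats will differ.
SAMPLE_LOG = """\
2024-05-01T10:00:01 LOGIN_FAILURE user=alice src=203.0.113.7
2024-05-01T10:00:02 LOGIN_FAILURE user=alice src=203.0.113.7
2024-05-01T10:00:03 LOGIN_SUCCESS user=bob src=198.51.100.4
2024-05-01T10:00:04 LOGIN_FAILURE user=admin src=203.0.113.7
"""

ALERT_THRESHOLD = 3  # hypothetical tuning value


def failed_logins_by_source(log_text: str) -> Counter:
    """Count LOGIN_FAILURE events per source address."""
    counts = Counter()
    for line in log_text.splitlines():
        if "LOGIN_FAILURE" not in line:
            continue
        fields = dict(
            part.split("=", 1) for part in line.split() if "=" in part
        )
        counts[fields.get("src", "unknown")] += 1
    return counts


def alerts(counts: Counter) -> list:
    """Sources whose failure count meets the alert threshold."""
    return [src for src, n in counts.items() if n >= ALERT_THRESHOLD]


if __name__ == "__main__":
    counts = failed_logins_by_source(SAMPLE_LOG)
    for src in alerts(counts):
        print(f"ALERT: {counts[src]} failed logins from {src}")
```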

Cybersecurity professionals model threats using intelligence from security operations centers and other sources, applying resources such as Microsoft’s STRIDE framework and the MITRE ATT&CK knowledge base to inform resilient software engineering.

“Developers look at those models and the historical outcomes of attacker behavior. But they can’t assume that the past projects itself into future attacks because cyber adversaries do their research and development. There are interesting attacks that come out of nowhere, in addition to exploiting the zero-day vulnerabilities that lurk in the software libraries that we reuse constantly,” said Bodeau.

A zero-day vulnerability is a newly discovered flaw that attackers exploit before the vendor has had time (zero days) to create a patch. Software libraries contain pre-written components that developers add to their software projects instead of coding them from scratch.

Cybersecurity professionals have a voice in security leader decisions about investments in correcting software failure states through resilience engineering. “You can advocate for resources dedicated to receiving outside input about software failures,” Lanham explained.

“Bug bounty programs are great examples of outside sources of information about software failures,” he added. “Bugs include software flaws that criminal hackers leverage in an attack to put the software in a weakened state, permitting further infiltration into the network or application to exfiltrate data.”

The organization needs to be open to such feedback about failure conditions in the software, Lanham continued. “There are corporate cultures that are hostile to outside feedback. That culture must check itself at the door and allow that kind of external insight to come in.”

David Geer is a freelance cybersecurity writer based in Ohio.