What Are Survivable Computer Systems?

Definition Of A Survivable Computer System
------------------------------------------
A computer system, which may be made up of multiple individual systems and components, designed to provide mission critical services must be able to perform in a consistent and timely manner under various operating conditions. It must be able to meet its goals and objectives whether it is in a state of normal operation, under some sort of stress, or in a hostile environment. A discussion on survivable computer systems can be a very complex and far reaching one. However, in this article we will touch on just a few of the basics.

Computer Security And Survivable Computer Systems
--------------------------------------------------
Survivable computer systems and computer security are in many ways related, but at a low level they are very much different. For instance, hardening a particular system to be resistant against intelligent attacks may be a component of a survivable computer system. It does not address the ability of a computer system to fulfill its purpose when it is impacted by an event such as a deliberate attack, natural disaster or accident, or general failure. A survivable computer system must be able to adapt and perform its primary critical functions even in a hostile environment, even if various components of the computer system are incapacitated and, in some cases, even if the entire "primary" system has been destroyed.

As an example, a system designed to provide real-time critical information regarding analysis of specialized medications ceases to function for a few hours because of widespread loss of communication. However, it maintains the validity of the data when communication is restored and systems come back online. This computer system could be considered to have survived under conditions outside of its control.

On the other hand, if the same system fails to provide continuous access to information under normal circumstances or in its normal operating environment because of a localized failure, it may not be judged to have fulfilled its purpose or met its objective.

Fault Tolerant And High Availability Computer Systems
-----------------------------------------------------
Many computer systems are designed with fault tolerant components so they continue to operate when key portions of the system fail. Examples include multiple power supplies, redundant disk drives or arrays, and even multiple processors and system boards that can continue to function if a peer component is destroyed or fails. The probability of all redundant components failing at one time may be quite low. However, a malicious entity that knows how the redundant components are configured may be able to engineer critical failures across the board, rendering the fault tolerant components ineffective.

High availability also plays a role in a survivable computer system. However, this design component may not maintain computer system survivability during certain events, such as various forms of malicious attack. An example of this might be a critical web service that has been duplicated, say across multiple machines, to allow continuous functionality if one or more of the individual web servers were to fail. The problem is that many implementations of high availability use the same components and methodology on all of the individual systems. If an intelligent attack or malicious event takes place and is directed at a specific set of vulnerabilities on one of the individual systems, it is reasonable to assume the remaining computer systems that participate in the highly available implementation are also susceptible to the same or similar vulnerabilities. A certain degree of variance must be achieved in how all systems participate in the highly available implementation.

What's The Difference Between An Attack, Failure, And Accident?
How Do These Differences Impact A Survivable Computer System?
----------------------------------------------------------
In many cases when I am discussing the security of systems with customers, the question of business continuity and disaster recovery comes up. Most companies that provide a service that they deem critical just know the system needs to be operational in a consistent manner. However, there is typically little discussion about the various events or scenarios surrounding this, and that can lead to great disappointment in the future when what the customer thought was a "survivable computer system" does not meet their expectations. Some of the items I like to bring up during these conversations are what their computer system's goals and objectives are, what specifically continuous operation means to them, and specifically what constitutes an attack, failure, or accident that can cause loss of operation or failure to meet objectives.

A failure may be defined as a localized event that impacts the operation of a system and its ability to deliver services or meet its objectives. An example might be the failure of one or more critical or non-critical functions that affects the performance or overall operation of the system: say, the failure of a module of code that causes a cascading event that prevents redundant modules from performing properly, or a localized hardware failure that incapacitates the computer system.

An accident is typically an event that is outside the control of the system and the administrators of a local / private system. Examples would be natural disasters such as hurricanes (if you live in south Florida like I do) or floods, or widespread loss of power because the utility provider cut the wrong power lines during an upgrade to the grid. About two years ago, a client of mine who provides web-based document management services could not deliver revenue generating services to their customers because a telecommunications engineer cut through a major phone trunk six blocks away from their office. They lost phone and data services for nearly a week.

And now we come to "attack". We all know accidents will happen, we know that everything fails at one time or another, and typically we can speculate on how these things will happen. An attack executed by an intelligent, experienced individual or group can be very hard to predict. There are many well known and documented forms of attack. The problem is that intelligence and human imagination continuously advance the form of malicious attacks and can seriously threaten even the most advanced survivable computer system designs. An accident or failure does not have the ability to think outside the box or realize that a highly available design is flawed because all participants use the same design. The probability that an attack might occur and succeed may be quite low, but the impact may be devastating.

Conclusion
----------
One of the reasons I wrote this article was to illustrate that it's not all about prevention. Although prevention is a big part of survivable computer system design, a critical computer system must be able to meet its objectives even when operating under hostile or stressful circumstances, or if the steps taken for prevention ultimately prove inadequate. It may be impossible to think of all the various events that can impact a critical computer system, but it is possible to reasonably define the possibilities.

The subject of survivable computer systems is actually one of complexity and ever evolving technology. This article has only touched on a few of the basic aspects of computer system survivability. I intend on continuing this article to delve deeper into the subject of survivable computer systems.

You may reprint or publish this article free of charge as long as the bylines are included.

Darren Miller is an Information Security Consultant with over seventeen years of experience. He has written many technology & security articles, some of which have been published in nationally circulated magazines & periodicals.