Now that I have explained
the hows and whys of analysis and have outlined a general process for performing
analysis, it's time to outline the specific approaches. As mentioned in Chapter 2, “Concepts and Definitions,”
analysis is divided into two parts.
Intrusion detection is composed of misuse detection,
which searches event data for predefined patterns of misuse, and anomaly
detection, which characterizes data in mathematical terms, searching
event data for patterns of abnormality.
Misuse detection asks the following question about system events: Is
this activity bad? Misuse detection involves encoding information about specific
behaviors known to be indicators of intrusion and then filtering event data
for these indicators.
To perform misuse detection, you need the following:
A good understanding of what constitutes a misuse behavior
A reliable record of user activity
A reliable technique for analyzing that record of activity
In a nutshell, misuse detection is best suited for reliably detecting known misuse patterns. It suffers
the deficiency that you can detect only what you know about, although if you're
clever, you can leverage the knowledge you have (for instance, of outcomes
of attacks) to spot new exploits of old problems.
One of the earliest schemes for performing misuse
detection was the use of production/expert systems. This approach was utilized
in systems such as MIDAS, IDES, Next Generation IDES (NIDES), DIDS, and CMDS.
In the case of MIDAS, IDES, and NIDES, the production system employed was
P-BEST, designed by Alan Whitehurst. For DIDS and CMDS, the CLIPS system,
a public domain system developed by the National Aeronautics and Space Administration
(NASA), was used.
The advantage associated with using production systems is that the control
reasoning of the systems is separated from the statement of the problem solution.
This feature allows users to enter knowledge about attacks as if-then rules
and then enter facts (in the form of audit events); the system evaluates those
facts according to the knowledge entered. This process can occur without the
user ever affecting (or understanding) the internal function of the production
system; before production systems, users had to hard code the decision engine
and rules in custom software, a difficult, time-consuming task.
The attack knowledge is entered using an if-then syntax. Conditions
indicative of an intrusion are specified on the left side (the “if”
part) of the rule. When they are satisfied, the rule performs the actions
on the right side (the “then” part) of the rule.
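The rule structure described above can be sketched as a toy production system in Python. The event fields and the failed-login rule here are illustrative assumptions, not actual P-BEST or CLIPS syntax:

```python
# Minimal production-system sketch: knowledge as if-then rules, audit
# events as asserted facts, evaluated without touching the engine itself.

def make_rule(condition, action):
    """Pair an 'if' (left-hand) side with a 'then' (right-hand) side."""
    return (condition, action)

def run_rules(rules, facts):
    """Evaluate every rule against every asserted fact (audit event)."""
    alerts = []
    for event in facts:
        for condition, action in rules:
            if condition(event):              # the "if" part is satisfied
                alerts.append(action(event))  # perform the "then" part
    return alerts

# Hypothetical rule: repeated login failures indicate possible misuse
rules = [
    make_rule(
        lambda e: e["type"] == "login_failure" and e["count"] >= 3,
        lambda e: f"ALERT: repeated login failures for {e['user']}",
    )
]

facts = [
    {"type": "login_failure", "user": "root", "count": 4},
    {"type": "file_read", "user": "alice", "count": 1},
]

print(run_rules(rules, facts))
```

Adding knowledge about a new attack means appending a rule; the evaluation loop never changes, which is the separation of problem statement from control reasoning described above.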
Some practical problems are associated with the use of production systems
in intrusion detection.
They are ill-equipped to handle large volumes of data. This
is true because the declarative representations used by production systems
are generally implemented as interpreted systems, and interpreters are slower
than compiled engines.
They do not provide any natural handling of sequentially ordered data.
The expertise reflected by the production system is only as good as the person on whose skills
the system is modeled (the standard “garbage in, garbage out”
effect endemic to computer systems).
They can detect only known intrusions.
They cannot handle uncertainty.
Maintaining the rule base can be problematic, as one must make changes
to the rules considering the impact of the changes on the rest of the rules
in the knowledge base.3
State transition approaches to performing misuse detection structure the problem of misuse
detection in a way that allows the use of optimized pattern-matching techniques.
Their speed and flexibility make them among the most powerful intrusion detection
capabilities at this time.
State transition approaches use expressions of system state and state
transitions to describe and detect known intrusions. Several approaches are
available for implementing state transition approaches to intrusion detection.
The three major approaches are language- or Application Programming Interface
(API)-based characterization of state transitions, Colored Petri Nets (CP-Nets), and state transition
analysis. In this section, I explore these processes, outlining the strategies each
uses to characterize misuse patterns and then filter event data against them.
State transition analysis is an approach to misuse detection using high-level state transition
diagrams to represent and detect known penetration scenarios. This approach
was first explored in the STAT system and its extension to UNIX network environments,
USTAT. Both systems were developed at the University of California, Santa
Barbara. Phillip Porras and Richard Kemmerer developed STAT; Koral Ilgun and
Kemmerer developed USTAT.
A state transition diagram is a graphical representation of a penetration
scenario. Figure 4.4 shows the components of
a state transition diagram, as well as how they are used to represent a sequence.
Nodes represent the states, and arcs represent the transitions. The concept
of expressing intrusions in state transition form is rooted in the understanding
that all intruders start with limited privileges and exploit system vulnerabilities
to gain some outcome. Both the limited privilege starting point and the successful
intrusion outcome can be expressed as system states.
In using state transition diagrams to characterize the intrusion sequences,
the system can limit itself to expressing those key activities that result
in a state change. The path between initial and intrusion state can be rather
subjective; hence two persons can come up with different state transition
diagrams that represent the same attack scenario. Each state consists of one
or more state assertions (also shown in Figure 4.4).
State transition analysis systems utilize finite state machine graphs
(finite automata) to model intrusions. The intrusion is composed of a sequence
of actions that lead from some initial system state to an intrusion state.
The initial state represents the state of the system before the intrusion
is executed, and the intrusion state represents the state of the system upon
the completion of the intrusion. The system state is described in terms of
system attributes and/or user privileges. The transition is driven by a user
action. The state transition engine maintains a set of state transition diagrams,
each representing a penetration scenario. At a given time, it's assumed that
some sequence of actions has driven the system to a particular state in each
diagram. When a new action takes place, the engine checks it against each
state transition diagram to see whether the action drives the scenario to
the next state. If the action nullifies the assertions of the current state,
the inference engine moves the state transition back to the nearest state
for which the assertions still hold. If the action drives the scenario to
the end state, indicating an intrusion, the previous transition information
is sent to the decision engine, which alerts the security officer to the presence
of the intrusion.
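The engine behavior just described can be sketched in Python under assumed states and actions; the setuid-copy scenario and its assertions below are hypothetical, not taken from STAT itself:

```python
# Sketch of a state-transition scenario walker in the spirit of STAT.
# Each state carries an assertion over system attributes; an action can
# advance the diagram, or roll it back to the nearest state whose
# assertions still hold.

class Scenario:
    def __init__(self, states, transitions):
        self.states = states            # state index -> assertion function
        self.transitions = transitions  # (state, action) -> next state
        self.current = 0

    def observe(self, action, system):
        nxt = self.transitions.get((self.current, action))
        if nxt is not None:
            self.current = nxt
        # if the action nullified the current state's assertions, fall
        # back to the nearest earlier state for which they hold
        while self.current > 0 and not self.states[self.current](system):
            self.current -= 1
        return self.current == len(self.states) - 1  # intrusion state?

# Hypothetical two-step penetration: copy a setuid shell, then run it
states = [
    lambda s: True,                        # initial state
    lambda s: s.get("setuid_copy", False),
    lambda s: s.get("root_shell", False),  # intrusion state
]
transitions = {(0, "cp"): 1, (1, "exec"): 2}

sc = Scenario(states, transitions)
system = {"setuid_copy": True}
print(sc.observe("cp", system))    # mid-scenario, not yet an intrusion
system["root_shell"] = True
print(sc.observe("exec", system))  # end state reached: alert
```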
The advantages of the STAT approach follow:
State transition diagrams provide an intuitive, high-level,
and audit-record-independent representation of penetration scenarios.
The transitions allow one to represent partial order of signature
actions constituting an attack scenario.
A state transition diagram uses the smallest possible subset
of signature actions that must occur for the penetration to be successful.
Thus, the detector can generalize over variants of the same penetration.
The hard-link information maintained by the system makes it
easier to express penetration scenarios.
The system can detect coordinated and slow attacks.4
Deficiencies of the STAT approach include the following:
The list of state assertions and signature actions is hand-coded.
The state assertions and signatures may not be powerful enough
for expressing more elaborate penetration scenarios.
The evaluation of certain state assertions may require the
inference engine to get additional information from the target system. This
process could cause performance degradation.
The system cannot detect many common attacks, so it
must be combined with other detectors for operational use.
The prototyped system is slow compared to other state transition-based approaches.
Another state-transition-based approach to optimizing misuse detection
is the Colored Petri (CP)-Net approach developed by Sandeep Kumar and Gene
Spafford at Purdue University. This approach was implemented in the IDIOT system.
The name IDIOT originated as a joke between Gene Spafford and the author;
it stands for Intrusion Detection In Our Time, a wry commentary on the delays
many members of the intrusion detection research community experienced in
transferring research results to operational systems!
IDIOT uses a variation of CP-Nets to represent and detect intrusion
patterns. Under this model, an intrusion is represented as a CP-Net in which
the color of tokens in each state serves to model the context of the event.
The signature matching is driven by the audit trail and takes place by moving
tokens progressively from initial states to the final state (indicating an
intrusion or attack). Guards define the context in which signatures
are considered matched, and post actions are performed
when the pattern is successfully matched.
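The token-and-guard mechanics can be illustrated with a small Python sketch. The two-event signature and its audit-record fields are hypothetical, and this simplification moves a single token forward, whereas real IDIOT nets also keep tokens in initial states so intermixed patterns can match:

```python
# Sketch of CP-Net style signature matching: the audit trail drives
# tokens from the initial state toward the final state; guards define
# the context in which each transition is considered matched.

class CPNetSignature:
    def __init__(self, guards, on_match):
        self.guards = guards      # one guard per transition, in order
        self.on_match = on_match  # post action run on a full match
        self.tokens = [0]         # token positions (0 = initial state)

    def feed(self, event):
        advanced = []
        for pos in self.tokens:
            if pos < len(self.guards) and self.guards[pos](event):
                advanced.append(pos + 1)   # guard satisfied: move token
            else:
                advanced.append(pos)
        self.tokens = advanced
        if any(pos == len(self.guards) for pos in self.tokens):
            return self.on_match(event)    # final state: intrusion
        return None

# Hypothetical signature: a write to /etc/passwd followed by an exec
sig = CPNetSignature(
    guards=[
        lambda e: e["syscall"] == "write" and e["path"] == "/etc/passwd",
        lambda e: e["syscall"] == "exec",
    ],
    on_match=lambda e: "passwd-tamper pattern matched",
)

audit_trail = [
    {"syscall": "write", "path": "/etc/passwd"},
    {"syscall": "exec", "path": "/bin/sh"},
]
results = [sig.feed(e) for e in audit_trail]
print(results)
```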
At first glance, this approach might appear to be almost identical to
the state transition approach of STAT. However, there are significant differences
between the approaches. First, in STAT the intrusion is detected by the effect
it has on the system state, that is, the outcome of the
intrusion. In IDIOT the intrusion is detected by pattern-matching the signature
that constitutes the penetration. In STAT guards are placed in the state,
whereas in IDIOT the guards are incorporated in the transitions.
In IDIOT each intrusion signature is expressed as a pattern that represents
the relationship among events and their context. This relationship pattern
precisely represents a successful intrusion or its attempt. Vertices in the
CP-Net graph represent system states. Intrusion patterns have preceding conditions
and following actions associated with them. The scheme is independent of any
underlying computational framework of matching and provides a model in which
all categories in the classification are represented and matched.
This pattern-matching model consists of the following:
A context representation that allows the matching to correlate
various events that constitute the intrusion signature
Semantics that accommodate the possibility of several intrusion
patterns (possibly belonging to multiple event sources) being intermixed in
the same event stream
An action specification that provides for the execution of certain
actions when the pattern is matched2
Figure 4.5 shows the CP-Net pattern for a
TCP/IP connection (for a network connection not involving retransmissions).
Numerous significant advantages are associated with
this approach to misuse detection:
It is extremely fast. In experiments involving a nonoptimized
version of IDIOT, for every hour of intense activity (generating C2 audit
records), the detector required about 135 seconds to match about 100 intrusion
patterns. This result represents a processing load of less than 5% for a Sun
SPARCstation 5 generating about 6MB of audit data per hour.
The pattern-matching engine is independent of audit format,
so you can apply it to IP packet streams and other detection problems.
The signatures are portable across audit trails so that they
can be moved amongdifferent systems without having to rewrite them to accommodate
vendor audit trail differences.
The patterns can be specified according to what needs to be
matched, not how it is to be matched.
The sequencing and other ordering constraints on events can
be represented directly.
The system provides a fine-grained specification of a successful
pattern match (that is, it tells you why an intrusion was detected).
IDIOT provides a front-end language for encoding the graphical
representation of the net.
The system allows you to specify an action that is executed upon the
matching of the signature, thereby supporting automated responses.
The limitations of IDIOT are those of all misuse detection systems. Although
the capability to characterize intrusions in terms of outcomes allows
you to generalize some detection signatures, the system still cannot detect
new attacks that it doesn't know about.6
A common strategy for optimizing commercial misuse
detection tools is to devise a means for describing intrusions in a form that
a detection engine can use. Although, as we discussed before, some production
expert system languages (such as P-BEST and CLIPS) are available, they were
designed for other uses. Three approaches to expressing intrusions for misuse
detection purposes are the RUSSEL language developed by Mounji at Faculties
Universitaires Notre-Dame de la Paix (in Namur, Belgium), the STALKER system,
patented by Smaha and Snapp of Haystack Laboratories, and the N packet filtering language
developed by Marcus Ranum and provided as part of the Network Flight Recorder.
RUSSEL is a rule-based language that is designed to optimize the processing of unstructured
data streams, specifically operating system audit trails. It is optimized
for heterogeneous system environments. The goal of RUSSEL is to enable users
to correlate events across multiple hosts and to support multiple levels of
abstraction for events. RUSSEL is utilized in ASAX, a misuse detection system
optimized for heterogeneous network systems. ASAX features the use of a common
audit data format (called the Normalized Audit Data Format [NADF]) and provides
a component that supports adaptive rules.
Event patterns are expressed in RUSSEL as guarded actions of the form
Condition --> Action. This action can be specified at a level
of abstraction that allows the intrusion detection user to specify responses
to a given detection scenario.
The language is structured bottom-up. It starts by asserting audit
records as basic facts and then, based on these facts, tries to find
audit record patterns that can be viewed as derived facts. The bottom-up
structure is utilized because the audit records are not known at the
time analysis starts and because it is more efficient than top-down
strategies when dealing with large audit trails.7
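A loose Python rendering of the Condition --> Action idea over a stream of audit records follows; the record fields and the derived-fact store are illustrative stand-ins for RUSSEL's actual rule semantics:

```python
# Sketch of guarded actions over audit records, bottom-up: records are
# asserted as basic facts, and rules derive higher-level facts from them.

def guarded(condition, action):
    """Build a Condition --> Action rule."""
    def rule(record, derived):
        if condition(record, derived):
            action(record, derived)
    return rule

def count_su_failure(record, derived):
    key = ("su_failures", record["user"])
    derived[key] = derived.get(key, 0) + 1

def raise_alert(record, derived):
    derived.setdefault("alerts", []).append(
        f"su brute force suspected: {record['user']}"
    )

rules = [
    # basic fact: an su failure occurred -> count it per user
    guarded(lambda r, d: r["event"] == "su_failure", count_su_failure),
    # derived fact: three failures by one user -> respond
    guarded(lambda r, d: d.get(("su_failures", r["user"]), 0) >= 3,
            raise_alert),
]

derived = {}
trail = [{"event": "su_failure", "user": "mallory"}] * 3
for record in trail:      # records are processed as the trail is read
    for rule in rules:
        rule(record, derived)
print(derived["alerts"])
```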
Another approach to
characterizing intrusion for misuse detection purposes is the approach
utilized in the STALKER system, a commercial misuse detection product.8
This patented approach utilizes a common audit data format (the SVR
4++ standard outlined in Chapter 7, “Technical
Issues”) and a state-based data structure for attack signatures.
The detector is implemented as a finite state machine (the underlying
technology for compilers), which is optimized to pattern-matching tasks such
as misuse detection.
The misuse detector operates by passing audit records to the misuse
detection engine. The engine maintains a set of detection signatures in the
form of state transitions. The signature expression consists of a data structure
that contains an initial state, an end state, and one or more sets of transition
functions for each misuse. Figure 4.6 shows
the structure and operation of the STALKER misuse detector.
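The signature data structure just described (an initial state, an end state, and transition functions keyed by event) can be sketched as follows; the states, event names, and rearm-on-match behavior are illustrative assumptions, not the patented format:

```python
# Sketch of a STALKER-style misuse detector: a finite state machine per
# signature, driven by the stream of audit records.

class Signature:
    def __init__(self, name, initial, end, transitions):
        self.name = name
        self.initial = initial
        self.end = end
        self.transitions = transitions  # (state, event type) -> next state

class Detector:
    def __init__(self, signatures):
        self.active = [(sig, sig.initial) for sig in signatures]

    def feed(self, event_type):
        matched, updated = [], []
        for sig, state in self.active:
            state = sig.transitions.get((state, event_type), state)
            if state == sig.end:
                matched.append(sig.name)
                state = sig.initial     # rearm the signature after a match
            updated.append((sig, state))
        self.active = updated
        return matched

# Hypothetical misuse: a mount followed by creation of a setuid file
sig = Signature("mount-setuid", "S0", "S2",
                {("S0", "mount"): "S1", ("S1", "create_setuid"): "S2"})
det = Detector([sig])
results = [det.feed(e) for e in ["mount", "read", "create_setuid"]]
print(results)
```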
This approach was successfully implemented and
fielded in a series of commercial products, which supported a variety
of operating systems and applications environments. The STALKER product
was withdrawn from the market when Network Associates acquired Haystack
Laboratories in 1997, but the design has recently been returned to the
market in the Cybercop Monitor.9
A language-based optimization of network monitoring
and analysis functions is provided by the Network Flight Recorder, a network
monitoring system that serves as the basis for some commercial network-based
intrusion detection products. The N programming language is an interpreted
language operating on a byte-code instruction set and implementing a simple stack machine.
The language includes flow control, procedures, and 64-bit integer counter
data types. It is customized to support network packet filter construction.
Although these packet filters can be programmed to recognize network attacks,
they can also be programmed to recognize other network activity.
Filters are written in N-code, which is input to the
network monitoring engine, compiled, and stored as byte-code instructions
to optimize filtering performance. Network packet traffic is reassembled using
a table structure, which preserves the state of each current network session.
This state preservation permits matching signatures against the entire lifetime
of a connection or traffic stream.
The underlying engine also keeps statistics about the network performance,
including timing statistics regarding packet arrival rate and network errors
(as evidenced by broken packets or duplicate packet traffic). The engine also
uses a statistical calculation to determine how long to retain state information
about a connection before discarding it. In hybrid intrusion detection systems,
this statistical information can be passed to an anomaly detection component.
Under the N-code packet filtering mechanism, the language binds a filter
to the reception of a network packet. It can also support other specified
events. The filter collects information from the packet, exporting it through
either an alert or record mechanism. Alert mechanisms send alert messages
to an alert management system, whereas record mechanisms send a data structure
to a recorder function for various types of additional back-end processing.
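A rough Python stand-in for an N-code filter bound to packet reception is shown below; the packet fields are assumed, and real N-code is a distinct compiled filter language rather than Python:

```python
# Sketch of the filter model: a function bound to packet arrival pulls
# fields from the packet and exports them through an alert mechanism
# (for the alert manager) or a record mechanism (for back-end processing).

alerts, records = [], []

def on_packet(pkt):
    if pkt["dst_port"] == 23:                       # telnet traffic
        alerts.append(f"telnet from {pkt['src']}")  # alert mechanism
    records.append((pkt["src"], pkt["dst_port"]))   # record mechanism

for pkt in [{"src": "10.0.0.9", "dst_port": 23},
            {"src": "10.0.0.7", "dst_port": 80}]:
    on_packet(pkt)

print(alerts)
print(len(records))
```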
The N language is included in the Network Flight Recorder product,
which is available in source code form
from the developers.10
Although most current intrusion detection is based on
real-time collection and analysis of event data, some approaches involve working
with audit data archives, searching for evidence of interesting patterns of
activity, or providing the ability to isolate the activities affecting a particular
object or involving a particular user. This is of special interest to investigators
and incident handlers.
One such approach comes from Ross Anderson and Abida Khattak of the
University of Cambridge. They propose a functional separation between intrusion
detection systems that discover new attacks and systems that allow security
administrators to find instances of the attacks after they are identified.
To handle the latter problem, Anderson and Khattak propose the use of information
retrieval (IR) techniques. These techniques are currently widely utilized in the form of
search engines (such as AltaVista) for the World Wide Web.
IR systems utilize inverted files as indexes that allow efficient searching
for keywords and combinations of keywords. These systems also utilize algorithms
that learn a user's preferences and use Bayesian inference to help refine
searches. These differ from data mining in that they rely on indexing, not
machine learning, to discover patterns of data.
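The inverted-file idea can be sketched in a few lines of Python: index the archive once, then answer keyword queries without rescanning it. The lastcomm-style lines below are fabricated examples:

```python
# Tiny inverted index: map each keyword to the set of log lines
# containing it, so searches touch only the index.

from collections import defaultdict

def build_index(lines):
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for word in line.split():
            index[word].add(lineno)
    return index

def search(index, *keywords):
    """Line numbers containing every keyword (AND query)."""
    sets = [index.get(k, set()) for k in keywords]
    return sorted(set.intersection(*sets)) if sets else []

lastcomm = [
    "ls alice ttyp1 0.02 secs",
    "crack mallory ttyp2 901.12 secs",
    "ls mallory ttyp2 0.01 secs",
]
index = build_index(lastcomm)
print(search(index, "crack", "mallory"))  # lines matching both keywords
```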
This approach is limited to reviewing audit information after the fact
(in batch mode). The researchers constructed a prototype utilizing the UNIX lastcomm
system log and GLIMPSE, a search engine developed by the University of Arizona.
They used a Perl script to sort the file into a selection of smaller files,
one for each user. A GLIMPSE search was entered, searching for command-line
sequences that matched the sequences associated with a particular attack.
This process quickly and reliably located the attacks.
This approach is simple, yet powerful, and significant efficiencies are
associated with the use of the IR techniques to perform the detection activity.
The index utilized by GLIMPSE is compact (about 2% of the indexed material)
and can serve as an effective data reduction mechanism for audit trails.
The presence of the index can also serve as a fail-safe mechanism, revealing
situations in which hackers alter audit information to cover their tracks.
Because GLIMPSE and proposed audit data sources are either free or included
with standard operating system packages, this approach is also inexpensive,
an advantage for many organizations that cannot afford commercial intrusion detection products.
Anomaly detection involves a process of establishing profiles of normal
user behaviors, comparing actual user behavior to those profiles, and flagging
deviations from the normal. The basis of anomaly detection is the assertion
that abnormal behavior patterns indicate misuse of systems. Profiles
are defined as sets of metrics. Metrics are measures
of particular aspects of user behavior. Each metric is associated with a threshold
or range of values.
Anomaly detection depends on an assumption that users exhibit predictable,
consistent patterns of system usage. The approach also accommodates adaptations
to changes in user behavior over time. The completeness of anomaly detection
has yet to be verified (no one knows whether any given set of metrics is rich
enough to express all anomalous behavior). Thus, additional research is required
to know whether anomaly detection will ever be able to detect all scenarios
of interest, representing a strong protection mechanism for systems.
Dorothy Denning, in her landmark 1986 paper outlining the IDES model for intrusion
detectors, asserts that four statistical models may be included in the system.
Each model is deemed suitable for a particular type of system metric.
First is the operational model. This model applies
to metrics such as event counters for the number of password failures in a
particular time interval. The model compares the metric to a set threshold,
triggering an anomaly when the metric exceeds the threshold value. This model,
which applies to misuse detection as well as anomaly detection, corresponds
to threshold detection, covered later in this section.
Denning's second detection model proposes
a classic mean and standard deviation characterization of data. The assumption
is that all the analyzer knows about system behavior metrics are the mean
and standard deviations as determined from the first two moments. A new behavior
observation is defined to be abnormal if it falls outside a confidence interval.
This confidence interval is defined as d standard deviations
from the mean for some parameter d. Denning hypothesizes
that this characterization is applicable to event counters, interval timers,
and resource measures. She also alludes to the ability to assign weights to
computations, such that more recent data is assigned a greater weight.
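A minimal sketch of the mean and standard deviation model follows, using the parameter d from the text; the login counts are illustrative, and the optional recency weighting Denning alludes to is omitted:

```python
# Mean and standard deviation model: an observation is abnormal if it
# falls outside the confidence interval of d standard deviations from
# the mean of the historical data.

import statistics

def is_abnormal(history, observation, d=2.0):
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    return abs(observation - mean) > d * stdev

logins_per_day = [3, 4, 5, 4, 4]          # historical event counts
print(is_abnormal(logins_per_day, 4))     # within mean +/- 2 stdev
print(is_abnormal(logins_per_day, 12))    # well outside the interval
```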
The multivariate model, the third of Denning's detection models, is an extension
to the mean and standard deviation model. It is based on performing correlations
among two or more metrics. Therefore, instead of basing the detection of an
anomaly strictly on one measure, you might base it on the correlation of that
measure with another measure. So instead of detecting an anomaly based solely
on the observed length of a session,
you might base it on the correlation of the length of the session with the
number of CPU cycles utilized.
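The correlation underpinning the multivariate model can be computed directly; the session-length and CPU figures below are illustrative:

```python
# Multivariate model sketch: measure how strongly two behavior metrics
# (session length and CPU used) move together, so anomalies can be
# judged on the pair rather than either measure alone.

import statistics

def correlation(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

# Historical sessions: length in minutes vs. CPU seconds used
length = [10, 20, 30, 40]
cpu = [5, 11, 14, 22]
r = correlation(length, cpu)
print(round(r, 2))  # close to 1.0: long sessions normally use more CPU
```

A short session that nonetheless consumes heavy CPU would violate this learned correlation even though each measure alone might pass its own threshold.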
The final, most complex of
Denning's models is the Markov process model, which is limited to event counters. Under this model, the detector
considers each different type of audit event as a state variable and uses
a state transition matrix to characterize the transition frequencies between
states (not the frequencies of the individual states/audit records). A
new observation is defined as anomalous if its probability, as determined
by the previous state and value in the state transition matrix, is too
low. This allows the detector to spot unusual command or event sequences,
not just single events. This introduces the notion of performing stateful
analysis of event streams.12
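A sketch of the state transition matrix idea in Python, with made-up transition probabilities and an assumed probability floor:

```python
# Markov-style event sequence check: a new event is judged by the
# probability of the transition from the previous event type, not by
# the frequency of the event on its own.

transition_prob = {            # learned from historical audit data
    ("login", "read"): 0.6,
    ("login", "chmod"): 0.01,
    ("read", "write"): 0.3,
}

def is_anomalous(prev_event, event, floor=0.05):
    """Flag transitions whose historical probability is too low."""
    return transition_prob.get((prev_event, event), 0.0) < floor

print(is_anomalous("login", "read"))   # a common sequence: not flagged
print(is_anomalous("login", "chmod"))  # rare right after login: flagged
```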
The most commonly used anomaly detection approach is quantitative
analysis in which detection rules and attributes are expressed in numeric
form. Denning includes this category of measures in her operational model.
This set of techniques often presumes some computation, which can range from
simple addition to more complex cryptographic calculations. The results of
these techniques can be the basis for misuse detection signatures and anomaly
detection statistical models alike. This section describes several common
quantitative analyses and provides an example of an operational system that
utilizes these techniques to accomplish data reduction and intrusion detection.
Probably the most common form of quantitative analysis is threshold
detection (also known in some circles as thresholds and
triggers). In threshold detection, certain attributes of user and
system behaviors are characterized in terms of counts, with some level
established as permissible. The classic example of a threshold is the number
of permissible unsuccessful logins to a system. Virtually every early intrusion
detection system contained a detection rule defining an intrusion in terms
of this measure.
Other thresholds include the number of network connections of a particular
type, the number of attempted file accesses, the number of files or directories
accessed, and the number of network systems accessed. An inherent assumption
in threshold detection is that the measurement is made over a particular time
interval. This interval can be fixed in time (for instance, the threshold
can be reset to zero at a particular time of day) or function over a sliding
window (for example, the measurement is made over the last eight hours).
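The sliding-window variant can be sketched as follows; the limit and the eight-hour window are the examples from the text:

```python
# Threshold detection over a sliding time window: count only the events
# that fall within the last window_secs seconds.

from collections import deque

class SlidingThreshold:
    def __init__(self, limit, window_secs):
        self.limit, self.window = limit, window_secs
        self.times = deque()

    def event(self, now):
        self.times.append(now)
        # drop events that have aged out of the window
        while self.times and now - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) > self.limit   # threshold exceeded?

# three failed logins allowed per eight-hour (28800 s) window
t = SlidingThreshold(limit=3, window_secs=28800)
print([t.event(ts) for ts in [0, 100, 200, 300]])  # fourth event trips
```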
Heuristic threshold checks take simple threshold
detection a step further by adapting it to observed levels. This process increases
the accuracy of the detection, especially when performing detection over a
wide range of users or target environments. So, for instance, instead of having
a threshold detection rule triggering an alert when the number of failed logins
exceeds three in an eight-hour period, you can have a threshold detection
rule that triggers an alert when an abnormal number of failed logins occur. “Abnormal”
can be defined by various formulas. One that comes immediately to mind is
a Gaussian function (such as chi-square) in which the mean number of failed
logins is calculated and subsequent numbers of failed logins are compared
to the mean plus some standard deviation.
Another valuable quantitative analysis measure
is a target-based integrity check. This is a check for a change in a system
object that should not be subject to unpredictable change. The most common
example of such an integrity check utilizes a message digest function to calculate
a cryptographic checksum of the system object in question. After the checksum
is calculated and stored in a safe place (for instance, read-only media) the
system periodically recalculates the checksum, comparing it to the stored
reference value. If a differential is found, an alarm is raised. The Tripwire™
product, found in both public domain and commercially supported versions,
provides this capability.
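The recalculate-and-compare cycle can be sketched with Python's hashlib; the file contents below are fabricated, and a real deployment would keep the reference checksum on read-only media rather than in memory:

```python
# Target-based integrity check sketch: store a cryptographic checksum
# of a system object, recompute it later, and alarm on any differential.

import hashlib, os, tempfile

def checksum(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Simulate the reference snapshot and a later unauthorized change
fd, path = tempfile.mkstemp()
os.write(fd, b"root:x:0:0:/root:/bin/sh\n")
os.close(fd)

reference = checksum(path)       # the stored reference value
with open(path, "ab") as f:
    f.write(b"mallory:x:0:0::/bin/sh\n")   # tampering

alarm = checksum(path) != reference
print(alarm)                     # True: integrity alarm raised
os.remove(path)
```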
One of the more interesting uses of quantitative
analysis in early intrusion detection systems used quantitative measures to
perform data reduction. Data reduction is the process
of eliminating superfluous or redundant information from often-voluminous
event information. This reduces system storage loads and optimizes detection
processes based on the event information.
An example demonstrating the use of quantitative methods to support effective
data reduction comes from the NADIR system, developed by the Computing
and Communications Division of Los Alamos National Laboratory. The
NADIR developers utilized data profiling, which transforms user
activity from audit logs into vectors of quantitative measures (most of
them linear categorical or a combination of linear categorical and ordinal
data). The profiles are aggregated over time (with weekly summaries) as
well as over systems (with aggregate views of user activity per system).
The reduced data is subjected to both statistical and expert system examination,
with alarms and alerts handled by a staff investigator.13
The first worked examples of anomaly detection systems were based on
statistical measures. These included approaches such as those utilized in IDES, mentioned
earlier, and the follow-on NIDES project, as well as the Haystack system.
IDES and NIDES, developed by researchers at SRI International, were
two of the most
prominent early intrusion detection research systems. They were both hybrid
systems, including misuse and anomaly detection features; however, I focus
on the statistical analysis here.
The statistical analysis techniques employed for IDES and NIDES support
historical statistical profiles established and maintained for each user and
system subject. These profiles are updated periodically, with older data aged
so that the profiles adapt to reflect changes in user behavior over time.
The systems maintain a statistical knowledge base consisting of profiles.
Each profile expresses normal behaviors for a particular user in terms of
a set of measures or metrics. Once a day, new audit data is incorporated into
the knowledge base (after the old vectors are aged by an exponential decay
factor), based on the activity of the user during that day.
Let's take a look at how these statistics were calculated for IDES.
Each time an audit record is generated, a summary test statistic is computed.
This statistic, called the IDES score (IS), is calculated by the following formula:

IS = (S1, S2, S3, ..., Sn) C^-1 (S1, S2, S3, ..., Sn)^t

where C^-1 is the inverse of the correlation matrix of the measure vector
and (S1, S2, S3, ..., Sn)^t is the transpose of the vector. Each Sn
measures some aspect of behavior, such as file access, terminals used,
and CPU time used. Different Sn values can also represent
different views of the same aspect of behavior.14
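A worked two-measure instance of the IS computation in plain Python follows; the measure values and correlation matrix are illustrative, and the 2x2 inverse is computed by hand:

```python
# IS = S C^-1 S^t for a small vector of behavior measures S.

def ides_score(s, c_inv):
    # v = S * C^-1, then IS = v * S^t
    v = [sum(s[i] * c_inv[i][j] for i in range(len(s)))
         for j in range(len(s))]
    return sum(v[j] * s[j] for j in range(len(s)))

# Two standardized measures (say, file accesses and CPU time) and the
# hand-computed inverse of their correlation matrix [[1, 0.5], [0.5, 1]]
s = [1.0, 2.0]
c_inv = [[4/3, -2/3], [-2/3, 4/3]]

print(round(ides_score(s, c_inv), 2))  # 4.0
```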
Haystack, an anomaly detection system developed by
Tracor Applied Sciences and Haystack Laboratories for the U.S. Air Force,
employs a two-part statistical anomaly detection approach. The first measure
determines to what degree a user session resembles an established intrusion
type. This measure is calculated as follows:
The system maintains a vector of user behavior measures.
For each type of intrusion, the system associates a weight
with each behavior measure, reflecting the relevance of the measure to the
given intrusion type.
For each session, the vector of user behavior measures is
calculated and compared to the threshold vector.
Those behavior measures for which threshold settings are exceeded are noted.
The weights associated with the measures exceeding thresholds are summed.
The sum is used to assign a suspicion quotient to the session,
based on the distribution of the weighted intrusion scores for all previous sessions.
The second, complementary statistical method detects deviations in a
user's session activities from the normal user session profile. This method
looks for session statistics that significantly deviate from the normal
historical statistical profile for that user.15
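The weighted-sum step of Haystack's first measure can be sketched as follows; the measures, thresholds, and weights are invented for illustration, and the real system goes on to map the sum to a suspicion quotient via the historical score distribution:

```python
# Haystack-style weighted intrusion score: note which behavior measures
# exceed their thresholds, then sum the weights those measures carry
# for a given intrusion type.

def intrusion_score(session, thresholds, weights):
    exceeded = [m for m, v in session.items() if v > thresholds[m]]
    return sum(weights[m] for m in exceeded)

thresholds = {"files_read": 100, "cpu_secs": 500, "failed_su": 2}
# weights for a hypothetical "data theft" intrusion type
weights = {"files_read": 5, "cpu_secs": 1, "failed_su": 3}

session = {"files_read": 240, "cpu_secs": 90, "failed_su": 3}
print(intrusion_score(session, thresholds, weights))  # 5 + 3 = 8
```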
Statistical anomaly detection
analysis originally targeted intruders masquerading as legitimate users.
Although the assertion has been made that statistical analysis may also
detect intruders who exploit previously unknown vulnerabilities who could
not be detected by any other means, this assertion has yet to be proven
in production use of a system. Early researchers also hypothesized that
statistical anomaly detection could reveal interesting, sometimes suspicious,
activities that could lead to discoveries of security breaches. This assertion
was confirmed on at least one system, NADIR (in operation at Los Alamos
National Laboratory), where developers reported that some of the information
gained by using NADIR led to the discovery of system and security process
errors, as well as discoveries that allowed them to improve the general
management of Los Alamos's system complex.17
Another advantage often claimed for statistical analysis is that statistical
systems do not require the constant updates and maintenance that misuse detection
systems do. This claim may be true, but it depends on several factors. Metrics
must be well chosen, adequate for good discrimination, and well-adapted to
changes in behavior (that is, changes in user behavior must produce a consistent,
noticeable change in the corresponding metrics). If these conditions are met,
chances are excellent that the statistical analyzer will reliably detect behaviors
of interest without requiring on-the-fly modifications to the system.
On the other hand, statistical analysis systems have significant drawbacks.
First, most were designed to perform batch mode processing of audit records,
which eliminated the capability to perform automated responses to block damage.
This omission was not a problem at the time the systems were proposed because
early systems were designed to monitor audit trails from centralized, mainframe
target platforms. Although later systems attempted to perform real-time analysis
of audit data, the memory and processing loads involved in using and maintaining
the user profile knowledge base usually caused the system to lag behind the audit record stream.
A second drawback affects the range of events that statistical analysis
can characterize. The nature of statistical analysis precludes the capability
to take into account the sequential relationships between events. The exact
order of the occurrence of events is not provided as an attribute in most
of these systems. In other words, the event horizon for these systems is limited
to one event. Because many anomalies indicating attack depend on such sequential
event relationships, this situation represents a serious limitation to the approach.
In cases when quantitative methods (Denning's operational model) are
utilized, it is also difficult to select appropriate values for the thresholds.
The false alarm rates associated with statistical analysis systems are
high, which leads to users ignoring or disabling the systems. These false
alarms include both
type 1 (false negative) and type 2 (false positive) errors.
Early statistical approaches were similar in that they utilized parametric approaches
to characterizing the behavioral patterns of users and other system entities. Parametric
approaches refer to analytical approaches in which assumptions
are made about the underlying distribution of the data being analyzed. For
instance, in early versions of IDES and MIDAS the distributions of user usage
patterns were assumed to be Gaussian or normal.
The problem with these assumptions is that error rates are high when
the assumptions are incorrect. When researchers began collecting information
about system usage patterns that included attributes such as system resource
usage, the distributions were discovered not to be normal, and including these
measures led to high error rates.
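The parametric (Gaussian) assumption can be made concrete with a small sketch: flag any value lying more than three standard deviations from the historical mean. The sample history and the 3-sigma cutoff are illustrative assumptions, not drawn from IDES or MIDAS.

```python
# Minimal parametric sketch: assume a measure is normally distributed and flag
# values more than three standard deviations from the historical mean.
# The sample history and the 3-sigma cutoff are illustrative assumptions.
import statistics

history = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11]
mean = statistics.mean(history)
stdev = statistics.stdev(history)

def is_anomalous(value, mean, stdev, cutoff=3.0):
    return abs(value - mean) > cutoff * stdev

print(is_anomalous(12, mean, stdev))   # typical value -> False
print(is_anomalous(40, mean, stdev))   # far outside the assumed distribution -> True
```

When the real distribution is not normal, as the researchers found for resource-usage measures, a fixed sigma cutoff like this one misclassifies legitimate but skewed behavior, which is exactly the error-rate problem described above.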
Linda Lankewicz and Mark Benard of Tulane University proposed that a
way of overcoming these problems was to utilize nonparametric techniques for
performing anomaly detection. This approach provides the capability to accommodate
users with less predictable usage patterns and allows the analyzer to take
into account system measures that are not easily accommodated by parametric techniques.
The approach Lankewicz and Benard utilized involved nonparametric data
classification techniques, specifically clustering analysis. In
clustering analysis, large quantities of historical data are collected (a sample
set) and organized into clusters according
to some evaluation criteria (also known as features).
Preprocessing is performed in which features associated with a particular
event stream (often mapped to a specific user) are converted into a vector
representation (for example, Xi = [f1, f2, ..., fn] in an n-dimensional
feature space). A clustering algorithm is used to group vectors into classes of behaviors,
attempting to group them so that members of each class are as close as possible
to each other while different classes are as far apart as they can be.
In nonparametric statistical anomaly detection, the premise is that
a user's activity data, as expressed in terms of the features, falls into
two distinct clusters: one indicating anomalous activity and the other indicating normal activity.
Various clustering algorithms are available. These range from
algorithms that use simple distance measures to determine whether an object
falls into a cluster, to more complex concept-based measures (in which an
object is “scored” according to a set of conditions and that
score is used to determine membership in a particular cluster). Different
clustering algorithms usually best serve different data sets and analysis goals.
Researchers at Tulane found that the clustering algorithm that best
accomplished this goal using resource usage figures as evaluation criteria
was the k-nearest-neighbor algorithm. This groups each
vector with k of its nearest neighbors. k is
a function of the number of vectors in the sample set, not a fixed value.
Experimental results using this analysis technique showed that clusters
formed that reliably grouped similar system operations (such as compiling
or editing files) and also grouped activity patterns according to user.
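A rough illustration of the nearest-neighbor grouping idea follows, using invented resource-usage vectors rather than the Tulane data. Here k is passed as a fixed value for simplicity, whereas the actual algorithm derives it from the size of the sample set.

```python
# Illustrative sketch of grouping behavior vectors with their k nearest
# neighbors by Euclidean distance; the feature vectors are made up.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(vectors, target, k):
    """Return the k vectors in the sample set closest to `target`."""
    ranked = sorted(vectors, key=lambda v: euclidean(v, target))
    return ranked[:k]

# Resource-usage vectors, e.g. (cpu_seconds, files_opened):
compiles = [(90, 12), (95, 10), (88, 14)]
edits = [(5, 3), (4, 2), (6, 4)]
sample = compiles + edits

print(k_nearest(sample, (92, 11), 3))  # [(90, 12), (95, 10), (88, 14)]
```

The compile-like target groups with the compile-like vectors, mirroring the reported result that clusters reliably grouped similar system operations.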
The advantages of nonparametric approaches include the capability to perform
reliable reduction of event data (in the transformation of raw event data
to vectors). This reduction effect was documented as more than two orders
of magnitude. Other benefits are improvement in the speed of detection and
improvement in accuracy over parametric statistical analysis. Disadvantages
involve concerns that expanding features beyond resource usage would lessen
the efficiency and accuracy of the analysis.18
Another variation of anomaly detection is rule-based anomaly detection. The assumptions underlying
this approach are the same as those associated with statistical anomaly detection.
The main difference is that rule-based intrusion detection systems use sets
of rules to represent and store usage patterns. Two such approaches are covered
in this section: the Wisdom and Sense approach and the Time-Based Inductive Machine (TIM) approach.
The first rule-based anomaly detection system was
the Wisdom and Sense (W&S) system, developed by researchers at Los Alamos
National Laboratory and Oak Ridge National Laboratory. W&S can operate
on a variety of system platforms and can characterize activity at both operating
system and application levels. It provides two schemes for populating rule
bases: entering them manually (to reflect a policy statement) and generating
them from historical audit data. The rules are derived from historical audit
data by performing a categorical examination, expressing the patterns found
in terms of rules. The rules reflect the past behavior of system subjects
and objects and are stored in a tree structure, called a forest.
Specific data values within the audit records are grouped into thread
classes, with which collections of operations or rules are associated.
An example of a thread class is “all the records containing the
same user-file field values.” Rules are applied to the data in a
thread each time an activity associated with that thread occurs. Anomalies
are detected this way: When transactions are processed, they are compared
to the events of the matching thread to determine whether the events match
the historical patterns of activity or represent an anomaly.19
The TIM system, proposed by Teng, Chen, and Lu while they were associated with the Digital Equipment
Corporation, utilizes an inductive approach to dynamically generate rules
defining intrusion. The difference between TIM and other anomaly detection
systems is that TIM looks for patterns in sequences of
events, not in individual events. It effectively implements a Markov transition
probability model, as proposed by Denning in her seminal intrusion detection paper.
TIM observes historical event record sequences, characterizing the probability of particular sequences of events
occurring. Other anomaly detector systems measure whether or not the occurrence
of a single event represents a deviation from normal patterns of activity.
TIM focuses on sequences of events, checking to see whether a chain of events
corresponds to what would be expected based on its observation of historical event data.
For example, suppose events E1, E2,
and E3 are listed sequentially in an audit trail. TIM
characterizes the probability of the occurrence of E1
followed by E2 followed by E3,
based on the history of sequences it has observed in the past. TIM automatically
generates rules about the event sequences as it analyzes historical event
data, and then stores the rules in a rule base. Because TIM groups event sequences,
the amount of space required for the rule base is significantly smaller than
that required for a single-event-oriented rule-based system (such as Wisdom and Sense).
If a sequence of events matches the head of a rule, then the next event
is considered anomalous if it's not in the set of predicted events in the
body of the rule. The system also refines its analysis by deleting less predictive
rules from the rule base. (Rule 1 is more predictive
than rule 2 if rule 1 successfully predicts more events than rule 2 predicts.)
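The rule head/body mechanics can be sketched as follows. This is a simplified illustration assuming a fixed two-event prefix and invented event names; TIM's actual induction, probability weighting, and rule pruning are not reproduced.

```python
# Hedged sketch of TIM-style sequential rule induction: from historical event
# data, learn which events follow each observed two-event prefix (the rule
# head), then flag a new event as anomalous when its prefix matches a rule
# head but the event is absent from the predicted set (the rule body).
# Event names and the prefix length of 2 are illustrative choices.
from collections import defaultdict

def learn_rules(history, prefix_len=2):
    rules = defaultdict(set)
    for i in range(len(history) - prefix_len):
        prefix = tuple(history[i:i + prefix_len])
        rules[prefix].add(history[i + prefix_len])
    return rules

def is_anomalous(rules, prefix, event):
    """Anomalous only when the prefix matches a rule head and the event
    is not in the rule body's predicted set."""
    return tuple(prefix) in rules and event not in rules[tuple(prefix)]

history = ["login", "read", "compile", "login", "read", "edit"]
rules = learn_rules(history)
print(is_anomalous(rules, ["login", "read"], "compile"))  # predicted -> False
print(is_anomalous(rules, ["login", "read"], "delete"))   # unpredicted -> True
```

Because rules are keyed by sequence prefixes rather than single events, the rule base stays compact, which is the space advantage noted above.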
The advantages for TIM, especially when compared to statistical measures,
are significant. This approach is well suited to environments where user patterns
differ significantly from user to user, but where each user exhibits consistent
behavior over time. Such an environment might be represented by a large corporation
in which different users are responsible for accounting, administrative, programming,
and personnel functions with very little crossover of user duties. This approach
is also well suited for environments in which threat is associated with a
few event types rather than the full complement of system events. Finally,
this approach is not subject to problems associated with session
creep, a defeat strategy associated with anomaly detection, in
which an attacker gradually alters his/her behavior pattern over time to train
the system to accept intrusive behavior as normal. This resistance to session
creep attacks is due to the fact that the semantics of event sequences are built into the detection rules.
However, as with other systems, weaknesses are associated with the TIM approach. It suffers the problem associated
with all learning-based approaches in that the effectiveness of the approach
depends on the quality of the training data. In learning-based systems the
training data must reflect normal activity for the users of the system. Furthermore,
the rules generated by this approach may not be comprehensive enough to reflect
all possible normal user behavior patterns. This weakness produces a large
false positive (type 2) error rate, especially at the beginning of the operation
of the system. The error rate is high because if an event does not match the
head of any rule (that is, if the system did not encounter the event type
in the training data set), that event always triggers an anomaly.
This approach served as the basis for the Digital Equipment Corporation
Polycenter intrusion detection product and forms the foundation for much
subsequent anomaly detection research.20
Neural networks use adaptive learning
techniques to characterize anomalous behavior. This nonparametric analysis
technique operates on historical sets of training data, which are presumably
cleansed of any data indicating intrusions or other undesirable user behavior.
Neural networks consist of numerous simple processing elements called units
that interact by using weighted connections. The knowledge
of a neural network is encoded in the structure of the net in terms of connections
between units and their weights. The actual learning process takes place by
changing weights and adding or removing connections.
Neural network processing involves two stages. In the first stage (corresponding
to the “building the detector” stage of the intrusion analysis
model outlined earlier in this chapter), the network is populated by a training
set of historical or other sample data that is representative of user behavior.
In the second stage (corresponding to the second stage of the intrusion analysis
model), the network accepts event data and compares it to historical behavior
references, determining similarities and differences.
The network indicates that an event is abnormal by changing the state
of the units, changing the weights of connections, adding connections, or
removing them. The network also modifies its definition of what constitutes
a normal event by performing stepwise corrections.
Neural network approaches hold a great deal of promise for anomaly detection.
Because they don't use a fixed set of features to define user behaviors, feature
selection is irrelevant. Neural networks don't make prior assumptions on expected
statistical distribution of metrics, so this method retains some of the advantages
over classic statistical analysis associated with other nonparametric techniques.
Among the problems associated with utilizing
neural networks for intrusion detection is a tendency to form mysterious
unstable configurations in which the network fails to learn certain things
for no apparent reason. However, the major drawback to utilizing neural
networks for intrusion detection is that neural networks don't provide any
explanation for the anomalies they find. This limitation impedes the ability
of users to establish accountability or otherwise address the roots of the
security problems that allowed the detected intrusion, and it makes such
systems poorly suited to the needs of security managers. Although some researchers have
proposed hybrid approaches as a means of overcoming these disadvantages,
no published figures yet indicate the feasibility of neural network approaches.21
Some recent intrusion detection approaches fit neither misuse detection
nor anomaly detection categories. These schemes may be applicable to either
problem, perform precursor activity that can drive or refine either form of
detection, or depart from the traditional monolithic view of intrusion detection
in ways that affect detection strategies.
In an innovative and promising research project, researchers at the
University of New Mexico (Forrest, Hofmeyr, and Somayaji, among others) took
a fresh look at the entire question of computer security. The question posed
by the researchers was, “How does one equip computer systems with the
means to protect themselves?” In answering this question, they noted
marked similarities between biological immune systems and system protection mechanisms.
The key to both systems' functioning well is the capability to perform “self/nonself”
determination--that is, the capability of an organism's immune system
to determine which materials are harmless entities (such as the organism's
own cells) and which are pathogens and other dangerous factors. As the immune system
performs this determination by using peptides, short protein fragments, the
researchers decided to focus on some computer attribute that could be considered
analogous to peptides. The team hypothesized that sequences of UNIX system
calls could satisfy those requirements.
In deciding to consider system calls as a primary source of information,
the researchers considered a variety of goals for the data, including data
volume, capability to reliably detect misuse, and suitability for encoding
in a fashion appropriate for advanced pattern-matching techniques. They chose
to focus on short sequences of the system calls, ignoring the parameters
passed to the calls and looking only at their temporal order.
The system as first proposed performs anomaly detection. (It can also
perform misuse detection.) The system complies with the two-phase intrusion
detection analysis process, with the first phase building a knowledge base
that profiles normal behavior. This profile is a bit different from others
discussed in this chapter in that here the behavior characterized is not user-centric,
but system-process centric. Deviations from this profile are defined as anomalous.
In the second phase of the detection system, the profiles are used to monitor
subsequent system behavior for anomalies.
Sequences of system calls that result from running privileged processes
were collected over time. The profiles for the system consisted of unique
sequences of length 10. Three measures were utilized to characterize deviations
from normal process behaviors: successful exploits, unsuccessful exploits,
and error conditions.
The results were extremely promising because the three measures allowed
the detection of several sorts of anomalous behavior spanning several historically
problematic UNIX programs. The research also showed that the sets of execution
sequences were remarkably compact.22
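In the spirit of this approach, a profile of unique call sequences and a mismatch count can be sketched as follows. The traces are toy data, and a window of length 3 stands in for the length-10 sequences used in the research; the real system's measures and data collection are not reproduced.

```python
# Rough sketch, in the spirit of the New Mexico work: build a profile of
# unique fixed-length system-call sequences from a trace of normal behavior,
# then count windows in a new trace that never appeared during training.
# Call names, traces, and the window length of 3 are illustrative assumptions.

def sequences(trace, length):
    """All unique sliding windows of the given length over the trace."""
    return {tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)}

def mismatches(profile, trace, length):
    """Number of unique windows in `trace` never seen during normal operation."""
    return sum(1 for s in sequences(trace, length) if s not in profile)

normal = ["open", "read", "write", "close", "open", "read", "write", "close"]
profile = sequences(normal, 3)

print(mismatches(profile, ["open", "read", "write", "close"], 3))   # 0
print(mismatches(profile, ["open", "exec", "socket", "close"], 3))  # 2
```

The compactness result reported above corresponds to the fact that the profile is a small set of unique windows, far smaller than the raw trace it summarizes.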
Subsequent research compared different approaches for characterizing normal
behavior. It explored whether more powerful data modeling methods significantly
improved the performance of this approach when monitoring more complex systems.
Somewhat surprisingly, even powerful data modeling techniques (for example,
hidden Markov models, which are extremely reliable, though computationally
greedy) did not give significantly better results than the simpler time-sequence-based method.
Although the self/nonself techniques appear to constitute an extremely
powerful and promising approach, they do not constitute a complete solution to the intrusion
detection problem. Some attacks, including race conditions, masquerading,
and policy violations, do not involve the use of privileged processes. Therefore,
these attacks are not subject to detection using this approach.23
Another, more sophisticated detection approach utilizes genetic algorithms
to perform analysis of event data.
A genetic algorithm is an instance of a class of algorithms called evolutionary
algorithms, which incorporate concepts of Darwinian natural selection (survival
of the fittest) to optimize solutions to problems. Genetic algorithms utilize
encoded forms (known as chromosomes)
with methods that allow combination or mutation of the chromosomes to form
new individuals. These algorithms are recognized for their capability to
deal with multidimensional optimization problems in which the chromosome
is composed of encoded values for the variables being optimized.24
In the eyes of the researchers investigating genetic algorithm approaches
to intrusion detection, the intrusion detection process involves defining
hypothesis vectors for event data, where the vector either indicates an intrusion
or does not. The hypothesis is then tested to determine whether it is valid,
and an improved hypothesis is devised and tried based on the results of the
test. This process repeats until a solution is found.
The role of genetic algorithms in this process is to devise the improved
hypothesis. Genetic algorithm analysis involves two steps. The first step
involves coding a solution to the problem with a string of bits. The second
step is finding a fitness function to test each individual of the population
(for instance, all the possible solutions to the problem) against some evaluation criterion.
In the system GASSATA,
developed by Ludovic
Mé of Supelec, the French engineering university, genetic algorithms
are applied to the problem of classifying system events by using a set of
hypothesis vectors, H (one vector per event stream of interest) of n
dimensions (where n is the number of potential known
attacks). Component Hi is defined to be 1 if the corresponding
attack is present and 0 if it isn't.
The fitness function has two parts. First, the risk that a particular
attack represents to the system is multiplied by the value of the hypothesis
vector. The product is then adjusted by a quadratic penalty function to eliminate
unrealistic hypotheses. This step improves the discrimination among the possible
attacks. The goal of the process is to optimize the results of this analysis
so that the probability of a detected attack being real approaches 1 and the
probability of a detected attack being false approaches 0.
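A generic genetic-algorithm skeleton makes the select/combine/mutate cycle concrete. The toy fitness function (counting 1 bits) is a stand-in only; GASSATA's risk-weighted, penalty-adjusted fitness function is not reproduced here, and all parameters are illustrative.

```python
# Generic genetic-algorithm skeleton over bit vectors (chromosomes).
# The fitness function below is a toy stand-in, not GASSATA's actual score.
import random

random.seed(0)

def fitness(chromosome):
    return sum(chromosome)  # stand-in objective: maximize the number of 1 bits

def evolve(pop_size=20, n_bits=12, generations=60, mutation_rate=0.05):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]       # selection: keep the fittest half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n_bits)  # single-point crossover
            child = a[:cut] + b[cut:]
            # mutation: flip each bit with small probability
            child = [bit ^ (random.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))  # the best score found; the maximum possible is 12
```

In GASSATA the chromosome is the hypothesis vector H itself, and the fitness function rewards explaining high-risk attacks while penalizing hypotheses inconsistent with the audit trail.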
Experimental results for the genetic algorithm approach
are encouraging. In experimental runs, the mean probability of true positives
(accurate detection of real attacks) was 0.996, and the mean probability of
false positives (detection of nonattacks) was 0.0044. The time required to
construct the filters is also encouraging. For a sample set of 200 attacks,
it took the system 10 minutes and 25 seconds to evaluate audit records generated
by an average user over 30 minutes of intensive system use.
The following drawbacks are noted in this approach to misuse detection:
The system can't take into account attacks characterized by event absence (for instance,
rules in the form of “programmer does NOT use cc as compiler”).
Because of the binary expressional form for individual event
streams, the system can't detect multiple simultaneous attacks. The possibility
exists that nonbinary genetic algorithm approaches could solve this problem.
If the same event or group of events is common to several
attacks and an attacker uses this commonality to execute multiple simultaneous
attacks, the system can't find an optimal hypothesis vector.
Perhaps the largest drawback is that the system doesn't precisely locate
attacks in the audit trail, so the results of the detector carry no
sense of temporality. (Note that this drawback is similar
to problems noted in neural network approaches.) Therefore, genetic
algorithm approaches must be backed up with post hoc search or other
investigative aids if such support is required.25
Agent-based approaches to intrusion
detection are based on software entities that perform certain security monitoring
functions at a host. They function autonomously--that is, they are controlled
only by the operating system, not by other processes. Agent-based approaches
also run continuously with the understanding that this type of operation allows
them to learn from experiences, as well as to communicate and cooperate with
other agents of similar construction.
Agent-based detection approaches can be very powerful due to the range
of capabilities with which one can imbue an agent. An agent can be extremely
simple (for example counting the number of times a particular command is invoked
in a time interval) or complex (looking for evidence of a set of attacks for
a particular environment), depending on the whim of the developer.
The range of agent capabilities can allow an agent-based intrusion detection
system to provide a mixture of anomaly detection and misuse detection capabilities.
For instance, an agent can be programmed to adapt its detection capabilities
to changes in the local environment. It can also monitor for very subtle patterns
over a long time interval, thereby detecting slow attacks. Finally, an agent
can provide extremely fine-grained responses to a detected problem (for instance, changing
the priority level of a process, effectively slowing it down).
A prototype of an agent-based intrusion detection system,
Autonomous Agents for Intrusion Detection (AAFID), was
developed by researchers at Purdue University. It serves as the basis for
this discussion of agent-based solutions.
This architecture for agent-based intrusion detection systems calls
for a hierarchically ordered control and reporting structure for agents, as
pictured in Figure 4.7. Any number of agents
can reside on a host. All agents on a particular host report their findings
to a single transceiver. Transceivers monitor the operation of all the agents
running in the host, with capabilities to issue start, stop, and reconfiguration
commands to agents. Transceivers also perform data reduction on information
reported by the agents and report results to one or more monitors, the next
level of the hierarchy.
Monitors, which can be hierarchically structured in multiple layers,
control and consolidate information from several transceivers. The architecture
of AAFID allows redundancy in reporting from transceivers to monitors so that
the failure of a monitor doesn't jeopardize the operation of the intrusion
detection system. Monitors have the capability to access data from the entire
network and therefore perform higher-level aggregation of results from transceivers.
This feature enables the system to detect multihost attacks. Through a user
interface, the user of the system enters commands to control the monitors.
They, in turn, control the transceivers based on these user commands.
APIs exported by each component accomplish communications between agents,
transceivers, monitors, and users.
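The agent/transceiver/monitor hierarchy might be sketched as follows. The class names, the finding format, and the reduction step are illustrative choices, not AAFID's actual interfaces or APIs.

```python
# Toy sketch of an AAFID-style reporting hierarchy: agents report findings to
# their host's transceiver, which reduces them and reports upward to a monitor
# that consolidates across hosts. All names and formats are illustrative.

class Agent:
    def __init__(self, name, check):
        self.name, self.check = name, check

    def run(self, event):
        """Return a finding if this agent's check flags the event."""
        return (self.name, event) if self.check(event) else None

class Transceiver:
    def __init__(self, agents):
        self.agents = agents

    def collect(self, events):
        # Data reduction: keep only the events some agent flagged.
        return [f for e in events for a in self.agents if (f := a.run(e))]

class Monitor:
    def __init__(self, transceivers):
        self.transceivers = transceivers  # host name -> Transceiver

    def aggregate(self, events_by_host):
        # Higher-level aggregation across hosts (enables multihost correlation).
        return {host: self.transceivers[host].collect(events)
                for host, events in events_by_host.items()}

failed_login = Agent("failed-login", lambda e: e == "login_failure")
monitor = Monitor({"hostA": Transceiver([failed_login])})
print(monitor.aggregate({"hostA": ["login_ok", "login_failure"]}))
# {'hostA': [('failed-login', 'login_failure')]}
```

Layering further monitors above this one, each consuming the dictionaries produced below it, gives the multi-level structure the architecture describes.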
The advantages of AAFID and other such agent-based approaches
include the following:
They appear to be more resistant to insertion and evasion
attacks than other intrusion detection systems.
The architecture is more easily scaled, with provisions for
adding new components or replacing old ones as needed.
Agents can be tested independent of the full system before
they are deployed.
Because they can intercommunicate, agents can be deployed
in groups in which each performs a different, simple function but contributes
to a complex result.
The following deficiencies are also associated with the AAFID architecture:
Monitors are single points of failure. If a monitor ceases
to work, all the transceivers under its control stop producing useful information.
Possible solution strategies exist, but they are as yet untested.
If duplicated monitors are used to address the first problem,
it is difficult to deal with the consistency and duplication of information.
This situation requires additional mechanisms.
Access control mechanisms to allow different users to have
different modes of access to the intrusion detection system are missing. This
significant deficit must be addressed at each level of the architecture.
Problems occur because of the propagation time for evidence
of attackers to reach the monitor level. This problem is common to all distributed
intrusion detection systems.
As in the rest of intrusion detection,
a significant need remains for more insight regarding designing user
interfaces for intrusion detection systems. This need extends from
presentation schemes to policy structures and specification schemes.26
A second architecture that utilizes distributed agent approaches to performing intrusion detection
is the EMERALD system, researched and prototyped by Phillip Porras (whose
previous intrusion detection work includes the STAT state transition analysis
system) of SRI International. EMERALD includes numerous local monitors in
a framework that supports distributing local results to a global array of detectors
that, in turn, consolidate alarms and alerts.
EMERALD, like IDES and NIDES, SRI's previous intrusion detection systems,
is a hybrid intrusion detector, utilizing both signature analysis and statistical
profiling to detect security problems.
EMERALD is notable in that it separates the analysis semantics from
the analysis and response logic, thereby enabling much easier integration
throughout the network. EMERALD also supports analysis at different layers
of abstraction, an important capability. Furthermore, the design supports interoperability,
another important issue in modern intrusion detection systems.
The central component of EMERALD is the EMERALD Service Monitor, which
is similar in form and function to the AAFID autonomous agent. Service monitors
are programmable to perform any function and are deployed to hosts. They can
be layered, to support hierarchical data reduction, with service monitors
performing some local analysis and reporting results and additional information
to higher-level monitors. This approach yields some scalability not commonly
found in network intrusion detection products. The system also supports a
wide range of automated response, which is of great interest to customers
responsible for critical large-scale networks.
EMERALD, as it builds on considerable
corporate insight gathered in the IDES and NIDES efforts, holds great
promise for protecting large distributed networks.27
An approach that is similar to some of the rule-based anomaly detection
efforts involves utilizing data mining techniques to build intrusion detection
models. The objective of this approach is to discover consistent useful patterns
of system features that can be used to describe program and user behaviors.
These sets of system features, in turn, can then be processed by inductive
methods to form classifiers (detection engines) that can recognize anomalies
and misuse scenarios.
Data mining refers to the process of extracting
models from large bodies of data. These models often discover facts in the
data that are not apparent through other means of inspection. Although many
algorithms are available for data mining purposes, the three that are most
useful for mining audit data are classification, link analysis,
and sequence analysis.
Classification assigns a data item to
one of several predefined categories. (This step is akin to sorting data into “bins,”
depending on some criteria.) Classification algorithms output classifiers,
such as decision trees or rules. In intrusion detection, an optimal classifier
can reliably identify audit data as falling into a normal or abnormal category.
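A minimal illustration of classification is a decision stump: a single learned threshold that sorts audit records into normal and abnormal bins. The feature values and labels below are invented, and real classifiers (decision trees, rule learners) are far richer than this sketch.

```python
# Decision-stump sketch of classification into "normal"/"abnormal" bins:
# learn the single threshold on one feature that best separates the training
# labels. The feature values and labels are illustrative assumptions.

def learn_stump(labeled):
    """Pick the threshold that minimizes misclassifications on the sample."""
    best = None
    for t in sorted(v for v, _ in labeled):
        errors = sum((v > t) != (label == "abnormal") for v, label in labeled)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

training = [(2, "normal"), (3, "normal"), (4, "normal"),
            (9, "abnormal"), (11, "abnormal")]
threshold = learn_stump(training)

def classify(value):
    return "abnormal" if value > threshold else "normal"

print(classify(1), classify(10))  # normal abnormal
```

A classification algorithm's output, the classifier itself (here, the learned threshold), is what a detection engine would then apply to new audit data.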
Link analysis identifies relationships
and correlations between fields in the body of data. In intrusion detection,
an optimal link analysis algorithm identifies the set of system features best
able to reliably reveal intrusions.
Sequence analysis models sequential patterns.
These algorithms can reveal which audit events typically occur together and
hold the key to expanding intrusion detection models to include temporal statistical
measures. These measures can provide the capability to recognize denial-of-service attacks.
Researchers have developed extensions to
standard data mining algorithms to accommodate some of the special needs
of audit and other system event logs. Initial results of experiments using
live data are interesting, but the work is not yet ready for transfer into
commercial products. Additional research is planned to refine the approach.29