Intrusion Detection

From: Intrusion Detection
Author: Rebecca Bace
Publisher: MTP

4.3. Techniques

Now that I have explained the hows and whys of analysis and have outlined a general process for performing analysis, it's time to outline the specific approaches. As mentioned in Chapter 2, “Concepts and Definitions,” analysis is divided into two parts.

Intrusion detection is composed of misuse detection, which searches event data for predefined patterns of misuse, and anomaly detection, which characterizes data in mathematical terms, searching event data for patterns of abnormality.

4.3.1. Misuse Detection

Misuse detection asks the following question about system events: Is this activity bad? Misuse detection involves encoding information about specific behaviors known to be indicators of intrusion and then filtering event data for these indicators.

To perform misuse detection, you need the following:

  • A good understanding of what constitutes a misuse behavior

  • A reliable record of user activity

  • A reliable technique for analyzing that record of activity

In a nutshell, misuse detection is best suited for reliably detecting known misuse patterns. It suffers the deficiency that you can detect only what you know about, although if you're clever, you can leverage the knowledge you have (for instance, of outcomes of attacks) to spot new exploits of old problems.

4.3.1.1. Production/Expert Systems

One of the earliest schemes for performing misuse detection was the use of production/expert systems. This approach was utilized in systems such as MIDAS, IDES, Next Generation IDES (NIDES), DIDS, and CMDS. In the case of MIDAS, IDES, and NIDES, the production system employed was P-BEST, designed by Alan Whitehurst. For DIDS and CMDS, the CLIPS system, a public domain system developed by the National Aeronautics and Space Administration (NASA), was used.

The advantage associated with using production systems is that the control reasoning of the systems is separated from the statement of the problem solution. This feature allows users to enter knowledge about attacks as if-then rules and then enter facts (in the form of audit events); the system evaluates those facts according to the knowledge entered. This process can occur without the user ever affecting (or understanding) the internal function of the production system; before production systems, users had to hard code the decision engine and rules in custom software, a difficult, time-consuming task.

The attack knowledge is entered using an if-then syntax. Conditions indicative of an intrusion are specified on the left side (the “if” part) of the rule. When they are satisfied, the rule performs the actions on the right side (the “then” part) of the rule.
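The rule below is a hypothetical Python illustration of the if-then idea, not actual P-BEST or CLIPS syntax: the "if" side filters audit events for a condition indicative of intrusion, and the "then" side fires an action when that condition is satisfied. The event fields and the three-failure rule are assumptions for the example.

```python
# Sketch of a production-system rule: the "if" side tests audit-event facts,
# the "then" side fires an action. Event fields and rule are hypothetical.

def failed_login_rule(events):
    """If 3+ failed logins from one source occur, then raise an alert."""
    failures = {}
    alerts = []
    for e in events:
        if e["type"] == "login" and not e["success"]:          # "if" conditions
            failures[e["source"]] = failures.get(e["source"], 0) + 1
            if failures[e["source"]] >= 3:
                # "then" action: report the suspicious source
                alerts.append(f"possible break-in from {e['source']}")
    return alerts

events = [
    {"type": "login", "success": False, "source": "10.0.0.9"},
    {"type": "login", "success": False, "source": "10.0.0.9"},
    {"type": "login", "success": False, "source": "10.0.0.9"},
]
print(failed_login_rule(events))
```

The point of the production-system separation is that only `events` and rules like `failed_login_rule` change from site to site; the evaluation machinery stays fixed.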

Some practical problems are associated with the use of production systems in intrusion detection.

  • They are ill-equipped to handle large volumes of data. This is true because the declarative representations used by production systems are generally implemented as interpreted systems, and interpreters are slower than compiled engines.

  • They do not provide any natural handling of sequentially ordered data.

  • The expertise reflected by the production system is only as good as the person on whose skills the system is modeled (the standard “garbage in, garbage out” effect endemic to computer systems).

  • They can detect only known intrusions.

  • They cannot handle uncertainty.

Maintaining the rule base can be problematic, as one must make changes to the rules considering the impact of the changes on the rest of the rules in the knowledge base.3

4.3.1.2. State Transition Approaches

State transition approaches to performing misuse detection structure the problem of misuse detection in a way that allows the use of optimized pattern-matching techniques. Their speed and flexibility make them among the most powerful intrusion detection capabilities at this time.

State transition approaches use expressions of system state and state transitions to describe and detect known intrusions. Several approaches are available for implementing state transition approaches to intrusion detection. The three major ones are state transition analysis, Colored Petri Nets (CP-Nets), and language- or Application Programming Interface (API)-based characterization of state transitions. In this section, I explore these processes, outlining the strategies each uses to characterize misuse patterns and then filter event data against them.

State Transition Analysis

State transition analysis is an approach to misuse detection using high-level state transition diagrams to represent and detect known penetration scenarios. This approach was first explored in the STAT system and its extension to UNIX network environments, USTAT. Both systems were developed at the University of California, Santa Barbara. Phillip Porras and Richard Kemmerer developed STAT; Koral Ilgun and Kemmerer developed USTAT.

A state transition diagram is a graphical representation of a penetration scenario. Figure 4.4 shows the components of a state transition diagram, as well as how they are used to represent a sequence. Nodes represent the states, and arcs represent the transitions. The concept of expressing intrusions in state transition form is rooted in the understanding that all intruders start with limited privileges and exploit system vulnerabilities to gain some outcome. Both the limited privilege starting point and the successful intrusion outcome can be expressed as system states.

In using state transition diagrams to characterize the intrusion sequences, the system can limit itself to expressing those key activities that result in a state change. The path between initial and intrusion state can be rather subjective; hence two persons can come up with different state transition diagrams that represent the same attack scenario. Each state consists of one or more state assertions (also shown in Figure 4.4).

Figure 4.4. State Transition Diagrams

State transition analysis systems utilize finite state machine graphs (finite automata) to model intrusions. The intrusion is composed of a sequence of actions that lead from some initial system state to an intrusion state. The initial state represents the state of the system before the intrusion is executed, and the intrusion state represents the state of the system upon the completion of the intrusion. The system state is described in terms of system attributes and/or user privileges, and each transition is driven by a user action. The state transition engine maintains a set of state transition diagrams, each representing a penetration scenario. At a given time, it's assumed that some sequence of actions has driven the system to a particular state in each diagram. When a new action takes place, the engine checks it against each state transition diagram to see whether the action drives the scenario to the next state. If the action nullifies the assertions of the current state, the inference engine moves the state transition back to the nearest state for which the assertions still hold. If the action drives the scenario to the end state, indicating an intrusion, the previous transition information is sent to the decision engine, which alerts the security officer to the presence of the intrusion.
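A much-simplified Python sketch of this engine follows: each penetration scenario is a chain of transition predicates, and an action satisfying the current predicate advances the scenario toward its intrusion state. (The rollback of nullified assertions is omitted, and the setuid-shell scenario is a hypothetical example, not a STAT signature.)

```python
# Simplified state transition engine: a scenario advances one state per
# matching action and alerts when the end (intrusion) state is reached.

class Scenario:
    def __init__(self, name, transitions):
        self.name = name
        self.transitions = transitions  # list of predicates: action -> bool
        self.state = 0                  # index of the current state

    def advance(self, action):
        """Return True if this action drives the scenario to its end state."""
        if self.state < len(self.transitions) and self.transitions[self.state](action):
            self.state += 1
        return self.state == len(self.transitions)

# Hypothetical scenario: copy a shell, make it setuid, execute it.
scenario = Scenario("setuid-shell", [
    lambda a: a == ("cp", "/bin/sh", "/tmp/x"),
    lambda a: a == ("chmod", "4755", "/tmp/x"),
    lambda a: a == ("exec", "/tmp/x"),
])

for action in [("cp", "/bin/sh", "/tmp/x"),
               ("chmod", "4755", "/tmp/x"),
               ("exec", "/tmp/x")]:
    if scenario.advance(action):
        print(f"ALERT: {scenario.name} intrusion state reached")
```

Note that unrelated actions interleaved with the three signature actions would simply fail the current predicate and leave the scenario waiting, which is how this structure tolerates slow or padded attacks.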

The advantages of the STAT approach follow:

  • State transition diagrams provide an intuitive, high-level, and audit-record-independent representation of penetration scenarios.

  • The transitions allow one to represent partial order of signature actions constituting an attack scenario.

  • A state transition diagram uses the smallest possible subset of signature actions that must occur for the penetration to be successful. Thus, the detector can generalize over variants of the same penetration.

  • The hard-link information maintained by the system makes it easier to express penetration scenarios.

  • The system can detect coordinated and slow attacks.4

Deficiencies of the STAT approach include the following:

  • The lists of state assertions and signature actions are hand coded.

  • The state assertions and signatures may not be powerful enough for expressing more elaborate penetration scenarios.

  • The evaluation of certain state assertions may require the inference engine to get additional information from the target system. This process could cause performance degradation.

  • The system cannot detect many common attacks, so it must be combined with other detectors for operational use.

  • The prototyped system is slow compared to other state transition-based approaches.5

Colored Petri-Net and IDIOT

Another state-transition-based approach to optimizing misuse detection is the Colored Petri (CP)-Net approach developed by Sandeep Kumar and Gene Spafford at Purdue University. This approach was implemented in the IDIOT system.

NOTE

The name IDIOT originated as a joke between Gene Spafford and the author; it stands for Intrusion Detection In Our Time, a wry commentary on the delays many members of the intrusion detection research community experienced in transferring research results to operational systems!

IDIOT uses a variation of CP-Nets to represent and detect intrusion patterns. Under this model, an intrusion is represented as a CP-Net in which the color of tokens in each state serves to model the context of the event. The signature matching is driven by the audit trail and takes place by moving tokens progressively from initial states to the final state (indicating an intrusion or attack). Guards define the context in which signatures are considered matched, and post actions are performed when the pattern is successfully matched.

At first glance, this approach might appear to be almost identical to the state transition approach of STAT. However, there are significant differences between the approaches. First, in STAT the intrusion is detected by the effect it has on the system state, that is, the outcome of the intrusion. In IDIOT the intrusion is detected by pattern-matching the signature that constitutes the penetration. In STAT guards are placed in the state, whereas in IDIOT the guards are incorporated in the transitions.

In IDIOT each intrusion signature is expressed as a pattern that represents the relationship among events and their context. This relationship pattern precisely represents a successful intrusion or its attempt. Vertices in the CP-Net graph represent system states. Intrusion patterns have preceding conditions and following actions associated with them. The scheme is independent of any underlying computational framework of matching and provides a model in which all categories in the classification are represented and matched.

This pattern-matching model consists of the following:

  • A context representation that allows the matching to correlate various events that constitute the intrusion signature

  • Semantics that accommodate the possibility of several intrusion patterns (possibly belonging to multiple event sources) being intermixed in the same event stream

  • An action specification that provides for the execution of certain actions when the pattern is matched.2
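A toy Python sketch of the token/guard idea follows. A token's context (its "color", here the file involved) moves through the states of a signature, and guards on the transitions accept an event only in the right context. The three-step signature is hypothetical and far simpler than a real CP-Net.

```python
# Toy CP-Net-style matching: a token carries context ("color") and moves
# from state to state only when a transition's guard accepts the event.

def match_signature(events):
    transitions = [
        # (event type, guard over event + token context, context update)
        ("open",  lambda e, ctx: True,                     lambda e: {"file": e["file"]}),
        ("write", lambda e, ctx: e["file"] == ctx["file"], lambda e: {}),
        ("exec",  lambda e, ctx: e["file"] == ctx["file"], lambda e: {}),
    ]
    state, ctx = 0, {}
    for e in events:
        etype, guard, update = transitions[state]
        if e["type"] == etype and guard(e, ctx):
            ctx.update(update(e))
            state += 1
            if state == len(transitions):
                return ctx          # final state reached: signature matched
    return None                     # stream ended before the final state

events = [{"type": "open",  "file": "/tmp/x"},
          {"type": "write", "file": "/tmp/x"},
          {"type": "exec",  "file": "/tmp/x"}]
print(match_signature(events))
```

The guard on each transition is what distinguishes this from plain string matching: a `write` to a different file does not advance the token, so only events correlated through the same context count toward a match.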

Figure 4.5 shows the CP-Net pattern for a TCP/IP connection (for a network connection not involving retransmissions).

Figure 4.5. CP-Net Pattern for TCP/IP Connection

The advantages associated with this approach to misuse detection are significant and numerous:

  • It is extremely fast. In experiments involving a nonoptimized version of IDIOT, for every hour of intense activity (generating C2 audit records), the detector required about 135 seconds to match about 100 intrusion patterns. This result represents a processing load of less than 5% for a Sun SPARCstation 5 generating about 6MB of audit data per hour.

  • The pattern-matching engine is independent of audit format, so you can apply it to IP packet streams and other detection problems.

  • The signatures are portable across audit trails so that they can be moved among different systems without having to rewrite them to accommodate vendor audit trail differences.

  • The patterns can be specified according to what needs to be matched, not how it is to be matched.

  • The sequencing and other ordering constraints on events can be represented directly.

  • The system provides a fine-grained specification of a successful pattern match (that is, it tells you why an intrusion was detected).

  • IDIOT provides a front-end language for encoding the graphical representation of the net.

  • The system allows you to specify an action that is executed upon the matching of the signature, thereby supporting automated responses.

The limitations of IDIOT are those of all misuse detection systems. Although the capability to characterize intrusions in terms of outcomes allows you to generalize some detection signatures, the system still cannot detect new attacks that it doesn't know about.6

Language/API-Based Approaches

A common strategy for optimizing commercial misuse detection tools is to devise a means for describing intrusions in a form that a detection engine can use. Although, as we discussed before, some production expert system languages (such as P-BEST and CLIPS) are available, they were designed for other uses. Three approaches to expressing intrusions for misuse detection purposes are the RUSSEL language developed by Mounji at Faculties Universitaires Notre-Dame de la Paix (in Namur, Belgium), the STALKER system, patented by Smaha and Snapp of Haystack Laboratories, and the N packet filtering language developed by Marcus Ranum and provided as part of the Network Flight Recorder.

RUSSEL

RUSSEL is a rule-based language that is designed to optimize the processing of unstructured data streams, specifically operating system audit trails. It is optimized for heterogeneous system environments. The goal of RUSSEL is to enable users to correlate events across multiple hosts and to support multiple levels of abstraction for events. RUSSEL is utilized in ASAX, a misuse detection system optimized for heterogeneous network systems. ASAX features the use of a common audit data format (called the Normalized Audit Data Format [NADF]) and provides a component that supports adaptive rules.

Event patterns are expressed in RUSSEL as guarded actions of the form Condition --> Action. This action can be specified at a level of abstraction that allows the intrusion detection user to specify responses to a given detection scenario.

The language is structured bottom up. It starts by asserting audit records as basic facts and then, based on these facts, tries to find audit record patterns that can be viewed as derived facts. The bottom-up structure is utilized because the audit records are not known at the time analysis starts and because it is more efficient than top-down strategies when dealing with large audit trails.7

STALKER

Another approach to characterizing intrusion for misuse detection purposes is the approach utilized in the STALKER system, a commercial misuse detection product.8 This patented approach utilizes a common audit data format (the SVR 4++ standard outlined in Chapter 7, “Technical Issues”) and a state-based data structure for attack signatures.

The detector is implemented as a finite state machine (the underlying technology for compilers), which is optimized for pattern-matching tasks such as misuse detection.

The misuse detector operates by passing audit records to the misuse detection engine. The engine maintains a set of detection signatures in the form of state transitions. The signature expression consists of a data structure that contains an initial state, an end state, and one or more sets of transition functions for each misuse. Figure 4.6 shows the structure and operation of the STALKER misuse detector.

This approach was successfully implemented and fielded in a series of commercial products, which supported a variety of operating systems and applications environments. The STALKER product was withdrawn from the market when Network Associates acquired Haystack Laboratories in 1997, but the design has recently been returned to the market in the Cybercop Monitor.9

Network Flight Recorder--N-Code

A language-based optimization of network monitoring and analysis functions is provided by the Network Flight Recorder, a network monitoring system that serves as the basis for some commercial network-based intrusion detection products. The N programming language is an interpreted language operating on byte-code instructions and implementing a simple stack machine.

Figure 4.6. STALKER Misuse Detection Approach

The language includes flow control, procedures, and 64-bit integer counter data types. It is customized to support network packet filter construction. Although these packet filters can be programmed to recognize network attacks, they can also be programmed to recognize other network activity.

Filters are written in N-code, which is input to the network monitoring engine, compiled, and stored as byte-code instructions to optimize filtering performance. Network packet traffic is reassembled using a table structure, which preserves the state of each current network session. This state preservation permits matching signatures against the entire lifetime of a connection or traffic stream.
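N-code itself is not shown here; the following Python sketch illustrates only the session-table idea the paragraph describes: packets are keyed by their connection tuple and their payloads accumulated, so a signature can be matched against the whole lifetime of a stream rather than against single packets. The packet fields and the signature are hypothetical.

```python
# Sketch of stream reassembly via a session table: signatures are matched
# against the accumulated payload of a connection, not individual packets.

sessions = {}   # (src, sport, dst, dport) -> accumulated payload bytes

def on_packet(pkt, signature=b"/etc/passwd"):
    key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])
    sessions[key] = sessions.get(key, b"") + pkt["payload"]
    if signature in sessions[key]:
        return f"ALERT: signature seen on {key}"
    return None

# The signature split across two packets still matches against the stream.
p1 = {"src": "10.0.0.5", "sport": 1025, "dst": "10.0.0.1", "dport": 80,
      "payload": b"GET /etc/pas"}
p2 = {"src": "10.0.0.5", "sport": 1025, "dst": "10.0.0.1", "dport": 80,
      "payload": b"swd HTTP/1.0"}
print(on_packet(p1))   # no alert yet: signature incomplete in the stream
print(on_packet(p2))   # alert fires on the reassembled stream
```

This is why state preservation matters: an attacker who fragments a request across packets defeats per-packet matching but not stream matching.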

The underlying engine also keeps statistics about the network performance, including timing statistics regarding packet arrival rate and network errors (as evidenced by broken packets or duplicate packet traffic). The engine also uses a statistical calculation to determine how long to retain state information about a connection before discarding it. In hybrid intrusion detection systems, this statistical information can be passed to an anomaly detection component.

Under the N-code packet filtering mechanism, the language binds a filter to the reception of a network packet. It can also support other specified events. The filter collects information from the packet, exporting it through either an alert or record mechanism. Alert mechanisms send alert messages to an alert management system, whereas record mechanisms send a data structure to a recorder function for various types of additional back-end processing.

The N language is included in the Network Flight Recorder product, which is available in source code form from the developers.10

4.3.1.3. Information Retrieval for Batch Mode Analysis

Although most current intrusion detection is based on real-time collection and analysis of event data, some approaches involve working with audit data archives, searching for evidence of interesting patterns of activity, or providing the ability to isolate the activities affecting a particular object or involving a particular user. This is of special interest to investigators and incident handlers.

One such approach comes from Ross Anderson and Abida Khattak of the University of Cambridge. They propose a functional separation between intrusion detection systems that discover new attacks and systems that allow security administrators to find instances of the attacks after they are identified. To handle the latter problem, Anderson and Khattak propose the use of information retrieval (IR) techniques. These techniques are currently widely utilized in the form of search engines (such as AltaVista) for the World Wide Web.

IR systems utilize inverted files as indexes that allow efficient searching for keywords and combinations of keywords. These systems also utilize algorithms that learn a user's preferences and use Bayesian inference to help refine searches. IR systems differ from data mining in that they rely on indexing, not machine learning, to discover patterns in data.

This approach is limited to reviewing audit information after the fact (in batch mode). The researchers constructed a prototype utilizing the UNIX lastcomm system log and GLIMPSE, a search engine developed by the University of Arizona. They used a Perl script to sort the file into a selection of smaller files, one for each user. A GLIMPSE search was entered, searching for command-line sequences that matched the sequences associated with a particular attack. This process quickly and reliably located the attacks.

This approach is simple, yet powerful, and significant efficiencies are associated with the use of the IR techniques to perform the detection activity. The index utilized by GLIMPSE is compact (about 2% of the size of the indexed material) and can serve as an effective data reduction mechanism for audit trails. The presence of the index can also serve as a fail-safe mechanism, revealing situations in which hackers alter audit information to cover their tracks. Because GLIMPSE and the proposed audit data sources are either free or included with standard operating system packages, this approach is also inexpensive, an advantage for many organizations that cannot afford other solutions.11
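The batch workflow just described (split the accounting log into one command stream per user, then search each stream for the command sequence tied to a known attack) can be sketched in Python as follows. The log format and the attack sequence are hypothetical stand-ins for lastcomm output and a GLIMPSE query.

```python
# Batch-mode search: split a command log per user, then look for an ordered
# attack command sequence in each user's stream.

from collections import defaultdict

log = [
    ("alice", "ls"), ("bob", "cp"), ("alice", "cat"),
    ("bob", "chmod"), ("bob", "sh"),
]
attack_sequence = ["cp", "chmod", "sh"]   # hypothetical attack signature

per_user = defaultdict(list)              # one stream per user (the Perl-script step)
for user, cmd in log:
    per_user[user].append(cmd)

def contains_sequence(cmds, seq):
    """True if seq occurs in cmds in order (not necessarily adjacent)."""
    it = iter(cmds)
    return all(s in it for s in seq)      # each membership test consumes the iterator

hits = [u for u, cmds in per_user.items() if contains_sequence(cmds, attack_sequence)]
print(hits)   # only bob's stream contains the ordered sequence
```

A real deployment would replace the in-memory scan with an inverted-file index so the per-attack search stays fast even over months of audit data.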

4.3.2. Anomaly Detection

Anomaly detection involves a process of establishing profiles of normal user behaviors, comparing actual user behavior to those profiles, and flagging deviations from the normal. The basis of anomaly detection is the assertion that abnormal behavior patterns indicate misuse of systems. Profiles are defined as sets of metrics. Metrics are measures of particular aspects of user behavior. Each metric is associated with a threshold or range of values.

Anomaly detection depends on an assumption that users exhibit predictable, consistent patterns of system usage. The approach also accommodates adaptations to changes in user behavior over time. The completeness of anomaly detection has yet to be verified (no one knows whether any given set of metrics is rich enough to express all anomalous behavior). Thus, additional research is required to know whether anomaly detection will ever be able to detect all scenarios of interest, representing a strong protection mechanism for systems.

4.3.2.1. Denning's Original Model

Dorothy Denning, in her landmark 1986 paper outlining the IDES model for intrusion detectors, asserts that four statistical models may be included in the system. Each model is deemed suitable for a particular type of system metric.

Operational Model

First is the operational model. This model applies to metrics such as event counters for the number of password failures in a particular time interval. The model compares the metric to a set threshold, triggering an anomaly when the metric exceeds the threshold value. This model, which applies to misuse detection as well as anomaly detection, corresponds to threshold detection, covered later in this section.

Mean and Standard Deviation Model

Denning's second detection model proposes a classic mean and standard deviation characterization of data. The assumption is that all the analyzer knows about system behavior metrics are the mean and standard deviations as determined from the first two moments. A new behavior observation is defined to be abnormal if it falls outside a confidence interval. This confidence interval is defined as d standard deviations from the mean, for some parameter d. Denning hypothesizes that this characterization is applicable to event counters, interval timers, and resource measures. She also alludes to the ability to assign weights to these computations, such that more recent data is assigned a greater weight.
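A minimal sketch of the model: flag any observation that falls outside mean ± d standard deviations of the historical data. The metric history and the choice d = 2 below are illustrative, not values from Denning's paper.

```python
# Mean and standard deviation model: an observation is anomalous when it
# falls outside the confidence interval mean +/- d * stddev.

from statistics import mean, stdev

history = [4, 5, 6, 5, 4, 6, 5, 5]   # e.g. logins per day for one user
d = 2                                 # confidence interval width parameter

m, s = mean(history), stdev(history)

def is_anomalous(x):
    return abs(x - m) > d * s

print(is_anomalous(5))    # typical value, inside the interval
print(is_anomalous(40))   # far outside the interval
```

The weighting Denning alludes to would replace the plain `mean`/`stdev` with exponentially decayed versions so that recent behavior dominates the profile.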

Multivariate Model

The multivariate model, the third of Denning's detection models, is an extension to the mean and standard deviation model. It is based on performing correlations among two or more metrics. Therefore, instead of basing the detection of an anomaly strictly on one measure, you might base it on the correlation of that measure with another measure. So instead of detecting an anomaly based solely on the observed length of a session, you might base it on the correlation of the length of the session with the number of CPU cycles utilized.

Markov Process Model

The Markov process model, the final and most complex of Denning's models, is limited to event counters. Under this model, the detector considers each different type of audit event as a state variable and uses a state transition matrix to characterize the transition frequencies between states (not the frequencies of the individual states/audit records). A new observation is defined as anomalous if its probability, as determined by the previous state and value in the state transition matrix, is too low. This allows the detector to spot unusual command or event sequences, not just single events. This introduces the notion of performing stateful analysis of event streams.12
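The model can be sketched as follows; the event types, transition probabilities, and threshold are all illustrative values, not data from any real audit trail.

```python
# Markov process model: the transition matrix holds observed frequencies of
# moving from one audit event type to the next; an event is anomalous if the
# transition from the previous event is too improbable.

transition_prob = {           # P(next event | previous event)
    ("login", "read"):  0.60,
    ("login", "exec"):  0.35,
    ("login", "chmod"): 0.05,
    ("read",  "read"):  0.70,
    ("read",  "write"): 0.30,
}
THRESHOLD = 0.10              # below this probability, flag the transition

def anomalous(prev_event, event):
    # Unseen transitions default to probability 0.0 and are always flagged.
    return transition_prob.get((prev_event, event), 0.0) < THRESHOLD

print(anomalous("login", "read"))    # common transition
print(anomalous("login", "chmod"))   # rare transition
```

Because the matrix conditions on the previous event, a command that is normal on its own can still be flagged when it appears in an unusual sequence, which is exactly the stateful property the text describes.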

4.3.2.2. Quantitative Analysis

The most commonly used anomaly detection approach is quantitative analysis in which detection rules and attributes are expressed in numeric form. Denning includes this category of measures in her operational model. This set of techniques often presumes some computation, which can range from simple addition to more complex cryptographic calculations. The results of these techniques can be the basis for misuse detection signatures and anomaly detection statistical models alike. This section describes several common quantitative analyses and provides an example of an operational system that utilizes these techniques to accomplish data reduction and intrusion detection goals.

Threshold Detection

Probably the most common form of quantitative analysis is threshold detection (also known in some circles as thresholds and triggers). In threshold detection, certain attributes of user and system behaviors are characterized in terms of counts, with some level established as permissible. The classic example of a threshold is the number of permissible unsuccessful logins to a system. Virtually every early intrusion detection system contained a detection rule defining an intrusion in terms of this measure.

Other thresholds include the number of network connections of a particular type, the number of attempted file accesses, the number of files or directories accessed, and the number of network systems accessed. An inherent assumption in threshold detection is that the measurement is made over a particular time interval. This interval can be fixed in time (for instance, the threshold can be reset to zero at a particular time of day) or function over a sliding window (for example, the measurement is made over the last eight hours).
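The sliding-window variant can be sketched as follows; the window length (eight hours) matches the example above, while the limit of three events is an illustrative choice.

```python
# Threshold detection over a sliding time window: count events of one type
# in the last N seconds and alarm when the count exceeds the limit.

from collections import deque

class SlidingThreshold:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.times = deque()

    def record(self, t):
        """Record an event at time t; return True if the threshold is exceeded."""
        self.times.append(t)
        while self.times and self.times[0] <= t - self.window:
            self.times.popleft()          # drop events outside the window
        return len(self.times) > self.limit

det = SlidingThreshold(limit=3, window_seconds=8 * 3600)
alarms = [det.record(t) for t in [0, 100, 200, 300]]
print(alarms)   # the fourth event within the window trips the threshold
```

A fixed-interval threshold would instead reset the counter at a set time of day; the deque-based window avoids the blind spot that a reset creates at the interval boundary.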

Heuristic Threshold Detection

Heuristic threshold checks take simple threshold detection a step further by adapting it to observed levels. This process increases the accuracy of the detection, especially when performing detection over a wide range of users or target environments. So, for instance, instead of having a threshold detection rule triggering an alert when the number of failed logins exceeds three in an eight-hour period, you can have a threshold detection rule that triggers an alert when an abnormal number of failed logins occur. “Abnormal” can be defined by various formulas. One that comes immediately to mind is a Gaussian function (such as chi-square) in which the mean number of failed logins is calculated and subsequent numbers of failed logins are compared to the mean plus some standard deviation.

Target-Based Integrity Checks

Another valuable quantitative analysis measure is a target-based integrity check. This is a check for a change in a system object that should not be subject to unpredictable change. The most common example of such an integrity check utilizes a message digest function to calculate a cryptographic checksum of the system object in question. After the checksum is calculated and stored in a safe place (for instance, read-only media) the system periodically recalculates the checksum, comparing it to the stored reference value. If a differential is found, an alarm is raised. The Tripwire™ product, found in both public domain and commercially supported versions, provides this capability.

Quantitative Analysis and Data Reduction

One of the more interesting uses of quantitative analysis in early intrusion detection systems was to perform data reduction. Data reduction is the process of eliminating superfluous or redundant information from often-voluminous event information. This reduces system storage loads and optimizes detection processes based on the event information.

An example demonstrating the use of quantitative methods to support effective data reduction comes from the NADIR system, developed by the Computing and Communications Division of Los Alamos National Laboratory. The NADIR developers utilized data profiling, which transforms user activity from audit logs into vectors of quantitative measures (most of them linear categorical or a combination of linear categorical and ordinal data). The profiles are aggregated over time (with weekly summaries) as well as over systems (with aggregate views of user activity per system). The reduced data is subjected to both statistical and expert system examination, with alarms and alerts handled by a staff investigator.13

4.3.2.3. Statistical Measures

The first worked examples of anomaly detection systems were based on statistical measures. These included approaches such as those utilized in IDES, mentioned earlier, and the follow-on NIDES project, as well as the Haystack system.

IDES/NIDES

IDES and NIDES, developed by researchers at SRI International, were two of the most prominent early intrusion detection research systems. They were both hybrid systems, including misuse and anomaly detection features; however, I focus on the statistical analysis here.

The statistical analysis techniques employed for IDES and NIDES support historical statistical profiles established and maintained for each user and system subject. These profiles are updated periodically, with older data aged so that the profiles adapt to reflect changes in user behavior over time.

The systems maintain a statistical knowledge base consisting of profiles. Each profile expresses normal behaviors for a particular user in terms of a set of measures or metrics. Once a day, new audit data is incorporated into the knowledge base (after the old vectors are aged by an exponential decay factor), based on the activity of the user during that day.

Let's take a look at how these statistics were calculated for IDES. Each time an audit record is generated, a summary test statistic is generated. This statistic, called the IDES score (IS), is calculated by the following formula:

IS = (S1, S2, S3, ..., Sn) C^-1 (S1, S2, S3, ..., Sn)^t

where C^-1 is the inverse of the correlation matrix for the vector of measures and (S1, S2, S3, ..., Sn)^t is the transpose of the vector. Each Sn measures some aspect of behavior, such as file access, terminals used, and CPU time used. Different Sn values can also represent different views of the same aspect of behavior.14
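A numeric sketch of the computation for two measures follows; the values of the measure vector and the correlation matrix are illustrative, and the 2x2 inverse is written out explicitly to keep the example self-contained.

```python
# IDES score for two measures: IS = s C^-1 s^t, where s holds the per-measure
# anomaly statistics and C is their correlation matrix.

def ides_score(s, C):
    a, b = C[0]
    c, d = C[1]
    det = a * d - b * c
    Cinv = [[d / det, -b / det],
            [-c / det, a / det]]          # closed-form inverse of a 2x2 matrix
    # Row vector times C^-1 ...
    tmp = [s[0] * Cinv[0][0] + s[1] * Cinv[1][0],
           s[0] * Cinv[0][1] + s[1] * Cinv[1][1]]
    # ... times the transposed (column) vector gives a single scalar score.
    return tmp[0] * s[0] + tmp[1] * s[1]

s = [1.0, 2.0]                   # anomaly statistics for two measures
C = [[1.0, 0.5], [0.5, 1.0]]     # their correlation matrix
print(ides_score(s, C))
```

Weighting by the inverse correlation matrix is what keeps two highly correlated measures (say, CPU time and session length) from being double counted in the summary score.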

Haystack

Haystack, an anomaly detection system developed by Tracor Applied Sciences and Haystack Laboratories for the U.S. Air Force, employs a two-part statistical anomaly detection approach. The first measure determines to what degree a user session resembles an established intrusion type. This measure is calculated as follows:

  1. The system maintains a vector of user behavior measures.

  2. For each type of intrusion, the system associates a weight with each behavior measure, reflecting the relevance of the measure to the given intrusion type.

  3. For each session, the vector of user behavior measures is calculated and compared to the threshold vector.

  4. Those behavior measures for which threshold settings are exceeded are noted.

  5. The weights associated with the measures exceeding threshold are summed.

  6. The sum is used to assign a suspicion quotient to the session, based on the distribution of the weighted intrusion scores for all previous sessions.
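The six steps above can be sketched as follows; the measure names, thresholds, weights, and historical scores are all invented for illustration:

```python
# Sketch of Haystack's weighted-threshold scoring. All names and
# values below are hypothetical examples, not Haystack's actual data.

from bisect import bisect_left

def session_score(measures, thresholds, weights):
    """Sum the weights of measures whose thresholds are exceeded."""
    return sum(weights[m] for m, v in measures.items() if v > thresholds[m])

def suspicion_quotient(score, historical_scores):
    """Rank the score against the weighted scores of all previous
    sessions, yielding a suspicion value in [0, 1]."""
    ranked = sorted(historical_scores)
    return bisect_left(ranked, score) / len(ranked)

thresholds = {"cpu": 100.0, "failed_logins": 3, "files_read": 500}
weights = {"cpu": 1.0, "failed_logins": 5.0, "files_read": 2.0}
session = {"cpu": 250.0, "failed_logins": 4, "files_read": 120}
score = session_score(session, thresholds, weights)
sq = suspicion_quotient(score, [0.0, 1.0, 2.0, 6.0, 8.0])
```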

The second, complementary statistical method detects deviations in a user's session activities from the normal user session profile. This method looks for session statistics that significantly deviate from the normal historical statistical profile for that user.15, 16

Strengths of Statistical Analysis

Statistical anomaly detection analysis originally targeted intruders masquerading as legitimate users. Although the assertion has been made that statistical analysis may also detect intruders who exploit previously unknown vulnerabilities and therefore cannot be detected by any other means, this assertion has yet to be proven in production use of a system. Early researchers also hypothesized that statistical anomaly detection could reveal interesting, sometimes suspicious, activities that could lead to discoveries of security breaches. This assertion was confirmed on at least one system, NADIR (in operation at Los Alamos National Laboratory), where developers reported that some of the information gained by using NADIR led to the discovery of system and security process errors, as well as discoveries that allowed them to improve the general management of Los Alamos's system complex.17

Another advantage often claimed for statistical analysis is that statistical systems do not require the constant updates and maintenance that misuse detection systems do. This claim may be true, but it depends on several factors. Metrics must be well chosen, adequate for good discrimination, and well-adapted to changes in behavior (that is, changes in user behavior must produce a consistent, noticeable change in the corresponding metrics). If these conditions are met, chances are excellent that the statistical analyzer will reliably detect behaviors of interest without requiring on-the-fly modifications to the system.

Drawbacks of Statistical Analysis

On the other hand, statistical analysis systems have significant drawbacks. First, most were designed to perform batch mode processing of audit records, which eliminated the capability to perform automated responses to block damage. This omission was not a problem at the time the systems were proposed because early systems were designed to monitor audit trails from centralized, mainframe target platforms. Although later systems attempted to perform real-time analysis of audit data, the memory and processing loads involved in using and maintaining the user profile knowledge base usually caused the system to lag behind audit record generation.

A second drawback affects the range of events that statistical analysis can characterize. The nature of statistical analysis precludes the capability to take into account the sequential relationships between events. The exact order of the occurrence of events is not provided as an attribute in most of these systems. In other words, the event horizon for these systems is limited to one event. Because many anomalies indicating attack depend on such sequential event relationships, this situation represents a serious limitation to the approach.

In cases when quantitative methods (Denning's operational model) are utilized, it is also difficult to select appropriate values for thresholds and ranges.

The false alarm rates associated with statistical analysis systems are high, which leads to users ignoring or disabling the systems. These false alarms include both type 1 (false negative) and type 2 (false positive) errors.

4.3.2.4. Nonparametric Statistical Measures

Early statistical approaches were similar in that they utilized parametric approaches to characterize the behavioral patterns of users and other system entities. Parametric approaches are analytical approaches in which assumptions are made about the underlying distribution of the data being analyzed. For instance, in early versions of IDES and MIDAS, the distributions of user usage patterns were assumed to be Gaussian or normal.

The problem with these assumptions is that error rates are high when the assumptions are incorrect. When researchers began collecting information about system usage patterns that included attributes such as system resource usage, the distributions were discovered not to be normal, and including these measures led to high error rates.

Linda Lankewicz and Mark Benard of Tulane University proposed that a way of overcoming these problems was to utilize nonparametric techniques for performing anomaly detection. This approach provides the capability to accommodate users with less predictable usage patterns and allows the analyzer to take into account system measures that are not easily accommodated by parametric schemes.

The approach Lankewicz and Benard utilized involved nonparametric data classification techniques, specifically clustering analysis. In clustering analysis, large quantities of historical data are collected (a sample set) and organized into clusters according to some evaluation criteria (also known as features). Preprocessing is performed in which features associated with a particular event stream (often mapped to a specific user) are converted into a vector representation (for example, Xi = [f1, f2, ..., fn] in an n-dimensional space). A clustering algorithm is used to group vectors into classes of behaviors, attempting to group them so that members of each class are as close as possible to each other while different classes are as far apart as they can be.

In nonparametric statistical anomaly detection, the premise is that a user's activity data, as expressed in terms of the features, falls into two distinct clusters: one indicating anomalous activity and the other indicating normal activity.

Various clustering algorithms are available. These range from algorithms that use simple distance measures to determine whether an object falls into a cluster, to more complex concept-based measures (in which an object is “scored” according to a set of conditions and that score is used to determine membership in a particular cluster). Different clustering algorithms usually best serve different data sets and analysis goals.

Researchers at Tulane found that the clustering algorithm that best accomplished this goal using resource usage figures as evaluation criteria was the k-nearest-neighbor algorithm, which groups each vector with k of its nearest neighbors. Here k is a function of the number of vectors in the sample set, not a fixed value.
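A minimal sketch of grouping a vector with its k nearest neighbors follows, using Euclidean distance over hypothetical resource-usage features:

```python
# Sketch of k-nearest-neighbor grouping over session feature vectors.
# Feature meanings and values are invented for illustration.

import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(vectors, i, k):
    """Indices of the k vectors closest to vectors[i]."""
    dists = [(euclid(vectors[i], v), j)
             for j, v in enumerate(vectors) if j != i]
    return [j for _, j in sorted(dists)[:k]]

# Feature vectors: (cpu_seconds, io_blocks) per session, hypothetical.
sessions = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (9.0, 8.5), (8.8, 9.1)]
neighbors = k_nearest(sessions, 0, 2)   # nearest neighbors of session 0
```

Note how the first three sessions and the last two naturally separate into two groups, mirroring the way cluster membership distinguishes activity patterns.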

Experimental results using this analysis technique showed that clusters formed that reliably grouped similar system operations (such as compiling or editing files) and also grouped activity patterns according to user.

The advantages of nonparametric approaches include the capability to perform reliable reduction of event data (in the transformation of raw event data to vectors). This reduction effect was documented as more than two orders of magnitude. Other benefits are improvement in the speed of detection and improvement in accuracy over parametric statistical analysis. Disadvantages involve concerns that expanding features beyond resource usage would lessen the efficiency and accuracy of the analysis.18

4.3.2.5. Rule-Based Approaches

Another variation of anomaly detection is rule-based anomaly detection. The assumptions underlying this approach are the same as those associated with statistical anomaly detection. The main difference is that rule-based intrusion detection systems use sets of rules to represent and store usage patterns. Two such approaches are covered in this section: the Wisdom and Sense approach and the Time-Based Inductive Machine (TIM).

Wisdom and Sense

The first rule-based anomaly detection system was the Wisdom and Sense (W&S) system, developed by researchers at Los Alamos National Laboratory and Oak Ridge National Laboratory. W&S can operate on a variety of system platforms and can characterize activity at both operating system and application levels. It provides two schemes for populating rule bases: entering them manually (to reflect a policy statement) and generating them from historical audit data. The rules are derived from historical audit data by performing a categorical examination, expressing the patterns found in terms of rules. The rules reflect the past behavior of system subjects and objects and are stored in a tree structure, called a forest. Specific data values within the audit records are grouped into thread classes, with which collections of operations or rules are associated.

An example of a thread class is “all the records containing the same user-file field values.” Rules are applied to the data in a thread each time an activity associated with that thread occurs. Anomalies are detected this way: When transactions are processed, they are compared to the events of the matching thread to determine whether the events match the historical patterns of activity or represent an anomaly.19
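The thread-class idea can be sketched as follows; the field names and records are invented, and real W&S rules are far richer than this simple membership test:

```python
# Sketch of thread classes: historical records are grouped by a key
# (here user + file, as in the example above), and a new transaction
# is flagged if its operation was never seen in its thread. The field
# names and audit records are hypothetical.

from collections import defaultdict

threads = defaultdict(set)   # (user, file) -> set of operations seen

def train(records):
    for r in records:
        threads[(r["user"], r["file"])].add(r["op"])

def is_anomalous(record):
    return record["op"] not in threads[(record["user"], record["file"])]

train([
    {"user": "alice", "file": "/etc/passwd", "op": "read"},
    {"user": "alice", "file": "/etc/passwd", "op": "read"},
])
alert = is_anomalous({"user": "alice", "file": "/etc/passwd", "op": "write"})
```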

TIM

The TIM system, proposed by Teng, Chen, and Lu, while they were associated with the Digital Equipment Corporation, utilizes an inductive approach to dynamically generate rules defining intrusion. The difference between TIM and other anomaly detection systems is that TIM looks for patterns in sequences of events, not in individual events. It effectively implements a Markov transition probability model, as proposed by Denning in her seminal intrusion detection work.

TIM observes historical event record sequences, characterizing the probability of particular sequences of events occurring. Other anomaly detection systems measure whether the occurrence of a single event represents a deviation from normal patterns of activity. TIM focuses on sequences of events, checking to see whether a chain of events corresponds to what would be expected based on its observation of historical event sequences.

For example, suppose events E1, E2, and E3 are listed sequentially in an audit trail. TIM characterizes the probability of the occurrence of E1 followed by E2 followed by E3, based on the history of sequences it has observed in the past. TIM automatically generates rules about the event sequences as it analyzes historical event data, and then stores the rules in a rule base. Because TIM groups event sequences, the amount of space required for the rule base is significantly smaller than that required for a single-event-oriented rule based system (such as Wisdom and Sense).

If a sequence of events matches the head of a rule, then the next event is considered anomalous if it's not in the set of predicted events in the body of the rule. The system also refines its analysis by deleting less predictive rules from the rule base. (Rule 1 is more predictive than rule 2 if rule 1 successfully predicts more events than rule 2 predicts.)
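TIM-style rule induction and checking might be sketched as follows; the rule-head length of two events and the event stream are invented for illustration:

```python
# Sketch of inductive sequence rules: from a historical event stream,
# learn which events follow each length-2 prefix (the rule head); a
# new event is anomalous if it is not among the rule body's predicted
# events. Head length and event names are assumptions.

from collections import defaultdict

def learn_rules(history, head_len=2):
    rules = defaultdict(set)   # rule head -> set of predicted next events
    for i in range(len(history) - head_len):
        head = tuple(history[i:i + head_len])
        rules[head].add(history[i + head_len])
    return rules

def is_anomalous(rules, recent, event, head_len=2):
    head = tuple(recent[-head_len:])
    if head not in rules:
        return True   # unseen context always flags an anomaly, a
                      # source of the false positives noted in the text
    return event not in rules[head]

rules = learn_rules(["E1", "E2", "E3", "E1", "E2", "E3", "E1", "E2", "E4"])
ok = is_anomalous(rules, ["E1", "E2"], "E3")   # predicted event
bad = is_anomalous(rules, ["E1", "E2"], "E5")  # unpredicted event
```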

The advantages of TIM, especially when compared to statistical measures, are significant. This approach is well suited to environments where user patterns differ significantly from user to user, but where each user exhibits consistent behavior over time. Such an environment might be represented by a large corporation in which different users are responsible for accounting, administrative, programming, and personnel functions with very little crossover of user duties. This approach is also well suited for environments in which threat is associated with a few event types rather than the full complement of system events. Finally, this approach is not subject to problems associated with session creep, a defeat strategy associated with anomaly detection, in which an attacker gradually alters his or her behavior pattern over time to train the system to accept intrusive behavior as normal. This resistance to session creep attacks is due to the fact that the semantics are built into the detection rules.

However, as with other systems, weaknesses are associated with the TIM approach. It suffers the problem associated with all learning-based approaches in that the effectiveness of the approach depends on the quality of the training data. In learning-based systems the training data must reflect normal activity for the users of the system. Furthermore, the rules generated by this approach may not be comprehensive enough to reflect all possible normal user behavior patterns. This weakness produces a large false positive (type 2) error rate, especially at the beginning of the operation of the system. The error rate is high because if an event does not match the head of any rule (that is, if the system did not encounter the event type in the training data set), that event always triggers an anomaly.

This approach served as the basis for the Digital Equipment Corporation Polycenter intrusion detection product and forms the foundation for much subsequent anomaly detection research.20

4.3.2.6. Neural Networks

Neural networks use adaptive learning techniques to characterize anomalous behavior. This nonparametric analysis technique operates on historical sets of training data, which are presumably cleansed of any data indicating intrusions or other undesirable user behavior.

Neural networks consist of numerous simple processing elements, called units, that interact by using weighted connections. The knowledge of a neural network is encoded in the structure of the net in terms of connections between units and their weights. The actual learning process takes place by changing weights and adding or removing connections.

Neural network processing involves two stages. In the first stage (corresponding to the “building the detector” stage of the intrusion analysis model outlined earlier in this chapter), the network is populated by a training set of historical or other sample data that is representative of user behavior. In the second stage (corresponding to the second stage of the intrusion analysis model), the network accepts event data and compares it to historical behavior references, determining similarities and differences.

The network indicates that an event is abnormal by changing the state of the units, changing the weights of connections, adding connections, or removing them. The network also modifies its definition of what constitutes a normal event by performing stepwise corrections.

Neural network approaches hold a great deal of promise for anomaly detection. Because they don't use a fixed set of features to define user behaviors, feature selection is irrelevant. Neural networks don't make prior assumptions on expected statistical distribution of metrics, so this method retains some of the advantages over classic statistical analysis associated with other nonparametric techniques.

Among the problems associated with utilizing neural networks for intrusion detection is a tendency to form mysterious unstable configurations in which the network fails to learn certain things for no apparent reason. However, the major drawback to utilizing neural networks for intrusion detection is that neural networks don't provide any explanation for the anomalies they find. This lack of explanation impedes the ability of users to establish accountability or otherwise address the roots of the security problems that allowed the detected intrusion, making the approach poorly suited to the needs of security managers. Although some researchers have proposed hybrid approaches as a means of overcoming these disadvantages, no published figures yet indicate the feasibility of neural network approaches.21

4.3.3. Alternative Detection Schemes

Some recent intrusion detection approaches fit neither misuse detection nor anomaly detection categories. These schemes may be applicable to either problem, perform precursor activity that can drive or refine either form of detection, or depart from the traditional monolithic view of intrusion detection in ways that affect detection strategies.

4.3.3.1. Immune System Approaches

In an innovative and promising research project, researchers at the University of New Mexico (Forrest, Hofmeyr, and Somayaji, among others) took a fresh look at the entire question of computer security. The question posed by the researchers was, “How does one equip computer systems with the means to protect themselves?” In answering this question, they noted marked similarities between biological immune systems and system protection mechanisms.

The key to both systems' functioning well is the capability to perform “self/nonself” determination--that is, the capability of an organism's immune system to determine which materials are harmless entities (such as the organism's own tissues) and which are pathogens and other dangerous factors. Because the immune system performs this determination by using peptides (short protein fragments), the researchers decided to focus on some computer attribute that could be considered analogous to peptides. The team hypothesized that sequences of UNIX system calls could satisfy those requirements.

In deciding to consider system calls as a primary source of information, the researchers weighed a variety of goals for the data, including data volume, capability to reliably detect misuse, and suitability for encoding in a fashion appropriate for advanced pattern-matching techniques. They chose to focus on short sequences of system calls, ignoring the parameters passed to the calls and looking only at their temporal order.

The system as first proposed performs anomaly detection. (It can also perform misuse detection.) The system complies with the two-phase intrusion detection analysis process, with the first phase building a knowledge base that profiles normal behavior. This profile is a bit different from others discussed in this chapter in that here the behavior characterized is not user-centric, but system-process centric. Deviations from this profile are defined as anomalous. In the second phase of the detection system, the profiles are used to monitor subsequent system behavior for anomalies.

Sequences of system calls that result from running privileged processes were collected over time. The profiles for the system consisted of unique sequences of length 10. Three measures were utilized to characterize deviations from normal process behaviors: successful exploits, unsuccessful exploits, and error conditions.
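The profile-building step can be sketched as follows. The text describes windows of length 10, but a length-3 window keeps this illustration short, and the system-call traces are invented:

```python
# Sketch of the self/nonself profile: the set of unique fixed-length
# system-call windows observed during normal runs (the text uses
# length 10; length 3 is used here for brevity). Traces are invented.

def windows(trace, n):
    """All unique length-n sliding windows over a call trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def anomalies(profile, trace, n):
    """Count windows in a new trace that never occurred in the profile."""
    return sum(1 for w in windows(trace, n) if w not in profile)

normal = ["open", "read", "mmap", "read", "close", "open", "read", "mmap"]
profile = windows(normal, 3)
suspect = ["open", "read", "exec", "read", "close"]
n_misses = anomalies(profile, suspect, 3)
```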

The results were extremely promising because the three measures allowed the detection of several sorts of anomalous behavior spanning several historically problematic UNIX programs. The research also showed that the sets of execution sequences were remarkably compact.22

Subsequent research compared different approaches for characterizing normal behavior. It explored whether more powerful data modeling methods significantly improved the performance of this approach when monitoring more complex systems. Somewhat surprisingly, even powerful data modeling techniques (for example, hidden Markov models, which are extremely reliable, though computationally greedy) did not give significantly better results than the simpler time-sequence-based models.

Although the self/nonself techniques appear to constitute an extremely powerful and promising approach, they are not a complete solution to the intrusion detection problem. Some attacks, including race conditions, masquerading, and policy violations, do not involve the use of privileged processes. Therefore, these attacks are not subject to detection using this approach.23

4.3.3.2. Genetic Algorithms

Another more sophisticated approach to performing anomaly detection utilizes genetic algorithms to perform analysis of event data.

A genetic algorithm is an instance of a class of algorithms called evolutionary algorithms. Evolutionary algorithms incorporate concepts of Darwinian natural selection (survival of the fittest) to optimize solutions to problems. Genetic algorithms utilize encoded forms (known as chromosomes) with methods that allow combination or mutation of the chromosomes to form new individuals. These algorithms are recognized for their capability to deal with multidimensional optimization problems in which the chromosome is composed of encoded values for the variables being optimized.24

In the eyes of the researchers investigating genetic algorithm approaches to intrusion detection, the intrusion detection process involves defining hypothesis vectors for event data, where the vector either indicates an intrusion or does not. The hypothesis is then tested to determine whether it is valid, and an improved hypothesis is devised and tried based on the results of the test. This process repeats until a solution is found.

The role of genetic algorithms in this process is to devise the improved hypothesis. Genetic algorithm analysis involves two steps. The first step involves coding a solution to the problem with a string of bits. The second step is finding a fitness function to test each individual of the population (for instance, all the possible solutions to the problem) against some evaluation criteria.

In the system GASSATA, developed by Ludovic Mé of Supelec, the French engineering university, genetic algorithms are applied to the problem of classifying system events by using a set of hypothesis vectors, H (one vector per event stream of interest), of n dimensions (where n is the number of potential known attacks). The ith element, Hi, is defined to be 1 if attack i is hypothesized to be present and 0 if it is not.

The fitness function has two parts. First, the risk that a particular attack represents to the system is multiplied by the value of the hypothesis vector. The product is then adjusted by a quadratic penalty function to eliminate unrealistic hypotheses. This step improves the discrimination among the possible attacks. The goal of the process is to optimize the results of this analysis so that the probability of a detected attack being real approaches 1 and the probability of a detected attack being false approaches 0.
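One plausible reading of such a fitness function is sketched below; the risk values, per-attack event counts, and the particular quadratic penalty form are assumptions for illustration, not GASSATA's actual definitions:

```python
# Sketch of a risk-weighted fitness with a quadratic penalty: reward
# the risk-weighted attacks the hypothesis asserts, minus a penalty
# when the hypothesis predicts more audit events than were observed
# (an "unrealistic" hypothesis). All values are hypothetical.

def fitness(h, risks, events_per_attack, observed_events, beta=1.0):
    reward = sum(r * hi for r, hi in zip(risks, h))
    predicted = sum(e * hi for e, hi in zip(events_per_attack, h))
    overshoot = max(0, predicted - observed_events)
    return reward - beta * overshoot ** 2

risks = [5, 2, 8]                 # per-attack risk values (invented)
events_per_attack = [3, 1, 4]     # audit events each attack generates
h = [1, 0, 1]                     # hypothesis: attacks 1 and 3 present
score = fitness(h, risks, events_per_attack, observed_events=7)
```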

Experimental results for the genetic algorithm approach to anomaly detection are encouraging. In experimental runs, the mean probability of true positives (accurate detection of real attacks) was 0.996, and the mean probability of false positives (detection of nonattacks) was 0.0044. The time required to construct the filters is also encouraging. For a sample set of 200 attacks, it took the system 10 minutes and 25 seconds to evaluate audit records generated by an average user over 30 minutes of intensive system use.

The following drawbacks are noted in this approach to misuse detection:

  • The system can't take into account attacks characterized by event absence (for instance, rules in the form of “programmer does NOT use cc as compiler”).

  • Because of the binary expressional form for individual event streams, the system can't detect multiple simultaneous attacks. The possibility exists that nonbinary genetic algorithm approaches could solve this problem.

  • If the same event or group of events is common to several attacks and an attacker uses this commonality to execute multiple simultaneous attacks, the system can't find an optimal hypothesis vector.

  • Perhaps the largest drawback is that the system doesn't precisely locate attacks in the audit trail. Therefore, no sense of temporality occurs in the results of the detector. (Note that this drawback is similar to problems noted in neural network approaches.) Therefore, genetic algorithm approaches must be backed up with post hoc search or other investigative aids if such support is required.25

4.3.3.3. Agent-Based Detection

Agent-based approaches to intrusion detection are based on software entities that perform certain security monitoring functions at a host. They function autonomously--that is, they are controlled only by the operating system, not by other processes. Agent-based approaches also run continuously with the understanding that this type of operation allows them to learn from experiences, as well as to communicate and cooperate with other agents of similar construction.

Agent-based detection approaches can be very powerful due to the range of capabilities with which one can imbue an agent. An agent can be extremely simple (for example, counting the number of times a particular command is invoked in a time interval) or complex (looking for evidence of a set of attacks for a particular environment), depending on the whim of the developer.

The range of agent capabilities can allow an agent-based intrusion detection system to provide a mixture of anomaly detection and misuse detection capabilities. For instance, an agent can be programmed to adapt its detection capabilities to changes in the local environment. It can also monitor for very subtle patterns over a long time interval, thereby detecting slow attacks. Finally, an agent can enact extremely fine-grained responses to a detected problem (for instance, changing the priority level of a process, effectively slowing it down).

Autonomous Agents for Intrusion Detection

A prototype of an agent-based intrusion detection system, Autonomous Agents for Intrusion Detection (AAFID), was developed by researchers at Purdue University. It serves as the basis for this discussion of agent-based solutions.

This architecture for agent-based intrusion detection systems calls for a hierarchically ordered control and reporting structure for agents, as pictured in Figure 4.7. Any number of agents can reside on a host. All agents on a particular host report their findings to a single transceiver. Transceivers monitor the operation of all the agents running in the host, with capabilities to issue start, stop, and reconfiguration commands to agents. Transceivers also perform data reduction on information reported by the agents and report results to one or more monitors, the next level of the hierarchy.

Figure 4.7. AAFID Architecture

Monitors, which can be hierarchically structured in multiple layers, control and consolidate information from several transceivers. The architecture of AAFID allows redundancy in reporting from transceivers to monitors so that the failure of a monitor doesn't jeopardize the operation of the intrusion detection system. Monitors have the capability to access data from the entire network and therefore perform higher-level aggregation of results from transceivers. This feature enables the system to detect multihost attacks. Through a user interface, the user of the system enters commands to control the monitors. They, in turn, control the transceivers based on these user commands.

APIs exported by each component accomplish communications between agents, transceivers, monitors, and users.

The advantages of AAFID and other such agent-based approaches include the following:

  • They appear to be more resistant to insertion and evasion attacks than other intrusion detection systems.

  • The architecture is more easily scaled, with provisions for adding new components or replacing old ones as needed.

  • Agents can be tested independent of the full system before they are deployed.

  • Because they can intercommunicate, agents can be deployed in groups in which each performs a different, simple function but contributes to a complex result.

The following deficiencies are also associated with the AAFID architecture:

  • Monitors are single points of failure. If a monitor ceases to work, all the transceivers under its control stop producing useful information. Possible solution strategies exist, but they are as yet untested.

  • If duplicated monitors are used to address the first problem, it is difficult to deal with the consistency and duplication of information. This situation requires additional mechanisms.

  • Access control mechanisms to allow different users to have different modes of access to the intrusion detection system are missing. This significant deficit must be addressed at each level of the architecture.

  • Problems occur because of the propagation time for evidence of attackers to reach the monitor level. This problem is common to all distributed intrusion detection systems.

  • As in the rest of intrusion detection, a significant need remains for more insight regarding designing user interfaces for intrusion detection systems. This need extends from presentation schemes to policy structures and specification schemes.26

EMERALD

A second architecture that utilizes distributed agent approaches to performing intrusion detection is the EMERALD system, researched and prototyped by Phillip Porras (whose previous intrusion detection work includes the STAT state transition analysis system) of SRI International. EMERALD includes numerous local monitors in a framework that supports distributing local results to a global array of detectors that, in turn, consolidate alarms and alerts.

EMERALD, like IDES and NIDES, SRI's previous intrusion detection systems, is a hybrid intrusion detector, utilizing both signature analysis and statistical profiling to detect security problems.

EMERALD is notable in that it separates the analysis semantics from the analysis and response logic, thereby enabling much easier integration throughout the network. EMERALD also supports analysis at different layers of abstraction, an important capability. Furthermore, the design supports interoperability, another important issue in modern intrusion detection systems.

The central component of EMERALD is the EMERALD Service Monitor, which is similar in form and function to the AAFID autonomous agent. Service monitors are programmable to perform any function and are deployed to hosts. They can be layered, to support hierarchical data reduction, with service monitors performing some local analysis and reporting results and additional information to higher-level monitors. This approach yields some scalability not commonly found in network intrusion detection products. The system also supports a wide range of automated response, which is of great interest to customers responsible for critical large-scale networks.

EMERALD, as it builds on considerable corporate insight gathered in the IDES and NIDES efforts, holds great promise for protecting large distributed networks.27, 28

4.3.3.4. Data Mining

An approach that is similar to some of the rule-based anomaly detection efforts involves utilizing data mining techniques to build intrusion detection models. The objective of this approach is to discover consistent useful patterns of system features that can be used to describe program and user behaviors. These sets of system features, in turn, can then be processed by inductive methods to form classifiers (detection engines) that can recognize anomalies and misuse scenarios.

Data mining refers to the process of extracting models from large bodies of data. These models often discover facts in the data that are not apparent through other means of inspection. Although many algorithms are available for data mining purposes, the three that are most useful for mining audit data are classification, link analysis, and sequence analysis.

  • Classification assigns a data item to one of several predefined categories. (This step is akin to sorting data into “bins,” depending on some criteria.) Classification algorithms output classifiers, such as decision trees or rules. In intrusion detection, an optimal classifier can reliably identify audit data as falling into a normal or abnormal category.

  • Link analysis identifies relationships and correlations between fields in the body of data. In intrusion detection, an optimal link analysis algorithm identifies the set of system features best able to reliably reveal intrusions.

  • Sequence analysis models sequential patterns. These algorithms can reveal which audit events typically occur together and hold the key to expanding intrusion detection models to include temporal statistical measures. These measures can provide the capability to recognize denial-of-service attacks.
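The classification step, for instance, can be sketched as a one-feature decision stump fit on labeled audit records; the feature name and training data are invented for illustration, and real classifiers are far richer:

```python
# Sketch of classification for audit data: fit a one-feature decision
# stump (a single threshold rule) on labeled records and use it as a
# normal/abnormal classifier. Feature and records are hypothetical.

def fit_stump(records, labels, feature):
    """Pick the threshold on one feature that best separates labels."""
    best = (None, -1.0)
    for t in sorted({r[feature] for r in records}):
        preds = ["abnormal" if r[feature] > t else "normal"
                 for r in records]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best[1]:
            best = (t, acc)
    return best[0]

records = [{"failed_logins": 0}, {"failed_logins": 1},
           {"failed_logins": 7}, {"failed_logins": 9}]
labels = ["normal", "normal", "abnormal", "abnormal"]
threshold = fit_stump(records, labels, "failed_logins")

def classify(r):
    return "abnormal" if r["failed_logins"] > threshold else "normal"
```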

Researchers have developed extensions to standard data mining algorithms to accommodate some of the special needs of audit and other system event logs. Initial results of experiments using live data are interesting, but the work is not yet ready for transfer into commercial products. Additional research is planned to refine the approach.29

   
