
12.1. Evolution of Web Intrusion Detection

Intrusion detection has been in use for many years. Its purpose is to detect attacks by examining network traffic or operating system events. The term intrusion prevention refers to systems that are also capable of preventing attacks.

Today, when people mention intrusion detection, in most cases they are referring to a network intrusion detection system (NIDS). An NIDS works on the TCP/IP level and is used to detect attacks against any network service, including the web server. The job of such systems, the most popular and most widely deployed of all IDSs, is to monitor raw network packets to spot malicious payload. Host-based intrusion detection systems (HIDSs), on the other hand, work on the host level. Though they can analyze network traffic (only the traffic that arrives at that single host), this task is usually left to NIDSs. Host-based intrusion detection is mostly concerned with the events that take place on the host (such as users logging in and out and executing commands) and the system error messages that are generated. An HIDS can be as simple as a script watching a log file for error messages, as mentioned in Chapter 8. Integrity validation programs (such as Tripwire) are a form of HIDS. Some systems can be complex: one form of HIDS uses system call monitoring on a kernel level to detect processes that behave suspiciously.

Using a single approach for intrusion detection is insufficient. Security information management (SIM) systems are designed to manage various security-relevant events they receive from agents, where an agent can listen to the network traffic or operating system events or can work to obtain any other security-relevant information.

Because many NIDSs are already in place, a large effort was made to make the most of them and to use them for web intrusion detection, too. Though NIDSs work well for the problems they were designed to address and can provide some help with web intrusion detection, they do not and cannot live up to the full potential of web intrusion detection, for the following reasons:

  • NIDSs were designed to work with TCP/IP. The Web is based around the HTTP protocol, which introduces a completely new vocabulary. It comes with its own set of problems and challenges, which are different from those of TCP/IP.

  • The real problem is that web applications are not simple users of the HTTP protocol. Instead, HTTP is only used to carry the application-specific data. It is as though each application builds its own protocol on top of HTTP.

  • Many new protocols are deployed on top of HTTP (think of Web Services, XML-RPC, and SOAP), pushing the level of complexity further up.

  • Other problems, such as the inability of an NIDS to see through encrypted SSL channels (which most web applications that are meant to be secure use) and the inability to cope with a large amount of web traffic, make NIDSs insufficient tools for web intrusion detection.

Vendors of NIDSs have responded to the challenges by adding extensions to better understand HTTP. The term deep-inspection firewalls refers to systems that make an additional effort to understand the network traffic on a higher level. Ultimately, a new breed of IDSs was born. Web application firewalls (WAFs), also known as web application gateways, are designed specifically to guard web applications. Designed from the ground up to support HTTP and to exploit its transactional nature, web application firewalls often work as reverse proxies. Instead of going directly to the web application, a request is rerouted to go to a WAF first and only allowed to proceed if deemed safe.

Web application firewalls were designed from the ground up to deal with web attacks and are better suited for that purpose. NIDSs are better suited for monitoring on the network level and cannot be replaced for that purpose.

Though most vendors are focusing on supporting HTTP, the concept of application firewalls can be applied to any application and protocol. Commercial products have become available that act as proxies for other popular network protocols and for popular databases. (Zorp, at http://www.balabit.com/products/zorp/, available under a commercial and open source license, is one such product.)

Learn more about intrusion detection in general to gain a better understanding of common problems.

12.1.1. Is Intrusion Detection the Right Approach?

There is some controversy as to whether intrusion detection is the right approach to increasing security. A common counterargument is that web intrusion detection does not solve the real problem, and that it is better to go straight to the source and fix weak web applications. I generally agree with this opinion, but reality prevents us from letting go of intrusion detection techniques:

  • Achieving 100-percent security is impossible because we humans have limited capabilities and make mistakes.

  • Few even attempt to approach 100-percent security. In my experience, those who direct application development usually demand features, not security. Attitudes are changing, but slowly.

  • A complex system always contains third-party products whose quality (security-wise) is unknown. If the source code for the products is unavailable, then you are at the mercy of the vendor to supply the fixes.

  • We must work with existing vulnerable systems.

As a result, I recommend we raise awareness about security among management and developers. Since awareness will come slowly, do what you can in the meantime to increase security.

12.1.2. Log-Based Web Intrusion Detection

I already covered one form of web intrusion detection in Chapter 8. Log-based web intrusion detection makes use of the fact that web servers produce detailed access logs, where the information about every request is kept. It is also possible to create logs in special formats to control which data is collected. This cost-effective method introduces intrusion detection to a system but there is a drawback. Log-based web intrusion detection is performed only after transactions take place; therefore, attack prevention is not possible. Only detection is. If you can live with that (it is a valid decision and it depends on your threat model), then you only need to take a few steps to implement this technique:

  1. Make sure logging is configured and takes place on all web servers.

  2. Optionally reconfigure logging to log more information than that configured by default.

  3. Collect all logs to a central location.

  4. Implement scripts to examine the logs regularly, in real time or in batch mode (e.g., daily).

That is all there is to it. (Refer to Chapter 8 for a detailed discussion.)
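
As an illustration of step 2, the standard Apache logging directives can record more than the default common format provides. The following sketch (the log file path is only an example) defines and activates an extended format that also captures the Referer and User-Agent headers, both often valuable in forensic work:

    # Extended log format: common fields plus Referer and User-Agent
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    # Write access entries using the extended format
    CustomLog /var/www/logs/access_log combined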

12.1.3. Real-Time Web Intrusion Detection

With real-time intrusion detection, not only can you detect problems, but you can react to them as well. Attack prevention is possible, but it comes with a price tag of increased complexity and more time required to run the system. Most of this chapter discusses the ways of running real-time web intrusion detection. There are two approaches:


Network-based

One network node screens HTTP traffic before it reaches the destination.


Web server-based

An intrusion detection agent is embedded within the web server.

Which of these two you choose depends on your circumstances. The web server-based approach is easy to implement since it does not mandate changes to the network design and configuration. All that is needed is the addition of a module to the web server. But if you have many web servers, and especially if the network contains proprietary web servers, then having a single place from which to perform intrusion detection can be the more efficient approach. Though network-based web IDSs typically perform full separation of clients and servers, web server-based solutions can be described more accurately as separating clients from applications, with servers left unprotected in the middle. In this case, therefore, network-based protection is better because it can protect from flaws in web servers, too.

With Apache and mod_security you can choose either approach to real-time web intrusion detection. If network-based web intrusion detection suits your needs best, then you can build such a node by installing an additional Apache instance with mod_security to work in a reverse proxy configuration. (Reverse proxy operation is discussed in Chapter 9.) Aside from initial configuration, the two modes of operation are similar. The rest of this chapter applies equally to both.
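
As a rough sketch of the network-based deployment (the internal hostname is hypothetical, and the response status is only an example), the reverse proxy node combines mod_proxy forwarding with mod_security inspection:

    # Operate as a reverse proxy; never allow forward proxying
    ProxyRequests Off
    ProxyPass / http://internal.example.com/
    ProxyPassReverse / http://internal.example.com/

    <IfModule mod_security.c>
        # Inspect requests before they are forwarded to the real server
        SecFilterEngine On
        SecFilterDefaultAction "deny,log,status:403"
    </IfModule>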

12.1.4. Web Intrusion Detection Features

Later in this chapter, I will present a web intrusion detection solution based on open source components. The advantage of using open source components is that they are free and familiar (being based on Apache). Products from the commercial arena have more features, and they have nice user interfaces that make some tasks much easier. Here I will present the most important aspects of web IDSs, even if some features are present only in commercial products. I expect the open source products to catch up, but at this point a discussion of web intrusion detection cannot be complete without including features available only in commercial products. The following sections describe some common intrusion detection features.

12.1.4.1 Protocol anomaly detection

If you read through various RFCs, you may detect a recurring theme. Most RFCs recommend that implementations be conservative about how they use protocols, but liberal with respect to what they accept from others. Web servers behave this way too, but such behavior leaves the door wide open for all sorts of attacks. Almost all IDSs perform some sort of sanity check on incoming requests and refuse to accept anything that is not in accordance with the HTTP standard. Furthermore, they can narrow down the accepted features to those the application actually needs and thus reduce the attack surface area.
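
With mod_security (1.x directives shown), a minimal sanity-check layer might look like this sketch; the byte range chosen here is an assumption that the application only needs printable ASCII:

    SecFilterEngine On
    # Reject requests containing invalid URL encoding (e.g., %XV)
    SecFilterCheckURLEncoding On
    # Accept only bytes from the printable ASCII range
    SecFilterForceByteRange 32 126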

12.1.4.2 Negative versus positive security models

If you have ever worked to develop a firewall policy, you may have been given (good) advice to first put rules in place to deny everything, and then proceed to allow what is safe. That is a positive security model. On the other side is a negative security model, in which everything that is not dangerous is allowed. The two approaches each ask a question:

  • Positive security model: What is safe?

  • Negative security model: What is dangerous?

A negative security model is used more often. You identify a dangerous pattern and configure your system to reject it. This is simple, easy, and fun, but not foolproof. The concept relies on you knowing what is dangerous. If there are aspects of the problem you are not aware of (which happens from time to time) then you have left a hole for the attacker to exploit.
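
In mod_security terms, a negative-model rule is simply a pattern to reject. The following sketch shows two simplistic example patterns, not a complete rule set:

    # Reject any request mentioning a sensitive system file
    SecFilter /etc/passwd
    # Reject a common SQL injection fragment in request parameters
    SecFilterSelective ARGS "union.+select"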

A positive security model (also known as a white-list model) is a better approach to building policies, and it is the model that firewall policy building follows. In the realm of web application security, a positive security model boils down to enumerating every script in the application. For each script in the list, you need to determine the following:

  • Allowed request methods (e.g., GET/POST or POST only)

  • Allowed Content-Type

  • Allowed Content-Length

  • Allowed parameters

  • Which parameters are mandatory and which are optional

  • The type of every parameter (e.g., text or integer)

  • Additional parameter constraints (where applicable)

This is what programmers are supposed to do but frequently do not. Using the positive security model is better if you can afford to spend the time to develop it. One difficult aspect of this approach is that the application model changes as the application evolves. You will need to update the model every time a new script is added to the application or if an existing one changes. But it works well to protect stable, legacy applications that no one maintains anymore.
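
In mod_security terms, a fragment of such a per-script policy might look like the following sketch (the script path, parameter name, and length limit are hypothetical):

    <Location /app/login.php>
        # Allow only the POST request method for this script
        SecFilterSelective REQUEST_METHOD "!^POST$"
        # The username parameter must be 1-32 alphanumeric characters
        SecFilterSelective ARG_username "!^[a-zA-Z0-9]{1,32}$"
    </Location>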

Automating policy development can ease problems:

  • Some IDSs can observe the traffic and use it to build the policy automatically. Some can do it in real time.

  • With white-list protection in place, you may be able to mark certain IP addresses as trusted, and configure the IDS to update the policy according to the observed traffic.

  • If an application is built with a comprehensive set of regression tests (to simulate correct behavior), playing the tests while the IDS is watching will result in a policy being created automatically.

12.1.4.3 Rule-based versus anomaly-based protection

Rule-based IDSs comprise the majority of what is available on the market. In principle, every request (or packet in the case of NIDS) is subject to a series of tests, where each test consists of one or more inspection rules. If a test fails, the request is rejected as invalid.
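
In mod_security 1.x, for example, a test spanning more than one inspection rule can be expressed by chaining filters. This sketch (the pattern is chosen purely for illustration) rejects POST requests whose body contains a script tag:

    # Inspect request bodies, too
    SecFilterScanPOST On
    # Both filters must match for the request to be rejected
    SecFilterSelective REQUEST_METHOD "^POST$" chain
    SecFilterSelective POST_PAYLOAD "<script"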

Rule-based IDSs are easy to build and use and are efficient when used to defend against known problems or when the task is to build a custom defense policy. But since they must know about the specifics of every threat to protect from it, these tools must rely on using extensive rule databases. Vendors maintain rule databases and distribute their tools with programs to update IDS installations automatically.

This approach is unlikely to be able to protect custom applications or to protect from zero-day exploits (exploits that attack vulnerabilities not yet publicly known). This is where anomaly-based IDSs work better.

The idea behind anomaly-based protection is to build a protection layer that will observe legal application traffic and then build a statistical model to judge the future traffic against. In theory, once trained, an anomaly-based system should detect anything out of the ordinary. With anomaly-based protection, rule databases are not needed and zero-day exploits are not a problem. Anomaly-based protection systems are difficult to build and are thus rare. Because users do not understand how they work, many refuse to trust such systems, making them less popular.

12.1.4.4 Enforcing input validation

A frequent web security problem arises when the web programming model is misunderstood and programmers assume the browser can be trusted. If that happens, the programmers may implement input validation in the browser using JavaScript. Since the browser is just a simple tool under the control of the user, an attacker can bypass such input validation easily and send malformed input directly to the application.

A correct approach to handling this problem is to add server-side validation to the application. If that is impossible, another way is to add an intermediary between the client and the application and to have the intermediary reinterpret the JavaScript embedded in the web page.
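
An intermediary such as mod_security can also enforce validation rules externally. A minimal sketch, with a hypothetical parameter name:

    # The quantity parameter must be a positive integer of up to four digits
    SecFilterSelective ARG_quantity "!^[0-9]{1,4}$"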

12.1.4.5 State management

The stateless nature of the HTTP protocol has many negative impacts on web application security. Sessions can and should be implemented on the application level, but for many applications the added functionality is limited to fulfilling business requirements other than security. Web IDSs, on the other hand, can throw their full weight into adding various session-related protection features. Some of the features include:


Enforcement of entry points

At most web sites, you can start browsing from any site URL that is known to you. This is often convenient for attackers and inconvenient for defenders. An IDS that understands sessions will realize the user is making his first request and redirect him back to the default entry point (possibly logging the event). A configuration sketch of this feature appears after this list.


Observation of each user session individually

Being able to distinguish one session from another opens interesting possibilities: for example, it becomes possible to watch the rate at which requests are made and the way users navigate through the application from one page to the next. When the behavior of a single user is observed in isolation, intrusion attempts become much easier to detect.


Detecting and responding to brute-force attacks

  Brute-force attacks normally go undetected in most web applications. With state management in place, an IDS tracks unusual events (such as login failures), and it can be configured to take action when a threshold is reached. It is often convenient to slow down future authentication attempts slightly, not enough for real users to notice but enough to practically stop automated scripts. If an authentication script takes 50 milliseconds to make a decision, a script can make around 20 attempts per second. If you introduce a delay of, say, one second, that will bring the speed to under one attempt per second. That, combined with an alert to someone to investigate further, would provide a decent defense. A sketch of the delay technique appears after this list.


Implementation of session timeouts

Sessions can be expired after a configured timeout, and users would then be required to re-authenticate. Users can also be logged out after a period of inactivity.


Detection and prevention of session hijacking

In most cases, session hijacking results in a change of IP address and some other request data (that is, request headers are likely to be different). A stateful monitoring tool can detect the anomalies and prevent exploitation from taking place. The recommended action to take is to terminate the session, ask the user to re-authenticate, and log a warning.


Allowing only links provided to the client in the previous request

Some tools can be strict and only allow users to follow the links that have been given in the previous response. This seems like an interesting feature but can be difficult to implement. One problem with it is that it prevents the user from using more than one browser window with the application. Another problem is that it can cause incompatibilities with applications using JavaScript to construct links dynamically.
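
The entry-point enforcement described earlier can be approximated even with a rule-based tool. The following mod_security 1.x sketch (the entry point URL and cookie name are hypothetical) redirects to the start page any request that is not for the entry point and arrives without a session cookie:

    # Request is not for the entry point...
    SecFilterSelective REQUEST_URI "!^/index\.php" chain
    # ...and carries no session cookie: send the user to the start page
    SecFilterSelective HTTP_Cookie "!sessionid" "redirect:http://www.example.com/index.php"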
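
The delay technique against brute-force attacks is also easy to sketch. This crude approximation (the login script name is hypothetical) pauses every request to the login script for one second, which a human will barely notice but which reduces an automated script to roughly one attempt per second:

    # Pause login requests for 1000 ms, log them, then let them proceed
    SecFilterSelective REQUEST_URI "^/login\.php" "pause:1000,pass,log"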

12.1.4.6 Anti-evasion techniques

One area where network-based IDSs have had trouble with web traffic is evasion techniques (see Chapter 10). The problem is that there are many ways to alter incoming (attack) data so that it keeps its original meaning as far as the application is concerned, yet is modified enough to sneak under the IDS radar. This is an area where dedicated web IDSs provide significant improvement. For example, just by looking at whole HTTP requests at a time, an entire class of attacks based on request fragmentation is avoided. And because they understand HTTP well and can separate dynamic requests from requests for static resources (and thus choose not to waste time protecting static requests that cannot be compromised), they can afford to apply many anti-evasion techniques that would prove too time consuming for NIDSs.

12.1.4.7 Response monitoring and information leak prevention

Information leak prevention is a fancy name for response monitoring. In principle it is identical to request monitoring, and its goal is to watch the output for suspicious patterns and prevent the response from reaching the client when such a pattern is detected. The most likely candidates for patterns in output are credit card numbers and social security numbers. Another use for this technique is to watch for signs of successful intrusions, as I will demonstrate later in the chapter.
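
A minimal sketch of response monitoring using mod_security 1.x output filtering directives follows; the pattern is deliberately simplistic and will produce false positives:

    # Enable scanning of response bodies
    SecFilterScanOutput On
    # Only scan text responses, not binary content
    SecFilterOutputMimeTypes "(null) text/html text/plain"
    # Block responses containing a credit card-like number pattern
    SecFilterSelective OUTPUT "[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}"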

It is impossible to prevent information leakage by a determined and skillful attacker, since he will always be able to encode the information in a way that prevents detection by an IDS. Still, this technique can protect when the attacker does not have full control over the server but instead tries to exploit a weakness in the application.
