11.1. Black-Box Testing

In black-box testing, you pretend you are an outsider, and you try to break in. This useful technique simulates the real world. The less you know about the system you are about to investigate, the better. I assume you are doing black-box assessment because you fall into one of these categories:
Unless you belong to the first category, you must ensure you have permission to perform black-box testing. Unauthorized black-box testing may be treated as a hostile act, and it is often illegal. If you are doing a favor for a friend, get written permission from someone who has the authority to provide it. Ask yourself this question: who am I pretending to be? Or, to put it differently, what is the starting point of my assessment? The answer depends on the nature of the system you are testing. Here are some choices:
Different starting points require different approaches. A system administrator may have access to the most important servers, but such servers are (hopefully) out of reach of a member of the public. The best way to conduct an assessment is to start with no special privileges and examine what the system looks like from that point of view, then continue upward, assuming other roles. While doing all this, remember that you are doing a web security assessment, which is only a small fraction of the subject of information security. Do not cover too much territory, or you will never finish. In your initial assessment, focus on the issues mostly under your responsibility. As you perform the assessment, record everything, and create an information trail. If you know something about the infrastructure beforehand, set that knowledge aside: you must be able to show it played no part in the black-box testing. You can use it later, as part of white-box testing. Black-box testing consists of the following steps:
I did not include report writing, but you will have to do that, too. To make your job easier, mark your findings this way:
11.1.1. Information Gathering

Information gathering is the first step of every security assessment procedure, and it is especially important when performed as part of a black-box testing methodology. Working blindly, you will see the information available to a potential attacker. Here we assume you are armed only with the name of a web site. Information gathering can be broadly separated into two categories: passive and active. Passive techniques cannot be detected by the organization being investigated. They involve extracting knowledge about the organization from systems outside the organization. They may include techniques that involve communication with systems run by the organization, but only if such techniques are part of their normal operation (e.g., the use of the organization's DNS servers) and cannot be detected. Most information gathering techniques are well known, having been used as part of traditional network penetration testing for years. Passive information gathering techniques were covered in the paper written by Gunter Ollmann:
The name of the web site you have been provided will resolve to an IP address, giving you the vital information you need to start with. Depending on what you have been asked to do, you must decide whether you want to gather information about the whole of the organization. If your only target is the public web site, the IP address of the server is all you need. If the target of your research is an application used internally, you will need to expand your search to cover the organization's internal systems. The IP address of the public web site may help discover the whole network, but only if the site is internally hosted. For smaller web sites, hosting internally is overkill, so hosting is often outsourced. Your best bet is to exchange email with someone from the organization: their IP address, possibly an address from an internal network, will be embedded in the email headers.

11.1.1.1 Organizational information

Your first goal is to learn as much as possible about the organization, so going to its public web site is a natural place to start. You are looking for the following information:
The web site should be sufficient for you to learn enough about the organization to map out its network of trust. In the worst case (from the point of view of attacking it), the organization will trust only itself. If it relies on external entities, there may be many opportunities for exploitation. Here is some of the information you should determine:
11.1.1.2 Domain name registration

Current domain name registration practices require significant private information to be provided to the public. This information can easily be accessed using the whois service, which is available in many tools, web sites, and on the command line. There are many whois servers (e.g., one for each registrar), and the important part of finding the information you are looking for is in knowing which server to ask. Normally, whois servers issue redirects when they cannot answer a query, and good tools will follow redirects automatically. When using web-based tools (e.g., http://www.internic.net/whois.html), you will have to perform redirection manually. Watch what information we can find on O'Reilly (registrar disclaimers have been removed from the output to save space):

$ whois oreilly.com
...
O'Reilly & Associates
1005 Gravenstein Hwy., North
Sebastopol, CA, 95472
US

Domain Name: OREILLY.COM

Administrative Contact -
    DNS Admin - nic-ac@OREILLY.COM
    O'Reilly & Associates, Inc.
    1005 Gravenstein Highway North
    Sebastopol, CA 95472
    US
    Phone - 707-827-7000
    Fax - 707-823-9746

Technical Contact -
    technical DNS - nic-tc@OREILLY.COM
    O'Reilly & Associates
    1005 Gravenstein Highway North
    Sebastopol, CA 95472
    US
    Phone - 707-827-7000
    Fax - 707-823-9746

Record update date - 2004-05-19 07:07:44
Record create date - 1997-05-27
Record will expire on - 2005-05-26
Database last updated on - 2004-06-02 10:33:07 EST

Domain servers in listed order:

NS.OREILLY.COM      209.204.146.21
NS1.SONIC.NET       208.201.224.11

11.1.1.3 Domain name system

A tool called dig can be used to convert names to IP addresses or do the reverse, convert IP addresses to names (known as reverse lookup). An older tool, nslookup, is still popular and widely deployed.

$ dig oreilly.com any

; <<>> DiG 9.2.1 <<>> oreilly.com any
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30773
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 3, ADDITIONAL: 4

;; QUESTION SECTION:
;oreilly.com.                   IN      ANY

;; ANSWER SECTION:
oreilly.com.            20923   IN      NS      ns1.sonic.net.
oreilly.com.            20923   IN      NS      ns2.sonic.net.
oreilly.com.            20923   IN      NS      ns.oreilly.com.
oreilly.com.            20924   IN      SOA     ns.oreilly.com. nic-tc.oreilly.com. 2004052001 10800 3600 604800 21600
oreilly.com.            20991   IN      MX      20 smtp2.oreilly.com.

;; AUTHORITY SECTION:
oreilly.com.            20923   IN      NS      ns1.sonic.net.
oreilly.com.            20923   IN      NS      ns2.sonic.net.
oreilly.com.            20923   IN      NS      ns.oreilly.com.

;; ADDITIONAL SECTION:
ns1.sonic.net.          105840  IN      A       208.201.224.11
ns2.sonic.net.          105840  IN      A       208.201.224.33
ns.oreilly.com.         79648   IN      A       209.204.146.21
smtp2.oreilly.com.      21011   IN      A       209.58.173.10

;; Query time: 2 msec
;; SERVER: 217.160.182.251#53(217.160.182.251)
;; WHEN: Wed Jun  2 15:54:00 2004
;; MSG SIZE  rcvd: 262

This type of query reveals basic information about a domain name, such as the name servers and the mail servers. We can gather more information by asking a specific question (e.g., "What is the address of the web site?"):

$ dig www.oreilly.com

;; QUESTION SECTION:
;www.oreilly.com.               IN      A

;; ANSWER SECTION:
www.oreilly.com.        20269   IN      A       208.201.239.36
www.oreilly.com.        20269   IN      A       208.201.239.37

The dig tool converts IP addresses into names when the -x option is used:

$ dig -x 208.201.239.36

;; QUESTION SECTION:
;36.239.201.208.in-addr.arpa.   IN      PTR

;; ANSWER SECTION:
36.239.201.208.in-addr.arpa. 86381 IN   PTR     www.oreillynet.com.

You can see that this reverse query of the IP address obtained by looking up the domain name oreilly.com gave us a whole new domain name.
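Reverse lookups can be scripted across an entire address range to map out neighboring hosts. Here is a minimal sketch, assuming a Unix shell with dig available and using the 208.201.239.0/24 range discovered above:

# Reverse-resolve a whole /24, printing only addresses with PTR records
for i in $(seq 1 254); do
    name=$(dig +short -x 208.201.239.$i)
    [ -n "$name" ] && echo "208.201.239.$i $name"
done

Every new name discovered this way is a candidate for further investigation.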
A zone transfer is a service through which all the information about a particular domain name is transferred from a domain name server. Such transfers are handy because of the wealth of information they provide, and for the same reason access to the zone transfer service is often restricted. Zone transfers are generally not used for normal DNS operation, so requests for them are sometimes logged and treated as signs of preparation for intrusion.
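To attempt a zone transfer, ask one of the domain's own name servers (found in the NS records above) directly. A sketch using dig:

$ dig @ns.oreilly.com oreilly.com axfr

A correctly configured server will refuse the request; a misconfigured one will dump every record in the zone, handing you a complete map of the domain.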
11.1.1.4 Regional Internet Registries

You have probably discovered several IP addresses by now. IP addresses are not sold; they are assigned to organizations by bodies known as Regional Internet Registries (RIRs). The information kept by RIRs is publicly available. Four registries cover address allocation across the globe:

- ARIN (American Registry for Internet Numbers), covering North America
- RIPE NCC (RIPE Network Coordination Centre), covering Europe and the Middle East
- APNIC (Asia Pacific Network Information Centre), covering the Asia-Pacific region
- LACNIC (Latin American and Caribbean Internet Addresses Registry)
Registries do not work with end users directly. Instead, they delegate large blocks of addresses to providers, who delegate smaller chunks further. In effect, an address can be assigned to multiple parties. In theory, every IP address should be associated with the organization using it. In real life, Internet providers may not update the IP address database. The best you can do is to determine the connectivity provider of an organization. IP assignment data can be retrieved from any active whois server, and different servers can give different results. In the case below, I just guessed that whois.sonic.net exists. This is what we get for one of O'Reilly's IP addresses:

$ whois -h whois.sonic.net 209.204.146.21
[Querying whois.sonic.net]
[whois.sonic.net]
You asked for 209.204.146.21
network:Class-Name:network
network:Auth-Area:127.0.0.1/32
network:ID:NETBLK-SONIC-209-204-146-0.127.0.0.1/32
network:Handle:NETBLK-SONIC-209-204-146-0
network:Network-Name:SONIC-209-204-146-0
network:IP-Network:209.204.146.0/24
network:IP-Network-Block:209.204.146.0 - 209.204.146.255
network:Org-Name:John Irwin
network:Email:ora@sonic.net
network:Tech-Contact;Role:SACC-ORA-SONIC.127.0.0.1/32
network:Class-Name:network
network:Auth-Area:127.0.0.1/32
network:ID:NETBLK-SONIC-209-204-128-0.127.0.0.1/32
network:Handle:NETBLK-SONIC-209-204-128-0
network:Network-Name:SONIC-209-204-128-0
network:IP-Network:209.204.128.0/18
network:IP-Network-Block:209.204.128.0 - 209.204.191.255
network:Org-Name:Sonic Hostmaster
network:Email:ipowner@sonic.net
network:Tech-Contact;Role:SACC-IPOWNER-SONIC.127.0.0.1/32

11.1.1.5 Search engines

Search engines have become a real resource when it comes to information gathering. This is especially true for Google, which has exposed its functionality through an easy-to-use programming interface. Search engines can help you find:
Look at some example Google queries. If you want to find a list of PDF documents available on a site, type a Google search query such as the following:

site:www.modsecurity.org filetype:pdf

To see if a site contains Apache directory listings, type something like this:

site:www.modsecurity.org intitle:"Index of /" "Parent Directory"

To see if it contains any WS_FTP log files, type something like this:

site:www.modsecurity.org inurl:ws_ftp.log

Anyone can register with Google and receive a key that will support up to 1,000 automated searches per day. To learn more about Google APIs, see the following:
11.1.1.6 Social engineering

Social engineering is arguably the oldest hacking technique, having been used hundreds of years before computers were invented. With social engineering, a small effort can go a long way. Kevin Mitnick (http://en.wikipedia.org/wiki/Kevin_Mitnick) is the most well-known practitioner. Here are some social-engineering approaches:
For more information on social engineering (and funny real-life stories), see:
11.1.1.7 Connectivity

For each domain name or IP address you acquire, perform a connectivity check using traceroute. Again, I use O'Reilly as an example.

$ traceroute www.oreilly.com
traceroute: Warning: www.oreilly.com has multiple addresses; using 208.201.239.36
traceroute to www.oreilly.com (208.201.239.36), 30 hops max, 38 byte packets
 1  gw-prtr-44-a.schlund.net (217.160.182.253)  0.238 ms
 2  v999.gw-dist-a.bs.ka.schlund.net (212.227.125.253)  0.373 ms
 3  ge-41.gw-backbone-b.bs.ka.schlund.net (212.227.116.232)  0.535 ms
 4  pos-80.gw-backbone-b.ffm.schlund.net (212.227.112.127)  3.210 ms
 5  cr02.frf02.pccwbtn.net (80.81.192.50)  4.363 ms
 6  pos3-0.cr02.sjo01.pccwbtn.net (63.218.6.66)  195.201 ms
 7  layer42.ge4-0.4.cr02.sjo01.pccwbtn.net (63.218.7.6)  187.701 ms
 8  2.fast0-1.gw.equinix-sj.sonic.net (64.142.0.21)  185.405 ms
 9  fast5-0-0.border.sr.sonic.net (64.142.0.13)  191.517 ms
10  eth1.dist1-1.sr.sonic.net (208.201.224.30)  192.652 ms
11  www.oreillynet.com (208.201.239.36)  190.662 ms

The traceroute output shows the route packets use to travel from your location to the target's location. Only the last few lines matter; the last line is the server itself. At hop 10, we see what is most likely a router, connecting the organization's network to the Internet.
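Some networks filter the UDP probes traceroute sends by default, causing the trace to die in a string of asterisks before it reaches the target. Many Linux traceroute implementations can send ICMP echo probes instead, which are more likely to be let through; the flag below is a sketch, so check your local manual page:

$ traceroute -I www.oreilly.com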
11.1.1.8 Port scanning

Port scanning is an active information-gathering technique. It is viewed as impolite and legally dubious. You should only perform port scanning against your own network or where you have written permission from the target. The purpose of port scanning is to discover active network devices on a given range of addresses and to analyze each device for public services. In the context of web security assessment, you will want to know whether a publicly accessible FTP server or a database engine is running on the same machine as the web server. If there is one, you may be able to use it as part of your assessment.
The most popular port-scanning tool is Nmap (http://www.insecure.org/nmap/), which is free and useful. It is a command-line tool, but a freeware frontend called NmapW is available from Syhunt (http://www.syhunt.com/section.php?id=nmapw). In the remainder of this section, I will demonstrate how Nmap can be used to learn more about running devices. In all examples, the real IP addresses are masked because they belong to real devices. The process of discovering active hosts is called a ping sweep: an attempt is made to ping each IP address in a range, and live addresses are reported. Here is a sample run, in which XXX.XXX.XXX.112/28 represents the network block you would type:

# nmap -sP XXX.XXX.XXX.112/28
Starting nmap 3.48 ( http://www.insecure.org/nmap/ )
Host (XXX.XXX.XXX.112) seems to be a subnet broadcast address (returned 1
extra pings).
Host (XXX.XXX.XXX.114) appears to be up.
Host (XXX.XXX.XXX.117) appears to be up.
Host (XXX.XXX.XXX.120) appears to be up.
Host (XXX.XXX.XXX.122) appears to be up.
Host (XXX.XXX.XXX.125) appears to be up.
Host (XXX.XXX.XXX.126) appears to be up.
Host (XXX.XXX.XXX.127) seems to be a subnet broadcast address (returned 1
extra pings).
Nmap run completed -- 16 IP addresses (6 hosts up) scanned in 7 seconds

After that, you can proceed to get more information from individual hosts by looking at their TCP ports for active services. The following is sample output from scanning a single host. I have used one of my own servers, since scanning one of O'Reilly's servers without permission would have been inappropriate:

# nmap -sS XXX.XXX.XXX.XXX
Starting nmap 3.48 ( http://www.insecure.org/nmap/ )
The SYN Stealth Scan took 144 seconds to scan 1657 ports.
Interesting ports on XXX.XXX.XXX.XXX:
(The 1644 ports scanned but not shown below are in state: closed)
PORT STATE SERVICE
21/tcp open ftp
22/tcp open ssh
23/tcp open telnet
25/tcp open smtp
53/tcp open domain
80/tcp open http
110/tcp open pop-3
143/tcp open imap
443/tcp open https
993/tcp open imaps
995/tcp open pop3s
3306/tcp open mysql
8080/tcp open http-proxy
Nmap run completed -- 1 IP address (1 host up) scanned in 157.022 seconds

You can go further if you use Nmap with the -sV switch, in which case it will connect to the ports you specify and attempt to identify the services running on them. In the following example, you can see the results of service analysis when I run Nmap against ports 21, 80, and 8080. Nmap uses the Server header field to identify web servers, which is why it incorrectly identified the Apache running on port 80 as a Microsoft Internet Information Server. (I configured my server with a fake server name, as described in Chapter 2, where HTTP fingerprinting for discovering real web server identities is discussed.)

# nmap -sV XXX.XXX.XXX.XXX -P0 -p 21,80,8080
Starting nmap 3.48 ( http://www.insecure.org/nmap/ )
Interesting ports on XXX.XXX.XXX.XXX:
PORT STATE SERVICE VERSION
21/tcp open ftp ProFTPD 1.2.9
80/tcp open http Microsoft IIS webserver 5.0
8080/tcp open http Apache httpd 2.0.49 ((Unix) DAV/2 PHP/4.3.4)
Nmap run completed -- 1 IP address (1 host up) scanned in 22.065 seconds
Scanning results will usually fall into one of three categories:
If scan results fall into the first or the second category, the server is probably not being closely monitored. The third option shows the presence of people who know what they are doing; additional security measures may be in place.

11.1.2. Web Server Analysis

This is where the real fun begins. At a minimum, you need the following tools:
Optionally, you may choose to perform an assessment through one or more open proxies (by chaining). This makes the test more realistic, but it may disclose sensitive information to others (whoever controls the proxy), so be careful.
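Most HTTP tools make proxying straightforward. A minimal sketch using curl, assuming a single proxy you are allowed to use at proxy.example.com:3128 (a placeholder):

$ curl --proxy proxy.example.com:3128 -I http://www.example.com/

Chaining several proxies requires specialized tools, but even a single hop changes the source address the target sees.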
We will take these steps:
11.1.2.1 Testing SSL

I have put SSL tests first because, logically, SSL is the first layer of security you encounter. Also, in some rare cases you will encounter a target that requires use of a privately issued client certificate. In such cases, you are unlikely to progress further until you acquire a client certificate. However, you should still attempt to trick the server into giving you access without a valid client certificate. Attempt to access the server using any kind of client certificate (even a certificate you created yourself will do). If that fails, try to access the server using a proper certificate signed by a well-known CA. On a misconfigured SSL server, such a certificate will pass the authentication phase and allow access to the application, even though the server is only supposed to accept privately issued certificates. Sometimes using a valid certificate with a subject of admin or Administrator may get you inside (without a password). Whether or not a client certificate is required, perform the following tests:
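Several of these checks can be performed by hand with the openssl command-line tool. A minimal sketch, with the hostname and certificate filenames as placeholders: the first command presents an arbitrary client certificate, and the second tests whether the server still accepts weak export-grade ciphers (it should refuse to complete the handshake):

$ openssl s_client -connect www.example.com:443 -cert client.pem -key client.key
$ openssl s_client -connect www.example.com:443 -cipher EXPORT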
11.1.2.2 Identifying the web server

After SSL testing (if any), attempt to identify the web server. Start by typing a Telnet command such as the following, substituting the appropriate web site name:

$ telnet www.modsecurity.org 80
Trying 217.160.182.153...
Connected to www.modsecurity.org.
Escape character is '^]'.
OPTIONS / HTTP/1.0
Host: www.modsecurity.org

HTTP/1.1 200 OK
Date: Tue, 08 Jun 2004 10:54:52 GMT
Server: Microsoft-IIS/5.0
Content-Length: 0
Allow: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, PATCH, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK, TRACE

We learn two things from this output:
We turn to httprint for confirmation of the signature:

$ httprint -P0 -h www.modsecurity.org -s signatures.txt
httprint v0.202 (beta) - web server fingerprinting tool
(c) 2003,2004 net-square solutions pvt. ltd. - see readme.txt
http://net-square.com/httprint/
httprint@net-square.com
--------------------------------------------------
Finger Printing on http://www.modsecurity.org:80/
Derived Signature:
Microsoft-IIS/5.0
9E431BC86ED3C295811C9DC5811C9DC5050C5D32505FCFE84276E4BB811C9DC5
0D7645B5811C9DC5811C9DC5CD37187C11DDC7D7811C9DC5811C9DC58A91CF57
FCCC535BE2CE6923FCCC535B811C9DC5E2CE69272576B769E2CE69269E431BC8
6ED3C295E2CE69262A200B4C6ED3C2956ED3C2956ED3C2956ED3C295E2CE6923
E2CE69236ED3C295811C9DC5E2CE6927E2CE6923

Banner Reported: Microsoft-IIS/5.0
Banner Deduced: Apache/1.3.27
Score: 140
Confidence: 84.34

This confirms the version of the web server that was reported by Netcraft. The confirmation shows the web server had not been upgraded since October 2003, so the chances of web server modules having been upgraded are slim. This is good information to have. The complete signature gives us many things to work with: from here we can go and examine known vulnerabilities for Apache, PHP, mod_ssl, and OpenSSL. The OpenSSL version (reported by Netcraft as 0.9.6b) looks very old. According to the OpenSSL web site, version 0.9.6b was released in July 2001. Many serious OpenSSL vulnerabilities have been made public since that time. A natural way forward from here would be to explore those vulnerabilities further. In this case, however, that would be a waste of time, because the version of OpenSSL running on the server is not actually vulnerable to current attacks. Vendors often create custom branches of the software applications they include in their operating systems. After the split, the included applications are maintained internally, and their version numbers rarely change. When a security problem is discovered, vendors perform what is called a backport: the patch is ported from the current software version (maintained by the original application developers) back to the older release. A backport results only in a change to the packaging version number, which is typically visible only from the inside. Since there is no way of knowing this from the outside, the only thing to do is to go ahead and check for potential vulnerabilities.

11.1.2.3 Identifying the application server

We now know the site likely uses PHP, because PHP appeared in the web server signature. We can confirm our assumption by browsing and looking for a nonstatic part of the site. Pages with the extension .php are likely to be PHP scripts. Some sites attempt to hide the technology by hiding extensions. For example, they may associate the extension .html with PHP, making all pages dynamic. Or, if the site is running on a Windows server, associating the extension .asp with PHP may make the application look as if it were implemented in ASP.
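Setting up such masking takes only one line of web server configuration. As an illustration (a sketch for the Apache PHP 4 module), this directive hands all .html files to PHP:

AddType application/x-httpd-php .html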
Suppose you are not sure what technology is used at a web site. For example, suppose the extension for a file is .asp but you think ASP is not used. The HTTP response may reveal the truth:

$ telnet www.modsecurity.org 80
Trying 217.160.182.153...
Connected to www.modsecurity.org.
Escape character is '^]'.
HEAD /index.asp HTTP/1.0
Host: www.modsecurity.org

HTTP/1.1 200 OK
Date: Tue, 24 Aug 2004 13:54:11 GMT
Server: Microsoft-IIS/5.0
X-Powered-By: PHP/4.3.3-dev
Set-Cookie: PHPSESSID=9d3e167d46dd3ebd81ca12641d82106d; path=/
Connection: close
Content-Type: text/html

There are two clues in the response that tell you this is a PHP-based site. First, the X-Powered-By header includes the PHP version. Second, the site sends a cookie (the Set-Cookie header) whose name is PHP-specific. Don't forget that a site can utilize more than one technology. For example, CGI scripts are often used even when a better technology (such as PHP) is available. Examine all parts of the site to discover the technologies used.

11.1.2.4 Examining default locations

A search for default locations can yield significant rewards:
For Apache, here are the common pages to try to locate:

- /server-status (mod_status server monitoring page)
- /server-info (mod_info configuration overview)
- /manual (a copy of the Apache documentation)
- /icons (shared icons used in directory listings)
- /cgi-bin (the default script directory)
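Checking a handful of such locations is easy to automate. A minimal sketch, assuming curl is available and using www.example.com as a placeholder target:

# Probe well-known Apache locations; print the status code for each
for path in /server-status /server-info /manual/ /icons/ /cgi-bin/; do
    code=$(curl -s -o /dev/null -w "%{http_code}" http://www.example.com$path)
    echo "$code $path"
done

Anything that answers with a 200 (or a 401, hinting at a protected but present resource) deserves a closer look.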
11.1.2.5 Probing for common configuration problems

Test to see if proxy operations are allowed in the web server. A running proxy service that allows anyone to use it without restriction (a so-called open proxy) represents a big configuration error. To test, connect to the target web server and request a page from a totally different web server. In proxy mode, you are allowed to enter a full hostname in the request (otherwise, hostnames go into the Host header):

$ telnet www.example.com 80
Connected to www.example.com.
Escape character is '^]'.
HEAD http://www.google.com:80/ HTTP/1.0

HTTP/1.1 302 Found
Date: Thu, 11 Nov 2004 14:10:14 GMT
Server: GWS/2.1
Location: http://www.google.de/
Content-Type: text/html; charset=ISO-8859-1
Via: 1.0 www.google.com
Connection: close

Connection closed by foreign host.

If the request succeeds (you get a response, like the response from Google in the example above), you have encountered an open proxy. If you get a 403 response instead, that could mean the proxy is active but configured not to accept requests from your IP address (which is good). Getting anything else as a response probably means the proxy code is not active. (Web servers sometimes simply respond with a status code 200 and return their default home page.)

The other way to use a proxy is through the CONNECT method, which is designed to handle any type of TCP/IP connection, not just HTTP. This is an example of a successful proxy connection using this method:

$ telnet www.example.com 80
Connected to www.example.com.
Escape character is '^]'.
CONNECT www.google.com:80 HTTP/1.0

HTTP/1.0 200 Connection Established
Proxy-agent: Apache/2.0.49 (Unix)

HEAD / HTTP/1.0
Host: www.google.com

HTTP/1.0 302 Found
Location: http://www.google.de/
Content-Type: text/html
Server: GWS/2.1
Content-Length: 214
Date: Thu, 11 Nov 2004 14:15:22 GMT
Connection: Keep-Alive

Connection closed by foreign host.

In the first part of the request, you send a CONNECT line telling the proxy server where you want to go. If the CONNECT method is allowed, you can continue typing. Everything you type from this point on goes directly to the target server. Having access to a proxy that is also part of an internal network opens up interesting possibilities. Internal networks usually use nonroutable private address space that cannot be reached from the outside. But the proxy, because it sits on two networks simultaneously, can be used as a gateway. Suppose you know that the IP address of a database server is 192.168.0.99. (For example, you may have found this information in an application library file through a file disclosure vulnerability.) There is no way to reach this database server directly, but if you ask the proxy nicely, it may respond:

$ telnet www.example.com 80
Connected to www.example.com.
Escape character is '^]'.
CONNECT 192.168.0.99:3306 HTTP/1.0

HTTP/1.0 200 Connection Established
Proxy-agent: Apache/2.0.49 (Unix)

If you think a proxy is there but configured not to respond to your IP address, make a note of it. This is one of those things whose exploitation can be attempted later, for example after a successful entry to a machine that holds an IP address internal to the organization.

The presence of WebDAV may allow file enumeration. You can test this using the WebDAV protocol directly (see Chapter 10) or with a WebDAV client. Cadaver (http://www.webdav.org/cadaver/) is one such client. You should also attempt to upload a file using a PUT method. On a web server that supports it, you may be able to upload and execute a script.
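A quick way to test PUT is with curl, which sends a PUT request when given a file to upload with -T (a sketch; the filename and URL are placeholders):

$ curl -T test.txt http://www.example.com/test.txt

A response in the 200 or 201 range suggests the upload succeeded, in which case the server's handling of uploaded content should be investigated further.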
Another frequent configuration problem is the unrestricted availability of web server access logs. The logs, when available, can reveal direct links to other interesting (possibly also unprotected) server resources. Here are some folder names you should try:
11.1.2.6 Examining responses to exceptional requests

For your review, you need to be able to differentiate between normal responses and exceptions when they are coming from the web server you are investigating. To do this, make several obviously incorrect requests at the beginning of the review and watch for the following:
Some applications respond to errors with HTTP status 200 as they would for successful requests, rather than following the HTTP standard of returning suitable status codes (such as status 404 when a page is not found). They do this either in error or in an attempt to confuse automated vulnerability scanners. Authors of vulnerability scanners know about this trick, but it is still used. Having HTTP status 200 returned in response to errors will slow down any programmatic analysis of the web site, but not by much: instead of using the response status code to detect problems, you will have to detect problems from the text embedded in the response page. Examine the error messages produced by the application (even though we have not reached application analysis yet). If the application gives out overly verbose error messages, note this problem, and proceed to use this flaw for information discovery later in the test.

11.1.2.7 Probing for known vulnerabilities

If there is sufficient information about the web server and the application server, and there is reason to suspect the site is not running the latest version of either, an attacker will try to exploit known vulnerabilities. Vulnerabilities fall into one of the following three categories:
Attackers are likely to attempt exploitation in cases 1 and 2. Exploitation through case 3 is possible in theory, but it requires much effort and determination by the attacker. Running up-to-date software is the best way to prevent the exploitation of valuable targets. If you have reason to believe a system is vulnerable to a known vulnerability, you should attempt to compromise it. A successful exploitation of a vulnerability is what black-box assessment is all about. However, exploitation can sometimes be dangerous and may lead to interrupted services, server crashes, or even data loss, so exercise good judgment and stop short of causing damage.

11.1.2.8 Enumerating applications

The last step in web server analysis is to enumerate installed applications. Frequently, there will be only one. Public web sites sometimes have several applications: one for the main content, another for forums, a third for a web log, and so on. Each application is a separate attack vector that must be analyzed. If you discover that a site uses a well-known application, you should look for its known vulnerabilities (for example, by visiting http://www.securityfocus.com/bid or http://www.secunia.com). If the application has not been patched recently, there may be vulnerabilities that can be exploited. The web application analysis steps should be repeated for every identified application.

11.1.2.9 Assessing the execution environment

Depending on the assessment you are performing, you may be able to execute processes on the server from the beginning (if you are pretending to be a shared hosting customer, for example). Even if such a privilege is not given to you, a successful exploitation of an application weakness may still provide you with this ability. If you can execute processes, one of the mandatory assessment steps is to assess the execution environment:
11.1.3. Web Application Analysis

If the source of the web application you are assessing is commonly available, download it for review. (You can install it later if you determine there is a reason to practice attacking it.) Try to find the exact version used at the target site. Then proceed with the following:
The remainder of this section continues with the review under the assumption that the source code is unavailable. The principle is the same, except that with the source code you would have much more information to work with.

11.1.3.1 Using a spider to map out the application structure

Map out the entire application structure. A good approach is to use a spider to crawl the site automatically and then review the results manually to fill in the blanks. Many spiders do not handle the HTML <base> tag properly; if the site uses it, you will likely have to do most of the work manually. As you traverse the application, note the response headers and the cookies used by the application. Whenever you discover a page that is part of a process (for example, a checkout process in an e-commerce application), write the information down. Those pages are candidates for tests against process state management weaknesses.

11.1.3.2 Examining page elements

Look into the source code of every page (here I mean the HTML source code, not the source of the script that generated it), examining the JavaScript code and the HTML comments. Developers often create a single JavaScript library file and use it for all application modules. It may happen that you get a lot of JavaScript code covering the use of an administrative interface.

11.1.3.3 Enumerating pages with parameters

Enumerate the pages that accept parameters. Forms are especially interesting because most of the application functionality resides in them. Give special attention to hidden form fields, because applications often do not expect the values of such fields to change. For each page, write down the following information:
You should note all scripts that perform security-sensitive operations, for the following reasons:
11.1.3.4 Examining well-known locations

Attempt to access directories directly, hoping to get directory listings and discover new files. Use WebDAV directory listings if WebDAV is available. If that fails, some of the well-known files may provide more information:
Mutate existing filenames, appending frequently used backup extensions and sometimes replacing the existing extension with one of the following:
Finally, attempting to download predictably named files and folders in every existing folder of the site may yield results. Some sample predictable names include:

11.1.4. Attacks Against Access Control

You have collected enough information about the application to analyze three potentially vulnerable areas in every web application:
11.1.5. Vulnerability Probing

The final step of black-box vulnerability testing requires the public interface of the application, the parameterized pages, to be examined to prove (or disprove) that they are susceptible to attack. If you have already found some known vulnerabilities, you will need to confirm them, so do that first. The rest of the work is a process of going through the list of all pages, fiddling with the parameters, and attempting to break the scripts. There is no single straight path to take. You need to understand web application security well, think on your feet, and combine pieces of information to build toward an exploit. This process is not covered in detail here. Practice using the material available in this chapter and in Chapter 10, and follow the links provided throughout both chapters. You may also want to try out the two web application security learning environments (WebMaven and WebGoat) described in Appendix A. Here is a list of the vulnerabilities you may attempt to find in an application. All of these are described in Chapter 10, with the exception of DoS attacks, which are described in Chapter 5.
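A common first probe against any parameterized page is to inject metacharacters and watch how the application reacts. A minimal sketch using curl; the page and parameter names are placeholders:

# Send a single quote in a parameter; database error text in the
# response is a strong hint of an SQL injection problem
curl -s "http://www.example.com/view.php?id=1'" | grep -iE "sql|syntax|odbc"

Any response that changes meaningfully as you vary the injected characters is worth investigating by hand.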