6.7 A Simple HTTP Client

Example 6-9 moves from file-based New I/O examples to networking. HttpGet is a program that performs an HTTP GET request to download a file from a web server. It uses a SocketChannel object to communicate with the server, and a FileChannel to store the downloaded data in a file. (If no filename is specified on the command line, it instead uses the Channels utility class to obtain a WritableByteChannel wrapper around the System.out standard output stream.)
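The choice between a file channel and a wrapped output stream can be sketched in isolation. The following is a minimal sketch, not the example's own code: the class DestinationSketch and its methods open( ) and writeFully( ) are hypothetical names introduced here for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper class, not part of the book's example.
class DestinationSketch {
    // Returns a channel for the named file, or a channel wrapped around
    // the fallback stream (for instance System.out) when filename is null.
    static WritableByteChannel open(String filename, OutputStream fallback)
            throws IOException {
        if (filename != null) {
            return new FileOutputStream(filename).getChannel();
        }
        return Channels.newChannel(fallback);
    }

    // Writes the whole buffer; a single write() call is not guaranteed
    // to drain it, so loop until nothing remains.
    static void writeFully(WritableByteChannel out, ByteBuffer buf)
            throws IOException {
        while (buf.hasRemaining()) {
            out.write(buf);
        }
    }
}
```

Because both branches return a WritableByteChannel, the code that copies response data does not need to know whether it is writing to a file or to standard output.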

HttpGet uses some networking classes that are new in Java 1.4, but are not part of java.nio. java.net.URI is the most important: it has more powerful URL parsing capabilities than java.net.URL, but does not have the built-in networking capability of the URL class. The other important new class is InetSocketAddress, which encapsulates a hostname and a port.
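As a rough illustration of how URI parsing and InetSocketAddress fit together, the sketch below pulls the pieces of an HTTP URI apart and applies the same defaults that HttpGet uses (port 80, path "/"). The class and method names are hypothetical, and InetSocketAddress.createUnresolved( ) is used here only so that the sketch performs no DNS lookup; the example itself uses the ordinary constructor.

```java
import java.net.InetSocketAddress;
import java.net.URI;

// Hypothetical helper class, not part of the book's example.
class UriPartsSketch {
    // Returns the port from the URI, or 80 (the HTTP default) if none given.
    static int portOf(URI uri) {
        int port = uri.getPort();
        return (port == -1) ? 80 : port;
    }

    // Builds the request target: the raw (still-escaped) path, or "/" if
    // the path is empty, followed by "?query" when a query is present.
    static String requestTarget(URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) path = "/";
        String query = uri.getRawQuery();
        return (query == null) ? path : path + "?" + query;
    }

    // Combines host and port into one address object without resolving it.
    static InetSocketAddress addressOf(URI uri) {
        return InetSocketAddress.createUnresolved(uri.getHost(), portOf(uri));
    }
}
```

For a URI such as http://example.com:8080/a%20b?x=1, requestTarget( ) keeps the escaped form /a%20b?x=1, which is what should appear on the request line.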

The HTTP request that is sent to the web server is first built as a String, then wrapped in a CharBuffer, which is encoded into bytes by a Charset object. The resulting ByteBuffer is then sent to the server using the write( ) method of the SocketChannel. Once the request is sent, the program enters a loop to read response data from the server and copy that data into the destination file (or standard output channel). The basic loop is essentially the same as the one in Example 6-4, but is complicated by code that extracts the HTTP status code from the response and scans for the byte sequence that marks the end of the HTTP headers and the beginning of the actual data. You may find it interesting to compare this example to Example 5-6, which performs a similar task but is implemented using java.net and java.io instead of java.nio.
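The String-to-CharBuffer-to-ByteBuffer step can be tried on its own, without a network connection. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and it uses the StandardCharsets constants from later Java releases in place of Charset.forName( ).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the book's example.
class RequestEncodingSketch {
    // Builds a minimal GET request; the blank line ends the headers.
    static String buildRequest(String host, String target) {
        return "GET " + target + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Wraps the string in a CharBuffer and encodes it to bytes.
    // The returned buffer is ready to be passed to a channel's write().
    static ByteBuffer encode(String request) {
        Charset charset = StandardCharsets.ISO_8859_1;
        return charset.encode(CharBuffer.wrap(request));
    }
}
```

ISO-8859-1 maps each character in its range to exactly one byte, so the encoded buffer holds as many bytes as the request has characters.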

Note that HttpGet does not implement the complete HTTP 1.1 protocol. It cannot handle the server response "100 Continue" (a complete implementation would ignore that response and continue reading), nor the response codes 301, 302, 303, and 305 (which should cause the client to redirect to a new URL). The code also does not know how to handle a Transfer-Encoding: chunked header in the response.
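To make the missing cases concrete, here is one way a client might classify a status line before deciding what to do next. It is only a sketch: the class, the Action enum, and both method names are hypothetical, and the redirect list simply mirrors the codes named in the text.

```java
// Hypothetical helper class, not part of the book's example.
class StatusLineSketch {
    enum Action { SKIP_INTERIM, DELIVER_BODY, FOLLOW_REDIRECT, FAIL }

    // Parses a status line such as "HTTP/1.1 200 OK".
    // Returns the numeric code, or -1 if the line is malformed.
    static int parseStatusCode(String statusLine) {
        String[] parts = statusLine.trim().split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) return -1;
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Maps a status code to the action the text describes for it.
    static Action classify(int code) {
        if (code == 100) return Action.SKIP_INTERIM;        // ignore, keep reading
        if (code >= 200 && code < 300) return Action.DELIVER_BODY;
        if (code == 301 || code == 302 || code == 303 || code == 305) {
            return Action.FOLLOW_REDIRECT;                   // request a new URL
        }
        return Action.FAIL;                                  // includes malformed (-1)
    }
}
```

Splitting on spaces avoids relying on fixed byte offsets, so a response that begins with "HTTP/1.0 " is parsed the same way.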

Example 6-9. HttpGet.java
package je3.nio;
import java.io.*;
import java.net.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;

public class HttpGet {
    public static void main(String[  ] args) {
        SocketChannel server = null;        // Channel for reading from server
        FileOutputStream outputStream = null;  // Stream to destination file
        WritableByteChannel destination;       // Channel to write to it

        try { // Exception-handling and channel-closing code follows this block

            // Parse the URL. Note we use the new java.net.URI, not URL here.
            URI uri = new URI(args[0]);

            // Now query and verify the various parts of the URI
            String scheme = uri.getScheme( );
            if (scheme == null || !scheme.equals("http"))
               throw new IllegalArgumentException("Must use 'http:' protocol");

            String hostname = uri.getHost( );

            int port = uri.getPort( );
            if (port == -1) port = 80; // Use default port if none specified

            String path = uri.getRawPath( );
            if (path == null || path.length( ) == 0) path = "/";

            String query = uri.getRawQuery( );
            query = (query == null)?"":'?'+query;

            // Combine the hostname and port into a single address object.
            // java.net.SocketAddress and InetSocketAddress are new in Java 1.4
            SocketAddress serverAddress=new InetSocketAddress(hostname, port);

            // Open a SocketChannel to the server
            server = SocketChannel.open(serverAddress);

            // Put together the HTTP request we'll send to the server.
            String request =
                "GET " + path + query + " HTTP/1.1\r\n" +  // The request
                "Host: " + hostname + "\r\n" +   // Required in HTTP 1.1
                "Connection: close\r\n" +        // Don't keep connection open
                "User-Agent: " + HttpGet.class.getName( ) + "\r\n" +
                "\r\n";  // Blank line indicates end of request headers

            // Now wrap a CharBuffer around that request string
            CharBuffer requestChars = CharBuffer.wrap(request);

            // Get a Charset object to encode the char buffer into bytes
            Charset charset = Charset.forName("ISO-8859-1");
            
            // Use the charset to encode the request into a byte buffer
            ByteBuffer requestBytes = charset.encode(requestChars);

            // Finally, we can send this HTTP request to the server.
            server.write(requestBytes);

            // Set up an output channel to send the output to.
            if (args.length > 1) {   // Use a specified filename
                outputStream = new FileOutputStream(args[1]);
                destination = outputStream.getChannel( );
            }
            else                    // Or wrap a channel around standard out
                destination = Channels.newChannel(System.out);

            // Allocate a 32-kilobyte byte buffer for reading the response.
            // allocateDirect( ) requests a low-level "direct" buffer, which
            // may allow more efficient I/O.
            ByteBuffer data = ByteBuffer.allocateDirect(32 * 1024);
            
            // Have we discarded the HTTP response headers yet?
            boolean skippedHeaders = false;
            // The code sent by the server
            int responseCode = -1;

            // Now loop, reading data from the server channel and writing it 
            // to the destination channel until the server indicates that it
            // has no more data.
            while(server.read(data) != -1) {  // Read data, and check for end
                data.flip( );      // Prepare to extract data from buffer

                // All HTTP responses begin with a set of HTTP headers, which
                // we need to discard.  The headers end with the string
                // "\r\n\r\n" or the bytes 13,10,13,10.  If we haven't already
                // skipped them, then do so now.
                if (!skippedHeaders) {
                    // First, though, read the HTTP response code.
                    // Assume that we get the complete first line of the
                    // response when the first read( ) call returns. Assume also
                    // that the first 9 bytes are the ASCII characters
                    // "HTTP/1.1 ", and that the response code is the ASCII
                    // characters in the following three bytes.
                    if (responseCode == -1) {
                        responseCode =
                            100 * (data.get(9)-'0') +
                            10 * (data.get(10)-'0') +
                            1 * (data.get(11)-'0');
                        
                        // If there was an error, report it and quit
                        // Note that we do not handle redirect responses.
                        if (responseCode < 200 || responseCode >= 300) {
                            System.err.println("HTTP Error: " + responseCode);
                            System.exit(1);
                        }
                    }
                    
                    // Now skip the rest of the headers.
                    try {
                        for(;;) {
                            if ((data.get( ) == 13) && (data.get( ) == 10) &&
                                (data.get( ) == 13) && (data.get( ) == 10)) {
                                skippedHeaders = true;
                                break;
                            }
                        }
                    }
                    catch (BufferUnderflowException e) {
                        // If we arrive here, it means we reached the end of
                        // the buffer and didn't find the end of the headers.
                        // There is a chance that the last 1, 2, or 3 bytes in
                        // the buffer were the beginning of the \r\n\r\n
                        // sequence, so back up a bit (but never past the
                        // start of the buffer if it held fewer than 3 bytes).
                        data.position(Math.max(0, data.position( )-3));
                        // Now discard the headers we have read
                        data.compact( );
                        // And go read more data from the server.
                        continue;
                    }
                }

                // Write the data out; drain the buffer fully.
                while(data.hasRemaining( )) destination.write(data);

                // Now that the buffer is drained, put it into fill mode
                // in preparation for reading more data into it.
                data.clear( );      // data.compact( ) also works here
            }
        }
        catch (Exception e) {    // Report any errors that arise
            System.err.println(e);
            System.err.println("Usage: java HttpGet <URL> [<filename>]");
        }
        finally { // Close the channels and output file stream, if needed
            try {
                if (server != null && server.isOpen( )) server.close( );
                if (outputStream != null) outputStream.close( );
            }
            catch(IOException e) {  }
        }
    }
}
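The header-skipping loop in the listing consumes the byte that breaks a partial match, so an unusual input such as \r\r\n\r\n would not be recognized, and it relies on backing up three bytes when the terminator may straddle two reads. One alternative, sketched here under the assumption that a small amount of state may be kept between reads, is to track how much of the terminator has been matched so far. The class name HeaderEndScanner and its method are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical alternative to the listing's scan; remembers progress
// through the \r\n\r\n terminator across successive buffers.
class HeaderEndScanner {
    private int matched = 0; // bytes of \r\n\r\n matched so far (0 to 4)

    // Consumes bytes from buf until the terminator is complete.
    // Returns true with buf positioned at the first body byte, or false
    // if buf ran out first. Stop calling once it has returned true.
    boolean skipHeaders(ByteBuffer buf) {
        while (buf.hasRemaining()) {
            byte b = buf.get();
            // The terminator alternates CR (13) and LF (10).
            byte expected = (matched % 2 == 0) ? (byte) 13 : (byte) 10;
            if (b == expected) {
                if (++matched == 4) return true;
            } else {
                // A stray CR may itself start a new match.
                matched = (b == 13) ? 1 : 0;
            }
        }
        return false;
    }
}
```

With this approach the read loop can simply clear( ) the buffer whenever skipHeaders( ) returns false, since no header bytes need to be carried over to the next read.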