
6.5 Advanced Byte-to-Character Conversion

In Example 6-4 we saw a basic loop for copying bytes from one channel to another. Another common loop in programs that use the New I/O API combines reading or writing bytes with decoding bytes to characters or encoding characters to bytes. In Example 6-3 we saw the Charset.decode( ) method for decoding a buffer of bytes into a buffer of characters. This is a high-level convenience method, and we'll see similar convenience methods elsewhere in this chapter. For better streaming performance, however, you can use the lower-level CharsetDecoder and CharsetEncoder classes, as is done in Example 6-5. This example is the ChannelToWriter class, which defines a single static copy( ) method. The method reads bytes from a specified channel, decodes them to characters using the specified Charset, and writes them to the specified Writer. (Note that this is not the function performed by Channels.newReader( ), Channels.newWriter( ), or Channels.newChannel( ). Those factory methods of the Channels class wrap a channel around a stream or a stream around a channel; they do not perform a copy.)
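For contrast with the streaming loop developed in this section, the following is a minimal sketch of the two higher-level approaches mentioned above: decoding a whole buffer with Charset.decode( ), and wrapping a channel with Channels.newReader( ). The class name ConvenienceDecodeDemo and its two method names are hypothetical and exist only for illustration; the in-memory byte array stands in for a real file or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the book's examples.
class ConvenienceDecodeDemo {
    // Convenience decoding: the whole input must already be in one buffer.
    static String decodeAll(byte[] data, Charset charset) {
        return charset.decode(ByteBuffer.wrap(data)).toString();
    }

    // Wrapping: the Reader decodes on demand as characters are read from it.
    // Nothing is copied until the caller reads.
    static String readThroughWrapper(byte[] data, String charsetName)
        throws IOException
    {
        ReadableByteChannel channel =
            Channels.newChannel(new ByteArrayInputStream(data));
        Reader reader = Channels.newReader(channel, charsetName);
        StringBuilder out = new StringBuilder();
        char[] buf = new char[64];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) out.append(buf, 0, n);
        } finally {
            reader.close();   // Also closes the underlying channel
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes("UTF-8");
        System.out.println(decodeAll(data, Charset.forName("UTF-8")));
        System.out.println(readThroughWrapper(data, "UTF-8"));
    }
}
```

Both methods are convenient when the input is small or when a Reader is what the rest of the program expects. Neither gives you control over buffer sizes or over how decoding errors are handled, which is what the lower-level CharsetDecoder provides.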

The read/decode/write loop shown in Example 6-5 is a common one in java.nio code, but it is more complex than you might expect. One reason for the complexity is that in many character encodings there is no one-to-one correspondence between bytes and characters. There is therefore no guarantee that every byte in the buffer can be decoded each time through the loop: one or more bytes at the end of the buffer might be only the first part of a multibyte character. Those bytes must stay in the buffer until the next read supplies the rest, which is why the loop calls compact( ) rather than clear( ) on the byte buffer. Note also that before entering the loop, we tell the CharsetDecoder to ignore bad input. Without this, the decoder's default action is to report errors, and we would have to examine the CoderResult returned by each decode( ) call to ensure that it was successful.
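To make the incomplete-character problem concrete, here is a small sketch that feeds a decoder the two bytes of a UTF-8 encoded "é" (0xC3 0xA9) in separate chunks, as if they had arrived in separate reads. The class SplitCharacterDemo and its methods are hypothetical names used only for this illustration; the chunk arrays stand in for data read from a channel.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Hypothetical demo class, not part of the book's examples.
class SplitCharacterDemo {
    // Decode a sequence of chunks, carrying undecoded bytes between chunks.
    static String decodeChunks(byte[][] chunks) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(16);
        CharBuffer chars = CharBuffer.allocate(16);
        StringBuilder out = new StringBuilder();

        for (byte[] chunk : chunks) {
            bytes.put(chunk);                     // Simulates channel.read()
            bytes.flip();                         // Drain mode for decoding
            decoder.decode(bytes, chars, false);  // false: more input may follow
            chars.flip();
            out.append(chars);                    // Appends the decoded chars
            chars.clear();
            bytes.compact();                      // Keep any incomplete tail
        }
        bytes.flip();
        decoder.decode(bytes, chars, true);       // true: no more input
        decoder.flush(chars);
        chars.flip();
        out.append(chars);
        return out.toString();
    }

    // How many bytes are left undecoded after a single non-final decode?
    static int pendingAfter(byte[] chunk) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        ByteBuffer bytes = ByteBuffer.wrap(chunk);
        decoder.decode(bytes, CharBuffer.allocate(16), false);
        return bytes.remaining();
    }

    public static void main(String[] args) {
        byte[][] chunks = { { 'h', (byte) 0xC3 }, { (byte) 0xA9 } };
        System.out.println(decodeChunks(chunks).length());  // Two characters
        System.out.println(pendingAfter(chunks[0]));        // One byte held back
    }
}
```

After the first chunk, only the "h" is decoded; the 0xC3 byte remains in the byte buffer because it is not a complete character on its own. If the loop called clear( ) instead of compact( ), that byte would be discarded and the character lost.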

Example 6-5. ChannelToWriter.java
package je3.nio;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;

public class ChannelToWriter {
    /**
     * Read bytes from the specified channel, decode them using the specified
     * Charset, and write the resulting characters to the specified writer
     */
    public static void copy(ReadableByteChannel channel, Writer writer,
                            Charset charset)
        throws IOException
    {
        // Get and configure the CharsetDecoder we'll use
        CharsetDecoder decoder = charset.newDecoder( );
        decoder.onMalformedInput(CodingErrorAction.IGNORE);
        decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);

        // Get the buffers we'll use and the backing array for the CharBuffer.
        ByteBuffer bytes = ByteBuffer.allocateDirect(2*1024);
        CharBuffer chars = CharBuffer.allocate(2*1024);
        char[  ] array = chars.array( );

        while(channel.read(bytes) != -1) { // Read from channel until EOF
            bytes.flip( );                  // Switch to drain mode for decoding
            // Decode the byte buffer into the char buffer.
            // Pass false to indicate that we're not done.
            decoder.decode(bytes, chars, false);

            // Put the char buffer into drain mode, and write its contents
            // to the Writer, reading them from the backing array.
            chars.flip( );               
            writer.write(array, chars.position( ), chars.remaining( ));  

            // Discard all bytes we decoded, and put the byte buffer back into
            // fill mode.  Since all characters were output, clear that buffer.
            bytes.compact( );            // Discard decoded bytes
            chars.clear( );              // Clear the character buffer
        }
            
        // At this point there may still be some bytes in the buffer to decode
        // So put the buffer into drain mode, call decode( ) a final time, and
        // finish with a flush( ).
        bytes.flip( );
        decoder.decode(bytes, chars, true);  // True means final call
        decoder.flush(chars);                // Flush any buffered chars
        // Write these final chars (if any) to the writer.
        chars.flip( );                           
        writer.write(array, chars.position( ), chars.remaining( ));  
        writer.flush( );
    }

    // A test method: decode a UTF-8 file and copy it to standard out
    public static void main(String[  ] args) throws IOException {
        FileChannel c = new FileInputStream(args[0]).getChannel( );
        OutputStreamWriter w = new OutputStreamWriter(System.out);
        Charset utf8 = Charset.forName("UTF-8");
        try {
            ChannelToWriter.copy(c, w, utf8);
        }
        finally {  // Close the channel and writer even if copy( ) fails
            c.close( );
            w.close( );
        }
    }
}
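
ChannelToWriter configures its decoder to ignore bad input. As a complement, the following sketch shows roughly what examining the return value of decode( ) looks like when the decoder is left in (or explicitly set to) CodingErrorAction.REPORT. The class StrictDecodeDemo and its decodeStrict( ) method are hypothetical names for illustration, and the 8-character output buffer is deliberately small so that the overflow case is exercised; this is one possible way to structure the checks, not the only one.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical demo class, not part of the book's examples.
class StrictDecodeDemo {
    static String decodeStrict(byte[] input, Charset charset)
        throws CharacterCodingException
    {
        CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);

        ByteBuffer bytes = ByteBuffer.wrap(input);
        CharBuffer chars = CharBuffer.allocate(8);  // Small on purpose
        StringBuilder out = new StringBuilder();

        while (true) {
            // All input is already in the buffer, so every call is "final"
            CoderResult result = decoder.decode(bytes, chars, true);
            drain(chars, out);
            if (result.isError()) result.throwException();  // Malformed or unmappable
            if (result.isUnderflow()) break;                // All input consumed
            // Otherwise the result is overflow: chars was full, so decode again
        }
        while (true) {
            CoderResult result = decoder.flush(chars);
            drain(chars, out);
            if (result.isUnderflow()) break;                // Flush complete
        }
        return out.toString();
    }

    // Move the decoded characters into the StringBuilder and reset the buffer
    private static void drain(CharBuffer chars, StringBuilder out) {
        chars.flip();
        out.append(chars);
        chars.clear();
    }

    public static void main(String[] args) throws Exception {
        Charset utf8 = Charset.forName("UTF-8");
        System.out.println(decodeStrict("hello, world!".getBytes("UTF-8"), utf8));
        try {
            // 0xC3 must be followed by a continuation byte; 0x28 is not one
            decodeStrict(new byte[] { (byte) 0xC3, 0x28 }, utf8);
        } catch (CharacterCodingException e) {
            System.out.println("Rejected bad input: " + e);
        }
    }
}
```

The three outcomes of a CoderResult map directly onto the branches of the loop: an error is turned into an exception, underflow means the input has been used up, and overflow means the output buffer must be drained before decoding continues.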