Streams and bytes in Java

The byte type in Java is, strangely, signed, so bytes range from –128 to +127. That means you cannot write byte b = 0xCA directly (because 202 > 127); you need a (byte) cast, which keeps only the low 8 bits and leaves you with –54.
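To make the signed/unsigned business concrete, here is a small sketch of the conversions involved. The class name SignedByteDemo and the helper toUnsigned are just names I made up for illustration; Byte.toUnsignedInt is a standard method, but only from Java 8 onward.

```java
class SignedByteDemo {
    // Hypothetical helper: reinterpret a signed byte as a value in 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        // byte b = 0xCA;      // does not compile: 202 is outside -128..127
        byte b = (byte) 0xCA;  // narrowing cast keeps only the low 8 bits
        int widened = b;       // widening sign-extends to 0xFFFFFFCA

        System.out.println(b);                      // -54
        System.out.println(widened);                // -54
        System.out.println(toUnsigned(b));          // 202
        System.out.println(Byte.toUnsignedInt(b));  // 202 (Java 8+)
    }
}
```

The mask b & 0xFF is the usual idiom: the byte is widened with sign extension first, and the mask then clears the 24 high-order bits.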

The problem I'm fighting with now is with input/output streams. The OutputStream class has two methods that take byte arrays, but its one abstract method, write, takes an int. With InputStream, the abstract read returns an int mainly so that it can return –1 for end-of-file, so I guess that write uses an int for consistency.

But here's where it gets really weird. Normally widening a byte to an int does sign extension, but apparently that's not what you want here… or is it? The specification of void OutputStream.write(int):

Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.

and of int InputStream.read():

Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.

Got it? In both methods, just the lower 8 bits are used, except in the case where read() returns –1 for EOF. But here's the thing – OutputStream seems to violate its own contract:

import java.io.*;
class StreamTest extends OutputStream {
    public void write(int b) throws IOException {
        System.out.println("Writing int " + b);
    }
    public static void main(String[] args) throws IOException {
        byte[] bs = { (byte)0xCA, (byte)0xFE,
                      (byte)0x7F, (byte)0x32 };
        new StreamTest().write(bs);
    }
}

The above code sets up a byte array and passes it to OutputStream.write(byte[]), which in turn iterates and calls my overridden write(int). But in converting byte to int, it does the normal sign-extension widening, so it looked to me as if it's not just the lower 8 bits I need to pay attention to in write, it's the lower 7 bits and the sign. (Update: oops, I overlooked that the low 8 bits are the same either way thanks to two's complement, so masking with b & 0xFF in write recovers the original byte value!)
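In light of the update, a write(int) that follows the contract just masks off the high bits. This is a minimal sketch of that idea, not anything from the library I mention below; MaskingStream, its seen list, and the collect helper are names invented for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class MaskingStream extends OutputStream {
    // Values observed by write(int) after masking; kept only for the demo.
    final List<Integer> seen = new ArrayList<>();

    @Override
    public void write(int b) {
        // Ignore the 24 high-order bits, exactly as the contract describes.
        seen.add(b & 0xFF);
    }

    // Hypothetical helper: push an array through the inherited write(byte[]).
    static List<Integer> collect(byte[] bs) {
        MaskingStream out = new MaskingStream();
        try {
            out.write(bs);  // inherited method calls write(int) once per byte
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.seen;
    }

    public static void main(String[] args) {
        byte[] bs = { (byte) 0xCA, (byte) 0xFE, (byte) 0x7F, (byte) 0x32 };
        System.out.println(collect(bs));  // [202, 254, 127, 50]
    }
}
```

With the mask in place it no longer matters whether the caller passes 202 or the sign-extended –54; both come out as 202.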

Arguably, write doesn't need to be as picky about the sign as read, since read uses –1 to indicate EOF. Indeed, if I try:

import java.io.*;
class ReadTest extends InputStream {
    private byte[] bs = { (byte)0xCA, (byte)0x40,
                          (byte)0xFF, (byte)0x32 };
    private int i = 0;
    public int read() throws IOException {
        return bs[i++];
    }
    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[4];
        new ReadTest().read(buf);
        for( int i=0; i<4; i++ )
            System.out.println(buf[i]);
    }
}

then it outputs –54 64 0 0 (one value per line), because the 0xFF byte widens to –1, which is interpreted as EOF. The read method needs to convert to the unsigned range, with logic like: bs[i] >= 0 ? bs[i] : bs[i] + 256 (or, equivalently, bs[i] & 0xFF).
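For comparison, here is a sketch of the same stream with the conversion applied, plus a real end-of-file check, which the version above leaves out. UnsignedReadTest and readFour are names I picked for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class UnsignedReadTest extends InputStream {
    private final byte[] bs = { (byte) 0xCA, (byte) 0x40,
                                (byte) 0xFF, (byte) 0x32 };
    private int i = 0;

    @Override
    public int read() {
        if (i >= bs.length) {
            return -1;          // genuine end of stream
        }
        return bs[i++] & 0xFF;  // 0..255, so 0xFF comes back as 255, not -1
    }

    // Hypothetical helper: fill a 4-byte buffer via the inherited read(byte[]).
    static byte[] readFour() {
        byte[] buf = new byte[4];
        try (UnsignedReadTest in = new UnsignedReadTest()) {
            in.read(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf;
    }

    public static void main(String[] args) {
        for (byte b : readFour()) {
            System.out.println(b);  // -54, 64, -1, 50 (one per line)
        }
    }
}
```

Now the third byte survives: read() hands back 255, and the inherited read(byte[]) casts it to the byte –1 when storing it in the buffer.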

This all started with some 3rd-party library code I'm trying to use, for arithmetic encoding (compression) using prediction by partial match. They defined filtering input and output streams that seem to work okay in tests, but fail miserably as soon as I wrap them in a DataOutputStream. It turned out that they were doing sign-inversions – not in the way I showed above – but in a weird place. As long as you write bytes as an array and then read them as an array, it happened to work. But if you write an array and then read one at a time, the conversions don't work out. They also seemed to take the specification of write(int) at face value, just assuming that the parameter would be non-negative.
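The kind of check that would have caught this is a round trip that mixes the two access styles: write as an array, read back one byte at a time. The sketch below uses the standard ByteArrayOutputStream and ByteArrayInputStream as stand-ins, since the library's own classes aren't shown here; RoundTripCheck and roundTrips are made-up names, and the filtering streams under test would take the place of the stand-ins.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class RoundTripCheck {
    // Writes bs as an array through a DataOutputStream, adds one byte via
    // writeByte with a negative argument, then reads everything back singly.
    static boolean roundTrips(byte[] bs) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(sink)) {
                out.write(bs);       // array write
                out.writeByte(-54);  // reaches write(int) as a negative int
            }
            ByteArrayInputStream in =
                new ByteArrayInputStream(sink.toByteArray());
            for (byte expected : bs) {
                int c = in.read();   // 0..255, or -1 at end of stream
                if (c == -1 || (byte) c != expected) {
                    return false;
                }
            }
            // The extra byte must come back as 0xCA, followed by EOF.
            return in.read() == 0xCA && in.read() == -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bs = { (byte) 0xCA, (byte) 0xFF, 0x00, 0x7F };
        System.out.println(roundTrips(bs));  // true
    }
}
```

Including 0xFF in the test data matters, because that is the byte that collides with the –1 end-of-file marker when a stream forgets to mask.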

What a mess!

©2002–2015 Christopher League