Tue, 19 Jun 2007


In general, a computer architecture can access the same data in multiple sizes, e.g. by the byte (8 bits), word (16), integer (32), or long integer (64). Usually, but not always, the byte is the smallest addressable unit, and the machine's address space maps one-to-one onto the bytes. Words are typically aligned at every other byte. Big-endian architectures treat the byte with the lower address as the most significant byte of the word, and the next byte as the least significant: they take the big end first. Little-endian architectures, which are the more common today, put the least significant byte first, at the lower address.

Clearly this has implications for converting data into values. If you get data (a stream of bytes) from a file or over the network, you need to know how to convert those bytes into values; in particular, you need to know whether the values were stored big-endian or little-endian. This problem has caused much grief among programmers, because they have tried to take shortcuts like:

unsigned int t(unsigned char *ptr) {
    return *(unsigned int *)ptr;
}
which is completely unportable code, whereas:
unsigned int t(unsigned char *ptr) {
    return ptr[0] | (ptr[1] << 8)
         | ((unsigned int)ptr[2] << 16)
         | ((unsigned int)ptr[3] << 24);
}
which is completely portable code, is explicit about byte order, and on a little-endian machine any reasonable peephole optimizer will generate the same code as the first.

Endianness is completely separate from file/network byte order. The first is a characteristic of the architecture. The second is a serialization method.

Posted [14:25] [Filed in: opensource] [permalink]