The byte is the smallest addressable unit in computer memory. When computers represent larger objects, they use multiple bytes. When storing these bytes in memory, the computer must decide in what order to store the data. Similarly, when transmitting large objects over a network, the order of transmission of multiple bytes is crucial. Connected computer systems must establish a unified way of transmitting multi-byte data to collaborate effectively.
This predetermined order for storing and transmitting multi-byte data is called byte order, also known as “endianness.” Understanding byte order is helpful for building lower-level programs and gaining a deeper understanding of computer systems.
Most Significant Bit and Most Significant Byte
Let’s take the example of how computers store the positive integer 305419896. What makes this number special is its hexadecimal representation: 0x12345678. Don’t be confused by the hexadecimal notation. As in decimal, the ‘8’ at the end occupies the units place (representing 8 lots of 1), the preceding ‘7’ carries a higher weight (representing 7 lots of 16), and so on. The highest-weighted ‘1’ at the beginning is the most significant digit. The same idea applies to the binary form of the number, where the bit with the highest weight is called the Most Significant Bit (MSB).
We extend the concept of the Most Significant Bit to the “Most Significant Byte”: if one byte holds 8 bits in a particular computer architecture (the common case), then our example value 0x12345678 can be represented using four bytes: 0x12 0x34 0x56 0x78. Among these, the byte 0x12, which contains the MSB, is the “Most Significant Byte”.
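The split into bytes can be checked with a few lines of Python. This is only a minimal sketch, assuming 8-bit bytes and a 32-bit value; the names value, byte_list, and most_significant_byte are illustrative.

```python
value = 305419896  # the example integer from the text

# The decimal and hexadecimal literals denote the same number.
same_number = (value == 0x12345678)

# Extract the four 8-bit bytes, most significant first, by shifting and masking.
byte_list = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]

# The first entry carries the highest weight.
most_significant_byte = byte_list[0]

print(same_number)                  # True
print([hex(b) for b in byte_list])  # ['0x12', '0x34', '0x56', '0x78']
print(hex(most_significant_byte))   # 0x12
```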
Big Endian and Little Endian
A simple way to understand endianness is to ask which end of a multi-byte value is stored or transmitted first. If the “Most Significant Byte” comes first (stored at the lowest address, or sent first), the order is called big-endian (BE). If the least significant byte comes first, it is called little-endian (LE).
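As a rough illustration, Python's built-in int.to_bytes can lay out the same value in either order. The variable names here are illustrative, and the 4-byte width is an assumption matching the 32-bit example.

```python
value = 0x12345678

# Big-endian: the most significant byte (0x12) comes first.
big = value.to_bytes(4, byteorder="big")

# Little-endian: the least significant byte (0x78) comes first.
little = value.to_bytes(4, byteorder="little")

print(big.hex())     # 12345678
print(little.hex())  # 78563412
```

The two results contain the same bytes; only their order differs.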
Consider the internet, where data is commonly transmitted in big-endian order. Suppose the sender has told us, the receiver, that a four-byte value is about to arrive, and we then receive the bytes 0xA0 0xB0 0xC0 0xD0 in sequence. The resulting value is 0xA0B0C0D0.
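The receiver's side of this example might look like the following sketch. The received variable stands in for bytes read from a connection, which is an assumption; no real network code is involved.

```python
# Bytes in the order they arrived (hypothetical stand-in for a network read).
received = bytes([0xA0, 0xB0, 0xC0, 0xD0])

# Built-in decoding: the first byte received is the most significant one.
value = int.from_bytes(received, byteorder="big")

# The same computation by hand: shift the running total left by one byte
# and append each newly received byte at the low end.
manual = 0
for b in received:
    manual = (manual << 8) | b

print(hex(value))   # 0xa0b0c0d0
print(hex(manual))  # 0xa0b0c0d0
```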
Similarly, suppose another system uses little-endian order and its bytes are 16 bits wide. If memory holds 0xF1CE 0x000F in sequence, the original value is 0x000FF1CE (it looks like “Office”, a hexadecimal joke). This is because in little-endian order, the “Most Significant Byte” comes last.
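Python's byte helpers assume 8-bit bytes, so the 16-bit case is easier to sketch with plain integer arithmetic. UNIT_BITS and units are illustrative names for the hypothetical machine's byte width and memory contents.

```python
UNIT_BITS = 16             # width of one "byte" on the hypothetical machine
units = [0xF1CE, 0x000F]   # storage units in memory order (lowest address first)

# Little-endian: the unit at the lowest address has the lowest weight,
# so each later unit is shifted further to the left.
value = 0
for index, unit in enumerate(units):
    value |= unit << (UNIT_BITS * index)

print(f"0x{value:08X}")  # 0x000FF1CE
```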
According to Wikipedia, there are also some rare byte orders, including middle-endian. For example, the PDP-11 series computers use mixed endianness, where to represent the value 0xA0B0C0D0, the corresponding data stored in memory is as follows:
0xB0 0xA0 0xD0 0xC0
The PDP-11 series computers represent a 32-bit value as two 16-bit words stored in big-endian order (0xA0B0, the most significant word, followed by 0xC0D0). Within each word, the bytes are stored in little-endian order (for example, in the word 0xA0B0, the byte 0xB0 is stored before 0xA0, the most significant byte).
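The mixed layout can be reproduced with a small sketch. The functions to_pdp_endian and from_pdp_endian are hypothetical helpers written for this illustration, not part of any library, and they assume a 32-bit unsigned value.

```python
def to_pdp_endian(value: int) -> bytes:
    """Lay out a 32-bit value as two 16-bit words, most significant word
    first, with the bytes inside each word in little-endian order."""
    high_word = (value >> 16) & 0xFFFF
    low_word = value & 0xFFFF
    return high_word.to_bytes(2, "little") + low_word.to_bytes(2, "little")


def from_pdp_endian(data: bytes) -> int:
    """Inverse of to_pdp_endian for a 4-byte input."""
    high_word = int.from_bytes(data[0:2], "little")
    low_word = int.from_bytes(data[2:4], "little")
    return (high_word << 16) | low_word


stored = to_pdp_endian(0xA0B0C0D0)
print(stored.hex())                  # b0a0d0c0
print(hex(from_pdp_endian(stored)))  # 0xa0b0c0d0
```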
Endianness is neutral
There’s no inherent advantage to using one over the other when representing multi-byte data. It’s simply a matter of convention. As long as a system stores, transmits, and interprets multi-byte data in the same order throughout, either choice works, and there’s no technical reason why one endianness would be superior to another.
However, you might be curious about the origin of the term “endianness.” According to Wikipedia, the word “endian” comes from Jonathan Swift’s “Gulliver’s Travels”, in which two factions dispute whether an egg should be cracked at its larger or its smaller end, much as engineers have argued over byte order:
… as I was going to tell you, been engaged in a most obstinate war for six and thirty moons past. It began upon the following occasion: it is allowed on all hands, that the primitive way of breaking eggs, before we eat them, was upon the larger end; but his present majesty’s grandfather, while he was a boy, going to eat an egg, and breaking it according to the ancient practice, happened to cut one of his fingers. Whereupon, the emperor his father, published an edict, commanding all his subjects, upon great penalties, to break the smaller end of their eggs. The people so highly resented this law, that our histories tell us, there have been six rebellions raised on that account; wherein one emperor lost his life, and another his crown. These civil commotions were constantly fomented by the monarchs of Blefuscu; and when they were quelled, the exiles always fled for refuge to that empire. It is computed that eleven thousand persons have at several times suffered death, rather than submit to break their eggs at the smaller end. Many hundred large volumes have been published upon this controversy; but the books of the Big-endians have been long forbidden, and the whole party rendered incapable by law of holding employments.
From “Gulliver’s Travels” §1.4, page 42.
Common Endianness
The computers we typically use employ the x86-64 architecture, which, like its predecessor x86, follows little-endian order. Endianness for the ARM architecture, widely used in mobile devices, is configurable. Similarly, endianness for the PowerPC, MIPS, and IA-64 architectures is also configurable.
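To find out which order the current machine uses, Python exposes sys.byteorder. The sketch below also cross-checks it by packing a value in native order with the struct module. The variable names are illustrative, and the output depends on the machine, so none is shown.

```python
import struct
import sys

# "little" or "big", as reported by the interpreter for this machine.
native = sys.byteorder

# "=" asks struct for the machine's native byte order with standard sizes;
# "I" is a 4-byte unsigned integer.
packed = struct.pack("=I", 0x12345678)

# On a little-endian machine the least significant byte (0x78) comes first.
first_byte_is_least_significant = (packed[0] == 0x78)

print(native, packed.hex())
```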
As mentioned, big-endian is commonly used in computer network transmission. RFC 791, which defines the Internet Protocol (IP), specifies that protocol header fields are transmitted in big-endian order.
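As a rough sketch of what this means in practice, the code below decodes a few fields from a hand-written 20-byte IPv4 header using the struct module, whose "!" prefix selects network (big-endian) order. The header bytes are made up for illustration, including a zeroed checksum and example addresses, and were not captured from a real packet.

```python
import struct

# Hypothetical IPv4 header (20 bytes, no options); checksum left as zero.
header = bytes.fromhex(
    "4500 0054 1c46 4000 4001 0000 c0a8 0001 c0a8 00c7"
)

# "!" = network byte order (big-endian), no padding between fields.
(version_ihl, tos, total_length, identification, flags_fragment,
 ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header)

# Reading the same two bytes in the wrong order gives a different number.
misread_length = int.from_bytes(header[2:4], byteorder="little")

print(total_length)                    # 84
print(misread_length)                  # 21504
print(".".join(str(b) for b in src))   # 192.168.0.1
```

Decoding the total-length field as little-endian yields 21504 instead of 84, which shows why both ends of a connection have to agree on the order.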