War of the Worlds: Endianness

Most of the data types supported by programming languages span more than one byte, which raises a problem: in what order should these bytes be stored in memory? As with most problems, there is more than one solution, and, of course, they are all in use. The answer has split the world of computing (mainly) in two: those who adopted the little-endian layout and those who adopted the big-endian representation.

Data Storing Solutions

We often like to envision memory as a contiguous array of locations (each identified by an address) lined up in a row, the leftmost at address 0 and the rightmost at address N-1, where N is the total number of bytes of memory.

Now, take the example of the int type, represented on 4 bytes on a 32-bit platform. When it comes to storing it in memory, there are two widely used solutions.

In the little-endian representation, bytes are arranged in memory with the least significant byte at the lowest address (the leftmost memory location). For instance, to store the integer 0x12345678 (a hexadecimal value), the least significant byte (0x78) goes to the leftmost memory location, the lowest address (base+0), and the most significant byte (0x12) goes to the rightmost of the four locations needed, base+3.

In the big-endian representation, the order is reversed. The most significant byte (0x12 in the previous example) is stored at the leftmost memory location, the lowest address (base+0), and the least significant byte (0x78) at the rightmost one, base+3.
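To make the two layouts concrete, here is a minimal C sketch, assuming a C99 compiler with <stdint.h>, that writes a 32-bit value into a 4-byte buffer in each order. It uses shifts rather than inspecting the host's own memory, so the result does not depend on the machine it runs on; store_le32 and store_be32 are hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical helper: little-endian layout,
   least significant byte at base+0. */
void store_le32(unsigned char buf[4], uint32_t value)
{
    buf[0] = (unsigned char)(value & 0xFF);         /* 0x78 for 0x12345678 */
    buf[1] = (unsigned char)((value >> 8) & 0xFF);  /* 0x56 */
    buf[2] = (unsigned char)((value >> 16) & 0xFF); /* 0x34 */
    buf[3] = (unsigned char)((value >> 24) & 0xFF); /* 0x12 */
}

/* Hypothetical helper: big-endian layout,
   most significant byte at base+0. */
void store_be32(unsigned char buf[4], uint32_t value)
{
    buf[0] = (unsigned char)((value >> 24) & 0xFF); /* 0x12 */
    buf[1] = (unsigned char)((value >> 16) & 0xFF); /* 0x34 */
    buf[2] = (unsigned char)((value >> 8) & 0xFF);  /* 0x56 */
    buf[3] = (unsigned char)(value & 0xFF);         /* 0x78 */
}
```

Calling store_le32(buf, 0x12345678) leaves the bytes 78 56 34 12 in buf, while store_be32 leaves 12 34 56 78.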

But of course, that's not all. Middle-endian (or mixed-endian) orders are used on some platforms, and the exact position of the bytes varies between them. A PDP-11 processor stores the integer 0x12345678 as 0x34, 0x12, 0x78, 0x56 (from left to right).
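The PDP-11 order can be read as two 16-bit words: the more significant word is stored first, but the bytes within each word are little-endian. A minimal sketch of that rule, assuming <stdint.h>; store_pdp32 is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: PDP-11 style ("middle-endian") layout.
   The high 16-bit word comes first, but each word is stored
   little-endian, giving 34 12 78 56 for 0x12345678. */
void store_pdp32(unsigned char buf[4], uint32_t value)
{
    uint16_t high = (uint16_t)(value >> 16);    /* 0x1234 */
    uint16_t low  = (uint16_t)(value & 0xFFFF); /* 0x5678 */

    buf[0] = (unsigned char)(high & 0xFF);      /* 0x34 */
    buf[1] = (unsigned char)(high >> 8);        /* 0x12 */
    buf[2] = (unsigned char)(low & 0xFF);       /* 0x78 */
    buf[3] = (unsigned char)(low >> 8);         /* 0x56 */
}
```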

Some architectures can be configured to work with either big-endian or little-endian byte order (ARM, DEC Alpha, MIPS, PA-RISC, and IA-64). These are referred to as bytesexual or bi-endian. On some architectures, the endianness can be switched by software (usually at start-up); on others, it is selected by hardware on the motherboard (and sometimes cannot be changed by software).

Endianness applies not only to the order of bytes in memory but also, by convention, to the numbering of bits in a byte (or a word, or a double word). On a big-endian architecture, the bits are usually numbered from the left: bit 0 is the most significant bit and bit 7 (the rightmost) is the least significant. On a little-endian architecture, the bits are numbered from right to left: the least significant bit (on the right) is bit 0, and the most significant bit (on the left) is bit 7.
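The two numbering conventions change only which index names a given bit, not the bit's value. A minimal sketch with two hypothetical helpers, bit_lsb0 and bit_msb0, that read bit n of a byte under each convention:

```c
/* LSB 0 numbering (little-endian convention): bit 0 is the least
   significant (rightmost) bit. n must be in the range 0..7. */
int bit_lsb0(unsigned char byte, unsigned n)
{
    return (byte >> n) & 1;
}

/* MSB 0 numbering (big-endian convention): bit 0 is the most
   significant (leftmost) bit. n must be in the range 0..7. */
int bit_msb0(unsigned char byte, unsigned n)
{
    return (byte >> (7u - n)) & 1;
}
```

For the byte 0x83 (binary 10000011), bit_lsb0(0x83, 1) and bit_msb0(0x83, 6) both return 1, because they name the same physical bit.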

And guess what: the way dates are written in different parts of the world shows the same kind of endianness, except that it concerns days, months, and years instead of bytes:

  • US: middle-endian representation: mm/dd/yy
  • Europe: little-endian representation: dd/mm/yy
  • Japan: big-endian representation: yy/mm/dd

The Origin of the Terminology

In case you wonder where the names come from, the answer may surprise you: from Jonathan Swift's Gulliver's Travels. In the first part of the book, Gulliver, an English sailor, wakes up after a shipwreck as a prisoner of the Lilliputians, a people six inches tall. In Chapter 4, a secretary of the Emperor of Lilliput tells him about the war with Blefuscu, a rival empire that had sheltered the Big-Endians during the civil war between the Big-Endians and the Little-Endians. In Lilliput, eggs had originally been broken at the larger end before being eaten. But when the son of an emperor (who later became emperor himself) cut his finger while breaking an egg, his father, the emperor, published an edict commanding everyone, under great penalties, to break their eggs at the little end. That edict led to a great civil war between the followers of the new way (the Little-Endians) and those who remained committed to the old way (the Big-Endians). In the turmoil of the conflict, the Big-Endians found refuge in the Empire of Blefuscu, and war broke out between the mighty empires of Lilliput and Blefuscu.

A quote from Chapter 4 of the book:

It began upon the following Occasion. It is allowed on all Hands, that the primitive way of breaking Eggs, before we eat them, was upon the larger End: But his present Majesty's Grand-father, while he was a Boy, going to eat an Egg, and breaking it according to the ancient Practice, happened to cut one of his Fingers. Whereupon the Emperor his Father published an Edict, commanding all his Subjects, upon great Penaltys, to break the smaller End of their Eggs. The People so highly resented this Law, that our Histories tell us there have been six Rebellions raised on that account; wherein one Emperor lost his Life, and another his Crown. These civil Commotions were constantly fomented by the Monarchs of Blefuscu; and when they were quelled, the Exiles always fled for Refuge to that Empire.

The Best Representation

Actually, there is no such thing. Although many have taken either one side or the other, both little-endian and big-endian representations have advantages and disadvantages.

With little-endian, assembly language instructions that work with numbers of different lengths (1, 2, or 4 bytes) all proceed the same way: they start with the least significant byte at address base+0 and move toward the most significant byte. As a result, the same address can be used to read a value as a byte, a word, or a double word.
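The following minimal sketch, assuming <stdint.h>, decodes a little-endian value of any width from 1 to 8 bytes; load_le is a hypothetical helper. Because the least significant byte is always at base+0, reading fewer bytes from the same address yields the low-order part of the value.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: read an unsigned little-endian integer of
   `width` bytes (1..8) starting at buf[0]. Byte i is weighted by
   2^(8*i), so the least significant byte is always at base+0. */
uint64_t load_le(const unsigned char *buf, size_t width)
{
    uint64_t value = 0;
    for (size_t i = 0; i < width; i++)
        value |= (uint64_t)buf[i] << (8 * i);
    return value;
}
```

With the bytes 78 56 34 12 in memory, load_le(buf, 1) gives 0x78, load_le(buf, 2) gives 0x5678, and load_le(buf, 4) gives 0x12345678.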

With a big-endian representation, no matter how long the number is, you can quickly test whether it is positive or negative by checking the byte at address base+0 (the most significant byte). Most network protocol headers, as well as many bitmap graphics formats, use big-endian order. On a big-endian machine, such data can be handled directly, since it already matches the machine's byte order; on a little-endian machine, the bytes of every element stored on more than one byte must first be reversed. Moreover, hexadecimal dumps are easier to read, because the bytes appear in the same order in which the number is written.
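A minimal sketch of the sign test, assuming a signed integer stored in big-endian order in a byte buffer, with the sign in the top bit (as in two's complement); is_negative_be is a hypothetical name. Whatever the width of the value, only byte 0 is examined.

```c
/* Hypothetical helper: buf holds a signed integer of any width
   in big-endian order. The sign bit is always the top bit of the
   most significant byte, which sits at base+0. */
int is_negative_be(const unsigned char *buf)
{
    return (buf[0] & 0x80) != 0;
}
```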

Endianness on Different Architectures

The following architectures use:

  • Little-endian:
    • Intel x86
    • AMD64
    • DEC VAX
    • MOS Technology 6502
  • Big-endian:
    • Sun SPARC
    • Motorola 68000
    • PowerPC
    • IBM System/360
  • Bi-endian, running in big-endian mode by default:
    • MIPS running IRIX
    • PA-RISC
    • Most POWER and PowerPC systems
  • Bi-endian, running in little-endian mode by default:
    • MIPS running Ultrix
    • Most DEC Alpha systems
    • IA-64 running Linux
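A program can also find out at run time which order its host uses. A minimal sketch, assuming a C99 compiler with <stdint.h>; host_is_little_endian is a hypothetical name. It copies a known value into a byte array and looks at which byte landed at the lowest address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise
   (big-endian, or an unusual layout such as the PDP-11 order). */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];

    memcpy(bytes, &probe, sizeof probe);  /* view the value's bytes in memory order */
    return bytes[0] == 0x04 && bytes[1] == 0x03 &&
           bytes[2] == 0x02 && bytes[3] == 0x01;
}
```

GCC and Clang also predefine the macros __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__, which allow the same check at compile time.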

Implications of Endianness

When you write software that runs on a single machine, you usually do not need to care about endianness. But when the machine is part of a network that includes machines with other architectures, and your software exchanges data with them, the data must be converted to an agreed byte order before it is sent and converted back after it is read.
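A minimal sketch of such a conversion, assuming a hypothetical message header with a 16-bit type and a 32-bit length that both sides have agreed to send in big-endian order. The struct and function names are illustrative only.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical message header, used only for illustration. */
struct msg_header {
    uint16_t type;
    uint32_t length;
};

/* Serialize the header into a 6-byte big-endian wire format before
   sending. Shifts are used instead of copying the struct with memcpy,
   so the output is the same on little- and big-endian hosts and no
   struct padding ends up on the wire. Returns the bytes written. */
size_t encode_header(const struct msg_header *h, unsigned char out[6])
{
    out[0] = (unsigned char)(h->type >> 8);
    out[1] = (unsigned char)(h->type);
    out[2] = (unsigned char)(h->length >> 24);
    out[3] = (unsigned char)(h->length >> 16);
    out[4] = (unsigned char)(h->length >> 8);
    out[5] = (unsigned char)(h->length);
    return 6;
}

/* Reverse conversion, applied after the bytes have been read. */
void decode_header(const unsigned char in[6], struct msg_header *h)
{
    h->type = (uint16_t)((in[0] << 8) | in[1]);
    h->length = ((uint32_t)in[2] << 24) | ((uint32_t)in[3] << 16) |
                ((uint32_t)in[4] << 8)  |  (uint32_t)in[5];
}
```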

However, there are cases when endianness matters even if your software runs on a single machine, because different file formats use different byte orders. For instance, JPEG stores its multi-byte fields in big-endian order, so a program that saves JPEG images on a little-endian machine must reverse the bytes of each multi-byte field before writing it to disk.
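A minimal sketch of producing one such big-endian field, the 16-bit length of a JPEG marker segment; jpeg_segment_header is a hypothetical helper, and the caller is assumed to write the resulting bytes (followed by the payload) with fwrite.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: fill out[] with a JPEG marker segment header,
   i.e. 0xFF, the marker code, then the 16-bit segment length in
   big-endian order. The length counts its own two bytes plus the
   payload (not the marker), so payload_len must not exceed 65533.
   Only this field is reordered; the file as a whole is not reversed.
   Returns the number of bytes written to out. */
size_t jpeg_segment_header(unsigned char out[4], unsigned char marker,
                           uint16_t payload_len)
{
    uint16_t length = (uint16_t)(payload_len + 2u);

    out[0] = 0xFF;
    out[1] = marker;
    out[2] = (unsigned char)(length >> 8);    /* high byte first */
    out[3] = (unsigned char)(length & 0xFF);  /* then the low byte */
    return 4;
}
```

For a JFIF APP0 segment (marker 0xE0) with a 14-byte payload, the header bytes are FF E0 00 10.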

The following table shows the byte order used by some file formats:

File format                          Endianness
-----------------------------------  ----------
Adobe Photoshop                      Big-endian
BMP (Windows and OS/2 Bitmaps)       Little-endian
DXF (AutoCAD)                        Variable
GIF                                  Little-endian
IMG (GEM Raster)                     Big-endian
JPEG                                 Big-endian
FLI (Autodesk Animator)              Little-endian
MacPaint                             Big-endian
PCX (PC Paintbrush)                  Little-endian
QTM (QuickTime Movies)               Little-endian
Microsoft RIFF (.WAV & .AVI)         Both
Microsoft RTF (Rich Text Format)     Little-endian
SGI (Silicon Graphics)               Big-endian
Sun Raster                           Big-endian
TGA (Targa)                          Little-endian
TIFF                                 Both, endian identifier encoded in the file
WPG (WordPerfect Graphics Metafile)  Big-endian
XWD (X Window Dump)                  Both, endian identifier encoded in the file
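For a little-endian format such as BMP, the same problem arises in reverse on big-endian hosts. A minimal sketch that reads the image dimensions from a BMP header, assuming the common BITMAPINFOHEADER layout (a 14-byte file header followed by the info header, with width and height at file offsets 18 and 22); le32_at and bmp_dimensions are hypothetical names.

```c
#include <stdint.h>

/* Decode a little-endian 32-bit field, whatever the host's order. */
static uint32_t le32_at(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: read width and height from the first 26 bytes
   of a BMP file that uses a BITMAPINFOHEADER. Returns 0 on success,
   -1 if the "BM" signature is missing. */
int bmp_dimensions(const unsigned char hdr[26], int32_t *width, int32_t *height)
{
    if (hdr[0] != 'B' || hdr[1] != 'M')
        return -1;
    *width  = (int32_t)le32_at(hdr + 18);
    *height = (int32_t)le32_at(hdr + 22);  /* negative means top-down rows */
    return 0;
}
```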

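A Unicode text file encoded in UTF-8, UTF-16, or UTF-32 may begin with a special marker called a byte order mark (BOM), the character U+FEFF. For UTF-16 and UTF-32, the way its bytes are serialized indicates whether the file uses little-endian or big-endian byte order; in UTF-8, which has no byte-order ambiguity, the BOM only identifies the encoding.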

BOM            Encoding
EF BB BF       UTF-8
FE FF          UTF-16 (big-endian)
FF FE          UTF-16 (little-endian)
00 00 FE FF    UTF-32 (big-endian)
FF FE 00 00    UTF-32 (little-endian)
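A minimal sketch of BOM detection based on the table above; the enum and function names are hypothetical. The 4-byte UTF-32 patterns must be tested first, because FF FE 00 00 also starts with the UTF-16 little-endian mark FF FE.

```c
#include <stddef.h>

/* Hypothetical result codes for the detector below. */
enum bom_kind {
    BOM_NONE,
    BOM_UTF8,
    BOM_UTF16_BE,
    BOM_UTF16_LE,
    BOM_UTF32_BE,
    BOM_UTF32_LE
};

/* Identify the encoding from the first len bytes of a file. */
enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    /* Longest patterns first: FF FE 00 00 would otherwise match FF FE. */
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0xFE && buf[3] == 0xFF)
        return BOM_UTF32_BE;
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0x00 && buf[3] == 0x00)
        return BOM_UTF32_LE;
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;
    return BOM_NONE;
}
```

A UTF-16 little-endian file whose first character is U+0000 cannot be told apart from UTF-32 little-endian by its BOM alone, so a real detector may need further heuristics.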

Conversion from Little-Endian to Big-Endian

The following function swaps the 4 bytes of a 32-bit unsigned integer and can be used to convert from the little-endian representation to big-endian and vice versa.

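#include <stdint.h>

/* Reverse the order of the four bytes of a 32-bit value,
   e.g. 0x12345678 becomes 0x78563412. Applying it twice
   gives back the original value. */
uint32_t swap(uint32_t value)
{
   return ((value & 0xFF000000u) >> 24) |  /* byte 3 -> byte 0 */
          ((value & 0x00FF0000u) >> 8)  |  /* byte 2 -> byte 1 */
          ((value & 0x0000FF00u) << 8)  |  /* byte 1 -> byte 2 */
          ((value & 0x000000FFu) << 24);   /* byte 0 -> byte 3 */
}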
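Many compilers also provide a byte-swap intrinsic that can replace the hand-written function. A minimal sketch, assuming GCC or Clang for __builtin_bswap32 (MSVC offers _byteswap_ulong in <stdlib.h> instead) and falling back to shifts and masks elsewhere; swap32 is a hypothetical name.

```c
#include <stdint.h>

/* Byte-swap a 32-bit value, using the GCC/Clang builtin when it is
   available (it may compile to a single instruction, such as BSWAP
   on x86) and portable shifts otherwise. */
uint32_t swap32(uint32_t value)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(value);
#else
    return ((value & 0xFF000000u) >> 24) |
           ((value & 0x00FF0000u) >> 8)  |
           ((value & 0x0000FF00u) << 8)  |
           ((value & 0x000000FFu) << 24);
#endif
}
```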

However, you don't have to write your own conversion functions for data transferred over a TCP/IP network, which uses big-endian ("network") byte order. There is a series of library functions that convert between the host representation and the network representation.
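On POSIX systems, the standard functions htons, htonl, ntohs, and ntohl are declared in <arpa/inet.h> (Windows provides them through <winsock2.h>). A minimal sketch of packing and unpacking a 32-bit field with htonl and ntohl; pack_u32_network and unpack_u32_network are hypothetical names.

```c
#include <arpa/inet.h>  /* POSIX; on Windows use <winsock2.h> */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a host value in network (big-endian)
   order, ready to be passed to send(). */
void pack_u32_network(unsigned char out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);  /* no-op on big-endian hosts */
    memcpy(out, &net, sizeof net);
}

/* Hypothetical helper: recover the host value from 4 received bytes. */
uint32_t unpack_u32_network(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```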



About the Author

Marius Bancila

Marius Bancila is a Microsoft MVP for VC++. He works as a software developer for a Norwegian-based company. He is mainly focused on building desktop applications with MFC and VC#. He keeps a blog at www.mariusbancila.ro/blog, focused on Windows programming. He is the co-founder of codexpert.ro, a community for Romanian C++/VC++ programmers.

Comments

  • Useful

    Posted by akhchand on 08/22/2006 06:30am

    Really its of great help... The way of writing and explanation is also extremely nice..

  • Gr8 article

    Posted by Krishnaa on 07/07/2006 01:38pm

    Nice to have it here, a gr8 reference.

  • A question

    Posted by AllenTing on 07/27/2005 04:17am

    The endianness applies not only to the order of bytes in memory, but also to the numbering of bits in a byte (a word, or a double-word). In the case of a big-endian architecture, the bits are numbered from left, with bit 0 being the most significant one, and bit 7 (at the most-right position) being the least significant one. On the other hand, with a little-endian architecture, the bits are numbered from right to left, the least significant one (at the right) being bit 0, and the most significant one (at the left) being the bit 7.

    In fact, I think the significance of the numbering of bits in a byte is not major.As it above said, the most important n bit is always at the most-left position no matter in big-endian or little-endian.The different point is just the name of bits(Ex:the most important bit at left is called bit 0 in big-endian and bit 7 in little-endian.). In my opinion, that can not reflect the difference between the two endians well .In reverse, the example of the order of bytes in memory is quite excellent. Thanks for the author's sharing!!

    • Design issue

      Posted by Ajay Vijay on 09/23/2005 05:24pm

      AllenTing, we are considering 8-bit as one byte - no matter how they are stored in memory (i.e. hardware). When we read a byte, the hardware returns the value that had been written. Now we have written an array of bytes: 0x40 0xea 0x0f 0x9d. When we need to read them in REVERSE order, the values would be same on the same machine. Further, let us assume the above given sequence of bytes is NOT array of bytes, but, logically, and integer (32-bit, 4-byte). Okay. When we send these sequence of bytes from the same machine (whatever the bit representation architechure), those bytes (read: BYTES) will be sent as having same values. Now, we do not know how the source computer and target computer interprets in terms of endainness - the data would not be perfect. Let us have an example, "ABCD" is sent from source to target. The target will receive the same set of bytes, but it might interpret it as "DCBA", "DCBA", "BADC",.... but the BYTES are properly exchanged!

    • Forgot it! I'm sorry

      Posted by AllenTing on 07/27/2005 10:55pm

      Thanks for your replying. my question is : In term of the definition of big and little endianness said in the article, for example, one byte 0x83(equals: 10000011 in binary),in my opinion, should be described as 10000011 in big-endianness ,while 11000001 in little-endianness. Considering following conditions: If one computer sends one byte 0x83 according to network, the actual series of bits should be 1->0->0->0->0->0->1->1.Ofcourse, the other computer receives the byte and the receiving series are also 1->0->0->0->0->0->1->1.Now the question is how the receiving computer sorts order of the series?10000011 (equals 0x83)in big-endianness? or 11000001(equals 0xC1) in little-endianness? I'm a beginner,if I make mistake, please point out.I just want to make it clear.Thank you!

    • what was the question

      Posted by cilu on 07/27/2005 04:55am

      The main difference between the two endianness architecture is given by the layout of bytes in memory. My intention was to point that in each architecture, the numbering of bits is also different. What was the question?

  • A very good article.

    Posted by Pinky98 on 07/27/2005 01:45am

    A very good article. Shot.
