The Basics Of Binary

This article introduces the basics of how binary data is manipulated. It is intended for readers who are fairly new to numerical systems. The concepts of storing integers, real numbers, and characters are discussed.

Introduction

I will first go through the general theory of number systems (decimal, binary, and hexadecimal) and then proceed to how numbers are stored in computers. I would guess that everyone who has heard of a computer or the Internet has at some point come across the concepts of binary and hexadecimal. But, for the sake of completeness, allow me to re-introduce you to them.

Integers

The normal system of counting has ten elements from which you build up all other quantities. You start at 0, then 1, 2, 3..., up to 9; then you move on into the next "space" (called the next digit), 10. By using this system, you can get to any integer.

Now, consider the value 613. The value of the total number represented is:

$613_{10} = 6 \times 10^2 + 1 \times 10^1 + 3 \times 10^0 = 613_{10}$

Okay... that seemed a bit arbitrary... but stay with me.

Note the subscript, 10, to indicate which number system is being used. It's ten because there are ten elements in the number system.

Binary is much the same except that it only has two elements in its counting system. So, you start at 0 then 1, but now you have to move to the next digit so you get 10, then 11 and again to the next digit 100, and so on. Just as with the decimal system, you can represent any integer using this technique... and just like decimal, you can get a binary number's decimal value by converting it as follows:

$101_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 5_{10}$

The only difference here is that you use 2 instead of 10 because there are two elements in the number system.

The system used for hexadecimal is just like binary and decimal. Hexadecimal has 16 elements, namely 0, 1, 2, 3, ... 8, 9, A, B, C, D, E, F. The values of A, B, ... F are $10_{10}$, $11_{10}$, ... $15_{10}$. It follows the same rules of converting:

$\mathrm{4F}_{16} = 4 \times 16^1 + 15 \times 16^0 = 79_{10}$
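The three conversions above all follow one rule, so they can be sketched with a single C function. This is only a minimal illustration: the names digit_value and to_value are my own, and the code assumes an ASCII-style character set in which the letters A to F are contiguous.

```c
#include <ctype.h>

/* Value of one digit character (0-9, A-F), or -1 if it is not a digit. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)toupper((unsigned char)c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert a string of digits in the given base (2 to 16) to its value.
   Returns -1 if a character is not a valid digit in that base. */
long to_value(const char *digits, int base)
{
    long value = 0;
    for (; *digits != '\0'; digits++) {
        int d = digit_value(*digits);
        if (d < 0 || d >= base) return -1;
        value = value * base + d;   /* move one place left, add the digit */
    }
    return value;
}
```

Calling to_value("613", 10), to_value("101", 2), and to_value("4F", 16) reproduces the three worked examples. Multiplying the running total by the base at each step is equivalent to summing each digit times its power of the base.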

In binary, you call each digit a bit, and 8 of these bits are called a byte. Why 8? Who knows! But that's the way it happened and now we're stuck with it. Now, the nice thing about hexadecimal (and why you use it so much in computing) is that you can represent four bits using one hexadecimal digit, and eight bits using two hexadecimal digits.

Consider the binary number $11010011_2$. Wow, eight digits are pretty hard to read. But, if you convert it into hexadecimal it becomes only two digits. By using the first four bits ($1101_2$), you can convert this to the decimal equivalent $13_{10}$, which equals the hexadecimal digit $\mathrm{D}_{16}$. Then the last four bits, $0011_2 = 3_{10} = 3_{16}$. So, the total byte is represented in hexadecimal by $\mathrm{D3}_{16}$.
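The same split into two groups of four bits can be sketched in C. The function name byte_to_hex is hypothetical; it simply looks up one hexadecimal digit for each half of the byte.

```c
/* Write the two hexadecimal digits of a byte into out,
   followed by a terminating '\0' (so out needs room for 3 chars). */
void byte_to_hex(unsigned char byte, char out[3])
{
    static const char digits[] = "0123456789ABCDEF";
    out[0] = digits[byte >> 4];     /* first four bits */
    out[1] = digits[byte & 0x0F];   /* last four bits  */
    out[2] = '\0';
}
```

Passing the byte 11010011 (decimal 211) fills the buffer with the digit D for the bits 1101 and the digit 3 for the bits 0011.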

That pretty much concludes representing integers using binary and hexadecimal (except for negative numbers, but I'll get there). Now, I'll move on to something slightly more complicated.

Real Numbers

The decimal number 0.5 is a good enough place to start. How would you represent this in binary or hexadecimal? It still follows the same rules as before. When storing integers in decimal, the last digit carries a value of $10^0$, the next $10^1$, then $10^2$, and so on. So, the "power of" increases as you look to the left, or another way to look at it is that the "power of" decreases as you look to the right.

If you were to look beyond the last digit (in other words, you look right of the point), you would find that the sequence continues. The digit just to the right of the point carries a value of $10^{-1}$ (i.e. 0.1), then $10^{-2}$, and so on.

The same rule applies in binary except that you work with a base of 2, not 10. So, the binary digit to the right of the point would carry a value of $2^{-1}$, then $2^{-2}$. Now, $2^{-1}$ is 0.5 and $2^{-2}$ is 0.25, so when converting a binary number to decimal, you would use these values. Consider the binary number $101.11_2$:

$101.11_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2} = 5.75_{10}$

Easy? You get the hang of it as you go along.

And, obviously, the same is true for hexadecimal; for example, $\mathrm{4F.E}_{16}$:

$\mathrm{4F.E}_{16} = 4 \times 16^1 + 15 \times 16^0 + 14 \times 16^{-1} = 79.875_{10}$
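Here is one way the rule for digits on both sides of the point might look in C. It is a sketch under my own assumptions: the names frac_digit and to_real are invented for illustration, and the result is held in a double, so only values that a double can represent exactly come out exact.

```c
/* Value of one digit character (0-9, A-F, a-f), or -1 if invalid. */
static int frac_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Convert digits with an optional point, e.g. "101.11", in the given
   base (2 to 16). Returns -1.0 if an invalid character is found. */
double to_real(const char *s, int base)
{
    double value = 0.0;
    double weight = 1.0;    /* value carried by the current fractional place */
    int after_point = 0;

    for (; *s != '\0'; s++) {
        if (*s == '.' && !after_point) {
            after_point = 1;
            continue;
        }
        int d = frac_digit(*s);
        if (d < 0 || d >= base) return -1.0;
        if (!after_point) {
            value = value * base + d;   /* integer part */
        } else {
            weight /= base;             /* base^-1, base^-2, ... */
            value += d * weight;
        }
    }
    return value;
}
```

With this, to_real("101.11", 2) gives 5.75 and to_real("4F.E", 16) gives 79.875, matching the two worked examples.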

Floating Point Representations

In decimal, you often work with VERY large or VERY small numbers (just ask astrophysicists and nuclear physicists). It soon becomes impractical to write out a whole stack of zeros; for example, the speed of light, c, is approximately 300000000 m/s (and numbers get MUCH larger than that).

So, what some bright lad decided to do was put down the most significant digits and then just indicate how many others there are. So, in the case of the speed of light, c is $3 \times 10^8$. This is shortened even more by writing E in place of "times ten to the power of", giving you 3E8, which is so much easier to write.

The two parts of this number (3 and 8) are called the mantissa and the exponent, respectively.

Naturally, you can do the same thing in binary except that you use 2 instead of 10. So, the number $1000000_2$ could be written as $1_2 \times 2^6$. Writing the whole thing in binary (using our "E" notation), you get $1_2\mathrm{E}110_2$.

Now, consider the decimal number 185712956274. How do you represent this in your "E" notation? Well, you put down the most significant numbers and then state how many other digits there are. Which numbers are significant depends on how accurately you want it, so you are going to approximate. Typically, one wouldn't worry too much about anything more than about three digits (which is what I'll use here). That massively complicated number now becomes 186E9.

But, it gets a bit hard to compare numbers if I choose 186E9 and you choose 1857E8. At a glance, you can't say whether these two representations are approximately equal or if one is ten times larger than the other.

So, to get around this, you move the decimal point over in the mantissa so that only one digit is before the decimal point and then add the number of "shifts" to the exponent. With 186E9, the mantissa becomes 1.86 (two shifts), so the exponent becomes 11. Your number is then 1.86E11.

Looking back at your new representation, 1.86E11 means $1.86 \times 10^{11} = 186000000000$.

This representation is commonly referred to as floating point. The name is pretty self-explanatory; the point is not set in a specific place within the number, but "floats" around.

Binary can do it too. $10100000_2$ would be represented as $1.01_2\mathrm{E}0111_2$.
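The "shift the point and adjust the exponent" step works the same way in any base, so it can be sketched as one small C routine. The struct sci and the function normalize are names I made up for this illustration, and the mantissa is kept in a double, so decimal results are only approximate.

```c
/* A number written as mantissa E exponent in some base. */
struct sci {
    double mantissa;
    int exponent;
};

/* Move the point until exactly one nonzero digit sits before it,
   adjusting the exponent to compensate. Only positive mantissas
   are handled; anything else is returned unchanged. */
struct sci normalize(double mantissa, int exponent, int base)
{
    struct sci r = { mantissa, exponent };
    if (mantissa <= 0.0) return r;

    while (r.mantissa >= base) {    /* too large: shift the point left */
        r.mantissa /= base;
        r.exponent++;
    }
    while (r.mantissa < 1.0) {      /* too small: shift the point right */
        r.mantissa *= base;
        r.exponent--;
    }
    return r;
}
```

Normalizing 186E9 in base 10 takes two shifts and ends with an exponent of 11. Normalizing the binary number 10100000 (decimal 160, exponent 0) in base 2 takes seven shifts and leaves a mantissa of 1.25, which is 1.01 in binary.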

Those are the basic concepts. Now, you can look more specifically at how these numbers are stored in a computer.

Computer Representation of Integers

There is nothing special about storing a positive integer in a computer. It is stored exactly as you saw above. The only thing is that they are always stored in groups of eight bits; in other words, they are stored in bytes. So, the decimal number $53_{10}$ is stored as $00110101_2$.

Negative numbers are a bit more of a challenge. They are stored using the "two's complement" of the positive number. What is this "two's complement?" Well, all you do is invert all the 1s to 0s and vice versa, and then add one.

Use the example above, but negative: $-53_{10}$. The binary equivalent of the positive number is $00110101_2$. Swapping all the 1s and 0s gives you $11001010_2$, and adding one gives you:

$11001011_2$

That's it.
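In C, the "invert, then add one" recipe is a one-liner. This is a small sketch; the function name twos_complement8 is my own, and it works on a single byte to match the example.

```c
#include <stdint.h>

/* Two's complement of an 8-bit value: invert every bit, then add one.
   The cast keeps only the low eight bits of the result. */
uint8_t twos_complement8(uint8_t x)
{
    return (uint8_t)(~x + 1u);
}
```

Passing in 53 (binary 00110101) returns 203 (binary 11001011), and applying the function a second time gets you back to 53.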

But, why use the "two's complement?" It looks like a really roundabout way of doing things. The reason is that it makes addition REALLY easy. You just add up the bits.

Think of the operation $-53_{10} + 64_{10}$ (that is, $11001011_2 + 01000000_2$):

   11001011
 + 01000000
-------------------
  100001011
-------------------

Because there are more than eight digits in the answer, the first 1 drops off the end, giving $00001011_2$, which equals $11_{10}$ (which is the correct answer for $-53_{10} + 64_{10}$).
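The addition can be tried out with a short C sketch. The name add8 is hypothetical; the cast back to eight bits is what makes the ninth bit "drop off the end".

```c
#include <stdint.h>

/* Add two bytes and keep only the low eight bits of the sum;
   any carry out of the top bit is discarded. */
uint8_t add8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

Adding 11001011 (the stored form of -53) to 01000000 (64) yields 00001011, which is 11. Adding the stored forms of -53 and 53 yields zero, as you would hope.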

For larger numbers to be stored, multiple bytes are used to represent a number.

Computer Representation of Floating Point Numbers

In computers, you have only eight bits, and nowhere is a point stored. So, how do you go about representing floating point numbers? The way to do it is to define that the point always sits in a specific place.

So, go back to the number you had earlier: $10100000_2$ (represented in binary as $1.01_2\mathrm{E}0111_2$).

You define in the computer that the point is after the first bit, and then the mantissa is stored as $10100000_2$ (which represents $1.0100000_2$). The exponent is always an integer, and is stored as such: $00000111_2$.

So, the decimal number $160_{10}$ is stored in a computer in two pieces, $10100000_2$ and $00000111_2$.

If you have understood all of this, carry on reading (otherwise, this next part is going to confuse you even more).

The mantissa of a floating point binary number will always have a 1 in the first digit (except when representing zero). So, if there is ALWAYS going to be a leading 1, why bother writing it? It is just taking up space. This means that your binary number $10100000_2$ could be written as $01000000_2$ (mantissa) and $00000111_2$ (exponent), where you have defined in the computer that there is a leading 1 and that the point occurs directly after that 1.

The extra bit that you have gained increases the accuracy of your stored number.
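To make the two-byte scheme concrete, here is a toy encoder and decoder in C. Everything in it (the struct toy_float and the functions toy_encode and toy_decode) is invented for this illustration; it follows the simplified layout described above, not the format a real machine uses.

```c
#include <stdint.h>

/* Toy layout from the text: one exponent byte, plus a mantissa byte
   holding the eight bits AFTER the implied leading 1. */
struct toy_float {
    uint8_t mantissa;
    uint8_t exponent;
};

/* Encode a nonzero integer. Bits that do not fit are truncated.
   Zero has no leading 1, so this toy layout cannot represent it. */
struct toy_float toy_encode(uint32_t value)
{
    struct toy_float t = { 0, 0 };
    if (value == 0) return t;

    int e = 31;
    while (((value >> e) & 1u) == 0) e--;        /* position of the leading 1 */

    uint32_t frac = value - ((uint32_t)1 << e);  /* drop the leading 1 */
    t.mantissa = (uint8_t)(e >= 8 ? frac >> (e - 8) : frac << (8 - e));
    t.exponent = (uint8_t)e;
    return t;
}

/* Rebuild the (possibly truncated) integer by restoring the leading 1. */
uint32_t toy_decode(struct toy_float t)
{
    uint32_t full = 0x100u | t.mantissa;         /* 1.mmmmmmmm as nine bits */
    return t.exponent >= 8 ? full << (t.exponent - 8)
                           : full >> (8 - t.exponent);
}
```

Encoding the binary number 10100000 gives a mantissa byte of 01000000 and an exponent byte of 00000111, and decoding those two bytes returns the original value. Because the leading 1 is implied, the mantissa byte carries eight bits of detail after the point rather than seven.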

Note: This representation is not exactly the same as how a computer stores floating point. To see the exact representation, read this article. The representation used is defined by the IEEE 754 standard, in 32-bit (single-precision) and 64-bit (double-precision) formats.
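If you are curious what the real thing looks like, the following C sketch pulls apart the bits of a float. It assumes that float is a 32-bit IEEE 754 value, which holds on most current platforms but is not guaranteed by the C language. The function names are my own.

```c
#include <stdint.h>
#include <string.h>

/* Copy the raw bits of a float into a 32-bit integer.
   ASSUMPTION: float is a 32-bit IEEE 754 single-precision value. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* 1 sign bit, 8 exponent bits (stored with 127 added), 23 fraction
   bits (the leading 1 of the mantissa is implied, not stored). */
unsigned sign_of(uint32_t bits)     { return bits >> 31; }
unsigned exponent_of(uint32_t bits) { return (bits >> 23) & 0xFFu; }
uint32_t fraction_of(uint32_t bits) { return bits & 0x7FFFFFu; }
```

Under that assumption, the float 160.0 has a stored exponent of 134 (7 plus the offset of 127) and a fraction field whose top bits are 01, the same digits that follow the point in the binary mantissa 1.01.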

Characters in a Computer

I am not going to go into detail about characters, but the most fundamental representation used to store characters is the ASCII character set. This ASCII set is simply a table of defined characters and the bit sequence that is used to represent each of them.

For example, the character "A" is represented by the number $65_{10}$, "Z" by 90, "a" by 97 and "z" by 122. Standard ASCII defines 128 characters using seven bits; extended sets built on it use the eighth bit to fill all 256 values of one byte.

More recently, it has been realised that this character set is not enough to represent languages other than English. As a result, wider encodings are now used; storing each character in two bytes, for example, gives a possible set of 65536 characters.

Interpreting a Byte

It took me ages before the light bulb went off in my head and I understood this. (Why didn't someone explain it nicely, eh?)

Say you have the byte $01111010_2$. What does this represent?

Well, you could say it is a number and so it represents the decimal number $122_{10}$. Or, it could represent a character, in which case it is a "z".
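A tiny C sketch shows both readings of the same byte. The function names are hypothetical, and the character reading assumes the program runs with an ASCII-compatible character set.

```c
/* Read the byte as a number. */
int as_number(unsigned char byte)
{
    return byte;
}

/* Read the very same bits as a character.
   ASSUMPTION: an ASCII-compatible character set. */
char as_character(unsigned char byte)
{
    return (char)byte;
}
```

Both functions receive the identical bit pattern 01111010; as_number reports 122 and as_character reports 'z'. Nothing about the byte itself changes, only what the caller decides to do with it.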

How do we know which one it is? Simple. We don't.

Okay, then how does the computer know? Well, same answer... it doesn't. The only thing which DOES know what it is, is the thing that put it there. When you write a program, you do something like this:

int MyNumber;
MyNumber = 5;

(This is the C declaration of an integer.) Now, when your program runs, it looks at this and says "Mmm, okay; reserve memory for this variable and remember to treat it like an integer." Then, it puts the binary value $00000101_2$ into that memory space. Later, when you use that number again the program says "Okay, I know this chunk of memory is an integer, so this represents $5_{10}$."

It is for this reason that when you try to read some binary file in Notepad you get really funny looking characters. Notepad keeps thinking that these bytes represent ASCII characters, when in fact they actually represent numbers. The problem is that only the thing that put the binary into that file knows what the sequence means.

The same happens when you look at a representation of the computer's memory. It usually shows two (maybe more) "views." One has all the binary displayed as numbers and the other has them displayed in characters. Another popular "view" is everything displayed using hexadecimal representation of the bytes in memory. Again, unless you know how to interpret what you see, it is all pretty meaningless.
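Such a memory view is easy to imitate. The C sketch below formats a few bytes as one line with the hexadecimal view on the left and the character view on the right. The function name dump_line and the exact layout are my own choices, loosely modelled on what typical hex viewers show, and the printable range assumes ASCII.

```c
#include <stddef.h>
#include <stdio.h>

/* Format bytes as one line of a memory view: two hex digits per byte,
   then the same bytes as characters. Output is truncated to out_size. */
void dump_line(const unsigned char *bytes, size_t count,
               char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0) return;
    out[0] = '\0';

    /* Hexadecimal view. */
    for (size_t i = 0; i < count && used < out_size; i++) {
        int n = snprintf(out + used, out_size - used, "%02X ",
                         (unsigned)bytes[i]);
        if (n < 0) return;
        used += (size_t)n;
    }

    /* Character view: printable ASCII as-is, everything else as '.'. */
    for (size_t i = 0; i < count && used + 1 < out_size; i++) {
        unsigned char b = bytes[i];
        out[used++] = (b >= 0x20 && b <= 0x7E) ? (char)b : '.';
        out[used] = '\0';
    }
}
```

Given the four bytes 48, 69, 7A, and 05 (in hexadecimal), the line shows those hex digits followed by the characters H, i, z, and a dot standing in for the unprintable 05. Which of the two views is the "right" one depends entirely on what the program that wrote the bytes meant by them.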

Conclusion

I hope this helps you understand what is going on inside your machine. With any luck, it has also shown you, to some degree, how to read and interpret binary sequences.


