Cover story: the first hard drive in the world
Data types in computers
Today we can store almost any kind of data on a computer: numbers, text, audio, images, video, and so on.
Most modern computers are binary. In other words, a computer can only store and manipulate 0s and 1s. So how does a computer store all this data?
All data is stored as binary, so there must be representation methods that convert 0s and 1s into what we see.
What is a bit?
Before talking about bit patterns, we need to know what a bit is.
Simply put, a bit is the smallest unit of data in a computer, and it holds either a 0 or a 1.
What is a bit pattern?
A bit pattern is a sequence of bits; it is sometimes called a bit stream as well.
For each use, we define what the binary numbers mean and give the computer the mapping rules, so that binary can represent the content. In theory, with enough bits, a bit pattern could represent anything, even the universe.
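As a small illustration of a mapping rule, the Python sketch below reads the same 8-bit pattern in two ways. The pattern 01000001 and the variable names are chosen only for this example.

```python
# One bit pattern, two mapping rules (illustrative only).
pattern = "01000001"

as_number = int(pattern, 2)    # mapping rule 1: an unsigned integer
as_character = chr(as_number)  # mapping rule 2: a character code

print(as_number)     # 65
print(as_character)  # A
```

The bits never change; only the rule used to interpret them does.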
Fixed point number storage
Fixed point numbers are usually used for integers. The point always sits at the rightmost position of the number, which is why it is called a fixed point number.


To save memory (by keeping the binary representation short), there are two ways to represent an integer: unsigned notation and signed notation.
Unsigned notation
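Its range is [0, 2^n), where n is the length of the binary number.
How does it work?
If I want to store the decimal number 7 in an 8-bit binary unit, I convert it to binary and pad it with leading zeros:

7 = 111 in binary, stored as 00000111

By the way, computers usually work with 8-bit binary units, and such a unit is called a byte.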
How to read?
Reading works just like converting a binary number to a decimal number.
If I want to read the number I just stored from the 8-bit binary unit: 00000111 = 1*2^2 + 1*2^1 + 1*2^0 = 7
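Storing and reading an unsigned number can be sketched in Python as follows. The function names store_unsigned and read_unsigned are made up for this example, and the bit pattern is modelled as a string of 0s and 1s.

```python
def store_unsigned(value: int, width: int = 8) -> str:
    """Return the fixed-width bit pattern for a non-negative integer."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} unsigned bits")
    return format(value, f"0{width}b")  # binary digits, padded with leading zeros


def read_unsigned(bits: str) -> int:
    """Convert a bit pattern back to the decimal number it stores."""
    return int(bits, 2)


stored = store_unsigned(7)
print(stored)                 # 00000111
print(read_unsigned(stored))  # 7
```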


Overflow
If you try to store a number that is bigger than the binary unit allows, an overflow happens.
For example, if you want to store the decimal number 21 in a 4-bit binary unit:

21 = 10101 in binary, stored as 0101

As you can see, the first 1 is dropped because the 4-bit binary unit only holds 4 bits, so the number 21 becomes 5.
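A minimal Python sketch of this overflow, assuming the unit simply keeps the lowest bits and drops the rest (the helper name store_with_overflow is hypothetical):

```python
def store_with_overflow(value: int, width: int) -> str:
    """Keep only the lowest `width` bits, as a fixed-size unit would."""
    kept = value & (2 ** width - 1)  # mask off every bit above `width`
    return format(kept, f"0{width}b")


stored = store_with_overflow(21, 4)
print(stored)          # 0101
print(int(stored, 2))  # 5
```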
When do we use it?
When you need a number that is >= 0. Here are some common uses:
 when a number represents a quantity
 when representing a logical address in the computer
 when representing other data types (text, image, audio, video, etc.)
Signed absolute value
Because of a problem of its own, it is not common in computers; the problem is described later in this section.
Its range is (-2^(n-1), 2^(n-1)), where n is the length of the binary number.
How does it work?
As mentioned above, computers usually work with bytes (8-bit binary units). In signed absolute value notation, the first bit of the byte is the sign: 0 for positive, 1 for negative.
If I want to store the decimal numbers +7 and -12:
| Decimal number | Sign | Binary data | Result |
| --- | --- | --- | --- |
| +7 | 0 | 0000111 | 00000111 |
| -12 | 1 | 0001100 | 10001100 |
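How to read?
Here are the steps:
 read the first bit, which is the sign
 convert the remaining bits to a decimal number
 combine the two to get the result
For example, read the binary number 11001101 in signed absolute value: the sign bit is 1 (negative) and the remaining bits 1001101 equal 77, so the result is -77.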
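The storing and reading steps for signed absolute value can be sketched in Python like this. The function names are invented for the example, and the byte is modelled as a string of 0s and 1s.

```python
def encode_sign_magnitude(value: int, width: int = 8) -> str:
    """Sign bit first, then the absolute value in the remaining bits."""
    data_bits = width - 1
    if abs(value) >= 2 ** data_bits:
        raise ValueError(f"{value} does not fit in {width} bits")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{data_bits}b")


def decode_sign_magnitude(bits: str) -> int:
    """Read the sign bit, convert the rest, and combine them."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(encode_sign_magnitude(7))           # 00000111
print(encode_sign_magnitude(-12))         # 10001100
print(decode_sign_magnitude("11001101"))  # -77
```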


Overflow
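Just as with unsigned notation, storing a number that is bigger than the binary unit allows causes an overflow.
If I store the decimal number 130 in an 8-bit sequence:

130 = 10000010 in binary, which needs 8 bits, but only 7 bits are left for the data part, so the leading 1 is dropped and the data part becomes 0000010

The number 130 becomes 2 because of the overflow.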
The problem
For the number 0, the data part is 0000000, but the sign bit can be either 0 or 1, so there are two zeros: +0 and -0.
Two’s complement
Two’s complement solves the double-zero problem, and it is the standard method for integer storage in computers.
To get the opposite (the negation) of a number, just calculate its two’s complement.
Its range is [-2^(n-1), 2^(n-1)), where n is the length of the binary number.
How does it work?
It is based on signed absolute value, so the two are similar.
 For positive numbers, they are the same.
 For negative numbers, the number needs to be converted to two’s complement form.
How to calculate two’s complement?
 Check whether the sign bit of the number is 1. If it is, scan from right to left: every bit to the left of the first 1 (except the sign bit) needs to be flipped (1 -> 0, 0 -> 1).
How does it store the decimal number -125?
| Step | Sign | Data |
| --- | --- | --- |
| Signed absolute value | 1 | 1111101 |
| After two’s complement | 1 | 0000011 |
So the result is: 10000011
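The right-to-left rule can be sketched in Python as below. The function name is made up, the input is a signed absolute value bit string, and the handling of the "negative zero" pattern is a choice made for this sketch rather than something the rule defines.

```python
def to_twos_complement(sign_magnitude: str) -> str:
    """Convert a signed absolute value pattern to two's complement form."""
    sign, data = sign_magnitude[0], sign_magnitude[1:]
    if sign == "0":
        return sign_magnitude          # positive numbers stay the same
    first_one = data.rfind("1")        # first 1 when scanning from the right
    if first_one == -1:
        return sign_magnitude          # no 1 to anchor on; left unchanged here
    # Flip every data bit to the left of that 1; keep the rest as is.
    flipped = "".join("1" if b == "0" else "0" for b in data[:first_one])
    return sign + flipped + data[first_one:]


print(to_twos_complement("11111101"))  # 10000011, the stored form of -125
```

As a cross-check, masking -125 to 8 bits with Python's own integers, format(-125 & 0xFF, "08b"), gives the same pattern.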
How to read?
For positive numbers: the same as signed absolute value.
For negative numbers: you need to take the two’s complement.
Using the result from the previous step, convert 10000011 to a decimal number:
| Step | Sign | Data |
| --- | --- | --- |
| The original number | 1 | 0000011 |
| After two’s complement | 1 | 1111101 |
So the result is 11111101 in signed absolute value, which is -125.
As you can see, applying two’s complement to a number twice restores the original.
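Reading a two’s complement pattern can also be done arithmetically. The sketch below uses the equivalent rule "if the sign bit is 1, subtract 2^n from the unsigned value" instead of the bit-flipping steps; the function name is hypothetical.

```python
def decode_twos_complement(bits: str) -> int:
    """Read a two's complement bit pattern as a signed integer."""
    unsigned = int(bits, 2)
    if bits[0] == "1":                     # sign bit set: the number is negative
        return unsigned - 2 ** len(bits)
    return unsigned


print(decode_twos_complement("10000011"))  # -125
print(decode_twos_complement("00000111"))  # 7
```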
Overflow
Just as with the other notations, storing a number outside the range of the notation causes an overflow.
If I store the decimal number -130 in an 8-bit sequence:
| Step | Sign | Data |
| --- | --- | --- |
| Convert -130 to binary | 1 | 10000010 |
| After two’s complement | 1 | 01111110 |
| Stored in bit sequence | 0 | 1111110 |
The sign bit and the 8 data bits add up to 9 bits, so the leading bit is dropped and the number -130 becomes 126.
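The same overflow can be reproduced in Python by keeping only the lowest 8 bits. This is a sketch that relies on Python's own integer masking rather than the manual steps in the table.

```python
WIDTH = 8

stored = -130 & (2 ** WIDTH - 1)   # keep only the lowest 8 bits
bits = format(stored, f"0{WIDTH}b")

# Read the pattern back as a two's complement number.
value = stored - 2 ** WIDTH if bits[0] == "1" else stored

print(bits)   # 01111110
print(value)  # 126
```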
Floating point number storage
Floating point number storage uses a point that can move (float). By moving the point, we can change the magnitude of the number easily.
It is used to represent numbers with a very large integer part or a very small fractional part.
How it works
Any number in floating point notation has 3 parts:
 Sign (S): identifies whether the number is positive or negative
 Exponent (E): expresses the position of the point
 Fraction (F): holds the digits of the number
Scientific notation
It is widely known that scientific notation can shorten a long number.
Floating point notation uses scientific notation as well.
Converting the binary number 0.000000101 (as you can see, it has a very small fractional part) to scientific notation means moving the point 7 places to the right: 0.000000101 = 1.01 * 2^(-7)
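The conversion can be checked with Python's standard math.frexp function. Note that frexp returns a significand in [0.5, 1), so this sketch rescales it to the 1.x form used here; the variable names are only illustrative.

```python
import math

value = 2 ** -7 + 2 ** -9      # binary 0.000000101 written as a sum of powers of two
m, e = math.frexp(value)       # value == m * 2**e, with 0.5 <= m < 1

significand = m * 2            # rescale to the form 1.x
exponent = e - 1

print(significand, exponent)   # 1.25 -7  (1.25 is 1.01 in binary)
```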


Standardization
In a nonzero binary number written in scientific notation, the leading digit is always 1, so we can omit it.
Doing so saves memory (the less content you store, the less storage you use). We call this step standardization (it is more often called normalization).
Then the result becomes: .01 * 2^(-7)
Sign and Fraction
OK, from the previous step we have .01 * 2^(-7), and now we can identify the sign and the fraction parts.
Just as with fixed point numbers, the sign of a positive number is 0 and the sign of a negative number is 1.
To get the fraction, the only thing we need to do is remove the point.
So here it is:
| Sign | Exponent | Fraction |
| --- | --- | --- |
| 0 | unknown | 01 |
Exponent
Storing a zero or positive number is easier than storing a negative number, because we do not need to care about the sign. But the point can move left or right, so the exponent can be positive or negative, which is annoying.
So we add an offset that turns a negative number into a zero or positive number.
A storage unit with 4 bits for the exponent can hold 16 (2^4) different values:
| Before offset | -7 | -6 | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| After offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
Why is the range [-7, 8] rather than [-8, 7]?
In short, it was designed that way by the IEEE.
But they said “There is no standard for offset binary, but most often…”.
You can check more details from here and here.
Let’s continue! As you can see, to shift the range to nonnegative values, you need an offset of 2^(n-1)-1, where n is the bit size of the exponent field.
We got 2^(-7) from the standardization. From the offset table, the exponent -7 becomes 0 after the offset.
| Sign | Exponent | Fraction |
| --- | --- | --- |
| 0 | 0 | 01 |
So the stored content depends on the storage unit.
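The offset rule can be sketched in Python as follows. The function name offset_exponent is made up for this example, and it follows the offset 2^(n-1)-1 described above.

```python
def offset_exponent(exponent: int, exponent_bits: int) -> int:
    """Shift an exponent by 2^(n-1)-1 so that the stored value is non-negative."""
    offset = 2 ** (exponent_bits - 1) - 1
    stored = exponent + offset
    if not 0 <= stored < 2 ** exponent_bits:
        raise ValueError(f"exponent {exponent} does not fit in {exponent_bits} bits")
    return stored


print(offset_exponent(-7, 4))  # 0
print(offset_exponent(0, 4))   # 7
print(offset_exponent(8, 4))   # 15
```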
Float32 & Float64
Float32 and Float64 are two formats defined by the IEEE (in the IEEE 754 standard), and they are commonly used to represent floating point numbers.
Float32
Float32 has 32 bits in total:
 Sign: 1 bit
 Exponent: 8 bits
 Fraction: 23 bits
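Range calculation (for normalized numbers; in IEEE 754 the all-zeros and all-ones exponent patterns are reserved for special values, so the usable exponents run from -126 to 127):
 Min absolute value: 1.0 * 2^(-126) = 2^(-126)
 Max absolute value: (2-2^(-23)) * 2^127 = (1-2^(-24)) * 2^128
The range of float32 is:
[-(1-2^(-24))*2^128, -2^(-126)] ∪ [2^(-126), (1-2^(-24))*2^128]
Just like the other notations, if you store a number outside the range, it will overflow.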
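To see the three parts in a real float32, the sketch below uses Python's standard struct module to pack the earlier example number (0.000000101 in binary) and split its 32 bits. The function name float32_fields is hypothetical. Because the real exponent field has 8 bits, the offset is 127, so the stored exponent is -7 + 127 = 120 rather than the 0 from the 4-bit example.

```python
import struct


def float32_fields(value: float):
    """Return the (sign, exponent, fraction) bit strings of a float32."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))  # float32 bytes as a 32-bit integer
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


sign, exponent, fraction = float32_fields(2 ** -7 + 2 ** -9)
print(sign)      # 0
print(exponent)  # 01111000  (120 = -7 + 127)
print(fraction)  # 01000000000000000000000
```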
Float64
Float64 has 64 bits in total:
 Sign: 1 bit
 Exponent: 11 bits
 Fraction: 52 bits
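Range calculation (for normalized numbers; the all-zeros and all-ones exponent patterns are reserved, so the usable exponents run from -1022 to 1023):
 Min absolute value: 1.0 * 2^(-1022) = 2^(-1022)
 Max absolute value: (2-2^(-52)) * 2^1023 = (1-2^(-53)) * 2^1024
The range of float64 is:
[-(1-2^(-53))*2^1024, -2^(-1022)] ∪ [2^(-1022), (1-2^(-53))*2^1024]
Float64 can represent a much larger range of numbers than Float32, but it can still overflow.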
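Python's built-in float is a float64 on common platforms, so its limits can be inspected with the standard sys.float_info. The sketch below compares them with the normalized IEEE 754 bounds and shows one overflow; it assumes such a platform.

```python
import sys

largest = sys.float_info.max    # largest finite float64
smallest = sys.float_info.min   # smallest positive normalized float64

print(largest == (2 - 2 ** -52) * 2.0 ** 1023)  # True
print(smallest == 2.0 ** -1022)                 # True

overflowed = largest * 2        # too large to store: becomes infinity
print(overflowed)               # inf
```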
Zero
You might have noticed that for both float32 and float64, the range does not include the number 0. Zero is a special case for floating point numbers, but it can still be stored in float32 and float64.
When the sign, exponent, and fraction are all 0, the number is 0.0
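This all-zero pattern can be confirmed with Python's standard struct module, which packs a number into its float32 or float64 bytes:

```python
import struct

zero32 = struct.pack(">f", 0.0)  # float32: 4 bytes
zero64 = struct.pack(">d", 0.0)  # float64: 8 bytes

print(zero32.hex())  # 00000000
print(zero64.hex())  # 0000000000000000
```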
Text storage
Text is composed of characters, so storing text is fundamentally a character problem. In other words, how we handle characters is how we handle text.
An encoding is the method for handling characters: it maps each character to a binary sequence.
ASCII
ASCII is one of the earliest mapping rules. Every character is represented by a 7-bit binary sequence, so it can only represent 128 (2^7) characters, which is very limited.
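A short Python sketch of the ASCII mapping, using the built-in ord and chr functions; the character "A" is just an example:

```python
code = ord("A")              # the number ASCII assigns to "A"
bits = format(code, "07b")   # the same number as a 7-bit sequence
back = chr(int(bits, 2))     # map the bits back to the character

print(code)  # 65
print(bits)  # 1000001
print(back)  # A
```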
Unicode
It is the most common character set in use today. It covers the characters and symbols of nearly all languages. In the UTF-32 encoding, every character is represented by a 32-bit binary sequence, while variable-length encodings such as UTF-8 use fewer bits for common characters. ASCII is a subset of Unicode, and because of Unicode’s overwhelming convenience, other encodings have become unpopular.
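The sketch below shows a Unicode character as a number and as a 32-bit sequence, using UTF-32 (one Unicode encoding that always uses 32 bits per character). The sample character is chosen only for illustration.

```python
char = "中"

code_point = ord(char)              # the number Unicode assigns to the character
utf32 = char.encode("utf-32-be")    # 4 bytes = 32 bits, big-endian, no byte order mark

print(hex(code_point))   # 0x4e2d
print(utf32.hex())       # 00004e2d
print(len(utf32) * 8)    # 32

# ASCII characters keep the same numbers in Unicode.
print(ord("A"))          # 65
```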
Audio storage
Unlike numbers and characters, audio is a continuous signal rather than a set of discrete values, which means it cannot be stored without loss; we can only approximate the original sound.
Sampling rate
The sampling rate determines how much detail we can capture.
Just as in calculus, we cut the audio evenly into n parts. When n is larger, the sampling rate is higher and we capture more detail. In contrast, when n is smaller, the sampling rate is lower and we capture less detail.
Normally, a sampling rate of 40000 samples per second is good enough.
Quantization
We can use a number to represent every sample we get, and this process is called quantization.
To simplify the process, we usually use integers for quantization.
For example, when the sample value is 17.8, we use 18 instead of 17.8.
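Sampling and quantization can be sketched together in Python. The sine wave, the tiny sampling rate of 8, and the amplitude of 20 are all made-up values so that the output stays readable.

```python
import math

SAMPLE_RATE = 8   # samples per second (tiny on purpose)
AMPLITUDE = 20    # hypothetical peak value of the signal

# Sampling: measure a 1 Hz sine wave at evenly spaced moments within one second.
samples = [AMPLITUDE * math.sin(2 * math.pi * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

# Quantization: replace each measured value with the nearest integer.
quantized = [round(s) for s in samples]

print(quantized)    # [0, 14, 20, 14, 0, -14, -20, -14]
print(round(17.8))  # 18
```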
Encoding
In this step, the audio becomes a binary sequence that can be stored in the computer.
Bits per sample
It is the measure of the quality of each sample. It defines how many bits we can use to represent the quantized value of a sample. The higher the bits per sample, the more accurately each sample is approximated.
Code rate
Code rate = Sampling rate * Bits per sample
File size = Code rate * Audio duration
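For 1 minute of music with a sampling rate of 40000 and 16 bits per sample: Code rate = 40000 * 16 = 640000 bits per second, and File size = 640000 * 60 = 38400000 bits = 4800000 bytes (about 4.8 MB).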
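The two formulas can be checked in Python. The sketch also encodes one second of silent 16-bit samples with the standard struct module to confirm that the encoded size matches the code rate; a single audio channel is assumed.

```python
import struct

SAMPLING_RATE = 40000    # samples per second
BITS_PER_SAMPLE = 16
DURATION_SECONDS = 60

code_rate = SAMPLING_RATE * BITS_PER_SAMPLE      # bits per second
file_size_bits = code_rate * DURATION_SECONDS
file_size_bytes = file_size_bits // 8

print(code_rate)        # 640000
print(file_size_bytes)  # 4800000

# Encode one second of silence as 16-bit signed integers (2 bytes each).
one_second = struct.pack(f"<{SAMPLING_RATE}h", *([0] * SAMPLING_RATE))
print(len(one_second) * 8 == code_rate)  # True
```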


Image storage
There are two types of images: bitmap graphics and vector graphics, and they use different principles to describe images.
Bitmap graphics
In this approach, images are composed of pixels, so all that is needed is a description of the pixels:
 How many pixels the image has
 What the color of each pixel is
Resolution
Resolution describes how many pixels the image has. Usually, the higher the resolution, the clearer the image.
Resolution is given as the number of pixels across the width and the height, such as 1920 * 1080. Here are some common resolutions:
 1080p: 1920 * 1080, 1920 pixels wide, 1080 pixels high
 1440p: 2560 * 1440, 2560 pixels wide, 1440 pixels high
 4K: 3840 * 2160, 3840 pixels wide, 2160 pixels high
Color depth
We also need a binary sequence to describe the color of each pixel, and the length of that sequence is called the color depth. The larger the color depth, the more colors you can use, and the larger the image file.
Every color in a computer is made up of red, green, and blue (you can mix them in different ratios to produce new colors), so the color code is composed of the codes for red, green, and blue as well. The larger the code of a specific component, the more of that component there is in the resulting mixed color.
Here are some samples of colors with 3 bits per component:
| Code | R | G | B | Color |
| --- | --- | --- | --- | --- |
| 111000000 | 111 | 000 | 000 | red |
| 000111000 | 000 | 111 | 000 | green |
| 000000111 | 000 | 000 | 111 | blue |
| 000000000 | 000 | 000 | 000 | black |
| 111111111 | 111 | 111 | 111 | white |
Usually the color depth is 8 bits per component (24 bits per pixel), which can represent 16777216 (2^24) colors.
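A small Python sketch of how a color code is assembled and how many colors a given depth allows. The function names color_code and color_count are invented for this example.

```python
def color_code(red: int, green: int, blue: int, bits_per_component: int = 3) -> str:
    """Concatenate the red, green, and blue values into one bit string."""
    return "".join(format(c, f"0{bits_per_component}b") for c in (red, green, blue))


def color_count(bits_per_component: int) -> int:
    """Number of distinct colors for three components of the given size."""
    return 2 ** (3 * bits_per_component)


print(color_code(7, 0, 0))  # 111000000 (red)
print(color_code(7, 7, 7))  # 111111111 (white)
print(color_count(8))       # 16777216
```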
Pros and cons
Pros:
 The image can represent complex content
 It is commonly used in daily life (photography)
Cons:
 The resolution is fixed. Once you zoom in, the image becomes blurry.
Vector graphics
Just like programs, vector graphics are composed of instructions that tell the computer how to draw the image.
If I want to draw a circle, I need to know:
 Where the center of the circle is
 What the circle’s radius is
 What the color of the circle’s border is
 What the color inside the circle is
Vector graphics are usually created with professional software.
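To make the idea of "instructions" concrete, the Python sketch below writes the four pieces of information as an SVG circle element. SVG is one common vector format and is used here only as an example; the function name and the sample values are hypothetical.

```python
def circle_instruction(cx: int, cy: int, radius: int, border_color: str, fill_color: str) -> str:
    """Describe a circle as an SVG drawing instruction."""
    return (
        f'<circle cx="{cx}" cy="{cy}" r="{radius}" '
        f'stroke="{border_color}" fill="{fill_color}" />'
    )


print(circle_instruction(50, 50, 40, "black", "red"))
# <circle cx="50" cy="50" r="40" stroke="black" fill="red" />
```

The instruction stays the same size no matter how large the circle is drawn, which is why the graphic can be rendered again at any zoom level.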
Pros and cons
Pros:
 When you zoom in, the graphic is rendered again, so it does not become blurry.
 The file size is usually smaller than that of bitmap graphics
Cons:
 It usually has to be created with professional software
 It is not suited to representing complex content
Video storage
A video is made up of a series of images played in rapid succession. In other words, a video file records which image is shown at which time. Stored this way, a video file would be huge, so today video files are compressed.
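A rough Python estimate shows why uncompressed video is so large. The frame rate of 30 images per second, the 1080p resolution, and the 24 bits per pixel are assumptions made for this example.

```python
WIDTH, HEIGHT = 1920, 1080   # pixels per image (1080p)
BITS_PER_PIXEL = 24          # 8 bits each for red, green, and blue
FRAMES_PER_SECOND = 30       # assumed frame rate
DURATION_SECONDS = 60

bytes_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
bytes_per_minute = bytes_per_frame * FRAMES_PER_SECOND * DURATION_SECONDS

print(bytes_per_frame)   # 6220800
print(bytes_per_minute)  # 11197440000 (about 11.2 GB for one minute)
```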