The story of cover: the first hard drive in the world
Data types in Computer
Today, we can store almost any kind of data in our computers: numbers, text, audio, images, video, etc.
Most modern computers are based on binary. In other words, a computer can only store and manipulate 0s and 1s, so how does it store all this data?
Since everything is stored as binary, there must be some methods of representation that convert 0s and 1s into what we can see.
What is a bit?
Before talking about bit patterns, we need to know what a bit is.
Simply put, a bit is the smallest unit of data in a computer, and it holds either a 0 or a 1.
What is a bit pattern?
A bit pattern is a sequence of bits, and sometimes we call it a bit stream as well.
For each use, we can define what the binary numbers mean and let the computer know the mapping rules, so that binary can represent the content. In theory, a binary number that is long enough could represent anything, even the universe.
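To make the idea of mapping rules concrete, here is a small Python sketch that reads one bit pattern under two different rules. The pattern and the two rules are picked only for illustration.

```python
# The same 8-bit pattern read under two different mapping rules.
bits = "01000001"

as_number = int(bits, 2)       # rule 1: read it as an unsigned integer
as_character = chr(as_number)  # rule 2: read that number as a character code

print(as_number)     # 65
print(as_character)  # A
```

The bits never change; only the rule used to interpret them does.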
Fixed point number storage
Fixed-point storage is usually used for integers. The binary point sits at a fixed position, to the right of the last digit, which is where the name comes from.
To save memory (by keeping the binary representation short), there are two ways to represent an integer: unsigned notation and signed notation.
Unsigned notation
Its range is [0, 2^n), where n is the length of the binary number.
How does it work?
To store the decimal number 7 in an 8-bit binary unit, convert it to binary (111) and pad it with leading zeros to fill the unit:
00000111
By the way, computers conventionally work with 8-bit units, and such a unit is called a byte.
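How to read?
Reading works just like converting a binary number to a decimal number.
To read back the number just stored in the 8-bit binary unit, add up the powers of two at the positions that hold a 1:
00000111 = 1*2^2 + 1*2^1 + 1*2^0 = 7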
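The storing and reading steps for unsigned notation can be sketched in Python. The function names `store_unsigned` and `read_unsigned` are made up for this example, and the bit sequence is modeled as a string of "0" and "1" characters.

```python
def store_unsigned(value: int, width: int = 8) -> str:
    """Return value as a zero-padded bit string of the given width."""
    if not 0 <= value < 2 ** width:
        raise ValueError("value is outside [0, 2^n)")
    return format(value, f"0{width}b")


def read_unsigned(bits: str) -> int:
    """Add up the powers of two at the positions that hold a 1."""
    return sum(int(bit) << position
               for position, bit in enumerate(reversed(bits)))


stored = store_unsigned(7)
print(stored)                 # 00000111
print(read_unsigned(stored))  # 7
```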
Overflow
If you try to store a number larger than the binary unit allows, an overflow happens.
For example, storing the decimal number 21 (binary 10101) in a 4-bit binary unit:
10101 -> 0101
The leading 1 is dropped because a 4-bit unit only holds 4 bits, so the number 21 becomes 5.
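Python integers never overflow on their own, so the sketch below imitates a fixed-size unit by masking away the bits that don't fit. The helper name `store_truncated` is hypothetical.

```python
def store_truncated(value: int, width: int) -> str:
    """Keep only the lowest `width` bits, as a fixed-size unit would."""
    mask = (1 << width) - 1  # e.g. 0b1111 for width 4
    return format(value & mask, f"0{width}b")


bits = store_truncated(21, 4)
print(bits)          # 0101
print(int(bits, 2))  # 5
```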
When do we use it?
Whenever you need numbers that are >= 0. Here are some common uses:
- representing a quantity
- representing a logical address in the computer
- representing other data types (text, image, audio, video, etc.)
Signed absolute value
This notation (also known as sign-magnitude) is not common in computers because of a problem that I will describe a little later.
Its range is (-2^(n-1), 2^(n-1)), where n is the length of the binary number.
How does it work?
As mentioned above, computers usually work with bytes (8-bit binary units). In signed absolute value notation, the first bit of the byte is the sign: 0 for positive, 1 for negative.
If I want to store the decimal numbers +7 and -12:
Decimal number | Sign | Binary data | Result |
---|---|---|---|
7 | 0 | 0000111 | 00000111 |
-12 | 1 | 0001100 | 10001100 |
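How to read?
Here are the steps:
- read the first bit, which is the sign
- convert the remaining bits to a decimal number
- combine the two to get the result
For example, reading the binary number 11001101 in signed absolute value: the sign bit is 1 (negative) and the remaining bits 1001101 are 77 in decimal, so the result is -77.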
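Here is a minimal Python sketch of storing and reading in signed absolute value notation. The function names are made up for this example, and bit sequences are modeled as strings.

```python
def encode_sign_magnitude(value: int, width: int = 8) -> str:
    """First bit is the sign, the remaining bits hold the absolute value."""
    magnitude = abs(value)
    if magnitude >= 2 ** (width - 1):
        raise ValueError("value is outside (-2^(n-1), 2^(n-1))")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, f"0{width - 1}b")


def decode_sign_magnitude(bits: str) -> int:
    """Read the sign bit, then convert the remaining bits to decimal."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(encode_sign_magnitude(7))           # 00000111
print(encode_sign_magnitude(-12))         # 10001100
print(decode_sign_magnitude("11001101"))  # -77
```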
Overflow
Just as in unsigned notation, if you store a number larger than the binary unit allows, an overflow happens.
If I store the decimal number 130 (binary 10000010) in an 8-bit sequence, the leading 1 lands in the sign position and only 0000010 is left as data.
The number 130 becomes -2 because of the overflow.
The problem
For the number 0, the data part is 0000000, but the sign bit can be either 0 or 1, so there are two zeros: +0 and -0.
Two’s complement
Two’s complement solves the double-zero problem, and it is the standard way computers store integers.
To get the opposite (negated) value of a number, just calculate its two’s complement.
Its range is [-2^(n-1), 2^(n-1)), where n is the length of the binary number.
How does it work?
It is based on signed absolute value, so the two are similar.
- For positive numbers, they are the same.
- For negative numbers, the number needs to be converted to two’s complement form.
How to calculate two’s complement?
- Check whether the sign bit of the number is 1. If it is, continue.
- Scan from right to left and find the first 1. Every bit to the left of that 1 (except the sign bit) is flipped (1->0, 0->1); that 1 and everything to its right stay unchanged.
How does it store the decimal number -125?
Form | Sign | Data |
---|---|---|
Signed absolute value | 1 | 1111101 |
After two’s complement | 1 | 0000011 |
So the result is: 10000011
How to read?
For positive numbers: the same as signed absolute value.
For negative numbers: apply two’s complement again to get back the signed absolute value.
Using the result from the previous step, convert 10000011 to a decimal number:
Form | Sign | Data |
---|---|---|
The original number | 1 | 0000011 |
After two’s complement | 1 | 1111101 |
So the result is 11111101 in signed absolute value, which is -125.
As you can see, applying two’s complement to a number twice restores the original.
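The right-to-left rule can be sketched in Python as follows. The function name `flip_left_of_last_one` is made up, the input is a signed absolute value bit string, and the last line cross-checks the result against Python's own integer arithmetic.

```python
def flip_left_of_last_one(bits: str) -> str:
    """Apply the right-to-left rule to a sign bit followed by data bits."""
    sign, data = bits[0], bits[1:]
    if sign == "0":
        return bits  # positive numbers are left as they are
    first_one = data.rfind("1")  # first 1 when scanning from the right
    if first_one == -1:
        return bits  # no 1 in the data part, nothing to flip
    flipped = "".join("1" if b == "0" else "0" for b in data[:first_one])
    return sign + flipped + data[first_one:]


stored = flip_left_of_last_one("11111101")  # -125 in signed absolute value
print(stored)                               # 10000011
print(flip_left_of_last_one(stored))        # 11111101, back where we started
print(format(-125 & 0xFF, "08b"))           # 10000011, same pattern via masking
```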
Overflow
Just as with the other notations, storing a number outside the range of the notation causes an overflow.
If I store the decimal number -130 in an 8-bit sequence:
Step | Sign | Data |
---|---|---|
Convert -130 to binary | 1 | 10000010 |
After two’s complement | 1 | 01111110 |
Stored in bit sequence | 0 | 1111110 |
The number -130 becomes 126.
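This wrap-around can be reproduced in Python by keeping only the lowest 8 bits and then reading them back as two’s complement. The helper name `wrap_twos_complement` is hypothetical.

```python
def wrap_twos_complement(value: int, width: int = 8) -> int:
    """Store value in `width` bits, then read it back as two's complement."""
    raw = value & ((1 << width) - 1)  # keep only the bits that fit
    if raw >= 1 << (width - 1):       # sign bit is set, so it is negative
        return raw - (1 << width)
    return raw


print(wrap_twos_complement(-130))  # 126
print(wrap_twos_complement(-125))  # -125, inside the range, so unchanged
```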
Floating point number storage
Floating-point storage uses a point that can move (float). By moving the point, we can change the magnitude of the number easily.
It is used to represent numbers with a very large integer part or a very small fractional part.
How it works
Any number in floating-point notation has 3 parts:
- Sign (S): tells whether the number is positive or negative
- Exponent (E): expresses the position of the point
- Fraction (F): holds the digits of the number
Scientific notation
It’s widely known that scientific notation can shorten the written length of a number.
Floating-point notation uses the same idea, in base 2.
Using scientific notation to convert the binary number 0.000000101 (as you can see, it has a very small fractional part):
0.000000101 = 1.01 * 2^(-7)
Standardization
For a non-zero binary number written in scientific notation, the digit before the point is always 1, so we don’t need to store it.
Leaving it out saves memory (the less content you save, the less storage you use). This step is called standardization (more commonly, normalization).
Then the result becomes: .01 * 2^(-7)
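As a quick check of this example, Python's `math.frexp` splits a float into a mantissa and a power of two. It returns a mantissa in [0.5, 1), so the sketch shifts it by one position to get the 1.xxx form used here. The variable names are only for illustration.

```python
import math

value = int("101", 2) / 2 ** 9  # binary 0.000000101 as a float

mantissa, exponent = math.frexp(value)  # value == mantissa * 2**exponent
significand = mantissa * 2              # move to the 1.xxx form
exponent -= 1

fraction = significand - 1                      # drop the leading 1
fraction_bits = format(int(fraction * 4), "02b")  # two fraction bits are enough here

print(significand, exponent)  # 1.25 -7   (1.25 is binary 1.01)
print(fraction_bits)          # 01
```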
Sign and Fraction
From the previous step we have .01 * 2^(-7), so now we can identify the sign and the fraction.
Just as with fixed-point numbers, the sign of a positive number is 0 and the sign of a negative number is 1.
To get the fraction, we only need to remove the point.
So here it is:
Sign | Exponent | Fraction |
---|---|---|
0 | unknown | 01 |
Exponent
Storing a zero or positive number is easier than storing a negative one, because there is no sign to deal with. But the point can move left or right, so the exponent can be positive or negative, which is annoying.
So we add an offset (also called a bias) that turns every negative exponent into a zero or positive number.
For a storage unit with 4 bits for the exponent, we can store 16 (2^4) different values:
Before offset | -7 | -6 | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
After offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
Why is the range [-7,8] rather than [-8,7]?
In short, it’s designed by IEEE.
But they said “There is no standard for offset binary, but most often…”.
You can check more details from here and here.
Let’s continue! As you can see, to shift the range so that it is non-negative, you need an offset of 2^(n-1)-1, where n is the number of bits in the exponent field.
We got 2^(-7) from the standardization, so the exponent is -7. From the offset table, -7 becomes 0 after the offset.
Sign | Exponent | Fraction |
---|---|---|
0 | 0 | 01 |
So the stored content depends on the size of the storage unit.
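The offset rule is easy to sketch in Python. The names `offset_for` and `store_exponent` are made up for this example, and the 4-bit field matches the toy table above.

```python
def offset_for(n_bits: int) -> int:
    """Offset for an exponent field of n_bits: 2^(n-1) - 1."""
    return 2 ** (n_bits - 1) - 1


def store_exponent(exponent: int, n_bits: int) -> str:
    """Add the offset and return the stored exponent as a bit string."""
    stored = exponent + offset_for(n_bits)
    if not 0 <= stored < 2 ** n_bits:
        raise ValueError("exponent does not fit in the field")
    return format(stored, f"0{n_bits}b")


print(offset_for(4))          # 7
print(store_exponent(-7, 4))  # 0000
print(store_exponent(8, 4))   # 1111
```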
Float32 & Float64
Float32 and Float64 are two formats defined by the IEEE 754 standard, and they are the most commonly used floating-point representations.
Float32
Float32 has 32 bits in total:
- Sign: 1 bit
- Exponent: 8 bits
- Fraction: 23 bits
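Range calculating (for normalized numbers; IEEE 754 reserves the all-zero and all-one exponent patterns for special values, so the usable exponents are -126 to 127):
- Min absolute value: exponent -126, fraction all zeros: 1.0 * 2^(-126) = 2^(-126)
- Max absolute value: exponent 127, fraction all ones: (2-2^(-23)) * 2^127 = (1-2^(-24)) * 2^128
The range of float32 is:
[-(1-2^(-24))*2^128, -2^(-126)] ∪ [2^(-126), (1-2^(-24))*2^128]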
Just like the other notations, a number whose magnitude is too large for this range overflows.
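To see the three parts inside a real float32, Python's standard `struct` module can pack a number into 4 bytes and hand the raw bits back as an integer. This is only a sketch; `float32_fields` is a made-up helper name, and the input is the example number used earlier (binary 0.000000101).

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, stored exponent, fraction) of x as a float32."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                # 1 bit
    exponent = (raw >> 23) & 0xFF   # 8 bits, offset already applied
    fraction = raw & 0x7FFFFF       # 23 bits
    return sign, exponent, fraction


sign, exponent, fraction = float32_fields(0.009765625)
print(sign)                      # 0
print(exponent, exponent - 127)  # 120 -7
print(format(fraction, "023b"))  # 01000000000000000000000
```

The fraction starts with 01 and is padded with zeros on the right, and the stored exponent is -7 plus the 8-bit offset of 127.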
Float64
Float64 has 64 bits in total:
- Sign: 1 bit
- Exponent: 11 bits
- Fraction: 52 bits
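Range calculating (for normalized numbers; the all-zero and all-one exponent patterns are reserved, so the usable exponents are -1022 to 1023):
- Min absolute value: exponent -1022, fraction all zeros: 1.0 * 2^(-1022) = 2^(-1022)
- Max absolute value: exponent 1023, fraction all ones: (2-2^(-52)) * 2^1023 = (1-2^(-53)) * 2^1024
The range of float64 is:
[-(1-2^(-53))*2^1024, -2^(-1022)] ∪ [2^(-1022), (1-2^(-53))*2^1024]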
Float64 can represent a much larger range of numbers than Float32, but it can still overflow.
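Python's own `float` is a float64 on practically every platform (an assumption, but a safe one for CPython), so the upper limit can be checked directly with `sys.float_info`.

```python
import math
import sys

# Largest finite float64: fraction all ones, exponent 1023.
largest = (2 - 2 ** -52) * 2.0 ** 1023

print(sys.float_info.max == largest)       # True
print(sys.float_info.mant_dig)             # 53 (52 stored bits + the hidden 1)
print(math.isinf(sys.float_info.max * 2))  # True, the result overflowed
```

Note that `(2 - 2^(-52)) * 2^1023` is the same value as `(1 - 2^(-53)) * 2^1024`, just written so that no intermediate step overflows.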
Zero
You might have noticed that the ranges of float32 and float64 do not include the number 0. Zero is a special case for floating-point numbers, but it can still be stored in float32 and float64.
When the sign, exponent, and fraction are all 0, the number is 0.0.
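A short check with Python's `struct` module shows that 0.0 really is stored as all-zero bits in both formats.

```python
import struct

zero32 = struct.pack(">f", 0.0)  # float32, 4 bytes
zero64 = struct.pack(">d", 0.0)  # float64, 8 bytes

print(zero32.hex())  # 00000000
print(zero64.hex())  # 0000000000000000
```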
Text storage
Text is composed of characters, so storing text is essentially a character problem. In other words, how we handle characters is how we handle text.
An encoding is the method that handles characters: it maps each character to a binary sequence.
ASCII
ASCII is one of the earliest mapping rules. Every character is represented by a 7-bit binary sequence, so it can only represent 128 (2^7) different characters, which is very limited.
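In Python, `ord` returns the code of a character, so the 7-bit ASCII pattern of a short text can be printed like this (the text "Hi" is only an example).

```python
text = "Hi"

# One 7-bit pattern per character.
patterns = [format(ord(ch), "07b") for ch in text]

print(patterns)  # ['1001000', '1101001']
```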
Unicode
It’s the most common character set in use today. It covers characters and symbols from virtually all written languages. Every character is assigned a number (a code point) that fits in a 32-bit binary sequence, although more compact encodings such as UTF-8 are usually used for storage. ASCII is a subset of Unicode, and because Unicode is so convenient, most other encodings have fallen out of use.
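The sketch below looks at one non-ASCII character in Python. UTF-8 and UTF-32 are standard Unicode encodings that the text above does not go into, so treat them here as extra illustration.

```python
ch = "中"

code_point = ord(ch)             # the number Unicode assigns to the character
utf32 = ch.encode("utf-32-be")   # fixed 32 bits per character
utf8 = ch.encode("utf-8")        # variable length, 1 to 4 bytes

print(hex(code_point))       # 0x4e2d
print(utf32.hex())           # 00004e2d
print(utf8.hex(), len(utf8)) # e4b8ad 3

# ASCII characters keep their ASCII codes in Unicode.
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```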
Audio storage
Unlike numbers and characters, audio is a continuous signal rather than a set of discrete values, which means it cannot be stored exactly; we can only approximate the original sound.
Sampling rate
The sampling rate determines how much detail we can capture.
Much like in calculus, we cut the audio into n equal parts. The larger n is, the higher the sampling rate and the more detail we capture. In contrast, the smaller n is, the lower the sampling rate and the less detail we capture.
Normally, a sampling rate of 40000 samples per second is good enough.
Quantization
We can use a number to represent every sample we take, and this process is called quantization.
To simplify the process, we usually use integers for the quantization.
For example, when the sample value is 17.8, we use 18 instead of 17.8.
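Sampling and rounding can be sketched together in Python. The "audio" here is just a sine wave generated in code, and the function name, the 1000 Hz tone, and the amplitude of 100 are all assumptions made for the example.

```python
import math


def sample_and_quantize(n_samples: int, sampling_rate: int,
                        frequency_hz: float, amplitude: float) -> list[int]:
    """Sample a sine wave and round every sample to an integer."""
    samples = []
    for i in range(n_samples):
        t = i / sampling_rate  # time of the i-th sample in seconds
        value = amplitude * math.sin(2 * math.pi * frequency_hz * t)
        samples.append(round(value))  # keep an integer instead of the exact value
    return samples


samples = sample_and_quantize(40, 40000, 1000, 100)
print(len(samples))  # 40 samples cover 1 millisecond at 40000 samples/second
print(samples[:5])
print(round(17.8))   # 18
```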
Encoding
In this step, the quantized samples become a binary sequence that can be stored in our computer.
Bits per sample
It measures the quality of each sample: it defines how many bits we can use to store the quantized value of a sample. The higher the bits per sample, the more accurately each sample is represented.
Code rate
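Code rate (also called bit rate) = Sampling rate * Bits per sample
File size = Code rate * Audio duration
For 1 minute of music with a sampling rate of 40000 and 16 bits per sample:
Code rate = 40000 * 16 = 640000 bits per second
File size = 640000 * 60 = 38400000 bits = 4800000 bytes (about 4.8 MB)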
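The two formulas translate directly into Python. The function name is made up, and the sketch assumes a single audio channel with no compression.

```python
def audio_size_bits(sampling_rate: int, bits_per_sample: int,
                    duration_s: int) -> int:
    """File size = code rate * duration, for one uncompressed channel."""
    code_rate = sampling_rate * bits_per_sample  # bits per second
    return code_rate * duration_s


size_bits = audio_size_bits(40000, 16, 60)
print(size_bits)       # 38400000 bits
print(size_bits // 8)  # 4800000 bytes
```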
Image storage
There are two types of images: bitmap graphics and vector graphics, and they describe images on different principles.
Bitmap graphics
In a bitmap, the image is made of pixels, so all we need is to describe the pixels:
- How many pixels the image has
- What color each pixel is
Resolution
Resolution describes how many pixels the image has. Usually, the higher the resolution, the sharper the image.
Resolution is given as pixels across the width by pixels across the height, such as 1920 * 1080. Here are some common resolutions:
- 1080p: 1920 * 1080, 1920 pixels on width, 1080 pixels on height
- 1440p: 2560 * 1440, 2560 pixels on width, 1440 pixels on height
- 4K: 3840 * 2160, 3840 pixels on width, 2160 pixels on height
Color depth
We also need a binary sequence to describe the color of each pixel, and the length of that sequence is called the color depth. The larger the color depth, the more colors you can use, and the larger the image file.
Every color in a computer is made up of red, green, and blue (mixing them in different ratios produces new colors), so the color code is composed of the codes for red, green, and blue as well. The larger the code for one component, the more of that component there is in the mixed color.
Here are some samples of colors with 3 bits per component:
Code | R | G | B | Color |
---|---|---|---|---|
111000000 | 111 | 000 | 000 | red |
000111000 | 000 | 111 | 000 | green |
000000111 | 000 | 000 | 111 | blue |
000000000 | 000 | 000 | 000 | black |
111111111 | 111 | 111 | 111 | white |
Usually each component uses 8 bits (24 bits per pixel), which can represent 16777216 (2^24) colors.
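Both ideas, packing a color code and sizing a bitmap, fit in a short Python sketch. The function names are made up, and the size calculation assumes an uncompressed image with no file header.

```python
def pack_rgb(r: int, g: int, b: int, bits: int) -> int:
    """Place the red, green and blue codes side by side in one number."""
    return (r << (2 * bits)) | (g << bits) | b


def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size: one color code per pixel."""
    return width * height * bits_per_pixel // 8


print(format(pack_rgb(7, 0, 0, 3), "09b"))  # 111000000 (red in the table above)
print(2 ** 24)                              # 16777216 colors with 24 bits
print(raw_size_bytes(1920, 1080, 24))       # 6220800 bytes for a 1080p image
```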
Pros and cons
Pros:
- The image could present complex content
- It’s commonly used in our daily life (photography)
Cons:
- The resolution is fixed. When you zoom in, the image becomes blurry.
Vector graphics
Much like a program, a vector graphic is composed of instructions that tell the computer how to draw it.
If I want to draw a circle, I need to specify:
- Where the center of the circle is
- What the circle’s radius is
- What color the border of the circle is
- What color the inside of the circle is
Vector graphics are usually created with dedicated drawing software.
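As an illustration, SVG is one widely used vector format (the text above doesn't name a specific one, so this choice is an assumption). The four pieces of information for a circle map onto the attributes of an SVG `circle` element, and a small Python helper with the made-up name `circle_svg` can write them out.

```python
def circle_svg(cx: int, cy: int, radius: int,
               border_color: str, fill_color: str) -> str:
    """Describe a circle as drawing instructions instead of pixels."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
        f'<circle cx="{cx}" cy="{cy}" r="{radius}" '
        f'stroke="{border_color}" fill="{fill_color}"/>'
        "</svg>"
    )


svg = circle_svg(50, 50, 40, "black", "red")
print(svg)
print(len(svg))  # the description stays tiny at any zoom level
```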
Pros and cons
Pros:
- When you zoom in, the graphic is re-rendered, so it doesn’t become blurry.
- The file is usually smaller than a bitmap graphic
Cons:
- It usually has to be created with dedicated software
- It is not suited to depicting complex content (such as photographs)
Video storage
A video is made up of a series of images played in rapid succession. In other words, a video file records which image to show at which time. Stored that way, though, a video file would be huge, so in practice almost all video files are compressed.
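To get a feeling for "huge", here is a rough Python estimate of an uncompressed video. The frame rate of 30 images per second, the 1080p resolution, and the 24-bit color are assumptions chosen for the example, and audio is ignored.

```python
def raw_video_bytes(width: int, height: int, bits_per_pixel: int,
                    frames_per_second: int, duration_s: int) -> int:
    """Size of a video stored as a plain series of uncompressed images."""
    frame_bytes = width * height * bits_per_pixel // 8
    return frame_bytes * frames_per_second * duration_s


size = raw_video_bytes(1920, 1080, 24, 30, 60)
print(size)  # 11197440000 bytes, roughly 11 GB for a single minute
```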