Statistics
People don't just count anymore. They want to know what's behind "too much of this" or "too little of that". They also want to know how likely this or that is to happen tomorrow because of something else. They use the numbers to build diagrams and to raise reasonable doubt. That's why it's called statistics instead of "counting" or "recording".
A Defined Measure
Collected data is usually used:
- To get a single number that can act as a summary in place of the whole collection. Such numbers include the mean, mode and median. There are also the range, deviation measures and so on.
- To make diagrams that show progress or detail about how things stand, such as histograms, pie charts and bar charts. Histograms and bar charts are drawn on Cartesian coordinates, with a vertical axis and a horizontal axis labeled to show the level and the identity of the data.
Raw and Processed Data
Collected data is called "raw". After the means, modes, standard deviations and graphs are done, the data becomes "processed", or "useful information". Information produced in one place can become raw data somewhere else.
The Ways Data is Processed
Collected data can be summarized by a central value or by how spread out it is, and one approach may suit the data better than the other. For instance, look at the data collected below.
name | age |
Rita | 20 |
John | 34 |
Tony | 25 |
Sam | 26 |
Camilla | 23 |
Davis | 23 |
Tamil | 30 |
Lily | 32 |
Mark | 33 |
Frank | 28 |
You might notice that these ages cluster around 26 or 27, so a central value such as the mean summarizes this data well.
But look at the data below:
name | age |
Rita | 20 |
John | 60 |
Tony | 66 |
Sam | 21 |
Camilla | 70 |
Davis | 56 |
Tamil | 59 |
Lily | 67 |
Mark | 23 |
Frank | 28 |
The mean is most useful for finding a center, while the standard deviation is more useful for checking how the data varies around the mean. When the values are not grouped around a single center, as in the second table above, the mean alone can mislead: the second table's mean age is 47, yet everyone in it is either 28 or younger or 56 or older. In that case a measure of spread such as the standard deviation tells you more. Apart from graphing, there are two ways to process statistical data:
- Measures of Central Tendency
- Measures of Spread
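The contrast between the two tables can be checked with a few lines of Python. This is a minimal sketch, assuming Python 3 and its standard `statistics` module; the variable names are made up for illustration. `pstdev` divides by the number of values, which matches the variance and standard deviation calculations later in this section.

```python
from statistics import mean, pstdev

# Ages copied from the two tables above (hypothetical variable names).
first_table = [20, 34, 25, 26, 23, 23, 30, 32, 33, 28]
second_table = [20, 60, 66, 21, 70, 56, 59, 67, 23, 28]

for label, ages in [("first", first_table), ("second", second_table)]:
    # pstdev is the population standard deviation: it divides by len(ages).
    print(f"{label}: mean = {mean(ages):.1f}, standard deviation = {pstdev(ages):.1f}")
```

This prints a mean of 27.4 with a standard deviation of about 4.5 for the first table, against a mean of 47.0 with a standard deviation of about 20.1 for the second. The much larger spread is the signal that the second table's mean sits between two separate groups of ages.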
Measures of Central Tendency
A measure of central tendency is a single central value that stands for a whole data collection. Which one you choose may depend on properties of the data or on what the people using it need. If you were a news reporter you might need the average. If you were a sports analyst you might need the maximum instead, even though it is not a central value. If you worked in mechanics you might need the median. There are three central values you usually look for: the mean, the median and the mode.
Mean
The mean is the average of a set of numerical data. To get it, add up all the values, then divide the sum by how many values there are. In the first table above, the ages add up to 274 across 10 people, so the mean is 274/10 = 27.4.
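The two steps (sum, then divide by the count) translate directly into code. This is a sketch assuming Python 3; `mean` here is a hypothetical helper written for illustration, and Python's `statistics.mean` does the same job.

```python
def mean(values):
    """Add up the values, then divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty collection is undefined")
    return sum(values) / len(values)


ages = [20, 34, 25, 26, 23, 23, 30, 32, 33, 28]  # the first table
print(mean(ages))  # 27.4
```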
Mode
The mode is the value that occurs most often in the data. In the first table above, the mode is 23, the only age that appears twice. A data set can have no mode or more than one mode. The second table above has no mode, because every age appears only once. If you had data such as 23 23 34 34 25 34 23 25, your modes would be 23 and 34, since each appears three times.
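A sketch of the mode in Python 3, assuming the convention used above: when no value repeats, the data has no mode. The helper name `modes` is hypothetical, and it returns a list because there can be more than one mode. Python's own `statistics.multimode` differs on the no-repeat case: it returns every value instead of an empty list.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, smallest first.

    Follows the convention in the text: if no value repeats,
    there is no mode and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value appears once, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([20, 34, 25, 26, 23, 23, 30, 32, 33, 28]))  # [23]
print(modes([20, 60, 66, 21, 70, 56, 59, 67, 23, 28]))  # []
print(modes([23, 23, 34, 34, 25, 34, 23, 25]))          # [23, 34]
```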
Median
To get the median of a list of data, first arrange the data in ascending or descending order, then take the value in the middle. If you arranged 2 4 1 5 8 in ascending order, you would have 1 2 4 5 8, and your median would be 4. If there is an even number of values, the median is the average of the two middle values. If you had 2 4 1 5 8 2 and arranged it in ascending order, you would get 1 2 2 4 5 8, and your median would be (2+4)/2 = 3.
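The sort-then-take-the-middle procedure can be sketched in Python 3 as follows. The helper `median` is hypothetical; `statistics.median` in the standard library follows the same rule for odd and even counts.

```python
def median(values):
    """Sort the values, then take the middle one (or average the middle two)."""
    if not values:
        raise ValueError("the median of an empty collection is undefined")
    ordered = sorted(values)  # ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: a single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even count: average the two


print(median([2, 4, 1, 5, 8]))     # 4
print(median([2, 4, 1, 5, 8, 2]))  # 3.0
```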
Measures of Spread
Measures of spread describe how spread out a collection of data is, or how far the values lie from a center. They include the following:
- range
- mean deviation
- variance
- standard deviation
Range
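The range is simply the difference between the largest value and the smallest value. If you mark the values on an x axis, the range is the distance from the smallest to the largest, with the values in between looking like stops along the way. For the first table above the range is 34-20 = 14 years; for the second it is 70-20 = 50 years.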
The range can be useful when checking diseases in certain areas, and there may be other cases where it applies.
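The range is a one-line calculation. This sketch assumes Python 3; the helper is given the hypothetical name `data_range` so it does not hide Python's built-in `range`.

```python
def data_range(values):
    """Largest value minus smallest value."""
    if not values:
        raise ValueError("the range of an empty collection is undefined")
    return max(values) - min(values)


print(data_range([20, 34, 25, 26, 23, 23, 30, 32, 33, 28]))  # 14
print(data_range([20, 60, 66, 21, 70, 56, 59, 67, 23, 28]))  # 50
```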
Mean Deviation
The mean deviation is the mean of the absolute differences between each value and the mean (average) of the values. If the collection is 3, 4, 7, 9, the mean is (3+4+7+9)/4 = 23/4 = 5.75. To find the mean deviation, you first find the sum of the absolute differences from the mean, which is |3-5.75|+|4-5.75|+|7-5.75|+|9-5.75|. The bars denote absolute value: they ignore whether the difference is negative or positive and keep only the distance between the two values. For instance, |3-5.75| = 5.75-3 = 2.75 and |9-5.75| = 9-5.75 = 3.25.
The mean deviation is therefore (|3-5.75|+|4-5.75|+|7-5.75|+|9-5.75|)/4 = (2.75+1.75+1.25+3.25)/4 = 9/4 = 2.25
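The mean deviation can be sketched in Python 3 by computing the mean first, then averaging the absolute distances with `abs`. The helper name `mean_deviation` is hypothetical.

```python
def mean_deviation(values):
    """Average absolute distance of each value from the mean."""
    if not values:
        raise ValueError("the mean deviation of an empty collection is undefined")
    center = sum(values) / len(values)  # 5.75 for the example data
    return sum(abs(value - center) for value in values) / len(values)


print(mean_deviation([3, 4, 7, 9]))  # 2.25
```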
Variance
If you average the squared differences from the mean instead of the absolute differences, you get the variance. Using the same numbers 3, 4, 7, 9 with mean 5.75, the variance is ((3-5.75)²+(4-5.75)²+(7-5.75)²+(9-5.75)²)/4 = (2.75²+1.75²+1.25²+3.25²)/4 = (7.5625+3.0625+1.5625+10.5625)/4 = 22.75/4 = 5.6875, or about 5.69.
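The variance differs from the mean deviation only in squaring each difference instead of taking its absolute value. This sketch assumes Python 3; `variance` is a hypothetical helper, checked against the standard library's `statistics.pvariance`. Like the calculation above, both divide by the number of values; `statistics.variance` instead divides by one less than the number of values (the sample variance) and gives a larger result.

```python
import statistics


def variance(values):
    """Average of the squared distances from the mean (divides by the count)."""
    if not values:
        raise ValueError("the variance of an empty collection is undefined")
    center = sum(values) / len(values)
    return sum((value - center) ** 2 for value in values) / len(values)


data = [3, 4, 7, 9]
print(variance(data))              # 5.6875
print(statistics.pvariance(data))  # 5.6875
```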
Standard Deviation
Taking the square root of the variance gives the standard deviation. For instance, the standard deviation of the data above is the square root of 5.6875, which is about 2.38.
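Taking the square root is the last step. This sketch assumes Python 3 and uses `math.sqrt`; `standard_deviation` is a hypothetical helper, checked against `statistics.pstdev`. The results are rounded to two decimal places to match the text.

```python
import math
import statistics


def standard_deviation(values):
    """Square root of the variance (the variance divides by the count)."""
    if not values:
        raise ValueError("the standard deviation of an empty collection is undefined")
    center = sum(values) / len(values)
    variance_value = sum((value - center) ** 2 for value in values) / len(values)
    return math.sqrt(variance_value)


data = [3, 4, 7, 9]
print(round(standard_deviation(data), 2))  # 2.38
print(round(statistics.pstdev(data), 2))   # 2.38
```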