Sunday, 16 October 2022

Statistics

People don't just count no more. They gotta know what's up with the "too much of this" or "too less of that". Also, they gotta know what's the likelihood of that or this happening tomorrow because of this or that. They also use the numbers to build diagrams and put up reasonable doubt. It's called Statistics instead of "counting" or "recording".

A Defined Measure

Statistics can be defined as the collection of data and the use of that data, through mathematical methods, to summarize it and make predictions.

Collected data is usually used in two ways:

  • To get a single number that can act as a summary and be used in place of the whole collection. Such numbers are the mean, mode or median. There are also ranges, deviation numbers and so on.

  • To make diagrams that show progress or detail about how things are. These diagrams are usually plotted on Cartesian coordinates that have a vertical axis and a horizontal axis. These axes are labeled to show the level and the identity of the data. The types of diagrams include histograms, pie charts, bar charts and so on.

Raw and Processed Data

Collected data is called "raw". After all the means and modes, standard deviations, and graphs are done, the data becomes "processed" or "useful information". Information produced in one place could become raw data in another place.

The Ways Data is Processed

The data collected can be processed around a central value or it can be processed according to how it is spread. Depending on the data, one may be more useful than the other. For instance, look at the data collected below.

name      age
Rita      20
John      34
Tony      25
Sam       26
Camilla   23
Davis     23
Tamil     30
Lily      32
Mark      33
Frank     28

You might notice that these values seem distributed around 26 or 27. You could apply a central value like the mean to this data.

But look at the data below:

name      age
Rita      20
John      60
Tony      66
Sam       21
Camilla   70
Davis     56
Tamil     59
Lily      67
Mark      23
Frank     28

The mean is probably more useful when trying to find a center, while the standard deviation is more useful when checking how the data varies around the mean. You might want to use a standard deviation rather than a mean when the values are not well arranged around a center, such as in the second table above, where the ages fall into a young group and a much older group. That being said, there are two ways to process statistical data apart from graphing:

  • Measures of Central Tendency
  • Measures of Spread
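
To see the difference between the two tables in numbers, here is a minimal sketch in Python using the standard `statistics` module. The list names `first_table` and `second_table` are just labels chosen for this example, and the sketch assumes each table is the whole population (so it uses `pstdev`, which divides by the count).

```python
from statistics import mean, pstdev

# Ages copied from the two tables above.
first_table = [20, 34, 25, 26, 23, 23, 30, 32, 33, 28]
second_table = [20, 60, 66, 21, 70, 56, 59, 67, 23, 28]

for label, ages in (("first", first_table), ("second", second_table)):
    # mean gives the center; pstdev gives the spread around that center.
    center = mean(ages)
    spread = pstdev(ages)
    print(f"{label} table: mean = {center}, standard deviation = {spread:.2f}")
```

The second table's standard deviation comes out several times larger than the first table's, which is the numeric version of "not well arranged around a center".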

Measures of Central Tendency

A Measure of Central Tendency gives a single center value to use for a data collection. The one chosen may depend on certain properties of the data or the requirements of those needing it. If you were a news reporter you might need the average. If you were a sports analyst you might need the maximum. If you deal in mechanics you might need the median. There are three values you usually look for: mean, median and mode.

Mean

This is simply the average of the set of numerical data. In the first table above the mean is 27.4. The mean is found by first summing up the data and then dividing that sum by the count of the data.
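
The two steps (sum, then divide by the count) can be sketched in a few lines of Python. The variable names are only illustrative; the ages are the ones from the first table.

```python
# Ages from the first table.
ages = [20, 34, 25, 26, 23, 23, 30, 32, 33, 28]

total = sum(ages)          # step 1: add everything up
count = len(ages)          # how many values there are
mean_age = total / count   # step 2: divide the sum by the count

print(total, count, mean_age)
```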

Mode

This is the most frequently occurring value in the set of data. In the first table above the mode is 23. There can be no mode or more than one mode. The second table above has no mode because no age repeats. If you had some data such as: 23 23 34 34 25 34 23 25, your modes are 23 and 34, since each occurs three times.
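
Here is one way the counting could be done in Python. The helper name `modes` is made up for this sketch, and it follows the convention used in this post that a collection where nothing repeats has no mode. (The standard library's `statistics.multimode` takes a different view and returns every value in that case.)

```python
from collections import Counter

def modes(values):
    """Return the most frequent values, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values(), default=0)
    if highest < 2:
        # Every value occurs once (or the list is empty): no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)

first_table = [20, 34, 25, 26, 23, 23, 30, 32, 33, 28]
second_table = [20, 60, 66, 21, 70, 56, 59, 67, 23, 28]
sample = [23, 23, 34, 34, 25, 34, 23, 25]

print(modes(first_table))
print(modes(second_table))
print(modes(sample))
```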

Median

To get the median of a list of data, you first arrange the data in ascending or descending order. Then you look at the value in the center. If you arranged 2 4 1 5 8 in ascending order you would have 1 2 4 5 8. Your median would be 4. If there is an even number of values then the median is the average of the two middle values. If you had 2 4 1 5 8 2, and you arranged it in ascending order you would get 1 2 2 4 5 8 and your median would be (2+4)/2 = 3.
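
The sort-then-pick-the-middle procedure can be sketched as follows. The function name `median_of` is just for this example, and it assumes the list is not empty; Python's own `statistics.median` does the same job.

```python
def median_of(values):
    """Median of a non-empty list: middle value after sorting."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: there is exactly one center value.
        return ordered[middle]
    # Even count: average the two values around the center.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median_of([2, 4, 1, 5, 8]))
print(median_of([2, 4, 1, 5, 8, 2]))
```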

Measures of Spread

The Measures of Spread describe how the collection of data is spaced out, or how far each value may be from a center. They include the following:

  • range
  • mean deviation
  • variance
  • standard deviation

Range

The Range is simply the difference between the largest value and the smallest value. If you spread the values on an x axis, you're going to get a kind of distance where the values in between look like stops.

The Range can be useful when checking diseases in certain areas. There may be other cases where it can be applied.
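
As a quick sketch, the range of the two age tables from earlier in this post can be computed like this in Python. The helper name `value_range` is an assumption of this example (chosen so it does not shadow Python's built-in `range`).

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

first_table = [20, 34, 25, 26, 23, 23, 30, 32, 33, 28]
second_table = [20, 60, 66, 21, 70, 56, 59, 67, 23, 28]

print(value_range(first_table))   # 34 - 20
print(value_range(second_table))  # 70 - 20
```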

Mean Deviation

The Mean Deviation is the mean of all absolute differences from the mean or average of the values. If the collection is the following: 3, 4, 7, 9, the mean is (3+4+7+9)/4 = 23/4 = 5.75. To find the mean deviation, you first find the sum of all absolute differences from the mean, which is |3-5.75|+|4-5.75|+|7-5.75|+|9-5.75|. The bars mean that the difference ignores whether the result is negative or positive. It's just using the distance between the two values. For instance, |3-5.75| = 5.75-3 = 2.75 and |9-5.75| = 9-5.75 = 3.25.

The mean deviation is therefore (|3-5.75|+|4-5.75|+|7-5.75|+|9-5.75|)/4 = (2.75+1.75+1.25+3.25)/4 = 9/4 = 2.25.
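
The same arithmetic, written out as a small Python sketch. The variable names are only illustrative.

```python
values = [3, 4, 7, 9]

mean = sum(values) / len(values)              # 23 / 4
# abs() plays the role of the bars: distance from the mean, sign ignored.
distances = [abs(v - mean) for v in values]
mean_deviation = sum(distances) / len(distances)

print(mean, distances, mean_deviation)
```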

Variance

When you average the squares of the differences instead of the absolute differences, you get the variance. From the last example using the numbers 3, 4, 7, 9, the mean being 5.75, the variance would be (|3-5.75|² + |4-5.75|² + |7-5.75|² + |9-5.75|²)/4 = (2.75² + 1.75² + 1.25² + 3.25²)/4 = 22.75/4 = 5.6875, or about 5.69.
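
A short Python sketch of the squaring-and-averaging step, using the same four numbers. It divides by the count, as the worked example does; the variable names are assumptions of this sketch.

```python
values = [3, 4, 7, 9]

mean = sum(values) / len(values)
# Square each difference from the mean; squaring also removes the sign.
squared_differences = [(v - mean) ** 2 for v in values]
variance = sum(squared_differences) / len(values)

print(sum(squared_differences), variance)
```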

Standard Deviation

When you take the square root of the variance you get the standard deviation. For instance, the standard deviation of the above data is the square root of 5.6875 (about 5.69), which is about 2.38.
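
To check the square-root step, here is a small sketch using Python's standard `math` and `statistics` modules on the numbers 3, 4, 7, 9. It assumes the four numbers are the whole population, so it uses `pvariance` and `pstdev`, which divide by the count like the worked example in this post.

```python
import math
from statistics import pstdev, pvariance

values = [3, 4, 7, 9]

variance = pvariance(values)               # average squared difference
standard_deviation = math.sqrt(variance)   # square root of the variance

print(round(variance, 4), round(standard_deviation, 2))
# pstdev does both steps in one call and should agree.
print(round(pstdev(values), 2))
```

Note that `statistics.variance` and `statistics.stdev` (without the "p") divide by the count minus one, which is meant for samples, so they give a different number for the same list.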
