Descriptive Statistics

Dataset - a list of observations, such as a set of categories or a set of numbers

Examples:

Weight of 20 Students

122, 118, 56, 54, 77, 90, 116, 88, 72, 56

112, 123, 55, 74, 87, 100, 96, 68, 55, 61

Hair Color of 20 Students

Brown, Red, Black, Black, Black, Black, Red, Brown, Red, Black

Black, Red, Brown, Black, Black, Red, Brown, Brown, Brown, Black

Sample Size (n) = total count of observations in the dataset (ex: n=20 in both cases above)

Data Classifications:

Discrete:

Data that fall into natural categories (Ex: Hair color of 20 students: Red, Brown, Black)

Continuous:

Numerical data measured on a continuous scale (Ex: Weight of 20 Students)

Distribution of Discrete Datasets:

Simply classify the data/observations into categories

Eg: Left Handed Person (L)/ Right Handed Person (R)

L, R, L, L, L, R, R, R, R, L

R, L, R, L, R, L, L, L, R, L

There are only 2 categories:

Right(R) = 9

Left (L) = 11

Inference:

Sample Proportion (occurrence of Right handed people in sample): 9/20

Sample Proportion (occurrence of Left handed people in sample): 11/20

Left-handed people are more common than right-handed people in this sample
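The counting above can be sketched in a few lines of Python. This is only an illustration of the tally and the sample proportions; the variable names are assumptions, and the data are the 20 handedness observations listed above.

```python
from collections import Counter

# The 20 handedness observations from the example above.
handedness = "L R L L L R R R R L R L R L R L L L R L".split()

n = len(handedness)           # sample size
counts = Counter(handedness)  # tally of each category

# Sample proportion of each category = count / n
proportions = {category: count / n for category, count in counts.items()}

for category, count in counts.most_common():
    print(f"{category}: {count}/{n} = {proportions[category]:.2f}")
```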

Distribution of Continuous Datasets:

No natural categories exist, hence create categories (intervals)

Stem-and-Leaf Plot:

Use the first 2 digits of each value as its category. All values falling in the same category share the same first 2 digits (12, 13, 14 or 15). These leading digits are called the “Stem”. “Leaf” = all digits left over after the Stem.

Example:

126 => Stem =12; Leaf=6

1298 => Stem=12; Leaf=98

138=>Stem=13; Leaf=8

Applying the 2-digit stem technique to the above data:

Each “Stem” can be split again into two rows: one row for leaves 0-4 and another row for leaves 5-9
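A minimal sketch of the stem/leaf split, assuming the stem is simply the first two digits of each value as in the examples above. The function name `stem_and_leaf` and its `stem_digits` parameter are made up for this illustration.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_digits=2):
    """Group values by stem (leading digits); the leaf is what remains."""
    plot = defaultdict(list)
    for value in sorted(values):
        digits = str(value)
        stem = int(digits[:stem_digits])
        leaf = digits[stem_digits:]
        plot[stem].append(leaf)
    return dict(plot)

# The three example values from the notes above.
plot = stem_and_leaf([126, 1298, 138])

for stem in sorted(plot):
    print(f"{stem} | {' '.join(plot[stem])}")
```

Each printed row shows one stem followed by all of its leaves, which is the usual layout of the plot.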

Dot Plot:

Draw a line (scale) of maximum and minimum range and plot the values

For observations with the same value, stack each additional dot above the previous one.
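The idea can be sketched as a text-only dot plot. This is just an illustration for integer data: the function name `dot_plot` is an assumption, and the sample uses the ten lightest weights from the student dataset at the top of these notes.

```python
from collections import Counter

def dot_plot(values):
    """Return the rows of a text dot plot for integer data."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    width = hi - lo + 1
    height = max(counts.values())
    rows = []
    # Build from the top level down; repeated values stack upward.
    for level in range(height, 0, -1):
        rows.append("".join("*" if counts[v] >= level else " "
                            for v in range(lo, hi + 1)))
    rows.append("-" * width)                         # the scale line
    rows.append(f"{lo}{hi:>{width - len(str(lo))}}")  # min and max labels
    return rows

# Ten lightest weights from the student dataset above.
sample = [54, 55, 55, 56, 56, 61, 68, 72, 74, 77]
rows = dot_plot(sample)
print("\n".join(rows))
```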

For the example data of Etruscan:

Comparative Dot Plots

5-Point Method (Five-Number Summary) of Descriptive Statistics - Continuous data:

Minimum = 126

Maximum=158

Range=Maximum – Minimum = 32

Range is a measure of scale (also termed dispersion or noise)

Outliers - Points far from the rest of the data

Median = Center item in the sorted list. It is a robust statistic (barely affected by outliers)

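If n=25 then the median position = (n+1)/2 = 26/2 = 13, so the median is the 13th item of the sorted list = 146

(25%) First Quartile (Q1) = Center point of the 1st half, up to the median (roughly item n/4)

(50%) Second Quartile (Q2) = the Median (roughly item n/2)

(75%) Third Quartile (Q3) = Center point of the last half, from the median onward (roughly item 3n/4)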

Inter Quartile range (IQR) = Q3 – Q1

As per the 5-Point method for the above dataset:

Min=126, Q1=142, Median=146, Q3=150, Max=158, Range=32
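The five-point method can be sketched in Python as follows. Since the Etruscan observations are not listed in these notes, the sketch uses the 20 student weights from the top instead. It assumes the "center of each half" definition of the quartiles, leaving the median out of both halves when n is odd; other quartile conventions exist and can give slightly different values. The function name is an assumption.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the 'median of each half' rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n the middle item belongs to neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }

# Weights of the 20 students from the first example.
weights = [122, 118, 56, 54, 77, 90, 116, 88, 72, 56,
           112, 123, 55, 74, 87, 100, 96, 68, 55, 61]

summary = five_number_summary(weights)
value_range = summary["max"] - summary["min"]
iqr = summary["q3"] - summary["q1"]
print(summary, "Range =", value_range, "IQR =", iqr)
```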

Box Plot:

Graphical representation of the values that makes the 5-Point method easy to read off.

             |-----------------|
Min ---------| Q1           Q3 |--------- Max
             |-----------------|

Outliers and Box Plot

Points that do not fall within the range of the rest of the dataset are termed outliers
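These notes do not say how far from the rest a point must be to count as an outlier. A common box plot convention, assumed here and not taken from the notes, flags any point more than 1.5 x IQR below Q1 or above Q3. A minimal sketch with made-up function names, using the student weights plus one hypothetical extreme value:

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)

def find_outliers(values, k=1.5):
    """Points outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed convention."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]

weights = [122, 118, 56, 54, 77, 90, 116, 88, 72, 56,
           112, 123, 55, 74, 87, 100, 96, 68, 55, 61]

print(find_outliers(weights))          # the original weights
print(find_outliers(weights + [250]))  # 250 is a hypothetical extra value
```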

Comparing datasets:

The shift between the Etruscan and Italian datasets is the difference between their medians from the 5-Point summaries: 146 - 133 = 13. Estimating such a shift is termed the Location Problem

Measure of Center:

Sample Mean – average (sum of numbers / number of items)

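Example dataset: 11, 18, 6, 4, 8, 15, 22 (n=7)

Median = 11 (4th item of the sorted list 4, 6, 8, 11, 15, 18, 22)

Sample Mean = 84/7 = 12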

Mean is sensitive to outliers
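A quick sketch of that sensitivity, using the dataset 11, 18, 6, 4, 8, 15, 22 from these notes. The second list is hypothetical: it replaces 22 with 220 purely to show what one extreme value does to each measure.

```python
from statistics import mean, median

set1 = [11, 18, 6, 4, 8, 15, 22]
# Hypothetical variant: the largest value 22 replaced by an extreme 220.
with_outlier = [11, 18, 6, 4, 8, 15, 220]

print("mean:  ", mean(set1), "->", round(mean(with_outlier), 2))
print("median:", median(set1), "->", median(with_outlier))
```

The mean moves a long way while the median stays where it was, which is why the median is called robust.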

s=standard deviation

Walsh Average/Hodges-Lehmann Estimates:

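For a small dataset, sort the values and lay them out as both the row and the column headers of a table.

Each cell holds the average of its row value and its column value (a Walsh average). Each pair of data points is calculated only once, including each point paired with itself. The Hodges-Lehmann estimate is the median of all the Walsh averages.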
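The table of pairwise averages can be sketched as follows. The Hodges-Lehmann estimate is taken as the median of the Walsh averages; the function names are assumptions, and the data are the seven values 11, 18, 6, 4, 8, 15, 22 used elsewhere in these notes.

```python
from itertools import combinations_with_replacement
from statistics import median

def walsh_averages(values):
    """Average of every pair (i <= j), each pair counted once, self-pairs included."""
    ordered = sorted(values)
    return [(a + b) / 2 for a, b in combinations_with_replacement(ordered, 2)]

def hodges_lehmann(values):
    """Median of the Walsh averages."""
    return median(walsh_averages(values))

set1 = [11, 18, 6, 4, 8, 15, 22]
averages = walsh_averages(set1)
estimate = hodges_lehmann(set1)
print(len(averages), "Walsh averages; Hodges-Lehmann estimate =", estimate)
```

For n values there are n(n+1)/2 Walsh averages, which is the upper triangle of the table including its diagonal.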

Measure of Scale or Noise:

Range, IQR

Variance = Measure of deviation from mean

Set1=11, 18, 6,4,8,15,22

Sample mean =12

Deviations (x - 12) = -1, 6, -6, -8, -4, 3, 10

Squared Deviations = 1, 36, 36, 64, 16, 9, 100; Sum = 262

Sample Variance s² = 262/(n-1) = 262/6 ≈ 43.67

s = √43.67 ≈ 6.61
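The variance arithmetic can be checked step by step with a short sketch. It assumes the sample variance divides by n - 1, and it uses the standard library's `statistics.variance` and `statistics.stdev` as a cross-check.

```python
from statistics import mean, stdev, variance

set1 = [11, 18, 6, 4, 8, 15, 22]

m = mean(set1)                            # sample mean
deviations = [x - m for x in set1]        # x - mean for each observation
squared = [d ** 2 for d in deviations]    # squared deviations
s2 = sum(squared) / (len(set1) - 1)       # sample variance, divisor n - 1
s = s2 ** 0.5                             # sample standard deviation

print("deviations:", deviations)
print("sum of squares:", sum(squared))
print("variance:", round(s2, 2), "std dev:", round(s, 2))
print("library check:", round(variance(set1), 2), round(stdev(set1), 2))
```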

SenthilTecks

Wednesday, August 1, 2018
