Statistics Problems
4.
A. Median = 205
B. Q1 = 175, Q3 = 245, IQR = Q3 – Q1 = 70. Outliers are values outside the range [Q1-1.5*IQR; Q3+1.5*IQR] = [70; 350]. No values fall outside this range, so there are no outliers.
C. 50% (half of the values are less than the median)
D. 75% (three quarters of the values are less than Q3)
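The outlier check in part B can be sketched in code. This is only an illustration of the 1.5*IQR rule, not part of the required answer; the function names `tukey_fences` and `find_outliers` are made up for this sketch.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Quartiles from part B of problem 4.
lower, upper = tukey_fences(175, 245)
print(lower, upper)  # 70.0 350.0
```

Passing the actual data values to `find_outliers` would return an empty list when, as here, every value lies inside [70; 350].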
8.
A. Median = 3125
B. IQR = Q3 – Q1 = 3375 – 2625 = 750
C. IQR = Q3 – Q1 = 3375 – 2625 = 750, so [Q1-1.5*IQR; Q3+1.5*IQR] = [1500; 4500]. All the values lie within this range, so there are no outliers.
D. The median and IQR are the best measures here. The right whisker of the box plot is much longer than the left one, so the distribution is skewed to the right: there are a few large values, but the typical birth weight is much smaller. The median is not influenced by those large values, which makes it the better measure of center, and the IQR is sufficient to describe variability. However, the sample standard deviation can give a fuller understanding of variability.
9.
28 29 30 39 41 48 48 49 51 62
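A. Mean = (28+29+30+39+41+48+48+49+51+62)/10 = 425/10 = 42.5
B. SD = (((28-42.5)^2+(29-42.5)^2+…+(51-42.5)^2+(62-42.5)^2)/10)^(1/2) = (1118.5/10)^(1/2) = 10.58. This is the population formula (division by n = 10); the sample formula (division by n – 1 = 9) gives 11.15.
C. Median = (5th+6th)/2 = (41+48)/2 = 44.5
D. Q1 = median of the first 5 values = 30; Q3 = median of the last 5 values = 49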
E. Neither measure is always better; each has advantages depending on the sample. If the sample has outliers, the median is preferable, because a few abnormal values do not shift the typical value. If there are few outliers and the sample is large, the mean is the better measure. In this example the median could be preferable because of the extreme value 62.
F. The standard deviation is the better measure of dispersion, because it reflects how far every value in the dataset lies from the mean. Keep in mind that there are two different formulas, depending on whether the data are the entire population or a sample. The advantage of the IQR is its simplicity.
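The calculations in parts A to D can be checked with Python's standard `statistics` module. This is a rough sketch: `quartiles_by_halves` is a helper written for this check (it follows the "median of each half" method used in part D), and other quartile conventions may give slightly different values.

```python
import statistics

data = [28, 29, 30, 39, 41, 48, 48, 49, 51, 62]

mean = statistics.mean(data)
population_sd = statistics.pstdev(data)  # divides by n
sample_sd = statistics.stdev(data)       # divides by n - 1
median = statistics.median(data)


def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves.

    Assumes at least two values. For an odd number of values the
    middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return statistics.median(lower), statistics.median(upper)


q1, q3 = quartiles_by_halves(data)

print(mean, median, q1, q3)                          # 42.5 44.5 30 49
print(round(population_sd, 2), round(sample_sd, 2))  # 10.58 11.15
```

The two standard deviations differ only in the divisor, which is the distinction between the population and sample formulas mentioned in part F.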
10.
Category        Frequency
Less than 35    3
35 – 44         2
45 – 54         4
55 or more      1
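The counts can be reproduced with a short tally. The sketch assumes that problem 10 groups the ten values listed under problem 9 (the frequencies agree with that reading) and that the values are whole numbers, so the class 35 – 44 ends at 44. The function name `category` is chosen only for this illustration.

```python
from collections import Counter

# Assumption: these are the values from problem 9.
data = [28, 29, 30, 39, 41, 48, 48, 49, 51, 62]

LABELS = ["Less than 35", "35 – 44", "45 – 54", "55 or more"]


def category(value):
    """Map a whole-number value to its class label."""
    if value < 35:
        return LABELS[0]
    if value <= 44:
        return LABELS[1]
    if value <= 54:
        return LABELS[2]
    return LABELS[3]


frequencies = Counter(category(v) for v in data)
for label in LABELS:
    print(f"{label:<16}{frequencies[label]}")
```

The frequencies sum to 10, the number of values, which is a quick check that no value was skipped or counted twice.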
15.
IQR = Q3 – Q1 = 243 – 59 = 184.
[Q1-1.5*IQR; Q3+1.5*IQR] = [-217; 519]
Both Min=38 and Max=394 are in this range. The sample has no outliers.
19.
D. The standard deviation is larger than expected. This statement is true because the squared difference between the outlier and the mean is large, which increases the standard deviation. Statements A and C are false because the mean is greater than the median when the outlier lies to the right of both.
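The effect can be seen on a small example. This is only an illustration: it reuses the values from problem 9, and the added outlier of 200 is an arbitrary, made-up number rather than data from problem 19.

```python
import statistics

# Values from problem 9, reused here purely for illustration.
base = [28, 29, 30, 39, 41, 48, 48, 49, 51, 62]
# Hypothetical outlier on the high side; 200 is an arbitrary choice.
with_outlier = base + [200]

for label, values in [("without outlier", base), ("with outlier", with_outlier)]:
    print(
        label,
        round(statistics.mean(values), 2),
        statistics.median(values),
        round(statistics.pstdev(values), 2),
    )
```

With the outlier added, the median moves only from 44.5 to 48, while the mean rises from 42.5 to about 56.8 and the standard deviation grows more than fourfold. The mean ends up above the median, as the answer states.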
Extra questions
1.
Dichotomous – Heads/Tails, Dead/Alive, Male/Female, True/False
Ordinal – grading system: Failure, Poor, Satisfactory, Good, Excellent
Categorical – Nationality, Blood Type, Race
Continuous – Building height, Person’s weight, Time of some action, Temperature
2. It is important for getting a better understanding of the dataset. If the variance is very large, the mean alone tells almost nothing. For example, suppose one person wins a lottery prize of 1 million dollars and the other 999 contestants win nothing. The mean is (1000000+0+0…)/1000 …