When we learn a new subject, the first step is to learn its words and concepts. You need to expand your vocabulary in the field first; then you can understand and use its terms (jargon) to analyze and solve problems in that field.
Naming things is indeed the first step when people start studying a subject. We human beings are very good at naming things. Taken to the extreme, consciously or unconsciously, we only admit that a thing exists after we have given it a name.
When learning a new word or concept, be careful to understand what it is and what it is not. Everything in the universe is connected. When we name a thing, we single it out from its connections with other things: we pay attention to the specific features of this object and, at the same time, overlook some features it shares with other objects. So always bear in mind that a concept helps us understand part of a thing, but it also hides some of its other features from us. Remember: a map serves specific and limited purposes, and a map of a city will never equal the city itself. This drives us to learn and create more words and concepts for deeper understanding and for more uses.
This is especially true in statistics. As soon as you select a sample from a population, you can learn some features of the population by analyzing the sample; but because a sample is not the population, you also lose a lot of information about the population.
It is crucial to understand the uses and limitations of each concept and each method from the very beginning.
In reality, it is very costly or even impossible to measure the whole population. So we select a number of people from the population to form a sample. We study the sample, describe its features, and use them to infer the features of the population.
Population: all the 6th graders in this school
Sample: 25 6th graders
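To see how a sample relates to its population, here is a minimal Python sketch. The notes do not say how the 25 students are chosen; this assumes a simple random sample (drawn without replacement), and the population itself (120 students, heights simulated around 150 cm with spread 7 cm, fixed seed) is entirely hypothetical. The sample mean usually lands close to, but not exactly on, the population mean, which is the information loss described above.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population: heights (cm) of 120 6th graders, simulated for illustration only.
population = [round(rng.gauss(150, 7), 1) for _ in range(120)]

# Simple random sample: 25 students drawn without replacement.
sample = rng.sample(population, 25)

print("population mean:", round(statistics.mean(population), 1))
print("sample mean:    ", round(statistics.mean(sample), 1))
```

Changing the seed gives a different sample and, usually, a slightly different sample mean.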
Raw data: the values recorded for the sample, e.g. the heights of the 25 6th graders in this school
- number of observations (number of the 6th graders)
- the value of each observation is one data entry
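Categorical vs. numerical data: categorical data are labels or categories (e.g. favorite sport); numerical data are measured or counted quantities (e.g. height in cm)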
Organize and present data (Attn.: note what you can and cannot tell directly from each way of presenting data):
- tables: data values and their frequency (the number of times each value occurs)
- dot plots: data (value) on the number line
- histograms: ranges of data values (X-axis) and the frequency of each range (Y-axis)
One picture is clearer and more powerful than thousands of words.
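As a rough illustration of the three presentations, the Python sketch below builds a frequency table, a text dot plot, and histogram counts from one set of heights. The 25 heights are made-up whole-number values, the dot plot is printed sideways (one row per value), and the bin width of 5 cm is an arbitrary choice. Notice what each view keeps: the table and dot plot keep every exact value, while the histogram only keeps how many values fall in each range.

```python
from collections import Counter

# Hypothetical heights (cm) of 25 6th graders; made-up values for illustration only.
heights = [142, 145, 147, 147, 148, 150, 150, 150, 151, 152,
           152, 153, 154, 155, 155, 155, 155, 156, 158, 159,
           160, 161, 163, 165, 171]

# Frequency table: each distinct value and how many times it occurs.
freq = Counter(heights)
print("value  frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]}")

# Text dot plot, printed sideways: one row per value on the number line,
# one "x" per observation (values that never occur get an empty row).
for value in range(min(heights), max(heights) + 1):
    print(f"{value} | {'x' * freq[value]}")

# Histogram counts: group whole-number heights into bins of width 5 cm,
# i.e. 140-144, 145-149, 150-154, ...
bin_width = 5
bins = Counter((h // bin_width) * bin_width for h in heights)
for start in sorted(bins):
    print(f"{start}-{start + bin_width - 1}: {'#' * bins[start]}")
```

From the histogram alone you can see that 150-154 is the most common range, but you can no longer tell that 155 is the single most frequent value.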
Statistics: calculations and descriptions of a sample
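- Mean = SUM / number of observations;
- SUM = X1 + X2 + … + Xn, where X1, …, Xn are the n observations;
- SUM = m1Y1 + m2Y2 + … + mkYk, where Y1, …, Yk are the distinct values in a frequency table and mi is the frequency of Yi;
- SUM = Mean × number of observations. When observations are added to or removed from the sample, you can calculate the new mean without knowing every value in the old sample: new mean = (old mean × old number of observations + sum of added values − sum of removed values) / new number of observations.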
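- Mode, Median: can be read directly from an ordered list of the data, a frequency table, or a dot plot
- Mean Absolute Deviation (MAD): the average distance of the data values from the mean, (|X1 − Mean| + |X2 − Mean| + … + |Xn − Mean|) / n
- Spread / Variability: range = maximum − minimum; how far outliers lie from the center of the data (mean, median, mode)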
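The formulas above can be checked with a short Python sketch on a made-up set of 9 heights. It computes the mean from the raw data, the same SUM from a frequency table, and a new mean after adding an observation using only the old mean and old count. `updated_mean` is a hypothetical helper written for this example, and it assumes any "removed" values really were in the sample. MAD here is the mean absolute deviation about the mean; note that `statistics.mode` returns the first mode it meets when several values tie (Python 3.8+), and `statistics.multimode` returns all of them.

```python
import statistics
from collections import Counter

# Hypothetical heights (cm) of 9 students; made-up values for illustration only.
heights = [148, 150, 150, 152, 155, 155, 155, 160, 170]

# Mean = SUM / number of observations, where SUM = X1 + X2 + ... + Xn.
n = len(heights)
total = sum(heights)
mean = total / n
print("mean:", mean)  # mean: 155.0

# The same SUM from a frequency table: SUM = m1*Y1 + m2*Y2 + ... + mk*Yk,
# where Yi is a distinct value and mi is its frequency.
freq = Counter(heights)
total_from_table = sum(m * y for y, m in freq.items())
print("SUM from table equals SUM from raw data:", total_from_table == total)


def updated_mean(old_mean, old_n, added=(), removed=()):
    """Hypothetical helper: new mean after adding/removing observations,
    using only the old mean and the old number of observations."""
    new_sum = old_mean * old_n + sum(added) - sum(removed)
    new_n = old_n + len(added) - len(removed)
    return new_sum / new_n


print("mean after adding 165:", updated_mean(mean, n, added=[165]))  # 156.0

# Center and spread.
median = statistics.median(heights)
mode = statistics.mode(heights)                   # most frequent value
mad = sum(abs(x - mean) for x in heights) / n     # mean absolute deviation about the mean
data_range = max(heights) - min(heights)          # maximum - minimum
print("median:", median, "mode:", mode, "MAD:", round(mad, 2), "range:", data_range)
```

Removing the outlier 170 instead, `updated_mean(mean, n, removed=[170])` gives 153.125, showing how much a single extreme value pulls the mean.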
Relating jargon to everyday words
- Mean: on average, in general, overall, the center of the data (central tendency)
- Median: middle, the center of the data (central tendency)
- Mode: most frequent/often/popular value (data/observation); the center of the data (central tendency)
- Q1 = median of the lower half of the data; about 25% of the observations lie below Q1.
- Q2 = median of the whole data set; about 50% of the observations lie below Q2.
- Q3 = median of the upper half of the data; about 75% of the observations lie below Q3.
- Interquartile range (IQR) = Q3 − Q1: about 50% of the data lie between Q1 and Q3, so any two observations in the middle 50% of the data set differ by at most Q3 − Q1.
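- Spread: range = largest value − smallest value; outliers
- MAD: how far, on average, the data lie from the central tendency (the mean); a small MAD means consistent data clustered near the center, while outliers and a wider distribution make the MAD larger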
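The quartile definitions above ("median of the lower half", "median of the upper half") can be sketched in Python as follows. This assumes one common classroom convention: when the number of observations is odd, the middle value is left out of both halves. Other conventions exist, and `statistics.quantiles` uses interpolation methods that can give slightly different Q1 and Q3. The heights are made-up values, and `quartiles` is a hypothetical helper; it needs at least 2 observations.

```python
import statistics


def quartiles(data):
    """Hypothetical helper: (Q1, Q2, Q3) by the 'median of each half' method.

    Assumption: for an odd number of observations, the middle value
    (the median) is excluded from both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # values before the middle position
    upper = xs[half + (n % 2):]    # skip the middle value when n is odd
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


# Hypothetical heights (cm); made-up values for illustration only.
heights = [148, 150, 150, 152, 155, 155, 155, 160, 170]
q1, q2, q3 = quartiles(heights)
iqr = q3 - q1
print("Q1:", q1, "Q2:", q2, "Q3:", q3, "IQR:", iqr)
```

For these heights the quartiles are 150.0, 155 and 157.5, so IQR = 7.5; the outlier 170 changes the range a lot but leaves the IQR untouched.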