Glossary of Key Terms
 aggregate

a mass, assemblage, or sum of particulars; something consisting of elements but considered as a whole
 arithmetic mean

the measure of central tendency of a set of values computed by dividing the sum of the values by their number; commonly called the mean or the average
 average

any measure of central tendency, especially any mean, the median, or the mode
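
As a small illustration of the three measures named here, the sketch below uses Python's standard `statistics` module on a hypothetical sample (the values are assumed, not taken from any data set in this text).

```python
from statistics import mean, median, mode

values = [2, 4, 4, 5, 10]  # hypothetical sample, chosen only for illustration

print(mean(values))    # arithmetic mean: sum of the values divided by their number
print(median(values))  # middle value of the sorted data
print(mode(values))    # most frequently occurring value
```

The three averages need not agree: here the single large value pulls the mean above the median and the mode.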
 Bayes' factor

The ratio of the conditional probabilities of the event $B$ given that $A_1$ is the case or that $A_2$ is the case, respectively.
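
One way to make this ratio concrete is a coin example. The sketch below assumes $B$ is "8 heads in 10 flips", $A_1$ is "the coin is fair", and $A_2$ is "the coin lands heads 80% of the time"; these numbers and the helper `binomial_likelihood` are hypothetical.

```python
from math import comb

def binomial_likelihood(heads, flips, p_heads):
    """Probability of exactly `heads` heads in `flips` flips when P(heads) = p_heads."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads)**(flips - heads)

# Event B: 8 heads in 10 flips (hypothetical data).
# A1: the coin is fair; A2: the coin lands heads 80% of the time (both assumed).
p_b_given_a1 = binomial_likelihood(8, 10, 0.5)
p_b_given_a2 = binomial_likelihood(8, 10, 0.8)

bayes_factor = p_b_given_a1 / p_b_given_a2
print(round(bayes_factor, 3))
```

A ratio below 1, as in this example, means the observed event is more probable under $A_2$ than under $A_1$.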
 bell curve

In mathematics, the bell-shaped curve that is typical of the normal distribution. A symmetrical bell-shaped curve that represents the distribution of values, frequencies, or probabilities of a set of data. It slopes downward from a point in the middle corresponding to the mean value, or the maximum probability. Data that reflect the aggregate outcome of large numbers of unrelated events tend to result in bell curve distributions. (Dictionary.com, 2021)
 bellwether

anything that indicates future trends
 bias

Inclination towards something; predisposition, partiality, prejudice, preference, predilection.
 bivariate

Having or involving exactly two variables.
 box plot

A graphical summary of a numerical data sample through five statistics: median, lower quartile, upper quartile, and some indication of more extreme upper and lower values.
 box-and-whisker plot

a convenient way of graphically depicting groups of numerical data through their quartiles
 breakdown point

the number or proportion of arbitrarily large or small extreme values that must be introduced into a batch or sample to cause the estimator to yield an arbitrarily large result
 breeding

the process through which propagation, growth, or development occurs
 causality

the relationship between an event (the cause) and a second event (the effect), where the second event is understood as a consequence of the first
 census

an official count of members of a population (not necessarily human), usually residents or citizens in a particular region, often done at regular intervals
 central limit theorem

The theorem that states: If independent identically distributed random variables have a finite variance, then the sum of a large number of them will be (approximately) normally distributed.
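
A quick simulation can suggest what the theorem predicts. This is only a sketch: the choice of uniform(0, 1) variables, the number of terms per sum, the number of trials, and the seed are all assumptions made for illustration.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

TERMS = 30       # number of uniform(0, 1) draws added together (assumed)
TRIALS = 10_000  # number of sums simulated (assumed)

sums = [sum(random.random() for _ in range(TERMS)) for _ in range(TRIALS)]

mu = fmean(sums)      # theory: TERMS * 1/2 = 15
sigma = pstdev(sums)  # theory: sqrt(TERMS / 12), about 1.58
within_one_sd = sum(abs(s - mu) <= sigma for s in sums) / TRIALS

print(round(mu, 2), round(sigma, 2), round(within_one_sd, 3))
```

Although each individual draw is uniform rather than bell-shaped, roughly 68% of the sums should fall within one standard deviation of their mean, as a normal distribution would suggest.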
 central tendency

a term that relates the way in which quantitative data tend to cluster around some value
 chance variation

the presence of chance in determining the variation in experimental results
 chi-squared test

In probability theory and statistics, refers to a test in which the chi-squared distribution (also chi-square or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables.
 chromosome

A structure in the cell nucleus that contains DNA, histone protein, and other structural proteins.
 cluster

a significant subset within a population
 coefficient of variation

The ratio of the standard deviation to the mean.
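
A minimal sketch of this ratio, assuming the hypothetical values below form a whole population (so the population standard deviation `pstdev` is used):

```python
from statistics import fmean, pstdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data treated as a whole population

cv = pstdev(values) / fmean(values)  # standard deviation divided by the mean
print(cv)
```

For these values the standard deviation is 2 and the mean is 5, so the ratio is 0.4 (up to floating-point rounding).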
 combinatorics

A branch of mathematics that studies (usually finite) collections of objects that satisfy specified criteria.
 conditional probability

The probability that an event will take place given the restrictive assumption that another event has taken place, or that a combination of other events has taken place
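
For equally likely outcomes, a conditional probability can be found by counting only the outcomes in which the conditioning event holds. The sketch below assumes two fair six-sided dice and asks for the probability that the sum is 8 given that the first die is even; the events are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 equally likely outcomes

first_even = [r for r in rolls if r[0] % 2 == 0]  # outcomes where the condition holds
both = [r for r in first_even if sum(r) == 8]     # of those, outcomes with sum 8

p_cond = Fraction(len(both), len(first_even))  # P(sum is 8 | first die is even)
print(p_cond)  # 1/6
```

The unconditional probability of a sum of 8 is 5/36, so knowing that the first die is even changes the answer slightly.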
 confidence interval

A type of interval estimate of a population parameter used to indicate the reliability of an estimate.
 confounding variable

an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable
 contingency table

a table presenting the joint distribution of two categorical variables
 continuous random variable

obtained from data that can take infinitely many values
 continuous variable

a variable that has a continuous distribution function, such as temperature
 control

a separate group or subject in an experiment against which the results are compared, where the primary variable is low or nonexistent
 control group

the group of test subjects left untreated or unexposed to some procedure and then compared with treated subjects in order to validate the results of the test
 correlation

One of the several measures of the linear statistical relationship between two random variables, indicating both the strength and direction of the relationship.
 critical thinking

the application of logical principles, rigorous standards of evidence, and careful reasoning to the analysis and discussion of claims, beliefs, and issues
 cross tabulation

a presentation of data in a tabular form to aid in identifying a relationship between variables
 cumulative relative frequency

the accumulation of the previous relative frequencies
 data mining

a technique for searching large-scale databases for patterns; used mainly to find previously unknown correlations between variables that may be commercially useful
 density

the probability that an event will occur, as a function of some observed variable
 dependent variable

in an equation, the variable whose value depends on one or more variables in the equation
 descriptive statistics

A branch of mathematics dealing with summarization and description of collections of data sets, including the concepts of arithmetic mean, median, and mode.
 deviation

For interval variables and ratio variables, a measure of difference between the observed value and the mean.
 dichotomous

dividing or branching into two pieces
 discrete random variable

obtained by counting values for which there are no in-between values, such as the integers 0, 1, 2, …
 discrete variable

a variable that takes values from a finite or countable set, such as the number of legs of an animal
 disjoint

Having no members in common; having an intersection equal to the empty set.
 disparity

the state of being unequal; difference
 dispersion

the degree of scatter of data
 distribution

the set of relative likelihoods that a variable will have a value in a given interval
 ellipsis

a mark consisting of three periods, historically with spaces in between, before, and after them (“ . . . ”), nowadays a single character (“…”); used in printing to indicate an omission
 empirical

verifiable by means of scientific experimentation
 empirical rule

That a normal distribution has 68% of its observations within one standard deviation of the mean, 95% within two, and 99.7% within three.
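
These percentages can be checked against the standard normal cumulative distribution function. The sketch below uses `NormalDist` from Python's standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The rule's 68%, 95%, and 99.7% are rounded versions of the values this prints.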
 equiprobable

having an equal chance of occurring mathematically
 event

A subset of the sample space.
 evolution

a gradual directional change, especially one leading to a more advanced or complex form; growth; development
 exhaustive

including every possible element
 expected value

of a discrete random variable, the sum of the probability of each possible outcome of the experiment multiplied by the value itself
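
As a worked instance of this sum, the sketch below assumes a fair six-sided die, so each face value has probability 1/6. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Fair six-sided die (assumed): each face value has probability 1/6.
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Multiply each value by its probability, then add the products.
expected_value = sum(value * prob for value, prob in distribution.items())
print(expected_value)  # 7/2
```

The result, 3.5, is not itself a possible roll; the expected value describes a long-run average rather than a typical outcome.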
 experiment

A test under controlled conditions made to either demonstrate a known truth, examine the validity of a hypothesis, or determine the efficacy of something previously untried.
 exploratory data analysis

an approach to analyzing data sets that is concerned with uncovering underlying structure, extracting important variables, detecting outliers and anomalies, testing underlying assumptions, and developing models
 finite

limited, constrained by bounds, having an end
 frequency

number of times an event occurred in an experiment (absolute frequency)
 frequency distribution

a representation, either in a graphical or tabular format, which displays the number of observations within a given interval
 gene

a unit of heredity; a segment of DNA or RNA that is transmitted from one generation to the next, and that carries genetic information such as the sequence of amino acids for a protein
 gradient

of a function y = f(x) or the graph of such a function, the rate of change of y with respect to x, that is, the amount by which y changes for a certain (often unit) change in x
 graph

A diagram displaying data; in particular one showing the relationship between two or more quantities, measurements or indicative numbers that may or may not have a specific mathematical formula relating them to each other.
 heterogeneous

diverse in kind or nature; composed of diverse parts
 histogram

a representation of tabulated frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval
 independence

The occurrence of one event does not affect the probability of the occurrence of another.
 independent

Not dependent; not contingent or depending on something else; free.
 independent event

the fact that $A$ occurs does not affect the probability that $B$ occurs
 independent variable

in an equation, any variable whose value is not dependent on any other in the equation
 inferential statistics

A branch of mathematics that involves drawing conclusions about a population based on sample data drawn from it.
 integral

the limit of the sums computed in a process in which the domain of a function is divided into small subsets and a possibly nominal value of the function on each subset is multiplied by the measure of that subset, all these products then being summed
 intercept

the coordinate of the point at which a curve intersects an axis
 interquartile range

The difference between the first and third quartiles; a robust measure of sample dispersion.
 labor force

The collective group of people who are available for employment, whether currently employed or unemployed (though sometimes only those unemployed people who are seeking work are included).
 line

a path through two or more points (compare ‘segment’); a continuous mark, including as made by a pen; any path, curved or straight
 linear regression

an approach to modeling the relationship between a scalar dependent variable $y$ and one or more explanatory variables denoted $x$.
 logarithm

for a number $x$, the power to which a given base number must be raised in order to obtain $x$
 margin of error

An expression of the lack of precision in the results obtained from a sample.
 mean squared error

A measure of the average of the squares of the “errors”; the amount by which the value implied by the estimator differs from the quantity to be estimated.
 median

the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half
 mode

the most frequently occurring value in a distribution
 Monte Carlo simulation

a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results, i.e., by running simulations many times over in order to estimate quantities such as probabilities
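
A minimal sketch of the idea, assuming we want the probability that two fair dice sum to 7 (exactly 1/6). The seed and the number of trials are arbitrary choices.

```python
import random

random.seed(1)    # fixed seed so the run is repeatable
TRIALS = 100_000  # number of simulated rolls of two dice (assumed)

# Count how often the simulated pair of dice sums to 7.
hits = sum(random.randint(1, 6) + random.randint(1, 6) == 7 for _ in range(TRIALS))
estimate = hits / TRIALS

print(estimate, 1 / 6)  # simulated estimate next to the exact value
```

The estimate varies from run to run when the seed changes, and it tends to move closer to the exact value as the number of trials grows.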
 multiplication rule

The probability that A and B occur is equal to the probability that A occurs times the probability that B occurs, given that we know A has already occurred.
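
For example, assuming two cards are drawn without replacement from a standard 52-card deck, let A be "the first card is an ace" and B be "the second card is an ace". The rule can then be sketched as:

```python
from fractions import Fraction

p_a = Fraction(4, 52)          # P(A): first card is an ace
p_b_given_a = Fraction(3, 51)  # P(B | A): second card is an ace, given the first was

p_a_and_b = p_a * p_b_given_a  # multiplication rule
print(p_a_and_b)  # 1/221
```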
 mutually exclusive

describing multiple events or states of being such that the occurrence of any one implies the nonoccurrence of all the others
 nominal

Having values whose order is insignificant.
 nonresponse

the absence of a response
 nonresponse bias

Occurs when the sample becomes biased because some of those initially selected refuse to respond.
 normal distribution

A family of continuous probability distributions such that the probability density function is the normal (or Gaussian) function.
 nuisance parameters

any parameter that is not of immediate interest but which must be accounted for in the analysis of those parameters which are of interest; the classic example of a nuisance parameter is the variance, $\sigma^2$, of a normal distribution, when the mean, $\mu$, is of primary interest
 null hypothesis

A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.
 objective

not influenced by the emotions or prejudices
 observational study

a study drawing inferences about the possible effect of a treatment on subjects, where the assignment of subjects into a treated group versus a control group is outside the control of the investigator
 odds

the ratio of the probability of an event happening to the probability of it not happening
 ordinal

Of a number, indicating position in a sequence.
 outcome

One of the individual results that can occur in an experiment.
 outlier

a value in a statistical sample which does not fit a pattern that describes most other data points; specifically, a value that lies more than 1.5 IQR beyond the upper or lower quartile
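
The 1.5 IQR criterion can be sketched as follows. The data are hypothetical, and the quartiles come from `statistics.quantiles` with its default method; other quartile conventions exist and can give slightly different fences.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # hypothetical sample

q1, _, q3 = quantiles(data, n=4)  # lower quartile, median (unused), upper quartile
iqr = q3 - q1                     # interquartile range

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [50]
```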
 Pareto chart

a type of bar graph where the bars are drawn in decreasing order of frequency or relative frequency
 Pareto distribution

The Pareto distribution, named after the Italian economist Vilfredo Pareto, is a power law probability distribution that is used in description of social, scientific, geophysical, actuarial, and many other types of observable phenomena.
 partition

one of the parts into which something has been divided
 peer review

the scholarly process whereby manuscripts intended to be published in an academic journal are reviewed by independent researchers (referees) to evaluate the contribution, i.e. the importance, novelty and accuracy of the manuscript’s contents
 percentile

any of the ninety-nine points that divide an ordered distribution into one hundred parts, each containing one per cent of the population
 pictogram

a picture that represents a word or an idea by illustration; used often in graphs
 pip

one of the spots or symbols on a playing card, domino, die, etc.
 placebo

an inactive substance or preparation used as a control in an experiment or test to determine the effectiveness of a medicinal drug
 placebo effect

the tendency of any medication or treatment, even an inert or ineffective one, to exhibit results simply because the recipient believes that it will work
 Platonic solid

any one of the following five polyhedra: the regular tetrahedron, the cube, the regular octahedron, the regular dodecahedron and the regular icosahedron
 plot

a graph or diagram drawn by hand or produced by a mechanical or electronic device
 polynomial

An expression consisting of a sum of a finite number of terms: each term being the product of a constant coefficient and one or more variables raised to a nonnegative integer power.
 population

a group of units (persons, objects, or other items) enumerated in a census or from which a sample is drawn
 probability

The relative likelihood of an event happening.
 probability density function

any function whose integral over a set gives the probability that a random variable has a value in that set
 probability distribution

A function of a discrete random variable yielding the probability that the variable will have a given value.
 probability sample

a sample in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined
 probability theory

The mathematical study of probability (the likelihood of occurrence of random events in order to predict the behavior of defined systems).
 prognostic

a sign by which a future event may be known or foretold
 prosecutor's fallacy

A fallacy of statistical reasoning when used as an argument in legal proceedings.
 public opinion polls

surveys designed to represent the beliefs of a population by conducting a series of questions and then extrapolating generalities in ratio or within confidence intervals
 purposive sampling

occurs when the researchers choose the sample based on who they think would be appropriate for the study; used primarily when there is a limited number of people that have expertise in the area being researched
 quadrennial

happening every four years
 qualitative

of descriptions or distinctions based on some quality rather than on some quantity
 qualitative analysis

The non-numerical examination and interpretation of observations for the purpose of discovering underlying meanings and patterns of relationships.
 qualitative data

data centered around descriptions or distinctions based on some quality or characteristic rather than on some quantity or measured value
 quantitative

of a measurement based on some quantity or number rather than on some quality
 quantity

an amount or number of something, especially one that can be measured or counted
 quartile

any of the three points that divide an ordered distribution into four parts, each containing a quarter of the population
 quota sampling

a sampling method that chooses a representative cross-section of the population by taking into consideration each important characteristic of the population proportionally, such as income, sex, race, age, etc.
 R

A free software programming language and a software environment for statistical computing and graphics.
 random assignment

an experimental technique for assigning subjects to different treatments (or no treatment)
 random number

a number allotted randomly using a suitable generator (an electronic machine, or a “generator” as simple as a die)
 random sample

a sample randomly taken from an investigated population
 random variable

a quantity whose value is random and to which a probability distribution is assigned, such as the possible outcome of a roll of a die
 random walk

a stochastic path consisting of a series of sequential movements, the direction (and sometimes length) of which is chosen at random
 range

the length of the smallest interval which contains all the data in a sample; the difference between the largest and smallest observations in the sample
 raw score

an original observation that has not been transformed to a $z$-score
 regression

An analytic method to measure the association of one or more independent variables with a dependent variable.
 regression to the mean

the phenomenon by which extreme examples from any set of data are likely to be followed by examples which are less extreme; a tendency towards the average of any sample
 relative frequency

the fraction or proportion of times a value occurs
 relative frequency distribution

a representation, either in graphical or tabular format, which displays the fraction of observations in a certain category
 residual

The difference between the observed value and the estimated function value.
 response bias

Occurs when the answers given by respondents do not reflect their true beliefs.
 root mean square

the square root of the arithmetic mean of the squares
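
Read from the inside out, the definition says: square each value, take the arithmetic mean of the squares, then take the square root. A short sketch with hypothetical values:

```python
from math import sqrt

values = [1, 5, 7]  # hypothetical values

mean_of_squares = sum(v * v for v in values) / len(values)  # (1 + 25 + 49) / 3
rms = sqrt(mean_of_squares)
print(rms)  # 5.0
```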
 sample

a subset of a population selected for measurement, observation, or questioning to provide statistical information about the population
 sample mean

the mean of a sample of random variables taken from the entire population of those variables
 sample space

The set of all outcomes of an experiment.
 sampling

the process or technique of obtaining a representative sample
 sampling distribution

The probability distribution of a given statistic based on a random sample.
 scatter plot

A type of display using Cartesian coordinates to display values for two variables for a set of data.
 scientific control

an experiment or observation designed to minimize the effects of variables other than the single independent variable
 shunt

a passage between body channels constructed surgically as a bypass
 Simpson's paradox

a paradox in which a trend that appears in different groups of data disappears when these groups are combined, and the reverse trend appears for the aggregate data
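
The reversal is easiest to see with numbers. The sketch below uses illustrative success counts for two treatments, split into two groups; the counts and the group labels are assumptions chosen to exhibit the paradox.

```python
# (successes, trials) per group; illustrative counts chosen to show the reversal.
treatment_a = {"small": (81, 87), "large": (192, 263)}
treatment_b = {"small": (234, 270), "large": (55, 80)}

def rate(successes, trials):
    return successes / trials

def overall(groups):
    """Success rate after combining all groups of one treatment."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return rate(total_successes, total_trials)

for group in ("small", "large"):
    # Treatment A has the higher success rate within each group.
    print(group, rate(*treatment_a[group]) > rate(*treatment_b[group]))

# Once the groups are combined, treatment B has the higher rate.
print(overall(treatment_a) > overall(treatment_b))
```

The reversal arises because the two treatments were applied to the groups in very different proportions, so the combined rates weight the groups differently.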
 skewed

Biased or distorted (pertaining to statistics or information).
 skewness

A measure of the asymmetry of the probability distribution of a real-valued random variable; it is the third standardized moment, defined as $\gamma_1 = \mu_3 / \sigma^3$, where $\mu_3$ is the third moment about the mean and $\sigma$ is the standard deviation.
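
For a finite data set, the third standardized moment can be sketched as the third moment about the mean divided by the cube of the standard deviation. The function below uses population (divide-by-n) moments, which is one convention among several; the data are hypothetical.

```python
from statistics import fmean

def skewness(data):
    """Third standardized moment, using population (divide-by-n) moments."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance (second moment about the mean)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third moment about the mean
    return m3 / m2 ** 1.5                        # m2 ** 1.5 is the standard deviation cubed

print(skewness([1, 2, 3, 4, 5]))   # symmetric data: 0.0
print(skewness([1, 2, 3, 4, 10]))  # long right tail: positive
```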
 slope

the ratio of the vertical and horizontal distances between two points on a line; zero if the line is horizontal, undefined if it is vertical.
 spread

A numerical difference.
 standard deviation

a measure of how spread out data values are around the mean, defined as the square root of the variance
 standard error

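A measure of how much a sample statistic varies from sample to sample, defined as the standard deviation of the statistic's sampling distribution; for the sample mean, it is estimated by dividing the sample standard deviation by the square root of the sample size.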
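
For the sample mean, the standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. A minimal sketch with a hypothetical sample:

```python
from math import sqrt
from statistics import stdev

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Sample standard deviation (n - 1 denominator) divided by sqrt(n).
standard_error = stdev(sample) / sqrt(len(sample))
print(round(standard_error, 3))
```

Because of the division by the square root of the sample size, a sample four times as large would roughly halve the standard error of the mean, other things being equal.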
 statistical literacy

the ability to understand statistics, necessary for citizens to understand material presented in publications such as newspapers, television, and the Internet
 statistics

a mathematical science concerned with data collection, presentation, analysis, and interpretation
 stem-and-leaf display

a means of displaying data used especially in exploratory data analysis; another name for stemplot
 stemplot

a means of displaying data used especially in exploratory data analysis; another name for stem-and-leaf display
 stochastic

random; randomly determined
 stratum

a category composed of people with certain similarities, such as gender, race, religion, or even grade level
 straw poll

a survey of opinion which is unofficial, casual, or ad hoc
 Student's t-distribution

A distribution that arises when the population standard deviation is unknown and has to be estimated from the data; originally derived by William Sealy Gosset (who wrote under the pseudonym “Student”).
 Student's t-statistic

a ratio of the departure of an estimated parameter from its notional value and its standard error
 summation notation

a notation, given by the Greek letter sigma, that denotes the operation of adding a sequence of numbers
 TI-83

A calculator manufactured by Texas Instruments that is one of the most popular graphing calculators for statistical purposes.
 truncate

To shorten something as if by cutting off part of it.
 unbiased

impartial or without prejudice
 undercoverage

Occurs when a survey fails to reach a certain portion of the population.
 unemployment

The level of joblessness in an economy, often measured as a percentage of the workforce.
 variable

a quantity that may assume any one of a set of values
 variation ratio

the proportion of cases not in the mode
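
Equivalently, the variation ratio is one minus the proportion of cases that fall in the mode. A short sketch with hypothetical categorical data:

```python
from collections import Counter

responses = ["a", "a", "b", "c"]  # hypothetical categorical data

mode_count = Counter(responses).most_common(1)[0][1]  # number of cases in the modal category
variation_ratio = 1 - mode_count / len(responses)
print(variation_ratio)  # 0.5
```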
 vector

in statistics, a set of real-valued random variables that may be correlated
 veridical paradox

a situation in which a result appears absurd but is demonstrated to be true nevertheless
 volatility

the state of sharp and regular fluctuation
 weighted average

an arithmetic mean of values biased according to agreed weightings
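
In practice, each value is multiplied by its weight, the products are added, and the total is divided by the sum of the weights. The scores and weights below are hypothetical.

```python
scores = [80, 90]  # hypothetical values
weights = [1, 3]   # hypothetical weightings: the second score counts three times as much

weighted_avg = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
print(weighted_avg)  # 87.5
```

The unweighted mean of the same two scores would be 85; the weighting pulls the result toward the more heavily weighted value.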
 z-score

The standardized value of observation $x$ from a distribution that has mean $\mu$ and standard deviation $\sigma$.
 z-value

the standardized value of an observation found by subtracting the mean from the observed value, and then dividing that value by the standard deviation; also called $z$-score
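
The calculation described here can be sketched in a few lines. The function name `z_score` and the example numbers (an observation of 85 from a distribution with mean 70 and standard deviation 10) are hypothetical.

```python
def z_score(x, mu, sigma):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma

print(z_score(85, 70, 10))  # 1.5
```

A value of 1.5 means the observation lies one and a half standard deviations above the mean.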