# Glossary of Key Terms

aggregate

a mass, assemblage, or sum of particulars; something consisting of elements but considered as a whole

arithmetic mean

the measure of central tendency of a set of values computed by dividing the sum of the values by their number; commonly called the mean or the average

average

any measure of central tendency, especially any mean, the median, or the mode

Bayes' factor

The ratio of the conditional probabilities of the event \$B\$ given that \$A_1\$ is the case or that \$A_2\$ is the case, respectively.
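
As a rough illustration of this ratio, the sketch below compares two hypotheses about a coin. The setup is hypothetical: \$B\$ is "8 heads in 10 tosses", \$A_1\$ is "the coin is fair", and \$A_2\$ is "the coin lands heads 80% of the time". The helper name `binomial_likelihood` is an assumption made for this example.

```python
from math import comb


def binomial_likelihood(heads: int, tosses: int, p_heads: float) -> float:
    """Probability of exactly `heads` heads in `tosses` tosses when P(heads) = p_heads."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads) ** (tosses - heads)


# Hypothetical event B and hypotheses A_1, A_2 (chosen only for illustration).
p_b_given_a1 = binomial_likelihood(8, 10, 0.5)  # P(B | A_1): fair coin
p_b_given_a2 = binomial_likelihood(8, 10, 0.8)  # P(B | A_2): biased coin

# The Bayes factor is the ratio of the two conditional probabilities.
bayes_factor = p_b_given_a1 / p_b_given_a2
print(f"Bayes factor (A_1 versus A_2): {bayes_factor:.3f}")
```

A value below 1 indicates that the observed event is more probable under \$A_2\$ than under \$A_1\$.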

bell curve

In mathematics, the bell-shaped curve that is typical of the normal distribution. A symmetrical bell-shaped curve that represents the distribution of values, frequencies, or probabilities of a set of data. It slopes downward from a point in the middle corresponding to the mean value, or the maximum probability. Data that reflect the aggregate outcome of large numbers of unrelated events tend to result in bell curve distributions. (Dictionary.com, 2021)

bellwether

anything that indicates future trends

bias

Inclination towards something; predisposition, partiality, prejudice, preference, predilection.

bivariate

Having or involving exactly two variables.

box plot

A graphical summary of a numerical data sample through five statistics: median, lower quartile, upper quartile, and some indication of more extreme upper and lower values.

box-and-whisker plot

a convenient way of graphically depicting groups of numerical data through their quartiles

breakdown point

the number or proportion of arbitrarily large or small extreme values that must be introduced into a batch or sample to cause the estimator to yield an arbitrarily large result

breeding

the process through which propagation, growth, or development occurs

causality

the relationship between an event (the cause) and a second event (the effect), where the second event is understood as a consequence of the first

census

an official count of members of a population (not necessarily human), usually residents or citizens in a particular region, often done at regular intervals

central limit theorem

The theorem that states: the sum (or mean) of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed.
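
A small simulation can make the theorem more concrete. The sketch below averages draws from a uniform distribution on [0, 1), which is not bell-shaped, and checks that the sample means cluster around 0.5 with roughly the spread the theory predicts. The sample size, repetition count, and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30    # draws averaged in each repetition (arbitrary choice)
REPETITIONS = 2000  # number of sample means collected (arbitrary choice)

# Each entry is the mean of SAMPLE_SIZE independent uniform(0, 1) draws.
sample_means = [
    statistics.mean([random.random() for _ in range(SAMPLE_SIZE)])
    for _ in range(REPETITIONS)
]

center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)

# A uniform(0, 1) variable has variance 1/12, so the mean of n draws
# has standard deviation sqrt(1 / (12 * n)).
theoretical_spread = (1 / (12 * SAMPLE_SIZE)) ** 0.5

print(f"mean of sample means: {center:.3f}")
print(f"observed spread: {spread:.4f}, theoretical spread: {theoretical_spread:.4f}")
```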

central tendency

a term that relates the way in which quantitative data tend to cluster around some value

chance variation

the presence of chance in determining the variation in experimental results

chi-squared test

In probability theory and statistics, a hypothesis test whose test statistic follows a chi-squared distribution under the null hypothesis. The chi-squared distribution (also chi-square or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables.

chromosome

A structure in the cell nucleus that contains DNA, histone protein, and other structural proteins.

cluster

a significant subset within a population

coefficient of variation

The ratio of the standard deviation to the mean.
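
A minimal sketch of the calculation, using a made-up sample. It assumes the sample standard deviation (`statistics.stdev`); the population version would use `statistics.pstdev` instead.

```python
import statistics

# Hypothetical measurements, chosen only for illustration.
values = [10, 12, 8, 11, 9]

mean = statistics.mean(values)
std_dev = statistics.stdev(values)  # sample standard deviation (assumption)

coefficient_of_variation = std_dev / mean
print(f"coefficient of variation: {coefficient_of_variation:.3f}")
```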

combinatorics

A branch of mathematics that studies (usually finite) collections of objects that satisfy specified criteria.

conditional probability

The probability that an event will take place given the restrictive assumption that another event has taken place, or that a combination of other events has taken place
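
One way to see this definition at work is to enumerate the outcomes of a fair six-sided die. The sketch below computes the probability that a roll is even given that it is greater than three, using the standard relation P(A given B) = P(A and B) / P(B). The helper names are assumptions made for this example.

```python
from fractions import Fraction

outcomes = range(1, 7)  # faces of a fair six-sided die


def probability(event) -> Fraction:
    """Share of equally likely outcomes for which `event` is true."""
    favorable = sum(1 for face in outcomes if event(face))
    return Fraction(favorable, len(outcomes))


def is_even(face: int) -> bool:
    return face % 2 == 0


def greater_than_three(face: int) -> bool:
    return face > 3


p_b = probability(greater_than_three)
p_a_and_b = probability(lambda face: is_even(face) and greater_than_three(face))

# Conditional probability of "even" given "greater than three".
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)
```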

confidence interval

A type of interval estimate of a population parameter used to indicate the reliability of an estimate.

confounding variable

an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable

contingency table

a table presenting the joint distribution of two categorical variables

continuous random variable

a random variable obtained from data that can take any value in an interval, so that infinitely many values are possible

continuous variable

a variable that has a continuous distribution function, such as temperature

control

a separate group or subject in an experiment against which the results are compared, where the primary variable is low or nonexistent

control group

the group of test subjects left untreated or unexposed to some procedure and then compared with treated subjects in order to validate the results of the test

correlation

One of the several measures of the linear statistical relationship between two random variables, indicating both the strength and direction of the relationship.

critical thinking

the application of logical principles, rigorous standards of evidence, and careful reasoning to the analysis and discussion of claims, beliefs, and issues

cross tabulation

a presentation of data in a tabular form to aid in identifying a relationship between variables

cumulative relative frequency

the running total of relative frequencies: the relative frequency of a value added to the relative frequencies of all values that precede it
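
A short sketch of the accumulation, starting from a hypothetical frequency table. Exact fractions are used so the final cumulative value is exactly 1.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical frequency table: value -> number of observations.
counts = {0: 2, 1: 3, 2: 5}

total = sum(counts.values())
relative = [Fraction(count, total) for count in counts.values()]

# Running total of the relative frequencies.
cumulative = list(accumulate(relative))

for value, rel, cum in zip(counts, relative, cumulative):
    print(value, rel, cum)
```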

data mining

a technique for searching large-scale databases for patterns; used mainly to find previously unknown correlations between variables that may be commercially useful

density

the probability that an event will occur, as a function of some observed variable

dependent variable

in an equation, the variable whose value depends on one or more variables in the equation

descriptive statistics

A branch of mathematics dealing with summarization and description of collections of data sets, including the concepts of arithmetic mean, median, and mode.

deviation

For interval variables and ratio variables, a measure of difference between the observed value and the mean.

dichotomous

dividing or branching into two pieces

discrete random variable

obtained by counting values for which there are no in-between values, such as the integers 0, 1, 2, ….

discrete variable

a variable that takes values from a finite or countable set, such as the number of legs of an animal

disjoint

Having no members in common; having an intersection equal to the empty set.

disparity

the state of being unequal; difference

dispersion

the degree of scatter of data

distribution

the set of relative likelihoods that a variable will have a value in a given interval

ellipsis

a mark consisting of three periods, historically with spaces in between, before, and after them (" . . . "), nowadays a single character ("…"); used in printing to indicate an omission

empirical

verifiable by means of scientific experimentation

empirical rule

That a normal distribution has 68% of its observations within one standard deviation of the mean, 95% within two, and 99.7% within three.
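
The three percentages can be checked against the standard normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Probability that a normal value falls within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The computed shares are close to, but not exactly, the rounded figures 68%, 95%, and 99.7%.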

equiprobable

having an equal chance of occurring mathematically

event

A subset of the sample space.

evolution

exhaustive

including every possible element

expected value

of a discrete random variable, the sum of the probability of each possible outcome of the experiment multiplied by the value itself
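
For example, the expected value of one roll of a fair six-sided die can be computed by weighting each face by its probability. The sketch assumes all six faces are equally likely.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
probability_of_each_face = Fraction(1, len(faces))  # fair die (assumption)

# Sum of (probability of outcome) * (value of outcome).
expected_value = sum(probability_of_each_face * face for face in faces)
print(expected_value)
```

The result, 7/2, is not itself a possible outcome of a single roll; it is the long-run average over many rolls.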

experiment

A test under controlled conditions made to either demonstrate a known truth, examine the validity of a hypothesis, or determine the efficacy of something previously untried.

exploratory data analysis

an approach to analyzing data sets that is concerned with uncovering underlying structure, extracting important variables, detecting outliers and anomalies, testing underlying assumptions, and developing models

finite

limited, constrained by bounds, having an end

frequency

number of times an event occurred in an experiment (absolute frequency)

frequency distribution

a representation, either in a graphical or tabular format, which displays the number of observations within a given interval

gene

a unit of heredity; a segment of DNA or RNA that is transmitted from one generation to the next, and that carries genetic information such as the sequence of amino acids for a protein

gradient

of a function y = f(x) or the graph of such a function, the rate of change of y with respect to x, that is, the amount by which y changes for a certain (often unit) change in x

graph

A diagram displaying data; in particular one showing the relationship between two or more quantities, measurements or indicative numbers that may or may not have a specific mathematical formula relating them to each other.

heterogeneous

diverse in kind or nature; composed of diverse parts

histogram

a representation of tabulated frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval

independence

The occurrence of one event does not affect the probability of the occurrence of another.

independent

Not dependent; not contingent or depending on something else; free.

independent event

the fact that \$A\$ occurs does not affect the probability that \$B\$ occurs

independent variable

in an equation, any variable whose value is not dependent on any other in the equation

inferential statistics

A branch of mathematics that involves drawing conclusions about a population based on sample data drawn from it.

integral

the limit of the sums computed in a process in which the domain of a function is divided into small subsets and a possibly nominal value of the function on each subset is multiplied by the measure of that subset, all these products then being summed

intercept

the coordinate of the point at which a curve intersects an axis

interquartile range

The difference between the first and third quartiles; a robust measure of sample dispersion.

labor force

The collective group of people who are available for employment, whether currently employed or unemployed (though sometimes only those unemployed people who are seeking work are included).

line

a path through two or more points (compare ‘segment’); a continuous mark, including as made by a pen; any path, curved or straight

linear regression

an approach to modeling the relationship between a scalar dependent variable \$y\$ and one or more explanatory variables denoted \$x\$.
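
With a single explanatory variable, one common way to fit the line is ordinary least squares. The sketch below is a minimal version of that technique on made-up data; the function name `fit_line` is an assumption made for this example, and other fitting methods exist.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Residuals: observed value minus the value predicted by the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(slope, intercept, residuals)
```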

logarithm

for a number \$x\$, the power to which a given base number must be raised in order to obtain \$x\$

margin of error

An expression of the lack of precision in the results obtained from a sample.

mean squared error

A measure of the average of the squares of the “errors”; the amount by which the value implied by the estimator differs from the quantity to be estimated.

median

the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half

mode

the most frequently occurring value in a distribution
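
The arithmetic mean, median, and mode can give different answers for the same data. A minimal sketch with a made-up sample, using the Python standard library:

```python
import statistics

# Hypothetical sample, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of the values divided by their number
median = statistics.median(values)  # midpoint of the sorted values
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)
```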

Monte Carlo simulation

a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results; for example, running a simulation many times over in order to estimate a probability
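
As a small illustration, the sketch below estimates the probability that two fair dice sum to seven by simulating many rolls, then compares the estimate with the exact value of 1/6. The trial count and seed are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

TRIALS = 100_000  # arbitrary number of simulated rolls

hits = 0
for _ in range(TRIALS):
    roll = random.randint(1, 6) + random.randint(1, 6)
    if roll == 7:
        hits += 1

estimate = hits / TRIALS
exact = 1 / 6  # 6 of the 36 equally likely pairs sum to seven

print(f"estimate: {estimate:.4f}, exact: {exact:.4f}")
```

The estimate generally moves closer to the exact value as the number of trials grows.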

multiplication rule

The probability that A and B occur is equal to the probability that A occurs times the probability that B occurs, given that we know A has already occurred.
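
For instance, the rule gives the probability of drawing two aces in a row from a standard 52-card deck without replacement. The events are hypothetical labels for this example: A is "the first card is an ace" and B is "the second card is an ace".

```python
from fractions import Fraction

p_a = Fraction(4, 52)          # 4 aces among 52 cards
p_b_given_a = Fraction(3, 51)  # 3 aces left among the remaining 51 cards

# Multiplication rule: P(A and B) = P(A) * P(B given A).
p_a_and_b = p_a * p_b_given_a
print(p_a_and_b)
```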

mutually exclusive

describing multiple events or states of being such that the occurrence of any one implies the non-occurrence of all the others

nominal

Having values whose order is insignificant.

non-response

the absence of a response

non-response bias

Occurs when the sample becomes biased because some of those initially selected refuse to respond.

normal distribution

A family of continuous probability distributions such that the probability density function is the normal (or Gaussian) function.

nuisance parameters

any parameter that is not of immediate interest but which must be accounted for in the analysis of those parameters which are of interest; the classic example of a nuisance parameter is the variance, \$\sigma^2\$, of a normal distribution, when the mean, \$\mu\$, is of primary interest

null hypothesis

A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.

objective

not influenced by the emotions or prejudices

observational study

a study drawing inferences about the possible effect of a treatment on subjects, where the assignment of subjects into a treated group versus a control group is outside the control of the investigator

odds

the ratio of the probability of an event happening to the probability of it not happening
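
A minimal sketch of the conversion from probability to odds. The function name `odds` is an assumption made for this example.

```python
from fractions import Fraction


def odds(probability: Fraction) -> Fraction:
    """Odds in favor of an event with the given probability (must be below 1)."""
    return probability / (1 - probability)


# An event with probability 1/5 has odds of 1/4, often read as "1 to 4".
one_in_five = odds(Fraction(1, 5))
even_chance = odds(Fraction(1, 2))
print(one_in_five, even_chance)
```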

ordinal

Of a number, indicating position in a sequence.

outcome

One of the individual results that can occur in an experiment.

outlier

a value in a statistical sample which does not fit a pattern that describes most other data points; specifically, a value that lies more than 1.5 IQR beyond the upper or lower quartile
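
The 1.5 IQR criterion can be sketched as follows on a made-up sample. Quartiles are computed with `statistics.quantiles` and its default method; other quartile conventions exist and can shift the fences slightly.

```python
import statistics

# Hypothetical sample with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

q1, _, q3 = statistics.quantiles(data, n=4)  # lower quartile, median, upper quartile
iqr = q3 - q1                                # interquartile range

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)
```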

Pareto chart

a type of bar graph where the bars are drawn in decreasing order of frequency or relative frequency

Pareto distribution

The Pareto distribution, named after the Italian economist Vilfredo Pareto, is a power law probability distribution that is used in description of social, scientific, geophysical, actuarial, and many other types of observable phenomena.

partition

a part of something that had been divided, each of its results

peer review

the scholarly process whereby manuscripts intended to be published in an academic journal are reviewed by independent researchers (referees) to evaluate the contribution, i.e. the importance, novelty and accuracy of the manuscript’s contents

percentile

any of the ninety-nine points that divide an ordered distribution into one hundred parts, each containing one per cent of the population

pictogram

a picture that represents a word or an idea by illustration; used often in graphs

pip

one of the spots or symbols on a playing card, domino, die, etc.

placebo

an inactive substance or preparation used as a control in an experiment or test to determine the effectiveness of a medicinal drug

placebo effect

the tendency of any medication or treatment, even an inert or ineffective one, to exhibit results simply because the recipient believes that it will work

Platonic solid

any one of the following five polyhedra: the regular tetrahedron, the cube, the regular octahedron, the regular dodecahedron and the regular icosahedron

plot

a graph or diagram drawn by hand or produced by a mechanical or electronic device

polynomial

An expression consisting of a sum of a finite number of terms: each term being the product of a constant coefficient and one or more variables raised to a non-negative integer power.

population

a group of units (persons, objects, or other items) enumerated in a census or from which a sample is drawn

probability

The relative likelihood of an event happening.

probability density function

any function whose integral over a set gives the probability that a random variable has a value in that set

probability distribution

A function of a discrete random variable yielding the probability that the variable will have a given value.

probability sample

a sample in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined

probability theory

The mathematical study of probability (the likelihood of occurrence of random events in order to predict the behavior of defined systems).

prognostic

a sign by which a future event may be known or foretold

prosecutor's fallacy

A fallacy of statistical reasoning when used as an argument in legal proceedings.

public opinion polls

surveys designed to represent the beliefs of a population by conducting a series of questions and then extrapolating generalities in ratio or within confidence intervals

purposive sampling

occurs when the researchers choose the sample based on who they think would be appropriate for the study; used primarily when there is a limited number of people that have expertise in the area being researched

quadrennial

happening every four years

qualitative

of descriptions or distinctions based on some quality rather than on some quantity

qualitative analysis

The non-numerical examination and interpretation of observations for the purpose of discovering underlying meanings and patterns of relationships.

qualitative data

data centered around descriptions or distinctions based on some quality or characteristic rather than on some quantity or measured value

quantitative

of a measurement based on some quantity or number rather than on some quality

quantity

an amount or number of something that can be counted or measured

quartile

any of the three points that divide an ordered distribution into four parts, each containing a quarter of the population

quota sampling

a sampling method that chooses a representative cross-section of the population by taking into consideration each important characteristic of the population proportionally, such as income, sex, race, age, etc.

R

A free software programming language and a software environment for statistical computing and graphics.

random assignment

an experimental technique for assigning subjects to different treatments (or no treatment)

random number

a number allotted randomly using a suitable generator (an electronic machine, or a "generator" as simple as a die)

random sample

a sample randomly taken from an investigated population

random variable

a quantity whose value is random and to which a probability distribution is assigned, such as the possible outcome of a roll of a die

random walk

a stochastic path consisting of a series of sequential movements, the direction (and sometimes length) of which is chosen at random

range

the length of the smallest interval which contains all the data in a sample; the difference between the largest and smallest observations in the sample

raw score

an original observation that has not been transformed to a \$z\$-score

regression

An analytic method to measure the association of one or more independent variables with a dependent variable.

regression to the mean

the phenomenon by which extreme examples from any set of data are likely to be followed by examples which are less extreme; a tendency towards the average of any sample

relative frequency

the fraction or proportion of times a value occurs

relative frequency distribution

a representation, either in graphical or tabular format, which displays the fraction of observations in a certain category

residual

The difference between the observed value and the estimated function value.

response bias

Occurs when the answers given by respondents do not reflect their true beliefs.

root mean square

the square root of the arithmetic mean of the squares
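
A minimal sketch of the calculation. The function name `root_mean_square` and the input values are assumptions made for this example.

```python
import math


def root_mean_square(values: list[float]) -> float:
    """Square root of the arithmetic mean of the squared values."""
    mean_of_squares = sum(v * v for v in values) / len(values)
    return math.sqrt(mean_of_squares)


# Squares are 9 and 16, their mean is 12.5, and the square root is about 3.536.
rms = root_mean_square([3.0, 4.0])
print(f"{rms:.3f}")
```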

sample

a subset of a population selected for measurement, observation, or questioning to provide statistical information about the population

sample mean

the mean of a sample of random variables taken from the entire population of those variables

sample space

The set of all outcomes of an experiment.

sampling

the process or technique of obtaining a representative sample

sampling distribution

The probability distribution of a given statistic based on a random sample.

scatter plot

A type of display using Cartesian coordinates to display values for two variables for a set of data.

scientific control

an experiment or observation designed to minimize the effects of variables other than the single independent variable

shunt

a passage between body channels constructed surgically as a bypass

Simpson's paradox

a paradox in which a trend that appears in different groups of data disappears when these groups are combined, and the reverse trend appears for the aggregate data
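
This reversal, commonly known as Simpson's paradox, is easier to believe with numbers in front of you. The sketch below uses illustrative counts of successes and patients for two treatments, split by case severity; treat the figures as an example only.

```python
from fractions import Fraction

# Illustrative counts: treatment -> group -> (successes, patients).
data = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def group_rate(treatment: str, group: str) -> Fraction:
    successes, patients = data[treatment][group]
    return Fraction(successes, patients)


def overall_rate(treatment: str) -> Fraction:
    successes = sum(s for s, _ in data[treatment].values())
    patients = sum(n for _, n in data[treatment].values())
    return Fraction(successes, patients)


for group in ("mild", "severe"):
    print(group, float(group_rate("A", group)), float(group_rate("B", group)))
print("overall", float(overall_rate("A")), float(overall_rate("B")))
```

Treatment A has the higher success rate within each group, yet treatment B has the higher rate overall, because A was given mostly to severe cases and B mostly to mild ones.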

skewed

Biased or distorted (pertaining to statistics or information).

skewness

A measure of the asymmetry of the probability distribution of a real-valued random variable; it is the third standardized moment, defined as \$\gamma_1 = \mu_3 / \sigma^3\$, where \$\mu_3\$ is the third moment about the mean and \$\sigma\$ is the standard deviation.
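
The third standardized moment can be computed directly from a data set by dividing the third moment about the mean by the cube of the standard deviation. The sketch below uses population (divide-by-n) moments, which is an assumption; sample-adjusted estimators also exist. The function name `skewness` is chosen for this example.

```python
def skewness(values: list[float]) -> float:
    """Third moment about the mean divided by the cube of the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((v - mean) ** 2 for v in values) / n  # population variance
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / second_moment**1.5


right_skewed = skewness([1, 2, 3, 4, 10])  # long tail to the right, so positive
symmetric = skewness([1, 2, 3, 4, 5])      # symmetric about the mean, so zero
print(f"{right_skewed:.4f} {symmetric:.4f}")
```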

slope

the ratio of the vertical distance to the horizontal distance between two points on a line; zero if the line is horizontal, undefined if it is vertical

spread

A numerical difference.

standard deviation

a measure of how spread out data values are around the mean, defined as the square root of the variance
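
A short sketch with a made-up data set, showing both the population form (divide by n) and the sample form (divide by n - 1) available in the Python standard library:

```python
import statistics

# Hypothetical data with mean 5 and population variance 4.
values = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = statistics.pvariance(values)
population_sd = statistics.pstdev(values)  # square root of the population variance
sample_sd = statistics.stdev(values)       # uses n - 1 in the denominator

print(population_variance, population_sd, round(sample_sd, 3))
```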

standard error

The standard deviation of the sampling distribution of a statistic, most commonly of the sample mean; it measures how much the statistic varies from sample to sample.
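
The standard error of the mean is commonly estimated as the sample standard deviation divided by the square root of the sample size. A minimal sketch with a made-up sample:

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [12, 15, 11, 14, 13]

sample_sd = statistics.stdev(sample)  # sample standard deviation
standard_error = sample_sd / math.sqrt(len(sample))

print(f"standard error of the mean: {standard_error:.4f}")
```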

statistical literacy

the ability to understand statistics, necessary for citizens to understand material presented in media such as newspapers, television, and the Internet

statistics

a mathematical science concerned with data collection, presentation, analysis, and interpretation

stem-and-leaf display

a means of displaying data used especially in exploratory data analysis; another name for stemplot

stemplot

a means of displaying data used especially in exploratory data analysis; another name for stem-and-leaf display

stochastic

random; randomly determined

stratum

a category composed of people with certain similarities, such as gender, race, religion, or even grade level

straw poll

a survey of opinion which is unofficial, casual, or ad hoc

Student's t-distribution

A distribution that arises when the population standard deviation is unknown and has to be estimated from the data; originally derived by William Sealy Gosset (who wrote under the pseudonym “Student”).

Student's t-statistic

the ratio of the departure of an estimated parameter from its notional value to its standard error

summation notation

a notation, given by the Greek letter sigma, that denotes the operation of adding a sequence of numbers

TI-83

A calculator manufactured by Texas Instruments that is one of the most popular graphing calculators for statistical purposes.

truncate

To shorten something as if by cutting off part of it.

unbiased

impartial or without prejudice

undercoverage

Occurs when a survey fails to reach a certain portion of the population.

unemployment

The level of joblessness in an economy, often measured as a percentage of the workforce.

variable

a quantity that may assume any one of a set of values

variation ratio

the proportion of cases not in the mode
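
A minimal sketch for categorical data. The function name `variation_ratio` and the labels are assumptions made for this example.

```python
from collections import Counter


def variation_ratio(observations: list[str]) -> float:
    """Proportion of observations that do not fall in the modal category."""
    counts = Counter(observations)
    modal_count = max(counts.values())
    return 1 - modal_count / len(observations)


# The mode "a" covers 2 of 4 cases, so half the cases are not in the mode.
ratio = variation_ratio(["a", "a", "b", "c"])
print(ratio)
```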

vector

in statistics, a set of real-valued random variables that may be correlated

veridical paradox

a situation in which a result appears absurd but is demonstrated to be true nevertheless

volatility

the state of sharp and regular fluctuation

weighted average

an arithmetic mean of values biased according to agreed weightings
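
A minimal sketch: each value is multiplied by its weight, and the total is divided by the sum of the weights. The scores and weights are made up for illustration.

```python
# Hypothetical scores and the weight given to each.
scores = [90, 80, 70]
weights = [0.5, 0.3, 0.2]

weighted_average = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
plain_average = sum(scores) / len(scores)

print(f"weighted: {weighted_average:.1f}, unweighted: {plain_average:.1f}")
```

Dividing by the sum of the weights means the weights do not have to add up to 1.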

z-score

The standardized value of observation \$x\$ from a distribution that has mean \$\mu\$ and standard deviation \$\sigma\$.

z-value

the standardized value of an observation found by subtracting the mean from the observed value, and then dividing that value by the standard deviation; also called \$z\$-score 
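
The calculation described above can be sketched as a one-line function. The function name `z_score` and the example numbers are assumptions made for this illustration.

```python
def z_score(observed: float, mean: float, standard_deviation: float) -> float:
    """Number of standard deviations by which `observed` lies above the mean."""
    return (observed - mean) / standard_deviation


# Hypothetical distribution with mean 100 and standard deviation 15.
above = z_score(130, mean=100, standard_deviation=15)  # 30 / 15
below = z_score(85, mean=100, standard_deviation=15)   # -15 / 15
print(above, below)
```

A negative value indicates an observation below the mean.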