Glossary of Key Terms
- aggregate
-
a mass, assemblage, or sum of particulars; something consisting of elements but considered as a whole
- arithmetic mean
-
the measure of central tendency of a set of values computed by dividing the sum of the values by their number; commonly called the mean or the average
- average
-
any measure of central tendency, especially any mean, the median, or the mode
- Bayes' factor
-
The ratio of the conditional probabilities of the event $B$ given that $A_1$ is the case or that $A_2$ is the case, respectively.
- bell curve
-
In mathematics, the bell-shaped curve that is typical of the normal distribution. A symmetrical bell-shaped curve that represents the distribution of values, frequencies, or probabilities of a set of data. It slopes downward from a point in the middle corresponding to the mean value, or the maximum probability. Data that reflect the aggregate outcome of large numbers of unrelated events tend to result in bell curve distributions. (Dictionary.com, 2021)
- bellwether
-
anything that indicates future trends
- bias
-
Inclination towards something; predisposition, partiality, prejudice, preference, predilection.
- bivariate
-
Having or involving exactly two variables.
- box plot
-
A graphical summary of a numerical data sample through five statistics: median, lower quartile, upper quartile, and some indication of more extreme upper and lower values.
- box-and-whisker plot
-
a convenient way of graphically depicting groups of numerical data through their quartiles
- breakdown point
-
the number or proportion of arbitrarily large or small extreme values that must be introduced into a batch or sample to cause the estimator to yield an arbitrarily large result
- breeding
-
the process through which propagation, growth, or development occurs
- causality
-
the relationship between an event (the cause) and a second event (the effect), where the second event is understood as a consequence of the first
- census
-
an official count of members of a population (not necessarily human), usually residents or citizens in a particular region, often done at regular intervals
- central limit theorem
-
The theorem that states: The sum (or mean) of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed.
- central tendency
-
a term that relates the way in which quantitative data tend to cluster around some value
- chance variation
-
the presence of chance in determining the variation in experimental results
- chi-squared test
-
In probability theory and statistics, a hypothesis test whose test statistic follows a chi-squared distribution when the null hypothesis is true. The chi-squared distribution (also chi-square or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables.
- chromosome
-
A structure in the cell nucleus that contains DNA, histone protein, and other structural proteins.
- cluster
-
a significant subset within a population
- coefficient of variation
-
The ratio of the standard deviation to the mean.
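As a small illustration of this ratio, the following sketch computes it with Python's standard `statistics` module. The function name `coefficient_of_variation` and the data are hypothetical, and the sample (rather than population) standard deviation is used as an assumption.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is zero")
    return statistics.stdev(values) / mean

heights_cm = [150, 160, 170, 180, 190]  # made-up data for illustration
cv = coefficient_of_variation(heights_cm)
print(f"CV = {cv:.3f}")  # about 0.093
```

Because the ratio is unitless, it can be used to compare the spread of data sets measured on different scales.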
- combinatorics
-
A branch of mathematics that studies (usually finite) collections of objects that satisfy specified criteria.
- conditional probability
-
The probability that an event will take place given the restrictive assumption that another event has taken place, or that a combination of other events has taken place
- confidence interval
-
A type of interval estimate of a population parameter used to indicate the reliability of an estimate.
- confounding variable
-
an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable
- contingency table
-
a table presenting the joint distribution of two categorical variables
- continuous random variable
-
obtained from data that can take infinitely many values
- continuous variable
-
a variable that has a continuous distribution function, such as temperature
- control
-
a separate group or subject in an experiment against which the results are compared, where the primary variable is low or nonexistent
- control group
-
the group of test subjects left untreated or unexposed to some procedure and then compared with treated subjects in order to validate the results of the test
- correlation
-
One of the several measures of the linear statistical relationship between two random variables, indicating both the strength and direction of the relationship.
- critical thinking
-
the application of logical principles, rigorous standards of evidence, and careful reasoning to the analysis and discussion of claims, beliefs, and issues
- cross tabulation
-
a presentation of data in a tabular form to aid in identifying a relationship between variables
- cumulative relative frequency
-
the accumulation of the previous relative frequencies
- data mining
-
a technique for searching large-scale databases for patterns; used mainly to find previously unknown correlations between variables that may be commercially useful
- density
-
the probability that an event will occur, as a function of some observed variable
- dependent variable
-
in an equation, the variable whose value depends on one or more variables in the equation
- descriptive statistics
-
A branch of mathematics dealing with summarization and description of collections of data sets, including the concepts of arithmetic mean, median, and mode.
- deviation
-
For interval variables and ratio variables, a measure of difference between the observed value and the mean.
- dichotomous
-
dividing or branching into two pieces
- discrete random variable
-
obtained by counting values for which there are no in-between values, such as the integers 0, 1, 2, ….
- discrete variable
-
a variable that takes values from a finite or countable set, such as the number of legs of an animal
- disjoint
-
Having no members in common; having an intersection equal to the empty set.
- disparity
-
the state of being unequal; difference
- dispersion
-
the degree of scatter of data
- distribution
-
the set of relative likelihoods that a variable will have a value in a given interval
- ellipsis
-
a mark consisting of three periods, historically with spaces in between, before, and after them (" . . . "), nowadays a single character ("…"); used in printing to indicate an omission
- empirical
-
verifiable by means of scientific experimentation
- empirical rule
-
That a normal distribution has 68% of its observations within one standard deviation of the mean, 95% within two, and 99.7% within three.
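The three percentages can be checked numerically. The sketch below uses the identity P(|Z| < k) = erf(k / √2) for a standard normal variable Z; the helper name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(|Z| < k) for a standard normal variable Z, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973
```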
- equiprobable
-
having an equal chance of occurring mathematically
- event
-
A subset of the sample space.
- evolution
-
a gradual directional change, especially one leading to a more advanced or complex form; growth; development
- exhaustive
-
including every possible element
- expected value
-
of a discrete random variable, the sum of the probability of each possible outcome of the experiment multiplied by the value itself
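For instance, the expected value of one roll of a fair six-sided die can be computed directly from this definition. This is a minimal sketch; the names `expected_value` and `fair_die` are hypothetical.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of value * probability over a {value: probability} mapping."""
    return sum(value * prob for value, prob in distribution.items())

# Each face 1..6 has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
ev = expected_value(fair_die)
print(ev)  # 7/2
```

Note that 3.5 is not itself a possible outcome of a roll; the expected value is a long-run average, not a prediction for a single trial.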
- experiment
-
A test under controlled conditions made to either demonstrate a known truth, examine the validity of a hypothesis, or determine the efficacy of something previously untried.
- exploratory data analysis
-
an approach to analyzing data sets that is concerned with uncovering underlying structure, extracting important variables, detecting outliers and anomalies, testing underlying assumptions, and developing models
- finite
-
limited, constrained by bounds, having an end
- frequency
-
number of times an event occurred in an experiment (absolute frequency)
- frequency distribution
-
a representation, either in a graphical or tabular format, which displays the number of observations within a given interval
- gene
-
a unit of heredity; a segment of DNA or RNA that is transmitted from one generation to the next, and that carries genetic information such as the sequence of amino acids for a protein
- gradient
-
of a function y = f(x) or the graph of such a function, the rate of change of y with respect to x, that is, the amount by which y changes for a certain (often unit) change in x
- graph
-
A diagram displaying data; in particular one showing the relationship between two or more quantities, measurements or indicative numbers that may or may not have a specific mathematical formula relating them to each other.
- heterogeneous
-
diverse in kind or nature; composed of diverse parts
- histogram
-
a representation of tabulated frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval
- independence
-
The occurrence of one event does not affect the probability of the occurrence of another.
- independent
-
Not dependent; not contingent or depending on something else; free.
- independent event
-
the fact that $A$ occurs does not affect the probability that $B$ occurs
- independent variable
-
in an equation, any variable whose value is not dependent on any other in the equation
- inferential statistics
-
A branch of mathematics that involves drawing conclusions about a population based on sample data drawn from it.
- integral
-
the limit of the sums computed in a process in which the domain of a function is divided into small subsets and a possibly nominal value of the function on each subset is multiplied by the measure of that subset, all these products then being summed
- intercept
-
the coordinate of the point at which a curve intersects an axis
- interquartile range
-
The difference between the first and third quartiles; a robust measure of sample dispersion.
- labor force
-
The collective group of people who are available for employment, whether currently employed or unemployed (though sometimes only those unemployed people who are seeking work are included).
- line
-
a path through two or more points (compare ‘segment’); a continuous mark, including as made by a pen; any path, curved or straight
- linear regression
-
an approach to modeling the relationship between a scalar dependent variable $y$ and one or more explanatory variables denoted $x$.
- logarithm
-
for a number $x$, the power to which a given base number must be raised in order to obtain $x$
- margin of error
-
An expression of the lack of precision in the results obtained from a sample.
- mean squared error
-
A measure of the average of the squares of the “errors”; the amount by which the value implied by the estimator differs from the quantity to be estimated.
- median
-
the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half
- mode
-
the most frequently occurring value in a distribution
- Monte Carlo simulation
-
a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results; typically, a simulation is run many times over in order to estimate probabilities that are difficult to calculate directly
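A minimal sketch of the idea, assuming a simple target: estimating the probability that two fair dice sum to seven (exactly 1/6) by repeated random sampling. The function name, trial count, and seed are hypothetical choices made for reproducibility.

```python
import random

def estimate_probability_sum_is_seven(trials, seed=0):
    """Estimate P(two dice sum to 7) as the fraction of simulated rolls that hit."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials

estimate = estimate_probability_sum_is_seven(100_000)
print(f"estimate = {estimate:.4f}, exact = {1 / 6:.4f}")
```

The estimate generally gets closer to the exact value as the number of trials grows, at the cost of more computation.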
- multiplication rule
-
The probability that A and B occur is equal to the probability that A occurs times the probability that B occurs, given that we know A has already occurred.
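As a worked example of this rule, consider drawing two cards without replacement from a standard 52-card deck and asking for the probability that both are aces. The variable names are hypothetical.

```python
from fractions import Fraction

p_first_ace = Fraction(4, 52)               # P(A): first card is an ace
p_second_ace_given_first = Fraction(3, 51)  # P(B | A): 3 aces left among 51 cards
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)  # 1/221
```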
- mutually exclusive
-
describing multiple events or states of being such that the occurrence of any one implies the non-occurrence of all the others
- nominal
-
Having values whose order is insignificant.
- non-response
-
the absence of a response
- non-response bias
-
Occurs when the sample becomes biased because some of those initially selected refuse to respond.
- normal distribution
-
A family of continuous probability distributions such that the probability density function is the normal (or Gaussian) function.
- nuisance parameters
-
any parameter that is not of immediate interest but which must be accounted for in the analysis of those parameters which are of interest; the classic example of a nuisance parameter is the variance, $\sigma^2$, of a normal distribution, when the mean, $\mu$, is of primary interest
- null hypothesis
-
A hypothesis set up to be refuted in order to support an alternative hypothesis; presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.
- objective
-
not influenced by the emotions or prejudices
- observational study
-
a study drawing inferences about the possible effect of a treatment on subjects, where the assignment of subjects into a treated group versus a control group is outside the control of the investigator
- odds
-
the ratio of the probabilities of an event happening to that of it not happening
- ordinal
-
Of a number, indicating position in a sequence.
- outcome
-
One of the individual results that can occur in an experiment.
- outlier
-
a value in a statistical sample which does not fit a pattern that describes most other data points; by a common convention, a value that lies more than 1.5 times the interquartile range (IQR) above the upper quartile or below the lower quartile
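The 1.5 IQR convention can be sketched as follows. The function name `iqr_outliers` and the data are hypothetical, and the quartiles are computed with `statistics.quantiles` using `method="inclusive"`, which is an assumption; other quartile conventions can give slightly different fences.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up data for illustration
print(iqr_outliers(data))  # [30]
```

Here the quartiles are 4 and 8, so the fences sit at -2 and 14, and only 30 falls outside them.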
- Pareto chart
-
a type of bar graph where the bars are drawn in decreasing order of frequency or relative frequency
- Pareto distribution
-
The Pareto distribution, named after the Italian economist Vilfredo Pareto, is a power law probability distribution that is used in description of social, scientific, geophysical, actuarial, and many other types of observable phenomena.
- partition
-
a part of something that had been divided, each of its results
- peer review
-
the scholarly process whereby manuscripts intended to be published in an academic journal are reviewed by independent researchers (referees) to evaluate the contribution, i.e. the importance, novelty and accuracy of the manuscript’s contents
- percentile
-
any of the ninety-nine points that divide an ordered distribution into one hundred parts, each containing one per cent of the population
- pictogram
-
a picture that represents a word or an idea by illustration; used often in graphs
- pip
-
one of the spots or symbols on a playing card, domino, die, etc.
- placebo
-
an inactive substance or preparation used as a control in an experiment or test to determine the effectiveness of a medicinal drug
- placebo effect
-
the tendency of any medication or treatment, even an inert or ineffective one, to exhibit results simply because the recipient believes that it will work
- Platonic solid
-
any one of the following five polyhedra: the regular tetrahedron, the cube, the regular octahedron, the regular dodecahedron and the regular icosahedron
- plot
-
a graph or diagram drawn by hand or produced by a mechanical or electronic device
- polynomial
-
An expression consisting of a sum of a finite number of terms: each term being the product of a constant coefficient and one or more variables raised to a non-negative integer power.
- population
-
a group of units (persons, objects, or other items) enumerated in a census or from which a sample is drawn
- probability
-
The relative likelihood of an event happening.
- probability density function
-
any function whose integral over a set gives the probability that a random variable has a value in that set
- probability distribution
-
A function of a discrete random variable yielding the probability that the variable will have a given value.
- probability sample
-
a sample in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined
- probability theory
-
The mathematical study of probability (the likelihood of occurrence of random events in order to predict the behavior of defined systems).
- prognostic
-
a sign by which a future event may be known or foretold
- prosecutor's fallacy
-
A fallacy of statistical reasoning when used as an argument in legal proceedings.
- public opinion polls
-
surveys designed to represent the beliefs of a population by conducting a series of questions and then extrapolating generalities in ratio or within confidence intervals
- purposive sampling
-
occurs when the researchers choose the sample based on who they think would be appropriate for the study; used primarily when there is a limited number of people that have expertise in the area being researched
- quadrennial
-
happening every four years
- qualitative
-
of descriptions or distinctions based on some quality rather than on some quantity
- qualitative analysis
-
The non-numerical examination and interpretation of observations for the purpose of discovering underlying meanings and patterns of relationships.
- qualitative data
-
data centered around descriptions or distinctions based on some quality or characteristic rather than on some quantity or measured value
- quantitative
-
of a measurement based on some quantity or number rather than on some quality
- quantity
-
an amount or number of something; a property that can be counted or measured
- quartile
-
any of the three points that divide an ordered distribution into four parts, each containing a quarter of the population
- quota sampling
-
a sampling method that chooses a representative cross-section of the population by taking into consideration each important characteristic of the population proportionally, such as income, sex, race, age, etc.
- R
-
A free software programming language and a software environment for statistical computing and graphics.
- random assignment
-
an experimental technique for assigning subjects to different treatments (or no treatment)
- random number
-
a number produced at random by a suitable generator (an electronic machine, or something as simple as a die)
- random sample
-
a sample randomly taken from an investigated population
- random variable
-
a quantity whose value is random and to which a probability distribution is assigned, such as the possible outcome of a roll of a die
- random walk
-
a stochastic path consisting of a series of sequential movements, the direction (and sometimes length) of which is chosen at random
- range
-
the length of the smallest interval which contains all the data in a sample; the difference between the largest and smallest observations in the sample
- raw score
-
an original observation that has not been transformed to a $z$-score
- regression
-
An analytic method to measure the association of one or more independent variables with a dependent variable.
- regression to the mean
-
the phenomenon by which extreme examples from any set of data are likely to be followed by examples which are less extreme; a tendency towards the average of any sample
- relative frequency
-
the fraction or proportion of times a value occurs
- relative frequency distribution
-
a representation, either in graphical or tabular format, which displays the fraction of observations in a certain category
- residual
-
The difference between the observed value and the estimated function value.
- response bias
-
Occurs when the answers given by respondents do not reflect their true beliefs.
- root mean square
-
the square root of the arithmetic mean of the squares
- sample
-
a subset of a population selected for measurement, observation, or questioning to provide statistical information about the population
- sample mean
-
the mean of a sample of random variables taken from the entire population of those variables
- sample space
-
The set of all outcomes of an experiment.
- sampling
-
the process or technique of obtaining a representative sample
- sampling distribution
-
The probability distribution of a given statistic based on a random sample.
- scatter plot
-
A type of display using Cartesian coordinates to display values for two variables for a set of data.
- scientific control
-
an experiment or observation designed to minimize the effects of variables other than the single independent variable
- shunt
-
a passage between body channels constructed surgically as a bypass
- Simpson's paradox
-
a paradox in which a trend that appears in different groups of data disappears when these groups are combined, and the reverse trend appears for the aggregate data
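The reversal is easiest to see with numbers. In the sketch below, the counts are illustrative and the names are hypothetical: treatment A has the higher success rate within each group, yet treatment B has the higher rate once the groups are pooled, because the two treatments were applied to the groups in very different proportions.

```python
# (successes, trials) per group; illustrative counts only
treatment_a = {"small": (81, 87), "large": (192, 263)}
treatment_b = {"small": (234, 270), "large": (55, 80)}

def rate(successes, trials):
    """Success rate within a single group."""
    return successes / trials

def pooled_rate(groups):
    """Success rate after combining all groups."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return total_successes / total_trials

for group in ("small", "large"):
    a = rate(*treatment_a[group])
    b = rate(*treatment_b[group])
    print(f"{group}: A = {a:.3f}, B = {b:.3f}")  # A is higher in both groups
print(f"pooled: A = {pooled_rate(treatment_a):.3f}, B = {pooled_rate(treatment_b):.3f}")  # B is higher
```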
- skewed
-
Biased or distorted (pertaining to statistics or information).
- skewness
-
A measure of the asymmetry of the probability distribution of a real-valued random variable; it is the third standardized moment, defined as $\gamma_1 = \mu_3 / \sigma^3$, where $\mu_3$ is the third moment about the mean and $\sigma$ is the standard deviation.
- slope
-
the ratio of the vertical distance to the horizontal distance between two points on a line; zero if the line is horizontal, undefined if it is vertical
- spread
-
A numerical difference.
- standard deviation
-
a measure of how spread out data values are around the mean, defined as the square root of the variance
- standard error
-
A measure of how much a statistic, such as the sample mean, varies from sample to sample; the standard deviation of the sampling distribution of that statistic.
- statistical literacy
-
the ability to understand statistics, necessary for citizens to understand material presented in media such as newspapers, television, and the Internet
- statistics
-
a mathematical science concerned with data collection, presentation, analysis, and interpretation
- stem-and-leaf display
-
a means of displaying data used especially in exploratory data analysis; another name for stemplot
- stemplot
-
a means of displaying data used especially in exploratory data analysis; another name for stem-and-leaf display
- stochastic
-
random; randomly determined
- stratum
-
a category composed of people with certain similarities, such as gender, race, religion, or even grade level
- straw poll
-
a survey of opinion which is unofficial, casual, or ad hoc
- Student's t-distribution
-
A distribution that arises when the population standard deviation is unknown and has to be estimated from the data; originally derived by William Sealy Gosset (who wrote under the pseudonym “Student”).
- Student's t-statistic
-
the ratio of the departure of an estimated parameter from its notional value to its standard error
- summation notation
-
a notation, given by the Greek letter sigma, that denotes the operation of adding a sequence of numbers
- TI-83
-
A calculator manufactured by Texas Instruments that is one of the most popular graphing calculators for statistical purposes.
- truncate
-
To shorten something as if by cutting off part of it.
- unbiased
-
impartial or without prejudice
- undercoverage
-
Occurs when a survey fails to reach a certain portion of the population.
- unemployment
-
The level of joblessness in an economy, often measured as a percentage of the workforce.
- variable
-
a quantity that may assume any one of a set of values
- variation ratio
-
the proportion of cases not in the mode
- vector
-
in statistics, a set of real-valued random variables that may be correlated
- veridical paradox
-
a situation in which a result appears absurd but is demonstrated to be true nevertheless
- volatility
-
the state of sharp and regular fluctuation
- weighted average
-
an arithmetic mean of values biased according to agreed weightings
- z-score
-
The standardized value of observation $x$ from a distribution that has mean $\mu$ and standard deviation $\sigma$.
- z-value
-
the standardized value of an observation found by subtracting the mean from the observed value, and then dividing that value by the standard deviation; also called $z$-score
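The calculation described above can be sketched in a few lines. The function name `z_score` and the example numbers (a score of 85 from a distribution with mean 70 and standard deviation 10) are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

print(z_score(85, 70, 10))  # 1.5
```

A result of 1.5 means the observation lies one and a half standard deviations above the mean; a negative result would place it below the mean.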