Glossary

alpha level

the probability that the test will lead to a Type I error. That is, the alpha level determines the probability of obtaining sample data in the critical region even though the null hypothesis is true.

Alternative hypothesis

a statement that a study will find meaningful differences or relationships between the variables under investigation.

Bar graph

a graph in which bars of varying height with spaces between them are used to display data for variables defined by qualities or categories.

Binary numbers

Values that can only be zero or one. Binary values are often used to represent true or false, or present or absent.

bivariate

characterized by two variables or attributes.

Box plots

a graphical display of the median, the spread of the middle 50% of the scores (the interquartile range), and the extreme values in a data set. Used to identify outliers and to compare distributions. Also known as box-and-whisker plots.
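
As a minimal sketch, the values a box plot displays can be computed with Python's standard-library `statistics.quantiles`. Quartile methods differ between textbooks and software (this uses the function's default "exclusive" method). The 1.5 × IQR rule for flagging outliers is a common convention assumed here, not one defined in this glossary. The helper name `box_plot_summary` and the scores are hypothetical.

```python
import statistics

def box_plot_summary(scores):
    """Return the values shown in a box plot, plus outliers by the 1.5 * IQR rule."""
    # Q1, median (Q2), and Q3 using statistics.quantiles' default 'exclusive' method.
    q1, median, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1  # interquartile range: the width of the box
    # Assumed convention: values more than 1.5 * IQR beyond the box are outliers.
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in scores if x < lower_fence or x > upper_fence]
    return {"min": min(scores), "q1": q1, "median": median, "q3": q3,
            "max": max(scores), "iqr": iqr, "outliers": outliers}

summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)  # Q1 = 4.0, median = 6.0, Q3 = 8.5, and 30 is flagged as an outlier
```

In a drawn box plot, the whiskers would then stop at the most extreme values inside the fences (2 and 9 here), and 30 would appear as a separate point.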

Central tendency

A statistical measure that identifies a single score (usually a central value) to serve as a representative for the entire group.

Class intervals

Groups of scores in a grouped frequency distribution. Each class interval is a range of scores or numerical values that constitutes one segment or class of a variable of interest, and all intervals have the same width (e.g., scores grouped by 10s).
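
A grouped frequency distribution with class intervals of width 10 can be sketched as follows. This assumes whole-number scores. The helper name `grouped_frequency` and the example scores are hypothetical.

```python
from collections import Counter

def grouped_frequency(scores, width=10):
    """Grouped frequency distribution for whole-number scores, intervals of equal width."""
    # Each score falls in the interval whose lower limit is the score
    # rounded down to a multiple of `width` (e.g., 67 -> 60, so 60-69).
    counts = Counter((x // width) * width for x in scores)
    lowest, highest = min(counts), max(counts)
    # List every interval in the range, including any with a frequency of 0.
    return {(low, low + width - 1): counts[low] for low in range(lowest, highest + 1, width)}

for (low, high), f in grouped_frequency([62, 67, 71, 74, 75, 78, 91]).items():
    print(f"{low}-{high}: {f}")
```

The empty 80-89 interval is listed with a frequency of 0 so that the table still covers the full range of scores.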

Confidence interval (C.I.)

A range of values for a population parameter that is estimated from a sample with a preset, fixed probability (known as the confidence level) that the range will contain the true value of the parameter. The width of the confidence interval provides information about the precision of the estimate, such that a wider interval indicates relatively low precision and a narrower interval indicates relatively high precision.
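
A minimal sketch of an interval estimate for a population mean is shown below. The interval is M ± (critical value) × (estimated standard error). For simplicity this uses a z critical value from the standard library's `NormalDist`, which is a large-sample approximation. With a sample as small as the hypothetical one here, a t critical value with df = n − 1 would normally be used instead. The function name `mean_confidence_interval` is illustrative.

```python
import math
import statistics

def mean_confidence_interval(scores, confidence=0.95):
    """Approximate confidence interval for a population mean using a z critical value."""
    n = len(scores)
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)  # estimated standard error, s / sqrt(n)
    # Two-tailed critical value: about 1.96 for 95% and 2.58 for 99% confidence.
    z_crit = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return m - z_crit * se, m + z_crit * se

scores = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # hypothetical sample, M = 13.5
print(mean_confidence_interval(scores, 0.95))
print(mean_confidence_interval(scores, 0.99))  # higher confidence gives a wider interval
```

Comparing the two intervals illustrates the trade-off described above: asking for more confidence makes the interval wider and therefore less precise.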

Confidence level

The percentage of confidence intervals, constructed the same way over repeated samples, that would contain the true value of the population parameter being estimated. The most commonly used levels are 95% and 99%.

Confounding variable

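A confounding variable is a type of extraneous variable that changes systematically along with the independent variable and influences the outcome of the study, so that its effect cannot be separated from the effect of the independent variable. For this reason, researchers do their best to control these variables.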

Construct

An unobservable theoretical concept (an abstract idea) that researchers want to measure in an experiment or study.

contingency table

a two-dimensional table in which frequency values for categories of one variable are presented in the rows and values for categories of a second variable are presented in the columns: Values that appear in the various cells then represent the number or percentage of cases that fall into the two categories that intersect at this point.

Continuous variables

Variables that can have an infinite number of possible values or can fall on a continuum of values. For example, response time, height, temperature, GPA.

Control group

the group who are not exposed to the treatment variable; the control group serves as the comparison group.

Convenience sampling

When a study is conducted with whoever is willing or available to be a research participant.

criterion variable

the effect that one wants to predict or explain in correlational research.

critical region

the portion of a probability distribution containing the values for a test statistic that would result in rejection of a null hypothesis in favor of its corresponding alternative hypothesis.

critical value

a value used to make decisions about whether a test result is statistically meaningful.

curvilinear relationship

describing an association between variables that does not consistently follow an increasing or decreasing pattern but rather changes direction after a certain point (i.e., it involves a curve in the set of data points).

Data

Data are observations or measurements, usually quantified and obtained in the course of research (APA, 2022).

Degrees of freedom (df)

For a single sample, degrees of freedom are df = n − 1. Degrees of freedom measure the number of scores that are free to vary when computing SS for sample data. The value of df also describes how well a t statistic estimates a z-score (as discussed in Unit 3).
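
The arithmetic linking SS, df, and the sample variance can be traced on a small hypothetical sample. The standard library's `statistics.variance` (sample, divides by n − 1) and `statistics.pvariance` (population, divides by N) are used only as a cross-check.

```python
import statistics

scores = [1, 3, 5, 7]                     # hypothetical sample, n = 4
n = len(scores)
m = sum(scores) / n                       # sample mean, M = 4.0
ss = sum((x - m) ** 2 for x in scores)    # SS = 9 + 1 + 1 + 9 = 20.0
df = n - 1                                # degrees of freedom = 3
sample_variance = ss / df                 # s^2 = SS / df, about 6.67
population_variance = ss / n              # SS / N, used only if the scores were a whole population

print(ss, df, sample_variance, population_variance)
# The standard library uses the same two formulas:
print(statistics.variance(scores), statistics.pvariance(scores))
```

Dividing by df rather than n makes the sample variance slightly larger. This corrects for the tendency of a sample to underestimate the variability of its population.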

Dependent variable

In an experiment, the variable that is observed for changes due to the independent variable. (the measured variable, the outcome variable)

Descriptive statistics

Techniques that organize, summarize, and describe a set of data.

difference score

an index of dissimilarity or change between observations from the same individual across time, based on the measurement of a construct or attribute on two or more separate occasions.

directional hypothesis

a scientific prediction stating (a) that an effect will occur and (b) whether that effect will specifically increase or specifically decrease, depending on changes to the independent variable.

Discrete variables

A variable that consists of separate, indivisible values or categories, with no possible values between neighboring ones. For example, number of children, gender, hair color. Discrete variables made up of categories are often called categorical variables.

effect size

A measure of the magnitude or meaningfulness of a difference, relationship, or effect in a study. Often, effect sizes are interpreted as indicating the practical or meaningful significance of a research finding.

Empirical Rule

In a normal distribution, approximately 68% of all scores fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. Also known as the 68-95-99.7 Rule.
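
The proportions behind the rule can be checked with the standard library's `NormalDist` by taking the area under a normal curve between −k and +k standard deviations. The helper name `proportion_within` is illustrative. For real data that are only roughly normal, the rule holds only approximately.

```python
from statistics import NormalDist

def proportion_within(k, dist=NormalDist(mu=0, sigma=1)):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    upper = dist.mean + k * dist.stdev
    lower = dist.mean - k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")  # about 68.3%, 95.4%, 99.7%
```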

equality of variances

the statistical assumption of equal variance, meaning that the average squared distance of a score from the mean is the same across all groups sampled in a study.

Expected value of M

The mean of the sampling distribution of sample means; it is equal to the population mean (μ).

Experimental design

A type of study in which a researcher assigns or manipulates which groups participants will be in and controls other variables.

Experimental group

the group who are exposed to the independent variable (or the manipulation) by the researcher; the experimental group represents the treatment group.

Extraneous variable

An extraneous variable is something that occurs in the environment or happens to the participants that could unintentionally influence the outcome of the study.

fatigue effect

a decline in performance on a prolonged or demanding research task that is generally attributed to the participant becoming tired or bored with the task.

Frequency

How many times a particular category occurs.

Frequency distribution

A table or graph that shows the number of times each category or value of a variable of interest occurs in your data.

Frequency polygon

a graph depicting a statistical distribution, made up of lines connecting the peaks of adjacent intervals. A line graph connecting the midpoints of the bars of a histogram is a frequency polygon.

Histogram

a graphical depiction of continuous data using bars of varying height, similar to a bar graph but with blocks on the x-axis adjoining or touching one another so as to show their continuous nature.  A histogram is appropriate to use when data is measured on an interval or ratio scale.

Hypothesis testing

a statistical inference procedure for determining whether a given proposition about a population parameter should be rejected on the basis of observed sample data.

Independent variable

In an experiment, the variable that is manipulated by the researcher. (the treatment conditions)

independent-measures

a research design that uses a separate group of participants for each treatment condition.

Inferential statistics

Statistical techniques that use sample data to draw general conclusions about populations. Additionally, they allow us to answer our research questions.

Integers

Whole numbers (no decimal or fraction).

interaction

in a factorial design, the joint effect of two or more independent variables on a dependent variable above and beyond the sum of their individual effects: The independent variables combine to have an effect that differs from the sum of their separate effects, such that the effect of one independent variable depends on the level of another. In other words, the relationship between one independent variable and the dependent variable changes as the value of the other independent variable changes.

intercept

the point at which a line plotted on a graph crosses an axis. In regression, the y-intercept is the predicted value of y when x equals zero.

Interquartile range (IQR)

The range of the middle 50% of the scores in a distribution, calculated as the third quartile minus the first quartile (Q3 − Q1). It is sometimes used to communicate where the bulk of the data in the distribution are located.

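Interval scale

An ordinal scale in which all the categories are intervals of exactly the same width, so that equal differences between numbers reflect equal differences in magnitude. An interval scale has no true zero point (e.g., temperature in degrees Fahrenheit).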

least squares error solution

in regression analysis, the principle that one should estimate the values of parameters in a way that will minimize the sum of the squared errors of prediction (residuals) from the model.
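
For a single predictor, the least squares solution has a closed form. The slope is b = SP / SS_X and the intercept is a = M_Y − b·M_X. The sketch below applies these formulas to hypothetical scores and then computes the residuals. The function name `least_squares_line` is illustrative.

```python
def least_squares_line(xs, ys):
    """Slope b and intercept a of the least-squares regression line, y-hat = b*x + a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products of deviations
    ss_x = sum((x - mx) ** 2 for x in xs)                   # sum of squares for X
    b = sp / ss_x            # slope: change in predicted y per one-unit change in x
    a = my - b * mx          # intercept: predicted y when x = 0
    return b, a

xs = [1, 2, 3, 4, 5]   # hypothetical predictor scores
ys = [2, 4, 5, 4, 5]   # hypothetical criterion scores
b, a = least_squares_line(xs, ys)
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
print(b, a)                             # slope 0.6, intercept about 2.2
print(sum(r ** 2 for r in residuals))   # the minimized sum of squared residuals
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding). No other straight line gives a smaller sum of squared residuals for these points.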

line of best fit

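A straight line that minimizes the distance between it and the data points in a scatter plot; specifically, the line that minimizes the sum of the squared vertical distances (residuals) between the data points and the line.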

linear regression

a regression analysis in which the predictor or independent variables (xs) are assumed to be related to the criterion or dependent variable (y) in such a manner that a change in an x variable is associated with a consistent change (either an increase or a decrease) in the y variable. In other words, the direction and rate of change of one variable is constant with respect to changes in the other variable.

linear relationship

an association between two variables that when subjected to regression analysis and plotted on a graph forms a straight line. In linear relationships, the direction and rate of change in one variable are constant with respect to changes in the other variable.

main effect

the overall effect of a single independent variable on a dependent variable, averaged across the levels of all other independent variables in a factorial design.

Marginal values

In a contingency table, the total for a single category of one variable, added up across all levels of the other variable.

matched design

a research design using two or more sets of study participants that are matched to be equivalent to one another with respect to certain relevant variables.

Mean

The most common measure of central tendency. The mean is the balancing point of a distribution of scores.

Median

The score that divides a distribution exactly in half.

Mode

The score with the greatest frequency overall (major), or the greatest frequency within the set of neighboring scores (minor).
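
The three measures of central tendency can be compared on one hypothetical set of scores using Python's standard `statistics` module. When a distribution has more than one mode, `statistics.multimode` returns all of them.

```python
import statistics

scores = [2, 3, 3, 4, 5, 7, 11]   # hypothetical scores
print(statistics.mean(scores))    # 5: the sum of the scores (35) divided by n (7)
print(statistics.median(scores))  # 4: the middle score of the ordered list
print(statistics.mode(scores))    # 3: the score with the greatest frequency
```

The single high score (11) pulls the mean above the median. The median and mode are unaffected by how extreme that score is.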

multivariate

consisting of or otherwise involving a number of distinct variables.

negative relationship

an association in which one variable decreases as the other variable increases, or vice versa. Also called inverse relationship.

Nominal scale

A measurement scale where the categories are differentiated only by qualitative names.

non-directional hypothesis

a scientific prediction stating that an effect, difference, or relationship will occur but not whether it will be an increase or a decrease.

Non-experimental research

Also called correlational research, non-experimental research involves observing things as they occur naturally and recording our observations as data. Often the purpose of this type of research is to find a correlation or relationship between variables.

nonparametric

describing any analytic method that does not require assumptions about the form of the population distribution (e.g., that scores are normally distributed) or about population parameters.

Normal distribution

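A symmetrical, bell-shaped distribution in which the left-hand side is a mirror image of the right-hand side and the scores cluster around the mean, with frequencies tapering off toward both tails. Every normal distribution is symmetrical, but not every symmetrical distribution is normal.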

normality

the condition in which a data set presents a normal distribution of values.

Null hypothesis

a statement that a study will find no meaningful differences between the groups or conditions under investigation.

Observational studies

A type of non-experimental research where information is gathered from observing.

Operational definition

An operational definition is a description of a variable in terms of the operations (procedures, actions, or processes) by which it could be observed and measured (APA, 2022).

Operationalize

To define a construct in terms of the specific procedures used to measure it.

order effect

the influence of the order in which treatments are administered, such as the effect of being the first administered treatment (rather than the second, third, and so forth).

Ordinal scales

A measurement scale consisting of a series of ordered categories.

Outlier

Observation or data point that does not fit the pattern of the rest of the data. Sometimes called an extreme value.

Parameter

A value that describes a population.

percentile

the location of a score in a distribution expressed as the percentage of cases in the data set with scores equal to or below the score in question.
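
Following the definition above literally (the percentage of scores equal to or below a given score), a percentile rank can be computed as shown below. Some textbooks use slightly different conventions, such as counting only half of the tied scores. The function name `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the data set that are equal to or below `score`."""
    at_or_below = sum(1 for x in scores if x <= score)
    return 100 * at_or_below / len(scores)

scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]  # hypothetical data set
print(percentile_rank(scores, 70))  # 5 of 10 scores are <= 70, so 50.0
```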

Placebo

A substance that has no therapeutic effect, used as a control in testing new drugs.

Population

The entire group of individuals that a researcher wishes to study.

positive relationship

an association between two variables such that they rise and fall in value together.

power

A statistical measure of how effective a statistical procedure is at identifying real differences in a study. It is the probability that use of the procedure will lead to the null hypothesis being rejected when the null hypothesis is false (power = 1 − β).

practice effect

any change or improvement that results from practice or repetition of task items or activities.

predictor variable

a variable used to estimate, forecast, or project future events or circumstances. This term sometimes is used interchangeably with independent variable.

probability

The degree to which an event is likely to occur, expressed as a value from 0 to 1. It is calculated as the number of outcomes classified as that event or category divided by the total number of possible outcomes (assuming all outcomes are equally likely).

Qualitative variables

Variables that express a qualitative attribute such as hair color, eye color, religion, favorite movie, gender, and so on. Qualitative means that they describe a quality rather than a numeric quantity. Qualitative variables are sometimes referred to as categorical variables.

Quantitative variables

Variables that are measured in terms of numbers. Some examples of quantitative variables are height, weight, and shoe size.

quartile

values that divide a list of numbers into quarters. For example, the first (or lower) quartile of a distribution is the data value below which are the lowest 25% of scores, the second quartile is the data value below which are 50% of scores, and the third (or upper) quartile is the data value below which are 75% of scores (or, conversely, above which are 25% of scores).

Quasi-experimental design

A research design in which the researcher looks for group differences between levels of the independent variable but does not, or more likely cannot, randomly assign members of the population to groups.

Random assignment

In experimental research, random assignment is a way of placing participants from your sample into different groups using randomization. With this method, every member of the sample has an equal chance of being placed in a control group or an experimental group.
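
One simple way to carry out random assignment is to shuffle the sample and split it in half. The sketch below assumes two equal-sized groups. The function name `randomly_assign` and the participant labels are hypothetical, and the seed is used only to make the example reproducible.

```python
import random

def randomly_assign(sample, seed=None):
    """Randomly split a sample into a control group and an experimental group."""
    rng = random.Random(seed)       # seed only to make the example reproducible
    shuffled = list(sample)
    rng.shuffle(shuffled)           # every ordering of the sample is equally likely
    half = len(shuffled) // 2
    # With an odd-sized sample, the experimental group receives the extra participant.
    return {"control": shuffled[:half], "experimental": shuffled[half:]}

groups = randomly_assign(["P1", "P2", "P3", "P4", "P5", "P6"], seed=1)
print(groups)
```

Because every ordering is equally likely after the shuffle, each participant has the same chance of ending up in either group.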

Range

The distance from the upper real limit of the highest score to the lower real limit of the lowest score; the total distance from the absolute highest point to the lowest point in the distribution.

Ratio scale

An interval scale with a true zero point (zero means none of the variable is present), so that ratios between values are meaningful; for example, a weight of 10 kg is twice a weight of 5 kg.

Real numbers

Numbers that can be written in fraction or decimal form, including integers and the values between them (e.g., −2, 0.5, 7.25).

regression

an analysis in which one or more predictor or independent variables (xs) are used to predict or explain the value of a criterion or dependent variable (y). In its most common form, linear regression, the direction and rate of change of y are assumed to be constant with respect to changes in x.

related samples

sets of data that are related because they were collected from the same group on two or more occasions (also known as repeated measures).

Relative frequency

A ratio of occurrence for a particular category in a set of data. It is calculated by dividing the frequency of a category by the total number of observations or data points in that dataset.
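
Frequencies and relative frequencies for a categorical variable can be computed with `collections.Counter`. The survey responses below are hypothetical.

```python
from collections import Counter

responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "yes"]  # hypothetical data
frequency = Counter(responses)   # frequency of each category
n = len(responses)               # total number of observations
relative = {category: f / n for category, f in frequency.items()}
print(frequency["yes"], relative["yes"])  # 5 and 5 / 8 = 0.625
```

Because every observation falls into exactly one category, the relative frequencies add up to 1 (100%).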

Reliability

The trustworthiness or consistency of a measure, that is, the degree to which a test or other measurement instrument is free of random error, yielding the same results across multiple applications to the same sample (APA, 2022).

Representative sample

When the demographic characteristics of the sample closely match the demographic characteristics of the population from which the sample was selected, the sample is said to be representative of the population.

residual

in regression analysis, the difference between the value of an empirical observation and the value predicted by a model.

Sample

A group selected from a population to participate in a research study.

Sampling bias

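Sampling bias occurs when some members of the population are more likely than others to be selected into the sample, so that the sample is unrepresentative and your conclusions apply only to your sample rather than generalizing to the full population.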

Sampling distribution

the distribution of a statistic, such as the mean, obtained with repeated samples, of a specific size (n), drawn from a population.
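
A sampling distribution can be approximated by simulation: draw many samples of size n and record each sample mean. The sketch below samples with replacement (`random.choices`) so that the standard error formula σ/√n applies exactly. The population of scores 1 to 100 and the helper name `sample_means` are hypothetical.

```python
import random
import statistics

def sample_means(population, n, repetitions, seed=0):
    """Means of many random samples of size n, drawn with replacement."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(repetitions)]

population = list(range(1, 101))   # hypothetical population of scores 1..100
means = sample_means(population, n=25, repetitions=5000)

mu = statistics.fmean(population)                 # population mean, 50.5
sigma = statistics.pstdev(population)             # population standard deviation
print(mu, statistics.fmean(means))                # expected value of M is close to mu
print(sigma / 25 ** 0.5, statistics.stdev(means)) # standard error vs. observed spread of M
```

The mean of the simulated sample means lands close to μ, which illustrates the expected value of M. Their standard deviation lands close to σ/√n, the standard error.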

Sampling error

Sampling error is the naturally occurring discrepancy, or error, between a sample statistic and the corresponding population parameter. It exists because a sample does not exactly represent the entire population.

scatterplot

a graphical representation of the relationship between two continuously measured variables in which one variable is arrayed on each axis and a dot or other symbol is placed at each point where the values of the variables intersect. The overall pattern of dots provides an indication of the extent to which there is a linear relationship between variables.

Simple random sample

In a simple random sample, every member of the population has an equal chance of being selected into the sample, and that chance remains constant. That is, picking one member from the population must not increase or decrease the probability of picking any other member (relative to the others).

Simpson's paradox

a phenomenon that can occur when data from two or more groups or studies are merged, giving an association that differs from, or even reverses, the association found within each group or study individually.
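
The reversal can be reproduced with a small set of invented counts. These are hypothetical numbers, not data from any study. Treatment A has the higher success rate within each group, yet the lower rate once the groups are merged.

```python
# Hypothetical (successes, attempts) for two treatments in two groups.
data = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def success_rate(successes, attempts):
    return successes / attempts

# Within each group, treatment A has the higher success rate.
for group, results in data.items():
    print(group, {t: success_rate(*counts) for t, counts in results.items()})

# Merging the groups reverses the comparison: B now looks better.
merged = {
    t: success_rate(sum(data[g][t][0] for g in data), sum(data[g][t][1] for g in data))
    for t in ("A", "B")
}
print("merged", merged)  # A: 39/110 (about 0.35), B: 82/110 (about 0.75)
```

The reversal happens because treatment A was given mostly in group 2, where success rates are low for both treatments. Group membership acts as a lurking variable.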

slope

the steepness or slant of a line on a graph, measured as the change of value on the y-axis associated with a change of one unit of value on the x-axis.

spurious relationship

a situation in which variables are associated through their common relationship with one or more other variables but do not have a causal relationship with one another.

Standard deviation (s)

a measure of the variability within a sample, equal to the square root of the variance, indicating how narrowly or broadly the scores deviate from the mean.

Standard error

the standard deviation of the sampling distribution of the mean, σ_M = σ/√n. It measures the typical distance between a sample mean and the population mean for samples of a specific size, n.

Statistical graph

A visual tool that helps you learn about the shape or distribution of a sample or a population.

statistical independence

the condition in which the occurrence of one event makes it neither more nor less probable that another event will occur.

statistical significance

the degree to which a research outcome cannot reasonably be attributed to the operation of chance or random factors. It is determined during significance testing by comparing the p value with a preset alpha level. The p value is the probability of obtaining results at least as extreme as those observed if the null hypothesis (i.e., of no relationship between variables) were true.

Statistical thinking

A way of understanding a complex world by describing it in relatively simple terms that nonetheless capture essential aspects of its structure or function, and that also provide us some idea of how uncertain we are about that knowledge.

Statistic

A value that describes a sample. A statistic is derived from measurements of the individuals in the sample.

Stem-and-leaf graph

A display that shows the individual data values and the spread of a distribution by splitting each value into a stem (its leading digit or digits) and a leaf (its final significant digit). Also known as a stemplot.
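
A minimal text stemplot can be built by splitting each value into tens (stem) and ones (leaf). This sketch assumes non-negative whole numbers. The function name `stem_and_leaf` and the values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Text stem-and-leaf display for non-negative whole numbers: stem = tens, leaf = ones."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)   # e.g., 53 -> stem 5, leaf 3
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # include empty stems to show gaps
        leaves = "".join(str(leaf) for leaf in stems[stem])
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)

print(stem_and_leaf([42, 47, 51, 53, 53, 58, 64, 71, 79]))
```

This prints four rows: `4 | 27`, `5 | 1338`, `6 | 4`, and `7 | 19`. Each individual value can still be read back from the display, while the row lengths show the shape of the distribution.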

Stratified random sample

A type of random sample in which the population is first divided into smaller groups representing particular subpopulations.  Then, random samples are drawn from the subpopulations or strata.
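
A stratified random sample can be sketched by drawing a simple random sample (`random.sample`) within each stratum. Drawing the same fraction from every stratum (proportionate allocation) is one common choice assumed here. The helper name `stratified_sample` and the student population are hypothetical.

```python
import random

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from each stratum (proportionate allocation)."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)  # divide into strata
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))  # simple random sample within the stratum
    return sample

# Hypothetical population: 60 first-year and 40 second-year students.
population = [("first-year", i) for i in range(60)] + [("second-year", i) for i in range(40)]
sample = stratified_sample(population, stratum_of=lambda m: m[0], fraction=0.1, seed=42)
print(len(sample))  # 10: 6 first-year and 4 second-year students
```

Because each stratum is sampled separately, the sample's makeup (60% first-year, 40% second-year) matches the population exactly rather than only on average.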

sum of products

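the value obtained by multiplying each pair of numbers in a set and then adding the individual products. In correlation and regression, the sum of products (SP) is computed from paired deviation scores: SP = Σ(X − M_X)(Y − M_Y).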

Sum of Squares (SS)

The sum of the squared deviation scores.

symmetrical distribution

a distribution in which the frequency of values above the mean is a mirror image of the frequency of values below the mean.

test for independence

a procedure used to test the hypothesis of the relationship between two categorical variables. The observed frequencies of a variable are compared with the frequencies that would be expected if the null hypothesis of no association (i.e., statistical independence) were true.
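
The chi-square statistic for a test for independence compares each observed frequency (fo) with the frequency expected under independence. That expected frequency is fe = (row total × column total) / N, built from the table's marginal values. The sketch below computes χ² and df = (rows − 1)(columns − 1) for a hypothetical 2 × 2 table. The function name `chi_square_independence` is illustrative.

```python
def chi_square_independence(observed):
    """Chi-square statistic and df for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]            # marginal values for rows
    col_totals = [sum(col) for col in zip(*observed)]      # marginal values for columns
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n          # expected frequency if independent
            chi2 += (fo - fe) ** 2 / fe
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table: rows = two groups, columns = "yes" / "no" responses.
observed = [[30, 20],
            [20, 30]]
chi2, df = chi_square_independence(observed)
print(chi2, df)  # every fe is 25, so chi2 = 4 * (5 ** 2 / 25) = 4.0 with df = 1
```

With df = 1, the critical value at α = .05 is 3.84, so χ² = 4.0 would fall in the critical region and the null hypothesis of independence would be rejected.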

test statistic

the numerical result of a statistical test, which is used to determine statistical significance and evaluate the viability of a hypothesis.

Type I error

When a researcher rejects the null hypothesis when it is actually true. Researchers make this error when they believe they have found an effect, difference, or relationship that does not actually exist. The probability of committing a Type I error is called the significance level or alpha (α) level.

Type II error

When researchers fail to reject the null hypothesis when it is false and should be rejected. Researchers make this error if they conclude that a particular effect, difference, or relationship does not exist when in fact it does. The probability of committing a Type II error is called the beta (β) level of a test. Conversely, the probability of not committing a Type II error (i.e., of detecting a genuinely significant difference, effect, or relationship) is called the power of the test, where power = 1 − β.
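
The relationship power = 1 − β can be made concrete for one simple case. The example assumes a two-tailed, one-sample z-test with known σ. The effect size is expressed as (μ_true − μ_0)/σ, and the function name `z_test_power` is hypothetical. If the null hypothesis is false, the z statistic is centered at effect size × √n instead of 0. β is then the probability that the statistic still lands outside the critical region.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test; effect_size = (mu_true - mu_0) / sigma."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # boundary of the critical region, about 1.96
    shift = effect_size * n ** 0.5         # center of the z statistic when H0 is false
    # Type II error: the statistic falls between -z_crit and +z_crit despite the real effect.
    beta = z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)
    return 1 - beta

print(z_test_power(0.5, n=25))   # roughly 0.70
print(z_test_power(0.5, n=50))   # a larger sample gives more power
```

When the effect size is 0, the "power" equals α, because every rejection is then a Type I error.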

Unrepresentative sample

A subset of the population that does not have the characteristics typical of the target population. Also known as a biased sample.

Validity

The degree to which the tool or assessment method measures what it claims to measure.

Value

A number, such as 4, −81, or 367.12. A value can also be a category (word), such as male or female, or a psychological diagnosis (major depressive disorder, post-traumatic stress disorder, schizophrenia).

variability

the degree to which members of a group or population or scores in a dataset differ from each other

Variable

A condition or characteristic that can take on different values.

Variance

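The average squared deviation of the scores from the mean. For a population, σ² = SS/N; for a sample, the average is computed using degrees of freedom, s² = SS/(n − 1).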

Whiskers

lines extending from the box in a box plot that represent the bottom 25% and top 25% of the distribution, typically reaching to the lowest and highest values that are not outliers.

z-score

A standardized version of a raw score (x) that gives the relative location of that score within its distribution in terms of the mean and standard deviation of the distribution: z = (x − μ)/σ.
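
Converting between raw scores and z-scores takes one line in each direction. The function names and the scale used in the example (mean 100, standard deviation 15) are hypothetical.

```python
def z_score(x, mean, sd):
    """Location of raw score x in standard-deviation units from the mean."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Convert a z-score back to a raw score: x = mean + z * sd."""
    return mean + z * sd

print(z_score(130, mean=100, sd=15))    # 2.0: two standard deviations above the mean
print(raw_score(-1, mean=100, sd=15))   # 85: one standard deviation below the mean
```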

License


Introduction to Statistics for the Social Sciences Copyright © 2021 by Jennifer Ivie; Alicia MacKay is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
