Planning Assessments
Assessment is a critical component of the instructional planning process and should have a prominent role in the learning process. This means that teachers should plan to integrate multiple forms of assessment and use the data to understand how well their students are learning the content and skills specified by the learning objectives. An assessment used during the learning process is referred to as a formative assessment. In this section, you will learn about the second stage in the Backward Design process of ensuring alignment between your learning objectives and your assessment plan.
Learning Objectives
By the end of this chapter, you will be able to:
- Determine acceptable evidence of student learning; and
- Select and/or design formative and summative assessments aligned with learning objectives to support, verify, and document learning.
Stage 2: Determining Acceptable Evidence
Now that we understand the value of clear learning objectives, we can look at the second stage of the Backward Design model (Wiggins & McTighe, 2005), where we determine what types of evidence will be acceptable to demonstrate that our students have met our goals. When considering potential evidence, Popham and Baker (1970) contend that teachers must develop the skill of distinguishing among types of practice to ensure that the evidence they collect aligns with their stated learning objectives. The assessment you choose, whether it be a quiz, assignment, essay, test, or project, will provide you with evidence of student learning. However, Popham and Baker suggest that you evaluate what you are asking students to do based on the following practice types:
- Equivalent: practice of the specific desired objective
- Analogous: practice similar to the desired objective but not identical
- En-route: practice of a skill needed before performing the desired objective
- Irrelevant: any practice or activity that does not align with the desired objective
Recognizing what type of practice you are requiring students to engage in will help guide your selection, adoption, and creation of assessments in Stage 2 of the Backward Design process. The key point to remember is that students should be given the opportunity to practice the specific skill(s) defined by your learning objectives (Popham & Baker, 1970). This second stage requires that you understand the differences between formative and summative assessment, foundational knowledge you need in order to provide practice and feedback for your students during the learning process. In addition, we will investigate a variety of assessment types, along with their pros and cons, so you can select the best format for your assessment.
Formative Assessment
Formative assessment includes all the practices teachers use to check student understanding throughout the teaching and learning process. Often, formative assessment is said to be an assessment for learning.
Definition of Formative Assessment*
Formative assessment refers to the ongoing process teachers and students engage in when selecting a learning goal (or goals), determining student performance in relation to the goal, and planning the steps needed to move students closer to the goal. This ongoing process is implemented through informal assessments, assessments that can easily be incorporated into day-to-day classroom activities. Informal assessments are content- and performance-driven and include questioning students during a discussion, student work (exit slips, assignments), and direct observation of students working. Rather than being used for grading, formative assessment is used to inform instructional planning and to provide students with valuable feedback on their progress. Formative assessment data can be collected as a pre-assessment, during a lesson, or as a post-assessment at the close of a lesson.
In the video below, Rick Wormeli, author of Fair Isn’t Always Equal and Differentiation, explains the difference between summative and formative assessment and how formative assessment helps you offer better feedback to your students.
Listen to Joey Feith and Terri Drain discuss what assessment for learning looks like in a PE setting (show notes available if you want to read instead).
Adjusting Instruction Based on Formative Assessment*
Using assessment information to adjust instruction is fundamental to the concept of assessment for learning. Teachers make these adjustments “in the moment” during classroom instruction as well as during reflection and planning periods. Teachers use the information they gain from questioning and observation to adjust their teaching during classroom instruction. If students cannot answer a question, the teacher may need to rephrase the question, probe understanding of prior knowledge, or change the way the current idea is being considered. Teachers need to learn to identify when only one or two students need individual help and when a large proportion of the class is struggling so whole group intervention is needed.
After the class is over, effective teachers spend time analyzing how well the lessons went, what students did and did not seem to understand, and what needs to be done the next day. Evaluation of student work also provides important information for teachers. If many students are confused about a similar concept, the teacher needs to re-teach it and consider new ways of helping students understand the topic. If the majority of students complete the tasks very quickly and well, the teacher might decide that the assessment was not challenging enough.
Formative Assessment Strategies
Selecting and administering assessment techniques that are appropriate for the goals of instruction as well as the developmental level of the students is a crucial component of effective formative assessment. Teachers need to know the characteristics of a wide variety of classroom assessment techniques and how these techniques can be adapted for various content, skills, and student characteristics (Seifert, 2011). There is a vast array of formative assessment strategies proven effective. For example, Natalie Regier has compiled a list of 60 formative assessment strategies along with guidance on how to use them successfully in the classroom. Finding different strategies to try has never been easier, as dozens of books have been written on the topic and hundreds of videos have been posted online demonstrating effective strategies. The key is not knowing all the possible formative assessment strategies but being able to distinguish which strategy best fits your assessment needs.
Technology & Formative Assessment*
Technology is a powerful ally for teachers, especially in measuring student learning. With digital formative assessments, teachers can assess and provide individual student feedback in real time, far faster than with traditional paper-and-pencil formative assessments. Timmis, Broadfoot, Sutherland, and Oldfield (2016) encourage teachers to reflect on the “four C’s” when using technology to enhance a lesson. Ask yourself: does the technology allow for increased collaboration or critical-thinking opportunities? Are students able to communicate their ideas in unique ways and demonstrate creative thinking? Following this format produces lessons that foster student engagement, with technology as an enhancement tool.
Educators now have access to a variety of tools that allow for instant feedback. Google Forms, Socrative, Kahoot, Quizizz, Plickers, Formative, PollEverywhere, Edpuzzle, Nearpod, and Quizlet are all educational technologies that allow teachers and students to attain instant results on the learning taking place. Students may access these systems through a variety of technological tools, including a learning management system (LMS) or a mobile device.
Teachers can have students work through retrieval practice together (such as when using a polling tool like PollEverywhere or a game-like tool like Kahoot). Other educational technology tools are self-paced, providing opportunities for learners to work on their own, and many of these services now allow for either approach. Quizlet flashcards and some of its games, such as Scatter, Match, and Gravity, can be used in a self-directed way by students, while Quizlet Live can be used with a group of students at one time for retrieval practice. Beyond assessment, teachers can utilize student devices, typically smartphones, to enhance learning in a variety of ways.
Exit Tickets
Exit Tickets are a great way to practice the backward design model on a small scale. Exit Tickets are brief mini-assessments aligned to your daily objective. Teachers can provide their students a short period at the end of the class session to complete and submit the Exit Ticket. By considering the content of the Exit Ticket before planning, teachers can ensure that they address the desired skills and concepts during their lesson. Teachers can then use the evidence gathered from Exit Tickets to guide future planning sessions for remediation purposes.
See It in Action: Exit Tickets
Check out this resource from the Teacher Toolkit website. They provide a video of a teacher using Exit Tickets and tips on how and when to use Exit Tickets.
Summative Assessment*
Assessment of learning is formal assessment used to certify student competence and fulfill accountability mandates. It is typically summative, that is, administered after instruction is completed (e.g., end-of-unit or chapter tests, end-of-term tests, or standardized tests). Summative assessments provide information about how well students mastered the material, whether students are ready for the next unit, and what grades should be given (Airasian, 2005).
Assessment Methods
Learning objectives guide what sort of practice is appropriate. There are four classifications of learning objectives: knowledge, reasoning, skill, and product (Chappuis et al., 2012). The action defined by the objective determines which assessment method is most appropriate for gathering evidence of learning. The table below outlines commonly used keywords and a description for each classification.
Classifications of Learning Objectives
| Type | Keywords | Description |
|---|---|---|
| Knowledge | Know, list, identify, understand, explain | “Knowledge targets represent the factual information, procedural knowledge, and conceptual understandings that underpin each discipline or content area…These targets form the foundation for each of the other types of learning targets.” |
| Reasoning | Predict, infer, summarize, compare, analyze, classify | “Reasoning targets specify thought processes students must learn to do well across a range of subjects. Reasoning involves thinking and applying: using knowledge to solve a problem, make a decision, etc. These targets move students beyond mastering content knowledge to the application of knowledge.” |
| Skill | Demonstrate, pronounce, perform | “Skill targets are those where a demonstration or a physical skill-based performance is at the heart of the learning. Most skill targets are found in subjects such as physical education, visual and performing arts, and foreign languages. Other content areas may have a few skill targets.” |
| Product | Create, design, write, draw, make | “Product targets describe learning in terms of artifacts where creation of a product is the focus of the learning target. With product targets, the specifications for quality of the product itself are the focus of teaching and assessment.” |
Source: Classroom Assessment for Student Learning (Chappuis et al., 2012)
It is important to understand the focus of your learning objective because it defines what type of assessment tool to use. There are many methods to assess students’ learning, but three common types are selected response, constructed response, and performance tasks (Chappuis et al., 2012). The visuals below from Chappuis et al. (2012) and Stiggins (2005) show how some assessment methods are better suited to certain learning targets than others.
Target-Assessment Method Match
| Target | Selected Response | Short Constructed Response | Extended Constructed Response | Performance Task | Technology Enhanced | Comments |
|---|---|---|---|---|---|---|
| Knowledge | Y | Y | Y | Y | Y | SR for recall/recognition; CR for descriptions/explanations-deeper knowledge and understanding |
| Reasoning | Y | Y+ | Y+ | Y | Y | SR for some types of reasoning; CR for deeper knowledge and application ("seeing the student's thinking") |
| Skill | N | N | N | Y+ | Y | Very limited SR potential; PT to observe and listen to student response/demonstration of skill |
| Product | N | Y | Y | Y+ | Y | CR may work for targets where writing is the learning focus; essays, term papers, etc. are generally considered products by the student |
Source: Classroom Assessment for Student Learning (Chappuis et al., 2012)
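For teachers who track objectives digitally, the match table above can also be expressed as a simple lookup. The sketch below is illustrative only: the function name and labels are invented, and the ratings paraphrase the table ("strong" = Y+, "ok" = Y, "poor" = N).

```python
# Illustrative sketch: the Target-Assessment Method Match table as a lookup,
# so a (hypothetical) planning tool could flag weak target/method pairings.
# Ratings paraphrase the table: "strong" = Y+, "ok" = Y, "poor" = N.

MATCH = {
    "knowledge": {"selected_response": "ok", "constructed_response": "ok",
                  "performance_task": "ok"},
    "reasoning": {"selected_response": "ok", "constructed_response": "strong",
                  "performance_task": "ok"},
    "skill":     {"selected_response": "poor", "constructed_response": "poor",
                  "performance_task": "strong"},
    "product":   {"selected_response": "poor", "constructed_response": "ok",
                  "performance_task": "strong"},
}

def check_alignment(target, method):
    """Return the table's rating for assessing `target` with `method`."""
    return MATCH[target][method]

print(check_alignment("skill", "selected_response"))  # poor
```

A lookup like this makes the chapter's central point mechanical: before adopting an assessment, check that the method is at least an "ok" match for the type of target your objective names.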
Assessment Method
| TARGET TO BE ASSESSED | Selected Response | Essay | Performance Assessment | Personal Communication |
|---|---|---|---|---|
| Knowledge and Understanding | Multiple choice, true/false, matching, and fill-in can sample mastery of elements of knowledge | Essay exercises can tap understanding of relationships among elements of knowledge | Not a good choice for this target; the three other options are preferred | Can ask questions, evaluate answers, and infer mastery, but a time-consuming option |
| Reasoning Proficiency | Can assess application of some patterns of reasoning | Written descriptions of complex problem solutions can provide a window into reasoning proficiency | Can watch students solve some problems or examine some products and infer about reasoning proficiency | Can ask student to "think aloud" or can ask followup questions to probe reasoning |
| Performance Skills | Can assess mastery of understandings prerequisite to skillful performance, but cannot rely on these to tap the skill itself. | Can assess mastery of understandings prerequisite to skillful performance, but cannot rely on these to tap the skill itself. | Can observe and evaluate skills as they are being performed | Strong match when skill is oral communication proficiency; also can assess mastery of knowledge prerequisite to skillful performance |
| Ability to Create Products | Can only assess mastery of understandings prerequisite to the ability to create quality products | Can assess mastery of knowledge prerequisite to product development; brief essays can provide evidence of writing proficiency | Can assess (1) proficiency in carrying out steps in product development, and (2) attributes of the product itself | Can probe procedural knowledge and knowledge of the attributes of quality products, but not product quality |
| Dispositions | Selected response questionnaire items can tap student feelings | Open-ended questionnaire items can probe dispositions | Can infer dispositions from behavior and products | Can talk with students about their feelings |
Links between achievement targets and assessment methods. Source: Student-involved assessment for learning (Stiggins, 2005)
The first and arguably most common form of assessment used in secondary classrooms is selected response. By asking various questions at varying levels of knowledge, selected-response assessments are an efficient way to measure student knowledge and understanding. However, the limitation of multiple-choice, true-false, matching, and fill-in-the-blank style assessments is that they provide only limited evidence of student reasoning skills and cannot demonstrate a student’s ability to apply skills. A benefit of selected-response assessments is that they collect information quickly and are easy to grade, thus shortening the feedback loop. Therefore, selected response can be a great tool for formative assessment. That is not to say it can’t or shouldn’t be used as a summative assessment tool, but if your learning objectives require action beyond recall of knowledge, you should probably look for another method.
The second form of assessment often used is constructed response. Constructed responses are often chosen to elicit evidence of students’ thinking regarding reasoning, understanding of connections, and application of content knowledge. This assessment form may be more heavily used in some disciplines than others. Lastly, the third type of assessment is the performance assessment. Performance tasks are best suited for gathering evidence of a student’s ability to perform a specific skill or create a product. With the increased pressure on schools to prepare students for college and careers, there has been a push to integrate more performance-type assessments into the teaching and learning process. The idea is that by adding more performance-based assessments, students will develop a deeper understanding of content and be able to not only retain information but also apply and transfer that knowledge to new areas.
Understanding which assessment method to use is crucial to accurately assess student learning. However, learning when and how to use assessment to further learning and measure learning is also necessary. Consider reviewing the Teacher Made Assessment Strategies resource for a deeper dive into the strengths and weaknesses of different assessment types. In the next sections, we will look at how to ensure that our assessments measure accurately.
Considerations for Formatting Assessments
If you choose to summatively assess your students with a performance assessment, a well-designed rubric can give students feedback on how they did on each objective. However, traditional assessments (multiple choice, free response, etc.) often lack detailed feedback tied to learning objectives. To provide better feedback to students, consider either grouping assessment items based on learning objectives or tagging items with information that points back to specific objectives or standards for reference.
Grouping or tagging assessment items allows a teacher to track student progress and provide specific feedback to students. Tracking individual learning objectives on an assessment provides a clearer picture of student learning of the objectives than an overall score. By providing subscores for each learning objective, students can see their strengths and weaknesses and use your feedback to guide any remediation efforts. If your assessments are broken into sections based on learning objectives, you might allow students to re-test specific sections of a unit versus taking the whole assessment again. This could save time and stress for students and the teacher.
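The subscore idea above can be sketched in a few lines of code. This is a hypothetical illustration: the item tags, student responses, and the `subscores` helper are all invented for the example, not part of any particular gradebook tool.

```python
# Hypothetical sketch: computing per-objective subscores from tagged items.
# Item tags, objective labels, and responses are invented for illustration.

def subscores(item_tags, responses):
    """Group item scores by learning objective; return percent correct per objective."""
    totals = {}  # objective -> (points earned, points possible)
    for item, objective in item_tags.items():
        earned, possible = totals.get(objective, (0, 0))
        totals[objective] = (earned + responses.get(item, 0), possible + 1)
    return {obj: round(100 * e / p) for obj, (e, p) in totals.items()}

# Each item on the assessment is tagged with the objective it measures.
item_tags = {"q1": "LO1", "q2": "LO1", "q3": "LO2", "q4": "LO2", "q5": "LO2"}
# One student's responses: 1 = correct, 0 = incorrect.
responses = {"q1": 1, "q2": 1, "q3": 1, "q4": 0, "q5": 0}

print(subscores(item_tags, responses))  # {'LO1': 100, 'LO2': 33}
```

An overall score of 60% would hide what these subscores make visible: this student has mastered LO1 and needs remediation, or perhaps a section re-test, only on LO2.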
High-Quality Assessments*
To be able to select and administer appropriate assessment techniques, teachers need to know about the variety of techniques that can be used as well as what factors ensure that the assessment techniques are high quality. We begin by considering high-quality assessments. For an assessment to be high quality, it needs to have good validity and reliability as well as the absence of bias.
Validity
Validity is the evaluation of the “adequacy and appropriateness of the interpretations and uses of assessment results” for a given group of individuals (Linn & Miller, 2005, p. 68).
For example, is it appropriate to conclude that the results of a mathematics test on fractions given to recent immigrants accurately represent their understanding of fractions?
Is it appropriate for the teacher to conclude, based on her observations, that a kindergarten student, Jasmine, has Attention Deficit Disorder because she does not follow the teacher’s oral instructions?
Obviously, in each situation, other interpretations are possible that the immigrant students have poor English skills rather than mathematics skills, or that Jasmine may be hearing impaired.
It is important to understand that validity refers to the interpretation and uses made of the results of an assessment procedure, not of the assessment procedure itself. For example, making judgments about the results of the same test on fractions may be valid if all the students understand English well. A teacher, concluding from her observations that the kindergarten student has Attention Deficit Disorder (ADD) may be appropriate if the student has been screened for hearing and other disorders (although the classification of a disorder like ADD cannot be made by one teacher). Validity involves making an overall judgment of the degree to which the interpretations and uses of the assessment results are justified. Validity is a matter of degree (e.g. high, moderate, or low validity) rather than all-or-none (e.g. totally valid vs invalid) (Linn & Miller, 2005).
Content validity evidence is associated with the question: How well does the assessment include the content or tasks it is supposed to? For example, suppose your educational psychology instructor devises a mid-term test and tells you this includes chapters one to seven in the textbook. All the items in the test should be based on the content from educational psychology, not your methods or cultural foundations classes. Also, the items in the test should cover content from all seven chapters and not just chapters three to seven—unless the instructor tells you that these chapters have priority.
Teachers have to be clear about their purposes and priorities for instruction before they can begin to gather evidence related to content validity. Content validation determines the degree that assessment tasks are relevant and representative of the tasks judged by the teacher (or test developer) to represent their goals and objectives (Linn & Miller, 2005). In their book, The Understanding by Design Guide to Creating High-Quality Units, Wiggins & McTighe share a method that teachers can use to determine the validity of their assessments. Consider how the Two Question Validity Test (Wiggins & McTighe, 2011, p. 91) below might help you evaluate how well your assessment measures student understanding versus recall abilities, effort, creativity, or presentation skills.
Two-Question Validity Test

| | Very likely* | Somewhat likely | Very unlikely |
|---|---|---|---|
| Question 1: How likely is it that a student could do well on the assessment by … | | | |
| Question 2: How likely is it that a student could do poorly on the assessment by … | | | |

\* “Very likely” means that the assessment is not aligned with goal(s).
Construct validity evidence is more complex than content validity evidence. Often, we are interested in making broader judgments about students’ performances than specific skills such as doing fractions. The focus may be on constructs such as mathematical reasoning or reading comprehension.
A construct is a characteristic of a person we assume exists to help explain behavior.
For example, we use the concept of test anxiety to explain why some individuals when taking a test have difficulty concentrating, have physiological reactions such as sweating, and perform poorly on tests but not in class assignments. Similarly, mathematics reasoning and reading comprehension are constructs as we use them to help explain performance on an assessment.
Construct validation is the process of determining the extent to which performance on an assessment can be interpreted in terms of the intended constructs and is not influenced by factors irrelevant to the construct.
For example, judgments about recent immigrants’ performance on a mathematical reasoning test administered in English will have low construct validity if the results are influenced by English language skills that are irrelevant to mathematical problem-solving. Similarly, construct validity of end-of-semester examinations is likely to be poor for those students who are highly anxious when taking major tests but not during regular class periods or when doing assignments. Teachers can help increase construct validity by trying to reduce factors that influence performance but are irrelevant to the construct being assessed. These factors include anxiety, English language skills, and reading speed (Linn & Miller 2005).
The third form of validity evidence is called criterion-related validity. Selective colleges in the USA use the ACT or SAT, among other criteria, to choose who will be admitted because these standardized tests help predict freshman grades, i.e., they have high criterion-related validity. Some K-12 schools give students math or reading tests in the fall semester in order to predict which students are likely to do well on the annual state tests administered in the spring semester and which are unlikely to pass and will need additional assistance. If the tests administered in the fall do not predict students’ performances accurately, the additional assistance may be given to the wrong students, illustrating the importance of criterion-related validity.
Reliability
Reliability refers to the consistency of the measurement (Linn & Miller, 2005). Suppose Mr. Garcia is teaching a unit on food chemistry in his tenth-grade class and gives an assessment at the end of the unit using test items from the teachers’ guide. Reliability is related to questions such as: How similar would the scores of the students be if they had taken the assessment on a Friday or Monday? Would the scores have varied if Mr. Garcia had selected different test items, or if a different teacher had graded the test? An assessment provides information about students by using a specific measure of performance at one particular time. Unless the results from the assessment are reasonably consistent over different occasions, different raters, or different tasks (in the same content domain), confidence in the results will be low, and they cannot be useful in improving student learning.
We cannot expect perfect consistency. Students’ memory, attention, fatigue, effort, and anxiety fluctuate, and so influence performance. Even trained raters vary somewhat when grading assessments such as essays, science projects, or oral presentations. Also, the wording and design of specific items influence students’ performances. However, some assessments are more reliable than others, and there are several strategies teachers can use to increase reliability.
- First, assessments with more tasks or items typically have higher reliability.
To understand this, consider two tests: one with five items and one with 50 items. Chance factors influence the shorter test more than the longer test. If a student does not understand one of the items on the five-item test, the total score is heavily influenced (reduced by 20 percent). In contrast, one confusing item on the 50-item test influences the total score much less (by only 2 percent). This does not mean that assessments should be inordinately long, but, on average, enough tasks should be included to reduce the influence of chance variations.
- Second, clear directions and tasks help increase reliability.
If the directions or the wording of specific tasks or items are unclear, then students have to guess what they mean, undermining the accuracy of the results.
- Third, clear scoring criteria are crucial in ensuring high reliability (Linn & Miller, 2005).
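The arithmetic behind the first strategy, that longer tests dilute the influence of any single item, can be made concrete. The function below is a minimal illustration of the 20-percent-versus-2-percent comparison described above; the function name is invented for the example.

```python
# Sketch of the test-length arithmetic described above: the cost of one
# misunderstood item shrinks as the number of items grows (items weighted equally).

def impact_of_one_item(num_items):
    """Percent of the total score lost if a student misses a single item."""
    return 100 / num_items

print(impact_of_one_item(5))   # 20.0 -> one confusing item swings a 5-item test by 20%
print(impact_of_one_item(50))  # 2.0  -> the same item barely moves a 50-item test
```

This is why, all else being equal, adding items in the same content domain raises reliability: each individual item, including any flawed one, carries less weight in the total.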
Absence of Bias
Bias occurs in assessment when there are components in the assessment method or the administration of the assessment that distort the performance of the student because of their characteristics such as gender, ethnicity, or social class (Popham, 2005).
Two types of assessment bias are important: offensiveness and unfair penalization.
An assessment is most likely to be offensive to a subgroup of students when negative stereotypes are included in the test. For example, an assessment in a health class could include items in which all the doctors are men and all the nurses are women. Or a series of questions in a social studies class could portray Latinos and Asians as immigrants rather than native-born Americans. In these examples, some female, Latino, or Asian students are likely to be offended by the stereotypes, and this can distract them from performing well on the assessment.
Unfair penalization occurs when items disadvantage one group not because they may be offensive but because of differential background experiences. For example, an item for math assessment that assumes knowledge of a particular sport may disadvantage groups not as familiar with that sport (e.g. American football for recent immigrants). Or an assessment on teamwork that asks students to model their concept of a team on a symphony orchestra is likely to be easier for those students who have attended orchestra performances—probably students from affluent families. Unfair penalization does not occur just because some students do poorly in class. For example, asking questions about a specific sport in a physical education class when information on that sport had been discussed in class is not unfair penalization as long as the questions do not require knowledge beyond that taught in class that some groups are less likely to have.
It can be difficult for new teachers teaching in multi-ethnic classrooms to devise interesting assessments that do not penalize any groups of students. Teachers need to think seriously about the impact of students’ differing backgrounds on the assessment they use in class. Listening carefully to what students say is important as is learning about the backgrounds of the students.
Assessments in the PE Setting
If you are teaching in a PE setting and you are thinking that assessment “looks different,” then you might consider reviewing some of the resources below to see how the principles above can help you gather evidence of student learning and skill development.
Conclusion
Formative assessment is most commonly referred to as assessment for learning, as the purpose is to inform your instructional decisions to guide student learning. In contrast, summative assessment is referred to as assessment of learning, as the purpose is to measure what students know at the conclusion of learning. To effectively use formative or summative assessment in the classroom, teachers must clearly define their learning objectives, choose assessment techniques that provide reliable individual evidence of student learning, and use data of student understanding to adjust their instruction. Technology should be considered when planning assessments as it may assist in increasing student motivation and analyzing resulting data.
References & Attributions
Attribution: “Definition of Formative Assessment” was adapted in part from GSC Lesson Planning 101 by Deborah Kolling and Kate Shumway-Pitt, licensed CC BY-SA 4.0
Attribution: “Adjusting Instruction Based on Assessment” was adapted in part from Educational Psychology by Kelvin Seifert, licensed CC BY 3.0. Download for free at http://cnx.org/contents/ce6c5eb6-84d3-4265-9554-84059b75221e@2.1
Attribution: “Technology & Formative Assessment” was adapted in part from Igniting Your Teaching with Educational Technology by Malikah R. Nu-Man and Tamika M. Porter, licensed CC BY 4.0
Attribution: “Summative Assessment” was adapted in part from Ch. 15 Teacher made assessment strategies by Kevin Seifert and Rosemary Sutton, licensed under a Creative Commons Attribution 4.0 International License.
Attribution: “High-Quality Assessments” section is adapted in part from Ch. 15 Teacher made assessment strategies by Kevin Seifert and Rosemary Sutton, licensed under a Creative Commons Attribution 4.0 International License.
Airasian, P. W. (2005). Classroom assessment: Concepts and applications (5th ed.). Boston, MA: McGraw Hill.
Chappuis, J., Stiggins, R. J., Chappuis, S., & Arter, J. A. (2012). Classroom assessment for student learning: Doing it right – using it well. Boston, MA: Pearson.
Linn, R. L., & Miller, M. D. (2005). Measurement and Assessment in Teaching 9th ed. Upper Saddle River, NJ: Pearson.
Popham, W. J. (2005). Classroom assessment: What teachers need to know (4th ed.). Boston, MA: Allyn & Bacon.
Popham, W. J. (2017). Classroom assessment: What teachers need to know, 8th edition. Boston, MA: Pearson
Popham, W. J., & Baker, E. L. (1970). Planning an instructional sequence. New Jersey: Prentice Hall.
Seifert, K. (May 11, 2011). Educational Psychology. OpenStax CNX. Download for free at http://cnx.org/contents/ce6c5eb6-84d3-4265-9554-84059b75221e@2.1
Stiggins, R. J. (2005). Student-involved assessment for learning. Upper Saddle River, NJ: Prentice Hall.
Timmis, S., Broadfoot, P., Sutherland, R., & Oldfield, A. (2016). Rethinking assessment in a digital age: Opportunities, challenges and risks. British Educational Research Journal, 42(3), 454-476.
Wiggins, G., & McTighe, J. (2011). The Understanding by Design Guide to Creating High-Quality Units. Alexandria, VA: Association for Supervision and Curriculum Development.
Wiggins, G., & McTighe, J. (2005). Understanding by design (2nd ed.). Alexandria, VA: Association for Supervision and Curriculum Development.