The New Art and Science of Classroom Assessment. Robert J. Marzano

      RL.7.1

      Cite several pieces of textual evidence to support analysis of what the text says explicitly as well as inferences drawn from the text.

      RL.7.2

      Determine a theme or central idea of a text and analyze its development over the course of the text; provide an objective summary of the text.

      RL.7.10

      By the end of the year, read and comprehend literature, including stories, dramas, and poems, in the grades 6–8 text complexity band proficiently, with scaffolding as needed at the high end of the range.

      WHST.6–8.1.B

      Support claim(s) with logical reasoning and relevant, accurate data and evidence that demonstrate an understanding of the topic or text, using credible sources.

      WHST.6–8.10

      Write routinely over extended time frames (time for reflection and revision) and shorter time frames (a single sitting or a day or two) for a range of discipline-specific tasks, purposes, and audiences. (NGA & CCSSO, 2010a)

      In effect, then, the teacher uses the score on one test to represent a student’s standing on five separate standards. Such an approach gives the perception that teachers are addressing standards, but in reality it is a record-keeping convention that wastes teachers’ time and renders the standards inconsequential. Indeed, we believe this approach is the antithesis of using standards meaningfully.

       Relying on Sampling Across Standards

      At first glance, it might appear that designing assessments that sample content from multiple standards solves the problem of too much content. If a teacher has seventy-three standards statements to cover in a year, he or she can design assessments that include items from multiple statements. One assessment might have items from three or more statements. If a teacher systematically samples across the standards in such a way as to emphasize all topics equally, then in the aggregate, the test scores for a particular student should paint an accurate picture of the student’s standing within the subject area. This approach differs from, and improves on, tagging because the teacher designs assessments by starting with the standards; with tagging, the teacher designs assessments first and then looks for standards that appear to be related.
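      To get a feel for the scale involved, the following sketch works through the coverage arithmetic implied above. The seventy-three standards statements and the three-or-more-statements-per-assessment figure come from the paragraph itself; the number of items on a typical classroom test is a hypothetical value we supply only for illustration.

```python
import math

# Illustrative arithmetic only. The 73 standards statements and the
# "three or more statements per assessment" figure come from the text;
# items_per_assessment is a hypothetical value, not from the authors.
standards_statements = 73
statements_per_assessment = 3
items_per_assessment = 12   # assumed length of a typical classroom test

# Assessments needed just to include every statement one time.
one_full_pass = math.ceil(standards_statements / statements_per_assessment)

# Items that any single assessment can devote to a given statement.
items_per_statement = items_per_assessment / statements_per_assessment

print(one_full_pass)        # 25 assessments for a single pass
print(items_per_statement)  # 4.0 items per statement on each test
```

      Even in this simplified scenario, each statement receives only a few items on any given test, a limitation the following example makes concrete.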

      Even though sampling has an intuitive logic to it, it still doesn’t work well with classroom assessments. Indeed, sampling was designed for large-scale assessments, but even there it doesn’t work very well. To illustrate, consider the following example:

      You are tasked with creating a test of science knowledge and skills for grade 5 students. The school will report test results at both the individual and school levels to help students, parents, teachers, and leaders understand how well students are learning the curriculum. The test must address a variety of topics such as X, Y, Z and, in order to effectively assess their knowledge, many of the items require students to construct and justify responses. Some of the items are multiple choice.

      Pilot testing of items indicates that students require about 10 minutes to complete a constructed response item and about two minutes to complete a multiple-choice item. Your team has created 32 constructed response items and 16 multiple choice items that you feel cover all topics in the grade 5 science curriculum. Based on your estimates of how much time a student needs to complete items, the test will require approximately 6 hours to complete, not including time for set up and instructions, and breaks. And that’s just one content area test. (Childs & Jaciw, 2003, p. 8)

      We can infer from the comments of Ruth A. Childs and Andrew P. Jaciw (2003) that adequate sampling, even for three topics, requires a very long assessment. As a side note, Childs and Jaciw (2003) imply that fifth-grade science involves three topics only (for example, X, Y, and Z). In fact, Simms (2016) has determined that fifth-grade science involves at least twelve topics, four times the amount of content that Childs and Jaciw’s (2003) example implies.
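      For readers who want to verify the estimate, the arithmetic behind the roughly six-hour figure in the quoted example is straightforward; the item counts and per-item times below are taken directly from Childs and Jaciw (2003).

```python
# Time estimate for the grade 5 science test described by Childs and
# Jaciw (2003): 32 constructed-response items at about 10 minutes each
# and 16 multiple-choice items at about 2 minutes each.
constructed_response_items = 32
multiple_choice_items = 16
minutes_per_constructed_response = 10
minutes_per_multiple_choice = 2

total_minutes = (constructed_response_items * minutes_per_constructed_response
                 + multiple_choice_items * minutes_per_multiple_choice)

print(f"{total_minutes} minutes, about {total_minutes / 60:.1f} hours")
# 352 minutes, about 5.9 hours -- roughly six hours before setup time,
# instructions, and breaks are added.
```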

      Finally, even with a relatively slim version of the content involved in fifth-grade science (three topics as opposed to twelve), and a test that requires six hours to complete, the sampling process might not be robust enough to justify reporting scores for individual students. Childs and Jaciw (2003) describe the following concern for any test that purports to provide accurate scores for individual students:

      Whether there is enough information at the student level to report subscores may be a concern. For example, research by Gao, Shavelson, and Baxter (1994) suggests that each student must answer at least nine or ten performance tasks to avoid very large effects of person-by-item interactions. To produce reliable subscores, even more items may have to be administered. Given that there are limits in test administration time, it may not be feasible to administer enough items to support student-level subscores. Instead, only overall scores might be reported at the student level, while both overall scores and subscores are reported at the school level. (p. 8)
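      A back-of-envelope calculation shows why student-level subscores quickly become impractical. It combines the nine-to-ten performance tasks per topic cited in the quotation, the roughly ten minutes per constructed-response item from the earlier example, and Simms’s (2016) count of at least twelve topics for fifth-grade science; combining these figures is our extrapolation, not a calculation either source performs.

```python
# Rough extrapolation combining figures quoted in this section; neither
# Childs and Jaciw (2003) nor Gao, Shavelson, and Baxter (1994) perform
# this exact calculation.
tasks_per_topic = 9       # minimum suggested by Gao et al. (1994)
minutes_per_task = 10     # constructed-response estimate quoted earlier
topics = 12               # fifth-grade science topics per Simms (2016)

total_minutes = tasks_per_topic * minutes_per_task * topics
print(f"{total_minutes} minutes, about {total_minutes / 60:.0f} hours of testing")
# 1080 minutes, about 18 hours -- far beyond realistic limits on
# administration time, which is why only school-level subscores tend to
# be defensible.
```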

      Despite these clear flaws in sampling as a basis for test design, educators use it all the time. Everyone in the system (students, teachers, leaders, parents) relies on the resulting information to make important decisions that influence student grades, placement in classes and coursework, and advancement to the next grade or course.

      As we mention in the introduction, using proficiency scales solves a variety of assessment problems, sampling among them. In a system that uses proficiency scales as the measurement tool, one might lose the ability to generalize across an entire content area from a single test, but one gains immense clarity about particular slices of the target domain (for example, fifth-grade science).

      Clearly, standards statements as currently written are not effective vehicles for driving classroom assessment. We recommend that educators rewrite standards statements so they provide a clear and unequivocal focus for classroom assessments. We call these rewritten standards statements focus statements. Focus statements translate into measurement topics. As the name implies, these measurement topics are considered important enough to assess multiple times at the school or district level in an effort to determine the most accurate scores for individual students. To illustrate, we present figure 1.3.

[Figure 1.3: Standards statements with corresponding focus statements and measurement topics.]

      Source for standards: Adapted from McREL, 2014a.

      The focus statements in figure 1.3 contain the essence of the content in the full standards statement with enough detail to provide guidance for assessment, but not so much as to add unnecessary complexity. As we demonstrate in chapter 2, proficiency scales add even more detail, but focus statements are a useful step in the process of identifying critical content. As we indicate in the last column of figure 1.3, once educators articulate focus statements, it is easy to translate them into measurement topics.

      The wording of the focus statements in figure 1.3 highlights the type of knowledge they represent. Those that begin with the words knows or understands are examples of declarative knowledge. Those that begin with the word executes are examples of procedural knowledge. It is important to note that we did this to make a point: the content embedded in standards statements comes in two different forms, declarative and procedural.
