The New Art and Science of Classroom Assessment. Robert J. Marzano

…in fact, a basic assumption underlying the reliability coefficient. As Cronbach and Shavelson (2004) note:

      If, hypothetically, we could apply the instrument twice and on the second occasion have the person unchanged and without memory of his first experience, then the consistency of the two identical measurements would indicate the uncertainty due to measurement error. (p. 394)

      The traditional reliability coefficient simply tells us how similar the scores are across the first and second test administrations. In figure I.3, the scores on the first administration and the second administration (A) are quite similar. Student 1 receives a 97 on the first administration and a 98 on the second; student 2 receives a 92 and a 90, respectively; and so on. There are some differences between the scores, but not many. The last row of the table shows the correlation between the first administration and the second. That correlation (0.96) is, in fact, the reliability coefficient, and it is quite high.

      But let’s now consider another scenario, which we depict in the last column of figure I.3, Second Administration (B). In this scenario, students receive very different scores on the second administration. Student 1 receives a score of 97 on the first administration and a score of 82 on the second; student 2 receives a 92 and an 84, respectively. When the second administration of the test produces a vastly different pattern of scores, we would expect the correlation between the two administrations (that is, the reliability coefficient) to be quite low, which it is. The last row of the table indicates that the reliability coefficient is 0.32.
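
      To make the relationship concrete, the following sketch computes the correlation between two administrations. Only students 1 and 2 have scores quoted from figure I.3, so the remaining values below are hypothetical stand-ins chosen to mimic the two patterns; they reproduce the contrast between a high and a low reliability coefficient rather than the exact values 0.96 and 0.32.

        def pearson(xs, ys):
            """Pearson correlation: the traditional reliability coefficient
            when xs and ys are scores from two administrations of a test."""
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            sx = sum((x - mx) ** 2 for x in xs) ** 0.5
            sy = sum((y - my) ** 2 for y in ys) ** 0.5
            return cov / (sx * sy)

        first    = [97, 92, 88, 85, 80, 76]  # first administration
        second_a = [98, 90, 87, 86, 79, 77]  # similar pattern (scenario A)
        second_b = [82, 84, 95, 78, 88, 71]  # scrambled pattern (scenario B)

        print(f"Scenario A: {pearson(first, second_a):.2f}")  # about 0.99
        print(f"Scenario B: {pearson(first, second_b):.2f}")  # about 0.36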

      So how can educators obtain precise scores for individual students using classroom assessments? The answer to this question is that they can design multiple assessments and administer them over time.

      The preceding discussion indicates that as long as we think of tests as independent events whose scores educators must interpret in isolation, there is little hope for precision at the individual student level. However, if one shifts perspective from a single assessment to multiple assessments administered and interpreted over time, it becomes not only possible but relatively straightforward to generate a precise summary score for an individual student.

      To illustrate, consider the following five scores for an individual student on a specific topic gathered over the course of a grading period.

      70, 72, 75, 77, 81

      We have already discussed that any one of these scores in isolation probably does not provide a great deal of accuracy. Recall from figure I.2 (page 3) that even if all test reliabilities were 0.85, we would have to add and subtract about eleven points to construct an interval within which we are 95 percent sure the true score actually falls. But if we consider the pattern of these scores, we can have a relatively high degree of confidence in them, particularly as more time passes and we collect more scores.
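
      The eleven-point band follows from the standard error of measurement: SEM equals the standard deviation of scores times the square root of one minus the reliability, and a 95 percent interval is roughly the observed score plus or minus 1.96 SEMs. The computation below assumes a score standard deviation of about 15, a plausible value for a one-hundred-point scale; since figure I.2 is not reproduced here, treat that standard deviation as an assumption.

        import math

        def ci_half_width(sd, reliability, z=1.96):
            """Half-width of the 95 percent interval around an observed
            score, using the standard error of measurement."""
            return z * sd * math.sqrt(1 - reliability)

        half = ci_half_width(sd=15, reliability=0.85)  # assumed SD of 15
        observed = 81
        print(f"{observed} +/- {half:.1f}: "
              f"{observed - half:.0f} to {observed + half:.0f}")  # +/- 11.4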

      The pattern makes clear that the student’s scores have gradually increased over time. This makes intuitive sense: if the student is learning and the assessments are accurate, we would expect the scores to continually go up. The more scores that precede any given score, the better one can judge that score’s accuracy. In the series above, the first score is 70. In judging its accuracy, we would have to treat it like an isolated assessment, so we wouldn’t have much confidence in it. But with the second score of 72, we have two data points. Since we can reasonably assume that the student is learning, it makes sense that his or her score would increase, and we now have more confidence in the score of 72 than we did in the single score of 70. By the time we reach the fifth score of 81, we have amassed a good deal of antecedent information with which to judge its accuracy. Although we can’t say that 81 is precisely accurate, we can say the student’s true score is probably close to it. In subsequent chapters, we present techniques for specifying the accuracy of this final score of 81.
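
      One simple way to formalize this use of antecedent scores is to fit a trend line to them and ask whether the newest score is consistent with it. The sketch below uses ordinary least squares; it illustrates the general idea rather than the specific techniques we present in later chapters.

        def linear_fit(scores):
            """Ordinary least-squares line through (0, s0), (1, s1), ..."""
            n = len(scores)
            mx, my = (n - 1) / 2, sum(scores) / n
            slope = sum((x - mx) * (y - my)
                        for x, y in enumerate(scores)) / \
                    sum((x - mx) ** 2 for x in range(n))
            return slope, my - slope * mx  # slope, intercept

        scores = [70, 72, 75, 77, 81]
        slope, intercept = linear_fit(scores)
        predicted = intercept + slope * (len(scores) - 1)
        print(f"Growth per assessment: {slope:.1f}")            # 2.7 points
        print(f"Trend value at assessment 5: {predicted:.1f}")  # 80.4

      The trend value of 80.4 sits close to the observed 81, which is exactly what a pattern of steady learning and accurate assessment should produce.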

      It’s important to note that some data patterns would indicate a lack of accuracy in the test scores. To illustrate, consider the following pattern of scores.

      70, 76, 65, 82, 71

      Assuming that the student who exhibited these scores is learning over time, the pattern doesn’t make much sense. The student began and ended the grading period with about the same score; in between, the student exhibited some scores that were significantly higher and some that were significantly lower. This pattern implies that there was probably a great deal of error in the assessments. (Again, we discuss how to interpret such aberrant patterns in subsequent chapters.) This scenario illustrates the need for a new view of summative scores.
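
      The same trend-line idea can flag such aberrant patterns: when scores scatter widely around their trend, measurement error is the likelier explanation. The comparison below is again an illustrative sketch, not the book’s formal technique; it contrasts how far each pattern’s scores fall from their own least-squares line.

        def trend_residuals(scores):
            """Residuals from a least-squares line through the scores."""
            n = len(scores)
            mx, my = (n - 1) / 2, sum(scores) / n
            slope = sum((x - mx) * (y - my)
                        for x, y in enumerate(scores)) / \
                    sum((x - mx) ** 2 for x in range(n))
            intercept = my - slope * mx
            return [y - (intercept + slope * x)
                    for x, y in enumerate(scores)]

        for label, scores in [("steady ", [70, 72, 75, 77, 81]),
                              ("erratic", [70, 76, 65, 82, 71])]:
            res = trend_residuals(scores)
            rmse = (sum(r * r for r in res) / len(res)) ** 0.5
            print(f"{label}: RMSE {rmse:.1f}")  # about 0.5 versus 5.7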

      The practice of examining the mounting evidence that multiple assessments provide is a veritable sea change in the way we think about summative assessments for individual students. It runs counter to policies we have seen school leaders initiate, in which they make a sharp distinction between formative assessments and summative assessments. Within these policies, educators treat formative assessments as practice only and do not record scores from them. They consider summative tests the “real” assessments, and the scores from them play a substantive role in a student’s final grade.

      As the previous discussion illustrates, this makes little sense for at least two reasons. First, the single score educators derive from a summative assessment is not precise enough to support absolute decisions about individual students. Second, not recording formative scores is tantamount to ignoring all the historical assessment information teachers can use to estimate a student’s current status. We take the position that educators should speak of formative and summative scores, as opposed to formative and summative assessments, melding the two types of assessment into a unified continuum.

      Also, teachers should periodically estimate students’ current summative scores by examining the pattern of the antecedent scores. We describe this process in depth in chapter 6 (page 91). Briefly, though, consider the pattern of five scores we described previously: 70, 72, 75, 77, 81. A teacher could use this pattern to assign a current summative score without administering another assessment. The pattern clearly indicates steady growth for the student and makes the last score of 81 appear quite reasonable.

      The process of estimating a summative score, as opposed to relying only on the score from a single summative test, works best if the teacher uses a scale that automatically communicates what students already know and what they still have to learn. A single score of 81 (or 77, or pretty much any score on a one-hundred-point scale) doesn’t communicate much about a student’s knowledge of specific content. However, a score on a proficiency scale does, and it greatly increases the precision with which a teacher can estimate an individual student’s summative score.

      We discuss the nature and function of proficiency scales in depth in chapter 3. For now, figure I.4 provides an example of a proficiency scale.

       Figure I.4: Sample proficiency scale for fourth-grade science.
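
      Since the specific fourth-grade science descriptors in figure I.4 are not reproduced here, the sketch below shows only the generic structure of a Marzano-style proficiency scale: each score level maps to an explicit statement of content, which is why a scale score communicates more than a point on a one-hundred-point scale. The descriptors are generic placeholders, not the figure’s actual wording.

        # Generic placeholders, not the actual descriptors from figure I.4.
        proficiency_scale = {
            4.0: "In addition to score 3.0 content, inferences and "
                 "applications beyond what was taught",
            3.0: "The target content of the learning goal",
            2.0: "Simpler content: key vocabulary and basic processes",
            1.0: "With help, partial success with 2.0 and 3.0 content",
            0.0: "Even with help, no success",
        }

        def describe(score):
            """Descriptor for the highest whole level at or below a score."""
            level = max(l for l in proficiency_scale if l <= score)
            return f"{score} -> {proficiency_scale[level]}"

        print(describe(3.0))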

      Notice that the proficiency…
