Formative Assessment & Standards-Based Grading. Robert J. Marzano

rel="nofollow" href="#ulink_e41e1e01-b33c-575c-b4e2-bd7a8b39dd9e">table 1.5 provides a number of useful generalizations about learning goals and, by extrapolation, about learning progressions. First, setting goals appears to have a notable effect on student achievement in its own right. This is evidenced by the substantial ESs reported in table 1.5 for the general effects of goal setting. For example, Kevin Wise and James Okey (1983) reported an ES of 1.37, Mark Lipsey and David Wilson (1993) reported an ES of 0.55, and Herbert Walberg (1999) reported an ES of 0.40. Second, specific goals have more of an impact than do general goals. Witness Mark Tubbs’s (1986) ES of 0.50 associated with setting specific goals as opposed to general goals. Edwin Locke and Gary Latham (1990) reported ESs that range from 0.42 to 0.82 regarding specific versus general goals, and Steve Graham and Dolores Perin (2007) reported an ES of 0.70 (for translations of ESs into percentile gains, see appendix B). Third, goals must be at the right level of difficulty for maximum effect on student achievement. This is evidenced in the findings reported by Tubbs (1986), Anthony Mento, Robert Steel, and Ronald Karren (1987), Locke and Latham (1990), Kluger and DeNisi (1996), and Matthew Burns (2004). Specifically, goals must be challenging enough to interest students but not so difficult as to frustrate them (for a detailed discussion of learning goals, see Marzano, 2009).

       The Imprecision of Assessments

      One fact that must be kept in mind in any discussion of assessment—formative or otherwise—is that all assessments are imprecise to one degree or another. This is explicit in a fundamental equation of classical test theory that can be represented as follows:

      Observed score = true score + error score

      Marzano (2006) explained:

      This equation indicates that a student’s observed score on an assessment (the final score assigned by the teacher) consists of two components—the student’s true score and the student’s error score. The student’s true score is that which represents the student’s true level of understanding or skill regarding the topic being measured. The error score is the part of an observed score that is due to factors other than the student’s level of understanding or skill. (pp. 36–37)
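      In the conventional notation of classical test theory (a brief elaboration, not part of the quoted passage), the same relationship carries a definition of reliability that matters for the discussion that follows. When the true and error components are assumed to be independent, their variances add, and reliability is the proportion of observed-score variance attributable to true scores:

      Observed score variance = true score variance + error score variance

      Reliability = true score variance ÷ observed score variance

      A reliability of 0.85, then, means that roughly 85 percent of the variation in observed scores reflects real differences among students; the remainder is error.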

      In technical terms, every score assigned to a student on every assessment probably contains some part that is error. To illustrate the consequences of error in the interpretation of assessment scores, consider table 1.6.

      [Table 1.6: ranges of possible true scores for an observed score of 70 at reliabilities from 0.85 to 0.45]

      Note: 95% confidence interval based on the assumption of a standard deviation of 12 points.

      Table 1.6 shows what can be expected in terms of the amount of error that surrounds a score of 70 when an assessment has reliabilities that range from 0.85 to 0.45. In all cases, the student is assumed to have received a score of 70 on the assessment. That is, the student’s observed score is 70.

      First, let us consider the precision of an observed score of 70 when the reliability of the assessment is 0.85. This is the typical reliability one would expect from a standardized test or a state test (Lou et al., 1996). Using statistical formulas, it is possible to compute a range of scores in which you are 95 percent sure the true score actually falls. Columns three, four, and five of table 1.6 report that range. In the first row of table 1.6, we see that for an assessment with a reliability of 0.85 and an observed score of 70, one would be 95 percent sure the student’s true score is anywhere between a score of 60 and 80. That is, the student’s true score might really be as low as 60 or as high as 80 even though he or she receives a score of 70. This is a range of 20 points. But this assumes the reliability of the assessment to be 0.85, which, again, is what you would expect from a state test or a standardized test.

      Next, let us consider the range with classroom assessments. To do so, consider the second row of table 1.6, which pertains to a reliability of 0.75. This is probably the highest reliability you could expect from an assessment designed by a teacher, school, or district (see Lou et al., 1996). Now the low score is 58 and the high score is 82—a range of 24 points. To appreciate the full impact of the information presented in table 1.6, consider the last row, which depicts the range of possible true scores when the reliability is 0.45. This reliability is, in fact, probably more typical of what you could expect from a teacher-designed classroom assessment (Marzano, 2002). The lowest possible true score is 52 and the highest possible true score is 88—a range of 36 points.
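      Although table 1.6 reports these ranges directly, the computation behind them is straightforward. Assuming the 12-point standard deviation stated in the note and the conventional standard error of measurement (a reconstruction of the likely procedure; the text itself does not show the formulas), the range for any reliability follows from:

      Standard error of measurement (SEM) = standard deviation × √(1 − reliability)

      95% range ≈ observed score ± (1.96 × SEM)

      For the 0.75 row, for example, SEM = 12 × √0.25 = 6, and 70 ± (1.96 × 6) yields roughly 58 to 82, the 24-point range described above.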

      Quite obviously, no single assessment can ever be relied on as an absolute indicator of a student’s status. Gregory Cizek (2007) added a perspective on the precision of assessments in his discussion of the mathematics section of the state test in a large midwestern state. He explained that the total score reliability for the mathematics portion of the test in that state at the fourth grade is 0.87—certainly an acceptable level of reliability. That test also reports students’ scores in subareas using the National Council of Teachers of Mathematics categories: algebra, data analysis and probability, estimation and mental computation, geometry, and problem-solving strategies. Unfortunately, the reliability of these subscale scores ranges from 0.33 to 0.57 (p. 103). As evidenced by table 1.6, reliabilities this low would translate into a wide range of possible true scores.

      Imprecision in assessments can come in many forms. It can be a function of poorly constructed items on a test, or it can come from students’ lack of attention or effort when taking a test. Imprecision can also be a function of teachers’ interpretations of assessments. A study done by Herman and Choi (2008) asked two questions: How accurate are teachers’ judgments of student learning, and how does accuracy of teachers’ judgments relate to student performance? They found that “the study results show that the more accurate teachers are in their knowledge of where students are, the more effective they may be in promoting subsequent subject learning” (p. 18). Unfortunately, they also found that “average accuracy was less than 50%” (p. 19). Margaret Heritage, Jinok Kim, Terry Vendlinski, and Joan Herman (2008) added that “inaccurate analyses or inappropriate inference about students’ learning status can lead to errors in what the next instructional steps will be” (p. 1). They concluded that “using assessment information to plan subsequent instruction tends to be the most difficult task for teachers as compared to other tasks (for example, assessing student responses)” (p. 14).

      One very important consideration when interpreting scores from assessments or making inferences about a student based on an assessment is the native language of the student. Christy Kim Boscardin, Barbara Jones, Claire Nishimura, Shannon Madsen, and Jae-Eun Park (2008) conducted a review of performance assessments administered in high school biology courses. They focused their review on English language learners, noting that “the language demand of content assessments may introduce construct-irrelevant components into the testing process for EL students” (p. 3). Specifically, they found that the students with a stronger grasp of the English language would perform better on the tests even though they might not have had any better understanding of the science content. The same concept holds true for standardized tests in content-specific areas. They noted that “the language demand of a content assessment is a potential threat to the validity of the assessment” (p. 3).

      At
