Making Classroom Assessments Reliable and Valid. Robert J. Marzano


of the content already established as valid. The information provided by a set of CAs can be interpretable in terms of how well students might perform on interim and end-of-year assessments.

Construct Validity: Based on statistical analysis, the items on a particular CA are highly correlated for a particular topic. The information provided by a set of CAs can be interpretable in terms of specific knowledge or skill that can be directly taught.

Content Validity: The scores on a specific CA clearly measure specific content. The information provided by a set of CAs can be interpreted in terms of students’ status on an explicit progression of knowledge.

      The argument-based perspective is perfectly suited for classroom teachers, and classroom teachers are the perfect individuals to generate Kane’s (1992, 2001, 2009) network of inferences leading to conclusions and decisions from the scores generated from CAs. To do this effectively, though, teachers must utilize standards as the basis for designing tests.

      As the discussion in the introduction illustrates, CAs have an advantage over traditional assessments in that they typically have a narrow focus. Also, within K–12 education, the topics on which CAs should focus have been articulated in content standards. This would seem to make the various types of validity relatively easy for classroom teachers to establish, and it does so if standards are used wisely. Unfortunately, state standards usually require a great deal of interpretation and adaptation to be used effectively in guiding the development of CAs. How teachers interpret the standards makes all the difference in the world in terms of their utility. As Schneider et al. (2013) note, the way educators interpret state standards plays a major role in assessment development.

      Now we consider the standards movement, as well as the problem with standards.

      The K–12 standards movement in the United States has a long, intriguing history. (For a detailed discussion, see Marzano & Haystead, 2008; Marzano et al., 2013.) Arguably, the standards movement started in 1989 at the first education summit when the National Education Goals Panel (1991, 1993) set national goals for the year 2000. Millions of dollars were made available to develop sample national standards in all the major subject areas. States took these national-level documents and created state-level versions. Probably the most famous attempts to influence state standards at the national level came in the form of the CCSS and the Next Generation Science Standards (NGSS). States have continued to adapt national-level documents such as these to meet the needs and values of their constituents.

      While it might appear that standards help teachers design valid CAs (at least in terms of content validity), this is not necessarily the case. In fact, in many situations, state standards make validity of all types problematic to achieve. To illustrate, consider the following Common Core State Standard for eighth-grade reading: “Determine the meaning of words and phrases as they are used in a text, including figurative, connotative, and technical meanings; analyze the impact of specific word choices on meaning and tone, including analogies or allusions to other texts” (RI.8.4; NGA & CCSSO, 2010a, p. 39).

      While this standard provides some direction for assessment development, it contains a great deal of content. Specifically, this standard includes the following information and skills.

      ■ Students will understand what figurative, connotative, and technical meanings are.

      ■ Students will be able to identify specific word choices an author made.

      ■ Students will be able to analyze the impact of specific word choices.

      ■ Students will understand what tone is.

      ■ Students will understand what an analogy is.

      ■ Students will understand what an allusion is.

      ■ Students will be able to analyze analogies and allusions.

      The volume of discrete pieces of content in this one standard creates an obvious problem: there is too much to teach and assess. As mentioned previously, in their analysis of the CCSS, Marzano et al. (2013) identify seventy-three standard statements for eighth-grade English language arts, as articulated in the CCSS. If one makes a conservative assumption that each of those statements contains about five component skills like those listed previously, this would mean that an eighth-grade teacher is expected to assess 365 specific pieces of content for ELA alone in a 180-day school year. According to Marzano and colleagues (2013), the same pattern can be observed in many state standards documents.
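
      The arithmetic behind this estimate can be checked with a short sketch. The three figures come from the passage; the variable names are illustrative assumptions, and the per-day rate is simply the implication of the numbers rather than a figure the authors report.

```python
# Figures taken from the passage above; variable names are illustrative.
standard_statements = 73   # eighth-grade ELA standard statements in the CCSS
pieces_of_content = 365    # total pieces of content cited in the passage
school_days = 180          # length of the school year

# Average number of component skills per statement implied by the totals.
components_per_statement = pieces_of_content / standard_statements

# Pieces of content to assess per school day if all were covered.
pieces_per_day = pieces_of_content / school_days

print(f"Components per statement: {components_per_statement:.0f}")
print(f"Pieces of content per school day: {pieces_per_day:.2f}")
```

      Under these numbers, a teacher would need to assess roughly two new pieces of content every school day for ELA alone.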

      Given the fact that it is virtually impossible to teach all the content embedded in national or state standards for a given subject area, a teacher must unpack standards to identify what will be assessed within a system of CAs. Ideally, the district or school does this unpacking. Tammy Heflebower, Jan K. Hoegh, and Phil Warrick (2014) explain how a school or district can lead a systematic effort to identify between fifteen and twenty-five essential topics that should be the focus of CAs. Briefly, the process involves prioritizing standards and the elements within those standards that are absolutely essential to assess. When identifying and articulating essential topics in standards, schools and districts must be cognizant of their dimensionality.

      In general, there should be one topic addressed in a CA. Parkes (2013) explains that this is a foundational concept in measurement theory: “any single score from a measurement is to represent a single quality” (p. 107). This is referred to as making a CA unidimensional (technically stated, a unidimensional test “measures only one dimension or only one latent trait” [AERA et al., 2014, p. 224]). The notion that unidimensionality is foundational to test theory can be traced back to the middle of the 1900s. For example, in a foundational article on measurement theory in 1959, Frederic M. Lord notes that a test is a “collection of tasks; the examinee’s performance on these tasks is taken as an index of [a student’s] standing along some psychological dimension” (p. 473). Over forty years later, David Thissen and Howard Wainer (2001) explain:

      Before the responses to any set of items are combined into a single score that is taken to be, in some sense, representative of the responses to all of the items, we must ascertain the extent to which the items “measure the same thing.” (p. 10)

      Without unidimensionality, a score on a test is difficult to interpret. For example, assume that two students receive a score of 70 on the same test, but that test measures two dimensions. This is depicted in figure 1.1.

      Note: Black = patterns; gray = data analysis. Total possible points for black (patterns) = sixty; total possible points for gray (data analysis) = forty.

      Figure 1.1: Two students’ scores on a two-dimensional test.
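
      The interpretation problem can be made concrete with a minimal sketch. The point maxima (sixty for patterns, forty for data analysis) come from the note to figure 1.1; the two students' score breakdowns are hypothetical values chosen only so that each totals 70, and they are not the values shown in the figure.

```python
# Maximum points per dimension, from the note to figure 1.1.
MAX_POINTS = {"patterns": 60, "data_analysis": 40}

# Hypothetical breakdowns (NOT the figure's actual values); each sums to 70.
students = {
    "Student A": {"patterns": 50, "data_analysis": 20},
    "Student B": {"patterns": 30, "data_analysis": 40},
}

def total_score(scores):
    """Single overall score: the sum of points across both dimensions."""
    return sum(scores.values())

def percent_by_dimension(scores):
    """Percentage of available points earned on each dimension."""
    return {dim: round(100 * pts / MAX_POINTS[dim]) for dim, pts in scores.items()}

for name, scores in students.items():
    print(name, total_score(scores), percent_by_dimension(scores))
```

      Both hypothetical students receive the same total, yet one is strong in patterns and weak in data analysis while the other shows the reverse. The single score of 70 hides that difference.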
