Section 5
Current Commercially Available Interim Assessment Systems
Having determined their purposes and intended uses and thoughtfully considered the test design issues discussed in the prior section, educational leaders can choose among the many assessment systems on the market. In this section, we offer some general observations about criteria policy makers might consider when deciding whether to purchase commercially available assessment systems.
Many test publishing companies offer interim assessment products, often labeled “formative” or “benchmark” assessments. These assessments are marketed to serve a plethora of purposes, including serving as diagnostic tools, providing information that can be used to guide instruction, determining student placement, measuring growth or progress over time, and predicting success on a future assessment. Typically these systems consist of item banks, administration tools, and customized reports. These systems often are computer-based and even web-based, allowing students to take the test whenever they wish (or their teacher wishes) and wherever a computer with an internet connection is available. Some systems also offer the option of creating pencil-and-paper tests. Tests can be constructed by teachers, fixed by an administrator, or administered adaptively.
The items are “linked” to content standards,⁴ and results typically are reported in terms of number correct. The “diagnostic” portion tends to be a summary of results by content standard, allowing the teacher to see which standards students perform well on and which they do not. These systems often provide a variety of report options, with different levels of aggregation. A student-level report indicates which items a student answered correctly or incorrectly, while a classroom report might indicate the percentage of students answering each item correctly or the average percent correct for each content standard.
Some of these products have been linked to state end-of-year assessments, allowing them to serve a predictive function. Publishers have reported correlations and other indices to document the statistical link between the interim and summative assessments as evidence of the interim assessment’s predictive ability. Most products include at least some measures of their reliability as well.
These products are marketed as being very flexible, giving instant feedback, and providing diagnostic information on which areas need further instruction. However, these systems generally fail to provide rich diagnostic feedback regarding student thinking. That is, few provide any information on why a student answered an item incorrectly or how best to provide corrective feedback. For instance, many of these computer-based assessments rely primarily on multiple-choice items. Unless each wrong answer provides insight into the nature of the student’s incorrect thinking, the only information such an item yields is essentially a dichotomous correct/incorrect response. Likewise, open-ended items need to result in more than a score, preferably a summary report of the types of errors a student is making or of the areas of strength and weakness in a given performance (e.g., his or her writing).
⁴ Unfortunately, the strength of the alignment between such commercial tests and the state content standards is rarely evaluated by independent analysts, so the “link” between the two is often based on the publishers’ claims.