This is part 2 of a 2-part series on the national tests under development to assess progress under the new Common Core State Standards (CCSS). Part 1 is here, and here’s an earlier post about the CCSS.
The Obama administration’s support for more testing under ESEA is based on an assertion that the tests will be better than the ones we have now.
How’s that going so far?
To find out more about this so-called “new generation of assessments,” I followed a link on the Illinois State Board of Education website to a report by the Educational Testing Service. I learned that there are two groups, or consortia, developing the national standardized tests based on the CCSS.
One is the Partnership for Assessment of Readiness for College and Careers (PARCC). There are 24 states involved in this group, including Illinois. PARCC’s project management partner is Achieve.
PARCC’s goal is to “increase the rates at which students graduate from high school prepared for success in college and the workplace.” To achieve this goal, PARCC will develop “assessments to help educators improve teacher, school, and system effectiveness by providing a wider variety of data that is useful for the purposes of analyzing effectiveness, calibrating interventions, holding school professionals accountable for student outcomes, supporting strategic management of human resources, and identifying mid-year professional development and support needs for educators. This, in turn, is intended to lead to higher levels of teacher and administrator effectiveness and faster rates of student and school improvement” (ETS report p. 7).
That’s not exactly how John Dewey described the purposes of education, is it? Nothing, really, about meeting students’ needs, about the value of learning, about helping students develop their potential…. In fact, that goal statement would make a whole lot more sense if you substituted “widget production” for “student outcomes.”
The other group is the SMARTER Balanced Assessment Consortium (SBAC), whose project management partner is WestEd. The letters in SMARTER seem to stand for something, but I couldn’t find out what. The SBAC group includes 31 states (some states are involved in both groups).
SBAC’s plans are even heavier on technology than PARCC’s, and depend most heavily on the technology actually working. “The design of the SBAC Consortium is intended to strategically ‘balance’ summative, interim, and formative assessments through an integrated system of standards, curriculum, assessment, instruction, and teacher development, while providing accurate year-to-year indicators of students’ progress toward college and career readiness” (ETS report p. 11).
The thing that jumped out at me from the pages of this guide was that THEY REALLY DON’T KNOW HOW TO DO WHAT THEY PLAN TO DO. What do you think? Here are a few more quotes from the ETS report (emphasis added throughout).
For example, about PARCC:
The end-of-year component will utilize 100 percent computer scoring. The Partnership (PARCC) plans to press for advances in automated scoring, including the use of artificial intelligence (p. 9).
While a specific analytic approach for calculating growth has not yet been determined, the objective will be to describe each student’s relative growth, expected growth given the student’s prior achievement, and the extent to which that student is ‘on track’ toward college and career readiness… A number of technical and psychometric challenges will be investigated during the development phase to determine if and how the scores from these multiple components can be aggregated to yield valid, reliable and legally defensible scores (p. 9).
Student scores from both the performance tasks (one in reading, one in writing, and two in math per year) and the computer adaptive assessment will be combined for the annual summative score. Research will be conducted to inform decisions concerning the aggregation and weighting of the results from these two components (p. 13).
They call them “unresolved challenges”
The final essay in the ETS guide is “Finding Solutions, Moving Forward,” by Nancy Doorey, Director of Programs at ETS’s Center for K-12 Assessment and Performance Management, which prepared the report.
Granted, the purpose of the essay is to raise some important questions about the assessment development, but considering what is riding on these tests being BETTER tests, there seem to be way too many questions for comfort.
Here are a few:
How far can we push the frontiers of measurement during this development phase? Can we find better solutions to address two priorities that stand in tension, first, the need for highly precise and reliable data for high-stakes decisions; and second, the need for assessments that require students to apply knowledge and skills to solve complex, real world problems? (p. 15).
Studies will need to be carried out to gain deeper understanding than we currently have to support these decisions (p. 15).
Designing these components such that they can be placed onto a common scale and equated from year to year may require new approaches (p. 16).
Policy decisions concerning the weighting of the individual components into a composite annual score will need to be informed by data and field tests to ensure that the final composite scores are legally defensible for use in high-stakes decisions concerning individuals (p. 16).
Artificial intelligence engines exist that score the large majority of student essays at least as reliably as humans, and ‘send back’ those essays that are so unique or creative as to require human scoring. However, as we look to assess writing in the context of science, English literature, or history, as called for in the CCSS, new advances are needed to produce reliable sub-scores for both writing and the content area constructs assessed (p. 16).
The development of interactive computer tasks and their automated scoring engines is challenging, particularly when used within high-stakes assessments. To illustrate, the National Board of Medical Examiners has been working on interactive medical case simulations and automated scoring methodologies for use in their licensure examination for nearly 30 years and using them in operational exams for more than a decade. Examinees currently complete nine simulations, each of which takes about 25 minutes. However, to insure high comparability from one administration of the test to the next, each examinee takes approximately 480 selected response items. If results from the new K-12 assessment systems are to be used for comparable high-stakes decisions regarding individuals, such as the awarding of high school diplomas, similarly high thresholds for psychometric quality must be met (p. 17).
Finding solutions to the measurement challenges would help our schools identify individual student needs for intervention or acceleration throughout the year. The Race to the Top Assessment Program, therefore, will likely stimulate important advances in the measurement field (p. 16).
So, to President Obama, Fed Ed Head Arne Duncan, and the entire USDE PR Department: I’m not convinced.
And to you, dear readers: I did all this god-awful reading so you don’t have to. You’re welcome!