
Better tests? Better get some better answers first, part 2

Sunday, April 10th, 2011

This is part 2 of a 2-part series on the national tests under development to assess progress under the new Common Core State Standards (CCSS). Part 1 is here, and here’s an earlier post about the CCSS.


The Obama administration’s support for more testing under ESEA is based on an assertion that the tests will be better than the ones we have now.

How’s that going so far?

To find out more about this so-called “new generation of assessments,” I followed a link on the Illinois State Board of Education web site to a report by the Educational Testing Service. I learned that there are two groups, or consortia, developing the national standardized tests based on the CCSS.

One is the Partnership for Assessment of Readiness for College and Careers (PARCC). There are 24 states involved in this group, including Illinois. PARCC’s project management partner is Achieve.

PARCC’s goal is to “increase the rates at which students graduate from high school prepared for success in college and the workplace.” To achieve this goal, PARCC will develop “assessments to help educators improve teacher, school, and system effectiveness by providing a wider variety of data that is useful for the purposes of analyzing effectiveness, calibrating interventions, holding school professionals accountable for student outcomes, supporting strategic management of human resources, and identifying mid-year professional development and support needs for educators. This, in turn, is intended to lead to higher levels of teacher and administrator effectiveness and faster rates of student and school improvement” (ETS report p. 7).

That’s not exactly how John Dewey described the purposes of education, is it? Nothing, really, about meeting students’ needs, about the value of learning, about helping students develop their potential…. In fact, that goal statement would make a whole lot more sense if you substituted “widget production” for “student outcomes.”

The other group is the SMARTER Balanced Assessment Consortium (SBAC), whose project management partner is WestEd. The letters in SMARTER seem to stand for something, but I couldn't find out what. The SBAC group includes 31 states (some states are involved in both groups).

SBAC’s plans are even heavier on technology than PARCC’s, and depend the most on having the technology actually work. “The design of the SBAC Consortium is intended to strategically ‘balance’ summative, interim, and formative assessments through an integrated system of standards, curriculum, assessment, instruction, and teacher development, while providing accurate year-to-year indicators of students’ progress toward college and career readiness” (ETS report p.11).

The thing that jumped out at me from the pages of this guide was that THEY REALLY DON’T KNOW HOW TO DO WHAT THEY PLAN TO DO. What do you think? Here are a few more quotes from the ETS report (emphasis added throughout).

For example, about PARCC:

The end-of-year component will utilize 100 percent computer scoring. The Partnership (PARCC) plans to press for advances in automated scoring, including the use of artificial intelligence (p. 9).

While a specific analytic approach for calculating growth has not yet been determined, the objective will be to describe each students’ relative growth, expected growth given the students’ prior achievement, and the extent to which that student is ‘on track’ toward college and career readiness… A number of technical and psychometric challenges will be investigated during the development phase to determine if and how the scores from these multiple components can be aggregated to yield valid, reliable and legally defensible scores (p. 9).

About SBAC:

Student scores from both the performance tasks (one in reading, one in writing, and two in math per year) and the computer adaptive assessment will be combined for the annual summative score. Research will be conducted to inform decisions concerning the aggregation and weighting of the results from these two components (p. 13).

They call them “unresolved challenges”

The final essay in the ETS guide is “Finding Solutions, Moving Forward,” by Nancy Doorey, Director of Programs at ETS’s Center for K-12 Assessment and Performance Management, which prepared the report.

Granted, the purpose of the essay is to raise some important questions about the assessment development, but considering what is riding on these tests being BETTER tests, there seem to be way too many questions for comfort.

Here are a few:

How far can we push the frontiers of measurement during this development phase? Can we find better solutions to address two priorities that stand in tension, first, the need for highly precise and reliable data for high-stakes decisions; and second, the need for assessments that require students to apply knowledge and skills to solve complex, real world problems? (p. 15).

Studies will need to be carried out to gain deeper understanding than we currently have to support these decisions (p. 15).

Designing these components such that they can be placed onto a common scale and equated from year to year may require new approaches (p. 16).

Policy decisions concerning the weighting of the individual components into a composite annual score will need to be informed by data and field tests to ensure that the final composite scores are legally defensible for use in high-stakes decisions concerning individuals (p. 16).

Artificial intelligence engines exist that score the large majority of student essays at least as reliably as humans, and ‘send back’ those essays that are so unique or creative as to require human scoring. However, as we look to assess writing in the context of science, English literature, or history, as called for in the CCSS, new advances are needed to produce reliable sub-scores for both writing and the content area constructs assessed (p. 16).

The development of interactive computer tasks and their automated scoring engines is challenging, particularly when used within high-stakes assessments. To illustrate, the National Board of Medical Examiners has been working on interactive medical case simulations and automated scoring methodologies for use in their licensure examination for nearly 30 years and using them in operational exams for more than a decade. Examinees currently complete nine simulations, each of which takes about 25 minutes. However, to insure high comparability from one administration of the test to the next, each examinee takes approximately 480 selected response items. If results from the new K-12 assessment systems are to be used for comparable high-stakes decisions regarding individuals, such as the awarding of high school diplomas, similarly high thresholds for psychometric quality must be met (p. 17).

Finding solutions to the measurement challenges would help our schools identify individual student needs for intervention or acceleration throughout the year. The Race to the Top Assessment Program, therefore, will likely stimulate important advances in the measurement field (p. 16).

So, President Obama, Fed Ed Head Arne Duncan, and the entire USDE PR department: I’m not convinced.

And to you, dear readers – I did all this god-awful reading so you don’t have to. You’re welcome!

Better tests? Better get some better answers first, part 1

Saturday, April 9th, 2011

Here’s part 1 of my follow-up to last week’s post about the common core state standards and the national assessments being developed to go with them.

This week has seen a major flap over the Obama administration’s plans for testing under a new ESEA. The brouhaha started with remarks by President Obama at a student forum on March 28. In answer to a student’s question about whether there could be less testing, the President went on a riff against standardized tests and over-testing which brought back some fond memories of 2008 candidate Obama.

He railed that we’ve “piled on” too many standardized tests. He asserted that such tests should be given only “occasionally” as is the practice at his daughters’ private school, and even then shouldn’t have high-stakes attached. “Too often,” he said, “what we’ve been doing is using these tests to punish students or to, in some cases, punish schools.”

Well, a few people found it remarkable to hear this strong anti-test rhetoric from a president whose Department of Education is prepared to expand standardized testing to unprecedented levels in its proposal for reauthorizing federal education laws.

Most notable was what teacher Anthony Cody wrote about Obama’s response on his Education Week blog, Living in Dialogue, where he concluded that, “Either President Obama is trying to mislead people, or he is unfamiliar with the policies being advanced by his very own secretary of education, who was seated just a few feet away from him at this event.”

Almost immediately, someone from the Dept. of Education contacted Cody, asking him to post a “correction” to his “misinterpretation” of the President’s remarks.

Cody, who is a co-organizer of this summer’s Save our Schools March, came back with four questions for clarification, which the USDE’s press person eventually answered. You really need to read the whole exercise in bureaucratic doublespeak here.

But the upshot of the USDE’s argument is that we will have better tests. More formative assessments as opposed to one-shot tests. Tests of critical thinking as opposed to bubble tests. Growth measures as opposed to year-to-year apples and oranges comparisons. A “new generation of tests.” Tests scored by computers! Better tests.

Like it or not, our children’s futures depend on that promise being real and realizable.

OK. So what are the odds that we will really have better, more accurate and helpful tests?

Here’s just one description of these “better tests,” from a report by the International Center for Leadership in Education:

A new, next generation assessment program will accompany the Common Core State Standards. These assessments range far beyond the usual multiple-choice and short-answer questions. Instead, students will have to apply their knowledge to real-world situations through performance events…. Some performance events will take weeks to complete. These performance events will move instruction and assessment from Quadrants A (Acquisition) and B (application) to Quadrant D (Adaptation).

So, the odds are not too good. More in Part 2.

Common core standards: “A fundamental shift in the education marketplace”

Tuesday, April 5th, 2011

The recently completed common core state standards (CCSS) define what K-12 students need to know in order to be prepared for college and career. Work on the related assessments is now under way.

“This is a fundamental shift in the education marketplace,” writes Pascal D. Forgione, the executive director of the Educational Testing Service (ETS), in an introductory letter to a guide to the new assessments, a collection of essays on the ongoing efforts to develop these new tests.

Marketplace is definitely the key word for test publishers.

Common core state standards (CCSS) have already been adopted by 43 states and the District of Columbia. According to Mr. Forgione, this means that “more than 80 percent of our nation’s public school students and teachers will be focused on the same content standards for their students.”

Some of what the guide says about the CCSS sounds good. About the new mathematics standards — “Because sufficient time is allocated and important ideas are developed over many years, there will be less need for teachers to repeat the same content year after year” (p. 3). The new English language arts standards will “discard the five-paragraph straightjacket” (p. 4).

You’ll have to check out the CCSS web site and read the guide to the assessments yourself to decide whether you believe common national standards can improve public education in the US, and only time will tell if they do. Either way, most of us are stuck with them.

But red flags definitely begin to wave as the talk turns to the standardized national assessments meant to support the CCSS.

For example, “The introductions to grades K – 8 identify two to four critical areas for each grade level, setting priorities for teachers, professional developers, and assessment writers…Faithful assessments will focus most of their time on these critical areas…” (p. 4).

In the past, PURE has raised concerns about the way Illinois identified certain of its state standards as “suitable for testing,” and developed a state assessment framework around those items. The state urged districts not to use that framework as the curriculum, but really – teaching to the test was never made so easy. (See this fact sheet, “What’s testable?“)

What the Guide suggests to me is that there are far more questions than answers about the new national assessments. Will they be the “better tests” that President Obama and Fed Ed Head Arne Duncan say we need, and suggest that we’ll get? More on that here.

About the PURE Thoughts blogger
Julie Woestehoff is PURE's executive director. Julie's work has earned her a Ford Foundation award and recognition as one of the 100 Most Powerful Women in Chicago.