Friday, December 22, 2006

Proficiency for All, Part I: A Brief History of the NAEP

I’ve talked before about how reaching 100% proficiency for all children in all subjects by 2014 is a mathematical impossibility, but this excellent article from the November 29th issue of Education Week does the best job I’ve seen of explaining both why it can’t happen and what that means for us all as we try to reform education in the United States. One of the pieces I find most interesting is the history it gives of the National Assessment of Educational Progress (NAEP), which seems incredibly secretive for a test that is considered to be the “nation’s report card.” I’ve pasted that section below; the full article can be found via the link at the bottom of the post:
How did we get standards so divorced from reality, even for students in the middle of the distribution? Few Americans realize how unscientific the process for defining proficiency was—and must be. NAEP officials assembled some teachers, businesspeople, parents, and others, presented these judges with NAEP questions, and asked their opinions about whether students should get them right. No comparison with actual performance, even in the best schools, was required. Judges’ opinions were averaged to calculate how many NAEP questions proficient students should answer.


From the start, experts lambasted this process. When officials first contemplated defining proficiency, the NAEP board commissioned a 1982 study by Willard Wirtz, a former U.S. secretary of labor, and his colleague Archie Lapointe, a former executive director of NAEP. They reported that “setting levels of failure, mediocrity, or excellence in terms of NAEP percentages would be a serious mistake.” Indeed, they said, it would be “fatal” to NAEP’s credibility. Harold Howe II, a former U.S. commissioner of education responsible for NAEP’s early implementation, warned the assessment’s administrators that expecting all students to achieve proficiency “defies reality.”

In 1988, Congress ordered NAEP to determine the proficient score. Later, U.S. Sen. Edward M. Kennedy’s education aide, who wrote the bill’s language, testified that Congress’ demand was “deliberately ambiguous” because neither congressional staff members nor education experts could formulate it precisely. “There was not an enormous amount of introspection,” the aide acknowledged.

Others urged NAEP to wait. In 1991, Gregory Anrig, then the president of the Educational Testing Service, which administered NAEP, suggested delaying proficiency definitions until they could be properly established. Chester E. Finn Jr., an influential member of the NAEP governing board, responded that by delaying reports on how few students were proficient, “we may be sacrificing something else—the sense of urgency for national improvement.”

Once achievement levels were set, the government commissioned a series of evaluations. Each study denounced the process for defining proficiency, leading to calls for yet another evaluation that might generate a better answer.

The first such evaluation, conducted by three respected statisticians in 1991, concluded that “the technical difficulties are extremely serious.” To continue the process, they said, would be “ridiculous.” Their preliminary report said that NAEP’s willingness to proceed in this way reflected technical incompetence. NAEP fired the statisticians.

Congress then asked the U.S. General Accounting Office for its opinion. The GAO found NAEP’s approach “inherently flawed, both conceptually and procedurally.” “These weaknesses,” it said, “could have serious consequences.” The GAO recommended that NAEP results not be published using percentages of students who were allegedly basic, proficient, or advanced.

In response, the U.S. Department of Education commissioned yet another study, this one by the National Academy of Education. The panel concluded that procedures for defining proficiency were “subject to large biases,” and that levels by which American students had been judged deficient were “unreasonably high.” Continued use of NAEP proficiency definitions could set back the cause of education reform because it would harm the credibility of NAEP itself, the panel warned.

Finally, the Education Department asked the National Academy of Sciences to weigh in. It concluded, in 1999, that the “process for setting NAEP achievement levels is fundamentally flawed” and “achievement-level results do not appear to be reasonable.”

I still don’t quite understand the credence given to the NAEP as the be-all and end-all of assessment in America. NAEP scores always come back low, yet they carry a weight I’m not sure they deserve. The Fordham Foundation in particular has used the NAEP to argue that state standards stink and schools aren’t making progress, but what if the test itself is an inaccurate measure?

There’s always going to be a disconnect, too, between the results of the NAEP and the results on the myriad statewide assessments used across the country, because they test different things. You might view this as a reason to move to a national curriculum and national standards ($), and certainly that argument can be made, but the states’ rights argument is equally strong and still resonates with many people.


‘Proficiency for All’ Is an Oxymoron
Accountability should begin with realistic goals that recognize human variability.
By Richard Rothstein, Rebecca Jacobsen, & Tamara Wilder


The No Child Left Behind Act requires all students to be proficient by 2014. This is widely understood to be unattainable because 2014 is too soon. But there is no date by which all (or nearly all) students, even middle-class students, can achieve proficiency. “Proficiency for all” is an oxymoron.



The federal education legislation does not define proficiency, but refers to the National Assessment of Educational Progress. Although the Bush administration winks and nods when states require only low-level skills, the law says proficiency must be “challenging,” a term taken from NAEP’s definition. Democrats and Republicans stress that the No Child Left Behind law’s tough standards are a world apart from the minimum competency required by 1970s-style accountability programs.


But no goal can be both challenging to and achievable by all students across the achievement distribution. Standards can either be minimal and present little challenge to typical students, or challenging and unattainable by below-average students. No standard can simultaneously do both—hence the oxymoron—but that is what the No Child Left Behind law requires.


As the Harvard University professor Daniel Koretz, an expert on educational assessment and testing, has noted, typical variation in performance between those with lower and higher achievement is not primarily racial or ethnic; it is a gap within groups, including whites. Performance ranges in Japan and Korea, whose average math and science scores surpass ours, are similar to the U.S. range. If black-white gaps were eliminated in the United States, the standard deviation of test scores here would shrink by less than 10 percent. It would still be impossible to craft standards that simultaneously challenged students at the top, middle, and bottom.
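Koretz’s point can be checked with a simple variance decomposition: the overall test-score variance equals the within-group variance plus a between-group term. The sketch below uses invented but plausible inputs (a 13% black share of students and a gap of one within-group standard deviation — both assumptions of mine, not figures from the article) to show why closing the between-group gap shrinks the overall SD only modestly:

```python
import math

# Hypothetical inputs (not from the article): black share of students p,
# and a black-white gap measured in within-group SD units.
p = 0.13
gap = 1.0
within_var = 1.0  # both groups assumed to have within-group SD of 1

# Variance of a two-group mixture = within-group variance plus the
# between-group component p * (1 - p) * gap**2.
total_var = within_var + p * (1 - p) * gap ** 2
sd_with_gap = math.sqrt(total_var)
sd_without_gap = math.sqrt(within_var)

shrink = 1 - sd_without_gap / sd_with_gap
print(f"closing the gap shrinks the overall SD by {shrink:.1%}")
```

Under these assumptions the shrinkage comes out near 5 percent — consistent with the article’s “less than 10 percent” figure.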


The No Child Left Behind Act’s admirable goal of closing achievement gaps can only sensibly mean that achievement distributions for disadvantaged and middle-class children should be more alike. If gaps disappeared, similar proportions of whites and blacks would be “proficient”—but similar proportions would also fall below that level. Proficiency for all, implying the elimination of variation within socioeconomic groups, is inconceivable. Closing achievement gaps, implying the elimination of variation between socioeconomic groups, is daunting but worth striving for.



--------------------------------------------------------------------------------


Not only is it logically impossible to have “proficiency for all” at a challenging level. The law and NAEP stumble further. Their expectations of proficiency are absurd, beyond challenging, even for students in the middle of the distribution. The highest-performing countries can’t come close to meeting the No Child Left Behind Act’s standard of proficiency for all. “First in the world,” a widely ridiculed U.S. goal from the 1990s that was supplanted by this federal legislation, is modest compared with the demand that all students be proficient.


We can compare performance in top-scoring countries with NAEP’s proficiency standard. Comparisons are inexact—not all tests cover identical curricula, define grades exactly the same, or have easily equated scales. But rough comparisons can serve policy purposes.


On a 1991 international math exam, Taiwan scored highest. But if Taiwanese students had taken the NAEP math exam, 60 percent would have scored below proficient, and 22 percent below basic. On the 2003 Trends in International Mathematics and Science Study, 25 percent of students in top-scoring Singapore were below NAEP proficiency in math, and 49 percent were below proficiency in science.


On a 2001 international reading test, Sweden was tops, but two-thirds of Swedish students were not proficient in reading, as NAEP defines it.



--------------------------------------------------------------------------------


How did we get standards so divorced from reality, even for students in the middle of the distribution? Few Americans realize how unscientific the process for defining proficiency was—and must be. NAEP officials assembled some teachers, businesspeople, parents, and others, presented these judges with NAEP questions, and asked their opinions about whether students should get them right. No comparison with actual performance, even in the best schools, was required. Judges’ opinions were averaged to calculate how many NAEP questions proficient students should answer.
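The averaging the article describes resembles a modified-Angoff standard-setting procedure: each judge estimates, per question, whether (or how likely it is that) a proficient student should answer correctly, and the pooled estimates become the cut score. A minimal sketch, with entirely invented judge ratings:

```python
# Invented ratings for illustration: each judge's estimated probability
# that a "proficient" student answers each of four questions correctly.
ratings = {
    "judge_1": [0.9, 0.7, 0.8, 0.6],
    "judge_2": [0.8, 0.6, 0.9, 0.5],
    "judge_3": [0.7, 0.8, 0.7, 0.6],
}
n_items = 4

# Average each question's rating across judges, then sum over questions
# to get the number of questions a proficient student "should" answer.
item_means = [
    sum(judge[i] for judge in ratings.values()) / len(ratings)
    for i in range(n_items)
]
cut_score = sum(item_means)
print(f"cut score: {cut_score:.2f} of {n_items} questions")
```

Note what the sketch makes plain: nothing in the calculation references how real students — even in the best schools — actually perform.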


From the start, experts lambasted this process. When officials first contemplated defining proficiency, the NAEP board commissioned a 1982 study by Willard Wirtz, a former U.S. secretary of labor, and his colleague Archie Lapointe, a former executive director of NAEP. They reported that “setting levels of failure, mediocrity, or excellence in terms of NAEP percentages would be a serious mistake.” Indeed, they said, it would be “fatal” to NAEP’s credibility. Harold Howe II, a former U.S. commissioner of education responsible for NAEP’s early implementation, warned the assessment’s administrators that expecting all students to achieve proficiency “defies reality.”


In 1988, Congress ordered NAEP to determine the proficient score. Later, U.S. Sen. Edward M. Kennedy’s education aide, who wrote the bill’s language, testified that Congress’ demand was “deliberately ambiguous” because neither congressional staff members nor education experts could formulate it precisely. “There was not an enormous amount of introspection,” the aide acknowledged.


Others urged NAEP to wait. In 1991, Gregory Anrig, then the president of the Educational Testing Service, which administered NAEP, suggested delaying proficiency definitions until they could be properly established. Chester E. Finn Jr., an influential member of the NAEP governing board, responded that by delaying reports on how few students were proficient, “we may be sacrificing something else—the sense of urgency for national improvement.”


Once achievement levels were set, the government commissioned a series of evaluations. Each study denounced the process for defining proficiency, leading to calls for yet another evaluation that might generate a better answer.


The first such evaluation, conducted by three respected statisticians in 1991, concluded that “the technical difficulties are extremely serious.” To continue the process, they said, would be “ridiculous.” Their preliminary report said that NAEP’s willingness to proceed in this way reflected technical incompetence. NAEP fired the statisticians.


Congress then asked the U.S. General Accounting Office for its opinion. The GAO found NAEP’s approach “inherently flawed, both conceptually and procedurally.” “These weaknesses,” it said, “could have serious consequences.” The GAO recommended that NAEP results not be published using percentages of students who were allegedly basic, proficient, or advanced.


In response, the U.S. Department of Education commissioned yet another study, this one by the National Academy of Education. The panel concluded that procedures for defining proficiency were “subject to large biases,” and that levels by which American students had been judged deficient were “unreasonably high.” Continued use of NAEP proficiency definitions could set back the cause of education reform because it would harm the credibility of NAEP itself, the panel warned.


Finally, the Education Department asked the National Academy of Sciences to weigh in. It concluded, in 1999, that the “process for setting NAEP achievement levels is fundamentally flawed” and “achievement-level results do not appear to be reasonable.”



--------------------------------------------------------------------------------


All this advice has been ignored—although now, every NAEP report includes a congressionally mandated disclaimer, buried in the text: “Achievement levels are to be used on a trial basis and should be interpreted with caution.” The disclaimer adds that conclusions about changes in proficiency over time may have merit, but not about how many students are actually proficient. Yet the same reports highlight percentages of students deemed below proficient or basic, and these, not the disclaimer, are promoted in NAEP’s press releases.


A curiosity of the No Child Left Behind legislation is that while it imposes sanctions on schools where all students are not proficient, it also acknowledges that NAEP proficiency definitions should be used only on a “developmental basis,” until re-evaluated. No re-evaluation has been performed.


Although the legislation implies that proficiency is as NAEP defines it, the law permits states to set their own proficiency levels. States use their own judges to imagine how students should perform. Widely differing conclusions of judges in different states is proof enough of how fanciful the process must be. States, no matter how well-intentioned, cannot perform psychometric miracles that are beyond the reach of federal experts.


State definitions now result in many states’ reporting far higher percentages of proficient students than NAEP does. Some states define proficiency in NAEP’s below-basic range. More will do so if the No Child Left Behind law’s requirement of proficiency for all continues.


Even then, the demand for proficiency for all cannot be met because of the inevitable distribution of ability in any human population. The federal law exempts only 1 percent of all students. From what we know of normal cognitive distributions, this means that students with IQs as low as 65 must be proficient; these cognitively challenged young people must do better in math than 60 percent of students in top-scoring Taiwan. Were proficiency standards lowered to NAEP’s basic level, children with IQs as low as 65 would be expected to perform better than the 22 percent of Taiwanese students whose achievement is below NAEP’s basic score.
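The IQ-65 figure is straightforward normal-distribution arithmetic: if only 1 percent of students are exempt, the proficiency requirement reaches down to the 1st percentile of the conventional IQ scale (mean 100, SD 15). A quick check:

```python
from statistics import NormalDist

# Conventional IQ scale: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# The 1st percentile -- the lowest-scoring student NCLB does not exempt.
cutoff = iq.inv_cdf(0.01)
print(f"1st-percentile IQ: {cutoff:.0f}")  # about 65
```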


Discussions of reauthorizing the now almost 5-year-old law typically propose to “fix” it: by crediting gains as well as levels, extending deadlines past 2014, fiddling with minimum subgroup sizes, giving English-learners more time. None of these can save the law unless we jettison the incoherent demand that all students be proficient.


We could design accountability with realistic goals that recognize human variability. Although research and experimentation are needed to determine practical and ambitious goals, we can imagine the outlines.


We might, for example, expect students who today are at the 65th percentile of the test-score distribution to improve so that, at some future date, they perform similarly to students who are now at the 75th; students who today are at the 40th percentile to perform similarly to those who are now at the 50th; and students who are at the 15th percentile to perform similarly to those who are now at the 25th. Such goals create challenges for all students and express our intent that no child be left behind.
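Under an assumed normal score distribution, these percentile-shift goals translate into concrete score targets. A sketch using an invented scale (mean 250, SD 35 — hypothetical, roughly NAEP-like numbers, not from the article):

```python
from statistics import NormalDist

# Hypothetical score scale, assumed normal with mean 250 and SD 35.
scores = NormalDist(mu=250, sigma=35)

# The authors' proposed shifts: (current percentile, target percentile).
goals = [(65, 75), (40, 50), (15, 25)]
targets = []
for current, target in goals:
    now = scores.inv_cdf(current / 100)
    goal = scores.inv_cdf(target / 100)
    targets.append((now, goal))
    print(f"{current}th percentile -> {target}th: score {now:.0f} to {goal:.0f}")
```

One design consequence worth noting: because the normal density thins out in the tails, a ten-percentile shift demands a larger raw-score gain at the 15th percentile than at the 40th — one reason such goals resist reduction to a single sound bite.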


Such goals would perhaps have to vary for subpopulations, ages, regions, and schools. The system would be too complex to be reduced to simple sound bites and administered by the highly politicized federal Department of Education.


The No Child Left Behind Act cannot be “fixed.” It gives us a “sense of urgency for national improvement” at the price of our intellectual integrity, and an unjustified sense of failure and humiliation for educators and students. It’s time to return to the drawing board.

2 Comments:

Anonymous Anonymous said...

Yep. Any standard is ultimately going to be capricious. Measures are always going to have logical criticisms. Comparisons of measures are subject to both criticisms.

Maybe we are thinking about math and science all wrong. Perhaps we should divide the standard:

Basic competency.
A standard that sets out the content and skills that tie directly to the public interest in an educated citizenry. Content related to real life skills and knowledge.

Secondary Proficiency.
This would be a standard reflecting what a semi-studious learner ought to be able to accomplish in a high school education.

The policy could be that all must reach basic competency in all subjects of public interest (of course the "Rs," but maybe also financial literacy? civics? health? law? etiquette/ethics?).

(I'm just thinking through the keyboard here)

Then require a few secondary proficiency levels in a number of menu choices...college prep, foreign language, art, vocational, technology, whatever can be valued and measured as education.

Theoretically, the WASL is the minimum competency standard.

Major obstacles? I predict the adults operating contemporary comprehensive high schools might not be ready to abandon the "one-size-fits-all" model yet.

Oh, and "states' rights?" Wait till you see what we are about to experience in "local control" versus "WA Learns."

jl

9:51 AM  

Post a Comment

<< Home