Selected Scholarly Publications
This open-access research memorandum describes a preliminary research and evaluation agenda for personalized assessments, where such assessments are intended to be attuned to the social, cultural, and other relevant characteristics of individuals and the contexts from which they come. The agenda targets the full range of assessment uses—school accountability, national and international assessment, admissions, certification and licensure, and instructional planning. The purposes of the agenda are to guide the theoretical and empirical research and development needed to create personalized assessments and to suggest a means for judging the effectiveness of those instruments.
Bennett, R. E., Sparks, J. R., Arslan, B., Lehman, B., Sinharay, S., & Zapata-Rivera, D. (2026). A preliminary research and evaluation agenda for personalized assessment in the service of equity (Research Memorandum No. RM-26-01). ETS. https://doi.org/10.64634/wjv5e895
Investigating Elements of Culturally Responsive Assessments in the Context of the National Assessment of Educational Progress: An Initial Exploration
Standardized assessments are a common part of U.S. education. Due to dramatic increases in population diversity over the past several decades, a number of scholars have recommended that these assessments become more attuned to demographic differences (e.g., Bennett; Buzick, Casabianca, and Gholson; Hughes), evaluating students' knowledge, skills, and understanding by taking into account their unique cultural identities, interests, and lived experiences (e.g., Hood; Landl). In this exploratory study, several National Assessment of Educational Progress (NAEP) Grade 8 mathematics items were adapted to include vocabulary, scenarios, and contexts expected to be more familiar to Hispanic students living in the United States, with a focus on Puerto Rican, Dominican, and Mexican culture. We evaluated the impact of these minimal changes on student performance, and on student behavior during testing, in a sample of almost a thousand eighth-grade students across several U.S. states. The adapted test items led to a 25% reduction in the average (equated) score difference between White and Hispanic students, but little if any difference in examinee behavior indicative of motivation and engagement. We also make recommendations for future research and methodology refinements.
Sinharay, S., Johnson, M. S., Bennett, R. E., Lopez, R. M., Sparks, J. R., & Pillarisetti, S. (2025). Investigating elements of culturally responsive assessments in the context of the National Assessment of Educational Progress: An initial exploration. Educational Measurement: Issues and Practice, 44(4), 33–51. https://doi.org/10.1111/emip.70008
This open-access chapter focuses on validity, modeling, and analysis issues in technology-based assessment (TBA), where TBA is defined as a measurement used for decision making primarily in education, but also in the workplace, that employs digital computing in most, if not all, aspects of its creation, delivery, presentation, scoring, or reporting. The first section centers on assessments used to support consequential purposes, encompassing decisions that may have highly significant impact on individuals, groups, or institutions and that are often difficult to reverse. The second section covers assessments employed for in-the-moment instructional decisions or for describing what a student knows and can do so that near-term instructional next steps can be taken. The last major section explores the idea of combining both assessment purposes—i.e., consequential decision making and instructional support—in the same assessment. The chapter concludes with a summary of key points, recommendations for research, and speculation on future directions.
Bennett, R. E., LaMar, M., & Mazzeo, J. (2025). Technology-based assessment: Validity, modeling, and analysis issues. In L. L. Cook & M. J. Pitoniak (Eds.), Educational measurement (5th ed., pp. 581–654). Oxford University Press. https://doi.org/10.1093/oso/9780197654965.003.0009
Using the method of narrative review, this paper considers the impact of structural inequity in US society and its implications for educational assessment. Focusing on African Americans, some of the many past and present examples of structural inequity and their effects are delineated. Described next are how these effects can be connected to opportunity to learn (OTL), very broadly conceived, and how the persistence of so-called achievement gaps might be seen from that OTL perspective. Based on the conception derived from the review, a graphical representation is given positing how structural inequity, through OTL, works to constrain achievement and life chances cumulatively over time. Understanding OTL from this broad-ranging, cumulative perspective suggests ways in which the design and use of K-12 and higher education assessment might be rethought.
Bennett, R. E. (2025). Rethinking equity and assessment through opportunity to learn. Assessment in Education: Principles, Policy & Practice, 32(1), 5–32. https://doi.org/10.1080/0969594X.2025.2462549
Socioculturally Responsive Assessment assembles the best-available thinking from within and outside the educational measurement community about the theoretical foundations and systems-level policy implications of formal assessment programs designed to be socioculturally responsive. Synthesized from culturally responsive assessment design and practices, culturally relevant pedagogy and funds of knowledge, universal design for learning, the learning sciences, and other literatures, this emerging concept affirms that students’ learning and performance are inextricably tied to the social, cultural, and linguistic contexts in which they live and develop knowledge. Across four sections, this open-access book provides an argument and initial evidence for impact on students, users, and assessment quality; offers guidance for implementation; and examines the potential limitations, pitfalls, barriers, and measurement issues that such programs will inevitably raise. Scholars, teaching faculty, test developers, and policymakers will come away with integral foundations, new assessment approaches, and a greater sense of the potential for positive impact that these assessments may afford.
Bennett, R. E., Darling-Hammond, L., & Badrinarayan, A. (Eds.). (2025). Socioculturally responsive assessment: Implications for theory, measurement, and systems-level policy (1st ed.). Routledge. https://doi.org/10.4324/9781003435105
Over our field's 100-year-plus history, standardization has been a central assumption in test theory and practice. The concept's justification turns on leveling the playing field by presenting all examinees with putatively equivalent experiences. Until relatively recently, our field has accepted that justification almost without question. In this article, I present a case for standardization's antithesis, personalization. Interestingly, personalized assessment has important precedents within the measurement community. As intriguing are some of the divergent ways in which personalization might be realized in practice. Those ways, however, suggest a host of serious issues. Despite those issues, both moral obligation and survival imperative counsel persistence in trying to personalize assessment.
Bennett, R. E. (2024). Personalizing assessment: Dream or nightmare? Educational Measurement: Issues and Practice, 43, 119–125. https://doi.org/10.1111/emip.12652
In the United States, opposition to traditional standardized tests is widespread, particularly obvious in the admissions context but also evident in elementary and secondary education. This opposition is fueled in significant part by the perception that tests perpetuate social injustice through their content, design, and use. To survive, as well as contribute positively, the measurement field must rethink assessment, including how to make it more socioculturally responsive. This open-access paper offers a rationale for that rethinking and then employs provisional design principles drawn from various literatures to formulate a working definition and the beginnings of a theory. In the closing section, a path toward implementation is suggested.
Bennett, R. E. (2023). Toward a theory of socioculturally responsive assessment. Educational Assessment, 28(2), 83–104. https://doi.org/10.1080/10627197.2023.2202312
Are There Distinctive Profiles in Examinee Essay-Writing Processes?
Grouping individuals according to a set of measured characteristics, or profiling, is frequently used in describing, understanding, and acting on a phenomenon. The advent of computer-based assessment offers new possibilities for profiling writing because aspects can be captured that were not heretofore observable. We explored whether the writing processes of over 30,000 adults taking a high-school equivalency examination could be profiled. Process features were extracted from keystroke logs, aggregated into composite indicators, and used with essay score to assign individuals to profiles. Analyses included computing the percentages of individuals that could be classified, using MANOVA to examine differences among profiles on external variables, and examining whether profiles could be distinguished from one another based on patterns derived from cluster analysis. Results showed that about 30% of examinees could be classified into profiles that were largely distinct. These results contribute toward a foundation for using such profiles to describe how individuals compose and how their writing might be improved.
Bennett, R. E., Zhang, M., Sinharay, S., Guo, H., & Deane, P. (2022). Are there distinctive profiles in examinee essay-writing processes? Educational Measurement: Issues and Practice, 41, 55–69.

This open-access book describes the extensive contributions made toward the advancement of human assessment by scientists from one of the world’s leading research institutions, Educational Testing Service. The book’s four major sections detail research and development in measurement and statistics, education policy analysis and evaluation, scientific psychology, and validity. Many of the developments presented have become de facto standards in educational and psychological measurement, including in item response theory (IRT), linking and equating, differential item functioning (DIF), and educational surveys like the National Assessment of Educational Progress (NAEP), the Programme for International Student Assessment (PISA), the Progress in International Reading Literacy Study (PIRLS), and the Trends in International Mathematics and Science Study (TIMSS). In addition to its comprehensive coverage of contributions to the theory and methodology of educational and psychological measurement and statistics, the book gives significant attention to ETS work in cognitive, personality, developmental, and social psychology, and to education policy analysis and program evaluation. The chapter authors are long-standing experts who provide broad coverage and thoughtful insights that build upon decades of experience in research and best practices for measurement, evaluation, scientific psychology, and education policy analysis. Opening with a chapter on the genesis of ETS and closing with a synthesis of the enormously diverse set of contributions made over its 70-year history, the book is a useful resource for all interested in the improvement of human assessment.
Bennett, R. E., & von Davier, M. (Eds.). (2017). Advancing human assessment: The methodological, psychological, and policy contributions of ETS. Cham, Switzerland: Springer Open.
From Cognitive-Domain Theory to Assessment Practice
This article exemplifies how assessment design might be grounded in theory, thereby helping to strengthen validity claims. Spanning work across multiple related projects, the article first briefly summarizes an assessment system model for the elementary and secondary levels. Next the article describes how cognitive-domain theory and principles are used in the design of a scenario-based summative assessment for argumentation in the English language arts. Finally, results from several psychometric approaches are used to evaluate propositions suggested by the domain theory, including ones related to the use of topical scenarios and learning progressions in assessment design. Although results generally supported these propositions, the work described represents only a small step in a long-term, iterative process of theory development, assessment design, and empirical tryout, which should, in principle, lead to more valid assessments that better inform teaching and learning.
Bennett, R. E., Deane, P., & van Rijn, P. W. (2016). From cognitive-domain theory to assessment practice. Educational Psychologist, 51(1), 82–107. https://doi.org/10.1080/00461520.2016.1141683
The Changing Nature of Educational Assessment
On the surface, this chapter concerns the evolution of educational assessment from a paper-based technology to an electronic one. On a deeper level, that evolution is more substantive. As has been noted, it can be viewed in terms of developmental stages (Bennett, 1998, 2010b; Bunderson, Inouye, & Olsen, 1989). In the first section of this chapter, those stages are briefly described and used to place the new generation of assessments being created by the two comprehensive Common Core State Assessment (CCSA) consortia: the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium (SBAC). That placement is primarily employed to make the characteristics of each stage concrete, as well as to highlight key aspects of the consortia’s emerging assessment designs. Next, some of the more substantive factors that differentiate the most advanced stage from the earlier ones are discussed, along with the challenges in producing assessments fit for that stage. The most advanced stage encompasses such innovations as the continuous testing made possible by electronic learning environments (e.g., games, simulations, e-books, massive open online courses). This section identifies, in passing, advanced features that the CCSA consortia are actively incorporating, as well as ones to which they might at some point aspire. Finally, a conclusion is offered, including suggestions for research.
Bennett, R. E. (2015). The changing nature of educational assessment. Review of Research in Education, 39(1), 370–407. https://doi.org/10.3102/0091732X14554179
This paper covers six interrelated issues in formative assessment (aka, ‘assessment for learning’). The issues concern the definition of formative assessment, the claims commonly made for its effectiveness, the limited attention given to domain considerations in its conceptualisation, the under‐representation of measurement principles in that conceptualisation, the teacher‐support demands formative assessment entails, and the impact of the larger educational system. The paper concludes that the term, ‘formative assessment’, does not yet represent a well‐defined set of artefacts or practices. Although research suggests that the general practices associated with formative assessment can facilitate learning, existing definitions admit such a wide variety of implementations that effects should be expected to vary widely from one implementation and student population to the next. In addition, the magnitude of commonly made quantitative claims for effectiveness is suspect, deriving from untraceable, flawed, dated, or unpublished sources. To realise maximum benefit from formative assessment, new development should focus on conceptualising well‐specified approaches built around process and methodology rooted within specific content domains. Those conceptualisations should incorporate fundamental measurement principles that encourage teachers and students to recognise the inferential nature of assessment. The conceptualisations should also allow for the substantial time and professional support needed if the vast majority of teachers are to become proficient users of formative assessment. Finally, for greatest benefit, formative approaches should be conceptualised as part of a comprehensive system in which all components work together to facilitate learning.
Bennett, R. E. (2011). Formative assessment: a critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–25. https://doi.org/10.1080/0969594X.2010.513678
Cognitively Based Assessment of, for, and as Learning (CBAL): A Preliminary Theory of Action for Summative and Formative Assessment
CBAL (Cognitively Based Assessment of, for, and as Learning) is a research initiative intended to create a model for an innovative K–12 assessment system that documents what students have achieved (of learning); helps identify how to plan instruction (for learning); and is considered by students and teachers to be a worthwhile educational experience in and of itself (as learning). Because CBAL intends to not only measure student achievement but also facilitate it, CBAL, like any similar assessment program, requires a theory of action. This paper describes the notion of theory of action, offers a preliminary version of such a theory for CBAL, and outlines a provisional research program for evaluating that theory.
Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning (CBAL): A preliminary theory of action for summative and formative assessment. Measurement: Interdisciplinary Research and Perspectives, 8(2–3), 70–91. https://doi.org/10.1080/15366367.2010.508686
Inexorable and Inevitable: The Continuing Story of Technology and Assessment
This paper argues that the inexorable advance of technology will force fundamental changes in the format and content of assessment. Technology is infusing the workplace, leading to widespread requirements for workers skilled in the use of computers. Technology is also finding a key place in education. This is occurring not only because technology skill has become a workplace requirement. It is also happening because technology provides information resources central to the pursuit of knowledge and because the medium allows for the delivery of instruction to individuals who couldn’t otherwise obtain it. As technology becomes more central to schooling, assessing students in a medium different from the one in which they typically learn will become increasingly untenable. Education leaders in several states and numerous school districts are acting on that implication, implementing technology-based tests for low- and high-stakes decisions in elementary and secondary schools and across all key content areas. While some of these examinations are already being administered statewide, others will take several years to bring to fully operational status. These groundbreaking efforts will undoubtedly encounter significant difficulties that may include cost, measurement, technological-dependability, and security issues. But most importantly, state efforts will need to go beyond the initial achievement of computerizing traditional multiple-choice tests to create assessments that facilitate learning and instruction in ways that paper measures cannot.
Bennett, R. E. (2002). Inexorable and inevitable: The continuing story of technology and assessment. The Journal of Technology, Learning and Assessment, 1(1). https://ejournals.bc.edu/index.php/jtla/article/view/1667
Validity and Automated Scoring: It's Not Only the Scoring
What are the validity issues involved in automated scoring of tests? What is the nature of the interplay among construct definition, task design, examinee interface, tutorial, test development tools, and automated scoring and reporting?
Bennett, R. E., & Bejar, I. I. (1998). Validity and automated scoring: It's not only the scoring. Educational Measurement: Issues and Practice, 17(4), 9–17. https://doi.org/10.1111/j.1745-3992.1998.tb00631.x
Influence of behavior perceptions and gender on teachers' judgments of students' academic skill
This study evaluated the hypothesis that gender and behavior, as perceived by teachers, affect judgment of the academic skills of their students. A path model was proposed to describe the relationships among tested academic skill, gender, behavior grades, and teachers' academic judgments. The model was evaluated separately in each of 3 grades (kindergarten–2nd) in 2 locations, with scholastic grades and structured ratings in specific academic skill areas as the dependent variables. Results showed that, after tested academic skill and gender were controlled for, teachers' perceptions of students' behavior constituted a significant component of their scholastic judgments. This effect was more pronounced for the judgments of boys because, in Grades 1 and 2, their conduct was perceived as less adequate than was girls' behavior.
Bennett, R. E., Gottesman, R. L., Rock, D. A., & Cerullo, F. (1993). Influence of behavior perceptions and gender on teachers' judgments of students' academic skill. Journal of Educational Psychology, 85(2), 347–356. https://doi.org/10.1037/0022-0663.85.2.347
Chinese Language Publications
个性化测评:下一个测评前沿 [Personalization: The Next Assessment Frontier]
Strictly standardized assessment, in its traditional sense, is a concept belonging to a past era; personalized assessment is the direction of future development. The basis for this claim is that, as population diversity increases across countries, standardized assessment may undermine fairness rather than safeguard it. In fact, the field of educational measurement already contains precedents for personalized assessment. From these precedents, general approaches to personalization can be distilled, several of which are widely used in current practice. However, both existing approaches and more forward-looking AI-based approaches raise issues of concern to examinees, testing-program designers, and assessment users. This article briefly reviews the origins and evolution of standardized assessment, discusses two precedents for personalized assessment within educational measurement, details three general approaches to personalizing assessment, and closes with the author's reservations about this frontier direction.
Bennett, R. E. (2025). Personalization: The next assessment frontier | 个性化测评:下一个测评前沿. Journal of China Examinations | 中国考试. https://doi.org/10.19360/j.cnki.11-3303/g4.2025.05.003 https://mp.weixin.qq.com/s/1IswQTjfC5dPPWdlaE8Oug
测评和学习中的人工智能使用和公平性 [AI and equity in assessment and learning]
This article discusses artificial intelligence and fairness in assessment. AI is currently applied in educational testing for automated scoring, item generation, and test security. AI can also be applied to personalize learning, for example, by generating tailored content in adaptive formative assessment. Doing so is a requirement not only of social morality but also of law. To meet that requirement, we must build methods of using AI in assessment that advance fairness and support personalized learning. To that end, we should draw upon, and adapt to, the diverse cultural knowledge and identities that different students bring to school; allow multiple means of expression in testing and learning; encourage deeper learning; and use methods that aid explanation. Finally, assessment developers and AI developers should commit to addressing key educational, and even societal, problems and ensure that fairness runs throughout their design intentions.
Bennett, R. E. (2022). 测评和学习中的人工智能使用和公平性 [AI and equity in assessment and learning]. Admissions Testing Research | 招生考试研究, 1, 1–6.
教育高危群体在写作过程上的性别差异研究 [How Do Educationally At-Risk Men and Women Differ in Their Essay-Writing Processes?]
Based on a high-school equivalency examination, this study examined gender differences in the essay-writing processes of educationally at-risk individuals. The study involved more than 30,000 examinees from 23 U.S. states, each of whom took one of the 12 forms of the language test. Features extracted from keystroke logs were used to infer the underlying writing processes and were aggregated into seven process indicators. Results showed that female examinees led male examinees on both essay score and total language-test score, though only slightly. More importantly, after controlling for total language-test score, age, and essay topic, all seven process indicators showed significant gender differences, the most notable involving different aspects of fluency and editing. The findings are consistent in many important respects with earlier studies of school students and adults, and with studies of both online and paper-and-pencil writing tasks. The article closes with suggestions for conducting similar research with individuals who write in character-based languages such as Chinese.
Bennett, R. E., Zhang, M., & Sinharay, S. (2021). 教育高危群体在写作过程上的性别差异研究 [How do educationally at-risk men and women differ in their essay-writing processes?]. Chinese/English Journal of Educational Measurement and Evaluation | 教育测量与评估双语季刊, 2(1), 18–32. https://doi.org/10.59863/RNSC1388
教育测量的未来趋势 [Future Trends in Educational Measurement]
This article is a revised version of the author's presidential address at the April 2018 annual meeting of the (U.S.) National Council on Measurement in Education (NCME) in New York. The author first describes 11 possible characteristics of future change in educational measurement, why each characteristic matters, and how these changes should be viewed. The article then outlines several aspects of educational measurement that are unlikely to change. It closes with a look ahead at educational developments over the coming decade and a discussion of their possible implications for measurement professionals.
Bennett, R. E. (2019). 教育测量的未来趋势 [Future trends in educational measurement]. 教育测量与评价 [Educational Measurement and Evaluation], 3, 3–14. https://mp.weixin.qq.com/s/1jkwMqm7PYyRgcKPCf6d5g
Contact
Reach out to discuss a conference presentation, consultation, or collaboration, or to request a publication.
© 2025 Randy E. Bennett. All rights reserved.