


Download the book The Routledge International Handbook of Automated Essay Evaluation (Routledge International Handbooks)


Book details

The Routledge International Handbook of Automated Essay Evaluation (Routledge International Handbooks)

Edition: 1
Authors:
Series:
ISBN: 1032502568, 9781032502564
Publisher: Routledge
Publication year: 2024
Number of pages: 647
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 at the user's request)
File size: 114 MB

Book price (toman): 81,000





If you need the file of The Routledge International Handbook of Automated Essay Evaluation (Routledge International Handbooks) converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.

Note that The Routledge International Handbook of Automated Essay Evaluation (Routledge International Handbooks) is the original-language edition and has not been translated into Persian. The International Library website offers original-language books only and does not offer any books translated into or written in Persian.


Description of the book (in the original language)



Table of contents

Cover
Half Title
Series Information
Title Page
Copyright Page
Table of Contents
About the Editors
List of Contributors
Foreword
Acknowledgments
	Reviewer Acknowledgments
Section 1 Introduction to AEE and Modern AEE Systems
	1 Introduction to Automated Essay Evaluation
		1.1 Introduction
		1.2 The Evolution of Automated Scoring and Automated Feedback On Writing
			1.2.1 The 2012 Hewlett Trials and Their Outcomes
			1.2.2 The National Assessment of Educational Progress (NAEP) Trials
		1.3 Current Use Cases for Automated Essay Evaluation
			1.3.1 Evaluating Essays With 150 Words Or More
			1.3.2 Short-Form Constructed Responses With Fewer Than 150 Words
			1.3.3 Content-Intensive Responses
			1.3.4 Content-Superficial Responses
			1.3.5 Summatively Scored Essays
			1.3.6 Formative Assessment
		1.4 Frameworks for Validating AEE
		1.5 Lingering and New Concerns Related to AEE
		1.6 The Current Handbook: Apprising the State of the Art and Fostering Future Development
		References
	2 Automated Essay Evaluation at Scale: Hybrid Automated Scoring/Hand Scoring in the Summative Assessment Program
		2.1 Introduction
		2.2 Progressive Hybrid Scoring Approaches
			2.2.1 Overview
			2.2.2 Project Essay Grade
				2.2.2.1 PEG Architecture
				2.2.2.2 PEG Hybrid Scoring Applications
				2.2.2.3 Evidence for Use
			2.2.3 Requirements
				2.2.3.1 Training Data
				2.2.3.2 Validation
			2.2.4 Training
			2.2.5 Hybrid Scoring Process
				2.2.5.1 Role of Humans
				2.2.5.2 Role of the Engine
		2.3 Implications
			2.3.1 Future Directions
		Notes
		References
	3 Exploration of the Stacking Ensemble Learning Algorithm for Automated Scoring of Constructed-Response Items in Reading Assessment
		3.1 Introduction
		3.2 Methods
			3.2.1 Data
			3.2.2 Model Building Process
				3.2.2.1 Text Preprocessing and Processing
				3.2.2.2 Feature Extraction
				3.2.2.3 Automated Scoring Classifier Development
			3.2.3 Model Evaluation
		3.3 Results
			3.3.1 Automated Scoring Classifier Development
		3.4 Summary and Discussion
		Notes
		References
	4 Scoring Essays Written in Persian Using a Transformer-Based Model: Implications for Multilingual AES
		4.1 Introduction
			4.1.1 Persian as a Unique Case Study for Multilingual AES
			4.1.2 Purpose of the Chapter
		4.2 Overview of a Transformer-Based System for AES
			4.2.1 Introduction to Transformers
			4.2.2 Bidirectional Encoder Representations From Transformers
			4.2.3 Multilingual BERT
		4.3 Scoring Persian Essays Using MBERT Transformer Model
			4.3.1 Data Set
			4.3.2 Model Architecture
				4.3.2.1 Word Embedding Word2Vec Model
				4.3.2.2 Transformer MBERT Model
				4.3.2.3 Hyperparameter Tuning
			4.3.3 Performance Measures
		4.4 Comparing the Performance of the Word Embedding and Transformer Models
			4.4.1 Performance of Models Overall
			4.4.2 Performance of Models By Score Level
		4.5 Conclusions and Implications for Multilingual AES
			4.5.1 The Importance of Transformers for Multilingual AES
			4.5.2 Using MBERT to Score Essays Written in Persian
			4.5.3 Assessment Technology, Equity, and Opportunity
		Note
		References
		Appendix A
		Appendix B
			.......
			....
			Instruction
			Topics
	5 SmartWriting-Mandarin: An Automated Essay Scoring System for Chinese Foreign Language Learners
		5.1 Introduction
		5.2 Related Works
			5.2.1 DNN-Based AES Systems
			5.2.2 Chinese Automatic Essay Scoring and ACES
		5.3 Details of SW-M
			5.3.1 Preprocessing Module
			5.3.2 Textual Features
			5.3.3 Typos
			5.3.4 Grammatical Errors
			5.3.5 Scoring Model: A Fuzzy-Based Approach
		5.4 Performance of SWM
		5.5 Future Studies
		Acknowledgments
		References
	6 NLP Application in the Hebrew Language for Assessment and Learning
		6.1 Introduction
		6.2 Hebrew Orthography and Morphology
			6.2.1 Hebrew Orthography
			6.2.2 Hebrew Morphology
				6.2.2.1 The Verb System
				6.2.2.2 The Noun System
				6.2.2.3 Prepositions, Conjunctions, and Determiners
			6.2.3 Text Length and Density
				6.2.3.1 Hebrew Versus English Lexicon
				6.2.3.2 Text Length
		6.3 Morphological Lexicon and Corpora
			6.3.1 Morphological Lexicon
			6.3.2 Hebrew Corpora
				6.3.2.1 M1 Corpus and the Annotated Corpus
				6.3.2.2 News Corpus
			6.3.3 Language Models
		6.4 Computational Infrastructure for NLP in Hebrew
			6.4.1 Tokenizer
			6.4.2 Morphological Analyzer
			6.4.3 Morphological Disambiguator
			6.4.4 Semantic Disambiguator
			6.4.5 Feature Extraction
				6.4.5.1 Statistical Or “Surface” Features
				6.4.5.2 Lexical Features
				6.4.5.3 Morphological Features
				6.4.5.4 Syntactic Features
				6.4.5.5 Semantic Features
			6.4.6 Grouping Text Features Into Linguistic Factors
			6.4.7 Text Analysis Pipeline
		6.5 Automated Essay Scoring
			6.5.1 Score Prediction Algorithms
			6.5.2 Grouping Features Into Macro-Features and Factors
			6.5.3 Validity of NiteRater
				6.5.3.1 Face and Content Validity – Identifying and Scoring Aberrant Essays
				6.5.3.2 Predictive Or Criterion-Related Validity – Scoring of Classroom Essays
				6.5.3.3 Predictive Or Criterion-Related Validity – Scoring of Tests for Admission to Higher Education
				6.5.3.4 “True” Validity – Agreement With True Scores
				6.5.3.5 Content Validity – Generalizing the Prediction Equation Across Prompts
			6.5.4 Validity of Combined Computer and Human Scores
			6.5.5 Quality Assurance of Essay Scoring
		6.6 Other Applications of the Hebrew-NLP System
			6.6.1 Providing Feedback to Essay Writers
			6.6.2 Readability Assessment
				6.6.2.1 Application to Textbooks (CET)
				6.6.2.2 Simplification of Hebrew Legal Texts
			6.6.3 Online Service to the Research Community
		6.7 Summary, Open Issues, and Future Directions
		References
Section 2 Expanding Automated Evaluation: Reading, Speech, Mathematics, and Writing Research
	7 Automated Scoring for NAEP Short-Form Constructed Responses in Reading
		7.1 Introduction
			7.1.1 Short-Form Constructed Responses
			7.1.2 The Current Study
		7.2 Method
			7.2.1 Prompt-Specific Competition
				7.2.1.1 Participants
				7.2.1.2 Instruments
				7.2.1.3 Procedure
				7.2.1.4 Results
			7.2.2 Generic Competition
				7.2.2.1 Participants and Instruments
				7.2.2.2 Procedure
				7.2.2.3 Results
		7.3 Discussion
			7.3.1 Limitations
		References
	8 Automated Scoring and Feedback for Spoken Language
		8.1 Introduction
		8.2 Automated Scoring of Spoken Vs. Written Language
		8.3 From the Rubric to Speech Features
		8.4 Automated Speech Scoring System Architecture
			8.4.1 Automatic Speech Recognition
			8.4.2 Computing Speech Features
			8.4.3 Filtering Models
			8.4.4 Scoring Models
		8.5 Operational Considerations
		8.6 Providing Feedback to Language Learners
		8.7 Speech Scoring Without Curated Features
		8.8 Open Research Issues
		8.9 Conclusion
		Note
		References
	9 Automated Scoring of Math Constructed-Response Items
		9.1 Introduction
		9.2 Anatomy of a Math Item
		9.3 Challenges of Math Automated Scoring
			9.3.1 Representation of Mathematics
			9.3.2 Equivalence of Expressions
			9.3.3 Evaluation of Mathematics
			9.3.4 Extracting Mathematics From Prose
			9.3.5 Understanding Reasoning
		9.4 Injecting Mathematical Reasoning Into NLP Scoring Models
			9.4.1 Scoring of Math-Only Responses
			9.4.2 Scoring of Responses Containing Prose
			9.4.3 Brief Comment On the Validity of Automated Scoring of Math CR Items
		9.5 Empirical Study
			9.5.1 Ablation Study Results
			9.5.2 Large Language Models for Math CR Scoring
		9.6 Conclusion
		References
	10 We Write Automated Scoring: Using ChatGPT for Scoring in Large-Scale Writing Research Projects
		10.1 Introduction
			10.1.1 We Write Intervention
			10.1.2 Theoretical Framework
		10.2 Developing a ChatGPT-Based Scoring Algorithm to Evaluate the Efficacy of the We Write Intervention
			10.2.1 Design of Measures
			10.2.2 Human Scoring Scheme for Essay Quality
			10.2.3 ChatGPT Scoring Model Architecture/Details
				10.2.3.1 Refinement of Scoring
		10.3 Score Validation: Comparing Human and ChatGPT Scoring
		10.4 Discussion and Future Research
			10.4.1 Score Tendencies
			10.4.2 Agreement Between Scores
			10.4.3 Generosity of Scoring
			10.4.4 Correlation Across Proficiency Levels
			10.4.5 Efficiency
		10.5 Limitations
		10.6 Conclusion
		Acknowledgments
		References
Section 3 Innovations in Automated Writing Evaluation
	11 Exploring the Role of Automated Writing Evaluation as a Formative Assessment Tool Supporting Self-Regulated Learning in Writing
		11.1 Introduction
			11.1.1 The Present Chapter
		11.2 Does AWE Help Students Learn Evaluation Criteria?
			11.2.1 Learning Evaluation Criteria: Summary and Future Directions
		11.3 Does AWE Help Students Practice Writing Skills and Processes?
			11.3.1 Practice Writing Skills and Processes: Summary and Future Directions
		11.4 Does AWE Provide Understandable and Actionable Feedback?
			11.4.1 Understandable and Actionable Feedback: Summary and Future Directions
		11.5 Does AWE-Supported Peer Review Offer Benefits for Reviewers and Writers?
			11.5.1 AWE-Supported Peer Review: Summary and Future Directions
		11.6 Does AWE Support Students Taking Ownership of Their Learning?
			11.6.1 Ownership of Learning: Summary and Future Directions
		11.7 Conclusion
		References
	12 Supporting Students’ Text-Based Evidence Use Via Formative Automated Writing and Revision Assessment
		12.1 Introduction
			12.1.1 Theoretical Background and Motivation for ERevise System Design
		12.2 From Automated Essay Scoring to Automated Writing Evaluation
			12.2.1 ERevise – Initial Design of a Formative Assessment System
			12.2.2 Prior Research Contributing to the Motivation for Formative Assessment Grounded in Sociocultural Learning Theories
				12.2.2.1 Conceptual Change as Elemental for “Learning” the Genre of Argumentation
				12.2.2.2 Meaning-Making as the Goal of Feedback (And Measurement in the AWE System)
		12.3 Validity of the ERevise System for Use as a Formative Assessment
		12.4 Toward ERevise+RF
			12.4.1 Revising in Response to Feedback Is a Critical Skill Not Yet Addressed in AWE Systems
			12.4.2 ERevise+RF as an AWE Formative Assessment to Support the Development of Revision Skills
			12.4.3 The System Is Based On Rubrics Assessing Critical Aspects of the Revision Construct
			12.4.4 NLP Approaches Prioritize Rubric Alignment
		12.5 Summary and Conclusions: An Unfinished Composition
		Notes
		Acknowledgements
		References
	13 The Use of AWE in Non-English Majors: Student Responses to Automated Feedback and the Impact of Feedback Accuracy
		13.1 Introduction
			13.1.1 Literature Review
				13.1.1.1 Theoretical Background
				13.1.1.2 Learner Responses to Automated Feedback and the Impact of Feedback Accuracy
			13.1.2 The Present Study
		13.2 Methods
			13.2.1 Context and Participants
			13.2.2 Criterion
			13.2.3 Data Collection and Procedures
			13.2.4 Data Analysis
		13.3 Results
			13.3.1 To What Extent Do Non-English Major Students Address Automated Feedback From Criterion and Make Successful and Relevant Revisions When Using It On a Voluntary Basis?
			13.3.2 How Does the Accuracy of Automated Feedback Features of Criterion Influence the Success of Non-English Major Students’ Revisions?
		13.4 Discussion
		13.5 Conclusion
		References
	14 Relationships Between Middle-School Teachers’ Perceptions and Application of Automated Writing Evaluation and Student Performance
		14.1 Introduction
			14.1.1 Implementation of AWE in Schools
			14.1.2 Educators’ Views of AWE
			14.1.3 Study Purpose
		14.2 Methods
			14.2.1 Study Context
			14.2.2 MI Write Overview
				14.2.2.1 Fidelity of Implementation Expectations
			14.2.3 Participants
			14.2.4 Measures
				14.2.4.1 AWE Perceptions Scale (AWE-P)
				14.2.4.2 MI Write Usage Indicators
				14.2.4.3 Student Writing Performance
			14.2.5 Data Analysis
				14.2.5.1 RQ1
				14.2.5.2 RQ2
		14.3 Results
			14.3.1 Descriptive Statistics: Teacher Perceptions and Fidelity
			14.3.2 Alignment Between Teachers’ Perceptions and Usage of MI Write
				14.3.2.1 Usability
				14.3.2.2 Usefulness
				14.3.2.3 Social Desirability
				14.3.2.4 Misalignments
			14.3.3 Relations Among Teacher Perceptions and Usage, and Student Writing Performance
		14.4 Discussion
			14.4.1 Limitations and Future Directions
		14.5 Conclusion
		Conflict of Interest Statement
		References
	15 Automated Writing Trait Analysis
		15.1 Introduction
			15.1.1 Forms of Automated Writing Trait Analysis
				15.1.1.1 Multidimensional Modeling of Genre and Style
				15.1.1.2 Multidimensional Modeling of Writing Quality
				15.1.1.3 Multidimensional Keystroke Log Analysis
			15.1.2 The Potential Power of Combined Models
		15.2 A Hierarchical, Multidimensional Model of Text Variation in Student Essays
			15.2.1 Feature Sets
			15.2.2 Participants/Data Source
			15.2.3 Confirmatory Factor Analysis
			15.2.4 Reliability
			15.2.5 External Validity
		15.3 Multidimensional Modeling of Variation in Student Writing Processes
		15.4 Using Combined Multidimensional Models to Measure the Effects of Instruction
			15.4.1 Method
				15.4.1.1 Participants
				15.4.1.2 Materials
				15.4.1.3 Qualitative School Differences
				15.4.1.4 Procedure
			15.4.2 Results
				15.4.2.1 Factor Structure
				15.4.2.2 Relations to Prior-Year Performance
				15.4.2.3 Changes in Performance Over Time
				15.4.2.4 Relation With Pretest and Prior-Year Variables
				15.4.2.5 School Differences
				15.4.2.6 Demographic Effects
		15.5 Discussion, Limitations, and Conclusions
		Notes
		References
	16 Advances in Automating Feedback for Argumentative Writing: Feedback Prize as a Case Study
		16.1 Introduction
		16.2 The Feedback Prize
			16.2.1 Corpus Annotation
				16.2.1.1 Discourse Elements
				16.2.1.2 Discourse Effectiveness
				16.2.1.3 Holistic Essay Quality
		16.3 Feedback Prize 1.0: Evaluating Student Writing
			16.3.1 Architecture and Accuracy of Select Models
			16.3.2 Overall Model Trends
			16.3.3 Bias of Select Models
		16.4 Feedback Prize 2.0: Predicting Effective Arguments
			16.4.1 Architecture and Performance of Select Models
			16.4.2 Accuracy of Select Models By Effectiveness Rating
			16.4.3 Bias of Select Models
		16.5 Leveraging the PERSUADE Corpus and Algorithms to Improve Student Writing Outcomes in AWE Systems
			16.5.1 Potential User-Facing Features in an AWE System
			16.5.2 Supporting the Academic Growth of All Students, Including the Historically Marginalized
		References
		Appendix
	17 Automated Feedback in Formative Assessment
		17.1 Introduction
			17.1.1 Background and Context
			17.1.2 A Focus On Claims and Evidence
			17.1.3 Structured, Content-Centric Rubrics
			17.1.4 Associating Response Elements to Rubric Quality-Level Definitions
			17.1.5 Approach to Task and Texts
			17.1.6 Moving Beyond Generic, Holistic Rubrics
		17.2 Research Design
			17.2.1 Item/Lesson Requirements
			17.2.2 Sentence-Level Scoring
			17.2.3 Data Sources
			17.2.4 Scoring Processes
		17.3 Automated Scoring With a Focus On Feedback
			17.3.1 Overview of the Automated Scoring Approach
			17.3.2 Data Selection
			17.3.3 CER Data Collection
			17.3.4 How Much CER Data Is Needed
			17.3.5 Additional Analytics for Response Analysis
				17.3.5.1 Claim Category Exemplars
				17.3.5.2 Evidence Exemplars and Keyword Sets
			17.3.6 The Automated Evaluation and Feedback Pipeline
			17.3.7 From Predictions and Analytics to Feedback
			17.3.8 Examples of Feedback to Simulated Responses
		17.4 Results and Discussion
			17.4.1 What Works Well
			17.4.2 Challenges and Areas for Additional Analytics Development
			17.4.3 Future Research: Efficacy and Beyond
			17.4.4 Narrative Feedback and Analysis: A Potential Next Step With Large Language Models
		References
Section 4 Factors Affecting the Performance of Automated Evaluation
	18 Using Automated Scoring to Support Rating Quality Analyses for Human Raters
		18.1 Introduction
			18.1.1 Purpose
			18.1.2 Background
				18.1.2.1 Evaluating Ratings in Incomplete Scoring Designs
				18.1.2.2 Combining Human and Automated Ratings
		18.2 Methods
			18.2.1 Data Analysis
				18.2.1.1 Rater Effects Analyses
				18.2.1.2 Severity and Leniency Effects
				18.2.1.3 Centrality and Extremism Effects
			18.2.2 Summarizing Rater Effect Results
		18.3 Results
			18.3.1 Severity and Leniency
			18.3.2 Centrality and Extremism
		18.4 Discussion
		References
	19 Calibrating and Evaluating Automated Scoring Engines and Human Raters Over Time Using Measurement Models
		19.1 Introduction
			19.1.1 Purpose
			19.1.2 Background: Rating Quality Analysis With Automated Scoring Engines
		19.2 Methods
			19.2.1 Data Analysis
		19.3 Results
			19.3.1 Rater Drift Results
		19.4 Discussion
			19.4.1 How Can ASEs Be Incorporated Into Rater Drift Analyses Based On Measurement Models?
			19.4.2 How Accurate Are Rater Drift Analyses Based On Measurement Models to Changes in Rater Severity Between Administrations?
			19.4.3 Practical Implications
			19.4.4 Directions for Future Research
		References
	20 AI Scoring and Writing Fairness
		20.1 Introduction
			20.1.1 DIF Testing and Machine Learning
		20.2 Method
			20.2.1 Study Context
			20.2.2 Participants
			20.2.3 Instruments
			20.2.4 Procedure
		20.3 Results
		20.4 Discussion
		Author Note
		References
	21 Automating Bias in Writing Evaluation: Sources, Barriers, and Recommendations
		21.1 Concerns About Bias in Automated Writing Evaluation
		21.2 Human Language Biases
		21.3 Automating Bias
			21.3.1 Biased Training Data
			21.3.2 Constrained Analytical Assumptions
				21.3.2.1 Aggregation
				21.3.2.2 Assumptions of Linearity
			21.3.3 Black Box Approaches: Lack of Validation and Transparency
				21.3.3.1 Validation
				21.3.3.2 Lack of Transparency
		21.4 Bias Reduction and Mitigating Bias in Automated Writing Evaluation
			21.4.1 Recommendation 1: Compile Intentionally Inclusive and Representative Training Datasets
			21.4.2 Recommendation 2: Rigorously Train Human Raters Regarding Potential Language Ideologies Along With Anti-Bias Assessment Practices
			21.4.3 Recommendation 3: Explore Training Data for Discrepancies, Disparities, and Other Signs of Bias Across Subpopulations
			21.4.4 Recommendation 4: Implement Principled Aggregation Only After Establishing Equivalence
			21.4.5 Recommendation 5: Include Nonlinear Relationships in Analyses and Allow for Complex Associations
			21.4.6 Recommendation 6: Extend Validation Procedures to Assess Accuracy Within and Across Subgroups and Subpopulations
			21.4.7 Recommendation 7: Report Algorithms, Models, and Underlying Inferences in a Transparent and Explainable Manner
		21.5 Conclusion
		References
	22 Explainable AI and AWE: Balancing Tensions Between Transparency and Predictive Accuracy
		22.1 Introduction
			22.1.1 Role of XAI in the Data Science Life Cycle and AWE Development
			22.1.2 Types and Methods of XAI
			22.1.3 Why Use XAI in AWE?
			22.1.4 Challenges of Applying XAI in the AWE Field
			22.1.5 Desirable Properties of XAI
		22.2 How SHAP Sheds Light On the Mechanics Inside AWE
		22.3 Applications of XAI in AWE
		22.4 Impact of XAI On Other Aspects of AWE Development and Deployment
		22.5 Why Should XAI Not Be Blindly Trusted for AWE?
		Notes
		References
	23 Validity Argument Roadmap for Automated Scoring
		23.1 Introduction
		23.2 The Roadmap
			23.2.1 Describe Assessment Constructs, Including Rationale for Using Automated Scoring
			23.2.2 Document Intended Score Interpretation and Uses (SIUs) (Including Scoring Rubrics, Processes, and Features)
			23.2.3 Describe Test Blueprints and Specifications to Clarify Item Types That Use Automated Scoring
			23.2.4 Describe Item/Test Development Procedures, Including Evidence Supporting Validity Arguments
			23.2.5 Document Approach for Developing AI Scoring Model(s)
			23.2.6 Describe Test Administration Procedures, Including Evidence Supporting Validity Arguments and System Security
			23.2.7 Document Psychometric Analyses Supporting Validity, Reliability, and Fairness Claims, Including Details of AI Scoring Models
			23.2.8 Document Reporting Procedures, Ensuring Alignment With Core Interpretation and Uses
		23.3 Discussion: Future Directions
			23.3.1 Future Area 1: More External Review
			23.3.2 Future Area 2: Increased Pace of Change and Complexity in AI
			23.3.3 Future Area 3: Continuing Challenges and Opportunities Around Fairness and Diversity
		Notes
		References
Section 5 Technological Innovations: “Where Do We Go From Here?”
	24 Redesigning Automated Scoring Engines to Include Deep Learning Models
		24.1 Introduction
		24.2 Deep Learning Approaches to Automated Scoring
		24.3 Autoscore 1.0
		24.4 Philosophy Driving Autoscore 2.0 Development
			24.4.1 Future-Proof the Software
			24.4.2 Demonstrate Improved (Or at Least Same) Level of Performance
			24.4.3 Minimize Negative Business Impacts
			24.4.4 Support New R&D and Flexible Training/Validation
		24.5 Autoscore 2.0
			24.5.1 Architecture
			24.5.2 Continuity With Autoscore 1.0
			24.5.3 Training and Validation
			24.5.4 Deployment/Scoring
				24.5.4.1 Supporting Several Models and Millions of Responses During Live Scoring
				24.5.4.2 Model Size and Hardware
				24.5.4.3 Sensitivity to Operating Systems and Libraries/Versions
				24.5.4.4 Model Management
			24.5.5 Automated Scoring Performance
			24.5.6 Considerations
				24.5.6.1 Length
				24.5.6.2 Confidence
				24.5.6.3 Error Diagnosis/Explainability
				24.5.6.4 Fairness/Bias
		24.6 Future Work
			24.6.1 Explainability
			24.6.2 Optimization
			24.6.3 Feedback
			24.6.4 Large Language Models
		24.7 Conclusion
		References
	25 Automated Short-Response Scoring for Automated Item Generation in Science Assessments
		25.1 Introduction
			25.1.1 Overview of Automated Essay Scoring
				25.1.1.1 Efficiency, Objectivity, and Consistency
				25.1.1.2 Customized Learning With Transparent AES
			25.1.2 Overview of Automated Item Generation for Educational Assessments
				25.1.2.1 Automated Item Generation Frameworks
				25.1.2.2 Automated Item Generation System to Target Misconception
			25.1.3 Present Study
		25.2 Demonstration With the Science Assessment Response Data
			25.2.1 Data
			25.2.2 Methods and Analysis Framework
				25.2.2.1 Libraries and GPU Setup for Automated Scoring
				25.2.2.2 Preprocessing
			25.2.3 Transformer Neural-Language Models for Automated Scoring
			25.2.4 Attribution Score Analysis for Misconception Identification
			25.2.5 Item Generation Demonstration Using Transformer Models
			25.2.6 Evaluation Metrics
		25.3 Results
			25.3.1 Automated Essay Scoring Evaluation Results
			25.3.2 Attribution Score Analysis Results
			25.3.3 Automated Item Generation Demonstration Results With Creative Prompting
				25.3.3.1 Example Items Generated From Question 5
				25.3.3.2 Semantic Similarity Score and Prompt-Based Scoring Results
		25.4 Conclusion and Discussion
			25.4.1 Limitations and Directions for Future Research
		Notes
		References
	26 Latent Dirichlet Allocation of Constructed Responses
		26.1 Introduction
			26.1.1 Purpose of the Chapter
		26.2 Latent Dirichlet Allocation
			26.2.1 LDA Model Parameters
			26.2.2 LDA as a Graphical Model
			26.2.3 LDA as a Generative Model
			26.2.4 LDA as a Probabilistic Model
			26.2.5 Extension to LDA
			26.2.6 Understanding Word Probabilities of Topics and Topic Proportions
		26.3 Data Preprocessing, Estimation, and Postprocessing for LDA
			26.3.1 Data Cleaning and Preprocessing
				26.3.1.1 Tokenization
				26.3.1.2 Normalization (Lemmatization)
				26.3.1.3 Stopword Removal
			26.3.2 Estimation of Parameters
				26.3.2.1 Prior Specification
				26.3.2.2 Estimation Algorithm and Software
				26.3.2.3 Model Selection
			26.3.3 Postprocessing and Model Interpretation
		26.4 LDA Example
			26.4.1 Essay Data
			26.4.2 Data Cleaning
			26.4.3 Model Estimation and Selection
			26.4.4 Interpreting LDA Results
			26.4.5 Note On Generalized LDA Code
		26.5 Future Direction for LDA
		References
		Appendix A: Generalized R Code for LDA
		Appendix B: Generalized Python Code for LDA
	27 Computational Language Analysis as a Window Into Cognitive Functioning
		27.1 Language as a Window Into Cognitive Functioning
			27.1.1 Computational Measurement of Language: Cognition, Affect, and Social States
			27.1.2 Historical Perspectives On Language for Clinical Assessment of Serious Mental Illness
		27.2 Underlying Technology and Goals for Assessment of Cognitive Functioning
			27.2.1 Overarching Purpose and a Generalized Architecture
			27.2.2 Automatic Speech Recognition (ASR)
			27.2.3 Language and Speech Feature Extraction With a Focus On Several Specific Measures
				27.2.3.1 Lexeme-Level Features
				27.2.3.2 Syntactic Features
				27.2.3.3 Semantic Features
				27.2.3.4 Speech Signal Features
			27.2.4 Training Feature-Based Machine Learning Models
		27.3 Research and Applications Applying Language Analysis to Measuring Mental Health
			27.3.1 Automating Traditional Neuropsychiatric Assessments
			27.3.2 Detecting Presence and Severity of Thought Disorder
			27.3.3 Acoustic Features for Affective and Cognitive State Assessment
			27.3.4 Detecting Onset of Cognitive Decline
			27.3.5 Detection of Students at Risk
			27.3.6 Applying Deep Learning Models for Mental Health Assessment
		27.4 Ethical and Measurement Considerations
			27.4.1 What Are We Measuring?
			27.4.2 Actionable Inferences
			27.4.3 Establishing Trustworthy Measures: Transparency, Generalizability, and Explainability
			27.4.4 Bias and Fairness
			27.4.5 Human in the Loop ML Processing
		27.5 Conclusions
		References
	28 Expanding AWE to Incorporate Reading and Writing Evaluation
		28.1 Introduction
		28.2 Overview of Discourse Comprehension Theories
			28.2.1 Single Document Comprehension
				28.2.1.1 Measuring Comprehension in Single-Text Contexts
			28.2.2 Multiple Document Comprehension
				28.2.2.1 Measuring Comprehension in MD Contexts
		28.3 Recommendations for Comprehension-Aware AWE Systems
			28.3.1 Recommendation 1: AWE Systems Should Account for the Ways in Which Students Engage With Outside Source Material
			28.3.2 Recommendation 2: AWE Systems Should Account for the Comprehension Processes Involved in Understanding the Material That Students Are Asked to Write About
			28.3.3 Recommendation 3: AWE Systems Should Account for Individual Differences in the Skills and Knowledge Involved in Complex Writing Tasks
		28.4 Conclusion
		References
	29 The Two U’s in the Future of Automated Essay Evaluation: Universal Access and User-Centered Design
		29.1 Introduction
		29.2 Universal Access
			29.2.1 Device Agnostic Technologies: Responsive Design
			29.2.2 Language Agnostic Technologies
			29.2.3 Genre Responsive Technologies
			29.2.4 Culturally Responsive Technologies
			29.2.5 Embedded AEE
		29.3 User-Centered Design
			29.3.1 Large-Scale Digital Learning Platforms
			29.3.2 Generative AI
		29.4 Conclusions
		References
Index



