Edition: 1
Authors: Mark D. Shermis (editor), Joshua Wilson (editor)
ISBN: 1032502568, 9781032502564
Publisher: Routledge
Year of publication: 2024
Pages: 647
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 114 MB
If you need The Routledge International Handbook of Automated Essay Evaluation (Routledge International Handbooks) converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert the file for you.
Note that The Routledge International Handbook of Automated Essay Evaluation (Routledge International Handbooks) is the original-language edition and has not been translated into Persian. The International Library website offers original-language books only and does not offer any books translated into or written in Persian.
Cover Half Title Series Information Title Page Copyright Page Table of Contents About the Editors List of Contributors Foreword Acknowledgments Reviewer Acknowledgments Section 1 Introduction to AEE and Modern AEE Systems 1 Introduction to Automated Essay Evaluation 1.1 Introduction 1.2 The Evolution of Automated Scoring and Automated Feedback On Writing 1.2.1 The 2012 Hewlett Trials and Their Outcomes 1.2.2 The National Assessment of Educational Progress (NAEP) Trials 1.3 Current Use Cases for Automated Essay Evaluation 1.3.1 Evaluating Essays With 150 Words Or More 1.3.2 Short-Form Constructed Responses With Fewer Than 150 Words 1.3.3 Content-Intensive Responses 1.3.4 Content-Superficial Responses 1.3.5 Summatively Scored Essays 1.3.6 Formative Assessment 1.4 Frameworks for Validating AEE 1.5 Lingering and New Concerns Related to AEE 1.6 The Current Handbook: Apprising the State of the Art and Fostering Future Development References 2 Automated Essay Evaluation at Scale: Hybrid Automated Scoring/Hand Scoring in the Summative Assessment Program 2.1 Introduction 2.2 Progressive Hybrid Scoring Approaches 2.2.1 Overview 2.2.2 Project Essay Grade 2.2.2.1 PEG Architecture 2.2.2.2 PEG Hybrid Scoring Applications 2.2.2.3 Evidence for Use 2.2.3 Requirements 2.2.3.1 Training Data 2.2.3.2 Validation 2.2.4 Training 2.2.5 Hybrid Scoring Process 2.2.5.1 Role of Humans 2.2.5.2 Role of the Engine 2.3 Implications 2.3.1 Future Directions Notes References 3 Exploration of the Stacking Ensemble Learning Algorithm for Automated Scoring of Constructed-Response Items in Reading Assessment 3.1 Introduction 3.2 Methods 3.2.1 Data 3.2.2 Model Building Process 3.2.2.1 Text Preprocessing and Processing 3.2.2.2 Feature Extraction 3.2.2.3 Automated Scoring Classifier Development 3.2.3 Model Evaluation 3.3 Results 3.3.1 Automated Scoring Classifier Development 3.4 Summary and Discussion Notes References 4 Scoring Essays Written in Persian Using a Transformer-Based Model: Implications for 
Multilingual AES 4.1 Introduction 4.1.1 Persian as a Unique Case Study for Multilingual AES 4.1.2 Purpose of the Chapter 4.2 Overview of a Transformer-Based System for AES 4.2.1 Introduction to Transformers 4.2.2 Bidirectional Encoder Representations From Transformers 4.2.3 Multilingual BERT 4.3 Scoring Persian Essays Using MBERT Transformer Model 4.3.1 Data Set 4.3.2 Model Architecture 4.3.2.1 Word Embedding Word2Vec Model 4.3.2.2 Transformer MBERT Model 4.3.2.3 Hyperparameter Tuning 4.3.3 Performance Measures 4.4 Comparing the Performance of the Word Embedding and Transformer Models 4.4.1 Performance of Models Overall 4.4.2 Performance of Models By Score Level 4.5 Conclusions and Implications for Multilingual AES 4.5.1 The Importance of Transformers for Multilingual AES 4.5.2 Using MBERT to Score Essays Written in Persian 4.5.3 Assessment Technology, Equity, and Opportunity Note References Appendix A Appendix B ....... .... Instruction Topics 5 SmartWriting-Mandarin: An Automated Essay Scoring System for Chinese Foreign Language Learners 5.1 Introduction 5.2 Related Works 5.2.1 DNN-Based AES Systems 5.2.2 Chinese Automatic Essay Scoring and ACES 5.3 Details of SW-M 5.3.1 Preprocessing Module 5.3.2 Textual Features 5.3.3 Typos 5.3.4 Grammatical Errors 5.3.5 Scoring Model: A Fuzzy-Based Approach 5.4 Performance of SWM 5.5 Future Studies Acknowledgments References 6 NLP Application in the Hebrew Language for Assessment and Learning 6.1 Introduction 6.2 Hebrew Orthography and Morphology 6.2.1 Hebrew Orthography 6.2.2 Hebrew Morphology 6.2.2.1 The Verb System 6.2.2.2 The Noun System 6.2.2.3 Prepositions, Conjunctions, and Determiners 6.2.3 Text Length and Density 6.2.3.1 Hebrew Versus English Lexicon 6.2.3.2 Text Length 6.3 Morphological Lexicon and Corpora 6.3.1 Morphological Lexicon 6.3.2 Hebrew Corpora 6.3.2.1 M1 Corpus and the Annotated Corpus 6.3.2.2 News Corpus 6.3.3 Language Models 6.4 Computational Infrastructure for NLP in Hebrew 6.4.1 Tokenizer 6.4.2 
Morphological Analyzer 6.4.3 Morphological Disambiguator 6.4.4 Semantic Disambiguator 6.4.5 Feature Extraction 6.4.5.1 Statistical Or “Surface” Features 6.4.5.2 Lexical Features 6.4.5.3 Morphological Features 6.4.5.4 Syntactic Features 6.4.5.5 Semantic Features 6.4.6 Grouping Text Features Into Linguistic Factors 6.4.7 Text Analysis Pipeline 6.5 Automated Essay Scoring 6.5.1 Score Prediction Algorithms 6.5.2 Grouping Features Into Macro-Features and Factors 6.5.3 Validity of NiteRater 6.5.3.1 Face and Content Validity – Identifying and Scoring Aberrant Essays 6.5.3.2 Predictive Or Criterion-Related Validity – Scoring of Classroom Essays 6.5.3.3 Predictive Or Criterion-Related Validity – Scoring of Tests for Admission to Higher Education 6.5.3.4 “True” Validity – Agreement With True Scores 6.5.3.5 Content Validity – Generalizing the Prediction Equation Across Prompts 6.5.4 Validity of Combined Computer and Human Scores 6.5.5 Quality Assurance of Essay Scoring 6.6 Other Applications of the Hebrew-NLP System 6.6.1 Providing Feedback to Essay Writers 6.6.2 Readability Assessment 6.6.2.1 Application to Textbooks (CET) 6.6.2.2 Simplification of Hebrew Legal Texts 6.6.3 Online Service to the Research Community 6.7 Summary, Open Issues, and Future Directions References Section 2 Expanding Automated Evaluation: Reading, Speech, Mathematics, and Writing Research 7 Automated Scoring for NAEP Short-Form Constructed Responses in Reading 7.1 Introduction 7.1.1 Short-Form Constructed Responses 7.1.2 The Current Study 7.2 Method 7.2.1 Prompt-Specific Competition 7.2.1.1 Participants 7.2.1.2 Instruments 7.2.1.3 Procedure 7.2.1.4 Results 7.2.2 Generic Competition 7.2.2.1 Participants and Instruments 7.2.2.2 Procedure 7.2.2.3 Results 7.3 Discussion 7.3.1 Limitations References 8 Automated Scoring and Feedback for Spoken Language 8.1 Introduction 8.2 Automated Scoring of Spoken Vs. 
Written Language 8.3 From the Rubric to Speech Features 8.4 Automated Speech Scoring System Architecture 8.4.1 Automatic Speech Recognition 8.4.2 Computing Speech Features 8.4.3 Filtering Models 8.4.4 Scoring Models 8.5 Operational Considerations 8.6 Providing Feedback to Language Learners 8.7 Speech Scoring Without Curated Features 8.8 Open Research Issues 8.9 Conclusion Note References 9 Automated Scoring of Math Constructed-Response Items 9.1 Introduction 9.2 Anatomy of a Math Item 9.3 Challenges of Math Automated Scoring 9.3.1 Representation of Mathematics 9.3.2 Equivalence of Expressions 9.3.3 Evaluation of Mathematics 9.3.4 Extracting Mathematics From Prose 9.3.5 Understanding Reasoning 9.4 Injecting Mathematical Reasoning Into NLP Scoring Models 9.4.1 Scoring of Math-Only Responses 9.4.2 Scoring of Responses Containing Prose 9.4.3 Brief Comment On the Validity of Automated Scoring of Math CR Items 9.5 Empirical Study 9.5.1 Ablation Study Results 9.5.2 Large Language Models for Math CR Scoring 9.6 Conclusion References 10 We Write Automated Scoring: Using ChatGPT for Scoring in Large-Scale Writing Research Projects 10.1 Introduction 10.1.1 We Write Intervention 10.1.2 Theoretical Framework 10.2 Developing a ChatGPT-Based Scoring Algorithm to Evaluate the Efficacy of the We Write Intervention 10.2.1 Design of Measures 10.2.2 Human Scoring Scheme for Essay Quality 10.2.3 ChatGPT Scoring Model Architecture/Details 10.2.3.1 Refinement of Scoring 10.3 Score Validation: Comparing Human and ChatGPT Scoring 10.4 Discussion and Future Research 10.4.1 Score Tendencies 10.4.2 Agreement Between Scores 10.4.3 Generosity of Scoring 10.4.4 Correlation Across Proficiency Levels 10.4.5 Efficiency 10.5 Limitations 10.6 Conclusion Acknowledgments References Section 3 Innovations in Automated Writing Evaluation 11 Exploring the Role of Automated Writing Evaluation as a Formative Assessment Tool Supporting Self-Regulated Learning in Writing 11.1 Introduction 11.1.1 The Present 
Chapter 11.2 Does AWE Help Students Learn Evaluation Criteria? 11.2.1 Learning Evaluation Criteria: Summary and Future Directions 11.3 Does AWE Help Students Practice Writing Skills and Processes? 11.3.1 Practice Writing Skills and Processes: Summary and Future Directions 11.4 Does AWE Provide Understandable and Actionable Feedback? 11.4.1 Understandable and Actionable Feedback: Summary and Future Directions 11.5 Does AWE-Supported Peer Review Offer Benefits for Reviewers and Writers? 11.5.1 AWE-Supported Peer Review: Summary and Future Directions 11.6 Does AWE Support Students Taking Ownership of Their Learning? 11.6.1 Ownership of Learning: Summary and Future Directions 11.7 Conclusion References 12 Supporting Students’ Text-Based Evidence Use Via Formative Automated Writing and Revision Assessment 12.1 Introduction 12.1.1 Theoretical Background and Motivation for ERevise System Design 12.2 From Automated Essay Scoring to Automated Writing Evaluation 12.2.1 ERevise – Initial Design of a Formative Assessment System 12.2.2 Prior Research Contributing to the Motivation for Formative Assessment Grounded in Sociocultural Learning Theories 12.2.2.1 Conceptual Change as Elemental for “Learning” the Genre of Argumentation 12.2.2.2 Meaning-Making as the Goal of Feedback (And Measurement in the AWE System) 12.3 Validity of the ERevise System for Use as a Formative Assessment 12.4 Toward ERevise+RF 12.4.1 Revising in Response to Feedback Is a Critical Skill Not Yet Addressed in AWE Systems 12.4.2 ERevise+RF as an AWE Formative Assessment to Support the Development of Revision Skills 12.4.3 The System Is Based On Rubrics Assessing Critical Aspects of the Revision Construct 12.4.4 NLP Approaches Prioritize Rubric Alignment 12.5 Summary and Conclusions: An Unfinished Composition Notes Acknowledgements References 13 The Use of AWE in Non-English Majors: Student Responses to Automated Feedback and the Impact of Feedback Accuracy 13.1 Introduction 13.1.1 Literature Review 
13.1.1.1 Theoretical Background 13.1.1.2 Learner Responses to Automated Feedback and the Impact of Feedback Accuracy 13.1.2 The Present Study 13.2 Methods 13.2.1 Context and Participants 13.2.2 Criterion 13.2.3 Data Collection and Procedures 13.2.4 Data Analysis 13.3 Results 13.3.1 To What Extent Do Non-English Major Students Address Automated Feedback From Criterion and Make Successful and Relevant Revisions When Using It On a Voluntary Basis? 13.3.2 How Does the Accuracy of Automated Feedback Features of Criterion Influence the Success of Non-English Major Students’ Revisions? 13.4 Discussion 13.5 Conclusion References 14 Relationships Between Middle-School Teachers’ Perceptions and Application of Automated Writing Evaluation and Student Performance 14.1 Introduction 14.1.1 Implementation of AWE in Schools 14.1.2 Educators’ Views of AWE 14.1.3 Study Purpose 14.2 Methods 14.2.1 Study Context 14.2.2 MI Write Overview 14.2.2.1 Fidelity of Implementation Expectations 14.2.3 Participants 14.2.4 Measures 14.2.4.1 AWE Perceptions Scale (AWE-P) 14.2.4.2 MI Write Usage Indicators 14.2.4.3 Student Writing Performance 14.2.5 Data Analysis 14.2.5.1 RQ1 14.2.5.2 RQ2 14.3 Results 14.3.1 Descriptive Statistics: Teacher Perceptions and Fidelity 14.3.2 Alignment Between Teachers’ Perceptions and Usage of MI Write 14.3.2.1 Usability 14.3.2.2 Usefulness 14.3.2.3 Social Desirability 14.3.2.4 Misalignments 14.3.3 Relations Among Teacher Perceptions and Usage, and Student Writing Performance 14.4 Discussion 14.4.1 Limitations and Future Directions 14.5 Conclusion Conflict of Interest Statement References 15 Automated Writing Trait Analysis 15.1 Introduction 15.1.1 Forms of Automated Writing Trait Analysis 15.1.1.1 Multidimensional Modeling of Genre and Style 15.1.1.2 Multidimensional Modeling of Writing Quality 15.1.1.3 Multidimensional Keystroke Log Analysis 15.1.2 The Potential Power of Combined Models 15.2 A Hierarchical, Multidimensional Model of Text Variation in Student 
Essays 15.2.1 Feature Sets 15.2.2 Participants/Data Source 15.2.3 Confirmatory Factor Analysis 15.2.4 Reliability 15.2.5 External Validity 15.3 Multidimensional Modeling of Variation in Student Writing Processes 15.4 Using Combined Multidimensional Models to Measure the Effects of Instruction 15.4.1 Method 15.4.1.1 Participants 15.4.1.2 Materials 15.4.1.3 Qualitative School Differences 15.4.1.4 Procedure 15.4.2 Results 15.4.2.1 Factor Structure 15.4.2.2 Relations to Prior-Year Performance 15.4.2.3 Changes in Performance Over Time 15.4.2.4 Relation With Pretest and Prior-Year Variables 15.4.2.5 School Differences 15.4.2.6 Demographic Effects 15.5 Discussion, Limitations, and Conclusions Notes References 16 Advances in Automating Feedback for Argumentative Writing: Feedback Prize as a Case Study 16.1 Introduction 16.2 The Feedback Prize 16.2.1 Corpus Annotation 16.2.1.1 Discourse Elements 16.2.1.2 Discourse Effectiveness 16.2.1.3 Holistic Essay Quality 16.3 Feedback Prize 1.0: Evaluating Student Writing 16.3.1 Architecture and Accuracy of Select Models 16.3.2 Overall Model Trends 16.3.3 Bias of Select Models 16.4 Feedback Prize 2.0: Predicting Effective Arguments 16.4.1 Architecture and Performance of Select Models 16.4.2 Accuracy of Select Models By Effectiveness Rating 16.4.3 Bias of Select Models 16.5 Leveraging the PERSUADE Corpus and Algorithms to Improve Student Writing Outcomes in AWE Systems 16.5.1 Potential User-Facing Features in an AWE System 16.5.2 Supporting the Academic Growth of All Students, Including the Historically Marginalized References Appendix 17 Automated Feedback in Formative Assessment 17.1 Introduction 17.1.1 Background and Context 17.1.2 A Focus On Claims and Evidence 17.1.3 Structured, Content-Centric Rubrics 17.1.4 Associating Response Elements to Rubric Quality-Level Definitions 17.1.5 Approach to Task and Texts 17.1.6 Moving Beyond Generic, Holistic Rubrics 17.2 Research Design 17.2.1 Item/Lesson Requirements 17.2.2 Sentence-Level 
Scoring 17.2.3 Data Sources 17.2.4 Scoring Processes 17.3 Automated Scoring With a Focus On Feedback 17.3.1 Overview of the Automated Scoring Approach 17.3.2 Data Selection 17.3.3 CER Data Collection 17.3.4 How Much CER Data Is Needed 17.3.5 Additional Analytics for Response Analysis 17.3.5.1 Claim Category Exemplars 17.3.5.2 Evidence Exemplars and Keyword Sets 17.3.6 The Automated Evaluation and Feedback Pipeline 17.3.7 From Predictions and Analytics to Feedback 17.3.8 Examples of Feedback to Simulated Responses 17.4 Results and Discussion 17.4.1 What Works Well 17.4.2 Challenges and Areas for Additional Analytics Development 17.4.3 Future Research: Efficacy and Beyond 17.4.4 Narrative Feedback and Analysis: A Potential Next Step With Large Language Models References Section 4 Factors Affecting the Performance of Automated Evaluation 18 Using Automated Scoring to Support Rating Quality Analyses for Human Raters 18.1 Introduction 18.1.1 Purpose 18.1.2 Background 18.1.2.1 Evaluating Ratings in Incomplete Scoring Designs 18.1.2.2 Combining Human and Automated Ratings 18.2 Methods 18.2.1 Data Analysis 18.2.1.1 Rater Effects Analyses 18.2.1.2 Severity and Leniency Effects 18.2.1.3 Centrality and Extremism Effects 18.2.2 Summarizing Rater Effect Results 18.3 Results 18.3.1 Severity and Leniency 18.3.2 Centrality and Extremism 18.4 Discussion References 19 Calibrating and Evaluating Automated Scoring Engines and Human Raters Over Time Using Measurement Models 19.1 Introduction 19.1.1 Purpose 19.1.2 Background: Rating Quality Analysis With Automated Scoring Engines 19.2 Methods 19.2.1 Data Analysis 19.3 Results 19.3.1 Rater Drift Results 19.4 Discussion 19.4.1 How Can ASEs Be Incorporated Into Rater Drift Analyses Based On Measurement Models? 19.4.2 How Accurate Are Rater Drift Analyses Based On Measurement Models to Changes in Rater Severity Between Administrations? 
19.4.3 Practical Implications 19.4.4 Directions for Future Research References 20 AI Scoring and Writing Fairness 20.1 Introduction 20.1.1 DIF Testing and Machine Learning 20.2 Method 20.2.1 Study Context 20.2.2 Participants 20.2.3 Instruments 20.2.4 Procedure 20.3 Results 20.4 Discussion Author Note References 21 Automating Bias in Writing Evaluation: Sources, Barriers, and Recommendations 21.1 Concerns About Bias in Automated Writing Evaluation 21.2 Human Language Biases 21.3 Automating Bias 21.3.1 Biased Training Data 21.3.2 Constrained Analytical Assumptions 21.3.2.1 Aggregation 21.3.2.2 Assumptions of Linearity 21.3.3 Black Box Approaches: Lack of Validation and Transparency 21.3.3.1 Validation 21.3.3.2 Lack of Transparency 21.4 Bias Reduction and Mitigating Bias in Automated Writing Evaluation 21.4.1 Recommendation 1: Compile Intentionally Inclusive and Representative Training Datasets 21.4.2 Recommendation 2: Rigorously Train Human Raters Regarding Potential Language Ideologies Along With Anti-Bias Assessment Practices 21.4.3 Recommendation 3: Explore Training Data for Discrepancies, Disparities, and Other Signs of Bias Across Subpopulations 21.4.4 Recommendation 4: Implement Principled Aggregation Only After Establishing Equivalence 21.4.5 Recommendation 5: Include Nonlinear Relationships in Analyses and Allow for Complex Associations 21.4.6 Recommendation 6: Extend Validation Procedures to Assess Accuracy Within and Across Subgroups and Subpopulations 21.4.7 Recommendation 7: Report Algorithms, Models, and Underlying Inferences in a Transparent and Explainable Manner 21.5 Conclusion References 22 Explainable AI and AWE: Balancing Tensions Between Transparency and Predictive Accuracy 22.1 Introduction 22.1.1 Role of XAI in the Data Science Life Cycle and AWE Development 22.1.2 Types and Methods of XAI 22.1.3 Why Use XAI in AWE? 
22.1.4 Challenges of Applying XAI in the AWE Field 22.1.5 Desirable Properties of XAI 22.2 How SHAP Sheds Light On the Mechanics Inside AWE 22.3 Applications of XAI in AWE 22.4 Impact of XAI On Other Aspects of AWE Development and Deployment 22.5 Why Should XAI Not Be Blindly Trusted for AWE? Notes References 23 Validity Argument Roadmap for Automated Scoring 23.1 Introduction 23.2 The Roadmap 23.2.1 Describe Assessment Constructs, Including Rationale for Using Automated Scoring 23.2.2 Document Intended Score Interpretation and Uses (SIUs) (Including Scoring Rubrics, Processes, and Features) 23.2.3 Describe Test Blueprints and Specifications to Clarify Item Types That Use Automated Scoring 23.2.4 Describe Item/Test Development Procedures, Including Evidence Supporting Validity Arguments 23.2.5 Document Approach for Developing AI Scoring Model(s) 23.2.6 Describe Test Administration Procedures, Including Evidence Supporting Validity Arguments and System Security 23.2.7 Document Psychometric Analyses Supporting Validity, Reliability, and Fairness Claims, Including Details of AI Scoring Models 23.2.8 Document Reporting Procedures, Ensuring Alignment With Core Interpretation and Uses 23.3 Discussion: Future Directions 23.3.1 Future Area 1: More External Review 23.3.2 Future Area 2: Increased Pace of Change and Complexity in AI 23.3.3 Future Area 3: Continuing Challenges and Opportunities Around Fairness and Diversity Notes References Section 5 Technological Innovations: “Where Do We Go From Here?” 24 Redesigning Automated Scoring Engines to Include Deep Learning Models 24.1 Introduction 24.2 Deep Learning Approaches to Automated Scoring 24.3 Autoscore 1.0 24.4 Philosophy Driving Autoscore 2.0 Development 24.4.1 Future-Proof the Software 24.4.2 Demonstrate Improved (Or at Least Same) Level of Performance 24.4.3 Minimize Negative Business Impacts 24.4.4 Support New R&D and Flexible Training/Validation 24.5 Autoscore 2.0 24.5.1 Architecture 24.5.2 Continuity With Autoscore 
1.0 24.5.3 Training and Validation 24.5.4 Deployment/Scoring 24.5.4.1 Supporting Several Models and Millions of Responses During Live Scoring 24.5.4.2 Model Size and Hardware 24.5.4.3 Sensitivity to Operating Systems and Libraries/Versions 24.5.4.4 Model Management 24.5.5 Automated Scoring Performance 24.5.6 Considerations 24.5.6.1 Length 24.5.6.2 Confidence 24.5.6.3 Error Diagnosis/Explainability 24.5.6.4 Fairness/Bias 24.6 Future Work 24.6.1 Explainability 24.6.2 Optimization 24.6.3 Feedback 24.6.4 Large Language Models 24.7 Conclusion References 25 Automated Short-Response Scoring for Automated Item Generation in Science Assessments 25.1 Introduction 25.1.1 Overview of Automated Essay Scoring 25.1.1.1 Efficiency, Objectivity, and Consistency 25.1.1.2 Customized Learning With Transparent AES 25.1.2 Overview of Automated Item Generation for Educational Assessments 25.1.2.1 Automated Item Generation Frameworks 25.1.2.2 Automated Item Generation System to Target Misconception 25.1.3 Present Study 25.2 Demonstration With the Science Assessment Response Data 25.2.1 Data 25.2.2 Methods and Analysis Framework 25.2.2.1 Libraries and GPU Setup for Automated Scoring 25.2.2.2 Preprocessing 25.2.3 Transformer Neural-Language Models for Automated Scoring 25.2.4 Attribution Score Analysis for Misconception Identification 25.2.5 Item Generation Demonstration Using Transformer Models 25.2.6 Evaluation Metrics 25.3 Results 25.3.1 Automated Essay Scoring Evaluation Results 25.3.2 Attribution Score Analysis Results 25.3.3 Automated Item Generation Demonstration Results With Creative Prompting 25.3.3.1 Example Items Generated From Question 5 25.3.3.2 Semantic Similarity Score and Prompt-Based Scoring Results 25.4 Conclusion and Discussion 25.4.1 Limitations and Directions for Future Research Notes References 26 Latent Dirichlet Allocation of Constructed Responses 26.1 Introduction 26.1.1 Purpose of the Chapter 26.2 Latent Dirichlet Allocation 26.2.1 LDA Model Parameters 26.2.2 LDA 
as a Graphical Model 26.2.3 LDA as a Generative Model 26.2.4 LDA as a Probabilistic Model 26.2.5 Extension to LDA 26.2.6 Understanding Word Probabilities of Topics and Topic Proportions 26.3 Data Preprocessing, Estimation, and Postprocessing for LDA 26.3.1 Data Cleaning and Preprocessing 26.3.1.1 Tokenization 26.3.1.2 Normalization (Lemmatization) 26.3.1.3 Stopword Removal 26.3.2 Estimation of Parameters 26.3.2.1 Prior Specification 26.3.2.2 Estimation Algorithm and Software 26.3.2.3 Model Selection 26.3.3 Postprocessing and Model Interpretation 26.4 LDA Example 26.4.1 Essay Data 26.4.2 Data Cleaning 26.4.3 Model Estimation and Selection 26.4.4 Interpreting LDA Results 26.4.5 Note On Generalized LDA Code 26.5 Future Direction for LDA References Appendix A: Generalized R Code for LDA Appendix B: Generalized Python Code for LDA 27 Computational Language Analysis as a Window Into Cognitive Functioning 27.1 Language as a Window Into Cognitive Functioning 27.1.1 Computational Measurement of Language: Cognition, Affect, and Social States 27.1.2 Historical Perspectives On Language for Clinical Assessment of Serious Mental Illness 27.2 Underlying Technology and Goals for Assessment of Cognitive Functioning 27.2.1 Overarching Purpose and a Generalized Architecture 27.2.2 Automatic Speech Recognition (ASR) 27.2.3 Language and Speech Feature Extraction With a Focus On Several Specific Measures 27.2.3.1 Lexeme-Level Features 27.2.3.2 Syntactic Features 27.2.3.3 Semantic Features 27.2.3.4 Speech Signal Features 27.2.4 Training Feature-Based Machine Learning Models 27.3 Research and Applications Applying Language Analysis to Measuring Mental Health 27.3.1 Automating Traditional Neuropsychiatric Assessments 27.3.2 Detecting Presence and Severity of Thought Disorder 27.3.3 Acoustic Features for Affective and Cognitive State Assessment 27.3.4 Detecting Onset of Cognitive Decline 27.3.5 Detection of Students at Risk 27.3.6 Applying Deep Learning Models for Mental Health Assessment 
27.4 Ethical and Measurement Considerations 27.4.1 What Are We Measuring? 27.4.2 Actionable Inferences 27.4.3 Establishing Trustworthy Measures: Transparency, Generalizability, and Explainability 27.4.4 Bias and Fairness 27.4.5 Human in the Loop ML Processing 27.5 Conclusions References 28 Expanding AWE to Incorporate Reading and Writing Evaluation 28.1 Introduction 28.2 Overview of Discourse Comprehension Theories 28.2.1 Single Document Comprehension 28.2.1.1 Measuring Comprehension in Single-Text Contexts 28.2.2 Multiple Document Comprehension 28.2.2.1 Measuring Comprehension in MD Contexts 28.3 Recommendations for Comprehension-Aware AWE Systems 28.3.1 Recommendation 1: AWE Systems Should Account for the Ways in Which Students Engage With Outside Source Material 28.3.2 Recommendation 2: AWE Systems Should Account for the Comprehension Processes Involved in Understanding the Material That Students Are Asked to Write About 28.3.3 Recommendation 3: AWE Systems Should Account for Individual Differences in the Skills and Knowledge Involved in Complex Writing Tasks 28.4 Conclusion References 29 The Two U’s in the Future of Automated Essay Evaluation: Universal Access and User-Centered Design 29.1 Introduction 29.2 Universal Access 29.2.1 Device Agnostic Technologies: Responsive Design 29.2.2 Language Agnostic Technologies 29.2.3 Genre Responsive Technologies 29.2.4 Culturally Responsive Technologies 29.2.5 Embedded AEE 29.3 User-Centered Design 29.3.1 Large-Scale Digital Learning Platforms 29.3.2 Generative AI 29.4 Conclusions References Index