Edition: [1st ed. 2021]
Authors: Kyriakos G. Vamvoudakis (editor), Yan Wan (editor), Frank L. Lewis (editor), Derya Cansever (editor)
Series: Studies in Systems, Decision and Control, 325
ISBN: 3030609898, 9783030609894
Publisher: Springer
Publication year: 2021
Number of pages: 857 [839]
Language: English
File format: PDF (can be converted to PDF, EPUB, or AZW3 at the user's request)
File size: 20 MB
If you would like the book Handbook of Reinforcement Learning and Control: 325 (Studies in Systems, Decision and Control, 325) converted to PDF, EPUB, AZW3, MOBI, or DJVU format, you can notify support and they will convert the file for you.
Please note that Handbook of Reinforcement Learning and Control: 325 (Studies in Systems, Decision and Control, 325) is the original-language edition, not a Persian translation. The International Library website offers books in their original language only and does not provide any books translated into or written in Persian.
This handbook presents state-of-the-art research in reinforcement learning, focusing on its applications in the control and game theory of dynamic systems and on future directions for related research and technology. The contributions gathered in this book deal with challenges faced when using learning and adaptation methods to solve academic and industrial problems, such as optimization in dynamic environments with single and multiple agents, convergence and performance analysis, and online implementation. They explore means by which these difficulties can be solved, and cover a wide range of related topics including: deep learning; artificial intelligence; applications of game theory; mixed modality learning; and multi-agent reinforcement learning. Practicing engineers and scholars in the fields of machine learning, game theory, and autonomous control will find the Handbook of Reinforcement Learning and Control to be thought-provoking, instructive, and informative.
Preface
Contents

Part I: Theory of Reinforcement Learning for Model-Free and Model-Based Control and Games

1 What May Lie Ahead in Reinforcement Learning
References

2 Reinforcement Learning for Distributed Control and Multi-player Games
2.1 Introduction
2.2 Optimal Control of Continuous-Time Systems
2.2.1 IRL with Experience Replay Learning Technique
2.2.2 H∞ Control of CT Systems
2.3 Nash Games
2.4 Graphical Games
2.4.1 Off-Policy RL for Graphical Games
2.5 Output Synchronization of Multi-agent Systems
2.6 Conclusion and Open Research Directions
References

3 From Reinforcement Learning to Optimal Control: A Unified Framework for Sequential Decisions
3.1 Introduction
3.2 The Communities of Sequential Decisions
3.3 Stochastic Optimal Control Versus Reinforcement Learning
3.3.1 Stochastic Control
3.3.2 Reinforcement Learning
3.3.3 A Critique of the MDP Modeling Framework
3.3.4 Bridging Optimal Control and Reinforcement Learning
3.4 The Universal Modeling Framework
3.4.1 Dimensions of a Sequential Decision Model
3.4.2 State Variables
3.4.3 Objective Functions
3.4.4 Notes
3.5 Energy Storage Illustration
3.5.1 A Basic Energy Storage Problem
3.5.2 With a Time-Series Price Model
3.5.3 With Passive Learning
3.5.4 With Active Learning
3.5.5 With Rolling Forecasts
3.5.6 Remarks
3.6 Designing Policies
3.6.1 Policy Search
3.6.2 Lookahead Approximations
3.6.3 Hybrid Policies
3.6.4 Remarks
3.6.5 Stochastic Control, Reinforcement Learning, and the Four Classes of Policies
3.7 Policies for Energy Storage
3.8 Extension to Multi-agent Systems
3.9 Observations
References

4 Fundamental Design Principles for Reinforcement Learning Algorithms
4.1 Introduction
4.1.1 Stochastic Approximation and Reinforcement Learning
4.1.2 Sample Complexity Bounds
4.1.3 What Will You Find in This Chapter?
4.1.4 Literature Survey
4.2 Stochastic Approximation: New and Old Tricks
4.2.1 What is Stochastic Approximation?
4.2.2 Stochastic Approximation and Learning
4.2.3 Stability and Convergence
4.2.4 Zap–Stochastic Approximation
4.2.5 Rates of Convergence
4.2.6 Optimal Convergence Rate
4.2.7 TD and LSTD Algorithms
4.3 Zap Q-Learning: Fastest Convergent Q-Learning
4.3.1 Markov Decision Processes
4.3.2 Value Functions and the Bellman Equation
4.3.3 Q-Learning
4.3.4 Tabular Q-Learning
4.3.5 Convergence and Rate of Convergence
4.3.6 Zap Q-Learning
4.4 Numerical Results
4.4.1 Finite State-Action MDP
4.4.2 Optimal Stopping in Finance
4.5 Zap-Q with Nonlinear Function Approximation
4.5.1 Choosing the Eligibility Vectors
4.5.2 Theory and Challenges
4.5.3 Regularized Zap-Q
4.6 Conclusions and Future Work
References

5 Mixed Density Methods for Approximate Dynamic Programming
5.1 Introduction
5.2 Unconstrained Affine-Quadratic Regulator
5.3 Regional Model-Based Reinforcement Learning
5.3.1 Preliminaries
5.3.2 Regional Value Function Approximation
5.3.3 Bellman Error
5.3.4 Actor and Critic Update Laws
5.3.5 Stability Analysis
5.3.6 Summary
5.4 Local (State-Following) Model-Based Reinforcement Learning
5.4.1 StaF Kernel Functions
5.4.2 Local Value Function Approximation
5.4.3 Actor and Critic Update Laws
5.4.4 Analysis
5.4.5 Stability Analysis
5.4.6 Summary
5.5 Combining Regional and Local State-Following Approximations
5.6 Reinforcement Learning with Sparse Bellman Error Extrapolation
5.7 Conclusion
References

6 Model-Free Linear Quadratic Regulator
6.1 Introduction to a Model-Free LQR Problem
6.2 A Gradient-Based Random Search Method
6.3 Main Results
6.4 Proof Sketch
6.4.1 Controlling the Bias
6.4.2 Correlation of f̂(K) and f(K)
6.5 An Example
6.6 Thoughts and Outlook
References

Part II: Constraint-Driven and Verified RL

7 Adaptive Dynamic Programming in the Hamiltonian-Driven Framework
7.1 Introduction
7.1.1 Literature Review
7.1.2 Motivation
7.1.3 Structure
7.2 Problem Statement
7.3 Hamiltonian-Driven Framework
7.3.1 Policy Evaluation
7.3.2 Policy Comparison
7.3.3 Policy Improvement
7.4 Discussions on the Hamiltonian-Driven ADP
7.4.1 Implementation with Critic-Only Structure
7.4.2 Connection to Temporal Difference Learning
7.4.3 Connection to Value Gradient Learning
7.5 Simulation Study
7.6 Conclusion
References

8 Reinforcement Learning for Optimal Adaptive Control of Time Delay Systems
8.1 Introduction
8.2 Problem Description
8.3 Extended State Augmentation
8.4 State Feedback Q-Learning Control of Time Delay Systems
8.5 Output Feedback Q-Learning Control of Time Delay Systems
8.6 Simulation Results
8.7 Conclusions
References

9 Optimal Adaptive Control of Partially Uncertain Linear Continuous-Time Systems with State Delay
9.1 Introduction
9.2 Problem Statement
9.3 Linear Quadratic Regulator Design
9.3.1 Periodic Sampled Feedback
9.3.2 Event Sampled Feedback
9.4 Optimal Adaptive Control
9.4.1 Periodic Sampled Feedback
9.4.2 Event Sampled Feedback
9.4.3 Hybrid Reinforcement Learning Scheme
9.5 Perspectives on Controller Design with Image Feedback
9.6 Simulation Results
9.6.1 Linear Quadratic Regulator with Known Internal Dynamics
9.6.2 Optimal Adaptive Control with Unknown Drift Dynamics
9.7 Conclusion
References

10 Dissipativity-Based Verification for Autonomous Systems in Adversarial Environments
10.1 Introduction
10.1.1 Related Work
10.1.2 Contributions
10.1.3 Structure
10.1.4 Notation
10.2 Problem Formulation
10.2.1 (Q,S,R)-Dissipative and L2-Gain Stable Systems
10.3 Learning-Based Distributed Cascade Interconnection
10.4 Learning-Based L2-Gain Composition
10.4.1 Q-Learning for L2-Gain Verification
10.4.2 L2-Gain Model-Free Composition
10.5 Learning-Based Lossless Composition
10.6 Discussion
10.7 Conclusion and Future Work
References

11 Reinforcement Learning-Based Model Reduction for Partial Differential Equations: Application to the Burgers Equation
11.1 Introduction
11.2 Basic Notation and Definitions
11.3 RL-Based Model Reduction of PDEs
11.3.1 Reduced-Order PDE Approximation
11.3.2 Proper Orthogonal Decomposition for ROMs
11.3.3 Closure Models for ROM Stabilization
11.3.4 Main Result: RL-Based Closure Model
11.4 Extremum Seeking Based Closure Model Auto-Tuning
11.5 The Case of the Burgers Equation
11.6 Conclusion
References

Part III: Multi-agent Systems and RL

12 Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
12.1 Introduction
12.2 Background
12.2.1 Single-Agent RL
12.2.2 Multi-Agent RL Framework
12.3 Challenges in MARL Theory
12.3.1 Non-unique Learning Goals
12.3.2 Non-stationarity
12.3.3 Scalability Issue
12.3.4 Various Information Structures
12.4 MARL Algorithms with Theory
12.4.1 Cooperative Setting
12.4.2 Competitive Setting
12.4.3 Mixed Setting
12.5 Application Highlights
12.5.1 Cooperative Setting
12.5.2 Competitive Setting
12.5.3 Mixed Settings
12.6 Conclusions and Future Directions
References

13 Computational Intelligence in Uncertainty Quantification for Learning Control and Differential Games
13.1 Introduction
13.2 Problem Formulation of Optimal Control for Uncertain Systems
13.2.1 Optimal Control for Systems with Parameters Modulated by Multi-dimensional Uncertainties
13.2.2 Optimal Control for Random Switching Systems
13.3 Effective Uncertainty Evaluation Methods
13.3.1 Problem Formulation
13.3.2 The MPCM
13.3.3 The MPCM-OFFD
13.4 Optimal Control Solutions for Systems with Parameter Modulated by Multi-dimensional Uncertainties
13.4.1 Reinforcement Learning-Based Stochastic Optimal Control
13.4.2 Q-Learning-Based Stochastic Optimal Control
13.5 Optimal Control Solutions for Random Switching Systems
13.5.1 Optimal Controller for Random Switching Systems
13.5.2 Effective Estimator for Random Switching Systems
13.6 Differential Games for Systems with Parameters Modulated by Multi-dimensional Uncertainties
13.6.1 Stochastic Two-Player Zero-Sum Game
13.6.2 Multi-player Nonzero-Sum Game
13.7 Applications
13.7.1 Traffic Flow Management Under Uncertain Weather
13.7.2 Learning Control for Aerial Communication Using Directional Antennas (ACDA) Systems
13.8 Summary
References

14 A Top-Down Approach to Attain Decentralized Multi-agents
14.1 Introduction
14.2 Background
14.2.1 Reinforcement Learning
14.2.2 Multi-agent Reinforcement Learning
14.3 Centralized Learning, But Decentralized Execution
14.3.1 A Bottom-Up Approach
14.3.2 A Top-Down Approach
14.4 Centralized Expert Supervises Multi-agents
14.4.1 Imitation Learning
14.4.2 CESMA
14.5 Experiments
14.5.1 Decentralization Can Achieve Centralized Optimality
14.5.2 Expert Trajectories Versus Multi-agent Trajectories
14.6 Conclusion
References

15 Modeling and Mitigating Link-Flooding Distributed Denial-of-Service Attacks via Learning in Stackelberg Games
15.1 Introduction
15.2 Routing and Attack in Communication Network
15.3 Stackelberg Game Model
15.4 Optimal Attack and Stackelberg Equilibria for Malicious Adversaries
15.4.1 Optimal Attack and Stackelberg Equilibria for Networks with Identical Links
15.5 Mitigating Attacks via Learning
15.5.1 Predicting the Routing Cost
15.5.2 Minimizing the Predicted Routing Cost
15.6 Simulation Study
15.6.1 Discussion
15.7 Conclusion
References

Part IV: Bounded Rationality and Value of Information in RL and Games

16 Bounded Rationality in Differential Games: A Reinforcement Learning-Based Approach
16.1 Introduction
16.1.1 Related Work
16.2 Problem Formulation
16.2.1 Nash Equilibrium Solutions for Differential Games
16.3 Boundedly Rational Game Solution Concepts
16.4 Cognitive Hierarchy for Adversarial Target Tracking
16.4.1 Problem Formulation
16.4.2 Zero-Sum Game
16.4.3 Cognitive Hierarchy
16.4.4 Coordination with Nonequilibrium Game-Theoretic Learning
16.4.5 Simulation
16.5 Conclusion and Future Work
References

17 Bounded Rationality in Learning, Perception, Decision-Making, and Stochastic Games
17.1 The Autonomy Challenge
17.1.1 The Case of Actionable Data
17.1.2 The Curse of Optimality
17.2 How to Move Forward
17.2.1 Bounded Rationality for Human-Like Decision-Making
17.2.2 Hierarchical Abstractions for Scalability
17.3 Sequential Decision-Making Subject to Resource Constraints
17.3.1 Standard Markov Decision Processes
17.3.2 Information-Limited Markov Decision Processes
17.4 An Information-Theoretic Approach for Hierarchical Decision-Making
17.4.1 Agglomerative Information Bottleneck for Quadtree Compression
17.4.2 Optimal Compression of Quadtrees
17.4.3 The Q-Tree Search Algorithm
17.5 Stochastic Games and Bounded Rationality
17.5.1 Stochastic Pursuit–Evasion
17.5.2 Level-k Thinking
17.5.3 A Pursuit–Evasion Game in a Stochastic Environment
17.6 Conclusions
References

18 Fairness in Learning-Based Sequential Decision Algorithms: A Survey
18.1 Introduction
18.2 Preliminaries
18.2.1 Sequential Decision Algorithms
18.2.2 Notions of Fairness
18.3 (Fair) Sequential Decision When Decisions Do Not Affect Underlying Population
18.3.1 Bandits, Regret, and Fair Regret
18.3.2 Fair Experts and Expert Opinions
18.3.3 Fair Policing
18.4 (Fair) Sequential Decision When Decisions Affect Underlying Population
18.4.1 Two-Stage Models
18.4.2 Long-Term Impacts on the Underlying Population
References

19 Trading Utility and Uncertainty: Applying the Value of Information to Resolve the Exploration–Exploitation Dilemma in Reinforcement Learning
19.1 Introduction
19.2 Exploring Single-State, Multiple-Action Markov Decision Processes
19.2.1 Literature Survey
19.2.2 Methodology
19.2.3 Simulations and Analyses
19.2.4 Conclusions
19.3 Exploring Multiple-State, Multiple-Action Markov Decision Processes
19.3.1 Literature Survey
19.3.2 Methodology
19.3.3 Simulations and Analyses
19.3.4 Conclusions
References

Part V: Applications of RL

20 Map-Based Planning for Small Unmanned Aircraft Rooftop Landing
20.1 Introduction
20.2 Background
20.2.1 Sensor-Based Planning
20.2.2 Map-Based Planning
20.2.3 Multi-goal Planning
20.2.4 Urban Landscape and Rooftop Landings
20.3 Preliminaries
20.3.1 Coordinates and Landing Sites
20.3.2 3D Path Planning with Mapped Obstacles
20.4 Landing Site Database
20.4.1 Flat-Like Roof Identification
20.4.2 Flat Surface Extraction for Usable Landing Area
20.4.3 Touchdown Points
20.4.4 Landing Site Risk Model
20.5 Three-Dimensional Maps for Path Planning
20.6 Planning Risk Metric Analysis and Integration
20.6.1 Real-Time Map-Based Planner Architecture
20.6.2 Trade-Off Between Landing Site and Path Risk
20.6.3 Multi-goal Planner
20.7 Maps and Simulation Results
20.7.1 Landing Sites and Risk Maps
20.7.2 Case Studies
20.7.3 Urgent Landing Statistical Analysis
20.8 Conclusion
References

21 Reinforcement Learning: An Industrial Perspective
21.1 Introduction
21.2 RL Applications
21.2.1 Sensor Management in Intelligence, Surveillance, and Reconnaissance
21.2.2 High Level Reasoning in Autonomous Navigation
21.2.3 Advanced Manufacturing Process Control
21.2.4 Maintenance, Repair, and Overhaul Operations
21.2.5 Human–Robot Collaboration
21.3 Case Study I: Optimal Sensor Tasking
21.3.1 Sensor Tasking as a Stochastic Optimal Control Problem
21.3.2 Multi-Arm Bandit Problem Approximation
21.3.3 Numerical Study
21.4 Case Study II: Deep Reinforcement Learning for Advanced Manufacturing Control
21.4.1 Cold Spray Control Problem
21.4.2 Guided Policy Search
21.4.3 Simulation Results
21.5 Future Outlook
References

22 Robust Autonomous Driving with Human in the Loop
22.1 Introduction
22.2 Mathematical Modeling of Human–Vehicle Interaction
22.2.1 Vehicle Lateral Dynamics
22.2.2 Interconnected Human–Vehicle Model
22.3 Model-Based Control Design
22.3.1 Discretization of Differential-Difference Equations
22.3.2 Formulation of the Shared Control Problem
22.3.3 Model-Based Optimal Control Design
22.4 Learning-Based Optimal Control for Cooperative Driving
22.5 Numerical Results
22.5.1 Algorithmic Implementation
22.5.2 Comparisons and Discussions for ADP-Based Shared Control Design
22.6 Conclusions and Future Work
References

23 Decision-Making for Complex Systems Subjected to Uncertainties—A Probability Density Function Control Approach
23.1 Introduction
23.2 Integrated Modeling Perspectives—Ordinary Algebra Versus {Max, +} Algebra
23.2.1 Process Level Modeling via Ordinary Algebra Systems
23.2.2 {Max, +} Algebra-Based Modeling
23.2.3 Learning Under Uncertainties–PDF Shaping of Modeling Error-Based Approach
23.3 Human-in-the-Loop Consideration: Impact of Uncertainties in Decision-Making Phase
23.4 Optimization Under Uncertainties Impacts
23.4.1 Formulation of Optimization as a Feedback Control Design Problem–Optimization is a Special Case of Feedback Control System Design
23.5 A Generalized Framework for Decision-Making Using PDF Shaping Approach
23.5.1 PDF Shaping for the Performance Function
23.5.2 Dealing with the Constraint
23.5.3 Dealing with Dynamic Constraint
23.5.4 A Total Probabilistic Solution
23.5.5 Uncertainties in Performance Function and Constraints
23.6 System Analysis: Square Impact Principle as a Mathematical Principle for Integrated IT with Infrastructure Design
23.6.1 Description of Operational Optimal Control
23.6.2 Square Impact Principle (SIP): Infrastructure Versus Control Performance
23.7 Conclusions
References

Part VI: Multi-Disciplinary Connections

24 A Hybrid Dynamical Systems Perspective on Reinforcement Learning for Cyber-Physical Systems: Vistas, Open Problems, and Challenges
24.1 Introduction
24.2 Hybrid Dynamical Systems
24.2.1 Non-uniqueness of Solutions and Set-Valued Dynamics
24.2.2 Hybrid Time Domains and Solutions of Hybrid Dynamical Systems
24.2.3 Graphical Convergence, Basic Assumptions and Sequential Compactness
24.2.4 Stability and Robustness
24.3 Reinforcement Learning via Dynamic Policy Gradient
24.3.1 Asynchronous Policy Iteration
24.3.2 Synchronous Policy Iteration: Online Training of Actor–Critic Structures
24.4 Reinforcement Learning in Hybrid Dynamical Systems
24.4.1 Hybrid Learning Algorithms
24.4.2 Hybrid Dynamic Environments
24.5 Conclusions
References

25 The Role of Systems Biology, Neuroscience, and Thermodynamics in Network Control and Learning
25.1 Introduction
25.2 Large-Scale Networks and Hybrid Thermodynamics
25.3 Multiagent Systems with Uncertain Interagent Communication
25.4 Systems Biology, Neurophysiology, Thermodynamics, and Dynamic Switching Communication Topologies for Large-Scale Multilayered Networks
25.5 Nonlinear Stochastic Optimal Control and Learning
25.6 Complexity, Thermodynamics, Information Theory, and Swarm Dynamics
25.7 Thermodynamic Entropy, Shannon Entropy, Bode Integrals, and Performance Limitations in Nonlinear Systems
25.8 Conclusion
References

26 Quantum Amplitude Amplification for Reinforcement Learning
26.1 Exploration and Exploitation in Reinforcement Learning
26.2 Quantum Probability Theory
26.3 The Original Quantum Reinforcement Learning (QRL) Algorithm
26.4 The Revised Quantum Reinforcement Learning Algorithm
26.5 Learning Rate and Performance Comparisons
26.6 Other Applications of QRL
26.6.1 Example
26.7 Application to Human Learning
26.8 Concluding Comments
References