

Download the book Handbook of Reinforcement Learning and Control: 325 (Studies in Systems, Decision and Control, 325)

Book details

Handbook of Reinforcement Learning and Control: 325 (Studies in Systems, Decision and Control, 325)

Edition: [1st ed. 2021]
Authors: , , ,
Series:
ISBN: 3030609898, 9783030609894
Publisher: Springer
Year of publication: 2021
Number of pages: 857 [839]
Language: English
File format: PDF (can be converted to PDF, EPUB, or AZW3 at the user's request)
File size: 20 MB

Book price (Toman): 35,000





If you need the book Handbook of Reinforcement Learning and Control: 325 (Studies in Systems, Decision and Control, 325) in PDF, EPUB, AZW3, MOBI, or DJVU format, you can notify the support team and they will convert the file for you.

Note that Handbook of Reinforcement Learning and Control: 325 (Studies in Systems, Decision and Control, 325) is the original-language edition, not a Persian translation. The International Library website offers only original-language books and does not provide any books translated into or written in Persian.


About the book Handbook of Reinforcement Learning and Control: 325 (Studies in Systems, Decision and Control, 325)

This handbook presents state-of-the-art research in reinforcement learning, focusing on its applications in the control and game theory of dynamic systems and future directions for related research and technology. The contributions gathered in this book deal with challenges faced when using learning and adaptation methods to solve academic and industrial problems, such as optimization in dynamic environments with single and multiple agents, convergence and performance analysis, and online implementation. They explore means by which these difficulties can be solved, and cover a wide range of related topics including: deep learning; artificial intelligence; applications of game theory; mixed modality learning; and multi-agent reinforcement learning. Practicing engineers and scholars in the field of machine learning, game theory, and autonomous control will find the Handbook of Reinforcement Learning and Control to be thought-provoking, instructive and informative.



Table of Contents

Preface
Contents
Part I Theory of Reinforcement Learning for Model-Free and Model-Based Control and Games
1 What May Lie Ahead in Reinforcement Learning
	References
2 Reinforcement Learning for Distributed Control and Multi-player Games
	2.1 Introduction
	2.2 Optimal Control of Continuous-Time Systems
		2.2.1 IRL with Experience Replay Learning Technique
		2.2.2 H∞ Control of CT Systems
	2.3 Nash Games
	2.4 Graphical Games
		2.4.1 Off-Policy RL for Graphical Games
	2.5 Output Synchronization of Multi-agent Systems
	2.6 Conclusion and Open Research Directions
	References
3 From Reinforcement Learning to Optimal Control: A Unified Framework for Sequential Decisions
	3.1 Introduction
	3.2 The Communities of Sequential Decisions
	3.3 Stochastic Optimal Control Versus Reinforcement Learning
		3.3.1 Stochastic Control
		3.3.2 Reinforcement Learning
		3.3.3 A Critique of the MDP Modeling Framework
		3.3.4 Bridging Optimal Control and Reinforcement Learning
	3.4 The Universal Modeling Framework
		3.4.1 Dimensions of a Sequential Decision Model
		3.4.2 State Variables
		3.4.3 Objective Functions
		3.4.4 Notes
	3.5 Energy Storage Illustration
		3.5.1 A Basic Energy Storage Problem
		3.5.2 With a Time-Series Price Model
		3.5.3 With Passive Learning
		3.5.4 With Active Learning
		3.5.5 With Rolling Forecasts
		3.5.6 Remarks
	3.6 Designing Policies
		3.6.1 Policy Search
		3.6.2 Lookahead Approximations
		3.6.3 Hybrid Policies
		3.6.4 Remarks
		3.6.5 Stochastic Control, Reinforcement Learning, and the Four Classes of Policies
	3.7 Policies for Energy Storage
	3.8 Extension to Multi-agent Systems
	3.9 Observations
	References
4 Fundamental Design Principles for Reinforcement Learning Algorithms
	4.1 Introduction
		4.1.1 Stochastic Approximation and Reinforcement Learning
		4.1.2 Sample Complexity Bounds
		4.1.3 What Will You Find in This Chapter?
		4.1.4 Literature Survey
	4.2 Stochastic Approximation: New and Old Tricks
		4.2.1 What is Stochastic Approximation?
		4.2.2 Stochastic Approximation and Learning
		4.2.3 Stability and Convergence
		4.2.4 Zap–Stochastic Approximation
		4.2.5 Rates of Convergence
		4.2.6 Optimal Convergence Rate
		4.2.7 TD and LSTD Algorithms
	4.3 Zap Q-Learning: Fastest Convergent Q-Learning
		4.3.1 Markov Decision Processes
		4.3.2 Value Functions and the Bellman Equation
		4.3.3 Q-Learning
		4.3.4 Tabular Q-Learning
		4.3.5 Convergence and Rate of Convergence
		4.3.6 Zap Q-Learning
	4.4 Numerical Results
		4.4.1 Finite State-Action MDP
		4.4.2 Optimal Stopping in Finance
	4.5 Zap-Q with Nonlinear Function Approximation
		4.5.1 Choosing the Eligibility Vectors
		4.5.2 Theory and Challenges
		4.5.3 Regularized Zap-Q
	4.6 Conclusions and Future Work
	References
5 Mixed Density Methods for Approximate Dynamic Programming
	5.1 Introduction
	5.2 Unconstrained Affine-Quadratic Regulator
	5.3 Regional Model-Based Reinforcement Learning
		5.3.1 Preliminaries
		5.3.2 Regional Value Function Approximation
		5.3.3 Bellman Error
		5.3.4 Actor and Critic Update Laws
		5.3.5 Stability Analysis
		5.3.6 Summary
	5.4 Local (State-Following) Model-Based Reinforcement Learning
		5.4.1 StaF Kernel Functions
		5.4.2 Local Value Function Approximation
		5.4.3 Actor and Critic Update Laws
		5.4.4 Analysis
		5.4.5 Stability Analysis
		5.4.6 Summary
	5.5 Combining Regional and Local State-Following Approximations
	5.6 Reinforcement Learning with Sparse Bellman Error Extrapolation
	5.7 Conclusion
	References
6 Model-Free Linear Quadratic Regulator
	6.1 Introduction to a Model-Free LQR Problem
	6.2 A Gradient-Based Random Search Method
	6.3 Main Results
	6.4 Proof Sketch
		6.4.1 Controlling the Bias
		6.4.2 Correlation of f̂(K) and f(K)
	6.5 An Example
	6.6 Thoughts and Outlook
	References
Part II Constraint-Driven and Verified RL
7 Adaptive Dynamic Programming in the Hamiltonian-Driven Framework
	7.1 Introduction
		7.1.1 Literature Review
		7.1.2 Motivation
		7.1.3 Structure
	7.2 Problem Statement
	7.3 Hamiltonian-Driven Framework
		7.3.1 Policy Evaluation
		7.3.2 Policy Comparison
		7.3.3 Policy Improvement
	7.4 Discussions on the Hamiltonian-Driven ADP
		7.4.1 Implementation with Critic-Only Structure
		7.4.2 Connection to Temporal Difference Learning
		7.4.3 Connection to Value Gradient Learning
	7.5 Simulation Study
	7.6 Conclusion
	References
8 Reinforcement Learning for Optimal Adaptive Control of Time Delay Systems
	8.1 Introduction
	8.2 Problem Description
	8.3 Extended State Augmentation
	8.4 State Feedback Q-Learning Control of Time Delay Systems
	8.5 Output Feedback Q-Learning Control of Time Delay Systems
	8.6 Simulation Results
	8.7 Conclusions
	References
9 Optimal Adaptive Control of Partially Uncertain Linear Continuous-Time Systems with State Delay
	9.1 Introduction
	9.2 Problem Statement
	9.3 Linear Quadratic Regulator Design
		9.3.1 Periodic Sampled Feedback
		9.3.2 Event Sampled Feedback
	9.4 Optimal Adaptive Control
		9.4.1 Periodic Sampled Feedback
		9.4.2 Event Sampled Feedback
		9.4.3 Hybrid Reinforcement Learning Scheme
	9.5 Perspectives on Controller Design with Image Feedback
	9.6 Simulation Results
		9.6.1 Linear Quadratic Regulator with Known Internal Dynamics
		9.6.2 Optimal Adaptive Control with Unknown Drift Dynamics
	9.7 Conclusion
	References
10 Dissipativity-Based Verification for Autonomous Systems in Adversarial Environments
	10.1 Introduction
		10.1.1 Related Work
		10.1.2 Contributions
		10.1.3 Structure
		10.1.4 Notation
	10.2 Problem Formulation
		10.2.1 (Q,S,R)-Dissipative and L2–Gain Stable Systems
	10.3 Learning-Based Distributed Cascade Interconnection
	10.4 Learning-Based L2–Gain Composition
		10.4.1 Q-Learning for L2–Gain Verification
		10.4.2 L2–Gain Model-Free Composition
	10.5 Learning-Based Lossless Composition
	10.6 Discussion
	10.7 Conclusion and Future Work
	References
11 Reinforcement Learning-Based Model Reduction for Partial Differential Equations: Application to the Burgers Equation
	11.1 Introduction
	11.2 Basic Notation and Definitions
	11.3 RL-Based Model Reduction of PDEs
		11.3.1 Reduced-Order PDE Approximation
		11.3.2 Proper Orthogonal Decomposition for ROMs
		11.3.3 Closure Models for ROM Stabilization
		11.3.4 Main Result: RL-Based Closure Model
	11.4 Extremum Seeking Based Closure Model Auto-Tuning
	11.5 The Case of the Burgers Equation
	11.6 Conclusion
	References
Part III Multi-agent Systems and RL
12 Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
	12.1 Introduction
	12.2 Background
		12.2.1 Single-Agent RL
		12.2.2 Multi-Agent RL Framework
	12.3 Challenges in MARL Theory
		12.3.1 Non-unique Learning Goals
		12.3.2 Non-stationarity
		12.3.3 Scalability Issue
		12.3.4 Various Information Structures
	12.4 MARL Algorithms with Theory
		12.4.1 Cooperative Setting
		12.4.2 Competitive Setting
		12.4.3 Mixed Setting
	12.5 Application Highlights
		12.5.1 Cooperative Setting
		12.5.2 Competitive Setting
		12.5.3 Mixed Settings
	12.6 Conclusions and Future Directions
	References
13 Computational Intelligence in Uncertainty Quantification for Learning Control and Differential Games
	13.1 Introduction
	13.2 Problem Formulation of Optimal Control for Uncertain Systems
		13.2.1 Optimal Control for Systems with Parameters Modulated by Multi-dimensional Uncertainties
		13.2.2 Optimal Control for Random Switching Systems
	13.3 Effective Uncertainty Evaluation Methods
		13.3.1 Problem Formulation
		13.3.2 The MPCM
		13.3.3 The MPCM-OFFD
	13.4 Optimal Control Solutions for Systems with Parameter Modulated by Multi-dimensional Uncertainties
		13.4.1 Reinforcement Learning-Based Stochastic Optimal Control
		13.4.2 Q-Learning-Based Stochastic Optimal Control
	13.5 Optimal Control Solutions for Random Switching Systems
		13.5.1 Optimal Controller for Random Switching Systems
		13.5.2 Effective Estimator for Random Switching Systems
	13.6 Differential Games for Systems with Parameters Modulated by Multi-dimensional Uncertainties
		13.6.1 Stochastic Two-Player Zero-Sum Game
		13.6.2 Multi-player Nonzero-Sum Game
	13.7 Applications
		13.7.1 Traffic Flow Management Under Uncertain Weather
		13.7.2 Learning Control for Aerial Communication Using Directional Antennas (ACDA) Systems
	13.8 Summary
	References
14 A Top-Down Approach to Attain Decentralized Multi-agents
	14.1 Introduction
	14.2 Background
		14.2.1 Reinforcement Learning
		14.2.2 Multi-agent Reinforcement Learning
	14.3 Centralized Learning, But Decentralized Execution
		14.3.1 A Bottom-Up Approach
		14.3.2 A Top-Down Approach
	14.4 Centralized Expert Supervises Multi-agents
		14.4.1 Imitation Learning
		14.4.2 CESMA
	14.5 Experiments
		14.5.1 Decentralization Can Achieve Centralized Optimality
		14.5.2 Expert Trajectories Versus Multi-agent Trajectories
	14.6 Conclusion
	References
15 Modeling and Mitigating Link-Flooding Distributed Denial-of-Service Attacks via Learning in Stackelberg Games
	15.1 Introduction
	15.2 Routing and Attack in Communication Network
	15.3 Stackelberg Game Model
	15.4 Optimal Attack and Stackelberg Equilibria for Malicious Adversaries
		15.4.1 Optimal Attack and Stackelberg Equilibria for Networks with Identical Links
	15.5 Mitigating Attacks via Learning
		15.5.1 Predicting the Routing Cost
		15.5.2 Minimizing the Predicted Routing Cost
	15.6 Simulation Study
		15.6.1 Discussion
	15.7 Conclusion
	References
Part IV Bounded Rationality and Value of Information in RL and Games
16 Bounded Rationality in Differential Games: A Reinforcement Learning-Based Approach
	16.1 Introduction
		16.1.1 Related Work
	16.2 Problem Formulation
		16.2.1 Nash Equilibrium Solutions for Differential Games
	16.3 Boundedly Rational Game Solution Concepts
	16.4 Cognitive Hierarchy for Adversarial Target Tracking
		16.4.1 Problem Formulation
		16.4.2 Zero-Sum Game
		16.4.3 Cognitive Hierarchy
		16.4.4 Coordination with Nonequilibrium Game-Theoretic Learning
		16.4.5 Simulation
	16.5 Conclusion and Future Work
	References
17 Bounded Rationality in Learning, Perception, Decision-Making, and Stochastic Games
	17.1 The Autonomy Challenge
		17.1.1 The Case of Actionable Data
		17.1.2 The Curse of Optimality
	17.2 How to Move Forward
		17.2.1 Bounded Rationality for Human-Like Decision-Making
		17.2.2 Hierarchical Abstractions for Scalability
	17.3 Sequential Decision-Making Subject to Resource Constraints
		17.3.1 Standard Markov Decision Processes
		17.3.2 Information-Limited Markov Decision Processes
	17.4 An Information-Theoretic Approach for Hierarchical Decision-Making
		17.4.1 Agglomerative Information Bottleneck for Quadtree Compression
		17.4.2 Optimal Compression of Quadtrees
		17.4.3 The Q-Tree Search Algorithm
	17.5 Stochastic Games and Bounded Rationality
		17.5.1 Stochastic Pursuit–Evasion
		17.5.2 Level-k Thinking
		17.5.3 A Pursuit–Evasion Game in a Stochastic Environment
	17.6 Conclusions
	References
18 Fairness in Learning-Based Sequential Decision Algorithms: A Survey
	18.1 Introduction
	18.2 Preliminaries
		18.2.1 Sequential Decision Algorithms
		18.2.2 Notions of Fairness
	18.3 (Fair) Sequential Decision When Decisions Do Not Affect Underlying Population
		18.3.1 Bandits, Regret, and Fair Regret
		18.3.2 Fair Experts and Expert Opinions
		18.3.3 Fair Policing
	18.4 (Fair) Sequential Decision When Decisions Affect Underlying Population
		18.4.1 Two-Stage Models
		18.4.2 Long-Term Impacts on the Underlying Population
	References
19 Trading Utility and Uncertainty: Applying the Value of Information to Resolve the Exploration–Exploitation Dilemma in Reinforcement Learning
	19.1 Introduction
	19.2 Exploring Single-State, Multiple-Action Markov Decision Processes
		19.2.1 Literature Survey
		19.2.2 Methodology
		19.2.3 Simulations and Analyses
		19.2.4 Conclusions
	19.3 Exploring Multiple-State, Multiple-Action Markov Decision Processes
		19.3.1 Literature Survey
		19.3.2 Methodology
		19.3.3 Simulations and Analyses
		19.3.4 Conclusions
	References
Part V Applications of RL
20 Map-Based Planning for Small Unmanned Aircraft Rooftop Landing
	20.1 Introduction
	20.2 Background
		20.2.1 Sensor-Based Planning
		20.2.2 Map-Based Planning
		20.2.3 Multi-goal Planning
		20.2.4 Urban Landscape and Rooftop Landings
	20.3 Preliminaries
		20.3.1 Coordinates and Landing Sites
		20.3.2 3D Path Planning with Mapped Obstacles
	20.4 Landing Site Database
		20.4.1 Flat-Like Roof Identification
		20.4.2 Flat Surface Extraction for Usable Landing Area
		20.4.3 Touchdown Points
		20.4.4 Landing Site Risk Model
	20.5 Three-Dimensional Maps for Path Planning
	20.6 Planning Risk Metric Analysis and Integration
		20.6.1 Real-Time Map-Based Planner Architecture
		20.6.2 Trade-Off Between Landing Site and Path Risk
		20.6.3 Multi-goal Planner
	20.7 Maps and Simulation Results
		20.7.1 Landing Sites and Risk Maps
		20.7.2 Case Studies
		20.7.3 Urgent Landing Statistical Analysis
	20.8 Conclusion
	References
21 Reinforcement Learning: An Industrial Perspective
	21.1 Introduction
	21.2 RL Applications
		21.2.1 Sensor Management in Intelligence, Surveillance, and Reconnaissance
		21.2.2 High Level Reasoning in Autonomous Navigation
		21.2.3 Advanced Manufacturing Process Control
		21.2.4 Maintenance, Repair, and Overhaul Operations
		21.2.5 Human–Robot Collaboration
	21.3 Case Study I: Optimal Sensor Tasking
		21.3.1 Sensor Tasking as a Stochastic Optimal Control Problem
		21.3.2 Multi-Arm Bandit Problem Approximation
		21.3.3 Numerical Study
	21.4 Case Study II: Deep Reinforcement Learning for Advanced Manufacturing Control
		21.4.1 Cold Spray Control Problem
		21.4.2 Guided Policy Search
		21.4.3 Simulation Results
	21.5 Future Outlook
	References
22 Robust Autonomous Driving with Human in the Loop
	22.1 Introduction
	22.2 Mathematical Modeling of Human–Vehicle Interaction
		22.2.1 Vehicle Lateral Dynamics
		22.2.2 Interconnected Human–Vehicle Model
	22.3 Model-Based Control Design
		22.3.1 Discretization of Differential-Difference Equations
		22.3.2 Formulation of the Shared Control Problem
		22.3.3 Model-Based Optimal Control Design
	22.4 Learning-Based Optimal Control for Cooperative Driving
	22.5 Numerical Results
		22.5.1 Algorithmic Implementation
		22.5.2 Comparisons and Discussions for ADP-Based Shared Control Design
	22.6 Conclusions and Future Work
	References
23 Decision-Making for Complex Systems Subjected to Uncertainties—A Probability Density Function Control Approach
	23.1 Introduction
	23.2 Integrated Modeling Perspectives—Ordinary Algebra Versus {Max, +} Algebra
		23.2.1 Process Level Modeling via Ordinary Algebra Systems
		23.2.2 {Max, +} Algebra-Based Modeling
		23.2.3 Learning Under Uncertainties–PDF Shaping of Modeling Error-Based Approach
	23.3 Human-in-the-Loop Consideration: Impact of Uncertainties in Decision-Making Phase
	23.4 Optimization Under Uncertainties Impacts
		23.4.1 Formulation of Optimization as a Feedback Control Design Problem–Optimization is a Special Case of Feedback Control System Design
	23.5 A Generalized Framework for Decision-Making Using PDF Shaping Approach
		23.5.1 PDF Shaping for the Performance Function
		23.5.2 Dealing with the Constraint
		23.5.3 Dealing with Dynamic Constraint
		23.5.4 A Total Probabilistic Solution
		23.5.5 Uncertainties in Performance Function and Constraints
	23.6 System Analysis: Square Impact Principle as a Mathematical Principle for Integrated IT with Infrastructure Design
		23.6.1 Description of Operational Optimal Control
		23.6.2 Square Impact Principle (SIP): Infrastructure Versus Control Performance
	23.7 Conclusions
	References
Part VI Multi-Disciplinary Connections
24 A Hybrid Dynamical Systems Perspective on Reinforcement Learning for Cyber-Physical Systems: Vistas, Open Problems, and Challenges
	24.1 Introduction
	24.2 Hybrid Dynamical Systems
		24.2.1 Non-uniqueness of Solutions and Set-Valued Dynamics
		24.2.2 Hybrid Time Domains and Solutions of Hybrid Dynamical Systems
		24.2.3 Graphical Convergence, Basic Assumptions and Sequential Compactness
		24.2.4 Stability and Robustness
	24.3 Reinforcement Learning via Dynamic Policy Gradient
		24.3.1 Asynchronous Policy Iteration
		24.3.2 Synchronous Policy Iteration: Online Training of Actor–Critic Structures
	24.4 Reinforcement Learning in Hybrid Dynamical Systems
		24.4.1 Hybrid Learning Algorithms
		24.4.2 Hybrid Dynamic Environments
	24.5 Conclusions
	References
25 The Role of Systems Biology, Neuroscience, and Thermodynamics in Network Control and Learning
	25.1 Introduction
	25.2 Large-Scale Networks and Hybrid Thermodynamics
	25.3 Multiagent Systems with Uncertain Interagent Communication
	25.4 Systems Biology, Neurophysiology, Thermodynamics, and Dynamic Switching Communication Topologies for Large-Scale Multilayered Networks
	25.5 Nonlinear Stochastic Optimal Control and Learning
	25.6 Complexity, Thermodynamics, Information Theory, and Swarm Dynamics
	25.7 Thermodynamic Entropy, Shannon Entropy, Bode Integrals, and Performance Limitations in Nonlinear Systems
	25.8 Conclusion
	References
26 Quantum Amplitude Amplification for Reinforcement Learning
	26.1 Exploration and Exploitation in Reinforcement Learning
	26.2 Quantum Probability Theory
	26.3 The Original Quantum Reinforcement Learning (QRL) Algorithm
	26.4 The Revised Quantum Reinforcement Learning Algorithm
	26.5 Learning Rate and Performance Comparisons
	26.6 Other Applications of QRL
		26.6.1 Example
	26.7 Application to Human Learning
	26.8 Concluding Comments
	References



