
Download the book Reinforcement Learning Algorithms: Analysis and Applications

Book Details

Reinforcement learning algorithms: analysis and applications.

Edition:
Authors: , , , ,
Series: Studies in Computational Intelligence,
ISBN: 9783030411879
Publisher:
Year of publication: 2021
Number of pages: [197]
Language: English
File format: PDF (can be converted to PDF, EPUB, or AZW3 on request)
File size: 10 MB

Book price (Toman): 51,000





If you need the file of Reinforcement learning algorithms: analysis and applications. converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert the file for you.

Note that Reinforcement Learning Algorithms: Analysis and Applications is the original-language edition, not a Persian translation. The International Library website only offers books in their original language and does not provide any books translated into or written in Persian.


Book Description

This book reviews research developments in diverse areas of reinforcement learning such as model-free actor-critic methods, model-based learning and control, information geometry of policy searches, reward design, and exploration in biology and the behavioral sciences. Special emphasis is placed on advanced ideas, algorithms, methods, and applications. The contributed papers gathered here grew out of a lecture course on reinforcement learning held by Prof. Jan Peters in the winter semester 2018/2019 at Technische Universität Darmstadt. The book is intended for reinforcement learning students and researchers with a firm grasp of linear algebra, statistics, and optimization. Nevertheless, all key concepts are introduced in each chapter, making the content self-contained and accessible to a broader audience.
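
Temporal-difference learning, which appears in the opening chapters of the table of contents below, is one of the key concepts the book introduces from first principles. Purely as an illustrative point of reference, and not as code from the book, here is a minimal tabular TD(0) policy-evaluation sketch in Python; the env/policy interface and all hyperparameter values are assumptions made for this example.

    # Minimal tabular TD(0) policy-evaluation sketch (illustrative only; not from the book).
    # Assumptions: env.reset() -> state, env.step(action) -> (next_state, reward, done),
    # and policy(state) -> action. All hyperparameters are placeholder values.

    def td0_policy_evaluation(env, policy, num_episodes=500, alpha=0.1, gamma=0.99):
        """Estimate the state-value function V of a fixed policy with TD(0)."""
        V = {}  # state -> estimated value (defaults to 0.0)
        for _ in range(num_episodes):
            state = env.reset()
            done = False
            while not done:
                action = policy(state)
                next_state, reward, done = env.step(action)
                # TD(0) update: move V(s) toward the bootstrapped target r + gamma * V(s').
                target = reward + (0.0 if done else gamma * V.get(next_state, 0.0))
                V[state] = V.get(state, 0.0) + alpha * (target - V.get(state, 0.0))
                state = next_state
        return V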



Table of Contents

Preface
Contents
Biology, Reward, Exploration
Prediction Error and Actor-Critic Hypotheses in the Brain
	1 Introduction
	2 Computational View
		2.1 Temporal Difference Learning
		2.2 Actor Critic
	3 Behavioral View
		3.1 Classical/Pavlovian Conditioning
		3.2 Instrumental/Operant Conditioning
		3.3 Credit Assignment Problem
	4 Neural View
		4.1 Reward Prediction Error Hypothesis
		4.2 Actor-Critic Hypothesis
		4.3 Multiple Critics Hypothesis
		4.4 Limitations
	5 Conclusion
	References
Reviewing On-Policy/Off-Policy Critic Learning in the Context of Temporal Differences and Residual Learning
	1 Introduction
		1.1 Reinforcement Learning
		1.2 Critic Learning
	2 Objective Functions and Temporal Differences
		2.1 Bellman Equation and Temporal Differences
	3 Error Sources of Policy Evaluation Methods
	4 Problems Occurring in Off-Policy Critic Learning
	5 Temporal Differences and Bellman Residuals
		5.1 Temporal-Difference Learning
		5.2 Residual-Gradient Algorithm
		5.3 Further Comparison
	6 Recent Methods and Approaches
	7 Conclusion
	References
Reward Function Design in Reinforcement Learning
	1 Introduction
	2 Natural Reward Signals
		2.1 Evolutionary Reward Signals: Survival and Fitness
		2.2 Monetary Reward in Economics
	3 Sparse Rewards
	4 Reward Shaping
		4.1 Shaping in Behavioral Science
		4.2 Reward Shaping in Reinforcement Learning
		4.3 Limitations of Reward Shaping and Its Relation to A*
	5 Intrinsic Motivation
	6 Conclusion
	References
Exploration Methods in Sparse Reward Environments
	1 Introduction
	2 Exploration Methods
		2.1 The Problem of Naive Exploration
		2.2 Optimism in the Face of Uncertainty
		2.3 Intrinsic Rewards
		2.4 Bayesian RL Methods
		2.5 Other Approaches
	3 Conclusion
	References
Information Geometry in Reinforcement Learning
A Survey on Constraining Policy Updates Using the KL Divergence
	1 Introduction
		1.1 Fisher Information Matrix (FIM) in Policy Gradients
		1.2 FIM, KL Divergence, and Information Loss
	2 Background
	3 Methods
		3.1 Relative Entropy Policy Search
		3.2 Trust Region Policy Optimization
		3.3 Proximal Policy Optimization
	4 Discussion
	5 Conclusion
	References
Fisher Information Approximations in Policy Gradient Methods
	1 Introduction
	2 Background
		2.1 Fisher Information Matrix
		2.2 Natural Gradient
		2.3 Kronecker Product
	3 Structural FIM Approximations
		3.1 Kronecker Factorization for Neural Networks
		3.2 Recursive Approximation Schemes for the FIM
		3.3 Tikhonov Damping for Stabilization
	4 Monte Carlo FIM Approximations
		4.1 Offline Empirical Fisher Estimation
		4.2 Online Estimation Based on Exponential Averaging
		4.3 Bayesian FIM Estimation
	5 Discussion and Conclusion
	References
Benchmarking the Natural Gradient in Policy Gradient Methods and Evolution Strategies
	1 Introduction
	2 `Vanilla' Gradient
	3 Natural Gradient
	4 Policy Gradient Methods
		4.1 `Vanilla' Policy Gradient
		4.2 Natural Policy Gradient
		4.3 Natural Actor-Critic Algorithms
		4.4 Trust Region Policy Optimization
	5 Natural Evolution Strategies
		5.1 Search Gradients
		5.2 Natural Gradients in Evolution Strategies
		5.3 Exponential Natural Evolution Strategies
		5.4 Separable Natural Evolution Strategies
	6 Experiments
		6.1 Platforms
		6.2 Results
	7 Discussion and Conclusion
	References
Information-Loss-Bounded Policy Optimization
	1 Introduction
	2 Notation and Background
	3 Method
		3.1 Constrained Policy Optimization
		3.2 Information Loss Bound
		3.3 The Algorithm
	4 Experiments
		4.1 Simulated MuJoCo Tasks
		4.2 Furuta Pendulum Swing-Up and Stabilization
	5 Conclusion
	References
Persistent Homology for Dimensionality Reduction
	1 Introduction
	2 Background and Terminology
	3 Persistent Homology
		3.1 Simplicial-Complexes
		3.2 Homology
		3.3 Computation
	4 Successful Applications of Persistent Homology
		4.1 Robotics and Deep Learning
		4.2 Data Visualization, Neuroscience, and Physics
	5 Conclusion
	References
Model-Free Reinforcement Learning and Actor-Critic Methods
Model-Free Deep Reinforcement Learning—Algorithms and Applications
	1 Introduction
	2 Background
	3 Off-Policy—Discrete Action Space
	4 Off-Policy—Continuous Action Space
	5 On-Policy
	6 Applications—Discrete Space
	7 Applications—Continuous Space
	8 Conclusion and Discussion
	References
Actor vs Critic: Learning the Policy or Learning the Value
	1 Introduction
	2 Notation and Background
		2.1 Markov Decision Process
		2.2 Value-Based (Critic-Only)
		2.3 Policy Gradient (Actor-Only)
		2.4 Actor-Critic
	3 Actor Versus Critic
		3.1 Actor-Only and Critic-Only: Differences
		3.2 Combining Actor-Only and Critic-Only: The Actor-Critic Approach
		3.3 Example Algorithms with Comparison
	4 Conclusion
	References
Bring Color to Deep Q-Networks: Limitations and Improvements of DQN Leading to Rainbow DQN
	1 Introduction
	2 Background and the Deep Q-Networks Algorithm
	3 Limitations of the Deep Q-Networks Algorithm
	4 Extensions to the Deep Q-Networks Algorithm
	5 Combinations of Improvements
	References
Distributed Methods for Reinforcement Learning Survey
	1 Introduction
	2 Notation of Multi-agent Reinforcement Learning
	3 Taxonomy of Distributed Reinforcement Learning
		3.1 Multi-agents
		3.2 Parallel Methods
		3.3 Population-Based
	4 Applications
	5 Discussion
	6 Conclusion
	References
Model-Based Learning and Control
Model-Based Reinforcement Learning from PILCO to PETS
	1 Introduction
	2 Reinforcement Learning and Policy Search
	3 Model-Based Policy Search: PILCO
	4 From Gaussian Processes to Neural Networks
	5 From Policy Search to MPC
	6 From PILCO to PETS
	7 Conclusion
	References
Challenges of Model Predictive Control in a Black Box Environment
	1 Introduction
	2 Terminology
		2.1 Reinforcement Learning
		2.2 Model-Based Reinforcement Learning
	3 Model Predictive Control
		3.1 Learning a Model
		3.2 Optimizing the Trajectory
	4 Challenges of MPC
		4.1 Computation
		4.2 Horizon Problem
	5 Conclusion
	References
Control as Inference?
	1 Introduction
	2 Discrete-Time Optimal Control
		2.1 The Linear Quadratic Regulator
		2.2 Differential Dynamic Programming
	3 Discrete-Time Optimal Control as Message Passing
	4 Continuous-Time Stochastic Optimal Control
	5 Path Integral Control
	6 Discussion
	7 Conclusion
	References



