

Download the book Deep Reinforcement Learning

Book details

Deep Reinforcement Learning

Edition: 1st ed. 2022
Authors:
Series:
ISBN: 9811906378, 9789811906374
Publisher: Springer
Publication year: 2022
Number of pages: 421 [414]
Language: English
File format: PDF (can be converted to EPUB or AZW3 on request)
File size: 11 MB

Book price (Toman): 47,000





If you need the book file of Deep Reinforcement Learning converted to PDF, EPUB, AZW3, MOBI, or DJVU format, let support know and they will convert the file for you.

Please note that Deep Reinforcement Learning is the original-language edition, not a Persian translation. The International Library website offers only original-language books and does not provide any books translated into or written in Persian.


About the book Deep Reinforcement Learning

Deep reinforcement learning has attracted considerable attention recently. Impressive results have been achieved in such diverse fields as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to understand problems that were previously considered to be very difficult. In the game of Go, the program AlphaGo has even learned to outmatch three of the world’s leading players.

Deep reinforcement learning takes its inspiration from the fields of biology and psychology. Biology has inspired the creation of artificial neural networks and deep learning, while psychology studies how animals and humans learn, and how subjects’ desired behavior can be reinforced with positive and negative stimuli. When we see how reinforcement learning teaches a simulated robot to walk, we are reminded of how children learn, through playful exploration. Techniques that are inspired by biology and psychology work amazingly well in computers: animal behavior and the structure of the brain serve as new blueprints for science and engineering. In fact, computers truly seem to possess aspects of human behavior; as such, this field goes to the heart of the dream of artificial intelligence.

These research advances have not gone unnoticed by educators. Many universities have begun offering courses on the subject of deep reinforcement learning. The aim of this book is to provide an overview of the field, at the proper level of detail for a graduate course in artificial intelligence. It covers the complete field, from the basic algorithms of Deep Q-learning to advanced topics such as multi-agent reinforcement learning and meta-learning.
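The contents below show that the book works up from exactly this kind of tabular method (see "Hands On: Q-Learning on Taxi" in Chapter 2). As a flavor of that starting point, here is a minimal, illustrative sketch of a tabular Q-learning loop; it assumes the classic OpenAI Gym API (newer Gym/Gymnasium releases return different tuples from reset() and step()), and the hyperparameters are arbitrary choices, not values taken from the book.

import random
from collections import defaultdict

import gym  # assumes the classic Gym API: reset() -> state, step() -> (state, reward, done, info)

# Illustrative hyperparameters (not from the book)
ALPHA = 0.1      # learning rate
GAMMA = 0.99     # discount factor
EPSILON = 0.1    # epsilon-greedy exploration rate
EPISODES = 5000

env = gym.make("Taxi-v3")
Q = defaultdict(lambda: [0.0] * env.action_space.n)  # Q-table: state -> list of action values

for _ in range(EPISODES):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection
        if random.random() < EPSILON:
            action = env.action_space.sample()
        else:
            action = max(range(env.action_space.n), key=lambda a: Q[state][a])

        next_state, reward, done, _ = env.step(action)

        # Q-learning (off-policy temporal-difference) update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        td_target = reward if done else reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (td_target - Q[state][action])
        state = next_state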



Table of Contents

Preface
Acknowledgments
Contents
List of Tables
1 Introduction
	1.1 What Is Deep Reinforcement Learning?
		1.1.1 Deep Learning
		1.1.2 Reinforcement Learning
		1.1.3 Deep Reinforcement Learning
		1.1.4 Applications
			1.1.4.1 Sequential Decision Problems
			1.1.4.2 Robotics
			1.1.4.3 Games
		1.1.5 Four Related Fields
			1.1.5.1 Psychology
			1.1.5.2 Mathematics
			1.1.5.3 Engineering
			1.1.5.4 Biology
	1.2 Three Machine Learning Paradigms
		1.2.1 Supervised Learning
		1.2.2 Unsupervised Learning
		1.2.3 Reinforcement Learning
	1.3 Overview of the Book
		1.3.1 Prerequisite Knowledge
			1.3.1.1 Course
			1.3.1.2 Blogs and GitHub
		1.3.2 Structure of the Book
			1.3.2.1 Chapters
	References
2 Tabular Value-Based Reinforcement Learning
	Core Concepts
	Core Problem
	Finding a Supermarket
	2.1 Sequential Decision Problems
		2.1.1 Grid Worlds
		2.1.2 Mazes and Box Puzzles
	2.2 Tabular Value-Based Agents
		2.2.1 Agent and Environment
		2.2.2 Markov Decision Process
			2.2.2.1 State S
			2.2.2.2 Action A
			2.2.2.3 Transition T_a
			2.2.2.4 Reward R_a
			2.2.2.5 Discount Factor γ
			2.2.2.6 Policy π
		2.2.3 MDP Objective
			2.2.3.1 Trace τ
			2.2.3.2 Return R
			2.2.3.3 State Value V
			2.2.3.4 State–Action Value Q
			2.2.3.5 Reinforcement Learning Objective
			2.2.3.6 Bellman Equation
		2.2.4 MDP Solution Methods
			2.2.4.1 Hands On: Value Iteration in Gym
			2.2.4.2 OpenAI Gym
			2.2.4.3 Taxi Example with Value Iteration
			2.2.4.4 Model-Free Learning
			2.2.4.5 Temporal Difference Learning
			2.2.4.6 Find Policy by Value-Based Learning
			2.2.4.7 Exploration
			2.2.4.8 Bandit Theory
			2.2.4.9 ε-Greedy Exploration
			2.2.4.10 Off-Policy Learning
			2.2.4.11 On-Policy SARSA
			2.2.4.12 Off-Policy Q-Learning
			2.2.4.13 Sparse Rewards and Reward Shaping
			2.2.4.14 Hands On: Q-Learning on Taxi
			2.2.4.15 Tuning Your Learning Rate
	2.3 Classic Gym Environments
		2.3.1 Mountain Car and Cartpole
		2.3.2 Path Planning and Board Games
			2.3.2.1 Path Planning
			2.3.2.2 Board Games
	Summary and Further Reading
		Summary
		Further Reading
	Exercises
		Questions
		Exercises
	References
3 Deep Value-Based Reinforcement Learning
	Core Concepts
	Core Problem
	Core Algorithm
	End-to-End Learning
	3.1 Large, High-Dimensional, Problems
		3.1.1 Atari Arcade Games
		3.1.2 Real-Time Strategy and Video Games
	3.2 Deep Value-Based Agents
		3.2.1 Generalization of Large Problems with Deep Learning
			3.2.1.1 Minimizing Supervised Target Loss
			3.2.1.2 Bootstrapping Q-Values
			3.2.1.3 Deep Reinforcement Learning Target-Error
		3.2.2 Three Challenges
			3.2.2.1 Coverage
			3.2.2.2 Correlation
			3.2.2.3 Convergence
			3.2.2.4 Deadly Triad
		3.2.3 Stable Deep Value-Based Learning
			3.2.3.1 Decorrelating States
			3.2.3.2 Experience Replay
			3.2.3.3 Infrequent Updates of Target Weights
			3.2.3.4 Hands On: DQN and Breakout Gym Example
			3.2.3.5 Install Stable Baselines
			3.2.3.6 The DQN Code
		3.2.4 Improving Exploration
			3.2.4.1 Overestimation
			3.2.4.2 Prioritized Experience Replay
			3.2.4.3 Advantage Function
			3.2.4.4 Distributional Methods
			3.2.4.5 Noisy DQN
	3.3 Atari 2600 Environments
		3.3.1 Network Architecture
		3.3.2 Benchmarking Atari
	Summary and Further Reading
		Summary
		Further Reading
	Exercises
		Questions
		Exercises
	References
4 Policy-Based Reinforcement Learning
	Core Concepts
	Core Problem
	Core Algorithms
	Jumping Robots
	4.1 Continuous Problems
		4.1.1 Continuous Policies
		4.1.2 Stochastic Policies
		4.1.3 Environments: Gym and MuJoCo
			4.1.3.1 Robotics
			4.1.3.2 Physics Models
			4.1.3.3 Games
	4.2 Policy-Based Agents
		4.2.1 Policy-Based Algorithm: REINFORCE
		4.2.2 Bias–Variance Trade-Off in Policy-Based Methods
		4.2.3 Actor Critic Bootstrapping
		4.2.4 Baseline Subtraction with Advantage Function
		4.2.5 Trust Region Optimization
		4.2.6 Entropy and Exploration
		4.2.7 Deterministic Policy Gradient
		4.2.8 Hands On: PPO and DDPG MuJoCo Examples
	4.3 Locomotion and Visuo-Motor Environments
		4.3.1 Locomotion
		4.3.2 Visuo-Motor Interaction
		4.3.3 Benchmarking
	Summary and Further Reading
		Summary
		Further Reading
	Exercises
		Questions
		Exercises
	References
5 Model-Based Reinforcement Learning
	Core Concepts
	Core Problem
	Core Algorithms
	Building a Navigation Map
	5.1 Dynamics Models of High-Dimensional Problems
	5.2 Learning and Planning Agents
		5.2.1 Learning the Model
			5.2.1.1 Modeling Uncertainty
			5.2.1.2 Latent Models
		5.2.2 Planning with the Model
			5.2.2.1 Trajectory Rollouts and Model-Predictive Control
			5.2.2.2 End-to-End Learning and Planning-by-Network
	5.3 High-Dimensional Environments
		5.3.1 Overview of Model-Based Experiments
		5.3.2 Small Navigation Tasks
		5.3.3 Robotic Applications
		5.3.4 Atari Game Applications
		5.3.5 Hands On: PlaNet Example
	Summary and Further Reading
		Summary
		Further Reading
	Exercises
		Questions
		Exercises
	References
6 Two-Agent Self-Play
	Core Concepts
	Core Problem
	Core Algorithms
	Self-Play in Games
	6.1 Two-Agent Zero-Sum Problems
		6.1.1 The Difficulty of Playing Go
		6.1.2 AlphaGo Achievements
	6.2 Tabula Rasa Self-Play Agents
		6.2.1 Move-Level Self-Play
			6.2.1.1 Minimax
			6.2.1.2 Monte Carlo Tree Search
		6.2.2 Example-Level Self-Play
			6.2.2.1 Policy and Value Network
			6.2.2.2 Stability and Exploration
		6.2.3 Tournament-Level Self-Play
			6.2.3.1 Self-Play Curriculum Learning
			6.2.3.2 Supervised and Reinforcement Curriculum Learning
	6.3 Self-Play Environments
		6.3.1 How to Design a World Class Go Program?
		6.3.2 AlphaGo Zero Performance
		6.3.3 AlphaZero
		6.3.4 Open Self-Play Frameworks
		6.3.5 Hands On: Hex in PolyGames Example
	Summary and Further Reading
		Summary
		Further Reading
	Exercises
		Questions
		Implementation: New or Make/Undo
		Exercises
	References
7 Multi-Agent Reinforcement Learning
	Core Concepts
	Core Problem
	Core Algorithms
	Self-driving Car
	7.1 Multi-Agent Problems
	Game Theory
	Stochastic Games and Extensive-Form Games
	Competitive, Cooperative, and Mixed Strategies
		7.1.1 Competitive Behavior
		7.1.2 Cooperative Behavior
			7.1.2.1 Multi-Objective Reinforcement Learning
		7.1.3 Mixed Behavior
			7.1.3.1 Iterated Prisoner's Dilemma
		7.1.4 Challenges
			7.1.4.1 Partial Observability
			7.1.4.2 Nonstationary Environments
			7.1.4.3 Large State Space
	7.2 Multi-Agent Reinforcement Learning Agents
		7.2.1 Competitive Behavior
			7.2.1.1 Counterfactual Regret Minimization
			7.2.1.2 Deep Counterfactual Regret Minimization
		7.2.2 Cooperative Behavior
			7.2.2.1 Centralized Training/Decentralized Execution
			7.2.2.2 Opponent Modeling
			7.2.2.3 Communication
			7.2.2.4 Psychology
		7.2.3 Mixed Behavior
			7.2.3.1 Evolutionary Algorithms
			7.2.3.2 Swarm Computing
			7.2.3.3 Population-Based Training
			7.2.3.4 Self-play Leagues
	7.3 Multi-Agent Environments
		7.3.1 Competitive Behavior: Poker
		7.3.2 Cooperative Behavior: Hide and Seek
		7.3.3 Mixed Behavior: Capture the Flag and StarCraft
			7.3.3.1 Capture the Flag
			7.3.3.2 StarCraft
		7.3.4 Hands On: Hide and Seek in the Gym Example
			7.3.4.1 Multiplayer Environments
	Summary and Further Reading
		Summary
		Further Reading
	Exercises
		Questions
		Exercises
	References
8 Hierarchical Reinforcement Learning
	Core Concepts
	Core Problem
	Core Algorithms
	Planning a Trip
	8.1 Granularity of the Structure of Problems
		8.1.1 Advantages
		8.1.2 Disadvantages
			8.1.2.1 Conclusion
	8.2 Divide and Conquer for Agents
		8.2.1 The Options Framework
			8.2.1.1 Universal Value Function
		8.2.2 Finding Subgoals
		8.2.3 Overview of Hierarchical Algorithms
			8.2.3.1 Tabular Methods
			8.2.3.2 Deep Learning
	8.3 Hierarchical Environments
		8.3.1 Four Rooms and Robot Tasks
		8.3.2 Montezuma's Revenge
		8.3.3 Multi-Agent Environments
		8.3.4 Hands On: Hierarchical Actor Critic Example
	Summary and Further Reading
		Summary
		Further Reading
	Exercises
		Questions
		Exercises
	References
9 Meta-Learning
	Core Concepts
	Core Problem
	Core Algorithms
	Foundation Models
	9.1 Learning to Learn Related Problems
	9.2 Transfer Learning and Meta-Learning Agents
		9.2.1 Transfer Learning
			9.2.1.1 Task Similarity
			9.2.1.2 Pretraining and Finetuning
			9.2.1.3 Hands On: Pretraining Example
			9.2.1.4 Multi-Task Learning
			9.2.1.5 Domain Adaptation
		9.2.2 Meta-Learning
			9.2.2.1 Evaluating Few-Shot Learning Problems
			9.2.2.2 Deep Meta-Learning Algorithms
			9.2.2.3 Inner and Outer Loop Optimization
			9.2.2.4 Recurrent Meta-Learning
			9.2.2.5 Model-Agnostic Meta-Learning
			9.2.2.6 Hyperparameter Optimization
			9.2.2.7 Meta-Learning and Curriculum Learning
			9.2.2.8 From Few-Shot to Zero-Shot Learning
	9.3 Meta-Learning Environments
		9.3.1 Image Processing
		9.3.2 Natural Language Processing
		9.3.3 Meta-Dataset
		9.3.4 Meta-World
		9.3.5 Alchemy
		9.3.6 Hands On: Meta-World Example
	Summary and Further Reading
		Summary
		Further Reading
	Exercises
		Questions
		Exercises
	References
10 Further Developments
	10.1 Development of Deep Reinforcement Learning
		10.1.1 Tabular Methods
		10.1.2 Model-Free Deep Learning
		10.1.3 Multi-Agent Methods
		10.1.4 Evolution of Reinforcement Learning
	10.2 Main Challenges
		10.2.1 Latent Models
		10.2.2 Self-Play
		10.2.3 Hierarchical Reinforcement Learning
		10.2.4 Transfer Learning and Meta-Learning
		10.2.5 Population-Based Methods
		10.2.6 Exploration and Intrinsic Motivation
		10.2.7 Explainable AI
		10.2.8 Generalization
	10.3 The Future of Artificial Intelligence
	References
A Mathematical Background
	A.1 Sets and Functions
		A.1.1 Sets
			A.1.1.1 Discrete Set
			A.1.1.2 Continuous Set
			A.1.1.3 Conditioning a Set
			A.1.1.4 Cardinality and Dimensionality
			A.1.1.5 Cartesian Product
		A.1.2 Functions
	A.2 Probability Distributions
		A.2.1 Discrete Probability Distributions
			A.2.1.1 Parameters
			A.2.1.2 Representing Discrete Random Variables
		A.2.2 Continuous Probability Distributions
			A.2.2.1 Parameters
		A.2.3 Conditional Distributions
		A.2.4 Expectation
			A.2.4.1 Expectation of a Random Variable
			A.2.4.2 Expectation of a Function of a Random Variable
		A.2.5 Information Theory
			A.2.5.1 Information
			A.2.5.2 Entropy
			A.2.5.3 Cross-Entropy
			A.2.5.4 Kullback–Leibler Divergence
	A.3 Derivative of an Expectation
	A.4 Bellman Equations
	References
B Deep Supervised Learning
	B.1 Machine Learning
		B.1.1 Training Set and Test Set
		B.1.2 Curse of Dimensionality
		B.1.3 Overfitting and the Bias–Variance Trade-Off
			B.1.3.1 Regularization—the World Is Smooth
	B.2 Deep Learning
		B.2.1 Weights, Neurons
		B.2.2 Backpropagation
			B.2.2.1 Loss Function
		B.2.3 End-to-End Feature Learning
			B.2.3.1 Function Approximation
		B.2.4 Convolutional Networks
			B.2.4.1 Shared Weights
			B.2.4.2 CNN Architecture
			B.2.4.3 Max Pooling
		B.2.5 Recurrent Networks
			B.2.5.1 Long Short-Term Memory
		B.2.6 More Network Architectures
			B.2.6.1 Residual Networks
			B.2.6.2 Generative Adversarial Networks
			B.2.6.3 Autoencoders
			B.2.6.4 Attention Mechanism
			B.2.6.5 Transformers
		B.2.7 Overfitting
	B.3 Datasets and Software
		B.3.1 MNIST and ImageNet
			B.3.1.1 ImageNet
		B.3.2 GPU Implementations
		B.3.3 Hands On: Classification Example
	Exercise
		B.3.3.1 Installing TensorFlow and Keras
		B.3.3.2 Keras MNIST Example
	Exercises
		Questions
		Exercises
	References
C Deep Reinforcement Learning Suites
	C.1 Environments
	C.2 Agent Algorithms
	C.3 Deep Learning Suites
	References
Glossary
Index



