Edition: [1st ed. 2022]
Author: Aske Plaat
Series:
ISBN: 9811906378, 9789811906374
Publisher: Springer
Publication year: 2022
Number of pages: 421 [414]
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 11 MB
If you would like the Deep Reinforcement Learning book file converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and the requested file will be converted for you.
Please note that Deep Reinforcement Learning is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
These research advances have not gone unnoticed by educators. Many universities have begun offering courses on deep reinforcement learning. The aim of this book is to provide an overview of the field at the proper level of detail for a graduate course in artificial intelligence. It covers the complete field, from the basic algorithms of Deep Q-learning to advanced topics such as multi-agent reinforcement learning and meta-learning.
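To give a sense of the "basic algorithms" end of the book, the following is a minimal tabular Q-learning sketch in the spirit of the book's "Hands On: Q-Learning on Taxi" section. It is illustrative only, not code from the book, and assumes the classic pre-0.26 OpenAI Gym API with the Taxi-v3 environment installed; hyperparameter values are arbitrary.

import numpy as np
import gym

# Classic Gym API assumed: reset() returns a state, step() returns a 4-tuple.
env = gym.make("Taxi-v-3".replace("-3", "3"))  # i.e. "Taxi-v3"
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount factor, exploration rate

for episode in range(5000):
    state, done = env.reset(), False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # temporal-difference (Q-learning) update toward the bootstrapped target
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

With newer Gym or Gymnasium releases the reset/step signatures differ slightly, so the snippet may need minor adjustment.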
Preface
Acknowledgments
Contents
List of Tables
1 Introduction
1.1 What Is Deep Reinforcement Learning? 1.1.1 Deep Learning 1.1.2 Reinforcement Learning 1.1.3 Deep Reinforcement Learning 1.1.4 Applications 1.1.4.1 Sequential Decision Problems 1.1.4.2 Robotics 1.1.4.3 Games 1.1.5 Four Related Fields 1.1.5.1 Psychology 1.1.5.2 Mathematics 1.1.5.3 Engineering 1.1.5.4 Biology
1.2 Three Machine Learning Paradigms 1.2.1 Supervised Learning 1.2.2 Unsupervised Learning 1.2.3 Reinforcement Learning
1.3 Overview of the Book 1.3.1 Prerequisite Knowledge 1.3.1.1 Course 1.3.1.2 Blogs and GitHub 1.3.2 Structure of the Book 1.3.2.1 Chapters
References
2 Tabular Value-Based Reinforcement Learning
Core Concepts Core Problem Finding a Supermarket
2.1 Sequential Decision Problems 2.1.1 Grid Worlds 2.1.2 Mazes and Box Puzzles
2.2 Tabular Value-Based Agents 2.2.1 Agent and Environment 2.2.2 Markov Decision Process 2.2.2.1 State S 2.2.2.2 Action A 2.2.2.3 Transition Ta 2.2.2.4 Reward Ra 2.2.2.5 Discount Factor γ 2.2.2.6 Policy π 2.2.3 MDP Objective 2.2.3.1 Trace τ 2.2.3.2 Return R 2.2.3.3 State Value V 2.2.3.4 State–Action Value Q 2.2.3.5 Reinforcement Learning Objective 2.2.3.6 Bellman Equation 2.2.4 MDP Solution Methods 2.2.4.1 Hands On: Value Iteration in Gym 2.2.4.2 OpenAI Gym 2.2.4.3 Taxi Example with Value Iteration 2.2.4.4 Model-Free Learning 2.2.4.5 Temporal Difference Learning 2.2.4.6 Find Policy by Value-Based Learning 2.2.4.7 Exploration 2.2.4.8 Bandit Theory 2.2.4.9 ε-Greedy Exploration 2.2.4.10 Off-Policy Learning 2.2.4.11 On-Policy SARSA 2.2.4.12 Off-Policy Q-Learning 2.2.4.13 Sparse Rewards and Reward Shaping 2.2.4.14 Hands On: Q-Learning on Taxi 2.2.4.15 Tuning Your Learning Rate
2.3 Classic Gym Environments 2.3.1 Mountain Car and Cartpole 2.3.2 Path Planning and Board Games 2.3.2.1 Path Planning 2.3.2.2 Board Games
Summary and Further Reading Summary Further Reading Exercises Questions Exercises References
3 Deep Value-Based Reinforcement Learning
Core Concepts Core Problem Core Algorithm End-to-End Learning
3.1 Large, High-Dimensional, Problems 3.1.1 Atari Arcade Games 3.1.2 Real-Time Strategy and Video Games
3.2 Deep Value-Based Agents 3.2.1 Generalization of Large Problems with Deep Learning 3.2.1.1 Minimizing Supervised Target Loss 3.2.1.2 Bootstrapping Q-Values 3.2.1.3 Deep Reinforcement Learning Target-Error 3.2.2 Three Challenges 3.2.2.1 Coverage 3.2.2.2 Correlation 3.2.2.3 Convergence 3.2.2.4 Deadly Triad 3.2.3 Stable Deep Value-Based Learning 3.2.3.1 Decorrelating States 3.2.3.2 Experience Replay 3.2.3.3 Infrequent Updates of Target Weights 3.2.3.4 Hands On: DQN and Breakout Gym Example 3.2.3.5 Install Stable Baselines 3.2.3.6 The DQN Code 3.2.4 Improving Exploration 3.2.4.1 Overestimation 3.2.4.2 Prioritized Experience Replay 3.2.4.3 Advantage Function 3.2.4.4 Distributional Methods 3.2.4.5 Noisy DQN
3.3 Atari 2600 Environments 3.3.1 Network Architecture 3.3.2 Benchmarking Atari
Summary and Further Reading Summary Further Reading Exercises Questions Exercises References
4 Policy-Based Reinforcement Learning
Core Concepts Core Problem Core Algorithms Jumping Robots
4.1 Continuous Problems 4.1.1 Continuous Policies 4.1.2 Stochastic Policies 4.1.3 Environments: Gym and MuJoCo 4.1.3.1 Robotics 4.1.3.2 Physics Models 4.1.3.3 Games
4.2 Policy-Based Agents 4.2.1 Policy-Based Algorithm: REINFORCE 4.2.2 Bias–Variance Trade-Off in Policy-Based Methods 4.2.3 Actor Critic Bootstrapping 4.2.4 Baseline Subtraction with Advantage Function 4.2.5 Trust Region Optimization 4.2.6 Entropy and Exploration 4.2.7 Deterministic Policy Gradient 4.2.8 Hands On: PPO and DDPG MuJoCo Examples
4.3 Locomotion and Visuo-Motor Environments 4.3.1 Locomotion 4.3.2 Visuo-Motor Interaction 4.3.3 Benchmarking
Summary and Further Reading Summary Further Reading Exercises Questions Exercises References
5 Model-Based Reinforcement Learning
Core Concepts Core Problem Core Algorithms Building a Navigation Map
5.1 Dynamics Models of High-Dimensional Problems
5.2 Learning and Planning Agents 5.2.1 Learning the Model 5.2.1.1 Modeling Uncertainty 5.2.1.2 Latent Models 5.2.2 Planning with the Model 5.2.2.1 Trajectory Rollouts and Model-Predictive Control 5.2.2.2 End-to-End Learning and Planning-by-Network
5.3 High-Dimensional Environments 5.3.1 Overview of Model-Based Experiments 5.3.2 Small Navigation Tasks 5.3.3 Robotic Applications 5.3.4 Atari Game Applications 5.3.5 Hands On: PlaNet Example
Summary and Further Reading Summary Further Reading Exercises Questions Exercises References
6 Two-Agent Self-Play
Core Concepts Core Problem Core Algorithms Self-Play in Games
6.1 Two-Agent Zero-Sum Problems 6.1.1 The Difficulty of Playing Go 6.1.2 AlphaGo Achievements
6.2 Tabula Rasa Self-Play Agents 6.2.1 Move-Level Self-Play 6.2.1.1 Minimax 6.2.1.2 Monte Carlo Tree Search 6.2.2 Example-Level Self-Play 6.2.2.1 Policy and Value Network 6.2.2.2 Stability and Exploration 6.2.3 Tournament-Level Self-Play 6.2.3.1 Self-Play Curriculum Learning 6.2.3.2 Supervised and Reinforcement Curriculum Learning
6.3 Self-Play Environments 6.3.1 How to Design a World Class Go Program? 6.3.2 AlphaGo Zero Performance 6.3.3 AlphaZero 6.3.4 Open Self-Play Frameworks 6.3.5 Hands On: Hex in PolyGames Example
Summary and Further Reading Summary Further Reading Exercises Questions Implementation: New or Make/Undo Exercises References
7 Multi-Agent Reinforcement Learning
Core Concepts Core Problem Core Algorithms Self-driving Car
7.1 Multi-Agent Problems Game Theory Stochastic Games and Extensive-Form Games Competitive, Cooperative, and Mixed Strategies 7.1.1 Competitive Behavior 7.1.2 Cooperative Behavior 7.1.2.1 Multi-Objective Reinforcement Learning 7.1.3 Mixed Behavior 7.1.3.1 Iterated Prisoner's Dilemma 7.1.4 Challenges 7.1.4.1 Partial Observability 7.1.4.2 Nonstationary Environments 7.1.4.3 Large State Space
7.2 Multi-Agent Reinforcement Learning Agents 7.2.1 Competitive Behavior 7.2.1.1 Counterfactual Regret Minimization 7.2.1.2 Deep Counterfactual Regret Minimization 7.2.2 Cooperative Behavior 7.2.2.1 Centralized Training/Decentralized Execution 7.2.2.2 Opponent Modeling 7.2.2.3 Communication 7.2.2.4 Psychology 7.2.3 Mixed Behavior 7.2.3.1 Evolutionary Algorithms 7.2.3.2 Swarm Computing 7.2.3.3 Population-Based Training 7.2.3.4 Self-play Leagues
7.3 Multi-Agent Environments 7.3.1 Competitive Behavior: Poker 7.3.2 Cooperative Behavior: Hide and Seek 7.3.3 Mixed Behavior: Capture the Flag and StarCraft 7.3.3.1 Capture the Flag 7.3.3.2 StarCraft 7.3.4 Hands On: Hide and Seek in the Gym Example 7.3.4.1 Multiplayer Environments
Summary and Further Reading Summary Further Reading Exercises Questions Exercises References
8 Hierarchical Reinforcement Learning
Core Concepts Core Problem Core Algorithms Planning a Trip
8.1 Granularity of the Structure of Problems 8.1.1 Advantages 8.1.2 Disadvantages 8.1.2.1 Conclusion
8.2 Divide and Conquer for Agents 8.2.1 The Options Framework 8.2.1.1 Universal Value Function 8.2.2 Finding Subgoals 8.2.3 Overview of Hierarchical Algorithms 8.2.3.1 Tabular Methods 8.2.3.2 Deep Learning
8.3 Hierarchical Environments 8.3.1 Four Rooms and Robot Tasks 8.3.2 Montezuma's Revenge 8.3.3 Multi-Agent Environments 8.3.4 Hands On: Hierarchical Actor Critic Example
Summary and Further Reading Summary Further Reading Exercises Questions Exercises References
9 Meta-Learning
Core Concepts Core Problem Core Algorithms Foundation Models
9.1 Learning to Learn Related Problems
9.2 Transfer Learning and Meta-Learning Agents 9.2.1 Transfer Learning 9.2.1.1 Task Similarity 9.2.1.2 Pretraining and Finetuning 9.2.1.3 Hands On: Pretraining Example 9.2.1.4 Multi-Task Learning 9.2.1.5 Domain Adaptation 9.2.2 Meta-Learning 9.2.2.1 Evaluating Few-Shot Learning Problems 9.2.2.2 Deep Meta-Learning Algorithms 9.2.2.3 Inner and Outer Loop Optimization 9.2.2.4 Recurrent Meta-Learning 9.2.2.5 Model-Agnostic Meta-Learning 9.2.2.6 Hyperparameter Optimization 9.2.2.7 Meta-Learning and Curriculum Learning 9.2.2.8 From Few-Shot to Zero-Shot Learning
9.3 Meta-Learning Environments 9.3.1 Image Processing 9.3.2 Natural Language Processing 9.3.3 Meta-Dataset 9.3.4 Meta-World 9.3.5 Alchemy 9.3.6 Hands On: Meta-World Example
Summary and Further Reading Summary Further Reading Exercises Questions Exercises References
10 Further Developments
10.1 Development of Deep Reinforcement Learning 10.1.1 Tabular Methods 10.1.2 Model-Free Deep Learning 10.1.3 Multi-Agent Methods 10.1.4 Evolution of Reinforcement Learning
10.2 Main Challenges 10.2.1 Latent Models 10.2.2 Self-Play 10.2.3 Hierarchical Reinforcement Learning 10.2.4 Transfer Learning and Meta-Learning 10.2.5 Population-Based Methods 10.2.6 Exploration and Intrinsic Motivation 10.2.7 Explainable AI 10.2.8 Generalization
10.3 The Future of Artificial Intelligence
References
A Mathematical Background
A.1 Sets and Functions A.1.1 Sets A.1.1.1 Discrete Set A.1.1.2 Continuous Set A.1.1.3 Conditioning a Set A.1.1.4 Cardinality and Dimensionality A.1.1.5 Cartesian Product A.1.2 Functions
A.2 Probability Distributions A.2.1 Discrete Probability Distributions A.2.1.1 Parameters A.2.1.2 Representing Discrete Random Variables A.2.2 Continuous Probability Distributions A.2.2.1 Parameters A.2.3 Conditional Distributions A.2.4 Expectation A.2.4.1 Expectation of a Random Variable A.2.4.2 Expectation of a Function of a Random Variable A.2.5 Information Theory A.2.5.1 Information A.2.5.2 Entropy A.2.5.3 Cross-Entropy A.2.5.4 Kullback–Leibler Divergence
A.3 Derivative of an Expectation
A.4 Bellman Equations
References
B Deep Supervised Learning
B.1 Machine Learning B.1.1 Training Set and Test Set B.1.2 Curse of Dimensionality B.1.3 Overfitting and the Bias–Variance Trade-Off B.1.3.1 Regularization—the World Is Smooth
B.2 Deep Learning B.2.1 Weights, Neurons B.2.2 Backpropagation B.2.2.1 Loss Function B.2.3 End-to-End Feature Learning B.2.3.1 Function Approximation B.2.4 Convolutional Networks B.2.4.1 Shared Weights B.2.4.2 CNN Architecture B.2.4.3 Max Pooling B.2.5 Recurrent Networks B.2.5.1 Long Short-Term Memory B.2.6 More Network Architectures B.2.6.1 Residual Networks B.2.6.2 Generative Adversarial Networks B.2.6.3 Autoencoders B.2.6.4 Attention Mechanism B.2.6.5 Transformers B.2.7 Overfitting
B.3 Datasets and Software B.3.1 MNIST and ImageNet B.3.1.1 ImageNet B.3.2 GPU Implementations B.3.3 Hands On: Classification Example Exercise B.3.3.1 Installing TensorFlow and Keras B.3.3.2 Keras MNIST Example
Exercises Questions Exercises References
C Deep Reinforcement Learning Suites
C.1 Environments
C.2 Agent Algorithms
C.3 Deep Learning Suites
References
Glossary
Index