Download the book Foundations of Deep Reinforcement Learning: Theory and Practice in Python

Book details

Foundations of Deep Reinforcement Learning: Theory and Practice in Python

Category: Cybernetics: Artificial Intelligence
Edition: 1
Authors: Laura Graesser, Wah Loon Keng
Series: Addison-Wesley Data & Analytics Series
ISBN: 9780135172384, 0135172381
Publisher: Addison-Wesley Professional
Year of publication: 2019
Number of pages: 412
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 6 MB

Price (Toman): 33,000

Keywords for Foundations of Deep Reinforcement Learning: Theory and Practice in Python: algorithms, neural networks, deep learning, reinforcement learning, supervised learning, parallel programming, temporal difference learning, actor-critic method, deep Q-networks, proximal policy optimization


If you need the file for Foundations of Deep Reinforcement Learning: Theory and Practice in Python converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.

Note that Foundations of Deep Reinforcement Learning: Theory and Practice in Python is the original-language edition, not a Persian translation. The International Library website offers only original-language books and does not provide any books translated into or written in Persian.

Description of Foundations of Deep Reinforcement Learning: Theory and Practice in Python

The Contemporary Introduction to Deep Reinforcement Learning that Combines Theory and Practice

Deep reinforcement learning (deep RL) combines deep learning and reinforcement learning, in which artificial agents learn to solve sequential decision-making problems. In the past decade deep RL has achieved remarkable results on a range of problems, from single and multiplayer games–such as Go, Atari games, and DotA 2–to robotics.

Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines both theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library SLM Lab, and finishes with the practical details of getting deep RL to work.

This guide is ideal for both computer science students and software engineers who are familiar with basic machine learning concepts and have a working understanding of Python.

• Understand each key aspect of a deep RL problem
• Explore policy- and value-based algorithms, including REINFORCE, SARSA, DQN, Double DQN, and Prioritized Experience Replay (PER)
• Delve into combined algorithms, including Actor-Critic and Proximal Policy Optimization (PPO)
• Understand how algorithms can be parallelized synchronously and asynchronously
• Run algorithms in SLM Lab and learn the practical implementation details for getting deep RL to work
• Explore algorithm benchmark results with tuned hyperparameters
• Understand how deep RL environments are designed
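As a small illustration of the kind of Python the description refers to, the sketch below computes the discounted returns that policy-gradient methods such as REINFORCE use to weight actions. It is not taken from the book or from SLM Lab; the function name `discounted_returns` and its arguments are assumptions made for this example only.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode.

    `rewards` is the list of per-step rewards from a single finished episode,
    and `gamma` is the discount factor in [0, 1]. Both names are hypothetical.
    """
    returns = [0.0] * len(rewards)
    future = 0.0  # return accumulated from the steps after t
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        returns[t] = future
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Iterating backwards keeps the computation linear in the episode length, since each return is built from the one that follows it.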

Table of contents

Cover
Title Page
Copyright Page
Contents
Foreword
Preface
Acknowledgments
About the Authors
1 Introduction to Reinforcement Learning
	1.1 Reinforcement Learning
	1.2 Reinforcement Learning as MDP
	1.3 Learnable Functions in Reinforcement Learning
	1.4 Deep Reinforcement Learning Algorithms
		1.4.1 Policy-Based Algorithms
		1.4.2 Value-Based Algorithms
		1.4.3 Model-Based Algorithms
		1.4.4 Combined Methods
		1.4.5 Algorithms Covered in This Book
		1.4.6 On-Policy and Off-Policy Algorithms
		1.4.7 Summary
	1.5 Deep Learning for Reinforcement Learning
	1.6 Reinforcement Learning and Supervised Learning
		1.6.1 Lack of an Oracle
		1.6.2 Sparsity of Feedback
		1.6.3 Data Generation
	1.7 Summary
Part I: Policy-Based and Value-Based Algorithms
	2 REINFORCE
		2.1 Policy
		2.2 The Objective Function
		2.3 The Policy Gradient
			2.3.1 Policy Gradient Derivation
		2.4 Monte Carlo Sampling
		2.5 REINFORCE Algorithm
			2.5.1 Improving REINFORCE
		2.6 Implementing REINFORCE
			2.6.1 A Minimal REINFORCE Implementation
			2.6.2 Constructing Policies with PyTorch
			2.6.3 Sampling Actions
			2.6.4 Calculating Policy Loss
			2.6.5 REINFORCE Training Loop
			2.6.6 On-Policy Replay Memory
		2.7 Training a REINFORCE Agent
		2.8 Experimental Results
			2.8.1 Experiment: The Effect of Discount Factor γ
			2.8.2 Experiment: The Effect of Baseline
		2.9 Summary
		2.10 Further Reading
		2.11 History
	3 SARSA
		3.1 The Q- and V-Functions
		3.2 Temporal Difference Learning
			3.2.1 Intuition for Temporal Difference Learning
		3.3 Action Selection in SARSA
			3.3.1 Exploration and Exploitation
		3.4 SARSA Algorithm
			3.4.1 On-Policy Algorithms
		3.5 Implementing SARSA
			3.5.1 Action Function: ε-Greedy
			3.5.2 Calculating the Q-Loss
			3.5.3 SARSA Training Loop
			3.5.4 On-Policy Batched Replay Memory
		3.6 Training a SARSA Agent
		3.7 Experimental Results
			3.7.1 Experiment: The Effect of Learning Rate
		3.8 Summary
		3.9 Further Reading
		3.10 History
	4 Deep Q-Networks (DQN)
		4.1 Learning the Q-Function in DQN
		4.2 Action Selection in DQN
			4.2.1 The Boltzmann Policy
		4.3 Experience Replay
		4.4 DQN Algorithm
		4.5 Implementing DQN
			4.5.1 Calculating the Q-Loss
			4.5.2 DQN Training Loop
			4.5.3 Replay Memory
		4.6 Training a DQN Agent
		4.7 Experimental Results
			4.7.1 Experiment: The Effect of Network Architecture
		4.8 Summary
		4.9 Further Reading
		4.10 History
	5 Improving DQN
		5.1 Target Networks
		5.2 Double DQN
		5.3 Prioritized Experience Replay (PER)
			5.3.1 Importance Sampling
		5.4 Modified DQN Implementation
			5.4.1 Network Initialization
			5.4.2 Calculating the Q-Loss
			5.4.3 Updating the Target Network
			5.4.4 DQN with Target Networks
			5.4.5 Double DQN
			5.4.6 Prioritized Experience Replay
		5.5 Training a DQN Agent to Play Atari Games
		5.6 Experimental Results
			5.6.1 Experiment: The Effect of Double DQN and PER
		5.7 Summary
		5.8 Further Reading
Part II: Combined Methods
	6 Advantage Actor-Critic (A2C)
		6.1 The Actor
		6.2 The Critic
			6.2.1 The Advantage Function
			6.2.2 Learning the Advantage Function
		6.3 A2C Algorithm
		6.4 Implementing A2C
			6.4.1 Advantage Estimation
			6.4.2 Calculating Value Loss and Policy Loss
			6.4.3 Actor-Critic Training Loop
		6.5 Network Architecture
		6.6 Training an A2C Agent
			6.6.1 A2C with n-Step Returns on Pong
			6.6.2 A2C with GAE on Pong
			6.6.3 A2C with n-Step Returns on BipedalWalker
		6.7 Experimental Results
			6.7.1 Experiment: The Effect of n-Step Returns
			6.7.2 Experiment: The Effect of λ of GAE
		6.8 Summary
		6.9 Further Reading
		6.10 History
	7 Proximal Policy Optimization (PPO)
		7.1 Surrogate Objective
			7.1.1 Performance Collapse
			7.1.2 Modifying the Objective
		7.2 Proximal Policy Optimization (PPO)
		7.3 PPO Algorithm
		7.4 Implementing PPO
			7.4.1 Calculating the PPO Policy Loss
			7.4.2 PPO Training Loop
		7.5 Training a PPO Agent
			7.5.1 PPO on Pong
			7.5.2 PPO on BipedalWalker
		7.6 Experimental Results
			7.6.1 Experiment: The Effect of λ of GAE
			7.6.2 Experiment: The Effect of Clipping Variable ε
		7.7 Summary
		7.8 Further Reading
	8 Parallelization Methods
		8.1 Synchronous Parallelization
		8.2 Asynchronous Parallelization
			8.2.1 Hogwild!
		8.3 Training an A3C Agent
		8.4 Summary
		8.5 Further Reading
	9 Algorithm Summary
Part III: Practical Details
	10 Getting Deep RL to Work
		10.1 Software Engineering Practices
			10.1.1 Unit Tests
			10.1.2 Code Quality
			10.1.3 Git Workflow
		10.2 Debugging Tips
			10.2.1 Signs of Life
			10.2.2 Policy Gradient Diagnoses
			10.2.3 Data Diagnoses
			10.2.4 Preprocessor
			10.2.5 Memory
			10.2.6 Algorithmic Functions
			10.2.7 Neural Networks
			10.2.8 Algorithm Simplification
			10.2.9 Problem Simplification
			10.2.10 Hyperparameters
			10.2.11 Lab Workflow
		10.3 Atari Tricks
		10.4 Deep RL Almanac
			10.4.1 Hyperparameter Tables
			10.4.2 Algorithm Performance Comparison
		10.5 Summary
	11 SLM Lab
		11.1 Algorithms Implemented in SLM Lab
		11.2 Spec File
			11.2.1 Search Spec Syntax
		11.3 Running SLM Lab
			11.3.1 SLM Lab Commands
		11.4 Analyzing Experiment Results
			11.4.1 Overview of the Experiment Data
		11.5 Summary
	12 Network Architectures
		12.1 Types of Neural Networks
			12.1.1 Multilayer Perceptrons (MLPs)
			12.1.2 Convolutional Neural Networks (CNNs)
			12.1.3 Recurrent Neural Networks (RNNs)
		12.2 Guidelines for Choosing a Network Family
			12.2.1 MDPs vs. POMDPs
			12.2.2 Choosing Networks for Environments
		12.3 The Net API
			12.3.1 Input and Output Layer Shape Inference
			12.3.2 Automatic Network Construction
			12.3.3 Training Step
			12.3.4 Exposure of Underlying Methods
		12.4 Summary
		12.5 Further Reading
	13 Hardware
		13.1 Computer
		13.2 Data Types
		13.3 Optimizing Data Types in RL
		13.4 Choosing Hardware
		13.5 Summary
Part IV: Environment Design
	14 States
		14.1 Examples of States
		14.2 State Completeness
		14.3 State Complexity
		14.4 State Information Loss
			14.4.1 Image Grayscaling
			14.4.2 Discretization
			14.4.3 Hash Conflict
			14.4.4 Metainformation Loss
		14.5 Preprocessing
			14.5.1 Standardization
			14.5.2 Image Preprocessing
			14.5.3 Temporal Preprocessing
		14.6 Summary
	15 Actions
		15.1 Examples of Actions
		15.2 Action Completeness
		15.3 Action Complexity
		15.4 Summary
		15.5 Further Reading: Action Design in Everyday Things
	16 Rewards
		16.1 The Role of Rewards
		16.2 Reward Design Guidelines
		16.3 Summary
	17 Transition Function
		17.1 Feasibility Checks
		17.2 Reality Check
		17.3 Summary
Epilogue
A: Deep Reinforcement Learning Timeline
B: Example Environments
	B.1 Discrete Environments
		B.1.1 CartPole-v0
		B.1.2 MountainCar-v0
		B.1.3 LunarLander-v2
		B.1.4 PongNoFrameskip-v4
		B.1.5 BreakoutNoFrameskip-v4
	B.2 Continuous Environments
		B.2.1 Pendulum-v0
		B.2.2 BipedalWalker-v2
References
Index