Download the book Deep Reinforcement Learning with Python, 2nd Edition

Book details

Deep Reinforcement Learning with Python, 2nd Edition

Edition: 2
Authors:
Series: Expert Insight
ISBN: 9781839210686
Publisher: Packt Publishing
Publication year: 2020
Number of pages: 0
Language: English
File format: EPUB (can be converted to PDF, EPUB, or AZW3 at the user's request)
File size: 28 MB

Book price (Toman): 43,000

If the author is Iranian, the book cannot be downloaded and the payment will be refunded.



Keywords for Deep Reinforcement Learning with Python, 2nd Edition: COM037000 - Computers / Machine Theory; COM004000 - Computers / Intelligence (AI) & Semantics; COM044000 - Computers / Neural Networks





If you would like the book Deep Reinforcement Learning with Python, 2nd Edition converted to PDF, EPUB, AZW3, MOBI, or DJVU format, you can notify support and they will convert the file for you.

Note that Deep Reinforcement Learning with Python, 2nd Edition is the original-language edition, not a Persian translation. The International Library website provides books in their original language only and does not offer any books translated into or written in Persian.


About the book Deep Reinforcement Learning with Python, 2nd Edition



Deep Reinforcement Learning with Python, Second Edition will help you learn reinforcement learning algorithms, techniques, and architectures, including deep reinforcement learning, from scratch. This new edition is an extensive update of the original, reflecting the latest thinking in reinforcement learning.



Table of Contents

Cover
Copyright
Packt Page
Contributors
Table of Contents
Preface
Chapter 1: Fundamentals of Reinforcement Learning
	Key elements of RL
		Agent
		Environment
		State and action
		Reward
	The basic idea of RL
	The RL algorithm
		RL agent in the grid world
	How RL differs from other ML paradigms
	Markov Decision Processes
		The Markov property and Markov chain
		The Markov Reward Process
		The Markov Decision Process
	Fundamental concepts of RL
		Math essentials
			Expectation
		Action space
		Policy
			Deterministic policy
			Stochastic policy
		Episode
		Episodic and continuous tasks
		Horizon
		Return and discount factor
			Small discount factor
			Large discount factor
			What happens when we set the discount factor to 0?
			What happens when we set the discount factor to 1?
		The value function
		Q function
		Model-based and model-free learning
		Different types of environments
			Deterministic and stochastic environments
			Discrete and continuous environments
			Episodic and non-episodic environments
			Single and multi-agent environments
	Applications of RL
	RL glossary
	Summary
	Questions
	Further reading
Chapter 2: A Guide to the Gym Toolkit
	Setting up our machine
		Installing Anaconda
		Installing the Gym toolkit
			Common error fixes
	Creating our first Gym environment
		Exploring the environment
			States
			Actions
			Transition probability and reward function
		Generating an episode in the Gym environment
			Action selection
			Generating an episode
	More Gym environments
		Classic control environments
			State space
			Action space
			Cart-Pole balancing with random policy
		Atari game environments
		General environment
			Deterministic environment
			No frame skipping
			State and action space
			An agent playing the Tennis game
			Recording the game
		Other environments
			Box2D
			MuJoCo
			Robotics
			Toy text
			Algorithms
	Environment synopsis
	Summary
	Questions
	Further reading
Chapter 3: The Bellman Equation and Dynamic Programming
	The Bellman equation
		The Bellman equation of the value function
		The Bellman equation of the Q function
		The Bellman optimality equation
		The relationship between the value and Q functions
	Dynamic programming
		Value iteration
			The value iteration algorithm
			Solving the Frozen Lake problem with value iteration
		Policy iteration
			Algorithm – policy iteration
			Solving the Frozen Lake problem with policy iteration
	Is DP applicable to all environments?
	Summary
	Questions
Chapter 4: Monte Carlo Methods
	Understanding the Monte Carlo method
	Prediction and control tasks
		Prediction task
		Control task
	Monte Carlo prediction
		MC prediction algorithm
		Types of MC prediction
			First-visit Monte Carlo
			Every-visit Monte Carlo
		Implementing the Monte Carlo prediction method
			Understanding the blackjack game
			The blackjack environment in the Gym library
			Every-visit MC prediction with the blackjack game
			First-visit MC prediction with the blackjack game
		Incremental mean updates
		MC prediction (Q function)
	Monte Carlo control
		MC control algorithm
		On-policy Monte Carlo control
			Monte Carlo exploring starts
			Monte Carlo with the epsilon-greedy policy
			Implementing on-policy MC control
		Off-policy Monte Carlo control
	Is the MC method applicable to all tasks?
	Summary
	Questions
Chapter 5: Understanding Temporal Difference Learning
	TD learning
	TD prediction
		TD prediction algorithm
			Predicting the value of states in the Frozen Lake environment
	TD control
		On-policy TD control – SARSA
			Computing the optimal policy using SARSA
		Off-policy TD control – Q learning
			Computing the optimal policy using Q learning
		The difference between Q learning and SARSA
	Comparing the DP, MC, and TD methods
	Summary
	Questions
	Further reading
Chapter 6: Case Study – The MAB Problem
	The MAB problem
		Creating a bandit in the Gym
		Exploration strategies
			Epsilon-greedy
			Softmax exploration
			Upper confidence bound
			Thompson sampling
	Applications of MAB
	Finding the best advertisement banner using bandits
		Creating a dataset
		Initialize the variables
		Define the epsilon-greedy method
		Run the bandit test
	Contextual bandits
	Summary
	Questions
	Further reading
Chapter 7: Deep Learning Foundations
	Biological and artificial neurons
	ANN and its layers
		Input layer
		Hidden layer
		Output layer
	Exploring activation functions
		The sigmoid function
		The tanh function
		The Rectified Linear Unit function
		The softmax function
	Forward propagation in ANNs
	How does an ANN learn?
	Putting it all together
		Building a neural network from scratch
	Recurrent Neural Networks
		The difference between feedforward networks and RNNs
		Forward propagation in RNNs
		Backpropagating through time
	LSTM to the rescue
		Understanding the LSTM cell
	What are CNNs?
		Convolutional layers
			Strides
			Padding
		Pooling layers
		Fully connected layers
	The architecture of CNNs
	Generative adversarial networks
		Breaking down the generator
		Breaking down the discriminator
		How do they learn, though?
		Architecture of a GAN
		Demystifying the loss function
			Discriminator loss
			Generator loss
			Total loss
	Summary
	Questions
	Further reading
Chapter 8: A Primer on TensorFlow
	What is TensorFlow?
	Understanding computational graphs and sessions
		Sessions
	Variables, constants, and placeholders
		Variables
		Constants
		Placeholders and feed dictionaries
	Introducing TensorBoard
		Creating a name scope
	Handwritten digit classification using TensorFlow
		Importing the required libraries
		Loading the dataset
		Defining the number of neurons in each layer
		Defining placeholders
		Forward propagation
		Computing loss and backpropagation
		Computing accuracy
		Creating a summary
		Training the model
		Visualizing graphs in TensorBoard
	Introducing eager execution
	Math operations in TensorFlow
	TensorFlow 2.0 and Keras
		Bonjour Keras
			Defining the model
			Compiling the model
			Training the model
			Evaluating the model
		MNIST digit classification using TensorFlow 2.0
	Summary
	Questions
	Further reading
Chapter 9: Deep Q Network and Its Variants
	What is DQN?
		Understanding DQN
			Replay buffer
			Loss function
			Target network
		Putting it all together
			The DQN algorithm
	Playing Atari games using DQN
		Architecture of the DQN
		Getting hands-on with the DQN
			Preprocess the game screen
			Defining the DQN class
			Training the DQN
	The double DQN
		The double DQN algorithm
	DQN with prioritized experience replay
		Types of prioritization
			Proportional prioritization
			Rank-based prioritization
		Correcting the bias
	The dueling DQN
		Understanding the dueling DQN
		The architecture of a dueling DQN
	The deep recurrent Q network
		The architecture of a DRQN
	Summary
	Questions
	Further reading
Chapter 10: Policy Gradient Method
	Why policy-based methods?
	Policy gradient intuition
		Understanding the policy gradient
		Deriving the policy gradient
		Algorithm – policy gradient
	Variance reduction methods
		Policy gradient with reward-to-go
			Algorithm – Reward-to-go policy gradient
		Cart pole balancing with policy gradient
			Computing discounted and normalized reward
			Building the policy network
			Training the network
		Policy gradient with baseline
			Algorithm – REINFORCE with baseline
	Summary
	Questions
	Further reading
Chapter 11: Actor-Critic Methods – A2C and A3C
	Overview of the actor-critic method
		Understanding the actor-critic method
		The actor-critic algorithm
	Advantage actor-critic (A2C)
	Asynchronous advantage actor-critic (A3C)
		The three As
		The architecture of A3C
		Mountain car climbing using A3C
			Creating the mountain car environment
			Defining the variables
			Defining the actor-critic class
			Defining the worker class
			Training the network
			Visualizing the computational graph
	A2C revisited
	Summary
	Questions
	Further reading
Chapter 12: Learning DDPG, TD3, and SAC
	Deep deterministic policy gradient
		An overview of DDPG
			Actor
			Critic
		DDPG components
			Critic network
			Actor network
		Putting it all together
		Algorithm – DDPG
		Swinging up a pendulum using DDPG
			Creating the Gym environment
			Defining the variables
			Defining the DDPG class
			Training the network
	Twin delayed DDPG
		Key features of TD3
			Clipped double Q learning
			Delayed policy updates
			Target policy smoothing
		Putting it all together
		Algorithm – TD3
	Soft actor-critic
		Understanding soft actor-critic
			V and Q functions with the entropy term
		Components of SAC
			Critic network
			Actor network
		Putting it all together
		Algorithm – SAC
	Summary
	Questions
	Further reading
Chapter 13: TRPO, PPO, and ACKTR Methods
	Trust region policy optimization
		Math essentials
			The Taylor series
			The trust region method
			The conjugate gradient method
			Lagrange multipliers
			Importance sampling
		Designing the TRPO objective function
			Parameterizing the policies
			Sample-based estimation
		Solving the TRPO objective function
			Computing the search direction
			Performing a line search in the search direction
		Algorithm – TRPO
	Proximal policy optimization
		PPO with a clipped objective
			Algorithm – PPO-clipped
		Implementing the PPO-clipped method
			Creating the Gym environment
			Defining the PPO class
			Training the network
		PPO with a penalized objective
			Algorithm – PPO-penalty
	Actor-critic using Kronecker-factored trust region
		Math essentials
			Block matrix
			Block diagonal matrix
			The Kronecker product
			The vec operator
			Properties of the Kronecker product
		Kronecker-Factored Approximate Curvature (K-FAC)
		K-FAC in actor-critic
		Incorporating the trust region
	Summary
	Questions
	Further reading
Chapter 14: Distributional Reinforcement Learning
	Why distributional reinforcement learning?
	Categorical DQN
		Predicting the value distribution
		Selecting an action based on the value distribution
		Training the categorical DQN
			Projection step
		Putting it all together
		Algorithm – categorical DQN
		Playing Atari games using a categorical DQN
			Defining the variables
			Defining the replay buffer
			Defining the categorical DQN class
	Quantile Regression DQN
		Math essentials
			Quantile
			Inverse CDF (quantile function)
		Understanding QR-DQN
			Action selection
			Loss function
	Distributed Distributional DDPG
		Critic network
		Actor network
		Algorithm – D4PG
	Summary
	Questions
	Further reading
Chapter 15: Imitation Learning and Inverse RL
	Supervised imitation learning
	DAgger
		Understanding DAgger
		Algorithm – DAgger
	Deep Q learning from demonstrations
		Phases of DQfD
			Pre-training phase
			Training phase
		Loss function of DQfD
		Algorithm – DQfD
	Inverse reinforcement learning
		Maximum entropy IRL
			Key terms
			Back to maximum entropy IRL
			Computing the gradient
			Algorithm – maximum entropy IRL
	Generative adversarial imitation learning
		Formulation of GAIL
	Summary
	Questions
	Further reading
Chapter 16: Deep Reinforcement Learning with Stable Baselines
	Installing Stable Baselines
	Creating our first agent with Stable Baselines
		Evaluating the trained agent
		Storing and loading the trained agent
		Viewing the trained agent
		Putting it all together
	Vectorized environments
		SubprocVecEnv
		DummyVecEnv
	Integrating custom environments
	Playing Atari games with a DQN and its variants
		Implementing DQN variants
	Lunar lander using A2C
		Creating a custom network
	Swinging up a pendulum using DDPG
		Viewing the computational graph in TensorBoard
	Training an agent to walk using TRPO
		Installing the MuJoCo environment
		Implementing TRPO
		Recording the video
	Training a cheetah bot to run using PPO
		Making a GIF of a trained agent
	Implementing GAIL
	Summary
	Questions
	Further reading
Chapter 17: Reinforcement Learning Frontiers
	Meta reinforcement learning
		Model-agnostic meta learning
			Understanding MAML
			MAML in a supervised learning setting
			MAML in a reinforcement learning setting
	Hierarchical reinforcement learning
	MAXQ value function decomposition
	Imagination augmented agents
	Summary
	Questions
	Further reading
Appendix 1 – Reinforcement Learning Algorithms
	Reinforcement learning algorithm
	Value Iteration
	Policy Iteration
	First-Visit MC Prediction
	Every-Visit MC Prediction
	MC Prediction – the Q Function
	MC Control Method
	On-Policy MC Control – Exploring starts
	On-Policy MC Control – Epsilon-Greedy
	Off-Policy MC Control
	TD Prediction
	On-Policy TD Control – SARSA
	Off-Policy TD Control – Q Learning
	Deep Q Learning
	Double DQN
	REINFORCE Policy Gradient
	Policy Gradient with Reward-To-Go
	REINFORCE with Baseline
	Advantage Actor Critic
	Asynchronous Advantage Actor-Critic
	Deep Deterministic Policy Gradient
	Twin Delayed DDPG
	Soft Actor-Critic
	Trust Region Policy Optimization
	PPO-Clipped
	PPO-Penalty
	Categorical DQN
	Distributed Distributional DDPG
	DAgger
	Deep Q learning from demonstrations
	MaxEnt Inverse Reinforcement Learning
	MAML in Reinforcement Learning
Appendix 2 – Assessments
	Chapter 1 – Fundamentals of Reinforcement Learning
	Chapter 2 – A Guide to the Gym Toolkit
	Chapter 3 – Bellman Equation and Dynamic Programming
	Chapter 4 – Monte Carlo Methods
	Chapter 5 – Understanding Temporal Difference Learning
	Chapter 6 – Case Study – The MAB Problem
	Chapter 7 – Deep Learning Foundations
	Chapter 8 – A Primer on TensorFlow
	Chapter 9 – Deep Q Network and Its Variants
	Chapter 10 – Policy Gradient Method
	Chapter 11 – Actor-Critic Methods – A2C and A3C
	Chapter 12 – Learning DDPG, TD3, and SAC
	Chapter 13 – TRPO, PPO, and ACKTR Methods
	Chapter 14 – Distributional Reinforcement Learning
	Chapter 15 – Imitation Learning and Inverse RL
	Chapter 16 – Deep Reinforcement Learning with Stable Baselines
	Chapter 17 – Reinforcement Learning Frontiers
Other Books You May Enjoy
Index



