
To contact us, you can reach us by phone call or SMS at the mobile numbers below.

09117307688
09117179751

If there is no answer, please contact support via SMS.

Unlimited access

For registered users

Money-back guarantee

If the description does not match the book

Support

From 7 a.m. to 10 p.m.

Download the book Deep Reinforcement Learning Hands-On

Deep Reinforcement Learning Hands-On

Book details

Deep Reinforcement Learning Hands-On

Edition: [2 ed.]
Authors:
Series:
ISBN: 9781838826994
Publisher: Packt Publishing
Year of publication: 2020
Number of pages:
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 on request)
File size: 22 MB

Book price (toman): 37,000



Rate this book

Average rating for this book:
Number of ratings: 8


If you would like the Deep Reinforcement Learning Hands-On file converted to PDF, EPUB, AZW3, MOBI, or DJVU format, let the support team know and they will convert the file for you.

Please note that Deep Reinforcement Learning Hands-On is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.


About the book (in the original language)



Table of contents

Cover
Copyright
Packt Page
Contributors
Table of Contents
Preface
Chapter 1: What Is Reinforcement Learning?
	Supervised learning
	Unsupervised learning
	Reinforcement learning
	RL's complications
	RL formalisms
		Reward
		The agent
		The environment
		Actions
		Observations
	The theoretical foundations of RL
		Markov decision processes
			The Markov process
			Markov reward processes
			Adding actions
		Policy
	Summary
Chapter 2: OpenAI Gym
	The anatomy of the agent
	Hardware and software requirements
	The OpenAI Gym API
		The action space
		The observation space
		The environment
		Creating an environment
		The CartPole session
	The random CartPole agent
	Extra Gym functionality – wrappers and monitors
		Wrappers
		Monitor
	Summary
Chapter 3: Deep Learning with PyTorch
	Tensors
		The creation of tensors
		Scalar tensors
		Tensor operations
		GPU tensors
	Gradients
		Tensors and gradients
	NN building blocks
	Custom layers
	The final glue – loss functions and optimizers
		Loss functions
		Optimizers
	Monitoring with TensorBoard
		TensorBoard 101
		Plotting stuff
	Example – GAN on Atari images
	PyTorch Ignite
		Ignite concepts
	Summary
Chapter 4: The Cross-Entropy Method
	The taxonomy of RL methods
	The cross-entropy method in practice
	The cross-entropy method on CartPole
	The cross-entropy method on FrozenLake
	The theoretical background of the cross-entropy method
	Summary
Chapter 5: Tabular Learning and the Bellman Equation
	Value, state, and optimality
	The Bellman equation of optimality
	The value of the action
	The value iteration method
	Value iteration in practice
	Q-learning for FrozenLake
	Summary
Chapter 6: Deep Q-Networks
	Real-life value iteration
	Tabular Q-learning
	Deep Q-learning
		Interaction with the environment
		SGD optimization
		Correlation between steps
		The Markov property
		The final form of DQN training
	DQN on Pong
		Wrappers
		The DQN model
		Training
		Running and performance
		Your model in action
	Things to try
	Summary
Chapter 7: Higher-Level RL Libraries
	Why RL libraries?
	The PTAN library
		Action selectors
		The agent
			DQNAgent
			PolicyAgent
		Experience source
			Toy environment
			The ExperienceSource class
			ExperienceSourceFirstLast
		Experience replay buffers
		The TargetNet class
		Ignite helpers
	The PTAN CartPole solver
	Other RL libraries
	Summary
Chapter 8: DQN Extensions
	Basic DQN
		Common library
		Implementation
		Results
	N-step DQN
		Implementation
		Results
	Double DQN
		Implementation
		Results
	Noisy networks
		Implementation
		Results
	Prioritized replay buffer
		Implementation
		Results
	Dueling DQN
		Implementation
		Results
	Categorical DQN
		Implementation
		Results
	Combining everything
		Results
	Summary
	References
Chapter 9: Ways to Speed up RL
	Why speed matters
	The baseline
	The computation graph in PyTorch
	Several environments
	Play and train in separate processes
	Tweaking wrappers
	Benchmark summary
	Going hardcore: CuLE
	Summary
	References
Chapter 10: Stocks Trading Using RL
	Trading
	Data
	Problem statements and key decisions
	The trading environment
	Models
	Training code
	Results
		The feed-forward model
		The convolution model
	Things to try
	Summary
Chapter 11: Policy Gradients – an Alternative
	Values and policy
		Why the policy?
		Policy representation
		Policy gradients
	The REINFORCE method
		The CartPole example
		Results
		Policy-based versus value-based methods
	REINFORCE issues
		Full episodes are required
		High gradients variance
		Exploration
		Correlation between samples
	Policy gradient methods on CartPole
		Implementation
		Results
	Policy gradient methods on Pong
		Implementation
		Results
	Summary
Chapter 12: The Actor-Critic Method
	Variance reduction
	CartPole variance
	Actor-critic
	A2C on Pong
	A2C on Pong results
	Tuning hyperparameters
		Learning rate
		Entropy beta
		Count of environments
		Batch size
	Summary
Chapter 13: Asynchronous Advantage Actor-Critic
	Correlation and sample efficiency
	Adding an extra A to A2C
	Multiprocessing in Python
	A3C with data parallelism
		Implementation
		Results
	A3C with gradients parallelism
		Implementation
		Results
	Summary
Chapter 14: Training Chatbots with RL
	An overview of chatbots
	Chatbot training
	The deep NLP basics
		RNNs
		Word embedding
		The Encoder-Decoder architecture
	Seq2seq training
		Log-likelihood training
		The bilingual evaluation understudy (BLEU) score
		RL in seq2seq
		Self-critical sequence training
	Chatbot example
		The example structure
		Modules: cornell.py and data.py
		BLEU score and utils.py
		Model
	Dataset exploration
	Training: cross-entropy
		Implementation
		Results
	Training: SCST
		Implementation
		Results
	Models tested on data
	Telegram bot
	Summary
Chapter 15: The TextWorld Environment
	Interactive fiction
	The environment
		Installation
		Game generation
		Observation and action spaces
		Extra game information
	Baseline DQN
		Observation preprocessing
		Embeddings and encoders
		The DQN model and the agent
		Training code
		Training results
	The command generation model
		Implementation
		Pretraining results
		DQN training code
		The result of DQN training
	Summary
Chapter 16: Web Navigation
	Web navigation
		Browser automation and RL
		The MiniWoB benchmark
	OpenAI Universe
		Installation
		Actions and observations
		Environment creation
		MiniWoB stability
	The simple clicking approach
		Grid actions
		Example overview
		The model
		The training code
		Starting containers
		The training process
		Checking the learned policy
		Issues with simple clicking
	Human demonstrations
		Recording the demonstrations
		The recording format
		Training using demonstrations
		Results
		The tic-tac-toe problem
	Adding text descriptions
		Implementation
		Results
	Things to try
	Summary
Chapter 17: Continuous Action Space
	Why a continuous space?
		The action space
		Environments
	The A2C method
		Implementation
		Results
		Using models and recording videos
	Deterministic policy gradients
		Exploration
		Implementation
		Results
		Recording videos
	Distributional policy gradients
		Architecture
		Implementation
		Results
		Video recordings
	Things to try
	Summary
Chapter 18: RL in Robotics
	Robots and robotics
		Robot complexities
		The hardware overview
		The platform
		The sensors
		The actuators
		The frame
	The first training objective
	The emulator and the model
		The model definition file
		The robot class
	DDPG training and results
	Controlling the hardware
		MicroPython
		Dealing with sensors
			The I2C bus
			Sensor initialization and reading
			Sensor classes and timer reading
			Observations
		Driving servos
		Moving the model to hardware
			The model export
			Benchmarks
		Combining everything
	Policy experiments
	Summary
Chapter 19: Trust Regions – PPO, TRPO, ACKTR, and SAC
	Roboschool
	The A2C baseline
		Implementation
		Results
		Video recording
	PPO
		Implementation
		Results
	TRPO
		Implementation
		Results
	ACKTR
		Implementation
		Results
	SAC
		Implementation
		Results
	Summary
Chapter 20: Black-Box Optimization in RL
	Black-box methods
	Evolution strategies
		ES on CartPole
			Results
		ES on HalfCheetah
			Implementation
			Results
	Genetic algorithms
		GA on CartPole
			Results
		GA tweaks
			Deep GA
			Novelty search
		GA on HalfCheetah
			Results
	Summary
	References
Chapter 21: Advanced Exploration
	Why exploration is important
	What's wrong with ε-greedy?
	Alternative ways of exploration
		Noisy networks
		Count-based methods
		Prediction-based methods
	MountainCar experiments
		The DQN method with ε-greedy
		The DQN method with noisy networks
		The DQN method with state counts
		The proximal policy optimization method
		The PPO method with noisy networks
		The PPO method with count-based exploration
		The PPO method with network distillation
	Atari experiments
		The DQN method with ε-greedy
		The classic PPO method
		The PPO method with network distillation
		The PPO method with noisy networks
	Summary
	References
Chapter 22: Beyond Model-Free – Imagination
	Model-based methods
		Model-based versus model-free
		Model imperfections
	The imagination-augmented agent
		The EM
		The rollout policy
		The rollout encoder
		The paper's results
	I2A on Atari Breakout
		The baseline A2C agent
		EM training
		The imagination agent
			The I2A model
			The Rollout encoder
			The training of I2A
	Experiment results
		The baseline agent
		Training EM weights
		Training with the I2A model
	Summary
	References
Chapter 23: AlphaGo Zero
	Board games
	The AlphaGo Zero method
		Overview
		MCTS
		Self-play
		Training and evaluation
	The Connect 4 bot
		The game model
		Implementing MCTS
		The model
		Training
		Testing and comparison
	Connect 4 results
	Summary
	References
Chapter 24: RL in Discrete Optimization
	RL's reputation
	The Rubik's Cube and combinatorial optimization
	Optimality and God's number
	Approaches to cube solving
		Data representation
		Actions
		States
	The training process
		The NN architecture
		The training
	The model application
	The paper's results
	The code outline
		Cube environments
		Training
		The search process
	The experiment results
		The 2×2 cube
		The 3×3 cube
	Further improvements and experiments
	Summary
Chapter 25: Multi-agent RL
	Multi-agent RL explained
		Forms of communication
		The RL approach
	The MAgent environment
		Installation
		An overview
		A random environment
	Deep Q-network for tigers
		Training and results
	Collaboration by the tigers
	Training both tigers and deer
	The battle between equal actors
	Summary
Other Books You May Enjoy
Index




User comments