Edition: 3
Author: Maxim Lapan
Series: EXPERT INSIGHT
ISBN: 9781835882702
Publisher: Packt
Year of publication: 2024
Number of pages: 0
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 on user request)
File size: 57 MB
If you need the book Deep Reinforcement Learning Hands-On converted to PDF, EPUB, AZW3, MOBI, or DJVU, let support know and they will convert the file for you.
Note that Deep Reinforcement Learning Hands-On is offered in its original language and is not a Persian translation. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
Preface
Why I wrote this book
The approach
Who this book is for
What this book covers
To get the most out of this book
Changes in the third edition
Part 1 Introduction to RL
What Is Reinforcement Learning?
Supervised learning
Unsupervised learning
Reinforcement learning
Complications in RL
RL formalisms
Reward
The agent
The environment
Actions
Observations
The theoretical foundations of RL
Markov decision processes
The Markov process
Markov reward processes
Adding actions to MDP
Policy
Summary
OpenAI Gym API and Gymnasium
The anatomy of the agent
Hardware and software requirements
The OpenAI Gym API and Gymnasium
The action space
The observation space
The environment
Creating an environment
The CartPole session
The random CartPole agent
Extra Gym API functionality
Wrappers
Rendering the environment
More wrappers
Summary
Deep Learning with PyTorch
Tensors
The creation of tensors
Scalar tensors
Tensor operations
GPU tensors
Gradients
Tensors and gradients
NN building blocks
Custom layers
Loss functions and optimizers
Loss functions
Optimizers
Monitoring with TensorBoard
TensorBoard 101
Plotting metrics
GAN on Atari images
PyTorch Ignite
Ignite concepts
GAN training on Atari using Ignite
Summary
The Cross-Entropy Method
The taxonomy of RL methods
The cross-entropy method in practice
The cross-entropy method on CartPole
The cross-entropy method on FrozenLake
The theoretical background of the cross-entropy method
Summary
Part 2 Value-based methods
Tabular Learning and the Bellman Equation
Value, state, and optimality
The Bellman equation of optimality
The value of the action
The value iteration method
Value iteration in practice
Q-iteration for FrozenLake
Summary
Deep Q-Networks
Real-life value iteration
Tabular Q-learning
Deep Q-learning
Interaction with the environment
SGD optimization
Correlation between steps
The Markov property
The final form of DQN training
DQN on Pong
Wrappers
The DQN model
Training
Running and performance
Your model in action
Things to try
Summary
Higher-Level RL Libraries
Why RL libraries?
The PTAN library
Action selectors
The agent
DQNAgent
PolicyAgent
Experience source
Toy environment
The ExperienceSource class
The ExperienceSourceFirstLast class
Experience replay buffers
The TargetNet class
Ignite helpers
The PTAN CartPole solver
Other RL libraries
Summary
DQN Extensions
Basic DQN
Common library
Implementation
Hyperparameter tuning
Results with common parameters
Tuned baseline DQN
N-step DQN
Implementation
Results
Hyperparameter tuning
Double DQN
Implementation
Results
Hyperparameter tuning
Noisy networks
Implementation
Results
Hyperparameter tuning
Prioritized replay buffer
Implementation
Results
Hyperparameter tuning
Dueling DQN
Implementation
Results
Hyperparameter tuning
Categorical DQN
Implementation
Results
Hyperparameter tuning
Combining everything
Results
Hyperparameter tuning
Summary
Ways to Speed Up RL
Why speed matters
Baseline
The computation graph in PyTorch
Several environments
Playing and training in separate processes
Tweaking wrappers
Benchmark results
Summary
Stocks Trading Using RL
Why trading?
Problem statement and key decisions
Data
The trading environment
Models
Training code
Results
The feed-forward model
The convolution model
Things to try
Summary
Part 3 Policy-based methods
Policy Gradients
Values and policy
Why the policy?
Policy representation
Policy gradients
The REINFORCE method
The CartPole example
Results
Policy-based versus value-based methods
REINFORCE issues
Full episodes are required
High gradient variance
Exploration problems
High correlation of samples
Policy gradient methods on CartPole
Implementation
Results
Policy gradient methods on Pong
Implementation
Results
Summary
Actor-Critic Method: A2C and A3C
Variance reduction
CartPole variance
Advantage actor-critic (A2C)
A2C on Pong
Results
Asynchronous Advantage Actor-Critic (A3C)
Correlation and sample efficiency
Adding an extra “A” to A2C
A3C with data parallelism
Results
A3C with gradient parallelism
Implementation
Results
Summary
The TextWorld Environment
Interactive fiction
The environment
Installation
Game generation
Observation and action spaces
Extra game information
The deep NLP basics
Recurrent Neural Networks (RNNs)
Word embedding
The Encoder-Decoder architecture
Transformers
Baseline DQN
Observation preprocessing
Embeddings and encoders
The DQN model and the agent
Training code
Training results
Tweaking observations
Tracking visited rooms
Relative actions
Objective in observation
Transformers
ChatGPT
Setup
Interactive mode
ChatGPT API
Summary
Web Navigation
The evolution of web navigation
Browser automation and RL
Challenges in browser automation
The MiniWoB benchmark
MiniWoB++
Installation
Actions and observations
Simple example
The simple clicking approach
Grid actions
The RL part of our implementation
The model and training code
Training results
Simple clicking limitations
Adding text description
Implementation
Results
Human demonstrations
Recording the demonstrations
Training with demonstrations
Results
Things to try
Summary
Part 4 Advanced RL
Continuous Action Space
Why a continuous space?
The action space
Environments
The A2C method
Implementation
Results
Using models and recording videos
Deep deterministic policy gradients
Exploration
Implementation
Results and video
Distributional policy gradients
Architecture
Implementation
Results
Things to try
Summary
Trust Region Methods
Environments
The A2C baseline
Implementation
Results
Video recording
PPO
Implementation
Results
TRPO
Implementation
Results
ACKTR
Implementation
Results
SAC
Implementation
Results
Overall results
Summary
Black-Box Optimizations in RL
Black-box methods
Evolution strategies
Implementing ES on CartPole
CartPole results
ES on HalfCheetah
Implementing ES on HalfCheetah
HalfCheetah results
Genetic algorithms
GA on CartPole
GA tweaks
Deep GA
Novelty search
GA on HalfCheetah
Implementation
Results
Summary
Advanced Exploration
Why exploration is important
What’s wrong with