Download: Deep Reinforcement Learning Hands-On

Book details

Edition: 3
Authors:
Series: Expert Insight
ISBN: 9781835882702
Publisher: Packt
Year of publication: 2024
Number of pages: not listed
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 on request)
File size: 57 MB

Price (toman): 83,000




If you need the file for Deep Reinforcement Learning Hands-On converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.

Note that Deep Reinforcement Learning Hands-On is the original-language edition and has not been translated into Persian. The International Library website offers original-language books only and does not offer any books translated into or written in Persian.



Table of contents

Preface
   Why I wrote this book
   The approach
   Who this book is for
   What this book covers
   To get the most out of this book
   Changes in the third edition
Part 1 Introduction to RL
What Is Reinforcement Learning?
   Supervised learning
   Unsupervised learning
   Reinforcement learning
   Complications in RL
   RL formalisms
      Reward
      The agent
      The environment
      Actions
      Observations
   The theoretical foundations of RL
      Markov decision processes
         The Markov process
         Markov reward processes
         Adding actions to MDP
      Policy
   Summary
OpenAI Gym API and Gymnasium
   The anatomy of the agent
   Hardware and software requirements
   The OpenAI Gym API and Gymnasium
      The action space
      The observation space
      The environment
      Creating an environment
      The CartPole session
   The random CartPole agent
   Extra Gym API functionality
      Wrappers
      Rendering the environment
      More wrappers
   Summary
Deep Learning with PyTorch
   Tensors
   The creation of tensors
      Scalar tensors
      Tensor operations
      GPU tensors
   Gradients
      Tensors and gradients
   NN building blocks
   Custom layers
   Loss functions and optimizers
      Loss functions
      Optimizers
   Monitoring with TensorBoard
      TensorBoard 101
      Plotting metrics
   GAN on Atari images
   PyTorch Ignite
      Ignite concepts
      GAN training on Atari using Ignite
   Summary
The Cross-Entropy Method
   The taxonomy of RL methods
   The cross-entropy method in practice
   The cross-entropy method on CartPole
   The cross-entropy method on FrozenLake
   The theoretical background of the cross-entropy method
   Summary
Part 2 Value-based methods
Tabular Learning and the Bellman Equation
   Value, state, and optimality
   The Bellman equation of optimality
   The value of the action
   The value iteration method
   Value iteration in practice
   Q-iteration for FrozenLake
   Summary
Deep Q-Networks
   Real-life value iteration
   Tabular Q-learning
   Deep Q-learning
      Interaction with the environment
      SGD optimization
      Correlation between steps
      The Markov property
      The final form of DQN training
   DQN on Pong
      Wrappers
      The DQN model
      Training
      Running and performance
      Your model in action
   Things to try
   Summary
Higher-Level RL Libraries
   Why RL libraries?
   The PTAN library
      Action selectors
      The agent
         DQNAgent
         PolicyAgent
      Experience source
         Toy environment
         The ExperienceSource class
         The ExperienceSourceFirstLast class
      Experience replay buffers
      The TargetNet class
      Ignite helpers
   The PTAN CartPole solver
   Other RL libraries
   Summary
DQN Extensions
   Basic DQN
      Common library
      Implementation
      Hyperparameter tuning
      Results with common parameters
      Tuned baseline DQN
   N-step DQN
      Implementation
      Results
      Hyperparameter tuning
   Double DQN
      Implementation
      Results
      Hyperparameter tuning
   Noisy networks
      Implementation
      Results
      Hyperparameter tuning
   Prioritized replay buffer
      Implementation
      Results
      Hyperparameter tuning
   Dueling DQN
      Implementation
      Results
      Hyperparameter tuning
   Categorical DQN
      Implementation
      Results
      Hyperparameter tuning
   Combining everything
      Results
      Hyperparameter tuning
   Summary
Ways to Speed Up RL
   Why speed matters
   Baseline
   The computation graph in PyTorch
   Several environments
   Playing and training in separate processes
   Tweaking wrappers
   Benchmark results
   Summary
Stocks Trading Using RL
   Why trading?
   Problem statement and key decisions
   Data
   The trading environment
   Models
   Training code
   Results
      The feed-forward model
      The convolution model
   Things to try
   Summary
Part 3 Policy-based methods
Policy Gradients
   Values and policy
      Why the policy?
      Policy representation
      Policy gradients
   The REINFORCE method
      The CartPole example
      Results
      Policy-based versus value-based methods
   REINFORCE issues
      Full episodes are required
      High gradient variance
      Exploration problems
      High correlation of samples
   Policy gradient methods on CartPole
      Implementation
      Results
   Policy gradient methods on Pong
      Implementation
      Results
   Summary
Actor-Critic Method: A2C and A3C
   Variance reduction
   CartPole variance
   Advantage actor-critic (A2C)
      A2C on Pong
      Results
   Asynchronous Advantage Actor-Critic (A3C)
      Correlation and sample efficiency
      Adding an extra “A” to A2C
      A3C with data parallelism
         Results
      A3C with gradient parallelism
         Implementation
         Results
   Summary
The TextWorld Environment
   Interactive fiction
   The environment
      Installation
      Game generation
      Observation and action spaces
      Extra game information
   The deep NLP basics
      Recurrent Neural Networks (RNNs)
      Word embedding
      The Encoder-Decoder architecture
      Transformers
   Baseline DQN
      Observation preprocessing
      Embeddings and encoders
      The DQN model and the agent
      Training code
      Training results
   Tweaking observations
      Tracking visited rooms
      Relative actions
      Objective in observation
   Transformers
   ChatGPT
      Setup
      Interactive mode
      ChatGPT API
   Summary
Web Navigation
   The evolution of web navigation
   Browser automation and RL
   Challenges in browser automation
   The MiniWoB benchmark
   MiniWoB++
      Installation
      Actions and observations
      Simple example
   The simple clicking approach
      Grid actions
      The RL part of our implementation
      The model and training code
      Training results
      Simple clicking limitations
   Adding text description
      Implementation
      Results
   Human demonstrations
      Recording the demonstrations
      Training with demonstrations
      Results
   Things to try
   Summary
Part 4 Advanced RL
Continuous Action Space
   Why a continuous space?
      The action space
      Environments
   The A2C method
      Implementation
      Results
      Using models and recording videos
   Deep deterministic policy gradients
      Exploration
      Implementation
      Results and video
   Distributional policy gradients
      Architecture
      Implementation
      Results
   Things to try
   Summary
Trust Region Methods
   Environments
   The A2C baseline
      Implementation
      Results
      Video recording
   PPO
      Implementation
      Results
   TRPO
      Implementation
      Results
   ACKTR
      Implementation
      Results
   SAC
      Implementation
      Results
   Overall results
   Summary
Black-Box Optimizations in RL
   Black-box methods
   Evolution strategies
      Implementing ES on CartPole
      CartPole results
      ES on HalfCheetah
      Implementing ES on HalfCheetah
      HalfCheetah results
   Genetic algorithms
      GA on CartPole
      GA tweaks
         Deep GA
         Novelty search
      GA on HalfCheetah
         Implementation
         Results
   Summary
Advanced Exploration
   Why exploration is important
   What’s wrong with 



