Edition: 2
Authors: Sudharsan Ravichandiran
Series: Expert Insight
ISBN: 9781839210686
Publisher: Packt Publishing
Year of publication: 2020
Number of pages: 0
Language: English
File format: EPUB (can be converted to PDF or AZW3 at the user's request)
File size: 28 MB
Keywords for the book Deep Reinforcement Learning with Python, 2nd Edition: COM037000 - Computers / Machine Theory, COM004000 - Computers / Intelligence (AI) & Semantics, COM044000 - Computers / Neural Networks
If you would like the file of Deep Reinforcement Learning with Python, 2nd Edition converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and the file will be converted for you.
Please note that Deep Reinforcement Learning with Python, 2nd Edition is in its original language; it is not a Persian translation. The International Library website provides books in their original language only and does not offer any titles translated into or written in Persian.
Deep Reinforcement Learning with Python - Second Edition will help you learn reinforcement learning algorithms, techniques, and architectures – including deep reinforcement learning – from scratch. This new edition is an extensive update of the original, reflecting the latest thinking in reinforcement learning.
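To give a sense of the material the early chapters cover (the Gym toolkit, the Frozen Lake environment, epsilon-greedy exploration, and off-policy TD control with Q learning), here is a minimal tabular Q-learning sketch. It is not code from the book: the environment id, hyperparameters, and episode count are illustrative assumptions, and it targets the classic Gym API of the book's era (reset returning a state and step returning four values).

import gym
import numpy as np

# Frozen Lake: the small grid world used throughout the book's early chapters.
# Newer Gym releases register it as FrozenLake-v1; older releases used FrozenLake-v0.
env = gym.make("FrozenLake-v1")

# One Q value per (state, action) pair.
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount factor, exploration rate (illustrative)

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, info = env.step(action)
        # Off-policy TD (Q learning) update toward the greedy bootstrap target.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state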
Cover
Copyright
Packt Page
Contributors
Table of Contents
Preface
Chapter 1: Fundamentals of Reinforcement Learning
Key elements of RL Agent Environment State and action Reward The basic idea of RL The RL algorithm RL agent in the grid world How RL differs from other ML paradigms Markov Decision Processes The Markov property and Markov chain The Markov Reward Process The Markov Decision Process Fundamental concepts of RL Math essentials Expectation Action space Policy Deterministic policy Stochastic policy Episode Episodic and continuous tasks Horizon Return and discount factor Small discount factor Large discount factor What happens when we set the discount factor to 0? What happens when we set the discount factor to 1? The value function Q function Model-based and model-free learning Different types of environments Deterministic and stochastic environments Discrete and continuous environments Episodic and non-episodic environments Single and multi-agent environments Applications of RL RL glossary Summary Questions Further reading
Chapter 2: A Guide to the Gym Toolkit
Setting up our machine Installing Anaconda Installing the Gym toolkit Common error fixes Creating our first Gym environment Exploring the environment States Actions Transition probability and reward function Generating an episode in the Gym environment Action selection Generating an episode More Gym environments Classic control environments State space Action space Cart-Pole balancing with random policy Atari game environments General environment Deterministic environment No frame skipping State and action space An agent playing the Tennis game Recording the game Other environments Box2D MuJoCo Robotics Toy text Algorithms Environment synopsis Summary Questions Further reading
Chapter 3: The Bellman Equation and Dynamic Programming
The Bellman equation The Bellman equation of the value function The Bellman equation of the Q function The Bellman optimality equation The relationship between the value and Q functions Dynamic programming Value iteration The value iteration algorithm Solving the Frozen Lake problem with value iteration Policy iteration Algorithm – policy iteration Solving the Frozen Lake problem with policy iteration Is DP applicable to all environments? Summary Questions
Chapter 4: Monte Carlo Methods
Understanding the Monte Carlo method Prediction and control tasks Prediction task Control task Monte Carlo prediction MC prediction algorithm Types of MC prediction First-visit Monte Carlo Every-visit Monte Carlo Implementing the Monte Carlo prediction method Understanding the blackjack game The blackjack environment in the Gym library Every-visit MC prediction with the blackjack game First-visit MC prediction with the blackjack game Incremental mean updates MC prediction (Q function) Monte Carlo control MC control algorithm On-policy Monte Carlo control Monte Carlo exploring starts Monte Carlo with the epsilon-greedy policy Implementing on-policy MC control Off-policy Monte Carlo control Is the MC method applicable to all tasks? Summary Questions
Chapter 5: Understanding Temporal Difference Learning
TD learning TD prediction TD prediction algorithm Predicting the value of states in the Frozen Lake environment TD control On-policy TD control – SARSA Computing the optimal policy using SARSA Off-policy TD control – Q learning Computing the optimal policy using Q learning The difference between Q learning and SARSA Comparing the DP, MC, and TD methods Summary Questions Further reading
Chapter 6: Case Study – The MAB Problem
The MAB problem Creating a bandit in the Gym Exploration strategies Epsilon-greedy Softmax exploration Upper confidence bound Thompson sampling Applications of MAB Finding the best advertisement banner using bandits Creating a dataset Initialize the variables Define the epsilon-greedy method Run the bandit test Contextual bandits Summary Questions Further reading
Chapter 7: Deep Learning Foundations
Biological and artificial neurons ANN and its layers Input layer Hidden layer Output layer Exploring activation functions The sigmoid function The tanh function The Rectified Linear Unit function The softmax function Forward propagation in ANNs How does an ANN learn? Putting it all together Building a neural network from scratch Recurrent Neural Networks The difference between feedforward networks and RNNs Forward propagation in RNNs Backpropagating through time LSTM to the rescue Understanding the LSTM cell What are CNNs? Convolutional layers Strides Padding Pooling layers Fully connected layers The architecture of CNNs Generative adversarial networks Breaking down the generator Breaking down the discriminator How do they learn, though? Architecture of a GAN Demystifying the loss function Discriminator loss Generator loss Total loss Summary Questions Further reading
Chapter 8: A Primer on TensorFlow
What is TensorFlow? Understanding computational graphs and sessions Sessions Variables, constants, and placeholders Variables Constants Placeholders and feed dictionaries Introducing TensorBoard Creating a name scope Handwritten digit classification using TensorFlow Importing the required libraries Loading the dataset Defining the number of neurons in each layer Defining placeholders Forward propagation Computing loss and backpropagation Computing accuracy Creating a summary Training the model Visualizing graphs in TensorBoard Introducing eager execution Math operations in TensorFlow TensorFlow 2.0 and Keras Bonjour Keras Defining the model Compiling the model Training the model Evaluating the model MNIST digit classification using TensorFlow 2.0 Summary Questions Further reading
Chapter 9: Deep Q Network and Its Variants
What is DQN? Understanding DQN Replay buffer Loss function Target network Putting it all together The DQN algorithm Playing Atari games using DQN Architecture of the DQN Getting hands-on with the DQN Preprocess the game screen Defining the DQN class Training the DQN The double DQN The double DQN algorithm DQN with prioritized experience replay Types of prioritization Proportional prioritization Rank-based prioritization Correcting the bias The dueling DQN Understanding the dueling DQN The architecture of a dueling DQN The deep recurrent Q network The architecture of a DRQN Summary Questions Further reading
Chapter 10: Policy Gradient Method
Why policy-based methods? Policy gradient intuition Understanding the policy gradient Deriving the policy gradient Algorithm – policy gradient Variance reduction methods Policy gradient with reward-to-go Algorithm – Reward-to-go policy gradient Cart pole balancing with policy gradient Computing discounted and normalized reward Building the policy network Training the network Policy gradient with baseline Algorithm – REINFORCE with baseline Summary Questions Further reading
Chapter 11: Actor-Critic Methods – A2C and A3C
Overview of the actor-critic method Understanding the actor-critic method The actor-critic algorithm Advantage actor-critic (A2C) Asynchronous advantage actor-critic (A3C) The three As The architecture of A3C Mountain car climbing using A3C Creating the mountain car environment Defining the variables Defining the actor-critic class Defining the worker class Training the network Visualizing the computational graph A2C revisited Summary Questions Further reading
Chapter 12: Learning DDPG, TD3, and SAC
Deep deterministic policy gradient An overview of DDPG Actor Critic DDPG components Critic network Actor network Putting it all together Algorithm – DDPG Swinging up a pendulum using DDPG Creating the Gym environment Defining the variables Defining the DDPG class Training the network Twin delayed DDPG Key features of TD3 Clipped double Q learning Delayed policy updates Target policy smoothing Putting it all together Algorithm – TD3 Soft actor-critic Understanding soft actor-critic V and Q functions with the entropy term Components of SAC Critic network Actor network Putting it all together Algorithm – SAC Summary Questions Further reading
Chapter 13: TRPO, PPO, and ACKTR Methods
Trust region policy optimization Math essentials The Taylor series The trust region method The conjugate gradient method Lagrange multipliers Importance sampling Designing the TRPO objective function Parameterizing the policies Sample-based estimation Solving the TRPO objective function Computing the search direction Performing a line search in the search direction Algorithm – TRPO Proximal policy optimization PPO with a clipped objective Algorithm – PPO-clipped Implementing the PPO-clipped method Creating the Gym environment Defining the PPO class Training the network PPO with a penalized objective Algorithm – PPO-penalty Actor-critic using Kronecker-factored trust region Math essentials Block matrix Block diagonal matrix The Kronecker product The vec operator Properties of the Kronecker product Kronecker-Factored Approximate Curvature (K-FAC) K-FAC in actor-critic Incorporating the trust region Summary Questions Further reading
Chapter 14: Distributional Reinforcement Learning
Why distributional reinforcement learning? Categorical DQN Predicting the value distribution Selecting an action based on the value distribution Training the categorical DQN Projection step Putting it all together Algorithm – categorical DQN Playing Atari games using a categorical DQN Defining the variables Defining the replay buffer Defining the categorical DQN class Quantile Regression DQN Math essentials Quantile Inverse CDF (quantile function) Understanding QR-DQN Action selection Loss function Distributed Distributional DDPG Critic network Actor network Algorithm – D4PG Summary Questions Further reading
Chapter 15: Imitation Learning and Inverse RL
Supervised imitation learning DAgger Understanding DAgger Algorithm – DAgger Deep Q learning from demonstrations Phases of DQfD Pre-training phase Training phase Loss function of DQfD Algorithm – DQfD Inverse reinforcement learning Maximum entropy IRL Key terms Back to maximum entropy IRL Computing the gradient Algorithm – maximum entropy IRL Generative adversarial imitation learning Formulation of GAIL Summary Questions Further reading
Chapter 16: Deep Reinforcement Learning with Stable Baselines
Installing Stable Baselines Creating our first agent with Stable Baselines Evaluating the trained agent Storing and loading the trained agent Viewing the trained agent Putting it all together Vectorized environments SubprocVecEnv DummyVecEnv Integrating custom environments Playing Atari games with a DQN and its variants Implementing DQN variants Lunar lander using A2C Creating a custom network Swinging up a pendulum using DDPG Viewing the computational graph in TensorBoard Training an agent to walk using TRPO Installing the MuJoCo environment Implementing TRPO Recording the video Training a cheetah bot to run using PPO Making a GIF of a trained agent Implementing GAIL Summary Questions Further reading
Chapter 17: Reinforcement Learning Frontiers
Meta reinforcement learning Model-agnostic meta learning Understanding MAML MAML in a supervised learning setting MAML in a reinforcement learning setting Hierarchical reinforcement learning MAXQ value function Decomposition Imagination augmented agents Summary Questions Further reading
Appendix 1 – Reinforcement Learning Algorithms
Reinforcement learning algorithm Value Iteration Policy Iteration First-Visit MC Prediction Every-Visit MC Prediction MC Prediction – the Q Function MC Control Method On-Policy MC Control – Exploring starts On-Policy MC Control – Epsilon-Greedy Off-Policy MC Control TD Prediction On-Policy TD Control – SARSA Off-Policy TD Control – Q Learning Deep Q Learning Double DQN REINFORCE Policy Gradient Policy Gradient with Reward-To-Go REINFORCE with Baseline Advantage Actor Critic Asynchronous Advantage Actor-Critic Deep Deterministic Policy Gradient Twin Delayed DDPG Soft Actor-Critic Trust Region Policy Optimization PPO-Clipped PPO-Penalty Categorical DQN Distributed Distributional DDPG DAgger Deep Q learning from demonstrations MaxEnt Inverse Reinforcement Learning MAML in Reinforcement Learning
Appendix 2 – Assessments
Chapter 1 – Fundamentals of Reinforcement Learning Chapter 2 – A Guide to the Gym Toolkit Chapter 3 – Bellman Equation and Dynamic Programming Chapter 4 – Monte Carlo Methods Chapter 5 – Understanding Temporal Difference Learning Chapter 6 – Case Study – The MAB Problem Chapter 7 – Deep Learning Foundations Chapter 8 – A Primer on TensorFlow Chapter 9 – Deep Q Network and Its Variants Chapter 10 – Policy Gradient Method Chapter 11 – Actor-Critic Methods – A2C and A3C Chapter 12 – Learning DDPG, TD3, and SAC Chapter 13 – TRPO, PPO, and ACKTR Methods Chapter 14 – Distributional Reinforcement Learning Chapter 15 – Imitation Learning and Inverse RL Chapter 16 – Deep Reinforcement Learning with Stable Baselines Chapter 17 – Reinforcement Learning Frontiers
Other Books You May Enjoy
Index
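Chapter 16 in the contents above walks through training agents with the Stable Baselines library. As a taste of that workflow, here is a minimal sketch of training, saving, and reloading a DQN agent. It is not code from the book: the book targets the original TensorFlow-based Stable Baselines, while this sketch substitutes its maintained successor, stable-baselines3, so the import path differs, and the environment id and step count are illustrative assumptions.

from stable_baselines3 import DQN

# Create a DQN agent with a multilayer-perceptron policy on a simple control task.
model = DQN("MlpPolicy", "CartPole-v1", verbose=1)

# Train for a small number of environment steps (illustrative only).
model.learn(total_timesteps=10_000)

# Store the trained agent, then load it back for evaluation or further training.
model.save("dqn_cartpole")
model = DQN.load("dqn_cartpole")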