Edition: [2 ed.]
Author(s): Maxim Lapan
Series:
ISBN: 9781838826994
Publisher: Packt Publishing
Publication year: 2020
Number of pages:
Language: English
File format: EPUB (can be converted to PDF, EPUB, or AZW3 on request)
File size: 22 MB
If you would like the book Deep Reinforcement Learning Hands-On converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert the file for you.
Please note that Deep Reinforcement Learning Hands-On is the original English-language edition, not a Persian translation. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
Table of contents:
Cover; Copyright; Packt Page; Contributors; Table of Contents; Preface
Chapter 1: What Is Reinforcement Learning? – Supervised learning; Unsupervised learning; Reinforcement learning; RL's complications; RL formalisms; Reward; The agent; The environment; Actions; Observations; The theoretical foundations of RL; Markov decision processes; The Markov process; Markov reward processes; Adding actions; Policy; Summary
Chapter 2: OpenAI Gym – The anatomy of the agent; Hardware and software requirements; The OpenAI Gym API; The action space; The observation space; The environment; Creating an environment; The CartPole session; The random CartPole agent; Extra Gym functionality – wrappers and monitors; Wrappers; Monitor; Summary
Chapter 3: Deep Learning with PyTorch – Tensors; The creation of tensors; Scalar tensors; Tensor operations; GPU tensors; Gradients; Tensors and gradients; NN building blocks; Custom layers; The final glue – loss functions and optimizers; Loss functions; Optimizers; Monitoring with TensorBoard; TensorBoard 101; Plotting stuff; Example – GAN on Atari images; PyTorch Ignite; Ignite concepts; Summary
Chapter 4: The Cross-Entropy Method – The taxonomy of RL methods; The cross-entropy method in practice; The cross-entropy method on CartPole; The cross-entropy method on FrozenLake; The theoretical background of the cross-entropy method; Summary
Chapter 5: Tabular Learning and the Bellman Equation – Value, state, and optimality; The Bellman equation of optimality; The value of the action; The value iteration method; Value iteration in practice; Q-learning for FrozenLake; Summary
Chapter 6: Deep Q-Networks – Real-life value iteration; Tabular Q-learning; Deep Q-learning; Interaction with the environment; SGD optimization; Correlation between steps; The Markov property; The final form of DQN training; DQN on Pong; Wrappers; The DQN model; Training; Running and performance; Your model in action; Things to try; Summary
Chapter 7: Higher-Level RL Libraries – Why RL libraries?; The PTAN library; Action selectors; The agent; DQNAgent; PolicyAgent; Experience source; Toy environment; The ExperienceSource class; ExperienceSourceFirstLast; Experience replay buffers; The TargetNet class; Ignite helpers; The PTAN CartPole solver; Other RL libraries; Summary
Chapter 8: DQN Extensions – Basic DQN; Common library; Implementation; Results; N-step DQN; Implementation; Results; Double DQN; Implementation; Results; Noisy networks; Implementation; Results; Prioritized replay buffer; Implementation; Results; Dueling DQN; Implementation; Results; Categorical DQN; Implementation; Results; Combining everything; Results; Summary; References
Chapter 9: Ways to Speed up RL – Why speed matters; The baseline; The computation graph in PyTorch; Several environments; Play and train in separate processes; Tweaking wrappers; Benchmark summary; Going hardcore: CuLE; Summary; References
Chapter 10: Stocks Trading Using RL – Trading; Data; Problem statements and key decisions; The trading environment; Models; Training code; Results; The feed-forward model; The convolution model; Things to try; Summary
Chapter 11: Policy Gradients – an Alternative – Values and policy; Why the policy?; Policy representation; Policy gradients; The REINFORCE method; The CartPole example; Results; Policy-based versus value-based methods; REINFORCE issues; Full episodes are required; High gradients variance; Exploration; Correlation between samples; Policy gradient methods on CartPole; Implementation; Results; Policy gradient methods on Pong; Implementation; Results; Summary
Chapter 12: The Actor-Critic Method – Variance reduction; CartPole variance; Actor-critic; A2C on Pong; A2C on Pong results; Tuning hyperparameters; Learning rate; Entropy beta; Count of environments; Batch size; Summary
Chapter 13: Asynchronous Advantage Actor-Critic – Correlation and sample efficiency; Adding an extra A to A2C; Multiprocessing in Python; A3C with data parallelism; Implementation; Results; A3C with gradients parallelism; Implementation; Results; Summary
Chapter 14: Training Chatbots with RL – An overview of chatbots; Chatbot training; The deep NLP basics; RNNs; Word embedding; The Encoder-Decoder architecture; Seq2seq training; Log-likelihood training; The bilingual evaluation understudy (BLEU) score; RL in seq2seq; Self-critical sequence training; Chatbot example; The example structure; Modules: cornell.py and data.py; BLEU score and utils.py; Model; Dataset exploration; Training: cross-entropy; Implementation; Results; Training: SCST; Implementation; Results; Models tested on data; Telegram bot; Summary
Chapter 15: The TextWorld Environment – Interactive fiction; The environment; Installation; Game generation; Observation and action spaces; Extra game information; Baseline DQN; Observation preprocessing; Embeddings and encoders; The DQN model and the agent; Training code; Training results; The command generation model; Implementation; Pretraining results; DQN training code; The result of DQN training; Summary
Chapter 16: Web Navigation – Web navigation; Browser automation and RL; The MiniWoB benchmark; OpenAI Universe; Installation; Actions and observations; Environment creation; MiniWoB stability; The simple clicking approach; Grid actions; Example overview; The model; The training code; Starting containers; The training process; Checking the learned policy; Issues with simple clicking; Human demonstrations; Recording the demonstrations; The recording format; Training using demonstrations; Results; The tic-tac-toe problem; Adding text descriptions; Implementation; Results; Things to try; Summary
Chapter 17: Continuous Action Space – Why a continuous space?; The action space; Environments; The A2C method; Implementation; Results; Using models and recording videos; Deterministic policy gradients; Exploration; Implementation; Results; Recording videos; Distributional policy gradients; Architecture; Implementation; Results; Video recordings; Things to try; Summary
Chapter 18: RL in Robotics – Robots and robotics; Robot complexities; The hardware overview; The platform; The sensors; The actuators; The frame; The first training objective; The emulator and the model; The model definition file; The robot class; DDPG training and results; Controlling the hardware; MicroPython; Dealing with sensors; The I2C bus; Sensor initialization and reading; Sensor classes and timer reading; Observations; Driving servos; Moving the model to hardware; The model export; Benchmarks; Combining everything; Policy experiments; Summary
Chapter 19: Trust Regions – PPO, TRPO, ACKTR, and SAC – Roboschool; The A2C baseline; Implementation; Results; Video recording; PPO; Implementation; Results; TRPO; Implementation; Results; ACKTR; Implementation; Results; SAC; Implementation; Results; Summary
Chapter 20: Black-Box Optimization in RL – Black-box methods; Evolution strategies; ES on CartPole; Results; ES on HalfCheetah; Implementation; Results; Genetic algorithms; GA on CartPole; Results; GA tweaks; Deep GA; Novelty search; GA on HalfCheetah; Results; Summary; References
Chapter 21: Advanced Exploration – Why exploration is important; What's wrong with ε-greedy?; Alternative ways of exploration; Noisy networks; Count-based methods; Prediction-based methods; MountainCar experiments; The DQN method with ε-greedy; The DQN method with noisy networks; The DQN method with state counts; The proximal policy optimization method; The PPO method with noisy networks; The PPO method with count-based exploration; The PPO method with network distillation; Atari experiments; The DQN method with ε-greedy; The classic PPO method; The PPO method with network distillation; The PPO method with noisy networks; Summary; References
Chapter 22: Beyond Model-Free – Imagination – Model-based methods; Model-based versus model-free; Model imperfections; The imagination-augmented agent; The EM; The rollout policy; The rollout encoder; The paper's results; I2A on Atari Breakout; The baseline A2C agent; EM training; The imagination agent; The I2A model; The Rollout encoder; The training of I2A; Experiment results; The baseline agent; Training EM weights; Training with the I2A model; Summary; References
Chapter 23: AlphaGo Zero – Board games; The AlphaGo Zero method; Overview; MCTS; Self-play; Training and evaluation; The Connect 4 bot; The game model; Implementing MCTS; The model; Training; Testing and comparison; Connect 4 results; Summary; References
Chapter 24: RL in Discrete Optimization – RL's reputation; The Rubik's Cube and combinatorial optimization; Optimality and God's number; Approaches to cube solving; Data representation; Actions; States; The training process; The NN architecture; The training; The model application; The paper's results; The code outline; Cube environments; Training; The search process; The experiment results; The 2×2 cube; The 3×3 cube; Further improvements and experiments; Summary
Chapter 25: Multi-agent RL – Multi-agent RL explained; Forms of communication; The RL approach; The MAgent environment; Installation; An overview; A random environment; Deep Q-network for tigers; Training and results; Collaboration by the tigers; Training both tigers and deer; The battle between equal actors; Summary
Other Books You May Enjoy; Index