Edition:
Author(s): Sean Meyn
Series:
ISBN: 9781316511961, 9781009051873
Publisher: Cambridge University Press
Year of publication: 2022
Number of pages: 453
Language: English
File format: PDF (converted to EPUB or AZW3 at the user's request)
File size: 14 MB
If you would like the file of Control Systems and Reinforcement Learning converted to PDF, EPUB, AZW3, MOBI, or DJVU format, please notify support and the file will be converted for you.
Please note that Control Systems and Reinforcement Learning is the original-language edition; it has not been translated into Persian. The International Library website offers only original-language books and does not provide any books translated into or written in Persian.
A how-to guide and scientific tutorial covering the universe of reinforcement learning and control theory for online decision making.
Control Systems and Reinforcement Learning

Acknowledgments
Introduction

Part I
2 Control Crash Course
  2.1.1 Failure of an Adaptive Control System
  2.3.1 Sufficient Statistics and the Nonlinear State Space Model
  2.3.2 State Augmentation and Learning
  2.3.3 Linear State Space Model
  2.3.4 A Nod to Newton and Leibniz
  2.4.1 Total Cost
  2.4.2 Stability of Equilibria
  2.4.3 Lyapunov Functions
  2.4.4 Technical Proofs
  2.4.5 Geometry in Continuous Time
  2.4.6 Linear State Space Models
  2.5.1 Actors and Critics
  2.5.2 Temporal Differences
  2.5.3 Bandits and Exploration
  2.7.1 Wall Street
  2.7.2 Mountain Car
  2.7.3 MagBall
  2.7.4 CartPole
  2.7.5 Pendubot and Acrobot
  2.7.6 Cooperative Rowing
3 Optimal Control
  3.2.1 Value Iteration
  3.2.2 Policy Improvement
  3.2.3 Perron-Frobenius Theory: A Gentle Introduction*
  3.3.1 Discounted Cost
  3.3.2 Shortest Path Problem
  3.3.3 Finite Horizon
  3.3.4 Model Predictive Control
  3.9.1 Mountain Car
  3.9.2 Spiders and Flies
  3.9.3 Contention for Resources and Instability
  3.9.4 Solving the HJB Equation
4 ODE Methods for Algorithm Design
  4.2.1 Euler Approximation for a Linear System
  4.4.1 Role of Convexity
  4.4.2 Polyak-Lojasiewicz Condition
  4.4.3 Euler Approximation
  4.4.4 Constrained Optimization
  4.5.1 Quasi Monte-Carlo
  4.5.2 System Identification
  4.5.3 Approximate Policy Improvement
  4.5.4 A Brief Tour of QSA Theory
  4.5.5 Constant Gain Algorithm
  4.5.6 Zap QSA
  4.6.1 Simulated Annealing
  4.6.2 A Menu of Algorithms
  4.7.1 Mountain Car
  4.7.2 LQR
  4.7.3 What about High Dimensions?
  4.8.1 Gronwall's Inequality
  4.8.2 Lyapunov Functions
  4.8.3 Gradient Flows
  4.8.4 The ODE at ∞
  4.9.1 Main Results and Some Insights
  4.9.2 ODE Solidarity
  4.9.3 Criteria for Stability
  4.9.4 Deterministic Markovian Model
  4.9.5 Convergence Rates
  4.11.1 ODE Methods for Algorithm Design
  4.11.2 Optimization
  4.11.3 QSA
  4.11.4 SGD and Extremum Seeking Control
5 Value Function Approximations
  5.1.1 Function Approximation Based on Training Data
  5.1.2 Linear Function Approximation
  5.1.3 Neural Networks
  5.1.4 Kernels
  5.1.5 Are We Done Yet?
  5.3.1 Fixed-Policy Temporal Difference
  5.3.2 Least Squares and Linear Regression
  5.3.3 Recursive LSTD and Zap
  5.4.1 Galerkin Relaxations and Projection
  5.4.2 TD(λ)-Learning
  5.4.3 Projected Bellman Operator and Q-Learning
  5.4.4 GQ-Learning
  5.4.5 Batch Methods and DQN
  5.5.1 Convex Q-Learning with a Finite-Dimensional Function Class
  5.5.2 BCQL and Kernel Methods
  5.9.1 Machine Learning
  5.9.2 TD-Learning
  5.9.3 Q-Learning

Part II
6 Markov Chains
  6.1.1 Notation and Conventions
  6.3.1 Example
  6.4.1 Critic Methods
  6.4.2 Actor Methods
  6.6.1 Average Cost
  6.6.2 Discounted Cost
  6.7.1 Asymptotic Statistics Made Finite
  6.7.2 Asymptotic Variance and Mixing Time
  6.7.3 Sample Complexity
  6.7.4 A Simple Example?
  6.7.5 Combating Variance by Design
  6.9.1 Classification
  6.9.2 Lyapunov Theory
7 Stochastic Control
  7.1.1 Admissible Inputs
  7.3.1 Fluid Models and Value Functions
  7.4.1 Fluid Model
  7.4.2 Computation and Solidarity
  7.4.3 Solidarity
  7.5.1 Fluid Model Dynamics
  7.5.2 DP Equations
  7.5.3 Partial Observations
  7.7.1 Control with Partial Observations
  7.8.1 Bandit Models
  7.8.2 Bayesian Bandits
  7.8.3 Naive Optimism Can Be Successful
8 Stochastic Approximation
  8.2.1 ODE Design
  8.2.2 ODE Approximation
  8.2.3 Choice of Step-Size
  8.2.4 Multiple Time Scales
  8.2.5 Algorithm Performance
  8.2.6 Asymptotic versus Transient Performance
  8.3.1 Monte Carlo
  8.3.2 Stochastic Gradient Descent
  8.3.3 Empirical Risk Minimization
  8.4.1 Gain Selection
  8.4.2 Variance Formulae
  8.4.3 Simulations
  8.5.1 Approximating the Newton-Raphson Flow
  8.5.2 Zap Zero
  8.5.3 Stochastic Newton-Raphson
  8.6.1 Curse of Condition Number
  8.6.2 Curse of Markovian Memory
  8.7.1 Stability and Convergence
  8.7.2 Linearization and Convergence Rates
  8.7.3 Polyak-Ruppert Averaging
  8.9.1 SA and RL
  8.9.2 Stability
  8.9.3 Asymptotic Statistics
  8.9.4 Less Asymptotic Statistics
9 Temporal Difference Methods
  9.1.1 Fixed-Policy Value Functions and DP Equations
  9.1.2 PIA and the Q-Function
  PIA for Discounted Cost
  9.1.3 Advantage Function
  9.2.1 Conditional Expectation and Projection
  9.2.2 Linear Independence
  9.3.1 Mean-Square Bellman Error
  9.3.2 Mean-Square Value Function Error
  9.3.3 Projected Bellman Error
  9.4.1 Linear Function Class
  9.4.2 Nonlinear Parameterizations
  9.5.1 Exploration
  9.5.2 On and Off Algorithms
  9.5.3 Relative TD(λ)
  9.5.4 TD(λ) for Advantage
  9.6.1 Optimal Control Essentials
  9.6.2 Watkins's Algorithms
  9.6.3 Exploration
  9.6.4 ODE Analysis
  9.6.5 Variance Matters
  9.7.1 Gain Selection
  9.7.2 Honest Conclusions
  9.8.1 GQ-Learning
  9.8.2 Zap Q-Learning
  9.9.1 Advantage Function
  9.9.2 TD Stability Theory
  9.11.1 Temporal Difference Methods
  9.11.2 Q-Learning
  9.11.3 GQ and Zap
  9.11.4 Convex Q-Learning
10 Setting the Stage, Return of the Actors
  10.1.1 Linear Operators and Adjoints
  10.1.2 Adjoints and Eligibility Vectors
  10.1.3 Weighted Norms and Weighted Eligibility
  10.2.1 Projection of Advantage and Value
  10.2.2 Weighted Norm
  10.4.1 Every Other Criterion
  10.4.2 Average-Cost Algorithms
  10.5.1 Actor-Critic for Average Cost
  10.5.2 A Few Warnings and Remedies
  10.7.1 Variance Reduction through Advantage
  10.7.2 A Better Advantage
  10.10.1 Adjoints and TD-Learning
  10.10.2 Actor-Critic Methods
  10.10.3 Some Ancient History
  10.10.4 Fisher Information

Appendix A Mathematical Background
  A.1.1 Generalities
  A.1.2 Topology
  A.1.3 Linear Algebra
  A.2.1 Events and Sample Space
  A.2.2 Markov Chain Basics
  A.2.3 Strong Markov Property
Appendix B Markov Decision Processes
  B.1.1 Total Cost in Many Flavors
  B.2.1 Value Iteration and Successive Approximation
  B.2.2 Policy Improvement and Newton-Raphson
  B.2.3 LP Formulations
Appendix C Partial Observations and Belief States