Edition:
Authors: Tor Lattimore, Csaba Szepesvári
Series:
ISBN: 1108486827, 9781108486828
Publisher: Cambridge University Press
Publication year: 2020
Number of pages: 537
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 13 MB
If the author is Iranian, the book cannot be downloaded and the payment will be refunded.
If you would like the file for Bandit Algorithms converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert it for you.
Note that Bandit Algorithms is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide books translated into or written in Persian.
Decision-making in the face of uncertainty is a significant challenge in machine learning, and the multi-armed bandit model is a commonly used framework to address it. This comprehensive and rigorous introduction to the multi-armed bandit problem examines all the major settings, including stochastic, adversarial, and Bayesian frameworks. A focus on both mathematical intuition and carefully worked proofs makes this an excellent reference for established researchers and a helpful resource for graduate students in computer science, engineering, statistics, applied mathematics and economics. Linear bandits receive special attention as one of the most useful models in applications, while other chapters are dedicated to combinatorial bandits, ranking, non-stationary problems, Thompson sampling and pure exploration. The book ends with a peek into the world beyond bandits with an introduction to partial monitoring and learning in Markov decision processes.
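As a quick illustration of the kind of decision rule analysed in the book (the upper confidence bound algorithm of Part II), here is a minimal Python sketch of the UCB1 policy on a Bernoulli bandit. The arm means, horizon, and function name are hypothetical choices made only to keep the example self-contained; the code is not taken from the book.

```python
import math
import random

# Minimal illustrative sketch of the UCB1 policy for a stochastic
# multi-armed bandit with Bernoulli rewards. All parameters below are
# hypothetical; this is not code from the book.

def ucb1(means, horizon, seed=0):
    """Play a K-armed Bernoulli bandit with the given arm means for
    `horizon` rounds and return the total reward collected."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of times each arm has been played
    totals = [0.0] * k    # sum of rewards observed for each arm
    reward_sum = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # play each arm once to initialise the estimates
        else:
            # pick the arm maximising empirical mean + exploration bonus
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        totals[arm] += reward
        reward_sum += reward
    return reward_sum

if __name__ == "__main__":
    print(ucb1(means=[0.4, 0.5, 0.7], horizon=10_000))
```

The exploration bonus shrinks as an arm is played more often, so the policy gradually concentrates its plays on the empirically best arm while still revisiting the others occasionally; the book's analysis makes this trade-off precise through regret bounds.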
Contents

Preface
Notation

Part I Bandits, Probability and Concentration
1 Introduction 1.1 The Language of Bandits 1.2 Applications 1.3 Notes 1.4 Bibliographic Remarks
2 Foundations of Probability 2.1 Probability Spaces and Random Elements 2.2 σ-Algebras and Knowledge 2.3 Conditional Probabilities 2.4 Independence 2.5 Integration and Expectation 2.6 Conditional Expectation 2.7 Notes 2.8 Bibliographic Remarks 2.9 Exercises
3 Stochastic Processes and Markov Chains 3.1 Stochastic Processes 3.2 Markov Chains 3.3 Martingales and Stopping Times 3.4 Notes 3.5 Bibliographic Remarks 3.6 Exercises
4 Stochastic Bandits 4.1 Core Assumptions 4.2 The Learning Objective 4.3 Knowledge and Environment Classes 4.4 The Regret 4.5 Decomposing the Regret 4.6 The Canonical Bandit Model 4.7 The Canonical Bandit Model for Uncountable Action Sets 4.8 Notes 4.9 Bibliographical Remarks 4.10 Exercises
5 Concentration of Measure 5.1 Tail Probabilities 5.2 The Inequalities of Markov and Chebyshev 5.3 The Cramér-Chernoff Method and Subgaussian Random Variables 5.4 Notes 5.5 Bibliographical Remarks 5.6 Exercises

Part II Stochastic Bandits with Finitely Many Arms
6 The Explore-Then-Commit Algorithm 6.1 Algorithm and Regret Analysis 6.2 Notes 6.3 Bibliographical Remarks 6.4 Exercises
7 The Upper Confidence Bound Algorithm 7.1 The Optimism Principle 7.2 Notes 7.3 Bibliographical Remarks 7.4 Exercises
8 The Upper Confidence Bound Algorithm: Asymptotic Optimality 8.1 Asymptotically Optimal UCB 8.2 Notes 8.3 Bibliographic Remarks 8.4 Exercises
9 The Upper Confidence Bound Algorithm: Minimax Optimality 9.1 The MOSS Algorithm 9.2 Two Problems 9.3 Notes 9.4 Bibliographic Remarks 9.5 Exercises
10 The Upper Confidence Bound Algorithm: Bernoulli Noise 10.1 Concentration for Sums of Bernoulli Random Variables 10.2 The KL-UCB Algorithm 10.3 Notes 10.4 Bibliographic Remarks 10.5 Exercises

Part III Adversarial Bandits with Finitely Many Arms
11 The Exp3 Algorithm 11.1 Adversarial Bandit Environments 11.2 Importance-Weighted Estimators 11.3 The Exp3 Algorithm 11.4 Regret Analysis 11.5 Notes 11.6 Bibliographic Remarks 11.7 Exercises
12 The Exp3-IX Algorithm 12.1 The Exp3-IX Algorithm 12.2 Regret Analysis 12.3 Notes 12.4 Bibliographic Remarks 12.5 Exercises

Part IV Lower Bounds for Bandits with Finitely Many Arms
13 Lower Bounds: Basic Ideas 13.1 Main Ideas Underlying Minimax Lower Bounds 13.2 Notes 13.3 Bibliographic Remarks 13.4 Exercises
14 Foundations of Information Theory 14.1 Entropy and Optimal Coding 14.2 Relative Entropy 14.3 Notes 14.4 Bibliographic Remarks 14.5 Exercises
15 Minimax Lower Bounds 15.1 Relative Entropy Between Bandits 15.2 Minimax Lower Bounds 15.3 Notes 15.4 Bibliographic Remarks 15.5 Exercises
16 Instance-Dependent Lower Bounds 16.1 Asymptotic Bounds 16.2 Finite-Time Bounds 16.3 Notes 16.4 Bibliographic Remarks 16.5 Exercises
17 High-Probability Lower Bounds 17.1 Stochastic Bandits 17.2 Adversarial Bandits 17.3 Notes 17.4 Bibliographic Remarks 17.5 Exercises

Part V Contextual and Linear Bandits
18 Contextual Bandits 18.1 Contextual Bandits: One Bandit per Context 18.2 Bandits with Expert Advice 18.3 Exp4 18.4 Regret Analysis 18.5 Notes 18.6 Bibliographic Remarks 18.7 Exercises
19 Stochastic Linear Bandits 19.1 Stochastic Contextual Bandits 19.2 Stochastic Linear Bandits 19.3 Regret Analysis 19.4 Notes 19.5 Bibliographic Remarks 19.6 Exercises
20 Confidence Bounds for Least Squares Estimators 20.1 Martingales and the Method of Mixtures 20.2 Notes 20.3 Bibliographic Remarks 20.4 Exercises
21 Optimal Design for Least Squares Estimators 21.1 The Kiefer–Wolfowitz Theorem 21.2 Notes 21.3 Bibliographic Remarks 21.4 Exercises
22 Stochastic Linear Bandits with Finitely Many Arms 22.1 Notes 22.2 Bibliographic Remarks 22.3 Exercises
23 Stochastic Linear Bandits with Sparsity 23.1 Sparse Linear Stochastic Bandits 23.2 Elimination on the Hypercube 23.3 Online to Confidence Set Conversion 23.4 Sparse Online Linear Prediction 23.5 Notes 23.6 Bibliographical Remarks 23.7 Exercises
24 Minimax Lower Bounds for Stochastic Linear Bandits 24.1 Hypercube 24.2 Unit Ball 24.3 Sparse Parameter Vectors 24.4 Misspecified Models 24.5 Notes 24.6 Bibliographic Remarks 24.7 Exercises
25 Asymptotic Lower Bounds for Stochastic Linear Bandits 25.1 An Asymptotic Lower Bound for Fixed Action Sets 25.2 Clouds Looming for Optimism 25.3 Notes 25.4 Bibliographic Remarks 25.5 Exercises

Part VI Adversarial Linear Bandits
26 Foundations of Convex Analysis 26.1 Convex Sets and Functions 26.2 Jensen's Inequality 26.3 Bregman Divergence 26.4 Legendre Functions 26.5 Optimisation 26.6 Projections 26.7 Notes 26.8 Bibliographic Remarks 26.9 Exercises
27 Exp3 for Adversarial Linear Bandits 27.1 Exponential Weights for Linear Bandits 27.2 Regret Analysis 27.3 Continuous Exponential Weights 27.4 Notes 27.5 Bibliographic Remarks 27.6 Exercises
28 Follow-the-Regularised-Leader and Mirror Descent 28.1 Online Linear Optimisation 28.2 Regret Analysis 28.3 Application to Linear Bandits 28.4 Linear Bandits on the Unit Ball 28.5 Notes 28.6 Bibliographic Remarks 28.7 Exercises
29 The Relation between Adversarial and Stochastic Linear Bandits 29.1 Unified View 29.2 Reducing Stochastic Linear Bandits to Adversarial Linear Bandits 29.3 Stochastic Linear Bandits with Parameter Noise 29.4 Contextual Linear Bandits 29.5 Notes 29.6 Bibliographic Remarks 29.7 Exercises

Part VII Other Topics
30 Combinatorial Bandits 30.1 Notation and Assumptions 30.2 Applications 30.3 Bandit Feedback 30.4 Semi-bandit Feedback and Mirror Descent 30.5 Follow-the-Perturbed-Leader 30.6 Notes 30.7 Bibliographic Remarks 30.8 Exercises
31 Non-stationary Bandits 31.1 Adversarial Bandits 31.2 Stochastic Bandits 31.3 Notes 31.4 Bibliographic Remarks 31.5 Exercises
32 Ranking 32.1 Click Models 32.2 Policy 32.3 Regret Analysis 32.4 Notes 32.5 Bibliographic Remarks 32.6 Exercises
33 Pure Exploration 33.1 Simple Regret 33.2 Best-Arm Identification with a Fixed Confidence 33.3 Best-Arm Identification with a Budget 33.4 Notes 33.5 Bibliographical Remarks 33.6 Exercises
34 Foundations of Bayesian Learning 34.1 Statistical Decision Theory and Bayesian Learning 34.2 Bayesian Learning and the Posterior Distribution 34.3 Conjugate Pairs, Conjugate Priors and the Exponential Family 34.4 The Bayesian Bandit Environment 34.5 Posterior Distributions in Bandits 34.6 Bayesian Regret 34.7 Notes 34.8 Bibliographic Remarks 34.9 Exercises
35 Bayesian Bandits 35.1 Bayesian Optimal Regret for k-Armed Stochastic Bandits 35.2 Optimal Stopping 35.3 One-Armed Bandits 35.4 Gittins Index 35.5 Computing the Gittins Index 35.6 Notes 35.7 Bibliographical Remarks 35.8 Exercises
36 Thompson Sampling 36.1 Finite-Armed Bandits 36.2 Frequentist Analysis 36.3 Linear Bandits 36.4 Information Theoretic Analysis 36.5 Notes 36.6 Bibliographic Remarks 36.7 Exercises

Part VIII Beyond Bandits
37 Partial Monitoring 37.1 Finite Adversarial Partial Monitoring Problems 37.2 The Structure of Partial Monitoring 37.3 Classification of Finite Adversarial Partial Monitoring 37.4 Lower Bounds 37.5 Policy and Upper Bounds 37.6 Proof of Theorem 37.16 37.7 Proof of Theorem 37.17 37.8 Proof of the Classification Theorem 37.9 Notes 37.10 Bibliographical Remarks 37.11 Exercises
38 Markov Decision Processes 38.1 Problem Set-Up 38.2 Optimal Policies and the Bellman Optimality Equation 38.3 Finding an Optimal Policy 38.4 Learning in Markov Decision Processes 38.5 Upper Confidence Bounds for Reinforcement Learning 38.6 Proof of Upper Bound 38.7 Proof of Lower Bound 38.8 Notes 38.9 Bibliographical Remarks 38.10 Exercises

Bibliography
Index