Edition:
Authors: Ashwin Rao, Tikhon Jelvis
Series:
ISBN: 1032124121, 9781032124124
Publisher: CRC Press/Chapman & Hall
Publication year: 2022
Number of pages: 520 [522]
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 11 MB
If you would like the file of Foundations of Reinforcement Learning with Applications in Finance converted to PDF, EPUB, AZW3, MOBI or DJVU, you can notify support and they will convert the file for you.
Please note that this is the original English edition of Foundations of Reinforcement Learning with Applications in Finance, not a Persian translation. The International Library website offers books in their original language only and does not provide any titles translated into or written in Persian.
This book demystifies Reinforcement Learning and makes it a practically useful tool for those studying and working in applied areas, especially finance. It seeks to overcome the perception that RL is too technical for practitioners by introducing the foundations of RL in a way that balances depth of understanding with clear, minimally technical delivery.
Table of Contents:

Front matter: Cover, Half Title, Title Page, Copyright Page, Contents, Preface, Author Biographies, Summary of Notation

CHAPTER 1: Overview
1.1. LEARNING REINFORCEMENT LEARNING 1.2. WHAT YOU WILL LEARN FROM THIS BOOK 1.3. EXPECTED BACKGROUND TO READ THIS BOOK 1.4. DECLUTTERING THE JARGON LINKED TO REINFORCEMENT LEARNING 1.5. INTRODUCTION TO THE MARKOV DECISION PROCESS (MDP) FRAMEWORK 1.6. REAL-WORLD PROBLEMS THAT FIT THE MDP FRAMEWORK 1.7. THE INHERENT DIFFICULTY IN SOLVING MDPS 1.8. VALUE FUNCTION, BELLMAN EQUATIONS, DYNAMIC PROGRAMMING AND RL 1.9. OUTLINE OF CHAPTERS 1.9.1. Module I: Processes and Planning Algorithms 1.9.2. Module II: Modeling Financial Applications 1.9.3. Module III: Reinforcement Learning Algorithms 1.9.4. Module IV: Finishing Touches 1.9.5. Short Appendix Chapters

CHAPTER 2: Programming and Design
2.1. CODE DESIGN 2.2. ENVIRONMENT SETUP 2.3. CLASSES AND INTERFACES 2.3.1. A Distribution Interface 2.3.2. A Concrete Distribution 2.3.2.1. Dataclasses 2.3.2.2. Immutability 2.3.3. Checking Types 2.3.3.1. Static Typing 2.3.4. Type Variables 2.3.5. Functionality 2.4. ABSTRACTING OVER COMPUTATION 2.4.1. First-Class Functions 2.4.1.1. Lambdas 2.4.2. Iterative Algorithms 2.4.2.1. Iterators and Generators 2.5. KEY TAKEAWAYS FROM THIS CHAPTER

MODULE I: Processes and Planning Algorithms

Chapter 3: Markov Processes
3.1. THE CONCEPT OF STATE IN A PROCESS 3.2. UNDERSTANDING MARKOV PROPERTY FROM STOCK PRICE EXAMPLES 3.3. FORMAL DEFINITIONS FOR MARKOV PROCESSES 3.3.1. Starting States 3.3.2. Terminal States 3.3.3. Markov Process Implementation 3.4. STOCK PRICE EXAMPLES MODELED AS MARKOV PROCESSES 3.5. FINITE MARKOV PROCESSES 3.6. SIMPLE INVENTORY EXAMPLE 3.7. STATIONARY DISTRIBUTION OF A MARKOV PROCESS 3.8. FORMALISM OF MARKOV REWARD PROCESSES 3.9. SIMPLE INVENTORY EXAMPLE AS A MARKOV REWARD PROCESS 3.10. FINITE MARKOV REWARD PROCESSES 3.11. SIMPLE INVENTORY EXAMPLE AS A FINITE MARKOV REWARD PROCESS 3.12. VALUE FUNCTION OF A MARKOV REWARD PROCESS 3.13. SUMMARY OF KEY LEARNINGS FROM THIS CHAPTER

Chapter 4: Markov Decision Processes
4.1. SIMPLE INVENTORY EXAMPLE: HOW MUCH TO ORDER? 4.2. THE DIFFICULTY OF SEQUENTIAL DECISIONING UNDER UNCERTAINTY 4.3. FORMAL DEFINITION OF A MARKOV DECISION PROCESS 4.4. POLICY 4.5. [MARKOV DECISION PROCESS, POLICY] := MARKOV REWARD PROCESS 4.6. SIMPLE INVENTORY EXAMPLE WITH UNLIMITED CAPACITY (INFINITE STATE/ACTION SPACE) 4.7. FINITE MARKOV DECISION PROCESSES 4.8. SIMPLE INVENTORY EXAMPLE AS A FINITE MARKOV DECISION PROCESS 4.9. MDP VALUE FUNCTION FOR A FIXED POLICY 4.10. OPTIMAL VALUE FUNCTION AND OPTIMAL POLICIES 4.11. VARIANTS AND EXTENSIONS OF MDPS 4.11.1. Size of Spaces and Discrete versus Continuous 4.11.1.1. State Space 4.11.1.2. Action Space 4.11.1.3. Time Steps 4.11.2. Partially-Observable Markov Decision Processes (POMDPs) 4.12. SUMMARY OF KEY LEARNINGS FROM THIS CHAPTER

Chapter 5: Dynamic Programming Algorithms
5.1. PLANNING VERSUS LEARNING 5.2. USAGE OF THE TERM DYNAMIC PROGRAMMING 5.3. FIXED-POINT THEORY 5.4. BELLMAN POLICY OPERATOR AND POLICY EVALUATION ALGORITHM 5.5. GREEDY POLICY 5.6. POLICY IMPROVEMENT 5.7. POLICY ITERATION ALGORITHM 5.8. BELLMAN OPTIMALITY OPERATOR AND VALUE ITERATION ALGORITHM 5.9. OPTIMAL POLICY FROM OPTIMAL VALUE FUNCTION 5.10. REVISITING THE SIMPLE INVENTORY EXAMPLE 5.11. GENERALIZED POLICY ITERATION 5.12. ASYNCHRONOUS DYNAMIC PROGRAMMING 5.13. FINITE-HORIZON DYNAMIC PROGRAMMING: BACKWARD INDUCTION 5.14. DYNAMIC PRICING FOR END-OF-LIFE/END-OF-SEASON OF A PRODUCT 5.15. GENERALIZATION TO NON-TABULAR ALGORITHMS 5.16. SUMMARY OF KEY LEARNINGS FROM THIS CHAPTER

Chapter 6: Function Approximation and Approximate Dynamic Programming
6.1. FUNCTION APPROXIMATION 6.2. LINEAR FUNCTION APPROXIMATION 6.3. NEURAL NETWORK FUNCTION APPROXIMATION 6.4. TABULAR AS A FORM OF FUNCTIONAPPROX 6.5. APPROXIMATE POLICY EVALUATION 6.6. APPROXIMATE VALUE ITERATION 6.7. FINITE-HORIZON APPROXIMATE POLICY EVALUATION 6.8. FINITE-HORIZON APPROXIMATE VALUE ITERATION 6.9. FINITE-HORIZON APPROXIMATE Q-VALUE ITERATION 6.10. HOW TO CONSTRUCT THE NON-TERMINAL STATES DISTRIBUTION 6.11. KEY TAKEAWAYS FROM THIS CHAPTER

MODULE II: Modeling Financial Applications

Chapter 7: Utility Theory
7.1. INTRODUCTION TO THE CONCEPT OF UTILITY 7.2. A SIMPLE FINANCIAL EXAMPLE 7.3. THE SHAPE OF THE UTILITY FUNCTION 7.4. CALCULATING THE RISK-PREMIUM 7.5. CONSTANT ABSOLUTE RISK-AVERSION (CARA) 7.6. A PORTFOLIO APPLICATION OF CARA 7.7. CONSTANT RELATIVE RISK-AVERSION (CRRA) 7.8. A PORTFOLIO APPLICATION OF CRRA 7.9. KEY TAKEAWAYS FROM THIS CHAPTER

Chapter 8: Dynamic Asset-Allocation and Consumption
8.1. OPTIMIZATION OF PERSONAL FINANCE 8.2. MERTON’S PORTFOLIO PROBLEM AND SOLUTION 8.3. DEVELOPING INTUITION FOR THE SOLUTION TO MERTON’S PORTFOLIO PROBLEM 8.4. A DISCRETE-TIME ASSET-ALLOCATION EXAMPLE 8.5. PORTING TO REAL-WORLD 8.6. KEY TAKEAWAYS FROM THIS CHAPTER

Chapter 9: Derivatives Pricing and Hedging
9.1. A BRIEF INTRODUCTION TO DERIVATIVES 9.1.1. Forwards 9.1.2. European Options 9.1.3. American Options 9.2. NOTATION FOR THE SINGLE-PERIOD SIMPLE SETTING 9.3. PORTFOLIOS, ARBITRAGE AND RISK-NEUTRAL PROBABILITY MEASURE 9.4. FIRST FUNDAMENTAL THEOREM OF ASSET PRICING (1ST FTAP) 9.5. SECOND FUNDAMENTAL THEOREM OF ASSET PRICING (2ND FTAP) 9.6. DERIVATIVES PRICING IN SINGLE-PERIOD SETTING 9.6.1. Derivatives Pricing When Market Is Complete 9.6.2. Derivatives Pricing When Market Is Incomplete 9.6.3. Derivatives Pricing When Market Has Arbitrage 9.7. DERIVATIVES PRICING IN MULTI-PERIOD/CONTINUOUS-TIME 9.7.1. Multi-Period Complete-Market Setting 9.7.2. Continuous-Time Complete-Market Setting 9.8. OPTIMAL EXERCISE OF AMERICAN OPTIONS CAST AS A FINITE MDP 9.9. GENERALIZING TO OPTIMAL-STOPPING PROBLEMS 9.10. PRICING/HEDGING IN AN INCOMPLETE MARKET CAST AS AN MDP 9.11. KEY TAKEAWAYS FROM THIS CHAPTER

Chapter 10: Order-Book Trading Algorithms
10.1. BASICS OF ORDER BOOK AND PRICE IMPACT 10.2. OPTIMAL EXECUTION OF A MARKET ORDER 10.2.1. Simple Linear Price Impact Model with No Risk-Aversion 10.2.2. Paper by Bertsimas and Lo on Optimal Order Execution 10.2.3. Incorporating Risk-Aversion and Real-World Considerations 10.3. OPTIMAL MARKET-MAKING 10.3.1. Avellaneda-Stoikov Continuous-Time Formulation 10.3.2. Solving the Avellaneda-Stoikov Formulation 10.3.3. Analytical Approximation to the Solution to Avellaneda-Stoikov Formulation 10.3.4. Real-World Market-Making 10.4. KEY TAKEAWAYS FROM THIS CHAPTER

MODULE III: Reinforcement Learning Algorithms

Chapter 11: Monte-Carlo and Temporal-Difference for Prediction
11.1. OVERVIEW OF THE REINFORCEMENT LEARNING APPROACH 11.2. RL FOR PREDICTION 11.3. MONTE-CARLO (MC) PREDICTION 11.4. TEMPORAL-DIFFERENCE (TD) PREDICTION 11.5. TD VERSUS MC 11.5.1. TD Learning Akin to Human Learning 11.5.2. Bias, Variance and Convergence 11.5.3. Fixed-Data Experience Replay on TD versus MC 11.5.4. Bootstrapping and Experiencing 11.6. TD(λ) PREDICTION 11.6.1. n-Step Bootstrapping Prediction Algorithm 11.6.2. λ-Return Prediction Algorithm 11.6.3. Eligibility Traces 11.6.4. Implementation of the TD(λ) Prediction Algorithm 11.7. KEY TAKEAWAYS FROM THIS CHAPTER

Chapter 12: Monte-Carlo and Temporal-Difference for Control
12.1. REFRESHER ON GENERALIZED POLICY ITERATION (GPI) 12.2. GPI WITH EVALUATION AS MONTE-CARLO 12.3. GLIE MONTE-CARLO CONTROL 12.4. SARSA 12.5. SARSA(λ) 12.6. OFF-POLICY CONTROL 12.6.1. Q-Learning 12.6.2. Windy Grid 12.6.3. Importance Sampling 12.7. CONCEPTUAL LINKAGE BETWEEN DP AND TD ALGORITHMS 12.8. CONVERGENCE OF RL ALGORITHMS 12.9. KEY TAKEAWAYS FROM THIS CHAPTER

Chapter 13: Batch RL, Experience-Replay, DQN, LSPI, Gradient TD
13.1. BATCH RL AND EXPERIENCE-REPLAY 13.2. A GENERIC IMPLEMENTATION OF EXPERIENCE-REPLAY 13.3. LEAST-SQUARES RL PREDICTION 13.3.1. Least-Squares Monte-Carlo (LSMC) 13.3.2. Least-Squares Temporal-Difference (LSTD) 13.3.3. LSTD(λ) 13.3.4. Convergence of Least-Squares Prediction 13.4. Q-LEARNING WITH EXPERIENCE-REPLAY 13.4.1. Deep Q-Networks (DQN) Algorithm 13.5. LEAST-SQUARES POLICY ITERATION (LSPI) 13.5.1. Saving Your Village from a Vampire 13.5.2. Least-Squares Control Convergence 13.6. RL FOR OPTIMAL EXERCISE OF AMERICAN OPTIONS 13.6.1. LSPI for American Options Pricing 13.6.2. Deep Q-Learning for American Options Pricing 13.7. VALUE FUNCTION GEOMETRY 13.7.1. Notation and Definitions 13.7.2. Bellman Policy Operator and Projection Operator 13.7.3. Vectors of Interest in the Φ Subspace 13.8. GRADIENT TEMPORAL-DIFFERENCE (GRADIENT TD) 13.9. KEY TAKEAWAYS FROM THIS CHAPTER

Chapter 14: Policy Gradient Algorithms
14.1. ADVANTAGES AND DISADVANTAGES OF POLICY GRADIENT ALGORITHMS 14.2. POLICY GRADIENT THEOREM 14.2.1. Notation and Definitions 14.2.2. Statement of the Policy Gradient Theorem 14.2.3. Proof of the Policy Gradient Theorem 14.3. SCORE FUNCTION FOR CANONICAL POLICY FUNCTIONS 14.3.1. Canonical π(s, a; θ) for Finite Action Spaces 14.3.2. Canonical π(s, a; θ) for Single-Dimensional Continuous Action Spaces 14.4. REINFORCE ALGORITHM (MONTE-CARLO POLICY GRADIENT) 14.5. OPTIMAL ASSET ALLOCATION (REVISITED) 14.6. ACTOR-CRITIC AND VARIANCE REDUCTION 14.7. OVERCOMING BIAS WITH COMPATIBLE FUNCTION APPROXIMATION 14.8. POLICY GRADIENT METHODS IN PRACTICE 14.8.1. Natural Policy Gradient 14.8.2. Deterministic Policy Gradient 14.9. EVOLUTIONARY STRATEGIES 14.10. KEY TAKEAWAYS FROM THIS CHAPTER

MODULE IV: Finishing Touches

Chapter 15: Multi-Armed Bandits: Exploration versus Exploitation
15.1. INTRODUCTION TO THE MULTI-ARMED BANDIT PROBLEM 15.1.1. Some Examples of Explore-Exploit Dilemma 15.1.2. Problem Definition 15.1.3. Regret 15.1.4. Counts and Gaps 15.2. SIMPLE ALGORITHMS 15.2.1. Greedy and ϵ-Greedy 15.2.2. Optimistic Initialization 15.2.3. Decaying ϵt-Greedy Algorithm 15.3. LOWER BOUND 15.4. UPPER CONFIDENCE BOUND ALGORITHMS 15.4.1. Hoeffding’s Inequality 15.4.2. UCB1 Algorithm 15.4.3. Bayesian UCB 15.5. PROBABILITY MATCHING 15.5.1. Thompson Sampling 15.6. GRADIENT BANDITS 15.7. HORSE RACE 15.8. INFORMATION STATE SPACE MDP 15.9. EXTENDING TO CONTEXTUAL BANDITS AND RL CONTROL 15.10. KEY TAKEAWAYS FROM THIS CHAPTER

Chapter 16: Blending Learning and Planning
16.1. PLANNING VERSUS LEARNING 16.1.1. Planning the Solution of Prediction/Control 16.1.2. Learning the Solution of Prediction/Control 16.1.3. Advantages and Disadvantages of Planning versus Learning 16.1.4. Blending Planning and Learning 16.2. DECISION-TIME PLANNING 16.3. MONTE-CARLO TREE-SEARCH (MCTS) 16.4. ADAPTIVE MULTI-STAGE SAMPLING 16.5. SUMMARY OF KEY LEARNINGS FROM THIS CHAPTER

Chapter 17: Summary and Real-World Considerations
17.1. SUMMARY OF KEY LEARNINGS FROM THIS BOOK 17.2. RL IN THE REAL-WORLD

APPENDIX A: Moment Generating Function and Its Applications
A.1. THE MOMENT GENERATING FUNCTION (MGF) A.2. MGF FOR LINEAR FUNCTIONS OF RANDOM VARIABLES A.3. MGF FOR THE NORMAL DISTRIBUTION A.4. MINIMIZING THE MGF A.4.1. Minimizing the MGF When x Follows a Normal Distribution A.4.2. Minimizing the MGF When x Follows a Symmetric Binary Distribution

APPENDIX B: Portfolio Theory
B.1. SETTING AND NOTATION B.2. PORTFOLIO RETURNS B.3. DERIVATION OF EFFICIENT FRONTIER CURVE B.4. GLOBAL MINIMUM VARIANCE PORTFOLIO (GMVP) B.5. ORTHOGONAL EFFICIENT PORTFOLIOS B.6. TWO-FUND THEOREM B.7. AN EXAMPLE OF THE EFFICIENT FRONTIER FOR 16 ASSETS B.8. CAPM: LINEARITY OF COVARIANCE VECTOR W.R.T. MEAN RETURNS B.9. USEFUL COROLLARIES OF CAPM B.10. CROSS-SECTIONAL VARIANCE B.11. EFFICIENT SET WITH A RISK-FREE ASSET

APPENDIX C: Introduction to and Overview of Stochastic Calculus Basics
C.1. SIMPLE RANDOM WALK C.2. BROWNIAN MOTION AS SCALED RANDOM WALK C.3. CONTINUOUS-TIME STOCHASTIC PROCESSES C.4. PROPERTIES OF BROWNIAN MOTION SAMPLE TRACES C.5. ITO INTEGRAL C.6. ITO’S LEMMA C.7. A LOGNORMAL PROCESS C.8. A MEAN-REVERTING PROCESS

APPENDIX D: The Hamilton-Jacobi-Bellman (HJB) Equation
D.1. HJB AS A CONTINUOUS-TIME VERSION OF BELLMAN OPTIMALITY EQUATION D.2. HJB WITH STATE TRANSITIONS AS AN ITO PROCESS

APPENDIX E: Black-Scholes Equation and Its Solution for Call/Put Options
E.1. ASSUMPTIONS E.2. DERIVATION OF THE BLACK-SCHOLES EQUATION E.3. SOLUTION OF THE BLACK-SCHOLES EQUATION FOR CALL/PUT OPTIONS

APPENDIX F: Function Approximations as Affine Spaces
F.1. VECTOR SPACE F.2. FUNCTION SPACE F.3. LINEAR MAP OF VECTOR SPACES F.4. AFFINE SPACE F.5. AFFINE MAP F.6. FUNCTION APPROXIMATIONS F.6.1. D[R] as an Affine Space P F.6.2. Delegator Space R F.7. STOCHASTIC GRADIENT DESCENT F.8. SGD UPDATE FOR LINEAR FUNCTION APPROXIMATIONS

APPENDIX G: Conjugate Priors for Gaussian and Bernoulli Distributions
G.1. CONJUGATE PRIOR FOR GAUSSIAN DISTRIBUTION G.2. CONJUGATE PRIOR FOR BERNOULLI DISTRIBUTION

Bibliography
Index
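To give a flavor of the code-driven style suggested by Chapter 2 (A Distribution Interface, Iterators and Generators) and Chapter 3 (Markov Processes) in the contents above, below is a minimal Python sketch of a sampling interface and a random-walk Markov process trace. The names Distribution, Categorical and random_walk_trace are illustrative assumptions made for this page only, not the book's actual library code.

from abc import ABC, abstractmethod
from typing import Dict, Generic, Iterator, TypeVar
import random

A = TypeVar("A")

class Distribution(ABC, Generic[A]):
    """Interface for a probability distribution that we can sample from."""
    @abstractmethod
    def sample(self) -> A:
        ...

class Categorical(Distribution[A]):
    """Finite distribution given as an outcome -> probability mapping."""
    def __init__(self, probabilities: Dict[A, float]) -> None:
        self.outcomes = list(probabilities.keys())
        self.weights = list(probabilities.values())

    def sample(self) -> A:
        # Draw one outcome according to the stored probabilities.
        return random.choices(self.outcomes, weights=self.weights, k=1)[0]

def random_walk_trace(start: int, up_prob: float) -> Iterator[int]:
    """Generate a Markov process trace: the next state depends only on the current state."""
    move = Categorical({1: up_prob, -1: 1.0 - up_prob})
    state = start
    while True:
        yield state
        state += move.sample()

if __name__ == "__main__":
    walk = random_walk_trace(start=100, up_prob=0.5)
    print([next(walk) for _ in range(10)])  # e.g. [100, 101, 100, 99, ...]

The generator-based trace mirrors the iterator/generator approach listed in section 2.4.2.1, where potentially infinite processes are represented lazily and consumed one step at a time.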