Edition: 1st
Author: Charu C. Aggarwal
Series:
ISBN: 3031532813, 9783031532818
Publisher: Springer
Year of publication: 2024
Number of pages: 540 [530]
Language: English
File format: PDF (converted to EPUB or AZW3 upon user request)
File size: 18 MB
If you would like the file of Probability and Statistics for Machine Learning: A Textbook converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify the support team and they will convert the file for you.
Please note that Probability and Statistics for Machine Learning: A Textbook is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
This book covers probability and statistics from the machine learning perspective. The chapters of this book belong to three categories: 1. The basics of probability and statistics: These chapters focus on the basics of probability and statistics, and cover the key principles of these topics. Chapter 1 provides an overview of the area of probability and statistics as well as its relationship to machine learning. The fundamentals of probability and statistics are covered in Chapters 2 through 5. 2. From probability to machine learning: Many machine learning applications are addressed using probabilistic models, whose parameters are then learned in a data-driven manner. Chapters 6 through 9 explore how different models from probability and statistics are applied to machine learning. Perhaps the most important tool that bridges the gap from data to probability is maximum-likelihood estimation, which is a foundational concept from the perspective of machine learning. This concept is explored repeatedly in these chapters. 3. Advanced topics: Chapter 10 is devoted to discrete-state Markov processes. It explores the application of probability and statistics to a temporal and sequential setting, although the applications extend to more complex settings such as graphical data. Chapter 11 covers a number of probabilistic inequalities and approximations. The style of writing promotes the learning of probability and statistics simultaneously with a probabilistic perspective on the modeling of machine learning applications. The book contains over 200 worked examples in order to elucidate key concepts. Exercises are included both within the text of the chapters and at the end of the chapters. The book is written for a broad audience, including graduate students, researchers, and practitioners.
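As a quick illustration of the maximum-likelihood idea highlighted above (a standard textbook definition rather than an excerpt from this book), the estimator selects the parameter value under which the observed data are most probable:

\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} \prod_{i=1}^{n} p(x_i \mid \theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta)

For independent samples drawn from a normal distribution, for example, this criterion yields the sample mean as the estimate of the distribution's mean.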
Contents

Preface
Acknowledgments
Author Biography

1 Probability and Statistics: An Introduction
    1.1 Introduction  1.1.1 The Interplay Between Probability, Statistics, and Machine Learning  1.1.2 Chapter Organization
    1.2 Representing Data  1.2.1 Numeric Multidimensional Data  1.2.2 Categorical and Mixed Attribute Data
    1.3 Summarizing and Visualizing Data
    1.4 The Basics of Probability and Probability Distributions  1.4.1 Populations versus Samples  1.4.2 Modeling Populations with Samples  1.4.3 Handling Dependence in Data Samples
    1.5 Hypothesis Testing
    1.6 Basic Problems in Machine Learning  1.6.1 Clustering  1.6.2 Classification and Regression Modeling  1.6.2.1 Regression  1.6.3 Outlier Detection
    1.7 Summary
    1.8 Further Reading
    1.9 Exercises

2 Summarizing and Visualizing Data
    2.1 Introduction  2.1.1 Chapter Organization
    2.2 Summarizing Data  2.2.1 Univariate Summarization  2.2.1.1 Measures of Central Tendency  2.2.1.2 Measures of Dispersion  2.2.2 Multivariate Summarization  2.2.2.1 Covariance and Correlation  2.2.2.2 Rank Correlation Measures  2.2.2.3 Correlations among Multiple Attributes  2.2.2.4 Contingency Tables for Categorical Data
    2.3 Data Visualization  2.3.1 Univariate Visualization  2.3.1.1 Histogram  2.3.1.2 Box Plot  2.3.2 Multivariate Visualization  2.3.2.1 Line Plot  2.3.2.2 Scatter Plot  2.3.2.3 Bar Chart
    2.4 Applications to Data Preprocessing  2.4.1 Univariate Preprocessing Methods  2.4.2 Whitening: A Multivariate Preprocessing Method
    2.5 Summary
    2.6 Further Reading
    2.7 Exercises

3 Probability Basics and Random Variables
    3.1 Introduction  3.1.1 Chapter Organization
    3.2 Sample Spaces and Events
    3.3 The Counting Approach to Probabilities
    3.4 Set-Wise View of Events
    3.5 Conditional Probabilities and Independence
    3.6 The Bayes Rule  3.6.1 The Observability Perspective: Posteriors versus Likelihoods
    3.7 The Basics of Probability Distributions  3.7.1 Closed-Form View of Probability Distributions  3.7.2 Continuous Distributions  3.7.3 Multivariate Probability Distributions
    3.8 Distribution Independence and Conditionals  3.8.1 Independence of Distributions  3.8.2 Conditional Distributions  3.8.3 Example: A Simple 1-Dimensional Knowledge-Based Bayes Classifier
    3.9 Summarizing Distributions  3.9.1 Expectation and Variance  3.9.2 Distribution Covariance  3.9.3 Useful Multivariate Properties Under Independence
    3.10 Compound Distributions  3.10.1 Total Probability Rule in Continuous Hypothesis Spaces  3.10.2 Bayes Rule in Continuous Hypothesis Spaces
    3.11 Functions of Random Variables (*)  3.11.1 Distribution of the Function of a Single Random Variable  3.11.2 Distribution of the Sum of Random Variables  3.11.3 Geometric Derivation of Distributions of Functions
    3.12 Summary
    3.13 Further Reading
    3.14 Exercises

4 Probability Distributions
    4.1 Introduction  4.1.1 Chapter Organization
    4.2 The Uniform Distribution
    4.3 The Bernoulli Distribution
    4.4 The Categorical Distribution
    4.5 The Geometric Distribution
    4.6 The Binomial Distribution
    4.7 The Multinomial Distribution
    4.8 The Exponential Distribution
    4.9 The Poisson Distribution
    4.10 The Normal Distribution  4.10.0.1 Closure Properties of the Normal Distribution Family  4.10.1 Multivariate Normal Distribution: Independent Attributes  4.10.2 Multivariate Normal Distribution: Dependent Attributes
    4.11 The Student's t-Distribution
    4.12 The χ2-Distribution  4.12.1 Application: Mahalanobis Method for Outlier Detection
    4.13 Mixture Distributions: The Realistic View  4.13.1 Why Mixtures are Ubiquitous: A Motivating Example  4.13.2 The Basic Generative Process of a Mixture Model  4.13.3 Some Useful Results for Prediction  4.13.4 The Conditional Independence Assumption
    4.14 Moments of Random Variables (*)  4.14.1 Central and Standardized Moments  4.14.2 Moment Generating Functions
    4.15 Summary
    4.16 Further Reading
    4.17 Exercises

5 Hypothesis Testing and Confidence Intervals
    5.1 Introduction  5.1.1 Chapter Organization
    5.2 The Central Limit Theorem
    5.3 Sampling Distribution and Standard Error
    5.4 The Basics of Hypothesis Testing  5.4.1 Confidence Intervals  5.4.2 When Population Standard Deviations Are Not Available  5.4.3 The One-Tailed Hypothesis Test
    5.5 Hypothesis Tests For Differences in Means  5.5.1 Unequal Variance t-Test  5.5.1.1 Tightening the Degrees of Freedom  5.5.2 Equal Variance t-Test  5.5.3 Paired t-Test
    5.6 χ2-Hypothesis Tests  5.6.1 Standard Deviation Hypothesis Test  5.6.2 χ2-Goodness-of-Fit Test  5.6.3 Independence Tests
    5.7 Analysis of Variance (ANOVA)
    5.8 Machine Learning Applications of Hypothesis Testing  5.8.1 Evaluating the Performance of a Single Classifier  5.8.2 Comparing Two Classifiers  5.8.3 χ2-Statistic for Feature Selection in Text  5.8.4 Fisher Discriminant Index for Feature Selection  5.8.5 Fisher Discriminant Index for Classification (*)  5.8.5.1 Most Discriminating Direction for the Two-Class Case  5.8.5.2 Most Discriminating Direction for Multiple Classes
    5.9 Summary
    5.10 Further Reading
    5.11 Exercises

6 Reconstructing Probability Distributions
    6.1 Introduction  6.1.1 Chapter Organization
    6.2 Maximum Likelihood Estimation  6.2.1 Comparing Likelihoods with Posteriors
    6.3 Reconstructing Common Distributions from Data  6.3.1 The Uniform Distribution  6.3.2 The Bernoulli Distribution  6.3.3 The Geometric Distribution  6.3.4 The Binomial Distribution  6.3.5 The Multinomial Distribution  6.3.6 The Exponential Distribution  6.3.7 The Poisson Distribution  6.3.8 The Normal Distribution  6.3.9 Multivariate Distributions with Dimension Independence  6.3.10 Gaussian Distribution with Dimension Dependence
    6.4 Mixture of Distributions: The EM Algorithm
    6.5 Kernel Density Estimation
    6.6 Reducing Reconstruction Variance  6.6.1 Variance in Maximum Likelihood Estimation  6.6.2 Prior Beliefs with Maximum A Posteriori (MAP) Estimation  6.6.2.1 Example: Laplacian Smoothing  6.6.3 Kernel Density Estimation: Role of Bandwidth
    6.7 The Bias-Variance Trade-Off
    6.8 Popular Distributions Used as Conjugate Priors (*)  6.8.1 Gamma Distribution  6.8.2 Beta Distribution  6.8.3 Dirichlet Distribution
    6.9 Summary
    6.10 Further Reading
    6.11 Exercises

7 Regression
    7.1 Introduction  7.1.1 Chapter Organization
    7.2 The Basics of Regression  7.2.1 Interpreting the Coefficients  7.2.2 Feature Engineering Trick for Dropping Bias  7.2.3 Regression: A Central Problem in Statistics and Linear Algebra
    7.3 Two Perspectives on Linear Regression  7.3.1 The Linear Algebra Perspective  7.3.2 The Probabilistic Perspective  7.3.2.1 Example: Regression with L1-Loss
    7.4 Solutions to Linear Regression  7.4.1 Closed-Form Solution to Squared-Loss Regression  7.4.2 The Case of One Non-Trivial Predictor Variable  7.4.3 Solution with Gradient Descent for Squared Loss  7.4.3.1 Stochastic Gradient Descent  7.4.4 Gradient Descent For L1-Loss Regression
    7.5 Handling Categorical Predictors
    7.6 Overfitting and Regularization  7.6.1 Closed-Form Solution for Regularized Formulation  7.6.2 Solution Based on Gradient Descent  7.6.3 LASSO Regularization
    7.7 A Probabilistic View of Regularization
    7.8 Evaluating Linear Regression  7.8.1 Evaluating In-Sample Properties of Regression  7.8.1.1 Correlation Versus R2-Statistic  7.8.2 Out-of-Sample Evaluation
    7.9 Nonlinear Regression  7.9.1 Interpretable Feature Engineering  7.9.2 Explicit Feature Engineering with Similarity Matrices  7.9.3 Implicit Feature Engineering with Similarity Matrices
    7.10 Summary
    7.11 Further Reading
    7.12 Exercises

8 Classification: A Probabilistic View
    8.1 Introduction  8.1.1 Chapter Organization
    8.2 Generative Probabilistic Models  8.2.1 Continuous Numeric Data: The Gaussian Distribution  8.2.1.1 Prediction  8.2.1.2 Handling Overfitting  8.2.2 Binary Data: The Bernoulli Distribution  8.2.2.1 Prediction  8.2.2.2 Handling Overfitting  8.2.3 Sparse Numeric Data: The Multinomial Distribution  8.2.3.1 Prediction  8.2.3.2 Handling Overfitting  8.2.3.3 Extending Multinomial Distributions to Real-Valued Data  8.2.4 Plate Diagrams for Generative Processes
    8.3 Loss-Based Formulations: A Probabilistic View  8.3.1 Least-Squares Classification  8.3.1.1 The Probabilistic Interpretation and Its Problems  8.3.1.2 Practical Issues with Least Squares Classification  8.3.2 Logistic Regression  8.3.2.1 Maximum Likelihood Estimation for Logistic Regression  8.3.2.2 Gradient Descent and Stochastic Gradient Descent  8.3.2.3 Interpreting Updates in Terms of Error Probabilities  8.3.3 Multinomial Logistic Regression  8.3.3.1 The Probabilistic Model  8.3.3.2 Maximum Likelihood Estimation  8.3.3.3 Gradient Descent and Stochastic Gradient Descent  8.3.3.4 Probabilistic Interpretation of Gradient Descent Updates
    8.4 Beyond Classification: Ordered Logit Model  8.4.1 Maximum Likelihood Estimation for Ordered Logit
    8.5 Summary
    8.6 Further Reading
    8.7 Exercises

9 Unsupervised Learning: A Probabilistic View
    9.1 Introduction  9.1.1 Chapter Organization
    9.2 Mixture Models for Clustering  9.2.1 Continuous Numeric Data: The Gaussian Distribution  9.2.2 Binary Data: The Bernoulli Distribution  9.2.3 Sparse Numeric Data: The Multinomial Distribution
    9.3 Matrix Factorization  9.3.1 The Squared Loss Model  9.3.1.1 Probabilistic Interpretation of Squared Loss  9.3.1.2 Regularization  9.3.1.3 Application to Incomplete Data: Recommender Systems  9.3.2 Probabilistic Latent Semantic Analysis  9.3.2.1 Example of PLSA  9.3.2.2 Alternative Plate Diagram for PLSA  9.3.3 Logistic Matrix Factorization  9.3.4 Gradient Descent Steps for Logistic Matrix Factorization
    9.4 Outlier Detection  9.4.1 The Mahalanobis Method: A Probabilistic View of Whitening  9.4.2 Mixture Models in Outlier Detection  9.4.3 Matrix Factorization for Outlier Detection  9.4.3.1 Outlier Detection in Incomplete Matrices
    9.5 Summary
    9.6 Further Reading
    9.7 Exercises

10 Discrete State Markov Processes
    10.1 Introduction  10.1.1 Chapter Organization
    10.2 Markov Chains  10.2.1 Steady-State Behavior of Markov Chains  10.2.2 Transient Behavior of Markov Chains  10.2.2.1 Transitory Behavior with Probabilistic Termination  10.2.3 Periodic Markov Chains  10.2.4 Ergodicity  10.2.4.1 Alternative Characterization of Ergodicity  10.2.5 Different Cases of Ergodicity and Non-Ergodicity  10.2.6 Properties and Applications of Non-Ergodic Markov Chains  10.2.7 Probabilities of Absorbing Outcomes  10.2.8 The View from Matrix Algebra (*)
    10.3 Machine Learning Applications of Markov Chains  10.3.1 PageRank  10.3.1.1 Application to Undirected Networks  10.3.1.2 Personalized PageRank  10.3.2 Application to Vertex Classification
    10.4 Markov Chains to Generative Models
    10.5 Hidden Markov Models  10.5.1 Formal Definition and Techniques for HMMs  10.5.2 Evaluation: Computing the Fit Probability for Observed Sequence  10.5.3 Explanation: Determining the Most Likely State Sequence for Observed Sequence  10.5.4 Training: Baum-Welch Algorithm
    10.6 Applications of Hidden Markov Models  10.6.1 Mixture of HMMs for Clustering  10.6.2 Outlier Detection  10.6.3 Classification
    10.7 Summary
    10.8 Further Reading
    10.9 Exercises

11 Probabilistic Inequalities and Approximations
    11.1 Introduction  11.1.1 Chapter Organization
    11.2 Jensen's Inequality
    11.3 Markov and Chebychev Inequalities
    11.4 Approximations for Sums of Random Variables  11.4.1 The Chernoff Bound  11.4.2 The Normal Approximation to the Binomial Distribution  11.4.3 The Poisson Approximation to the Binomial Distribution  11.4.4 The Hoeffding Inequality
    11.5 Tail Inequalities Versus Approximation Estimates
    11.6 Summary
    11.7 Further Reading
    11.8 Exercises

References
Index