Edition: 2
Authors: Theodoridis S
Series:
ISBN: 9780128188033
Publisher: Elsevier
Publication year: 2020
Number of pages: 1146
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 17 MB
If you would like the file of Machine Learning: A Bayesian and Optimization Perspective converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert it for you.
Please note that Machine Learning: A Bayesian and Optimization Perspective is the original-language (English) edition and is not a Persian translation. The International Library website offers only original-language books and does not carry any titles translated into or written in Persian.
Machine Learning: A Bayesian and Optimization Perspective, 2nd edition, gives a unified perspective on machine learning by covering both pillars of supervised learning, namely regression and classification. The book starts with the basics, including mean square, least squares and maximum likelihood methods, ridge regression, Bayesian decision theory classification, logistic regression, and decision trees. It then progresses to more recent techniques, covering sparse modeling methods, learning in reproducing kernel Hilbert spaces and support vector machines, Bayesian inference with a focus on the EM algorithm and its variational approximate inference versions, Monte Carlo methods, and probabilistic graphical models focusing on Bayesian networks, hidden Markov models and particle filtering. Dimensionality reduction and latent variable modeling are also considered in depth. This palette of techniques concludes with an extended chapter on neural networks and deep learning architectures. The book also covers the fundamentals of statistical parameter estimation, Wiener and Kalman filtering, and convexity and convex optimization, including a chapter on stochastic approximation and the gradient descent family of algorithms, presenting related online learning techniques as well as concepts and algorithmic versions for distributed optimization. Focusing on the physical reasoning behind the mathematics, without sacrificing rigor, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. Most of the chapters include typical case studies and computer exercises, both in MATLAB and Python. The chapters are written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as courses on sparse modeling, deep learning, and probabilistic graphical models.

New to this edition: Complete rewrite of the chapter on Neural Networks and Deep Learning to reflect the latest advances since the 1st edition. The chapter, starting from the basic perceptron and feed-forward neural network concepts, now presents an in-depth treatment of deep networks, including recent optimization algorithms, batch normalization, regularization techniques such as the dropout method, convolutional neural networks, recurrent neural networks, attention mechanisms, adversarial examples and training, capsule networks, and generative architectures such as restricted Boltzmann machines (RBMs), variational autoencoders, and generative adversarial networks (GANs). Expanded treatment of Bayesian learning to include nonparametric Bayesian methods, with a focus on the Chinese restaurant and the Indian buffet processes.
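The description above names least squares and ridge regression among the basics the book starts with. As a rough, self-contained illustration only (not taken from the book's own MATLAB/Python exercises), the following NumPy sketch shows both closed-form estimators on hypothetical toy data; the variable names and data are assumptions for the example.

# A minimal, hypothetical sketch (not from the book) of ordinary least squares
# and ridge regression in closed form, using NumPy.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5                               # number of samples and features
X = rng.normal(size=(n, d))                 # design matrix
w_true = rng.normal(size=d)                 # "ground truth" parameters
y = X @ w_true + 0.1 * rng.normal(size=n)   # noisy linear observations

# Ordinary least squares: minimize ||y - X w||^2
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge regression: minimize ||y - X w||^2 + lam * ||w||^2
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("least-squares estimate:", np.round(w_ls, 3))
print("ridge estimate:        ", np.round(w_ridge, 3))

For lam = 0 the ridge solution coincides with ordinary least squares; increasing lam shrinks the coefficients, which is the regularization tradeoff developed in the book's chapters on parametric modeling and the least-squares family.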
Contents
About the Author
Preface
Acknowledgments
Notation
1 Introduction 1.1 The Historical Context 1.2 Artificial Intelligence and Machine Learning 1.3 Algorithms Can Learn What Is Hidden in the Data 1.4 Typical Applications of Machine Learning Speech Recognition Computer Vision Multimodal Data Natural Language Processing Robotics Autonomous Cars Challenges for the Future 1.5 Machine Learning: Major Directions 1.5.1 Supervised Learning Classification Regression 1.6 Unsupervised and Semisupervised Learning 1.7 Structure and a Road Map of the Book References
2 Probability and Stochastic Processes 2.1 Introduction 2.2 Probability and Random Variables 2.2.1 Probability Relative Frequency Definition Axiomatic Definition 2.2.2 Discrete Random Variables Joint and Conditional Probabilities Bayes Theorem 2.2.3 Continuous Random Variables 2.2.4 Mean and Variance Complex Random Variables 2.2.5 Transformation of Random Variables 2.3 Examples of Distributions 2.3.1 Discrete Variables The Bernoulli Distribution The Binomial Distribution The Multinomial Distribution 2.3.2 Continuous Variables The Uniform Distribution The Gaussian Distribution The Central Limit Theorem The Exponential Distribution The Beta Distribution The Gamma Distribution The Dirichlet Distribution 2.4 Stochastic Processes 2.4.1 First- and Second-Order Statistics 2.4.2 Stationarity and Ergodicity 2.4.3 Power Spectral Density Properties of the Autocorrelation Sequence Power Spectral Density Transmission Through a Linear System Physical Interpretation of the PSD 2.4.4 Autoregressive Models 2.5 Information Theory 2.5.1 Discrete Random Variables Information Mutual and Conditional Information Entropy and Average Mutual Information 2.5.2 Continuous Random Variables Average Mutual Information and Conditional Information Relative Entropy or Kullback-Leibler Divergence 2.6 Stochastic Convergence Convergence Everywhere Convergence Almost Everywhere Convergence in the Mean-Square Sense Convergence in Probability Convergence in Distribution Problems References
3 Learning in Parametric Modeling: Basic Concepts and Directions 3.1 Introduction 3.2 Parameter Estimation: the Deterministic Point of View 3.3 Linear Regression 3.4 Classification Generative Versus Discriminative Learning 3.5 Biased Versus Unbiased Estimation 3.5.1 Biased or Unbiased Estimation? 3.6 The Cramér-Rao Lower Bound 3.7 Sufficient Statistic 3.8 Regularization Inverse Problems: Ill-Conditioning and Overfitting 3.9 The Bias-Variance Dilemma 3.9.1 Mean-Square Error Estimation 3.9.2 Bias-Variance Tradeoff 3.10 Maximum Likelihood Method 3.10.1 Linear Regression: the Nonwhite Gaussian Noise Case 3.11 Bayesian Inference 3.11.1 The Maximum a Posteriori Probability Estimation Method 3.12 Curse of Dimensionality 3.13 Validation Cross-Validation 3.14 Expected Loss and Empirical Risk Functions Learnability 3.15 Nonparametric Modeling and Estimation Problems MATLAB® Exercises References
4 Mean-Square Error Linear Estimation 4.1 Introduction 4.2 Mean-Square Error Linear Estimation: the Normal Equations 4.2.1 The Cost Function Surface 4.3 A Geometric Viewpoint: Orthogonality Condition 4.4 Extension to Complex-Valued Variables 4.4.1 Widely Linear Complex-Valued Estimation Circularity Conditions 4.4.2 Optimizing With Respect to Complex-Valued Variables: Wirtinger Calculus 4.5 Linear Filtering 4.6 MSE Linear Filtering: a Frequency Domain Point of View Deconvolution: Image Deblurring 4.7 Some Typical Applications 4.7.1 Interference Cancelation 4.7.2 System Identification 4.7.3 Deconvolution: Channel Equalization 4.8 Algorithmic Aspects: the Levinson and Lattice-Ladder Algorithms Forward and Backward MSE Optimal Predictors 4.8.1 The Lattice-Ladder Scheme Orthogonality of the Optimal Backward Errors 4.9 Mean-Square Error Estimation of Linear Models 4.9.1 The Gauss-Markov Theorem 4.9.2 Constrained Linear Estimation: the Beamforming Case 4.10 Time-Varying Statistics: Kalman Filtering Problems MATLAB® Exercises References
5 Online Learning: the Stochastic Gradient Descent Family of Algorithms 5.1 Introduction 5.2 The Steepest Descent Method 5.3 Application to the Mean-Square Error Cost Function Time-Varying Step Sizes 5.3.1 The Complex-Valued Case 5.4 Stochastic Approximation Application to the MSE Linear Estimation 5.5 The Least-Mean-Squares Adaptive Algorithm 5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments Convergence of the Parameter Error Vector 5.5.2 Cumulative Loss Bounds 5.6 The Affine Projection Algorithm Geometric Interpretation of APA Orthogonal Projections 5.6.1 The Normalized LMS 5.7 The Complex-Valued Case The Widely Linear LMS The Widely Linear APA 5.8 Relatives of the LMS The Sign-Error LMS The Least-Mean-Fourth (LMF) Algorithm Transform-Domain LMS 5.9 Simulation Examples 5.10 Adaptive Decision Feedback Equalization 5.11 The Linearly Constrained LMS 5.12 Tracking Performance of the LMS in Nonstationary Environments 5.13 Distributed Learning: the Distributed LMS 5.13.1 Cooperation Strategies Centralized Networks Decentralized Networks 5.13.2 The Diffusion LMS 5.13.3 Convergence and Steady-State Performance: Some Highlights 5.13.4 Consensus-Based Distributed Schemes 5.14 A Case Study: Target Localization 5.15 Some Concluding Remarks: Consensus Matrix Problems MATLAB® Exercises References
6 The Least-Squares Family 6.1 Introduction 6.2 Least-Squares Linear Regression: a Geometric Perspective 6.3 Statistical Properties of the LS Estimator The LS Estimator Is Unbiased Covariance Matrix of the LS Estimator The LS Estimator Is BLUE in the Presence of White Noise The LS Estimator Achieves the Cramér-Rao Bound for White Gaussian Noise Asymptotic Distribution of the LS Estimator 6.4 Orthogonalizing the Column Space of the Input Matrix: the SVD Method Pseudoinverse Matrix and SVD 6.5 Ridge Regression: a Geometric Point of View Principal Components Regression 6.6 The Recursive Least-Squares Algorithm Time-Iterative Computations Time Updating of the Parameters 6.7 Newton's Iterative Minimization Method 6.7.1 RLS and Newton's Method 6.8 Steady-State Performance of the RLS 6.9 Complex-Valued Data: the Widely Linear RLS 6.10 Computational Aspects of the LS Solution Cholesky Factorization QR Factorization Fast RLS Versions 6.11 The Coordinate and Cyclic Coordinate Descent Methods 6.12 Simulation Examples 6.13 Total Least-Squares Geometric Interpretation of the Total Least-Squares Method Problems MATLAB® Exercises References
7 Classification: a Tour of the Classics 7.1 Introduction 7.2 Bayesian Classification The Bayesian Classifier Minimizes the Misclassification Error 7.2.1 Average Risk 7.3 Decision (Hyper)Surfaces 7.3.1 The Gaussian Distribution Case Minimum Distance Classifiers 7.4 The Naive Bayes Classifier 7.5 The Nearest Neighbor Rule 7.6 Logistic Regression 7.7 Fisher's Linear Discriminant 7.7.1 Scatter Matrices 7.7.2 Fisher's Discriminant: the Two-Class Case 7.7.3 Fisher's Discriminant: the Multiclass Case 7.8 Classification Trees 7.9 Combining Classifiers No Free Lunch Theorem Some Experimental Comparisons Schemes for Combining Classifiers 7.10 The Boosting Approach The AdaBoost Algorithm The Log-Loss Function 7.11 Boosting Trees Problems MATLAB® Exercises References
8 Parameter Learning: a Convex Analytic Path 8.1 Introduction 8.2 Convex Sets and Functions 8.2.1 Convex Sets 8.2.2 Convex Functions 8.3 Projections Onto Convex Sets 8.3.1 Properties of Projections 8.4 Fundamental Theorem of Projections Onto Convex Sets 8.5 A Parallel Version of POCS 8.6 From Convex Sets to Parameter Estimation and Machine Learning 8.6.1 Regression 8.6.2 Classification 8.7 Infinitely Many Closed Convex Sets: the Online Learning Case 8.7.1 Convergence of APSM Some Practical Hints 8.8 Constrained Learning 8.9 The Distributed APSM 8.10 Optimizing Nonsmooth Convex Cost Functions 8.10.1 Subgradients and Subdifferentials 8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: the Batch Learning Case The Subgradient Method The Generic Projected Subgradient Scheme The Projected Gradient Method (PGM) Projected Subgradient Method 8.10.3 Online Learning for Convex Optimization The PEGASOS Algorithm 8.11 Regret Analysis Regret Analysis of the Subgradient Algorithm 8.12 Online Learning and Big Data Applications: a Discussion Approximation, Estimation, and Optimization Errors Batch Versus Online Learning 8.13 Proximal Operators 8.13.1 Properties of the Proximal Operator 8.13.2 Proximal Minimization Resolvent of the Subdifferential Mapping 8.14 Proximal Splitting Methods for Optimization The Proximal Forward-Backward Splitting Operator Alternating Direction Method of Multipliers (ADMM) Mirror Descent Algorithms 8.15 Distributed Optimization: Some Highlights Problems MATLAB® Exercises References
9 Sparsity-Aware Learning: Concepts and Theoretical Foundations 9.1 Introduction 9.2 Searching for a Norm 9.3 The Least Absolute Shrinkage and Selection Operator (LASSO) 9.4 Sparse Signal Representation 9.5 In Search of the Sparsest Solution The l2 Norm Minimizer The l0 Norm Minimizer The l1 Norm Minimizer Characterization of the l1 Norm Minimizer Geometric Interpretation 9.6 Uniqueness of the l0 Minimizer 9.6.1 Mutual Coherence 9.7 Equivalence of l0 and l1 Minimizers: Sufficiency Conditions 9.7.1 Condition Implied by the Mutual Coherence Number 9.7.2 The Restricted Isometry Property (RIP) Constructing Matrices That Obey the RIP of Order k 9.8 Robust Sparse Signal Recovery From Noisy Measurements 9.9 Compressed Sensing: the Glory of Randomness Compressed Sensing 9.9.1 Dimensionality Reduction and Stable Embeddings 9.9.2 Sub-Nyquist Sampling: Analog-to-Information Conversion 9.10 A Case Study: Image Denoising Problems MATLAB® Exercises References
10 Sparsity-Aware Learning: Algorithms and Applications 10.1 Introduction 10.2 Sparsity Promoting Algorithms 10.2.1 Greedy Algorithms OMP Can Recover Optimal Sparse Solutions: Sufficiency Condition The LARS Algorithm Compressed Sensing Matching Pursuit (CSMP) Algorithms 10.2.2 Iterative Shrinkage/Thresholding (IST) Algorithms 10.2.3 Which Algorithm? Some Practical Hints 10.3 Variations on the Sparsity-Aware Theme 10.4 Online Sparsity Promoting Algorithms 10.4.1 LASSO: Asymptotic Performance 10.4.2 The Adaptive Norm-Weighted LASSO 10.4.3 Adaptive CoSaMP Algorithm 10.4.4 Sparse-Adaptive Projection Subgradient Method Projection Onto the Weighted l1 Ball 10.5 Learning Sparse Analysis Models 10.5.1 Compressed Sensing for Sparse Signal Representation in Coherent Dictionaries 10.5.2 Cosparsity 10.6 A Case Study: Time-Frequency Analysis Gabor Transform and Frames Time-Frequency Resolution Gabor Frames Time-Frequency Analysis of Echolocation Signals Emitted by Bats Problems MATLAB® Exercises References
11 Learning in Reproducing Kernel Hilbert Spaces 11.1 Introduction 11.2 Generalized Linear Models 11.3 Volterra, Wiener, and Hammerstein Models 11.4 Cover's Theorem: Capacity of a Space in Linear Dichotomies 11.5 Reproducing Kernel Hilbert Spaces 11.5.1 Some Properties and Theoretical Highlights 11.5.2 Examples of Kernel Functions Constructing Kernels String Kernels 11.6 Representer Theorem 11.6.1 Semiparametric Representer Theorem 11.6.2 Nonparametric Modeling: a Discussion 11.7 Kernel Ridge Regression 11.8 Support Vector Regression 11.8.1 The Linear ε-Insensitive Optimal Regression The Solution Solving the Optimization Task 11.9 Kernel Ridge Regression Revisited 11.10 Optimal Margin Classification: Support Vector Machines 11.10.1 Linearly Separable Classes: Maximum Margin Classifiers The Solution The Optimization Task 11.10.2 Nonseparable Classes The Solution The Optimization Task 11.10.3 Performance of SVMs and Applications 11.10.4 Choice of Hyperparameters 11.10.5 Multiclass Generalizations 11.11 Computational Considerations 11.12 Random Fourier Features 11.12.1 Online and Distributed Learning in RKHS 11.13 Multiple Kernel Learning 11.14 Nonparametric Sparsity-Aware Learning: Additive Models 11.15 A Case Study: Authorship Identification Problems MATLAB® Exercises References
12 Bayesian Learning: Inference and the EM Algorithm 12.1 Introduction 12.2 Regression: a Bayesian Perspective 12.2.1 The Maximum Likelihood Estimator 12.2.2 The MAP Estimator 12.2.3 The Bayesian Approach 12.3 The Evidence Function and Occam's Razor Rule Laplacian Approximation and the Evidence Function 12.4 Latent Variables and the EM Algorithm 12.4.1 The Expectation-Maximization Algorithm 12.5 Linear Regression and the EM Algorithm 12.6 Gaussian Mixture Models 12.6.1 Gaussian Mixture Modeling and Clustering 12.7 The EM Algorithm: a Lower Bound Maximization View 12.8 Exponential Family of Probability Distributions 12.8.1 The Exponential Family and the Maximum Entropy Method 12.9 Combining Learning Models: a Probabilistic Point of View 12.9.1 Mixing Linear Regression Models Mixture of Experts Hierarchical Mixture of Experts 12.9.2 Mixing Logistic Regression Models Problems MATLAB® Exercises References
13 Bayesian Learning: Approximate Inference and Nonparametric Models 13.1 Introduction 13.2 Variational Approximation in Bayesian Learning The Mean Field Approximation 13.2.1 The Case of the Exponential Family of Probability Distributions 13.3 A Variational Bayesian Approach to Linear Regression Computation of the Lower Bound 13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling 13.5 When Bayesian Inference Meets Sparsity 13.6 Sparse Bayesian Learning (SBL) 13.6.1 The Spike and Slab Method 13.7 The Relevance Vector Machine Framework 13.7.1 Adopting the Logistic Regression Model for Classification 13.8 Convex Duality and Variational Bounds 13.9 Sparsity-Aware Regression: a Variational Bound Bayesian Path Sparsity-Aware Learning: Some Concluding Remarks 13.10 Expectation Propagation Minimizing the KL Divergence The Expectation Propagation Algorithm 13.11 Nonparametric Bayesian Modeling 13.11.1 The Chinese Restaurant Process 13.11.2 Dirichlet Processes Predictive Distribution and the Pólya Urn Model Chinese Restaurant Process Revisited 13.11.3 The Stick Breaking Construction of a DP 13.11.4 Dirichlet Process Mixture Modeling Inference 13.11.5 The Indian Buffet Process Searching for a Prior on Infinite Binary Matrices Restaurant Construction Stick Breaking Construction Inference 13.12 Gaussian Processes 13.12.1 Covariance Functions and Kernels 13.12.2 Regression Dealing With Hyperparameters Computational Considerations 13.12.3 Classification 13.13 A Case Study: Hyperspectral Image Unmixing 13.13.1 Hierarchical Bayesian Modeling 13.13.2 Experimental Results Problems MATLAB® Exercises References
14 Monte Carlo Methods 14.1 Introduction 14.2 Monte Carlo Methods: the Main Concept 14.2.1 Random Number Generation 14.3 Random Sampling Based on Function Transformation 14.4 Rejection Sampling 14.5 Importance Sampling 14.6 Monte Carlo Methods and the EM Algorithm 14.7 Markov Chain Monte Carlo Methods 14.7.1 Ergodic Markov Chains 14.8 The Metropolis Method 14.8.1 Convergence Issues 14.9 Gibbs Sampling 14.10 In Search of More Efficient Methods: a Discussion Variational Inference or Monte Carlo Methods 14.11 A Case Study: Change-Point Detection Problems MATLAB® Exercise References
15 Probabilistic Graphical Models: Part I 15.1 Introduction 15.2 The Need for Graphical Models 15.3 Bayesian Networks and the Markov Condition 15.3.1 Graphs: Basic Definitions 15.3.2 Some Hints on Causality 15.3.3 d-Separation 15.3.4 Sigmoidal Bayesian Networks 15.3.5 Linear Gaussian Models 15.3.6 Multiple-Cause Networks 15.3.7 I-Maps, Soundness, Faithfulness, and Completeness 15.4 Undirected Graphical Models 15.4.1 Independencies and I-Maps in Markov Random Fields 15.4.2 The Ising Model and Its Variants 15.4.3 Conditional Random Fields (CRFs) 15.5 Factor Graphs 15.5.1 Graphical Models for Error Correcting Codes 15.6 Moralization of Directed Graphs 15.7 Exact Inference Methods: Message Passing Algorithms 15.7.1 Exact Inference in Chains 15.7.2 Exact Inference in Trees 15.7.3 The Sum-Product Algorithm 15.7.4 The Max-Product and Max-Sum Algorithms Problems References
16 Probabilistic Graphical Models: Part II 16.1 Introduction 16.2 Triangulated Graphs and Junction Trees 16.2.1 Constructing a Join Tree 16.2.2 Message Passing in Junction Trees 16.3 Approximate Inference Methods 16.3.1 Variational Methods: Local Approximation Multiple-Cause Networks and the Noisy-OR Model The Boltzmann Machine 16.3.2 Block Methods for Variational Approximation The Mean Field Approximation and the Boltzmann Machine 16.3.3 Loopy Belief Propagation 16.4 Dynamic Graphical Models 16.5 Hidden Markov Models 16.5.1 Inference The Sum-Product Algorithm: the HMM Case 16.5.2 Learning the Parameters in an HMM 16.5.3 Discriminative Learning 16.6 Beyond HMMs: a Discussion 16.6.1 Factorial Hidden Markov Models 16.6.2 Time-Varying Dynamic Bayesian Networks 16.7 Learning Graphical Models 16.7.1 Parameter Estimation 16.7.2 Learning the Structure Problems References
17 Particle Filtering 17.1 Introduction 17.2 Sequential Importance Sampling 17.2.1 Importance Sampling Revisited 17.2.2 Resampling 17.2.3 Sequential Sampling 17.3 Kalman and Particle Filtering 17.3.1 Kalman Filtering: a Bayesian Point of View 17.4 Particle Filtering 17.4.1 Degeneracy 17.4.2 Generic Particle Filtering 17.4.3 Auxiliary Particle Filtering Problems MATLAB® Exercises References
18 Neural Networks and Deep Learning 18.1 Introduction 18.2 The Perceptron 18.3 Feed-Forward Multilayer Neural Networks 18.3.1 Fully Connected Networks 18.4 The Backpropagation Algorithm Nonconvexity of the Cost Function 18.4.1 The Gradient Descent Backpropagation Scheme Pattern-by-Pattern/Online Scheme Minibatch Schemes 18.4.2 Variants of the Basic Gradient Descent Scheme Gradient Descent With a Momentum Term Nesterov's Momentum Algorithm The AdaGrad Algorithm The RMSProp With Nesterov Momentum The Adaptive Moment Estimation Algorithm (Adam) Some Practical Hints Batch Normalization 18.4.3 Beyond the Gradient Descent Rationale 18.5 Selecting a Cost Function 18.6 Vanishing and Exploding Gradients 18.6.1 The Rectified Linear Unit 18.7 Regularizing the Network Dropout 18.8 Designing Deep Neural Networks: a Summary 18.9 Universal Approximation Property of Feed-Forward Neural Networks 18.10 Neural Networks: a Bayesian Flavor 18.11 Shallow Versus Deep Architectures 18.11.1 The Power of Deep Architectures On the Representation Properties of Deep Networks Distributed Representations On the Optimization of Deep Networks: Some Theoretical Highlights On the Generalization Power of Deep Networks 18.12 Convolutional Neural Networks 18.12.1 The Need for Convolutions The Convolution Step The Nonlinearity Step The Pooling Step 18.12.2 Convolution Over Volumes Network in Network and 1x1 Convolution 18.12.3 The Full CNN Architecture What Deep Neural Networks Learn 18.12.4 CNNs: the Epilogue 18.13 Recurrent Neural Networks 18.13.1 Backpropagation Through Time Vanishing and Exploding Gradients The Long Short-Term Memory (LSTM) Network 18.13.2 Attention and Memory 18.14 Adversarial Examples Adversarial Training 18.15 Deep Generative Models 18.15.1 Restricted Boltzmann Machines 18.15.2 Pretraining Deep Feed-Forward Networks 18.15.3 Deep Belief Networks 18.15.4 Autoencoders 18.15.5 Generative Adversarial Networks On the Optimality of the Solution Problems in Training GANs The Wasserstein GAN Which Algorithm Then 18.15.6 Variational Autoencoders 18.16 Capsule Networks Training 18.17 Deep Neural Networks: Some Final Remarks Transfer Learning Multitask Learning Geometric Deep Learning Open Problems 18.18 A Case Study: Neural Machine Translation 18.19 Problems Computer Exercises References
19 Dimensionality Reduction and Latent Variable Modeling 19.1 Introduction 19.2 Intrinsic Dimensionality 19.3 Principal Component Analysis PCA, SVD, and Low Rank Matrix Factorization Minimum Error Interpretation PCA and Information Retrieval Orthogonalizing Properties of PCA and Feature Generation Latent Variables 19.4 Canonical Correlation Analysis 19.4.1 Relatives of CCA Partial Least-Squares 19.5 Independent Component Analysis 19.5.1 ICA and Gaussianity 19.5.2 ICA and Higher-Order Cumulants ICA Ambiguities 19.5.3 Non-Gaussianity and Independent Components 19.5.4 ICA Based on Mutual Information 19.5.5 Alternative Paths to ICA The Cocktail Party Problem 19.6 Dictionary Learning: the k-SVD Algorithm Why the Name k-SVD? Dictionary Learning and Dictionary Identifiability 19.7 Nonnegative Matrix Factorization 19.8 Learning Low-Dimensional Models: a Probabilistic Perspective 19.8.1 Factor Analysis 19.8.2 Probabilistic PCA 19.8.3 Mixture of Factors Analyzers: a Bayesian View to Compressed Sensing 19.9 Nonlinear Dimensionality Reduction 19.9.1 Kernel PCA 19.9.2 Graph-Based Methods Laplacian Eigenmaps Local Linear Embedding (LLE) Isometric Mapping (ISOMAP) 19.10 Low Rank Matrix Factorization: a Sparse Modeling Path 19.10.1 Matrix Completion 19.10.2 Robust PCA 19.10.3 Applications of Matrix Completion and Robust PCA Matrix Completion Robust PCA/PCP 19.11 A Case Study: FMRI Data Analysis Problems MATLAB® Exercises References
Index