
Download the book Machine Learning: A Bayesian and Optimization Perspective

Book details

Machine Learning: A Bayesian and Optimization Perspective

Edition: 2
Authors:
Series:
ISBN: 9780128188033
Publisher: Elsevier
Publication year: 2020
Number of pages: 1146
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 17 MB

Book price (Toman): 55,000





If you would like the file of Machine Learning: A Bayesian and Optimization Perspective converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert the file for you.

Please note that Machine Learning: A Bayesian and Optimization Perspective is the original-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.



Book description

Machine Learning: A Bayesian and Optimization Perspective, 2nd edition, gives a unified perspective on machine learning by covering both pillars of supervised learning, namely regression and classification. The book starts with the basics, including mean square, least squares and maximum likelihood methods, ridge regression, Bayesian decision theory classification, logistic regression, and decision trees. It then progresses to more recent techniques, covering sparse modelling methods, learning in reproducing kernel Hilbert spaces and support vector machines, Bayesian inference with a focus on the EM algorithm and its approximate inference variational versions, Monte Carlo methods, and probabilistic graphical models focusing on Bayesian networks, hidden Markov models and particle filtering. Dimensionality reduction and latent variable modelling are also considered in depth. This palette of techniques concludes with an extended chapter on neural networks and deep learning architectures. The book also covers the fundamentals of statistical parameter estimation, Wiener and Kalman filtering, convexity and convex optimization, including a chapter on stochastic approximation and the gradient descent family of algorithms, presenting related online learning techniques as well as concepts and algorithmic versions for distributed optimization. Focusing on the physical reasoning behind the mathematics, without sacrificing rigor, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. Most of the chapters include typical case studies and computer exercises, both in MATLAB and Python. The chapters are written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as courses on sparse modeling, deep learning, and probabilistic graphical models.

New to this edition: Complete rewrite of the chapter on Neural Networks and Deep Learning to reflect the latest advances since the 1st edition. The chapter, starting from the basic perceptron and feed-forward neural network concepts, now presents an in-depth treatment of deep networks, including recent optimization algorithms, batch normalization, regularization techniques such as the dropout method, convolutional neural networks, recurrent neural networks, attention mechanisms, adversarial examples and training, capsule networks, and generative architectures such as restricted Boltzmann machines (RBMs), variational autoencoders, and generative adversarial networks (GANs). Expanded treatment of Bayesian learning to include nonparametric Bayesian methods, with a focus on the Chinese restaurant and the Indian buffet processes.
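As a concrete taste of the basic methods listed above, the following is a minimal NumPy sketch of ridge regression in its closed form; the synthetic data, the regularization value lam, and all variable names are invented for illustration and are not taken from the book's own exercises.

    # Illustrative only: closed-form ridge regression on synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                   # 100 samples, 5 features
    w_true = np.array([1.5, -2.0, 0.0, 0.7, 3.0])   # ground-truth weights (made up)
    y = X @ w_true + 0.1 * rng.normal(size=100)     # noisy linear observations

    lam = 0.5  # regularization strength (hypothetical value)
    # Ridge estimate: w = (X^T X + lam * I)^(-1) X^T y
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(w_ridge)  # close to w_true for this well-conditioned example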



Table of contents

Contents
About the Author
Preface
Acknowledgments
Notation
1 Introduction
	1.1 The Historical Context
	1.2 Artificial Intelligence and Machine Learning
	1.3 Algorithms Can Learn What Is Hidden in the Data
	1.4 Typical Applications of Machine Learning
		Speech Recognition
		Computer Vision
		Multimodal Data
		Natural Language Processing
		Robotics
		Autonomous Cars
		Challenges for the Future
	1.5 Machine Learning: Major Directions
		1.5.1 Supervised Learning
			Classification
			Regression
	1.6 Unsupervised and Semisupervised Learning
	1.7 Structure and a Road Map of the Book
	References
2 Probability and Stochastic Processes
	2.1 Introduction
	2.2 Probability and Random Variables
		2.2.1 Probability
			Relative Frequency Definition
			Axiomatic Definition
		2.2.2 Discrete Random Variables
			Joint and Conditional Probabilities
			Bayes Theorem
		2.2.3 Continuous Random Variables
		2.2.4 Mean and Variance
			Complex Random Variables
		2.2.5 Transformation of Random Variables
	2.3 Examples of Distributions
		2.3.1 Discrete Variables
			The Bernoulli Distribution
			The Binomial Distribution
			The Multinomial Distribution
		2.3.2 Continuous Variables
			The Uniform Distribution
			The Gaussian Distribution
			The Central Limit Theorem
			The Exponential Distribution
			The Beta Distribution
			The Gamma Distribution
			The Dirichlet Distribution
	2.4 Stochastic Processes
		2.4.1 First- and Second-Order Statistics
		2.4.2 Stationarity and Ergodicity
		2.4.3 Power Spectral Density
			Properties of the Autocorrelation Sequence
			Power Spectral Density
			Transmission Through a Linear System
			Physical Interpretation of the PSD
		2.4.4 Autoregressive Models
	2.5 Information Theory
		2.5.1 Discrete Random Variables
			Information
			Mutual and Conditional Information
			Entropy and Average Mutual Information
		2.5.2 Continuous Random Variables
			Average Mutual Information and Conditional Information
			Relative Entropy or Kullback-Leibler Divergence
	2.6 Stochastic Convergence
		Convergence Everywhere
		Convergence Almost Everywhere
		Convergence in the Mean-Square Sense
		Convergence in Probability
		Convergence in Distribution
	Problems
	References
3 Learning in Parametric Modeling: Basic Concepts and Directions
	3.1 Introduction
	3.2 Parameter Estimation: the Deterministic Point of View
	3.3 Linear Regression
	3.4 Classification
		Generative Versus Discriminative Learning
	3.5 Biased Versus Unbiased Estimation
		3.5.1 Biased or Unbiased Estimation?
	3.6 The Cramér-Rao Lower Bound
	3.7 Sufficient Statistic
	3.8 Regularization
		Inverse Problems: Ill-Conditioning and Overfitting
	3.9 The Bias-Variance Dilemma
		3.9.1 Mean-Square Error Estimation
		3.9.2 Bias-Variance Tradeoff
	3.10 Maximum Likelihood Method
		3.10.1 Linear Regression: the Nonwhite Gaussian Noise Case
	3.11 Bayesian Inference
		3.11.1 The Maximum a Posteriori Probability Estimation Method
	3.12 Curse of Dimensionality
	3.13 Validation
		Cross-Validation
	3.14 Expected Loss and Empirical Risk Functions
		Learnability
	3.15 Nonparametric Modeling and Estimation
	Problems
		MATLAB® Exercises
	References
4 Mean-Square Error Linear Estimation
	4.1 Introduction
	4.2 Mean-Square Error Linear Estimation: the Normal Equations
		4.2.1 The Cost Function Surface
	4.3 A Geometric Viewpoint: Orthogonality Condition
	4.4 Extension to Complex-Valued Variables
		4.4.1 Widely Linear Complex-Valued Estimation
			Circularity Conditions
		4.4.2 Optimizing With Respect to Complex-Valued Variables: Wirtinger Calculus
	4.5 Linear Filtering
	4.6 MSE Linear Filtering: a Frequency Domain Point of View
		Deconvolution: Image Deblurring
	4.7 Some Typical Applications
		4.7.1 Interference Cancelation
		4.7.2 System Identification
		4.7.3 Deconvolution: Channel Equalization
	4.8 Algorithmic Aspects: the Levinson and Lattice-Ladder Algorithms
		Forward and Backward MSE Optimal Predictors
		4.8.1 The Lattice-Ladder Scheme
			Orthogonality of the Optimal Backward Errors
	4.9 Mean-Square Error Estimation of Linear Models
		4.9.1 The Gauss-Markov Theorem
		4.9.2 Constrained Linear Estimation: the Beamforming Case
	4.10 Time-Varying Statistics: Kalman Filtering
	Problems
		MATLAB® Exercises
	References
5 Online Learning: the Stochastic Gradient Descent Family of Algorithms
	5.1 Introduction
	5.2 The Steepest Descent Method
	5.3 Application to the Mean-Square Error Cost Function
		Time-Varying Step Sizes
		5.3.1 The Complex-Valued Case
	5.4 Stochastic Approximation
		Application to the MSE Linear Estimation
	5.5 The Least-Mean-Squares Adaptive Algorithm
		5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments
			Convergence of the Parameter Error Vector
		5.5.2 Cumulative Loss Bounds
	5.6 The Affine Projection Algorithm
		Geometric Interpretation of APA
		Orthogonal Projections
		5.6.1 The Normalized LMS
	5.7 The Complex-Valued Case
		The Widely Linear LMS
		The Widely Linear APA
	5.8 Relatives of the LMS
		The Sign-Error LMS
		The Least-Mean-Fourth (LMF) Algorithm
		Transform-Domain LMS
	5.9 Simulation Examples
	5.10 Adaptive Decision Feedback Equalization
	5.11 The Linearly Constrained LMS
	5.12 Tracking Performance of the LMS in Nonstationary Environments
	5.13 Distributed Learning: the Distributed LMS
		5.13.1 Cooperation Strategies
			Centralized Networks
			Decentralized Networks
		5.13.2 The Diffusion LMS
		5.13.3 Convergence and Steady-State Performance: Some Highlights
		5.13.4 Consensus-Based Distributed Schemes
	5.14 A Case Study: Target Localization
	5.15 Some Concluding Remarks: Consensus Matrix
	Problems
		MATLAB® Exercises
	References
6 The Least-Squares Family
	6.1 Introduction
	6.2 Least-Squares Linear Regression: a Geometric Perspective
	6.3 Statistical Properties of the LS Estimator
		The LS Estimator Is Unbiased
		Covariance Matrix of the LS Estimator
		The LS Estimator Is BLUE in the Presence of White Noise
		The LS Estimator Achieves the Cramér-Rao Bound for White Gaussian Noise
		Asymptotic Distribution of the LS Estimator
	6.4 Orthogonalizing the Column Space of the Input Matrix: the SVD Method
		Pseudoinverse Matrix and SVD
	6.5 Ridge Regression: a Geometric Point of View
		Principal Components Regression
	6.6 The Recursive Least-Squares Algorithm
		Time-Iterative Computations
		Time Updating of the Parameters
	6.7 Newton's Iterative Minimization Method
		6.7.1 RLS and Newton's Method
	6.8 Steady-State Performance of the RLS
	6.9 Complex-Valued Data: the Widely Linear RLS
	6.10 Computational Aspects of the LS Solution
		Cholesky Factorization
		QR Factorization
		Fast RLS Versions
	6.11 The Coordinate and Cyclic Coordinate Descent Methods
	6.12 Simulation Examples
	6.13 Total Least-Squares
		Geometric Interpretation of the Total Least-Squares Method
	Problems
		MATLAB® Exercises
	References
7 Classification: a Tour of the Classics
	7.1 Introduction
	7.2 Bayesian Classification
		The Bayesian Classifier Minimizes the Misclassification Error
		7.2.1 Average Risk
	7.3 Decision (Hyper)Surfaces
		7.3.1 The Gaussian Distribution Case
			Minimum Distance Classifiers
	7.4 The Naive Bayes Classifier
	7.5 The Nearest Neighbor Rule
	7.6 Logistic Regression
	7.7 Fisher's Linear Discriminant
		7.7.1 Scatter Matrices
		7.7.2 Fisher's Discriminant: the Two-Class Case
		7.7.3 Fisher's Discriminant: the Multiclass Case
	7.8 Classification Trees
	7.9 Combining Classifiers
		No Free Lunch Theorem
		Some Experimental Comparisons
		Schemes for Combining Classifiers
	7.10 The Boosting Approach
		The AdaBoost Algorithm
		The Log-Loss Function
	7.11 Boosting Trees
	Problems
		MATLAB® Exercises
	References
8 Parameter Learning: a Convex Analytic Path
	8.1 Introduction
	8.2 Convex Sets and Functions
		8.2.1 Convex Sets
		8.2.2 Convex Functions
	8.3 Projections Onto Convex Sets
		8.3.1 Properties of Projections
	8.4 Fundamental Theorem of Projections Onto Convex Sets
	8.5 A Parallel Version of POCS
	8.6 From Convex Sets to Parameter Estimation and Machine Learning
		8.6.1 Regression
		8.6.2 Classification
	8.7 Infinitely Many Closed Convex Sets: the Online Learning Case
		8.7.1 Convergence of APSM
			Some Practical Hints
	8.8 Constrained Learning
	8.9 The Distributed APSM
	8.10 Optimizing Nonsmooth Convex Cost Functions
		8.10.1 Subgradients and Subdifferentials
		8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: the Batch Learning Case
			The Subgradient Method
			The Generic Projected Subgradient Scheme
			The Projected Gradient Method (PGM)
			Projected Subgradient Method
		8.10.3 Online Learning for Convex Optimization
			The PEGASOS Algorithm
	8.11 Regret Analysis
		Regret Analysis of the Subgradient Algorithm
	8.12 Online Learning and Big Data Applications: a Discussion
		Approximation, Estimation, and Optimization Errors
		Batch Versus Online Learning
	8.13 Proximal Operators
		8.13.1 Properties of the Proximal Operator
		8.13.2 Proximal Minimization
			Resolvent of the Subdifferential Mapping
	8.14 Proximal Splitting Methods for Optimization
		The Proximal Forward-Backward Splitting Operator
		Alternating Direction Method of Multipliers (ADMM)
		Mirror Descent Algorithms
	8.15 Distributed Optimization: Some Highlights
	Problems
		MATLAB® Exercises
	References
9 Sparsity-Aware Learning: Concepts and Theoretical Foundations
	9.1 Introduction
	9.2 Searching for a Norm
	9.3 The Least Absolute Shrinkage and Selection Operator (LASSO)
	9.4 Sparse Signal Representation
	9.5 In Search of the Sparsest Solution
		The l2 Norm Minimizer
		The l0 Norm Minimizer
		The l1 Norm Minimizer
		Characterization of the l1 Norm Minimizer
		Geometric Interpretation
	9.6 Uniqueness of the l0 Minimizer
		9.6.1 Mutual Coherence
	9.7 Equivalence of l0 and l1 Minimizers: Sufficiency Conditions
		9.7.1 Condition Implied by the Mutual Coherence Number
		9.7.2 The Restricted Isometry Property (RIP)
			Constructing Matrices That Obey the RIP of Order k
	9.8 Robust Sparse Signal Recovery From Noisy Measurements
	9.9 Compressed Sensing: the Glory of Randomness
		Compressed Sensing
		9.9.1 Dimensionality Reduction and Stable Embeddings
		9.9.2 Sub-Nyquist Sampling: Analog-to-Information Conversion
	9.10 A Case Study: Image Denoising
	Problems
		MATLAB® Exercises
	References
10 Sparsity-Aware Learning: Algorithms and Applications
	10.1 Introduction
	10.2 Sparsity Promoting Algorithms
		10.2.1 Greedy Algorithms
			OMP Can Recover Optimal Sparse Solutions: Sufficiency Condition
			The LARS Algorithm
			Compressed Sensing Matching Pursuit (CSMP) Algorithms
		10.2.2 Iterative Shrinkage/Thresholding (IST) Algorithms
		10.2.3 Which Algorithm? Some Practical Hints
	10.3 Variations on the Sparsity-Aware Theme
	10.4 Online Sparsity Promoting Algorithms
		10.4.1 LASSO: Asymptotic Performance
		10.4.2 The Adaptive Norm-Weighted LASSO
		10.4.3 Adaptive CoSaMP Algorithm
		10.4.4 Sparse-Adaptive Projection Subgradient Method
			Projection Onto the Weighted l1 Ball
	10.5 Learning Sparse Analysis Models
		10.5.1 Compressed Sensing for Sparse Signal Representation in Coherent Dictionaries
		10.5.2 Cosparsity
	10.6 A Case Study: Time-Frequency Analysis
		Gabor Transform and Frames
		Time-Frequency Resolution
		Gabor Frames
		Time-Frequency Analysis of Echolocation Signals Emitted by Bats
	Problems
		MATLAB® Exercises
	References
11 Learning in Reproducing Kernel Hilbert Spaces
	11.1 Introduction
	11.2 Generalized Linear Models
	11.3 Volterra, Wiener, and Hammerstein Models
	11.4 Cover's Theorem: Capacity of a Space in Linear Dichotomies
	11.5 Reproducing Kernel Hilbert Spaces
		11.5.1 Some Properties and Theoretical Highlights
		11.5.2 Examples of Kernel Functions
			Constructing Kernels
			String Kernels
	11.6 Representer Theorem
		11.6.1 Semiparametric Representer Theorem
		11.6.2 Nonparametric Modeling: a Discussion
	11.7 Kernel Ridge Regression
	11.8 Support Vector Regression
		11.8.1 The Linear ε-Insensitive Optimal Regression
			The Solution
			Solving the Optimization Task
	11.9 Kernel Ridge Regression Revisited
	11.10 Optimal Margin Classification: Support Vector Machines
		11.10.1 Linearly Separable Classes: Maximum Margin Classifiers
			The Solution
			The Optimization Task
		11.10.2 Nonseparable Classes
			The Solution
			The Optimization Task
		11.10.3 Performance of SVMs and Applications
		11.10.4 Choice of Hyperparameters
		11.10.5 Multiclass Generalizations
	11.11 Computational Considerations
	11.12 Random Fourier Features
		11.12.1 Online and Distributed Learning in RKHS
	11.13 Multiple Kernel Learning
	11.14 Nonparametric Sparsity-Aware Learning: Additive Models
	11.15 A Case Study: Authorship Identification
	Problems
		MATLAB® Exercises
	References
12 Bayesian Learning: Inference and the EM Algorithm
	12.1 Introduction
	12.2 Regression: a Bayesian Perspective
		12.2.1 The Maximum Likelihood Estimator
		12.2.2 The MAP Estimator
		12.2.3 The Bayesian Approach
	12.3 The Evidence Function and Occam's Razor Rule
		Laplacian Approximation and the Evidence Function
	12.4 Latent Variables and the EM Algorithm
		12.4.1 The Expectation-Maximization Algorithm
	12.5 Linear Regression and the EM Algorithm
	12.6 Gaussian Mixture Models
		12.6.1 Gaussian Mixture Modeling and Clustering
	12.7 The EM Algorithm: a Lower Bound Maximization View
	12.8 Exponential Family of Probability Distributions
		12.8.1 The Exponential Family and the Maximum Entropy Method
	12.9 Combining Learning Models: a Probabilistic Point of View
		12.9.1 Mixing Linear Regression Models
			Mixture of Experts
			Hierarchical Mixture of Experts
		12.9.2 Mixing Logistic Regression Models
	Problems
		MATLAB® Exercises
	References
13 Bayesian Learning: Approximate Inference and Nonparametric Models
	13.1 Introduction
	13.2 Variational Approximation in Bayesian Learning
		The Mean Field Approximation
		13.2.1 The Case of the Exponential Family of Probability Distributions
	13.3 A Variational Bayesian Approach to Linear Regression
		Computation of the Lower Bound
	13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling
	13.5 When Bayesian Inference Meets Sparsity
	13.6 Sparse Bayesian Learning (SBL)
		13.6.1 The Spike and Slab Method
	13.7 The Relevance Vector Machine Framework
		13.7.1 Adopting the Logistic Regression Model for Classification
	13.8 Convex Duality and Variational Bounds
	13.9 Sparsity-Aware Regression: a Variational Bound Bayesian Path
		Sparsity-Aware Learning: Some Concluding Remarks
	13.10 Expectation Propagation
		Minimizing the KL Divergence
		The Expectation Propagation Algorithm
	13.11 Nonparametric Bayesian Modeling
		13.11.1 The Chinese Restaurant Process
		13.11.2 Dirichlet Processes
			Predictive Distribution and the Pólya Urn Model
			Chinese Restaurant Process Revisited
		13.11.3 The Stick Breaking Construction of a DP
		13.11.4 Dirichlet Process Mixture Modeling
		Inference
		13.11.5 The Indian Buffet Process
			Searching for a Prior on Infinite Binary Matrices
			Restaurant Construction
			Stick Breaking Construction
			Inference
	13.12 Gaussian Processes
		13.12.1 Covariance Functions and Kernels
		13.12.2 Regression
			Dealing With Hyperparameters
			Computational Considerations
		13.12.3 Classification
	13.13 A Case Study: Hyperspectral Image Unmixing
		13.13.1 Hierarchical Bayesian Modeling
		13.13.2 Experimental Results
	Problems
		MATLAB® Exercises
	References
14 Monte Carlo Methods
	14.1 Introduction
	14.2 Monte Carlo Methods: the Main Concept
		14.2.1 Random Number Generation
	14.3 Random Sampling Based on Function Transformation
	14.4 Rejection Sampling
	14.5 Importance Sampling
	14.6 Monte Carlo Methods and the EM Algorithm
	14.7 Markov Chain Monte Carlo Methods
		14.7.1 Ergodic Markov Chains
	14.8 The Metropolis Method
		14.8.1 Convergence Issues
	14.9 Gibbs Sampling
	14.10 In Search of More Efficient Methods: a Discussion
		Variational Inference or Monte Carlo Methods
	14.11 A Case Study: Change-Point Detection
	Problems
		MATLAB® Exercise
	References
15 Probabilistic Graphical Models: Part I
	15.1 Introduction
	15.2 The Need for Graphical Models
	15.3 Bayesian Networks and the Markov Condition
		15.3.1 Graphs: Basic Definitions
		15.3.2 Some Hints on Causality
		15.3.3 d-Separation
		15.3.4 Sigmoidal Bayesian Networks
		15.3.5 Linear Gaussian Models
		15.3.6 Multiple-Cause Networks
		15.3.7 I-Maps, Soundness, Faithfulness, and Completeness
	15.4 Undirected Graphical Models
		15.4.1 Independencies and I-Maps in Markov Random Fields
		15.4.2 The Ising Model and Its Variants
		15.4.3 Conditional Random Fields (CRFs)
	15.5 Factor Graphs
		15.5.1 Graphical Models for Error Correcting Codes
	15.6 Moralization of Directed Graphs
	15.7 Exact Inference Methods: Message Passing Algorithms
		15.7.1 Exact Inference in Chains
		15.7.2 Exact Inference in Trees
		15.7.3 The Sum-Product Algorithm
		15.7.4 The Max-Product and Max-Sum Algorithms
	Problems
	References
16 Probabilistic Graphical Models: Part II
	16.1 Introduction
	16.2 Triangulated Graphs and Junction Trees
		16.2.1 Constructing a Join Tree
		16.2.2 Message Passing in Junction Trees
	16.3 Approximate Inference Methods
		16.3.1 Variational Methods: Local Approximation
			Multiple-Cause Networks and the Noisy-OR Model
			The Boltzmann Machine
		16.3.2 Block Methods for Variational Approximation
			The Mean Field Approximation and the Boltzmann Machine
		16.3.3 Loopy Belief Propagation
	16.4 Dynamic Graphical Models
	16.5 Hidden Markov Models
		16.5.1 Inference
			The Sum-Product Algorithm: the HMM Case
		16.5.2 Learning the Parameters in an HMM
		16.5.3 Discriminative Learning
	16.6 Beyond HMMs: a Discussion
		16.6.1 Factorial Hidden Markov Models
		16.6.2 Time-Varying Dynamic Bayesian Networks
	16.7 Learning Graphical Models
		16.7.1 Parameter Estimation
		16.7.2 Learning the Structure
	Problems
	References
17 Particle Filtering
	17.1 Introduction
	17.2 Sequential Importance Sampling
		17.2.1 Importance Sampling Revisited
		17.2.2 Resampling
		17.2.3 Sequential Sampling
	17.3 Kalman and Particle Filtering
		17.3.1 Kalman Filtering: a Bayesian Point of View
	17.4 Particle Filtering
		17.4.1 Degeneracy
		17.4.2 Generic Particle Filtering
		17.4.3 Auxiliary Particle Filtering
	Problems
		MATLAB® Exercises
	References
18 Neural Networks and Deep Learning
	18.1 Introduction
	18.2 The Perceptron
	18.3 Feed-Forward Multilayer Neural Networks
		18.3.1 Fully Connected Networks
	18.4 The Backpropagation Algorithm
		Nonconvexity of the Cost Function
		18.4.1 The Gradient Descent Backpropagation Scheme
			Pattern-by-Pattern/Online Scheme
			Minibatch Schemes
		18.4.2 Variants of the Basic Gradient Descent Scheme
			Gradient Descent With a Momentum Term
			Nesterov's Momentum Algorithm
			The AdaGrad Algorithm
			The RMSProp With Nesterov Momentum
			The Adaptive Moment Estimation Algorithm (Adam)
			Some Practical Hints
			Batch Normalization
		18.4.3 Beyond the Gradient Descent Rationale
	18.5 Selecting a Cost Function
	18.6 Vanishing and Exploding Gradients
		18.6.1 The Rectified Linear Unit
	18.7 Regularizing the Network
		Dropout
	18.8 Designing Deep Neural Networks: a Summary
	18.9 Universal Approximation Property of Feed-Forward Neural Networks
	18.10 Neural Networks: a Bayesian Flavor
	18.11 Shallow Versus Deep Architectures
		18.11.1 The Power of Deep Architectures
			On the Representation Properties of Deep Networks
			Distributed Representations
			On the Optimization of Deep Networks: Some Theoretical Highlights
			On the Generalization Power of Deep Networks
	18.12 Convolutional Neural Networks
		18.12.1 The Need for Convolutions
			The Convolution Step
			The Nonlinearity Step
			The Pooling Step
		18.12.2 Convolution Over Volumes
			Network in Network and 1x1 Convolution
		18.12.3 The Full CNN Architecture
			What Deep Neural Networks Learn
		18.12.4 CNNs: the Epilogue
	18.13 Recurrent Neural Networks
		18.13.1 Backpropagation Through Time
			Vanishing and Exploding Gradients
			The Long Short-Term Memory (LSTM) Network
		18.13.2 Attention and Memory
	18.14 Adversarial Examples
		Adversarial Training
	18.15 Deep Generative Models
		18.15.1 Restricted Boltzmann Machines
		18.15.2 Pretraining Deep Feed-Forward Networks
		18.15.3 Deep Belief Networks
		18.15.4 Autoencoders
		18.15.5 Generative Adversarial Networks
			On the Optimality of the Solution
			Problems in Training GANs
			The Wasserstein GAN
			Which Algorithm Then
		18.15.6 Variational Autoencoders
	18.16 Capsule Networks
		Training
	18.17 Deep Neural Networks: Some Final Remarks
		Transfer Learning
		Multitask Learning
		Geometric Deep Learning
		Open Problems
	18.18 A Case Study: Neural Machine Translation
	18.19 Problems
		Computer Exercises
	References
19 Dimensionality Reduction and Latent Variable Modeling
	19.1 Introduction
	19.2 Intrinsic Dimensionality
	19.3 Principal Component Analysis
		PCA, SVD, and Low Rank Matrix Factorization
		Minimum Error Interpretation
		PCA and Information Retrieval
		Orthogonalizing Properties of PCA and Feature Generation
		Latent Variables
	19.4 Canonical Correlation Analysis
		19.4.1 Relatives of CCA
			Partial Least-Squares
	19.5 Independent Component Analysis
		19.5.1 ICA and Gaussianity
		19.5.2 ICA and Higher-Order Cumulants
			ICA Ambiguities
		19.5.3 Non-Gaussianity and Independent Components
		19.5.4 ICA Based on Mutual Information
		19.5.5 Alternative Paths to ICA
		The Cocktail Party Problem
	19.6 Dictionary Learning: the k-SVD Algorithm
		Why the Name k-SVD?
		Dictionary Learning and Dictionary Identifiability
	19.7 Nonnegative Matrix Factorization
	19.8 Learning Low-Dimensional Models: a Probabilistic Perspective
		19.8.1 Factor Analysis
		19.8.2 Probabilistic PCA
		19.8.3 Mixture of Factors Analyzers: a Bayesian View to Compressed Sensing
	19.9 Nonlinear Dimensionality Reduction
		19.9.1 Kernel PCA
		19.9.2 Graph-Based Methods
			Laplacian Eigenmaps
			Local Linear Embedding (LLE)
			Isometric Mapping (ISOMAP)
	19.10 Low Rank Matrix Factorization: a Sparse Modeling Path
		19.10.1 Matrix Completion
		19.10.2 Robust PCA
		19.10.3 Applications of Matrix Completion and Robust PCA
			Matrix Completion
			Robust PCA/PCP
	19.11 A Case Study: FMRI Data Analysis
	Problems
		MATLAB® Exercises
	References
Index



