Author: Kevin P. Murphy
ISBN: 9780262369305 (LCCN: 2021027430)
Publisher: MIT Press
Publication year: 2022
Language: English
File format: EPUB (can be converted to PDF or AZW3 on request)
File size: 24 MB
If you would like the book Probabilistic Machine Learning: An Introduction converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert the file for you.
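If you would rather convert the file yourself, one common option (a minimal sketch, not part of this site's service; it assumes Calibre and its ebook-convert command-line tool are installed, and uses book.epub as a placeholder file name) is to call ebook-convert, for example from Python:

    # Minimal sketch: convert an EPUB to PDF with Calibre's ebook-convert CLI.
    # Assumes Calibre is installed and ebook-convert is on the PATH;
    # "book.epub" / "book.pdf" are placeholder file names.
    import subprocess
    subprocess.run(["ebook-convert", "book.epub", "book.pdf"], check=True)

The same call with an .azw3 or .mobi output name produces those formats, since ebook-convert infers the target format from the output file extension.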
Note that Probabilistic Machine Learning: An Introduction is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
Contents

Preface

Introduction
  What is machine learning?
  Supervised learning: Classification; Regression; Overfitting and generalization; No free lunch theorem
  Unsupervised learning: Clustering; Discovering latent "factors of variation"; Self-supervised learning; Evaluating unsupervised learning
  Reinforcement learning
  Data: Some common image datasets; Some common text datasets; Preprocessing discrete input data; Preprocessing text data; Handling missing data
  Discussion: The relationship between ML and other fields; Structure of the book; Caveats

I Foundations

Probability: Univariate Models
  Introduction: What is probability?; Types of uncertainty; Probability as an extension of logic
  Random variables: Discrete random variables; Continuous random variables; Sets of related random variables; Independence and conditional independence; Moments of a distribution; Limitations of summary statistics *
  Bayes' rule: Example: Testing for COVID-19; Example: The Monty Hall problem; Inverse problems *
  Bernoulli and binomial distributions: Definition; Sigmoid (logistic) function; Binary logistic regression
  Categorical and multinomial distributions: Definition; Softmax function; Multiclass logistic regression; Log-sum-exp trick
  Univariate Gaussian (normal) distribution: Cumulative distribution function; Probability density function; Regression; Why is the Gaussian distribution so widely used?; Dirac delta function as a limiting case
  Some other common univariate distributions *: Student t distribution; Cauchy distribution; Laplace distribution; Beta distribution; Gamma distribution; Empirical distribution
  Transformations of random variables *: Discrete case; Continuous case; Invertible transformations (bijections); Moments of a linear transformation; The convolution theorem; Central limit theorem; Monte Carlo approximation
  Exercises

Probability: Multivariate Models
  Joint distributions for multiple random variables: Covariance; Correlation; Uncorrelated does not imply independent; Correlation does not imply causation; Simpson's paradox
  The multivariate Gaussian (normal) distribution: Definition; Mahalanobis distance; Marginals and conditionals of an MVN *; Example: conditioning a 2d Gaussian; Example: Imputing missing values *
  Linear Gaussian systems *: Bayes rule for Gaussians; Derivation *; Example: Inferring an unknown scalar; Example: inferring an unknown vector; Example: sensor fusion
  The exponential family *: Definition; Example; Log partition function is cumulant generating function; Maximum entropy derivation of the exponential family
  Mixture models: Gaussian mixture models; Bernoulli mixture models
  Probabilistic graphical models *: Representation; Inference; Learning
  Exercises

Statistics
  Introduction
  Maximum likelihood estimation (MLE): Definition; Justification for MLE; Example: MLE for the Bernoulli distribution; Example: MLE for the categorical distribution; Example: MLE for the univariate Gaussian; Example: MLE for the multivariate Gaussian; Example: MLE for linear regression
  Empirical risk minimization (ERM): Example: minimizing the misclassification rate; Surrogate loss
  Other estimation methods *: The method of moments; Online (recursive) estimation
  Regularization: Example: MAP estimation for the Bernoulli distribution; Example: MAP estimation for the multivariate Gaussian *; Example: weight decay; Picking the regularizer using a validation set; Cross-validation; Early stopping; Using more data
  Bayesian statistics *: Conjugate priors; The beta-binomial model; The Dirichlet-multinomial model; The Gaussian-Gaussian model; Beyond conjugate priors; Credible intervals; Bayesian machine learning; Computational issues
  Frequentist statistics *: Sampling distributions; Gaussian approximation of the sampling distribution of the MLE; Bootstrap approximation of the sampling distribution of any estimator; Confidence intervals; Caution: Confidence intervals are not credible; The bias-variance tradeoff
  Exercises

Decision Theory
  Bayesian decision theory: Basics; Classification problems; ROC curves; Precision-recall curves; Regression problems; Probabilistic prediction problems
  Choosing the "right" model: Bayesian hypothesis testing; Bayesian model selection; Occam's razor; Connection between cross validation and marginal likelihood; Information criteria; Posterior inference over effect sizes and Bayesian significance testing
  Frequentist decision theory: Computing the risk of an estimator; Consistent estimators; Admissible estimators
  Empirical risk minimization: Empirical risk; Structural risk; Cross-validation; Statistical learning theory *
  Frequentist hypothesis testing *: Likelihood ratio test; Null hypothesis significance testing (NHST); p-values; p-values considered harmful; Why isn't everyone a Bayesian?
  Exercises

Information Theory
  Entropy: Entropy for discrete random variables; Cross entropy; Joint entropy; Conditional entropy; Perplexity; Differential entropy for continuous random variables *
  Relative entropy (KL divergence) *: Definition; Interpretation; Example: KL divergence between two Gaussians; Non-negativity of KL; KL divergence and MLE; Forward vs reverse KL
  Mutual information *: Definition; Interpretation; Example; Conditional mutual information; MI as a "generalized correlation coefficient"; Normalized mutual information; Maximal information coefficient; Data processing inequality; Sufficient Statistics; Fano's inequality *
  Exercises

Linear Algebra
  Introduction: Notation; Vector spaces; Norms of a vector and matrix; Properties of a matrix; Special types of matrices
  Matrix multiplication: Vector–vector products; Matrix–vector products; Matrix–matrix products; Application: manipulating data matrices; Kronecker products *; Einstein summation *
  Matrix inversion: The inverse of a square matrix; Schur complements *; The matrix inversion lemma *; Matrix determinant lemma *; Application: deriving the conditionals of an MVN *
  Eigenvalue decomposition (EVD): Basics; Diagonalization; Eigenvalues and eigenvectors of symmetric matrices; Geometry of quadratic forms; Standardizing and whitening data; Power method; Deflation; Eigenvectors optimize quadratic forms
  Singular value decomposition (SVD): Basics; Connection between SVD and EVD; Pseudo inverse; SVD and the range and null space of a matrix *; Truncated SVD
  Other matrix decompositions *: LU factorization; QR decomposition; Cholesky decomposition
  Solving systems of linear equations *: Solving square systems; Solving underconstrained systems (least norm estimation); Solving overconstrained systems (least squares estimation)
  Matrix calculus: Derivatives; Gradients; Directional derivative; Total derivative *; Jacobian; Hessian; Gradients of commonly used functions
  Exercises

Optimization
  Introduction: Local vs global optimization; Constrained vs unconstrained optimization; Convex vs nonconvex optimization; Smooth vs nonsmooth optimization
  First-order methods: Descent direction; Step size (learning rate); Convergence rates; Momentum methods
  Second-order methods: Newton's method; BFGS and other quasi-Newton methods; Trust region methods
  Stochastic gradient descent: Application to finite sum problems; Example: SGD for fitting linear regression; Choosing the step size (learning rate); Iterate averaging; Variance reduction *; Preconditioned SGD
  Constrained optimization: Lagrange multipliers; The KKT conditions; Linear programming; Quadratic programming; Mixed integer linear programming *
  Proximal gradient method *: Projected gradient descent; Proximal operator for ℓ1-norm regularizer; Proximal operator for quantization; Incremental (online) proximal methods
  Bound optimization *: The general algorithm; The EM algorithm; Example: EM for a GMM
  Blackbox and derivative free optimization
  Exercises

II Linear Models

Linear Discriminant Analysis
  Introduction
  Gaussian discriminant analysis: Quadratic decision boundaries; Linear decision boundaries; The connection between LDA and logistic regression; Model fitting; Nearest centroid classifier; Fisher's linear discriminant analysis *
  Naive Bayes classifiers: Example models; Model fitting; Bayesian naive Bayes; The connection between naive Bayes and logistic regression
  Generative vs discriminative classifiers: Advantages of discriminative classifiers; Advantages of generative classifiers; Handling missing features
  Exercises

Logistic Regression
  Introduction
  Binary logistic regression: Linear classifiers; Nonlinear classifiers; Maximum likelihood estimation; Stochastic gradient descent; Perceptron algorithm; Iteratively reweighted least squares; MAP estimation; Standardization
  Multinomial logistic regression: Linear and nonlinear classifiers; Maximum likelihood estimation; Gradient-based optimization; Bound optimization; MAP estimation; Maximum entropy classifiers; Hierarchical classification; Handling large numbers of classes
  Robust logistic regression *: Mixture model for the likelihood; Bi-tempered loss
  Bayesian logistic regression *: Laplace approximation; Approximating the posterior predictive
  Exercises

Linear Regression
  Introduction
  Least squares linear regression: Terminology; Least squares estimation; Other approaches to computing the MLE; Measuring goodness of fit
  Ridge regression: Computing the MAP estimate; Connection between ridge regression and PCA; Choosing the strength of the regularizer
  Lasso regression: MAP estimation with a Laplace prior (ℓ1 regularization); Why does ℓ1 regularization yield sparse solutions?; Hard vs soft thresholding; Regularization path; Comparison of least squares, lasso, ridge and subset selection; Variable selection consistency; Group lasso; Elastic net (ridge and lasso combined); Optimization algorithms
  Regression splines *: B-spline basis functions; Fitting a linear model using a spline basis; Smoothing splines; Generalized additive models
  Robust linear regression *: Laplace likelihood; Student-t likelihood; Huber loss; RANSAC
  Bayesian linear regression *: Priors; Posteriors; Example; Computing the posterior predictive; The advantage of centering; Dealing with multicollinearity; Automatic relevancy determination (ARD) *
  Exercises

Generalized Linear Models *
  Introduction
  Examples: Linear regression; Binomial regression; Poisson regression
  GLMs with non-canonical link functions
  Maximum likelihood estimation
  Worked example: predicting insurance claims

III Deep Neural Networks

Neural Networks for Tabular Data
  Introduction
  Multilayer perceptrons (MLPs): The XOR problem; Differentiable MLPs; Activation functions; Example models; The importance of depth; The "deep learning revolution"; Connections with biology
  Backpropagation: Forward vs reverse mode differentiation; Reverse mode differentiation for multilayer perceptrons; Vector-Jacobian product for common layers; Computation graphs
  Training neural networks: Tuning the learning rate; Vanishing and exploding gradients; Non-saturating activation functions; Residual connections; Parameter initialization; Parallel training
  Regularization: Early stopping; Weight decay; Sparse DNNs; Dropout; Bayesian neural networks; Regularization effects of (stochastic) gradient descent *
  Other kinds of feedforward networks *: Radial basis function networks; Mixtures of experts
  Exercises

Neural Networks for Images
  Introduction
  Common layers: Convolutional layers; Pooling layers; Putting it all together; Normalization layers
  Common architectures for image classification: LeNet; AlexNet; GoogLeNet (Inception); ResNet; DenseNet; Neural architecture search
  Other forms of convolution *: Dilated convolution; Transposed convolution; Depthwise separable convolution
  Solving other discriminative vision tasks with CNNs *: Image tagging; Object detection; Instance segmentation; Semantic segmentation; Human pose estimation
  Generating images by inverting CNNs *: Converting a trained classifier into a generative model; Image priors; Visualizing the features learned by a CNN; Deep Dream; Neural style transfer

Neural Networks for Sequences
  Introduction
  Recurrent neural networks (RNNs): Vec2Seq (sequence generation); Seq2Vec (sequence classification); Seq2Seq (sequence translation); Teacher forcing; Backpropagation through time; Vanishing and exploding gradients; Gating and long term memory; Beam search
  1d CNNs: 1d CNNs for sequence classification; Causal 1d CNNs for sequence generation
  Attention: Attention as soft dictionary lookup; Kernel regression as non-parametric attention; Parametric attention; Seq2Seq with attention; Seq2vec with attention (text classification); Seq+Seq2Vec with attention (text pair classification); Soft vs hard attention
  Transformers: Self-attention; Multi-headed attention; Positional encoding; Putting it all together; Comparing transformers, CNNs and RNNs; Transformers for images *; Other transformer variants *
  Efficient transformers *: Fixed non-learnable localized attention patterns; Learnable sparse attention patterns; Memory and recurrence methods; Low-rank and kernel methods
  Language models and unsupervised representation learning: ELMo; BERT; GPT; T5; Discussion

IV Nonparametric Models

Exemplar-based Methods
  K nearest neighbor (KNN) classification: Example; The curse of dimensionality; Reducing the speed and memory requirements; Open set recognition
  Learning distance metrics: Linear and convex methods; Deep metric learning; Classification losses; Ranking losses; Speeding up ranking loss optimization; Other training tricks for DML
  Kernel density estimation (KDE): Density kernels; Parzen window density estimator; How to choose the bandwidth parameter; From KDE to KNN classification; Kernel regression

Kernel Methods *
  Mercer kernels: Mercer's theorem; Some popular Mercer kernels
  Gaussian processes: Noise-free observations; Noisy observations; Comparison to kernel regression; Weight space vs function space; Numerical issues; Estimating the kernel; GPs for classification; Connections with deep learning; Scaling GPs to large datasets
  Support vector machines (SVMs): Large margin classifiers; The dual problem; Soft margin classifiers; The kernel trick; Converting SVM outputs into probabilities; Connection with logistic regression; Multi-class classification with SVMs; How to choose the regularizer C; Kernel ridge regression; SVMs for regression
  Sparse vector machines: Relevance vector machines (RVMs); Comparison of sparse and dense kernel methods
  Exercises

Trees, Forests, Bagging, and Boosting
  Classification and regression trees (CART): Model definition; Model fitting; Regularization; Handling missing input features; Pros and cons
  Ensemble learning: Stacking; Ensembling is not Bayes model averaging
  Bagging
  Random forests
  Boosting: Forward stagewise additive modeling; Quadratic loss and least squares boosting; Exponential loss and AdaBoost; LogitBoost; Gradient boosting
  Interpreting tree ensembles: Feature importance; Partial dependency plots

V Beyond Supervised Learning

Learning with Fewer Labeled Examples
  Data augmentation: Examples; Theoretical justification
  Transfer learning: Fine-tuning; Adapters; Supervised pre-training; Unsupervised pre-training (self-supervised learning); Domain adaptation
  Semi-supervised learning: Self-training and pseudo-labeling; Entropy minimization; Co-training; Label propagation on graphs; Consistency regularization; Deep generative models *; Combining self-supervised and semi-supervised learning
  Active learning: Decision-theoretic approach; Information-theoretic approach; Batch active learning
  Meta-learning: Model-agnostic meta-learning (MAML)
  Few-shot learning: Matching networks
  Weakly supervised learning
  Exercises

Dimensionality Reduction
  Principal components analysis (PCA): Examples; Derivation of the algorithm; Computational issues; Choosing the number of latent dimensions
  Factor analysis *: Generative model; Probabilistic PCA; EM algorithm for FA/PPCA; Unidentifiability of the parameters; Nonlinear factor analysis; Mixtures of factor analysers; Exponential family factor analysis; Factor analysis models for paired data
  Autoencoders: Bottleneck autoencoders; Denoising autoencoders; Contractive autoencoders; Sparse autoencoders; Variational autoencoders
  Manifold learning *: What are manifolds?; The manifold hypothesis; Approaches to manifold learning; Multi-dimensional scaling (MDS); Isomap; Kernel PCA; Maximum variance unfolding (MVU); Local linear embedding (LLE); Laplacian eigenmaps; t-SNE
  Word embeddings: Latent semantic analysis / indexing; Word2vec; GloVE; Word analogies; RAND-WALK model of word embeddings; Contextual word embeddings
  Exercises

Clustering
  Introduction: Evaluating the output of clustering methods
  Hierarchical agglomerative clustering: The algorithm; Example; Extensions
  K means clustering: The algorithm; Examples; Vector quantization; The K-means++ algorithm; The K-medoids algorithm; Speedup tricks; Choosing the number of clusters K
  Clustering using mixture models: Mixtures of Gaussians; Mixtures of Bernoullis
  Spectral clustering *: Normalized cuts; Eigenvectors of the graph Laplacian encode the clustering; Example; Connection with other methods
  Biclustering *: Basic biclustering; Nested partition models (Crosscat)

Recommender Systems
  Explicit feedback: Datasets; Collaborative filtering; Matrix factorization; Autoencoders
  Implicit feedback: Bayesian personalized ranking; Factorization machines; Neural matrix factorization
  Leveraging side information
  Exploration-exploitation tradeoff

Graph Embeddings *
  Introduction
  Graph Embedding as an Encoder/Decoder Problem
  Shallow graph embeddings: Unsupervised embeddings; Distance-based: Euclidean methods; Distance-based: non-Euclidean methods; Outer product-based: Matrix factorization methods; Outer product-based: Skip-gram methods; Supervised embeddings
  Graph Neural Networks: Message passing GNNs; Spectral Graph Convolutions; Spatial Graph Convolutions; Non-Euclidean Graph Convolutions
  Deep graph embeddings: Unsupervised embeddings; Semi-supervised embeddings
  Applications: Unsupervised applications; Supervised applications

Notation
  Introduction
  Common mathematical symbols
  Functions: Common functions of one argument; Common functions of two arguments; Common functions of >2 arguments
  Linear algebra: General notation; Vectors; Matrices; Matrix calculus
  Optimization
  Probability
  Information theory
  Statistics and machine learning: Supervised learning; Unsupervised learning and generative models; Bayesian inference
  Abbreviations

Index
Bibliography