Authors: Ian Goodfellow, Yoshua Bengio, Aaron Courville
Year of publication: 2016
Number of pages: [801]
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 19 MB
If you need the Deep Learning file converted to PDF, EPUB, AZW3, MOBI, or DJVU, let support know and they will convert it for you.
Please note that this copy of Deep Learning is the original-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
Cover
Contents
Website
Acknowledgments
Notation
Chapter 1 Introduction
  1.1 Who Should Read This Book?
  1.2 Historical Trends in Deep Learning
    1.2.1 The Many Names and Changing Fortunes of Neural Networks
    1.2.2 Increasing Dataset Sizes
    1.2.3 Increasing Model Sizes
    1.2.4 Increasing Accuracy, Complexity and Real-World Impact
Part I Applied Math and Machine Learning Basics
Chapter 2 Linear Algebra
  2.1 Scalars, Vectors, Matrices and Tensors
  2.2 Multiplying Matrices and Vectors
  2.3 Identity and Inverse Matrices
  2.4 Linear Dependence and Span
  2.5 Norms
  2.6 Special Kinds of Matrices and Vectors
  2.7 Eigendecomposition
  2.8 Singular Value Decomposition
  2.9 The Moore-Penrose Pseudoinverse
  2.10 The Trace Operator
  2.11 The Determinant
  2.12 Example: Principal Components Analysis
Chapter 3 Probability and Information Theory
  3.1 Why Probability?
  3.2 Random Variables
  3.3 Probability Distributions
    3.3.1 Discrete Variables and Probability Mass Functions
    3.3.2 Continuous Variables and Probability Density Functions
  3.4 Marginal Probability
  3.5 Conditional Probability
  3.6 The Chain Rule of Conditional Probabilities
  3.7 Independence and Conditional Independence
  3.8 Expectation, Variance and Covariance
  3.9 Common Probability Distributions
    3.9.1 Bernoulli Distribution
    3.9.2 Multinoulli Distribution
    3.9.3 Gaussian Distribution
    3.9.4 Exponential and Laplace Distributions
    3.9.5 The Dirac Distribution and Empirical Distribution
    3.9.6 Mixtures of Distributions
  3.10 Useful Properties of Common Functions
  3.11 Bayes’ Rule
  3.12 Technical Details of Continuous Variables
  3.13 Information Theory
  3.14 Structured Probabilistic Models
Chapter 4 Numerical Computation
  4.1 Overflow and Underflow
  4.2 Poor Conditioning
  4.3 Gradient-Based Optimization
    4.3.1 Beyond the Gradient: Jacobian and Hessian Matrices
  4.4 Constrained Optimization
  4.5 Example: Linear Least Squares
Chapter 5 Machine Learning Basics
  5.1 Learning Algorithms
    5.1.1 The Task, T
    5.1.2 The Performance Measure, P
    5.1.3 The Experience, E
    5.1.4 Example: Linear Regression
  5.2 Capacity, Overfitting and Underfitting
    5.2.1 The No Free Lunch Theorem
    5.2.2 Regularization
  5.3 Hyperparameters and Validation Sets
    5.3.1 Cross-Validation
  5.4 Estimators, Bias and Variance
    5.4.1 Point Estimation
    5.4.2 Bias
    5.4.3 Variance and Standard Error
    5.4.4 Trading off Bias and Variance to Minimize Mean Squared Error
    5.4.5 Consistency
  5.5 Maximum Likelihood Estimation
    5.5.1 Conditional Log-Likelihood and Mean Squared Error
  5.6 Bayesian Statistics
    5.6.1 Maximum A Posteriori (MAP) Estimation
  5.7 Supervised Learning Algorithms
    5.7.1 Probabilistic Supervised Learning
    5.7.2 Support Vector Machines
    5.7.3 Other Simple Supervised Learning Algorithms
  5.8 Unsupervised Learning Algorithms
    5.8.1 Principal Components Analysis
    5.8.2 k-means Clustering
  5.9 Stochastic Gradient Descent
  5.10 Building a Machine Learning Algorithm
  5.11 Challenges Motivating Deep Learning
    5.11.1 The Curse of Dimensionality
    5.11.2 Local Constancy and Smoothness Regularization
    5.11.3 Manifold Learning
Part II Deep Networks: Modern Practices
Chapter 6 Deep Feedforward Networks
  6.1 Example: Learning XOR
  6.2 Gradient-Based Learning
    6.2.1 Cost Functions
      6.2.1.1 Learning Conditional Distributions with Maximum Likelihood
      6.2.1.2 Learning Conditional Statistics
    6.2.2 Output Units
      6.2.2.1 Linear Units for Gaussian Output Distributions
      6.2.2.2 Sigmoid Units for Bernoulli Output Distributions
      6.2.2.3 Softmax Units for Multinoulli Output Distributions
      6.2.2.4 Other Output Types
  6.3 Hidden Units
    6.3.1 Rectified Linear Units and Their Generalizations
    6.3.2 Logistic Sigmoid and Hyperbolic Tangent
    6.3.3 Other Hidden Units
  6.4 Architecture Design
    6.4.1 Universal Approximation Properties and Depth
    6.4.2 Other Architectural Considerations
  6.5 Back-Propagation and Other Differentiation Algorithms
    6.5.1 Computational Graphs
    6.5.2 Chain Rule of Calculus
    6.5.3 Recursively Applying the Chain Rule to Obtain Backprop
    6.5.4 Back-Propagation Computation in Fully-Connected MLP
    6.5.5 Symbol-to-Symbol Derivatives
    6.5.6 General Back-Propagation
    6.5.7 Example: Back-Propagation for MLP Training
    6.5.8 Complications
    6.5.9 Differentiation outside the Deep Learning Community
    6.5.10 Higher-Order Derivatives
  6.6 Historical Notes
Chapter 7 Regularization for Deep Learning
  7.1 Parameter Norm Penalties
    7.1.1 L2 Parameter Regularization
    7.1.2 L1 Regularization
  7.2 Norm Penalties as Constrained Optimization
  7.3 Regularization and Under-Constrained Problems
  7.4 Dataset Augmentation
  7.5 Noise Robustness
    7.5.1 Injecting Noise at the Output Targets
  7.6 Semi-Supervised Learning
  7.7 Multi-Task Learning
  7.8 Early Stopping
  7.9 Parameter Tying and Parameter Sharing
  7.10 Sparse Representations
  7.11 Bagging and Other Ensemble Methods
  7.12 Dropout
  7.13 Adversarial Training
  7.14 Tangent Distance, Tangent Prop, and Manifold Tangent Classifier
Chapter 8 Optimization for Training Deep Models
  8.1 How Learning Differs from Pure Optimization
    8.1.1 Empirical Risk Minimization
    8.1.2 Surrogate Loss Functions and Early Stopping
    8.1.3 Batch and Minibatch Algorithms
  8.2 Challenges in Neural Network Optimization
    8.2.1 Ill-Conditioning
    8.2.2 Local Minima
    8.2.3 Plateaus, Saddle Points and Other Flat Regions
    8.2.4 Cliffs and Exploding Gradients
    8.2.5 Long-Term Dependencies
    8.2.6 Inexact Gradients
    8.2.7 Poor Correspondence between Local and Global Structure
    8.2.8 Theoretical Limits of Optimization
  8.3 Basic Algorithms
    8.3.1 Stochastic Gradient Descent
    8.3.2 Momentum
    8.3.3 Nesterov Momentum
  8.4 Parameter Initialization Strategies
  8.5 Algorithms with Adaptive Learning Rates
    8.5.1 AdaGrad
    8.5.2 RMSProp
    8.5.3 Adam
    8.5.4 Choosing the Right Optimization Algorithm
  8.6 Approximate Second-Order Methods
    8.6.1 Newton’s Method
    8.6.2 Conjugate Gradients
    8.6.3 BFGS
  8.7 Optimization Strategies and Meta-Algorithms
    8.7.1 Batch Normalization
    8.7.2 Coordinate Descent
    8.7.3 Polyak Averaging
    8.7.4 Supervised Pretraining
    8.7.5 Designing Models to Aid Optimization
    8.7.6 Continuation Methods and Curriculum Learning
Chapter 9 Convolutional Networks
  9.1 The Convolution Operation
  9.2 Motivation
  9.3 Pooling
  9.4 Convolution and Pooling as an Infinitely Strong Prior
  9.5 Variants of the Basic Convolution Function
  9.6 Structured Outputs
  9.7 Data Types
  9.8 Efficient Convolution Algorithms
  9.9 Random or Unsupervised Features
  9.10 The Neuroscientific Basis for Convolutional Networks
  9.11 Convolutional Networks and the History of Deep Learning
Chapter 10 Sequence Modeling: Recurrent and Recursive Nets
  10.1 Unfolding Computational Graphs
  10.2 Recurrent Neural Networks
    10.2.1 Teacher Forcing and Networks with Output Recurrence
    10.2.2 Computing the Gradient in a Recurrent Neural Network
    10.2.3 Recurrent Networks as Directed Graphical Models
    10.2.4 Modeling Sequences Conditioned on Context with RNNs
  10.3 Bidirectional RNNs
  10.4 Encoder-Decoder Sequence-to-Sequence Architectures
  10.5 Deep Recurrent Networks
  10.6 Recursive Neural Networks
  10.7 The Challenge of Long-Term Dependencies
  10.8 Echo State Networks
  10.9 Leaky Units and Other Strategies for Multiple Time Scales
    10.9.1 Adding Skip Connections through Time
    10.9.2 Leaky Units and a Spectrum of Different Time Scales
    10.9.3 Removing Connections
  10.10 The Long Short-Term Memory and Other Gated RNNs
    10.10.1 LSTM
    10.10.2 Other Gated RNNs
  10.11 Optimization for Long-Term Dependencies
    10.11.1 Clipping Gradients
    10.11.2 Regularizing to Encourage Information Flow
  10.12 Explicit Memory
Chapter 11 Practical Methodology
  11.1 Performance Metrics
  11.2 Default Baseline Models
  11.3 Determining Whether to Gather More Data
  11.4 Selecting Hyperparameters
    11.4.1 Manual Hyperparameter Tuning
    11.4.2 Automatic Hyperparameter Optimization Algorithms
    11.4.3 Grid Search
    11.4.4 Random Search
    11.4.5 Model-Based Hyperparameter Optimization
  11.5 Debugging Strategies
  11.6 Example: Multi-Digit Number Recognition
Chapter 12 Applications
  12.1 Large-Scale Deep Learning
    12.1.1 Fast CPU Implementations
    12.1.2 GPU Implementations
    12.1.3 Large-Scale Distributed Implementations
    12.1.4 Model Compression
    12.1.5 Dynamic Structure
    12.1.6 Specialized Hardware Implementations of Deep Networks
  12.2 Computer Vision
    12.2.1 Preprocessing
      12.2.1.1 Contrast Normalization
      12.2.1.2 Dataset Augmentation
  12.3 Speech Recognition
  12.4 Natural Language Processing
    12.4.1 n-grams
    12.4.2 Neural Language Models
    12.4.3 High-Dimensional Outputs
      12.4.3.1 Use of a Short List
      12.4.3.2 Hierarchical Softmax
      12.4.3.3 Importance Sampling
      12.4.3.4 Noise-Contrastive Estimation and Ranking Loss
    12.4.4 Combining Neural Language Models with n-grams
    12.4.5 Neural Machine Translation
      12.4.5.1 Using an Attention Mechanism and Aligning Pieces of Data
    12.4.6 Historical Perspective
  12.5 Other Applications
    12.5.1 Recommender Systems
      12.5.1.1 Exploration Versus Exploitation
    12.5.2 Knowledge Representation, Reasoning and Question Answering
      12.5.2.1 Knowledge, Relations and Question Answering
Part III Deep Learning Research
Chapter 13 Linear Factor Models
  13.1 Probabilistic PCA and Factor Analysis
  13.2 Independent Component Analysis (ICA)
  13.3 Slow Feature Analysis
  13.4 Sparse Coding
  13.5 Manifold Interpretation of PCA
Chapter 14 Autoencoders
  14.1 Undercomplete Autoencoders
  14.2 Regularized Autoencoders
    14.2.1 Sparse Autoencoders
    14.2.2 Denoising Autoencoders
    14.2.3 Regularizing by Penalizing Derivatives
  14.3 Representational Power, Layer Size and Depth
  14.4 Stochastic Encoders and Decoders
  14.5 Denoising Autoencoders
    14.5.1 Estimating the Score
      14.5.1.1 Historical Perspective
  14.6 Learning Manifolds with Autoencoders
  14.7 Contractive Autoencoders
  14.8 Predictive Sparse Decomposition
  14.9 Applications of Autoencoders
Chapter 15 Representation Learning
  15.1 Greedy Layer-Wise Unsupervised Pretraining
    15.1.1 When and Why Does Unsupervised Pretraining Work?
  15.2 Transfer Learning and Domain Adaptation
  15.3 Semi-Supervised Disentangling of Causal Factors
  15.4 Distributed Representation
  15.5 Exponential Gains from Depth
  15.6 Providing Clues to Discover Underlying Causes
Chapter 16 Structured Probabilistic Models for Deep Learning
  16.1 The Challenge of Unstructured Modeling
  16.2 Using Graphs to Describe Model Structure
    16.2.1 Directed Models
    16.2.2 Undirected Models
    16.2.3 The Partition Function
    16.2.4 Energy-Based Models
    16.2.5 Separation and D-Separation
    16.2.6 Converting between Undirected and Directed Graphs
    16.2.7 Factor Graphs
  16.3 Sampling from Graphical Models
  16.4 Advantages of Structured Modeling
  16.5 Learning about Dependencies
  16.6 Inference and Approximate Inference
  16.7 The Deep Learning Approach to Structured Probabilistic Models
    16.7.1 Example: The Restricted Boltzmann Machine
Chapter 17 Monte Carlo Methods
  17.1 Sampling and Monte Carlo Methods
    17.1.1 Why Sampling?
    17.1.2 Basics of Monte Carlo Sampling
  17.2 Importance Sampling
  17.3 Markov Chain Monte Carlo Methods
  17.4 Gibbs Sampling
  17.5 The Challenge of Mixing between Separated Modes
    17.5.1 Tempering to Mix between Modes
    17.5.2 Depth May Help Mixing
Chapter 18 Confronting the Partition Function
  18.1 The Log-Likelihood Gradient
  18.2 Stochastic Maximum Likelihood and Contrastive Divergence
  18.3 Pseudolikelihood
  18.4 Score Matching and Ratio Matching
  18.5 Denoising Score Matching
  18.6 Noise-Contrastive Estimation
  18.7 Estimating the Partition Function
    18.7.1 Annealed Importance Sampling
    18.7.2 Bridge Sampling
Chapter 19 Approximate Inference
  19.1 Inference as Optimization
  19.2 Expectation Maximization
  19.3 MAP Inference and Sparse Coding
  19.4 Variational Inference and Learning
    19.4.1 Discrete Latent Variables
    19.4.2 Calculus of Variations
    19.4.3 Continuous Latent Variables
    19.4.4 Interactions between Learning and Inference
  19.5 Learned Approximate Inference
    19.5.1 Wake-Sleep
    19.5.2 Other Forms of Learned Inference
Chapter 20 Deep Generative Models
  20.1 Boltzmann Machines
  20.2 Restricted Boltzmann Machines
    20.2.1 Conditional Distributions
    20.2.2 Training Restricted Boltzmann Machines
  20.3 Deep Belief Networks
  20.4 Deep Boltzmann Machines
    20.4.1 Interesting Properties
    20.4.2 DBM Mean Field Inference
    20.4.3 DBM Parameter Learning
    20.4.4 Layer-Wise Pretraining
    20.4.5 Jointly Training Deep Boltzmann Machines
  20.5 Boltzmann Machines for Real-Valued Data
    20.5.1 Gaussian-Bernoulli RBMs
    20.5.2 Undirected Models of Conditional Covariance
  20.6 Convolutional Boltzmann Machines
  20.7 Boltzmann Machines for Structured or Sequential Outputs
  20.8 Other Boltzmann Machines
  20.9 Back-Propagation through Random Operations
    20.9.1 Back-Propagating through Discrete Stochastic Operations
  20.10 Directed Generative Nets
    20.10.1 Sigmoid Belief Nets
    20.10.2 Differentiable Generator Nets
    20.10.3 Variational Autoencoders
    20.10.4 Generative Adversarial Networks
    20.10.5 Generative Moment Matching Networks
    20.10.6 Convolutional Generative Networks
    20.10.7 Auto-Regressive Networks
    20.10.8 Linear Auto-Regressive Networks
    20.10.9 Neural Auto-Regressive Networks
    20.10.10 NADE
  20.11 Drawing Samples from Autoencoders
    20.11.1 Markov Chain Associated with any Denoising Autoencoder
    20.11.2 Clamping and Conditional Sampling
    20.11.3 Walk-Back Training Procedure
  20.12 Generative Stochastic Networks
    20.12.1 Discriminant GSNs
  20.13 Other Generation Schemes
  20.14 Evaluating Generative Models
  20.15 Conclusion
Bibliography
Index