Authors: Norman Matloff
ISBN: 9781718502116, 9781718502109
Publisher: No Starch Press, Inc.
Year of publication: 2023
Number of pages: 272
Language: English
File format: EPUB (can be converted to PDF, EPUB, or AZW3 at the user's request)
File size: 18 MB
If you would like the file of The Art of Machine Learning: A Hands-On Guide to Machine Learning with R converted to PDF, EPUB, AZW3, MOBI, or DJVU format, you can notify support and they will convert the file for you.
Please note that The Art of Machine Learning: A Hands-On Guide to Machine Learning with R is the original English-language edition, not a Persian translation. The International Library website offers books in their original language only and does not provide any books translated into or written in Persian.
Learn to expertly apply a range of machine learning methods to real data with this practical guide. Packed with real datasets and practical examples, The Art of Machine Learning will help you develop an intuitive understanding of how and why ML methods work, without the need for advanced math.

As you work through the book, you’ll learn how to implement a range of powerful ML techniques, starting with the k-Nearest Neighbors (k-NN) method and random forests, and moving on to gradient boosting, support vector machines (SVMs), neural networks, and more. With the aid of real datasets, you’ll delve into regression models through the use of a bike-sharing dataset, explore decision trees by leveraging New York City taxi data, and dissect parametric methods with baseball player stats. You’ll also find expert tips for avoiding common problems, like handling “dirty” or unbalanced data, and how to troubleshoot pitfalls.

You’ll also explore:

- How to deal with large datasets and techniques for dimension reduction
- Details on how the Bias-Variance Trade-off plays out in specific ML methods
- Models based on linear relationships, including ridge and LASSO regression
- Real-world image and text classification and how to handle time series data

Machine learning is an art that requires careful tuning and tweaking. With The Art of Machine Learning as your guide, you’ll master the underlying principles of ML that will empower you to effectively use these models, rather than simply apply a few stock actions with limited practical use.

Requirements: A basic understanding of graphs and charts and familiarity with the R programming language
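To give a flavor of the hands-on workflow the blurb describes, below is a minimal sketch of a qe*-series call of the kind the book is built around. It assumes the author's qeML package (the qe*-series functions originated in his regtools package) and uses R's built-in mtcars data rather than the book's bike-sharing dataset; the argument values shown are illustrative, not the book's own examples.

    # Minimal sketch: k-NN regression via a qe*-series wrapper.
    # Assumes the qeML package is installed: install.packages("qeML")
    library(qeML)

    # Predict gas mileage (mpg) from the other mtcars columns. qeKNN()
    # sets aside a random holdout set by default and evaluates on it.
    knnOut <- qeKNN(mtcars, "mpg", k = 5)
    knnOut$testAcc   # mean absolute prediction error on the holdout set

    # Predict a new case: reuse row 1's feature values, minus the mpg column.
    newCase <- mtcars[1, -1]
    predict(knnOut, newCase)

The one-call interface (data frame in, fitted model with holdout accuracy out) is the pattern the book applies across k-NN, trees, boosting, SVMs, and neural networks.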
Cover Page
Title Page
Copyright Page
About the Author
About the Technical Reviewer
BRIEF CONTENTS
CONTENTS IN DETAIL
ACKNOWLEDGMENTS

INTRODUCTION
  0.1 What Is ML?
  0.2 The Role of Math in ML Theory and Practice
  0.3 Why Another ML Book?
  0.4 Recurring Special Sections
  0.5 Background Needed
  0.6 The qe*-Series Software
  0.7 The Book’s Grand Plan
  0.8 One More Point

PART I: PROLOGUE, AND NEIGHBORHOOD-BASED METHODS

1 REGRESSION MODELS
  1.1 Example: The Bike Sharing Dataset
    1.1.1 Loading the Data
    1.1.2 A Look Ahead
  1.2 Machine Learning and Prediction
    1.2.1 Predicting Past, Present, and Future
    1.2.2 Statistics vs. Machine Learning in Prediction
  1.3 Introducing the k-Nearest Neighbors Method
    1.3.1 Predicting Bike Ridership with k-NN
  1.4 Dummy Variables and Categorical Variables
  1.5 Analysis with qeKNN()
    1.5.1 Predicting Bike Ridership with qeKNN()
  1.6 The Regression Function: The Basis of ML
  1.7 The Bias-Variance Trade-off
    1.7.1 Analogy to Election Polls
    1.7.2 Back to ML
  1.8 Example: The mlb Dataset
  1.9 k-NN and Categorical Features
  1.10 Scaling
  1.11 Choosing Hyperparameters
    1.11.1 Predicting the Training Data
  1.12 Holdout Sets
    1.12.1 Loss Functions
    1.12.2 Holdout Sets in the qe*-Series
    1.12.3 Motivating Cross-Validation
    1.12.4 Hyperparameters, Dataset Size, and Number of Features
  1.13 Pitfall: p-Hacking and Hyperparameter Selection
  1.14 Pitfall: Long-Term Time Trends
  1.15 Pitfall: Dirty Data
  1.16 Pitfall: Missing Data
  1.17 Direct Access to the regtools k-NN Code
  1.18 Conclusions

2 CLASSIFICATION MODELS
  2.1 Classification Is a Special Case of Regression
  2.2 Example: The Telco Churn Dataset
    2.2.1 Pitfall: Factor Data Read as Non-factor
    2.2.2 Pitfall: Retaining Useless Features
    2.2.3 Dealing with NA Values
    2.2.4 Applying the k-Nearest Neighbors Method
    2.2.5 Pitfall: Overfitting Due to Features with Many Categories
  2.3 Example: Vertebrae Data
    2.3.1 Analysis
  2.4 Pitfall: Error Rate Improves Only Slightly Using the Features
  2.5 The Confusion Matrix
  2.6 Clearing the Confusion: Unbalanced Data
    2.6.1 Example: The Kaggle Appointments Dataset
    2.6.2 A Better Approach to Unbalanced Data
  2.7 Receiver Operating Characteristic and Area Under Curve
    2.7.1 Details of ROC and AUC
    2.7.2 The qeROC() Function
    2.7.3 Example: Telco Churn Data
    2.7.4 Example: Vertebrae Data
    2.7.5 Pitfall: Overreliance on AUC
  2.8 Conclusions

3 BIAS, VARIANCE, OVERFITTING, AND CROSS-VALIDATION
  3.1 Overfitting and Underfitting
    3.1.1 Intuition Regarding the Number of Features and Overfitting
    3.1.2 Relation to Overall Dataset Size
    3.1.3 Well Then, What Are the Best Values of k and p?
  3.2 Cross-Validation
    3.2.1 K-Fold Cross-Validation
    3.2.2 Using the replicMeans() Function
    3.2.3 Example: Programmer and Engineer Data
    3.2.4 Triple Cross-Validation
  3.3 Conclusions

4 DEALING WITH LARGE NUMBERS OF FEATURES
  4.1 Pitfall: Computational Issues in Large Datasets
  4.2 Introduction to Dimension Reduction
    4.2.1 Example: The Million Song Dataset
    4.2.2 The Need for Dimension Reduction
  4.3 Methods for Dimension Reduction
    4.3.1 Consolidation and Embedding
    4.3.2 The All Possible Subsets Method
    4.3.3 Principal Components Analysis
    4.3.4 But Now We Have Two Hyperparameters
    4.3.5 Using the qePCA() Wrapper
    4.3.6 PCs and the Bias-Variance Trade-off
  4.4 The Curse of Dimensionality
  4.5 Other Methods of Dimension Reduction
    4.5.1 Feature Ordering by Conditional Independence
    4.5.2 Uniform Manifold Approximation and Projection
  4.6 Going Further Computationally
  4.7 Conclusions

PART II: TREE-BASED METHODS

5 A STEP BEYOND K-NN: DECISION TREES
  5.1 Basics of Decision Trees
  5.2 The qeDT() Function
    5.2.1 Looking at the Plot
  5.3 Example: New York City Taxi Data
    5.3.1 Pitfall: Too Many Combinations of Factor Levels
    5.3.2 Tree-Based Analysis
  5.4 Example: Forest Cover Data
  5.5 Decision Tree Hyperparameters: How to Split?
  5.6 Hyperparameters in the qeDT() Function
  5.7 Conclusions

6 TWEAKING THE TREES
  6.1 Bias vs. Variance, Bagging, and Boosting
  6.2 Bagging: Generating New Trees by Resampling
    6.2.1 Random Forests
    6.2.2 The qeRF() Function
    6.2.3 Example: Vertebrae Data
    6.2.4 Example: Remote-Sensing Soil Analysis
  6.3 Boosting: Repeatedly Tweaking a Tree
    6.3.1 Implementation: AdaBoost
    6.3.2 Gradient Boosting
    6.3.3 Example: Call Network Monitoring
    6.3.4 Example: Vertebrae Data
    6.3.5 Bias vs. Variance in Boosting
    6.3.6 Computational Speed
    6.3.7 Further Hyperparameters
    6.3.8 The Learning Rate
  6.4 Pitfall: No Free Lunch

7 FINDING A GOOD SET OF HYPERPARAMETERS
  7.1 Combinations of Hyperparameters
  7.2 Grid Searching with qeFT()
    7.2.1 How to Call qeFT()
  7.3 Example: Programmer and Engineer Data
    7.3.1 Confidence Intervals
    7.3.2 The Takeaway on Grid Searching
  7.4 Example: Programmer and Engineer Data
  7.5 Example: Phoneme Data
  7.6 Conclusions

PART III: METHODS BASED ON LINEAR RELATIONSHIPS

8 PARAMETRIC METHODS
  8.1 Motivating Example: The Baseball Player Data
    8.1.1 A Graph to Guide Our Intuition
    8.1.2 View as Dimension Reduction
  8.2 The lm() Function
  8.3 Wrapper for lm() in the qe*-Series: qeLin()
  8.4 Use of Multiple Features
    8.4.1 Example: Baseball Player, Continued
    8.4.2 Beta Notation
    8.4.3 Example: Airbnb Data
    8.4.4 Applying the Linear Model
  8.5 Dimension Reduction
    8.5.1 Which Features Are Important?
    8.5.2 Statistical Significance and Dimension Reduction
  8.6 Least Squares and Residuals
  8.7 Diagnostics: Is the Linear Model Valid?
    8.7.1 Exactness?
    8.7.2 Diagnostic Methods
  8.8 The R-Squared Value(s)
  8.9 Classification Applications: The Logistic Model
    8.9.1 The glm() and qeLogit() Functions
    8.9.2 Example: Telco Churn Data
    8.9.3 Multiclass Case
    8.9.4 Example: Fall Detection Data
  8.10 Bias and Variance in Linear/Generalized Linear Models
    8.10.1 Example: Bike Sharing Data
  8.11 Polynomial Models
    8.11.1 Motivation
    8.11.2 Modeling Nonlinearity with a Linear Model
    8.11.3 Polynomial Logistic Regression
    8.11.4 Example: Programmer and Engineer Wages
  8.12 Blending the Linear Model with Other Methods
  8.13 The qeCompare() Function
    8.13.1 Need for Caution Regarding Polynomial Models
  8.14 What’s Next

9 CUTTING THINGS DOWN TO SIZE: REGULARIZATION
  9.1 Motivation
  9.2 Size of a Vector
  9.3 Ridge Regression and the LASSO
    9.3.1 How They Work
    9.3.2 The Bias-Variance Trade-off, Avoiding Overfitting
    9.3.3 Relation Between λ, n, and p
    9.3.4 Comparison, Ridge vs. LASSO
  9.4 Software
  9.5 Example: NYC Taxi Data
  9.6 Example: Airbnb Data
  9.7 Example: African Soil Data
    9.7.1 LASSO Analysis
  9.8 Optional Section: The Famous LASSO Picture
  9.9 Coming Up

PART IV: METHODS BASED ON SEPARATING LINES AND PLANES

10 A BOUNDARY APPROACH: SUPPORT VECTOR MACHINES
  10.1 Motivation
    10.1.1 Example: The Forest Cover Dataset
  10.2 Lines, Planes, and Hyperplanes
  10.3 Math Notation
    10.3.1 Vector Expressions
    10.3.2 Dot Products
    10.3.3 SVM as a Parametric Model
  10.4 SVM: The Basic Ideas—Separable Case
    10.4.1 Example: The Anderson Iris Dataset
    10.4.2 Optimizing Criterion
  10.5 Major Problem: Lack of Linear Separability
    10.5.1 Applying a “Kernel”
    10.5.2 Soft Margin
  10.6 Example: Forest Cover Data
  10.7 And What About That Kernel Trick?
  10.8 “Warning: Maximum Number of Iterations Reached”
  10.9 Summary

11 LINEAR MODELS ON STEROIDS: NEURAL NETWORKS
  11.1 Overview
  11.2 Working on Top of a Complex Infrastructure
  11.3 Example: Vertebrae Data
  11.4 Neural Network Hyperparameters
  11.5 Activation Functions
  11.6 Regularization
    11.6.1 L1 and L2 Regularization
    11.6.2 Regularization by Dropout
  11.7 Example: Fall Detection Data
  11.8 Pitfall: Convergence Problems
  11.9 Close Relation to Polynomial Regression
  11.10 Bias vs. Variance in Neural Networks
  11.11 Discussion

PART V: APPLICATIONS

12 IMAGE CLASSIFICATION
  12.1 Example: The Fashion MNIST Data
    12.1.1 A First Try Using a Logit Model
    12.1.2 Refinement via PCA
  12.2 Convolutional Models
    12.2.1 Need for Recognition of Locality
    12.2.2 Overview of Convolutional Methods
    12.2.3 Image Tiling
    12.2.4 The Convolution Operation
    12.2.5 The Pooling Operation
    12.2.6 Shape Evolution Across Layers
    12.2.7 Dropout
    12.2.8 Summary of Shape Evolution
    12.2.9 Translation Invariance
  12.3 Tricks of the Trade
    12.3.1 Data Augmentation
    12.3.2 Pretrained Networks
  12.4 So, What About the Overfitting Issue?
  12.5 Conclusions

13 HANDLING TIME SERIES AND TEXT DATA
  13.1 Converting Time Series Data to Rectangular Form
    13.1.1 Toy Example
    13.1.2 The regtools Function TStoX()
  13.2 The qeTS() Function
  13.3 Example: Weather Data
  13.4 Bias vs. Variance
  13.5 Text Applications
    13.5.1 The Bag-of-Words Model
    13.5.2 The qeText() Function
    13.5.3 Example: Quiz Data
    13.5.4 Example: AG News Dataset
  13.6 Summary

A LIST OF ACRONYMS AND SYMBOLS
B STATISTICS AND ML TERMINOLOGY CORRESPONDENCE
C MATRICES, DATA FRAMES, AND FACTOR CONVERSIONS
  C.1 Matrices
  C.2 Conversions: Between R Factors and Dummy Variables, Between Data Frames and Matrices
D PITFALL: BEWARE OF “P-HACKING”!
INDEX