Edition:
Author(s): Hefin Ioan Rhys
Series:
ISBN: 1617296570, 9781617296574
Publisher: Manning Publications
Publication year: 2020
Number of pages: 538
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 39 MB
If you would like the file for Machine Learning with R, Tidyverse, and Mlr converted to PDF, EPUB, AZW3, MOBI, or DJVU format, you can notify support and they will convert the file for you.
Please note that Machine Learning with R, Tidyverse, and Mlr is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
Machine Learning with R, the tidyverse, and mlr teaches readers how to gain valuable insights from their data using the powerful R programming language. In his engaging and informal style, author and R expert Hefin Ioan Rhys lays a firm foundation of ML basics and introduces readers to the tidyverse, a powerful set of R tools designed specifically for practical data science.

Key features:
* Commonly used ML techniques
* Using the tidyverse packages to organize and plot your data
* Validating model performance
* Choosing the best ML model for your task
* A variety of hands-on coding exercises
* ML best practices

About the reader: For readers with basic programming skills in R, Python, or another standard programming language.

About the technology: Machine learning techniques accurately and efficiently identify patterns and relationships in data and use those models to make predictions about new data. ML techniques can work on even relatively small datasets, making these skills a powerful ally for nearly any data analysis task.

About the author: Hefin Ioan Rhys is a senior laboratory research scientist in the Flow Cytometry Shared Technology Platform at The Francis Crick Institute. He spent the final year of his PhD program teaching basic R skills at the university. A data science and machine learning enthusiast, he has his own YouTube channel featuring screencast tutorials in R and RStudio.
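To give a flavor of the workflow the blurb describes, here is a minimal sketch (not taken from the book) that plots a dataset with the tidyverse and then trains and cross-validates a k-nearest neighbors classifier with the mlr package; the built-in iris data and the value k = 5 are arbitrary choices made purely for illustration:

    library(tidyverse)   # tibble, dplyr, ggplot2, and friends for organizing and plotting data
    library(mlr)         # the machine learning framework used throughout the book

    # Organize and plot the data with tidyverse tools
    irisTib <- as_tibble(iris)
    ggplot(irisTib, aes(Petal.Length, Petal.Width, col = Species)) +
      geom_point()

    # Define the task (data + target variable) and the learner (algorithm + hyperparameters)
    irisTask   <- makeClassifTask(data = iris, target = "Species")
    knnLearner <- makeLearner("classif.knn", par.vals = list(k = 5))

    # Train the model, then estimate its performance with 10-fold cross-validation
    knnModel <- train(knnLearner, irisTask)
    kFold    <- resample(learner = knnLearner, task = irisTask,
                         resampling = makeResampleDesc("CV", iters = 10),
                         measures = mmce)
    kFold$aggr   # mean misclassification error across the folds

The task-then-learner pattern sketched here mirrors the structure suggested by the table of contents below (for example, sections 3.2.3 "Defining the task" and 3.2.4 "Defining the learner").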
Machine Learning with R, the tidyverse, and mlr (table of contents)

Front matter: brief contents · contents · preface · acknowledgments · about this book (Who should read this book · How this book is organized: A roadmap · About the code · liveBook discussion forum) · about the author · about the cover illustration

Part 1: Introduction

1 Introduction to machine learning
  1.1 What is machine learning? · 1.1.1 AI and machine learning · 1.1.2 The difference between a model and an algorithm · 1.2 Classes of machine learning algorithms · 1.2.1 Differences between supervised, unsupervised, and semi-supervised learning · 1.2.2 Classification, regression, dimension reduction, and clustering · 1.2.3 A brief word on deep learning · 1.3 Thinking about the ethical impact of machine learning · 1.4 Why use R for machine learning? · 1.5 Which datasets will we use? · 1.6 What will you learn in this book? · Summary

2 Tidying, manipulating, and plotting data with the tidyverse
  2.1 What is the tidyverse, and what is tidy data? · 2.2 Loading the tidyverse · 2.3 What the tibble package is and what it does · 2.3.1 Creating tibbles · 2.3.2 Converting existing data frames into tibbles · 2.3.3 Differences between data frames and tibbles · 2.4 What the dplyr package is and what it does · 2.4.1 Manipulating the CO2 dataset with dplyr · 2.4.2 Chaining dplyr functions together · 2.5 What the ggplot2 package is and what it does · 2.6 What the tidyr package is and what it does · 2.7 What the purrr package is and what it does · 2.7.1 Replacing for loops with map() · 2.7.2 Returning an atomic vector instead of a list · 2.7.3 Using anonymous functions inside the map() family · 2.7.4 Using walk() to produce a function's side effects · 2.7.5 Iterating over multiple lists simultaneously · Summary · Solutions to exercises

Part 2: Classification

3 Classifying based on similarities with k-nearest neighbors
  3.1 What is the k-nearest neighbors algorithm? · 3.1.1 How does the k-nearest neighbors algorithm learn? · 3.1.2 What happens if the vote is tied? · 3.2 Building your first kNN model · 3.2.1 Loading and exploring the diabetes dataset · 3.2.2 Using mlr to train your first kNN model · 3.2.3 Telling mlr what we're trying to achieve: Defining the task · 3.2.4 Telling mlr which algorithm to use: Defining the learner · 3.2.5 Putting it all together: Training the model · 3.3 Balancing two sources of model error: The bias-variance trade-off · 3.4 Using cross-validation to tell if we're overfitting or underfitting · 3.5 Cross-validating our kNN model · 3.5.1 Holdout cross-validation · 3.5.2 K-fold cross-validation · 3.5.3 Leave-one-out cross-validation · 3.6 What algorithms can learn, and what they must be told: Parameters and hyperparameters · 3.7 Tuning k to improve the model · 3.7.1 Including hyperparameter tuning in cross-validation · 3.7.2 Using our model to make predictions · 3.8 Strengths and weaknesses of kNN · Summary · Solutions to exercises

4 Classifying based on odds with logistic regression
  4.1 What is logistic regression? · 4.1.1 How does logistic regression learn? · 4.1.2 What if we have more than two classes? · 4.2 Building your first logistic regression model · 4.2.1 Loading and exploring the Titanic dataset · 4.2.2 Making the most of the data: Feature engineering and feature selection · 4.2.3 Plotting the data · 4.2.4 Training the model · 4.2.5 Dealing with missing data · 4.2.6 Training the model (take two) · 4.3 Cross-validating the logistic regression model · 4.3.1 Including missing value imputation in cross-validation · 4.3.2 Accuracy is the most important performance metric, right? · 4.4 Interpreting the model: The odds ratio · 4.4.1 Converting model parameters into odds ratios · 4.4.2 When a one-unit increase doesn't make sense · 4.5 Using our model to make predictions · 4.6 Strengths and weaknesses of logistic regression · Summary · Solutions to exercises

5 Classifying by maximizing separation with discriminant analysis
  5.1 What is discriminant analysis? · 5.1.1 How does discriminant analysis learn? · 5.1.2 What if we have more than two classes? · 5.1.3 Learning curves instead of straight lines: QDA · 5.1.4 How do LDA and QDA make predictions? · 5.2 Building your first linear and quadratic discriminant models · 5.2.1 Loading and exploring the wine dataset · 5.2.2 Plotting the data · 5.2.3 Training the models · 5.3 Strengths and weaknesses of LDA and QDA · Summary · Solutions to exercises

6 Classifying with naive Bayes and support vector machines
  6.1 What is the naive Bayes algorithm? · 6.1.1 Using naive Bayes for classification · 6.1.2 Calculating the likelihood for categorical and continuous predictors · 6.2 Building your first naive Bayes model · 6.2.1 Loading and exploring the HouseVotes84 dataset · 6.2.2 Plotting the data · 6.2.3 Training the model · 6.3 Strengths and weaknesses of naive Bayes · 6.4 What is the support vector machine (SVM) algorithm? · 6.4.1 SVMs for linearly separable data · 6.4.2 What if the classes aren't fully separable? · 6.4.3 SVMs for non-linearly separable data · 6.4.4 Hyperparameters of the SVM algorithm · 6.4.5 What if we have more than two classes? · 6.5 Building your first SVM model · 6.5.1 Loading and exploring the spam dataset · 6.5.2 Tuning our hyperparameters · 6.5.3 Training the model with the tuned hyperparameters · 6.6 Cross-validating our SVM model · 6.7 Strengths and weaknesses of the SVM algorithm · Summary · Solutions to exercises

7 Classifying with decision trees
  7.1 What is the recursive partitioning algorithm? · 7.1.1 Using Gini gain to split the tree · 7.1.2 What about continuous and multilevel categorical predictors? · 7.1.3 Hyperparameters of the rpart algorithm · 7.2 Building your first decision tree model · 7.3 Loading and exploring the zoo dataset · 7.4 Training the decision tree model · 7.4.1 Training the model with the tuned hyperparameters · 7.5 Cross-validating our decision tree model · 7.6 Strengths and weaknesses of tree-based algorithms · Summary

8 Improving decision trees with random forests and boosting
  8.1 Ensemble techniques: Bagging, boosting, and stacking · 8.1.1 Training models on sampled data: Bootstrap aggregating · 8.1.2 Learning from the previous models' mistakes: Boosting · 8.1.3 Learning from predictions made by other models: Stacking · 8.2 Building your first random forest model · 8.3 Building your first XGBoost model · 8.4 Strengths and weaknesses of tree-based algorithms · 8.5 Benchmarking algorithms against each other · Summary

Part 3: Regression

9 Linear regression
  9.1 What is linear regression? · 9.1.1 What if we have multiple predictors? · 9.1.2 What if our predictors are categorical? · 9.2 Building your first linear regression model · 9.2.1 Loading and exploring the Ozone dataset · 9.2.2 Imputing missing values · 9.2.3 Automating feature selection · 9.2.4 Including imputation and feature selection in cross-validation · 9.2.5 Interpreting the model · 9.3 Strengths and weaknesses of linear regression · Summary · Solutions to exercises

10 Nonlinear regression with generalized additive models
  10.1 Making linear regression nonlinear with polynomial terms · 10.2 More flexibility: Splines and generalized additive models · 10.2.1 How GAMs learn their smoothing functions · 10.2.2 How GAMs handle categorical variables · 10.3 Building your first GAM · 10.4 Strengths and weaknesses of GAMs · Summary · Solutions to exercises

11 Preventing overfitting with ridge regression, LASSO, and elastic net
  11.1 What is regularization? · 11.2 What is ridge regression? · 11.3 What is the L2 norm, and how does ridge regression use it? · 11.4 What is the L1 norm, and how does LASSO use it? · 11.5 What is elastic net? · 11.6 Building your first ridge, LASSO, and elastic net models · 11.6.1 Loading and exploring the Iowa dataset · 11.6.2 Training the ridge regression model · 11.6.3 Training the LASSO model · 11.6.4 Training the elastic net model · 11.7 Benchmarking ridge, LASSO, elastic net, and OLS against each other · 11.8 Strengths and weaknesses of ridge, LASSO, and elastic net · Summary · Solutions to exercises

12 Regression with kNN, random forest, and XGBoost
  12.1 Using k-nearest neighbors to predict a continuous variable · 12.2 Using tree-based learners to predict a continuous variable · 12.3 Building your first kNN regression model · 12.3.1 Loading and exploring the fuel dataset · 12.3.2 Tuning the k hyperparameter · 12.4 Building your first random forest regression model · 12.5 Building your first XGBoost regression model · 12.6 Benchmarking the kNN, random forest, and XGBoost model-building processes · 12.7 Strengths and weaknesses of kNN, random forest, and XGBoost · Summary · Solutions to exercises

Part 4: Dimension reduction

13 Maximizing variance with principal component analysis
  13.1 Why dimension reduction? · 13.1.1 Visualizing high-dimensional data · 13.1.2 Consequences of the curse of dimensionality · 13.1.3 Consequences of collinearity · 13.1.4 Mitigating the curse of dimensionality and collinearity by using dimension reduction · 13.2 What is principal component analysis? · 13.3 Building your first PCA model · 13.3.1 Loading and exploring the banknote dataset · 13.3.2 Performing PCA · 13.3.3 Plotting the result of our PCA · 13.3.4 Computing the component scores of new data · 13.4 Strengths and weaknesses of PCA · Summary · Solutions to exercises

14 Maximizing similarity with t-SNE and UMAP
  14.1 What is t-SNE? · 14.2 Building your first t-SNE embedding · 14.2.1 Performing t-SNE · 14.2.2 Plotting the result of t-SNE · 14.3 What is UMAP? · 14.4 Building your first UMAP model · 14.4.1 Performing UMAP · 14.4.2 Plotting the result of UMAP · 14.4.3 Computing the UMAP embeddings of new data · 14.5 Strengths and weaknesses of t-SNE and UMAP · Summary · Solutions to exercises

15 Self-organizing maps and locally linear embedding
  15.1 Prerequisites: Grids of nodes and manifolds · 15.2 What are self-organizing maps? · 15.2.1 Creating the grid of nodes · 15.2.2 Randomly assigning weights, and placing cases in nodes · 15.2.3 Updating node weights to better match the cases inside them · 15.3 Building your first SOM · 15.3.1 Loading and exploring the flea dataset · 15.3.2 Training the SOM · 15.3.3 Plotting the SOM result · 15.3.4 Mapping new data onto the SOM · 15.4 What is locally linear embedding? · 15.5 Building your first LLE · 15.5.1 Loading and exploring the S-curve dataset · 15.5.2 Training the LLE · 15.5.3 Plotting the LLE result · 15.6 Building an LLE of our flea data · 15.7 Strengths and weaknesses of SOMs and LLE · Summary · Solutions to exercises

Part 5: Clustering

16 Clustering by finding centers with k-means
  16.1 What is k-means clustering? · 16.1.1 Lloyd's algorithm · 16.1.2 MacQueen's algorithm · 16.1.3 Hartigan-Wong algorithm · 16.2 Building your first k-means model · 16.2.1 Loading and exploring the GvHD dataset · 16.2.2 Defining our task and learner · 16.2.3 Choosing the number of clusters · 16.2.4 Tuning k and the algorithm choice for our k-means model · 16.2.5 Training the final, tuned k-means model · 16.2.6 Using our model to predict clusters of new data · 16.3 Strengths and weaknesses of k-means clustering · Summary · Solutions to exercises

17 Hierarchical clustering
  17.1 What is hierarchical clustering? · 17.1.1 Agglomerative hierarchical clustering · 17.1.2 Divisive hierarchical clustering · 17.2 Building your first agglomerative hierarchical clustering model · 17.2.1 Choosing the number of clusters · 17.2.2 Cutting the tree to select a flat set of clusters · 17.3 How stable are our clusters? · 17.4 Strengths and weaknesses of hierarchical clustering · Summary · Solutions to exercises

18 Clustering based on density: DBSCAN and OPTICS
  18.1 What is density-based clustering? · 18.1.1 How does the DBSCAN algorithm learn? · 18.1.2 How does the OPTICS algorithm learn? · 18.2 Building your first DBSCAN model · 18.2.1 Loading and exploring the banknote dataset · 18.2.2 Tuning the epsilon and minPts hyperparameters · 18.3 Building your first OPTICS model · 18.4 Strengths and weaknesses of density-based clustering · Summary · Solutions to exercises

19 Clustering based on distributions with mixture modeling
  19.1 What is mixture model clustering? · 19.1.1 Calculating probabilities with the EM algorithm · 19.1.2 EM algorithm expectation and maximization steps · 19.1.3 What if we have more than one variable? · 19.2 Building your first Gaussian mixture model for clustering · 19.3 Strengths and weaknesses of mixture model clustering · Summary · Solutions to exercises

20 Final notes and further reading
  20.1 A brief recap of machine learning concepts · 20.1.1 Supervised, unsupervised, and semi-supervised learning · 20.1.2 Balancing the bias-variance trade-off for model performance · 20.1.3 Using model validation to identify over-/underfitting · 20.1.4 Maximizing model performance with hyperparameter tuning · 20.1.5 Using missing value imputation to deal with missing data · 20.1.6 Feature engineering and feature selection · 20.1.7 Improving model performance with ensemble techniques · 20.1.8 Preventing overfitting with regularization · 20.2 Where can you go from here? · 20.2.1 Deep learning · 20.2.2 Reinforcement learning · 20.2.3 General R data science and the tidyverse · 20.2.4 mlr tutorial and creating new learners/metrics · 20.2.5 Generalized additive models · 20.2.6 Ensemble methods · 20.2.7 Support vector machines · 20.2.8 Anomaly detection · 20.2.9 Time series · 20.2.10 Clustering · 20.2.11 Generalized linear models · 20.2.12 Semi-supervised learning · 20.2.13 Modeling spectral data · 20.3 The last word

Appendix: Refresher on statistical concepts
  A.1 Data vocabulary · A.1.1 Sample vs. population · A.1.2 Rows and columns · A.1.3 Variable types · A.2 Vectors · A.3 Distributions · A.4 Sigma notation · A.5 Central tendency · A.5.1 Arithmetic mean · A.5.2 Median · A.5.3 Mode · A.6 Measures of dispersion · A.6.1 Mean absolute deviation · A.6.2 Standard deviation · A.6.3 Variance · A.6.4 Interquartile range · A.7 Measures of the relationships between variables · A.7.1 Covariance · A.7.2 Pearson correlation coefficient · A.8 Logarithms

Index