Edition:
Authors: Michael Walker
Series:
ISBN: 1803241675, 9781803241678
Publisher: Packt Publishing
Publication year: 2022
Pages: 542
Language: English
File format: EPUB (can be converted to PDF or AZW3 on request)
File size: 9 MB
If you would like the book Data Cleaning and Exploration with Machine Learning: Get to grips with machine learning techniques to achieve sparkling-clean data quickly converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support to convert the file for you.
Please note that this book is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide books translated into or written in Persian.
Explore supercharged machine learning techniques to take care of your data laundry loads.

Key Features:
- Learn how to prepare data for machine learning processes
- Understand which algorithms to use based on prediction objectives and the properties of the data
- Explore how to interpret and evaluate the results from machine learning

Book Description:
Many individuals who know how to run machine learning algorithms do not have a good sense of the statistical assumptions those algorithms make, or of how to match the properties of the data to the algorithm for the best results. As you start with this book, models are carefully chosen to help you grasp the underlying data, including feature importance and correlation, and the distribution of features and targets. The first two parts of the book introduce you to techniques for preparing data for ML algorithms, without being bashful about using some ML techniques for data cleaning, including anomaly detection and feature selection. The book then helps you apply that knowledge to a wide variety of ML tasks. You'll gain an understanding of popular supervised and unsupervised algorithms, how to prepare data for them, and how to evaluate them. Next, you'll build models and understand the relationships in your data, as well as perform cleaning and exploration tasks with that data. You'll make quick progress in studying the distribution of variables, identifying anomalies, and examining bivariate relationships, as the book shifts its focus toward the accuracy of predictions. By the end of this book, you'll be able to deal with complex data problems using unsupervised ML algorithms like principal component analysis and k-means clustering.
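The description closes with principal component analysis and k-means clustering. A minimal sketch of how those two techniques are typically combined, assuming Python and scikit-learn (the dataset and parameter choices below are illustrative, not taken from the book):

```python
# Sketch: reduce dimensionality with PCA, then cluster with k-means.
# Assumes scikit-learn; uses the built-in iris dataset for illustration.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)

# Standardize so each feature contributes equally to the components
X_scaled = StandardScaler().fit_transform(X)

# Keep the first two principal components
X_reduced = PCA(n_components=2).fit_transform(X_scaled)

# Cluster the reduced data into three groups
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)
print(X_reduced.shape)  # (150, 2)
```

Scaling before PCA matters because the components are driven by variance, and unscaled features with large ranges would dominate them.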
What You Will Learn:
- Explore essential data cleaning and exploration techniques to be used before running the most popular machine learning algorithms
- Understand how to perform preprocessing and feature selection, and how to set up the data for testing and validation
- Model continuous targets with supervised learning algorithms
- Model binary and multiclass targets with supervised learning algorithms
- Execute clustering and dimension reduction with unsupervised learning algorithms
- Understand how to use regression trees to model a continuous target

Who this book is for:
This book is for professional data scientists, particularly those in the first few years of their career, or more experienced analysts who are relatively new to machine learning. Readers should have prior knowledge of concepts in statistics typically taught in an undergraduate introductory course, as well as beginner-level experience in manipulating data programmatically.
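The learning objectives above emphasize preprocessing and feature selection with a proper train/test setup. A hedged sketch of that workflow, assuming scikit-learn (the synthetic data and the choice of k=5 are illustrative only):

```python
# Sketch: split data first, then fit a feature selector on the training
# split only, so information from the test set does not leak in.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic classification data: 20 features, 5 of them informative
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Split before selecting features to avoid data leakage
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Keep the 5 features with the highest mutual information with the target
selector = SelectKBest(mutual_info_classif, k=5).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
print(X_train_sel.shape)  # (150, 5)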
Cover; Title Page; Copyright and Credits; Contributors; Table of Contents; Preface

Section 1 – Data Cleaning and Machine Learning Algorithms
Chapter 1: Examining the Distribution of Features and Targets
  Technical requirements; Subsetting data; Generating frequencies for categorical features; Generating summary statistics for continuous and discrete features; Identifying extreme values and outliers in univariate analysis; Using histograms, boxplots, and violin plots to examine the distribution of features (Using histograms; Using boxplots; Using violin plots); Summary
Chapter 2: Examining Bivariate and Multivariate Relationships between Features and Targets
  Technical requirements; Identifying outliers and extreme values in bivariate relationships; Using scatter plots to view bivariate relationships between continuous features; Using grouped boxplots to view bivariate relationships between continuous and categorical features; Using linear regression to identify data points with significant influence; Using K-nearest neighbors to find outliers; Using Isolation Forest to find outliers; Summary
Chapter 3: Identifying and Fixing Missing Values
  Technical requirements; Identifying missing values; Cleaning missing values; Imputing values with regression; Using KNN imputation; Using random forest for imputation; Summary

Section 2 – Preprocessing, Feature Selection, and Sampling
Chapter 4: Encoding, Transforming, and Scaling Features
  Technical requirements; Creating training datasets and avoiding data leakage; Removing redundant or unhelpful features; Encoding categorical features (One-hot encoding; Ordinal encoding); Encoding categorical features with medium or high cardinality; Feature hashing; Using mathematical transformations; Feature binning (Equal-width and equal-frequency binning; K-means binning); Feature scaling; Summary
Chapter 5: Feature Selection
  Technical requirements; Selecting features for classification models (Mutual information classification for feature selection with a categorical target; ANOVA F-value for feature selection with a categorical target); Selecting features for regression models (F-tests for feature selection with a continuous target; Mutual information for feature selection with a continuous target); Using forward and backward feature selection (Using forward feature selection; Using backward feature selection; Using exhaustive feature selection); Eliminating features recursively in a regression model; Eliminating features recursively in a classification model; Using Boruta for feature selection; Using regularization and other embedded methods (Using L1 regularization; Using a random forest classifier); Using principal component analysis; Summary
Chapter 6: Preparing for Model Evaluation
  Technical requirements; Measuring accuracy, sensitivity, specificity, and precision for binary classification; Examining CAP, ROC, and precision-sensitivity curves for binary classification (Constructing CAP curves; Plotting a receiver operating characteristic (ROC) curve; Plotting precision-sensitivity curves); Evaluating multiclass models; Evaluating regression models; Using K-fold cross-validation; Preprocessing data with pipelines; Summary

Section 3 – Modeling Continuous Targets with Supervised Learning
Chapter 7: Linear Regression Models
  Technical requirements; Key concepts (Key assumptions of linear regression models; Linear regression and ordinary least squares; Linear regression and gradient descent); Using classical linear regression (Pre-processing the data for our regression model; Running and evaluating our linear model; Improving our model evaluation); Using lasso regression (Tuning hyperparameters with grid searches); Using non-linear regression; Regression with gradient descent; Summary
Chapter 8: Support Vector Regression
  Technical requirements; Key concepts of SVR (Nonlinear SVR and the kernel trick); SVR with a linear model; Using kernels for nonlinear SVR; Summary
Chapter 9: K-Nearest Neighbors, Decision Tree, Random Forest, and Gradient Boosted Regression
  Technical requirements; Key concepts for K-nearest neighbors regression; K-nearest neighbors regression; Key concepts for decision tree and random forest regression; Using random forest regression; Decision tree and random forest regression; A decision tree example with interpretation; Building and interpreting our actual model; Random forest regression; Using gradient boosted regression; Summary

Section 4 – Modeling Dichotomous and Multiclass Targets with Supervised Learning
Chapter 10: Logistic Regression
  Technical requirements; Key concepts of logistic regression (Logistic regression extensions); Binary classification with logistic regression; Evaluating a logistic regression model; Regularization with logistic regression; Multinomial logistic regression; Summary
Chapter 11: Decision Trees and Random Forest Classification
  Technical requirements; Key concepts (Using random forest for classification; Using gradient-boosted decision trees); Decision tree models; Implementing random forest; Implementing gradient boosting; Summary
Chapter 12: K-Nearest Neighbors for Classification
  Technical requirements; Key concepts of KNN; KNN for binary classification; KNN for multiclass classification; KNN for letter recognition; Summary
Chapter 13: Support Vector Machine Classification
  Technical requirements; Key concepts for SVC (Nonlinear SVM and the kernel trick; Multiclass classification with SVC); Linear SVC models; Nonlinear SVM classification models; SVMs for multiclass classification; Summary
Chapter 14: Naïve Bayes Classification
  Technical requirements; Key concepts; Naïve Bayes classification models; Naïve Bayes for text classification; Summary

Section 5 – Clustering and Dimensionality Reduction with Unsupervised Learning
Chapter 15: Principal Component Analysis
  Technical requirements; Key concepts of PCA; Feature extraction with PCA; Using kernels with PCA; Summary
Chapter 16: K-Means and DBSCAN Clustering
  Technical requirements; The key concepts of k-means and DBSCAN clustering; Implementing k-means clustering; Implementing DBSCAN clustering; Summary

Index; About Packt; Other Books You May Enjoy
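Chapter 6's topics include "Using K-fold cross-validation" and "Preprocessing data with pipelines". A minimal sketch of how those two ideas fit together, assuming scikit-learn (the synthetic data, the logistic-regression estimator, and the 5-fold choice are illustrative, not the book's own example):

```python
# Sketch: put preprocessing inside a pipeline so that, during k-fold
# cross-validation, the scaler is refit on each fold's training split.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, KFold

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Scaling happens inside the pipeline, so each fold is scaled using
# only its own training data -- no leakage across folds
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(len(scores))  # 5
```

Wrapping the scaler and the estimator in one object is what lets `cross_val_score` repeat the whole preprocessing-plus-fit sequence per fold, which is the point of combining these two chapters' techniques.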