Download the book The Data Science Workshop

Book details

The Data Science Workshop

Edition: 2
Authors:
Series:
ISBN: 9781800566927
Publisher:
Year of publication: 2020
Number of pages: 823
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 24 MB

Book price (toman): 49,000



Number of ratings: 24


If you need The Data Science Workshop converted to PDF, EPUB, AZW3, MOBI, or DJVU, let support know and they will convert the file for you.

Note that The Data Science Workshop is offered in its original language and is not a Persian translation. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.





Table of contents

Cover
FM
Copyright
Table of Contents
Preface
Chapter 1: Introduction to Data Science in Python
	Introduction
	Application of Data Science
		What Is Machine Learning?
			Supervised Learning
			Unsupervised Learning
			Reinforcement Learning
	Overview of Python
		Types of Variable
			Numeric Variables
			Text Variables
			Python List
			Python Dictionary
		Exercise 1.01: Creating a Dictionary That Will Contain Machine Learning Algorithms
	Python for Data Science
		The pandas Package
			DataFrame and Series
			CSV Files
			Excel Spreadsheets
			JSON
		Exercise 1.02: Loading Data of Different Formats into a pandas DataFrame
	Scikit-Learn
		What Is a Model?
			Model Hyperparameters
			The sklearn API
		Exercise 1.03: Predicting Breast Cancer from a Dataset Using sklearn
		Activity 1.01: Train a Spam Detector Algorithm
	Summary
Chapter 2: Regression
	Introduction
	Simple Linear Regression
		The Method of Least Squares
	Multiple Linear Regression
		Estimating the Regression Coefficients (β0, β1, β2 and β3)
		Logarithmic Transformations of Variables
		Correlation Matrices
	Conducting Regression Analysis Using Python
		Exercise 2.01: Loading and Preparing the Data for Analysis
		The Correlation Coefficient
		Exercise 2.02: Graphical Investigation of Linear Relationships Using Python
		Exercise 2.03: Examining a Possible Log-Linear Relationship Using Python
		The Statsmodels formula API
		Exercise 2.04: Fitting a Simple Linear Regression Model Using the Statsmodels formula API
		Analyzing the Model Summary
		The Model Formula Language
		Intercept Handling
		Activity 2.01: Fitting a Log-Linear Model Using the Statsmodels Formula API
	Multiple Regression Analysis
		Exercise 2.05: Fitting a Multiple Linear Regression Model Using the Statsmodels Formula API
	Assumptions of Regression Analysis
		Activity 2.02: Fitting a Multiple Log-Linear Regression Model
	Explaining the Results of Regression Analysis
		Regression Analysis Checks and Balances
		The F-test
		The t-test
	Summary
Chapter 3: Binary Classification
	Introduction
	Understanding the Business Context
		Business Discovery
		Exercise 3.01: Loading and Exploring the Data from the Dataset
		Testing Business Hypotheses Using Exploratory Data Analysis
		Visualization for Exploratory Data Analysis
		Exercise 3.02: Business Hypothesis Testing for Age versus Propensity for a Term Loan
		Intuitions from the Exploratory Analysis
		Activity 3.01: Business Hypothesis Testing to Find Employment Status versus Propensity for Term Deposits
	Feature Engineering
		Business-Driven Feature Engineering
		Exercise 3.03: Feature Engineering – Exploration of Individual Features
		Exercise 3.04: Feature Engineering – Creating New Features from Existing Ones
	Data-Driven Feature Engineering
		A Quick Peek at Data Types and a Descriptive Summary
	Correlation Matrix and Visualization
		Exercise 3.05: Finding the Correlation in Data to Generate a Correlation Plot Using Bank Data
		Skewness of Data
		Histograms
		Density Plots
		Other Feature Engineering Methods
		Summarizing Feature Engineering
		Building a Binary Classification Model Using the Logistic Regression Function
		Logistic Regression Demystified
		Metrics for Evaluating Model Performance
		Confusion Matrix
		Accuracy
		Classification Report
		Data Preprocessing
		Exercise 3.06: A Logistic Regression Model for Predicting the Propensity of Term Deposit Purchases in a Bank
		Activity 3.02: Model Iteration 2 – Logistic Regression Model with Feature Engineered Variables
		Next Steps
	Summary
Chapter 4: Multiclass Classification with RandomForest
	Introduction
	Training a Random Forest Classifier
	Evaluating the Model's Performance
		Exercise 4.01: Building a Model for Classifying Animal Type and Assessing Its Performance
		Number of Trees Estimator
		Exercise 4.02: Tuning n_estimators to Reduce Overfitting
	Maximum Depth
		Exercise 4.03: Tuning max_depth to Reduce Overfitting
	Minimum Sample in Leaf
		Exercise 4.04: Tuning min_samples_leaf
	Maximum Features
		Exercise 4.05: Tuning max_features
		Activity 4.01: Train a Random Forest Classifier on the ISOLET Dataset
	Summary
Chapter 5: Performing Your First Cluster Analysis
	Introduction
	Clustering with k-means
		Exercise 5.01: Performing Your First Clustering Analysis on the ATO Dataset
	Interpreting k-means Results
		Exercise 5.02: Clustering Australian Postcodes by Business Income and Expenses
	Choosing the Number of Clusters
		Exercise 5.03: Finding the Optimal Number of Clusters
	Initializing Clusters
		Exercise 5.04: Using Different Initialization Parameters to Achieve a Suitable Outcome
	Calculating the Distance to the Centroid
		Exercise 5.05: Finding the Closest Centroids in Our Dataset
	Standardizing Data
		Exercise 5.06: Standardizing the Data from Our Dataset
		Activity 5.01: Perform Customer Segmentation Analysis in a Bank Using k-means
	Summary
Chapter 6: How to Assess Performance
	Introduction
	Splitting Data
		Exercise 6.01: Importing and Splitting Data
	Assessing Model Performance for Regression Models
		Data Structures – Vectors and Matrices
			Scalars
			Vectors
			Matrices
		R2 Score
		Exercise 6.02: Computing the R2 Score of a Linear Regression Model
		Mean Absolute Error
		Exercise 6.03: Computing the MAE of a Model
		Exercise 6.04: Computing the Mean Absolute Error of a Second Model
			Other Evaluation Metrics
	Assessing Model Performance for Classification Models
		Exercise 6.05: Creating a Classification Model for Computing Evaluation Metrics
	The Confusion Matrix
		Exercise 6.06: Generating a Confusion Matrix for the Classification Model
		More on the Confusion Matrix
		Precision
		Exercise 6.07: Computing Precision for the Classification Model
		Recall
		Exercise 6.08: Computing Recall for the Classification Model
		F1 Score
		Exercise 6.09: Computing the F1 Score for the Classification Model
		Accuracy
		Exercise 6.10: Computing Model Accuracy for the Classification Model
		Logarithmic Loss
		Exercise 6.11: Computing the Log Loss for the Classification Model
	Receiver Operating Characteristic Curve
		Exercise 6.12: Computing and Plotting ROC Curve for a Binary Classification Problem
	Area Under the ROC Curve
		Exercise 6.13: Computing the ROC AUC for the Caesarian Dataset
	Saving and Loading Models
		Exercise 6.14: Saving and Loading a Model
		Activity 6.01: Train Three Different Models and Use Evaluation Metrics to Pick the Best Performing Model
	Summary
Chapter 7: The Generalization of Machine Learning Models
	Introduction
	Overfitting
		Training on Too Many Features
		Training for Too Long
	Underfitting
	Data
		The Ratio for Dataset Splits
		Creating Dataset Splits
		Exercise 7.01: Importing and Splitting Data
	Random State
		Exercise 7.02: Setting a Random State When Splitting Data
	Cross-Validation
		KFold
		Exercise 7.03: Creating a Five-Fold Cross-Validation Dataset
		Exercise 7.04: Creating a Five-Fold Cross-Validation Dataset Using a Loop for Calls
	cross_val_score
		Exercise 7.05: Getting the Scores from Five-Fold Cross-Validation
		Understanding Estimators That Implement CV
	LogisticRegressionCV
		Exercise 7.06: Training a Logistic Regression Model Using Cross-Validation
	Hyperparameter Tuning with GridSearchCV
		Decision Trees
		Exercise 7.07: Using Grid Search with Cross-Validation to Find the Best Parameters for a Model
	Hyperparameter Tuning with RandomizedSearchCV
		Exercise 7.08: Using Randomized Search for Hyperparameter Tuning
	Model Regularization with Lasso Regression
		Exercise 7.09: Fixing Model Overfitting Using Lasso Regression
	Ridge Regression
		Exercise 7.10: Fixing Model Overfitting Using Ridge Regression
		Activity 7.01: Find an Optimal Model for Predicting the Critical Temperatures of Superconductors
	Summary
Chapter 8: Hyperparameter Tuning
	Introduction
	What Are Hyperparameters?
		Difference between Hyperparameters and Statistical Model Parameters
		Setting Hyperparameters
		A Note on Defaults
	Finding the Best Hyperparameterization
		Exercise 8.01: Manual Hyperparameter Tuning for a k-NN Classifier
		Advantages and Disadvantages of a Manual Search
	Tuning Using Grid Search
		Simple Demonstration of the Grid Search Strategy
	GridSearchCV
		Tuning using GridSearchCV
			Support Vector Machine (SVM) Classifiers
		Exercise 8.02: Grid Search Hyperparameter Tuning for an SVM
		Advantages and Disadvantages of Grid Search
	Random Search
		Random Variables and Their Distributions
		Simple Demonstration of the Random Search Process
		Tuning Using RandomizedSearchCV
		Exercise 8.03: Random Search Hyperparameter Tuning for a Random Forest Classifier
		Advantages and Disadvantages of a Random Search
		Activity 8.01: Is the Mushroom Poisonous?
	Summary
Chapter 9: Interpreting a Machine Learning Model
	Introduction
	Linear Model Coefficients
		Exercise 9.01: Extracting the Linear Regression Coefficient
	RandomForest Variable Importance
		Exercise 9.02: Extracting RandomForest Feature Importance
	Variable Importance via Permutation
		Exercise 9.03: Extracting Feature Importance via Permutation
	Partial Dependence Plots
		Exercise 9.04: Plotting Partial Dependence
	Local Interpretation with LIME
		Exercise 9.05: Local Interpretation with LIME
		Activity 9.01: Train and Analyze a Network Intrusion Detection Model
	Summary
Chapter 10: Analyzing a Dataset
	Introduction
	Exploring Your Data
	Analyzing Your Dataset
		Exercise 10.01: Exploring the Ames Housing Dataset with Descriptive Statistics
	Analyzing the Content of a Categorical Variable
		Exercise 10.02: Analyzing the Categorical Variables from the Ames Housing Dataset
	Summarizing Numerical Variables
		Exercise 10.03: Analyzing Numerical Variables from the Ames Housing Dataset
	Visualizing Your Data
		Using the Altair API
		Histogram for Numerical Variables
		Bar Chart for Categorical Variables
	Boxplots
		Exercise 10.04: Visualizing the Ames Housing Dataset with Altair
		Activity 10.01: Analyzing Churn Data Using Visual Data Analysis Techniques
	Summary
Chapter 11: Data Preparation
	Introduction
	Handling Row Duplication
		Exercise 11.01: Handling Duplicates in a Breast Cancer Dataset
	Converting Data Types
		Exercise 11.02: Converting Data Types for the Ames Housing Dataset
	Handling Incorrect Values
		Exercise 11.03: Fixing Incorrect Values in the State Column
	Handling Missing Values
		Exercise 11.04: Fixing Missing Values for the Horse Colic Dataset
		Activity 11.01: Preparing the Speed Dating Dataset
	Summary
Chapter 12: Feature Engineering
	Introduction
		Merging Datasets
			The Left Join
			The Right Join
		Exercise 12.01: Merging the ATO Dataset with the Postcode Data
		Binning Variables
		Exercise 12.02: Binning the YearBuilt Variable from the AMES Housing Dataset
		Manipulating Dates
		Exercise 12.03: Date Manipulation on Financial Services Consumer Complaints
		Performing Data Aggregation
		Exercise 12.04: Feature Engineering Using Data Aggregation on the AMES Housing Dataset
		Activity 12.01: Feature Engineering on a Financial Dataset
		Summary
Chapter 13: Imbalanced Datasets
	Introduction
	Understanding the Business Context
		Exercise 13.01: Benchmarking the Logistic Regression Model on the Dataset
		Analysis of the Result
	Challenges of Imbalanced Datasets
	Strategies for Dealing with Imbalanced Datasets
		Collecting More Data
		Resampling Data
		Exercise 13.02: Implementing Random Undersampling and Classification on Our Banking Dataset to Find the Optimal Result
		Analysis
	Generating Synthetic Samples
		Implementation of SMOTE and MSMOTE
		Exercise 13.03: Implementing SMOTE on Our Banking Dataset to Find the Optimal Result
		Exercise 13.04: Implementing MSMOTE on Our Banking Dataset to Find the Optimal Result
		Applying Balancing Techniques on a Telecom Dataset
		Activity 13.01: Finding the Best Balancing Technique by Fitting a Classifier on the Telecom Churn Dataset
	Summary
Chapter 14: Dimensionality Reduction
	Introduction
		Business Context
		Exercise 14.01: Loading and Cleaning the Dataset
	Creating a High-Dimensional Dataset
		Activity 14.01: Fitting a Logistic Regression Model on a High-Dimensional Dataset
	Strategies for Addressing High-Dimensional Datasets
		Backward Feature Elimination (Recursive Feature Elimination)
		Exercise 14.02: Dimensionality Reduction Using Backward Feature Elimination
		Forward Feature Selection
		Exercise 14.03: Dimensionality Reduction Using Forward Feature Selection
		Principal Component Analysis (PCA)
		Exercise 14.04: Dimensionality Reduction Using PCA
		Independent Component Analysis (ICA)
		Exercise 14.05: Dimensionality Reduction Using Independent Component Analysis
		Factor Analysis
		Exercise 14.06: Dimensionality Reduction Using Factor Analysis
	Comparing Different Dimensionality Reduction Techniques
		Activity 14.02: Comparison of Dimensionality Reduction Techniques on the Enhanced Ads Dataset
	Summary
Chapter 15: Ensemble Learning
	Introduction
	Ensemble Learning
		Variance
		Bias
		Business Context
		Exercise 15.01: Loading, Exploring, and Cleaning the Data
		Activity 15.01: Fitting a Logistic Regression Model on Credit Card Data
	Simple Methods for Ensemble Learning
		Averaging
		Exercise 15.02: Ensemble Model Using the Averaging Technique
		Weighted Averaging
		Exercise 15.03: Ensemble Model Using the Weighted Averaging Technique
			Iteration 2 with Different Weights
			Max Voting
		Exercise 15.04: Ensemble Model Using Max Voting
	Advanced Techniques for Ensemble Learning
		Bagging
		Exercise 15.05: Ensemble Learning Using Bagging
		Boosting
		Exercise 15.06: Ensemble Learning Using Boosting
		Stacking
		Exercise 15.07: Ensemble Learning Using Stacking
		Activity 15.02: Comparison of Advanced Ensemble Techniques
	Summary
Index



