Author: Poornachandra Sarang
Series: The Springer Series in Applied Machine Learning
ISBN: 3031023625, 9783031023620
Publisher: Springer
Year of publication: 2023
Number of pages: 365 [366]
Language: English
File format: PDF (can be converted to EPUB or AZW3 on request)
File size: 18 MB
Note: Thinking Data Science: A Data Science Practitioner's Guide is offered in its original English edition, not as a Persian translation.
This definitive guide to Machine Learning projects answers the questions that aspiring and experienced data scientists frequently ask: Confused about which technology to use for your ML development? Should I use GOFAI, ANN/DNN, or Transfer Learning? Can I rely on AutoML for model development? What if the client provides gigabytes or terabytes of data for developing analytic models? How do I handle high-frequency dynamic datasets? This book provides the practitioner with a consolidation of the entire data science process in a single “Cheat Sheet”.

The challenge for a data scientist is to extract meaningful information from huge datasets that will help businesses create better strategies. Many Machine Learning algorithms and Neural Networks are designed to perform analytics on such datasets, so deciding which algorithm to use for a given dataset is daunting. Although there is no single answer to this question, a systematic approach to problem solving is necessary. This book describes the various ML algorithms conceptually and defines and discusses a process for selecting ML/DL models. Its key aspect is the consolidation of available algorithms and techniques for designing efficient ML models. Thinking Data Science will help practising data scientists, academics, researchers, and students who want to build ML models using the appropriate algorithms and architectures, whether the data is small or big.
Contents

Preface

1: Data Science Process
Traditional Model Building, Modern Approach for Model Building, AI on Image Datasets, Model Development on Text Datasets, Model Building on High-Frequency Datasets, Data Science Process, Data Preparation, Numeric Data Processing, Text Processing, Preprocessing Text Data, Exploratory Data Analysis, Features Engineering, Deciding on Model Type, Model Training, Algorithm Selection, AutoML, Hyper-Parameter Tuning, Model Building Using ANN, Models Based on Transfer Learning, Summary

2: Dimensionality Reduction
In a Nutshell, Why Reduce Dimensionality?, Dimensionality Reduction Techniques, Project Dataset, Columns with Missing Values, Filtering Columns Based on Variance, Filtering Highly Correlated Columns, Random Forest, Backward Elimination, Forward Features Selection, Factor Analysis, Principal Component Analysis, PCA on Huge Multi-columnar Dataset, About the Dataset, Loading Dataset, Model Building, PCA for Visualization, PCA for Model Building, Independent Component Analysis, Isometric Mapping, t-Distributed Stochastic Neighbor Embedding (t-SNE), UMAP, Singular Value Decomposition, Linear Discriminant Analysis (LDA), Summary

Part I: Classical Algorithms: Overview

3: Regression Analysis
In a Nutshell, When to Use?, Regression Types, Linear Regression, Assumptions, Polynomial Regression, Ridge Regression, Lasso Regression, ElasticNet Regression, Linear Regression Implementations, Linear Regression, Ridge Regression, Lasso Regression, Bayesian Linear Regression, BLR Implementation, BLR Project, Logistic Regression, Logistic Regression Implementation, Guidelines for Model Selection, What's Next?, Summary

4: Decision Tree
In a Nutshell, Wide Range of Applications, Decision Tree Workings, Tree Traversal, Tree Construction, Entropy, Information Gain, Gini Index, Constructing Tree, Tree Construction Algorithm, Tree Traversal Algorithm, Implementation, Project (Regression), Loading Dataset, Preparing Datasets, Model Building, Evaluating Performance, Tree Visualization, Feature Importance, Project (Classifier), Summary

5: Ensemble: Bagging and Boosting
What is Bagging and Boosting?, Bagging, Boosting, Random Forest, In a Nutshell, What Is Random Forest?, Random Forest Algorithm, Advantages, Applications, Implementation, Random Forest Project, ExtraTrees, Bagging Ensemble Project, ExtraTreesRegressor, ExtraTreesClassifier, Bagging, BaggingRegressor, BaggingClassifier, AdaBoost, How Does It Work?, Implementation, AdaBoostRegressor, AdaBoost Classifier, Advantages/Disadvantages, Gradient Boosting, Loss Function, Requirements for Gradient Boosting, Implementation, GradientBoostingRegressor, AdaBoostClassifier, Pros and Cons, XGBoost, Implementation, XGBRegressor, XGBClassifier, CatBoost, Implementation, CatBoostRegressor, CatBoostClassifier, LightGBM, Implementation, The LGBMRegressor, The LGBMClassifier, Performance Summary, Summary

6: K-Nearest Neighbors
In a Nutshell, K-Nearest Neighbors, KNN Algorithm, KNN Working, Effect of K, Advantages, Disadvantages of KNN, Implementation, Project, Loading Dataset, Determining K Optimal, Model Training, Model Testing, When to Use?, Summary

7: Naive Bayes
In a Nutshell, When to Use?, Naive Bayes Theorem, Applying the Theorem, Advantages, Disadvantages, Improving Performance, Naive Bayes Types, Multinomial Naive Bayes, Bernoulli Naive Bayes, Gaussian Naive Bayes, Complement Naive Bayes, Categorical Naive Bayes, Model Fitting for Huge Datasets, Project, Preparing Dataset, Data Visualization, Model Building, Inferring on Unseen Data, Summary

8: Support Vector Machines
In a Nutshell, SVM Working, Hyperplane Types, Kernel Effects, Linear Kernel, Polynomial Kernel, Radial Basis Function, Sigmoid, Guidelines on Kernel Selection, Parameter Tuning, The C Parameter, The Degree Parameter, The Gamma Parameter, The decision_function_shape Parameter, Project, Advantages and Disadvantages, Summary

Part II: Clustering: Overview

9: Centroid-Based Clustering
The K-Means Algorithm, In a Nutshell, How Does It Work?, K-Means Algorithm, Objective Function, The Process, Workflow, Selecting Optimal Clusters, Elbow Method, Average Silhouette Method, The Gap Statistic Method, Limitations of K-Means Clustering, Applications, Implementation, Project, The K-Medoids Algorithm, In a Nutshell, Why K-Medoids?, Algorithm, Merits and Demerits, Implementation, Summary

10: Connectivity-Based Clustering
Agglomerative Clustering, In a Nutshell, The Working, Single Linkage, Complete Linkage, Average Linkage, Advantages and Disadvantages, Applications, Implementation, Project, Divisive Clustering, In a Nutshell, The Working, Implementation, Challenges, Summary

11: Gaussian Mixture Model
In a Nutshell, Gaussian Distribution, Probability Distribution, Selecting Number of Clusters, Implementation, Project, Determining Optimal Number of Clusters, Summary

12: Density-Based Clustering
DBSCAN, In a Nutshell, Why DBSCAN?, Preliminaries, Algorithm Working, Advantages and Disadvantages, Implementation, Project, OPTICS, In a Nutshell, Core Distance, Reachability Distance, Implementation, Project, Mean Shift Clustering, In a Nutshell, Algorithm Working, Bandwidth Selection, Strengths, Weaknesses, Applications, Implementation, Project, Summary

13: BIRCH
In a Nutshell, Why BIRCH?, Clustering Feature, CF Tree, BIRCH Algorithm, Implementation, Project, Summary

14: CLARANS
In a Nutshell, CLARA Algorithm, CLARANS Algorithm, Advantages, Project, Summary

15: Affinity Propagation Clustering
In a Nutshell, Algorithm Working, Responsibility Matrix Updates, Availability Matrix Updates, Updating Scores, Few Remarks, Implementation, Project, Summary

16: STING & CLIQUE
STING: A Grid-Based Clustering Algorithm, In a Nutshell, How Does It Work?, Advantages and Disadvantages, Applications, CLIQUE: Density- and Grid-Based Subspace Clustering Algorithm, In a Nutshell, How Does It Work?, Pros/Cons, Implementation, Project, Summary

Part III: ANN: Overview

17: Artificial Neural Networks
AI Evolution, Artificial Neural Networks, Perceptron, What Is ANN?, Network Training, ANN Architectures, What Is DNN?, Network Architectures, What Are Pre-trained Models?, Important Terms to Know, Activation Functions, Back Propagation, Vanishing and Exploding Gradients, Optimization Functions, Types of Optimizers, Loss Functions, Regression Loss Functions, Classification Loss Functions, Types of Network Architectures, Convolutional Neural Network, Convolutional Layer, Pooling Layer, Fully Connected Layer, CNN Applications, Generative Adversarial Network, Model Architecture, The Generator, The Discriminator, How Does GAN Work?, How Data Scientists Use GAN?, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Forget Gate, Input Gate, Update Gate, Output Gate, LSTM Applications, Transfer Learning, Pre-trained Models for Text, Word2Vec, Glove, Transformer, BERT, GPT, Pre-trained Models for Image Data, Advantages/Disadvantages, Summary

18: ANN-Based Applications
Developing NLP Applications, Dataset, Text Preprocessing, Using BERT, Creating Training/Testing Datasets, Setting Up BERT, Model Building, Model Training, Model Evaluation, Using Embeddings, N-gram Analysis, Tokenizing, Remove Stop Words, Model Building, Using Own Embeddings: Model 0, Embedding Weight Matrix, Glove: Model 1, Glove: Model 2, Glove: Model 3, Final Thoughts, Developing Image-Based Applications, Data Preparation, Modeling, CNN-Based Network, VGG16, ResNet50, MobileNet, DenseNet121, Summarizing Observations, Modeling on High-Resolution Images, Inferring Web Images, Summary

19: Automated Tools
In a Nutshell, Classical AI, Auto-sklearn, Auto-sklearn for Classification on Synthetic Dataset, Auto-sklearn for Classification on Real Dataset, Auto-sklearn for Regression, Auto-sklearn Architecture, Auto-sklearn Features, What's Next?, ANN/DNN, AutoKeras for Classification, AutoKeras for Regression, AutoKeras Image Classifier, More AutoML Frameworks, PyCaret, MLBox, TPOT, H2O.ai, DataRobot, DataBricks, BlobCity AutoAI, Summary

20: Data Scientist's Ultimate Workflow
Consolidated Overview, Workflow-0: Quick Solution, Workflow-1: Technology Selection, Workflow-2: Data Preprocessing, Workflow-3: EDA, Workflow-4: Features Engineering, Workflow-5: Type of Task, Workflow-6: Preparing Datasets, Workflow-7: Algorithm Selections, Workflow-8: AutoML, Workflow-9: Hyper-parameter Tuning, Workflow-10: ANN Model Building, Workflow-11: Clustering, Summary