دسترسی نامحدود
برای کاربرانی که ثبت نام کرده اند
برای ارتباط با ما می توانید از طریق شماره موبایل زیر از طریق تماس و پیامک با ما در ارتباط باشید
در صورت عدم پاسخ گویی از طریق پیامک با پشتیبان در ارتباط باشید
برای کاربرانی که ثبت نام کرده اند
درصورت عدم همخوانی توضیحات با کتاب
از ساعت 7 صبح تا 10 شب
ویرایش: [4th Edition]
نویسندگان: Max Bramer
سری: Undergraduate Topics In Computer Science
ISBN (شابک) : 1447174925, 9781447174936
ناشر: Springer
سال نشر: 2020
تعداد صفحات: 576
زبان: English
فرمت فایل : PDF (درصورت درخواست کاربر به PDF، EPUB یا AZW3 تبدیل می شود)
حجم فایل: 6 Mb
در صورت تبدیل فایل کتاب Principles Of Data Mining به فرمت های PDF، EPUB، AZW3، MOBI و یا DJVU می توانید به پشتیبان اطلاع دهید تا فایل مورد نظر را تبدیل نمایند.
توجه داشته باشید کتاب اصول داده کاوی نسخه زبان اصلی می باشد و کتاب ترجمه شده به فارسی نمی باشد. وبسایت اینترنشنال لایبرری ارائه دهنده کتاب های زبان اصلی می باشد و هیچ گونه کتاب ترجمه شده یا نوشته شده به فارسی را ارائه نمی دهد.
این کتاب تکنیکهای اصلی دادهکاوی، استخراج خودکار اطلاعات ضمنی و بالقوه مفید از دادهها را که به طور فزایندهای در حوزههای تجاری، علمی و سایر حوزههای کاربردی مورد استفاده قرار میگیرد، توضیح میدهد و بررسی میکند. این بر طبقه بندی، استخراج قوانین انجمن و خوشه بندی تمرکز دارد. هر موضوع به وضوح توضیح داده شده است، با تمرکز بر الگوریتم ها نه فرمالیسم ریاضی، و با مثال های دقیق کار شده نشان داده شده است. این کتاب برای خوانندگان بدون پیش زمینه قوی در ریاضیات یا آمار نوشته شده است و هر فرمول استفاده شده با جزئیات توضیح داده شده است. این می تواند به عنوان یک کتاب درسی برای پشتیبانی از دوره های کارشناسی یا کارشناسی ارشد در طیف گسترده ای از موضوعات از جمله علوم کامپیوتر، مطالعات بازرگانی، بازاریابی، هوش مصنوعی، بیوانفورماتیک و علوم قانونی استفاده شود. به عنوان کمکی برای خودآموزی، هدف آن کمک به خوانندگان عمومی است تا درک لازم را از آنچه در داخل "جعبه سیاه" وجود دارد ایجاد کنند تا بتوانند از بسته های داده کاوی تجاری به طور متمایز استفاده کنند و همچنین خوانندگان پیشرفته یا محققان دانشگاهی را قادر می سازد تا درک کنند یا مشارکت کنند. به پیشرفت های فنی آینده در این زمینه. هر فصل دارای تمرینهای عملی است تا خوانندگان بتوانند پیشرفت خود را بررسی کنند. یک واژه نامه کامل از اصطلاحات فنی استفاده شده گنجانده شده است. اصول داده کاوی شامل توصیفی از الگوریتمهایی برای طبقهبندی دادههای جریانی است، هم دادههای ثابت، جایی که مدل اصلی ثابت است، و هم دادههایی که وابسته به زمان هستند، جایی که مدل اساسی هر از گاهی تغییر میکند - پدیدهای که به عنوان رانش مفهومی شناخته میشود. نسخه چهارم توسعهیافته، شرح مفصلی از یک شبکه عصبی پیشخور با پس انتشار ارائه میدهد و نشان میدهد که چگونه میتوان از آن برای طبقهبندی استفاده کرد.
This book explains and explores the principal techniques of Data Mining, the automatic extraction of implicit and potentially useful information from data, which is increasingly used in commercial, scientific and other application areas. It focuses on classification, association rule mining and clustering. Each topic is clearly explained, with a focus on algorithms not mathematical formalism, and is illustrated by detailed worked examples. The book is written for readers without a strong background in mathematics or statistics and any formulae used are explained in detail. It can be used as a textbook to support courses at undergraduate or postgraduate levels in a wide range of subjects including Computer Science, Business Studies, Marketing, Artificial Intelligence, Bioinformatics and Forensic Science. As an aid to self-study, it aims to help general readers develop the necessary understanding of what is inside the 'black box' so they can use commercial data mining packages discriminatingly, as well as enabling advanced readers or academic researchers to understand or contribute to future technical advances in the field. Each chapter has practical exercises to enable readers to check their progress. A full glossary of technical terms used is included. Principles of Data Mining includes descriptions of algorithms for classifying streaming data, both stationary data, where the underlying model is fixed, and data that is time-dependent, where the underlying model changes from time to time - a phenomenon known as concept drift. The expanded fourth edition gives a detailed description of a feed-forward neural network with backpropagation and shows how it can be used for classification.
Principles of Data Mining About This Book Contents 1. Introduction to Data Mining 1.1 The Data Explosion 1.2 Knowledge Discovery 1.3 Applications of Data Mining 1.4 Labelled and Unlabelled Data 1.5 Supervised Learning: Classification 1.6 Supervised Learning: Numerical Prediction 1.7 Unsupervised Learning: Association Rules 1.8 Unsupervised Learning: Clustering 2. Data for Data Mining 2.1 Standard Formulation 2.2 Types of Variable 2.2.1 Categorical and Continuous Attributes 2.3 Data Preparation 2.3.1 Data Cleaning 2.4 Missing Values 2.4.1 Discard Instances 2.4.2 Replace by Most Frequent/Average Value 2.5 Reducing the Number of Attributes 2.6 The UCI Repository of Datasets 2.7 Chapter Summary 2.8 Self-assessment Exercises for Chapter 2 Reference 3. Introduction to Classification: Naïve Bayes and Nearest Neighbour 3.1 What Is Classification? 3.2 Naïve Bayes Classifiers 3.3 Nearest Neighbour Classification 3.3.1 Distance Measures 3.3.2 Normalisation 3.3.3 Dealing with Categorical Attributes 3.4 Eager and Lazy Learning 3.5 Chapter Summary 3.6 Self-assessment Exercises for Chapter 3 4. Using Decision Trees for Classification 4.1 Decision Rules and Decision Trees 4.1.1 Decision Trees: The Golf Example 4.1.2 Terminology 4.1.3 The degrees Dataset 4.2 The TDIDT Algorithm 4.3 Types of Reasoning 4.4 Chapter Summary 4.5 Self-assessment Exercises for Chapter 4 References 5. Decision Tree Induction: Using Entropy for Attribute Selection 5.1 Attribute Selection: An Experiment 5.2 Alternative Decision Trees 5.2.1 The Football/Netball Example 5.2.2 The anonymous Dataset 5.3 Choosing Attributes to Split On: Using Entropy 5.3.1 The lens24 Dataset 5.3.2 Entropy 5.3.3 Using Entropy for Attribute Selection 5.3.4 Maximising Information Gain 5.4 Chapter Summary 5.5 Self-assessment Exercises for Chapter 5 6. Decision Tree Induction: Using Frequency Tables for Attribute Selection 6.1 Calculating Entropy in Practice 6.1.1 Proof of Equivalence 6.1.2 A Note on Zeros 6.2 Other Attribute Selection Criteria: Gini Index of Diversity 6.3 The chi2 Attribute Selection Criterion 6.4 Inductive Bias 6.5 Using Gain Ratio for Attribute Selection 6.5.1 Properties of Split Information 6.5.2 Summary 6.6 Number of Rules Generated by Different Attribute Selection Criteria 6.7 Missing Branches 6.8 Chapter Summary 6.9 Self-assessment Exercises for Chapter 6 References 7. Estimating the Predictive Accuracy of a Classifier 7.1 Introduction 7.2 Method 1: Separate Training and Test Sets 7.2.1 Standard Error 7.2.2 Repeated Train and Test 7.3 Method 2: k-fold Cross-validation 7.4 Method 3: N-fold Cross-validation 7.5 Experimental Results I 7.6 Experimental Results II: Datasets with Missing Values 7.6.1 Strategy 1: Discard Instances 7.6.2 Strategy 2: Replace by Most Frequent/Average Value 7.6.3 Missing Classifications 7.7 Confusion Matrix 7.7.1 True and False Positives 7.8 Chapter Summary 7.9 Self-assessment Exercises for Chapter 7 Reference 8. Continuous Attributes 8.1 Introduction 8.2 Local versus Global Discretisation 8.3 Adding Local Discretisation to TDIDT 8.3.1 Calculating the Information Gain of a Set of Pseudo-attributes 8.3.2 Computational Efficiency 8.4 Using the ChiMerge Algorithm for Global Discretisation 8.4.1 Calculating the Expected Values and chi2 8.4.2 Finding the Threshold Value 8.4.3 Setting minIntervals and maxIntervals 8.4.4 The ChiMerge Algorithm: Summary 8.4.5 The ChiMerge Algorithm: Comments 8.5 Comparing Global and Local Discretisation for Tree Induction 8.6 Chapter Summary 8.7 Self-assessment Exercises for Chapter 8 Reference 9. Avoiding Overfitting of Decision Trees 9.1 Dealing with Clashes in a Training Set 9.1.1 Adapting TDIDT to Deal with Clashes 9.2 More About Overfitting Rules to Data 9.3 Pre-pruning Decision Trees 9.4 Post-pruning Decision Trees 9.5 Chapter Summary 9.6 Self-assessment Exercise for Chapter 9 References 10. More About Entropy 10.1 Introduction 10.2 Coding Information Using Bits 10.3 Discriminating Amongst M Values (M Not a Power of 2) 10.4 Encoding Values That Are Not Equally Likely 10.5 Entropy of a Training Set 10.6 Information Gain Must Be Positive or Zero 10.7 Using Information Gain for Feature Reduction for Classification Tasks 10.7.1 Example 1: The genetics Dataset 10.7.2 Example 2: The bcst96 Dataset 10.8 Chapter Summary 10.9 Self-assessment Exercises for Chapter 10 References 11. Inducing Modular Rules for Classification 11.1 Rule Post-pruning 11.2 Conflict Resolution 11.3 Problems with Decision Trees 11.4 The Prism Algorithm 11.4.1 Changes to the Basic Prism Algorithm 11.4.2 Comparing Prism with TDIDT 11.5 Chapter Summary 11.6 Self-assessment Exercise for Chapter 11 References 12. Measuring the Performance of a Classifier 12.1 True and False Positives and Negatives 12.2 Performance Measures 12.3 True and False Positive Rates versus Predictive Accuracy 12.4 ROC Graphs 12.5 ROC Curves 12.6 Finding the Best Classifier 12.7 Chapter Summary 12.8 Self-assessment Exercise for Chapter 12 13. Dealing with Large Volumes of Data 13.1 Introduction 13.2 Distributing Data onto Multiple Processors 13.3 Case Study: PMCRI 13.4 Evaluating the Effectiveness of a Distributed System: PMCRI 13.5 Revising a Classifier Incrementally 13.6 Chapter Summary 13.7 Self-assessment Exercises for Chapter 13 References 14. Ensemble Classification 14.1 Introduction 14.2 Estimating the Performance of a Classifier 14.3 Selecting a Different Training Set for Each Classifier 14.4 Selecting a Different Set of Attributes for Each Classifier 14.5 Combining Classifications: Alternative Voting Systems 14.6 Parallel Ensemble Classifiers 14.7 Chapter Summary 14.8 Self-assessment Exercises for Chapter 14 References 15. Comparing Classifiers 15.1 Introduction 15.2 The Paired t-Test 15.3 Choosing Datasets for Comparative Evaluation 15.3.1 Confidence Intervals 15.4 Sampling 15.5 How Bad Is a `No Significant Difference\' Result? 15.6 Chapter Summary 15.7 Self-assessment Exercises for Chapter 15 References 16. Association Rule Mining I 16.1 Introduction 16.2 Measures of Rule Interestingness 16.2.1 The Piatetsky-Shapiro Criteria and the RI Measure 16.2.2 Rule Interestingness Measures Applied to the chess Dataset 16.2.3 Using Rule Interestingness Measures for Conflict Resolution 16.3 Association Rule Mining Tasks 16.4 Finding the Best N Rules 16.4.1 The J-Measure: Measuring the Information Content of a Rule 16.4.2 Search Strategy 16.5 Chapter Summary 16.6 Self-assessment Exercises for Chapter 16 References 17. Association Rule Mining II 17.1 Introduction 17.2 Transactions and Itemsets 17.3 Support for an Itemset 17.4 Association Rules 17.5 Generating Association Rules 17.6 Apriori 17.7 Generating Supported Itemsets: An Example 17.8 Generating Rules for a Supported Itemset 17.9 Rule Interestingness Measures: Lift and Leverage 17.10 Chapter Summary 17.11 Self-assessment Exercises for Chapter 17 Reference 18. Association Rule Mining III: Frequent Pattern Trees 18.1 Introduction: FP-Growth 18.2 Constructing the FP-tree 18.2.1 Pre-processing the Transaction Database 18.2.2 Initialisation 18.2.3 Processing Transaction 1: f, c, a, m, p 18.2.4 Processing Transaction 2: f, c, a, b, m 18.2.5 Processing Transaction 3: f, b 18.2.6 Processing Transaction 4: c, b, p 18.2.7 Processing Transaction 5: f, c, a, m, p 18.3 Finding the Frequent Itemsets from the FP-tree 18.3.1 Itemsets Ending with Item p 18.3.2 Itemsets Ending with Item m 18.4 Chapter Summary 18.5 Self-assessment Exercises for Chapter 18 Reference 19. Clustering 19.1 Introduction 19.2 k-Means Clustering 19.2.1 Example 19.2.2 Finding the Best Set of Clusters 19.3 Agglomerative Hierarchical Clustering 19.3.1 Recording the Distance Between Clusters 19.3.2 Terminating the Clustering Process 19.4 Chapter Summary 19.5 Self-assessment Exercises for Chapter 19 20. Text Mining 20.1 Multiple Classifications 20.2 Representing Text Documents for Data Mining 20.3 Stop Words and Stemming 20.4 Using Information Gain for Feature Reduction 20.5 Representing Text Documents: Constructing a Vector Space Model 20.6 Normalising the Weights 20.7 Measuring the Distance Between Two Vectors 20.8 Measuring the Performance of a Text Classifier 20.9 Hypertext Categorisation 20.9.1 Classifying Web Pages 20.9.2 Hypertext Classification versus Text Classification 20.10 Chapter Summary 20.11 Self-assessment Exercises for Chapter 20 21. Classifying Streaming Data 21.1 Introduction 21.1.1 Stationary v Time-dependent Data 21.2 Building an H-Tree: Updating Arrays 21.2.1 Array currentAtts 21.2.2 Array splitAtt 21.2.3 Sorting a record to the appropriate leaf node 21.2.4 Array hitcount 21.2.5 Array classtotals 21.2.6 Array acvCounts 21.2.7 Array branch 21.3 Building an H-Tree: a Detailed Example 21.3.1 Step (a): Initialise Root Node 0 21.3.2 Step (b): Begin Reading Records 21.3.3 Step (c): Consider Splitting at Node 0 21.3.4 Step (d): Split on Root Node and Initialise New Leaf Nodes 21.3.5 Step (e): Process the Next Set of Records 21.3.6 Step (f): Consider Splitting at Node 2 21.3.7 Step (g): Process the Next Set of Records 21.3.8 Outline of the H-Tree Algorithm 21.4 Splitting on an Attribute: Using Information Gain 21.5 Splitting on An Attribute: Using a Hoeffding Bound 21.6 H-Tree Algorithm: Final Version 21.7 Using an Evolving H-Tree to Make Predictions 21.7.1 Evaluating the Performance of an H-Tree 21.8 Experiments: H-Tree versus TDIDT 21.8.1 The lens24 Dataset 21.8.2 The vote Dataset 21.9 Chapter Summary 21.10 Self-assessment Exercises for Chapter 21 References 22. Classifying Streaming Data II: Time-Dependent Data 22.1 Stationary versus Time-dependent Data 22.2 Summary of the H-Tree Algorithm 22.2.1 Array currentAtts 22.2.2 Array splitAtt 22.2.3 Array hitcount 22.2.4 Array classtotals 22.2.5 Array acvCounts 22.2.6 Array branch 22.2.7 Pseudocode for the H-Tree Algorithm 22.3 From H-Tree to CDH-Tree: Overview 22.4 From H-Tree to CDH-Tree: Incrementing Counts 22.5 The Sliding Window Method 22.6 Resplitting at Nodes 22.7 Identifying Suspect Nodes 22.8 Creating Alternate Nodes 22.9 Growing/Forgetting an Alternate Node and its Descendants 22.10 Replacing an Internal Node by One of its Alternate Nodes 22.11 Experiment: Tracking Concept Drift 22.11.1 lens24 Data: Alternative Mode 22.11.2 Introducing Concept Drift 22.11.3 An Experiment with Alternating lens24 Data 22.11.4 Comments on Experiment 22.12 Chapter Summary 22.13 Self-assessment Exercises for Chapter 22 Reference 23. An Introduction to Neural Networks 23.1 Introduction 23.2 Neural Nets Example 1 23.3 Neural Nets Example 2 23.3.1 Forward Propagating the Values of the Input Nodes 23.3.2 Forward Propagation: Summary of Formulae 23.4 Backpropagation 23.4.1 Stochastic Gradient Descent 23.4.2 Finding the Gradients 23.4.3 Working backwards from the output layer to the hidden layer 23.4.4 Working backwards from the hidden layer to the input layer 23.4.5 Updating the Weights 23.5 Processing a Multi-instance Training Set 23.6 Using a Neural Net for Classification: the iris Dataset 23.7 Using a Neural Net for Classification: the seeds Dataset 23.8 Neural Nets: A Note of Caution 23.9 Chapter Summary 23.10 Self-assessment Exercises for Chapter 23 A. Essential Mathematics A.1 Subscript Notation A.1.1 Sigma Notation for Summation A.1.2 Double Subscript Notation A.1.3 Other Uses of Subscripts A.2 Trees A.2.1 Terminology A.2.2 Interpretation A.2.3 Subtrees A.3 The Logarithm Function log2 X A.3.1 The Function -X log2 X A.4 Introduction to Set Theory A.4.1 Subsets A.4.2 Summary of Set Notation B. Datasets References C. Sources of Further Information Websites Books Conferences Information About Association Rule Mining D. Glossary and Notation E. Solutions to Self-assessment Exercises Index