Edition: 2nd
Author(s): CHARU AGGARWAL
Series:
ISBN: 9783030966232, 3030966232
Publisher: SPRINGER
Year of publication: 2022
Number of pages: 582
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 8 MB
If you need the book MACHINE LEARNING FOR TEXT converted to PDF, EPUB, AZW3, MOBI, or DJVU format, you can notify support to have the file converted.
Please note that Machine Learning for Text is the original English-language edition and is not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
This second edition textbook covers a coherently organized framework for text analytics, which integrates material drawn from the intersecting topics of information retrieval, machine learning, and natural language processing. Particular importance is placed on deep learning methods. The chapters of this book span three broad categories:

1. Basic algorithms: Chapters 1 through 7 discuss the classical algorithms for text analytics such as preprocessing, similarity computation, topic modeling, matrix factorization, clustering, classification, regression, and ensemble analysis.

2. Domain-sensitive learning and information retrieval: Chapters 8 and 9 discuss learning models in heterogeneous settings such as a combination of text with multimedia or Web links. The problem of information retrieval and Web search is also discussed in the context of its relationship with ranking and machine learning methods.

3. Natural language processing: Chapters 10 through 16 discuss various sequence-centric and natural language applications, such as feature engineering, neural language models, deep learning, transformers, pre-trained language models, text summarization, information extraction, knowledge graphs, question answering, opinion mining, text segmentation, and event detection.

Compared to the first edition, this second edition textbook (which targets mostly advanced level students majoring in computer science and math) has substantially more material on deep learning and natural language processing. Significant focus is placed on topics like transformers, pre-trained language models, knowledge graphs, and question answering.
Preface  Acknowledgments  Contents  Author Biography
1 An Introduction to Text Analytics 1.1 Introduction 1.2 What Is Special About Learning from Text? 1.3 Analytical Models for Text 1.3.1 Text Preprocessing and Similarity Computation 1.3.2 Dimensionality Reduction and Matrix Factorization 1.3.3 Text Clustering 1.3.3.1 Deterministic and Probabilistic Matrix Factorization Methods 1.3.3.2 Probabilistic Mixture Models of Documents 1.3.3.3 Similarity-Based Algorithms 1.3.3.4 Advanced Methods 1.3.4 Text Classification and Regression Modeling 1.3.4.1 Decision Trees 1.3.4.2 Rule-Based Classifiers 1.3.4.3 Naïve Bayes Classifier 1.3.4.4 Nearest Neighbor Classifiers 1.3.4.5 Linear Classifiers 1.3.4.6 Broader Topics in Classification 1.3.5 Joint Analysis of Text with Heterogeneous Data 1.3.6 Information Retrieval and Web Search 1.3.7 Sequential Language Modeling and Embeddings 1.3.8 Transformers and Pretrained Language Models 1.3.9 Text Summarization 1.3.10 Information Extraction 1.3.11 Question Answering 1.3.12 Opinion Mining and Sentiment Analysis 1.3.13 Text Segmentation and Event Detection 1.4 Summary 1.5 Bibliographic Notes 1.5.1 Software Resources 1.6 Exercises
2 Text Preparation and Similarity Computation 2.1 Introduction 2.2 Raw Text Extraction and Tokenization 2.2.1 Web-Specific Issues in Text Extraction 2.3 Extracting Terms from Tokens 2.3.1 Stop-Word Removal 2.3.2 Hyphens 2.3.3 Case Folding 2.3.4 Usage-Based Consolidation 2.3.5 Stemming 2.4 Vector Space Representation and Normalization 2.5 Similarity Computation in Text 2.5.1 Is idf Normalization and Stemming Always Useful? 2.6 Summary 2.7 Bibliographic Notes 2.7.1 Software Resources 2.8 Exercises
3 Matrix Factorization and Topic Modeling 3.1 Introduction 3.1.1 Normalizing a Two-Way Factorization into a Standardized Three-Way Factorization 3.2 Singular Value Decomposition 3.2.1 Example of SVD 3.2.2 The Power Method of Implementing SVD 3.2.3 Applications of SVD/LSA 3.2.4 Advantages and Disadvantages of SVD/LSA 3.3 Nonnegative Matrix Factorization 3.3.1 Interpretability of Nonnegative Matrix Factorization 3.3.2 Example of Nonnegative Matrix Factorization 3.3.3 Folding in New Documents 3.3.4 Advantages and Disadvantages of Nonnegative Matrix Factorization 3.4 Probabilistic Latent Semantic Analysis 3.4.1 Connections with Nonnegative Matrix Factorization 3.4.2 Comparison with SVD 3.4.3 Example of PLSA 3.4.4 Advantages and Disadvantages of PLSA 3.5 A Bird's Eye View of Latent Dirichlet Allocation 3.5.1 Simplified LDA Model 3.5.2 Smoothed LDA Model 3.6 Nonlinear Transformations and Feature Engineering 3.6.1 Choosing a Similarity Function 3.6.1.1 Traditional Kernel Similarity Functions 3.6.1.2 Generalizing Bag-of-Words to N-Grams 3.6.1.3 String Subsequence Kernels 3.6.1.4 Speeding Up the Recursion 3.6.1.5 Language-Dependent Kernels 3.6.2 Nyström Approximation 3.6.3 Partial Availability of the Similarity Matrix 3.7 Summary 3.8 Bibliographic Notes 3.8.1 Software Resources 3.9 Exercises
4 Text Clustering 4.1 Introduction 4.2 Feature Selection and Engineering 4.2.1 Feature Selection 4.2.1.1 Term Strength 4.2.1.2 Supervised Modeling for Unsupervised Feature Selection 4.2.1.3 Unsupervised Wrappers with Supervised Feature Selection 4.2.2 Feature Engineering 4.2.2.1 Matrix Factorization Methods 4.2.2.2 Nonlinear Dimensionality Reduction 4.3 Topic Modeling and Matrix Factorization 4.3.1 Mixed Membership Models and Overlapping Clusters 4.3.2 Non-overlapping Clusters and Co-clustering: A Matrix Factorization View 4.3.2.1 Co-clustering by Bipartite Graph Partitioning 4.4 Generative Mixture Models for Clustering 4.4.1 The Bernoulli Model 4.4.2 The Multinomial Model 4.4.3 Comparison with Mixed Membership Topic Models 4.4.4 Connections with Naïve Bayes Model for Classification 4.5 The k-Means Algorithm 4.5.1 Convergence and Initialization 4.5.2 Computational Complexity 4.5.3 Connection with Probabilistic Models 4.6 Hierarchical Clustering Algorithms 4.6.1 Efficient Implementation and Computational Complexity 4.6.2 The Natural Marriage with k-Means 4.7 Clustering Ensembles 4.7.1 Choosing the Ensemble Component 4.7.2 Combining the Results from Different Components 4.8 Clustering Text as Sequences 4.8.1 Kernel Methods for Clustering 4.8.1.1 Kernel k-Means 4.8.1.2 Explicit Feature Engineering 4.8.1.3 Kernel Trick or Explicit Feature Engineering? 4.8.2 Data-Dependent Kernels: Spectral Clustering 4.9 Transforming Clustering into Supervised Learning 4.10 Clustering Evaluation 4.10.1 The Pitfalls of Internal Validity Measures 4.10.2 External Validity Measures 4.10.2.1 Relationship of Clustering Evaluation to Supervised Learning 4.10.2.2 Common Mistakes in Evaluation 4.11 Summary 4.12 Bibliographic Notes 4.12.1 Software Resources 4.13 Exercises
5 Text Classification: Basic Models 5.1 Introduction 5.1.1 Types of Labels and Regression Modeling 5.1.2 Training and Testing 5.1.3 Inductive, Transductive, and Deductive Learners 5.1.4 The Basic Models 5.1.5 Text-Specific Challenges in Classifiers 5.2 Feature Selection and Engineering 5.2.1 Gini Index 5.2.2 Conditional Entropy 5.2.3 Pointwise Mutual Information 5.2.4 Closely Related Measures 5.2.5 The χ2-Statistic 5.2.6 Embedded Feature Selection Models 5.2.7 Feature Engineering Tricks 5.3 The Naïve Bayes Model 5.3.1 The Bernoulli Model 5.3.2 Multinomial Model 5.3.3 Practical Observations 5.3.4 Ranking Outputs with Naïve Bayes 5.3.5 Example of Naïve Bayes 5.3.5.1 Bernoulli Model 5.3.5.2 Multinomial Model 5.3.6 Semi-Supervised Naïve Bayes 5.4 Nearest Neighbor Classifier 5.4.1 Properties of 1-Nearest Neighbor Classifiers 5.4.2 Rocchio and Nearest Centroid Classification 5.4.3 Weighted Nearest Neighbors 5.4.3.1 Bagged and Subsampled 1-Nearest Neighbors as Weighted Nearest Neighbor Classifiers 5.4.4 Adaptive Nearest Neighbors: A Powerful Family 5.5 Decision Trees and Random Forests 5.5.1 Basic Procedure for Decision Tree Construction 5.5.2 Splitting a Node 5.5.3 Multivariate Splits 5.5.4 Problematic Issues with Decision Trees in Text Classification 5.5.5 Random Forests 5.5.6 Random Forests as Adaptive Nearest Neighbor Methods 5.6 Rule-Based Classifiers 5.6.1 Sequential Covering Algorithms 5.6.1.1 Learn-One-Rule 5.6.2 Generating Rules from Decision Trees 5.6.3 Associative Classifiers 5.7 Summary 5.8 Bibliographic Notes 5.8.1 Software Resources 5.9 Exercises
6 Linear Models for Classification and Regression 6.1 Introduction 6.1.1 Geometric Interpretation of Linear Models 6.1.2 Do We Need the Bias Variable? 6.1.3 A General Definition of Linear Models with Regularization 6.1.4 Generalizing Binary Predictions to Multiple Classes 6.1.5 Characteristics of Linear Models for Text 6.2 Least-Squares Regression and Classification 6.2.1 Least-Squares Regression with L2-Regularization 6.2.1.1 Efficient Implementation 6.2.1.2 Approximate Estimation with Singular Value Decomposition 6.2.1.3 The Path to Kernel Regression 6.2.2 LASSO: Least-Squares Regression with L1-Regularization 6.2.2.1 Interpreting LASSO as a Feature Selector 6.2.3 Fisher's Linear Discriminant and Least-Squares Classification 6.2.3.1 Linear Discriminant with Multiple Classes 6.2.3.2 Equivalence of Fisher Discriminant and Least-Squares Regression 6.2.3.3 Regularized Least-Squares Classification and LLSF 6.2.3.4 The Achilles Heel of Least-Squares Classification 6.3 Support Vector Machines 6.3.1 The Regularized Optimization Interpretation 6.3.2 The Maximum Margin Interpretation 6.3.3 Pegasos: Solving SVMs in the Primal 6.3.4 Dual SVM Formulation 6.3.5 Learning Algorithms for Dual SVMs 6.3.6 Adaptive Nearest Neighbor Interpretation of Dual SVMs 6.4 Logistic Regression 6.4.1 The Regularized Optimization Interpretation 6.4.2 Training Algorithms for Logistic Regression 6.4.3 Probabilistic Interpretation of Logistic Regression 6.4.3.1 Probabilistic Interpretation of Stochastic Gradient Descent Steps 6.4.3.2 Relationships among Primal Updates of Linear Models 6.4.4 Multinomial Logistic Regression and Other Generalizations 6.4.5 Comments on the Performance of Logistic Regression 6.5 Nonlinear Generalizations of Linear Models 6.5.1 Kernel SVMs with Explicit Transformation 6.5.2 Why do Conventional Kernels Promote Linear Separability? 6.5.3 Strengths and Weaknesses of Different Kernels 6.5.4 The Kernel Trick 6.5.5 Systematic Application of the Kernel Trick 6.6 Summary 6.7 Bibliographic Notes 6.7.1 Software Resources 6.8 Exercises
7 Classifier Performance and Evaluation 7.1 Introduction 7.2 The Bias-Variance Trade-Off 7.2.1 A Formal View 7.2.2 Telltale Signs of Bias and Variance 7.3 Implications of Bias-Variance Trade-Off on Performance 7.3.1 Impact of Training Data Size 7.3.2 Impact of Data Dimensionality 7.3.3 Implications for Model Choice in Text 7.4 Systematic Performance Enhancement with Ensembles 7.4.1 Bagging and Subsampling 7.4.2 Boosting 7.5 Classifier Evaluation 7.5.1 Segmenting into Training and Testing Portions 7.5.1.1 Hold-Out 7.5.1.2 Cross-Validation 7.5.2 Absolute Accuracy Measures 7.5.2.1 Accuracy of Classification 7.5.2.2 Accuracy of Regression 7.5.3 Ranking Measures for Classification and Information Retrieval 7.5.3.1 Receiver Operating Characteristic 7.5.3.2 Top-Heavy Measures for Ranked Lists 7.6 Summary 7.7 Bibliographic Notes 7.7.1 Software Resources 7.7.2 Data Sets for Evaluation 7.8 Exercises
8 Joint Text Mining with Heterogeneous Data 8.1 Introduction 8.2 The Shared Matrix Factorization Trick 8.2.1 The Factorization Graph 8.2.2 Application: Shared Factorization with Text and Web Links 8.2.2.1 Solving the Optimization Problem 8.2.2.2 Supervised Embeddings 8.2.3 Application: Text with Undirected Social Networks 8.2.3.1 Application to Link Prediction with Text Content 8.2.4 Application: Transfer Learning in Images with Text 8.2.4.1 Transfer Learning with Unlabeled Text 8.2.4.2 Transfer Learning with Labeled Text 8.2.5 Application: Recommender Systems with Ratings and Text 8.2.6 Application: Cross-Lingual Text Mining 8.3 Factorization Machines 8.4 Joint Probabilistic Modeling Techniques 8.4.1 Joint Probabilistic Models for Clustering 8.4.2 Naïve Bayes Classifier 8.5 Transformation to Graph Mining Techniques 8.6 Summary 8.7 Bibliographic Notes 8.7.1 Software Resources 8.8 Exercises
9 Information Retrieval and Search Engines 9.1 Introduction 9.2 Indexing and Query Processing 9.2.1 Dictionary Data Structures 9.2.2 Inverted Index 9.2.3 Linear Time Index Construction 9.2.4 Query Processing 9.2.4.1 Boolean Retrieval 9.2.4.2 Ranked Retrieval 9.2.4.3 Positional Queries 9.2.4.4 Zoned Scoring 9.2.4.5 Machine Learning in Information Retrieval 9.2.4.6 Ranking Support Vector Machines 9.2.5 Efficiency Optimizations 9.2.5.1 Skip Pointers 9.2.5.2 Champion Lists and Tiered Indexes 9.2.5.3 Caching Tricks 9.2.5.4 Compression Tricks 9.3 Scoring with Information Retrieval Models 9.3.1 Vector Space Models with tf-idf 9.3.2 The Binary Independence Model 9.3.3 The BM25 Model with Term Frequencies 9.3.4 Statistical Language Models in Information Retrieval 9.3.4.1 Query Likelihood Models 9.4 Web Crawling and Resource Discovery 9.4.1 A Basic Crawler Algorithm 9.4.2 Preferential Crawlers 9.4.3 Multiple Threads 9.4.4 Combatting Spider Traps 9.4.5 Shingling for Near Duplicate Detection 9.5 Query Processing in Search Engines 9.5.1 Distributed Index Construction 9.5.2 Dynamic Index Updates 9.5.3 Query Processing 9.5.4 The Importance of Reputation 9.6 Link-Based Ranking Algorithms 9.6.1 PageRank 9.6.1.1 Topic-Sensitive PageRank 9.6.1.2 SimRank 9.6.2 HITS 9.7 Summary 9.8 Bibliographic Notes 9.8.1 Software Resources 9.9 Exercises
10 Language Modeling and Deep Learning 10.1 Introduction 10.2 Statistical Language Models 10.2.1 Skip-Gram Models 10.2.2 Relationship with Embeddings 10.2.3 Evaluating Language Models with Perplexity 10.3 Kernel Methods for Sequence-Centric Learning 10.4 Word-Context Matrix Factorization Models 10.4.1 Matrix Factorization with Counts 10.4.2 The GloVe Embedding 10.4.3 PPMI Matrix Factorization 10.4.4 Shifted PPMI Matrix Factorization 10.4.5 Incorporating Syntactic and Other Features 10.5 Graphical Representations of Word Distances 10.6 Neural Networks and Word Embeddings 10.6.1 Neural Networks: A Gentle Introduction 10.6.1.1 Single Computational Layer: The Perceptron 10.6.1.2 Multilayer Neural Networks 10.6.2 Neural Embedding with Word2vec 10.6.2.1 Neural Embedding with Continuous Bag of Words 10.6.2.2 Neural Embedding with Skip-Gram Model 10.6.2.3 Skip-Gram with Negative Sampling 10.6.2.4 What Is the Actual Neural Architecture of SGNS? 10.6.3 Word2vec (SGNS) Is Logistic Matrix Factorization 10.6.4 Beyond Words: Embedding Paragraphs with Doc2vec 10.7 Recurrent Neural Networks 10.7.1 Language Modeling Example of RNN 10.7.1.1 Generating a Language Sample 10.7.2 Backpropagation Through Time 10.7.3 Bidirectional Recurrent Networks 10.7.4 Multilayer Recurrent Networks 10.7.5 Long Short-Term Memory (LSTM) 10.7.6 Gated Recurrent Units (GRUs) 10.7.7 Layer Normalization 10.8 Applications of Recurrent Neural Networks 10.8.1 Contextual Word Embeddings with ELMo 10.8.2 Application to Automatic Image Captioning 10.8.3 Sequence-to-Sequence Learning and Machine Translation 10.8.3.1 BLEU Score for Evaluating Machine Translation 10.8.4 Application to Sentence-Level Classification 10.8.5 Token-Level Classification with Linguistic Features 10.9 Convolutional Neural Networks for Text 10.10 Summary 10.11 Bibliographic Notes 10.11.1 Software Resources 10.12 Exercises
11 Attention Mechanisms and Transformers 11.1 Introduction 11.2 Attention Mechanisms for Machine Translation 11.2.1 The Luong Attention Model 11.2.2 Variations and Comparison with Bahdanau Attention 11.3 Transformer Networks 11.3.1 How Self Attention Helps 11.3.2 The Self-Attention Module 11.3.3 Incorporating Positional Information 11.3.4 The Sequence-to-Sequence Transformer 11.3.5 Multihead Attention 11.4 Transformer-Based Pre-trained Language Models 11.4.1 GPT-n 11.4.2 BERT 11.4.3 T5 11.5 Natural Language Processing Applications 11.5.1 The GLUE and SuperGLUE Benchmarks 11.5.2 The Corpus of Linguistic Acceptability (CoLA) 11.5.3 Sentiment Analysis 11.5.4 Token-Level Classification 11.5.5 Machine Translation and Summarization 11.5.6 Textual Entailment 11.5.7 Semantic Textual Similarity 11.5.8 Word Sense Disambiguation 11.5.9 Co-Reference Resolution 11.5.10 Question Answering 11.6 Summary 11.7 Bibliographic Notes 11.7.1 Software Resources 11.8 Exercises
12 Text Summarization 12.1 Introduction 12.1.1 Extractive and Abstractive Summarization 12.1.2 Key Steps in Extractive Summarization 12.1.3 The Segmentation Phase in Extractive Summarization 12.2 Topic Word Methods for Extractive Summarization 12.2.1 Word Probabilities 12.2.2 Normalized Frequency Weights 12.2.3 Topic Signatures 12.2.4 Sentence Selection Methods 12.3 Latent Methods for Extractive Summarization 12.3.1 Latent Semantic Analysis 12.3.2 Lexical Chains 12.3.2.1 Short Description of WordNet 12.3.2.2 Leveraging WordNet for Lexical Chains 12.3.3 Graph-Based Methods 12.3.4 Centroid Summarization 12.4 Traditional Machine Learning for Extractive Summarization 12.4.1 Feature Extraction 12.4.2 Which Classifiers to Use? 12.5 Deep Learning for Extractive Summarization 12.5.1 Recurrent Neural Networks 12.5.2 Using Pre-Trained Language Models with Transformers 12.6 Multi-Document Summarization 12.6.1 Centroid-Based Summarization 12.6.2 Graph-Based Methods 12.7 Abstractive Summarization 12.7.1 Sentence Compression 12.7.2 Information Fusion 12.7.3 Information Ordering 12.7.4 Recurrent Neural Networks for Summarization 12.7.5 Abstractive Summarization with Transformers 12.8 Summary 12.9 Bibliographic Notes 12.9.1 Software Resources 12.10 Exercises
13 Information Extraction and Knowledge Graphs 13.1 Introduction 13.1.1 Historical Evolution 13.1.2 The Role of Natural Language Processing 13.2 Named Entity Recognition 13.2.1 Rule-Based Methods 13.2.1.1 Training Algorithms for Rule-Based Systems 13.2.2 Transformation to Token-Level Classification 13.2.3 Hidden Markov Models 13.2.3.1 Training 13.2.3.2 Prediction for Test Segment 13.2.3.3 Incorporating Extracted Features 13.2.3.4 Variations and Enhancements 13.2.4 Maximum Entropy Markov Models 13.2.5 Conditional Random Fields 13.2.6 Deep Learning for Entity Extraction 13.2.6.1 Recurrent Neural Networks for Named Entity Recognition 13.2.6.2 Use of Pretrained Language Models with Transformers 13.3 Relationship Extraction 13.3.1 Transformation to Classification 13.3.2 Relationship Prediction with Explicit Feature Engineering 13.3.3 Relationship Prediction with Implicit Feature Engineering: Kernel Methods 13.3.3.1 Kernels from Dependency Graphs 13.3.3.2 Subsequence-Based Kernels 13.3.3.3 Convolution Tree-Based Kernels 13.3.4 Relationship Extraction with Pretrained Language Models 13.4 Knowledge Graphs 13.4.1 Constructing a Knowledge Graph 13.4.2 Knowledge Graphs in Search 13.5 Summary 13.6 Bibliographic Notes 13.6.1 Weakly Supervised Learning Methods 13.6.2 Unsupervised and Open Information Extraction 13.6.3 Software Resources 13.7 Exercises
14 Question Answering 14.1 Introduction 14.2 The Reading Comprehension Task 14.2.1 Using Recurrent Neural Networks with Attention 14.2.2 Leveraging Pretrained Language Models 14.3 Retrieval for Open-Domain Question Answering 14.3.1 Dense Retrieval in Open Retriever Question Answering 14.3.2 Salient Span Masking 14.4 Closed Book Systems with Pretrained Language Models 14.5 Question Answering with Knowledge Graphs 14.5.1 Leveraging Query Translation 14.5.2 Fusing Text and Structured Data 14.5.3 Knowledge Graph to Corpus Translation 14.6 Challenges of Long-Form Question Answering 14.7 Summary 14.8 Bibliographic Notes 14.8.1 Data Sets for Evaluation 14.8.2 Software Resources 14.9 Exercises
15 Opinion Mining and Sentiment Analysis 15.1 Introduction 15.1.1 The Opinion Lexicon 15.2 Document-Level Sentiment Classification 15.2.1 Unsupervised Approaches to Classification 15.3 Phrase- and Sentence-Level Sentiment Classification 15.3.1 Applications of Sentence- and Phrase-Level Analysis 15.3.2 Reduction of Subjectivity Classification to Minimum Cut Problem 15.3.3 Context in Sentence- and Phrase-Level Polarity Analysis 15.3.4 Sentiment Analysis with Deep Learning 15.3.4.1 Recurrent Neural Networks 15.3.4.2 Leveraging Pretrained Language Models with Transformers 15.4 Aspect-Based Opinion Mining as Information Extraction 15.4.1 Hu and Liu's Unsupervised Approach 15.4.2 OPINE: An Unsupervised Approach 15.4.3 Supervised Opinion Extraction as Token-Level Classification 15.5 Opinion Spam 15.5.1 Supervised Methods for Spam Detection 15.5.1.1 Labeling Deceptive Spam 15.5.1.2 Feature Extraction 15.5.2 Unsupervised Methods for Spammer Detection 15.6 Opinion Summarization 15.7 Summary 15.8 Bibliographic Notes 15.8.1 Software Resources 15.9 Exercises
16 Text Segmentation and Event Detection 16.1 Introduction 16.1.1 Relationship with Topic Detection and Tracking 16.2 Text Segmentation 16.2.1 TextTiling 16.2.2 The C99 Approach 16.2.3 Supervised Segmentation with Off-the-Shelf Classifiers 16.2.4 Supervised Segmentation with Markovian Models 16.3 Mining Text Streams 16.3.1 Streaming Text Clustering 16.3.2 Application to First Story Detection 16.4 Event Detection 16.4.1 Unsupervised Event Detection 16.4.1.1 Window-Based Nearest-Neighbor Method 16.4.1.2 Leveraging Generative Models 16.4.1.3 Event Detection in Social Streams 16.4.2 Supervised Event Detection as Supervised Segmentation 16.4.3 Event Detection as an Information Extraction Problem 16.4.3.1 Transformation to Token-Level Classification 16.4.3.2 Open Domain Event Extraction 16.5 Summary 16.6 Bibliographic Notes 16.6.1 Software Resources 16.7 Exercises
Index