Authors: Jens Albrecht, Sidharth Ramachandran, Christian Winkler
ISBN: 9781492074083
Publisher: O'Reilly Media, Inc.
Publication year: 2020
Number of pages: 0
Language: English
File format: EPUB (can be converted to PDF or AZW3 at the user's request)
File size: 8 MB
If you would like the file of Blueprints for Text Analytics Using Python converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert the file for you.
Please note that Blueprints for Text Analytics Using Python is the original-language (English) edition, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
Turning text into valuable information is essential for many businesses looking to gain a competitive advantage. There have been many improvements in natural language processing, and users have a lot of options when choosing how to work on a problem. However, it's not always clear which NLP tools or libraries would work for a business use case, or which techniques you should use and in what order. This practical book provides theoretical background and real-world case studies with detailed code examples to help developers and data scientists obtain insight from online text. Authors Jens Albrecht, Sidharth Ramachandran, and Christian Winkler use blueprints for text-related problems that apply state-of-the-art machine learning methods in Python. If you have a fundamental understanding of statistics and machine learning along with basic programming experience in Python, you're ready to get started. You'll learn how to:

Crawl and clean, then explore and visualize textual data in different formats
Preprocess and vectorize text for machine learning
Apply methods for classification, topic analysis, summarization, and knowledge extraction
Use semantic word embeddings and deep learning approaches for complex problems
Work with Python NLP libraries like spaCy, NLTK, and Gensim in combination with scikit-learn, Pandas, and PyTorch
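
To make the workflow described above concrete, the following is a minimal sketch, written for this listing rather than taken from the book's own code, of the kind of blueprint it refers to: raw text is preprocessed and vectorized with TF-IDF, then passed to a scikit-learn classifier in a single pipeline. The toy corpus and its labels are invented for illustration.

# Minimal text-classification sketch (illustrative only; not the book's code).
# Assumes scikit-learn is installed; the toy data below is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy corpus with binary sentiment labels (1 = positive, 0 = negative).
texts = [
    "The update fixed the crash, works great now",
    "App keeps freezing after the latest release",
    "Excellent documentation and easy setup",
    "Terrible performance, constant timeouts",
]
labels = [1, 0, 1, 0]

# Chain TF-IDF vectorization and the classifier so that new documents
# pass through exactly the same transformation at prediction time.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(texts, labels)

print(model.predict(["setup was easy and it works great"]))  # expected: [1]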
Cover
Copyright
Table of Contents
Preface
  Approach of the Book
  Prerequisites
  Some Important Libraries to Know
  Books to Read
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments
Chapter 1. Gaining Early Insights from Textual Data
  What You’ll Learn and What We’ll Build
  Exploratory Data Analysis
  Introducing the Dataset
  Blueprint: Getting an Overview of the Data with Pandas
  Calculating Summary Statistics for Columns
  Checking for Missing Data
  Plotting Value Distributions
  Comparing Value Distributions Across Categories
  Visualizing Developments Over Time
  Blueprint: Building a Simple Text Preprocessing Pipeline
  Performing Tokenization with Regular Expressions
  Treating Stop Words
  Processing a Pipeline with One Line of Code
  Blueprints for Word Frequency Analysis
  Blueprint: Counting Words with a Counter
  Blueprint: Creating a Frequency Diagram
  Blueprint: Creating Word Clouds
  Blueprint: Ranking with TF-IDF
  Blueprint: Finding a Keyword-in-Context
  Blueprint: Analyzing N-Grams
  Blueprint: Comparing Frequencies Across Time Intervals and Categories
  Creating Frequency Timelines
  Creating Frequency Heatmaps
  Closing Remarks
Chapter 2. Extracting Textual Insights with APIs
  What You’ll Learn and What We’ll Build
  Application Programming Interfaces
  Blueprint: Extracting Data from an API Using the Requests Module
  Pagination
  Rate Limiting
  Blueprint: Extracting Twitter Data with Tweepy
  Obtaining Credentials
  Installing and Configuring Tweepy
  Extracting Data from the Search API
  Extracting Data from a User’s Timeline
  Extracting Data from the Streaming API
  Closing Remarks
Chapter 3. Scraping Websites and Extracting Data
  What You’ll Learn and What We’ll Build
  Scraping and Data Extraction
  Introducing the Reuters News Archive
  URL Generation
  Blueprint: Downloading and Interpreting robots.txt
  Blueprint: Finding URLs from sitemap.xml
  Blueprint: Finding URLs from RSS
  Downloading Data
  Blueprint: Downloading HTML Pages with Python
  Blueprint: Downloading HTML Pages with wget
  Extracting Semistructured Data
  Blueprint: Extracting Data with Regular Expressions
  Blueprint: Using an HTML Parser for Extraction
  Blueprint: Spidering
  Introducing the Use Case
  Error Handling and Production-Quality Software
  Density-Based Text Extraction
  Extracting Reuters Content with Readability
  Summary Density-Based Text Extraction
  All-in-One Approach
  Blueprint: Scraping the Reuters Archive with Scrapy
  Possible Problems with Scraping
  Closing Remarks and Recommendation
Chapter 4. Preparing Textual Data for Statistics and Machine Learning
  What You’ll Learn and What We’ll Build
  A Data Preprocessing Pipeline
  Introducing the Dataset: Reddit Self-Posts
  Loading Data Into Pandas
  Blueprint: Standardizing Attribute Names
  Saving and Loading a DataFrame
  Cleaning Text Data
  Blueprint: Identify Noise with Regular Expressions
  Blueprint: Removing Noise with Regular Expressions
  Blueprint: Character Normalization with textacy
  Blueprint: Pattern-Based Data Masking with textacy
  Tokenization
  Blueprint: Tokenization with Regular Expressions
  Tokenization with NLTK
  Recommendations for Tokenization
  Linguistic Processing with spaCy
  Instantiating a Pipeline
  Processing Text
  Blueprint: Customizing Tokenization
  Blueprint: Working with Stop Words
  Blueprint: Extracting Lemmas Based on Part of Speech
  Blueprint: Extracting Noun Phrases
  Blueprint: Extracting Named Entities
  Feature Extraction on a Large Dataset
  Blueprint: Creating One Function to Get It All
  Blueprint: Using spaCy on a Large Dataset
  Persisting the Result
  A Note on Execution Time
  There Is More
  Language Detection
  Spell-Checking
  Token Normalization
  Closing Remarks and Recommendations
Chapter 5. Feature Engineering and Syntactic Similarity
  What You’ll Learn and What We’ll Build
  A Toy Dataset for Experimentation
  Blueprint: Building Your Own Vectorizer
  Enumerating the Vocabulary
  Vectorizing Documents
  The Document-Term Matrix
  The Similarity Matrix
  Bag-of-Words Models
  Blueprint: Using scikit-learn’s CountVectorizer
  Blueprint: Calculating Similarities
  TF-IDF Models
  Optimized Document Vectors with TfidfTransformer
  Introducing the ABC Dataset
  Blueprint: Reducing Feature Dimensions
  Blueprint: Improving Features by Making Them More Specific
  Blueprint: Using Lemmas Instead of Words for Vectorizing Documents
  Blueprint: Limit Word Types
  Blueprint: Remove Most Common Words
  Blueprint: Adding Context via N-Grams
  Syntactic Similarity in the ABC Dataset
  Blueprint: Finding Most Similar Headlines to a Made-up Headline
  Blueprint: Finding the Two Most Similar Documents in a Large Corpus (Much More Difficult)
  Blueprint: Finding Related Words
  Tips for Long-Running Programs like Syntactic Similarity
  Summary and Conclusion
Chapter 6. Text Classification Algorithms
  What You’ll Learn and What We’ll Build
  Introducing the Java Development Tools Bug Dataset
  Blueprint: Building a Text Classification System
  Step 1: Data Preparation
  Step 2: Train-Test Split
  Step 3: Training the Machine Learning Model
  Step 4: Model Evaluation
  Final Blueprint for Text Classification
  Blueprint: Using Cross-Validation to Estimate Realistic Accuracy Metrics
  Blueprint: Performing Hyperparameter Tuning with Grid Search
  Blueprint Recap and Conclusion
  Closing Remarks
  Further Reading
Chapter 7. How to Explain a Text Classifier
  What You’ll Learn and What We’ll Build
  Blueprint: Determining Classification Confidence Using Prediction Probability
  Blueprint: Measuring Feature Importance of Predictive Models
  Blueprint: Using LIME to Explain the Classification Results
  Blueprint: Using ELI5 to Explain the Classification Results
  Blueprint: Using Anchor to Explain the Classification Results
  Using the Distribution with Masked Words
  Working with Real Words
  Closing Remarks
Chapter 8. Unsupervised Methods: Topic Modeling and Clustering
  What You’ll Learn and What We’ll Build
  Our Dataset: UN General Debates
  Checking Statistics of the Corpus
  Preparations
  Nonnegative Matrix Factorization (NMF)
  Blueprint: Creating a Topic Model Using NMF for Documents
  Blueprint: Creating a Topic Model for Paragraphs Using NMF
  Latent Semantic Analysis/Indexing
  Blueprint: Creating a Topic Model for Paragraphs with SVD
  Latent Dirichlet Allocation
  Blueprint: Creating a Topic Model for Paragraphs with LDA
  Blueprint: Visualizing LDA Results
  Blueprint: Using Word Clouds to Display and Compare Topic Models
  Blueprint: Calculating Topic Distribution of Documents and Time Evolution
  Using Gensim for Topic Modeling
  Blueprint: Preparing Data for Gensim
  Blueprint: Performing Nonnegative Matrix Factorization with Gensim
  Blueprint: Using LDA with Gensim
  Blueprint: Calculating Coherence Scores
  Blueprint: Finding the Optimal Number of Topics
  Blueprint: Creating a Hierarchical Dirichlet Process with Gensim
  Blueprint: Using Clustering to Uncover the Structure of Text Data
  Further Ideas
  Summary and Recommendation
  Conclusion
Chapter 9. Text Summarization
  What You’ll Learn and What We’ll Build
  Text Summarization
  Extractive Methods
  Data Preprocessing
  Blueprint: Summarizing Text Using Topic Representation
  Identifying Important Words with TF-IDF Values
  LSA Algorithm
  Blueprint: Summarizing Text Using an Indicator Representation
  Measuring the Performance of Text Summarization Methods
  Blueprint: Summarizing Text Using Machine Learning
  Step 1: Creating Target Labels
  Step 2: Adding Features to Assist Model Prediction
  Step 3: Build a Machine Learning Model
  Closing Remarks
  Further Reading
Chapter 10. Exploring Semantic Relationships with Word Embeddings
  What You’ll Learn and What We’ll Build
  The Case for Semantic Embeddings
  Word Embeddings
  Analogy Reasoning with Word Embeddings
  Types of Embeddings
  Blueprint: Using Similarity Queries on Pretrained Models
  Loading a Pretrained Model
  Similarity Queries
  Blueprints for Training and Evaluating Your Own Embeddings
  Data Preparation
  Blueprint: Training Models with Gensim
  Blueprint: Evaluating Different Models
  Blueprints for Visualizing Embeddings
  Blueprint: Applying Dimensionality Reduction
  Blueprint: Using the TensorFlow Embedding Projector
  Blueprint: Constructing a Similarity Tree
  Closing Remarks
  Further Reading
Chapter 11. Performing Sentiment Analysis on Text Data
  What You’ll Learn and What We’ll Build
  Sentiment Analysis
  Introducing the Amazon Customer Reviews Dataset
  Blueprint: Performing Sentiment Analysis Using Lexicon-Based Approaches
  Bing Liu Lexicon
  Disadvantages of a Lexicon-Based Approach
  Supervised Learning Approaches
  Preparing Data for a Supervised Learning Approach
  Blueprint: Vectorizing Text Data and Applying a Supervised Machine Learning Algorithm
  Step 1: Data Preparation
  Step 2: Train-Test Split
  Step 3: Text Vectorization
  Step 4: Training the Machine Learning Model
  Pretrained Language Models Using Deep Learning
  Deep Learning and Transfer Learning
  Blueprint: Using the Transfer Learning Technique and a Pretrained Language Model
  Step 1: Loading Models and Tokenization
  Step 2: Model Training
  Step 3: Model Evaluation
  Closing Remarks
  Further Reading
Chapter 12. Building a Knowledge Graph
  What You’ll Learn and What We’ll Build
  Knowledge Graphs
  Information Extraction
  Introducing the Dataset
  Named-Entity Recognition
  Blueprint: Using Rule-Based Named-Entity Recognition
  Blueprint: Normalizing Named Entities
  Merging Entity Tokens
  Coreference Resolution
  Blueprint: Using spaCy’s Token Extensions
  Blueprint: Performing Alias Resolution
  Blueprint: Resolving Name Variations
  Blueprint: Performing Anaphora Resolution with NeuralCoref
  Name Normalization
  Entity Linking
  Blueprint: Creating a Co-Occurrence Graph
  Extracting Co-Occurrences from a Document
  Visualizing the Graph with Gephi
  Relation Extraction
  Blueprint: Extracting Relations Using Phrase Matching
  Blueprint: Extracting Relations Using Dependency Trees
  Creating the Knowledge Graph
  Don’t Blindly Trust the Results
  Closing Remarks
  Further Reading
Chapter 13. Using Text Analytics in Production
  What You’ll Learn and What We’ll Build
  Blueprint: Using Conda to Create Reproducible Python Environments
  Blueprint: Using Containers to Create Reproducible Environments
  Blueprint: Creating a REST API for Your Text Analytics Model
  Blueprint: Deploying and Scaling Your API Using a Cloud Provider
  Blueprint: Automatically Versioning and Deploying Builds
  Closing Remarks
  Further Reading
Index
About the Authors
Colophon