Category: Cybernetics: Artificial Intelligence
Edition: 1
Authors: Jens Albrecht, Sidharth Ramachandran, Christian Winkler
ISBN: 149207408X, 9781492074083
Publisher: O'Reilly Media
Year of publication: 2021
Number of pages: 422
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 20 MB
If you would like the file of Blueprints for Text Analytics Using Python: Machine Learning-Based Solutions for Common Real World (NLP) Applications converted to PDF, EPUB, AZW3, MOBI, or DJVU format, you can notify support and they will convert the file for you.
Please note that Blueprints for Text Analytics Using Python: Machine Learning-Based Solutions for Common Real World (NLP) Applications is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not carry any titles translated into or written in Persian.
Turning text into valuable information is essential for businesses looking to gain a competitive advantage. With recent improvements in natural language processing (NLP), users now have many options for solving complex challenges. But it's not always clear which NLP tools or libraries would work for a business's needs, or which techniques you should use and in what order. This practical book provides data scientists and developers with blueprints for best-practice solutions to common tasks in text analytics and natural language processing. Authors Jens Albrecht, Sidharth Ramachandran, and Christian Winkler provide real-world case studies and detailed code examples in Python to help you get started quickly.

• Extract data from APIs and web pages
• Prepare textual data for statistical analysis and machine learning
• Use machine learning for classification, topic modeling, and summarization
• Explain AI models and classification results
• Explore and visualize semantic similarities with word embeddings
• Identify customer sentiment in product reviews
• Create a knowledge graph based on named entities and their relations
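To give a sense of the blueprint style, here is a minimal illustrative sketch (not taken from the book) of one technique the description mentions: preparing textual data and ranking terms by TF-IDF weight with scikit-learn. The toy corpus and all names are invented for illustration.

    # Minimal sketch, not from the book: rank each document's terms by
    # TF-IDF weight with scikit-learn. The toy corpus is invented.
    from sklearn.feature_extraction.text import TfidfVectorizer

    corpus = [
        "Machine learning turns raw text into valuable information.",
        "Natural language processing solves complex text challenges.",
        "Blueprints offer best-practice solutions for text analytics.",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(corpus)  # sparse document-term matrix

    # Print the highest-weighted term of each document.
    terms = vectorizer.get_feature_names_out()
    for i, row in enumerate(tfidf.toarray()):
        top = row.argmax()
        print(f"doc {i}: {terms[top]!r} (tf-idf={row[top]:.2f})")

The book's blueprints develop snippets like this into complete solutions built on real-world datasets.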
Cover
Copyright
Table of Contents
Preface
  Approach of the Book
  Prerequisites
  Some Important Libraries to Know
  Books to Read
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments

Chapter 1. Gaining Early Insights from Textual Data
  What You’ll Learn and What We’ll Build
  Exploratory Data Analysis
  Introducing the Dataset
  Blueprint: Getting an Overview of the Data with Pandas
  Calculating Summary Statistics for Columns
  Checking for Missing Data
  Plotting Value Distributions
  Comparing Value Distributions Across Categories
  Visualizing Developments Over Time
  Blueprint: Building a Simple Text Preprocessing Pipeline
  Performing Tokenization with Regular Expressions
  Treating Stop Words
  Processing a Pipeline with One Line of Code
  Blueprints for Word Frequency Analysis
  Blueprint: Counting Words with a Counter
  Blueprint: Creating a Frequency Diagram
  Blueprint: Creating Word Clouds
  Blueprint: Ranking with TF-IDF
  Blueprint: Finding a Keyword-in-Context
  Blueprint: Analyzing N-Grams
  Blueprint: Comparing Frequencies Across Time Intervals and Categories
  Creating Frequency Timelines
  Creating Frequency Heatmaps
  Closing Remarks

Chapter 2. Extracting Textual Insights with APIs
  What You’ll Learn and What We’ll Build
  Application Programming Interfaces
  Blueprint: Extracting Data from an API Using the Requests Module
  Pagination
  Rate Limiting
  Blueprint: Extracting Twitter Data with Tweepy
  Obtaining Credentials
  Installing and Configuring Tweepy
  Extracting Data from the Search API
  Extracting Data from a User’s Timeline
  Extracting Data from the Streaming API
  Closing Remarks

Chapter 3. Scraping Websites and Extracting Data
  What You’ll Learn and What We’ll Build
  Scraping and Data Extraction
  Introducing the Reuters News Archive
  URL Generation
  Blueprint: Downloading and Interpreting robots.txt
  Blueprint: Finding URLs from sitemap.xml
  Blueprint: Finding URLs from RSS
  Downloading Data
  Blueprint: Downloading HTML Pages with Python
  Blueprint: Downloading HTML Pages with wget
  Extracting Semistructured Data
  Blueprint: Extracting Data with Regular Expressions
  Blueprint: Using an HTML Parser for Extraction
  Blueprint: Spidering
  Introducing the Use Case
  Error Handling and Production-Quality Software
  Density-Based Text Extraction
  Extracting Reuters Content with Readability
  Summary Density-Based Text Extraction
  All-in-One Approach
  Blueprint: Scraping the Reuters Archive with Scrapy
  Possible Problems with Scraping
  Closing Remarks and Recommendation

Chapter 4. Preparing Textual Data for Statistics and Machine Learning
  What You’ll Learn and What We’ll Build
  A Data Preprocessing Pipeline
  Introducing the Dataset: Reddit Self-Posts
  Loading Data Into Pandas
  Blueprint: Standardizing Attribute Names
  Saving and Loading a DataFrame
  Cleaning Text Data
  Blueprint: Identify Noise with Regular Expressions
  Blueprint: Removing Noise with Regular Expressions
  Blueprint: Character Normalization with textacy
  Blueprint: Pattern-Based Data Masking with textacy
  Tokenization
  Blueprint: Tokenization with Regular Expressions
  Tokenization with NLTK
  Recommendations for Tokenization
  Linguistic Processing with spaCy
  Instantiating a Pipeline
  Processing Text
  Blueprint: Customizing Tokenization
  Blueprint: Working with Stop Words
  Blueprint: Extracting Lemmas Based on Part of Speech
  Blueprint: Extracting Noun Phrases
  Blueprint: Extracting Named Entities
  Feature Extraction on a Large Dataset
  Blueprint: Creating One Function to Get It All
  Blueprint: Using spaCy on a Large Dataset
  Persisting the Result
  A Note on Execution Time
  There Is More
  Language Detection
  Spell-Checking
  Token Normalization
  Closing Remarks and Recommendations

Chapter 5. Feature Engineering and Syntactic Similarity
  What You’ll Learn and What We’ll Build
  A Toy Dataset for Experimentation
  Blueprint: Building Your Own Vectorizer
  Enumerating the Vocabulary
  Vectorizing Documents
  The Document-Term Matrix
  The Similarity Matrix
  Bag-of-Words Models
  Blueprint: Using scikit-learn’s CountVectorizer
  Blueprint: Calculating Similarities
  TF-IDF Models
  Optimized Document Vectors with TfidfTransformer
  Introducing the ABC Dataset
  Blueprint: Reducing Feature Dimensions
  Blueprint: Improving Features by Making Them More Specific
  Blueprint: Using Lemmas Instead of Words for Vectorizing Documents
  Blueprint: Limit Word Types
  Blueprint: Remove Most Common Words
  Blueprint: Adding Context via N-Grams
  Syntactic Similarity in the ABC Dataset
  Blueprint: Finding Most Similar Headlines to a Made-up Headline
  Blueprint: Finding the Two Most Similar Documents in a Large Corpus (Much More Difficult)
  Blueprint: Finding Related Words
  Tips for Long-Running Programs like Syntactic Similarity
  Summary and Conclusion

Chapter 6. Text Classification Algorithms
  What You’ll Learn and What We’ll Build
  Introducing the Java Development Tools Bug Dataset
  Blueprint: Building a Text Classification System
  Step 1: Data Preparation
  Step 2: Train-Test Split
  Step 3: Training the Machine Learning Model
  Step 4: Model Evaluation
  Final Blueprint for Text Classification
  Blueprint: Using Cross-Validation to Estimate Realistic Accuracy Metrics
  Blueprint: Performing Hyperparameter Tuning with Grid Search
  Blueprint Recap and Conclusion
  Closing Remarks
  Further Reading

Chapter 7. How to Explain a Text Classifier
  What You’ll Learn and What We’ll Build
  Blueprint: Determining Classification Confidence Using Prediction Probability
  Blueprint: Measuring Feature Importance of Predictive Models
  Blueprint: Using LIME to Explain the Classification Results
  Blueprint: Using ELI5 to Explain the Classification Results
  Blueprint: Using Anchor to Explain the Classification Results
  Using the Distribution with Masked Words
  Working with Real Words
  Closing Remarks

Chapter 8. Unsupervised Methods: Topic Modeling and Clustering
  What You’ll Learn and What We’ll Build
  Our Dataset: UN General Debates
  Checking Statistics of the Corpus
  Preparations
  Nonnegative Matrix Factorization (NMF)
  Blueprint: Creating a Topic Model Using NMF for Documents
  Blueprint: Creating a Topic Model for Paragraphs Using NMF
  Latent Semantic Analysis/Indexing
  Blueprint: Creating a Topic Model for Paragraphs with SVD
  Latent Dirichlet Allocation
  Blueprint: Creating a Topic Model for Paragraphs with LDA
  Blueprint: Visualizing LDA Results
  Blueprint: Using Word Clouds to Display and Compare Topic Models
  Blueprint: Calculating Topic Distribution of Documents and Time Evolution
  Using Gensim for Topic Modeling
  Blueprint: Preparing Data for Gensim
  Blueprint: Performing Nonnegative Matrix Factorization with Gensim
  Blueprint: Using LDA with Gensim
  Blueprint: Calculating Coherence Scores
  Blueprint: Finding the Optimal Number of Topics
  Blueprint: Creating a Hierarchical Dirichlet Process with Gensim
  Blueprint: Using Clustering to Uncover the Structure of Text Data
  Further Ideas
  Summary and Recommendation
  Conclusion

Chapter 9. Text Summarization
  What You’ll Learn and What We’ll Build
  Text Summarization
  Extractive Methods
  Data Preprocessing
  Blueprint: Summarizing Text Using Topic Representation
  Identifying Important Words with TF-IDF Values
  LSA Algorithm
  Blueprint: Summarizing Text Using an Indicator Representation
  Measuring the Performance of Text Summarization Methods
  Blueprint: Summarizing Text Using Machine Learning
  Step 1: Creating Target Labels
  Step 2: Adding Features to Assist Model Prediction
  Step 3: Build a Machine Learning Model
  Closing Remarks
  Further Reading

Chapter 10. Exploring Semantic Relationships with Word Embeddings
  What You’ll Learn and What We’ll Build
  The Case for Semantic Embeddings
  Word Embeddings
  Analogy Reasoning with Word Embeddings
  Types of Embeddings
  Blueprint: Using Similarity Queries on Pretrained Models
  Loading a Pretrained Model
  Similarity Queries
  Blueprints for Training and Evaluating Your Own Embeddings
  Data Preparation
  Blueprint: Training Models with Gensim
  Blueprint: Evaluating Different Models
  Blueprints for Visualizing Embeddings
  Blueprint: Applying Dimensionality Reduction
  Blueprint: Using the TensorFlow Embedding Projector
  Blueprint: Constructing a Similarity Tree
  Closing Remarks
  Further Reading

Chapter 11. Performing Sentiment Analysis on Text Data
  What You’ll Learn and What We’ll Build
  Sentiment Analysis
  Introducing the Amazon Customer Reviews Dataset
  Blueprint: Performing Sentiment Analysis Using Lexicon-Based Approaches
  Bing Liu Lexicon
  Disadvantages of a Lexicon-Based Approach
  Supervised Learning Approaches
  Preparing Data for a Supervised Learning Approach
  Blueprint: Vectorizing Text Data and Applying a Supervised Machine Learning Algorithm
  Step 1: Data Preparation
  Step 2: Train-Test Split
  Step 3: Text Vectorization
  Step 4: Training the Machine Learning Model
  Pretrained Language Models Using Deep Learning
  Deep Learning and Transfer Learning
  Blueprint: Using the Transfer Learning Technique and a Pretrained Language Model
  Step 1: Loading Models and Tokenization
  Step 2: Model Training
  Step 3: Model Evaluation
  Closing Remarks
  Further Reading

Chapter 12. Building a Knowledge Graph
  What You’ll Learn and What We’ll Build
  Knowledge Graphs
  Information Extraction
  Introducing the Dataset
  Named-Entity Recognition
  Blueprint: Using Rule-Based Named-Entity Recognition
  Blueprint: Normalizing Named Entities
  Merging Entity Tokens
  Coreference Resolution
  Blueprint: Using spaCy’s Token Extensions
  Blueprint: Performing Alias Resolution
  Blueprint: Resolving Name Variations
  Blueprint: Performing Anaphora Resolution with NeuralCoref
  Name Normalization
  Entity Linking
  Blueprint: Creating a Co-Occurrence Graph
  Extracting Co-Occurrences from a Document
  Visualizing the Graph with Gephi
  Relation Extraction
  Blueprint: Extracting Relations Using Phrase Matching
  Blueprint: Extracting Relations Using Dependency Trees
  Creating the Knowledge Graph
  Don’t Blindly Trust the Results
  Closing Remarks
  Further Reading

Chapter 13. Using Text Analytics in Production
  What You’ll Learn and What We’ll Build
  Blueprint: Using Conda to Create Reproducible Python Environments
  Blueprint: Using Containers to Create Reproducible Environments
  Blueprint: Creating a REST API for Your Text Analytics Model
  Blueprint: Deploying and Scaling Your API Using a Cloud Provider
  Blueprint: Automatically Versioning and Deploying Builds
  Closing Remarks
  Further Reading

Index
About the Authors
Colophon