
Download the Book Blueprints for Text Analytics Using Python


Book Specifications

Blueprints for Text Analytics Using Python

Edition:
Authors: Jens Albrecht, Sidharth Ramachandran, Christian Winkler
Series:
ISBN: 9781492074083
Publisher: O'Reilly Media, Inc.
Year of publication: 2020
Number of pages: 0
Language: English
File format: EPUB (can be converted to PDF, EPUB, or AZW3 on request)
File size: 8 MB

Book price (Toman): 44,000





If you would like the file for Blueprints for Text Analytics Using Python converted to PDF, EPUB, AZW3, MOBI, or DJVU, contact support and they will convert it for you.

Note that this is the original-language (English) edition of the book, not a Persian translation. The International Library website offers original-language books only and does not carry books translated into or written in Persian.




Book Description

Turning text into valuable information is essential for many businesses looking to gain a competitive advantage. There have been many improvements in natural language processing, and users have a lot of options when choosing to work on a problem. However, it's not always clear which NLP tools or libraries would work for a business use case, or which techniques you should use and in what order. This practical book provides theoretical background and real-world case studies with detailed code examples to help developers and data scientists obtain insight from online text. Authors Jens Albrecht, Sidharth Ramachandran, and Christian Winkler use blueprints for text-related problems that apply state-of-the-art machine learning methods in Python. If you have a fundamental understanding of statistics and machine learning along with basic programming experience in Python, you're ready to get started.

You'll learn how to:

- Crawl and clean, then explore and visualize, textual data in different formats
- Preprocess and vectorize text for machine learning
- Apply methods for classification, topic analysis, summarization, and knowledge extraction
- Use semantic word embeddings and deep learning approaches for complex problems
- Work with Python NLP libraries like spaCy, NLTK, and Gensim in combination with scikit-learn, Pandas, and PyTorch
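As a taste of the blueprint style the book describes, here is a minimal sketch of TF-IDF vectorization and syntactic document similarity with scikit-learn; the example documents are illustrative assumptions, not material from the book.

```python
# A minimal sketch of the TF-IDF vectorization blueprint style described above,
# using scikit-learn. The example documents are illustrative, not from the book.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Crawl and clean textual data before analysis.",
    "Vectorize text for machine learning with TF-IDF.",
    "Topic analysis and summarization extract knowledge from text.",
]

# Vectorize: each document becomes a sparse, TF-IDF-weighted term vector.
vectorizer = TfidfVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)  # document-term matrix

# Pairwise syntactic similarity between documents via cosine similarity.
print(cosine_similarity(dtm))
```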



Table of Contents

Cover
Copyright
Table of Contents
Preface
	Approach of the Book
	Prerequisites
	Some Important Libraries to Know
	Books to Read
	Conventions Used in This Book
	Using Code Examples
	O’Reilly Online Learning
	How to Contact Us
	Acknowledgments
Chapter 1. Gaining Early Insights from Textual Data
	What You’ll Learn and What We’ll Build
	Exploratory Data Analysis
	Introducing the Dataset
	Blueprint: Getting an Overview of the Data with Pandas
		Calculating Summary Statistics for Columns
		Checking for Missing Data
		Plotting Value Distributions
		Comparing Value Distributions Across Categories
		Visualizing Developments Over Time
	Blueprint: Building a Simple Text Preprocessing Pipeline
		Performing Tokenization with Regular Expressions
		Treating Stop Words
		Processing a Pipeline with One Line of Code
	Blueprints for Word Frequency Analysis
		Blueprint: Counting Words with a Counter
		Blueprint: Creating a Frequency Diagram
		Blueprint: Creating Word Clouds
		Blueprint: Ranking with TF-IDF
	Blueprint: Finding a Keyword-in-Context
	Blueprint: Analyzing N-Grams
	Blueprint: Comparing Frequencies Across Time Intervals and Categories
		Creating Frequency Timelines
		Creating Frequency Heatmaps
	Closing Remarks
Chapter 2. Extracting Textual Insights with APIs
	What You’ll Learn and What We’ll Build
	Application Programming Interfaces
	Blueprint: Extracting Data from an API Using the Requests Module
		Pagination
		Rate Limiting
	Blueprint: Extracting Twitter Data with Tweepy
		Obtaining Credentials
		Installing and Configuring Tweepy
		Extracting Data from the Search API
		Extracting Data from a User’s Timeline
		Extracting Data from the Streaming API
	Closing Remarks
Chapter 3. Scraping Websites and Extracting Data
	What You’ll Learn and What We’ll Build
	Scraping and Data Extraction
	Introducing the Reuters News Archive
	URL Generation
	Blueprint: Downloading and Interpreting robots.txt
	Blueprint: Finding URLs from sitemap.xml
	Blueprint: Finding URLs from RSS
	Downloading Data
	Blueprint: Downloading HTML Pages with Python
	Blueprint: Downloading HTML Pages with wget
	Extracting Semistructured Data
	Blueprint: Extracting Data with Regular Expressions
	Blueprint: Using an HTML Parser for Extraction
	Blueprint: Spidering
		Introducing the Use Case
		Error Handling and Production-Quality Software
	Density-Based Text Extraction
		Extracting Reuters Content with Readability
		Summary Density-Based Text Extraction
	All-in-One Approach
	Blueprint: Scraping the Reuters Archive with Scrapy
	Possible Problems with Scraping
	Closing Remarks and Recommendation
Chapter 4. Preparing Textual Data for Statistics and Machine Learning
	What You’ll Learn and What We’ll Build
	A Data Preprocessing Pipeline
	Introducing the Dataset: Reddit Self-Posts
		Loading Data Into Pandas
		Blueprint: Standardizing Attribute Names
		Saving and Loading a DataFrame
	Cleaning Text Data
		Blueprint: Identify Noise with Regular Expressions
		Blueprint: Removing Noise with Regular Expressions
		Blueprint: Character Normalization with textacy
		Blueprint: Pattern-Based Data Masking with textacy
	Tokenization
		Blueprint: Tokenization with Regular Expressions
		Tokenization with NLTK
		Recommendations for Tokenization
	Linguistic Processing with spaCy
		Instantiating a Pipeline
		Processing Text
		Blueprint: Customizing Tokenization
		Blueprint: Working with Stop Words
		Blueprint: Extracting Lemmas Based on Part of Speech
		Blueprint: Extracting Noun Phrases
		Blueprint: Extracting Named Entities
	Feature Extraction on a Large Dataset
		Blueprint: Creating One Function to Get It All
		Blueprint: Using spaCy on a Large Dataset
		Persisting the Result
		A Note on Execution Time
	There Is More
		Language Detection
		Spell-Checking
		Token Normalization
	Closing Remarks and Recommendations
Chapter 5. Feature Engineering and Syntactic Similarity
	What You’ll Learn and What We’ll Build
	A Toy Dataset for Experimentation
	Blueprint: Building Your Own Vectorizer
		Enumerating the Vocabulary
		Vectorizing Documents
		The Document-Term Matrix
		The Similarity Matrix
	Bag-of-Words Models
		Blueprint: Using scikit-learn’s CountVectorizer
		Blueprint: Calculating Similarities
	TF-IDF Models
		Optimized Document Vectors with TfidfTransformer
		Introducing the ABC Dataset
		Blueprint: Reducing Feature Dimensions
		Blueprint: Improving Features by Making Them More Specific
		Blueprint: Using Lemmas Instead of Words for Vectorizing Documents
		Blueprint: Limit Word Types
		Blueprint: Remove Most Common Words
		Blueprint: Adding Context via N-Grams
	Syntactic Similarity in the ABC Dataset
		Blueprint: Finding Most Similar Headlines to a Made-up Headline
		Blueprint: Finding the Two Most Similar Documents in a Large Corpus (Much More Difficult)
		Blueprint: Finding Related Words
		Tips for Long-Running Programs like Syntactic Similarity
	Summary and Conclusion
Chapter 6. Text Classification Algorithms
	What You’ll Learn and What We’ll Build
	Introducing the Java Development Tools Bug Dataset
	Blueprint: Building a Text Classification System
		Step 1: Data Preparation
		Step 2: Train-Test Split
		Step 3: Training the Machine Learning Model
		Step 4: Model Evaluation
	Final Blueprint for Text Classification
	Blueprint: Using Cross-Validation to Estimate Realistic Accuracy Metrics
	Blueprint: Performing Hyperparameter Tuning with Grid Search
	Blueprint Recap and Conclusion
	Closing Remarks
	Further Reading
Chapter 7. How to Explain a Text Classifier
	What You’ll Learn and What We’ll Build
	Blueprint: Determining Classification Confidence Using Prediction Probability
	Blueprint: Measuring Feature Importance of Predictive Models
	Blueprint: Using LIME to Explain the Classification Results
	Blueprint: Using ELI5 to Explain the Classification Results
	Blueprint: Using Anchor to Explain the Classification Results
		Using the Distribution with Masked Words
		Working with Real Words
	Closing Remarks
Chapter 8. Unsupervised Methods: Topic Modeling and Clustering
	What You’ll Learn and What We’ll Build
	Our Dataset: UN General Debates
		Checking Statistics of the Corpus
		Preparations
	Nonnegative Matrix Factorization (NMF)
		Blueprint: Creating a Topic Model Using NMF for Documents
		Blueprint: Creating a Topic Model for Paragraphs Using NMF
	Latent Semantic Analysis/Indexing
		Blueprint: Creating a Topic Model for Paragraphs with SVD
	Latent Dirichlet Allocation
		Blueprint: Creating a Topic Model for Paragraphs with LDA
		Blueprint: Visualizing LDA Results
	Blueprint: Using Word Clouds to Display and Compare Topic Models
	Blueprint: Calculating Topic Distribution of Documents and Time Evolution
	Using Gensim for Topic Modeling
		Blueprint: Preparing Data for Gensim
		Blueprint: Performing Nonnegative Matrix Factorization with Gensim
		Blueprint: Using LDA with Gensim
		Blueprint: Calculating Coherence Scores
		Blueprint: Finding the Optimal Number of Topics
		Blueprint: Creating a Hierarchical Dirichlet Process with Gensim
	Blueprint: Using Clustering to Uncover the Structure of Text Data
	Further Ideas
	Summary and Recommendation
	Conclusion
Chapter 9. Text Summarization
	What You’ll Learn and What We’ll Build
	Text Summarization
		Extractive Methods
		Data Preprocessing
	Blueprint: Summarizing Text Using Topic Representation
		Identifying Important Words with TF-IDF Values
		LSA Algorithm
	Blueprint: Summarizing Text Using an Indicator Representation
	Measuring the Performance of Text Summarization Methods
	Blueprint: Summarizing Text Using Machine Learning
		Step 1: Creating Target Labels
		Step 2: Adding Features to Assist Model Prediction
		Step 3: Build a Machine Learning Model
	Closing Remarks
	Further Reading
Chapter 10. Exploring Semantic Relationships with Word Embeddings
	What You’ll Learn and What We’ll Build
	The Case for Semantic Embeddings
		Word Embeddings
		Analogy Reasoning with Word Embeddings
		Types of Embeddings
	Blueprint: Using Similarity Queries on Pretrained Models
		Loading a Pretrained Model
		Similarity Queries
	Blueprints for Training and Evaluating Your Own Embeddings
		Data Preparation
		Blueprint: Training Models with Gensim
		Blueprint: Evaluating Different Models
	Blueprints for Visualizing Embeddings
		Blueprint: Applying Dimensionality Reduction
		Blueprint: Using the TensorFlow Embedding Projector
		Blueprint: Constructing a Similarity Tree
	Closing Remarks
	Further Reading
Chapter 11. Performing Sentiment Analysis on Text Data
	What You’ll Learn and What We’ll Build
	Sentiment Analysis
	Introducing the Amazon Customer Reviews Dataset
	Blueprint: Performing Sentiment Analysis Using Lexicon-Based Approaches
		Bing Liu Lexicon
		Disadvantages of a Lexicon-Based Approach
	Supervised Learning Approaches
		Preparing Data for a Supervised Learning Approach
	Blueprint: Vectorizing Text Data and Applying a Supervised Machine Learning Algorithm
		Step 1: Data Preparation
		Step 2: Train-Test Split
		Step 3: Text Vectorization
		Step 4: Training the Machine Learning Model
	Pretrained Language Models Using Deep Learning
		Deep Learning and Transfer Learning
	Blueprint: Using the Transfer Learning Technique and a Pretrained Language Model
		Step 1: Loading Models and Tokenization
		Step 2: Model Training
		Step 3: Model Evaluation
	Closing Remarks
	Further Reading
Chapter 12. Building a Knowledge Graph
	What You’ll Learn and What We’ll Build
	Knowledge Graphs
		Information Extraction
	Introducing the Dataset
	Named-Entity Recognition
		Blueprint: Using Rule-Based Named-Entity Recognition
		Blueprint: Normalizing Named Entities
		Merging Entity Tokens
	Coreference Resolution
		Blueprint: Using spaCy’s Token Extensions
		Blueprint: Performing Alias Resolution
		Blueprint: Resolving Name Variations
		Blueprint: Performing Anaphora Resolution with NeuralCoref
		Name Normalization
		Entity Linking
	Blueprint: Creating a Co-Occurrence Graph
		Extracting Co-Occurrences from a Document
		Visualizing the Graph with Gephi
	Relation Extraction
		Blueprint: Extracting Relations Using Phrase Matching
		Blueprint: Extracting Relations Using Dependency Trees
	Creating the Knowledge Graph
		Don’t Blindly Trust the Results
	Closing Remarks
	Further Reading
Chapter 13. Using Text Analytics in Production
	What You’ll Learn and What We’ll Build
	Blueprint: Using Conda to Create Reproducible Python Environments
	Blueprint: Using Containers to Create Reproducible Environments
	Blueprint: Creating a REST API for Your Text Analytics Model
	Blueprint: Deploying and Scaling Your API Using a Cloud Provider
	Blueprint: Automatically Versioning and Deploying Builds
	Closing Remarks
	Further Reading
Index
About the Authors
Colophon



