Authors: Sam Lau
ISBN: 9781098113001
Publisher: O'Reilly Media
Year: 2023
Pages: 594
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 on request)
File size: 16 MB
If you would like the file for Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python (Final) converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it.
Note that Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python (Final) is the original-language edition, not a Persian translation. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
As an aspiring data scientist, you appreciate why organizations rely on data for important decisions, whether it's for companies designing websites, cities deciding how to improve services, or scientists discovering how to stop the spread of disease. And you want the skills required to distill a messy pile of data into actionable insights. We call this the data science lifecycle: the process of collecting, wrangling, analyzing, and drawing conclusions from data.
Learning Data Science is the first book to cover foundational skills in both programming and statistics that encompass this entire lifecycle. It's aimed at those who wish to become data scientists or who already work with data scientists, and at data analysts who wish to cross the "technical/nontechnical" divide. If you have a basic knowledge of Python programming, you'll learn how to work with data using industry-standard tools like pandas.
Refine a question of interest to one that can be studied with data
Pursue data collection that may involve text processing, web scraping, etc.
Glean valuable insights about data through data cleaning, exploration, and visualization
Learn how to use modeling to describe the data
Generalize findings beyond the data
Preface
    Expected Background Knowledge Organization of the Book Conventions Used in This Book Using Code Examples O’Reilly Online Learning How to Contact Us Acknowledgements

I. The Data Science Lifecycle
1. The Data Science Lifecycle
    The Stages of the Lifecycle Examples of the Lifecycle Summary
2. Questions and Data Scope
    Big Data and New Opportunities Example: Google Flu Trends Target Population, Access Frame, Sample Instruments and Protocols Measuring Natural Phenomenon Accuracy Types of Bias Types of Variation Summary
3. Simulation and Data Design
    The Urn Model Sampling Designs Sampling Distribution of a Statistic Simulating the Sampling Distribution The Hypergeometric Distribution Example: Simulating Election Poll Bias and Variance The Pennsylvania Urn Model An Urn Model with Bias Conducting Larger Polls Example: Simulating a Randomized Trial for a Vaccine Scope The Urn Model for Random Assignment Example: Measuring Air Quality Summary
4. Modeling with Summary Statistics
    The Constant Model Minimizing Loss Mean Absolute Error Mean Squared Error Choosing Loss Functions Summary
5. Case Study: Why is my Bus Always Late?
    Question and Scope Data Wrangling Exploring Bus Times Modeling Wait Times Summary

II. Rectangular Data
6. Working With Dataframes Using pandas
    Subsetting Data Scope and Question DataFrames and Indices Slicing Filtering Rows Example: How recently has Luna become a popular name? Aggregating Basic Group-Aggregate Grouping on Multiple Columns Custom Aggregation Functions Example: Have People Become More Creative With Baby Names? Pivoting Joining Inner Joins Left, Right, and Outer Joins Example: Popularity of NYT Name Categories Transforming Apply Example: Popularity of “L” Names The Price of Apply How are Dataframes Different from Other Data Representations? Dataframes and Spreadsheets Dataframes and Matrices Dataframes and Relations Summary
7. Working With Relations Using SQL
    Subsetting SQL Basics: SELECT and FROM What’s a Relation? Slicing Filtering Rows Example: How recently has Luna become a popular name? Aggregating Basic Group-Aggregate using GROUP BY Grouping on Multiple Columns Other Aggregation Functions Joining Inner Joins Left and Right Joins Example: Popularity of NYT Name Categories Transforming and Common Table Expressions SQL Functions Multistep Queries Using a WITH Clause Example: Popularity of “L” Names Summary

III. Understanding The Data
8. Wrangling Files
    Data Source Examples Drug Abuse Warning Network (DAWN) Survey San Francisco Restaurant Food Safety File Formats Delimited format Fixed-width Format Hierarchical Formats Loosely Formatted Text File Encoding File Size Working with Large Data Sets The Shell and Command Line Tools Table Shape and Granularity Granularity of Restaurant Inspections and Violations DAWN Survey Shape and Granularity Summary
9. Wrangling Dataframes
    Example: Wrangling CO2 Measurements from Mauna Loa Observatory Quality Checks Addressing Missing Data Reshaping the Data Table Quality Checks Quality based on scope Quality of measurements and recorded values Quality across related features Quality for analysis Fixing the Data or Not Missing Values and Records Imputing Missing Values Transformations and Timestamps Transforming Timestamps Piping for Transformations Modifying Structure Example: Wrangling Restaurant Safety Violations Narrowing the Focus Aggregating Violations Extracting Information from Violation Descriptions Summary
10. Exploratory Data Analysis
    Feature Types Example: Dog Breeds Transforming Qualitative Features The Importance of Feature Types What to Look For in a Distribution What to Look For in a Relationship Two Quantitative Features One Qualitative and One Quantitative Variable Two Qualitative Features Comparisons in Multivariate Settings Guidelines for Exploration Example: Sale Prices for Houses Understanding Price What Next? Examining other features Delving Deeper into Relationships Fixing Location EDA discoveries Summary
11. Data Visualization
    Choosing Scale to Reveal Structure Filling the Data Region Including Zero Revealing Shape Through Transformations Banking to Decipher Relationships Revealing Relationships Through Straightening Smoothing and Aggregating Data Smoothing Techniques to Uncover Shape Smoothing Techniques to Uncover Relationships and Trends Smoothing Techniques Need Tuning Reducing Distributions to Quantiles When Not to Smooth Facilitating Meaningful Comparisons Emphasize the Important Difference Ordering Groups Avoid Stacking Selecting a Color Palette Guidelines for Comparisons in Plots Incorporating the Data Design Data Collected over Time Observational Studies Unequal Sampling Geographic Data Adding Context Example: 100m Sprint Times Creating Plots Using plotly Figure and Trace Objects Modifying Layout Plotting Functions Annotations Other Tools for Visualization matplotlib Grammar of Graphics Summary
12. Case Study: How Accurate are Air Quality Measurements?
    Question, Design, and Scope Finding Collocated Sensors Wrangling the List of AQS Sites Wrangling the List of PurpleAir Sites Matching AQS and PurpleAir Sensors Wrangling and Cleaning AQS Sensor Data Checking Granularity Removing Unneeded Columns Checking the Validity of Dates Checking the Quality of PM2.5 Measurements Wrangling PurpleAir Sensor Data Checking the Granularity Handling Missing Values Exploring PurpleAir and AQS Measurements Creating a Model to Correct PurpleAir Measurements Summary

IV. Other Data Sources
13. Working with Text
    Examples of Text and Tasks Convert text into a standard format Extract a piece of text to create a feature Transform text into features Text analysis String Manipulation Converting Text to a Standard Format with Python String Methods String Methods in pandas Splitting Strings to Extract Pieces of Text Regular Expressions Concatenation of Literals Quantifiers Alternation and Grouping to Create Features Reference Tables Text Analysis Summary
14. Data Exchange
    NetCDF Data Example: Rainfall Around the World JSON Data Example: Air Quality Data Exchange HTTP REST Example: Retrieving Info on Clash Songs from Spotify XML, HTML, and XPath Example: Scraping Race Times from Wikipedia XPath Example: Accessing Exchange Rates from the ECB Summary

V. Linear Modeling
15. Linear Models
    Simple Linear Model Example: A Simple Linear Model for Air Quality Interpreting Linear Models Assessing the Fit Fitting the Simple Linear Model Multiple Linear Model Example: A Multiple Linear Model for Air Quality Fitting the Multiple Linear Model A Geometric Problem Example: Where is the Land of Opportunity? Explaining Upward Mobility using Commute Time Relating Upward Mobility Using Multiple Variables Feature Engineering for Numeric Measurements Feature Engineering for Categorical Measurements Summary
16. Model Selection
    Overfitting Example: Energy Consumption Train-Test Split Cross-Validation Example: Fitting a Bent Line Model with Cross-validation Regularization Example: A Market Analysis Model Bias and Variance Summary
17. Theory for Inference and Prediction
    Distributions: Population, Empirical, Sampling Basics of Hypothesis Testing Example: A Rank-test to Compare Productivity of Wikipedia Contributors Example: A Test of Proportions for Vaccine Efficacy Bootstrapping for Inference Bootstrapping a Test for a Regression Coefficient Basics of Confidence Intervals Confidence Intervals for a Coefficient Basics of Prediction Intervals Example: Predicting Bus Lateness Example: Predicting Crab Size Example: Predicting the Incremental Growth of a Crab Probability for Inference and Prediction Formalizing the Theory for Average rank statistics General Properties of Random Variables Probability Behind Testing and Intervals Probability Behind Model Selection Summary
18. Case Study: How to Weigh a Donkey
    Donkey Study Question and Scope Wrangling and Transforming Train-Test Split of the Data Exploring Modeling a Donkey’s Weight A Loss Function for Prescribing Anesthetics Fitting a Simple Linear Model Fitting a Multiple Linear Model Bringing Qualitative Features into the Model Model Assessment Summary

VI. Classification
19. Classification
    Example: Wind Damaged Trees Modeling and Classification A Constant Model Examining the Relationship Between Size and Windthrow Modeling Proportions (and Probabilities) A Logistic Model Log Odds Using a Logistic Curve A Loss Function for the Logistic Model Fitting a Logistic Model From Probabilities to Classification The Confusion Matrix Precision vs Recall Summary
20. Numerical Optimization
    Gradient Descent Basics Minimizing Huber Loss Convex and Differentiable Loss Functions Variants of Gradient Descent Stochastic Gradient Descent Mini-batch Gradient Descent Newton’s Method Summary
21. Case Study: Detecting Fake News
    Question and Scope Obtaining and Wrangling the Data Exploring the Data Exploring the Publishers Exploring Publication Date Exploring Words in Articles Modeling A Single-Word Model Multiple Word Model Predicting with the tf-idf Transform Summary

About the Authors