Authors: Brian Lipp, Shubhadeep Roychowdhury, Dr. Tirthajyoti Sarkar
ISBN: 1839215003, 9781839215001
Publisher: Packt Publishing
Year of publication: 2020
Number of pages: 0
Language: English
File format: ZIP (converted to PDF, EPUB, or AZW3 on request)
File size: 19 MB
If you would like the file for The Data Wrangling Workshop: Create your own actionable insights using data from multiple raw sources, 2nd Edition. Code converted to PDF, EPUB, AZW3, MOBI, or DJVU, let support know and they will convert it for you.
Note that The Data Wrangling Workshop: Create your own actionable insights using data from multiple raw sources, 2nd Edition. Code is the original-language edition and has not been translated into Persian. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
A beginner's guide to simplifying Extract, Transform, Load (ETL) processes with the help of hands-on tips, tricks, and best practices, in a fun and interactive way
While a huge amount of data is readily available to us, it is not useful in its raw form. For data to be meaningful, it must be curated and refined.
If you're a beginner, then The Data Wrangling Workshop will help to break down the process for you. You'll start with the basics and build your knowledge, progressing from the core aspects behind data wrangling, to using the most popular tools and techniques.
This book starts by showing you how to work with data structures using Python. Through examples and activities, you'll understand why you should stay away from traditional methods of data cleaning used in other languages and take advantage of the specialized pre-built routines in Python. Later, you'll learn how to use the same Python backend to extract and transform data from an array of sources, including the internet, large database vaults, and Excel financial tables. To help you prepare for more challenging scenarios, the book teaches you how to handle missing or incorrect data and reformat it to meet the requirements of your downstream analytics tool.
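As a rough illustration of the workflow described above (extract, handle missing or incorrect values, reformat for a downstream tool), here is a minimal sketch using only the Python standard library. It is not taken from the book, which works mainly with pandas. The sample data, the `parse_price` and `wrangle` helpers, and the choice to fill gaps with the mean are all assumptions made for this example.

```python
import csv
import io
import json
from statistics import mean

# Hypothetical raw data: one missing price and one non-numeric price.
RAW_CSV = """item,price
apple,1.20
banana,
cherry,n/a
date,3.00
"""


def parse_price(text):
    """Return the price as a float, or None if it is missing or not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def wrangle(raw_csv):
    # Extract: read rows from the CSV text.
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    # Clean: convert prices, marking missing or incorrect values as None.
    prices = [parse_price(row["price"]) for row in rows]
    valid = [p for p in prices if p is not None]
    # Assumption for this sketch: fill gaps with the mean of the valid prices.
    fill = round(mean(valid), 2)
    # Transform: build cleaned records with every price present.
    return [
        {"item": row["item"], "price": p if p is not None else fill}
        for row, p in zip(rows, prices)
    ]


records = wrangle(RAW_CSV)
# Format: emit JSON for a (hypothetical) downstream tool.
print(json.dumps(records, indent=2))
```

Filling with the mean is only one option. Dropping incomplete rows or using a median may suit other datasets better, and the sketch assumes at least one valid price is present.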
By the end of this book, you will have developed a solid understanding of how to perform data wrangling with Python, and learned several techniques and best practices to extract, clean, transform, and format your data efficiently, from a diverse array of sources.
The Data Wrangling Workshop is designed for developers, data analysts, and business analysts who are looking to pursue a career as a full-fledged data scientist or analytics expert. Although this book is for beginners who want to start data wrangling, prior working knowledge of the Python programming language is necessary to easily grasp the concepts covered here. It will also help to have a rudimentary knowledge of relational databases and SQL.
Table of contents

Front matter: Cover; FM; Copyright; Table of Contents; Preface

Chapter 1: Introduction to Data Wrangling with Python
Introduction; Importance of Data Wrangling; Python for Data Wrangling; Lists, Sets, Strings, Tuples, and Dictionaries; Lists; List Functions; Exercise 1.01: Accessing the List Members; Exercise 1.02: Generating and Iterating through a List; Exercise 1.03: Iterating over a List and Checking Membership; Exercise 1.04: Sorting a List; Exercise 1.05: Generating a Random List; Activity 1.01: Handling Lists; Sets; Introduction to Sets; Union and Intersection of Sets; Creating Null Sets; Dictionary; Exercise 1.06: Accessing and Setting Values in a Dictionary; Exercise 1.07: Iterating over a Dictionary; Exercise 1.08: Revisiting the Unique Valued List Problem; Exercise 1.09: Deleting a Value from Dict; Exercise 1.10: Dictionary Comprehension; Tuples; Creating a Tuple with Different Cardinalities; Unpacking a Tuple; Exercise 1.11: Handling Tuples; Strings; Exercise 1.12: Accessing Strings; Exercise 1.13: String Slices; String Functions; Exercise 1.14: Splitting and Joining a String; Activity 1.02: Analyzing a Multiline String and Generating the Unique Word Count; Summary

Chapter 2: Advanced Operations on Built-In Data Structures
Introduction; Advanced Data Structures; Iterator; Exercise 2.01: Introducing to the Iterator; Stacks; Exercise 2.02: Implementing a Stack in Python; Exercise 2.03: Implementing a Stack Using User-Defined Methods; Lambda Expressions; Exercise 2.04: Implementing a Lambda Expression; Exercise 2.05: Lambda Expression for Sorting; Exercise 2.06: Multi-Element Membership Checking; Queue; Exercise 2.07: Implementing a Queue in Python; Activity 2.01: Permutation, Iterator, Lambda, and List; Basic File Operations in Python; Exercise 2.08: File Operations; File Handling; Exercise 2.09: Opening and Closing a File; The with Statement; Opening a File Using the with Statement; Exercise 2.10: Reading a File Line by Line; Exercise 2.11: Writing to a File; Activity 2.02: Designing Your Own CSV Parser; Summary

Chapter 3: Introduction to NumPy, Pandas, and Matplotlib
Introduction; NumPy Arrays; NumPy Arrays and Features; Exercise 3.01: Creating a NumPy Array (from a List); Exercise 3.02: Adding Two NumPy Arrays; Exercise 3.03: Mathematical Operations on NumPy Arrays; Advanced Mathematical Operations; Exercise 3.04: Advanced Mathematical Operations on NumPy Arrays; Exercise 3.05: Generating Arrays Using arange and linspace Methods; Exercise 3.06: Creating Multi-Dimensional Arrays; Exercise 3.07: The Dimension, Shape, Size, and Data Type of Two-dimensional Arrays; Exercise 3.08: Zeros, Ones, Random, Identity Matrices, and Vectors; Exercise 3.09: Reshaping, Ravel, Min, Max, and Sorting; Exercise 3.10: Indexing and Slicing; Conditional SubSetting; Exercise 3.11: Array Operations; Stacking Arrays; Pandas DataFrames; Exercise 3.12: Creating a Pandas Series; Exercise 3.13: Pandas Series and Data Handling; Exercise 3.14: Creating Pandas DataFrames; Exercise 3.15: Viewing a DataFrame Partially; Indexing and Slicing Columns; Indexing and Slicing Rows; Exercise 3.16: Creating and Deleting a New Column or Row; Statistics and Visualization with NumPy and Pandas; Refresher on Basic Descriptive Statistics; Exercise 3.17: Introduction to Matplotlib through a Scatter Plot; The Definition of Statistical Measures – Central Tendency and Spread; Random Variables and Probability Distribution; What is a Probability Distribution?; Discrete Distributions; Continuous Distributions; Data Wrangling in Statistics and Visualization; Using NumPy and Pandas to Calculate Basic Descriptive Statistics; Random Number Generation Using NumPy; Exercise 3.18: Generating Random Numbers from a Uniform Distribution; Exercise 3.19: Generating Random Numbers from a Binomial Distribution and Bar Plot; Exercise 3.20: Generating Random Numbers from a Normal Distribution and Histograms; Exercise 3.21: Calculating Descriptive Statistics from a DataFrame; Exercise 3.22: Built-in Plotting Utilities; Activity 3.01: Generating Statistics from a CSV File; Summary

Chapter 4: A Deep Dive into Data Wrangling with Python
Introduction; Subsetting, Filtering, and Grouping; Exercise 4.01: Examining the Superstore Sales Data in an Excel File; Subsetting the DataFrame; An Example Use Case – Determining Statistics on Sales and Profit; Exercise 4.02: The unique Function; Conditional Selection and Boolean Filtering; Exercise 4.03: Setting and Resetting the Index; The GroupBy Method; Exercise 4.04: The GroupBy Method; Detecting Outliers and Handling Missing Values; Missing Values in Pandas; Exercise 4.05: Filling in the Missing Values Using the fillna Method; The dropna Method; Exercise 4.06: Dropping Missing Values with dropna; Outlier Detection Using a Simple Statistical Test; Concatenating, Merging, and Joining; Exercise 4.07: Concatenation in Datasets; Merging by a Common Key; Exercise 4.08: Merging by a Common Key; The join Method; Exercise 4.09: The join Method; Useful Methods of Pandas; Randomized Sampling; Exercise 4.10: Randomized Sampling; The value_counts Method; Pivot Table Functionality; Exercise 4.11: Sorting by Column Values – the sort_values Method; Exercise 4.12: Flexibility of User-Defined Functions with the apply Method; Activity 4.01: Working with the Adult Income Dataset (UCI); Summary

Chapter 5: Getting Comfortable with Different Kinds of Data Sources
Introduction; Reading Data from Different Sources; Data Files Provided with This Chapter; Libraries to Install for This Chapter; Reading Data Using Pandas; Exercise 5.01: Working with Headers When Reading Data from a CSV File; Exercise 5.02: Reading from a CSV File Where Delimiters Are Not Commas; Exercise 5.03: Bypassing and Renaming the Headers of a CSV File; Exercise 5.04: Skipping Initial Rows and Footers When Reading a CSV File; Reading Only the First N Rows; Exercise 5.05: Combining skiprows and nrows to Read Data in Small Chunks; Setting the skip_blank_lines Option; Reading CSV Data from a Zip File; Reading from an Excel File Using sheet_name and Handling a Distinct sheet_name; Exercise 5.06: Reading a General Delimited Text File; Reading HTML Tables Directly from a URL; Exercise 5.07: Further Wrangling to Get the Desired Data; Reading from a JSON file; Exercise 5.08: Reading from a JSON File; Reading a PDF File; Exercise 5.09: Reading Tabular Data from a PDF File; Introduction to Beautiful Soup 4 and Web Page Parsing; Structure of HTML; Exercise 5.10: Reading an HTML File and Extracting Its Contents Using Beautiful Soup; Exercise 5.11: DataFrames and BeautifulSoup; Exercise 5.12: Exporting a DataFrame as an Excel File; Exercise 5.13: Stacking URLs from a Document Using bs4; Activity 5.01: Reading Tabular Data from a Web Page and Creating DataFrames; Summary

Chapter 6: Learning the Hidden Secrets of Data Wrangling
Introduction; Advanced List Comprehension and the zip Function; Introduction to Generator Expressions; Exercise 6.01: Generator Expressions; Exercise 6.02: Single-Line Generator Expression; Exercise 6.03: Extracting a List with Single Words; Exercise 6.04: The zip Function; Exercise 6.05: Handling Messy Data; Data Formatting; The % operator; Using the format Function; Exercise 6.06: Data Representation Using {}; Identifying and Cleaning Outliers; Exercise 6.07: Outliers in Numerical Data; Z-score; Exercise 6.08: The Z-Score Value to Remove Outliers; Levenshtein Distance; Additional Software Required for This Section; Exercise 6.09: Fuzzy String Matching; Activity 6.01: Handling Outliers and Missing Data; Summary

Chapter 7: Advanced Web Scraping and Data Gathering
Introduction; The Requests and BeautifulSoup Libraries; Exercise 7.01: Using the Requests Library to Get a Response from the Wikipedia Home Page; Exercise 7.02: Checking the Status of the Web Request; Checking the Encoding of a Web Page; Exercise 7.03: Decoding the Contents of a Response and Checking Its Length; Exercise 7.04: Extracting Readable Text from a BeautifulSoup Object; Extracting Text from a Section; Extracting Important Historical Events that Happened on Today's Date; Exercise 7.05: Using Advanced BS4 Techniques to Extract Relevant Text; Exercise 7.06: Creating a Compact Function to Extract the On this day Text from the Wikipedia Home Page; Reading Data from XML; Exercise 7.07: Creating an XML File and Reading XML Element Objects; Exercise 7.08: Finding Various Elements of Data within a Tree (Element); Reading from a Local XML File into an ElementTree Object; Exercise 7.09: Traversing the Tree, Finding the Root, and Exploring All the Child Nodes and Their Tags and Attributes; Exercise 7.10: Using the text Method to Extract Meaningful Data; Extracting and Printing the GDP/Per Capita Information Using a Loop; Finding All the Neighboring Countries for Each Country and Printing Them; Exercise 7.11: A Simple Demo of Using XML Data Obtained by Web Scraping; Reading Data from an API; Defining the Base URL (or API Endpoint); Exercise 7.12: Defining and Testing a Function to Pull Country Data from an API; Using the Built-In JSON Library to Read and Examine Data; Printing All the Data Elements; Using a Function that Extracts a DataFrame Containing Key Information; Exercise 7.13: Testing the Function by Building a Small Database of Country Information; Fundamentals of Regular Expressions (RegEx); RegEx in the Context of Web Scraping; Exercise 7.14: Using the match Method to Check Whether a Pattern Matches a String/Sequence; Using the compile Method to Create a RegEx Program; Exercise 7.15: Compiling Programs to Match Objects; Exercise 7.16: Using Additional Parameters in the match Method to Check for Positional Matching; Finding the Number of Words in a List That End with "ing"; The search Method in RegEx; Exercise 7.17: The search Method in RegEx; Exercise 7.18: Using the span Method of the Match Object to Locate the Position of the Matched Pattern; Exercise 7.19: Examples of Single-Character Pattern Matching with search; Exercise 7.20: Handling Pattern Matching at the Start or End of a String; Exercise 7.21: Pattern Matching with Multiple Characters; Exercise 7.22: Greedy versus Non-Greedy Matching; Exercise 7.23: Controlling Repetitions to Match in a Text; Sets of Matching Characters; Exercise 7.24: Sets of Matching Characters; Exercise 7.25: The Use of OR in RegEx Using the OR Operator; The findall Method; Activity 7.01: Extracting the Top 100 e-books from Gutenberg; Activity 7.02: Building Your Own Movie Database by Reading an API; Summary

Chapter 8: RDBMS and SQL
Introduction; Refresher of RDBMS and SQL; How Is an RDBMS Structured?; SQL; Using an RDBMS (MySQL/PostgreSQL/SQLite); Exercise 8.01: Connecting to a Database in SQLite; DDL and DML Commands in SQLite; Exercise 8.02: Using DDL and DML Commands in SQLite; Reading Data from a Database in SQLite; Exercise 8.03: Sorting Values That Are Present in the Database; The ALTER Command; Exercise 8.04: Altering the Structure of a Table and Updating the New Fields; The GROUP BY clause; Exercise 8.05: Grouping Values in Tables; Relation Mapping in Databases; Adding Rows in the comments Table; Joins; Retrieving Specific Columns from a JOIN Query; Deleting Rows from Tables; Exercise 8.06: Deleting Rows from Tables; Updating Specific Values in a Table; Exercise 8.07: RDBMS and DataFrames; Activity 8.01: Retrieving Data Accurately from Databases; Summary

Chapter 9: Applications in Business Use Cases and Conclusion of the Course
Introduction; Applying Your Knowledge to a Data Wrangling Task; Activity 9.01: Data Wrangling Task – Fixing UN Data; Activity 9.02: Data Wrangling Task – Cleaning GDP Data; Activity 9.03: Data Wrangling Task – Merging UN Data and GDP Data; Activity 9.04: Data Wrangling Task – Connecting the New Data to the Database; An Extension to Data Wrangling; Additional Skills Required to Become a Data Scientist; Basic Familiarity with Big Data and Cloud Technologies; What Goes with Data Wrangling?; Tips and Tricks for Mastering Machine Learning; Summary

Back matter: Appendix; Index