Edition:
Authors: Roy Jafari
Series:
ISBN: 1801072132, 9781801072137
Publisher: Packt Publishing
Publication year: 2022
Number of pages: 602
Language: English
File format: PDF (can be converted to PDF, EPUB, or AZW3 at the user's request)
File size: 48 MB
If you need the file of the book Hands-On Data Preprocessing in Python: Learn how to effectively prepare data for successful data analytics converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support to have the file converted.
Please note that this book is the original English-language edition, not a Persian translation. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
This book will make the link between data cleaning and preprocessing to help you design effective data analytic solutions.
Data preprocessing is the first step in data visualization, data analytics, and machine learning, where data is prepared for analytics functions to get the best possible insights. Around 90% of the time spent on data analytics, data visualization, and machine learning projects is dedicated to performing data preprocessing.
This book will equip you with the optimum data preprocessing techniques from multiple perspectives. You'll learn about different technical and analytical aspects of data preprocessing – data collection, data cleaning, data integration, data reduction, and data transformation – and get to grips with implementing them using the open source Python programming environment. This book will provide a comprehensive articulation of data preprocessing, its whys and hows, and help you identify opportunities where data analytics could lead to more effective decision making. It also demonstrates the role of data management systems and technologies for effective analytics and how to use APIs to pull data.
By the end of this Python data preprocessing book, you'll be able to use Python to read, manipulate, and analyze data; perform data cleaning, integration, reduction, and transformation techniques; and handle outliers or missing values to effectively prepare data for analytic tools.
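As a taste of the workflow the description refers to, here is a minimal sketch, not taken from the book, of handling missing values, outliers, and a transformation with pandas and NumPy. The toy dataset and every column name in it are hypothetical; the book covers these techniques in far more depth (median imputation and IQR-based outlier fences are just two common choices among those it discusses).

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset (not from the book): one missing value, one outlier.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 29.0],
    "income": [48_000, 52_000, 50_000, 1_000_000, 47_000],
})

# Missing values: fill the gap in "age" with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Outliers: drop rows whose "income" falls outside the 1.5 * IQR fences.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df_clean = df[df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Transformation: min-max normalize the cleaned "income" column to [0, 1].
df_clean = df_clean.assign(
    income_norm=(df_clean["income"] - df_clean["income"].min())
    / (df_clean["income"].max() - df_clean["income"].min())
)
print(df_clean)
```

With this data, the row holding the 1,000,000 outlier is dropped, the missing age becomes the median (30.5), and the surviving incomes are rescaled to the unit interval.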
Junior and senior data analysts, business intelligence professionals, engineering undergraduates, and data enthusiasts looking to perform preprocessing and data cleaning on large amounts of data will find this book useful. Basic programming skills, such as working with variables, conditionals, and loops, along with beginner-level knowledge of Python and simple analytics experience, are assumed.
Cover
Copyright
Contributors
Table of Contents
Preface
Part 1: Technical Needs
Chapter 1: Review of the Core Modules of NumPy and Pandas
  Technical requirements
  Overview of the Jupyter Notebook
  Are we analyzing data via computer programming?
  Overview of the basic functions of NumPy
  The np.arange() function
  The np.zeros() and np.ones() functions
  The np.linspace() function
  Overview of Pandas
  Pandas data access
  Boolean masking for filtering a DataFrame
  Pandas functions for exploring a DataFrame
  Pandas applying a function
  The Pandas groupby function
  Pandas multi-level indexing
  Pandas pivot and melt functions
  Summary
  Exercises
Chapter 2: Review of Another Core Module – Matplotlib
  Technical requirements
  Drawing the main plots in Matplotlib
  Summarizing numerical attributes using histograms or boxplots
  Observing trends in the data using a line plot
  Relating two numerical attributes using a scatterplot
  Modifying the visuals
  Adding a title to visuals and labels to the axis
  Adding legends
  Modifying ticks
  Modifying markers
  Subplots
  Resizing visuals and saving them
  Resizing
  Saving
  Example of Matplotlib assisting data preprocessing
  Summary
  Exercises
Chapter 3: Data – What Is It Really?
  Technical requirements
  What is data?
  Why this definition?
  DIKW pyramid
  Data preprocessing for data analytics versus data preprocessing for machine learning
  The most universal data structure – a table
  Data objects
  Data attributes
  Types of data values
  Analytics standpoint
  Programming standpoint
  Information versus pattern
  Understanding everyday use of the word "information"
  Statistical use of the word "information"
  Statistical meaning of the word "pattern"
  Summary
  Exercises
  References
Chapter 4: Databases
  Technical requirements
  What is a database?
  Understanding the difference between a database and a dataset
  Types of databases
  The differentiating elements of databases
  Relational databases (SQL databases)
  Unstructured databases (NoSQL databases)
  A practical example that requires a combination of both structured and unstructured databases
  Distributed databases
  Blockchain
  Connecting to, and pulling data from, databases
  Direct connection
  Web page connection
  API connection
  Request connection
  Publicly shared
  Summary
  Exercises
Part 2: Analytic Goals
Chapter 5: Data Visualization
  Technical requirements
  Summarizing a population
  Example of summarizing numerical attributes
  Example of summarizing categorical attributes
  Comparing populations
  Example of comparing populations using boxplots
  Example of comparing populations using histograms
  Example of comparing populations using bar charts
  Investigating the relationship between two attributes
  Visualizing the relationship between two numerical attributes
  Visualizing the relationship between two categorical attributes
  Visualizing the relationship between a numerical attribute and a categorical attribute
  Adding visual dimensions
  Example of a five-dimensional scatter plot
  Showing and comparing trends
  Example of visualizing and comparing trends
  Summary
  Exercise
Chapter 6: Prediction
  Technical requirements
  Predictive models
  Forecasting
  Regression analysis
  Linear regression
  Example of applying linear regression to perform regression analysis
  MLP
  How does MLP work?
  Example of applying MLP to perform regression analysis
  Summary
  Exercises
Chapter 7: Classification
  Technical requirements
  Classification models
  Example of designing a classification model
  Classification algorithms
  KNN
  Example of using KNN for classification
  Decision Trees
  Example of using Decision Trees for classification
  Summary
  Exercises
Chapter 8: Clustering Analysis
  Technical requirements
  Clustering model
  Clustering example using a two-dimensional dataset
  Clustering example using a three-dimensional dataset
  K-Means algorithm
  Using K-Means to cluster a two-dimensional dataset
  Using K-Means to cluster a dataset with more than two dimensions
  Centroid analysis
  Summary
  Exercises
Part 3: The Preprocessing
Chapter 9: Data Cleaning Level I – Cleaning Up the Table
  Technical requirements
  The levels, tools, and purposes of data cleaning – a roadmap to chapters 9, 10, and 11
  Purpose of data analytics
  Tools for data analytics
  Levels of data cleaning
  Mapping the purposes and tools of analytics to the levels of data cleaning
  Data cleaning level I – cleaning up the table
  Example 1 – unwise data collection
  Example 2 – reindexing (multi-level indexing)
  Example 3 – intuitive but long column titles
  Summary
  Exercises
Chapter 10: Data Cleaning Level II – Unpacking, Restructuring, and Reformulating the Table
  Technical requirements
  Example 1 – unpacking columns and reformulating the table
  Unpacking FileName
  Unpacking Content
  Reformulating a new table for visualization
  The last step – drawing the visualization
  Example 2 – restructuring the table
  Example 3 – level I and II data cleaning
  Level I cleaning
  Level II cleaning
  Doing the analytics – using linear regression to create a predictive model
  Summary
  Exercises
Chapter 11: Data Cleaning Level III – Missing Values, Outliers, and Errors
  Technical requirements
  Missing values
  Detecting missing values
  Example of detecting missing values
  Causes of missing values
  Types of missing values
  Diagnosis of missing values
  Dealing with missing values
  Outliers
  Detecting outliers
  Dealing with outliers
  Errors
  Types of errors
  Dealing with errors
  Detecting systematic errors
  Summary
  Exercises
Chapter 12: Data Fusion and Data Integration
  Technical requirements
  What are data fusion and data integration?
  Data fusion versus data integration
  Directions of data integration
  Frequent challenges regarding data fusion and integration
  Challenge 1 – entity identification
  Challenge 2 – unwise data collection
  Challenge 3 – index mismatched formatting
  Challenge 4 – aggregation mismatch
  Challenge 5 – duplicate data objects
  Challenge 6 – data redundancy
  Example 1 (challenges 3 and 4)
  Example 2 (challenges 2 and 3)
  Example 3 (challenges 1, 3, 5, and 6)
  Checking for duplicate data objects
  Designing the structure for the result of data integration
  Filling songIntegrate_df from billboard_df
  Filling songIntegrate_df from songAttribute_df
  Filling songIntegrate_df from artist_df
  Checking for data redundancy
  The analysis
  Example summary
  Summary
  Exercise
Chapter 13: Data Reduction
  Technical requirements
  The distinction between data reduction and data redundancy
  The objectives of data reduction
  Types of data reduction
  Performing numerosity data reduction
  Random sampling
  Stratified sampling
  Random over/undersampling
  Performing dimensionality data reduction
  Linear regression as a dimension reduction method
  Using a decision tree as a dimension reduction method
  Using random forest as a dimension reduction method
  Brute-force computational dimension reduction
  PCA
  Functional data analysis
  Summary
  Exercises
Chapter 14: Data Transformation and Massaging
  Technical requirements
  The whys of data transformation and massaging
  Data transformation versus data massaging
  Normalization and standardization
  Binary coding, ranking transformation, and discretization
  Example one – binary coding of nominal attribute
  Example two – binary coding or ranking transformation of ordinal attributes
  Example three – discretization of numerical attributes
  Understanding the types of discretization
  Discretization – the number of cut-off points
  A summary – from numbers to categories and back
  Attribute construction
  Example – construct one transformed attribute from two attributes
  Feature extraction
  Example – extract three attributes from one attribute
  Example – Morphological feature extraction
  Feature extraction examples from the previous chapters
  Log transformation
  Implementation – doing it yourself
  Implementation – the working module doing it for you
  Smoothing, aggregation, and binning
  Smoothing
  Aggregation
  Binning
  Summary
  Exercise
Part 4: Case Studies
Chapter 15: Case Study 1 – Mental Health in Tech
  Technical requirements
  Introducing the case study
  The audience of the results of analytics
  Introduction to the source of the data
  Integrating the data sources
  Cleaning the data
  Detecting and dealing with outliers and errors
  Detecting and dealing with missing values
  Analyzing the data
  Analysis question one – is there a significant difference between the mental health of employees across the attribute of gender?
  Analysis question two – is there a significant difference between the mental health of employees across the Age attribute?
  Analysis question three – do more supportive companies have mentally healthier employees?
  Analysis question four – does the attitude of individuals toward mental health influence their mental health and their seeking of treatments?
  Summary
Chapter 16: Case Study 2 – Predicting COVID-19 Hospitalizations
  Technical requirements
  Introducing the case study
  Introducing the source of the data
  Preprocessing the data
  Designing the dataset to support the prediction
  Filling up the placeholder dataset
  Supervised dimension reduction
  Analyzing the data
  Summary
Chapter 17: Case Study 3 – United States Counties Clustering Analysis
  Technical requirements
  Introducing the case study
  Introduction to the source of the data
  Preprocessing the data
  Transforming election_df to partisan_df
  Cleaning edu_df, employ_df, pop_df, and pov_df
  Data integration
  Data cleaning level III – missing values, errors, and outliers
  Checking for data redundancy
  Analyzing the data
  Using PCA to visualize the dataset
  K-Means clustering analysis
  Summary
Chapter 18: Summary, Practice Case Studies, and Conclusions
  A summary of the book
  Part 1 – Technical requirements
  Part 2 – Analytics goals
  Part 3 – The preprocessing
  Part 4 – Case studies
  Practice case studies
  Google Covid-19 mobility dataset
  Police killings in the US
  US accidents
  San Francisco crime
  Data analytics job market
  FIFA 2018 player of the match
  Hot hands in basketball
  Wildfires in California
  Silicon Valley diversity profile
  Recognizing fake job posting
  Hunting more practice case studies
  Conclusions
Index
Other Books You May Enjoy