Edition:
Author: Sreeram Nudurupati
Series:
ISBN: 1800568878, 9781800568877
Publisher: Packt Publishing
Publication year: 2021
Number of pages: 322
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 at the user's request)
File size: 5 MB
If you need Essential PySpark for Scalable Data Analytics: A beginner's guide to harnessing the power and ease of PySpark 3 in PDF, EPUB, AZW3, MOBI, or DJVU format, notify support and they will convert the file for you.
Note that Essential PySpark for Scalable Data Analytics: A beginner's guide to harnessing the power and ease of PySpark 3 is the original English-language edition, not a Persian translation. The International Library website offers books only in their original language and does not provide any books translated into or written in Persian.
Table of contents:
Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
Section 1: Data Engineering
Chapter 1: Distributed Computing Primer
  Technical requirements
  Distributed Computing
  Introduction to Distributed Computing
  Data Parallel Processing
  Data Parallel Processing using the MapReduce paradigm
  Distributed Computing with Apache Spark
  Introduction to Apache Spark
  Data Parallel Processing with RDDs
  Higher-order functions
  Apache Spark cluster architecture
  Getting started with Spark
  Big data processing with Spark SQL and DataFrames
  Transforming data with Spark DataFrames
  Using SQL on Spark
  What's new in Apache Spark 3.0?
  Summary
Chapter 2: Data Ingestion
  Technical requirements
  Introduction to Enterprise Decision Support Systems
  Ingesting data from data sources
  Ingesting from relational data sources
  Ingesting from file-based data sources
  Ingesting from message queues
  Ingesting data into data sinks
  Ingesting into data warehouses
  Ingesting into data lakes
  Ingesting into NoSQL and in-memory data stores
  Using file formats for data storage in data lakes
  Unstructured data storage formats
  Semi-structured data storage formats
  Structured data storage formats
  Building data ingestion pipelines in batch and real time
  Data ingestion using batch processing
  Data ingestion in real time using structured streaming
  Unifying batch and real time using Lambda Architecture
  Lambda Architecture
  The Batch layer
  The Speed layer
  The Serving layer
  Summary
Chapter 3: Data Cleansing and Integration
  Technical requirements
  Transforming raw data into enriched meaningful data
  Extracting, transforming, and loading data
  Extracting, loading, and transforming data
  Advantages of choosing ELT over ETL
  Building analytical data stores using cloud data lakes
  Challenges with cloud data lakes
  Overcoming data lake challenges with Delta Lake
  Consolidating data using data integration
  Data consolidation via ETL and data warehousing
  Integrating data using data virtualization techniques
  Data integration through data federation
  Making raw data analytics-ready using data cleansing
  Data selection to eliminate redundancies
  De-duplicating data
  Standardizing data
  Optimizing ELT processing performance with data partitioning
  Summary
Chapter 4: Real-Time Data Analytics
  Technical requirements
  Real-time analytics systems architecture
  Streaming data sources
  Streaming data sinks
  Stream processing engines
  Real-time data consumers
  Real-time analytics industry use cases
  Real-time predictive analytics in manufacturing
  Connected vehicles in the automotive sector
  Financial fraud detection
  IT security threat detection
  Simplifying the Lambda Architecture using Delta Lake
  Change Data Capture
  Handling late-arriving data
  Stateful stream processing using windowing and watermarking
  Multi-hop pipelines
  Summary
Section 2: Data Science
Chapter 5: Scalable Machine Learning with PySpark
  Technical requirements
  ML overview
  Types of ML algorithms
  Business use cases of ML
  Scaling out machine learning
  Techniques for scaling ML
  Introduction to Apache Spark's ML library
  Data wrangling with Apache Spark and MLlib
  Data preprocessing
  Data cleansing
  Data manipulation
  Summary
Chapter 6: Feature Engineering – Extraction, Transformation, and Selection
  Technical requirements
  The machine learning process
  Feature extraction
  Feature transformation
  Transforming categorical variables
  Transforming continuous variables
  Transforming the date and time variables
  Assembling individual features into a feature vector
  Feature scaling
  Feature selection
  Feature store as a central feature repository
  Batch inferencing using the offline feature store
  Delta Lake as an offline feature store
  Structure and metadata with Delta tables
  Schema enforcement and evolution with Delta Lake
  Support for simultaneous batch and streaming workloads
  Delta Lake time travel
  Integration with machine learning operations tools
  Online feature store for real-time inferencing
  Summary
Chapter 7: Supervised Machine Learning
  Technical requirements
  Introduction to supervised machine learning
  Parametric machine learning
  Non-parametric machine learning
  Regression
  Linear regression
  Regression using decision trees
  Classification
  Logistic regression
  Classification using decision trees
  Naïve Bayes
  Support vector machines
  Tree ensembles
  Regression using random forests
  Classification using random forests
  Regression using gradient boosted trees
  Classification using GBTs
  Real-world supervised learning applications
  Regression applications
  Classification applications
  Summary
Chapter 8: Unsupervised Machine Learning
  Technical requirements
  Introduction to unsupervised machine learning
  Clustering using machine learning
  K-means clustering
  Hierarchical clustering using bisecting K-means
  Topic modeling using latent Dirichlet allocation
  Gaussian mixture model
  Building association rules using machine learning
  Collaborative filtering using alternating least squares
  Real-world applications of unsupervised learning
  Clustering applications
  Association rules and collaborative filtering applications
  Summary
Chapter 9: Machine Learning Life Cycle Management
  Technical requirements
  Introduction to the ML life cycle
  Introduction to MLflow
  Tracking experiments with MLflow
  ML model tuning
  Tracking model versions using MLflow Model Registry
  Model serving and inferencing
  Offline model inferencing
  Online model inferencing
  Continuous delivery for ML
  Summary
Chapter 10: Scaling Out Single-Node Machine Learning Using PySpark
  Technical requirements
  Scaling out EDA
  EDA using pandas
  EDA using PySpark
  Scaling out model inferencing
  Model training using embarrassingly parallel computing
  Distributed hyperparameter tuning
  Scaling out arbitrary Python code using pandas UDF
  Upgrading pandas to PySpark using Koalas
  Summary
Section 3: Data Analysis
Chapter 11: Data Visualization with PySpark
  Technical requirements
  Importance of data visualization
  Types of data visualization tools
  Techniques for visualizing data using PySpark
  PySpark native data visualizations
  Using Python data visualizations with PySpark
  Considerations for PySpark to pandas conversion
  Introduction to pandas
  Converting from PySpark into pandas
  Summary
Chapter 12: Spark SQL Primer
  Technical requirements
  Introduction to SQL
  DDL
  DML
  Joins and sub-queries
  Row-based versus columnar storage
  Introduction to Spark SQL
  Catalyst optimizer
  Spark SQL data sources
  Spark SQL language reference
  Spark SQL DDL
  Spark DML
  Optimizing Spark SQL performance
  Summary
Chapter 13: Integrating External Tools with Spark SQL
  Technical requirements
  Apache Spark as a distributed SQL engine
  Introduction to Hive Thrift JDBC/ODBC Server
  Spark connectivity to SQL analysis tools
  Spark connectivity to BI tools
  Connecting Python applications to Spark SQL using Pyodbc
  Summary
Chapter 14: The Data Lakehouse
  Moving from BI to AI
  Challenges with data warehouses
  Challenges with data lakes
  The data lakehouse paradigm
  Key requirements of a data lakehouse
  Data lakehouse architecture
  Examples of existing lakehouse architectures
  Apache Spark-based data lakehouse architecture
  Advantages of data lakehouses
  Summary
About PACKT
Other Books You May Enjoy
Index