Download the book Essential PySpark for Scalable Data Analytics: A beginner's guide to harnessing the power and ease of PySpark 3

Book details

Essential PySpark for Scalable Data Analytics: A beginner's guide to harnessing the power and ease of PySpark 3

ISBN: 1800568878, 9781800568877
Publisher: Packt Publishing
Year of publication: 2021
Number of pages: 322
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 on request)
File size: 5 MB

Price (Toman): 66,000





If you need the file of Essential PySpark for Scalable Data Analytics: A beginner's guide to harnessing the power and ease of PySpark 3 converted to PDF, EPUB, AZW3, MOBI, or DJVU format, you can notify support and they will convert it.

Note that Essential PySpark for Scalable Data Analytics: A beginner's guide to harnessing the power and ease of PySpark 3 is the original-language edition, not a Persian translation. The International Library website offers only original-language books and does not provide any books translated into or written in Persian.





Table of Contents

Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
Section 1: Data Engineering
Chapter 1: Distributed Computing Primer
	Technical requirements
	Distributed Computing
		Introduction to Distributed Computing
		Data Parallel Processing
		Data Parallel Processing using the MapReduce paradigm
	Distributed Computing with Apache Spark
		Introduction to Apache Spark
		Data Parallel Processing with RDDs
		Higher-order functions
		Apache Spark cluster architecture
		Getting started with Spark
	Big data processing with Spark SQL and DataFrames
		Transforming data with Spark DataFrames
		Using SQL on Spark
		What's new in Apache Spark 3.0?
	Summary
Chapter 2: Data Ingestion
	Technical requirements
	Introduction to Enterprise Decision Support Systems
	Ingesting data from data sources
		Ingesting from relational data sources
		Ingesting from file-based data sources
		Ingesting from message queues
	Ingesting data into data sinks
		Ingesting into data warehouses
		Ingesting into data lakes
		Ingesting into NoSQL and in-memory data stores
	Using file formats for data storage in data lakes
		Unstructured data storage formats
		Semi-structured data storage formats
		Structured data storage formats
	Building data ingestion pipelines in batch and real time
		Data ingestion using batch processing
		Data ingestion in real time using structured streaming
	Unifying batch and real time using Lambda Architecture
		Lambda Architecture
		The Batch layer
		The Speed layer
		The Serving layer
	Summary
Chapter 3: Data Cleansing and Integration
	Technical requirements
	Transforming raw data into enriched meaningful data
		Extracting, transforming, and loading data
		Extracting, loading, and transforming data
		Advantages of choosing ELT over ETL
	Building analytical data stores using cloud data lakes
		Challenges with cloud data lakes
		Overcoming data lake challenges with Delta Lake
	Consolidating data using data integration
		Data consolidation via ETL and data warehousing
		Integrating data using data virtualization techniques
		Data integration through data federation
	Making raw data analytics-ready using data cleansing
		Data selection to eliminate redundancies
		De-duplicating data
		Standardizing data
		Optimizing ELT processing performance with data partitioning
	Summary
Chapter 4: Real-Time Data Analytics
	Technical requirements
	Real-time analytics systems architecture
		Streaming data sources
		Streaming data sinks
		Stream processing engines
		Real-time data consumers
	Real-time analytics industry use cases
		Real-time predictive analytics in manufacturing
		Connected vehicles in the automotive sector
		Financial fraud detection
		IT security threat detection
	Simplifying the Lambda Architecture using Delta Lake
	Change Data Capture
	Handling late-arriving data
		Stateful stream processing using windowing and watermarking
	Multi-hop pipelines
	Summary
Section 2: Data Science
Chapter 5: Scalable Machine Learning with PySpark
	Technical requirements
	ML overview
		Types of ML algorithms
		Business use cases of ML
	Scaling out machine learning
		Techniques for scaling ML
		Introduction to Apache Spark's ML library
	Data wrangling with Apache Spark and MLlib
		Data preprocessing
		Data cleansing
		Data manipulation
	Summary
Chapter 6: Feature Engineering – Extraction, Transformation, and Selection
	Technical requirements
	The machine learning process
	Feature extraction
	Feature transformation
		Transforming categorical variables
		Transforming continuous variables
		Transforming the date and time variables
		Assembling individual features into a feature vector
		Feature scaling
	Feature selection
	Feature store as a central feature repository
		Batch inferencing using the offline feature store
	Delta Lake as an offline feature store
		Structure and metadata with Delta tables
		Schema enforcement and evolution with Delta Lake
		Support for simultaneous batch and streaming workloads
		Delta Lake time travel
		Integration with machine learning operations tools
		Online feature store for real-time inferencing
	Summary
Chapter 7: Supervised Machine Learning
	Technical requirements
	Introduction to supervised machine learning
		Parametric machine learning
		Non-parametric machine learning
	Regression
		Linear regression
		Regression using decision trees
	Classification
		Logistic regression
		Classification using decision trees
		Naïve Bayes
		Support vector machines
	Tree ensembles
		Regression using random forests
		Classification using random forests
		Regression using gradient boosted trees
		Classification using GBTs
	Real-world supervised learning applications
		Regression applications
		Classification applications
	Summary
Chapter 8: Unsupervised Machine Learning
	Technical requirements
	Introduction to unsupervised machine learning
	Clustering using machine learning
		K-means clustering
		Hierarchical clustering using bisecting K-means
		Topic modeling using latent Dirichlet allocation
		Gaussian mixture model
	Building association rules using machine learning
		Collaborative filtering using alternating least squares
	Real-world applications of unsupervised learning
		Clustering applications
		Association rules and collaborative filtering applications
	Summary
Chapter 9: Machine Learning Life Cycle Management
	Technical requirements
	Introduction to the ML life cycle
		Introduction to MLflow
	Tracking experiments with MLflow
		ML model tuning
	Tracking model versions using MLflow Model Registry
	Model serving and inferencing
		Offline model inferencing
		Online model inferencing
	Continuous delivery for ML
	Summary
Chapter 10: Scaling Out Single-Node Machine Learning Using PySpark
	Technical requirements
	Scaling out EDA
		EDA using pandas
		EDA using PySpark
	Scaling out model inferencing
	Model training using embarrassingly parallel computing
		Distributed hyperparameter tuning
		Scaling out arbitrary Python code using pandas UDF
	Upgrading pandas to PySpark using Koalas
	Summary
Section 3: Data Analysis
Chapter 11: Data Visualization with PySpark
	Technical requirements
	Importance of data visualization
		Types of data visualization tools
	Techniques for visualizing data using PySpark
		PySpark native data visualizations
		Using Python data visualizations with PySpark
	Considerations for PySpark to pandas conversion
		Introduction to pandas
		Converting from PySpark into pandas
	Summary
Chapter 12: Spark SQL Primer
	Technical requirements
	Introduction to SQL
		DDL
		DML
		Joins and sub-queries
		Row-based versus columnar storage
	Introduction to Spark SQL
		Catalyst optimizer
		Spark SQL data sources
	Spark SQL language reference
		Spark SQL DDL
		Spark DML
	Optimizing Spark SQL performance
	Summary
Chapter 13: Integrating External Tools with Spark SQL
	Technical requirements
	Apache Spark as a distributed SQL engine
		Introduction to Hive Thrift JDBC/ODBC Server
	Spark connectivity to SQL analysis tools
	Spark connectivity to BI tools
	Connecting Python applications to Spark SQL using Pyodbc
	Summary
Chapter 14: The Data Lakehouse
	Moving from BI to AI
		Challenges with data warehouses
		Challenges with data lakes
	The data lakehouse paradigm
		Key requirements of a data lakehouse
		Data lakehouse architecture
		Examples of existing lakehouse architectures
		Apache Spark-based data lakehouse architecture
	Advantages of data lakehouses
	Summary
About PACKT
Other Books You May Enjoy
Index



