Download Learning Spark: Lightning-Fast Data Analytics

Book details

Learning Spark: Lightning-Fast Data Analytics

Category: Databases
Edition: 2
Authors: Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee
ISBN: 1492050040, 9781492050049
Publisher: O'Reilly Media
Year: 2020
Pages: 399
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 15 MB

Price (toman): 51,000

Keywords for Learning Spark: Lightning-Fast Data Analytics: Machine Learning, Unsupervised Learning, Supervised Learning, Java, MySQL, PostgreSQL, Apache Hive, Apache Spark, Reliability, Stream Processing, Pipelines, Apache Kafka, Apache Hadoop, Scala, Spark DataFrames, ApacheMenning, ParqueyMorning, Optimization, Linear Regression, CSV, Microsoft SQL Server, Spark SQL, Spark MLlib, Resilient Distributed Datasets, Performance Tuning, Data Lake, Distributed Processing, Cosmos DB, Apache Hudi, Apache Iceberg, Delta Lake, Lakehouses

About Learning Spark: Lightning-Fast Data Analytics

Data is getting bigger, arriving faster, and coming in varied formats, and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads efficiently? Enter Apache Spark.

Updated to emphasize new features in Spark 2.x, this second edition shows data engineers and scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine-learning algorithms. Through discourse, code snippets, and notebooks, you’ll be able to:

• Learn Python, SQL, Scala, or Java high-level APIs: DataFrames and Datasets
• Peek under the hood of the Spark SQL engine to understand Spark transformations and performance
• Inspect, tune, and debug your Spark operations with Spark configurations and Spark UI
• Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
• Perform analytics on batch and streaming data using Structured Streaming
• Build reliable data pipelines with open source Delta Lake and Spark
• Develop machine learning pipelines with MLlib and productionize models using MLflow
• Use open source Pandas framework Koalas and Spark for data transformation and feature engineering

Table of Contents

Copyright
Table of Contents
Foreword
Who This Book Is For
How the Book Is Organized
Conventions Used in This Book
O’Reilly Online Learning
Acknowledgments
Big Data and Distributed Computing at Google
Hadoop at Yahoo!
Spark’s Early Years at AMPLab
Speed
Extensibility
Apache Spark Components as a Unified Stack
Apache Spark’s Distributed Execution
Who Uses Spark, and for What?
Community Adoption and Expansion
Step 1: Downloading Apache Spark
Spark’s Directories and Files
Step 2: Using the Scala or PySpark Shell
Using the Local Machine
Step 3: Understanding Spark Application Concepts
Spark Application and SparkSession
Spark Jobs
Transformations, Actions, and Lazy Evaluation
Narrow and Wide Transformations
The Spark UI
Your First Standalone Application
Counting M&Ms for the Cookie Monster
Building Standalone Applications in Scala
Summary
Spark: What’s Underneath an RDD?
Structuring Spark
Key Merits and Benefits
The DataFrame API
Spark’s Basic Data Types
Spark’s Structured and Complex Data Types
Schemas and Creating DataFrames
Columns and Expressions
Rows
Common DataFrame Operations
End-to-End DataFrame Example
Typed Objects, Untyped Objects, and Generic Rows
Creating Datasets
Dataset Operations
DataFrames Versus Datasets
When to Use RDDs
Spark SQL and the Underlying Engine
The Catalyst Optimizer
Summary
Chapter 4. Spark SQL and DataFrames: Introduction to Built-in Data Sources
Using Spark SQL in Spark Applications
Basic Query Examples
Managed Versus Unmanaged Tables
Creating SQL Databases and Tables
Creating Views
Reading Tables into DataFrames
DataFrameReader
DataFrameWriter
Parquet
JSON
CSV
Avro
ORC
Images
Binary Files
Summary
Spark SQL and Apache Hive
User-Defined Functions
Using the Spark SQL Shell
Working with Beeline
Working with Tableau
JDBC and SQL Databases
PostgreSQL
MySQL
Azure Cosmos DB
MS SQL Server
Other External Sources
Option 2: User-Defined Function
Built-in Functions for Complex Data Types
Higher-Order Functions
Common DataFrames and Spark SQL Operations
Unions
Joins
Windowing
Modifications
Summary
Single API for Java and Scala
Scala Case Classes and JavaBeans for Datasets
Creating Sample Data
Transforming Sample Data
Memory Management for Datasets and DataFrames
Spark’s Internal Format Versus Java Object Format
Serialization and Deserialization (SerDe)
Strategies to Mitigate Costs
Summary
Viewing and Setting Apache Spark Configurations
Scaling Spark for Large Workloads
DataFrame.cache()
DataFrame.persist()
A Family of Spark Joins
Broadcast Hash Join
Shuffle Sort Merge Join
Journey Through the Spark UI Tabs
Summary
Evolution of the Apache Spark Stream Processing Engine
The Advent of Micro-Batch Stream Processing
Lessons Learned from Spark Streaming (DStreams)
The Philosophy of Structured Streaming
The Programming Model of Structured Streaming
Five Steps to Define a Streaming Query
Under the Hood of an Active Streaming Query
Recovering from Failures with Exactly-Once Guarantees
Monitoring an Active Query
Files
Apache Kafka
Custom Streaming Sources and Sinks
Incremental Execution and Streaming State
Stateful Transformations
Aggregations Not Based on Time
Aggregations with Event-Time Windows
Stream–Static Joins
Stream–Stream Joins
Arbitrary Stateful Computations
Modeling Arbitrary Stateful Operations with mapGroupsWithState()
Using Timeouts to Manage Inactive Groups
Generalization with flatMapGroupsWithState()
Performance Tuning
Summary
The Importance of an Optimal Storage Solution
A Brief Introduction to Databases
Limitations of Databases
A Brief Introduction to Data Lakes
Reading from and Writing to Data Lakes using Apache Spark
Limitations of Data Lakes
Lakehouses: The Next Step in the Evolution of Storage Solutions
Apache Iceberg
Delta Lake
Configuring Apache Spark with Delta Lake
Loading Data into a Delta Lake Table
Loading Data Streams into a Delta Lake Table
Enforcing Schema on Write to Prevent Data Corruption
Transforming Existing Data
Auditing Data Changes with Operation History
Querying Previous Snapshots of a Table with Time Travel
Summary
Chapter 10. Machine Learning with MLlib
Supervised Learning
Unsupervised Learning
Designing Machine Learning Pipelines
Data Ingestion and Exploration
Creating Training and Test Data Sets
Preparing Features with Transformers
Understanding Linear Regression
Using Estimators to Build Models
Creating a Pipeline
Evaluating Models
Saving and Loading Models
Tree-Based Models
k-Fold Cross-Validation
Optimizing Pipelines
Summary
Model Management
MLflow
Model Deployment Options with MLlib
Batch
Streaming
Model Export Patterns for Real-Time Inference
Pandas UDFs
Spark for Distributed Hyperparameter Tuning
Summary
Dynamic Partition Pruning
Adaptive Query Execution
SQL Join Hints
Catalog Plugin API and DataSourceV2
Accelerator-Aware Scheduler
Structured Streaming
Redesigned Pandas UDFs with Python Type Hints
Iterator Support in Pandas UDFs
New Pandas Function APIs
Changes to the DataFrame and Dataset APIs
DataFrame and SQL Explain Commands
Summary
Index
Colophon