Edition: 2
Author: Hien Luu
Series:
ISBN: 1484273826, 9781484273821
Publisher: Apress
Publication year: 2021
Number of pages: 445
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 8 MB
If you would like the file of Beginning Apache Spark 3: With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library converted to PDF, EPUB, AZW3, MOBI, or DJVU, contact support and they will convert the file for you.
Please note that Beginning Apache Spark 3: With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide books translated into or written in Persian.
Take a journey toward discovering, learning, and using Apache Spark 3.0. In this book, you will gain expertise on the powerful and efficient distributed data processing engine inside of Apache Spark; its user-friendly, comprehensive, and flexible programming model for processing data in batch and streaming; and the scalable machine learning algorithms and practical utilities to build machine learning applications.
Beginning Apache Spark 3 begins by explaining different ways of interacting with Apache Spark, such as Spark Concepts and Architecture, and Spark Unified Stack. Next, it offers an overview of Spark SQL before moving on to its advanced features. It covers tips and techniques for dealing with performance issues, followed by an overview of the structured streaming processing engine. It concludes with a demonstration of how to develop machine learning applications using Spark MLlib and how to manage the machine learning development lifecycle. This book is packed with practical examples and code snippets to help you master concepts and features immediately after they are covered in each section.
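To give a feel for the DataFrame and Spark SQL programming model described above, here is a minimal sketch in Scala, assuming a local Spark 3.x installation; the orders dataset, its column names, and the query are invented for illustration and are not taken from the book:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // Local SparkSession; a real deployment would point at a cluster instead.
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A tiny in-memory DataFrame standing in for a real data source.
    val orders = Seq(
      ("alice", "books", 22.50),
      ("bob",   "books", 10.00),
      ("alice", "music",  7.99)
    ).toDF("customer", "category", "amount")

    // DataFrame API: filter, group, aggregate.
    orders.filter($"amount" > 5)
      .groupBy($"category")
      .agg(count(lit(1)).as("orders"), sum($"amount").as("revenue"))
      .show()

    // The same query expressed in SQL against a temporary view.
    orders.createOrReplaceTempView("orders")
    spark.sql(
      "SELECT category, COUNT(*) AS orders, SUM(amount) AS revenue " +
      "FROM orders WHERE amount > 5 GROUP BY category"
    ).show()

    spark.stop()
  }
}
```

Both formulations pass through the same Catalyst optimizer, the kind of equivalence the Spark SQL chapters examine in detail.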
After reading this book, you will have the knowledge required to build your own big data pipelines, applications, and machine learning applications.
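The machine learning side of the book centers on Spark MLlib pipelines; the following is a rough, hypothetical sketch of one, assuming Spark 3.x with the MLlib module on the classpath, with toy data and stage parameters made up for illustration rather than drawn from the book's examples:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object MlPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ml-pipeline-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy labeled data; a real application would load it from storage.
    val training = Seq(
      (0L, "spark makes big data simple", 1.0),
      (1L, "completely unrelated text", 0.0),
      (2L, "structured streaming in spark", 1.0),
      (3L, "another unrelated sentence", 0.0)
    ).toDF("id", "text", "label")

    // Chain feature transformers and an estimator into a single Pipeline.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Fit the pipeline, then apply the resulting model to new text.
    val model = pipeline.fit(training)
    val test = Seq((4L, "spark mllib pipelines")).toDF("id", "text")
    model.transform(test).select("id", "text", "prediction").show()

    spark.stop()
  }
}
```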
What You Will Learn
Who This Book Is For
Data scientists, data engineers and software developers.
Table of Contents

About the Author
About the Technical Reviewers
Acknowledgments
Introduction

Chapter 1: Introduction to Apache Spark
  Overview; History; Spark Core Concepts and Architecture (Spark Cluster and Resource Management System; Spark Applications; Spark Drivers and Executors); Spark Unified Stack (Spark Core; Spark SQL; Spark Structured Streaming; Spark MLlib; Spark GraphX; SparkR); Apache Spark 3.0 (Adaptive Query Execution Framework; Dynamic Partition Pruning (DPP); Accelerator-aware Scheduler); Apache Spark Applications (Spark Example Applications); Apache Spark Ecosystem (Delta Lake; Koalas; MLflow); Summary

Chapter 2: Working with Apache Spark
  Downloading and Installation (Downloading Spark; Installing Spark); Spark Scala Shell; Spark Python Shell; Having Fun with the Spark Scala Shell (Useful Spark Scala Shell Command and Tips); Basic Interactions with Scala and Spark (Basic Interactions with Scala; Spark UI and Basic Interactions with Spark; Spark UI; Basic Interactions with Spark); Introduction to Collaborative Notebooks (Create a Cluster; Create a Folder; Create a Notebook); Setting up Spark Source Code; Summary

Chapter 3: Spark SQL: Foundation
  Understanding RDD; Introduction to the DataFrame API; Creating a DataFrame (from RDD; from a Range of Numbers; from Data Sources; by Reading Text Files; by Reading CSV Files; by Reading JSON Files; by Reading Parquet Files; by Reading ORC Files; from JDBC); Working with Structured Operations; Working with Columns; Working with Structured Transformations (select(columns); selectExpr(expressions); filter(condition), where(condition); distinct, dropDuplicates; sort(columns), orderBy(columns); limit(n); union(otherDataFrame); withColumn(colName, column); withColumnRenamed(existingColName, newColName); drop(columnName1, columnName2); sample(fraction), sample(fraction, seed), sample(fraction, seed, withReplacement); randomSplit(weights)); Working with Missing or Bad Data; Working with Structured Actions (describe(columnNames)); Introduction to Datasets (Creating Datasets; Working with Datasets); Using SQL in Spark SQL (Running SQL in Spark); Writing Data Out to Storage Systems; The Trio: DataFrame, Dataset, and SQL; DataFrame Persistence; Summary

Chapter 4: Spark SQL: Advanced
  Aggregations (Aggregation Functions; Common Aggregation Functions: count(col), countDistinct(col), min(col), max(col), sum(col), sumDistinct(col), avg(col), skewness(col), kurtosis(col), variance(col), stddev(col)); Aggregation with Grouping (Multiple Aggregations per Group; Collection Group Values); Aggregation with Pivoting; Joins (Join Expression and Join Types; Working with Joins: Inner Joins, Left Outer Joins, Right Outer Joins, Outer Joins (a.k.a. Full Outer Joins), Left Anti-Joins, Left Semi-Joins, Cross (a.k.a. Cartesian)); Dealing with Duplicate Column Names (Use Original DataFrame; Renaming Column Before Joining; Using Joined Column Name); Overview of Join Implementation (Shuffle Hash Join; Broadcast Hash Join); Functions (Working with Built-in Functions; Working with Date Time Functions; Working with String Functions; Working with Math Functions; Working with Collection Functions; Working with Miscellaneous Functions; Working with User-Defined Functions (UDFs)); Advanced Analytics Functions (Aggregation with Rollups and Cubes: Rollups, Cubes; Aggregation with Time Windows; Window Functions); Exploring Catalyst Optimizer (Logical Plan; Physical Plan; Catalyst in Action); Project Tungsten; Summary

Chapter 5: Optimizing Spark Applications
  Common Performance Issues; Spark Configurations (Different Ways of Setting Properties; Different Kinds of Properties; Viewing Spark Properties); Spark Memory Management (Spark Driver; Spark Executor); Leverage In-Memory Computation (When to Persist and Cache Data; Persistence and Caching APIs; Persistence and Caching Example); Understanding Spark Joins (Broadcast Hash Join; Shuffle Sort Merge Join); Adaptive Query Execution (Dynamically Coalescing Shuffle Partitions; Dynamically Switching Join Strategies; Dynamically Optimizing Skew Joins); Summary

Chapter 6: Spark Streaming
  Stream Processing Concepts (Data Delivery Semantics; Notion of Time; Windowing); Stream Processing Engine Landscape; Spark Streaming Overview (Spark DStream); Spark Structured Streaming Overview; Core Concepts (Data Sources; Output Modes; Trigger Types; Data Sinks; Watermarking); Structured Streaming Applications; Streaming DataFrame Operations (Selection, Project, Aggregation Operations; Join Operations); Working with Data Sources (Socket Data Source; Rate Data Source; File Data Source; Kafka Data Source; Custom Data Source); Working with Data Sinks (File Data Sink; Kafka Data Sink; foreach Data Sink; Console Data Sink; Memory Data Sink); Output Modes; Triggers; Summary

Chapter 7: Advanced Spark Streaming
  Event Time (Fixed Window Aggregation over an Event Time; Sliding Window Aggregation over Event Time; Aggregation State; Watermarking: Limit State and Handle Late Data); Arbitrary Stateful Processing (Arbitrary Stateful Processing with Structured Streaming; Handling State Timeouts; Arbitrary State Processing in Action; Extracting Patterns with mapGroupsWithState; User Sessionization with flatMapGroupsWithState); Handling Duplicate Data; Fault Tolerance (Streaming Application Code Change; Spark Runtime Change); Streaming Query Metrics and Monitoring (Streaming Query Metrics; Monitoring Streaming Queries via Callback; Monitoring Streaming Queries via Visualization UI; Streaming Query Summary Information; Streaming Query Detailed Statistics Information; Troubleshooting Streaming Query); Summary

Chapter 8: Machine Learning with Spark
  Machine Learning Overview (Machine Learning Terminologies; Machine Learning Types: Supervised Learning, Unsupervised Learning, Reinforcement Learning; Machine Learning Development Process); Spark Machine Learning Library; Machine Learning Pipelines (Transformers; Estimators; Pipeline; Pipeline Persistence: Saving and Loading); Model Tuning (Speeding Up Model Tuning; Model Evaluators); Machine Learning Tasks in Action (Classification: Model Hyperparameters, Example; Regression: Model Hyperparameters, Example; Recommendation: Model Hyperparameters, Example); Deep Learning Pipeline; Summary

Chapter 9: Managing the Machine Learning Life Cycle
  The Rise of MLOps; MLOps Overview; MLflow Overview (MLflow Components); MLflow in Action (MLflow Tracking; MLflow Projects; MLflow Models; MLflow Model Registry); Model Deployment and Prediction; Summary

Index