Authors: Anirudh Kala, Anshul Bhatnagar, Sarthak Sarbahi
ISBN: 1801819076, 9781801819077
Publisher: Packt Publishing
Year of publication: 2021
Pages: 230
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 on request)
File size: 9 MB
If you would like the file of Optimizing Databricks Workloads: Harness the power of Apache Spark in Azure and maximize the performance of modern big data workloads converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can ask support to convert it.
Note that Optimizing Databricks Workloads: Harness the power of Apache Spark in Azure and maximize the performance of modern big data workloads is the original English-language edition, not a Persian translation. The International Library website provides original-language books only and does not offer books translated into or written in Persian.
Cover
Title Page
Copyright and credit page
Contributors
About the reviewer
Table of Contents
Preface
Section 1: Introduction to Azure Databricks
  Chapter 1: Discovering Databricks
    Technical requirements
    Introducing Spark fundamentals
    Introducing Databricks
    Creating an Azure Databricks workspace
    Core Databricks concepts
    Creating a Spark cluster
    Databricks notebooks
    Databricks File System (DBFS)
    Databricks jobs
    Databricks Community
    Learning about Delta Lake
    Big data file formats
    Understanding the transactional log
    Delta Lake in action
    Summary
  Chapter 2: Batch and Real-Time Processing in Databricks
    Technical requirements
    Differentiating batch versus real-time processing
    Mounting Azure Data Lake in Databricks
    Creating an Azure Data Lake instance
    Accessing Azure Data Lake in Databricks
    Working with batch processing
    Reading data
    Checking row count
    Selecting columns
    Filtering data
    Dropping columns
    Adding or replacing columns
    Printing schema
    Renaming a column
    Dropping duplicate rows
    Limiting output rows
    Sorting rows
    Grouping data
    Visualizing data
    Writing data to a sink
    Batch ETL process demo
    Learning Structured Streaming in Azure Databricks
    Structured Streaming concepts
    Managing streams
    Sorting data
    Productionizing Structured Streaming
    Summary
  Chapter 3: Learning about Machine Learning and Graph Processing in Databricks
    Technical requirements
    Learning about ML components in Databricks
    Practicing ML in Databricks
    Environment setup
    EDA
    ML
    Learning about MLflow
    Learning about graph analysis in Databricks
    Summary
Section 2: Optimization Techniques
  Chapter 4: Managing Spark Clusters
    Technical requirements
    Designing Spark clusters
    Understanding cluster types
    Learning about spot instances
    Learning about autoscaling in Spark clusters
    Introducing Databricks Pools
    Learning about Databricks runtime versions (DBRs)
    Learning about automatic termination
    Learning about cluster sizing
    Learning about Databricks managed resource groups
    Learning about Databricks Pools
    Creating a pool
    Attaching a cluster to the Pool
    Following the best practices for Azure Databricks Pools
    Using spot instances
    Following the Spark UI
    Understanding the Jobs section
    Understanding the Stages section
    Understanding the Storage section
    Understanding the Environment section
    Understanding the Executors section
    Understanding the SQL section
    Understanding the JDBC/ODBC Server section
    Understanding the Structured Streaming section
    Summary
  Chapter 5: Big Data Analytics
    Technical requirements
    Understanding the collect() method
    Understanding the use of inferSchema
    Experiment 1
    Experiment 2
    Learning to differentiate CSV and Parquet
    Learning to differentiate Pandas and Koalas
    Understanding built-in Spark functions
    Learning column predicate pushdown
    Learning partitioning strategies in Spark
    Understanding Spark partitions
    Understanding Hive partitions
    Understanding Spark SQL optimizations
    Understanding bucketing in Spark
    Summary
  Chapter 6: Databricks Delta Lake
    Technical requirements
    Working with the OPTIMIZE and ZORDER commands
    Using Auto Optimize
    Understanding optimized writes
    Understanding Auto Compaction
    Learning about delta caching
    Learning about dynamic partition pruning
    Understanding bloom filter indexing
    Summary
  Chapter 7: Spark Core
    Technical requirements
    Learning about broadcast joins
    Learning about Apache Arrow in Pandas
    Understanding shuffle partitions
    Understanding caching in Spark
    Learning about AQE
    Dynamically coalescing shuffle partitions
    Dynamically switching join strategies
    Dynamically optimizing skew joins
    Summary
Section 3: Real-World Scenarios
  Chapter 8: Case Studies
    Learning case studies from the manufacturing industry
    Case study 1 – leading automobile manufacturing company
    Case study 2 – international automobile manufacturing giant
    Case study 3 – graph search in a chemical corporate firm
    Case study 4 – real-time loyalty engine for a leading medical equipment manufacturer
    Learning case studies from the media and entertainment industry
    Case study 5 – HD Insights to Databricks migration for a media giant
    Learning case studies from the retail and FMCG industry
    Case study 6 – real-time analytics using IoT Hub for a retail giant
    Learning case studies from the pharmaceutical industry
    Case study 7 – pricing analytics for a pharmaceutical company
    Learning case studies from the e-commerce industry
    Case study 8 – migrating interactive analytical apps from Redshift to Postgres
    Learning case studies from the logistics and supply chain industry
    Case study 9 – accelerating intelligent insights with tailored big data analytics
    Summary
Other Books You May Enjoy
Index