

Download the book: Databricks Certified Associate Developer for Apache Spark Using Python: The ultimate guide to getting certified


Book details

Title: Databricks Certified Associate Developer for Apache Spark Using Python: The ultimate guide to getting certified

Edition: 1st
Authors:
Series:
ISBN: 9781804619780
Publisher: Packt Publishing
Year of publication: 2024
Number of pages: 274
Language: English
File format: PDF (can be converted to PDF, EPUB, or AZW3 on request)
File size: 7 MB

Book price (toman): 60,000

If the author is Iranian, the book cannot be downloaded and the payment will be refunded.




If you would like the book Databricks Certified Associate Developer for Apache Spark Using Python: The ultimate guide to getting certified converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert the file for you.

Please note that the book Databricks Certified Associate Developer for Apache Spark Using Python: The ultimate guide to getting certified is in its original language and is not a Persian translation. The International Library website offers books in their original language only and does not provide any books translated into or written in Persian.


Description of the book (in the original language)



Table of contents

Building Modern Data Applications Using Databricks Lakehouse
Contributors
About the author
About the reviewer
Preface
   Who this book is for
   What this book covers
   To get the most out of this book
   Download the example code files
   Conventions used
   Get in touch
   Share Your Thoughts
   Download a free PDF copy of this book
Part 1: Near-Real-Time Data Pipelines for the Lakehouse
1
An Introduction to Delta Live Tables
   Technical requirements
   The emergence of the lakehouse
      The Lambda architectural pattern
      Introducing the medallion architecture
      The Databricks lakehouse
   The maintenance predicament of a streaming application
   What is the DLT framework?
   How is DLT related to Delta Lake?
   Introducing DLT concepts
      Streaming tables
      Materialized views
      Views
      Pipeline
      Pipeline triggers
      Workflow
      Types of Databricks compute
      Databricks Runtime
      Unity Catalog
   A quick Delta Lake primer
      The architecture of a Delta table
      The contents of a transaction commit
      Supporting concurrent table reads and writes
      Tombstoned data files
      Calculating Delta table state
      Time travel
      Tracking table changes using change data feed
   A hands-on example – creating your first Delta Live Tables pipeline
   Summary
2
Applying Data Transformations Using Delta Live Tables
   Technical requirements
   Ingesting data from input sources
      Ingesting data using Databricks Auto Loader
      Scalability challenge in structured streaming
      Using Auto Loader with DLT
   Applying changes to downstream tables
      APPLY CHANGES command
      The DLT reconciliation process
   Publishing datasets to Unity Catalog
      Why store datasets in Unity Catalog?
      Creating a new catalog
      Assigning catalog permissions
   Data pipeline settings
      The DLT product edition
      Pipeline execution mode
      Databricks runtime
      Pipeline cluster types
      A serverless compute versus a traditional compute
      Loading external dependencies
      Data pipeline processing modes
   Hands-on exercise – applying SCD Type 2 changes
   Summary
3
Managing Data Quality Using Delta Live Tables
   Technical requirements
   Defining data constraints in Delta Lake
   Using temporary datasets to validate data processing
   An introduction to expectations
      Expectation composition
      Hands-on exercise – writing your first data quality expectation
      Acting on failed expectations
      Hands-on example – failing a pipeline run due to poor data quality
      Applying multiple data quality expectations
   Decoupling expectations from a DLT pipeline
   Hands-on exercise – quarantining bad data for correction
   Summary
4
Scaling DLT Pipelines
   Technical requirements
   Scaling compute to handle demand
   Hands-on example – setting autoscaling properties using the Databricks REST API
   Automated table maintenance tasks
      Why auto compaction is important
      Vacuuming obsolete table files
      Moving compute closer to the data
   Optimizing table layouts for faster table updates
      Rewriting table files during updates
      Data skipping using table partitioning
      Delta Lake Z-ordering on MERGE columns
      Improving write performance using deletion vectors
   Serverless DLT pipelines
   Introducing Enzyme, a performance optimization layer
   Summary
Part 2: Securing the Lakehouse Using the Unity Catalog
5
Mastering Data Governance in the Lakehouse with Unity Catalog
   Technical requirements
   Understanding data governance in a lakehouse
      Introducing the Databricks Unity Catalog
      A problem worth solving
      An overview of the Unity Catalog architecture
      Unity Catalog-enabled cluster types
      Unity Catalog object model
   Enabling Unity Catalog on an existing Databricks workspace
   Identity federation in Unity Catalog
   Data discovery and cataloging
      Tracking dataset relationships using lineage
      Observability with system tables
      Tracing the lineage of other assets
      Fine-grained data access
   Hands-on example – data masking healthcare datasets
   Summary
6
Managing Data Locations in Unity Catalog
   Technical requirements
   Creating and managing data catalogs in Unity Catalog
      Managed data versus external data
      Saving data to storage volumes in Unity Catalog
   Setting default locations for data within Unity Catalog
   Isolating catalogs to specific workspaces
   Creating and managing external storage locations in Unity Catalog
      Storing cloud service authentication using storage credentials
      Querying external systems using Lakehouse Federation
   Hands-on lab – extracting document text for a generative AI pipeline
      Generating mock documents
      Defining helper functions
      Choosing a file format randomly
      Creating/assembling the DLT pipeline
   Summary
7
Viewing Data Lineage Using Unity Catalog
   Technical requirements
   Introducing data lineage in Unity Catalog
   Tracing data origins using the Data Lineage REST API
   Visualizing upstream and downstream transformations
   Identifying dependencies and impacts
   Hands-on lab – documenting data lineage across an organization
   Summary
Part 3: Continuous Integration, Continuous Deployment, and Continuous Monitoring
8
Deploying, Maintaining, and Administrating DLT Pipelines Using Terraform
   Technical requirements
   Introducing the Databricks provider for Terraform
   Setting up a local Terraform environment
      Importing the Databricks Terraform provider
      Configuring workspace authentication
      Defining a DLT pipeline source notebook
      Applying workspace changes
   Configuring DLT pipelines using Terraform
      name
      notification
      channel
      development
      continuous
      edition
      photon
      configuration
      library
      cluster
      catalog
      target
      storage
   Automating DLT pipeline deployment
   Hands-on exercise – deploying a DLT pipeline using VS Code
      Setting up VS Code
      Creating a new Terraform project
      Defining the Terraform resources
      Deploying the Terraform project
   Summary
9
Leveraging Databricks Asset Bundles to Streamline Data Pipeline Deployment
   Technical requirements
   Introduction to Databricks Asset Bundles
      Elements of a DAB configuration file
      Specifying a deployment mode
   Databricks Asset Bundles in action
      User-to-machine authentication
      Machine-to-machine authentication
      Initializing an asset bundle using templates
   Hands-on exercise – deploying your first DAB
   Hands-on exercise – simplifying cross-team collaboration with GitHub Actions
      Setting up the environment
      Configuring the GitHub Action
      Testing the workflow
   Versioning and maintenance
   Summary
10
Monitoring Data Pipelines in Production
   Technical requirements
   Introduction to data pipeline monitoring
      Exploring ways to monitor data pipelines
      Using DBSQL Alerts to notify data validity
   Pipeline health and performance monitoring
   Hands-on exercise – querying data quality events for a dataset
   Data quality monitoring
      Introducing Lakehouse Monitoring
      Hands-on exercise – creating a lakehouse monitor
   Best practices for production failure resolution
      Handling pipeline update failures
      Recovering from table transaction failure
   Hands-on exercise – setting up a webhook alert when a job runs longer than expected
   Summary
Index
   Why subscribe?
Other Books You May Enjoy
   Packt is searching for authors like you
   Share Your Thoughts
   Download a free PDF copy of this book



