Edition: 1
Authors: Will Girten
Series:
ISBN: 9781801073233
Publisher: Packt Publishing
Publication year: 2024
Number of pages: 246
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 9 MB
If you need Building Modern Data Applications Using Databricks Lakehouse in PDF, EPUB, AZW3, MOBI, or DJVU format, notify support and they will convert the file for you.
Note that Building Modern Data Applications Using Databricks Lakehouse is the original-language edition and has not been translated into Persian. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
Cover
Title Page
Copyright and Credits
Dedication
Contributors
Table of Contents
Preface

Part 1: Near-Real-Time Data Pipelines for the Lakehouse

Chapter 1: An Introduction to Delta Live Tables
  Technical requirements
  The emergence of the lakehouse
  The Lambda architectural pattern
  Introducing the Medallion architecture
  The Databricks lakehouse
  The maintenance predicament of a streaming application
  What is the DLT framework?
  How is DLT related to Delta Lake?
  Introducing DLT concepts
  Streaming tables
  Materialized views
  Views
  Pipeline
  Pipeline triggers
  Workflow
  Types of Databricks compute
  Databricks Runtime
  Unity Catalog
  A quick Delta Lake primer
  The architecture of a Delta table
  The contents of a transaction commit
  Supporting concurrent table reads and writes
  Tombstoned data files
  Calculating Delta table state
  Time travel
  Tracking table changes using change data feed
  A hands-on example – creating your first Delta Live Tables pipeline
  Summary

Chapter 2: Applying Data Transformations Using Delta Live Tables
  Technical requirements
  Ingesting data from input sources
  Ingesting data using Databricks Auto Loader
  Scalability challenge in structured streaming
  Using Auto Loader with DLT
  Applying changes to downstream tables
  APPLY CHANGES command
  The DLT reconciliation process
  Publishing datasets to Unity Catalog
  Why store datasets in Unity Catalog?
  Creating a new catalog
  Assigning catalog permissions
  Data pipeline settings
  The DLT product edition
  Pipeline execution mode
  Databricks runtime
  Pipeline cluster types
  A serverless compute versus a traditional compute
  Loading external dependencies
  Data pipeline processing modes
  Hands-on exercise – applying SCD Type 2 changes
  Summary

Chapter 3: Managing Data Quality Using Delta Live Tables
  Technical requirements
  Defining data constraints in Delta Lake
  Using temporary datasets to validate data processing
  An introduction to expectations
  Expectation composition
  Hands-on exercise – writing your first data quality expectation
  Acting on failed expectations
  Hands-on example – failing a pipeline run due to poor data quality
  Applying multiple data quality expectations
  Decoupling expectations from a DLT pipeline
  Hands-on exercise – quarantining bad data for correction
  Summary

Chapter 4: Scaling DLT Pipelines
  Technical requirements
  Scaling compute to handle demand
  Hands-on example – setting autoscaling properties using the Databricks REST API
  Automated table maintenance tasks
  Why auto compaction is important
  Vacuuming obsolete table files
  Moving compute closer to the data
  Optimizing table layouts for faster table updates
  Rewriting table files during updates
  Data skipping using table partitioning
  Delta Lake Z-ordering on MERGE columns
  Improving write performance using deletion vectors
  Serverless DLT pipelines
  Introducing Enzyme, a performance optimization layer
  Summary

Part 2: Securing the Lakehouse Using the Unity Catalog

Chapter 5: Mastering Data Governance in the Lakehouse with Unity Catalog
  Technical requirements
  Understanding data governance in a lakehouse
  Introducing the Databricks Unity Catalog
  A problem worth solving
  An overview of the Unity Catalog architecture
  Unity Catalog-enabled cluster types
  Unity Catalog object model
  Enabling Unity Catalog on an existing Databricks workspace
  Identity federation in Unity Catalog
  Data discovery and cataloging
  Tracking dataset relationships using lineage
  Observability with system tables
  Tracing the lineage of other assets
  Fine-grained data access
  Hands-on example – data masking healthcare datasets
  Summary

Chapter 6: Managing Data Locations in Unity Catalog
  Technical requirements
  Creating and managing data catalogs in Unity Catalog
  Managed data versus external data
  Saving data to storage volumes in Unity Catalog
  Setting default locations for data within Unity Catalog
  Isolating catalogs to specific workspaces
  Creating and managing external storage locations in Unity Catalog
  Storing cloud service authentication using storage credentials
  Querying external systems using Lakehouse Federation
  Hands-on lab – extracting document text for a generative AI pipeline
  Generating mock documents
  Defining helper functions
  Choosing a file format randomly
  Creating/assembling the DLT pipeline
  Summary

Chapter 7: Viewing Data Lineage Using Unity Catalog
  Technical requirements
  Introducing data lineage in Unity Catalog
  Tracing data origins using the Data Lineage REST API
  Visualizing upstream and downstream transformations
  Identifying dependencies and impacts
  Hands-on lab – documenting data lineage across an organization
  Summary

Part 3: Continuous Integration, Continuous Deployment, and Continuous Monitoring

Chapter 8: Deploying, Maintaining, and Administrating DLT Pipelines Using Terraform
  Technical requirements
  Introducing the Databricks provider for Terraform
  Setting up a local Terraform environment
  Importing the Databricks Terraform provider
  Configuring workspace authentication
  Defining a DLT pipeline source notebook
  Applying workspace changes
  Configuring DLT pipelines using Terraform
    name, notification, channel, development, continuous, edition, photon, configuration, library, cluster, catalog, target, storage
  Automating DLT pipeline deployment
  Hands-on exercise – deploying a DLT pipeline using VS Code
  Setting up VS Code
  Creating a new Terraform project
  Defining the Terraform resources
  Deploying the Terraform project
  Summary

Chapter 9: Leveraging Databricks Asset Bundles to Streamline Data Pipeline Deployment
  Technical requirements
  Introduction to Databricks Asset Bundles
  Elements of a DAB configuration file
  Specifying a deployment mode
  Databricks Asset Bundles in action
  User-to-machine authentication
  Machine-to-machine authentication
  Initializing an asset bundle using templates
  Hands-on exercise – deploying your first DAB
  Hands-on exercise – simplifying cross-team collaboration with GitHub Actions
  Setting up the environment
  Configuring the GitHub Action
  Testing the workflow
  Versioning and maintenance
  Summary

Chapter 10: Monitoring Data Pipelines in Production
  Technical requirements
  Introduction to data pipeline monitoring
  Exploring ways to monitor data pipelines
  Using DBSQL Alerts to notify data validity
  Pipeline health and performance monitoring
  Hands-on exercise – querying data quality events for a dataset
  Data quality monitoring
  Introducing Lakehouse Monitoring
  Hands-on exercise – creating a lakehouse monitor
  Best practices for production failure resolution
  Handling pipeline update failures
  Recovering from table transaction failure
  Hands-on exercise – setting up a webhook alert when a job runs longer than expected
  Summary

Index
About Packt
Other Books You May Enjoy