Download the book Building Modern Data Applications Using Databricks Lakehouse

Book details

Edition: 1
Authors:
Series:
ISBN: 9781801073233
Publisher: Packt Publishing
Year of publication: 2024
Number of pages: 246
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 at the user's request)
File size: 9 MB

Price (toman): 62,000



If you need the file of Building Modern Data Applications Using Databricks Lakehouse converted to PDF, EPUB, AZW3, MOBI, or DJVU, let support know and they will convert it for you.

Note that Building Modern Data Applications Using Databricks Lakehouse is the original-language edition, not a Persian translation. The International Library website offers only original-language books and does not offer any books translated into or written in Persian.


Table of contents

Cover
Title Page
Copyright and Credits
Dedication
Contributors
Table of Contents
Preface
Part 1: Near-Real-Time Data Pipelines for the Lakehouse
Chapter 1: An Introduction to Delta Live Tables
	Technical requirements
	The emergence of the lakehouse
		The Lambda architectural pattern
		Introducing the Medallion architecture
		The Databricks lakehouse
	The maintenance predicament of a streaming application
	What is the DLT framework?
	How is DLT related to Delta Lake?
	Introducing DLT concepts
		Streaming tables
		Materialized views
		Views
		Pipeline
		Pipeline triggers
		Workflow
		Types of Databricks compute
		Databricks Runtime
		Unity Catalog
	A quick Delta Lake primer
		The architecture of a Delta table
		The contents of a transaction commit
		Supporting concurrent table reads and writes
		Tombstoned data files
		Calculating Delta table state
		Time travel
		Tracking table changes using change data feed
	A hands-on example – creating your first Delta Live Tables pipeline
	Summary
Chapter 2: Applying Data Transformations Using Delta Live Tables
	Technical requirements
	Ingesting data from input sources
		Ingesting data using Databricks Auto Loader
		Scalability challenge in structured streaming
		Using Auto Loader with DLT
	Applying changes to downstream tables
		APPLY CHANGES command
		The DLT reconciliation process
	Publishing datasets to Unity Catalog
		Why store datasets in Unity Catalog?
		Creating a new catalog
		Assigning catalog permissions
	Data pipeline settings
		The DLT product edition
		Pipeline execution mode
		Databricks runtime
		Pipeline cluster types
		A serverless compute versus a traditional compute
		Loading external dependencies
		Data pipeline processing modes
	Hands-on exercise – applying SCD Type 2 changes
	Summary
Chapter 3: Managing Data Quality Using Delta Live Tables
	Technical requirements
	Defining data constraints in Delta Lake
	Using temporary datasets to validate data processing
	An introduction to expectations
		Expectation composition
		Hands-on exercise – writing your first data quality expectation
		Acting on failed expectations
		Hands-on example – failing a pipeline run due to poor data quality
		Applying multiple data quality expectations
	Decoupling expectations from a DLT pipeline
	Hands-on exercise – quarantining bad data for correction
	Summary
Chapter 4: Scaling DLT Pipelines
	Technical requirements
	Scaling compute to handle demand
	Hands-on example – setting autoscaling properties using the Databricks REST API
	Automated table maintenance tasks
		Why auto compaction is important
		Vacuuming obsolete table files
		Moving compute closer to the data
	Optimizing table layouts for faster table updates
		Rewriting table files during updates
		Data skipping using table partitioning
		Delta Lake Z-ordering on MERGE columns
		Improving write performance using deletion vectors
	Serverless DLT pipelines
	Introducing Enzyme, a performance optimization layer
	Summary
Part 2: Securing the Lakehouse Using the Unity Catalog
Chapter 5: Mastering Data Governance in the Lakehouse with Unity Catalog
	Technical requirements
	Understanding data governance in a lakehouse
		Introducing the Databricks Unity Catalog
		A problem worth solving
		An overview of the Unity Catalog architecture
		Unity Catalog-enabled cluster types
		Unity Catalog object model
	Enabling Unity Catalog on an existing Databricks workspace
	Identity federation in Unity Catalog
	Data discovery and cataloging
		Tracking dataset relationships using lineage
		Observability with system tables
		Tracing the lineage of other assets
		Fine-grained data access
	Hands-on example – data masking healthcare datasets
	Summary
Chapter 6: Managing Data Locations in Unity Catalog
	Technical requirements
	Creating and managing data catalogs in Unity Catalog
		Managed data versus external data
		Saving data to storage volumes in Unity Catalog
	Setting default locations for data within Unity Catalog
	Isolating catalogs to specific workspaces
	Creating and managing external storage locations in Unity Catalog
		Storing cloud service authentication using storage credentials
		Querying external systems using Lakehouse Federation
	Hands-on lab – extracting document text for a generative AI pipeline
		Generating mock documents
		Defining helper functions
		Choosing a file format randomly
		Creating/assembling the DLT pipeline
	Summary
Chapter 7: Viewing Data Lineage Using Unity Catalog
	Technical requirements
	Introducing data lineage in Unity Catalog
	Tracing data origins using the Data Lineage REST API
	Visualizing upstream and downstream transformations
	Identifying dependencies and impacts
	Hands-on lab – documenting data lineage across an organization
	Summary
Part 3: Continuous Integration, Continuous Deployment, and Continuous Monitoring
Chapter 8: Deploying, Maintaining, and Administrating DLT Pipelines Using Terraform
	Technical requirements
	Introducing the Databricks provider for Terraform
	Setting up a local Terraform environment
		Importing the Databricks Terraform provider
		Configuring workspace authentication
		Defining a DLT pipeline source notebook
		Applying workspace changes
	Configuring DLT pipelines using Terraform
		name
		notification
		channel
		development
		continuous
		edition
		photon
		configuration
		library
		cluster
		catalog
		target
		storage
	Automating DLT pipeline deployment
	Hands-on exercise – deploying a DLT pipeline using VS Code
		Setting up VS Code
		Creating a new Terraform project
		Defining the Terraform resources
		Deploying the Terraform project
	Summary
Chapter 9: Leveraging Databricks Asset Bundles to Streamline Data Pipeline Deployment
	Technical requirements
	Introduction to Databricks Asset Bundles
		Elements of a DAB configuration file
		Specifying a deployment mode
	Databricks Asset Bundles in action
		User-to-machine authentication
		Machine-to-machine authentication
		Initializing an asset bundle using templates
	Hands-on exercise – deploying your first DAB
	Hands-on exercise – simplifying cross-team collaboration with GitHub Actions
		Setting up the environment
		Configuring the GitHub Action
		Testing the workflow
	Versioning and maintenance
	Summary
Chapter 10: Monitoring Data Pipelines in Production
	Technical requirements
	Introduction to data pipeline monitoring
		Exploring ways to monitor data pipelines
		Using DBSQL Alerts to notify data validity
	Pipeline health and performance monitoring
	Hands-on exercise – querying data quality events for a dataset
	Data quality monitoring
		Introducing Lakehouse Monitoring
		Hands-on exercise – creating a lakehouse monitor
	Best practices for production failure resolution
		Handling pipeline update failures
		Recovering from table transaction failure
	Hands-on exercise – setting up a webhook alert when a job runs longer than expected
	Summary
Index
About Packt
Other Books You May Enjoy



