Authors: Prashanth Babu, Tristen Wentling, Scott Haines, Denny Lee
ISBN: 1098151941, 9781098151942
Publisher: O'Reilly Media
Year of publication: 2024
Number of pages: 383
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 at the user's request)
File size: 7 MB
If you need the file for Delta Lake: The Definitive Guide: Modern Data Lakehouse Architectures with Data Lakes converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.
Note that Delta Lake: The Definitive Guide: Modern Data Lakehouse Architectures with Data Lakes is the original-language edition, not a Persian translation. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
Table of Contents

Copyright
Table of Contents
Foreword by Michael Armbrust
Foreword by Dominique Brezinski

Preface
Who This Book Is For; How This Book Is Organized; Conventions Used in This Book; Using Code Examples; O'Reilly Online Learning; How to Contact Us; Acknowledgments; Denny; Tristen; Scott; Prashanth

Chapter 1. Introduction to the Delta Lake Lakehouse Format
The Genesis of Delta Lake; Data Warehousing, Data Lakes, and Data Lakehouses; Project Tahoe to Delta Lake: The Early Years Months; What Is Delta Lake?; Common Use Cases; Key Features; Anatomy of a Delta Lake Table; Delta Transaction Protocol; Understanding the Delta Lake Transaction Log at the File Level; The Single Source of Truth; The Relationship Between Metadata and Data; Multiversion Concurrency Control (MVCC) File and Data Observations; Observing the Interaction Between the Metadata and Data; Table Features; Delta Kernel; Delta UniForm; Conclusion

Chapter 2. Installing Delta Lake
Delta Lake Docker Image; Delta Lake for Python; PySpark Shell; JupyterLab Notebook; Scala Shell; Delta Rust API; ROAPI; Native Delta Lake Libraries; Multiple Bindings Available; Installing the Delta Lake Python Package; Apache Spark with Delta Lake; Setting Up Delta Lake with Apache Spark; Prerequisite: Set Up Java; Setting Up an Interactive Shell; PySpark Declarative API; Databricks Community Edition; Create a Cluster with Databricks Runtime; Importing Notebooks; Attaching Notebooks; Conclusion

Chapter 3. Essential Delta Lake Operations
Create; Creating a Delta Lake Table; Loading Data into a Delta Lake Table; The Transaction Log; Read; Querying Data from a Delta Lake Table; Reading with Time Travel; Update; Delete; Deleting Data from a Delta Lake Table; Overwriting Data in a Delta Lake Table; Merge; Other Useful Actions; Parquet Conversions; Delta Lake Metadata and History; Conclusion

Chapter 4. Diving into the Delta Lake Ecosystem
Connectors; Apache Flink; Flink DataStream Connector; Installing the Connector; DeltaSource API; DeltaSink API; End-to-End Example; Kafka Delta Ingest; Install Rust; Build the Project; Run the Ingestion Flow; Trino; Getting Started; Configuring and Using the Trino Connector; Using Show Catalogs; Creating a Schema; Show Schemas; Working with Tables; Table Operations; Conclusion

Chapter 5. Maintaining Your Delta Lake
Using Delta Lake Table Properties; Delta Lake Table Properties Reference; Create an Empty Table with Properties; Populate the Table; Evolve the Table Schema; Add or Modify Table Properties; Remove Table Properties; Delta Lake Table Optimization; The Problem with Big Tables and Small Files; Using OPTIMIZE to Fix the Small File Problem; Table Tuning and Management; Partitioning Your Tables; Defining Partitions on Table Creation; Migrating from a Nonpartitioned to a Partitioned Table; Repairing, Restoring, and Replacing Table Data; Recovering and Replacing Tables; Deleting Data and Removing Partitions; The Life Cycle of a Delta Lake Table; Restoring Your Table; Cleaning Up; Conclusion

Chapter 6. Building Native Applications with Delta Lake
Getting Started; Python; Rust; Building a Lambda; What's Next

Chapter 7. Streaming In and Out of Your Delta Lake
Streaming and Delta Lake; Streaming Versus Batch Processing; Delta as Source; Delta as Sink; Delta Streaming Options; Limit the Input Rate; Ignore Updates or Deletes; Initial Processing Position; Initial Snapshot with withEventTimeOrder; Advanced Usage with Apache Spark; Idempotent Stream Writes; Delta Lake Performance Metrics; Auto Loader and Delta Live Tables; Auto Loader; Delta Live Tables; Change Data Feed; Using Change Data Feed; Schema; Conclusion

Chapter 8. Advanced Features
Generated Columns, Keys, and IDs; Comments and Constraints; Comments; Delta Table Constraints; Deletion Vectors; Merge-on-Read; Stepping Through Deletion Vectors; Conclusion

Chapter 9. Architecting Your Lakehouse
The Lakehouse Architecture; What Is a Lakehouse?; Learning from Data Warehouses; Learning from Data Lakes; The Dual-Tier Data Architecture; Lakehouse Architecture Foundations with Delta Lake; Open Source on Open Standards in an Open Ecosystem; Transaction Support; Schema Enforcement and Governance; The Medallion Architecture; Exploring the Bronze Layer; Exploring the Silver Layer; Exploring the Gold Layer; Streaming Medallion Architecture; Conclusion

Chapter 10. Performance Tuning: Optimizing Your Data Pipelines with Delta Lake
Performance Objectives; Maximizing Read Performance; Maximizing Write Performance; Performance Considerations; Partitioning; Table Utilities; Table Statistics; Cluster By; Bloom Filter Index; Conclusion

Chapter 11. Successful Design Patterns
Slashing Compute Costs; High-Speed Solutions; Smart Device Integration; Efficient Streaming Ingestion; Streaming Ingestion; The Inception of Delta Rust; The Evolution of Ingestion; Coordinating Complex Systems; Combining Operational Data Stores at DoorDash; Change Data Capture; Delta and Flink in Harmony; Conclusion

Chapter 12. Foundations of Lakehouse Governance and Security
Lakehouse Governance; The Emergence of Data Governance; Data Products and Their Relationship to Data Assets; Data Products in the Lakehouse; Maintaining High Trust; Data Assets and Access; The Data Asset Model; Unifying Governance Between Data Warehouses and Lakes; Permissions Management; Filesystem Permissions; Cloud Object Store Access Controls; Identity and Access Management; Data Security; Fine-Grained Access Controls for the Lakehouse; Conclusion

Chapter 13. Metadata Management, Data Flow, and Lineage
Metadata Management; What Is Metadata Management?; Data Catalogs; Data Reliability, Stewards, and Permissions Management; Why the Metastore Matters; Unity Catalog; Data Flow and Lineage; Data Lineage; Data Sharing; Automating Data Life Cycles; Audit Logging; Monitoring and Alerting; What Is Data Discovery?; Conclusion

Chapter 14. Data Sharing with the Delta Sharing Protocol
The Basics of Delta Sharing; Data Providers; Data Recipients; Delta Sharing Server; Using the REST APIs; Anatomy of the REST URI; List Shares; Get Share; List Schemas in Share; List All Tables in Share; Delta Sharing Clients; Delta Sharing with Apache Spark; Stream Processing with Delta Shares; Delta Sharing Community Connectors; Conclusion

Index
About the Authors
Colophon