Apache Iceberg: The Definitive Guide: Data Lakehouse Functionality, Performance, and Scalability on the Data Lake
Edition: 1
Authors: Tomer Shiran, Jason Hughes, Alex Merced
ISBN: 1098148622, 9781098148621
Publisher: O’Reilly Media
Year of publication: 2024
Pages: 344
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 5 MB
Cover
Copyright
Table of Contents
Foreword by Gerrit Kazmaier
Foreword by Raghu Ramakrishnan
Foreword by Rick Sears
Preface
  About This Book
  Why We Wrote This Book
  What You Will Find Inside
  How to Use This Book
  Feedback and Questions
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments

Part I. Fundamentals of Apache Iceberg

Chapter 1. Introduction to Apache Iceberg
  How Did We Get Here? A Brief History
    Foundational Components of a System Designed for OLAP Workloads
    Bringing It All Together
  The Data Warehouse
    A Brief History
    Pros and Cons of a Data Warehouse
  The Data Lake
    A Brief History
    Pros and Cons of a Data Lake
  Should I Run Analytics on a Data Lake or a Data Warehouse?
  The Data Lakehouse
  What Is a Table Format?
    Hive: The Original Table Format
    Modern Data Lake Table Formats
  What Is Apache Iceberg?
    How Apache Iceberg Came to Be
    The Apache Iceberg Architecture
    Key Features of Apache Iceberg
  Conclusion

Chapter 2. The Architecture of Apache Iceberg
  The Data Layer
    Datafiles
    Delete Files
  The Metadata Layer
    Manifest Files
    Manifest Lists
    Metadata Files
    Puffin Files
  The Catalog
  Conclusion

Chapter 3. Lifecycle of Write and Read Queries
  Writing Queries in Apache Iceberg
    Create the Table
    Insert the Query
    Merge Query
  Reading Queries in Apache Iceberg
    The SELECT Query
    The Time-Travel Query
  Conclusion

Chapter 4. Optimizing the Performance of Iceberg Tables
  Compaction
    Hands-on with Compaction
    Compaction Strategies
    Automating Compaction
  Sorting
  Z-order
  Partitioning
    Hidden Partitioning
    Partition Evolution
    Other Partitioning Considerations
  Copy-on-Write Versus Merge-on-Read
    Copy-on-Write
    Merge-on-Read
    Configuring COW and MOR
  Other Considerations
    Metrics Collection
    Rewriting Manifests
    Optimizing Storage
    Write Distribution Mode
    Object Storage Considerations
    Datafile Bloom Filters
  Conclusion

Chapter 5. Iceberg Catalogs
  Requirements of an Iceberg Catalog
  Catalog Comparison
    The Hadoop Catalog
    The Hive Catalog
    The AWS Glue Catalog
    The Nessie Catalog
    The REST Catalog
    The JDBC Catalog
    Other Catalogs
  Catalog Migration
    Using the Apache Iceberg Catalog Migration CLI
    Using an Engine
  Conclusion

Part II. Hands-on with Apache Iceberg

Chapter 6. Apache Spark
  Configuration
    Configuring Apache Iceberg and Spark
    Configuring the Catalogs
    Starting Spark with All the Configurations (AWS Glue Example)
  Data Definition Language Operations
    CREATE TABLE
    ALTER TABLE
    Alter a Table with Iceberg’s Spark SQL Extensions
    DROP TABLE
  Reading Data
    The Select All Query
    The Filter Rows Query
    Aggregation Queries
    Using Window Functions
  Writing Data
    INSERT INTO
    MERGE INTO
    INSERT OVERWRITE
    DELETE FROM
    UPDATE
  Iceberg Table Maintenance Procedures
    Expire Snapshots
    Rewrite Datafiles
    Rewrite Manifests
    Remove Orphan Files
  Conclusion

Chapter 7. Dremio’s SQL Query Engine
  Configuration
  Data Definition Language Operations
    CREATE TABLE
    ALTER TABLE
    DROP TABLE
  Reading Data
    Using the SELECT Query
    Filtering Rows
    Using Aggregated Queries
    Using Window Functions
  Writing Data
    INSERT INTO
    COPY INTO
    MERGE INTO
    DELETE
    UPDATE
  Iceberg Table Maintenance
    Expire Snapshots
    Rewrite Datafiles
    Rewrite Manifests
  Conclusion

Chapter 8. AWS Glue
  Configuration
  Creating a Glue Database
  Configuring the Glue ETL Job
  Create a Table Using the Glue Data Catalog
  Read the Table
  Insert the Data
  Conclusion

Chapter 9. Apache Flink
  Configuration
    Prerequisites
    Start the Flink Cluster and Flink SQL Client
  Data Definition Language Operations
    CREATE CATALOG
    CREATE DATABASE
    CREATE TABLE
    ALTER TABLE
    DROP TABLE
  Reading Data
    Flink SQL Batch Read
    Flink SQL Streaming Read
    Metadata Table
  Writing Data
    INSERT INTO
    INSERT OVERWRITE
    UPSERT
  Flink DataFrame and Table API with Apache Iceberg Tables
    Prerequisites
    Configuring the Flink Job
    Starting the Cluster and Building the Package
    Running the Job
  Conclusion

Part III. Apache Iceberg in Practice

Chapter 10. Apache Iceberg in Production
  Apache Iceberg Metadata Tables
    The history Metadata Table
    The metadata_log_entries Metadata Table
    The snapshots Metadata Table
    The files Metadata Table
    The manifests Metadata Table
    The partitions Metadata Table
    The all_data_files Metadata Table
    The all_manifests Metadata Table
    The refs Metadata Table
    The entries Metadata Table
    Using the Metadata Tables in Conjunction
  Isolation of Changes with Branches
    Table Branching and Tagging
    Catalog Branching and Tagging
  Multitable Transactions
  Rolling Back Changes
    Rolling Back at the Table Level
    Rolling Back at the Catalog Level
  Conclusion

Chapter 11. Streaming with Apache Iceberg
  Streaming with Spark
    Streaming into Iceberg with Spark
    Streaming from Iceberg with Spark
  Streaming with Flink
    Streaming into Iceberg with Flink
    Example of Streaming into Iceberg with Flink
  Streaming with Kafka Connect
    The Iceberg Kafka Sink
  Streaming with AWS
  Conclusion

Chapter 12. Governance and Security
  Securing Datafiles
    Securing Files: Best Practices
    Hadoop Distributed File System
    Amazon Simple Storage Service
    Azure Data Lake Storage
    Google Cloud Storage
  Securing and Governing at the Semantic Layer
    Semantic Layer Best Practices
    Dremio
    Trino
  Securing and Governing at the Catalog Level
    Nessie
    Tabular
    AWS Glue and Lake Formation
  Additional Security and Governance Considerations
  Conclusion

Chapter 13. Migrating to Apache Iceberg
  Migration Considerations
    Three-Step In-Place Migration Plan
    Four-Phase Shadow Migration Plan
  Migrating Hive Tables to Apache Iceberg
    The Snapshot Procedure
    The Migrate Procedure
  Migrating Delta Lake to Apache Iceberg
  Migrating Apache Hudi to Apache Iceberg
  Migrating Individual Files to Apache Iceberg
    Using the add_files Procedure
  Migrating from Delta Lake or Apache Hudi Without Preserving History
  Migrating from Anywhere by Rewriting Data
    Migrating Data to a New Iceberg Table
    Migrating Data into an Existing Iceberg Table
  Conclusion

Chapter 14. Real-World Use Cases of Apache Iceberg
  Ensuring High-Quality Data with Write-Audit-Publish in Apache Iceberg
    WAP Using Iceberg’s Branching Feature
  Running BI Workloads on the Data Lake
    Land the Raw Data into the Data Lake
    Curate Virtual Data Marts/Data Products
    Create a Reflection to Accelerate Our Dashboard
    Connect Our View to Our BI Tool
    Benefits of Running BI Workloads on the Data Lake
  Implementing Change Data Capture with Apache Iceberg
    Create Apache Iceberg Tables
    Apply Updates from Operational Systems
    Create the Change Log View to Capture Changes
    Merge Changed Data in the Aggregated Table
  Conclusion

Index
About the Authors
Colophon