Author: James Densmore
ISBN: 1492087831, 9781492087830
Publisher: O'Reilly Media, Inc, USA
Year of publication: 2021
Number of pages: 277
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 7 MB
If you need the file for Data Pipelines Pocket Reference: Moving and Processing Data for Analytics converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.
Note that Data Pipelines Pocket Reference: Moving and Processing Data for Analytics is the original-language edition and has not been translated into Persian. The International Library website offers original-language books only and does not offer any books translated into or written in Persian.
Copyright Table of Contents Preface Who This Book Is For Conventions Used in This Book Using Code Examples O’Reilly Online Learning How to Contact Us Acknowledgments
Chapter 1. Introduction to Data Pipelines What Are Data Pipelines? Who Builds Data Pipelines? SQL and Data Warehousing Fundamentals Python and/or Java Distributed Computing Basic System Administration A Goal-Oriented Mentality Why Build Data Pipelines? How Are Pipelines Built?
Chapter 2. A Modern Data Infrastructure Diversity of Data Sources Source System Ownership Ingestion Interface and Data Structure Data Volume Data Cleanliness and Validity Latency and Bandwidth of the Source System Cloud Data Warehouses and Data Lakes Data Ingestion Tools Data Transformation and Modeling Tools Workflow Orchestration Platforms Directed Acyclic Graphs Customizing Your Data Infrastructure
Chapter 3. Common Data Pipeline Patterns ETL and ELT The Emergence of ELT over ETL EtLT Subpattern ELT for Data Analysis ELT for Data Science ELT for Data Products and Machine Learning Steps in a Machine Learning Pipeline Incorporate Feedback in the Pipeline Further Reading on ML Pipelines
Chapter 4. Data Ingestion: Extracting Data Setting Up Your Python Environment Setting Up Cloud File Storage Extracting Data from a MySQL Database Full or Incremental MySQL Table Extraction Binary Log Replication of MySQL Data Extracting Data from a PostgreSQL Database Full or Incremental Postgres Table Extraction Replicating Data Using the Write-Ahead Log Extracting Data from MongoDB Extracting Data from a REST API Streaming Data Ingestions with Kafka and Debezium
Chapter 5. Data Ingestion: Loading Data Configuring an Amazon Redshift Warehouse as a Destination Loading Data into a Redshift Warehouse Incremental Versus Full Loads Loading Data Extracted from a CDC Log Configuring a Snowflake Warehouse as a Destination Loading Data into a Snowflake Data Warehouse Using Your File Storage as a Data Lake Open Source Frameworks Commercial Alternatives
Chapter 6. Transforming Data Noncontextual Transformations Deduplicating Records in a Table Parsing URLs When to Transform? During or After Ingestion? Data Modeling Foundations Key Data Modeling Terms Modeling Fully Refreshed Data Slowly Changing Dimensions for Fully Refreshed Data Modeling Incrementally Ingested Data Modeling Append-Only Data Modeling Change Capture Data
Chapter 7. Orchestrating Pipelines Apache Airflow Setup and Overview Installing and Configuring Airflow Database Web Server and UI Scheduler Executors Operators Building Airflow DAGs A Simple DAG An ELT Pipeline DAG Additional Pipeline Tasks Alerts and Notifications Data Validation Checks Advanced Orchestration Configurations Coupled Versus Uncoupled Pipeline Tasks When to Split Up DAGs Coordinating Multiple DAGs with Sensors Managed Airflow Options Other Orchestration Frameworks
Chapter 8. Data Validation in Pipelines Validate Early, Validate Often Source System Data Quality Data Ingestion Risks Enabling Data Analyst Validation A Simple Validation Framework Validator Framework Code Structure of a Validation Test Running a Validation Test Usage in an Airflow DAG When to Halt a Pipeline, When to Warn and Continue Extending the Framework Validation Test Examples Duplicate Records After Ingestion Unexpected Change in Row Count After Ingestion Metric Value Fluctuations Commercial and Open Source Data Validation Frameworks
Chapter 9. Best Practices for Maintaining Pipelines Handling Changes in Source Systems Introduce Abstraction Maintain Data Contracts Limits of Schema-on-Read Scaling Complexity Standardizing Data Ingestion Reuse of Data Model Logic Ensuring Dependency Integrity
Chapter 10. Measuring and Monitoring Pipeline Performance Key Pipeline Metrics Prepping the Data Warehouse A Data Infrastructure Schema Logging and Ingesting Performance Data Ingesting DAG Run History from Airflow Adding Logging to the Data Validator Transforming Performance Data DAG Success Rate DAG Runtime Change Over Time Validation Test Volume and Success Rate Orchestrating a Performance Pipeline The Performance DAG Performance Transparency
Index