Edition:
Authors: Pavan Kumar Narayanan
Series:
ISBN: 9798868806018, 9798868806025
Publisher: Apress
Year of publication: 2024
Number of pages: 651
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 33 MB
If you need the file for Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.
Please note that Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms is the original-language edition and has not been translated into Persian. The International Library website offers original-language books only and does not offer any books translated into or written in Persian.
Table of Contents

About the Author
About the Technical Reviewer
Introduction

Chapter 1: Core Technologies in Data Engineering
Introduction Python Programming F Strings Python Functions Advanced Function Arguments *args **kwargs Lambda Functions Decorators in Python Type Hinting Typing Module Generators in Python Enumerate Functions List Comprehension Random Module random() randint() getrandbits() choice() shuffle() sample() seed() Git Source Code Management Foundations of Git GitHub Setup and Installation Core Concepts of Git Cloning Branching Forking Pull request Gitignore SQL Programming Essential SQL Queries Conditional Data Filtering Joining SQL Tables Self-Join in SQL Common Table Expressions Views in SQL Standard View Materialized View Temporary Tables in SQL Window Functions in SQL SQL Aggregate Functions SQL Rank Functions Query Tuning and Execution Plan Optimizations Conclusion

Chapter 2: Data Wrangling using Pandas
Introduction Data Structures Series Data Frame Indexing Essential Indexing Methods Multi-indexing Time Delta Index Data Extraction and Loading CSV JSON HDF5 Feather Parquet ORC Avro Pickle Chunk Loading Missing Values Background Missing Values in Data Pipelines None NaN NaT NA Handling Missing Values isna() Method notna() Method When to Use Which Method? Data Transformation Data Exploration Combining Multiple Pandas Objects Left Join Left Join Using merge() Right Join Outer Join Inner Join Cross Join Data Reshaping pivot() pivot_table() stack() unstack() melt() crosstab() factorize() compare() groupby() Conclusion

Chapter 3: Data Wrangling using Rust’s Polars
Introduction Introduction to Polars Lazy vs. Eager Evaluation Data Structures in Polars Polars Series Polars Data Frame Polars Lazy Frame Data Extraction and Loading CSV JSON Parquet Data Transformation in Polars Polars Context Selection Context Filter Context Group-By Context Basic Operations String Operations Aggregation and Group-By Combining Multiple Polars Objects Left Join Outer Join Inner Join Semi Join Anti Join Cross Join Advanced Operations Identifying Missing Values Identifying Unique Values Pivot Melt Examples Polars/SQL Interaction Polars CLI Conclusion

Chapter 4: GPU Driven Data Wrangling Using CuDF
Introduction CPU vs. GPU Introduction to CUDA Concepts of GPU Programming Kernels Memory Management Introduction to CuDF CuDF vs. Pandas Setup Testing the Installation File IO Operations CSV Parquet JSON Basic Operations Column Filtering Row Filtering Sorting the Dataset Combining Multiple CuDF Objects Left Join Outer Join Inner Join Left Semi Join Left Anti Join Advanced Operations Group-By Function Transform Function apply() Cross Tabulation Feature Engineering Using cut() Factorize Function Window Functions CuDF Pandas Conclusion

Chapter 5: Getting Started with Data Validation using Pydantic and Pandera
Introduction Introduction to Data Validation Need for Good Data Definition Principles of Data Validation Data Accuracy Data Uniqueness Data Completeness Data Range Data Consistency Data Format Referential Integrity Introduction to Pydantic Type Annotations Refresher Setup and Installation Pydantic Models Nested Models Fields JSON Schemas Constrained Types Validators in Pydantic Introduction to Pandera Setup and Installation DataFrame Schema in Pandera Data Coercion in Pandera Checks in Pandera Statistical Validation in Pandera Lazy Validation Pandera Decorators Conclusion

Chapter 6: Data Validation using Great Expectations
Introduction Introduction to Great Expectations Components of Great Expectations Data Context Data Sources Expectations Checkpoints Setup and Installation Getting Started with Writing Expectations Data Validation Workflow in Great Expectations Creating a Checkpoint Data Documentation Expectation Store Conclusion

Chapter 7: Introduction to Concurrency Programming and Dask
Introduction Introduction to Parallel and Concurrent Processing History Python and the Global Interpreter Lock Concepts of Parallel Processing Identifying CPU Cores Concurrent Processing Introduction to Dask Setup and Installation Features of Dask Tasks and Graphs Lazy Evaluation Partitioning and Chunking Serialization and Pickling Dask-CuDF Dask Architecture Core Library Schedulers Client Workers Task Graphs Dask Data Structures and Concepts Dask Arrays Dask Bags Dask DataFrames Dask Delayed Dask Futures Optimizing Dask Computations Data Locality Prioritizing Work Work Stealing Conclusion

Chapter 8: Engineering Machine Learning Pipelines using DaskML
Introduction Machine Learning Data Pipeline Workflow Data Sourcing Data Exploration Data Cleaning Data Wrangling Data Integration Feature Engineering Feature Selection Data Splitting Model Selection Model Training Model Evaluation Hyperparameter Tuning Final Testing Model Deployment Model Monitoring Model Retraining Dask-ML Integration with Other ML Libraries scikit-learn XGBoost PyTorch Other Libraries Dask-ML Setup and Installation Dask-ML Data Preprocessing RobustScaler() MinMaxScaler() One Hot Encoding Cross Validation Hyperparameter Tuning Using Dask-ML Grid Search Random Search Incremental Search Statistical Imputation with Dask-ML Conclusion

Chapter 9: Engineering Real-time Data Pipelines using Apache Kafka
Introduction Introduction to Distributed Computing Introduction to Kafka Kafka Architecture Events Topics Partitions Broker Replication Producers Consumers Schema Registry With Avro With Protobuf Kafka Connect Kafka Streams and ksqlDB Kafka Admin Client Setup and Development Kafka Application with the Schema Registry Protobuf Serializer Stream Processing Stateful vs. Stateless Processing Kafka Connect Best Practices Conclusion

Chapter 10: Engineering Machine Learning and Data REST APIs using FastAPI
Introduction Introduction to Web Services and APIs OpenWeather API Types of APIs SOAP APIs REST APIs GraphQL APIs Webhooks Typical Process of APIs Endpoints API Development Process REST API HTTP Status Codes FastAPI Setup and Installation Core Concepts Path Parameters and Query Parameters Pydantic Integration Response Model Dependency Injection in FastAPI Database Integration with FastAPI Object Relational Mapping SQLAlchemy Engine Session Query API Alembic Building a REST Data API Middleware in FastAPI ML API Endpoint Using FastAPI Conclusion

Chapter 11: Getting Started with Workflow Management and Orchestration
Introduction Introduction to Workflow Orchestration Workflow ETL and ELT Data Pipeline Workflow Workflow Configuration Workflow Orchestration Introduction to Cron Job Scheduler Concepts Crontab File Cron Logging Cron Job Usage Cron Scheduler Applications Database Backup Data Processing Email Notification Cron Alternatives Conclusion

Chapter 12: Orchestrating Data Engineering Pipelines using Apache Airflow
Introduction Introduction to Apache Airflow Setup and Installation Airflow Architecture Web Server Database Executor Scheduler Configuration Files A Simple Example Airflow DAGs Tasks Operators Sensors Task Flow Xcom Hooks Variables Params Templates Macros Controlling the DAG Workflow Triggers Conclusion

Chapter 13: Orchestrating Data Engineering Pipelines using Prefect
Introduction Introduction to Prefect Setup and Installation Prefect Server Prefect Development Flows Flow Runs Interface Tasks Results Persisting Results Artifacts in Prefect Link Artifacts Markdown Artifacts Table Artifacts States in Prefect State Change Hooks Blocks Prefect Variables Variables in .yaml Files Task Runners Conclusion

Chapter 14: Getting Started with Big Data and Cloud Computing
Introduction Background of Cloud Computing Networking Concepts for Cloud Computing IP address DNS Ports Firewalls Virtual Private Cloud Virtualization Introduction to Big Data Hadoop Spark Introduction to Cloud Computing Cloud Computing Deployment Models Public Cloud Private Cloud Hybrid Cloud Community Cloud Government Cloud Multi-cloud Cloud Architecture Concepts Scalability Elasticity High Availability Fault Tolerance Disaster Recovery Caching Cloud Computing Vendors Cloud Service Models Infrastructure as a Service Platform as a Service Software as a Service Cloud Computing Services Identity and Access Management Compute Storage Object Storage Databases NoSQL Schema on Write vs. Schema on Read Document Databases Column-Oriented Databases Key–Value Stores Graph Databases Time Series Databases Vector Databases Data Warehouses Data Lakes Data Warehouses vs. Data Lakes Real-Time/Streaming Processing Service Serverless Functions Data Integration Services Continuous Integration Services Containerization Data Governance Data Catalog Compliance and Data Protection Data Lifecycle Management Machine Learning Conclusion

Chapter 15: Engineering Data Pipelines Using Amazon Web Services
Introduction AWS Console Overview Setting Up an AWS Account Installing the AWS CLI AWS S3 Uploading Files AWS Data Systems Amazon RDS Amazon Redshift Amazon Athena Amazon Glue AWS Lake Formation AWS SageMaker Conclusion

Chapter 16: Engineering Data Pipelines Using Google Cloud Platform
Introduction Google Cloud Platform Set Up a GCP Account Google Cloud Storage Google Cloud CLI Google Compute Engine Cloud SQL Google Bigtable Google BigQuery Google Dataproc Google Vertex AI Workbench Google Vertex AI Conclusion

Chapter 17: Engineering Data Pipelines Using Microsoft Azure
Introduction Introduction to Azure Azure Blob Storage Azure SQL Azure Cosmos DB Azure Synapse Analytics Azure Data Factory Azure Functions Azure Machine Learning Azure ML Data Assets Azure ML Job Conclusion

Index