Edition:
Authors: Gareth Eagar
Series:
ISBN: 1800560419, 9781800560413
Publisher: Packt Publishing
Publication year: 2021
Number of pages: 482
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 19 MB
If you would like the book Data Engineering with AWS: Learn how to design and build cloud-based data transformation pipelines using AWS converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert the file for you.
Please note that Data Engineering with AWS: Learn how to design and build cloud-based data transformation pipelines using AWS is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
Start your AWS data engineering journey with this easy-to-follow, hands-on guide and get to grips with foundational concepts through to building data engineering pipelines using AWS
Knowing how to architect and implement complex data pipelines is a highly sought-after skill. Data engineers are responsible for building these pipelines that ingest, transform, and join raw datasets - creating new value from the data in the process.
Amazon Web Services (AWS) offers a range of tools to simplify a data engineer's job, making it the preferred platform for performing data engineering tasks.
This book will take you through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that form a part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. The book also teaches you about populating data marts and data warehouses along with how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. In the final chapters, you'll understand how the power of machine learning and artificial intelligence can be used to draw new insights from data.
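To give a concrete flavor of the kind of task the book covers, here is a minimal Python (boto3) sketch of an ad-hoc SQL query submitted to Amazon Athena against data cataloged in a data lake. It is not taken from the book, and the database name, table name, and S3 results location are hypothetical placeholders.

# Minimal sketch (not from the book): an ad-hoc Athena SQL query via boto3.
# The database, table, and S3 output location below are hypothetical.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Submit the query; Athena writes the result files to the given S3 location.
response = athena.start_query_execution(
    QueryString="SELECT category, COUNT(*) AS n FROM sales GROUP BY category",
    QueryExecutionContext={"Database": "curated_zone_db"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
)
query_id = response["QueryExecutionId"]

# Poll until the query reaches a terminal state (simplified; real code would
# also handle timeouts and inspect the failure reason).
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    # The first row returned is the column header row.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])

Athena reads the data directly from S3 and writes query results back to the S3 location you specify, which is why no cluster has to be provisioned before running an ad-hoc query.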
By the end of this AWS book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.
This book is for data engineers, data analysts, and data architects who are new to AWS and looking to extend their skills to the AWS cloud. Anyone who is new to data engineering and wants to learn about the foundational concepts while gaining practical experience with common data engineering services on AWS will also find this book useful.
A basic understanding of big data-related topics and Python coding will help you get the most out of this book but is not needed. Familiarity with the AWS console and core services is also useful but not necessary.
Cover
Title page
Copyright and Credits
Contributors
Table of Contents
Preface

Section 1: AWS Data Engineering Concepts and Trends

Chapter 1: An Introduction to Data Engineering
  Technical requirements
  The rise of big data as a corporate asset
  The challenges of ever-growing datasets
  Data engineers – the big data enablers
  Understanding the role of the data engineer
  Understanding the role of the data scientist
  Understanding the role of the data analyst
  Understanding other common data-related roles
  The benefits of the cloud when building big data analytic solutions
  Hands-on – creating and accessing your AWS account
  Creating a new AWS account
  Accessing your AWS account
  Summary

Chapter 2: Data Management Architectures for Analytics
  Technical requirements
  The evolution of data management for analytics
  Databases and data warehouses
  Dealing with big, unstructured data
  A lake on the cloud and a house on that lake
  Understanding data warehouses and data marts – fountains of truth
  Distributed storage and massively parallel processing
  Columnar data storage and efficient data compression
  Dimensional modeling in data warehouses
  Understanding the role of data marts
  Feeding data into the warehouse – ETL and ELT pipelines
  Building data lakes to tame the variety and volume of big data
  Data lake logical architecture
  Bringing together the best of both worlds with the lake house architecture
  Data lakehouse implementations
  Building a data lakehouse on AWS
  Hands-on – configuring the AWS Command Line Interface tool and creating an S3 bucket
  Installing and configuring the AWS CLI
  Creating a new Amazon S3 bucket
  Summary

Chapter 3: The AWS Data Engineer's Toolkit
  Technical requirements
  AWS services for ingesting data
  Overview of Amazon Database Migration Service (DMS)
  Overview of Amazon Kinesis for streaming data ingestion
  Overview of Amazon MSK for streaming data ingestion
  Overview of Amazon AppFlow for ingesting data from SaaS services
  Overview of Amazon Transfer Family for ingestion using FTP/SFTP protocols
  Overview of Amazon DataSync for ingesting from on-premises storage
  Overview of the AWS Snow family of devices for large data transfers
  AWS services for transforming data
  Overview of AWS Lambda for light transformations
  Overview of AWS Glue for serverless Spark processing
  Overview of Amazon EMR for Hadoop ecosystem processing
  AWS services for orchestrating big data pipelines
  Overview of AWS Glue workflows for orchestrating Glue components
  Overview of AWS Step Functions for complex workflows
  Overview of Amazon managed workflows for Apache Airflow
  AWS services for consuming data
  Overview of Amazon Athena for SQL queries in the data lake
  Overview of Amazon Redshift and Redshift Spectrum for data warehousing and data lakehouse architectures
  Overview of Amazon QuickSight for visualizing data
  Hands-on – triggering an AWS Lambda function when a new file arrives in an S3 bucket
  Creating a Lambda layer containing the AWS Data Wrangler library
  Creating new Amazon S3 buckets
  Creating an IAM policy and role for your Lambda function
  Creating a Lambda function
  Configuring our Lambda function to be triggered by an S3 upload
  Summary

Chapter 4: Data Cataloging, Security, and Governance
  Technical requirements
  Getting data security and governance right
  Common data regulatory requirements
  Core data protection concepts
  Personal data
  Encryption
  Anonymized data
  Pseudonymized data/tokenization
  Authentication
  Authorization
  Putting these concepts together
  Cataloging your data to avoid the data swamp
  How to avoid the data swamp
  The AWS Glue/Lake Formation data catalog
  AWS services for data encryption and security monitoring
  AWS Key Management Service (KMS)
  Amazon Macie
  Amazon GuardDuty
  AWS services for managing identity and permissions
  AWS Identity and Access Management (IAM) service
  Using AWS Lake Formation to manage data lake access
  Hands-on – configuring Lake Formation permissions
  Creating a new user with IAM permissions
  Transitioning to managing fine-grained permissions with AWS Lake Formation
  Summary

Section 2: Architecting and Implementing Data Lakes and Data Lake Houses

Chapter 5: Architecting Data Engineering Pipelines
  Technical requirements
  Approaching the data pipeline architecture
  Architecting houses and architecting pipelines
  Whiteboarding as an information-gathering tool
  Conducting a whiteboarding session
  Identifying data consumers and understanding their requirements
  Identifying data sources and ingesting data
  Identifying data transformations and optimizations
  File format optimizations
  Data standardization
  Data quality checks
  Data partitioning
  Data denormalization
  Data cataloging
  Whiteboarding data transformation
  Loading data into data marts
  Wrapping up the whiteboarding session
  Hands-on – architecting a sample pipeline
  Detailed notes from the project "Bright Light" whiteboarding meeting of GP Widgets, Inc
  Summary

Chapter 6: Ingesting Batch and Streaming Data
  Technical requirements
  Understanding data sources
  Data variety
  Data volume
  Data velocity
  Data veracity
  Data value
  Questions to ask
  Ingesting data from a relational database
  AWS Database Migration Service (DMS)
  AWS Glue
  Other ways to ingest data from a database
  Deciding on the best approach for ingesting from a database
  Ingesting streaming data
  Amazon Kinesis versus Amazon Managed Streaming for Kafka (MSK)
  Hands-on – ingesting data with AWS DMS
  Creating a new MySQL database instance
  Loading the demo data using an Amazon EC2 instance
  Creating an IAM policy and role for DMS
  Configuring DMS settings and performing a full load from MySQL to S3
  Querying data with Amazon Athena
  Hands-on – ingesting streaming data
  Configuring Kinesis Data Firehose for streaming delivery to Amazon S3
  Configuring Amazon Kinesis Data Generator (KDG)
  Adding newly ingested data to the Glue Data Catalog
  Querying the data with Amazon Athena
  Summary

Chapter 7: Transforming Data to Optimize for Analytics
  Technical requirements
  Transformations – making raw data more valuable
  Cooking, baking, and data transformations
  Transformations as part of a pipeline
  Types of data transformation tools
  Apache Spark
  Hadoop and MapReduce
  SQL
  GUI-based tools
  Data preparation transformations
  Protecting PII data
  Optimizing the file format
  Optimizing with data partitioning
  Data cleansing
  Business use case transforms
  Data denormalization
  Enriching data
  Pre-aggregating data
  Extracting metadata from unstructured data
  Working with change data capture (CDC) data
  Traditional approaches – data upserts and SQL views
  Modern approaches – the transactional data lake
  Hands-on – joining datasets with AWS Glue Studio
  Creating a new data lake zone – the curated zone
  Creating a new IAM role for the Glue job
  Configuring a denormalization transform using AWS Glue Studio
  Finalizing the denormalization transform job to write to S3
  Create a transform job to join streaming and film data using AWS Glue Studio
  Summary

Chapter 8: Identifying and Enabling Data Consumers
  Technical requirements
  Understanding the impact of data democratization
  A growing variety of data consumers
  Meeting the needs of business users with data visualization
  AWS tools for business users
  Meeting the needs of data analysts with structured reporting
  AWS tools for data analysts
  Meeting the needs of data scientists and ML models
  AWS tools used by data scientists to work with data
  Hands-on – creating data transformations with AWS Glue DataBrew
  Configuring new datasets for AWS Glue DataBrew
  Creating a new Glue DataBrew project
  Building your Glue DataBrew recipe
  Creating a Glue DataBrew job
  Summary

Chapter 9: Loading Data into a Data Mart
  Technical requirements
  Extending analytics with data warehouses/data marts
  Cold data
  Warm data
  Hot data
  What not to do – anti-patterns for a data warehouse
  Using a data warehouse as a transactional datastore
  Using a data warehouse as a data lake
  Using data warehouses for real-time, record-level use cases
  Storing unstructured data
  Redshift architecture review and storage deep dive
  Data distribution across slices
  Redshift Zone Maps and sorting data
  Designing a high-performance data warehouse
  Selecting the optimal Redshift node type
  Selecting the optimal table distribution style and sort key
  Selecting the right data type for columns
  Selecting the optimal table type
  Moving data between a data lake and Redshift
  Optimizing data ingestion in Redshift
  Exporting data from Redshift to the data lake
  Hands-on – loading data into an Amazon Redshift cluster and running queries
  Uploading our sample data to Amazon S3
  IAM roles for Redshift
  Creating a Redshift cluster
  Creating external tables for querying data in S3
  Creating a schema for a local Redshift table
  Running complex SQL queries against our data
  Summary

Chapter 10: Orchestrating the Data Pipeline
  Technical requirements
  Understanding the core concepts for pipeline orchestration
  What is a data pipeline, and how do you orchestrate it?
  How do you trigger a data pipeline to run?
  How do you handle the failures of a step in your pipeline?
  Examining the options for orchestrating pipelines in AWS
  AWS Data Pipeline for managing ETL between data sources
  AWS Glue Workflows to orchestrate Glue resources
  Apache Airflow as an open source orchestration solution
  Pros and cons of using MWAA
  AWS Step Functions for a serverless orchestration solution
  Pros and cons of using AWS Step Functions
  Deciding on which data pipeline orchestration tool to use
  Hands-on – orchestrating a data pipeline using AWS Step Functions
  Creating new Lambda functions
  Creating an SNS topic and subscribing to an email address
  Creating a new Step Function state machine
  Configuring AWS CloudTrail and Amazon EventBridge
  Summary

Section 3: The Bigger Picture: Data Analytics, Data Visualization, and Machine Learning

Chapter 11: Ad Hoc Queries with Amazon Athena
  Technical requirements
  Amazon Athena – in-place SQL analytics for the data lake
  Tips and tricks to optimize Amazon Athena queries
  Common file format and layout optimizations
  Writing optimized SQL queries
  Federating the queries of external data sources with Amazon Athena Query Federation
  Querying external data sources using Athena Federated Query
  Managing governance and costs with Amazon Athena Workgroups
  Athena Workgroups overview
  Enforcing settings for groups of users
  Enforcing data usage controls
  Hands-on – creating an Amazon Athena workgroup and configuring Athena settings
  Hands-on – switching Workgroups and running queries
  Summary

Chapter 12: Visualizing Data with Amazon QuickSight
  Technical requirements
  Representing data visually for maximum impact
  Benefits of data visualization
  Popular uses of data visualizations
  Understanding Amazon QuickSight's core concepts
  Standard versus enterprise edition
  SPICE – the in-memory storage and computation engine for QuickSight
  Ingesting and preparing data from a variety of sources
  Preparing datasets in QuickSight versus performing ETL outside of QuickSight
  Creating and sharing visuals with QuickSight analyses and dashboards
  Visual types in Amazon QuickSight
  Understanding QuickSight's advanced features – ML Insights and embedded dashboards
  Amazon QuickSight ML Insights
  Amazon QuickSight embedded dashboards
  Hands-on – creating a simple QuickSight visualization
  Setting up a new QuickSight account and loading a dataset
  Creating a new analysis
  Summary

Chapter 13: Enabling Artificial Intelligence and Machine Learning
  Technical requirements
  Understanding the value of ML and AI for organizations
  Specialized ML projects
  Everyday use cases for ML and AI
  Exploring AWS services for ML
  AWS ML services
  Exploring AWS services for AI
  AI for unstructured speech and text
  AI for extracting metadata from images and video
  AI for ML-powered forecasts
  AI for fraud detection and personalization
  Hands-on – reviewing reviews with Amazon Comprehend
  Setting up a new Amazon SQS message queue
  Creating a Lambda function for calling Amazon Comprehend
  Adding Comprehend permissions for our IAM role
  Adding a Lambda function as a trigger for our SQS message queue
  Testing the solution with Amazon Comprehend
  Summary
  Further reading

Chapter 14: Wrapping Up the First Part of Your Learning Journey
  Technical requirements
  Looking at the data analytics big picture
  Managing complex data environments with DataOps
  Examining examples of real-world data pipelines
  A decade of data wrapped up for Spotify users
  Ingesting and processing streaming files at Netflix scale
  Imagining the future – a look at emerging trends
  ACID transactions directly on data lake data
  More data and more streaming ingestion
  Multi-cloud
  Decentralized data engineering teams, data platforms, and a data mesh architecture
  Data and product thinking convergence
  Data and self-serve platform design convergence
  Implementations of the data mesh architecture
  Hands-on – cleaning up your AWS account
  Reviewing AWS Billing to identify the resources being charged for
  Closing your AWS account
  Summary

About Packt
Other Books You May Enjoy
Index