Edition: [1 ed.]
Authors: Chris Fregly, Antje Barth
Series:
ISBN: 1492079391, 9781492079392
Publisher: O'Reilly Media
Publication year: 2021
Number of pages: 522
Language: English
File format: EPUB (can be converted to PDF or AZW3 on request)
File size: 39 MB
With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.
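As a rough illustration of the kind of end-to-end workflow the book covers (this is not an excerpt from the book), here is a minimal sketch using the SageMaker Python SDK to train a model and deploy it to a real-time endpoint; the IAM role ARN, S3 URI, entry-point script, and framework version below are placeholder assumptions.

```python
# Minimal, illustrative sketch (not from the book): train a scikit-learn model on
# Amazon SageMaker and deploy it to a real-time endpoint with the SageMaker Python SDK.
# The role ARN, S3 URI, entry-point script, and framework version are placeholders.
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder IAM role

estimator = SKLearn(
    entry_point="train.py",          # hypothetical training script
    framework_version="1.0-1",       # adjust to an available scikit-learn version
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
    sagemaker_session=session,
)

# Launch a managed training job against data already staged in S3 (placeholder URI).
estimator.fit({"train": "s3://example-bucket/reviews/train/"})

# Deploy the trained model behind a managed HTTPS endpoint and invoke it.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
print(predictor.predict([[5.1, 3.5, 1.4, 0.2]]))

# Delete the endpoint to stop incurring cost.
predictor.delete_endpoint()
```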
Copyright
Table of Contents
Preface: Overview of the Chapters Who Should Read This Book Other Resources Conventions Used in This Book Using Code Examples O’Reilly Online Learning How to Contact Us Acknowledgments
Chapter 1. Introduction to Data Science on AWS: Benefits of Cloud Computing Agility Cost Savings Elasticity Innovate Faster Deploy Globally in Minutes Smooth Transition from Prototype to Production Data Science Pipelines and Workflows Amazon SageMaker Pipelines AWS Step Functions Data Science SDK Kubeflow Pipelines Managed Workflows for Apache Airflow on AWS MLflow TensorFlow Extended Human-in-the-Loop Workflows MLOps Best Practices Operational Excellence Security Reliability Performance Efficiency Cost Optimization Amazon AI Services and AutoML with Amazon SageMaker Amazon AI Services AutoML with SageMaker Autopilot Data Ingestion, Exploration, and Preparation in AWS Data Ingestion and Data Lakes with Amazon S3 and AWS Lake Formation Data Analysis with Amazon Athena, Amazon Redshift, and Amazon QuickSight Evaluate Data Quality with AWS Deequ and SageMaker Processing Jobs Label Training Data with SageMaker Ground Truth Data Transformation with AWS Glue DataBrew, SageMaker Data Wrangler, and SageMaker Processing Jobs Model Training and Tuning with Amazon SageMaker Train Models with SageMaker Training and Experiments Built-in Algorithms Bring Your Own Script (Script Mode) Bring Your Own Container Pre-Built Solutions and Pre-Trained Models with SageMaker JumpStart Tune and Validate Models with SageMaker Hyper-Parameter Tuning Model Deployment with Amazon SageMaker and AWS Lambda Functions SageMaker Endpoints SageMaker Batch Transform Serverless Model Deployment with AWS Lambda Streaming Analytics and Machine Learning on AWS Amazon Kinesis Streaming Amazon Managed Streaming for Apache Kafka Streaming Predictions and Anomaly Detection AWS Infrastructure and Custom-Built Hardware SageMaker Compute Instance Types GPUs and Amazon Custom-Built Compute Hardware GPU-Optimized Networking and Custom-Built Hardware Storage Options Optimized for Large-Scale Model Training Reduce Cost with Tags, Budgets, and Alerts Summary
Chapter 2. Data Science Use Cases: Innovation Across Every Industry Personalized Product Recommendations Recommend Products with Amazon Personalize Generate Recommendations with Amazon SageMaker and TensorFlow Generate Recommendations with Amazon SageMaker and Apache Spark Detect Inappropriate Videos with Amazon Rekognition Demand Forecasting Predict Energy Consumption with Amazon Forecast Predict Demand for Amazon EC2 Instances with Amazon Forecast Identify Fake Accounts with Amazon Fraud Detector Enable Privacy-Leak Detection with Amazon Macie Conversational Devices and Voice Assistants Speech Recognition with Amazon Lex Text-to-Speech Conversion with Amazon Polly Speech-to-Text Conversion with Amazon Transcribe Text Analysis and Natural Language Processing Translate Languages with Amazon Translate Classify Customer-Support Messages with Amazon Comprehend Extract Resume Details with Amazon Textract and Comprehend Cognitive Search and Natural Language Understanding Intelligent Customer Support Centers Industrial AI Services and Predictive Maintenance Home Automation with AWS IoT and Amazon SageMaker Extract Medical Information from Healthcare Documents Self-Optimizing and Intelligent Cloud Infrastructure Predictive Auto Scaling for Amazon EC2 Anomaly Detection on Streams of Data Cognitive and Predictive Business Intelligence Ask Natural-Language Questions with Amazon QuickSight Train and Invoke SageMaker Models with Amazon Redshift Invoke Amazon Comprehend and SageMaker Models from Amazon Aurora SQL Database Invoke SageMaker Model from Amazon Athena Run Predictions on Graph Data Using Amazon Neptune Educating the Next Generation of AI and ML Developers Build Computer Vision Models with AWS DeepLens Learn Reinforcement Learning with AWS DeepRacer Understand GANs with AWS DeepComposer Program Nature’s Operating System with Quantum Computing Quantum Bits Versus Digital Bits Quantum Supremacy and the Quantum Computing Eras Cracking Cryptography Molecular Simulations and Drug Discovery Logistics and Financial Optimizations Quantum Machine Learning and AI Programming a Quantum Computer with Amazon Braket AWS Center for Quantum Computing Increase Performance and Reduce Cost Automatic Code Reviews with CodeGuru Reviewer Improve Application Performance with CodeGuru Profiler Improve Application Availability with DevOps Guru Summary
Chapter 3. Automated Machine Learning: Automated Machine Learning with SageMaker Autopilot Track Experiments with SageMaker Autopilot Train and Deploy a Text Classifier with SageMaker Autopilot Train and Deploy with SageMaker Autopilot UI Train and Deploy a Model with the SageMaker Autopilot Python SDK Predict with Amazon Athena and SageMaker Autopilot Train and Predict with Amazon Redshift ML and SageMaker Autopilot Automated Machine Learning with Amazon Comprehend Predict with Amazon Comprehend’s Built-in Model Train and Deploy a Custom Model with the Amazon Comprehend UI Train and Deploy a Custom Model with the Amazon Comprehend Python SDK Summary
Chapter 4. Ingest Data into the Cloud: Data Lakes Import Data into the S3 Data Lake Describe the Dataset Query the Amazon S3 Data Lake with Amazon Athena Access Athena from the AWS Console Register S3 Data as an Athena Table Update Athena Tables as New Data Arrives with AWS Glue Crawler Create a Parquet-Based Table in Athena Continuously Ingest New Data with AWS Glue Crawler Build a Lake House with Amazon Redshift Spectrum Export Amazon Redshift Data to S3 Data Lake as Parquet Share Data Between Amazon Redshift Clusters Choose Between Amazon Athena and Amazon Redshift Reduce Cost and Increase Performance S3 Intelligent-Tiering Parquet Partitions and Compression Amazon Redshift Table Design and Compression Use Bloom Filters to Improve Query Performance Materialized Views in Amazon Redshift Spectrum Summary
Chapter 5. Explore the Dataset: Tools for Exploring Data in AWS Visualize Our Data Lake with SageMaker Studio Prepare SageMaker Studio to Visualize Our Dataset Run a Sample Athena Query in SageMaker Studio Dive Deep into the Dataset with Athena and SageMaker Query Our Data Warehouse Run a Sample Amazon Redshift Query from SageMaker Studio Dive Deep into the Dataset with Amazon Redshift and SageMaker Create Dashboards with Amazon QuickSight Detect Data-Quality Issues with Amazon SageMaker and Apache Spark SageMaker Processing Jobs Analyze Our Dataset with Deequ and Apache Spark Detect Bias in Our Dataset Generate and Visualize Bias Reports with SageMaker Data Wrangler Detect Bias with a SageMaker Clarify Processing Job Integrate Bias Detection into Custom Scripts with SageMaker Clarify Open Source Mitigate Data Bias by Balancing the Data Detect Different Types of Drift with SageMaker Clarify Analyze Our Data with AWS Glue DataBrew Reduce Cost and Increase Performance Use a Shared S3 Bucket for Nonsensitive Athena Query Results Approximate Counts with HyperLogLog Dynamically Scale a Data Warehouse with AQUA for Amazon Redshift Improve Dashboard Performance with QuickSight SPICE Summary
Chapter 6. Prepare the Dataset for Model Training: Perform Feature Selection and Engineering Select Training Features Based on Feature Importance Balance the Dataset to Improve Model Accuracy Split the Dataset into Train, Validation, and Test Sets Transform Raw Text into BERT Embeddings Convert Features and Labels to Optimized TensorFlow File Format Scale Feature Engineering with SageMaker Processing Jobs Transform with scikit-learn and TensorFlow Transform with Apache Spark and TensorFlow Share Features Through SageMaker Feature Store Ingest Features into SageMaker Feature Store Retrieve Features from SageMaker Feature Store Ingest and Transform Data with SageMaker Data Wrangler Track Artifact and Experiment Lineage with Amazon SageMaker Understand Lineage-Tracking Concepts Show Lineage of a Feature Engineering Job Understand the SageMaker Experiments API Ingest and Transform Data with AWS Glue DataBrew Summary
Chapter 7. Train Your First Model: Understand the SageMaker Infrastructure Introduction to SageMaker Containers Increase Availability with Compute and Network Isolation Deploy a Pre-Trained BERT Model with SageMaker JumpStart Develop a SageMaker Model Built-in Algorithms Bring Your Own Script Bring Your Own Container A Brief History of Natural Language Processing BERT Transformer Architecture Training BERT from Scratch Masked Language Model Next Sentence Prediction Fine Tune a Pre-Trained BERT Model Create the Training Script Setup the Train, Validation, and Test Dataset Splits Set Up the Custom Classifier Model Train and Validate the Model Save the Model Launch the Training Script from a SageMaker Notebook Define the Metrics to Capture and Monitor Configure the Hyper-Parameters for Our Algorithm Select Instance Type and Instance Count Putting It All Together in the Notebook Download and Inspect Our Trained Model from S3 Show Experiment Lineage for Our SageMaker Training Job Show Artifact Lineage for Our SageMaker Training Job Evaluate Models Run Some Ad Hoc Predictions from the Notebook Analyze Our Classifier with a Confusion Matrix Visualize Our Neural Network with TensorBoard Monitor Metrics with SageMaker Studio Monitor Metrics with CloudWatch Metrics Debug and Profile Model Training with SageMaker Debugger Detect and Resolve Issues with SageMaker Debugger Rules and Actions Profile Training Jobs Interpret and Explain Model Predictions Detect Model Bias and Explain Predictions Detect Bias with a SageMaker Clarify Processing Job Feature Attribution and Importance with SageMaker Clarify and SHAP More Training Options for BERT Convert TensorFlow BERT Model to PyTorch Train PyTorch BERT Models with SageMaker Train Apache MXNet BERT Models with SageMaker Train BERT Models with PyTorch and AWS Deep Java Library Reduce Cost and Increase Performance Use Small Notebook Instances Test Model-Training Scripts Locally in the Notebook Profile Training Jobs with SageMaker Debugger Start with a Pre-Trained Model Use 16-Bit Half Precision and bfloat16 Mixed 32-Bit Full and 16-Bit Half Precision Quantization Use Training-Optimized Hardware Spot Instances and Checkpoints Early Stopping Rule in SageMaker Debugger Summary
Chapter 8. Train and Optimize Models at Scale: Automatically Find the Best Model Hyper-Parameters Set Up the Hyper-Parameter Ranges Run the Hyper-Parameter Tuning Job Analyze the Best Hyper-Parameters from the Tuning Job Show Experiment Lineage for Our SageMaker Tuning Job Use Warm Start for Additional SageMaker Hyper-Parameter Tuning Jobs Run HPT Job Using Warm Start Analyze the Best Hyper-Parameters from the Warm-Start Tuning Job Scale Out with SageMaker Distributed Training Choose a Distributed-Communication Strategy Choose a Parallelism Strategy Choose a Distributed File System Launch the Distributed Training Job Reduce Cost and Increase Performance Start with Reasonable Hyper-Parameter Ranges Shard the Data with ShardedByS3Key Stream Data on the Fly with Pipe Mode Enable Enhanced Networking Summary
Chapter 9. Deploy Models to Production: Choose Real-Time or Batch Predictions Real-Time Predictions with SageMaker Endpoints Deploy Model Using SageMaker Python SDK Track Model Deployment in Our Experiment Analyze the Experiment Lineage of a Deployed Model Invoke Predictions Using the SageMaker Python SDK Invoke Predictions Using HTTP POST Create Inference Pipelines Invoke SageMaker Models from SQL and Graph-Based Queries Auto-Scale SageMaker Endpoints Using Amazon CloudWatch Define a Scaling Policy with AWS-Provided Metrics Define a Scaling Policy with a Custom Metric Tuning Responsiveness Using a Cooldown Period Auto-Scale Policies Strategies to Deploy New and Updated Models Split Traffic for Canary Rollouts Shift Traffic for Blue/Green Deployments Testing and Comparing New Models Perform A/B Tests to Compare Model Variants Reinforcement Learning with Multiarmed Bandit Testing Monitor Model Performance and Detect Drift Enable Data Capture Understand Baselines and Drift Monitor Data Quality of Deployed SageMaker Endpoints Create a Baseline to Measure Data Quality Schedule Data-Quality Monitoring Jobs Inspect Data-Quality Results Monitor Model Quality of Deployed SageMaker Endpoints Create a Baseline to Measure Model Quality Schedule Model-Quality Monitoring Jobs Inspect Model-Quality Monitoring Results Monitor Bias Drift of Deployed SageMaker Endpoints Create a Baseline to Detect Bias Schedule Bias-Drift Monitoring Jobs Inspect Bias-Drift Monitoring Results Monitor Feature Attribution Drift of Deployed SageMaker Endpoints Create a Baseline to Monitor Feature Attribution Schedule Feature Attribution Drift Monitoring Jobs Inspect Feature Attribution Drift Monitoring Results Perform Batch Predictions with SageMaker Batch Transform Select an Instance Type Set Up the Input Data Tune the SageMaker Batch Transform Configuration Prepare the SageMaker Batch Transform Job Run the SageMaker Batch Transform Job Review the Batch Predictions AWS Lambda Functions and Amazon API Gateway Optimize and Manage Models at the Edge Deploy a PyTorch Model with TorchServe TensorFlow-BERT Inference with AWS Deep Java Library Reduce Cost and Increase Performance Delete Unused Endpoints and Scale In Underutilized Clusters Deploy Multiple Models in One Container Attach a GPU-Based Elastic Inference Accelerator Optimize a Trained Model with SageMaker Neo and TensorFlow Lite Use Inference-Optimized Hardware Summary
Chapter 10. Pipelines and MLOps: Machine Learning Operations Software Pipelines Machine Learning Pipelines Components of Effective Machine Learning Pipelines Steps of an Effective Machine Learning Pipeline Pipeline Orchestration with SageMaker Pipelines Create an Experiment to Track Our Pipeline Lineage Define Our Pipeline Steps Configure the Pipeline Parameters Create the Pipeline Start the Pipeline with the Python SDK Start the Pipeline with the SageMaker Studio UI Approve the Model for Staging and Production Review the Pipeline Artifact Lineage Review the Pipeline Experiment Lineage Automation with SageMaker Pipelines GitOps Trigger When Committing Code S3 Trigger When New Data Arrives Time-Based Schedule Trigger Statistical Drift Trigger More Pipeline Options AWS Step Functions and the Data Science SDK Kubeflow Pipelines Apache Airflow MLflow TensorFlow Extended Human-in-the-Loop Workflows Improving Model Accuracy with Amazon A2I Active-Learning Feedback Loops with SageMaker Ground Truth Reduce Cost and Improve Performance Cache Pipeline Steps Use Less-Expensive Spot Instances Summary
Chapter 11. Streaming Analytics and Machine Learning: Online Learning Versus Offline Learning Streaming Applications Windowed Queries on Streaming Data Stagger Windows Tumbling Windows Sliding Windows Streaming Analytics and Machine Learning on AWS Classify Real-Time Product Reviews with Amazon Kinesis, AWS Lambda, and Amazon SageMaker Implement Streaming Data Ingest Using Amazon Kinesis Data Firehose Create Lambda Function to Invoke SageMaker Endpoint Create the Kinesis Data Firehose Delivery Stream Put Messages on the Stream Summarize Real-Time Product Reviews with Streaming Analytics Setting Up Amazon Kinesis Data Analytics Create a Kinesis Data Stream to Deliver Data to a Custom Application Create AWS Lambda Function to Send Notifications via Amazon SNS Create AWS Lambda Function to Publish Metrics to Amazon CloudWatch Transform Streaming Data in Kinesis Data Analytics Understand In-Application Streams and Pumps Amazon Kinesis Data Analytics Applications Calculate Average Star Rating Detect Anomalies in Streaming Data Calculate Approximate Counts of Streaming Data Create Kinesis Data Analytics Application Start the Kinesis Data Analytics Application Put Messages on the Stream Classify Product Reviews with Apache Kafka, AWS Lambda, and Amazon SageMaker Reduce Cost and Improve Performance Aggregate Messages Consider Kinesis Firehose Versus Kinesis Data Streams Enable Enhanced Fan-Out for Kinesis Data Streams Summary
Chapter 12. Secure Data Science on AWS: Shared Responsibility Model Between AWS and Customers Applying AWS Identity and Access Management IAM Users IAM Policies IAM User Roles IAM Service Roles Specifying Condition Keys for IAM Roles Enable Multifactor Authentication Least Privilege Access with IAM Roles and Policies Resource-Based IAM Policies Identity-Based IAM Policies Isolating Compute and Network Environments Virtual Private Cloud VPC Endpoints and PrivateLink Limiting Athena APIs with a VPC Endpoint Policy Securing Amazon S3 Data Access Require a VPC Endpoint with an S3 Bucket Policy Limit S3 APIs for an S3 Bucket with a VPC Endpoint Policy Restrict S3 Bucket Access to a Specific VPC with an S3 Bucket Policy Limit S3 APIs with an S3 Bucket Policy Restrict S3 Data Access Using IAM Role Policies Restrict S3 Bucket Access to a Specific VPC with an IAM Role Policy Restrict S3 Data Access Using S3 Access Points Encryption at Rest Create an AWS KMS Key Encrypt the Amazon EBS Volumes During Training Encrypt the Uploaded Model in S3 After Training Store Encryption Keys with AWS KMS Enforce S3 Encryption for Uploaded S3 Objects Enforce Encryption at Rest for SageMaker Jobs Enforce Encryption at Rest for SageMaker Notebooks Enforce Encryption at Rest for SageMaker Studio Encryption in Transit Post-Quantum TLS Encryption in Transit with KMS Encrypt Traffic Between Training-Cluster Containers Enforce Inter-Container Encryption for SageMaker Jobs Securing SageMaker Notebook Instances Deny Root Access Inside SageMaker Notebooks Disable Internet Access for SageMaker Notebooks Securing SageMaker Studio Require a VPC for SageMaker Studio SageMaker Studio Authentication Securing SageMaker Jobs and Models Require a VPC for SageMaker Jobs Require Network Isolation for SageMaker Jobs Securing AWS Lake Formation Securing Database Credentials with AWS Secrets Manager Governance Secure Multiaccount AWS Environments with AWS Control Tower Manage Accounts with AWS Organizations Enforce Account-Level Permissions with SCPs Implement Multiaccount Model Deployments Auditability Tag Resources Log Activities and Collect Events Track User Activity and API Calls Reduce Cost and Improve Performance Limit Instance Types to Control Cost Quarantine or Delete Untagged Resources Use S3 Bucket KMS Keys to Reduce Cost and Increase Performance Summary
Index
About the Authors
Colophon