Edition:
Authors: Jeremy Stanley
Series:
ISBN: 9781098145934
Publisher: O'Reilly Media
Year of publication: 2024
Number of pages: 170
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 on request)
File size: 9 MB
If you need the file for Automating Data Quality Monitoring at Scale: Scaling Beyond Rules with Machine Learning (Final) converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.
Please note that Automating Data Quality Monitoring at Scale: Scaling Beyond Rules with Machine Learning (Final) is the original-language edition and has not been translated into Persian. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
The world's businesses ingest a combined 2.5 quintillion bytes of data every day. But how much of this vast amount of data, used to build products, power AI systems, and drive business decisions, is poor quality or just plain bad? This practical book shows you how to ensure that the data your organization relies on contains only high-quality records. Most data engineers, data analysts, and data scientists genuinely care about data quality, but they often don't have the time, resources, or understanding to create a data quality monitoring solution that succeeds at scale. In this book, Jeremy Stanley and Paige Schwartz from Anomalo explain how you can use automated data quality monitoring to cover all your tables efficiently, proactively alert on every category of issue, and resolve problems immediately.
This book will help you:
Learn why data quality is a business imperative
Understand and assess unsupervised learning models for detecting data issues
Implement notifications that reduce alert fatigue and let you triage and resolve issues quickly
Integrate automated data quality monitoring with data catalogs, orchestration layers, and BI and ML systems
Understand the limits of automated data quality monitoring and how to overcome them
Learn how to deploy and manage your monitoring solution at scale
Maintain automated data quality monitoring for the long term
Foreword
Preface
Who Should Use This Book; Conventions Used in This Book; O'Reilly Online Learning; How to Contact Us; Acknowledgments
1. The Data Quality Imperative
High-Quality Data Is the New Gold; Data-Driven Companies Are Today's Disrupters; Data Analytics Is Democratized; AI and Machine Learning Are Differentiators; Generative AI and data quality; Companies Are Investing in a Modern Data Stack; More Data, More Problems; Issues Inside the Data Factory; Data Migrations; Third-Party Data Sources; Company Growth and Change; Exogenous Factors; Why We Need Data Quality Monitoring; Data Scars; Data Shocks; Automating Data Quality Monitoring: The New Frontier
2. Data Quality Monitoring Strategies and the Role of Automation
Monitoring Requirements; Data Observability: Necessary, but Not Sufficient; Traditional Approaches to Data Quality; Manual Data Quality Detection; Rule-Based Testing; Metrics Monitoring; Automating Data Quality Monitoring with Unsupervised Machine Learning; What Is Unsupervised Machine Learning?; An Analogy: Lane Departure Warnings; The Limits of Automation; Automating rule and metric creation; Rules; Metrics; A Four-Pillar Approach to Data Quality Monitoring
3. Assessing the Business Impact of Automated Data Quality Monitoring
Assessing Your Data; Volume; Variety; Unstructured data; Semistructured data; Structured data; Normalized relational data; Fact tables; Summary tables; Velocity; Veracity; Special Cases; Assessing Your Industry; Regulatory Pressure; AI/ML Risks; Feature shocks; NULL increases; Change in correlation; Duplicate data; Data as a Product; Assessing Your Data Maturity; Assessing Benefits to Stakeholders; Engineers; Data Leadership; Scientists; Consumers; Conducting an ROI Analysis; Quantitative Measures; Qualitative Measures; Conclusion
4. Automating Data Quality Monitoring with Machine Learning
Requirements; Sensitivity; Specificity; Transparency; Scalability; Nonrequirements; Data Quality Monitoring Is Not Outlier Detection; ML Approach and Algorithm; Data Sampling; Sample size; Bias and efficiency; Feature Encoding; Model Development; Training and evaluation; Computational efficiency; Model Explainability; Putting It Together with Pseudocode; Other Applications; Conclusion
5. Building a Model That Works on Real-World Data
Data Challenges and Mitigations; Seasonality; Time-Based Features; Chaotic Tables; Updated-in-Place Tables; Column Correlations; Model Testing; Injecting Synthetic Issues; Example; Benchmarking; Analyzing performance; Putting it together with pseudocode; Improving the Model; Conclusion
6. Implementing Notifications While Avoiding Alert Fatigue
How Notifications Facilitate Data Issue Response; Triage; Routing; Resolution; Documentation; Taking Action Without Notifications; Anatomy of a Notification; Visualization; Actions; Text Description; Who Created/Last Edited the Check; Delivering Notifications; Notification Audience; Notification Channels; Email; Real-time communication; PagerDuty or Opsgenie-type platforms (alerting, on-call management); Ticketing platforms (Jira, ServiceNow); Webhooks; Notification Timing; Avoiding Alert Fatigue; Scheduling Checks in the Right Order; Clustering Alerts Using Machine Learning; Suppressing Notifications; Priority level; Continuous retraining; Narrowing the scope of the model; Making the check less sensitive; What not to suppress: Expected changes; Automating the Root Cause Analysis; Conclusion
7. Integrating Monitoring with Data Tools and Systems
Monitoring Your Data Stack; Data Warehouses; Integrating with Data Warehouses; Security; Reconciling Data Across Multiple Warehouses; Comparing datasets with rule-based testing; Comparing datasets with unsupervised machine learning; Comparing summary statistics; Data Orchestrators; Integrating with Orchestrators; Data Catalogs; Integrating with Catalogs; Data Consumers; Analytics and BI Tools; MLOps; Conclusion
8. Operating Your Solution at Scale
Build Versus Buy; Vendor Deployment Models; SaaS; Fully in-VPC or on-prem; Hybrid; Configuration; Determining Which Tables Are Most Important; Deciding What Data in a Table to Monitor; Configuration at Scale; Enablement; User Roles and Permissions; Onboarding, Training, and Support; Improving Data Quality Over Time; Initiatives; Metrics; Triage and resolution; Executive dashboards; Scorecards; From Chaos to Clarity
A. Types of Data Quality Issues
Table Issues
Late Arrival: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Schema Changes: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Untraceable Changes: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Row Issues
Incomplete Rows: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Duplicate Rows: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Temporal Inconsistency: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Value Issues
Missing Values: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Incorrect Values: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Invalid Values: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Multi Issues
Relational Failures: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Inconsistent Sources: Definition; Example; Causes; Analytics impact; ML impact; How to monitor
Index