Edition:
Authors: Vladyslav Ukis
Series:
ISBN: 9780137424757, 0137424604
Publisher: Pearson
Year of publication: 2023
Number of pages:
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 on request)
File size: 18 MB
If you would like the book Establishing SRE Foundations: A Step-by-Step Guide to Introducing Site Reliability Engineering in Software Delivery Organizations (Casey Sisterson's Library) converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert the file.
Note that Establishing SRE Foundations: A Step-by-Step Guide to Introducing Site Reliability Engineering in Software Delivery Organizations (Casey Sisterson's Library) is the original-language edition, not a Persian translation. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
Improve Your Service Scalability and Reliability with SRE
"The techniques and principles of SRE are not only clearly defined here, but also the rationale behind them is explained in a way that will stick. This is not some dry definition, this is practical, usable understanding. . . . I can whole-heartedly recommend this book without any reservation. This is a very good book on an important topic that helps to move the game forward for our discipline!" --From the Foreword by David Farley, Founder and CEO of Continuous Delivery Ltd.
Pioneered by Google to create more scalable and reliable large-scale systems, Site Reliability Engineering (SRE) has become one of today's most valuable software innovation opportunities. Establishing SRE Foundations is a concise, practical guide that shows how to drive successful SRE adoption in your own organization. Dr. Vladyslav Ukis presents a step-by-step approach to establishing the right cultural, organizational, and technical process foundations, quickly achieving a "minimum viable SRE" and continually improving from there.
Dr. Ukis draws extensively on his own experiences leading an SRE transformation journey at a major healthcare company. Throughout, he answers specific questions that organizations ask about SRE, identifies pitfalls, and shows how to avoid or overcome them. Whatever your role in software development, engineering, or operations, this guide will help you apply SRE to improve what matters most: user and customer experience.
Understand how SRE works, its role in software operations, and the challenges of SRE transformation
Assess your organization's current operations and readiness for SRE transformation
Achieve organizational buy-in and initiate foundational activities, including SLO definitions, alerting, on-call rotations, incident response, and error budget-based decision-making
Align organizational structures to support a full SRE transformation
Measure the progress and success of your SRE initiative
Sustain and advance your SRE transformation beyond the foundations
Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.
Cover
Half Title
Title Page
Copyright Page
Table of Contents
Foreword
Preface
Acknowledgments
About the Author
Part I: Foundations
Chapter 1 Introduction to SRE
1.1 Why SRE?
1.1.1 ITIL
1.1.2 COBIT
1.1.3 Modeling
1.1.4 DevOps
1.1.5 SRE
1.1.6 Comparison
1.2 Alignment Using SRE
1.3 Why Does SRE Work?
1.4 Summary
Chapter 2 The Challenge
2.1 Misalignment
2.2 Collective Ownership
2.3 Ownership Using SRE
2.3.1 Product Development
2.3.2 Product Operations
2.3.3 Product Management
2.3.4 Benefits and Costs
2.4 The Challenge Statement
2.5 Coaching
2.6 Summary
Chapter 3 SRE Basic Concepts
3.1 Service Level Indicators
3.2 Service Level Objectives
3.3 Error Budgets
3.3.1 Availability Error Budget Example
3.3.2 Error Budget of Zero
3.3.3 Latency Error Budget Example
3.4 Error Budget Policies
3.5 SRE Concept Pyramid
3.6 Alignment Using the SRE Concept Pyramid
3.7 Summary
Chapter 4 Assessing the Status Quo
4.1 Where Is the Organization?
4.1.1 Organizational Structure
4.1.2 Organizational Alignment
4.1.3 Formal and Informal Leadership
4.2 Where Are the People?
4.3 Where Is the Tech?
4.4 Where Is the Culture?
4.4.1 Is There High Cooperation?
4.4.2 Are Messengers Trained?
4.4.3 Are Risks Shared?
4.4.4 Is Bridging Encouraged?
4.4.5 Does Failure Lead to Inquiry?
4.4.6 Is Novelty Implemented?
4.5 Where Is the Process?
4.6 SRE Maturity Model
4.7 Posing Hypotheses
4.8 Summary
Part II: Running the Transformation
Chapter 5 Achieving Organizational Buy-In
5.1 Getting People Behind SRE
5.2 SRE Marketing Funnel
5.2.1 Awareness
5.2.2 Interest
5.2.3 Understanding
5.2.4 Agreement
5.2.5 Engagement
5.3 SRE Coaches
5.3.1 Qualities
5.3.2 Responsibilities
5.4 Top-Down Buy-In
5.4.1 Stakeholder Chart
5.4.2 Engaging the Head of Development
5.4.3 Engaging the Head of Operations
5.4.4 Engaging the Head of Product Management
5.4.5 Achieving Joint Buy-In
5.4.6 Getting SRE into the Portfolio
5.5 Bottom-Up Buy-In
5.5.1 Engaging the Operations Teams
5.5.2 Engaging the Development Teams
5.6 Lateral Buy-In
5.7 Buy-In Staggering
5.8 Team Coaching
5.9 Traversing the Organization
5.9.1 Grouping the Organization
5.9.2 Traversing the Organization Versus SRE Infrastructure Demand
5.9.3 Team Engagements Over Time
5.10 Organizational Coaching
5.11 Summary
Chapter 6 Laying Down the Foundations
6.1 Introductory Talks by Team
6.2 Conveying the Basics
6.2.1 SLO as a Contract
6.2.2 SLO as a Proxy Measure of Customer Happiness
6.2.3 User Personas
6.2.4 User Story Mapping
6.2.5 Motivation to Fix SLO Breaches
6.2.6 SLOs Are Not About Technicalities
6.2.7 Causes of SLO Breaches
6.2.8 On Call for SLO Breaches
6.3 SLI Standardization
6.3.1 Application Performance Management Facility
6.3.2 Availability
6.3.3 Latency
6.3.4 Prioritization
6.4 Enabling Logging
6.5 Teaching the Log Query Language
6.6 Defining Initial SLOs
6.6.1 What Makes a Good SLO?
6.6.2 Iterating on an SLO
6.6.3 Revising SLOs
6.7 Default SLOs
6.8 Providing Basic Infrastructure
6.8.1 Dashboards
6.8.2 Alert Content
6.9 Engaging Champions
6.10 Dealing with Detractors
6.10.1 Issues with the Cause
6.10.2 Issues with Alerting
6.10.3 Issues with Tooling
6.10.4 Issues with Product Owner Involvement
6.10.5 Issues with Team Motivation
6.11 Creating Documentation
6.12 Broadcast Success
6.13 Summary
Chapter 7 Reacting to Alerts on SLO Breaches
7.1 Environment Selection
7.2 Responsibilities
7.2.1 Dev Versus Ops Responsibilities
7.2.2 Operational Responsibilities
7.2.3 Splitting Operational Responsibilities
7.3 Ways of Working
7.3.1 Interruption-Based Working Mode
7.3.2 Focus-Based Working Mode
7.4 Setting Up On-Call Rotations
7.4.1 Initial Rotation Period
7.4.2 One Person On Call
7.4.3 Two People On Call
7.4.4 Three People On Call
7.5 On-Call Management Tools
7.5.1 Posting SLO Breaches
7.5.2 Scheduling
7.5.3 Professional On-Call Management Tools
7.6 Out-of-Hours On-Call
7.6.1 Using Availability Targets and Product Demand
7.6.2 Trade-offs
7.7 Systematic Knowledge Sharing
7.7.1 Knowledge-Sharing Needs
7.7.2 Knowledge-Sharing Pyramid
7.7.3 On-Call Training
7.7.4 Runbooks
7.7.5 Internal Stack Overflow
7.7.6 SRE Community of Practice
7.8 Broadcast Success
7.9 Summary
Chapter 8 Implementing Alert Dispatching
8.1 Alert Escalation
8.2 Defining an Alert Escalation Policy
8.3 Defining Stakeholder Groups
8.4 Triggering Stakeholder Notifications
8.5 Defining Stakeholder Rings
8.6 Defining Effective Stakeholder Notifications
8.7 Getting the Stakeholders Subscribed
8.7.1 Subscribing Using the On-Call Management Tool
8.7.2 Subscribing Using Other Means
8.8 Broadcast Success
8.9 Summary
Chapter 9 Implementing Incident Response
9.1 Incident Response Foundations
9.2 Incident Priorities
9.2.1 SLO Breaches Versus Incidents
9.2.2 Changing Incident Priority During an Incident
9.2.3 Defining Generic Incident Priorities
9.2.4 Mapping SLOs to Incident Priorities
9.2.5 Mapping Error Budgets to Incident Priorities
9.2.6 Mapping Resource-Based Alerts to Incident Priorities
9.2.7 Uncovering New Use Cases for Incident Priorities
9.2.8 Adjusting Incident Priorities Based on Stakeholder Feedback
9.2.9 Extending the SLO Definition Process
9.2.10 Infrastructure
9.2.11 Deduplication
9.3 Complex Incident Coordination
9.3.1 What Is a Complex Incident?
9.3.2 Existing Incident Coordination Systems
9.3.3 Incident Classification
9.3.4 Defining Generic Incident Severities
9.3.5 Social Dimension of Incident Classification
9.3.6 Incident Priority Versus Incident Severity
9.3.7 Defining Roles
9.3.8 Roles Required by Incident Severity
9.3.9 Roles On Call
9.3.10 Incident Response Process Evaluation
9.3.11 Incident Response Process Dynamics
9.3.12 Incident Response Team Well-Being
9.4 Incident Postmortems
9.5 Effective Postmortem Criteria
9.5.1 Initiating a Postmortem
9.5.2 Postmortem Lifecycle
9.5.3 Before the Postmortem
9.5.4 During the Postmortem
9.5.5 After the Postmortem
9.5.6 Analyzing the Postmortem Process
9.5.7 Postmortem Template
9.5.8 Facilitating Learning from Postmortems
9.5.9 Successful Postmortem Practice
9.5.10 Example Postmortems
9.6 Mashing Up the Tools
9.6.1 Connecting to the On-Call Management Tool
9.6.2 Connections Among Other Tools
9.6.3 Mobile Integrations
9.6.4 Example Tool Landscapes
9.7 Service Status Broadcast
9.8 Documenting the Incident Response Process
9.9 Broadcast Success
9.10 Summary
Chapter 10 Setting Up an Error Budget Policy
10.1 Motivation
10.2 Terminology
10.3 Error Budget Policy Structure
10.4 Error Budget Policy Conditions
10.5 Error Budget Policy Consequences
10.6 Error Budget Policy Governance
10.7 Extending the Error Budget Policy
10.8 Agreeing to the Error Budget Policy
10.9 Storing the Error Budget Policy
10.10 Enacting the Error Budget Policy
10.11 Reviewing the Error Budget Policy
10.12 Related Concepts
10.13 Summary
Chapter 11 Enabling Error Budget–Based Decision-Making
11.1 Reliability Decision-Making Taxonomy
11.2 Implementing SRE Indicators
11.2.1 Dimensions of SRE Indicators
11.2.2 “SLOs by Service” Indicator
11.2.3 SLO Adherence Indicator
11.2.4 SLO Error Budget Depletion Indicator
11.2.5 Premature SLO Error Budget Exhaustion Indicator
11.2.6 “SLAs by Service” Indicator
11.2.7 SLA Error Budget Depletion Indicator
11.2.8 SLA Adherence Indicator
11.2.9 Customer Support Ticket Trend Indicator
11.2.10 “On-Call Rotations by Team” Indicator
11.2.11 Incident Time to Recovery Trend Indicator
11.2.12 Least Available Service Endpoints Indicator
11.2.13 Slowest Service Endpoints Indicator
11.3 Process Indicators, Not People KPIs
11.4 Decisions Versus Indicators
11.5 Decision-Making Workflows
11.5.1 API Consumption Decision Workflow
11.5.2 Tightening a Dependency’s SLO Decision Workflow
11.5.3 Features Versus Reliability Prioritization Workflow
11.5.4 Setting an SLO Decision Workflow
11.5.5 Setting an SLA Decision Workflow
11.5.6 Allocating SRE Capacity to a Team Decision Workflow
11.5.7 Chaos Engineering Hypotheses Selection Workflow
11.6 Summary
Chapter 12 Implementing Organizational Structure
12.1 SRE Principles Versus Organizational Structure
12.2 Who Builds It, Who Runs It?
12.2.1 “Who Builds It, Who Runs It?” Spectrum
12.2.2 Hybrid Models
12.2.3 Reliability Incentives
12.2.4 Model Comparison Criteria
12.2.5 Model Comparison
12.3 You Build It, You Run It
12.4 You Build It, You and SRE Run It
12.4.1 SRE Team Within the Development Organization
12.4.2 SRE Team Within the Operations Organization
12.4.3 SRE Team in a Dedicated SRE Organization
12.4.4 Comparison
12.4.5 SRE Team Incentives, Identity, and Pride
12.4.6 SRE Team Head Count and Budget
12.4.7 SRE Team Cost Accounting
12.4.8 SRE Team KPIs
12.5 You Build It, SRE Run It
12.5.1 SRE Team Within a Development Organization
12.5.2 SRE Team Within an Operations Organization
12.5.3 SRE Team in a Dedicated SRE Organization
12.6 Cost Optimization
12.7 Team Topologies
12.7.1 Reporting Lines
12.7.2 SRE Identity Triangle
12.7.3 Holacracy: No Reporting Lines
12.8 Choosing a Model
12.8.1 Model Transformation Options
12.8.2 Decision Dimensions
12.8.3 Reporting Options
12.8.4 Positioning the SRE Organization
12.8.5 Conveying the Value to Executives
12.9 A New Role: SRE
12.9.1 Why Is a New Role Needed?
12.9.2 Role Definition
12.9.3 Role Naming
12.9.4 Role Assignment
12.9.5 Role Fulfillment
12.10 SRE Career Path
12.10.1 SRE Role Progressions
12.10.2 SRE Role Transitions
12.10.3 Cultural Importance
12.11 Communicating the Chosen Model
12.12 Introducing the Chosen Model
12.12.1 Organization Changes
12.12.2 Reporting Structure Changes
12.12.3 Role Changes
12.13 Summary
Part III: Measuring and Sustaining the Transformation
Chapter 13 Measuring the SRE Transformation
13.1 Testing Transformation Hypotheses
13.2 Outages Not Detected Internally
13.3 Services Exhausting Error Budgets Prematurely
13.4 Executives’ Perceptions
13.5 Reliability Perception by Users and Partners
13.6 Summary
Chapter 14 Sustaining the SRE Movement
14.1 Maturing the SRE CoP
14.2 SRE Minutes
14.3 Availability Newsletter
14.4 SRE Column in the Engineering Blog
14.5 Promote Long-Form SRE Wiki Articles
14.6 SRE Broadcasting
14.7 Combining SRE and CD Indicators
14.7.1 CD Versus SRE Indicators
14.7.2 Bottleneck Analysis
14.8 SRE Feedback Loops
14.9 New Hypotheses
14.10 Providing Learning Opportunities
14.11 Supporting SRE Coaches
14.12 Summary
Chapter 15 The Road Ahead
15.1 Service Catalog
15.2 SLAs
15.3 Regulatory Compliance
15.4 SRE Infrastructure
15.5 Game Days
Appendix Topics for Quick Reference
Index A B C D E F G H I J K L M N O P Q R S T U V W X Y Z