Authors: Ian Buss, Lars George, Jan Kunigk, Paul Wilkinson
ISBN: 9781491969274, 1491969245978
Publisher: O’Reilly Media, Inc.
Year: 2019
Pages: 605 Se [633]
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 16 MB
If you need Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert the file for you.
Please note that Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale is the original-language edition and has not been translated into Persian. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
There's a lot of information about big data technologies, but splicing these technologies into an end-to-end enterprise data platform is a daunting task not widely covered. With this practical book, you'll learn how to build big data infrastructure both on-premises and in the cloud and successfully architect a modern data platform. Ideal for enterprise architects, IT managers, application architects, and data engineers, this book shows you how to overcome the many challenges that emerge during Hadoop projects. You'll explore the vast landscape of tools available in the Hadoop and big data realm in a thorough technical primer before diving into:
Infrastructure: Look at all component layers in a modern data platform, from the server to the data center, to establish a solid foundation for data in your enterprise
Platform: Understand aspects of deployment, operation, security, high availability, and disaster recovery, along with everything you need to know to integrate your platform with the rest of your enterprise IT
Taking Hadoop to the cloud: Learn the important architectural aspects of running a big data platform in the cloud while maintaining enterprise security and high availability
Copyright; Table of Contents; Foreword
Preface
Some Misconceptions; Some General Trends; Horizontal Scaling; Adoption of Open Source; Embracing Cloud Compute; Decoupled Compute and Storage; What Is This Book About?; Who Should Read This Book?; The Road Ahead; Conventions Used in This Book; O’Reilly Safari; How to Contact Us; Acknowledgments
Chapter 1. Big Data Technology Primer
A Tour of the Landscape; Core Components; Computational Frameworks; Analytical SQL Engines; Storage Engines; Ingestion; Orchestration; Summary
Part I. Infrastructure
Chapter 2. Clusters
Reasons for Multiple Clusters; Multiple Clusters for Resiliency; Multiple Clusters for Software Development; Multiple Clusters for Workload Isolation; Multiple Clusters for Legal Separation; Multiple Clusters and Independent Storage and Compute; Multitenancy; Requirements for Multitenancy; Sizing Clusters; Sizing by Storage; Sizing by Ingest Rate; Sizing by Workload; Cluster Growth; The Drivers of Cluster Growth; Implementing Cluster Growth; Data Replication; Replication for Software Development; Replication and Workload Isolation; Summary
Chapter 3. Compute and Storage
Computer Architecture for Hadoop; Commodity Servers; Server CPUs and RAM; Nonuniform Memory Access; CPU Specifications; RAM; Commoditized Storage Meets the Enterprise; Modularity of Compute and Storage; Everything Is Java; Replication or Erasure Coding?; Alternatives; Hadoop and the Linux Storage Stack; User Space; Important System Calls; The Linux Page Cache; Short-Circuit and Zero-Copy Reads; Filesystems; Erasure Coding Versus Replication; Discussion; Guidance; Low-Level Storage; Storage Controllers; Disk Layer; Server Form Factors; Form Factor Comparison; Guidance; Workload Profiles; Cluster Configurations and Node Types; Master Nodes; Worker Nodes; Utility Nodes; Edge Nodes; Small Cluster Configurations; Medium Cluster Configurations; Large Cluster Configurations; Summary
Chapter 4. Networking
How Services Use a Network; Remote Procedure Calls (RPCs); Data Transfers; Monitoring; Backup; Consensus; Network Architectures; Small Cluster Architectures; Medium Cluster Architectures; Large Cluster Architectures; Network Integration; Reusing an Existing Network; Creating an Additional Network; Network Design Considerations; Layer 1 Recommendations; Layer 2 Recommendations; Layer 3 Recommendations; Summary
Chapter 5. Organizational Challenges
Who Runs It?; Is It Infrastructure, Middleware, or an Application?; Case Study: A Typical Business Intelligence Project; The Traditional Approach; Typical Team Setup; Compartmentalization of IT; Revised Team Setup for Hadoop in the Enterprise; Solution Overview with Hadoop; New Team Setup; Split Responsibilities; Do I Need DevOps?; Do I Need a Center of Excellence/Competence?; Summary
Chapter 6. Datacenter Considerations
Why Does It Matter?; Basic Datacenter Concepts; Cooling; Power; Network; Rack Awareness and Rack Failures; Failure Domain Alignment; Space and Racking Constraints; Ingest and Intercluster Connectivity; Software; Hardware; Replacements and Repair; Operational Procedures; Typical Pitfalls; Networking; Cluster Spanning; Summary
Part II. Platform
Chapter 7. Provisioning Clusters
Operating Systems; OS Choices; OS Configuration for Hadoop; Automated Configuration Example; Service Databases; Required Databases; Database Integration Options; Database Considerations; Hadoop Deployment; Hadoop Distributions; Installation Choices; Distribution Architecture; Installation Process; Summary
Chapter 8. Platform Validation
Testing Methodology; Useful Tools; Hardware Validation; CPU; Disks; Network; Hadoop Validation; HDFS Validation; General Validation; Validating Other Components; Operations Validation; Summary
Chapter 9. Security
In-Flight Encryption; TLS Encryption; SASL Quality of Protection; Enabling in-Flight Encryption; Authentication; Kerberos; LDAP Authentication; Delegation Tokens; Impersonation; Authorization; Group Resolution; Superusers and Supergroups; Hadoop Service Level Authorization; Centralized Security Management; HDFS; YARN; ZooKeeper; Hive; Impala; HBase; Solr; Kudu; Oozie; Hue; Kafka; Sentry; At-Rest Encryption; Volume Encryption with Cloudera Navigator Encrypt and Key Trustee Server; HDFS Transparent Data Encryption; Encrypting Temporary Files; Summary
Chapter 10. Integration with Identity Management Providers
Integration Areas; Integration Scenarios; Scenario 1: Writing a File to HDFS; Scenario 2: Submitting a Hive Query; Scenario 3: Running a Spark Job; Integration Providers; LDAP Integration; Background; LDAP Security; Load Balancing; Application Integration; Linux Integration; Kerberos Integration; Kerberos Clients; KDC Integration; Certificate Management; Signing Certificates; Converting Certificates; Wildcard Certificates; Automation; Summary
Chapter 11. Accessing and Interacting with Clusters
Access Mechanisms; Programmatic Access; Command-Line Access; Web UIs; Access Topologies; Interaction Patterns; Proxy Access; Load Balancing; Edge Node Interactions; Access Security; Administration Gateways; Workbenches; Hue; Notebooks; Landing Zones; Summary
Chapter 12. High Availability
High Availability Defined; Lateral/Service HA; Vertical/Systemic HA; Measuring Availability; Percentages; Percentiles; Operating for HA; Monitoring; Playbooks and Postmortems; HA Building Blocks; Quorums; Load Balancing; Database HA; Ancillary Services; General Considerations; Separation of Master and Worker Processes; Separation of Identical Service Roles; Master Servers in Separate Failure Domains; Balanced Master Configurations; Optimized Server Configurations; High Availability of Cluster Services; ZooKeeper; HDFS; YARN; HBase; KMS; Hive; Impala; Solr; Kafka; Oozie; Hue; Other Services; Autoconfiguration; Summary
Chapter 13. Backup and Disaster Recovery
Context; Many Distributed Systems; Policies and Objectives; Failure Scenarios; Suitable Data Sources; Strategies; Data Types; Consistency; Validation; Summary; Data Replication; HBase; Cluster Management Tools; Kafka; Summary; Hadoop Cluster Backups; Subsystems; Case Study: Automating Backups with Oozie; Restore; Summary
Part III. Taking Hadoop to the Cloud
Chapter 14. Basics of Virtualization for Hadoop
Compute Virtualization; Virtual Machine Distribution; Anti-Affinity Groups; Storage Virtualization; Virtualizing Local Storage; SANs; Object Storage and Network-Attached Storage; Network Virtualization; Cluster Life Cycle Models; Summary
Chapter 15. Solutions for Private Clouds
OpenStack; Automation and Integration; Life Cycle and Storage; Isolation; Summary; OpenShift; Automation; Life Cycle and Storage; Isolation; Summary; VMware and Pivotal Cloud Foundry; Do It Yourself?; Automation; Isolation; Life Cycle Model; Summary; Object Storage for Private Clouds; EMC Isilon; Ceph; Summary
Chapter 16. Solutions in the Public Cloud
Key Things to Know; Cloud Providers; AWS; Microsoft Azure; Google Cloud Platform; Implementing Clusters; Instances; Storage and Life Cycle Models; Network Architecture; High Availability; Summary
Chapter 17. Automated Provisioning
Long-Lived Clusters; Configuration and Templating; Deployment Phases; Vendor Solutions; One-Click Deployments; Homegrown Automation; Hooking Into a Provisioning Life Cycle; Scaling Up and Down; Deploying with Security; Transient Clusters; Sharing Metadata Services; Summary
Chapter 18. Security in the Cloud
Assessing the Risk; Risk Model; Environmental Risks; Deployment Risks; Identity Provider Options for Hadoop; Option A: Cloud-Only Self-Contained ID Services; Option B: Cloud-Only Shared ID Services; Option C: On-Premises ID Services; Object Storage Security and Hadoop; Identity and Access Management; Amazon Simple Storage Service; GCP Cloud Storage; Microsoft Azure; Auditing; Encryption for Data at Rest; Requirements for Key Material; Options for Encryption in the Cloud; On-Premises Key Persistence; Encryption via the Cloud Provider; Encryption Feature and Interoperability Summary; Recommendations and Summary for Cloud Encryption; Encrypting Data in Flight in the Cloud; Perimeter Controls and Firewalling; GCP; AWS; Azure; Summary
Appendix A. Backup Onboarding Checklist
Backup Onboarding Checklist; Backup Services; Cloudera Manager; HDFS; HBase; Hive/Impala; Sqoop; Oozie; Hue; Sentry
Index; About the Authors; Colophon