Edition: 2nd
Author: Alex Holmes
Series:
ISBN: 1617292222
Publisher: Manning Publications
Publication year: 2014
Number of pages: 513
Language: English
File format: PDF (can be converted to EPUB or AZW3 at the user's request)
File size: 9 MB
If you would like the Hadoop in Practice file converted to PDF, EPUB, AZW3, MOBI, or DJVU, let support know and they will convert it for you.
Please note that Hadoop in Practice is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not carry any books translated into or written in Persian.
Hadoop in Practice, Second Edition

brief contents
contents
preface
acknowledgments
about this book
  Roadmap
  What's new in the second edition?
  Getting help
  Code conventions and downloads
  Third-party libraries
  Datasets
    NASDAQ financial stocks
    Apache log data
    Names
  Author Online
about the cover illustration

Part 1: Background and fundamentals

Chapter 1: Hadoop in a heartbeat
1.1 What is Hadoop?
  1.1.1 Core Hadoop components
  1.1.2 The Hadoop ecosystem
  1.1.3 Hardware requirements
  1.1.4 Hadoop distributions
  1.1.5 Who's using Hadoop?
  1.1.6 Hadoop limitations
1.2 Getting your hands dirty with MapReduce
1.3 Chapter summary

Chapter 2: Introduction to YARN
2.1 YARN overview
  2.1.1 Why YARN?
  2.1.2 YARN concepts and components
  2.1.3 YARN configuration
    Technique 1: Determining the configuration of your cluster
  2.1.4 Interacting with YARN
    Technique 2: Running a command on your YARN cluster
    Technique 3: Accessing container logs
    Technique 4: Aggregating container log files
  2.1.5 YARN challenges
2.2 YARN and MapReduce
  2.2.1 Dissecting a YARN MapReduce application
  2.2.2 Configuration
  2.2.3 Backward compatibility
    Technique 5: Writing code that works on Hadoop versions 1 and 2
  2.2.4 Running a job
    Technique 6: Using the command line to run a job
  2.2.5 Monitoring running jobs and viewing archived jobs
  2.2.6 Uber jobs
    Technique 7: Running small MapReduce jobs
2.3 YARN applications
  2.3.1 NoSQL
  2.3.2 Interactive SQL
  2.3.3 Graph processing
  2.3.4 Real-time data processing
  2.3.5 Bulk synchronous parallel
  2.3.6 MPI
  2.3.7 In-memory
  2.3.8 DAG execution
2.4 Chapter summary

Part 2: Data logistics

Chapter 3: Data serialization—working with text and beyond
3.1 Understanding inputs and outputs in MapReduce
  3.1.1 Data input
  3.1.2 Data output
3.2 Processing common serialization formats
  3.2.1 XML
    Technique 8: MapReduce and XML
  3.2.2 JSON
    Technique 9: MapReduce and JSON
3.3 Big data serialization formats
  3.3.1 Comparing SequenceFile, Protocol Buffers, Thrift, and Avro
  3.3.2 SequenceFile
    Technique 10: Working with SequenceFiles
    Technique 11: Using SequenceFiles to encode Protocol Buffers
  3.3.3 Protocol Buffers
  3.3.4 Thrift
  3.3.5 Avro
    Technique 12: Avro's schema and code generation
    Technique 13: Selecting the appropriate way to use Avro in MapReduce
    Technique 14: Mixing Avro and non-Avro data in MapReduce
    Technique 15: Using Avro records in MapReduce
    Technique 16: Using Avro key/value pairs in MapReduce
    Technique 17: Controlling how sorting works in MapReduce
    Technique 18: Avro and Hive
    Technique 19: Avro and Pig
3.4 Columnar storage
  3.4.1 Understanding object models and storage formats
  3.4.2 Parquet and the Hadoop ecosystem
  3.4.3 Parquet block and page sizes
    Technique 20: Reading Parquet files via the command line
    Technique 21: Reading and writing Avro data in Parquet with Java
    Technique 22: Parquet and MapReduce
    Technique 23: Parquet and Hive/Impala
    Technique 24: Pushdown predicates and projection with Parquet
  3.4.4 Parquet limitations
3.5 Custom file formats
  3.5.1 Input and output formats
    Technique 25: Writing input and output formats for CSV
  3.5.2 The importance of output committing
3.6 Chapter summary

Chapter 4: Organizing and optimizing data in HDFS
4.1 Data organization
  4.1.1 Directory and file layout
  4.1.2 Data tiers
  4.1.3 Partitioning
    Technique 26: Using MultipleOutputs to partition your data
    Technique 27: Using a custom MapReduce partitioner
  4.1.4 Compacting
    Technique 28: Using filecrush to compact data
    Technique 29: Using Avro to store multiple small binary files
  4.1.5 Atomic data movement
4.2 Efficient storage with compression
    Technique 30: Picking the right compression codec for your data
    Technique 31: Compression with HDFS, MapReduce, Pig, and Hive
    Technique 32: Splittable LZOP with MapReduce, Hive, and Pig
4.3 Chapter summary

Chapter 5: Moving data into and out of Hadoop
5.1 Key elements of data movement
5.2 Moving data into Hadoop
  5.2.1 Roll your own ingest
    Technique 33: Using the CLI to load files
    Technique 34: Using REST to load files
    Technique 35: Accessing HDFS from behind a firewall
    Technique 36: Mounting Hadoop with NFS
    Technique 37: Using DistCp to copy data within and between clusters
    Technique 38: Using Java to load files
  5.2.2 Continuous movement of log and binary files into HDFS
    Technique 39: Pushing system log messages into HDFS with Flume
    Technique 40: An automated mechanism to copy files into HDFS
    Technique 41: Scheduling regular ingress activities with Oozie
  5.2.3 Databases
    Technique 42: Using Sqoop to import data from MySQL
  5.2.4 HBase
    Technique 43: HBase ingress into HDFS
    Technique 44: MapReduce with HBase as a data source
  5.2.5 Importing data from Kafka
    Technique 45: Using Camus to copy Avro data from Kafka into HDFS
5.3 Moving data out of Hadoop
  5.3.1 Roll your own egress
    Technique 46: Using the CLI to extract files
    Technique 47: Using REST to extract files
    Technique 48: Reading from HDFS when behind a firewall
    Technique 49: Mounting Hadoop with NFS
    Technique 50: Using DistCp to copy data out of Hadoop
    Technique 51: Using Java to extract files
  5.3.2 Automated file egress
    Technique 52: An automated mechanism to export files from HDFS
  5.3.3 Databases
    Technique 53: Using Sqoop to export data to MySQL
  5.3.4 NoSQL
5.4 Chapter summary

Part 3: Big data patterns

Chapter 6: Applying MapReduce patterns to big data
6.1 Joining
    Technique 54: Picking the best join strategy for your data
    Technique 55: Filters, projections, and pushdowns
  6.1.1 Map-side joins
    Technique 56: Joining data where one dataset can fit into memory
    Technique 57: Performing a semi-join on large datasets
    Technique 58: Joining on presorted and prepartitioned data
  6.1.2 Reduce-side joins
    Technique 59: A basic repartition join
    Technique 60: Optimizing the repartition join
    Technique 61: Using Bloom filters to cut down on shuffled data
  6.1.3 Data skew in reduce-side joins
    Technique 62: Joining large datasets with high join-key cardinality
    Technique 63: Handling skews generated by the hash partitioner
6.2 Sorting
  6.2.1 Secondary sort
    Technique 64: Implementing a secondary sort
  6.2.2 Total order sorting
    Technique 65: Sorting keys across multiple reducers
6.3 Sampling
    Technique 66: Writing a reservoir-sampling InputFormat
6.4 Chapter summary

Chapter 7: Utilizing data structures and algorithms at scale
7.1 Modeling data and solving problems with graphs
  7.1.1 Modeling graphs
  7.1.2 Shortest-path algorithm
    Technique 67: Find the shortest distance between two users
  7.1.3 Friends-of-friends algorithm
    Technique 68: Calculating FoFs
  7.1.4 Using Giraph to calculate PageRank over a web graph
    Technique 69: Calculate PageRank over a web graph
7.2 Bloom filters
    Technique 70: Parallelized Bloom filter creation in MapReduce
7.3 HyperLogLog
  7.3.1 A brief introduction to HyperLogLog
    Technique 71: Using HyperLogLog to calculate unique counts
7.4 Chapter summary

Chapter 8: Tuning, debugging, and testing
8.1 Measure, measure, measure
8.2 Tuning MapReduce
  8.2.1 Common inefficiencies in MapReduce jobs
    Technique 72: Viewing job statistics
  8.2.2 Map optimizations
    Technique 73: Data locality
    Technique 74: Dealing with a large number of input splits
    Technique 75: Generating input splits in the cluster with YARN
  8.2.3 Shuffle optimizations
    Technique 76: Using the combiner
    Technique 77: Blazingly fast sorting with binary comparators
    Technique 78: Tuning the shuffle internals
  8.2.4 Reducer optimizations
    Technique 79: Too few or too many reducers
  8.2.5 General tuning tips
    Technique 80: Using stack dumps to discover unoptimized user code
    Technique 81: Profiling your map and reduce tasks
8.3 Debugging
  8.3.1 Accessing container log output
    Technique 82: Examining task logs
  8.3.2 Accessing container start scripts
    Technique 83: Figuring out the container startup command
  8.3.3 Debugging OutOfMemory errors
    Technique 84: Force container JVMs to generate a heap dump
  8.3.4 MapReduce coding guidelines for effective debugging
    Technique 85: Augmenting MapReduce code for better debugging
8.4 Testing MapReduce jobs
  8.4.1 Essential ingredients for effective unit testing
  8.4.2 MRUnit
    Technique 86: Using MRUnit to unit-test MapReduce
  8.4.3 LocalJobRunner
    Technique 87: Heavyweight job testing with the LocalJobRunner
  8.4.4 MiniMRYarnCluster
    Technique 88: Using MiniMRYarnCluster to test your jobs
  8.4.5 Integration and QA testing
8.5 Chapter summary

Part 4: Beyond MapReduce

Chapter 9: SQL on Hadoop
9.1 Hive
  9.1.1 Hive basics
  9.1.2 Reading and writing data
    Technique 89: Working with text files
    Technique 90: Exporting data to local disk
  9.1.3 User-defined functions in Hive
    Technique 91: Writing UDFs
  9.1.4 Hive performance
    Technique 92: Partitioning
    Technique 93: Tuning Hive joins
9.2 Impala
  9.2.1 Impala vs. Hive
  9.2.2 Impala basics
    Technique 94: Working with text
    Technique 95: Working with Parquet
    Technique 96: Refreshing metadata
  9.2.3 User-defined functions in Impala
    Technique 97: Executing Hive UDFs in Impala
9.3 Spark SQL
    Technique 98: Calculating stock averages with Spark SQL
    Technique 99: Language-integrated queries
    Technique 100: Hive and Spark SQL
  9.3.1 Spark 101
  9.3.2 Spark on Hadoop
  9.3.3 SQL with Spark
9.4 Chapter summary

Chapter 10: Writing a YARN application
10.1 Fundamentals of building a YARN application
  10.1.1 Actors
  10.1.2 The mechanics of a YARN application
10.2 Building a YARN application to collect cluster statistics
    Technique 101: A bare-bones YARN client
    Technique 102: A bare-bones ApplicationMaster
    Technique 103: Running the application and accessing logs
    Technique 104: Debugging using an unmanaged application master
10.3 Additional YARN application capabilities
  10.3.1 RPC between components
  10.3.2 Service discovery
  10.3.3 Checkpointing application progress
  10.3.4 Avoiding split-brain
  10.3.5 Long-running applications
  10.3.6 Security
10.4 YARN programming abstractions
  10.4.1 Twill
  10.4.2 Spring
  10.4.3 REEF
  10.4.4 Picking a YARN API abstraction
10.5 Chapter summary

appendix: Installing Hadoop and friends
A.1 Code for the book
A.2 Recommended Java versions
A.3 Hadoop
  Apache tarball installation
  Hadoop 1.x UI ports
  Hadoop 2.x UI ports
A.4 Flume
  Getting more information
  Installation on Apache Hadoop 1.x systems
  Installation on Apache Hadoop 2.x systems
A.5 Oozie
  Getting more information
  Installation on Hadoop 1.x systems
  Installation on Hadoop 2.x systems
A.6 Sqoop
  Getting more information
  Installation
A.7 HBase
  Getting more information
  Installation
A.8 Kafka
  Getting more information
  Installation
A.9 Camus
  Getting more information
  Installation on Hadoop 1
  Installation on Hadoop 2
A.10 Avro
  Getting more information
  Installation
A.11 Apache Thrift
  Getting more information
  Building Thrift 0.7
A.12 Protocol Buffers
  Getting more information
  Building Protocol Buffers
A.13 Snappy
  Getting more information
A.14 LZOP
  Getting more information
  Building LZOP
A.15 Elephant Bird
  Getting more information
A.16 Hive
  Getting more information
  Installation
A.17 R
  Getting more information
  Installation on Red Hat–based systems
  Installation on non–Red Hat systems
A.18 RHadoop
  Getting more information
  rmr/rhdfs installation
A.19 Mahout
  Getting more information
  Installation

index