Category: Programming
Author: Frank Kane
ISBN: 1787287947, 9781787287945
Publisher: Packt Publishing
Year of publication: 2017
Pages: 289
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 14 MB
If you need the file of Frank Kane's Taming Big Data with Apache Spark and Python in PDF, EPUB, AZW3, MOBI, or DJVU format, let support know and they will convert it for you.
Note: Frank Kane's Taming Big Data with Apache Spark and Python is offered here in its original-language (English) edition, not as a Persian translation. The International Library website provides books only in their original language and does not offer any books translated into or written in Persian.
Table of contents:

Cover
Copyright
Credits
About the Author
www.PacktPub.com
Customer Feedback
Table of Contents
Preface

Chapter 1: Getting Started with Spark
  Getting set up - installing Python, a JDK, and Spark and its dependencies
  Installing Enthought Canopy
  Installing the Java Development Kit
  Installing Spark
  Running Spark code
  Installing the MovieLens movie rating dataset
  Run your first Spark program - the ratings histogram example
  Examining the ratings counter script
  Running the ratings counter script
  Summary

Chapter 2: Spark Basics and Spark Examples
  What is Spark?
  Spark is scalable
  Spark is fast
  Spark is hot
  Spark is not that hard
  Components of Spark
  Using Python with Spark
  The Resilient Distributed Dataset (RDD)
  What is the RDD?
  The SparkContext object
  Creating RDDs
  Transforming RDDs
  Map example
  RDD actions
  Ratings histogram walk-through
  Understanding the code
  Setting up the SparkContext object
  Loading the data
  Extract (MAP) the data we care about
  Perform an action - count by value
  Sort and display the results
  Looking at the ratings-counter script in Canopy
  Key/value RDDs and the average friends by age example
  Key/value concepts - RDDs can hold key/value pairs
  Creating a key/value RDD
  What Spark can do with key/value data?
  Mapping the values of a key/value RDD
  The friends by age example
  Parsing (mapping) the input data
  Counting up the sum of friends and number of entries per age
  Compute averages
  Collect and display the results
  Running the average friends by age example
  Examining the script
  Running the code
  Filtering RDDs and the minimum temperature by location example
  What is filter()
  The source data for the minimum temperature by location example
  Parse (map) the input data
  Filter out all but the TMIN entries
  Create (station ID, temperature) key/value pairs
  Find minimum temperature by station ID
  Collect and print results
  Running the minimum temperature example and modifying it for maximums
  Examining the min-temperatures script
  Running the script
  Running the maximum temperature by location example
  Counting word occurrences using flatmap()
  Map versus flatmap
  Map()
  Flatmap()
  Code sample - count the words in a book
  Improving the word-count script with regular expressions
  Text normalization
  Examining the use of regular expressions in the word-count script
  Running the code
  Sorting the word count results
  Step 1 - Implement countByValue() the hard way to create a new RDD
  Step 2 - Sort the new RDD
  Examining the script
  Running the code
  Find the total amount spent by customer
  Introducing the problem
  Strategy for solving the problem
  Useful snippets of code
  Check your results and sort them by the total amount spent
  Check your sorted implementation and results against mine
  Summary

Chapter 3: Advanced Examples of Spark Programs
  Finding the most popular movie
  Examining the popular-movies script
  Getting results
  Using broadcast variables to display movie names instead of ID numbers
  Introducing broadcast variables
  Examining the popular-movies-nicer.py script
  Getting results
  Finding the most popular superhero in a social graph
  Superhero social networks
  Input data format
  Strategy
  Running the script - discover who the most popular superhero is
  Mapping input data to (hero ID, number of co-occurrences) per line
  Adding up co-occurrence by hero ID
  Flipping the (map) RDD to (number, hero ID)
  Using max() and looking up the name of the winner
  Getting results
  Superhero degrees of separation - introducing the breadth-first search algorithm
  Degrees of separation
  How the breadth-first search algorithm works?
  The initial condition of our social graph
  First pass through the graph
  Second pass through the graph
  Third pass through the graph
  Final pass through the graph
  Accumulators and implementing BFS in Spark
  Convert the input file into structured data
  Writing code to convert Marvel-Graph.txt to BFS nodes
  Iteratively process the RDD
  Using a mapper and a reducer
  How do we know when we're done?
  Superhero degrees of separation - review the code and run it
  Setting up an accumulator and using the convert to BFS function
  Calling flatMap()
  Calling an action
  Calling reduceByKey
  Getting results
  Item-based collaborative filtering in Spark, cache(), and persist()
  How does item-based collaborative filtering work?
  Making item-based collaborative filtering a Spark problem
  It's getting real
  Caching RDDs
  Running the similar-movies script using Spark's cluster manager
  Examining the script
  Getting results
  Improving the quality of the similar movies example
  Summary

Chapter 4: Running Spark on a Cluster
  Introducing Elastic MapReduce
  Why use Elastic MapReduce?
  Warning - Spark on EMR is not cheap
  Setting up our Amazon Web Services / Elastic MapReduce account and PuTTY
  Partitioning
  Using .partitionBy()
  Choosing a partition size
  Creating similar movies from one million ratings - part 1
  Changes to the script
  Creating similar movies from one million ratings - part 2
  Our strategy
  Specifying memory per executor
  Specifying a cluster manager
  Running on a cluster
  Setting up to run the movie-similarities-1m.py script on a cluster
  Preparing the script
  Creating a cluster
  Connecting to the master node using SSH
  Running the code
  Creating similar movies from one million ratings - part 3
  Assessing the results
  Terminating the cluster
  Troubleshooting Spark on a cluster
  More troubleshooting and managing dependencies
  Troubleshooting
  Managing dependencies
  Summary

Chapter 5: SparkSQL, DataFrames, and DataSets
  Introducing SparkSQL
  Using SparkSQL in Python
  More things you can do with DataFrames
  Differences between DataFrames and DataSets
  Shell access in SparkSQL
  User-defined functions (UDFs)
  Executing SQL commands and SQL-style functions on a DataFrame
  Using SQL-style functions instead of queries
  Using DataFrames instead of RDDs
  Summary

Chapter 6: Other Spark Technologies and Libraries
  Introducing MLlib
  MLlib capabilities
  Special MLlib data types
  For more information on machine learning
  Making movie recommendations
  Using MLlib to produce movie recommendations
  Examining the movie-recommendations-als.py script
  Analyzing the ALS recommendations results
  Why did we get bad results?
  Using DataFrames with MLlib
  Examining the spark-linear-regression.py script
  Getting results
  Spark Streaming and GraphX
  What is Spark Streaming?
  GraphX
  Summary

Chapter 7: Where to Go From Here? - Learning More About Spark and Data Science

Index