Download the book Frank Kane's Taming Big Data with Apache Spark and Python

Book details

Frank Kane's Taming Big Data with Apache Spark and Python

Category: Programming
Authors: Frank Kane
ISBN: 1787287947, 9781787287945
Publisher: Packt Publishing
Year of publication: 2017
Number of pages: 289
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 at the user's request)
File size: 14 MB

Price (toman): 36,000

Number of ratings: 5

If you need Frank Kane's Taming Big Data with Apache Spark and Python converted to PDF, EPUB, AZW3, MOBI, or DJVU, let support know and they will convert the file for you.

Please note that Frank Kane's Taming Big Data with Apache Spark and Python is the original-language edition and has not been translated into Persian. The International Library website offers original-language books only and does not offer any book translated into or written in Persian.

About the book (in the original language)

Table of contents

Cover
Copyright
Credits
About the Author
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Getting Started with Spark
	Getting set up - installing Python, a JDK, and Spark and its dependencies
		Installing Enthought Canopy
		Installing the Java Development Kit
		Installing Spark
		Running Spark code
	Installing the MovieLens movie rating dataset
	Run your first Spark program - the ratings histogram example
		Examining the ratings counter script
		Running the ratings counter script
	Summary
Chapter 2: Spark Basics and Spark Examples
	What is Spark?
		Spark is scalable
		Spark is fast
		Spark is hot
		Spark is not that hard
		Components of Spark
		Using Python with Spark
	The Resilient Distributed Dataset (RDD)
		What is the RDD?
			The SparkContext object
			Creating RDDs
			Transforming RDDs
			Map example
			RDD actions
	Ratings histogram walk-through
		Understanding the code
			Setting up the SparkContext object
			Loading the data
			Extract (MAP) the data we care about
			Perform an action - count by value
			Sort and display the results
		Looking at the ratings-counter script in Canopy
	Key/value RDDs and the average friends by age example
		Key/value concepts - RDDs can hold key/value pairs
			Creating a key/value RDD
			What Spark can do with key/value data?
			Mapping the values of a key/value RDD
		The friends by age example
			Parsing (mapping) the input data
			Counting up the sum of friends and number of entries per age
			Compute averages
			Collect and display the results
	Running the average friends by age example
		Examining the script
		Running the code
	Filtering RDDs and the minimum temperature by location example
		What is filter()
			The source data for the minimum temperature by location example
			Parse (map) the input data
			Filter out all but the TMIN entries
			Create (station ID, temperature) key/value pairs
			Find minimum temperature by station ID
			Collect and print results
	Running the minimum temperature example and modifying it for maximums
		Examining the min-temperatures script
		Running the script
	Running the maximum temperature by location example
	Counting word occurrences using flatmap()
		Map versus flatmap
			Map ()
			Flatmap ()
		Code sample - count the words in a book
	Improving the word-count script with regular expressions
		Text normalization
		Examining the use of regular expressions in the word-count script
		Running the code
	Sorting the word count results
		Step 1 - Implement countByValue() the hard way to create a new RDD
		Step 2 - Sort the new RDD
		Examining the script
		Running the code
	Find the total amount spent by customer
		Introducing the problem
		Strategy for solving the problem
		Useful snippets of code
	Check your results and sort them by the total amount spent
	Check your sorted implementation and results against mine
	Summary
Chapter 3: Advanced Examples of Spark Programs
	Finding the most popular movie
		Examining the popular-movies script
		Getting results
	Using broadcast variables to display movie names instead of ID numbers
		Introducing broadcast variables
			Examining the popular-movies-nicer.py script
		Getting results
	Finding the most popular superhero in a social graph
		Superhero social networks
			Input data format
		Strategy
	Running the script - discover who the most popular superhero is
		Mapping input data to (hero ID, number of co-occurrences) per line
		Adding up co-occurrence by hero ID
		Flipping the (map) RDD to (number, hero ID)
		Using max() and looking up the name of the winner
		Getting results
	Superhero degrees of separation - introducing the breadth-first search algorithm
		Degrees of separation
			How the breadth-first search algorithm works?
			The initial condition of our social graph
			First pass through the graph
			Second pass through the graph
			Third pass through the graph
			Final pass through the graph
	Accumulators and implementing BFS in Spark
		Convert the input file into structured data
			Writing code to convert Marvel-Graph.txt to BFS nodes
		Iteratively process the RDD
			Using a mapper and a reducer
			How do we know when we're done?
	Superhero degrees of separation - review the code and run it
		Setting up an accumulator and using the convert to BFS function
			Calling flatMap()
			Calling an action
			Calling reduceByKey
		Getting results
	Item-based collaborative filtering in Spark, cache(), and persist()
		How does item-based collaborative filtering work?
		Making item-based collaborative filtering a Spark problem
		It's getting real
			Caching RDDs
	Running the similar-movies script using Spark's cluster manager
		Examining the script
		Getting results
	Improving the quality of the similar movies example
	Summary
Chapter 4: Running Spark on a Cluster
	Introducing Elastic MapReduce
		Why use Elastic MapReduce?
		Warning - Spark on EMR is not cheap
	Setting up our Amazon Web Services / Elastic MapReduce account and PuTTY
	Partitioning
		Using .partitionBy()
			Choosing a partition size
	Creating similar movies from one million ratings - part 1
		Changes to the script
	Creating similar movies from one million ratings - part 2
		Our strategy
			Specifying memory per executor
			Specifying a cluster manager
			Running on a cluster
		Setting up to run the movie-similarities-1m.py script on a cluster
			Preparing the script
			Creating a cluster
			Connecting to the master node using SSH
			Running the code
	Creating similar movies from one million ratings - part 3
		Assessing the results
		Terminating the cluster
	Troubleshooting Spark on a cluster
	More troubleshooting and managing dependencies
		Troubleshooting
			Managing dependencies
	Summary
Chapter 5: SparkSQL, DataFrames, and DataSets
	Introducing SparkSQL
		Using SparkSQL in Python
			More things you can do with DataFrames
		Differences between DataFrames and DataSets
		Shell access in SparkSQL
		User-defined functions (UDFs)
	Executing SQL commands and SQL-style functions on a DataFrame
		Using SQL-style functions instead of queries
	Using DataFrames instead of RDDs
	Summary
Chapter 6: Other Spark Technologies and Libraries
	Introducing MLlib
		MLlib capabilities
			Special MLlib data types
			For more information on machine learning
		Making movie recommendations
	Using MLlib to produce movie recommendations
		Examining the movie-recommendations-als.py script
	Analyzing the ALS recommendations results
		Why did we get bad results?
	Using DataFrames with MLlib
		Examining the spark-linear-regression.py script
		Getting results
	Spark Streaming and GraphX
		What is Spark Streaming?
		GraphX
	Summary
Chapter 7: Where to Go From Here? – Learning More About Spark and Data Science
Index