Edition:
Author: Jonathan Rioux
Series:
ISBN: 9781617297205, 1196192318
Publisher: Manning
Year of publication: 2022
Number of pages: 425
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 on request)
File size: 3 MB
If you would like the file of Data Analysis with Python and PySpark converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert the file for you.
Please note that Data Analysis with Python and PySpark is the original English-language edition, not a Persian translation. The International Library website only offers books in their original language and does not provide any titles translated into or written in Persian.
Data Analysis with Python and PySpark is a carefully engineered tutorial that helps you use PySpark to deliver your data-driven applications at any scale. This clear and hands-on guide shows you how to enlarge your processing capabilities across multiple machines with data from any source, ranging from Hadoop-based clusters to Excel worksheets. You’ll learn how to break down big analysis tasks into manageable chunks and how to choose and use the best PySpark data abstraction for your unique needs. By the time you’re done, you’ll be able to write and run incredibly fast PySpark programs that are scalable, efficient to operate, and easy to debug.
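To give a concrete feel for the kind of program described in the blurb and in the early chapters of the table of contents below (reading text, splitting it into words, grouping, and counting), here is a minimal, illustrative PySpark sketch in the spirit of chapters 2 and 3. It is not taken from the book; the file name "sample.txt" and the application name are assumptions for the example, and it assumes PySpark is installed locally.

```python
# A minimal word-frequency sketch using the pyspark.sql data frame API.
# Assumes a plain-text file named "sample.txt" in the working directory.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("word_count_sketch").getOrCreate()

results = (
    spark.read.text("sample.txt")                          # one row per line, column "value"
    .select(F.split(F.col("value"), " ").alias("line"))    # split each line into a list of words
    .select(F.explode(F.col("line")).alias("word"))        # one word per row
    .select(F.lower(F.col("word")).alias("word"))          # normalize case
    .select(F.regexp_extract(F.col("word"), "[a-z']*", 0).alias("word"))  # strip punctuation
    .where(F.col("word") != "")                            # drop empty tokens
    .groupby("word")                                       # group identical words
    .count()                                               # count occurrences per word
    .orderBy(F.col("count").desc())                        # most frequent words first
)

results.show(10)
spark.stop()
```

A program like this can be run interactively in the pyspark shell or submitted in batch mode with spark-submit, which is the workflow chapter 3 of the book covers.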
PySpark in Action: Python data analysis at scale MEAP V06
Copyright
Welcome
Brief contents
Chapter 1: Introduction
  1.1 What is PySpark?
    1.1.1 You saw it coming: What is Spark?
    1.1.2 PySpark = Spark + Python
    1.1.3 Why PySpark?
    1.1.4 Your very own factory: how PySpark works
    1.1.5 Some physical planning with the cluster manager
    1.1.6 A factory made efficient through a lazy manager
  1.2 What will you learn in this book?
  1.3 What do I need to get started?
  1.4 Summary
Chapter 2: Your first data program in PySpark
  2.1 Setting up the pyspark shell
    2.1.1 The SparkSession entry-point
    2.1.2 Configuring how chatty spark is: the log level
  2.2 Mapping our program
  2.3 Reading and ingesting data into a data frame
  2.4 Exploring data in the DataFrame structure
    2.4.1 Peeking under the hood: the show() method
  2.5 Moving from a sentence to a list of words
    2.5.1 Selecting specific columns using select()
    2.5.2 Transforming columns: splitting a string into a list of words
    2.5.3 Renaming columns: alias and withColumnRenamed
  2.6 Reshaping your data: exploding a list into rows
  2.7 Working with words: changing case and removing punctuation
  2.8 Filtering rows
  2.9 Summary
  2.10 Exercises
    2.10.1 Exercise 2.1
    2.10.2 Exercise 2.2
    2.10.3 Exercise 2.3
    2.10.4 Exercise 2.4
    2.10.5 Exercise 2.5
Chapter 3: Submitting and scaling your first PySpark program
  3.1 Grouping records: Counting word frequencies
  3.2 Ordering the results on the screen using orderBy
  3.3 Writing data from a data frame
  3.4 Putting it all together: counting
    3.4.1 Simplifying your dependencies with PySpark's import conventions
    3.4.2 Simplifying our program via method chaining
  3.5 Your first non-interactive program: using spark-submit
    3.5.1 Creating your own SparkSession
  3.6 Using spark-submit to launch your program in batch mode
  3.7 What didn't happen in this Chapter
  3.8 Scaling up our word frequency program
  3.9 Summary
  3.10 Exercises
    3.10.1 Exercise 3.1
    3.10.2 Exercise 3.2
    3.10.3 Exercise 3.3
    3.10.4 Exercise 3.4
Chapter 4: Analyzing tabular data with pyspark.sql
  4.1 What is tabular data?
    4.1.1 How does PySpark represent tabular data?
  4.2 PySpark for analyzing and processing tabular data
  4.3 Reading delimited data in PySpark
    4.3.1 Customizing the SparkReader object to read CSV data files
    4.3.2 Exploring the shape of our data universe
  4.4 The basics of data manipulation: diagnosing our centre table
    4.4.1 Knowing what we want: selecting columns
    4.4.2 Keeping what we need: deleting columns
    4.4.3 Creating what's not there: new columns with withColumn()
    4.4.4 Tidying our data frame: renaming and re-ordering columns
    4.4.5 Summarizing your data frame: describe() and summary()
  4.5 Summary
Chapter 5: The data frame through a new lens: joining and grouping
  5.1 From many to one: joining data
    5.1.1 What's what in the world of joins
    5.1.2 Knowing our left from our right
    5.1.3 The rules to a successful join: the predicates
    5.1.4 How do you do it: the join method
    5.1.5 Naming conventions in the joining world
  5.2 Summarizing the data via: groupby and GroupedData
    5.2.1 A simple groupby blueprint
    5.2.2 A column is a column: using agg with custom column definitions
  5.3 Taking care of null values: drop and fill
    5.3.1 Dropping it like it's hot
    5.3.2 Filling values to our heart's content
  5.4 What was our question again: our end-to-end program
  5.5 Summary
  5.6 Exercises
    5.6.1 Exercise 5.4
    5.6.2 Exercise 5.5
    5.6.3 Exercise 5.6
Chapter 6: Making sense of your data: types, structure and semantic
  6.1 Open sesame: what does your data tell you?
  6.2 The first step in understanding our data: PySpark's scalar types
    6.2.1 String and bytes
    6.2.2 The numerical tower(s): integer values
    6.2.3 The numerical tower(s): double, floats and decimals
    6.2.4 Date and timestamp
    6.2.5 Null and boolean
  6.3 PySpark's complex types
    6.3.1 Complex types: the array
    6.3.2 Complex types: the map
  6.4 Structure and type: The dual-nature of the struct
    6.4.1 A data frame is an ordered collection of columns
    6.4.2 The second dimension: just enough about the row
    6.4.3 Casting your way to sanity
    6.4.4 Defaulting values with fillna
  6.5 Summary
  6.6 Exercises
    6.6.1 Exercise 6.1
    6.6.2 Exercise 6.2
Chapter 7: Bilingual PySpark: blending Python and SQL code
  7.1 Banking on what we know: pyspark.sql vs plain SQL
  7.2 Using SQL queries on a data frame
    7.2.1 Promoting a data frame to a Spark table
    7.2.2 Using the Spark catalog
  7.3 SQL and PySpark
  7.4 Using SQL-like syntax within data frame methods
    7.4.1 Select and where
    7.4.2 Group and order by
    7.4.3 Having
    7.4.4 Create tables/views
    7.4.5 Union and join
    7.4.6 Subqueries and common table expressions
    7.4.7 A quick summary of PySpark vs. SQL syntax
  7.5 Simplifying our code: blending SQL and Python together
    7.5.1 Reading our data
    7.5.2 Using SQL-style expressions in PySpark
  7.6 Conclusion
  7.7 Summary
  7.8 Exercises
    7.8.1 Exercise 7.1
    7.8.2 Exercise 7.2
    7.8.3 Exercise 7.3
    7.8.4 Exercise 7.4
Chapter 8: Extending PySpark with Python: RDD and user-defined-functions
  8.1 PySpark, freestyle: the resilient distributed dataset
    8.1.1 Manipulating data the RDD way: map, filter and reduce
  8.2 Using Python to extend PySpark via user-defined functions
    8.2.1 It all starts with plain Python: using typed Python functions
    8.2.2 From Python function to UDF: two approaches
  8.3 Big data is just a lot of small data: using pandas UDF
    8.3.1 Setting our environment: connectors and libraries
    8.3.2 Preparing our data
    8.3.3 Scalar UDF
    8.3.4 Grouped map UDF
    8.3.5 Grouped aggregate UDF
    8.3.6 Going local to troubleshoot pandas UDF
  8.4 Summary
  8.5 Exercises
    8.5.1 Exercise 8.1
    8.5.2 Exercise 8.2
    8.5.3 Exercise 8.3
    8.5.4 Exercise 8.4
    8.5.5 Exercise 8.5
Appendix A: Solutions to the exercises
Appendix B: Installing PySpark locally
  B.1 Windows
    B.1.1 Install Java
    B.1.2 Install 7-zip
    B.1.3 Download and install Apache Spark
    B.1.4 Install Python
    B.1.5 Launching an iPython REPL and starting PySpark
    B.1.6 (Optional) Install and run Jupyter to use Jupyter notebook
  B.2 macOS
    B.2.1 Install Homebrew
    B.2.2 Install Java and Spark
    B.2.3 Install Anaconda/Python
    B.2.4 Launching an iPython REPL and starting PySpark
    B.2.5 (Optional) Install and run Jupyter to use Jupyter notebook
  B.3 GNU/Linux and WSL
    B.3.1 Install Java
    B.3.2 Installing Spark
    B.3.3 Install Python 3 and IPython
    B.3.4 Launch PySpark with IPython
    B.3.5 (Optional) Install and run Jupyter to use Jupyter notebook