ورود به حساب

نام کاربری گذرواژه

گذرواژه را فراموش کردید؟ کلیک کنید

حساب کاربری ندارید؟ ساخت حساب

ساخت حساب کاربری

نام نام کاربری ایمیل شماره موبایل گذرواژه

برای ارتباط با ما می توانید از طریق شماره موبایل زیر از طریق تماس و پیامک با ما در ارتباط باشید


09117307688
09117179751

در صورت عدم پاسخ گویی از طریق پیامک با پشتیبان در ارتباط باشید

دسترسی نامحدود

برای کاربرانی که ثبت نام کرده اند

ضمانت بازگشت وجه

درصورت عدم همخوانی توضیحات با کتاب

پشتیبانی

از ساعت 7 صبح تا 10 شب

دانلود کتاب Stream Processing with Apache Spark: Mastering Structured Streaming and Spark Streaming

دانلود کتاب پردازش جریان با Apache Spark: تسلط بر جریان ساختاری و جریان جرقه ای

Stream Processing with Apache Spark: Mastering Structured Streaming and Spark Streaming

مشخصات کتاب

Stream Processing with Apache Spark: Mastering Structured Streaming and Spark Streaming

دسته بندی: پایگاه داده ها
ویرایش: 1 
نویسندگان:   
سری:  
ISBN (شابک) : 1491944242, 9781491944240 
ناشر: O’Reilly Media 
سال نشر: 2019 
تعداد صفحات: 0 
زبان: English 
فرمت فایل : EPUB (درصورت درخواست کاربر به PDF، EPUB یا AZW3 تبدیل می شود) 
حجم فایل: 4 مگابایت 

قیمت کتاب (تومان) : 45,000



کلمات کلیدی مربوط به کتاب پردازش جریان با Apache Spark: تسلط بر جریان ساختاری و جریان جرقه ای: رایانش ابری، یادگیری ماشین، تجزیه و تحلیل، اسپارک آپاچی، طوفان آپاچی، نظارت، پردازش جریان، آپاچی کافکا، پردازش دسته‌ای، خوشه‌ها، Apache Flink، Spark SQL، مجموعه داده‌های توزیع‌شده انعطاف‌پذیر، تنظیم عملکرد، جریان اسپارک، پردازش معماری Lambda، معماری، جریان ساخت یافته



ثبت امتیاز به این کتاب

میانگین امتیاز به این کتاب :
       تعداد امتیاز دهندگان : 16


در صورت تبدیل فایل کتاب Stream Processing with Apache Spark: Mastering Structured Streaming and Spark Streaming به فرمت های PDF، EPUB، AZW3، MOBI و یا DJVU می توانید به پشتیبان اطلاع دهید تا فایل مورد نظر را تبدیل نمایند.

توجه داشته باشید کتاب پردازش جریان با Apache Spark: تسلط بر جریان ساختاری و جریان جرقه ای نسخه زبان اصلی می باشد و کتاب ترجمه شده به فارسی نمی باشد. وبسایت اینترنشنال لایبرری ارائه دهنده کتاب های زبان اصلی می باشد و هیچ گونه کتاب ترجمه شده یا نوشته شده به فارسی را ارائه نمی دهد.


توضیحاتی در مورد کتاب پردازش جریان با Apache Spark: تسلط بر جریان ساختاری و جریان جرقه ای

قبل از اینکه بتوانید ابزارهای تحلیلی برای به دست آوردن بینش سریع بسازید، ابتدا باید بدانید که چگونه داده ها را در زمان واقعی پردازش کنید. با این راهنمای عملی، توسعه دهندگان آشنا با Apache Spark یاد خواهند گرفت که چگونه این چارچوب درون حافظه را برای استفاده در جریان داده ها قرار دهند. متوجه خواهید شد که چگونه Spark شما را قادر می‌سازد تا کارهای پخش جریانی را تقریباً به همان روشی که کارهای دسته‌ای می‌نویسید بنویسید. نویسندگان جرارد ماس و فرانسوا گاریلو به شما کمک می کنند تا زیربنای نظری آپاچی اسپارک را کشف کنید. این راهنمای جامع دارای دو بخش است که APIهای جریانی را که Spark اکنون پشتیبانی می‌کند مقایسه و مقایسه می‌کند: کتابخانه اصلی Spark Streaming و API جدیدتر Structured Streaming. • مفاهیم اساسی پردازش جریان را بیاموزید و معماری های مختلف جریان را بررسی کنید • جریان ساختاریافته را از طریق مثال های عملی کاوش کنید. جنبه های مختلف پردازش جریانی را با جزئیات بیاموزید • با Spark Streaming، کارها و برنامه های پخش جریانی را ایجاد و اجرا کنید. Spark Streaming را با سایر APIهای Spark یکپارچه کنید • تکنیک های پیشرفته جریان جرقه، از جمله الگوریتم های تقریبی و الگوریتم های یادگیری ماشینی را بیاموزید • Apache Spark را با سایر پروژه‌های پردازش جریان، از جمله Apache Storm، Apache Flink، و Apache Kafka Streams مقایسه کنید.


توضیحاتی درمورد کتاب به خارجی

Before you can build analytics tools to gain quick insights, you first need to know how to process data in real time. With this practical guide, developers familiar with Apache Spark will learn how to put this in-memory framework to use for streaming data. You’ll discover how Spark enables you to write streaming jobs in almost the same way you write batch jobs. Authors Gerard Maas and François Garillot help you explore the theoretical underpinnings of Apache Spark. This comprehensive guide features two sections that compare and contrast the streaming APIs Spark now supports: the original Spark Streaming library and the newer Structured Streaming API. • Learn fundamental stream processing concepts and examine different streaming architectures • Explore Structured Streaming through practical examples; learn different aspects of stream processing in detail • Create and operate streaming jobs and applications with Spark Streaming; integrate Spark Streaming with other Spark APIs • Learn advanced Spark Streaming techniques, including approximation algorithms and machine learning algorithms • Compare Apache Spark to other stream processing projects, including Apache Storm, Apache Flink, and Apache Kafka Streams



فهرست مطالب

Copyright
Table of Contents
Foreword
Preface
	Who Should Read This Book?
	Installing Spark
	Learning Scala
	The Way Ahead
	Bibliography
	Conventions Used in This Book
	Using Code Examples
	O’Reilly Online Learning
	How to Contact Us
	Acknowledgments
		From Gerard
		From François
Part I. Fundamentals of Stream Processing with Apache Spark
	Chapter 1. Introducing Stream Processing
		What Is Stream Processing?
			Batch Versus Stream Processing
			The Notion of Time in Stream Processing
			The Factor of Uncertainty
		Some Examples of Stream Processing
		Scaling Up Data Processing
			MapReduce
			The Lesson Learned: Scalability and Fault Tolerance
		Distributed Stream Processing
			Stateful Stream Processing in a Distributed System
		Introducing Apache Spark
			The First Wave: Functional APIs
			The Second Wave: SQL
			A Unified Engine
			Spark Components
			Spark Streaming
			Structured Streaming
		Where Next?
	Chapter 2. Stream-Processing Model
		Sources and Sinks
		Immutable Streams Defined from One Another
		Transformations and Aggregations
		Window Aggregations
			Tumbling Windows
			Sliding Windows
		Stateless and Stateful Processing
		Stateful Streams
		An Example: Local Stateful Computation in Scala
			A Stateless Definition of the Fibonacci Sequence as a Stream Transformation
		Stateless or Stateful Streaming
		The Effect of Time
			Computing on Timestamped Events
			Timestamps as the Provider of the Notion of Time
			Event Time Versus Processing Time
			Computing with a Watermark
		Summary
	Chapter 3. Streaming Architectures
		Components of a Data Platform
		Architectural Models
		The Use of a Batch-Processing Component in a Streaming Application
		Referential Streaming Architectures
			The Lambda Architecture
			The Kappa Architecture
		Streaming Versus Batch Algorithms
			Streaming Algorithms Are Sometimes Completely Different in Nature
			Streaming Algorithms Can’t Be Guaranteed to Measure Well Against Batch Algorithms
		Summary
	Chapter 4. Apache Spark as a Stream-Processing Engine
		The Tale of Two APIs
		Spark’s Memory Usage
			Failure Recovery
			Lazy Evaluation
			Cache Hints
		Understanding Latency
		Throughput-Oriented Processing
		Spark’s Polyglot API
		Fast Implementation of Data Analysis
		To Learn More About Spark
		Summary
	Chapter 5. Spark’s Distributed Processing Model
		Running Apache Spark with a Cluster Manager
			Examples of Cluster Managers
		Spark’s Own Cluster Manager
		Understanding Resilience and Fault Tolerance in a Distributed System
			Fault Recovery
			Cluster Manager Support for Fault Tolerance
		Data Delivery Semantics
		Microbatching and One-Element-at-a-Time
			Microbatching: An Application of Bulk-Synchronous Processing
			One-Record-at-a-Time Processing
			Microbatching Versus One-at-a-Time: The Trade-Offs
		Bringing Microbatch and One-Record-at-a-Time Closer Together
		Dynamic Batch Interval
		Structured Streaming Processing Model
			The Disappearance of the Batch Interval
	Chapter 6. Spark’s Resilience Model
		Resilient Distributed Datasets in Spark
		Spark Components
		Spark’s Fault-Tolerance Guarantees
			Task Failure Recovery
			Stage Failure Recovery
			Driver Failure Recovery
		Summary
Appendix A. References for Part I
Part II. Structured Streaming
	Chapter 7. Introducing Structured Streaming
		First Steps with Structured Streaming
		Batch Analytics
		Streaming Analytics
			Connecting to a Stream
			Preparing the Data in the Stream
			Operations on Streaming Dataset
			Creating a Query
			Start the Stream Processing
			Exploring the Data
		Summary
	Chapter 8. The Structured Streaming Programming Model
		Initializing Spark
		Sources: Acquiring Streaming Data
			Available Sources
		Transforming Streaming Data
			Streaming API Restrictions on the DataFrame API
		Sinks: Output the Resulting Data
			format
			outputMode
			queryName
			option
			options
			trigger
			start()
		Summary
	Chapter 9. Structured Streaming in Action
		Consuming a Streaming Source
		Application Logic
		Writing to a Streaming Sink
		Summary
	Chapter 10. Structured Streaming Sources
		Understanding Sources
			Reliable Sources Must Be Replayable
			Sources Must Provide a Schema
		Available Sources
		The File Source
			Specifying a File Format
			Common Options
			Common Text Parsing Options (CSV, JSON)
			JSON File Source Format
			CSV File Source Format
			Parquet File Source Format
			Text File Source Format
		The Kafka Source
			Setting Up a Kafka Source
			Selecting a Topic Subscription Method
			Configuring Kafka Source Options
			Kafka Consumer Options
		The Socket Source
			Configuration
			Operations
		The Rate Source
			Options
	Chapter 11. Structured Streaming Sinks
		Understanding Sinks
		Available Sinks
			Reliable Sinks
			Sinks for Experimentation
			The Sink API
			Exploring Sinks in Detail
		The File Sink
			Using Triggers with the File Sink
			Common Configuration Options Across All Supported File Formats
			Common Time and Date Formatting (CSV, JSON)
			The CSV Format of the File Sink
			The JSON File Sink Format
			The Parquet File Sink Format
			The Text File Sink Format
		The Kafka Sink
			Understanding the Kafka Publish Model
			Using the Kafka Sink
		The Memory Sink
			Output Modes
		The Console Sink
			Options
			Output Modes
		The Foreach Sink
			The ForeachWriter Interface
			TCP Writer Sink: A Practical ForeachWriter Example
			The Moral of this Example
			Troubleshooting ForeachWriter Serialization Issues
	Chapter 12. Event Time–Based Stream Processing
		Understanding Event Time in Structured Streaming
		Using Event Time
		Processing Time
		Watermarks
		Time-Based Window Aggregations
			Defining Time-Based Windows
			Understanding How Intervals Are Computed
			Using Composite Aggregation Keys
			Tumbling and Sliding Windows
		Record Deduplication
		Summary
	Chapter 13. Advanced Stateful Operations
		Example: Car Fleet Management
		Understanding Group with State Operations
			Internal State Flow
		Using MapGroupsWithState
		Using FlatMapGroupsWithState
			Output Modes
			Managing State Over Time
		Summary
	Chapter 14. Monitoring Structured Streaming Applications
		The Spark Metrics Subsystem
			Structured Streaming Metrics
		The StreamingQuery Instance
			Getting Metrics with StreamingQueryProgress
		The StreamingQueryListener Interface
			Implementing a StreamingQueryListener
	Chapter 15. Experimental Areas: Continuous Processing and Machine Learning
		Continuous Processing
			Understanding Continuous Processing
			Using Continuous Processing
			Limitations
		Machine Learning
			Learning Versus Exploiting
			Applying a Machine Learning Model to a Stream
			Example: Estimating Room Occupancy by Using Ambient Sensors
			Online Training
Appendix B. References for Part II
Part III. Spark Streaming
	Chapter 16. Introducing Spark Streaming
		The DStream Abstraction
			DStreams as a Programming Model
			DStreams as an Execution Model
		The Structure of a Spark Streaming Application
			Creating the Spark Streaming Context
			Defining a DStream
			Defining Output Operations
			Starting the Spark Streaming Context
			Stopping the Streaming Process
		Summary
	Chapter 17. The Spark Streaming Programming Model
		RDDs as the Underlying Abstraction for DStreams
		Understanding DStream Transformations
		Element-Centric DStream Transformations
		RDD-Centric DStream Transformations
		Counting
		Structure-Changing Transformations
		Summary
	Chapter 18. The Spark Streaming Execution Model
		The Bulk-Synchronous Architecture
		The Receiver Model
			The Receiver API
			How Receivers Work
			The Receiver’s Data Flow
			The Internal Data Resilience
			Receiver Parallelism
			Balancing Resources: Receivers Versus Processing Cores
			Achieving Zero Data Loss with the Write-Ahead Log
		The Receiverless or Direct Model
		Summary
	Chapter 19. Spark Streaming Sources
		Types of Sources
			Basic Sources
			Receiver-Based Sources
			Direct Sources
		Commonly Used Sources
		The File Source
			How It Works
		The Queue Source
			How It Works
			Using a Queue Source for Unit Testing
			A Simpler Alternative to the Queue Source: The ConstantInputDStream
		The Socket Source
			How It Works
		The Kafka Source
			Using the Kafka Source
			How It Works
		Where to Find More Sources
	Chapter 20. Spark Streaming Sinks
		Output Operations
		Built-In Output Operations
			print
			saveAsxyz
			foreachRDD
		Using foreachRDD as a Programmable Sink
		Third-Party Output Operations
	Chapter 21. Time-Based Stream Processing
		Window Aggregations
		Tumbling Windows
			Window Length Versus Batch Interval
		Sliding Windows
			Sliding Windows Versus Batch Interval
			Sliding Windows Versus Tumbling Windows
		Using Windows Versus Longer Batch Intervals
		Window Reductions
			reduceByWindow
			reduceByKeyAndWindow
			countByWindow
			countByValueAndWindow
		Invertible Window Aggregations
		Slicing  Streams
		Summary
	Chapter 22. Arbitrary Stateful Streaming Computation
		Statefulness at the Scale of a Stream
		updateStateByKey
		Limitation of updateStateByKey
			Performance
			Memory Usage
		Introducing Stateful Computation with mapwithState
		Using mapWithState
		Event-Time Stream Computation Using mapWithState
	Chapter 23. Working with Spark SQL
		Spark SQL
		Accessing Spark SQL Functions from Spark Streaming
			Example: Writing Streaming Data to Parquet
		Dealing with Data at Rest
			Using Join to Enrich the Input Stream
		Join Optimizations
		Updating Reference Datasets in a Streaming Application
			Enhancing Our Example with a Reference Dataset
		Summary
	Chapter 24. Checkpointing
		Understanding the Use of Checkpoints
		Checkpointing DStreams
		Recovery from a Checkpoint
			Limitations
		The Cost of Checkpointing
		Checkpoint Tuning
	Chapter 25. Monitoring Spark Streaming
		The Streaming UI
		Understanding Job Performance Using the Streaming UI
			Input Rate Chart
			Scheduling Delay Chart
			Processing Time Chart
			Total Delay Chart
			Batch Details
		The Monitoring REST API
			Using the Monitoring REST API
			Information Exposed by the Monitoring REST API
		The Metrics Subsystem
		The Internal Event Bus
			Interacting with the Event Bus
		Summary
	Chapter 26. Performance Tuning
		The Performance Balance of Spark Streaming
			The Relationship Between Batch Interval and Processing Delay
			The Last Moments of a Failing Job
			Going Deeper: Scheduling Delay and Processing Delay
			Checkpoint Influence in Processing Time
		External Factors that Influence the Job’s Performance
		How to Improve Performance?
		Tweaking the Batch Interval
		Limiting the Data Ingress with Fixed-Rate Throttling
		Backpressure
		Dynamic Throttling
			Tuning the Backpressure PID
			Custom Rate Estimator
			A Note on Alternative Dynamic Handling Strategies
		Caching
		Speculative Execution
Appendix C. References for Part III
Part IV. Advanced Spark Streaming Techniques
	Chapter 27. Streaming Approximation and Sampling Algorithms
		Exactness, Real Time, and Big Data
			Exactness
			Real-Time Processing
			Big Data
		The Exactness, Real-Time, and Big Data triangle
			Big Data and Real Time
		Approximation Algorithms
		Hashing and Sketching: An Introduction
		Counting Distinct Elements: HyperLogLog
			Role-Playing Exercise: If We Were a System Administrator
			Practical HyperLogLog in Spark
		Counting Element Frequency: Count Min Sketches
			Introducing Bloom Filters
			Bloom Filters with Spark
			Computing Frequencies with a Count-Min Sketch
		Ranks and Quantiles: T-Digest
			T-Digest in Spark
		Reducing the Number of Elements: Sampling
			Random Sampling
			Stratified Sampling
	Chapter 28. Real-Time Machine Learning
		Streaming Classification with Naive Bayes
			streamDM Introduction
			Naive Bayes in Practice
			Training a Movie Review Classifier
		Introducing Decision Trees
		Hoeffding Trees
			Hoeffding Trees in Spark, in Practice
		Streaming Clustering with Online K-Means
			K-Means Clustering
			Online Data and K-Means
			The Problem of Decaying Clusters
			Streaming K-Means with Spark Streaming
Appendix D. References for Part IV
Part V. Beyond Apache Spark
	Chapter 29. Other Distributed Real-Time Stream Processing Systems
		Apache Storm
			Processing Model
			The Storm Topology
			The Storm Cluster
			Compared to Spark
		Apache Flink
			A Streaming-First Framework
			Compared to Spark
		Kafka Streams
			Kafka Streams Programming Model
			Compared to Spark
		In the Cloud
			Amazon Kinesis on AWS
			Microsoft Azure Stream Analytics
			Apache Beam/Google Cloud Dataflow
	Chapter 30. Looking Ahead
		Stay Plugged In
			Seek Help on Stack Overflow
			Start Discussions on the Mailing Lists
			Attend Conferences
		Attend Meetups
			Read Books
		Contributing to the Apache Spark Project
Appendix E. References for Part V
Index
About the Authors
Colophon




نظرات کاربران