
Download the book: Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL

Book details

Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL

Edition:
Authors:
Series:
ISBN: 9781484255742
Publisher: Apress
Publication year:
Number of pages: 0
Language: English
File format: EPUB (converted to PDF, EPUB, or AZW3 at the user's request)
File size: 81 MB

Book price (Toman): 37,000





If you would like the book Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL converted to PDF, EPUB, AZW3, MOBI, or DJVU format, you can notify support, who will convert the file for you.

Please note that Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL is the original English-language edition, not a Persian translation. The International Library website offers books in their original language only and does not provide any books translated into or written in Persian.


About the book (in the original language)



Contents

Table of Contents
About the Authors
Preface
Acknowledgments
Chapter 1: Introduction
	Read the Book, Not the Spec
	SYCL 1.2.1 vs. SYCL 2020, and DPC++
	Getting a DPC++ Compiler
	Book GitHub
	Hello, World! and a SYCL Program Dissection
	Queues and Actions
	It Is All About Parallelism
		Throughput
		Latency
		Think Parallel
		Amdahl and Gustafson
		Scaling
		Heterogeneous Systems
		Data-Parallel Programming
	Key Attributes of DPC++ and SYCL
		Single-Source
		Host
		Devices
			Sharing Devices
		Kernel Code
			Kernel: Vector Addition (DAXPY)
		Asynchronous Task Graphs
			Race Conditions When We Make a Mistake
		C++ Lambda Functions
		Portability and Direct Programming
	Concurrency vs. Parallelism
	Summary
Chapter 2: Where Code Executes
	Single-Source
		Host Code
		Device Code
	Choosing Devices
	Method#1: Run on a Device of Any Type
		Queues
		Binding a Queue to a Device, When Any Device Will Do
	Method#2: Using the Host Device for Development and Debugging
	Method#3: Using a GPU (or Other Accelerators)
		Device Types
			Accelerator Devices
		Device Selectors
			When Device Selection Fails
	Method#4: Using Multiple Devices
	Method#5: Custom (Very Specific) Device Selection
		device_selector Base Class
		Mechanisms to Score a Device
	Three Paths to Device Code Execution on CPU
	Creating Work on a Device
		Introducing the Task Graph
		Where Is the Device Code?
		Actions
		Fallback
	Summary
Chapter 3: Data Management
	Introduction
	The Data Management Problem
	Device Local vs. Device Remote
	Managing Multiple Memories
		Explicit Data Movement
		Implicit Data Movement
		Selecting the Right Strategy
	USM, Buffers, and Images
	Unified Shared Memory
		Accessing Memory Through Pointers
		USM and Data Movement
			Explicit Data Movement in USM
			Implicit Data Movement in USM
	Buffers
		Creating Buffers
		Accessing Buffers
		Access Modes
	Ordering the Uses of Data
		In-order Queues
		Out-of-Order (OoO) Queues
		Explicit Dependences with Events
		Implicit Dependences with Accessors
	Choosing a Data Management Strategy
	Handler Class: Key Members
	Summary
Chapter 4: Expressing Parallelism
	Parallelism Within Kernels
		Multidimensional Kernels
		Loops vs. Kernels
	Overview of Language Features
		Separating Kernels from Host Code
		Different Forms of Parallel Kernels
	Basic Data-Parallel Kernels
		Understanding Basic Data-Parallel Kernels
		Writing Basic Data-Parallel Kernels
		Details of Basic Data-Parallel Kernels
			The range Class
			The id Class
			The item Class
	Explicit ND-Range Kernels
		Understanding Explicit ND-Range Parallel Kernels
			Work-Items
			Work-Groups
			Sub-Groups
		Writing Explicit ND-Range Data-Parallel Kernels
		Details of Explicit ND-Range Data-Parallel Kernels
			The nd_range Class
			The nd_item Class
			The group Class
			The sub_group Class
	Hierarchical Parallel Kernels
		Understanding Hierarchical Data-Parallel Kernels
		Writing Hierarchical Data-Parallel Kernels
		Details of Hierarchical Data-Parallel Kernels
			The h_item Class
			The private_memory Class
	Mapping Computation to Work-Items
		One-to-One Mapping
		Many-to-One Mapping
	Choosing a Kernel Form
	Summary
Chapter 5: Error Handling
	Safety First
	Types of Errors
	Let’s Create Some Errors!
		Synchronous Error
		Asynchronous Error
	Application Error Handling Strategy
		Ignoring Error Handling
		Synchronous Error Handling
		Asynchronous Error Handling
			The Asynchronous Handler
			Invocation of the Handler
	Errors on a Device
	Summary
Chapter 6: Unified Shared Memory
	Why Should We Use USM?
	Allocation Types
		Device Allocations
		Host Allocations
		Shared Allocations
	Allocating Memory
		What Do We Need to Know?
		Multiple Styles
			Allocations à la C
			Allocations à la C++
			C++ Allocators
		Deallocating Memory
		Allocation Example
	Data Management
		Initialization
		Data Movement
			Explicit
			Implicit
				Migration
				Fine-Grained Control
	Queries
	Summary
Chapter 7: Buffers
	Buffers
		Creation
			Buffer Properties
				use_host_ptr
				use_mutex
				context_bound
		What Can We Do with a Buffer?
	Accessors
		Accessor Creation
		What Can We Do with an Accessor?
	Summary
Chapter 8: Scheduling Kernels and Data Movement
	What Is Graph Scheduling?
	How Graphs Work in DPC++
		Command Group Actions
		How Command Groups Declare Dependences
		Examples
		When Are the Parts of a CG Executed?
	Data Movement
		Explicit
		Implicit
	Synchronizing with the Host
	Summary
Chapter 9: Communication and Synchronization
	Work-Groups and Work-Items
	Building Blocks for Efficient Communication
		Synchronization via Barriers
		Work-Group Local Memory
	Using Work-Group Barriers and Local Memory
		Work-Group Barriers and Local Memory in ND-Range Kernels
			Local Accessors
			Synchronization Functions
			A Full ND-Range Kernel Example
		Work-Group Barriers and Local Memory in Hierarchical Kernels
			Scopes for Local Memory and Barriers
			A Full Hierarchical Kernel Example
	Sub-Groups
		Synchronization via Sub-Group Barriers
		Exchanging Data Within a Sub-Group
		A Full Sub-Group ND-Range Kernel Example
	Collective Functions
		Broadcast
		Votes
		Shuffles
		Loads and Stores
	Summary
Chapter 10: Defining Kernels
	Why Three Ways to Represent a Kernel?
	Kernels As Lambda Expressions
		Elements of a Kernel Lambda Expression
		Naming Kernel Lambda Expressions
	Kernels As Named Function Objects
		Elements of a Kernel Named Function Object
	Interoperability with Other APIs
		Interoperability with API-Defined Source Languages
		Interoperability with API-Defined Kernel Objects
	Kernels in Program Objects
	Summary
Chapter 11: Vectors
	How to Think About Vectors
	Vector Types
	Vector Interface
		Load and Store Member Functions
		Swizzle Operations
	Vector Execution Within a Parallel Kernel
	Vector Parallelism
	Summary
Chapter 12: Device Information
	Refining Kernel Code to Be More Prescriptive
	How to Enumerate Devices and Capabilities
		Custom Device Selector
		Being Curious: get_info<>
		Being More Curious: Detailed Enumeration Code
		Inquisitive: get_info<>
	Device Information Descriptors
	Device-Specific Kernel Information Descriptors
	The Specifics: Those of “Correctness”
		Device Queries
		Kernel Queries
	The Specifics: Those of “Tuning/Optimization”
		Device Queries
		Kernel Queries
	Runtime vs. Compile-Time Properties
	Summary
Chapter 13: Practical Tips
	Getting a DPC++ Compiler and Code Samples
	Online Forum and Documentation
	Platform Model
		Multiarchitecture Binaries
		Compilation Model
	Adding SYCL to Existing C++ Programs
	Debugging
		Debugging Kernel Code
		Debugging Runtime Failures
	Initializing Data and Accessing Kernel Outputs
	Multiple Translation Units
		Performance Implications of Multiple Translation Units
	When Anonymous Lambdas Need Names
	Migrating from CUDA to SYCL
	Summary
Chapter 14: Common Parallel Patterns
	Understanding the Patterns
		Map
		Stencil
		Reduction
		Scan
		Pack and Unpack
			Pack
			Unpack
	Using Built-In Functions and Libraries
		The DPC++ Reduction Library
			The reduction Class
			The reducer Class
			User-Defined Reductions
		oneAPI DPC++ Library
		Group Functions
	Direct Programming
		Map
		Stencil
		Reduction
		Scan
		Pack and Unpack
			Pack
			Unpack
	Summary
		For More Information
Chapter 15: Programming for GPUs
	Performance Caveats
	How GPUs Work
		GPU Building Blocks
		Simpler Processors (but More of Them)
			Expressing Parallelism
			Expressing More Parallelism
		Simplified Control Logic (SIMD Instructions)
			Predication and Masking
			SIMD Efficiency
			SIMD Efficiency and Groups of Items
		Switching Work to Hide Latency
	Offloading Kernels to GPUs
		SYCL Runtime Library
		GPU Software Drivers
		GPU Hardware
		Beware the Cost of Offloading!
			Transfers to and from Device Memory
	GPU Kernel Best Practices
		Accessing Global Memory
		Accessing Work-Group Local Memory
		Avoiding Local Memory Entirely with Sub-Groups
		Optimizing Computation Using Small Data Types
		Optimizing Math Functions
		Specialized Functions and Extensions
	Summary
		For More Information
Chapter 16: Programming for CPUs
	Performance Caveats
	The Basics of a General-Purpose CPU
	The Basics of SIMD Hardware
	Exploiting Thread-Level Parallelism
		Thread Affinity Insight
		Be Mindful of First Touch to Memory
	SIMD Vectorization on CPU
		Ensure SIMD Execution Legality
		SIMD Masking and Cost
		Avoid Array-of-Struct for SIMD Efficiency
		Data Type Impact on SIMD Efficiency
		SIMD Execution Using single_task
	Summary
Chapter 17: Programming for FPGAs
	Performance Caveats
	How to Think About FPGAs
		Pipeline Parallelism
		Kernels Consume Chip “Area”
	When to Use an FPGA
		Lots and Lots of Work
		Custom Operations or Operation Widths
		Scalar Data Flow
		Low Latency and Rich Connectivity
		Customized Memory Systems
	Running on an FPGA
		Compile Times
			The FPGA Emulator
			FPGA Hardware Compilation Occurs “Ahead-of-Time”
	Writing Kernels for FPGAs
		Exposing Parallelism
			Keeping the Pipeline Busy Using ND-Ranges
			Pipelines Do Not Mind Data Dependences!
			Spatial Pipeline Implementation of a Loop
			Loop Initiation Interval
		Pipes
			Blocking and Non-blocking Pipe Accesses
			For More Information on Pipes
		Custom Memory Systems
	Some Closing Topics
		FPGA Building Blocks
		Clock Frequency
	Summary
Chapter 18: Libraries
	Built-In Functions
		Use the sycl:: Prefix with Built-In Functions
	DPC++ Library
		Standard C++ APIs in DPC++
		DPC++ Parallel STL
			DPC++ Execution Policy
			FPGA Execution Policy
			Using DPC++ Parallel STL
			Using Parallel STL with USM
		Error Handling with DPC++ Execution Policies
	Summary
Chapter 19: Memory Model and Atomics
	What Is in a Memory Model?
		Data Races and Synchronization
		Barriers and Fences
		Atomic Operations
		Memory Ordering
	The Memory Model
		The memory_order Enumeration Class
		The memory_scope Enumeration Class
		Querying Device Capabilities
		Barriers and Fences
		Atomic Operations in DPC++
			The atomic Class
			The atomic_ref Class
			Using Atomics with Buffers
			Using Atomics with Unified Shared Memory
	Using Atomics in Real Life
		Computing a Histogram
		Implementing Device-Wide Synchronization
	Summary
		For More Information
Epilogue: Future Direction of DPC++
	Alignment with C++20 and C++23
	Address Spaces
	Extension and Specialization Mechanism
	Hierarchical Parallelism
	Summary
		For More Information
Index
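
To give a flavor of the programming model the book teaches, below is a minimal DAXPY-style vector-addition kernel in SYCL, echoing the "Hello, World! and a SYCL Program Dissection" and "Kernel: Vector Addition (DAXPY)" sections of Chapter 1. This sketch is illustrative only and is not taken from the book (the book's official samples live in its GitHub repository, per Chapter 1's "Book GitHub" section); it assumes a SYCL 2020 compiler such as DPC++.

// Minimal illustrative sketch (not the book's code): z = alpha*x + y in SYCL 2020.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  constexpr size_t N = 1024;
  constexpr float alpha = 2.0f;  // float, so the kernel also runs on devices without double support
  std::vector<float> x(N, 1.0f), y(N, 3.0f), z(N, 0.0f);

  sycl::queue q;  // default selector: bind the queue to any available device
  std::cout << "Running on: "
            << q.get_device().get_info<sycl::info::device::name>() << "\n";

  {
    // Buffers make the host data visible to the SYCL runtime.
    sycl::buffer<float> bx{x.data(), sycl::range<1>{N}};
    sycl::buffer<float> by{y.data(), sycl::range<1>{N}};
    sycl::buffer<float> bz{z.data(), sycl::range<1>{N}};

    q.submit([&](sycl::handler& h) {
      // Accessors declare how the kernel uses each buffer and create the
      // implicit data dependences the runtime's task graph schedules around.
      sycl::accessor ax{bx, h, sycl::read_only};
      sycl::accessor ay{by, h, sycl::read_only};
      sycl::accessor az{bz, h, sycl::write_only, sycl::no_init};

      // A basic data-parallel kernel: one work-item per element.
      h.parallel_for(sycl::range<1>{N}, [=](sycl::id<1> i) {
        az[i] = alpha * ax[i] + ay[i];
      });
    });
  }  // buffers are destroyed here, which waits for the kernel and copies z back to the host

  std::cout << "z[0] = " << z[0] << "\n";  // expect 5
  return 0;
}

With Intel's toolchain, a program like this can typically be built with the dpcpp driver (or icpx -fsycl in newer releases); exact invocation depends on your installed compiler version.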



