Edition:
Authors: James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian
Series:
ISBN: 9781484255742
Publisher: Apress
Publication year:
Number of pages: 0
Language: English
File format: EPUB (can be converted to PDF, EPUB, or AZW3 on request)
File size: 81 MB
If you would like the book Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL converted to PDF, EPUB, AZW3, MOBI, or DJVU format, you can notify support and they will convert the file for you.
Please note that Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL is the original English-language edition, not a Persian translation. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
Table of Contents

About the Authors
Preface
Acknowledgments

Chapter 1: Introduction
  Read the Book, Not the Spec; SYCL 1.2.1 vs. SYCL 2020, and DPC++; Getting a DPC++ Compiler; Book GitHub; Hello, World! and a SYCL Program Dissection; Queues and Actions; It Is All About Parallelism; Throughput; Latency; Think Parallel; Amdahl and Gustafson; Scaling; Heterogeneous Systems; Data-Parallel Programming; Key Attributes of DPC++ and SYCL; Single-Source; Host; Devices; Sharing Devices; Kernel Code; Kernel: Vector Addition (DAXPY); Asynchronous Task Graphs; Race Conditions When We Make a Mistake; C++ Lambda Functions; Portability and Direct Programming; Concurrency vs. Parallelism; Summary

Chapter 2: Where Code Executes
  Single-Source; Host Code; Device Code; Choosing Devices; Method#1: Run on a Device of Any Type; Queues; Binding a Queue to a Device, When Any Device Will Do; Method#2: Using the Host Device for Development and Debugging; Method#3: Using a GPU (or Other Accelerators); Device Types; Accelerator Devices; Device Selectors; When Device Selection Fails; Method#4: Using Multiple Devices; Method#5: Custom (Very Specific) Device Selection; device_selector Base Class; Mechanisms to Score a Device; Three Paths to Device Code Execution on CPU; Creating Work on a Device; Introducing the Task Graph; Where Is the Device Code?; Actions; Fallback; Summary

Chapter 3: Data Management
  Introduction; The Data Management Problem; Device Local vs. Device Remote; Managing Multiple Memories; Explicit Data Movement; Implicit Data Movement; Selecting the Right Strategy; USM, Buffers, and Images; Unified Shared Memory; Accessing Memory Through Pointers; USM and Data Movement; Explicit Data Movement in USM; Implicit Data Movement in USM; Buffers; Creating Buffers; Accessing Buffers; Access Modes; Ordering the Uses of Data; In-order Queues; Out-of-Order (OoO) Queues; Explicit Dependences with Events; Implicit Dependences with Accessors; Choosing a Data Management Strategy; Handler Class: Key Members; Summary

Chapter 4: Expressing Parallelism
  Parallelism Within Kernels; Multidimensional Kernels; Loops vs. Kernels; Overview of Language Features; Separating Kernels from Host Code; Different Forms of Parallel Kernels; Basic Data-Parallel Kernels; Understanding Basic Data-Parallel Kernels; Writing Basic Data-Parallel Kernels; Details of Basic Data-Parallel Kernels; The range Class; The id Class; The item Class; Explicit ND-Range Kernels; Understanding Explicit ND-Range Parallel Kernels; Work-Items; Work-Groups; Sub-Groups; Writing Explicit ND-Range Data-Parallel Kernels; Details of Explicit ND-Range Data-Parallel Kernels; The nd_range Class; The nd_item Class; The group Class; The sub_group Class; Hierarchical Parallel Kernels; Understanding Hierarchical Data-Parallel Kernels; Writing Hierarchical Data-Parallel Kernels; Details of Hierarchical Data-Parallel Kernels; The h_item Class; The private_memory Class; Mapping Computation to Work-Items; One-to-One Mapping; Many-to-One Mapping; Choosing a Kernel Form; Summary

Chapter 5: Error Handling
  Safety First; Types of Errors; Let's Create Some Errors!; Synchronous Error; Asynchronous Error; Application Error Handling Strategy; Ignoring Error Handling; Synchronous Error Handling; Asynchronous Error Handling; The Asynchronous Handler; Invocation of the Handler; Errors on a Device; Summary

Chapter 6: Unified Shared Memory
  Why Should We Use USM?; Allocation Types; Device Allocations; Host Allocations; Shared Allocations; Allocating Memory; What Do We Need to Know?; Multiple Styles; Allocations à la C; Allocations à la C++; C++ Allocators; Deallocating Memory; Allocation Example; Data Management; Initialization; Data Movement; Explicit; Implicit; Migration; Fine-Grained Control; Queries; Summary

Chapter 7: Buffers
  Buffers; Creation; Buffer Properties; use_host_ptr; use_mutex; context_bound; What Can We Do with a Buffer?; Accessors; Accessor Creation; What Can We Do with an Accessor?; Summary

Chapter 8: Scheduling Kernels and Data Movement
  What Is Graph Scheduling?; How Graphs Work in DPC++; Command Group Actions; How Command Groups Declare Dependences; Examples; When Are the Parts of a CG Executed?; Data Movement; Explicit; Implicit; Synchronizing with the Host; Summary

Chapter 9: Communication and Synchronization
  Work-Groups and Work-Items; Building Blocks for Efficient Communication; Synchronization via Barriers; Work-Group Local Memory; Using Work-Group Barriers and Local Memory; Work-Group Barriers and Local Memory in ND-Range Kernels; Local Accessors; Synchronization Functions; A Full ND-Range Kernel Example; Work-Group Barriers and Local Memory in Hierarchical Kernels; Scopes for Local Memory and Barriers; A Full Hierarchical Kernel Example; Sub-Groups; Synchronization via Sub-Group Barriers; Exchanging Data Within a Sub-Group; A Full Sub-Group ND-Range Kernel Example; Collective Functions; Broadcast; Votes; Shuffles; Loads and Stores; Summary

Chapter 10: Defining Kernels
  Why Three Ways to Represent a Kernel?; Kernels As Lambda Expressions; Elements of a Kernel Lambda Expression; Naming Kernel Lambda Expressions; Kernels As Named Function Objects; Elements of a Kernel Named Function Object; Interoperability with Other APIs; Interoperability with API-Defined Source Languages; Interoperability with API-Defined Kernel Objects; Kernels in Program Objects; Summary

Chapter 11: Vectors
  How to Think About Vectors; Vector Types; Vector Interface; Load and Store Member Functions; Swizzle Operations; Vector Execution Within a Parallel Kernel; Vector Parallelism; Summary

Chapter 12: Device Information
  Refining Kernel Code to Be More Prescriptive; How to Enumerate Devices and Capabilities; Custom Device Selector; Being Curious: get_info<>; Being More Curious: Detailed Enumeration Code; Inquisitive: get_info<>; Device Information Descriptors; Device-Specific Kernel Information Descriptors; The Specifics: Those of "Correctness"; Device Queries; Kernel Queries; The Specifics: Those of "Tuning/Optimization"; Device Queries; Kernel Queries; Runtime vs. Compile-Time Properties; Summary

Chapter 13: Practical Tips
  Getting a DPC++ Compiler and Code Samples; Online Forum and Documentation; Platform Model; Multiarchitecture Binaries; Compilation Model; Adding SYCL to Existing C++ Programs; Debugging; Debugging Kernel Code; Debugging Runtime Failures; Initializing Data and Accessing Kernel Outputs; Multiple Translation Units; Performance Implications of Multiple Translation Units; When Anonymous Lambdas Need Names; Migrating from CUDA to SYCL; Summary

Chapter 14: Common Parallel Patterns
  Understanding the Patterns; Map; Stencil; Reduction; Scan; Pack and Unpack; Pack; Unpack; Using Built-In Functions and Libraries; The DPC++ Reduction Library; The reduction Class; The reducer Class; User-Defined Reductions; oneAPI DPC++ Library; Group Functions; Direct Programming; Map; Stencil; Reduction; Scan; Pack and Unpack; Pack; Unpack; Summary; For More Information

Chapter 15: Programming for GPUs
  Performance Caveats; How GPUs Work; GPU Building Blocks; Simpler Processors (but More of Them); Expressing Parallelism; Expressing More Parallelism; Simplified Control Logic (SIMD Instructions); Predication and Masking; SIMD Efficiency; SIMD Efficiency and Groups of Items; Switching Work to Hide Latency; Offloading Kernels to GPUs; SYCL Runtime Library; GPU Software Drivers; GPU Hardware; Beware the Cost of Offloading!; Transfers to and from Device Memory; GPU Kernel Best Practices; Accessing Global Memory; Accessing Work-Group Local Memory; Avoiding Local Memory Entirely with Sub-Groups; Optimizing Computation Using Small Data Types; Optimizing Math Functions; Specialized Functions and Extensions; Summary; For More Information

Chapter 16: Programming for CPUs
  Performance Caveats; The Basics of a General-Purpose CPU; The Basics of SIMD Hardware; Exploiting Thread-Level Parallelism; Thread Affinity Insight; Be Mindful of First Touch to Memory; SIMD Vectorization on CPU; Ensure SIMD Execution Legality; SIMD Masking and Cost; Avoid Array-of-Struct for SIMD Efficiency; Data Type Impact on SIMD Efficiency; SIMD Execution Using single_task; Summary

Chapter 17: Programming for FPGAs
  Performance Caveats; How to Think About FPGAs; Pipeline Parallelism; Kernels Consume Chip "Area"; When to Use an FPGA; Lots and Lots of Work; Custom Operations or Operation Widths; Scalar Data Flow; Low Latency and Rich Connectivity; Customized Memory Systems; Running on an FPGA; Compile Times; The FPGA Emulator; FPGA Hardware Compilation Occurs "Ahead-of-Time"; Writing Kernels for FPGAs; Exposing Parallelism; Keeping the Pipeline Busy Using ND-Ranges; Pipelines Do Not Mind Data Dependences!; Spatial Pipeline Implementation of a Loop; Loop Initiation Interval; Pipes; Blocking and Non-blocking Pipe Accesses; For More Information on Pipes; Custom Memory Systems; Some Closing Topics; FPGA Building Blocks; Clock Frequency; Summary

Chapter 18: Libraries
  Built-In Functions; Use the sycl:: Prefix with Built-In Functions; DPC++ Library; Standard C++ APIs in DPC++; DPC++ Parallel STL; DPC++ Execution Policy; FPGA Execution Policy; Using DPC++ Parallel STL; Using Parallel STL with USM; Error Handling with DPC++ Execution Policies; Summary

Chapter 19: Memory Model and Atomics
  What Is in a Memory Model?; Data Races and Synchronization; Barriers and Fences; Atomic Operations; Memory Ordering; The Memory Model; The memory_order Enumeration Class; The memory_scope Enumeration Class; Querying Device Capabilities; Barriers and Fences; Atomic Operations in DPC++; The atomic Class; The atomic_ref Class; Using Atomics with Buffers; Using Atomics with Unified Shared Memory; Using Atomics in Real Life; Computing a Histogram; Implementing Device-Wide Synchronization; Summary; For More Information

Epilogue: Future Direction of DPC++
  Alignment with C++20 and C++23; Address Spaces; Extension and Specialization Mechanism; Hierarchical Parallelism; Summary; For More Information

Index