Edition:
Authors: Jaegeun Han, Bharatkumar Sharma
Series:
ISBN: 1788996240, 9781788996242
Publisher: Packt Publishing
Publication year: 2019
Number of pages: 508
Language: English
File format: EPUB (can be converted to PDF, EPUB, or AZW3 on request)
File size: 33 MB
If you would like the file of Learn CUDA Programming: A beginner's guide to GPU programming and parallel computing with CUDA 10.x and C/C++ converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert the file for you.
Please note that Learn CUDA Programming: A beginner's guide to GPU programming and parallel computing with CUDA 10.x and C/C++ is the original English edition, not a Persian translation. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
Explore different GPU programming methods using libraries and directives, such as OpenACC, with extension to languages such as C, C++, and Python
Compute Unified Device Architecture (CUDA) is NVIDIA's GPU computing platform and application programming interface. It's designed to work with programming languages such as C, C++, and Python. With CUDA, you can leverage a GPU's parallel computing power for a range of high-performance computing applications in the fields of science, healthcare, and deep learning.
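For illustration only (this sketch is not taken from the book), a minimal CUDA C++ vector addition shows the kernel-launch model the description refers to; the kernel name vecAdd, the array size, and the launch configuration are arbitrary choices:

#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one element: blockIdx.x * blockDim.x + threadIdx.x
// enumerates all n elements across the grid.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified (managed) memory keeps the example short; explicit
    // cudaMalloc/cudaMemcpy would work equally well.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);   // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

Compiled with nvcc, this is the kind of "Hello World"-scale kernel the opening chapter builds on before moving to memory management and performance tuning.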
Learn CUDA Programming will help you learn GPU parallel programming and understand its modern applications. In this book, you'll discover CUDA programming approaches for modern GPU architectures. You'll not only be guided through GPU features, tools, and APIs, but also learn how to analyze performance with sample parallel programming algorithms. This book will help you optimize the performance of your apps by giving insights into CUDA programming platforms with various libraries, compiler directives (OpenACC), and other languages. As you progress, you'll learn how additional computing power can be generated using multiple GPUs in a box or in multiple boxes. Finally, you'll explore how CUDA accelerates deep learning algorithms, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
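As a rough illustration of the kind of performance measurement mentioned above (again not an excerpt from the book), CUDA events can time a kernel on the GPU; dummyKernel and the sizes here are placeholders:

#include <cstdio>
#include <cuda_runtime.h>

// A trivial placeholder kernel; any real kernel would be timed the same way.
__global__ void dummyKernel(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMalloc(&x, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    dummyKernel<<<(n + 255) / 256, 256>>>(x, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);              // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed GPU time in milliseconds
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    return 0;
}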
By the end of this CUDA book, you'll be equipped with the skills you need to integrate the power of GPU computing in your applications.
This beginner-level book is for programmers who want to delve into parallel computing, become part of the high-performance computing community and build modern applications. Basic C and C++ programming experience is assumed. For deep learning enthusiasts, this book covers Python InterOps, DL libraries, and practical examples on performance estimation.
Cover · Title Page · Copyright and Credits · Dedication · About Packt · Contributors · Table of Contents · Preface

Chapter 1: Introduction to CUDA Programming
The history of high-performance computing · Heterogeneous computing · Programming paradigm · Low latency versus higher throughput · Programming approaches to GPU · Technical requirements · Hello World from CUDA · Thread hierarchy · GPU architecture · Vector addition using CUDA · Experiment 1 – creating multiple blocks · Experiment 2 – creating multiple threads · Experiment 3 – combining blocks and threads · Why bother with threads and blocks? · Launching kernels in multiple dimensions · Error reporting in CUDA · Data type support in CUDA · Summary

Chapter 2: CUDA Memory Management
Technical requirements · NVIDIA Visual Profiler · Global memory/device memory · Vector addition on global memory · Coalesced versus uncoalesced global memory access · Memory throughput analysis · Shared memory · Matrix transpose on shared memory · Bank conflicts and its effect on shared memory · Read-only data/cache · Computer vision – image scaling using texture memory · Registers in GPU · Pinned memory · Bandwidth test – pinned versus pageable · Unified memory · Understanding unified memory page allocation and transfer · Optimizing unified memory with warp per page · Optimizing unified memory using data prefetching · GPU memory evolution · Why do GPUs have caches? · Summary

Chapter 3: CUDA Thread Programming
Technical requirements · CUDA threads, blocks, and the GPU · Exploiting a CUDA block and warp · Understanding CUDA occupancy · Setting NVCC to report GPU resource usages · The settings for Linux · Settings for Windows · Analyzing the optimal occupancy using the Occupancy Calculator · Occupancy tuning – bounding register usage · Getting the achieved occupancy from the profiler · Understanding parallel reduction · Naive parallel reduction using global memory · Reducing kernels using shared memory · Writing performance measurement code · Performance comparison for the two reductions – global and shared memory · Identifying the application's performance limiter · Finding the performance limiter and optimization · Minimizing the CUDA warp divergence effect · Determining divergence as a performance bottleneck · Interleaved addressing · Sequential addressing · Performance modeling and balancing the limiter · The Roofline model · Maximizing memory bandwidth with grid-strided loops · Balancing the I/O throughput · Warp-level primitive programming · Parallel reduction with warp primitives · Cooperative Groups for flexible thread handling · Cooperative Groups in a CUDA thread block · Benefits of Cooperative Groups · Modularity · Explicit grouped threads' operation and race condition avoidance · Dynamic active thread selection · Applying to the parallel reduction · Cooperative Groups to avoid deadlock · Loop unrolling in the CUDA kernel · Atomic operations · Low/mixed precision operations · Half-precision operation · Dot product operations and accumulation for 8-bit integers and 16-bit data (DP4A and DP2A) · Measuring the performance · Summary

Chapter 4: Kernel Execution Model and Optimization Strategies
Technical requirements · Kernel execution with CUDA streams · The usage of CUDA streams · Stream-level synchronization · Working with the default stream · Pipelining the GPU execution · Concept of GPU pipelining · Building a pipelining execution · The CUDA callback function · CUDA streams with priority · Priorities in CUDA · Stream execution with priorities · Kernel execution time estimation using CUDA events · Using CUDA events · Multiple stream estimation · CUDA dynamic parallelism · Understanding dynamic parallelism · Usage of dynamic parallelism · Recursion · Grid-level cooperative groups · Understanding grid-level cooperative groups · Usage of grid_group · CUDA kernel calls with OpenMP · OpenMP and CUDA calls · CUDA kernel calls with OpenMP · Multi-Process Service · Introduction to Message Passing Interface · Implementing an MPI-enabled application · Enabling MPS · Profiling an MPI application and understanding MPS operation · Kernel execution overhead comparison · Implementing three types of kernel executions · Comparison of three executions · Summary

Chapter 5: CUDA Application Profiling and Debugging
Technical requirements · Profiling focused target ranges in GPU applications · Limiting the profiling target in code · Limiting the profiling target with time or GPU · Profiling with NVTX · Visual profiling against the remote machine · Debugging a CUDA application with CUDA error · Asserting local GPU values using CUDA assert · Debugging a CUDA application with Nsight Visual Studio Edition · Debugging a CUDA application with Nsight Eclipse Edition · Debugging a CUDA application with CUDA-GDB · Breakpoints of CUDA-GDB · Inspecting variables with CUDA-GDB · Listing kernel functions · Variables investigation · Runtime validation with CUDA-memcheck · Detecting memory out of bounds · Detecting other memory errors · Profiling GPU applications with Nsight Systems · Profiling a kernel with Nsight Compute · Profiling with the CLI · Profiling with the GUI · Performance analysis report · Baseline compare · Source view · Summary

Chapter 6: Scalable Multi-GPU Programming
Technical requirements · Solving a linear equation using Gaussian elimination · Single GPU hotspot analysis of Gaussian elimination · GPUDirect peer to peer · Single node – multi-GPU Gaussian elimination · Brief introduction to MPI · GPUDirect RDMA · CUDA-aware MPI · Multinode – multi-GPU Gaussian elimination · CUDA streams · Application 1 – using multiple streams to overlap data transfers with kernel execution · Application 2 – using multiple streams to run kernels on multiple devices · Additional tricks · Benchmarking an existing system with an InfiniBand network card · NVIDIA Collective Communication Library (NCCL) · Collective communication acceleration using NCCL · Summary

Chapter 7: Parallel Programming Patterns in CUDA
Technical requirements · Matrix multiplication optimization · Implementation of the tiling approach · Performance analysis of the tiling approach · Convolution · Convolution operation in CUDA · Optimization strategy · Filtering coefficients optimization using constant memory · Tiling input data using shared memory · Getting more performance · Prefix sum (scan) · Blelloch scan implementation · Building a global size scan · The pursuit of better performance · Other applications for the parallel prefix-sum operation · Compact and split · Implementing compact · Implementing split · N-body · Implementing an N-body simulation on GPU · Overview of an N-body simulation implementation · Histogram calculation · Compile and execution steps · Understanding a parallel histogram · Calculating a histogram with CUDA atomic functions · Quicksort in CUDA using dynamic parallelism · Quicksort and CUDA dynamic parallelism · Quicksort with CUDA · Dynamic parallelism guidelines and constraints · Radix sort · Two approaches · Approach 1 – warp-level primitives · Approach 2 – Thrust-based radix sort · Summary

Chapter 8: Programming with Libraries and Other Languages
Linear algebra operation using cuBLAS · cuBLAS SGEMM operation · Multi-GPU operation · Mixed-precision operation using cuBLAS · GEMM with mixed precision · GEMM with TensorCore · cuRAND for parallel random number generation · cuRAND host API · cuRAND device API · cuRAND with mixed precision cuBLAS GEMM · cuFFT for Fast Fourier Transformation in GPU · Basic usage of cuFFT · cuFFT with mixed precision · cuFFT for multi-GPU · NPP for image and signal processing with GPU · Image processing with NPP · Signal processing with NPP · Applications of NPP · Writing GPU accelerated code in OpenCV · CUDA-enabled OpenCV installation · Implementing a CUDA-enabled blur filter · Enabling multi-stream processing · Writing Python code that works with CUDA · Numba – a high-performance Python compiler · Installing Numba · Using Numba with the @vectorize decorator · Using Numba with the @cuda.jit decorator · CuPy – GPU accelerated Python matrix library · Installing CuPy · Basic usage of CuPy · Implementing custom kernel functions · PyCUDA – Pythonic access to CUDA API · Installing PyCUDA · Matrix multiplication using PyCUDA · NVBLAS for zero coding acceleration in Octave and R · Configuration · Accelerating Octave's computation · Accelerating R's computation · CUDA acceleration in MATLAB · Summary

Chapter 9: GPU Programming Using OpenACC
Technical requirements · Image merging on a GPU using OpenACC · OpenACC directives · Parallel and loop directives · Data directive · Applying the parallel, loop, and data directive to merge image code · Asynchronous programming in OpenACC · Structured data directive · Unstructured data directive · Asynchronous programming in OpenACC · Applying the unstructured data and async directives to merge image code · Additional important directives and clauses · Gang/vector/worker · Managed memory · Kernel directive · Collapse clause · Tile clause · CUDA interoperability · DevicePtr clause · Routine directive · Summary

Chapter 10: Deep Learning Acceleration with CUDA
Technical requirements · Fully connected layer acceleration with cuBLAS · Neural network operations · Design of a neural network layer · Tensor and parameter containers · Implementing a fully connected layer · Implementing forward propagation · Implementing backward propagation · Layer termination · Activation layer with cuDNN · Layer configuration and initialization · Implementing layer operation · Implementing forward propagation · Implementing backward propagation · Softmax and loss functions in cuDNN/CUDA · Implementing the softmax layer · Implementing forward propagation · Implementing backward propagation · Implementing the loss function · MNIST dataloader · Managing and creating a model · Network training with the MNIST dataset · Convolutional neural networks with cuDNN · The convolution layer · Implementing forward propagation · Implementing backward propagation · Pooling layer with cuDNN · Implementing forward propagation · Implementing backward propagation · Network configuration · Mixed precision operations · Recurrent neural network optimization · Using the CUDNN LSTM operation · Implementing a virtual LSTM operation · Comparing the performance between CUDNN and SGEMM LSTM · Profiling deep learning frameworks · Profiling the PyTorch model · Profiling a TensorFlow model · Summary

Appendix · Another Book You May Enjoy · Index