Title: High Performance Computing: 38th International Conference, ISC High Performance 2023, Hamburg, Germany, May 21–25, 2023, Proceedings (Lecture Notes in Computer Science)
Editors: Abhinav Bhatele, Jeff Hammond, Marc Baboulin, Carola Kruse
ISBN: 3031320409, 9783031320408
Publisher: Springer
Year of publication: 2023
Number of pages: 440 [432]
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 34 MB
Note: this is the original English-language edition of the book, not a Persian translation.
Table of contents:

Preface
Organization
Contents

Architecture, Networks, and Storage

  CPU Architecture Modelling and Co-design
    1 Introduction
    2 Approach to Modelling
    3 Methodology
    4 Model Tuning and Validation
    5 Applications
      5.1 GROMACS
      5.2 GPAW
    6 Results
      6.1 GROMACS
      6.2 GPAW
    7 Related Work
    8 Summary and Conclusions
    References

  Illuminating the I/O Optimization Path of Scientific Applications
    1 Introduction
    2 Related Work
    3 Visualization, Diagnosis, and Recommendations
      3.1 Extracting I/O Behavior from Metrics
      3.2 Exploring I/O Behavior Interactively
      3.3 Automatic Detection of I/O Bottlenecks
      3.4 Exploring I/O Phases and Bottlenecks
      3.5 Towards Exploring File System Usage
    4 Results
      4.1 I/O Systems in NERSC and OLCF
      4.2 I/O Bottlenecks in OpenPMD
      4.3 Improving AMReX with Asynchronous I/O
    5 Conclusion
    References

  Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems
    1 Introduction
    2 Related Work
    3 Implementing Embedding Tables in Heterogeneous Memory Systems
    4 Cached Embeddings
      4.1 CachedEmbeddings Performance
    5 DLRM Implementation Methodology
    6 End-to-End DLRM Performance
    7 Conclusions and Future Work
    References

HPC Algorithms and Applications

  Efficient GPU Offloading with OpenMP for a Hyperbolic Finite Volume Solver on Dynamically Adaptive Meshes
    1 Introduction
    2 Science Case and Code Architecture
    3 A Realisation of GPU Offloads with target map
    4 User-Managed Memory Management
      4.1 Data Pre-allocation on the GPU
      4.2 Pre-allocation on the CPU with Unified Memory
    5 Results
    6 Discussion and Conclusions
    7 Summary and Outlook
    References

  Shallow Water DG Simulations on FPGAs: Design and Comparison of a Novel Code Generation Pipeline
    1 Introduction
    2 Background
      2.1 Mathematical Model and Numerical Scheme
      2.2 Simulation Scenario: Radial Dam Break
      2.3 FPGAs
    3 Proposed Code Generation Pipeline (CGP)
      3.1 GHODDESS
      3.2 pystencils
      3.3 StencilStream
      3.4 Integration
    4 Existing Dataflow Design
    5 FPGA Designs, Experiments and Evaluation
      5.1 Performance of the CPU Reference and Validation
    6 Analysis
    7 Related Work
    8 Conclusion and Outlook
    References

  Massively Parallel Genetic Optimization Through Asynchronous Propagation of Populations
    1 Introduction
    2 Related Work
    3 Propulate Algorithm and Implementation
    4 Experimental Evaluation
      4.1 Experimental Environment
      4.2 Benchmark Functions
      4.3 Meta-optimizing the Optimizer
      4.4 Benchmark Function Optimization
      4.5 HP Optimization for Remote Sensing Classification
      4.6 Scaling
    5 Conclusion
    References

  Steering Customized AI Architectures for HPC Scientific Applications
    1 Introduction
    2 Related Work and Research Contributions
    3 Batching/Compression or Why Matricization Matters?
    4 The Graphcore IPU Hardware Technology
      4.1 Architecture Principles and Hardware Details
      4.2 Programming Model and Poplar Development Kit
    5 HPC Scientific Applications
      5.1 Adaptive Optics in Computational Astronomy
      5.2 Seismic Processing and Imaging
      5.3 Climate/Weather Prediction Applications
      5.4 Wireless Communications
    6 Implementation Details
    7 Performance Results
    8 Limitations and Perspectives
    9 Conclusion and Future Work
    References

  GPU-Based Low-Precision Detection Approach for Massive MIMO Systems
    1 Introduction
    2 Brief Background
      2.1 Modulation
      2.2 Signal to Noise Ratio (SNR)
      2.3 Error Rate and Time Complexity
    3 Related Work
    4 System Model
      4.1 Tree-Based Representation
    5 Multi-level Approach
    6 GPU-Based Multi-level Approaches
      6.1 GPU Multi-level
      6.2 Multi-GPU Version
    7 Results and Discussions
    8 Conclusion and Perspectives
    References

  A Mixed Precision Randomized Preconditioner for the LSQR Solver on GPUs
    1 Introduction
    2 Background
      2.1 Related Work
    3 Design and Implementation of the Mixed Precision Preconditioner
    4 Numerical Experiments
      4.1 Experiment Setup
      4.2 Discussion
    5 Conclusion
    References

  Ready for the Frontier: Preparing Applications for the World's First Exascale System
    1 Introduction and Background
    2 Systems Overview
      2.1 Summit
      2.2 Frontier
    3 Applications
      3.1 CoMet
      3.2 Cholla: Computational Hydrodynamics on Parallel Architecture
      3.3 GESTS: GPUs for Extreme-Scale Turbulence Simulations
      3.4 LBPM: Lattice Boltzmann Methods for Porous Media
      3.5 LSMS
      3.6 NUCCOR/NTCL
      3.7 NAMD
      3.8 PIConGPU
    4 Lessons Learned
    5 Conclusions
    References

  End-to-End Differentiable Reactive Molecular Dynamics Simulations Using JAX
    1 Introduction
      1.1 Related Work
      1.2 Our Contribution
    2 Background
      2.1 ReaxFF Overview
      2.2 JAX and JAX-MD Overview
    3 Design and Implementation
      3.1 Memory Management
      3.2 Generation of Interaction Lists
      3.3 Force Field Training
    4 Experimental Results
      4.1 Software and Hardware Setup
      4.2 Validation of MD Capabilities
      4.3 Performance and Scalability
      4.4 Training
    5 Conclusion
    References

Machine Learning, AI, and Quantum Computing

  Allegro-Legato: Scalable, Fast, and Robust Neural-Network Quantum Molecular Dynamics via Sharpness-Aware Minimization
    1 Introduction
    2 Method Innovation
      2.1 Summary of Neural-Network Quantum Molecular Dynamics
      2.2 Summary of Sharpness-Aware Minimization
      2.3 Key Innovation: Allegro-Legato: SAM-Enhanced Allegro
      2.4 RXMD-NN: Scalable Parallel Implementation of Allegro-Legato NNQMD
    3 Results
      3.1 Experimental Platform
      3.2 Fidelity-Scaling Results
      3.3 Computational-Scaling Results
    4 Discussions
      4.1 Simulation Time
      4.2 Training Time
      4.3 Model Accuracy
      4.4 Implicit Sharpness Regularization in Allegro
      4.5 Training Details
    5 Applications
    6 Related Work
    7 Conclusion
    References

  Quantum Annealing vs. QAOA: 127 Qubit Higher-Order Ising Problems on NISQ Computers
    1 Introduction
    2 Methods
      2.1 Ising Model Problem Instances
      2.2 Quantum Alternating Operator Ansatz
      2.3 Quantum Annealing
      2.4 Simulated Annealing Implementation
    3 Results
    4 Discussion
    References

  Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection
    1 Introduction
    2 Background
      2.1 NVIDIA Tensor Core and SGEMM Emulation
      2.2 Quantum Circuit Simulation and Tensor Network Contraction
    3 SGEMM Emulation Library on Tensor Cores
    4 Automatic Precision Selection
      4.1 Exponent Statistics and Computing Mode Selection Rule
      4.2 Dynamic Kernel Selection
      4.3 The Overhead of the Exponent Statistics
    5 Experiment
      5.1 Preparation
      5.2 Exploratory Experiment
      5.3 Random Quantum Circuit Simulation
    6 Conclusion
    References

Performance Modeling, Evaluation, and Analysis

  A Study on the Performance Implications of AArch64 Atomics
    1 Introduction
    2 The Problem
      2.1 RAJAPerf and the PI_ATOMIC kernel
      2.2 Performance Results
      2.3 A Closer Look at OpenMP Floating-Point Atomics
    3 Benchmarking CAS Operations
      3.1 Compare-and-Swap Operations
      3.2 Benchmark Description
      3.3 Assembly Kernels
    4 Experiments and Observations
      4.1 Evaluating the Performance of CAS
      4.2 A Closer Look at A64FX
      4.3 Testing LL-SC Implementations
      4.4 Summary and Recommendations
    5 Related Work
    6 Conclusions
    References

  Analyzing Resource Utilization in an HPC System: A Case Study of NERSC's Perlmutter
    1 Introduction
    2 Related Work
    3 Background
      3.1 System Overview
      3.2 Data Collection
      3.3 Analysis Methods
    4 Results
      4.1 Workloads Overview
      4.2 Resource Utilization
      4.3 Temporal Characteristics
      4.4 Spatial Characteristics
      4.5 Correlations
    5 Discussion and Conclusion
    References

  Overcoming Weak Scaling Challenges in Tree-Based Nearest Neighbor Time Series Mining
    1 Introduction
    2 Matrix Profile Background and Performance-Accuracy Trade-offs
      2.1 Related Work
      2.2 Potentials of Tree-based Methods
    3 Current Parallel Tree-Based Approach and Its Shortcomings
    4 Overcoming the Scalability Challenges
      4.1 Pipelining Mechanism
      4.2 Forest of Trees on Ensembles of Resources
    5 Modeling the Impact of Optimizations on Complexity
    6 Experimental Setup
    7 Evaluations
      7.1 Region of Benefit
      7.2 Performance on Real-World Datasets
      7.3 Single-Node Performance
      7.4 Scaling Overheads
      7.5 Effects of Pipelining and Forest Mechanisms
      7.6 Scaling Results
      7.7 Billion Scale Experiment
    8 Conclusions
    References

  Porting Numerical Integration Codes from CUDA to oneAPI: A Case Study
    1 Introduction
    2 Background
      2.1 oneAPI and SYCL
      2.2 CUDA-Backend for SYCL
      2.3 Related Work
    3 Numerical Integration Use Case
      3.1 PAGANI
      3.2 m-Cubes
    4 Porting Process
      4.1 Challenges
    5 Experimental Results
      5.1 Offloading Mathematical Computations to Kernels
      5.2 Benchmark Integrands Performance Comparison
      5.3 Simple Integrands Performance Comparison
      5.4 Factors Limiting Performance
    6 Conclusion
    References

  Performance Evaluation of a Next-Generation SX-Aurora TSUBASA Vector Supercomputer
    1 Introduction
    2 Overview of SX-Aurora TSUBASA VE30
      2.1 The SX-Aurora TSUBASA Product Family
      2.2 Basic Architecture of the VE30 Processor
      2.3 Architectural Improvements from the VE20 Processor
    3 Performance Evaluation
      3.1 Evaluation Environment
      3.2 Basic Benchmarks
      3.3 Evaluation of Architectural Improvements
      3.4 Real-World Workloads
    4 Performance Tuning for VE30
      4.1 Selective L3 Caching
      4.2 Partitioning Mode
    5 Conclusions
    References

Programming Environments and Systems Software

  Expression Isolation of Compiler-Induced Numerical Inconsistencies in Heterogeneous Code
    1 Introduction
    2 Examples of Compiler-Induced Inconsistencies
    3 Technical Approach
      3.1 Hierarchy Extraction
      3.2 Hierarchical Code Isolation
      3.3 Source-to-Source Precision Enhancement
    4 Experimental Evaluation
      4.1 RQ1: Numerical Inconsistencies in Heterogeneous Programs
      4.2 RQ2: Comparison with the State of the Art
      4.3 Threats to Validity
    5 Related Work
    6 Conclusion
    References

  SAI: AI-Enabled Speech Assistant Interface for Science Gateways in HPC
    1 Introduction and Motivation
      1.1 Motivation
      1.2 Challenges in Enabling Conversational Interface for HPC
      1.3 Contributions
    2 Background
      2.1 Conversational User Interface
      2.2 Open OnDemand
      2.3 Ontology and Knowledge Graphs
      2.4 Spack
    3 Terminologies
    4 Proposed SAI Framework
      4.1 Generating HPC Datasets for Speech and Text
      4.2 Fine-Tuning Speech Recognition Model for HPC Terminologies
      4.3 Designing an Entity Detection and Classification Model for SAI
      4.4 Creating the HPC Ontology and Knowledge Graphs
      4.5 Knowledge Graph Selection and Inference
      4.6 Software Installer Check and Interfacing with Spack
      4.7 Integration with Open OnDemand
    5 Insights into SAI Usage and Explainable Flow
    6 Experimental Evaluation
      6.1 Evaluation Platform
      6.2 Evaluation Methodology
      6.3 Evaluating ASR Model
      6.4 Evaluating NLU Model
      6.5 Performance Evaluation of Combined ASR and NLU Models
      6.6 Overhead Analysis of SAI
      6.7 Overhead Analysis of Scaling Passenger App Users
      6.8 Analysis of SAI Interactive App on Different Architectures
    7 Discussion
      7.1 Security and Authentication
      7.2 Handling Ambiguous Queries in SAI
      7.3 Trade-offs for Converting Speech to Entities
      7.4 Portability for New Software and Systems
    8 Related Work
    9 Future Work
    10 Conclusion
    References

Author Index