Authors: Arndt Bode (editor), Thomas Ludwig (editor), Wolfgang Karl (editor), Roland Wismüller (editor). ISBN: 3540679561, 9783540679561. Publisher: Springer. Year of publication: 2000. Pages: 1395. Language: English. File format: PDF. File size: 20 MB.
Title: Euro-Par 2000 Parallel Processing: 6th International Euro-Par Conference Munich, Germany, August 29 – September 1, 2000 Proceedings (Lecture Notes in Computer Science, 1900). This is the original English-language edition, not a Persian translation.
Euro-Par 2000 Parallel Processing
Preface
Euro-Par Steering Committee
Euro-Par 2000 Referees
Table of Contents
Four Horizons for Enhancing the Performance of Parallel Simulations Based on Partial Differential Equations
Introduction
Background and Complexity of PDEs
PDE Varieties and Complexities
Typical PDE Tasks
Concurrency, Communication, and Synchronization
Source #1: Expanded Number of Processors
Source #2: More Efficient Use of Faster Processors
PDE Workingsets
Source #3: More Architecture-Friendly Algorithms
Source #4: Algorithms Delivering More "Science per Flop"
Summary of Performance Improvements
References
E2K Technology and Implementation
Grid-Based Asynchronous Migration of Execution Context in Java Virtual Machines
Introduction
The Thread Migration System MOBA
MOBA System Components
Programming Interface
Implementation
Organization of the Migration Facilities
Design Issues of Thread Migration in JVMs
Moba/G Service Requirements
Grid-Based Registration Service
Grid-Based Installation Service
Grid-Based Startup Service
Authentication and Authorization Service
Secure Communication Service
Conclusion
References
Logical Instantaneity and Causal Order: Two "First Class" Communication Modes for Parallel Computing
Introduction
Underlying System Model
Underlying Asynchronous Distributed System
Communication Primitives at the Application Level
Logically Instantaneous Communication
Definition
Communication Statements
Implementing LI Communication
Causally Ordered Communication
Definition
Implementation Protocols
References
The TOP500 Project of the Universities Mannheim and Tennessee
Topic 01 Support Tools and Environments
Visualization and Computational Steering in Heterogeneous Computing Environments
Introduction
Related Work
OViD
OViD Architecture
OViD with a Parallel CFD Simulation
Conclusion and Future Work
References
A Web-Based Finite Element Meshes Partitioner and Load Balancer
Introduction
Related Work
The System Structure of FEMPAL
The Partitioner
The Load Balancer
The Simulator
The Visualization Tool
The Web Interface
The Implementation of FEMPAL
Experience and Experimental Results
Experimental Results for the Partitioner
Experimental Results for the Load Balancer
Experience with the Simulator
Conclusions and Future Work
Acknowledgments
References
A Framework for an Interoperable Tool Environment
Introduction
Initial Toolset
Tool Interoperability Scenarios
Interaction with a Browser
Computational Steering
Interaction with a Debugger
Conclusion and Future Work
References
ToolBlocks: An Infrastructure for the Construction of Memory Hierarchy Analysis Tools
Introduction
System Overview
Example Output
Conclusion
References
A Preliminary Evaluation of FINESSE, a Feedback-Guided Performance Enhancement System
Introduction
Overview of FINESSE
Definitions
Experimental Arrangement
Automatic versus Manual Parallelisation of SP
Parallelisation of SP Using FINESSE
Summary of Results for All Six Test Codes
Related Work
Conclusion
References
On Combining Computational Differentiation and Toolkits for Parallel Scientific Computing
Numerical versus Automatic Differentiation
Computational Differentiation in Scientific Toolkits
Potential Gain of CD and Future Research Directions
Concluding Remarks
References
Generating Parallel Program Frameworks from Parallel Design Patterns
Introduction
Reaction--Diffusion Texture Generation
Design Pattern Selection
Generating and Using the Mesh Framework
The Implementation of the Mesh Framework
Evaluating the Mesh Framework
Other Patterns in CO₂P₃S
Conclusions
References
Topic 02 Performance Evaluation and Prediction
A Callgraph-Based Search Strategy for Automated Performance Diagnosis
Introduction
Some Paradyn Basics
Exclusive vs. Inclusive Timing Metrics
The Performance Consultant
Original Paradyn: Searching the Code Hierarchy
Dynamic Function Call Instrumentation
Call Site Instrumentation Code
Control Flow for Dynamic Call Site Instrumentation
Callgraph-Based Searching
Experimental Results
Experimental Setup
Results
Conclusions
References
Automatic Performance Analysis of MPI Applications Based on Event Traces
Introduction
EARL
The EARL Event Trace Model
The EARL Language
An Extensible and Modular Tool Architecture
Automatic Performance Analysis of MPI Programs
Analyzing a Real Application
Related Work
Conclusion and Future Work
References
Pajé: An Extensible Environment for Visualizing Multi-threaded Programs Executions
Introduction
Outline of Pajé
Athapascan: A Thread-Based Parallel Programming Model
Tracing of Parallel Programs
Visualization of Threads in Pajé
Extensibility
Modular Architecture
Flexibility of Visualization Modules
Genericity of Pajé
Conclusion
References
A Statistical-Empirical Hybrid Approach to Hierarchical Memory Analysis
Introduction
The Hybrid Approach
The Hybrid Approach: Level 1
The Hybrid Approach: Level 2
Case Study
Architecture Descriptions
ASCI Representative Workloads
Hybrid Analysis
Conclusions and Future Work
References
Use of Performance Technology for the Management of Distributed Systems
Introduction
The PACE System
Performance Language
Performance Object Hierarchy
Performance Object Definition
Software Objects
Hardware Objects
Model Evaluation
Performance Models in Use
Off-Line Analysis
On-the-Fly Analysis
Conclusion
Acknowledgement
References
Delay Behavior in Domain Decomposition Applications
Introduction
Asynchronous Communication
Lower Bound for the Number of Total Delays
Transition Probability
Effective Delay
Simulations
Conclusions
References
Automating Performance Analysis from UML Design Patterns
Introduction
The Meeting Design Patterns
Petri Net Models
Arrival/Departure Petri Nets
Conclusion
References
Integrating Automatic Techniques in a Performance Analysis Session
Introduction
KappaPi Tool. Rule-Based Performance Analysis System
Examining an Application: Forest Fire Propagation
Conclusions
Acknowledgments
References
Combining Light Static Code Annotation and Instruction-Set Emulation for Flexible and Efficient On-the-Fly Simulation
Introduction
Light Static Code Annotation and Instruction-Set Emulation
calvin2 and DICE
calvin2
DICE: A Dynamic Inner Code Emulator
Performance Evaluation
Summary and Future Work
References
SCOPE - The Specific Cluster Operation and Performance Evaluation Benchmark Suite
Introduction
Performance Evaluation of HPC Systems and Clusters
The Structure of the SCOPE Benchmark
Case Study Analysis and Results
Conclusions
References
Implementation Lessons of Performance Prediction Tool for Parallel Conservative Simulation
Introduction
Analyzer for Conservative Simulation Protocol
Issues for Accurate Predictions
Conclusion
References
A Fast and Accurate Approach to Analyze Cache Memory Behavior
Introduction
Overview of CMEs
Solving CMEs
Sampling
CMEs Particularization
Generating Samples
Evaluation
Performance Evaluation
Conclusions
References
Impact of PE Mapping on Cray T3E Message-Passing Performance
Introduction
Random Pairwise Exchanges
Random Pairing in the Cray T3E
Random Pairing in the SGI Origin 2000
Preliminary Conclusions
MPI_Cart_create Optimization on the Cray T3E
Our Mapping Algorithm
1D Algorithm
N-Dimensional Algorithm
Results
MPI_Cart_create Benchmark
Conclusions
Acknowledgements
References
Performance Prediction of an NAS Benchmark Program with ChronosMix Environment
Introduction
Presentation of the ChronosMix Environment
Performance Prediction of the NAS Integer Sorting Benchmark
Presentation of the Integer Sorting Benchmark
Comparison of IS on Various Types of Architecture
Conclusion
References
Topic 03 Scheduling and Load Balancing
A Hierarchical Approach to Irregular Problems
Introduction
Data Mapping and Runtime Load Balancing
Fault Prevention
Experimental Results
References
Load Scheduling with Profile Information
Introduction
Related Work
DCPI
Information Supplied by DCPI
Deriving Locality Information
Validation of the Locality Information
Scheduling with Runtime Data
Balanced Scheduling
Balanced Scheduling with Locality Data
Communicating Locality Classifications to the Scheduler
Limitations of Experiments
Experimental Results
Conclusions
References
Neighbourhood Preserving Load Balancing: A Self-Organizing Approach
Introduction
Self Organizing Maps (SOM)
Load Balancing with SOM
Results
Improvement with Multilevel Approach
Related Work
Conclusion
References
The Impact of Migration on Parallel Job Scheduling for Distributed Systems
Introduction
The Migration Algorithm
Methodology
Experimental Results
Conclusions
References
Memory Management Techniques for Gang Scheduling
Introduction
Preliminaries
System Model
Job Selection and Mapping for Gang Scheduling
Gang Scheduling with Memory Considerations
Memory Management Techniques for Gang Scheduling
Memory Balancing
Adaptive Multi-programming Level
Experimental Results
Workload Model
Simulation Results
Summary and Future Work
References
Exploiting Knowledge of Temporal Behaviour in Parallel Programs for Improving Distributed Mapping
Introduction
The Parallel Program Model
Experimental Study on a PVM Platform
Conclusions
References
Preemptive Task Scheduling for Distributed Systems
Introduction
Preliminaries
The PTS Algorithm
Performance Results
Conclusion
References
Towards Optimal Load Balancing Topologies
Introduction
Definitions and Background
Flow Calculation
Flow Migration
Conclusion
References
Scheduling Trees with Large Communication Delays on Two Identical Processors
Introduction
NP-Hardness Result
Polynomial Time Algorithm for Complete Trees
References
Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning
Introduction
Parallel Multi-constraint Refinement
Experimental Results
Conclusions
References
Experiments with Scheduling Divisible Tasks in Clusters of Workstations
Introduction
Processing Divisible Tasks on Star and Bus Topologies
Test Applications
Search for a Pattern
Compression
Join
Graph Coloring and Genetic Search
The Results
Discussion and Conclusions
References
Optimal Mapping of Pipeline Algorithms
Introduction
The Problem
The Analytical Model
Validation of the Model
Conclusions
References
Dynamic Load Balancing for Parallel Adaptive Multigrid Solvers with Algorithmic Skeletons
Introduction
Algorithmic Skeletons with Skil
Dynamic Load Balancing with Skeletons
Properties
Conclusions and Future Work
References
Topic 04 Compilers for High Performance
Improving the Sparse Parallelization Using Semantical Information at Compile-Time
Introduction
Compilation Strategy Based on Privatizations
Sparse Loops Partitioning
Sparse Matrix Updating
Buffering Analysis
Parallelization of the Matrix Transposition
Experimental Results
Conclusions
References
Automatic Parallelization of Sparse Matrix Computations: A Static Analysis
Working Context
Symbolic Analysis
Abstraction Domain
Calculation of the Filling Function
Sparse Dependence Analysis
References
Automatic SIMD Parallelization of Embedded Applications Based on Pattern Recognition
Introduction
Code Transformation Using CTT
Experimental Framework
Results and Discussion
Conclusions
References
Temporary Arrays for Distribution of Loops with Control Dependences
Introduction
Distribution of Control Structures: Related Works
Complex Control Flow
If Conversion
McKinley and Kennedy's Approach
The Mixed Dependence Graph
Definition
Introducing Temporary Arrays
Parallelizing Algorithm
Conclusion
References
Automatic Generation of Block-Recursive Codes
Introduction
The Program-Space Formulation
Traversing the Program Iteration Space
Code Generation
Experimental Results
Related Work and Conclusions
References
Left-Looking to Right-Looking and Vice Versa: An Application of Fractal Symbolic Analysis to Linear Algebra Code Restructuring
Introduction
Factorizations and Triangular Solve
Lower Triangular Solve
Cholesky Factorization
LU Factorization with Partial Pivoting
Fractal Symbolic Analysis
Recursive Simplification
Base Symbolic Comparison
LU with Pivoting
Conclusions
References
Identifying and Validating Irregular Mutual Exclusion Synchronization in Explicitly Parallel Programs
Introduction
The CSSAME Form
Motivation and Overview
Detecting Mutex Structures
Lock-Picking
Experimental Results
Conclusions
References
Exact Distributed Invalidation
Introduction
Approach
Example
Coherence Equations
Compiler Implementation
Basic Blocks
Loops
Nested Loops and Summarising
Experiments
Conclusion
References
Scheduling the Computations of a Loop Nest with Respect to a Given Mapping
Introduction
Compatibility of Mapping and Scheduling Functions
Statement of the Problem
Hypotheses and Notations
The Underlying Scheduler
Example
Existence of a Compatible Schedule
The Algorithm
Construction of the Vectors
Construction of the Schedule Linear Parts
Computation of the Constants
Algorithm Complexity
Conclusion
References
Volume Driven Data Distribution for NUMA-Machines
Introduction
Problem Formulation
Related Work
Geometric Framework
Data Transformation
Ranking References
Ranking Transformations
Final Selection
Enumerating Transformations
Data Distribution
The Utilization Pattern
The Offset
Results and Conclusion
References
Topic 05 Parallel and Distributed Databases and Applications
Database Replication Using Epidemic Communication
Introduction
System Model and Epidemic Update Protocols
Performance Results
Response Time Analysis
Varying Degree of Replication
Comparison with Traditional Methods
Discussion
References
Evaluating the Coordination Overhead of Replica Maintenance in a Cluster of Databases
Introduction
Related Work
Design Alternatives
TP-Heavy: Transaction Monitor TUXEDO
TP-Lite: ORACLE8 RDBMS
TP-Less Coordinator
Evaluation
Experimental Setup
Lower Bounds of Coordination Overhead for Synchronous Replication
Response Times of Insert Streams with Synchronous Replication
Response Times of Insert Streams with Asynchronous Replication
Conclusions
References
A Communication Infrastructure for a Distributed RDBMS
Introduction
The Communication Architecture
Dialog Management
Conclusion
References
Distribution, Replication, Parallelism, and Efficiency Issues in a Large-Scale Online/Real-Time Information System for Foreign Exchange Trading
Introduction
Application and Requirements
System Architecture
Implementation Aspects
Summary and Conclusions
References
Topic 06 Complexity Theory and Algorithms
Positive Linear Programming Extensions: Parallel Complexity and Applications
Introduction
Extended PLP
The Lagrangian Search Method
Searching with Decision Problems
Applications
References
Parallel Shortest Path for Arbitrary Graphs
Introduction
Overview and Summary of New Results
Notation and Basic Facts
Parallelization
Finding Shortcuts
Determining Δ
Adaptation to Distributed Memory Machines
Conclusion
References
Periodic Correction Networks
Introduction
Preliminaries
Periodic $k$-Correction Network
Conclusions
References
Topic 07 Applications on High-Performance Computers
References
An Efficient Algorithm for Parallel 3D Reconstruction of Asymmetric Objects from Electron Micrographs
Introduction
3D Reconstruction by Fourier Transforms
Results of Numerical Experiments
Performance of This Parallel Program
Summary
Acknowledgments
References
Fast Cloth Simulation with Parallel Computers
Introduction
Implementation
Forces
Collisions
Solver
Parallelization
Forces
Collisions
Conjugate Gradient
Results and Conclusions
References
The Input, Preparation, and Distribution of Data for Parallel GIS Operations
Introduction
Vector-Topological Data
The Parallel Data Partitioning Algorithm
Implementation and Performance
Conclusions and Future Work
Acknowledgements
References
Study of the Load Balancing in the Parallel Training for Automatic Speech Recognition
Introduction
The Training
Complexity of the Training
Parallelization
Experimentations
Conclusion
References
Pfortran and Co-Array Fortran as Tools for Parallelization of a Large-Scale Scientific Application
Introduction
Quantum Dynamics Algorithm
Parallelization Tools
Pfortran
Co-Array Fortran
Code Parallelization
Results
Discussion
References
Sparse Matrix Structure for Dynamic Parallelisation Efficiency
Introduction
PERMAS Global Structure
Blocking: Fixed-Sized vs. Variable-Sized
Data Distribution and Interleaving
Conclusions and Future Work
References
A Multi-color Inverse Iteration for a High Performance Real Symmetric Eigensolver
Introduction
The Multi-color Inverse Iteration
Numerical Tests and Remarks
References
Parallel Implementation of Fast Hartley Transform (FHT) in Multiprocessor Systems
Introduction
The Analysis of the Sequential FHT Algorithm
Parallelization of the FHT Algorithm
Results and Conclusions
References
Topic 08 Parallel Computer Architecture
Coherency Behavior on DSM: A Case Study
Introduction
Framework / Experimental Set-Up
Data Activity
Code Activity
Conclusion and Future Works
References
Hardware Migratable Channels
Introduction
Compiler Directed Input Buffers
Communicating Ports over Ports
Protocol
Conclusions
References
Reducing the Replacement Overhead on COMA Protocols for Workstation-Based Architectures
Introduction
Replacement Strategies in COMA Protocols
The VSR-COMA Protocol
Events, States, and Operations
State Transition Diagram
VSR-COMA Replacement Strategy
Results
Conclusions
References
Cache Injection: A Novel Technique for Tolerating Memory Latency in Bus-Based SMPs
Introduction
Cache Injection
Experimental Methodology
Results
Conclusion
References
Adaptive Proxies: Handling Widely-Shared Data in Shared-Memory Multiprocessors
Introduction
Adaptive Proxies
Simulated Architecture and Experimental Design
Experimental Results
Conclusions and Further Work
References
Topic 09 Distributed Systems and Algorithms
A Combinatorial Characterization of Properties Preserved by Antitokens
Introduction
Framework
Properties
Combinatorial Characterization
Conclusion
References
Searching with Mobile Agents in Networks with Liars
Introduction
Preliminaries and Definitions
Models
Results
Complete Graphs
Ring and Torus
Hypercube
Trees
References
Complete Exchange Algorithms for Meshes and Tori Using a Systematic Approach
Introduction
Considered Scenarios
The Method
A CC-Cube Algorithm for Complete Exchange
Concluding Remarks
References
Algorithms for Routing AGVs on a Mesh Topology
Introduction
The Problem
The Routing Strategy
Routing among Nodes
Routing among Extended Nodes
Complexity of Concurrent Moves
Discussions & Conclusions
References
Self-Stabilizing Protocol for Shortest Path Tree for Multi-cast Routing in Mobile Networks
Introduction
Shortest Path Tree Protocol
Complexity Analysis
Multi-cast Protocol
References
Quorum-Based Replication in Asynchronous Crash-Recovery Distributed Systems
Introduction
System Model and Building Blocks
Quorum-Based Replica Management
Discussion
References
Timestamping Algorithms: A Characterization and a Few Properties
Introduction
Computation Model
A Characterization of Timestamping Algorithms
Causal Pasts of a Set of Events $E$
Properties
Related Work
References
Topic 10 Programming Languages, Models, and Methods
The Field
The Common Agenda
The Selection Process
The Papers
HPF vs. SAC -- A Case Study
Introduction
A Case Study: The PDE1-Benchmark
Performance Comparison
Conclusion
References
Developing a Communication Intensive Application on the EARTH Multithreaded Architecture
Introduction
The EARTH Multithreaded Architecture
Multithreaded Implementation
Scalability Results
Performance Analysis
Conclusion
References
On the Predictive Quality of BSP-like Cost Functions for NOWs
Introduction
Our Contribution
Fitting the Cost Functions
Validation Results
Predicting the Communication Time of Sorting Algorithms
Future Work
References
Exploiting Data Locality on Scalable Shared Memory Machines with Data Parallel Programs
Introduction
Thread Parallelism vs. Process Parallelism
Data Mapping and Data Layout
Work Distribution
Communication and Synchronization
Private and Reduction Variables
Experiments and Results
Summary and Conclusion
References
The Skel-BSP Global Optimizer: Enhancing Performance Portability in Parallel Programming
Introduction
The Skel-BSP Methodology
The Skel-BSP Compiler
The Cost Model
The Program Annotated Tree (PAT)
The Global Optimizer
The Transformation Rules
Initializing the PAT
Reducing Resources
Augmenting Parallelism
Case Study
Conclusions and Related Work
References
A Theoretical Framework of Data Parallelism and Its Operational Semantics
Introduction
Our Theory
Objects
Operations
A Minimal Notation Set
Theory Adequacy
Example 1
Example 2
Operational Semantics
Well-Formed Statements
States and Transitions
Example
Conclusion
References
A Pattern Language for Parallel Application Programs
Introduction
Organization of the Pattern Language
Related Work
Conclusions
References
Oblivious BSP
Introduction
The Oblivious BSP Model
Acknowledgements
References
A Software Architecture for HPC Grid Applications
Introduction
An Example: Heat Flow in an Insulated Bar
Conclusions
Satin: Efficient Parallel Divide-and-Conquer in Java
Introduction
The Programming Model
Spawn and Sync
The Parameter Passing Mechanism
The Implementation
Performance Evaluation
Basic Spawn Overhead (Fibonacci)
Parallel Applications
Related Work
Conclusions and Future Work
References
Implementing Declarative Concurrency in Java
Introduction
Related Work
Logic Programs for Concurrent Programming
Events and Constraints
Markers and Events
Example
Implementation
Architecture
The Constraint Interpreter
Conclusion
References
Building Distributed Applications Using Multiple, Heterogeneous Environments
Introduction
Designing Dynamic Environments
The Role of Java
Shared Library Aspects
Shared and Static Libraries
Conclusion
References
A Multiprotocol Communication Support for the Global Address Space Programming Model on the IBM SP
Introduction
SMP-Aware Communication Protocols
Performance of Communication Operations
Application Study
Related Work
Conclusions and Future Work
References
A Comparison of Concurrent Programming and Cooperative Multithreading
Introduction
Language Features
Experimental Results
CP versus CM (Single Processor)
CP versus PCM (Multiprocessor)
Discussion
Conclusion
References
The Multi-architecture Performance of the Parallel Functional Language GPH
Introduction
The GUM Runtime System
Measurement Setup
Accident Blackspots: A Larger GpH Program
Conclusion
References
Novel Models for Or-Parallel Logic Programs: A Performance Analysis
Introduction
Models for Or-Parallelism
Implementation Issues
YapOr with Copying
αCOWL
Sparse Binding Arrays
Performance Evaluation
Conclusions
References
Executable Specification Language for Parallel Symbolic Computation
Introduction
SL Language, Its Sequential and Parallel Semantics
Compile-Time Transformations
Conclusion and Future Works
References
Efficient Parallelisation of Recursive Problems Using Constructive Recursion
Introduction
Constructive Recursion
Example: Heatflow in One Dimension
Conclusion
References
Development of Parallel Algorithms in Data Field Haskell
Introduction
The Data Field Model
Data Field Haskell
Forall- and For-Abstraction
An Example
Conclusions
References
The ParCel-2 Programming Language
Introduction
The ParCel-2 Programming Model
The ParCel-2 Syntax
Interface Declarations
Body Declarations
Topology Declarations
Conclusion and Future Work
References
Topic 11 Numerical Algorithms for Linear and Nonlinear Algebra
Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free
Introduction
Basic Definitions
Arrays
Matrices
Cartesian Indexing and Morton Ordering
Dilated Integers
Space and Bounds
Conclusion
References
An Efficient Parallel Linear Solver with a Cascadic Conjugate Gradient Method: Experience with Reality
Introduction
Sparsity Patterns of Matrices
Communication Expense
Optimization Targets to Improve the Floating Point Performance on RISC Processors
Matrix Vector Multiplication
Iteration Steps of the Conjugate Gradient Method
Conclusion
References
A Fast Solver for Convection Diffusion Equations Based on Nested Dissection with Incomplete Elimination
Introduction
The Nested Dissection Approach
Nested Dissection as a Direct Solver
Iterative Versions of the Nested Dissection Method
Parallel, Iterative Nested Dissection
Nested Dissection with Incomplete Elimination
Numerical Results
Present and Future Work
References
Low Communication Parallel Multigrid
Introduction
Algorithm of Brandt & Diskin
Efficiency Analysis
The Two-Level Brandt–Diskin Algorithm
Conclusions
References
Parallelizing an Unstructured Grid Generator with a Space-Filling Curve Approach
Introduction
Recursive Calculation of the Space-Filling Curve for Triangle Bisection
The Parallel Grid Generator
Numerical Examples
Conclusions
References
Solving Discrete-Time Periodic Riccati Equations on a Cluster
Introduction
Parallel Solution of DPREs
Experimental Results
References
A Parallel Optimization Scheme for Parameter Estimation in Motor Vehicle Dynamics
Introduction
Simulation of Full Motor Vehicle Dynamics
Estimation of Vehicle Parameters
Parallel Optimization
Results
References
Sliding-Window Compression on the Hypercube
Introduction
LZ77 Coding on the Hypercube
Conclusions
References
A Parallel Implementation of a Potential Reduction Algorithm for Box-Constrained Quadratic Programming
Introduction
The Potential Reduction Algorithm for Quadratic Problems with Box Constraints
A Parallel Version of PR Algorithm
Computational Results
Concluding Remarks
References
Topic 12 European Projects
NEPHEW: Applying a Toolset for the Efficient Deployment of a Medical Image Application on SCI-Based Clusters
Motivation
Background for NEPHEW
PeakWare: Toolset for Efficient Cluster Computing
Nuclear Medical Imaging Using PET
PET Image Reconstruction Using NEPHEW
Preliminary Experiences on Windows NT Clusters
Conclusions and Future Work
References
SEEDS: Airport Management Database System
Introduction
Airport Management Database System
Architecture
Data Transmission Rules
Communication with SQL Server
Security Model
Application Server and Clients
Conclusions
References
HIPERTRANS: High Performance Transport Network Modelling and Simulation
Introduction
The HIPERTRANS Requirements and Specifications
HIPERTRANS Partnership and Test Sites
Objectives
Technical Description
Results
Summary and Conclusions
Acknowledgement
References
Topic 13 Routing and Communication in Interconnection Networks
Experimental Evaluation of Hot-Potato Routing Algorithms on 2-Dimensional Processor Arrays
Introduction
Short Description of the Algorithms
Experimentation
References
Improving the Up*/Down* Routing Scheme for Networks of Workstations
Introduction
Up*/Down* Routing
Computing a BFS Spanning Tree
Computing a DFS Spanning Tree
Applying New Heuristic Rules
Traffic Balancing Algorithm
Performance Evaluation
Network Model
Simulation Results
Conclusions
References
Deadlock Avoidance for Wormhole Based Switches
Introduction
Deadlock Caused by Blocking Switches
Flow Control Methods
Source Driven Approach
Destination Driven Approach
Draining Network Approach
Simulation
Conclusion
References
An Analytical Model of Adaptive Wormhole Routing with Deadlock Recovery
Introduction
The Analytical Model
Conclusion
References
Analysis of Pipelined Circuit Switching in Cube Networks
Introduction
Analysis
Model Validation
Conclusion
References
A New Reliability Model for Interconnection Networks
Introduction
A Methodology to Evaluate Reliability Based on Markov Chains
Applying the Reliability Methodology
Fault Model
Computing Reliability Parameters
Results
Conclusions
References
A Bandwidth Latency Tradeoff for Broadcast and Reduction
Introduction
Basic Results on Broadcasting Long Messages
Fractional Tree Broadcasting
Sparse Interconnection Networks
References
Optimal Broadcasting in Even Tori with Dynamic Faults
Introduction
Model and Basic Facts
Optimal Upper Bound on the Broadcasting Time
References
Broadcasting in All-Port Wormhole 3-D Meshes of Trees
Preliminaries
Previous and Related Work
The Main Result
References
Probability-Based Fault-Tolerant Routing in Hypercubes
Introduction
The Proposed Fault-Tolerant Routing Algorithm
Performance Comparison
Conclusion
References
Topic 14 Instruction-Level Parallelism and Processor Architecture
On the Performance of Fetch Engines Running DSS Workloads
Introduction
Experimental Setup
Effect of Instruction Latency on Performance
Effect of Instruction Quality on Performance
Effect of Fetch Bandwidth on Performance
Code Reordering
Concluding Remarks
References
Cost-Efficient Branch Target Buffers
Introduction
Simulation Environment
Partial Resolution
Exploiting Branch Locality
Paired-Entry and Variable-Size BTBs
Evaluation
Variations
Conclusions
References
Two-Level Address Storage and Address Prediction
Introduction
Two-Level Address Predictor
Basic Idea
Locality Analysis and HAT Size
Prediction-Table Management
Evaluation: 2LAP versus BP
Area Cost of the Predictors
Captured Address Predictability
Accuracy
Conclusions
References
Hashed Addressed Caches for Embedded Pointer Based Codes
Introduction
Hashing Functions and Bit Juggling Addressing
Evaluation
Conclusions and Future Work
References
BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations
Introduction
The BitValue Inference Algorithm
Example
Experiments with a C Compiler
Evaluation
Practical Issues
Experiments with a Reconfigurable Hardware Compiler
Related Work
Conclusions
References
General Matrix-Matrix Multiplication Using SIMD Features of the PIII
Introduction
SIMD Parallelization
Memory Hierarchy Optimizations
Results
Conclusion
References
Redundant Arithmetic Optimizations
Introduction
Contributions
Worst-Case Delay
Instruction Scheduling
Power
Simulation Data
References
The Decoupled-Style Prefetch Architecture
Introduction
Background
The Decoupled-Style Prefetch Architecture
Results
Conclusion
References
Exploiting Java Bytecode Parallelism by Enhanced POC Folding Model
Introduction
Enhanced POC Folding Model
Performance Comparison of Various Folding Models
Conclusion
References
Cache Remapping to Improve the Performance of Tiled Algorithms
Introduction
Cache Remapping Technique
Tiled Loop Nests
Cache Memory
Conflict Misses in Tiled Algorithms
High-Level View of Cache Remapping
Low-Level Details
Implementation and Results
Processor Requirements
Simulation
Comparison with Related Work
Conclusion
References
Code Partitioning in Decoupled Compilers
Introduction
Background
Processor Model
The Compiler
Code Partitioning
Example Compiler Output
Results
Conclusion
References
Limits and Graph Structure of Available Instruction-Level Parallelism
Background and Related Work
Run-Time Analysis of Programs
Future Directions
References
Pseudo-vectorizing Compiler for the SR8000
Introduction
Pseudo-vector Optimization
Access Method Analysis
Preloading Optimization
Prefetching Optimization
Evaluation
Conclusion
References
Topic 15 Object Oriented Architectures, Tools, and Applications
Debugging by Remote Reflection
Introduction
Related Works
Remote Reflection
Implementation
Remote Object
Bytecode Extensions
Example
Status and Future Works
Conclusions
References
Compiling Multithreaded Java Bytecode for Distributed Execution
Introduction
The Hyperion System
Compiling Java
The Hyperion Run-Time System Design
Hyperion/PM2 Implementation Details
Threads and Communication
Memory Management
Performance Evaluation: Minimal-Cost Map-Coloring
Experimental Conditions and Benchmark Programs
Overhead of Hyperion/PM2 vs. Hand-Written C Code
Performance of the Multithreaded Version
Related Work
Conclusion
References
A More Expressive Monitor for Concurrent Java Programming
The Introduction to Java Monitor
The Drawbacks of Java Monitor
The Problems Introduced by Single Condition Queue
No Additional Support for Scheduling
The Troubles Caused by No-Priority Monitor
Insufficient Signal Semantics
Deadlock of Inter-monitor Nested Calls
Our Solution
The Characteristics of the EMonitor
The Syntax and Implementation of EMonitor
Experimental Result
Conclusion
References
An Object-Oriented Software Framework for Large-Scale Networked Virtual Environments
Introduction
Object and Perception Model
Replication and Persistence Model
Event Model and Synchronization
Platform Architecture
Related Work
Conclusion
References
TACO -- Dynamic Distributed Collections with Templates and Topologies
Introduction
The Multiple Threads Template Library
Global Object Pointers and Remote Method Invocation
Collections and Topologies
Collective Methods
Creation of Collections
Design Considerations
Dynamic Collections
Performance
Conclusion
References
Object-Oriented Message-Passing with TPO++
Motivation and Design Goals
Interface and Examples
Comparison with MPI
Conclusions
References
Topic 17 Architectures and Algorithms for Multimedia Applications
References
Design of Multi-dimensional DCT Array Processors for Video Applications
Introduction
A Dimensional Splitting Method
Array Processor Designs for 1-D DCT
Array Processor for Multidimensional DCT
Concluding Remarks
References
Design of a Parallel Accelerator for Volume Rendering
Introduction
Volume Rendering Algorithms
Previous SIMD Volume Rendering Work
Principle of the ISA
Accelerator Architecture Design
Mapping of Ray Casting to the Accelerator Architecture
Performance Evaluation
Conclusions
References
Automated Design of an ASIP for Image Processing Applications
Introduction
Image Processing Algorithms
Mapping the Algorithms to TTAs
Results
Conclusion
References
A Distributed Storage System for a Video-on-Demand Server
Introduction
Related Works
Overview of the Complete Video Server
The Cluster File System
Fault Tolerance Management
Experimental Results
Conclusion
References
Topic 18 Cluster Computing
Partition Cast -- Modelling and Optimizing the Distribution of Large Data Sets in PC Clusters
Introduction and Related Work
A Model for Partition-Cast in Clusters
Node Types
Network Types
Capacity Model
Model Algorithm
A More Detailed Model for an Active Node
Modelling the Limiting Resources in an Active Node
Dealing with Compressed Images
Differences in the Implementations
Evaluation of Partition-Cast
Conclusion
References
A New Home-Based Software DSM Protocol for SMP Clusters
Introduction
The JIAJIA Software DSM System
SMP Protocol for JIAJIA
Design Alternatives
SMP Protocol
Intra-node Communication
Performance Evaluation
Conclusion and Future Work
References
Encouraging the Unexpected: Cluster Management for OS and Systems Research
Introduction
The MultiOS Framework
A Hardware Reset Mechanism
Control Must Be Passed to the MultiOS Server During Boot
A Special Management Environment Which Does Not Use the Local Disk
The MultiOS Server
Security Issues
Summary
References
Flow Control in ServerNet® Clusters
Introduction
Packet Pair Flow Control
Alternating Static Window Flow Control
Summary
References
The WMPI Library Evolution: Experience with MPI Development for Windows Environments
Introduction
Related Work
MPICH – The WMPI’s Base Architecture
Windows Clusters Environment
The First WMPI Architecture
Multiple Devices
Dynamic Environment
The Second WMPI Architecture
Lessons Learned
Conclusions
References
Implementing Explicit and Implicit Coscheduling in a PVM Environment
Introduction
Coscheduling
Explicit Coscheduling
Implicit Coscheduling
Algorithms
Experimentation
Implemented Environments
Results
Conclusions and Future Work
References
A Jini-Based Prototype Metacomputing Framework
Introduction
Metacomputing Systems
A Minimal Metacomputing System
The Operation of the System
Implementation of the Prototype System
The Host Service
The Broker Service
Conclusions
References
SKElib: Parallel Programming with Skeletons in C
Introduction
Library Design
SKElib Implementation
Experimental Results
Related Work & Conclusions
References
Token-Based Read/Write-Locks for Distributed Mutual Exclusion
Introduction
Related Work
Dynamic Reader/Writer Protocol for Mutual Exclusion
Experimental Platform
Measurements
Conclusions
References
On Solving a Problem in Algebraic Geometry by Cluster Computing
Introduction
The Parallelization Approaches
Distributed Maple
Experimental Results
References
PCI-DDC Application Programming Interface: Performance in User-Level Messaging
Introduction
Programming the PCI-DDC Component
Performance
Conclusion
References
A Clustering Approach for Improving Network Performance in Heterogeneous Systems
Introduction
A New Clustering Approach
Performance Evaluation
Conclusions
References
Topic 19 Metacomputing
Request Sequencing: Optimizing Communication for the Grid
Introduction
Positioning Our Work
An Overview of NetSolve
Sequencing Design and Implementation
The DAG Model
Data Analysis and the DAG
The Interface
Execution Scheduling at the Server
Discussion
Applications and Initial Results
Linear Sequence: Principal Component Analysis
Parallel Sequence: Clustering
Conclusion and Future Work
References
An Architectural Meta-application Model for Coarse Grained Metacomputing
Introduction
The Amica Metacomputing Infrastructure
The Amica Programming Model
Architecture Description
An Architectural Style for Amica Meta-applications
A Small Example
Meta-application Execution and Formal Analysis
Related Work and Conclusion
References
Javelin 2.0: Java-Based Parallel Computing on the Internet
Introduction
Model of Computation
Architecture
Javelin Broker Name Service
Broker Network & Host Tree Management
Scalable Computation & Fault Tolerance
The Scheduler
Shared Memory
Fault Tolerance
Experimental Results
Conclusion
References
Data Distribution for Parallel CORBA Objects
Introduction
Communication within a Computational Grid
Overview of Parallel CORBA Object
Data Redistribution in a Parallel Object
Design Considerations
Implementation
Experimental Results
Comparison with the Master/Slave Approach
Redistribution at the Client versus the Server
Conclusion and Future Works
References
Topic 20 Parallel I/O and Storage Technology
Towards a High-Performance Implementation of MPI-IO on Top of GPFS
Introduction
MPI-IO/GPFS Features
Performance Measurements
Benchmark Description
Experimental Platform
Benchmark Results
Work in Progress
Conclusion
Acknowledgements
References
Design and Evaluation of a Compiler-Directed Collective I/O Technique
Introduction
Collective I/O
Compiler Analysis
Access Pattern Detection
Storage Pattern Detection
Discussion
Experiments
Experimental Environment
Setups
Base Experiments
Sensitivity Analysis
Conclusions
References
Effective File-I/O Bandwidth Benchmark
Introduction
Multidimensional Benchmarking Space
Criteria
Definition of the Effective I/O Bandwidth
Comparing Systems Using b_eff_io
Outlook
References
Instant Image: Transitive and Cyclical Snapshots in Distributed Storage Volumes
Introduction
Definitions
Algorithm
Related Work
Conclusions and Future Work
References
Scheduling Queries for Tape-Resident Data
Introduction
Background
Workload Characterization
Workloads Consisting of Small Jobs
Workloads Consisting of Big Jobs
Performance Evaluation
Conclusions
References
Logging RAID - An Approach to Fast, Reliable, and Low-Cost Disk Arrays
Introduction
The Logging RAID Architecture
The Logging RAID Storage Layout
The Mapping Structures
Logging RAID Operations
A Trace-Driven Simulation Study
Experimental Setups and Traces
Simulation Results
Conclusion
References
Topic 21 Problem Solving Environments
AMANDA - A Distributed System for Aircraft Design
Introduction
The AMANDA-Applications
Airplane Design
Turbine Design
The Software Integration System
The Software Development Kit
TENT - Base System
TENT Facilities
TENT Components
Impacts of the AMANDA-Applications on TENT
NASTRAN-Co-process
Strongly Coupled Multi-disciplinary Subsystems
Hierarchical Structure
Conclusions
References
Problem Solving Environments: Extending the Rôle of Visualization Systems
Introduction
Visualization Architecture and Extensions
Collaborative Working
Data Persistence
Pipeline Management
Augmented Architecture
Conclusions
References
An Architecture for Web-Based Interaction and Steering of Adaptive Parallel/Distributed Applications
Introduction
Related Work
DISCOVER: An Interactive Computational Collaboratory
Interaction and Collaboration Servers
Collaborative Interaction and Steering
Security, Authentication, and Access Control
Application View Plug-Ins
Application Control Network for Interaction and Steering
Sensors/Actuators and Interaction Object
The Control Network and Interaction Agents
Conclusion and Future Work
References
Computational Steering in Problem Solving Environments
Introduction
PSE Architecture
Prototype PSE
Conclusion
References
Implementing Problem Solving Environments for Computational Science
Introduction
Applications Using the PSE Infrastructure
Conclusion and Future Work
References
Pseudovectorization, SMP, and Message Passing on the Hitachi SR8000-F1
Aiming for Top Level Computing
The Innovative Architecture of the SR8000-F1
Pseudo-Vector-Processing (PVP)
Cooperative Micro Processors in Single Address Space (COMPAS)
Benchmark Results and Principles for Code Optimization
Memory Throughput
Scalability of MPI Programs
Case Studies for the Hybrid (COMPAS/OpenMP + MPI) Programming Paradigm
Conclusion
Further Reading and Details
Index of Authors