Table of Contents for Introduction to Parallel Computing
Chapter/Section Title | Page # | Page Count
--------------------------------------------------------------------------
Preface | xvii |
Acknowledgments | xix |
Introduction to Parallel Computing | 1 | 10
  Motivating Parallelism | 2 | 2
    The Computational Power Argument -- from Transistors to FLOPS | 2 | 1
    The Memory/Disk Speed Argument | 3 | 1
    The Data Communication Argument | 4 | 1
  Scope of Parallel Computing | 4 | 2
    Applications in Engineering and Design | 4 | 1
    Scientific Applications | 5 | 1
    Commercial Applications | 5 | 1
    Applications in Computer Systems | 6 | 1
  Organization and Contents of the Text | 6 | 2
  Bibliographic Remarks | 8 | 3
  Problems | 9 | 2
Parallel Programming Platforms | 11 | 74
  Implicit Parallelism: Trends in Microprocessor Architectures | 12 | 4
    Pipelining and Superscalar Execution | 12 | 3
    Very Long Instruction Word Processors | 15 | 1
  Limitations of Memory System Performance* | 16 | 8
    Improving Effective Memory Latency Using Caches | 17 | 1
    Impact of Memory Bandwidth | 18 | 3
    Alternate Approaches for Hiding Memory Latency | 21 | 2
    Tradeoffs of Multithreading and Prefetching | 23 | 1
  Dichotomy of Parallel Computing Platforms | 24 | 7
    Control Structure of Parallel Platforms | 25 | 2
    Communication Model of Parallel Platforms | 27 | 4
  Physical Organization of Parallel Platforms | 31 | 22
    Architecture of an Ideal Parallel Computer | 31 | 1
    Interconnection Networks for Parallel Computers | 32 | 1
    Network Topologies | 33 | 10
    Evaluating Static Interconnection Networks | 43 | 1
    Evaluating Dynamic Interconnection Networks | 44 | 1
    Cache Coherence in Multiprocessor Systems | 45 | 8
  Communication Costs in Parallel Machines | 53 | 10
    Message Passing Costs in Parallel Computers | 53 | 8
    Communication Costs in Shared-Address-Space Machines | 61 | 2
  Routing Mechanisms for Interconnection Networks | 63 | 2
  Impact of Process-Processor Mapping and Mapping Techniques | 65 | 9
    Mapping Techniques for Graphs | 66 | 7
    Cost-Performance Tradeoffs | 73 | 1
  Bibliographic Remarks | 74 | 11
  Problems | 76 | 9
Principles of Parallel Algorithm Design | 85 | 62
  Preliminaries | 86 | 9
    Decomposition, Tasks, and Dependency Graphs | 86 | 3
    Granularity, Concurrency, and Task-Interaction | 89 | 4
    Processes and Mapping | 93 | 1
    Processes versus Processors | 94 | 1
  Decomposition Techniques | 95 | 15
    Recursive Decomposition | 95 | 2
    Data Decomposition | 97 | 8
    Exploratory Decomposition | 105 | 2
    Speculative Decomposition | 107 | 2
    Hybrid Decompositions | 109 | 1
  Characteristics of Tasks and Interactions | 110 | 5
    Characteristics of Tasks | 110 | 2
    Characteristics of Inter-Task Interactions | 112 | 3
  Mapping Techniques for Load Balancing | 115 | 17
    Schemes for Static Mapping | 117 | 13
    Schemes for Dynamic Mapping | 130 | 2
  Methods for Containing Interaction Overheads | 132 | 7
    Maximizing Data Locality | 132 | 2
    Minimizing Contention and Hot Spots | 134 | 1
    Overlapping Computations with Interactions | 135 | 1
    Replicating Data or Computations | 136 | 1
    Using Optimized Collective Interaction Operations | 137 | 1
    Overlapping Interactions with Other Interactions | 138 | 1
  Parallel Algorithm Models | 139 | 3
    The Data-Parallel Model | 139 | 1
    The Task Graph Model | 140 | 1
    The Work Pool Model | 140 | 1
    The Master-Slave Model | 141 | 1
    The Pipeline or Producer-Consumer Model | 141 | 1
    Hybrid Models | 142 | 1
  Bibliographic Remarks | 142 | 5
  Problems | 143 | 4
Basic Communication Operations | 147 | 48
  One-to-All Broadcast and All-to-One Reduction | 149 | 8
    Ring or Linear Array | 149 | 3
    Mesh | 152 | 1
    Hypercube | 153 | 1
    Balanced Binary Tree | 153 | 1
    Detailed Algorithms | 154 | 2
    Cost Analysis | 156 | 1
  All-to-All Broadcast and Reduction | 157 | 9
    Linear Array and Ring | 158 | 2
    Mesh | 160 | 1
    Hypercube | 161 | 3
    Cost Analysis | 164 | 2
  All-Reduce and Prefix-Sum Operations | 166 | 1
  Scatter and Gather | 167 | 3
  All-to-All Personalized Communication | 170 | 9
    Ring | 173 | 1
    Mesh | 174 | 1
    Hypercube | 175 | 4
  Circular Shift | 179 | 5
    Mesh | 179 | 2
    Hypercube | 181 | 3
  Improving the Speed of Some Communication Operations | 184 | 3
    Splitting and Routing Messages in Parts | 184 | 2
    All-Port Communication | 186 | 1
  Summary | 187 | 1
  Bibliographic Remarks | 188 | 7
  Problems | 190 | 5
Analytical Modeling of Parallel Programs | 195 | 38
  Sources of Overhead in Parallel Programs | 195 | 2
  Performance Metrics for Parallel Systems | 197 | 8
    Execution Time | 197 | 1
    Total Parallel Overhead | 197 | 1
    Speedup | 198 | 4
    Efficiency | 202 | 1
    Cost | 203 | 2
  The Effect of Granularity on Performance | 205 | 3
  Scalability of Parallel Systems | 208 | 10
    Scaling Characteristics of Parallel Programs | 209 | 3
    The Isoefficiency Metric of Scalability | 212 | 5
    Cost-Optimality and the Isoefficiency Function | 217 | 1
    A Lower Bound on the Isoefficiency Function | 217 | 1
    The Degree of Concurrency and the Isoefficiency Function | 218 | 1
  Minimum Execution Time and Minimum Cost-Optimal Execution Time | 218 | 3
  Asymptotic Analysis of Parallel Programs | 221 | 1
  Other Scalability Metrics | 222 | 4
  Bibliographic Remarks | 226 | 7
  Problems | 228 | 5
Programming Using the Message-Passing Paradigm | 233 | 46
  Principles of Message-Passing Programming | 233 | 2
  The Building Blocks: Send and Receive Operations | 235 | 5
    Blocking Message Passing Operations | 236 | 3
    Non-Blocking Message Passing Operations | 239 | 1
  MPI: the Message Passing Interface | 240 | 10
    Starting and Terminating the MPI Library | 242 | 1
    Communicators | 242 | 1
    Getting Information | 243 | 1
    Sending and Receiving Messages | 244 | 4
    Example: Odd-Even Sort | 248 | 2
  Topologies and Embedding | 250 | 5
    Creating and Using Cartesian Topologies | 251 | 2
    Example: Cannon's Matrix-Matrix Multiplication | 253 | 2
  Overlapping Communication with Computation | 255 | 5
    Non-Blocking Communication Operations | 255 | 5
  Collective Communication and Computation Operations | 260 | 12
    Barrier | 260 | 1
    Broadcast | 260 | 1
    Reduction | 261 | 2
    Prefix | 263 | 1
    Gather | 263 | 1
    Scatter | 264 | 1
    All-to-All | 265 | 1
    Example: One-Dimensional Matrix-Vector Multiplication | 266 | 2
    Example: Single-Source Shortest-Path | 268 | 2
    Example: Sample Sort | 270 | 2
  Groups and Communicators | 272 | 4
    Example: Two-Dimensional Matrix-Vector Multiplication | 274 | 2
  Bibliographic Remarks | 276 | 3
  Problems | 277 | 2
Programming Shared Address Space Platforms | 279 | 58
  Thread Basics | 280 | 1
  Why Threads? | 281 | 1
  The POSIX Thread API | 282 | 1
  Thread Basics: Creation and Termination | 282 | 5
  Synchronization Primitives in Pthreads | 287 | 11
    Mutual Exclusion for Shared Variables | 287 | 7
    Condition Variables for Synchronization | 294 | 4
  Controlling Thread and Synchronization Attributes | 298 | 3
    Attributes Objects for Threads | 299 | 1
    Attributes Objects for Mutexes | 300 | 1
  Thread Cancellation | 301 | 1
  Composite Synchronization Constructs | 302 | 8
    Read-Write Locks | 302 | 5
    Barriers | 307 | 3
  Tips for Designing Asynchronous Programs | 310 | 1
  OpenMP: a Standard for Directive Based Parallel Programming | 311 | 21
    The OpenMP Programming Model | 312 | 3
    Specifying Concurrent Tasks in OpenMP | 315 | 7
    Synchronization Constructs in OpenMP | 322 | 5
    Data Handling in OpenMP | 327 | 1
    OpenMP Library Functions | 328 | 2
    Environment Variables in OpenMP | 330 | 1
    Explicit Threads versus OpenMP Based Programming | 331 | 1
  Bibliographic Remarks | 332 | 5
  Problems | 332 | 5
Dense Matrix Algorithms | 337 | 42
  Matrix-Vector Multiplication | 337 | 8
    Rowwise 1-D Partitioning | 338 | 3
    2-D Partitioning | 341 | 4
  Matrix-Matrix Multiplication | 345 | 7
    A Simple Parallel Algorithm | 346 | 1
    Cannon's Algorithm | 347 | 2
    The DNS Algorithm | 349 | 3
  Solving a System of Linear Equations | 352 | 19
    A Simple Gaussian Elimination Algorithm | 353 | 13
    Gaussian Elimination with Partial Pivoting | 366 | 3
    Solving a Triangular System: Back-Substitution | 369 | 1
    Numerical Considerations in Solving Systems of Linear Equations | 370 | 1
  Bibliographic Remarks | 371 | 8
  Problems | 372 | 7
Sorting | 379 | 50
  Issues in Sorting on Parallel Computers | 380 | 2
    Where the Input and Output Sequences are Stored | 380 | 1
    How Comparisons are Performed | 380 | 2
  Sorting Networks | 382 | 12
    Bitonic Sort | 384 | 3
    Mapping Bitonic Sort to a Hypercube and a Mesh | 387 | 7
  Bubble Sort and its Variants | 394 | 5
    Odd-Even Transposition | 395 | 3
    Shellsort | 398 | 1
  Quicksort | 399 | 13
    Parallelizing Quicksort | 401 | 1
    Parallel Formulation for a CRCW PRAM | 402 | 2
    Parallel Formulation for Practical Architectures | 404 | 7
    Pivot Selection | 411 | 1
  Bucket and Sample Sort | 412 | 2
  Other Sorting Algorithms | 414 | 2
    Enumeration Sort | 414 | 1
    Radix Sort | 415 | 1
  Bibliographic Remarks | 416 | 13
  Problems | 419 | 10
Graph Algorithms | 429 | 40
  Definitions and Representation | 429 | 3
  Minimum Spanning Tree: Prim's Algorithm | 432 | 4
  Single-Source Shortest Paths: Dijkstra's Algorithm | 436 | 1
  All-Pairs Shortest Paths | 437 | 8
    Dijkstra's Algorithm | 438 | 2
    Floyd's Algorithm | 440 | 5
    Performance Comparisons | 445 | 1
  Transitive Closure | 445 | 1
  Connected Components | 446 | 4
    A Depth-First Search Based Algorithm | 446 | 4
  Algorithms for Sparse Graphs | 450 | 12
    Finding a Maximal Independent Set | 451 | 4
    Single-Source Shortest Paths | 455 | 7
  Bibliographic Remarks | 462 | 7
  Problems | 465 | 4
Search Algorithms for Discrete Optimization Problems | 469 | 46
  Definitions and Examples | 469 | 5
  Sequential Search Algorithms | 474 | 4
    Depth-First Search Algorithms | 474 | 4
    Best-First Search Algorithms | 478 | 1
  Search Overhead Factor | 478 | 2
  Parallel Depth-First Search | 480 | 16
    Important Parameters of Parallel DFS | 482 | 3
    A General Framework for Analysis of Parallel DFS | 485 | 3
    Analysis of Load-Balancing Schemes | 488 | 2
    Termination Detection | 490 | 2
    Experimental Results | 492 | 3
    Parallel Formulations of Depth-First Branch-and-Bound Search | 495 | 1
    Parallel Formulations of IDA* | 496 | 1
  Parallel Best-First Search | 496 | 5
  Speedup Anomalies in Parallel Search Algorithms | 501 | 4
    Analysis of Average Speedup in Parallel DFS | 502 | 3
  Bibliographic Remarks | 505 | 10
  Problems | 510 | 5
Dynamic Programming | 515 | 22
  Overview of Dynamic Programming | 515 | 3
  Serial Monadic DP Formulations | 518 | 5
    The Shortest-Path Problem | 518 | 2
    The 0/1 Knapsack Problem | 520 | 3
  Nonserial Monadic DP Formulations | 523 | 3
    The Longest-Common-Subsequence Problem | 523 | 3
  Serial Polyadic DP Formulations | 526 | 1
    Floyd's All-Pairs Shortest-Paths Algorithm | 526 | 1
  Nonserial Polyadic DP Formulations | 527 | 3
    The Optimal Matrix-Parenthesization Problem | 527 | 3
  Summary and Discussion | 530 | 1
  Bibliographic Remarks | 531 | 6
  Problems | 532 | 5
Fast Fourier Transform | 537 | 28
  The Serial Algorithm | 538 | 3
  The Binary-Exchange Algorithm | 541 | 12
    A Full Bandwidth Network | 541 | 7
    Limited Bandwidth Network | 548 | 3
    Extra Computations in Parallel FFT | 551 | 2
  The Transpose Algorithm | 553 | 7
    Two-Dimensional Transpose Algorithm | 553 | 3
    The Generalized Transpose Algorithm | 556 | 4
  Bibliographic Remarks | 560 | 5
  Problems | 562 | 3
Appendix A Complexity of Functions and Order Analysis | 565 | 4
  A.1 Complexity of Functions | 565 | 1
  A.2 Order Analysis of Functions | 566 | 3
Bibliography | 569 | 42
Author Index | 611 | 10
Subject Index | 621 |