Abstract

A Scalable, Robust Network for Parallel Computing
Peter Cappello - University of California, Santa Barbara
Dimitrios Mourloukos - University of California, Santa Barbara

CX, a network-based computational exchange, is presented.
The system's design integrates variations of ideas from other 
researchers, such as
work stealing, non-blocking tasks, eager scheduling,
and space-based coordination.
The object-oriented API is simple, compact,
and cleanly separates application logic from
the logic that supports interprocess communication and fault tolerance.
Computations run to completion even as computational hosts
join and leave the ongoing computation.
Such hosts, or producers, use task caching and prefetching to
overlap computation with interprocessor communication.
To break a potential task server bottleneck,
a network of task servers is presented.
Even though task servers are envisioned as reliable,
the self-organizing, scalable network of N servers,
described as a sibling-connected height-balanced fat tree,
tolerates a sequence of N-1 server failures.
Tasks are distributed throughout the server network via
a simple "diffusion" process.
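
As a rough illustration of how eager scheduling lets a computation finish
while hosts come and go, the following minimal sketch keeps every task
available for reassignment until some producer returns its result. It is not
the CX API: the names EagerTaskServer, take, put, and simulate, and the
crash_on parameter, are hypothetical and chosen only for this example.

```python
from collections import deque


class EagerTaskServer:
    """Toy task server: a task stays available until a result is returned."""

    def __init__(self, task_ids):
        self.pending = deque(task_ids)  # unfinished task ids, served round-robin
        self.results = {}               # task id -> first result received

    def take(self):
        """Hand out an unfinished task, possibly one already given to another producer."""
        while self.pending:
            task = self.pending.popleft()
            if task not in self.results:
                self.pending.append(task)  # keep it eligible for reassignment
                return task
            # Task already completed: drop it from the queue and keep looking.
        return None

    def put(self, task, result):
        """Record the first result for a task; later duplicates are ignored."""
        self.results.setdefault(task, result)


def simulate(task_ids, compute, crash_on):
    """Run all tasks; at each step number in crash_on the producer leaves mid-task."""
    server = EagerTaskServer(task_ids)
    step = 0
    while (task := server.take()) is not None:
        step += 1
        if step in crash_on:
            continue  # producer departed; its result is never returned
        server.put(task, compute(task))
    return server.results


if __name__ == "__main__":
    # Producers depart during steps 1 and 3, yet every task still completes.
    print(simulate(range(4), lambda t: t * t, crash_on={1, 3}))
```

In this sketch a departed producer costs only the repeated work on its
unfinished task; no failure detection is needed, because the server simply
hands the same task to the next idle producer.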

CX is intended as a test bed for research on automated silent
auctions, reputation services, authentication services, and
bonding services. CX also provides a test bed for algorithm
research into network-based parallel computation.