UBC Theses and Dissertations
MPI collective operations over Myrinet
Zhang, Qianfeng
Collective communication is an important subset of the Message Passing Interface (MPI), and improving its performance can contribute greatly to the performance of MPI applications. MPI-NP II is an MPI-specific messaging system for PC clusters. By integrating collective communication support into the communication layer of MPI-NP II, we extended the system to MPI-NP II+. MPI-NP II+ first adds the functions that are essential for supporting a complete MPI application but were not provided by MPI-NP II: Any_Source message matching, multiple local processes, and multiple communicators. These functions are designed and implemented efficiently so that they add little cost to message passing. For a point-to-point message, MPI-NP II+ still has a minimum message latency of 42 microseconds and a maximum end-to-end bandwidth of 89 MB/s, comparable to the performance of the incomplete MPI-NP II implementation. Three collective communication operations are then added: MPI_Barrier, MPI_Comm_create, and MPI_Bcast. NIC-level message forwarding gives these operations a clear performance benefit. On a system of 8 nodes, the NIC-based MPI_Barrier outperforms a host-based implementation over the same Myrinet by a factor of 4, and the NIC-based MPI_Comm_create by a factor of 2. The NIC-based MPI_Bcast outperforms a host-based implementation for all message sizes, and for small messages the broadcast latency improves by a factor of 2 to 3. Moreover, for all three operations, the NIC-based implementation scales better than the host-based implementation. MPI-NP II+ also extends the concept of the microchannel: by using a special microchannel, we were able to support the three collective communication operations while preserving the semantics of the MPI-specific communication layer.
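To illustrate why forwarding at the NIC level can help a broadcast, the following is a minimal sketch that builds a broadcast schedule and lists the nodes that must relay the message to others. The abstract does not state which forwarding topology MPI-NP II+ uses; the binomial tree here is an assumption, and the function names `binomial_broadcast_schedule` and `forwarding_nodes` are hypothetical. In a host-based implementation each relay step passes through the host processor of the forwarding node, whereas NIC-level forwarding would let the network interface pass the message on directly.

```python
def binomial_broadcast_schedule(num_nodes, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Assumption: a binomial tree, in which the set of nodes holding the
    message doubles every round. This is an illustrative topology, not
    necessarily the one used by MPI-NP II+.
    """
    rounds = []
    have = 1  # number of relative ranks that already hold the message
    while have < num_nodes:
        this_round = []
        for rel in range(have):
            peer = rel + have
            if peer < num_nodes:
                # Map ranks relative to the root back to absolute ranks.
                sender = (rel + root) % num_nodes
                receiver = (peer + root) % num_nodes
                this_round.append((sender, receiver))
        rounds.append(this_round)
        have *= 2
    return rounds


def forwarding_nodes(rounds, root=0):
    """Return the non-root nodes that both receive and re-send the message."""
    senders = {src for rnd in rounds for src, _ in rnd}
    receivers = {dst for rnd in rounds for _, dst in rnd}
    return sorted((senders & receivers) - {root})


if __name__ == "__main__":
    schedule = binomial_broadcast_schedule(8)
    for number, pairs in enumerate(schedule, start=1):
        print(f"round {number}: {pairs}")
    print("forwarding nodes:", forwarding_nodes(schedule))
```

Under this assumed topology, a broadcast over 8 nodes completes in 3 rounds, and nodes 1, 2 and 3 act as relays. Those relay steps are where a host-based implementation pays the cost of moving the message up to the host and back down to the NIC.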