UBC Theses and Dissertations
Multilevel debugging of parallel message passing programs
Pedersen, Jan Bækgaard
"Errare humanum est": to err is human (Hieronymus, Epistle 57, 12). This fact has been known throughout time, and it inevitably means that humans writing computer programs are bound to introduce errors. With computers operating in Frankenstein's Igor mode, 'Your wish is my command', executing instructions without questioning their validity, the errors introduced by humans are faithfully carried out. When parallel programming with message passing is added, an error in one process can spread like a virus to other processes through the messages it sends. Much research has been done on debugging sequential programs, and most of its theories and results apply directly to parallel programs, but the set of potential errors grows dramatically once parallelism and message passing are introduced. Not only can a single process fail, but sets of processes can deadlock, and computational errors can propagate from process to process, infecting otherwise correct programs. Correct programs can even stop working because of the underlying implementation of the message passing system. We propose a framework for debugging parallel message passing programs: a multilevel approach that divides errors into separate groups at various levels, ranging from well-known sequential errors, such as stray pointers and out-of-bounds array accesses, to deadlock caused by incorrect message passing code, protocol errors, and buffer allocation problems. We show the validity of this approach by developing new debugging techniques and analyses, and by implementing these in Millipede, a prototype multilevel debugger written for C programs that use the PVM message passing system.
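The abstract does not describe how Millipede detects deadlock, so the following is only a minimal sketch of the general idea behind "sets of processes can deadlock": circular waiting among processes blocked in receives. It assumes a hypothetical snapshot in which every blocked process waits on exactly one named sender (as with a blocking PVM pvm_recv given an explicit task id rather than a wildcard) and no matching message is already buffered. The names is_deadlocked, waits_for and NOT_BLOCKED are illustrative, not part of PVM or Millipede.

```c
#include <stdbool.h>

/* Hypothetical marker: the process is running, not blocked in a receive. */
#define NOT_BLOCKED (-1)

/*
 * Hypothetical wait-for snapshot: waits_for[p] is the process that p is
 * blocked receiving from (0 .. nprocs-1), or NOT_BLOCKED.
 * Assumes each blocking receive names a single sender and that no
 * matching message is already pending, so p can only resume if that
 * sender eventually sends.
 *
 * Returns true if p can never be released: following the chain of
 * waits never reaches a running process, so it must contain a cycle.
 */
bool is_deadlocked(const int *waits_for, int nprocs, int p)
{
    for (int steps = 0; steps <= nprocs; steps++) {
        if (waits_for[p] == NOT_BLOCKED)
            return false; /* chain ends at a running process, which may still send */
        p = waits_for[p]; /* move to the process we are waiting on */
    }
    /* nprocs + 1 blocked processes visited among nprocs: one repeats, so a cycle exists */
    return true;
}
```

For example, with waits_for = {1, 0, 0, NOT_BLOCKED}, processes 0 and 1 each wait to receive from the other, and process 2 waits on process 0, so all three are reported as deadlocked, while process 3 is running and is not. A chain ending at a running process is only reported as "not deadlocked", not as guaranteed to progress, since the running process may never send the awaited message.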