UBC Theses and Dissertations

Copy-on-write in Mammoth

Gong, Shihao

Abstract

Mammoth is a versioned peer-to-peer storage system with a traditional file system interface. Files are replicated selectively on several nodes, based on per-file policies, to protect against site failure. A set of interested nodes may cache the file data for reading and writing. To increase availability during a network partition, Mammoth allows branches to occur. Consistency is achieved by separating file history logs from file data. All file data are kept as immutable versions, and file metadata is propagated eagerly among interested nodes. However, keeping each file version in its entirety can hurt performance when versioning and replicating large files with minor modifications. A copy-on-write (COW) mechanism is therefore integrated into the Mammoth file system in a way that preserves all the features of Mammoth's original design. The basic idea of COW is to use an in-core bitmap to record which blocks are modified during an update session. When the file is versioned, only the new data of the changed blocks (the delta) is copied into a dfile, and the list of changed blocks is written to an ifile. When a copy on a remote node is updated, the system scans the ifiles created since that copy was last updated to obtain a complete list of changed blocks, and then sends only the delta over the network. File restoration is achieved by applying deltas in reverse. We ran a comparison test between Mammoth and Mammoth+ (Mammoth with COW). It shows that Mammoth+ dramatically reduces versioning and replication overhead for large files with small updates. For example, when only 4KB changes in a 64MB file, the versioning overhead of Mammoth is 4565 times that of Mammoth+, and the replication overhead of Mammoth is 17812 times that of Mammoth+.
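The block-tracking idea described in the abstract can be illustrated with a small in-memory sketch. This is not Mammoth's implementation: the class `CowFile`, the function `blocks_to_send`, the 4KB block size, and the use of a Python set in place of the in-core bitmap are all assumptions made for illustration. The sketch covers dirty-block tracking, producing an ifile/dfile pair when a version is made, and merging ifiles to decide what to replicate; it does not model restoration.

```python
BLOCK_SIZE = 4096  # assumed block size; the abstract does not state one


class CowFile:
    """Toy in-memory file that tracks modified blocks between versions."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.dirty = set()  # stands in for the in-core bitmap

    def write(self, offset: int, payload: bytes) -> None:
        """Write payload at offset and mark every touched block as modified."""
        if not payload:
            return
        old_len = len(self.data)
        end = offset + len(payload)
        if end > old_len:
            # Grow the file with zero bytes so the write fits.
            self.data.extend(bytes(end - old_len))
        self.data[offset:end] = payload
        # If the write starts past the old end, the zero-filled gap changed too.
        first = min(offset, old_len) // BLOCK_SIZE
        last = (end - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))

    def make_version(self):
        """Return (ifile, dfile) for this update session and reset tracking.

        ifile: sorted list of changed block numbers.
        dfile: mapping from block number to that block's new contents.
        """
        ifile = sorted(self.dirty)
        dfile = {
            b: bytes(self.data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE])
            for b in ifile
        }
        self.dirty.clear()
        return ifile, dfile


def blocks_to_send(ifiles):
    """Union of the changed-block lists recorded over an interval."""
    changed = set()
    for ifile in ifiles:
        changed.update(ifile)
    return sorted(changed)


# Example: two update sessions on a 64-block file.
f = CowFile(bytes(64 * BLOCK_SIZE))
f.write(5 * BLOCK_SIZE + 10, b"x" * 100)   # touches block 5 only
ifile1, dfile1 = f.make_version()
f.write(7 * BLOCK_SIZE - 2, b"abcd")       # straddles blocks 6 and 7
ifile2, dfile2 = f.make_version()
to_send = blocks_to_send([ifile1, ifile2])  # blocks 5, 6 and 7
```

Under the assumed 4KB block size, a 64MB file has 16384 blocks, so an aligned 4KB update would mark a single block, and only that block would need to be copied or sent.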

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.