Copy-on-write in Mammoth

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Copy-on-write in Mammoth Gong, Shihao

Abstract

Mammoth is a versioned peer-to-peer storage system with a traditional file system interface. Files are replicated selectively on several nodes based on per-file policies to protect against site failure. A set of interested nodes may cache the file data to read and write. To increase the availability during network partition, Mammoth allows branch to occur. Consistency is achieved by separating file history logs and file data. All file data are kept as immutable versions. File metadata is propagated eagerly among interested nodes. However, keeping a file version in its entirety may hurt the performance when versioning and replicating large files with minor modifications. Thus a copy-on- write (COW) mechanism is integrated with Mammoth file system. It preserves all the features of Mammoth's original design. The basic idea of COW is to use an in-core bitmap to record which blocks are modified during an update session. When versioning the file, only the new data of changed blocks (delta) is copied into a dfile and the changed block list is written to an ifile. When updating a copy on a remote node, the system scans the ifiles created during this interval and gets a complete list of changed blocks. Then only the delta is sent through the network. File restoration is achieved by applying deltas reversely. We did a comparison test between Mammoth and Mammoth+. It shows that Mammoth+ reduces versioning overhead and replication overhead dramatically for large files with a small update size. For example, when only 4KB changed in a 64MB file, the versioning overhead of Mammoth is 4565 times of Mammoth+, and the replication overhead of Mammoth is 17812 times of Mammoth-h

Item Metadata

Title	Copy-on-write in Mammoth
Creator	Gong, Shihao
Publisher	University of British Columbia
Date Issued	2003
Description	Mammoth is a versioned peer-to-peer storage system with a traditional file system interface. Files are replicated selectively on several nodes based on per-file policies to protect against site failure. A set of interested nodes may cache the file data to read and write. To increase the availability during network partition, Mammoth allows branch to occur. Consistency is achieved by separating file history logs and file data. All file data are kept as immutable versions. File metadata is propagated eagerly among interested nodes. However, keeping a file version in its entirety may hurt the performance when versioning and replicating large files with minor modifications. Thus a copy-on- write (COW) mechanism is integrated with Mammoth file system. It preserves all the features of Mammoth's original design. The basic idea of COW is to use an in-core bitmap to record which blocks are modified during an update session. When versioning the file, only the new data of changed blocks (delta) is copied into a dfile and the changed block list is written to an ifile. When updating a copy on a remote node, the system scans the ifiles created during this interval and gets a complete list of changed blocks. Then only the delta is sent through the network. File restoration is achieved by applying deltas reversely. We did a comparison test between Mammoth and Mammoth+. It shows that Mammoth+ reduces versioning overhead and replication overhead dramatically for large files with a small update size. For example, when only 4KB changed in a 64MB file, the versioning overhead of Mammoth is 4565 times of Mammoth+, and the replication overhead of Mammoth is 17812 times of Mammoth-h
Extent	2556217 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-10-17
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0051166
URI	http://hdl.handle.net/2429/13958
Degree	Master of Science - MSc
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2003-05
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_2003-0114.pdf -- 2.44MB

Item Citations and Data

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.

Open Collections

UBC Theses and Dissertations

Copy-on-write in Mammoth Gong, Shihao

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights