Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

FindFS : adding tag-based views to a hierarchical filesystem Chou, Jason 2015

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
24-ubc_2015_november_chou_jason.pdf [ 209.9kB ]
Metadata
JSON: 24-1.0166678.json
JSON-LD: 24-1.0166678-ld.json
RDF/XML (Pretty): 24-1.0166678-rdf.xml
RDF/JSON: 24-1.0166678-rdf.json
Turtle: 24-1.0166678-turtle.txt
N-Triples: 24-1.0166678-rdf-ntriples.txt
Original Record: 24-1.0166678-source.json
Full Text
24-1.0166678-fulltext.txt
Citation
24-1.0166678.ris

Full Text

FindFS - Adding Tag-BasedViews to aHierarchical FilesystembyJason ChouB.Sc., The University of British Columbia, 2012A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinThe Faculty of Graduate and Postdoctoral Studies(Computer Science)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)August 2015c© Jason Chou 2015AbstractOrganising files by grouping them using tags is an alternative approach to filesystemdesign that has benefits over the traditional hierarchical model. However, the ma-jority of filesystems in use remain hierarchical. In this paper we describe FindFS,an extension middleware which provides dynamic, tag-based views of an existinghierarchical filesystem. FindFS adopts the functionality of the find utility andadds support for extended attribute queries, which can be used for tagging files andfiltering the filesystem. Search results are represented as directories containing sym-bolic links which are kept up-to-date in response to filesystem operations, allowingthem to persist as views that are accessible from existing unmodified applications.Control of the system is accomplished via filesystem operations, enhancing its easeof integration and portability. We have developed a prototype of this system, andcharacterised the performance overhead that it adds to file operations.iiPrefaceThe contents of this thesis were submitted, as a candidate paper of the same name,to the Middleware 2015 conference. The primary author was Jason Chou, withsupervisor Mike Feeley as a secondary author.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 FindFS Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83.1 Enabling Multiple Hierarchies . . . . . . . . . . . . . . . . . . . . . . 93.2 Tag-Based Filtering and Searching . . . . . . . . . . . . . . . . . . . 103.3 Persisting and Updating Results . . . . . . . . . . . . . . . . . . . . 113.4 Control Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12ivTable of Contents4 FindFS Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134.1 Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144.1.1 Query State INIT . . . . . . . . . . . . . . . . . . . . . . . . 144.1.2 Query State FIND . . . . . . . . . . . . . . . . . . . . . . . . 164.1.3 Query State WAIT . . . . . . . . . . . . . . . . . . . . . . . . 164.1.4 Query State STOP . . . . . . . . . . . . . . . . . . . . . . . . 174.2 Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174.3 Resultsets and Dynamic Update . . . . . . . . . . . . . . . . . . . . 204.3.1 Flat Resultset . . . . . . . . . . . . . . . . . . . . . . . . . . 214.3.2 Hierarchical Resultset . . . . . . . . . . . . . . . . . . . . . . 224.3.3 Resultset Format Comparison . . . . . . . . . . . . . . . . . . 234.3.4 Resultset Updating . . . . . . . . . . . . . . . . . . . . . . . 244.4 Example Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . 255 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285.1 Implementation Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 295.2 Resultset Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 315.2.1 Path Existence . . . . . . . . . . . . . . . . . . . . . . . . . . 325.2.2 Path Addition . . . . . . . . . . . . . . . . . . . . . . . . . . 325.2.3 Path Removal . . . . . . . . . . . . . . . . . . . . . . . . . . 335.2.4 Path Modification . . . . . . . . . . . . . . . . . . . . . . . . 335.2.5 Integrated Algorithm . . . . . . . . . . . . . . . . . . . . . . 346 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386.1 Basic FUSE Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . 39vTable of Contents6.2 Expression Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 406.2.1 Filename Match . . . . . . . . . . . . . . . . . . . . . . . . . 406.2.2 Extended Attribute Match . . . . . . . . . . . . . . . . . . . 416.2.3 Filename Non-Match . . . . . . . . . . . . . . . . . . . . . . 426.2.4 Extended Attribute Non-Match . . . . . . . . . . . . . . . . . 426.2.5 Query Complexity . . . . . . . . . . . . . . . . . . . . . . . . 436.3 Resultset Modification . . . . . . . . . . . . . . . . . . . . . . . . . . 446.3.1 Flat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456.3.2 Hierarchical . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456.3.3 Microbenchmark Summary . . . . . . . . . . . . . . . . . . . 466.4 Macrobenchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466.4.1 FUSE Overhead . . . . . . . . . . . . . . . . . . . . . . . . . 476.4.2 Flat Resultset . . . . . . . . . . . . . . . . . . . . . . . . . . 476.4.3 Hierarchical Resultset . . . . . . . . . . . . . . . . . . . . . . 486.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51viList of Tables4.1 Predicates for Extended Attributes (EAs) . . . . . . . . . . . . . . . 185.1 Resultset Modification Effects . . . . . . . . . . . . . . . . . . . . . . 356.1 FUSE Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396.2 Matching Benchmark Results . . . . . . . . . . . . . . . . . . . . . . 416.3 Resultset Modification Benchmark (creat()/close()/unlink()), timesper query-result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446.4 Filebench Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . 47viiList of Figures4.1 Query States and Transitions . . . . . . . . . . . . . . . . . . . . . . 154.2 Flat (left) and Hierarchical (right) Resultset Structure . . . . . . . . 205.1 FindFS Component Interaction (thick lines show application↔FindFSrequests, thin lines show FindFS↔filesystem requests) . . . . . . . . 295.2 Modification of Flat Resultsets . . . . . . . . . . . . . . . . . . . . . . 365.3 Modification of Hierarchical Resultsets . . . . . . . . . . . . . . . . . 37viiiAcknowledgementsMany thanks to supervisor Mike Feeley for his suggestions that resulted in theinitial idea, and providing assistance throughout the progress of this project untilits completion.ixChapter 1IntroductionHierarchical filesystems have become well-established in mass storage systems sincethey provide an easily understood and intuitive model of organising data in theform of a tree structure. Because of this, the convention of pathnames, directories,and filenames is nearly universal across most filesystems, and it has resulted in theexistence of a large amount of software which makes use of such names in identifyingthe files they work with. However, the hierarchical model of organisation has itslimitations, one of which is the fact that it is not always possible to find a uniqueor convenient single hierarchy in which files are easily categorised.Semantic filesystems provide an alternative design. First described in [1], thisconcept involves the use of various types of metadata to produce a set of attributesin key-value format, which are assigned to the files in the system. By construct-ing queries on these attributes, a subset of the files can be identified and groupedtogether in a non-hierarchical manner, serving to provide a “flattened” namespaceview.A common way to configure such a system is to “label” files with their metadata-derived attributes; the term “tag-based” has become commonplace in the descriptionof these systems. These tags allow any application, and by extension the user, tomanipulate files through any of the groupings to which they belong, in a more1Chapter 1. Introductionflexible way than any fixed hierarchy.[2][3]Although they may still contain elements of hierarchical access, tag-based filesys-tems are distinguished by their ability to allow for files to be selected in a non-hierarchical manner. Because the files stored in modern systems can have varioustypes of content-derived metadata attributes associated with them, it thus seemsnatural that a file should not be contained in any single group, but instead be able tobe viewed as part of a variety of groups, based on the values of attributes associatedwith it. Furthermore, it should be possible to take advantage of these attributeswhen searching. For example, a photo application will place image files in directo-ries, perhaps based on when they are imported into the application; but these filesmight be naturally grouped by such attributes as when or where they were taken,or rankings or tags placed on them by users.There are many more dimensions of organisation one can use for photos (orienta-tion, camera model, resolution, filesize); and the same is true for the other types offiles one may wish to store, such as music (date, album, artist, genre, length), mak-ing the task of finding a good hierarchical organisation more difficult. Furthermore,the huge growth in production of data by users puts pressure on means to organisethis data with more effective techniques. It becomes clear that, while a hierarchicalconfiguration of directories is one useful way to organise files, confining a file to asingle taxonomy is too limiting for modern systems. Filesystems now store vastlymore files and a much wider array of file types than they did when the hierarchicalnamespace was born.In contrast, a hierarchical filesystem rigidly confines files to a single directory.Other dimensions of grouping are left to the application and as such are not available2Chapter 1. Introductionthrough the filesystem’s API. This limitation is unfortunate because it is simple anduniversal. Every application that accesses persistent storage typically uses this API.It has been many years now since the hierarchical filesystem was declared dead[2],and yet it lives. The argument for replacing the file hierarchy and switching to oneof these new designs is powerful; why then is it that every mainstream commercialfilesystem stores files hierarchically? The filesystem plays a particularly importantrole for applications. It sits between every application and persistent storage. Itmust be robust, performant, and scalable. Persistent data must survive a varietyof software failures — and some hardware failures. The filesystem implementationmust be as free of bugs as possible; and so, filesystems are carefully engineered andsystem vendors are extraordinarily resistant to making wholesale changes to theirimplementation.However, it is not necessarily true that the hierarchical filesystem must be re-placed in order to build a system which can provide a more flexible form of fileorganisation and access. Users have learned to organise files effectively and developad-hoc methods of accessing them non-hierarchically, given the constraints of hier-archical filesystems. We can build upon those organisational techniques to enhancetheir functionality and effectiveness, resulting in a system which combines all theadvantages of traditional hierarchical filesystems with the flexibility of tag-basedones.The exploitation of this idea has resulted in FindFS, a system which allowsnon-hierarchical, tag-based access to an existing hierarchical filesystem by using thefeatures that they provide. The key feature of this approach is that it is layeredentirely on top of a traditional filesystem, and thus inherits the large engineering3Chapter 1. Introductionefforts that made them robust, nearly bug-free, and scalable. As described herein,exposing the hybridisation of performant, ubiquitous hierarchical filesystems andemerging flexible tag-based file selection strategies through familiar interfaces createsa powerful, practical, and low-overhead solution for the efficient management of thelarge, growing semantically-rich file collections common in systems today.4Chapter 2Related WorkPrevious work in attempts to move away from the pure hierarchical filesystem andoffer tag-based access have ranged from reimplementing most of the system start-ing at the layer of block storage [2], augmenting a filesystem with a tag metadatadatabase [1][3], or as an application layer shell feature [4] or library [7]. All ofthese approaches have certain disadvantages. A filesystem reimplemented with non-hierarchical access as its primary goal may perform poorly with the large base ofexisting applications that were written to take advantage of unique characteristicsof hierarchical filesystems, such as access locality within directories. Systems thatuse databases incur the overhead of the database system, one whose full capabilitiesmay rarely be needed. Finally, an application-level feature is exclusive to a givenapplication or library, and is unavailable to use for other existing applications.The original semantic filesystem [1] used pathnames as queries into a databaseto present virtual, created-on-demand directories containing the desired files thatmatch the query. This method is suited to ad-hoc exploration, but for often-usedqueries or those with more complex expressions, it is advantageous to persist theexpression as well as its results in a more convenient form, such as in an actualfilesystem structure, so that it can be easily referenced in the future and avoidrecomputation. TagTree[5], TagFS[6], HAC[7], Tags for Plan 9[8], and the Logic5Chapter 2. Related WorkFile System[9] are also based upon a similar strategy, only differing in how theyrepresent attributes and their mapping to pathnames.These aforementioned systems use an underlying hierarchical filesystem to storethe data of the files, augmenting it with a database and indexing; in contrast,hFAD[2] eliminates the hierarchical filesystem completely, replacing it with a databasethat directly utilises a block storage allocator and using that for both metadata andfile storage. It is able to emulate a hierarchical filesystem by storing paths in thedatabase, and while there are no benchmarks presented, there are other examplesof using a database as a filesystem, such as Inversion[10], which show their higheroverhead. In these systems, the need for a database, and the specific APIs andapplications required to manipulate one, restrict their generality compared to afully-filesystem-based approach.Approaches in providing non-hierarchical access at the application level havealso been studied, such as by extending the chdir command of the shell to performa search of the hierarchy[4]. This is a relatively simple and non-disruptive changeto the system, but also the least general, since such a feature becomes exclusive toan application and would have to be replicated or explicitly linked as a library forit to be usable by other applications. HAC[7] is an example of a more fully-featuredapplication-level approach, providing a virtual, tag-based hierarchical namespacefilesystem within a library. In consideration of the fact that file access is a verygeneral capability, and thus one that should be consistent across applications, thislimitation can be extremely disadvantageous.Other examples of application-level approaches include OS X’s Spotlight[11] andWindows Desktop Search[12], which utilise an indexer to collect metadata about the6Chapter 2. Related Workfiles in the system and store it in a database. This allows them to persist a searchquery[13] and present to the user a “virtual folder” which is dynamically updatedwith matching files[14], but since they are not accessible via the filesystem API,other applications cannot access their contents in the same way they do with files ina regular filesystem. In practice, since the application in question is a GUI and theprimary means for the user to access the filesystem, this is not a strong disadvantage— the “real” path is passed as a parameter when another application is executed ona selected file — but the ability to navigate and inspect search results in the sameway as other structures in the filesystem is a useful feature. The fact that thesevirtual folders are specific to the shell application is noticeable when accessing filesfrom within another application, since they do not have a pathname and are notvisible in the filesystem.7Chapter 3FindFS OverviewThe general approach we take in providing an alternative non-hierarchical view is toleverage the capabilities of an existing hierarchical filesystem as much as possible.This obviates any need to reimplement much of the underlying functionality withregards to file storage. It allows reuse of the standard filesystem API, meaningthat existing applications do not need to be specifically modified in order to takeadvantage of the new features and can continue to access files as they did before.From the user perspective, the filesystem remains compatible with both existingapplications and the well-ingrained idea of hierarchical storage; there is no need torelearn or be forced into using a new paradigm of file organisation. However, whenthe need arises to use a tag-based view of the filesystem, these capabilities becomeavailable. This approach is motivated by the observation that with only the existingfeatures available in a standard Unix system, users are already able to identify andmanipulate files in ways that are not strictly hierarchical. However, doing thisrequires extra effort and has its limitations; FindFS builds on the principles behindthese ad-hoc manipulations and extends them into the form of a filesystem, makingsearch results which persist and are updated according to filesystem changes.83.1. Enabling Multiple Hierarchies3.1 Enabling Multiple HierarchiesConsider server access logs for a series of N websites; they may be organised in ahierarchy of the form logs/siteN/YYYY/MM/DD. A consequence of this choice is thatit is easy to obtain all the logs of one site for all dates, a particular date, or set ofdates; they can all be found under e.g. the logs/site3 directory. On the otherhand, to obtain the logs of all sites on a particular date requires “cutting across”this hierarchy, since the path component that varies in this case is not the last one.Alternatively we could choose to use the hierarchy logs/YYYY/MM/DD/siteN, whichmakes the latter use case easier but the former harder.Links — symbolic[15] or otherwise[16] — provide a standard solution to thisproblem in the context of a hierarchical filesystem; they allow a file to be accessedvia multiple pathnames, meaning that it can appear to be in different hierarchies.We choose to exclusively use symbolic links in FindFS, due to their added flexibilityin allowing links to directories and across different filesystems.With this flexibility, the user can create hierarchies as needed, and add links tothe appropriate files in it to enable different views of the filesystem; to use the log ex-ample, both the logs/siteN/YYYY/MM/DD and logs/YYYY/MM/DD/siteN hierarchiesmay coexist, with one of them containing links to files stored in the other, or both ofthem links to same set of files stored in an entirely different hierarchy (e.g. a “flat-ter” structure, with the names containing this information: logs/N-YYYY-MM-DD.)The original semantic filesystem [1] embodies this idea, creating symbolic links froma “virtual”, on-demand tag-based hierarchy to files stored elsewhere. In FindFS,search results are also directories of symbolic links to matching files, stored in the93.2. Tag-Based Filtering and Searchingunderlying filesystem, but we depart from the traditional semantic filesystem in thehandling of tags and file selection.3.2 Tag-Based Filtering and SearchingOn standard Unix systems, the find utility standardised by POSIX [17] is theusual means by which users can search the filesystem; in its most basic usage, ittakes as parameters a series of paths and an expression, and recursively traversesthese paths while evaluating the expression for each file/directory encountered. Theexpression allows matching on various metadata such as names, sizes, dates, andownership, which combined with boolean operators allows for complex queries to beeasily constructed. The effect of this filtering process is not unlike that of tag-basedfilesystems, where files are selected based on the metadata of their tags.However, the standard find queries only allow selection based on the typicalmetadata found in a traditional filesystem. Index-based search systems have evolvedsimilar tools such as mdfind[18], which query the metadata database, but its expres-sion syntax[20] is not compatible with find and they are specific to one index-basedsystem. A more general solution is available in the filesystem, in the form of ex-tended attributes. Although the standard POSIX filesystem API has no supportfor them, many filesystems such as ext*, XFS, etc. do provide this feature[19]. Thisallows a set of key-value pairs to be associated with each file, and thus be used tostore the information on the tag namespace. An application like find could thenbe used to query these attributes, providing tag-based filtering.103.3. Persisting and Updating Results3.3 Persisting and Updating ResultsThe output of find is not a continuous view of the files that it selects, but anapproximately instantaneous list created at the time it is run; in other words, theresults are not inherently persistent. Using the flexibility of I/O redirection, onecould save these results to a file, or use them to create an alternate hierarchy oflinks. For example, with a suitably extended find, the command$ find logs -eav user.date_year 2015 -a -eavuser.date_month 05| xargs -I linkdest ln -s linkdestcould create a set of links (a resultset) in the current directory to all the logs taggedwith a date of May 2015. This solves the problem of persistence, but if the filesin this resultset are modified such that they no longer exist or satisfy the query, oradditional files are added that do, the resultset will become inconsistent with thefilesystem. Thus, a method of ensuring that these links are up-to-date is required.A set of these commands could be put in a script which refreshes the resultsetswhen periodically executed, but such a “polling” method is an inefficient use ofresources. Increasing the refresh frequency allows the results to be kept in closercorrespondence to filesystem changes but consumes more processing time, while re-ducing it leaves more resources for other processes but decreases the consistency ofresults. However, no periodic update frequency guarantees that the results are con-sistent whenever they are inspected; the solution is to not poll but to use event-basednotifications of filesystem activity, causing the results to be updated synchronously113.4. Control Interfacewith their associated filesystem operation. In this way, no inconsistency can oc-cur, and no unnecessary work is performed when there are no associated filesystemoperations. FindFS adopts such a solution, monitoring filesystem operations andupdating resultsets accordingly.3.4 Control InterfaceIn addition to monitoring filesystem operations for the purposes of modifying re-sultsets, FindFS also monitors operations performed on special files and directo-ries which allow control over query creation, modification, execution, and deletion;queries are themselves also special directories. As with resultset contents, thesespecial files are persisted by the underlying filesystem like any other file, relievingFindFS of the need to explicitly manage their storage. For example, the selectedpaths and query expression are written to a query file, while the control file canbe read to determine the state of a query as well as written to command it to adifferent state, e.g. to initiate a find operation. The output file provides textualinformational, warning, and error messages, while the results directory containsthe query’s resultset.Presenting a control interface via the filesystem API is a common practice inexisting Unix and Unix-like operating systems[21][22], and enables applications tobuild upon it to provide more advanced functionality. It also allows for an easyextension to distributed systems[23], since the filesystem API is uniform and a well-understood paradigm for interaction.12Chapter 4FindFS DesignFindFS is designed to present itself to the OS as a filesystem, while also requiringan underlying filesystem on which it can search and also store the persistent dataassociated with its search queries. This means that FindFS can be mounted any-where in the OS’s virtual filesystem tree, allowing applications to issue requests toit; once mounted, it acts as a thin layer between applications and an underlyingfilesystem, transparently passing through requests from applications and acting onthose which have a special meaning to it such as query control operations.Because the underlying filesystem is specified as a path relative to the VFSroot, FindFS’ filtering capabilities can be selectively applied to subtrees of the VFS.This flexibility is advantageous since access via FindFS adds overhead (examinedin more detail in Chapter 6), and it may not be beneficial to apply it to certaindirectories in the same way that indexing systems may not attempt to index allfiles. For example, temporary and system directories could be excluded, while thosecontaining metadata-rich multimedia are included.134.1. Queries4.1 QueriesFindFS’ model of operation is based on the concept of the query, through whichfiltered views of the filesystem are observed. Once mounted onto an underlyingfilesystem, a queries directory is visible in the root of the mount, and operationson the files and directories within this subtree are interpreted as query control op-erations. A query is an object represented by a directory in the queries directory,and itself contains files and directories which reflect its state. The manipulationof queries is accomplished entirely via filesystem operations; for example, the cre-ation of a subdirectory of the queries directory creates a query of the same name.Whenever a query is created, FindFS creates a few special files inside the query’sdirectory and monitors the activity on them, allowing the user to control the query.Similarly, when deletion is performed on a query’s directory, FindFS also frees theother resources associated with a query.Queries are always in one of four states: INIT, FIND, WAIT, and STOP. Changes inthe state of a query can be effected either by the system itself or by the commandsissued to it via the control file. Figure 4.1 shows the states and transitions whichare commanded or system-initiated.4.1.1 Query State INITThe INIT state is the state in which a query is initially in after it has been created(by the use of mkdir() in the queries directory). In this state, its query file canbe freely edited to construct the desired query. The results directory is present, butempty. The query can be removed with a rmdir() call on the query’s directory,144.1. QueriesFigure 4.1: Query States and Transitionswhich automatically removes all the files as well as directories that were createdduring the creation of the query. Writing find or wait to the controlfile causes anattempt to transition to the indicated state. When this occurs, the system parsesthe contents of the query file, and if the query is syntactically valid, the transitionis allowed. From INIT to WAIT, the query becomes active and its results directorywill be updated with the files and/or subdirectories which match the query as aresult of the filesystem operations executed by applications. From INIT to FIND, inaddition to new operations, a process similar to the find utility is used to collectexisting files that match the query. Otherwise, error messages are written to theoutput file and the transition does not occur, allowing the user to edit and correctthe query.154.1. Queries4.1.2 Query State FINDThe FIND state is anticipated to be the usual state a query will transition to fromINIT. In this state, the system searches the filesystem for files which match the query,a process similar to the find utility — hence the name of this state — and addsthem to the set of results. In addition, files that are accessed, modified, and createdduring this state will also be subjected to an evaluation by the query’s expression.When the find process is complete, the query automatically transitions to eitherthe WAIT or STOP state, depending on the options used for the query. While a queryis in this state, it is possible to command it to INIT, WAIT, or STOP, interrupting theprocess.4.1.3 Query State WAITA query in the WAIT state has finished its process of traversing the existing filesystemand gathering results, and is waiting for filesystem operations in order to updatethe resultset. (Alternatively, if it was commanded into this state without enteringFIND, the resultset will be initially empty but updated as new filesystem operationsoccur.) In the terminology of the system, queries in a FIND or INIT state are knownas an active query. In this state, a query may be commanded to the FIND, INIT, orSTOP states to respectively reinitiate a find process, be reinitialised after alteringit, or made inactive.164.2. Expressions4.1.4 Query State STOPSTOP is the other state, besides INIT, in which a query is inactive; in this stateits results remain accessible but, because its processing in response to filesystemoperations is disabled, do not change. It is a state that is useful when managingmany queries; for example, a WAITing query whose results are not anticipated tochange during an upcoming period of intense filesystem activity can be commandedto STOP for that period, avoiding the overhead that it adds, and be “resumed” toWAIT afterwards.This state can also be used to preserve certain historical information aboutthe filesystem, since a query which goes through the INIT→FIND→WAIT sequencecontains in its resultset all the files existing which match it, and those that currentlydo — until made inactive. Thereafter, files that now match and files that no longermatch have no effect on it, so it has effectively captured a snapshot of matching files;deletions and renames notwithstanding, since results are symbolic links. Combinedwith the INIT→WAIT sequence, this allows persisting information on files that wereaccessed and matched a query in a certain timeframe, a task not possible with thefind utility.4.2 ExpressionsThe query expression language of FindFS is an extension of that supported bythe find utility. It includes all of the primary-expressions (predicates) defined bythe POSIX[17] standard, and adds those shown in Table 4.1 to allow matching onextended attributes.174.2. ExpressionsPredicate Descriptionea name EA name existseav name value EA exists with valueeavl name value EA exists with a value < valueeavle name value EA exists with a value ≤ valueeavg name value EA exists with a value > valueeavge name value EA exists with a value ≥ valueeavne name value EA exists with a value 6= valueTable 4.1: Predicates for Extended Attributes (EAs)The predicates that require a value to compare against have two forms; the oneshown in the table, intended for attribute values which are relatively short printablestrings, and an alternate version intended for either longer attribute values or thosecontaining nonprintable characters. The latter form takes a filename for the secondargument, and uses the contents of that file as the value; they are named withan f replacing the v: eaf, eafge, etc. Because attribute values are stored in thefilesystem as a sequence of bytes, not unlike file contents, they are not necessarilyrequired to be strings, and as such, no C-style null-termination is applied to thespecified values.Since the standard predicates and operators are also available, they can be usedtogether with the extended attribute predicates, further increasing the flexibilityof query expressions; unlike the previous efforts discussed above, whose query ex-pressiveness is limited to matches on the tag namespace, FindFS allows filteringby standard metadata, and is usable even without any extended attributes. Forexample, a FindFS query could be created to contain all files that were modified inthe last hour and owned by a particular user.It also does not impose any structure on the tag namespace, since the match-184.2. Expressionsing predicates it supports are quite general; by using only the name of extendedattributes (the ea predicate) in selection, tags can be simple labels. Using bothname and value, and comparing values with the equality or inequality predicatescombined with the standard conjunction and disjunction operators (-o and -a), anarbitrarily complex tagging system can be utilised. Furthermore, there is no re-striction on the values, allowing binary and text formats to be accommodated. Theparticular design of tagging systems is beyond the scope of FindFS — it supportsany semantic model that can be expressed as key-value pairs or standard filesystemmetadata, such as those discussed in [24] and [25]. Likewise, this metadata could bederived from file contents and updated when they change, or produced in some othermanner; as long as metadata changes are performed through the existing interfacethat FindFS monitors, it will interoperate.For example, considering the use-case of website access logs presented in Chapter3, FindFS may be employed in various ways to achieve the filtering desired. Withfilenames that contain the desired information, e.g. using the N-YYYY-MM-DD format,a query of the form logs -name 5-* will match all the logs for site 5, logs -name*-2015-01-* all the logs for January 2015, etc., presenting them in the query re-sults as a “flattened” view in a directory of links. Observe that, even without anyuse of extended attributes, there is already substantial filtering flexibility available.However, its use with files that have extended attributes brings more flexibility;suppose that the process which generates the logs also adds an extended attributewith the name user.high rate to the log files containing an unusually large num-ber of recorded accesses. Then the query logs -ea user.high rate can be usedto provide a continuously updated view of the sites receiving the most traffic, and194.3. Resultsets and Dynamic UpdateFigure 4.2: Flat (left) and Hierarchical (right) Resultset Structurewhen, for further analysis or even additional filtering via another FindFS query thatoperates on this query’s results.4.3 Resultsets and Dynamic UpdateIn the results subdirectory of the query, references to matching files and directoriesare present in the form of symbolic links, in a similar manner to the original semanticfile system. A query can be configured to use one of two different formats for theresults: a flattened list, or a hierarchical view. These two formats are providedto allow flexibility in choosing the one which is most suitable to the specific use-case, since they both have advantages and disadvantages. Figure 4.2 graphicallyillustrates the difference between these formats.The design of the resultsets follows from the requirement that operations onthem, particularly incremental ones, should be both efficient and make use of un-derlying filesystem features as much as possible. Furthermore, by monitoring filesys-tem activity, FindFS continuously keeps the contents of this directory updated toreflect the latest modifications to the filesystem. Thus, by navigating inside thissubtree, one can obtain a dynamic realtime non-hierarchical selection of files from204.3. Resultsets and Dynamic Updatethe original filesystem.4.3.1 Flat ResultsetA flat resultset, whenever possible, represents its contents using a single level ofhierarchy — a symbolic link in the results directory of the query, named after andtargeting the matching file; this is not possible in all cases since using the nameof the file in the results is most sensible to the user, and traditional hierarchicalfilesystems do not impose any requirement that filenames be globally unique. Tohandle instances of duplicate names, FindFS adds a second level of hierarchy, usinga subdirectory named after the file and putting the links in it. These links arenamed after the inode numbers of the files they reference; a decision whose benefitover a naming that appears to be simpler at first glance, like sequentially increasingnumbering, is that it avoids counting and the issues that arise when files are addedand removed from the resultset.A second level of duplication, which can occur with files that have multiple hardlinks of the same name, is handled by appending an “-” to the inode number anda sequentially increasing, first-fit identifier; here, we have resorted to counting sincethis case should be rare in practice — the majority of regular files in a filesystemusually have a hard link count of 1.For example, a resultset containing the following paths with the correspondinginode numbers214.3. Resultsets and Dynamic Update/a/b/file1 [11]/a/b/file2 [34]/a/c/file3 [27]/a/d/file1 [13]/a/d/file2 [34]would be represented by a results directory of the following structure:results/file1/11 -> /a/b/file1results/file1/13 -> /a/d/file1results/file2/34 -> /a/b/file2results/file2/34_1 -> /a/d/file2results/file3 -> /a/c/file34.3.2 Hierarchical ResultsetOn the other hand, the hierarchical resultset presents a “virtual hierarchy” treerooted in the result directory of the query, which positions the symbolic link for afile in the resultset nested in a set of directories with the same name as found in itspath in the underlying filesystem, i.e. relative to the root of FindFS’ mount point.Using the same example as above, the resultset structure becomes:results/a/b/file1 -> /a/b/file1results/a/b/file2 -> /a/b/file2results/a/c/file3 -> /a/c/file3results/a/d/file1 -> /a/d/file1results/a/d/file2 -> /a/d/file2224.3. Resultsets and Dynamic UpdateWith this format, duplicate names do not pose any problem, meaning that theresultset modification algorithm can be more efficient for some operations. Thetrade-off is that queries which match files in deeply-nested directories tend to create atree structure with low fan-out, i.e. they are sparse and mainly consist of directorieswhose only content is a single subdirectory, and this frustrates access to the files atthe ends of these long branches. One capability which this format lacks compared tothe flat mode is for resultset entries to be directories; since it requires the directoriesto store links, it is impossible to have a link to a directory as a result and also asa directory to store results matched in its subtree. With the flat format, thosedirectories which match are accommodated with the same mechanism for regularfiles: by creating links to them in the result directory, or in a directory one leveldeeper (if there is a duplicate name).4.3.3 Resultset Format ComparisonThe two resultset formats also differ in performance characteristics, which makethem suited to different use-cases. A flat resultset, although slower in response tosome filesystem operations, is ideal for queries which do not have many results orduplicate names expected, and for which the user desires to see all the files in onedirectory without needing to recurse into a series of subdirectories. It representsa mostly non-hierarchical view that emphasises the selection process of the query,which is similar to those of the other tag-based filesystems. The hierarchical resultsetpresents a view not unlike a traditional a hierarchical filesystem, but one that isfiltered to include only the files (and their containing directories) which match.234.3. Resultsets and Dynamic Update4.3.4 Resultset UpdatingThe resultsets of active queries are kept updated by monitoring operations on thefilesystem, and adding or removing files from them according to the result of evalu-ating the query’s matching expression on the changed file(s). Because the values ofexpressions depend only on filesystem metadata, they may change, and the resultsbe updated, only when such metadata changes. Thus, only those operations on afile which can affect its presence in a resultset need to be monitored by FindFS.This includes creation, deletion, renaming, and modifying of extended attributes,but not reading or writing; although the latter two operations may update accesstimes, this is handled upon closure of the file.The semantics of expression evaluation differ slightly from that of the findutility, to better reflect how FindFS operates; this is because the utility does notactually perform a filtering operation — instead, its operation can be considered asone that executes an expression on each file in a directory tree, and the usual outputit produces is only a side-effect of this recursive filesystem walk. In fact, the printprimary predicate is used to produce this output; given the possibility of multipleoccurrences of print in the expression, or none at all, the standard[17] specifiesthat in absence of a print, exec, or ok predicate, the actual expression which isevaluated is conjuncted with print (i.e. expr -a -print). This results in the usualoutput if no print is explicitly specified. In FindFS, the value of the expressionis used to determine resultset membership and print evaluates to true with noside-effects, which achieves good compatibility with existing find expressions.244.4. Example Interaction4.4 Example InteractionSince FindFS is entirely controlled by filesystem operations, its interface is avail-able to all applications; this makes viable the possibility of developing specialisedapplications to assist users in creating and managing queries, but also allows its op-eration via existing command-line utilities, as shown in the following interactions.Assuming the current directory is the root directory of the FindFS mountpoint, westart by creating a query named q1:$ mkdir queries/q1The query now exists but cannot be executed (nor is it active), since it is in theINIT state and contains no matching expression. The following command may beused to give it one:$ echo ’/ -name file1.txt -o -eav user.attr1 value1’ > queries/q1/queryThis matches files with a name of file1.txt, or those with an extended attributenamed user.attr1 and value value1. The syntax used here is equivalent to theparameters to the find utility; options are specified first if present, followed byone or more paths (relative to the mountpoint, where / refers to the mountpointdirectory), and finally the query expression. Strings may be quoted in the usual shellmanner via single and double quotes and escaped with the backslash, to facilitate usewith directories containing spaces or other special characters; neither command norvariable substitution are supported, with those characters being parsed as regular254.4. Example Interactionones. If the need arises, they can be performed using the shell in the customarymanner, and the results, correctly quoted, written to the query file.In the INIT state, the query file may be rewritten multiple times, as writing tothe file does not trigger any FindFS operations; this allows composition of arbitrarilycomplex queries with saves of partial edits, and applications such as text editors tomanipulate the query file.The state of this query is examinable by reading the control file:$ cat queries/q1/controlinitHowever, a write to the control file, as effected by e.g. this command,$ echo ’find’ > queries/q1/controlwill command FindFS to request a state change of the query object as describedabove. This may also be performed via another application, but care must be takenwhen doing so to ensure that it writes only the command and nothing else to thisspecial file, and that reading will obtain the current status instead of the commandlast issued. After a query has been put into the FIND state, its progress can bemonitored using the control file:$ cat queries/q1/controlfind$ cat queries/q1/controlfind$ cat queries/q1/controlwait264.4. Example InteractionThe results of a query are also always available in the results directory, in theformat described above, and while the query is active, will be updated in responseto changes in the filesystem:$ ls queries/q1/resultsfile1.txtfile2.txtfile3.txt$Finally, to remove a query, which automatically stops it and frees associatedresources (including its resultset and control files), it suffices to remove its directory:$ rm -rf queries/q127Chapter 5ImplementationWe have implemented a prototype of FindFS as a FUSE[26] filesystem application,in C, instead of a kernel module. This allows use of the facilities available to user-mode applications such as the POSIX API and C standard library, which easesimplementation due to much of the functionality being inherited from its origins inthe find utility. The development of FindFS began with creating a version of findwith extended attribute support; a significant portion of which was then reused, forparsing and evaluating the query expressions. The implementation totals less than2500 lines of code, approximately half of it for find-like expression parsing and eval-uation and the other half for code specific to FindFS: query management, resultsetmodification, and filesystem request handling. Figure 5.1 shows how filesystem op-erations from applications are passed through to the FindFS application via FUSE,while FindFS itself performs operations on the underlying filesystem.Although the use of FUSE allows for easier implementation and portability, itincreases overhead and reduces performance; all filesystem operations in the FindFSmount subtree need to pass through the FUSE layer, including those that cannotcause resultset changes (e.g. reading and writing), and thus those that FindFSwould not need to inspect. This could be solved by modifying the FUSE kernelmodule to perform transparent passthrough operations itself — based on what op-285.1. Implementation Goalserations the user-mode application desires to handle — or by having FindFS itselfbe a complete kernel-mode filesystem. The former retains most of the portabilitybut also the extra kernel-user transitions and a relatively restrictive API (in termsof filesystem manipulation — see Section 5.2), while the latter, although requiringmodifications for different kernel APIs, allows for more efficient filesystem manip-ulations and eliminates kernel-user transitions for each filesystem operation on theFindFS subtree.5.1 Implementation GoalsIn keeping with the theme of using the filesystem whenever possible, there is verylittle data the system explicitly manages in memory; only the identifying informa-tion, state, and parsed expression for each query are kept in memory by FindFS.In particular, this means the resultset is managed by the filesystem, and opera-tions on it are handled by the filesystem driver. This is beneficial since it simplifiesFigure 5.1: FindFS Component Interaction (thick lines show application↔FindFSrequests, thin lines show FindFS↔filesystem requests)295.1. Implementation GoalsFindFS and allows the number of results to not be limited by available memory,and also automatically makes the results persistent. One possible disadvantage isthat accessing the filesystem is slower than a local in-memory data structure, butit was decided that the increased complexity and memory requirements (being atleast proportional to the size of the resultset) of doing so, along with the goal ofpersisting the results, meant that access to the filesystem would be unavoidable; ad-ditionally, there is caching in the OS’s filesystem and block device drivers, reducingthe performance impact.One additional effect of this design choice is that if FindFS is mounted onto thesame location as its underlying filesystem root, even when it is not mounted, thecontents of queries and their result directories remain on the filesystem and thus canbe accessed by applications; the only difference is that there is no dynamic update,so they may not correspond to current filesystem conditions.Furthermore, since FindFS queries are intended to be long-lived and filter thefilesystem while the user performs other work, their presence should not have a largeimpact on the performance of the system. This agrees with the design principle ofnot using more memory than necessary in an attempt to improve its performance,since that will be detrimental to other applications the user may have more impor-tance on; FindFS should be unobtrusive, and without any queries active, should notconsume any additional resources.This is unlike indexing-based systems that need to perform regular index updatesregardless of whether the information in the indices will eventually be used, at therisk of wasting that effort and causing unnecessary filesystem activity which canimpact the performance of other applications on the system. Our implementation305.2. Resultset Processingof FindFS occupies less than 26MB of memory initially, and each additional query(of a typical complexity, i.e. 2-3 terms) adds less than 1KB. As there is no persistentindex, the only filesystem storage needed is proportional to the number of queriesand the number of results they contain. Since our implementation does not attemptto merge duplicate parts of query expressions or resultsets, both space and time aredirectly proportional.5.2 Resultset ProcessingUsing the standard POSIX filesystem interface imposes certain restrictions on theway in which resultset modifications can be performed; for example, the only way todetermine the number of entries in a directory is to exhaustively read them — whichis not a constant-time operation. This is due to early Unix filesystems’ storage ofdirectories as a series of variable-length entries, without any count of the number ofentries. The inode number of a file, which is easily obtained via stat(), cannot beused to refer to it for any other operations — the standard API accepts only textualpaths and file descriptors. Textual paths are problematic in that their manipulationrequires operations on null-terminated strings, which are neither constant-time norspace.There is no way to find the path(s) associated with an inode number exceptvia exhaustive search of the filesystem. Determining if a filename exists in a direc-tory can vary from nearly constant-time (hashed lookups) to proportional to thenumber of entries in it, depending on the directory format and implementation ofthe underlying filesystem. However, even with a linear search, directory lookups315.2. Resultset Processingare usually not a problem except for the very largest of directories since the entriesare relatively small, relative to the speed at which the storage system can retrievethem. Caching further improves on this, making it likely that that the contents offrequently accessed directories are entirely in memory.5.2.1 Path ExistenceAll resultset modifications require determining if a path exists in one, which may befollowed by addition or removal (or neither), depending on the result of evaluatingthe query expression. Since these tests of existence need to be performed wheneveran operation on the filesystem may cause its membership in a query’s resultset tochange, they must be efficiently implemented. Using the flat format, a path notin the resultset requires one directory lookup on the name; otherwise, a stat()operation follows, and then either a single readlink() in the case that there is oneexisting result, or opening the directory and reading all the entries and performingreadlink() on ones with a matching inode number, if there are multiple. With thehierarchical resultset format, a single directory lookup of the path relative to theresult directory suffices; the presence of a link signifies membership.5.2.2 Path AdditionIf path existence equals the value of evaluating the query’s expression with it, noth-ing more needs to be done; otherwise the given path must be added or removedfrom the resultset. For a flat resultset, adding the first result of a given filename isa single symlink(); the second requires unlink(), mkdir(), and two symlink()s;325.2. Resultset Processingand the third and subsequent results, depending on presence of hardlink collisions,one or more symlink() calls. With a hierarchical resultset, adding a path results ina number of mkdir()s proportional to the path depth (it is futile to create directo-ries which exist, but since the mkdir() function implicitly checks for this condition,explicitly doing so — for example, with stat() or access() — would be even moreinefficient) in a similar fashion to the behaviour of the mkdir -p command, followedby one symlink() call.5.2.3 Path RemovalRemoving a path from a flat resultset is a single unlink() operation for one ormore than two existing results of that name, but reducing a resultset from 2 to1 result requires two unlinks(), a rmdir(), and a symlink() operation. For ahierarchical resultset, an unlink() is performed, followed by as many rmdir()s asthere are empty parents in the path (relative to the FindFS root); as with addition,we have utilised the implicit test for non-emptiness of rmdir() to automaticallyremove empty directories, and furthermore its return value to stop further futilityupon reaching the first non-empty one.5.2.4 Path ModificationRenaming a component of a path is perhaps the most complex operation that canbe performed relative to updating of the results, since it has the potential to alterthe paths of a very large number of files. The simplest case, where the path to berenamed refers to a file, can be considered as a combination of deletion and creation,335.2. Resultset Processingso that a rename of /a/b/c/file1 to /a/d/file2 can be reflected in the resultsetas a removal of /a/b/c/file1, followed by addition of /a/d/file2 (provided thatthe new path still satisfies the query expression and scope.) The complex caseoccurs when renaming a directory; this affects all resultsets containing paths thatinvolve it, and thus requires iterating over the resultset to both remove entries thatno longer match the query and modify entries which still do, but need their linksadjusted to target the new directory’s path. The entire contents of a flat resultsetneed to be examined, while a hierarchical one isolates the requires changes to therenamed subtree; a rename() operation moves the subtree in the resultset in thesame way it modified the original filesystem, and then a recursive traversal throughthe files it references adds/removes/updates the resultset accordingly. In addition,since renaming effectively deletes a subtree and creates another, it is necessary tore-evaluate all the files in the subtree as their changed paths may now satisfy aquery that was not satisfied before.5.2.5 Integrated AlgorithmThe operations required are summarised in Table 5.1, and Figures 5.2 and 5.3 depictgraphically the algorithms used for flat and hierarchical resultsets, respectively. Asit does not make sense to remove a result which does not exist, or add paths thatare already present, these algorithms integrate the existence checks with additionor removal, controlled by one variable (“add” in the figures) — depending on thisboolean, they either add a path if it does not exist (if it already exists, no actionis taken), or remove one that exists (similarly, one that does not exist causes no345.2. Resultset ProcessingOperation Resultset Duplicate Operations TotalNamesExistence Flat 0 or 1 stat(), readlink() 22 or more stat(), opendir(), 3 + 2nn * (readdir(),readlink()),closedir()Hierarchical - stat() 1Add (if not exists) Flat 0 symlink() 11 unlink(), mkdir(), 42 * symlink()2 or more k * symlink() kHierarchical - p * mkdir(), symlink() p+ 1Remove (if exists) Flat 1 unlink() 12 2 * unlink(), mkdir(), 4symlink()3 or more unlink() 1Hierarchical - unlink(), q * rmdir() q + 1(n = number of results, k = number of inode number and name collisions; p = depthof path; q = number of 1-child deepest directories in path)Table 5.1: Resultset Modification Effects355.2. Resultset ProcessingFigure 5.2: Modification of Flat Resultsets365.2. Resultset ProcessingFigure 5.3: Modification of Hierarchical Resultsetsaction.) Integration of addition/removal into one algorithm eliminates much of theduplication and work — particularly for the flat resultset — that would otherwiseoccur as the code paths for these operations are very similar. For example, thenumber of duplicate results in a flat resultset is partially determined during existencechecking, and this is also used when both adding or removing is performed.Observe that a hierarchical resultset, owing to its similarity in structure to theexisting filesystem, requires far fewer decisions to be made than for a flat one; themajority of the complexity in the latter arises from a need to handle the differentcases of duplicate names.37Chapter 6EvaluationSpecific benchmarks were performed to characterise the system, to determine theoverhead introduced by evaluation of queries and updating of their results. Allof the results are obtained with this configuration: Ubuntu 12.04 Linux 3.2.0-24with FUSE 2.8, i7 920 (2.67GHz), 1GB RAM, 2xWD2003FZEX (7200RPM, 2TB,64MB cache) HDDs (in RAID 1) running as a VM on a host which is otherwiseidle. Unless otherwise specified, the filesystem used in these benchmarks is a small27.85GB Ext3 partition; while the time taken for the FIND state of each query isdependent on the size of the filesystem, in a similar manner to how the find utilityoperates, the benchmarking of that is not the main concern. The performance ofresultset updates in response to changes in the filesystem (i.e. the WAIT state) is ofsignificantly more interest; we use microbenchmarks to determine the cost of theseoperations, as well as a macrobenchmark to evaluate the overall performance of thesystem. Unless otherwise specified, all the results were obtained by averaging timeover 1000000 iterations.386.1. Basic FUSE Overhead6.1 Basic FUSE OverheadFilesystems which utilise FUSE are known to be slower than native kernel-modeimplementations [27] due to the additional context switches involved, and the factthat all requests to the filesystem go through a device special file; the purpose of thisbenchmark is to quantify the overhead of various filesystem operations via FUSE,and compare that to the native (Ext3) filesystem in the benchmarking environment.Note that this also includes the overhead of the user-mode FUSE library, whichFindFS relies on. For this benchmark, FindFS is used with no active queries; in thismode, all filesystem activity is passed through transparently. The operations testedare open/close, open/read/close (reading only the first byte of a small file), stat(),creat()/close()/ unlink(), mkdir()/rmdir(), and reading all the entries of adirectory containing 20 files. Table 6.1 shows the results.Operation Native (µs)FUSE (µs)open()/close() 2.44 40.2open()/read()/close() 3.06 68.0stat() 1.10 1.17creat()/close()/unlink() 12.89 119.3mkdir()/rmdir() 54.91 153.8opendir()/readdir()/closedir() 7.86 109.4Table 6.1: FUSE OverheadFrom these results, it can be seen that file accesses through FUSE are roughly20x slower than accessing the native filesystem directly, while creation and deletionnearly 10x slower; however, there is a difference of less than 7% with stat() oper-ations, and directory creation/removal, at ≈3x, is not quite as slow as open/close.The other operation that FindFS frequently needs to perform for flat resultsets,396.2. Expression Evaluationreading the contents of a directory, is almost 14x slower. Since all the benchmarksin the following sections include these overheads, it is important to note that even asmall relative change in time could appear much greater if FindFS were implementedas a native kernel-mode filesystem instead of with FUSE.6.2 Expression EvaluationEach (active) query requires processing time, which is comprised of two parts: eval-uation and modification. The former involves evaluating the metadata of the fileagainst the query expression, while the latter may add or remove it from the resultsetdepending on the result of the evaluation. To determine precisely the costs of eachof these operations, we use benchmarks which only exercise the respective parts.For expression evaluation, this is achieved by performing filesystem operations thatdo not cause any resultset modification, while varying the number of queries ortheir complexity. We timed an open/close and open/read/close operation, for bothfilename and extended attribute queries in the matching and non-matching cases.Since the times scaled almost proportionally with the independent variable, lin-ear regression was used to obtain the per-query cost. The results of this benchmarkare shown in Table 6.2.6.2.1 Filename MatchThis benchmark determines the time taken to evaluate a single expression using the-name predicate, which is a typical application of the find utility. It represents theuse case of FindFS as a “persistent invocation of find”, and involves accessing a file406.2. Expression EvaluationMatch Type open()/close() (µs) open()/read()/close() (µs)Match Filename 5.95± 0.25 10.0± 0.61Match EA 3.32± 0.31 4.37± 0.73Non-match Filename 0.198± 0.072 1.76± 0.02Non-match EA 2.87± 0.52 2.87± 0.49Per Filename Term −0.242± 0.086∗ 0.520± 0.147* See text for explanation of negative valueTable 6.2: Matching Benchmark Resultsexisting in the (flat) resultsets of N queries, such that it remains in those resultsets.This performs a stat() and a readlink() operation on each iteration, followed bya path comparison. As expected, the time scales nearly linearly with the number ofqueries, but the case in which the file is only opened and closed but not read takesonly 60% of the time when it is read; this is likely due to (CPU instruction) cacheeffects, with the quantity of executed code being greater in the latter case, and inactual use an extra 10µs would be expected per file access.6.2.2 Extended Attribute MatchWhen queries are used to create “containers” like a tag-structured filesystem, ex-tended attributes are employed; more specifically, name and value matches. Thus itis important to determine the time required to evaluate such queries. The testing isperformed with the same setup as 6.2.1, except the queries use the -eav predicateto examine an extended attribute’s name and value. A 13-byte extended attributename (user.filename) is used, and matched with a value of 9 bytes, comparableto the filename match in the previous section. In this case, there is the same trendof reading from the file increasing the matching time — giving more evidence in416.2. Expression Evaluationsupport of the theory that this is a CPU cache effect — and it can also be seen thatretrieving extended attributes and matching on them adds less than 5µs, meaningthat Ext3 extended attribute retrieval is unlikely to be a performance bottleneckfor FindFS.6.2.3 Filename Non-MatchMany files in a filesystem, particularly system files and temporary data, are notoften the targets of search queries; however, they may be frequently and rapidlyaccessed by system services during normal operation. For an indexing filesystemthey would not be indexed, but since FindFS monitors all operations it is importantto determine how accesses to non-matching files are affected. This benchmark isidentical to 6.2.1 except that the file being accessed is not in, and will not be addedto, any of the resultsets. Additionally, we test the creation and deletion of a non-matching file. The cache effect is strongly visible in these results, adding less than0.2µs when the file is only opened and closed, but close to 10x that when it is read;creation and deletion also nearly doubles that number. Nevertheless, an extra 1.76µsfor accessing a non-matching file, or 3.20µs for creating or deleting, is less than anymatching one, following the principle that FindFS should be relatively unobtrusiveon files which are not targeted by queries.6.2.4 Extended Attribute Non-MatchThis benchmark tests the opposite condition to 6.2.2, and its goal is similar to 6.2.3;files not matching queries utilising extended attributes should not incur significant426.2. Expression Evaluationoverhead when they are accessed. The same query as for 6.2.2 is used, and a filethat does not match is repeatedly accessed. This adds the same overhead regardlessof whether the file is read, and is relatively close to the times for 6.2.2, indicatingthat the code path for extended attributes does not differ much between matchingand non-matching cases.Overall, access to files that do not match non-EA queries are significantly fasterthan those that do, but there is a smaller difference between accessing files matchingand non-matching EA queries.6.2.5 Query ComplexityMore complex query expressions may take longer to evaluate; this benchmark isaimed at determining how evaluation time scales with the number of terms in aquery expression. A single query is used, and access to a matching file timed whilevarying the number of disjunctive terms in the expression; the -name predicate isused for each term, and the matching term is placed at the end of the disjunc-tion chain (-name fileX.txt -o -name fileY.txt . . . -name file match.txt)to force thorough evaluation. The obtained result for opening and closing with-out reading is negative — adding terms to the query actually caused it to becomeslightly faster, as there was a downward trend in the timing; this is clearly thesame caching effect seen in the earlier benchmarks, but when the file is read beforeclosing it, causing more code to be uncached, the per-term evaluation time is stillapproximately 0.5µs. Even with 10 terms, which is far more complex than a typicalquery expected of this system, the time taken is still only 5µs; this is an indication436.3. Resultset ModificationResultset Type Condition Time (µs)No modification (no match) 3.20± 0.10Flat 0 ↔ 1 results 17.43± 0.231 ↔ 2 results 153.6± 1.42 ↔ 3 results 47.98± 0.26Hierarchical depth 0 16.06± 0.30depth 1 101.7± 1.0Table 6.3: Resultset Modification Benchmark (creat()/close()/unlink()), timesper query-resultthat query evaluation comprises a very small portion of the total processing time inFindFS, and thus complex queries with many terms should not cause the system tobecome unusably slow.6.3 Resultset ModificationResultset modification occurs following expression evaluation if the existence of thepath differs from the evaluation result. The benchmarks in this section aim toelucidate the overhead associated with adding and removing files from a resultsetin response to creating and deleting files which satisfy a query expression. Sincethe operations performed differ between flat and hierarchical resultsets, and alsodepend on specific conditions of the resultset, we test a few representative resultsetconditions intended to exercise the cases shown in Table 5.1, and the times areshown in Table 6.3. As with expression evaluation, we vary the number of matchingqueries from 1 to 10, causing the same number of additions and removals to occurwith each creation and deletion; the times scale linearly, and obtaining the slope ofthis line with regression gives the per query-result time. The baseline of no resultset446.3. Resultset Modificationmodifications is 3.20µs, for comparison purposes.6.3.1 FlatThese benchmarks repetitively add and remove a file from the resultset of a queryconfigured with the (default) flat resultset mode, caused by creating and deletinga file elsewhere in the filesystem. A resultset initially containing 0, 1, or 2 filesof the same name is used, testing the differences that arise due to duplicate namehandling. The results show expected behaviour with adding or removing a uniquely-named file being the fastest since it is a single symlink() or unlink() call, andtransitioning between 1 and 2 duplicate names being the slowest due to requiringmultiple manipulations of the underlying filesystem. The times for creating andremoving the link are comparable to those taken by the native filesystem for creatingand removing a regular file, as shown in Table 6.1.6.3.2 HierarchicalThis benchmark repetitively adds and removes a file from the resultset of a queryconfigured with the hierarchical resultset mode, using a path with 0 and 1 sub-directory levels to cause either none or one directory creation/removal with thecreation/deletion of the file. The time taken when there are no directory addi-tions/creations is expectedly similar to that of the flat mode, since it performsthe same symlink()/unlink() pair. However, it is much slower when directorycreation/deletion is required; observing that even the native filesystem takes nearly55µs, the time it takes is not surprising. The results suggest that hierarchical result-456.4. Macrobenchmarkssets should be avoided for queries intended to match frequently added and deletedfiles in deeply nested, sparse directory trees.6.3.3 Microbenchmark SummaryComparing the results in Tables 6.2 and 6.3 with the FUSE overhead in Table6.1, in particular the creat()/close()/unlink() sequence, it can be seen that asignificant portion of the total time spent is FUSE overhead. For example, creatingand removing a file takes 119.3µs, and each query’s resultset that is modified dueto this operation adds an additional 16.06 to 153.6µs depending on the type andstate of the resultsets. Thus, it is very likely that a kernel-mode implementation ofFindFS, although seeming to have higher relative overhead, would achieve far betterabsolute performance than the current one with FUSE.6.4 MacrobenchmarksThe benchmarks of the previous section access only a single file and have very smallresultset sizes, and thus their frequent accesses to the filesystem will have beencached; this benchmark accesses a sufficiently large number of files such that notall of them can be cached, to investigate the performance of FindFS as it wouldbe in a typical application. For this purpose, we used the Filebench[28] utilitywith the standard fileserver workload. This is a more intensive workload thanis typical for a desktop application, but because it contains a variety of file accessoperations such as reading, writing, creating, and deleting, it is useful for assessingoverall performance. The test filesystem is parameterised to use 10000 files, with an466.4. Macrobenchmarksaverage of 20 files per directory, 128KB per file, and a tree depth of 3.1. This resultsin a total data volume of 1240.757MB. The runtime parameters are 50 threads,16KB average write size, and 1MB read size, operating for 5 minutes over which thetotal throughput is measured. Table 6.4 summarises these results.Configuration Throughput (ops/s)Native (ext3) 191.57FindFS, no queries 184.85FindFS, 5 flat queries 160.92FindFS, 5 hier queries 152.72Table 6.4: Filebench Throughput6.4.1 FUSE OverheadAs with the microbenchmarks, it is useful to obtain a baseline of the expectedperformance of the native filesystem in the test environment and compare that withthe overhead that FUSE adds; as before, the latter is accomplished by running theworkload on the filesystem through FindFS with no active queries. We obtain 96.5%of the throughput of the native system, which likely indicates that the underlyingstorage device is becoming the bottleneck as most of the accessed data can no longerbe cached.6.4.2 Flat ResultsetIn order to determine the effect of active queries on overall performance, we create5 queries, each encompassing a non-disjoint subset of the files, and run the sameworkload after they have entered the WAIT state. The resultsets total to approxi-476.5. Discussionmately 84% of all the files in the filesystem, and follow the same distribution of sizes.They are then added to and removed from by the file operations of the workload.The resulting throughput is 87% of that achieved without any queries, or 13% less.This is unlikely to be a problem in a typical desktop application where file operationfrequencies are low.6.4.3 Hierarchical ResultsetWe expect that a typical usage will have a mix of resultset modes, but due tothe slower modification of a hierarchical resultset, the lowest throughput can beobserved if all active queries use them. This benchmark uses the identical queriesand workload access patterns as the previous one, with the exception that the queriesare configured to use a hierarchical resultset. The obtained throughput, 83% of thatachieved without any queries, is only 5% less than that with 5 flat queries, and so weexpect the typical overhead to be in the 83 to 85% range; this is not a large difference,and on the occasion that the overhead of FindFS does cause a significant impacton performance, such as an unusually intense sequence of filesystem operations,queries on the affected subtree(s) can be easily made inactive (STOP state) and latercommanded into FIND to “batch-update” the resultsets afterward.6.5 DiscussionAlthough the microbenchmarks show that operations which cause resultset modi-fications under FindFS become 5x to 48x slower due to the additional filesystemoperations, with a more general workload their impact is reduced significantly to486.5. Discussionunder 15%. Opening and closing a file while reading very little data from it, orcreating and deleting files, which show the worst-case behaviour, are operationsseldom performed at such extreme frequencies on a typical desktop application oreven a file server. It is even rarer that these rapid changes would cause repetitiveaddition and removal from a resultset. In typical applications, overall throughput isconstrained more by storage access times — as the amount of data read or writtenincreases, the percentage of time spent opening and closing the file decreases; sinceFindFS transparently passes reads and writes through to the underlying filesystem,those operations which account for the majority of time in a typical workload arenot affected. This trend agrees with the FUSE benchmarks in [27].49Chapter 7ConclusionFindFS is a filesystem-based extension which allows adding tag-based views to atraditional hierarchical filesystem, making it easier to organise files and access themin ways that their metadata naturally suggest, without requiring significant changesto existing applications. The overhead added by monitoring filesystem operationsand modifying resultsets can be relatively small, and thus not have great impacton perceived performance in applications; even the filesystem-intensive use-casescan benefit from the flexibility and simplicity of FindFS’ approach to on-demandpersistent search queries with its fine-grained control over query processing.FindFS retains backwards-compatibility and provides the familiarity of selectingfiles using find and creating symbolic links to allow building alternate hierarchies,but also leverages extended attributes and dynamic resultset updates to give usersthe more powerful and convenient ways to group and find files which semanticfilesystems are known for.The source code of the FindFS prototype is available at http://www.cs.ubc.ca/labs/dsg/findfs/ . FindFS is usable on platforms with FUSE, a POSIX-compatibleAPI, and filesystems supporting extended attributes.50References[1] Gifford, D.K., Jouvelot, P., Sheldon, M.A., James W. O’Toole, J.: Semantic filesystems. In SOSP ’91, Proc. of the thirteenth ACM Symposium on OperatingSystems Principles, New York, NY, USA, ACM Press (1991) 16-25.[2] Seltzer, M., Murphy, N.: Hierarchical file systems are dead. In HotOS ’09, Proc.of the 12th conference on Hot topics in Operating Systems, Berkeley, CA, USA,USENIX Association (2009) 1-1.[3] Ngo, H.B., Silber-Chaussumier, F., Bac, C.: Enhancing Personal File Retrieval inSemantic File Systems with Tag-Based Context. In EGC 2008, Revue des NouvellesTechnologies de l’Information, RNTI E-11, Cepadues, Volume I, (2008), 73-78.[4] Amoson, J., Lundqvist, T.: A light-weight non-hierarchical file system navigationextension. In Proc. of the 7th International Workshop on Plan 9, Dublin, Ireland,Bell Labs (2012) 11-13.[5] Voit, K., Andrews, K., Slany, W.: TagTree: storing and re-finding files using tags.In Proc. of the 7th conference on Workgroup Human-Computer Interaction andUsability Engineering of the Austrian Computer Society: information Quality ine-Health, Graz, Austria, (2011) 25-26.51References[6] Bloehdorn, S., Gorlitz, O., Shenk, S., Volkel, M.: TagFS - Tag Semantics for Hierar-chical File Systems. In I-KNOW 2006, Proc. of the Sixth International Conferenceon Knowledge Management, Graz, Austria, (2006), 304-312.[7] Gopal, B., Manber, U.: Integrating content-based access mechanisms with hierar-chical file systems. In OSDI 1999, Proc. of the Third Symposium on OperatingSystems Design and Implementation, New Orleans, Louisiana, USA, USENIX As-sociation (1999) 265-278.[8] Ballesteros, F.J.: File indexing and searching for Plan 9. In Proc. of the 4thInternational Workshop on Plan 9, Athens, Georgia, (2009) 131-138.[9] Padioleau, Y., Ridoux, O.: A Logic File System. In USENIX Annual TechnicalConference, USENIX Association, 2003.[10] M.A. Olson.: The Design and Implementation of the Inversion File System. In Proc.of the Winter USENIX Technical Conference, San Diego, California, USA, USENIXAssociation (1993) 205-217.[11] Apple Spotlight Search -https://developer.apple.com/library/mac/documentation/Carbon/Conceptual/MetadataIntro/MetadataIntro.html, June 2015[12] Windows Desktop Search -http://www.microsoft.com/windows/products/winfamily/desktopsearch/default.mspx, June 2015[13] Saved Search File Format (Windows) -https://msdn.microsoft.com/en-us/library/windows/desktop/bb892885(v=vs.85).aspx, June 201552References[14] OS X Mavericks: Create or modify a Smart Folder -https://support.apple.com/kb/PH14047, July 2015[15] symlink - The Single UNIX(R) Specification, Version 2, The Open Group, 1997[16] link - The Single UNIX(R) Specification, Version 2, The Open Group, 1997[17] find - The Single UNIX(R) Specification, Version 2, The Open Group, 1997[18] mdfind(1) - OS X Man Pages -https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/mdfind.1.html, July 2015[19] xattr(7) - Linux Man Pages -http://man7.org/linux/man-pages/man5/attr.5.html, June 2015[20] File Metadata Query Expression Syntax -https://developer.apple.com/library/mac/documentation/Carbon/Conceptual/SpotlightQuery/Concepts/QueryFormat.html, July 2015[21] Ritchie, D.M., Thompson, K.: The UNIX Time-Sharing System. In Communica-tions of the ACM, Vol. 17 No.7, July (1974) 365-375.[22] Ousterhout, J.K., Welch, B.B.: Pseudo-devices: user-level extensions to the Spritefile system. In USENIX Association Summer Conference Proceedings, San Fran-cisco, California, USA, USENIX Association (1988) 37-49.[23] Pike, R., Ritchie, D.M.: The Styx Architecture for Distributed Systems. In BellLabs Technical Journal, Vol 4, No 2, April-June (1999), 146-152.[24] Xu, Z., Karlsson, M., Tang, C., Karamanolis, C.: Towards a Semantic-Aware FileStore. In HotOS ’03, Proc. of the 9th Workshop on Hot Topics in Operating Sys-tems, Berkeley, CA, USA, USENIX Association (2003) 31-31.53References[25] Dekeyser, S., Watson, R.: Metadata manipulation interface design. In Proc. of theFourteenth Australasian User Interface Conference, Melbourne, Australia, (2013)33-42.[26] FUSE: Filesystem in Userspace - http://fuse.sourceforge.net/, April 2014[27] Rajgarhia, A., Gehani, A.: Performance and Extension of User Space File Systems.In Proc. of the 2010 ACM Symposium on Applied Computing, New York, NY, USA,ACM (2010) 206-213.[28] Filebench File System Benchmark - http://filebench.sourceforge.net/, May201554

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0166678/manifest

Comment

Related Items