UBC Theses and Dissertations


Sharing and privacy using untrusted storage Ofir, Jacob 2000



Sharing and Privacy Using Untrusted Storage

by Jacob Ofir
B.Sc., York University, 1998

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science)

We accept this thesis as conforming to the required standard.

The University of British Columbia
August 2000
© Jacob Ofir, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
Vancouver, Canada

Abstract

Broadband connections to the Internet are enabling a new set of applications and services. Of interest is the impact of this additional bandwidth on current file system models. These models are being challenged as the Internet is enabling global file access, cross-domain sharing, and the use of Internet-based storage services. Various network file systems [3, 15, 8] offer ubiquitous file access, research systems [9] have offered solutions to cross-domain sharing, and cryptographic file systems [2, 5] have addressed concerns regarding the trust of system administrators and data security. The Internet model requires that all these ideas be integrated into a single system. This thesis describes a new file system called bFS that addresses the challenges of this new model by eliminating the assumption that servers (specifically, their administrators) are trusted.
Instead, agents are trusted to manage data, meta-data, authentication with storage providers, and the enforcement of access control. This enables global access and cross-domain sharing using untrusted storage servers.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Design
  2.1 Architectural Overview
  2.2 Trust Model
    2.2.1 Storage Providers
    2.2.2 Agents
    2.2.3 Users
  2.3 Meta Data
  2.4 Certificates
    2.4.1 Revocation
  2.5 Agents
    2.5.1 Protocol
    2.5.2 Sharing
    2.5.3 An Example
    2.5.4 Multiple Agents
  2.6 Encryption
    2.6.1 Naming
  2.7 Summary
3 Implementation
  3.1 Overview
  3.2 Client
  3.3 Agent
  3.4 Performance Optimizations
  3.5 Summary
4 Performance
  4.1 Overview
  4.2 Test Environment
  4.3 Micro-Benchmarks
  4.4 Andrew Benchmark
  4.5 Summary
5 Related Work
  5.1 Cryptographic File Systems
  5.2 Cross-Domain Sharing
6 Conclusions
  6.1 Future Work
Bibliography

List of Tables

2.1 bFS Certificate
2.2 File access operations
2.3 Cryptography operations
2.4 Sharing operations
4.1 bFS micro-benchmarks in milliseconds
4.2 Andrew benchmark: bFS vs. NFS
4.3 Andrew benchmark using 32k read/write sizes: bFS vs. NFS

List of Figures

2.1 Architectural Overview
2.2 Sharing using public read-only
3.1 bFS agent overview

Acknowledgements

Too many people ensured I completed this thesis. Their support ranged from encouragement to threats, from brain-storming to just laughs. You would not be reading this today without the contributions listed below - so blame them, not me! To all the clowns in the DSG, thanks for the fantastic environment.
In particular, Ross Carton, Douglas Santry, Alex Brodsky, Joon Suan Ong, Yvonne Coady, and Dima Brodsky have been great to bounce ideas and frustrations off of. There are many friends outside the DSG whose presence made my experience more enjoyable. Paul Kry, Joshua Richmond, and Derek DiFilippo have been, and continue to be, great friends. Thank you. Michael Feeley and Norm Hutchinson signed their names on the front page. But they did more than just that. They let me roam the expanse of my thesis, but pushed me in the right direction when I strayed too far. They asked me hard questions and forced me to think in new ways. And they also made sure I got a good laugh once in a while. My parents, brother, sister, and Emeline have also been amazing. They let me talk their ears off about stuff they did not really understand. And Emeline somehow managed to deal with me while I was thinking, coding, and writing. Wow! Lastly, the rest of the faculty, staff, and fellow students made the department feel like home. Oh, this thesis would not have been possible without financial contributions from NSERC and the BC Advanced Systems Institute.

Jacob Ofir

The University of British Columbia
August 2000

To my Family.

Chapter 1

Introduction

The traditional approach to building secure file systems has been to assume that servers are trusted, but that clients need not be. This assumption of server trust has been critically important. Only a trusted server can implement the access control and data protection policies and mechanisms necessary to prevent unauthorized access to user data. This assumption of server trust has also been a natural one. File systems are typically restricted to a single administrative domain, and thus server trust is provided by that domain's system administrator. This administrator controls the software that runs on the server as well as the users that can access it.
File access control is based on a user authentication system under the control of this administrator. Before a user can access a server's files, the administrator must have added that user to an authentication database used by the server. Furthermore, the transfer of file data over the network from server to client has typically not been viewed as a significant security hole, because administrators also controlled access to the network. This basic approach has been followed by all traditional file systems, even global file systems such as AFS. In AFS [8], files from remote domains can be named, but access to those files requires that a user authenticate themselves with the domain that stores the files. For instance, if a user in the MIT Computer Science Department wishes to share with a colleague outside her department, she must first ask her system administrator to add that user to the local user database. The Internet is changing this model of file system access in two ways. First, users are increasingly accessing files from a different security domain than the file server that stores those files. Second, users are increasingly interested in being able to share their files with other users outside of the server's security domain; for example, a user may want to share files with colleagues, friends, or family members. As a consequence, Internet-based storage services have begun to appear in the past year or so. Internet-based storage services have several benefits. First, they facilitate global access, because unlike many secure file systems, they can be accessed from anywhere on the Internet. Second, they may provide a useful alternative to local disks for home computer users, thus freeing the typical user from fear that valuable data might be lost due to the loss of the local disk that stores it. Finally, automatic software maintenance and perceived unlimited storage capacity are also possible.
This new model of file system access challenges the traditional file system model because the network is not trusted, the storage provider might not be trusted, and sharing is likely to occur between users from different administrative domains. The first challenge is clearly unavoidable, as the network is the Internet, but the latter two require further exploration. In a model in which users buy storage, they may be unwilling to trust storage providers to prevent the accidental or intentional disclosure of their valuable data. The criteria to determine what is valuable will vary from user to user, but the underlying privacy theme cannot be challenged, as users must retain that right. This untrusted storage assumption is further reinforced when examining corporate use, as it is unlikely that corporate users would trust their data to Acme File Storage, Inc. Internet-based storage could enable simplified sharing. Existing storage providers facilitate sharing by providing each user with a public read-only folder. Other users can access all files in a user's public folder by simply knowing the user name. These sharing semantics are insufficient and will eventually have to be replaced with semantics that allow a user to place restrictions on who can access their files. The traditional approach to placing such access restrictions is to use access control lists (ACLs). However, server-managed ACLs that support sharing between administrative domains lead to the following problem. Server-managed global ACLs require a server to authenticate all possible users. Maintaining such a database will prove tedious, if not impossible, and even if possible, would force that server's security policy onto users outside of its administrative domain. As an alternative, the server could delegate the authentication of foreign users to an outside server, in which case the server would have to trust the authenticating server.
The argument against this approach is that it requires a global authentication scheme and key management policy. Such an infrastructure would require a global name for every user in the world, and even if it were possible, the idea itself is scary due to its Orwellian nature. This thesis describes a new file system, called bFS, that leaps into this new model by eliminating the requirement that storage providers be trusted. Instead, bFS agents are trusted to manage data, meta-data, authentication with storage providers, and the enforcement of access control. There are two core arguments for this approach. The first argument is an end-to-end [13] argument for cryptographic security. All data sent between server and client must be encrypted to provide a secure communication channel. If data must also be stored encrypted on disk, as servers are not trusted, it is argued that clients should encrypt data, without the storage provider's knowledge, and send this ciphertext over unsecured channels. The second argument is for client control over user authentication. In bFS, users can control access to their files without involving their storage provider (neither the server nor the administrator). In effect, each user maintains their own private authentication database, thus avoiding the global authentication required if servers were to implement access control.

Chapter 2

Design

Two core ideas motivate the design of bFS. The first is the use of a remote file system, referred to as the storage provider. The storage provider is not trusted to prevent the accidental or malicious disclosure of user data. The second core idea is a decentralized sharing mechanism designed to span multiple administrative domains. Together, these ideas allow a user to place their trust in a software agent that manages encrypted data on the storage provider and enforces access control.
An additional requirement is to ensure that the use of bFS is transparent to storage providers, and hence easily deployable. To fulfill this requirement, the design has to support existing network file system protocols and require no involvement from the storage providers. This means that it should not be necessary for a storage provider to install any additional software or support any additional protocols in order to use bFS. Finally, if storage providers offer public read-only access, as most current systems do, this feature can be used to optimize access to files. The performance optimization occurs when data is read directly from the storage provider while fetching the decryption information from the agent.

This chapter begins with an architectural overview of the system. This is followed by a detailed account of the trust relationships amongst the different entities and components that comprise bFS. The discussion then moves to meta-data and a description of the bFS certificate and its management and revocation issues. After establishing sufficient background information, technical issues relating to agents and encryption techniques are explored.

2.1 Architectural Overview

A bFS file system is composed of a storage provider, agents, and clients. The storage provider stores data. The data includes the encrypted form of user data and the meta-data necessary to manage it. Clients access their data through agents that manage both data and the additional meta-data. The bFS certificate is a file that is used to authenticate agents and clients, and to locate a user's storage provider and agents. Storage providers are accessed using the protocol they require. For instance, some storage providers require CIFS, while others require some other protocol. It is the agent's responsibility to support the required protocol. Clients access agents using the bFS protocol.
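The public read-only optimization mentioned above can be sketched as follows. This is an illustrative sketch only: the interfaces (`get_remote_info`, `read_public`), the OID-keyed key cache, and the SHA-256-based toy stream cipher (a stand-in for whatever symmetric cipher a real deployment would use) are all assumptions, not bFS's actual implementation.

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    """Toy counter-mode keystream derived from a file key.
    A stand-in for a real symmetric cipher; illustrative only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; encryption and decryption are the same op
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

class Client:
    """Reads ciphertext straight from the provider's public read-only
    interface and decrypts locally with a key fetched from the agent,
    shifting bulk-data load off the agent."""
    def __init__(self, agent, provider):
        self.agent = agent
        self.provider = provider
        self._keys = {}  # oid -> (remote_name, file_key), cached

    def read(self, oid: int) -> bytes:
        if oid not in self._keys:
            # small control request to the agent for name and key
            self._keys[oid] = self.agent.get_remote_info(oid)
        remote_name, key = self._keys[oid]
        ciphertext = self.provider.read_public(remote_name)  # no agent load
        return xor_cipher(key, ciphertext)
```

The agent serves only small name/key lookups while bulk ciphertext moves over the provider's public interface, which is exactly the load shift the optimization is after.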
Both agents and users have their own private-public key pairs, which are used for authentication using well-known authentication techniques [16]. The public keys for a user and all their agents are stored within that user's bFS certificate. Figure 2.1 contains a high-level overview of the components. It illustrates two users, Bob and Alice, their agents, storage providers, and certificates. Bob authenticates with his agent and uses his agent to access his bFS file system. He also uses the public read-only access to his storage provider to increase performance. His agent uses the credentials Bob provides during his logon with the agent to authenticate with the storage provider. Alice's storage provider does not support public read-only access, so she always accesses her file system through the agent. Bob and Alice exchanged certificates in the past, and Alice is accessing Bob's files through his agent.

Figure 2.1: Architectural Overview

2.2 Trust Model

The trust model describes the different entities in a system. This includes their roles, relationships, functional assumptions, and any other aspect of their presence within the system. There are three entities that comprise bFS: storage providers, agents, and users. Storage providers provide remote storage facilities to users. Users access this storage through agents. Agents maintain all data and meta-data, and authenticate users, other agents, and, when possible, storage providers.

2.2.1 Storage Providers

A storage provider is an entity providing remote storage facilities to a user. The storage provider defines the communication protocol used to access the storage.
These protocols, such as NFS [3, 17, 18], CIFS [4, 15], and the recently appearing network storage provider protocols (X:drive [19], netdrive [11], and driveway [6]), have an impact on the trust model. For instance, NFS using UID authentication is not secure. Proposed extensions to the NFS protocol [7] address this issue but are not widely deployed. On the other hand, newer versions of CIFS support secure authentication. Secure authentication with the remote storage does not guarantee privacy, as the protocol may specify that after the authentication stage all data is sent in the clear. Since bFS uses the specified protocol and cannot require changes to the storage provider, it suffers from the weaknesses of the underlying protocol. For instance, if the underlying protocol provides very weak authentication, then the overall authentication strength of bFS is weak, regardless of the strength of other authentication stages. Authentication of storage providers can occur only if it is supported by the storage provider protocol, and most existing protocols do not support it. This lack of server authentication is not surprising, because existing protocols rely on the assumption that servers are trusted. The lack of server authentication does not pose a challenge to bFS, as all data stored on a server is encrypted, and hence a rogue server must still attempt to crack encryption keys. However, if an agent communicates with a rogue server, all data committed to that server is permanently lost, as it is not committed to the real server. A non-technical way to deal with a rogue server is to assume that the rogue server will not be able to produce the file-system view that the user expects, giving the user the opportunity to detect the rogue server. A more serious attack is the man-in-the-middle attack. In this situation a rogue server can let the real server present the expected file-system view. It can later modify packets as they travel between the two parties.
If the remote storage protocol includes message signatures, then modified packets will be detected; otherwise, bFS will, with very high probability, detect these modifications when decrypting the modified data. Authentication of agents can occur only if it is supported by the storage provider protocol. The agent obtains the credentials to authenticate as the user from the user, when the user establishes a session with an agent. Users access their data using a software client that communicates with agents and storage providers. In most cases users will access all of their data through an agent. However, public read-only access allows clients to read data directly from the storage provider and decrypt it using a key retrieved from the agent. The use of public read-only access reduces load on agents at the cost of load on the clients.

2.2.2 Agents

Each user owns at least one bFS agent. Agents are trusted software entities that provide a user with access to their bFS file system. Recall that agents can be easily authenticated using their key pair. Once authenticated, an agent is trusted to hold a user's storage provider credentials. The credentials vary depending on the protocol used to access the storage provider. For instance, SMB requires a host name, user name, and password. For an agent to perform its access control role properly, it must also authenticate clients. Since the agent is a trusted software entity, it must run within an environment that is trusted by the agent's owning user. Using trusted software agents allows bFS to remove trust from storage providers. These agents can be deployed in various configurations. For instance, some users may decide to have a single agent residing on their home computer. They can use this agent when at home or away from home, but when communicating with the agent over an untrusted network, a secure channel must be established.
This use of a secure channel leads to double encryption, since the agent will decrypt remote data and then have to encrypt it for transmission over the secure channel. To avoid this unnecessary step, the client has two options. The first option is for the client to read data directly from the storage provider and obtain decryption keys from the agent. The second option is for the client to read the raw encrypted data through the agent and obtain the decryption keys from it. The difference between the two options is that the first requires that the client support both the bFS protocol and the storage provider protocol, while the second requires client support only for the bFS protocol. There are two other options when writing data. The first is to use a secure channel and suffer from double encryption. The second is to perform encryption in the client and send the agent encrypted data blocks. The first approach is simple, but the second requires moving agent functionality, such as block alignment, into the client. The second approach is much slower than the first when the client is very lightweight and has very limited bandwidth to the agent.

2.2.3 Users

Users access a bFS file system the same way as other distributed file systems: through a trusted client layer. The client communicates with agents using the bFS protocol, discussed in Section 2.5.1. The decentralized nature of bFS requires an assumption that each user can manage their own sharing relationships. For instance, if Bob and Alice wish to share files, they must first exchange certificates. They do so using any method they feel is sufficiently secure for their purposes. Since bFS does not enforce any key management policy, different users can enforce their own policies. For example, some users may require that the public key specified in a bFS certificate be an X.509 certificate signed by VeriSign, while others may require PGP certificates. Notice that a sharing relationship is a mutual relationship.
If Alice were to pass Bob's certificate to Eve, and Bob's agent did not possess Eve's certificate, then Eve's agent would be unable to authenticate with Bob's agent. This property is a key feature of the decentralized key management system, as it puts users in control of their sharing relationships by empowering them to determine which certificates should be trusted.

2.3 Meta Data

Agents maintain meta-data in special files stored at the storage provider. Meta-data in bFS is divided into five categories: file-system key, user database, directory map, directory meta-data, and file meta-data. The special meta-data files are given names that contain the illegal character null. These names are transformed into remote names using the same naming technique that applies to all other files. These naming techniques are discussed in Section 2.6.1, and ensure that these illegal names become legal names on the storage provider. Core to both meta-data and the client access protocol is the object identifier, or OID. The OID uniquely identifies every object in the file system. If all storage provider protocols provided a mechanism to access their internal object identifiers, and to gain access to an object using an object identifier, then bFS would not have to maintain its own set of OIDs. However, even NFS3 [3] does not provide a mechanism to use a fileid retrieved from a GETATTR to access the actual object. Since bFS has to manage its own set of unique object identifiers, and does not inherit a mechanism to deterministically map these OIDs to files on the storage provider, bFS must provide its own mapping from OID to an object's meta-data. The file-system key is a symmetric key used to encrypt the user database and the directory map. It is stored in a file residing in the file-system root. The file is encrypted using the user's public key and stored in a file named {null 'fskey'}. The user database holds certificates for all sharing partners.
It is stored in a file residing in the file-system root. The file is encrypted using the file-system key and stored in a file named {null 'udb'}. The directory map contains a mapping, for directories, from OIDs to their remote name and parent's OID. It also contains the OID that will be assigned to the next object, and a symmetric key used to encrypt directory meta-data. The OID mapping ensures that, given a directory's OID, an agent can rapidly construct the fully-qualified path of the file containing the directory's meta-data. The agent constructs this path recursively in reverse order by looking up an OID, finding its remote name and its parent's OID, and then doing the same for the parent OID until it reaches the root OID. The root OID is identified in the directory map as it is the only entry whose parent OID is identical to the entry's OID. The bFS protocol, discussed in Section 2.5.1, specifies that a bFS file handle is composed of the object's OID and its parent's OID. This technique allows any file handle to be mapped to its meta-data. The directory map is stored in a file residing in the file-system root. The file is encrypted using the file-system key and stored in a file named {null 'dmap'}. Both file and directory meta-data contain the object's key, real and remote names, OID, and an ACL. Both real and remote names are stored because the naming technique may not be reversible. In addition to the common meta-data, directory meta-data contains a map of OID to file meta-data for files, or directory names for directories. File meta-data contains the file's real size. Each directory stores its meta-data in a file residing within that directory on the remote storage. The file is encrypted using the key specified in the directory map and stored in a file named {null 'dirmd'}.
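The recursive path construction just described can be sketched as follows. The in-memory layout (a dictionary mapping OID to a (remote name, parent OID) pair) is an assumption made for illustration, not bFS's on-disk format.

```python
def directory_path(dmap: dict, oid: int) -> str:
    """Walk the directory map from a directory's OID up to the root,
    collecting remote names, then join them in reverse to form the
    fully-qualified remote path. The root is the unique entry whose
    parent OID equals its own OID."""
    parts = []
    while True:
        remote_name, parent_oid = dmap[oid]
        if parent_oid == oid:        # reached the root entry
            break
        parts.append(remote_name)
        oid = parent_oid
    return "/" + "/".join(reversed(parts))
```

Because the whole directory map lives in one encrypted file at the root, this walk needs no further round trips to the storage provider.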
2.4 Certificates

A certificate is a file that contains information used to authenticate a user and all their agents, and to access the user's storage provider. Certificates are not centrally managed, and can therefore be created by anyone and managed using any policy. A user can even partition their storage by issuing certificates for different directories on their storage provider. For instance, Bob has Acme File Storage, Inc. as his storage provider. His home directory is /users/bob, but he creates two certificates, one for /users/bob/personal and one for /users/bob/business. In effect, Bob has established two separate accounts. bFS certificates contain all the information required for sharing, authentication, and bootstrapping, and their content is described in Table 2.1. The certificate contains some form of the user's public key and is signed using the corresponding private key. The public key can take many forms. It may be an X.509 [16] certificate, a PGP [21] certificate, or a raw public key. In addition to the public key, the bFS certificate contains the following information:

• Storage Protocol and Location: This is used by agents to communicate with the storage provider, and by sharing parties for public read-only access. For instance, Bob has a storage account with Acme File Storage, Inc. The storage provider uses SMB, their server is located at acmefilestore.com, and Bob's root on the storage is /users/bob.

• Agent Locations: A list of the location and hash of the public key (using the self-certifying technique from SFS [9]) for the user's agents. More aggressive caching can take place if the user has only a single agent.

• bFS Bootstrap Information: This information is used by agents and public read-only clients. This includes configuration information such as the encryption and naming techniques.

Table 2.1: bFS Certificate

Element — Description
Preamble — BFSCERT
Version — bFS certificate version
Username length — 8-bit length
Username — The user's alias
Storage provider type — NFS, SMB, etc.
Storage provider details length — Size of following segment
Storage provider details — Protocol-specific details; for instance, SMB requires host name and user name
Storage root length — Size of following segment
Storage root — For instance, /mark
bFS block size — 16-bit block size
Encryption type — What encryption technique is used: type, key size, and initialization vector
Naming type — What naming technique is used
Public access — Is public read-only access available
Certificate type — PGP, X.509, etc.
Certificate length — Size of following segment
Certificate — Certificate that contains the user's public key
Number of agents — Number of agents
For each agent:
  Agent location length — Size of following segment
  Agent location — The host where the agent is running
  Agent port — The port where the agent is running
  SFS hash — Hash of agent's public key, hostname, and port

2.4.1 Revocation

Certificate revocation is a major challenge in a centralized system. Such systems have certificate authorities (CAs) that can be queried to determine whether a certificate has been revoked. Querying certificate authorities when presented with a certificate causes a performance bottleneck. Modern CAs use certificate revocation lists (CRLs) to propagate new revocations. However, strict reliance on CRLs leads to a window, between a certificate's revocation and the acquisition of a new CRL, in which revoked certificates are mistaken as valid.
Certificate revocation in bFS requires users to distribute their new certificate to their sharing partners, and for these partners to update their local databases.

2.5 Agents

The bFS agent controls communication with the storage provider, performs all authentication, maintains meta-data, and controls sharing. Earlier sections discussed the relationship between agents and storage providers, how standard authentication techniques are used in bFS, and the additional meta-data maintained in bFS. What follows is a discussion of the protocol used to access agents, how sharing occurs in bFS, and issues related to the use of multiple agents per user.

2.5.1 Protocol

The bFS protocol is very similar to NFS. It has operations on names (such as resolving a name into a file handle, or removing the named object), operations on file handles (such as read and write), and administrative operations (such as adding a certificate or granting access). Each object in bFS is assigned a unique, persistent object identifier (OID), which is part of that object's meta-data. The root of the file system contains the directory map, which is used to rapidly locate any directory's meta-data. Directories use OIDs to locate the meta-data for objects within the given directory, and OIDs are used within bFS file handles. A bFS file handle is composed of a user ID (UID), the parent's OID, and the object's OID. The UID is a unique number assigned to each user in the user database, with the UID of the main user being zero. Since the root does not have a parent, the OID used for that field is the root's OID. When an operation happens on a file handle, the directory map can be examined to determine either the object's remote name or, if the object is a file, the parent's remote name. If the object is a file, the parent's remote name is used to retrieve the directory's meta-data and access the file's meta-data. Communication between an agent and the user may take several forms.
In some circumstances they may communicate over an untrusted network (i.e., the Internet), a secure network, or even share the same address space. Part of the user-agent handshaking phase is to negotiate protocol parameters. These parameters determine how data will be transferred between the two. For example, if using the same address space, there may be no reason to establish a secure communication channel between the two co-located modules. When using a trusted network we would expect the agent and user to use plaintext after authenticating one another. However, when using an untrusted network the two must communicate over a secure channel, raising the double encryption problem discussed in Section 2.2.2. Tables 2.2, 2.3, and 2.4 describe the operations exposed by agents and available using the bFS protocol. The tables list operations on files and directories, cryptographic operations, and sharing operations, respectively.

Table 2.2: File access operations

LOGON — Tunnel authentication information, receive root file handle
GUEST — Logon as a guest to a file system
LOOKUP — Resolve parent file handle and object name into a file handle
READ — Read from the specified file
READRAW — Read encrypted data from the specified file
WRITE — Write to the specified file
WRITERAW — Write already encrypted data to the specified file
CREATE — Create a file in the directory specified by a file handle
MKDIR — Create a directory in the directory specified by a file handle
REMOVE — Remove a named file from a directory
REMDIR — Remove a named directory from a directory
RENAME — Rename an object within one directory, to an object within another directory
GETATTR — Retrieve attributes for the specified file handle
SETSIZE — Set the size of a file specified by a file handle
READDIR — Retrieve the object listing of a directory
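The file handle described above can be sketched as a small value type. The field names, the `root_handle` constructor, and the LOOKUP-style helper are assumptions for illustration; the thesis specifies only that a handle carries the UID, the parent's OID, and the object's OID.

```python
from typing import NamedTuple

class FileHandle(NamedTuple):
    uid: int         # index into the user database; the main user is 0
    parent_oid: int  # the root uses its own OID here, having no parent
    oid: int

def root_handle(uid: int, root_oid: int) -> FileHandle:
    # Root's parent field is filled with its own OID
    return FileHandle(uid, root_oid, root_oid)

def lookup(parent: FileHandle, child_oid: int) -> FileHandle:
    """LOOKUP-style resolution: a child's handle carries its parent's
    OID, so the agent can reach the child's meta-data through the
    parent directory's meta-data file."""
    return FileHandle(parent.uid, parent.oid, child_oid)
```

Carrying the parent OID in every handle is what lets the agent map any handle back to meta-data via the directory map, without a server-side table of open handles.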
REKEY           Force a re-key of a file
FSREKEY         Force generation of a new file-system key
GETREMOTEINFO   Retrieve remote name and key

Table 2.3: Cryptography operations

GETUSERS        List users
GETUSER         Retrieve a user's certificate given an alias
ADDUSER         Add a new user (with their bFS certificate)
REMOVEUSER      Remove a bFS user
UPDATEUSER      Update a user's bFS certificate
SETRIGHTS       Set rights for a user on a file handle
GETUSERRIGHTS   Get rights for a user for a file handle
GETALLRIGHTS    Get rights for all users for a file handle
RESETACL        Set ACL for the specified file handle to be same as parent's

Table 2.4: Sharing operations

Users access their data using a software client that communicates with agents and storage providers. Modern operating systems support various techniques to introduce new file systems. In Unix, new file systems can be introduced within the kernel or at user level. In Windows, new file systems can be introduced using the Installable File Systems Kit [10]. Files can also be accessed through a web interface. Regardless of which technique is used, the client that interfaces with the bFS agent must communicate using the bFS protocol.

2.5.2  Sharing

Existing network file systems accomplish sharing by using access control lists (ACLs) or capabilities, based on the assumption that all users are known and authenticated to the file server. Whenever sharing has to cross an administrative domain, different methods, such as FTP or e-mail, are used. The disadvantages of such methods are that sharing is obtrusive, does not fit the typical file sharing model, and write-sharing is complicated by the fact that files must be FTPed or e-mailed back to the owner. Furthermore, FTP and e-mail are usually not used in a secure manner. The sharing semantics of bFS are taken from Multics [14]. New ACL entries inherit the ACL of their parent directory when they are created.
Subsequent changes to the parent's ACL do not affect any objects within the directory. There exists a user database in bFS where the certificates of all sharing parties are stored. This database assigns a unique user ID to each certificate, and when a certificate is loaded into the database it must be accompanied by an alias assigned by the user. The access control list contains a user ID and a privilege for each user granted access to an object.

When a user's access to an object is revoked the object's contents may have to be re-encrypted. In some scenarios clients may retrieve an object's remote name and key from an agent, and then read the data using public read-only access. bFS maintains information in the ACL about which users have ever had in their possession the key to a file. If in the future a user's access is revoked and they never acquired the key, the system need take no action. Since the user never had the key they must go through the agent to obtain the cleartext data, but the agent will deny them access. However, if the user's access to a file is revoked and they acquired the file key at some point in the past, then the file is re-encrypted with a new key. This means that keys previously handed out to other users are now also invalid. These keys, previously retrieved and cached by clients, will cause, with very high probability, decryption errors when used to read the padding information (discussed in Section 2.6), indicating to the client that the key has changed.

2.5.3  An Example

The following describes a way for users to share files. When Bob and Alice wish to share files, they must first exchange bFS certificates. Once exchanged, the certificates are loaded into their corresponding bFS file systems via their agents. Bob can now grant Alice various levels of access to different files. The mechanism by which this happens depends on the client Bob is using to access bFS.
For instance, some clients may have a graphical tool to load certificates and grant access, while others may only provide a command line interface. The client will use various bFS protocol commands to accomplish the necessary task.

When Alice wishes to read one of Bob's files, her agent (or client) detects an attempt to use a shared object. It determines that the object is owned by Bob, retrieves Bob's certificate, and initiates communication with one of Bob's agents. Once an authenticated session has been established, Alice performs all operations on Bob's files through Bob's agent. Bob's agent is aware that the operations are being performed by Alice and can therefore enforce proper access control.

Alice can improve performance on some operations by taking advantage of public read-only access to files. She does so by requesting the key and remote name for the required file. Bob's agent ensures that access should be granted and responds with the key and remote name. Alice now accesses the file using public read-only access. After fetching the required data she uses the key to decrypt it. This is illustrated in Figure 2.2.

Figure 2.2: Sharing using public read-only

2.5.4  Multiple Agents

A user may have several agents, each of which may be designated public or private. Public agents are published in the user's certificate and are hence accessible for sharing. Private agents are used to ensure that sharing does not compromise the user's performance. The local agent is the agent that the user is currently using; all others are remote. Since only a single agent is used at the user-storage authentication stage, all agents are required to broadcast, in a secure manner, a user's credentials to the user's remaining public agents.
Once an agent receives a user's credentials it can proceed to establish secure sessions with the user's remaining public agents. Agents could be rendered useless if a user changes his password with the storage provider but does not re-authenticate through an agent; however, users will eventually authenticate with an agent to get at their data. To disable a public agent, the user must generate a new certificate with that agent removed, and change credentials (password) on the remote storage. To disable a private agent, the user need only change credentials.

2.6  Encryption

Recall that one of the bFS design requirements is to optimize performance in public read-only environments. Since such an environment allows access to all files for the specified user, bFS must ensure that a given key is only used for objects with the same permissions. To avoid any book-keeping associated with maintaining these sub-groups of file system objects, bFS uses a different key for every object.

Files are encrypted using symmetric cryptography since asymmetric cryptography is too expensive. Symmetric ciphers [16] come in two primary flavors: stream and block. Stream ciphers process data one bit at a time, and the ciphertext of a given bit depends on all the preceding bits. This dependency shows itself in the decryption process as well, and is an unacceptable performance barrier as bFS requires uniform-access-time random access to files. The second family of ciphers, block ciphers, processes data one block (typically 64 bits) at a time. Block ciphers can operate in different modes. In ECB (Electronic Codebook) mode, the ciphertext of each block is completely independent of any other block. In CBC (Cipher Block Chaining) mode, the plaintext of block n is xor'ed with the ciphertext of block n-1 before it is encrypted. To decrypt block n, it is decrypted in the normal manner, and then the result is xor'ed with the ciphertext of block n-1.
Notice that decrypting block n does not require block n-1 to be decrypted, a property required for random access reads. However, a change to the plaintext of block n will propagate changes to the ciphertext of all the remaining blocks, which is an unacceptable performance penalty for writes. The remaining block cipher modes combine the flexibility of stream ciphers with the strength of block ciphers.

At first, it appears as if bFS has to use ECB. This is not satisfactory as ECB has some cryptographic weaknesses. These weaknesses arise because two identical plaintext blocks have identical ciphertext. The duplication offers an attacker a starting point in attempting to uncover the plaintext. The encryption technique in CFS [2] uses a combination of ECB and OFB (Output Feedback). But since the CFS author states some concerns with his approach, an alternative that balances ECB and CBC is used.

bFS uses CBC on bFS blocks. Each bFS block contains a fixed number of ciphertext blocks, and hence a write anywhere in a file will only affect the bFS block it resides within and not the remainder of the file. This use of CBC on bFS blocks offers very strong security, and like all the bFS components, the cryptographic module could easily be told to use different block ciphers with different key lengths. With slightly more effort, it could be converted to use a completely different encryption technique, such as the one used in CFS. To help avoid the duplicate block syndrome of ECB on the first CBC block in a bFS block, the cipher uses an initialization vector. This vector is xor'ed with the first block before it is encrypted — it acts as the ciphertext of block -1. Like other bFS modules, the initialization vector can easily be customized. The default initialization vector of block n is the 64-bit block number.
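A minimal sketch of this per-bFS-block CBC scheme, using the standard JCE Blowfish cipher (a 64-bit block cipher, matching the 64-bit block-number IV described above). The class and method names, key handling, and 4KB block size are illustrative assumptions, not the bFS implementation:

```java
import java.nio.ByteBuffer;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch: encrypt each fixed-size bFS block independently with CBC, using
// the 64-bit block number as the initialization vector, so any block can
// be read or rewritten without touching its neighbours.
class BlockCrypto {
    static final int BFS_BLOCK = 4096; // 4KB bFS block, a multiple of the 8-byte cipher block

    private static byte[] crypt(int mode, byte[] key, long blockNo, byte[] block) {
        try {
            Cipher c = Cipher.getInstance("Blowfish/CBC/NoPadding");
            byte[] iv = ByteBuffer.allocate(8).putLong(blockNo).array(); // IV = block number
            c.init(mode, new SecretKeySpec(key, "Blowfish"), new IvParameterSpec(iv));
            return c.doFinal(block);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static byte[] encryptBlock(byte[] key, long blockNo, byte[] plain) {
        return crypt(Cipher.ENCRYPT_MODE, key, blockNo, plain);
    }

    static byte[] decryptBlock(byte[] key, long blockNo, byte[] cipherText) {
        return crypt(Cipher.DECRYPT_MODE, key, blockNo, cipherText);
    }
}
```

Because each bFS block carries its own IV (its block number), a client can decrypt or rewrite block n without touching any neighbouring block, while identical plaintext stored in two different blocks still yields different ciphertext.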
Block ciphers require padding if the length of the plaintext is not guaranteed to be a multiple of the cipher block size. Therefore, both bFS and traditional file systems with built-in encrypting capabilities, such as NTFS 5.0 or CFS, face a problem when a file's size is not a multiple of the cipher block size. To solve this problem, bFS uses the same padding technique as the Unix utility bdes [1]. The very last byte in a file, once decrypted, specifies how many bytes should be ignored. bFS ensures that the encrypted file size is always a multiple of the cipher block size, hence requiring null padding. This technique allows public read-only clients to determine the actual file size by first reading the last bFS block.

Existing file systems optimize disk usage of sparse files by not allocating null blocks. Doing so in bFS would mean that public read-only access to files would not only require the current key, but also the currently valid non-null ranges in the file. This could be avoided by assuming that a bFS block that is composed of only null bytes is, in fact, a null plaintext block. Both approaches are unacceptable. In the first approach bFS would have to maintain a large amount of meta-data. The second approach is based on the assumption that the symmetric algorithm will not produce a block of nulls as ciphertext. This assumption is flawed, as a block cipher maps one 64-bit quantity onto another. This relationship must be a function that is 1-to-1 and onto, which means that some plaintext, using some key, must encrypt into eight null bytes. Furthermore, by using the storage provider's sparse file feature we are revealing information about the structure of the encrypted file. For simplicity and symmetry bFS does not optimize storage of sparse files.
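One plausible reading of this bdes-style padding can be sketched as follows. The thesis does not spell out the exact byte conventions, so the helper names and the decision to count the length byte itself among the ignored bytes are assumptions:

```java
import java.util.Arrays;

// Sketch of bdes-style null padding: the plaintext is extended with null
// bytes to a multiple of the cipher block size, and the final byte records
// how many trailing bytes (padding nulls plus the count byte) to ignore.
class FilePadding {
    static final int CIPHER_BLOCK = 8; // 64-bit block cipher

    static byte[] pad(byte[] data) {
        // Always add at least one byte (the count), then round up to a full block.
        int padded = ((data.length + 1 + CIPHER_BLOCK - 1) / CIPHER_BLOCK) * CIPHER_BLOCK;
        byte[] out = Arrays.copyOf(data, padded);          // trailing bytes are null
        out[padded - 1] = (byte) (padded - data.length);   // bytes to ignore
        return out;
    }

    // A public read-only client can recover the true size from the padded
    // length and the decrypted last byte alone (i.e., from the last block).
    static int actualSize(int paddedLength, byte lastByte) {
        return paddedLength - (lastByte & 0xff);
    }
}
```

With this convention the padded length is always a multiple of the cipher block size, and a public read-only client can recover the true file size from the encrypted length plus the decrypted final block alone.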
2.6.1  Naming

In order to provide an acceptable level of security, bFS encrypts file names as well as file contents. bFS currently supports four different transformations which map an object's real name to the name used on the remote storage, and can easily support more. The first is simply the real name encrypted with the directory's key, using the file identifier as the initialization vector. This results in relatively long file names, as the encrypted binary data must be normalized to a set of legal characters for the remote name. The second transformation is the hash of the object's real name and the directory's key. This results in fixed-length names. The third is a function of the OID, and the fourth is a randomly generated name, unique within the directory. The advantage of the last two approaches is that a very compact namespace can be created, which could offer a longer real path length than supported by the remote storage. To support all these transformations, bFS stores both the object's real and remote names in its meta-data.

2.7  Summary

The core ideas behind the design of bFS allow users to store their data with various storage providers using various network file-system protocols. Users no longer need to trust their storage providers to prevent the accidental or malicious disclosure of their data. Instead, users trust a software agent to control communication with the storage provider, perform all authentication, maintain meta-data, and control sharing. The agent can support different network file-system protocols and different encryption and naming techniques, and can be deployed in multiple locations. Sharing relies on a user's ability to establish relationships with sharing partners, and to exchange certificates in any manner sufficiently secure for one's purposes. This removes any centralized sharing authority and does not enforce any key management policies.
Furthermore, sharing can span multiple administrative domains as system administrators are never involved in the sharing process.

Chapter 3

Implementation

The bFS prototype implementation consists of two entities: a client and an agent. The client provides access to bFS by integrating with the local operating system and communicating with agents. The agent communicates with the storage providers, authenticates users, maintains meta-data, and enforces access control. This chapter discusses these two entities in further detail.

3.1  Overview

Figure 3.1 illustrates the relationship between clients, agents, and storage providers. Clients provide access to a bFS file system by communicating with either remote or co-located agents. These agents communicate with the user's remaining agents to ensure that all of the user's agents have the user's credentials for access to the remote storage. These credentials are used by the agents to gain access to the remote storage. For instance, SMB agents require the username, password, and domain (host) to gain access to the remote storage. The agent also communicates with foreign agents to facilitate sharing. All I/O to the remote storage is done through the remote storage facade. This facade provides a consistent interface within the agent to storage providers using various protocols. All cryptographic routines are encapsulated in a cryptographic library that conforms to the Java Cryptography Extension specification.

Figure 3.1: bFS agent overview

The following sections discuss the implementation of the client and agent, and what optimizations were added to increase performance.

3.2  Client

Access to bFS is provided by a user-level NFS server running on FreeBSD. The server acts as a gateway between the NFS protocol and the bFS protocol. It was developed by implementing the server stubs generated by the RPC compiler.
The server communicates with the kernel's NFS client using Unix domain sockets, and with agents using TCP/IP over a trusted network. The user-level NFS server, on startup, mounts /bfs. In the mounted subtree exist two subdirectories: home, where the user's files are located; and friends, where sharing parties are accessible. The /bfs/friends directory contains a directory entry for each entry in the owner's user database. For instance, if a user has Bob, Alice and Mike as sharing partners, /bfs/friends/Bob, /bfs/friends/Alice, and /bfs/friends/Mike would be present. When the user accesses one of these directories, the NFS server retrieves the user's certificate using the GETUSER command. It uses the certificate to locate one of the sharing party's public agents, and communicates with that agent when accessing files. When the user accesses /bfs/home, the NFS server communicates with the user's agent.

3.3  Agent

The agent is implemented as a Windows Service written in Java. The Windows platform was chosen because it provides a very simple and secure mechanism to authenticate users and acquire their security tokens. A single process can obtain security tokens for many users and switch between these identities as required. This allowed the agent to run as a system service (similar to a Unix daemon), authenticate users to a domain, and impersonate them on subsequent access to the remote storage. Java was chosen for rapid development purposes.

Two major functional components exist in the agent: the remote storage facade and the agent core. The remote storage facade is an interface that must be exposed for any network file-system protocol that is to be supported. The current implementation provides an SMB facade.
This allows the agent to authenticate against SMB domain controllers, and use SMB file servers as the storage provider. Since newer versions of SMB support secure authentication with a domain controller, the agent can securely authenticate users. The facade is the only component requiring change if a different network file-system protocol were to be used. Due to the nature of the facade, stacking multiple facades achieves richer functionality. In the implementation the SMB facade sits below a caching facade. The caching facade uses the SMB facade when it cannot fulfill a request. This separation enables the use of different caching strategies (for instance, if support for multiple agents was included) without affecting the backend facade.

The agent core communicates with a facade to perform all I/O on the storage provider. It manages meta-data, determines what needs to be read or written, manages keys, handles authentication, and understands the bFS protocol. Its functionality was discussed in Section 2.5. The current implementation supports a single agent. In order to enable multiple agents, the remote storage facade would have to be modified to utilise a cache consistency protocol.

The sharing privileges supported by the agent are: read, write, and administer. Only users with administrative privileges may modify ACLs and the user database, and administrative privileges may not be revoked from the owner. These semantics were chosen for their simplicity and have proven sufficient for the prototype.

3.4  Performance Optimizations

Performance optimizations can be introduced to both the client and the agent. The client contained no optimizations, but the agent contained two: a simple block-caching algorithm, and caching of the Java objects representing meta-data.
The cache uses a hashtable to map an object's fully qualified name to an entry that contains the file's remote file descriptor, a dirty flag, a clock (used for n-th chance block replacement), the Java meta-data object associated with this file (used for encryption and decryption), and a hashtable of blocks. The blocks hashtable maps a block number to a dirty flag, clock, and the actual data. A syncher thread running at some user-defined interval performs lazy writes, clears dirty flags, and removes cache entries when the cache gets too large (its size is configurable). Caching meta-data Java objects avoids the cost of re-creating a Java object from its persistent state. When data or meta-data change, the changes are written to the cache and the associated blocks flagged as dirty.

Additional optimizations could improve performance further. Attribute caching on the client can reduce the number of GETATTR operations performed. The agent can improve performance by implementing prefetching.

3.5  Summary

The bFS prototype implementation concentrated mainly on the new functionality offered by bFS. Performance optimizations were not a major priority, as it was known which optimizations would improve performance, and that these optimizations could be made at a later date.

Chapter 4

Performance

This chapter begins with a discussion of the performance expectations for the current bFS prototype. This is followed by a set of micro-benchmarks and the Andrew benchmarks.

4.1  Overview

The system's performance is impacted by three items: Java, additional network communication, and the nature of the user-level NFS server. The agent is implemented in Java using the Microsoft virtual machine, running as a Windows Service. The cryptography library used by the agent is pure Java, containing no native calls.
Java was selected because it enabled rapid development; the primary concern was exploring the new functionality offered by bFS, with performance issues to be tackled at a later stage. The user-level NFS server employs no performance enhancements and can be thought of as a simple gateway that translates the NFS protocol to the bFS protocol.

Another system that used a user-level NFS server to mount its file system is SFS. The SFS client daemon employed aggressive attribute caching to achieve NFS performance [9]. It is expected that since no such effort was put into the bFS user-level NFS server, performance will not be on par with NFS.

The goal for the performance of bFS is that it be comparable to the performance of NFS running over a secure channel. For reads and writes, bFS performs only one half of the encryption that NFS over a secure channel must perform and therefore should be faster. This is so because secure NFS would require data to be encrypted by the client, transmitted, and then decrypted by the server. bFS does not require server decryption. On the other hand, the bFS agent introduces overheads that will make bFS slower. In other respects, the performance of the two systems should be comparable, assuming that a similar optimization effort is performed on both.

The following sections describe the test environment, microbenchmarks, and performance observations.

4.2  Test Environment

The test environment consists of four machines:

• Client: Pentium III 550MHz with 512MByte RAM running FreeBSD 4.0.

• Agent: Pentium II 350MHz with 128MByte RAM running Windows 2000 Professional.

• Samba: Sun Ultra 10 300MHz with 64MByte RAM running Solaris 5.7. SMB services are offered by Samba [15] version 2.0.0alphaX.

• NFS: Sun Ultra 10 300MHz with 192MByte RAM running Solaris 5.6.

When measuring NFS performance, Client communicates directly with NFS.
When measuring bFS performance, Client communicates with a user-level NFS server (running on the same machine) using Unix domain sockets. That server communicates with Agent, which accesses an SMB share offered by Samba, which in turn needs to communicate with NFS since the share is actually an NFS mount. It is expected that bFS suffers a performance penalty due to the fact that the test environment requires additional network hops. Ideally, Client and Agent should be co-located, as should NFS and Samba.

4.3  Micro-Benchmarks

Micro-benchmarks are used to analyze the cost of file system operations. The user-level NFS server was instrumented to record each NFS request it received and all bFS operations it performed to satisfy that request. Each of the events was time-stamped using the Intel Pentium cycle counter, via the 'read time stamp counter' (rdtsc) instruction. The test involved making a directory, copying a 64KB file into it, and then reading the file. To determine Read cost during a cache miss, the user-level NFS server was stopped, re-started, and the file was read. This process ensured that the block cache would be empty.

Table 4.1 summarises end-to-end performance of key functionality. The performance numbers are the mean of five test runs. The first column describes the operation. The second is the total time of the operation in milliseconds. This time is broken down into the amount of time spent in the agent and the amount of time spent in the user-level NFS server (client). The last column indicates how much time could have been saved if the user-level NFS server performed some attribute caching.
Operation                             Total time    Agent   Client   Client overhead
Read 32K (cache miss)                      49.25    47.40     1.85              0.00
Read 32K (cache hit)                       15.58    13.85     1.73              0.00
Write 32K (cache entry not present)        33.09    32.11     0.98              2.64
Write 32K (cache entry present)            19.80    19.00     0.80              2.85
Create                                     44.67    44.49     0.18              6.83
MkDir                                     102.24   101.82     0.42             18.12
Lookup cache hit                            6.51     6.21     0.30              5.62
GetAttr cache hit                           2.90     2.71     0.19              0.00

Table 4.1: bFS micro-benchmarks in milliseconds

Notice that Create and MkDir have to perform remote file-system operations synchronously, and hence are relatively expensive. A Create requires a synchronous file creation on the storage provider. After creating the remote file the agent updates the directory meta-data. The MkDir operation requires a synchronous directory creation on the storage provider, followed by a synchronous file creation for the meta-data file of the newly created directory. After creating the two objects the agent updates the parent directory's meta-data and populates the new directory's meta-data.

The Read cache miss requires both synchronous access to the storage provider and the cost of data decryption. The decryption step alone takes 15.24 milliseconds. This step completes in 0.62 milliseconds using a native C routine. The difference between the cache hit and miss numbers for Read indicates that reading the data from the storage provider and adding it to the cache requires almost 20 milliseconds. The cost of adding data to the cache when a cache entry (for the file) does not exist can be determined by examining the difference between the two Write figures. The two writes are generated when the 64KB file is copied and do not differ in any way aside from the fact that the first write creates a cache entry for the file while the second write simply adds a block to the cache entry's block list.
The difference between the two writes, and hence the cost of adding data to the cache when a cache entry for the file does not exist, is 13.11 milliseconds. The difference in the agent time of the two reads is 33.55 milliseconds, of which 15.24 is cryptographic cost, leaving 18.31 milliseconds to access the storage provider and insert the data into the cache. Since the cost of adding data to the cache is 13.11 milliseconds, the cost of network access to the remote storage is 5.2 milliseconds.

There are two reasons for the performance of bFS: Java and the user-level NFS server. The agent is implemented in Java because it allowed rapid prototype development and portability. At the outset, it was expected that Java would have a negative impact on performance, but it was also expected that ongoing efforts by the Java community would improve Java performance.

To quantify the effect of the decision to use Java for the agent, a modified user-level NFS server tracked agent performance as the file system grew. Surprisingly, performance deteriorated rapidly. For instance, recall that a GetAttr micro-benchmark took approximately 3 milliseconds; as the file system grew to several hundred files in a few dozen directories, the same operation required almost 80 milliseconds. Further investigation was inconclusive, but suggested that the Microsoft Virtual Machine and the Java garbage collector may have been responsible for the performance deterioration. Unfortunately, the Microsoft Virtual Machine cannot be configured to disable garbage collection, so this theory could not be validated.

To quantify the effect of the user-level NFS server, the performance of a C program that uses the NFS server to access the bFS file system was compared to that of a Java program that accesses the agent directly.
Where the C program took 25 seconds to write a one-megabyte file, the Java program took just as long as the cp utility under NFS — 280 milliseconds. There is a single encompassing reason for the performance differences between the two methods of accessing the bFS system — the NFS client in the kernel causes a one-megabyte write request to be broken into 128 eight-kilobyte write requests. The bFS agent and the NFS server both have to process these additional requests. To determine exactly where these extra cycles are consumed, the Java program was modified to perform 128 eight-kilobyte writes instead of the single one-megabyte write. The modified program now required 450 milliseconds. This pointed at the user-level NFS server as the cause for poor performance. By modifying the NFS read and write sizes to 16k (from 8k), performance for the C program improved to 11.7 seconds. Moving to 24k resulted in the one-megabyte write requiring 2 seconds. Finally, by moving to 32k read and write sizes the one-megabyte write required a mere 370 milliseconds.

4.4  Andrew Benchmark

The current implementation falls short of the goal to match the performance of NFS. Table 4.2 compares the elapsed time in seconds of each phase of the Modified Andrew Benchmark [12] running on bFS and regular NFS.

Phase   bFS   NFS
I         1   0.4
II       14   0.8
III       5   0.4
IV        6   1.6
V        77   4.4

Table 4.2: Andrew benchmark: bFS vs. NFS

Phase   bFS   NFS
I       0.7   0.4
II        6   0.8
III       3   0.4
IV        4   1.6
V        38   4.4

Table 4.3: Andrew benchmark using 32k read/write sizes: bFS vs. NFS

The first phase creates many subdirectories. The second copies many files and directories. The third recursively retrieves the status of every file in a subdirectory that contains source files for a program. Every byte of every file in that source tree is examined during the fourth phase.
The last phase compiles the project in the source directory. The compiler and linker used during the last phase are themselves located in the file system being tested, and temporary files are also created within that file system. The bFS block size used was 4KByte, data was encrypted using 128-bit Blowfish, and names were transformed using a function of the OID. The NFS read and write sizes were set to 8KByte. Running the same benchmark using NFS read and write sizes of 32KByte yields the results outlined in Table 4.3. Although the performance is not on par with NFS, the numbers indicate that an enhanced user-level NFS server would be able to achieve performance on par with NFS, especially if it is integrated with a bFS agent.

4.5  Summary

bFS offers functionality not available on any other network file system. The performance of the initial prototype falls short of the goal to match NFS performance; however, analysis of the performance numbers has shown that the performance of bFS could be on par with NFS given sufficient optimizations.

Chapter 5

Related Work

bFS is a unique file system as it addresses two issues that have yet to be addressed together. Cryptographic file systems protect users from data exposure due to physical media theft or malicious administrators. Such systems can even sit on top of existing network file system protocols, offering network-based cryptographically secure file systems. However, the use of a network file system does not imply cross-domain sharing. Current network file systems do not include support for cross-domain sharing, as the historical deployment of these systems was within closed user groups. As stated earlier, the Internet is changing traditional file-system access and requires a more flexible model. That model is the one supported by bFS.
Other file systems, such as SFS [9], CFS [2], and TCFS [5], have taken steps toward support for this new model. SFS still requires server trust and does not solve the cross-administrative-domain sharing problem since, for authenticated users, it simply maps the remote user to a local UID. If Alice wants to grant Bob access to her files but Bob does not have an account on Alice's system, Alice must either create Bob an account, or map Bob to her UID, hence granting him more privileges than initially intended. SFS also requires software to be installed on the servers.

CFS and TCFS solve the server trust issue by encrypting all data, but they do not attempt to solve the sharing problem. bFS offers a practical solution to both problems.

5.1  Cryptographic File Systems

A number of researchers have written about adding cryptography to a file system, but surprisingly few have implemented it. CFS [2] is a cryptographic file system for Unix. An encrypted directory is mounted under /crypt by a user providing the password to the system. The Transparent Cryptographic File System [5] extends CFS by integrating it seamlessly with the file system so that files appearing anywhere in the file system can be transparently encrypted and decrypted upon access. Encryption is triggered by turning on a new secure bit in the file protection bits, and keys are managed by a separate server process. Another system, called Cryptfs [20], offers functionality identical to the previous two, except that it is implemented as a stackable file system at the vnode layer. In all three systems, clients trust servers to be secure. Further, all sharing must occur within a single administrative domain.

5.2  Cross-Domain Sharing

SFS [9] implements a secure file system by separating key management from the rest of the file system security apparatus.
In SFS, as in CFS and TCFS, the server is assumed to be secure. The key to their design is self-certifying path names — file names that effectively contain the appropriate remote server's public key. This allows a user who is authorized to access a file to do so from anywhere in the Internet without the intervention of any system administrator. However, each incoming file system request is made with the credentials of a user who is known to, and capable of authenticating himself to, the local file server. SFS does not address either of the main goals of bFS: security in the face of untrusted storage providers and sharing across administrative domains. The closest that SFS can come to sharing across administrative domains is via its use of an authserv process which maps remote users into "a set of Unix credentials—a user ID and list of group IDs". By doing so, the owner of a file accessible using SFS can specify access permissions based on the set of users and groups defined by the Unix system administrator.

CIFS [4] presents a global file system image, but does not address either of the main goals of bFS. It assumes that servers are trusted, and does not facilitate sharing across administrative domains. Each security domain in CIFS manages its own user authentication. Hence Alice can only grant Bob access to her files if she adds him to the local user database. If she does not have the ability to do so then she must provide him with her credentials.

AFS [8] restricts clients to a set of trusted servers maintained by an administrator. Those properties are opposite to the main goals of bFS.

There are a few other ways for users to securely share data in the Internet; however, these do not involve file systems.
Examples include web (HTTP) and FTP access to files using the Secure Socket Layer (SSL), and e-mail using PGP or other cryptographic tools. In these schemes, security is assured via appropriate use of cryptography, but at the cost of stepping outside of any shared file system. One could imagine constructing a file system interface to secure HTTP or secure FTP using the same sort of user-level NFS server that bFS uses. Once one takes care of key management and authentication of the primary user and guests, the resulting system would look just like bFS.

Chapter 6

Conclusions

The Internet is changing the traditional model of file system access by making it easier for users to access their files from a variety of security domains, to share their files with colleagues in other administrative domains, and to use Internet-based storage services which the user may not wish to trust to keep their data private. Existing cryptographic file systems solved the problem of untrusted storage by ensuring that data is stored in encrypted form. There are also other file systems that attempt to provide cross-domain sharing, but they require either a centralized user authentication mechanism or modifications to the storage providers.

bFS is unique as it addresses the issues of server trust and cross-domain sharing. Server trust is handled by using a cryptographic file system. Other cryptographic file systems do not require their own set of meta-data, as a single key is used for the entire file system and cross-domain sharing is not supported. Cross-domain sharing is supported in bFS by using a decentralized authentication scheme. Each user is responsible for managing their own private authentication database that contains the certificates of all users with whom a sharing relationship is established.
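The thesis does not spell out the agent's data structures, so the following is only a rough sketch of the per-user authentication database just described: a map from a peer certificate's fingerprint to the rights granted to that peer, with no central administrator involved. All names here (AuthDatabase, grant, lookup) are hypothetical, and a SHA-1 fingerprint stands in for storing the full certificate.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a bFS user's private authentication database:
// each user records, keyed by certificate fingerprint, the sharing
// relationship established with each peer.
class AuthDatabase {
    private final Map<String, String> rightsByFingerprint = new HashMap<>();

    // A hex SHA-1 digest stands in for the peer's full X.509 certificate;
    // a real agent would store and verify the certificate itself.
    static String fingerprint(byte[] certBytes) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-1").digest(certBytes))
                sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    // The owner grants a peer (identified by certificate) some rights.
    void grant(byte[] certBytes, String rights) {
        rightsByFingerprint.put(fingerprint(certBytes), rights);
    }

    // The agent consults the database when a request arrives.
    String lookup(byte[] certBytes) {
        return rightsByFingerprint.getOrDefault(fingerprint(certBytes), "none");
    }

    public static void main(String[] args) {
        AuthDatabase db = new AuthDatabase();
        byte[] bobCert = "bob's certificate".getBytes(StandardCharsets.UTF_8);
        db.grant(bobCert, "read");              // Alice grants Bob read access
        System.out.println(db.lookup(bobCert)); // read
        byte[] eveCert = "eve's certificate".getBytes(StandardCharsets.UTF_8);
        System.out.println(db.lookup(eveCert)); // none
    }
}
```

Because the database is keyed by certificate rather than by local account, Alice never needs Bob to have a UID in her administrative domain.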
The bFS prototype implementation illustrates that the model argued for in this thesis can be deployed using existing network file system infrastructure. Analysis of the prototype's performance revealed that NFS performance can be attained by optimizing the user-level NFS server and taking some action to overcome the Java performance issues.

bFS provides sharing and privacy using untrusted storage by allowing users to place their trust in a simple agent. This agent is the only locus of trust in the system, and the bFS structure of interacting agents provides a user complete control over when, how, and with whom their files are shared.

6.1  Future Work

Of immediate importance is a detailed examination of the Microsoft Virtual Machine, as it is believed to be the primary contributor to performance penalties. Aside from the performance issue, two projects come to mind. The first is the addition of a web interface to the system. The second is the use of decentralized sharing to enable a capabilities-based file system.

The largest challenge in adding a web interface is defining the trust model. A standard web authentication interface requires that a user enter their username and password. However, bFS requires a certificate and a matching private key for a user to authenticate with an agent. The naive solution is for a user to store their certificate and encrypted private key with the web server, their username and password acting as the decryption key. This solution leaves a user's private key with the web server, and is hence unappealing. A much more appealing solution is to store a user's certificate and private key on a smart card. The web interface communicates with a smart card reader on the user's machine to retrieve the user's certificate and to sign data using the user's private key.
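The signing step in this proposed design amounts to a standard challenge-response: the agent sends a nonce, the card signs it with the user's private key, and the agent verifies the signature against the public key in the user's certificate. The sketch below illustrates only that step using the stock Java security APIs; a freshly generated RSA key pair stands in for the smart card's key material, and the challenge string, key size, and algorithm choice are illustrative assumptions, not details from the thesis.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Sketch of the challenge-response a bFS web interface might perform.
class ChallengeSigner {
    // Sign a challenge with a (stand-in) private key, then verify it with
    // the matching public key, as the agent would using the certificate.
    static boolean demo(byte[] challenge) {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            KeyPair userKeys = gen.generateKeyPair();

            // "On the card": sign the agent's challenge.
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(userKeys.getPrivate());
            signer.update(challenge);
            byte[] sig = signer.sign();

            // "At the agent": verify against the user's public key.
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(userKeys.getPublic());
            verifier.update(challenge);
            return verifier.verify(sig);
        } catch (GeneralSecurityException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] challenge = "bfs-login-nonce".getBytes(StandardCharsets.UTF_8);
        System.out.println(demo(challenge)); // true
    }
}
```

The point of the design is that only signatures, never the private key, leave the card, so the web server learns nothing it could reuse later.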
Using bFS to create a capabilities-based file system would require application software to execute in an environment where file-system access is provided by an agent acting as a user with limited access privileges. The existing user-level NFS server would have to be modified to present different file system images for different user processes.

Bibliography

[1] Matt Bishop. Implementation notes on bdes(1). Technical Report PCS-TR-91-158, Department of Mathematics and Computer Science, Dartmouth College, Hanover, NH 03755, April 1991.

[2] Matt Blaze. A cryptographic file system for Unix. In Proceedings of the 1st ACM Conference on Communications and Computing Security, November 1993.

[3] B. Callaghan, B. Pawlowski, and P. Staubach. NFS version 3 protocol specification. RFC 1813, Network Working Group, June 1995.

[4] Microsoft Corporation. Microsoft Networks SMB File Sharing Protocol (Document Version 6.0p). Redmond, Washington.

[5] Dipartimento di Informatica ed Applicazioni of the Università di Salerno. Transparent Cryptographic File System. http://tcfs.dia.unisa.it/.

[6] Driveway. http://www.driveway.com.

[7] M. Eisler. NFS version 2 and version 3 security issues and the NFS protocol's use of RPCSEC_GSS and Kerberos V5. RFC 2623, Network Working Group, June 1999.

[8] John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West. Scale and performance of a distributed file system. ACM Transactions on Computer Systems, 6(1):51-81, February 1988.

[9] David Mazieres, Michael Kaminsky, M. Frans Kaashoek, and Emmett Witchel. Separating key management from file system security.
In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP '99), pages 124-139, December 1999.

[10] Microsoft. Windows 2000 IFS Kit. http://www.microsoft.com/HWDEV/ntifskit/.

[11] NetDrive. http://www.netdrive.com.

[12] John K. Ousterhout. Why aren't operating systems getting faster as fast as hardware? In Summer USENIX, pages 247-256, June 1990.

[13] J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems, 2(4):277-288, November 1984.

[14] Jerome H. Saltzer. Protection and the control of information sharing in Multics. Communications of the ACM, 17(7):388-402, July 1974.

[15] Samba. http://www.samba.org.

[16] Bruce Schneier. Applied Cryptography. John Wiley & Sons, Inc., second edition, 1996.

[17] R. Srinivasan. RPC: Remote procedure call protocol specification version 2. RFC 1831, Network Working Group, August 1995.

[18] R. Srinivasan. XDR: External data representation standard. RFC 1832, Network Working Group, August 1995.

[19] X:drive. http://www.xdrive.com.

[20] E. Zadok, I. Badulescu, and A. Shender. Cryptfs: A stackable vnode level encryption file system. Technical Report CUCS-021-98, Computer Science Department, Columbia University, 1998.

[21] Phil Zimmermann. Pretty Good Privacy. http://www.pgp.com.

