CSE 291 Spring 2018, George Porter

CSE 291 Project 2

2018 May 24: Project 2: SurfStore

FAQ

(May 24): Here are a few additional examples of interacting with the client:

$ # A new MetaDataStore and a new BlockStore has been started
$ client /etc/myconfig.txt getversion myfile.txt
0
$ client /etc/myconfig.txt delete myfile.txt
Not Found
$ mkdir /home/aturing/downloads
$ client /etc/myconfig.txt download myfile.txt /home/aturing/downloads
Not Found
$ client /etc/myconfig.txt upload /home/aturing/myfile.txt
OK
$ client /etc/myconfig.txt download myfile.txt /home/aturing/downloads
OK
$ diff /home/aturing/myfile.txt /home/aturing/downloads/myfile.txt
$ # (indicates the files are identical)
$ client /etc/myconfig.txt getversion myfile.txt
1
$ echo "adding some additional data" >> /home/aturing/myfile.txt
$ client /etc/myconfig.txt upload /home/aturing/myfile.txt
OK
$ client /etc/myconfig.txt getversion myfile.txt
2
$ client /etc/myconfig.txt delete myfile.txt
OK
$ client /etc/myconfig.txt getversion myfile.txt
3
$ client /etc/myconfig.txt download myfile.txt /tmp
Not Found

(May 13): The video walk-through mentions a “blockstore” git branch. The starter code doesn’t have this branch–instead you can implement the code described in the video yourself. In terms of “HashUtils”, the code for implementing a hash function is in this write-up.
(May-8): Updated the grading rubric to include the client. Updated information on printing OK or Not Found to stdout.

GitHub classroom invite link

Link to GitHub starter code invitation

Overview

In this project you are going to create a cloud-based file storage service called SurfStore. SurfStore is a networked file storage application that supports four basic commands:

Create a file
Read the contents of a file
Change the contents of a file
Delete a file

Multiple clients can concurrently connect to the SurfStore service to access a common, shared set of files. Clients accessing SurfStore “see” a consistent set of updates to files, but SurfStore does not offer any guarantees about operations across files, meaning that it does not support multi-file transactions (such as atomic move).

The SurfStore service is composed of the following two sub-services:

BlockStore: The content of each file in SurfStore is divided up into chunks, or blocks, each of which has a unique identifier. The BlockStore service stores these blocks, and when given an identifier, retrieves and returns the appropriate block.
MetadataStore: The MetadataStore service holds the mapping of filenames/paths to blocks.

Additionally, you will need to implement a client that can support the four basic commands listed above.

The project is structured into two parts:

Part 1: You’ll implement both the BlockStore and MetadataStore services, and the SurfStore client. The metadata service you implement will simply keep its data in memory, with no replication or fault tolerance. We’ll refer to this version as a centralized implementation of SurfStore.
Part 2: Next, you’ll create a version of the MetadataStore service as a set of distributed processes that implement fault tolerance. This distributed implementation will use a repliated log (replicated state machine) plus 2-phase commit to ensure that the MetadataStore service can survive, and continue operating, even if one of its processes fails, and that after failed processes recover they are able to rejoin the distributed system and get up-to-date.

Logistics

This project can be done individually or in a group of two.

SurfStore Specification

We now describe the service in more detail.

Basic concepts

Blocks, hashes, and hashlists

A file in SurfStore is broken into an ordered sequence of one or more blocks. Each block is of uniform size (4KB), except for the last block in the file, which may be smaller than 4KB (but must be at least 1 byte large). As an example, consider the following file:

The file ‘MyFile.mp4’ is 14,437 bytes long, and the block size is 4KB. The file is broken into blocks b0, b1, b2, and b3 (which is only 2,149 bytes long). For each block, a hash value is generated using the SHA-256 hash function. So for MyFile.mp4, those hashes will be denoted as [h0, h1, h2, h3] in the same order as the blocks. This set of hash values, in order, represents the file, and is referred to as the hashlist. Note that if you are given a block, you can compute its hash by applying the SHA-256 hash function to the block. This also means that if you change data in a block the hash value will change as a result. To update a file, you change a subset of the bytes in the file, and recompute the hashlist. Depending on the modification, at least one, but perhaps all, of the hash values in the hashlist will change.

Generating SHA-256 hash values in Java

As an example of converting the string “foobar” to a SHA-256 hash value (encoded using base-64) in Java:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class ShaTest {

	public static void main(String[] args) {
		if (args.length != 1) {
			System.err.println("Usage: ShaTest <string>");
			System.exit(1);
		}

        String text = args[0];
        System.out.println("Input: " + text);

        MessageDigest digest = null;
		try {
			digest = MessageDigest.getInstance("SHA-256");
		} catch (NoSuchAlgorithmException e) {
			e.printStackTrace();
			System.exit(2);
		}
        byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
        String encoded = Base64.getEncoder().encodeToString(hash);

        System.out.println("Output: " + encoded);
	}

}

Note that you’ll actually be hashing 4KB binary blocks, and so you will want to simply pass an array of bytes directly to the digest.digest() function.

Files and filenames

Files in SurfStore are denoted by filenames, which are represented as strings. For example “/Myfile.mp4”, “/Documents/My Videos/BeachVacation.mp4”, and “/Conferences/Expenses.txt” are all example of filenames. Although filenames can contain the slash character (“/”), SurfStore doesn’t really have any concept of a directory or directory heirarchy–filenames are just strings. For this reason, filenames can only be compared for equality or inequality, and there are no “cd” or “mkdir” commands.

File versions

Each file/filename is associated with a version, which is a monotonically increasing non-negative integer. The version is incremented any time the file is created, modified, or deleted. The purpose of the version is so that clients can detect when they have an out-of-date view of the file hierarchy.

For example, imagine that Client 1 wants to update a spreadsheet file that tracks conference room reservations. Ideally, they would perform the following actions:

However, another client might be concurrently modifying this file as well. In reality, the order of operations might be:

As you can see, Client 1 overwrote the change that client 2 made without realizing it. We can solve this problem with file versions. Every time a file is modified, its version number is incremented. SurfStore only records modifications to files if the version is one larger than the currently recorded version. Let’s see what would happen in the two-client case:

To delete a file, the MetadataStore service simply notes that the file is deleted. In this way, deletion events also require version numbers, which prevents race conditions that can occur when one client deletes a file concurrently with another client deleting that file. In SurfStore, we are going to represent a deleted file as a file that has a hashlist with a single hash value of “0”. Note that this means the file must be recreated before it can be read by a client again.

Processes

SurfStore consists of three types of processes: client processes, a BlockStore process, and one or more Metadata processes. Note that one (and only one) of the Metadata processes is specially designated as the “leader”, meaning that all client requests should go through that server and that server only. The leader never fails, never crashes, and never loses connectivity with the clients.

Client

A client is a program that interacts with SurfStore. It is used to create, modify, read, and delete files. Your client will call the various file modification/creation/deletion RPC calls. We will be testing your service with our own client, and your client with instrumented versions of our service.

The client implements the following functionalities (the config files are described in a later section):

upload : Reads the local file, creates a set of hashed blocks and uploads them onto the MetadataStore (and potentially the BlockStore if they were not already present there).
- Usage: $ client myconfig.txt upload /path/to/file
download : Downloads a file from SurfStore. Ensures not to download unnecessary blocks.
- Usage: $ client myconfig.txt download filename.txt /path/to/where/to/store/it
delete : Signals the MetadataStore to delete a file.
- Usage: $ client myconfig.txt delete filename.txt
getversion: Returns the current version of the given file. If SurfStore is centralized, then it should return a single value. If SurfStore is distributed, it should return three values separated by spaces.
- Usage: $ client myconfig.txt getversion filename.txt
- Centralized example output: 32
- Distributed example output: 32 32 32

We will be testing your client by calling these functions. Make sure that your client matches the same syntax as described above.

Some examples:

$ client /etc/myconfig.txt upload /home/aturing/myvacation.mp4
$ mkdir /home/aturing/downloads
$ client /etc/myconfig.txt download myvacation.mp4 /home/aturing/downloads
$ diff /home/aturing/myvacation.mp4 /home/aturing/downloads/myvacation.mp4
(indicates the files are identical)
$ client /etc/myconfig.txt getversion myvacation.mp4
1
$ cp /home/aturing/springbreak.mp4 /home/aturing/myvacation.mp4
$ client /etc/myconfig.txt upload /home/aturing/myvacation.mp4
$ client /etc/myconfig.txt getversion myvacation.mp4
2
$ client /etc/myconfig.txt delete myvacation.mp4
$ client /etc/myconfig.txt getversion myvacation.mp4
3
$ client /etc/myconfig.txt download myvacation.mp4 /tmp
(indicates an error because the file isn't there)

BlockStore

The BlockStore service is an in-memory data store that stores blocks of data, indexed by the hash value. Thus it is a key-value store. It supports a basic get() and put() operations. It does not need to support deleting blocks of data–we just let unused blocks remain in the store. The BlockStore service only knows about blocks–it doesn’t know anything about how blocks relate to files.

The service implements the following API:

StoreBlock(h,b): Stores block b in the key-value store, indexed by hash value h
b = GetBlock(h): Retrieves a block indexed by hash value h
True/False = HasBlock(h): Signals whether block indexed by h exists in the BlockStore service

MetadataStore

The MetadataStore process maintains the mapping of filenames to hashlists. All metadata is stored in memory, and no database systems or files will be used to maintain the data. When we test your project, we will always start from a “clean slate” in which there are no files in the system.

The service implements the following API:

(v,hl) = ReadFile(f): Reads the file with filename f, returning the most up-to-date version number v, and the corresponding hashlist hl. If the file does not exist, v will be 0.
ModifyFile(f,v,hl): Modifies file f so that it now contains the contents refered to by the hashlist hl. The version provided, v, must be exactly one larger than the current version that the MetadataStore maintains.
DeleteFile(f,v): Deletes file f. Like ModifyFile(), the provided version number v must be one bigger than the most up-date-date version.
IsLeader(): Returns true if this Metadata service is the leader, otherwise returns false.

To create a file that has never existed, use the ModifyFile() API call with v set to 1. To create a file that was previously deleted when it was at version v, use ModifyFile with a version number of v+1.

Basic operating theory

When a client wants to create a new file, it first contacts the MetadataStore leader to see if the file already exists (or existed in the past, and has since been deleted). If so, it notes the previous version number, otherwise the file will start with a default version of 0.

The client then reads its local copy of the file and splits it into blocks, as described above. It then computes the hash values of each of the blocks to form a hashlist. It then contacts the MetadataStore leader and invokes the ModifyFile() API, passing it the filename, updated version number, and hashlist.

Clients are also responsible for uploading the blocks of the file to the BlockStore service. A naive implementation would do this after uploading the hashlist via ModifyFile(). But this leads to a potential race condition: what if another client tries to download a file using the hashlist given to ModifyFile() by the first client to the MetadataStore before it’s done uploading all of the blocks to the BlockStore? Moreover, how can SurfStore guarantee that a client actually did upload the necessary blocks for the file, and didn’t crash along the way?

To prevent these issues, the protocol you’re going to use works as follows. When the client does a ModifyFile() operation, the MetadataStore leader is going to query the BlockStore for each of the hash values in the hashlist, to see which, if any, of the blocks are already in the BlockStore. If any blocks are missing from the BlockStore, the MetadataStore will reply back to the client with a list of missing blocks. The MetadataStore will not create the filename to hashlist mapping if any blocks are not present in the BlockStore. Only when all the blocks are in the BlockStore will the MetadataStore signal a success return value to the client’s ModifyFile() operation, and from then on the new file version is available to any clients that want to download it.

As an example:

To download a file, the client invokes the ReadFile() API call on the MetadataStore, passing in the filename. The MetadataStore simply returns the version and hashlist to the client. The client then downloads the blocks from the BlockStore to form the complete file. As an example:

Distributed SurfStore

In the second part of this project, you are going to replicate the MetadataStore service to make it fault tolerant. There will be a single leader that all read, delete, and modify operations will go through, as before. That leader never fails and never goes off the network. The leader will turn client requests into replicated state machine operations that will be distributed to the other replicas using the 2-phase commit protocol. For this project, there will be one leader and two additional replicas for a total of three Metadata servers.

Initiating failures

In a real system, processes might fail for a number of reasons, including crashes, disk or memory failures, network partitions, software bugs, etc. In general, such failures are highly non-deterministic.

To aid your testing of your own implementation, and to aid our testing and grading of your project, we are going to extend the MetadataService API with calls that enable us to manually “fail” and “recover” processes. In this way, we can explore a number of failure scenarios.

Thus, each MetadataStore instance will need to support these APIs:

Crash(): This API call signals to the process that it should enter an emulated failure state. A failed process should not instigate any new messages, and should reply to any incoming messages with a failure response.
Restore(): This API call signals to the process that it has “recovered” and is now back online. It can now process and instigate RPC calls, and must begin to “catch up” so that it has the most up-to-date information about the current state of the Metadata service state.
GetVersion(): returns the current version number for a given file

Note that it is not valid to call Crash() or Restore() on the leader.

Implementation details

Language restrictions

For Project 2, you must complete all parts in Java.

Configuration file

For this project, you will use a configuration file describing the cluster details, with the following format:

configCentralized.txt

    M: 1
    L: 1
    metadata1: <port>
    block: <port>

configDistributed.txt

    M: 3
    L: 2
    metadata1: <port>
    metadata2: <port>
    metadata3: <port>
    block: <port>

The initial line M defines the number of Metadata servers.
The second line L denotes which of the Metadata servers is the leader. In the centralized example, metadata1 is the leader. In the distributed example, metadata2 is the leader.
The ‘metadata1’ line specifies the port number of your metadata server. Note the ‘1’, ‘2’, etc after the word metadata to indicate the ports for the different instances of the service.
‘block’ denotes the port number of your BlockStore.

This config file will be available to the client and servers when they are started. This configuration file helps the server or client know the cluster information and also how many metadata servers are present in the service. Note that because you’re going to run the client, the BlockStore, and the Metadata server all on the same machine, you will need to use unique ports. The configuration file we provide will always be valid and will not contain any errors or problems.

Client

The client will perform one or more operations before exiting.
Note that before downloading any blocks from the BlockStore, the client should check whether any of those blocks exist locally first. It only needs to check whether that block already is present in the destination directory. For example, if the client is downloading a file to the /home/aturing/downloads directory, then the block protocol only needs to look for local blocks in /home/aturing/downloads, not everywhere else (and not in any subdirectories either, just that one directory).

Client output

You can print out whatever messages you want to stderr. To stdout, please either print:

OK: if the operation succeeded
Not Found: if the client tried to download a file that was not found on the server, or tried to upload a file which does not exist on the local machine. Also return this if GetVersion is called on a file which does not exist.

SurfStore gRPC API

For this project, you will be using gRPC to implement the SurfStore API. We have provided you with a SurfStoreBasic.proto file that defines the basic API calls outlined above. You may add new services, RPCs, and Message types to this file, but do not delete or modify any of the existing RPCs or Messages.

gRPC is an RPC framework that will automatically generate stub code for API calls you specify. It’s up to you to implement them. We recommend looking at the gRPC sample included with the project in order to learn how to use the generated stub code.
You’ll need to implement more RPCs in part 2 of the project yourself in order to implement replicated state machines and 2-phase commit.
Be careful to not make any assumptions or shortcuts when using the API. Your implementation should work with any individual component (MetadataStore, BlockStore, and Client) swapped out with another student’s version. Feel free to try this out by logging into the same ieng6 machine and using the same configuration file! Of course, we don’t expect different implementations of MetadataStore to work together in part 2, since the API for that is completely up to you.
The provided starter code has separate “start” scripts for each component (e.g., the BlockStore, Metadata1, Metadata2, etc). Make sure that we can start these component individually. In other words, please do not have your code start all services in a single program, since we’d like to try different combinations of servers. For example, we might want to test your client and your Metadata servers with our BlockStore. Or we might try your Metadata servers and Blockstore with our client.

Grading rubric

BlockStore functionality (10 points)
- Storing, retrieving, and testing for the existence of blocks
Client functionality (15 points)
- Correctly uploading and downloading files to the store
- Correctly implementing the optimization where unnecessary blocks are not downloaded or uploaded to the BlockStore
Metadata store: Part 1 (40 points)
- Properly supporting versions across concurrent clients
- Properly supporting reading and modifying files. In particular, this includes the “missing blocks” protocol in which the Metadata store calls into the Block Store and returns the correct set of missing blocks back to the client. When duplicate files, or files that contain duplicate blocks, are uploaded, the client should not re-upload blocks that already exist.
- Properly supporting delete file
- Properly supporting the case where a new file is created with the same name as a previously deleted file
Metadata store: Part 2 (35 points)
- Properly handling the isLeader() call
- When one non-leader metadata store is crashed, clients interacting with the leader should continue work correctly. The crashed server should not have its metadata updated (in other words, its metadata stays the way it was when the server crashed)
- Once a crashed metadata store is restored, then it should come back up-to-date within 5 seconds. (Hint: the leader should try to append entries to the logs every 500 milliseconds)
Total: 100 points

Starter code and submission link

A link to the starter code and the GitHub invitation will be made available shortly.

Tutorial video

I’ve put together a four-part tutorial on how to get started with SurfStore, including how to implement the BlockStore service. It is located at this link

Misc FAQs

Storing replicated logs

You do not need to write out the logs to disk–it is fine to keep them in memory. This is true for the followers and for the leader.

Some hints on testing part 2

To test part 2, we are (in part) going to do the following:

Start up your servers
Update a number of files
We will then “crash” one or more of the followers (but never more than half)
We’ll then continue to update files
Your service should continue to work while the followers are crashed, so that as far as the client is concerned, nothing appears to have failed
During the time that one or more of your followers is crashed, we’ll call the getversion api call from your client to ensure that its state is not being updated. In other words, we’ll ensure that it is falling behind the rest of the system
We’ll then “uncrash” the follower(s), and wait e.g., 5 seconds. Then we’ll check to make sure that those followers have “caught up” to the rest of the system and have the updated information
This may happen multiple times.

I hope that this testing strategy can help you exercise your code.

Some notes on implementing part 2:

When a client sends a command to the leader, the leader is going to log that command in its local log, then issue a two-phase commit operation to its followers. When a majority of those followers approve of the update, the leader can commit the transaction locally, and then respond back to the client. After the leader responds back to the client, it is going to need to tell the followers that the transaction was committed. It is fine to immediately call into them with the updated commit index.

Now, what happens if a follower is in a crashed state? The leader should attempt to bring it up to date every 500ms, meaning that every half second the leader should call into the follower with updated information.

Versions and leaders

ModifyFile/DeleteFile should only be applied when the given version number is exactly one higher than the version stored in the metadatastore.

The Client should only ever invoke operations on the leader. The leader can handle ReadFile requests by itself, and can also properly handle ModifyFile and DeleteFile operations that are not valid. For example, if the client tries a modifyfile operation and there are missing blocks, or if a client tries a modifyfile operation that is the wrong version, then the leader can just reject the request. Only when a request is valid (all blocks are present and the version number is correct) does it need to invoke 2PC on the followers.

Followers

Note that the followers (and leaders) need always implement GetVersion, even when they are crashed. GetVersion should return the most up-to-date committed value of the filesystem. It should not include any logged (but not yet committed) updates.