CS 3113: Project 3

File System Implementation (Directory Structures)

Introduction

Hard disks organize data into fixed-size blocks. To fetch a particular byte, the entire block in which that byte lives must be fetched. Likewise, to change a single byte, the entire block is first read into memory, the byte is changed, and then the entire block is written back to the disk. As application-level programmers, however, we prefer to think in terms of the file system abstraction: a file is a sequence of bytes of some arbitrary length. It is convenient to program with this abstraction in mind: we would like to be able to read a subsequence of the bytes from a file or write a new subsequence of bytes (either overwriting existing bytes in the file or appending to the file). While doing so, we prefer not to think about which blocks on the hard disk our bytes are coming from or going to. The file system abstraction also provides a convenient and logical way of finding files: we use directories and subdirectories, along with names within a directory that map to specific files.

For projects 3 and 4, you will implement a miniature file system, MYFS, that makes the connection between disk blocks and the file system abstraction. We will use a real file on our Linux systems as a virtual hard disk drive. This virtual disk is accessed one block at a time, through the vdisk code that we provide.

We also provide the file system data structure and a few other components. Your job in project 3 is to implement a hierarchical directory structure. In project 4, you will add files (with content!) to the file system. Specifically, as part of project 3, you will:

Objectives

By the end of this project, you should be able to:

Overview

The diagram below shows the relationship of the different components that we are implementing. Starting from the bottom:
  • vdisk (PROVIDED) implements a block-level storage device. This device allows read/write operations at the level of individual blocks of bytes. The storage behind this block-level device is a Linux file (hence, this is a virtual disk).

  • myfs_lib_support (TO BE IMPLEMENTED, mostly) provides reusable functionality for manipulating different parts of the file system at the block level

  • myfs_lib (TO BE IMPLEMENTED) provides a set of virtual system calls that make up the user API

  • application programs (PROVIDED) include:
    • myfs_inspect: a program for examining the block-level structure of the disk. This program is about debugging and testing your code, and is not a user program
    • myfs_stats: a program that prints out the sizes of the various structures on the disk (including blocks and index nodes)
    • myfs_format: format the virtual disk
    • myfs_list: list an existing file or the contents of an existing directory
    • myfs_mkd: make a directory
    • myfs_rmd: remove a directory


Proper Academic Conduct

The code solution to this project must be done individually. Do not copy solutions to this problem from the network and do not look at or copy solutions from others. However, you may use the net for inspiration and you may discuss your general approach with others. These sources must be documented in your README file.


Logical Representation of the File System

Index Nodes

Index nodes contain the meta-data for a logical entity that is stored in the file system (either a file or a directory). The data inside the index node include:

All index nodes in the file system are referenced with an integer (of type INDEX_NODE_REFERENCE). An INDEX_NODE_REFERENCE can be one of the following values:

Directory Entries

A single directory entry contains the following information:

Blocks

A block is a fixed-size unit of storage on the virtual disk. Each block on the disk is referenced using an integer; the type is BLOCK_REFERENCE. A BLOCK_REFERENCE can be one of the following values:

All blocks contain BLOCK_SIZE bytes, which are allocated accordingly:

Depending on the context, the data stored in the block can be interpreted in one of several ways:

Volume Block

The volume block contains several key components:

Index Node Blocks

Because index nodes are much smaller than blocks, we pack multiple index nodes into each block. Within an index node block, they are stored as an array of N_INDEX_NODES_PER_BLOCK index nodes.

When the file system is first formatted, one block is allocated for storing index nodes. As more index nodes are needed, additional blocks are allocated; together, the blocks used for index nodes form a linked list.
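Index nodes are packed N_INDEX_NODES_PER_BLOCK to a block, and those blocks form a list starting at ROOT_INDEX_NODE_BLOCK. As a result, an index node reference can be turned into two values: a position in that list and a slot within the block. The sketch below assumes that index nodes are numbered consecutively along the list. This is consistent with myfs_inspect -iblock 1 in the example interaction, which lists index nodes 0 to 3 as relative nodes 0 to 3. The helper names locate_index_node and find_unused_index_node are hypothetical. The type definitions are copied from the provided header so that the sketch compiles on its own.

```c
/* Copied from the provided headers so this sketch compiles on its own;
   in your code, include the header instead. */
#define BLOCK_SIZE 256
#define DATA_BLOCK_SIZE ((int)(BLOCK_SIZE-sizeof(int)))
typedef enum {T_UNUSED=0, T_DIRECTORY, T_FILE, T_PIPE} INDEX_NODE_TYPE;
typedef struct
{
  INDEX_NODE_TYPE type;
  unsigned char references;
  unsigned short content;      /* BLOCK_REFERENCE in the header */
  unsigned int size;
} INDEX_NODE;
#define N_INDEX_NODES_PER_BLOCK ((int)(DATA_BLOCK_SIZE/sizeof(INDEX_NODE)))

/* Hypothetical helper (assumes consecutive numbering along the list):
   hops = number of 'next' links to follow from ROOT_INDEX_NODE_BLOCK,
   slot = index into that block's index_node[] array. */
static void locate_index_node(unsigned short ref, int *hops, int *slot)
{
  *hops = ref / N_INDEX_NODES_PER_BLOCK;
  *slot = ref % N_INDEX_NODES_PER_BLOCK;
}

/* Hypothetical helper: first T_UNUSED slot in one in-memory index node
   block, or -1 if the block is full (then follow 'next', or allocate and
   append a new index node block). */
static int find_unused_index_node(const INDEX_NODE nodes[])
{
  for (int i = 0; i < N_INDEX_NODES_PER_BLOCK; ++i)
    if (nodes[i].type == T_UNUSED)
      return i;
  return -1;
}
```

With 21 index nodes per block (the value myfs_stats reports), reference 45 would be slot 3 of the third block in the list.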

Directory Blocks

Because directory entries are much smaller than a block, we fit N_DIRECTORY_ENTRIES_PER_BLOCK directory entries within the block. As with index nodes, these are implemented as an array of directory entries.

When the file system is first formatted, one block is permanently allocated to hold the content of the root directory. Likewise, each newly created directory is allocated a single block to hold its content. As a directory grows, more directory blocks are allocated, forming a linked list of blocks.
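The following is a minimal sketch of working with one in-memory directory block. It assumes that an entry is free exactly when its index_node_reference is UNALLOCATED_INDEX_NODE, as the header comment states. A new directory's block starts with "." (the directory itself) and ".." (its parent), which matches the myfs_inspect -dir output in the example interaction. The helper names are hypothetical. A full implementation would repeat the search over every block in the directory's linked list.

```c
#include <limits.h>
#include <string.h>

/* Copied from the provided headers so this sketch compiles on its own. */
#define BLOCK_SIZE 256
#define DATA_BLOCK_SIZE ((int)(BLOCK_SIZE-sizeof(int)))
typedef unsigned short INDEX_NODE_REFERENCE;
#define UNALLOCATED_INDEX_NODE (USHRT_MAX)
#define FILE_NAME_SIZE ((int)(32 - sizeof(INDEX_NODE_REFERENCE)))
typedef struct
{
  char name[FILE_NAME_SIZE];
  INDEX_NODE_REFERENCE index_node_reference;
} DIRECTORY_ENTRY;
#define N_DIRECTORY_ENTRIES_PER_BLOCK ((int)(DATA_BLOCK_SIZE / sizeof(DIRECTORY_ENTRY)))

/* Hypothetical helper: set up the block of a new directory, with "."
   pointing at the directory itself and ".." at its parent; all other
   entries are marked unused. */
static void init_directory_entries(DIRECTORY_ENTRY e[], INDEX_NODE_REFERENCE self,
                                   INDEX_NODE_REFERENCE parent)
{
  for (int i = 0; i < N_DIRECTORY_ENTRIES_PER_BLOCK; ++i) {
    memset(e[i].name, 0, FILE_NAME_SIZE);
    e[i].index_node_reference = UNALLOCATED_INDEX_NODE;
  }
  strcpy(e[0].name, ".");
  e[0].index_node_reference = self;
  strcpy(e[1].name, "..");
  e[1].index_node_reference = parent;
}

/* Hypothetical helper: slot holding 'name' in one block, or -1. */
static int find_entry(const DIRECTORY_ENTRY e[], const char *name)
{
  for (int i = 0; i < N_DIRECTORY_ENTRIES_PER_BLOCK; ++i)
    if (e[i].index_node_reference != UNALLOCATED_INDEX_NODE &&
        strncmp(e[i].name, name, FILE_NAME_SIZE) == 0)
      return i;
  return -1;
}

/* Hypothetical helper: first free slot in one block, or -1 if the block is
   full (the caller then follows 'next' or appends a new directory block). */
static int find_free_entry(const DIRECTORY_ENTRY e[])
{
  for (int i = 0; i < N_DIRECTORY_ENTRIES_PER_BLOCK; ++i)
    if (e[i].index_node_reference == UNALLOCATED_INDEX_NODE)
      return i;
  return -1;
}
```

For example, creating foo/bar corresponds to finding a free slot in foo's block (slot 2 here) and storing "bar" with its new index node reference.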


Logical Structure Examples

Format

Given the following command:
./myfs_format 128

The format command creates a disk that contains 128 blocks. After the format, the logical structure will be as follows:

Notes:

Create Directory

Given the following command:

./myfs_mkd foo

The logical structure will change to be:

Notes:

Create Another Directory

Given the next command:

./myfs_mkd foo/bar

The logical structure will change to be:

Remove Directory

Given the next command:

./myfs_rmd foo/bar

The logical structure will change back to the case above, with just the foo directory.
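Both myfs_mkd and myfs_rmd receive a name such as foo/bar. Every component but the last names an existing directory, and the last names the entry to create or remove. The following is a minimal sketch of splitting such a name into its parent and final component. It assumes that components are separated by '/' and that a leading '/' means the root, as in ./myfs_list /a/abc from the example interaction. split_path is a hypothetical helper, not part of the provided skeleton.

```c
#include <string.h>

/* Hypothetical helper: split "foo/bar" into parent "foo" and name "bar".
   A name with no '/' gets parent "" (relative to the current directory);
   "/bar" gets parent "/".  Returns 0 on success, or -1 if the last
   component is empty (e.g. "foo/") or a buffer is too small. */
static int split_path(const char *path, char *parent, size_t parent_size,
                      char *name, size_t name_size)
{
  const char *slash = strrchr(path, '/');
  const char *base = slash ? slash + 1 : path;
  size_t parent_len = slash ? (size_t)(slash - path) : 0;

  if (slash == path)
    parent_len = 1;                       /* keep the root "/" */
  if (*base == '\0' || strlen(base) >= name_size || parent_len >= parent_size)
    return -1;

  memcpy(parent, path, parent_len);
  parent[parent_len] = '\0';
  strcpy(name, base);
  return 0;
}
```

The parent would then be resolved one component at a time, starting at the root for an absolute name or at the current directory for a relative one.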


Virtual Disk Layout

The layout of an initial file system is discussed above. Below are some more general notes.
  • Block 0 (Use VOLUME_BLOCK_REFERENCE to refer to this block): volume block

  • Block 1 (Use ROOT_INDEX_NODE_BLOCK): the first index node block in a linked list of blocks. Initially, this linked list contains only block 1, but as the existing index node blocks fill up, additional blocks are allocated and appended to this linked list.

    Note: once additional blocks are allocated to this list, they are never deallocated.

  • Block 2 (ROOT_DIRECTORY_BLOCK): the first directory block in the linked list for the file system root directory.

  • Any directory in the file system will have exactly one index node

  • Any directory in the file system will have at least one directory block in its linked list. When a new file or directory is added to a parent directory and there are no available directory entries, then a new block is allocated and added to this linked list.

    Note: once additional blocks are allocated to this list, they are never deallocated unless the entire directory is deleted.
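Allocating a new block involves three steps: finding a clear bit in the volume block's block_allocation_table, setting it, and updating n_allocated_blocks. The bit helpers below follow the layout documented in the VOLUME_BLOCK comment: block b is byte b >> 3, bit b & 7. find_free_block uses a hypothetical lowest-numbered-first policy. This agrees with the block numbers chosen in the example interaction, but it is an assumption, not a requirement.

```c
/* Bit helpers for a block allocation table: block b is byte b >> 3,
   bit b & 7 (1 = allocated, 0 = free). */
static int block_is_allocated(const unsigned char table[], unsigned int b)
{
  return (table[b >> 3] >> (b & 7)) & 1;
}

static void mark_block(unsigned char table[], unsigned int b, int allocated)
{
  if (allocated)
    table[b >> 3] |= (unsigned char)(1u << (b & 7));
  else
    table[b >> 3] &= (unsigned char)~(1u << (b & 7));
}

/* Hypothetical policy: lowest-numbered free block below n_blocks, or -1 if
   the disk is full.  The caller maps -1 to UNALLOCATED_BLOCK, and on success
   marks the bit and increments n_allocated_blocks in the volume block. */
static int find_free_block(const unsigned char table[], int n_blocks)
{
  for (int b = 0; b < n_blocks; ++b)
    if (!block_is_allocated(table, (unsigned int)b))
      return b;
  return -1;
}
```

After a 128-block disk is formatted, blocks 0, 1 and 2 are in use, so byte 00 of the table reads 07, matching myfs_inspect -volume.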


Virtual Disk API

The virtual disk API is provided. This API implements block-level virtual disk read/write functionality.
#ifndef VDISK_H
#define VDISK_H

#include <sys/types.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>

typedef unsigned short BLOCK_REFERENCE;

// Size of a single block in bytes
#define BLOCK_SIZE 256 

// Maximum number of blocks on a disk
#define MAX_BLOCKS 1920


int vdisk_open(char *virtual_disk_name, int truncate_flag);
int vdisk_close();
int vdisk_read_block(BLOCK_REFERENCE block_ref, void *block);
int vdisk_write_block(BLOCK_REFERENCE block_ref, void *block);

#endif

Before any file system operations are performed, an application program must open the virtual disk. Likewise, when the application program is complete, it must close the disk.

The MYFS API interacts with the virtual disk at the block level: it can read from and write to single blocks. This will be your only interface to storage (do not circumvent the virtual disk API in your implementation of the MYFS API).

When your MYFS library is manipulating the virtual disk, it does the manipulations in memory. This means that when you wish to change the contents of a block on the virtual disk, you must first read the block into memory, make the changes there, and then write the block back to the virtual disk. Hence, your program will typically have one or two blocks and one or two index nodes in memory at any one time (but rarely any more).
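The read-modify-write cycle can be sketched with a RAM-backed stand-in for the virtual disk, so that the example runs on its own. ram_read_block and ram_write_block are hypothetical substitutes with the same argument shape as vdisk_read_block and vdisk_write_block. The sketch also assumes that 0 means success; check the provided vdisk code for its actual return convention.

```c
#include <string.h>

#define BLOCK_SIZE 256                 /* as in vdisk.h */
#define RAM_DISK_BLOCKS 8              /* hypothetical, for the stand-in only */
typedef unsigned short BLOCK_REFERENCE;

/* Hypothetical RAM-backed stand-in for the provided virtual disk. */
static unsigned char ram_disk[RAM_DISK_BLOCKS][BLOCK_SIZE];

static int ram_read_block(BLOCK_REFERENCE ref, void *block)
{
  if (ref >= RAM_DISK_BLOCKS) return -1;
  memcpy(block, ram_disk[ref], BLOCK_SIZE);
  return 0;
}

static int ram_write_block(BLOCK_REFERENCE ref, void *block)
{
  if (ref >= RAM_DISK_BLOCKS) return -1;
  memcpy(ram_disk[ref], block, BLOCK_SIZE);
  return 0;
}

/* Change one byte: read the whole block into memory, modify the copy,
   then write the whole block back. */
static int set_byte(BLOCK_REFERENCE ref, int offset, unsigned char value)
{
  unsigned char block[BLOCK_SIZE];
  if (offset < 0 || offset >= BLOCK_SIZE) return -1;
  if (ram_read_block(ref, block) != 0) return -1;
  block[offset] = value;
  return ram_write_block(ref, block);
}
```

In the MYFS library, the same three steps would typically use a BLOCK variable and the provided vdisk calls.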


MYFS Data Structures

The MYFS data structure is provided and must not be changed. Below are the specifics of this structure:
#ifndef FILE_STRUCTS_H
#define FILE_STRUCTS_H

#include <string.h>
#include <limits.h>
#include "vdisk.h"


// Implementation of min operator
#define MIN(a, b) (((a) > (b)) ? (b) : (a))


/**********************************************************************/
/*
File system layout onto disk blocks:

Block 0: Volume block
Block 1: First Index Node block
Block 2: Root directory block 
  :
  :
(all other blocks are either index node, directory or data blocks)
*/


/**********************************************************************/
// Basic types and sizes
// Chosen carefully so that all block types pack nicely into a full block
//
// NOTE: USE THESE CONSTANTS INSTEAD OF HARD-CODING VALUES IN YOUR CODE

// An index for a block (0, 1, 2, ...)
typedef unsigned short BLOCK_REFERENCE;

// Value used as an index when it does not refer to a block
#define UNALLOCATED_BLOCK (USHRT_MAX-1)

// An index that refers to an index node
typedef unsigned short INDEX_NODE_REFERENCE;

// Value used as an index when it does not refer to an index node
#define UNALLOCATED_INDEX_NODE (USHRT_MAX)

// Number of bytes available for block data
#define DATA_BLOCK_SIZE ((int)(BLOCK_SIZE-sizeof(int)))

// The block on the virtual disk containing the root directory
#define ROOT_DIRECTORY_BLOCK 2

// The block on the virtual disk containing the first index nodes
#define ROOT_INDEX_NODE_BLOCK 1

// The Index node for the root directory
#define ROOT_DIRECTORY_INDEX_NODE 0

// Size of file/directory name
#define FILE_NAME_SIZE ((int)(32 - sizeof(INDEX_NODE_REFERENCE)))

/**********************************************************************/
// Data block: storage for file contents (project 4!)
typedef struct
{
  unsigned char data[DATA_BLOCK_SIZE];
} DATA_BLOCK;


/**********************************************************************/
// Index node types
typedef enum {T_UNUSED=0, T_DIRECTORY, T_FILE, T_PIPE} INDEX_NODE_TYPE;

// Single index node
typedef struct
{
  // Type of INDEX_NODE
  INDEX_NODE_TYPE type;

  // Number of directory references to this index node
  unsigned char references;

  // Contents.  UNALLOCATED_BLOCK means that this entry is not used
  BLOCK_REFERENCE content;

  // File: size in bytes; Directory: number of directory entries
  //  (including . and ..)
  unsigned int size;
} INDEX_NODE;

// Number of index nodes stored in each block
#define N_INDEX_NODES_PER_BLOCK ((int)(DATA_BLOCK_SIZE/sizeof(INDEX_NODE)))

// Block of index_nodes
typedef struct 
{
  INDEX_NODE index_node[N_INDEX_NODES_PER_BLOCK];
} INDEX_NODE_BLOCK;


/**********************************************************************/
// Block 0: Volume block
#define VOLUME_BLOCK_REFERENCE 0

typedef struct 
{
  int n_blocks;                     // Total number of blocks
  int n_allocated_blocks;           // Allocated == used
  int n_allocated_index_nodes;
  // 8 blocks per byte: One block per bit: 1 = allocated, 0 = free
  // Block 0 (zero) is byte 0, bit 0 
  //       1        is byte 0, bit 1
  //       8        is byte 1, bit 0
  unsigned char block_allocation_table[(MAX_BLOCKS+7)>>3];
} VOLUME_BLOCK;

/**********************************************************************/
// Single directory element
typedef struct
{
  // Name of file/directory
  char name[FILE_NAME_SIZE];

  // UNALLOCATED_INDEX_NODE if this directory entry is non-existent
  INDEX_NODE_REFERENCE index_node_reference;

} DIRECTORY_ENTRY;

// Number of directory entries stored in one data block
#define N_DIRECTORY_ENTRIES_PER_BLOCK ((int)(DATA_BLOCK_SIZE / sizeof(DIRECTORY_ENTRY)))

// Maximum number of files that can be contained in a directory (note, a directory can span multiple blocks)
#define MAX_ENTRIES_PER_DIRECTORY (N_DIRECTORY_ENTRIES_PER_BLOCK * 10)

// Directory block
typedef struct directory_block_s
{
  DIRECTORY_ENTRY entry[N_DIRECTORY_ENTRIES_PER_BLOCK];
} DIRECTORY_BLOCK;

/**********************************************************************/
// All-encompassing structure for a disk block
// The union says that all 4 of these elements occupy overlapping bytes in 
//  memory (hence, a block will only be one of these 4 at any given time)

typedef struct
{
  // Next block in a linked list (if this block belongs to one)
  BLOCK_REFERENCE next;
  union {
    DATA_BLOCK data;
    VOLUME_BLOCK volume;
    INDEX_NODE_BLOCK index_nodes;
    DIRECTORY_BLOCK directory;
  } content;
} BLOCK;


/**********************************************************************/
// Representing files (project 4!)

#define MAX_BLOCKS_IN_FILE 1000

typedef struct
{
  INDEX_NODE_REFERENCE index_node_reference;
  char mode;
  int offset;

  // Cache for file content details.  Use of these is optional
  int n_data_blocks;
  BLOCK_REFERENCE block_reference_cache[MAX_BLOCKS_IN_FILE];
} MYFILE;


#endif
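myfs_stats reports BLOCK_SIZE 256, DATA_BLOCK_SIZE 252, 21 index nodes and 7 directory entries per block, and MAX_BLOCKS 1920. These sizes follow from the definitions above. The sketch below copies the relevant types and adds C11 _Static_assert checks, so that a layout mistake fails at compile time. It assumes a typical x86-64 Linux build with gcc, with 4-byte enums and ints and 2-byte unsigned shorts. The union inlines the arrays instead of using the header's wrapper structs; the resulting size is the same.

```c
#include <stddef.h>

/* Copied from the provided headers so this check compiles on its own;
   MAX_BLOCKS is the value myfs_stats reports. */
#define BLOCK_SIZE 256
#define MAX_BLOCKS 1920
typedef unsigned short BLOCK_REFERENCE;
typedef unsigned short INDEX_NODE_REFERENCE;
#define DATA_BLOCK_SIZE ((int)(BLOCK_SIZE-sizeof(int)))
#define FILE_NAME_SIZE ((int)(32 - sizeof(INDEX_NODE_REFERENCE)))

typedef enum {T_UNUSED=0, T_DIRECTORY, T_FILE, T_PIPE} INDEX_NODE_TYPE;

typedef struct {
  INDEX_NODE_TYPE type;
  unsigned char references;
  BLOCK_REFERENCE content;
  unsigned int size;
} INDEX_NODE;

typedef struct {
  char name[FILE_NAME_SIZE];
  INDEX_NODE_REFERENCE index_node_reference;
} DIRECTORY_ENTRY;

typedef struct {
  int n_blocks;
  int n_allocated_blocks;
  int n_allocated_index_nodes;
  unsigned char block_allocation_table[(MAX_BLOCKS+7)>>3];
} VOLUME_BLOCK;

#define N_INDEX_NODES_PER_BLOCK ((int)(DATA_BLOCK_SIZE/sizeof(INDEX_NODE)))
#define N_DIRECTORY_ENTRIES_PER_BLOCK ((int)(DATA_BLOCK_SIZE / sizeof(DIRECTORY_ENTRY)))

/* Same size as the header's BLOCK; the arrays are inlined instead of being
   wrapped in DATA_BLOCK / INDEX_NODE_BLOCK / DIRECTORY_BLOCK structs. */
typedef struct {
  BLOCK_REFERENCE next;
  union {
    unsigned char data[DATA_BLOCK_SIZE];
    VOLUME_BLOCK volume;
    INDEX_NODE index_node[N_INDEX_NODES_PER_BLOCK];
    DIRECTORY_ENTRY entry[N_DIRECTORY_ENTRIES_PER_BLOCK];
  } content;
} BLOCK;

/* C11 compile-time checks: the volume block must fit in the data area, and a
   BLOCK must occupy exactly one disk block. */
_Static_assert(sizeof(VOLUME_BLOCK) <= (size_t)DATA_BLOCK_SIZE,
               "VOLUME_BLOCK does not fit in DATA_BLOCK_SIZE");
_Static_assert(sizeof(BLOCK) == BLOCK_SIZE, "BLOCK is not BLOCK_SIZE bytes");
```

On that target, the sizes work out as follows:

  • INDEX_NODE: 4 (type) + 1 (references) + 1 (padding) + 2 (content) + 4 (size) = 12 bytes, so 252 / 12 = 21 per block.
  • DIRECTORY_ENTRY: 30 + 2 = 32 bytes, so 252 / 32 = 7 per block.
  • VOLUME_BLOCK: 3 × 4 + 240 = 252 bytes.
  • BLOCK: next (2) + padding (2) + the 252-byte union = 256 bytes.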


Environment Variables

We use two environment variables to provide important context to the executables. We provide a function that reads these environment variables and fills in reasonable default values if they are not set.


MYFS Application Programs

The high-level implementation of all of the MYFS application programs is given. However, you will be responsible for implementing the underlying system calls.

myfs_format [size]

Initialize the entire file system with a total of size blocks (size is optional).

myfs_list [name]

myfs_mkd name

Create a new directory. The behavior is:

myfs_rmd name

Remove a directory. The behavior is:


Example Interaction

Note that below, you are seeing a mixture of the commands that are typed and the outputs from the programs. Also note that all error messages are printed to STDERR.

Simple Interaction

$ ./myfs_format 128
$ ./myfs_list
../
./
$ ./myfs_mkd a
$ ./myfs_list
../
./
a/
$ ./myfs_mkd ab
$ ./myfs_list
../
./
a/
ab/
$ ./myfs_list a
../
./
$ ./myfs_list /a
../
./
$ ./myfs_mkd a/abc
$ ./myfs_list a
../
./
abc/
$ ./myfs_list /a
../
./
abc/
$ ./myfs_list /a/abc
../
./
$ ./myfs_list /a/def
Not found
$ ./myfs_rmd a
Directory not empty
$ ./myfs_rmd a/abc
$ ./myfs_rmd a
$ ./myfs_list
../
./
ab/
$ ./myfs_list ab
../
./

Interaction with FS Checks

Using myfs_inspect, you can examine the details of your file system at the block and index node levels. It is a useful tool for debugging your code.

$ ./myfs_stats
BLOCK_SIZE: 256
DATA_BLOCK_SIZE: 252
VOLUME_BLOCK_SIZE: 252
INDEX_NODE_BLOCK_SIZE: 252
BLOCK_REFERENCE_SIZE: 2
DIRECTORY_BLOCK_SIZE: 224
MAX_BLOCKS: 1920
UNALLOCATED_BLOCK reference: 65534
UNALLOCATED_INDEX_NODE reference: 65535
DATA_BLOCK_SIZE: 252
INDEX_NODES_PER_BLOCK: 21
DIRECTORY_ENTRIES_PER_BLOCK: 7
$ ./myfs_format 128
$ ./myfs_inspect -help
Usage:
myfs_inspect -volume		 Show the volume block
myfs_inspect -help		 Print this help
myfs_inspect -index <#>		 Print contents of INDEX_NODE #
myfs_inspect -iblock <#>		 Print index node contents of block #
myfs_inspect -dir <#>	 Print the contents of directory block #
myfs_inspect -block <#>		 Print the top-level block data for block #
myfs_inspect -data <#>		 Print the raw data contents for the block (including printable characters)
$ ./myfs_inspect -volume
N_BLOCKS: 128
N_ALLOCATED_BLOCKS: 3
N_ALLOCATED_INDEX_NODES: 1
Block allocation table:
00: 07
01: 00
02: 00
03: 00
04: 00
05: 00
06: 00
07: 00
08: 00
09: 00
10: 00
11: 00
12: 00
13: 00
14: 00
15: 00
$ ./myfs_inspect -index 0
Index node: 0
Type: DIRECTORY
Nreferences: 1
Content block: 2
Size: 2
$ ./myfs_inspect -dir 2
Directory at block 2:
Entry 0: name=".", index_node=0
Entry 1: name="..", index_node=0
Next block: 65534
$ ./myfs_inspect -iblock 1
Relative Index Node 0	DIRECTORY	Nref=1	Content=2	size=2
Next block: 65534
$ ./myfs_mkd foo
$ ./myfs_inspect -volume
N_BLOCKS: 128
N_ALLOCATED_BLOCKS: 4
N_ALLOCATED_INDEX_NODES: 2
Block allocation table:
00: 0f
01: 00
02: 00
03: 00
04: 00
05: 00
06: 00
07: 00
08: 00
09: 00
10: 00
11: 00
12: 00
13: 00
14: 00
15: 00
$ ./myfs_inspect -dir 2
Directory at block 2:
Entry 0: name=".", index_node=0
Entry 1: name="..", index_node=0
Entry 2: name="foo", index_node=1
Next block: 65534
$ ./myfs_inspect -index 1
Index node: 1
Type: DIRECTORY
Nreferences: 1
Content block: 3
Size: 2
$ ./myfs_inspect -dir 3
Directory at block 3:
Entry 0: name=".", index_node=1
Entry 1: name="..", index_node=0
Next block: 65534
$ ./myfs_mkd foo/bar
$ ./myfs_inspect -volume
N_BLOCKS: 128
N_ALLOCATED_BLOCKS: 5
N_ALLOCATED_INDEX_NODES: 3
Block allocation table:
00: 1f
01: 00
02: 00
03: 00
04: 00
05: 00
06: 00
07: 00
08: 00
09: 00
10: 00
11: 00
12: 00
13: 00
14: 00
15: 00
$ ./myfs_inspect -index 0
Index node: 0
Type: DIRECTORY
Nreferences: 1
Content block: 2
Size: 3
$ ./myfs_inspect -dir 2
Directory at block 2:
Entry 0: name=".", index_node=0
Entry 1: name="..", index_node=0
Entry 2: name="foo", index_node=1
Next block: 65534
$ ./myfs_inspect -index 1
Index node: 1
Type: DIRECTORY
Nreferences: 1
Content block: 3
Size: 3
$ ./myfs_inspect -dir 3
Directory at block 3:
Entry 0: name=".", index_node=1
Entry 1: name="..", index_node=0
Entry 2: name="bar", index_node=2
Next block: 65534
$ ./myfs_inspect -index 2
Index node: 2
Type: DIRECTORY
Nreferences: 1
Content block: 4
Size: 2
$ ./myfs_inspect -dir 4
Directory at block 4:
Entry 0: name=".", index_node=2
Entry 1: name="..", index_node=1
Next block: 65534
$ ./myfs_list
../
./
foo/
$ ./myfs_list foo
../
./
bar/
$ ./myfs_list foo/bar
../
./
$ ./myfs_inspect -iblock 1
Relative Index Node 0	DIRECTORY	Nref=1	Content=2	size=3
Relative Index Node 1	DIRECTORY	Nref=1	Content=3	size=3
Relative Index Node 2	DIRECTORY	Nref=1	Content=4	size=2
Next block: 65534
$ ./myfs_mkd baz
$ ./myfs_list
../
./
baz/
foo/
$ ./myfs_inspect -iblock 1
Relative Index Node 0	DIRECTORY	Nref=1	Content=2	size=4
Relative Index Node 1	DIRECTORY	Nref=1	Content=3	size=3
Relative Index Node 2	DIRECTORY	Nref=1	Content=4	size=2
Relative Index Node 3	DIRECTORY	Nref=1	Content=5	size=2
Next block: 65534
$ ./myfs_inspect -index 0
Index node: 0
Type: DIRECTORY
Nreferences: 1
Content block: 2
Size: 4
$ ./myfs_inspect -dir 2
Directory at block 2:
Entry 0: name=".", index_node=0
Entry 1: name="..", index_node=0
Entry 2: name="foo", index_node=1
Entry 3: name="baz", index_node=3
Next block: 65534
$ ./myfs_inspect -volume
N_BLOCKS: 128
N_ALLOCATED_BLOCKS: 6
N_ALLOCATED_INDEX_NODES: 4
Block allocation table:
00: 3f
01: 00
02: 00
03: 00
04: 00
05: 00
06: 00
07: 00
08: 00
09: 00
10: 00
11: 00
12: 00
13: 00
14: 00
15: 00
$ ./myfs_rmd foo/bar
$ ./myfs_inspect -volume
N_BLOCKS: 128
N_ALLOCATED_BLOCKS: 5
N_ALLOCATED_INDEX_NODES: 3
Block allocation table:
00: 2f
01: 00
02: 00
03: 00
04: 00
05: 00
06: 00
07: 00
08: 00
09: 00
10: 00
11: 00
12: 00
13: 00
14: 00
15: 00
$ ./myfs_inspect -iblock 1
Relative Index Node 0	DIRECTORY	Nref=1	Content=2	size=4
Relative Index Node 1	DIRECTORY	Nref=1	Content=3	size=2
Relative Index Node 3	DIRECTORY	Nref=1	Content=5	size=2
Next block: 65534
$ ./myfs_list foo
../
./
$ ./myfs_inspect -dir 3
Directory at block 3:
Entry 0: name=".", index_node=1
Entry 1: name="..", index_node=0
Next block: 65534
$ ./myfs_rmd foo
$ ./myfs_inspect -volume
N_BLOCKS: 128
N_ALLOCATED_BLOCKS: 4
N_ALLOCATED_INDEX_NODES: 2
Block allocation table:
00: 27
01: 00
02: 00
03: 00
04: 00
05: 00
06: 00
07: 00
08: 00
09: 00
10: 00
11: 00
12: 00
13: 00
14: 00
15: 00
$ ./myfs_list
../
./
baz/
$ ./myfs_inspect -iblock 1
Relative Index Node 0	DIRECTORY	Nref=1	Content=2	size=3
Relative Index Node 3	DIRECTORY	Nref=1	Content=5	size=2
Next block: 65534
$ ./myfs_inspect -dir 2
Directory at block 2:
Entry 0: name=".", index_node=0
Entry 1: name="..", index_node=0
Entry 3: name="baz", index_node=3
Next block: 65534
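The removal bookkeeping visible in this transcript can be modeled in memory:

  • myfs_rmd foo/bar clears bit 4 (table byte 3f becomes 2f) and decrements both allocation counters.
  • myfs_rmd foo then clears bit 3 (2f becomes 27).

The sketch below is a hypothetical model, not the required API:

  • VOLUME_MODEL stands in for the relevant VOLUME_BLOCK fields.
  • rmd_precheck encodes the "Directory not empty" rule using the index node's size, which counts . and ..; refusing to remove the root is an assumption.
  • release_directory frees every block in the removed directory's list, since an empty directory may still own more than one block.

Marking the index node T_UNUSED, clearing the parent's entry and decrementing the parent's size are left out.

```c
/* Hypothetical stand-in for the VOLUME_BLOCK fields that removal touches
   (16 table bytes = 128 blocks, as in the transcript). */
typedef struct
{
  int n_allocated_blocks;
  int n_allocated_index_nodes;
  unsigned char block_allocation_table[16];
} VOLUME_MODEL;

typedef enum { RMD_OK = 0, RMD_IS_ROOT, RMD_NOT_DIRECTORY, RMD_NOT_EMPTY } RMD_CHECK;

/* 'size' is the directory index node's size field, which counts "." and
   "..", so an empty directory has size 2. */
static RMD_CHECK rmd_precheck(unsigned short index_node_ref, int is_directory,
                              unsigned int size)
{
  if (index_node_ref == 0)      /* ROOT_DIRECTORY_INDEX_NODE (assumed rule) */
    return RMD_IS_ROOT;
  if (!is_directory)
    return RMD_NOT_DIRECTORY;
  if (size > 2)
    return RMD_NOT_EMPTY;       /* reported as "Directory not empty" */
  return RMD_OK;
}

/* Free every block in the removed directory's list, then its index node. */
static void release_directory(VOLUME_MODEL *v, const unsigned short blocks[],
                              int n_blocks)
{
  for (int i = 0; i < n_blocks; ++i) {
    unsigned int b = blocks[i];
    v->block_allocation_table[b >> 3] &= (unsigned char)~(1u << (b & 7));
    v->n_allocated_blocks--;
  }
  v->n_allocated_index_nodes--;
}
```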


Checklist

Write / Implement the following files / functions.

Supporting Materials

MYFS API (Library)

The documentation for the following can be found in the skeleton file. Your contributions are in bold.

MYFS API Support

These helper functions are to be used only by the API. The documentation for the following can be found in the skeleton file, and your contributions are in bold. Take the time to read and understand the functions that we provide: they are good examples of how to do things and will help a lot.


Submitting Your Program


Grading Criteria

For this project, we have a sequence of deadlines. They are:


Downloads

The following file contains several header and skeleton C files: project3_skel.tar. These skeleton files are good starting points for the source files that you must implement.


Hints


Addenda


andrewhfagg at gmail.com

Last modified: Wed Nov 18 14:40:26 2020