Homework — File system

All the sources are available in the part-3 tarball.

The C files are located in directory part-3/homework-file-system/material/.

The general specifications of the POSIX functions are available at OpenGroup website.

Overview

In this homework, we study file system operations. To illustrate that, we implement concurrent sorted singly linked lists that can be easily stored in a file.

Sorted linked lists are very common in very large information systems (big data). When many processes read such a structure and very few modify it, it can be interesting to let these readers access the structure in parallel. However, even if very few processes modify the structure, we still need to ensure the consistency of the list.

Three common approaches to solve this problem are: coarse-grain locking (we lock the entire list, but this leads to very poor performance as it prevents parallelism) and fine-grain locking (we lock only the previous and next nodes, between which we insert a new node). We shall focus on the fine-grain locking.

A sorted singly linked array list

In this homework, we study concurrent sorted singly linked lists that can be easily stored in a file.

  1. Our first objective will be to implement a sorted singly linked list as an array of dynamic size stored in memory space.
  2. The second step will be to store the array in a file instead of memory space. This will enable us to exercise file system syscalls.
  3. Finally, we protect the values in the list from concurrent access using fine-grained file locking mechanisms.

We provide incomplete codes to implement a sorted singly linked list.

  • fs-linked-list-main.c defines a main test procedure common to both the memory or the file implementation.
  • fs-common-linked-list.c defines functions common to both the memory and the file implementation.
  • fs-mem-linked-list.c defined functions specific to the memory implementation.
  • fs-file-linked-list.c defined functions specific to the file implementation.

Our singly linked list consists of a chain of nodes containing a value (32-bits signed integer) and a reference to the next node. The first node is a dummy node that represents the head of the list. This node contains a value attribute whose value is -oo and a next attribute which is actually the first real node in the list. If the next attribute of the dummy node is NONE, then the list is empty. The following figure illustrates the role of the dummy node.

Sorted singly linked list

A singly linked list stored as an array in memory

We first focus on the implementation of the linked list stored in memory. We choose to implement this singly linked list as an array. This will ease the implementation of the linked list when we shall store it in a file. We choose the following structure:

typedef struct _node_t {
  value_t value;
  index_t next;
} node_t;

typedef struct _list_t {
  node_t *nodes;
} list_t;

This results in the following architecture:

Sorted singly linked list (step 1)

The dummy node has always the index 0, and in this example, the first node has index 2 and value 12. If we add value 17, as this value does not exist, we extend the array and set the attribute value of this new node to 17 and the attribute next to node 3 which has the next higher value 19.

Sorted singly linked list (step 2)

Once we have extended the array, we link the new node to its previous node, here node 1 to the new node 5.

Sorted singly linked list (step 3)

Allocate a new node to a singly linked array list stored in memory

  1. Complete allocate_node() in fs-mem-linked-list.c to add a new node of attributes value and next to the existing array list (step 2). We remind you two memory allocation system calls malloc and realloc.
#include <stdlib.h>

void *malloc(size_t size);
void *realloc(void *ptr, size_t size);
  1. Build fs-mem-linked-list-main and run it:
$ make fs-mem-linked-list-main
$ ./fs-mem-linked-list-main 1 4

The first command line argument is the number of processes that will add (random) values to the list, and the second is the number of values to be added per process. At this stage, use only one process to avoid concurrent accesses.

  1. Check that multiple identical values are stored only once. For instance, in the scenario below, value 2 is produced twice but stored once, and 3, 2 and 6 are well sorted in the list.
$ ./fs-mem-linked-list-main 1 4
Process 82704 insert node 3
Process 82704 [ 3]
Process 82704 insert node 2
Process 82704 [ 2 3]
Process 82704 insert node 6
Process 82704 [ 2 3 6]
Process 82704 insert node 2
Process 82704 [ 2 3 6]

A singly linked list stored in a file

In this section, we shall be reviewing our implementation of the array list by storing it not in a memory space but in a file. We complete now fs-file-linked-list.c. We have now a redefinition of the previous structures:

typedef struct _node_t {
  value_t value;
  index_t next;
} node_t;

typedef struct _list_t {
  int file;
} list_t;

Allocate a new node to a singly linked array list stored in a storage file

  1. Implement the allocate_node() function in file fs-file-linked-list.c using lseek() and write() system calls. This time, we shall extend the file, attribute of the list, as we did in step 2 of the previous implementation when the list was stored in memory. But note we can not access the i-th array slot easily, we have first to set the file index to the end of the file and only then append the new node. For this, we suggest to use lseek().

Before testing complete the next section.

Read and write nodes of the array list stored in a storage file

  1. Implement the read_node() and write_node() functions using lseek(), read() and write() system calls. Note that to read or write to a given node of the array, we must first move the file index to the start of the node.
off_t lseek(int fd, off_t offset, int whence);
size_t write(int fd, const void *buf, size_t nbyte);
size_t read(int fd, void *buf, size_t nbyte);

The functions have already been described in the file system lab session.

Test allocate_node(), read_node() and write_node()

  1. Build fs-file-linked-list-main and run it:
$ make fs-file-linked-list-main
$ ./fs-file-linked-list-main 1 4

A typical run would look like this:

$ ./fs-file-linked-list-main 1 4 
Process 19573 insert node 1
Process 19573 [ 1]
Process 19573 insert node 8
Process 19573 [ 1 8]
Process 19573 insert node 6
Process 19573 [ 1 6 8]
Process 19573 insert node 2
Process 19573 [ 1 2 6 8]
> od -x storage.dat
0000000      0000    8000    0001    0000    0001    0000    0004    0000
0000020      0008    0000    ffff    ffff    0006    0000    0002    0000
0000040      0002    0000    0003    0000                                
0000050

The storage.dat file contains the array. From the dump of od, we deduce the following: "0000 8000 0001 0000" indicates that the dummy node has value "80000000" (-oo) and has node 1 as next node. Node 1 contains "0001 0000 0004 0000", that it has 1 for value and 4 as next node. Node 4 has value 2 and node 3 as next node. Node 3 has value 6 and node 2 as next node. Node 2 has value 8 and -1 as next node, so is the end of the list.

A protected linked list stored in a file

So far, we implemented a singly linked array list stored in a file. However, this list is not yet protected against concurrent accesses. Remember that we did not launch more than 1 process in our tests.

To protect this list against concurrent accesses, there are several approaches:

  • The coarse grained locking consists in locking the whole list. For example, we can lock the dummy node the entire time we access the list, even if we are only modifying one node. However, this induces very poor performances as we prevent any parallelism.
  • The fine-grained locking consists in locking only the previous and next nodes between which we insert a new node. This is the approach we shall study.

We describe the algorithm on the figure below. We want to add value 17. We start by locking node 0 (head node) and then its next node, node 2 of value 12 (in red on the figure).

Protected singly linked list

As value 17 is not between -oo and 12, we move our locks from one node, we unlock node 0 and lock the next node of node 2, node 4 of value 14.

Protected singly linked list

As value 17 is not between 12 and 14, we move our locks from one node, we unlock node 2 and lock the next node of node 4, node 1 of value 15.

Protected singly linked list

Again, as value 17 is not between 14 and 15, we move our locks from one node, we unlock node 4 and lock the next node of node 1, node 3 of value 19.

Protected singly linked list

As value 17 is between 15 and 19, we allocate new node 5 of value 17.

Protected singly linked list

Once node 1 has updated its next node to node 5, we unlock nodes 1 and 3.

Protected singly linked list

Inconsistencies on concurrent access

  1. Explore fs-file-linked-list.c, in particular create_list() and open_list() and identify their functional differences. Explore fs-file-linked-list-main.c, in particular identify where create_list() and open_list() are invoked. Why do we have two distinct functions beyond their functional differences, create_list() being invoked in the main process and open_list() being invoked separately in max_processes processes ? Why is open_list() not invoked in the main process? Why would it be incorrect to open the file in the main process? Devise a scenario to highlight a possible problem.

  2. Run fs-file-linked-list-main several times to detect any inconsistencies in the list construction.

We introduce delays to force concurrent insertions to interfere. So do not be surprized if the runs get slower when the list is long. The longer the list, the longer the delays.

An example is given below. Note that the value 5 is requested for insertion in the list, but is never actually inserted. Also note that process 85355 attempts to insert value 8 without success, but when process 85356 also attempts to insert value 8, it works. This is due to inconsistencies caused by concurrent access.

$ ./fs-file-linked-list-main 2 4
Process 85355 insert node 5
Process 85356 insert node 0
Process 85355 [ 0]
Process 85355 insert node 8
Process 85356 [ 0]
Process 85356 insert node 6
Process 85355 [ 0 6]
Process 85355 insert node 0
Process 85356 [ 0 6]
Process 85356 insert node 7
Process 85355 [ 0 6]
Process 85355 insert node 2
Process 85356 [ 0 6 7]
Process 85356 insert node 8
Process 85355 [ 0 2 6 7]
Process 85356 [ 0 2 6 7 8]

Protect list against concurrent access

We already included the calls to functions lock_node() and unlock_node(). However, these two functions have not been implemented.

  1. Implement functions lock_node() and unlock_node(). Run fs-file-linked-list-main several times to detect any inconsistencies in the list construction.