TP — File system

All the sources are available in the part-3 tarball.

The C files are located in directory part-3/tp-file-system/material/.

The general specifications of the POSIX functions are available on the Open Group website.

Overview

The aim of this lab is to learn how to use Unix system calls to manage files.

We first study how to execute basic operations such as open(), read(), write(), lseek() and close(). We implement a program to dump a binary file and then another one to edit a binary file.

Second, we manipulate files using shell commands such as ls, ln and chmod.

Files

Here is a brief summary of the main system calls used to manipulate files.

#include <fcntl.h>

int open(const char *path, int oflag, ... );

open opens a file located at path and returns an integer file descriptor to refer to it later. This descriptor allows us to perform other operations such as reading and writing without using the file name.

oflag specifies the operations allowed on the file. Its value is built by combining flags with a bitwise-inclusive OR. Applications must specify exactly one of the following three file access modes:

  • O_RDONLY
    • Open for reading only.
  • O_WRONLY
    • Open for writing only.
  • O_RDWR
    • Open for reading and writing.

Several other flags can be combined with them.

In particular, O_CREAT creates the file if it does not already exist. If the file already exists, O_CREAT has no effect and the existing file is opened.

If oflag is O_CREAT | O_WRONLY, the file is created if it does not already exist and it is opened for writing only.

If the file is created, the third argument of open() gives its permission mode. The following code creates a file (if it does not exist) with permission mode 0644 (in octal), i.e., read and write permissions for the user and read permission for the group and others. The mode actually applied is the requested mode with the bits set in the process umask cleared.

fd = open("my_new_file.txt", O_CREAT | O_RDWR, 0644);

If the returned value of open() is -1, an error occurred and the file has not been opened.

Once all read and write operations have been performed on the file, the user can close it with close(). Open files are closed anyway when the process ends.

To delete a file, call unlink() on its path.
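As an illustration of the calls described above, here is a minimal sketch of opening a file with O_CREAT and checking the returned value. The helper name open_or_create and the file name in the usage comment are hypothetical and are not part of the lab material.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create the file if needed and open it for reading
 * and writing. Returns the file descriptor, or -1 if open() failed. */
int open_or_create(const char *path)
{
    /* 0644 is the requested mode; the process umask may clear some bits. */
    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd == -1) {
        perror("open");   /* print the reason stored in errno */
    }
    return fd;
}

/* Possible usage:
 *     int fd = open_or_create("my_new_file.txt");
 *     if (fd != -1) {
 *         ... read() and write() calls ...
 *         close(fd);
 *     }
 *     unlink("my_new_file.txt");    removes the file again
 */
```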

read() and write()

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t nbyte);
ssize_t write(int fd, const void *buf, size_t nbyte);

The read() function attempts to read up to nbyte bytes from the file associated with the file descriptor fd into the buffer pointed to by buf.

The write() function attempts to write nbyte bytes from the buffer pointed to by buf to the file associated with the file descriptor fd.

Upon successful completion, read() and write() return the number of bytes actually read or written, which may be less than nbyte. A return value of 0 from read() indicates the end of the file. On error, they return -1 and set the global variable errno to indicate the error.

Typically, nbyte is set to the size of the buffer when we read from a file; if the operation succeeds, the returned value indicates how many bytes of the buffer were actually filled.
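The usual way to apply these rules is a loop that reads until read() returns 0 and that repeats write() until the whole chunk has been written. The following is a sketch only; the helper name copy_all and the 4096-byte buffer size are arbitrary choices made for this example.

```c
#include <unistd.h>

/* Hypothetical helper: copy everything from descriptor in to descriptor out.
 * Returns the number of bytes copied, or -1 on error. */
ssize_t copy_all(int in, int out)
{
    char buf[4096];
    ssize_t total = 0;
    ssize_t n;

    /* read() returns 0 at end of file and -1 on error. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        /* write() may write fewer bytes than requested, so loop. */
        while (done < n) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w == -1)
                return -1;
            done += w;
        }
        total += n;
    }
    return n == -1 ? -1 : total;
}
```

For instance, copy_all(fd, STDOUT_FILENO) would print the content of the open file fd on the standard output.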

lseek()

For each open file, the operating system maintains an offset, a non-negative integer counted in bytes from the beginning of the file (offset 0), indicating where in the file the next read or write operation will start. The offset normally lies between 0 and the file length, although lseek() may also move it past the end of the file.

Each read or write operation implicitly advances the file offset by the number of bytes read or written.

off_t lseek(int fd, off_t offset, int whence);

The lseek() system call explicitly changes the position of the file offset for the open file fd, as follows:

  • If whence is SEEK_SET, the file offset shall be set to offset bytes (absolute positioning from the beginning of the file).

  • If whence is SEEK_CUR, the file offset shall be set to its current location plus offset (relative positioning w.r.t. the current file offset).

  • If whence is SEEK_END, the file offset shall be set to the size of the file plus offset (positioning relative to the end of the file).

Upon successful completion, the returned value is the resulting offset, measured in bytes from the beginning of the file. On error, lseek() returns -1. Note that calling lseek() with offset set to 0 and whence set to SEEK_END returns the size of the file.
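The remark about SEEK_END can be sketched as follows. The helper name path_size is hypothetical; it opens the file itself so that the offset of a descriptor used elsewhere is not disturbed.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the size in bytes of the file at path,
 * or -1 if it cannot be opened or is not seekable. */
off_t path_size(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    /* Offset 0 relative to the end of the file: the result is the size. */
    off_t size = lseek(fd, 0, SEEK_END);
    close(fd);
    return size;   /* lseek() itself returns -1 on error */
}
```

In the same way, lseek(fd, 0, SEEK_CUR) returns the current offset without moving it.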

File management in C

Dump a binary file

In this part, we want to dump a binary file, treating its content as a sequence of 32-bit unsigned integers. We output four 32-bit unsigned integers per line, in hexadecimal. Each line starts with the offset in the file of its first integer. If the last 32-bit unsigned integer is incomplete, we skip it. To check the result, you can use the Unix command od. The output should look like this:

> echo "hello world" > test.dat
> ./fs-dump-file test.dat
00000000: 6c6c6568 6f77206f 0a646c72

> od -X test.dat
0000000          6c6c6568        6f77206f        0a646c72                
0000014

  1. Complete the file fs-dump-file.c. Compile it and run it to dump a file.
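As a hint for the output format only, not a complete solution, the sketch below formats one line of the dump from words that have already been read. The helper name format_dump_line and its parameters are hypothetical. It assumes the words were read from the file as raw bytes into uint32_t variables, which on a little-endian machine gives the values shown in the sample above.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format up to four 32-bit words as one dump line,
 * "OFFSET: w0 w1 w2 w3", into out. Returns the length of the line,
 * or -1 if out is too small. */
int format_dump_line(unsigned long offset, const uint32_t *words, size_t n,
                     char *out, size_t outsz)
{
    /* The offset is printed as 8 hexadecimal digits followed by a colon. */
    int len = snprintf(out, outsz, "%08lx:", offset);
    if (len < 0 || (size_t)len >= outsz)
        return -1;

    for (size_t i = 0; i < n && i < 4; i++) {
        size_t room = outsz - (size_t)len;
        int k = snprintf(out + len, room, " %08" PRIx32, words[i]);
        if (k < 0 || (size_t)k >= room)
            return -1;
        len += k;
    }
    return len;
}
```

With offset 0 and the three words 0x6c6c6568, 0x6f77206f and 0x0a646c72, the buffer holds the line shown in the sample session.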

Edit a binary file

Next, we want to edit a binary file, one 32-bit unsigned integer at a time. We start at offset 0. We read the current 32-bit unsigned integer and output its offset between brackets, followed by the integer. We can then:

  • Type <enter> to move on to the next 32-bit unsigned integer, which is output preceded by its offset between brackets.
  • Type x to leave the editor.
  • Enter an integer to replace the current unsigned integer with this new value. The program then re-reads the current unsigned integer from the file and outputs it as before, so that the user can modify it again.

To check the result, you can use the Unix command od. The session should look like this:

> echo "hello world" > test.dat
> ./fs-edit-file test.dat
[00000000] 6c6c6568
value to edit or RET to skip: 
[00000004] 6f77206f
value to edit or RET to skip: 01
[00000004] 00000001
value to edit or RET to skip: 
[00000008] 0a646c72
value to edit or RET to skip: 
[0000000c] <EOF>
value to edit or RET to skip: x

> ./fs-dump-file test.dat
00000000: 6c6c6568 00000001 0a646c72

> od -X test.dat
0000000          6c6c6568        00000001        0a646c72                
0000014

  1. Complete the fs-edit-file.c file. Compile it and run it to modify a binary file as described above.
  2. To obtain the current file offset, use lseek() with offset set to 0 and whence set to SEEK_CUR.
  3. Note that even if you close the file before reaching its end, the file is not truncated.
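The replace-then-re-read step can be sketched as below, as one possible approach rather than the expected solution. The helper name replace_word is hypothetical, and the file is assumed to have been opened with O_RDWR.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the 32-bit word at byte position pos with
 * value, then move the offset back to pos so that the next read() returns
 * the new word. fd must be open for reading and writing (O_RDWR).
 * Returns 0 on success, -1 on error. */
int replace_word(int fd, off_t pos, uint32_t value)
{
    if (lseek(fd, pos, SEEK_SET) == (off_t)-1)
        return -1;
    /* write() advances the offset by 4 bytes ... */
    if (write(fd, &value, sizeof value) != (ssize_t)sizeof value)
        return -1;
    /* ... so we seek back to re-read what was just written. */
    if (lseek(fd, pos, SEEK_SET) == (off_t)-1)
        return -1;
    return 0;
}
```

Overwriting a word in the middle of the file leaves the following bytes and the file size unchanged, which is why the file is not truncated.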

Edit and truncate a binary file

This time, when we exit the program, we want to truncate the file at the current offset. We do not want to load the whole file into memory and write the part to keep back into the file. Instead, we copy the part to keep to a temporary file, delete the original file and rename the temporary file to the original file name. To do that, we use two functions:

int unlink(const char *path);
int rename(const char *old, const char *new);

  • The unlink() function removes a link to a file. If there are no more links and no process has the file open, the file is removed.
  • The rename() function changes the name of a file.
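The copy, delete and rename sequence can be sketched as follows. This is one possible arrangement under stated assumptions, not the reference solution: the helper name truncate_by_copy and the ".tmp" suffix of the temporary file are hypothetical choices.

```c
#include <fcntl.h>
#include <stdio.h>    /* rename(), snprintf() */
#include <unistd.h>

/* Hypothetical helper: keep only the first length bytes of path by copying
 * them to a temporary file, deleting the original and renaming the copy.
 * Returns 0 on success, -1 on error. */
int truncate_by_copy(const char *path, off_t length)
{
    char tmp[4096];
    char buf[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int in = open(path, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(tmp, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }

    int ok = 1;
    off_t left = length;
    while (ok && left > 0) {
        size_t want = left < (off_t)sizeof buf ? (size_t)left : sizeof buf;
        ssize_t r = read(in, buf, want);
        if (r == 0)
            break;                      /* file is shorter than length */
        if (r == -1) {
            ok = 0;
            break;
        }
        ssize_t done = 0;
        while (done < r) {              /* handle partial writes */
            ssize_t w = write(out, buf + done, (size_t)(r - done));
            if (w == -1) {
                ok = 0;
                break;
            }
            done += w;
        }
        left -= r;
    }
    close(in);
    if (close(out) == -1)
        ok = 0;

    if (!ok) {
        unlink(tmp);                    /* do not leave a partial copy */
        return -1;
    }
    /* Delete the original, then give the copy its name. */
    if (unlink(path) == -1 || rename(tmp, path) == -1)
        return -1;
    return 0;
}
```

In the editor, length would be the current offset obtained with lseek() when the user types x.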

To check the result, you can try the following scenario:

> echo "hello world" > test.dat
> ./fs-edit-file test.dat
[00000000] 6c6c6568
value to edit or RET to skip: 
[00000004] 6f77206f
value to edit or RET to skip: 
[00000008] 0a646c72
value to edit or RET to skip: x

> ./fs-dump-file test.dat
00000000: 6c6c6568 6f77206f

> od -X test.dat
0000000          6c6c6568        6f77206f                                
0000010

  1. Complete the fs-edit-file.c file. Compile it and run it to edit and truncate a binary file as described above.

File management with Shell

Access modes

  1. Create a directory private and change its access modes so that the user can read, write and execute it, while the group and others can only execute it.
  2. In private, create a directory with a name that is hard to guess (e.g., 000_ooo_000).
  3. Move to the initial directory and execute ls private.
  4. Change the access modes of private so that the user can only execute it, then execute ls private again.
  5. Create a directory hierarchy so that only your friends will be able to see your vacation photos, thanks to a secret you share with them.

Inodes

mkdir tmp
ls -il .
cd tmp
touch my_orig_file

  1. Create a new directory tmp and move to it.
  2. Create a file my_orig_file in this directory.
  3. Create a symbolic link to my_orig_file named my_symb_link.
  4. Execute ls -il. What does -i change in the output of ls?
  5. Create a hard link to my_orig_file named my_hard_link.
  6. Execute ls -il. What is the difference?
  7. Rename my_orig_file to my_new_file with mv.
  8. Execute ls -il. What happens to my_symb_link?
  9. Remove my_new_file.
  10. Execute ls -il. What happens to my_hard_link?