TP — Virtual memory

Virtual object storage
Shared memory

All the sources are available in the part-3 tarball.

The C files are located in directory part-3/tp-virtual-memory/material/.

The general specifications of the POSIX functions are available at OpenGroup website.

DO NOT COMPILE WITH -std=c11 - signal.h is broken on Linux

Overview

In this lab, we first study a very useful component, the virtual object storage. In this work, we implement a virtual object storage similar to the virtual memory space you studied in the course. Virtual memory gives the illusion that the memory available to the process is very large, usually much larger than the physical memory. Here, we mimic the same mechanism, but instead of manipulating pages, we manipulate objects. This part assumes you worked on the first sections of the homework on signals.

Then, to continue with memory management, we shall return to the homework on semaphores and improve our implementation of the blocking array queue. In fact, we have been using files to store shared variables and arrays. This time, we shall simplify our implementation by using shared memory. This part assumes you worked on the first section of the homework on semaphores.

Virtual object storage

In the first part, we shall reuse file system calls studied in the file system lab. In particular, we shall reuse open(), read(), write() or lseek(). If you need to, feel free to revise that lab.

In this work, we implement a virtual object storage similar to the virtual memory space you studied in the course. Virtual memory gives the illusion that the memory available to the process is very large, usually much larger than the physical memory. Here, we mimic the same mechanism, but instead of manipulating pages, we manipulate objects.

We assume that the designer manages a large number of objects that cannot all be in memory simultaneously. Instead, some of them will be located in memory space of limited size, and the rest will be stored in a swap file. When the designer wants to use an object, they will try to access it directly via its address in memory.

If the object is actually present, it can be manipulated via its memory address. Otherwise, its address in memory will be zero, and a memory fault studied in a previous homework will occur. We shall catch this fault to fix the problem and make the object available in memory. If there is no memory left to store this object (we have a memory space of limited size), the implementation first finds an object to store in the swap file and free the memory space it occupied. This space is then reused to store the missing initial object in memory. We then repeat the operation that had failed, since access to the object has now become possible.

Virtual storage

In the following example, the user accesses objects via vir_obj_tab. Some of these objects are in memory, those in phy_obj_tab. For example, virtual object 2, vir_obj_tab[2], is stored at address #130. How does the implementation achieve this?

It stores a mapping table vir_to_phy_map which translates virtual objects into physical object. Here, the mapping table indicates that virtual object whose vir_obj_id is 2 is mapped onto physical object (in memory) whose phy_obj_id is 3. This means that virtual object 2 is physically stored in phy_obj_tab[3], which physical address is #130. Thus, vir_obj_tab[2] is stored at address #130.

Conversely, in the mapping table, virtual object 1 has phy_obj_id set to -1 (no physical object id). This means it is not mapped onto a physical object (in memory) and if the program wants to access it in memory, it must find a free location in phy_obj_tab to store it from the disk swap file, i.e. from vir_obj_file.

We implement this system step by step. A draft of implementation is provided in mem-virtual-objects.c.

Initialize the virtual objects in the swap file (1)

In init_obj_tab(), create and initialize a swap file vir_obj_file to store the virtual objects. The objects (cartesian points) should be initialised with zero values (use initial_object)
Compile and run. The main program asks you to modify an object giving its virtual object id. It first reads it from memory (vir_obj_tab) and then proposes to update it. As no object is yet in memory, accessing any object should raise an error. Such an error should ring a bell for you since we saw this issue in a previous homework.

Initialize some virtual objects as physical objects (2)

Until now, the memory space allocated for storing objects in physical memory phy_obj_tab has not been used. We shall refer to objects in physical memory as physical objects for short. We want to initialize the max_phy_obj physical objects so that they refer to the first max_phy_obj virtual objects.
Update init_obj_tab(). Note that for a virtual object vir_obj_id, phy_obj_tab[vir_obj_id] must be initialized with an initial object (0, 0), vir_obj_tab[vir_obj_id] must point to the physical object and the mapping vir_to_phy_map[vir_obj_id] between virtual and physical objects must also be updated.
Compile and run. When the main program asks you to modify an object by giving its virtual object id, check that the first virtual objects are accessible and that the last ones are not, as in this example:

$ ./mem-virtual-objects 
vir obj id> 1
vir_obj_tab[1] = (0, 0)
x = 1
y = 1
vir_obj_tab[1] = (1, 1)
vir obj id> 4
[1]  segmentation fault  ./mem-virtual-objects

Dump virtual object table (3)

We want to display the contents of the virtual object table. That is, for each virtual object, the id of the physical object corresponding to the virtual object (possibly none) and its access time (a counter for the number of accesses to virtual objects). The purpose of this function is to check that for the next steps, the implementation is correct. Complete the dump_vir_obj_tab() function.
Compile and run it. The output should be the following. The first integer indicates the physical object mapping the virtual object (if none then -1) and the second integer the last access time. For instance, [03 04] means that the object is mapped on the fourth physical object (03) and has a last access time of 4 (04).

$ ./mem-virtual-objects
[00 00][01 00][02 00][03 00][-1 00][-1 00][-1 00][-1 00]
vir obj id>

Catch object fault (4)

Until now, we do not catch access to a virtual object missing in memory. Update set_obj_fault_handler() function in order to call handle_obj_fault() function when an oject fault occurs.
Compile and run it. This implementation is similar to the ones done in the signal homework.

$ ./mem-virtual-objects
[00 00][01 00][02 00][03 00][-1 00][-1 00][-1 00][-1 00]
vir obj id> 4
[00 00][01 00][02 00][03 00][-1 00][-1 00][-1 00][-1 00]
no replaced object

Implement a LRU replacement strategy (5)

As we are now intercepting the object fault, we want to find a physical object that can host the virtual object that is not associated with a physical object. Complete the handle_obj_fault() function:
Find a virtual object that (a) is already mapped to a physical object and (b) has the least recent access time. In the following, it will be called the replaced virtual object.
Save the replaced object at its excepted location in the swap file. Remember the file system calls described in the previous homework.
Load the missing virtual object in memory into the physical object that corresponded to the replaced virtual object.
Compile and run. Try to access the virtual object 04. Note that we haven't finished yet, as access will still fail. This is because the physical address of the missing virtual object is stored in a register, has a valur of NULL and has not been modified. We therefore have this "not yet ready" message to alert you to this point. We'll fix this in the next question.

$ ./mem-virtual-objects
[00 00][01 00][02 00][03 00][-1 00][-1 00][-1 00][-1 00]
vir obj id> 4
[00 00][01 00][02 00][03 00][-1 00][-1 00][-1 00][-1 00]
Replace vir obj 0 (phy obj 0) to store vir obj 4
[-1 00][01 00][02 00][03 00][00 00][-1 00][-1 00][-1 00]
not yet ready

New access to virtual object when object fault has been handled (6)

We now want to re-execute access to the virtual object once we've mapped it to a physical object. To do this, we reuse the system calls presented in the signal homework. Typically, we save a recovery point before modifying the virtual object and when an object fault occurs, we load the virtual object in memory (as we did in the previous question) and restart execution at the saved recovery point.
Compile and run. Try to access virtual object 04. The replaced object should be 00. Thus, try to access virtual object 00. You reload a virtual object that was just backup to the swap file. Below; you have a typical output:

$ ./mem-virtual-objects
[00 00][01 00][02 00][03 00][-1 00][-1 00][-1 00][-1 00]
vir obj id> 4
[00 00][01 00][02 00][03 00][-1 00][-1 00][-1 00][-1 00]
Replace vir obj 0 (phy obj 0) to store vir obj 4
[-1 00][01 00][02 00][03 00][00 00][-1 00][-1 00][-1 00]
vir_obj_tab[4] = (0, 0)
x = 4
y = 4
vir_obj_tab[4] = (4, 4)
[-1 00][01 00][02 00][03 00][00 01][-1 00][-1 00][-1 00]
vir obj id> 0
[-1 00][01 00][02 00][03 00][00 01][-1 00][-1 00][-1 00]
Replace vir obj 1 (phy obj 1) to store vir obj 0
[01 01][-1 00][02 00][03 00][00 01][-1 00][-1 00][-1 00]
vir_obj_tab[0] = (0, 0)
x = 5
y = 5
vir_obj_tab[0] = (5, 5)
[01 02][-1 00][02 00][03 00][00 01][-1 00][-1 00][-1 00]

Shared memory

In this part, we shall improve the implementation of the blocking queue designed in the semaphore homework (which required to store shared variables in shared files). If you need to, feel free to revise this lab.

We aim at using the shared memory mechanisms provided by mmap(). In the homework on semaphores, and in particular for the producer/consumer model, we have used files and file system calls to implement the shared variables "first", "last" and "size". We would like to get rid of these shared variables and use mmap() instead.

We give an overview of the mmap() function.

#include <sys/mman.h>

void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t offset);

mmap() causes the pages starting at addr and continuing for at most len bytes to be mapped from the file described by fd, starting at byte offset offset.
addr is used by the system to determine the starting address of the mapping. If addr is NULL and no specific flags are provided, then the system will provide you with an address (similarly to what malloc() does).
len indicates the number of bytes allocated as malloc() would require.
prot indicates the desired memory "protection" level, i.e., whether on the obtained memory the program can read data (PROT_READ), write data (PROT_WRITE), or both (PROT_READ|PROT_WRITE).
There are three main flags. MAP_SHARED is used to share modifications between processes (and the file system), otherwise, with MAP_PRIVATE modifications remain local to each process. Of course, we want to use the flag MAP_SHARED for our needs in this lab session. Another interesting flag is MAP_ANONYMOUS which allows the user not to open any files, but request a region of "anonymous" memory instead. In this case, fildes can be equal to -1 and offset can be equal to 0.
When fildes designates an open file (with the appropriate access modes), the memory content is mapped/reflected to the corresponding file.
offset designates the file offset from which the memory mapping begins.

Make a copy of sem-prod-cons.c and call it mem-prod-cons.c.
Remove all the file system calls used to share variables and arrays and take advantage of mmap() instead.

INF107 Exercises