CS3026/CS4096 Operating Systems – Assessment 02 – Virtual Disk
In this assessment, you will implement a simple file system that allows you to manage files and directories in a virtual in-memory disk. The file system is to be based on simplified concepts of a File Allocation Table (FAT). Your implementation will allow the creation of files and directories within this virtual hard disk and the performance of simple read and write operations on files.
Your task is to implement interface functions for creating files and directories and for reading and writing operations. For this assessment, the virtual disk will be simulated by an array of memory blocks, where each block is a fixed array of bytes. Each block has a block number (from 0 to MAXBLOCKS-1). The allocation of a new block to a file is recorded in the FAT. The FAT is a table (an array of integers) of size MAXBLOCKS that acts as a kind of block directory for the complete disk content: it contains an entry for each block and records whether this block is allocated to a file or unused. The FAT itself is also stored on this virtual disk at a particular location.
Files may occupy one or more disk blocks and this allocation is non-contiguous: any block of the disk may be allocated to a file. A disk usually becomes more fragmented over time as files are deleted and added. The blocks of a file form a chain that is recorded in the FAT: the FAT entry of a block contains either the number of the next block of the file, ENDOFCHAIN (this block is the last block of the file), or UNUSED (block on disk is free). We assume: ENDOFCHAIN == 0 and UNUSED == -1. When reading a file, we first have to look into the FAT to read one block after the other into memory, following such a block chain in the FAT.
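Following a block chain can be sketched as below. This is only an illustration, not code from filesys.c: the function name chain_length() is made up for this example, and fatentry_t is assumed to be a 2-byte short.

```c
#include <assert.h>

#define MAXBLOCKS  1024
#define ENDOFCHAIN 0
#define UNUSED     (-1)

typedef short fatentry_t;   /* assumption: a FAT entry is a short integer */

/* Count the blocks of a file by walking its chain in the FAT,
 * starting at the first block of the file. */
int chain_length(const fatentry_t fat[MAXBLOCKS], fatentry_t first)
{
    int n = 0;
    fatentry_t b = first;
    while (b != ENDOFCHAIN) {
        /* a block inside a chain must be a valid, allocated block */
        assert(b > 0 && b < MAXBLOCKS && fat[b] != UNUSED);
        n++;
        b = fat[b];         /* the entry names the next block of the file */
    }
    return n;
}
```

Note that the blocks of a chain need not be adjacent: a file may occupy blocks 4, 7 and 5 in that order.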
For this assessment, we assume that the virtual disk consists of an array of 1024 blocks (MAXBLOCKS), where each block is an array of 1024 bytes (BLOCKSIZE). A block can be either (a) file data (an array of 1024 bytes), (b) directory data (a set of directory entries, as many as can fit within 1024 bytes), or (c) part of the FAT itself, containing information about used and unused blocks. As there are 1024 disk blocks, the FAT has to have 1024 entries. For this assessment, we assume that a FAT entry is a short integer (2 bytes).
Two files are provided for this assessment: You will find the files filesys.c and filesys.h containing data structures and basic functions you may use in your implementation. The two basic functions that are directly interacting with the virtual disk are writeblock() and readblock(). All other functions (those you have to implement) are using these two functions for reading from and writing to the virtual disk.
The virtual disk has the following layout:
- block 0 is reserved and can contain any information about the whole file system on the disk (e.g. the volume name); it is not used for file data, because the number 0 has a special meaning in the FAT (ENDOFCHAIN == 0). You can put arbitrary information into these first 1024 bytes of your virtual disk, such as the name of the disk.
- blocks 1 and 2 are occupied by the FAT (we need 2 blocks because each entry is a short integer occupying 2 bytes of disk space; with 1024 entries, the FAT needs 2048 bytes, which are 2 blocks of disk space)
- block 3 is the root directory: a directory block has a special structure, containing a list of directory entries
- the rest of the virtual disk, blocks 4 – 1023, are either data or directory blocks
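The arithmetic behind this layout can be checked with a few lines of C. The sketch below is an assumption about how the types might look (the actual definitions are in filesys.h and may differ); the helper names fat_blocks_needed() and root_dir_block() are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAXBLOCKS 1024
#define BLOCKSIZE 1024

typedef short fatentry_t;                                /* assumed: 2 bytes */
#define FATENTRYCOUNT (BLOCKSIZE / sizeof(fatentry_t))   /* entries per block */

/* One disk block, viewed either as raw bytes or as a slice of the FAT. */
typedef union {
    unsigned char data[BLOCKSIZE];
    fatentry_t    fat[FATENTRYCOUNT];
} block_t;

/* Number of blocks needed to store a FAT with MAXBLOCKS entries. */
size_t fat_blocks_needed(void)
{
    return (MAXBLOCKS * sizeof(fatentry_t) + BLOCKSIZE - 1) / BLOCKSIZE;
}

/* Block 0 is reserved, the FAT follows, then comes the root directory. */
size_t root_dir_block(void)
{
    return 1 + fat_blocks_needed();
}
```

With 2-byte entries this gives 512 entries per block, 2 FAT blocks and the root directory in block 3, matching the layout above.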
Your task is to extend filesys.c with additional interface functions (as outlined below). Also, implement a test program called shell.c that calls functions you have implemented. The files filesys.h and filesys.c are provided to give you the C structures needed for the implementation. You can also create your own C structures to complete the assessment. Don’t hesitate to extend or change structures in filesys.h, if you see a need for that in order to support your implementation (you may have to add additional parameters to the file descriptor MyFILE to record additional information, e.g. about the location of a file in the file system etc.).
The complete public interface of the file system for this virtual disk is the following (for each assessment step, you have to implement some of them):
void format ( ) ;
• creates the initial structure on the virtual disk, writing the FAT and the root directory into the virtual disk
MyFILE * myfopen ( const char * filename, const char * mode ) ;
• opens a file on the virtual disk and manages a buffer for it of size BLOCKSIZE; mode may be either “r” for read-only or “w” for read/write/append (default “w”)
void myfclose ( MyFILE * stream ) ;
• closes the file, writes out any blocks not yet written to disk
int myfgetc ( MyFILE * stream ) ;
• returns the next byte of the open file, or EOF (EOF == -1)
void myfputc ( int b, MyFILE * stream ) ;
• writes a byte to the file. Depending on the write policy, either writes the disk block containing the written byte to disk immediately, or waits until the block is full
void mymkdir ( const char * path ) ;
• this function creates a new directory, using path, e.g. mymkdir (“/first/second/third”) creates directory “third” in parent dir “second”, which is a subdir of directory “first”, and “first” is a subdirectory of the root directory
void myrmdir ( const char * path ) ;
• this function removes an existing directory, using path, e.g. myrmdir (“/first/second/third”) removes directory “third” in parent dir “second”, which is a subdir of directory “first”, and “first” is a subdirectory of the root directory
void mychdir ( const char * path ) ;
• this function changes into an existing directory, using path, e.g. mychdir (“/first/second/third”) changes into directory “third” in parent dir “second”, which is a subdir of directory “first”, and “first” is a subdirectory of the root directory
void myremove ( const char * path ) ;
• this function removes an existing file, using path, e.g. myremove (“/first/second/third/testfile.txt”)
char ** mylistdir ( const char * path ) ;
• this function lists the content of a directory and returns a list of strings, where the last element is NULL
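A caller can walk a NULL-terminated list of strings, such as the one mylistdir() returns, without knowing its length in advance. The helpers below are a sketch with invented names (list_length(), print_list()); they work on any such list and do not call mylistdir() themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Count the strings in a NULL-terminated list. */
size_t list_length(char **list)
{
    assert(list != NULL);
    size_t n = 0;
    while (list[n] != NULL)
        n++;
    return n;
}

/* Print one entry per line and return how many entries were printed. */
size_t print_list(char **list)
{
    size_t n = list_length(list);
    for (size_t i = 0; i < n; i++)
        printf("%s\n", list[i]);
    return n;
}
```

In shell.c, the result of mylistdir() could be passed directly to a function like print_list().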
Assessment Requirements
CGS D3-D1
Task
Implement the function format() to create a structure for the virtual disk. Format has to create the FAT and the root directory. Write a test program containing the main() function, and call it “shell.c”.
Your test program shell.c should perform the following three actions:
- call format() to format the virtual disk
- transfer the following text into block 0: “CS3026 Operating Systems Assessment”
- write the virtual disk to a file (call it “virtualdiskD3_D1”).
Include any header files required, such as filesys.h, in your test program shell.c (do NOT include any .c files!!).
Use the unix command “hexdump” to see what the file contains: hexdump -C virtualdiskD3_D1.
In the assessment description page on MyAberdeen, you can download a file “virtualdiskD3_D1” that shows a layout of the virtual disk as acceptable for CGS D3-D1. You must demonstrate that your implementation produces the same or a similar layout (your implementation may differ), with the FAT and the entries within the FAT recognizable. It is expected that the hexdump only shows the information written into the virtual disk and no other clutter. This requires that the virtual disk is properly initialized when format() is called.
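The following is a minimal sketch of the FAT part of formatting, under stated assumptions: it uses its own in-memory arrays disk and fat instead of the structures and the writeblock() function of filesys.c, the name format_sketch() is invented, and it does not build the directory structure of block 3. It is meant to show why the disk is zeroed first (no clutter in the hexdump) and how the reserved entries are marked.

```c
#include <assert.h>
#include <string.h>

#define MAXBLOCKS  1024
#define BLOCKSIZE  1024
#define ENDOFCHAIN 0
#define UNUSED     (-1)
#define FATBLOCKS  2          /* 1024 entries * 2 bytes = 2 blocks */

typedef short fatentry_t;     /* assumption: 2-byte FAT entry */

unsigned char disk[MAXBLOCKS][BLOCKSIZE];   /* stand-in for the virtual disk */
fatentry_t    fat[MAXBLOCKS];               /* in-memory copy of the FAT */

/* Copy the in-memory FAT into blocks 1 and 2, one block at a time. */
void copy_fat_to_disk(void)
{
    const unsigned char *src = (const unsigned char *)fat;
    for (int b = 0; b < FATBLOCKS; b++)
        memcpy(disk[1 + b], src + (size_t)b * BLOCKSIZE, BLOCKSIZE);
}

void format_sketch(const char *volume_name)
{
    assert(sizeof fat == FATBLOCKS * BLOCKSIZE);

    memset(disk, 0, sizeof disk);             /* zero everything: no clutter */
    strncpy((char *)disk[0], volume_name, BLOCKSIZE - 1);   /* block 0 */

    for (int i = 0; i < MAXBLOCKS; i++)
        fat[i] = UNUSED;
    fat[0] = ENDOFCHAIN;      /* block 0: reserved */
    fat[1] = 2;               /* FAT occupies block 1, continued in block 2 */
    fat[2] = ENDOFCHAIN;      /* last block of the FAT */
    fat[3] = ENDOFCHAIN;      /* root directory, a single block so far */

    copy_fat_to_disk();
}
```

In a hexdump of such a disk, the FAT at 0x400 would begin with the entries for blocks 0 to 3, followed by UNUSED entries (ff ff) for all remaining blocks.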
Additional Information:
Please also read section “How to start your Project” for additional information and how to implement the “format()” function.
Submission for CGS D3-D1
In order to achieve at least a D3, your submission must include a Makefile for building your solution, and your solution must correctly include the required header files. Please describe your implementation in detail in your report: for each statement in function format(), provide an explanation of its purpose. Provide detailed explanations of how to run the submission. Explain what the result of such an execution is: include a hexdump of the virtual disk in the report and explain it.
Submit a test program, called shell.c, as well as filesys.c, filesys.h, a file virtualdiskD3_D1 and a Makefile that allows your program to be compiled (put files into a directory CGS_D3_D1).
CGS C3-C1
Task
Implement the following interface functions:
- myfopen(),
- myfputc(),
- myfgetc() and
- myfclose().
It is assumed that there is only a root directory and that all files are created there. Extend your test program shell.c with the following steps:
- create a file “testfile.txt” in your virtual disk: call myfopen ( “testfile.txt”, “w” ) to open this file
- write a text of size 4 KB (4096 bytes) to this file, using the function myfputc():
  o in shell.c, create a char array of 4 * BLOCKSIZE, fill it with text and then write it to the virtual file with myfputc()
- close the file with myfclose()
- write the complete virtual disk to a file “virtualdiskC3_C1”
- test myfgetc():
  o open the file again on your virtual hard disk
  o read out its content with myfgetc() (you may read until the function returns EOF) and, at the same time, print it to the screen
  o write the content to a real file on your real hard disk and call it “testfileC3_C1_copy.txt”
In order to create a recognizable pattern in your hexdump for “testfile.txt”, you may loop through the alphabet over and over again, until the array of size 4*BLOCKSIZE is filled (remember how a string literal “ABCDEFGHIJKLMNOPQRSTUVWXYZ” can be indexed).
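Indexing the string literal with a modulo operation is one way to produce this pattern. The function name fill_alphabet() is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCKSIZE 1024

/* Fill buf[0..n-1] with A..Z repeated over and over. */
void fill_alphabet(char *buf, size_t n)
{
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const size_t len = sizeof alphabet - 1;    /* 26, without the '\0' */

    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        buf[i] = alphabet[i % len];
}
```

The array filled this way is not a C string (it has no terminating ‘\0’), so it should be written byte by byte using its known size, not with string functions.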
Use the unix command hexdump to check the content of your virtual disk:
• hexdump -C virtualdiskC3_C1
Redirect the output of your shell program into a file “traceC3_C1.txt”:
• ./shell > traceC3_C1.txt
Submission
For a CGS C3, submit shell.c, filesys.c, filesys.h, the files virtualdiskC3_C1, testfileC3_C1_copy.txt, traceC3_C1.txt and a Makefile that allows your program to be compiled. Put files into a separate directory CGS_C3_C1. You have to provide detailed comments about the implementation of the required functions (explain it by walking the reader through the implemented statements and provide comments for each of them).
Parts missing in your solution may reduce the CGS mark. Try to get with your implementation as far as possible.
Explanation
When a file is created or opened, a file descriptor has to be created (see the MyFILE structure in filesys.h).
This file descriptor can hold one disk block with “diskblock_t buffer”. Read and write operations access this buffer. The attribute “pos” points to the byte in this buffer that is read or written. When the buffer is full during writing, it has to be written to disk and a new block has to be allocated to the open file; this is done by simply finding the next entry in the FAT with value UNUSED and extending the block chain in the FAT. When the file is closed with myfclose(), its length has to be written into the directory entry of this file (situated in the root directory, block 3).
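Finding an UNUSED entry and extending a chain can be sketched as follows. The names find_unused_block() and append_block() are invented for this example and are not part of filesys.c; the sketch works only on an in-memory FAT and leaves writing the FAT back to disk to the caller.

```c
#include <assert.h>

#define MAXBLOCKS  1024
#define ENDOFCHAIN 0
#define UNUSED     (-1)

typedef short fatentry_t;   /* assumption: 2-byte FAT entry */

/* Return the number of the first free block, or -1 if the disk is full. */
int find_unused_block(const fatentry_t fat[MAXBLOCKS])
{
    for (int i = 1; i < MAXBLOCKS; i++)
        if (fat[i] == UNUSED)
            return i;
    return -1;
}

/* Allocate a new block and link it behind block `last`.
 * Pass ENDOFCHAIN (0) as `last` to start a new chain.
 * Returns the new block number, or -1 if no block is free. */
int append_block(fatentry_t fat[MAXBLOCKS], int last)
{
    assert(last >= 0 && last < MAXBLOCKS);

    int fresh = find_unused_block(fat);
    if (fresh < 0)
        return -1;
    fat[fresh] = ENDOFCHAIN;              /* the new block ends the chain */
    if (last != ENDOFCHAIN)
        fat[last] = (fatentry_t)fresh;    /* old last block points to it */
    return fresh;
}
```

On a freshly formatted disk, the first two allocations would return blocks 4 and 5, and the FAT entry for block 4 would then hold the value 5.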
Find the FAT (at 0x400) and the block for the root directory (it starts at 0xc00) in the hexdump.
In this hexdump, the file 'testfile.txt' was opened for write on the virtual disk, which created it in the root directory (block no 3, starting at 0xc00), and the content of file 'testfile.txt' starts at block no 4. FAT entries are short integers stored in little-endian byte order, with the low byte first (for the value 5 you see 05 00, and not 00 05). At location 0xc00, the block of the root directory starts. The first two bytes, set to ’01 00’, indicate that this is a dirblock.
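When reading FAT entries from a hexdump by hand, it can help to decode the two bytes explicitly. The helper decode_le16() below is a sketch written for this purpose; it assumes the dump was produced on a little-endian machine, so that the low byte is stored first.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 2-byte little-endian value as it appears in the hexdump:
 * bytes[0] is the low byte, bytes[1] is the high byte. */
int16_t decode_le16(const unsigned char bytes[2])
{
    uint16_t raw = (uint16_t)(bytes[0] | (bytes[1] << 8));
    return (int16_t)raw;    /* ff ff becomes -1, i.e. UNUSED */
}
```

For example, the byte pair 05 00 decodes to 5 (the next block in a chain), 00 00 to ENDOFCHAIN and ff ff to UNUSED.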
Please note: text strings always have a ‘\0’ at the end. C functions such as strcpy() etc. will scan a text string as long as they haven’t found the ‘\0’. For example, the parameter ‘mode’ of function myfopen() is a character array of 3 elements, because it can hold a mode string that can be one or two characters long and has ‘\0’ as its last character.
Indicating End-of-File: the last block of a file may not be filled completely. How do we know where the file ends? You have to store the length of the file in the directory entry. When using myfgetc(), it has to be calculated whether all bytes of the file have been read, by checking the number of chars read against the file size stored in the directory entry. If the last byte of the file has been read, then at the next call of myfgetc(), the function has to return EOF (EOF may already be a macro contained in one of the system header files you include in your program). Therefore, when you close your file with myfclose(), you have to update the directory entry with the new file size (number of bytes added to the file).
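The end-of-file check can be sketched with a small bookkeeping structure. This is a simplified stand-in, not the MyFILE descriptor of filesys.h: readstate_t and the function names are invented, and the file content is passed in as one flat array instead of being read block by block.

```c
#include <assert.h>
#include <stdio.h>    /* EOF */

#define BLOCKSIZE 1024

typedef struct {
    long filelength;  /* file size as stored in the directory entry */
    long bytes_read;  /* number of bytes handed out so far */
} readstate_t;        /* hypothetical, for illustration only */

/* Which block of the chain holds the next byte, and where inside it. */
long block_index(const readstate_t *s)  { return s->bytes_read / BLOCKSIZE; }
long block_offset(const readstate_t *s) { return s->bytes_read % BLOCKSIZE; }

/* Return the next byte, or EOF once filelength bytes have been read. */
int get_byte(readstate_t *s, const unsigned char *contents)
{
    assert(s != NULL && contents != NULL);
    if (s->bytes_read >= s->filelength)
        return EOF;
    return contents[s->bytes_read++];
}
```

Because the check uses the stored length, unused bytes at the end of the last block are never returned, and every call after the last byte keeps returning EOF.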
CGS B3-B1
Task
Add a directory hierarchy to your virtualdisk that allows the creation of subdirectories. Implement the following interface functions:
- mymkdir( char * path) that creates a new directory
- char ** mylistdir (char * path) that lists the content of a directory
Extend your test program shell.c with the following test steps:
- create a directory “/myfirstdir/myseconddir/mythirddir” in the virtual disk
- call mylistdir(“/myfirstdir/myseconddir”): print out the list of strings returned by this function
- write out virtual disk to “virtualdiskB3_B1_a”
- create a file “/myfirstdir/myseconddir/testfile.txt” in the virtual disk
- call mylistdir(“/myfirstdir/myseconddir”): print out the list of strings returned by this function
- write out virtual disk to “virtualdiskB3_B1_b”
Redirect the output of your shell program into a file “traceB3_B1.txt”
• ./shell>traceB3_B1.txt
Submission
For a CGS B3, submit shell.c, filesys.c, filesys.h, the files virtualdiskB3_B1_a, virtualdiskB3_B1_b, traceB3_B1.txt and a Makefile that allows your program to be compiled. Put files into a separate directory CGS_B3_B1. For a CGS B1, the virtual disk should not show any clutter, only the information you write into it, and in your report, you have to provide detailed comments about the implementation of the required functions (explain it by walking the reader through the most important implemented statements and explain their purpose). Parts missing in your solution may reduce the CGS mark. Try to get with your implementation as far as possible.
Explanation
A directory can be specified as absolute or relative to another directory:
- absolute: “/mydirectory”
- relative: “mydirectory”
A directory may be specified with a path:
- absolute: “/firstlevel/secondlevel/mydirectory”
- relative: “somelevel/somelevelbeneath/mydirectory”
In order to create the directory “mydirectory”, all the subdirectories specified in the path must exist. If you call mymkdir ( “/firstlevel/secondlevel/mydirectory” ) in your test program shell.c, then the directory hierarchy consisting of Root->firstlevel->secondlevel must exist, before you can create “mydirectory” in the parent directory “secondlevel”. If these directories don’t exist, they have to be created.
Use the POSIX function strtok_r() (declared in <string.h>) to tokenize a path string (look up its usage). If you experience a segmentation fault while running the program, remember that pointers have to point to allocated memory and that string literals are typically placed in a read-only segment such as ‘.rodata’ and must not be modified (strtok_r() modifies the string it tokenizes).
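One way to avoid the segmentation fault is to copy the path into a writable buffer before tokenizing it. The function split_path() below is a sketch with an invented name and signature; it assumes a POSIX system that provides strtok_r().

```c
#define _POSIX_C_SOURCE 200809L   /* make strtok_r() visible in <string.h> */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split `path` at '/' into its components.
 * The path is first copied into the writable buffer `buf`, because
 * strtok_r() overwrites the separators with '\0'.
 * The pointers stored in parts[] point into `buf`.
 * Returns the number of components, or -1 if buf or parts[] is too small. */
int split_path(const char *path, char *buf, size_t bufsize,
               char *parts[], int maxparts)
{
    assert(path != NULL && buf != NULL && parts != NULL);
    if (strlen(path) >= bufsize)
        return -1;
    strcpy(buf, path);

    int n = 0;
    char *save = NULL;
    for (char *tok = strtok_r(buf, "/", &save);
         tok != NULL;
         tok = strtok_r(NULL, "/", &save)) {
        if (n == maxparts)
            return -1;
        parts[n++] = tok;
    }
    return n;
}
```

Whether the path is absolute can be decided before tokenizing by checking if its first character is ‘/’; the tokens themselves no longer carry that information.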
CGS A5-A1
Task to achieve A5-A4
Implement the following interface functions:
- mychdir( char * path), using the global variable “currentDir” as specified in filesys.c: a change into a directory will change the variable “currentDir”
- myremove( char * path) removes a file; the path can be absolute or relative
- myrmdir( char * path) removes a directory, if it is empty; the path can be absolute or relative
Change how directories are created:
- add two default entries (as we are used to under Unix etc.): “.” and “..”
  o the directory entry “.” points to the directory itself
  o the directory entry “..” points to the parent directory
- allow the creation of a directory relative to the current directory
Change how files are created:
- the function myfopen() can be called using an absolute or relative path in the filename; if the directories specified in the path do not exist, then they have to be created
Demonstrate with your test program shell.c that creating and deleting files and directories works and that results are visible in the hexdumps of the virtual disk. Save intermediate results. You may follow the following steps:
- create a directory “/firstdir/seconddir” in the virtual disk
- call myfopen( “/firstdir/seconddir/testfile1.txt” )
- you may write something into the file
- close the file
- call mylistdir(“/firstdir/seconddir”): print out the list of strings returned by this function
- change to directory “/firstdir/seconddir”
- call mylistdir(“/firstdir/seconddir/”) or mylistdir(“.”) to list the current dir, print out the list of strings returned by this function
- call myfopen( “testfile2.txt”, “w” )
- you may write something into the file
- close the file
- write out virtual disk to “virtualdiskA5_A1_b”
- call mychdir( “thirddir” )
- call myremove( “testfile3.txt” )
- write out virtual disk to “virtualdiskA5_A1_c”
- call mychdir( “/firstdir/seconddir” ) or mychdir( “..” )
- call myrmdir( “thirddir” )
- call mychdir( “/firstdir” )
- call myrmdir( “seconddir” )
- call mychdir( “/” ) or mychdir( “..” )
- call myrmdir( “firstdir” )
- write out virtual disk to “virtualdiskA5_A1_d”
Redirect the output of your shell program into a file “traceA5_A1.txt”
• ./shell > traceA5_A1.txt
Work for points (A3-A1):
- A3: Try to write a copy function that allows you to copy files from your real hard disk into your virtual disk and vice versa.
- A2: Try to implement a copy and a move function that relocates files within your virtual disk.
- A1: Safeguard the manipulation of the FAT in a multithreaded application. Introduce a lock variable and store it in block 0 (you can introduce an extra struct for block 0 that contains, among other things, a volume name and this lock variable). The lock variable indicates either a LOCKED or UNLOCKED state of the virtual disk. Use mutexes to change the lock in a thread. Run tests by implementing a multithreaded shell.c.
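The idea of protecting the FAT with a mutex can be sketched as follows, using POSIX threads. This is only an assumption about one possible approach: all names (shared_fat, alloc_block_locked(), run_workers()) are invented, the lock variable stored in block 0 is left out, and only block allocation is protected.

```c
#include <assert.h>
#include <pthread.h>

#define MAXBLOCKS  1024
#define ENDOFCHAIN 0
#define UNUSED     (-1)
#define NTHREADS   4

typedef short fatentry_t;   /* assumption: 2-byte FAT entry */

fatentry_t      shared_fat[MAXBLOCKS];
pthread_mutex_t fat_mutex = PTHREAD_MUTEX_INITIALIZER;

void init_fat(void)
{
    for (int i = 0; i < MAXBLOCKS; i++)
        shared_fat[i] = (i < 4) ? ENDOFCHAIN : UNUSED;
    shared_fat[1] = 2;      /* FAT chain: block 1 -> block 2 */
}

/* Search and update happen under the same lock, so two threads
 * can never claim the same free block. */
int alloc_block_locked(void)
{
    int found = -1;
    pthread_mutex_lock(&fat_mutex);
    for (int i = 4; i < MAXBLOCKS && found < 0; i++) {
        if (shared_fat[i] == UNUSED) {
            shared_fat[i] = ENDOFCHAIN;
            found = i;
        }
    }
    pthread_mutex_unlock(&fat_mutex);
    return found;
}

/* Each worker allocates 100 blocks and counts its successes. */
void *worker(void *arg)
{
    int *allocated = arg;
    for (int k = 0; k < 100; k++)
        if (alloc_block_locked() >= 0)
            (*allocated)++;
    return NULL;
}

/* Start NTHREADS workers and return the total number of blocks allocated. */
int run_workers(void)
{
    pthread_t tid[NTHREADS];
    int counts[NTHREADS] = { 0 };
    int total = 0;

    for (int t = 0; t < NTHREADS; t++) {
        int rc = pthread_create(&tid[t], NULL, worker, &counts[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += counts[t];
    }
    return total;
}
```

Such a program usually has to be compiled with the -pthread option. If the number of blocks marked as allocated in the FAT equals the number of successful allocations, no block was handed out twice.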
NOTE:
If there is any issue with the implementation for CGS D3-B1 in your submitted assignment, it will negatively impact your overall grade, even if you have implemented the CGS A5-A1 requirements.
Submission
For a CGS A5, submit shell.c, filesys.c, filesys.h, the files virtualdiskA5_A1_a .. d, traceA5_A1.txt and a Makefile that allows your program to be compiled. For a CGS A4 .. A2, your solutions have to be of high quality with attempts at providing solutions to extra work as outlined above. For CGS A1, provide additional functionality of your own choosing.
Put files into a separate directory CGS_A5_A1. The virtual disk should not show any clutter, only the information you write into it and in your report, you have to provide detailed comments about the implementation of the required functions (explain it by walking the reader through the most important implemented statements and explain their purpose). Parts missing in your solution may reduce the CGS mark. Try to get with your implementation as far as possible.
Explanation
Look into filesys.h. A directory entry direntry_t uses a char array of 256 bytes for the file or directory name. Very few files have such a long name. You may reduce the length of this array to 128 or 64 to allow more directory entries per block. Try to implement directory entries with variable size. Unused directory entries have to be reused when new files are created. The pointer nextEntry has to be changed to a counter, indicating whether the dirblock is full, and the array of directory entries has to be scanned for an entry set to “UNUSED”.
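Scanning for a reusable entry can be sketched as below. The struct entry_t is a simplified, hypothetical stand-in for direntry_t (the real definition is in filesys.h and has different fields), and the function names are invented.

```c
#include <assert.h>
#include <string.h>

#define BLOCKSIZE 1024
#define MAXNAME   64          /* assumed shortened name length */

typedef struct {
    char  unused;             /* 1 = slot is free, 0 = slot is in use */
    char  isdir;
    short firstblock;
    int   filelength;
    char  name[MAXNAME];
} entry_t;                    /* hypothetical, simplified directory entry */

/* How many such entries fit into one directory block. */
#define ENTRIES_PER_BLOCK ((int)(BLOCKSIZE / sizeof(entry_t)))

/* Return the index of the first free slot, or -1 if the block is full. */
int find_free_entry(const entry_t entries[], int count)
{
    for (int i = 0; i < count; i++)
        if (entries[i].unused)
            return i;
    return -1;
}

/* Claim a free slot for a new name; returns the slot index or -1. */
int add_entry(entry_t entries[], int count, const char *name)
{
    assert(name != NULL);
    int i = find_free_entry(entries, count);
    if (i < 0 || strlen(name) >= MAXNAME)
        return -1;
    memset(&entries[i], 0, sizeof entries[i]);   /* also clears `unused` */
    strcpy(entries[i].name, name);
    return i;
}
```

Deleting a file then only needs to set the unused flag of its entry; the next call to a function like add_entry() will pick up that slot again.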