What is a Directory?
Normally, a hard disk is divided into parts of different sizes known as partitions or volumes. Each volume contains a file system that stores the device directory (or system volume information table). This table holds information such as the name, location, size, and type of every file stored in that volume.
A directory is used to organize files, and each partition should contain at least one directory. It can be thought of as a symbol table that translates file names into their directory entries, from which we can obtain all the information about a particular file, including its location.
Operations on Directory :
The following operations can be performed on directories.
- Searching Files – By reading the directory table, we can find a particular file, or all files whose names match a specified pattern or criterion.
- File Creation – When a new file is created, an entry is inserted into the directory table.
- File Deletion – When a particular file has to be deleted, its entry is deleted from the directory table.
- Listing – We can list the files and sub-directories present in a directory.
- Renaming Files – The name of the existing file can be changed by modifying its entry in the directory table.
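The operations above can be sketched with a small in-memory table. This is only an illustration, not how any particular file system stores its directory: the `Directory` class, its method names, and the metadata fields are hypothetical, and the pattern matching uses shell-style wildcards from Python's `fnmatch` module.

```python
import fnmatch


class Directory:
    """A toy directory: a table mapping file names to their metadata."""

    def __init__(self):
        self.entries = {}  # file name -> {"location", "size", "type"}

    def create(self, name, location, size, ftype):
        # File creation inserts a new entry into the table.
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = {"location": location, "size": size, "type": ftype}

    def delete(self, name):
        # File deletion removes the entry (KeyError if it does not exist).
        del self.entries[name]

    def rename(self, old, new):
        # Renaming only changes the key; the metadata is untouched.
        if new in self.entries:
            raise FileExistsError(new)
        self.entries[new] = self.entries.pop(old)

    def search(self, pattern):
        # Searching returns every name matching a wildcard pattern.
        return sorted(n for n in self.entries if fnmatch.fnmatchcase(n, pattern))

    def listing(self):
        # Listing returns all names in the directory.
        return sorted(self.entries)


d = Directory()
d.create("notes.txt", location=120, size=2048, ftype="text")
d.create("todo.txt", location=300, size=512, ftype="text")
d.create("photo.png", location=900, size=40960, ftype="image")
d.rename("todo.txt", "tasks.txt")
d.delete("photo.png")
print(d.search("*.txt"))  # ['notes.txt', 'tasks.txt']
```

Because every operation works on the same table, searching and renaming never touch the file contents, only the entries.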
Types of Directory Structure :
The following are the common schemes used to define directory structure.
- Single-level Directory Structure
- Two-level Directory Structure
- Tree-Structured Directory
- Acyclic-Graph Directory Structure
- General Graph Directory Structure
Single-level Directory Structure :
Here, the volume contains a single directory and all files are stored in the same directory. We cannot create sub-directories within that directory.
There are various limitations to this scheme. All filenames have to be unique because they reside in the same container. As the number of users increases, the number of files also increases, and giving unique names to all those files becomes difficult. Grouping similar types of files is not possible, and there is no per-user protection.
The advantage of this directory structure is that it is easier to implement than the other schemes, and file creation, searching, deletion, and updating are simple operations.
Two-level Directory Structure :
Here, each user of the system is given a separate directory called a user file directory (UFD), which holds all the files of that user. If there are n users, there will be n UFDs, all of which are indexed in a Master File Directory (MFD).
When a user wants to find a file ‘x’, it is searched for only in that user's UFD. The filenames within a UFD must be unique, but two or more users can use the same filename because their directories are different. A UFD is created whenever a new user is created.
This directory structure overcomes the unique-filename problem by isolating users in separate directories. However, this isolation becomes a problem when several users need to share files.
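As a rough sketch of the two-level scheme, the MFD can be modelled as a table of users, each pointing to that user's own UFD. The class and method names below are hypothetical and chosen only for illustration.

```python
class TwoLevelDirectory:
    """Toy model: the MFD maps each user to that user's UFD."""

    def __init__(self):
        self.mfd = {}  # user name -> UFD (file name -> contents)

    def add_user(self, user):
        # A new, empty UFD is created together with the user.
        self.mfd[user] = {}

    def create_file(self, user, name, contents):
        ufd = self.mfd[user]
        if name in ufd:
            # Names must be unique only within one UFD.
            raise FileExistsError(f"{user} already has a file named {name}")
        ufd[name] = contents

    def lookup(self, user, name):
        # The search is confined to the requesting user's UFD.
        return self.mfd[user].get(name)


fs = TwoLevelDirectory()
fs.add_user("user1")
fs.add_user("user2")
fs.create_file("user1", "x", "data of user1")
fs.create_file("user2", "x", "data of user2")  # same name, different UFD
print(fs.lookup("user1", "x"))  # data of user1
print(fs.lookup("user2", "x"))  # data of user2
```

The sketch also shows the drawback: `lookup` never looks outside the caller's UFD, so one user cannot reach another user's file without some extra mechanism.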
Each file in the system is identified by a pathname. The syntax for specifying a pathname differs from one operating system to another. For example, in MS-DOS and Windows, a colon is used to specify a volume and a backslash to specify a directory. The pathname is built from these symbols and the directory names. For example, if we want to access file 2 of user 2 the path would be,
Tree-Structured Directory :
This scheme allows users to create any number of their own directories within their User File Directory (UFD), so the tree has a variable number of levels. This gives better flexibility in managing files. A sub-directory is treated as a file, and a special bit in each entry indicates whether the entry is a file (0) or a sub-directory (1).
The current directory is normally the directory from which the process is executing, and it usually holds most of the files the process needs. When the process tries to access a file, the file is searched for in the current directory. If it is not there, the user must either specify the pathname of the file or change the current directory to the directory that holds it. The latter is done with a system call that takes a pathname as a parameter and redefines the current directory. The two types of pathnames are,
Absolute Pathname :
It gives the full address of the file, starting from the root directory and passing through all intermediate directories down to the filename. For example, the absolute pathname of file “F7” in the above figure is,
root/User1/DOCS/Music/F7
Relative Pathname :
It gives the address of the file starting from the current directory. For example, if the current directory is “root/User1/DOCS”, the relative pathname of file “F7” in the “Music” directory would be,
Music/F7
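The difference between the two kinds of pathname can be illustrated with a small tree and a lookup routine. This is a minimal sketch under stated assumptions: the `Node` class and `resolve` function are hypothetical, a leading "/" stands for the root directory (a Unix-style convention assumed here), and the `is_dir` flag plays the role of the file/sub-directory bit.

```python
class Node:
    """One directory entry: a file or a sub-directory."""

    def __init__(self, name, is_dir):
        self.name = name
        self.is_dir = is_dir   # the "special bit": True for a sub-directory
        self.children = {}     # used only when is_dir is True

    def add(self, name, is_dir):
        child = Node(name, is_dir)
        self.children[name] = child
        return child


def resolve(root, cwd, path):
    """Return the node named by path, or None if it does not exist.

    A path starting with "/" is absolute (resolved from root);
    any other path is relative (resolved from cwd).
    """
    node = root if path.startswith("/") else cwd
    for part in path.split("/"):
        if not part:
            continue  # skip the empty piece produced by a leading "/"
        if not node.is_dir or part not in node.children:
            return None
        node = node.children[part]
    return node


root = Node("root", True)
docs = root.add("User1", True).add("DOCS", True)
f7 = docs.add("Music", True).add("F7", False)

print(resolve(root, docs, "/User1/DOCS/Music/F7") is f7)  # True (absolute)
print(resolve(root, docs, "Music/F7") is f7)              # True (relative to DOCS)
print(resolve(root, root, "Music/F7"))                    # None (wrong current directory)
```

The same relative pathname finds the file from one current directory and fails from another, which is why a process either changes its current directory or falls back to the absolute pathname.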
Acyclic-Graph Directory Structure :
It is a technique that allows sub-directories and files to be shared. An acyclic graph is a graph without cycles. The tree-structured directory method does not allow files and directories to be shared, but this scheme removes that restriction. The entry for a shared file is present in every directory that shares it. The figure below shows a file “main library” shared by two directories.
Sharing is entirely different from maintaining multiple copies, and it is very important when more than one programmer is working on a single program. In that case, changes made to a file by one programmer must be visible to the others immediately, which is only possible through sharing. If a directory is shared, every new file created in it becomes available to all the users sharing it.
Sharing in Unix is accomplished by creating a directory entry called a link, which acts as a pointer to the original file or directory. A link entry is marked as a link and stores the name of the original file, so when we access a file through a shared directory, the system follows that name to reach the original.
The acyclic-graph directory structure provides both flexibility and sharing. Deleting shared files, however, is a complex problem. If we delete a file by reaching it through one path, its entry is deleted in that directory, but the entries in the other directories that point to the same file are not deleted. They become dangling pointers to a file that no longer exists. For this reason, some operating systems do not allow an acyclic-graph structure. MS-DOS, for example, uses a simple tree structure rather than an acyclic graph.
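The dangling pointer problem can be reproduced with a small simulation in which a link stores only the pathname of the original file. The pathnames and helper functions below are hypothetical and exist only for this sketch; real file systems implement links inside the file system itself.

```python
files = {}   # pathname -> contents of files that really exist
links = {}   # pathname of a link -> pathname of the original file


def read(path):
    """Return the contents at path, following a link if path is one."""
    target = links.get(path, path)
    if target not in files:
        raise FileNotFoundError(f"{path} refers to missing file {target}")
    return files[target]


def delete(path):
    """Remove only the entry at path; entries elsewhere are untouched."""
    if path in links:
        del links[path]
    else:
        del files[path]


files["/user1/main_library"] = "shared code"
links["/user2/main_library"] = "/user1/main_library"
print(read("/user2/main_library"))   # shared code

delete("/user1/main_library")        # the original is deleted ...
try:
    read("/user2/main_library")      # ... but the link still exists
except FileNotFoundError as err:
    print("dangling link:", err)
```

After the deletion the link is still listed in `links`, yet it can no longer be resolved, which is exactly the situation described above.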
General Graph Directory Structure :
It is a tree-structured directory organization in which links can be added to existing directories. It is the same as the acyclic-graph structure, except that it allows cycles in the graph, whereas the acyclic-graph structure does not.
Since cycles are present in the graph, a search must avoid visiting the same file or directory more than once. This organization also suffers from the dangling pointer problem when files are deleted. To avoid this problem, the acyclic-graph structure uses a variable called a reference counter, which stores the number of directory entries referring to the file.
If the reference counter is 0, no directory refers to the file and it can be deleted. Because cycles are present here, this approach does not work: entries that form a cycle keep referring to each other, so their counters may never reach 0 even when nothing else can reach them. Instead, a garbage collection scheme is used to determine whether all references to a particular file have been deleted, so that the space occupied by the file can be deallocated and the file marked as deleted.
The garbage collection scheme works in two phases. In the first phase, it traverses the whole file system and marks everything (files and directories) that is accessible, making sure that each item is marked only once (without repetition). In the second phase, it frees all unmarked files and directories, because they cannot be reached from any accessible directory and are therefore garbage.
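The two phases can be sketched on a small graph that contains a cycle. This is an illustrative model only: the graph, its node names, and the helper functions are hypothetical, and directories are represented simply as lists of the names they refer to. The sketch also counts references to show why a counter alone is not enough when cycles exist.

```python
def reference_counts(directories):
    """Count how many directory entries refer to each node."""
    counts = {}
    for children in directories.values():
        for child in children:
            counts[child] = counts.get(child, 0) + 1
    return counts


def mark(directories, root):
    """Phase 1: collect everything reachable from root, each node once."""
    marked = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in marked:
            continue  # already visited, so a cycle cannot loop forever
        marked.add(node)
        stack.extend(directories.get(node, []))
    return marked


def sweep(all_nodes, marked):
    """Phase 2: everything left unmarked is garbage and can be freed."""
    return sorted(set(all_nodes) - marked)


# "old" is no longer referenced from root, but "old" and "tmp"
# refer to each other and therefore form a cycle.
directories = {
    "root": ["docs"],
    "docs": ["a.txt"],
    "old": ["tmp"],
    "tmp": ["old", "b.txt"],
}
all_nodes = ["root", "docs", "a.txt", "old", "tmp", "b.txt"]

counts = reference_counts(directories)
print(counts["old"], counts["tmp"])   # 1 1  (never 0, despite being unreachable)

marked = mark(directories, "root")
print(sweep(all_nodes, marked))       # ['b.txt', 'old', 'tmp']
```

The counters of "old" and "tmp" stay at 1 because of the cycle, while the marking pass correctly identifies them, and the file only they refer to, as unreachable.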