The Windows NTFS file system introduced several features that improve the performance, stability, and reliability of file storage. Of these, three lend themselves to advanced methods of information storage and collation. Few applications, however, have made extensive use of them. This article introduces these features and identifies some of the potential uses of each.
This article is intended for experienced programmers.
While working on Windows NT, Microsoft realised the huge deficiencies that existed within the old FAT file system (FS). A server utilising FAT cannot function as reliably as is required in a commercial or industrial environment.
In an effort to address these deficiencies and the ubiquitous inefficiencies of FAT, Microsoft designed the new NTFS. This new FS bolsters security, performance, and reliability, and also supplies several advanced features not supported by FAT. Native support for file encryption and compression, disk quotas, and access permissions were all included in NTFS (although some have only been added subsequent to the original design of NTFS).
An example of the improvements of NTFS over FAT is transactional file processing. In an effort to improve reliability, NTFS volumes support transactional processing by utilising a log file; this ensures that transactions are completed successfully. When a transaction is begun, an entry is first written to this log file, the transaction is committed, and the end of the transaction is written to the log. Should a system fault occur during the processing of the transaction, its end will not be reflected in the log file. The system then can perform a recovery on the volume at the next system start-up.
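The logging idea described above can be sketched in a few lines of C. This is only an illustration of the principle, not the actual NTFS log format: the record layout and the names log_record, TXN_BEGIN, TXN_END, and count_incomplete are all hypothetical. The function scans a log and counts transactions that have a begin record but no matching end record, which is the condition that would call for recovery at start-up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kinds; the real NTFS log is far richer. */
enum record_kind { TXN_BEGIN, TXN_END };

struct log_record {
    int txn_id;              /* identifies the transaction */
    enum record_kind kind;   /* begin or end marker */
};

/* Count transactions that were begun but never marked as ended. */
size_t count_incomplete(const struct log_record *records, size_t n)
{
    size_t incomplete = 0;

    assert(records != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (records[i].kind != TXN_BEGIN)
            continue;

        /* Look for the matching end record later in the log. */
        int ended = 0;
        for (size_t j = i + 1; j < n; j++) {
            if (records[j].kind == TXN_END &&
                records[j].txn_id == records[i].txn_id) {
                ended = 1;
                break;
            }
        }
        if (!ended)
            incomplete++;    /* candidate for recovery */
    }
    return incomplete;
}
```

A log holding begin and end records for transaction 1, followed by only a begin record for transaction 2, would report one incomplete transaction.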
Hardlinks In NTFS
One of the features built into NTFS is the separation of the data and presentation layers. This means that data exists in one form on the disk but, when accessed through NTFS, can be presented differently to the user. This is done through the use of links.
A link is very similar to the concept of a pointer as used in coding. While data exists in memory, it can be accessed via a pointer that simply stores the address of that data. Multiple pointers can be used to point to the same data. These pointers can be moved, copied, and deleted, leaving the original data intact. Moving or copying the pointers incurs a fraction of the overhead required to move or copy the data.
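The pointer analogy can be made concrete with a small C sketch. The function name shared_through_pointers is made up for this illustration. It shows two pointers referring to the same block of memory, a change made through one being visible through the other, and one pointer being discarded without affecting the data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if a change made through one pointer is seen through another. */
int shared_through_pointers(void)
{
    char *data = malloc(16);          /* the data itself */
    assert(data != NULL);
    strcpy(data, "photo");

    char *first = data;               /* two "links" to the same bytes */
    char *second = data;

    first[0] = 'P';                   /* modify through one pointer */
    int visible = (second[0] == 'P'); /* observe through the other */

    first = NULL;                     /* dropping a pointer leaves the data intact */
    visible = visible && (first == NULL) && strcmp(second, "Photo") == 0;

    free(second);                     /* release the data itself */
    return visible;
}
```

Copying first or second copies only an address, which is why it costs a fraction of copying the data.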
In the same way, because links are used to reference data on a volume, these links can be moved and copied to represent movement of the data to the user while the data remains at the same location.
The concept of a link is depicted in the following diagram:
Most Windows users are already familiar with this concept, having used shortcuts. Shortcuts in Windows behave as a form of softlink. However, a more deeply ingrained version of linking exists in NTFS: the hardlink. All files in NTFS are presented through hardlinks, so one may say that all users of Windows have, in fact, used hardlinks. Nevertheless, NTFS does not restrict data to being presented through a single hardlink. It is possible for the same data to be presented through multiple links.
Thus, to the user, two files containing the same data are presented. However, only one copy of this data exists on the volume.
These hardlinks may occur either in the same or in different folders, but must occur on the same volume.
Data is only removed from the data layer when the last remaining hardlink to that file is deleted. Unlike with Windows shortcuts, a file can therefore be deleted safely without any need to manage the remaining links.
By the way, you may have noticed that moving a large file from one disk to another can take a long time, but moving it between folders on the same volume is almost instantaneous. This is because only the link is transferred when moving within a volume; the data itself stays where it is.
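The deletion behaviour can be checked with a short C sketch. The helper names make_link and data_survives_first_delete are hypothetical, as are any file names passed in. On Windows the sketch calls CreateHardLinkA; elsewhere it falls back to the POSIX link() function, which is assumed here to be a close enough analogue for the demonstration. Both names must be on the same volume.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
/* Argument order mirrors CreateHardLink: new name first, existing second. */
static int make_link(const char *new_name, const char *existing)
{
    return CreateHardLinkA(new_name, existing, NULL) ? 0 : -1;
}
#else
#include <unistd.h>
/* POSIX analogue, used only so the sketch also builds off Windows. */
static int make_link(const char *new_name, const char *existing)
{
    return link(existing, new_name);
}
#endif

/* Returns 0 if the data is still readable after the first name is deleted. */
int data_survives_first_delete(const char *original, const char *alias)
{
    char buf[32] = {0};
    FILE *f;

    assert(original != NULL && alias != NULL);
    remove(original);                 /* start from a clean state */
    remove(alias);

    f = fopen(original, "w");
    if (f == NULL) return -1;
    fputs("woof", f);
    fclose(f);

    if (make_link(alias, original) != 0) return -1;
    if (remove(original) != 0) return -1;   /* delete the first hardlink */

    f = fopen(alias, "r");            /* data is still reachable via the second */
    if (f == NULL) return -1;
    if (fgets(buf, sizeof buf, f) == NULL) buf[0] = '\0';
    fclose(f);

    remove(alias);                    /* last link removed: data can be freed */
    return strcmp(buf, "woof") == 0 ? 0 : -1;
}
```

Calling data_survives_first_delete with two file names in the same folder should return 0 on a volume that supports hardlinks.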
An Example of Practical Uses for Hardlinks
So, what are the potential uses of hardlinks? Well, quite often one will define a folder structure and then attempt to place files into the correct folder. But, all too often, a file should exist in more than one place.
For example, consider a filing system for pictures and photos. One may have a folder for dogs and another for dwellings.
What would you do if you had a photo of a dog in front of a house? Where does it go? It contains a dog, so it should go into the dog folder... but it also contains a house and should perhaps go into the dwellings folder.
It would be possible to place the picture file in one folder and make a copy in the other, but this would waste space. One could also use a shortcut to provide a link to the original file, but this method can lead to management problems, especially in large filing systems. Alternatively, a database that identifies each file with several keywords could be built up, so that a lookup in the database provides the information needed to find the file. This solution is not practical for the average home user to implement. It also does not integrate well with the Windows file system, making it difficult to open files from within a third-party application.
Here, a hardlink may be the ideal solution. The file can be made to exist in both folders at the same time. Should the file need to be edited, the change would be reflected in both "files." The system is fairly easy to set up, and no management is required to ensure that moved and deleted links are handled properly. Because the link exists at a native level for Windows, files can be easily accessed by third-party applications. The only potential downside to this technique is that, after one of the "files" is deleted, the data still remains on the volume and the other link is still valid. Whether this is an advantage or a disadvantage depends on the application.
Accessing and Managing Hardlinks
Hardlinks can be created from the command prompt by using the FSUTIL utility. The syntax for creating a hardlink is:
fsutil hardlink create FileName ExistingFileName
Where FileName is the name of the new "file" to be created, as a link to the original file, ExistingFileName.
From within code, hardlinks are created by utilising the CreateHardLink Windows API function. This function is defined as follows (according to the Visual Studio 6.0 MSDN):
BOOL CreateHardLink( LPCTSTR lpFileName, LPCTSTR lpExistingFileName, LPSECURITY_ATTRIBUTES lpSecurityAttributes );
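lpFileName is the name and path of the link to be created to the original file, given by lpExistingFileName. lpSecurityAttributes points to a security descriptor for the new link (may be NULL to use default security). Note that current Microsoft documentation lists lpSecurityAttributes as reserved and requires it to be NULL.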
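A minimal sketch of calling the function might look as follows. The helper names link_file and edit_visible_through_link are hypothetical, as are any file names passed in. The sketch uses the ANSI variant, CreateHardLinkA, with NULL security attributes. So that it can also be built off Windows, it falls back to the POSIX link() function there, which is an assumption of this example rather than part of the Windows API. It creates a file, links a second name to it, edits the file through the first name, and reads the change back through the second.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
/* Same argument order as CreateHardLink: new link name, then existing file. */
static int link_file(const char *new_name, const char *existing)
{
    if (!CreateHardLinkA(new_name, existing, NULL)) {
        fprintf(stderr, "CreateHardLink failed, error %lu\n",
                (unsigned long)GetLastError());
        return -1;
    }
    return 0;
}
#else
#include <unistd.h>
/* POSIX fallback so the sketch can be tried on other systems. */
static int link_file(const char *new_name, const char *existing)
{
    if (link(existing, new_name) != 0) {
        perror("link");
        return -1;
    }
    return 0;
}
#endif

/* Returns 0 if text appended via `existing` is read back via `new_name`. */
int edit_visible_through_link(const char *existing, const char *new_name)
{
    char buf[64] = {0};
    FILE *f;

    assert(existing != NULL && new_name != NULL);
    remove(existing);                 /* start from a clean state */
    remove(new_name);

    f = fopen(existing, "w");
    if (f == NULL) return -1;
    fputs("dog", f);
    fclose(f);

    if (link_file(new_name, existing) != 0) return -1;

    f = fopen(existing, "a");         /* edit through the original name */
    if (f == NULL) return -1;
    fputs("+house", f);
    fclose(f);

    f = fopen(new_name, "r");         /* read through the new link */
    if (f == NULL) return -1;
    if (fgets(buf, sizeof buf, f) == NULL) buf[0] = '\0';
    fclose(f);

    remove(existing);                 /* tidy up both names */
    remove(new_name);
    return strcmp(buf, "dog+house") == 0 ? 0 : -1;
}
```

In the photo example, existing would be the picture in the dog folder and new_name the path of the same picture in the dwellings folder, both on the same volume.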