Understanding File Processing in C++

Files store large amounts of data on secondary storage devices such as hard drives, optical disks, and tapes. Data held in variables and arrays is temporary: it is lost when the program ends. To store persistent data, we generally use one of two kinds of storage: flat files or databases. Data in a database is organized according to the rules of the database management system that handles its storage and retrieval, which makes databases powerful but comparatively complex. Flat files, on the other hand, are simple and inexpensive. No manager is associated with them, and no rules govern their storage and retrieval unless a program’s logic applies them. This article gives an overview of file handling and shows how to work with ordinary files effectively through C++ program logic.

Overview of a Data Item

At the core, every data item is reduced to a combination of bits, each either zero or one. The smallest data type, char, occupies one byte of memory (typically 8 bits). C++ also supports a data type called wchar_t, which is usually wider than one byte, to support larger character sets such as Unicode. For example, when we store ‘A’, we are actually storing 65 (01000001 in binary, according to the ASCII character set).

#include <iostream>
#include <iomanip>
#include <bitset>
using namespace std;

int main() {
   int c=65;
   cout<<"A = "<<static_cast<char>(c)<<" cast to char"<<endl;
   cout<<"A = "<<bitset<8>(c)<<" in 8 bit binary"<<endl;
   cout<<"A = "<<bitset<16>(c)<<" in 16 bit binary"<<endl;
   cout<<"A = "<<oct<<c<<" in octal"<<endl;
   cout<<"A = "<<hex<<c<<" in hexadecimal"<<endl;
   cout<<"A = "<<dec<<c<<" in decimal"<<endl;
   // Or simply, cout<<"A = "<<c<<" in decimal"<<endl;
   return 0;
}

Output

A = A cast to char
A = 01000001 in 8 bit binary
A = 0000000001000001 in 16 bit binary
A = 101 in octal
A = 41 in hexadecimal
A = 65 in decimal

File Processing in C++

In C++ file processing, a file is nothing but a sequence of bytes without any structure. The end of a file is determined either by a byte count maintained in the underlying platform’s administrative data structures or by a marker called EOF (end-of-file).
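From the program's point of view, reaching the end of a file simply shows up as a read that fails. The following is a minimal sketch of that idea; the function name countBytes is a hypothetical helper introduced here for illustration, not part of the standard library.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: counts the bytes in a file by reading one byte
// at a time until the stream reports that nothing is left.
// Returns 0 if the file cannot be opened (a simplification for this sketch).
std::size_t countBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::size_t n = 0;
    char c;
    while (in.get(c)) {   // get() fails once the end of the file is reached
        ++n;
    }
    return n;
}
```

Testing the read operation itself, rather than calling eof() before reading, avoids counting a failed read as data.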

As we open a file, an object is created. This object is associated with a stream. This stream provides the communicating channel between the program and the file. The cin (standard input) and cout (standard output) objects that we commonly use in C++ are nothing but the objects that open up the channel for input streaming from the keyboard and output streaming to the screen, respectively.

In this regard, it is worth mentioning that many operating systems also treat devices as files (device files). As a result, there is not much difference in the way data items are streamed into and out of a flat file or a device such as a printer, screen, or keyboard. However, there are restrictions on the mode of access, such as read-only, write-only, or read-write.
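One practical consequence is that code written against the general ostream interface does not need to know where the bytes end up. The sketch below illustrates this under that assumption; greet and greetToString are hypothetical names chosen for the example.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes a greeting to any output stream,
// whether it is the screen, a file, or an in-memory buffer.
void greet(std::ostream& out, const std::string& name) {
    out << "Hello, " << name << '\n';
}

// Captures the same output in a string instead of sending it to a device.
std::string greetToString(const std::string& name) {
    std::ostringstream buffer;
    greet(buffer, name);
    return buffer.str();
}
```

Calling greet(std::cout, "screen") prints to the console, while passing an open std::ofstream object sends the same text to a file.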

Besides cin and cout, there are cerr and clog, the standard error objects used to print a program’s error messages. The header file <iostream> declares these standard I/O objects. The header file <fstream> is used for file handling. It defines stream class templates such as basic_ifstream for file input, basic_ofstream for file output, and basic_fstream for file input and output. It also provides typedefs that create aliases for the char specializations: ifstream for basic_ifstream<char>, which enables char input from a file, ofstream for basic_ofstream<char>, which enables char output to a file, and fstream for basic_fstream<char>.

Therefore, the <fstream> header provides three types of support for file I/O:

  • ifstream to read from a given file
  • ofstream to write to a given file
  • fstream to do both the read and write operations

Let’s try an example to process write and read operations with these objects.

Using fstream

#include <iostream>
#include <fstream>
#include <string>   // std::string and getline
using namespace std;
int main(void){
   fstream file;
   // Opening file for writing
   file.open ("testfile.dat", ios::out);
   if(file.is_open()){
      file << "this line will be written into testfile.dat."<<endl;
      file << "this line also will be written into testfile.dat."<<endl;
      file.close();
   }else{
      cerr<<"Error opening file!!"<<endl;
   }
   string buf;
   // Opening file for reading
   file.open("testfile.dat",ios::in);
   if(file.is_open()){
      while(getline(file,buf))
         cout<<buf<<endl;   // getline strips the newline, so add it back
      file.close();
   }else{
      cerr<<"Error opening file!!"<<endl;
   }
   return 0;
}

Using ifstream and ofstream

#include <iostream>
#include <fstream>
#include <string>   // std::string and getline
using namespace std;
int main(void){
   ofstream ofile;
   // Opening file for writing
   ofile.open ("testfile.dat");
   if(ofile.is_open()){
      ofile << "this line will be written into testfile.dat."<<endl;
      ofile << "this line also will be written into testfile.dat."<<endl;
      ofile.close();
   }else{
      cerr<<"Error opening file!!"<<endl;
   }
   string buf;
   // Opening file for reading
   ifstream ifile;
   ifile.open("testfile.dat");
   if(ifile.is_open()){
      while(getline(ifile,buf))
         cout<<buf<<endl;   // getline strips the newline, so add it back
      ifile.close();
   }else{
      cerr<<"Error opening file!!"<<endl;
   }
   return 0;
}

Note: Opening a file for reading (mode parameter ios::in) requires that the file already exist; ios::out on its own, on the other hand, creates the file if it is not found. With an fstream object, a file can be opened for reading and writing at the same time; in that case (ios::in | ios::out) the file must already exist:
fstream file;
file.open ("testfile.dat", ios::in |
   ios::out | ios::binary);
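
The effect of these modes on a missing file can be checked directly. The following sketch wraps each case in a small function; the function names are hypothetical and exist only for this illustration.

```cpp
#include <fstream>
#include <string>

// True if the file can be opened for reading, which requires that it exists.
bool canOpenForReading(const std::string& path) {
    std::ifstream in(path);          // ios::in is implied for ifstream
    return in.is_open();
}

// Opens the file for writing; ios::out creates it when it is missing
// and truncates it when it already exists.
bool createOrTruncate(const std::string& path) {
    std::ofstream out(path);         // ios::out is implied for ofstream
    return out.is_open();
}

// Opens for both reading and writing; like ios::in alone,
// this combination does not create a missing file.
bool canOpenReadWrite(const std::string& path) {
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    return file.is_open();
}
```

On a path that does not exist yet, canOpenForReading and canOpenReadWrite are expected to return false until createOrTruncate has been called once.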

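There are other mode parameters as well, such as ios::app, which opens a file in append mode so that every write goes to the end of the file, ios::ate, which positions the stream at the end of the file when it is opened, and ios::trunc, which discards the existing contents. Refer to the C++ standard API documentation for more details.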
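
As an illustration of append mode, the sketch below adds lines to a text file without disturbing what is already there. The helper names appendLine and countLines are assumptions made for this example.

```cpp
#include <fstream>
#include <string>

// Adds one line to the end of the file, creating the file if needed.
bool appendLine(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::app);   // every write goes to the end
    if (!out) return false;
    out << text << '\n';
    return static_cast<bool>(out);
}

// Counts the lines currently stored in the file (0 if it cannot be opened).
int countLines(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    int n = 0;
    while (std::getline(in, line)) ++n;
    return n;
}
```

Calling appendLine twice on the same path leaves both lines in the file, whereas opening with plain ios::out each time would keep only the last one.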

The parameter ios::binary implies that we are opening a file in binary mode. Reading and writing data items with the extraction (>>) and insertion (<<) operators or with the getline function is not appropriate for raw binary data, because these operations parse and format text. In such a case, the right way is to use the write and read member functions of ostream (ofstream) and istream (ifstream) objects.

  • read(buffer, size);
  • write(buffer, size);
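
A minimal sketch of these two calls, assuming the data is a sequence of int values: saveInts and loadInts are hypothetical helpers, and the resulting file is only meaningful to programs that share the same int size and byte order.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Writes the raw bytes of each int in one write() call.
bool saveInts(const std::string& path, const std::vector<int>& values) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(int)));
    return static_cast<bool>(out);
}

// Reads ints back one at a time until a read() fails at the end of the file.
std::vector<int> loadInts(const std::string& path) {
    std::vector<int> values;
    std::ifstream in(path, std::ios::binary);
    int v;
    while (in.read(reinterpret_cast<char*>(&v), sizeof v)) {
        values.push_back(v);
    }
    return values;
}
```

Both functions take the address of the data and a byte count, which is exactly what read(buffer, size) and write(buffer, size) expect.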

Stream Positioning

A stream object internally keeps track of positions in the file. One is the get position (tellg, seekg), maintained by input streams such as ifstream; it locates the element to be read in the next input operation. The other is the put position (tellp, seekp), maintained by output streams such as ofstream; it locates where the next output operation will write.

We can manipulate these positions with the help of the following functions:

  • tellg(), tellp(): Return a value of the member type streampos, signifying the current read and write position, respectively.
  • seekg(position), seekp(position): Change the location to an absolute position for read and write operations in the file.
  • seekg(offset, direction), seekp(offset, direction): The offset value of type streamoff signifies a position relative to a specific point determined by the enumerated type direction, such as:
    • ios::beg: Offset counted from the beginning of the stream
    • ios::end: Offset counted from the end of the stream
    • ios::cur: Offset counted from the current position
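
The combination of seekg and tellg can be sketched with two small helpers, shown here as an assumption-laden illustration rather than a definitive implementation; fileSize and readTail are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Returns the file size in bytes, or -1 if the file cannot be opened.
long long fileSize(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;
    in.seekg(0, std::ios::end);                  // jump to the end
    return static_cast<long long>(in.tellg());   // get position equals size
}

// Reads up to n bytes from the end of the file.
std::string readTail(const std::string& path, std::streamoff n) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return "";
    in.seekg(0, std::ios::end);
    std::streamoff size = in.tellg();
    if (n < 0) n = 0;
    if (n > size) n = size;                      // never seek before the start
    in.seekg(-n, std::ios::end);                 // n bytes before the end
    std::string tail(static_cast<std::size_t>(n), '\0');
    in.read(&tail[0], n);
    return tail;
}
```

Here ios::end with a negative offset moves backward from the end of the file, while ios::beg with a positive offset would move forward from the start.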

A Quick Example

#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>   // exit
using namespace std;
int main(void){
   string buf="ABCD EFGH HIJK";
   fstream file;
   // Uncomment this if the file doesn't already exist
   // file.open ("testfile.dat", ios::out | ios::binary);
   // file.close();
   file.open ("testfile.dat", ios::out |
      ios::in | ios::binary);
   if(!file.is_open()){
      cerr<<"Error opening file!!"<<endl;
      exit(1);
   }
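   // Write the characters held by the string, not the string object itself
   file.write(buf.data(),buf.size());
   file.seekg(0,ios::end);
   cout<<"size = "<<file.tellg()<<endl;
   file.seekg(0,ios::beg);
   // Read the bytes back into a separate, pre-sized buffer
   string in(buf.size(),'\0');
   file.read(&in[0],in.size());
   cout<<in<<endl;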
   file.close();
   return 0;
}

Accessing Structured Data

As mentioned earlier, flat files have no mechanism to store data items in a systematic format. Also, C++ does not impose any such structure. What we can do is apply certain program logic in such a manner that the retrieval and storage of data maintains a structural behavior. Let’s try an example.

File: employee.h

#ifndef EMPLOYEE_H
#define EMPLOYEE_H
#include <string>
using namespace std;
class Employee
{
private:
   int m_empno;
   char m_fname[20];
   char m_lname[20];
   char m_email[20];
   double m_salary;
public:
   Employee(int=0,string="",string="",string="",
      double=0.0);
   int empno() const;
   void setEmpno(int empno);
   string fname() const;
   void setFname(const string &fname);
   string lname() const;
   void setLname(const string &lname);
   string email() const;
   void setEmail(const string &email);
   double salary() const;
   void setSalary(double salary);
};
#endif   // EMPLOYEE_H

File: employee.cpp

#include "employee.h"
int Employee::empno() const{ return m_empno; }
void Employee::setEmpno(int empno){ m_empno = empno; }
string Employee::fname() const { return m_fname; }
void Employee::setFname(const string &fname)
{
   int sz=fname.size();
   sz=(sz<20? sz: 19);
   fname.copy(m_fname,sz);
   m_fname[sz]='\0';
}
string Employee::lname() const { return m_lname;}
void Employee::setLname(const string &lname)
{
   int sz=lname.size();
   sz=(sz<20? sz: 19);
   lname.copy(m_lname,sz);
   m_lname[sz]='\0';
}
string Employee::email() const { return m_email; }
void Employee::setEmail(const string &email)
{
   int sz=email.size();
   sz=(sz<20? sz: 19);
   email.copy(m_email,sz);
   m_email[sz]='\0';
}
double Employee::salary() const { return m_salary;}
void Employee::setSalary(double salary)
   { m_salary = salary;}
Employee::Employee(int eno, string fn, string ln,
   string mail, double sal)
{
   setEmpno(eno);
   setFname(fn);
   setLname(ln);
   setEmail(mail);
   setSalary(sal);
}

File: main.cpp

#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <cstdlib>   // exit
#include "employee.h"
using namespace std;
const string FILENAME="emp.dat";
void print_table();
bool isExists(int);
void create(Employee);
void create(Employee emp)
{
   if(isExists(emp.empno())==true){
      cout<<"Cannot create! Record with Employee No #"
         <<emp.empno()<<" already exists."<<endl;
      return;
   }
   ofstream outfile(FILENAME, ios::app|ios::binary);
   if(!outfile){
      cout<<"Error opening file!";
      exit(1);
   }
   outfile.write(reinterpret_cast<const char *>
      (&emp),sizeof(Employee));
   outfile.close();
}
bool isExists(int eno)
{
   bool exists=false;
   ifstream infile(FILENAME, ios::in|ios::binary);
   Employee ee;
   // Test the read itself; looping on eof() would process a failed read
   while(infile.read(reinterpret_cast<char *>
      (&ee),sizeof(Employee))){
      if(ee.empno()==eno) {exists=true; break;}
   }
   infile.close();
   return exists;
}
void print_table(){
   cout << left
      << setw(10) << setfill('-') << left << '+'
      << setw(21) << setfill('-') << left << '+'
      << setw(21) << setfill('-') << left << '+'
      << setw(21) << setfill('-') << left << '+'
      << setw(21) << setfill('-') << '+' << '+'
      << endl;
   cout << setfill(' ') << '|' << left
      << setw(9) << "Emp No." << setfill(' ')
      << '|' << setw(20) << "First Name" << setfill(' ')
      << '|' << setw(20) << "Last Name" << setfill(' ')
      << '|' << setw(20) << "Email" << setfill(' ')
      << '|' << right<< setw(20) << "Salary" << '|'
      << endl;
   cout << left << setw(10) << setfill('-') << left << '+'
      << setw(21) << setfill('-') << left << '+'
      << setw(21) << setfill('-') << left << '+'
      << setw(21) << setfill('-') << left << '+'
      << setw(21) << setfill('-') << '+' << '+' << endl;
   Employee record;
   ifstream infile(FILENAME, ios::in|ios::binary);
   // Loop on the read itself so a failed read (or an unopened file) ends the loop
   while(infile.read(reinterpret_cast<char *>
      (&record),sizeof(Employee))){
      cout << setfill(' ') << '|' << left
         << setw(9) << record.empno()
         << setfill(' ') << '|' << setw(20) << record.fname()
         << setfill(' ') << '|' << setw(20) << record.lname()
         << setfill(' ') << '|' << setw(20) << record.email()
         << setfill(' ') << '|' << right << setw(20)
         << record.salary() << '|' << endl;
   }
   infile.close();
   cout << left << setw(10) << setfill('-') << left << '+'
         << setw(21) << setfill('-') << left << '+'
         << setw(21) << setfill('-') << left << '+'
         << setw(21) << setfill('-') << left << '+'
         << setw(21) << setfill('-') << '+' << '+' << endl;
}
int main(void)
{
   ofstream outfile(FILENAME, ios::out|ios::binary);
   if(!outfile){
      cout<<"Error opening file!";
      exit(1);
   }
   outfile.close();
   int empno;
   string fname, lname, email;
   double sal;
   while(true){
      cout<<"\nEnter Employee no.(0 to exit)#";
      cin>>empno;
      if(empno==0) break;
      if(isExists(empno)) {
         cout<<"Employee number exists. Please enter different number."<<endl;
         continue;
      }
      cout<<"\nEnter first name, last name, email, salary\n# ";
      cin>>setw(19)>>fname;
      cin>>setw(19)>>lname;
      cin>>setw(19)>>email;
      cin>>sal;
      Employee emp(empno,fname,lname,email,sal);
      create(emp);
      print_table();
   }
   return 0;
}

Output

Figure 1: Results of running main.cpp
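
Because every record written this way has the same size, the position of the nth record can be computed and reached directly with seekg instead of scanning the file from the start. The following is a minimal sketch of that idea using a simplified, hypothetical Item record rather than the Employee class above; all names in it are assumptions made for illustration.

```cpp
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical fixed-size record; every record occupies sizeof(Item) bytes.
struct Item {
    int id;
    char name[16];
};

// Builds a record, truncating the name to fit the fixed-size field.
Item makeItem(int id, const std::string& name) {
    Item it{};                                   // zero-initialize the fields
    it.id = id;
    std::strncpy(it.name, name.c_str(), sizeof it.name - 1);
    return it;
}

// Appends one record to the end of the file.
bool appendItem(const std::string& path, const Item& it) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    out.write(reinterpret_cast<const char*>(&it), sizeof it);
    return static_cast<bool>(out);
}

// Reads the record at the given zero-based index.
// Returns false if the file cannot be opened or the record does not exist.
bool readItem(const std::string& path, std::streamoff index, Item& result) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(index * static_cast<std::streamoff>(sizeof(Item)), std::ios::beg);
    in.read(reinterpret_cast<char*>(&result), sizeof result);
    return static_cast<bool>(in);
}
```

The same offset calculation, combined with seekp on an fstream opened for reading and writing, would allow a single record to be overwritten in place.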

Conclusion

The key to file processing with the standard C++ API lies in the ifstream and ofstream objects, which inherit from the istream and ostream classes, respectively. An fstream object is more flexible and can be used to open a file in both read and write mode at the same time. The put operations (tellp, seekp) are used for writing, and the get operations (tellg, seekg) are used for reading data items from the file.

Manoj Debnath
A teacher(Professor), actively involved in publishing, research and programming for almost two decades. Authored several articles for reputed sites like CodeGuru, Developer, DevX, Database Journal etc. Some of his research interest lies in the area of programming languages, database, compiler, web/enterprise development etc.
