How to Operate on Strings in C++

Overview

A string, at its core, is simply an array of characters terminated by a binary zero, the null character, as its final element. The C style of handling strings exposes every detail of that representation, and therein lies the problem: programmers must be extra careful when dealing with character arrays. The approach is error-prone because it is difficult to keep track of the difference between static quoted strings and arrays created on the stack or the heap, and an unintended mistake with individual characters can wreak havoc that is hard to debug. Strings are used so commonly that they deserve an identity of their own rather than a bare array of characters. The string class in the Standard C++ Library is a later but apt addition that solves many of the problems of character-array manipulation. Considering its behavior and uses, a string can be treated almost as a member of the C++ family of primitive data types. The string class manages its own memory during assignment and copy construction, accommodates other character types through the basic_string template it is built on, and, with helpers such as std::to_string and std::stoi, converts easily between text and numbers. In sum, there are at least three things a string must be able to do, and the C++ string does all of them:

  • Create and modify the characters stored in a string
  • Pick out a character and locate characters within the sequence
  • Convert string characters from one representation to another

C++ string Representation

The C++ string class hides the underlying character array behind an object-oriented interface, and, like every class, it has well-defined behavior. Concerns about array dimensions and null terminators largely disappear because the class design takes care of them. A string object also maintains its own properties: it knows where its data is stored in memory, how many characters it holds, what those characters are, and how to grow by resizing its internal buffer. As a result, problems such as writing past the end of a fixed buffer, using uninitialized arrays, or holding dangling pointers to freed memory are greatly reduced, although, as shown later, unchecked access through the [] operator can still go out of bounds.

The C++ standard does not prescribe the exact memory layout of the string class. It specifies the behavior and leaves the layout to each vendor, so implementations are free to optimize internally while programs still behave predictably across a variety of compilers.

Initializing string

Using string is so simple that it can almost be treated like a primitive data type, so creating and initializing one is straightforward. A default-constructed string is empty; it does not hold garbage values. Member functions such as size() report the length of the string, and empty() checks whether it contains any characters.

Some variations of string creation and initialization are shown below:

#include <iostream>
#include <string>

using namespace std;

int main()
{
   string s1;
   string s2("Welcome ");
   string s3 = "aboard";
   string s4(s2);
   // Copy 5 characters starting at position 0 ("aboar")
   string s5(s3,0,5);
   string s6 = s2 + s3 + ", Thank you.";
   cout<<"s1 = "<<s1<<endl;
   cout<<"s2 = "<<s2<<endl;
   cout<<"s3 = "<<s3<<endl;
   cout<<"s4 = "<<s4<<endl;
   cout<<"s5 = "<<s5<<endl;
   cout<<"s6 = "<<s6<<endl;

   return 0;
}

Output

Figure 1 shows the output of the preceding code sample.

Figure 1: Output of the previous code

Overloaded Operators and Some Common Functions

The string class overloads many operators and provides several other useful member functions for convenience. The empty function determines whether or not the string is empty, and the substr function returns part of an existing string. Among the overloaded operators, += concatenates strings, = assigns one string to another through the copy assignment operator, and [] returns a reference to a character (an lvalue), so characters can be read and modified as in a simple array. Note, however, that the overloaded [] operator performs no bounds checking, so out-of-range access must be avoided carefully. The at function also accesses arbitrary characters of the string, but it checks the index and throws a std::out_of_range exception if the access goes out of bounds.

Here is a quick example.

#include <iostream>
#include <string>
#include <stdexcept>

using namespace std;

int main()
{

   string s1("ABCDEF");
   string s2("GHI");
   string s3;
   // Testing overloaded operators
   cout<<"\ns1: "<<s1<<"\ns2: "<<s2<<"\ns3: "<<s3<<endl;
   cout<<"Sizes s1, s2, s3: "<<s1.size()<<", "<<s2.size()
      <<", "<<s3.size()<<" respectively."<<endl;
   cout<<"Comparison operators demo:"<<endl;
   cout<<"s1 == s2 is "<<(s1 == s2 ? "true":"false")<<endl;
   cout<<"s1 != s2 is "<<(s1 != s2 ? "true":"false")<<endl;
   cout<<"s1 > s2 is "<<(s1 > s2 ? "true":"false")<<endl;
   cout<<"s1 < s2 is "<<(s1 < s2 ? "true":"false")<<endl;
   cout<<"s1 >= s2 is "<<(s1 >= s2 ? "true":"false")<<endl;
   cout<<"s1 <= s2 is "<<(s1 <= s2 ? "true":"false")<<endl;

   cout<<"is s3 empty? "<<(s3.empty() ?
      "true":"false")<<endl;
   s3 = s2;
   cout<<"s3 = s2, s3 is: "<<s3<<endl;

   s1+=("\n"+s2);
   cout<<"s1+=(\"\\n\"+s2), s1 is: "<<s1<<endl;

   cout<<"substring of s1, up to 10 characters from location 5: "
      <<s1.substr(5,10)<<endl;
   cout<<"substring of s1 from location 10: "
      <<s1.substr(10)<<endl;

   cout<<"testing copy constructor"<<endl;
   string s4(s2);
   cout<<"string s4(s2), s4 is "<<s4<<endl;

   cout<<"access s1 with subscript operator"<<endl;
   for(size_t i=0;i<s1.size();i++) {
      cout<<s1[i]<<" ";
   }

   try {
      cout<<"\nAttempting to access out of range location"<<endl;
      s3.at(100) = 'A';
   } catch (const out_of_range &range_exception) {
      cout<<"An exception occurred. "
         <<range_exception.what()<<endl;
   }

   return 0;
}

Output

Figure 2 shows the output of the previous code sample.

Figure 2: Output from the second code sample

Note that the substr function takes the starting position of the substring as its first argument and the number of characters to extract as its second. Both arguments have default values (position 0 and string::npos, meaning "to the end"), so calling substr with no arguments produces a copy of the entire string. This is convenient: substr can be invoked with no arguments, one argument, or both. If the requested count runs past the end, the substring simply stops at the end of the string.

Using Iterators with string

The string class can be treated like a container in which iterators mark the start and end of a sequence of characters. It is even possible to pass two iterators to a string constructor, as follows:

#include <iostream>
#include <string>
#include <cassert>

using namespace std;

int main()
{
   string s1("Hello");
   string s2(s1.begin(),s1.end());
   string s3("Hi");

   assert(s1 == s2);   // Same content
   assert(s1 != s3);   // Different content

   return 0;
}

Although we can use an index to access individual characters in a string, iterators provide uniform access to any collection or data structure, and a string is, after all, a collection of characters. Using an index is perfectly fine, particularly for random access, but iterators let a string work directly with the standard algorithms and make code easier to refactor, for example when the underlying container type changes.

To iterate over the characters in a string, we can use a simple index loop:

string s4("A simple string.");

for(size_t i = 0; i < s4.size(); i++)
   cout<<s4[i]<<' ';

A range-based for loop (C++11) does the same more concisely:

for(char c: s4)
   cout<<c<<' ';

We can also use iterators explicitly:

for(auto iter = s4.begin(); iter != s4.end(); ++iter)
   cout<<*iter<<' ';

The string Operations

The string class is designed to be safe to use and grows as needed without the programmer's intervention. The tedious housekeeping that C-style strings demand, such as tracking buffer bounds, is handled by the class itself. The class also offers a host of member functions for string manipulation, with intuitive names and judicious use of default arguments.

#include <iostream>
#include <string>

using namespace std;

int main()
{
   string s1("This is a sample text.");

   cout<<"Capacity: "<<s1.capacity()<<endl;

   s1.insert(0,"Hello! ");
   cout<<s1<<endl;
   s1.reserve(128);
   cout<<"Capacity: "<<s1.capacity()<<endl;

   s1.append(" Append this text.");
   cout<<s1<<endl;

   return 0;
}

When we create a string object, its size matches its contents. To find out how many characters the string can hold before it must reallocate storage as it grows, we call the capacity function. To make sure the string has room for at least a certain number of characters, we call reserve; it is an optimization that can avoid repeated reallocations and does not change the string's contents or size. The resize function changes the size itself: it appends characters (null characters by default, or a character we supply) if the new size is larger than the current size, and truncates the string if the new size is smaller. When we insert a string at a specified location, the existing characters shift to make room for the new ones, and the append function adds characters to the end of the current string.
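
Two other frequently used member functions are find and replace. The following program uses them to fill every ? placeholder in a sentence with the same word.
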
#include <iostream>
#include <string>

using namespace std;

int main()
{
   // Adjacent string literals are joined into one at compile time
   string s1("A wisest ? is a ? who does not ? with another "
      "?'s ?.");
   string s2("?");
   string s3("monkey");

   size_t i = 0;
   size_t j;
   while((j = s1.find(s2, i))!= string::npos) {
      s1.replace(j, s2.size(), s3);
      i = j + s3.size();
   }

   cout<<s1<<endl;
   return 0;
}

Output

Figure 3 shows the output of the code above.

Figure 3: Output from the last code sample

The preceding example shows how the find and replace functions can be combined to replace every occurrence of a pattern within a larger text. As we have seen, the insert function adds characters without overwriting the existing ones. The replace function, by contrast, removes a given number of characters at a position and puts the new text in their place; the two lengths need not match. The find function returns the position of the first match of a pattern, searching from an optional starting position, or string::npos if there is no match. Advancing the search position past each replacement, as the loop does with i = j + s3.size(), keeps the newly inserted text from being searched again.

Conclusion

We have discussed only a few of the member functions available in the string class; there are many more, especially if we count the overloaded versions. The class also overloads numerous operators for convenience. Clearly, the string class not only offers a great deal to the programmer but was also designed with ease of use firmly in mind.

Manoj Debnath
A teacher (professor) actively involved in publishing, research, and programming for almost two decades. He has authored several articles for reputed sites such as CodeGuru, Developer, DevX, and Database Journal. His research interests include programming languages, databases, compilers, and web/enterprise development.
