URI Encoding and Decoding

Introduction

This article presents two functions, UriEncode() and UriDecode(), for URI encoding and decoding. Both take a std::string as the argument and return a std::string.

RFC 2396 explains why encoding is needed: "A URI is represented as a sequence of characters, not as a sequence of octets. That is because URI might be 'transported' by means that are not through a computer network, e.g., printed on paper, read over the radio, etc."

UriEncode() maps octets to characters, such as:

"\0\1\2" -> "%00%01%02"
"~ABCD"  -> "%7EABCD"

Every octet other than an alphanumeric character is converted to the three-character sequence "%" HEX HEX. UriDecode() converts such sequences back to the original octets.
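The percent-escape mapping for a single octet can be sketched as follows. This is an illustration only, not the article's code: EscapeOctet and CheckEscapeOctet are hypothetical names introduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the article's code: returns the
// three-character escape "%XY" for one octet.
std::string EscapeOctet(unsigned char octet)
{
   static const char DEC2HEX[] = "0123456789ABCDEF";
   std::string out(1, '%');
   out += DEC2HEX[octet >> 4];    // high nibble -> first hex digit
   out += DEC2HEX[octet & 0x0F];  // low nibble  -> second hex digit
   return out;
}

// Small self-check mirroring the examples above.
void CheckEscapeOctet()
{
   assert(EscapeOctet(0x00) == "%00");
   assert(EscapeOctet('~') == "%7E");
}
```

The high nibble selects the first hex digit and the low nibble the second, so each escaped octet always occupies exactly three characters.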

Code Snippets

Encode:

std::string UriEncode(const std::string & sSrc)
{
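   // SAFE is a lookup table defined outside this snippet, indexed by
   // octet value; a nonzero entry means the octet is copied unescaped.
   const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
   const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
   const std::string::size_type SRC_LEN = sSrc.length();
   // Worst case: every octet expands to three characters.
   unsigned char * const pStart = new unsigned char[SRC_LEN * 3];
   unsigned char * pEnd = pStart;
   const unsigned char * const SRC_END = pSrc + SRC_LEN;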

   for (; pSrc < SRC_END; ++pSrc)
   {
      if (SAFE[*pSrc]) 
         *pEnd++ = *pSrc;
      else
      {
         // escape this char
         *pEnd++ = '%';
         *pEnd++ = DEC2HEX[*pSrc >> 4];
         *pEnd++ = DEC2HEX[*pSrc & 0x0F];
      }
   }

   std::string sResult((char *)pStart, (char *)pEnd);
   delete [] pStart;
   return sResult;
}
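UriEncode() relies on a SAFE lookup table that the snippet does not define. Below is a minimal sketch of one possible table, assuming only ASCII letters and digits are left unescaped (which matches the "~ABCD" example earlier). The SafeTable type and CheckSafeTable function are hypothetical, and the table in the article's actual source may differ.

```cpp
#include <cassert>

// Hypothetical stand-in for the SAFE table referenced by UriEncode().
// Assumption: only ASCII letters and digits are copied through unescaped.
struct SafeTable
{
   bool safe[256];

   SafeTable()
   {
      for (int c = 0; c < 256; ++c)
         safe[c] = (c >= '0' && c <= '9')
                || (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z');
   }

   // Indexing by unsigned char keeps every lookup inside the 256 entries.
   bool operator[](unsigned char c) const { return safe[c]; }
};

static const SafeTable SAFE;

// Small self-check of the assumed policy.
void CheckSafeTable()
{
   assert(SAFE['A']);
   assert(!SAFE['~']);
}
```

Building the table once and indexing it per octet keeps the encoder's inner loop to a single array lookup, which is the design the article credits for its speed.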

Decode:

std::string UriDecode(const std::string & sSrc)
{
   // Note from RFC1630: "Sequences which start with a percent
   // sign but are not followed by two hexadecimal characters
   // (0-9, A-F) are reserved for future extension"
   // Such sequences are copied through unchanged.

   // HEX2DEC is a lookup table defined outside this snippet; it maps an
   // octet to its hex digit value, or -1 if the octet is not a hex digit.
   const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
   const std::string::size_type SRC_LEN = sSrc.length();
   const unsigned char * const SRC_END = pSrc + SRC_LEN;

   // Decoding never grows the data, so SRC_LEN bytes are enough.
   char * const pStart = new char[SRC_LEN];
   char * pEnd = pStart;

   while (pSrc < SRC_END)
   {
      // A '%' is decodable only if two more characters follow it.
      if (*pSrc == '%' && SRC_END - pSrc > 2)
      {
         // signed char, so the -1 sentinel compares correctly even
         // on platforms where plain char is unsigned
         signed char dec1, dec2;
         if (-1 != (dec1 = HEX2DEC[*(pSrc + 1)])
            && -1 != (dec2 = HEX2DEC[*(pSrc + 2)]))
         {
            *pEnd++ = (char)((dec1 << 4) + dec2);
            pSrc += 3;
            continue;
         }
      }

      // ordinary character, or a '%' that cannot be decoded
      *pEnd++ = *pSrc++;
   }

   std::string sResult(pStart, pEnd);
   delete [] pStart;
   return sResult;
}
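UriDecode() likewise relies on a HEX2DEC table defined outside the snippet. The following is a sketch of what such a table might look like, not the table from the article's source. It assumes both uppercase and lowercase hex digits are accepted and that -1 marks a non-hex octet; HexTable, DecodePair and CheckDecodePair are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stand-in for the HEX2DEC table used by UriDecode().
// Maps an octet to its hex digit value (0..15), or -1 if not a hex digit.
struct HexTable
{
   signed char value[256];

   HexTable()
   {
      for (int c = 0; c < 256; ++c)
         value[c] = -1;
      for (int d = 0; d < 10; ++d)
         value['0' + d] = static_cast<signed char>(d);
      for (int d = 0; d < 6; ++d)
      {
         // Assumption: lowercase digits are accepted as well as uppercase.
         value['A' + d] = static_cast<signed char>(10 + d);
         value['a' + d] = static_cast<signed char>(10 + d);
      }
   }

   signed char operator[](unsigned char c) const { return value[c]; }
};

static const HexTable HEX2DEC;

// Hypothetical helper: decodes the two hex digits that follow a '%'.
// Returns the octet value 0..255, or -1 if either digit is invalid.
int DecodePair(unsigned char hi, unsigned char lo)
{
   const int h = HEX2DEC[hi];
   const int l = HEX2DEC[lo];
   if (h < 0 || l < 0)
      return -1;
   return (h << 4) | l;
}

// Small self-check.
void CheckDecodePair()
{
   assert(DecodePair('7', 'E') == 0x7E);
   assert(DecodePair('G', '0') == -1);
}
```

Storing the entries as signed char makes the -1 sentinel reliable on platforms where plain char is unsigned.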

Usage Example

To use the functions, either copy the source code into your project or declare them as extern and link against the file that defines them.

#include <cassert>
#include <string>

int main()
{
   extern std::string UriEncode(const std::string & sSrc);
   extern std::string UriDecode(const std::string & sSrc);
   // The explicit length keeps the leading '\0' in the string.
   const std::string ORG("\0\1\2", 3);
   const std::string ENC("%00%01%02");
   assert(UriEncode(ORG) == ENC);
   assert(UriDecode(ENC) == ORG);
   return 0;
}

Differences from Other Implementations

Other implementations of URI/URL encoding and decoding exist.

This implementation differs from them in that it:

  • Has a decode function.
  • Encodes an arbitrary char buffer, including embedded '\0' characters. Example: "ABC\0ABC" -> "ABC%00ABC"
  • Runs faster because it uses an array to do the mapping.
  • Is portable. It doesn't use MFC CString.
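The embedded '\0' case depends on how the std::string is constructed. The sketch below illustrates it with a compact stand-in encoder, not the article's UriEncode(). It assumes only ASCII letters and digits are left unescaped; EncodeSketch and CheckEmbeddedNul are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical compact encoder used only for this illustration.
// Assumption: only ASCII letters and digits are copied through unescaped.
std::string EncodeSketch(const std::string & src)
{
   static const char DEC2HEX[] = "0123456789ABCDEF";
   std::string out;
   out.reserve(src.size() * 3);  // worst case: every octet escaped
   for (std::string::size_type i = 0; i < src.size(); ++i)
   {
      const unsigned char c = static_cast<unsigned char>(src[i]);
      const bool alnum = (c >= '0' && c <= '9')
                      || (c >= 'A' && c <= 'Z')
                      || (c >= 'a' && c <= 'z');
      if (alnum)
         out += static_cast<char>(c);
      else
      {
         out += '%';
         out += DEC2HEX[c >> 4];
         out += DEC2HEX[c & 0x0F];
      }
   }
   return out;
}

void CheckEmbeddedNul()
{
   // The explicit length keeps the '\0' and the trailing "ABC".
   const std::string withNul("ABC\0ABC", 7);
   assert(withNul.size() == 7);
   assert(EncodeSketch(withNul) == "ABC%00ABC");

   // Without the length, construction stops at the first '\0'.
   const std::string truncated("ABC\0ABC");
   assert(EncodeSketch(truncated) == "ABC");
}
```

Because the loop runs over size() rather than stopping at a terminator, the encoder sees every octet the caller placed in the string.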


Downloads

Comments

  • Programmer

    Posted by Tim on 02/25/2014 09:14am

    Note to other users: The HEX2DEC array used in the decoding function is in the .cpp in the download file.

  • Or we could encode like this

    Posted by Anonymous on 12/02/2012 07:25pm

    99% of the time the hex codes will be double digit in reality std::string URLEscape(char*url) { std::ostringstream s; for (;*url;s

  • License

    Posted by Cem Kalyoncu on 05/26/2012 10:50am

    Pretty good, is it possible to specify the license? MIT/BSD/PD/Apache licenses would be most useful.
