URI Encoding and Decoding

Introduction

Here are two functions of URI encoding and decoding. They use std::string as the argument and return type.

A URI is represented as a sequence of characters, not as a sequence of octets. That is because URI might be "transported" by means that are not through a computer network, e.g., printed on paper, read over the radio, etc.—RFC2396

UriEncode() maps octets to characters, such as:

"\0\1\2" -> "%00%01%02"
"~ABCD"  -> "%7EABCD"

Each octet except alphanum is converted to "% HEX HEX". UriDecode() converts them back.

Code Snippets

Encode:

std::string UriEncode(const std::string & sSrc)
{
   const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
   const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
   const int SRC_LEN = sSrc.length();
   unsigned char * const pStart = new unsigned char[SRC_LEN * 3];
   unsigned char * pEnd = pStart;
   const unsigned char * const SRC_END = pSrc + SRC_LEN;

   for (; pSrc < SRC_END; ++pSrc)
   {
      if (SAFE[*pSrc]) 
         *pEnd++ = *pSrc;
      else
      {
         // escape this char
         *pEnd++ = '%';
         *pEnd++ = DEC2HEX[*pSrc >> 4];
         *pEnd++ = DEC2HEX[*pSrc & 0x0F];
      }
   }

   std::string sResult((char *)pStart, (char *)pEnd);
   delete [] pStart;
   return sResult;
}

Decode:

std::string UriDecode(const std::string & sSrc)
{
   // Note from RFC1630: "Sequences which start with a percent
   // sign but are not followed by two hexadecimal characters
   // (0-9, A-F) are reserved for future extension"

   const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
   const int SRC_LEN = sSrc.length();
   const unsigned char * const SRC_END = pSrc + SRC_LEN;
   // last decodable '%' 
   const unsigned char * const SRC_LAST_DEC = SRC_END - 2;

   char * const pStart = new char[SRC_LEN];
   char * pEnd = pStart;

   while (pSrc < SRC_LAST_DEC)
   {
      if (*pSrc == '%')
      {
         char dec1, dec2;
         if (-1 != (dec1 = HEX2DEC[*(pSrc + 1)])
            && -1 != (dec2 = HEX2DEC[*(pSrc + 2)]))
         {
            *pEnd++ = (dec1 << 4) + dec2;
            pSrc += 3;
            continue;
         }
      }

      *pEnd++ = *pSrc++;
   }

   // the last 2- chars
   while (pSrc < SRC_END)
      *pEnd++ = *pSrc++;

   std::string sResult(pStart, pEnd);
   delete [] pStart;
   return sResult;
}

Usage Example

Just copy the source codes or external link the functions to use them.

int main()
{
   extern std::string UriEncode(const std::string & sSrc);
   extern std::string UriDecode(const std::string & sSrc);
   const std::string ORG("\0\1\2", 3);
   const std::string ENC("%00%01%02");
   assert(UriEncode(ORG) == ENC);
   assert(UriDecode(ENC) == ORG);
   return 0;
}

Differences from Other Implementations

Following are other implementations on URI/URL encoding and decoding.

This implementation differs from the above in these ways because it:

  • Has a decode function.
  • Encodes a buffer. Also, it supports encoding a char buffer, including '\0'. Example:
  • "ABC\0ABC" -> "ABC%00ABC"
  • Runs faster because it uses a array to do the mapping.
  • Is portable. It doesn't use MFC CString.


Downloads

Comments

  • Programmer

    Posted by Tim on 02/25/2014 09:14am

    Note to other users: The HEX2DEC array used in the decoding function is in the .cpp in the download file.

    Reply
  • Or we could encode like this

    Posted by Anonymous on 12/02/2012 07:25pm

    99% of the time the hex codes will be double digit in reality std::string URLEscape(char*url) { std::ostringstream s; for (;*url;s

    Reply
  • License

    Posted by Cem Kalyoncu on 05/26/2012 10:50am

    Pretty good, is it possible to specify the license? MIT/BSD/PD/Apache licenses would be most useful.

    Reply
Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • 10 Rules that Make or Break Enterprise App Development Projects In today's app-driven world, application development is a top priority. Even so, 68% of enterprise application delivery projects fail. Designing and building applications that pay for themselves and adapt to future needs is incredibly difficult. Executing one successful project is lucky, but making it a repeatable process and strategic advantage? That's where the money is. With help from our most experienced project leads and software engineers, …

  • As everyone scrambles to protect customers and consumers from the Heartbleed virus, there will be a variety of mitigating solutions offered up to address this pesky bug. There are a variety of points within the data path where solutions could be put into place to mitigate this (and similar) vulnerabilities and customers must choose the most strategic point in the network at which to deploy their selected mitigation. Read this white paper to learn the ins and outs of mitigating the risk of Heartbleed and the …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds