URI Encoding and Decoding

Introduction

This article presents two functions, one for URI encoding and one for URI decoding. Both take a std::string argument and return a std::string.

"A URI is represented as a sequence of characters, not as a sequence of octets. That is because URI might be "transported" by means that are not through a computer network, e.g., printed on paper, read over the radio, etc." (RFC2396)

UriEncode() maps octets to characters, such as:

"\0\1\2" -> "%00%01%02"
"~ABCD"  -> "%7EABCD"

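Every octet that is not an alphanumeric character is converted to the form "% HEX HEX", that is, a percent sign followed by two hexadecimal digits. UriDecode() converts such sequences back to the original octets.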
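
The mapping of a single octet can be sketched in isolation. This is only an illustration of the "% HEX HEX" form described above; EscapeOctet is a hypothetical helper name and is not part of the article's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the article): returns the three-character
// escape "%XX" for one octet, using uppercase hexadecimal digits.
std::string EscapeOctet(unsigned char octet)
{
   const char DEC2HEX[] = "0123456789ABCDEF";
   std::string out = "%";
   out += DEC2HEX[octet >> 4];    // high nibble
   out += DEC2HEX[octet & 0x0F];  // low nibble
   return out;
}

// Usage sketch mirroring the mappings listed above.
void EscapeOctetDemo()
{
   assert(EscapeOctet(0x00) == "%00");
   assert(EscapeOctet('~') == "%7E");
}
```

The high nibble selects the first hexadecimal digit and the low nibble the second, which is the same lookup the encoder below performs per octet.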

Code Snippets

Encode:

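// SAFE is a 256-entry lookup table, not shown in this snippet, that is
// nonzero for octets copied through unescaped (the alphanumerics).
std::string UriEncode(const std::string & sSrc)
{
   const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
   const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
   // size_type rather than int, so long strings are not truncated
   const std::string::size_type SRC_LEN = sSrc.length();
   // worst case: every octet expands to three characters
   unsigned char * const pStart = new unsigned char[SRC_LEN * 3];
   unsigned char * pEnd = pStart;
   const unsigned char * const SRC_END = pSrc + SRC_LEN;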

   for (; pSrc < SRC_END; ++pSrc)
   {
      if (SAFE[*pSrc]) 
         *pEnd++ = *pSrc;
      else
      {
         // escape this char
         *pEnd++ = '%';
         *pEnd++ = DEC2HEX[*pSrc >> 4];
         *pEnd++ = DEC2HEX[*pSrc & 0x0F];
      }
   }

   std::string sResult((char *)pStart, (char *)pEnd);
   delete [] pStart;
   return sResult;
}
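
The encoder indexes a SAFE table that the snippet does not show. One way such a table could be built is sketched below, assuming an ASCII execution character set and assuming that only letters and digits are left unescaped, as the introduction states. MakeSafeTable is a hypothetical helper name, and the author's actual table may be laid out differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical builder (an assumption, not the author's code): marks only
// ASCII letters and digits as safe; every other octet will be escaped.
std::array<bool, 256> MakeSafeTable()
{
   std::array<bool, 256> table{};  // value-initialized: all false
   for (std::size_t c = '0'; c <= '9'; ++c) table[c] = true;
   for (std::size_t c = 'A'; c <= 'Z'; ++c) table[c] = true;
   for (std::size_t c = 'a'; c <= 'z'; ++c) table[c] = true;
   return table;
}

// Indexed by octet value, in the same way as SAFE[*pSrc] in the encoder.
const std::array<bool, 256> SAFE = MakeSafeTable();

// Usage sketch: letters and digits pass through, everything else does not.
void SafeTableDemo()
{
   assert(SAFE['a'] && SAFE['Z'] && SAFE['5']);
   assert(!SAFE['~'] && !SAFE[' '] && !SAFE[0]);
}
```

A table lookup keeps the per-octet test to a single indexed read, which is the design choice the article credits for its speed.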

Decode:

std::string UriDecode(const std::string & sSrc)
{
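   // Note from RFC1630: "Sequences which start with a percent
   // sign but are not followed by two hexadecimal characters
   // (0-9, A-F) are reserved for future extension"
   // Such sequences are copied through unchanged here.
   // HEX2DEC is a 256-entry lookup table, not shown in this snippet, that
   // maps a hex digit to its value and every other octet to -1.

   const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
   const std::string::size_type SRC_LEN = sSrc.length();
   const unsigned char * const SRC_END = pSrc + SRC_LEN;

   char * const pStart = new char[SRC_LEN];
   char * pEnd = pStart;

   // a '%' starts an escape only if at least two more chars follow it;
   // comparing distances avoids forming a pointer before the buffer
   // when the input is shorter than two chars
   while (SRC_END - pSrc > 2)
   {
      if (*pSrc == '%')
      {
         // signed, so the -1 sentinel compares correctly even where
         // plain char is unsigned
         signed char dec1, dec2;
         if (-1 != (dec1 = HEX2DEC[*(pSrc + 1)])
            && -1 != (dec2 = HEX2DEC[*(pSrc + 2)]))
         {
            *pEnd++ = (char)((dec1 << 4) + dec2);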
            pSrc += 3;
            continue;
         }
      }

      *pEnd++ = *pSrc++;
   }

   // copy the last two or fewer chars, which cannot hold a full escape
   while (pSrc < SRC_END)
      *pEnd++ = *pSrc++;

   std::string sResult(pStart, pEnd);
   delete [] pStart;
   return sResult;
}
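
UriDecode() relies on a HEX2DEC table that the snippet does not show. A minimal sketch of such a table follows, assuming an ASCII execution character set. MakeHex2DecTable is a hypothetical helper name, and accepting lowercase a-f is an assumption made here; the RFC1630 note quoted in the code mentions only 0-9 and A-F.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical builder (an assumption, not the author's code): maps each
// hexadecimal digit to its value and every other octet to -1.
std::array<signed char, 256> MakeHex2DecTable()
{
   std::array<signed char, 256> table;
   table.fill(-1);  // -1 marks "not a hex digit"
   for (std::size_t i = 0; i < 10; ++i)
      table['0' + i] = static_cast<signed char>(i);
   for (std::size_t i = 0; i < 6; ++i)
   {
      table['A' + i] = static_cast<signed char>(10 + i);
      table['a' + i] = static_cast<signed char>(10 + i);  // assumption
   }
   return table;
}

// Indexed by octet value, in the same way as HEX2DEC[*(pSrc + 1)] above.
const std::array<signed char, 256> HEX2DEC = MakeHex2DecTable();

// Usage sketch: digits map to values, anything else to the -1 sentinel.
void Hex2DecDemo()
{
   assert(HEX2DEC['7'] == 7);
   assert(HEX2DEC['E'] == 14);
   assert(HEX2DEC['G'] == -1);
}
```

Storing the entries as signed char keeps the -1 sentinel meaningful on platforms where plain char is unsigned.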

Usage Example

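To use the functions, either copy the source code into your project or declare the functions and link against the file that defines them.

#include <cassert>
#include <string>

// defined in another source file
std::string UriEncode(const std::string & sSrc);
std::string UriDecode(const std::string & sSrc);

int main()
{
   // the length argument keeps the embedded '\0' in the string
   const std::string ORG("\0\1\2", 3);
   const std::string ENC("%00%01%02");
   assert(UriEncode(ORG) == ENC);
   assert(UriDecode(ENC) == ORG);
   return 0;
}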

Differences from Other Implementations

Other implementations of URI/URL encoding and decoding exist.

This implementation differs from them because it:

  • Has a decode function.
  • Encodes a buffer rather than a C string, so it can encode octets that include '\0'. Example: "ABC\0ABC" -> "ABC%00ABC"
  • Runs faster because it uses an array to do the mapping.
  • Is portable. It doesn't use MFC CString.
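
The point about '\0' depends on how the input std::string is constructed. The sketch below shows the difference, using hypothetical helper names that are not part of the article's code: constructing from a string literal alone stops at the first '\0', while passing an explicit length keeps the whole buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the buffer "ABC\0ABC" with an explicit
// length, so the embedded '\0' and the octets after it are kept.
std::string MakeBufferWithNul()
{
   return std::string("ABC\0ABC", 7);
}

// Hypothetical helper: the same literal without a length is treated as a
// C string and is cut off at the first '\0'.
std::string MakeTruncatedAtNul()
{
   return std::string("ABC\0ABC");
}

// Usage sketch: only the first form holds all seven octets.
void EmbeddedNulDemo()
{
   assert(MakeBufferWithNul().size() == 7);
   assert(MakeTruncatedAtNul().size() == 3);
}
```

Only a string built the first way gives the encoder all seven octets to work with.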


Comments

  • Programmer

    Posted by Tim on 02/25/2014 09:14am

    Note to other users: The HEX2DEC array used in the decoding function is in the .cpp in the download file.

  • Or we could encode like this

    Posted by Anonymous on 12/02/2012 07:25pm

    99% of the time the hex codes will be double digit in reality std::string URLEscape(char*url) { std::ostringstream s; for (;*url;s

  • License

    Posted by Cem Kalyoncu on 05/26/2012 10:50am

    Pretty good, is it possible to specify the license? MIT/BSD/PD/Apache licenses would be most useful.
