URI Encoding and Decoding

Introduction

Here are two functions for URI encoding and decoding. Both take a std::string argument and return a std::string.

A URI is represented as a sequence of characters, not as a sequence of octets. That is because URI might be "transported" by means that are not through a computer network, e.g., printed on paper, read over the radio, etc. (RFC 2396)

UriEncode() maps octets to characters. For example:

"\0\1\2" -> "%00%01%02"
"~ABCD"  -> "%7EABCD"

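Every octet other than an ASCII letter or digit is converted to "%" HEX HEX, a percent sign followed by two hexadecimal digits giving the octet's value. UriDecode() converts such sequences back to the original octets.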
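The "%" HEX HEX form splits an octet into its high four bits and its low four bits (two nibbles) and writes each nibble as one hexadecimal digit. A minimal sketch of that step for a single octet; EscapeOctet is a hypothetical helper name, not part of the article's code:

```cpp
#include <string>

// Hypothetical helper: escape one octet as "%HH" using uppercase hex digits.
std::string EscapeOctet(unsigned char c)
{
    static const char DEC2HEX[] = "0123456789ABCDEF";
    std::string out(1, '%');
    out += DEC2HEX[c >> 4];    // high nibble, e.g. 0x7E >> 4   == 0x7
    out += DEC2HEX[c & 0x0F];  // low nibble,  e.g. 0x7E & 0x0F == 0xE
    return out;
}
```

With this helper, EscapeOctet('~') yields "%7E" and EscapeOctet('\0') yields "%00", matching the examples above.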

Code Snippets

Encode:

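#include <string>

// SAFE[c] is 1 if octet c is copied unchanged (ASCII letters and digits)
// and 0 if it must be escaped as "%HH". Rows 8-F (non-ASCII octets) are
// omitted: aggregate initialization sets the remaining entries to 0.
const char SAFE[256] =
{
   /*      0 1 2 3  4 5 6 7  8 9 A B  C D E F */
   /* 0 */ 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
   /* 1 */ 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
   /* 2 */ 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
   /* 3 */ 1,1,1,1, 1,1,1,1, 1,1,0,0, 0,0,0,0,
   /* 4 */ 0,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1,
   /* 5 */ 1,1,1,1, 1,1,1,1, 1,1,1,0, 0,0,0,0,
   /* 6 */ 0,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1,
   /* 7 */ 1,1,1,1, 1,1,1,1, 1,1,1,0, 0,0,0,0
};

std::string UriEncode(const std::string & sSrc)
{
   const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
   const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
   const std::string::size_type SRC_LEN = sSrc.length();
   // worst case: every octet expands to three characters
   unsigned char * const pStart = new unsigned char[SRC_LEN * 3];
   unsigned char * pEnd = pStart;
   const unsigned char * const SRC_END = pSrc + SRC_LEN;

   for (; pSrc < SRC_END; ++pSrc)
   {
      if (SAFE[*pSrc])
         *pEnd++ = *pSrc;
      else
      {
         // escape this octet: '%' followed by its high and low nibbles
         *pEnd++ = '%';
         *pEnd++ = DEC2HEX[*pSrc >> 4];
         *pEnd++ = DEC2HEX[*pSrc & 0x0F];
      }
   }

   std::string sResult((char *)pStart, (char *)pEnd);
   delete [] pStart;
   return sResult;
}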
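UriEncode() depends on the 256-entry SAFE lookup table. If hard-coding the table is inconvenient, an equivalent one can be filled in at startup. This is a sketch, assuming "safe" means exactly the ASCII letters and digits described in the introduction and an ASCII execution character set; MakeSafeTable is a hypothetical name:

```cpp
#include <array>

// Hypothetical helper: build a 256-entry table where entry c is 1 if octet c
// is an ASCII letter or digit (copied unchanged) and 0 otherwise (escaped).
// Assumes '0'-'9', 'A'-'Z' and 'a'-'z' are contiguous, as they are in ASCII.
std::array<char, 256> MakeSafeTable()
{
    std::array<char, 256> table{};  // value-initialized: every entry starts at 0
    for (int c = '0'; c <= '9'; ++c) table[c] = 1;
    for (int c = 'A'; c <= 'Z'; ++c) table[c] = 1;
    for (int c = 'a'; c <= 'z'; ++c) table[c] = 1;
    return table;
}
```

A table built this way, e.g. static const std::array<char, 256> SAFE = MakeSafeTable();, can be indexed with SAFE[*pSrc] exactly like the array. Explicit ranges are used instead of std::isalnum() because isalnum() depends on the current locale and may classify non-ASCII octets as letters.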

Decode:

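#include <string>

// HEX2DEC[c] is the value of hex digit c (0-9, A-F, a-f), or -1 if c is not
// a hex digit. signed char keeps the -1 test correct on platforms where
// plain char is unsigned.
const signed char HEX2DEC[256] =
{
   /*       0  1  2  3   4  5  6  7   8  9  A  B   C  D  E  F */
   /* 0 */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* 1 */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* 2 */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* 3 */  0, 1, 2, 3,  4, 5, 6, 7,  8, 9,-1,-1, -1,-1,-1,-1,
   /* 4 */ -1,10,11,12, 13,14,15,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* 5 */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* 6 */ -1,10,11,12, 13,14,15,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* 7 */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* 8 */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* 9 */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* A */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* B */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* C */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* D */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* E */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
   /* F */ -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1
};

std::string UriDecode(const std::string & sSrc)
{
   // Note from RFC1630: "Sequences which start with a percent
   // sign but are not followed by two hexadecimal characters
   // (0-9, A-F) are reserved for future extension"
   // Such sequences are copied through unchanged.

   const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
   const std::string::size_type SRC_LEN = sSrc.length();
   const unsigned char * const SRC_END = pSrc + SRC_LEN;
   // One past the last position where a complete "%HH" can start.
   // For inputs shorter than 2 octets, use pSrc so that no pointer
   // before the start of the buffer is ever computed.
   const unsigned char * const SRC_LAST_DEC = (SRC_LEN < 2) ? pSrc : SRC_END - 2;

   // decoding never produces more octets than the input holds
   char * const pStart = new char[SRC_LEN];
   char * pEnd = pStart;

   while (pSrc < SRC_LAST_DEC)
   {
      if (*pSrc == '%')
      {
         const signed char dec1 = HEX2DEC[*(pSrc + 1)];
         const signed char dec2 = HEX2DEC[*(pSrc + 2)];
         if (dec1 != -1 && dec2 != -1)
         {
            *pEnd++ = (char)((dec1 << 4) + dec2);
            pSrc += 3;
            continue;
         }
      }

      *pEnd++ = *pSrc++;
   }

   // copy the remaining (at most two) characters, which cannot start a "%HH"
   while (pSrc < SRC_END)
      *pEnd++ = *pSrc++;

   std::string sResult(pStart, pEnd);
   delete [] pStart;
   return sResult;
}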
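UriDecode() leaves any '%' that is not followed by two hex digits untouched, as the RFC 1630 note suggests. The same rule can be expressed without a lookup table. The following sketch, using the hypothetical names HexValue and PercentDecode, trades the table for a few comparisons and makes the pass-through behaviour easy to check:

```cpp
#include <string>

// Hypothetical helper: value of a hex digit (0-9, A-F, a-f), or -1 otherwise.
int HexValue(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decode "%HH" sequences; any '%' not followed by two hex digits is copied as-is.
std::string PercentDecode(const std::string & src)
{
    std::string out;
    out.reserve(src.size());  // decoding never makes the string longer
    for (std::string::size_type i = 0; i < src.size(); ++i)
    {
        if (src[i] == '%' && i + 2 < src.size())
        {
            const int hi = HexValue(static_cast<unsigned char>(src[i + 1]));
            const int lo = HexValue(static_cast<unsigned char>(src[i + 2]));
            if (hi >= 0 && lo >= 0)
            {
                out += static_cast<char>((hi << 4) | lo);
                i += 2;  // together with the loop's ++i, skips the whole "%HH"
                continue;
            }
        }
        out += src[i];
    }
    return out;
}
```

For example, "%41%62c" decodes to "Abc", while "100%" and "%zz%2" come back unchanged.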

Usage Example

To use the functions, copy the source into your project, or compile it separately and link against it, declaring the functions as in the example below.

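#include <cassert>
#include <string>

int main()
{
   // declarations for UriEncode() and UriDecode() defined in another file
   extern std::string UriEncode(const std::string & sSrc);
   extern std::string UriDecode(const std::string & sSrc);
   // pass the length explicitly so the string keeps its embedded '\0'
   const std::string ORG("\0\1\2", 3);
   const std::string ENC("%00%01%02");
   assert(UriEncode(ORG) == ENC);
   assert(UriDecode(ENC) == ORG);
   return 0;
}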

Differences from Other Implementations

The following are other implementations of URI/URL encoding and decoding.

This implementation differs from those in that it:

  • Has a decode function.
  • Encodes an arbitrary buffer, including embedded '\0' characters. Example: "ABC\0ABC" -> "ABC%00ABC"
  • Runs faster because it uses lookup arrays (SAFE and HEX2DEC) to do the mapping.
  • Is portable: it uses std::string rather than MFC CString.
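Encoding a buffer with embedded '\0' only works if the std::string actually holds those octets. Constructing a std::string from a string literal stops at the first '\0', so the length must be passed explicitly, as the usage example does with ORG("\0\1\2", 3). A small sketch of the difference; EncodeBuffer is a hypothetical compact encoder that treats ASCII letters and digits as safe, standing in for UriEncode() so the block is self-contained:

```cpp
#include <string>

// Hypothetical compact encoder: ASCII letters and digits pass through,
// every other octet (including '\0') becomes "%HH".
std::string EncodeBuffer(const std::string & src)
{
    static const char DEC2HEX[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < src.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        const bool safe = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                          (c >= 'a' && c <= 'z');
        if (safe)
            out += static_cast<char>(c);
        else
        {
            out += '%';
            out += DEC2HEX[c >> 4];
            out += DEC2HEX[c & 0x0F];
        }
    }
    return out;
}

// The literal-only constructor stops at the first '\0'; the
// (pointer, length) constructor keeps all seven octets.
const std::string TRUNCATED("ABC\0ABC");  // holds "ABC" (3 octets)
const std::string FULL("ABC\0ABC", 7);    // holds all 7 octets
```

EncodeBuffer(FULL) produces "ABC%00ABC", while EncodeBuffer(TRUNCATED) silently produces just "ABC".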



Comments

  • Programmer

    Posted by Tim on 02/25/2014 09:14am

    Note to other users: The HEX2DEC array used in the decoding function is in the .cpp in the download file.

  • Or we could encode like this

    Posted by Anonymous on 12/02/2012 07:25pm

    99% of the time the hex codes will be double digit in reality std::string URLEscape(char*url) { std::ostringstream s; for (;*url;s

  • License

    Posted by Cem Kalyoncu on 05/26/2012 10:50am

    Pretty good, is it possible to specify the license? MIT/BSD/PD/Apache licenses would be most useful.
