
    URI Encoding and Decoding



    Introduction

    Here are two functions for URI encoding and decoding. Both take a std::string argument and return a std::string.

    As RFC 2396 explains: "A URI is represented as a sequence of characters, not as a sequence of octets. That is because URI might be 'transported' by means that are not through a computer network, e.g., printed on paper, read over the radio, etc."

    UriEncode() maps octets to characters, such as:

    "\0\1\2" -> "%00%01%02"
    "~ABCD"  -> "%7EABCD"
    

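    Every octet that is not an ASCII alphanumeric (0-9, A-Z, a-z) is converted to "%" followed by two hexadecimal digits ("% HEX HEX"). UriDecode() converts such sequences back into the original octets.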
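    The escape itself is simple nibble arithmetic. The high four bits of the octet select the first hex digit, and the low four bits select the second. So '~' (0x7E) becomes "%7E". The following is a minimal sketch of escaping one octet; EscapeOctet is a hypothetical helper for illustration, not part of the article's code:

```cpp
    #include <string>

    // Hypothetical helper (not from the article): escapes a single octet
    // as '%' followed by two upper-case hex digits.
    std::string EscapeOctet(unsigned char c)
    {
       static const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
       std::string out = "%";
       out += DEC2HEX[c >> 4];     // high nibble, e.g. 0x7E -> '7'
       out += DEC2HEX[c & 0x0F];   // low nibble,  e.g. 0x7E -> 'E'
       return out;
    }
```

    UriEncode() applies this same step to every octet that its lookup table marks as unsafe.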

    Code Snippets

    Encode:

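    #include <array>
    #include <string>

    // SAFE[c] is true for the octets passed through unchanged: the ASCII
    // alphanumerics 0-9, A-Z and a-z. Every other octet is escaped.
    static const std::array<bool, 256> SAFE = [] {
       std::array<bool, 256> t{};   // value-initialized: all false
       for (int c = '0'; c <= '9'; ++c) t[c] = true;
       for (int c = 'A'; c <= 'Z'; ++c) t[c] = true;
       for (int c = 'a'; c <= 'z'; ++c) t[c] = true;
       return t;
    }();

    std::string UriEncode(const std::string & sSrc)
    {
       static const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
       const unsigned char * pSrc =
          reinterpret_cast<const unsigned char *>(sSrc.data());
       const unsigned char * const SRC_END = pSrc + sSrc.size();

       std::string sResult;
       sResult.reserve(sSrc.size() * 3);   // worst case: every octet escaped

       for (; pSrc < SRC_END; ++pSrc)
       {
          if (SAFE[*pSrc])
             sResult += static_cast<char>(*pSrc);
          else
          {
             // escape this octet as '%' plus two upper-case hex digits
             sResult += '%';
             sResult += DEC2HEX[*pSrc >> 4];
             sResult += DEC2HEX[*pSrc & 0x0F];
          }
       }
       return sResult;
    }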
    

    Decode:

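    #include <array>
    #include <string>

    // HEX2DEC[c] is the value (0-15) of hex digit c, or -1 if c is not a
    // hex digit. Lower-case a-f is accepted as well. signed char keeps -1
    // distinct from 255 on platforms where plain char is unsigned.
    static const std::array<signed char, 256> HEX2DEC = [] {
       std::array<signed char, 256> t;
       t.fill(-1);
       for (int c = '0'; c <= '9'; ++c) t[c] = static_cast<signed char>(c - '0');
       for (int c = 'A'; c <= 'F'; ++c) t[c] = static_cast<signed char>(c - 'A' + 10);
       for (int c = 'a'; c <= 'f'; ++c) t[c] = static_cast<signed char>(c - 'a' + 10);
       return t;
    }();

    std::string UriDecode(const std::string & sSrc)
    {
       // Note from RFC1630: "Sequences which start with a percent
       // sign but are not followed by two hexadecimal characters
       // (0-9, A-F) are reserved for future extension"
       // Such sequences are copied through unchanged.

       const std::string::size_type SRC_LEN = sSrc.size();
       std::string sResult;
       sResult.reserve(SRC_LEN);   // decoding never makes the string longer

       std::string::size_type i = 0;
       // A '%' at index i can start an escape only if i + 2 < SRC_LEN.
       // Comparing indices avoids computing SRC_END - 2, which would point
       // before the buffer when the input is shorter than two characters.
       while (i + 2 < SRC_LEN)
       {
          if (sSrc[i] == '%')
          {
             const int dec1 = HEX2DEC[static_cast<unsigned char>(sSrc[i + 1])];
             const int dec2 = HEX2DEC[static_cast<unsigned char>(sSrc[i + 2])];
             if (dec1 != -1 && dec2 != -1)
             {
                sResult += static_cast<char>((dec1 << 4) + dec2);
                i += 3;
                continue;
             }
          }
          sResult += sSrc[i++];
       }

       // the last two (or fewer) characters cannot start an escape
       sResult.append(sSrc, i, std::string::npos);
       return sResult;
    }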
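    Decoding reverses that arithmetic. Each hex digit contributes four bits, and the first digit is shifted left by four, so "%7E" yields (7 << 4) + 14 = 126. The sketch below isolates a single escape, including the RFC 1630 case where a malformed sequence is rejected. HexValue and UnescapeAt are hypothetical helpers that mirror the HEX2DEC lookup with explicit range checks; they are not part of the article's code:

```cpp
    #include <string>

    // Hypothetical helper: value of one hex digit, or -1 if ch is not
    // in [0-9A-Fa-f].
    int HexValue(char ch)
    {
       if (ch >= '0' && ch <= '9') return ch - '0';
       if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
       if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
       return -1;
    }

    // Hypothetical helper: decodes the three characters "%XY" starting at
    // s[pos]. Returns the octet value 0-255, or -1 if there is no
    // well-formed escape at pos (a decoder then copies the '%' as-is).
    int UnescapeAt(const std::string & s, std::string::size_type pos)
    {
       if (pos + 2 >= s.size() || s[pos] != '%')
          return -1;
       const int hi = HexValue(s[pos + 1]);
       const int lo = HexValue(s[pos + 2]);
       if (hi == -1 || lo == -1)
          return -1;
       return (hi << 4) + lo;
    }
```

    A table lookup, as in UriDecode(), replaces HexValue's chain of range tests with a single array index.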
    

    Usage Example

    Copy the source code into your project, or compile it as a separate file and link it in.

    #include <cassert>
    #include <string>

    // Defined in the code snippets above (same or another source file)
    std::string UriEncode(const std::string & sSrc);
    std::string UriDecode(const std::string & sSrc);

    int main()
    {
       // Pass the length explicitly so the embedded '\0' is kept
       const std::string ORG("\0\1\2", 3);
       const std::string ENC("%00%01%02");
       assert(UriEncode(ORG) == ENC);
       assert(UriDecode(ENC) == ORG);
       return 0;
    }
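    When the functions are compiled separately and linked in, each file that uses them needs their declarations. One conventional option is to collect the declarations in a small header. This is a sketch only; the file name UriCodec.h and the include guard name are hypothetical:

```cpp
    // UriCodec.h (hypothetical file name): declarations only; the
    // definitions live in the source file that contains the snippets.
    #ifndef URICODEC_H
    #define URICODEC_H

    #include <string>

    std::string UriEncode(const std::string & sSrc);
    std::string UriDecode(const std::string & sSrc);

    #endif // URICODEC_H
```

    A caller then writes #include "UriCodec.h" instead of repeating the declarations by hand.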
    

    Differences from Other Implementations

    The following are other implementations of URI/URL encoding and decoding.

    This implementation differs from the above in that it:

    • Has a decode function.
    • Encodes a buffer rather than a NUL-terminated C string, so any octet, including '\0', can be encoded. Example: "ABC\0ABC" -> "ABC%00ABC"
    • Runs faster because it uses lookup arrays to do the mapping.
    • Is portable. It doesn't use MFC CString.
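    Encoding embedded '\0' octets only works if the std::string actually contains them. The const char* constructor stops at the first NUL, which is why the usage example builds ORG with an explicit length. A short sketch of the difference; the variable names are illustrative, and the "s" literal requires C++14:

```cpp
    #include <string>

    using namespace std::string_literals;

    // The const char* constructor stops at the first '\0': holds "ABC".
    const std::string TRUNCATED("ABC\0ABC");
    // Passing the length keeps all seven octets, including the NUL.
    const std::string FULL("ABC\0ABC", 7);
    // The C++14 ""s literal knows the literal's full length as well.
    const std::string LITERAL = "ABC\0ABC"s;
```

    Only FULL and LITERAL would encode to "ABC%00ABC". TRUNCATED would encode to just "ABC".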

    Downloads

  • UriCodec_src.zip
