Introduction
This article presents two functions, UriEncode() and UriDecode(), for URI encoding and decoding. Both take a std::string argument and return a std::string.
A URI is represented as a sequence of characters, not as a sequence of octets. That is because URI might be “transported” by means that are not through a computer network, e.g., printed on paper, read over the radio, etc. (RFC 2396)
UriEncode() maps octets to characters, for example:
"\0\1\2" -> "%00%01%02"
"~ABCD" -> "%7EABCD"
Every octet other than an alphanumeric character is converted to "%" followed by two hexadecimal digits (% HEX HEX). UriDecode() converts these sequences back to octets.
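To make the "% HEX HEX" mapping concrete, here is a minimal sketch of escaping a single octet. EscapeOctet is a hypothetical helper introduced only for illustration; it is not part of the functions presented in this article.

```cpp
#include <string>

// Hypothetical helper (an assumption, not the article's code):
// percent-encode one octet as '%' followed by two uppercase hex digits.
std::string EscapeOctet(unsigned char octet)
{
    static const char DEC2HEX[] = "0123456789ABCDEF";
    std::string out = "%";
    out += DEC2HEX[octet >> 4];    // high nibble selects the first hex digit
    out += DEC2HEX[octet & 0x0F];  // low nibble selects the second hex digit
    return out;
}
```

For example, the octet 0x7E ('~' in ASCII) splits into the nibbles 7 and 14, which gives "%7E".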
Code Snippets
Encode:
std::string UriEncode(const std::string & sSrc)
{
    // SAFE is a 256-entry lookup table (defined elsewhere, not shown here)
    // that is non-zero for octets which may be copied unescaped.
    const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
    const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
    const int SRC_LEN = sSrc.length();
    // worst case: every octet expands to three characters
    unsigned char * const pStart = new unsigned char[SRC_LEN * 3];
    unsigned char * pEnd = pStart;
    const unsigned char * const SRC_END = pSrc + SRC_LEN;

    for (; pSrc < SRC_END; ++pSrc)
    {
        if (SAFE[*pSrc])
            *pEnd++ = *pSrc;
        else
        {
            // escape this char
            *pEnd++ = '%';
            *pEnd++ = DEC2HEX[*pSrc >> 4];
            *pEnd++ = DEC2HEX[*pSrc & 0x0F];
        }
    }

    std::string sResult((char *)pStart, (char *)pEnd);
    delete [] pStart;
    return sResult;
}
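UriEncode() indexes a SAFE table that is not shown in the snippet. As a rough sketch of what such a table could look like, the code below builds a 256-entry array that marks only ASCII letters and digits as safe, matching the "except alphanum" rule stated earlier. The names MakeSafeTable and SAFE_SKETCH are hypothetical, and the sketch assumes an ASCII-compatible character set; the actual table may be defined differently.

```cpp
#include <array>
#include <cstddef>

// Hypothetical builder (an assumption, not the article's definition of SAFE):
// marks only the ASCII characters 0-9, A-Z and a-z as safe to copy unescaped.
std::array<bool, 256> MakeSafeTable()
{
    std::array<bool, 256> table{};  // value-initialized: every entry starts as false
    for (int c = 0; c < 256; ++c)
    {
        const bool isDigit = (c >= '0' && c <= '9');
        const bool isUpper = (c >= 'A' && c <= 'Z');
        const bool isLower = (c >= 'a' && c <= 'z');
        table[static_cast<std::size_t>(c)] = isDigit || isUpper || isLower;
    }
    return table;
}

// Built once at program start-up; indexed by octet value, like SAFE[*pSrc].
const std::array<bool, 256> SAFE_SKETCH = MakeSafeTable();
```

A table lookup keeps the per-octet test in the encoding loop to a single array access, at the cost of 256 entries of storage.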
Decode:
std::string UriDecode(const std::string & sSrc)
{
    // Note from RFC1630: "Sequences which start with a percent
    // sign but are not followed by two hexadecimal characters
    // (0-9, A-F) are reserved for future extension"

    // HEX2DEC is a 256-entry lookup table (defined elsewhere, not shown here)
    // that yields the value of a hex digit, or -1 for any other octet.
    const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
    const int SRC_LEN = sSrc.length();
    const unsigned char * const SRC_END = pSrc + SRC_LEN;
    // last decodable '%'
    const unsigned char * const SRC_LAST_DEC = SRC_END - 2;

    char * const pStart = new char[SRC_LEN];
    char * pEnd = pStart;

    while (pSrc < SRC_LAST_DEC)
    {
        if (*pSrc == '%')
        {
            // signed char, so the comparison with -1 does not depend on
            // whether plain char is signed on this platform
            signed char dec1, dec2;
            if (-1 != (dec1 = HEX2DEC[*(pSrc + 1)])
                && -1 != (dec2 = HEX2DEC[*(pSrc + 2)]))
            {
                *pEnd++ = (dec1 << 4) + dec2;
                pSrc += 3;
                continue;
            }
        }

        // not a valid escape sequence: copy the octet unchanged
        *pEnd++ = *pSrc++;
    }

    // the last 2- chars
    while (pSrc < SRC_END)
        *pEnd++ = *pSrc++;

    std::string sResult(pStart, pEnd);
    delete [] pStart;
    return sResult;
}
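UriDecode() relies on a HEX2DEC table that is not shown in the snippet. The sketch below illustrates one way such a table might be built: each hex digit maps to its value and every other octet maps to -1. MakeHexTable and HEX2DEC_SKETCH are hypothetical names, accepting lowercase a-f is an assumption (the RFC 1630 note quoted in the code mentions only 0-9 and A-F), and an ASCII-compatible character set is assumed.

```cpp
#include <array>
#include <cstddef>

// Hypothetical builder (an assumption, not the article's definition of HEX2DEC).
// signed char is used so that -1 is representable regardless of whether
// plain char is signed on the target platform.
std::array<signed char, 256> MakeHexTable()
{
    std::array<signed char, 256> table;
    table.fill(-1);  // default: "not a hex digit"
    for (int i = 0; i < 10; ++i)
        table[static_cast<std::size_t>('0' + i)] = static_cast<signed char>(i);
    for (int i = 0; i < 6; ++i)
    {
        table[static_cast<std::size_t>('A' + i)] = static_cast<signed char>(10 + i);
        // Assumption: lowercase digits are accepted as well.
        table[static_cast<std::size_t>('a' + i)] = static_cast<signed char>(10 + i);
    }
    return table;
}

// Built once at program start-up; indexed by octet value, like HEX2DEC[*(pSrc + 1)].
const std::array<signed char, 256> HEX2DEC_SKETCH = MakeHexTable();
```

With a table like this, a sequence such as "%G1" fails the -1 check, so the decoder copies the '%' through unchanged instead of decoding it.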
Usage Example
Copy the source code into your project, or declare the functions as extern and link against them.
#include <cassert>
#include <string>

int main()
{
    extern std::string UriEncode(const std::string & sSrc);
    extern std::string UriDecode(const std::string & sSrc);

    // three octets 0x00, 0x01, 0x02; the explicit length keeps the embedded NUL
    const std::string ORG("\0\1\2", 3);
    const std::string ENC("%00%01%02");

    assert(UriEncode(ORG) == ENC);
    assert(UriDecode(ENC) == ORG);
    return 0;
}
Differences from Other Implementations
The following are other implementations of URI/URL encoding and decoding:
- URL Encoding (by Chandrasekhar Vuppalapati): an MFC class that converts a single character at a time.
- URLEncode (by Ryszard Krakowiak): a URL encoding function that uses MFC and converts a null-terminated C string.
This implementation differs from the above in that it:
- Has a decode function.
- Encodes a buffer. Also, it supports encoding a char buffer, including ‘