Introduction
Here are two functions for URI encoding and decoding. Each takes a std::string argument and returns a std::string.
A URI is represented as a sequence of characters, not as a sequence of octets. That is because URI might be “transported” by means that are not through a computer network, e.g., printed on paper, read over the radio, etc.—RFC2396
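UriEncode() maps octets to characters, such as:
"\0\1\2" -> "%00%01%02"
"~ABCD" -> "%7EABCD"
Every octet other than an alphanumeric character is converted to “% HEX HEX”, where the two hexadecimal digits give the octet's value. UriDecode() converts such sequences back to the original octets.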
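The “% HEX HEX” form can be sketched for a single octet as follows. EscapeOctet and CheckEscapeOctet are hypothetical names used only for illustration, not part of the two functions presented here; the sketch just shows how the high and low four bits of an octet select the two hexadecimal digits.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (illustration only): escape one octet as '%'
// followed by two uppercase hexadecimal digits.
std::string EscapeOctet(unsigned char octet)
{
    const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
    std::string sResult("%");
    sResult += DEC2HEX[octet >> 4];     // high four bits -> first hex digit
    sResult += DEC2HEX[octet & 0x0F];   // low four bits  -> second hex digit
    return sResult;
}

// Quick self-check of the mapping.
void CheckEscapeOctet()
{
    assert(EscapeOctet(0x00) == "%00");
    assert(EscapeOctet('~') == "%7E");  // '~' is 0x7E in ASCII
    assert(EscapeOctet(0xFF) == "%FF");
}
```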
Code Snippets
Encode:
#include <string>

// SAFE[c] is true for octets that need no escaping.
// Only alphanumerics are safe, as stated in the introduction.
struct SafeTable
{
    bool v[256];
    SafeTable()
    {
        for (int i = 0; i < 256; ++i)
            v[i] = (i >= '0' && i <= '9')
                || (i >= 'A' && i <= 'Z')
                || (i >= 'a' && i <= 'z');
    }
    bool operator[](unsigned char c) const { return v[c]; }
};
static const SafeTable SAFE;

std::string UriEncode(const std::string & sSrc)
{
    const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
    std::string sResult;
    sResult.reserve(sSrc.length() * 3);    // worst case: every octet is escaped

    for (std::string::size_type i = 0; i < sSrc.length(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(sSrc[i]);
        if (SAFE[c])
            sResult += static_cast<char>(c);
        else
        {
            // escape this char
            sResult += '%';
            sResult += DEC2HEX[c >> 4];
            sResult += DEC2HEX[c & 0x0F];
        }
    }
    return sResult;
}
Decode:
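#include <string>

// HEX2DEC[c] is the value of the hex digit c, or -1 if c is not a hex digit.
// Lowercase a-f is accepted as well as A-F.
struct Hex2DecTable
{
    signed char v[256];
    Hex2DecTable()
    {
        for (int i = 0; i < 256; ++i) v[i] = -1;
        for (int i = 0; i < 10; ++i) v['0' + i] = static_cast<signed char>(i);
        for (int i = 0; i < 6; ++i)
            v['A' + i] = v['a' + i] = static_cast<signed char>(10 + i);
    }
    int operator[](unsigned char c) const { return v[c]; }
};
static const Hex2DecTable HEX2DEC;

std::string UriDecode(const std::string & sSrc)
{
    // Note from RFC1630: "Sequences which start with a percent
    // sign but are not followed by two hexadecimal characters
    // (0-9, A-F) are reserved for future extension"
    const std::string::size_type SRC_LEN = sSrc.length();
    std::string sResult;
    sResult.reserve(SRC_LEN);    // the decoded text is never longer than the source

    std::string::size_type i = 0;
    while (i < SRC_LEN)
    {
        // a '%' is decodable only if at least two chars follow it
        if (sSrc[i] == '%' && SRC_LEN - i >= 3)
        {
            const int dec1 = HEX2DEC[static_cast<unsigned char>(sSrc[i + 1])];
            const int dec2 = HEX2DEC[static_cast<unsigned char>(sSrc[i + 2])];
            if (-1 != dec1 && -1 != dec2)
            {
                sResult += static_cast<char>((dec1 << 4) + dec2);
                i += 3;
                continue;
            }
        }
        // not a decodable escape: copy the char through unchanged
        sResult += sSrc[i++];
    }
    return sResult;
}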
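The pass-through rule for malformed sequences can be looked at in isolation. The sketch below uses a hypothetical helper, IsDecodableEscape, built on std::isxdigit rather than the HEX2DEC table; it reports whether the '%' at a given position is followed by two hexadecimal digits. Any other '%', including one within the last two characters of the string, is copied through unchanged by UriDecode(). Note that std::isxdigit accepts lowercase a-f as well as A-F.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (illustration only): true if sSrc[pos] is '%' and is
// followed by two hexadecimal digits, i.e. the three chars form a decodable escape.
bool IsDecodableEscape(const std::string & sSrc, std::string::size_type pos)
{
    if (pos >= sSrc.length() || sSrc[pos] != '%')
        return false;
    if (sSrc.length() - pos < 3)        // fewer than two chars after '%'
        return false;
    return std::isxdigit(static_cast<unsigned char>(sSrc[pos + 1])) != 0
        && std::isxdigit(static_cast<unsigned char>(sSrc[pos + 2])) != 0;
}

// Quick self-check covering the well-formed and malformed cases.
void CheckIsDecodableEscape()
{
    assert(IsDecodableEscape("%41", 0));     // well-formed escape for 'A'
    assert(!IsDecodableEscape("%4", 0));     // truncated: kept as-is
    assert(!IsDecodableEscape("%ZZ", 0));    // not hexadecimal: kept as-is
    assert(!IsDecodableEscape("100%", 3));   // '%' is the last character
}
```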
Usage Example
To use the functions, either copy the source code into your project or declare them and link against the file that defines them.
#include <cassert>
#include <string>

// defined in another source file
std::string UriEncode(const std::string & sSrc);
std::string UriDecode(const std::string & sSrc);

int main()
{
    // three octets: 0x00, 0x01, 0x02 (the length is given explicitly)
    const std::string ORG("\0\1\2", 3);
    const std::string ENC("%00%01%02");
    assert(UriEncode(ORG) == ENC);
    assert(UriDecode(ENC) == ORG);
    return 0;
}
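The second constructor argument of ORG above gives the buffer length explicitly. A minimal sketch of why that matters for data containing '\0' octets, using only std::string; FromBuffer and CheckEmbeddedNul are hypothetical names introduced for illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (illustration only): build a std::string from a raw
// buffer of known length, so octets after an embedded '\0' are kept.
std::string FromBuffer(const char * pBuf, std::string::size_type len)
{
    return std::string(pBuf, len);
}

void CheckEmbeddedNul()
{
    const char BUF[] = "ABC\0ABC";          // 7 chars plus the terminator

    const std::string sTruncated(BUF);      // stops at the first '\0'
    const std::string sWhole = FromBuffer(BUF, sizeof(BUF) - 1);

    assert(sTruncated.length() == 3);
    assert(sWhole.length() == 7);
    assert(sWhole[3] == '\0');
}
```

Passing sWhole to UriEncode() yields "ABC%00ABC", whereas sTruncated yields only "ABC".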
Differences from Other Implementations
The following are other implementations of URI/URL encoding and decoding.
- URL Encoding (by Chandrasekhar Vuppalapati): An MFC class that converts a single character at a time.
- URLEncode (by Ryszard Krakowiak): A URL encode function using MFC that converts a null-terminated C string.
This implementation differs from the above in that it:
- Has a decode function.
- Encodes a buffer rather than a null-terminated string, so the input may contain ‘\0’. Example:
"ABC\0ABC" -> "ABC%00ABC"