URL Encoding
Introduction
The purpose of this article is to design a C++ class that performs URL encoding. The motivation was a previous project in which I needed to post data from a VC++ 6.0 application, and that data had to be URL encoded. I searched MSDN for a class or API that returns the URL-encoded value of a given input string, but did not find one. So I had to come up with my own URLEncode C++ class.
URLEncoder.exe is an MFC dialog-based application that uses the URLEncode class.
Process
URL encoding is a process that makes sure all characters are "safe" to transmit across the Internet. Some characters have a special meaning to the various programs involved in sending the data.
For example, a carriage return has an ASCII value of 13. Programs involved in sending form data may take this to mean the end of a line of data.
Traditionally, all Web applications transfer data between the client and server by using the HTTP or HTTPS protocols. There are basically two ways in which a server receives input from a client:
- Data can be passed in the HTTP request itself (in headers such as cookies, or in the body of a posted form), or
- It can be included in the query portion of the requested URL.
When data is included in a URL, it must be specially encoded to conform to proper URL syntax. On the Web server side, the data is automatically decoded. Consider the following URL, where data is posted as a query string parameter.
Example: http://WebSite/ResourceName?Data=Data
In this URL:
- WebSite is the host name of the Web server.
- ResourceName is the name of the ASP page or servlet.
- Data is the value to be posted to the Web server. It must be encoded if the Content-Type is application/x-www-form-urlencoded.
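To see why the value must be encoded, consider how a simple server-side parser might split the query portion of a URL. The sketch below is only an illustration in standard C++; the function name splitQuery and the sample values are hypothetical and not part of the URLEncode class.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits a query string into name/value pairs.
// '&' separates parameters; the first '=' in a parameter separates the
// name from the value. No decoding is attempted here.
std::vector<std::pair<std::string, std::string>> splitQuery(const std::string& query)
{
    std::vector<std::pair<std::string, std::string>> params;
    std::string::size_type start = 0;
    while (start <= query.size()) {
        std::string::size_type end = query.find('&', start);
        if (end == std::string::npos)
            end = query.size();
        const std::string field = query.substr(start, end - start);
        if (!field.empty()) {
            const std::string::size_type eq = field.find('=');
            if (eq == std::string::npos)
                params.emplace_back(field, "");
            else
                params.emplace_back(field.substr(0, eq), field.substr(eq + 1));
        }
        start = end + 1;
    }
    return params;
}
```

With the unencoded query "Data=Tom&Jerry" this parser sees two parameters (Data and Jerry), whereas "Data=Tom%26Jerry" is seen as a single parameter whose value can later be decoded back to "Tom&Jerry".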
RFC 1738
The RFC 1738 specification defining Uniform Resource Locators (URLs) restricts the characters allowed in a URL to a subset of the US-ASCII character set. This poses a limitation because HTML, on the other hand, allows the entire range of the ISO-8859-1 (ISO-Latin) character set to be used in documents. As a result, if the data to be uploaded comes from an HTML form post (or is part of a query string), all of it must be encoded.
ISO-8859-1 (ISO-Latin) Character Set
The ISO-8859-1 table contains the complete ISO-8859-1 (ISO-Latin) character set, 256 entries in all. For each character it provides the ISO-8859-1 position (its decimal code), description, entity number, hexadecimal value, and HTML result. Broadly, the range can be divided into safe and unsafe characters as follows.
| Character range(decimal) | Type | Values | Safe/Unsafe |
| 0-31 | ASCII Control Characters | These characters are not printable | Unsafe |
| 32-47 | Reserved Characters | ' '!"#$%&'()*+,-./ | Unsafe |
| 48-57 | ASCII Digits | 0-9 | Safe |
| 58-64 | Reserved Characters | :;<=>?@ | Unsafe |
| 65-90 | ASCII Characters | A-Z | Safe |
| 91-96 | Reserved Characters | [\]^_` | Unsafe |
| 97-122 | ASCII Characters | a-z | Safe |
| 123-126 | Reserved Characters | {\|}~ | Unsafe |
| 127 | Control Characters | DEL (not printable) | Unsafe |
| 128-255 | Non-ASCII Characters | Characters outside the US-ASCII set | Unsafe |
All unsafe ASCII characters must be encoded; for example, those in the ranges 32-47, 58-64, 91-96, and 123-126.
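The Safe rows of the range table can be expressed as a small predicate. This is a minimal sketch in standard C++ that follows the table literally; the name isSafeByRange is hypothetical, and the CURLEncode class itself uses a slightly different rule.

```cpp
// Hypothetical helper: returns true when the byte value falls in one of
// the rows marked "Safe" in the table (digits and ASCII letters only).
bool isSafeByRange(unsigned char c)
{
    return (c >= 48 && c <= 57)    // 0-9
        || (c >= 65 && c <= 90)    // A-Z
        || (c >= 97 && c <= 122);  // a-z
}
```

Taking the argument as unsigned char keeps values 128-255 positive, so non-ASCII bytes are reported as unsafe instead of being misread as negative numbers.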
Below is the table that describes why these characters are not safe.
| Character | Unsafe Reason | Character Encode |
| < | Delimiters around URLs in free text | %3C |
| > | Delimiters around URLs in free text | %3E |
| " | Delimits URLs in some systems | %22 |
| # | It is used in the World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. | %23 |
| { | Gateways and other transport agents are known to sometimes modify such characters | %7B |
| } | Gateways and other transport agents are known to sometimes modify such characters | %7D |
| \| | Gateways and other transport agents are known to sometimes modify such characters | %7C |
| \ | Gateways and other transport agents are known to sometimes modify such characters | %5C |
| ^ | Gateways and other transport agents are known to sometimes modify such characters | %5E |
| ~ | Gateways and other transport agents are known to sometimes modify such characters | %7E |
| [ | Gateways and other transport agents are known to sometimes modify such characters | %5B |
| ] | Gateways and other transport agents are known to sometimes modify such characters | %5D |
| ` | Gateways and other transport agents are known to sometimes modify such characters | %60 |
| + | Indicates a space (spaces cannot be used in a URL) | %2B |
| / | Separates directories and subdirectories | %2F |
| ? | Separates the actual URL and the parameters | %3F |
| & | Separator between parameters specified in the URL | %26 |
How It Is Done
URL encoding of a character is done by taking the character's 8-bit value, writing it as two hexadecimal digits, and prefixing it with a percent sign ("%"). For example, the US-ASCII character set represents a space with decimal code 32, or hexadecimal 20. Thus, its URL-encoded representation is %20.
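The rule for a single character can be sketched as follows. This is an illustration in standard C++ rather than the MFC code of the article; the name percentEncodeByte is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns "%" followed by the two-digit uppercase
// hexadecimal value of one byte.
std::string percentEncodeByte(unsigned char byte)
{
    static const char kHexDigits[] = "0123456789ABCDEF";
    std::string out = "%";
    out += kHexDigits[byte >> 4];    // high four bits give the first digit
    out += kHexDigits[byte & 0x0F];  // low four bits give the second digit
    return out;
}
```

For a space (decimal 32) this yields "%20", and for a carriage return (decimal 13) it yields "%0D", with the leading zero kept so that every encoded character is exactly three characters long.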
URLEncode: CURLEncode is a C++ class that URL encodes a given string of data. It has the following member functions.
- isUnsafe
- decToHex
- convert
- URLEncode
The URLEncode() method does the encoding. It checks each character in the string to see whether it is safe or unsafe (isUnsafe). If the character is unsafe, it is replaced with "%" followed by its hexadecimal value (convert); the result is appended to the encoded string being built.
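The loop described above might look like the following in portable C++. This is a sketch under stated assumptions, not the author's MFC implementation: it uses std::string instead of CString, the contents of kUnsafeChars are an assumed example list (the article does not show the value of csUnsafeString), and the range test mirrors the one in isUnsafe.

```cpp
#include <string>

// Assumed example list of characters that are always encoded.
static const std::string kUnsafeChars = "\"<>%\\^[]`+$,@:;/!#?=&";

// A character is unsafe if it is listed above or lies outside 33-122.
bool isUnsafe(unsigned char c)
{
    if (kUnsafeChars.find(static_cast<char>(c)) != std::string::npos)
        return true;
    return !(c > 32 && c < 123);
}

// Copies safe characters unchanged and replaces each unsafe character
// with '%' followed by its two-digit hexadecimal value.
std::string urlEncode(const std::string& input)
{
    static const char kHexDigits[] = "0123456789ABCDEF";
    std::string result;
    result.reserve(input.size());
    for (unsigned char c : input) {
        if (isUnsafe(c)) {
            result += '%';
            result += kHexDigits[c >> 4];
            result += kHexDigits[c & 0x0F];
        } else {
            result += static_cast<char>(c);
        }
    }
    return result;
}
```

Under these assumptions, urlEncode("a b&c") returns "a%20b%26c". Note that '%' itself is in the unsafe list, so a literal percent sign in the input becomes "%25" and cannot be mistaken for the start of an encoded character.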
Code Snippet
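class CURLEncode
{
private:
    // Characters that are always encoded, regardless of their range
    static CString csUnsafeString;
    // Converts a character's numeric value to a string in the given radix
    CString decToHex(char num, int radix);
    // Returns true if the character must be encoded
    bool isUnsafe(char compareChar);
    // Returns "%" followed by the character's two-digit hex value
    CString convert(char val);

public:
    CURLEncode() { }
    virtual ~CURLEncode() { }
    // Returns the URL-encoded form of vData
    CString URLEncode(CString vData);
};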
bool CURLEncode::isUnsafe(char compareChar)
{
    // Characters listed in csUnsafeString are always unsafe
    bool bcharfound = false;
    int m_strLen = csUnsafeString.GetLength();
    for (int ichar_pos = 0; ichar_pos < m_strLen; ichar_pos++)
    {
        if (csUnsafeString.GetAt(ichar_pos) == compareChar)
        {
            bcharfound = true;
            break;
        }
    }
    // Treat the value as unsigned so that characters above 127 are not
    // seen as negative numbers
    int char_ascii_value = (unsigned char) compareChar;
    // Safe only if not listed above and inside the range 33-122
    if (bcharfound == false && char_ascii_value > 32 &&
        char_ascii_value < 123)
    {
        return false;
    }
    return true;
}
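CString CURLEncode::decToHex(char num, int radix)
{
    // hexVals is not shown in this snippet; it is assumed to be a 16-entry
    // table of the digits '0'-'9' and 'A'-'F' defined elsewhere in the source
    int temp = 0;
    CString csTmp;
    // char may be signed, so map negative values back into 128-255
    int num_char = (int) num;
    if (num_char < 0)
        num_char = 256 + num_char;
    // Collect digits from least significant to most significant
    while (num_char >= radix)
    {
        temp = num_char % radix;
        num_char = num_char / radix;   // integer division; floor() is not needed
        csTmp += hexVals[temp];        // append, so that no digit is overwritten
    }
    csTmp += hexVals[num_char];
    // Pad to two digits; the padding becomes a leading zero after the reversal
    if (csTmp.GetLength() < 2)
    {
        csTmp += '0';
    }
    // The digits were collected in reverse order
    csTmp.MakeReverse();
    return csTmp;
}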
CString CURLEncode::convert(char val)
{
    CString csRet;
    csRet += "%";
    csRet += decToHex(val, 16);
    return csRet;
}

URLEncoder
References
URL Encoding: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm.
RFC 1866: The HTML 2.0 specification (plain text). The appendix contains the Character Entity table: http://www.rfc-editor.org/rfc/rfc1866.txt.
The Web version of the HTML 2.0 (RFC 1866) Character Entity table: http://www.w3.org/MarkUp/html-spec/html-spec_13.html.
The HTML 3.2 (Wilbur) recommendation [This includes all character entities listed in HTML 2.0, plus new named entities covering the ISO 8859-1 160-191 range.]: http://www.w3.org/MarkUp/Wilbur/.
The HTML 4.0 Recommendation [Includes new Unicode character entities]: http://www.w3.org/TR/REC-html40/.
The W3C HTML Internationalization area: http://www.w3.org/International/O-HTML.html.

Comments
Online Tools
Posted by davitz38 on 01/13/2010 03:53am

Unicode builds
Posted by Syslock on 11/19/2009 06:13pm
Is this code safe for Unicode builds?

A very simple MFC class to Encode and Decode an url string
Posted by serhardt on 03/03/2006 04:29am
This is my contribution to encode and decode a URL string. My objective was to simplify the source code by using existing CString functions...
/*****************************************************************************
Module : UrlString.h
Notices: Written 2006 by Stephane Erhardt
Description: H URL Encoder/Decoder
*****************************************************************************/
#ifndef __CURLSTRING_H_
#define __CURLSTRING_H_

class CUrlString
{
private:
    CString m_csUnsafe;

public:
    CUrlString();
    virtual ~CUrlString() { };
    CString Encode(CString csDecoded);
    CString Decode(CString csEncoded);
};

#endif //__CURLSTRING_H_

/*****************************************************************************
Module : UrlString.cpp
Notices: Written 2006 by Stephane Erhardt
Description: CPP URL Encoder/Decoder
*****************************************************************************/
#include "stdafx.h"
#include "UrlString.h"

/*****************************************************************************/
CUrlString::CUrlString()
{
    m_csUnsafe = _T("%=\"<>\\^[]`+$,@:;/!#?&'");
    for(int iChar = 1; iChar < 33; iChar++)
        m_csUnsafe += (char)iChar;
    for(int iChar = 124; iChar < 256; iChar++)
        m_csUnsafe += (char)iChar;
}

/*****************************************************************************/
CString CUrlString::Encode(CString csDecoded)
{
    CString csCharEncoded, csCharDecoded;
    CString csEncoded = csDecoded;
    for(int iPos = 0; iPos < m_csUnsafe.GetLength(); iPos++)
    {
        csCharEncoded.Format(_T("%%%02X"), m_csUnsafe[iPos]);
        csCharDecoded = m_csUnsafe[iPos];
        csEncoded.Replace(csCharDecoded, csCharEncoded);
    }
    return csEncoded;
}

/*****************************************************************************/
CString CUrlString::Decode(CString csEncoded)
{
    CString csUnsafeEncoded = Encode(m_csUnsafe);
    CString csDecoded = csEncoded;
    CString csCharEncoded, csCharDecoded;
    for(int iPos = 0; iPos < csUnsafeEncoded.GetLength(); iPos += 3)
    {
        csCharEncoded = csUnsafeEncoded.Mid(iPos, 3);
        csCharDecoded = (char)strtol(csUnsafeEncoded.Mid(iPos + 1, 2), NULL, 16);
        csDecoded.Replace(csCharEncoded, csCharDecoded);
    }
    return csDecoded;
}

how about url decoding?
Posted by Legacy on 06/26/2003 12:00am
Originally posted by: william hwang
Does anyone have any idea?

Online Url Decoder
Posted by davitz38 on 01/13/2010 03:56am

URL Encoding
Posted by Legacy on 06/14/2003 12:00am
Originally posted by: Elisha Tiyagnet
This article is great!
It solves a lot of manual URL scripting. Give yourself a pat on the back.
St June 14th 2003.

http://www.alrojo.com
Posted by Legacy on 06/04/2003 12:00am
Originally posted by: fermin

URL Encoding in C with Win32 code sample.
Posted by Legacy on 11/15/2002 12:00am
Originally posted by: AmbientHex

Space Conversion
Posted by Legacy on 10/01/2002 12:00am
Originally posted by: Robert Rehrl