URL Encoding

Environment: VC++, MFC

Introduction

The purpose of this article is to design a C++ class that does URL encoding. The motivation was that, in a previous project, I needed to post data from a VC++ 6.0 application, and that data had to be URL encoded. I searched MSDN for a class or API that returns a URL-encoded value for a given input string, but I did not find one. So, I had to come up with my own URLEncode C++ class.

URLEncoder.exe is an MFC dialog-based application that uses the URLEncode class.

Process

URL encoding is a special process that makes sure that all the characters are "safe" to transmit across the Internet. Some characters have special meaning to various programs involved in sending the data across the Internet.

For example, a carriage return has an ASCII value of 13. Programs involved in sending you "FORM" data may consider this to mean the end of a line of data.

Traditionally, all Web applications transfer data between the client and server by using the HTTP or HTTPS protocols. There are basically two ways in which a server receives input from a client:

  1. Data can be passed in the HTTP request itself (either in headers, such as cookies, or in the body of a posted form), or
  2. It can be included in the query portion of the requested URL.

When data is included in a URL, it must be specially encoded to conform to proper URL syntax. On the Web server side, the data is automatically decoded. Consider the following URL, where data is posted as a query string parameter.

Example: http://WebSite/ResourceName?Data=Data

Here, WebSite is the host name of the site.
ResourceName is the name of the ASP page or servlet.
Data is the value to be posted to the Web server. It must be encoded if the content type is Content-Type: application/x-www-form-urlencoded.

RFC 1738

The RFC 1738 specification defining Uniform Resource Locators (URLs) restricts the characters allowed in a URL to a subset of the US-ASCII character set. This poses a limitation because HTML, on the other hand, allows the entire range of the ISO-8859-1 (ISO-Latin) character set to be used in documents. As a result, if the data to be uploaded comes from an HTML form post (or is part of a query string), all of that data has to be encoded.

ISO-8859-1 (ISO-Latin) Character Set

The ISO-8859-1 (ISO-Latin) character set has 256 entries. A full ISO-8859-1 table gives, for each character, its ISO-8859-1 position (its decimal code), description, entity number, hexadecimal value, and HTML result. Broadly, the range can be divided into safe and unsafe characters as follows.

Character range (decimal)   Type                       Values                   Safe/Unsafe
0-31                        ASCII control characters   Not printable            Unsafe
32-47                       Reserved characters        (space)!"#$%&'()*+,-./   Unsafe
48-57                       ASCII digits               0-9                      Safe
58-64                       Reserved characters        :;<=>?@                  Unsafe
65-90                       ASCII letters              A-Z                      Safe
91-96                       Reserved characters        [\]^_`                   Unsafe
97-122                      ASCII letters              a-z                      Safe
123-126                     Reserved characters        {|}~                     Unsafe
127                         Control character          DEL (not printable)      Unsafe
128-255                     Non-ASCII characters       Outside US-ASCII         Unsafe

All the ASCII characters that are unsafe must be encoded; for example, those in the ranges 32-47, 58-64, 91-96, and 123-126.
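As a rough illustration of the ranges in the table above, the check can be written as a small function. This is a sketch in standard C++, not the article's MFC code, and the name isSafeByRange is hypothetical.

```cpp
// Hypothetical helper (not part of CURLEncode): classifies a byte
// using only the decimal ranges marked "Safe" in the table.
bool isSafeByRange(unsigned char c)
{
    return (c >= 48 && c <= 57)     // 0-9
        || (c >= 65 && c <= 90)     // A-Z
        || (c >= 97 && c <= 122);   // a-z
}
```

Everything else, including control characters, reserved characters, and codes 128-255, is reported as unsafe. The isUnsafe member function in the code snippet later in this article uses a slightly different rule.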

Below is the table that describes why these characters are not safe.

Character   Encoding   Unsafe Reason
<           %3C        Delimiters around URLs in free text
>           %3E        Delimiters around URLs in free text
"           %22        Delimits URLs in some systems
#           %23        Used in the World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it
{           %7B        Gateways and other transport agents are known to sometimes modify such characters
}           %7D        Gateways and other transport agents are known to sometimes modify such characters
|           %7C        Gateways and other transport agents are known to sometimes modify such characters
\           %5C        Gateways and other transport agents are known to sometimes modify such characters
^           %5E        Gateways and other transport agents are known to sometimes modify such characters
~           %7E        Gateways and other transport agents are known to sometimes modify such characters
[           %5B        Gateways and other transport agents are known to sometimes modify such characters
]           %5D        Gateways and other transport agents are known to sometimes modify such characters
`           %60        Gateways and other transport agents are known to sometimes modify such characters
+           %2B        Indicates a space (spaces cannot be used in a URL)
/           %2F        Separates directories and subdirectories
?           %3F        Separates the actual URL and the parameters
&           %26        Separator between parameters specified in the URL

How It Is Done

URL encoding of a character is done by taking the character's 8-bit code, writing it as two hexadecimal digits, and prefixing it with a percent sign ("%"). For example, the US-ASCII character set represents a space with decimal code 32, or hexadecimal 20. Thus, its URL-encoded representation is %20.
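The rule for a single character can be sketched as follows. This is a minimal standard C++ illustration rather than the MFC code from the download; the function name percentEncodeByte and the choice of uppercase hex digits are assumptions.

```cpp
#include <string>

// Hypothetical helper: returns '%' followed by the two-digit
// hexadecimal code of the byte (uppercase digits assumed).
std::string percentEncodeByte(unsigned char c)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    std::string out = "%";
    out += hexDigits[c >> 4];      // high four bits
    out += hexDigits[c & 0x0F];    // low four bits
    return out;
}
```

With this sketch, percentEncodeByte(' ') gives "%20" and percentEncodeByte('<') gives "%3C", matching the table of unsafe characters.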

CURLEncode: CURLEncode is a C++ class that does URL encoding for a given string of data. The CURLEncode class has the following member functions.

  • isUnsafe
  • decToHex
  • convert
  • URLEncode

The URLEncode() method performs the encoding. It checks each character in the input string to see whether the character is safe or unsafe (isUnsafe). If the character is unsafe, it is replaced with "%" followed by its hexadecimal value (convert) before being appended to the result string; a safe character is appended unchanged.
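The loop described above might look like the following in standard C++11. This is only a sketch under stated assumptions: it uses std::string instead of CString, the function name urlEncodeSketch is hypothetical, and the contents of unsafeChars are a guess based on the table of unsafe characters, not the actual value of csUnsafeString in the download.

```cpp
#include <string>

// Sketch of the encode loop; not the article's CURLEncode::URLEncode.
std::string urlEncodeSketch(const std::string& data)
{
    // Assumed list of always-unsafe characters (hypothetical).
    static const std::string unsafeChars = "\"<>%\\^[]`+$,@:;/!#?=&";
    static const char hexDigits[] = "0123456789ABCDEF";

    std::string encoded;
    for (unsigned char c : data) {
        // Same rule as isUnsafe: safe only if the code is in 33-122
        // and the character is not in the unsafe list.
        bool safe = c > 32 && c < 123
            && unsafeChars.find(static_cast<char>(c)) == std::string::npos;
        if (safe) {
            encoded += static_cast<char>(c);
        } else {
            encoded += '%';
            encoded += hexDigits[c >> 4];
            encoded += hexDigits[c & 0x0F];
        }
    }
    return encoded;
}
```

For example, urlEncodeSketch("a b&c") returns "a%20b%26c". Letters and digits pass through untouched, while the space and the ampersand are replaced by their percent codes.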

Code Snippet

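class CURLEncode
{
private:
  static CString csUnsafeString;           // characters that are always encoded
  CString decToHex(char num, int radix);   // character code -> digit string
  bool isUnsafe(char compareChar);         // true if the character must be encoded
  CString convert(char val);               // "%" + hex code

public:
  CURLEncode() { }
  virtual ~CURLEncode() { }
  CString URLEncode(CString vData);        // encodes a whole string
};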

bool CURLEncode::isUnsafe(char compareChar)
{
  // A character listed in csUnsafeString is always unsafe.
  bool bcharfound = false;
  int strLen = csUnsafeString.GetLength();
  for (int ichar_pos = 0; ichar_pos < strLen; ichar_pos++)
  {
    if (csUnsafeString.GetAt(ichar_pos) == compareChar)
    {
      bcharfound = true;
      break;
    }
  }

  // Otherwise the character is safe only if its code lies in 33-122.
  // Control characters, the space, and codes from 123 upward are unsafe.
  int char_ascii_value = (int) compareChar;
  if (!bcharfound && char_ascii_value > 32 && char_ascii_value < 123)
  {
    return false;   // safe: no encoding needed
  }

  return true;      // unsafe: must be encoded
}

CString CURLEncode::decToHex(char num, int radix)
{
  // hexVals is the digit lookup table; its definition is not shown
  // in this snippet.
  int temp = 0;
  CString csTmp;
  int num_char = (int) num;

  // Map negative (signed char) values to 128-255.
  if (num_char < 0)
    num_char = 256 + num_char;

  // Collect the digits, least significant first.
  while (num_char >= radix)
  {
    temp = num_char % radix;
    num_char = num_char / radix;
    csTmp += hexVals[temp];
  }

  csTmp += hexVals[num_char];

  // Pad to two digits; the pad moves to the front after the reversal.
  if (csTmp.GetLength() < 2)
  {
    csTmp += '0';
  }

  // The digits were collected in reverse order.
  csTmp.MakeReverse();

  return csTmp;
}

CString CURLEncode::convert(char val)
{
  CString csRet;
  csRet += "%";
  csRet += decToHex(val, 16);
  return  csRet;
}




