Environment: MFC/C++, Internet & Network, HTTP/HTTPS
This article presents a utility that lets you retrieve raw information from Web servers by using HTTP's GET and POST commands.
Description
This utility is just a wrapper around reusable functions that allow programmatic access to the Web through a sort of ‘mini-browser’ embedded inside your program.
There are many uses for such code. Programs that look at a series of Web pages, much like a user surfing from one page to the next, are often called spiders, bots, or crawlers. Such programs are often used to catalog Web sites, import external data from the Web, or simply to send commands to a Web server. You could extend the functionality of the classes presented here to retrieve information from the Internet in a variety of ways.
There are many third-party DLLs and solutions that retrieve data from Web sites. The functions presented in this article, by contrast, are self-contained. They do not rely on WinInet, Internet Explorer, or Netscape, and they require no similar software apart from WinSock. WinSock is an integral part of the Windows TCP/IP stack and is available on any computer capable of running a browser.
Internet protocols are documented in RFCs (Requests for Comments). HTTP/1.0 is documented in RFC1945. Additionally, RFC1630, RFC1738, and RFC1808 document the format of a URL.
A complete set of RFCs can be found at http://www.rfc-editor.org.
Implementation
The engine of the utility is the Request class. Its key function, SendHTTP(), accepts five parameters and returns an integer. The first parameter is the URL to POST to or GET from. The second specifies any additional HTTP headers to pass with the request. The third and fourth specify the data to post and the length of that data. The fifth is a pointer to an HTTPRequest structure that receives the headers and messages sent to and returned by the Web server. SendHTTP() returns 0 if the POST or GET was successful and 1 if an error occurred.
SendHTTP() begins by parsing the URL string. A URL is an address that specifies the exact location of a resource on the Internet. A URL has several parts, some of which are optional. An example of a URL would be:
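The URL-parsing step can be sketched in portable C++. This is not the article's Request code: UrlParts and ParseUrl are hypothetical names, and the default ports (80 for http, 443 for https) are assumptions. Given a hypothetical URL such as http://www.example.com:8080/test/TestGet.asp?name=value, the sketch separates the scheme, host, port, and path (including any query string), which is what a client needs to open a socket and write the request line.

```cpp
#include <string>

// Hypothetical container for the URL pieces an HTTP client needs
// (not the article's own structure).
struct UrlParts {
    std::string scheme;  // e.g. "http"
    std::string host;    // e.g. "www.example.com"
    int port = 80;       // assumed default when the URL gives no port
    std::string path;    // path plus optional query string, always starts with '/'
};

// Splits "scheme://host[:port][/path][?query]" into its parts.
// Returns false if there is no "://", the host is empty, or the port is invalid.
bool ParseUrl(const std::string& url, UrlParts& out)
{
    const std::string::size_type schemeEnd = url.find("://");
    if (schemeEnd == std::string::npos)
        return false;
    out.scheme = url.substr(0, schemeEnd);

    // The host (and optional port) runs until the first '/' or '?'.
    const std::string::size_type hostStart = schemeEnd + 3;
    const std::string::size_type pathStart = url.find_first_of("/?", hostStart);
    std::string hostPort = (pathStart == std::string::npos)
        ? url.substr(hostStart)
        : url.substr(hostStart, pathStart - hostStart);

    // An empty path means the server root; a bare "?query" gets a leading '/'.
    out.path = (pathStart == std::string::npos) ? "/" : url.substr(pathStart);
    if (out.path[0] == '?')
        out.path = "/" + out.path;

    // Assumed defaults: 443 for https, 80 otherwise.
    out.port = (out.scheme == "https") ? 443 : 80;
    const std::string::size_type colon = hostPort.find(':');
    if (colon != std::string::npos) {
        const std::string portText = hostPort.substr(colon + 1);
        if (portText.empty() || portText.size() > 5 ||
            portText.find_first_not_of("0123456789") != std::string::npos)
            return false;
        out.port = std::stoi(portText);
        if (out.port > 65535)
            return false;
        hostPort.erase(colon);
    }
    out.host = hostPort;
    return !out.host.empty();
}
```

A caller would then connect to out.host on out.port and send out.path in the request line.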
An HTTP GET sends nothing to the Web server besides the URL and the request headers, so it often uses the URL itself to carry additional information:
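To show how a GET carries its parameters in the URL's query string, here is a minimal sketch that URL-encodes a value and assembles the raw request text. UrlEncode and BuildGetRequest are hypothetical helpers, not functions from the article, and the sketch assumes an HTTP/1.0 request that sends only a Host header.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: percent-encodes a value for a query string or a
// form-urlencoded body. Letters, digits and "-_.~" pass through unchanged,
// a space becomes '+', and every other byte becomes %XX.
std::string UrlEncode(const std::string& value)
{
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            char buf[4];  // '%', two hex digits, terminating NUL
            std::snprintf(buf, sizeof(buf), "%%%02X", c);
            out += buf;
        }
    }
    return out;
}

// Hypothetical helper: builds the raw text of an HTTP/1.0 GET request.
// The parameters travel in the query string; nothing follows the blank line.
std::string BuildGetRequest(const std::string& host, const std::string& path,
                            const std::string& query)
{
    std::string target = path;
    if (!query.empty())
        target += "?" + query;
    return "GET " + target + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "\r\n";
}
```

For example, BuildGetRequest("www.example.com", "/TestGet.asp", "name=" + UrlEncode("John Smith")) produces a request line of GET /TestGet.asp?name=John+Smith HTTP/1.0.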
Usually, an HTTP POST includes the header:
Content-Type: application/x-www-form-urlencoded
Without this header, some Web servers (particularly ASP running on IIS) will not recognize your parameters. An HTTP POST has two parts. The first is the HTTP headers, just as in the GET. The headers contain the actual request and additional pieces of information. Unlike a GET, a POST contains data after the headers (separated from them by a blank line).
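The two-part structure of a POST can be sketched the same way. BuildPostRequest is a hypothetical helper, not the article's code. It assumes the form data is already URL-encoded and adds a Content-Length header, which HTTP/1.0 (RFC1945) requires on POST requests so the server knows how many bytes follow the blank line.

```cpp
#include <string>

// Hypothetical helper: builds the raw text of an HTTP/1.0 POST request.
// Part one is the request line plus headers; part two is the form data,
// separated from the headers by a blank line.
std::string BuildPostRequest(const std::string& host, const std::string& path,
                             const std::string& formData)
{
    std::string request = "POST " + path + " HTTP/1.0\r\n";
    request += "Host: " + host + "\r\n";
    request += "Content-Type: application/x-www-form-urlencoded\r\n";
    request += "Content-Length: " + std::to_string(formData.size()) + "\r\n";
    request += "\r\n";    // blank line ends the headers
    request += formData;  // posted data, assumed already URL-encoded
    return request;
}
```

Dropping the Content-Type line from this sketch reproduces the situation described above, where some servers fail to see the posted parameters.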
After the Web server receives the GET or POST request, it sends back a response. The response has two parts: headers followed by data (with a blank line separating the two).
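Separating the response headers from the data amounts to finding the first blank line. A minimal sketch, assuming the whole response has already been read from the socket into a std::string; SplitResponse is a hypothetical name, not part of the article's Request class.

```cpp
#include <string>

// Hypothetical helper: splits a raw HTTP response at the first blank line
// ("\r\n\r\n"). Everything before it is the header block, everything after
// it is the data. Returns false if no blank line has been received yet.
bool SplitResponse(const std::string& response, std::string& headers, std::string& body)
{
    const std::string::size_type sep = response.find("\r\n\r\n");
    if (sep == std::string::npos)
        return false;
    headers = response.substr(0, sep);
    body = response.substr(sep + 4);
    return true;
}
```

A false return is a useful signal to keep reading from the socket, since the header block has not fully arrived.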
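The first line of the HTTP headers specifies the status of the request. It contains the HTTP version, a three-digit status code, and a short reason phrase, for example HTTP/1.0 200 OK. The status code falls into one of these ranges: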
- 100-199 is an informational message and is not generally used.
- 200-299 means a successful request.
- 300-399 indicates that the requested resource has been moved; Web servers use this for redirection.
- 400-499 indicates client errors.
- 500-599 indicates server errors.
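The status line can be parsed and mapped onto these ranges with a few lines of code. ParseStatusCode and StatusClass are hypothetical helpers, not the article's code; the sketch assumes a status line of the form HTTP/x.y NNN Reason.

```cpp
#include <string>

// Hypothetical helper: extracts the three-digit status code from a status
// line such as "HTTP/1.0 404 Not Found". Returns -1 if the line is malformed.
int ParseStatusCode(const std::string& statusLine)
{
    const std::string::size_type space = statusLine.find(' ');
    if (statusLine.compare(0, 5, "HTTP/") != 0 || space == std::string::npos ||
        statusLine.size() < space + 4)
        return -1;
    int code = 0;
    for (std::string::size_type i = space + 1; i < space + 4; ++i) {
        const char c = statusLine[i];
        if (c < '0' || c > '9')
            return -1;
        code = code * 10 + (c - '0');
    }
    return code;
}

// Hypothetical helper: maps a status code to its class (1xx to 5xx).
const char* StatusClass(int code)
{
    switch (code / 100) {
    case 1: return "informational";
    case 2: return "success";
    case 3: return "redirection";
    case 4: return "client error";
    case 5: return "server error";
    default: return "unknown";
    }
}
```

A client that follows redirects would check for the "redirection" class and then read the Location header.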
After the headers comes the data returned by the GET or POST request. This is usually what the browser displays.
Dialog Box Wrapper
The MFC dialog project acts as a wrapper around the Request class. An instance of the Microsoft Web Browser control is embedded in the dialog. This makes it easy to navigate the data and to issue commands such as GET or POST. The control is used in two ways:
- When the user makes a request from the browser, the control fires the OnBeforeNavigate2 event, which the dialog program captures. The OnBeforeNavigate2Explorer1 handler uses this event to capture the headers sent to the Web server and any posted data, and to determine whether the request is a GET or a POST.
- To use the SendHTTP engine directly, enter the URL, fill in the 'SendHTTPrequest' and 'PostData' (for a POST) fields, select the GET or POST radio button, and click the 'Go' button. SendHTTP() stores the HTML returned by the server in the m_HTTPbody string variable, and the IE control then displays it. The HTML is loaded in OnButtonViewHttp():
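// Requires <mshtml.h> for IHTMLDocument2 and IHTMLElement.
IHTMLDocument2* pHTMLDocument2 = NULL;
LPDISPATCH lpDispatch = m_Browser.GetDocument();
if (lpDispatch)
{
    // Ask the browser's document for its IHTMLDocument2 interface.
    HRESULT hr = lpDispatch->QueryInterface(IID_IHTMLDocument2,
                                            (LPVOID*)&pHTMLDocument2);
    lpDispatch->Release();
    if (SUCCEEDED(hr) && pHTMLDocument2)
    {
        IHTMLElement* pBody = NULL;
        hr = pHTMLDocument2->get_body(&pBody);
        if (SUCCEEDED(hr) && pBody)
        {
            BSTR bstr = m_HTTPbody.AllocSysString();
            pBody->put_innerHTML(bstr);   // insert the HTML returned by SendHTTP()
            SysFreeString(bstr);
            pBody->Release();
        }
        pHTMLDocument2->Release();        // release the document interface too
    }
}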
Usage
Enter the URL and click the Go button. The mini-browser on the right displays your page. As you follow links and press buttons on this page, the 'PostData', 'SendHTTPrequest', and 'ReceiveHTTPrequest' fields receive the corresponding data. The GET/POST radio buttons are set automatically, because the IE instance knows whether you made a GET (you clicked a link) or a POST (you pressed a button).
You can also type your own headers into the 'SendHTTPrequest' edit box and your POST data into the 'PostData' edit box, and then click the 'Go' button. The browser navigates to your address using the headers and data from the 'SendHTTPrequest' and 'PostData' fields.
Use the TestGet.asp and TestPost.asp files from the Web directory to test the GET/POST utility.