Function to Return HTML Source of a URL

Environment: Visual C++ 6.0

Here’s a function that gives you access to the HTML source of a URL.
As written, the function stores the results in a .txt file, but you could easily
modify it to fit your needs. From there you can parse the data, create a
page on the fly, and use the Navigate2 method to display the results in a
browser. This function could be useful for developing HTML views that block ads
or give the user the option of text-only page views. With a little imagination you could probably
come up with many other uses for this code.
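As a rough illustration of the text-only view idea, here is a minimal sketch that removes the markup from a block of HTML and keeps only the text between tags. It is written in standard C++ with std::string rather than MFC's CString so that it compiles on its own. The name StripTags is hypothetical and not part of the original code, and the approach is deliberately naive: it does not handle HTML comments, the contents of script or style elements, entities such as &amp;amp;, or a '>' inside a quoted attribute value.

```cpp
#include <string>

// Hypothetical helper (not from the original article): returns the HTML
// with every <...> tag removed, keeping only the text outside the tags.
// Naive on purpose: no handling of comments, <script>/<style> bodies,
// character entities, or '>' inside quoted attribute values.
std::string StripTags(const std::string& html)
{
    std::string text;
    text.reserve(html.size());
    bool inTag = false;

    for (std::string::size_type i = 0; i < html.size(); ++i)
    {
        const char c = html[i];
        if (c == '<')
            inTag = true;            // start of a tag: stop copying
        else if (c == '>' && inTag)
            inTag = false;           // end of a tag: resume copying
        else if (!inTag)
            text += c;               // ordinary text outside any tag
    }
    return text;
}
```

For a real ad-blocking or text-only view you would want a proper HTML parser; this only shows the general shape of the filtering step.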

The GetSourceHtml function makes use of the CInternetSession class, so be sure to place #include "afxinet.h"
below #include "stdafx.h" in the source file that contains the GetSourceHtml function.

To use GetSourceHtml, pass it a URL as a CString in the following format: GetSourceHtml(_T(""));
You can then use Notepad to view the results, which are saved in the root of the C: drive as rawHtml.txt.
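If you change where the output file goes, keep in mind that in a C++ string literal a backslash starts an escape sequence, so every literal backslash in a Windows path has to be written as "\\". The short sketch below (the constant names are purely illustrative) shows that "C:\rawHtml.txt" written with a single backslash actually contains a carriage return ('\r') instead of a backslash. The same rule applies inside _T("...").

```cpp
#include <cstring>

// Illustrative constants only: the same path written two ways.
// "\\" is one literal backslash, so this is C, :, \, r, a, w, ... (14 chars).
const char kGoodPath[] = "C:\\rawHtml.txt";

// Here "\r" is parsed as a single carriage-return character, so the
// backslash and the 'r' both disappear (13 chars, no backslash at all).
const char kBadPath[] = "C:\rawHtml.txt";
```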

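BOOL GetSourceHtml(CString theUrl)
{
    // this first block does the actual work
    CInternetSession session;
    CInternetFile* file = NULL;

    try
    {
        // try to connect to the URL
        file = (CInternetFile*) session.OpenURL(theUrl);
    }
    catch (CInternetException* m_pException)
    {
        // set file to NULL if there's an error, and free the exception object
        file = NULL;
        m_pException->Delete();
    }

    // most of the following deals with storing the html to a file
    CStdioFile dataStore;

    // "\\" is needed: a single backslash would make "\r" a carriage return
    BOOL bIsOk = dataStore.Open(_T("C:\\rawHtml.txt"),
        CFile::modeCreate
        | CFile::modeWrite
        | CFile::shareDenyWrite
        | CFile::typeText);

    if (!bIsOk)
    {
        if (file != NULL)
            file->Close();
        delete file;    // deleting NULL is harmless
        return FALSE;
    }

    if (file)
    {
        CString somecode;

        // continue fetching code until there is no more;
        // ReadString(CString&) drops each line's '\n', so add it back
        while (file->ReadString(somecode))
            dataStore.WriteString(somecode + _T("\n"));

        file->Close();
        delete file;
        return TRUE;
    }

    // no connection: record the failure in the output file instead
    dataStore.WriteString(_T("Could not establish a connection with the server..."));
    return FALSE;
}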
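The core of GetSourceHtml is the loop that keeps reading lines until there are no more and writes each one to the output file. The sketch below shows the same read-until-exhausted pattern in standard C++ so it can be tried without a network connection or MFC. CopyLines is a hypothetical name, std::getline stands in for CInternetFile::ReadString, and any std::ostream stands in for the CStdioFile.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>   // std::istringstream/ostringstream for in-memory testing
#include <string>

// Hypothetical portable analogue of the ReadString/WriteString loop:
// copies every line from `in` to `out` and returns how many were copied.
std::size_t CopyLines(std::istream& in, std::ostream& out)
{
    std::string line;
    std::size_t count = 0;

    // std::getline fails once no more input can be read, ending the loop
    while (std::getline(in, line))
    {
        // getline removes the '\n' it stops at, so write one back
        out << line << '\n';
        ++count;
    }
    return count;
}
```

Because the newline is always written back, a last line that had no trailing newline in the input gains one in the output; for saving a page to a text file that is usually what you want.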
