Function to Return HTML Source of a URL

Environment: Visual C++ 6.0

Here's a function that gives you access to the source HTML of a URL.
As written, the function stores the results in a .txt file, but you could easily
modify it to fit your needs. From there you can parse the data, create a
page on the fly, and use the Navigate2 method to display the results in a
browser. This function could be useful for developing HTML views that block ads
or give the user the option of text-only page views. With a little imagination you could probably
come up with many other uses for this code.
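
As one illustration of the text-only idea, here is a minimal sketch of a tag-stripping helper. The name StripTags is hypothetical and not part of the original function, and it uses std::string rather than CString so that it stands alone. It is deliberately naive: it drops everything between '<' and '>' and makes no attempt to handle comments, script blocks, or character entities.

```cpp
#include <string>

// Hypothetical helper (not from the original article): returns the input
// with every <...> tag removed, leaving only the text between tags.
std::string StripTags(const std::string& html)
{
    std::string text;
    bool inTag = false;

    for (std::string::size_type i = 0; i < html.size(); ++i)
    {
        const char c = html[i];

        if (c == '<')
            inTag = true;       // start skipping at the opening bracket
        else if (c == '>' && inTag)
            inTag = false;      // resume copying after the closing bracket
        else if (!inTag)
            text += c;          // ordinary character outside any tag
    }

    return text;
}
```

For example, StripTags("<p>Hello <b>world</b></p>") yields "Hello world". A real text-only view would probably need a proper HTML parser, but a filter like this may be enough for quick experiments on the saved file.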

The GetSourceHtml function makes use of the CInternetSession class, so be sure to place #include "afxinet.h"
below #include "stdafx.h" in the source file that contains the GetSourceHtml function.

To use GetSourceHtml, pass it a URL as a CString, for example: GetSourceHtml( _T("http://www.codeguru.com") );
You can then use Notepad to view the results, which are saved as rawHtml.txt in the root of the C: drive.


BOOL GetSourceHtml(CString theUrl)
{
    // this first block does the actual work
    CInternetSession session;
    CInternetFile* file = NULL;
    try
    {
        // try to connect to the URL
        file = (CInternetFile*) session.OpenURL(theUrl);
    }
    catch (CInternetException* pException)
    {
        // set file to NULL if there's an error
        file = NULL;
        pException->Delete();
    }

    // most of the following deals with storing the html to a file
    CStdioFile dataStore;

    // note the doubled backslash: "\r" on its own is a carriage return
    BOOL bIsOk = dataStore.Open(_T("C:\\rawHtml.txt"),
        CFile::modeCreate
        | CFile::modeWrite
        | CFile::shareDenyWrite
        | CFile::typeText);

    if (!bIsOk)
    {
        // release the connection before giving up
        if (file)
        {
            file->Close();
            delete file;
        }
        return FALSE;
    }

    if (!file)
    {
        // the output file is open at this point, so the message can be written
        dataStore.WriteString(_T("Could not establish a connection with the server..."));
        dataStore.Close();
        return FALSE;
    }

    CString somecode;

    // continue fetching code until there is no more
    while (file->ReadString(somecode))
    {
        // ReadString drops the trailing newline, so put it back
        dataStore.WriteString(somecode + _T("\n"));
    }

    file->Close();
    delete file;
    dataStore.Close();
    return TRUE;
}
