Function to Return HTML Source of a URL
Here's a function that gives you access to the source HTML of a URL. As written, the function stores the results in a .txt file, but you could easily modify it to fit your needs. From there you can parse the data, create a page on the fly, and use the Navigate2 method to display the results in a browser. This function could be useful for developing HTML views that block ads or give the user the option of text-only page views. With a little imagination you could probably come up with many other uses for this code.
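As a rough illustration of the text-only idea, here is a minimal sketch in standard C++ (not MFC) that drops everything between '<' and '>' from a string of HTML. StripTags is a hypothetical helper name, not part of the original function, and the approach is deliberately naive: it does not decode entities such as &amp;amp;, and it does not skip the contents of script or style elements.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration, not the article's code):
// returns the input with every "<...>" span removed.
// A '>' that appears outside a tag is kept as ordinary text.
std::string StripTags(const std::string& html)
{
    std::string text;
    text.reserve(html.size());
    bool insideTag = false;

    for (std::string::size_type i = 0; i < html.size(); ++i)
    {
        const char ch = html[i];
        if (ch == '<')
            insideTag = true;            // start skipping
        else if (ch == '>' && insideTag)
            insideTag = false;           // tag finished, resume copying
        else if (!insideTag)
            text += ch;                  // ordinary text
    }
    return text;
}
```

Calling StripTags on "<p>Hello <b>world</b></p>" yields "Hello world". A real text-only view would need a proper HTML parser, but this is enough to experiment with the fetched source.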
The GetSourceHtml function makes use of the CInternetSession class, so be sure to place #include "afxinet.h" below #include "stdafx.h" in the source file that contains the GetSourceHtml function.
To use GetSourceHtml, pass it a URL as a CString, like this: GetSourceHtml( _T("http://www.codeguru.com") ); You can then use Notepad to view the results, which are saved in the C:\ directory as rawHtml.txt.
BOOL GetSourceHtml(CString theUrl)
{
    // this first block does the actual work
    CInternetSession session;
    CInternetFile* file = NULL;
    try
    {
        // try to connect to the URL
        file = (CInternetFile*) session.OpenURL(theUrl);
    }
    catch (CInternetException* pException)
    {
        // set file to NULL if there's an error
        file = NULL;
        pException->Delete();
    }

    // most of the following deals with storing the html to a file;
    // open the output file first so that either the page or the
    // error message below has somewhere to go
    CStdioFile dataStore;
    BOOL bIsOk = dataStore.Open(_T("C:\\rawHtml.txt"),
        CFile::modeCreate
        | CFile::modeWrite
        | CFile::shareDenyWrite
        | CFile::typeText);
    if (!bIsOk)
    {
        // release the connection before giving up
        if (file)
        {
            file->Close();
            delete file;
        }
        return FALSE;
    }

    BOOL bFetched = FALSE;
    if (file)
    {
        CString somecode;
        // continue fetching code until there is no more
        while (file->ReadString(somecode))
        {
            // ReadString drops the trailing newline, so put it back
            dataStore.WriteString(somecode);
            dataStore.WriteString(_T("\n"));
        }
        file->Close();
        delete file;
        bFetched = TRUE;
    }
    else
    {
        dataStore.WriteString(_T("Could not establish a connection with the server..."));
    }

    dataStore.Close();
    // TRUE only if the page was fetched and written
    return bFetched;
}
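Once the page is on disk, parsing it starts with reading it back. The following is a minimal sketch in standard C++ rather than MFC; ReadWholeFile and SavedPageContains are hypothetical helper names, and the file path is passed in as a parameter instead of being fixed to C:\rawHtml.txt. It loads the whole file into a single std::string and then searches it.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (an assumption for illustration): reads an entire
// file into 'contents'. Returns false if the file cannot be opened.
bool ReadWholeFile(const std::string& path, std::string& contents)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return false;

    std::ostringstream buffer;
    buffer << in.rdbuf();        // copy the whole stream in one go
    contents = buffer.str();
    return true;
}

// Hypothetical helper: true if 'needle' occurs anywhere in the saved page.
bool SavedPageContains(const std::string& path, const std::string& needle)
{
    std::string contents;
    if (!ReadWholeFile(path, contents))
        return false;
    return contents.find(needle) != std::string::npos;
}
```

For example, SavedPageContains("C:\\rawHtml.txt", "<title>") would report whether the saved page contains a title tag. Note that find is case-sensitive, so "<TITLE>" would not match.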

Comments
can't identify url of a webpage
Posted by ancy anna on 09/25/2004 02:22am
I have a web page with 2 submit buttons (Next and Previous) and the form action attribute is set as "post". How can we identify the URL of the next or previous page? Also, how can we access the source HTML of this URL?
How to enter data and click a button
Posted by Legacy on 01/15/2004 12:00am
Originally posted by: Russ
Hi,
I'm after something similar to this. Having read in the HTML, I'd like to simulate logging into a system, i.e. entering a username and password and clicking the 'login' button. Could anyone give me any pointers?
Thanks,
Russ
What about secured pages!
Posted by Legacy on 02/22/2003 12:00am
Originally posted by: Gorgo
Actually I have one page which is secured with a user name and password. How do I pass the user name and password to the web site?
thanks
If Only
Posted by Legacy on 02/18/2003 12:00am
Originally posted by: alan
Is there any way to get part of the source?
I tried putting the source of a large page into a CString and then trying .Find("something"), but it returns -1. It's definitely in the source, but is it because the CString doesn't hold enough that it doesn't find it?
thanks
Save all pictures
Posted by Legacy on 01/03/2003 12:00am
Originally posted by: Nexus
Hello
How can I save the pictures in the HTML page? This function saves only the text but not the pictures.
Thank you for the answer
Can Someone pls help?(Urgent)
Posted by Legacy on 12/26/2002 12:00am
Originally posted by: Eunice
CString urlAddr = _T("172.15.156.43/hello.txt");
CString filename = _T("\\My Documents\\album\\try.txt");
URLDownloadToFile(NULL,urlAddr,filename,0,NULL);
MessageBox( TEXT("URL OK"), TEXT("Load Image"), MB_OK);
This is my source code, but when I compile it, I keep getting this error:
D:\PocketPCProject\source\EVC\copy3\SimpleImageDlg.cpp(124) : error C2065: 'URLDownloadToFile' : undeclared identifier
Can anyone help me to solve it please??
I get HTTP error 407 Proxy Authentication Required
Posted by Legacy on 09/03/2002 12:00am
Originally posted by: tom
I get HTTP error 407 Proxy Authentication Required when using this function. What can I do?
stdafx.h and afxinet.h are MFC only in MSVC6
Posted by Legacy on 07/11/2002 12:00am
Originally posted by: Payson Welch
Took me a little bit to figure out that the stdafx and afxinet headers seem to be part of an MFC type application only (well, with no tweaking involved). Simple enough: you just need to create a new project, standard exe with MFC support, and don't forget to use the project_name.cpp file that MSVC will create for you in the project folder.
Is it possible to get source type?
Posted by Legacy on 07/05/2002 12:00am
Originally posted by: Chandramohan J
Is it possible to get the type of the source file, i.e. html, xml, asp or jsp?
If there is any function it will be useful.
Thank You.
How to get all files?
Posted by Legacy on 04/28/2002 12:00am
Originally posted by: Xiaolong Wu
A web page usually contains many files: an HTML file and several files of type graphic, sound, movie, ...
Hope somebody can tell me how to separate these files, get their type and length from the CHttpFile* file returned by OpenRequest().
In short, I would like to know how the "Save" function is done in IE6.
Any help is appreciated.