Function to Return HTML Source of a URL

Environment: Visual C++ 6.0

Here's a function that gives you access to the source HTML of a URL. As written, the function stores the results in a .txt file, but you could easily modify it to fit your needs. From there you can parse the data, create a page on the fly, and use the Navigate2 method to display the results in a browser. This function could be useful for developing HTML views that block ads or give the user the option of text-only page views. With a little imagination you could probably come up with many other uses for this code.

The GetSourceHtml function makes use of the CInternetSession class, so be sure to place #include "afxinet.h" below #include "stdafx.h" in the source file that contains the GetSourceHtml function.

To use GetSourceHtml, pass it a URL as a CString in the following format: GetSourceHtml( _T("http://www.codeguru.com") );. The results are written to C:\rawHtml.txt, which you can then open in Notepad.

BOOL GetSourceHtml(CString theUrl) 
{
 // this first block does the actual work
 CInternetSession session;
 CInternetFile* file = NULL;
 try
 {
  // try to connect to the URL
  file = (CInternetFile*) session.OpenURL(theUrl); 
 }
 catch (CInternetException* m_pException)
 {
  // set file to NULL if there's an error
  file = NULL; 
  m_pException->Delete();
 }

 // most of the following deals with storing the html to a file
 CStdioFile dataStore;

 if (file)
 {
  CString somecode;

  BOOL bIsOk = dataStore.Open(_T("C:\\rawHtml.txt"),
                              CFile::modeCreate 
                              | CFile::modeWrite 
                              | CFile::shareDenyWrite 
                              | CFile::typeText);

  if (!bIsOk)
  {
   // release the connection before bailing out
   file->Close();
   delete file;
   return FALSE;
  }

  // continue fetching code until there is no more
  while (file->ReadString(somecode)) 
  {
   // ReadString drops the trailing newline, so put it back
   dataStore.WriteString(somecode);
   dataStore.WriteString(_T("\n"));
  }
  
  file->Close();
  delete file;
 }
 else
 {
  // the output file was never opened, so there is nothing to write to;
  // report the failed connection to the caller instead
  return FALSE;
 }

 return TRUE;
}
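
The introduction mentions parsing the downloaded data to build text-only page views. As a rough sketch of what that step might look like, the helper below removes everything between '<' and '>' from a string of HTML. StripTags is a hypothetical name, not part of the original function, and it uses std::string rather than CString so that it stands alone. It is deliberately naive: it does not decode entities such as &amp; and it keeps the bodies of script and style elements.

```cpp
#include <string>

// Hypothetical helper (not part of the original article): returns the text of
// an HTML document with everything between '<' and '>' removed.
// Simplifications: comments, <script>/<style> bodies, and character entities
// such as &amp; are not handled specially.
std::string StripTags(const std::string& html)
{
    std::string text;
    text.reserve(html.size());

    bool inTag = false;
    for (std::string::size_type i = 0; i < html.size(); ++i)
    {
        const char c = html[i];
        if (c == '<')
            inTag = true;           // start skipping at the opening bracket
        else if (c == '>' && inTag)
            inTag = false;          // resume copying after the closing bracket
        else if (!inTag)
            text += c;              // ordinary character outside any tag
    }
    return text;
}
```

One way to use it with GetSourceHtml would be to append each line returned by ReadString to a single string, pass that string to StripTags, and write the result to the output file in place of the raw HTML.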


Comments

  • can't identify url of a webpage

    Posted by ancy anna on 09/25/2004 02:22am

    I have a web page with two submit buttons (Next and Previous), and the form's action attribute is set to "post". How can we identify the URL of the next or previous page? Also, how can we access the source HTML of that URL?

  • How to enter data and click a button

    Posted by Legacy on 01/15/2004 12:00am

    Originally posted by: Russ

    Hi,

    I'm after something similar to this. Having read in the html I'd like to simulate logging into a system i.e. entering a username & password and clicking the 'login' button. Could anyone give me any pointers ?

    Thanks,

    Russ

  • What about secured pages?

    Posted by Legacy on 02/22/2003 12:00am

    Originally posted by: Gorgo

    Actually, I have one page which is secured with a user name and password. How do I pass the user name and password to the web site?

    thanks

  • If Only

    Posted by Legacy on 02/18/2003 12:00am

    Originally posted by: alan

    Is there any way to get part of the source?

    I tried putting the source of a large page into a CString and then calling .Find("something"), but it returns -1. The text is definitely in the source. Is it not found because the CString doesn't hold enough of the page?

    thanks

  • Save all pictures

    Posted by Legacy on 01/03/2003 12:00am

    Originally posted by: Nexus

    Hello

    How can I save the pictures in the HTML page? This function saves only the text, not the pictures.

    Thank you for the answer

  • Can Someone pls help?(Urgent)

    Posted by Legacy on 12/26/2002 12:00am

    Originally posted by: Eunice

    CString urlAddr = _T("172.15.156.43/hello.txt");
    CString filename = _T("\\My Documents\\album\\try.txt");
    URLDownloadToFile(NULL,urlAddr,filename,0,NULL);
    MessageBox( TEXT("URL OK"), TEXT("Load Image"), MB_OK);

    This is my source code, but when I compile it I keep getting this error:

    D:\PocketPCProject\source\EVC\copy3\SimpleImageDlg.cpp(124) : error C2065: 'URLDownloadToFile' : undeclared identifier
    Can anyone help me to solve it please??

  • I get html error 407 Proxy Authentication Required

    Posted by Legacy on 09/03/2002 12:00am

    Originally posted by: tom

    I get HTTP error 407 Proxy Authentication Required when using this function. What can I do?

  • stdafx.h and afxinet.h are MFC only in MSVC6

    Posted by Legacy on 07/11/2002 12:00am

    Originally posted by: Payson Welch

    Took me a little bit to figure out that the stdafx and afxinet libraries seem to only (well with no tweaking involved) be part of an MFC type application. Simple enough you just need to create a new project, standard exe with MFC support, don't forget to use the project_name.cpp file that MSVC will create for you in the project folder.

  • Is it possible to get source type?

    Posted by Legacy on 07/05/2002 12:00am

    Originally posted by: Chandramohan J

    Is it possible to get the type of the source file, i.e., html, xml, asp, or jsp?
    If there is any function it will be useful.

    Thank You.

  • How to get all files?

    Posted by Legacy on 04/28/2002 12:00am

    Originally posted by: Xiaolong Wu

    A web page usually contains many files: an HTML file and several files of type graphic, sound, movie, ...

    Hope somebody can tell me how to separate these files, get their type and length from the CHttpFile* file returned by OpenRequest().

    In short, I would like to know how the "Save" function is done in IE6.

    Any help is appreciated.
