Algorithm for sitemap generation tool


kalaid
January 24th, 2008, 05:52 AM
Hi,

I'm in the process of developing an online sitemap generator tool using PHP.

I've studied all the necessary prototypes and the requirements for the tool.

But I couldn't come to a conclusion on which algorithm the tool should use for crawling.

What would be the best algorithm (for searching and adding links) for the crawler program?

Please help me with this.

PeejAvery
January 24th, 2008, 10:26 AM
Well, I wouldn't say that there is an algorithm involved, just simple logic.

1. Visit a page.
2. Use regular expressions to retrieve the href attribute of each anchor in that page.
3. Add each value to an array.
4. Loop through that array, reading each page in turn, parsing its href attributes, and adding those to the original array as well (if they don't already exist).
5. When all is said and done, you have an array of all pages linked on your site.
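
The steps above can be sketched roughly as follows. This is only a minimal illustration, written in Python rather than PHP to keep it short; the same structure should carry over to PHP with preg_match_all and an array. The names crawl, extract_links, fetch and FAKE_SITE, and the example.com URLs, are made up for this example. It also assumes two things the steps don't spell out: links are resolved against the page they were found on, and only links on the starting host are followed. Pages come from an in-memory dictionary here so the sketch runs without network access.

```python
import re
from collections import deque
from urllib.parse import urljoin, urldefrag, urlparse

# Step 2: a simple regex for the href of each anchor tag.
# Assumption: attribute values are quoted; a real HTML parser is more robust.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)


def extract_links(html, base_url):
    """Return absolute URLs (fragment removed) for every anchor href in html."""
    links = []
    for href in HREF_RE.findall(html):
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        links.append(absolute)
    return links


def crawl(start_url, fetch):
    """Collect every same-host URL reachable from start_url.

    fetch is a hypothetical callable: it takes a URL and returns the page
    HTML as a string, or None if the page could not be read.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}          # steps 3/4: the "array" of known pages
    queue = deque([start_url])  # pages still waiting to be read
    while queue:
        url = queue.popleft()   # step 1: visit a page
        html = fetch(url)
        if html is None:
            continue            # unreadable pages stay listed but add no links
        for link in extract_links(html, url):
            # Only follow links on the same host, and only once each.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)         # step 5: all pages linked on the site


# A made-up three-page site standing in for real HTTP requests.
FAKE_SITE = {
    "http://example.com/": '<a href="/about">About</a> <a href="http://other.org/">Ext</a>',
    "http://example.com/about": '<a href="/">Home</a> <a href="contact#form">Contact</a>',
    "http://example.com/contact": "<A HREF='/about'>About</A>",
}

pages = crawl("http://example.com/", FAKE_SITE.get)
for page in pages:
    print(page)
```

Using a set for the "already exists" check keeps each lookup cheap as the list grows, and the queue means pages are read in the order they were discovered. For a real site, fetch would be replaced by an actual HTTP request.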