
Not receive CRLF in HTTP response


Cooker
December 5th, 2007, 11:24 PM
Hi,
I am writing a program to download a file from an HTTP server.
I am trying to download the file "http://tw.yimg.com/i/tw/ks/lsm/ykp_logo.gif", skip the headers, and store the entity body. I know that the headers and the entity body are separated by two consecutive carriage return/line feed pairs, "\r\n\r\n". But with these statements:
// Skip the HTTP headers
division = strstr(recvbuf, "\r\n\r\n");
if (division != NULL && division < recvbuf + recvbytes) {
    headerlength = division - recvbuf + 4;
    fwrite(recvbuf + headerlength, 1, recvbytes - headerlength, fp);
}
I find that strstr returns NULL. So I printed the received data and also stored it in a binary file.

HTTP/1.0 200 OK
Last-Modified: Tue, 16 Oct 2007 16:05:27 GMT
Content-Type: image/gif
Content-Length: 3542
Expires: Fri, 13 Oct 2017 16:05:27 GMT
 
tmp = fopen("recv.log","wb");
recvbytes = recv(sockfd, recvbuf, recvbuflen, 0);
fwrite(recvbuf, 1, recvbytes, tmp);
fclose(tmp);

I checked the contents of the file: there are NOT two carriage return/line feed pairs between the headers and the entity body. There are only two line feeds between them.
 
I then tried to download another file, "http://tw.yimg.com/i/tw/hp/masthead/mhlogo.png".
HTTP/1.1 200 OK
Date: Thu, 06 Dec 2007 04:09:41 GMT
P3P: policyref="http://p3p.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE GOV"
Cache-Control: max-age=315360000
Expires: Sun, 03 Dec 2017 04:09:41 GMT
Last-Modified: Fri, 16 Feb 2007 09:37:45 GMT
Accept-Ranges: bytes
Content-Length: 4862
Connection: close
In this case, I find two carriage return/line feed pairs between the headers and the entity body, and the file downloads completely.
 
Do I need to check for both "\r\n\r\n" and "\n\n"? Or is there something wrong in my program?

Thread1
December 6th, 2007, 05:45 AM
Well, yes, line breaks must be a CRLF sequence (\r\n), as you have written in the code, but it looks like some HTTP servers do not conform to this standard. So to handle those servers, your code also has to treat a single LF (\n) as a line break.
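
As a rough illustration of that advice, here is a minimal sketch of a header/body splitter that accepts both line-ending styles. The function name find_body_start is hypothetical and not from the original program. It takes an explicit length instead of relying on strstr, because the receive buffer is not guaranteed to be NUL-terminated and an image body can contain zero bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the offset of the first body byte in buf[0..len), or -1 if the
 * blank line that ends the headers has not been seen.
 * Accepts "\r\n\r\n", "\n\n", and mixes such as "\n\r\n".
 */
static long find_body_start(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;               /* only an LF can end a line */
        size_t j = i + 1;
        if (j < len && buf[j] == '\r')
            j++;                    /* optional CR of the blank line */
        if (j < len && buf[j] == '\n')
            return (long)(j + 1);   /* blank line found; body starts here */
    }
    return -1;
}
```

In the code from the first post, this would stand in for the strstr call: if find_body_start(recvbuf, recvbytes) returns an offset >= 0, write recvbytes minus that offset bytes starting at recvbuf plus that offset. A return of -1 means the end of the headers is not in the buffer yet.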

Cooker
December 6th, 2007, 07:51 PM
OK, thanks for your comment.

messycan
December 9th, 2007, 12:32 AM
You can also parse the header by searching for the <html>, <HTML>, <HTMl>, or <!DOCTYPE> delimiters.

Like Thread1 stated, not all web servers conform to the same standard. I wrote a web crawler that harvested millions of websites for a project, and those delimiters worked very well. Of course, you may come across more; if you do, just add them to your code.
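
A minimal sketch of that kind of marker search, assuming the response body is HTML. The name find_marker is hypothetical. Matching case-insensitively means one marker such as "<html" covers all the capitalization variants. This only helps for HTML pages; a GIF or PNG body like the ones in the first post contains no such marker.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first case-insensitive match of marker in
 * buf[0..len), or -1 if there is none.
 */
static long find_marker(const char *buf, size_t len, const char *marker)
{
    assert(buf != NULL && marker != NULL);
    size_t mlen = strlen(marker);
    if (mlen == 0 || mlen > len)
        return -1;
    for (size_t i = 0; i + mlen <= len; i++) {
        size_t k = 0;
        /* compare byte by byte, ignoring ASCII case */
        while (k < mlen &&
               tolower((unsigned char)buf[i + k]) ==
               tolower((unsigned char)marker[k]))
            k++;
        if (k == mlen)
            return (long)i;
    }
    return -1;
}
```

For example, a caller could try find_marker(recvbuf, recvbytes, "<!doctype") first and fall back to "<html" if that returns -1.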
