I've written a multi-threaded HTTP server in C++ using Winsock. I've just been trying to upgrade it to HTTP/1.1, and am having trouble getting persistent connections to work properly.
At the moment, I've got the "read request, send response" part of the code in a loop that it breaks out of if it detects an HTTP/1.0 client, the connection fails, etc. At the top of the loop is a call to select(), with the socket in the fd_set passed as the read argument.
It always works fine for the first request, but with most browsers, after it's been through the loop once, when select() returns and I try to recv() the request, recv() returns 0, which means that the connection has been closed.
The really odd thing is that if I'm using Firefox and set the maximum number of persistent connections per server to 1, it works fine! One connection is made and reused to download everything. I've even tried adding an extra five images to the page, and most browsers (especially IE and Opera) open a separate TCP connection for each image.
I've used select() because I need the timeout it provides (recv() wouldn't time out). Although asynchronous sockets would probably be better than calling select(), I may one day port the code to Linux, which doesn't support Winsock-style asynchronous sockets.
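A minimal sketch of the select()-with-timeout pattern described above, written against POSIX headers so it also covers a later Linux port. Under Winsock, select() has the same shape, except that the first argument is ignored and sockets are SOCKET handles rather than ints. The names waitReadable and recvWithTimeout are hypothetical. Note that a peer closing the connection also makes the socket "readable"; the recv() that follows then returns 0, which is the symptom described in the question.

```cpp
#include <cstddef>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

// Waits until `fd` is readable or `timeoutMs` elapses.
// Returns 1 if readable, 0 on timeout, -1 on error.
// "Readable" also covers an orderly close by the peer: the next
// recv() then returns 0.
int waitReadable(int fd, long timeoutMs) {
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);

    // select() may modify both the set and the timeout, so they are
    // rebuilt on every call rather than reused across loop iterations.
    timeval tv;
    tv.tv_sec = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    int rc = select(fd + 1, &readSet, nullptr, nullptr, &tv);
    if (rc < 0) return -1;
    return (rc > 0 && FD_ISSET(fd, &readSet)) ? 1 : 0;
}

// select() followed by recv(), as at the top of the request loop.
// Returns bytes read, 0 if the peer closed, -1 on error, -2 on timeout.
long recvWithTimeout(int fd, char* buf, std::size_t len, long timeoutMs) {
    int ready = waitReadable(fd, timeoutMs);
    if (ready < 0) return -1;
    if (ready == 0) return -2;
    return static_cast<long>(recv(fd, buf, len, 0));
}
```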
Thanks in advance!
MikeAThon
June 20th, 2007, 01:32 PM
It sounds like your call to select() is blocking for such a long time that the client has given up and has closed the connection.
Can you verify that this is what's happening? If so, then you need to find out why select() is blocking for so long.
For us to give more meaningful help, you also need to provide more details on your server. You mentioned it is multi-threaded, but what model is being used? One thread per connection? Is there a separate thread for the listening socket (preferred, if you have already constrained yourself to a select()-based architecture)? Or is the listening socket entangled in the same thread as the communication sockets (not preferred)?
Mike
tszarx
June 20th, 2007, 02:58 PM
Thanks for replying. There is a listening thread and an array of a fixed number of sockets. The listening thread loops through the sockets, checking whether they are in use (each socket is actually in a struct with a bool variable that indicates whether or not it is in use). If a socket isn't in use, it calls accept() and passes a pointer to the struct holding the accepted socket to a new thread, which handles the connection.
accept() doesn't seem to block for very long; doesn't it return as soon as there is data to be read on the port? I wouldn't have used select() at all if it were possible to put a timeout on recv().
Here is the accept() call, where 'client' is the socket:
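If the in-use flag is a plain bool, the listening thread reads it while handler threads write it, which is a data race in C++11 terms. A minimal sketch of the slot array under that assumption, using std::atomic<bool> for the flag and an int socket handle (SOCKET under Winsock); ConnectionSlot and SlotPool are hypothetical names:

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical per-connection slot modelled on the "struct with an in-use
// flag" described above. The socket handle is an int here (SOCKET on Winsock).
struct ConnectionSlot {
    std::atomic<bool> inUse{false};
    int socket = -1;
};

template <std::size_t N>
class SlotPool {
public:
    // Claims a free slot, or returns nullptr if all N are busy.
    // compare_exchange_strong turns "check the flag, then set it" into one
    // atomic step, so the same slot can never be handed out twice.
    ConnectionSlot* acquire() {
        for (auto& slot : slots_) {
            bool expected = false;
            if (slot.inUse.compare_exchange_strong(expected, true)) {
                return &slot;
            }
        }
        return nullptr;
    }

    // Called by the handler thread once its connection has been closed.
    void release(ConnectionSlot* slot) {
        slot->socket = -1;
        slot->inUse.store(false);
    }

private:
    std::array<ConnectionSlot, N> slots_;
};
```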
Then it reads the headers of the HTTP request. It does this by recv()ing 1 byte at a time so it can locate the double CRLF at the end, then checks the Content-Length and allocates memory for the body (i.e. POST or PUT data). Is there any way to read the header faster than 1 byte at a time without reading into the body?
The odd thing is, sometimes when I load the page (Ctrl+F5 to avoid the cached version) only 4 connections are formed when there are 5 files to download (4 images and 1 HTML file). This implies that it is working to some degree.
Could it be that IE just opens lots of connections and doesn't really bother with persistent connections until there are dozens of files to download? Even when I change the settings in Opera to 1 connection per server, it just opens 5 connections, one after the other.
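However the bytes are read, the header block still has to be searched for Content-Length. A minimal sketch, assuming the header block (request line plus headers, ending in \r\n\r\n) is already in a std::string; findHeader and contentLength are hypothetical names. HTTP header field names are case-insensitive, so the match ignores case:

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Case-insensitive ASCII comparison of two header names.
static bool iequals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Looks up a header in a raw block ("request line\r\nName: value\r\n...\r\n\r\n").
// Returns the value with surrounding spaces/tabs trimmed, or nullopt.
std::optional<std::string> findHeader(const std::string& block,
                                      const std::string& name) {
    std::size_t pos = block.find("\r\n");  // end of the request line
    while (pos != std::string::npos) {
        std::size_t start = pos + 2;
        std::size_t end = block.find("\r\n", start);
        std::string line = block.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
        std::size_t colon = line.find(':');
        if (colon != std::string::npos && iequals(line.substr(0, colon), name)) {
            std::size_t b = line.find_first_not_of(" \t", colon + 1);
            if (b == std::string::npos) return std::string();
            std::size_t e = line.find_last_not_of(" \t");
            return line.substr(b, e - b + 1);
        }
        pos = end;
    }
    return std::nullopt;
}

// Body size in bytes from Content-Length; 0 if absent. Malformed values
// also give 0 here for brevity (a real server would answer 400).
std::size_t contentLength(const std::string& block) {
    std::optional<std::string> v = findHeader(block, "Content-Length");
    if (!v || v->empty() || v->find_first_not_of("0123456789") != std::string::npos)
        return 0;
    try {
        return static_cast<std::size_t>(std::stoull(*v));
    } catch (const std::out_of_range&) {
        return 0;
    }
}
```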
Thanks.
MikeAThon
June 20th, 2007, 04:37 PM
accept() doesn't seem to block for very long; doesn't it return as soon as there is data to be read on the port? I wouldn't have used select() at all if it were possible to put a timeout on recv().
I don't understand this comment. The accept() function is used to accept new connections. It is not used to read data; the recv() function is used for this purpose, and recv() can only be called on an already-established connection.
Note that it is possible to put a timeout on the recv() function, using the SO_RCVTIMEO socket option.
However, a more common approach is to create a scavenger thread, whose sole purpose is to periodically go through a list of connected sockets, checking for recent activity. If a socket is found without recent activity, the scavenger thread shuts it down and closes it. This causes any recv() blocked on that socket in another thread to return immediately, with 0 (connection closed) or an error, so that thread can clean up.
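A minimal sketch of SO_RCVTIMEO, shown with POSIX headers so it also covers a later Linux port; the helper names are hypothetical. Under Winsock the option value is a DWORD holding milliseconds rather than a struct timeval, and a timed-out recv() fails with WSAETIMEDOUT rather than EAGAIN/EWOULDBLOCK:

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/time.h>

// Makes a blocking recv() on `fd` give up after `timeoutMs` milliseconds.
// POSIX form: the option value is a struct timeval.
// (Winsock form: a DWORD holding the timeout in milliseconds.)
bool setRecvTimeout(int fd, long timeoutMs) {
    timeval tv;
    tv.tv_sec = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0;
}

// After recv() has returned -1, tells a timeout apart from a real error.
// POSIX reports a receive timeout as EAGAIN or EWOULDBLOCK.
bool lastRecvTimedOut() {
    return errno == EAGAIN || errno == EWOULDBLOCK;
}
```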
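A minimal sketch of the scavenger thread's timing loop, using std::thread and a condition variable so it can be stopped promptly. The Scavenger class and its scan callback are hypothetical; the callback is where the connection list would be walked. On Linux, calling shutdown() on the socket is the more reliable way to make a recv() blocked in another thread return 0, since close() alone may leave that call blocked.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Calls `scan` every `period` until stopped. The scan callback is meant to
// find idle connections and shut them down (hypothetical in this sketch).
class Scavenger {
public:
    Scavenger(std::chrono::milliseconds period, std::function<void()> scan)
        : period_(period), scan_(std::move(scan)), worker_([this] { run(); }) {}

    ~Scavenger() { stop(); }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopping_) {
            // true means stop() was requested; false means the period elapsed.
            if (cv_.wait_for(lock, period_, [this] { return stopping_; })) break;
            lock.unlock();
            scan_();          // run the scan without holding our own lock
            lock.lock();
        }
    }

    std::chrono::milliseconds period_;
    std::function<void()> scan_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so all members above exist first
};
```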
Mike
tszarx
June 20th, 2007, 04:46 PM
Sorry, I didn't mean accept(), I meant select(), which I was using to check the socket for closure or available data.
I hadn't heard about the scavenger thread idea; I will give it a go, though. Perhaps removing select() from the equation will solve the issue. I assume I would implement this by including a variable holding the time of the last activity in the socket's struct, and testing whether it is further in the past than the current time minus the keep-alive time.
Thanks for your help, I will post an update when I have the scavenger thread working.
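The timestamp check described above might look like this. A minimal sketch, assuming std::chrono::steady_clock (which, unlike the wall clock, does not jump when the system time is changed) and an atomic tick count, so the handler thread can write it while the scavenger reads it; ConnectionActivity is a hypothetical name:

```cpp
#include <atomic>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical per-connection bookkeeping: the handler thread calls touch()
// after each successful recv()/send(), and the scavenger calls isStale().
struct ConnectionActivity {
    // Stored as a raw tick count so it can live in a std::atomic.
    std::atomic<Clock::rep> lastActivity{Clock::now().time_since_epoch().count()};

    void touch(Clock::time_point now = Clock::now()) {
        lastActivity.store(now.time_since_epoch().count());
    }

    // Stale if the last activity is further in the past than now - keepAlive.
    bool isStale(Clock::time_point now, Clock::duration keepAlive) const {
        Clock::time_point last{Clock::duration{lastActivity.load()}};
        return last < now - keepAlive;
    }
};
```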
tszarx
June 20th, 2007, 06:22 PM
I have removed the call to select() and noticed something odd. Without select() (and before I put in the scavenger thread), one would assume that the connection would just stay open until either the server or the browser closed it, or the browser timed out the connection. This is exactly what happens with Firefox, but not with IE or Opera. It is as if IE and Opera aren't HTTP/1.1 capable, but I know they are.
Am I missing something? I'm responding with HTTP/1.1 etc. All that happens with the problem browsers is that recv() returns 0 after the first pass through the loop. As the code goes straight back into the recv() call after sending the response, this means that the connection closes as soon as the browser gets the response (HTTP/1.0 behaviour?).
Thanks for your time!
MikeAThon
June 20th, 2007, 07:25 PM
I can't speak for Opera, but in IE, HTTP/1.1 is the default. It can be de-selected down to HTTP/1.0. In addition, under IE, if you connect through a proxy, IE will downgrade to HTTP/1.0.
So, what is IE sending as part of the request? You can tell by looking at the request whether the browser is asking for HTTP/1.1 or 1.0, and whether the request includes a Connection: Close header.
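The check being described can be written as a small function. A sketch based on the HTTP/1.1 specification (RFC 2616): an HTTP/1.1 connection is persistent unless the Connection header contains the token close, while an HTTP/1.0 client only gets a persistent connection when it sends Connection: keep-alive (a widespread extension rather than part of HTTP/1.0 itself). The function names are hypothetical:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Lower-cases ASCII text; Connection tokens are case-insensitive.
static std::string lower(std::string s) {
    for (char& ch : s) ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return s;
}

// True if `token` appears in a comma-separated header value such as
// "keep-alive, Upgrade" (whitespace around each token is ignored).
static bool hasToken(const std::string& value, const std::string& token) {
    std::string v = lower(value), t = lower(token);
    std::size_t start = 0;
    for (;;) {
        std::size_t comma = v.find(',', start);
        std::string item = v.substr(
            start, comma == std::string::npos ? std::string::npos : comma - start);
        std::size_t b = item.find_first_not_of(" \t");
        std::size_t e = item.find_last_not_of(" \t");
        if (b != std::string::npos && item.substr(b, e - b + 1) == t) return true;
        if (comma == std::string::npos) return false;
        start = comma + 1;
    }
}

// Decides whether to keep the connection open after this response.
// `version` is the request-line version, e.g. "HTTP/1.1";
// `connection` is the Connection header value ("" if absent).
bool keepAlive(const std::string& version, const std::string& connection) {
    if (version == "HTTP/1.1")
        return !hasToken(connection, "close");      // persistent by default
    if (version == "HTTP/1.0")
        return hasToken(connection, "keep-alive");  // opt-in only
    return false;                                   // unknown: be conservative
}
```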
Mike
tszarx
June 20th, 2007, 08:01 PM
I've checked the request, and it is definitely HTTP/1.1, and it is sending a Connection: Keep-Alive header. I'm truly stumped. I can't figure out why Firefox would work while the others don't. IE and Opera are just refusing to reuse the connections. Is there some other requirement for persistent connections that either Firefox is ignoring or IE and Opera have added?
MikeAThon
June 20th, 2007, 10:53 PM
In your response, is it possible that you are sending a "Connection: Close" header?
tszarx
June 21st, 2007, 05:10 AM
Thanks for your reply. I've checked through the headers: it doesn't usually send a Connection header, but I tried sending Connection: keep-alive headers and it still wouldn't work. Also, is there any way of reading the headers faster than one byte at a time without reading past the \r\n\r\n and into the body (POST or PUT data)?
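One thing worth checking, since the response code isn't shown: a persistent connection only works if the client can tell where each response body ends. Under HTTP/1.1 that means a Content-Length header (or chunked transfer encoding); without one, the body is delimited by the server closing the connection, so the connection cannot be reused. A minimal sketch of a response builder that always sends Content-Length; buildResponse is a hypothetical name:

```cpp
#include <string>

// Builds an HTTP/1.1 response whose body length is declared up front, so the
// client knows where this response ends and the connection can be reused.
std::string buildResponse(int status, const std::string& reason,
                          const std::string& contentType,
                          const std::string& body, bool keepOpen) {
    std::string r = "HTTP/1.1 " + std::to_string(status) + " " + reason + "\r\n";
    r += "Content-Type: " + contentType + "\r\n";
    r += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    r += keepOpen ? "Connection: keep-alive\r\n" : "Connection: close\r\n";
    r += "\r\n";  // blank line ends the header
    r += body;
    return r;
}
```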
Thanks.
MikeAThon
June 21st, 2007, 03:30 PM
is there any way of reading the headers faster than one byte at a time without reading past the \r\n\r\n and into the body (post or put data)?
Are you calling recv() with a one-byte buffer, repeatedly until a full header is received? If so, then this is highly inefficient and should be changed. The basic principle to adhere to is to recv() as much as is available, so as to transfer received data out of kernel-mode space and into user-mode buffers. Parse from your own buffers.
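A minimal sketch of that principle applied to the header: read in 8 KB pieces, search the accumulated buffer for \r\n\r\n, and keep any bytes past it, because they already belong to the body (or to the next request). The reader is passed in as a callable so the sketch does not depend on Winsock or POSIX; readHeader and the 64 KB header limit are assumptions:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// `readSome` stands in for recv(): it fills up to `len` bytes and returns the
// count, 0 if the peer closed, or a negative value on error.
bool readHeader(const std::function<long(char*, std::size_t)>& readSome,
                std::string& header, std::string& leftover,
                std::size_t maxHeader = 64 * 1024) {
    std::string buf;
    std::size_t scanFrom = 0;  // where the terminator search resumes
    char chunk[8192];
    for (;;) {
        std::size_t end = buf.find("\r\n\r\n", scanFrom);
        if (end != std::string::npos) {
            header = buf.substr(0, end + 4);
            leftover = buf.substr(end + 4);  // start of the body / next request
            return true;
        }
        if (buf.size() > maxHeader) return false;  // refuse oversized headers
        // The terminator may straddle two reads, so re-check the last 3 bytes.
        scanFrom = buf.size() >= 3 ? buf.size() - 3 : 0;
        long n = readSome(chunk, sizeof chunk);
        if (n <= 0) return false;  // closed or error before the header ended
        buf.append(chunk, static_cast<std::size_t>(n));
    }
}
```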
But it's unlikely that this is causing the symptoms you are seeing. It probably is something else.
From my own past experience in writing servers, I know that IE makes multiple persistent connections. You are not seeing two things: you do not see multiple connections, and you do not see persistent connections. I would focus on the persistent part first.
Install a web sniffer like Wireshark, and sniff in on the conversation between the browser and your server. Do the same thing for a conversation between the browser and a working Internet server. Compare the two. That should give you more information on who is closing the connection and why it's being closed.
Mike
tszarx
June 22nd, 2007, 02:12 PM
I'm using Wireshark and a Firefox extension called Live HTTP Headers to check out what's going wrong.
With regard to the issue of reading the headers: if you use a large buffer to collect all the data, how do you allocate memory for the body according to the Content-Length header, e.g. if a large file is being uploaded?
thanks
MikeAThon
June 22nd, 2007, 02:52 PM
I probably don't understand your question, since the answer seems clear. Call recv() with a large buffer, maybe in a loop with a fixed buffer size of (say) 8K, and in each iteration, append the newly-received buffer onto the end of your file. Or, if you know in advance the size that you need, then "new/malloc" it from the heap.
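For the CGI case raised in this thread, the same idea works once the header has been parsed: the body is copied in 8 KB pieces straight to its destination, and Content-Length only says how many bytes to copy. A minimal sketch; copyBody, readSome and writeAll are hypothetical, with readSome standing in for recv() and writeAll for writing to a file or to the CGI program's stdin pipe:

```cpp
#include <cstddef>
#include <functional>
#include <string>

using ReadFn  = std::function<long(char*, std::size_t)>;        // like recv()
using WriteFn = std::function<bool(const char*, std::size_t)>;  // file or CGI stdin

// Copies exactly `contentLength` body bytes to `writeAll` in pieces of at most
// 8 KB, so the body never has to be held in memory. `leftover` holds bytes that
// arrived together with the header; on return it keeps only bytes beyond this
// body (the start of the next request).
bool copyBody(const ReadFn& readSome, const WriteFn& writeAll,
              std::string& leftover, std::size_t contentLength) {
    std::size_t fromLeftover =
        leftover.size() < contentLength ? leftover.size() : contentLength;
    if (fromLeftover > 0 && !writeAll(leftover.data(), fromLeftover)) return false;
    leftover.erase(0, fromLeftover);

    std::size_t remaining = contentLength - fromLeftover;
    char chunk[8192];
    while (remaining > 0) {
        // Never request more than the body still needs, so bytes of a
        // following request on a persistent connection are left unread.
        std::size_t want = remaining < sizeof chunk ? remaining : sizeof chunk;
        long n = readSome(chunk, want);
        if (n <= 0) return false;  // peer closed or error mid-body
        if (!writeAll(chunk, static_cast<std::size_t>(n))) return false;
        remaining -= static_cast<std::size_t>(n);
    }
    return true;
}
```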
Mike
tszarx
June 23rd, 2007, 08:24 PM
Isn't the Content-Length header there so that you can dynamically allocate a buffer for the body content? What you suggested would work if I were saving the file straight away, but I'm using CGI in the server, so I have to separate the POST data and send it to the stdin of a CGI program. Doesn't this mean you need the header before you get the body?
MikeAThon
June 24th, 2007, 02:13 AM
Isn't the content-length header there so that you can dynamically allocate a buffer for the body content?
No, of course not. It's got nothing to do with sockets, which is one layer below HTTP. It's purely for purposes of HTTP. The content-length is there so that (with persistent connections) you can tell when the message body is finished, such that anything received afterwards must be a new request.
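To illustrate that point: on a persistent connection the bytes of consecutive requests arrive back to back, and Content-Length is what marks the boundary between them. A minimal sketch that cuts one complete request off the front of a receive buffer; nextRequest and lengthOf are hypothetical, and chunked request bodies are not handled:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Content-Length from a header block, 0 if absent. Simplified for the
// sketch: digits only, no overflow check.
static std::size_t lengthOf(const std::string& header) {
    std::string lower;
    for (char ch : header)
        lower += static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    const std::string key = "\r\ncontent-length:";
    std::size_t p = lower.find(key);
    if (p == std::string::npos) return 0;
    p += key.size();
    while (p < lower.size() && (lower[p] == ' ' || lower[p] == '\t')) ++p;
    std::size_t n = 0;
    while (p < lower.size() && std::isdigit(static_cast<unsigned char>(lower[p])))
        n = n * 10 + static_cast<std::size_t>(lower[p++] - '0');
    return n;
}

// Cuts one complete request (header plus Content-Length body) off the front
// of `stream`. Returns false, leaving `stream` unchanged, if more bytes are
// needed; the caller then recv()s again and retries.
bool nextRequest(std::string& stream, std::string& request) {
    std::size_t headerEnd = stream.find("\r\n\r\n");
    if (headerEnd == std::string::npos) return false;
    std::size_t bodyStart = headerEnd + 4;
    std::size_t total = bodyStart + lengthOf(stream.substr(0, bodyStart));
    if (stream.size() < total) return false;
    request = stream.substr(0, total);
    stream.erase(0, total);  // whatever remains is the next request
    return true;
}
```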
Mike
codeguru.com
Copyright Internet.com Inc., All Rights Reserved.