POLLHUP polling: Difference between revisions

Revision as of 23:49, 27 October 2022

This page is a work in progress

Recently I've been trying to listen on multiple sockets at once for events like reading or writing. This has worked fine, but when I tried to listen for a socket close event I found the documentation poor and often contradictory. This page is my attempt to figure this topic out.

POSIX

POSIX specifies two functions for waiting on multiple file descriptors:

select, which originates from the BSD sockets API

poll, which originates from the System V STREAMS API

select doesn't specify anything to do with sockets being closed, so that's not too useful.

poll initially looks like it's better but it leaves a lot of questions after reading:

Does POLLHUP apply to sockets?
Can the events field be 0 and still get POLLHUP events?
Under what conditions do we even get return events?

I highly recommend trying to read the poll standard and sussing out these details.

As far as I can tell nobody has brought up any of these questions during the standards process. At least in these sources:

The only talk of this issue I've found online is the 2001 email "poll() and events==0" which has no clear answers.

Existing lore

Richard Kettlewell's poll() and EOF has test results for the case of setting POLLIN and closing a socket with no data to read. This is the most common case of POLLHUP use I see online.

The test results give us some interesting data:

The first is that POLLIN is often set if a socket is closed. If it is set alone, a POLLIN poll that returns no other events may still mean you can't read from the socket.
The second is that POLLHUP is often not set on older systems. This means programs polling POLLOUT may miss socket closures altogether. This is one reason why systems are moving to set POLLHUP instead of just POLLIN.
The third is that neither flag is guaranteed to be set. Cygwin and very early Linux systems only set POLLHUP. This seems like a bug to me, especially as most online advice recommends watching POLLIN for socket closures when writing portable programs.

This data doesn't check what poll returns if there's still data in the socket to read but the connection has closed. That case may have entirely different results.
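The portable advice mentioned above (watch POLLIN, then check for a zero-length read) can be sketched roughly as follows. This is only a minimal sketch: peer_has_closed is a hypothetical helper name, not something from the test results, and it assumes a stream socket where recv with MSG_PEEK returns 0 once the peer has closed and nothing is left to read. Leftover data is treated as "not closed yet".

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if the peer has closed and no data is left,
 * 0 if the socket is still open, has data pending, or the poll timed out,
 * and -1 on error. */
int peer_has_closed(int fd, int timeout_ms) {
  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;

  int rc = poll(&pfd, 1, timeout_ms);
  if (rc == -1) return -1;
  if (rc == 0) return 0;                  /* timed out, nothing reported */
  if (pfd.revents & POLLNVAL) return -1;  /* fd is not an open descriptor */

  if (pfd.revents & (POLLIN | POLLHUP)) {
    /* Peek so that any pending data stays in the socket for the caller. */
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK);
    if (n == 0) return 1;   /* zero-length read: the peer closed */
    if (n == -1) return -1; /* read error */
    return 0;               /* data is still waiting to be read */
  }
  return (pfd.revents & POLLERR) ? -1 : 0;
}
```

Accepting either POLLIN or POLLHUP before peeking is a deliberate choice here, since the results above suggest different systems report a closure with one, the other, or both.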

Another issue is what to return if a socket was never connected in the first place?

AIX returned POLLOUT in 2019

FreeBSD returned POLLHUP in 2011
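A quick way to see what a given system does with a never-connected socket might look like the sketch below. The helper name revents_never_connected and its parameters are my own invention for illustration. It makes no claim about which flags should come back, it only reports them.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a stream socket in `domain`, never connects
 * it, and returns the revents reported by a non-blocking poll for `events`
 * (0 if poll reports nothing), or -1 on failure. */
int revents_never_connected(int domain, short events) {
  int fd = socket(domain, SOCK_STREAM, 0);
  if (fd == -1) return -1;

  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = events;
  pfd.revents = 0;

  int rc = poll(&pfd, 1, 0); /* timeout 0: report the current state only */
  close(fd);
  if (rc == -1) return -1;
  return rc == 0 ? 0 : pfd.revents;
}
```

Calling it as revents_never_connected(AF_INET, POLLOUT) and again with events set to 0 on each system of interest would give comparable results for TCP sockets.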

Test code

With that in mind, I wrote this quick program that polls for POLLHUP and nothing else:

/* compile and run with gcc test.c -oTEST && ./TEST */

#include <stdio.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
  /* create a socket pair, one end for parent, one end for child */
  int sockets[2];
  int rc;
  rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sockets);
  if(rc == -1) { printf("socketpair err\n"); return 1; }
  /* fork into a parent and a child */
  int proc = fork();
  if(proc == -1) { printf("fork err\n"); return 1; }
  if(proc != 0) {
    /* parent: poll for POLLHUP */
    rc = close(sockets[1]); /* close child end to avoid holding open */
    if(rc == -1) { printf("close err 1 \n"); return 1; }
    struct pollfd fd;
    fd.fd = sockets[0]; /* watch parent end */
    fd.events = 0;
    fd.revents = 0;
    rc = poll(&fd, 1, 10000);
    if(rc < 1) { printf("poll err or timeout\n"); return 1; } /* rc == 0 means the 10 second timeout expired */
    else if(!(fd.revents & POLLHUP)) { printf("no POLLHUP?\n"); return 1; }
    printf("got POLLHUP, all good\n");
    return 0;
  } else {
    /* child: close child socket end and quit */
    rc = close(sockets[1]); /* close child end */
    if(rc == -1) { printf("close err 3\n"); return 1; }
    return 0;
  }
}

It's a bit nasty but it gets the job done.

TODO: test for pre-connect

TODO: don't fork

TODO: write to buffer before closing
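The last two TODOs could probably be covered without forking, since both ends of a socket pair can live in one process. The following is a rough sketch under that assumption. revents_after_write_and_close is a hypothetical name, and the function only reports what poll returns for POLLIN when data is still queued and the other end has closed.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes `len` bytes into one end of a socket pair,
 * closes that end, then polls the other end for POLLIN. Returns the revents
 * reported (0 if poll timed out), or -1 on failure. */
int revents_after_write_and_close(const char *data, size_t len) {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return -1;

  /* queue some data, then close the writing end while it is still unread */
  if (len > 0 && write(sv[1], data, len) != (ssize_t)len) {
    close(sv[0]);
    close(sv[1]);
    return -1;
  }
  close(sv[1]);

  struct pollfd pfd;
  pfd.fd = sv[0];
  pfd.events = POLLIN;
  pfd.revents = 0;

  int rc = poll(&pfd, 1, 1000); /* wait up to one second */
  close(sv[0]);
  if (rc == -1) return -1;
  return rc == 0 ? 0 : pfd.revents;
}
```

Passing a length of 0 skips the write, which reproduces the "closed with no data to read" case for comparison.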

Linux

https://man7.org/linux/man-pages/man2/poll.2.html

"The field fd contains a file descriptor for an open file."

"The field events is an input parameter, a bit mask specifying the events the application is interested in for the file descriptor fd. This field may be specified as zero, in which case the only events that can be returned in revents are POLLHUP, POLLERR, and POLLNVAL (see below)."

"POLLHUP - Hang up (only returned in revents; ignored in events). Note that when reading from a channel such as a pipe or a stream socket, this event merely indicates that the peer closed its end of the channel."

"If none of the events requested (and no error) has occurred for any of the file descriptors, then poll() blocks until one of the events occurs."

POLLRDHUP can be ignored

https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=967631f2c84a8f2adae1772334fa77be40f18131

Linux's poll implementation

when did poll get added? how does it work?

Unix

- sysv

- sunos

BSDs

- netbsd

- freebsd

- openbsd

- illumos

- macos

Windows

wsapoll

Emulation

cygwin

minix

my dosbox code :(

@@ Line 29: / Line 29: @@
 The only talk of this issue I've found online is the 2001 email "[https://groups.google.com/g/comp.unix.programmer/c/bNNadBIEpTo/m/G5gs1mqNhbIJ poll() and events==0]" which has no clear answers.
-== General POLLHUP lore ==
+== Existing lore ==
+Richard Kettlewell's [https://www.greenend.org.uk/rjk/tech/poll.html poll() and EOF] has test results for the case of setting POLLIN and closing a socket with no data to read. This is the most common case of POLLHUP use I see online.
-Richard Kettlewell's [https://www.greenend.org.uk/rjk/tech/poll.html poll() and EOF] page has some interesting
+The test results give us some interesting data:
-- events = POLLIN
+* The first is that POLLIN is often set if a socket is closed. If it is set alone, a POLLIN poll that returns no other events may still mean you can't read from the socket.
+* The second is that POLLHUP is often not set on older systems. This means programs polling POLLOUT may miss socket closures altogether. This is one reason why systems are moving to set POLLHUP instead of just POLLIN.
+* The third is that neither flag is guaranteed to be set. Cygwin and very early Linux systems only set POLLHUP. This seems like a bug to me, especially as most online advice recommends watching POLLIN for socket closures when writing portable programs.
-- doesn't write to the socket first
+This data doesn't check what poll returns if there's still data in the socket to read but the connection has closed. That case may have entirely different results.
+Another issue is what to return if a socket was never connected in the first place?
-- may indicate with POLLIN
+* [https://jira.mongodb.org/browse/CDRIVER-2996 AIX returned POLLOUT in 2019]
+* [https://freebsd-net.freebsd.narkive.com/zJxZYQdq/pollhup-on-never-connected-socket FreeBSD returned POLLHUP in 2011]
-- may indicate with POLLHUP
-- may indicate with BOTH
-- On socket disconnection, POLLIN set in poll()->revents and recv() returning 0 is the only portable and reliable method.
-https://www.illumos.org/issues/4627 not generating POLLHUP dangers
-https://github.com/illumos/illumos-gate/commit/68846fd00135fb0b10944e7806025cbefcfd6546
-- stuff to read still in socket
-- may indicate with POLLOUT if the socket has failed to connect in the first place
-<nowiki>https://lkml.iu.edu/hypermail/linux/kernel/0605.1/1420.html</nowiki>
-https://jira.mongodb.org/browse/CDRIVER-2996 POLLOUT<nowiki/>https://lists.freebsd.org/pipermail/freebsd-net/2011-September/029712.html
-freebsd different ideas
-POLLIN but no data, only POLLHUP <nowiki>https://sourceware.org/bugzilla/show_bug.cgi?id=13660</nowiki>
-<nowiki>https://developer.illumos.narkive.com/dUe1v0Ya/poll-not-returning-pollhup-for-tcp-sockets-closed-by-the-other-end</nowiki> "ie you get POLLIN with a zero length read(). SO_KEEPALIVE won't be involved. Now if you down the network interface you should eventually get the POLLHUP."
-<nowiki>https://bugs.dragonflybsd.org/issues/3268</nowiki> pipes don't do POLLHUP BUG
 == Test code ==