Some less widely known details of TCP connections.

Properly closing the connection.

After this code sequence:

    sock = socket(AF_INET, SOCK_STREAM, 0);
    connect(sock, (struct sockaddr *)&remote, sizeof(remote));
    write(sock, buffer, 1000000);

the large block of data is merely queued in the kernel. It cannot all be
sent at once, so some of it may still be unsent when write() returns.
What will happen if we close the socket now?

To quote RFC 1122, section 4.2.2.13 (with CLOSE written as the close() call):

"A host MAY implement a 'half-duplex' TCP close sequence, so that
an application that has called close() cannot continue to read
data from the connection. If such a host issues a close() call
while received data is still pending in TCP, or if new data is
received after close() is called, its TCP SHOULD send a RST
to show that data was lost."

In other words: if we simply close(sock) now, the kernel may reset
the TCP connection (send an RST packet).

This is problematic for two reasons: it discards data that has not
been sent yet, and the peer may see it as an error rather than EOF.

What can be done about it?

Solution #1: block until sending is done:

    /* When enabled, a close(2) or shutdown(2) will not return until
     * all queued messages for the socket have been successfully sent
     * or the linger timeout has been reached.
     */
    struct linger linger;       /* declared in <sys/socket.h> */
    linger.l_onoff = 1;         /* linger active */
    linger.l_linger = SOME_NUM; /* how many seconds to linger for */
    setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
    close(sock);

Solution #2: tell the kernel that you are done sending.
This makes the kernel send FIN after all data has been written:

    shutdown(sock, SHUT_WR);
    close(sock);

However, experiments on Linux 3.9.4 show that the kernel can return from
shutdown() and from close() before all data is sent, and if the peer
sends any data to us after this, the kernel still responds with RST
before all our data is sent.

In practice, the protocol in use often does not allow the peer to send
such data to us, in which case this solution is acceptable.

Solution #3: if you know that the peer is going to close its end after it
sees our FIN (as EOF), it might be a good idea to perform a read after
shutdown(). When the read returns 0, we conclude that the peer received
all the data, saw EOF, and closed its end.

However, this incurs a small performance penalty (we run for a longer time)
and requires safeguards (nonblocking reads, timeouts, etc.) against
malicious peers that never close the connection.
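
A minimal sketch of Solution #3 with the safeguards mentioned above,
assuming a connected stream socket and an overall deadline enforced with
poll(). The helper names now_ms() and shutdown_and_wait_eof(), and the
0/1/-1 return convention, are made up for this illustration and are not
part of any library:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds on a monotonic clock (hypothetical helper). */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: send FIN, then wait up to timeout_ms for the
 * peer's EOF. Returns 0 on EOF, 1 on timeout, -1 on error (see errno).
 * The caller still has to close(fd) afterwards.
 */
static int shutdown_and_wait_eof(int fd, int timeout_ms)
{
    char discard[4096];
    long long deadline;

    assert(fd >= 0 && timeout_ms >= 0);
    deadline = now_ms() + timeout_ms;

    if (shutdown(fd, SHUT_WR) < 0)
        return -1;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        long long left = deadline - now_ms();
        ssize_t n;
        int rc;

        if (left <= 0)
            return 1;           /* peer never closed its end in time */
        rc = poll(&pfd, 1, (int)left);
        if (rc < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (rc == 0)
            return 1;           /* poll() timed out */
        n = read(fd, discard, sizeof(discard));
        if (n == 0)
            return 0;           /* EOF: peer closed its end */
        if (n < 0 && errno != EINTR && errno != EAGAIN)
            return -1;
        /* n > 0: late data from the peer, discard it and keep waiting */
    }
}
```

A caller might use shutdown_and_wait_eof(sock, 5000) and then close(sock)
whatever the result. The deadline covers the whole wait, so a peer that
trickles data cannot extend it; on a timeout there is no guarantee that
the peer saw all the data.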

Solutions #1 and #2 can be combined:

    /* ...set up struct linger... then: */
    setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
    shutdown(sock, SHUT_WR);
    /* At this point, kernel sent FIN packet, not RST, to the peer, */
    /* even if there is buffered read data from the peer. */
    close(sock);

Defeating Nagle.

Method #1: manually control whether partial sends are allowed
(TCP_CORK is Linux-specific).

This prevents partially filled packets from being sent:

    int state = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));

and this forces the last, partially filled packet (if any) to be sent:

    state = 0;
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));
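
The two calls above are typically used as a pair around a burst of small
writes. A minimal sketch, assuming Linux and a connected TCP socket; the
helper names set_cork(), write_all() and send_corked() are hypothetical
and exist only for this illustration:

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set (1) or clear (0) TCP_CORK. Returns 0 on success, -1 on error. */
static int set_cork(int fd, int on)
{
    assert(on == 0 || on == 1);
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send a header and a body without a partial packet in between. */
static int send_corked(int fd, const char *hdr, size_t hdr_len,
                       const char *body, size_t body_len)
{
    int rc = -1;

    if (set_cork(fd, 1) < 0)
        return -1;
    if (write_all(fd, hdr, hdr_len) == 0 &&
        write_all(fd, body, body_len) == 0)
        rc = 0;
    /* Always uncork, even after a failed write, so nothing stays held back. */
    if (set_cork(fd, 0) < 0)
        rc = -1;
    return rc;
}
```

Uncorking on every exit path matters: a socket left corked keeps its last
partial packet queued instead of sending it promptly.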

Method #2: make every write send its data immediately, even if the
packet is only partially filled:

    int state = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &state, sizeof(state));
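
Since setsockopt() can fail, it may be worth checking the result and
reading the option back. A small sketch; set_nodelay() and get_nodelay()
are hypothetical helper names used only here:

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn TCP_NODELAY on (1) or off (0). Returns 0 on success, -1 on error. */
static int set_nodelay(int fd, int on)
{
    assert(on == 0 || on == 1);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Returns 1 if TCP_NODELAY is set, 0 if it is clear, -1 on error. */
static int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

The option can be set right after socket(), before connect(), so that it
is already in effect for the first write.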