==============[ ACKs in the presence of packet loss ]==============

As you read this, keep peeking at the normal-trace.txt. After you've read
this, look through the nc-trace.pcap, and find the same events in it (it's
a trace from a different run of netcat, with different TCP segments lost).

---[ 1. Dealing with lost segments.

The server will send TCP segments to you in order, each starting exactly
where the previous one left off. For segments of size 512 bytes, the first
segment will show as 1:513 (length 512), the second as 513:1025, and so on.
Your ACK for the first one will have the relative ack number of 513, your
ACK for the second the relative number of 1025, and so on:

3-way handshake:

16:06:25.133196 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [S], seq 308271051, win 65535, length 0
16:06:25.133389 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [S.], seq 22222925, ack 308271052, win 65535, options [mss 1460], length 0
16:06:25.134223 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 1, win 65535, options [mss 1460], length 0

Netcat starts sending segments:

16:06:25.134339 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], seq 1:513, ack 1, win 65535, length 512
16:06:25.134354 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [P.], seq 513:1025, ack 1, win 65535, length 512
16:06:25.134370 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], seq 1025:1537, ack 1, win 65535, length 512
16:06:25.134375 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [P.], seq 1537:2049, ack 1, win 65535, length 512
16:06:25.134386 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], seq 2049:2561, ack 1, win 65535, length 512
16:06:25.134391 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [P.], seq 2561:3073, ack 1, win 65535, length 512
16:06:25.134404 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], seq 3073:3585, ack 1, win 65535, length 512
16:06:25.134409 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [P.], seq 3585:4097, ack 1, win 65535, length 512

First acknowledgment, for the first segment:

16:06:25.138634 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 513, win 65535, length 0
16:06:25.138678 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], seq 4097:4609, ack 1, win 65535, length 512
16:06:25.138685 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], seq 4609:5121, ack 1, win 65535, length 512

ACK for the 2nd segment:

16:06:25.140737 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 1025, win 65535, length 0

As soon as you ACK a segment (and assuming your ACK is not lost), the
server will not attempt to resend it. This is important: if you
accidentally ACK a segment that you don't have, you will never get it, and
your file transfer will fail.

Assume that you received the segments

[1:513] [513:1025] [1025:1537] x [2049:2561] [2561:3073]

whereas the segment [1537:2049] got lost. Then you can (and should) only
ACK 1537. Unless you negotiated using SACKs during the 3-way handshake, you
cannot really signal the fact that you got any bytes beyond 1537. No matter
what the server sends, so long as it's not the missing segment, you can
only respond with ACK 1537; i.e., your ACK number must stop at the first
gap in the run of bytes in the received segments.

Eventually, though, the missing segment's RTO timer will expire, and the
server will resend [1537:2049]. At that point you could ACK just 2049, but
you can (and should) ACK 3073 instead, because you now have the full run up
to 3073. That will avoid the server having to retransmit [2049:2561] and
[2561:3073] needlessly.

---[ 2. Gracefully terminating the connection.

The server will transmit all the segments to you, and will conclude with
the packet that has the FIN bit set.
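The stop-at-the-first-gap rule from section 1 is also what decides when
that FIN may be acknowledged, so here is a minimal Python sketch of it. The
function name cumulative_ack and the (first, end) tuples are illustrative
assumptions; they simply mirror tcpdump's relative "seq first:end"
notation, where end is one past the last byte of the segment.

```python
def cumulative_ack(segments, start=1):
    """Return the relative ACK number for a set of received segments.

    segments: iterable of (first, end) pairs, e.g. (1, 513) for "seq 1:513".
    start:    relative sequence number of the first data byte.
    """
    ack = start
    for first, end in sorted(segments):
        if first > ack:
            # Gap: bytes ack..first-1 are missing, so the ACK stops here.
            break
        # Contiguous (or duplicate/overlapping) data extends the run.
        ack = max(ack, end)
    return ack


# The example from section 1: [1537:2049] was lost.
received = [(1, 513), (513, 1025), (1025, 1537), (2049, 2561), (2561, 3073)]
print(cumulative_ack(received))                    # 1537

# Once the retransmitted [1537:2049] arrives, the whole run is covered.
print(cumulative_ack(received + [(1537, 2049)]))   # 3073
```

Sorting on every call is wasteful for a real receiver, but it keeps the
rule itself easy to see.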
That final packet will also likely have ACK and PUSH flags, thus showing as
[FP.]:

16:06:25.346743 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], seq 99329:99841, ack 1, win 65535, length 512
16:06:25.346747 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [FP.], seq 99841:99938, ack 1, win 65535, length 97

This FIN means that the server has no more data to send, and that there
will be no bytes from its side after 99937 (that is, there will be no
99938th). Recall that the file size is exactly 99937.

You _cannot_ ACK that last packet if there are any gaps in your received
segments! Instead, you can only ACK the stream as far as you have it
unbroken:

16:06:25.348460 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 48641, win 65535, length 0

Since you did not ACK the FIN, the server will keep resending it again and
again:

16:06:25.348484 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [F.], seq 99938, ack 1, win 65535, length 0
16:06:25.350489 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 49153, win 65535, length 0
16:06:25.350510 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [F.], seq 99938, ack 1, win 65535, length 0
16:06:25.352381 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 49665, win 65535, length 0
16:06:25.352405 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [F.], seq 99938, ack 1, win 65535, length 0

It's a bit painful to watch (and this is why SACKs were invented), but
there's nothing you can do. If you ACK the FIN at 99938 (by sending ack
99939, because ACK-ing a FIN requires adding 1, just like with a SYN), you
will not get the lost segments before it that you need to complete the
file.

So you need to ACK the segments you have, and wait until you have them all,
and only _then_ ACK 99939:

16:06:25.602389 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [F.], seq 99938, ack 1, win 65535, length 0
16:06:25.603764 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 97793, win 65535, length 0
16:06:25.603788 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [F.], seq 99938, ack 1, win 65535, length 0
16:06:25.614751 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 97793, win 65535, length 0
16:06:25.615225 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 97793, win 65535, length 0
16:06:26.009307 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], seq 97793:98305, ack 1, win 65535, length 512
16:06:26.010446 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 98817, win 65535, length 0
16:06:26.010498 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], seq 98817:99329, ack 1, win 65535, length 512
16:06:26.010508 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], seq 99329:99841, ack 1, win 65535, length 512
16:06:26.011847 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 99841, win 65535, length 0

(You may or may not see the last segment retransmitted in other runs. You
see it here):

16:06:26.011882 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [FP.], seq 99841:99938, ack 1, win 65535, length 97
16:06:26.012848 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 99841, win 65535, length 0

This will go on for a while, with ACKs for the previous segments that made
it from your side. Finally, after many, many repetitions of this
[F.], seq 99938 and server retransmissions of un-ACKed missing segments,
you can ACK the end of the stream.
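The receiver-side bookkeeping for "ACK what you have, and ACK the FIN only
when nothing is missing" might look like the Python sketch below. The
Receiver class and its method are hypothetical names, not part of any
library. It assumes retransmitted segments arrive with the same boundaries
as the originals (true in these traces) and does not handle partially
overlapping segments. A bare FIN retransmission, which tcpdump prints as
"seq 99938, length 0", is modeled as first == end.

```python
class Receiver:
    """Tracks received segments and returns the ACK number to send."""

    def __init__(self, start=1):
        self.next_expected = start  # first byte we do not have yet
        self.pending = {}           # out-of-order segments: first -> end
        self.fin_seq = None         # sequence number occupied by the FIN

    def on_segment(self, first, end, fin=False):
        if fin:
            # The FIN sits right after the last data byte of its segment.
            self.fin_seq = end
        if end > first and first >= self.next_expected:
            self.pending[first] = end
        # Advance over every stored segment that continues the unbroken run.
        while self.next_expected in self.pending:
            self.next_expected = self.pending.pop(self.next_expected)
        if self.fin_seq is not None and self.next_expected == self.fin_seq:
            # No gaps left before the FIN: ACK it, which consumes one number.
            return self.fin_seq + 1
        return self.next_expected


# Tail end of a transfer: everything before 97793 has already arrived.
r = Receiver(start=97793)
acks = [
    r.on_segment(98305, 98817),            # out of order, held back
    r.on_segment(99841, 99938, fin=True),  # [FP.] arrives, gaps remain
    r.on_segment(99938, 99938, fin=True),  # bare [F.] retransmission
    r.on_segment(97793, 98305),            # gap filled, run extends
    r.on_segment(98817, 99329),
    r.on_segment(99329, 99841),            # last gap filled, FIN ACKed
]
print(acks)  # [97793, 97793, 97793, 98817, 99329, 99939]
```

The jump from 97793 straight to 98817 is the same one visible in the trace
above: one retransmitted segment can release data that was already held.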
At this point the server knows it's done (if your ACK doesn't get lost):

16:06:26.014135 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [.], ack 99939, win 65535, length 0
16:06:26.014166 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], ack 1, win 65535, length 0

Now you need to close the connection from your side by sending a FIN:

16:06:26.014346 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [F.], seq 1, ack 99939, win 65534, length 0
16:06:26.014375 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], ack 2, win 65535, length 0

---[ 3. TIME_WAIT weirdness.

Upon getting your FIN, and ACK-ing it (the last two packets above), the
server will enter the TIME_WAIT state of the TCP state machine. In that
state, it may behave in a way you might not expect (I did not). From my
capture when I tried to rerun my tcp-connect:

16:06:35.388217 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [S], seq 182890984, win 65535, length 0
16:06:35.388267 IP 192.168.56.1.8080 > 192.168.56.100.31337: Flags [.], ack 2, win 65535, length 0

That was not a SYN+ACK, but just an ACK, as if the connection were still
open!

16:06:35.390197 IP 192.168.56.100.31337 > 192.168.56.1.8080: Flags [R.], seq 2, ack 99940, win 65535, length 0

What gives? I closed the connection correctly, and it should go away, not
send me an ACK as if it were still there!

This behavior is explained here:
https://serverfault.com/questions/297134/server-not-sending-a-syn-ack-packet-in-response-to-a-syn-packet

-----------[ begin quote ]-----------
[..] when a socket is on TIME_WAIT and a new SYN appears (for the same pair
of ip/port src, ip/port dest), the kernel checks if the SEQ number of the
SYN is < or > than the last SEQ received for this socket.

(PS: in the image of the wireshark output attached to this issue, seq
number are shown as relative; if you don't set them as absolute you can't
see the issue. The capture would have to show the old session also to be
able to compare SEQ numbers)

-- If the SEQ number of the SYN is > than the SEQ number of the previous
   packet, a new connection is created and everything works
-- If the SEQ number of the SYN is < than the SEQ number of the previous
   packet, the kernel will send an ACK related to the previous socket
   because the kernel think that the SYN received is a delayed packet of
   the previous socket.

The behavior is like that because at the beginning of TCP the SEQ number
generated by computers where incremental, it was almost impossible to
receive a SEQ number < than the SEQ number of a previous socket still in
TIME_WAIT. The increase of bandwidth of computers make this from almost
impossible to rare. But the most important things here is that now most
system use random ISN (initial SEQ number) to improve security. So nothing
prevent the SEQ number a of new socket to be > than the SEQ number of a
previous one.
-----------[ end quote ]-----------

More here: starting at slide 19,
http://www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf

-----------[ begin quote ]-----------
-- During a normal TCP socket close, the side of the connection that
   starts to close the connection will enter the time wait state for two
   minutes (RFC 793)
-- The purpose of the time wait state is to ignore any old (or duplicate)
   packets still in the network
-- BSD-derived TCP/IP stacks will recycle a TIME_WAIT socket only if the
   ISN in the SYN packet is greater than the sequence number at the end of
   the previous connection
-----------[ end quote ]-----------

The upshot is, unless you choose a larger initial SEQ number every time,
waiting out the TIME_WAIT timeout is your best bet.
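To see why the rerun above got a bare ACK, the check described in the two
quotes can be sketched in Python. This is an illustration of the quoted
rule, not code from any kernel: the function names are made up, and the
value compared against (old ISN + 2, counting the SYN and the FIN, since
the client sent no data) is my reading of "the sequence number at the end
of the previous connection". Sequence numbers are 32-bit and wrap around,
so the comparison is done modulo 2^32.

```python
MASK = 0xFFFFFFFF  # TCP sequence numbers are 32 bits wide


def seq_gt(a, b):
    """True if sequence number a comes after b, allowing for wraparound."""
    return a != b and ((a - b) & MASK) < 0x80000000


def time_wait_reply(syn_seq, last_seq):
    """What a socket in TIME_WAIT does with a new SYN, per the quoted rule."""
    if seq_gt(syn_seq, last_seq):
        return "SYN+ACK"  # socket is recycled, new connection starts
    return "ACK"          # SYN is taken for a stale packet of the old one


# Absolute numbers from the traces above.
old_isn = 308271051                # client ISN of the first connection
last_seq = (old_isn + 2) & MASK    # its SYN and FIN each used one number
new_isn = 182890984                # client ISN of the rerun

print(time_wait_reply(new_isn, last_seq))       # ACK
print(time_wait_reply(last_seq + 1, last_seq))  # SYN+ACK
```

So a client that wants to reconnect right away could pick its next ISN
just above where the previous connection ended, instead of at random.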
More about TIME_WAIT and the detailed rationale for this behavior:
https://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux

On Linux you can play with system settings:
https://serverfault.com/questions/303652/time-wait-connections-not-being-cleaned-up-after-timeout-period-expires