2f53384424
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096) The call to tcp_push() at the end of do_tcp_sendpages() forces an immediate xmit when pipe is not already filled, and tso_fragment() try to split these skb to MSS multiples. 4096 bytes are usually split in a skb with 2 MSS, and a remaining sub-mss skb (assuming MTU=1500) This makes slow start suboptimal because many small frames are sent to qdisc/driver layers instead of big ones (constrained by cwnd and packets in flight of course) In fact, applications using sendmsg() (adding an additional memory copy) instead of vmsplice()/splice()/sendfile() are a bit faster because of this anomaly, especially if serving small files in environments with large initial [c]wnd. Call tcp_push() only if MSG_MORE is not set in the flags parameter. This bit is automatically provided by splice() internals but for the last page, or on all pages if user specified SPLICE_F_MORE splice() flag. In some workloads, this can reduce number of sent logical packets by an order of magnitude, making zero-copy TCP actually faster than one-copy :) Reported-by: Tom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
---|---|---|
.. | ||
netfilter | ||
af_inet.c | ||
ah4.c | ||
arp.c | ||
cipso_ipv4.c | ||
datagram.c | ||
devinet.c | ||
esp4.c | ||
fib_frontend.c | ||
fib_lookup.h | ||
fib_rules.c | ||
fib_semantics.c | ||
fib_trie.c | ||
gre.c | ||
icmp.c | ||
igmp.c | ||
inet_connection_sock.c | ||
inet_diag.c | ||
inet_fragment.c | ||
inet_hashtables.c | ||
inet_lro.c | ||
inet_timewait_sock.c | ||
inetpeer.c | ||
ip_forward.c | ||
ip_fragment.c | ||
ip_gre.c | ||
ip_input.c | ||
ip_options.c | ||
ip_output.c | ||
ip_sockglue.c | ||
ipcomp.c | ||
ipconfig.c | ||
ipip.c | ||
ipmr.c | ||
Kconfig | ||
Makefile | ||
netfilter.c | ||
ping.c | ||
proc.c | ||
protocol.c | ||
raw.c | ||
route.c | ||
syncookies.c | ||
sysctl_net_ipv4.c | ||
tcp_bic.c | ||
tcp_cong.c | ||
tcp_cubic.c | ||
tcp_diag.c | ||
tcp_highspeed.c | ||
tcp_htcp.c | ||
tcp_hybla.c | ||
tcp_illinois.c | ||
tcp_input.c | ||
tcp_ipv4.c | ||
tcp_lp.c | ||
tcp_memcontrol.c | ||
tcp_minisocks.c | ||
tcp_output.c | ||
tcp_probe.c | ||
tcp_scalable.c | ||
tcp_timer.c | ||
tcp_vegas.c | ||
tcp_vegas.h | ||
tcp_veno.c | ||
tcp_westwood.c | ||
tcp_yeah.c | ||
tcp.c | ||
tunnel4.c | ||
udp_diag.c | ||
udp_impl.h | ||
udp.c | ||
udplite.c | ||
xfrm4_input.c | ||
xfrm4_mode_beet.c | ||
xfrm4_mode_transport.c | ||
xfrm4_mode_tunnel.c | ||
xfrm4_output.c | ||
xfrm4_policy.c | ||
xfrm4_state.c | ||
xfrm4_tunnel.c |