net/mlx4: use one page fragment per incoming frame

The mlx4 driver has a suboptimal memory allocation strategy for regular
MTU=1500 frames, as it uses two page fragments: one of 512 bytes and one
of 1024 bytes.

This makes GRO less effective, as each GSO packet contains 8 MSS instead
of 16 MSS.

Throughput of a single TCP flow improves by about 25% with the following
patch.

Before patch:

A:~# netperf -H 192.168.0.2 -Cc
MIGRATED TCP STREAM TEST ...
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      13798.47   3.06     4.20     0.436   0.598

After patch:

A:~# netperf -H 192.168.0.2 -Cc
MIGRATED TCP STREAM TEST ...
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      17273.80   3.44     4.19     0.391   0.477

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-03 07:54:55 +00:00 · e6309cff76
parent 8cc3e439ab
commit e6309cff76
1 changed file with 3 additions and 3 deletions
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -98,11 +98,11 @@
 #define MLX4_EN_ALLOC_SIZE	PAGE_ALIGN(16384)
 #define MLX4_EN_ALLOC_ORDER	get_order(MLX4_EN_ALLOC_SIZE)

-/* Receive fragment sizes; we use at most 4 fragments (for 9600 byte MTU
+/* Receive fragment sizes; we use at most 3 fragments (for 9600 byte MTU
 * and 4K allocations) */
 enum {
-	FRAG_SZ0 = 512 - NET_IP_ALIGN,
-	FRAG_SZ1 = 1024,
+	FRAG_SZ0 = 1536 - NET_IP_ALIGN,
+	FRAG_SZ1 = 4096,
 	FRAG_SZ2 = 4096,
 	FRAG_SZ3 = MLX4_EN_ALLOC_SIZE
 };
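
The fragment counts mentioned in the commit message and in the updated comment can be checked with a small standalone sketch. It is not the driver's code: it assumes the receive path fills fragments in table order until the frame is covered, that a frame is the MTU plus 18 bytes (Ethernet header and one VLAN tag), that NET_IP_ALIGN is 2 (the generic kernel default; some architectures use 0), and that PAGE_ALIGN(16384) leaves the value unchanged. `FRAME_OVERHEAD`, `old_frag_sizes`, `new_frag_sizes`, and `frags_needed` are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* Assumption: generic kernel default; some architectures (e.g. x86) use 0. */
#define NET_IP_ALIGN 2
/* Assumption: PAGE_ALIGN(16384) leaves the value unchanged (4K pages). */
#define MLX4_EN_ALLOC_SIZE 16384
/* Hypothetical: Ethernet header (14 bytes) + one VLAN tag (4 bytes). */
#define FRAME_OVERHEAD 18

/* Fragment size tables before and after the patch, copied from the diff. */
static const int old_frag_sizes[] = {
	512 - NET_IP_ALIGN, 1024, 4096, MLX4_EN_ALLOC_SIZE
};
static const int new_frag_sizes[] = {
	1536 - NET_IP_ALIGN, 4096, 4096, MLX4_EN_ALLOC_SIZE
};

/*
 * Count how many fragments, taken in table order, are needed to hold
 * one frame for the given MTU. Returns -1 if the table cannot cover it.
 */
static int frags_needed(const int *sizes, size_t n, int mtu)
{
	int frame_len = mtu + FRAME_OVERHEAD;
	int covered = 0;
	size_t i;

	for (i = 0; i < n && covered < frame_len; i++)
		covered += sizes[i];

	return covered >= frame_len ? (int)i : -1;
}
```

Under these assumptions, `frags_needed(old_frag_sizes, 4, 1500)` gives 2 and `frags_needed(new_frag_sizes, 4, 1500)` gives 1, matching the "one page fragment per incoming frame" goal. For a 9600 byte MTU the counts are 4 and 3, which is consistent with the comment changing from "at most 4 fragments" to "at most 3 fragments".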