net/mlx4: use one page fragment per incoming frame
mlx4 driver has a suboptimal memory allocation strategy for regular
MTU=1500 frames, as it uses two page fragments :

One of 512 bytes and one of 1024 bytes.

This makes GRO less effective, as each GSO packet contains 8 MSS instead
of 16 MSS.

Performance of a single TCP flow gains 25 % increase with the following
patch.

Before patch :

A:~# netperf -H 192.168.0.2 -Cc
MIGRATED TCP STREAM TEST ...
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      13798.47   3.06     4.20     0.436   0.598

After patch :

A:~# netperf -H 192.68.0.2 -Cc
MIGRATED TCP STREAM TEST ...
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      17273.80   3.44     4.19     0.391   0.477

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent 8cc3e439ab
commit e6309cff76
@@ -98,11 +98,11 @@
 #define MLX4_EN_ALLOC_SIZE	PAGE_ALIGN(16384)
 #define MLX4_EN_ALLOC_ORDER	get_order(MLX4_EN_ALLOC_SIZE)
 
-/* Receive fragment sizes; we use at most 4 fragments (for 9600 byte MTU
+/* Receive fragment sizes; we use at most 3 fragments (for 9600 byte MTU
  * and 4K allocations) */
 enum {
-	FRAG_SZ0 = 512 - NET_IP_ALIGN,
-	FRAG_SZ1 = 1024,
+	FRAG_SZ0 = 1536 - NET_IP_ALIGN,
+	FRAG_SZ1 = 4096,
 	FRAG_SZ2 = 4096,
 	FRAG_SZ3 = MLX4_EN_ALLOC_SIZE
 };
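The effect of the new fragment table can be checked with a small standalone sketch. It is not the driver's code: it only walks a fragment-size table until one frame is covered and reports how many entries were needed. The names prefixed with `SKETCH_`, the helper `frags_per_frame()`, the value 2 for `NET_IP_ALIGN`, and the 18 bytes of link-layer overhead (Ethernet header plus one VLAN tag) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: NET_IP_ALIGN is 2 here; some architectures define it as 0. */
#define SKETCH_NET_IP_ALIGN 2
/* Assumption: 14-byte Ethernet header plus one 4-byte VLAN tag per frame. */
#define SKETCH_L2_OVERHEAD 18
/* Assumption: MLX4_EN_ALLOC_SIZE is taken as plain 16384 in this sketch. */
#define SKETCH_ALLOC_SIZE 16384
#define SKETCH_NUM_FRAGS 4

/* Fragment sizes before the patch (FRAG_SZ0..FRAG_SZ3). */
static const int old_frag_sizes[SKETCH_NUM_FRAGS] = {
	512 - SKETCH_NET_IP_ALIGN, 1024, 4096, SKETCH_ALLOC_SIZE
};

/* Fragment sizes after the patch (FRAG_SZ0..FRAG_SZ3). */
static const int new_frag_sizes[SKETCH_NUM_FRAGS] = {
	1536 - SKETCH_NET_IP_ALIGN, 4096, 4096, SKETCH_ALLOC_SIZE
};

/*
 * Hypothetical helper: count how many table entries are needed to hold
 * one frame of the given MTU. Returns -1 if the table is too small.
 */
static int frags_per_frame(const int *table, size_t n, int mtu)
{
	int frame_len = mtu + SKETCH_L2_OVERHEAD;
	int covered = 0;
	size_t used = 0;

	assert(table != NULL);
	while (covered < frame_len) {
		if (used == n)
			return -1;
		covered += table[used++];
	}
	return (int)used;
}
```

Under these assumptions, `frags_per_frame(old_frag_sizes, SKETCH_NUM_FRAGS, 1500)` yields 2 and the same call with `new_frag_sizes` yields 1, which is the change the commit message describes. For a 9600 byte MTU the counts are 4 and 3, in line with the updated comment in the diff.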