linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Alexei Starovoitov	8f577cadf7	seccomp: JIT compile seccomp filter Take advantage of internal BPF JIT 05-sim-long_jumps.c of libseccomp was used as micro-benchmark: seccomp_rule_add_exact(ctx,... seccomp_rule_add_exact(ctx,... rc = seccomp_load(ctx); for (i = 0; i < 10000000; i++) syscall(...); $ sudo sysctl net.core.bpf_jit_enable=1 $ time ./bench real 0m2.769s user 0m1.136s sys 0m1.624s $ sudo sysctl net.core.bpf_jit_enable=0 $ time ./bench real 0m5.825s user 0m1.268s sys 0m4.548s Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 16:31:30 -04:00
Alexei Starovoitov	622582786c	net: filter: x86: internal BPF JIT Maps all internal BPF instructions into x86_64 instructions. This patch replaces original BPF x64 JIT with internal BPF x64 JIT. sysctl net.core.bpf_jit_enable is reused as on/off switch. Performance: 1. old BPF JIT and internal BPF JIT generate equivalent x86_64 code. No performance difference is observed for filters that were JIT-able before Example assembler code for BPF filter "tcpdump port 22" original BPF -> old JIT: original BPF -> internal BPF -> new JIT: 0: push %rbp 0: push %rbp 1: mov %rsp,%rbp 1: mov %rsp,%rbp 4: sub $0x60,%rsp 4: sub $0x228,%rsp 8: mov %rbx,-0x8(%rbp) b: mov %rbx,-0x228(%rbp) // prologue 12: mov %r13,-0x220(%rbp) 19: mov %r14,-0x218(%rbp) 20: mov %r15,-0x210(%rbp) 27: xor %eax,%eax // clear A c: xor %ebx,%ebx 29: xor %r13,%r13 // clear X e: mov 0x68(%rdi),%r9d 2c: mov 0x68(%rdi),%r9d 12: sub 0x6c(%rdi),%r9d 30: sub 0x6c(%rdi),%r9d 16: mov 0xd8(%rdi),%r8 34: mov 0xd8(%rdi),%r10 3b: mov %rdi,%rbx 1d: mov $0xc,%esi 3e: mov $0xc,%esi 22: callq 0xffffffffe1021e15 43: callq 0xffffffffe102bd75 27: cmp $0x86dd,%eax 48: cmp $0x86dd,%rax 2c: jne 0x0000000000000069 4f: jne 0x000000000000009a 2e: mov $0x14,%esi 51: mov $0x14,%esi 33: callq 0xffffffffe1021e31 56: callq 0xffffffffe102bd91 38: cmp $0x84,%eax 5b: cmp $0x84,%rax 3d: je 0x0000000000000049 62: je 0x0000000000000074 3f: cmp $0x6,%eax 64: cmp $0x6,%rax 42: je 0x0000000000000049 68: je 0x0000000000000074 44: cmp $0x11,%eax 6a: cmp $0x11,%rax 47: jne 0x00000000000000c6 6e: jne 0x0000000000000117 49: mov $0x36,%esi 74: mov $0x36,%esi 4e: callq 0xffffffffe1021e15 79: callq 0xffffffffe102bd75 53: cmp $0x16,%eax 7e: cmp $0x16,%rax 56: je 0x00000000000000bf 82: je 0x0000000000000110 58: mov $0x38,%esi 88: mov $0x38,%esi 5d: callq 0xffffffffe1021e15 8d: callq 0xffffffffe102bd75 62: cmp $0x16,%eax 92: cmp $0x16,%rax 65: je 0x00000000000000bf 96: je 0x0000000000000110 67: jmp 0x00000000000000c6 98: jmp 0x0000000000000117 69: cmp $0x800,%eax 9a: cmp $0x800,%rax 6e: jne 0x00000000000000c6 a1: jne 0x0000000000000117 70: mov $0x17,%esi a3: mov $0x17,%esi 75: callq 0xffffffffe1021e31 a8: callq 0xffffffffe102bd91 7a: cmp $0x84,%eax ad: cmp $0x84,%rax 7f: je 0x000000000000008b b4: je 0x00000000000000c2 81: cmp $0x6,%eax b6: cmp $0x6,%rax 84: je 0x000000000000008b ba: je 0x00000000000000c2 86: cmp $0x11,%eax bc: cmp $0x11,%rax 89: jne 0x00000000000000c6 c0: jne 0x0000000000000117 8b: mov $0x14,%esi c2: mov $0x14,%esi 90: callq 0xffffffffe1021e15 c7: callq 0xffffffffe102bd75 95: test $0x1fff,%ax cc: test $0x1fff,%rax 99: jne 0x00000000000000c6 d3: jne 0x0000000000000117 d5: mov %rax,%r14 9b: mov $0xe,%esi d8: mov $0xe,%esi a0: callq 0xffffffffe1021e44 dd: callq 0xffffffffe102bd91 // MSH e2: and $0xf,%eax e5: shl $0x2,%eax e8: mov %rax,%r13 eb: mov %r14,%rax ee: mov %r13,%rsi a5: lea 0xe(%rbx),%esi f1: add $0xe,%esi a8: callq 0xffffffffe1021e0d f4: callq 0xffffffffe102bd6d ad: cmp $0x16,%eax f9: cmp $0x16,%rax b0: je 0x00000000000000bf fd: je 0x0000000000000110 ff: mov %r13,%rsi b2: lea 0x10(%rbx),%esi 102: add $0x10,%esi b5: callq 0xffffffffe1021e0d 105: callq 0xffffffffe102bd6d ba: cmp $0x16,%eax 10a: cmp $0x16,%rax bd: jne 0x00000000000000c6 10e: jne 0x0000000000000117 bf: mov $0xffff,%eax 110: mov $0xffff,%eax c4: jmp 0x00000000000000c8 115: jmp 0x000000000000011c c6: xor %eax,%eax 117: mov $0x0,%eax c8: mov -0x8(%rbp),%rbx 11c: mov -0x228(%rbp),%rbx // epilogue cc: leaveq 123: mov -0x220(%rbp),%r13 cd: retq 12a: mov -0x218(%rbp),%r14 131: mov -0x210(%rbp),%r15 138: leaveq 139: retq On fully cached SKBs both JITed functions take 12 nsec to execute. BPF interpreter executes the program in 30 nsec. The difference in generated assembler is due to the following: Old BPF imlements LDX_MSH instruction via sk_load_byte_msh() helper function inside bpf_jit.S. New JIT removes the helper and does it explicitly, so ldx_msh cost is the same for both JITs, but generated code looks longer. New JIT has 4 registers to save, so prologue/epilogue are larger, but the cost is within noise on x64. Old JIT checks whether first insn clears A and if not emits 'xor %eax,%eax'. New JIT clears %rax unconditionally. 2. old BPF JIT doesn't support ANC_NLATTR, ANC_PAY_OFFSET, ANC_RANDOM extensions. New JIT supports all BPF extensions. Performance of such filters improves 2-4 times depending on a filter. The longer the filter the higher performance gain. Synthetic benchmarks with many ancillary loads see 20x speedup which seems to be the maximum gain from JIT Notes: . net.core.bpf_jit_enable=2 + tools/net/bpf_jit_disasm is still functional and can be used to see generated assembler . there are two jit_compile() functions and code flow for classic filters is: sk_attach_filter() - load classic BPF bpf_jit_compile() - try to JIT from classic BPF sk_convert_filter() - convert classic to internal bpf_int_jit_compile() - JIT from internal BPF seccomp and tracing filters will just call bpf_int_jit_compile() Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 16:31:30 -04:00
Alexei Starovoitov	f3c2af7ba1	net: filter: x86: split bpf_jit_compile() Split bpf_jit_compile() into two functions to improve readability of for(pass++) loop. The change follows similar style of JIT compilers for arm, powerpc, s390 The body of new do_jit() was not reformatted to reduce noise in this patch, since the following patch replaces most of it. Tested with BPF testsuite. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 16:31:30 -04:00
David S. Miller	9509b1c150	Merge branch 'ieee802154-next' Phoebe Buckheister says: ==================== 802154: some cleanups and fixes This series adds some definitions for 802.15.4 header fields that were missing, changes 6lowpan fragmentation to be aware of security headers and fixes 802.15.4 datagram socket sendmsg(), which was entirely incompliant to date. Also a few minor changes to mac_cb handling, mark a single-use function static, and correctly check for EMSGSIZE conditions during wpan_header_create. Changes since v1: * rename mac_cb_alloc to mac_cb_init * catch all error cases of sendmsg() instead of only !conn && msg_name * redo 6lowpan fragmentation to not clone lower layer headers ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:52 -04:00
Phoebe Buckheister	6ef0023a2e	mac802154: make mac802154_wpan_open static This function is only used within the same translation unit, so mark it static. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:43 -04:00
Phoebe Buckheister	1cc76e3654	ieee802154: fix dgram socket sendmsg() 802.15.4 datagram sockets do not currently have a compliant sendmsg(). The destination address supplied is always ignored, and in unconnected mode, packets are broadcast instead of dropped with -EDESTADDRREQ. This patch fixes 802.15.4 dgram sockets to be compliant, i.e. !conn && !msg_name => -EDESTADDRREQ !conn && msg_name => send to msg_name conn && !msg_name => send to connected conn && msg_name => -EISCONN Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:43 -04:00
Phoebe Buckheister	d4b2816d67	6lowpan: fix fragmentation Currently, 6lowpan creates one 802.15.4 MAC header for the original packet the device was given by upper layers and reuses this header for all fragments, if fragmentation is required. This also reuses frame sequence numbers, which must not happen. 6lowpan also has issues with fragmentation in the presence of security headers, since those may imply the presence of trailing fields that are not accounted for by the fragmentation code right now. Fix both of these issues by properly allocating fragment skbs with headromm and tailroom as specified by the underlying device, create one header for each skb instead of reusing the original header, let the underlying device do the rest. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:43 -04:00
Phoebe Buckheister	32edc40ae6	ieee802154: change _cb handling slightly The current mac_cb handling of ieee802154 is rather awkward and limited. Decompose the single flags field into multiple fields with the meanings of each subfield of the flags field to make future extensions (for example, link-layer security) easier. Also don't set the frame sequence number in upper layers, since that's a thing the MAC is supposed to set on frame transmit - we set it on header creation, but assuming that upper layers do not blindly duplicate our headers, this is fine. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:42 -04:00
Phoebe Buckheister	8c84296fd2	mac802154: account for all header parts during wpan header creationg The current WPAN header creation code checks for EMSGSIZE conditions, but does not account for the MIC field that link layer security may add at the end of the frame. Now that we can accurately calculate the maximum payload size of packets, use that to check for EMSGSIZE conditions. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:42 -04:00
Phoebe Buckheister	c3a6114f31	ieee802154: add definitions for link-layer security and header functions When dealing with 802.15.4, one often has to know the maximum payload size for a given packet. This depends on many factors, one of which is whether or not a security header is present in the frame. These definitions and functions provide an easy way for any upper layer to calculate the maximum payload size for a packet. The first obvious user for this is 6lowpan, which duplicates this calculation and gets it partially wrong because it ignores security headers. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:51:42 -04:00
Markus Lottmann	9dbccc30f3	drivers: net: Register Micrel ksz884x network devices in PCI device tree. This unifies the behaviour with other network device drivers and allows for a matching of the PCI device path in UDEV rules. Signed-off-by: Markus Lottmann <markus.lottmann1986@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 15:43:50 -04:00
David S. Miller	fcd77db07d	net: Fix CONFIG_SYSCTL ifdef test. > include/net/ip.h:211:5: warning: "CONFIG_SYSCTL" is not defined [-Wundef] > #if CONFIG_SYSCTL > ^ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 13:43:14 -04:00
David S. Miller	fd0d5e7368	Merge branch 'cpsw_cleanups' George Cherian says: ==================== TI CPSW Cleanup This series does some minimal cleanups. -Conversion of pr_() to dev_() -Convert kzalloc to devm_kzalloc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 13:42:20 -04:00
George Cherian	e194312854	drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc(). Convert kzalloc() to devm_kzalloc(). Signed-off-by: George Cherian <george.cherian@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 13:42:14 -04:00
George Cherian	a92f40a9a3	net: davinci_mdio: Convert pr_err() to dev_err() call Convert the lone pr_err() to dev_err() call. Signed-off-by: George Cherian <george.cherian@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 13:42:14 -04:00
George Cherian	88c99ff639	driver net: cpsw: Convert pr_() to dev_() calls Convert all pr_() calls to dev_() calls. No functional changes. Signed-off-by: George Cherian <george.cherian@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-15 13:42:14 -04:00
Darek Marcinkiewicz	eb02a272c9	driver/net/ethernet/ec_bhf.c: fix sparse warnings Sparse was reporting quite a few warnings for the driver. Those get fixed by this patch. Signed-off-by: Dariusz Marcinkiewicz <reksio@newterm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 16:09:33 -04:00
Joe Perches	c722831744	net: Use a more standard macro for INET_ADDR_COOKIE Missing a colon on definition use is a bit odd so change the macro for the 32 bit case to declare an __attribute__((unused)) and __deprecated variable. The __deprecated attribute will cause gcc to emit an error if the variable is actually used. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 16:07:23 -04:00
Jingoo Han	126e612223	net: systemport: Use devm_ioremap_resource() Use devm_ioremap_resource() because devm_request_and_ioremap() is obsoleted by devm_ioremap_resource(). Signed-off-by: Jingoo Han <jg1.han@samsung.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:42:07 -04:00
David S. Miller	005e35f5f0	Merge branch 'mlx4-next' Or Gerlitz says: ==================== Mellanox driver update 2014-05-12 This patchset introduce some small bug fixes: Eyal fixed some compilation and syntactic checkers warnings. Ido fixed a coruption in user priority mapping when changing number of channels. Shani fixed some other problems when modifying MAC address. Yuval fixed a problem when changing IRQ affinity during high traffic - IRQ changes got effective only after the first pause in traffic. This patchset was tested and applied over commit 93dccc5: "mdio_bus: fix devm_mdiobus_alloc_size export" Changes from V1: - applied feedback from Dave to use true/false and not 0/1 in patch 1/9 - removed the patch from Noa which adddressed a bug in flow steering table when using a bond device, as the fix might need to be in the bonding driver, this is now dicussed in the netdev thread "bonding directly changes underlying device address" Changes from V0: - Patch 1/9 - net/mlx4_core: Enforce irq affinity changes immediatly - Moved the new members to a hot cache line as Eric suggested ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:41:05 -04:00
Eyal Perry	7572038409	net/mlx4_core: Fix inaccurate return value of mlx4_flow_attach() Adopt the "info: why not propagate 'ret' from parse_trans_rule()..." suggestion made by the smatch semantic checker on: drivers/net/ethernet/mellanox/mlx4/mcg.c:867 mlx4_flow_attach() Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:40:34 -04:00
Eyal Perry	c3ca5205e6	net/mlx4_en: Using positive error value for unsigned Using a positive value for error: MLX4_NET_TRANS_RULE_NUM instead of -EPROTONOSUPPORT, to remove compilation warning. Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:40:33 -04:00
Shani Michaelli	fe1ff29dd4	net/mlx4_en: Protect MAC address modification with the state_lock mutex This Patches solves an issue that could raise when modifying the device's MAC. It occurs due to a simultaneous access to priv->mac_hash from two contexts. The buggy scenario described below: Context 1: copy the new mac address to the dev->dev_addr field. Context 2: mlx4_en_do_uc_filter removes prev_mac entry from the mac_hash db since it is not in dev->uc and not equal to dev->dev_addr. Context 1: mlx4_en_do_set_mac() calls mlx4_en_replace_mac() to replace prev_mac with dev_addr but it fails to update the mac_hash db since it no longer contains prev_mac, therefore it returns with an error. The fix is to prevent mlx4_en_do_uc_filter from being executed by both of the context 1 calls described above, This is done by putting them both under the mdev->state_lock lock, it will solve this issue since mlx4_en_do_uc_filter is already protected by the mdev->state_lock. Reviewed-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:40:33 -04:00
Eyal Perry	483e01320e	net/mlx4_core: Removed unnecessary bit operation condition Fix the "warn: suspicious bitop condition" made by the smatch semantic checker on: drivers/net/ethernet/mellanox/mlx4/main.c:509 mlx4_slave_cap() Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:40:33 -04:00
Eyal Perry	c05a116f39	net/mlx4_core: Fix smatch error - possible access to a null variable Fix the "error: we previously assumed 'out_param' could be null" found by smatch semantic checker on: drivers/net/ethernet/mellanox/mlx4/cmd.c:506 mlx4_cmd_poll() drivers/net/ethernet/mellanox/mlx4/cmd.c:578 mlx4_cmd_wait() Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:40:33 -04:00
Shani Michaelli	ee755324a3	net/mlx4_en: Fix errors in MAC address changing when port is down This patch fix an issue that happen when changing the MAC address when the port is down, described as follows: 1. Set the port down. 2. Change the MAC address - mlx4_en_set_mac() will change dev->dev_addr. 3. Set the port up - will result in mlx4_en_do_uc_filter that will remove the prev_mac entry from the mac_hash db. 4. Changing the MAC address again will eventually trigger the call to mlx4_en_replace_mac() in order to replace prev_mac with dev_addr but the prev_mac entry is already not exist in the mac_hash db therefore the operation fails. The fix is to set the prev_mac with the new MAC address so in step 3 above, after setting the port up mlx4_en_get_qp() is updating the mac_hash with the entry of dev_addr which is equal to prev_mac. Therefore in step 4, when calling mlx4_en_replace_mac, the entry related to prev_mac exist in mac_hash and the replace operation succeed. Reviewed-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:40:33 -04:00
Ido Shamay	f5b6345ba8	net/mlx4_en: User prio mapping gets corrupted when changing number of channels When using ethtool set_channels, mlx4_en_setup_tc is always called, even when it was not configured. Fixed code to call mlx4_en_setup_tc() only if needed. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:40:33 -04:00
Yuval Atias	2eacc23c42	net/mlx4_core: Enforce irq affinity changes immediatly During heavy traffic, napi is constatntly polling the complition queue and no interrupt is fired. Because of that, changes to irq affinity are ignored until traffic is stopped and resumed. By registering to the irq notifier mechanism, and forcing interrupt when affinity is changed, irq affinity changes will be immediatly enforced. Signed-off-by: Yuval Atias <yuvala@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:40:32 -04:00
dingtianhong	3763e7ef17	macvlan: Propagate lowerdev MTU changes When the physical MTU changes we should ensure that all existing MACVLAN dev MTU do not exceed the new lowerdev MTU. This patch adds that propagation. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:35:03 -04:00
wangweidong	8ba7e7bfc3	dccp: make the request_retries minimum is 1 In Documentation/networking/dccp.txt points that request_retries should be greater than 0. So make the extra1 to be &one instead of &zero. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:34:16 -04:00
WANG Cong	c9f2dba61b	snmp: fix some left over of snmp stats Fengguang reported the following sparse warning: >> net/ipv6/proc.c:198:41: sparse: incorrect type in argument 1 (different address spaces) net/ipv6/proc.c:198:41: expected void [noderef] <asn:3>mib net/ipv6/proc.c:198:41: got void [noderef] <asn:3>*pcpumib Fixes: commit `698365fa18` (net: clean up snmp stats code) Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:33:47 -04:00
WANG Cong	122ff243f5	ipv4: make ip_local_reserved_ports per netns ip_local_port_range is already per netns, so should ip_local_reserved_ports be. And since it is none by default we don't actually need it when we don't enable CONFIG_SYSCTL. By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port() Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:31:45 -04:00
Laurent Pinchart	9cc5e36d1c	irda: sh_irda: Enable driver compilation with COMPILE_TEST This helps increasing build testing coverage. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:23:55 -04:00
David S. Miller	d6cc76d389	Merge branch 'tipc-next' Jon Maloy says: ==================== tipc: bug fixes and improvements Intensive and extensive testing has revealed some rather infrequent problems related to flow control, buffer handling and link establishment. Commits ##1 to 4 deal with these problems. The remaining four commits are just code improvments, aiming at making the code more comprehensible and maintainable. There are no functional enhancements in this series. v2: Fixed a typo in commit log #2. Otherwise no changes from v1. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:20:19 -04:00
Jon Paul Maloy	9816f0615d	tipc: merge port message reception into socket reception function In order to reduce complexity and save a call level during message reception at port/socket level, we remove the function tipc_port_rcv() and merge its functionality into tipc_sk_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	c82910e2a8	tipc: clean up neigbor discovery message reception The function tipc_disc_rcv(), which is handling received neighbor discovery messages, is perceived as messy, and it is hard to verify its correctness by code inspection. The fact that the task it is set to resolve is fairly complex does not make the situation better. In this commit we try to take a more systematic approach to the problem. We define a decision machine which takes three state flags as input, and produces three action flags as output. We then walk through all permutations of the state flags, and for each of them we describe verbally what is going on, plus that we set zero or more of the action flags. The action flags indicate what should be done once the decision machine has finished its job, while the last part of the function deals with performing those actions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	38504c28a2	tipc: improve and extend media address conversion functions TIPC currently handles two media specific addresses: Ethernet MAC addresses and InfiniBand addresses. Those are kept in three different formats: 1) A "raw" format as obtained from the device. This format is known only by the media specific adapter code in eth_media.c and ib_media.c. 2) A "generic" internal format, in the form of struct tipc_media_addr, which can be referenced and passed around by the generic media- unaware code. 3) A serialized version of the latter, to be conveyed in neighbor discovery messages. Conversion between the three formats can only be done by the media specific code, so we have function pointers for this purpose in struct tipc_media. Here, the media adapters can install their own conversion functions at startup. We now introduce a new such function, 'raw2addr()', whose purpose is to convert from format 1 to format 2 above. We also try to as far as possible uniform commenting, variable names and usage of these functions, with the purpose of making them more comprehensible. We can now also remove the function tipc_l2_media_addr_set(), whose job is done better by the new function. Finally, we expand the field for serialized addresses (format 3) in discovery messages from 20 to 32 bytes. This is permitted according to the spec, and reduces the risk of problems when we add new media in the future. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	37e22164a8	tipc: rename and move message reassembly function The function tipc_link_frag_rcv() is in reality a re-entrant generic message reassemby function that has nothing in particular to do with the link, where it is defined now. This becomes obvious when we see the need to call the function from other places in the code. In this commit rename it to tipc_buf_append() and move it to the file msg.c. We also simplify its signature by moving the tail pointer to the control block of the head buffer, hence making the head buffer self-contained. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	5074ab89c5	tipc: mark head of reassembly buffer as non-linear The message reassembly function does not update the 'len' and 'data_len' fields of the head skbuff correctly when fragments are chained to it. This may sometimes lead to obsure errors, such as fragment reordering when we receive fragments which are cloned buffers. This commit fixes this, by ensuring that the two fields are updated correctly. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	ec37dcd382	tipc: don't record link RESET or ACTIVATE messages as traffic In the current code, all incoming LINK_PROTOCOL messages, irrespective of type, nudge the "last message received" checkpoint, informing the link state machine that a message was received from the peer since last supervision timeout event. This inhibits the link from starting probing the peer unnecessarily. However, not only STATE messages are recorded as legitimate incoming traffic this way, but even RESET and ACTIVATE messages, which in reality are there to inform the link that the peer endpoint has been reset. At the same time, some RESET messages may be dropped instead of causing a link reset. This happens when the link endpoint thinks it is fully up and working, and the session number of the RESET is lower than or equal to the current link session. In such cases the RESET is perceived as a delayed remnant from an earlier session, or the current one, and dropped. Now, if a TIPC module is removed and then immediately reinserted, e.g. when using a script, RESET messages may arrive at the peer link endpoint before this one has had time to discover the failure. The RESET may be dropped because of the session number, but only after it has been recorded as a legitimate traffic event. Hence, the receiving link will not start probing, and not discover that the peer endpoint is down, at the same time ignoring the periodic RESET messages coming from that endpoint. We have ended up in a stale state where a failed link cannot be re-established. In this commit, we remedy this by nudging the checkpoint only for received STATE messages, not for RESET or ACTIVATE messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	4f4482dcd9	tipc: compensate for double accounting in socket rcv buffer The function net/core/sock.c::__release_sock() runs a tight loop to move buffers from the socket backlog queue to the receive queue. As a security measure, sk_backlog.len of the receiving socket is not set to zero until after the loop is finished, i.e., until the whole backlog queue has been transferred to the receive queue. During this transfer, the data that has already been moved is counted both in the backlog queue and the receive queue, hence giving an incorrect picture of the available queue space for new arriving buffers. This leads to unnecessary rejection of buffers by sk_add_backlog(), which in TIPC leads to unnecessarily broken connections. In this commit, we compensate for this double accounting by adding a counter that keeps track of it. The function socket.c::backlog_rcv() receives buffers one by one from __release_sock(), and adds them to the socket receive queue. If the transfer is successful, it increases a new atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the transferred buffer. If a new buffer arrives during this transfer and finds the socket busy (owned), we attempt to add it to the backlog. However, when sk_add_backlog() is called, we adjust the 'limit' parameter with the value of the new counter, so that the risk of inadvertent rejection is eliminated. It should be noted that this change does not invalidate the original purpose of zeroing 'sk_backlog.len' after the full transfer. We set an upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that doesn't respect the send window) keeps pumping in buffers to sk_add_backlog(), he will eventually reach an upper limit, (2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added to the backlog, and the connection will be broken. Ordinary, well- behaved senders will never reach this buffer limit at all. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:47 -04:00
Jon Paul Maloy	6163a194e0	tipc: decrease connection flow control window Memory overhead when allocating big buffers for data transfer may be quite significant. E.g., truesize of a 64 KB buffer turns out to be 132 KB, 2 x the requested size. This invalidates the "worst case" calculation we have been using to determine the default socket receive buffer limit, which is based on the assumption that 1024x64KB = 67MB buffers may be queued up on a socket. Since TIPC connections cannot survive hitting the buffer limit, we have to compensate for this overhead. We do that in this commit by dividing the fix connection flow control window from 1024 (2512) messages to 512 (2256). Since older version nodes send out acks at 512 message intervals, compatibility with such nodes is guaranteed, although performance may be non-optimal in such cases. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:47 -04:00
dingtianhong	3fdddd859a	bonding: alloc the structure ad_info dynamically in per slave The struct ad_slave_info is very huge, and only be used for 802.3ad mode, so alloc the structure dynamically could save 356 Bits for every slave in non 802.3ad mode. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 14:09:15 -04:00
Sergei Shtylyov	86b5d251d5	sh_eth: replace devm_kzalloc() with devm_kmalloc_array() When I was converting the driver to the managed device API, only devm_kzalloc() was available for memory allocation, so I had to use it, despite zeroing out the PHY IRQ array right before initializing all its entries to PHY_POLL was quite stupid. Now that devm_kmalloc_array() has become available, we can avoid the needless zeroing out... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:51:27 -04:00
David S. Miller	1e1c77bfde	Merge branch 'tg3-next' Michael Chan says: ==================== tg3: TSO related enhancements to prevent memory allocation failure Michael Chan (3): tg3: Don't modify ip header fields when doing GSO tg3: Prevent page allocation failure during TSO workaround tg3: Update copyright and version to 3.137 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:39:01 -04:00
Michael Chan	de750e4c4b	tg3: Update copyright and version to 3.137 Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:38:51 -04:00
Michael Chan	d3f6f3a1d8	tg3: Prevent page allocation failure during TSO workaround If any TSO fragment hits hardware bug conditions (e.g. 4G boundary), the driver will workaround by calling skb_copy() to copy to a linear SKB. Users have reported page allocation failures as the TSO packet can be up to 64K. Copying such a large packet is also very inefficient. We fix this by using existing tg3_tso_bug() to transmit the packet using GSO. Signed-off-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:38:51 -04:00
Michael Chan	d71c0dc4e9	tg3: Don't modify ip header fields when doing GSO tg3 uses GSO as workaround if the hardware cannot perform TSO on certain packets. We should not modify the ip header fields if we do GSO on the packet. It happens to work by accident because GSO recalculates the IP checksum and IP total length. Also fix the tg3_start_xmit comment to reflect that this is the only xmit function for all devices. Signed-off-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:38:51 -04:00
David S. Miller	b6bd26c4de	Merge branch 'inet_fwmark_reflect' Lorenzo Colitti says: ==================== Make mark-based routing work better with multiple separate networks. Mark-based routing (ip rule fwmark 17 lookup 100) combined with either iptables marking (iptables -j MARK --set-mark 17) or application-based marking (the SO_MARK setsockopt) are a good way to deal with connecting simultaneously to multiple networks. Each network can be given a routing table, and ip rules can be configured to make different fwmarks select different networks. Applications can select networks them by setting appropriate socket marks, and iptables rules can be used to handle non-aware applications, enforce policy, etc. This patch series improves functionality when mark-based routing is used in this way. Current behaviour has the following limitations: 1. Kernel-originated replies that are not associated with a socket always use a mark of zero. This means that, for example, when the kernel sends a ping reply or a TCP reset, it does not send it on the network from which it received the original packet. 2. Path MTU discovery, which is triggered by incoming packets, does not always work correctly, because the routing lookups it uses to clone routes do not take the fwmark into account and thus can happen in the wrong routing table. 3. Application-based marking works well for outbound connections, but does not work well for incoming connections. Marking a listening socket causes that socket to only accept connections from a given network, and sockets that are returned by accept() are not marked (and are thus not routed correctly). sysctl. This causes route lookups for kernel-generated replies and PMTUD to use the fwmark of the packet that caused them. which causes TCP sockets returned by accept() to be marked with the same mark that sent the intial SYN packet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:35:18 -04:00
Lorenzo Colitti	84f39b08d7	net: support marking accepting TCP sockets When using mark-based routing, sockets returned from accept() may need to be marked differently depending on the incoming connection request. This is the case, for example, if different socket marks identify different networks: a listening socket may want to accept connections from all networks, but each connection should be marked with the network that the request came in on, so that subsequent packets are sent on the correct network. This patch adds a sysctl to mark TCP sockets based on the fwmark of the incoming SYN packet. If enabled, and an unmarked socket receives a SYN, then the SYN packet's fwmark is written to the connection's inet_request_sock, and later written back to the accepted socket when the connection is established. If the socket already has a nonzero mark, then the behaviour is the same as it is today, i.e., the listening socket's fwmark is used. Black-box tested using user-mode linux: - IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the mark of the incoming SYN packet. - The socket returned by accept() is marked with the mark of the incoming SYN packet. - Tested with syncookies=1 and syncookies=2. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 18:35:09 -04:00

1 2 3 4 5 ...

442668 Commits All Branches Search

442668 Commits

All Branches