OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
David S. Miller	ada6c1de9e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next This is a somewhat large (and late) patchset that contains Netfilter updates for net-next. Most relevantly br_netfilter fixes, ipset RCU support, removal of x_tables percpu ruleset copy and rework of the nf_tables netdev support. More specifically, they are: 1) Warn the user when there is a better protocol conntracker available, from Marcelo Ricardo Leitner. 2) Fix forwarding of IPv6 fragmented traffic in br_netfilter, from Bernhard Thaler. This comes with several patches to prepare the change in the first place. 3) Get rid of special mtu handling of PPPoE/VLAN frames for br_netfilter. This is not needed anymore since now we use the largest fragment size to refragment, from Florian Westphal. 4) Restore vlan tag when refragmenting in br_netfilter, also from Florian. 5) Get rid of the percpu ruleset copy in x_tables, from Florian. Plus another follow up patch to refine it from Eric Dumazet. 6) Several ipset cleanups, fixes and finally RCU support, from Jozsef Kadlecsik. 7) Get rid of parens in Netfilter Kconfig files. 8) Attach the net_device to the basechain as opposed to the initial per table approach in the nf_tables netdev family. 9) Subscribe to netdev events to detect the removal and registration of a device that is referenced by a basechain. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 14:30:32 -07:00
Pablo Neira Ayuso	835b803377	netfilter: nf_tables_netdev: unregister hooks on net_device removal In case the net_device is gone, we have to unregister the hooks and put back the reference on the net_device object. Once it comes back, register them again. This also covers the device rename case. This patch also adds a new flag to indicate that the basechain is disabled, so their hooks are not registered. This flag is used by the netdev family to handle the case where the net_device object is gone. Currently this flag is not exposed to userspace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 23:02:35 +02:00
Pablo Neira Ayuso	d8ee8f7c56	netfilter: nf_tables: add nft_register_basechain() and nft_unregister_basechain() These wrapper functions take care of hook registration for basechains. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 23:02:33 +02:00
Pablo Neira Ayuso	2cbce139fc	netfilter: nf_tables: attach net_device to basechain The device is part of the hook configuration, so instead of a global configuration per table, set it on each basechain that we create. This patch reworks `ebddf1a8d7` ("netfilter: nf_tables: allow to bind table to net_device"). Note that this adds a dev_name field in the nft_base_chain structure, which is required by the netdev notification subscription that follows in a later patch to handle removed net_devices. Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 23:02:31 +02:00
Eric Dumazet	711bdde6a8	netfilter: x_tables: remove XT_TABLE_INFO_SZ and a dereference. After Florian's patches, there is no need for XT_TABLE_INFO_SZ anymore: only one copy of the table is kept, instead of one copy per cpu. We can also avoid a dereference if we put table data right after xt_table_info. It reduces register pressure and helps the compiler. Then, we attempt a kmalloc() if total size is under order-3 allocation, to reduce TLB pressure, as in many cases, rules fit in 32 KB. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 20:19:20 +02:00
Pablo Neira Ayuso	53b8762727	Merge branch 'master' of git://blackhole.kfki.hu/nf-next Jozsef Kadlecsik says: ==================== ipset patches for nf-next Please consider to apply the next bunch of patches for ipset. First comes the small changes, then the bugfixes and at the end the RCU related patches. * Use MSEC_PER_SEC consistently instead of the number. * Use SET_WITH_*() helpers to test set extensions from Sergey Popovich. * Check extensions attributes before getting extensions from Sergey Popovich. * Permit CIDR equal to the host address CIDR in IPv6 from Sergey Popovich. * Make sure we always return line number on batch in the case of error from Sergey Popovich. * Check CIDR value only when attribute is given from Sergey Popovich. * Fix cidr handling for hash:net types, reported by Jonathan Johnson. * Fix parallel resizing and listing of the same set so that the original set is kept for the whole dumping. * Make sure listing doesn't grab a set which is just being destroyed. * Remove rbtree from ip_set_hash_netiface.c in order to introduce RCU. * Replace rwlock_t with spinlock_t in "struct ip_set", change the locking in the core and simplifications in the timeout routines. * Introduce RCU locking in bitmap:* types with a slight modification in the logic on how an element is added. * Introduce RCU locking in hash:* types. This is the most complex part of the changes. * Introduce RCU locking in list type where standard rculist is used. * Fix coding styles reported by checkpatch.pl. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 18:33:09 +02:00
Pablo Neira Ayuso	f09becc79f	netfilter: Kconfig: get rid of parens around depends on According to the reporter, they are not needed. Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 17:26:37 +02:00
Kenneth Klette Jonassen	758f0d4b16	tcp: cdg: use div_u64() Fixes cross-compile to mips. Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-14 12:57:45 -07:00
Jozsef Kadlecsik	ca0f6a5cd9	netfilter: ipset: Fix coding styles reported by checkpatch.pl Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:18 +02:00
Jozsef Kadlecsik	00590fdd5b	netfilter: ipset: Introduce RCU locking in list type Standard rculist is used. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:17 +02:00
Jozsef Kadlecsik	18f84d41d3	netfilter: ipset: Introduce RCU locking in hash:* types Three types of data need to be protected in the case of the hash types: a. The hash buckets: standard rcu pointer operations are used. b. The element blobs in the hash buckets are stored in an array and a bitmap is used for book-keeping to tell which elements in the array are used or free. c. Networks per cidr values and the cidr values themselves are stored in fixed-size arrays and need no protection. The values are modified in such an order that in the worst case an element testing is repeated once with the same cidr value. The ipset hash approach uses arrays instead of lists and therefore is incompatible with rhashtable. Performance is tested by Jesper Dangaard Brouer: Simple drop in FORWARD ~~~~~~~~~~~~~~~~~~~~~~ Dropping via simple iptables net-mask match:: iptables -t raw -N simple || iptables -t raw -F simple iptables -t raw -I simple -s 198.18.0.0/15 -j DROP iptables -t raw -D PREROUTING -j simple iptables -t raw -I PREROUTING -j simple Drop performance in "raw": 11.3Mpps Generator: sending 12.2Mpps (tx:12264083 pps) Drop via original ipset in RAW table ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Create a set with lots of elements:: sudo ./ipset destroy test echo "create test hash:ip hashsize 65536" > test.set for x in `seq 0 255`; do for y in `seq 0 255`; do echo "add test 198.18.$x.$y" >> test.set done done sudo ./ipset restore < test.set Dropping via ipset:: iptables -t raw -F iptables -t raw -N net198 || iptables -t raw -F net198 iptables -t raw -I net198 -m set --match-set test src -j DROP iptables -t raw -I PREROUTING -j net198 Drop performance in "raw" with ipset: 8Mpps Perf report numbers ipset drop in "raw":: + 24.65% ksoftirqd/1 [ip_set] [k] ip_set_test - 21.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_lock_bh - _raw_read_lock_bh + 99.88% ip_set_test - 19.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_unlock_bh - _raw_read_unlock_bh + 99.72% ip_set_test + 4.31% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_kadt + 2.27% ksoftirqd/1 [ixgbe] [k] ixgbe_fetch_rx_buffer + 2.18% ksoftirqd/1 [ip_tables] [k] ipt_do_table + 1.81% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_test + 1.61% ksoftirqd/1 [kernel.kallsyms] [k] __netif_receive_skb_core + 1.44% ksoftirqd/1 [kernel.kallsyms] [k] build_skb + 1.42% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv + 1.36% ksoftirqd/1 [kernel.kallsyms] [k] __local_bh_enable_ip + 1.16% ksoftirqd/1 [kernel.kallsyms] [k] dev_gro_receive + 1.09% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock + 0.96% ksoftirqd/1 [ixgbe] [k] ixgbe_clean_rx_irq + 0.95% ksoftirqd/1 [kernel.kallsyms] [k] __netdev_alloc_frag + 0.88% ksoftirqd/1 [kernel.kallsyms] [k] kmem_cache_alloc + 0.87% ksoftirqd/1 [xt_set] [k] set_match_v3 + 0.85% ksoftirqd/1 [kernel.kallsyms] [k] inet_gro_receive + 0.83% ksoftirqd/1 [kernel.kallsyms] [k] nf_iterate + 0.76% ksoftirqd/1 [kernel.kallsyms] [k] put_compound_page + 0.75% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_lock Drop via ipset in RAW table with RCU-locking ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ With RCU locking, the RW-lock is gone. Drop performance in "raw" with ipset with RCU-locking: 11.3Mpps Performance-tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:17 +02:00
Jozsef Kadlecsik	96f51428c4	netfilter: ipset: Introduce RCU locking in bitmap:* types There's nothing much required because the bitmap types use atomic bit operations. However the logic of adding elements slightly changed: first the MAC address is updated (which is not atomic), then the element is activated (added). The extensions may call kfree_rcu() therefore we call rcu_barrier() at module removal. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:16 +02:00
Jozsef Kadlecsik	b57b2d1fa5	netfilter: ipset: Prepare the ipset core to use RCU at set level Replace rwlock_t with spinlock_t in "struct ip_set" and change the locking accordingly. Convert the comment extension into an RCU-aware object. Also, simplify the timeout routines. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:16 +02:00
Jozsef Kadlecsik	bd55389cc3	netfilter: ipset: Remove rbtree from hash:net,iface Remove rbtree in order to introduce RCU instead of rwlock in ipset. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:15 +02:00
Jozsef Kadlecsik	9c1ba5c809	netfilter: ipset: Make sure listing doesn't grab a set which is just being destroyed. There was a small window when all sets are destroyed and a concurrent listing of all sets could grab a set which is just being destroyed. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:15 +02:00
Jozsef Kadlecsik	c4c997839c	netfilter: ipset: Fix parallel resizing and listing of the same set When elements are added to a hash:* type of set and resizing is triggered, parallel listing could start to list the original set (before resizing) and "continue" with listing the new set. Fix it by references and using the original hash table for listing. Therefore the destroying of the original hash table may happen from the resizing or listing functions. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:15 +02:00
Jozsef Kadlecsik	f690cbaed9	netfilter: ipset: Fix cidr handling for hash:net types Commit "Simplify cidr handling for hash:net types" broke the cidr handling for the hash:net types when the sets were used by the SET target: entries with invalid cidr values were added to the sets. Reported by Jonathan Johnson. Testsuite entry is added to verify the fix. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:14 +02:00
Sergey Popovich	aff227581e	netfilter: ipset: Check CIDR value only when attribute is given There is no reason to check the CIDR value unless the attribute specifying the CIDR is given. Initialize the cidr array in the element structure at its declaration to give the compiler more freedom to optimize initialization right before the element structure is used. Remove local variables cidr and cidr2 for netnet and netportnet hashes as we do not use packed cidr value for such set types and can store value directly in e.cidr[]. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:14 +02:00
Sergey Popovich	a212e08e8e	netfilter: ipset: Make sure we always return line number on batch Even if we return with generic IPSET_ERR_PROTOCOL it is a good idea to return the line number if we are called in batch mode. Moreover we are not always exiting with IPSET_ERR_PROTOCOL. For example hash:ip,port,net may return IPSET_ERR_HASH_RANGE_UNSUPPORTED or IPSET_ERR_INVALID_CIDR. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:13 +02:00
Sergey Popovich	2c227f278a	netfilter: ipset: Permit CIDR equal to the host address CIDR in IPv6 Permit userspace to supply CIDR length equal to the host address CIDR length in netlink message. Prohibit any other CIDR length for IPv6 variant of the set. Also return -IPSET_ERR_HASH_RANGE_UNSUPPORTED instead of generic -IPSET_ERR_PROTOCOL in IPv6 variant of hash:ip,port,net when IPSET_ATTR_IP_TO attribute is given. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:13 +02:00
Sergey Popovich	7dd37bc8e6	netfilter: ipset: Check extensions attributes before getting extensions. Perform all extension attribute checks within ip_set_get_extensions() and reduce the amount of duplicated code. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:13 +02:00
Sergey Popovich	edda079174	netfilter: ipset: Use SET_WITH_*() helpers to test set extensions Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:12 +02:00
Jozsef Kadlecsik	aaeb6e24f5	netfilter: ipset: Use MSEC_PER_SEC consistently Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:12 +02:00
David S. Miller	25c43bf13b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-06-13 23:56:52 -07:00
Linus Torvalds	c8d17b451a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix uninitialized struct station_info in cfg80211_wireless_stats(), from Johannes Berg. 2) Revert commit attempt to fix ipv6 protocol resubmission, it adds regressions. 3) Endless loops can be created in bridge port lists, fix from Nikolay Aleksandrov. 4) Don't WARN_ON() if sk->sk_forward_alloc is non-zero in sk_clear_memalloc, it is a legal situation during swap deactivation. Fix from Mel Gorman. 5) Fix order of disabling interrupts and unlocking NAPI in enic driver to avoid a race. From Govindarajulu Varadarajan. 6) High and low register writes are swapped when programming the start of periodic output in igb driver. From Richard Cochran. 7) Fix device rename handling in mpls stack, from Robert Shearman. 8) Do not trigger compaction synchronously when optimistically trying to allocate an order 3 page in alloc_skb_with_frags() and skb_page_frag_refill(). From Shaohua Li. 9) Authentication with COOKIE_ECHO is not handled properly in SCTP, fix from Marcelo Ricardo Leitner. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO net: don't wait for order-3 page allocation mpls: handle device renames for per-device sysctls net: igb: fix the start time for periodic output signals enic: fix memory leak in rq_clean enic: check return value for stat dump enic: unlock napi busy poll before unmasking intr net, swap: Remove a warning and clarify why sk_mem_reclaim is required when deactivating swap bridge: fix multicast router rlist endless loop tipc: disconnect socket directly after probe failure Revert "ipv6: Fix protocol resubmission" cfg80211: wext: clear sinfo struct before calling driver	2015-06-12 20:54:16 -10:00
Eric Dumazet	a2f0fad32b	tcp: tcp_v6_connect() cleanup Remove dead code from tcp_v6_connect() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-12 21:59:25 -07:00
Eric Dumazet	1e98a0f08a	flow_dissector: fix ipv6 dst, hop-by-hop and routing ext hdrs __skb_header_pointer() returns a pointer that must be checked. Fixes infinite loop reported by Alexei, and add __must_check to catch these errors earlier. Fixes: `6a74fcf426` ("flow_dissector: add support for dst, hop-by-hop and routing ext hdrs") Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-12 21:58:49 -07:00
Raghu Vatsavayi	5b173cf927	Fix Cavium Liquidio build related errors and warnings 1) Fixed following sparse warnings: lio_main.c:213:6: warning: symbol 'octeon_droq_bh' was not declared. Should it be static? lio_main.c:233:5: warning: symbol 'lio_wait_for_oq_pkts' was not declared. Should it be static? lio_main.c:3083:5: warning: symbol 'lio_nic_info' was not declared. Should it be static? lio_main.c:2618:16: warning: cast from restricted __be16 octeon_device.c:466:6: warning: symbol 'oct_set_config_info' was not declared. Should it be static? octeon_device.c:573:25: warning: cast to restricted __be32 octeon_device.c:582:29: warning: cast to restricted __be32 octeon_device.c:584:39: warning: cast to restricted __be32 octeon_device.c:594:13: warning: cast to restricted __be32 octeon_device.c:596:25: warning: cast to restricted __be32 octeon_device.c:613:25: warning: cast to restricted __be32 octeon_device.c:614:29: warning: cast to restricted __be64 octeon_device.c:615:29: warning: cast to restricted __be32 octeon_device.c:619:37: warning: cast to restricted __be32 octeon_device.c:623:33: warning: cast to restricted __be32 cn66xx_device.c:540:6: warning: symbol 'lio_cn6xxx_get_pcie_qlmport' was not declared. Should it be s octeon_mem_ops.c:181:16: warning: cast to restricted __be64 octeon_mem_ops.c:190:16: warning: cast to restricted __be32 octeon_mem_ops.c:196:17: warning: incorrect type in initializer 2) Fix build errors corresponding to vmalloc on linux-next 4.1. 3) Liquidio now supports 64 bit only, modified Kconfig accordingly. 4) Fix some code alignment issues based on kernel build warnings. Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-12 19:16:04 -07:00
David S. Miller	ea70477099	Merge branch 'flow_dissector-next' Tom Herbert says: ==================== flow_dissector: Fix MPLS parsing and add ext hdr support Need to shift label. Added parsing of dst, hop-by-hop, and routing extension headers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-12 14:24:28 -07:00
Tom Herbert	6a74fcf426	flow_dissector: add support for dst, hop-by-hop and routing ext hdrs If dst, hop-by-hop or routing extension headers are present determine length of the options and skip over them in flow dissection. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-12 14:24:28 -07:00
Tom Herbert	611d23c559	flow_dissector: Fix MPLS entropy label handling in flow dissector Need to shift after masking to get label value for comparison. Fixes: `b3baa0fbd0` ("mpls: Add MPLS entropy label in flow_keys") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-12 14:24:27 -07:00
Masanari Iida	b07d496177	Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt This patch fixes the URL (http to https) for wiki.wireshark.org. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-12 14:21:29 -07:00
Florian Westphal	b60f2f3d65	net: ipv4: un-inline ip_finish_output2 text data bss dec hex filename old: 16527 44 0 16571 40bb net/ipv4/ip_output.o new: 14935 44 0 14979 3a83 net/ipv4/ip_output.o Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-12 14:19:17 -07:00
Marcelo Ricardo Leitner	ae36806a62	sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO Currently, we can ask to authenticate DATA chunks and we can send DATA chunks on the same packet as COOKIE_ECHO, but if you try to combine both, the DATA chunk will be sent unauthenticated and peer won't accept it, leading to a communication failure. This happens because even though the data was queued after it was requested to authenticate DATA chunks, it was also queued before we could know that remote peer can handle authenticating, so sctp_auth_send_cid() returns false. The fix is whenever we set up an active key, re-check send queue for chunks that now should be authenticated. As a result, such packet will now contain COOKIE_ECHO + AUTH + DATA chunks, in that order. Reported-by: Liu Wei <weliu@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-12 14:18:20 -07:00
Linus Torvalds	b85dfd30cb	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "Remember about a week ago when I sent the last pull request for 4.1? Well, I lied. Now, I don't want to shift the blame, but Dan, Ming, and Richard made a liar out of me. Here are three small patches that should go into 4.1. More specifically, this pull request contains: - A Kconfig dependency for the pmem block driver, so it can't be selected if HAS_IOMEM isn't available. From Richard Weinberger. - A fix for genhd, making the ext_devt_lock softirq safe. This makes lockdep happier, since we also end up grabbing this lock on release off the softirq path. From Dan Williams. - A blk-mq software queue release fix from Ming Lei. Last two are headed to stable, first fixes an issue introduced in this cycle" * 'for-linus' of git://git.kernel.dk/linux-block: block: pmem: Add dependency on HAS_IOMEM block: fix ext_dev_lock lockdep report blk-mq: free hctx->ctxs in queue's release handler	2015-06-12 11:35:19 -07:00
Linus Torvalds	7b565d9d1f	Merge tag 'md/4.1-rc7-fixes' of git://neil.brown.name/md Pull three more md fixes from Neil Brown: "Hasn't been a good cycle for md has it :-( The main issue fixed here is a rare race which can result in two reshape threads running at once, which doesn't end well. Also a minor issue with a write to a sysfs file returning the wrong value. Backports to 4.0-stable are indicated" * tag 'md/4.1-rc7-fixes' of git://neil.brown.name/md: md: make sure MD_RECOVERY_DONE is clear before starting recovery/resync md: Close race when setting 'action' to 'idle'. md: don't return 0 from array_state_store	2015-06-12 11:33:03 -07:00
Linus Torvalds	c39f3bc659	Merge git://git.infradead.org/intel-iommu Pull VT-d hardware workarounds from David Woodhouse: "This contains a workaround for hardware issues which I thought were never going to be seen on production hardware. I'm glad I checked that before the 4.1 release... Firstly, PASID support is so broken on existing chips that we're just going to declare the old capability bit 28 as 'reserved' and change the VT-d spec to move PASID support to another bit. So any existing hardware doesn't support SVM; it only sets that (now) meaningless bit 28. That patch wasn't imperative for 4.1 because we don't have PASID support yet. But even the extended context tables are broken — if you just enable the wider tables and use none of the new bits in them, which is precisely what 4.1 does, you find that translations don't work. It's this problem which I thought was caught in time to be fixed before production, but wasn't. To avoid triggering this issue, we now only enable the extended context tables on hardware which also advertises "we have PASID support and we actually tested it this time" with the new PASID feature bit. In addition, I've added an 'intel_iommu=ecs_off' command line parameter to allow us to disable it manually if we need to" * git://git.infradead.org/intel-iommu: iommu/vt-d: Only enable extended context tables if PASID is supported iommu/vt-d: Change PASID support to bit 40 of Extended Capability Register	2015-06-12 11:28:57 -07:00
Florian Westphal	482cfc3185	netfilter: xtables: avoid percpu ruleset duplication We store the rule blob per (possible) cpu. Unfortunately this means we can waste a lot of memory on big smp machines. The ipt_entry structure ('rule head') is 112 bytes, so e.g. with maxcpu=64 one single rule eats close to 8k RAM. Since previous patch made counters percpu it appears there is nothing left in the rule blob that needs to be percpu. On my test system (144 possible cpus, 400k dummy rules) this change saves close to 9 Gigabyte of RAM. Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:27:10 +02:00
Florian Westphal	71ae0dff02	netfilter: xtables: use percpu rule counters The binary arp/ip/ip6tables ruleset is stored per cpu. The only reason left as to why we need percpu duplication are the rule counters embedded into ipt_entry et al -- since each cpu has its own copy of the rules, all counters can be lockless. The downside is that the more cpus are supported, the more memory is required. Rules are not just duplicated per online cpu but for each possible cpu, i.e. if maxcpu is 144, then rule is duplicated 144 times, not for the e.g. 64 cores present. To save some memory and also improve utilization of shared caches it would be preferable to only store the rule blob once. So we first need to separate counters and the rule blob. Instead of using entry->counters, allocate this percpu and store the percpu address in entry->counters.pcnt on CONFIG_SMP. This change makes no sense as-is; it is merely an intermediate step to remove the percpu duplication of the rule set in a followup patch. Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:27:09 +02:00
Florian Westphal	d7b5974215	netfilter: bridge: restore vlan tag when refragmenting If bridge netfilter is used with both bridge-nf-call-iptables and bridge-nf-filter-vlan-tagged enabled then ip fragments in VLAN frames are sent without the vlan header. This has never worked reliably. Turns out this relied on pre-3.5 behaviour where skb frag_list was used to store ip fragments; ip_fragment() then re-used these skbs. But since commit `3cc4949269` ("ipv4: use skb coalescing in defragmentation") this is no longer the case. ip_do_fragment now needs to allocate new skbs, but these don't contain the vlan tag information anymore. Fix it by storing vlan information of the reassembled skb in the br netfilter percpu frag area, and restore them for each of the fragments. Fixes: `3cc4949269` ("ipv4: use skb coalescing in defragmentation") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:16:55 +02:00
Florian Westphal	33b1f31392	net: ip_fragment: remove BRIDGE_NETFILTER mtu special handling Since commit `d6b915e29f` ("ip_fragment: don't forward defragmented DF packet") the largest fragment size is available in the IPCB. Therefore we no longer need to care about 'encapsulation' overhead of stripped PPPOE/VLAN headers since ip_do_fragment doesn't use device mtu in such cases. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:16:46 +02:00
Bernhard Thaler	efb6de9b4b	netfilter: bridge: forward IPv6 fragmented packets IPv6 fragmented packets are not forwarded on an ethernet bridge with netfilter ip6_tables loaded. e.g. steps to reproduce 1) create a simple bridge like this modprobe br_netfilter brctl addbr br0 brctl addif br0 eth0 brctl addif br0 eth2 ifconfig eth0 up ifconfig eth2 up ifconfig br0 up 2) place a host with an IPv6 address on each side of the bridge set IPv6 address on host A: ip -6 addr add fd01:2345:6789:1::1/64 dev eth0 set IPv6 address on host B: ip -6 addr add fd01:2345:6789:1::2/64 dev eth0 3) run a simple ping command on host A with packets > MTU ping6 -s 4000 fd01:2345:6789:1::2 4) wait some time and run e.g. "ip6tables -t nat -nvL" on the bridge IPv6 fragmented packets traverse the bridge cleanly until somebody runs "ip6tables -t nat -nvL". As soon as it is run (and netfilter modules are loaded) IPv6 fragmented packets do not traverse the bridge any more (you see no more responses in ping's output). After applying this patch IPv6 fragmented packets traverse the bridge cleanly in above scenario. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> [pablo@netfilter.org: small changes to br_nf_dev_queue_xmit] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:10:12 +02:00
Bernhard Thaler	a4611d3b74	netfilter: bridge: re-order check_hbh_len() Prepare check_hbh_len() to be called from newly introduced br_validate_ipv6() in next commit. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:09:46 +02:00
Bernhard Thaler	77d574e728	netfilter: bridge: rename br_parse_ip_options br_parse_ip_options() does not parse any IP options, it validates IP packets as a whole and the function name is misleading. Rename br_parse_ip_options() to br_validate_ipv4() and remove unneeded comments. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:09:17 +02:00
Bernhard Thaler	411ffb4fde	netfilter: bridge: refactor frag_max_size Currently frag_max_size is member of br_input_skb_cb and copied back and forth using IPCB(skb) and BR_INPUT_SKB_CB(skb) each time it is changed or used. Attach frag_max_size to nf_bridge_info and set value in pre_routing and forward functions. Use its value in forward and xmit functions. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:08:51 +02:00
Bernhard Thaler	72b31f7271	netfilter: bridge: detect NAT66 correctly and change MAC address IPv4 iptables allows REDIRECT/DNAT/SNAT of any traffic over a bridge. e.g. REDIRECT $ sysctl -w net.bridge.bridge-nf-call-iptables=1 $ iptables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 -j REDIRECT --to-ports 81 This does not work with ip6tables on a bridge in a NAT66 scenario because the REDIRECT/DNAT/SNAT is not correctly detected. The bridge pre-routing (finish) netfilter hook has to check for a possible redirect and then fix the destination MAC address. This allows ip6tables rules to be used for local REDIRECT/DNAT/SNAT, similar to the IPv4 iptables version. e.g. REDIRECT $ sysctl -w net.bridge.bridge-nf-call-ip6tables=1 $ ip6tables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 -j REDIRECT --to-ports 81 This patch makes it possible to use IPv6 NAT66 on a bridge. It was tested on a bridge with two interfaces using SNAT/DNAT NAT66 rules. Reported-by: Artie Hamilton <artiemhamilton@yahoo.com> Signed-off-by: Sven Eckelmann <sven@open-mesh.com> [bernhard.thaler@wvnet.at: rebased, add indirect call to ip6_route_input()] [bernhard.thaler@wvnet.at: rebased, split into separate patches] Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:08:07 +02:00
Bernhard Thaler	8cae308d2b	netfilter: bridge: re-order br_nf_pre_routing_finish_ipv6() Put br_nf_pre_routing_finish_ipv6() after daddr_was_changed() and br_nf_pre_routing_finish_bridge() to prepare calling these functions from there. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:07:56 +02:00
Bernhard Thaler	d39a33ed9b	netfilter: bridge: refactor clearing BRNF_NF_BRIDGE_PREROUTING Use binary AND on the complement of BRNF_NF_BRIDGE_PREROUTING to unset the bit in nf_bridge->mask. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:07:53 +02:00
Marcelo Ricardo Leitner	779668450a	netfilter: conntrack: warn the user if there is a better helper to use After `db29a9508a` ("netfilter: conntrack: disable generic tracking for known protocols"), if the specific helper is built but not loaded (a standard for most distributions), systems with a restrictive firewall but weak configuration regarding which netfilter modules to load will silently stop working. This patch adds a warning message so the sysadmin knows where to start looking. It's a pr_warn_once regardless of the protocol itself, but it should be enough to give a hint on where to look. Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:06:24 +02:00
David Woodhouse	c83b2f20fd	iommu/vt-d: Only enable extended context tables if PASID is supported Although the extended tables are theoretically a completely orthogonal feature to PASID and anything else that uses the newly-available bits, some of the early hardware has problems even when all we do is enable them and use only the same bits that were in the old context tables. For now, there's no motivation to support extended tables unless we're going to use PASID support to do SVM. So just don't use them unless PASID support is advertised too. Also add a command-line bailout just in case later chips also have issues. The equivalent problem for PASID support has already been fixed with the upcoming VT-d spec update and commit `bd00c606a` ("iommu/vt-d: Change PASID support to bit 40 of Extended Capability Register"), because the problematic platforms use the old definition of the PASID-capable bit, which is now marked as reserved and meaningless. So with this change, we'll magically start using ECS again only when we see the new hardware advertising "hey, we have PASID support and we actually tested it this time" on bit 40. The VT-d hardware architect has promised that we are not going to have any reason to support ECS without PASID any time soon, and he'll make sure he checks with us before changing that. In the future, if hypothetical new features also use new bits in the context tables and can be seen on implementations without PASID support, we might need to add their feature bits to the ecs_enabled() macro. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2015-06-12 11:31:25 +01:00
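
Commit d39a33ed9b above describes unsetting a flag by ANDing a mask with the complement of that flag. A minimal sketch of the idiom follows. The flag values and the helper name `ex_clear_flag` are hypothetical and chosen for illustration only; they are not the kernel's definitions, and this is not the patch itself.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration only; the kernel's real
 * BRNF_* definitions are not reproduced here. */
#define EX_FLAG_PREROUTING 0x10u
#define EX_FLAG_OTHER      0x02u

/* Clear one flag by ANDing the mask with the flag's complement.
 * Every other bit in the mask is left unchanged. */
static inline uint32_t ex_clear_flag(uint32_t mask, uint32_t flag)
{
    return mask & ~flag;
}
```

A general property of this idiom is that it is idempotent: clearing a flag that is already clear leaves the mask as it was, whereas toggling with XOR would set the flag again.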

