OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jakub Kicinski	b6df00789e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-06-29 15:45:27 -07:00
Florian Westphal	cfc61c598e	xfrm: replay: avoid xfrm replay notify indirection replay protection is implemented using a callback structure and then called via x->repl->notify(), x->repl->recheck(), and so on. all the differect functions are always built-in, so this could be direct calls instead. This first patch prepares for removal of the x->repl structure. Add an enum with the three available replay modes to the xfrm_state structure and then replace all x->repl->notify() calls by the new xfrm_replay_notify() helper. The helper checks the enum internally to adapt behaviour as needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-06-21 09:55:06 +02:00
Sabrina Dubroca	fe9f1d8779	xfrm: add state hashtable keyed by seq When creating new states with seq set in xfrm_usersa_info, we walk through all the states already installed in that netns to find a matching ACQUIRE state (__xfrm_find_acq_byseq, called from xfrm_state_add). This causes severe slowdowns on systems with a large number of states. This patch introduces a hashtable using x->km.seq as key, so that the corresponding state can be found in a reasonable time. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-05-14 13:52:01 +02:00
Sabrina Dubroca	b515d26372	xfrm: xfrm_state_mtu should return at least 1280 for ipv6 Jianwen reported that IPv6 Interoperability tests are failing in an IPsec case where one of the links between the IPsec peers has an MTU of 1280. The peer generates a packet larger than this MTU, the router replies with a "Packet too big" message indicating an MTU of 1280. When the peer tries to send another large packet, xfrm_state_mtu returns 1280 - ipsec_overhead, which causes ip6_setup_cork to fail with EINVAL. We can fix this by forcing xfrm_state_mtu to return IPV6_MIN_MTU when IPv6 is used. After going through IPsec, the packet will then be fragmented to obey the actual network's PMTU, just before leaving the host. Currently, TFC padding is capped to PMTU - overhead to avoid fragementation: after padding and encapsulation, we still fit within the PMTU. That behavior is preserved in this patch. Fixes: `91657eafb6` ("xfrm: take net hdr len into account for esp payload size calculation") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-04-19 12:43:50 +02:00
Ahmed S. Darwish	bc8e0adff3	net: xfrm: Use sequence counter with associated spinlock A sequence counter write section must be serialized or its internal state can get corrupted. A plain seqcount_t does not contain the information of which lock must be held to guaranteee write side serialization. For xfrm_state_hash_generation, use seqcount_spinlock_t instead of plain seqcount_t. This allows to associate the spinlock used for write serialization with the sequence counter. It thus enables lockdep to verify that the write serialization lock is indeed held before entering the sequence counter write section. If lockdep is disabled, this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-03-22 07:38:08 +01:00
Ahmed S. Darwish	e88add19f6	net: xfrm: Localize sequence counter per network namespace A sequence counter write section must be serialized or its internal state can get corrupted. The "xfrm_state_hash_generation" seqcount is global, but its write serialization lock (net->xfrm.xfrm_state_lock) is instantiated per network namespace. The write protection is thus insufficient. To provide full protection, localize the sequence counter per network namespace instead. This should be safe as both the seqcount read and write sections access data exclusively within the network namespace. It also lays the foundation for transforming "xfrm_state_hash_generation" data type from seqcount_t to seqcount_LOCKNAME_t in further commits. Fixes: `b65e3d7be0` ("xfrm: state: add sequence count to detect hash resizes") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2021-03-22 07:35:42 +01:00
Linus Torvalds	ca5b877b6c	selinux/stable-5.11 PR 20201214 -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl/YBtEUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNnwA/9Ek8DG/1t8CEoJxpoRvwovQxNo+bi 0rCT9vqvx9PeCwoZi/0Vp6oKmpE1HADvbeB/+e00VrbLYnzE3oRY6VkpjoZRofKS vc0/MzHSFxFUR1OTHwCefcXlPLK+bfitQbX5jEMeVyQCXNXXIrN7CnJf1LmCeLTR kQBPlEN9lt7HyNVAi34FhOD/TQbWnFHgl2z5puffgri6cWnc+TALKMYytUZ+rYex NYndDJW5b3g5kTat2eErn0FruxfzloGs0xMIiWb+z2i9kl41D+dkKPdAN7idqCSC Jv0nJP/bDftzA0wOe9szmGaLQzu7YnCN5kiWcSspatZVnon42Cy/tp9tiuPGLRFU XtelDfpyX6o3CLN0tX7LQEO+GYxPzvM6iaR2OrsChWPozUIIR3TLQg7jJN4bvNKl TR6gCGZCoAeS5JLNGjzVKxT/oKQY+tCLLlYXQdQY6swNFi3EKmPr+K1D9lgm98fO f3d1QmWiZZNmtxxoVogT0qoQYjkfgpnm3dVx813Vt+lwHlVpHGMEPpO27iD3/RYb w2yWOJaGKwMD8iL0l+Cm6CPW0/nE5FFISQjWgC8b4Vgxlyan6+L9eViqGICkrUQ2 Edo0i1YFFZ4utHYkDf1VYBbJ+36KyCtdktgLAcbgnePiPB3E1XBsXTIIStSUIbVQ iEbTkBlsCG4GIeU= =6Cqb -----END PGP SIGNATURE----- Merge tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: "While we have a small number of SELinux patches for v5.11, there are a few changes worth highlighting: - Change the LSM network hooks to pass flowi_common structs instead of the parent flowi struct as the LSMs do not currently need the full flowi struct and they do not have enough information to use it safely (missing information on the address family). This patch was discussed both with Herbert Xu (representing team netdev) and James Morris (representing team LSMs-other-than-SELinux). - Fix how we handle errors in inode_doinit_with_dentry() so that we attempt to properly label the inode on following lookups instead of continuing to treat it as unlabeled. - Tweak the kernel logic around allowx, auditallowx, and dontauditx SELinux policy statements such that the auditx/dontauditx are effective even without the allowx statement. Everything passes our test suite" * tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: lsm,selinux: pass flowi_common instead of flowi to the LSM hooks selinux: Fix fall-through warnings for Clang selinux: drop super_block backpointer from superblock_security_struct selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling selinux: allow dontauditx and auditallowx rules to take effect without allowx selinux: fix error initialization in inode_doinit_with_dentry()	2020-12-16 11:01:04 -08:00
Paul Moore	3df98d7921	lsm,selinux: pass flowi_common instead of flowi to the LSM hooks As pointed out by Herbert in a recent related patch, the LSM hooks do not have the necessary address family information to use the flowi struct safely. As none of the LSMs currently use any of the protocol specific flowi information, replace the flowi pointers with pointers to the address family independent flowi_common struct. Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2020-11-23 18:36:21 -05:00
Yu Kuai	48f486e13f	net: xfrm: fix memory leak in xfrm_user_policy() if xfrm_get_translator() failed, xfrm_user_policy() return without freeing 'data', which is allocated in memdup_sockptr(). Fixes: `96392ee5a1` ("xfrm/compat: Translate 32-bit user_policy from sockptr") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-11-10 09:14:25 +01:00
Jakub Kicinski	2da4c187ae	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== 1) Fix packet receiving of standard IP tunnels when the xfrm_interface module is installed. From Xin Long. 2) Fix a race condition between spi allocating and hash list resizing. From zhuoliang zhang. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-04 08:12:52 -08:00
zhuoliang zhang	a779d91314	net: xfrm: fix a race condition during allocing spi we found that the following race condition exists in xfrm_alloc_userspi flow: user thread state_hash_work thread ---- ---- xfrm_alloc_userspi() __find_acq_core() /alloc new xfrm_state:x/ xfrm_state_alloc() /schedule state_hash_work thread/ xfrm_hash_grow_check() xfrm_hash_resize() xfrm_alloc_spi /hold lock/ x->id.spi = htonl(spi) spin_lock_bh(&net->xfrm.xfrm_state_lock) /waiting lock release/ xfrm_hash_transfer() spin_lock_bh(&net->xfrm.xfrm_state_lock) /add x into hlist:net->xfrm.state_byspi/ hlist_add_head_rcu(&x->byspi) spin_unlock_bh(&net->xfrm.xfrm_state_lock) /add x into hlist:net->xfrm.state_byspi 2 times/ hlist_add_head_rcu(&x->byspi) 1. a new state x is alloced in xfrm_state_alloc() and added into the bydst hlist in __find_acq_core() on the LHS; 2. on the RHS, state_hash_work thread travels the old bydst and tranfers every xfrm_state (include x) into the new bydst hlist and new byspi hlist; 3. user thread on the LHS gets the lock and adds x into the new byspi hlist again. So the same xfrm_state (x) is added into the same list_hash (net->xfrm.state_byspi) 2 times that makes the list_hash become an inifite loop. To fix the race, x->id.spi = htonl(spi) in the xfrm_alloc_spi() is moved to the back of spin_lock_bh, sothat state_hash_work thread no longer add x which id.spi is zero into the hash_list. Fixes: `f034b5d4ef` ("[XFRM]: Dynamic xfrm_state hash table sizing.") Signed-off-by: zhuoliang zhang <zhuoliang.zhang@mediatek.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-10-23 09:08:55 +02:00
David S. Miller	8b0308fe31	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-05 18:40:01 -07:00
Herbert Xu	e94ee17134	xfrm: Use correct address family in xfrm_state_find The struct flowi must never be interpreted by itself as its size depends on the address family. Therefore it must always be grouped with its original family value. In this particular instance, the original family value is lost in the function xfrm_state_find. Therefore we get a bogus read when it's coupled with the wrong family which would occur with inter- family xfrm states. This patch fixes it by keeping the original family value. Note that the same bug could potentially occur in LSM through the xfrm_state_pol_flow_match hook. I checked the current code there and it seems to be safe for now as only secid is used which is part of struct flowi_common. But that API should be changed so that so that we don't get new bugs in the future. We could do that by replacing fl with just secid or adding a family field. Reported-by: syzbot+577fbac3145a6eb2e7a5@syzkaller.appspotmail.com Fixes: `48b8d78315` ("[XFRM]: State selection update to use inner...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-25 09:59:51 +02:00
Dmitry Safonov	96392ee5a1	xfrm/compat: Translate 32-bit user_policy from sockptr Provide compat_xfrm_userpolicy_info translation for xfrm setsocketopt(). Reallocate buffer and put the missing padding for 64-bit message. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-24 08:53:04 +02:00
Dmitry Safonov	c9e7c76d70	xfrm: Provide API to register translator module Add a skeleton for xfrm_compat module and provide API to register it in xfrm_state.ko. struct xfrm_translator will have function pointers to translate messages received from 32-bit userspace or to be sent to it from 64-bit kernel. module_get()/module_put() are used instead of rcu_read_lock() as the module will vmalloc() memory for translation. The new API is registered with xfrm_state module, not with xfrm_user as the former needs translator for user_policy set by setsockopt() and xfrm_user already uses functions from xfrm_state. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-24 08:53:03 +02:00
Antony Antony	8366685b28	xfrm: clone whole liftime_cur structure in xfrm_do_migrate When we clone state only add_time was cloned. It missed values like bytes, packets. Now clone the all members of the structure. v1->v3: - use memcpy to copy the entire structure Fixes: `80c9abaabf` ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-07 12:45:22 +02:00
Antony Antony	7aa05d3047	xfrm: clone XFRMA_SEC_CTX in xfrm_do_migrate XFRMA_SEC_CTX was not cloned from the old to the new. Migrate this attribute during XFRMA_MSG_MIGRATE v1->v2: - return -ENOMEM on error v2->v3: - fix return type to int Fixes: `80c9abaabf` ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-07 12:45:22 +02:00
Antony Antony	545e5c5716	xfrm: clone XFRMA_SET_MARK in xfrm_do_migrate XFRMA_SET_MARK and XFRMA_SET_MARK_MASK was not cloned from the old to the new. Migrate these two attributes during XFRMA_MSG_MIGRATE Fixes: `9b42c1f179` ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-09-07 12:45:22 +02:00
Christoph Hellwig	c6d1b26a8f	net/xfrm: switch xfrm_user_policy to sockptr_t Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-24 15:41:53 -07:00
Huang Zijiang	a4c278d1be	xfrm: Use kmem_cache_zalloc() instead of kmem_cache_alloc() with flag GFP_ZERO. Use kmem_cache_zalloc instead of manually setting kmem_cache_alloc with flag GFP_ZERO since kzalloc sets allocated memory to zero. Change in v2: add indation Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn> Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2020-02-19 09:33:56 +01:00
Sabrina Dubroca	e27cca96cd	xfrm: add espintcp (RFC 8229) TCP encapsulation of IKE and IPsec messages (RFC 8229) is implemented as a TCP ULP, overriding in particular the sendmsg and recvmsg operations. A Stream Parser is used to extract messages out of the TCP stream using the first 2 bytes as length marker. Received IKE messages are put on "ike_queue", waiting to be dequeued by the custom recvmsg implementation. Received ESP messages are sent to XFRM, like with UDP encapsulation. Some of this code is taken from the original submission by Herbert Xu. Currently, only IPv4 is supported, like for UDP encapsulation. Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-12-09 09:59:07 +01:00
Steffen Klassert	86c6739eda	xfrm: Fix memleak on xfrm state destroy We leak the page that we use to create skb page fragments when destroying the xfrm_state. Fix this by dropping a page reference if a page was assigned to the xfrm_state. Fixes: `cac2661c53` ("esp4: Avoid skb_cow_data whenever possible") Reported-by: JD <jdtxs00@gmail.com> Reported-by: Paul Wouters <paul@nohats.ca> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-11-07 10:38:07 +01:00
Florian Westphal	c7b37c769d	xfrm: remove get_mtu indirection from xfrm_type esp4_get_mtu and esp6_get_mtu are exactly the same, the only difference is a single sizeof() (ipv4 vs. ipv6 header). Merge both into xfrm_state_mtu() and remove the indirection. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-07-01 06:16:40 +02:00
Florian Westphal	4f518e802c	xfrm: remove type and offload_type map from xfrm_state_afinfo Only a handful of xfrm_types exist, no need to have 512 pointers for them. Reduces size of afinfo struct from 4k to 120 bytes on 64bit platforms. Also, the unregister function doesn't need to return an error, no single caller does anything useful with it. Just place a WARN_ON() where needed instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-06-06 08:34:50 +02:00
Florian Westphal	3aaf3915a3	xfrm: remove state and template sort indirections from xfrm_state_afinfo No module dependency, placing this in xfrm_state.c avoids need for an indirection. This also removes the state spinlock -- I don't see why we would need to hold it during sorting. This in turn allows to remove the 'net' argument passed to xfrm_tmpl_sort. Last, remove the EXPORT_SYMBOL, there are no modular callers. For the CONFIG_IPV6=m case, vmlinux size increase is about 300 byte. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-06-06 08:34:50 +02:00
Florian Westphal	e46817472a	xfrm: remove init_flags indirection from xfrm_state_afinfo There is only one implementation of this function; just call it directly. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-06-05 13:16:30 +02:00
Florian Westphal	5c1b9ab3ec	xfrm: remove init_temprop indirection from xfrm_state_afinfo same as previous patch: just place this in the caller, no need to have an indirection for a structure initialization. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-06-05 13:16:30 +02:00
Florian Westphal	bac9593515	xfrm: remove init_tempsel indirection from xfrm_state_afinfo Simple initialization, handle it in the caller. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-06-05 13:16:30 +02:00
Thomas Gleixner	457c899653	treewide: Add SPDX license identifier for missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-21 10:50:45 +02:00
Linus Torvalds	80f232121b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Highlights: 1) Support AES128-CCM ciphers in kTLS, from Vakul Garg. 2) Add fib_sync_mem to control the amount of dirty memory we allow to queue up between synchronize RCU calls, from David Ahern. 3) Make flow classifier more lockless, from Vlad Buslov. 4) Add PHY downshift support to aquantia driver, from Heiner Kallweit. 5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces contention on SLAB spinlocks in heavy RPC workloads. 6) Partial GSO offload support in XFRM, from Boris Pismenny. 7) Add fast link down support to ethtool, from Heiner Kallweit. 8) Use siphash for IP ID generator, from Eric Dumazet. 9) Pull nexthops even further out from ipv4/ipv6 routes and FIB entries, from David Ahern. 10) Move skb->xmit_more into a per-cpu variable, from Florian Westphal. 11) Improve eBPF verifier speed and increase maximum program size, from Alexei Starovoitov. 12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit spinlocks. From Neil Brown. 13) Allow tunneling with GUE encap in ipvs, from Jacky Hu. 14) Improve link partner cap detection in generic PHY code, from Heiner Kallweit. 15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan Maguire. 16) Remove SKB list implementation assumptions in SCTP, your's truly. 17) Various cleanups, optimizations, and simplifications in r8169 driver. From Heiner Kallweit. 18) Add memory accounting on TX and RX path of SCTP, from Xin Long. 19) Switch PHY drivers over to use dynamic featue detection, from Heiner Kallweit. 20) Support flow steering without masking in dpaa2-eth, from Ioana Ciocoi. 21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri Pirko. 22) Increase the strict parsing of current and future netlink attributes, also export such policies to userspace. From Johannes Berg. 23) Allow DSA tag drivers to be modular, from Andrew Lunn. 24) Remove legacy DSA probing support, also from Andrew Lunn. 25) Allow ll_temac driver to be used on non-x86 platforms, from Esben Haabendal. 26) Add a generic tracepoint for TX queue timeouts to ease debugging, from Cong Wang. 27) More indirect call optimizations, from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits) cxgb4: Fix error path in cxgb4_init_module net: phy: improve pause mode reporting in phy_print_status dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings net: macb: Change interrupt and napi enable order in open net: ll_temac: Improve error message on error IRQ net/sched: remove block pointer from common offload structure net: ethernet: support of_get_mac_address new ERR_PTR error net: usb: smsc: fix warning reported by kbuild test robot staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check net: dsa: support of_get_mac_address new ERR_PTR error net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats vrf: sit mtu should not be updated when vrf netdev is the link net: dsa: Fix error cleanup path in dsa_init_module l2tp: Fix possible NULL pointer dereference taprio: add null check on sched_nest to avoid potential null pointer dereference net: mvpp2: cls: fix less than zero check on a u32 variable net_sched: sch_fq: handle non connected flows net_sched: sch_fq: do not assume EDT packets are ordered net: hns3: use devm_kcalloc when allocating desc_cb net: hns3: some cleanup for struct hns3_enet_ring ...	2019-05-07 22:03:58 -07:00
Linus Torvalds	a0e928ed7c	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "This cycle had the following changes: - Timer tracing improvements (Anna-Maria Gleixner) - Continued tasklet reduction work: remove the hrtimer_tasklet (Thomas Gleixner) - Fix CPU hotplug remove race in the tick-broadcast mask handling code (Thomas Gleixner) - Force upper bound for setting CLOCK_REALTIME, to fix ABI inconsistencies with handling values that are close to the maximum supported and the vagueness of when uptime related wraparound might occur. Make the consistent maximum the year 2232 across all relevant ABIs and APIs. (Thomas Gleixner) - various cleanups and smaller fixes" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: Fix typos in comments tick/broadcast: Fix warning about undefined tick_broadcast_oneshot_offline() timekeeping: Force upper bound for setting CLOCK_REALTIME timer/trace: Improve timer tracing timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps timer: Move trace point to get proper index tick/sched: Update tick_sched struct documentation tick: Remove outgoing CPU from broadcast masks timekeeping: Consistently use unsigned int for seqcount snapshot softirq: Remove tasklet_hrtimer xfrm: Replace hrtimer tasklet with softirq hrtimer mac80211_hwsim: Replace hrtimer tasklet with softirq hrtimer	2019-05-06 14:50:46 -07:00
David S. Miller	ff24e4980a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Three trivial overlapping conflicts. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-02 22:14:21 -04:00
Florian Westphal	bb9cd077e2	xfrm: remove unneeded export_symbols None of them have any external callers, make them static. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-23 07:42:20 +02:00
Florian Westphal	c9500d7b7d	xfrm: store xfrm_mode directly, not its address This structure is now only 4 bytes, so its more efficient to cache a copy rather than its address. No significant size difference in allmodconfig vmlinux. With non-modular kernel that has all XFRM options enabled, this series reduces vmlinux image size by ~11kb. All xfrm_mode indirections are gone and all modes are built-in. before (ipsec-next master): text data bss dec filename 21071494 7233140 11104324 39408958 vmlinux.master after this series: 21066448 7226772 11104324 39397544 vmlinux.patched With allmodconfig kernel, the size increase is only 362 bytes, even all the xfrm config options removed in this series are modular. before: text data bss dec filename 15731286 6936912 4046908 26715106 vmlinux.master after this series: 15731492 6937068 4046908 26715468 vmlinux Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:15:28 +02:00
Florian Westphal	4c145dce26	xfrm: make xfrm modes builtin after previous changes, xfrm_mode contains no function pointers anymore and all modules defining such struct contain no code except an init/exit functions to register the xfrm_mode struct with the xfrm core. Just place the xfrm modes core and remove the modules, the run-time xfrm_mode register/unregister functionality is removed. Before: text data bss dec filename 7523 200 2364 10087 net/xfrm/xfrm_input.o 40003 628 440 41071 net/xfrm/xfrm_state.o 15730338 6937080 4046908 26714326 vmlinux 7389 200 2364 9953 net/xfrm/xfrm_input.o 40574 656 440 41670 net/xfrm/xfrm_state.o 15730084 6937068 4046908 26714060 vmlinux The xfrm_mode_{transport,tunnel,beet} modules are gone. v2: replace CONFIG_INET6_XFRM_MODE_ IS_ENABLED guards with CONFIG_IPV6 ones rather than removing them. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:15:17 +02:00
Florian Westphal	733a5fac2f	xfrm: remove afinfo pointer from xfrm_mode Adds an EXPORT_SYMBOL for afinfo_get_rcu, as it will now be called from ipv6 in case of CONFIG_IPV6=m. This change has virtually no effect on vmlinux size, but it reduces afinfo size and allows followup patch to make xfrm modes const. v2: mark if (afinfo) tests as likely (Sabrina) re-fetch afinfo according to inner_mode in xfrm_prepare_input(). Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:15:09 +02:00
Florian Westphal	b262a69582	xfrm: place af number into xfrm_mode struct This will be useful to know if we're supposed to decode ipv4 or ipv6. While at it, make the unregister function return void, all module_exit functions did just BUG(); there is never a point in doing error checks if there is no way to handle such error. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-04-08 09:13:46 +02:00
Cong Wang	dbb2483b2a	xfrm: clean up xfrm protocol checks In commit `6a53b75932` ("xfrm: check id proto in validate_tmpl()") I introduced a check for xfrm protocol, but according to Herbert IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so it should be removed from validate_tmpl(). And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific protocols, this is why xfrm_state_flush() could still miss IPPROTO_ROUTING, which leads that those entries are left in net->xfrm.state_all before exit net. Fix this by replacing IPSEC_PROTO_ANY with zero. This patch also extracts the check from validate_tmpl() to xfrm_id_proto_valid() and uses it in parse_ipsecrequest(). With this, no other protocols should be added into xfrm. Fixes: `6a53b75932` ("xfrm: check id proto in validate_tmpl()") Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-03-26 08:35:36 +01:00
Thomas Gleixner	671422b220	xfrm: Replace hrtimer tasklet with softirq hrtimer Switch the timer to HRTIMER_MODE_SOFT, which executed the timer callback in softirq context and remove the hrtimer_tasklet. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lkml.kernel.org/r/20190301224821.29843-3-bigeasy@linutronix.de	2019-03-22 14:35:31 +01:00
Cong Wang	f75a2804da	xfrm: destroy xfrm_state synchronously on net exit path xfrm_state_put() moves struct xfrm_state to the GC list and schedules the GC work to clean it up. On net exit call path, xfrm_state_flush() is called to clean up and xfrm_flush_gc() is called to wait for the GC work to complete before exit. However, this doesn't work because one of the ->destructor(), ipcomp_destroy(), schedules the same GC work again inside the GC work. It is hard to wait for such a nested async callback. This is also why syzbot still reports the following warning: WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351 ... ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153 cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551 process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153 worker_thread+0x143/0x14a0 kernel/workqueue.c:2296 kthread+0x357/0x430 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 In fact, it is perfectly fine to bypass GC and destroy xfrm_state synchronously on net exit call path, because it is in process context and doesn't need a work struct to do any blocking work. This patch introduces xfrm_state_put_sync() which simply bypasses GC, and lets its callers to decide whether to use this synchronous version. On net exit path, xfrm_state_fini() and xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is blocking, it can use xfrm_state_put_sync() directly too. Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to reflect this change. Fixes: `b48c05ab5d` ("xfrm: Fix warning in xfrm6_tunnel_net_exit.") Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2019-02-05 06:29:20 +01:00
David S. Miller	fde9cd69a5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-12-18 1) Fix error return code in xfrm_output_one() when no dst_entry is attached to the skb. From Wei Yongjun. 2) The xfrm state hash bucket count reported to userspace is off by one. Fix from Benjamin Poirier. 3) Fix NULL pointer dereference in xfrm_input when skb_dst_force clears the dst_entry. 4) Fix freeing of xfrm states on acquire. We use a dedicated slab cache for the xfrm states now, so free it properly with kmem_cache_free. From Mathias Krause. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-18 11:43:26 -08:00
Mathias Krause	4a135e5389	xfrm_user: fix freeing of xfrm states on acquire Commit `565f0fa902` ("xfrm: use a dedicated slab cache for struct xfrm_state") moved xfrm state objects to use their own slab cache. However, it missed to adapt xfrm_user to use this new cache when freeing xfrm states. Fix this by introducing and make use of a new helper for freeing xfrm_state objects. Fixes: `565f0fa902` ("xfrm: use a dedicated slab cache for struct xfrm_state") Reported-by: Pan Bian <bianpan2016@163.com> Cc: <stable@vger.kernel.org> # v4.18+ Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-23 07:51:32 +01:00
Benjamin Poirier	ca92e173ab	xfrm: Fix bucket count reported to userspace sadhcnt is reported by `ip -s xfrm state count` as "buckets count", not the hash mask. Fixes: `28d8909bc7` ("[XFRM]: Export SAD info.") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-11-06 12:25:47 +01:00
Dmitry Safonov	98f76206b3	compat: Cleanup in_compat_syscall() callers Now that in_compat_syscall() is consistent on all architectures and does not longer report true on native i686, the workarounds (ifdeffery and helpers) can be removed. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andy Lutomirsky <luto@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Stultz <john.stultz@linaro.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-efi@vger.kernel.org Cc: netdev@vger.kernel.org Link: https://lkml.kernel.org/r/20181012134253.23266-3-dima@arista.com	2018-11-01 13:02:21 +01:00
Nathan Harold	5baf4f9c00	xfrm: Allow xfrmi if_id to be updated by UPDSA Allow attaching an SA to an xfrm interface id after the creation of the SA, so that tasks such as keying which must be done as the SA is created, can remain separate from the decision on how to route traffic from an SA. This permits SA creation to be decomposed in to three separate steps: 1) allocation of a SPI 2) algorithm and key negotiation 3) insertion into the data path Signed-off-by: Nathan Harold <nharold@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-20 10:19:19 +02:00
Benedict Wong	bc56b33404	xfrm: Remove xfrmi interface ID from flowi In order to remove performance impact of having the extra u32 in every single flowi, this change removes the flowi_xfrm struct, prefering to take the if_id as a method parameter where needed. In the inbound direction, if_id is only needed during the __xfrm_check_policy() function, and the if_id can be determined at that point based on the skb. As such, xfrmi_decode_session() is only called with the skb in __xfrm_check_policy(). In the outbound direction, the only place where if_id is needed is the xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is directly passed into the xfrm_lookup_with_ifid() call. All existing callers can still call xfrm_lookup(), which uses a default if_id of 0. This change does not change any behavior of XFRMIs except for improving overall system performance via flowi size reduction. This change has been tested against the Android Kernel Networking Tests: https://android.googlesource.com/kernel/tests/+/master/net/test Signed-off-by: Benedict Wong <benedictwong@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-20 10:14:41 +02:00
Arnd Bergmann	386c5680e2	xfrm: use time64_t for in-kernel timestamps The lifetime managment uses '__u64' timestamps on the user space interface, but 'unsigned long' for reading the current time in the kernel with get_seconds(). While this is probably safe beyond y2038, it will still overflow in 2106, and the get_seconds() call is deprecated because fo that. This changes the xfrm time handling to use time64_t consistently, along with reading the time using the safer ktime_get_real_seconds(). It still suffers from problems that can happen from a concurrent settimeofday() call or (to a lesser degree) a leap second update, but since the time stamps are part of the user API, there is nothing we can do to prevent that. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-11 15:25:30 +02:00
Nathan Harold	6d8e85ffe1	xfrm: Allow Set Mark to be Updated Using UPDSA Allow UPDSA to change "set mark" to permit policy separation of packet routing decisions from SA keying in systems that use mark-based routing. The set mark, used as a routing and firewall mark for outbound packets, is made update-able which allows routing decisions to be handled independently of keying/SA creation. To maintain consistency with other optional attributes, the set mark is only updated if sent with a non-zero value. The per-SA lock and the xfrm_state_lock are taken in that order to avoid a deadlock with xfrm_timer_handler(), which also takes the locks in that order. Signed-off-by: Nathan Harold <nharold@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-07-01 17:47:10 +02:00
Florian Westphal	e4db5b61c5	xfrm: policy: remove pcpu policy cache Kristian Evensen says: In a project I am involved in, we are running ipsec (Strongswan) on different mt7621-based routers. Each router is configured as an initiator and has around ~30 tunnels to different responders (running on misc. devices). Before the flow cache was removed (kernel 4.9), we got a combined throughput of around 70Mbit/s for all tunnels on one router. However, we recently switched to kernel 4.14 (4.14.48), and the total throughput is somewhere around 57Mbit/s (best-case). I.e., a drop of around 20%. Reverting the flow cache removal restores, as expected, performance levels to that of kernel 4.9. When pcpu xdst exists, it has to be validated first before it can be used. A negative hit thus increases cost vs. no-cache. As number of tunnels increases, hit rate decreases so this pcpu caching isn't a viable strategy. Furthermore, the xdst cache also needs to run with BH off, so when removing this the bh disable/enable pairs can be removed too. Kristian tested a 4.14.y backport of this change and reported increased performance: In our tests, the throughput reduction has been reduced from around -20% to -5%. We also see that the overall throughput is independent of the number of tunnels, while before the throughput was reduced as the number of tunnels increased. Reported-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-06-25 17:46:06 +02:00
Steffen Klassert	7e6526404a	xfrm: Add a new lookup key to match xfrm interfaces. This patch adds the xfrm interface id as a lookup key for xfrm states and policies. With this we can assign states and policies to virtual xfrm interfaces. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Benedict Wong <benedictwong@google.com> Tested-by: Benedict Wong <benedictwong@google.com> Tested-by: Antony Antony <antony@phenome.org> Reviewed-by: Eyal Birger <eyal.birger@gmail.com>	2018-06-23 16:07:15 +02:00

1 2 3 4 5 ...

334 Commits