linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Eric Dumazet	472c2e07ee	tcp: add one skb cache for tx On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks. 20.69% [kernel] [k] queued_spin_lock_slowpath 5.64% [kernel] [k] _raw_spin_lock 3.83% [kernel] [k] syscall_return_via_sysret 3.48% [kernel] [k] __entry_text_start 1.76% [kernel] [k] __netif_receive_skb_core 1.64% [kernel] [k] __fget For each sendmsg(), we allocate one skb, and free it at the time ACK packet comes. In many cases, ACK packets are handled by another cpus, and this unfortunately incurs heavy costs for slab layer. This patch uses an extra pointer in socket structure, so that we try to reuse the same skb and avoid these expensive costs. We cache at most one skb per socket so this should be safe as far as memory pressure is concerned. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-23 21:57:38 -04:00
Eric Dumazet	dc05360fee	net: convert rps_needed and rfs_needed to new static branch api We prefer static_branch_unlikely() over static_key_false() these days. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-23 21:57:38 -04:00
David S. Miller	7c1508e5f6	Merge branch 'net-dev-BYPASS-for-lockless-qdisc' Paolo Abeni says: ==================== net: dev: BYPASS for lockless qdisc This patch series is aimed at improving xmit performances of lockless qdisc in the uncontended scenario. After the lockless refactor pfifo_fast can't leverage the BYPASS optimization. Due to retpolines the overhead for the avoidables enqueue and dequeue operations has increased and we see measurable regressions. The first patch introduces the BYPASS code path for lockless qdisc, and the second one optimizes such path further. Overall this avoids up to 3 indirect calls per xmit packet. Detailed performance figures are reported in the 2nd patch. v2 -> v3: - qdisc_is_empty() has a const argument (Eric) v1 -> v2: - use really an 'empty' flag instead of 'not_empty', as suggested by Eric ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-23 21:52:37 -04:00
Paolo Abeni	ba27b4cdaa	net: dev: introduce support for sch BYPASS for lockless qdisc With commit `c5ad119fb6` ("net: sched: pfifo_fast use skb_array") pfifo_fast no longer benefit from the TCQ_F_CAN_BYPASS optimization. Due to retpolines the cost of the enqueue()/dequeue() pair has become relevant and we observe measurable regression for the uncontended scenario when the packet-rate is below line rate. After commit `46b1c18f9d` ("net: sched: put back q.qlen into a single location") we can check for empty qdisc with a reasonably fast operation even for nolock qdiscs. This change extends TCQ_F_CAN_BYPASS support to nolock qdisc. The new chunk of code mirrors closely the existing one for traditional qdisc, leveraging a newly introduced helper to read atomically the qdisc length. Tested with pktgen in queue xmit mode, with pfifo_fast, a MQ device, and MQ root qdisc: threads vanilla patched kpps kpps 1 2465 2889 2 4304 5188 4 7898 9589 Same as above, but with a single queue device: threads vanilla patched kpps kpps 1 2556 2827 2 2900 2900 4 5000 5000 8 4700 4700 No mesaurable changes in the contended scenarios, and more 10% improvement in the uncontended ones. v1 -> v2: - rebased after flag name change Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-23 21:52:36 -04:00
Paolo Abeni	28cff537ef	net: sched: add empty status flag for NOLOCK qdisc The queue is marked not empty after acquiring the seqlock, and it's up to the NOLOCK qdisc clearing such flag on dequeue. Since the empty status lays on the same cache-line of the seqlock, it's always hot on cache during the updates. This makes the empty flag update a little bit loosy. Given the lack of synchronization between enqueue and dequeue, this is unavoidable. v2 -> v3: - qdisc_is_empty() has a const argument (Eric) v1 -> v2: - use really an 'empty' flag instead of 'not_empty', as suggested by Eric Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-23 21:52:36 -04:00
Soheil Hassas Yeganeh	576fd2f7ca	tcp: add documentation for tcp_ca_state Add documentation to the tcp_ca_state enum, since this enum is exposed in uapi. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Sowmini Varadhan <sowmini05@gmail.com> Acked-by: Sowmini Varadhan <sowmini05@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-23 21:50:05 -04:00
Eric Dumazet	e6d1407013	tcp: remove conditional branches from tcp_mstamp_refresh() tcp_clock_ns() (aka ktime_get_ns()) is using monotonic clock, so the checks we had in tcp_mstamp_refresh() are no longer relevant. This patch removes cpu stall (when the cache line is not hot) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-23 21:43:21 -04:00
Florian Fainelli	a7a01ab312	net: phy: Correct Cygnus/Omega PHY driver prompt The tristate prompt should have been replaced rather than defined a few lines below, rebase mistake. Fixes: `17cc982176` ("net: phy: Move Omega PHY entry to Cygnus PHY driver") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-23 20:54:18 -04:00
Johannes Berg	3b0f31f2b8	genetlink: make policy common to family Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-22 10:38:23 -04:00
Heiner Kallweit	601ed4d6dc	r8169: use netif_start_queue instead of netif_wake_qeueue in rtl8169_start_xmit Replace the call to netif_wake_queue in rtl8169_start_xmit with netif_start_queue as we don't need to actually wake up the queue since we are still in mid transmit so we just need to reset the bit so it doesn't prevent the next transmit. (Description shamelessly copied from a mail sent by Alex.) Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-22 10:35:25 -04:00
Heiner Kallweit	110a2432c5	net: phy: aquantia: add downshift support Aquantia PHY's of the AQR107 family support the downshift feature. Add support for it as standard PHY tunable so that it can be controlled via ethtool. The AQCS109 supports a proprietary 2-pair 1Gbps mode. If two such PHY's are connected to each other with a 2-pair cable, they may not be able to establish a link if both advertise modes > 1Gbps. v2: - add downshift event detection - warn if downshift occurred - read downshifted rate from vendor register - enable downshift per default on all AQR107 family members Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-22 10:26:21 -04:00
David S. Miller	1d965c4def	Merge branch 'Refactor-flower-classifier-to-remove-dependency-on-rtnl-lock' Vlad Buslov says: ==================== Refactor flower classifier to remove dependency on rtnl lock Currently, all netlink protocol handlers for updating rules, actions and qdiscs are protected with single global rtnl lock which removes any possibility for parallelism. This patch set is a third step to remove rtnl lock dependency from TC rules update path. Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added. TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are already registered with this flag and only take rtnl lock when qdisc or classifier requires it. Classifiers can indicate that their ops callbacks don't require caller to hold rtnl lock by setting the TCF_PROTO_OPS_DOIT_UNLOCKED flag. The goal of this change is to refactor flower classifier to support unlocked execution and register it with unlocked flag. This patch set implements following changes to make flower classifier concurrency-safe: - Implement reference counting for individual filters. Change fl_get to take reference to filter. Implement tp->ops->put callback that was introduced in cls API patch set to release reference to flower filter. - Use tp->lock spinlock to protect internal classifier data structures from concurrent modification. - Handle concurrent tcf proto deletion by returning EAGAIN, which will cause cls API to retry and create new proto instance or return error to the user (depending on message type). - Handle concurrent insertion of filter with same priority and handle by returning EAGAIN, which will cause cls API to lookup filter again and process it accordingly to netlink message flags. - Extend flower mask with reference counting and protect masks list with masks_lock spinlock. - Prevent concurrent mask insertion by inserting temporary value to masks hash table. This is necessary because mask initialization is a sleeping operation and cannot be done while holding tp->lock. Both chain level and classifier level conflicts are resolved by returning -EAGAIN to cls API that results restart of whole operation. This retry mechanism is a result of fine-grained locking approach used in this and previous changes in series and is necessary to allow concurrent updates on same chain instance. Alternative approach would be to lock the whole chain while updating filters on any of child tp's, adding and removing classifier instances from the chain. However, since most CPU-intensive parts of filter update code are specifically in classifier code and its dependencies (extensions and hw offloads), such approach would negate most of the gains introduced by this change and previous changes in the series when updating same chain instance. Tcf hw offloads API is not changed by this patch set and still requires caller to hold rtnl lock. Refactored flower classifier tracks rtnl lock state by means of 'rtnl_held' flag provided by cls API and obtains the lock before calling hw offloads. Following patch set will lift this restriction and refactor cls hw offloads API to support unlocked execution. With these changes flower classifier is safely registered with TCF_PROTO_OPS_DOIT_UNLOCKED flag in last patch. Changes from V2 to V3: - Rebase on latest net-next Changes from V1 to V2: - Extend cover letter with explanation about retry mechanism. - Rebase on current net-next. - Patch 1: - Use rcu_dereference_raw() for tp->root dereference. - Update comment in fl_head_dereference(). - Patch 2: - Remove redundant check in fl_change error handling code. - Add empty line between error check and new handle assignment. - Patch 3: - Refactor loop in fl_get_next_filter() to improve readability. - Patch 4: - Refactor __fl_delete() to improve readability. - Patch 6: - Fix comment in fl_check_assign_mask(). - Patch 9: - Extend commit message. - Fix error code in comment. - Patch 11: - Fix fl_hw_replace_filter() to always release rtnl lock in error handlers. - Patch 12: - Don't take rtnl lock before calling __fl_destroy_filter() in workqueue context. - Extend commit message with explanation why flower still takes rtnl lock before calling hardware offloads API. Github: <https://github.com/vbuslov/linux/tree/unlocked-flower-cong3> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	9214919006	net: sched: flower: set unlocked flag for flower proto ops Set TCF_PROTO_OPS_DOIT_UNLOCKED for flower classifier to indicate that its ops callbacks don't require caller to hold rtnl lock. Don't take rtnl lock in fl_destroy_filter_work() that is executed on workqueue instead of being called by cls API and is not affected by setting TCF_PROTO_OPS_DOIT_UNLOCKED. Rtnl mutex is still manually taken by flower classifier before calling hardware offloads API that has not been updated for unlocked execution. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	c24e43d83b	net: sched: flower: track rtnl lock state Use 'rtnl_held' flag to track if caller holds rtnl lock. Propagate the flag to internal functions that need to know rtnl lock state. Take rtnl lock before calling tcf APIs that require it (hw offload, bind filter, etc.). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	3d81e7118d	net: sched: flower: protect flower classifier state with spinlock struct tcf_proto was extended with spinlock to be used by classifiers instead of global rtnl lock. Use it to protect shared flower classifier data structures (handle_idr, mask hashtable and list) and fields of individual filters that can be accessed concurrently. This patch set uses tcf_proto->lock as per instance lock that protects all filters on tcf_proto. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	272ffaadeb	net: sched: flower: handle concurrent tcf proto deletion Without rtnl lock protection tcf proto can be deleted concurrently. Check tcf proto 'deleting' flag after taking tcf spinlock to verify that no concurrent deletion is in progress. Return EAGAIN error if concurrent deletion detected, which will cause caller to retry and possibly create new instance of tcf proto. Retry mechanism is a result of fine-grained locking approach used in this and previous changes in series and is necessary to allow concurrent updates on same chain instance. Alternative approach would be to lock the whole chain while updating filters on any of child tp's, adding and removing classifier instances from the chain. However, since most CPU-intensive parts of filter update code are specifically in classifier code and its dependencies (extensions and hw offloads), such approach would negate most of the gains introduced by this change and previous changes in the series when updating same chain instance. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	9a2d938998	net: sched: flower: handle concurrent filter insertion in fl_change Check if user specified a handle and another filter with the same handle was inserted concurrently. Return EAGAIN to retry filter processing (in case it is an overwrite request). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	259e60f967	net: sched: flower: protect masks list with spinlock Protect modifications of flower masks list with spinlock to remove dependency on rtnl lock and allow concurrent access. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	195c234d15	net: sched: flower: handle concurrent mask insertion Without rtnl lock protection masks with same key can be inserted concurrently. Insert temporary mask with reference count zero to masks hashtable. This will cause any concurrent modifications to retry. Wait for rcu grace period to complete after removing temporary mask from masks hashtable to accommodate concurrent readers. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Suggested-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	f48ef4d5b0	net: sched: flower: add reference counter to flower mask Extend fl_flow_mask structure with reference counter to allow parallel modification without relying on rtnl lock. Use rcu read lock to safely lookup mask and increment reference counter in order to accommodate concurrent deletes. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	b2552b8c40	net: sched: flower: track filter deletion with flag In order to prevent double deletion of filter by concurrent tasks when rtnl lock is not used for synchronization, add 'deleted' filter field. Check value of this field when modifying filters and return error if concurrent deletion is detected. Refactor __fl_delete() to accept pointer to 'last' boolean as argument, and return error code as function return value instead. This is necessary to signal concurrent filter delete to caller. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	061775583e	net: sched: flower: introduce reference counting for filters Extend flower filters with reference counting in order to remove dependency on rtnl lock in flower ops and allow to modify filters concurrently. Reference to flower filter can be taken/released concurrently as soon as it is marked as 'unlocked' by last patch in this series. Use atomic reference counter type to make concurrent modifications safe. Always take reference to flower filter while working with it: - Modify fl_get() to take reference to filter. - Implement tp->put() callback as fl_put() function to allow cls API to release reference taken by fl_get(). - Modify fl_change() to assume that caller holds reference to fold and take reference to fnew. - Take reference to filter while using it in fl_walk(). Implement helper functions to get/put filter reference counter. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	620da48608	net: sched: flower: refactor fl_change As a preparation for using classifier spinlock instead of relying on external rtnl lock, rearrange code in fl_change. The goal is to group the code which changes classifier state in single block in order to allow following commits in this set to protect it from parallel modification with tp->lock. Data structures that require tp->lock protection are mask hashtable and filters list, and classifier handle_idr. fl_hw_replace_filter() is a sleeping function and cannot be called while holding a spinlock. In order to execute all sequence of changes to shared classifier data structures atomically, call fl_hw_replace_filter() before modifying them. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Vlad Buslov	e474619a24	net: sched: flower: don't check for rtnl on head dereference Flower classifier only changes root pointer during init and destroy. Cls API implements reference counting for tcf_proto, so there is no danger of concurrent access to tp when it is being destroyed, even without protection provided by rtnl lock. Implement new function fl_head_dereference() to dereference tp->root without checking for rtnl lock. Use it in all flower function that obtain head pointer instead of rtnl_dereference(). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:32:17 -07:00
Jakub Kicinski	31f1a0e37c	nfp: remove defines for unused control bits NFP driver ABI contains bits for L2 switching which were never implemented in initially envisioned form. Remove the defines, and open up the possibility of reclaiming the bits for other uses. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:01:55 -07:00
David S. Miller	143eb9ac9f	Merge branch 'rhashtable-cleanups' NeilBrown says: ==================== Two clean-ups for rhashtable. These two patches make small improvements to rhashtable, but are otherwise unrelated. Thanks to Herbert, Miguel, and Paul for the review. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:01:10 -07:00
NeilBrown	f7ad68bf98	rhashtable: rename rht_for_eachcontinue as from. The pattern set by list.h is that for_each..continue() iterators start at the next entry after the given one, while for_each..from() iterators start at the given entry. The rht_for_eachcontinue() iterators are documented as though the start at the 'next' entry, but actually start at the given entry, and they are used expecting that behaviour. So fix the documentation and change the names to from for consistency with list.h Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:01:10 -07:00
NeilBrown	4feb7c7a4f	rhashtable: don't hold lock on first table throughout insertion. rhashtable_try_insert() currently holds a lock on the bucket in the first table, while also locking buckets in subsequent tables. This is unnecessary and looks like a hold-over from some earlier version of the implementation. As insert and remove always lock a bucket in each table in turn, and as insert only inserts in the final table, there cannot be any races that are not covered by simply locking a bucket in each table in turn. When an insert call reaches that last table it can be sure that there is no matchinf entry in any other table as it has searched them all, and insertion never happens anywhere but in the last table. The fact that code tests for the existence of future_tbl while holding a lock on the relevant bucket ensures that two threads inserting the same key will make compatible decisions about which is the "last" table. This simplifies the code and allows the ->rehash field to be discarded. We still need a way to ensure that a dead bucket_table is never re-linked by rhashtable_walk_stop(). This can be achieved by calling call_rcu() inside the locked region, and checking with rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the bucket table is empty and dead. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 14:01:10 -07:00
David S. Miller	83b038db25	Merge branch 'net-phy-Move-Omega-PHY-entry-to-Cygnus-PHY-driver' Florian Fainelli says: ==================== net: phy: Move Omega PHY entry to Cygnus PHY driver In order to pave the way for adding some specific Omega PHY features that may not be desirable on other products covered by the bcm7xxx PHY driver, split the Omega PHY entry into the Cygnus PHY driver such that the PHY drivers are reflective of product lines/business units maintaining them within Broadcom. No functional changes intended. ==================== Acked-by: Arun Parameswaran <arun.parameswaran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:41:52 -07:00
Florian Fainelli	17cc982176	net: phy: Move Omega PHY entry to Cygnus PHY driver Cygnus and Omega are part of the same business unit and product line, it makes sense to group PHY entries by products such that a platform can select only the drivers that it needs. Bring all the functionality that the BCM7XXX_28NM_GPHY() macro hides for us and remove the Omega PHY entry from bcm7xxx.c. As an added bonus, we now have a proper mdio_device_id entry to permit auto-loading. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:41:26 -07:00
Florian Fainelli	f878fe5685	net: phy: Prepare for moving Omega out of bcm7xxx The Omega PHY entry was added to bcm7xxx.c out of convenience and this breaks the one driver per product line paradigm that was applied up until now. Since the AFE initialization is shared between Omega and BCM7xxx move the relevant functions to bcm-phy-lib.[ch]. No functional changes introduced. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:41:26 -07:00
Julian Wiedmann	02afc7ad45	net: dst: remove gc leftovers Get rid of some obsolete gc-related documentation and macros that were missed in commit `5b7c9a8ff8` ("net: remove dst gc related code"). CC: Wei Wang <weiwan@google.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:39:25 -07:00
David S. Miller	88f808f312	Merge branch 'net-broadcom-Remove-print-of-base-address' Florian Fainelli says: ==================== net: broadcom: Remove print of base address Some broadcom MDIO/switch/Ethernet MAC drivers insist on printing the base register virtual address which has little value. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:32:35 -07:00
Florian Fainelli	62be757fbe	net: systemport: Remove print of base address Since commit `ad67b74d24` ("printk: hash addresses printed with %p") pointers are being hashed when printed. Displaying the virtual memory at bootup time is not helpful, especially given we use a dev_info() which already displays the platform device's address. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:32:35 -07:00
Florian Fainelli	fbb7bc45ea	net: dsa: bcm_sf2: Remove print of base address Since commit `ad67b74d24` ("printk: hash addresses printed with %p") pointers are being hashed when printed. Displaying the virtual memory at bootup time is not helpful, we use a dev_info() print which already displays the platform device's address. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:32:35 -07:00
Florian Fainelli	647aed232a	net: phy: mdio-bcm-unimac: Remove print of base address Since commit `ad67b74d24` ("printk: hash addresses printed with %p") pointers are being hashed when printed. Displaying the virtual memory at bootup time is not helpful, especially given we use a dev_info() which already displays the platform device's address. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:32:35 -07:00
David Ahern	10585b4342	ipv6: Remove fallback argument from ip6_hold_safe net and null_fallback are redundant. Remove null_fallback in favor of !net check. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:30:34 -07:00
David Ahern	9ab948a91b	ipv4: Allow amount of dirty memory from fib resizing to be controllable fib_trie implementation calls synchronize_rcu when a certain amount of pages are dirty from freed entries. The number of pages was determined experimentally in 2009 (commit `c3059477fc`). At the current setting, synchronize_rcu is called often -- 51 times in a second in one test with an average of an 8 msec delay adding a fib entry. The total impact is a lot of slow down modifying the fib. This is seen in the output of 'time' - the difference between real time and sys+user. For example, using 720,022 single path routes and 'ip -batch'[1]: $ time ./ip -batch ipv4/routes-1-hops real 0m14.214s user 0m2.513s sys 0m6.783s So roughly 35% of the actual time to install the routes is from the ip command getting scheduled out, most notably due to synchronize_rcu (this is observed using 'perf sched timehist'). This patch makes the amount of dirty memory configurable between 64k where the synchronize_rcu is called often (small, low end systems that are memory sensitive) to 64M where synchronize_rcu is called rarely during a large FIB change (for high end systems with lots of memory). The default is 512kB which corresponds to the current setting of 128 pages with a 4kB page size. As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu in a second blocking for up to 30 msec in a single instance, and a total of almost 100 msec across the 4 calls in the second. The trade off is allowing FIB entries to consume more memory in a given time window but but with much better fib insertion rates (~30% increase in prefixes/sec). With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch file runs in: $ time ./ip -batch ipv4/routes-1-hops real 0m9.692s user 0m2.491s sys 0m6.769s So the dead time is reduced to about 1/2 second or <5% of the real time. [1] 'ip' modified to not request ACK messages which improves route insertion times by about 20% Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:29:53 -07:00
Kirill Tkhai	12132768dc	tun: Remove unused first parameter of tun_get_iff() Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:19:15 -07:00
Kirill Tkhai	0c3e0e3bb6	tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device In commit `f2780d6d74` "tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device" it was missed that tun may change its net ns, while net ns of socket remains the same as it was created initially. SIOCGSKNS returns net ns of socket, so it is not suitable for obtaining net ns of device. We may have two tun devices with the same names in two net ns, and in this case it's not possible to determ, which of them fd refers to (TUNGETIFF will return the same name). This patch adds new ioctl() cmd for obtaining net ns of a device. Reported-by: Harald Albrecht <harald.albrecht@gmx.net> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 13:19:15 -07:00
David S. Miller	28b18b39c5	Merge branch 'ipv6-Change-addrconf_f6i_alloc-to-use-ip6_route_info_create' David Ahern says: ==================== ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create addrconf_f6i_alloc is the last caller of fib6_info_alloc besides ip6_route_info_create. There really is no good reason for it do its own fib6_info initialization, so convert it to call ip6_route_info_create. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 10:16:54 -07:00
David Ahern	c7a1ce397a	ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create Change addrconf_f6i_alloc to generate a fib6_config and call ip6_route_info_create. addrconf_f6i_alloc is the last caller to fib6_info_alloc besides ip6_route_info_create, and there is no reason for it to do its own initialization on a fib6_info. Host routes need to be created even if the device is down, so add a new flag, fc_ignore_dev_down, to fib6_config and update fib6_nh_init to not error out if device is not up. Notes on the conversion: - ip_fib_metrics_init is the same as fib6_config has fc_mx set to NULL and fc_mx_len set to 0 - dst_nocount is handled by the RTF_ADDRCONF flag - dst_host is handled by fc_dst_len = 128 nh_gw does not get set after the conversion to ip6_route_info_create but it should not be set in addrconf_f6i_alloc since this is a host route not a gateway route. Everything else is a straight forward map between fib6_info and fib6_config. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 10:16:54 -07:00
David Ahern	67f6951347	ipv6: Move setting default metric for routes ip6_route_info_create is a low level function for ensuring fc_metric is set. Move the check and default setting to the 2 locations that do not already set fc_metric before calling ip6_route_info_create. This is required for the next patch which moves addrconf allocations to ip6_route_info_create and want the metric for host routes to be 0. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 10:16:54 -07:00
Vakul Garg	a88c26f671	net/tls: Replace kfree_skb() with consume_skb() To free the skb in normal course of processing, consume_skb() should be used. Only for failure paths, skb_free() is intended to be used. https://www.kernel.org/doc/htmldocs/networking/API-consume-skb.html Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 10:14:26 -07:00
Hoang Le	08e046c896	tipc: fix a null pointer deref In commit `c55c8edafa` ("tipc: smooth change between replicast and broadcast") we introduced new method to eliminate the risk of message reordering that happen in between different nodes. Unfortunately, we forgot checking at receiving side to ignore intra node. We fix this by checking and returning if arrived message from intra node. syzbot report: ================================================================== kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 7820 Comm: syz-executor418 Not tainted 5.0.0+ #61 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:tipc_mcast_filter_msg+0x21b/0x13d0 net/tipc/bcast.c:782 Code: 45 c0 0f 84 39 06 00 00 48 89 5d 98 e8 ce ab a5 fa 49 8d bc 24 c8 00 00 00 48 b9 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 9a 0e 00 00 49 8b 9c 24 c8 00 00 00 48 be 00 00 RSP: 0018:ffff8880959defc8 EFLAGS: 00010202 RAX: 0000000000000019 RBX: ffff888081258a48 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: ffffffff86cab862 RDI: 00000000000000c8 RBP: ffff8880959df030 R08: ffff8880813d0200 R09: ffffed1015d05bc8 R10: ffffed1015d05bc7 R11: ffff8880ae82de3b R12: 0000000000000000 R13: 000000000000002c R14: 0000000000000000 R15: ffff888081258a48 FS: 000000000106a880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020001cc0 CR3: 0000000094a20000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tipc_sk_filter_rcv+0x182d/0x34f0 net/tipc/socket.c:2168 tipc_sk_enqueue net/tipc/socket.c:2254 [inline] tipc_sk_rcv+0xc45/0x25a0 net/tipc/socket.c:2305 tipc_sk_mcast_rcv+0x724/0x1020 net/tipc/socket.c:1209 tipc_mcast_xmit+0x7fe/0x1200 net/tipc/bcast.c:410 tipc_sendmcast+0xb36/0xfc0 net/tipc/socket.c:820 __tipc_sendmsg+0x10df/0x18d0 net/tipc/socket.c:1358 tipc_sendmsg+0x53/0x80 net/tipc/socket.c:1291 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xdd/0x130 net/socket.c:661 ___sys_sendmsg+0x806/0x930 net/socket.c:2260 __sys_sendmsg+0x105/0x1d0 net/socket.c:2298 __do_sys_sendmsg net/socket.c:2307 [inline] __se_sys_sendmsg net/socket.c:2305 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2305 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4401c9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffd887fa9d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004401c9 RDX: 0000000000000000 RSI: 0000000020002140 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401a50 R13: 0000000000401ae0 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace ba79875754e1708f ]--- Reported-by: syzbot+be4bdf2cc3e85e952c50@syzkaller.appspotmail.com Fixes: `c55c8eda` ("tipc: smooth change between replicast and broadcast") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 09:56:55 -07:00
Hoang Le	77d5ad4048	tipc: fix use-after-free in tipc_sk_filter_rcv skb free-ed in: 1/ condition 1: tipc_sk_filter_rcv -> tipc_sk_proto_rcv 2/ condition 2: tipc_sk_filter_rcv -> tipc_group_filter_msg This leads to a "use-after-free" access in the next condition. We fix this by intializing the variable at declaration, then it is safe to check this variable to continue processing if condition matches. syzbot report: ================================================================== BUG: KASAN: use-after-free in tipc_sk_filter_rcv+0x2166/0x34f0 net/tipc/socket.c:2167 Read of size 4 at addr ffff88808ea58534 by task kworker/u4:0/7 CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.0.0+ #61 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: tipc_send tipc_conn_send_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131 tipc_sk_filter_rcv+0x2166/0x34f0 net/tipc/socket.c:2167 tipc_sk_enqueue net/tipc/socket.c:2254 [inline] tipc_sk_rcv+0xc45/0x25a0 net/tipc/socket.c:2305 tipc_topsrv_kern_evt+0x3b7/0x580 net/tipc/topsrv.c:610 tipc_conn_send_to_sock+0x43e/0x5f0 net/tipc/topsrv.c:283 tipc_conn_send_work+0x65/0x80 net/tipc/topsrv.c:303 process_one_work+0x98e/0x1790 kernel/workqueue.c:2269 worker_thread+0x98/0xe40 kernel/workqueue.c:2415 kthread+0x357/0x430 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 Reported-by: syzbot+e863893591cc7a622e40@syzkaller.appspotmail.com Fixes: `c55c8eda` ("tipc: smooth change between replicast and broadcast") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 09:56:55 -07:00
Stephen Suryaputra	0b03a5ca8b	ipv6: Add icmp_echo_ignore_anycast for ICMPv6 In addition to icmp_echo_ignore_multicast, there is a need to also prevent responding to pings to anycast addresses for security. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-20 16:29:37 -07:00
YueHaibing	a534ea30e7	net: isdn: Make isdn_ppp_mp_discard and isdn_ppp_mp_reassembly static Fix sparse warnings: drivers/isdn/i4l/isdn_ppp.c:1891:16: warning: symbol 'isdn_ppp_mp_discard' was not declared. Should it be static? drivers/isdn/i4l/isdn_ppp.c:1903:6: warning: symbol 'isdn_ppp_mp_reassembly' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-20 12:25:59 -07:00
YueHaibing	881d7afdff	net: hns3: Make hclge_destroy_cmd_queue static Fix sparse warning: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c:414:6: warning: symbol 'hclge_destroy_cmd_queue' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-20 11:20:12 -07:00
David S. Miller	75d317c409	Merge branch 'net-refactor-ndo_select_queue' Paolo Abeni says: ==================== net: refactor ndo_select_queue() Currently, on most devices implementing ndo_select_queue(), we get 2 indirect calls per xmit packet, at least in some scenarios. We can avoid one of such indirect calls refactoring the ndo_select_queue() usage so that we don't need anymore the 'fallback' argument. The first patch renames a helper used later as a public API, the second one changes the af packet implementation so that it uses the common infrastructure to select the xmit queue, and the second patch drops the now unneeded argument from ndo_select_queue(). Alternatively we could use the INDIRECT_CALL_WRAPPER infrastructure to avoid the fallback indirect call in the common case, but this solution allows also for some code cleanup. v1 -> v2: - renamed select queue helpers, as per Eric's and David's suggestions ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-20 11:18:55 -07:00

1 2 3 4 5 ...

824345 Commits All Branches Search

824345 Commits

All Branches