Commit Graph

200177 Commits

Author SHA1 Message Date
Eric W. Biederman 109f6e39fa af_unix: Allow SO_PEERCRED to work across namespaces.
Use struct pid and struct cred to store the peer credentials on struct
sock.  This gives enough information to convert the peer credential
information to a value relative to whatever namespace the socket is in
at the time.

This removes nasty surprises when using SO_PEERCRED on socket
connetions where the processes on either side are in different pid and
user namespaces.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:55:55 -07:00
Eric W. Biederman 3f551f9436 sock: Introduce cred_to_ucred
To keep the coming code clear and to allow both the sock
code and the scm code to share the logic introduce a
fuction to translate from struct cred to struct ucred.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:55:35 -07:00
Eric W. Biederman 5c1469de75 user_ns: Introduce user_nsmap_uid and user_ns_map_gid.
Define what happens when a we view a uid from one user_namespace
in another user_namepece.

- If the user namespaces are the same no mapping is necessary.

- For most cases of difference use overflowuid and overflowgid,
  the uid and gid currently used for 16bit apis when we have a 32bit uid
  that does fit in 16bits.  Effectively the situation is the same,
  we want to return a uid or gid that is not assigned to any user.

- For the case when we happen to be mapping the uid or gid of the
  creator of the target user namespace use uid 0 and gid as confusing
  that user with root is not a problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:55:34 -07:00
Eric W. Biederman 812e876e84 scm: Reorder scm_cookie.
Reorder the fields in scm_cookie so they pack better on 64bit.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:55:34 -07:00
Don Skidmore 756725064f ixgbe: add comment on SFP+ ID for Active DA
These comments were forgotten in the initial patch to add this
functionality.  This patch corrects that.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:47:30 -07:00
Tom Hughes fa68a78227 Clear IFF_XMIT_DST_RELEASE for teql interfaces
https://bugzilla.kernel.org/show_bug.cgi?id=16183

The sch_teql module, which can be used to load balance over a set of
underlying interfaces, stopped working after 2.6.30 and has been
broken in all kernels since then for any underlying interface which
requires the addition of link level headers.

The problem is that the transmit routine relies on being able to
access the destination address in the skb in order to do address
resolution once it has decided which underlying interface it is going
to transmit through.

In 2.6.31 the IFF_XMIT_DST_RELEASE flag was introduced, and set by
default for all interfaces, which causes the destination address to be
released before the transmit routine for the interface is called.

The solution is to clear that flag for teql interfaces.

Signed-off-by: Tom Hughes <tom@compton.nu>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:47:30 -07:00
Anirban Chakraborty 434d7b380f qlcnic: Bumped up version number
Changed the driver version number to 5.0.4

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:45:51 -07:00
Anirban Chakraborty 0e33c6649e qlcnic: Fix a bug in setting up NIC partitioning mode
The driver was not detecting the presence of NIC partitioning capability of the
firmware properly. Now, it checks the eswitch set bit in the FW capabilities
register and accordingly sets the driver mode as NPAR capable or not.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:45:51 -07:00
Florian Westphal 8c76368174 syncookies: check decoded options against sysctl settings
Discard the ACK if we find options that do not match current sysctl
settings.

Previously it was possible to create a connection with sack, wscale,
etc. enabled even if the feature was disabled via sysctl.

Also remove an unneeded call to tcp_sack_reset() in
cookie_check_timestamp: Both call sites (cookie_v4_check,
cookie_v6_check) zero "struct tcp_options_received", hand it to
tcp_parse_options() (which does not change tcp_opt->num_sacks/dsack)
and then call cookie_check_timestamp().

Even if num_sacks/dsacks were changed, the structure is allocated on
the stack and after cookie_check_timestamp returns only a few selected
members are copied to the inet_request_sock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:42:15 -07:00
Manfred Rudigier 97553f7f3e gianfar: Fix setup of RX time stamping
Previously the RCTRL_TS_ENABLE bit was set unconditionally. However, if
the RCTRL_TS_ENABLE is set without TMR_CTRL[TE], the driver does not work
properly on some boards (Anton had problems with the MPC8313ERDB and
MPC8568EMDS).

With this patch the bit will only be set if requested from user space
with the SIOCSHWTSTAMP ioctl command, meaning that time stamping is
disabled during normal operation. Users who are not interested in time
stamps will not experience problems with buggy CPU revisions or
performance drops any more.

The setting of TMR_CTRL[TE] is still up to the user. This is considered
safe because users wanting HW timestamps must initialize the eTSEC clock
first anyway, e.g. with the recently submitted PTP clock driver.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
Reviewed-by: Anton Vorontsov <cbouatmailru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:40:02 -07:00
David S. Miller d8d326dc7a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-06-16 13:41:55 -07:00
Christoph Fritz 021570e55b mac80211: fix warn, enum may be used uninitialized
regression introduced by b8d92c9c14

In function ‘ieee80211_work_rx_queued_mgmt’:
warning: ‘rma’ may be used uninitialized in this function

this re-adds default value WORK_ACT_NONE back to rma

Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 15:49:16 -04:00
Bruno Randolf 6a0076e02a ath5k: report PHY error frames only for chips which need it
Only report PHY error frames for ANI on chipsets which do not have PHY error
counters in hardware.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:07 -04:00
Bruno Randolf 8786123b51 ath5k: review RX descriptor functions
Reviewed RX descriptor functions against the HAL sources. Some minor changes:

  - check size before making changes to the descriptor

  - whitespace

  - add comments about 5210 timestamps. this needs to be adressed later!

  - FIFO overrun error only available on 5210

  - rs_phyerr should not be OR'ed

  - clear the whole ath5k_rx_status structure before using, instead of
    zeroing specific fields.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:07 -04:00
Bruno Randolf 1884a3678c ath5k: take descriptor differences between 5210 and 5211 into account
There are some differences between 5210 and 5211 descriptors which we did not
take into account before.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:06 -04:00
Bruno Randolf 2237e92884 ath5k: update 5210/5211 frame types
Update 5210 frame types to match the HAL. We have to apply the same bitshift to
the constants as we use later.

Add 5211 specific frame types.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:06 -04:00
Bruno Randolf 03417bc605 ath5k: review and add comments for descriptors
I carefully reviewed desh.h against the HAL sources. Added comments and made
differences between 5210, 5211 and 5212 more clear by adding _521x to the
defines which are specific to that chipset. Renamed some defines. No functional
changes.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:05 -04:00
Bruno Randolf 62412a8f0d ath5k: remove pointless rx error overlay struct
ath5k_hw_rx_error was only used once, where we could easily just use
ath5k_hw_rx_status as well, so remove it.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:04 -04:00
Bruno Randolf 2847109f73 ath5k: cosmetic changes in ath5k_hw_proc_5212_rx_status()
Just whitespace and indentation.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:04 -04:00
Bruno Randolf a666819354 ath5k: use direct function calls for descriptors when possible
Use direct function calls for ath5k_hw_setup_rx_desc() and
ath5k_hw_setup_mrr_tx_desc() instead of a function pointer which always pointed
to the same function in the case of ath5k_hw_setup_rx_desc() and which is
easily unified in the case of ath5k_hw_setup_mrr_tx_desc().

Also simplify the initialization function for the remaining function pointers.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:03 -04:00
Bruno Randolf 02a78b42f8 ath5k: move checks and stats into new function
Create a new function ath5k_receive_frame_ok() which checks for errors, updates
error statistics and tells us if we want to further "receive" this frame or
not. This way we can avoid a goto and have a cleaner separation between buffer
handling and other things.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:03 -04:00
Bruno Randolf 8a89f063e7 ath5k: split descriptor handling and frame receive
Move frame reception into it's own function to have a clearer separation
between buffer and descriptor handling and things that are done when we
actually receive a frame.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:02 -04:00
Bruno Randolf b16062facb ath5k: unify rx descriptor error handling
There is no reason for a special handling (return) here, just break like we do
with the checks before.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:02 -04:00
Bruno Randolf 39d63f2a3f ath5k: reset more pointers after we free skbs
After we free skbs for receive or transmit descriptors, make sure we have no
pointers to the now invalid memory address.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:01 -04:00
Bruno Randolf 0452d4a508 ath5k: print more errors when decriptor setup fails
Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:00 -04:00
Bruno Randolf 265faadd6a ath5k: fix rx descriptor debugging
In the debug ouptut rx_status_0 was printed twice instead of rx_status_1. Also
make the debug message more clear.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:59:00 -04:00
Bruno Randolf beade6363c ath5k: fix some comment typos
Fix comment about dma sizes, brackets were missing. Replace 'insure' with
'ensure'.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:58:59 -04:00
Bruno Randolf 9e4e43f20f ath5k: rename ath5k_txbuf_free() to ath5k_txbuf_free_skb()
Rename ath5k_txbuf_free() to ath5k_txbuf_free_skb() since this is what it does:
it frees the skb and not the buf. Same for ath5k_rxbuf_free().

Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:58:58 -04:00
Bruno Randolf 8d67a0310f ath5k: more debug prints for resets
Add a debug print for every case of reset.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:58:58 -04:00
ubuntu@tjworld.net c086abae5b ipw2200: Enable LED by default
BugLink: http://bugs.launchpad.net/bugs/21367

Enable LED by default and update the MODULE_PARM_DESC.  The original
reason for defaulting to disabled was documented in 2005 and noted, "The
LED code has been reported to hang some systems when running ifconfig
and is therefore disabled by default."  This no longer appears
applicable and users have been requesting this be enabled for several
years.

Signed-off-by: TJ <ubuntu@tjworld.net>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:58:57 -04:00
Leann Ogasawara 7484bdc056 p54usb: Comment out duplicate Medion MD40900 device id
The Medion MD40900 device id [0x0cde, 0x0006] is defined twice.
Comment out the duplicate.

Originally-by: Ben Collins <ben.collins@ubuntu.com>
Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:58:57 -04:00
John W. Linville 3d3b33bd99 zd1211rw: change ZD_REGDOMAIN_JAPAN_* naming
ZD_REGDOMAIN_JAPAN_ADD and ZD_REGDOMAIN_JAPAN_GW_US54GXS seem a little
verbose to me...

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:58:46 -04:00
Eric Dumazet 317fe0e6c5 inetpeer: restore small inet_peer structures
Addition of rcu_head to struct inet_peer added 16bytes on 64bit arches.

Thats a bit unfortunate, since old size was exactly 64 bytes.

This can be solved, using an union between this rcu_head an four fields,
that are normally used only when a refcount is taken on inet_peer.
rcu_head is used only when refcnt=-1, right before structure freeing.

Add a inet_peer_refcheck() function to check this assertion for a while.

We can bring back SLAB_HWCACHE_ALIGN qualifier in kmem cache creation.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 11:55:39 -07:00
Kouhei Sutou 71184ba482 zd1211rw: add 0x49 -> JP regulatory domain map
0x49 is used by PLANEX GW-US54GXS (2019:5303).

Signed-off-by: Kouhei Sutou <kou@clear-code.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-16 14:39:44 -04:00
David S. Miller fdb93f8ac3 gadget/rndis: dev_get_stats() now returns rtnl_link_stats64.
Based upon a report by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 21:50:14 -07:00
Eric Dumazet 5f2f892095 inetpeer: do not use zero refcnt for freed entries
Followup of commit aa1039e73c (inetpeer: RCU conversion)

Unused inet_peer entries have a null refcnt.

Using atomic_inc_not_zero() in rcu lookups is not going to work for
them, and slow path is taken.

Fix this using -1 marker instead of 0 for deleted entries.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 21:47:39 -07:00
Herbert Xu d5f31fbfd8 netpoll: Use correct primitives for RCU dereferencing
Now that RCU debugging checks for matching rcu_dereference calls
and rcu_read_lock, we need to use the correct primitives or face
nasty warnings.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 21:44:29 -07:00
Herbert Xu 9f70b0fcec bridge: Add const to dummy br_netpoll_send_skb
The version of br_netpoll_send_skb used when netpoll is off is
missing a const thus causing a warning.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 21:43:48 -07:00
Herbert Xu fed396a585 bridge: Fix OOM crash in deliver_clone
The bridge multicast patches introduced an OOM crash in the forward
path, when deliver_clone fails to clone the skb.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 21:43:07 -07:00
Eric Dumazet 5933dd2f02 net: NET_SKB_PAD should depend on L1_CACHE_BYTES
In old kernels, NET_SKB_PAD was defined to 16.

Then commit d6301d3dd1 (net: Increase default NET_SKB_PAD to 32), and
commit 18e8c134f4 (net: Increase NET_SKB_PAD to 64 bytes) increased it
to 64.

While first patch was governed by network stack needs, second was more
driven by performance issues on current hardware. Real intent was to
align data on a cache line boundary.

So use max(32, L1_CACHE_BYTES) instead of 64, to be more generic.

Remove microblaze and powerpc own NET_SKB_PAD definitions.

Thanks to Alexander Duyck and David Miller for their comments.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:16:43 -07:00
Amit Kumar Salecha 7e43cd66d3 netxen: fix caching window register
CRB window register is not per pci-func for NX3031,
so caching can result in incorrect values.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:15:27 -07:00
Amit Kumar Salecha 2227bae22b netxen: fix rcv buffer leak
Rcv producer should be read in spin-lock.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:15:27 -07:00
Amit Kumar Salecha bf445080da netxen: fix memory leaks in error path
Fixes memory leak in error path when memory allocation
for adapter data structures fails.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:15:26 -07:00
Eric Dumazet a95d8c88be ipfrag : frag_kfree_skb() cleanup
Third param (work) is unused, remove it.

Remove __inline__ and inline qualifiers.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:12:44 -07:00
Eric Dumazet d27f9b3582 ip_frag: Remove some atomic ops
Instead of doing one atomic operation per frag, we can factorize them.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:12:44 -07:00
Florian Westphal 2bbdf389a9 ipv6: syncookies: do not skip ->iif initialization
When syncookies are in effect, req->iif is left uninitialized.
In case of e.g. link-local addresses the route lookup then fails
and no syn-ack is sent.

Rearrange things so ->iif is also initialized in the syncookie case.

want_cookie can only be true when the isn was zero, thus move the want_cookie
check into the "!isn" branch.

Cc: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:10:29 -07:00
Ben Hutchings 82695d9b18 net: Fix error in comment on net_device_ops::ndo_get_stats
ndo_get_stats still returns struct net_device_stats *; there is
no struct net_device_stats64.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 15:08:48 -07:00
Sonic Zhang 4fcc3d3409 netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer
SKBs hold onto resources that can't be held indefinitely, such as TCP
socket references and netfilter conntrack state.  So if a packet is left
in TX ring for a long time, there might be a TCP socket that cannot be
closed and freed up.

Current blackfin EMAC driver always reclaim and free used tx skbs in future
transfers. The problem is that future transfer may not come as soon as
possible. This patch start a timer after transfer to reclaim and free skb.
There is nearly no performance drop with this patch.

TX interrupt is not enabled because of a strange behavior of the Blackfin EMAC.
If EMAC TX transfer control is turned on, endless TX interrupts are triggered
no matter if TX DMA is enabled or not. Since DMA walks down the ring automatically,
TX transfer control can't be turned off in the middle. The only way is to disable
TX interrupt completely.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 15:04:10 -07:00
Dominik Brodowski 44b496f685 pcmcia: dev_node removal bugfix
Patch c7c2fa07 removed one line too much from smc91c92_cs.c.

Reported-by: Komuro <komurojun-mbn@nifty.com>
CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 14:55:31 -07:00
Eric Dumazet aa1039e73c inetpeer: RCU conversion
inetpeer currently uses an AVL tree protected by an rwlock.

It's possible to make most lookups use RCU

1) Add a struct rcu_head to struct inet_peer

2) add a lookup_rcu_bh() helper to perform lockless and opportunistic
lookup. This is a normal function, not a macro like lookup().

3) Add a limit to number of links followed by lookup_rcu_bh(). This is
needed in case we fall in a loop.

4) add an smp_wmb() in link_to_pool() right before node insert.

5) make unlink_from_pool() use atomic_cmpxchg() to make sure it can take
last reference to an inet_peer, since lockless readers could increase
refcount, even while we hold peers.lock.

6) Delay struct inet_peer freeing after rcu grace period so that
lookup_rcu_bh() cannot crash.

7) inet_getpeer() first attempts lockless lookup.
   Note this lookup can fail even if target is in AVL tree, but a
concurrent writer can let tree in a non correct form.
   If this attemps fails, lock is taken a regular lookup is performed
again.

8) convert peers.lock from rwlock to a spinlock

9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because
rcu_head adds 16 bytes on 64bit arches, doubling effective size (64 ->
128 bytes)
In a future patch, this is probably possible to revert this part, if rcu
field is put in an union to share space with rid, ip_id_count, tcp_ts &
tcp_ts_stamp. These fields being manipulated only with refcnt > 0.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 14:23:38 -07:00