Fix a possible regression with p9_client_stat where it can try to kfree
an ERR_PTR after an erroneous p9pdu_readf. Also remove an unnecessary data
buffer increment in p9_client_read.
Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The default 9p transport module is not chosen unless an option parameter (any)
is passed to mount, which thus returns a ENOPROTOSUPPORT. This fix moves the
check out of parse_opts into p9_client_create.
Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
Revert "NET: Fix locking issues in PPP, 6pack, mkiss and strip line disciplines."
skbuff.h: Fix comment for NET_IP_ALIGN
drivers/net: using spin_lock_irqsave() in net_send_packet()
NET: phy_device, fix lock imbalance
gre: fix ToS/DiffServ inherit bug
igb: gcc-3.4.6 fix
atlx: duplicate testing of MCAST flag
NET: Fix locking issues in PPP, 6pack, mkiss and strip line disciplines.
netdev: restore MTU change operation
netdev: restore MAC address set and validate operations
sit: fix regression: do not release skb->dst before xmit
net: ip_push_pending_frames() fix
net: sk_prot_alloc() should not blindly overwrite memory
Fixes two bugs:
- ToS/DiffServ inheritance was unintentionally activated when using impair fixed ToS values
- ECN bit was lost during ToS/DiffServ inheritance
Signed-off-by: Andreas Jaggi <aj@open.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The sit module makes use of skb->dst in it's xmit function, so since
93f154b594 ("net: release dst entry in dev_hard_start_xmit()") sit
tunnels are broken, because the flag IFF_XMIT_DST_RELEASE is not
unset.
This patch unsets that flag for sit devices to fix this
regression.
Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
we do not take any more references on sk->sk_refcnt on outgoing packets.
I forgot to delete two __sock_put() from ip_push_pending_frames()
and ip6_push_pending_frames().
Reported-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some sockets use SLAB_DESTROY_BY_RCU, and our RCU code correctness
depends on sk->sk_nulls_node.next being always valid. A NULL
value is not allowed as it might fault a lockless reader.
Current sk_prot_alloc() implementation doesnt respect this hypothesis,
calling kmem_cache_alloc() with __GFP_ZERO. Just call memset() around
the forbidden field.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.
Without the memory barrier, following race can happen.
The race fires, when following code paths meet, and the tp->rcv_nxt
and __add_wait_queue updates stay in CPU caches.
CPU1 CPU2
sys_select receive packet
... ...
__add_wait_queue update tp->rcv_nxt
... ...
tp->rcv_nxt check sock_def_readable
... {
schedule ...
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
wake_up_interruptible(sk->sk_sleep)
...
}
If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposit to each other.
Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp->rcv_nxt check and sleeps, or will get the new value for
tp->rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.
The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then
endup calling schedule and sleep forever if there are no more data on the
socket.
Calls to poll_wait in following modules were ommited:
net/bluetooth/af_bluetooth.c
net/irda/af_irda.c
net/irda/irnet/irnet_ppp.c
net/mac80211/rc80211_pid_debugfs.c
net/phonet/socket.c
net/rds/af_rds.c
net/rfkill/core.c
net/sunrpc/cache.c
net/sunrpc/rpc_pipe.c
net/tipc/socket.c
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using early netconsole and gianfar driver this error pops up:
netconsole: timeout waiting for carrier
It appears that net/core/netpoll.c:netpoll_setup() is using
cond_resched() in a loop waiting for a carrier.
The thing is that cond_resched() is a no-op when system_state !=
SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
scheduled, therefore link detection doesn't work.
I belive that the main problem is in cond_resched()[1], but despite
how the cond_resched() story ends, it might be a good idea to call
msleep(1) instead of cond_resched(), as suggested by Andrew Morton.
[1] http://lkml.org/lkml/2009/7/7/463
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pawel Staszewski wrote:
<blockquote>
Some time ago i report this:
http://bugzilla.kernel.org/show_bug.cgi?id=6648
and now with 2.6.29 / 2.6.29.1 / 2.6.29.3 and 2.6.30 it back
dmesg output:
oprofile: using NMI interrupt.
Fix inflate_threshold_root. Now=15 size=11 bits
...
Fix inflate_threshold_root. Now=15 size=11 bits
cat /proc/net/fib_triestat
Basic info: size of leaf: 40 bytes, size of tnode: 56 bytes.
Main:
Aver depth: 2.28
Max depth: 6
Leaves: 276539
Prefixes: 289922
Internal nodes: 66762
1: 35046 2: 13824 3: 9508 4: 4897 5: 2331 6: 1149 7: 5
9: 1 18: 1
Pointers: 691228
Null ptrs: 347928
Total size: 35709 kB
</blockquote>
It seems, the current threshold for root resizing is too aggressive,
and it causes misleading warnings during big updates, but it might be
also responsible for memory problems, especially with non-preempt
configs, when RCU freeing is delayed long after call_rcu.
It should be also mentioned that because of non-atomic changes during
resizing/rebalancing the current lookup algorithm can miss valid leaves
so it's additional argument to shorten these activities even at a cost
of a minimally longer searching.
This patch restores values before the patch "[IPV4]: fib_trie root
node settings", commit: 965ffea43d from
v2.6.22.
Pawel's report:
<blockquote>
I dont see any big change of (cpu load or faster/slower
routing/propagating routes from bgpd or something else) - in avg there
is from 2% to 3% more of CPU load i dont know why but it is - i change
from "preempt" to "no preempt" 3 times and check this my "mpstat -P ALL
1 30"
always avg cpu load was from 2 to 3% more compared to "no preempt"
[...]
cat /proc/net/fib_triestat
Basic info: size of leaf: 20 bytes, size of tnode: 36 bytes.
Main:
Aver depth: 2.44
Max depth: 6
Leaves: 277814
Prefixes: 291306
Internal nodes: 66420
1: 32737 2: 14850 3: 10332 4: 4871 5: 2313 6: 942 7: 371 8: 3 17: 1
Pointers: 599098
Null ptrs: 254865
Total size: 18067 kB
</blockquote>
According to this and other similar reports average depth is slightly
increased (~0.2), and root nodes are shorter (log 17 vs. 18), but
there is no visible performance decrease. So, until memory handling is
improved or added parameters for changing this individually, this
patch resets to safer defaults.
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Reported-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
If rix is not found in mi->r[], i will become -1 after the loop. This value
is eventually used to access arrays, so we were accessing arrays with a
negative index, which is obviously not what we want to do. This patch fixes
this potential problem.
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The code in cfg80211's cfg80211_bss_update erroneously
grabs a reference to the BSS, which means that it will
never be freed.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: stable@kernel.org [2.6.29, 2.6.30]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Don't forget to unlock cfg80211_mutex in one fail path of
nl80211_set_wiphy.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The bit that tells us whether a statistics counter snapshot operation
has completed is located in the GLOBAL register block, not in the
GLOBAL2 register block, so fix up mv88e6xxx_stats_wait() to poll the
right register address.
Signed-off-by: Stephane Contri <Stephane.Contri@grassvalley.com>
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a bug in addrconf_prefix_rcv() where it won't update the
preferred lifetime of an IPv6 address if the current valid lifetime
of the address is less than 2 hours (the minimum value in the RA).
For example, If I send a router advertisement with a prefix that
has valid lifetime = preferred lifetime = 2 hours we'll build
this address:
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
valid_lft 7175sec preferred_lft 7175sec
If I then send the same prefix with valid lifetime = preferred
lifetime = 0 it will be ignored since the minimum valid lifetime
is 2 hours:
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
valid_lft 7161sec preferred_lft 7161sec
But according to RFC 4862 we should always reset the preferred lifetime
even if the valid lifetime is invalid, which would cause the address
to immediately get deprecated. So with this patch we'd see this:
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
inet6 2001:1890:1109:a20:21f:29ff:fe5a:ef04/64 scope global deprecated dynamic
valid_lft 7163sec preferred_lft 0sec
The comment winds-up being 5x the size of the code to fix the problem.
Update the preferred lifetime of IPv6 addresses derived from a prefix
info option in a router advertisement even if the valid lifetime in
the option is invalid, as specified in RFC 4862 Section 5.5.3e. Fixes
an issue where an address will not immediately become deprecated.
Reported by Jens Rosenboom.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SCTP pushed the skb above the sctp chunk header, so the
check of pskb_may_pull(skb, nh + offset + 1 - skb->data) in
_decode_session6() will never return 0 and the ports decode
of sctp will always fail. (nh + offset + 1 - skb->data < 0)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SCTP pushed the skb data above the sctp chunk header, so the check
of pskb_may_pull(skb, xprth + 4 - skb->data) in _decode_session4() will
never return 0 because xprth + 4 - skb->data < 0, the ports decode of
sctp will always fail.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not safe to use match_int without checking the token type returned
by match_token (especially when the token type returned is Opt_err and
args is empty). Fix it.
Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 73ce7b01b4.
After discovering that we don't listen to gratuitious arps in 2.6.30
I tracked the failure down to this commit.
The patch makes absolutely no sense. RFC2131 RFC3927 and RFC5227.
are all in agreement that an arp request with sip == 0 should be used
for the probe (to prevent learning) and an arp request with sip == tip
should be used for the gratitous announcement that people can learn
from.
It appears the author of the broken patch got those two cases confused
and modified the code to drop all gratuitous arp traffic. Ouch!
Cc: stable@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alas current delaying of freeing old tnodes by RCU in trie_rebalance
is still not enough because we can free a top tnode before updating a
t->trie pointer.
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 'net: skb->dst accessors'(adf30907d6)
broken the sctp protocol stack, the sctp packet can never be sent out after
Eric Dumazet's patch, which have typo in the sctp code.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Vlad Yasevich <vladisalv.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up to use xfrm_addr_cmp() instead of compare addresses directly.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a socket starts out on a non-TSO route, and then switches to
a TSO route, then we will tack on data to the tail of the tx queue
even if it started out life as non-TSO. This is suboptimal because
all of it will then be copied and checksummed unnecessarily.
This patch fixes this by ensuring that skb->ip_summed is set to
CHECKSUM_PARTIAL before appending extra data beyond the MSS.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a socket starts out on a non-TSO route, and then switches to
a TSO route, then the tail on the tx queue can morph into a TSO
packet, causing mischief because the rest of the stack does not
expect a partially linear TSO packet.
This patch fixes this by ensuring that skb->ip_summed is set to
CHECKSUM_PARTIAL before declaring a packet as TSO.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
ieee802154_nl_get_dev() lacks check for the existance of the device
that was returned by dev_get_XXX, thus resulting in Oops for non-existing
devices. Fix it.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
As reported by Philip, the UNTRACKED state bit does not fit within
the 8-bit state_mask member. Enlarge state_mask and give status_mask
a few more bits too.
Reported-by: Philip Craig <philipc@snapgear.com>
References: http://markmail.org/thread/b7eg6aovfh4agyz7
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When NAT helpers change the TCP packet size, the highest seen sequence
number needs to be corrected. This is currently only done upwards, when
the packet size is reduced the sequence number is unchanged. This causes
TCP conntrack to falsely detect unacknowledged data and decrease the
timeout.
Fix by updating the highest seen sequence number in both directions after
packet mangling.
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When NAPI is disabled while we're in net_rx_action, we end up
calling __napi_complete without flushing GRO packets. This is
a bug as it would cause the GRO packets to linger, of course it
also literally BUGs to catch error like this :)
This patch changes it to napi_complete, with the obligatory IRQ
reenabling. This should be safe because we've only just disabled
IRQs and it does not materially affect the test conditions in
between.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As transparent proxying looks up the socket early and assigns
it to the skb for later processing, we must drop any existing
socket ownership prior to that in order to distinguish between
the case where tproxy is active and where it is not.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mac80211 module uses rcu_call() thus it should use rcu_barrier()
on module unload.
The rcu_barrier() is placed in mech.c ieee80211_stop_mesh() which is
invoked from ieee80211_stop() in case vif.type == NL80211_IFTYPE_MESH_POINT.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sunrpc module uses rcu_call() thus it should use rcu_barrier() on
module unload.
Have not verified that the possibility for new call_rcu() callbacks
has been disabled. As a hint for checking, the functions calling
call_rcu() (unx_destroy_cred and generic_destroy_cred) are
registered as crdestroy function pointer in struct rpc_credops.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
When unloading modules that uses call_rcu() callbacks, then we must
use rcu_barrier(). This module uses syncronize_net() which is not
enough to be sure that all callback has been completed.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ipv6 module uses rcu_call() thus it should use rcu_barrier() on
module unload.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The decnet module unloading as been disabled with a '#if 0' statement,
because it have had issues.
We add a rcu_barrier() anyhow for correctness.
The maintainer (Chrissie Caulfield) will look into the unload issue
when time permits.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Chrissie Caulfield <christine.caulfield@googlemail.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid showing wrong high values when the preferred lifetime of an address
is expired.
Signed-off-by: Jens Rosenboom <me@jayr.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC0793 defined that in FIN-WAIT-2 state if the ACK bit is off drop
the segment and return[Page 72]. But this check is missing in function
tcp_timewait_state_process(). This cause the segment with FIN flag but
no ACK has two diffent action:
Case 1:
Node A Node B
<------------- FIN,ACK
(enter FIN-WAIT-1)
ACK ------------->
(enter FIN-WAIT-2)
FIN -------------> discard
(move sk to tw list)
Case 2:
Node A Node B
<------------- FIN,ACK
(enter FIN-WAIT-1)
ACK ------------->
(enter FIN-WAIT-2)
(move sk to tw list)
FIN ------------->
<------------- ACK
This patch fixed the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU barriers, rcu_barrier(), is inserted two places.
In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
kmem_cache_destroy(). Firstly to make sure the callback to the
nf_ct_expect_free_rcu() code is still around. Secondly because I'm
unsure about the consequence of having in flight
nf_ct_expect_free_rcu/kmem_cache_free() calls while doing a
kmem_cache_destroy() slab destroy.
And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to
wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
invoked by __nf_ct_ext_add(). It might be more efficient to call
rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
thats make it more difficult to read the code (as the callback code
in located in nf_conntrack_extend.c).
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Netlink address deletion events were not sent when a network device
vanished neither when Phonet was unloaded.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our CAST algorithm is called cast5, not cast128. Clearly nobody
has ever used it :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6:
bnx2: Fix the behavior of ethtool when ONBOOT=no
qla3xxx: Don't sleep while holding lock.
qla3xxx: Give the PHY time to come out of reset.
ipv4 routing: Ensure that route cache entries are usable and reclaimable with caching is off
net: Move rx skb_orphan call to where needed
ipv6: Use correct data types for ICMPv6 type and code
net: let KS8842 driver depend on HAS_IOMEM
can: let SJA1000 driver depend on HAS_IOMEM
netxen: fix firmware init handshake
netxen: fix build with without CONFIG_PM
netfilter: xt_rateest: fix comparison with self
netfilter: xt_quota: fix incomplete initialization
netfilter: nf_log: fix direct userspace memory access in proc handler
netfilter: fix some sparse endianess warnings
netfilter: nf_conntrack: fix conntrack lookup race
netfilter: nf_conntrack: fix confirmation race condition
netfilter: nf_conntrack: death_by_timeout() fix
When route caching is disabled (rt_caching returns false), We still use route
cache entries that are created and passed into rt_intern_hash once. These
routes need to be made usable for the one call path that holds a reference to
them, and they need to be reclaimed when they're finished with their use. To be
made usable, they need to be associated with a neighbor table entry (which they
currently are not), otherwise iproute_finish2 just discards the packet, since we
don't know which L2 peer to send the packet to. To do this binding, we need to
follow the path a bit higher up in rt_intern_hash, which calls
arp_bind_neighbour, but not assign the route entry to the hash table.
Currently, if caching is off, we simply assign the route to the rp pointer and
are reutrn success. This patch associates us with a neighbor entry first.
Secondly, we need to make sure that any single use routes like this are known to
the garbage collector when caching is off. If caching is off, and we try to
hash in a route, it will leak when its refcount reaches zero. To avoid this,
this patch calls rt_free on the route cache entry passed into rt_intern_hash.
This places us on the gc list for the route cache garbage collector, so that
when its refcount reaches zero, it will be reclaimed (Thanks to Alexey for this
suggestion).
I've tested this on a local system here, and with these patches in place, I'm
able to maintain routed connectivity to remote systems, even if I set
/proc/sys/net/ipv4/rt_cache_rebuild_count to -1, which forces rt_caching to
return false.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Reported-by: Jarek Poplawski <jarkao2@gmail.com>
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to get the tun driver to account packets, we need to be
able to receive packets with destructors set. To be on the safe
side, I added an skb_orphan call for all protocols by default since
some of them (IP in particular) cannot handle receiving packets
destructors properly.
Now it seems that at least one protocol (CAN) expects to be able
to pass skb->sk through the rx path without getting clobbered.
So this patch attempts to fix this properly by moving the skb_orphan
call to where it's actually needed. In particular, I've added it
to skb_set_owner_[rw] which is what most users of skb->destructor
call.
This is actually an improvement for tun too since it means that
we only give back the amount charged to the socket when the skb
is passed to another socket that will also be charged accordingly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Oliver Hartkopp <olver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>