flush_tasklet is not percpu var, and percpu is percpu var, and
this_cpu_ptr(&info->cache->percpu->flush_tasklet)
is not equal to
&this_cpu_ptr(info->cache->percpu)->flush_tasklet
1f743b076(use this_cpu_ptr per-cpu helper) introduced this bug.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne's errata proposal (Errata ID: 3480) to RFC4291 has been
verified: http://www.rfc-editor.org/errata_search.php?eid=3480
We have to check for pkt_type and loopback flag because either the
packets are allowed to travel over the loopback interface (in which case
pkt_type is PACKET_HOST and IFF_LOOPBACK flag is set) or they travel
over a non-loopback interface back to us (in which case PACKET_TYPE is
PACKET_LOOPBACK and IFF_LOOPBACK flag is not set).
Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull userns fixes from Eric W Biederman:
"The bulk of the changes are fixing the worst consequences of the user
namespace design oversight in not considering what happens when one
namespace starts off as a clone of another namespace, as happens with
the mount namespace.
The rest of the changes are just plain bug fixes.
Many thanks to Andy Lutomirski for pointing out many of these issues."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Restrict when proc and sysfs can be mounted
ipc: Restrict mounting the mqueue filesystem
vfs: Carefully propogate mounts across user namespaces
vfs: Add a mount flag to lock read only bind mounts
userns: Don't allow creation if the user is chrooted
yama: Better permission check for ptraceme
pid: Handle the exit of a multi-threaded init.
scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.
Steffen Klassert says:
====================
1) Initialize the satype field in key_notify_policy_flush(),
this was left uninitialized. From Nicolas Dichtel.
2) The sequence number difference for replay notifications
was misscalculated on ESN sequence number wrap. We need
a separate replay notify function for esn.
3) Fix an off by one in the esn replay notify function.
From Mathias Krause.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return a negative error code from the error handling case
instead of 0(possible overwrite to 0 by ops->fill_xstats call),
as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in function
nf_conntrack_standalone_init().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull networking fixes from David Miller:
1) Always increment IPV4 ID field in encapsulated GSO packets, even
when DF is set. Regression fix from Pravin B Shelar.
2) Fix per-net subsystem initialization in netfilter conntrack,
otherwise we may access dynamically allocated memory before it is
actually allocated. From Gao Feng.
3) Fix DMA buffer lengths in iwl3945 driver, from Stanislaw Gruszka.
4) Fix race between submission of sync vs async commands in mwifiex
driver, from Amitkumar Karwar.
5) Add missing cancel of command timer in mwifiex driver, from Bing
Zhao.
6) Missing SKB free in rtlwifi USB driver, from Jussi Kivilinna.
7) Thermal layer tries to use a genetlink multicast string that is
longer than the 16 character limit. Fix it and add a BUG check to
prevent this kind of thing from happening in the future.
From Masatake YAMATO.
8) Fix many bugs in the handling of the teardown of L2TP connections,
UDP encapsulation instances, and sockets. From Tom Parkin.
9) Missing socket release in IRDA, from Kees Cook.
10) Fix fec driver modular build, from Fabio Estevam.
11) Erroneous use of kfree() instead of free_netdev() in lantiq_etop,
from Wei Yongjun.
12) Fix bugs in handling of queue numbers and steering rules in mlx4
driver, from Moshe Lazer, Hadar Hen Zion, and Or Gerlitz.
13) Some FOO_DIAG_MAX constants were defined off by one, fix from Andrey
Vagin.
14) TCP segmentation deferral is unintentionally done too strongly,
breaking ACK clocking. Fix from Eric Dumazet.
15) net_enable_timestamp() can legitimately be invoked from software
interrupts, and in a way that is safe, so remove the WARN_ON().
Also from Eric Dumazet.
16) Fix use after free in VLANs, from Cong Wang.
17) Fix TCP slow start retransmit storms after SACK reneging, from
Yuchung Cheng.
18) Unix socket release should mark a socket dead before NULL'ing out
sock->sk, otherwise we can race. Fix from Paul Moore.
19) IPV6 addrconf code can try to free static memory, from Hong Zhiguo.
20) Fix register mis-programming, NULL pointer derefs, and wrong PHC
clock frequency in IGB driver. From Lior LevyAlex Williamson, Jiri
Benc, and Jeff Kirsher.
21) skb->ip_summed logic in pch_gbe driver is reversed, breaking packet
forwarding. Fix from Veaceslav Falico.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
ipv4: Fix ip-header identification for gso packets.
bonding: remove already created master sysfs link on failure
af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
pch_gbe: fix ip_summed checksum reporting on rx
igb: fix PHC stopping on max freq
igb: make sensor info static
igb: SR-IOV init reordering
igb: Fix null pointer dereference
igb: fix i350 anti spoofing config
ixgbevf: don't release the soft entries
ipv6: fix bad free of addrconf_init_net
unix: fix a race condition in unix_release()
tcp: undo spurious timeout after SACK reneging
bnx2x: fix assignment of signed expression to unsigned variable
bridge: fix crash when set mac address of br interface
8021q: fix a potential use-after-free
net: remove a WARN_ON() in net_enable_timestamp()
tcp: preserve ACK clocking in TSO
net: fix *_DIAG_MAX constants
net/mlx4_core: Disallow releasing VF QPs which have steering rules
...
- Fix an NFSv4 idmapper regression
- Fix an Oops in the pNFS blocks client
- Fix up various issues with pNFS layoutcommit
- Ensure correct read ordering of variables in rpc_wake_up_task_queue_locked
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRUedyAAoJEGcL54qWCgDyar0P/2pTT/yxX8ejTu5DmY7e4PYJ
jhPG2AEqY/yMLn9GvB375VIs1L8tuY50+3NFhWZFjyNbEU3GV+5Y+kPpBtAgYiSI
VyIXiJ/xMtXdYJMYuE/nh5jbcqJsHwGjpcIaSd5BuWzQUaoUYvLulxWd4QN8mmaT
5SuzmgV+7WIqV6RjlaYF82srcOKAjwemcrfRkCNzzJr6aT39gH2YdYFbDaTr7qhU
fw0x3QlI7887vSNQcfaGbC1+jr6oe8wRCneOR0tceU/8bcj6zlUDk5HxqSOc28mA
jUQieoVRggcM4s5DFpNcuwW6qCPZOmzv/OFD6oqnhyyonPOrue+7zaoujZmGNmjx
dT2V/jQehanYD25WpDO8OyFXUeYE4x9bgHKsszhBTwr4x5D8ceEJ1sugcOPiTTxu
tflbbuWbt+BguvXp4p8QayUj0V2cplM/nOovWyUG+BH46sz3Dtv46NOgJeO2a29g
T6jayxmKCxvtPKtG0j34BzLngiKabZTSEhFms6Qarp9lwWvHWrR9KWGuDBNvy1Ts
GMBN8P6Ib40yVi6Pwlj5Jpy6yLKVklHtJQpactr63AZmYrF4bBBSom+MWAh3X1iO
QtF0x9Z1bBkXY2Q/u+3vWMxQtEPeW+pSiloj8aiceFAt33zKM+1bLofDhEw0s2fI
wJEHYsGyGtDQINgP0v1e
=OPbZ
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix an NFSv4 idmapper regression
- Fix an Oops in the pNFS blocks client
- Fix up various issues with pNFS layoutcommit
- Ensure correct read ordering of variables in
rpc_wake_up_task_queue_locked
* tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked
NFSv4.1: Add a helper pnfs_commit_and_return_layout
NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
NFSv4.1: Fix a race in pNFS layoutcommit
pnfs-block: removing DM device maybe cause oops when call dev_remove
NFSv4: Fix the string length returned by the idmapper
ip-header id needs to be incremented even if IP_DF flag is set.
This behaviour was changed in commit 490ab08127
(IP_GRE: Fix IP-Identification).
Following patch fixes it so that identification is always
incremented.
Reported-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCM_SCREDENTIALS should apply to write() syscalls only either source or destination
socket asserted SOCK_PASSCRED. The original implememtation in maybe_add_creds is wrong,
and breaks several LSB testcases ( i.e. /tset/LSB.os/netowkr/recvfrom/T.recvfrom).
Origionally-authored-by: Karel Srot <ksrot@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned. This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned. This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.
Dave, I think this one should go on the -stable pile.
Special thanks to Jan for coming up with a reproducer for this
problem.
Reported-by: Jan Stancek <jan.stancek@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Corey Richardson reported that my idle handling cleanup
(commit fd0f979a1b, "mac80211: simplify idle handling")
broke ath9k_htc. The reason appears to be that it wants
to go out of idle before switching channels. To fix it,
reimplement that sequence.
Reported-by: Corey Richardson <corey@octayn.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We need to be careful when testing task->tk_waitqueue in
rpc_wake_up_task_queue_locked, because it can be changed while we
are holding the queue->lock.
By adding appropriate memory barriers, we can ensure that it is safe to
test task->tk_waitqueue for equality if the RPC_TASK_QUEUED bit is set.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
If user-space tries to create accounting object with an empty
name, then return -EINVAL.
Reported-by: Michael Zintakis <michael.zintakis@googlemail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If a ROC item is canceled just as it expires, the work
struct may be scheduled while it is running (and waiting
for the mutex). This results in it being run after being
freed, which obviously crashes.
To fix this don't free it when aborting is requested but
instead mark it as "to be freed", which makes the work a
no-op and allows freeing it outside.
Cc: stable@vger.kernel.org [3.6+]
Reported-by: Jouni Malinen <j@w1.fi>
Tested-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 0017c0b "xfrm: Fix replay notification for esn." is off by one
for the sequence number wrapped case as UINT_MAX is 0xffffffff, not
0x100000000. ;)
Just calculate the diff like done everywhere else in the file.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
On SACK reneging the sender immediately retransmits and forces a
timeout but disables Eifel (undo). If the (buggy) receiver does not
drop any packet this can trigger a false slow-start retransmit storm
driven by the ACKs of the original packets. This can be detected with
undo and TCP timestamps.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I tried to set mac address of a bridge interface to a mac
address which already learned on this bridge, I got system hang.
The cause is straight forward: function br_fdb_change_mac_address
calls fdb_insert with NULL source nbp. Then an fdb lookup is
performed. If an fdb entry is found and it's local, it's OK. But
if it's not local, source is dereferenced for printk without NULL
check.
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vlan_vid_del() could possibly free ->vlan_info after a RCU grace
period, however, we may still refer to the freed memory area
by 'grp' pointer. Found by code inspection.
This patch moves vlan_vid_del() as behind as possible.
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The WARN_ON(in_interrupt()) in net_enable_timestamp() can get false
positive, in socket clone path, run from softirq context :
[ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80()
[ 3641.668811] Call Trace:
[ 3641.671254] <IRQ> [<ffffffff80286817>] warn_slowpath_common+0x87/0xc0
[ 3641.677871] [<ffffffff8028686a>] warn_slowpath_null+0x1a/0x20
[ 3641.683683] [<ffffffff80742f8b>] net_enable_timestamp+0x7b/0x80
[ 3641.689668] [<ffffffff80732ce5>] sk_clone_lock+0x425/0x450
[ 3641.695222] [<ffffffff8078db36>] inet_csk_clone_lock+0x16/0x170
[ 3641.701213] [<ffffffff807ae449>] tcp_create_openreq_child+0x29/0x820
[ 3641.707663] [<ffffffff807d62e2>] ? ipt_do_table+0x222/0x670
[ 3641.713354] [<ffffffff807aaf5b>] tcp_v4_syn_recv_sock+0xab/0x3d0
[ 3641.719425] [<ffffffff807af63a>] tcp_check_req+0x3da/0x530
[ 3641.724979] [<ffffffff8078b400>] ? inet_hashinfo_init+0x60/0x80
[ 3641.730964] [<ffffffff807ade6f>] ? tcp_v4_rcv+0x79f/0xbe0
[ 3641.736430] [<ffffffff807ab9bd>] tcp_v4_do_rcv+0x38d/0x4f0
[ 3641.741985] [<ffffffff807ae14a>] tcp_v4_rcv+0xa7a/0xbe0
Its safe at this point because the parent socket owns a reference
on the netstamp_needed, so we cant have a 0 -> 1 transition, which
requires to lock a mutex.
Instead of refining the check, lets remove it, as all known callers
are safe. If it ever changes in the future, static_key_slow_inc()
will complain anyway.
Reported-by: Laurent Chavey <chavey@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I found another crash when deleting lots of virtual stations
in a congested environment. I think the problem is that
the ieee80211_mlme_notify_scan_completed could call
ieee80211_restart_sta_timer for a stopped interface
that was about to be deleted.
With the following patch I am unable to reproduce the
crash.
Signed-off-by: Ben Greear <greearb@candelatech.com>
[move check, also make the same change in mesh]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a P2P device wdev is removed while it has a scan, then the
scan completion might crash later as it is already freed by
that time. To avoid the crash always check the scan completion
when the P2P device is being removed for some reason. If the
driver already canceled it, don't want and free it, otherwise
warn and leak it to avoid later crashes.
In order to do this, locking needs to be changed away from the
rdev mutex (which can't always be guaranteed). For now, use
the sched_scan_mtx instead, I'll rename it to just scan_mtx in
a later patch.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.
Current code leads to following bad bursty behavior :
20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119
While the expected behavior is more like :
20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The virtual monitor interface has a locking issue, it calls
into the channel context code with the iflist mutex held
which isn't allowed since it is usually acquired the other
way around. The mutex is still required for the interface
iteration, but need not be held across the channel calls.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Arend reported a crash in tracing if the driver returns an
ERR_PTR() value from the add_virtual_intf() callback. This
is due to the tracing then still attempting to dereference
the "pointer", fix this by using IS_ERR_OR_NULL().
Reported-by: Arend van Spriel <arend@broadcom.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This makes sure that release_sock is called for all error conditions in
irda_getsockopt.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
doing the work here anyway, also store thoff for a later usage, e.g. in
the BPF filter.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we postpone unhashing of l2tp sessions until the structure is freed, we
risk:
1. further packets arriving and getting queued while the pseudowire is being
closed down
2. the recv path hitting "scheduling while atomic" errors in the case that
recv drops the last reference to a session and calls l2tp_session_free
while in atomic context
As such, l2tp sessions should be unhashed from l2tp_core data structures early
in the teardown process prior to calling pseudowire close. For pseudowires
like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp's u64_stats writers were incorrectly synchronised, making it possible to
deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp
code netlink commands while passing data through l2tp sessions.
Previous discussion on netdev determined that alternative solutions such as
spinlock writer synchronisation or per-cpu data would bring unjustified
overhead, given that most users interested in high volume traffic will likely
be running 64bit kernels on 64bit hardware.
As such, this patch replaces l2tp's use of u64_stats with atomic_long_t,
thereby avoiding the deadlock.
Ref:
http://marc.info/?l=linux-netdev&m=134029167910731&w=2http://marc.info/?l=linux-netdev&m=134079868111131&w=2
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If userspace deletes a ppp pseudowire using the netlink API, either by
directly deleting the session or by deleting the tunnel that contains the
session, we need to tear down the corresponding pppox channel.
Rather than trying to manage two pppox unbind codepaths, switch the netlink
and l2tp_core session_close handlers to close via. the l2tp_ppp socket
.release handler.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall
and l2tp_session_delete. Pseudowire implementations which are deleted only
via. l2tp_core l2tp_session_delete calls can dispense with their own code for
flushing the reorder queue.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an l2tp session is deleted, it is necessary to delete skbs in-flight
on the session's reorder queue before taking it down.
Rather than having each pseudowire implementation reaching into the
l2tp_session struct to handle this itself, provide a function in l2tp_core to
purge the session queue.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is valid for an existing struct sock object to have a NULL sk_socket
pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When looking up the tunnel socket in struct l2tp_tunnel, hold a reference
whether the socket was created by the kernel or by userspace.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a user deletes a tunnel using netlink, all the sessions in the tunnel
should also be deleted. Since running sessions will pin the tunnel socket
with the references they hold, have the l2tp_tunnel_delete close all sessions
in a tunnel before finally closing the tunnel socket.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel
socket being closed from userspace. We need to do the same thing for
IP-encapsulation sockets.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a
tunnel when a UDP-encapsulation socket is destroyed. We need to do something
similar for IP-encapsulation sockets.
Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to
call it from their .destroy handlers.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
L2TP sessions hold a reference to the tunnel socket to prevent it going away
while sessions are still active. However, since tunnel destruction is handled
by the sock sk_destruct callback there is a catch-22: a tunnel with sessions
cannot be deleted since each session holds a reference to the tunnel socket.
If userspace closes a managed tunnel socket, or dies, the tunnel will persist
and it will be neccessary to individually delete the sessions using netlink
commands. This is ugly.
To prevent this occuring, this patch leverages the udp encapsulation socket
destroy callback to gain early notification when the tunnel socket is closed.
This allows us to safely close the sessions running in the tunnel, dropping
the tunnel socket references in the process. The tunnel socket is then
destroyed as normal, and the tunnel resources deallocated in sk_destruct.
While we're at it, ensure that l2tp_tunnel_closeall correctly drops session
references to allow the sessions to be deleted rather than leaking.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Users of udp encapsulation currently have an encap_rcv callback which they can
use to hook into the udp receive path.
In situations where a encapsulation user allocates resources associated with a
udp encap socket, it may be convenient to be able to also hook the proto
.destroy operation. For example, if an encap user holds a reference to the
udp socket, the destroy hook might be used to relinquish this reference.
This patch adds a socket destroy hook into udp, which is set and enabled
in the same way as the existing encap_rcv hook.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trigger BUG_ON if a group name is longer than GENL_NAMSIZ.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kfree_skb was called twice when the socket receive queue is full
Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Pablo Neira Ayuso says:
====================
The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are:
* Restrict IPv6 stateless NPT targets to the mangle table. Many users are
complaining that this target does not work in the nat table, which is the
wrong table for it, from Florian Westphal.
* Fix possible use before initialization in the netns init path of several
conntrack protocol trackers (introduced recently while improving conntrack
netns support), from Gao Feng.
* Fix incorrect initialization of copy_range in nfnetlink_queue, spotted
by Eric Dumazet during the NFWS2013, patch from myself.
* Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov.
* Remove rcu_read_lock section in IPVS while calling ipv4_update_pmtu
not required anymore after change introduced in 3.7, again from Julian.
* Fix SYN looping in IPVS state sync if the backup is used a real server
in DR/TUN modes, this required a new /proc entry to disable the director
function when acting as backup, also from Julian.
* Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by
Paul Bolle.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We may miscalculate the sequence number difference from the
last time we send a notification if a sequence number wrap
occured in the meantime. We fix this by adding a separate
replay notify function for esn. Here we take the high bits
of the sequence number into account to calculate the
difference.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Calling sock_orphan when e.g. the NFC adapter is removed can lead to
kernel crashes when e.g. a connection less client is sleeping on the
Rx workqueue, waiting for data to show up.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Kconfig symbol IP_NF_QUEUE is unused since commit
d16cf20e2f ("netfilter: remove ip_queue
support"). Let's remove it too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull networking fixes from David Miller:
1) Fix ARM BPF JIT handling of negative 'k' values, from Chen Gang.
2) Insufficient space reserved for bridge netlink values, fix from
Stephen Hemminger.
3) Some dst_neigh_lookup*() callers don't interpret error pointer
correctly, fix from Zhouyi Zhou.
4) Fix transport match in SCTP active_path loops, from Xugeng Zhang.
5) Fix qeth driver handling of multi-order SKB frags, from Frank
Blaschka.
6) fec driver is missing napi_disable() call, resulting in crashes on
unload, from Georg Hofmann.
7) Don't try to handle PMTU events on a listening socket, fix from Eric
Dumazet.
8) Fix timestamp location calculations in IP option processing, from
David Ward.
9) FIB_TABLE_HASHSZ setting is not controlled by the correct kconfig
tests, from Denis V Lunev.
10) Fix TX descriptor push handling in SFC driver, from Ben Hutchings.
11) Fix isdn/hisax and tulip/de4x5 kconfig dependencies, from Arnd
Bergmann.
12) bnx2x statistics don't handle 4GB rollover correctly, fix from
Maciej Żenczykowski.
13) Openvswitch bug fixes for vport del/new error reporting, missing
genlmsg_end() call in netlink processing, and mis-parsing of
LLC/SNAP ethernet types. From Rich Lane.
14) SKB pfmemalloc state should only be propagated from the head page of
a compound page, fix from Pavel Emelyanov.
15) Fix link handling in tg3 driver for 5715 chips when autonegotation
is disabled. From Nithin Sujir.
16) Fix inverted test of cpdma_check_free_tx_desc return value in
davinci_emac driver, from Mugunthan V N.
17) vlan_depth is incorrectly calculated in skb_network_protocol(), from
Li RongQing.
18) Fix probing of Gobi 1K devices in qmi_wwan driver, and fix NCM
device mode backwards compat in cdc_ncm driver. From Bjørn Mork.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
inet: limit length of fragment queue hash table bucket lists
qeth: Fix scatter-gather regression
qeth: Fix invalid router settings handling
qeth: delay feature trace
tcp: dont handle MTU reduction on LISTEN socket
bnx2x: fix occasional statistics off-by-4GB error
vhost/net: fix heads usage of ubuf_info
bridge: Add support for setting BR_ROOT_BLOCK flag.
bnx2x: add missing napi deletion in error path
drivers: net: ethernet: ti: davinci_emac: fix usage of cpdma_check_free_tx_desc()
ethernet/tulip: DE4x5 needs VIRT_TO_BUS
isdn: hisax: netjet requires VIRT_TO_BUS
net: cdc_ncm, cdc_mbim: allow user to prefer NCM for backwards compatibility
rtnetlink: Mask the rta_type when range checking
Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally"
Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug
smsc75xx: configuration help incorrectly mentions smsc95xx
net: fec: fix missing napi_disable call
net: fec: restart the FEC when PHY speed changes
skb: Propagate pfmemalloc on skb from head page only
...
This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is choosen somewhat
arbitrary and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.
If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.
I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In 3.7 we added code that uses ipv4_update_pmtu but after commit
c5ae7d4192 (ipv4: must use rcu protection while calling fib_lookup)
the RCU lock is not needed.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Dmitry Akindinov is reporting for a problem where SYNs are looping
between the master and backup server when the backup server is used as
real server in DR mode and has IPVS rules to function as director.
Even when the backup function is enabled we continue to forward
traffic and schedule new connections when the current master is using
the backup server as real server. While this is not a problem for NAT,
for DR and TUN method the backup server can not determine if a request
comes from client or from director.
To avoid such loops add new sysctl flag backup_only. It can be needed
for DR/TUN setups that do not need backup and director function at the
same time. When the backup function is enabled we stop any forwarding
and pass the traffic to the local stack (real server mode). The flag
disables the director function when the backup function is enabled.
For setups that enable backup function for some virtual services and
director function for other virtual services there should be another
more complex solution to support DR/TUN mode, may be to assign
per-virtual service syncid value, so that we can differentiate the
requests.
Reported-by: Dmitry Akindinov <dimak@stalker.com>
Tested-by: German Myzovsky <lawyer@sipnet.ru>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Fix wrong but non-fatal access to chunk length.
sch->length should be in network order, next chunk should
be aligned to 4 bytes. Problem noticed in sparse output.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
When an ICMP ICMP_FRAG_NEEDED (or ICMPV6_PKT_TOOBIG) message finds a
LISTEN socket, and this socket is currently owned by the user, we
set TCP_MTU_REDUCED_DEFERRED flag in listener tsq_flags.
This is bad because if we clone the parent before it had a chance to
clear the flag, the child inherits the tsq_flags value, and next
tcp_release_cb() on the child will decrement sk_refcnt.
Result is that we might free a live TCP socket, as reported by
Dormando.
IPv4: Attempt to release TCP socket in state 1
Fix this issue by testing sk_state against TCP_LISTEN early, so that we
set TCP_MTU_REDUCED_DEFERRED on appropriate sockets (not a LISTEN one)
This bug was introduced in commit 563d34d057
(tcp: dont drop MTU reduction indications)
Reported-by: dormando <dormando@rydia.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't allow spoofing pids over unix domain sockets in the corner
cases where a user has created a user namespace but has not yet
created a pid namespace.
Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Most of the support was already there. The only thing that was missing
was the call to set the flag. Add this call.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
On the NFC bits, Samuel says:
"With this one we have:
- A fix for properly decreasing socket ack log.
- A timer and works cleanup upon NFC device removal.
- A monitoroing socket cleanup round from llcp_socket_release.
- A proper error report to pending sockets upon NFC device removal."
Regarding the Bluetooth bits, Gustavo says:
"I have these two patches for 3.9, these add support for two more devices to
the bluetooth drivers."
Along with those, we have a few wireless driver fixes...
Bing Zhao provides an mwifiex to prevent an out-of-bounds memory
access.
John Crispin offers a Kconfig fix to enable some otherwise dead code
in rt2x00. The correct symbols were added in -rc1 through a different
tree, but the symbols for enabling the wireless driver didn't match.
Larry Finger brings an rtlwifi fix for a scheduling while atomic bug,
and another fix for a reassociation problem caused by failing to
clear the BSSID after a disconnect.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Range/validity checks on rta_type in rtnetlink_rcv_msg() do
not account for flags that may be set. This causes the function
to return -EINVAL when flags are set on the type (for example
NLA_F_NESTED).
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 412ed94744.
The commit is wrong as tiph points to the outer IPv4 header which is
installed at ipgre_header() and not the inner one which is protocol dependant.
This commit broke succesfully opennhrp which use PF_PACKET socket with
ETH_P_NHRP protocol. Additionally ssl_addr is set to the link-layer
IPv4 address. This address is written by ipgre_header() to the skb
earlier, and this is the IPv4 header tiph should point to - regardless
of the inner protocol payload.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross says:
====================
A few different bug fixes, including several for issues with userspace
communication that have gone unnoticed up until now. These are intended
for net/3.9.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As the translation is stateless, using it in nat table
doesn't work (only initial packet is translated).
filter table OUTPUT works but won't re-route the packet after translation.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2^16 = 0xffff, not 0xfffff (note the extra 'f'). Not dangerous since you
adjust it to min_t(data_len, skb->len) just after on.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In (c296bb4 netfilter: nf_conntrack: refactor l4proto support for netns)
the l4proto gre/dccp/udplite/sctp registration happened before the pernet
subsystem, which is wrong.
Register pernet subsystem before register L4proto since after register
L4proto, init_conntrack may try to access the resources which allocated
in register_pernet_subsys.
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack :
https://code.google.com/p/chromium/issues/detail?id=182056
commit a21d45726a (tcp: avoid order-1 allocations on wifi and tx
path) did a poor choice adding an 'avail_size' field to skb, while
what we really needed was a 'reserved_tailroom' one.
It would have avoided commit 22b4a4f22d (tcp: fix retransmit of
partially acked frames) and this commit.
Crash occurs because skb_split() is not aware of the 'avail_size'
management (and should not be aware)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mukesh Agrawal <quiche@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull namespace bugfixes from Eric Biederman:
"This tree includes a partial revert for "fs: Limit sys_mount to only
request filesystem modules." When I added the new style module aliases
to the filesystems I deleted the old ones. A bad move. It turns out
that distributions like Arch linux use module aliases when
constructing ramdisks. Which meant ultimately that an ext3 filesystem
mounted with ext4 would not result in the ext4 module being put into
the ramdisk.
The other change in this tree adds a handful of filesystem module
alias I simply failed to add the first time. Which inconvinienced a
few folks using cifs.
I don't want to inconvinience folks any longer than I have to so here
are these trivial fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
fs: Readd the fs module aliases.
fs: Limit sys_mount to only request filesystem modules. (Part 3)
sctp_assoc_lookup_tsn() function searchs which transport a certain TSN
was sent on, if not found in the active_path transport, then go search
all the other transports in the peer's transport_addr_list, however, we
should continue to the next entry rather than break the loop when meet
the active_path transport.
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When SCTP is done processing a duplicate cookie chunk, it tries
to delete a newly created association. For that, it has to set
the right association for the side-effect processing to work.
However, when it uses the SCTP_CMD_NEW_ASOC command, that performs
more work then really needed (like hashing the associationa and
assigning it an id) and there is no point to do that only to
delete the association as a next step. In fact, it also creates
an impossible condition where an association may be found by
the getsockopt() call, and that association is empty. This
causes a crash in some sctp getsockopts.
The solution is rather simple. We simply use SCTP_CMD_SET_ASOC
command that doesn't have all the overhead and does exactly
what we need.
Reported-by: Karl Heiss <kheiss@gmail.com>
Tested-by: Karl Heiss <kheiss@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I had assumed that the only use of module aliases for filesystems
prior to "fs: Limit sys_mount to only request filesystem modules."
was in request_module. It turns out I was wrong. At least mkinitcpio
in Arch linux uses these aliases.
So readd the preexising aliases, to keep from breaking userspace.
Userspace eventually will have to follow and use the same aliases the
kernel does. So at some point we may be delete these aliases without
problems. However that day is not today.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull Ceph fix from Sage Weil:
"This fixes a bug in the new message decoding that just went in during
the last window."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: fix decoding of pgids
Pull nfsd bugfixes from Bruce Fields:
"Some minor fallout from the user-namespace work broke most krb5 mounts
to nfsd, and I screwed up a change to the AF_LOCAL rpc code."
* 'for-3.9' of git://linux-nfs.org/~bfields/linux:
sunrpc: don't attempt to cancel unitialized work
nfsd: fix krb5 handling of anonymous principals
[ Bug added added in commit 05e8ef4ab2 (net: factor out
skb_mac_gso_segment() from skb_gso_segment() ) ]
move vlan_depth out of while loop, or else vlan_depth always is ETH_HLEN,
can not be increased, and lead to infinite loop when frame has two vlan headers.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bridge multicast fast leave feature was added sufficient space
was not reserved in the netlink message. This means the flag may be
lost in netlink events and results of queries.
Found by observation while looking up some netlink stuff for discussion with Vlad.
Problem introduced by commit c2d3babfaf
Author: David S. Miller <davem@davemloft.net>
Date: Wed Dec 5 16:24:45 2012 -0500
bridge: implement multicast fast leave
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is needed in order to detect if the timestamp option appears
more than once in a packet, to remove the option if the packet is
fragmented, etc. My previous change neglected to store the option
location when the router addresses were prespecified and Pointer >
Length. But now the option location is also stored when Flag is an
unrecognized value, to ensure these option handling behaviors are
still performed.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
batadv_iv_ogm_process() accesses the packet using the tt_num_changes
attribute regardless of the real packet len (assuming the length check
was done before). Therefore a length check is needed to avoid reading
random memory.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
In 4f6a7e5ee1 we effectively dropped support
for the legacy encoding for the OSDMap and incremental. However, we didn't
fix the decoding for the pgid.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Pull networking fixes from David Miller:
1) Missing cancel of work items in mac80211 MLME, from Ben Greear.
2) Fix DMA mapping handling in iwlwifi by using coherent DMA for
command headers, from Johannes Berg.
3) Decrease the amount of pressure on the page allocator by using order
1 pages less in iwlwifi, from Emmanuel Grumbach.
4) Fix mesh PS broadcast OOPS in mac80211, from Marco Porsch.
5) Don't forget to recalculate idle state in mac80211 monitor
interface, from Felix Fietkau.
6) Fix varargs in netfilter conntrack handler, from Joe Perches.
7) Need to reset entire chip when command queue fills up in iwlwifi,
from Emmanuel Grumbach.
8) The TX antenna value must be valid when calibrations are performed
in iwlwifi, fix from Dor Shaish.
9) Don't generate netfilter audit log entries when audit is disabled,
from Gao Feng.
10) Deal with DMA unit hang on e1000e during power state transitions,
from Bruce Allan.
11) Remove BUILD_BUG_ON check from igb driver, from Alexander Duyck.
12) Fix lockdep warning on i2c handling of igb driver, from Carolyn
Wyborny.
13) Fix several TTY handling issues in IRDA ircomm tty driver, from
Peter Hurley.
14) Several QFQ packet scheduler fixes from Paolo Valente.
15) When VXLAN encapsulates on transmit, we have to reset the netfilter
state. From Zang MingJie.
16) Fix jiffie check in net_rx_action() so that we really cap the
processing at 2HZ. From Eric Dumazet.
17) Fix erroneous trigger of IP option space exhaustion, when routers
are pre-specified and we are looking to see if we can insert a
timestamp, we will have the space. From David Ward.
18) Fix various issues in benet driver wrt waiting for firmware to
finish POST after resets or errors. From Gavin Shan and Sathya
Perla.
19) Fix TX locking in SFC driver, from Ben Hutchings.
20) Like the VXLAN fix above, when we encap in a TUN device we have to
reset the netfilter state. This should fix several strange crashes
reported by Dave Jones and others. From Eric Dumazet.
21) Don't forget to clean up MAC address resources when shutting down a
port in mlx4 driver, from Yan Burman.
22) Fix divide by zero in vmxnet3 driver, from Bhavesh Davda.
23) Fix device statistic regression in tg3 when the driver is using
phylib, from Nithin Sujir.
24) Fix info leak in several netlink handlers, from Mathias Krause.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
6lowpan: Fix endianness issue in is_addr_link_local().
rrunner.c: fix possible memory leak in rr_init_one()
dcbnl: fix various netlink info leaks
rtnl: fix info leak on RTM_GETLINK request for VF devices
bridge: fix mdb info leaks
tg3: Update link_up flag for phylib devices
ipv6: stop multicast forwarding to process interface scoped addresses
bridging: fix rx_handlers return code
netlabel: fix build problems when CONFIG_IPV6=n
drivers/isdn: checkng length to be sure not memory overflow
net/rds: zero last byte for strncpy
bnx2x: Fix SFP+ misconfiguration in iSCSI boot scenario
bnx2x: Fix intermittent long KR2 link up time
macvlan: Set IFF_UNICAST_FLT flag to prevent unnecessary promisc mode.
team: unsyc the devices addresses when port is removed
bridge: add missing vid to br_mdb_get()
Fix: sparse warning in inet_csk_prepare_forced_close
afkey: fix a typo
MAINTAINERS: Update qlcnic maintainers list
netlabel: correctly list all the static label mappings
...
If a P2P Device interface receives an unhandled action
frame, we attempt to return it. This crashes because it
doesn't have a channel context. Fix the crash by using
status->band and properly mark the return frame as an
off-channel frame.
Reported-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The dcb netlink interface leaks stack memory in various places:
* perm_addr[] buffer is only filled at max with 12 of the 32 bytes but
copied completely,
* no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand,
so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes
for ieee_pfc structs, etc.,
* the same is true for CEE -- no in-kernel driver fills the whole
struct,
Prevent all of the above stack info leaks by properly initializing the
buffers/structures involved.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize the mac address buffer with 0 as the driver specific function
will probably not fill the whole buffer. In fact, all in-kernel drivers
fill only ETH_ALEN of the MAX_ADDR_LEN bytes, i.e. 6 of the 32 possible
bytes. Therefore we currently leak 26 bytes of stack memory to userland
via the netlink interface.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bridging code discloses heap and stack bytes via the RTM_GETMDB
netlink interface and via the notify messages send to group RTNLGRP_MDB
afer a successful add/del.
Fix both cases by initializing all unset members/padding bytes with
memset(0).
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull namespace bugfixes from Eric Biederman:
"This is three simple fixes against 3.9-rc1. I have tested each of
these fixes and verified they work correctly.
The userns oops in key_change_session_keyring and the BUG_ON triggered
by proc_ns_follow_link were found by Dave Jones.
I am including the enhancement for mount to only trigger requests of
filesystem modules here instead of delaying this for the 3.10 merge
window because it is both trivial and the kind of change that tends to
bit-rot if left untouched for two months."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: Use nd_jump_link in proc_ns_follow_link
fs: Limit sys_mount to only request filesystem modules (Part 2).
fs: Limit sys_mount to only request filesystem modules.
userns: Stop oopsing in key_change_session_keyring
Commit b67bfe0d42 ("hlist: drop the node parameter from iterators")
did a lot of nice changes but also contains two small hunks that seem to
have slipped in accidentally and have no apparent connection to the
intent of the patch.
This reverts the two extraneous changes.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Senna Tschudin <peter.senna@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
v2:
a) used struct ipv6_addr_props
v3:
a) reverted changes for ipv6_addr_props
v4:
a) do not use __ipv6_addr_needs_scope_id
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The frames for which rx_handlers return RX_HANDLER_CONSUMED are no longer
counted as dropped. They are counted as successfully received by
'netif_receive_skb'.
This allows network interface drivers to correctly update their RX-OK and
RX-DRP counters based on the result of 'netif_receive_skb'.
Signed-off-by: Cristian Bercaru <B43982@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My last patch to solve a problem where the static/fallback labels were
not fully displayed resulted in build problems when IPv6 was disabled.
This patch resolves the IPv6 build problems; sorry for the screw-up.
Please queue for -stable or simply merge with the previous patch.
Reported-by: Kbuild Test Robot <fengguang.wu@intel.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever an adapter is removed we must clean all the local structures,
especially the timers and scheduled work. Otherwise those asynchronous
threads will eventually try to access the freed nfc_dev pointer if an LLCP
link is up.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is really difficult to test with real NFC devices, but without
this fix an LLCP server will eventually refuse new connections.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>