Based upon initial work by Keiichi Kii <k-keiichi@bx.jp.nec.com>.
This patch introduces support for dynamic reconfiguration (adding, removing
and/or modifying parameters of netconsole targets at runtime) using a
userspace interface exported via configfs. Documentation is also updated
accordingly.
Issues and brief design overview:
(1) Kernel-initiated creation / destruction of kernel objects is not
possible with configfs -- the lifetimes of the "config items" is managed
exclusively from userspace. But netconsole must support boot/module
params too, and these are parsed in kernel and hence netpolls must be
setup from the kernel. Joel Becker suggested to separately manage the
lifetimes of the two kinds of netconsole_target objects -- those created
via configfs mkdir(2) from userspace and those specified from the
boot/module option string. This adds complexity and some redundancy here
and also means that boot/module param-created targets are not exposed
through the configfs namespace (and hence cannot be updated / destroyed
dynamically). However, this saves us from locking / refcounting
complexities that would need to be introduced in configfs to support
kernel-initiated item creation / destroy there.
(2) In configfs, item creation takes place in the call chain of the
mkdir(2) syscall in the driver subsystem. If we used an ioctl(2) to
create / destroy objects from userspace, the special userspace program is
able to fill out the structure to be passed into the ioctl and hence
specify attributes such as local interface that are required at the time
we set up the netpoll. For configfs, this information is not available at
the time of mkdir(2). So, we keep all newly-created targets (via
configfs) disabled by default. The user is expected to set various
attributes appropriately (including the local network interface if
required) and then write(2) "1" to the "enabled" attribute. Thus,
netpoll_setup() is then called on the set parameters in the context of
_this_ write(2) on the "enabled" attribute itself. This design enables
the user to reconfigure existing netconsole targets at runtime to be
attached to newly-come-up interfaces that may not have existed when
netconsole was loaded or when the targets were actually created. All this
effectively enables us to get rid of custom ioctls.
(3) Ultra-paranoid configfs attribute show() and store() operations, with
sanity and input range checking, using only safe string primitives, and
compliant with the recommendations in Documentation/filesystems/sysfs.txt.
(4) A new function netpoll_print_options() is created in the netpoll API,
that just prints out the configured parameters for a netpoll structure.
netpoll_parse_options() is modified to use that and it is also exported to
be used from netconsole.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts the messy macro for MASK_PFX to inline function
and expands TKEY_GET_MASK in the one place it is used.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Try this out:
* replace macro's with inlines
* get rid of places doing multiple evaluations of NODE_PARENT
[akpm@linux-foundation.org: rcu_dereference wants an lval]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously code had IsReno/IsFack defined as macros that were
local to tcp_input.c though sack_ok field has user elsewhere too
for the same purpose. This changes them to static inlines as
preferred according the current coding style and unifies the
access to sack_ok across multiple files. Magic bitops of sack_ok
for FACK and DSACK are also abstracted to functions with
appropriate names.
Note:
- One sack_ok = 1 remains but that's self explanary, i.e., it
enables sack
- Couple of !IsReno cases are changed to tcp_is_sack
- There were no users for IsDSack => I dropped it
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously TCP had a transitional state during which reno
counted segments that are already below the current window into
sacked_out, which is now prevented. In addition, re-try now
the unconditional S+L skb catching.
This approach conservatively calls just remove_sack and leaves
reset_sack() calls alone. The best solution to the whole problem
would be to first calculate the new sacked_out fully (this patch
does not move reno_sack_reset calls from original sites and thus
does not implement this). However, that would require very
invasive change to fastretrans_alert (perhaps even slicing it to
two halves). Alternatively, all callers of tcp_packets_in_flight
(i.e., users that depend on sacked_out) should be postponed
until the new sacked_out has been calculated but it isn't any
simpler alternative.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This happens rather infrequently and is only possible during
FRTO. We must not allow TCP to slip to Open state because
tcp_fastretrans_alert might then not be called on it's time
when FRTO has exited. This become a problem when left_out
got removed and was replaced by just sacked_out.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_verify_left_out is useful for verifying S+L condition, so
add it back to couple of places in where the code was not
calling to tcp_sync_left_out but used own ad-hoc solution
(before the tcp_sync_left_out got removed).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Left_out was dropped a while ago, thus leaving verifying
consistency of the "left out" as only task for the function in
question. Thus make it's name more appropriate.
In addition, it is intentionally converted to #define instead
of static inline because the location of the invariant failure
is the most important thing to have if this ever triggers. I
think it would have been helpful e.g. in this case where the
location of the failure point had to be based on some quesswork:
http://lkml.org/lkml/2007/5/2/464
...Luckily the guesswork seems to have proved to be correct.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
tp->left_out got removed but nothing came to replace it back
then (users just did addition by themselves), so add function
for users now.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is easily calculable when needed and user are not that many
after all.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need for such check in pkts_acked because the
callback is not invoked unless at least one segment got fully
ACKed (i.e., the snd_una moved past skb's end_seq) by the
cumulative ACK's snd_una advancement.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
No other users exist for tcp_ecn.h. Very few things remain in
tcp.h, for most TCP ECN functions callers reside within a
single .c file and can be placed there.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
In addition, added a reference about the purpose of the loop.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
F-RTO does not touch SACKED_ACKED bits at all, so there is no
need to recount them in tcp_enter_frto_loss. After removal of
the else branch, nested ifs can be combined.
This must also reset sacked_out when SACK is not in use as TCP
could have received some duplicate ACKs prior RTO. To achieve
that in a sane manner, tcp_reset_reno_sack was re-placed by the
previous patch.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stupid error from my side. Even though now that I noticed this,
I hoped it would have been an optimization but no, the counter
hint is then incorrect. Thus clearing is necessary for now (I
still suspect though that this path is never executed).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is guaranteed to be valid only when !tp->sacked_out. In most
cases this seqno is available in the last ACK but there is no
guarantee for that. The new fast recovery loss marking algorithm
needs this as entry point.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently neighbour event notifications are limited to update
notifications and only sent if the ARP daemon is enabled. This
patch extends the existing notification code by also reporting
neighbours being removed due to gc or administratively and
removes the dependency on the ARP daemon. This allows to keep
track of neighbour states without periodically fetching the
complete neighbour table.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces neigh_cleanup_and_release() to be used after a
neighbour has been removed from its neighbour table. Serves
as preparation to add event notifications.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides generic Large Receive Offload (LRO) functionality
for IPv4/TCP traffic.
LRO combines received tcp packets to a single larger tcp packet and
passes them then to the network stack in order to increase performance
(throughput). The interface supports two modes: Drivers can either
pass SKBs or fragment lists to the LRO engine.
Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This routine gets the parsed rtnl attributes and creates a new
link with generic info (IFLA_LINKINFO policy). Its intention
is to help the drivers, that need to create several links at
once (like VETH).
This is nothing but a copy-paste-ed part of rtnl_newlink() function
that is responsible for creation of new device.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.
In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.
The signature of the ->poll() call back goes from:
int foo_poll(struct net_device *dev, int *budget)
to
int foo_poll(struct napi_struct *napi, int budget)
The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract). The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.
The napi_struct is to be embedded in the device driver private data
structures.
Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler. Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.
With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
[ Ported to current tree and all drivers converted. Integrated
Stephen's follow-on kerneldoc additions, and restored poll_list
handling to the old style to fix mutual exclusion issues. -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes the radiotap parser accept all other fields that are
currently defined.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The wireless extensions ioctl's implemented in mac80211 do not include
SIOCGIWTXPOWER. This patch adds the necessary code.
Acked-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Makes use of the type safe netlink interface and adds a warning
if the message is too big for NLMSG_DEFAULT_SIZE to help debug.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This makes it behave the same whether we have monitor during operation
or not.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Michael Wu noticed that the skb length checking is not taken care of enough when
a packet is presented on the Monitor interface for injection.
This patch improves the sanity checking and removes fake offsets placed
into the skb network and transport header.
Signed-off-by: Andy Green <andy@warmcat.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch replaces atomic allocations with regular ones where possible.
Merged with "revert some GFP_ATOMIC -> GFP_KERNEL changes" from Michael Wu:
> Some of the allocations made with GFP_ATOMIC really were necessary.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
zd1211rw and bcm43xx are interested in being notified when ERP IE conditions
change, so that they can reprogram a register which affects how control frames
are transmitted.
This patch adds an interface similar to the one that can be found in softmac.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Similarly to CTS protection, whether short preambles are used for 802.11b
transmissions should be a per-subif setting, not device global.
For STAs, this patch makes short preamble handling automatic based on the ERP
IE. For APs, hostapd still uses the prism ioctls, but the write ioctl has been
restricted to AP-only subifs.
ieee80211_txrx_data.short_preamble (an unused field) was removed.
Unfortunately, some API changes were required for the following functions:
- ieee80211_generic_frame_duration
- ieee80211_rts_duration
- ieee80211_ctstoself_duration
- ieee80211_rts_get
- ieee80211_ctstoself_get
Affected drivers were updated accordingly.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
mac80211 informs the driver what the short and long retry values are through
set_retry_limit(), but when packets are being transmitted it did not inform the
driver which of the 2 retry limits should actually be used.
Instead it sends the actual value, but for drivers that can only set the retry limit
and the register and in the descriptor need to indicate which of the limits should
be used this is not really useful.
This patch will add a IEEE80211_TXCTL_LONG_RETRY_LIMIT flag to the
ieee80211_tx_control structure. By default the short retry limit should be
used but if the flag is set the long retry should be used.
This does not prevent the driver to ignore the request for "no retry" packets,
but at least those will be send out with the short retry limit. But there is no
perfect cure for this problem.. :(
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
My cheapy D-Link AP behaves strangely w.r.t reassociations.
The following sequence of commands causes me to lose association and to be
unable to regain it:
ifconfig eth8 down
ifconfig eth8 up
iwconfig eth8 essid <x>
This is because mac80211 tries to reassociate, rather than just associate.
My AP replies with an association response (not a reassociation response...)
denying the association with code 12: "Association denied due to reason
outside the scope of this standard"
mac80211 tries this reassociation another 4 times or so before finally giving
up.
I see 2 problems here:
1. bringing the interface down and up again should be resetting interface state
i.e. after the interface is brought down, it should have no memory of if or
where it was previously associated
2. after the first reassociation fails, mac80211 should fall back to
standard association for the next attempt
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The sta_info code has some awkward locking which prevents some driver
callbacks from being allowed to sleep. This patch makes the locking more
focused so code that calls driver callbacks are allowed to sleep. It also
converts sta_lock to a rwlock.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Introduce a new file util.c and move a whole bunch of functions into it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch groups a whole bunch of functions together to make
ieee80211.c more maintainable.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
I think these can go with rate control just as well and it makes
ieee80211.c more readable.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
into a new file key.c which doesn't have much code right now but
it makes ieee80211.c easier to read.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some more outdenting to make the code more readable.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
pre_rx handlers can't really touch sta since for IBSS it might not be
assigned yet, it can create sta info structs on-the-fly.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The really indented part that does the huge switch on the interface
type is a nuisance. Put it into an own function 'prepare_for_handlers'.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The ieee80211_rx_h_check handler really does two things, it's
a lot easier to understand if it's split into ieee80211_rx_h_check
and ieee80211_rx_h_load_key, and it may be possible in the future
to optimise the key loading to not do it for each interface.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make some really indented code more readable by outdenting.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch moves the QoS handlers into rx.c making it possible
to compile wme.c only when NET_SCHED is defined.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the ICMPv6 Target address is multicast, Linux processes the
redirect instead of dropping it. The problem is in this code in
ndisc_redirect_rcv():
if (ipv6_addr_equal(dest, target)) {
on_link = 1;
} else if (!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) {
ND_PRINTK2(KERN_WARNING
"ICMPv6 Redirect: target address is not
link-local.\n");
return;
}
This second check will succeed if the Target address is, for example,
FF02::1 because it has link-local scope. Instead, it should be checking
if it's a unicast link-local address, as stated in RFC 2461/4861 Section
8.1:
- The ICMP Target Address is either a link-local address (when
redirected to a router) or the same as the ICMP Destination
Address (when redirected to the on-link destination).
I know this doesn't explicitly say unicast link-local address, but it's
implied.
This bug is preventing Linux kernels from achieving IPv6 Logo Phase II
certification because of a recent error that was found in the TAHI test
suite - Neighbor Disovery suite test 206 (v6LC.2.3.6_G) had the
multicast address in the Destination field instead of Target field, so
we were passing the test. This won't be the case anymore.
The patch below fixes this problem, and also fixes ndisc_send_redirect()
to not send an invalid redirect with a multicast address in the Target
field. I re-ran the TAHI Neighbor Discovery section to make sure Linux
passes all 245 tests now.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When only GSO skb was partially ACKed, no hints are reset,
therefore fastpath_cnt_hint must be tweaked too or else it can
corrupt fackets_out. The corruption to occur, one must have
non-trivial ACK/SACK sequence, so this bug is not very often
that harmful. There's a fackets_out state reset in TCP because
fackets_out is known to be inaccurate and that fixes the issue
eventually anyway.
In case there was also at least one skb that got fully ACKed,
the fastpath_skb_hint is set to NULL which causes a recount for
fastpath_cnt_hint (the old value won't be accessed anymore),
thus it can safely be decremented without additional checking.
Reported by Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
CC net/ieee80211/softmac/ieee80211softmac_wx.o
/home/kernel/src/net/ieee80211/softmac/ieee80211softmac_wx.c: In function âieee80211softmac_wx_set_essidâ:
/home/kernel/src/net/ieee80211/softmac/ieee80211softmac_wx.c:117: warning: label âoutâ defined but not used
due to commit: efe870f9f4. Removing the label.
Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reported by Chris Evans <scarybeasts@gmail.com>:
> The summary is that an evil 80211 frame can crash out a victim's
> machine. It only applies to drivers using the 80211 wireless code, and
> only then to certain drivers (and even then depends on a card's
> firmware not dropping a dubious packet). I must confess I'm not
> keeping track of Linux wireless support, and the different protocol
> stacks etc.
>
> Details are as follows:
>
> ieee80211_rx() does not explicitly check that "skb->len >= hdrlen".
> There are other skb->len checks, but not enough to prevent a subtle
> off-by-two error if the frame has the IEEE80211_STYPE_QOS_DATA flag
> set.
>
> This leads to integer underflow and crash here:
>
> if (frag != 0)
> flen -= hdrlen;
>
> (flen is subsequently used as a memcpy length parameter).
How about this?
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is followup to Patrick's patch. A little optimization to enqueue
routine allows to remove artificial limitation on queue length.
Plus, testing showed that hash function used by SFQ is too bad or even worse.
It does not even sweep the whole range of hash values.
Switched to Jenkins' hash.
Signed-off-by: Alexey Kuznetsov <kaber@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon a report and initial patch by Peter Lieven.
tcp4_md5sig_key and tcp6_md5sig_key need to start with
the exact same members as tcp_md5sig_key. Because they
are both cast to that type by tcp_v{4,6}_md5_do_lookup().
Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
length instead of a u8, which is what tcp_md5sig_key
uses. This just so happens to work by accident on
little-endian, but on big-endian it doesn't.
Instead of casting, just place tcp_md5sig_key as the first member of
the address-family specific structures, adjust the access sites, and
kill off the ugly casts.
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes kernel bugzilla #5731
It should generate an empty packet for datagram protocols when the
socket is connected, for one.
The check is doubly-wrong because all that a write() can be is a
sendmsg() call with a NULL msg_control and a single entry iovec. No
special semantics should be assigned to it, therefore the zero length
check should be removed entirely.
This matches the behavior of BSD and several other systems.
Alan Cox notes that SuSv3 says the behavior of a zero length write on
non-files is "unspecified", but that's kind of useless since BSD has
defined this behavior for a quarter century and BSD is essentially
what application folks code to.
Based upon a patch from Stephen Hemminger.
Signed-off-by: David S. Miller <davem@davemloft.net>
It gets pointer to fastcall function, expects a pointer to normal
one and calls the sucker.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If ADDIP is enabled, when an ASCONF chunk is received with ASCONF
paramter length set to zero, this will cause infinite loop.
By the way, if an malformed ASCONF chunk is received, will cause
processing to access memory without verifying.
This is because of not check the validity of parameters in ASCONF chunk.
This patch fixed this.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
RFC 4460 and future RFC 4960 (2960-bis) specify that packets
with bundled INIT chunks need to be dropped. We currenlty do
that only after processing any leading chunks. For OOTB chunks,
since we already walk the entire packet, we should discard packets
with bundled INITs.
There are other chunks chunks that MUST NOT be bundled, but the spec
is silent on theire treatment. Thus, we'll leave their teatment
alone for the moment.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com>
While processing OOTB chunks as well as chunks with an invalid
length of 0, it was possible to SCTP to get wedged inside an
infinite loop because we didn't catch the condition correctly,
or didn't mark the packet for discard correctly.
This work is based on original findings and work by
Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Explicitely discard OOTB chunks, whether the result is a
SHUTDOWN COMPLETE or an ABORT. We need to discard the OOTB
SHUTDOWN ACK to prevent bombing attackes since responsed
MUST NOT be bundled. We also explicietely discard in the
ABORT case since that function is widely used internally.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
When SCTP client received an INIT ACK chunk with missing mandatory
parameter such as "cookie parameter", it will send back a ABORT
with T-bit not set and verification tag is set to 0.
This is because before we accept this INIT ACK chunk, we do not know
the peer's tag. This patch change to reflect vtag when responding to
INIT ACK with missing mandatory parameter.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When we process bundled chunks, we need to make sure that
the skb has the buffer for each header since we assume it's
always there. Some malicious node can send us something like
DATA + 2 bytes and we'll try to walk off the end refrencing
potentially uninitialized memory.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When mac80211 is built into the kernel it needs to init earlier
so that device registrations are run after it has initialised.
The same applies to rate control algorithms.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
wme.c triggers a sparse warning; it wasn't noticed before because until
recently ARRAY_SIZE triggered a sparse error.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When cfg80211 is built into the kernel it needs to init earlier
so that device registrations are run after it has initialised.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
net/wireless/sysfs.c:108: warning: ‘wiphy_uevent’ defined but not used
when CONFIG_HOTPLUG=n is because the only usage site of this function
is #ifdef'ed as such, so let's #ifdef the definition also.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Commit 4cf92a3c was submitted as a fix for bug #8686 at bugzilla.kernel.org
(http://bugzilla.kernel.org/show_bug.cgi?id=8686). Unfortunately, the fix led to
a new bug, reported by Yoshifuji Hideaki, that prevented association for WEP
encrypted networks that use ifconfig to control the device. This patch effectively
reverts the earlier commit and does a proper fix for bug #8686.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since
then we get the message
lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from ^\\236^\É^D
These random characters in the second line are caused by a bug in
svc_tcp_accept.
(Note: there are two previous __svc_print_addr(sin, buf, sizeof(buf))
calls in this function, either of which would initialize buf correctly;
but both are inside "if"'s and are not necessarily executed. This is
less obvious in the second case, which is inside a dprintk(), which is a
macro which expands to an if statement.)
Signed-off-by: Wolfgang Walter <wolfgang.walter@studentenwerk.mhn.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following patch fixes the handling of netlink packets containing
multiple messages.
As exposed during netfilter workshop, nfnetlink_log was overwritten the
message type of the last message (setting it to MSG_DONE) in a multipart
packet. The consequence was libnfnetlink to ignore the last message in the
packet.
The following patch adds a supplementary message (with type MSG_DONE) af
the end of the netlink skb.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[VLAN]: Fix net_device leak.
[PPP] generic: Fix receive path data clobbering & non-linear handling
[PPP] generic: Call skb_cow_head before scribbling over skb
[NET] skbuff: Add skb_cow_head
[BRIDGE]: Kill clone argument to br_flood_*
[PPP] pppoe: Fill in header directly in __pppoe_xmit
[PPP] pppoe: Fix data clobbering in __pppoe_xmit and return value
[PPP] pppoe: Fix skb_unshare_check call position
[SCTP]: Convert bind_addr_list locking to RCU
[SCTP]: Add RCU synchronization around sctp_localaddr_list
[PKT_SCHED]: sch_cbq.c: Shut up uninitialized variable warning
[PKTGEN]: srcmac fix
[IPV6]: Fix source address selection.
[IPV4]: Just increment OutDatagrams once per a datagram.
[IPV6]: Just increment OutDatagrams once per a datagram.
[IPV6]: Fix unbalanced socket reference with MSG_CONFIRM.
[NET_SCHED] protect action config/dump from irqs
[NET]: Fix two issues wrt. SO_BINDTODEVICE.
In "[VLAN]: Move device registation to seperate function" (commit
e89fe42cd0), a pile of code got moved
to register_vlan_dev(), including grabbing a reference to underlying
device. However, original dev_hold() had been left behind, so we
leak a reference to net_device now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds an optimised version of skb_cow that avoids the copy if
the header can be modified even if the rest of the payload is cloned.
This can be used in encapsulating paths where we only need to modify the
header. As it is, this can be used in PPPOE and bridging.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The clone argument is only used by one caller and that caller can clone
the packet itself. This patch moves the clone call into the caller and
kills the clone argument.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the sctp_sockaddr_entry is now RCU enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU. This includes the
sctp_bind_addrs structure and it's list of bound addresses.
This list is currently protected by an external rw_lock and that
looks like an overkill. There are only 2 writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already seriealized via the socket lock, so they will
not step on each other. These are also relatively rare, so we
should be good with RCU.
The readers are varied and they are easily converted to RCU.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is not synchronization
between writer (even handler) and readers. As a result,
the readers can access an entry that has been freed and
crash the sytem.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/sched/sch_cbq.c: In function 'cbq_enqueue':
net/sched/sch_cbq.c:383: warning: 'ret' may be used uninitialized in this function
has been verified to be a bogus case. So let's shut it up.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 95c385 broke proper source address selection for cases in which
there is a address which is makred 'deprecated'. The commit mistakenly
changed ifa->flags to ifa_result->flags (probably copy/paste error from a
few lines above) in the 'Rule 3' address selection code.
The patch restores the previous RFC-compliant behavior.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
(with no apologies to C Heston)
On Mon, 2007-10-09 at 21:00 +0800, Herbert Xu wrote:
On Sun, Sep 02, 2007 at 01:11:29PM +0000, Christian Kujau wrote:
> >
> > after upgrading to 2.6.23-rc5 (and applying davem's fix [0]), lockdep
> > was quite noisy when I tried to shape my external (wireless) interface:
> >
> > [ 6400.534545] FahCore_78.exe/3552 just changed the state of lock:
> > [ 6400.534713] (&dev->ingress_lock){-+..}, at: [<c038d595>]
> > netif_receive_skb+0x2d5/0x3c0
> > [ 6400.534941] but this lock took another, soft-read-irq-unsafe lock in the
> > past:
> > [ 6400.535145] (police_lock){-.--}
>
> This is a genuine dead-lock. The police lock can be taken
> for reading with softirqs on. If a second CPU tries to take
> the police lock for writing, while holding the ingress lock,
> then a softirq on the first CPU can dead-lock when it tries
> to get the ingress lock.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Comments suggest that setting optlen to zero will unbind
the socket from whatever device it might be attached to. This
hasn't been the case since at least 2.2.x because the first thing
this function does is return -EINVAL if 'optlen' is less than
sizeof(int).
This check also means that passing in a two byte string doesn't
work so well. It's almost as if this code was testing with "eth?"
patterned strings and nothing else :-)
Fix this by breaking the logic of this facility out into a
seperate function which validates optlen more appropriately.
The optlen==0 and small string cases now work properly.
2) We should reset the cached route of the socket after we have made
the device binding changes, not before.
Reported by Ben Greear.
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit aaf68cfbf2 added a bias
to sk_inuse, so this test for an unused socket now fails. So no
sockets get closed because they are old (they might get closed
if the client closed them).
This bug has existed since 2.6.21-rc1.
Thanks to Wolfgang Walter for finding and reporting the bug.
Cc: Wolfgang Walter <wolfgang.walter@studentenwerk.mhn.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some of skbs in sk->write_queue do not have skb->dst because
we do not fill skb->dst when we allocate new skb in append_data().
BTW, I think we may not need to (or we should not) increment some stats
when using corking; if 100 sendmsg() (with MSG_MORE) result in 2 packets,
how many should we increment?
If 100, we should set skb->dst for every queued skbs.
If 1 (or 2 (*)), we increment the stats for the first queued skb and
we should just skip incrementing OutDiscards for the rest of queued skbs,
adn we should also impelement this semantics in other places;
e.g., we should increment other stats just once, not 100 times.
*: depends on the place we are discarding the datagram.
I guess should just increment by 1 (or 2).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
So I've had a deadlock reported to me. I've found that the sequence of
events goes like this:
1) process A (modprobe) runs to remove ip_tables.ko
2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket,
increasing the ip_tables socket_ops use count
3) process A acquires a file lock on the file ip_tables.ko, calls remove_module
in the kernel, which in turn executes the ip_tables module cleanup routine,
which calls nf_unregister_sockopt
4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the
calling process into uninterruptible sleep, expecting the process using the
socket option code to wake it up when it exits the kernel
4) the user of the socket option code (process B) in do_ipt_get_ctl, calls
ipt_find_table_lock, which in this case calls request_module to load
ip_tables_nat.ko
5) request_module forks a copy of modprobe (process C) to load the module and
blocks until modprobe exits.
6) Process C. forked by request_module process the dependencies of
ip_tables_nat.ko, of which ip_tables.ko is one.
7) Process C attempts to lock the request module and all its dependencies, it
blocks when it attempts to lock ip_tables.ko (which was previously locked in
step 3)
Theres not really any great permanent solution to this that I can see, but I've
developed a two part solution that corrects the problem
Part 1) Modifies the nf_sockopt registration code so that, instead of using a
use counter internal to the nf_sockopt_ops structure, we instead use a pointer
to the registering modules owner to do module reference counting when nf_sockopt
calls a modules set/get routine. This prevents the deadlock by preventing set 4
from happening.
Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking
remove operations (the same way rmmod does), and add an option to explicity
request blocking operation. So if you select blocking operation in modprobe you
can still cause the above deadlock, but only if you explicity try (and since
root can do any old stupid thing it would like.... :) ).
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we're now using a generic tuple decoding function in ICMP
connection tracking, ipv4_get_l4proto() might get called with a
fragmented packet from within an ICMP error. Remove the error
message we used to print when this happens.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Denis V. Lunev <den@openvz.org>
addrconf_dad_failure calls addrconf_dad_stop which takes referenced address
and drops the count. So, in6_ifa_put perrformed at out: is extra. This
results in message: "Freeing alive inet6 address" and not released dst entries.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not all are listed, same as the IPV4 devinet bug.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: http://bugzilla.kernel.org/show_bug.cgi?id=8876
Not all ips are shown by "ip addr show" command when IPs number assigned to an
interface is more than 60-80 (in fact it depends on broadcast/label etc
presence on each address).
Steps to reproduce:
It's terribly simple to reproduce:
# for i in $(seq 1 100); do ip ad add 10.0.$i.1/24 dev eth10 ; done
# ip addr show
this will _not_ show all IPs.
Looks like the problem is in netlink/ipv4 message processing.
This is fix from bug submitter, it looks correct.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When msg_iovlen is zero we shouldn't try to dereference
msg_iov. Right now the only thing that tries to do so
is skb_copy_and_csum_datagram_iovec. Since the total
length should also be zero if msg_iovlen is zero, it's
sufficient to check the total length there and simply
return if it's zero.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
On device initialization the event filters are cleared. In case of
clearing the filters the extra condition type shall be omitted.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch updates the HCI security filter with support for the
Bluetooth 2.1 commands and events.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The timestamp structure needs special handling in case of compat
programs. Use the same wrapping method the network core uses.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The pktgen_thread.pid is set to current->pid and is never used
after this. So remove this at all.
Found during isolating the explicit pid/tgid usage.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It doesn't seem to have any effect on the x86 architecture but it does
have effect on the Axis CRIS architecture.
Signed-off-by: Jesper Bengtsson <jesper.bengtsson@axis.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_NET_CLS_ACT is enabled, tc_classify() is called twice in
prio_classify(). This causes "interesting" behaviour: with the setup
below, packets are duplicated, sent twice to ifb0, and then loop in and
out of ifb0.
The patch uses the previously calculated return value in the switch,
which is probably what Patrick had in mind in commit
bdba91ec70 -- maybe Patrick can
double-check this?
-- example setup --
ifconfig ifb0 up
tc qdisc add dev ifb0 root netem delay 2s
tc qdisc add dev $ETH root handle 1: prio
tc filter add dev $ETH parent 1: protocol ip prio 10 u32 \
match ip dst 172.24.110.6/32 flowid 1:1 \
action mirred egress redirect dev ifb0
ping -c1 172.24.110.6
Signed-off-by: Lucas Nussbaum <lucas.nussbaum@imag.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bridge code calls ethtool to get speed. The conversion to using
only ethtool_ops broke the case of devices without ethtool_ops.
This is a new regression in 2.6.23.
Rearranged the switch to a logical order, and use gcc initializer.
Ps: speed should have been part of the network device structure from
the start rather than burying it in ethtool.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes some packet leakage in bridge. The bridging code was
allowing forward table entries to be generated even if a device was
being blocked. The fix is to not add forwarding database entries
unless the port is active.
The bug arose as part of the conversion to processing STP frames
through normal receive path (in 2.6.17).
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cell phone networks do link layer retransmissions and other
things that cause unnecessary timeout retransmits. So allow
the minimum RTO to be inflated per-route to deal with this.
Signed-off-by: David S. Miller <davem@davemloft.net>
If an INIT with invalid parameter length look like this:
Parameter Type : 1
Parameter Length: 800
and not contain any payload, SCTP will ignore this parameter and send
back a INIT-ACK.
This patch is fix to handle this invalid parameter length correctly.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Currently we abort on the INIT chunk we our backlog is currenlty
exceeded. Delay this about untill COOKIE-ECHO to give the user
time to accept the socket. Also, make sure that we treat
sk_max_backlog of 0 as no connections allowed.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When performing a retransmit, do not include the chunk if
it was sent less then 1 rtt ago. The reason is that we
may receive the SACK very soon and wouldn't retransmit.
Suggested by Randy Stewart.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Do not set Unconfirmed transports to Inactive state. This may
result in an inactive association being destroyed since we start
counting errors on "inactive" transports against the association.
This was found at the SCTP interop event.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
sctp_bindx() allows the use of unspecified port. The problem is
that every address we bind to ends up selecting a new port if
the user specified port 0. This patch allows re-use of the
already selected port when the port from bindx was 0.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When multi bundling SHUTDOWN-ACK message is received in ESTAB state,
this will cause "sctp protocol violation state" message print many times.
If SHUTDOWN-ACK is bundled 300 times in one packet, message will be
print 300 times. The same problem also exists when received unexpected
HEARTBEAT-ACK message which is bundled message times.
This patch used net_ratelimit() to suppress error messages print too fast.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
PROTOCOL VIOLATION error cause in ABORT is bad encode when make abort
chunk. When SCTP encode ABORT chunk with PROTOCOL VIOLATION error cause,
it just add the error messages to PROTOCOL VIOLATION error cause, the
rest four bytes(struct sctp_paramhdr) is just add to the chunk, not
change the length of error cause. This cause the ABORT chunk to be a bad
format. The chunk is like this:
ABORT chunk
Chunk type: ABORT (6)
Chunk flags: 0x00
Chunk length: 72 (*1)
Protocol violation cause
Cause code: Protocol violation (0x000d)
Cause length: 62 (*2)
Cause information: 5468652063756D756C61746976652074736E2061636B2062...
Cause padding: 0000
[Needless] 00030010
Chunk Length(*1) = 72 but Cause length(*2) only 62, not include the
extend 4 bytes.
((72 - sizeof(chunk_hdr)) = 68) != (62 +3) / 4 * 4
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
At function sctp_addto_chunk(), it do pad before add payload to chunk if
chunk length is not 4-byte alignment. But it do pad with a bad length.
This patch fixed this probleam.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Currently we only assign the sequence number to a packet that
we are about to transmit. This however breaks the Partial
Reliability extensions, because it's possible for us to
never transmit a packet, i.e. it expires before we get to send
it. In such cases, if the message contained multiple SCTP
fragments, and we did manage to send the first part of the
message, the Stream sequence numbers would get into invalid
state and cause receiver to stall.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When we recieve a FWD-TSN (meaning the peer has abandoned the data),
we need to clean up any partially received messages that may be
hanging out on the re-assembly or re-ordering queues. This is
a MUST requirement that was not properly done before.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com.>
Initially pkt_dev can be NULL this causes netif_subqueue_stopped to
oops. The patch below should cure it. But maybe the pktgen TX logic
should be reworked to better support the new multiqueue support.
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
I tried to preserve bridging code as it was before, but logic is quite
strange - I think we should free skb on error, since it is already
unshared and thus will just leak.
Herbert Xu states:
> + if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
> + goto out;
If this happens it'll be a double-free on skb since we'll
return NF_DROP which makes the caller free it too.
We could return NF_STOLEN to prevent that but I'm not sure
whether that's correct netfilter semantics. Patrick, could
you please make a call on this?
Patrick McHardy states:
NF_STOLEN should work fine here.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a crash that may occur when the routine dev_mc_sync()
deletes an address from the list it is currently going through. It
saves the pointer to the next element before deleting the current one.
The problem may also exist in dev_mc_unsync().
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replacing n & (n - 1) for power of 2 check by is_power_of_2(n)
Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
People often get tripped up by this function and think that
it does not implemented the prescribed algorithms from
RFC2414 and RFC3390, even though it does.
So add a comment to head off such misunderstandings in the
future.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix IP[V6]_ADD_MEMBERSHIP and IP[V6]_DROP_MEMBERSHIP to
return -EPROTO for connection oriented sockets.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In testing our ESP/AH offload hardware, I discovered an issue with how
AH handles mutable fields in IPv4. RFC 4302 (AH) states the following
on the subject:
For IPv4, the entire option is viewed as a unit; so even
though the type and length fields within most options are immutable
in transit, if an option is classified as mutable, the entire option
is zeroed for ICV computation purposes.
The current implementation does not zero the type and length fields,
resulting in authentication failures when communicating with hosts
that do (i.e. FreeBSD).
I have tested record route and timestamp options (ping -R and ping -T)
on a small network involving Windows XP, FreeBSD 6.2, and Linux hosts,
with one router. In the presence of these options, the FreeBSD and
Linux hosts (with the patch or with the hardware) can communicate.
The Windows XP host simply fails to accept these packets with or
without the patch.
I have also been trying to test source routing options (using
traceroute -g), but haven't had much luck getting this option to work
*without* AH, let alone with.
Signed-off-by: Nick Bowler <nbowler@ellipticsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix bad error path in conversion routines
9p: remove deprecated v9fs_fid_lookup_remove()
9p: update maintainers and documentation
9p: fix use after free
When buf_check_overflow() returns != 0 we will hit kfree(ERR_PTR(err))
and it will not be happy about it.
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
On 7/22/07, Adrian Bunk <bunk@stusta.de> wrote:
The Coverity checker spotted the following use-after-free
in net/9p/mux.c:
<-- snip -->
...
struct p9_conn *p9_conn_create(struct p9_transport *trans, int msize,
unsigned char *extended)
{
...
if (!m->tagpool) {
kfree(m);
return ERR_PTR(PTR_ERR(m->tagpool));
}
...
<-- snip -->
Also spotted was a leak of the same structure further down in the function.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
It seems an extraneous trailing ';' has slipped in to the error handling for a
name registration failure causing the error path to trigger unconditionally.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Easily avoidable compiler warnings bug me.
Building irmod without CONFIG_SYSCTL currently results in :
net/irda/irmod.c:132: warning: label 'out_err_2' defined but not used
But that can easily be avoided by simply moving the label inside
the existing "#ifdef CONFIG_SYSCTL" one line above it.
This patch moves the label and buys us one less warning with no
ill effects.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The snap_rcv code reads 5 bytes so we should make sure that
we have 5 bytes in the head before proceeding.
Based on diagnosis and fix by Evgeniy Polyakov, reported by
Alan J. Wylie.
Patch also kills the skb->sk assignment before kfree_skb
since it's redundant.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A similar fix to netfilter from Eric Dumazet inspired me to
look around a bit by using some grep/sed stuff as looking for
this kind of bugs seemed easy to automate. This is one of them
I found where it looks like this semicolon is not valid.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent RCU work created an unbalanced rcu_read_unlock
in __sock_create. This patch fixes that. Reported by
oleg 123.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Coverity checker spotted that we'd have already oops'ed if
"vlandev" was NULL.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the no longer used EXPORT_SYMBOL(dev_ethtool).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
Probe for hidden SSIDs if initiating pre-authentication scan and SSID
is set for STA interface.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When I added the monitor for outgoing frames somehow a break
statement slipped in. Remove it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The stp change code generates "sleeping function called from invalid
context" because rtnl_lock() called with BH disabled. This fixes it by
not acquiring then dropping the bridge lock.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't drop packets shorter than "SIP/2.0", just ignore them. Keep-alives
can validly be shorter for example.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The userinfo component of a SIP-URI is optional, continue parsing at the
beginning of the SIP-URI in case its not found.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check got lost during the conversion to nf_conntrack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
An extraneous ";" makes xt_u32 match useless
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For architectures that don't have a volatile atomic_ts constructs like
while (atomic_read(&something)); might result in endless loops since a
barrier() is missing which forces the compiler to generate code that
actually reads memory contents.
Fix this in ipvs by using the IP_VS_WAIT_WHILE macro which resolves to
while (expr) { cpu_relax(); }
(why isn't this open coded btw?)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
http://bugzilla.kernel.org/show_bug.cgi?id=8797 shows that the
bonding driver may produce bogus combinations of the checksum
flags and SG/TSO.
For example, if you bond devices with NETIF_F_HW_CSUM and
NETIF_F_IP_CSUM you'll end up with a bonding device that
has neither flag set. If both have TSO then this produces
an illegal combination.
The bridge device on the other hand has the correct code to
deal with this.
In fact, the same code can be used for both. So this patch
moves that logic into net/core/dev.c and uses it for both
bonding and bridging.
In the process I've made small adjustments such as only
setting GSO_ROBUST if at least one constituent device
supports it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a memory leak in net/dccp/feat.c::dccp_feat_empty_confirm(). If we
hit the 'default:' case of the 'switch' statement, then we return without
freeing 'opt', thus leaking 'struct dccp_opt_pend' bytes.
The leak is fixed easily enough by adding a kfree(opt); before the return
statement.
The patch also changes the layout of the 'switch' to be more in line with
CodingStyle.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure that spin_unlock_wait() is properly ordered wrt atomic_inc().
(akpm: can't we convert this code to use rwlocks?)
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up duplicate includes in
net/xfrm/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up duplicate includes in
net/tipc/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up duplicate includes in
net/sunrpc/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up duplicate includes in
net/sched/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up duplicate includes in
net/ipv6/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up duplicate includes in
net/ipv4/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up duplicate includes in
net/atm/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following commandline:
root=/dev/mtdblock6 rw rootfstype=jffs2 ip=192.168.1.10:::255.255.255.0:localhost.localdomain:eth1:off console=ttyS0,115200
makes ip_auto_config fall back to DHCP and complain "IP-Config: Incomplete
network configuration information." depending on if CONFIG_IP_PNP_DHCP is
set or not.
The only way I can make ip_auto_config accept my IP config is to add an
entry for the server IP:
ip=192.168.1.10:192.168.1.15::255.255.255.0:localhost.localdomain:eth1:off
I think this is a bug since I am not using a NFS root FS.
The following patch fixes the above problem.
From: Andrew Morton <akpm@linux-foundation.org>
Davem said (in February!):
Well, first of all the change in question is not in 2.4.x either. I just
checked the current 2.4.x GIT tree and the test is exactly:
if (ic_myaddr == INADDR_NONE ||
#ifdef CONFIG_ROOT_NFS
(MAJOR(ROOT_DEV) == UNNAMED_MAJOR
&& root_server_addr == INADDR_NONE
&& ic_servaddr == INADDR_NONE) ||
#endif
ic_first_dev->next) {
which matches 2.6.x
I even checked 2.4.x when it was branched for 2.5.x and the test was the
same at the point in time too.
Looking at the proposed change a bit it appears that it is probably
correct, as it's trying to check that ROOT_DEV is nfs root. But if it is
correct then the UNNAMED_MAJOR comparison in the same code block should be
removed as it becomes superfluous.
I'm happy to apply this patch with that modification made.
Signed-off-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
SUNRPC: Replace flush_workqueue() with cancel_work_sync() and friends
NFS: Replace flush_scheduled_work with cancel_work_sync() and friends
SUNRPC: Don't call gss_delete_sec_context() from an rcu context
NFSv4: Don't call put_rpccred() from an rcu callback
NFS: Fix NFSv4 open stateid regressions
NFSv4: Fix a locking regression in nfs4_set_mode_locked()
NFS: Fix put_nfs_open_context
SUNRPC: Fix a race in rpciod_down()
Small patch to H-TCP from Douglas Leith.
Fix estimation of maxRTT. The original code ignores rtt measurements
during slow start (via the check tp->snd_ssthresh < 0xFFFF) yet this
is probably a good time to try to estimate max rtt as delayed acking
is disabled and slow start will only exit on a loss which presumably
corresponds to a maxrtt measurement. Second, the original code (via
the check htcp_ccount(ca) > 3) ignores rtt data during what it
estimates to be the first 3 round-trip times. This seems like an
unnecessary check now that the RCV timestamp are no longer used
for rtt estimation.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Loading nf_nat causes the conntrack core to be loaded, but we need IPv4 as
well.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ctnetlink must return EEXIST for existing nat'ed conntracks instead of
EINVAL. Only return EINVAL if we try to update a conntrack with NAT
handlings (that is not allowed).
Decadence:libnetfilter_conntrack/utils# ./conntrack_create_nat
TEST: create conntrack (0)(Success)
Decadence:libnetfilter_conntrack/utils# ./conntrack_create_nat
TEST: create conntrack (-1)(Invalid argument)
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the call to seq_open() returns != 0 then the code calls
kfree(st) but then on the very next line proceeds to
dereference the pointer - not good.
Problem spotted by the Coverity checker.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_msg_warn is not defined because it is in net/sock.h which isn't
included.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LSM domain mapping head table pointer was not being referenced via the RCU
safe dereferencing function, rcu_dereference(). This patch adds those missing
calls to the NetLabel code.
This has been tested using recent linux-2.6 git kernels with no visible
regressions.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 4ada539ed7 lead to the unpleasant
possibility of an asynchronous rpc_task being required to call
rpciod_down() when it is complete. This again means that the rpciod
workqueue may get to call destroy_workqueue on itself -> hang...
Change rpciod_up/rpciod_down to just get/put the module, and then
create/destroy the workqueues on module load/unload.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
zd1211rw gets confused when the user asks for a scan when the device is
in monitor mode. This patch tightens up the SIWSCAN handler to deny the scan
under these conditions.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix sparse error for sta_last_seq_ctrl_read.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use do { } while (0) for multi-line macros
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fixes an unlikely reference leak condition.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The essid wireless extension does deadlock against the assoc mutex,
as we don't unlock the assoc mutex when flushing the workqueue, which
also holds the lock.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In case a DSACK is received, it's better to lower cwnd as it's
a sign of data receival.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_cwnd_down must check for it too as it should be conservative
in case of collapse stuff and also when receiver is trying to
lie (though that wouldn't be very successful/useful anyway).
Note:
- Separated also is_dupack and do_lost in fast_retransalert
* Much cleaner look-and-feel now
* This time it really fixes cumulative ACK with many new
SACK blocks recovery entry (I claimed this fixes with
last patch but it wasn't). TCP will now call
tcp_update_scoreboard regardless of is_dupack when
in recovery as long as there is enough fackets_out.
- Introduce FLAG_SND_UNA_ADVANCED
* Some prior_snd_una arguments are unnecessary after it
- Added helper FLAG_ANY_PROGRESS to avoid long FLAG...|FLAG...
constructs
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix two warnings generated by sparse:
link.c:2386 symbol 'msgcount' shadows an earlier one
node.c:244 symbol 'addr_string' shadows an earlier one
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
make needlessly global function tipc_nameseq_subscribe static.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although an ipsec SA was established, kernel couldn't seem to find it.
I think since we are now using "x->sel.family" instead of "family" in
the xfrm_selector_match() called in xfrm_state_find(), af_key needs to
set this field too, just as xfrm_user.
In af_key.c, x->sel.family only gets set when there's an
ext_hdrs[SADB_EXT_ADDRESS_PROXY-1] which I think is for tunnel.
I think pfkey needs to also set the x->sel.family field when it is 0.
Tested with below patch, and ipsec worked when using pfkey.
Signed-off-by: David S. Miller <davem@davemloft.net>
As discovered by Evegniy Polyakov, if we try to sendmsg after
a connection reset, we can do incredibly stupid things.
The core issue is that inet_sendmsg() tries to autobind the
socket, but we should never do that for TCP. Instead we should
just go straight into TCP's sendmsg() code which will do all
of the necessary state and pending socket error checks.
TCP's sendpage already directly vectors to tcp_sendpage(), so this
merely brings sendmsg() in line with that.
Signed-off-by: David S. Miller <davem@davemloft.net>
The security_secid_to_secctx() function returns memory that must be freed
by a call to security_release_secctx() which was not always happening. This
patch fixes two of these problems (all that I could find in the kernel source
at present).
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Some code in function sctp_init_cause() seem useless, this patch remove
them.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
We need to drop the SACK if the peer is attempting to acknowledge
unset data, i.e. the CTSN in the SACK is greater or equal to the
next TSN we will send.
Example:
Endpoint A Endpoint B
<--------------- DATA (TSN=1)
SACK(TSN=1) --------------->
<--------------- DATA (TSN=2)
<--------------- DATA (TSN=3)
<--------------- DATA (TSN=4)
<--------------- DATA (TSN=5)
SACK(TSN=1000) --------------->
<--------------- DATA (TSN=6)
<--------------- DATA (TSN=7)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When issuing a connect call on an AF_INET6 sctp socket with
a IPv4-mapped destination, the peer address that is returned
by getpeeraddr() should be v4-mapped as well.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
An accept() call on a SCTPv6 socket that returns due to connection of
a IPv4 mapped peer will fill out the 'struct sockaddr' with a zero
IPv6 address instead of the IPv4 mapped address of the peer.
This is due to the v4mapped flag not getting copied into the new
socket on accept() as well as a missing check for INET6 socket type in
sctp_v4_to_sk_*addr().
Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
Cc: Srinivas Akkipeddi <sakkiped@starentnetworks.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
net/sctp/tsnmap.c:164:16: warning: symbol '_end' shadows an earlier one
include/asm-generic/sections.h:13:13: originally declared here
Renamed renamed _end to end_ and _start (for consistence).
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
unlock the reader lock in error case.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Fixes the following sparse warnings:
net/sctp/sm_make_chunk.c:1457:9: warning: symbol 'len' shadows an earlier one
net/sctp/sm_make_chunk.c:1356:23: originally declared here
net/sctp/socket.c:1534:22: warning: symbol 'chunk' shadows an earlier one
net/sctp/socket.c:1387:20: originally declared here
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
sctp_chunk_cachep & sctp_bucket_cachep is used module global, so move it
to a header file.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Forward declarion is static, the function itself is not. Make it
consistent.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits)
[RTNETLINK]: Fix warning for !CONFIG_KMOD
[IPV4] ip_options.c: kmalloc + memset conversion to kzalloc
[DECNET]: kmalloc + memset conversion to kzalloc
[NET]: ethtool_perm_addr only has one implementation
[NET]: ethtool ops are the only way
[PPPOE]: Improve hashing function in hash_item().
[XFRM]: State selection update to use inner addresses.
[IPSEC]: Ensure that state inner family is set
[TCP]: Bidir flow must not disregard SACK blocks for lost marking
[TCP]: Fix ratehalving with bidirectional flows
[PPPOL2TP]: Add CONFIG_INET Kconfig dependency.
[NET]: Page offsets and lengths need to be __u32.
[AF_UNIX]: Make code static.
[NETFILTER]: Make nf_ct_ipv6_skip_exthdr() static.
[PKTGEN]: make get_ipsec_sa() static and non-inline
[PPPoE]: move lock_sock() in pppoe_sendmsg() to the right location
[PPPoX/E]: return ENOTTY on unknown ioctl requests
[IPV6]: ipv6_addr_type() doesn't know about RFC4193 addresses.
[NET]: Fix prio_tune() handling of root qdisc.
[NET]: Fix sch_api to properly set sch->parent on the root.
...
When buf_check_overflow() returns != 0 we will hit kfree(ERR_PTR(err)) and
it will not be happy about it.
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All drivers implement ethtool get_perm_addr the same way -- by calling
the generic function. So we can inline the generic function into the
caller and avoid going through the drivers.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the transition to the ethtool_ops way of doing things, we supported
calling the device's ->do_ioctl method to allow unconverted drivers to
continue working. Those days are long behind us, all in-tree drivers
use the ethtool_ops way, and so we no longer need to support this.
The bonding driver is the biggest beneficiary of this; it no longer
needs to call ioctl() as a fallback if ethtool_ops aren't supported.
Also put a proper copyright statement on ethtool.c.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies the xfrm state selection logic to use the inner
addresses where the outer have been (incorrectly) used. This is
required for beet mode in general and interfamily setups in both
tunnel and beet mode.
Signed-off-by: Joakim Koskela <jookos@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Diego Beltrami <diego.beltrami@gmail.com>
Signed-off-by: Miika Komu <miika@iki.fi>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to the issue we had with template families which
specified the inner families of policies, we need to set
the inner families of states as the main xfrm user Openswan
leaves it as zero.
af_key is unaffected because the inner family is set by it
and not the KM.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible that new SACK blocks that should trigger new LOST
markings arrive with new data (which previously made is_dupack
false). In addition, I think this fixes a case where we get
a cumulative ACK with enough SACK blocks to trigger the fast
recovery (is_dupack would be false there too).
I'm not completely pleased with this solution because readability
of the code is somewhat questionable as 'is_dupack' in SACK case
is no longer about dupacks only but would mean something like
'lost_marker_work_todo' too... But because of Eifel stuff done
in CA_Recovery, the FLAG_DATA_SACKED check cannot be placed to
the if statement which seems attractive solution. Nevertheless,
I didn't like adding another variable just for that either... :-)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Actually, the ratehalving seems to work too well, as cwnd is
reduced on every second ACK even though the packets in flight
remains unchanged. Recoveries in a bidirectional flows suffer
quite badly because of this, both NewReno and SACK are affected.
After this patch, rate halving is performed for ACK only if
packets in flight was supposedly changed too.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following code can now become static:
- struct unix_socket_table
- unix_table_lock
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Non-static inline code usually doesn't makes sense.
In this case making is static and non-inline is the correct solution.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_addr_type() doesn't check for 'Unique Local IPv6 Unicast
Addresses' (RFC4193) and returns IPV6_ADDR_RESERVED for that range.
SCTP uses this function and will fail bind() and connect() calls that
use RFC4193 addresses, SCTP will also ignore inbound connections from
RFC4193 addresses if listening on IPV6_ADDR_ANY.
There may be other users of ipv6_addr_type() that could also have
problems.
Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the check in prio_tune() to see if sch->parent is TC_H_ROOT instead of
sch->handle to load or reject the qdisc for multiqueue devices.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sch_api to correctly set sch->parent for both ingress and egress
qdiscs in qdisc_create().
Signed-off-by: Patrick McHardy <trash@kaber.net>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix handling of empty or completely non-matching filter chains. In
that case -1 is returned and tcf_result is uninitialized, the
qdisc should fall back to default classification in that case.
Noticed by PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that netdev notifications can fail, we can use this to signal
errors during registration for IPv4/IPv6. In particular, if we
fail to allocate memory for the inet device, we can fail the netdev
registration.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds code to allow errors to be passed up from event
handlers of NETDEV_REGISTER and NETDEV_CHANGENAME. It also adds
the notifier_from_errno/notifier_to_errnor helpers to pass the
errno value up to the notifier caller.
If an error is detected when a device is registered, it causes
that operation to fail. A NETDEV_UNREGISTER will be sent to
all event handlers.
Similarly if NETDEV_CHANGENAME fails the original name is restored
and a new NETDEV_CHANGENAME event is sent.
As such all event handlers must be idempotent with respect to
these events.
When an event handler is registered NETDEV_REGISTER events are
sent for all devices currently registered. Should any of them
fail, we will send NETDEV_GOING_DOWN/NETDEV_DOWN/NETDEV_UNREGISTER
events to that handler for the devices which have already been
registered with it. The handler registration itself will fail.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we added name-based hashing the dev_base_lock was designated as the
lock to take when changing the name hash list. Unfortunately, because
it was a preexisting lock that just happened to be taken in the right
spots we neglected to take it in dev_change_name.
The race can affect calles of __dev_get_by_name that do so without taking
the RTNL. They may end up walking down the wrong hash chain and end up
missing the device that they're looking for.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes register_netdevice call dev->uninit if the regsitration
fails after dev->init has completed successfully. Very few drivers use
the init/uninit calls but at least one (drivers/net/wan/sealevel.c) may
leak without this change.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a path that forwards packets, IPVS should be using
skb_forward_csum instead of directly setting ip_summed
to CHECKSUM_NONE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since nobody uses it after we convert it to host-endian,
no need to do that at all. At that point l2cap is endian-clean.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
no code changes, just documenting existing types
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We loop through psm values, calling __l2cap_get_sock_by_addr(psm, ...)
until we get NULL; then we set ->psm of our socket to htobs(psm).
IOW, we find unused psm value and put it into our socket. So far, so
good, but... __l2cap_get_sock_by_addr() compares its argument with
->psm of sockets. IOW, the entire thing works correctly only on
little-endian. On big-endian we'll get "no socket with such psm"
on the first iteration, since we won't find a socket with ->psm == 0x1001.
We will happily conclude that 0x1001 is unused and slap htobs(0x1001)
(i.e. 0x110) into ->psm of our socket. Of course, the next time around
the same thing will repeat and we'll just get a fsckload of sockets
with the same ->psm assigned.
Fix: pass htobs(psm) to __l2cap_get_sock_by_addr() there. All other
callers are already passing little-endian values and all places that
store something in ->psm are storing little-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk wrote:
> Commit 8de0a15483 added the following
> use-after-free in net/bluetooth/rfcomm/tty.c:
>
> <-- snip -->
>
> ...
> static int rfcomm_dev_add(struct rfcomm_dev_req *req, struct rfcomm_dlc *dlc)
> {
> ...
> if (IS_ERR(dev->tty_dev)) {
> list_del(&dev->list);
> kfree(dev);
> return PTR_ERR(dev->tty_dev);
> }
> ...
>
> <-- snip -->
>
> Spotted by the Coverity checker.
really good catch. I fully overlooked that one. The attached patch
should fix it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ADVMSS value was incorrectly updated for ALL routes when the MTU
is updated because it's outside the effect of the if statement's
condition.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
if printbuf allocation or tipc_node_attach_link() fails, invalid
references to the link are left in the associated node and bearer
structures.
Fix by allocating printbuf early and moving timer initialization
and the addition of the new link to the b_ptr->links list after
tipc_node_attach_link() succeeded.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix kernel-doc omissions in net/:
Warning(linux-2.6.23-rc1//net/core/dev.c:2728): No description found for parameter 'addr'
Warning(linux-2.6.23-rc1//net/core/dev.c:2752): No description found for parameter 'addr'
Warning(linux-2.6.23-rc1//net/core/dev.c:3839): No description found for parameter 'net_dma'
Warning(linux-2.6.23-rc1//net/core/dev.c:3877): No description found for parameter 'state'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change HTCP to use measured RTT rather than smooth RTT.
Srtt is computed using the TCP receive timestamp
options, so it is vulnerable to hostile receivers. To avoid any problems
this might cause use the measured RTT instead.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove use of received timestamp option value from RTT calculation in Cubic.
A hostile receiver may be returning a larger timestamp option than the original
value. This would cause the sender to believe the malevolent receiver had
a larger RTT and because Cubic tries to provide some RTT friendliness, the
sender would then favor the liar.
Instead, use the jiffie resolutionRTT value already computed and
passed back after ack.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the API for the callback that is done after an ACK is
received. It solves a couple of issues:
* Some congestion controls want higher resolution value of RTT
(controlled by TCP_CONG_RTT_SAMPLE flag). These don't really want a ktime, but
all compute a RTT in microseconds.
* Other congestion control could use RTT at jiffies resolution.
To keep API consistent the units should be the same for both cases, just the
resolution should change.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
smp_call_function_single now has the same semantics as s390's
smp_call_function_on. Therefore convert to the *single variant
and get rid of some architecture specific code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert rel_info to host-endian before calling ip6_tnl_err().
The things become much more straightforward that way.
The key observation (and the reason why that code actually
worked) is that after ip6_tnl_err() we either immediately
bailed out or had rel_info set to 0 or had it set to host-endian
and guaranteed to hit
(rel_type == ICMP_DEST_UNREACH && rel_code == ICMP_FRAG_NEEDED)
case. So inconsistent endianness didn't really lead to bugs,
but it had been subtle and prone to breakage. New variant is
saner and obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
no real bugs, just misannotations cropping up
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This avoids use of the kernel-internal "xtime" variable directly outside
of the actual time-related functions. Instead, use the helper functions
that we already have available to us.
This doesn't actually change any behaviour, but this will allow us to
fix the fact that "xtime" isn't updated very often with CONFIG_NO_HZ
(because much of the realtime information is maintained as separate
offsets to 'xtime'), which has caused interfaces that use xtime directly
to get a time that is out of sync with the real-time clock by up to a
third of a second or so.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net/8021q/vlan.c: In function 'vlan_ioctl_handler':
net/8021q/vlan.c:700: warning: 'err' may be used uninitialized in this function
The warning is incorrect, but from my reading this ioctl will return -EINVAL
on success.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current calculation of the maximum number of genetlink
multicast groups seems odd, fix it.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
family->mcast_groups is protected by genl_lock so it must
be held while accessing the list in genl_unregister_mc_groups().
Requires adding a non-locking variant of genl_unregister_mc_group().
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>