The fn check is unnecessary as fn can never be NULL in BACKTRACK().
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
As ip6_route_output() never returns NULL, error checking must be done by
looking at dst->error in stead of comparing dst against NULL.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch causes TIPC to return an error message when it receives
an unrecognized configuration command. (Previously, the sender
received no feedback.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows a TIPC application to cancel an existing
topology service subscription by re-requesting the subscription
with the TIPC_SUB_CANCEL filter bit set. (All other bits of
the cancel request must match the original subscription request.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a minor bug that prevents "tipc-config -l" from
displaying the multicast link if a TIPC node has never successfully
established at least one unicast link.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects an issue wherein a previouly failed node could
not reestablish a links to a non-failing node in the TIPC network
until the latter node detected the link failure itself (which might
be configured to take up to 30 seconds). The non-failing node now
responds to link setup requests from a previously failed node in at
most 1 second, allowing it to detect the link failure more quickly.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch tivially re-orders the entries in TIPC's list of local
publications so that applications will receive publication events
in the order they were published.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enhances TIPC's Ethernet support to include VLAN interfaces.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the compiler to optimize out any code that tries to
send debugging output to the null print buffer (TIPC_NULL), a capability
that was unintentionally broken during the recent print buffer rework.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a simple test so TIPC doesn't try waking up processes
waiting on a socket if there are none waiting.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIPC now rejects and logs link setup requests from node <Z.C.N> if the
receiving node already has a functional link to that node on the associated
interface, or if the requestor is using the same <Z.C.N> as the receiver.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The stream socket send code was not initializing some required fields
of the temporary msghdr structure it was utilizing; this is now fixed.
A check has also been added to detect if a user illegally specifies
a destination address when sending on an established stream connection.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change modifies TIPC's print buffer code as follows:
1) Now supports small print buffers (min. size reduced from 512 bytes to 64)
2) Now uses TIPC_NULL print buffer structure to indicate null device
instead of NULL pointer (this simplified error handling)
3) Fixed misuse of console buffer structure by tipc_dump()
4) Added and corrected comments in various places
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible for the ->fopen callback from lockd into nfsd to find that an
answer cannot be given straight away (an upcall is needed) and so the request
has to be 'dropped', to be retried later. That error status is not currently
propagated back.
So:
Change nlm_fopen to return nlm error codes (rather than a private
protocol) and define a new nlm_drop_reply code.
Cause nlm_drop_reply to cause the rpc request to get rpc_drop_reply
when this error comes back.
Cause svc_process to drop a request which returns a status of
rpc_drop_reply.
[akpm@osdl.org: fix warning storm]
Cc: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make net_random() more widely available by calling it random32
akpm: hopefully this will permit the removal of carta_random32. That needs
confirmation from Stephane - this code looks somewhat more computationally
expensive, and has a different (ie: callee-stateful) interface.
[akpm@osdl.org: lots of build fixes, cleanups]
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a slab corruption in ieee80211softmac_auth(). The size of a buffer
was miscomputed.
see http://bugzilla.kernel.org/show_bug.cgi?id=7245
Acked-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Laurent Riffard <laurent.riffard@free.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes some race conditions in the WirelessExtension
handling and association handling code.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The bt_proto array needs to be protected by some kind of locking to
prevent a race condition between bt_sock_create and bt_sock_register.
And in addition all calls to sk_alloc need to be made GFP_ATOMIC now.
Signed-off-by: Masatake YAMATO <jet@gyve.org>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If the DLC device is no longer attached to the TTY device, then it
makes no sense to go through with changing the termios settings.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When the connection lookup for the device structure fails, the reference
count for the HCI device needs to be decremented.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth HID specification demands that the interrupt channel
shall be disconnected first. This is needed to pass the qualification
tests.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Most Bluetooth chips don't support concurrent connect requests, because
this would involve a multiple baseband page with only one radio. In the
case an upper layer like L2CAP requests a concurrent connect these chips
return the error "Command Disallowed" for the second request. If this
happens it the responsibility of the Bluetooth core to queue the request
and try again after the previous connect attempt has been completed.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth subsystem currently uses a platform device for devices
with no parent. It is a better idea to use the new virtual devices
tree for these.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Some return values of the driver core register and create functions
are not handled and so might cause unexpected problems.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There exists no attempt do deal with the fact that a structure with
a uint32_t followed by a pointer is going to be different for 32-bit
and 64-bit userspace. Any 32-bit process trying to use it will be
failing with -EFAULT if it's lucky; suffering from having data dumped
at a random address if it's not.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This is missing the MODULE_LICENSE statements and taints the kernel
upon loading. License is obvious from the beginning of the file.
Signed-off-by: Jan Dittmer <jdi@l4x.org>
Signed-off-by: Joerg Roedel <joro-lkml@zlug.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes rt6_lookup() to provide the source address in the flow
and sets RT6_LOOKUP_F_HAS_SADDR whenever it is present in
the flow.
Avoids unnecessary prefix comparisons by checking for a prefix
length first.
Fixes the rule logic to not match packets if a source selector
has been specified but no source address is available.
Thanks to Kim Nordlund <kim.nordlund@nokia.com> for working
on this patch with me.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Missing counter bump when hashing in a new ACQ
xfrm_state.
Now that we have two spots to do the hash grow
check, break it out into a helper function.
Signed-off-by: David S. Miller <davem@davemloft.net>
The CIPSO passthrough mapping had a problem when sending categories which
would cause no or incorrect categories to be sent on the wire with a packet.
This patch fixes the problem which was a simple off-by-one bug.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Fix several places in the CIPSO code where it was dereferencing fields which
did not have valid pointers by moving those pointer dereferences into code
blocks where the pointers are valid.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Flush the forwarding table when carrier is lost. This helps for
availability because we don't want to forward to a downed device and
new packets may come in on other links.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove (compilation-breaking) debugging messages introduced at early
development stage.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONNSECMARK needs conntrack, add missing dependency to fix linking error
with CONNSECMARK=y and CONNTRACK=m.
Reported by Toralf Förster <toralf.foerster@gmx.de>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even though the tos field is only a single byte large, the values need to
be converted to net-endian for the checkum update so they are in the
corrent byte position. Also fix incorrect endian annotations.
Reported by Stephane Chazelas <Stephane_Chazelas@yahoo.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use rb_first() to get first entry in rb tree.
Signed-off-by: Akinbou Mita <akinobu.mita@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb is the netlink query, nskb is the reply message.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
They are not necessarily initialized to zero by the compiler,
for example when using run-time initializers of automatic
on-stack variables.
Noticed by Eric Dumazet and Patrick McHardy.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains the changes to net/ipv6/addrconf.c to remove sit
specific code if the sit driver is not selected.
Signed-off-by: Joerg Roedel <joro-lkml@zlug.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the driver of the IPv6-in-IPv4 tunnel driver (sit)
from the IPv6 module. It adds an option to Kconfig which makes it
possible to compile it as a seperate module.
Signed-off-by: Joerg Roedel <joro-lkml@zlug.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If more than one file descriptor was sent with an SCM_RIGHTS message,
and on the receiving end, after installing a nonzero (but not all)
file descritpors the process runs out of fds, then the already
installed fds will be lost (userspace will have no way of knowing
about them).
The following patch makes sure, that at least the already installed
fds are sent to userspace. It doesn't solve the issue of losing file
descriptors in case of an EFAULT on the userspace buffer.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Show the true receive buffer usage.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When doing receiver buffer accounting, we always used skb->truesize.
This is problematic when processing bundled DATA chunks because for
every DATA chunk that could be small part of one large skb, we would
charge the size of the entire skb. The new approach is to store the
size of the DATA chunk we are accounting for in the sctp_ulpevent
structure and use that stored value for accounting.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This treats the security errors encountered in the case of
socket policy matching, the same as how these are treated in
the case of main/sub policies, which is to return a full lookup
failure.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
Currently when an IPSec policy rule doesn't specify a security
context, it is assumed to be "unlabeled" by SELinux, and so
the IPSec policy rule fails to match to a flow that it would
otherwise match to, unless one has explicitly added an SELinux
policy rule allowing the flow to "polmatch" to the "unlabeled"
IPSec policy rules. In the absence of such an explicitly added
SELinux policy rule, the IPSec policy rule fails to match and
so the packet(s) flow in clear text without the otherwise applicable
xfrm(s) applied.
The above SELinux behavior violates the SELinux security notion of
"deny by default" which should actually translate to "encrypt by
default" in the above case.
This was first reported by Evgeniy Polyakov and the way James Morris
was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.
With this patch applied, SELinux "polmatching" of flows Vs. IPSec
policy rules will only come into play when there's a explicit context
specified for the IPSec policy rule (which also means there's corresponding
SELinux policy allowing appropriate domains/flows to polmatch to this context).
Secondly, when a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return errors other than access denied,
such as -EINVAL. We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.
The solution for this is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.
Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely). This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).
This patch: Fix the selinux side of things.
This makes sure SELinux polmatching of flow contexts to IPSec policy
rules comes into play only when an explicit context is associated
with the IPSec policy rule.
Also, this no longer defaults the context of a socket policy to
the context of the socket since the "no explicit context" case
is now handled properly.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
When a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return an access denied permission
(or other error). We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.
The way I was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.
The first SYNACK would be blocked, because of an uncached lookup via
flow_cache_lookup(), which would fail to resolve an xfrm policy because
the SELinux policy is checked at that point via the resolver.
However, retransmitted SYNACKs would then find a cached flow entry when
calling into flow_cache_lookup() with a null xfrm policy, which is
interpreted by xfrm_lookup() as the packet not having any associated
policy and similarly to the first case, allowing it to pass without
transformation.
The solution presented here is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.
Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely). This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).
Signed-off-by: James Morris <jmorris@namei.org>
Testing revealed a problem with the NetLabel cache where a cached entry could
be freed while in use by the LSM layer causing an oops and other problems.
This patch fixes that problem by introducing a reference counter to the cache
entry so that it is only freed when it is no longer in use.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
This annotation makes it possible to assign a subclass on lock init. This
annotation is meant to reduce the _nested() annotations by assigning a
default subclass.
One could do without this annotation and rely on lockdep_set_class()
exclusively, but that would require a manual stack of struct lock_class_key
objects.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is some confusion about the meaning of 'bufsz' for a sunrpc server.
In some cases it is the largest message that can be sent or received. In
other cases it is the largest 'payload' that can be included in a NFS
message.
In either case, it is not possible for both the request and the reply to be
this large. One of the request or reply may only be one page long, which
fits nicely with NFS.
So we remove 'bufsz' and replace it with two numbers: 'max_payload' and
'max_mesg'. Max_payload is the size that the server requests. It is used
by the server to check the max size allowed on a particular connection:
depending on the protocol a lower limit might be used.
max_mesg is the largest single message that can be sent or received. It is
calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and
with PAGE_SIZE added to overhead. Only one of the request and reply may be
this size. The other must be at most one page.
Cc: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davej/configh:
Remove all inclusions of <linux/config.h>
Manually resolved trivial path conflicts due to removed files in
the sound/oss/ subdirectory.
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[XFRM]: BEET mode
[TCP]: Kill warning in tcp_clean_rtx_queue().
[NET_SCHED]: Remove old estimator implementation
[ATM]: [zatm] always *pcr in alloc_shaper()
[ATM]: [ambassador] Change the return type to reflect reality
[ATM]: kmalloc to kzalloc patches for drivers/atm
[TIPC]: fix printk warning
[XFRM]: Clearing xfrm_policy_count[] to zero during flush is incorrect.
[XFRM] STATE: Use destination address for src hash.
[NEIGH]: always use hash_mask under tbl lock
[UDP]: Fix MSG_PROBE crash
[UDP6]: Fix flowi clobbering
[NET_SCHED]: Revert "HTB: fix incorrect use of RB_EMPTY_NODE"
[NETFILTER]: ebt_mark: add or/and/xor action support to mark target
[NETFILTER]: ipt_REJECT: remove largely duplicate route_reverse function
[NETFILTER]: Honour source routing for LVS-NAT
[NETFILTER]: add type parameter to ip_route_me_harder
[NETFILTER]: Kconfig: fix xt_physdev dependencies
The rpc reply has multiple levels of error returns. The code here contributes
to the confusion by using "accept_statp" for a pointer to what the rfc (and
wireshark, etc.) refer to as the "reply_stat". (The confusion is compounded
by the fact that the rfc also has an "accept_stat" which follows the
reply_stat in the succesful case.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the request is denied after gss_accept was called, we shouldn't try to wrap
the reply. We were checking the accept_stat but not the reply_stat.
To check the reply_stat in _release, we need a pointer to before (rather than
after) the verifier, so modify body_start appropriately.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Factor out some common code from the integrity and privacy cases.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The NFSACL patches introduced support for multiple RPC services listening on
the same transport. However, only the first of these services was registered
with portmapper. This was perfectly fine for nfsacl, as you traditionally do
not want these to show up in a portmapper listing.
The patch below changes the default behavior to always register all services
listening on a given transport, but retains the old behavior for nfsacl
services.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Speed up high call-rate workloads by caching the struct ip_map for the peer on
the connected struct svc_sock instead of looking it up in the ip_map cache
hashtable on every call. This helps workloads using AUTH_SYS authentication
over TCP.
Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
synthetic client threads simulating an rsync (i.e. recursive directory
listing) workload reading from an i386 RH9 install image (161480 regular files
in 10841 directories) on the server. That tree is small enough to fill in the
server's RAM so no disk traffic was involved. This setup gives a sustained
call rate in excess of 60000 calls/sec before being CPU-bound on the server.
Profiling showed strcmp(), called from ip_map_match(), was taking 4.8% of each
CPU, and ip_map_lookup() was taking 2.9%. This patch drops both contribution
into the profile noise.
Note that the above result overstates this value of this patch for most
workloads. The synthetic clients are all using separate IP addresses, so
there are 64 entries in the ip_map cache hash. Because the kernel measured
contained the bug fixed in commit
commit 1f1e030bf7
and was running on 64bit little-endian machine, probably all of those 64
entries were on a single chain, thus increasing the cost of ip_map_lookup().
With a modern kernel you would need more clients to see the same amount of
performance improvement. This patch has helped to scale knfsd to handle a
deployment with 2000 NFS clients.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The limit over UDP remains at 32K. Also, make some of the apparently
arbitrary sizing constants clearer.
The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
the rqstp. This allows it to be different for different protocols (udp/tcp)
and also allows it to depend on the servers declared sv_bufsiz.
Note that we don't actually increase sv_bufsz for nfs yet. That comes next.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
.. by allocating the array of 'kvec' in 'struct svc_rqst'.
As we plan to increase RPCSVC_MAXPAGES from 8 upto 256, we can no longer
allocate an array of this size on the stack. So we allocate it in 'struct
svc_rqst'.
However svc_rqst contains (indirectly) an array of the same type and size
(actually several, but they are in a union). So rather than waste space, we
move those arrays out of the separately allocated union and into svc_rqst to
share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at
different times, so there is no conflict).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256. This
means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES.
struct svc_rqst contains two such arrays. However the there are never more
that RPCSVC_MAXPAGES pages in the two arrays together, so only one array is
needed.
The two arrays are for the pages holding the request, and the pages holding
the reply. Instead of two arrays, we can simply keep an index into where the
first reply page is.
This patch also removes a number of small inline functions that probably
server to obscure what is going on rather than clarify it, and opencode the
needed functionality.
Also remove the 'rq_restailpage' variable as it is *always* 0. i.e. if the
response 'xdr' structure has a non-empty tail it is always in the same pages
as the head.
check counters are initilised and incr properly
check for consistant usage of ++ etc
maybe extra some inlines for common approach
general review
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Magnus Maatta <novell@kiruna.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arrgg.. We cannot 'lockd_up' before 'svc_addsock' as we don't know the
protocol yet.... So switch it around again and save the name of the created
sockets so that it can be closed if lock_up fails.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The refcount that nfsd holds on lockd is based on the number of open sockets.
So when we close a socket, we should decrement the ref (with lockd_down).
Currently when a socket is closed via writing to the portlist file, that
doesn't happen.
So: make sure we get an error return if the socket that was requested does is
not found, and call lockd_down if it was.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- rename ____kmalloc to kmalloc_track_caller so that people have a chance
to guess what it does just from it's name. Add a comment describing it
for those who don't. Also move it after kmalloc in slab.h so people get
less confused when they are just looking for kmalloc - move things around
in slab.c a little to reduce the ifdef mess.
[penberg@cs.helsinki.fi: Fix up reversed #ifdef]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the BEET mode (Bound End-to-End Tunnel) with as
specified by the ietf draft at the following link:
http://www.ietf.org/internet-drafts/draft-nikander-esp-beet-mode-06.txt
The patch provides only single family support (i.e. inner family =
outer family).
Signed-off-by: Diego Beltrami <diego.beltrami@gmail.com>
Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Abhinav Pathak <abhinav.pathak@hiit.fi>
Signed-off-by: Jeff Ahrenholz <ahrenholz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GCC can't tell we always initialize 'tv' in all the cases
we actually use it, so explicitly set it up with zeros.
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unused file, estimators live in net/core/gen_estimator.c now.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc spits out this warning:
net/tipc/link.c: In function ‘link_retransmit_failure’:
net/tipc/link.c:1669: warning: cast from pointer to integer of different
size
More than a little bit ugly, storing integers in void*, but at least the
code is correct, unlike some of the more crufty Linux kernel code found
elsewhere.
Rather than having two casts to massage the value into u32, it's easier
just to have a single cast and use "%lu", since it's just a printk.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we flush policies, we do a type match so we might not
actually delete all policies matching a certain direction.
So keep track of how many policies we actually kill and
subtract that number from xfrm_policy_count[dir] at the
end.
Based upon a patch by Masahide NAKAMURA.
Signed-off-by: David S. Miller <davem@davemloft.net>
Src hash is introduced for Mobile IPv6 route optimization usage.
On current kenrel code it is calculated with source address only.
It results we uses the same hash value for outbound state (when
the node has only one address for Mobile IPv6).
This patch use also destination address as peer information for
src hash to be dispersed.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure hash_mask is protected with tbl->lock in all cases just like
the hash_buckets.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP tracks corking status through the pending variable. The
IP layer also tracks it through the socket write queue. It
is possible for the two to get out of sync when MSG_PROBE is
used.
This patch changes UDP to check the write queue to ensure
that the two stay in sync.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The udp6_sendmsg function uses a shared buffer to store the
flow without taking any locks. This leads to races with SMP.
This patch moves the flowi object onto the stack.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit 10fd48f237 [1] , RB_EMPTY_NODE
changed behaviour so it returns true when the node is empty as expected.
Hence Patrick McHardy's fix for sched_htb.c should be reverted.
Signed-off-by: Ismail Donmez <ismail@pardus.org.tr>
ACKed-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch adds or/and/xor functionality for the mark target,
while staying backwards compatible.
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ip_route_me_harder instead, which now allows to specify how we wish
the packet to be routed.
Based on patch by Simon Horman <horms@verge.net.au>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For policy routing, packets originating from this machine itself may be
routed differently to packets passing through. We want this packet to be
routed as if it came from this machine itself. So re-compute the routing
information using ip_route_me_harder().
This patch is derived from work by Ken Brownfield
Cc: Ken Brownfield <krb@irridia.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
By adding a type parameter to ip_route_me_harder() the
expensive call to inet_addr_type() can be avoided in some cases.
A followup patch where ip_route_me_harder() is called from within
ip_vs_out() is one such example.
Signed-off-By: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
xt_physdev depends on bridge netfilter, which is a boolean, but can still
be built modular because of special handling in the bridge makefile. Add
a dependency on BRIDGE to prevent XT_MATCH_PHYSDEV=y, BRIDGE=m.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many files include the filename at the beginning, serveral used a wrong one.
Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use. This patch replaces those with a the init_utsname
helper.
Changes: Removed several uses of init_utsname(). Hope I picked all the
right ones in net/ipv4/ipconfig.c. These are now changed to
utsname() (the per-process namespace utsname) in the previous
patch (2/7)
[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace references to system_utsname to the per-process uts namespace
where appropriate. This includes things like uname.
Changes: Per Eric Biederman's comments, use the per-process uts namespace
for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c
[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Actually implement multiple pools. On NUMA machines, allocate a svc_pool per
NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool. Enqueue
sockets on the svc_pool corresponding to the CPU on which the socket bh is run
(i.e. the NIC interrupt CPU). Threads have their cpu mask set to limit them
to the CPUs in the svc_pool that owns them.
This is the patch that allows an Altix to scale NFS traffic linearly
beyond 4 CPUs and 4 NICs.
Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
Christoph Hellwig.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
way of managing the list of all threads in a svc_serv. Add
svc_create_pooled() to allow creation of a svc_serv whose threads are managed
by the sunrpc code. Add svc_set_num_threads() to manage the number of threads
in a service, either per-pool or globally across the service.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Split out the list of idle threads and pending sockets from svc_serv into a
new svc_pool structure, and allocate a fixed number (in this patch, 1) of
pools per svc_serv. The new structure contains a lock which takes over
several of the duties of svc_serv->sv_lock, which is now relegated to
protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.
The point is to move the hottest fields out of svc_serv and into svc_pool,
allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
This is a major step towards making the NFS server NUMA-friendly.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The SK_BUSY bit in svc_sock->sk_flags ensures that we do not attempt to
enqueue a socket twice. Currently, setting and clearing the bit is protected
by svc_serv->sv_lock. As I intend to reduce the data that the lock protects
so it's not held when svc_sock_enqueue() tests and sets SK_BUSY, that test and
set needs to be atomic.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert the svc_sock->sk_reserved variable from an int protected by
svc_serv->sv_lock, to an atomic. This reduces (by 1) the number of places we
need to take the (effectively global) svc_serv->sv_lock.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock
instead of svc_serv->sv_lock. Using the more fine-grained lock reduces the
number of places we need to take the svc_serv lock.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert the svc_sock->sk_inuse counter from an int protected by
svc_serv->sv_lock, to an atomic. This reduces the number of places we need to
take the (effectively global) svc_serv->sv_lock.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Following are 11 patches from Greg Banks which combine to make knfsd more
Numa-aware. They reduce hitting on 'global' data structures, and create some
data-structures that can be node-local.
knfsd threads are bound to a particular node, and the thread to handle a new
request is chosen from the threads that are attach to the node that received
the interrupt.
The distribution of threads across nodes can be controlled by a new file in
the 'nfsd' filesystem, though the default approach of an even spread is
probably fine for most sites.
Some (old) numbers that show the efficacy of these patches: N == number of
NICs == number of CPUs == nmber of clients. Number of NUMA nodes == N/2
N Throughput, MiB/s CPU usage, % (max=N*100)
Before After Before After
--- ------ ---- ----- -----
4 312 435 350 228
6 500 656 501 418
8 562 804 690 589
This patch:
Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
a timer which uses a mark-and-sweep algorithm every 6 minutes. This reduces
the amount of work that needs to be done in the main RPC loop and the length
of time we need to hold the (effectively global) svc_serv->sv_lock.
[akpm@osdl.org: cleanup]
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.
[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Userspace should create and bind a socket (but not connectted) and write the
'fd' to portlist. This will cause the nfs server to listen on that socket.
To close a socket, the name of the socket - as read from 'portlist' can be
written to 'portlist' with a preceding '-'.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This file will list all ports that nfsd has open.
Default when TCP enabled will be
ipv4 udp 0.0.0.0 2049
ipv4 tcp 0.0.0.0 2049
Later, the list of ports will be settable.
'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
non-mainline patches which created 'ports' with different semantics.
[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nfsd has some cleanup that it wants to do when the last thread exits, and
there will shortly be some more. So collect this all into one place and
define a callback for an rpc service to call when the service is about to be
destroyed.
[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In an effort to make kprobe modules more portable, here is a patch that:
o Introduces the "symbol_name" field to struct kprobe.
The symbol->address resolution now happens in the kernel in an
architecture agnostic manner. 64-bit powerpc users no longer have
to specify the ".symbols"
o Introduces the "offset" field to struct kprobe to allow a user to
specify an offset into a symbol.
o The legacy mechanism of specifying the kprobe.addr is still supported.
However, if both the kprobe.addr and kprobe.symbol_name are specified,
probe registration fails with an -EINVAL.
o The symbol resolution code uses kallsyms_lookup_name(). So
CONFIG_KPROBES now depends on CONFIG_KALLSYMS
o Apparantly kprobe modules were the only legitimate out-of-tree user of
the kallsyms_lookup_name() EXPORT. Now that the symbol resolution
happens in-kernel, remove the EXPORT as suggested by Christoph Hellwig
o Modify tcp_probe.c that uses the kprobe interface so as to make it
work on multiple platforms (in its earlier form, the code wouldn't
work, say, on powerpc)
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As part of an SMP cleanliness pass over UML, I consted a bunch of
structures in order to not have to document their locking. One of these
structures was a struct tty_operations. In order to const it in UML
without introducing compiler complaints, the declaration of
tty_set_operations needs to be changed, and then all of its callers need to
be fixed.
This patch declares all struct tty_operations in the tree as const. In all
cases, they are static and used only as input to tty_set_operations. As an
extra check, I ran an i386 allyesconfig build which produced no extra
warnings.
53 drivers are affected. I checked the history of a bunch of them, and in
most cases, there have been only a handful of maintenance changes in the
last six months. serial_core.c was the busiest one that I looked at.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
File handles can be requested to send sigio and sigurg to processes. By
tracking the destination processes using struct pid instead of pid_t we make
the interface safe from all potential pid wrap around problems.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is mostly included for parity with dec_nlink(), where we will have some
more hooks. This one should stay pretty darn straightforward for now.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All on stack DECLARE_COMPLETIONs should be replaced by:
DECLARE_COMPLETION_ONSTACK
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We only need the timestamp on COOKIE-ECHO chunks, so instead of always
timestamping every SCTP packet, let common code timestamp if the socket
option is set. For COOKIE-ECHO, simply get the time of day if we don't
have a timestamp. This introduces a small possibility that the cookie
may be considered expired, but it will be renegotiated.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if the sender is sending small messages, it can cause a receiver
to run out of receive buffer space even when the advertised receive window
is still open and results in packet drops and retransmissions. Including
a overhead while updating the sender's view of peer receive window will
reduce the chances of receive buffer space overshooting the receive window.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows more aggressive bundling of chunks when sending small
messages.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix some issues Steve Grubb had with the way NetLabel was using the audit
subsystem. This should make NetLabel more consistent with other kernel
generated audit messages specifying configuration changes.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GWOL might provide passwords
GSET, GLINK, and GSTATS might poke the hardware
Based upon feedback from Jeff Garzik.
Signed-off-by: David S. Miller <davem@davemloft.net>
The bt_sysfs_cleanup() is marked with __exit attribute, but it will
be called from an __init function in the error case. So the __exit
attribute must be removed.
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the case of device pairing the only safe method is to establish
a low-level ACL link. In this case, the remote side should not use
the disconnect timer to give the other side the chance to enter the
PIN code. If the disconnect timer is used, the connection will be
dropped to soon, because it is impossible to identify an actual user
of this link.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no reason to not allow non-admin users to query network
statistics and settings.
[ Removed PHYS_ID and GREGS based upon feedback from Auke Kok
and Michael Chan -DaveM]
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds audit support to NetLabel, including six new audit message
types shown below.
#define AUDIT_MAC_UNLBL_ACCEPT 1406
#define AUDIT_MAC_UNLBL_DENY 1407
#define AUDIT_MAC_CIPSOV4_ADD 1408
#define AUDIT_MAC_CIPSOV4_DEL 1409
#define AUDIT_MAC_MAP_ADD 1410
#define AUDIT_MAC_MAP_DEL 1411
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This changes the microsecond RTT sampling so that samples are taken in
the same way that RTT samples are taken for the RTO calculator: on the
last segment acknowledged, and only when the segment hasn't been
retransmitted.
Signed-off-by: John Heffner <jheffner@psc.edu>
Acked-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix the chance for tcp_lp_remote_hz_estimator return 0, if
0 < rhz < 64. It also make sure the flag LP_VALID_RHZ is set
correctly.
Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
coverity spotted this one as possible dereference in the dprintk(),
but since there is only one caller of svc_create_socket(), which always
passes a valid sin, we dont need this check.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(p[3]<<24) | (p[2]<<16) | (p[1]<<8) | p[0] is not a valid
way to spell get_unaligned((__be32 *)p
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
I don't know of any Andy Kleen's but I do know a Andi Kleen.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm not entirely sure what happens in the case of a valid port,
at best it'll be silently ignored. This patch ensures that
the port values are unsigned short values, and thus always valid.
This is a second take at fixing this problem, it is simpler
and arguably more correct than the previous approach
that was committed as 3f5af5b353.
Prior to this patch a patch that reverses
3f5af5b353 was sent.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reverses 3f5af5b353 as
a better fix was suggested by Patrick McHardy.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPI=0 is used for acquired IPsec SA and MIPv6 RO state.
Such state should not be added to the SPI hash
because we do not care about it on deleting path.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This patch replaces the bunch of arbitrary 64 and 128 bytes alloc_skb() calls
with more accurate allocation sizes.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We lock the socket when both releasing and getting a disconnected
notification. In the latter case, we also ste the socket as orphan.
This fixes a potential kernel bug that can be triggered when we get the
disconnection notification before closing the socket.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because the system won't turn off the SG flag for us we
need to do this manually on the IPv6 path. Otherwise we
will throw IPv6 packets with bad checksums at the hardware.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
spi argument of xfrm_state_lookup() is net-endian
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_lookup() annotated along with helper functions (__inet_lookup(),
__inet_lookup_established(), inet_lookup_established(),
inet_lookup_listener(), __inet_lookup_listener() and inet_ehashfn())
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
INET_MATCH() and friends depend on an interesting set of kludges:
* there's a pair of adjacent fields in struct inet_sock - __be16 dport
followed by __u16 num. We want to search by pair, so we combine the keys into
a single 32bit value and compare with 32bit value read from &...->dport.
* on 64bit targets we combine comparisons with pair of adjacent __be32
fields in the same way.
Make sure that we don't mix those values with anything else and that pairs
we form them from have correct types.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the instances of tcp_sack_block are host-endian, some - net-endian.
Define struct tcp_sack_block_wire identical to struct tcp_sack_block
with u32 replaced with __be32; annotate uses of tcp_sack_block replacing
net-endian ones with tcp_sack_block_wire. Change is obviously safe since
for cc(1) __be32 is typedefed to u32.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_mc_sf_allow() expects addresses to be passed net-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
->faddr is net-endian; annotated as such, variables inferred to be net-endian
annotated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The move of qdisc destruction to a rcu callback broke locking in the
entire qdisc layer by invalidating previously valid assumptions about
the context in which changes to the qdisc tree occur.
The two assumptions were:
- since changes only happen in process context, read_lock doesn't need
bottem half protection. Now invalid since destruction of inner qdiscs,
classifiers, actions and estimators happens in the RCU callback unless
they're manually deleted, resulting in dead-locks when read_lock in
process context is interrupted by write_lock_bh in bottem half context.
- since changes only happen under the RTNL, no additional locking is
necessary for data not used during packet processing (f.e. u32_list).
Again, since destruction now happens in the RCU callback, this assumption
is not valid anymore, causing races while using this data, which can
result in corruption or use-after-free.
Instead of "fixing" this by disabling bottem halfs everywhere and adding
new locks/refcounting, this patch makes these assumptions valid again by
moving destruction back to process context. Since only the dev->qdisc
pointer is protected by RCU, but ->enqueue and the qdisc tree are still
protected by dev->qdisc_lock, destruction of the tree can be performed
immediately and only the final free needs to happen in the rcu callback
to make sure dev_queue_xmit doesn't access already freed memory.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix incorrect use of RB_EMPTY_NODE in htb_safe_rb_erase, which makes it
skip nodes within the rbtree instead of nodes not in the tree, resulting
in crashes later on.
The root cause for this seems to be the very counter-intuitive behaviour
of the RB_EMPTY_NODE macro, which returns _false_ when the node is empty.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is just a minor buglet I came across by accident - when inet_init
fails to register raw_prot, it jumps to out_unregister_udp_proto which
should unregister UDP _and_ TCP.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anyway, I've been asked to add support for managing DSCP codepoints,
so one can test DiffServ capable routers. It's very simple code and is
working for me.
Signed-off-by: Francesco Fondelli <francesco.fondelli@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
The attached patch allows pktgen to produce 802.1Q and Q-in-Q tagged frames.
I have used it for stress test a bridge and seems ok to me.
Unfortunately I have no access to net-2.6.x git tree so the diff is against
2.6.17.13.
Signed-off-by: Francesco Fondelli <francesco.fondelli@gmail.com>
Acked-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prevents filters from being added if the first generated
handle already exists.
Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
In case of non-blocking connects it is possible that the last user
of an ACL link quits before the connection has been fully established.
This will lead to a race condition where the internal state of a
connection is closed, but the actual link has been established and is
active. In case of Bluetooth 1.2 and later devices it is possible to
call create connection cancel to abort the connect. For older devices
the disconnect timer will be used to trigger the needed disconnect.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The local version information are needed to identify certain feature
sets of devices. They must be read on device init and stored for later
use. It is also possible to access them through the device model.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In case of non-blocking socket calls we should return EINPROGRESS
and not EAGAIN.
Signed-off-by: Ulisses Furquim <ulissesf@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The command complete event of the exit periodic inquiry command must
clear the HCI_INQUIRY flag and finish the HCI request.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch assigns the next free HCI device identifier to Bluetooth
devices based on the SDIO interface.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch integrates the services of the Bluetooth protocols RFCOMM,
BNEP and HIDP into the driver model. This makes it possible to assign
the virtual TTY, network and input devices to a specific Bluetooth
connection.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch integrates the low-level connections (ACL and SCO) into the
driver model. Every connection is presented as device with the parent
set to its host controller device.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
another possible dereference spotted by coverity (#cid 1390).
if the nlmsg_parse() call fails, we goto errout, where we call
dev_put(), with dev still initialized to NULL.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
pure s/u32/__be32/
[AV: large part based on Alexey's patches]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
* add svc_getnl():
Take network-endian value from buffer, convert to host-endian
and return it.
* add svc_putnl():
Take host-endian value, convert to network-endian and put it
into a buffer.
* annotate svc_getu32()/svc_putu32() as dealing with network-endian.
* convert to svc_getnl(), svc_putnl().
[AV: in large part it's a carved-up Alexey's patch]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
multipath_wrandom.c::__multipath_lookup_weight() contains open-coded
attempt at inet_make_mask(); broken on big-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
multipath_set_nhinfo() (and underlying callback) take net-endian
network and netmask.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one is interesting - we use net-endian value as search key, but
order the tree by *host-endian* comparisons of keys. OK since we only
care about lookups. Annotated inet_getpeer() and friends.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
argument and inferred net-endian variables in callers annotated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last argument is network-endian (it will go straight into the packet).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_confirm_addr(), inet_ifa_byprefix(), ip_dev_find(), inet_make_mask() and
inet_ifa_match() annotated, along with inferred net-endian variables
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
ifa_local, ifa_address, ifa_mask, ifa_broadcast and ifa_anycast are
net-endian. Annotated them and variables that are inferred to be
net-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
these are passed net-endian; use be32 netlink accessors
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
annotated arguments and inferred net-endian variables in callers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
argument and return value are net-endian. Annotated function and inferred
net-endian variables in callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
annotated address arguments (port number left alone for now); ditto
for inferred net-endian variables in callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first 4 arguments of ip_rt_redirect() are net-endian. Annotated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_input() takes net-endian source and destination address.
* Annotated as such.
* arguments of its invocations annotated where needed.
* local helpers getting the same values passed to by it (ip_route_input_mc(),
ip_route_input_slow(), ip_handle_martian_source(), ip_mkroute_input(),
ip_mkroute_input_def(), __mkroute_input()) annotated
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix cut/paste error in TCPPROBE help text.
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A while ago Ingo patched tcp_v4_rcv on net/ipv4/tcp_ipv4.c to use
bh_lock_sock_nested and silence a lock validator warning. This fixed
it for IPv4, but recently I saw a report of the same warning on IPv6.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (48 commits)
[PATCH] bonding: update version number
[PATCH] git-netdev-all: pc300_tty build fix
[PATCH] Make PC300 WAN driver compile again
[PATCH] Modularize generic HDLC
[PATCH] more s2io __iomem annotations
[PATCH] restore __iomem annotations in e1000
[PATCH] 64bit bugs in s2io
[PATCH] bonding: Fix primary selection error at enslavement time
[PATCH] bonding: Don't mangle LACPDUs
[PATCH] bonding: Validate probe replies in ARP monitor
[PATCH] bonding: Don't release slaves when master is admin down
[PATCH] bonding: Add priv_flag to avoid event mishandling
[PATCH] bonding: Handle large hard_header_len
[PATCH] bonding: Remove unneeded NULL test
[PATCH] bonding: Format fix in seq_printf call
[PATCH] bonding: Convert delay value from s16 to int
[PATCH] bonding: Allow bonding to enslave a 10 Gig adapter
Delete unused drivers/net/gt64240eth.h
[PATCH] skge: fiber support
[PATCH] fix possible NULL ptr deref in forcedeth
...
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.
Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:
(void)kmem_cache_destroy(cache);
* Those who check it printk something, however, slab_error already printed
the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
low-level filesystem driver should make. Converted to ignore.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that all of the supporting pieces of NetLabel have a home at SourceForge
update the Kconfig help text and add an entry to the MAINTAINERS file.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the suggestion of Thomas Graf, rewrite NetLabel's use of Netlink attributes
to better follow the common Netlink attribute usage.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the suggestion of Thomas Graf, rewrite NetLabel's use of Netlink attributes
to better follow the common Netlink attribute usage.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CIPSOv4 cache traversal routines are triggered both the userspace events
(cache invalidation due to DOI removal or updated SELinux policy) and network
packet processing events. As a result there is a problem with the existing
CIPSOv4 cache spinlocks as they are not bottom-half/softirq safe. This patch
converts the CIPSOv4 cache spin_[un]lock() calls into spin_[un]lock_bh() calls
to address this problem.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a problem where NetLabel would always set the value of
sk_security_struct->peer_sid in selinux_netlbl_sock_graft() to the context of
the socket, causing problems when users would query the context of the
connection. This patch fixes this so that the value in
sk_security_struct->peer_sid is only set when the connection is NetLabel based,
otherwise the value is untouched.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch to make bcm43xx-softmac be compatible with the revised SSID
length of WE-21.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is version 21 of the Wireless Extensions. Changelog :
o finishes migrating the ESSID API (remove the +1)
o netdev->get_wireless_stats is no more
o long/short retry
This is a redacted version of a patch originally submitted by Jean
Tourrilhes. I removed most of the additions, in order to minimize
future support requirements for nl80211 (or other WE successor).
CC: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Change default congestion control used from BIC to the newer CUBIC
which it the successor to BIC but has better properties over long delay links.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change how default TCP congestion control is chosen. Don't just use
last installed module, instead allow selection during configuration,
and make sure and use the default regardless of load order.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds DCCP probing shamelessly ripped off from TCP probes by Stephen
Hemminger.
I've put in here support for further CCID3 variables as well.
Andrea/Arnaldo might look to extend for CCID2.
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
With constants for CCID numbers this now uses them in some places.
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This has been discussed on dccp@vger and removes the necessity for applications
to supply service codes in each and every case.
If an application does not want to provide a service code, that's fine, it will
be given 0. Otherwise, service codes can be set via socket options as before.
This patch has been tested using various client/server configurations
(including listening on multiple service codes).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (217 commits)
net/ieee80211: fix more crypto-related build breakage
[PATCH] Spidernet: add ethtool -S (show statistics)
[NET] GT96100: Delete bitrotting ethernet driver
[PATCH] mv643xx_eth: restrict to 32-bit PPC_MULTIPLATFORM
[PATCH] Cirrus Logic ep93xx ethernet driver
r8169: the MMIO region of the 8167 stands behin BAR#1
e1000, ixgb: Remove pointless wrappers
[PATCH] Remove powerpc specific parts of 3c509 driver
[PATCH] s2io: Switch to pci_get_device
[PATCH] gt96100: move to pci_get_device API
[PATCH] ehea: bugfix for register access functions
[PATCH] e1000 disable device on PCI error
drivers/net/phy/fixed: #if 0 some incomplete code
drivers/net: const-ify ethtool_ops declarations
[PATCH] ethtool: allow const ethtool_ops
[PATCH] sky2: big endian
[PATCH] sky2: fiber support
[PATCH] sky2: tx pause bug fix
drivers/net: Trim trailing whitespace
[PATCH] ehea: IBM eHEA Ethernet Device Driver
...
Manually resolved conflicts in drivers/net/ixgb/ixgb_main.c and
drivers/net/sky2.c related to CHECKSUM_HW/CHECKSUM_PARTIAL changes by
commit 84fa7933a3 that just happened to be
next to unrelated changes in this update.
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (74 commits)
NFS: unmark NFS direct I/O as experimental
NFS: add comments clarifying the use of nfs_post_op_update()
NFSv4: rpc_mkpipe creating socket inodes w/out sk buffers
NFS: Use SEEK_END instead of hardcoded value
NFSv4: When mounting with a port=0 argument, substitute port=2049
NFSv4: Poll more aggressively when handling NFS4ERR_DELAY
NFSv4: Handle the condition NFS4ERR_FILE_OPEN
NFSv4: Retry lease recovery if it failed during a synchronous operation.
NFS: Don't invalidate the symlink we just stuffed into the cache
NFS: Make read() return an ESTALE if the file has been deleted
NFSv4: It's perfectly legal for clp to be NULL here....
NFS: nfs_lookup - don't hash dentry when optimising away the lookup
SUNRPC: Fix Oops in pmap_getport_done
SUNRPC: Add refcounting to the struct rpc_xprt
SUNRPC: Clean up soft task error handling
SUNRPC: Handle ENETUNREACH, EHOSTUNREACH and EHOSTDOWN socket errors
SUNRPC: rpc_delay() should not clobber the rpc_task->tk_status
Fix a referral error Oops
NFS: NFS_ROOT should use the new rpc_create API
NFS: Fix up compiler warnings on 64-bit platforms in client.c
...
Manually resolved conflict in net/sunrpc/xprtsock.c
This patch stop rpc_mkpipe from create S_IFSOCK nodes what don't
have associated sk buffers attached (which causes SELinux to oops
during NFSv4 mounts). Instead the S_IFIFO mode bit is set which
probably make more sense and seems to work just fine during
my connectathon and fsx testing...
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There is no guarantee that the parent task still exists when we exit from
the portmapper. Save the xprt instead.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In a subsequent patch, this will allow the portmapper to take a reference
to the rpc_xprt for which it is updating the port number, fixing an Oops.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
- Ensure that the task aborts the RPC call only when it has actually timed out.
- Ensure that req->rq_majortimeo is initialised correctly.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In case of any of the above errors occuring, delay for 3 seconds, then
handle as if it were a timeout error.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch is optional.
It has been suggested that the RPC client internal functions used by upper
layer protocols (such as NFS) be exported via EXPORT_SYMBOL_GPL. This
patch does that.
Test plan:
Compile kernel with CONFIG_NFS enabled as a module.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The two function call API for creating a new RPC client is now obsolete.
Remove it.
Also, remove an unnecessary check to see whether the caller is capable of
using privileged network services. The kernel RPC client always uses a
privileged ephemeral port by default; callers are responsible for checking
the authority of users to make use of any RPC service, or for specifying
that a nonprivileged port is acceptable.
Test plan:
Repeated runs of Connectathon locking suite. Check network trace to ensure
correctness of NLM requests and replies.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace xprt_create_proto/rpc_create_client calls in pmap_clnt.c with new
rpc_create() API.
Test plan:
Repeated runs of Connectathon locking suite. Check network trace for
proper PMAP calls and replies.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Prepare for more generic transport endpoint handling needed by transports
that might use different forms of addressing, such as IPv6.
Introduce a single function call to replace the two-call
xprt_create_proto/rpc_create_client API. Define a new rpc_create_args
structure that allows callers to pass in remote endpoint addresses of
varying length.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
IPv6 addresses are big (128 bytes). Now that no RPC client consumers treat
the addr field in rpc_xprt structs as an opaque, and access it only via the
API calls, we can safely widen the field in the rpc_xprt struct to
accomodate larger addresses.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hide the details of how the RPC client stores remote peer addresses from
the RPC pipefs implementation.
Test plan:
Connectathon with Kerberos 5 authentication.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Provide an API for formatting the remote peer address for printing without
exposing its internal structure. The address could be dynamic, so we
support a function call to get the address rather than reading it straight
out of a structure.
Test-plan:
Destructive testing (unplugging the network temporarily). Probably need
to rig a server where certain services aren't running, or that returns an
error for some typical operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a new method to the transport switch API to provide a way to convert
the opaque contents of xprt->addr to a human-readable string.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
include/linux/sunrpc/clnt.h already includes include/linux/sunrpc/xprt.h.
We can remove xprt.h from source files that already include clnt.h.
Likewise include/linux/sunrpc/timer.h.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hide the details of how the RPC client stores remote peer addresses from
the RPC portmapper.
Test plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Provide an API for retrieving the remote peer address without allowing
direct access to the rpc_xprt struct.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduce a clean transport switch API for plugging in different types of
rpcbind mechanisms. For instance, rpcbind can cleanly replace the
existing portmapper client, or a transport can choose to implement RPC
binding any way it likes.
Test plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The previous patches removed the last user of RPC child tasks, so we can
remove support for child tasks from net/sunrpc/sched.c now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add comments for external functions, use modern function definition style,
and fix up dprintk formatting.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move connection and bind state that was maintained in the rpc_clnt
structure to the rpc_xprt structure. This will allow the creation of
a clean API for plugging in different types of bind mechanisms.
This brings improvements such as the elimination of a single spin lock to
control serialization for all in-kernel RPC binding. A set of per-xprt
bitops is used to serialize tasks during RPC binding, just like it now
works for making RPC transport connections.
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hide the contents and format of xprt->addr by eliminating direct uses
of the xprt->addr.sin_port field. This change is required to support
alternate RPC host address formats (eg IPv6).
Test-plan:
Destructive testing (unplugging the network temporarily). Repeated runs of
Connectathon locking suite with UDP and TCP.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
IFA_F_HOMEADDRESS is introduced for Mobile IPv6 Home Addresses on
Mobile Node.
The IFA_F_HOMEADDRESS flag should be set for Mobile IPv6 Home
Addresses for 2 purposes. 1) We need to check this on receipt of
Type 2 Routing Header (RFC3775 Secion 6.4), 2) We prefer Home
Address(es) in source address selection (RFC3484 Section 5 Rule 4).
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IFA_F_NODAD flag, similar to IN6_IFF_NODAD in BSDs, is introduced
to skip DAD.
This flag should be set to Mobile IPv6 Home Address(es) on Mobile
Node because DAD would fail if we should perform DAD; our Home Agent
protects our Home Address(es).
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We did not send appropriate IsRouter flag if the forwarding setting is
positive even value. Let's give 1/0 value to ndisc_send_na().
Also, existing users of ndisc_send_na() give 0/1 to override,
we can omit redundant operation in that function.
Bug hinted by Nicolas Dichtel <nicolas.dichtel@6wind.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do not always need proxy NDP functionality even we
enable forwarding.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have sent NA with router flag from the node-wide forwarding
configuration. This is not appropriate for proxy NA, and it should be
set according to each proxy entry's configuration.
This is used by Mobile IPv6 home agent to support physical home link
in acting as a proxy router for mobile node which is not a router,
for example.
Based on MIPL2 kernel patch.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This aims at proxying router not updating neighbor cache entry for proxied
address when it receives NA because either the proxied node is off link or
it has already sent a NA to the proxied router.
Based on MIPL2 kernel patch.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Proxying router can't forward traffic sent to link-local address, so signal
the sender and discard the packet. This behavior is clarified by Mobile IPv6
specification (RFC3775) but might be required for all proxying router.
Based on MIPL2 kernel patch.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
It is required to respond to NDP messages sent directly to the "target"
unicast address. Proxying node (router) is required to handle such
messages. To achieve this, check if the packet in forwarding patch is
NDP message.
With this patch, the proxy neighbor entries are always looked up in
forwarding path. We may want to optimize further.
Based on MIPL2 kernel patch.
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
When the master PPTP connection times out while still having unfullfilled
expectations (and a GRE keymap entry) associated with it, the keymap entry
is not destroyed.
Add a destroy callback to struct ip_conntrack_helper and use it to destroy
PPTP siblings when the master is destroyed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When destroying the GRE expectations without having seen the GRE connection
the keymap entry is not freed, leading to a memory leak and, in case of
a following call within the same session, failure during expectation setup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix incorrectly used message types and call IDs:
- PPTP_IN_CALL_REQUEST (PAC->PNS) contains a PptpInCallRequest (icreq)
message and the PAC call ID
- PPTP_IN_CALL_REPLY (PNS->PAC) contains a PptpInCallReply (icack)
message and the PNS call ID
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For rejected calls the state is set to PPTP_CALL_NONE even for non-matching
call ids.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also make sure not to hand packets received in an invalid state to the
NAT helper since it will mangle the packet with invalid data.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also make sure not to pass undersized messages to the NAT helper.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicated expectation handling in the NAT helper and simplify
the remains in the conntrack helper.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just the values are needed, not the memory locations.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a few header definitions to match RFC2637. Most importantly the
PptpOutCallRequest header included an invalid padding field and a
size check was disabled because of this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The calculated sequence numbers are not used for anything.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call ID in reply packets is never changed, remove the code.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conntrack structure contains the call ID in host byte order for no
reason, get rid of back and forth conversions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split the xt_compat_match/xt_compat_target into smaller type-safe functions
performing just one operation. Handle all alignment and size-related
conversions centrally in these function instead of requiring each module to
implement a full-blown conversion function. Replace ->compat callback by
->compat_from_user and ->compat_to_user callbacks, responsible for
converting just a single private structure.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparse "defined twice" warning
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix regression introduced by the incremental checksum patches.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
On SMP environments the maximum number of conntracks can be overpassed
under heavy stress situations due to an existing race condition.
CPU A CPU B
atomic_read() ...
early_drop() ...
... atomic_read()
allocate conntrack allocate conntrack
atomic_inc() atomic_inc()
This patch moves the counter incrementation before the early drop stage.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge the bits to dump the conntrack table and the ones to dump and
zero counters in a single piece of code. This patch does not change
the default behaviour if accounting is not enabled.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While standard_target has target->me == NULL, module_put() should be
called for it as for others, because there were try_module_get() before.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that IPv6 supports policy routing we need to reroute in NF_IP6_LOCAL_OUT
when the mark value changes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The limit match reinitializes its state whenever the ruleset changes,
which means it will forget about previously used credits.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- remove debugging cruft
- remove printk for reallocation failures
- remove unused addition
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- fix whitespace error
- break lines at 80 characters
- reformat some expressions to be more readable
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also fix some whitespace errors and use the NAT bits instead of deriving
the state manually.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kill listhelp.h and use the list.h functions instead.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handling of ipip and ip_gre ICMP error relaying is b0rken; it accesses
8bit field + 3 reserved octets as host-endian 32bit, does comparison,
subtraction and stuffs the result back. That breaks on big-endian.
Fixed, made endian-clean.
[ Note that this effected code is permanently commented out with
and ifdef, so this error couldn't actually cause problems for
anyone. -DaveM ]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce methods which manipulate interesting congestion control
state such as pipe and rtt estimate. This is useful for people
wishing to monitor the variables of CCID and instrument the code
[perhaps using Kprobes]. Personally, I am a fan of
encapsulation---that justifies this change =D.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When multiple losses occur in one RTT, the window should be halved
only once [a single "congestion event"]. This is now implemented,
although not perfectly. Slightly changed the interface for changing
the cwnd: pass hctx instead of dp. This is required in order to allow
for change_cwnd to be called from _init().
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate more sequence state on demand. Each time a packet is sent
out by CCID2, a record of it needs to be kept. This list of records
grows proportionally to cwnd. Previously, the length of this list was
hardcored and therefore the cwnd could only grow to this value (of
128). Now, records are allocated on demand as necessary---cwnd may
grow as it wishes. The exceptional case of when memory is not
available is not handled gracefully. Perhaps, cwnd should be capped
at that point.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the user to choose whether or not to enable CCID2 debugging via
Kconfig.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If not enough cwnd is available, tell the sender to check again as
soon as possible. This will increase CPU utilization (polling
frequently for cwnd) but will improve network performance. That is,
the sender will need to wait less before detecting the increase of
cwnd. A better architecture would be for the CCID to call-back (or
dequeue) from DCCP when it is able to transmit traffic -- not the
other way around as it currently occurs.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize the slow-start threshold to infinity. This way, upon connection
initiation, slow-start will be exited only upon a packet loss. This patch will
allow connections to quickly gain speed.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiffies are now handled correctly (I hope) in CCID2. If they wrap, no
problem.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of unused variables in ackvector state.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the way state is masked out. DCCP_ACKVEC_STATE_NOT_RECEIVED is
defined as appears in the packet, therefore bit shifting is not
required. This fix allows CCID2 to correctly detect losses.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ackvector length calculation upon receiving an "ack-of-ack". This
patch avoids the ackvector from growing too large which causes it to
not be inserted into packets.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hashing SAs by source address breaks templates with wildcards as tunnel
source since the source address used for hashing/lookup is still 0/0.
Move source address lookup to xfrm_tmpl_resolve_one() so we can use the
real address in the lookup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It does not affect either mss-sized connections (obviously) or
connections controlled by Nagle (because there is only one small
segment in flight).
The idea is to record the fact that a small segment arrives on a
connection, where one small segment has already been received and
still not-ACKed. In this case ACK is forced after tcp_recvmsg() drains
receive buffer.
In other words, it is a "soft" each-2nd-segment ACK, which is enough
to preserve ACK clock even when ABC is enabled.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>