Commit Graph

22535 Commits

Author SHA1 Message Date
Andy Gospodarek eaddcd7690 bonding: remove entries for master_ip and vlan_ip and query devices instead
The following patch aimed to resolve an issue where secondary, tertiary,
etc. addresses added to bond interfaces could overwrite the
bond->master_ip and vlan_ip values.

        commit 917fbdb32f
        Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
        Date:   Wed Nov 23 23:37:15 2011 +0000

            bonding: only use primary address for ARP

That patch was good because it prevented bonds using ARP monitoring from
sending frames with an invalid source IP address.  Unfortunately, it
didn't always work as expected.

When using an ioctl (like ifconfig does) to set the IP address and
netmask, 2 separate ioctls are actually called to set the IP and netmask
if the mask chosen doesn't match the standard mask for that class of
address.  The first ioctl did not have a mask that matched the one in
the primary address and would still cause the device address to be
overwritten.  The second ioctl that was called to set the mask would
then detect as secondary and ignored, but the damage was already done.

This was not an issue when using an application that used netlink
sockets as the setting of IP and netmask came down at once.  The
inconsistent behavior between those two interfaces was something that
needed to be resolved.

While I was thinking about how I wanted to resolve this, Ralf Zeidler
came with a patch that resolved this on a RHEL kernel by keeping a full
shadow of the entries in dev->ifa_list for the bonding device and vlan
devices in the bonding driver.  I didn't like the duplication of the
list as I want to see the 'bonding' struct and code shrink rather than
grow, but liked the general idea.

As the Subject indicates this patch drops the master_ip and vlan_ip
elements from the 'bonding' and 'vlan_entry' structs, respectively.
This can be done because a device's address-list is now traversed to
determine the optimal source IP address for ARP requests and for checks
to see if the bonding device has a particular IP address.  This code
could have all be contained inside the bonding driver, but it made more
sense to me to EXPORT and call inet_confirm_addr since it did exactly
what was needed.

I tested this and a backported patch and everything works as expected.
Ralf also helped with verification of the backported patch.

Thanks to Ralf for all his help on this.

v2: Whitespace and organizational changes based on suggestions from Jay
Vosburgh and Dave Miller.

v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's
suggestion.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Ralf Zeidler <ralf.zeidler@nsn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22 22:36:17 -04:00
Rusty Russell 523f610e1b netfilter: remove forward module param confusion.
It used to be an int, and it got changed to a bool parameter at least
7 years ago.  It happens that NF_ACCEPT and NF_DROP are 0 and 1, so
this works, but it's unclear, and the check that it's in range is not
required.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22 22:36:17 -04:00
Linus Torvalds db14179679 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
 "The biggest patch is the rework of the smp code, something I wanted to
  do for some time.  There are some patches for our various dump methods
  and one new thing: z/VM LGR detection.  LGR stands for linux-guest-
  relocation and is the guest migration feature of z/VM.  For debugging
  purposes we keep a log of the systems where a specific guest has lived."

Fix up trivial conflict in arch/s390/kernel/smp.c due to the scheduler
cleanup having removed some code next to removed s390 code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  [S390] kernel: Pass correct stack for smp_call_ipl_cpu()
  [S390] Ensure that vmcore_info pointer is never accessed directly
  [S390] dasd: prevent validate server for offline devices
  [S390] Remove monolithic build option for zcrypt driver.
  [S390] stack dump: fix indentation in output
  [S390] kernel: Add OS info memory interface
  [S390] Use block_sigmask()
  [S390] kernel: Add z/VM LGR detection
  [S390] irq: external interrupt code passing
  [S390] irq: set __ARCH_IRQ_EXIT_IRQS_DISABLED
  [S390] zfcpdump: Implement async sdias event processing
  [S390] Use copy_to_absolute_zero() instead of "stura/sturg"
  [S390] rework idle code
  [S390] rework smp code
  [S390] rename lowcore field
  [S390] Fix gcc 4.6.0 compile warning
2012-03-22 18:15:32 -07:00
Pablo Neira Ayuso 60b5f8f745 netfilter: nf_conntrack: permanently attach timeout policy to conntrack
We need to permanently attach the timeout policy to the conntrack,
otherwise we may apply the custom timeout policy inconsistently.

Without this patch, the following example:

 nfct timeout add test inet icmp timeout 100
 iptables -I PREROUTING -t raw -p icmp -s 1.1.1.1 -j CT --timeout test

Will only apply the custom timeout policy to outgoing packets from
1.1.1.1, but not to reply packets from 2.2.2.2 going to 1.1.1.1.

To fix this issue, this patch modifies the current logic to attach the
timeout policy when the first packet is seen (which is when the
conntrack entry is created). Then, we keep using the attached timeout
policy until the conntrack entry is destroyed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-23 00:52:08 +01:00
Pablo Neira Ayuso eeb4cb9523 netfilter: xt_CT: fix assignation of the generic protocol tracker
`iptables -p all' uses 0 to match all protocols, while the conntrack
subsystem uses 255. We still need `-p all' to attach the custom
timeout policies for the generic protocol tracker.

Moreover, we may use `iptables -p sctp' while the SCTP tracker is
not loaded. In that case, we have to default on the generic protocol
tracker.

Another possibility is `iptables -p ip' that should be supported
as well. This patch makes sure we validate all possible scenarios.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-23 00:52:08 +01:00
Pablo Neira Ayuso 1ac0bf9926 netfilter: xt_CT: missing rcu_read_lock section in timeout assignment
Fix a dereference to pointer without rcu_read_lock held.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-23 00:52:07 +01:00
Pablo Neira Ayuso c1ebd7dff7 netfilter: cttimeout: fix dependency with l4protocol conntrack module
This patch introduces nf_conntrack_l4proto_find_get() and
nf_conntrack_l4proto_put() to fix module dependencies between
timeout objects and l4-protocol conntrack modules.

Thus, we make sure that the module cannot be removed if it is
used by any of the cttimeout objects.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-23 00:52:01 +01:00
Steffen Klassert 1265fd6167 xfrm: Access the replay notify functions via the registered callbacks
We call the wrong replay notify function when we use ESN replay
handling. This leads to the fact that we don't send notifications
if we use ESN. Fix this by calling the registered callbacks instead
of xfrm_replay_notify().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22 19:29:58 -04:00
Steffen Klassert 26b2072e75 xfrm: Remove unused xfrm_state from xfrm_state_check_space
The xfrm_state argument is unused in this function, so remove it.
Also the name xfrm_state_check_space does not really match what this
function does. It actually checks if we have enough head and tailroom
on the skb. So we rename the function to xfrm_skb_check_space.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22 19:29:58 -04:00
Dan Carpenter f0229eaaf3 RDS: use gfp flags from caller in conn_alloc()
We should be using the gfp flags the caller specified here, instead of
GFP_KERNEL.  I think this might be a bugfix, depending on the value of
"sock->sk->sk_allocation" when we call rds_conn_create_outgoing() in
rds_sendmsg().  Otherwise, it's just a cleanup.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22 19:29:58 -04:00
Dan Carpenter 64b5fad526 netlabel: use GFP flags from caller instead of GFP_ATOMIC
This function takes a GFP flags as a parameter, but they are never used.
We don't take a lock in this function so there is no reason to prefer
GFP_ATOMIC over the caller's GFP flags.

There is only one caller, cipso_v4_map_cat_rng_ntoh(), and it passes
GFP_ATOMIC as the GFP flags so this doesn't change how the code works.
It's just a cleanup.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22 19:29:57 -04:00
Alex Elder 8d63e318c4 libceph: isolate kmap() call in write_partial_msg_pages()
In write_partial_msg_pages(), every case now does an identical call
to kmap(page).  Instead, just call it once inside the CRC-computing
block where it's needed.  Move the definition of kaddr inside that
block, and make it a (char *) to ensure portable pointer arithmetic.

We still don't kunmap() it until after the sendpage() call, in case
that also ends up needing to use the mapping.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:52 -05:00
Alex Elder 9bd1966344 libceph: rename "page_shift" variable to something sensible
In write_partial_msg_pages() there is a local variable used to
track the starting offset within a bio segment to use.  Its name,
"page_shift" defies the Linux convention of using that name for
log-base-2(page size).

Since it's only used in the bio case rename it "bio_offset".  Use it
along with the page_pos field to compute the memory offset when
computing CRC's in that function.  This makes the bio case match the
others more closely.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:52 -05:00
Alex Elder 0cdf9e6018 libceph: get rid of zero_page_address
There's not a lot of benefit to zero_page_address, which basically
holds a mapping of the zero page through the life of the messenger
module.  Even with our own mapping, the sendpage interface where
it's used may need to kmap() it again.  It's almost certain to
be in low memory anyway.

So stop treating the zero page specially in write_partial_msg_pages()
and just get rid of zero_page_address entirely.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:52 -05:00
Alex Elder e36b13cceb libceph: only call kernel_sendpage() via helper
Make ceph_tcp_sendpage() be the only place kernel_sendpage() is
used, by using this helper in write_partial_msg_pages().

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:52 -05:00
Alex Elder 31739139f3 libceph: use kernel_sendpage() for sending zeroes
If a message queued for send gets revoked, zeroes are sent over the
wire instead of any unsent data.  This is done by constructing a
message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg().

Since we are already working with a page in this case we can use
the sendpage interface instead.  Create a new ceph_tcp_sendpage()
helper that sets up flags to match the way ceph_tcp_sendmsg()
does now.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder 37675b0f42 libceph: fix inverted crc option logic
CRC's are computed for all messages between ceph entities.  The CRC
computation for the data portion of message can optionally be
disabled using the "nocrc" (common) ceph option.  The default is
for CRC computation for the data portion to be enabled.

Unfortunately, the code that implements this feature interprets the
feature flag wrong, meaning that by default the CRC's have *not*
been computed (or checked) for the data portion of messages unless
the "nocrc" option was supplied.

Fix this, in write_partial_msg_pages() and read_partial_message().
Also change the flag variable in write_partial_msg_pages() to be
"no_datacrc" to match the usage elsewhere in the file.

This fixes http://tracker.newdream.net/issues/2064

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder 84495f4961 libceph: some simple changes
Nothing too big here.
    - define the size of the buffer used for consuming ignored
      incoming data using a symbolic constant
    - simplify the condition determining whether to unmap the page
      in write_partial_msg_pages(): do it for crc but not if the
      page is the zero page

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder f42299e6c3 libceph: small refactor in write_partial_kvec()
Make a small change in the code that counts down kvecs consumed by
a ceph_tcp_sendmsg() call.  Same functionality, just blocked out
a little differently.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder fe3ad593e2 libceph: do crc calculations outside loop
Move blocks of code out of loops in read_partial_message_section()
and read_partial_message().  They were only was getting called at
the end of the last iteration of the loop anyway.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder a9a0c51af4 libceph: separate CRC calculation from byte swapping
Calculate CRC in a separate step from rearranging the byte order
of the result, to improve clarity and readability.

Use offsetof() to determine the number of bytes to include in the
CRC calculation.

In read_partial_message(), switch which value gets byte-swapped,
since the just-computed CRC is already likely to be in a register.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder bca064d236 libceph: use "do" in CRC-related Boolean variables
Change the name (and type) of a few CRC-related Boolean local
variables so they contain the word "do", to distingish their purpose
from variables used for holding an actual CRC value.

Note that in the process of doing this I identified a fairly serious
logic error in write_partial_msg_pages():  the value of "do_crc"
assigned appears to be the opposite of what it should be.  No
attempt to fix this is made here; this change preserves the
erroneous behavior.  The problem I found is documented here:
    http://tracker.newdream.net/issues/2064

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder cffaba15cd ceph: ensure Boolean options support both senses
Many ceph-related Boolean options offer the ability to both enable
and disable a feature.  For all those that don't offer this, add
a new option so that they do.

Note that ceph_show_options()--which reports mount options currently
in effect--only reports the option if it is different from the
default value.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder d3002b974c libceph: a few small changes
This gathers a number of very minor changes:
    - use %hu when formatting the a socket address's address family
    - null out the ceph_msgr_wq pointer after the queue has been
      destroyed
    - drop a needless cast in ceph_write_space()
    - add a WARN() call in ceph_state_change() in the event an
      unrecognized socket state is encountered
    - rearrange the logic in ceph_con_get() and ceph_con_put() so
      that:
        - the reference counts are only atomically read once
	- the values displayed via dout() calls are known to
	  be meaningful at the time they are formatted

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder 41617d0c9c libceph: make ceph_tcp_connect() return int
There is no real need for ceph_tcp_connect() to return the socket
pointer it creates, since it already assigns it to con->sock, which
is visible to the caller.  Instead, have it return an error code,
which tidies things up a bit.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder 6173d1f02f libceph: encapsulate some messenger cleanup code
Define a helper function to perform various cleanup operations.  Use
it both in the exit routine and in the init routine in the event of
an error.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:50 -05:00
Alex Elder e0f43c9419 libceph: make ceph_msgr_wq private
The messenger workqueue has no need to be public.  So give it static
scope.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:50 -05:00
Alex Elder 859eb79948 libceph: encapsulate connection kvec operations
Encapsulate the operation of adding a new chunk of data to the next
open slot in a ceph_connection's out_kvec array.  Also add a "reset"
operation to make subsequent add operations start at the beginning
of the array again.

Use these routines throughout, avoiding duplicate code and ensuring
all calls are handled consistently.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:50 -05:00
Alex Elder 963be4d770 libceph: move prepare_write_banner()
One of the arguments to prepare_write_connect() indicates whether it
is being called immediately after a call to prepare_write_banner().
Move the prepare_write_banner() call inside prepare_write_connect(),
and reinterpret (and rename) the "after_banner" argument so it
indicates that prepare_write_connect() should *make* the call
rather than should know it has already been made.

This was split out from the next patch to highlight this change in
logic.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:50 -05:00
Alex Elder ee57741c52 rbd: make ceph_parse_options() return a pointer
ceph_parse_options() takes the address of a pointer as an argument
and uses it to return the address of an allocated structure if
successful.  With this interface is not evident at call sites that
the pointer is always initialized.  Change the interface to return
the address instead (or a pointer-coded error code) to make the
validity of the returned pointer obvious.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:47 -05:00
Alex Elder 99f0f3b2c4 ceph: eliminate some abusive casts
This fixes some spots where a type cast to (void *) was used as
as a universal type hiding mechanism.  Instead, properly cast the
type to the intended target type.

Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Alex Elder bd40614512 ceph: eliminate some needless casts
This eliminates type casts in some places where they are not
required.

Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Alex Elder f64a93172b ceph: kill addr_str_lock spinlock; use atomic instead
A spinlock is used to protect a value used for selecting an array
index for a string used for formatting a socket address for human
consumption.  The index is reset to 0 if it ever reaches the maximum
index value.

Instead, use an ever-increasing atomic variable as a sequence
number, and compute the array index by masking off all but the
sequence number's lowest bits.  Make the number of entries in the
array a power of two to allow the use of such a mask (to avoid jumps
in the index value when the sequence number wraps).

The length of these strings is somewhat arbitrarily set at 60 bytes.
The worst-case length of a string produced is 54 bytes, for an IPv6
address that can't be shortened, e.g.:
    [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767
Change it so we arbitrarily use 64 bytes instead; if nothing else
it will make the array of these line up better in hex dumps.

Rename a few things to reinforce the distinction between the number
of strings in the array and the length of individual strings.

Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Alex Elder a5bc3129a2 ceph: make use of "else" where appropriate
Rearrange ceph_tcp_connect() a bit, making use of "else" rather than
re-testing a value with consecutive "if" statements.  Don't record a
connection's socket pointer unless the connect operation is
successful.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Alex Elder 5766651971 ceph: use a shared zero page rather than one per messenger
Each messenger allocates a page to be used when writing zeroes
out in the event of error or other abnormal condition.  Instead,
use the kernel ZERO_PAGE() for that purpose.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Xi Wang 6448669777 libceph: fix overflow check in crush_decode()
The existing overflow check (n > ULONG_MAX / b) didn't work, because
n = ULONG_MAX / b would both bypass the check and still overflow the
allocation size a + n * b.

The correct check should be (n > (ULONG_MAX - a) / b).

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Jim Schutt 182fac2689 net/ceph: Only clear SOCK_NOSPACE when there is sufficient space in the socket buffer
The Ceph messenger would sometimes queue multiple work items to write
data to a socket when the socket buffer was full.

Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the
same way that net/core/stream.c:sk_stream_write_space() does, i.e.,
clearing it only when sufficient space is available in the socket buffer.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@dreamhost.com>
2012-03-22 10:47:45 -05:00
Pablo Neira Ayuso a0f65a267d netfilter: xt_LOG: use CONFIG_IP6_NF_IPTABLES instead of CONFIG_IPV6
This fixes the following linking error:

xt_LOG.c:(.text+0x789b1): undefined reference to `ip6t_ext_hdr'

ifdefs have to use CONFIG_IP6_NF_IPTABLES instead of CONFIG_IPV6.

Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-22 11:50:56 +01:00
Benjamin LaHaise 9395a09d05 l2tp: enable automatic module loading for l2tp_ppp
When L2TP is configured as a module, requests for L2TP sockets do not result
in the l2tp_ppp module being loaded.  Fix this by adding the appropriate
MODULE_ALIAS to be recognized by pppox's request_module() call.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-21 22:14:56 -04:00
Eric Dumazet 2a2a459eee net: fix napi_reuse_skb() skb reserve
napi->skb is allocated in napi_get_frags() using
netdev_alloc_skb_ip_align(), with a reserve of NET_SKB_PAD +
NET_IP_ALIGN bytes.

However, when such skb is recycled in napi_reuse_skb(), it ends with a
reserve of NET_IP_ALIGN which is suboptimal.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-21 16:52:09 -04:00
Linus Torvalds e2a0883e40 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
  yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
  ext4: initialization of ext4_li_mtx needs to be done earlier
  debugfs-related mode_t whack-a-mole
  hfsplus: add an ioctl to bless files
  hfsplus: change finder_info to u32
  hfsplus: initialise userflags
  qnx4: new helper - try_extent()
  qnx4: get rid of qnx4_bread/qnx4_getblk
  take removal of PF_FORKNOEXEC to flush_old_exec()
  trim includes in inode.c
  um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
  um: embed ->stub_pages[] into mmu_context
  gadgetfs: list_for_each_safe() misuse
  ocfs2: fix leaks on failure exits in module_init
  ecryptfs: make register_filesystem() the last potential failure exit
  ntfs: forgets to unregister sysctls on register_filesystem() failure
  logfs: missing cleanup on register_filesystem() failure
  jfs: mising cleanup on register_filesystem() failure
  make configfs_pin_fs() return root dentry on success
  configfs: configfs_create_dir() has parent dentry in dentry->d_parent
  configfs: sanitize configfs_create()
  ...
2012-03-21 13:36:41 -07:00
Linus Torvalds 3556485f15 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates for 3.4 from James Morris:
 "The main addition here is the new Yama security module from Kees Cook,
  which was discussed at the Linux Security Summit last year.  Its
  purpose is to collect miscellaneous DAC security enhancements in one
  place.  This also marks a departure in policy for LSM modules, which
  were previously limited to being standalone access control systems.
  Chromium OS is using Yama, and I believe there are plans for Ubuntu,
  at least.

  This patchset also includes maintenance updates for AppArmor, TOMOYO
  and others."

Fix trivial conflict in <net/sock.h> due to the jumo_label->static_key
rename.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
  AppArmor: Fix location of const qualifier on generated string tables
  TOMOYO: Return error if fails to delete a domain
  AppArmor: add const qualifiers to string arrays
  AppArmor: Add ability to load extended policy
  TOMOYO: Return appropriate value to poll().
  AppArmor: Move path failure information into aa_get_name and rename
  AppArmor: Update dfa matching routines.
  AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
  AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
  AppArmor: Add const qualifiers to generated string tables
  AppArmor: Fix oops in policy unpack auditing
  AppArmor: Fix error returned when a path lookup is disconnected
  KEYS: testing wrong bit for KEY_FLAG_REVOKED
  TOMOYO: Fix mount flags checking order.
  security: fix ima kconfig warning
  AppArmor: Fix the error case for chroot relative path name lookup
  AppArmor: fix mapping of META_READ to audit and quiet flags
  AppArmor: Fix underflow in xindex calculation
  AppArmor: Fix dropping of allowed operations that are force audited
  AppArmor: Add mising end of structure test to caps unpacking
  ...
2012-03-21 13:25:04 -07:00
Linus Torvalds 9f3938346a Merge branch 'kmap_atomic' of git://github.com/congwang/linux
Pull kmap_atomic cleanup from Cong Wang.

It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().

Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.

* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
  feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
  highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
  drbd: remove the second argument of k[un]map_atomic()
  zcache: remove the second argument of k[un]map_atomic()
  gma500: remove the second argument of k[un]map_atomic()
  dm: remove the second argument of k[un]map_atomic()
  tomoyo: remove the second argument of k[un]map_atomic()
  sunrpc: remove the second argument of k[un]map_atomic()
  rds: remove the second argument of k[un]map_atomic()
  net: remove the second argument of k[un]map_atomic()
  mm: remove the second argument of k[un]map_atomic()
  lib: remove the second argument of k[un]map_atomic()
  power: remove the second argument of k[un]map_atomic()
  kdb: remove the second argument of k[un]map_atomic()
  udf: remove the second argument of k[un]map_atomic()
  ubifs: remove the second argument of k[un]map_atomic()
  squashfs: remove the second argument of k[un]map_atomic()
  reiserfs: remove the second argument of k[un]map_atomic()
  ocfs2: remove the second argument of k[un]map_atomic()
  ntfs: remove the second argument of k[un]map_atomic()
  ...
2012-03-21 09:40:26 -07:00
Tom Tucker 9b78145c0f xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
The xprtrdma FRMR mapping logic assumes that a segment is <= PAGE_SIZE.
This is not true for NFS4.

Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:47 -04:00
Tom Tucker 4a6862b364 xprtrdma: The transport should not bug-check when a dup reply is received
The client side RDMA transport will bug check if it receives a duplicate
reply, instead we should simply drop the duplicate reply.

Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:47 -04:00
Trond Myklebust ffa94db604 SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
Stephen Rothwell reports:
net/sunrpc/rpcb_clnt.c: In function 'rpcb_enc_mapping':
net/sunrpc/rpcb_clnt.c:820:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_dec_getport':
net/sunrpc/rpcb_clnt.c:837:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_dec_set':
net/sunrpc/rpcb_clnt.c:860:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_enc_getaddr':
net/sunrpc/rpcb_clnt.c:892:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_dec_getaddr':
net/sunrpc/rpcb_clnt.c:914:19: warning: unused variable 'task' [-Wunused-variable]
fs/lockd/svclock.c:49:20: warning: 'nlmdbg_cookie2a' declared 'static' but never defined [-Wunused-function]

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:44 -04:00
Linus Torvalds 69a7aebcf0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "It's indeed trivial -- mostly documentation updates and a bunch of
  typo fixes from Masanari.

  There are also several linux/version.h include removals from Jesper."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
  kcore: fix spelling in read_kcore() comment
  constify struct pci_dev * in obvious cases
  Revert "char: Fix typo in viotape.c"
  init: fix wording error in mm_init comment
  usb: gadget: Kconfig: fix typo for 'different'
  Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
  writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
  writeback: fix typo in the writeback_control comment
  Documentation: Fix multiple typo in Documentation
  tpm_tis: fix tis_lock with respect to RCU
  Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
  Doc: Update numastat.txt
  qla4xxx: Add missing spaces to error messages
  compiler.h: Fix typo
  security: struct security_operations kerneldoc fix
  Documentation: broken URL in libata.tmpl
  Documentation: broken URL in filesystems.tmpl
  mtd: simplify return logic in do_map_probe()
  mm: fix comment typo of truncate_inode_pages_range
  power: bq27x00: Fix typos in comment
  ...
2012-03-20 21:12:50 -07:00
Linus Torvalds 3b59bf0816 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking merge from David Miller:
 "1) Move ixgbe driver over to purely page based buffering on receive.
     From Alexander Duyck.

  2) Add receive packet steering support to e1000e, from Bruce Allan.

  3) Convert TCP MD5 support over to RCU, from Eric Dumazet.

  4) Reduce cpu usage in handling out-of-order TCP packets on modern
     systems, also from Eric Dumazet.

  5) Support the IP{,V6}_UNICAST_IF socket options, making the wine
     folks happy, from Erich Hoover.

  6) Support VLAN trunking from guests in hyperv driver, from Haiyang
     Zhang.

  7) Support byte-queue-limtis in r8169, from Igor Maravic.

  8) Outline code intended for IP_RECVTOS in IP_PKTOPTIONS existed but
     was never properly implemented, Jiri Benc fixed that.

  9) 64-bit statistics support in r8169 and 8139too, from Junchang Wang.

  10) Support kernel side dump filtering by ctmark in netfilter
      ctnetlink, from Pablo Neira Ayuso.

  11) Support byte-queue-limits in gianfar driver, from Paul Gortmaker.

  12) Add new peek socket options to assist with socket migration, from
      Pavel Emelyanov.

  13) Add sch_plug packet scheduler whose queue is controlled by
      userland daemons using explicit freeze and release commands.  From
      Shriram Rajagopalan.

  14) Fix FCOE checksum offload handling on transmit, from Yi Zou."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1846 commits)
  Fix pppol2tp getsockname()
  Remove printk from rds_sendmsg
  ipv6: fix incorrent ipv6 ipsec packet fragment
  cpsw: Hook up default ndo_change_mtu.
  net: qmi_wwan: fix build error due to cdc-wdm dependecy
  netdev: driver: ethernet: Add TI CPSW driver
  netdev: driver: ethernet: add cpsw address lookup engine support
  phy: add am79c874 PHY support
  mlx4_core: fix race on comm channel
  bonding: send igmp report for its master
  fs_enet: Add MPC5125 FEC support and PHY interface selection
  net: bpf_jit: fix BPF_S_LDX_B_MSH compilation
  net: update the usage of CHECKSUM_UNNECESSARY
  fcoe: use CHECKSUM_UNNECESSARY instead of CHECKSUM_PARTIAL on tx
  net: do not do gso for CHECKSUM_UNNECESSARY in netif_needs_gso
  ixgbe: Fix issues with SR-IOV loopback when flow control is disabled
  net/hyperv: Fix the code handling tx busy
  ixgbe: fix namespace issues when FCoE/DCB is not enabled
  rtlwifi: Remove unused ETH_ADDR_LEN defines
  igbvf: Use ETH_ALEN
  ...

Fix up fairly trivial conflicts in drivers/isdn/gigaset/interface.c and
drivers/net/usb/{Kconfig,qmi_wwan.c} as per David.
2012-03-20 21:04:47 -07:00
Al Viro 68ac1234fb switch touch_atime to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:41 -04:00
Al Viro 40ffe67d2e switch unix_sock to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:41 -04:00
Al Viro 48fde701af switch open-coded instances of d_make_root() to new helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:35 -04:00
Linus Torvalds 0d9cabdcce Merge branch 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
 "Out of the 8 commits, one fixes a long-standing locking issue around
  tasklist walking and others are cleanups."

* 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list
  cgroup: Remove wrong comment on cgroup_enable_task_cg_list()
  cgroup: remove cgroup_subsys argument from callbacks
  cgroup: remove extra calls to find_existing_css_set
  cgroup: replace tasklist_lock with rcu_read_lock
  cgroup: simplify double-check locking in cgroup_attach_proc
  cgroup: move struct cgroup_pidlist out from the header file
  cgroup: remove cgroup_attach_task_current_cg()
2012-03-20 18:11:21 -07:00
Benjamin LaHaise bbdb32cb5b Fix pppol2tp getsockname()
While testing L2TP functionality, I came across a bug in getsockname().  The
IP address returned within the pppol2tp_addr's addr memember was not being
set to the IP  address in use.  This bug is caused by using inet_sk() on the
wrong socket (the L2TP socket rather than the underlying UDP socket), and was
likely introduced during the addition of L2TPv3 support.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-20 16:12:11 -04:00
Dave Jones a6506e1486 Remove printk from rds_sendmsg
no socket layer outputs a message for this error and neither should rds.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-20 16:12:11 -04:00
Linus Torvalds 843ec558f9 tty and serial merge for 3.4-rc1
Here's the big serial and tty merge for the 3.4-rc1 tree.
 
 There's loads of fixes and reworks in here from Jiri for the tty layer,
 and a number of patches from Alan to help try to wrestle the vt layer
 into a sane model.
 
 Other than that, lots of driver updates and fixes, and other minor
 stuff, all detailed in the shortlog.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk9nihQACgkQMUfUDdst+ylXTQCdFuwVuZgjCts+xDVa1jX2ac84
 UogAn3Wr+P7NYFN6gvaGm52KbGbZs405
 =2b/l
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull TTY/serial patches from Greg KH:
 "tty and serial merge for 3.4-rc1

  Here's the big serial and tty merge for the 3.4-rc1 tree.

  There's loads of fixes and reworks in here from Jiri for the tty
  layer, and a number of patches from Alan to help try to wrestle the vt
  layer into a sane model.

  Other than that, lots of driver updates and fixes, and other minor
  stuff, all detailed in the shortlog."

* tag 'tty-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (132 commits)
  serial: pxa: add clk_prepare/clk_unprepare calls
  TTY: Wrong unicode value copied in con_set_unimap()
  serial: PL011: clear pending interrupts
  serial: bfin-uart: Don't access tty circular buffer in TX DMA interrupt after it is reset.
  vt: NULL dereference in vt_do_kdsk_ioctl()
  tty: serial: vt8500: fix annotations for probe/remove
  serial: remove back and forth conversions in serial_out_sync
  serial: use serial_port_in/out vs serial_in/out in 8250
  serial: introduce generic port in/out helpers
  serial: reduce number of indirections in 8250 code
  serial: delete useless void casts in 8250.c
  serial: make 8250's serial_in shareable to other drivers.
  serial: delete last unused traces of pausing I/O in 8250
  pch_uart: Add module parameter descriptions
  pch_uart: Use existing default_baud in setup_console
  pch_uart: Add user_uartclk parameter
  pch_uart: Add Fish River Island II uart clock quirks
  pch_uart: Use uartclk instead of base_baud
  mpc5200b/uart: select more tolerant uart prescaler on low baudrates
  tty: moxa: fix bit test in moxa_start()
  ...
2012-03-20 11:24:39 -07:00
Linus Torvalds 9c2b957db1 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events changes for v3.4 from Ingo Molnar:

 - New "hardware based branch profiling" feature both on the kernel and
   the tooling side, on CPUs that support it.  (modern x86 Intel CPUs
   with the 'LBR' hardware feature currently.)

   This new feature is basically a sophisticated 'magnifying glass' for
   branch execution - something that is pretty difficult to extract from
   regular, function histogram centric profiles.

   The simplest mode is activated via 'perf record -b', and the result
   looks like this in perf report:

	$ perf record -b any_call,u -e cycles:u branchy

	$ perf report -b --sort=symbol
	    52.34%  [.] main                   [.] f1
	    24.04%  [.] f1                     [.] f3
	    23.60%  [.] f1                     [.] f2
	     0.01%  [k] _IO_new_file_xsputn    [k] _IO_file_overflow
	     0.01%  [k] _IO_vfprintf_internal  [k] _IO_new_file_xsputn
	     0.01%  [k] _IO_vfprintf_internal  [k] strchrnul
	     0.01%  [k] __printf               [k] _IO_vfprintf_internal
	     0.01%  [k] main                   [k] __printf

   This output shows from/to branch columns and shows the highest
   percentage (from,to) jump combinations - i.e.  the most likely taken
   branches in the system.  "branches" can also include function calls
   and any other synchronous and asynchronous transitions of the
   instruction pointer that are not 'next instruction' - such as system
   calls, traps, interrupts, etc.

   This feature comes with (hopefully intuitive) flat ascii and TUI
   support in perf report.

 - Various 'perf annotate' visual improvements for us assembly junkies.
   It will now recognize function calls in the TUI and by hitting enter
   you can follow the call (recursively) and back, amongst other
   improvements.

 - Multiple threads/processes recording support in perf record, perf
   stat, perf top - which is activated via a comma-list of PIDs:

	perf top -p 21483,21485
	perf stat -p 21483,21485 -ddd
	perf record -p 21483,21485

 - Support for per UID views, via the --uid paramter to perf top, perf
   report, etc.  For example 'perf top --uid mingo' will only show the
   tasks that I am running, excluding other users, root, etc.

 - Jump label restructurings and improvements - this includes the
   factoring out of the (hopefully much clearer) include/linux/static_key.h
   generic facility:

	struct static_key key = STATIC_KEY_INIT_FALSE;

	...

	if (static_key_false(&key))
	        do unlikely code
	else
	        do likely code

	...
	static_key_slow_inc();
	...
	static_key_slow_inc();
	...

   The static_key_false() branch will be generated into the code with as
   little impact to the likely code path as possible.  the
   static_key_slow_*() APIs flip the branch via live kernel code patching.

   This facility can now be used more widely within the kernel to
   micro-optimize hot branches whose likelihood matches the static-key
   usage and fast/slow cost patterns.

 - SW function tracer improvements: perf support and filtering support.

 - Various hardenings of the perf.data ABI, to make older perf.data's
   smoother on newer tool versions, to make new features integrate more
   smoothly, to support cross-endian recording/analyzing workflows
   better, etc.

 - Restructuring of the kprobes code, the splitting out of 'optprobes',
   and a corner case bugfix.

 - Allow the tracing of kernel console output (printk).

 - Improvements/fixes to user-space RDPMC support, allowing user-space
   self-profiling code to extract PMU counts without performing any
   system calls, while playing nice with the kernel side.

 - 'perf bench' improvements

 - ... and lots of internal restructurings, cleanups and fixes that made
   these features possible.  And, as usual this list is incomplete as
   there were also lots of other improvements

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
  perf report: Fix annotate double quit issue in branch view mode
  perf report: Remove duplicate annotate choice in branch view mode
  perf/x86: Prettify pmu config literals
  perf report: Enable TUI in branch view mode
  perf report: Auto-detect branch stack sampling mode
  perf record: Add HEADER_BRANCH_STACK tag
  perf record: Provide default branch stack sampling mode option
  perf tools: Make perf able to read files from older ABIs
  perf tools: Fix ABI compatibility bug in print_event_desc()
  perf tools: Enable reading of perf.data files from different ABI rev
  perf: Add ABI reference sizes
  perf report: Add support for taken branch sampling
  perf record: Add support for sampling taken branch
  perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
  x86/kprobes: Split out optprobe related code to kprobes-opt.c
  x86/kprobes: Fix a bug which can modify kernel code permanently
  x86/kprobes: Fix instruction recovery on optimized path
  perf: Add callback to flush branch_stack on context switch
  perf: Disable PERF_SAMPLE_BRANCH_* when not supported
  perf/x86: Add LBR software filter support for Intel CPUs
  ...
2012-03-20 10:29:15 -07:00
Linus Torvalds 5928a2b60c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes for v3.4 from Ingo Molnar.  The major features of this
series are:

 - making RCU more aggressive about entering dyntick-idle mode in order
   to improve energy efficiency

 - converting a few more call_rcu()s to kfree_rcu()s

 - applying a number of rcutree fixes and cleanups to rcutiny

 - removing CONFIG_SMP #ifdefs from treercu

 - allowing RCU CPU stall times to be set via sysfs

 - adding CPU-stall capability to rcutorture

 - adding more RCU-abuse diagnostics

 - updating documentation

 - fixing yet more issues located by the still-ongoing top-to-bottom
   inspection of RCU, this time with a special focus on the CPU-hotplug
   code path.

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
  rcu: Stop spurious warnings from synchronize_sched_expedited
  rcu: Hold off RCU_FAST_NO_HZ after timer posted
  rcu: Eliminate softirq-mediated RCU_FAST_NO_HZ idle-entry loop
  rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections
  rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()
  rcu: Remove redundant check for rcu_head misalignment
  PTR_ERR should be called before its argument is cleared.
  rcu: Convert WARN_ON_ONCE() in rcu_lock_acquire() to lockdep
  rcu: Trace only after NULL-pointer check
  rcu: Call out dangers of expedited RCU primitives
  rcu: Rework detection of use of RCU by offline CPUs
  lockdep: Add CPU-idle/offline warning to lockdep-RCU splat
  rcu: No interrupt disabling for rcu_prepare_for_idle()
  rcu: Move synchronize_sched_expedited() to rcutree.c
  rcu: Check for illegal use of RCU from offlined CPUs
  rcu: Update stall-warning documentation
  rcu: Add CPU-stall capability to rcutorture
  rcu: Make documentation give more realistic rcutorture duration
  rcutorture: Permit holding off CPU-hotplug operations during boot
  rcu: Print scheduling-clock information on RCU CPU stall-warning messages
  ...
2012-03-20 10:10:18 -07:00
Trond Myklebust e27d359e9b SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
This allows us to turn on/off the dprintk() debugging interfaces for
those distributions that don't ship the 'rpcdebug' utility.
It also allows us to add Kbuild dependencies. Specifically, we already
know that dprintk() in general relies on CONFIG_SYSCTL. Now it turns out
that the NFS dprintks depend on CONFIG_CRC32 after we added support
for the filehandle hash.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-20 13:08:26 -04:00
Cong Wang b854178601 sunrpc: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:28 +08:00
Cong Wang 6114eab535 rds: remove the second argument of k[un]map_atomic()
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:28 +08:00
Cong Wang 0352bc550c net: remove the second argument of k[un]map_atomic()
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:27 +08:00
Gao feng 1f85851e17 ipv6: fix incorrent ipv6 ipsec packet fragment
Since commit 299b0767(ipv6: Fix IPsec slowpath fragmentation problem)
In func ip6_append_data,after call skb_put(skb, fraglen + dst_exthdrlen)
the skb->len contains dst_exthdrlen,and we don't reduce dst_exthdrlen at last
This will make fraggap>0 in next "while cycle",and cause the size of skb incorrent

Fix this by reserve headroom for dst_exthdrlen.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-20 05:39:34 -04:00
Eric Dumazet c8628155ec tcp: reduce out_of_order memory use
With increasing receive window sizes, but speed of light not improved
that much, out of order queue can contain a huge number of skbs, waiting
to be moved to receive_queue when missing packets can fill the holes.

Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
many cases.

When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
latency killer and cpu cache blower.

Doing the coalescing attempt each time we add a frame in ofo queue
permits to keep memory use tight and in many cases avoid the
tcp_collapse() thing later.

Tested on various wireless setups (b43, ath9k, ...) known to use big skb
truesize, this patch removed the "packets collapsed in receive queue due
to low socket buffer" I had before.

This also reduced average memory used by tcp sockets.

With help from Neal Cardwell.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-19 16:53:08 -04:00
Eric Dumazet e86b291962 tcp: introduce tcp_data_queue_ofo
Split tcp_data_queue() in two parts for better readability.

tcp_data_queue_ofo() is responsible for queueing incoming skb into out
of order queue.

Change code layout so that the skb_set_owner_r() is performed only if
skb is not dropped.

This is a preliminary patch before "reduce out_of_order memory use"
following patch.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-19 16:53:07 -04:00
Trond Myklebust 540a0f7584 SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
The problem is that for the case of priority queues, we
have to assume that __rpc_remove_wait_queue_priority will move new
elements from the tk_wait.links lists into the queue->tasks[] list.
We therefore cannot use list_for_each_entry_safe() on queue->tasks[],
since that will skip these new tasks that __rpc_remove_wait_queue_priority
is adding.

Without this fix, rpc_wake_up and rpc_wake_up_status will both fail
to wake up all functions on priority wait queues, which can result
in some nasty hangs.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-03-19 14:15:02 -04:00
David S. Miller 4da0bd7365 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-18 23:29:41 -04:00
Pablo Neira Ayuso a16a1647fa netfilter: ctnetlink: fix race between delete and timeout expiration
Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.

After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.

Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-17 01:47:08 -07:00
Neil Horman 124d37e9f0 arp: allow arp processing to honor per interface arp_accept sysctl
I found recently that the arp_process function which handles all of our received
arp frames, is using IPV4_DEVCONF_ALL macro to check the state of the arp_process
flag.  This seems wrong, as it implies that either none or all of the network
interfaces accept gratuitous arps.  This patch corrects that, allowing
per-interface arp_accept configuration to deviate from the all setting.  Note
this also brings us into line with the way the arp_filter setting is handled
during arp_process execution.

Tested this myself on my home network, and confirmed it works as expected.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16 23:00:20 -07:00
RongQing.Li c577923756 ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't
need to dev_hold().
With dev_hold(), not corresponding dev_put(), will lead to leak.

[ bug introduced in 96b52e61be (ipv6: mcast: RCU conversions) ]

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16 21:56:42 -07:00
John W. Linville 01a2829809 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/ath/ath9k/hw.c
2012-03-16 13:45:25 -04:00
Eric Dumazet cc34eb672e sch_sfq: revert dont put new flow at the end of flows
This reverts commit d47a0ac7b6 (sch_sfq: dont put new flow at the end of
flows)

As Jesper found out, patch sounded great but has bad side effects.

In stress situation, pushing new flows in front of the queue can prevent
old flows doing any progress. Packets can stay in SFQ queue for
unlimited amount of time.

It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.

A more sensible answer to Dave Taht concerns (who reported the issued I
tried to solve in original commit) is probably to use a qdisc hierarchy
so that high prio packets dont enter a potentially crowded SFQ qdisc.

Reported-by: Jesper Dangaard Brouer <jdb@comx.dk>
Cc: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16 01:55:25 -07:00
Eric Dumazet 122bdf67f1 ipv6: fix icmp6_dst_alloc()
commit 87a115783 ( ipv6: Move xfrm_lookup() call down into
icmp6_dst_alloc().) forgot to convert one error path, leading
to crashes in mld_sendpack()

Many thanks to Dave Jones for providing a very complete bug report.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16 01:53:42 -07:00
Eliad Peller dc41e4d474 mac80211: make uapsd_* keys per-vif
uapsd_queues and uapsd_max_sp_len are relevant only for managed
interfaces, and can be configured differently for each vif.

Move them from the local struct to sdata->u.mgd, and update
the debugfs functions accordingly.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-15 13:43:12 -04:00
Eliad Peller ada577c12f mac80211: add NULL terminator to debugfs_netdev write buf
Some debugfs write functions call kstrto* functions, which
assume the string is null-terminated. Make it valid by changing
ieee80211_if_write() to use static buffer instead of allocating
one, and set the last char to NULL.

(The write functions try to parse some integer/mac address,
so 64 bytes buffer should be enough)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-15 13:40:34 -04:00
Helmut Schaa ba6fa29c6d mac80211: Don't sample max throughput rate in minstrel_ht
The current max throughput rate is known to be good as otherwise it
wouldn't be the max throughput rate. Since rate sampling can introduce
some overhead (by adding RTS for example or due to not aggregating the
frame) don't sample the max throughput rate.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-15 13:40:33 -04:00
Paul Stewart 3117bbdb78 mac80211: Don't let regulatory make us deaf
When regulatory information changes our HT behavior (e.g,
when we get a country code from the AP we have just associated
with), we should use this information to change the power with
which we transmit, and what channels we transmit.  Sometimes
the channel parameters we derive from regulatory information
contradicts the parameters we used in association.  For example,
we could have associated specifying HT40, but the regulatory
rules we apply may forbid HT40 operation.

In the situation above, we should reconfigure ourselves to
transmit in HT20 only, however it makes no sense for us to
disable receive in HT40, since if we associated with these
parameters, the AP has every reason to expect we can and
will receive packets this way.  The code in mac80211 does
not have the capability of sending the appropriate action
frames to signal a change in HT behaviour so the AP has
no clue we can no longer receive frames encoded this way.
In some broken AP implementations, this can leave us
effectively deaf if the AP never retries in lower HT rates.

This change breaks up the channel_type parameter in the
ieee80211_enable_ht function into a separate receive and
transmit part.  It honors the channel flags set by regulatory
in order to configure the rate control algorithm, but uses
the capability flags to configure the channel on the radio,
since these were used in association to set the AP's transmit
rate.

Signed-off-by: Paul Stewart <pstew@chromium.org>
Cc: Sam Leffler <sleffler@chromium.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Luis R Rodriguez <mcgrof@frijolero.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-13 14:55:53 -04:00
Johannes Berg e9ac0745c7 mac80211: rename bss_conf timestamp to last_tsf
This value is not really very useful by itself,
yet some drivers (including iwlwifi until I can
figure out what it should do) use it. At least
rename it to "last_tsf" to indicate the meaning
and add a note that it may be really old.

I suspect the value may become useful combined
with the rx_status->mactime, but we don't (yet)
store that value and pass it to the driver.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-13 14:54:20 -04:00
Johannes Berg 7b8bcff2e0 cfg80211: clarify timestamp in cfg80211_inform_bss
This is intended to be the timestamp sent by the
peer in the beacon/probe response, not any form
of host timestamp. Clarify the documentation and
variable names.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-13 14:54:20 -04:00
Johannes Berg a828691188 mac80211: linearize SKBs as needed for crypto
Not linearizing every SKB will help actually pass
non-linear SKBs all the way up when on an encrypted
connection. For now, linearize TKIP completely as
it is lower performance and I don't quite grok all
the details.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-13 14:54:17 -04:00
Johannes Berg 617bbde878 mac80211: move RX WEP weak IV counting
This is better done inside the WEP decrypt
function where it doesn't have to check all
the conditions any more since they've been
tested already.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-13 14:54:16 -04:00
Joe Perches afd465030a net: ipv4: Standardize prefixes for message logging
Add #define pr_fmt(fmt) as appropriate.

Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files.
Standardize on "UDPLite: " for appropriate uses.
Some prefixes were previously "UDPLITE: " and "UDP-Lite: ".

Add KBUILD_MODNAME ": " to icmp and gre.
Remove embedded prefixes as appropriate.

Add missing "\n" to pr_info in gre.c.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-12 17:05:21 -07:00
Ingo Molnar 35239e23c6 Merge branch 'perf/urgent' into perf/core
Merge reason: We are going to queue up a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:44:11 +01:00
Johannes Berg 5d6a1b069b mac80211: set basic rates earlier
The authentication and association handshake
already happens in the context of the new BSS,
and the basic rates are needed at least for
the ACK response frame to the authentication
or association response frames. Therefore the
basic rates should already be configured into
the driver when those frames are sent.

Change the logic to set up the basic rates in
the connection preparation that happens for
authentication and association (if needed).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:22:16 -04:00
Johannes Berg a1cf775dea mac80211: refactor common auth/assoc setup code
As associating is possible without first authenticating
(for FT over DS) association also has to be able to
switch to the right channel, insert the station entry
etc. Factor out this common code into a new function
called ieee80211_prep_connection().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:22:15 -04:00
Johannes Berg 0775f9f90c mac80211: remove spurious BSSID change flag
The BSSID has been set a lot earlier already and
didn't change again in ieee80211_set_associated().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:22:14 -04:00
Johannes Berg 76f0303d61 mac80211: simplify wmm check during association
Instead of setting assoc_data->wmm_used solely
based on the BSS also take into account our own
capabilities and later check those.

Also rename "wmm_used" and "uapsd_used" to just
"wmm" and "uapsd".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:22:13 -04:00
Johannes Berg 4e74bfdb30 mac80211: simplify HT checks
Always set/use IEEE80211_STA_DISABLE_11N instead
of duplicating the queue, WMM and HT checks in
all places.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:22:12 -04:00
Johannes Berg de5036aae6 mac80211: move misplaced comment
Looks like some changes in this area moved
the code but not the comment that belongs
to the code, move it to the right place.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:22:11 -04:00
Helmut Schaa e9219779f9 mac80211: Disable MCS > 7 in minstrel_ht when STA uses static SMPS
Disable multi stream rates (MCS > 7) when a STA is in static SMPS mode
since it has only one active rx chain. Hence, it doesn't even make
sense to sample multi stream rates.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:19:39 -04:00
Johannes Berg 3cc5240b5e mac80211: set channel back after disassociating
As we've discussed, we want to avoid channel changes
while associated. While the part when we actually
associate needs a bit more work, the bit that happens
on disassociating can be changed quite easily. Move
the channel type change later in the disassociate
process to set the channel only after the driver was
told that it's now disassociated.

As the driver could expect powersave to be enabled
only when associated, this thus results in splitting
the config call, but overall what happens makes more
sense this way.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:19:38 -04:00
Johannes Berg 177958e967 mac80211: remove tx_sync
When the station state callback was added, this
was no longer needed in theory. With the iwlwifi
changes to remove use of it landing, we can kill
the entire tx-sync framework again, RIP.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:19:38 -04:00
Helmut Schaa aa45458060 mac80211: Limit TID buffering during BA session setup/teardown
While setting up or tearing down a BA session mac80211 is buffering
pending frames for the according TID. However, there's currently no
limit on how many frames are buffered possibly leading to an out-of-
memory situation. This can happen on systems with little memory when
the CPU is fully loaded since the BA session work is executed in
process context while frames can still come via softirq.

Apply a limitation to the TIDs pending queue to avoid consuming
too much memory in this situation.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:19:35 -04:00
Bala Shanmugam 4486ea987e cfg80211: Add background scan period attribute.
Receive background scan period as part of connect
command and pass the same to the driver.

Signed-off-by: Bala Shanmugam <bkamatch@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-03-12 14:19:34 -04:00
Trond Myklebust 0097143c12 SUNRPC: Don't use variable length automatic arrays in kernel code
Replace the variable length array in the RPCSEC_GSS crypto code with
a fixed length one. The size should be bounded by the variable
GSS_KRB5_MAX_BLOCKSIZE, so use that.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-12 13:37:16 -04:00
Joe Perches 058bd4d2a4 net: Convert printks to pr_<level>
Use a more current kernel messaging style.

Convert a printk block to print_hex_dump.
Coalesce formats, align arguments.
Use %s, __func__ instead of embedding function names.

Some messages that were prefixed with <foo>_close are
now prefixed with <foo>_fini.  Some ah4 and esp messages
are now not prefixed with "ip ".

The intent of this patch is to later add something like
  #define pr_fmt(fmt) "IPv4: " fmt.
to standardize the output messages.

Text size is trivially reduced. (x86-32 allyesconfig)

$ size net/ipv4/built-in.o*
   text	   data	    bss	    dec	    hex	filename
 887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
 887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-11 23:42:51 -07:00
Maciej Żenczykowski 43db362d3a net: get rid of some pointless casts to sockaddr
The following 4 functions:
  move_addr_to_kernel
  move_addr_to_user
  verify_iovec
  verify_compat_iovec
are always effectively called with a sockaddr_storage.

Make this explicit by changing their signature.

This removes a large number of casts from sockaddr_storage to sockaddr.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-11 19:11:22 -07:00
Trond Myklebust 09acfea5d8 SUNRPC: Fix a few sparse warnings
net/sunrpc/svcsock.c:412:22: warning: incorrect type in assignment
(different address spaces)
 - svc_partial_recvfrom now takes a struct kvec, so the variable
   save_iovbase needs to be an ordinary (void *)

Make a bunch of variables in net/sunrpc/xprtsock.c static

Fix a couple of "warning: symbol 'foo' was not declared. Should it be
static?" reports.

Fix a couple of conflicting function declarations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-11 19:30:02 -04:00
Li Wei 8b2aaedee4 ipv6: Fix Smatch warning.
With commit d6ddef9e641d(IPv6: Fix not join all-router mcast group
when forwarding set.) I check 'dev' after it's dereference that
leads to a Smatch complaint:

net/ipv6/addrconf.c:438 ipv6_add_dev()
	 warn: variable dereferenced before check 'dev' (see line 432)

net/ipv6/addrconf.c
   431		/* protected by rtnl_lock */
   432		rcu_assign_pointer(dev->ip6_ptr, ndev);
                                   ^^^^^^^^^^^^
Old dereference.

   433
   434		/* Join all-node multicast group */
   435		ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);
   436
   437		/* Join all-router multicast group if forwarding is set
*/
   438		if (ndev->cnf.forwarding && dev && (dev->flags &
IFF_MULTICAST))
                                            ^^^

Remove the check to avoid the complaint as 'dev' can't be NULL.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-11 15:57:11 -07:00
Eric Dumazet dfd25ffffc tcp: fix syncookie regression
commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
added a serious regression on synflood handling.

Simon Kirby discovered a successful connection was delayed by 20 seconds
before being responsive.

In my tests, I discovered that xmit frames were lost, and needed ~4
retransmits and a socket dst rebuild before being really sent.

In case of syncookie initiated connection, we use a different path to
initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.

As ip_queue_xmit() now depends on inet flow being setup, fix this by
copying the temp flowi4 we use in cookie_v4_check().

Reported-by: Simon Kirby <sim@netnation.com>
Bisected-by: Simon Kirby <sim@netnation.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-11 15:52:12 -07:00
sjur.brandeland@stericsson.com e3abcc2a85 caif: make zero a legal caif connetion id.
Connection ID configured through RTNL must allow zero as
connection id. If connection-id is not given when creating the
interface, configure a loopback interface using ifindex as
connection-id.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-11 15:38:16 -07:00