Commit Graph

377 Commits

Author SHA1 Message Date
Sowmini Varadhan a28c257c9e net/rds: Declare SO_RDS_TRANSPORT and RDS_TRANS_* constants in uapi/linux/rds.h
User space applications that desire to explicitly select the
underlying transport for a PF_RDS socket may do so by using the
SO_RDS_TRANSPORT socket option at the SOL_RDS level before bind().
The integer argument provided to the socket option would be one
of the RDS_TRANS_* values, e.g., RDS_TRANS_TCP. This commit exports
the constant values need by such applications via <linux/rds.h>

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31 21:47:23 -07:00
Sagi Grimberg 3c88f3dcff RDS: Switch to generic logging helpers
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18 13:44:23 -04:00
David S. Miller b04096ff33 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Four minor merge conflicts:

1) qca_spi.c renamed the local variable used for the SPI device
   from spi_device to spi, meanwhile the spi_set_drvdata() call
   got moved further up in the probe function.

2) Two changes were both adding new members to codel params
   structure, and thus we had overlapping changes to the
   initializer function.

3) 'net' was making a fix to sk_release_kernel() which is
   completely removed in 'net-next'.

4) In net_namespace.c, the rtnl_net_fill() call for GET operations
   had the command value fixed, meanwhile 'net-next' adjusted the
   argument signature a bit.

This also matches example merge resolutions posted by Stephen
Rothwell over the past two days.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 14:31:43 -04:00
Eric W. Biederman 11aa9c28b4 net: Pass kern from net_proto_family.create to sk_alloc
In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:50:17 -04:00
Sowmini Varadhan c82ac7e69e net/rds: RDS-TCP: only initiate reconnect attempt on outgoing TCP socket.
When the peer of an RDS-TCP connection restarts, a reconnect
attempt should only be made from the active side  of the TCP
connection, i.e. the side that has a transient TCP port
number. Do not add the passive side of the TCP connection
to the c_hash_node and thus avoid triggering rds_queue_reconnect()
for passive rds connections.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 16:03:28 -04:00
Sowmini Varadhan f711a6ae06 net/rds: RDS-TCP: Always create a new rds_sock for an incoming connection.
When running RDS over TCP, the active (client) side connects to the
listening ("passive") side at the RDS_TCP_PORT.  After the connection
is established, if the client side reboots (potentially without even
sending a FIN) the server still has a TCP socket in the esablished
state.  If the server-side now gets a new SYN comes from the client
with a different client port, TCP will create a new socket-pair, but
the RDS layer will incorrectly pull up the old rds_connection (which
is still associated with the stale t_sock and RDS socket state).

This patch corrects this behavior by having rds_tcp_accept_one()
always create a new connection for an incoming TCP SYN.
The rds and tcp state associated with the old socket-pair is cleaned
up via the rds_tcp_state_change() callback which would typically be
invoked in most cases when the client-TCP sends a FIN on TCP restart,
triggering a transition to CLOSE_WAIT state. In the rarer event of client
death without a FIN, TCP_KEEPALIVE probes on the socket will detect
the stale socket, and the TCP transition to CLOSE state will trigger
the RDS state cleanup.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 16:03:27 -04:00
David Ahern e2783717a7 net/rds: Fix new sparse warning
c0adf54a10 introduced new sparse warnings:
  CHECK   /home/dahern/kernels/linux.git/net/rds/ib_cm.c
net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types)
net/rds/ib_cm.c:191:34:    expected unsigned long long [unsigned] [usertype] dp_ack_seq
net/rds/ib_cm.c:191:34:    got restricted __be64 <noident>
net/rds/ib_cm.c:194:51: warning: cast to restricted __be64

The temporary variable for sequence number should have been declared as __be64
rather than u64. Make it so.

Signed-off-by: David Ahern <david.ahern@oracle.com>
Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 15:23:06 -04:00
shamir rabinovitch c0adf54a10 net/rds: fix unaligned memory access
rdma_conn_param private data is copied using memcpy after headers such
as cma_hdr (see cma_resolve_ib_udp as example). so the start of the
private data is aligned to the end of the structure that come before. if
this structure end with u32 the meaning is that the start of the private
data will be 4 bytes aligned. structures that use u8/u16/u32/u64 are
naturally aligned but in case the structure start is not 8 bytes aligned,
all u64 members of this structure will not be aligned. to solve this issue
we must use special macros that allow unaligned access to those
unaligned members.

Addresses the following kernel log seen when attempting to use RDMA:

Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma]

Acked-by: Chien Yen <chien.yen@oracle.com>
Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
[Minor tweaks for top of tree by:]
Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03 23:39:01 -04:00
David S. Miller 87ffabb1f0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The dwmac-socfpga.c conflict was a case of a bug fix overlapping
changes in net-next to handle an error pointer differently.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 15:44:14 -04:00
Sowmini Varadhan 443be0e5af RDS: make sure not to loop forever inside rds_send_xmit
If a determined set of concurrent senders keep the send queue full,
we can loop forever inside rds_send_xmit.  This fix has two parts.

First we are dropping out of the while(1) loop after we've processed a
large batch of messages.

Second we add a generation number that gets bumped each time the
xmit bit lock is acquired.  If someone else has jumped in and
made progress in the queue, we skip our goto restart.

Original patch by Chris Mason.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-08 15:17:32 -04:00
Sowmini Varadhan 1789b2c077 RDS: only use passive connections when addresses match
Passive connections were added for the case where one loopback IB
connection between identical addresses needs another connection to store
the second QP.  Unfortunately, they were also created in the case where
the addesses differ and we already have both QPs.

This lead to a message reordering bug.

- two different IB interfaces and addresses on a machine: A B
- traffic is sent from A to B
- connection from A-B is created, connect request sent
- listening accepts connect request, B-A is created
- traffic flows, next_rx is incremented
- unacked messages exist on the retrans list
- connection A-B is shut down, new connect request sent
- listen sees existing loopback B-A, creates new passive B-A
- retrans messages are sent and delivered because of 0 next_rx

The problem is that the second connection request saw the previously
existing parent connection.  Instead of using it, and using the existing
next_rx_seq state for the traffic between those IPs, it mistakenly
thought that it had to create a passive connection.

We fix this by only using passive connections in the special case where
laddr and faddr match.  In this case we'll only ever have one parent
sending connection requests and one passive connection created as the
listening path sees the existing parent connection which initiated the
request.

Original patch by Zach Brown

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-08 15:17:23 -04:00
David S. Miller 0fa74a4be4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	net/core/sysctl_net_core.c
	net/ipv4/inet_diag.c

The be_main.c conflict resolution was really tricky.  The conflict
hunks generated by GIT were very unhelpful, to say the least.  It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().

So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged.  And this worked beautifully.

The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 18:51:09 -04:00
Arnd Bergmann f862e07cf9 rds: avoid potential stack overflow
The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
on the stack in order to pass a pair of addresses. This happens to just
fit withint the 1024 byte stack size warning limit on x86, but just
exceed that limit on ARM, which gives us this warning:

net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]

As the use of this large variable is basically bogus, we can rearrange
the code to not do that. Instead of passing an rds socket into
rds_iw_get_device, we now just pass the two addresses that we have
available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
to create two address structures on the stack there.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 00:28:01 -04:00
Ying Xue 1b78414047 net: Remove iocb argument from sendmsg and recvmsg
After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 13:06:31 -05:00
Sowmini Varadhan 80ad0d4a7a rds: rds_cong_queue_updates needs to defer the congestion update transmission
When the RDS transport is TCP, we cannot inline the call to rds_send_xmit
from rds_cong_queue_update because
(a) we are already holding the sock_lock in the recv path, and
    will deadlock when tcp_setsockopt/tcp_sendmsg try to get the sock
    lock
(b) cong_queue_update does an irqsave on the rds_cong_lock, and this
    will trigger warnings (for a good reason) from functions called
    out of sock_lock.

This patch reverts the change introduced by
2fa57129d ("RDS: Bypass workqueue when queueing cong updates").

The patch has been verified for both RDS/TCP as well as RDS/RDMA
to ensure that there are not regressions for either transport:
 - for verification of  RDS/TCP a client-server unit-test was used,
   with the server blocked in gdb and thus unable to drain its rcvbuf,
   eventually triggering a RDS congestion update.
 - for RDS/RDMA, the standard IB regression tests were used

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 14:35:44 -08:00
Sowmini Varadhan d0a47d3272 rds: Make rds_message_copy_from_user() return 0 on success.
Commit 083735f4b0 ("rds: switch rds_message_copy_from_user() to iov_iter")
breaks rds_message_copy_from_user() semantics on success, and causes it
to return nbytes copied, when it should return 0.  This commit fixes that bug.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:41:56 -08:00
Rasmus Villemoes 11ac11999b net: rds: Remove repeated function names from debug output
The macro rdsdebug is defined as

  pr_debug("%s(): " fmt, __func__ , ##args)

Hence it doesn't make sense to include the name of the calling
function explicitly in the format string passed to rdsdebug.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:41:56 -08:00
Sasha Levin db27ebb111 net: rds: use correct size for max unacked packets and bytes
Max unacked packets/bytes is an int while sizeof(long) was used in the
sysctl table.

This means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:07:27 -08:00
Geert Uytterhoeven 6ff4a8ad4b rds: Fix min() warning in rds_message_inc_copy_to_user()
net/rds/message.c: In function ‘rds_message_inc_copy_to_user’:
net/rds/message.c:328: warning: comparison of distinct pointer types lacks a cast

Use min_t(unsigned long, ...) like is done in
rds_message_copy_from_user().

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-15 11:49:09 -05:00
Gu Zheng f95b414edb net: introduce helper macro for_each_cmsghdr
Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating
cmsghdr from msghdr, just cleanup.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 22:41:55 -05:00
Al Viro c0371da604 put iov_iter into msghdr
Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-09 16:29:03 -05:00
Al Viro 083735f4b0 rds: switch rds_message_copy_from_user() to iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:43 -05:00
Al Viro c310e72c89 rds: switch ->inc_copy_to_user() to passing iov_iter
instances get considerably simpler from that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24 05:16:43 -05:00
Linus Torvalds 2e923b0251 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Include fixes for netrom and dsa (Fabian Frederick and Florian
    Fainelli)

 2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO.

 3) Several SKB use after free fixes (vxlan, openvswitch, vxlan,
    ip_tunnel, fou), from Li ROngQing.

 4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy.

 5) Use after free in virtio_net, from Michael S Tsirkin.

 6) Fix flow mask handling for megaflows in openvswitch, from Pravin B
    Shelar.

 7) ISDN gigaset and capi bug fixes from Tilman Schmidt.

 8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin.

 9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov.

10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong
    Wang and Eric Dumazet.

11) Don't overwrite end of SKB when parsing malformed sctp ASCONF
    chunks, from Daniel Borkmann.

12) Don't call sock_kfree_s() with NULL pointers, this function also has
    the side effect of adjusting the socket memory usage.  From Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
  bna: fix skb->truesize underestimation
  net: dsa: add includes for ethtool and phy_fixed definitions
  openvswitch: Set flow-key members.
  netrom: use linux/uaccess.h
  dsa: Fix conversion from host device to mii bus
  tipc: fix bug in bundled buffer reception
  ipv6: introduce tcp_v6_iif()
  sfc: add support for skb->xmit_more
  r8152: return -EBUSY for runtime suspend
  ipv4: fix a potential use after free in fou.c
  ipv4: fix a potential use after free in ip_tunnel_core.c
  hyperv: Add handling of IP header with option field in netvsc_set_hash()
  openvswitch: Create right mask with disabled megaflows
  vxlan: fix a free after use
  openvswitch: fix a use after free
  ipv4: dst_entry leak in ip_send_unicast_reply()
  ipv4: clean up cookie_v4_check()
  ipv4: share tcp_v4_save_options() with cookie_v4_check()
  ipv4: call __ip_options_echo() in cookie_v4_check()
  atm: simplify lanai.c by using module_pci_driver
  ...
2014-10-18 09:31:37 -07:00
Linus Torvalds 0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00
Cong Wang dee49f203a rds: avoid calling sock_kfree_s() on allocation failure
It is okay to free a NULL pointer but not okay to mischarge the socket optmem
accounting. Compile test only.

Reported-by: rucsoftsec@gmail.com
Cc: Chien Yen <chien.yen@oracle.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 17:00:19 -04:00
Linus Torvalds 35a9ad8af0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Most notable changes in here:

   1) By far the biggest accomplishment, thanks to a large range of
      contributors, is the addition of multi-send for transmit.  This is
      the result of discussions back in Chicago, and the hard work of
      several individuals.

      Now, when the ->ndo_start_xmit() method of a driver sees
      skb->xmit_more as true, it can choose to defer the doorbell
      telling the driver to start processing the new TX queue entires.

      skb->xmit_more means that the generic networking is guaranteed to
      call the driver immediately with another SKB to send.

      There is logic added to the qdisc layer to dequeue multiple
      packets at a time, and the handling mis-predicted offloads in
      software is now done with no locks held.

      Finally, pktgen is extended to have a "burst" parameter that can
      be used to test a multi-send implementation.

      Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
      virtio_net

      Adding support is almost trivial, so export more drivers to
      support this optimization soon.

      I want to thank, in no particular or implied order, Jesper
      Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
      Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
      David Tat, Hannes Frederic Sowa, and Rusty Russell.

   2) PTP and timestamping support in bnx2x, from Michal Kalderon.

   3) Allow adjusting the rx_copybreak threshold for a driver via
      ethtool, and add rx_copybreak support to enic driver.  From
      Govindarajulu Varadarajan.

   4) Significant enhancements to the generic PHY layer and the bcm7xxx
      driver in particular (EEE support, auto power down, etc.) from
      Florian Fainelli.

   5) Allow raw buffers to be used for flow dissection, allowing drivers
      to determine the optimal "linear pull" size for devices that DMA
      into pools of pages.  The objective is to get exactly the
      necessary amount of headers into the linear SKB area pre-pulled,
      but no more.  The new interface drivers use is eth_get_headlen().
      From WANG Cong, with driver conversions (several had their own
      by-hand duplicated implementations) by Alexander Duyck and Eric
      Dumazet.

   6) Support checksumming more smoothly and efficiently for
      encapsulations, and add "foo over UDP" facility.  From Tom
      Herbert.

   7) Add Broadcom SF2 switch driver to DSA layer, from Florian
      Fainelli.

   8) eBPF now can load programs via a system call and has an extensive
      testsuite.  Alexei Starovoitov and Daniel Borkmann.

   9) Major overhaul of the packet scheduler to use RCU in several major
      areas such as the classifiers and rate estimators.  From John
      Fastabend.

  10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
      Duyck.

  11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
      Dumazet.

  12) Add Datacenter TCP congestion control algorithm support, From
      Florian Westphal.

  13) Reorganize sk_buff so that __copy_skb_header() is significantly
      faster.  From Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169:add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
  ...
2014-10-08 21:40:54 -04:00
Herton R. Krzesinski 593cbb3ec6 net/rds: fix possible double free on sock tear down
I got a report of a double free happening at RDS slab cache. One
suspicion was that may be somewhere we were doing a sock_hold/sock_put
on an already freed sock. Thus after providing a kernel with the
following change:

 static inline void sock_hold(struct sock *sk)
 {
-       atomic_inc(&sk->sk_refcnt);
+       if (!atomic_inc_not_zero(&sk->sk_refcnt))
+               WARN(1, "Trying to hold sock already gone: %p (family: %hd)\n",
+                       sk, sk->sk_family);
 }

The warning successfuly triggered:

Trying to hold sock already gone: ffff81f6dda61280 (family: 21)
WARNING: at include/net/sock.h:350 sock_hold()
Call Trace:
<IRQ>  [<ffffffff8adac135>] :rds:rds_send_remove_from_sock+0xf0/0x21b
[<ffffffff8adad35c>] :rds:rds_send_drop_acked+0xbf/0xcf
[<ffffffff8addf546>] :rds_rdma:rds_ib_recv_tasklet_fn+0x256/0x2dc
[<ffffffff8009899a>] tasklet_action+0x8f/0x12b
[<ffffffff800125a2>] __do_softirq+0x89/0x133
[<ffffffff8005f30c>] call_softirq+0x1c/0x28
[<ffffffff8006e644>] do_softirq+0x2c/0x7d
[<ffffffff8006e4d4>] do_IRQ+0xee/0xf7
[<ffffffff8005e625>] ret_from_intr+0x0/0xa
<EOI>

Looking at the call chain above, the only way I think this would be
possible is if somewhere we already released the same socket->sock which
is assigned to the rds_message at rds_send_remove_from_sock. Which seems
only possible to happen after the tear down done on rds_release.

rds_release properly calls rds_send_drop_to to drop the socket from any
rds_message, and some proper synchronization is in place to avoid race
with rds_send_drop_acked/rds_send_remove_from_sock. However, I still see
a very narrow window where it may be possible we touch a sock already
released: when rds_release races with rds_send_drop_acked, we check
RDS_MSG_ON_CONN to avoid cleanup on the same rds_message, but in this
specific case we don't clear rm->m_rs. In this case, it seems we could
then go on at rds_send_drop_to and after it returns, the sock is freed
by last sock_put on rds_release, with concurrently we being at
rds_send_remove_from_sock; then at some point in the loop at
rds_send_remove_from_sock we process an rds_message which didn't have
rm->m_rs unset for a freed sock, and a possible sock_hold on an sock
already gone at rds_release happens.

This hopefully address the described condition above and avoids a double
free on "second last" sock_put. In addition, I removed the comment about
socket destruction on top of rds_send_drop_acked: we call rds_send_drop_to
in rds_release and we should have things properly serialized there, thus
I can't see the comment being accurate there.

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:52:00 -07:00
Herton R. Krzesinski eb74cc97b8 net/rds: do proper house keeping if connection fails in rds_tcp_conn_connect
I see two problems if we consider the sock->ops->connect attempt to fail in
rds_tcp_conn_connect. The first issue is that for example we don't remove the
previously added rds_tcp_connection item to rds_tcp_tc_list at
rds_tcp_set_callbacks, which means that on a next reconnect attempt for the
same rds_connection, when rds_tcp_conn_connect is called we can again call
rds_tcp_set_callbacks, resulting in duplicated items on rds_tcp_tc_list,
leading to list corruption: to avoid this just make sure we call
properly rds_tcp_restore_callbacks before we exit. The second issue
is that we should also release the sock properly, by setting sock = NULL
only if we are returning without error.

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:51:59 -07:00
Herton R. Krzesinski 310886dd5f net/rds: call rds_conn_drop instead of open code it at rds_connect_complete
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:51:59 -07:00
Jesper Dangaard Brouer d7cdb96808 treewide: fix synchronize_rcu() in comments
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-08-28 15:01:24 +02:00
Christoph Lameter 903ceff7ca net: Replace get_cpu_var through this_cpu_ptr
Replace uses of get_cpu_var for address calculation through this_cpu_ptr.

Cc: netdev@vger.kernel.org
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:47 -04:00
Linus Torvalds f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Himangi Saraogi 01728371dc rds/tcp_listen: Replace comma with semicolon
This patch replaces a comma between expression statements by a semicolon.

A simplified version of the semantic patch that performs this
transformation is as follows:

// <smpl>
@r@
expression e1,e2,e;
type T;
identifier i;
@@

 e1
-,
+;
 e2;
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:48:58 -07:00
Himangi Saraogi cc2afe9fe2 RDS/RDMA: Replace comma with semicolon
This patch replaces a comma between expression statements by a semicolon.

A simplified version of the semantic patch that performs this
transformation is as follows:

// <smpl>
@r@
expression e1,e2,e;
type T;
identifier i;
@@

 e1
-,
+;
 e2;
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:48:58 -07:00
Manuel Schölling 71fd762f2e net: rds: Use time_after() for time comparison
To be future-proof and for better readability the time comparisons are modified
to use time_after() instead of raw math.

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 21:24:52 -04:00
wangweidong be7faf7168 rds: remove the unneed NULL checking
unregister_net_sysctl_table will check the ctl_table_header,
so remove the unneed checking

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-09 15:59:45 -04:00
Peter Zijlstra 4e857c58ef arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 14:20:48 +02:00
David S. Miller 676d23690f net: Fix use after free by removing length arg from sk_data_ready callbacks.
Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->s_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-11 16:15:36 -04:00
Sasha Levin bf39b4247b rds: prevent dereference of a NULL device in rds_iw_laddr_check
Binding might result in a NULL device which is later dereferenced
without checking.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:25:52 -04:00
Steffen Hurrle 342dfc306f net: add build-time checks for msg->msg_name size
This is a follow-up patch to f3d3342602 ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.

Signed-off-by: Steffen Hurrle <steffen@hurrle.net>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 23:04:16 -08:00
David S. Miller 4180442058 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 00:55:41 -08:00
Gerald Schaefer c196403b79 net: rds: fix per-cpu helper usage
commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu
handling for rds. chpfirst is the result of __this_cpu_read(), so it is
an absolute pointer and not __percpu. Therefore, __this_cpu_write()
should not operate on chpfirst, but rather on cache->percpu->first, just
like __this_cpu_read() did before.

Cc: <stable@vger.kernel.org> # 3.8+
Signed-off-byd Gerald Schaefer <gerald.schaefer@de.ibm.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 17:52:22 -08:00
Aruna-Hewapathirane 63862b5bef net: replace macros net_random and net_srandom with direct calls to prandom
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:15:25 -08:00
Sasha Levin c2349758ac rds: prevent dereference of a NULL device
Binding might result in a NULL device, which is dereferenced
causing this BUG:

[ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 000000000000097
4
[ 1317.261847] IP: [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
[ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0
[ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1317.264179] Dumping ftrace buffer:
[ 1317.264774]    (ftrace buffer empty)
[ 1317.265220] Modules linked in:
[ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G        W    3.13.0-rc4-
next-20131218-sasha-00013-g2cebb9b-dirty #4159
[ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000
[ 1317.268399] RIP: 0010:[<ffffffff84225f52>]  [<ffffffff84225f52>] rds_ib_laddr_check+
0x82/0x110
[ 1317.269670] RSP: 0000:ffff8803cd31bdf8  EFLAGS: 00010246
[ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000
[ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 0000000000000286
[ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000
[ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031
[ 1317.270230] FS:  00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:000000000000
0000
[ 1317.270230] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0
[ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
[ 1317.270230] Stack:
[ 1317.270230]  0000000054086700 5408670000a25de0 5408670000000002 0000000000000000
[ 1317.270230]  ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160
[ 1317.270230]  ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280
[ 1317.270230] Call Trace:
[ 1317.270230]  [<ffffffff84223542>] ? rds_trans_get_preferred+0x42/0xa0
[ 1317.270230]  [<ffffffff84223556>] rds_trans_get_preferred+0x56/0xa0
[ 1317.270230]  [<ffffffff8421c9c3>] rds_bind+0x73/0xf0
[ 1317.270230]  [<ffffffff83e4ce62>] SYSC_bind+0x92/0xf0
[ 1317.270230]  [<ffffffff812493f8>] ? context_tracking_user_exit+0xb8/0x1d0
[ 1317.270230]  [<ffffffff8119313d>] ? trace_hardirqs_on+0xd/0x10
[ 1317.270230]  [<ffffffff8107a852>] ? syscall_trace_enter+0x32/0x290
[ 1317.270230]  [<ffffffff83e4cece>] SyS_bind+0xe/0x10
[ 1317.270230]  [<ffffffff843a6ad0>] tracesys+0xdd/0xe2
[ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00
89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 <80> b8 74 09 00 00 01 7
4 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02
[ 1317.270230] RIP  [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
[ 1317.270230]  RSP <ffff8803cd31bdf8>
[ 1317.270230] CR2: 0000000000000974

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-27 12:33:58 -05:00
Venkat Venkatsubra 18fc25c94e rds: prevent BUG_ON triggered on congestion update to loopback
After congestion update on a local connection, when rds_ib_xmit returns
less bytes than that are there in the message, rds_send_xmit calls
back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
to trigger.

For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
the message contains 8240 bytes. rds_send_xmit thinks there is more to send
and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
=4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].

The commit 6094628bfd
"rds: prevent BUG_ON triggering on congestion map updates" introduced
this regression. That change was addressing the triggering of a different
BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
 	BUG_ON(ret != 0 &&
    		 conn->c_xmit_sg == rm->data.op_nents);
This was the sequence it was going through:
(rds_ib_xmit)
/* Do not send cong updates to IB loopback */
if (conn->c_loopback
   && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
  	rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
    	return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
}
rds_ib_xmit returns 8240
rds_send_xmit:
  c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
   		 = 8192
  c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
rds_ib_xmit returns 8240
rds_send_xmit:
  c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
  and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
rds_ib_xmit returns 8240
On this iteration this sequence causes the BUG_ON in rds_send_xmit:
    while (ret) {
    	tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
    	[tmp = 65536 - 57632 = 7904]
    	conn->c_xmit_data_off += tmp;
    	[c_xmit_data_off = 57632 + 7904 = 65536]
    	ret -= tmp;
    	[ret = 8240 - 7904 = 336]
    	if (conn->c_xmit_data_off == sg->length) {
    		conn->c_xmit_data_off = 0;
    		sg++;
    		conn->c_xmit_sg++;
    		BUG_ON(ret != 0 &&
    			conn->c_xmit_sg == rm->data.op_nents);
    		[c_xmit_sg = 1, rm->data.op_nents = 1]

What the current fix does:
Since the congestion update over loopback is not actually transmitted
as a message, all that rds_ib_xmit needs to do is let the caller think
the full message has been transmitted and not return partial bytes.
It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
And 64Kb+48 when page size is 64Kb.

Reported-by: Josh Hunt <joshhunt00@gmail.com>
Tested-by: Honggang Li <honli@redhat.com>
Acked-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-03 11:54:18 -05:00
Hannes Frederic Sowa f3d3342602 net: rework recvmsg handler msg_name and msg_namelen logic
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.

This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.

Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.

Also document these changes in include/linux/net.h as suggested by David
Miller.

Changes since RFC:

Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.

With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
	msg->msg_name = NULL
".

This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.

Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.

Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-20 21:52:30 -05:00
Hannes Frederic Sowa 1bbdceef1e inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once
Initialize the ehash and ipv6_hash_secrets with net_get_random_once.

Each compilation unit gets its own secret now:
  ipv4/inet_hashtables.o
  ipv4/udp.o
  ipv6/inet6_hashtables.o
  ipv6/udp.o
  rds/connection.o

The functions still get inlined into the hashing functions. In the fast
path we have at most two (needed in ipv6) if (unlikely(...)).

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Hannes Frederic Sowa 65cd8033ff ipv4: split inet_ehashfn to hash functions per compilation unit
This duplicates a bit of code but let's us easily introduce
separate secret keys later. The separate compilation units are
ipv4/inet_hashtabbles.o, ipv4/udp.o and rds/connection.o.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:34 -04:00
Joe Perches c1b1203d65 net: misc: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:12:11 -04:00
Joe Perches fe2c6338fd net: Convert uses of typedef ctl_table to struct ctl_table
Reduce the uses of this unnecessary typedef.

Done via perl script:

$ git grep --name-only -w ctl_table net | \
  xargs perl -p -i -e '\
	sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
        s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

Reflow the modified lines that now exceed 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 02:36:09 -07:00
Chen Gang 2e85d67690 net/rds: zero last byte for strncpy
for NUL terminated string, need be always sure '\0' in the end.

additional info:
  strncpy will pads with zeroes to the end of the given buffer.
  should initialise every bit of memory that is going to be copied to userland

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 00:35:44 -05:00
Linus Torvalds 9da060d0ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A moderately sized pile of fixes, some specifically for merge window
  introduced regressions although others are for longer standing items
  and have been queued up for -stable.

  I'm kind of tired of all the RDS protocol bugs over the years, to be
  honest, it's way out of proportion to the number of people who
  actually use it.

   1) Fix missing range initialization in netfilter IPSET, from Jozsef
      Kadlecsik.

   2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes
      Berg.

   3) Fix DMA syncing in SFC driver, from Ben Hutchings.

   4) Fix regression in BOND device MAC address setting, from Jiri
      Pirko.

   5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko.

   6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips,
      fix from Dmitry Kravkov.

   7) Missing cfgspace_lock initialization in BCMA driver.

   8) Validate parameter size for SCTP assoc stats getsockopt(), from
      Guenter Roeck.

   9) Fix SCTP association hangs, from Lee A Roberts.

  10) Fix jumbo frame handling in r8169, from Francois Romieu.

  11) Fix phy_device memory leak, from Petr Malat.

  12) Omit trailing FCS from frames received in BGMAC driver, from Hauke
      Mehrtens.

  13) Missing socket refcount release in L2TP, from Guillaume Nault.

  14) sctp_endpoint_init should respect passed in gfp_t, rather than use
      GFP_KERNEL unconditionally.  From Dan Carpenter.

  15) Add AISX AX88179 USB driver, from Freddy Xin.

  16) Remove MAINTAINERS entries for drivers deleted during the merge
      window, from Cesar Eduardo Barros.

  17) RDS protocol can try to allocate huge amounts of memory, check
      that the user's request length makes sense, from Cong Wang.

  18) SCTP should use the provided KMALLOC_MAX_SIZE instead of it's own,
      bogus, definition.  From Cong Wang.

  19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll,
      from Frank Li.  Also, fix a build error introduced in the merge
      window.

  20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti.

  21) Don't double count RTT measurements when we leave the TCP receive
      fast path, from Neal Cardwell."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  tcp: fix double-counted receiver RTT when leaving receiver fast path
  CAIF: fix sparse warning for caif_usb
  rds: simplify a warning message
  net: fec: fix build error in no MXC platform
  net: ipv6: Don't purge default router if accept_ra=2
  net: fec: put tx to napi poll function to fix dead lock
  sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
  rds: limit the size allocated by rds_message_alloc()
  MAINTAINERS: remove eexpress
  MAINTAINERS: remove drivers/net/wan/cycx*
  MAINTAINERS: remove 3c505
  caif_dev: fix sparse warnings for caif_flow_cb
  ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver
  sctp: use the passed in gfp flags instead GFP_KERNEL
  ipv[4|6]: correct dropwatch false positive in local_deliver_finish
  l2tp: Restore socket refcount when sendmsg succeeds
  net/phy: micrel: Disable asymmetric pause for KSZ9021
  bgmac: omit the fcs
  phy: Fix phy_device_free memory leak
  bnx2x: Fix KR2 work-around condition
  ...
2013-03-05 18:42:29 -08:00
Cong Wang 7dac1b514a rds: simplify a warning message
Cc: David S. Miller <davem@davemloft.net>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:07 -05:00
Cong Wang ece6b0a2b2 rds: limit the size allocated by rds_message_alloc()
Dave Jones reported the following bug:

"When fed mangled socket data, rds will trust what userspace gives it,
and tries to allocate enormous amounts of memory larger than what
kmalloc can satisfy."

WARNING: at mm/page_alloc.c:2393 __alloc_pages_nodemask+0xa0d/0xbe0()
Hardware name: GA-MA78GM-S2H
Modules linked in: vmw_vsock_vmci_transport vmw_vmci vsock fuse bnep dlci bridge 8021q garp stp mrp binfmt_misc l2tp_ppp l2tp_core rfcomm s
Pid: 24652, comm: trinity-child2 Not tainted 3.8.0+ #65
Call Trace:
 [<ffffffff81044155>] warn_slowpath_common+0x75/0xa0
 [<ffffffff8104419a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff811444ad>] __alloc_pages_nodemask+0xa0d/0xbe0
 [<ffffffff8100a196>] ? native_sched_clock+0x26/0x90
 [<ffffffff810b2128>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff810b21cd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff811861f8>] alloc_pages_current+0xb8/0x180
 [<ffffffff8113eaaa>] __get_free_pages+0x2a/0x80
 [<ffffffff811934fe>] kmalloc_order_trace+0x3e/0x1a0
 [<ffffffff81193955>] __kmalloc+0x2f5/0x3a0
 [<ffffffff8104df0c>] ? local_bh_enable_ip+0x7c/0xf0
 [<ffffffffa0401ab3>] rds_message_alloc+0x23/0xb0 [rds]
 [<ffffffffa04043a1>] rds_sendmsg+0x2b1/0x990 [rds]
 [<ffffffff810b21cd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff81564620>] sock_sendmsg+0xb0/0xe0
 [<ffffffff810b2052>] ? get_lock_stats+0x22/0x70
 [<ffffffff810b24be>] ? put_lock_stats.isra.23+0xe/0x40
 [<ffffffff81567f30>] sys_sendto+0x130/0x180
 [<ffffffff810b872d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff816c547b>] ? _raw_spin_unlock_irq+0x3b/0x60
 [<ffffffff816cd767>] ? sysret_check+0x1b/0x56
 [<ffffffff810b8695>] ? trace_hardirqs_on_caller+0x115/0x1a0
 [<ffffffff81341d8e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816cd742>] system_call_fastpath+0x16/0x1b
---[ end trace eed6ae990d018c8b ]---

Reported-by: Dave Jones <davej@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:06 -05:00
Sasha Levin b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Kees Cook e34430eeca net/rds: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
2013-01-11 11:40:02 -08:00
Marciniszyn, Mike a49675988c IB/rds: suppress incompatible protocol when version is known
Add an else to only print the incompatible protocol message
when version hasn't been established.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26 15:17:37 -08:00
Marciniszyn, Mike f2e9bd7032 IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len
0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs")
added uses of sg_dma_len() and sg_dma_address(). This makes
RDS DOA with the qib driver.

IB ulps should use ib_sg_dma_len() and ib_sg_dma_address
respectively since some HCAs overload ib_sg_dma* operations.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26 15:17:37 -08:00
Shan Wei ae4b46e9d7 net: rds: use this_cpu_* per-cpu helper
Signed-off-by: Shan Wei <davidshan@tencent.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 18:59:44 -05:00
jeff.liu 5175a5e76b RDS: fix rds-ping spinlock recursion
This is the revised patch for fixing rds-ping spinlock recursion
according to Venkat's suggestions.

RDS ping/pong over TCP feature has been broken for years(2.6.39 to
3.6.0) since we have to set TCP cork and call kernel_sendmsg() between
ping/pong which both need to lock "struct sock *sk". However, this
lock has already been hold before rds_tcp_data_ready() callback is
triggerred. As a result, we always facing spinlock resursion which
would resulting in system panic.

Given that RDS ping is only used to test the connectivity and not for
serious performance measurements, we can queue the pong transmit to
rds_wq as a delayed response.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
CC: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
CC: David S. Miller <davem@davemloft.net>
CC: James Morris <james.l.morris@oracle.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-09 13:57:23 -04:00
Ying Xue bfdc587c5a rds: Don't disable BH on BH context
Since we have already in BH context when *_write_space(),
*_data_ready() as well as *_state_change() are called, it's
unnecessary to disable BH.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:52:04 -07:00
Weiping Pan 06b6a1cf6e rds: set correct msg_namelen
Jay Fenlason (fenlason@redhat.com) found a bug,
that recvfrom() on an RDS socket can return the contents of random kernel
memory to userspace if it was called with a address length larger than
sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len paramater properly before
returning, but that's just a bug.
There are also a number of cases wher recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.

And I write two test programs to reproduce this bug, you will see that in
rds_server, fromAddr will be overwritten and the following sock_fd will be
destroyed.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of address to user space in
such case.

How to run the test programs ?
I test them on 32bit x86 system, 3.5.0-rc7.

1 compile
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c

2 run ./rds_server on one console

3 run ./rds_client on another console

4 you will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor

/***************** rds_client.c ********************/

int main(void)
{
	int sock_fd;
	struct sockaddr_in serverAddr;
	struct sockaddr_in toAddr;
	char recvBuffer[128] = "data from client";
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if (sock_fd < 0) {
		perror("create socket error\n");
		exit(1);
	}

	memset(&serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4001);

	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
		perror("bind() error\n");
		close(sock_fd);
		exit(1);
	}

	memset(&toAddr, 0, sizeof(toAddr));
	toAddr.sin_family = AF_INET;
	toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	toAddr.sin_port = htons(4000);
	msg.msg_name = &toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	if (sendmsg(sock_fd, &msg, 0) == -1) {
		perror("sendto() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("client send data:%s\n", recvBuffer);

	memset(recvBuffer, '\0', 128);

	msg.msg_name = &toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;
	if (recvmsg(sock_fd, &msg, 0) == -1) {
		perror("recvmsg() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("receive data from server:%s\n", recvBuffer);

	close(sock_fd);

	return 0;
}

/***************** rds_server.c ********************/

int main(void)
{
	struct sockaddr_in fromAddr;
	int sock_fd;
	struct sockaddr_in serverAddr;
	unsigned int addrLen;
	char recvBuffer[128];
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if(sock_fd < 0) {
		perror("create socket error\n");
		exit(0);
	}

	memset(&serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4000);
	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
		perror("bind error\n");
		close(sock_fd);
		exit(1);
	}

	printf("server is waiting to receive data...\n");
	msg.msg_name = &fromAddr;

	/*
	 * I add 16 to sizeof(fromAddr), ie 32,
	 * and pay attention to the definition of fromAddr,
	 * recvmsg() will overwrite sock_fd,
	 * since kernel will copy 32 bytes to userspace.
	 *
	 * If you just use sizeof(fromAddr), it works fine.
	 * */
	msg.msg_namelen = sizeof(fromAddr) + 16;
	/* msg.msg_namelen = sizeof(fromAddr); */
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	while (1) {
		printf("old socket fd=%d\n", sock_fd);
		if (recvmsg(sock_fd, &msg, 0) == -1) {
			perror("recvmsg() error\n");
			close(sock_fd);
			exit(1);
		}
		printf("server received data from client:%s\n", recvBuffer);
		printf("msg.msg_namelen=%d\n", msg.msg_namelen);
		printf("new socket fd=%d\n", sock_fd);
		strcat(recvBuffer, "--data from server");
		if (sendmsg(sock_fd, &msg, 0) == -1) {
			perror("sendmsg()\n");
			close(sock_fd);
			exit(1);
		}
	}

	close(sock_fd);
	return 0;
}

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 01:01:44 -07:00
Ben Hutchings 2c53040f01 net: Fix (nearly-)kernel-doc comments for various functions
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:13:45 -07:00
Thadeu Lima de Souza Cascardo a0c6ffbcfe rds_rdma: don't assume infiniband device is PCI
RDS code assumes that the struct ib_device dma_device member, which is a
pointer, points to a struct device embedded in a struct pci_dev.

This is not the case for ehca, for example, which is a OF driver, and
makes dma_device point to a struct device embedded in a struct
platform_device.

This will make the system crash when rds_rdma is loaded in a system
with ehca, since it will try to access the bus member of a non-existent
struct pci_dev.

The only reason rds_rdma uses the struct pci_dev is to get the NUMA node
the device is attached to. Using dev_to_node for that is much better,
since it won't assume which bus the infiniband is attached to.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: dledford@redhat.com
Cc: Jes.Sorensen@redhat.com
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-29 17:30:07 -04:00
Pavel Emelyanov 4a17fd5229 sock: Introduce named constants for sk_reuse
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Eric W. Biederman ec8f23ce0f net: Convert all sysctl registrations to register_net_sysctl
This results in code with less boiler plate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman 5dd3df105b net: Move all of the network sysctls without a namespace into init_net.
This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:17 -04:00
Dan Carpenter f0229eaaf3 RDS: use gfp flags from caller in conn_alloc()
We should be using the gfp flags the caller specified here, instead of
GFP_KERNEL.  I think this might be a bugfix, depending on the value of
"sock->sk->sk_allocation" when we call rds_conn_create_outgoing() in
rds_sendmsg().  Otherwise, it's just a cleanup.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22 19:29:58 -04:00
Linus Torvalds 9f3938346a Merge branch 'kmap_atomic' of git://github.com/congwang/linux
Pull kmap_atomic cleanup from Cong Wang.

It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().

Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.

* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
  feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
  highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
  drbd: remove the second argument of k[un]map_atomic()
  zcache: remove the second argument of k[un]map_atomic()
  gma500: remove the second argument of k[un]map_atomic()
  dm: remove the second argument of k[un]map_atomic()
  tomoyo: remove the second argument of k[un]map_atomic()
  sunrpc: remove the second argument of k[un]map_atomic()
  rds: remove the second argument of k[un]map_atomic()
  net: remove the second argument of k[un]map_atomic()
  mm: remove the second argument of k[un]map_atomic()
  lib: remove the second argument of k[un]map_atomic()
  power: remove the second argument of k[un]map_atomic()
  kdb: remove the second argument of k[un]map_atomic()
  udf: remove the second argument of k[un]map_atomic()
  ubifs: remove the second argument of k[un]map_atomic()
  squashfs: remove the second argument of k[un]map_atomic()
  reiserfs: remove the second argument of k[un]map_atomic()
  ocfs2: remove the second argument of k[un]map_atomic()
  ntfs: remove the second argument of k[un]map_atomic()
  ...
2012-03-21 09:40:26 -07:00
Linus Torvalds 69a7aebcf0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "It's indeed trivial -- mostly documentation updates and a bunch of
  typo fixes from Masanari.

  There are also several linux/version.h include removals from Jesper."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
  kcore: fix spelling in read_kcore() comment
  constify struct pci_dev * in obvious cases
  Revert "char: Fix typo in viotape.c"
  init: fix wording error in mm_init comment
  usb: gadget: Kconfig: fix typo for 'different'
  Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
  writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
  writeback: fix typo in the writeback_control comment
  Documentation: Fix multiple typo in Documentation
  tpm_tis: fix tis_lock with respect to RCU
  Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
  Doc: Update numastat.txt
  qla4xxx: Add missing spaces to error messages
  compiler.h: Fix typo
  security: struct security_operations kerneldoc fix
  Documentation: broken URL in libata.tmpl
  Documentation: broken URL in filesystems.tmpl
  mtd: simplify return logic in do_map_probe()
  mm: fix comment typo of truncate_inode_pages_range
  power: bq27x00: Fix typos in comment
  ...
2012-03-20 21:12:50 -07:00
Dave Jones a6506e1486 Remove printk from rds_sendmsg
no socket layer outputs a message for this error and neither should rds.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-20 16:12:11 -04:00
Cong Wang 6114eab535 rds: remove the second argument of k[un]map_atomic()
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:28 +08:00
Masanari Iida 5fd5c44d3f rds: Fix typo in iw_recv.c and ib_recv.c
Correct spelling "inclue" to "include" in
net/rds/iw_recv.c and net/rds/ib_recv.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-02-09 23:09:54 +01:00
David S. Miller efc3dbc374 rds: Make rds_sock_lock BH rather than IRQ safe.
rds_sock_info() triggers locking warnings because we try to perform a
local_bh_enable() (via sock_i_ino()) while hardware interrupts are
disabled (via taking rds_sock_lock).

There is no reason for rds_sock_lock to be a hardware IRQ disabling
lock, none of these access paths run in hardware interrupt context.

Therefore making it a BH disabling lock is safe and sufficient to
fix this bug.

Reported-by: Kumar Sanghvi <kumaras@chelsio.com>
Reported-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-24 17:03:44 -05:00
Roland Dreier 5b7bf42e3d RDS: Remove some unused iWARP code
rds_iw_flush_goal() just returns a count, but it is only called in one
place and its return value is ignored there.  So delete all the dead code.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12 20:05:28 -08:00
Paul Bolle 77c1c7c4bd rds: drop "select LLIST"
Commit 1bc144b625 ("net, rds, Replace xlist in net/rds/xlist.h with
llist") added "select LLIST" to the RDS_RDMA Kconfig entry. But there is
no Kconfig symbol named LLIST. The select statement for that symbol is a
nop. Drop it.

lib/llist.o is builtin, so all that's needed to use the llist
functionality is to include linux/llist.h, which this commit also did.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-14 00:10:50 -05:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Joe Perches b9075fa968 treewide: use __printf not __attribute__((format(printf,...)))
Standardize the style for compiler based printf format verification.
Standardized the location of __printf too.

Done via script and a little typing.

$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
  grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
  xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'

[akpm@linux-foundation.org: revert arch bits]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:54 -07:00
Paul Gortmaker bc3b2d7fb9 net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules
These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:30 -04:00
Paul Gortmaker d9b9384215 net: add moduleparam.h for users of module_param/MODULE_PARM_DESC
These files were getting access to these two via the implicit
presence of module.h everywhere.  They aren't modules, so they
don't need the full module.h inclusion though.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:29 -04:00
Paul Gortmaker 3a9a231d97 net: Fix files explicitly needing to include module.h
With calls to modular infrastructure, these files really
needs the full module.h header.  Call it out so some of the
cleanups of implicit and unrequired includes elsewhere can be
cleaned up.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:28 -04:00
Linus Torvalds 8a9ea3237e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits)
  dp83640: free packet queues on remove
  dp83640: use proper function to free transmit time stamping packets
  ipv6: Do not use routes from locally generated RAs
  |PATCH net-next] tg3: add tx_dropped counter
  be2net: don't create multiple RX/TX rings in multi channel mode
  be2net: don't create multiple TXQs in BE2
  be2net: refactor VF setup/teardown code into be_vf_setup/clear()
  be2net: add vlan/rx-mode/flow-control config to be_setup()
  net_sched: cls_flow: use skb_header_pointer()
  ipv4: avoid useless call of the function check_peer_pmtu
  TCP: remove TCP_DEBUG
  net: Fix driver name for mdio-gpio.c
  ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
  rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
  ipv4: fix ipsec forward performance regression
  jme: fix irq storm after suspend/resume
  route: fix ICMP redirect validation
  net: hold sock reference while processing tx timestamps
  tcp: md5: add more const attributes
  Add ethtool -g support to virtio_net
  ...

Fix up conflicts in:
 - drivers/net/Kconfig:
	The split-up generated a trivial conflict with removal of a
	stale reference to Documentation/networking/net-modules.txt.
	Remove it from the new location instead.
 - fs/sysfs/dir.c:
	Fairly nasty conflicts with the sysfs rb-tree usage, conflicting
	with Eric Biederman's changes for tagged directories.
2011-10-25 13:25:22 +02:00
Linus Torvalds 59e5253417 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
  MAINTAINERS: linux-m32r is moderated for non-subscribers
  linux@lists.openrisc.net is moderated for non-subscribers
  Drop default from "DM365 codec select" choice
  parisc: Kconfig: cleanup Kernel page size default
  Kconfig: remove redundant CONFIG_ prefix on two symbols
  cris: remove arch/cris/arch-v32/lib/nand_init.S
  microblaze: add missing CONFIG_ prefixes
  h8300: drop puzzling Kconfig dependencies
  MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
  tty: drop superfluous dependency in Kconfig
  ARM: mxc: fix Kconfig typo 'i.MX51'
  Fix file references in Kconfig files
  aic7xxx: fix Kconfig references to READMEs
  Fix file references in drivers/ide/
  thinkpad_acpi: Fix printk typo 'bluestooth'
  bcmring: drop commented out line in Kconfig
  btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
  doc: raw1394: Trivial typo fix
  CIFS: Don't free volume_info->UNC until we are entirely done with it.
  treewide: Correct spelling of successfully in comments
  ...
2011-10-25 12:11:02 +02:00
David S. Miller 88c5100c28 Merge branch 'master' of github.com:davem330/net
Conflicts:
	net/batman-adv/soft-interface.c
2011-10-07 13:38:43 -04:00
Jonathan Lallinger 85a6488949 RDSRDMA: Fix cleanup of rds_iw_mr_pool
In the rds_iw_mr_pool struct the free_pinned field keeps track of
memory pinned by free MRs. While this field is incremented properly
upon allocation, it is never decremented upon unmapping. This would
cause the rds_rdma module to crash the kernel upon unloading, by
triggering the BUG_ON in the rds_iw_destroy_mr_pool function.

This change keeps track of the MRs that become unpinned, so that
free_pinned can be decremented appropriately.

Signed-off-by: Jonathan Lallinger <jonathan@ogc.us>
Signed-off-by: Steve Wise <swise@ogc.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-29 14:57:19 -04:00
Huang Ying 1bc144b625 net, rds, Replace xlist in net/rds/xlist.h with llist
The functionality of xlist and llist is almost same.  This patch
replace xlist with llist to avoid code duplication.

Known issues: don't know how to test this, need special hardware?

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Andy Grover <andy.grover@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15 15:36:32 -04:00
Joe Perches 3dbd443983 net: Convert vmalloc/memset to vzalloc
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 13:59:25 +02:00
Amerigo Wang 80f1ff97d0 notifiers: cpu: move cpu notifiers into cpu.h
We presently define all kinds of notifiers in notifier.h.  This is not
necessary at all, since different subsystems use different notifiers, they
are almost non-related with each other.

This can also save much build time.  Suppose I add a new netdevice event,
really I don't have to recompile all the source, just network related.
Without this patch, all the source will be recompiled.

I move the notify events near to their subsystem notifier registers, so
that they can be found more easily.

This patch:

It is not necessary to share the same notifier.h.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:14 -07:00
Greg Dietsche 3e878b8d54 net: rds: fix const array syntax
Correct the syntax so that both array and pointer are const.

Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-01 16:16:19 -07:00
Manuel Zerpies cb0a605649 net/rds: use prink_ratelimited() instead of printk_ratelimit()
Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited()

Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de>
Signed-off-by: David S. Miller <davem@conan.davemloft.net>
2011-06-17 00:03:03 -04:00
Alexey Dobriyan a6b7a40786 net: remove interrupt.h inclusion from netdevice.h
* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-06 22:55:11 -07:00
Sean Hefty b26f9b9949 RDMA/cma: Pass QP type into rdma_create_id()
The RDMA CM currently infers the QP type from the port space selected
by the user.  In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type.  For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Akinobu Mita e1dc1c81b9 rds: use little-endian bitops
As a preparation for removing ext2 non-atomic bit operations from
asm/bitops.h.  This converts ext2 non-atomic bit operations to
little-endian bit operations.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andy Grover <andy.grover@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:46:16 -07:00
Akinobu Mita 12ce22423a rds: stop including asm-generic/bitops/le.h directly
asm-generic/bitops/le.h is only intended to be included directly from
asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h
which implements generic ext2 or minix bit operations.

This stops including asm-generic/bitops/le.h directly and use ext2
non-atomic bit operations instead.

It seems odd to use ext2_*_bit() on rds, but it will replaced with
__{set,clear,test}_bit_le() after introducing little endian bit operations
for all architectures.  This indirect step is necessary to maintain
bisectability for some architectures which have their own little-endian
bit operations.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andy Grover <andy.grover@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:46:10 -07:00
Linus Torvalds 7a6362800c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits)
  bonding: enable netpoll without checking link status
  xfrm: Refcount destination entry on xfrm_lookup
  net: introduce rx_handler results and logic around that
  bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
  bonding: wrap slave state work
  net: get rid of multiple bond-related netdevice->priv_flags
  bonding: register slave pointer for rx_handler
  be2net: Bump up the version number
  be2net: Copyright notice change. Update to Emulex instead of ServerEngines
  e1000e: fix kconfig for crc32 dependency
  netfilter ebtables: fix xt_AUDIT to work with ebtables
  xen network backend driver
  bonding: Improve syslog message at device creation time
  bonding: Call netif_carrier_off after register_netdevice
  bonding: Incorrect TX queue offset
  net_sched: fix ip_tos2prio
  xfrm: fix __xfrm_route_forward()
  be2net: Fix UDP packet detected status in RX compl
  Phonet: fix aligned-mode pipe socket buffer header reserve
  netxen: support for GbE port settings
  ...

Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c
with the staging updates.
2011-03-16 16:29:25 -07:00
Linus Torvalds bd2895eead Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix build failure introduced by s/freezeable/freezable/
  workqueue: add system_freezeable_wq
  rds/ib: use system_wq instead of rds_ib_fmr_wq
  net/9p: replace p9_poll_task with a work
  net/9p: use system_wq instead of p9_mux_wq
  xfs: convert to alloc_workqueue()
  reiserfs: make commit_wq use the default concurrency level
  ocfs2: use system_wq instead of ocfs2_quota_wq
  ext4: convert to alloc_workqueue()
  scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path
  scsi/be2iscsi,qla2xxx: convert to alloc_workqueue()
  misc/iwmc3200top: use system_wq instead of dedicated workqueues
  i2o: use alloc_workqueue() instead of create_workqueue()
  acpi: kacpi*_wq don't need WQ_MEM_RECLAIM
  fs/aio: aio_wq isn't used in memory reclaim path
  input/tps6507x-ts: use system_wq instead of dedicated workqueue
  cpufreq: use system_wq instead of dedicated workqueues
  wireless/ipw2x00: use system_wq instead of dedicated workqueues
  arm/omap: use system_wq in mailbox
  workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
2011-03-16 08:20:19 -07:00
David S. Miller 33175d84ee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x_cmn.c
2011-03-10 14:26:00 -08:00
Neil Horman 6094628bfd rds: prevent BUG_ON triggering on congestion map updates
Recently had this bug halt reported to me:

kernel BUG at net/rds/send.c:329!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: rds sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg
ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt
dm_mod [last unloaded: scsi_wait_scan]
NIP: d000000003ca68f4 LR: d000000003ca67fc CTR: d000000003ca8770
REGS: c000000175cab980 TRAP: 0700   Not tainted  (2.6.32-118.el6.ppc64)
MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 44000022  XER: 00000000
TASK = c00000017586ec90[1896] 'krdsd' THREAD: c000000175ca8000 CPU: 0
GPR00: 0000000000000150 c000000175cabc00 d000000003cb7340 0000000000002030
GPR04: ffffffffffffffff 0000000000000030 0000000000000000 0000000000000030
GPR08: 0000000000000001 0000000000000001 c0000001756b1e30 0000000000010000
GPR12: d000000003caac90 c000000000fa2500 c0000001742b2858 c0000001742b2a00
GPR16: c0000001742b2a08 c0000001742b2820 0000000000000001 0000000000000001
GPR20: 0000000000000040 c0000001742b2814 c000000175cabc70 0800000000000000
GPR24: 0000000000000004 0200000000000000 0000000000000000 c0000001742b2860
GPR28: 0000000000000000 c0000001756b1c80 d000000003cb68e8 c0000001742b27b8
NIP [d000000003ca68f4] .rds_send_xmit+0x4c4/0x8a0 [rds]
LR [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
Call Trace:
[c000000175cabc00] [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
(unreliable)
[c000000175cabd30] [d000000003ca7e64] .rds_send_worker+0x54/0x100 [rds]
[c000000175cabdb0] [c0000000000b475c] .worker_thread+0x1dc/0x3c0
[c000000175cabed0] [c0000000000baa9c] .kthread+0xbc/0xd0
[c000000175cabf90] [c000000000032114] .kernel_thread+0x54/0x70
Instruction dump:
4bfffd50 60000000 60000000 39080001 935f004c f91f0040 41820024 813d017c
7d094a78 7d290074 7929d182 394a0020 <0b090000> 40e2ff68 4bffffa4 39200000
Kernel panic - not syncing: Fatal exception
Call Trace:
[c000000175cab560] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable)
[c000000175cab610] [c0000000005a365c] .panic+0x80/0x1b4
[c000000175cab6a0] [c00000000002fbcc] .die+0x21c/0x2a0
[c000000175cab750] [c000000000030000] ._exception+0x110/0x220
[c000000175cab910] [c000000000004b9c] program_check_common+0x11c/0x180

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-08 11:22:43 -08:00