Commit Graph

361 Commits

Author SHA1 Message Date
David S. Miller e77c8e83dd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-03-20 15:24:29 -07:00
Eric Dumazet 0641e4fbf2 net: Potential null skb->dev dereference
When doing "ifenslave -d bond0 eth0", there is chance to get NULL
dereference in netif_receive_skb(), because dev->master suddenly becomes
NULL after we tested it.

We should use ACCESS_ONCE() to avoid this (or rcu_dereference())

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 21:16:45 -07:00
Jiri Pirko 3ca5b4042e bonding: check return value of nofitier when changing type
This patch adds the possibility to refuse the bonding type change for
other subsystems (such as for example bridge, vlan, etc.)

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 20:00:02 -07:00
Tom Herbert 1e94d72fea rps: Fixed build with CONFIG_SMP not enabled.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 17:45:44 -07:00
Tom Herbert 0a9627f264 rps: Receive Packet Steering
This patch implements software receive side packet steering (RPS).  RPS
distributes the load of received packet processing across multiple CPUs.

Problem statement: Protocol processing done in the NAPI context for received
packets is serialized per device queue and becomes a bottleneck under high
packet load.  This substantially limits pps that can be achieved on a single
queue NIC and provides no scaling with multiple cores.

This solution queues packets early on in the receive path on the backlog queues
of other CPUs.   This allows protocol processing (e.g. IP and TCP) to be
performed on packets in parallel.   For each device (or each receive queue in
a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
process packets. A CPU is selected on a per packet basis by hashing contents
of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index
into the CPU mask.  The IPI mechanism is used to raise networking receive
softirqs between CPUs.  This effectively emulates in software what a multi-queue
NIC can provide, but is generic requiring no device support.

Many devices now provide a hash over the 4-tuple on a per packet basis
(e.g. the Toeplitz hash).  This patch allow drivers to set the HW reported hash
in an skb field, and that value in turn is used to index into the RPS maps.
Using the HW generated hash can avoid cache misses on the packet when
steering it to a remote CPU.

The CPU mask is set on a per device and per queue basis in the sysfs variable
/sys/class/net/<device>/queues/rx-<n>/rps_cpus.  This is a set of canonical
bit maps for receive queues in the device (numbered by <n>).  If a device
does not support multi-queue, a single variable is used for the device (rx-0).

Generally, we have found this technique increases pps capabilities of a single
queue device with good CPU utilization.  Optimal settings for the CPU mask
seem to depend on architectures and cache hierarcy.  Below are some results
running 500 instances of netperf TCP_RR test with 1 byte req. and resp.
Results show cumulative transaction rate and system CPU utilization.

e1000e on 8 core Intel
   Without RPS: 108K tps at 33% CPU
   With RPS:    311K tps at 64% CPU

forcedeth on 16 core AMD
   Without RPS: 156K tps at 15% CPU
   With RPS:    404K tps at 49% CPU

bnx2x on 16 core AMD
   Without RPS  567K tps at 61% CPU (4 HW RX queues)
   Without RPS  738K tps at 96% CPU (8 HW RX queues)
   With RPS:    854K tps at 76% CPU (4 HW RX queues)

Caveats:
- The benefits of this patch are dependent on architecture and cache hierarchy.
Tuning the masks to get best performance is probably necessary.
- This patch adds overhead in the path for processing a single packet.  In
a lightly loaded server this overhead may eliminate the advantages of
increased parallelism, and possibly cause some relative performance degradation.
We have found that masks that are cache aware (share same caches with
the interrupting CPU) mitigate much of this.
- The RPS masks can be changed dynamically, however whenever the mask is changed
this introduces the possibility of generating out of order packets.  It's
probably best not change the masks too frequently.

Signed-off-by: Tom Herbert <therbert@google.com>

 include/linux/netdevice.h |   32 ++++-
 include/linux/skbuff.h    |    3 +
 net/core/dev.c            |  335 +++++++++++++++++++++++++++++++++++++--------
 net/core/net-sysfs.c      |  225 ++++++++++++++++++++++++++++++-
 net/core/skbuff.c         |    2 +
 5 files changed, 538 insertions(+), 59 deletions(-)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:23:18 -07:00
Patrick McHardy bd38081160 dev: support deferring device flag change notifications
Split dev_change_flags() into two functions: __dev_change_flags() to
perform the actual changes and __dev_notify_flags() to invoke netdevice
notifiers. This will be used by rtnl_link to defer netlink notifications
until the device has been fully configured.

This changes ordering of some operations, in particular:

- netlink notifications are sent after all changes have been performed.
  As a side effect this surpresses one unnecessary netlink message when
  the IFF_UP and other flags are changed simultaneously.

- The NETDEV_UP/NETDEV_DOWN and NETDEV_CHANGE notifiers are invoked
  after all changes have been performed. Their relative is unchanged.

- net_dmaengine_put() is invoked before the NETDEV_DOWN notifier instead
  of afterwards. This should not make any difference since both RX and TX
  are already shut down at this point.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-27 02:43:40 -08:00
Patrick McHardy a2835763e1 rtnetlink: handle rtnl_link netlink notifications manually
In order to support specifying device flags during device creation,
we must be able to roll back device registration in case setting the
flags fails without sending any notifications related to the device
to userspace.

This patch changes rollback_registered_many() and register_netdevice()
to manually send netlink notifications for devices not handled by
rtnl_link and allows to defer notifications for devices handled by
rtnl_link until setup is complete.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-27 02:43:39 -08:00
David S. Miller ce300c7ffa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-02-27 02:05:54 -08:00
John W. Linville caf66e5811 netdevice.h: check for CONFIG_WLAN instead of CONFIG_WLAN_80211
In "wireless: remove WLAN_80211 and WLAN_PRE80211 from Kconfig" I
inadvertantly missed a line in include/linux/netdevice.h.  I thereby
effectively reverted "net: Set LL_MAX_HEADER properly for wireless." by
accident. :-(  Now we should check there for CONFIG_WLAN instead.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reported-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de>
Cc: stable@kernel.org
2010-02-26 16:59:11 -05:00
Williams, Mitch A 95c26df829 net: Add netdev ops for SR-IOV configuration
Add netdev ops for configuring SR-IOV VF devices through the PF driver.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12 16:56:08 -08:00
Joe Perches b3d95c5c93 include/linux/netdevice.h: Add netif_printk helpers
Add macros to test a private structure for msg_enable bits
and the netif_msg_##bit to test and call netdev_printk if set

Simplifies logic in callers and adds message logging consistency

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12 13:27:45 -08:00
Joe Perches 571ba42303 netdevice.h: Add netdev_printk helpers like dev_printk
These netdev_printk routines take a struct net_device * and emit
dev_printk logging messages adding "%s: " ... netdev->dev.parent
to the dev_printk format and arguments.

This can create some uniformity in the output message log.

These helpers should not be used until a successful alloc_netdev.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12 13:27:44 -08:00
Peter P Waskiewicz Jr 15682bc488 ethtool: Introduce n-tuple filter programming support
This patchset enables the ethtool layer to program n-tuple
filters to an underlying device.  The idea is to allow capable
hardware to have static rules applied that can assist steering
flows into appropriate queues.

Hardware that is known to support these types of filters today
are ixgbe and niu.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-10 20:03:05 -08:00
Jiri Pirko 6683ece36e net: use helpers to access mc list V2
This patch introduces the similar helpers as those already done for uc list.
However multicast lists are no list_head lists but "mademanually". The three
macros added by this patch will make the transition of mc_list to list_head
smooth in two steps:

1) convert all drivers to use these macros (with the original iterator of type
   "struct dev_mc_list")
2) once all drivers are converted, convert list type and iterators to "struct
   netdev_hw_addr" in one patch.

>From now on, drivers can (and should) use "netdev_for_each_mc_addr" to iterate
over the addresses with iterator of type "struct netdev_hw_addr". Also macros
"netdev_mc_count" and "netdev_mc_empty" to read list's length. This is the state
which should be reached in all drivers.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-04 10:22:25 -08:00
Arnd Bergmann 8a83a00b07 net: maintain namespace isolation between vlan and real device
In the vlan and macvlan drivers, the start_xmit function forwards
data to the dev_queue_xmit function for another device, which may
potentially belong to a different namespace.

To make sure that classification stays within a single namespace,
this resets the potentially critical fields.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-03 20:20:32 -08:00
Jiri Pirko 32e7bfc411 net: use helpers to access uc list V2
This patch introduces three macros to work with uc list from net drivers.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-25 13:36:10 -08:00
Alexey Dobriyan a271623f87 netdev: remove certain HAVE_ macros
After netdev_ops compat code HAVE_* macros aren't needed, in fact
they _will_ result in compile breakage for out of tree drivers.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-23 01:21:26 -08:00
David S. Miller 11380a4b2d net: Unexport napi_gro_flush().
Nothing outside of net/core/dev.c uses it.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-19 13:46:10 -08:00
Patrick Mullaney fc4a748966 netdevice: provide common routine for macvlan and vlan operstate management
Provide common routine for the transition of operational state for a leaf
device during a root device transition.

Signed-off-by: Patrick Mullaney <pmullaney@novell.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 15:59:22 -08:00
Eric W. Biederman dcbccbd4f1 net: Implement for_each_netdev_reverse.
I will need this shortly to implement network namespace shutdown
batching.  For sanity sake network devices should be removed in
the reverse order they were created in.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:50 -08:00
Arnd Bergmann 445409602c veth: move loopback logic to common location
The veth driver contains code to forward an skb
from the start_xmit function of one network
device into the receive path of another device.

Moving that code into a common location lets us
reuse the code for direct forwarding of data
between macvlan ports, and possibly in other
drivers.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-26 15:52:58 -08:00
Eric Dumazet e014debecd linkwatch: linkwatch_forget_dev() to speedup device dismantle
Herbert Xu a écrit :
> On Tue, Nov 17, 2009 at 04:26:04AM -0800, David Miller wrote:
>> Really, the link watch stuff is just due for a redesign.  I don't
>> think a simple hack is going to cut it this time, sorry Eric :-)
>
> I have no objections against any redesigns, but since the only
> caller of linkwatch_forget_dev runs in process context with the
> RTNL, it could also legally emit those events.

Thanks guys, here an updated version then, before linkwatch surgery ?

In this version, I force the event to be sent synchronously.

[PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle

time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

real	0m0.266s
user	0m0.000s
sys	0m0.001s

real	0m0.770s
user	0m0.000s
sys	0m0.000s

real	0m1.022s
user	0m0.000s
sys	0m0.000s

One problem of current schem in vlan dismantle phase is the
holding of device done by following chain :

vlan_dev_stop() ->
	netif_carrier_off(dev) ->
		linkwatch_fire_event(dev) ->
			dev_hold() ...

And __linkwatch_run_queue() runs up to one second later...

A generic fix to this problem is to add a linkwatch_forget_dev() method
to unlink the device from the list of watched devices.

dev->link_watch_next becomes dev->link_watch_list (and use a bit more memory),
to be able to unlink device in O(1).

After patch :
time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

real    0m0.024s
user    0m0.000s
sys     0m0.000s

real    0m0.032s
user    0m0.000s
sys     0m0.001s

real    0m0.033s
user    0m0.000s
sys     0m0.000s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 05:03:11 -08:00
Eric Dumazet d83345adf9 net: add dev_txq_stats_fold() helper
Some drivers ndo_get_stats() method need to perform txqueue stats folding.

Move folding from dev_get_stats() to a new dev_txq_stats_fold() function

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 23:51:52 -08:00
Jarek Poplawski 9a1654ba0b net: Optimize hard_start_xmit() return checking
Recent changes in the TX error propagation require additional checking
and masking of values returned from hard_start_xmit(), mainly to
separate cases where skb was consumed. This aim can be simplified by
changing the order of NETDEV_TX and NET_XMIT codes, because the latter
are treated similarly to negative (ERRNO) values.

After this change much simpler dev_xmit_complete() is also used in
sch_direct_xmit(), so it is moved to netdevice.h.

Additionally NET_RX definitions in netdevice.h are moved up from
between TX codes to avoid confusion while reading the TX comment.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-15 22:08:33 -08:00
Eric Dumazet ce81b76a39 ipv6: use RCU to walk list of network devices
No longer need read_lock(&dev_base_lock), use RCU instead.
We also can avoid taking references on inet6_dev structs.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:38:49 -08:00
Patrick McHardy 572a9d7b6f net: allow to propagate errors through ->ndo_hard_start_xmit()
Currently the ->ndo_hard_start_xmit() callbacks are only permitted to return
one of the NETDEV_TX codes. This prevents any kind of error propagation for
virtual devices, like queue congestion of the underlying device in case of
layered devices, or unreachability in case of tunnels.

This patches changes the NET_XMIT codes to avoid clashes with the NETDEV_TX
codes and changes the two callers of dev_hard_start_xmit() to expect either
errno codes, NET_XMIT codes or NETDEV_TX codes as return value.

In case of qdisc_restart(), all non NETDEV_TX codes are mapped to NETDEV_TX_OK
since no error propagation is possible when using qdiscs. In case of
dev_queue_xmit(), the error is propagated upwards.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 14:07:32 -08:00
stephen hemminger 254245d233 netdev: add netdev_continue_rcu
This adds an RCU macro for continuing search, useful for some
network devices like vlan.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 22:26:29 -08:00
Eric Dumazet d94d9fee9f net: cleanup include/linux
This cleanup patch puts struct/union/enum opening braces,
in first line to ease grep games.

struct something
{

becomes :

struct something {

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04 09:50:58 -08:00
Eric Dumazet c6d14c8456 net: Introduce for_each_netdev_rcu() iterator
Adds RCU management to the list of netdevices.

Convert some for_each_netdev() users to RCU version, if
it can avoid read_lock-ing dev_base_lock

Ie:
	read_lock(&dev_base_loack);
	for_each_netdev(net, dev)
		some_action();
	read_unlock(&dev_base_lock);

becomes :

	rcu_read_lock();
	for_each_netdev_rcu(net, dev)
		some_action();
	rcu_read_unlock();


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04 05:43:23 -08:00
Eric Dumazet 72c9528bab net: Introduce dev_get_by_name_rcu()
Some workloads hit dev_base_lock rwlock pretty hard.
We can use RCU lookups to avoid touching this rwlock
(and avoid touching netdevice refcount)

netdevices are already freed after a RCU grace period, so this patch
adds no penalty at device dismantle time.

However, it adds a synchronize_rcu() call in dev_change_name()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-01 23:55:08 -08:00
Eric W. Biederman 0c509a6c93 net: Allow devices to specify a device specific sysfs group.
This isn't beautifully abstracted, but it is simple,
simplifies uses and so far is only needed for the bonding driver.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30 12:41:18 -07:00
Ben Hutchings c7c4b3b6e9 gro: Change all receive functions to return GRO result codes
This will allow drivers to adjust their receive path dynamically
based on whether GRO is being applied successfully.

Currently all in-tree callers ignore the return values of these
functions and do not need to be changed.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29 21:36:53 -07:00
Ben Hutchings 5b252f0c2f gro: Name the GRO result enumeration type
This clarifies which return and parameter types are GRO result codes
and not RX result codes.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29 21:33:55 -07:00
Eric Dumazet fb699dfd42 net: Introduce dev_get_by_index_rcu()
Some workloads hit dev_base_lock rwlock pretty hard.
We can use RCU lookups to avoid touching this rwlock.

netdevices are already freed after a RCU grace period, so this patch
adds no penalty at device dismantle time.

dev_ifname() converted to dev_get_by_index_rcu()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29 01:42:55 -07:00
Yi Zou df5c79452f net: Add ndo_fcoe_get_wwn to net_device_ops
Add ndo_fcoe_get_wwn so Fiber Channel over Ethernet (FCoE) can make use of
the provided World Wide Port Name (WWPN) and World Wide Node Name (WWNN)
from the underlying network interface driver.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29 01:04:03 -07:00
Eric Dumazet 9b5e383c11 net: Introduce unregister_netdevice_many()
Introduce rollback_registered_many() and unregister_netdevice_many()

rollback_registered_many() is able to perform necessary steps at device dismantle
time, factorizing two expensive synchronize_net() calls.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28 02:22:06 -07:00
Eric Dumazet 44a0873d52 net: Introduce unregister_netdevice_queue()
This patchs adds an unreg_list anchor to struct net_device, and
introduces an unregister_netdevice_queue() function, able to queue
a net_device to a list instead of immediately unregister it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28 02:22:06 -07:00
David S. Miller 7fe13c5733 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-11 23:15:47 -07:00
Wolfram Sang d308e38fa5 include/linux/netdevice.h: fix nanodoc mismatch
nanodoc was missing an ndo_-prefix.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 13:53:11 -07:00
Stephen Hemminger 3295354322 dcb: data center bridging ops should be r/o
The data center bridging ops structure can be const

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:47 -07:00
Linus Torvalds f205ce83a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (66 commits)
  be2net: fix some cmds to use mccq instead of mbox
  atl1e: fix 2.6.31-git4 -- ATL1E 0000:03:00.0: DMA-API: device driver frees DMA
  pkt_sched: Fix qstats.qlen updating in dump_stats
  ipv6: Log the affected address when DAD failure occurs
  wl12xx: Fix print_mac() conversion.
  af_iucv: fix race when queueing skbs on the backlog queue
  af_iucv: do not call iucv_sock_kill() twice
  af_iucv: handle non-accepted sockets after resuming from suspend
  af_iucv: fix race in __iucv_sock_wait()
  iucv: use correct output register in iucv_query_maxconn()
  iucv: fix iucv_buffer_cpumask check when calling IUCV functions
  iucv: suspend/resume error msg for left over pathes
  wl12xx: switch to %pM to print the mac address
  b44: the poll handler b44_poll must not enable IRQ unconditionally
  ipv6: Ignore route option with ROUTER_PREF_INVALID
  bonding: make ab_arp select active slaves as other modes
  cfg80211: fix SME connect
  rc80211_minstrel: fix contention window calculation
  ssb/sdio: fix printk format warnings
  p54usb: add Zcomax XG-705A usbid
  ...
2009-09-17 20:53:52 -07:00
David Brownell a4dbd6740d driver model: constify attribute groups
Let attribute group vectors be declared "const".  We'd
like to let most attribute metadata live in read-only
sections... this is a start.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-15 09:50:47 -07:00
Moni Shoua 75c78500dd bonding: remap muticast addresses without using dev_close() and dev_open()
This patch fixes commit e36b9d16c6. The approach
there is to call dev_close()/dev_open() whenever the device type is changed in
order to remap the device IP multicast addresses to HW multicast addresses.
This approach suffers from 2 drawbacks:

*. It assumes tha the device is UP when calling dev_close(), or otherwise
   dev_close() has no affect. It is worth to mention that initscripts (Redhat)
   and sysconfig (Suse) doesn't act the same in this matter. 
*. dev_close() has other side affects, like deleting entries from the routing
   table, which might be unnecessary.

The fix here is to directly remap the IP multicast addresses to HW multicast
addresses for a bonding device that changes its type, and nothing else.
   
Reported-by:   Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-15 02:37:40 -07:00
Marcel Holtmann 384912ed19 net: Add DEVTYPE support for Ethernet based devices
The Ethernet framing is used for a lot of devices these days. Most
prominent are WiFi and WiMAX based devices. However for userspace
application it is important to classify these devices correctly and
not only see them as Ethernet devices. The daemons like HAL, DeviceKit
or even NetworkManager with udev support tries to do the classification
in userspace with a lot trickery and extra system calls. This is not
good and actually reaches its limitations. Especially since the kernel
does know the type of the Ethernet device it is pretty stupid.

To solve this problem the underlying device type needs to be set and
then the value will be exported as DEVTYPE via uevents and available
within udev.

  # cat /sys/class/net/wlan0/uevent
  DEVTYPE=wlan
  INTERFACE=wlan0
  IFINDEX=5

This is similar to subsystems like USB and SCSI that distinguish
between hosts, devices, disks, partitions etc.

The new SET_NETDEV_DEVTYPE() is a convenience helper to set the actual
device type. All device types are free form, but for convenience the
same strings as used with RFKILL are choosen.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-11 12:54:55 -07:00
Patrick McHardy af356afa01 net_sched: reintroduce dev->qdisc for use by sch_api
Currently the multiqueue integration with the qdisc API suffers from
a few problems:

- with multiple queues, all root qdiscs use the same handle. This means
  they can't be exposed to userspace in a backwards compatible fashion.

- all API operations always refer to queue number 0. Newly created
  qdiscs are automatically shared between all queues, its not possible
  to address individual queues or restore multiqueue behaviour once a
  shared qdisc has been attached.

- Dumps only contain the root qdisc of queue 0, in case of non-shared
  qdiscs this means the statistics are incomplete.

This patch reintroduces dev->qdisc, which points to the (single) root qdisc
from userspace's point of view. Currently it either points to the first
(non-shared) default qdisc, or a qdisc shared between all queues. The
following patches will introduce a classful dummy qdisc, which will be used
as root qdisc and contain the per-queue qdiscs as children.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-06 02:07:03 -07:00
Yi Zou cb45439977 net: Add ndo_fcoe_enable/ndo_fcoe_disable to net_device_ops
Add ndo_fcoe_enable/_disable to net_device_ops so the corresponding
HW can initialize itself for FCoE traffic or clean up after FCoE traffic is
done. This is expected to be called by the kernel FCoE stack upon receiving
a request for creating an FCoE instance on the corresponding netdev interface.
When implemented by the actual HW, the HW driver check the op code to perform
corresponding initialization or clean up for FCoE. The initialization normally
includes allocating extra queues for FCoE, setting corresponding HW registers
for FCoE, indicating FCoE offload features via netdev, etc. The clean-up would
include releasing the resources allocated for FCoE.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:24:20 -07:00
Stephen Hemminger dc1f8bf68b netdev: change transmit to limited range type
The transmit function should only return one of three possible values,
some drivers got confused and returned errno's or other values.
This changes the definition so that this can be caught at compile time.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:13:03 -07:00
Krishna Kumar 7b3d3e4fc6 netdevice: Consolidate to use existing macros where available.
Patch compiled and 32 simultaneous netperf testing ran fine.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-30 22:16:20 -07:00
Yi Zou bb2af4f54f net: Add NETIF_F_FCOE_MTU to indicate support for a different MTU for FCoE
Add NETIF_F_FCOE_MTU to indicate that the NIC can support a secondary MTU for
converged traffic of LAN and Fiber Channel over Ethernet (FCoE). The MTU for
FCoE is 2158 = 14 (FCoE header) + 24 (FC header) + 2112 (FC max payload) +
4 (FC CRC) + 4 (FCoE trailer).

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-14 16:12:09 -07:00
Florian Westphal 0e8635a8e1 net: remove NET_RX_BAD and NET_RX_CN* defines
almost no users in the tree; and the few that use them treat them
like NET_RX_DROP.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-05 19:15:35 -07:00
Jiri Pirko 31278e7147 net: group address list and its count
This patch is inspired by patch recently posted by Johannes Berg. Basically what
my patch does is to group list and a count of addresses into newly introduced
structure netdev_hw_addr_list. This brings us two benefits:
1) struct net_device becames a bit nicer.
2) in the future there will be a possibility to operate with lists independently
   on netdevices (with exporting right functions).
I wanted to introduce this patch before I'll post a multicast lists conversion.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 drivers/net/bnx2.c              |    4 +-
 drivers/net/e1000/e1000_main.c  |    4 +-
 drivers/net/ixgbe/ixgbe_main.c  |    6 +-
 drivers/net/mv643xx_eth.c       |    2 +-
 drivers/net/niu.c               |    4 +-
 drivers/net/virtio_net.c        |   10 ++--
 drivers/s390/net/qeth_l2_main.c |    2 +-
 include/linux/netdevice.h       |   17 +++--
 net/core/dev.c                  |  130 ++++++++++++++++++--------------------
 9 files changed, 89 insertions(+), 90 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18 00:29:08 -07:00
David S. Miller a5bd8a13e9 netdevice.h: Use frag list abstraction interfaces.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09 00:17:27 -07:00
Herbert Xu 278b2513f7 gso: Stop fraglists from escaping
As it stands skb fraglists can get past the check in dev_queue_xmit
if the skb is marked as GSO.  In particular, if the packet doesn't
have the proper checksums for GSO, but can otherwise be handled by
the underlying device, we will not perform the fraglist check on it
at all.

If the underlying device cannot handle fraglists, then this will
break.

The fix is as simple as moving the fraglist check from the device
check into skb_gso_ok.

This has caused crashes with Xen when used together with GRO which
can generate GSO packets with fraglists.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03 21:20:51 -07:00
Jiri Pirko ccffad25b5 net: convert unicast addr list
This patch converts unicast address list to standard list_head using
previously introduced struct netdev_hw_addr. It also relaxes the
locking. Original spinlock (still used for multicast addresses) is not
needed and is no longer used for a protection of this list. All
reading and writing takes place under rtnl (with no changes).

I also removed a possibility to specify the length of the address
while adding or deleting unicast address. It's always dev->addr_len.

The convertion touched especially e1000 and ixgbe codes when the
change is not so trivial.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 drivers/net/bnx2.c               |   13 +--
 drivers/net/e1000/e1000_main.c   |   24 +++--
 drivers/net/ixgbe/ixgbe_common.c |   14 ++--
 drivers/net/ixgbe/ixgbe_common.h |    4 +-
 drivers/net/ixgbe/ixgbe_main.c   |    6 +-
 drivers/net/ixgbe/ixgbe_type.h   |    4 +-
 drivers/net/macvlan.c            |   11 +-
 drivers/net/mv643xx_eth.c        |   11 +-
 drivers/net/niu.c                |    7 +-
 drivers/net/virtio_net.c         |    7 +-
 drivers/s390/net/qeth_l2_main.c  |    6 +-
 drivers/scsi/fcoe/fcoe.c         |   16 ++--
 include/linux/netdevice.h        |   18 ++--
 net/8021q/vlan.c                 |    4 +-
 net/8021q/vlan_dev.c             |   10 +-
 net/core/dev.c                   |  195 +++++++++++++++++++++++++++-----------
 net/dsa/slave.c                  |   10 +-
 net/packet/af_packet.c           |    4 +-
 18 files changed, 227 insertions(+), 137 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-29 22:12:32 -07:00
Jiri Pirko 5d4e039b2c bonding: allow bond in mode balance-alb to work properly in bridge -try4.3
[PATCH net-next] bonding: allow bond in mode balance-alb to work properly in bridge -try4.3

(updated)
changes v4.2 -> v4.3
- memcpy the address always, not just in case it differs from master->dev_addr
- compare_ether_addr_64bits() is not used so there is no direct need to make new
  header file (I think it would be good to have bond stuff in separate file
  anyway).

changes v4.1 -> v4.2
- use skb->pkt_type == PACKET_HOST compare rather then comparing skb dest addr
  against skb->dev->dev_addr

The problem is described in following bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=487763

Basically here's what's going on. In every mode, bonding interface uses the same
mac address for all enslaved devices (except fail_over_mac). Only balance-alb
will simultaneously use multiple MAC addresses across different slaves. When you
put this kind of bond device into a bridge it will only add one of mac adresses
into a hash list of mac addresses, say X. This mac address is marked as local.
But this bonding interface also has mac address Y. Now then packet arrives with
destination address Y, this address is not marked as local and the packed looks
like it needs to be forwarded. This packet is then lost which is wrong.

Notice that interfaces can be added and removed from bond while it is in bridge.

***
When the multiple addresses for bridge port approach failed to solve this issue
due to STP I started to think other way to solve this. I returned to previous
solution but tweaked one.

This patch solves the situation in the bonding without touching bridge code.
For every incoming frame to bonding the destination address is compared to
current address of the slave device from which tha packet came. If these two
match destination address is replaced by mac address of the master. This address
is known by bridge so it is delivered properly. Note that the comparsion is not
made directly, it's used skb->pkt_type == PACKET_HOST instead. This is "set"
previously in eth_type_trans().

I experimentally tried that this works as good as searching through the slave
list (v4 of this patch).

Jirka

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-29 01:51:23 -07:00
Jiri Pirko 385a154cac net: correct a comment for the final #endif
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 15:48:07 -07:00
Eric Dumazet 1ce8e7b57b net: ALIGN/PTR_ALIGN cleanup in alloc_netdev_mq()/netdev_priv()
Use ALIGN() and PTR_ALIGN() macros instead of handcoding them.

Get rid of NETDEV_ALIGN_CONST ugly define

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 15:47:06 -07:00
Herbert Xu a5b1cf288d gro: Avoid unnecessary comparison after skb_gro_header
For the overwhelming majority of cases, skb_gro_header's return
value cannot be NULL.  Yet we must check it because of its current
form.  This patch splits it up into multiple functions in order
to avoid this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:01 -07:00
Herbert Xu 7489594cb2 gro: Optimise length comparison in skb_gro_header
By caching frag0_len, we can avoid checking both frag0 and the
length separately in skb_gro_header.  This helps as skb_gro_header
is called four times per packet which amounts to a few million
times at 10Gb/s.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:01 -07:00
Herbert Xu 78d3fd0b7d gro: Only use skb_gro_header for completely non-linear packets
Currently skb_gro_header is used for packets which put the hardware
header in skb->data with the rest in frags.  Since the drivers that
need this optimisation all provide completely non-linear packets,
we can gain extra optimisations by only performing the frag0
optimisation for completely non-linear packets.

In particular, we can simply test frag0 (instead of skb_headlen)
to see whether the optimisation is in force.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:57 -07:00
Herbert Xu 78a478d0ef gro: Inline skb_gro_header and cache frag0 virtual address
The function skb_gro_header is called four times per packet which
quickly adds up at 10Gb/s.  This patch inlines it to allow better
optimisations.

Some architectures perform multiplication for page_address, which
is done by each skb_gro_header invocation.  This patch caches that
value in skb->cb to avoid the unnecessary multiplications.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:55 -07:00
Eric Dumazet 08baf56108 net: txq_trans_update() helper
We would like to get rid of netdev->trans_start = jiffies; that about all net
drivers have to use in their start_xmit() function, and use txq->trans_start
instead.

This can be done generically in core network, as suggested by David.

Some devices, (particularly loopback) dont need trans_start update, because
they dont have transmit watchdog. We could add a new device flag, or rely
on fact that txq->tran_start can be updated is txq->xmit_lock_owner is
different than -1. Use a helper function to hide our choice.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 22:58:01 -07:00
Alexander Beregalov e3804cbebb net: remove COMPAT_NET_DEV_OPS
All drivers are already converted to new net_device_ops API
and nobody uses old API anymore.

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 01:53:53 -07:00
Eric Dumazet 7004bf252c net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queue
offsetof(struct net_device, features)=0x44
offsetof(struct net_device, stats.tx_packets)=0x54
offsetof(struct net_device, stats.tx_bytes)=0x5c
offsetof(struct net_device, stats.tx_dropped)=0x6c

Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
tx path can slow down SMP operations, since they dirty a cache line
that should stay shared (dev->features is needed in rx and tx paths)

We could move away stats field in net_device but it wont help that much.
(Two cache lines dirtied in tx path, we can do one only)

Better solution is to add tx_packets/tx_bytes/tx_dropped in struct
netdev_queue because this structure is already touched in tx path and
counters updates will then be free (no increase in size)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 15:15:06 -07:00
Eric Dumazet 9d21493b4b net: tx scalability works : trans_start
struct net_device trans_start field is a hot spot on SMP and high performance
devices, particularly multi queues ones, because every transmitter dirties
it. Is main use is tx watchdog and bonding alive checks.

But as most devices dont use NETIF_F_LLTX, we have to lock
a netdev_queue before calling their ndo_start_xmit(). So it makes
sense to move trans_start from net_device to netdev_queue. Its update
will occur on a already present (and in exclusive state) cache line, for
free.

We can do this transition smoothly. An old driver continue to
update dev->trans_start, while an updated one updates txq->trans_start.

Further patches could also put tx_bytes/tx_packets counters in 
netdev_queue to avoid dirtying dev->stats (vlan device comes to mind)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 20:55:16 -07:00
David S. Miller 4d5b78c055 net: Add missing rculist.h include to netdevice.h
Otherwise list_for_each_entry_rcu() et al. aren't visible
and we get build failures in some configurations.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-06 16:52:51 -07:00
Jiri Pirko f001fde5ea net: introduce a list of device addresses dev_addr_list (v6)
v5 -> v6 (current):
-removed so far unused static functions
-corrected dev_addr_del_multiple to call del instead of add

v4 -> v5:
-added device address type (suggested by davem)
-removed refcounting (better to have simplier code then safe potentially few
 bytes)

v3 -> v4:
-changed kzalloc to kmalloc in __hw_addr_add_ii()
-ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init()

v2 -> v3:
-removed unnecessary rcu read locking
-moved dev_addr_flush() calling to ensure no null dereference of dev_addr

v1 -> v2:
-added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush
-removed unnecessary rcu_read locking in dev_addr_init
-use compare_ether_addr_64bits instead of compare_ether_addr
-use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr
-use call_rcu instead of rcu_synchronize
-moved is_etherdev_addr into __KERNEL__ ifdef

This patch introduces a new list in struct net_device and brings a set of
functions to handle the work with device address list. The list is a replacement
for the original dev_addr field and because in some situations there is need to
carry several device addresses with the net device. To be backward compatible,
dev_addr is made to point to the first member of the list so original drivers
sees no difference.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-05 12:26:24 -07:00
David S. Miller aba7453037 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/isdn/00-INDEX
	drivers/net/wireless/iwlwifi/iwl-scan.c
	drivers/net/wireless/rndis_wlan.c
	net/mac80211/main.c
2009-04-29 20:30:35 -07:00
Eric Dumazet 6a321cb370 net: netif_tx_queue_stopped too expensive
netif_tx_queue_stopped(txq) is most of the time false.

Yet its cost is very expensive on SMP.

static inline int netif_tx_queue_stopped(const struct netdev_queue *dev_queue)
{
	return test_bit(__QUEUE_STATE_XOFF, &dev_queue->state);
}

I saw this on oprofile hunting and bnx2 driver bnx2_tx_int().

We probably should split "struct netdev_queue" in two parts, one
being read mostly.

__netif_tx_lock() touches _xmit_lock & xmit_lock_owner, these
deserve a separate cache line.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-28 04:43:42 -07:00
Jesse Brandeburg 8dc92f7e2e sctp: add feature bit for SCTP offload in hardware
this is the sctp code to enable hardware crc32c offload for
adapters that support it.

Originally by: Vlad Yasevich <vladislav.yasevich@hp.com>

modified by Jesse Brandeburg <jesse.brandeburg@intel.com>

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-28 01:53:14 -07:00
Mike Rapoport 37b607c5ac net: Fix typo in net_device_ops description.
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27 05:45:54 -07:00
Herbert Xu 36e7b1b8da gro: Fix COMPLETE checksum handling
On a brand new GRO skb, we cannot call ip_hdr since the header
may lie in the non-linear area.  This patch adds the helper
skb_gro_network_header to handle this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27 05:44:45 -07:00
Herbert Xu edbd9e3030 gro: Fix handling of headers that extend over the tail
The skb_gro_* code fails to handle the case where a header starts
in the linear area but ends in the frags area.  Since the goal
of skb_gro_* is to optimise the case of completely non-linear
packets, we can simply bail out if we have anything in the linear
area.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27 05:44:29 -07:00
Adrian Bunk c759a6b4e1 net: Fix LL_MAX_HEADER for CONFIG_TR_MODULE
Unless I miss anything this should fix a bug.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-27 02:36:20 -07:00
Patrick McHardy b1b67dd45a net: factor out ethtool invocation of vlan/macvlan drivers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-21 02:00:51 -07:00
Herbert Xu 76620aafd6 gro: New frags interface to avoid copying shinfo
It turns out that copying a 16-byte area at ~800k times a second
can be really expensive :) This patch redesigns the frags GRO
interface to avoid copying that area twice.

The two disciples of the frags interface have been converted.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-16 02:02:07 -07:00
Linus Torvalds d54b3538b0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (119 commits)
  [SCSI] scsi_dh_rdac: Retry for NOT_READY check condition
  [SCSI] mpt2sas: make global symbols unique
  [SCSI] sd: Make revalidate less chatty
  [SCSI] sd: Try READ CAPACITY 16 first for SBC-2 devices
  [SCSI] sd: Refactor sd_read_capacity()
  [SCSI] mpt2sas v00.100.11.15
  [SCSI] mpt2sas: add MPT2SAS_MINOR(221) to miscdevice.h
  [SCSI] ch: Add scsi type modalias
  [SCSI] 3w-9xxx: add power management support
  [SCSI] bsg: add linux/types.h include to bsg.h
  [SCSI] cxgb3i: fix function descriptions
  [SCSI] libiscsi: fix possbile null ptr session command cleanup
  [SCSI] iscsi class: remove host no argument from session creation callout
  [SCSI] libiscsi: pass session failure a session struct
  [SCSI] iscsi lib: remove qdepth param from iscsi host allocation
  [SCSI] iscsi lib: have lib create work queue for transmitting IO
  [SCSI] iscsi class: fix lock dep warning on logout
  [SCSI] libiscsi: don't cap queue depth in iscsi modules
  [SCSI] iscsi_tcp: replace scsi_debug/tcp_debug logging with iscsi conn logging
  [SCSI] libiscsi_tcp: replace tcp_debug/scsi_debug logging with session/conn logging
  ...
2009-03-28 13:30:43 -07:00
Dmitri Vorobiev cc0be3227d net: Add missing include into include/linux/netdevice.h
The inline function skb_gro_mac_header defined in include/linux/netdevice.h
makes use of page_address(). Depending on configuration options, the latter
is either defined as a macro or is declared as a function in another header
file, namely include/linux/mm.h. However, include/linux/netdevice.h does not
include include/linux/mm.h.

On MIPS, this has produced the following build error:

  CC      kernel/sysctl_check.o
In file included from include/linux/icmpv6.h:173,
                 from include/linux/ipv6.h:208,
                 from include/net/ip_vs.h:26,
                 from kernel/sysctl_check.c:6:
include/linux/netdevice.h: In function 'skb_gro_mac_header':
include/linux/netdevice.h:1132: error: implicit declaration of function
'page_address'
include/linux/netdevice.h:1133: warning: pointer/integer type mismatch
in conditional expression
make[1]: *** [kernel/sysctl_check.o] Error 1
make: *** [kernel] Error 2

The patch adds the missing include and fixes the build error.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-27 15:55:36 -07:00
Herbert Xu d1c76af9e2 GRO: Move netpoll checks to correct location
As my netpoll fix for net doesn't really work for net-next, we
need this update to move the checks into the right place.  As it
stands we may pass freed skbs to netpoll_receive_skb.

This patch also introduces a netpoll_rx_on function to avoid GRO
completely if we're invoked through netpoll.  This might seem
paranoid but as netpoll may have an external receive hook it's
better to be safe than sorry.  I don't think we need this for
2.6.29 though since there's nothing immediately broken by it.

This patch also moves the GRO_* return values to netdevice.h since
VLAN needs them too (I tried to avoid this originally but alas
this seems to be the easiest way out).  This fixes a bug in VLAN
where it continued to use the old return value 2 instead of the
correct GRO_DROP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-16 10:50:02 -07:00
Yi Zou 4d288d5767 [SCSI] net: add FCoE offload support through net_device
This adds support to provide Fiber Channel over Ethernet (FCoE) offload
through net_device's net_device_ops struct. The offload through net_device
for FCoE is enabled in kernel as built-in or module driver.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-03-13 15:12:12 -05:00
Chris Leech 01d5b2fca1 [SCSI] net: define feature flags for FCoE offloads
Define feature flags for FCoE offloads.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-03-13 15:11:07 -05:00
Chris Leech 43eb99c5b3 [SCSI] net: reclaim 8 upper bits of the netdev->features from GSO
Reclaim 8 upper bits of netdev->features from GSO.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-03-13 15:10:23 -05:00
David S. Miller 508827ff0a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/tokenring/tmspci.c
	drivers/net/ucc_geth_mii.c
2009-03-05 02:06:47 -08:00
David S. Miller 9d40bbda59 vlan: Fix vlan-in-vlan crashes.
As analyzed by Patrick McHardy, vlan needs to reset it's
netdev_ops pointer in it's ->init() function but this
leaves the compat method pointers stale.

Add a netdev_resync_ops() and call it from the vlan code.

Any other driver which changes ->netdev_ops after register_netdevice()
will need to call this new function after doing so too.

With help from Patrick McHardy.

Tested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 23:46:25 -08:00
Harvey Harrison f3a7c66b5c net: replace __constant_{endian} uses in net headers
Base versions handle constant folding now.  For headers exposed to
userspace, we must only expose the __ prefixed versions.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-14 22:58:35 -08:00
Herbert Xu aa4b9f533e gro: Optimise Ethernet header comparison
This patch optimises the Ethernet header comparison to use 2-byte
and 4-byte xors instead of memcmp.  In order to facilitate this,
the actual comparison is now carried out by the callers of the
shared dev_gro_receive function.

This has a significant impact when receiving 1500B packets through
10GbE.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-08 20:22:18 -08:00
Herbert Xu 4ae5544f9a gro: Remember number of held packets instead of counting every time
This patch prepares for the move of the same_flow checks out of
dev_gro_receive.  As such we need to remember the number of held
packets since doing a loop just to count them every time is silly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-08 20:22:17 -08:00
Graf Yang fe2918b098 net: fix some trailing whitespaces
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 21:26:19 -08:00
Herbert Xu 86911732d3 gro: Avoid copying headers of unmerged packets
Unfortunately simplicity isn't always the best.  The fraginfo
interface turned out to be suboptimal.  The problem was quite
obvious.  For every packet, we have to copy the headers from
the frags structure into skb->head, even though for 99% of the
packets this part is immediately thrown away after the merge.

LRO didn't have this problem because it directly read the headers
from the frags structure.

This patch attempts to address this by creating an interface
that allows GRO to access the headers in the first frag without
having to copy it.  Because all drivers that use frags place the
headers in the first frag this optimisation should be enough.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-29 16:33:03 -08:00
Herbert Xu 5d0d9be8ef gro: Move common completion code into helpers
Currently VLAN still has a bit of common code handling the aftermath
of GRO that's shared with the common path.  This patch moves them
into shared helpers to reduce code duplication.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-29 16:33:02 -08:00
Ben Hutchings 288379f050 net: Remove redundant NAPI functions
Following the removal of the unused struct net_device * parameter from
the NAPI functions named *netif_rx_* in commit 908a7a1, they are
exactly equivalent to the corresponding *napi_* functions and are
therefore redundant.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:33:50 -08:00
Benjamin Herrenschmidt 937f1ba56b net: Add init_dummy_netdev() and fix EMAC driver using it
This adds an init_dummy_netdev() function that gets a network device
structure (allocation and lifetime entirely under caller's control) and
initialize the minimum amount of fields so it can be used to schedule
NAPI polls without registering a full blown interface. This is to be
used by drivers that need to tie several hardware interfaces to a single
NAPI poll scheduler due to HW limitations.

It also updates the ibm_newemac driver to use that, this fixing the
oops on 2.6.29 due to passing NULL as "dev" to netif_napi_add()

Symbol is exported GPL only a I don't think we want binary drivers doing
that sort of acrobatics (if we want them at all).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-14 21:05:05 -08:00
Krzysztof Hałasa 985ebdb5ed net: Fix a comment in include/linux/netdevice.h.
Fix a comment in include/linux/netdevice.h.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-12 21:18:32 -08:00
Linus Torvalds d9e8a3a5b8 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
  ioat: fix self test for multi-channel case
  dmaengine: bump initcall level to arch_initcall
  dmaengine: advertise all channels on a device to dma_filter_fn
  dmaengine: use idr for registering dma device numbers
  dmaengine: add a release for dma class devices and dependent infrastructure
  ioat: do not perform removal actions at shutdown
  iop-adma: enable module removal
  iop-adma: kill debug BUG_ON
  iop-adma: let devm do its job, don't duplicate free
  dmaengine: kill enum dma_state_client
  dmaengine: remove 'bigref' infrastructure
  dmaengine: kill struct dma_client and supporting infrastructure
  dmaengine: replace dma_async_client_register with dmaengine_get
  atmel-mci: convert to dma_request_channel and down-level dma_slave
  dmatest: convert to dma_request_channel
  dmaengine: introduce dma_request_channel and private channels
  net_dma: convert to dma_find_channel
  dmaengine: provide a common 'issue_pending_all' implementation
  dmaengine: centralize channel allocation, introduce dma_find_channel
  dmaengine: up-level reference counting to the module level
  ...
2009-01-09 11:52:14 -08:00
Herbert Xu 96e93eab20 gro: Add internal interfaces for VLAN
Previously GRO's only entry point from the outside is through
napi_gro_receive and napi_gro_frags.  These interfaces are for
device drivers.

This patch rearranges things to provide a new set of interfaces
for VLANs.  These interfaces are for internal use only.  The
VLAN code itself can then provide a set of entry points for
device drivers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 10:49:34 -08:00
Dan Williams f67b459992 net_dma: convert to dma_find_channel
Use the general-purpose channel allocation provided by dmaengine.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:15 -07:00
Herbert Xu 5d38a079ce gro: Add page frag support
This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
in one skb, rather than using the less efficient frag_list.

It also adds a new interface, napi_gro_frags to allow drivers
to inject page frags directly into the stack without allocating
an skb.  This is intended to be the GRO equivalent for LRO's
lro_receive_frags interface.

The existing GSO interface can already handle page frags with
or without an appended frag_list so nothing needs to be changed
there.

The merging itself is rather simple.  We store any new frag entries
after the last existing entry, without checking whether the first
new entry can be merged with the last existing entry.  Making this
check would actually be easy but since no existing driver can
produce contiguous frags anyway it would just be mental masturbation.

If the total number of entries would exceed the capacity of a
single skb, we simply resort to using frag_list as we do now.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:13:40 -08:00
Neil Horman 908a7a16b8 net: Remove unused netdev arg from some NAPI interfaces.
When the napi api was changed to separate its 1:1 binding to the net_device
struct, the netif_rx_[prep|schedule|complete] api failed to remove the now
vestigual net_device structure parameter.  This patch cleans up that api by
properly removing it..

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-22 20:43:12 -08:00
Herbert Xu d565b0a1a9 net: Add Generic Receive Offload infrastructure
This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
This is pretty similar to LRO except that this is protocol-independent.
Instead of holding packets in an lro_mgr structure, they're now held in
napi_struct.

For drivers that intend to use this, they can set the NETIF_F_GRO bit and
call napi_gro_receive instead of netif_receive_skb or just call netif_rx.
The latter will call napi_receive_skb automatically.  When napi_gro_receive
is used, the driver must either call napi_complete/napi_rx_complete, or
call napi_gro_flush in softirq context if the driver uses the primitives
__napi_complete/__napi_rx_complete.

Protocols will set the gro_receive and gro_complete function pointers in
order to participate in this scheme.

In addition to the packet, gro_receive will get a list of currently held
packets.  Each packet in the list has a same_flow field which is non-zero
if it is a potential match for the new packet.  For each packet that may
match, they also have a flush field which is non-zero if the held packet
must not be merged with the new packet.

Once gro_receive has determined that the new skb matches a held packet,
the held packet may be processed immediately if the new skb cannot be
merged with it.  In this case gro_receive should return the pointer to
the existing skb in gro_list.  Otherwise the new skb should be merged into
the existing packet and NULL should be returned, unless the new skb makes
it impossible for any further merges to be made (e.g., FIN packet) where
the merged skb should be returned.

Whenever the skb is merged into an existing entry, the gro_receive
function should set NAPI_GRO_CB(skb)->same_flow.  Note that if an skb
merely matches an existing entry but can't be merged with it, then
this shouldn't be set.

If gro_receive finds it pointless to hold the new skb for future merging,
it should set NAPI_GRO_CB(skb)->flush.

Held packets will be flushed by napi_gro_flush which is called by
napi_complete and napi_rx_complete.

Currently held packets are stored in a singly liked list just like LRO.
The list is limited to a maximum of 8 entries.  In future, this may be
expanded to use a hash table to allow more flows to be held for merging.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:38:52 -08:00
Herbert Xu 1a881f27c5 net: Add frag_list support to GSO
This patch allows GSO to handle frag_list in a limited way for the
purposes of allowing packets merged by GRO to be refragmented on
output.

Most hardware won't (and aren't expected to) support handling GRO
frag_list packets directly.  Therefore we will perform GSO in
software for those cases.

However, for drivers that can support it (such as virtual NICs) we
may not have to segment the packets at all.

Whether the added overhead of GRO/GSO is worthwhile for bridges
and routers when weighed against the benefit of potentially
increasing the MTU within the host is still an open question.
However, for the case of host nodes this is undoubtedly a win.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15 23:27:47 -08:00