Commit Graph

131646 Commits

Author SHA1 Message Date
Alexander Duyck db76176215 igb: move setting of buffsz out of repeated path in alloc_rx_buffers
buffsz is being repeatedly set when allocating buffers.  Since this value
should only need to be set once in the function I am moving it out of the
looped portion of the path.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-07 02:43:02 -08:00
Alexander Duyck 69d3ca5357 igb: optimize/refactor receive path
While cleaning up the skb_over panic with small frames I found there was
room for improvement in the ordering of operations within the rx receive
flow.  These changes will place the prefetch for the next descriptor to a
point earlier in the rx path.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-07 02:43:01 -08:00
David S. Miller 0b492fce3d sunhme: Don't match PCI devices in SBUS probe.
Unfortunately, the OF device tree nodes for SBUS and PCI
hme devices have the same device node name on some systems.

So if the name of the parent node isn't 'sbus', skip it.

Based upon an excellent report and detective work by
Meelis Roos and Eric Brower.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Meelis Roos <mroos@linux.ee>
2009-02-07 02:20:25 -08:00
Peter P Waskiewicz Jr 3e450669cc ixgbe: Fix a set_num_queues() bug that can result in num_(r|t)x_queues = 0
Now that our set_num_queues() routines for each feature are re-entrant, and
can be called at any point, they shouldn't zero out the feature's indices
or mask bits.  Subsequent calls into those routines for those features can
result in zero Rx and Tx queues being assigned, causing a panic later in
driver reinitialization.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-07 02:16:59 -08:00
Ayaz Abdulla 2813ddd1bf forcedeth: bump version to 63
This patch bumps the version up to 63

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-07 00:25:18 -08:00
Ayaz Abdulla daa91a9d24 forcedeth: recover error support
This patch adds another type of recoverable error to the driver. It also
modifies the sequence for recovery to include a mac reset and clearing
of interrupts.

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-07 00:25:00 -08:00
Ayaz Abdulla c1086cda7d forcedeth: ethtool tx csum fix
This patch fixes the ethtool tx csum "set" command. A recent patch was
submitted to remove HW_CSUM and use IP_CSUM instead. Therefore, the
corresponding ethtool command should also be modified.

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-07 00:24:39 -08:00
Ayaz Abdulla b6e4405bf7 forcedeth: msi interrupt fix
This patch fixes an issue with the suspend/resume cycle with msi
interrupts. See bugzilla number 10487 for more details. The fix is to
re-setup a private msi pci config offset field.

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-07 00:24:15 -08:00
Ayaz Abdulla cac1c52c36 forcedeth: mgmt unit interface
This patch updates the logic used to communicate with the mgmt unit. It
also adds a version check for a newer mgmt unit firmware.

* Fixed udelay to schedule_timeout_uninterruptible

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-07 00:23:57 -08:00
Ilpo Järvinen 1f0fa15432 net/sunrpc/xprtsock.c: some common code found
$ diff-funcs xs_udp_write_space net/sunrpc/xprtsock.c
net/sunrpc/xprtsock.c xs_tcp_write_space
 --- net/sunrpc/xprtsock.c:xs_udp_write_space()
 +++ net/sunrpc/xprtsock.c:xs_tcp_write_space()
@@ -1,4 +1,4 @@
- * xs_udp_write_space - callback invoked when socket buffer space
+ * xs_tcp_write_space - callback invoked when socket buffer space
  *                             becomes available
  * @sk: socket whose state has changed
  *
@@ -7,12 +7,12 @@
  * progress, otherwise we'll waste resources thrashing kernel_sendmsg
  * with a bunch of small requests.
  */
-static void xs_udp_write_space(struct sock *sk)
+static void xs_tcp_write_space(struct sock *sk)
 {
 	read_lock(&sk->sk_callback_lock);

-	/* from net/core/sock.c:sock_def_write_space */
-	if (sock_writeable(sk)) {
+	/* from net/core/stream.c:sk_stream_write_space */
+	if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)) {
 		struct socket *sock;
 		struct rpc_xprt *xprt;


$ codiff net/sunrpc/xprtsock.o net/sunrpc/xprtsock.o.new
net/sunrpc/xprtsock.c:
  xs_tcp_write_space | -163
  xs_udp_write_space | -163
 2 functions changed, 326 bytes removed

net/sunrpc/xprtsock.c:
  xs_write_space | +179
 1 function changed, 179 bytes added

net/sunrpc/xprtsock.o.new:
 3 functions changed, 179 bytes added, 326 bytes removed, diff: -147

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:48:33 -08:00
Ilpo Järvinen b5f348e5a4 ipv6/addrconf: common code located
$ codiff net/ipv6/addrconf.o net/ipv6/addrconf.o.new
net/ipv6/addrconf.c:
 addrconf_notify | -267
1 function changed, 267 bytes removed

net/ipv6/addrconf.c:
 add_addr |  +86
1 function changed, 86 bytes added

net/ipv6/addrconf.o.new:
2 functions changed, 86 bytes added, 267 bytes removed, diff: -181

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:48:01 -08:00
Ilpo Järvinen d73f08011b ipv6/ndisc: join error paths
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:47:37 -08:00
Ilpo Järvinen 910d30b704 ax25: more common return path joining
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:47:14 -08:00
Ilpo Järvinen 69ebbf58f3 ipmr: use goto to common label instead of opencoding
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:46:51 -08:00
Eric Van Hensbergen beeebc92ee 9p: fix endian issues [attempt 3]
When the changes were done to the protocol last release, some endian
bugs crept in.  This patch fixes those endian problems and has been
verified to run on 32/64 bit and x86/ppc architectures.

This version of the patch incorporates the correct annotations
for endian variables.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 22:07:41 -08:00
David S. Miller b4bd07c20b net_dma: call dmaengine_get only if NET_DMA enabled
Based upon a patch from Atsushi Nemoto <anemo@mba.ocn.ne.jp>

--------------------
The commit 649274d993 ("net_dma:
acquire/release dma channels on ifup/ifdown") added unconditional call
of dmaengine_get() to net_dma.  The API should be called only if
NET_DMA was enabled.
--------------------

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Dan Williams <dan.j.williams@intel.com>
2009-02-06 22:06:43 -08:00
Ondrej Zary 152abd139c 3c509: Fix resume from hibernation for PnP mode.
From: Ondrej Zary <linux@rainbow-software.org>

last year, I posted a patch which fixed hibernation on 3c509
cards. That was back in 2.6.24. It worked fine in 2.6.25. But then I
stopped using hibernation (as it did not work with my new IT8212 RAID
controller).

Now I fixed it and noticed that 3c509 does not wake up properly
anymore (in 2.6.28) - neither in PnP nor in ISA modes. ifconfig
down/up makes the card work again in PnP mode. However, in ISA mode,
ifconfig up ends with "No such device" error.

Comparing the 3c509 driver between 2.6.25 and 2.6.28, there's only
some statistics-related change. So the cause of the problem must be
somewhere else.

This patch makes the resume work in PnP mode, but it's still not
enough for ISA mode.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 22:04:08 -08:00
Ilkka Virta 71822faa3b sungem: Soft lockup in sungem on Netra AC200 when switching interface up
From: Ilkka Virta <itvirta@iki.fi>

In the lockup situation the driver seems to go off in an eternal storm
of interrupts right after calling request_irq(). It doesn't actually
do anything interesting in the interrupt handler. Since connecting the link
afterwards works, something later in initialization must fix this.

Looking at gem_do_start() and gem_open(), it seems that the only thing
done while opening the device after the request_irq(), is a call to
napi_enable().

I don't know what the ordering requirements are for the
initialization, but I boldly tried to move the napi_enable() call
inside gem_do_start() before the link state is checked and interrupts
subsequently enabled, and it seems to work for me. Doesn't even break
anything too obvious...

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 22:00:36 -08:00
David Howells 15bde72738 RxRPC: Fix a potential NULL dereference
Fix a potential NULL dereference bug during error handling in
rxrpc_kernel_begin_call(), whereby rxrpc_put_transport() may be handed a NULL
pointer.

This was found with a code checker (http://repo.or.cz/w/smatch.git/).

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 21:50:52 -08:00
Ivan Vecera 355423d084 r8169: Don't update statistics counters when interface is down
Some Realtek chips (RTL8169sb/8110sb in my case) are unable to retrieve
ethtool statistics when the interface is down. The process stays in
endless loop in rtl8169_get_ethtool_stats. This is because these chips
need to have receiver enabled (CmdRxEnb bit in ChipCmd register) that is
cleared when the interface is going down. It's better to update statistics
only when the interface is up and otherwise return copy of statistics
grabbed when the interface was up (in rtl8169_close).

It is interesting that PCI-E NICs (like 8168b/8111b...) are not affected.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 21:49:57 -08:00
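The approach described in 355423d084 above (refresh statistics only while the interface is up, otherwise return the copy taken at close time) can be sketched in plain userspace C. This is a minimal illustration only: struct nic, struct nic_counters and every function name below are hypothetical stand-ins, not the r8169 driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the r8169 driver's real types. */
struct nic_counters {
	uint64_t tx_packets;
	uint64_t rx_packets;
};

struct nic {
	bool running;                /* interface is up, receiver enabled */
	struct nic_counters hw;      /* pretend hardware counters */
	struct nic_counters cached;  /* last snapshot taken while up */
};

/* Refresh the snapshot from "hardware"; skipped while the NIC is down. */
static void nic_update_counters(struct nic *n)
{
	if (!n->running)
		return;              /* never poll hardware that cannot answer */
	n->cached = n->hw;
}

/* Take one last snapshot, then mark the interface as down. */
static void nic_close(struct nic *n)
{
	nic_update_counters(n);
	n->running = false;
}

/* Fresh values while up, otherwise the snapshot from close time. */
static struct nic_counters nic_get_stats(struct nic *n)
{
	nic_update_counters(n);
	return n->cached;
}
```

With this shape a statistics query on a downed interface returns immediately instead of waiting on hardware that will not respond.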
Peter P Waskiewicz Jr 12207e498b ixgbe: Defeature Tx Head writeback
Tx Head writeback is causing multi-microsecond stalls on PCIe chipsets, due
to partial cacheline writebacks.  Removing this feature removes these
issues.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 21:47:24 -08:00
Peter P Waskiewicz Jr 0ecc061d19 ixgbe: Update flow control state machine in link setup
The flow control handling is overly complicated and difficult to maintain.
This patch cleans up the flow control handling and makes it much more
explicit.  It also adds 1G flow control autonegotiation, for 1G copper
links, 1G KX links, and 1G fiber links.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 21:46:54 -08:00
Jesper Dangaard Brouer 2783ef2312 udp: Fix potential wrong ip_hdr(skb) pointers
Like the UDP header fix, pskb_may_pull() can potentially
alter the SKB buffer.  Thus the saddr and daddr pointers
may point to the old skb->data buffer.

I haven't seen corruptions, as it's only seen if the old
skb->data buffer was reallocated by another user and
written into very quickly (or poison'd by SLAB debugging).

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 01:59:12 -08:00
Yinghai Lu 394827913e forcedeth: enable msix to default
Impact: change default

msix and napic can work again

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 01:31:12 -08:00
Yinghai Lu 033e97b24a forcedeth: ck804 and mcp55 doesn't need timerirq
Impact: cleanup

so get less irq.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 01:30:56 -08:00
Yinghai Lu 0335ef5d59 forcedeth: disable irq at first before schedule rx
Impact: clean up

schedule it later after disable it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 01:30:36 -08:00
Yinghai Lu 79d30a581f forcedeth: don't clear nic_poll_irq too early
Impact: fix bug

for msix, we still need that flag to enable irq respectively

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 01:30:01 -08:00
Yinghai Lu ddb213f076 forcedeth: make msi-x different name for rx-tx
Impact: /proc/interrupts can now show which irq is rx, tx or other for msi-x

add three name fields for rx, tx, other

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 01:29:23 -08:00
Gautam Kachroo efc683fc2a neigh: some entries can be skipped during dumping
neightbl_dump_info and neigh_dump_table can skip entries if the
*fill*info functions return an error. This results in an incomplete
dump (invoked by netlink requests for RTM_GETNEIGHTBL or
RTM_GETNEIGH).

nidx and idx should not be incremented if the current entry was not
placed in the output buffer.

Signed-off-by: Gautam Kachroo <gk@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 00:52:04 -08:00
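The rule stated in efc683fc2a above (do not advance the index when the current entry was not placed in the output buffer) can be sketched as a resumable dump loop. This is an illustration under assumed names: fill_entry() and dump_table() are hypothetical and only model the idea, they are not the kernel's neighbour-table functions.

```c
#include <stddef.h>

/* Hypothetical fill helper: returns 0 on success, -1 when the output
 * buffer has no room left for another entry. */
static int fill_entry(int entry, int *out, size_t *used, size_t cap)
{
	if (*used >= cap)
		return -1;
	out[(*used)++] = entry;
	return 0;
}

/* Dump entries starting at *resume_idx and return how many were written.
 * *resume_idx only moves past entries that actually reached the buffer. */
static size_t dump_table(const int *table, size_t n, size_t *resume_idx,
			 int *out, size_t cap)
{
	size_t used = 0;
	size_t idx = *resume_idx;

	while (idx < n) {
		if (fill_entry(table[idx], out, &used, cap) < 0)
			break;   /* not counted: retried on the next call */
		idx++;
	}
	*resume_idx = idx;
	return used;
}
```

Because the index is incremented only after a successful fill, a caller that keeps calling dump_table() with a fresh buffer sees every entry exactly once.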
David S. Miller 684de409ac ipv6: Disallow rediculious flowlabel option sizes.
Just like PKTINFO, limit the options area to 64K.

Based upon report by Eric Sesterhenn and analysis by
Roland Dreier.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 00:49:55 -08:00
Pablo Neira Ayuso ff491a7334 netlink: change return-value logic of netlink_broadcast()
Currently, netlink_broadcast() reports errors to the caller if no
messages at all were delivered:

1) If, at least, one message has been delivered correctly, returns 0.
2) Otherwise, if no messages at all were delivered due to skb_clone()
   failure, return -ENOBUFS.
3) Otherwise, if there are no listeners, return -ESRCH.

With this patch, the caller knows if the delivery of any of the
messages to the listeners has failed:

1) If it fails to deliver any message (for whatever reason), return
   -ENOBUFS.
2) Otherwise, if all messages were delivered OK, returns 0.
3) Otherwise, if no listeners, return -ESRCH.

In the current ctnetlink code and in Netfilter in general, we can add
reliable logging and connection tracking event delivery by dropping the
packets whose events were not successfully delivered over Netlink. Of
course, this option would be settable via /proc as this approach reduces
performance (in terms of filtered connections per second by a stateful
firewall) but provides reliable logging and event delivery (for
conntrackd) in return.

This patch also changes some clients of netlink_broadcast() that
may report ENOBUFS errors via printk. This error handling is not
of any help. Instead, the userspace daemons that are listening to
those netlink messages should resync themselves with the kernel-side
if they hit ENOBUFS.

BTW, netlink_broadcast() clients include those that call
cn_netlink_send(), nlmsg_multicast() and genlmsg_multicast() since they
internally call netlink_broadcast() and return its error value.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 23:56:36 -08:00
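The two sets of return-value rules listed in ff491a7334 above can be compared side by side with a small sketch. It is one reading of those lists, not the kernel implementation: struct bcast_result and both functions are hypothetical, and the sketch folds every delivery failure into a single counter, whereas the old rule 2 speaks specifically of skb_clone() failure.

```c
#include <errno.h>

/* Hypothetical summary of one broadcast attempt. */
struct bcast_result {
	int delivered;  /* listeners that received the message */
	int failed;     /* listeners for which delivery failed */
};

/* Old rules: any successful delivery hides failures elsewhere. */
static int bcast_status_old(struct bcast_result r)
{
	if (r.delivered > 0)
		return 0;
	if (r.failed > 0)
		return -ENOBUFS;
	return -ESRCH;          /* nobody was listening */
}

/* New rules: any failed delivery is reported to the caller. */
static int bcast_status_new(struct bcast_result r)
{
	if (r.failed > 0)
		return -ENOBUFS;
	if (r.delivered > 0)
		return 0;
	return -ESRCH;          /* nobody was listening */
}
```

The two functions differ only for a partial delivery (some listeners reached, some not), which is the case the commit message says the caller could not previously detect.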
Alex Chiang 612e244c12 e1000e: normalize usage of serdes_has_link
Cosmetic change to use struct e1000_mac_info.serdes_has_link
consistently as the 'bool' that it's declared as.

No functional change.

Signed-off-by: Alex Chiang <achiang@hp.com>
Acked-by: Jeff Kirsher <Jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 23:55:45 -08:00
Peter P Waskiewicz Jr 34b0368c68 ixgbe: Display EEPROM version in ethtool -i queries
Currently ixgbe does not display the EEPROM version in ethtool -i, where
other drivers do.  The EEPROM version is located at offset 0x29.  This
patch adds support to display it.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 23:54:42 -08:00
Peter P Waskiewicz Jr 3201d3130e ixgbe: Update link setup code to better support autonegotiation of speed
The current code has some flaws in it when performing autonegotiation,
especially on KX/KX4 links.  This patch updates the code to better handle
the autonegotiation states on link setup.  The patch also removes a redundant
link configuration call on driver load, and moves link configuration to
the ->open() path.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 23:54:21 -08:00
Peter P Waskiewicz Jr bc97114d3f ixgbe: Refactor set_num_queues() and cache_ring_register()
The current code to determine the number of queues the device will want
on driver initialization is ugly and difficult to maintain.  It also
doesn't allow for easy expansion for future features or future hardware.
This patch refactors these routines, and makes them easier to deal with.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 23:53:59 -08:00
Herbert Xu 56035022d8 gro: Fix frag_list merging on imprecisely split packets
The previous fix ad0f990444 (gro:
Fix handling of imprecisely split packets) only fixed the case
of frags merging; frag_list merging in the same circumstances
was still broken.

In particular, the packet headers end up in the data stream.

This patch fixes this plus another issue where an imprecisely
split packet header may be read incorrectly (this is mostly
harmless since it'll simply cause the packet to not match and
be rejected for GRO).

Thanks to Emil Tantilov and Jeff Kirsher for helping to track
this down.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 21:26:52 -08:00
Graf Yang fe2918b098 net: fix some trailing whitespaces
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 21:26:19 -08:00
Herbert Xu 33dccbb050 tun: Limit amount of queued packets per device
Unlike a normal socket path, the tuntap device send path does
not have any accounting.  This means that the user-space sender
may be able to pin down arbitrary amounts of kernel memory by
continuing to send data to an end-point that is congested.

Even when this isn't an issue because of limited queueing at
most end points, this can also be a problem because its only
response to congestion is packet loss.  That is, when those
local queues at the end-point fill up, the tuntap device will
start wasting system time because it will continue to send
data there which simply gets dropped straight away.

Of course one could argue that everybody should do congestion
control end-to-end, unfortunately there are people in this world
still hooked on UDP, and they don't appear to be going away
anywhere fast.  In fact, we've always helped them by performing
accounting in our UDP code, the sole purpose of which is to
provide congestion feedback other than through packet loss.

This patch attempts to apply the same bandaid to the tuntap device.
It creates a pseudo-socket object which is used to account our
packets just as a normal socket does for UDP.  Of course things
are a little complex because we're actually reinjecting traffic
back into the stack rather than out of the stack.

The stack complexities however should have been resolved by preceding
patches.  So this one can simply start using skb_set_owner_w.

For now the accounting is essentially disabled by default for
backwards compatibility.  In particular, we set the cap to INT_MAX.
This is so that existing applications don't get confused by the
sudden arrival of EAGAIN errors.

In future we may wish (or be forced to) do this by default.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 21:25:32 -08:00
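The accounting idea in 33dccbb050 above (charge each queued packet against a per-device cap, with the cap at INT_MAX by default so that existing users never see EAGAIN) can be sketched as follows. This is a simplified userspace model: struct pseudo_sock and its helpers are hypothetical, and the real socket accounting behind skb_set_owner_w differs in detail.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical pseudo-socket used purely for accounting. */
struct pseudo_sock {
	int sndbuf;  /* cap on bytes that may be queued at once */
	int wmem;    /* bytes currently queued and not yet released */
};

/* Default cap is INT_MAX, so accounting is effectively disabled. */
static void pseudo_sock_init(struct pseudo_sock *sk)
{
	sk->sndbuf = INT_MAX;
	sk->wmem = 0;
}

/* Charge a packet before queueing it; -EAGAIN signals congestion. */
static int pseudo_sock_charge(struct pseudo_sock *sk, int len)
{
	if (len < 0)
		return -EINVAL;
	if (len > sk->sndbuf - sk->wmem)
		return -EAGAIN;
	sk->wmem += len;
	return 0;
}

/* Release the charge once the packet has been consumed or dropped. */
static void pseudo_sock_uncharge(struct pseudo_sock *sk, int len)
{
	sk->wmem -= len;
}
```

Lowering sndbuf below INT_MAX is what turns the accounting into real back-pressure: the sender gets -EAGAIN instead of having its packets silently dropped further down the path.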
David S. Miller a23f4bbd8d Revert "tcp: Always set urgent pointer if it's beyond snd_nxt"
This reverts commit 64ff3b938e.

Jeff Chua reports that it breaks rlogin for him.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 15:38:31 -08:00
Herbert Xu 0178b695fd ipv6: Copy cork options in ip6_append_data
As the options passed to ip6_append_data may be ephemeral, we need
to duplicate it for corking.  This patch applies the simplest fix
which is to memdup all the relevant bits.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 15:15:50 -08:00
David S. Miller 12402b5b7a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2009-02-05 15:08:11 -08:00
Jesper Dangaard Brouer 7b5e56f9d6 udp: Fix UDP short packet false positive
The UDP header pointer assignment must happen after calling
pskb_may_pull(), as pskb_may_pull() can potentially alter the SKB
buffer.

This was exposed by running multicast traffic through the NIU driver,
as it won't prepull the protocol headers into the linear area on
receive.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-05 15:05:45 -08:00
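The ordering requirement in 7b5e56f9d6 above (take the header pointer only after the pull, because the pull may move the buffer) can be modelled in userspace C. This sketch does not use the kernel API: struct buf, buf_may_pull() and read_header_byte() are hypothetical, and the pull here always reallocates in order to make the hazard explicit.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer: a heap-allocated linear area plus a
 * non-linear remainder that has not been copied in yet. */
struct buf {
	unsigned char *data;        /* linear area, owned by the buffer */
	size_t linear;              /* valid bytes in data */
	const unsigned char *rest;  /* remainder, not owned */
	size_t rest_len;
};

/* Make the first 'need' bytes linear. Returns 1 on success, 0 on failure.
 * On growth the linear area is reallocated, so older pointers into
 * b->data become dangling. */
static int buf_may_pull(struct buf *b, size_t need)
{
	size_t extra;
	unsigned char *fresh;

	if (need <= b->linear)
		return 1;
	if (need > b->linear + b->rest_len)
		return 0;

	extra = need - b->linear;
	fresh = malloc(need);
	if (!fresh)
		return 0;
	memcpy(fresh, b->data, b->linear);
	memcpy(fresh + b->linear, b->rest, extra);
	free(b->data);
	b->data = fresh;
	b->linear = need;
	b->rest += extra;
	b->rest_len -= extra;
	return 1;
}

/* Read one header byte (off must be < hdr_len); -1 if the header is short. */
static int read_header_byte(struct buf *b, size_t hdr_len, size_t off)
{
	const unsigned char *hdr;

	if (!buf_may_pull(b, hdr_len))
		return -1;
	hdr = b->data;              /* assigned AFTER the pull, never before */
	return hdr[off];
}
```

Assigning hdr before the call to buf_may_pull() would compile and often appear to work, which matches the commit's observation that the problem only showed up with a driver that does not pre-pull headers into the linear area.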
Herbert Xu 4cc7f68d65 net: Reexport sock_alloc_send_pskb
The function sock_alloc_send_pskb is completely useless if not
exported since most of the code in it won't be used as is.  In
fact, this code has already been duplicated in the tun driver.

Now that we need accounting in the tun driver, we can in fact
use this function as is.  So this patch marks it for export again.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:55:54 -08:00
Herbert Xu 9a279bcbe3 net: Partially allow skb destructors to be used on receive path
As it currently stands, skb destructors are forbidden on the
receive path because the protocol end-points will overwrite
any existing destructor with their own.

This is the reason why we have to call skb_orphan in the loopback
driver before we reinject the packet back into the stack, thus
creating a period during which loopback traffic isn't charged
to any socket.

With virtualisation, we have a similar problem in that traffic
is reinjected into the stack without being associated with any
socket entity, thus providing no natural congestion push-back
for those poor folks still stuck with UDP.

Now had we been consistent in telling them that UDP simply has
no congestion feedback, I could just fob them off.  Unfortunately,
we appear to have gone to some length in catering for this on
the standard UDP path, with skb/socket accounting so that has
created a very unhealthy dependency.

Alas habits are difficult to break out of, so we may just have
to allow skb destructors on the receive path.

It turns out that making skb destructors useable on the receive path
isn't as easy as it seems.  For instance, simply adding skb_orphan
to skb_set_owner_r isn't enough.  This is because we assume all
over the IP stack that skb->sk is an IP socket if present.

The new transparent proxy code goes one step further and assumes
that skb->sk is the receiving socket if present.

Now all of this can be dealt with by adding simple checks such
as only treating skb->sk as an IP socket if skb->sk->sk_family
matches.  However, it turns out that for bridging at least we
don't need to do all of this work.

This is of interest because most virtualisation setups use bridging
so we don't actually go through the IP stack on the host (with
the exception of our old nemesis the bridge netfilter, but that's
easily taken care of).

So this patch simply adds skb_orphan to the point just before we
enter the IP stack, but after we've gone through the bridge on the
receive path.  It also adds an skb_orphan to the one place in
netfilter that touches skb->sk/skb->destructor, that is, tproxy.

One word of caution, because of the internal code structure, anyone
wishing to deploy this must use skb_set_owner_w as opposed to
skb_set_owner_r since many functions that create a new skb from
an existing one will invoke skb_set_owner_w on the new skb.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:55:27 -08:00
David S. Miller 7870389478 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2009-02-04 16:52:41 -08:00
David S. Miller 005c79b3d4 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
2009-02-04 16:51:58 -08:00
Andy Fleming 4d7902f22b gianfar: Fix stashing support
Stashing is only supported on the 85xx (e500-based) SoCs.  The 83xx and 86xx
chips don't have a proper cache for this.  U-Boot has been updated to add
stashing properties to the device tree nodes of gianfar devices on 85xx.  So
now we modify Linux to keep stashing off unless those properties are there.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:43:44 -08:00
Andy Fleming 0fd56bb5be gianfar: Add support for skb recycling
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:43:16 -08:00
Andy Fleming 1577ecef76 netdev: Merge UCC and gianfar MDIO bus drivers
The MDIO bus drivers for the UCC and gianfar ethernet controllers are
essentially the same.  There's no reason to duplicate that much code.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:42:35 -08:00
Andy Fleming b98ac702f4 gianfar: Fix potential soft reset race
SOFT_RESET must be asserted for at least 3 TX clocks in order for it to work
properly.  The syncs in the gfar_write() commands have been hiding this, but
we need to guarantee it.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04 16:38:05 -08:00