As analyzed by Patrick McHardy, vlan needs to reset it's
netdev_ops pointer in it's ->init() function but this
leaves the compat method pointers stale.
Add a netdev_resync_ops() and call it from the vlan code.
Any other driver which changes ->netdev_ops after register_netdevice()
will need to call this new function after doing so too.
With help from Patrick McHardy.
Tested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Base versions handle constant folding now. For headers exposed to
userspace, we must only expose the __ prefixed versions.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch optimises the Ethernet header comparison to use 2-byte
and 4-byte xors instead of memcmp. In order to facilitate this,
the actual comparison is now carried out by the callers of the
shared dev_gro_receive function.
This has a significant impact when receiving 1500B packets through
10GbE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch prepares for the move of the same_flow checks out of
dev_gro_receive. As such we need to remember the number of held
packets since doing a loop just to count them every time is silly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unfortunately simplicity isn't always the best. The fraginfo
interface turned out to be suboptimal. The problem was quite
obvious. For every packet, we have to copy the headers from
the frags structure into skb->head, even though for 99% of the
packets this part is immediately thrown away after the merge.
LRO didn't have this problem because it directly read the headers
from the frags structure.
This patch attempts to address this by creating an interface
that allows GRO to access the headers in the first frag without
having to copy it. Because all drivers that use frags place the
headers in the first frag this optimisation should be enough.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently VLAN still has a bit of common code handling the aftermath
of GRO that's shared with the common path. This patch moves them
into shared helpers to reduce code duplication.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following the removal of the unused struct net_device * parameter from
the NAPI functions named *netif_rx_* in commit 908a7a1, they are
exactly equivalent to the corresponding *napi_* functions and are
therefore redundant.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds an init_dummy_netdev() function that gets a network device
structure (allocation and lifetime entirely under caller's control) and
initialize the minimum amount of fields so it can be used to schedule
NAPI polls without registering a full blown interface. This is to be
used by drivers that need to tie several hardware interfaces to a single
NAPI poll scheduler due to HW limitations.
It also updates the ibm_newemac driver to use that, this fixing the
oops on 2.6.29 due to passing NULL as "dev" to netif_napi_add()
Symbol is exported GPL only a I don't think we want binary drivers doing
that sort of acrobatics (if we want them at all).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
ioat: fix self test for multi-channel case
dmaengine: bump initcall level to arch_initcall
dmaengine: advertise all channels on a device to dma_filter_fn
dmaengine: use idr for registering dma device numbers
dmaengine: add a release for dma class devices and dependent infrastructure
ioat: do not perform removal actions at shutdown
iop-adma: enable module removal
iop-adma: kill debug BUG_ON
iop-adma: let devm do its job, don't duplicate free
dmaengine: kill enum dma_state_client
dmaengine: remove 'bigref' infrastructure
dmaengine: kill struct dma_client and supporting infrastructure
dmaengine: replace dma_async_client_register with dmaengine_get
atmel-mci: convert to dma_request_channel and down-level dma_slave
dmatest: convert to dma_request_channel
dmaengine: introduce dma_request_channel and private channels
net_dma: convert to dma_find_channel
dmaengine: provide a common 'issue_pending_all' implementation
dmaengine: centralize channel allocation, introduce dma_find_channel
dmaengine: up-level reference counting to the module level
...
Previously GRO's only entry point from the outside is through
napi_gro_receive and napi_gro_frags. These interfaces are for
device drivers.
This patch rearranges things to provide a new set of interfaces
for VLANs. These interfaces are for internal use only. The
VLAN code itself can then provide a set of entry points for
device drivers.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the general-purpose channel allocation provided by dmaengine.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
in one skb, rather than using the less efficient frag_list.
It also adds a new interface, napi_gro_frags to allow drivers
to inject page frags directly into the stack without allocating
an skb. This is intended to be the GRO equivalent for LRO's
lro_receive_frags interface.
The existing GSO interface can already handle page frags with
or without an appended frag_list so nothing needs to be changed
there.
The merging itself is rather simple. We store any new frag entries
after the last existing entry, without checking whether the first
new entry can be merged with the last existing entry. Making this
check would actually be easy but since no existing driver can
produce contiguous frags anyway it would just be mental masturbation.
If the total number of entries would exceed the capacity of a
single skb, we simply resort to using frag_list as we do now.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the napi api was changed to separate its 1:1 binding to the net_device
struct, the netif_rx_[prep|schedule|complete] api failed to remove the now
vestigual net_device structure parameter. This patch cleans up that api by
properly removing it..
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
This is pretty similar to LRO except that this is protocol-independent.
Instead of holding packets in an lro_mgr structure, they're now held in
napi_struct.
For drivers that intend to use this, they can set the NETIF_F_GRO bit and
call napi_gro_receive instead of netif_receive_skb or just call netif_rx.
The latter will call napi_receive_skb automatically. When napi_gro_receive
is used, the driver must either call napi_complete/napi_rx_complete, or
call napi_gro_flush in softirq context if the driver uses the primitives
__napi_complete/__napi_rx_complete.
Protocols will set the gro_receive and gro_complete function pointers in
order to participate in this scheme.
In addition to the packet, gro_receive will get a list of currently held
packets. Each packet in the list has a same_flow field which is non-zero
if it is a potential match for the new packet. For each packet that may
match, they also have a flush field which is non-zero if the held packet
must not be merged with the new packet.
Once gro_receive has determined that the new skb matches a held packet,
the held packet may be processed immediately if the new skb cannot be
merged with it. In this case gro_receive should return the pointer to
the existing skb in gro_list. Otherwise the new skb should be merged into
the existing packet and NULL should be returned, unless the new skb makes
it impossible for any further merges to be made (e.g., FIN packet) where
the merged skb should be returned.
Whenever the skb is merged into an existing entry, the gro_receive
function should set NAPI_GRO_CB(skb)->same_flow. Note that if an skb
merely matches an existing entry but can't be merged with it, then
this shouldn't be set.
If gro_receive finds it pointless to hold the new skb for future merging,
it should set NAPI_GRO_CB(skb)->flush.
Held packets will be flushed by napi_gro_flush which is called by
napi_complete and napi_rx_complete.
Currently held packets are stored in a singly liked list just like LRO.
The list is limited to a maximum of 8 entries. In future, this may be
expanded to use a hash table to allow more flows to be held for merging.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows GSO to handle frag_list in a limited way for the
purposes of allowing packets merged by GRO to be refragmented on
output.
Most hardware won't (and aren't expected to) support handling GRO
frag_list packets directly. Therefore we will perform GSO in
software for those cases.
However, for drivers that can support it (such as virtual NICs) we
may not have to segment the packets at all.
Whether the added overhead of GRO/GSO is worthwhile for bridges
and routers when weighed against the benefit of potentially
increasing the MTU within the host is still an open question.
However, for the case of host nodes this is undoubtedly a win.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few months back a race was discused between the netpoll napi service
path, and the fast path through net_rx_action:
http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470
A patch was submitted for that bug, but I think we missed a case.
Consider the following scenario:
INITIAL STATE
CPU0 has one napi_struct A on its poll_list
CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
napi_struct A that CPU0 has on its list
CPU0 CPU1
net_rx_action poll_napi
!list_empty (returns true) locks poll_lock for A
poll_one_napi
napi->poll
netif_rx_complete
__napi_complete
(removes A from poll_list)
list_entry(list->next)
In the above scenario, net_rx_action assumes that the per-cpu poll_list is
exclusive to that cpu. netpoll of course violates that, and because the netpoll
path can dequeue from the poll list, its possible for CPU0 to detect a non-empty
list at the top of the while loop in net_rx_action, but have it become empty by
the time it calls list_entry. Since the poll_list isn't surrounded by any other
structure, the returned data from that list_entry call in this situation is
garbage, and any number of crashes can result based on what exactly that garbage
is.
Given that its not fasible for performance reasons to place exclusive locks
arround each cpus poll list to provide that mutal exclusion, I think the best
solution is modify the netpoll path in such a way that we continue to guarantee
that the poll_list for a cpu is in fact exclusive to that cpu. To do this I've
implemented the patch below. It adds an additional bit to the state field in
the napi_struct. When executing napi->poll from the netpoll_path, this bit will
be set. When a driver calls netif_rx_complete, if that bit is set, it will not
remove the napi_struct from the poll_list. That work will be saved for the next
iteration of net_rx_action.
I've tested this and it seems to work well. About the biggest drawback I can
see to it is the fact that it might result in an extra loop through
net_rx_action in the event that the device is actually contended for (i.e. the
netpoll path actually preforms all the needed work no the device, and the call
to net_rx_action winds up doing nothing, except removing the napi_struct from
the poll_list. However I think this is probably a small price to pay, given
that the alternative is a crash.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the last shoot of this series.
After I removing all directly reference of netdev->priv, I am killing
"priv" of "struct net_device" and fixing relative comments/docs.
Anyone will not be allowed to reference netdev->priv directly.
If you want to reference the memory of private data, use netdev_priv()
instead.
If the private data is not allocted when alloc_netdev(), use
netdev->ml_priv to point that memory after you creating that private
data.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the netlink option for DCB is necessary to actually be useful,
simplified the Kconfig option. In addition, added useful help text for the
Kconfig option.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a concession to vendors who have to deal with one source for different
kernel versions, add a HAVE_NET_DEVICE_OPS so they don't end up hard
coding ifdef against kernel version.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for Data Center Bridging (DCB) features in the ixgbe
driver and adds an rtnetlink interface for configuring DCB to the
kernel. The DCB feature support included are Priority Grouping (PG) -
which allows bandwidth guarantees to be allocated to groups to traffic
based on the 802.1q priority, and Priority Based Flow Control (PFC) -
which introduces a new MAC control PAUSE frame which works at
granularity of the 802.1p priority instead of the link (IEEE 802.3x).
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.
Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order for the network device ops get_stats call to be immutable, the handling
of the default internal network device stats block has to be changed. Add a new
helper function which replaces the old use of internal_get_stats.
Note: change return code to make it clear that the caller should not
go changing the returned statistics.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the network device internal API to move adminstrative
operations out of the network device structure and into a separate structure.
This patch involves some hackery to maintain compatablity between the
new and old model, so all 300+ drivers don't have to be changed at once.
For drivers that aren't converted yet, the netdevice_ops virt function list
still resides in the net_device structure. For old protocols, the new
net_device_ops are copied out to the old net_device pointers.
After the transistion is completed the nag message can be changed to
an WARN_ON, and the compatiablity code can be made configurable.
Some function pointers aren't moved:
* destructor can't be in net_device_ops because
it may need to be referenced after the module is unloaded.
* neighbor setup is manipulated in a couple of places that need special
consideration
* hard_start_xmit is in the fast path for transmit.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was recently hunting a bug that occurred in network namespace
cleanup. In looking at the code it became apparrent that we have
and will continue to have cases where if we have anything going
on in a network namespace there will be assumptions that the
loopback device is present. Things like sending igmp unsubscribe
messages when we bring down network devices invokes the routing
code which assumes that at least the loopback driver is present.
Therefore to avoid magic initcall ordering hackery that is hard
to follow and hard to get right insert a call to register the
loopback device directly from net_dev_init(). This guarantes
that the loopback device is the first device registered and
the last network device to go away.
But do it carefully so we register the loopback device after
we clear dev_boot_phase.
Signed-off-by: Eric W. Biederman <ebiederm@maxwell.aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was recently hunting a bug that occurred in network namespace
cleanup. In looking at the code it became apparrent that we have
and will continue to have cases where if we have anything going
on in a network namespace there will be assumptions that the
loopback device is present. Things like sending igmp unsubscribe
messages when we bring down network devices invokes the routing
code which assumes that at least the loopback driver is present.
Therefore to avoid magic initcall ordering hackery that is hard
to follow and hard to get right insert a call to register the
loopback device directly from net_dev_init(). This guarantes
that the loopback device is the first device registered and
the last network device to go away.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only user of the net_device->last_rx field is bonding.
This patch adds a conditional update of last_rx to the bonding special
logic in skb_bond_should_drop, causing last_rx to only be updated when
the ARP monitor is running.
This frees network device drivers from the necessity of
updating last_rx, which can have cache line thrash issues.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove excess kernel-doc function parameters from networking header
& driver files:
Warning(include/net/sock.h:946): Excess function parameter or struct member 'sk' description in 'sk_filter_release'
Warning(include/linux/netdevice.h:1545): Excess function parameter or struct member 'cpu' description in 'netif_tx_lock'
Warning(drivers/net/wan/z85230.c:712): Excess function parameter or struct member 'regs' description in 'z8530_interrupt'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My change
commit e2a6b85247
net: Enable TSO if supported by at least one device
didn't do what was intended because the netdev_compute_features
function was designed for conjunctions. So what happened was that
it would simply take the TSO status of the last constituent device.
This patch extends it to support both conjunctions and disjunctions
under the new name of netdev_increment_features.
It also adds a new function netdev_fix_features which does the
sanity checking that usually occurs upon registration. This ensures
that the computation doesn't result in an illegal combination
since this checking is absent when the change is initiated via
ethtool.
The two users of netdev_compute_features have been converted.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up the various different email addresses of mine listed in the code
to a single current and valid address. As Dave says his network merges
for 2.6.28 are now done this seems a good point to send them in where
they won't risk disrupting real changes.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for the Trailer switch tagging format. This is
another tagging that doesn't explicitly mark tagged packets with a
distinct ethertype, so that we need to add a similar hack in the
receive path as for the Original DSA tagging format.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Byron Bradley <byron.bbradley@gmail.com>
Tested-by: Tim Ellis <tim.ellis@mac.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the DSA switches currently in the field do not support the
Ethertype DSA tagging format that one of the previous patches added
support for, but only the original DSA tagging format.
The original DSA tagging format carries the same information as the
Ethertype DSA tagging format, but with the difference that it does not
have an ethertype field. In other words, when receiving a packet that
is tagged with an original DSA tag, there is no way of telling in
eth_type_trans() that this packet is in fact a DSA-tagged packet.
This patch adds a hook into eth_type_trans() which is only compiled in
if support for a switch chip that doesn't support Ethertype DSA is
selected, and which checks whether there is a DSA switch driver
instance attached to this network device which uses the old tag format.
If so, it sets the protocol field to ETH_P_DSA without looking at the
packet, so that the packet ends up in the right place.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Nicolas Pitre <nico@marvell.com>
Tested-by: Peter van Valderen <linux@ddcrew.com>
Tested-by: Dirk Teurlings <dirk@upexia.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Distributed Switch Architecture is a protocol for managing hardware
switch chips. It consists of a set of MII management registers and
commands to configure the switch, and an ethernet header format to
signal which of the ports of the switch a packet was received from
or is intended to be sent to.
The switches that this driver supports are typically embedded in
access points and routers, and a typical setup with a DSA switch
looks something like this:
+-----------+ +-----------+
| | RGMII | |
| +-------+ +------ 1000baseT MDI ("WAN")
| | | 6-port +------ 1000baseT MDI ("LAN1")
| CPU | | ethernet +------ 1000baseT MDI ("LAN2")
| |MIImgmt| switch +------ 1000baseT MDI ("LAN3")
| +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4")
| | | |
+-----------+ +-----------+
The switch driver presents each port on the switch as a separate
network interface to Linux, polls the switch to maintain software
link state of those ports, forwards MII management interface
accesses to those network interfaces (e.g. as done by ethtool) to
the switch, and exposes the switch's hardware statistics counters
via the appropriate Linux kernel interfaces.
This initial patch supports the MII management interface register
layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and
supports the "Ethertype DSA" packet tagging format.
(There is no officially registered ethertype for the Ethertype DSA
packet format, so we just grab a random one. The ethertype to use
is programmed into the switch, and the switch driver uses the value
of ETH_P_EDSA for this, so this define can be changed at any time in
the future if the one we chose is allocated to another protocol or
if Ethertype DSA gets its own officially registered ethertype, and
everything will continue to work.)
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Nicolas Pitre <nico@marvell.com>
Tested-by: Byron Bradley <byron.bbradley@gmail.com>
Tested-by: Tim Ellis <tim.ellis@mac.com>
Tested-by: Peter van Valderen <linux@ddcrew.com>
Tested-by: Dirk Teurlings <dirk@upexia.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_change_name and netdev_drivername should use const char on
parameters that are read-only input values. The strcpy to newname is
not needed since newname is not used later in function.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add support for keeping an additional character alias
associated with an network interface. This is useful for maintaining
the SNMP ifAlias value which is a user defined value. Routers use this
to hold information like which circuit or line it is connected to. It
is just an arbitrary text label on the network device.
There are two exposed interfaces with this patch, the value can be
read/written either via netlink or sysfs.
This could be maintained just by the snmp daemon, but it is more
generally useful for other management tools, and the kernel is good
place to act as an agreed upon interface to store it.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dst_input() was doing something completely absurd, looping
on skb->dst->input() if NET_XMIT_BYPASS was seen, but these
functions never return such an error.
And as a result plain ole' NET_XMIT_BYPASS has no more
references and can be completely killed off.
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy <kaber@trash.net> noticed:
"The other problem that affects all qdiscs supporting actions is
TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS
even though the packet is not queued, corrupting upper qdiscs'
qlen counters."
and later explained:
"The reason why it translates it at all seems to be to not increase
the drops counter. Within a single qdisc this could be avoided by
other means easily, upper qdiscs would still increase the counter
when we return anything besides NET_XMIT_SUCCESS though.
This means we need a new NET_XMIT return value to indicate this to
the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN,
return that to upper qdiscs and translate it to NET_XMIT_SUCCESS
in dev_queue_xmit, similar to NET_XMIT_BYPASS."
David Miller <davem@davemloft.net> noticed:
"Maybe these NET_XMIT_* values being passed around should be a set of
bits. They could be composed of base meanings, combined with specific
attributes.
So you could say "NET_XMIT_DROP | __NET_XMIT_NO_DROP_COUNT"
The attributes get masked out by the top-level ->enqueue() caller,
such that the base meanings are the only thing that make their
way up into the stack. If it's only about communication within the
qdisc tree, let's simply code it that way."
This patch is trying to realize these ideas.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When support for multiple TX queues were added, the
netif_tx_lock() routines we converted to iterate over
all TX queues and grab each queue's spinlock.
This causes heartburn for lockdep and it's not a healthy
thing to do with lots of TX queues anyways.
So modify this to use a top-level lock and a "frozen"
state for the individual TX queues.
Signed-off-by: David S. Miller <davem@davemloft.net>
The function header comments have to go with the functions
they are documenting, or things go horribly wrong when we
try to process them with the docbook tools.
Warning(include/linux/netdevice.h:1006): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1033): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1067): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1093): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1474): No description found for parameter 'txq'
Error(net/core/dev.c:1674): cannot understand prototype: 'u32 simple_tx_hashrnd; '
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Dave:
This patch adds a function to get the driver name from a struct net_device,
and consequently uses this in the watchdog timeout handler to print as
part of the message.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Idea is from Patrick McHardy.
Instead of managing the list of qdiscs on the device level, manage it
in the root qdisc of a netdev_queue. This solves all kinds of
visibility issues during qdisc destruction.
The way to iterate over all qdiscs of a netdev_queue is to visit
the netdev_queue->qdisc, and then traverse it's list.
The only special case is to ignore builting qdiscs at the root when
dumping or doing a qdisc_lookup(). That was not needed previously
because builtin qdiscs were not added to the device's qdisc_list.
Signed-off-by: David S. Miller <davem@davemloft.net>
When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.
Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.
Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently it is associated with a netdev_queue, but when we have
qdisc sharing that no longer makes any sense.
Signed-off-by: David S. Miller <davem@davemloft.net>
We liberate any dangling gso_skb during qdisc destruction.
It really only matters for the root qdisc. But when qdiscs
can be shared by multiple netdev_queue objects, we can't
have the gso_skb in the netdev_queue any more.
Signed-off-by: David S. Miller <davem@davemloft.net>
Devices or device layers can set this to control the queue selection
performed by dev_pick_tx().
This function runs under RCU protection, which allows overriding
functions to have some way of synchronizing with things like dynamic
->real_num_tx_queues adjustments.
This makes the spinlock prefetch in dev_queue_xmit() a little bit
less effective, but that's the price right now for correctness.
Signed-off-by: David S. Miller <davem@davemloft.net>
The private area of a netdev is now at a fixed offset once more.
Unfortunately, some assumptions that netdev_priv() == netdev->priv
crept back into the tree. In particular this happened in the
loopback driver. Make it use netdev->ml_priv.
Signed-off-by: David S. Miller <davem@davemloft.net>
This effectively "flips the switch" by making the core networking
and multiqueue-aware drivers use the new TX multiqueue structures.
Non-multiqueue drivers need no changes. The interfaces they use such
as netif_stop_queue() degenerate into an operation on TX queue zero.
So everything "just works" for them.
Code that really wants to do "X" to all TX queues now invokes a
routine that does so, such as netif_tx_wake_all_queues(),
netif_tx_stop_all_queues(), etc.
pktgen and netpoll required a little bit more surgery than the others.
In particular the pktgen changes, whilst functional, could be largely
improved. The initial check in pktgen_xmit() will sometimes check the
wrong queue, which is mostly harmless. The thing to do is probably to
invoke fill_packet() earlier.
The bulk of the netpoll changes is to make the code operate solely on
the TX queue indicated by by the SKB queue mapping.
Setting of the SKB queue mapping is entirely confined inside of
net/core/dev.c:dev_pick_tx(). If we end up needing any kind of
special semantics (drops, for example) it will be implemented here.
Finally, we now have a "real_num_tx_queues" which is where the driver
indicates how many TX queues are actually active.
With IGB changes from Jeff Kirsher.
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need for a feature bit for something that
can be tested by simply checking the TX queue count.
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.
Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces. This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.
Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add netif_addr_{lock,unlock}{,_bh}() helpers.
Use them to protect operations that operate on or read
the network device unicast and multicast address lists.
Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used to protect the per-device unicast and multicast
address lists, as well as the callbacks into the drivers which
configure such state such as ->set_rx_mode() and ->set_multicast_list().
Signed-off-by: David S. Miller <davem@davemloft.net>
When VLAN header stripping is used, packets currently bypass packet
sockets (and other network taps) completely. For locally existing
VLANs, they appear directly on the VLAN device, for unknown VLANs
they are silently dropped.
Add a new function netif_nit_deliver() to deliver incoming packets
to all network interface taps and use it in __vlan_hwaccel_rx() to
make VLAN packets visible on the underlying device.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds netif_napi_del function which is used to remove the napi struct from
the netdev napi_list in cases where CONFIG_NETPOLL was enabled.
The motivation for adding this is to handle the case in which the number of
queues on a device changes due to a configuration change. Previously the
napi structs for each queue would be left in the list until the netdev was
freed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Accesses are mostly structured such that when there are multiple TX
queues the code transformations will be a little bit simpler.
Signed-off-by: David S. Miller <davem@davemloft.net>
Only plain netif_schedule() remains taking a net_device, mostly as a
compatability item while we transition the rest of these interfaces.
Everything else calls netif_schedule_queue() or __netif_schedule(),
both of which take a netdev_queue pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that our qdisc management is bi-directional, per-queue, and fully
orthogonal, there is no reason to have a special ingress qdisc pointer
in struct net_device.
Signed-off-by: David S. Miller <davem@davemloft.net>
Every qdisc is assosciated with a queue, and in the case of ingress
qdiscs that will now be netdev->rx_queue so using that queue's lock is
the thing to do.
Signed-off-by: David S. Miller <davem@davemloft.net>
The lock is now an attribute of the device queue.
One thing to notice is that "suspicious" places
emerge which will need specific training about
multiple queue handling. They are so marked with
explicit "netdev->rx_queue" and "netdev->tx_queue"
references.
Signed-off-by: David S. Miller <davem@davemloft.net>
A netdev_queue is an entity managed by a qdisc.
Currently there is one RX and one TX queue, and a netdev_queue merely
contains a backpointer to the net_device.
The Qdisc struct is augmented with a netdev_queue pointer as well.
Eventually the 'dev' Qdisc member will go away and we will have the
resulting hierarchy:
net_device --> netdev_queue --> Qdisc
Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue
pointer argument.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an implementation of the GARP (Generic Attribute Registration Protocol)
applicant-only participant. This will be used by the following patch to
add GVRP support to the VLAN code.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the CONFIG_'s the value is anyway not correct in userspace.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Large Receive Offload (LRO) is only appropriate for packets that are
destined for the host, and should be disabled if received packets may be
forwarded. It can also confuse the GSO on output.
Add dev_disable_lro() function which uses the appropriate ethtool ops to
disable LRO if enabled.
Add calls to dev_disable_lro() in br_add_if() and functions that enable
IPv4 and IPv6 forwarding.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Max of promiscuity and allmulti plus positive @inc can cause overflow.
Fox example: when allmulti=0xFFFFFFFF, any caller give dev_set_allmulti() a
positive @inc will cause allmulti be off.
This is not what we want, though it's rare case.
The fix is that only negative @inc will cause allmulti or promiscuity be off
and when any caller makes the counters touch the roof, we return error.
Change of v2:
Change void function dev_set_promiscuity/allmulti to return int.
So callers can get the overflow error.
Caller's fix will be done later.
Change of v3:
1. Since we return error to caller, we don't need to print KERN_ERROR,
KERN_WARNING is enough.
2. In dev_set_promiscuity(), if __dev_set_promiscuity() failed, we
return at once.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The comments of dev_set_allmulti/promiscuity() is that "While the count in
the device remains above zero...". So negative count is useless.
Fix the type of the counter from "int" to "unsigned int".
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Permit bonding to function rationally if max_bonds is set to
zero. This will load the module, but create no master devices (which can
be created via sysfs).
Requires some change to bond_create_sysfs; currently, the
netdev sysfs directory is determined from the first bonding device created,
but this is no longer possible. Instead, an interface from net/core is
created to create and destroy files in net_class.
Based on a patch submitted by Phil Oester <kernel@linuxaces.com>.
Modified by Jay Vosburgh to fix the sysfs issue mentioned above and to
update the documentation.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add NETDEV_BONDING_FAILOVER event to be used in a successive patch
by bonding to announce fail-over for the active-backup mode through the
netdev events notifier chain mechanism. Such an event can be of use for the
RDMA CM (communication manager) to let native RDMA ULPs (eg NFS-RDMA, iSER)
always be aligned with the IP stack, in the sense that they use the same
ports/links as the stack does. More usages can be done to allow monitoring
tools based on netlink events being aware to bonding fail-over.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Herbert Xu points out that the use of seperate feature bits for features
to be propagated to VLAN devices is going to get messy real soon.
Replace the VLAN feature bits by a bitmask of feature flags to be
propagated and restore the old GSO_SHIFT/MASK values.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propagate feature bits from the NETDEV_FEAT_CHANGE notifier. For now
only TSO is propagated for devices that announce their ability to
support TSO in combination with VLAN accel by setting the NETIF_F_VLAN_TSO
flag.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds needed_headroom/needed_tailroom members to struct
net_device and updates many places that allocate sbks to use them. Not
all of them can be converted though, and I'm sure I missed some (I
mostly grepped for LL_RESERVED_SPACE)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wireless networking, particularly with MESH enabled, has
quite strong requirements for link-layer header space.
Based upon some numbers and descriptions from Johannes Berg
we use 96 (same as AX25) for plain wireless, and with
mesh enabled we use 128.
In the process, simplify the cpp conditional logic here by
ordering the cases by those needing the most space down
to those needing the least case.
Signed-off-by: David S. Miller <davem@davemloft.net>
The syncppp layer wants a mid-level netdev private pointer.
It was using netdev->priv but that only worked by accident,
and thus this scheme was broken when the device private
allocation strategy changed.
Add a proper mid-layer private pointer for uses like this,
update syncppp and all users, and remove the HDLC_PPP broken
tag from drivers/net/wan/Kconfig
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_set_net is called for
- just allocated devices
- devices moving from one namespace to another
release_net has proper check inside to distinguish these cases.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This does not look good, but there is no other choice. The compilation
without CONFIG_NET is broken and can not be fixed with ease.
After that there is no need for the following commits:
1567ca7eec3edf8fa5cc2d38f9a4f8
Revert them.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Haavard Skinnemoen and Stephen Rothwell:
> allnoconfig fails with
>
> include/linux/netdevice.h:843: error: implicit declaration of function 'dev_net'
>
> which seems to be because the definition of dev_net is inside #ifdef
> CONFIG_NET, while next_net_device, which calls it, is not.
Signed-off-by: David S. Miller <davem@davemloft.net>
Comment dev_kfree_skb_irq and dev_kfree_skb_any better.
Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include sites should not be bothered by whether
CONFIG_NET is set or not when trying to include
benign files like linux/etherdevice.h et al.
From a report by Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon a lockdep report.
Since ->poll() can be invoked from netpoll with interrupts
disabled, we must not unconditionally enable interrupts
in napi_complete().
Instead we must use local_irq_{save,restore}().
Noticed by Peter Zijlstra:
<irqs disabled>
netpoll_poll()
poll_napi()
spin_trylock(&napi->poll_lock)
poll_one_napi()
napi->poll() := sky2_poll()
napi_complete()
local_irq_disable()
local_irq_enable() <--- *BUG*
<irq>
irq_exit()
do_softirq()
net_rx_action()
spin_lock(&napi->poll_lock) <--- Deadlock!
Because we still hold the lock....
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent commits from YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
have been introduced a several compilation warnings
'assignment discards qualifiers from pointer target type'
due to extra const modifier in the inline call parameters of
{dev|sock|twsk}_net_set.
Drop it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit commit c346dca108
([NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS)
breaks compilation with CONFIG_NET_NS set.
Fix the typo.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Update: My mailer ate one of Jarek's feedback mails... Fixed the
parameter in netif_set_gso_max_size() to be u32, not u16. Fixed the
whitespace issue due to a patch import botch. Changed the types from
u32 to unsigned int to be more consistent with other variables in the
area. Also brought the patch up to the latest net-2.6.26 tree.
Update: Made gso_max_size container 32 bits, not 16. Moved the
location of gso_max_size within netdev to be less hotpath. Made more
consistent names between the sock and netdev layers, and added a
define for the max GSO size.
Update: Respun for net-2.6.26 tree.
Update: changed max_gso_frame_size and sk_gso_max_size from signed to
unsigned - thanks Stephen!
This patch adds the ability for device drivers to control the size of
the TSO frames being sent to them, per TCP connection. By setting the
netdevice's gso_max_size value, the socket layer will set the GSO
frame size based on that value. This will propogate into the TCP
layer, and send TSO's of that size to the hardware.
This can be desirable to help tune the bursty nature of TSO on a
per-adapter basis, where one may have 1 GbE and 10 GbE devices
coexisting in a system, one running multiqueue and the other not, etc.
This can also be desirable for devices that cannot support full 64 KB
TSO's, but still want to benefit from some level of segmentation
offloading.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits)
[NET]: Make sure sockets implement splice_read
netconsole: avoid null pointer dereference at show_local_mac()
[IPV6]: Fix reversed local_df test in ip6_fragment
[XFRM]: Avoid bogus BUG() when throwing new policy away.
[AF_KEY]: Fix bug in spdadd
[NETFILTER] nf_conntrack_proto_tcp.c: Mistyped state corrected.
net: xfrm statistics depend on INET
[NETFILTER]: make secmark_tg_destroy() static
[INET]: Unexport inet_listen_wlock
[INET]: Unexport __inet_hash_connect
[NET]: Improve cache line coherency of ingress qdisc
[NET]: Fix race in dev_close(). (Bug 9750)
[IPSEC]: Fix bogus usage of u64 on input sequence number
[RTNETLINK]: Send a single notification on device state changes.
[NETLABLE]: Hide netlbl_unlabel_audit_addr6 under ifdef CONFIG_IPV6.
[NETLABEL]: Don't produce unused variables when IPv6 is off.
[NETLABEL]: Compilation for CONFIG_AUDIT=n case.
[GENETLINK]: Relax dances with genl_lock.
[NETLABEL]: Fix lookup logic of netlbl_domhsh_search_def.
[IPV6]: remove unused method declaration (net/ndisc.h).
...
FASTCALL() is always expanded to empty, remove it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the ingress qdisc members of struct net_device from the transmit
cache line to the receive cache line to avoid cache line ping-pong.
These members are only used on the receive path.
Signed-off-by: Neil Turton <nturton@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reuse the existing logic for multicast list synchronization for the
unicast address list. The core of dev_mc_sync/unsync are split out as
__dev_addr_sync/unsync and moved from dev_mcast.c to dev.c. These are
then used to implement dev_unicast_sync/unsync as well.
I'm working on cleaning up Intel's FCoE stack, which generates new MAC
addresses from the fibre channel device id assigned by the fabric as
per the current draft specification in T11. When using such a
protocol in a VLAN environment it would be nice to not always be
forced into promiscuous mode, assuming the underlying Ethernet driver
supports multiple unicast addresses as well.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Create a bit to signal that a napi_disable() is in progress.
This sets up infrastructure such that net_rx_action() can generically
break out of the ->poll() loop on a NAPI context that has a pending
napi_disable() yet is being bombed with packets (and thus would
otherwise poll endlessly and not allow the napi_disable() to finish).
Now, what napi_disable() does is first set the NAPI_STATE_DISABLE bit
(to indicate that a disable is pending), then it polls for the
NAPI_STATE_SCHED bit, and once the NAPI_STATE_SCHED bit is acquired
the NAPI_STATE_DISABLE bit is cleared. Here, the test_and_set_bit()
provides the necessary memory barrier between the various bitops.
napi_schedule_prep() now tests for a pending disable as it's first
action and won't try to obtain the NAPI_STATE_SCHED bit if a disable
is pending.
As a result, we can remove the netif_running() check in
netif_rx_schedule_prep() because the NAPI disable pending state serves
this purpose. And, it does so in a NAPI centric manner which is what
we really want.
Signed-off-by: David S. Miller <davem@davemloft.net>
It is pointless, because everything that can make a device go away
will do a napi_disable() first.
The main impetus behind this is that now we can legally do a NAPI
completion in generic code like net_rx_action() which a following
changeset needs to do. net_rx_action() can only perform actions
in NAPI centric ways, because there may be a one to many mapping
between NAPI contexts and network devices (SKY2 is one example).
We also want to get rid of this because it's an extra atomic in the
NAPI paths, and also because it is one of the last instances where the
NAPI interfaces care about net devices.
The one remaining netdev detail the NAPI stuff cares about is the
netif_running() check which will be killed off in a subsequent
changeset.
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation updates for network interfaces.
1. Add doc for netif_napi_add
2. Remove doc for unused returns from netif_rx
3. Add doc for netif_receive_skb
[ Incorporated minor mods from Randy Dunlap -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current napi_disable() uses msleep_interruptible() but doesn't
(and can't) exit in case there's a signal, thus ending up doing a
hot spin without a cpu_relax. Use uninterruptible sleep instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove qeth bug caused by commit:
[NET]: Move hardware header operations out of netdevice.
This is the second part of the qeth header_ops patch, since
first patch sent 10/19 has been insufficient.
Nevertheless first patch is still valid and should be kept.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Many places get the queue_mapping field from skb to pass it to the
netif_subqueue_stopped() which will be 0 in any case.
Make the helper that works with sk_buff
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ This is kernel bugzilla 9174 "linux-2.6.23-git11 kernel panic" ]
The device in question is an IPv6-over-IPv4 tunnel, which doesn't have
any header_ops, so the crash happens in dev_parse_header when
dereferencing them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some drivers with shared NAPI need a synchronization barrier.
Also suggested by Benjamin Herrenschmidt for EMAC.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix networking code kernel-doc for newly added parameters.
Warning(linux-2.6.23-git2//net/core/sock.c:879): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:570): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:594): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:617): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:641): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:667): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:722): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:959): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:1195): No description found for parameter 'dev'
Warning(linux-2.6.23-git2//net/core/dev.c:2105): No description found for parameter 'n'
Warning(linux-2.6.23-git2//net/core/dev.c:3272): No description found for parameter 'net'
Warning(linux-2.6.23-git2//net/core/dev.c:3445): No description found for parameter 'net'
Warning(linux-2.6.23-git2//include/linux/netdevice.h:1301): No description found for parameter 'cpu'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Trivial fix: Swap comments for dev_put() and dev_hold() to get them
at the right place.
Typo introduced by 4fa57c9ea9f36f9ca852f3a88ca5d2f1aebbc960.
Signed-of-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit da3dedd9 ("[NET]: Make NAPI polling independent of struct
net_device objects.") changed the interface to NAPI polling. Fix up
the ibm_emac driver so that it works with this new interface. This is
actually a nice cleanup because ibm_emac is one of the drivers that
wants to have multiple NAPI structures for a single net_device.
Tested with the internal MAC of a PowerPC 440SPe SoC with an AMCC
'Yucca' evaluation board.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrap the hard_header_parse function to simplify next step of
header_ops conversion.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add inline for common usage of hardware header creation, and
fix bug in IPV6 mcast where the assumption about negative return is
an errno. Negative return from hard_header means not enough space
was available,(ie -N bytes).
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes loopback_dev per network namespace. Adding
code to create a different loopback device for each network
namespace and adding the code to free a loopback device
when a network namespace exits.
This patch modifies all users the loopback_dev so they
access it as init_net.loopback_dev, keeping all of the
code compiling and working. A later pass will be needed to
update the users to use something other than the initial network
namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Am Freitag, 21. September 2007 schrieb Herbert Xu:
> Please don't use LLTX in new drivers. We're trying to get rid
> of it since it's
>
> 1) unnecessary;
> 2) causes problems with AF_PACKET seeing things twice.
I suggest to document that LLTX is deprecated.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces all occurences to the static variable
loopback_dev to a pointer loopback_dev. That provides the
mindless, trivial, uninteressting change part for the dynamic
allocation for the loopback.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-By: Kirill Korotaev <dev@sw.ru>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces the void * parameter with a struct net_device * which
is what is actually required.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HARD_TX_LOCK micro is a nice aggregation that could be used
in other spots. move it to netdevice.h
Also makes sure the previously superflous cpu arguement is used.
Thanks to DaveM for the suggestions.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's been a useless no-op for long enough in 2.6 so I figured it's time to
remove it. The number of people that could object because they're
maintaining unified 2.4 and 2.6 drivers is probably rather small.
[ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The macro definition is bad. When calling next_net_device with
parameter name "dev", the resulting code is:
struct net_device *dev = dev and that leads to an unexpected
behavior. Especially when llc_core is compiled in, the kernel panics
at boot time.
The patchset change macro definition with static inline functions as
they were defined before.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate
a network device is local to a single network namespace and
should never be moved. Useful for pseudo devices that we
need an instance in each network namespace (like the loopback
device) and for any device we find that cannot handle multiple
network namespaces so we may trap them in the initial network
namespace.
This patch introduces the function dev_change_net_namespace
a function used to move a network device from one network
namespace to another. To the network device nothing
special appears to happen, to the components of the network
stack it appears as if the network device was unregistered
in the network namespace it is in, and a new device
was registered in the network namespace the device
was moved to.
This patch sets up a namespace device destructor that
upon the exit of a network namespace moves all of the
movable network devices to the initial network namespace
so they are not lost.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl
were modified to take a network namespace argument, and
deal with it.
vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.
So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.
For now the ifindex generator is left global.
Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.
At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Please note that network devices do not increase the count
count on the network namespace. The are inside the network
namespace and so the network namespace tag is in the nature
of a back pointer and so getting and putting the network namespace
is unnecessary.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.
In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.
The signature of the ->poll() call back goes from:
int foo_poll(struct net_device *dev, int *budget)
to
int foo_poll(struct napi_struct *napi, int budget)
The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract). The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.
The napi_struct is to be embedded in the device driver private data
structures.
Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler. Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.
With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
[ Ported to current tree and all drivers converted. Integrated
Stephen's follow-on kerneldoc additions, and restored poll_list
handling to the old style to fix mutual exclusion issues. -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
http://bugzilla.kernel.org/show_bug.cgi?id=8797 shows that the
bonding driver may produce bogus combinations of the checksum
flags and SG/TSO.
For example, if you bond devices with NETIF_F_HW_CSUM and
NETIF_F_IP_CSUM you'll end up with a bonding device that
has neither flag set. If both have TSO then this produces
an illegal combination.
The bridge device on the other hand has the correct code to
deal with this.
In fact, the same code can be used for both. So this patch
moves that logic into net/core/dev.c and uses it for both
bonding and bridging.
In the process I've made small adjustments such as only
setting GSO_ROBUST if at least one constituent device
supports it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because this function is only called by unregister_netdevice,
this moving could make this non-global function static,
and also remove its declaration in netdevice.h;
Any further, function __dev_addr_discard is also just called by
dev_mc_discard and dev_unicast_discard, keeping this two functions
both in one c file could make __dev_addr_discard also static
and remove its declaration in netdevice.h;
Futhermore, the sequential call to dev_unicast_discard and then
dev_mc_discard in unregister_netdevice have a similar mechanism that:
(netif_tx_lock_bh / __dev_addr_discard / netif_tx_unlock_bh),
they should merged into one to eliminate duplicates in acquiring and
releasing the dev->_xmit_lock, this would be done in my following patch.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 29578624e3.
Ingo Molnar reports complete breakage with his e1000 card (no
networking, card reports transmit timeouts), and bisected it down to
this commit. Let's figure out what went wrong, but not keep breaking
machines until we do.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Olaf Kirch <olaf.kirch@oracle.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add macvlan driver, which allows to create virtual ethernet devices
based on MAC address.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The method drivers currently use to synchronize multicast lists is not
very pretty:
- walk the multicast list
- search each entry on a copy of the previous list
- if new add to lower device
- walk the copy of the previous list
- search each entry on the current list
- if removed delete from lower device
- copy entire list
This patch adds a new field to struct dev_addr_list to store the
synchronization state and adds two helper functions for synchronization
and cleanup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the set_multicast_list (and set_rx_mode) callbacks are
responsible for configuring the device according to the IFF_PROMISC,
IFF_MULTICAST and IFF_ALLMULTI flags and the mc_list (and uc_list in
case of set_rx_mode).
These callbacks can be invoked from BH context without the rtnl_mutex
by dev_mc_add/dev_mc_delete, which makes reading the device flags and
promiscous/allmulti count racy. For real hardware drivers that just
commit all changes to the hardware this is not a real problem since
the stack guarantees to call them for every change, so at least the
final call will not race and commit the correct configuration to the
hardware.
For software devices that want to synchronize promiscous and multicast
state to an underlying device however this can cause corruption of the
underlying device's flags or promisc/allmulti counts.
When the software device is concurrently put in promiscous or allmulti
mode while set_multicast_list is invoked from bottem half context, the
device might synchronize the change to the underlying device without
holding the rtnl_mutex, which races with concurrent changes to the
underlying device.
Add a dev->change_rx_flags hook that is invoked when any of the flags
that affect rx filtering change (under the rtnl_mutex), which allows
drivers to perform synchronization immediately and only synchronize
the address lists in set_multicast_list/set_rx_mode.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep netpoll/poll_napi from messing with the poll_list.
Only net_rx_action is allowed to manipulate the list.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a reference to an existing address is increased or decreased without
hitting zero, the address count is incorrectly adjusted.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the multiqueue hardware device support API to the core network
stack. Allow drivers to allocate multiple queues and manage them at
the netdev level if they choose to do so.
Added a new field to sk_buff, namely queue_mapping, for drivers to
know which tx_ring to select based on OS classification of the flow.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for configuring secondary unicast addresses on network
devices. To support this devices capable of filtering multiple
unicast addresses need to change their set_multicast_list function
to configure unicast filters as well and assign it to dev->set_rx_mode
instead of dev->set_multicast_list. Other devices are put into promiscous
mode when secondary unicast addresses are present.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use generic net_device address lists for multicast list handling.
Some defines are used to keep drivers working.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce struct dev_addr_list and list maintenance functions
based on dev_mc_list and the related functions. This will be
used by follow-up patches for both multicast and secondary
unicast addresses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies device can do any arbitrary protocol.
This patch:
* adds NETIF_F_IPV6_CSUM for those devices
* fixes bnx2 and tg3 devices that need it
* add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
* fixes assumptions about NETIF_F_ALL_CSUM in nat
* adjusts bridge union of checksumming computation
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rtnetlink API for creating, changing and deleting software devices.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch kills a little bit of code duplication between the two
variants of netif_rx_complete.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
These days the link watch mechanism is an integral part of the
network subsystem as it manages the carrier status. So it now
makes sense to allocate some memory for it in net_device rather
than allocating it on demand.
In fact, this is necessary because we can't tolerate a memory
allocation failure since that means we'd have to potentially
throw a link up event away.
It also simplifies the code greatly.
In doing so I discovered a subtle race condition in the use
of singleevent. This race condition still exists (and is
somewhat magnified) without singleevent but it's now plugged
thanks to an smp_mb__before_clear_bit.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __dev_getfirstbyhwtype for callers that don't want a reference but
some data from the device and thus need to take the rtnl anyway.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu conviced me that a new flag was overkill; every driver
currently overrides get_stats, so we might as well make the internal
one the default. If someone did fail to set get_stats, they would now
get all 0 stats instead of "No statistics available".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_NETPOLL_TRAP causes the TX queue controls to be completely bypassed in
the netpoll's "trapped" mode which easily causes overflows in the drivers with
short TX queues (most notably, in 8139too with its 4-deep queue). So, make
this option more sensible by making it only bypass the TX softirq wakeup.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Acked-by: Tom Rini <trini@kernel.crashing.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the wext bits in struct net_device depend on
CONFIG_WIRELESS_EXT.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch creates the core cfg80211 code along with some sysfs bits.
This is a stripped down version to allow mac80211 to function, but
doesn't include any configuration yet except for creating and removing
virtual interfaces.
This patch includes the nl80211 header file but it only contains the
interface types which the cfg80211 interface for creating virtual
interfaces relies on.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Network drivers which keep stats allocate their own stats structure
then write a get_stats() function to return them. It would be nice if
this were done by default.
1) Add a new "stats" field to "struct net_device".
2) Add a new feature field to say "this driver uses the internal one"
3) Have a default "get_stats" which returns NULL if that feature not set.
4) Change callers to check result of get_stats call for NULL, not if
->get_stats is set.
This should not break backwards compatibility with older drivers, yet
allow modern drivers to shed some boilerplate code.
Lightly tested: works for a modified lguest network driver.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
There was no real useful information from the unregister_netdevice() return
code, the only error occurred in a situation that was a driver bug. So
change it to a void function.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This lets the network core have the ability to handle suspend/resume
issues, if it wants to.
Thanks to Frederik Deweerdt <frederik.deweerdt@gmail.com> for the arm
driver fixes.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a driver writer calls this, they generally expect that
all previous stores and modifications they've made will be
visible before netif_poll_enable() executes, so ensure this.
Noticed by Ben H.
Signed-off-by: David S. Miller <davem@davemloft.net>
hh_lock was converted from rwlock to seqlock by Stephen.
To have a 100% benefit of this change, I suggest to place read mostly fields
of hh_cache in a separate cache line, because hh_refcnt may be changed quite
frequently on some busy machines.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hard header cache is in the main output path, so using
seqlock instead of reader/writer lock should reduce overhead.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Throughout the TCP/DCCP (and tunnelling) code, it often happens that the
return code of a transmit function needs to be tested against NET_XMIT_CN
which is a value that does not indicate a strict error condition.
This patch uses a macro for these recurring situations which is consistent
with the already existing macro net_xmit_errno, saving on duplicated code.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This patch contains the scheduled removal of the frame diverter.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
MAX_HEADER is either set to LL_MAX_HEADER or LL_MAX_HEADER + 48, and
this is controlled by a set of CONFIG_* ifdef tests.
It is trying to use LL_MAX_HEADER + 48 when any of the tunnels are
enabled which set hard_header_len like this:
dev->hard_header_len = LL_MAX_HEADER + sizeof(struct xxx);
The correct set of tunnel drivers which do this are:
ipip
ip_gre
ip6_tunnel
sit
so make the ifdef test match.
Noticed by Patrick McHardy and with help from Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add logic to check ARP request / reply packets used for ARP
monitor link integrity checking.
The current method simply examines the slave device to see if it
has sent and received traffic; this can be fooled by extraneous traffic.
For example, if multiple hosts running bonding are behind a common
switch, the probe traffic from the multiple instances of bonding will
update the tx/rx times on each other's slave devices.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This is version 21 of the Wireless Extensions. Changelog :
o finishes migrating the ESSID API (remove the +1)
o netdev->get_wireless_stats is no more
o long/short retry
This is a redacted version of a patch originally submitted by Jean
Tourrilhes. I removed most of the additions, in order to minimize
future support requirements for nl80211 (or other WE successor).
CC: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (217 commits)
net/ieee80211: fix more crypto-related build breakage
[PATCH] Spidernet: add ethtool -S (show statistics)
[NET] GT96100: Delete bitrotting ethernet driver
[PATCH] mv643xx_eth: restrict to 32-bit PPC_MULTIPLATFORM
[PATCH] Cirrus Logic ep93xx ethernet driver
r8169: the MMIO region of the 8167 stands behin BAR#1
e1000, ixgb: Remove pointless wrappers
[PATCH] Remove powerpc specific parts of 3c509 driver
[PATCH] s2io: Switch to pci_get_device
[PATCH] gt96100: move to pci_get_device API
[PATCH] ehea: bugfix for register access functions
[PATCH] e1000 disable device on PCI error
drivers/net/phy/fixed: #if 0 some incomplete code
drivers/net: const-ify ethtool_ops declarations
[PATCH] ethtool: allow const ethtool_ops
[PATCH] sky2: big endian
[PATCH] sky2: fiber support
[PATCH] sky2: tx pause bug fix
drivers/net: Trim trailing whitespace
[PATCH] ehea: IBM eHEA Ethernet Device Driver
...
Manually resolved conflicts in drivers/net/ixgb/ixgb_main.c and
drivers/net/sky2.c related to CHECKSUM_HW/CHECKSUM_PARTIAL changes by
commit 84fa7933a3 that just happened to be
next to unrelated changes in this update.
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).
Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool_ops structure is immutable, it expected to be setup
by the driver and is never changed. This patch allows drivers to
declare there ethtool_ops structure read-only.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
When the bridge recomputes features, it does not maintain the
constraint that SG/GSO must be off if TX checksum is off.
This patch adds that constraint.
On a completely unrelated note, I've also added TSO6 and TSO_ECN
feature bits if GSO is enabled on the underlying device through
the new NETIF_F_GSO_SOFTWARE macro.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since __vlan_hwaccel_rx() is essentially bypassing the
netif_receive_skb() call that would have occurred if we did the VLAN
decapsulation in software, we are missing the skb_bond() call and the
assosciated checks it does.
Export those checks via an inline function, skb_bond_should_drop(),
and use this in __vlan_hwaccel_rx().
Signed-off-by: David S. Miller <davem@davemloft.net>
A non-zero return value indicates success from spin_trylock,
not error.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Certain subsystems in the stack (e.g., netfilter) can break the partial
checksum on GSO packets. Until they're fixed, this patch allows this to
work by recomputing the partial checksums through the GSO mechanism.
Once they've all been converted to update the partial checksum instead of
clearing it, this workaround can be removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the wrapper function skb_is_gso which can be used instead
of directly testing skb_shinfo(skb)->gso_size. This makes things a little
nicer and allows us to change the primary key for indicating whether an skb
is GSO (if we ever want to do that).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds GSO support for IPv6 and TCPv6. This is based on a patch
by Ananda Raju <Ananda.Raju@neterion.com>. His original description is:
This patch enables TSO over IPv6. Currently Linux network stacks
restricts TSO over IPv6 by clearing of the NETIF_F_TSO bit from
"dev->features". This patch will remove this restriction.
This patch will introduce a new flag NETIF_F_TSO6 which will be used
to check whether device supports TSO over IPv6. If device support TSO
over IPv6 then we don't clear of NETIF_F_TSO and which will make the
TCP layer to create TSO packets. Any device supporting TSO over IPv6
will set NETIF_F_TSO6 flag in "dev->features" along with NETIF_F_TSO.
In case when user disables TSO using ethtool, NETIF_F_TSO will get
cleared from "dev->features". So even if we have NETIF_F_TSO6 we don't
get TSO packets created by TCP layer.
SKB_GSO_TCPV4 renamed to SKB_GSO_TCP to make it generic GSO packet.
SKB_GSO_UDPV4 renamed to SKB_GSO_UDP as UFO is not a IPv4 feature.
UFO is supported over IPv6 also
The following table shows there is significant improvement in
throughput with normal frames and CPU usage for both normal and jumbo.
--------------------------------------------------
| | 1500 | 9600 |
| ------------------|-------------------|
| | thru CPU | thru CPU |
--------------------------------------------------
| TSO OFF | 2.00 5.5% id | 5.66 20.0% id |
--------------------------------------------------
| TSO ON | 2.63 78.0 id | 5.67 39.0% id |
--------------------------------------------------
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch generalises the TSO-specific bits from sk_setup_caps by adding
the sk_gso_type member to struct sock. This makes sk_setup_caps generic
so that it can be used by TCPv6 or UFO.
The only catch is that whoever uses this must provide a GSO implementation
for their protocol which I think is a fair deal :) For now UFO continues to
live without a GSO implementation which is OK since it doesn't use the sock
caps field at the moment.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection. The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet. Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.
This patch adds the ability to turn on NETIF_F_TSO and ECN using
GSO if necessary to handle TSO packets with CWR set. Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev->
features flag.
All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set. If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.
With help from Herbert Xu <herbert@gondor.apana.org.au>.
Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock
flag is completely removed.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test in skb_gso_ok is backwards. Noticed by Michael Chan
<mchan@broadcom.com>.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When GSO packets come from an untrusted source (e.g., a Xen guest domain),
we need to verify the header integrity before passing it to the hardware.
Since the first step in GSO is to verify the header, we can reuse that
code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this
bit set can only be fed directly to devices with the corresponding bit
NETIF_F_GSO_ROBUST. If the device doesn't have that bit, then the skb
is fed to the GSO engine which will allow the packet to be sent to the
hardware if it passes the header check.
This patch changes the sg flag to a full features flag. The same method
can be used to implement TSO ECN support. We simply have to mark packets
with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment
the packet, or segment the first MTU and pass the rest to the hardware for
further segmentation.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a generic segmentation offload toggle that can be turned
on/off for each net device. For now it only supports in TCPv4.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the infrastructure for generic segmentation offload.
The idea is to tap into the potential savings of TSO without hardware
support by postponing the allocation of segmented skb's until just
before the entry point into the NIC driver.
The same structure can be used to support software IPv6 TSO, as well as
UFO and segmentation offload for other relevant protocols, e.g., DCCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP). So
let's merge them.
They were used to tell the protocol of a packet. This function has been
subsumed by the new gso_type field. This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb. As such it's easy to tell whether a given device can process a GSO
skb: you just have to and the gso_type field and the netdev's features
field.
I've made gso_type a conjunction. The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4. This means that only the CWR packets need
to be emulated in software.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.infradead.org/hdrcleanup-2.6: (63 commits)
[S390] __FD_foo definitions.
Switch to __s32 types in joystick.h instead of C99 types for consistency.
Add <sys/types.h> to headers included for userspace in <linux/input.h>
Move inclusion of <linux/compat.h> out of user scope in asm-x86_64/mtrr.h
Remove struct fddi_statistics from user view in <linux/if_fddi.h>
Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390
Revert include/media changes: Mauro says those ioctls are only used in-kernel(!)
Include <linux/types.h> and use __uXX types in <linux/cramfs_fs.h>
Use __uXX types in <linux/i2o_dev.h>, include <linux/ioctl.h> too
Remove private struct dx_hash_info from public view in <linux/ext3_fs.h>
Include <linux/types.h> and use __uXX types in <linux/affs_hardblocks.h>
Use __uXX types in <linux/divert.h> for struct divert_blk et al.
Use __u32 for elf_addr_t in <asm-powerpc/elf.h>, not u32. It's user-visible.
Remove PPP_FCS from user view in <linux/ppp_defs.h>, remove __P mess entirely
Use __uXX types in user-visible structures in <linux/nbd.h>
Don't use 'u32' in user-visible struct ip_conntrack_old_tuple.
Use __uXX types for S390 DASD volume label definitions which are user-visible
S390 BIODASDREADCMB ioctl should use __u64 not u64 type.
Remove unneeded inclusion of <linux/time.h> from <linux/ufs_fs.h>
Fix private integer types used in V4L2 ioctls.
...
Manually resolve conflict in include/linux/mtd/physmap.h
Having two or more qdisc_run's contend against each other is bad because
it can induce packet reordering if the packets have to be requeued. It
appears that this is an unintended consequence of relinquinshing the queue
lock while transmitting. That in turn is needed for devices that spend a
lot of time in their transmit routine.
There are no advantages to be had as devices with queues are inherently
single-threaded (the loopback device is not but then it doesn't have a
queue).
Even if you were to add a queue to a parallel virtual device (e.g., bolt
a tbf filter in front of an ipip tunnel device), you would still want to
process the queue in sequence to ensure that the packets are ordered
correctly.
The solution here is to steal a bit from net_device to prevent this.
BTW, as qdisc_restart is no longer used by anyone as a module inside the
kernel (IIRC it used to with netif_wake_queue), I have not exported the
new __qdisc_run function.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM
identically so we test for them in quite a few places. For the sake
of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two. We
also test the disjunct of NETIF_F_IP_CSUM and the other two in various
places, for that purpose I've added NETIF_F_ALL_CSUM.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various drivers use xmit_lock internally to synchronise with their
transmission routines. They do so without setting xmit_lock_owner.
This is fine as long as netpoll is not in use.
With netpoll it is possible for deadlocks to occur if xmit_lock_owner
isn't set. This is because if a printk occurs while xmit_lock is held
and xmit_lock_owner is not set can cause netpoll to attempt to take
xmit_lock recursively.
While it is possible to resolve this by getting netpoll to use
trylock, it is suboptimal because netpoll's sole objective is to
maximise the chance of getting the printk out on the wire. So
delaying or dropping the message is to be avoided as much as possible.
So the only alternative is to always set xmit_lock_owner. The
following patch does this by introducing the netif_tx_lock family of
functions that take care of setting/unsetting xmit_lock_owner.
I renamed xmit_lock to _xmit_lock to indicate that it should not be
used directly. I didn't provide irq versions of the netif_tx_lock
functions since xmit_lock is meant to be a BH-disabling lock.
This is pretty much a straight text substitution except for a small
bug fix in winbond. It currently uses
netif_stop_queue/spin_unlock_wait to stop transmission. This is
unsafe as an IRQ can potentially wake up the queue. So it is safer to
use netif_tx_disable.
The hamradio bits used spin_lock_irq but it is unnecessary as
xmit_lock must never be taken in an IRQ handler.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Attempts to allocate per-CPU DMA channels
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last step of netdevice registration was being done by a delayed
call, but because it was delayed, it was impossible to return any error
code if the class_device registration failed.
Side effects:
* one state in registration process is unnecessary.
* register_netdevice can sleep inside class_device registration/hotplug
* code in netdev_run_todo only does unregistration so it is simpler.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a backout of earlier patch.
The whole rescheduling hack was a bad idea. It doesn't really solve
the problem and it makes the code more complicated for no good reason.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Atomically create attributes when class device is added. This avoids
the race between registering class_device (which generates hotplug
event), and the creation of attribute groups.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a workaround for the case edge-triggered irq's. Several users
seem to have broken configurations sharing edge-triggered irq's. To avoid
losing IRQ's, reshedule if more work arrives.
The changes to netdevice.h are to extract the part that puts device
back in list into separate inline.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
On a allyesconfig'ured kernel:
Size Uses Wasted Name and definition
===== ==== ====== ================================================
95 162 12075 netif_wake_queue include/linux/netdevice.h
129 86 9265 dev_kfree_skb_any include/linux/netdevice.h
127 56 5885 netif_device_attach include/linux/netdevice.h
73 86 4505 dev_kfree_skb_irq include/linux/netdevice.h
46 60 1534 netif_device_detach include/linux/netdevice.h
119 16 1485 __netif_rx_schedule include/linux/netdevice.h
143 5 492 netif_rx_schedule include/linux/netdevice.h
81 7 366 netif_schedule include/linux/netdevice.h
netif_wake_queue is big because __netif_schedule is a big inline:
static inline void __netif_schedule(struct net_device *dev)
{
if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
unsigned long flags;
struct softnet_data *sd;
local_irq_save(flags);
sd = &__get_cpu_var(softnet_data);
dev->next_sched = sd->output_queue;
sd->output_queue = dev;
raise_softirq_irqoff(NET_TX_SOFTIRQ);
local_irq_restore(flags);
}
}
static inline void netif_wake_queue(struct net_device *dev)
{
#ifdef CONFIG_NETPOLL_TRAP
if (netpoll_trap())
return;
#endif
if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
__netif_schedule(dev);
}
By de-inlining __netif_schedule we are saving a lot of text
at each callsite of netif_wake_queue and netif_schedule.
__netif_rx_schedule is also big, and it makes more sense to keep
both of them out of line.
Patch also deinlines dev_kfree_skb_any. We can deinline dev_kfree_skb_irq
instead... oh well.
netif_device_attach/detach are not hot paths, we can deinline them too.
Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of the old __dev_put macro that is just a hold over from pre 2.6
kernel. And turn dev_hold into an inline instead of a macro.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
this patch adds a dormant flag to network devices, RFC2863 operstate derived
from these flags and possibility for userspace interaction. It allows drivers
to signal that a device is unusable for user traffic without disabling
queueing (and therefore the possibility for protocol establishment traffic to
flow) and a userspace supplicant (WPA, 802.1X) to mark a device unusable
without changes to the driver.
It is the result of our long discussion. However I must admit that it
represents what Jamal and I agreed on with compromises towards Krzysztof, but
Thomas and Krzysztof still disagree with some parts. Anyway I think it should
be applied.
Signed-off-by: Stefan Rompf <stefan@loplof.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
I didn't like the name netif_rx_schedule_test(), in earlier patches
and changed to __netif_rx_schedule_prep to be more consistent.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
dev_valid_name() is a useful function. Make it public.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Here is the patch that introduces the generic skb_checksum_complete
which also checks for hardware RX checksum faults. If that happens,
it'll call netdev_rx_csum_fault which currently prints out a stack
trace with the device name. In future it can turn off RX checksum.
I've converted every spot under net/ that does RX checksum checks to
use skb_checksum_complete or __skb_checksum_complete with the
exceptions of:
* Those places where checksums are done bit by bit. These will call
netdev_rx_csum_fault directly.
* The following have not been completely checked/converted:
ipmr
ip_vs
netfilter
dccp
This patch is based on patches and suggestions from Stephen Hemminger
and David S. Miller.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Attached is kernel patch for UDP Fragmentation Offload (UFO) feature.
1. This patch incorporate the review comments by Jeff Garzik.
2. Renamed USO as UFO (UDP Fragmentation Offload)
3. udp sendfile support with UFO
This patches uses scatter-gather feature of skb to generate large UDP
datagram. Below is a "how-to" on changes required in network device
driver to use the UFO interface.
UDP Fragmentation Offload (UFO) Interface:
-------------------------------------------
UFO is a feature wherein the Linux kernel network stack will offload the
IP fragmentation functionality of large UDP datagram to hardware. This
will reduce the overhead of stack in fragmenting the large UDP datagram to
MTU sized packets
1) Drivers indicate their capability of UFO using
dev->features |= NETIF_F_UFO | NETIF_F_HW_CSUM | NETIF_F_SG
NETIF_F_HW_CSUM is required for UFO over ipv6.
2) UFO packet will be submitted for transmission using driver xmit routine.
UFO packet will have a non-zero value for
"skb_shinfo(skb)->ufo_size"
skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
fragment going out of the adapter after IP fragmentation by hardware.
skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
indicating that hardware has to do checksum calculation. Hardware should
compute the UDP checksum of complete datagram and also ip header checksum of
each fragmented IP packet.
For IPV6 the UFO provides the fragment identification-id in
skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating
IPv6 fragments.
Signed-off-by: Ananda Raju <ananda.raju@neterion.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (forwarded)
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Place them on separate cache lines in SMP to lower memory bouncing
between multiple CPU accessing the device.
- One part is mostly used on receive path (including
eth_type_trans()) (poll_list, poll, quota, weight, last_rx,
dev_addr, broadcast)
- One part is mostly used on queue transmit path (qdisc)
(queue_lock, qdisc, qdisc_sleeping, qdisc_list, tx_queue_len)
- One part is mostly used on xmit path (device)
(xmit_lock, xmit_lock_owner, priv, hard_start_xmit, trans_start)
'features' is placed outside of these hot points, in a location that
may be shared by all cpus (because mostly read)
name_hlist is moved close to name[IFNAMSIZ] to speedup __dev_get_by_name()
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is revised patch against netdev sky2 branch.
It includes whitespace fixes, all the changes from the previous
review as well as some optimizations and timing fixes to
solve some of the hangs.
The stall problem is better but not perfect. It appears that
under stress the chip can't keep up with the bus
and sends a pause frame, then hangs. This version is for
testing, and hopefully other eyes might see the root
cause of the problem.
I don't want to reinvent the ugly watchdog code in the syskonnect
version of sk98lin. If you read it you will see, the original
driver writer and the hardware developer obviously didn't
understand each other.
Dual port support is included, but not tested yet. It did
require small change to NAPI since both ports share same
IRQ.
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Use schedule_timeout_interruptible() instead of
set_current_state()/schedule_timeout() to reduce kernel size.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch adds a new field to net device to hold the permanent
hardware address, and adds a new generic ethtool_op function to
get that address.
Signed-off-by: Jon Wetzel <jon_wetzel@dell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Of this type, mostly:
CHECK net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# grep -r 'netif_carrier_o[nf]' linux-2.6.12 | wc -l
246
# size vmlinux.org vmlinux.carrier
text data bss dec hex filename
4339634 1054414 259296 5653344 564360 vmlinux.org
4337710 1054414 259296 5651420 563bdc vmlinux.carrier
And this ain't an allyesconfig kernel!
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding just wants the device before the skb_bond()
decapsulation occurs, so simply pass that original
device into packet_type->func() as an argument.
It remains to be seen whether we can use this same
exact thing to get rid of skb->input_dev as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate the throttling behaviour when the netif receive queue fills
because it behaves badly when using high speed networks under load.
The throttling cause multiple packet drops that cause TCP to go into
slow start mode. The same effective patch has been part of BIC TCP and
H-TCP as well as part of Web100.
The existing code drops 100's of packets when the queue fills;
this changes it to individual packet drop-tail.
Signed-off-by: Stephen Hemmminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the congestion sensing mechanism from netif_rx, and always
return either full or empty. Almost no driver checks the return value
from netif_rx, and those that do only use it for debug messages.
The original design of netif_rx was to do flow control based on the
receive queue, but NAPI has supplanted this and no driver uses the
feedback.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove last vestiages of fastroute code that is no longer used.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch creates a new kstrdup library function and changes the "local"
implementations in several places to use this function.
Most of the changes come from the sound and net subsystems. The sound part
had already been acknowledged by Takashi Iwai and the net part by David S.
Miller.
I left UML alone for now because I would need more time to read the code
carefully before making changes there.
Signed-off-by: Paulo Marques <pmarques@grupopie.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces a netpoll_info structure, which the struct net_device
will now point to instead of pointing to a struct netpoll. The reason for
this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock
should be maintained per net_device, not per netpoll; and 2) this is a first
step in providing support for multiple netpoll clients to register against the
same net_device.
The struct netpoll is now pointed to by the netpoll_info structure. As
such, the previous behaviour of the code is preserved.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the hardware header size is a multiple of HH_DATA_MOD, HH_DATA_OFF()
incorrectly returns HH_DATA_MOD (instead of 0). This affects ieee80211 layer
as 802.11 header is 32 bytes long.
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The features field in netdevice is really a bitmask, and bitmask's should
be unsigned.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resend of earlier patch (no changes) from Catalin used to provide
device feature change notification.
Signed-off-by: Catalin BOIE <catab at umbrella.ro>
Acked-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Everybody does
struct packet_type foo_packet_type = {
.type = __constant_htons(ETH_P_FOO);
};
5 introduced warnings will be properly fixed later.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ross moved. Remove the bad email address so people will find the correct
one in ./CREDITS.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!