Trivial sparse detected functions that should be static.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the number of calls required to alloc
a zeroed block of memory.
Trivially reduces overall object size.
Other changes around these removals
o Neaten call argument alignment
o Remove an unnecessary OOM message after dma_alloc_coherent failure
o Remove unnecessary gfp_t stack variable
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using TX push when notifying the NIC of multiple new descriptors in
the ring will very occasionally cause the TX DMA engine to re-use an
old descriptor. This can result in a duplicated or partly duplicated
packet (new headers with old data), or an IOMMU page fault. This does
not happen when the pushed descriptor is the only one written.
TX push also provides little latency benefit when a packet requires
more than one descriptor.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Allocating 2 buffers per page is insanely inefficient when MTU is 1500
and PAGE_SIZE is 64K (as it usually is on POWER). Allocate as many as
we can fit, and choose the refill batch size at run-time so that we
still always use a whole page at once.
[bwh: Fix loop condition to allow for compound pages; rebase]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On POWER systems, DMA mapping/unmapping operations are very expensive.
These changes reduce these costs by trying to reuse DMA mapped pages.
After all the buffers associated with a page have been processed and
passed up, the page is placed into a ring (if there is room). For
each page that is required for a refill operation, a page in the ring
is examined to determine if its page count has fallen to 1, ie. the
kernel has released its reference to these packets. If this is the
case, the page can be immediately added back into the RX descriptor
ring, without having to re-map it for DMA.
If the kernel is still holding a reference to this page, it is removed
from the ring and unmapped for DMA. Then a new page, which can
immediately be used by RX buffers in the descriptor ring, is allocated
and DMA mapped.
The time a page needs to spend in the recycle ring before the kernel
has released its page references is based on the number of buffers
that use this page. As large pages can hold more RX buffers, the RX
recycle ring can be shorter. This reduces memory usage on POWER
systems, while maintaining the performance gain achieved by recycling
pages, following the driver change to pack more than two RX buffers
into large pages.
When an IOMMU is not present, the recycle ring can be small to reduce
memory usage, since DMA mapping operations are inexpensive.
With a small recycle ring, attempting to refill the descriptor queue
with more buffers than the equivalent size of the recycle ring could
ultimately lead to memory leaks if page entries in the recycle ring
were overwritten. To prevent this, the check to see if the recycle
ring is full is changed to check if the next entry to be written is
NULL.
[bwh: Combine and rebase several commits so this is complete
before the following buffer-packing changes. Remove module
parameter.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Enable RX DMA scattering iff an RX buffer large enough for the current
MTU will not fit into a single page and the NIC supports DMA
scattering for kernel-mode RX queues.
On Falcon and Siena, the RX_USR_BUF_SIZE field is used as the DMA
limit for both all RX queues with scatter enabled. Set it to 1824,
matching what Onload uses now.
Maintain a statistic for frames truncated due to lack of descriptors
(rx_nodesc_trunc). This is distinct from rx_frm_trunc which may be
incremented when scattering is disabled and implies an over-length
frame.
Whenever an MTU change causes scattering to be turned on or off,
update filters that point to the PF queues, but leave others
unchanged, as VF drivers assume scattering is off.
Add n_frags parameters to various functions, and make them iterate:
- efx_rx_packet()
- efx_recycle_rx_buffers()
- efx_rx_mk_skb()
- efx_rx_deliver()
Make efx_handle_rx_event() responsible for updating
efx_rx_queue::removed_count.
Change the RX pipeline state to a starting ring index and number of
fragments, and make __efx_rx_packet() responsible for clearing it.
Based on earlier versions by David Riddoch and Jon Cooper.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Adjust rx_buf->page_offset when we eat the RX hash prefix. Remove
efx_rx_buf_offset(), which is now redundant.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently we prefetch from the Ethernet header, but we will also read
the hash prefix. In practice they should be in the same cache line
and this won't hurt, but it is still pointless to add on the hash
prefix size.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_rx_buf_va() returns the virtual address of the current start of
the buffer. The callers must add the hash prefix size themselves.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The pipeline mechanism will need to change a bit for scattered
packets. Add a wrapper to insulate efx_process_channel() from this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The Linux side of EEH is triggered by MMIO reads, but this
driver's data path does not issue any MMIO reads (except in
legacy interrupt mode). Therefore add a monitor function
to poll EEH periodically.
When preparing to reset the device based on our own error
detection, also poll EEH and defer to its recovery mechanism
if appropriate.
[bwh: Use a separate condition for the initial link poll; fix some
style errors]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On Siena, VFs share RSS configuration with the PF. We attempted to
support configurations where the PF only uses 1 RX queue and VFs use
multiple RX queues, by (1) setting up RSS for the number of RX queues
per VF (2) disabling RSS in the PF's RX default filters.
Unfortunately commit cd2d5b529c ('sfc: Add SR-IOV back-end support
for SFC9000 family') only included (1). This is (2).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_filter_insert_filter() uses the first table entry in the hash chain
that either has the same match values or is empty. This means that
replacement doesn't always work correctly:
1. Insert filter F1 with match values M1, hashing to H1, at first
possible entry E1.
2. Insert filter F2 with match values M2, hashing to H1, at second
possible entry E2.
3. Remove filter F1.
4. Insert filter F3 with match values M2, hashing to H1, at first
possible entry E1.
F3 should have either replaced F2 or been rejected (depending on
priority and the replace_equal parameter).
Instead, search for both a matching filter that the inserted filter
would replace, and an available insertion point, up to the applicable
maximum search depths. If we insert at lower depth than a replaced
filter, clear the replaced filter.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_filter_search() is only called from efx_filter_insert(), and
neither function is very long. The following bug fix requires a more
sophisticated search with a third result, which is going to be easier
to implement as part of the same function.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
These functions happen to work for default MAC filters: they generate
an initial index of 1/0 for unicast/multicast respectively and an
increment of 1 for either, so a search succeeds at depth 2. But this
is a matter of luck rather than design, and it really won't work well
with the bug fix we're about to do.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The 'for_insert' parameter is redundant since there are no longer
any other operations that need to search based on a filter spec.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The 'replace' flag to efx_filter_insert_filter() controls whether the
new filter may replace *any* filter, and is checked even before
priority comparison. But lower-priority filters should never
block insertion of higher-priority filters.
Change the priority checking so that lower-priority filters are
replaced regardless of the value of the flag, and rename the
flag to 'replace_equal'.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
[bwh: Remove more dead code, and make efx_ptp_rx() pull the data it
needs into the header area.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
There is a long-standing problem with the packet-timestamp matching in
the driver. When a PTP packet is received by the MC, the FPGA
timestamps the packet and the MC sends the timestamp and 6 bytes of
the UUID to the driver. The driver then matches the timestamp against
received packets using the same 6 bytes of UUID.
The problem comes from the choice of which 6 bytes to use. The PTP
spec is slightly contradictory and misleading in one of the two places
where the UUIDs are discussed. From section 7.2.2.2 of the spec, a
PTPD2 UUID can be either a EUI-64 or a EUI-64 constructed from a
EUI-48. The typical ethernet based implementation uses a EUI-64
constructed from a EUI-48. This works by taking the first 3 bytes of
the MAC address of the NIC being used for PTP (the OUI), then
inserting 0xFF, 0xFE, then taking the last 3 bytes of the MAC address
giving
MAC[0], MAC[1], MAC[2], 0xFF, 0xFE, MAC[3], MAC[4], MAC[5]
The current MC firmware and driver discard the first two bytes of this
UUID and packets are matched against timestamps using bytes 2 to 7 so
there is a small risk that in a deployment of Solarflare PTP NICs used
with other vendors NICs, that a PTP packet could be matched against
the wrong timestamp. This applies to all other organisations whose
third byte of the OUI is 0x53. It's a long list but I notice that it
includes Cisco.
The necessary modifications to use bytes 0-2 and 5-7 of the UUID to
match against are quite small but introduce incompatibility between
older version of the firmware and driver.
When PTP is enabled via SO_TIMESTAMPING specifying PTP V2, the driver
will try to enable PTP in the firmware using the enhanced mode
(above). If the firmware returns an error, the driver will enable PTP
in the firmware using the old mode.
[bwh: Fix some style errors; remove private ioctl bits]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Instead of having efx_ptp_rx() call netif_receive_skb() for an invalid
PTP packet, make it return false for rejected packets and have
efx_rx_deliver() pass them up.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
RX DMA buffers start at an offset of EFX_PAGE_IP_ALIGN bytes from the
start of a cache line. This offset obviously needs to be included in
the virtual address, but this was missed in commit b590ace09d
('sfc: Fix efx_rx_buf_offset() in the presence of swiotlb') since
EFX_PAGE_IP_ALIGN is equal to 0 on both x86 and powerpc.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_device_detach_sync() locks all TX queues before marking the device
detached and thus disabling further TX scheduling. But it can still
be interrupted by TX completions which then result in TX scheduling in
soft interrupt context. This will deadlock when it tries to acquire
a TX queue lock that efx_device_detach_sync() already acquired.
To avoid deadlock, we must use netif_tx_{,un}lock_bh().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We must only ever stop TX queues when they are full or the net device
is not 'ready' so far as the net core, and specifically the watchdog,
is concerned. Otherwise, the watchdog may fire *immediately* if no
packets have been added to the queue in the last 5 seconds.
The device is ready if all the following are true:
(a) It has a qdisc
(b) It is marked present
(c) It is running
(d) The link is reported up
(a) and (c) are normally true, and must not be changed by a driver.
(d) is under our control, but fake link changes may disturb userland.
This leaves (b). We already mark the device absent during reset
and self-test, but we need to do the same during MTU changes and ring
reallocation. We don't need to do this when the device is brought
down because then (c) is already false.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We assume that the mapping between DMA and virtual addresses is done
on whole pages, so we can find the page offset of an RX buffer using
the lower bits of the DMA address. However, swiotlb maps in units of
2K, breaking this assumption.
Add an explicit page_offset field to struct efx_rx_buffer.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We may currently allocate two RX DMA buffers to a page, and only unmap
the page when the second is completed. We do not sync the first RX
buffer to be completed; this can result in packet loss or corruption
if the last RX buffer completed in a NAPI poll is the first in a page
and is not DMA-coherent. (In the middle of a NAPI poll, we will
handle the following RX completion and unmap the page *before* looking
at the content of the first buffer.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Delete successive tests to the same location. rc was previously tested and
not subsequently updated. efx_phc_adjtime can return an error code, so the
call is updated so that is tested instead.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@s exists@
local idexpression y;
expression x,e;
@@
*if ( \(x == NULL\|IS_ERR(x)\|y != 0\) )
{ ... when forall
return ...; }
... when != \(y = e\|y += e\|y -= e\|y |= e\|y &= e\|y++\|y--\|&y\)
when != \(XT_GETPAGE(...,y)\|WMI_CMD_BUF(...)\)
*if ( \(x == NULL\|IS_ERR(x)\|y != 0\) )
{ ... when forall
return ...; }
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The __dev* removal patches for the network drivers ended up messing up
the function prototypes for a bunch of drivers. This patch fixes all of
them back up to be properly aligned.
Bonus is that this almost removes 100 lines of code, always a nice
surprise.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_HOTPLUG is going away as an option. As result the __dev*
markings will be going away.
Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst,
and __devexit.
Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Most of the module parameters treated as boolean are currently exposed
as type int or uint. Defining them with the proper type is useful
documentation for both users and developers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_mcdi_poll() uses get_seconds() to read the current time and to
implement a polling timeout. The use of this function was chosen
partly because it could easily be replaced in a co-sim environment
with a macro that read the simulated time.
Unfortunately the real get_seconds() returns the system time (real
time) which is subject to adjustment by e.g. ntpd. If the system time
is adjusted forward during a polled MCDI operation, the effective
timeout can be shorter than the intended 10 seconds, resulting in a
spurious failure. It is also possible for a backward adjustment to
delay detection of a areal failure.
Use jiffies instead, and change MCDI_RPC_TIMEOUT to be denominated in
jiffies. Also correct rounding of the timeout: check time > finish
(or rather time_after(time, finish)) and not time >= finish.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The assertion of netif_device_present() at the top of
efx_hard_start_xmit() may fail if we don't do this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We sometimes hit a "failed to flush" timeout on some TX queues, but the
flushes have completed and the flush completion events seem to go missing.
In this case, we can check the TX_DESC_PTR_TBL register and drain the
queues if the flushes had finished.
[bwh: Minor fixes to coding style]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
If the MC reboots then the stats it reports to us will have been
reset. We need to reset ours to get efx_update_diff_stat() working
properly.
(Ideally we would maintain stats across the reboot, but as this should
only happen immediately after a firmware upgrade it's not really worth
the trouble.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently we initialise the newly allocated buffer to all-1s, which is
important for event queues but not for descriptor queues. And since
we also do that in efx_nic_init_eventq(), it is completely pointless
to do it here.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_writed_table() uses a step of 16 bytes but efx_readd_table() uses
a step of 4 bytes. Why are they different?
Firstly, register access is asymmetric:
- The EVQ_RPTR table and RX_INDIRECTION_TBL can (or must?) be written
as dwords even though they have a step size of 16 bytes, unlike
most other CSRs.
- In general, a read of any width is valid for registers, so long as
it does not cross register boundaries. There is also no latching
behaviour in the BIU, contrary to rumour.
We write to the EVQ_RPTR table with efx_writed_table() but never read
it back as it's write-only. We write to the RX_INDIRECTION_TBL with
efx_writed_table(), but only read it back for the register dump, where
we use efx_reado_table() as for any other table with step size of 16.
We read MC_TREG_SMEM with efx_readd_table() for the register dump, but
normally read and write it with efx_readd() and efx_writed() using
offsets calculated in bytes.
Since these functions are trivial and have few callers, it's clearer
to open-code them at the call sites. While we're at it, update the
comments on the BIU behaviour again.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_mcdi_rpc_start() returns a negative value on error or zero on
success. However one caller that can't properly handle failure then
does WARN_ON(rc > 0). Change it to WARN_ON(rc < 0).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Receiving pause frames can block TX queue flushes. Earlier changes
work around this by reconfiguring the MAC during flushes for VFs, but
during flushes for the PF we would only change the fc_disable counter.
Unless the MAC is reconfigured for some other reason during the flush
(which I would not expect to happen) this had no effect at all.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
sparse has got a bit more picky since I last ran it over this. Add
forced casts for use of ~0 as a big-endian value. Undo the pointless
optimisation of parameter validation with '|'; using '||' avoids these
warnings.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Various drivers depend on INET because they used to select INET_LRO,
but they have all been converted to use GRO which has no such
dependency.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was missed in commit a24006ed12
('ptp: Enable clock drivers along with associated net/PHY drivers')
which enabled sfc's clock driver unconditionally.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Where a PTP clock driver is associated with a net or PHY driver, it
should be enabled automatically whenever that driver is enabled.
Therefore:
- Make PTP clock drivers select rather than depending on PTP_1588_CLOCK
- Remove separate boolean options for PTP clock drivers that are built
as part of net driver modules. (This also fixes cases where the PTP
subsystem is wrongly forced to be built-in.)
- Set 'default y' for PTP clock drivers that depend on specific net
drivers but are built separately
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using list_move() instead of list_del() + list_add().
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are now standard functions for dealing with little-endian bit
arrays, so use them instead of our own implementations.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Hutchings says:
====================
Some bug fixes that should go into 3.7:
1. Fix oops when removing device with SR-IOV enabled. (This regression
was introduced by the last set of changes, so the fix does not need to
be applied to any earlier kernel versions.)
2. Fix firmware structure field lookup bug that resulted in missing
sensor information.
3. Fix bug that makes self-test do very little in some configurations.
4. Fix the numbering of ethtool RX flow steering filters to reflect the
real hardware priorities.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Each RX filter table contains filters with two different levels of
specificity: TCP/IPv4 and UDP/IPv4 filters match the local address and
port and optionally the remote address and port; Ethernet filters
match the local address and optionally the VID. The more specific
filters always override less specific filters within the same table,
and should be numbered accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This filter flag cannot yet be set through the ethtool command and
will not be supported on future hardware.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The loopback self-test iterates over all the TX queues of channel 0,
which is not very interesting when that's an RX-only channel.
Signed-off-by: Ben Hutchings <bhutchings@solarflre.com>
The least significant bit number (LBN) of a field within an MCDI
structure is counted from the start of the structure, not the
containing dword. In MCDI_ARRAY_FIELD() we need to mask it rather
than using the usual EFX_DWORD_FIELD() macro.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Commit c31e5f9 ('sfc: Add channel specific receive_skb handler and
post_remove callback') added the function pointer field
efx_channel_type::post_remove and an unconditional call through it.
This field should have been initialised to efx_channel_dummy_op_void
in the existing instances of efx_channel_type, but this was only done
in efx_default_channel_type. Consequently, if a device has SR-IOV
enabled then removing the driver or device will result in an oops.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
PTP Hardware Clock devices appear as class devices in sysfs. This patch
changes the registration API to use the parent device, clarifying the
clock's relationship to the underlying device.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MCDI supports requests up to 252 bytes long, which is only enough to
pass 63 RX queue IDs to MC_CMD_FLUSH_RX_QUEUES. However a VF may have
up to 64 RX queues, and if we try to flush them all we will generate
an over-length request and BUG() in efx_mcdi_copyin(). Currently
all VF drivers limit themselves to 32 RX queues, so reducing the
limit to 63 does no harm.
Also add a BUILD_BUG_ON in efx_mcdi_flush_rxqs() so we remember to
deal with the same problem there if EFX_MAX_CHANNELS is increased.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On big-endian systems the MTD partition names currently have mangled
subtype numbers and are not recognised by the firmware update tool
(sfupdate).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add PTP IEEE-1588 support and make accesible via the PHC subsystem.
This work is based on prior code by Andrew Jackson
Signed-off-by: Stuart Hodgson <smhodgson@solarflare.com>
[bwh:
- Add byte order conversion in efx_ptp_send_times()
- Simplify conversion of PPS event times
- Add the built-in vs module check to CONFIG_SFC_PTP dependencies]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The maximum array sizes have been calculated on the basis of a maximum
SDU size of 255 bytes, whereas the actual maximum is 252 bytes.
Constructing a larger SDU will result in a BUG_ON in efx_mcdi_copyin.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
For NIC/System time synchonisation efx_mcdi_rpc needs to be split in
efx_mcdi_rpc_start and efx_mcdi_rpc_finish operations.
Signed-off-by: Stuart Hodgson <smhodgson@solarflare.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Allows an extra channel to override the standard receive_skb handler
and also for extra non generic operations to be performed on remove.
Also set default rx strategy so only skbs can be delivered to the
PTP receive function.
Signed-off-by: Stuart Hodgson <smhodgson@solarflare.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The PTP channel will have its own RX queue even though it's not
a regular traffic channel.
Original work by Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Stuart Hodgson <smhodgson@solarflare.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Following commit 8f4cccb ('net: Set device operstate at registration
time') it is now correct and preferable to set the carrier off before
registering a device.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We also stop clearing *efx in efx_init_struct(). This is safe because
alloc_etherdev_mq() already clears it for us.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
RX DMA is limited by the length specified in each descriptor and not
by the MAC. Over-length frames may get into the RX FIFO regardless of
the MAC settings, due to a hardware bug, but they will be truncated by
the packet DMA engine and reported as such in the completion event.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We try to defer resets while the device is not READY, but we're not
doing this quite correctly. In particular, changes to efx_nic::state
are documented as serialised by the RTNL lock, but they aren't.
1. We check whether a reset was requested during probe (suggesting
broken hardware) before we allow requested resets to be scheduled.
This leaves a window where a requested reset would be deferred
indefinitely.
2. Although we cancel the reset work item during device removal,
there are still later operations that can cause it to be scheduled
again. We need to check the state before scheduling it.
3. Since the state can change between scheduling and running of
the work item, we still need to check it there, and we need to
do so *after* acquiring the RTNL lock which serialises state
changes.
4. We must cancel the reset work item during device removal, if the
state could ever have been READY. This wasn't done in some of the
failure paths from efx_pci_probe(). Move the cancellation to
efx_pci_remove_main().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The current informational message doesn't properly explain what
happens, and could also appear if we defer a reset during
suspend/resume.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_change_mtu() and efx_realloc_channels() each stop and start much
of the NIC, even if it has been disabled. Since efx_start_all() is a
no-op when the NIC is disabled, this is probably harmless in the case
of efx_change_mtu(), but efx_realloc_channels() also reenables
interrupts which could be a bad thing to do.
Change efx_start_all() and efx_start_interrupts() to assert that the
NIC is not disabled, but make efx_stop_interrupts() do nothing if the
NIC is disabled (since it is already stopped), consistent with
efx_stop_all().
Update comments for efx_start_all() and efx_stop_all() to describe
their purpose and preconditions more accurately.
Add a common function to check and log if the NIC is disabled, and use
it in efx_net_open(), efx_change_mtu() and efx_realloc_channels().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Interrupt state should be consistently guarded by the RTNL lock once
the net device is registered.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
I don't think these PM functions can race with userland net device
operations, but it's much easier to reason about locking if state is
consistently guarded by the same lock.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
STATE_INIT and STATE_FINI are equivalent and represent incompletely
initialised states; combine them as STATE_UNINIT.
Rename STATE_RUNNING to STATE_READY, to avoid confusion with
netif_running() and IFF_RUNNING.
The comments do not quite match current usage, but this will be
corrected in subsequent fixes.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We only use tso_state::full_packet_space to calculate the IPv4 tot_len
or IPv6 payload_len, not to set tso_state::packet_space. Replace it
with an ip_base_len field holding the value of tot_len or payload_len
before including the TCP payload, which is much more useful when
constructing the new headers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
TSO header buffers contain a control structure immediately followed by
the packet headers, and are kept on a free list when not in use. This
complicates buffer management and tends to result in cache read misses
when we recycle such buffers (particularly if DMA-coherent memory
requires caches to be disabled).
Replace the free list with a simple mapping by descriptor index. We
know that there is always a payload descriptor between any two
descriptors with TSO header buffers, so we can allocate only one
such buffer for each two descriptors.
While we're at it, use a standard error code for allocation failure,
not -1.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We now have a definite upper bound on the number of descriptors per
skb; use that to stop the queue when the next packet might not fit.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add a flags field to struct efx_tx_buffer, replacing the
continuation and map_single booleans.
Since a single descriptor cannot be both a TSO header and the last
descriptor for an skb, unionise efx_tx_buffer::{skb,tsoh} and add
flags for validity of these fields.
Clear all flags in free buffers (whereas previously the continuation
flag would be set).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
ETHTOOL_GRXCLSRULE returns filters for a TCP/IPv4 or UDP/IPv4 4-tuple
with source and destination swapped.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently an skb requiring TSO may not fit within a minimum-size TX
queue. The TX queue selected for the skb may stall and trigger the TX
watchdog repeatedly (since the problem skb will be retried after the
TX reset). This issue is designated as CVE-2012-3412.
Set the maximum number of TSO segments for our devices to 100. This
should make no difference to behaviour unless the actual MSS is less
than about 700. Increase the minimum TX queue size accordingly to
allow for 2 worst-case skbs, so that there will definitely be space
to add an skb after we wake a queue.
To avoid invalidating existing configurations, change
efx_ethtool_set_ringparam() to fix up values that are too small rather
than returning -EINVAL.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dynamically allocated sysfs attributes must be initialized using
sysfs_attr_init(), otherwise lockdep complains:
BUG: key <address> not in .data!
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
1. Fix potential badness when running a self-test with SR-IOV enabled.
2. Fix calculation of some interface statistics that could run backward.
3. Miscellaneous cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some interface statistics are computed in such a way that they can
sometimes decrease (and even underflow). Since the computed value
will never be greater than the true value, we fix this by only storing
the computed value when it increases.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently VF queues and drivers may remain active during this test.
This could cause memory corruption or spurious test failures.
Therefore we reset the port/function before running these tests on
Siena.
On Falcon this doesn't work: we have to do some additional
initialisation before some blocks will work again. So refactor the
reset/register-test sequence into an efx_nic_type method so
efx_selftest() doesn't have to consider such quirks.
In the process, fix another minor bug: Siena does not have an
'invisible' reset and the self-test currently fails to push the PHY
configuration after resetting. Passing RESET_TYPE_ALL to
efx_reset_{down,up}() fixes this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Fix CID 113952 in Coverity report on Linux.
This is the one instance where we don't, and shouldn't, check the
return code from efx_mcdi_rpc(). It wasn't immediately obvious to me
why we didn't, so I think an explanation is in order.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Fix CID 102619 in the Coverity report on Linux.
efx_end_loopback() iterates over an array of skb pointers of which
some may be null (if efx_begin_loopback() failed). It should not use
dev_kfree_skb_irq(), which requires non-null pointers. In practice
this is safe because it does not run in interrupt context and
therefore always ends up calling dev_kfree_skb(), which does allow
null pointers. But we should make that explicit.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Fix CID 113703 in the Coverity report on Linux.
ethtool stats names are limited to 32 bytes including a null
terminator. Use strlcpy() to ensure that we will always include the
null terminator even if a source string becomes longer than this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
There is nothing in the VLAN driver or core VLAN support that
invalidates the TCP and IP header offsets.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
tso_state::packet_space is always set in tso_start_packet(); the
value set in tso_start() is not used, and is also incorrect.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
With some gcc versions & optimisations, the compiler will warn that
'depth' in efx_filter_insert_filter() may be used without being
initialised, although this is not the case.
This is related to inlining of efx_filter_search(), which only has
one caller since commit 8db182f4a8
('sfc: Remove now-unused filter function').
Shut the compiler up by initialising it to 0.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Convert doxygen (or similar) formatted comments to kernel-doc or
unformatted comment. Delete a few that are content-free.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches. Delete
a few that are content-free.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently allows for SFP+ eeprom to be returned using the ethtool API.
This can be extended in future to handle different eeprom formats
and sizes
Signed-off-by: Stuart Hodgson <smhodgson@solarflare.com>
[bwh: Drop redundant validation, comment, whitespace]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Previously we refilled with much larger batches, which caused large latency
spikes. We now have many more much much smaller spikes!
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We need to clear the private data pointer in the PCI device.
Also reorder cleanup in efx_pci_remove() for symmetry.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_nic_fatal_interrupt() disables DMA before scheduling a reset.
After this, we need not and *cannot* flush queues.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
If RSS is disabled on the PF (efx->n_rx_channels == 1) we try to set
up the indirection table so that VFs can use it, setting
efx->rss_spread = efx_vf_size(efx). But if SR-IOV was disabled at
compile time, this evaluates to 0 and we end up dividing by zero when
initialising the table.
I considered changing the fallback definition of efx_vf_size() to
return 1, but its value is really meaningless if we are not going to
enable VFs. Therefore add a condition of efx_sriov_wanted(efx) in
efx_probe_interrupts().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Artem's cleanup of the MTD API continues apace.
Fixes and improvements for ST FSMC and SuperH FLCTL NAND, amongst others.
More work on DiskOnChip G3, new driver for DiskOnChip G4.
Clean up debug/warning printks in JFFS2 to use pr_<level>.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iEYEABECAAYFAk92K6UACgkQdwG7hYl686NrMACfWQJRWasR78MWKfkT2vWZwTFJ
X5AAoKiSYO2pfo5gWJGOAahNC1zUqMX0
=i3Vb
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.4' of git://git.infradead.org/mtd-2.6
Pull MTD changes from David Woodhouse:
- Artem's cleanup of the MTD API continues apace.
- Fixes and improvements for ST FSMC and SuperH FLCTL NAND, amongst
others.
- More work on DiskOnChip G3, new driver for DiskOnChip G4.
- Clean up debug/warning printks in JFFS2 to use pr_<level>.
Fix up various trivial conflicts, largely due to changes in calling
conventions for things like dmaengine_prep_slave_sg() (new inline
wrapper to hide new parameter, clashing with rewrite of previously last
parameter that used to be an 'append' flag, and is now a bitmap of
'unsigned long flags').
(Also some header file fallout - like so many merges this merge window -
and silly conflicts with sparse fixes)
* tag 'for-linus-3.4' of git://git.infradead.org/mtd-2.6: (120 commits)
mtd: docg3 add protection against concurrency
mtd: docg3 refactor cascade floors structure
mtd: docg3 increase write/erase timeout
mtd: docg3 fix inbound calculations
mtd: nand: gpmi: fix function annotations
mtd: phram: fix section mismatch for phram_setup
mtd: unify initialization of erase_info->fail_addr
mtd: support ONFI multi lun NAND
mtd: sm_ftl: fix typo in major number.
mtd: add device-tree support to spear_smi
mtd: spear_smi: Remove default partition information from driver
mtd: Add device-tree support to fsmc_nand
mtd: fix section mismatch for doc_probe_device
mtd: nand/fsmc: Remove sparse warnings and errors
mtd: nand/fsmc: Add DMA support
mtd: nand/fsmc: Access the NAND device word by word whenever possible
mtd: nand/fsmc: Use dev_err to report error scenario
mtd: nand/fsmc: Use devm routines
mtd: nand/fsmc: Modify fsmc driver to accept nand timing parameters via platform
mtd: fsmc_nand: add pm callbacks to support hibernation
...
As of bb0eb217, MTD_FAIL_ADDR_UNKNOWN should be used to indicate mtd
erase failure not specific to any particular block.
Use MTD_FAIL_ADDR_UNKNOWN instead of 0xffffffff when setting
'erase->fail_addr' in 'efx_mtd_erase()'.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch renames all MTD functions by adding a "_" prefix:
mtd->erase -> mtd->_erase
mtd->read_oob -> mtd->_read_oob
...
The reason is that we are re-working the MTD API and from now on it is
an error to use MTD function pointers directly - we have a corresponding
API call for every pointer. By adding a leading "_" we achieve the following:
1. Make sure we convert every direct pointer users
2. A leading "_" suggests that this interface is internal and it becomes
less likely that people will use them directly
3. Make sure all the out-of-tree modules stop compiling and the owners
spot the big API change and amend them.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
During probe of each port, read and log the part number from VPD.
Remove the Falcon-specific board name lookup.
Initial version by Stuart Hodgson <smhodgson@solarflare.com>.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Generate a test event on each event queue whenever the interface is
brought up, then after 1 second check that we have either handled a
test event or handled another IRQ for each event queue.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
In case all event queues are broken for some reason, this means it
will only take about a second to check them all, rather than up to 32
seconds. This may also speed up testing in the successful case.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
IRQ latency can be ridiculously high for various reasons, so our
current timeouts of 100 ms or 10 ms are too short.
Change the IRQ and event tests to use polling loops starting with a
delay of 1 tick and doubling that if necessary up to a maximum total
delay of approximately 1 second.
Raise the loopback packet RX timeout to 1 second.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
RX and TX completions on the same event queue are generally not associated
with the same flows. The inclusion of TX completions in the adaptive IRQ
score is more of a source of noise rather than useful feedback. Therefore,
do not include them in the score, and adjust the default threshold scores
down.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The in-tree driver has never supported Driverlink. The rest of the
comments are rather redundant, but we can usefully state what the
requirements are on the buffer state.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Conflicts:
drivers/net/ethernet/sfc/rx.c
Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.
Signed-off-by: David S. Miller <davem@davemloft.net>
When pre-allocating skbs for received packets, we set ip_summed =
CHECKSUM_UNNCESSARY. We used to change it back to CHECKSUM_NONE when
the received packet had an incorrect checksum or unhandled protocol.
Commit bc8acf2c8c ('drivers/net: avoid
some skb->ip_summed initializations') mistakenly replaced the latter
assignment with a DEBUG-only assertion that ip_summed ==
CHECKSUM_NONE. This assertion is always false, but it seems no-one
has exercised this code path in a DEBUG build.
Fix this by moving our assignment of CHECKSUM_UNNECESSARY into
efx_rx_packet_gro().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Unify return value of .ndo_set_mac_address if the given address
isn't valid. Return -EADDRNOTAVAIL as eth_mac_addr() already does
if is_valid_ether_addr() fails.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_for_each_possible_channel_tx_queue() should do nothing for RX-only
or extra channels. The current definition results in allocating
additional unused hardware TX queues when using the mqprio qdisc and
either separate_tx_channels or SR-IOV.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We have a very simple way of allocating buffer table entries to
queues, which is just to take the next one available. The extra
channels are the highest numbered channels but they need to be
allocated the lowest entries so that the traffic channels can be
allocated new entries without any collisions.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_vfdi_set_status_page() validates the peer page count by
calculating the size of a request containing that many addresses and
comparing that with the maximum valid request size (4KB). The
calculation involves a multiplication that may overflow on a 32-bit
system.
We use kcalloc() to allocate memory to store the addresses; that also
does a multiplication and it does check for integer overflow, so any
values larger than 0x1fffffff will be rejected. However, values in
the range [0x1fffffffc, 0x1fffffff] pass boh tests and result in an
attempt to allocate nearly 4GB on the heap. This should be rejected
rather quickly as it's obviously impossible on a 32-bit system, and
indeed the maximum possible heap allocation is 32MB. Still, let's
make absolutely sure by fixing the initial validation.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This requirement was meant to be implied in the name 'status page'.
One out-of-tree VF driver allocates a buffer using the structure size
and not a full page - hence the current odd specification - but in
practice that allocation will be padded and aligned to at least 4KB.
Therefore, we can specify this and have the option to extend the
structure up to 4KB without worrying about VF drivers using odd-shaped
buffers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On the SFC9000 family, each port has 1024 Virtual Interfaces (VIs),
each with an RX queue, a TX queue, an event queue and a mailbox
register. These may be assigned to up to 127 SR-IOV virtual functions
per port, with up to 64 VIs per VF.
We allocate an extra channel (IRQ and event queue only) to receive
requests from VF drivers.
There is a per-port limit of 4 concurrent RX queue flushes, and queue
flushes may be initiated by the MC in response to a Function Level
Reset (FLR) of a VF. Therefore, when SR-IOV is in use, we submit all
flush requests via the MC.
The RSS indirection table is shared with VFs, so the number of RX
queues used in the PF is limited to the number of VIs per VF.
This is almost entirely the work of Steve Hodgson, formerly
shodgson@solarflare.com.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Each port has a block of 64-bit SRAM that is divided between buffer
table and descriptor cache regions at initialisation time. Currently
we use a fixed allocation, but it needs to be changed to support
larger numbers of queues.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This lets us identify the NIC affected in case of failure, and
will be necessary to adjust for SR-IOV constraints.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Abstract some of the channel operations to allow for 'extra'
channels that do not have RX or TX queues.
- Try to assign a channel to each extra channel type that is enabled
for the NIC, but gracefully degrade if we can't allocate sufficient
MSI-X vectors
- Allow each extra channel type to generate its own channel name
- Allow channel types to disable reallocation and reinitialisation
of their channels
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The TX DMA engine issues upstream read requests when there is room in
the TX FIFO for the completion. However, the fetches for the rest of
the packet might be delayed by any back pressure. Since a flush must
wait for an EOP, the entire flush may be delayed by back pressure.
Mitigate this by disabling flow control before the flushes are
started. Since PF and VF flushes run in parallel introduce
fc_disable, a reference count of the number of flushes outstanding.
The same principle could be applied to Falcon, but that
would bring with it its own testing.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
For SR-IOV we will need to send events to event queues that belong to
VFs serviced by other drivers. Change the parameters of
efx_generate_event() to allow this and declare it extern.
While we're at it, remove the existing declaration under the wrong
name efx_nic_generate_event().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
When SR-IOV is enabled we may receive FLR (Function-Level Reset)
events, associated queue flush events and requests from VF drivers at
any time. Therefore we need to keep event queues and interrupts
enabled whenever possible.
Currently we stop interrupt-driven event processing before flushing RX
and TX queues; efx_nic_flush_queues() then polls event queues for
flush events and discards any others it finds. Change it to work with
the regular event handling functions.
Currently efx_start_channel() fills RX queues synchronously when a
device is brought up. This could now race with NAPI, so change it to
send fill events.
This was almost entirely written by Steve Hodgson, formerly
shodgson@solarflare.com.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The RMFT_DEST_MAC and TMFT_SRC_MAC register fields were previously
documented as 44 bits wide, whereas a MAC address has 48 bits.
Thankfully the hardware uses the correct width and the driver has
used separate definitions that divide each of these into 32-bit and
16-bit fields.
Fix the initial definitions for these fields and rewrite the latter
definitions to use them.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On Siena each TX queue can be configured to send only packets for
which there is a TX MAC filter that matches the source MAC address,
queue ID, and optionally VID. This will be used to implement the
'spoofchk' feature for SR-IOV virtual functions.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On Siena all received packets that don't match a more specific filter
will match the unicast or multicast default filter. Currently we
leave these set to the default values (RSS with base queue number of
0). Allow them to be reconfigured to select a single RX queue.
These default filters are programmed through the FILTER_CTL register,
but we represent them internally as an additional table of size 2.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Log an explicit warning if we are unable to create MTDs for a net
device. Also correct the comment about why mtd_device_register() may
fail; there is no longer an MTD table to fill up.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The 'page size' for PCIe DMA, i.e. the alignment of boundaries at
which DMA must be broken, is 4KB. Name this value as EFX_PAGE_SIZE
and use it in efx_max_tx_len(). Redefine EFX_BUF_SIZE as
EFX_PAGE_SIZE since its value is also a result of that requirement,
and use it in efx_init_special_buffer().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
If efx_pci_probe_main() schedules an INVISIBLE or ALL reset (but
nothing more drastic), we retry it up to 5 times. So far as I'm
aware, this was a workaround for bugs in Falcon A0 which were fixed
in production silicon. Remove the retry.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The code in efx_process_channel() to update the RX queue after each
batch of RX completions works out as a no-op on a TX-only channel
where the RX queue structure is set to all-zeroes, but
(1) efx_channel_get_rx_queue() will BUG() if DEBUG is defined, and
(2) it's a waste of time.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This function returns the page offset of the buffer, which can be
calculated based on either its DMA address or its virtual address. It
used to use the virtual address and we would cast that to unsigned
long, as anything smaller would result in a compiler warning. Now
that it's using the DMA address we should use unsigned int, matching
the return type. It is also unnecessary to use __force.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Replace checksummed and discard booleans from efx_handle_rx_event()
with a bitmask, added to the flags field.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently we use type u64 for byte counts, which can very quickly
exceed 2^32, and unsigned long for packet counts, which do not. But
it can still take only 20-something minutes to send or receive 2^32
packets, and not all tools properly handle overflow even if they
sample more often than this.
The MAC statistics are all updated synchronously, so it costs very
little to make them all 64-bit regardless of native word size.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Rename efx_set_multicast_list() to efx_set_rx_mode(), in line
with the operation name net_device_ops::ndo_set_rx_mode.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The out-of-tree version of the sfc driver used to run a self-test on
each device before registering it. Although this was never included
in-tree, some functions have checks for this special case which is not
really possible.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
SFC4000 boards also have an EEPROM exposed as MTD.
The boot configuration is accessed through MTD.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The SFC9000-family controllers have firmware to manage all board
peripherals including temperature, heat sink continuity and voltage
sensors. The firmware reports sensor alarms, which we log, and
will shut down the board if necessary.
Some users may want to monitor their boards more closely, so add an
hwmon driver that exposes all sensors reported by the firmware. Move
efx_mcdi_sensor_event() into the new file so it can share the array of
sensor labels with the hwmon driver.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Interrupts are normally generated by the event queues, moderated by
timers. However, they may also be triggered by detection of a 'fatal'
error condition (e.g. memory parity error) or by the host writing to
certain CSR fields as part of a self-test.
The IRQ level/index used for these on Falcon rev B0 and Siena is set
by the KER_INT_LEVE_SEL field and cached by the driver in
efx_nic::fatal_irq_level. Since this value is also relevant to
self-tests rename the field to just 'irq_level'.
Avoid unnecessary cache traffic by using a per-channel 'last_irq_cpu'
field and only writing to the per-controller field when the interrupt
matches efx_nic::irq_level. Remove the volatile qualifier and use
ACCESS_ONCE in the places we read these fields.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This reverts commit 6369545945 in
drivers/net/ethernet/sfc/falcon.c.
Unlike the INT_ISR0 register on later controller revisions, the
NET_IVEC_INT_Q bits written to memory are only ever set for
interrupting event queues, not for any other interrupt sources.
By definition there can only be one legacy interrupt handler per
function, so there is no need to worry about detecting a fatal
interrupt more than once.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We cannot safely assume that the NAPI handler will complete within the
20 ms that we allow for the event self-test. The handler may be
deferred for longer than this, particularly on realtime kernels.
Instead, check whether either an event has been handled or (as in the
old failure path) whether an interrupt has been received and an event
has been delivered but not yet handled. Use napi_disable() to
synchronize with the NAPI handler before checking, since it will
clear events before updating eventq_read_ptr.
Remove the test result chan.N.eventq.poll, since it is not an error
if the NAPI handler does not run during the test.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We currently assume that the timer quantum for Siena is 5 us, the same
as for Falcon. This is not correct; timer ticks are generated on a
rota which takes a minimum of 768 cycles (each event delivery or other
timer change will delay it by 3 cycles). The timer quantum should be
6.144 or 3.072 us depending on whether turbo mode is active.
Replace EFX_IRQ_MOD_RESOLUTION with a timer_quantum_ns field in struct
efx_nic, initialised by the efx_nic_type::probe function.
While we're at it, replace EFX_IRQ_MOD_MAX with a timer_period_max
field in struct efx_nic_type.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The netif_dbg() macro is defined in <linux/netdevice.h>. If the DEBUG
macro is defined, it logs a message at 'debug' level, otherwise it
does nothing.
In net_driver.h we define DEBUG if EFX_ENABLE_DEBUG is defined, but
this is too late for those source files that already got a
definition of netif_dbg() by including <linux/netdevice.h>
Get rid of EFX_ENABLE_DEBUG, and only define and test DEBUG.
In mtd.c, we do not use DEBUG as a condition flag but are forced to
use the DEBUG macro-function from <linux/mtd/mtd.h>. Undefine DEBUG
before including it.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Both implementations of efx_nic_type::reconfigure_mac operation
push the multicast hash filter to the hardware. It is therefore
redundant to call efx_nic_type::push_multicast_hash as well.
efx_mcdi_mac_reconfigure() also uses this operation, but the
implementation for Siena just uses MCDI anyway. Merge that into
efx_mcdi_mac_reconfigure().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The latter is only called by the former, which is a very short
wrapper. Further, gcc 4.5 may currently wrongly warn that the
'faults' variable may be used uninitialised.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
No NICs need to switch efx_mac_operations at run-time, and the MAC
operations are fairly closely bound to NIC types.
Move efx_mac_operations::reconfigure to efx_nic_type::reconfigure_mac
and efx_mac_operations::check_fault fo efx_nic_type::check_mac_fault.
Change callers to call through efx->type or directly if the NIC type
is known.
Remove efx_mac_operations::update_stats. The implementations for
Falcon used to fetch MAC statistics synchronously and this was used by
efx_register_netdev() to clear statistics after running self-tests.
However, it now only converts statistics that have already been
fetched (and that only for Falcon), and the call from
efx_register_netdev() has no effect.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_nic::stats_lock is used to serialise stats updates, but each
reader was dropping it before it finished reading efx_nic::mac_stats.
If there were concurrent stats reads using procfs, or one using procfs
and one using ethtool, an update could race with a read. On a 32-bit
system, the reader could see word-tearing of 64-bit stats (32 bits of
the old value and 32 bits of the new).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
When the MC reboots, either as part of a firmware upgrade or due to a
bug, it attempts to complete (with an error) any requests that were
outstanding before the reboot. Since there is an inherent race
condition in checking this, it will also write to a status word in
shared memory.
If we look at each of these separately, we may detect each reboot
twice, resulting in a spurious command failure after a firmware
upgrade or frustrating recovery from a firmware bug. Instead, if a
request completion indicates a reboot, we must poll and clear the
status word.
This bug was previously masked by use of an incorrect address for the
status word. Fix that, using the definition now included in
mcdi_pcol.h.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
By the time we look at the MAC address in efx_probe_port(), either the
driver or the firmware has already validated the board configuration.
The possibility of having an invalid MAC address just isn't worth
considering. It certainly isn't worth having a compile-time option
for this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The previous default of per-package can be more CPU-efficient, but
users generally seem to prefer per-core. It should also allow
accelerated RFS to direct packets more precisely, if IRQ affinity
is properly spread out.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This function returns the degree of parallelism wanted, which is not
necessarily the total number of channels we want to create.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Fix the warning:
WARNING: Use #include <linux/io.h> instead of <asm/io.h>
There is no need for selftest.c to include the file at all.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Fix the following errors and warnings:
ERROR: trailing whitespace
ERROR: spaces required around that '=' (ctx:VxV)
WARNING: please, no space before tabs
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
These new functions will support an implementation of the ethtool
RX NFC rules API.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Filter IDs are u32 (but never very large) so an ID/error return
value should have type s32.
Filter indices and search depths are never negative, so should
have type unsigned int.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also add note that the efx_filter_spec::priority field has nothing
to do with priority between multiple matching filters.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DaveM said:
Please, this kind of stuff rots forever and not using bool properly
drives me crazy.
Joe Perches <joe@perches.com> gave me the spatch script:
@@
bool b;
@@
-b = 0
+b = false
@@
bool b;
@@
-b = 1
+b = true
I merely installed coccinelle, read the documentation and took credit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
All drivers that support modification of the RX flow hash indirection
table initialise it in the same way: RX rings are assigned to table
entries in rotation. Make that default policy explicit by having them
call a ethtool_rxfh_indir_default() function.
In the ethtool core, add support for a zero size value for
ETHTOOL_SRXFHINDIR, which resets the table to this default.
Partly-suggested-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new ethtool operation (get_rxfh_indir_size) to get the
indirectional table size. Use this to validate the user buffer size
before calling get_rxfh_indir or set_rxfh_indir. Use get_rxnfc to get
the number of RX rings, and validate the contents of the new
indirection table before calling set_rxfh_indir. Remove this
validation from drivers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SFC9020/SFL9021 device IDs are only used in the device ID table,
where we can just as well use comments.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.
The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
As soon as skb is pushed to hardware, it can be completed and freed, so
we should not dereference skb anymore.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes to sfc to use byte queue limits.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
v2: add couple missing conversions in drivers
split unexporting netdev_fix_features()
implemented %pNF
convert sock::sk_route_(no?)caps
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
These files were using moduleparam infrastructure, but were not
including anything for it -- which is fine when module.h is being
implicitly included in all files, but that is going away.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* 'next-rebase' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci:
PCI: Clean-up MPS debug output
pci: Clamp pcie_set_readrq() when using "performance" settings
PCI: enable MPS "performance" setting to properly handle bridge MPS
PCI: Workaround for Intel MPS errata
PCI: Add support for PASID capability
PCI: Add implementation for PRI capability
PCI: Export ATS functions to modules
PCI: Move ATS implementation into own file
PCI / PM: Remove unnecessary error variable from acpi_dev_run_wake()
PCI hotplug: acpiphp: Prevent deadlock on PCI-to-PCI bridge remove
PCI / PM: Extend PME polling to all PCI devices
PCI quirk: mmc: Always check for lower base frequency quirk for Ricoh 1180:e823
PCI: Make pci_setup_bridge() non-static for use by arch code
x86: constify PCI raw ops structures
PCI: Add quirk for known incorrect MPSS
PCI: Add Solarflare vendor ID and SFC4000 device IDs
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.
Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per comments from Ben Hutchings on a previous patch, sweep the floors
a little removing unnecessary assignments of zero to fields of struct
ethtool_ringparam in driver code supporting ethtool -g.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I converted some drivers from pci_map_page to skb_frag_dma_map I
neglected to convert PCI_DMA_xDEVICE into DMA_x_DEVICE and
pci_dma_mapping_error into dma_mapping_error.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Steve Hodgson <shodgson@solarflare.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct the description of ethtool_rxnfc::rule_locs; it is an array
of currently used locations, not all possible valid locations.
Add note that drivers must not use ethtool_rxnfc::rule_locs.
The rule_locs argument to ethtool_ops::get_rxnfc is either NULL or a
pointer to an array of u32, so change the parameter type accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An earlier developer misunderstood the meaning of the 'irq' fields and
the driver did not support the standard fields. To avoid invalidating
existing user documentation, we report and accept changes through
either the standard or 'irq' fields. If both are changed at the same
time, we prefer the standard field.
Also explain why we don't currently use the 'max_frames' fields.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a range check, and move the check that RX and TX are consistent
from efx_ethtool_set_coalesce().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reported TX IRQ moderation is generated in a completely crazy way.
Make it simple and correct.
When channels are shared between RX and TX, TX IRQ moderation must be
the same as RX IRQ moderation, but must be specified as 0! Allow it
to be either specified as the same, or left at its previous value
in which case it will be quietly overridden.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves the Solarflare drivers into drivers/net/ethernet/sfc/ and
make the necessary Kconfig and Makefile changes.
CC: Steve Hodgson <shodgson@solarflare.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>