PCI devices that are 64-bit DMA capable should set the coherent
DMA mask as well as the streaming DMA mask. On some architectures,
these are managed separately, and so the coherent DMA mask will be
left at its default value of 32 if it is not set explicitly. This
results in errors such as
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
hwdev DMA mask = 0x00000000ffffffff, dev_addr = 0x00000080fbfff000
swiotlb: coherent allocation failed for device 0000:02:00.0 size=4096
CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ #35
Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016
on systems without memory that is 32-bit addressable by PCI devices.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When cp_rx_poll does not get enough packet, it will check the rx
interrupt status again. If so, it will jumpt to rx_status_loop again.
But the goto jump resets the rx variable as zero too.
As a result, it causes one possible deadloop. Assume this case,
rx_status_loop only gets the packet count which is less than budget,
and (cpr16(IntrStatus) & cp_rx_intr_mask) condition is always true.
It causes the deadloop happens and system is blocked.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If tx timeout event occur, kernel will call rtl8139_tx_timeout_task() to reset
hardware. But in this function, driver does not stop tx and rx function before
reset hardware, that will cause system hang.
In this patch, add stop tx and rx function before reset hardware.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is no AC power, NIC may not work after changing mac address.
Please refer to following link.
http://www.spinics.net/lists/netdev/msg356572.html
This issue is caused by runtime power management. When there is no AC
power, if we put NIC down (ifconfig down), the driver will be in runtime
suspend state and hardware will be put into D3 state. During this time,
driver cannot access hardware regisers. So if you set new mac address
during this time, it will not be set to hardware. After resume, NIC will
keep using the old mac address and the network will not work normally.
In this patch I add detecting runtime pm status when setting mac address.
If driver is in runtime suspend state, it will skip setting mac address, keep
the new mac address, and set the new mac address during runtime resume.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not to call rtl8169_update_counters() to dump tally counter when driver
is in runtime suspend state.
Calling rtl8169_update_counters() in runtime suspend state will produce
warning message "rtl_counters_cond == 1 (loop: 1000, delay: 10)".
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NIC will be put into D3 state during runtime suspend state. When set or
get hardware wol setting, driver will write or read hardware registers.
If we set or get hardware wol setting in runtime suspend state, because
NIC will in D3 state, the hardware registers read by driver will return all
0xff. That will let driver thinking register flag is not toggled and
then prints the warning message "rtl_counters_cond == 1 (loop: 1000,
delay: 10)" to kernel log.
For fixing this issue, add checking driver's pm runtime status in
rtl8169_get_wol() and rtl8169_set_wol().
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current logic around the 'use_dac' module parameter prevents the
r81969 driver from being loadable on 64-bit systems without any RAM
below 4 GB when the parameter is left at its default value.
So introduce a new default value -1 which indicates that 64-bit DMA
should be enabled on sufficiently recent PCIe chips, i.e., versions
RTL_GIGA_MAC_VER_18 or later. Explicit param values of 0 or 1 retain
the existing behavior of unconditionally enabling/disabling 64-bit DMA
on 64-bit architectures (i.e., regardless of the type and version of the
chip)
Since PCIe chips do not need to CPlusCmd Dual Address Cycle to be set,
make that conditional on the device type as well.
Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For pcie nic, after setting link speed and there is no link driver does not need
to do phy reset until link up.
For some pcie nics, to do this will also reset phy speed down counter and prevent
phy from auto speed down.
This patch fix the issue reported in following link.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1547151
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For RTL8168G/RTL8168H/RTL8411B/RTL8107E, enable this flag to eliminate
message "AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0002
address=0x0000000000003000 flags=0x0050] in dmesg.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There will be a log spam when there is no cable plugged. Please refer to
following links. https://bugzilla.kernel.org/show_bug.cgi?id=104351https://bugzilla.kernel.org/show_bug.cgi?id=107421
This issue is caused by runtime power management. When there is no cable
plugged, the driver will be suspend (runtime suspend) by OS and NIC will be
put into the D3 state. During this time, if OS call rtl8169_get_stats64()
to dump tally counter, because NIC is in D3 state, the register value read
by driver will return all 0xff. This will let driver think tally counter
flag is not toggled and then sends the warning message "rtl_counters_cond
== 1 (loop: 1000, delay: 10)" to kernel log.
For fixing this issue, 1.add checking driver's pm runtime status in
rtl8169_get_stats64(). 2.dump tally counter before going runtime suspend
for counter accuracy in runtime suspend.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are typos in setting RTL8168H hardware parameters. If system install
another version driver that may cuase system hang.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original way is wrong, it always writes ephy reg 0x03.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHY PFM register is in PHY page 0x0a44 register 0x11, not 0x14.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The register for setting D3code PFM mode is MISC_1, not DLLPR.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Reviewed-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vlaue of RTL8168H PHY register "rg_saw_cnt" only valid from bit0 to bit13.
When read this register, add bitwise-anding its value with 0x3fff.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In function "rtl8168h_2_hw_phy_config", there is a typo in setting
RTL8168H PHY parameter.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Fixes: d7d2d89d4b ("r8169: Add software counter for multicast packages")
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many drivers initialize uselessly n_priv_flags, n_stats, testinfo_len,
eedump_len & regdump_len fields in their .get_drvinfo() ethtool op.
It's not necessary as these fields is filled in ethtool_get_drvinfo().
v2: removed unused variable
v3: removed another unused variable
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function can return negative value.
The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When fixing the TSO support I noticed we just mask ->gso_size with the
MSSMask value and don't care about the consequences.
Provide a .ndo_features_check() method which drops the NETIF_F_TSO
feature for any skb which would exceed the maximum, and thus forces it
to be segmented by software.
Then we can stop the masking in cp_start_xmit(), and just WARN if the
maximum is exceeded, which should now never happen.
Finally, Francois Romieu noticed that we didn't even have the right
value for MSSMask anyway; it should be 0x7ff (11 bits) not 0xfff.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I fixed TSO. Hardware checksum and scatter/gather also appear to be
working correctly both on real hardware and in QEMU's emulation.
Let's enable them by default and see if anyone screams...
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are seeing unexplained TX timeouts under heavy load. Let's try to get
a better idea of what's going on.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The low 16 bits of the 'opts1' field in the TX descriptor are supposed
to still contain the buffer length when the descriptor is handed back to
us. In practice, at least on my hardware, they don't. So stash the
original value of the opts1 field and get the length to unmap from
there.
There are other ways we could have worked out the length, but I actually
want a stash of the opts1 field anyway so that I can dump it alongside
the contents of the descriptor ring when we suffer a TX timeout.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We calculate the value of the opts1 descriptor field in three different
places. With two different behaviours when given an invalid packet to
be checksummed — none of them correct. Sort that out.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending a TSO frame in multiple buffers, we were neglecting to set
the first descriptor up in TSO mode.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a certain amount of staring at the debug output of this driver, I
realised it was lying to me.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an RX interrupt was already received but NAPI has not yet run when
the RX timeout happens, we end up in cp_tx_timeout() with RX interrupts
already disabled. Blindly re-enabling them will cause an IRQ storm.
(This is made particularly horrid by the fact that cp_interrupt() always
returns that it's handled the interrupt, even when it hasn't actually
done anything. If it didn't do that, the core IRQ code would have
detected the storm and handled it, I'd have had a clear smoking gun
backtrace instead of just a spontaneously resetting router, and I'd have
at *least* two days of my life back. Changing the return value of
cp_interrupt() will be argued about under separate cover.)
Unconditionally leave RX interrupts disabled after the reset, and
schedule NAPI to check the receive ring and re-enable them.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unless we reset the RX config, on real hardware I don't seem to receive
any packets after a TX timeout.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This can be called from cp_tx_timeout() with interrupts disabled.
Spotted by Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=104031
Fixes: 6e85d5ad36
Based on the discussion starting at
http://www.spinics.net/lists/netdev/msg342193.html
Tested locally on RTL8168evl/8111evl with various concurrent processes
accessing /proc/net/dev while changing the link state as well as
removing/reloading the r8169 module.
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Tested-by: poma <pomidorabelisima@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The multicast hardware counter on 8168/8111 chips is only 32 bit while the
statistics in struct rtnl_link_stats64 are 64 bit. Given that statistics
are requested on an irregular basis, an overflow of the hardware counter
can go unnoticed. To count even very large numbers of multicast packets
reliably, add a software counter and remove previously applied code to
fill the multicast field requested by @rtl8169_get_stats64 with the values
read from the rx_multicast hardware counter.
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The r8169 driver collects statistical information returned by
@get_stats64 by counting them in the driver itself, even though many
(but not all) of the values are already collected by tally counters
(TCs) in the NIC. Some of these TC values are not returned by
@get_stats64. Especially the received multicast packages are missing
from /proc/net/dev.
Rectify this by fetching the TCs and returning them from
rtl8169_get_stats64.
The counters collected in the driver obviously disappear as soon as the
driver is unloaded so after a driver is loaded the counters always start
at 0. The TCs on the other hand are only reset by a power cycle. Without
further considerations the values collected by the driver would not match
up against the TC values.
This patch introduces a new function rtl8169_reset_counters which
resets the TCs. Also, since rtl8169_reset_counters shares most of
its code with rtl8169_update_counters, refactor the shared code into
two new functions rtl8169_map_counters and rtl8169_unmap_counters.
Unfortunately chip versions prior to RTL_GIGA_MAC_VER_19 don't allow
to reset the TCs programatically. Therefore introduce an addition to
the rtl8169_private struct and a function rtl8169_init_counter_offsets
to store the TCs at first rtl_open. Use these values as offsets in
rtl8169_get_stats64. Propagate a failure to reset *and* update the
counters up to rtl_open and emit a warning message, if so.
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enforcing this flag in RxConfig for the mentioned chips fixes netdev
watchdog issues prepended with AMD IOMMU message(s) like:
AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x001d address=0x0000000000003000 flags=0x0050]
Note that this flag is also set in Realtek's own driver for these chips.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Alexander Lindqvist <alexander@bitspace.se>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This howto made sense in the 1990s when users had to manually configure
ISA cards with jumpers or vendor utilities, but with the implementation
of PCI it became increasingly less and less relevant, to the point where
it has been well over a decade since I last updated it. And there is
no value in anyone else taking over updating it either.
However the references to it continue to spread as boiler plate text
from one Kconfig file into the next. We are not doing end users any
favours by pointing them at this old document, so lets kill it with
fire, once and for all, to hopefully stop any further spread.
No code is changed in this commit, just Kconfig help text.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function r8169_csum_workaround is called in the ndo_start_xmit path of
the r8169 driver. As such it should not be using dev_kfree_skb as it is
not irq safe, so instead we should be using dev_kfree_skb_any for freeing
in the dropped path, and dev_consume_skb_any for any frames that were
transmitted.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are certain regressions which are pointing to
these two commits which we are having a hard time
resolving. So revert them for now.
Specifically this reverts:
commit 0bec3b700d
Author: Florian Westphal <fw@strlen.de>
Date: Wed Jan 7 10:49:49 2015 +0100
r8169: add support for xmit_more
and
commit 1e91887685
Author: Florian Westphal <fw@strlen.de>
Date: Wed Oct 1 13:38:03 2014 +0200
r8169: add support for Byte Queue Limits
There were some attempts by Eric Dumazet to address some obvious
problems in the TX flow, to see if they would fix the problems,
but none of them seem to help for the regression reporters.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) More iov_iter conversion work from Al Viro.
[ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
wrong, and this pull actually adds an extra commit on top of the
branch I'm pulling to fix that up, so that the pre-merge state is
ok. - Linus ]
2) Various optimizations to the ipv4 forwarding information base trie
lookup implementation. From Alexander Duyck.
3) Remove sock_iocb altogether, from CHristoph Hellwig.
4) Allow congestion control algorithm selection via routing metrics.
From Daniel Borkmann.
5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.
6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.
7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
Florian Westphal.
8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.
9) Add BPF packet actions to packet scheduler, from Jiri Pirko.
10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer.
11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
Kwok.
12) More sanely handle out-of-window dupacks, which can result in
serious ACK storms. From Neal Cardwell.
13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
Patrick McHardy, and Thomas Graf.
14) Support xmit_more in be2net, from Sathya Perla.
15) Group Policy extensions for vxlan, from Thomas Graf.
16) Remove Checksum Offload support for vxlan, from Tom Herbert.
17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
crypto: fix af_alg_make_sg() conversion to iov_iter
ipv4: Namespecify TCP PMTU mechanism
i40e: Fix for stats init function call in Rx setup
tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
ipv6: Make __ipv6_select_ident static
ipv6: Fix fragment id assignment on LE arches.
bridge: Fix inability to add non-vlan fdb entry
net: Mellanox: Delete unnecessary checks before the function call "vunmap"
cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
net: dsa: Remove redundant phy_attach()
IB/mlx4: Reset flow support for IB kernel ULPs
IB/mlx4: Always use the correct port for mirrored multicast attachments
net/bonding: Fix potential bad memory access during bonding events
tipc: remove tipc_snprintf
tipc: nl compat add noop and remove legacy nl framework
tipc: convert legacy nl stats show to nl compat
tipc: convert legacy nl net id get to nl compat
tipc: convert legacy nl net id set to nl compat
...
Replace a magic number with a PCI #define symbol.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Delay update of hw tail descriptor if we know that another skb is going
to be sent.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
For linux-3.18.0
The driver lacks netif_napi_del in the normal path and error path
to match the call of netif_napi_add in rtl8139_init_one.
This patch fixes this problem.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For linux-3.18.0
When pci_request_regions is failed in rtl8139_init_board, pci_disable_device
is not called to disable the device which are enabled by pci_enable_device,
because of disable_dev_on_err is not assigned 1.
This patch fix this problem.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ephy parameter to rtl8168g.
Also change the common function of rtl8168g from "rtl_hw_start_8168g_1" to
"rtl_hw_start_8168g". And function "rtl_hw_start_8168g_1" is used for
setting rtl8168g hardware parameters.
Following is the explanation of what hardware parameter change for.
rtl8168g may erroneous judge the PCIe signal quality and show the error bit
on PCI configuration space when in PCIe low power mode.
The following ephy parameters are for above issue.
{ 0x00, 0x0000, 0x0008 }
{ 0x0c, 0x37d0, 0x0820 }
{ 0x1e, 0x0000, 0x0001 }
rtl8168g may return to PCIe L0 from PCIe L0s low power mode too slow.
The following ephy parameter is for above issue.
{ 0x19, 0x8000, 0x0000 }
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The r8169 use a pair of wmb() calls when setting up the descriptor rings.
The first is to synchronize the descriptor data with the descriptor status,
and the second is to synchronize the descriptor status with the use of the
MMIO doorbell to notify the device that descriptors are ready. This can
come at a heavy price on some systems, and is not really necessary on
systems such as x86 as a simple barrier() would suffice to order store/store
accesses. As such we can replace the first memory barrier with
dma_wmb() to reduce the cost for these accesses.
In addition the r8169 uses a rmb() to prevent compiler optimization in the
cleanup paths, however by moving the barrier down a few lines and replacing
it with a dma_rmb() we should be able to use it to guarantee
descriptor accesses do not occur until the device has updated the DescOwn
bit from its end.
One last change I made is to move the update of cur_tx in the xmit path to
after the wmb. This way we can guarantee the device and all CPUs should
see the DescOwn update before they see the cur_tx value update.
Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces most of the calls to netdev_alloc_skb_ip_align in the Realtek
drivers. The one instance I didn't replace in 8139cp.c is because it was
called as a part of init and as such is not always accessed from the
softirq context.
Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace rtl_skb_pad with eth_skb_pad since they do the same thing.
Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>