Commit Graph

330 Commits

Author SHA1 Message Date
David S. Miller fb2a3d5c7c Revert "xen-netback: create a debugfs node for hash information"
This reverts commit c0c64c1523.

There is no vif->ctrl_task member, so this change broke
the build.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 09:09:31 -04:00
David S. Miller d6989d4bbe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-23 06:46:57 -04:00
Juergen Gross 0364a8824c xen-netback: switch to threaded irq for control ring
Instead of open coding it use the threaded irq mechanism in
xen-netback.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 08:26:24 -04:00
Filipe Manco cce94483e4 xen-netback: fix error handling on netback_probe()
In case of error during netback_probe() (e.g. an entry missing on the
xenstore) netback_remove() is called on the new device, which will set
the device backend state to XenbusStateClosed by calling
set_backend_state(). However, the backend state wasn't initialized by
netback_probe() at this point, which will cause and invalid transaction
and set_backend_state() to BUG().

Initialize the backend state at the beginning of netback_probe() to
XenbusStateInitialising, and create two new valid state transitions on
set_backend_state(), from XenbusStateInitialising to XenbusStateClosed,
and from XenbusStateInitialising to XenbusStateInitWait.

Signed-off-by: Filipe Manco <filipe.manco@neclab.eu>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 09:56:02 -04:00
Wei Yongjun 1345b1ac57 xen-netback: using kfree_rcu() to simplify the code
The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 17:00:26 -07:00
Paul Durrant c0c64c1523 xen-netback: create a debugfs node for hash information
It is useful to be able to see the hash configuration when running tests.
This patch adds a debugfs node for that purpose.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 23:29:09 -07:00
Paul Durrant c0fcded2e6 xen-netback: only deinitialized hash if it was initialized
A domain with a frontend that does not implement a control ring has been
seen to cause a crash during domain save. This was apparently because
the call to xenvif_deinit_hash() in xenvif_disconnect_ctrl() is made
regardless of whether a control ring was connected, and hence
xenvif_hash_init() was called.

This patch brings the call to xenvif_deinit_hash() in
xenvif_disconnect_ctrl() inside the if clause that checks whether the
control ring event channel was connected. This is sufficient to ensure
it is only called if xenvif_init_hash() was called previously.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20 17:41:18 -04:00
Paul Durrant f86911e2de xen-netback: correct length checks on hash copy_ops
The length checks on the grant table copy_ops for setting hash key and
hash mapping are checking the local 'len' value which is correct in
the case of the former but not the latter. This was picked up by
static analysis checks.

This patch replaces checks of 'len' with 'copy_op.len' in both cases
to correct the incorrect check, keep the two checks consistent, and to
make it clear what the checks are for.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20 10:51:03 -07:00
Paul Durrant c2d09fde72 xen-netback: use hash value from the frontend
My recent patch to include/xen/interface/io/netif.h defines a new extra
info type that can be used to pass hash values between backend and guest
frontend.

This patch adds code to xen-netback to use the value in a hash extra
info fragment passed from the guest frontend in a transmit-side
(i.e. netback receive side) packet to set the skb hash accordingly.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:35:56 -04:00
Paul Durrant f07f989338 xen-netback: pass hash value to the frontend
My recent patch to include/xen/interface/io/netif.h defines a new extra
info type that can be used to pass hash values between backend and guest
frontend.

This patch adds code to xen-netback to pass hash values calculated for
guest receive-side packets (i.e. netback transmit side) to the frontend.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:35:56 -04:00
Paul Durrant 40d8abdee8 xen-netback: add control protocol implementation
My recent patch to include/xen/interface/io/netif.h defines a new shared
ring (in addition to the rx and tx rings) for passing control messages
from a VM frontend driver to a backend driver.

A previous patch added the necessary boilerplate for mapping the control
ring from the frontend, should it be created. This patch adds
implementations for each of the defined protocol messages.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:35:56 -04:00
Paul Durrant 4e15ee2cb4 xen-netback: add control ring boilerplate
My recent patch to include/xen/interface/io/netif.h defines a new shared
ring (in addition to the rx and tx rings) for passing control messages
from a VM frontend driver to a backend driver.

This patch adds the necessary code to xen-netback to map this new shared
ring, should it be created by a frontend, but does not add implementations
for any of the defined protocol messages. These are added in a subsequent
patch for clarity.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:35:56 -04:00
Paul Durrant 72eec92acc xen-netback: fix extra_info handling in xenvif_tx_err()
Patch 562abd39 "xen-netback: support multiple extra info fragments
passed from frontend" contained a mistake which can result in an in-
correct number of responses being generated when handling errors
encountered when processing packets containing extra info fragments.
This patch fixes the problem.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reported-by: Jan Beulich <JBeulich@suse.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13 01:58:57 -04:00
Paul Durrant 8e4ee59c1e xen-netback: reduce log spam
Remove the "prepare for reconnect" pr_info in xenbus.c. It's largely
uninteresting and the states of the frontend and backend can easily be
observed by watching the (o)xenstored log.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 22:08:01 -04:00
Paul Durrant 562abd39a1 xen-netback: support multiple extra info fragments passed from frontend
The code does not currently support a frontend passing multiple extra info
fragments to the backend in a tx request. The xenvif_get_extras() function
handles multiple extra_info fragments but make_tx_response() assumes there
is only ever a single extra info fragment.

This patch modifies xenvif_get_extras() to pass back a count of extra
info fragments, which is then passed to make_tx_response() (after
possibly being stashed in pending_tx_info for deferred responses).

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 22:08:01 -04:00
Paul Durrant 22fae97d86 xen-netback: implement dynamic multicast control
My recent patch to the Xen Project documents a protocol for 'dynamic
multicast control' in netif.h. This extends the previous multicast control
protocol to not require a shared ring reconnection to turn the feature off.
Instead the backend watches the "request-multicast-control" key in xenstore
and turns the feature off if the key value is written to zero.

This patch adds support for dynamic multicast control in xen-netback.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07 13:58:36 -05:00
David Vrabel 9c6f3ffe82 xen-netback: free queues after freeing the net device
If a queue still has a NAPI instance added to the net device, freeing
the queues early results in a use-after-free.

The shouldn't ever happen because we disconnect and tear down all queues
before freeing the net device, but doing this makes it obviously safe.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-15 15:13:19 -05:00
David Vrabel 4a65852727 xen-netback: delete NAPI instance when queue fails to initialize
When xenvif_connect() fails it may leave a stale NAPI instance added to
the device.  Make sure we delete it in the error path.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-15 15:13:18 -05:00
David Vrabel 99a2dea50d xen-netback: use skb to determine number of required guest Rx requests
Using the MTU or GSO size to determine the number of required guest Rx
requests for an skb was subtly broken since these value may change at
runtime.

After 1650d5455b (xen-netback: always
fully coalesce guest Rx packets) we always fully pack a packet into
its guest Rx slots.  Calculating the number of required slots from the
packet length is then easy.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-15 15:13:18 -05:00
Linus Torvalds 3273cba195 xen: bug fixes for 4.4-rc5
- XSA-155 security fixes to backend drivers.
 - XSA-157 security fixes to pciback.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWdDrXAAoJEFxbo/MsZsTR3N0H/0Lvz6MWBARCje7livbz7nqE
 PS0Bea+2yAfNhCDDiDlpV0lor8qlyfWDF6lGhLjItldAzahag3ZDKDf1Z/lcQvhf
 3MwFOcOVZE8lLtvLT6LGnPuehi1Mfdi1Qk1/zQhPhsq6+FLPLT2y+whmBihp8mMh
 C12f7KRg5r3U7eZXNB6MEtGA0RFrOp0lBdvsiZx3qyVLpezj9mIe0NueQqwY3QCS
 xQ0fILp/x2EnZNZuzgghFTPRxMAx5ReOezgn9Rzvq4aThD+irz1y6ghkYN4rG2s2
 tyYOTqBnjJEJEQ+wmYMhnfCwVvDffztG+uI9hqN31QFJiNB0xsjSWFCkDAWchiU=
 =Argz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:
 - XSA-155 security fixes to backend drivers.
 - XSA-157 security fixes to pciback.

* tag 'for-linus-4.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen-pciback: fix up cleanup path when alloc fails
  xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.
  xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.
  xen/pciback: Do not install an IRQ handler for MSI interrupts.
  xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled
  xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled
  xen/pciback: Save xen_pci_op commands before processing it
  xen-scsiback: safely copy requests
  xen-blkback: read from indirect descriptors only once
  xen-blkback: only read request operation from shared ring once
  xen-netback: use RING_COPY_REQUEST() throughout
  xen-netback: don't use last request to determine minimum Tx credit
  xen: Add RING_COPY_REQUEST()
  xen/x86/pvh: Use HVM's flush_tlb_others op
  xen: Resume PMU from non-atomic context
  xen/events/fifo: Consume unprocessed events when a CPU dies
2015-12-18 12:24:52 -08:00
David Vrabel 68a33bfd84 xen-netback: use RING_COPY_REQUEST() throughout
Instead of open-coding memcpy()s and directly accessing Tx and Rx
requests, use the new RING_COPY_REQUEST() that ensures the local copy
is correct.

This is more than is strictly necessary for guest Rx requests since
only the id and gref fields are used and it is harmless if the
frontend modifies these.

This is part of XSA155.

CC: stable@vger.kernel.org
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-18 10:00:28 -05:00
David Vrabel 0f589967a7 xen-netback: don't use last request to determine minimum Tx credit
The last from guest transmitted request gives no indication about the
minimum amount of credit that the guest might need to send a packet
since the last packet might have been a small one.

Instead allow for the worst case 128 KiB packet.

This is part of XSA155.

CC: stable@vger.kernel.org
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-18 10:00:23 -05:00
Linus Torvalds 41ecf1404b xen: features for 4.4-rc0
- Improve balloon driver memory hotplug placement.
 - Use unpopulated hotplugged memory for foreign pages (if
   supported/enabled).
 - Support 64 KiB guest pages on arm64.
 - CPU hotplug support on arm/arm64.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOeSkAAoJEFxbo/MsZsTRph0H/0nE8Tx0GyGtOyCYfBdInTvI
 WgjvL8VR1XrweZMVis3668MzhLSYg6b5lvJsoi+L3jlzYRyze43iHXsKfvp+8p0o
 TVUhFnlHEHF8ASEtPydAi6HgS7Dn9OQ9LaZ45R1Gk0rHnwJjIQonhTn2jB0yS9Am
 Hf4aZXP2NVZphjYcloqNsLH0G6mGLtgq8cS0uKcVO2YIrR4Dr3sfj9qfq9mflf8n
 sA/5ifoHRfOUD1vJzYs4YmIBUv270jSsprWK/Mi2oXIxUTBpKRAV1RVCAPW6GFci
 HIZjIJkjEPWLsvxWEs0dUFJQGp3jel5h8vFPkDWBYs3+9rILU2DnLWpKGNDHx3k=
 =vUfa
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:

 - Improve balloon driver memory hotplug placement.

 - Use unpopulated hotplugged memory for foreign pages (if
   supported/enabled).

 - Support 64 KiB guest pages on arm64.

 - CPU hotplug support on arm/arm64.

* tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (44 commits)
  xen: fix the check of e_pfn in xen_find_pfn_range
  x86/xen: add reschedule point when mapping foreign GFNs
  xen/arm: don't try to re-register vcpu_info on cpu_hotplug.
  xen, cpu_hotplug: call device_offline instead of cpu_down
  xen/arm: Enable cpu_hotplug.c
  xenbus: Support multiple grants ring with 64KB
  xen/grant-table: Add an helper to iterate over a specific number of grants
  xen/xenbus: Rename *RING_PAGE* to *RING_GRANT*
  xen/arm: correct comment in enlighten.c
  xen/gntdev: use types from linux/types.h in userspace headers
  xen/gntalloc: use types from linux/types.h in userspace headers
  xen/balloon: Use the correct sizeof when declaring frame_list
  xen/swiotlb: Add support for 64KB page granularity
  xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb
  arm/xen: Add support for 64KB page granularity
  xen/privcmd: Add support for Linux 64KB page granularity
  net/xen-netback: Make it running on 64KB page granularity
  net/xen-netfront: Make it running on 64KB page granularity
  block/xen-blkback: Make it running on 64KB page granularity
  block/xen-blkfront: Make it running on 64KB page granularity
  ...
2015-11-04 17:32:42 -08:00
Julien Grall d0089e8a0e net/xen-netback: Make it running on 64KB page granularity
The PV network protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity working as a
network backend on a non-modified Xen.

It's only necessary to adapt the ring size and break skb data in small
chunk of 4KB. The rest of the code is relying on the grant table code.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23 14:20:41 +01:00
Julien Grall a0f2e80fcd net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
The skb doesn't change within the function. Therefore it's only
necessary to check if we need GSO once at the beginning.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23 14:20:32 +01:00
Insu Yun 833b8f18ad xen-netback: correctly check failed allocation
Since vzalloc can be failed in memory pressure,
writes -ENOMEM to xenstore to indicate error.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 19:37:29 -07:00
Linus Torvalds 06ab838c20 xen: MFN/GFN/BFN terminology changes for 4.3-rc0
- Use the correct GFN/BFN terms more consistently.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV8VRMAAoJEFxbo/MsZsTRiGQH/i/jrAJUJfrFC2PINaA2gDwe
 O0dlrkCiSgAYChGmxxxXZQSPM5Po5+EbT/dLjZ/uvSooeorM9RYY/mFo7ut/qLep
 4pyQUuwGtebWGBZTrj9sygUVXVhgJnyoZxskNUbhj9zvP7hb9++IiI78mzne6cpj
 lCh/7Z2dgpfRcKlNRu+qpzP79Uc7OqIfDK+IZLrQKlXa7IQDJTQYoRjbKpfCtmMV
 BEG3kN9ESx5tLzYiAfxvaxVXl9WQFEoktqe9V8IgOQlVRLgJ2DQWS6vmraGrokWM
 3HDOCHtRCXlPhu1Vnrp0R9OgqWbz8FJnmVAndXT8r3Nsjjmd0aLwhJx7YAReO/4=
 =JDia
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen terminology fixes from David Vrabel:
 "Use the correct GFN/BFN terms more consistently"

* tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn
  xen/privcmd: Further s/MFN/GFN/ clean-up
  hvc/xen: Further s/MFN/GFN clean-up
  video/xen-fbfront: Further s/MFN/GFN clean-up
  xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn
  xen: Use correctly the Xen memory terminologies
  arm/xen: implement correctly pfn_to_mfn
  xen: Make clear that swiotlb and biomerge are dealing with DMA address
2015-09-10 16:21:11 -07:00
Wei Liu 4c82ac3c37 xen-netback: respect user provided max_queues
Originally that parameter was always reset to num_online_cpus during
module initialisation, which renders it useless.

The fix is to only set max_queues to num_online_cpus when user has not
provided a value.

Reported-by: Johnny Strom <johnny.strom@linuxsolutions.fi>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-10 10:11:48 -07:00
David Vrabel 1d5d485239 xen-netback: require fewer guest Rx slots when not using GSO
Commit f48da8b14d (xen-netback: fix
unlimited guest Rx internal queue and carrier flapping) introduced a
regression.

The PV frontend in IPXE only places 4 requests on the guest Rx ring.
Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
not receive any packets.

a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
   for the largest possible packet.  Calculate the required slots
   based on the maximum GSO size or the MTU.

   This calculation of the number of required slots relies on
   1650d5455b (xen-netback: always fully coalesce guest Rx packets)
   which present in 4.0-rc1 and later.

b) Reduce the Rx stall detection to checking for at least one
   available Rx request.  This is fine since we're predominately
   concerned with detecting interfaces which are down and thus have
   zero available Rx requests.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-09 12:34:35 -07:00
Julien Grall 0df4f266b3 xen: Use correctly the Xen memory terminologies
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
is meant, I suspect this is because the first support for Xen was for
PV. This resulted in some misimplementation of helpers on ARM and
confused developers about the expected behavior.

For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
Although, if we look at the implementation on x86, it's returning a GFN.

For clarity and avoid new confusion, replace any reference to mfn with
gfn in any helpers used by PV drivers. The x86 code will still keep some
reference of pfn_to_mfn which may be used by all kind of guests
No changes as been made in the hypercall field, even
though they may be invalid, in order to keep the same as the defintion
in xen repo.

Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a
name to close to the KVM function gfn_to_page.

Take also the opportunity to simplify simple construction such
as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up
will come in follow-up patches.

[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-08 18:03:49 +01:00
Paul Durrant 210c34dcd8 xen-netback: add support for multicast control
Xen's PV network protocol includes messages to add/remove ethernet
multicast addresses to/from a filter list in the backend. This allows
the frontend to request the backend only forward multicast packets
which are of interest thus preventing unnecessary noise on the shared
ring.

The canonical netif header in git://xenbits.xen.org/xen.git specifies
the message format (two more XEN_NETIF_EXTRA_TYPEs) so the minimal
necessary changes have been pulled into include/xen/interface/io/netif.h.

To prevent the frontend from extending the multicast filter list
arbitrarily a limit (XEN_NETBK_MCAST_MAX) has been set to 64 entries.
This limit is not specified by the protocol and so may change in future.
If the limit is reached then the next XEN_NETIF_EXTRA_TYPE_MCAST_ADD
sent by the frontend will be failed with NETIF_RSP_ERROR.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02 11:45:00 -07:00
David S. Miller 182ad468e7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cavium/Kconfig

The cavium conflict was overlapping dependency
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 16:23:11 -07:00
Ross Lagerwall 57b229063a xen/netback: Wake dealloc thread after completing zerocopy work
Waking the dealloc thread before decrementing inflight_packets is racy
because it means the thread may go to sleep before inflight_packets is
decremented. If kthread_stop() has already been called, the dealloc
thread may wait forever with nothing to wake it. Instead, wake the
thread only after decrementing inflight_packets.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 23:42:48 -07:00
Ross Lagerwall 2475b22526 xen-netback: Allocate fraglist early to avoid complex rollback
Determine if a fraglist is needed in the tx path, and allocate it if
necessary before setting up the copy and map operations.
Otherwise, undoing the copy and map operations is tricky.

This fixes a use-after-free: if allocating the fraglist failed, the copy
and map operations that had been set up were still executed, writing
over the data area of a freed skb.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-03 22:23:03 -07:00
David S. Miller c5e40ee287 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/bridge/br_mdb.c

br_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-23 00:41:16 -07:00
Dan Carpenter 50c2e4dd67 net/xen-netback: off by one in BUG_ON() condition
The > should be >=.  I also added spaces around the '-' operations so
the code is a little more consistent and matches the condition better.

Fixes: f53c3fe8da ('xen-netback: Introduce TX grant mapping')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-14 15:40:52 -07:00
Li, Liang Z 6ab13b2769 xen-netback: remove duplicated function definition
There are two duplicated xenvif_zerocopy_callback() definitions.
Remove one of them.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 14:12:53 -07:00
Linus Torvalds 7adf12b87f xen: features and cleanups for 4.2-rc0
- Add "make xenconfig" to assist in generating configs for Xen guests.
 - Preparatory cleanups necessary for supporting 64 KiB pages in ARM
   guests.
 - Automatically use hvc0 as the default console in ARM guests.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJVkpoqAAoJEFxbo/MsZsTRu3IH/2AMPx2i65hoSqfHtGf3sz/z
 XNfcidVmOElFVXGaW83m0tBWMemT5LpOGRfiq5sIo8xt/8xD2vozEkl/3kkf3RrX
 EmZDw3E8vmstBdBTjWdovVhNenRc0m0pB5daS7wUdo9cETq1ag1L3BHTB3fEBApO
 74V6qAfnhnq+snqWhRD3XAk3LKI0nWuWaV+5HsmxDtnunGhuRLGVs7mwxZGg56sM
 mILA0eApGPdwyVVpuDe0SwV52V8E/iuVOWTcomGEN2+cRWffG5+QpHxQA8bOtF6O
 KfqldiNXOY/idM+5+oSm9hespmdWbyzsFqmTYz0LvQvxE8eEZtHHB3gIcHkE8QU=
 =danz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:
 "Xen features and cleanups for 4.2-rc0:

   - add "make xenconfig" to assist in generating configs for Xen guests

   - preparatory cleanups necessary for supporting 64 KiB pages in ARM
     guests

   - automatically use hvc0 as the default console in ARM guests"

* tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  block/xen-blkback: s/nr_pages/nr_segs/
  block/xen-blkfront: Remove invalid comment
  block/xen-blkfront: Remove unused macro MAXIMUM_OUTSTANDING_BLOCK_REQS
  arm/xen: Drop duplicate define mfn_to_virt
  xen/grant-table: Remove unused macro SPP
  xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
  xen: Include xen/page.h rather than asm/xen/page.h
  kconfig: add xenconfig defconfig helper
  kconfig: clarify kvmconfig is for kvm
  xen/pcifront: Remove usage of struct timeval
  xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
  hvc_xen: avoid uninitialized variable warning
  xenbus: avoid uninitialized variable warning
  xen/arm: allow console=hvc0 to be omitted for guests
  arm,arm64/xen: move Xen initialization earlier
  arm/xen: Correctly check if the event channel interrupt is present
2015-07-01 11:53:46 -07:00
David S. Miller 3a07bd6fea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/main.c
	net/packet/af_packet.c

Both conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 02:58:51 -07:00
Palik, Imre 12b322ac85 xen-netback: fix a BUG() during initialization
Commit edafc132ba ("xen-netback: making the bandwidth limiter runtime settable")
introduced the capability to change the bandwidth rate limit at runtime.
But it also introduced a possible crashing bug.

If netback receives two XenbusStateConnected without getting the
hotplug-status watch firing in between, then it will try to register the
watches for the rate limiter again.  But this triggers a BUG() in the watch
registration code.

The fix modifies connect() to remove the possibly existing packet-rate
watches before trying to install those watches.  This behaviour is in line
with how connect() deals with the hotplug-status watch.

Signed-off-by: Imre Palik <imrep@amazon.de>
Cc: Matt Wilson <msw@amazon.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-23 03:34:02 -07:00
Julien Grall 68946159da net/xen-netback: Don't mix hexa and decimal with 0x in the printf format
Append 0x to all %x in order to avoid while reading when there is other
decimal value in the log.

Also replace some of the hexadecimal print to decimal to uniformize the
format with netfront.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-21 09:40:41 -07:00
Julien Grall 44f0764cfe net/xen-netback: Remove unused code in xenvif_rx_action
The variables old_req_cons and ring_slots_used are assigned but never
used since commit 1650d5455b "xen-netback:
always fully coalesce guest Rx packets".

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-21 09:40:40 -07:00
Julien Grall a9fd60e268 xen: Include xen/page.h rather than asm/xen/page.h
Using xen/page.h will be necessary later for using common xen page
helpers.

As xen/page.h already include asm/xen/page.h, always use the later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-06-17 16:14:18 +01:00
David S. Miller dda922c831 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/amd-xgbe-phy.c
	drivers/net/wireless/iwlwifi/Kconfig
	include/net/mac80211.h

iwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.

The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 22:51:30 -07:00
Ian Campbell 31a418986a xen: netback: read hotplug script once at start of day.
When we come to tear things down in netback_remove() and generate the
uevent it is possible that the xenstore directory has already been
removed (details below).

In such cases netback_uevent() won't be able to read the hotplug
script and will write a xenstore error node.

A recent change to the hypervisor exposed this race such that we now
sometimes lose it (where apparently we didn't ever before).

Instead read the hotplug script configuration during setup and use it
for the lifetime of the backend device.

The apparently more obvious fix of moving the transition to
state=Closed in netback_remove() to after the uevent does not work
because it is possible that we are already in state=Closed (in
reaction to the guest having disconnected as it shutdown). Being
already in Closed means the toolstack is at liberty to start tearing
down the xenstore directories. In principal it might be possible to
arrange to unregister the device sooner (e.g on transition to Closing)
such that xenstore would still be there but this state machine is
fragile and prone to anger...

A modern Xen system only relies on the hotplug uevent for driver
domains, when the backend is in the same domain as the toolstack it
will run the necessary setup/teardown directly in the correct sequence
wrt xenstore changes.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 12:03:04 -07:00
Ian Campbell dc5e7a811d xen: netback: fix printf format string warning
drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=]
        (txreq.offset&~PAGE_MASK) + txreq.size);
        ^

PAGE_MASK's type can vary by arch, so a cast is needed.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
----
v2: Cast to unsigned long, since PAGE_MASK can vary by arch.
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 12:03:04 -07:00
Ross Lagerwall ce0e5c522d xen/netback: Properly initialize credit_bytes
Commit e9ce7cb6b1 ("xen-netback: Factor queue-specific data into queue
struct") introduced a regression when moving queue-specific data into
the queue struct by failing to set the credit_bytes field. This
prevented bandwidth limiting from working. Initialize the field as it
was done before multiqueue support was added.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27 12:56:55 -04:00
Shailendra Verma c489dbb189 net:xen-netback - Change 1 to true for bool type variable.
The variable separate_tx_rx_irq is bool type so assigning true
instead of 1.

Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 16:34:35 -04:00
Linus Torvalds 497a5df7bf xen: features and fixes for 4.1-rc0
- Use a single source list of hypercalls, generating other tables
   etc. at build time.
 - Add a "Xen PV" APIC driver to support >255 VCPUs in PV guests.
 - Significant performance improve to guest save/restore/migration.
 - scsiback/front save/restore support.
 - Infrastructure for multi-page xenbus rings.
 - Misc fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJVL6OZAAoJEFxbo/MsZsTRs3YH/2AycBHs129DNnc2OLmOklBz
 AdD43k+FOfZlv0YU80WmPVmOpGGHGB5Pqkix2KtnvPYmtx3pb/5ikhDwSTWZpqBl
 Qq6/RgsRjYZ8VMKqrMTkJMrJWHQYbg8lgsP5810nsFBn/Qdbxms+WBqpMkFVo3b2
 rvUZj8QijMJPS3qr55DklVaOlXV4+sTAytTdCiubVnaB/agM2jjRflp/lnJrhtTg
 yc4NTrIlD1RsMV/lNh92upBP/pCm6Bs0zQ2H1v3hkdhBBmaO0IVXpSheYhfDOHfo
 9v209n137N7X86CGWImFk6m2b+EfiFnLFir07zKSA+iZwkYKn75znSdPfj0KCc0=
 =bxTm
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-4.1-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen features and fixes from David Vrabel:

 - use a single source list of hypercalls, generating other tables etc.
   at build time.

 - add a "Xen PV" APIC driver to support >255 VCPUs in PV guests.

 - significant performance improve to guest save/restore/migration.

 - scsiback/front save/restore support.

 - infrastructure for multi-page xenbus rings.

 - misc fixes.

* tag 'stable/for-linus-4.1-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pci: Try harder to get PXM information for Xen
  xenbus_client: Extend interface to support multi-page ring
  xen-pciback: also support disabling of bus-mastering and memory-write-invalidate
  xen: support suspend/resume in pvscsi frontend
  xen: scsiback: add LUN of restored domain
  xen-scsiback: define a pr_fmt macro with xen-pvscsi
  xen/mce: fix up xen_late_init_mcelog() error handling
  xen/privcmd: improve performance of MMAPBATCH_V2
  xen: unify foreign GFN map/unmap for auto-xlated physmap guests
  x86/xen/apic: WARN with details.
  x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs
  xen/pciback: Don't print scary messages when unsupported by hypervisor.
  xen: use generated hypercall symbols in arch/x86/xen/xen-head.S
  xen: use generated hypervisor symbols in arch/x86/xen/trace.c
  xen: synchronize include/xen/interface/xen.h with xen
  xen: build infrastructure for generating hypercall depending symbols
  xen: balloon: Use static attribute groups for sysfs entries
  xen: pcpu: Use static attribute groups for sysfs entry
2015-04-16 14:01:03 -05:00
Wei Liu ccc9d90a9a xenbus_client: Extend interface to support multi-page ring
Originally Xen PV drivers only use single-page ring to pass along
information. This might limit the throughput between frontend and
backend.

The patch extends Xenbus driver to support multi-page ring, which in
general should improve throughput if ring is the bottleneck. Changes to
various frontend / backend to adapt to the new interface are also
included.

Affected Xen drivers:
* blkfront/back
* netfront/back
* pcifront/back
* scsifront/back
* vtpmfront

The interface is documented, as before, in xenbus_client.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-04-15 10:56:47 +01:00
David S. Miller 0fa74a4be4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	net/core/sysctl_net_core.c
	net/ipv4/inet_diag.c

The be_main.c conflict resolution was really tricky.  The conflict
hunks generated by GIT were very unhelpful, to say the least.  It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().

So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged.  And this worked beautifully.

The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 18:51:09 -04:00
Palik, Imre edafc132ba xen-netback: making the bandwidth limiter runtime settable
With the current netback, the bandwidth limiter's parameters are only
settable during vif setup time.  This patch register a watch on them, and
thus makes them runtime changeable.

When the watch fires, the timer is reset.  The timer's mutex is used for
fencing the change.

Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Imre Palik <imrep@amazon.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 12:55:15 -04:00
David Vrabel c8a4d29988 xen-netback: notify immediately after pushing Tx response.
This fixes a performance regression introduced by
7fbb9d8415 (xen-netback: release pending
index before pushing Tx responses)

Moving the notify outside of the spin locks means it can be delayed a
long time (if the dealloc thread is descheduled or there is an
interrupt or softirq).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11 23:30:44 -04:00
David S. Miller 3cef5c5b0b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cadence/macb.c

Overlapping changes in macb driver, mostly fixes and cleanups
in 'net' overlapping with the integration of at91_ether into
macb in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:38:02 -04:00
David Vrabel b0c21badf1 xen-netback: refactor xenvif_handle_frag_list()
When handling a from-guest frag list, xenvif_handle_frag_list()
replaces the frags before calling the destructor to clean up the
original (foreign) frags.  Whilst this is safe (the destructor doesn't
actually use the frags), it looks odd.

Reorder the function to be less confusing.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 14:58:18 -05:00
David Vrabel 49d9991a18 xen-netback: unref frags when handling a from-guest skb with a frag list
Every time a VIF is destroyed up to 256 pages may be leaked if packets
with more than MAX_SKB_FRAGS frags were transmitted from the guest.
Even worse, if another user of ballooned pages allocated one of these
ballooned pages it would not handle the unexpectedly >1 page count
(e.g., gntdev would deadlock when unmapping a grant because the page
count would never reach 1).

When handling a from-guest skb with a frag list, unref the frags
before releasing them so they are freed correctly when the VIF is
destroyed.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 14:58:17 -05:00
David Vrabel d63951d744 xen-netback: return correct ethtool stats
Use correct pointer arithmetic to get the pointer to each stat.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-05 14:58:17 -05:00
David S. Miller 71a83a6db6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/rocker/rocker.c

The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 21:16:48 -05:00
Joe Perches 3b6ed26d8b xen: Use eth_<foo>_addr instead of memset
Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 17:01:37 -05:00
David Vrabel 7fbb9d8415 xen-netback: release pending index before pushing Tx responses
If the pending indexes are released /after/ pushing the Tx response
then a stale pending index may be used if a new Tx request is
immediately pushed by the frontend.  The may cause various WARNINGs or
BUGs if the stale pending index is actually still in use.

Fix this by releasing the pending index before pushing the Tx
response.

The full barrier for the pending ring update is not required since the
the Tx response push already has a suitable write barrier.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-24 16:24:22 -05:00
Linus Torvalds c5ce28df0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) More iov_iter conversion work from Al Viro.

    [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
      wrong, and this pull actually adds an extra commit on top of the
      branch I'm pulling to fix that up, so that the pre-merge state is
      ok.   - Linus ]

 2) Various optimizations to the ipv4 forwarding information base trie
    lookup implementation.  From Alexander Duyck.

 3) Remove sock_iocb altogether, from CHristoph Hellwig.

 4) Allow congestion control algorithm selection via routing metrics.
    From Daniel Borkmann.

 5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.

 6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.

 7) Add xmit_more support to r8169, e1000, and e1000e drivers.  From
    Florian Westphal.

 8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.

 9) Add BPF packet actions to packet scheduler, from Jiri Pirko.

10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer.

11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
    Kwok.

12) More sanely handle out-of-window dupacks, which can result in
    serious ACK storms.  From Neal Cardwell.

13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
    Patrick McHardy, and Thomas Graf.

14) Support xmit_more in be2net, from Sathya Perla.

15) Group Policy extensions for vxlan, from Thomas Graf.

16) Remove Checksum Offload support for vxlan, from Tom Herbert.

17) Like ipv4, support lockless transmit over ipv6 UDP sockets.  From
    Vlad Yasevich.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
  crypto: fix af_alg_make_sg() conversion to iov_iter
  ipv4: Namespecify TCP PMTU mechanism
  i40e: Fix for stats init function call in Rx setup
  tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
  openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
  ipv6: Make __ipv6_select_ident static
  ipv6: Fix fragment id assignment on LE arches.
  bridge: Fix inability to add non-vlan fdb entry
  net: Mellanox: Delete unnecessary checks before the function call "vunmap"
  cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
  ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
  net: dsa: Remove redundant phy_attach()
  IB/mlx4: Reset flow support for IB kernel ULPs
  IB/mlx4: Always use the correct port for mirrored multicast attachments
  net/bonding: Fix potential bad memory access during bonding events
  tipc: remove tipc_snprintf
  tipc: nl compat add noop and remove legacy nl framework
  tipc: convert legacy nl stats show to nl compat
  tipc: convert legacy nl net id get to nl compat
  tipc: convert legacy nl net id set to nl compat
  ...
2015-02-10 20:01:30 -08:00
Linus Torvalds bdccc4edeb xen: features and fixes for 3.20-rc0
- Reworked handling for foreign (grant mapped) pages to simplify the
   code, enable a number of additional use cases and fix a number of
   long-standing bugs.
 - Prefer the TSC over the Xen PV clock when dom0 (and the TSC is
   stable).
 - Assorted other cleanup and minor bug fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJU2JC+AAoJEFxbo/MsZsTRIvAH/1lgQ0EQlxaZtEFWY8cJBzxY
 dXaTMfyGQOddGYDCW0r42hhXJHeX7DWXSERSD3aW9DZOn/eYdneHq9gWRD4uPrGn
 hEFQ26J4jZWR5riGXaja0LqI2gJKLZ6BhHIQciLEbY+jw4ynkNBLNRPFehuwrCsZ
 WdBwJkyvXC3RErekncRl/aNhxdi4p1P6qeiaW/mo3UcSO/CFSKybOLwT65iePazg
 XuY9UiTn2+qcRkm/tjx8K9heHK8SBEGNWuoTcWYF1to8mwwUfKIAc4NO2UBDXJI+
 rp7Z2lVFdII15JsQ08ATh3t7xDrMWLzCX/y4jCzmF3DBXLbSWdHCQMgI7TWt5pE=
 =PyJK
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen features and fixes from David Vrabel:

 - Reworked handling for foreign (grant mapped) pages to simplify the
   code, enable a number of additional use cases and fix a number of
   long-standing bugs.

 - Prefer the TSC over the Xen PV clock when dom0 (and the TSC is
   stable).

 - Assorted other cleanup and minor bug fixes.

* tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (25 commits)
  xen/manage: Fix USB interaction issues when resuming
  xenbus: Add proper handling of XS_ERROR from Xenbus for transactions.
  xen/gntdev: provide find_special_page VMA operation
  xen/gntdev: mark userspace PTEs as special on x86 PV guests
  xen-blkback: safely unmap grants in case they are still in use
  xen/gntdev: safely unmap grants in case they are still in use
  xen/gntdev: convert priv->lock to a mutex
  xen/grant-table: add a mechanism to safely unmap pages that are in use
  xen-netback: use foreign page information from the pages themselves
  xen: mark grant mapped pages as foreign
  xen/grant-table: add helpers for allocating pages
  x86/xen: require ballooned pages for grant maps
  xen: remove scratch frames for ballooned pages and m2p override
  xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs()
  mm: add 'foreign' alias for the 'pinned' page flag
  mm: provide a find_special_page vma operation
  x86/xen: cleanup arch/x86/xen/mmu.c
  x86/xen: add some __init annotations in arch/x86/xen/mmu.c
  x86/xen: add some __init and static annotations in arch/x86/xen/setup.c
  x86/xen: use correct types for addresses in arch/x86/xen/setup.c
  ...
2015-02-10 13:56:56 -08:00
Lad, Prabhakar 38741d50eb xen-netback: fix sparse warning
this patch fixes following sparse warning:

interface.c:83:5: warning: symbol 'xenvif_poll' was not declared. Should it be static?

Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 16:04:22 -08:00
David S. Miller 6e03f896b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 14:33:28 -08:00
David Vrabel 42b5212fee xen-netback: stop the guest rx thread after a fatal error
After commit e9d8b2c296 (xen-netback:
disable rogue vif in kthread context), a fatal (protocol) error would
leave the guest Rx thread spinning, wasting CPU time.  Commit
ecf08d2dbb (xen-netback: reintroduce
guest Rx stall detection) made this even worse by removing a
cond_resched() from this path.

Since a fatal error is non-recoverable, just allow the guest Rx thread
to exit.  This requires taking additional refs to the task so the
thread exiting early is handled safely.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Julien Grall <julien.grall@linaro.org>
Tested-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:39:04 -08:00
Jennifer Herbert c2677a6fc4 xen-netback: use foreign page information from the pages themselves
Use the foreign page flag in netback to get the domid and grant ref
needed for the grant copy.  This signficiantly simplifies the netback
code and makes netback work with foreign pages from other backends
(e.g., blkback).

This allows blkback to use iSCSI disks provided by domUs running on
the same host.

Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-28 14:03:13 +00:00
David Vrabel ff4b156f16 xen/grant-table: add helpers for allocating pages
Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages
suitable to for granted maps.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-01-28 14:03:12 +00:00
Jennifer Herbert 0ae65f49af x86/xen: require ballooned pages for grant maps
Ballooned pages are always used for grant maps which means the
original frame does not need to be saved in page->index nor restored
after the grant unmap.

This allows the workaround in netback for the conflicting use of the
(unionized) page->index and page->pfmemalloc to be removed.

Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-01-28 14:03:11 +00:00
David Vrabel 1650d5455b xen-netback: always fully coalesce guest Rx packets
Always fully coalesce guest Rx packets into the minimum number of ring
slots.  Reducing the number of slots per packet has significant
performance benefits when receiving off-host traffic.

Results from XenServer's performance benchmarks:

                         Baseline    Full coalesce
Interhost VM receive      7.2 Gb/s   11 Gb/s
Interhost aggregate      24 Gb/s     24 Gb/s
Intrahost single stream  14 Gb/s     14 Gb/s
Intrahost aggregate      34 Gb/s     34 Gb/s

However, this can increase the number of grant ops per packet which
decreases performance of backend (dom0) to VM traffic (by ~10%)
/unless/ grant copy has been optimized for adjacent ops with the same
source or destination (see "grant-table: defer releasing pages
acquired in a grant copy"[1] expected in Xen 4.6).

[1] http://lists.xen.org/archives/html/xen-devel/2015-01/msg01118.html

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-23 18:01:58 -08:00
Palik, Imre 07ff890dae xen-netback: fixing the propagation of the transmit shaper timeout
Since e9ce7cb6b1 ("xen-netback: Factor queue-specific data into queue struct"),
the transimt shaper timeout is always set to 0.  The value the user sets via
xenbus is never propagated to the transmit shaper.

This patch fixes the issue.

Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Imre Palik <imrep@amazon.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-06 14:17:37 -05:00
David Vrabel 26c0e10258 xen-netback: support frontends without feature-rx-notify again
Commit bc96f648df (xen-netback: make
feature-rx-notify mandatory) incorrectly assumed that there were no
frontends in use that did not support this feature.  But the frontend
driver in MiniOS does not and since this is used by (qemu) stubdoms,
these stopped working.

Netback sort of works as-is in this mode except:

- If there are no Rx requests and the internal Rx queue fills, only
  the drain timeout will wake the thread.  The default drain timeout
  of 10 s would give unacceptable pauses.

- If an Rx stall was detected and the internal Rx queue is drained,
  then the Rx thread would never wake.

Handle these two cases (when feature-rx-notify is disabled) by:

- Reducing the drain timeout to 30 ms.

- Disabling Rx stall detection.

Reported-by: John <jw@nuclearfallout.net>
Tested-by: John <jw@nuclearfallout.net>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-18 12:49:49 -05:00
David S. Miller 22f10923dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/amd/xgbe/xgbe-desc.c
	drivers/net/ethernet/renesas/sh_eth.c

Overlapping changes in both conflict cases.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:48:20 -05:00
Jan Beulich f15650b7f9 netback: don't store invalid vif pointer
When xenvif_alloc() fails, it returns a non-NULL error indicator. To
avoid eventual races, we shouldn't store that into struct backend_info
as readers of it only check for NULL.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 18:30:08 -05:00
David S. Miller 60b7379dc5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
Alexey Khoroshilov 2dd34339ac xen-netback: do not report success if backend_create_xenvif() fails
If xenvif_alloc() or xenbus_scanf() fail in backend_create_xenvif(),
xenbus is left in offline mode but netback_probe() reports success.

The patch implements propagation of error code for backend_create_xenvif().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:14:45 -05:00
Malcolm Crossley 7e5d775395 xen-netback: remove unconditional __pskb_pull_tail() in guest Tx path
Unconditionally pulling 128 bytes into the linear area is not required
for:

- security: Every protocol demux starts with pskb_may_pull() to pull
  frag data into the linear area, if necessary, before looking at
  headers.

- performance: Netback has already grant copied up-to 128 bytes from
  the first slot of a packet into the linear area. The first slot
  normally contain all the IPv4/IPv6 and TCP/UDP headers.

The unconditional pull would often copy frag data unnecessarily.  This
is a performance problem when running on a version of Xen where grant
unmap avoids TLB flushes for pages which are not accessed.  TLB
flushes can now be avoided for > 99% of unmaps (it was 0% before).

Grant unmap TLB flush avoidance will be available in a future version
of Xen (probably 4.6).

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 14:40:18 -05:00
David S. Miller 55b42b5ca2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/marvell.c

Simple overlapping changes in drivers/net/phy/marvell.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-01 14:53:27 -04:00
Zoltan Kiss 44cc8ed17e xen-netback: Remove __GFP_COLD
This flag is unnecessary, it came from some old code.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 15:59:37 -04:00
Zoltan Kiss 8fe78989c3 xen-netback: Disable NAPI after disabling interrupts
Otherwise the interrupt handler still calls napi_complete. Although it
won't schedule NAPI again as either NAPI_STATE_DISABLE or
NAPI_STATE_SCHED is set, it is just unnecessary, and it makes more
sense to do this way.

Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 15:59:37 -04:00
David Vrabel ecf08d2dbb xen-netback: reintroduce guest Rx stall detection
If a frontend not receiving packets it is useful to detect this and
turn off the carrier so packets are dropped early instead of being
queued and drained when they expire.

A to-guest queue is stalled if it doesn't have enough free slots for a
an extended period of time (default 60 s).

If at least one queue is stalled, the carrier is turned off (in the
expectation that the other queues will soon stall as well).  The
carrier is only turned on once all queues are ready.

When the frontend connects, all the queues start in the stalled state
and only become ready once the frontend queues enough Rx requests.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25 14:15:20 -04:00
David Vrabel f48da8b14d xen-netback: fix unlimited guest Rx internal queue and carrier flapping
Netback needs to discard old to-guest skb's (guest Rx queue drain) and
it needs detect guest Rx stalls (to disable the carrier so packets are
discarded earlier), but the current implementation is very broken.

1. The check in hard_start_xmit of the slot availability did not
   consider the number of packets that were already in the guest Rx
   queue.  This could allow the queue to grow without bound.

   The guest stops consuming packets and the ring was allowed to fill
   leaving S slot free.  Netback queues a packet requiring more than S
   slots (ensuring that the ring stays with S slots free).  Netback
   queue indefinately packets provided that then require S or fewer
   slots.

2. The Rx stall detection is not triggered in this case since the
   (host) Tx queue is not stopped.

3. If the Tx queue is stopped and a guest Rx interrupt occurs, netback
   will consider this an Rx purge event which may result in it taking
   the carrier down unnecessarily.  It also considers a queue with
   only 1 slot free as unstalled (even though the next packet might
   not fit in this).

The internal guest Rx queue is limited by a byte length (to 512 Kib,
enough for half the ring).  The (host) Tx queue is stopped and started
based on this limit.  This sets an upper bound on the amount of memory
used by packets on the internal queue.

This allows the estimatation of the number of slots for an skb to be
removed (it wasn't a very good estimate anyway).  Instead, the guest
Rx thread just waits for enough free slots for a maximum sized packet.

skbs queued on the internal queue have an 'expires' time (set to the
current time plus the drain timeout).  The guest Rx thread will detect
when the skb at the head of the queue has expired and discard expired
skbs.  This sets a clear upper bound on the length of time an skb can
be queued for.  For a guest being destroyed the maximum time needed to
wait for all the packets it sent to be dropped is still the drain
timeout (10 s) since it will not be sending new packets.

Rx stall detection is reintroduced in a later commit.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25 14:15:20 -04:00
David Vrabel bc96f648df xen-netback: make feature-rx-notify mandatory
Frontends that do not provide feature-rx-notify may stall because
netback depends on the notification from frontend to wake the guest Rx
thread (even if can_queue is false).

This could be fixed but feature-rx-notify was introduced in 2006 and I
am not aware of any frontends that do not implement this.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25 14:15:20 -04:00
David Vrabel 95afae4814 xen: remove DEFINE_XENBUS_DRIVER() macro
The DEFINE_XENBUS_DRIVER() macro looks a bit weird and causes sparse
errors.

Replace the uses with standard structure definitions instead.  This is
similar to pci and usb device registration.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-06 10:27:57 +01:00
Wei Liu e24f8191cc xen-netback: move netif_napi_add before binding interrupt
Interrupt is enabled when bind_interdomain_evtchn_to_irqhandler returns.
If there's interrupt pending interrupt handler is invoked.

NAPI needs to be initialised before binding interrupt otherwise the
interrupt handler will try to scheduling a NAPI instance that is not
initialised yet, resulting in kernel OOPS.

This fixes a regression introduced in ea2c5e13 ("xen-netback: move NAPI
add/remove calls").

Ideally function calls to create kthreads should also be moved before
binding but I intent to fix this regression with minimal changes and
refactor the code with another patch.

Reported-by: Thomas Leonard <talex5@gmail.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25 17:31:42 -07:00
Wei Liu b125285821 xen-netback: remove loop waiting function
The original implementation relies on a loop to check if all inflight
packets are freed. Now we have proper reference counting, there's no
need to use loop anymore.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13 20:07:44 -07:00
Wei Liu a64bd93452 xen-netback: don't stop dealloc kthread too early
Reference count the number of packets in host stack, so that we don't
stop the deallocation thread too early. If not, we can end up with
xenvif_free permanently waiting for deallocation thread to unmap grefs.

Reported-by: Thomas Leonard <talex5@gmail.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13 20:07:44 -07:00
Wei Liu ea2c5e1342 xen-netback: move NAPI add/remove calls
Originally netif_napi_add was in xenvif_init_queue and netif_napi_del
was in xenvif_deinit_queue, while kthreads were handled in
xenvif_connect and xenvif_disconnect. Move netif_napi_add and
netif_napi_del to xenvif_connect and xenvif_disconnect so that they
reside together with kthread operations.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13 20:07:44 -07:00
Wei Liu 628fa76b09 xen-netback: fix debugfs entry creation
The original code is bogus. The function gets called in a loop which
leaks entries created in previous rounds.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13 20:06:43 -07:00
Wei Liu 5c807005fa xen-netback: fix debugfs write length check
Enlarge buffer size and check input length properly, so that we don't
misuse -ENOSPC.

Note that command like "kickXXXX" is still allowed, that's one patch for
another day if we really want to be very strict on this.

Reported-by: SeeChen Ng <seechen81@gmail.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13 20:06:43 -07:00
Zoltan Kiss 2561cc15e3 xen-netback: Don't deschedule NAPI when carrier off
In the patch called "xen-netback: Turn off the carrier if the guest is not able
to receive" NAPI was descheduled when the carrier was set off. That's
not what most of the drivers do, and we don't have any specific reason to do so
as well, so revert that change.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-11 14:07:47 -07:00
Zoltan Kiss 743b0a92b9 xen-netback: Fix vif->disable handling
In the patch called "xen-netback: Turn off the carrier if the guest is not able
to receive" new branches were introduced to this if statement, risking that a
queue with non-zero id can reenable the disabled interface.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-07 16:02:56 -07:00
Zoltan Kiss f34a4cf9c9 xen-netback: Turn off the carrier if the guest is not able to receive
Currently when the guest is not able to receive more packets, qdisc layer starts
a timer, and when it goes off, qdisc is started again to deliver a packet again.
This is a very slow way to drain the queues, consumes unnecessary resources and
slows down other guests shutdown.
This patch change the behaviour by turning the carrier off when that timer
fires, so all the packets are freed up which were stucked waiting for that vif.
Instead of the rx_queue_purge bool it uses the VIF_STATUS_RX_PURGE_EVENT bit to
signal the thread that either the timeout happened or an RX interrupt arrived,
so the thread can check what it should do. It also disables NAPI, so the guest
can't transmit, but leaves the interrupts on, so it can resurrect.
Only the queues which brought down the interface can enable it again, the bit
QUEUE_STATUS_RX_STALLED makes sure of that.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 16:04:46 -07:00
Zoltan Kiss 3d1af1df97 xen-netback: Using a new state bit instead of carrier
This patch introduces a new state bit VIF_STATUS_CONNECTED to track whether the
vif is in a connected state. Using carrier will not work with the next patch
in this series, which aims to turn the carrier temporarily off if the guest
doesn't seem to be able to receive packets.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org

v2:
- rename the bitshift type to "enum state_bit_shift" here, not in the next patch
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 16:04:46 -07:00
David S. Miller 8fd90bb889 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/infiniband/hw/cxgb4/device.c

The cxgb4 conflict was simply overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 00:44:59 -07:00
Zoltan Kiss d8cfbfc466 xen-netback: Fix pointer incrementation to avoid incorrect logging
Due to this pointer is increased prematurely, the error log contains rubbish.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reported-by: Armin Zentai <armin.zentai@ezit.hu>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 20:56:06 -07:00
Zoltan Kiss 1b860da040 xen-netback: Fix releasing header slot on error path
This patch makes this function aware that the first frag and the header might
share the same ring slot. That could happen if the first slot is bigger than
PKT_PROT_LEN. Due to this the error path might release that slot twice or never,
depending on the error scenario.
xenvif_idx_release is also removed from xenvif_idx_unmap, and called separately.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reported-by: Armin Zentai <armin.zentai@ezit.hu>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 20:56:06 -07:00
Zoltan Kiss b42cc6e421 xen-netback: Fix releasing frag_list skbs in error path
When the grant operations failed, the skb is freed up eventually, and it tries
to release the frags, if there is any. For the main skb nr_frags is set to 0 to
avoid this, but on the frag_list it iterates through the frags array, and tries
to call put_page on the page pointer which contains garbage at that time.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reported-by: Armin Zentai <armin.zentai@ezit.hu>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 20:56:06 -07:00
Zoltan Kiss 1a998d3e6b xen-netback: Fix handling frag_list on grant op error path
The error handling for skb's with frag_list was completely wrong, it caused
double unmap attempts to happen if the error was on the first skb. Move it to
the right place in the loop.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reported-by: Armin Zentai <armin.zentai@ezit.hu>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 20:56:05 -07:00
Tom Gundersen c835a67733 net: set name_assign_type in alloc_netdev()
Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.

Coccinelle patch:

@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@

(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)

v9: move comments here from the wrong commit

Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15 16:12:48 -07:00
Zoltan Kiss f51de24356 xen-netback: Adding debugfs "io_ring_qX" files
This patch adds debugfs capabilities to netback. There used to be a similar
patch floating around for classic kernel, but it used procfs. It is based on a
very similar blkback patch.
It creates xen-netback/[vifname]/io_ring_q[queueno] files, reading them output
various ring variables etc. Writing "kick" into it imitates an interrupt
happened, it can be useful to check whether the ring is just stalled due to a
missed interrupt.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 20:48:36 -07:00
Wei Liu f7b50c4e7c xen-netback: bookkeep number of active queues in our own module
The original code uses netdev->real_num_tx_queues to bookkeep number of
queues and invokes netif_set_real_num_tx_queues to set the number of
queues. However, netif_set_real_num_tx_queues doesn't allow
real_num_tx_queues to be smaller than 1, which means setting the number
to 0 will not work and real_num_tx_queues is untouched.

This is bogus when xenvif_free is invoked before any number of queues is
allocated. That function needs to iterate through all queues to free
resources. Using the wrong number of queues results in NULL pointer
dereference.

So we bookkeep the number of queues in xen-netback to solve this
problem. This fixes a regression introduced by multiqueue patchset in
3.16-rc1.

There's another bug in original code that the real number of RX queues
is never set. In current Xen multiqueue design, the number of TX queues
and RX queues are in fact the same. We need to set the numbers of TX and
RX queues to the same value.

Also remove xenvif_select_queue and leave queue selection to core
driver, as suggested by David Miller.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 15:59:47 -07:00
Arnd Bergmann e7b599d7c1 net: xen-netback: include linux/vmalloc.h again
commit e9ce7cb6b1 ("xen-netback: Factor queue-specific data into
queue struct") added a use of vzalloc/vfree to interface.c, but
removed the #include <linux/vmalloc.h> statement at the same time,
which causes this build error:

drivers/net/xen-netback/interface.c: In function 'xenvif_free':
drivers/net/xen-netback/interface.c:754:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
  vfree(vif->queues);
  ^
cc1: some warnings being treated as errors

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:19:28 -07:00
David S. Miller f666f87b94 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/xen-netback/netback.c
	net/core/filter.c

A filter bug fix overlapped some cleanups and a conversion
over to some new insn generation macros.

A xen-netback bug fix overlapped the addition of multi-queue
support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 16:22:02 -07:00
Zoltan Kiss 59ae9fc670 xen-netback: Fix handling of skbs requiring too many slots
A recent commit (a02eb4 "xen-netback: worse-case estimate in xenvif_rx_action is
underestimating") capped the slot estimation to MAX_SKB_FRAGS, but that triggers
the next BUG_ON a few lines down, as the packet consumes more slots than
estimated.
This patch introduces full_coalesce on the skb callback buffer, which is used in
start_new_rx_buffer() to decide whether netback needs coalescing more
aggresively. By doing that, no packet should need more than
(XEN_NETIF_MAX_TX_SIZE + 1) / PAGE_SIZE data slots (excluding the optional GSO
slot, it doesn't carry data, therefore irrelevant in this case), as the provided
buffers are fully utilized.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 15:09:08 -07:00
Andrew J. Bennieston 8d3d53b3e4 xen-netback: Add support for multiple queues
Builds on the refactoring of the previous patch to implement multiple
queues between xen-netfront and xen-netback.

Writes the maximum supported number of queues into XenStore, and reads
the values written by the frontend to determine how many queues to use.

Ring references and event channels are read from XenStore on a per-queue
basis and rings are connected accordingly.

Also adds code to handle the cleanup of any already initialised queues
if the initialisation of a subsequent queue fails.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 14:48:16 -07:00
Wei Liu e9ce7cb6b1 xen-netback: Factor queue-specific data into queue struct
In preparation for multi-queue support in xen-netback, move the
queue-specific data from struct xenvif into struct xenvif_queue, and
update the rest of the code to use this.

Also adds loops over queues where appropriate, even though only one is
configured at this point, and uses alloc_netdev_mq() and the
corresponding multi-queue netif wake/start/stop functions in preparation
for multiple active queues.

Finally, implements a trivial queue selection function suitable for
ndo_select_queue, which simply returns 0 for a single queue and uses
skb_get_hash() to compute the queue index otherwise.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 14:48:16 -07:00
Andrew J. Bennieston a55d9766ce xen-netback: Move grant_copy_op array back into struct xenvif.
This array was allocated separately in commit ac3d5ac2 ("xen-netback:
fix guest-receive-side array sizes") due to it being very large, and a
struct xenvif is allocated as the netdev_priv part of a struct
net_device, i.e. via kmalloc() but falling back to vmalloc() if the
initial alloc. fails.

In preparation for the multi-queue patches, where this array becomes
part of struct xenvif_queue and is always allocated through vzalloc(),
move this back into the struct xenvif.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 14:48:16 -07:00
David S. Miller 54e5c4def0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_alb.c
	drivers/net/ethernet/altera/altera_msgdma.c
	drivers/net/ethernet/altera/altera_sgdma.c
	net/ipv6/xfrm6_output.c

Several cases of overlapping changes.

The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.

In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.

Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 00:32:30 -04:00
David Vrabel 0d08fceb2e xen-netback: fix race between napi_complete() and interrupt handler
When the NAPI budget was not all used, xenvif_poll() would call
napi_complete() /after/ enabling the interrupt.  This resulted in a
race between the napi_complete() and the napi_schedule() in the
interrupt handler.  The use of local_irq_save/restore() avoided by
race iff the handler is running on the same CPU but not if it was
running on a different CPU.

Fix this properly by calling napi_complete() before reenabling
interrupts (in the xenvif_napi_schedule_or_enable_irq() call).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 16:27:23 -04:00
Zoltan Kiss 583757446b xen-netback: Fix grant ref resolution in RX path
The original series for reintroducing grant mapping for netback had a patch [1]
to handle receiving of packets from an another VIF. Grant copy on the receiving
side needs the grant ref of the page to set up the op.
The original patch assumed (wrongly) that the frags array haven't changed. In
the case reported by Sander, the sending guest sent a packet where the linear
buffer and the first frag were under PKT_PROT_LEN (=128) bytes.
xenvif_tx_submit() then pulled up the linear area to 128 bytes, and ditched the
first frag. The receiving side had an off-by-one problem when gathered the grant
refs.
This patch fixes that by checking whether the actual frag's page pointer is the
same as the page in the original frag list. It can handle any kind of changes on
the original frags array, like:
- removing granted frags from the array at any point
- adding local pages to the frags list anywhere
- reordering the frags
It's optimized to the most common case, when there is 1:1 relation between the
frags and the list, plus works optimal when frags are removed from the end or
the beginning.

[1]: 3e2234: xen-netback: Handle foreign mapped pages on the guest RX path

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-15 23:32:36 -04:00
Wilfried Klaebe 7ad24ea4bf net: get rid of SET_ETHTOOL_OPS
net: get rid of SET_ETHTOOL_OPS

Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.

Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
-       SET_ETHTOOL_OPS(dev, ops);
+       dev->ethtool_ops = ops;

Compile tested only, but I'd seriously wonder if this broke anything.

Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 17:43:20 -04:00
Zoltan Kiss 00aefceb2f xen-netback: Trivial format string fix
There is a "%" after pending_idx instead of ":".

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-04 10:49:53 -04:00
Zoltan Kiss bdab82759b xen-netback: Grant copy the header instead of map and memcpy
An old inefficiency of the TX path that we are grant mapping the first slot,
and then copy the header part to the linear area. Instead, doing a grant copy
for that header straight on is more reasonable. Especially because there are
ongoing efforts to make Xen avoiding TLB flush after unmap when the page were
not touched in Dom0. In the original way the memcpy ruined that.
The key changes:
- the vif has a tx_copy_ops array again
- xenvif_tx_build_gops sets up the grant copy operations
- we don't have to figure out whether the header and first frag are on the same
  grant mapped page or not
Note, we only grant copy PKT_PROT_LEN bytes from the first slot, the rest (if
any) will be on the first frag, which is grant mapped. If the first slot is
smaller than PKT_PROT_LEN, then we grant copy that, and later __pskb_pull_tail
will pull more from the frags (if any)

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03 14:22:40 -04:00
Zoltan Kiss 9074ce2493 xen-netback: Rename map ops
Rename identifiers to state explicitly that they refer to map ops.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03 14:22:38 -04:00
Wei Liu e9d8b2c296 xen-netback: disable rogue vif in kthread context
When netback discovers frontend is sending malformed packet it will
disables the interface which serves that frontend.

However disabling a network interface involving taking a mutex which
cannot be done in softirq context, so we need to defer this process to
kthread context.

This patch does the following:
1. introduce a flag to indicate the interface is disabled.
2. check that flag in TX path, don't do any work if it's true.
3. check that flag in RX path, turn off that interface if it's true.

The reason to disable it in RX path is because RX uses kthread. After
this change the behavior of netback is still consistent -- it won't do
any TX work for a rogue frontend, and the interface will be eventually
turned off.

Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
doesn't make sense to continue processing packets if frontend is rogue.

This is a fix for XSA-90.

Reported-by: Török Edwin <edwin@etorok.net>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-01 16:25:51 -04:00
David S. Miller 0b70195e0c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/xen-netback/netback.c

A bug fix overlapped with changing how the netback SKB control
block is implemented.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:56:43 -04:00
Paul Durrant 1425c7a4e8 xen-netback: BUG_ON in xenvif_rx_action() not catching overflow
The BUG_ON to catch ring overflow in xenvif_rx_action() makes the assumption
that meta_slots_used == ring slots used. This is not necessarily the case
for GSO packets, because the non-prefix GSO protocol consumes one more ring
slot than meta-slot for the 'extra_info'. This patch changes the test to
actually check ring slots.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:50:34 -04:00
Paul Durrant a02eb4732c xen-netback: worse-case estimate in xenvif_rx_action is underestimating
The worse-case estimate for skb ring slot usage in xenvif_rx_action()
fails to take fragment page_offset into account. The page_offset does,
however, affect the number of times the fragmentation code calls
start_new_rx_buffer() (i.e. consume another slot) and the worse-case
should assume that will always return true. This patch adds the page_offset
into the DIV_ROUND_UP for each frag.

Unfortunately some frontends aggressively limit the number of requests
they post into the shared ring so to avoid an estimate that is 'too'
pessimal it is capped at MAX_SKB_FRAGS.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:50:34 -04:00
Paul Durrant 0576eddf24 xen-netback: remove pointless clause from if statement
This patch removes a test in start_new_rx_buffer() that checks whether
a copy operation is less than MAX_BUFFER_OFFSET in length, since
MAX_BUFFER_OFFSET is defined to be PAGE_SIZE and the only caller of
start_new_rx_buffer() already limits copy operations to PAGE_SIZE or less.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:50:34 -04:00
Zoltan Kiss 7aceb47a9d xen-netback: Functional follow-up patch for grant mapping series
Ian made some late comments about the grant mapping series, I incorporated the
functional outcomes into this patch:

- use callback_param macro to shorten access to pending_tx_info in
  xenvif_fill_frags() and xenvif_tx_submit()
- print an error message in xenvif_idx_unmap() before panic

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 16:33:42 -04:00
Zoltan Kiss 0e59a4a553 xen-netback: Non-functional follow-up patch for grant mapping series
Ian made some late comments about the grant mapping series, I incorporated the
non-functional outcomes into this patch:

- typo fixes in a comment of xenvif_free(), and add another one there as well
- typo fix for comment of rx_drain_timeout_msecs
- remove stale comment before calling xenvif_grant_handle_reset()

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 16:33:42 -04:00
Zoltan Kiss 869b9b19b3 xen-netback: Stop using xenvif_tx_pending_slots_available
Since the early days TX stops if there isn't enough free pending slots to
consume a maximum sized (slot-wise) packet. Probably the reason for that is to
avoid the case when we don't have enough free pending slot in the ring to finish
the packet. But if we make sure that the pending ring has the same size as the
shared ring, that shouldn't really happen. The frontend can only post packets
which fit the to the free space of the shared ring. If it doesn't, the frontend
has to stop, as it can only increase the req_prod when the whole packet fits
onto the ring.
This patch avoid using this checking, makes sure the 2 ring has the same size,
and remove a checking from the callback. As now we don't stop the NAPI instance
on this condition, we don't have to wake it up if we free pending slots up.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 16:33:42 -04:00
David S. Miller 2c5f4f8422 xen-netback: Proper printf format for ptrdiff_t is 't'.
This fixes:

drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_dealloc_action’:
drivers/net/xen-netback/netback.c:1573:8: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘long int’ [-Wformat=]

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 19:02:16 -04:00
Zoltan Kiss 397dfd9f93 Revert "xen-netback: Aggregate TX unmap operations"
This reverts commit e9275f5e2d. This commit is the
last in the netback grant mapping series, and it tries to do more aggressive
aggreagtion of unmap operations. However practical use showed almost no
positive effect, whilst with certain frontends it causes significant performance
regression.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 18:58:07 -04:00
David S. Miller 85dcce7a73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	drivers/net/xen-netback/netback.c

Both the r8152 and netback conflicts were simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:31:55 -04:00
Wei Liu 836fbaf459 xen-netback: use skb_is_gso in xenvif_start_xmit
In 5bd076708 ("Xen-netback: Fix issue caused by using gso_type wrongly")
we use skb_is_gso to determine if we need an extra slot to accommodate
the SKB. There's similar error in interface.c. Change that to use
skb_is_gso as well.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Annie Li <annie.li@oracle.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:36:32 -04:00
Annie Li 5bd0767086 Xen-netback: Fix issue caused by using gso_type wrongly
Current netback uses gso_type to check whether the skb contains
gso offload, and this is wrong. Gso_size is the right one to
check gso existence, and gso_type is only used to check gso type.

Some skbs contains nonzero gso_type and zero gso_size, current
netback would treat these skbs as gso and create wrong response
for this. This also causes ssh failure to domu from other server.

V2: use skb_is_gso function as Paul Durrant suggested

Signed-off-by: Annie Li <annie.li@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10 21:57:50 -04:00
Zoltan Kiss e9275f5e2d xen-netback: Aggregate TX unmap operations
Unmapping causes TLB flushing, therefore we should make it in the largest
possible batches. However we shouldn't starve the guest for too long. So if
the guest has space for at least two big packets and we don't have at least a
quarter ring to unmap, delay it for at most 1 milisec.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:57:21 -05:00
Zoltan Kiss 093507885a xen-netback: Timeout packets in RX path
A malicious or buggy guest can leave its queue filled indefinitely, in which
case qdisc start to queue packets for that VIF. If those packets came from an
another guest, it can block its slots and prevent shutdown. To avoid that, we
make sure the queue is drained in every 10 seconds.
The QDisc queue in worst case takes 3 round to flush usually.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:57:15 -05:00
Zoltan Kiss e3377f36ca xen-netback: Handle guests with too many frags
Xen network protocol had implicit dependency on MAX_SKB_FRAGS. Netback has to
handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To achieve that:
- create a new skb
- map the leftover slots to its frags (no linear buffer here!)
- chain it to the previous through skb_shinfo(skb)->frag_list
- map them
- copy and coalesce the frags into a brand new one and send it to the stack
- unmap the 2 old skb's pages

It's also introduces new stat counters, which help determine how often the guest
sends a packet with more than MAX_SKB_FRAGS frags.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise malicious guests can block
other guests by not releasing their sent packets.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:35 -05:00
Zoltan Kiss 1bb332af4c xen-netback: Add stat counters for zerocopy
These counters help determine how often the buffers had to be copied. Also
they help find out if packets are leaked, as if "sent != success + fail",
there are probably packets never freed up properly.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise Windows guests can't work
properly and malicious guests can block other guests by not releasing their sent
packets.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:35 -05:00
Zoltan Kiss 62bad3199a xen-netback: Remove old TX grant copy definitons and fix indentations
These became obsolete with grant mapping. I've left intentionally the
indentations in this way, to improve readability of previous patches.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise Windows guests can't work
properly and malicious guests can block other guests by not releasing their sent
packets.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:35 -05:00
Zoltan Kiss f53c3fe8da xen-netback: Introduce TX grant mapping
This patch introduces grant mapping on netback TX path. It replaces grant copy
operations, ditching grant copy coalescing along the way. Another solution for
copy coalescing is introduced in "xen-netback: Handle guests with too many
frags", older guests and Windows can broke before that patch applies.
There is a callback (xenvif_zerocopy_callback) from core stack to release the
slots back to the guests when kfree_skb or skb_orphan_frags called. It feeds a
separate dealloc thread, as scheduling NAPI instance from there is inefficient,
therefore we can't do dealloc from the instance.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:35 -05:00
Zoltan Kiss 3e2234b314 xen-netback: Handle foreign mapped pages on the guest RX path
RX path need to know if the SKB fragments are stored on pages from another
domain.
Logically this patch should be after introducing the grant mapping itself, as
it makes sense only after that. But to keep bisectability, I moved it here. It
shouldn't change any functionality here. xenvif_zerocopy_callback and
ubuf_to_vif are just stubs here, they will be introduced properly later on.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:35 -05:00
Zoltan Kiss 121fa4b777 xen-netback: Minor refactoring of netback code
This patch contains a few bits of refactoring before introducing the grant
mapping changes:
- introducing xenvif_tx_pending_slots_available(), as this is used several
  times, and will be used more often
- rename the thread to vifX.Y-guest-rx, to signify it does RX work from the
  guest point of view

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:34 -05:00
Zoltan Kiss 8f13dd9612 xen-netback: Use skb->cb for pending_idx
Storing the pending_idx at the first byte of the linear buffer never looked
good, skb->cb is a more proper place for this. It also prevents the header to
be directly grant copied there, and we don't have the pending_idx after we
copied the header here, so it's time to change it.
It also introduces helpers for the RX side

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:34 -05:00
Zoltan Kiss 9ab9831b4c xen-netback: Fix Rx stall due to race condition
The recent patch to fix receive side flow control
(11b57f90257c1d6a91cee720151b69e0c2020cf6: xen-netback: stop vif thread
spinning if frontend is unresponsive) solved the spinning thread problem,
however caused an another one. The receive side can stall, if:
- [THREAD] xenvif_rx_action sets rx_queue_stopped to true
- [INTERRUPT] interrupt happens, and sets rx_event to true
- [THREAD] then xenvif_kthread sets rx_event to false
- [THREAD] rx_work_todo doesn't return true anymore

Also, if interrupt sent but there is still no room in the ring, it take quite a
long time until xenvif_rx_action realize it. This patch ditch that two variable,
and rework rx_work_todo. If the thread finds it can't fit more skb's into the
ring, it saves the last slot estimation into rx_last_skb_slots, otherwise it's
kept as 0. Then rx_work_todo will check if:
- there is something to send to the ring (like before)
- there is space for the topmost packet in the queue

I think that's more natural and optimal thing to test than two bool which are
set somewhere else.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-05 16:24:08 -08:00
Paul Durrant 2721637c1c xen-netback: use new skb_checksum_setup function
Use skb_checksum_setup to set up partial checksum offsets rather
then a private implementation.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:24:19 -08:00
Paul Durrant 11b57f9025 xen-netback: stop vif thread spinning if frontend is unresponsive
The recent patch to improve guest receive side flow control (ca2f09f2) had a
slight flaw in the wait condition for the vif thread in that any remaining
skbs in the guest receive side netback internal queue would prevent the
thread from sleeping. An unresponsive frontend can lead to a permanently
non-empty internal queue and thus the thread will spin. In this case the
thread should really sleep until the frontend becomes responsive again.

This patch adds an extra flag to the vif which is set if the shared ring
is full and cleared when skbs are drained into the shared ring. Thus,
if the thread runs, finds the shared ring full and can make no progress the
flag remains set. If the flag remains set then the thread will sleep,
regardless of a non-empty queue, until the next event from the frontend.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09 23:05:46 -05:00
David S. Miller 56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
Josh Boyer f35f76ee76 xen-netback: Include header for vmalloc
Commit ac3d5ac277 ("xen-netback: fix guest-receive-side array sizes")
added calls to vmalloc and vfree in the interface.c file without including
<linux/vmalloc.h>.  This causes build failures if the
-Werror=implicit-function-declaration flag is passed.

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:34:36 -05:00
Paul Durrant ac3d5ac277 xen-netback: fix guest-receive-side array sizes
The sizes chosen for the metadata and grant_copy_op arrays on the guest
receive size are wrong;

- The meta array is needlessly twice the ring size, when we only ever
  consume a single array element per RX ring slot
- The grant_copy_op array is way too small. It's sized based on a bogus
  assumption: that at most two copy ops will be used per ring slot. This
  may have been true at some point in the past but it's clear from looking
  at start_new_rx_buffer() that a new ring slot is only consumed if a frag
  would overflow the current slot (plus some other conditions) so the actual
  limit is MAX_SKB_FRAGS grant_copy_ops per ring slot.

This patch fixes those two sizing issues and, because grant_copy_ops grows
so much, it pulls it out into a separate chunk of vmalloc()ed memory.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29 22:31:30 -05:00
Paul Durrant b89587a7af xen-netback: add gso_segs calculation
netback already has code which parses IPv4 and v6 headers to set up checksum
offsets and these are always applied to GSO packets being sent from
frontends. It's therefore suboptimal that GSOs are being marked
SKB_GSO_DODGY to defer the gso_segs calculation when netback already has all
necessary information to hand to do the calculation. This patch adds that
calculation.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 15:11:49 -05:00
Wei Yongjun 0c8d087c04 xen-netback: fix some error return code
'err' is overwrited to 0 after maybe_pull_tail() call, so the error
code was not set if skb_partial_csum_set() call failed. Fix to return
error -EPROTO from those error handling case instead of 0.

Fixes: d52eb0d46f ('xen-netback: make sure skb linear area covers checksum field')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 14:58:47 -05:00
David S. Miller 143c905494 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
	drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:42:06 -05:00
Wei Yongjun 7022ef8b2a xen-netback: fix fragments error handling in checksum_setup_ip()
Fix to return -EPROTO error if fragments detected in checksum_setup_ip().

Fixes: 1431fb31ec ('xen-netback: fix fragment detection in checksum setup')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:18:18 -05:00
Paul Durrant a3314f3d40 xen-netback: fix gso_prefix check
There is a mistake in checking the gso_prefix mask when passing large
packets to a guest. The wrong shift is applied to the bit - the raw skb
gso type is used rather then the translated one. This leads to large packets
being handed to the guest without the GSO metadata. This patch fixes the
check.

The mistake manifested as errors whilst running Microsoft HCK large packet
offload tests between a pair of Windows 8 VMs. I have verified this patch
fixes those errors.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12 15:47:18 -05:00
Paul Durrant d9601a36ff xen-netback: napi: don't prematurely request a tx event
This patch changes the RING_FINAL_CHECK_FOR_REQUESTS in
xenvif_build_tx_gops to a check for RING_HAS_UNCONSUMED_REQUESTS as the
former call has the side effect of advancing the ring event pointer and
therefore inviting another interrupt from the frontend before the napi
poll has actually finished, thereby defeating the point of napi.

The event pointer is updated by RING_FINAL_CHECK_FOR_REQUESTS in
xenvif_poll, the napi poll function, if the work done is less than the
budget i.e. when actually transitioning back to interrupt mode.

Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12 13:35:38 -05:00
Paul Durrant 10574059ce xen-netback: napi: fix abuse of budget
netback seems to be somewhat confused about the napi budget parameter. The
parameter is supposed to limit the number of skbs processed in each poll,
but netback has this confused with grant operations.

This patch fixes that, properly limiting the work done in each poll. Note
that this limit makes sure we do not process any more data from the shared
ring than we intend to pass back from the poll. This is important to
prevent tx_queue potentially growing without bound.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12 13:35:38 -05:00
Paul Durrant d52eb0d46f xen-netback: make sure skb linear area covers checksum field
skb_partial_csum_set requires that the linear area of the skb covers the
checksum field. The checksum setup code in netback was only doing that
pullup in the case when the pseudo header checksum was being recalculated
though. This patch makes that pullup unconditional. (I pullup the whole
transport header just for simplicity; the requirement is only for the check
field but in the case of UDP this is the last field in the header and in the
case of TCP it's the last but one).

The lack of pullup manifested as failures running Microsoft HCK network
tests on a pair of Windows 8 VMs and it has been verified that this patch
fixes the problem.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 16:46:24 -05:00