Commit Graph

402772 Commits

Author SHA1 Message Date
Rafael J. Wysocki c511851de1 Revert "epoll: use freezable blocking call"
This reverts commit 1c441e9212 (epoll: use freezable blocking call)
which is reported to cause user space memory corruption to happen
after suspend to RAM.

Since it appears to be extremely difficult to root cause this
problem, it is best to revert the offending commit and try to address
the original issue in a better way later.

References: https://bugzilla.kernel.org/show_bug.cgi?id=61781
Reported-by: Natrio <natrio@list.ru>
Reported-by: Jeff Pohlmeyer <yetanothergeek@gmail.com>
Bisected-by: Leo Wolf <jclw@ymail.com>
Fixes: 1c441e9212 (epoll: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
2013-10-30 15:27:53 +01:00
Takashi Iwai 6fc16e58ad ALSA: hda - Add a fixup for ASUS N76VZ
ASUS N76VZ needs the same fixup as N56VZ for supporting the boost
speaker.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=846529
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-10-30 12:31:35 +01:00
Paolo Bonzini 0c8eb04a62 KVM: use a more sensible error number when debugfs directory creation fails
I don't know if this was due to cut and paste, or somebody was really
using a D20 to pick the error code for kvm_init_debugfs as suggested by
Linus (EFAULT is 14, so the possibility cannot be entirely ruled out).

In any case, this patch fixes it.

Reported-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 12:15:34 +01:00
Tim Gardner d780a31271 KVM: Fix modprobe failure for kvm_intel/kvm_amd
The x86 specific kvm init creates a new conflicting
debugfs directory which causes modprobe issues
with kvm_intel and kvm_amd. For example,

sudo modprobe kvm_amd
modprobe: ERROR: could not insert 'kvm_amd': Bad address

The simplest fix is to just rename the directory. The following
KVM config options are set:

CONFIG_KVM_GUEST=y
CONFIG_KVM_DEBUG_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_DEVICE_ASSIGNMENT=y

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[Change debugfs directory name. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 12:10:42 +01:00
David Herrmann 3d3b78c06c drm: allow DRM_IOCTL_VERSION on render-nodes
DRM_IOCTL_VERSION is a reliable way to get the driver-name and version
information. It's not related to the interface-version (SET_VERSION ioctl)
so we can safely enable it on render-nodes.

Note that gbm uses udev-BUSID to load the correct mesa driver. However,
the VERSION ioctl should be the more reliable way to do this (in case we
add new DRM-bus drivers which have no BUSID or similar).

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-10-30 14:41:56 +10:00
Nathan Hintz b757a62e9f bgmac: don't update slot on skb alloc/dma mapping error
Don't update the slot in "bgmac_dma_rx_skb_for_slot" unless both the
skb alloc and dma mapping are successful; and free the newly allocated
skb if a dma mapping error occurs.  This relieves the caller of the need
to deduce/execute the appropriate cleanup action required when an error
occurs.

Signed-off-by: Nathan Hintz <nlhintz@hotmail.com>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:59:19 -04:00
Alistair Popple 32663b8b89 ibm emac: Fix locking for enable/disable eob irq
Calls to mal_enable_eob_irq perform a read-write-modify of a dcr to
enable device irqs which is protected by a spin lock. However calls to
mal_disable_eob_irq do not take the corresponding lock.

This patch resolves the problem by ensuring that calls to
mal_disable_eob_irq also take the lock.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:57:42 -04:00
Alistair Popple b4dfd326c2 ibm emac: Don't call napi_complete if napi_reschedule failed
This patch fixes a bug which would trigger the BUG_ON() at
net/core/dev.c:4156. It was found that this was due to continuing
processing in the current poll call even when the call to
napi_reschedule failed, indicating the device was already on the
polling list. This resulted in an extra call to napi_complete which
triggered the BUG_ON().

This patch ensures that we only contine processing rotting packets in
the current mal_poll call if we are not already on the polling list.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:57:42 -04:00
Luka Perkov 99470819b1 arc_emac: drop redundant mac address check
Checking if MAC address is valid using is_valid_ether_addr() is already done in
of_get_mac_address(). While at it, reorganize checking so it matches checks in
other drivers.

Signed-off-by: Luka Perkov <luka@openwrt.org>
CC: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:52:45 -04:00
Luka Perkov 6c7a9a3c78 mvneta: drop redundant mac address check
Checking if MAC address is valid using is_valid_ether_addr() is already done in
of_get_mac_address().

Signed-off-by: Luka Perkov <luka@openwrt.org>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:52:44 -04:00
Luka Perkov 09ec0d051a octeon_mgmt: drop redundant mac address check
Checking if MAC address is valid using is_valid_ether_addr() is already done in
of_get_mac_address().

Signed-off-by: Luka Perkov <luka@openwrt.org>
Acked-by: David Daney <david.daney@cavium.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:52:44 -04:00
Yuchung Cheng c968601d17 tcp: temporarily disable Fast Open on SYN timeout
Fast Open currently has a fall back feature to address SYN-data being
dropped but it requires the middle-box to pass on regular SYN retry
after SYN-data. This is implemented in commit aab487435 ("net-tcp:
Fast Open client - detecting SYN-data drops")

However some NAT boxes will drop all subsequent packets after first
SYN-data and blackholes the entire connections.  An example is in
commit 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast
Open".

The sender should note such incidents and fall back to use the regular
TCP handshake on subsequent attempts temporarily as well: after the
second SYN timeouts the original Fast Open SYN is most likely lost.
When such an event recurs Fast Open is disabled based on the number of
recurrences exponentially.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:50:41 -04:00
Jason Wang ec9debbd9a virtio-net: correctly handle cpu hotplug notifier during resuming
commit 3ab098df35 (virtio-net: don't respond to
cpu hotplug notifier if we're not ready) tries to bypass the cpu hotplug
notifier by checking the config_enable and does nothing is it was false. So it
need to try to hold the config_lock mutex which may happen in atomic
environment which leads the following warnings:

[  622.944441] CPU0 attaching NULL sched-domain.
[  622.944446] CPU1 attaching NULL sched-domain.
[  622.944485] CPU0 attaching NULL sched-domain.
[  622.950795] BUG: sleeping function called from invalid context at kernel/mutex.c:616
[  622.950796] in_atomic(): 1, irqs_disabled(): 1, pid: 10, name: migration/1
[  622.950796] no locks held by migration/1/10.
[  622.950798] CPU: 1 PID: 10 Comm: migration/1 Not tainted 3.12.0-rc5-wl-01249-gb91e82d #317
[  622.950799] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  622.950802]  0000000000000000 ffff88001d42dba0 ffffffff81a32f22 ffff88001bfb9c70
[  622.950803]  ffff88001d42dbb0 ffffffff810edb02 ffff88001d42dc38 ffffffff81a396ed
[  622.950805]  0000000000000046 ffff88001d42dbe8 ffffffff810e861d 0000000000000000
[  622.950805] Call Trace:
[  622.950810]  [<ffffffff81a32f22>] dump_stack+0x54/0x74
[  622.950815]  [<ffffffff810edb02>] __might_sleep+0x112/0x114
[  622.950817]  [<ffffffff81a396ed>] mutex_lock_nested+0x3c/0x3c6
[  622.950818]  [<ffffffff810e861d>] ? up+0x39/0x3e
[  622.950821]  [<ffffffff8153ea7c>] ? acpi_os_signal_semaphore+0x21/0x2d
[  622.950824]  [<ffffffff81565ed1>] ? acpi_ut_release_mutex+0x5e/0x62
[  622.950828]  [<ffffffff816d04ec>] virtnet_cpu_callback+0x33/0x87
[  622.950830]  [<ffffffff81a42576>] notifier_call_chain+0x3c/0x5e
[  622.950832]  [<ffffffff810e86a8>] __raw_notifier_call_chain+0xe/0x10
[  622.950835]  [<ffffffff810c5556>] __cpu_notify+0x20/0x37
[  622.950836]  [<ffffffff810c5580>] cpu_notify+0x13/0x15
[  622.950838]  [<ffffffff81a237cd>] take_cpu_down+0x27/0x3a
[  622.950841]  [<ffffffff81136289>] stop_machine_cpu_stop+0x93/0xf1
[  622.950842]  [<ffffffff81136167>] cpu_stopper_thread+0xa0/0x12f
[  622.950844]  [<ffffffff811361f6>] ? cpu_stopper_thread+0x12f/0x12f
[  622.950847]  [<ffffffff81119710>] ? lock_release_holdtime.part.7+0xa3/0xa8
[  622.950848]  [<ffffffff81135e4b>] ? cpu_stop_should_run+0x3f/0x47
[  622.950850]  [<ffffffff810ea9b0>] smpboot_thread_fn+0x1c5/0x1e3
[  622.950852]  [<ffffffff810ea7eb>] ? lg_global_unlock+0x67/0x67
[  622.950854]  [<ffffffff810e36b7>] kthread+0xd8/0xe0
[  622.950857]  [<ffffffff81a3bfad>] ? wait_for_common+0x12f/0x164
[  622.950859]  [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124
[  622.950861]  [<ffffffff81a45ffc>] ret_from_fork+0x7c/0xb0
[  622.950862]  [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124
[  622.950876] smpboot: CPU 1 is now offline
[  623.194556] SMP alternatives: lockdep: fixing up alternatives
[  623.194559] smpboot: Booting Node 0 Processor 1 APIC 0x1
...

A correct fix is to unregister the hotcpu notifier during restore and register a
new one in resume.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 22:43:41 -04:00
Dave Airlie 2f2632ff6e Merge tag 'drm-intel-fixes-2013-10-29' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Regression and warn fixes for i915.

* tag 'drm-intel-fixes-2013-10-29' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Fix the PPT fdi lane bifurcate state handling on ivb
  drm/i915: No LVDS hardware on Intel D410PT and D425KT
  drm/i915/dp: workaround BIOS eDP bpp clamping issue
  drm/i915: Add HSW CRT output readout support
  drm/i915: Add support for pipe_bpp readout
2013-10-30 12:30:12 +10:00
Daniel Borkmann 97203abe6b net: ipvs: sctp: do not recalc sctp csum when ports didn't change
Unlike UDP or TCP, we do not take the pseudo-header into
account in SCTP checksums. So in case port mapping is the
very same, we do not need to recalculate the whole SCTP
checksum in software, which is very expensive.

Also, similarly as in TCP, take into account when a private
helper mangled the packet. In that case, we also need to
recalculate the checksum even if ports might be same.

Thanks for feedback regarding skb->ip_summed checks from
Julian Anastasov; here's a discussion on these checks for
snat and dnat:

* For snat_handler(), we can see CHECKSUM_PARTIAL from
  virtual devices, and from LOCAL_OUT, otherwise it
  should be CHECKSUM_UNNECESSARY. In general, in snat it
  is more complex. skb contains the original route and
  ip_vs_route_me_harder() can change the route after
  snat_handler. So, for locally generated replies from
  local server we can not preserve the CHECKSUM_PARTIAL
  mode. It is an chicken or egg dilemma: snat_handler
  needs the device after rerouting (to check for
  NETIF_F_SCTP_CSUM), while ip_route_me_harder() wants
  the snat_handler() to put the new saddr for proper
  rerouting.

* For dnat_handler(), we should not see CHECKSUM_COMPLETE
  for SCTP, in fact the small set of drivers that support
  SCTP offloading return CHECKSUM_UNNECESSARY on correctly
  received SCTP csum. We can see CHECKSUM_PARTIAL from
  local stack or received from virtual drivers. The idea is
  that SCTP decides to avoid csum calculation if hardware
  supports offloading. IPVS can change the device after
  rerouting to real server but we can preserve the
  CHECKSUM_PARTIAL mode if the new device supports
  offloading too. This works because skb dst is changed
  before dnat_handler and we see the new device. So, checks
  in the 'if' part will decide whether it is ok to keep
  CHECKSUM_PARTIAL for the output. If the packet was with
  CHECKSUM_NONE, hence we deal with unknown checksum. As we
  recalculate the sum for IP header in all cases, it should
  be safe to use CHECKSUM_UNNECESSARY. We can forward wrong
  checksum in this case (without cp->app). In case of
  CHECKSUM_UNNECESSARY, the csum was valid on receive.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-10-30 09:48:16 +09:00
David S. Miller aa58d9813d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
This series contains updates to vxlan, net, ixgbe, ixgbevf, and i40e.

Joseph provides a single patch against vxlan which removes the burden
from the NIC drivers to check if the vxlan driver is enabled in the
kernel and also makes available the vxlan headrooms to the drivers.

Jacob provides majority of the patches, with patches against net, ixgbe
and ixgbevf.  His net patch adds might_sleep() call to napi_disable so
that every use of napi_disable during atomic context will be visible.
Then Jacob provides a patch to fix qv_lock_napi call in
ixgbe_napi_disable_all.  The other ixgbe patches cleanup
ixgbe_check_minimum_link function to correctly show that there are some
minor loss of encoding, even though we don't calculate it and remove
unnecessary duplication of PCIe bandwidth display.  Lastly, Jacob
provides 4 patches against ixgbevf to add ixgbevf_rx_skb in line with
how ixgbe handles the variations on how packets can be received, adds
support in order to track how many packets were cleaned during busy poll
as part of the extended statistics.

Wei Yongjun provides a fix for i40e to return -ENOMEN in the memory
allocation error handling case instead of returning 0, as done
elsewhere in this function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 18:57:49 -04:00
Leigh Brown d4a0acb8ed net: mvmdio: doc: mvmdio now used by mv643xx_eth
Amend the documentation in the mvmdio driver to note the fact
that it is now used by both the mvneta and mv643xx_eth drivers.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 18:54:20 -04:00
Leigh Brown 526edcf567 net: mvmdio: slight optimisation of orion_mdio_write
Make only a single call to mutex_unlock in orion_mdio_write.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 18:53:36 -04:00
Leigh Brown 839f46bb4c net: mvmdio: orion_mdio_ready: remove manual poll
Replace manual poll of MVMDIO_SMI_READ_VALID with a call to
orion_mdio_wait_ready.  This ensures a consistent timeout,
eliminates a busy loop, and allows for use of interrupts on
systems that support them.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 18:53:36 -04:00
Leigh Brown b70cd1c1a9 net: mvmdio: make orion_mdio_wait_ready consistent
Amend orion_mdio_wait_ready so that the same timeout is used when
polling or using wait_event_timeout.  Set the timeout to 1ms.

Replace udelay with usleep_range to avoid a busy loop, and set the
polling interval range as 45us to 55us, so that the first sleep
will be enough in almost all cases.

Generate the same log message at timeout when polling or using
wait_event_timeout.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 18:53:35 -04:00
Gavin Shan 87f20c26f9 net/benet: Make lancer_wait_ready() static
The function needn't to be public, so to make it as static.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:50:17 -04:00
Gavin Shan 547e2daeba net/benet: Remove interface type
The interface type, which is being traced by "struct be_adapter::
if_type", isn't used currently. So we can remove that safely
according to Sathya's comments.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:50:16 -04:00
Joe Perches 22ded57729 netconsole: Convert to pr_<level>
Use a more current logging style.

Convert printks to pr_<level>.

Consolidate multiple printks into a single printk to avoid
any possible dmesg interleaving.  Add a default "event" msg
in case the listed types are ever expanded.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:41:49 -04:00
Vlad Yasevich 06499098a0 bridge: pass correct vlan id to multicast code
Currently multicast code attempts to extrace the vlan id from
the skb even when vlan filtering is disabled.  This can lead
to mdb entries being created with the wrong vlan id.
Pass the already extracted vlan id to the multicast
filtering code to make the correct id is used in
creation as well as lookup.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:40:08 -04:00
David S. Miller 911aeb1084 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
One patch for net/3.12 fixing an issue where devices could be in an
invalid state they are removed while still attached to OVS.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:36:39 -04:00
Michael Drüing 706e282b69 net: x25: Fix dead URLs in Kconfig
Update the URLs in the Kconfig file to the new pages at sangoma.com and cisco.com

Signed-off-by: Michael Drüing <michael@drueing.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:35:17 -04:00
Daniel Borkmann 7d1d65cb84 net: sched: cls_bpf: add BPF-based classifier
This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:

     0: No match
    -1: Select classid given in "tc filter ..." command
  else: flowid, overwrite the default one

As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:

echo 1 > /proc/sys/net/core/bpf_jit_enable

tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16

tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:

1) People familiar with tcpdump-like filters:

   tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF
   extensions that lack support by libpcap's compiler:

   bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

   ssh.ops example code:
   ldh [12]
   jne #0x800, drop
   ldb [23]
   jneq #6, drop
   ldh [20]
   jset #0x1fff, drop
   ldxb 4 * ([14] & 0xf)
   ldh [%x + 14]
   jeq #0x16, pass
   ldh [%x + 16]
   jne #0x16, drop
   pass: ret #-1
   drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
Eric Dumazet!

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:33:17 -04:00
Rafał Miłecki d549c76bcc bgmac: separate RX descriptor setup code into a new function
This cleans code a bit and will be useful when allocating buffers in
other places (like RX path, to avoid skb_copy_from_linear_data_offset).

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:20:36 -04:00
David S. Miller 68783ec73c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
This pull request contains the following netfilter fix:

* fix --queue-bypass in xt_NFQUEUE revision 3. While adding the
  revision 3 of this target, the bypass flags were not correctly
  handled anymore, thus, breaking packet bypassing if no application
  is listening from userspace, patch from Holger Eitzenberger,
  reported by Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 16:53:44 -04:00
Deng-Cheng Zhu 7f081f1755 MIPS: Perf: Fix 74K cache map
According to Software User's Manual, the event of last-level-cache
read/write misses is mapped to even counters. Odd counters of that
event number count miss cycles.

Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6036/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-10-29 21:18:23 +01:00
Linus Torvalds 7314e613d5 Fix a few incorrectly checked [io_]remap_pfn_range() calls
Nico Golde reports a few straggling uses of [io_]remap_pfn_range() that
really should use the vm_iomap_memory() helper.  This trivially converts
two of them to the helper, and comments about why the third one really
needs to continue to use remap_pfn_range(), and adds the missing size
check.

Reported-by: Nico Golde <nico@ngolde.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org.
2013-10-29 10:21:34 -07:00
Linus Torvalds f9ec2e6f79 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Ingo Molnar:
 "This contains five tooling fixes:

   - fix a remaining mmap2 assumption which resulted in perf top output
     breakage
   - fix mmap ring-buffer processing bug that corrupts data
   - fix for a severe python scripting memory leak
   - fix broken (and user-visible) -g option handling
   - fix stdio output

  The diffstat size is larger than what we'd like to see this late :-/"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fixup mmap event consumption
  perf top: Split -G and --call-graph
  perf record: Split -g and --call-graph
  perf hists: Add color overhead for stdio output buffer
  perf tools: Fix up /proc/PID/maps parsing
  perf script python: Fix mem leak due to missing Py_DECREFs on dict entries
2013-10-29 08:36:50 -07:00
Linus Torvalds 2a999aa0a1 Kconfig: make KOBJECT_RELEASE debugging require timer debugging
Without the timer debugging, the delayed kobject release will just
result in undebuggable oopses if it triggers any latent bugs.  That
doesn't actually help debugging at all.

So make DEBUG_KOBJECT_RELEASE depend on DEBUG_OBJECTS_TIMERS to avoid
having people enable one without the other.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-29 08:33:36 -07:00
Daniel Vetter 1fbc0d789d drm/i915: Fix the PPT fdi lane bifurcate state handling on ivb
Originally I've thought that this is leftover hw state dirt from the
BIOS. But after way too much helpless flailing around on my part I've
noticed that the actual bug is when we change the state of an already
active pipe.

For example when we change the fdi lines from 2 to 3 without switching
off outputs in-between we'll never see the crucial on->off transition
in the ->modeset_global_resources hook the current logic relies on.

Patch version 2 got this right by instead also checking whether the
pipe is indeed active. But that in turn broke things when pipes have
been turned off through dpms since the bifurcate enabling is done in
the ->crtc_mode_set callback.

To address this issues discussed with Ville in the patch review move
the setting of the bifurcate bit into the ->crtc_enable hook. That way
we won't wreak havoc with this state when userspace puts all other
outputs into dpms off state. This also moves us forward with our
overall goal to unify the modeset and dpms on paths (which we need to
have to allow runtime pm in the dpms off state).

Unfortunately this requires us to move the bifurcate helpers around a
bit.

Also update the commit message, I've misanalyzed the bug rather badly.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70507
Tested-by: Jan-Michael Brummer <jan.brummer@tabos.org>
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-29 13:52:56 +01:00
Holger Eitzenberger d954777324 netfilter: xt_NFQUEUE: fix --queue-bypass regression
V3 of the NFQUEUE target ignores the --queue-bypass flag,
causing packets to be dropped when the userspace listener
isn't running.

Regression is in since 8746ddcf12 ("netfilter: xt_NFQUEUE:
introduce CPU fanout").

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-29 13:05:54 +01:00
Wei Yongjun ed87ac09d8 i40e: fix error return code in i40e_probe()
Fix to return -ENOMEM in the memory alloc error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 04:29:25 -07:00
Don Skidmore 44bd741e10 ixgbevf: Add zero_base handler to network statistics
This patch removes the need to keep a zero_base variable in the adapter
structure. Now we just use two different macros to set the non-zero and
zero base. This adds to readability and shortens some of the structure
initialization under 80 columns. The gathering of status for ethtool was
slightly modified to again better fit into 80 columns and become a bit
more readable.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 04:22:24 -07:00
Jacob Keller 3b5dca262f ixgbevf: add BP_EXTENDED_STATS for CONFIG_NET_RX_BUSY_POLL
This patch adds the extended statistics similar to the ixgbe driver. These
statistics keep track of how often the busy polling yields, as well as how many
packets are cleaned or missed by the polling routine.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 04:15:11 -07:00
Jacob Keller c777cdfa4e ixgbevf: implement CONFIG_NET_RX_BUSY_POLL
This patch enables CONFIG_NET_RX_BUSY_POLL support in the VF code. This enables
sockets which have enabled the SO_BUSY_POLL socket option to use the
ndo_busy_poll_recv operation which could result in lower latency, at the cost
of higher CPU utilization, and increased power usage. This support is similar
to how the ixgbe driver works.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 04:08:12 -07:00
Peter Zijlstra e8a923cc1f perf/x86: Fix NMI measurements
OK, so what I'm actually seeing on my WSM is that sched/clock.c is
'broken' for the purpose we're using it for.

What triggered it is that my WSM-EP is broken :-(

  [    0.001000] tsc: Fast TSC calibration using PIT
  [    0.002000] tsc: Detected 2533.715 MHz processor
  [    0.500180] TSC synchronization [CPU#0 -> CPU#6]:
  [    0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock.
  [    0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed

For some reason it consistently detects TSC skew, even though NHM+
should have a single clock domain for 'reasonable' systems.

This marks sched_clock_stable=0, which means that we do fancy stuff to
try and get a 'sane' clock. Part of this fancy stuff relies on the tick,
clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until
it either wakes up or gets kicked by another cpu.

While this is perfectly fine for the scheduler -- it only cares about
actually running stuff, and when we're running stuff we're obviously not
idle. This does somewhat break down for perf which can trigger events
just fine on an otherwise idle cpu.

So I've got NMIs get get 'measured' as taking ~1ms, which actually
don't last nearly that long:

          <idle>-0     [013] d.h.   886.311970: rcu_nmi_enter <-do_nmi
  ...
          <idle>-0     [013] d.h.   886.311997: perf_sample_event_took: HERE!!! : 1040990

So ftrace (which uses sched_clock(), not the fancy bits) only sees
~27us, but we measure ~1ms !!

Now since all this measurement stuff lives in x86 code, we can actually
fix it.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:20 +01:00
Peter Zijlstra bf378d341e perf: Fix perf ring buffer memory ordering
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:19 +01:00
Jacob Keller 08e50a20ed ixgbevf: have clean_rx_irq return total_rx_packets cleaned
Rather than return true/false indicating whether there was budget left, return
the total packets cleaned. This currently has no use, but will be used in a
following patch which enables CONFIG_NET_RX_BUSY_POLL support in order to track
how many packets were cleaned during the busy poll as part of the extended
statistics.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 04:00:21 -07:00
Jacob Keller 0868161866 ixgbevf: add ixgbevf_rx_skb
This patch adds ixgbevf_rx_skb in line with how ixgbe handles the variations on
how packets can be received. It will be extended in a following patch for
CONFIG_NET_RX_BUSY_POLL support.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 03:53:30 -07:00
Jacob Keller 6a2aae5ae6 ixgbe: remove unnecessary duplication of PCIe bandwidth display
This patch removes the unnecessary display of PCIe bandwidth twice. Since the
ixgbe_check_minimum_link does a better job, and ensures accurate detection on
even complex chains, this older check is no longer necessary.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 03:45:57 -07:00
Jacob Keller 9f0a433ce6 ixgbe: show <2% for encoding loss on PCIe Gen3
This patch updates the ixgbe_check_minimum_link function to correctly show that
there is some minor loss of encoding, even though we don't calculate it in the
max GT/s equation. It is small enough to not bother, but is better to report it
than not.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 03:38:26 -07:00
Mel Gorman 0255d49184 mm: Account for a THP NUMA hinting update as one PTE update
A THP PMD update is accounted for as 512 pages updated in vmstat.  This is
large difference when estimating the cost of automatic NUMA balancing and
can be misleading when comparing results that had collapsed versus split
THP. This patch addresses the accounting issue.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-10-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:38:17 +01:00
Mel Gorman 3f926ab945 mm: Close races between THP migration and PMD numa clearing
THP migration uses the page lock to guard against parallel allocations
but there are cases like this still open

  Task A					Task B
  ---------------------				---------------------
  do_huge_pmd_numa_page				do_huge_pmd_numa_page
  lock_page
  mpol_misplaced == -1
  unlock_page
  goto clear_pmdnuma
						lock_page
						mpol_misplaced == 2
						migrate_misplaced_transhuge
  pmd = pmd_mknonnuma
  set_pmd_at

During hours of testing, one crashed with weird errors and while I have
no direct evidence, I suspect something like the race above happened.
This patch extends the page lock to being held until the pmd_numa is
cleared to prevent migration starting in parallel while the pmd_numa is
being cleared. It also flushes the old pmd entry and orders pagetable
insertion before rmap insertion.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-9-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:38:05 +01:00
Mel Gorman c61109e34f mm: numa: Sanitize task_numa_fault() callsites
There are three callers of task_numa_fault():

 - do_huge_pmd_numa_page():
     Accounts against the current node, not the node where the
     page resides, unless we migrated, in which case it accounts
     against the node we migrated to.

 - do_numa_page():
     Accounts against the current node, not the node where the
     page resides, unless we migrated, in which case it accounts
     against the node we migrated to.

 - do_pmd_numa_page():
     Accounts not at all when the page isn't migrated, otherwise
     accounts against the node we migrated towards.

This seems wrong to me; all three sites should have the same
sementaics, furthermore we should accounts against where the page
really is, we already know where the task is.

So modify all three sites to always account; we did after all receive
the fault; and always account to where the page is after migration,
regardless of success.

They all still differ on when they clear the PTE/PMD; ideally that
would get sorted too.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-8-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:37:52 +01:00
Mel Gorman 587fe586f4 mm: Prevent parallel splits during THP migration
THP migrations are serialised by the page lock but on its own that does
not prevent THP splits. If the page is split during THP migration then
the pmd_same checks will prevent page table corruption but the unlock page
and other fix-ups potentially will cause corruption. This patch takes the
anon_vma lock to prevent parallel splits during migration.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-7-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:37:39 +01:00
Mel Gorman 42836f5f8b mm: Wait for THP migrations to complete during NUMA hinting faults
The locking for migrating THP is unusual. While normal page migration
prevents parallel accesses using a migration PTE, THP migration relies on
a combination of the page_table_lock, the page lock and the existance of
the NUMA hinting PTE to guarantee safety but there is a bug in the scheme.

If a THP page is currently being migrated and another thread traps a
fault on the same page it checks if the page is misplaced. If it is not,
then pmd_numa is cleared. The problem is that it checks if the page is
misplaced without holding the page lock meaning that the racing thread
can be migrating the THP when the second thread clears the NUMA bit
and faults a stale page.

This patch checks if the page is potentially being migrated and stalls
using the lock_page if it is potentially being migrated before checking
if the page is misplaced or not.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-6-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:37:19 +01:00