Commit Graph

601837 Commits

Author SHA1 Message Date
Jianxin Xiong 23f7d0d29e IB/hfi1, qib: Add ieth to the packet header definitions
A new union member "ieth" (Invalidate Extended Transport Header) is
added to the packet header definition in preparation of supporting
the send with invalidate opcode.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 12:21:10 -04:00
Linus Torvalds 159d08f4b8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull Yama locking fix from James Morris:
 "Fix for the Yama LSM"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  Yama: fix double-spinlock and user access in atomic context
2016-05-26 09:15:19 -07:00
Doug Ledford e6f61130ed Merge branches 'misc-4.7-2', 'ipoib' and 'ib-router' into k.o/for-4.7 2016-05-26 11:55:19 -04:00
Dennis Dalessandro f48ad614c1 IB/hfi1: Move driver out of staging
The TODO list for the hfi1 driver was completed during 4.6. In addition
other objections raised (which are far beyond what was in the TODO list)
have been addressed as well. It is now time to remove the driver from
staging and into the drivers/infiniband sub-tree.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:35:14 -04:00
Dennis Dalessandro e11ffbd575 IB/hfi1: Do not free hfi1 cdev parent structure early
The deletion of a cdev is not a fence for holding off references to the
structure. The driver attempts to delete the cdev and then proceeds to
free the parent structure, the hfi1_devdata, or dd. This can potentially
lead to a kernel panic in situations where a user has an FD for the cdev
open, and the pci device gets removed. If the user then closes the FD
there will be a NULL dereference when trying to do put on the cdev's
kobject.

Fix this by pointing the cdev's kobject.parent at a new kobject embedded
in its parent structure. Also take a reference when the device is opened
and put it back when it is closed.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:35:13 -04:00
Dennis Dalessandro 8a1882ebd4 IB/hfi1: Add trace message in user IOCTL handling
Add a trace message to HFI1s user IOCTL handling. This allows debugging
of which IOCTLs are being handled by the driver.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:35:13 -04:00
Dennis Dalessandro 380fb94288 IB/hfi1: Remove write(), use ioctl() for user cmds
Remove the write() handler for user space commands now that ioctl
handling is available. User apps will need to change to use ioctl from
this point forward.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:35:13 -04:00
Dennis Dalessandro 8d970cf991 IB/hfi1: Add ioctl() interface for user commands
IOCTL is more suited to what user space commands need to do than the
write() interface. Add IOCTL definitions for all existing write commands
and the handling for those. The write() interface will be removed in a
follow on patch.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:35:06 -04:00
Dennis Dalessandro ac56f162d4 IB/hfi1: Remove unused user command
The HFI1_CMD_SDMA_STATUS_UPD command was never implemented it has no
reason to live in the driver. Remove it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:18 -04:00
Dennis Dalessandro 0f7b1f917c IB/hfi1: Remove snoop/diag interface
The snoop/diag interface is better served by an implementation which is
more general and usable by other drivers perhaps. Go ahead and remove
the code now and get rid of the char dev. We can put the feature back
when we have a more agreeable solution.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:18 -04:00
Dennis Dalessandro d079031742 IB/hfi1: Remove EPROM functionality from data device
Remove EPROM handling from the cdev which is used for user application
data traffic.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:18 -04:00
Dennis Dalessandro 7312f29d8e IB/hfi1: Remove UI char device
Remove UI char device which exposes direct access to registers for user
space. This was put in to aid in debugging the hardware. We are looking
into alternatives means of providing the same functionality. This
removes another char device from HFI1's footprint.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:18 -04:00
Dennis Dalessandro 0eb626590d IB/hfi1: Remove multiple device cdev
hfi1 current exports a cdev that can be used to target all of the hfi's
in the system. However there is a problem with this approach in
that the devices could be on different subnets. This is a problem that
user space can figure out and explicitly tell the driver on which device
to create a context.

Remove the multi-purpose cdev leaving a dedicated cdev for each port.
Also remove the striping capability that is dependent upon the user
choosing the multi-purpose cdev. It is now up to user space to determine
how to stripe contexts.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:17 -04:00
Dennis Dalessandro f3225c3f11 IB/hfi1: Remove anti-pattern in cdev init
Remove the usage of an anti-pattern goto in hfi1_cdev_init to improve
code readability.

Suggested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:11 -04:00
Jianxin Xiong b583faf4dc IB/hfi1: Fix bug that blocks process on exit after port bounce
During the processing of a user SDMA request, if there was an
error before the request counter was increased, the state of
the packet queue could be updated incorrectly, causing the
counter to underflow. As the result, the process could get
stuck later since the counter could never get back to 0.

This patch adds a condition to guard the packet queue update
so that the counter is only decreased if it has been increased
before the error happens.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:11 -04:00
Jubin John f70f5f6af3 IB/qib: Remove unused qib_7322_intr_msgs[]
Building the qib driver with gcc version 6.1.0 raises the following
build warning:
drivers/infiniband/hw/qib/qib_iba7322.c:1311:39: warning:
'qib_7322_intr_msgs' defined but not used [-Wunused-const-variable=]
 static const struct  qib_hwerror_msgs qib_7322_intr_msgs[] = {
                                       ^~~~~~~~~~~~~~~~~~
Remove the unused qib_7322_intr_msgs[]

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:11 -04:00
Ira Weiny 46aa5baf96 IB/hfi1: Remove unnecessary comment
This comment was old, the MTU enums have been defined.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:10 -04:00
Jubin John eac7193632 IB/hfi1: Fix sdma_event_names[] build warning
sdma_event_names[] is only used within CONFIG_SDMA_VERBOSITY ifdefs, so
when CONFIG_SDMA_VERBOSITY is disabled, it results in the following
0-day build warning:
>> drivers/infiniband/hw/hfi1/sdma.c:137:27: warning: 'sdma_event_names'
>> defined but not used [-Wunused-const-variable=]
    static const char * const sdma_event_names[] = {
                              ^~~~~~~~~~~~~~~~
This occurs on the following compiler:
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430

For more information check:
https://lists.01.org/pipermail/kbuild-all/2016-May/020060.html

Fix this warning by defining sdma_event_name[] only within the
CONFIG_SDMA_VERBOSITY ifdefs.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:10 -04:00
Jubin John 49961f8fe8 IB/rdmavt: Use kzalloc_node
Use kzalloc_node instead of kzalloc for rdmavt memory region segment
allocation to optimize for performance on NUMA platforms.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:10 -04:00
Mike Marciniszyn 654b643670 IB/rdmavt: Insure QP vmalloc variants zero memory
The usage of the various vmalloc APIs do not consistently zero memory
when allocating the swqe. Insure zeroing variants are used.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:10 -04:00
Mitko Haralanov 9565c6a37a IB/hfi1: Fix an interval RB node reference count leak
Commit e88c9271d9 ("IB/hfi1: Fix buffer cache corner case which
may cause corruption") introduced a bug which may cause a reference
count of a interval RB node to be leaked in the case where an SDMA
transfer from that node completes at the same time as the node is
being extended.

If a node is being extended, it is first removed from the RB tree
in order to be processed without the risk of an invalidation event
removing the node at the same time.

If a SDMA completion happens during that time, the completion handler
will fail to find the node in the RB tree and, therefore, fail to
correctly decrement its refcount. This leaves the node in the tree and
its pages pinned for the duration of the user process.

To prevent this from happening the io vector adds a reference to the
RB node, which is used during the SDMA completion instead of looking
up the node in the RB tree.

This change adds a performance improvement as a side effect by avoiding
the RB tree lookup.

Fixes: e88c9271d9 ("IB/hfi1: Fix buffer cache corner case which may cause corruption")
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 11:23:10 -04:00
Ming Lin c7de572630 blk-mq: clear q->mq_ops if init fail
blk_mq_init_queue() calls blk_mq_init_allocated_queue(), but q->mq_ops
was not cleared when blk_mq_init_allocated_queue() fails.
Then blk_cleanup_queue() calls blk_mq_free_queue() which will crash because:
- q->all_q_node is not added to all_q_list yet
- q->tag_set is NULL
- hctx was not setup yet or already freed

Fixed it by clearing q->mq_ops on error path.

Signed-off-by: Ming Lin <ming.l@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-26 08:51:43 -06:00
Tom Haynes c7d73af2d2 pnfs: pnfs_update_layout needs to consider if strict iomode checking is on
As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically
support enforcing that a IOMODE_RW segment will not allow READ I/O.

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-26 08:40:56 -04:00
Tom Haynes 602c4cd452 nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-26 08:40:51 -04:00
Glenn Dayton 1dc2c9f54b Documentation/hwmon: Update links in max34440
It appears the website for maxim-ic.com changed to
maximintegrated.com.

Signed-off-by: Glenn Dayton <glenn.dayton24@gmail.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
2016-05-26 11:06:53 +02:00
Dan Carpenter 54f0ffc4e2 hwmon: (emc2103) Fix typo in MODULE_PARM_DESC
"apd" was intended here instead of "init".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
2016-05-26 11:06:53 +02:00
David S. Miller 61248720c4 Merge branch 'mlx4-stats-fixes'
Eric Dumazet says:

====================
net/mlx4_en: fix stats

mlx4 has various bugs in its ndo_get_stats() and related functions.
This patch series address the obvious issues.
Remaining ones will be discussed later.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet f73a6f439f net/mlx4_en: get rid of private net_device_stats
We simply can use the standard net_device stats.

We do not need to clear fields that are already 0.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet 9ed17db17f net/mlx4_en: get rid of ret_stats
mlx4 uses a private struct net_device_stats in a vain attempt
to avoid races.

This is buggy because multiple cpus could call mlx4_en_get_stats()
at the same time, so ret_stats can not guarantee stable results.

To fix this, we need to switch to ndo_get_stats64() as this
method provides per-thread storage.

This allows to reduce mlx4_en_priv bloat.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet 45acbac609 net/mlx4_en: clear some TX ring stats in mlx4_en_clear_stats()
mlx4_en_clear_stats() clears about everything but few TX ring
fields are missing :
- queue_stopped, wake_queue, tso_packets, xmit_more

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet 63a664b7e9 net/mlx4_en: fix tx_dropped bug
1) mlx4_en_xmit() can increment priv->stats.tx_dropped, but this variable
is overwritten in mlx4_en_DUMP_ETH_STATS().

2) This increment was not SMP safe, as a port might have many TX queues.

Add a per TX ring tx_dropped to fix these issues.

This is u32 as mlx4_en_DUMP_ETH_STATS() will add a 32bit field.

So lets avoid bugs with SNMP agents having to cope with partial
overwraps. (One of these agents being bond_fold_stats())

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:49 -07:00
Xin Long bed187b540 sctp: fix double EPs display in sctp_diag
We have this situation: that EP hash table, contains only the EPs
that are listening, while the transports one, has the opposite.
We have to traverse both to dump all.

But when we traverse the transports one we will also get EPs that are
in the EP hash if they are listening. In this case, the EP is dumped
twice.

We will fix it by checking if the endpoint that is in the endpoint
hash table contains any ep->asoc in there, as it means we will also
find it via transport hash, and thus we can/should skip it, depending
on the filters used, like 'ss -l'.

Still, we should NOT skip it if the user is listing only listening
endpoints, because then we are not traversing the transport hash.
so we have to check idiag_states there also.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:14:31 -07:00
Marek Vasut 3424d9be8f net: arc: trivial: Replace comma with a semicolon
Fix a typo in the driver, replace comma with a semicolon at the end
of statement. While using comma is a legal C here and probably does
not even generate compiler warning, it was unlikely the intention.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Caesar Wang <wxt@rock-chips.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:13:15 -07:00
Marek Vasut 643d60bf57 net: stmmac: Fix incorrect memcpy source memory
The memcpy() currently copies mdio_bus_data into new_bus->irq, which
makes no sense, since the mdio_bus_data structure contains more than
just irqs. The code was likely supposed to copy mdio_bus_data->irqs
into the new_bus->irq instead, so fix this.

Fixes: e7f4dc3536 ("mdio: Move allocation of interrupts into core")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 21:43:35 -07:00
Al Viro 002354112f restore killability of old mutex_lock_killable(&inode->i_mutex) users
The ones that are taking it exclusive, that is...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-26 00:13:25 -04:00
Al Viro 887bddfa90 add down_write_killable_nested()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-26 00:04:58 -04:00
Al Viro d42b386834 update D/f/directory-locking
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-26 00:04:18 -04:00
Wenyou Yang 53b74ed2d0 Revert "mtd: atmel_nand: Support variable RB_EDGE interrupts"
This reverts commit 5ddc7bd43c ("mtd: atmel_nand: Support variable
RB_EDGE interrupts")

Because for current SoCs, the RB_EDGE3(i.e. bit 27) of HSMC_SR
register does not exist, the RB_EDGE0 (i.e. bit 24) is the ready/busy
line edge status bit. It is a datasheet bug.

Cc: <stable@vger.kernel.org>
Fixes: commit 5ddc7bd43c ("mtd: atmel_nand: Support variable RB_EDGE interrupts")
Signed-off-by: Wenyou Yang <wenyou.yang@atmel.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2016-05-25 20:06:28 -07:00
Linus Torvalds 2f7c3a18a2 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: EFI, entry code, pkeys and MPX fixes, TASK_SIZE cleanups
  and a tsc frequency table fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Switch from TASK_SIZE to TASK_SIZE_MAX in the page fault code
  x86/fsgsbase/64: Use TASK_SIZE_MAX for FSBASE/GSBASE upper limits
  x86/mm/mpx: Work around MPX erratum SKD046
  x86/entry/64: Fix stack return address retrieval in thunk
  x86/efi: Fix 7-parameter efi_call()s
  x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys
  x86/tsc: Add missing Cherrytrail frequency to the table
2016-05-25 17:37:33 -07:00
Linus Torvalds f89eae4ee7 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Two fixes: one for a lost wakeup, the other to fix the compiler
  optimizing out preempt operations on ARM64 (and possibly other non-x86
  architectures)"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix remote wakeups
  sched/preempt: Fix preempt_count manipulations
2016-05-25 17:11:43 -07:00
Linus Torvalds bdc6b758e4 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Mostly tooling and PMU driver fixes, but also a number of late updates
  such as the reworking of the call-chain size limiting logic to make
  call-graph recording more robust, plus tooling side changes for the
  new 'backwards ring-buffer' extension to the perf ring-buffer"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  perf record: Read from backward ring buffer
  perf record: Rename variable to make code clear
  perf record: Prevent reading invalid data in record__mmap_read
  perf evlist: Add API to pause/resume
  perf trace: Use the ptr->name beautifier as default for "filename" args
  perf trace: Use the fd->name beautifier as default for "fd" args
  perf report: Add srcline_from/to branch sort keys
  perf evsel: Record fd into perf_mmap
  perf evsel: Add overwrite attribute and check write_backward
  perf tools: Set buildid dir under symfs when --symfs is provided
  perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced
  perf annotate: Sort list of recognised instructions
  perf annotate: Fix identification of ARM blt and bls instructions
  perf tools: Fix usage of max_stack sysctl
  perf callchain: Stop validating callchains by the max_stack sysctl
  perf trace: Fix exit_group() formatting
  perf top: Use machine->kptr_restrict_warned
  perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1
  perf machine: Do not bail out if not managing to read ref reloc symbol
  perf/x86/intel/p4: Trival indentation fix, remove space
  ...
2016-05-25 17:05:40 -07:00
Jann Horn dca6b41491 Yama: fix double-spinlock and user access in atomic context
Commit 8a56038c2a ("Yama: consolidate error reporting") causes lockups
when someone hits a Yama denial. Call chain:

process_vm_readv -> process_vm_rw -> process_vm_rw_core -> mm_access
-> ptrace_may_access
task_lock(...) is taken
__ptrace_may_access -> security_ptrace_access_check
-> yama_ptrace_access_check -> report_access -> kstrdup_quotable_cmdline
-> get_cmdline -> access_process_vm -> get_task_mm
task_lock(...) is taken again

task_lock(p) just calls spin_lock(&p->alloc_lock), so at this point,
spin_lock() is called on a lock that is already held by the current
process.

Also: Since the alloc_lock is a spinlock, sleeping inside
security_ptrace_access_check hooks is probably not allowed at all? So it's
not even possible to print the cmdline from in there because that might
involve paging in userspace memory.

It would be tempting to rewrite ptrace_may_access() to drop the alloc_lock
before calling the LSM, but even then, ptrace_may_access() itself might be
called from various contexts in which you're not allowed to sleep; for
example, as far as I understand, to be able to hold a reference to another
task, usually an RCU read lock will be taken (see e.g. kcmp() and
get_robust_list()), so that also prohibits sleeping. (And using e.g. FUSE,
a user can cause pagefault handling to take arbitrary amounts of time -
see https://bugs.chromium.org/p/project-zero/issues/detail?id=808.)

Therefore, AFAIK, in order to print the name of a process below
security_ptrace_access_check(), you'd have to either grab a reference to
the mm_struct and defer the access violation reporting or just use the
"comm" value that's stored in kernelspace and accessible without big
complications. (Or you could try to use some kind of atomic remote VM
access that fails if the memory isn't paged in, similar to
copy_from_user_inatomic(), and if necessary fall back to comm, but
that'd be kind of ugly because the comm/cmdline choice would look
pretty random to the user.)

Fix it by deferring reporting of the access violation until current
exits kernelspace the next time.

v2: Don't oops on PTRACE_TRACEME, call report_access under
task_lock(current). Also fix nonsensical comment. And don't use
GPF_ATOMIC for memory allocation with no locks held.
This patch is tested both for ptrace attach and ptrace traceme.

Fixes: 8a56038c2a ("Yama: consolidate error reporting")
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-05-26 09:56:18 +10:00
Feng Tang 26c5f03b2a net: alx: use custom skb allocator
This patch follows Eric Dumazet's commit 7b70176421 for Atheros
atl1c driver to fix one exactly same bug in alx driver, that the
network link will be lost in 1-5 minutes after the device is up.

My laptop Lenovo Y580 with Atheros AR8161 ethernet device hit the
same problem with kernel 4.4, and it will be cured by Jarod Wilson's
commit c406700c for alx driver which get merged in 4.5. But there
are still some alx devices can't function well even with Jarod's
patch, while this patch could make them work fine. More details on
	https://bugzilla.kernel.org/show_bug.cgi?id=70761

The debug shows the issue is very likely to be related with the RX
DMA address, specifically 0x...f80, if RX buffer get 0x...f80 several
times, their will be RX overflow error and device will stop working.

For kernel 4.5.0 with Jarod's patch which works fine with my
AR8161/Lennov Y580, if I made some change to the
	__netdev_alloc_skb
		--> __alloc_page_frag()
to make the allocated buffer can get an address with 0x...f80,
then the same error happens. If I make it to 0x...f40 or 0x....fc0,
everything will be still fine. So I tend to believe that the
0x..f80 address cause the silicon to behave abnormally.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761
Cc: Eric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Ole Lukoie <olelukoie@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 19:52:27 -04:00
Linus Torvalds c4a346002b Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool build fix from Ingo Molnar:
 "An libtool fix for older libelf versions"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Allow building with older libelf
2016-05-25 16:52:19 -07:00
Yan, Zheng e536030934 ceph: fix wake_up_session_cb()
We should reset i_requested_max_size before waking the waiters.
(zero i_requested_max_size make waiter re-request the max size)

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:42 +02:00
Yan, Zheng 9abd4db713 ceph: don't use truncate_pagecache() to invalidate read cache
truncate_pagecache() drops dirty pages, it's dangerous to use it
to invalidate read cache. Besides, we shouldn't start invalidating
read cache while there are buffer writers. Because buffer writers
may add dirty pages later.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:42 +02:00
Yan, Zheng b109eec6f4 ceph: SetPageError() for writeback pages if writepages fails
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:41 +02:00
Yan, Zheng ad15ec06e5 ceph: handle interrupted ceph_writepage()
writepage() can be interrupted when it's called by direct memory
reclaimer (the direct memory relaimer is killed). To avoid lossing
data, we redirty the page.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:41 +02:00
Yan, Zheng a78bbd4b29 ceph: make ceph_update_writeable_page() uninterruptible
ceph_update_writeable_page() is used by ceph_write_begin(). It beaks
atomicity of write operation if it's interruptible.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:41 +02:00
Yan, Zheng 0e76abf21e libceph: make ceph_osdc_wait_request() uninterruptible
Ceph_osdc_wait_request() is used when cephfs issues sync IO. In most
cases, the sync IO should be uninterruptible. The fix is use killale
wait function in ceph_osdc_wait_request().

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:40 +02:00