Commit Graph

902424 Commits

Author SHA1 Message Date
Srinivas Kandagatla e6de179d7a nvmem: core: add root_only member to nvmem device struct
As we plan to move to the sysfs is_bin_visible callback, having
root_only as part of nvmem_device will help decide the correct
permissions.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20200325122116.15096-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25 13:45:09 +01:00
Greg Kroah-Hartman b83f68776b Update extcon for 5.7

Merge tag 'extcon-next-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-next

Chanwoo writes:

Update extcon for 5.7

Detailed description for this pull request:
1. Update the extcon provider driver as follows:
- Add wakeup support for extcon-axp288.c
- Clean up the code for the -EPROBE_DEFER error case in extcon-palmas.c
- Convert extcon-usbc-cros-ec.txt to yaml format
2. Export symbol of extcon_get_edev_name()

* tag 'extcon-next-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon:
  extcon: axp288: Add wakeup support
  extcon: Mark extcon_get_edev_name() function as exported symbol
  extcon: palmas: Hide error messages if gpio returns -EPROBE_DEFER
  dt-bindings: extcon: usbc-cros-ec: convert extcon-usbc-cros-ec.txt to yaml format
2020-03-25 13:25:58 +01:00
Hans de Goede 9c94553099 extcon: axp288: Add wakeup support
On devices with an AXP288, we need to wakeup from suspend when a charger
is plugged in, so that we can do charger-type detection and so that the
axp288-charger driver, which listens for our extcon events, can configure
the input-current-limit accordingly.

Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-03-25 08:16:14 +09:00
Mayank Rana 995bb10923 extcon: Mark extcon_get_edev_name() function as exported symbol
The extcon_get_edev_name() function lets a client driver request an
extcon device's name. If the extcon driver and the client driver are
compiled as loadable modules, the extcon_get_edev_name() symbol is not
visible to the client driver. Hence mark the extcon_get_edev_name()
function as an exported symbol.

Signed-off-by: Mayank Rana <mrana@codeaurora.org>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-03-25 08:16:13 +09:00
H. Nikolaus Schaller 3426ad6d40 extcon: palmas: Hide error messages if gpio returns -EPROBE_DEFER
If the gpios are probed after this driver (e.g. if they
come from an i2c expander) there is no need to print an
error message.

Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-03-25 08:16:13 +09:00
Dafna Hirschfeld 1d27904703 dt-bindings: extcon: usbc-cros-ec: convert extcon-usbc-cros-ec.txt to yaml format
convert the binding file extcon-usbc-cros-ec.txt to
yaml format extcon-usbc-cros-ec.yaml

This was tested and verified on ARM with:
make dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/extcon/extcon-usbc-cros-ec.yaml
make dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/extcon/extcon-usbc-cros-ec.yaml

Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-03-25 08:16:13 +09:00
Manivannan Sadhasivam 821747386c bus: mhi: core: Pass module owner during client driver registration
The module owner field can be used to prevent the removal of a kernel
module while any device files associated with it are open in
userspace. Hence, modify the API to pass the module owner field. For
convenience, the module_mhi_driver() macro is used: it passes the
driver module's THIS_MODULE as the owner and also removes the need to
specify the default MHI client driver register/unregister routines.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20200324061050.14845-2-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-24 13:45:24 +01:00
Alexander Shishkin 8622dfefb6 intel_th: msu: Make stopping the trace optional
Some use cases prefer to keep collecting the trace data into the last
available window while the other windows are being offloaded instead of
stopping the trace. In this scenario, the window switch happens
automatically when the next window becomes available again.

Add an option to allow this and a sysfs attribute to enable it.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200319085152.52183-1-alexander.shishkin@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-24 13:45:24 +01:00
Randy Dunlap 3baf89abca bus/mhi: fix printk format for size_t
Fix printk format warning by using %z for size_t modifier:

../drivers/bus/mhi/core/boot.c: In function `mhi_rddm_prepare':
../drivers/bus/mhi/core/boot.c:55:15: warning: format `%lx' expects argument of type `long unsigned int', but argument 5 has type `size_t {aka unsigned int}' [-Wformat=]
  dev_dbg(dev, "Address: %p and len: 0x%lx sequence: %u\n",

Link: http://lkml.kernel.org/r/c4852a82-cdb9-6318-70a4-96ccb4ba5af2@infradead.org
Fixes: 6fdfdd2732 ("bus: mhi: core: Add support for downloading RDDM image during panic")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Hemant Kumar <hemantk@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20200324022505.UiPPJZVXX%akpm@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-24 12:40:28 +01:00
Greg Kroah-Hartman 9d20328d0b This tag contains the following changes for kernel 5.7:

Merge tag 'misc-habanalabs-next-2020-03-24' of git://people.freedesktop.org/~gabbayo/linux into char-misc-next

Oded writes:

This tag contains the following changes for kernel 5.7:

- MMU code improvements that include:
  - Flush MMU TLB cache only once, at the end of mapping/unmapping
    function, instead of flushing after mapping of every page.
  - Add future ASIC support by splitting properties of ASIC capabilities
    regarding mapping of host memory to regular and huge pages.

- Add debugfs interface to write and read 64-bit values from the device's
  memory/registers. Previously the driver provided interface for 32-bit
  values and this will allow the user to debug much more quickly. We saw it
  gives a boost of around 1.5 - 1.7 when reading internal memories.

- Support temperature offset via sysfs as defined in
  https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface

- Display historical maximum of various sensors.

- Print to kernel log when clock throttling occurs due to a breach of power
  or thermal envelope. Also print when clock throttling is finished
  (clock is back to optimal).

- Fix bug when moving from manual to auto power-management mode.

- Print a message ("unsupported device") to kernel log in case a GAUDI device
  is recognized.

- Small bug fixes and minor improvements to code.

* tag 'misc-habanalabs-next-2020-03-24' of git://people.freedesktop.org/~gabbayo/linux:
  habanalabs: fix pm manual->auto in GOYA
  habanalabs: show unsupported message for GAUDI
  habanalabs: add print upon clock change
  habanalabs: update goya firmware register map
  habanalabs: Add missing annotation for goya_hw_queues_unlock()
  habanalabs: Add missing annotation for goya_hw_queues_lock()
  habanalabs: Remove unused parse_cnt variable
  habanalabs: provide historical maximum of various sensors
  habanalabs: modify the return values of hl_read/write routines
  habanalabs: support temperature offset via sysfs
  habanalabs: ratelimit error prints of IRQs
  habanalabs: add debugfs write64/read64
  habanalabs: fix DDR bar address setting
  habanalabs: removing extra ;
  habanalabs: Avoid running restore chunks if no execute chunks
  habanalabs: Modify CS jobs counter to u16
  habanalabs: split the host MMU properties
  habanalabs: use the user CB size as a default job size
  habanalabs: flush only at the end of the map/unmap
2020-03-24 11:06:05 +01:00
Oded Gabbay 1184550155 habanalabs: fix pm manual->auto in GOYA
When moving from manual to automatic power management mode in GOYA, the
driver didn't correctly place the device in LOW power mode. As a result, if
an application was run immediately after the move, it would have run with
low frequencies.

Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:17 +02:00
Oded Gabbay 6966d9e1f2 habanalabs: show unsupported message for GAUDI
If a GAUDI device is present in the system, display an error message that
it is not supported by the current kernel.

Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:17 +02:00
Omer Shpigelman 4f0e6ab78a habanalabs: add print upon clock change
Print a message when the clock slows down due to power consumption or
overheating. In addition, print a message when the clock is back to optimal.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:17 +02:00
Oded Gabbay bc6ed3aa92 habanalabs: update goya firmware register map
Use specific values in enum of register map to be able to deprecate old
values.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:17 +02:00
Jules Irenge 8a7a88c10c habanalabs: Add missing annotation for goya_hw_queues_unlock()
Sparse reports a warning at goya_hw_queues_unlock()
warning: context imbalance in goya_hw_queues_unlock() - unexpected unlock
The root cause is a missing annotation at goya_hw_queues_unlock()
Add the missing __releases(&goya->hw_queues_lock) annotation

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:17 +02:00
Jules Irenge cf87f966d2 habanalabs: Add missing annotation for goya_hw_queues_lock()
Sparse reports a warning at goya_hw_queues_lock()
warning: context imbalance in goya_hw_queues_lock() - wrong count at exit
The root cause is a missing annotation at goya_hw_queues_lock()
Add the missing __acquires(&goya->hw_queues_lock) annotation

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:17 +02:00
Tomer Tayar b41e9728d8 habanalabs: Remove unused parse_cnt variable
The "parse_cnt" variable is incremented while validating the CS chunks,
but it is actually not being used.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:17 +02:00
Christine Gharzuzi 0da10e683e habanalabs: provide historical maximum of various sensors
Add support for hwmon_in_highest, hwmon_temp_highest and hwmon_curr_highest
attributes. These attributes retrieve the historical maximum voltage,
temperature and current that were sampled, respectively.

Signed-off-by: Christine Gharzuzi <cgharzuzi@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
Moti Haimovski d57b83c3df habanalabs: modify the return values of hl_read/write routines
The hl read and write routines implement the hwmon_ops read and write
interface routines respectively.
These routines are expected to return a completion status when called,
which was not the case until this commit.
This commit modifies these routines to return 0 upon success and a
negative error value upon failure.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
Moti Haimovski 5557b138dc habanalabs: support temperature offset via sysfs
This commit adds support for offsetting the temperature readings
by a specified value as defined in
https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
using the standard sysfs defined for hwmon.
This is required by system administrators to inject errors to test
their monitoring applications in data centers.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
Oded Gabbay e5509d5279 habanalabs: ratelimit error prints of IRQs
The compute engines can perform millions of transactions per second. If
there is a bug in the S/W stack, we could get a lot of interrupts and spam
the kernel log. Therefore, ratelimit these prints.

Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
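The rate limiting described in "habanalabs: ratelimit error prints of IRQs" can be illustrated with a small model. This is a sketch in Python, not the driver's code: the RateLimiter class, log_irq_errors() and the interval/burst values are hypothetical and chosen only for illustration. It shows the general technique of allowing a fixed burst of messages per time window and counting the rest as suppressed.

```python
class RateLimiter:
    """Allow at most `burst` messages per `interval` seconds; count the rest as suppressed."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.window_start = None
        self.printed = 0
        self.suppressed = 0

    def allow(self, now):
        # Start a new window when none is open or the current one has expired.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.suppressed += 1
        return False


def log_irq_errors(timestamps, limiter):
    """Return the timestamps whose error message would actually be logged."""
    return [t for t in timestamps if limiter.allow(t)]
```

With such a limiter, a storm of interrupts inside one window produces only `burst` log lines, and logging resumes once the window has expired.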
Moti Haimovski 5cce51464c habanalabs: add debugfs write64/read64
Allow debug user to write/read 64-bit data through debugfs.
This will expedite the dump process of the (large) internal
memories of the device done during debug.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
Omer Shpigelman 0c002ceb39 habanalabs: fix DDR bar address setting
DRAM_PHYS_BASE is already taken into account in MMU_PAGE_TABLES_ADDR.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
Oded Gabbay 7491c036cb habanalabs: removing extra ;
There is an extra ; after the end of a function, which needs to be removed

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2020-03-24 10:54:16 +02:00
Tomer Tayar 1718a45b28 habanalabs: Avoid running restore chunks if no execute chunks
CS with no chunks for execute phase is invalid, so its
context_switch/restore phase should not be run.
Hence, move the check of the execute chunks number to the beginning of
hl_cs_ioctl().

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
Tomer Tayar f3a838c0c7 habanalabs: Modify CS jobs counter to u16
As HL_MAX_JOBS_PER_CS is 512, it is possible that more than 255 CS jobs
will be submitted for a certain queue. Hence, modify the
"jobs_in_queue_cnt" parameter of the "hl_cs" structure to be u16 instead
of u8.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
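Why a u8 counter is too narrow here can be checked with a few lines of arithmetic. The sketch below is Python, not driver code: count_jobs() is a hypothetical helper that models a fixed-width unsigned counter by masking, and only HL_MAX_JOBS_PER_CS = 512 comes from the commit message above.

```python
HL_MAX_JOBS_PER_CS = 512  # value quoted in the commit message


def count_jobs(num_jobs, bits):
    """Increment an unsigned counter of width `bits` num_jobs times, wrapping on overflow."""
    mask = (1 << bits) - 1
    counter = 0
    for _ in range(num_jobs):
        counter = (counter + 1) & mask  # models C unsigned wrap-around
    return counter
```

An 8-bit counter cannot represent any count above 255, so submitting the maximum number of jobs to one queue would wrap it around, while a 16-bit counter holds the full count.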
Omer Shpigelman 64a7e2955d habanalabs: split the host MMU properties
Host memory may be allocated with huge pages.
A different virtual range may be used for mapping in this case.
Add Huge PCI MMU (HPMMU) properties to support it.
This patch is a prerequisite for future ASICs support and has no effect on
Goya ASIC as currently a single virtual host range is used for all page
sizes.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
Omer Shpigelman 240c92fd04 habanalabs: use the user CB size as a default job size
When no patched command buffer (CB) is created, use the user CB size as
the job size.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
Pawel Piskorski 7fc40bcaa6 habanalabs: flush only at the end of the map/unmap
Optimize hl_mmu_map and hl_mmu_unmap by not calling flush(ctx)
within per-page loop.

Signed-off-by: Pawel Piskorski <ppiskorski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
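The optimization in "habanalabs: flush only at the end of the map/unmap" (one flush after the whole range instead of one per page) can be illustrated with a toy model. This is a sketch in Python, not driver code: FakeMmu, map_range_flush_per_page() and map_range_flush_once() are hypothetical names, the page size is an arbitrary example value, and the real hl_mmu_map/hl_mmu_unmap signatures are not reproduced here.

```python
class FakeMmu:
    """Toy stand-in for an MMU context; counts flushes instead of touching hardware."""

    def __init__(self):
        self.mapped = set()
        self.flush_count = 0

    def map_page(self, virt_addr):
        self.mapped.add(virt_addr)

    def flush(self):
        self.flush_count += 1


def map_range_flush_per_page(mmu, start, num_pages, page_size=4096):
    """Old pattern: flush inside the per-page loop."""
    for i in range(num_pages):
        mmu.map_page(start + i * page_size)
        mmu.flush()


def map_range_flush_once(mmu, start, num_pages, page_size=4096):
    """New pattern: map every page, then flush a single time at the end."""
    for i in range(num_pages):
        mmu.map_page(start + i * page_size)
    mmu.flush()
```

Both functions leave the same pages mapped; only the number of flushes differs, which is where the saving comes from.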
Anson Huang bbde5709ee nvmem: mxs-ocotp: Use devm_add_action_or_reset() for cleanup
Use devm_add_action_or_reset() for cleanup to call clk_unprepare(),
which can simplify the error handling in .probe, and .remove callback
can be dropped.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20200323150007.7487-5-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-23 20:05:23 +01:00
Baolin Wang 4bd5a15d93 nvmem: sprd: Determine double data programming from device data
We've saved the double data flag in the device data, so we should
use it when programming a block.

Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20200323150007.7487-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-23 20:05:23 +01:00
Freeman Liu 5af25388ba nvmem: sprd: Optimize the block lock operation
In some cases the eFuse block is programmed partially, multiple
times, so we should allow the block to be programmed again if it was
only partially programmed. But we should lock the block once the whole
block has been programmed. Thus add a condition to check whether we
need to lock the block.

Moreover we only enable the auto-check function when locking the block.

Signed-off-by: Freeman Liu <freeman.liu@unisoc.com>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20200323150007.7487-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-23 20:05:23 +01:00
Freeman Liu c66ebde4d9 nvmem: sprd: Fix the block lock operation
According to the Spreadtrum eFuse specification, we should write 0 to
the block to trigger the lock operation.

Fixes: 096030e7f4 ("nvmem: sprd: Add Spreadtrum SoCs eFuse support")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Freeman Liu <freeman.liu@unisoc.com>
Signed-off-by: Baolin Wang <baolin.wang7@gmail.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20200323150007.7487-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-23 20:05:23 +01:00
Greg Kroah-Hartman 33e12f6e45 soundwire updates for v5.7-rc1

Merge tag 'soundwire-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire into char-misc-next

Vinod writes:

soundwire updates for v5.7-rc1

This contains updates to stream and pm handling in the core as well as
updates to Intel drivers for hw sequencing and multi-link.

Details:
Core:
  - Updates to stream handling for state machine checks
  - Changes to handle potential races for probe/enumeration and init of the bus
  - Add no-PM versions of read and write
  - Support for multiple Slaves on the same link
  - Add read_only_wordlength for simple/reduced ports

Intel:
  - Updates to cadence lib to handle hw sequencing
  - Support for audio dai calls in intel driver
  - Multi link support for cadence lib

Qualcomm:
  - Support for get_sdw_stream()

* tag 'soundwire-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (43 commits)
  soundwire: qcom: add support for get_sdw_stream()
  soundwire: stream: Add read_only_wordlength flag to port properties
  soundwire: cadence: clear FIFO to avoid pop noise issue on playback start
  soundwire: cadence: multi-link support
  soundwire: cadence: commit changes in the exit_reset() sequence
  soundwire: cadence: remove automatic command retries
  soundwire: cadence: remove PREQ_DELAY assignment
  soundwire: cadence: enable NORMAL operation in cdns_init()
  soundwire: cadence: reorder MCP_CONFIG settings
  soundwire: cadence: make SSP interval programmable
  soundwire: cadence: move clock/SSP related inits to dedicated function
  soundwire: cadence: merge routines to clear/set bits
  soundwire: cadence: mask Slave interrupt before stopping clock
  soundwire: cadence: fix a io timeout issue in S3 test
  soundwire: cadence: add clock_stop/restart routines
  soundwire: cadence: handle error cases with CONFIG_UPDATE
  soundwire: cadence: add interface to check clock status
  soundwire: cadence: simplifiy cdns_init()
  soundwire: cadence: s/update_config/config_update
  soundwire: stream: use sdw_write instead of update
  ...
2020-03-23 13:19:00 +01:00
Greg Kroah-Hartman baca54d956 Merge 5.6-rc7 into char-misc-next
We need the char/misc driver fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-23 07:59:38 +01:00
Linus Torvalds 16fbf79b0f Linux 5.6-rc7
2020-03-22 18:31:56 -07:00
Linus Torvalds 67d584e33e for-5.6-rc6-tag

Merge tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Two fixes.

  The first is a regression: when dropping some incompat bits the
  conditions were reversed. The other is a fix for rename whiteout
  potentially leaving stack memory linked to a list"

* tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix removal of raid[56|1c34} incompat flags after removing block group
  btrfs: fix log context list corruption after rename whiteout error
2020-03-22 11:35:33 -07:00
Linus Torvalds b3c03db67e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  x86/mm: split vmalloc_sync_all()
  mm, slub: prevent kmalloc_node crashes and memory leaks
  mm/mmu_notifier: silence PROVE_RCU_LIST warnings
  epoll: fix possible lost wakeup on epoll_ctl() path
  mm: do not allow MADV_PAGEOUT for CoW pages
  mm, memcg: throttle allocators based on ancestral memory.high
  mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
  page-flags: fix a crash at SetPageError(THP_SWAP)
  mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
2020-03-22 10:46:50 -07:00
Joerg Roedel 763802b53a x86/mm: split vmalloc_sync_all()
Commit 3f8fd02b1b ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
the vunmap() code-path.  While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.

Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap().  But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.

To avoid the unnecessary work on x86-64 and to gain the performance
back, split up vmalloc_sync_all() into two functions:

	* vmalloc_sync_mappings(), and
	* vmalloc_sync_unmappings()

Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized.  The only exception is the new call-site added in the
above mentioned commit.

Shile Zhang directed us to a report of an 80% regression in reaim
throughput.

Fixes: 3f8fd02b1b ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Vlastimil Babka 0715e6c516 mm, slub: prevent kmalloc_node crashes and memory leaks
Sachin reports [1] a crash in SLUB __slab_alloc():

  BUG: Kernel NULL pointer dereference on read at 0x000073b0
  Faulting instruction address: 0xc0000000003d55f4
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
  NIP:  c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
  REGS: c0000008b37836d0 TRAP: 0300   Not tainted  (5.6.0-rc2-next-20200218-autotest)
  MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004844  XER: 00000000
  CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
  GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
  GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
  GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
  GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
  GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
  GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
  GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
  GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
  NIP ___slab_alloc+0x1f4/0x760
  LR __slab_alloc+0x34/0x60
  Call Trace:
    ___slab_alloc+0x334/0x760 (unreliable)
    __slab_alloc+0x34/0x60
    __kmalloc_node+0x110/0x490
    kvmalloc_node+0x58/0x110
    mem_cgroup_css_online+0x108/0x270
    online_css+0x48/0xd0
    cgroup_apply_control_enable+0x2ec/0x4d0
    cgroup_mkdir+0x228/0x5f0
    kernfs_iop_mkdir+0x90/0xf0
    vfs_mkdir+0x110/0x230
    do_mkdirat+0xb0/0x1a0
    system_call+0x5c/0x68

This is a PowerPC platform with following NUMA topology:

  available: 2 nodes (0-1)
  node 0 cpus:
  node 0 size: 0 MB
  node 0 free: 0 MB
  node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  node 1 size: 35247 MB
  node 1 free: 30907 MB
  node distances:
  node   0   1
    0:  10  40
    1:  40  10

  possible numa nodes: 0-31

This only happens with an mmotm patch "mm/memcontrol.c: allocate
shrinker_map on appropriate NUMA node" [2] which effectively calls
kmalloc_node for each possible node.  SLUB however only allocates
kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on
node_to_mem_node to return such a valid node for other nodes since commit
a561ce00b0 ("slub: fall back to node_to_mem_node() node if allocating
on memoryless node").  This is however not true in this configuration
where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31,
thus it contains zeroes and get_partial() ends up accessing a
non-allocated kmem_cache_node.

A related issue was reported by Bharata (originally by Ramachandran) [3]
where a similar PowerPC configuration, but with a mainline kernel without
patch [2], ends up allocating large amounts of pages for the kmalloc-1k
and kmalloc-512 caches.  This seems to have the same underlying issue with
node_to_mem_node() not behaving as expected, and might also lead to an
infinite loop with CONFIG_SLUB_CPU_PARTIAL [4].

This patch should fix both issues by not relying on node_to_mem_node()
anymore and instead simply falling back to NUMA_NO_NODE, when
kmalloc_node(node) is attempted for a node that's not online, or has no
usable memory.  The "usable memory" condition is also changed from
node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly
the condition that SLUB uses to allocate kmem_cache_node structures.
The check in get_partial() is removed completely, as the checks in
___slab_alloc() are now sufficient to prevent get_partial() being
reached with an invalid node.
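
The fallback rule described above can be modelled in userspace as a
minimal sketch.  This is not the kernel code: model_topology,
model_effective_node() and MODEL_NO_NODE are hypothetical names, with
MODEL_NO_NODE standing in for NUMA_NO_NODE and the bool array standing in
for the N_NORMAL_MEMORY node state.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NODES 32
#define MODEL_NO_NODE   (-1)	/* stands in for NUMA_NO_NODE */

/* Hypothetical stand-in for the N_NORMAL_MEMORY node state. */
struct model_topology {
	bool has_normal_memory[MODEL_MAX_NODES];
};

/*
 * Return the node an allocation should target: the requested node if it
 * has normal memory, otherwise "no preference" so that any node with
 * memory may serve the request.
 */
static int model_effective_node(const struct model_topology *t, int node)
{
	assert(node == MODEL_NO_NODE ||
	       (node >= 0 && node < MODEL_MAX_NODES));

	if (node != MODEL_NO_NODE && !t->has_normal_memory[node])
		return MODEL_NO_NODE;
	return node;
}
```

With only node 1 populated, as in the topology above, a request for node
0 or for any of nodes 2-31 degrades to MODEL_NO_NODE, while a request for
node 1 is kept as is.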

[1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
[2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
[3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/

Fixes: a561ce00b0 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz
Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Qian Cai 63886bad90 mm/mmu_notifier: silence PROVE_RCU_LIST warnings
It is safe to traverse mm->notifier_subscriptions->list either under
SRCU read lock or mm->notifier_subscriptions->lock using
hlist_for_each_entry_rcu().  Silence the PROVE_RCU_LIST false positives,
for example,

  WARNING: suspicious RCU usage
  -----------------------------
  mm/mmu_notifier.c:484 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  3 locks held by libvirtd/802:
   #0: ffff9321e3f58148 (&mm->mmap_sem#2){++++}, at: do_mprotect_pkey+0xe1/0x3e0
   #1: ffffffff91ae6160 (mmu_notifier_invalidate_range_start){+.+.}, at: change_p4d_range+0x5fa/0x800
   #2: ffffffff91ae6e08 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x178/0x460

  stack backtrace:
  CPU: 7 PID: 802 Comm: libvirtd Tainted: G          I       5.6.0-rc6-next-20200317+ #2
  Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
  Call Trace:
    dump_stack+0xa4/0xfe
    lockdep_rcu_suspicious+0xeb/0xf5
    __mmu_notifier_invalidate_range_start+0x3ff/0x460
    change_p4d_range+0x746/0x800
    change_protection+0x1df/0x300
    mprotect_fixup+0x245/0x3e0
    do_mprotect_pkey+0x23b/0x3e0
    __x64_sys_mprotect+0x51/0x70
    do_syscall_64+0x91/0xae8
    entry_SYSCALL_64_after_hwframe+0x49/0xb3

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: http://lkml.kernel.org/r/20200317175640.2047-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Roman Penyaev 1b53734bd0 epoll: fix possible lost wakeup on epoll_ctl() path
This fixes possible lost wakeup introduced by commit a218cc4914.
Originally modifications to ep->wq were serialized by ep->wq.lock, but
in commit a218cc4914 ("epoll: use rwlock in order to reduce
ep_poll_callback() contention") a new rw lock was introduced in order to
relax fd event path, i.e. callers of ep_poll_callback() function.

After the change ep_modify and ep_insert (both are called on epoll_ctl()
path) were switched to ep->lock, but ep_poll (epoll_wait) was still using
ep->wq.lock on wqueue list modification.

The bug doesn't lead to any wqueue list corruption, because the wake up
path and list modifications were serialized by ep->wq.lock internally,
but the actual waitqueue_active() check prior to the wake_up() call can
be reordered with modifications of the ep ready list, thus a wake up can
be lost.

And yes, this can be healed by an explicit smp_mb():

  list_add_tail(&epi->rdlink, &ep->rdllist);
  smp_mb();
  if (waitqueue_active(&ep->wq))
	wake_up(&ep->wq);

But let's keep it simple: the current patch replaces ep->wq.lock with
ep->lock for wqueue modifications, so the wake up path always observes
the activeness of the wqueue correctly.
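
For illustration only, the barrier-based alternative (not what this
patch does) can be modelled in userspace with C11 atomics.  The names
ep_model, model_publish_then_check() and model_register_then_check() are
hypothetical, and the two counters merely stand in for the ready list and
the wait queue.  Each side stores first, issues a full fence, and only
then loads the other side's state, so at least one of them observes the
other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: counters stand in for ep->rdllist and ep->wq. */
struct ep_model {
	atomic_int ready;	/* number of ready events */
	atomic_int waiters;	/* number of registered sleepers */
};

/* Waker side: publish an event, then decide whether a wake up is needed. */
static bool model_publish_then_check(struct ep_model *ep)
{
	assert(ep != NULL);
	atomic_fetch_add_explicit(&ep->ready, 1, memory_order_relaxed);
	/* Full fence: the store above may not pass the load below. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&ep->waiters, memory_order_relaxed) > 0;
}

/* Waiter side: register as a sleeper, then decide whether to sleep. */
static bool model_register_then_check(struct ep_model *ep)
{
	assert(ep != NULL);
	atomic_fetch_add_explicit(&ep->waiters, 1, memory_order_relaxed);
	/* Pairs with the fence on the waker side. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&ep->ready, memory_order_relaxed) == 0;
}
```

Without the two fences, the waker could miss the waiter while the waiter
misses the event, which is the lost wakeup described above.  Taking the
same lock on both sides, as this patch does, rules that outcome out
without any open-coded barrier.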

Fixes: a218cc4914 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Reported-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jes Sorensen <jes.sorensen@gmail.com>
Cc: <stable@vger.kernel.org>	[5.1+]
Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Bisected-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Michal Hocko 12e967fd8e mm: do not allow MADV_PAGEOUT for CoW pages
Jann has brought up a very interesting point [1].  While shared pages
are excluded from MADV_PAGEOUT normally, CoW pages can be easily
reclaimed that way.  This can lead to all sorts of hard to debug
problems.  E.g.  performance problems outlined by Daniel [2].

There are runtime environments where a substantial amount of memory is
shared among security domains via CoW, and an easy way to reclaim that
memory, which MADV_{COLD,PAGEOUT} offers, can lead either to performance
degradation for the parent process, which might be more privileged, or
even open up side channel attacks.

The feasibility of the latter is not really clear to me TBH but there is
no real reason for exposure at this stage.  It seems there is no real
use case to depend on reclaiming CoW memory via madvise at this stage so
it is much easier to simply disallow it and this is what this patch
does.  Put simply, MADV_{PAGEOUT,COLD} can operate only on exclusively
owned memory, which is a straightforward semantic.
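
As a rough userspace sketch of that rule, assuming "exclusively owned"
is approximated by a mapping count of exactly one: model_page and
model_may_reclaim_via_madvise() are hypothetical names, not kernel
interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page and the number of mappings of it. */
struct model_page {
	int mapcount;
};

/*
 * A page is eligible for MADV_{COLD,PAGEOUT} style reclaim only when the
 * caller is its sole mapper; anything shared, CoW included, is skipped.
 */
static bool model_may_reclaim_via_madvise(const struct model_page *page)
{
	assert(page != NULL && page->mapcount >= 0);
	return page->mapcount == 1;
}
```

A page still shared between parent and child after fork() has a mapping
count of two in this model and is therefore left alone.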

[1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
[2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com

Fixes: 9c276cc65a ("mm: introduce MADV_COLD")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Chris Down e26733e0d0 mm, memcg: throttle allocators based on ancestral memory.high
Prior to this commit, we only directly check the affected cgroup's
memory.high against its usage.  However, it's possible that we are being
reclaimed as a result of hitting an ancestor memory.high and should be
penalised based on that, instead.

This patch changes memory.high overage throttling to use the largest
overage in its ancestors when considering how many penalty jiffies to
charge.  This makes sure that we penalise poorly behaving cgroups in the
same way regardless of at what level of the hierarchy memory.high was
breached.
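
A minimal userspace sketch of "largest overage among ancestors" follows.
It is only a model: model_cgroup, model_overage(),
model_max_ancestral_overage() and the fixed-point shift of 10 are
assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed-point precision for the overage ratio. */
#define MODEL_PRECISION_SHIFT 10

/* Hypothetical cgroup node; parent is NULL at the root. */
struct model_cgroup {
	const struct model_cgroup *parent;
	uint64_t usage;	/* pages charged */
	uint64_t high;	/* memory.high, in pages */
};

/* Overage of one cgroup, as a fixed-point fraction of its memory.high. */
static uint64_t model_overage(const struct model_cgroup *cg)
{
	uint64_t high = cg->high ? cg->high : 1;	/* avoid dividing by 0 */

	if (cg->usage <= high)
		return 0;
	return ((cg->usage - high) << MODEL_PRECISION_SHIFT) / high;
}

/* Walk from the cgroup up to the root and keep the largest overage. */
static uint64_t model_max_ancestral_overage(const struct model_cgroup *cg)
{
	uint64_t max_overage = 0;

	assert(cg != NULL);
	for (; cg != NULL; cg = cg->parent) {
		uint64_t overage = model_overage(cg);

		if (overage > max_overage)
			max_overage = overage;
	}
	return max_overage;
}
```

In this model a child that is below its own memory.high still inherits
the overage of a parent that is above its limit, so the penalty no longer
depends on the level at which memory.high was breached.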

Fixes: 0e4b01df86 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>	[5.4.x+]
Link: http://lkml.kernel.org/r/8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.1584036142.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Chris Down d397a45fc7 mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
Commit 0e4b01df86 had a bunch of fixups to use the right division
method.  However, it seems that after all that it still wasn't right --
div_u64 takes a 32-bit divisor.

The headroom is still large (2^32 pages), so on mundane systems you
won't hit this, but this should definitely be fixed.
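
The truncation can be reproduced in userspace with two helpers that only
mirror the parameter widths involved.  model_div_u64() and
model_div64_u64() are hypothetical stand-ins, not the kernel helpers
themselves.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit divisor: a wider value passed in is silently truncated. */
static uint64_t model_div_u64(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* 64-bit divisor: the full value takes part in the division. */
static uint64_t model_div64_u64(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}
```

For a divisor of 2^32 + 2, the 32-bit variant ends up dividing by 2,
while the 64-bit variant divides by the intended value.  A divisor of
exactly 2^32 would truncate to zero, which is worse still.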

Fixes: 0e4b01df86 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: <stable@vger.kernel.org>	[5.4.x+]
Link: http://lkml.kernel.org/r/80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Qian Cai d72520ad00 page-flags: fix a crash at SetPageError(THP_SWAP)
Commit bd4c82c22c ("mm, THP, swap: delay splitting THP after swapped
out") supported writing THP to a swap device but forgot to update the
older commit df8c94d13c ("page-flags: define behavior of FS/IO-related
flags on compound pages"), which could trigger a crash during THP
swap-out with DEBUG_VM_PGFLAGS=y:

  kernel BUG at include/linux/page-flags.h:317!

  page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
  page:fffff3b2ec3a8000 refcount:512 mapcount:0 mapping:000000009eb0338c index:0x7f6e58200 head:fffff3b2ec3a8000 order:9 compound_mapcount:0 compound_pincount:0
  anon flags: 0x45fffe0000d8454(uptodate|lru|workingset|owner_priv_1|writeback|head|reclaim|swapbacked)

  end_swap_bio_write()
    SetPageError(page)
      VM_BUG_ON_PAGE(1 && PageCompound(page))

  <IRQ>
  bio_endio+0x297/0x560
  dec_pending+0x218/0x430 [dm_mod]
  clone_endio+0xe4/0x2c0 [dm_mod]
  bio_endio+0x297/0x560
  blk_update_request+0x201/0x920
  scsi_end_request+0x6b/0x4b0
  scsi_io_completion+0x509/0x7e0
  scsi_finish_command+0x1ed/0x2a0
  scsi_softirq_done+0x1c9/0x1d0
  __blk_mqnterrupt+0xf/0x20
  </IRQ>

Fix by checking PF_NO_TAIL in those places instead.

Fixes: bd4c82c22c ("mm, THP, swap: delay splitting THP after swapped out")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200310235846.1319-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Baoquan He d41e2f3bd5 mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
In section_deactivate(), pfn_to_page() doesn't work any more after
ms->section_mem_map is reset to NULL in the SPARSEMEM|!VMEMMAP case.  It
causes a hot remove failure:

  kernel BUG at mm/page_alloc.c:4806!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  RIP: 0010:free_pages+0x85/0xa0
  Call Trace:
   __remove_pages+0x99/0xc0
   arch_remove_memory+0x23/0x4d
   try_remove_memory+0xc8/0x130
   __remove_memory+0xa/0x11
   acpi_memory_device_remove+0x72/0x100
   acpi_bus_trim+0x55/0x90
   acpi_device_hotplug+0x2eb/0x3d0
   acpi_hotplug_work_fn+0x1a/0x30
   process_one_work+0x1a7/0x370
   worker_thread+0x30/0x380
   kthread+0x112/0x130
   ret_from_fork+0x35/0x40

Let's move the ->section_mem_map resetting after
depopulate_section_memmap() to fix it.

[akpm@linux-foundation.org: remove unneeded initialization, per David]
Fixes: ba72b4c8cf ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200307084229.28251-2-bhe@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Chunguang Xu 7d36665a58 memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
An eventfd monitors multiple memory thresholds of a cgroup.  When it is
closed, the kernel deletes all events related to this eventfd.  If,
before all events are deleted, another eventfd starts monitoring the
memory threshold of this cgroup, the kernel crashes:

  BUG: kernel NULL pointer dereference, address: 0000000000000004
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
  Oops: 0002 [#1] SMP PTI
  CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
  Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
  Workqueue: events memcg_event_remove
  RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
  RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
  RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
  RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
  R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
  R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
  Call Trace:
    memcg_event_remove+0x32/0x90
    process_one_work+0x172/0x380
    worker_thread+0x49/0x3f0
    kthread+0xf8/0x130
    ret_from_fork+0x35/0x40
  CR2: 0000000000000004

We can reproduce this problem in the following ways:

1. We create a new cgroup subdirectory and a new eventfd, and then we
   monitor multiple memory thresholds of the cgroup through this eventfd.

2. We close this eventfd, and __mem_cgroup_usage_unregister_event()
   will be called multiple times to delete all events related to this
   eventfd.

The first time __mem_cgroup_usage_unregister_event() is called, the
kernel will clear all items related to this eventfd in
thresholds->primary.

Since there is currently only one eventfd, thresholds->primary becomes
empty, so the kernel will set thresholds->primary and thresholds->spare
to NULL.  If at this time the user creates a new eventfd and monitors
the memory threshold of this cgroup, the kernel will re-initialize
thresholds->primary.

Then when __mem_cgroup_usage_unregister_event() is called for the
second time, because thresholds->primary is not empty, the system will
access thresholds->spare, but thresholds->spare is NULL, which will
trigger a crash.

In general, the longer it takes to delete all events related to this
eventfd, the easier it is to trigger this problem.

The solution is to check whether the thresholds associated with the
eventfd have been cleared when deleting the event.  If so, we do nothing.
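
That check can be sketched in userspace as follows.  This is a
simplified model, not the kernel implementation: model_thresholds,
model_unregister() and the flat owner array are hypothetical, and the
primary/spare array swap is left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_EVENTS 8

/* Hypothetical threshold table: owner[i] is the eventfd id of entry i. */
struct model_thresholds {
	int owner[MODEL_MAX_EVENTS];
	size_t size;
};

/*
 * Remove every threshold owned by the given eventfd and return how many
 * were removed.  If none are found, a previous call already cleared
 * them, so the table is left untouched.
 */
static size_t model_unregister(struct model_thresholds *t, int eventfd)
{
	size_t i, kept = 0, removed = 0;

	assert(t != NULL && t->size <= MODEL_MAX_EVENTS);

	for (i = 0; i < t->size; i++)
		if (t->owner[i] == eventfd)
			removed++;

	if (removed == 0)
		return 0;	/* nothing to do */

	/* Compact the table, keeping entries that belong to other eventfds. */
	for (i = 0; i < t->size; i++)
		if (t->owner[i] != eventfd)
			t->owner[kept++] = t->owner[i];
	t->size = kept;
	return removed;
}
```

Calling model_unregister() a second time for the same eventfd finds no
matching entries and returns early, which mirrors the repeated unregister
calls described above.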

[akpm@linux-foundation.org: fix comment, per Kirill]
Fixes: 907860ed38 ("cgroups: make cftype.unregister_event() void-returning")
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Linus Torvalds b74b991fb8 block-5.6-20200320
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl51dZoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpt1GD/4qD5KahkC9cdRRcliGoYrY5CLJbEx+Hwlu
 QNJKggGJKMs3f4KqD1JwGsZCppD9bPTD2qgjs8Hjkw6HCOabXx8elQBKwyhjVglu
 5ogv981fmbzPau3n5gQ1llqjF6aslKpSkA9arQPgKov7gIoa2U3Gc1ZIO5Mz/X/T
 J0Z4TqmjMhbjAyP9BqfVIyQyDR9WGvO4U/9XmTclKU4Rex7lT5JRiGVF0ZpqfXDQ
 pkJOaqsltZVXN0J6Uy2e0qL5nkWIFhfrqjvoBG2V/ivt9zOfiPzmt9DLNl3S/QyU
 TYtNvAg6wuw/DBOdDsLoHztQUWbBqUMhn6892ADc6786TwFZv0/Ytv+CqL2mxbYy
 wImli5cnowWNevkNFpm3RLAMA0Oi8NiULb31AmRP23OyGSPB51JWhpnlqEylRnz2
 aa8KkA+VHiuaMnsII6Caq6tXsXyNfoDPGYvy5vCIyzZXPTvvH4i7rPr0QYb6VjAa
 v89mGE+Nx/eiC9FFEzPCXU5tgA6AMiMzoqXodRoSUSl7Lm+iGm4pPhUX4EIia1NG
 6Jc1A4cOQTjM8mptJKPIEbAovHsVMRUext5pinQYtMB58R16ZQfpdzQY1ojJUllk
 u2nWPyGswifMfpJ7daMhhYx/z6yQNy/MOqEPG8dNyc96R6OJoFGe+Wp41/9oP4Ly
 OjnPSFygKA==
 =q5rs
 -----END PGP SIGNATURE-----

Merge tag 'block-5.6-20200320' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Just two NVMe fabrics fixes that should go into 5.6"

* tag 'block-5.6-20200320' of git://git.kernel.dk/linux-block:
  nvmet-tcp: set MSG_MORE only if we actually have more to send
  nvme-rdma: Avoid double freeing of async event data
2020-03-21 12:08:26 -07:00
Linus Torvalds 1ab7ea1f83 io_uring-5.6-20200320
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl51dbQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpiV3EADJHB2r2hTTEym5u1PbrEEVkjvdL6InU8lD
 lFM7m2g6yZUncwm+aSZynHqAFY6Rd5Jk+gmYMuioi3ZxC2rs7jG1AOTpaeJYmhle
 lzkjqSLtl+gdPMA9ydivk1UwILFjtZKG1JNc++tnCn3q7+eCkgnWAlq5b7idG2eF
 BS0AEZP6Yz1zStTHLbHSB0StY8ovMIw0VaVQvguHLL9EBpbHmrs0cq3tipWkAyPR
 2YwnXbxsJySukkwmBKxEWrGUYDze56jqJIqdFsOE0+WtGV+nk7OScPseXAaP4/+G
 Vl23VNfryuZcsBUwI9tY1SzCFEXIwdXVGpCAYwQ/kU5WfvFpYaei+fXVNnL4kjR0
 PfpA6XnMsZ3DzqgepmUd92sAA56ZtBxuGjqcSYlg/JwjvUHdpaZDkE2WLqkAMeUN
 8A7cUw+R6XWQ2/y6ob7QvKiT/ZDR8GrYUl3EdGE3LhB1ZsvLXJDZpWipwQBzuk9R
 vJJOkGst38rjsWnb+nfeLh3AsgjF14wo+2vQL4mKs24xKTIvadHsFAZjKLXZ93Wf
 Vn58FaPOYIkjBidYLWb3dlO1ZR8S0803gohLkLV6adH8bCNCWxGTOR51DZLomAsb
 nAUCEAJaZrOqaQAuJAFNNpS8+/da3AIF4HVd2EdZ1yFXU15y0+zIxtROjKzg+OxO
 M3jC/Aet1Q==
 =IMcu
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Two different fixes in here:

   - Fix for a potential NULL pointer deref for links with async or
     drain marked (Pavel)

   - Fix for not properly checking RLIMIT_NOFILE for async punted
     operations.

     This affects openat/openat2, which were added this cycle, and
     accept4. I did a full audit of other cases where we might check
     current->signal->rlim[] and found only RLIMIT_FSIZE for buffered
     writes and fallocate. That one is fixed and queued for 5.7 and
     marked stable"

* tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block:
  io_uring: make sure accept honor rlimit nofile
  io_uring: make sure openat/openat2 honor rlimit nofile
  io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}
2020-03-21 11:54:47 -07:00