Now we cannot distinguish that one sk is a udp or sctp style when
we use ss to dump sctp_info. it's necessary to dump it as well.
For sctp_diag, ss support is not officially available, thus there
are no official users of this yet, so we can add this field in the
middle of sctp_info without breaking user API.
v1->v2:
- move 'sctpi_s_type' field to the end of struct sctp_info, so
that it won't cause incompatibility with applications already
built.
- add __reserved3 in sctp_info to make sure sctp_info is 8-byte
alignment.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The members child_list and active_list were added to the fence struct
without descriptions for the Documentation. Adding these.
Fixes: b55b54b5db ("staging/android: remove struct sync_pt")
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Apparently nobody noticed that dma-buf.h wasn't actually pulled into
docbook build. And as a result the headerdoc comments bitrot a bit.
Add missing params/fields.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
include/uapi/sound/Kbuild was missing the inclusion of three header
files in that directory.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The driver extended capabilities may differ for different
interface types which the userspace needs to know (for
example the fine timing measurement initiator and responder
bits might differ for a station and AP). Add a new nl80211
attribute to provide extended capabilities per interface type
to userspace.
Signed-off-by: Vidyullatha Kanchanapally <vkanchan@qti.qualcomm.com>
Reviewed-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Previously, the status parameter to cfg80211_connect_result() was
documented as using WLAN_STATUS_UNSPECIFIED_FAILURE (1) when the real
status code for the failure is not known. This value can be used by an
AP (and often is) and as such, user space cannot distinguish between
explicitly rejected authentication/association and not being able to
even try to associate or not receiving a response from the AP.
Add a new inline function, cfg80211_connect_timeout(), to be used when
the driver knows that the connection attempt failed due to a reason
where connection could not be attempt or no response was received from
the AP. The internal functions now allow a negative status value (-1) to
be used as an indication of this special case. This results in the
NL80211_ATTR_TIMED_OUT to be added to the NL80211_CMD_CONNECT event to
allow user space to determine this case was hit. For backwards
compatibility, NL80211_STATUS_CODE with the value
WLAN_STATUS_UNSPECIFIED_FAILURE is still indicated in the event in such
a case.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
[johannes: fix cfg80211_connect_bss() prototype to use int for status,
add cfg80211_connect_timeout() to docbook, fix docbook]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For the benefit of every single caller, take osdc instead of map.
Also, now that osdc->osdmap can't ever be NULL, drop the check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
commit 059500940d (ACPI/video: export acpi_video_get_levels)
mistakenly dropped the correct value of max_level and that caused the
set_level function following failed and the acpi_video backlight interface
didn't get created. Fix this by passing back the correct max_level value.
While at it, also fix the param used in acpi_video_device_lcd_query_levels
where acpi_handle is expected but acpi_video_device is passed.
Fixes: 059500940d (ACPI/video: export acpi_video_get_levels)
Reported-and-tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A recent cleanup moved MAX_IPTUN_ENCAP_OPS along with some other
definitions, but it is now invisible when CONFIG_INET is
not defined, but still referenced from ip6_tunnel.h:
In file included from net/xfrm/xfrm_input.c:17:0:
include/net/ip6_tunnel.h:67:17: error: 'MAX_IPTUN_ENCAP_OPS' undeclared here (not in a function)
ip6tun_encaps[MAX_IPTUN_ENCAP_OPS];
^~~~~~~~~~~~~~~~~~~
This hides the ip6_encap_hlen and ip6_tnl_encap functions inside
of CONFIG_INET so we don't run into the the problem.
Alternatively we could move the macro out of the #ifdef again to
restore the previous behavior
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 55c2bc1432 ("net: Cleanup encap items in ip_tunnels.h")
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull string hash improvements from George Spelvin:
"This series does several related things:
- Makes the dcache hash (fs/namei.c) useful for general kernel use.
(Thanks to Bruce for noticing the zero-length corner case)
- Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
above.
- Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
32-bit multiplies will do well enough.
- Rids the world of the bad hash multipliers in hash_32.
This finishes the job started in commit 689de1d6ca ("Minimal
fix-up of bad hashing behavior of hash_64()")
The vast majority of Linux architectures have hardware support for
32x32-bit multiply and so derive no benefit from "simplified"
multipliers.
The few processors that do not (68000, h8/300 and some models of
Microblaze) have arch-specific implementations added. Those
patches are last in the series.
- Overhauls the dcache hash mixing.
The patch in commit 0fed3ac866 ("namei: Improve hash mixing if
CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
Replaced with a much more careful design that's simultaneously
faster and better. (My own invention, as there was noting suitable
in the literature I could find. Comments welcome!)
- Modify the hash_name() loop to skip the initial HASH_MIX(). This
would let us salt the hash if we ever wanted to.
- Sort out partial_name_hash().
The hash function is declared as using a long state, even though
it's truncated to 32 bits at the end and the extra internal state
contributes nothing to the result. And some callers do odd things:
- fs/hfs/string.c only allocates 32 bits of state
- fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes
- Modify bytemask_from_count to handle inputs of 1..sizeof(long)
rather than 0..sizeof(long)-1. This would simplify users other
than full_name_hash"
Special thanks to Bruce Fields for testing and finding bugs in v1. (I
learned some humbling lessons about "obviously correct" code.)
On the arch-specific front, the m68k assembly has been tested in a
standalone test harness, I've been in contact with the Microblaze
maintainers who mostly don't care, as the hardware multiplier is never
omitted in real-world applications, and I haven't heard anything from
the H8/300 world"
* 'hash' of git://ftp.sciencehorizons.net/linux:
h8300: Add <asm/hash.h>
microblaze: Add <asm/hash.h>
m68k: Add <asm/hash.h>
<linux/hash.h>: Add support for architecture-specific functions
fs/namei.c: Improve dcache hash function
Eliminate bad hash multipliers from hash_32() and hash_64()
Change hash_64() return value to 32 bits
<linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
fs/namei.c: Add hashlen_string() function
Pull out string hash to <linux/stringhash.h>
This is just the infrastructure; there are no users yet.
This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of <asm/hash.h>.
That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
Cc: Alistair Francis <alistai@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
The "simplified" prime multipliers made very bad hash functions, so get rid
of them. This completes the work of 689de1d6ca.
To avoid the inefficiency which was the motivation for the "simplified"
multipliers, hash_64() on 32-bit systems is changed to use a different
algorithm. It makes two calls to hash_32() instead.
drivers/media/usb/dvb-usb-v2/af9015.c uses the old GOLDEN_RATIO_PRIME_32
for some horrible reason, so it inherits a copy of the old definition.
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
That's all that's ever asked for, and it makes the return
type of hash_long() consistent.
It also allows (upcoming patch) an optimized implementation
of hash_64 on 32-bit machines.
I tried adding a BUILD_BUG_ON to ensure the number of bits requested
was never more than 32 (most callers use a compile-time constant), but
adding <linux/bug.h> to <linux/hash.h> breaks the tools/perf compiler
unless tools/perf/MANIFEST is updated, and understanding that code base
well enough to update it is too much trouble. I did the rest of an
allyesconfig build with such a check, and nothing tripped.
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Finally, the first use of previous two patches: eliminate the
separate ad-hoc string hash functions in the sunrpc code.
Now hash_str() is a wrapper around hash_string(), and hash_mem() is
likewise a wrapper around full_name_hash().
Note that sunrpc code *does* call hash_mem() with a zero length, which
is why the previous patch needed to handle that in full_name_hash().
(Thanks, Bruce, for finding that!)
This also eliminates the only caller of hash_long which asks for
more than 32 bits of output.
The comment about the quality of hashlen_string() and full_name_hash()
is jumping the gun by a few patches; they aren't very impressive now,
but will be improved greatly later in the series.
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: linux-nfs@vger.kernel.org
We'd like to make more use of the highly-optimized dcache hash functions
throughout the kernel, rather than have every subsystem create its own,
and a function that hashes basic null-terminated strings is required
for that.
(The name is to emphasize that it returns both hash and length.)
It's actually useful in the dcache itself, specifically d_alloc_name().
Other uses in the next patch.
full_name_hash() is also tweaked to make it more generally useful:
1) Take a "char *" rather than "unsigned char *" argument, to
be consistent with hash_name().
2) Handle zero-length inputs. If we want more callers, we don't want
to make them worry about corner cases.
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
... so they can be used without the rest of <linux/dcache.h>
The hashlen_* macros will make sense next patch.
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
A handful of changes this merge window:
- A few patches to fix probing and configuration of pstore
- A few patches adding Elan touchpad registration on a few devices
- EC changes: a security fix dealing with max message sizes and addition
of compat_ioctl support.
- Keyboard backlight control support
There was also an accidential duplicate registration of trackpads on 'Leon',
which was reverted just recently.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXSb5tAAoJEIwa5zzehBx38BsQAI+p6urzboC2+NSsZAq72d4/
u2SF+ziEdgj4f7n38TjiDaFTqig9nqoQeB2nDCRicyEcHsCiK/ghsdq4ULfkvHIa
VALn/rBkQWQ7hC+2tfrQn7Rd2VqVnh28+EyhGezhVtrAGRaE2jIbgKnnfD3MS+b5
lab014HBppP1pDtJQQYCcoCpOUF1a9WEb8jKPWf4DCeCMaSrwFLKxV5VeqDkNtEZ
a3GFzB+WypN1nwBQ7PEOUxutzsUakSNy+RFoCqGwVppJ+kA/E3MO2BGoEJuQ6LXP
u7nyabJq6zBQAlrYie4OoiijBWQ+Q759jbOKkmxpBfL/mtMMepV05DIx/16Yi9NY
j+4A9HsuY+EYUhr0mH7B57CH0RClebixdZ62DtUK0Fe6H66SsMZPYtTKLIzOTzgO
j+gypQyW8BvcM/XjPYTA/vlYK45IJYB5OUDQzFW0qWQv/nInoYOS3G8IVooKHMDy
+s2wih/OAxBCDJ+KfDeJUj3U5Avz1YCojRraZWmBsOPpRpsQHVQ02r3uoHeHHZFv
bI8sTJ7BGxoYX5Kb6+xbtW7SnXoWZN8vlqdCpdNnLWKcwIqHKFirkoZh2N2AqyZB
7VpQW1HD2WxYBpxMv98KVyPiR+5xCnZnpqcIIvpGrX00KhmJkMgIT4FbWHMf6vVt
T55Sz5V9z/dsgMSDT3LY
=bcc4
-----END PGP SIGNATURE-----
Merge tag 'chrome-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform
Pull chrome platform updates from Olof Johansson
"A handful of Chrome driver and binding changes this merge window:
- a few patches to fix probing and configuration of pstore
- a few patches adding Elan touchpad registration on a few devices
- EC changes: a security fix dealing with max message sizes and
addition of compat_ioctl support.
- keyboard backlight control support
There was also an accidential duplicate registration of trackpads on
'Leon', which was reverted just recently"
* tag 'chrome-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform:
Revert "platform/chrome: chromeos_laptop: Add Leon Touch"
platform/chrome: chromeos_laptop - Add Elan touchpad for Wolf
platform/chrome: chromeos_laptop - Add elan trackpad option for C720
platform/chrome: cros_ec_dev - Populate compat_ioctl
platform/chrome: cros_ec_lightbar - use name instead of ID to hide lightbar attributes
platform/chrome: cros_ec_dev - Fix security issue
platform/chrome: Add Chrome OS keyboard backlight LEDs support
platform/chrome: use to_platform_device()
platform/chrome: pstore: Move to larger record size.
platform/chrome: pstore: probe for ramoops buffer using acpi
platform/chrome: chromeos_laptop: Add Leon Touch
This is the second update round for 4.7-rc1. Most of changes are
about the pending ASoC updates and fixes, including a few new
drivers. Below are some highlights:
ASoC:
- New drivers for MAX98371 and TAS5720
- SPI support for TLV320AIC32x4, along with the module split
- TDM support for STI Uniperf IPs
- Remaining topology API fixes / updates
HDA:
- A couple of Dell quirks and new Realtek codec support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXSLGwAAoJEGwxgFQ9KSmkKz4P/2xpAXcwjT/g/WgeNVsZLnxd
vs1KlMPSWXLHY7ESZB+oDYtw5pAQWta2gKnG3T0QpkEtyqcyvEAUch55SfPbkDWz
bRwboK91NF9Cfrso+QnUG1HdpaeDsNydiAR5u2sdemQG+rh8TmWXNmFsuqPptjbm
LP6Spf8Ia2TYAvagZOB+2UTl7Jq8jMXiYP3aGWMHm7P/kREMQkSWcQ9U8F8UK92G
5D0qKGvChsd23ybGUL1nBM7wBvErFoKd4Xa1zMudQt8EkTjistdgm24v3PO+lKDv
JYiEugOZctzqtQVUlQMXcIqrlsafXwJN7ttKGst9gj32bM+a7EW0TGG0KyhxXI5w
fRgGU7AJwncs9hBzEPBfc6Jms85THN2HpusU61ZYpyFAhLnHAOL7iIZnNKY8Pyyg
tOPY2lTwHD9ic9EiC33/IypT50n0lBOi7X+YE7lGWdm2jXNvxtFv1jUw99kx25fj
UaFNQaDYXXDKO1POCFrHpIq+jJ71Jmk7mXktI75wfuLyX3PSPyFg8OBbYVbTWkbL
xdSqBs6LCESZ1iV9mauxwPSex44BpaMB3E0TA+7iN3+0Uwdfxe0OoLnX6dGiLSZJ
QenFu/EDdCsA8rlrDy7AS1e5ulpYsTY1KGSZbfNzMdNsmD2XC3FdZHEF5PAC4wZQ
EnNaik6InJgHlMF/6MXo
=uNKQ
-----END PGP SIGNATURE-----
Merge tag 'sound-4.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull more sound updates from Takashi Iwai:
"This is the second update round for 4.7-rc1. Most of changes are
about the pending ASoC updates and fixes, including a few new drivers.
Below are some highlights:
ASoC:
- New drivers for MAX98371 and TAS5720
- SPI support for TLV320AIC32x4, along with the module split
- TDM support for STI Uniperf IPs
- Remaining topology API fixes / updates
HDA:
- A couple of Dell quirks and new Realtek codec support"
* tag 'sound-4.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (63 commits)
ALSA: hda - Fix headset mic detection problem for one Dell machine
spi: spi-ep93xx: Fix the PTR_ERR() argument
ALSA: hda/realtek - Add support for ALC295/ALC3254
ASoC: kirkwood: fix build failure
ALSA: hda - Fix headphone noise on Dell XPS 13 9360
ASoC: ak4642: Enable cache usage to fix crashes on resume
ASoC: twl6040: Disconnect AUX output pads on digital mute
ASoC: tlv320aic32x4: Properly implement the positive and negative pins into the mixers
rcar: src: skip disabled-SRC nodes
ASoC: max98371 Remove duplicate entry in max98371_reg
ASoC: twl6040: Select LPPLL during standby
ASoC: rsnd: don't use prohibited number to PDMACHCRn.SRS
ASoC: simple-card: Add pm callbacks to platform driver
ASoC: pxa: Fix module autoload for platform drivers
ASoC: topology: Fix memory leak in widget creation
ASoC: Add max98371 codec driver
ASoC: rsnd: count .probe/.remove for rsnd_mod_call()
ASoC: topology: Check size mismatch of ABI objects before parsing
ASoC: topology: Check failure to create a widget
ASoC: add support for TAS5720 digital amplifier
...
Pull SCSI target updates from Nicholas Bellinger:
"Here are the outstanding target pending updates for v4.7-rc1.
The highlights this round include:
- Allow external PR/ALUA metadata path be defined at runtime via top
level configfs attribute (Lee)
- Fix target session shutdown bug for ib_srpt multi-channel (hch)
- Make TFO close_session() and shutdown_session() optional (hch)
- Drop se_sess->sess_kref + convert tcm_qla2xxx to internal kref
(hch)
- Add tcm_qla2xxx endpoint attribute for basic FC jammer (Laurence)
- Refactor iscsi-target RX/TX PDU encode/decode into common code
(Varun)
- Extend iscsit_transport with xmit_pdu, release_cmd, get_rx_pdu,
validate_parameters, and get_r2t_ttt for generic ISO offload
(Varun)
- Initial merge of cxgb iscsi-segment offload target driver (Varun)
The bulk of the changes are Chelsio's new driver, along with a number
of iscsi-target common code improvements made by Varun + Co along the
way"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (29 commits)
iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
cxgbit: Use type ISCSI_CXGBIT + cxgbit tpg_np attribute
iscsi-target: Convert transport drivers to signal rdma_shutdown
iscsi-target: Make iscsi_tpg_np driver show/store use generic code
tcm_qla2xxx Add SCSI command jammer/discard capability
iscsi-target: graceful disconnect on invalid mapping to iovec
target: need_to_release is always false, remove redundant check and kfree
target: remove sess_kref and ->shutdown_session
iscsi-target: remove usage of ->shutdown_session
tcm_qla2xxx: introduce a private sess_kref
target: make close_session optional
target: make ->shutdown_session optional
target: remove acl_stop
target: consolidate and fix session shutdown
cxgbit: add files for cxgbit.ko
iscsi-target: export symbols
iscsi-target: call complete on conn_logout_comp
iscsi-target: clear tx_thread_active
iscsi-target: add new offload transport type
iscsi-target: use conn_transport->transport_type in text rsp
...
- Dynamic counter infrastructure in the IB drivers
This is a sysfs based code to allow free form access to the hardware
counters RDMA devices might support so drivers don't need to code
this up repeatedly themselves
- SendOnlyFullMember multicast support
- IB router support
- A couple misc fixes
- The big item on the list: hfi1 driver updates, plus moving the hfi1
driver out of staging
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXRyupAAoJELgmozMOVy/deHAQAJp87yn8hntHXh81Kjrvezch
McJ0v7FTsfsLVe/Y+YAbaAREjanhnUDzZ36WUqS9rHadOl+Ia0fD+M53aTk9hDKS
R9/oWQDsUb8k7imdhNrSS72Q2kbwr++8RqBoEEFcl4fuDCVsrRhUaf+perJTEnoK
I9APDQCNU3FCTPFRllK5al6gi5UGEW9vO9JQZb+VnTVW9+LjlYobzcZfttAw8oRN
rtQLP3xGb7ZwV0ngH0dt2TthN2ih97TavnISojTdJ5I+WHFxLV0YbHSK/E2JLdoi
hVknhrflPIKr+mRVv97yCCz92kGAdTKG3sT9vNUVUQ0Y8kEdrW/LrnrmxsBEYzkl
GcQeqg7//ffOlQ0dGZZjlkPg/8yMPo9CBsfPDcOvXeenK0lkao9qg2yrIz+CBC/9
YlpUKfL0JLRal9O6y3Mh++ZQ9R1kvQN8/CgHLUlRQdWLfxf7VPA4rIcaWT7iGfvg
z6jIJKjsK+hqKhY9oKh3YfNmXh65IQ5OF3+tBzHKwNiGXNSsMYAVYP3vc/HYTJvz
4EpsenndKb5l0AOvDch8rMmw7YZPf2NOdi90TW6Bq1VgTIDRxh0WzR2QTnWBVRSf
XAcC3DLXTHqC/UUpHrdXgfvU/0nQXaVUvPWIAM/t3hy91GLscByDw4973+tISs/B
7AHSptbVQaBUpTjWHfKf
=5+cf
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull more rdma updates from Doug Ledford:
"This is the second group of code for the 4.7 merge window. It looks
large, but only in one sense. I'll get to that in a minute. The list
of changes here breaks down as follows:
- Dynamic counter infrastructure in the IB drivers
This is a sysfs based code to allow free form access to the
hardware counters RDMA devices might support so drivers don't need
to code this up repeatedly themselves
- SendOnlyFullMember multicast support
- IB router support
- A couple misc fixes
- The big item on the list: hfi1 driver updates, plus moving the hfi1
driver out of staging
There was a group of 15 patches in the hfi1 list that I thought I had
in the first pull request but they weren't. So that added to the
length of the hfi1 section here.
As far as these go, everything but the hfi1 is pretty straight
forward.
The hfi1 is, if you recall, the driver that Al had complaints about
how it used the write/writev interfaces in an overloaded fashion. The
write portion of their interface behaved like the write handler in the
IB stack proper and did bi-directional communications. The writev
interface, on the other hand, only accepts SDMA request structures.
The completions for those structures are sent back via an entirely
different event mechanism.
With the security patch, we put security checks on the write
interface, however, we also knew they would be going away soon. Now,
we've converted the write handler in the hfi1 driver to use ioctls
from the IB reserved magic area for its bidirectional communications.
With that change, Intel has addressed all of the items originally on
their TODO when they went into staging (as well as many items added to
the list later).
As such, I moved them out, and since they were the last item in the
staging/rdma directory, and I don't have immediate plans to use the
staging area again, I removed the staging/rdma area.
Because of the move out of staging, as well as a series of 5 patches
in the hfi1 driver that removed code people thought should be done in
a different way and was optional to begin with (a snoop debug
interface, an eeprom driver for an eeprom connected directory to their
hfi1 chip and not via an i2c bus, and a few other things like that),
the line count, especially the removal count, is high"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (56 commits)
staging/rdma: Remove the entire rdma subdirectory of staging
IB/core: Make device counter infrastructure dynamic
IB/hfi1: Fix pio map initialization
IB/hfi1: Correct 8051 link parameter settings
IB/hfi1: Update pkey table properly after link down or FM start
IB/rdamvt: Fix rdmavt s_ack_queue sizing
IB/rdmavt: Max atomic value should be a u8
IB/hfi1: Fix hard lockup due to not using save/restore spin lock
IB/hfi1: Add tracing support for send with invalidate opcode
IB/hfi1, qib: Add ieth to the packet header definitions
IB/hfi1: Move driver out of staging
IB/hfi1: Do not free hfi1 cdev parent structure early
IB/hfi1: Add trace message in user IOCTL handling
IB/hfi1: Remove write(), use ioctl() for user cmds
IB/hfi1: Add ioctl() interface for user commands
IB/hfi1: Remove unused user command
IB/hfi1: Remove snoop/diag interface
IB/hfi1: Remove EPROM functionality from data device
IB/hfi1: Remove UI char device
IB/hfi1: Remove multiple device cdev
...
Pull more i2c updates from Wolfram Sang:
"Here is the second pull request from I2C for this merge window:
- one new feature (which nearly fell through the cracks): i2c-dev
does now use the cdev API so it can handle >256 minors. Seems
people do need that.
- two fixes for the just added DMA feature for i2c-rcar
- some typo fixes"
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: dev: don't start function name with 'return'
i2c: dev: switch from register_chrdev to cdev API
i2c: xlr: rename ARCH_TANGOX to ARCH_TANGO
i2c: at91: change log when dma configuration fails
misc: at24: Fix typo in at24 header file
i2c: rcar: should depend on HAS_DMA
i2c: rcar: use dma_request_chan()
Pull vfs fixes from Al Viro:
"Followups to the parallel lookup work:
- update docs
- restore killability of the places that used to take ->i_mutex
killably now that we have down_write_killable() merged
- Additionally, it turns out that I missed a prerequisite for
security_d_instantiate() stuff - ->getxattr() wasn't the only thing
that could be called before dentry is attached to inode; with smack
we needed the same treatment applied to ->setxattr() as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch ->setxattr() to passing dentry and inode separately
switch xattr_handler->set() to passing dentry and inode separately
restore killability of old mutex_lock_killable(&inode->i_mutex) users
add down_write_killable_nested()
update D/f/directory-locking
smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
we'd hashed the new dentry and attached it to inode, we need ->setxattr()
instances getting the inode as an explicit argument rather than obtaining
it from dentry.
Similar change for ->getxattr() had been done in commit ce23e64. Unlike
->getxattr() (which is used by both selinux and smack instances of
->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
it got missed back then.
Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that the allmodconfig x86-64 build is clean wrt IS_ERR_VALUE() uses
on integers, add a cast to a pointer and back to the argument, so that
any new mis-uses of IS_ERR_VALUE() will cause warnings like
warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
so that we don't re-introduce any bogus uses.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The do_brk() and vm_brk() return value was "unsigned long" and returned
the starting address on success, and an error value on failure. The
reasons are entirely historical, and go back to it basically behaving
like the mmap() interface does.
However, nobody actually wanted that interface, and it causes totally
pointless IS_ERR_VALUE() confusion.
What every single caller actually wants is just the simpler integer
return of zero for success and negative error number on failure.
So just convert to that much clearer and more common calling convention,
and get rid of all the IS_ERR_VALUE() uses wrt vm_brk().
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The register_page_bootmem_info_node() function needs to be marked __init
in order to avoid a new warning introduced by commit f65e91df25 ("mm:
use early_pfn_to_nid in register_page_bootmem_info_node").
Otherwise you'll get a warning about how a non-init function calls
early_pfn_to_nid (which is __meminit)
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block fixes from Jens Axboe:
"A set of fixes that wasn't included in the first merge window pull
request. This pull request contains:
- A set of NVMe fixes from Keith, and one from Nic for the integrity
side of it.
- Fix from Ming, clearing ->mq_ops if we don't successfully setup a
queue for multiqueue.
- A set of stability fixes for bcache from Jiri, and also marking
bcache as orphaned as it's no longer actively maintained (in
mainline, at least)"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: clear q->mq_ops if init fail
MAINTAINERS: mark bcache as orphan
bcache: bch_gc_thread() is not freezable
bcache: bch_allocator_thread() is not freezable
bcache: bch_writeback_thread() is not freezable
nvme/host: Add missing blk_integrity tag_size + flags assignments
NVMe: Add device ID's with stripe quirk
NVMe: Short-cut removal on surprise hot-unplug
NVMe: Allow user initiated rescan
NVMe: Reduce driver log spamming
NVMe: Unbind driver on failure
NVMe: Delete only created queues
NVMe: Allocate queues only for online cpus
Pull drm fixes from Dave Airlie:
- one IMX built-in regression fix
- a set of amdgpu fixes, mostly powerplay and polaris GPU stuff
- a set of i915 fixes all over, many cc'ed to stable.
The i915 batch contain support for DP++ dongle detection, which is
used to fix some regressions in the HDMI color depth area
* tag 'drm-fixes-v4.7-rc1' of git://people.freedesktop.org/~airlied/linux: (44 commits)
drm/amd: add Kconfig dependency for ACP on DRM_AMDGPU
drm/amdgpu: Fix hdmi deep color support.
drm/amdgpu: fix bug in fence driver fini
drm/i915: Stop automatically retiring requests after a GPU hang
drm/i915: Unify intel_ring_begin()
drm/i915: Ignore stale wm register values on resume on ilk-bdw (v2)
drm/i915/psr: Try to program link training times correctly
drm/imx: Match imx-ipuv3-crtc components using device node in platform data
drm/i915/bxt: Adjusting the error in horizontal timings retrieval
drm/i915: Don't leave old junk in ilk active watermarks on readout
drm/i915: s/DPPL/DPLL/ for SKL DPLLs
drm/i915: Fix gen8 semaphores id for legacy mode
drm/i915: Set crtc_state->lane_count for HDMI
drm/i915/BXT: Retrieving the horizontal timing for DSI
drm/i915: Protect gen7 irq_seqno_barrier with uncore lock
drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms
drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT
drm/i915: Enable/disable TMDS output buffers in DP++ adaptor as needed
drm/i915: Respect DP++ adaptor TMDS clock limit
drm: Add helper for DP++ adaptors
...
Pull intel IOMMU updates from David Woodhouse:
"This patchset improves the scalability of the Intel IOMMU code by
resolving two spinlock bottlenecks and eliminating the linearity of
the IOVA allocator, yielding up to ~5x performance improvement and
approaching 'iommu=off' performance"
* git://git.infradead.org/intel-iommu:
iommu/vt-d: Use per-cpu IOVA caching
iommu/iova: introduce per-cpu caching to iova allocation
iommu/vt-d: change intel-iommu to use IOVA frame numbers
iommu/vt-d: avoid dev iotlb logic for domains with no dev iotlbs
iommu/vt-d: only unmap mapped entries
iommu/vt-d: correct flush_unmaps pfn usage
iommu/vt-d: per-cpu deferred invalidation queues
iommu/vt-d: refactoring of deferred flush entries
(kvm_stat had nothing to do with QEMU in the first place -- the tool
only interprets debugfs)
- expose per-vm statistics in debugfs and support them in kvm_stat
(KVM always collected per-vm statistics, but they were summarised into
global statistics)
x86:
- fix dynamic APICv (VMX was improperly configured and a guest could
access host's APIC MSRs, CVE-2016-4440)
- minor fixes
ARM changes from Christoffer Dall:
"This set of changes include the new vgic, which is a reimplementation
of our horribly broken legacy vgic implementation. The two
implementations will live side-by-side (with the new being the
configured default) for one kernel release and then we'll remove the
legacy one.
Also fixes a non-critical issue with virtual abort injection to
guests."
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCAAGBQJXRz0KAAoJEED/6hsPKofosiMIAIHmRI+9I6VMNmQe5vrZKz9/
vt89QGxDJrFQwhEuZovenLEDaY6rMIJNguyvIbPhNuXNHIIPWbe6cO6OPwByqkdo
WI/IIqcAJN/Bpwt4/Y2977A5RwDOwWLkaDs0LrZCEKPCgeh9GWQf+EfyxkDJClhG
uIgbSAU+t+7b05K3c6NbiQT/qCzDTCdl6In6PI/DFSRRkXDaTcopjjp1PmMUSSsR
AM8LGhEzMer+hGKOH7H5TIbN+HFzAPjBuDGcoZt0/w9IpmmS5OMd3ZrZ320cohz8
zZQooRcFrT0ulAe+TilckmRMJdMZ69fyw3nzfqgAKEx+3PaqjKSY/tiEgqqDJHY=
=EEBK
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull second batch of KVM updates from Radim Krčmář:
"General:
- move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat (kvm_stat
had nothing to do with QEMU in the first place -- the tool only
interprets debugfs)
- expose per-vm statistics in debugfs and support them in kvm_stat
(KVM always collected per-vm statistics, but they were summarised
into global statistics)
x86:
- fix dynamic APICv (VMX was improperly configured and a guest could
access host's APIC MSRs, CVE-2016-4440)
- minor fixes
ARM changes from Christoffer Dall:
- new vgic reimplementation of our horribly broken legacy vgic
implementation. The two implementations will live side-by-side
(with the new being the configured default) for one kernel release
and then we'll remove the legacy one.
- fix for a non-critical issue with virtual abort injection to guests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits)
tools: kvm_stat: Add comments
tools: kvm_stat: Introduce pid monitoring
KVM: Create debugfs dir and stat files for each VM
MAINTAINERS: Add kvm tools
tools: kvm_stat: Powerpc related fixes
tools: Add kvm_stat man page
tools: Add kvm_stat vm monitor script
kvm:vmx: more complete state update on APICv on/off
KVM: SVM: Add more SVM_EXIT_REASONS
KVM: Unify traced vector format
svm: bitwise vs logical op typo
KVM: arm/arm64: vgic-new: Synchronize changes to active state
KVM: arm/arm64: vgic-new: enable build
KVM: arm/arm64: vgic-new: implement mapped IRQ handling
KVM: arm/arm64: vgic-new: Wire up irqfd injection
KVM: arm/arm64: vgic-new: Add vgic_v2/v3_enable
KVM: arm/arm64: vgic-new: vgic_init: implement map_resources
KVM: arm/arm64: vgic-new: vgic_init: implement vgic_init
KVM: arm/arm64: vgic-new: vgic_init: implement vgic_create
KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init
...
Really sorry about this late pull request. It looks like at the time I
sent my pull request for v4.7 there was some conflict or other issue
which caused my script to stop merging the ASoC branches at some point
after the HDMI changes.
It's all specific driver updates, including:
- New drivers for MAX98371 and TAS5720.
- SPI support for TLV320AIC32x4.
- TDM support for STI Uniperf IPs.
This code should all have been in -next prior to the merge window apart
from some fixes, it dropped out on the 18th.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXSEQkAAoJECTWi3JdVIfQsPoH/jDJ/Eo95aq2UWBqjNbGCahZ
GIiV6TsSnfbg+UIiEvnJUt0ABa0NYKbyOGUny+NC/gLUt4n3FvzGSGGXs/pkuR9R
kF07daPDyQp2uz2GFjN45JLZJV/P1ofkJTsjYnqqqup15XW7EA0F5KbTiOhd+VIc
GlgUHZGMpxVYCc9bS59ZW2V/X1Epzuh8cKD3+I1vyYo6UZbsKMPzhS3aksCM5cMO
h3uifB4hgS26x+0Yrinj0capUalmS+N7isMtL3r/c0Z8DGRiHZOZuMnfoGavNek+
qR/SbOpj2Lj6J9Xut63xtoLAo/JNj7Fk7xYxkOxlpO8+Jqk9KgEKEM/Va3xLJ+M=
=yEWH
-----END PGP SIGNATURE-----
Merge tag 'asoc-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v4.7 part 2
Really sorry about this late pull request. It looks like at the time I
sent my pull request for v4.7 there was some conflict or other issue
which caused my script to stop merging the ASoC branches at some point
after the HDMI changes.
It's all specific driver updates, including:
- New drivers for MAX98371 and TAS5720.
- SPI support for TLV320AIC32x4.
- TDM support for STI Uniperf IPs.
This code should all have been in -next prior to the merge window apart
from some fixes, it dropped out on the 18th.
I see the main drm pull got merged, here's the first batch of fixes for
4.7 already. Fixes all around, a large portion cc: stable stuff.
[airlied: the DP++ stuff is a regression fix].
* tag 'drm-intel-next-fixes-2016-05-25' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Stop automatically retiring requests after a GPU hang
drm/i915: Unify intel_ring_begin()
drm/i915: Ignore stale wm register values on resume on ilk-bdw (v2)
drm/i915/psr: Try to program link training times correctly
drm/i915/bxt: Adjusting the error in horizontal timings retrieval
drm/i915: Don't leave old junk in ilk active watermarks on readout
drm/i915: s/DPPL/DPLL/ for SKL DPLLs
drm/i915: Fix gen8 semaphores id for legacy mode
drm/i915: Set crtc_state->lane_count for HDMI
drm/i915/BXT: Retrieving the horizontal timing for DSI
drm/i915: Protect gen7 irq_seqno_barrier with uncore lock
drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms
drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT
drm/i915: Enable/disable TMDS output buffers in DP++ adaptor as needed
drm/i915: Respect DP++ adaptor TMDS clock limit
drm: Add helper for DP++ adaptors
Pull kbuild updates from Michal Marek:
- new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and
unexports symbols which are not used in the current config [Nicolas
Pitre]
- several kbuild rule cleanups [Masahiro Yamada]
- warning option adjustments for gcov etc [Arnd Bergmann]
- a few more small fixes
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits)
kbuild: move -Wunused-const-variable to W=1 warning level
kbuild: fix if_change and friends to consider argument order
kbuild: fix adjust_autoksyms.sh for modules that need only one symbol
kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line
gcov: disable -Wmaybe-uninitialized warning
gcov: disable tree-loop-im to reduce stack usage
gcov: disable for COMPILE_TEST
Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
Kbuild: change CC_OPTIMIZE_FOR_SIZE definition
kbuild: forbid kernel directory to contain spaces and colons
kbuild: adjust ksym_dep_filter for some cmd_* renames
kbuild: Fix dependencies for final vmlinux link
kbuild: better abstract vmlinux sequential prerequisites
kbuild: fix call to adjust_autoksyms.sh when output directory specified
kbuild: Get rid of KBUILD_STR
kbuild: rename cmd_as_s_S to cmd_cpp_s_S
kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
kbuild: drop redundant "PHONY += FORCE"
kbuild: delete unnecessary "@:"
kbuild: mark help target as PHONY
...
- We use a bit in an exceptional radix tree entry as a lock bit and use it
similarly to how page lock is used for normal faults. This fixes races
between hole instantiation and read faults of the same index.
- Filesystem DAX PMD faults are disabled, and will be re-enabled when PMD
locking is implemented.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXRKwLAAoJEJ/BjXdf9fLB+BkP/3HBm05KlAKDklvnBIPFDMUK
hA7g2K6vuvaEDZXZQ1ioc1Ajf1sCpVip7shXJsojZqwWmRz0/4nneF7ytluW9AjS
dBX+0qCgKGH1fnwyGFF+MN7fuj7kGrSDz34lG0OObRN6/oKiVNb2svXiYKkT6J6C
AgsWlWRUpMy9jrn1u/FduMjDhk92Z3ojarexuicr0i8NUlBClCIrdCEmUMi4orSB
DuiIjestLOc7+mERBUwrXkzoh9v8Z0FpIgnDLWwpeEkAvJwWkGe5eXrBJwF+hEbi
RYfTrOYc7bBQLo22LRb8pdighjrx3OW9EpNCfEmLDOjM3cYBbMK/d2i/ww52H6IK
Mw6iS5rXdGgJtQIGL8N96HLFk+cDyZ8J8xNUCwbYYBJqgpMzxzVkL3vTm72tyFnl
InWhih+miCMbBPytQSRd6+1wZG2piJTv6SsFTd5K1OaiRmJhBJZG47t2QTBRBu7Y
5A4FGPtlraV+iDJvD6VLO1Tp8twxdLluOJ2BwdGeiKXiGh6LP+FGGFF3aFa5N4Ro
xSslCTX7Q1G66zXQwD4+IMWLwS1FDNymPkUSsF6RQo6qfAnl9SrmYTc4xJ4QXy92
sUdrWEz2OBTfxKNqbGyc/KrXKZT3RnEkJNft8snB2h6WTCdOPaNYs/yETUwiwkSc
CXpuQFrxm69QYwNsqVu1
=Pkd0
-----END PGP SIGNATURE-----
Merge tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull DAX locking updates from Ross Zwisler:
"Filesystem DAX locking for 4.7
- We use a bit in an exceptional radix tree entry as a lock bit and
use it similarly to how page lock is used for normal faults. This
fixes races between hole instantiation and read faults of the same
index.
- Filesystem DAX PMD faults are disabled, and will be re-enabled when
PMD locking is implemented"
* tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: Remove i_mmap_lock protection
dax: Use radix tree entry lock to protect cow faults
dax: New fault locking
dax: Allow DAX code to replace exceptional entries
dax: Define DAX lock bit for radix tree exceptional entry
dax: Make huge page handling depend of CONFIG_BROKEN
dax: Fix condition for filling of PMD holes
- Until now, dax has been disabled if media errors were found on
any device. This enables the use of DAX in the presence of these
errors by making all sector-aligned zeroing go through the driver.
- The driver (already) has the ability to clear errors on writes that
are sent through the block layer using 'DSMs' defined in ACPI 6.1.
Other misc changes:
- When mounting DAX filesystems, check to make sure the partition
is page aligned. This is a requirement for DAX, and previously, we
allowed such unaligned mounts to succeed, but subsequent reads/writes
would fail.
- Misc/cleanup fixes from Jan that remove unused code from DAX related to
zeroing, writeback, and some size checks.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXQ4GKAAoJEHr6Yb6juE3/zowP/iclIhgXXXMQJRUHJlePMXC8
15sGZ32JS1ak9g7vrsmNVEDNynfNtiMYdBxtUyRuj6xqgwdZvFk3F55KOCPtaeA1
+yADkgeRkTAcwzmHw9WQVEzBCqyzSisdrwtEfH817qdq9FJdH66x2Kos6i+HeAVr
5Q/e4gs7lKrjf384/QBl+wxNZOndJaQAPd2VRHQqx2A9F33v0ljdwRaUG1r4fjK2
dtmhcZCqdQyuAGXW3piTnZc5ZFc3DPqO4FkEfqkEK3lFOflK0fd8wMsAZRp/Jd0j
GJsgnVSWSqG0Dz476djlG0w8t2p5Jv1g9cKChV+ZZEdFLKWHCOUFqXNj8uI8I4k5
cOEKCHyJ3IwfSHhNQqktEWrQN4T8ZXhWtuc9GuV4UZYuqJqHci6EdR/YsWsJjV+L
lm/qvK4ipDS1pivxOy8KX/iN0z7Io8J9GXpStDx3g8iWjLlh4YYlbJLWeeRepo/z
aPlV/QAKcHiGY6jzLExrZIyCWkzwo6O+0p1Kxerv9/7K/32HWbOodZ+tC8eD+N25
pV69nCGf+u50T2TtIx1+iann4NC1r7zg5yqnT9AgpyZpiwR5joCDzI5sXW+D0rcS
vPtfM84Ccdeq/e6mvfIpZgR0/npQapKnrmUest0J7P2BFPHiFPji1KzZ7M+1aFOo
9R6JdrAj0Sc+FBa+cGzH
=v6Of
-----END PGP SIGNATURE-----
Merge tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull misc DAX updates from Vishal Verma:
"DAX error handling for 4.7
- Until now, dax has been disabled if media errors were found on any
device. This enables the use of DAX in the presence of these
errors by making all sector-aligned zeroing go through the driver.
- The driver (already) has the ability to clear errors on writes that
are sent through the block layer using 'DSMs' defined in ACPI 6.1.
Other misc changes:
- When mounting DAX filesystems, check to make sure the partition is
page aligned. This is a requirement for DAX, and previously, we
allowed such unaligned mounts to succeed, but subsequent
reads/writes would fail.
- Misc/cleanup fixes from Jan that remove unused code from DAX
related to zeroing, writeback, and some size checks"
* tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: fix a comment in dax_zero_page_range and dax_truncate_page
dax: for truncate/hole-punch, do zeroing through the driver if possible
dax: export a low-level __dax_zero_page_range helper
dax: use sb_issue_zerout instead of calling dax_clear_sectors
dax: enable dax in the presence of known media errors (badblocks)
dax: fallback from pmd to pte on error
block: Update blkdev_dax_capable() for consistency
xfs: Add alignment check for DAX mount
ext2: Add alignment check for DAX mount
ext4: Add alignment check for DAX mount
block: Add bdev_dax_supported() for dax mount checks
block: Add vfs_msg() interface
dax: Remove redundant inode size checks
dax: Remove pointless writeback from dax_do_io()
dax: Remove zeroing from dax_io()
dax: Remove dead zeroing code from fault handlers
ext2: Avoid DAX zeroing to corrupt data
ext2: Fix block zeroing in ext2_get_blocks() for DAX
dax: Remove complete_unwritten argument
DAX: move RADIX_DAX_ definitions to dax.c
mmput_async is currently used only from the oom_reaper which is defined
only for CONFIG_MMU. We can save work_struct in mm_struct for
!CONFIG_MMU.
[akpm@linux-foundation.org: fix typo, per Minchan]
Link: http://lkml.kernel.org/r/20160520061658.GB19172@dhcp22.suse.cz
Reported-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lockless_dereference() is supposed to take pointer not integer.
Link: http://lkml.kernel.org/r/20160521201448.GA7429@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull Ceph updates from Sage Weil:
"This changeset has a few main parts:
- Ilya has finished a huge refactoring effort to sync up the
client-side logic in libceph with the user-space client code, which
has evolved significantly over the last couple years, with lots of
additional behaviors (e.g., how requests are handled when cluster
is full and transitions from full to non-full).
This structure of the code is more closely aligned with userspace
now such that it will be much easier to maintain going forward when
behavior changes take place. There are some locking improvements
bundled in as well.
- Zheng adds multi-filesystem support (multiple namespaces within the
same Ceph cluster)
- Zheng has changed the readdir offsets and directory enumeration so
that dentry offsets are hash-based and therefore stable across
directory fragmentation events on the MDS.
- Zheng has a smorgasbord of bug fixes across fs/ceph"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
ceph: fix wake_up_session_cb()
ceph: don't use truncate_pagecache() to invalidate read cache
ceph: SetPageError() for writeback pages if writepages fails
ceph: handle interrupted ceph_writepage()
ceph: make ceph_update_writeable_page() uninterruptible
libceph: make ceph_osdc_wait_request() uninterruptible
ceph: handle -EAGAIN returned by ceph_update_writeable_page()
ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
ceph: block non-fatal signals for fault/page_mkwrite
ceph: make logical calculation functions return bool
ceph: tolerate bad i_size for symlink inode
ceph: improve fragtree change detection
ceph: keep leaf frag when updating fragtree
ceph: fix dir_auth check in ceph_fill_dirfrag()
ceph: don't assume frag tree splits in mds reply are sorted
ceph: fix inode reference leak
ceph: using hash value to compose dentry offset
ceph: don't forbid marking directory complete after forward seek
ceph: record 'offset' for each entry of readdir result
ceph: define 'end/complete' in readdir reply as bit flags
...
This commit fixes a simple typo s/mvmem/nvmem in the
example.
Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Highlights include:
Features:
- Add support for the NFS v4.2 COPY operation
- Add support for NFS/RDMA over IPv6
Bugfixes and cleanups:
- Avoid race that crashes nfs_init_commit()
- Fix oops in callback path
- Fix LOCK/OPEN race when unlinking an open file
- Choose correct stateids when using delegations in setattr, read and write
- Don't send empty SETATTR after OPEN_CREATE
- xprtrdma: Prevent server from writing a reply into memory client has released
- xprtrdma: Support using Read list and Reply chunk in one RPC call
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXRu76AAoJENfLVL+wpUDrDVoQAKPKv1tEVJMRUQA3UVoKoixd
KjmmZMjl6GfpISwTZl+a8W549jyGuYH7Gl8vSbMaE9/FI+kJW6XZQniTYfFqY8/a
LbMSdNx1+yURisbkyO0vPqqwKw9r6UmsfGeUT8SpS3ff61yp4Oj436ra2qcPJsZ3
cWl/lHItzX7oKFAWmr0Nmq2X8ac/8+NFyK29+V/QGfwtp3qAPbpA8XM5HrHw3rA2
uk5uNSr3hwqz7P3+Hi7ZoO2m4nQTAbQnEunfYpxlOwz4IaM7qcGnntT6Jhwq1pGE
/1YasG7bHeiWjhynmZZ4CWuMkogau2UJ/G68Cz7ehLhPNr8rH/ZFCJZ+XX0e0CgI
1d+AwxZvgszIQVBY3S7sg8ezVSCPBXRFJ8rtzggGscqC53aP7L+rLfUFH+OKrhMg
6n7RQiq4EmGDJGviB/R2HixI9CpdOf2puNhDKSJmPOqiSS7UuHMw8QCq++vdru+1
GLGunGyO7D70yTV92KtsdzJlFlnfa/g+FIJrmaMpL3HH1h0stTctWX5xlTYmqEL3
z3aUuT8RySk2t1FTabSj6KRWqE/krK5BMZbX91kpF27WL4c/olXFaZPqBDsj0q4u
2rm1fIrc8RxLXctJan9ro092s/e9dup/1JxV5XWMq/EGS1ezvf+0XkCOtURaAWp3
2aPHlx7M8iuq2SouL6f7
=QMmY
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"Highlights include:
Features:
- Add support for the NFS v4.2 COPY operation
- Add support for NFS/RDMA over IPv6
Bugfixes and cleanups:
- Avoid race that crashes nfs_init_commit()
- Fix oops in callback path
- Fix LOCK/OPEN race when unlinking an open file
- Choose correct stateids when using delegations in setattr, read and
write
- Don't send empty SETATTR after OPEN_CREATE
- xprtrdma: Prevent server from writing a reply into memory client
has released
- xprtrdma: Support using Read list and Reply chunk in one RPC call"
* tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (61 commits)
pnfs: pnfs_update_layout needs to consider if strict iomode checking is on
nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled
nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO
nfs: avoid race that crashes nfs_init_commit
NFS: checking for NULL instead of IS_ERR() in nfs_commit_file()
pnfs: make pnfs_layout_process more robust
pnfs: rework LAYOUTGET retry handling
pnfs: lift retry logic from send_layoutget to pnfs_update_layout
pnfs: fix bad error handling in send_layoutget
flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds
flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED
pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args
pnfs: keep track of the return sequence number in pnfs_layout_hdr
pnfs: record sequence in pnfs_layout_segment when it's created
pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set
pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes
pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io
pNFS/flexfile: Fix erroneous fall back to read/write through the MDS
NFS: Reclaim writes via writepage are opportunistic
NFSv4: Use the right stateid for delegations in setattr, read and write
...
In practice, each RDMA device has a unique set of counters that the
hardware implements. Having a central set of counters that they must
all adhere to is limiting and causes many useful counters to not be
available.
Therefore we create a dynamic counter registration infrastructure.
The driver must implement a stats structure allocation routine, in
which the driver must place the directory name it wants, a list of
names for all of the counters, an array of u64 counters themselves,
plus a few generic configuration options.
We then implement a core routine to create a sysfs file for each
of the named stats elements, and a core routine to retrieve the
stats when any of the sysfs attribute files are read.
To avoid excessive beating on the stats generation routine in the
drivers, the core code also caches the stats for a short period of
time so that someone attempting to read all of the stats in a
given device's directory will not result in a stats generation
call per file read.
Future work will attempt to standardize just the shared stats
elements, and possibly add a method to get the stats via netlink
in addition to sysfs.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[ Add caching, make structure names more informative, add i40iw support,
other significant rewrites from the original patch ]
- Prevent re-tuning while serving requests for RPMB partitions
- Extend timeout for long read time quirk to support more eMMCs
MMC host:
- sdhci-acpi: Ensure connected devices are powered when probing
- sdhci-pci|acpi: Remove unreliable MMC_CAP_BUS_WIDTH_TEST for Intel HWs
- dw_mmc: Correct the assigning of max_blk_size
- dw_mmc-rockchip: Allow RPMB partitions to be created
- dw_mmc-rockchip: Set the drive phase properly
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXRq1wAAoJEP4mhCVzWIwpKa0QAId0gcmR5EF45FHYdNP50TgJ
EgJa8xFZiBXC/RZbcDYBOfNVcQ+RqvCs4K5PxJcpleARA7MtCcbwWpKynHG/2cdR
xbZbzuuMd4jKS7AO6kC1Rww370XTRhtXAlkO1w/KWvnC79wV86EgIxxl8adBycw7
68JFmjrcDTyBq63G0jHZlUzA8mVxl+k9Jb3lW1+FnsGMCV7R4dOK58rwp3z7ITwl
r4sSQGN6STwlav3tLlurNi/7Wd9VfHjbuuwSR9oIc11hRzBNNL+I2Runk/4lfD6q
52S4OYoqPiZrMheYonQ6zVAjm9MVWLAMId6AbJXlx8m42qbq2q2o4gv4csGjKiNr
e8ZaHRw8FYRL8A4UewYcewPxJfXkJSV81KNKQYS1jn94zfPspffln73VM1Sbqmu8
tg89C1CYShAS0IcIyKk3XCBocq6JARIF+M7mE3FTZd3yfBd6hBbjUbv9ufKnbsxJ
lSG9jISCYaqsoc4SLK/hBEypN68otLKDm7Jl8VcPj351f5NDlj0hv56scqt3HbAk
hcPUYtHc5+6vsQ1mIQRC+NmH9qacChq8JfLHej7HB+hSCJZUp35QB0Zr6oTmv4G4
9mUY/J6++9fGQtcWWAyNcVOuZUjGSxiSDm6wgtZMJZDYUFECXDdPEYRatBIMa2Ul
4bbcKYEbqdC8Cn2q5FZn
=zHa6
-----END PGP SIGNATURE-----
Merge tag 'mmc-v4.7-rc1' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"Here are some mmc fixes intended for v4.7 rc1. They are based on a
commit earlier in the merge window and have been tested in linux-next
for a while.
MMC core:
- Prevent re-tuning while serving requests for RPMB partitions
- Extend timeout for long read time quirk to support more eMMCs
MMC host:
- sdhci-acpi: Ensure connected devices are powered when probing
- sdhci-pci|acpi: Remove unreliable MMC_CAP_BUS_WIDTH_TEST for Intel HWs
- dw_mmc: Correct the assigning of max_blk_size
- dw_mmc-rockchip: Allow RPMB partitions to be created
- dw_mmc-rockchip: Set the drive phase properly"
* tag 'mmc-v4.7-rc1' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: sdhci-acpi: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers
mmc: sdhci-pci: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers
mmc: longer timeout for long read time quirk
mmc: dw_mmc: rockchip: Set the drive phase properly
mmc: dw_mmc: fix the wrong max_blk_size
mmc: dw_mmc-rockchip: add MMC_CAP_CMD23 capabilities
mmc: sdhci-acpi: Ensure connected devices are powered when probing
ACPI / PM: Export acpi_device_fix_up_power()
mmc: block: Pause re-tuning while switched to the RPMB partition
mmc: block: Always switch back to main area after RPMB access
mmc: core: Add a facility to "pause" re-tuning
Pull thermal management updates from Zhang Rui:
- Introduce generic ADC thermal driver, based on OF thermal (Laxman
Dewangan)
- Introduce new thermal driver for Tango chips (Marc Gonzalez)
- Rockchip driver support for RK3399, RK3366, and some fixes (Caesar
Wang, Elaine Zhang and Shawn Lin)
- Add CPU power cooling model to Mediatek thermal driver (Dawei Chien)
- Wider usage of dev_thermal_zone_of_sensor_register (Eduardo Valentin)
- TI thermal driver gained a new maintainer (Keerthy).
- Enabled powerclamp driver by checking CPU feature and package cstate
counter instead of CPU whitelist (Jacob Pan)
- Various fixes on thermal governor, OF thermal, Tegra, and RCAR
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (50 commits)
thermal: tango: initialize TEMPSI_CFG
thermal: rockchip: use the usleep_range instead of udelay
thermal: rockchip: add the notes for better reading
thermal: rockchip: Support RK3366 SoCs in the thermal driver
thermal: rockchip: handle the power sequence for tsadc controller
thermal: rockchip: update the tsadc table for rk3399
thermal: rockchip: fixes the code_to_temp for tsadc driver
thermal: rockchip: disable thermal->clk in err case
thermal: tegra: add Tegra132 specific SOC_THERM driver
thermal: fix ptr_ret.cocci warnings
thermal: mediatek: Add cpu dynamic power cooling model.
thermal: generic-adc: Add ADC based thermal sensor driver
thermal: generic-adc: Add DT binding for ADC based thermal sensor
thermal: tegra: fix static checker warning
thermal: tegra: mark PM functions __maybe_unused
thermal: add temperature sensor support for tango SoC
thermal: hisilicon: fix IRQ imbalance enabling
thermal: hisilicon: support to use any sensor
thermal: rcar: Remove binding docs for r8a7794
thermal: tegra: add PM support
...
rdmavt allows the driver to specify the size of the ack queue, but
only uses it for the modify QP limit testing for setting the atomic
limit value.
The driver dependent size is now used to size the s_ack_queue ring
dynamicially.
Since the driver knows its size, the driver will use its define
for any ring size dependent code.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This matches the ib_qp_attr size and
avoids a extremely large value when the lower level
driver registers.
As part of the patch, the u8 ordinals are moved to the
end of the struct to reduce pahole noted excesses.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Remove the write() handler for user space commands now that ioctl
handling is available. User apps will need to change to use ioctl from
this point forward.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
IOCTL is more suited to what user space commands need to do than the
write() interface. Add IOCTL definitions for all existing write commands
and the handling for those. The write() interface will be removed in a
follow on patch.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The HFI1_CMD_SDMA_STATUS_UPD command was never implemented it has no
reason to live in the driver. Remove it.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Remove EPROM handling from the cdev which is used for user application
data traffic.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
hfi1 current exports a cdev that can be used to target all of the hfi's
in the system. However there is a problem with this approach in
that the devices could be on different subnets. This is a problem that
user space can figure out and explicitly tell the driver on which device
to create a context.
Remove the multi-purpose cdev leaving a dedicated cdev for each port.
Also remove the striping capability that is dependent upon the user
choosing the multi-purpose cdev. It is now up to user space to determine
how to stripe contexts.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull scheduler fixes from Ingo Molnar:
"Two fixes: one for a lost wakeup, the other to fix the compiler
optimizing out preempt operations on ARM64 (and possibly other non-x86
architectures)"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix remote wakeups
sched/preempt: Fix preempt_count manipulations
Pull perf updates from Ingo Molnar:
"Mostly tooling and PMU driver fixes, but also a number of late updates
such as the reworking of the call-chain size limiting logic to make
call-graph recording more robust, plus tooling side changes for the
new 'backwards ring-buffer' extension to the perf ring-buffer"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
perf record: Read from backward ring buffer
perf record: Rename variable to make code clear
perf record: Prevent reading invalid data in record__mmap_read
perf evlist: Add API to pause/resume
perf trace: Use the ptr->name beautifier as default for "filename" args
perf trace: Use the fd->name beautifier as default for "fd" args
perf report: Add srcline_from/to branch sort keys
perf evsel: Record fd into perf_mmap
perf evsel: Add overwrite attribute and check write_backward
perf tools: Set buildid dir under symfs when --symfs is provided
perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced
perf annotate: Sort list of recognised instructions
perf annotate: Fix identification of ARM blt and bls instructions
perf tools: Fix usage of max_stack sysctl
perf callchain: Stop validating callchains by the max_stack sysctl
perf trace: Fix exit_group() formatting
perf top: Use machine->kptr_restrict_warned
perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1
perf machine: Do not bail out if not managing to read ref reloc symbol
perf/x86/intel/p4: Trival indentation fix, remove space
...
This patch makes serverl logical caculation functions return bool to
improve readability due to these particular functions only using 0/1
as their return value.
No functional change.
Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
If MDS sorts dentries in dirfrag in hash order, we use hash value to
compose dentry offset. dentry offset is:
(0xff << 52) | ((24 bits hash) << 28) |
(the nth entry hash hash collision)
This offset is stable across directory fragmentation. This alos means
there is no need to reset readdir offset if directory get fragmented
in the middle of readdir.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Set a flag in readdir request, which indicates that client interprets
'end/complete' as bit flags. So that mds can reply additional flags in
readdir reply.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
This adds the "map check" infrastructure for sending osdmap version
checks on CALC_TARGET_POOL_DNE and completing in-flight requests with
-ENOENT if the target pool doesn't exist or has just been deleted.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
For map check, we are going to need to send CEPH_MSG_MON_GET_VERSION
messages asynchronously and get a callback on completion. Refactor MON
client to allow firing off generic requests asynchronously and add an
async variant of ceph_monc_get_version(). ceph_monc_do_statfs() is
switched over and remains sync.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Implement ceph_osdc_watch_check() to be able to check on status of
watch. Note that the time it takes for a watch/notify event to get
delivered through the notify_wq is taken into account.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Implement ceph_osdc_notify() for sending notifies.
Due to the fact that the current messenger can't do read-in into
pagelists (it can only do write-out from them), I had to go with a page
vector for a NOTIFY_COMPLETE payload, for now.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This adds support and switches rbd to a new, more reliable version of
watch/notify protocol. As with the OSD client update, this is mostly
about getting the right structures linked into the right places so that
reconnects are properly sent when needed. watch/notify v2 also
requires sending regular pings to the OSDs - send_linger_ping().
A major change from the old watch/notify implementation is the
introduction of ceph_osd_linger_request - linger requests no longer
piggy back on ceph_osd_request. ceph_osd_event has been merged into
ceph_osd_linger_request.
All the details are now hidden within libceph, the interface consists
of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
the lifetime management simple.
ceph_osdc_notify_ack() accepts an optional data payload, which is
relayed back to the notifier.
Portions of this patch are loosely based on work by Douglas Fuller
<dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This is a major sync up, up to ~Jewel. The highlights are:
- per-session request trees (vs a global per-client tree)
- per-session locking (vs a global per-client rwlock)
- homeless OSD session
- no ad-hoc global per-client lists
- support for pool quotas
- foundation for watch/notify v2 support
- foundation for map check (pool deletion detection) support
The switchover is incomplete: lingering requests can be setup and
teared down but aren't ever reestablished. This functionality is
restored with the introduction of the new lingering infrastructure
(ceph_osd_linger_request, linger_work, etc) in a later commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
OSD client is getting moved from the big per-client lock to a set of
per-session locks. The big rwlock would only be held for read most of
the time, so a global osdc->osd_lru needs additional protection.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Separate osdmap handling from decoding and iterating over a bag of maps
in a fresh MOSDMap message. This sets up the scene for the updated OSD
client.
Of particular importance here is the addition of pi->was_full, which
can be used to answer "did this pool go full -> not-full in this map?".
This is the key bit for supporting pool quotas.
We won't be able to downgrade map_sem for much longer, so drop
downgrade_write().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This leads to a simpler osdmap handling code, particularly when dealing
with pi->was_full, which is introduced in a later commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If you specify ACK | ONDISK and set ->r_unsafe_callback, both
->r_callback and ->r_unsafe_callback(true) are called on ack. This is
very confusing. Redo this so that only one of them is called:
->r_unsafe_callback(true), on ack
->r_unsafe_callback(false), on commit
or
->r_callback, on ack|commit
Decode everything in decode_MOSDOpReply() to reduce clutter.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
finish_read(), its only user, uses it to get to hdr.data_len, which is
what ->r_result is set to on success. This gains us the ability to
safely call callbacks from contexts other than reply, e.g. map check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The crux of this is getting rid of ceph_osdc_build_request(), so that
MOSDOp can be encoded not before but after calc_target() calculates the
actual target. Encoding now happens within ceph_osdc_start_request().
Also nuked is the accompanying bunch of pointers into the encoded
buffer that was used to update fields on each send - instead, the
entire front is re-encoded. If we want to support target->name_len !=
base->name_len in the future, there is no other way, because oid is
surrounded by other fields in the encoded buffer.
Encoding OSD ops and adding data items to the request message were
mixed together in osd_req_encode_op(). While we want to re-encode OSD
ops, we don't want to add duplicate data items to the message when
resending, so all call to ceph_osdc_msg_data_add() are factored out
into a new setup_request_data().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Replace __calc_request_pg() and most of __map_request() with
calc_target() and start using req->r_t.
ceph_osdc_build_request() however still encodes base_oid, because it's
called before calc_target() is and target_oid is empty at that point in
time; a printf in osdc_show() also shows base_oid. This is fixed in
"libceph: switch to calc_target(), part 2".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Introduce ceph_osd_request_target, containing all mapping-related
fields of ceph_osd_request and calc_target() for calculating mappings
and populating it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add and decode pi->min_size and pi->last_force_request_resend. These
are going to be used by calc_target().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
calc_target() code is going to need to know how to compare PGs. Take
lhs and rhs pgid by const * while at it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to
emphasise that it returns acting primary.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Knowning just acting set isn't enough, we need to be able to record up
set as well to detect interval changes. This means returning (up[],
up_len, up_primary, acting[], acting_len, acting_primary) and passing
it around. Introduce and switch to ceph_osds to help with that.
Rename ceph_calc_pg_acting() to ceph_pg_to_up_acting_osds() and return
both up and acting sets from it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg(). Emphasise
that returned is raw PG and return -ENOENT instead of -EIO if the pool
doesn't exist.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
eversion_t is version+epoch in userspace and is encoded in that order.
ceph_eversion is defined as epoch+version in rados.h, yet we memcpy it
in __send_request(). Reoder ceph_eversion fields.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Given
struct foo {
u64 id;
struct rb_node bar_node;
};
generate insert_bar(), erase_bar() and lookup_bar() functions with
DEFINE_RB_FUNCS(bar, struct foo, id, bar_node)
The key is assumed to be an integer (u64, int, etc), compared with
< and >. nodefld has to be initialized with RB_CLEAR_NODE().
Start using it for MDS, MON and OSD requests and OSD sessions.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Currently ceph_object_id can hold object names of up to 100
(CEPH_MAX_OID_NAME_LEN) characters. This is enough for all use cases,
expect one - long rbd image names:
- a format 1 header is named "<imgname>.rbd"
- an object that points to a format 2 header is named "rbd_id.<imgname>"
We operate on these potentially long-named objects during rbd map, and,
for format 1 images, during header refresh. (A format 2 header name is
a small system-generated string.)
Lift this 100 character limit by making ceph_object_id be able to point
to an externally-allocated string. Apart from being able to work with
almost arbitrarily-long named objects, this allows us to reduce the
size of ceph_object_id from >100 bytes to 64 bytes.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The size of ->r_request and ->r_reply messages depends on the size of
the object name (ceph_object_id), while the size of ceph_osd_request is
fixed. Move message allocation into a separate function that would
have to be called after ceph_object_id and ceph_object_locator (which
is also going to become variable in size with RADOS namespaces) have
been filled in:
req = ceph_osdc_alloc_request(...);
<fill in req->r_base_oid>
<fill in req->r_base_oloc>
ceph_osdc_alloc_messages(req);
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
New SA query function to return the ClassPortInfo struct from the SA.
If the SM supports FullMemberSendOnly mode for MCG's, it sets a
capability bit in the capability_mask2 field of the response.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Change struct ib_class_port_info to conform to IB Spec 1.3
That in order to get specific capability mask from ClassPortInfo mad.
>From the IB Spec, ClassPortInfo section:
"CapabilityMask2 Bits 0-26: Additional class-specific capabilities...
RespTimeValue the rest 5 bits"
The new struct now has one field for capabilitymask2 (previously was the
reserved field) and the resp_time field.
And it fixes up qib and srpt, use of the field repurposed to be used as
capabilitymask2:
IB/qib: Change pma_get_classportinfo
IB/srpt: Adjust the use of ib_class_port_info
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch enhances ethtool link mode bitmap to include
25G/50G/100G speed along with interface modes
Signed-off-by: Vidya Sagar Ravipati <vidya@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The updates this time around are almost all driver code:
- Further slow progress on the topology code.
- Substantial updates and improvements for the da7219, es8328, fsl-ssi
Intel and rcar drivers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXOao7AAoJECTWi3JdVIfQ3EQH/1Z4nukvcOeZgVN/4K9b27t2
LYSyPH4+7XiDsi24UAyxZWls625t+1XRtolS0yHYY+IMObkeH/T+StTirDG4C1Mv
0uw/lEs5XmkSPFMad2fDcVXhf+D6EsvuLZ24qLKhoi8TyePv6GRvYapitE4dAI7Z
bBwjT+f9r1qSMJvfCmqit8zDneDFMKd7oqPmBW6NpFri5/ksn1KUnd/zOGu2SlSd
R01Oa2VbRDGj8/Zzu5MORvgLLucxTqtAFYeF3T52M5oc33IBWvbha4fk/BDOswbz
H9S3vHyakmbZgXnnGMTp4qz0bxA76YaHzjtqgGUEMbigHTsB0PP5TtII3i5LkaY=
=Zsr1
-----END PGP SIGNATURE-----
Merge tag 'asoc-v4.7' into asoc-linus
ASoC: Updates for v4.7
The updates this time around are almost all driver code:
- Further slow progress on the topology code.
- Substantial updates and improvements for the da7219, es8328, fsl-ssi
Intel and rcar drivers.
# gpg: Signature made Mon 16 May 2016 12:08:43 BST using RSA key ID 5D5487D0
# gpg: Good signature from "Mark Brown <broonie@sirena.org.uk>"
# gpg: aka "Mark Brown <broonie@debian.org>"
# gpg: aka "Mark Brown <broonie@kernel.org>"
# gpg: aka "Mark Brown <broonie@tardis.ed.ac.uk>"
# gpg: aka "Mark Brown <broonie@linaro.org>"
# gpg: aka "Mark Brown <Mark.Brown@linaro.org>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 3F25 68AA C269 98F9 E813 A1C5 C3F4 36CA 30F5 D8EB
# Subkey fingerprint: ADE6 68AA 6757 18B5 9FE2 9FEA 24D6 8B72 5D54 87D0
This set of changes introduces an atomic API to the PWM subsystem. This
is influenced by the DRM atomic API that was introduced a while back,
though it is obviously a lot simpler. The fundamental idea remains the
same, though: drivers provide a single callback to implement the atomic
configuration of a PWM channel.
As a side-effect the PWM subsystem gains the ability for initial state
retrieval, so that the logical state mirrors that of the hardware. Many
use-cases don't care about this, but for others it is essential.
These new features require changes in all users, which these patches
take care of. The core is transitioned to use the atomic callback if
available and provides a fallback mechanism for other drivers.
Changes to transition users and drivers to the atomic API are postponed
to v4.8.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXRcVWAAoJEN0jrNd/PrOhO3EP/RDuuco1fROml1ElCjcnWWfv
3dPyKEJhZktMmRNd/V0zMUJiOwr77wlbX4oQ5HajMHNYFQ65jfihbbylhSDepnxg
mjKV/yo18rzYZt9fv8huwvlwMOlLrJ9wQn4Gkbr5tzke6nITp52DTNH5y/anPQIk
B7neA1TerodAbE9FWjYuBZIltkmYZDqdm//RCHXVyYym8VuotE+jf+nrMXI78FoL
lgG64z/2OaGI+NZJQcpWftuz9nnenpa3sSLrvpitWEb/dAsXroMW/f08uVuOW87v
0xk7N7zmEkef7izVOWiPOK/MxIdc8hI4A5JftzMJ7nbgJvwG78dJiOxgFhrJYx0z
7zrYfjvvzjW0dpjZUvO37T/V5Rfxrk9sM7qUHJmN0+1oEkkCo1/c75JWTU2AmT4l
qkJdOGhgv7LumIiwbEyxc/5Jyh1akKOUX2svO0+0dptLRX2UpN3yeKIYinG1dAuT
86+/uuM6CL5gc+jVZ3GLNWfzHUu2RFVX0r0pzywq53pK5gMEs5WyxoIb5mHb8liA
sHsrZ3wbGGn95yZo8CwkzXIUsUH7qKYK+UVWA6OVBoTq4AOBZtII1AqvUttl25qL
xuKpj70xaBhK7VGqzDYQ68lqBaRySh+yzL/QsmnPEyx59mW81ytMrsn1Kmnuae2l
bzUsnWrpHc6530fRggTD
=sxT9
-----END PGP SIGNATURE-----
Merge tag 'pwm/for-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"This set of changes introduces an atomic API to the PWM subsystem.
This is influenced by the DRM atomic API that was introduced a while
back, though it is obviously a lot simpler. The fundamental idea
remains the same, though: drivers provide a single callback to
implement the atomic configuration of a PWM channel.
As a side-effect the PWM subsystem gains the ability for initial state
retrieval, so that the logical state mirrors that of the hardware.
Many use-cases don't care about this, but for others it is essential.
These new features require changes in all users, which these patches
take care of. The core is transitioned to use the atomic callback if
available and provides a fallback mechanism for other drivers.
Changes to transition users and drivers to the atomic API are
postponed to v4.8"
* tag 'pwm/for-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (30 commits)
pwm: Add information about polarity, duty cycle and period to debugfs
pwm: Switch to the atomic API
pwm: Update documentation
pwm: Add core infrastructure to allow atomic updates
pwm: Add hardware readout infrastructure
pwm: Move the enabled/disabled info into pwm_state
pwm: Introduce the pwm_state concept
pwm: Keep PWM state in sync with hardware state
ARM: Explicitly apply PWM config extracted from pwm_args
drm: i915: Explicitly apply PWM config extracted from pwm_args
input: misc: pwm-beeper: Explicitly apply PWM config extracted from pwm_args
input: misc: max8997: Explicitly apply PWM config extracted from pwm_args
backlight: lm3630a: explicitly apply PWM config extracted from pwm_args
backlight: lp855x: Explicitly apply PWM config extracted from pwm_args
backlight: lp8788: Explicitly apply PWM config extracted from pwm_args
backlight: pwm_bl: Use pwm_get_args() where appropriate
fbdev: ssd1307fb: Use pwm_get_args() where appropriate
regulator: pwm: Use pwm_get_args() where appropriate
leds: pwm: Use pwm_get_args() where appropriate
input: misc: max77693: Use pwm_get_args() where appropriate
...
This patch adds a kvm debugfs subdirectory for each VM, which is named
after its pid and file descriptor. The directories contain the same
kind of files that are already in the kvm debugfs directory, but the
data exported through them is now VM specific.
This makes the debugfs kvm data a convenient alternative to the
tracepoints which already have per VM data. The debugfs data is easy
to read and low overhead.
CC: Dan Carpenter <dan.carpenter@oracle.com> [includes fixes by Dan Carpenter]
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Florian Weber reported:
> Under full load (unshare() in loop -> OOM conditions) we can
> get kernel panic:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
> [..]
> task: ffff88012dfa3840 ti: ffff88012dffc000 task.ti: ffff88012dffc000
> RIP: 0010:[<ffffffff81476c85>] [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
> RSP: 0000:ffff88012dfffd80 EFLAGS: 00010206
> RAX: 0000000000000008 RBX: ffffffff81add0c0 RCX: ffff88013fd80000
> [..]
> Call Trace:
> [<ffffffff81474d98>] nf_queue_nf_hook_drop+0x18/0x20
> [<ffffffff814738eb>] nf_unregister_net_hook+0xdb/0x150
> [<ffffffff8147398f>] netfilter_net_exit+0x2f/0x60
> [<ffffffff8141b088>] ops_exit_list.isra.4+0x38/0x60
> [<ffffffff8141b652>] setup_net+0xc2/0x120
> [<ffffffff8141bd09>] copy_net_ns+0x79/0x120
> [<ffffffff8106965b>] create_new_namespaces+0x11b/0x1e0
> [<ffffffff810698a7>] unshare_nsproxy_namespaces+0x57/0xa0
> [<ffffffff8104baa2>] SyS_unshare+0x1b2/0x340
> [<ffffffff81608276>] entry_SYSCALL_64_fastpath+0x1e/0xa8
> Code: 65 00 48 89 e5 41 56 41 55 41 54 53 83 e8 01 48 8b 97 70 12 00 00 48 98 49 89 f4 4c 8b 74 c2 18 4d 8d 6e 08 49 81 c6 88 00 00 00 <49> 8b 5d 00 48 85 db 74 1a 48 89 df 4c 89 e2 48 c7 c6 90 68 47
>
The simple fix for this requires a new pernet variable for struct
nf_queue that indicates when it is safe to use the dynamically
allocated nf_queue state.
As we need a variable anyway make nf_register_queue_handler and
nf_unregister_queue_handler pernet. This allows the existing logic of
when it is safe to use the state from the nfnetlink_queue module to be
reused with no changes except for making it per net.
The syncrhonize_rcu from nf_unregister_queue_handler is moved to a new
function nfnl_queue_net_exit_batch so that the worst case of having a
syncrhonize_rcu in the pernet exit path is not experienced in batch
mode.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit:
b5179ac70d ("sched/fair: Prepare to fix fairness problems on migration")
... introduced a bug: Mike Galbraith found that it introduced a
performance regression, while Paul E. McKenney reported lost
wakeups and bisected it to this commit.
The reason is that I mis-read ttwu_queue() such that I assumed any
wakeup that got a remote queue must have had the task migrated.
Since this is not so; we need to transfer this information between
queueing the wakeup and actually doing the wakeup. Use a new
task_struct::sched_flag for this, we already write to
sched_contributes_to_load in the wakeup path so this is a hot and
modified cacheline.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ben Segall <bsegall@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Fixes: b5179ac70d ("sched/fair: Prepare to fix fairness problems on migration")
Link: http://lkml.kernel.org/r/20160523091907.GD15728@worktop.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading")
broke probing of the imx-drm driver in the non-modular case because the
unset dev->of_node during probing of imx-ipuv3-crtc would cause the
component matching to fail. This patch patch instead matches against
an of_node pointer stored in platform data, allowing dev->of_node to
be left unset for the platform probed imx-ipuv3-crtc devices.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXQ/gUAAoJEFDCiBxwnmDrs8MQAIYNrsM+2K/INc0IyLrdVlOJ
BG61jJLPDyou4topFpIBsIdL75TGTbIPz70EZ9TeA6uYFXox/jvlBO6RJujphMjK
n3ECnPWCkhGt2FL/MxM/A1RfZHxJUyRKsEpX8/Qjum0+mOnzEc1mrlgOKpCg3suM
rRhoxR6PyNNJDrW+5+VOuCl2Nxp+zDB1URdTnEzVNhw9FPqzA6Jjqnnj2YYLuza+
qjnDfgfpVOV0b+LXrx9K5BVImelJE8lWu7kf9dJJ9RcOI6Ykiu78RSoqGJODANxI
4+bAQFUUDnj2C2pAkCezjAq53LodzxmhnKJsi4HQacp77ze5DyEql1iv4tNMmL9I
y9pC9Lqg/ZXbOhu2jJjmyebfn3q1xOIBDjQIgvnq00Y6Q/M7ffkZM3VvKnNjxEKh
qtRV8VSKDix5pBjrufSvnMaygLwxuUXA5zB1bAg4IzfsPA4ovDPV3NMh7QUcfnGZ
55nvQ1r3HZ4ll734d8Y7BcYCsqrNP+aYkzDDVoVJsKWgxIKz/PFeE/kpI/T7buMo
GOuX/AjOqcr7pOrElq72YRUKZXWj5ANGORcwXL/84fuyUW7YK468ErE2yviWqbmD
d/tVkmM5nyd9QVoEFhfxtPccKwFXfUM72yVK9Gl/hzVxvsl1xfPOcDiMmeB+a6r+
2XeeeVp8Lke27+Y2RzQT
=08BG
-----END PGP SIGNATURE-----
Merge tag 'imx-drm-fixes-2016-05-24' of git://git.pengutronix.de/git/pza/linux into drm-next
imx-drm probing fix
Commit 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading")
broke probing of the imx-drm driver in the non-modular case because the
unset dev->of_node during probing of imx-ipuv3-crtc would cause the
component matching to fail. This patch patch instead matches against
an of_node pointer stored in platform data, allowing dev->of_node to
be left unset for the platform probed imx-ipuv3-crtc devices.
* tag 'imx-drm-fixes-2016-05-24' of git://git.pengutronix.de/git/pza/linux:
drm/imx: Match imx-ipuv3-crtc components using device node in platform data
Policer was not dumping or updating timestamps
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I have only one patch for asm-generic in this release, this one is from
James Hogan and updates the generic system call table for renameat2
so we don't need to provide both renameat and renameat2 in newly
added architectures.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAV0SqqmCrR//JCVInAQLAWBAAwP7tijdUi2kmqyvYUV/r08n3G0JoHzfT
2bFfl0N0aCu7eglyraXdCjGLATrRt9t50CYtFHtg/6pNVB2kYGdouIMy6RVcDtGn
fGPE5trCZMLGpGQ29NaSlJplW/293X/BkoZ0ERHpEHhNfIBGZf/WeM0YvJI086sk
XZ0tByw5q+pNjBjbB59KFQ8iKRtwIBk4bnrnbSWCmLUMSF+zY54zxkbAqTqJGpA7
sQuidOZ8p2Daol2nMOAgLuFdVbqOuYOsW33LMDnjJD54VU3kxBlWrPFbaJs93ask
fkKVg2KAlDQFE0bAWzXtYaIPBPSHjC4GO/QbvUQJNgYMLQRjskx/GBYxMNDcPVrR
wGbyitCIhnwWn1ch/DIdpQ8bNgMI5/5zhrVBhLpiBQypy5L2BJOk3yPhWvwOhR2B
Bacaqpk0Ydz5VjSsR30iERnZ7hGv7XX8pAOk5Slu3wAyw7EmZ5g6OqfcUX9OBHLZ
xfc7IELlRr0EFY30gGPXwmqM5GeC9ibt4cY6uNVQklDCRqbU9I90bmSm5Rps83P4
AmjDM95v7IT7aVN+KSRW1PTijEt64Z0FNacf0Hps6d5NUYtk/pxxbWOzl8zKW0Z4
yNHaX+TcuyCfPcaZR0xddV8KXCnptEaDY2yrJAL5vWKfazQQVO736F3ExSJTcrRE
QKASfkYpTAQ=
=ae81
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanup from Arnd Bergmann:
"I have only one patch for asm-generic in this release, this one is
from James Hogan and updates the generic system call table for
renameat2 so we don't need to provide both renameat and renameat2 in
newly added architectures"
* tag 'asm-generic-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: Drop renameat syscall from default list
I found a serious performance bug in packet schedulers using hrtimers.
sch_htb and sch_fq are definitely impacted by this problem.
We constantly rearm high resolution timers if some packets are throttled
in one (or more) class, and other packets are flying through qdisc on
another (non throttled) class.
hrtimer_start() does not have the mod_timer() trick of doing nothing if
expires value does not change :
if (timer_pending(timer) &&
timer->expires == expires)
return 1;
This issue is particularly visible when multiple cpus can queue/dequeue
packets on the same qdisc, as hrtimer code has to lock a remote base.
I used following fix :
1) Change htb to use qdisc_watchdog_schedule_ns() instead of open-coding
it.
2) Cache watchdog prior expiration. hrtimer might provide this, but I
prefer to not rely on some hrtimer internal.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXRL2PAAoJECebzXlCjuG+c34P/1wnkehVxDozBJp7UEzhrsE/
U1dpwfykzVEIMh68TldBvyrt2Lb4ThLPZ7V2dVwNqA831S/VM6fWJyw8WerSgGgU
SUGOzdF04rNfy41lXQNpDiiC417Fbp4Js4O+Q5kd+8kqQbXYqCwz0ce3DVbAT571
JmJgBI8gZLhicyNRDOt0y6C+/3P+0bbXYvS8wkzY+CwbNczHJOCLhwViKzWTptm9
LCSgDGm68ckpR7mZkWfEF3WdiZ9+SxeI+pT9dcomzxNfbv8NluDplYmdLbepA2J8
uWHGprVe9WJMDnw4hJhrI2b3/rHIntpxuZYktmnb/z/ezBTyi3FXYWgAEdE1by+Y
Gf7OewKOp8XcQ/iHRZ8vwXNrheHAr9++SB49mGBZJ3qj6bO+FrISQKX9FRxo6PrJ
SDRgYjt5yUG2oD1AAs1NzuBPqZzR40mA6Yk4zuNAcxzK/S7DdRF/9Kjyk86TVv08
3E3O5i1RyVcU/A7JdnbiyeDFMQoRshdnN0HShIZcSfcfW+qFKghNlO9bFfSl904F
jlG6moNB5OBiV8FNOelY+HGAYoUdw120QxqQMv47oZGKCjv+rfK38aB4GBJ4iEuo
TrGqNmrMrs/AKdL3Sd+8LuJqSfXggrwUDc/KS6CFz/U0eBbp6k0kcd7FEyG/J8kW
JxQ0URgyJ+DHfc60E8LN
=k6RP
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"A very quiet cycle for nfsd, mainly just an RDMA update from Chuck
Lever"
* tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux:
sunrpc: fix stripping of padded MIC tokens
svcrpc: autoload rdma module
svcrdma: Generalize svc_rdma_xdr_decode_req()
svcrdma: Eliminate code duplication in svc_rdma_recvfrom()
svcrdma: Drain QP before freeing svcrdma_xprt
svcrdma: Post Receives only for forward channel requests
svcrdma: Remove superfluous line from rdma_read_chunks()
svcrdma: svc_rdma_put_context() is invoked twice in Send error path
svcrdma: Do not add XDR padding to xdr_buf page vector
svcrdma: Support IPv6 with NFS/RDMA
nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid
Remove unnecessary allocation
after a crash and a potential BUG_ON crash if a file has the data
journalling flag enabled while it has dirty delayed allocation blocks
that haven't been written yet. Also fix a potential crash in the new
project quota code and a maliciously corrupted file system.
In addition, fix some DAX-specific bugs, including when there is a
transient ENOSPC situation and races between writes via direct I/O and
an mmap'ed segment that could lead to lost I/O.
Finally the usual set of miscellaneous cleanups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJXQ40fAAoJEPL5WVaVDYGjnwMH+wXHASgPfzZgtRInsTG8W/2L
jsmAcMlyMAYIATWMppNtPIq0td49z1dYO0YkKhtPVMwfzu230IFWhGWp93WqP9ve
XYHMmaBorFlMAzWgMKn1K0ExWZlV+ammmcTKgU0kU4qyZp0G/NnMtlXIkSNv2amI
9Mn6R+v97c20gn8e9HWP/IVWkgPr+WBtEXaSGjC7dL6yI8hL+rJMqN82D76oU5ea
vtwzrna/ISijy+etYmQzqHNYNaBKf40+B5HxQZw/Ta3FSHofBwXAyLaeEAr260Mf
V3Eg2NDcKQxiZ3adBzIUvrRnrJV381OmHoguo8Frs8YHTTRiZ0T/s7FGr2Q0NYE=
=7yIM
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Fix a number of bugs, most notably a potential stale data exposure
after a crash and a potential BUG_ON crash if a file has the data
journalling flag enabled while it has dirty delayed allocation blocks
that haven't been written yet. Also fix a potential crash in the new
project quota code and a maliciously corrupted file system.
In addition, fix some DAX-specific bugs, including when there is a
transient ENOSPC situation and races between writes via direct I/O and
an mmap'ed segment that could lead to lost I/O.
Finally the usual set of miscellaneous cleanups"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
ext4: pre-zero allocated blocks for DAX IO
ext4: refactor direct IO code
ext4: fix race in transient ENOSPC detection
ext4: handle transient ENOSPC properly for DAX
dax: call get_blocks() with create == 1 for write faults to unwritten extents
ext4: remove unmeetable inconsisteny check from ext4_find_extent()
jbd2: remove excess descriptions for handle_s
ext4: remove unnecessary bio get/put
ext4: silence UBSAN in ext4_mb_init()
ext4: address UBSAN warning in mb_find_order_for_block()
ext4: fix oops on corrupted filesystem
ext4: fix check of dqget() return value in ext4_ioctl_setproject()
ext4: clean up error handling when orphan list is corrupted
ext4: fix hang when processing corrupted orphaned inode list
ext4: remove trailing \n from ext4_warning/ext4_error calls
ext4: fix races between changing inode journal mode and ext4_writepages
ext4: handle unwritten or delalloc buffers before enabling data journaling
ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()
ext4: do not ask jbd2 to write data for delalloc buffers
jbd2: add support for avoiding data writes during transaction commits
...
This commits adds a new RDMA local service operation:
- IP to GID resolution.
The client request would include the ifindex of the outgoing interface
and would place in an attribute (LS_NLA_TYPE_IPV4 or LS_NLA_TYPE_IPV6)
the destnation IP.
The local service would answer with a message that has the attribute:
- LS_NLA_TYPE_DGID - The destination GID.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Another quiet release for SPI, almost entirely driver specific changes
with the diffstat dominated by two new drivers which are about two
thirds of it in terms of lines of code:
- New drivers for PIC32 standard and SQI controllers.
- The Cadence driver has had runtime PM support added and quite a few
fixes and cleanups.
- The flash-specific accelerated path support now has a feature query
interface.
- The pxa2xx driver has been moved to use the core DMA mapping support.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXQufEAAoJECTWi3JdVIfQXeEH/3PZVHvwQBqpN6S0AunlJQoM
s1bScKYeH2ukx9iw86M/upSCOVt4TGlPrdwzcYCUYll9IJuO/ChDio7PoVlxQeJB
kYUrFi6dzE/bCNzWtrGtyvNlSDsrRccbRBhmKTFQ9DokcJHgzdzhuCuXUR6OKDDw
CxlvDrLwapzOpHIncrhh6dvv1NoZgusOTMzVQAPvLbuiH9WpdnD9MjySklIqd0XU
bp+J4J5+jyBVykOZ2MdYpXf1dRhg0c0kmKXOKuX9woiJhvBFrtZX2GfCw1MXchKZ
/obHOyD7ff+MBCY2nFN95s3rl9Vxn8IAfNWsuQvZaFK0nz1bypaQ6aXIbXXgj/8=
=QO1T
-----END PGP SIGNATURE-----
Merge tag 'spi-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"Another quiet release for SPI, almost entirely driver specific changes
with the diffstat dominated by two new drivers which are about two
thirds of it in terms of lines of code:
- new drivers for PIC32 standard and SQI controllers
- the Cadence driver has had runtime PM support added and quite a few
fixes and cleanups
- flash-specific accelerated path support now has a feature query
interface
- the pxa2xx driver has been moved to use the core DMA mapping support"
* tag 'spi-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (48 commits)
spi: pic32-sqi: Fix linker error, undefined reference to `bad_dma_ops'
spi: dw-pci: Spelling s/paltforms/platforms/g
spi: pic32-sqi: Remove pic32_sqi_setup and pic32_sqi_cleanup
spi: Fix simple typo s/impelment/implement
spi: rockchip: potential NULL dereference on error
spi: zynqmp: disable clocks in error paths
spi: Drop unnecessary dependencies on relaxed I/O accessors
spi: qup: Add spi_master_put in remove function
spi: qup: Handle clocks in pm_runtime suspend and resume
spi: st-ssc4: Fix missing spi_master_put in spi_st_probe error paths
spi: st-ssc4: Allow compile test build
spi: omap2-mcspi: Use dma_request_chan() for requesting DMA channel
spi: davinci: Use dma_request_chan() for requesting DMA channel
spi: pic32: Fix checking return value of devm_ioremap_resource
spi: spi-fsl-dspi: Update DT binding documentation
spi: Drop duplicate code to set master->dev.parent
spi: pic32: Set proper bits_per_word_mask
spi: return error if kmap'd buffers passed to spi_map_buf()
spi: core: add hook flash_read_supported to spi_master
spi: pic32-sqi: silence array overflow warning
...
First cycle with Boris as NAND maintainer! Many (most) bullets stolen from him.
Generic:
* Migrated NAND LED trigger to be a generic MTD trigger
NAND:
* Introduction of the "ECC algorithm" concept, to avoid overloading the ECC
mode field too much more
* Replaced the nand_ecclayout infrastructure with something a little more
flexible (finally!) and future proof
* Rework of the OMAP GPMC and NAND drivers; the TI folks pulled some of
this into their own tree as well
* Prepare the sunxi NAND driver to receive DMA support
* Handle bitflips in erased pages on GPMI revisions that do not support
this in hardware.
SPI NOR:
* Start using the spi_flash_read() API for SPI drivers that support it (i.e.,
SPI drivers with special memory-mapped flash modes)
And other small scattered improvments.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXQ9oUAAoJEFySrpd9RFgttf0P/3oIVCvLHSFIsi7XiUusWJWk
Cb+xW3ujFd2kNUqAQGnyvPUGU1amgjAjy2kwMpvpOG07DVgSnxQVGaQLins8Zwpw
auWxH8llISmC6UkNsS1jV0d7KzSMCT2Ne+BenRAn68kq3ovXPPB3B19B6dFj8ail
s83ajoZhsn1+eyctiKtbhXgZWkJHlRmBeXPKAJcS0lBcSibR+6N+O//JEAMnyYvc
7azuw0KMVwQNnNYFAfd9dilV5juZ9bZptTJYH7XuF+44FhxmSKvTX2a9gmp0C4Bm
FszUiPrIWF+t98nSQxxSn/zPlyllFyoisa6F7eGnDHIz+bH0Emf2oVwsSG5ASl42
XTml0kB0jCfuBfgAiyhYU2Uds7rSYs/ZcHr3iPgpUY3Sc3dgoArDdahMJXwqaoa8
UdChu6A+rjhi9PqhzNNVTarbilp3pOVgKAUVEWTdpQ1wGU4c+9SNlTTwhPy4g7RB
uKlqbMeiZ/5rPiihaMUNtzxMxSe9OGYW2HVNVExvmlF2Ca42M1xJJBMlAA6IIXyS
35d3Y4F5zPP7U6GCVla06WHkL5ahXJWmI0Xhf+2jCnDMipeAl6eCEiAJY5EmvAnr
FTpZ4qkspED69mO8oZW9ORje0n6PCm4XPOi4Vl8kci8tlBsEJMk9jaedWwGlZkRk
I5leUP4NEougvuHce2Cn
=J6KN
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20160523' of git://git.infradead.org/linux-mtd
Pull MTD updates from Brian Norris:
"First cycle with Boris as NAND maintainer! Many (most) bullets stolen
from him.
Generic:
- Migrated NAND LED trigger to be a generic MTD trigger
NAND:
- Introduction of the "ECC algorithm" concept, to avoid overloading
the ECC mode field too much more
- Replaced the nand_ecclayout infrastructure with something a little
more flexible (finally!) and future proof
- Rework of the OMAP GPMC and NAND drivers; the TI folks pulled some
of this into their own tree as well
- Prepare the sunxi NAND driver to receive DMA support
- Handle bitflips in erased pages on GPMI revisions that do not
support this in hardware.
SPI NOR:
- Start using the spi_flash_read() API for SPI drivers that support
it (i.e., SPI drivers with special memory-mapped flash modes)
And other small scattered improvments"
* tag 'for-linus-20160523' of git://git.infradead.org/linux-mtd: (155 commits)
mtd: spi-nor: support GigaDevice gd25lq64c
mtd: nand_bch: fix spelling of "probably"
mtd: brcmnand: respect ECC algorithm set by NAND subsystem
gpmi-nand: Handle ECC Errors in erased pages
Documentation: devicetree: deprecate "soft_bch" nand-ecc-mode value
mtd: nand: add support for "nand-ecc-algo" DT property
mtd: mtd: drop NAND_ECC_SOFT_BCH enum value
mtd: drop support for NAND_ECC_SOFT_BCH as "soft_bch" mapping
mtd: nand: read ECC algorithm from the new field
mtd: nand: fsmc: validate ECC setup by checking algorithm directly
mtd: nand: set ECC algorithm to Hamming on fallback
staging: mt29f_spinand: set ECC algorithm explicitly
CRIS v32: nand: set ECC algorithm explicitly
mtd: nand: atmel: set ECC algorithm explicitly
mtd: nand: davinci: set ECC algorithm explicitly
mtd: nand: bf5xx: set ECC algorithm explicitly
mtd: nand: omap2: Fix high memory dma prefetch transfer
mtd: nand: omap2: Start dma request before enabling prefetch
mtd: nandsim: add __init attribute
mtd: nand: move of_get_nand_xxx() helpers into nand_base.c
...
Specifically the change from hex to decimal helps correlating events.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
"The GIC is dead; Long live the GIC"
This set of changes include the new vgic, which is a reimplementation of
our horribly broken legacy vgic implementation. The two implementations
will live side-by-side (with the new being the configured default) for
one kernel release and then we'll remove it.
Also fixes a non-critical issue with virtual abort injection to guests.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXRA7PAAoJEEtpOizt6ddy9a0H+wQ5LTC4rrgcWOsLjfa7res7
SqB3HEqrzmN3zMbv4dgYeMSBDqr6F3b1+8DJ6n1WkG7afXOKrpMWsl9SXgwmRc1q
8H54DYLcu/CDakzi/FKTeZEIp/u+tpMC7xQLFk8PFx/NUPfspPt1NU6Qi0fumZj6
5AbXwC6FKojplgO6wxV7oHRRiEnfrN5F+1whD3QlaDmI+rlNUSYNp2Ljhp8k9m9E
NSeGzZgeMdHOeZpv60uhjotuxegG1zmHeHI59ltNytdfjuFL3LvNm18yG8u1yjKm
9ZDjIu1m7dfuPw39DYC99cxP6tvEh03/N2zw86ZFgj7QfdW756WrV+UkSFfeIKE=
=xy+z
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-4-7-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next
KVM/ARM Changes for v4.7 take 2
"The GIC is dead; Long live the GIC"
This set of changes include the new vgic, which is a reimplementation of
our horribly broken legacy vgic implementation. The two implementations
will live side-by-side (with the new being the configured default) for
one kernel release and then we'll remove it.
Also fixes a non-critical issue with virtual abort injection to guests.
Merge yet more updates from Andrew Morton:
- Oleg's "wait/ptrace: assume __WALL if the child is traced". It's a
kernel-based workaround for existing userspace issues.
- A few hotfixes
- befs cleanups
- nilfs2 updates
- sys_wait() changes
- kexec updates
- kdump
- scripts/gdb updates
- the last of the MM queue
- a few other misc things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (84 commits)
kgdb: depends on VT
drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
drm/radeon: make radeon_mn_get wait for mmap_sem killable
drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
uprobes: wait for mmap_sem for write killable
prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
exec: make exec path waiting for mmap_sem killable
aio: make aio_setup_ring killable
coredump: make coredump_wait wait for mmap_sem for write killable
vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
ipc, shm: make shmem attach/detach wait for mmap_sem killable
mm, fork: make dup_mmap wait for mmap_sem for write killable
mm, proc: make clear_refs killable
mm: make vm_brk killable
mm, elf: handle vm_brk error
mm, aout: handle vm_brk failures
mm: make vm_munmap killable
mm: make vm_mmap killable
mm: make mmap_sem for write waits killable for mm syscalls
MAINTAINERS: add co-maintainer for scripts/gdb
...
Pull libata ZAC support from Tejun Heo:
"This contains Zone ATA Command support for Shingled Magnetic Recording
devices.
In addition to sending the new commands down to the device, as ZAC
commands depend on getting a lot of responses from the device, piping
up responses is beefed up too. However, it doesn't involve changes to
libata core mechanism or its interaction with upper layers, so I'm not
expecting too many fallouts.
Kudos to Hannes for driving SMR support"
* 'for-4.7-zac' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (28 commits)
libata: support host-aware and host-managed ZAC devices
libata: support device-managed ZAC devices
libata: NCQ encapsulation for ZAC MANAGEMENT OUT
libata: Implement ZBC OUT translation
libata: implement ZBC IN translation
libata: fixup ZAC device disabling
libata-scsi: Generate sense code for disabled devices
libata-trace: decode subcommands
libata: Check log page directory before accessing pages
libata: Add command definitions for NCQ Encapsulation for READ LOG DMA EXT
libata: Separate out ata_dev_config_ncq_send_recv()
libata/libsas: Define ATA_CMD_NCQ_NON_DATA
libsas: enable FPDMA SEND/RECEIVE
libata: do not attempt to retrieve sense code twice
libata-scsi: Set information sense field for invalid parameter
libata-scsi: set bit pointer for sense code information
libata-scsi: Set field pointer in sense code
scsi: add scsi_set_sense_field_pointer()
libata: Implement control mode page to select sense format
libata-scsi: generate correct ATA pass-through sense
...
Now that all the callers handle vm_brk failure we can change it wait for
mmap_sem killable to help oom_reaper to not get blocked just because
vm_brk gets blocked behind mmap_sem readers.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All the callers of vm_mmap seem to check for the failure already and
bail out in one way or another on the error which means that we can
change it to use killable version of vm_mmap_pgoff and return -EINTR if
the current task gets killed while waiting for mmap_sem. This also
means that vm_mmap_pgoff can be killable by default and drop the
additional parameter.
This will help in the OOM conditions when the oom victim might be stuck
waiting for the mmap_sem for write which in turn can block oom_reaper
which relies on the mmap_sem for read to make a forward progress and
reclaim the address space of the victim.
Please note that load_elf_binary is ignoring vm_mmap error for
current->personality & MMAP_PAGE_ZERO case but that shouldn't be a
problem because the address is not used anywhere and we never return to
the userspace if we got killed.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 3f625002581b ("kexec: introduce a protection mechanism for the
crashkernel reserved memory") is a similar mechanism for protecting the
crash kernel reserved memory to previous crash_map/unmap_reserved_pages()
implementation, the new one is more generic in name and cleaner in code
(besides, some arch may not be allowed to unmap the pgtable).
Therefore, this patch consolidates them, and uses the new
arch_kexec_protect(unprotect)_crashkres() to replace former
crash_map/unmap_reserved_pages() which by now has been only used by
S390.
The consolidation work needs the crash memory to be mapped initially,
this is done in machine_kdump_pm_init() which is after
reserve_crashkernel(). Once kdump kernel is loaded, the new
arch_kexec_protect_crashkres() implemented for S390 will actually
unmap the pgtable like before.
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Minfei Huang <mhuang@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For the cases that some kernel (module) path stamps the crash reserved
memory(already mapped by the kernel) where has been loaded the second
kernel data, the kdump kernel will probably fail to boot when panic
happens (or even not happens) leaving the culprit at large, this is
unacceptable.
The patch introduces a mechanism for detecting such cases:
1) After each crash kexec loading, it simply marks the reserved memory
regions readonly since we no longer access it after that. When someone
stamps the region, the first kernel will panic and trigger the kdump.
The weak arch_kexec_protect_crashkres() is introduced to do the actual
protection.
2) To allow multiple loading, once 1) was done we also need to remark
the reserved memory to readwrite each time a system call related to
kdump is made. The weak arch_kexec_unprotect_crashkres() is introduced
to do the actual protection.
The architecture can make its specific implementation by overriding
arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres().
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Minfei Huang <mhuang@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All the users of siginmask() must ensure that sig < SIGRTMIN. sig_fatal()
doesn't and this is wrong:
UBSAN: Undefined behaviour in kernel/signal.c:911:6
shift exponent 32 is too large for 32-bit type 'long unsigned int'
the patch doesn't add the neccesary check to sig_fatal(), it moves the
check into siginmask() and updates other callers.
Link: http://lkml.kernel.org/r/20160517195052.GA15187@redhat.com
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the size of "struct signal_struct"->oom_flags member is
sizeof(unsigned) bytes, but only one flag OOM_FLAG_ORIGIN which is
updated by current thread is defined. We can convert OOM_FLAG_ORIGIN
into a bool, and reuse the saved bytes for updating from the OOM killer
and/or the OOM reaper thread.
By the way, do we care about a race window between run_store() and
swapoff() because it would be theoretically possible that two threads
sharing the "struct signal_struct" concurrently call respective
functions? If we care, we can make oom_flags an atomic_t.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes block comments with proper formatting to eliminate the
following checkpatch.pl warnings:
"WARNING: Block comments use * on subsequent lines"
"WARNING: Block comments use a trailing */ on a separate line"
Link: http://lkml.kernel.org/r/1462886671-3521-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes checkpatch.pl warning "WARNING: Prefer 'unsigned int' to
bare use of 'unsigned'".
Link: http://lkml.kernel.org/r/1462886671-3521-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
E-mail addresses of osrg.net domain are no longer available. This
removes them from authorship notices and prevents reporters from being
confused.
Link: http://lkml.kernel.org/r/1461935747-10380-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the extra paragraph which mentions FSF address in GPL
notices from source code of nilfs2 and avoids the checkpatch.pl error
related to it.
Link: http://lkml.kernel.org/r/1461935747-10380-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm updates from Dave Airlie:
"Here's the main drm pull request for 4.7, it's been a busy one, and
I've been a bit more distracted in real life this merge window. Lots
more ARM drivers, not sure if it'll ever end. I think I've at least
one more coming the next merge window.
But changes are all over the place, support for AMD Polaris GPUs is in
here, some missing GM108 support for nouveau (found in some Lenovos),
a bunch of MST and skylake fixes.
I've also noticed a few fixes from Arnd in my inbox, that I'll try and
get in asap, but I didn't think they should hold this up.
New drivers:
- Hisilicon kirin display driver
- Mediatek MT8173 display driver
- ARC PGU - bitstreamer on Synopsys ARC SDP boards
- Allwinner A13 initial RGB output driver
- Analogix driver for DisplayPort IP found in exynos and rockchip
DRM Core:
- UAPI headers fixes and C++ safety
- DRM connector reference counting
- DisplayID mode parsing for Dell 5K monitors
- Removal of struct_mutex from drivers
- Connector registration cleanups
- MST robustness fixes
- MAINTAINERS updates
- Lockless GEM object freeing
- Generic fbdev deferred IO support
panel:
- Support for a bunch of new panels
i915:
- VBT refactoring
- PLL computation cleanups
- DSI support for BXT
- Color manager support
- More atomic patches
- GEM improvements
- GuC fw loading fixes
- DP detection fixes
- SKL GPU hang fixes
- Lots of BXT fixes
radeon/amdgpu:
- Initial Polaris support
- GPUVM/Scheduler/Clock/Power improvements
- ASYNC pageflip support
- New mesa feature support
nouveau:
- GM108 support
- Power sensor support improvements
- GR init + ucode fixes.
- Use GPU provided topology information
vmwgfx:
- Add host messaging support
gma500:
- Some cleanups and fixes
atmel:
- Bridge support
- Async atomic commit support
fsl-dcu:
- Timing controller for LCD support
- Pixel clock polarity support
rcar-du:
- Misc fixes
exynos:
- Pipeline clock support
- Exynoss4533 SoC support
- HW trigger mode support
- export HDMI_PHY clock
- DECON5433 fixes
- Use generic prime functions
- use DMA mapping APIs
rockchip:
- Lots of little fixes
vc4:
- Render node support
- Gamma ramp support
- DPI output support
msm:
- Mostly cleanups and fixes
- Conversion to generic struct fence
etnaviv:
- Fix for prime buffer handling
- Allow hangcheck to be coalesced with other wakeups
tegra:
- Gamme table size fix"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1050 commits)
drm/edid: add displayid detailed 1 timings to the modelist. (v1.1)
drm/edid: move displayid validation to it's own function.
drm/displayid: Iterate over all DisplayID blocks
drm/edid: move displayid tiled block parsing into separate function.
drm: Nuke ->vblank_disable_allowed
drm/vmwgfx: Report vmwgfx version to vmware.log
drm/vmwgfx: Add VMWare host messaging capability
drm/vmwgfx: Kill some lockdep warnings
drm/nouveau/gr/gf100-: fix race condition in fecs/gpccs ucode
drm/nouveau/core: recognise GM108 chipsets
drm/nouveau/gr/gm107-: fix touching non-existent ppcs in attrib cb setup
drm/nouveau/gr/gk104-: share implementation of ppc exception init
drm/nouveau/gr/gk104-: move rop_active_fbps init to nonctx
drm/nouveau/bios/pll: check BIT table version before trying to parse it
drm/nouveau/bios/pll: prevent oops when limits table can't be parsed
drm/nouveau/volt/gk104: round up in gk104_volt_set
drm/nouveau/fb/gm200: setup mmu debug buffer registers at init()
drm/nouveau/fb/gk20a,gm20b: setup mmu debug buffer registers at init()
drm/nouveau/fb/gf100-: allocate mmu debug buffers
drm/nouveau/fb: allow chipset-specific actions for oneinit()
...
1/ Device DAX for persistent memory:
Device DAX is the device-centric analogue of Filesystem DAX
(CONFIG_FS_DAX). It allows memory ranges to be allocated and mapped
without need of an intervening file system. Device DAX is strict,
precise and predictable. Specifically this interface:
a) Guarantees fault granularity with respect to a given page size
(pte, pmd, or pud) set at configuration time.
b) Enforces deterministic behavior by being strict about what fault
scenarios are supported.
Persistent memory is the first target, but the mechanism is also
targeted for exclusive allocations of performance/feature differentiated
memory ranges.
2/ Support for the HPE DSM (device specific method) command formats.
This enables management of these first generation devices until a
unified DSM specification materializes.
3/ Further ACPI 6.1 compliance with support for the common dimm
identifier format.
4/ Various fixes and cleanups across the subsystem.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXQhdeAAoJEB7SkWpmfYgCYP8P/RAgHkroL5lUKKU45TQUBKcY
diC9POeNSccme4tIRIQCGQUZ7+7mKM5ECv2ulF4xYOHvFBCcd/8OF6xKAXs48r3v
oguYhvX1YvIkBc9FUfBQbR1IsCOJ7uWp/UYiYCIQEXS5tS9Jv545j3ASqDt9xWoV
TWlceZn3yWSbASiV9qZ2eXhEkk75pg4yara++rsm2/7rs/TTXn5EIjBs+57BtAo+
6utI4fTy0CQvBYwVzam3m7y9dt2Z2jWXL4hgmT7pkvJ7HDoctVly0P9+bknJPUAo
g+NugKgTGeiqH5GYp5CTZ9KvL91sDF4q00pfinITVdFl0E3VE293cIHlAzSQBm5/
w58xxaRV958ZvpH7EaBmYQG82QDi/eFNqeHqVGn0xAM6MlaqO7avUMQp2lRPYMCJ
u1z/NloR5yo+sffHxsn5Luiq9KqOf6zk33PuxEkKbN74OayCSPn/SeVCO7rQR0B6
yPMJTTcTiCLnId1kOWAPaEmuK2U3BW/+ogg7hKgeCQSysuy5n6Ok5a2vEx/gJRAm
v9yF68RmIWumpHr+QB0TmB8mVbD5SY+xWTm3CqJb9MipuFIOF7AVsPyTgucBvE7s
v+i5F6MDO6tcVfiDT4AiZEt6D2TM5RbtckkUEX3ZTD6j7CGuR5D8bH0HNRrghrYk
KT1lAk6tjWBOGAHc5Ji7
=Y3Xv
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"The bulk of this update was stabilized before the merge window and
appeared in -next. The "device dax" implementation was revised this
week in response to review feedback, and to address failures detected
by the recently expanded ndctl unit test suite.
Not included in this pull request are two dax topic branches (dax
error handling, and dax radix-tree locking). These topics were
deferred to get a few more days of -next integration testing, and to
coordinate a branch baseline with Ted and the ext4 tree. Vishal and
Ross will send the error handling and locking topics respectively in
the next few days.
This branch has received a positive build result from the kbuild robot
across 226 configs.
Summary:
- Device DAX for persistent memory: Device DAX is the device-centric
analogue of Filesystem DAX (CONFIG_FS_DAX). It allows memory
ranges to be allocated and mapped without need of an intervening
file system. Device DAX is strict, precise and predictable.
Specifically this interface:
a) Guarantees fault granularity with respect to a given page size
(pte, pmd, or pud) set at configuration time.
b) Enforces deterministic behavior by being strict about what
fault scenarios are supported.
Persistent memory is the first target, but the mechanism is also
targeted for exclusive allocations of performance/feature
differentiated memory ranges.
- Support for the HPE DSM (device specific method) command formats.
This enables management of these first generation devices until a
unified DSM specification materializes.
- Further ACPI 6.1 compliance with support for the common dimm
identifier format.
- Various fixes and cleanups across the subsystem"
* tag 'libnvdimm-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (40 commits)
libnvdimm, dax: fix deletion
libnvdimm, dax: fix alignment validation
libnvdimm, dax: autodetect support
libnvdimm: release ida resources
Revert "block: enable dax for raw block devices"
/dev/dax, core: file operations and dax-mmap
/dev/dax, pmem: direct access to persistent memory
libnvdimm: stop requiring a driver ->remove() method
libnvdimm, dax: record the specified alignment of a dax-device instance
libnvdimm, dax: reserve space to store labels for device-dax
libnvdimm, dax: introduce device-dax infrastructure
nfit: add sysfs dimm 'family' and 'dsm_mask' attributes
tools/testing/nvdimm: ND_CMD_CALL support
nfit: disable vendor specific commands
nfit: export subsystem ids as attributes
nfit: fix format interface code byte order per ACPI6.1
nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism
nfit, libnvdimm: clarify "commands" vs "_DSMs"
libnvdimm: increase max envelope size for ioctl
acpi/nfit: Add sysfs "id" for NVDIMM ID
...
The component master driver imx-drm-core matches component devices using
their of_node. Since commit 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc
module autoloading"), the imx-ipuv3-crtc dev->of_node is not set during
probing. Before that, of_node was set and caused an of: modalias to be
used instead of the platform: modalias, which broke module autoloading.
On the other hand, if dev->of_node is not set yet when the imx-ipuv3-crtc
probe function calls component_add, component matching in imx-drm-core
fails. While dev->of_node will be set once the next component tries to
bring up the component master, imx-drm-core component binding will never
succeed if one of the crtc devices is probed last.
Add of_node to the component platform data and match against the
pdata->of_node instead of dev->of_node in imx-drm-core to work around
this problem.
Cc: <stable@vger.kernel.org> # 4.4.x
Fixes: 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Fabio Estevam <fabio.estevam@nxp.com>
Tested-by: Lothar Waßmann <LW@KARO-electronics.de>
Tested-by: Heiko Schocher <hs@denx.de>
Tested-by: Chris Ruehl <chris.ruehl@gtsys.com.hk>
Add a helper which aids in the identification of DP dual mode
(aka. DP++) adaptors. There are several types of adaptors
specified: type 1 DVI, type 1 HDMI, type 2 DVI, type 2 HDMI
Type 1 adaptors have a max TMDS clock limit of 165MHz, type 2 adaptors
may go as high as 300MHz and they provide a register informing the
source device what the actual limit is. Supposedly also type 1 adaptors
may optionally implement this register. This TMDS clock limit is the
main reason why we need to identify these adaptors.
Type 1 adaptors provide access to their internal registers and the sink
DDC bus through I2C. Type 2 adaptors provide this access both via I2C
and I2C-over-AUX. A type 2 source device may choose to implement either
of these methods. If a source device implements the I2C-over-AUX
method, then the driver will obviously need specific support for such
adaptors since the port is driven like an HDMI port, but DDC
communication happes over the AUX channel.
This helper should be enough to identify the adaptor type (some
type 1 DVI adaptors may be a slight exception) and the maximum TMDS
clock limit. Another feature that may be available is control over
the TMDS output buffers on the adaptor, possibly allowing for some
power saving when the TMDS link is down.
Other user controllable features that may be available in the adaptors
are downstream i2c bus speed control when using i2c-over-aux, and
some control over the CEC pin. I chose not to provide any helper
functions for those since I have no use for them in i915 at this time.
The rest of the registers in the adaptor are mostly just information,
eg. IEEE OUI, hardware and firmware revision, etc.
v2: Pass adaptor type to helper functions to ease driver implementation
Fix a bunch of typoes (Paulo)
Add DRM_DP_DUAL_MODE_UNKNOWN for the case where we don't (yet) know
the type (Paulo)
Reject 0x00 and 0xff DP_DUAL_MODE_MAX_TMDS_CLOCK values (Paulo)
Adjust drm_dp_dual_mode_detect() type2 vs. type1 detection to
ease future LSPCON enabling
Remove the unused DP_DUAL_MODE_LAST_RESERVED define
v3: Fix kernel doc function argument descriptions (Jani)
s/NONE/UNKNOWN/ in drm_dp_dual_mode_detect() docs
Add kernel doc for enum drm_dp_dual_mode_type
Actually build the docs
Fix more typoes
v4: Adjust code indentation of type2 adaptor detection (Shashank)
Add debug messages for failurs cases (Shashank)
v5: EXPORT_SYMBOL(drm_dp_dual_mode_read) (Paulo)
Cc: stable@vger.kernel.org
Cc: Tore Anderson <tore@fud.no>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com> (v4)
Link: http://patchwork.freedesktop.org/patch/msgid/1462542412-25533-1-git-send-email-ville.syrjala@linux.intel.com
(cherry picked from commit ede53344db)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The tiled 5K Dell monitor appears to be hiding it's tiled mode
inside the displayid timings block, this patch parses this
blocks and adds the modes to the modelist.
v1.1: add missing __packed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95207
Signed-off-by: Dave Airlie <airlied@redhat.com>
- fs-specific prefix for fscrypto
- fault injection facility
- expose validity bitmaps for user to be aware of fragmentation
- fallocate/rm/preallocation speed up
- use percpu counters
Bug fixes
- some inline_dentry/inline_data bugs
- error handling for atomic/volatile/orphan inodes
- recover broken superblock
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXQPu4AAoJEEAUqH6CSFDSILgP/1dj6fmtytr8c+55EBqXUGpo
M7rS93JTxlmU5BduIo9psJsEquTQoVEmxB/Gjd+ZnI5R6Rp1c/REaP0ba374rEhZ
ecMQh5QqzM1gRNFXrQhWFEL/KtfRqt3T80zebQP7pxFUm/m9NGMLWT43RzQ8AAhr
Y3P0NLdvxA4HAnipKptkPJcGZQlWnL9W/MR+LgsXLXqLDwJHkVu61GcF0y2ibcJM
lEtIRmyH5tg7hP5c5LTw9pKQFHkIZt5cHFLjrJ1x8FSm2TXOcJPbjOrThvcb+NKK
e0O+6R0meH2eMpak+BTkZp2YbPPyXOb1N00j//lmbPjCoJPd4ZuiJ+oRoHUlTxtU
FhO67t0brlDbMFQVRFrtv8VA8M6by+DTAAP3Ffx62I/TJkphKANCSoyQRhlWtxxO
kRU69N7ipnRNxO4WCv40FjaQjSIElCKysP1POazRmAOQm7UFTGT9Nj37+eqUcEPJ
HZ7O61DEHNemb0SMlJ8WSClstt0yUU+2cjRfTPAr2Wd3V8gYbRs0QUg5M2GLgywR
EmiJfpkXse3f/nR8W6g1hganSOXA0AZX+EUibed6VkV3oYemdFbm8OymeEmLmWpM
y2F3D7dPLW7MCoTXJqtwFWdoDwI+zkH4rJaPGTq5TVBRWVU/njX8OvoB47pOvKV1
kccL7zv2PekE1hSDO5WF
=6MSp
-----END PGP SIGNATURE-----
Merge tag 'for-f2fs-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, as Ted pointed out, fscrypto allows one more key prefix
given by filesystem to resolve backward compatibility issues. Other
than that, we've fixed several error handling cases by introducing
a fault injection facility. We've also achieved performance
improvement in some workloads as well as a bunch of bug fixes.
Summary:
Enhancements:
- fs-specific prefix for fscrypto
- fault injection facility
- expose validity bitmaps for user to be aware of fragmentation
- fallocate/rm/preallocation speed up
- use percpu counters
Bug fixes:
- some inline_dentry/inline_data bugs
- error handling for atomic/volatile/orphan inodes
- recover broken superblock"
* tag 'for-f2fs-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (73 commits)
f2fs: fix to update dirty page count correctly
f2fs: flush pending bios right away when error occurs
f2fs: avoid ENOSPC fault in the recovery process
f2fs: make exit_f2fs_fs more clear
f2fs: use percpu_counter for total_valid_inode_count
f2fs: use percpu_counter for alloc_valid_block_count
f2fs: use percpu_counter for # of dirty pages in inode
f2fs: use percpu_counter for page counters
f2fs: use bio count instead of F2FS_WRITEBACK page count
f2fs: manipulate dirty file inodes when DATA_FLUSH is set
f2fs: add fault injection to sysfs
f2fs: no need inc dirty pages under inode lock
f2fs: fix incorrect error path handling in f2fs_move_rehashed_dirents
f2fs: fix i_current_depth during inline dentry conversion
f2fs: correct return value type of f2fs_fill_super
f2fs: fix deadlock when flush inline data
f2fs: avoid f2fs_bug_on during recovery
f2fs: show # of orphan inodes
f2fs: support in batch fzero in dnode page
f2fs: support in batch multi blocks preallocation
...
Pull btrfs updates from Chris Mason:
"This has our merge window series of cleanups and fixes. These target
a wide range of issues, but do include some important fixes for
qgroups, O_DIRECT, and fsync handling. Jeff Mahoney moved around a
few definitions to make them easier for userland to consume.
Also whiteout support is included now that issues with overlayfs have
been cleared up.
I have one more fix pending for page faults during btrfs_copy_from_user,
but I wanted to get this bulk out the door first"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (90 commits)
btrfs: fix memory leak during RAID 5/6 device replacement
Btrfs: add semaphore to synchronize direct IO writes with fsync
Btrfs: fix race between block group relocation and nocow writes
Btrfs: fix race between fsync and direct IO writes for prealloc extents
Btrfs: fix number of transaction units for renames with whiteout
Btrfs: pin logs earlier when doing a rename exchange operation
Btrfs: unpin logs if rename exchange operation fails
Btrfs: fix inode leak on failure to setup whiteout inode in rename
btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT
Btrfs: pin log earlier when renaming
Btrfs: unpin log if rename operation fails
Btrfs: don't do unnecessary delalloc flushes when relocating
Btrfs: don't wait for unrelated IO to finish before relocation
Btrfs: fix empty symlink after creating symlink and fsync parent dir
Btrfs: fix for incorrect directory entries after fsync log replay
btrfs: build fixup for qgroup_account_snapshot
btrfs: qgroup: Fix qgroup accounting when creating snapshot
Btrfs: fix fspath error deallocation
btrfs: make find_workspace warn if there are no workspaces
btrfs: make find_workspace always succeed
...
Pull mailbox updates from Jassi Brar:
"OMAP:
- Remove non-DT support from mailbox driver
- Move PM from client calls to native driver suspend/resume
- Trivial cleanups to make checkpatch happy
STI:
- Check return from devm_ioremap_resource as ERR_PTR, not NULL"
* 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: Fix devm_ioremap_resource error detection code
mailbox/omap: kill omap_mbox_{save/restore}_ctx() functions
mailbox/omap: check for any unread messages during suspend
mailbox/omap: add support for suspend/resume
mailbox/omap: store mailbox interrupt type in omap_mbox_device
mailbox/omap: add blank lines after declarations
mailbox/omap: remove FSF mailing address paragraph
mailbox/omap: use variable name for sizeof() operator
mailbox/omap: drop legacy platform device support
Merge more updates from Andrew Morton:
- the rest of MM
- KASAN updates
- procfs updates
- exit, fork updates
- printk updates
- lib/ updates
- radix-tree testsuite updates
- checkpatch updates
- kprobes updates
- a few other misc bits
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
samples/kprobes: print out the symbol name for the hooks
samples/kprobes: add a new module parameter
kprobes: add the "tls" argument for j_do_fork
init/main.c: simplify initcall_blacklisted()
fs/efs/super.c: fix return value
checkpatch: improve --git <commit-count> shortcut
checkpatch: reduce number of `git log` calls with --git
checkpatch: add support to check already applied git commits
checkpatch: add --list-types to show message types to show or ignore
checkpatch: advertise the --fix and --fix-inplace options more
checkpatch: whine about ACCESS_ONCE
checkpatch: add test for keywords not starting on tabstops
checkpatch: improve CONSTANT_COMPARISON test for structure members
checkpatch: add PREFER_IS_ENABLED test
lib/GCD.c: use binary GCD algorithm instead of Euclidean
radix-tree: free up the bottom bit of exceptional entries for reuse
dax: move RADIX_DAX_ definitions to dax.c
radix-tree: make radix_tree_descend() more useful
radix-tree: introduce radix_tree_replace_clear_tags()
radix-tree: tidy up __radix_tree_create()
...
Here's the big staging and iio driver update for 4.7-rc1.
I think we almost broke even with this release, only adding a few more
lines than we removed, which isn't bad overall given that there's a
bunch of new iio drivers added. The Lustre developers seem to have
woken up from their sleep and have been doing a great job in cleaning up
the code and pruning unused or old cruft, the filesystem is almost
readable :)
Other than that, just a lot of basic coding style cleanups in the churn.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlc/00QACgkQMUfUDdst+ynXYQCdG9oEsw4CCItbjGfQau5YVGbd
TOcAnA19tZz+Wcg3sLT8Zsm979dgVvDt
=9UG/
-----END PGP SIGNATURE-----
Merge tag 'staging-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver updates from Greg KH:
"Here's the big staging and iio driver update for 4.7-rc1.
I think we almost broke even with this release, only adding a few more
lines than we removed, which isn't bad overall given that there's a
bunch of new iio drivers added.
The Lustre developers seem to have woken up from their sleep and have
been doing a great job in cleaning up the code and pruning unused or
old cruft, the filesystem is almost readable :)
Other than that, just a lot of basic coding style cleanups in the
churn. All have been in linux-next for a while with no reported
issues"
* tag 'staging-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (938 commits)
Staging: emxx_udc: emxx_udc: fixed coding style issue
staging/gdm724x: fix "alignment should match open parenthesis" issues
staging/gdm724x: Fix avoid CamelCase
staging: unisys: rename misleading var ii with frag
staging: unisys: visorhba: switch success handling to error handling
staging: unisys: visorhba: main path needs to flow down the left margin
staging: unisys: visorinput: handle_locking_key() simplifications
staging: unisys: visorhba: fail gracefully for thread creation failures
staging: unisys: visornic: comment restructuring and removing bad diction
staging: unisys: fix format string %Lx to %llx for u64
staging: unisys: remove unused struct members
staging: unisys: visorchannel: correct variable misspelling
staging: unisys: visorhba: replace functionlike macro with function
staging: dgnc: Need to check for NULL of ch
staging: dgnc: remove redundant condition check
staging: dgnc: fix 'line over 80 characters'
staging: dgnc: clean up the dgnc_get_modem_info()
staging: lustre: lnet: enable configuration per NI interface
staging: lustre: o2iblnd: properly set ibr_why
staging: lustre: o2iblnd: remove last of kiblnd_tunables_fini
...
This reverts commit 5a023cdba5.
The functionality is superseded by the new "Device DAX" facility.
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Here's the "big" driver core update for 4.7-rc1.
Mostly just debugfs changes, the long-known and messy races with removing
debugfs files should be fixed thanks to the great work of Nicolai Stange. We
also have some isa updates in here (the x86 maintainers told me to take it
through this tree), a new warning when we run out of dynamic char major
numbers, and a few other assorted changes, details in the shortlog.
All have been in linux-next for some time with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlc/0mwACgkQMUfUDdst+ynjXACgjNxR5nMUiM8ZuuD0i4Xj7VXd
hnIAoM08+XDCv41noGdAcKv+2WZVZWMC
=i+0H
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here's the "big" driver core update for 4.7-rc1.
Mostly just debugfs changes, the long-known and messy races with
removing debugfs files should be fixed thanks to the great work of
Nicolai Stange. We also have some isa updates in here (the x86
maintainers told me to take it through this tree), a new warning when
we run out of dynamic char major numbers, and a few other assorted
changes, details in the shortlog.
All have been in linux-next for some time with no reported issues"
* tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
Revert "base: dd: don't remove driver_data in -EPROBE_DEFER case"
gpio: ws16c48: Utilize the ISA bus driver
gpio: 104-idio-16: Utilize the ISA bus driver
gpio: 104-idi-48: Utilize the ISA bus driver
gpio: 104-dio-48e: Utilize the ISA bus driver
watchdog: ebc-c384_wdt: Utilize the ISA bus driver
iio: stx104: Utilize the module_isa_driver and max_num_isa_dev macros
iio: stx104: Add X86 dependency to STX104 Kconfig option
Documentation: Add ISA bus driver documentation
isa: Implement the max_num_isa_dev macro
isa: Implement the module_isa_driver macro
pnp: pnpbios: Add explicit X86_32 dependency to PNPBIOS
isa: Decouple X86_32 dependency from the ISA Kconfig option
driver-core: use 'dev' argument in dev_dbg_ratelimited stub
base: dd: don't remove driver_data in -EPROBE_DEFER case
kernfs: Move faulting copy_user operations outside of the mutex
devcoredump: add scatterlist support
debugfs: unproxify files created through debugfs_create_u32_array()
debugfs: unproxify files created through debugfs_create_blob()
debugfs: unproxify files created through debugfs_create_bool()
...
Here's the big char and misc driver update for 4.7-rc1.
Lots of different tiny driver subsystems have updates here with new
drivers and functionality. Details in the shortlog.
All have been in linux-next with no reported issues for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlc/0YYACgkQMUfUDdst+ynmtACeLpLLKZsy1v7WfkW92cLSOPBD
2C8AoLFPKoh55rlOJrNz3bW9ANAaOloX
=/nsL
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
"Here's the big char and misc driver update for 4.7-rc1.
Lots of different tiny driver subsystems have updates here with new
drivers and functionality. Details in the shortlog.
All have been in linux-next with no reported issues for a while"
* tag 'char-misc-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (125 commits)
mcb: Delete num_cells variable which is not required
mcb: Fixed bar number assignment for the gdd
mcb: Replace ioremap and request_region with the devm version
mcb: Implement bus->dev.release callback
mcb: export bus information via sysfs
mcb: Correctly initialize the bus's device
mei: bus: call mei_cl_read_start under device lock
coresight: etb10: adjust read pointer only when needed
coresight: configuring ETF in FIFO mode when acting as link
coresight: tmc: implementing TMC-ETF AUX space API
coresight: moving struct cs_buffers to header file
coresight: tmc: keep track of memory width
coresight: tmc: make sysFS and Perf mode mutually exclusive
coresight: tmc: dump system memory content only when needed
coresight: tmc: adding mode of operation for link/sinks
coresight: tmc: getting rid of multiple read access
coresight: tmc: allocating memory when needed
coresight: tmc: making prepare/unprepare functions generic
coresight: tmc: splitting driver in ETB/ETF and ETR components
coresight: tmc: cleaning up header file
...
Here's the big pull request for USB and PHY drivers for 4.7-rc1
Full details in the shortlog, but it's the normal major gadget driver
updates, phy updates, new usbip code, as well as a bit of lots of other
stuff.
All have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlc/0P8ACgkQMUfUDdst+ykkFQCg0kJlxIiCU1FYBZYriqo4vX3F
9N8AoM/8nO8Y6vMpF2LWnamafYgqscTE
=ZuCh
-----END PGP SIGNATURE-----
Merge tag 'usb-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB updates from Greg KH:
"Here's the big pull request for USB and PHY drivers for 4.7-rc1
Full details in the shortlog, but it's the normal major gadget driver
updates, phy updates, new usbip code, as well as a bit of lots of
other stuff.
All have been in linux-next with no reported issues"
* tag 'usb-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (164 commits)
USB: serial: ti_usb_3410_5052: add MOXA UPORT 11x0 support
USB: serial: fix minor-number allocation
USB: serial: quatech2: fix use-after-free in probe error path
USB: serial: mxuport: fix use-after-free in probe error path
USB: serial: keyspan: fix debug and error messages
USB: serial: keyspan: fix URB unlink
USB: serial: keyspan: fix use-after-free in probe error path
USB: serial: io_edgeport: fix memory leaks in probe error path
USB: serial: io_edgeport: fix memory leaks in attach error path
usb: Remove unnecessary space before operator ','.
usb: Remove unnecessary space before open square bracket.
USB: FHCI: avoid redundant condition
usb: host: xhci-rcar: Avoid long wait in xhci_reset()
usb/host/fotg210: remove dead code in create_sysfs_files
usb: wusbcore: Do not initialise statics to 0.
usb: wusbcore: Remove space before ',' and '(' .
USB: serial: cp210x: clean up CRTSCTS flag code
USB: serial: cp210x: get rid of magic numbers in CRTSCTS flag code
USB: serial: cp210x: fix hardware flow-control disable
USB: serial: option: add even more ZTE device ids
...
Here's the large TTY and Serial driver update for 4.7-rc1.
A few new serial drivers are added here, and Peter has fixed a bunch of
long-standing bugs in the tty layer and serial drivers as normal. Full
details in the shortlog.
All of these have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlc/0/oACgkQMUfUDdst+ynzyQCgsa54VNijdAzU6AA5HEfqmf2M
cGMAn1boH7hUWlAbJmzzihx4JASoGjYW
=V5VH
-----END PGP SIGNATURE-----
Merge tag 'tty-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty and serial driver updates from Greg KH:
"Here's the large TTY and Serial driver update for 4.7-rc1.
A few new serial drivers are added here, and Peter has fixed a bunch
of long-standing bugs in the tty layer and serial drivers as normal.
Full details in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (88 commits)
MAINTAINERS: 8250: remove website reference
serial: core: Fix port mutex assert if lockdep disabled
serial: 8250_dw: fix wrong logic in dw8250_check_lcr()
tty: vt, finish looping on duplicate
tty: vt, return error when con_startup fails
QE-UART: add "fsl,t1040-ucc-uart" to of_device_id
serial: mctrl_gpio: Drop support for out1-gpios and out2-gpios
serial: 8250dw: Add device HID for future AMD UART controller
Fix OpenSSH pty regression on close
serial: mctrl_gpio: add IRQ locking
serial: 8250: Integrate Fintek into 8250_base
serial: mps2-uart: add support for early console
serial: mps2-uart: add MPS2 UART driver
dt-bindings: document the MPS2 UART bindings
serial: sirf: Use generic uart-has-rtscts DT property
serial: sirf: Introduce helper variable struct device_node *np
serial: mxs-auart: Use generic uart-has-rtscts DT property
serial: imx: Use generic uart-has-rtscts DT property
doc: DT: Add Generic Serial Device Tree Bindings
serial: 8250: of: Make tegra_serial_handle_break() static
...
do have a couple core changes in here as well.
Core:
- CLK_IS_CRITICAL support has been added. This should allow drivers
to properly express that a certain clk should stay on even if
their prepare/enable count drops to 0 (and in turn the parents of
these clks should stay enabled).
- A clk registration API has been added, clk_hw_register(), and
an OF clk provider API has been added, of_clk_add_hw_provider().
These APIs have been put in place to further split clk providers
from clk consumers, with the goal being to have clk providers
never deal with struct clk pointers at all. Conversion of provider
drivers is on going. clkdev has also gained support for registering
clk_hw pointers directly so we can convert drivers that don't use
devicetree.
New Drivers:
- Marvell ap806 and cp110 system controllers (with clks inside!)
- Hisilicon Hi3519 clock and reset controller
- Axis ARTPEC-6 clock controllers
- Oxford Semiconductor OXNAS clock controllers
- AXS10X I2S PLL
- Rockchip RK3399 clock and reset controller
Updates:
- MMC2 and UART2 clks on Samsung Exynos 3250, ACLK on Samsung Exynos 542x
SoCs, and some more clk ID exporting for bus frequency scaling
- Proper BCM2835 PCM clk support and various other clks
- i.MX clk updates for i.MX6SX, i.MX7, and VF610
- Renesas updates for R-Car H3
- Tegra210 got updates for DisplayPort and HDMI 2.0
- Rockchip driver refactorings and fixes due to adding RK3399 support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXP7QdAAoJEK0CiJfG5JUl/Q8P/i93QXTom/VbwDHZ4DDZr0Hc
69oCRVTDTArGLa4YrGMxu3crNWf8/ORwsZVG93PD6bkkrWo9qH52KFsI22MdZcta
HlApsFjI503C7qDw6V8UVz7mUJVfarCxKNSd1WBPCVCNExarIrRRymC3NXT6ZrUP
D59E53d4G+I6OUuybsp4gtA7aEoYebAE7BInPDDihIk7Lall5mLYbfJUumpHlmSd
wqqPad5OYoC1nkrYhIGficK9Bizy3eyK829EoqpQpE4djkNhEwKd/AwSJZ6i1pdC
obt8vQyPRK0ByND2I+3XPqZ7bFb9IKu5WIAkYzG8QskFyIqiFtOkFgEP360ojlGT
D8sZY7RBmIM4Tu5RgeoN94wML4f/zYOm6YzVUVjWdVPGoxuy4QhQsvS5Id70ifNU
pSYf1KG0Gq0wvptth02zaDE9r1lDMOCHsOPIbVMqHRxRj8shUyjroTEzdtdyS6SE
FsYmGdrq4YctXyP4E8efLzFMjN7qZyKgnAoGfROsPRb6NE3DSFs5PcxQldOcoBPv
+NstBGUlJ4Xzwd1BdxKWJq8aIsG/CLqTec63OYSYM0bfUSWXKOgemvBV8MJrDP1D
rFabdJVHhUZOy5UgxOdfmy1XWp/SWup8OUnpEJp84RywGP6UMM0s1RtWruMJ776J
tBzVIIYCJrAWFia0Djlr
=aEzb
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"It's the usual big pile of driver updates and additions, but we do
have a couple core changes in here as well.
Core:
- CLK_IS_CRITICAL support has been added. This should allow drivers
to properly express that a certain clk should stay on even if their
prepare/enable count drops to 0 (and in turn the parents of these
clks should stay enabled).
- A clk registration API has been added, clk_hw_register(), and an OF
clk provider API has been added, of_clk_add_hw_provider(). These
APIs have been put in place to further split clk providers from clk
consumers, with the goal being to have clk providers never deal
with struct clk pointers at all. Conversion of provider drivers is
on going. clkdev has also gained support for registering clk_hw
pointers directly so we can convert drivers that don't use
devicetree.
New Drivers:
- Marvell ap806 and cp110 system controllers (with clks inside!)
- Hisilicon Hi3519 clock and reset controller
- Axis ARTPEC-6 clock controllers
- Oxford Semiconductor OXNAS clock controllers
- AXS10X I2S PLL
- Rockchip RK3399 clock and reset controller
Updates:
- MMC2 and UART2 clks on Samsung Exynos 3250, ACLK on Samsung Exynos
542x SoCs, and some more clk ID exporting for bus frequency scaling
- Proper BCM2835 PCM clk support and various other clks
- i.MX clk updates for i.MX6SX, i.MX7, and VF610
- Renesas updates for R-Car H3
- Tegra210 got updates for DisplayPort and HDMI 2.0
- Rockchip driver refactorings and fixes due to adding RK3399 support"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (139 commits)
clk: fix critical clock locking
clk: qcom: mmcc-8996: Remove clocks that should be controlled by RPM
clk: ingenic: Allow divider value to be divided
clk: sunxi: Add display and TCON0 clocks driver
clk: rockchip: drop old_rate calculation on pll rate changes
clk: rockchip: simplify GRF handling in pll clocks
clk: rockchip: lookup General Register Files in rockchip_clk_init
clk: rockchip: fix the rk3399 sdmmc sample / drv name
clk: mvebu: new driver for Armada CP110 system controller
dt-bindings: arm: add DT binding for Marvell CP110 system controller
clk: mvebu: new driver for Armada AP806 system controller
clk: hisilicon: add CRG driver for hi3519 soc
clk: hisilicon: export some hisilicon APIs to modules
reset: hisilicon: add reset controller driver for hisilicon SOCs
clk: bcm/kona: Do not use sizeof on pointer type
clk: qcom: msm8916: Fix crypto clock flags
clk: nxp: lpc18xx: Initialize clk_init_data::flags to 0
clk/axs10x: Add I2S PLL clock driver
clk: imx7d: fix ahb clock mux 1
clk: fix comment of devm_clk_hw_register()
...
Pull networking fixes and more updates from David Miller:
1) Tunneling fixes from Tom Herbert and Alexander Duyck.
2) AF_UNIX updates some struct sock bit fields with the socket lock,
whereas setsockopt() sets overlapping ones with locking. Seperate
out the synchronized vs. the AF_UNIX unsynchronized ones to avoid
corruption. From Andrey Ryabinin.
3) Mount BPF filesystem with mount_nodev rather than mount_ns, from
Eric Biederman.
4) A couple kmemdup conversions, from Muhammad Falak R Wani.
5) BPF verifier fixes from Alexei Starovoitov.
6) Don't let tunneled UDP packets get stuck in socket queues, if
something goes wrong during the encapsulation just drop the packet
rather than signalling an error up the call stack. From Hannes
Frederic Sowa.
7) SKB ref after free in batman-adv, from Florian Westphal.
8) TCP iSCSI, ocfs2, rds, and tipc have to disable BH in it's TCP
callbacks since the TCP stack runs pre-emptibly now. From Eric
Dumazet.
9) Fix crash in fixed_phy_add, from Rabin Vincent.
10) Fix length checks in xen-netback, from Paul Durrant.
11) Fix mixup in KEY vs KEYID macsec attributes, from Sabrina Dubroca.
12) RDS connection spamming bug fixes from Sowmini Varadhan
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits)
net: suppress warnings on dev_alloc_skb
uapi glibc compat: fix compilation when !__USE_MISC in glibc
udp: prevent skbs lingering in tunnel socket queues
bpf: teach verifier to recognize imm += ptr pattern
bpf: support decreasing order in direct packet access
net: usb: ch9200: use kmemdup
ps3_gelic: use kmemdup
net:liquidio: use kmemdup
bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
net: cdc_ncm: update datagram size after changing mtu
tuntap: correctly wake up process during uninit
intel: Add support for IPv6 IP-in-IP offload
ip6_gre: Do not allow segmentation offloads GRE_CSUM is enabled with FOU/GUE
RDS: TCP: Avoid rds connection churn from rogue SYNs
RDS: TCP: rds_tcp_accept_worker() must exit gracefully when terminating rds-tcp
net: sock: move ->sk_shutdown out of bitfields.
ipv6: Don't reset inner headers in ip6_tnl_xmit
ip4ip6: Support for GSO/GRO
ip6ip6: Support for GSO/GRO
ipv6: Set features for IPv6 tunnels
...
Similar to commits:
51d7d5205d ("powerpc: Add smp_mb() to arch_spin_is_locked()")
d86b8da04d ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
qspinlock suffers from the fact that the _Q_LOCKED_VAL store is
unordered inside the ACQUIRE of the lock.
And while this is not a problem for the regular mutual exclusive
critical section usage of spinlocks, it breaks creative locking like:
spin_lock(A) spin_lock(B)
spin_unlock_wait(B) if (!spin_is_locked(A))
do_something() do_something()
In that both CPUs can end up running do_something at the same time,
because our _Q_LOCKED_VAL store can drop past the spin_unlock_wait()
spin_is_locked() loads (even on x86!!).
To avoid making the normal case slower, add smp_mb()s to the less used
spin_unlock_wait() / spin_is_locked() side of things to avoid this
problem.
Reported-and-tested-by: Davidlohr Bueso <dave@stgolabs.net>
Reported-by: Giovanni Gherdovich <ggherdovich@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # v4.2 and later
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are guaranteed that pointers to radix_tree_nodes always have the
bottom two bits clear (because they come from a slab cache, and slab
caches have a minimum alignment of sizeof(void *)), so we can redefine
'radix_tree_is_internal_node' to only return true if the bottom two bits
have value '01'. This frees up one quarter of the potential values for
use by the user.
Idea from Neil Brown.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Suggested-by: Neil Brown <neilb@suse.de>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These don't belong in radix-tree.h any more than PAGECACHE_TAG_* do.
Let's try to maintain the idea that radix-tree simply implements an
abstract data type.
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In addition to replacing the entry, we also clear all associated tags.
This is really a one-off special for page_cache_tree_delete() which had
far too much detailed knowledge about how the radix tree works.
For efficiency, factor node_tag_clear() out of radix_tree_tag_clear() It
can be used by radix_tree_delete_item() as well as
radix_tree_replace_clear_tags().
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As with indirect_to_ptr(), ptr_to_indirect() and
RADIX_TREE_INDIRECT_PTR, change radix_tree_is_indirect_ptr() to
radix_tree_is_internal_node().
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mirrors the earlier commit introducing node_to_entry().
Also change the type returned to be a struct radix_tree_node pointer.
That lets us simplify a couple of places in the radix tree shrink &
extend paths where we could convert an entry into a pointer, modify the
node, then convert the pointer back into an entry.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The name RADIX_TREE_INDIRECT_PTR doesn't really match the meaning.
RADIX_TREE_INTERNAL_NODE is a better name.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only remaining references to root->height were in extend and shrink,
where it was updated. Now we can remove it entirely.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
node->shift represents the shift necessary for looking in the slots
array at this level. It is equal to the old (node->height - 1) *
RADIX_TREE_MAP_SHIFT.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Neither piece of information we're storing in node->path can be larger
than 64, so store each in its own unsigned char instead of shifting and
masking to store them both in an unsigned int.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This enables the macros radix_tree_for_each_slot() and friends to be
used with multi-order entries.
The way that this works is that we treat all entries in a given slots[]
array as a single chunk. If the index given to radix_tree_next_chunk()
happens to point us to a sibling entry, we will back up iter->index so
that it points to the canonical entry, and that will be the place where
we start our iteration.
As we're processing a chunk in radix_tree_next_slot(), we process
canonical entries, skip over sibling entries, and restart the chunk
lookup if we find a non-sibling indirect pointer. This drops back to
the radix_tree_next_chunk() code, which will re-walk the tree and look
for another chunk.
This allows us to properly handle multi-order entries mixed with other
entries that are at various heights in the radix tree.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
radix_tree_for_each_chunk() and radix_tree_for_each_chunk_slot() have
never been used in the kernel since their introduction in 2012, so
remove them.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The defines in regression2.c are already in radix-tree.h and duplicating
them in the test case makes experimenting with other values for the
fan-out harder than necessary. Allow the user of the radix tree to
decide what the fan-out should be rather than fixing it to 8 for
non-kernel uses.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit e614523653 ("radix_tree: add support for multi-order entries")
left the impression that the support for multiorder radix tree entries
was functional. As soon as Ross tried to use it, it became apparent
that my testing was completely inadequate, and it didn't even work a
little bit for orders that were not a multiple of shift.
This series of patches is the result of about 6 weeks of redesign,
reimplementation, testing, arguing and hair-pulling. The great news is
that the test-suite is now far better than it was. That's reflected in
the diffstat for the test-suite alone:
12 files changed, 436 insertions(+), 28 deletions(-)
The highlight for users of the tree is that the restriction on the order
of inserted entries being >= RADIX_TREE_MAP_SHIFT is now gone; the radix
tree now supports any order between 0 and 64.
For those who are interested in how the tree works, patch 9 is probably
the most interesting one as it introduces the new machinery for handling
sibling entries.
I've tried to be fair in attributing authorship to the person who
contributed the majority of the code in each patch; Ross has been an
invaluable partner in the development of this support and it's fair to
say that each of us has code in every commit.
I should also express my appreciation of the 0day testing. It prompted
me that I was bloating the tinyconfig in an unacceptable way, and it
bisected to a commit which contained a rather nasty memory-corruption
bug.
This patch (of 29):
The irqdomain code was checking for 0 or 1 entries, not 0 entries like
the comment said they were. Introduce a new helper that will actually
check for an empty tree.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Generic UUID library defines structure type, macro to define UUID, and
the length of the UUID string. This patch removes duplicate data
structure definition, UUID string length constant as well as macro for
UUID handling.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no point in keeping an address in the file since it's subject
to change.
While here, update Intel Copyright years.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are new helpers in this patch:
uuid_is_valid checks if a UUID is valid
uuid_be_to_bin converts from string to binary (big endian)
uuid_le_to_bin converts from string to binary (little endian)
They will be used in future, i.e. in the following patches in the series.
This also moves the indices arrays to lib/uuid.c to be shared accross
modules.
[andriy.shevchenko@linux.intel.com: fix typo]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In NMI context, printk() messages are stored into per-CPU buffers to
avoid a possible deadlock. They are normally flushed to the main ring
buffer via an IRQ work. But the work is never called when the system
calls panic() in the very same NMI handler.
This patch tries to flush NMI buffers before the crash dump is
generated. In this case it does not risk a double release and bails out
when the logbuf_lock is already taken. The aim is to get the messages
into the main ring buffer when possible. It makes them better
accessible in the vmcore.
Then the patch tries to flush the buffers second time when other CPUs
are down. It might be more aggressive and reset logbuf_lock. The aim
is to get the messages available for the consequent kmsg_dump() and
console_flush_on_panic() calls.
The patch causes vprintk_emit() to be called even in NMI context again.
But it is done via printk_deferred() so that the console handling is
skipped. Consoles use internal locks and we could not prevent a
deadlock easily. They are explicitly called later when the crash dump
is not generated, see console_flush_on_panic().
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
printk() takes some locks and could not be used a safe way in NMI
context.
The chance of a deadlock is real especially when printing stacks from
all CPUs. This particular problem has been addressed on x86 by the
commit a9edc88093 ("x86/nmi: Perform a safe NMI stack trace on all
CPUs").
The patchset brings two big advantages. First, it makes the NMI
backtraces safe on all architectures for free. Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited. We still should keep the number of messages in NMI context at
minimum).
Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers. These are not easy to avoid.
This patch reuses most of the code and makes it generic. It is useful
for all messages and architectures that support NMI.
The alternative printk_func is set when entering and is reseted when
leaving NMI context. It queues IRQ work to copy the messages into the
main ring buffer in a safe context.
__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers. There is also used a spinlock to get synchronized with other
flushers.
We do not longer use seq_buf because it depends on external lock. It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.
The code is put into separate printk/nmi.c as suggested by Steven
Rostedt. It needs a per-CPU buffer and is compiled only on
architectures that call nmi_enter(). This is achieved by the new
HAVE_NMI Kconfig flag.
The are MN10300 and Xtensa architectures. We need to clean up NMI
handling there first. Let's do it separately.
The patch is heavily based on the draft from Peter Zijlstra, see
https://lkml.org/lkml/2015/6/10/327
[arnd@arndb.de: printk-nmi: use %zu format string for size_t]
[akpm@linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> [arm part]
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In include/linux/syscalls.h, the four functions sys_kill, sys_tgkill,
sys_tkill and sys_rt_sigqueueinfo are declared with "int pid" and "int
tgid".
However, in kernel/signal.c, the corresponding definitions use the more
appropriate "pid_t" (which is a typedef'd int).
This patch changes "int" to "pid_t" in the declarations of sys_kill,
sys_tgkill, sys_tkill and sys_rt_sigqueueinfo in <linux/syscalls.h> in
order to harmonize the function declarations with their respective
definitions.
Link: http://lkml.kernel.org/r/57302FDA.7020205@renenyffenegger.ch
Signed-off-by: René Nyffenegger <mail@renenyffenegger.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Cc: Zach Brown <zab@redhat.com>
Cc: Milosz Tanski <milosz@adfin.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to call exit_thread from copy_process in a fail path. So make it
accept task_struct as a parameter.
[v2]
* s390: exit_thread_runtime_instr doesn't make sense to be called for
non-current tasks.
* arm: fix the comment in vfp_thread_copy
* change 'me' to 'tsk' for task_struct
* now we can change only archs that actually have exit_thread
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define HAVE_EXIT_THREAD for archs which want to do something in
exit_thread. For others, let's define exit_thread as an empty inline.
This is a cleanup before we change the prototype of exit_thread to
accept a task parameter.
[akpm@linux-foundation.org: fix mips]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to
zs_create_pool(), so we can be more flexible, but, more importantly, we
need this to switch zram to per-cpu compression streams -- zram will try
to allocate handle with preemption disabled in a fast path and switch to
a slow path (using different gfp mask) if the fast one has failed.
Apart from that, this also align zs_malloc() interface with zspool/zbud.
[sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory access coded in an assembly won't be seen by KASAN as a compiler
can instrument only C code. Add kasan_check_[read,write]() API which is
going to be used to check a certain memory range.
Link: http://lkml.kernel.org/r/1462538722-1574-3-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quarantine isolates freed objects in a separate queue. The objects are
returned to the allocator later, which helps to detect use-after-free
errors.
When the object is freed, its state changes from KASAN_STATE_ALLOC to
KASAN_STATE_QUARANTINE. The object is poisoned and put into quarantine
instead of being returned to the allocator, therefore every subsequent
access to that object triggers a KASAN error, and the error handler is
able to say where the object has been allocated and deallocated.
When it's time for the object to leave quarantine, its state becomes
KASAN_STATE_FREE and it's returned to the allocator. From now on the
allocator may reuse it for another allocation. Before that happens,
it's still possible to detect a use-after free on that object (it
retains the allocation/deallocation stacks).
When the allocator reuses this object, the shadow is unpoisoned and old
allocation/deallocation stacks are wiped. Therefore a use of this
object, even an incorrect one, won't trigger ASan warning.
Without the quarantine, it's not guaranteed that the objects aren't
reused immediately, that's why the probability of catching a
use-after-free is lower than with quarantine in place.
Quarantine isolates freed objects in a separate queue. The objects are
returned to the allocator later, which helps to detect use-after-free
errors.
Freed objects are first added to per-cpu quarantine queues. When a
cache is destroyed or memory shrinking is requested, the objects are
moved into the global quarantine queue. Whenever a kmalloc call allows
memory reclaiming, the oldest objects are popped out of the global queue
until the total size of objects in quarantine is less than 3/4 of the
maximum quarantine size (which is a fraction of installed physical
memory).
As long as an object remains in the quarantine, KASAN is able to report
accesses to it, so the chance of reporting a use-after-free is
increased. Once the object leaves quarantine, the allocator may reuse
it, in which case the object is unpoisoned and KASAN can't detect
incorrect accesses to it.
Right now quarantine support is only enabled in SLAB allocator.
Unification of KASAN features in SLAB and SLUB will be done later.
This patch is based on the "mm: kasan: quarantine" patch originally
prepared by Dmitry Chernenkov. A number of improvements have been
suggested by Andrey Ryabinin.
[glider@google.com: v9]
Link: http://lkml.kernel.org/r/1462987130-144092-1-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, faultaround code produces young pte. This can screw up
vmscan behaviour[1], as it makes vmscan think that these pages are hot
and not push them out on first round.
During sparse file access faultaround gets more pages mapped and all of
them are young. Under memory pressure, this makes vmscan swap out anon
pages instead, or to drop other page cache pages which otherwise stay
resident.
Modify faultaround to produce old ptes, so they can easily be reclaimed
under memory pressure.
This can to some extend defeat the purpose of faultaround on machines
without hardware accessed bit as it will not help us with reducing the
number of minor page faults.
We may want to disable faultaround on such machines altogether, but
that's subject for separate patchset.
Minchan:
"I tested 512M mmap sequential word read test on non-HW access bit
system (i.e., ARM) and confirmed it doesn't increase minor fault any
more.
old: 4096 fault_around
minor fault: 131291
elapsed time: 6747645 usec
new: 65536 fault_around
minor fault: 131291
elapsed time: 6709263 usec
0.56% benefit"
[1] https://lkml.kernel.org/r/1460992636-711-1-git-send-email-vinmenon@codeaurora.org
Link: http://lkml.kernel.org/r/1463488366-47723-1-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Tested-by: Minchan Kim <minchan@kernel.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 92923ca3aa ("mm: meminit: only set page reserved in the
memblock region") the reserved bit is set on reserved memblock regions.
However start and end address are passed as unsigned long. This is only
32bit on i386, so it can end up marking the wrong pages reserved for
ranges at 4GB and above.
This was observed on a 32bit Xen dom0 which was booted with initial
memory set to a value below 4G but allowing to balloon in memory
(dom0_mem=1024M for example). This would define a reserved bootmem
region for the additional memory (for example on a 8GB system there was
a reverved region covering the 4GB-8GB range). But since the addresses
were passed on as unsigned long, this was actually marking all pages
from 0 to 4GB as reserved.
Fixes: 92923ca3aa ("mm: meminit: only set page reserved in the memblock region")
Link: http://lkml.kernel.org/r/1463491221-10573-1-git-send-email-stefan.bader@canonical.com
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
userfaultfd_file_create() increments mm->mm_users; this means that the
memory won't be unmapped/freed if mm owner exits/execs, and UFFDIO_COPY
after that can populate the orphaned mm more.
Change userfaultfd_file_create() and userfaultfd_ctx_put() to use
mm->mm_count to pin mm_struct. This means that
atomic_inc_not_zero(mm->mm_users) is needed when we are going to
actually play with this memory. Except handle_userfault() path doesn't
need this, the caller must already have a reference.
The patch adds the new trivial helper, mmget_not_zero(), it can have
more users.
Link: http://lkml.kernel.org/r/20160516172254.GA8595@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
compound_mapcount() is only called after PageCompound() has already been
checked by the caller, so there's no point to check it again. Gcc may
optimize it away too because it's inline but this will remove the
runtime check for sure and add it'll add an assert instead.
Link: http://lkml.kernel.org/r/1462547040-1737-3-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct page->flags is unsigned long, so when shifting bits we should use
UL suffix to match it.
Found this problem after I added 64-bit CPU specific page flags and
failed to compile the kernel:
mm/page_alloc.c: In function '__free_one_page':
mm/page_alloc.c:672:2: error: integer overflow in expression [-Werror=overflow]
Link: http://lkml.kernel.org/r/1461971723-16187-1-git-send-email-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If SPARSEMEM, use page_ext in mem_section
if !SPARSEMEM, use page_ext in pgdata
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is used as a pure bool function within kernel source wide.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Macro HUGETLBFS_SB is clear enough, so one statement is clearer than 3
lines statements.
Remove redundant return statements for non-return functions, which can
save lines, at least.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
copy_page_to_iter_iovec() is currently the only user of
fault_in_pages_writeable(), and it definitely can use fragments from
high order pages.
Make sure fault_in_pages_writeable() is only touching two adjacent pages
at most, as claimed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When mixing lots of vmallocs and set_memory_*() (which calls
vm_unmap_aliases()) I encountered situations where the performance
degraded severely due to the walking of the entire vmap_area list each
invocation.
One simple improvement is to add the lazily freed vmap_area to a
separate lockless free list, such that we then avoid having to walk the
full list on each purge.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Roman Pen <r.peniaev@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Roman Pen <r.peniaev@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 3a5dda7a17 ("oom: prevent unnecessary oom kills or kernel
panics"), select_bad_process() is using for_each_process_thread().
Since oom_unkillable_task() scans all threads in the caller's thread
group and oom_task_origin() scans signal_struct of the caller's thread
group, we don't need to call oom_unkillable_task() and oom_task_origin()
on each thread. Also, since !mm test will be done later at
oom_badness(), we don't need to do !mm test on each thread. Therefore,
we only need to do TIF_MEMDIE test on each thread.
Although the original code was correct it was quite inefficient because
each thread group was scanned num_threads times which can be a lot
especially with processes with many threads. Even though the OOM is
extremely cold path it is always good to be as effective as possible
when we are inside rcu_read_lock() - aka unpreemptible context.
If we track number of TIF_MEMDIE threads inside signal_struct, we don't
need to do TIF_MEMDIE test on each thread. This will allow
select_bad_process() to use for_each_process().
This patch adds a counter to signal_struct for tracking how many
TIF_MEMDIE threads are in a given thread group, and check it at
oom_scan_process_thread() so that select_bad_process() can use
for_each_process() rather than for_each_process_thread().
[mhocko@suse.com: do not blow the signal_struct size]
Link: http://lkml.kernel.org/r/20160520075035.GF19172@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/201605182230.IDC73435.MVSOHLFOQFOJtF@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
task_will_free_mem is a misnomer for a more complex PF_EXITING test for
early break out from the oom killer because it is believed that such a
task would release its memory shortly and so we do not have to select an
oom victim and perform a disruptive action.
Currently we make sure that the given task is not participating in the
core dumping because it might get blocked for a long time - see commit
d003f371b2 ("oom: don't assume that a coredumping thread will exit
soon").
The check can still do better though. We shouldn't consider the task
unless the whole thread group is going down. This is rather unlikely
but not impossible. A single exiting thread would surely leave all the
address space behind. If we are really unlucky it might get stuck on
the exit path and keep its TIF_MEMDIE and so block the oom killer.
Link: http://lkml.kernel.org/r/1460452756-15491-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo has properly noted that mmput slow path might get blocked waiting
for another party (e.g. exit_aio waits for an IO). If that happens the
oom_reaper would be put out of the way and will not be able to process
next oom victim. We should strive for making this context as reliable
and independent on other subsystems as much as possible.
Introduce mmput_async which will perform the slow path from an async
(WQ) context. This will delay the operation but that shouldn't be a
problem because the oom_reaper has reclaimed the victim's address space
for most cases as much as possible and the remaining context shouldn't
bind too much memory anymore. The only exception is when mmap_sem
trylock has failed which shouldn't happen too often.
The issue is only theoretical but not impossible.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 36324a990c ("oom: clear TIF_MEMDIE after oom_reaper managed to
unmap the address space") not only clears TIF_MEMDIE for oom reaped task
but also set OOM_SCORE_ADJ_MIN for the target task to hide it from the
oom killer. This works in simple cases but it is not sufficient for
(unlikely) cases where the mm is shared between independent processes
(as they do not share signal struct). If the mm had only small amount
of memory which could be reaped then another task sharing the mm could
be selected and that wouldn't help to move out from the oom situation.
Introduce MMF_OOM_REAPED mm flag which is checked in oom_badness (same
as OOM_SCORE_ADJ_MIN) and task is skipped if the flag is set. Set the
flag after __oom_reap_task is done with a task. This will force the
select_bad_process() to ignore all already oom reaped tasks as well as
no such task is sacrificed for its parent.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"mm: consider compaction feedback also for costly allocation" has
removed the upper bound for the reclaim/compaction retries based on the
number of reclaimed pages for costly orders. While this is desirable
the patch did miss a mis interaction between reclaim, compaction and the
retry logic. The direct reclaim tries to get zones over min watermark
while compaction backs off and returns COMPACT_SKIPPED when all zones
are below low watermark + 1<<order gap. If we are getting really close
to OOM then __compaction_suitable can keep returning COMPACT_SKIPPED a
high order request (e.g. hugetlb order-9) while the reclaim is not able
to release enough pages to get us over low watermark. The reclaim is
still able to make some progress (usually trashing over few remaining
pages) so we are not able to break out from the loop.
I have seen this happening with the same test described in "mm: consider
compaction feedback also for costly allocation" on a swapless system.
The original problem got resolved by "vmscan: consider classzone_idx in
compaction_ready" but it shows how things might go wrong when we
approach the oom event horizont.
The reason why compaction requires being over low rather than min
watermark is not clear to me. This check was there essentially since
56de7263fc ("mm: compaction: direct compact when a high-order
allocation fails"). It is clearly an implementation detail though and
we shouldn't pull it into the generic retry logic while we should be
able to cope with such eventuality. The only place in
should_compact_retry where we retry without any upper bound is for
compaction_withdrawn() case.
Introduce compaction_zonelist_suitable function which checks the given
zonelist and returns true only if there is at least one zone which would
would unblock __compaction_suitable if more memory got reclaimed. In
this implementation it checks __compaction_suitable with NR_FREE_PAGES
plus part of the reclaimable memory as the target for the watermark
check. The reclaimable memory is reduced linearly by the allocation
order. The idea is that we do not want to reclaim all the remaining
memory for a single allocation request just unblock
__compaction_suitable which doesn't guarantee we will make a further
progress.
The new helper is then used if compaction_withdrawn() feedback was
provided so we do not retry if there is no outlook for a further
progress. !costly requests shouldn't be affected much - e.g. order-2
pages would require to have at least 64kB on the reclaimable LRUs while
order-9 would need at least 32M which should be enough to not lock up.
[vbabka@suse.cz: fix classzone_idx vs. high_zoneidx usage in compaction_zonelist_suitable]
[akpm@linux-foundation.org: fix it for Mel's mm-page_alloc-remove-field-from-alloc_context.patch]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__alloc_pages_slowpath has traditionally relied on the direct reclaim
and did_some_progress as an indicator that it makes sense to retry
allocation rather than declaring OOM. shrink_zones had to rely on
zone_reclaimable if shrink_zone didn't make any progress to prevent from
a premature OOM killer invocation - the LRU might be full of dirty or
writeback pages and direct reclaim cannot clean those up.
zone_reclaimable allows to rescan the reclaimable lists several times
and restart if a page is freed. This is really subtle behavior and it
might lead to a livelock when a single freed page keeps allocator
looping but the current task will not be able to allocate that single
page. OOM killer would be more appropriate than looping without any
progress for unbounded amount of time.
This patch changes OOM detection logic and pulls it out from shrink_zone
which is too low to be appropriate for any high level decisions such as
OOM which is per zonelist property. It is __alloc_pages_slowpath which
knows how many attempts have been done and what was the progress so far
therefore it is more appropriate to implement this logic.
The new heuristic is implemented in should_reclaim_retry helper called
from __alloc_pages_slowpath. It tries to be more deterministic and
easier to follow. It builds on an assumption that retrying makes sense
only if the currently reclaimable memory + free pages would allow the
current allocation request to succeed (as per __zone_watermark_ok) at
least for one zone in the usable zonelist.
This alone wouldn't be sufficient, though, because the writeback might
get stuck and reclaimable pages might be pinned for a really long time
or even depend on the current allocation context. Therefore there is a
backoff mechanism implemented which reduces the reclaim target after
each reclaim round without any progress. This means that we should
eventually converge to only NR_FREE_PAGES as the target and fail on the
wmark check and proceed to OOM. The backoff is simple and linear with
1/16 of the reclaimable pages for each round without any progress. We
are optimistic and reset counter for successful reclaim rounds.
Costly high order pages mostly preserve their semantic and those without
__GFP_REPEAT fail right away while those which have the flag set will
back off after the amount of reclaimable pages reaches equivalent of the
requested order. The only difference is that if there was no progress
during the reclaim we rely on zone watermark check. This is more
logical thing to do than previous 1<<order attempts which were a result
of zone_reclaimable faking the progress.
[vdavydov@virtuozzo.com: check classzone_idx for shrink_zone]
[hannes@cmpxchg.org: separate the heuristic into should_reclaim_retry]
[rientjes@google.com: use zone_page_state_snapshot for NR_FREE_PAGES]
[rientjes@google.com: shrink_zones doesn't need to return anything]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Compaction can provide a wild variation of feedback to the caller. Many
of them are implementation specific and the caller of the compaction
(especially the page allocator) shouldn't be bound to specifics of the
current implementation.
This patch abstracts the feedback into three basic types:
- compaction_made_progress - compaction was active and made some
progress.
- compaction_failed - compaction failed and further attempts to
invoke it would most probably fail and therefore it is not
worth retrying
- compaction_withdrawn - compaction wasn't invoked for an
implementation specific reasons. In the current implementation
it means that the compaction was deferred, contended or the
page scanners met too early without any progress. Retrying is
still worthwhile.
[vbabka@suse.cz: do not change thp back off behavior]
[akpm@linux-foundation.org: fix typo in comment, per Hillf]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
compaction_result will be used as the primary feedback channel for
compaction users. At the same time try_to_compact_pages (and
potentially others) assume a certain ordering where a more specific
feedback takes precendence.
This gets a bit awkward when we have conflicting feedback from different
zones. E.g one returing COMPACT_COMPLETE meaning the full zone has been
scanned without any outcome while other returns with COMPACT_PARTIAL aka
made some progress. The caller should get COMPACT_PARTIAL because that
means that the compaction still can make some progress. The same
applies for COMPACT_PARTIAL vs COMPACT_PARTIAL_SKIPPED.
Reorder PARTIAL to be the largest one so the larger the value is the
more progress we have done.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
COMPACT_COMPLETE now means that compaction and free scanner met. This
is not very useful information if somebody just wants to use this
feedback and make any decisions based on that. The current caller might
be a poor guy who just happened to scan tiny portion of the zone and
that could be the reason no suitable pages were compacted. Make sure we
distinguish the full and partial zone walks.
Consumers should treat COMPACT_PARTIAL_SKIPPED as a potential success
and be optimistic in retrying.
The existing users of COMPACT_COMPLETE are conservatively changed to use
COMPACT_PARTIAL_SKIPPED as well but some of them should be probably
reconsidered and only defer the compaction only for COMPACT_COMPLETE
with the new semantic.
This patch shouldn't introduce any functional changes.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
try_to_compact_pages() can currently return COMPACT_SKIPPED even when
the compaction is defered for some zone just because zone DMA is skipped
in 99% of cases due to watermark checks. This makes COMPACT_DEFERRED
basically unusable for the page allocator as a feedback mechanism.
Make sure we distinguish those two states properly and switch their
ordering in the enum. This would mean that the COMPACT_SKIPPED will be
returned only when all eligible zones are skipped.
As a result COMPACT_DEFERRED handling for THP in __alloc_pages_slowpath
will be more precise and we would bail out rather than reclaim.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Compaction code is doing weird dances between COMPACT_FOO -> int ->
unsigned long
But there doesn't seem to be any reason for that. All functions which
return/use one of those constants are not expecting any other value so it
really makes sense to define an enum for them and make it clear that no
other values are expected.
This is a pure cleanup and shouldn't introduce any functional changes.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The inactive file list should still be large enough to contain readahead
windows and freshly written file data, but it no longer is the only
source for detecting multiple accesses to file pages. The workingset
refault measurement code causes recently evicted file pages that get
accessed again after a shorter interval to be promoted directly to the
active list.
With that mechanism in place, we can afford to (on a larger system)
dedicate more memory to the active file list, so we can actually cache
more of the frequently used file pages in memory, and not have them
pushed out by streaming writes, once-used streaming file reads, etc.
This can help things like database workloads, where only half the page
cache can currently be used to cache the database working set. This
patch automatically increases that fraction on larger systems, using the
same ratio that has already been used for anonymous memory.
[hannes@cmpxchg.org: cgroup-awareness]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Noticed an allocation failure in a network driver the other day on a 32 bit
system:
DMA-API: debugging out of memory - disabling
bnx2fc: adapter_lookup: hba NULL
lldpad: page allocation failure. order:0, mode:0x4120
Pid: 4556, comm: lldpad Not tainted 2.6.32-639.el6.i686.debug #1
Call Trace:
[<c08a4086>] ? printk+0x19/0x23
[<c05166a4>] ? __alloc_pages_nodemask+0x664/0x830
[<c0649d02>] ? free_object+0x82/0xa0
[<fb4e2c9b>] ? ixgbe_alloc_rx_buffers+0x10b/0x1d0 [ixgbe]
[<fb4e2fff>] ? ixgbe_configure_rx_ring+0x29f/0x420 [ixgbe]
[<fb4e228c>] ? ixgbe_configure_tx_ring+0x15c/0x220 [ixgbe]
[<fb4e3709>] ? ixgbe_configure+0x589/0xc00 [ixgbe]
[<fb4e7be7>] ? ixgbe_open+0xa7/0x5c0 [ixgbe]
[<fb503ce6>] ? ixgbe_init_interrupt_scheme+0x5b6/0x970 [ixgbe]
[<fb4e8e54>] ? ixgbe_setup_tc+0x1a4/0x260 [ixgbe]
[<fb505a9f>] ? ixgbe_dcbnl_set_state+0x7f/0x90 [ixgbe]
[<c088d80d>] ? dcb_doit+0x10ed/0x16d0
...
Thought that perhaps the big splat in the logs wasn't really necessecary, as
all call sites for dev_alloc_skb:
a) check the return code for the function
and
b) either print their own error message or have a recovery path that makes the
warning moot.
Fix it by modifying dev_alloc_pages to pass __GFP_NOWARN as a gfp flag to
suppress the warning
applies to the net tree
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Alexander Duyck <alexander.duyck@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These structures are defined only if __USE_MISC is set in glibc net/if.h
headers, ie when _BSD_SOURCE or _SVID_SOURCE are defined.
CC: Jan Engelhardt <jengelh@inai.de>
CC: Josh Boyer <jwboyer@fedoraproject.org>
CC: Stephen Hemminger <shemming@brocade.com>
CC: Waldemar Brodkorb <mail@waldemar-brodkorb.de>
CC: Gabriel Laskar <gabriel@lse.epita.fr>
CC: Mikko Rapeli <mikko.rapeli@iki.fi>
Fixes: 4a91cb61bb ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Major changes:
iwlwifi
* remove IWLWIFI_DEBUG_EXPERIMENTAL_UCODE kconfig option
* work for RX multiqueue continues
* dynamic queue allocation work continues
* add Luca as maintainer
* a bunch of fixes and improvements all over
brcmfmac
* add 4356 sdio support
ath6kl
* add ability to set debug uart baud rate with a module parameter
wil6210
* add debugfs file to configure firmware led functionality
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJXNbEBAAoJEG4XJFUm622bKfAH/2CnQV7dBCT5QwEiKYoOdsCR
eTiH7OYjTPw/rjKaG3laFgFbecnfUnHoGt55WKqRY58JycLza+SPTTv57hFTnOl+
4kDhUEjUggxMs5BRb3H7wtcnQVs/pTkgqKqwUrmFNkG6idENQgorK6DG4SNCwIdf
JrmxiHcN73xSATxlduoA9bGpluW3OvnFfRrJfyT6UBWZaFqFe3qsoKDx08S2WU2z
kUI9ZUO9Ht7Q85QdLfPQI7xo54dXo9a+8v3yc7fNFbcu1s8cqeYuofXfypjK7H/B
DEY96mubDnmDt8YE8yR9wStVzTr5zf39urE3o+/xSKSKhQxmNo8+x2TBSm5nFSQ=
=0HKi
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-for-davem-2016-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers patches for 4.7
Major changes:
iwlwifi
* remove IWLWIFI_DEBUG_EXPERIMENTAL_UCODE kconfig option
* work for RX multiqueue continues
* dynamic queue allocation work continues
* add Luca as maintainer
* a bunch of fixes and improvements all over
brcmfmac
* add 4356 sdio support
ath6kl
* add ability to set debug uart baud rate with a module parameter
wil6210
* add debugfs file to configure firmware led functionality
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
->sk_shutdown bits share one bitfield with some other bits in sock struct,
such as ->sk_no_check_[r,t]x, ->sk_userlocks ...
sock_setsockopt() may write to these bits, while holding the socket lock.
In case of AF_UNIX sockets, we change ->sk_shutdown bits while holding only
unix_state_lock(). So concurrent setsockopt() and shutdown() may lead
to corrupting these bits.
Fix this by moving ->sk_shutdown bits out of bitfield into a separate byte.
This will not change the 'struct sock' size since ->sk_shutdown moved into
previously unused 16-bit hole.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add a new fou6 module that provides encapsulation
operations for IPv6.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add encap_hlen and ip_tunnel_encap structure to ip6_tnl. Add functions
for getting encap hlen, setting up encap on a tunnel, performing
encapsulation operation.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create __fou_build_header and __gue_build_header. These implement the
protocol generic parts of building the fou and gue header.
fou_build_header and gue_build_header implement the IPv4 specific
functions and call the __*_build_header functions.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate all the ip_tunnel_encap definitions in one spot in the
header file. Also, move ip_encap_hlen and ip_tunnel_encap from
ip_tunnel.c to ip_tunnels.h so they call be called without a dependency
on ip_tunnel module. Similarly, move iptun_encaps to ip_tunnel_core.c.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
NETIF_F_GSO_IPXIP6. These are used to described IP in IP
tunnel and what the outer protocol is. The inner protocol
can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
are removed (these are both instances of SKB_GSO_IPXIP4).
SKB_GSO_IPXIP6 will be used when support for GSO with IP
encapsulation over IPv6 is added.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>