When creating a new device with NVM_DEV_CREATE, if the LUNs are already
allocated, the ioctl returns -ENOMEM, which is wrong. This patch
propagates -EBUSY from nvm_reserve_luns(), which is the correct response.
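A minimal userspace sketch of the error-propagation pattern, assuming a simple boolean array for LUN ownership. reserve_luns() and create_dev() are hypothetical stand-ins, not the lightnvm code: the caller forwards whatever the callee returned instead of collapsing every failure into -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for nvm_reserve_luns(): marks LUNs
 * [begin, end] as taken, or returns -EBUSY without side effects
 * if any of them is already allocated. */
static int reserve_luns(bool *taken, int begin, int end)
{
	assert(begin <= end);
	for (int i = begin; i <= end; i++)
		if (taken[i])
			return -EBUSY;
	for (int i = begin; i <= end; i++)
		taken[i] = true;
	return 0;
}

/* Hypothetical ioctl-side create path. */
static int create_dev(bool *taken, int begin, int end)
{
	int ret = reserve_luns(taken, begin, end);

	if (ret)
		return ret;	/* propagate -EBUSY; previously -ENOMEM */
	/* ... target allocation would follow here ... */
	return 0;
}
```

With this shape, only a genuine allocation failure further down would report -ENOMEM.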
Fixes: ade69e243 ("lightnvm: merge gennvm with core")
Reviewed-by: Frans Klaver <fransklaver@gmail.com>
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now everything uses pr_warning(), so ditch warning().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-hv8r0mgdhk73wtfq3zrhavgx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Convert sole user of warning() in this file to pr_warning(),
consolidating error reporting facilities.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3y7yf6v673ujl2rcs34tzv8n@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a machine check happens in the guest, related mcck info (mcic,
external damage code, ...) is stored in the vcpu's lowcore on the host.
Then the machine check handler's low-level part is executed, followed
by the high-level part.
If the high-level part's execution is interrupted by a new machine check
happening on the same vcpu on the host, the mcck info in the lowcore is
overwritten with the new machine check's data.
If the high-level part's execution is scheduled to a different cpu,
the mcck info in the lowcore is uncertain.
Therefore, for both cases, the further reinjection to the guest will use
the wrong data.
Let's back up the mcck info in the lowcore to the sie page
for further reinjection, so that the right data will be used.
Add a new member to struct sie_page to store the related machine
check's mcic, failing storage address and external damage code.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add the logic to check whether the machine check happens while the guest
is running. If so, set the exit reason to -EINTR in the machine check's
interrupt handler. Refactor s390_do_machine_check to avoid panicking
the host for some kinds of machine checks that happen while the guest
is running.
Reinject instruction-processing-damage machine checks, including the
Delayed Access Exception, into the guest instead of damaging the host
when they happen in the guest: they could be caused by an improper TLB
entry update or another software issue, and they impact only the guest.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
warning() is going away, consolidating error reporting.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-5r3636cwl4z1varo90mervai@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Complete the switch to using the pr_{warning,error,etc} error reporting
facilities.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3l9gr6237b4aqyo0rsspixe2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
And switch from warning() to pr_warning(), to eliminate another
duplication: too many error reporting facilities.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-pkzcjrhek3uuqc4i5i9ealwd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The warning(str_error_r(errno)) pattern can be replaced with a function;
do it.
And while at it use pr_warning(), we have way too many error reporting
facilities, time to drop some, starting with the one we got from the git
sources.
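A minimal sketch of folding the repeated pattern into one helper. warn_errno() is a hypothetical name, not perf's API. It uses strerror() rather than strerror_r() to sidestep the GNU/XSI strerror_r() signature split, so unlike the real str_error_r() it is not meant to be thread-safe.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "<what>: <error text>" into buf and
 * reports it through a single path; err is an errno value. */
static const char *warn_errno(char *buf, size_t len, const char *what, int err)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s: %s", what, strerror(err));
	fprintf(stderr, "Warning: %s\n", buf);
	return buf;
}
```

A call site then shrinks to something like `warn_errno(buf, sizeof(buf), "open", errno);`.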
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-lbak5npj1ri1uuvf1en3c0p0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull clockevents updates from Daniel Lezcano:
- Made the tcb_clksrc endianness-agnostic as the AVR32 support is gone
  (Alexandre Belloni)
- Unmap io region on failure at init time in the fsl_ftm_timer (Arvind Yadav)
- Fix a bad return value for the mips-gic-timer at init time (Christophe
Jaillet)
- Fix invalid iomap check and switch the sun4i timer to use the common timer
init routine (Daniel Lezcano)
'clk' is a valid pointer at this point, so calling PTR_ERR() on it is
pointless.
Return the error code from clk_prepare_enable() if it fails instead.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
In case of an error at init time, roll back the iomapping.
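A small sketch of the goto-based rollback idiom the fix follows, assuming hypothetical fake_iomap()/fake_iounmap() stand-ins for of_iomap()/iounmap() and a simulated failure in a later init step. The live_mappings counter exists only to make the rollback observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_mappings;	/* tracks mappings to show the rollback */

/* Hypothetical stand-ins for of_iomap()/iounmap(). */
static void *fake_iomap(void)
{
	void *p = malloc(16);

	if (p)
		live_mappings++;
	return p;
}

static void fake_iounmap(void *p)
{
	assert(p != NULL);
	live_mappings--;
	free(p);
}

/* Hypothetical init routine; fail_later simulates a failure in a step
 * after the mapping (clock lookup, IRQ request, ...). */
static int timer_init(int fail_later)
{
	void *base = fake_iomap();
	int ret;

	if (!base)
		return -ENXIO;

	ret = fail_later ? -EINVAL : 0;
	if (ret)
		goto err_unmap;

	/* ... register the clocksource; base stays mapped on success ... */
	return 0;

err_unmap:
	fake_iounmap(base);	/* roll back the mapping before failing */
	return ret;
}
```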
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Now that AVR32 is gone, we can use the proper IO accessors that are
correctly handling endianness.
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
This patch adds a platform dependency to test case 15
(perf_event_attr). It is based on a suggestion from Jiri Olsa.
Add a new optional attribute named 'arch' in the [config] section of the
test case file. It is a comma-separated list of architecture names this
test can be executed on. For example:
arch = x86_64,alpha,ppc
If this attribute is missing, the test is executed on any platform. This
does not break existing behavior.
The values listed for this attribute should be identical to the output
of uname -m.
If the list starts with an exclamation mark (!), the comparison is
inverted. For example, with
arch = !s390x,ppc
the test is not executed on s390x or ppc platforms. The exclamation
mark must be at the beginning of the list.
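The matching rule can be sketched as follows. This is a C rendering of the described semantics only; the real check lives in tests/attr.py. It assumes the value has already been stripped of surrounding whitespace and contains no spaces. arch_allowed() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* list: value of the 'arch' attribute, or NULL if absent.
 * machine: the `uname -m` string of the running system. */
static bool arch_allowed(const char *list, const char *machine)
{
	bool invert = false, found = false;
	size_t mlen;

	assert(machine != NULL);
	if (!list)
		return true;		/* no attribute: run everywhere */
	mlen = strlen(machine);
	if (*list == '!') {		/* only valid as the first character */
		invert = true;
		list++;
	}
	while (*list) {
		const char *comma = strchr(list, ',');
		size_t len = comma ? (size_t)(comma - list) : strlen(list);

		/* exact match only: "x86" must not match "x86_64" */
		if (len == mlen && strncmp(list, machine, len) == 0) {
			found = true;
			break;
		}
		if (!comma)
			break;
		list = comma + 1;
	}
	return invert ? !found : found;
}
```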
Here is an example debug output:
[root@s35lp76]# fgrep arch tests/attr/test-stat-C2
arch = x86_64,alpha,ppc
[root@s35lp76]# PERF_TEST_ATTR=/tmp /usr/bin/python2 ./tests/attr.py \
-d ./tests/attr/ -p ./perf -vvvvv -t test-stat-C1
provides the following output:
running './tests/attr//test-stat-C1'
test limitation 'x86_64,alpha,ppc' <--- new
loading expected events
Event event:base-stat
fd = 1
group_fd = -1
.....
Here is the output when a test is skipped:
[root@s35lp76]# fgrep arch tests/attr/test-stat-C1
arch = !s390x
[root@s35lp76]# PERF_TEST_ATTR=/tmp /usr/bin/python2 ./tests/attr.py \
-d ./tests/attr/ -p ./perf -vvvvv -t test-stat-C1
provides the following output:
test limitation '!s390x' <--- new
skipped [s390x] './tests/attr//test-stat-C1' <--- new
The test is skipped with return code 0.
Suggested-and-Acked-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/20170622073625.86762-1-tmricht@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Because user writes are decoupled from media writes by an intermediate
write buffer, irrecoverable media write errors lead to pblk stalling:
user writes fill up the buffer and end up in an infinite retry loop.
In order to let user writes fail gracefully, pblk needs to keep track
of its own internal state and prevent further writes from being placed
into the write buffer.
This patch implements a state machine to keep track of internal errors
and, in case of failure, fail further user writes in a standard way.
Depending on the type of error, pblk will do its best to persist
buffered writes (which are already acknowledged) and shut down in a
graceful manner. This way, data might be recovered by re-instantiating
pblk. Such a state machine paves the way for a state-based FTL log.
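A toy sketch of the idea, assuming three hypothetical states and made-up return codes; pblk's actual state names, transitions and error values may differ. The admission check on the user write path fails fast once an error has been recorded, instead of letting writers spin on a buffer that will never drain.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states. */
enum ftl_state {
	FTL_RUNNING,	/* accepting user writes */
	FTL_STOPPING,	/* error seen: persist acked data, refuse new writes */
	FTL_STOPPED,	/* buffer flushed (best effort); instance shut down */
};

struct ftl {
	enum ftl_state state;
	int buffered;	/* entries in the write buffer */
	int capacity;
};

/* User write path: fail in a standard way once we are not running. */
static int ftl_write(struct ftl *f)
{
	if (f->state != FTL_RUNNING)
		return -EIO;
	if (f->buffered == f->capacity)
		return -EAGAIN;	/* transient: buffer full, caller may retry */
	f->buffered++;
	return 0;
}

/* Called when a media write fails irrecoverably. */
static void ftl_media_error(struct ftl *f)
{
	if (f->state == FTL_RUNNING)
		f->state = FTL_STOPPING;
}

/* Writer side: try to persist acknowledged data, then stop. */
static void ftl_flush_and_stop(struct ftl *f)
{
	assert(f->state == FTL_STOPPING);
	f->buffered = 0;	/* sketch: pretend the flush succeeded */
	f->state = FTL_STOPPED;
}
```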
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Define constants for the sizes of internal mempools and workqueues. In
the process, adjust the values to be more meaningful given the internal
constraints of the FTL. To do this for workqueues, split the current
auxiliary workqueue into two dedicated workqueues to manage lines being
closed and bad blocks.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
At the moment, in order to get enough read parallelism, we recycle
several lines at the same time. This approach has proven not to work
well when reaching capacity, since we end up mixing valid data from all
lines, thus not maintaining a sustainable free/recycled line ratio.
The new design relies on a two-level workqueue mechanism. In the first
level, we read the metadata for a number of lines based on the GC list
they reside on (this is governed by the number of valid sectors in each
line). In the second level, we recycle a single line at a time. Here, we
issue reads in parallel, while a single GC write thread places data in
the write buffer. This design allows us to (i) move data from only one
line at a time, thus maintaining a sane free/recycled ratio, and (ii)
keep the GC writer busy with recycled data.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Set up a DMA area for all I/Os in order to read/write the metadata
stored in the per-sector out-of-band area.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
At the moment, we separate the closed lines into three different lists
based on their number of valid sectors. GC recycles lines from each list
based on capacity. Lines from each list are taken in a FIFO fashion.
Since the number of lines is limited (it corresponds to the number of
blocks in a LUN, which is somewhere between 1000 and 2000), we can
afford to scan the lists to choose the optimal line to be recycled. This
helps especially with lines that have a high number of valid sectors.
If the number of blocks per LUN increases, we will consider a more
efficient policy.
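A minimal sketch of the scan, assuming "optimal" means the line with the fewest valid sectors (the cheapest to move) and that the array is kept in FIFO order so ties favour the oldest line. struct line and pick_victim() are hypothetical. The linear scan is O(n), which the text argues is affordable because n is bounded by the blocks per LUN.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical line descriptor with only the field the policy needs. */
struct line {
	int id;
	int valid_secs;	/* sectors GC would have to move */
};

/* Return the cheapest line to recycle, or NULL for an empty list.
 * Strict '<' keeps the earliest line on ties. */
static struct line *pick_victim(struct line *lines, size_t n)
{
	struct line *best = NULL;

	assert(lines != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!best || lines[i].valid_secs < best->valid_secs)
			best = &lines[i];
	return best;
}
```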
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Decouple bad block discovery from line allocation logic. This allows us
to return meaningful error codes in case of bad block discovery failure.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
smeta size will always be suitable for a kmalloc allocation. Simplify
the code and leave the vmalloc fallback only for emeta, where the pblk
configuration has an impact.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If a read request is sequential and its size aligns with a
multi-plane page size, use the multi-plane hint to process the I/O in
parallel in the controller.
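The decision can be sketched as a predicate, assuming a multi-plane page spans `secs_per_pg * nr_planes` sectors and that "aligns" means the request length is a whole multiple of that. use_multiplane_hint() and its parameters are hypothetical. Start-address alignment is not checked here; a real implementation may also require it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: sequential read covering whole
 * multi-plane pages. */
static bool use_multiplane_hint(bool sequential, unsigned int nr_secs,
				unsigned int secs_per_pg, unsigned int nr_planes)
{
	unsigned int mp_secs = secs_per_pg * nr_planes;

	assert(mp_secs > 0);
	return sequential && nr_secs > 0 && nr_secs % mp_secs == 0;
}
```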
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After refactoring the metadata path, the backpointer controlling
synced I/Os in a line becomes unnecessary; metadata is scheduled
on the write thread, thus we know when the end of the line is reached
and act on it directly.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove a legacy variable that helped verify the consistency of the
run-time metadata for the free line list. With the new metadata layout,
this check is no longer necessary.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
At the moment, line metadata is persisted on a separate work queue, that
is kicked each time that a line is closed. The assumption when designing
this was that freeing the write thread from creating a new write request
was better than the potential impact of writes colliding on the media
(user I/O and metadata I/O). Experimentation has proven this assumption
wrong: collisions can cost up to 25% of bandwidth and introduce long
tail latencies on the write thread, which can cause user write threads
to spend more time spinning to get a free entry on the write buffer.
This patch moves the metadata logic to the write thread. When a line is
closed, remaining metadata is written in memory and is placed on a
metadata queue. The write thread then takes the metadata corresponding
to the previous line, creates the write request and schedules it to
minimize collisions on the media. Using this approach, we see that we
can saturate the media's bandwidth, which helps reduce both write
latencies and the spinning time for user writer threads.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Read requests allocate some extra memory to store their per-I/O context.
Instead of requiring yet another memory pool for other types of requests,
generalize this context allocation (and change naming accordingly).
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Erase I/Os are scheduled with the following goals in mind: (i) minimize
LUN collisions with write I/Os, and (ii) spread the cost of erasing
across every write, instead of putting the whole burden on garbage
collection when it runs. This works well on the current design, but is
specific to the default mapping algorithm.
This patch generalizes the erase path so that other mapping algorithms
can select an arbitrary line to be erased instead. It also gets rid of
the erase semaphore, since it creates jitter for user writes.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow configuring the maximum number of sectors per write command
through sysfs. This makes it easier to tune write command sizes for
different controller configurations.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new debug counter to measure cache hits on the read path.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Spare a double calculation on the fast write path.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If nvme_alloc_request fails, propagate the right error, instead of
assuming ENOMEM.
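A userspace sketch of the kernel's error-pointer convention, which is what lets the caller recover the real error instead of assuming ENOMEM. ERR_PTR/PTR_ERR/IS_ERR below are re-implementations modeled on <linux/err.h>; alloc_request() and submit() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Modeled on <linux/err.h>: errno values in [-4095, -1] are encoded
 * in the top page of the address space, which no valid pointer uses. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that can fail for reasons other than memory;
 * simulate_err is 0 or a negative errno. */
static void *alloc_request(int simulate_err)
{
	void *rq;

	if (simulate_err)
		return ERR_PTR(simulate_err);
	rq = malloc(64);
	return rq ? rq : ERR_PTR(-ENOMEM);
}

static int submit(int simulate_err)
{
	void *rq = alloc_request(simulate_err);

	if (IS_ERR(rq))
		return (int)PTR_ERR(rq);	/* not a hard-coded -ENOMEM */
	free(rq);
	return 0;
}
```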
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In case of a failure when submitting a request, convert the ppa_list
addresses to the target format so that the target can interpret the
ppas for recovery.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull s390 bugfix from Martin Schwidefsky:
"One last s390 patch for 4.12
Revert the re-IPL semantics back to the v4.7 state. It turned out that
the memory layout may change due to memory hotplug if load-normal is
used"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ipl: revert Load Normal semantics for LPAR CCW-type re-IPL
A framework to factor out the drivers' init functions was merged
previously.
Converting this driver to the common framework gives:
Before:
   text    data     bss     dec     hex filename
   1787     384      12    2183     887 drivers/clocksource/sun4i_timer.o
After:
   text    data     bss     dec     hex filename
   1407     512       0    1919     77f drivers/clocksource/sun4i_timer.o
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Chen-Yu Tsai <wens@csie.org>
A typo in the code checks the return value of iomap against !NULL
and thus fails every time the mapping succeeds.
Fix this by inverting the condition in the check.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Michael reported a segfault when kernel.kptr_restrict=2 is set:
$ perf record ls
...
perf: Segmentation fault
Obtained 16 stack frames.
./perf(dump_stack+0x2d) [0x5068df]
./perf(sighandler_dump_stack+0x2d) [0x5069bf]
./perf() [0x43e47b]
/lib64/libc.so.6(+0x3594f) [0x7f762004794f]
/lib64/libc.so.6(strlen+0x26) [0x7f762009ef86]
/lib64/libc.so.6(__strdup+0xd) [0x7f762009ecbd]
./perf(maps__set_kallsyms_ref_reloc_sym+0x4d) [0x51590f]
./perf(machine__create_kernel_maps+0x136) [0x50a7de]
./perf(perf_session__create_kernel_maps+0x2c) [0x510a81]
./perf(perf_session__new+0x13d) [0x510e23]
./perf() [0x43fd61]
./perf(cmd_record+0x704) [0x441823]
./perf() [0x4bc1a0]
./perf() [0x4bc40d]
./perf() [0x4bc55f]
./perf(main+0x2d5) [0x4bc939]
Segmentation fault (core dumped)
The reason is that with kernel.kptr_restrict=2 we don't get the
symbol name from machine__get_running_kernel_start(), yet
maps__set_kallsyms_ref_reloc_sym() tries to use it, and we crash.
Check the symbol name value before calling
maps__set_kallsyms_ref_reloc_sym() and succeed without ref_reloc_sym
being set. It's safe because we check its existence before we use it.
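A reduced sketch of the guard, assuming a hypothetical struct kmaps with a single name field; set_ref_reloc_sym() and create_kernel_maps() are illustrative, not perf's functions. The setter copies the name the way strdup() does, so passing NULL would crash in strlen(), matching the frames in the backtrace above. The caller checks the name first and succeeds without setting it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct kmaps {
	char *ref_reloc_sym_name;	/* NULL means "not set" */
};

/* Hypothetical setter: strdup()-like copy; strlen(NULL) would crash. */
static int set_ref_reloc_sym(struct kmaps *m, const char *name)
{
	size_t len = strlen(name) + 1;
	char *copy = malloc(len);

	assert(m != NULL);
	if (!copy)
		return -1;
	memcpy(copy, name, len);
	m->ref_reloc_sym_name = copy;
	return 0;
}

/* The fix: only set the symbol when a name is available. */
static int create_kernel_maps(struct kmaps *m, const char *name)
{
	if (name && set_ref_reloc_sym(m, name) < 0)
		return -1;
	return 0;
}
```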
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170626095153.553-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The function sbi_send() is local to just pnd2_edac.c and does not need
to be in global scope, so make it static.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170623084855.9197-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The MCE severity gives a hint as to how to handle the error. The
notifier blocks can then use the severity to decide on an action.
It's not necessary for machine_check_poll() to filter errors for
the notifier chain, since each block will check its own set of
conditions before handling an error.
Also, there isn't any urgency for machine_check_poll() to make decisions
based on severity like in do_machine_check().
If we can assume that a severity is set then we can use it in more
notifier blocks. For example, the CEC block could check for a "KEEP"
severity rather than checking bits in the status. This isn't possible
now since the severity is not set except for "DEFERRED/UCNA" errors with
a valid address.
Save the severity since we have it, and let the notifier blocks decide
if they want to do anything.
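A toy model of "save the severity, let each block decide", assuming a hypothetical severity enum and record; the real code uses its own MCE severity levels and struct mce. The poller no longer filters and simply passes every record down the chain, and each block tests the saved severity (for example, a CEC-like block checking for KEEP rather than decoding status bits).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical severity levels. */
enum sev { SEV_NO, SEV_KEEP, SEV_UCNA, SEV_PANIC };

struct mce_rec {
	enum sev severity;	/* always filled in by the poller */
	int handled_by;		/* index of the block that acted, -1 if none */
};

typedef void (*notifier_fn)(struct mce_rec *r, int id);

static void cec_block(struct mce_rec *r, int id)
{
	if (r->severity == SEV_KEEP)	/* instead of checking status bits */
		r->handled_by = id;
}

static void ucna_block(struct mce_rec *r, int id)
{
	if (r->severity == SEV_UCNA)
		r->handled_by = id;
}

/* The poller saves the severity and calls every block unfiltered. */
static void poll_one(struct mce_rec *r, enum sev s,
		     const notifier_fn *chain, size_t n)
{
	assert(chain != NULL || n == 0);
	r->severity = s;
	r->handled_by = -1;
	for (size_t i = 0; i < n; i++)
		chain[i](r, (int)i);
}
```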
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498074402-98633-1-git-send-email-Yazen.Ghannam@amd.com
The helper function __load_ucode_amd() and pointer intel_ucode_patch do
not need to be in global scope, so make them static.
This fixes these sparse warnings:
"symbol '__load_ucode_amd' was not declared. Should it be static?"
"symbol 'intel_ucode_patch' was not declared. Should it be static?"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170622095736.11937-1-colin.king@canonical.com
Larry Finger reported that his PowerBook G4 was no longer booting with v4.12-rc:
userspace came up but gave weird errors such as:
udevd[64]: starting version 175
udevd[64]: Unable to receive ctrl message: Bad address.
modprobe: chdir(4.12-rc1): No such file or directory
He bisected the problem to commit 3448890c32 ("powerpc: get rid of zeroing,
switch to RAW_COPY_USER").
Al identified that the problem is actually a miscompilation by GCC 4.6.3, which
is exposed by the above commit.
Al also pointed out that inlining copy_to/from_user() is probably of little or
no benefit, which is correct. Using Anton's copy_to_user benchmark, with a
pathological single byte copy, we see a small increase in performance
by *removing* inlining:
Before (inlined):
# time ./copy_to_user -w -l 1 -i 10000000 ( x 3 )
real 0m22.063s
real 0m22.059s
real 0m22.076s
After:
# time ./copy_to_user -w -l 1 -i 10000000 ( x 3 )
real 0m21.325s
real 0m21.299s
real 0m21.364s
So as a small performance improvement and to avoid the miscompilation, stop
inlining copy_to/from_user() on 32-bit.
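For scale, averaging the three quoted runs on each side (plain arithmetic on the numbers above, no new measurements) puts the gain from not inlining at roughly 3%:

```latex
\bar{t}_{\text{before}} = \frac{22.063 + 22.059 + 22.076}{3}\,\text{s} = 22.066\,\text{s},
\qquad
\bar{t}_{\text{after}} = \frac{21.325 + 21.299 + 21.364}{3}\,\text{s} \approx 21.329\,\text{s}

\frac{\bar{t}_{\text{before}} - \bar{t}_{\text{after}}}{\bar{t}_{\text{before}}}
\approx \frac{0.737}{22.066} \approx 3.3\%
```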
Fixes: 3448890c32 ("powerpc: get rid of zeroing, switch to RAW_COPY_USER")
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The hash table created during vmw_cmdbuf_res_man_create() was
never freed. This causes a memory leak on context creation.
Add the corresponding drm_ht_remove() call in vmw_cmdbuf_res_man_destroy().
Tested for memory leaks by running piglit overnight; kernel memory no
longer grows as it did before.
Cc: <stable@vger.kernel.org>
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Since commit:
af2cf278ef ("x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()")
we no longer free PUDs so that we do not have to synchronize
all PGDs on hot-remove/vfree().
But the new 5-level page table patchset reverted that for 4-level
page tables, in the following commit:
f2a6a7050109 ("x86: Convert the rest of the code to support p4d_t")
This patch repairs the damage by disabling free_pud() in the 4-level
page table case, thus avoiding a BUG_ON() after hot-remove.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
[ Clarified the changelog and the code comments. ]
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170624180514.3821-1-jglisse@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>