Waiman Long 810507fe6f locking/lockdep: Reuse freed chain_hlocks entries
Once a lock class is zapped, all the lock chains that include the zapped
class are essentially useless. The lock_chain structure itself can be
reused, but not the corresponding chain_hlocks[] entries. Over time,
we will run out of chain_hlocks entries while there are still plenty
of other lockdep array entries available.

To fix this imbalance, we have to make chain_hlocks entries reusable
just like the others. As the freed chain_hlocks entries are in blocks of
various lengths, a simple bitmap like the one used in the other reusable
lockdep arrays isn't applicable. Instead, the chain_hlocks entries are
put into bucketed lists (MAX_CHAIN_BUCKETS) of chain blocks.  Bucket 0
is the variable size bucket which houses chain blocks of size larger than
MAX_CHAIN_BUCKETS sorted in decreasing size order.  Initially, the whole
array is in one chain block (the primordial chain block) in bucket 0.
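
As a rough illustration of the bucket layout described above, the sketch
below maps a free block's size to a bucket index. The commit message only
fixes the role of bucket 0; placing a block of size n in bucket n - 1 is an
assumption made for this example, and the value 16 for MAX_CHAIN_BUCKETS is
the one quoted at the end of this message.

```c
#include <assert.h>

#define MAX_CHAIN_BUCKETS	16	/* value quoted later in this message */
#define MIN_CHAIN_BLOCK_SIZE	2	/* hypothetical name for the 2-entry minimum */

/*
 * Return the bucket index for a free chain block of `size` entries.
 * Bucket 0 takes every block larger than MAX_CHAIN_BUCKETS; the
 * size - 1 mapping for the fixed-size buckets is an assumption.
 */
static int size_to_bucket(int size)
{
	assert(size >= MIN_CHAIN_BLOCK_SIZE);

	if (size > MAX_CHAIN_BUCKETS)
		return 0;

	return size - 1;
}
```

Under this assumed mapping, the primordial chain block (the whole array)
always lands in bucket 0, and only blocks of 2 to 16 entries use the
fixed-size buckets.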

The minimum size of a chain block is 2 chain_hlocks entries. That will
be the minimum allocation size. In other words, allocation requests
for one chain_hlocks entry will cause a 2-entry block to be returned and
hence 1 entry will be wasted.
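
The rounding rule can be written down in a few lines. This is only a
sketch; the helper names chain_alloc_size() and chain_alloc_waste() are
hypothetical and do not come from the patch.

```c
#include <assert.h>

#define MIN_CHAIN_BLOCK_SIZE	2	/* minimum chain block size, in entries */

/* Number of chain_hlocks entries actually handed out for a request. */
static int chain_alloc_size(int req)
{
	assert(req >= 1);

	return req < MIN_CHAIN_BLOCK_SIZE ? MIN_CHAIN_BLOCK_SIZE : req;
}

/* Entries wasted by rounding the request up to the minimum size. */
static int chain_alloc_waste(int req)
{
	return chain_alloc_size(req) - req;
}
```

Only a 1-entry request wastes anything in this model; every request of 2
or more entries is served at its exact size.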

Allocation requests for the chain_hlocks are fulfilled first by looking
for a chain block of matching size. If not found, the first chain block
from bucket[0] (the largest one) is split. That can cause hlock entry
fragmentation and reduce allocation efficiency if a chain block of size >
MAX_CHAIN_BUCKETS is ever zapped and put back after the primordial
chain block. So MAX_CHAIN_BUCKETS must be large enough that this
should seldom happen.
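
The allocation policy above can be modelled in user space as follows. This
is a simplified sketch, not the kernel code: it keeps all free blocks in one
flat array instead of bucketed lists, and the names chain_pool, pool_init(),
pool_free() and pool_alloc() are hypothetical. Treating a 1-entry leftover
from a split as lost is also an assumption of this model.

```c
#include <assert.h>

#define MAX_CHAIN_BUCKETS	16	/* value quoted later in this message */
#define MIN_BLOCK_SIZE		2	/* minimum chain block size */
#define MAX_FREE_BLOCKS		64	/* hypothetical capacity of this toy list */

struct chain_block {
	int start;	/* index of the first chain_hlocks entry */
	int size;	/* number of entries; 0 marks an unused slot */
};

struct chain_pool {
	struct chain_block free[MAX_FREE_BLOCKS];
	int nr_lost;	/* 1-entry leftovers too small to track */
};

/* Start with the whole array in one (primordial) free block. */
static void pool_init(struct chain_pool *p, int total)
{
	for (int i = 0; i < MAX_FREE_BLOCKS; i++)
		p->free[i].size = 0;
	p->free[0].start = 0;
	p->free[0].size = total;
	p->nr_lost = 0;
}

/* Return a block to the free list; 0 on success, -1 if the list is full. */
static int pool_free(struct chain_pool *p, int start, int size)
{
	assert(size >= MIN_BLOCK_SIZE);

	for (int i = 0; i < MAX_FREE_BLOCKS; i++) {
		if (p->free[i].size == 0) {
			p->free[i].start = start;
			p->free[i].size = size;
			return 0;
		}
	}
	return -1;
}

/* Allocate req entries; returns the start index, or -1 on failure. */
static int pool_alloc(struct chain_pool *p, int req)
{
	int largest = -1;
	int start;

	if (req < MIN_BLOCK_SIZE)
		req = MIN_BLOCK_SIZE;

	/* Step 1: look for a free block of exactly the requested size. */
	for (int i = 0; i < MAX_FREE_BLOCKS; i++) {
		if (p->free[i].size == req) {
			p->free[i].size = 0;
			return p->free[i].start;
		}
	}

	/* Step 2: find the largest block that would live in bucket 0. */
	for (int i = 0; i < MAX_FREE_BLOCKS; i++) {
		if (p->free[i].size > MAX_CHAIN_BUCKETS &&
		    (largest < 0 || p->free[i].size > p->free[largest].size))
			largest = i;
	}
	if (largest < 0 || p->free[largest].size < req)
		return -1;

	/* Split: hand out the front, keep the remainder as a free block. */
	start = p->free[largest].start;
	p->free[largest].start += req;
	p->free[largest].size -= req;
	if (p->free[largest].size == 1) {	/* too small to reuse */
		p->free[largest].size = 0;
		p->nr_lost++;
	}
	return start;
}
```

In this model a freed block is reused as-is by the next request of the same
size, so the primordial block is only split when no matching block exists.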

By reusing the chain_hlocks entries, we are able to handle workloads
that add and zap a lot of lock classes without the risk of running out
of chain_hlocks entries as long as the total number of outstanding lock
classes at any time remains within a reasonable limit.

Two new tracking counters, nr_free_chain_hlocks & nr_large_chain_blocks,
are added to track the total number of chain_hlocks entries in the
free bucketed lists and the number of large chain blocks in bucket[0]
respectively. The nr_free_chain_hlocks replaces nr_chain_hlocks.

The nr_large_chain_blocks counter enables us to see if we should increase
the number of buckets (MAX_CHAIN_BUCKETS) available so as to avoid
the fragmentation problem in bucket[0].

An internal nfsd test that ran for more than an hour and kept on
loading and unloading kernel modules could cause the following message
to be displayed.

  [ 4318.443670] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!

The patched kernel was able to complete the test with a lot of free
chain_hlocks entries to spare:

  # cat /proc/lockdep_stats
     :
   dependency chains:                   18867 [max: 65536]
   dependency chain hlocks:             74926 [max: 327680]
   dependency chain hlocks lost:            0
     :
   zapped classes:                       1541
   zapped lock chains:                  56765
   large chain blocks:                      1

By changing MAX_CHAIN_BUCKETS to 3 and adding a counter for the size of the
largest chain block, the system still worked and we got the following
lockdep_stats data:

   dependency chains:                   18601 [max: 65536]
   dependency chain hlocks used:        73133 [max: 327680]
   dependency chain hlocks lost:            0
     :
   zapped classes:                       1541
   zapped lock chains:                  56702
   large chain blocks:                  45165
   large chain block size:              20165

By running the test again, I was indeed able to cause chain_hlocks
entries to get lost:

   dependency chain hlocks used:        74806 [max: 327680]
   dependency chain hlocks lost:          575
     :
   large chain blocks:                  48737
   large chain block size:                  7

Due to the fragmentation, it is possible that the
"MAX_LOCKDEP_CHAIN_HLOCKS too low!" error can happen even if a lot
of chain_hlocks entries appear to be free.
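
To make the fragmentation effect concrete, the sketch below compares the
total number of free entries with the largest single free block. It assumes
a request can only be served by splitting one block that is at least as
large as the request (no merging of adjacent free blocks); the helper names
and the block sizes used with them are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of all free entries, regardless of how they are split up. */
static int total_free(const int *sizes, size_t n)
{
	int sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += sizes[i];
	return sum;
}

/* Size of the largest single free block. */
static int largest_free(const int *sizes, size_t n)
{
	int max = 0;

	for (size_t i = 0; i < n; i++)
		if (sizes[i] > max)
			max = sizes[i];
	return max;
}

/* A request fits only if one free block is big enough on its own. */
static int can_allocate(const int *sizes, size_t n, int req)
{
	assert(req > 0);

	return largest_free(sizes, n) >= req;
}
```

With free blocks of sizes 7, 7, 7, 5 and 4, for example, 30 entries are free
in total, yet a request for 8 entries cannot be served because no single
block is larger than 7.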

Fortunately, a MAX_CHAIN_BUCKETS value of 16 should be big enough that
few variable sized chain blocks, other than the initial one, should
ever be present in bucket 0.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200206152408.24165-7-longman@redhat.com
2020-02-11 13:10:52 +01:00
Documentation Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
block block-5.6-2020-02-05 2020-02-06 06:15:23 +00:00
certs certs: Add wrapper function to check blacklisted binary hash 2019-11-12 12:25:50 +11:00
crypto treewide: remove redundant IS_ERR() before error code check 2020-02-04 03:05:27 +00:00
drivers Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
fs Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
include fs: New zonefs file system 2020-02-09 15:51:46 -08:00
init Tracing updates: 2020-02-06 07:12:11 +00:00
ipc proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
kernel locking/lockdep: Reuse freed chain_hlocks entries 2020-02-11 13:10:52 +01:00
lib Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
mm Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
net Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
samples Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
scripts Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
security selinux/stable-5.6 PR 20200210 2020-02-10 16:51:35 -08:00
sound sound fixes for 5.6-rc1 2020-02-06 14:15:01 +00:00
tools A set of fixes and improvements for the perf subsystem: 2020-02-09 12:04:09 -08:00
usr Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
virt KVM: fix overflow of zero page refcount with ksm running 2020-02-05 15:27:46 +01:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore modpost: dump missing namespaces into a single modules.nsdeps file 2019-11-11 20:10:01 +09:00
.mailmap A handful of small documentation fixes that wandered in. 2020-02-07 13:03:10 -08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS open: introduce openat2(2) syscall 2020-01-18 09:19:18 -05:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS fs: New zonefs file system 2020-02-09 15:51:46 -08:00
Makefile Linux 5.6-rc1 2020-02-09 16:08:48 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.