License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied to
a file was done in a spreadsheet of side-by-side results of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source.
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
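The no-license and agreement rules in the list above amount to a small decision function. The sketch below is a hypothetical illustration, not the spreadsheet process or any script actually used: `struct scan_result`, `choose_spdx_id()` and the convention that NULL means "nothing detected" are assumptions, and the */uapi/* files with existing license text (which got the Linux-syscall-note added) are deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-file scanner output; NULL means "no license found". */
struct scan_result {
	const char *scancode;
	const char *windriver;
};

/*
 * Returns the SPDX identifier to apply, or NULL when the file must go
 * to manual review. Only the rules for files without license text and
 * for scanner agreement/disagreement are modelled here.
 */
static const char *choose_spdx_id(const char *path, const struct scan_result *r)
{
	int uapi = strstr(path, "/uapi/") != NULL;

	/* Neither scanner found license traces: COPYING (GPL-2.0) applies. */
	if (!r->scancode && !r->windriver)
		return uapi ? "GPL-2.0 WITH Linux-syscall-note" : "GPL-2.0";

	/* Both scanners agree: that becomes the concluded license. */
	if (r->scancode && r->windriver && strcmp(r->scancode, r->windriver) == 0)
		return r->scancode;

	/* One found nothing, or they disagree: inspect the file by hand. */
	return NULL;
}
```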
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based in part on an older version of FOSSology, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifiers in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally Greg ran the script using the .csv files to
generate the patches.
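Writing the tag "in the format that the file expected" boils down to picking a comment style per file type. A minimal sketch follows; `format_spdx_tag()` is a hypothetical name, and it covers only the cases I am sure of from the kernel's license rules (`//` for .c sources, `/* */` for headers, `#` for Makefile/Kconfig), returning -1 for anything else.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the SPDX tag line for @path into @buf.
 * Returns the snprintf() result, or -1 for an unhandled file type.
 */
static int format_spdx_tag(const char *path, const char *id, char *buf, size_t len)
{
	const char *base = strrchr(path, '/');
	const char *dot;

	base = base ? base + 1 : path;  /* file name without directories */
	dot = strrchr(base, '.');

	if (dot && strcmp(dot, ".c") == 0)      /* C source: C99 comment */
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
	if (dot && strcmp(dot, ".h") == 0)      /* header: block comment */
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	if (strcmp(base, "Makefile") == 0 || strncmp(base, "Kconfig", 7) == 0)
		return snprintf(buf, len, "# SPDX-License-Identifier: %s", id);
	return -1;
}
```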
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
/* SPDX-License-Identifier: GPL-2.0 */
2021-08-15 05:16:59 +08:00
#ifndef __LINUX_FIND_H_
#define __LINUX_FIND_H_

#ifndef __LINUX_BITMAP_H
#error only <linux/bitmap.h> can be included directly
#endif

#include <linux/bitops.h>
2006-03-26 17:39:11 +08:00
lib/find_bit: optimize find_next_bit() functions
Over the past couple of years, the function _find_next_bit() was extended
with parameters that modify its behavior to implement and-, zero- and le-
flavors. The parameters are passed at compile time, but the current design
prevents a compiler from optimizing out the conditionals.
As the find_next_bit() API grows, I expect that more parameters will be added.
The current design would require more conditional code in _find_next_bit(),
which would bloat the helper even more and make it barely readable.
This patch replaces _find_next_bit() with a macro FIND_NEXT_BIT, and adds
a set of wrappers, so that the compile-time optimizations become possible.
The common logic is moved to the new macro, and all flavors may be
generated by providing a FETCH macro parameter, like in this example:
    #define FIND_NEXT_BIT(FETCH, MUNGE, size, start) ...

    find_next_xornot_and_bit(addr1, addr2, addr3, size, start)
    {
            return FIND_NEXT_BIT(addr1[idx] ^ ~addr2[idx] & addr3[idx],
                                    /* nop */, size, start);
    }
FETCH may be of any complexity, as long as it refers only to the
bitmap(s) and an iterator idx.
MUNGE is here to support _le code generation for BE builds. It may be
empty.
I ran find_bit_benchmark 16 times on top of 6.0-rc2 and 16 times on top
of 6.0-rc2 + this series. The results for kvm/x86_64 are:
                          v6.0-rc2   Optimized        Difference   Z-score
Random dense bitmap             ns          ns        ns       %
find_next_bit:              787735      670546    117189    14.9      3.97
find_next_zero_bit:         777492      664208    113284    14.6     10.51
find_last_bit:              830925      687573    143352    17.3      2.35
find_first_bit:            3874366     3306635    567731    14.7      1.84
find_first_and_bit:       40677125    37739887   2937238     7.2      1.36
find_next_and_bit:          347865      304456     43409    12.5      1.35
Random sparse bitmap
find_next_bit:               19816       14021      5795    29.2      6.10
find_next_zero_bit:        1318901     1223794     95107     7.2      1.41
find_last_bit:               14573       13514      1059     7.3      6.92
find_first_bit:            1313321     1249024     64297     4.9      1.53
find_first_and_bit:           8921        8098       823     9.2      4.56
find_next_and_bit:            9796        7176      2620    26.7      5.39
Where the statistics are significant (z-score > 3), the improvement
is ~15%.
According to the bloat-o-meter, the Image size is 10-11K less:
x86_64/defconfig:
add/remove: 32/14 grow/shrink: 61/782 up/down: 6344/-16521 (-10177)
arm64/defconfig:
add/remove: 3/2 grow/shrink: 50/714 up/down: 608/-11556 (-10948)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
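To make the FETCH/MUNGE idea concrete, here is a self-contained userspace sketch. It is not the macro from lib/find_bit.c: `FIRST_WORD_MASK` and the `sketch_*` wrappers are hypothetical names, `__builtin_ctzl()` stands in for `__ffs()`, and it relies on GNU C statement expressions as the kernel does. The key point is that FETCH is pasted in textually, so it can refer to the macro's own `idx`, and every flavor is just a different expression.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Bits [start % BITS_PER_LONG, BITS_PER_LONG) of the first word. */
#define FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG))

/*
 * FETCH: expression giving the word at index `idx` (declared below).
 * MUNGE: applied to the mask and the found word; empty for native order.
 */
#define FIND_NEXT_BIT(FETCH, MUNGE, size, start) \
({ \
	unsigned long sz_ = (size), start_ = (start), res_ = sz_; \
	if (start_ < sz_) { \
		unsigned long idx = start_ / BITS_PER_LONG; \
		unsigned long tmp = (FETCH) & MUNGE(FIRST_WORD_MASK(start_)); \
		/* Skip zero words while more of the bitmap remains. */ \
		while (!tmp && (idx + 1) * BITS_PER_LONG < sz_) { \
			idx++; \
			tmp = (FETCH); \
		} \
		if (tmp) { \
			unsigned long bit_ = idx * BITS_PER_LONG + \
				(unsigned long)__builtin_ctzl(MUNGE(tmp)); \
			res_ = bit_ < sz_ ? bit_ : sz_; \
		} \
	} \
	res_; \
})

static unsigned long sketch_find_next_bit(const unsigned long *addr,
					  unsigned long size, unsigned long start)
{
	return FIND_NEXT_BIT(addr[idx], /* nop */, size, start);
}

static unsigned long sketch_find_next_zero_bit(const unsigned long *addr,
					       unsigned long size, unsigned long start)
{
	return FIND_NEXT_BIT(~addr[idx], /* nop */, size, start);
}

static unsigned long sketch_find_next_and_bit(const unsigned long *addr1,
					      const unsigned long *addr2,
					      unsigned long size, unsigned long start)
{
	return FIND_NEXT_BIT(addr1[idx] & addr2[idx], /* nop */, size, start);
}
```

With an empty MUNGE, `MUNGE(x)` expands to `(x)`, so the native-endian flavors pay nothing for the hook; a big-endian `_le` flavor would presumably pass a byte-swap macro there instead.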
2022-09-15 10:07:29 +08:00
unsigned long _find_next_bit(const unsigned long *addr1, unsigned long nbits,
				unsigned long start);
unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigned long *addr2,
					unsigned long nbits, unsigned long start);
2022-10-03 23:34:17 +08:00
unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
					unsigned long nbits, unsigned long start);
2022-09-15 10:07:29 +08:00
unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
					unsigned long start);
2021-05-07 09:03:14 +08:00
extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
lib: add find_nth{,_and,_andnot}_bit()
The kernel lacks a function that searches for the Nth bit in a bitmap.
Usually people do it like this:
	for_each_set_bit(bit, mask, size)
		if (n-- == 0)
			return bit;
We can do it more efficiently, if we:
1. find a word containing Nth bit, using hweight(); and
2. find the bit, using a helper fns(), that works similarly to
__ffs() and ffz().
fns() is implemented as a simple loop. For x86_64, there's the PDEP instruction
to do that: ret = ctz(pdep(1 << idx, num)). However, for large bitmaps
most of the improvement comes from using hweight(), so I kept fns() simple.
New find_nth_bit() is ~70 times faster on x86_64/kvm in find_bit benchmark:
find_nth_bit: 7154190 ns, 16411 iterations
for_each_bit: 505493126 ns, 16315 iterations
With all that, a family of 3 new functions is added, and used where
appropriate in the following patches.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
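The two steps above (skip whole words using their popcount, then locate the bit inside the final word) can be sketched in userspace as follows. `__builtin_popcountl()` plays the role of hweight(), `fns()` follows the simple-loop approach the message describes, and `sketch_find_nth_bit()` and `naive_nth_bit()` are hypothetical names; the naive version is the for_each_set_bit() pattern, kept for comparison.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the n-th (0-based) set bit of @word, or BITS_PER_LONG. */
static unsigned long fns(unsigned long word, unsigned long n)
{
	while (word && n--)
		word &= word - 1;       /* clear the lowest set bit */
	return word ? (unsigned long)__builtin_ctzl(word) : BITS_PER_LONG;
}

static unsigned long sketch_find_nth_bit(const unsigned long *addr,
					 unsigned long size, unsigned long n)
{
	unsigned long idx, nwords = (size + BITS_PER_LONG - 1) / BITS_PER_LONG;

	for (idx = 0; idx < nwords; idx++) {
		unsigned long word = addr[idx], w;

		/* Ignore bits beyond @size in a trailing partial word. */
		if ((idx + 1) * BITS_PER_LONG > size)
			word &= ~0UL >> (BITS_PER_LONG - size % BITS_PER_LONG);

		w = (unsigned long)__builtin_popcountl(word);   /* hweight() */
		if (n < w)
			return idx * BITS_PER_LONG + fns(word, n);
		n -= w;                 /* the Nth bit is further on: skip word */
	}
	return size;
}

/* The for_each_set_bit() pattern, one bit at a time, for comparison. */
static unsigned long naive_nth_bit(const unsigned long *addr,
				   unsigned long size, unsigned long n)
{
	for (unsigned long bit = 0; bit < size; bit++)
		if (((addr[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL) &&
		    n-- == 0)
			return bit;
	return size;
}
```

The word-skipping loop touches each word once, while the naive loop tests every bit, which is where the large speedup on big bitmaps comes from.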
2022-09-18 11:07:13 +08:00
unsigned long __find_nth_bit(const unsigned long *addr, unsigned long size, unsigned long n);
unsigned long __find_nth_and_bit(const unsigned long *addr1, const unsigned long *addr2,
				unsigned long size, unsigned long n);
unsigned long __find_nth_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
				unsigned long size, unsigned long n);
2021-08-15 05:17:01 +08:00
extern unsigned long _find_first_and_bit(const unsigned long *addr1,
					 const unsigned long *addr2, unsigned long size);
2021-05-07 09:03:14 +08:00
extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
2021-05-07 09:03:03 +08:00

2022-09-15 10:07:28 +08:00
#ifdef __BIG_ENDIAN
unsigned long _find_first_zero_bit_le(const unsigned long *addr, unsigned long size);
2022-09-15 10:07:29 +08:00
unsigned long _find_next_zero_bit_le(const unsigned long *addr, unsigned
		long size, unsigned long offset);
unsigned long _find_next_bit_le(const unsigned long *addr, unsigned
		long size, unsigned long offset);
2022-09-15 10:07:28 +08:00
#endif

2011-05-27 07:26:09 +08:00
#ifndef find_next_bit
2010-09-29 17:08:51 +08:00
/**
 * find_next_bit - find the next set bit in a memory region
 * @addr: The address to base the search on
 * @size: The bitmap size in bits
2022-04-11 23:05:55 +08:00
 * @offset: The bitnumber to start searching at
2013-11-13 07:09:48 +08:00
 *
 * Returns the bit number for the next set bit
 * If no bits are set, returns @size.
2010-09-29 17:08:51 +08:00
 */
2021-05-07 09:03:03 +08:00
static inline
unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
			    unsigned long offset)
{
lib: add fast path for find_next_*_bit()
Similarly to bitmap functions, find_next_*_bit() users will benefit if
we handle the case of bitmaps that fit into a single word inline. In the
very best case, the compiler may replace a function call with a few
instructions.
This is a quite typical find_next_bit() user:
	unsigned int cpumask_next(int n, const struct cpumask *srcp)
	{
		/* -1 is a legal arg here. */
		if (n != -1)
			cpumask_check(n);
		return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
	}
	EXPORT_SYMBOL(cpumask_next);
Currently, on ARM64 the generated code looks like this:
0000000000000000 <cpumask_next>:
0: a9bf7bfd stp x29, x30, [sp, #-16]!
4: 11000402 add w2, w0, #0x1
8: aa0103e0 mov x0, x1
c: d2800401 mov x1, #0x40 // #64
10: 910003fd mov x29, sp
14: 93407c42 sxtw x2, w2
18: 94000000 bl 0 <find_next_bit>
1c: a8c17bfd ldp x29, x30, [sp], #16
20: d65f03c0 ret
24: d503201f nop
After applying this patch:
0000000000000140 <cpumask_next>:
140: 11000400 add w0, w0, #0x1
144: 93407c00 sxtw x0, w0
148: f100fc1f cmp x0, #0x3f
14c: 54000168 b.hi 178 <cpumask_next+0x38> // b.pmore
150: f9400023 ldr x3, [x1]
154: 92800001 mov x1, #0xffffffffffffffff // #-1
158: 9ac02020 lsl x0, x1, x0
15c: 52800802 mov w2, #0x40 // #64
160: 8a030001 and x1, x0, x3
164: dac00020 rbit x0, x1
168: f100003f cmp x1, #0x0
16c: dac01000 clz x0, x0
170: 1a800040 csel w0, w2, w0, eq // eq = none
174: d65f03c0 ret
178: 52800800 mov w0, #0x40 // #64
17c: d65f03c0 ret
The find_next_bit() call is replaced with 6 instructions, whereas
find_next_bit() itself is 41 instructions plus function call overhead.
Despite inlining, scripts/bloat-o-meter reports a smaller .text size
after applying the series:
add/remove: 11/9 grow/shrink: 233/176 up/down: 5780/-6768 (-988)
Link: https://lkml.kernel.org/r/20210401003153.97325-10-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
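The fast path can be modelled outside the kernel as below. This is a sketch under assumptions: `GENMASK_UL()` and `SMALL_CONST_NBITS()` are hypothetical re-creations of GENMASK() and small_const_nbits(), `slow_find_next_bit()` stands in for the out-of-line _find_next_bit(), and the inline branch only triggers when the compiler can prove `size` is a constant of at most one word (typically only with optimization enabled).

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Bits l..h set (requires l <= h < BITS_PER_LONG), like GENMASK(). */
#define GENMASK_UL(h, l) \
	((~0UL << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
/* Compile-time-constant size that fits in a single word. */
#define SMALL_CONST_NBITS(nbits) \
	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)

/* Stand-in for the out-of-line _find_next_bit(): plain bit-by-bit scan. */
static unsigned long slow_find_next_bit(const unsigned long *addr,
					unsigned long size, unsigned long offset)
{
	for (; offset < size; offset++)
		if ((addr[offset / BITS_PER_LONG] >> (offset % BITS_PER_LONG)) & 1UL)
			return offset;
	return size;
}

static inline unsigned long sketch_find_next_bit(const unsigned long *addr,
						 unsigned long size,
						 unsigned long offset)
{
	if (SMALL_CONST_NBITS(size)) {
		unsigned long val;

		if (offset >= size)
			return size;
		/* Keep bits offset..size-1 of the only word, then take the lowest. */
		val = *addr & GENMASK_UL(size - 1, offset);
		return val ? (unsigned long)__builtin_ctzl(val) : size;
	}
	return slow_find_next_bit(addr, size, offset);
}
```

Both paths must return the same answer; the constant-size branch only lets the compiler fold the mask and drop the call, which is the effect the arm64 listing above shows.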
2021-05-07 09:03:11 +08:00
	if (small_const_nbits(size)) {
		unsigned long val;

		if (unlikely(offset >= size))
			return size;

		val = *addr & GENMASK(size - 1, offset);
		return val ? __ffs(val) : size;
	}

2022-09-15 10:07:29 +08:00
	return _find_next_bit(addr, size, offset);
2021-05-07 09:03:03 +08:00
}
2011-05-27 07:26:09 +08:00
#endif
2006-03-26 17:39:11 +08:00
lib: optimize cpumask_next_and()
We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
It's essentially a joined iteration in search for a non-zero bit, which is
currently implemented as a lookup join (find a nonzero bit on the lhs,
lookup the rhs to see if it's set there).
Implement a direct join (find a nonzero bit on the incrementally built
join). Also add generic bitmap benchmarks in the new `test_find_bit`
module for new function (see `find_next_and_bit` in [2] and [3] below).
For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory
usage. Note that on Arm, the new pure-C implementation still outperforms
the old one that uses a mix of C and asm (`find_next_bit`) [3].
[1] Approximate benchmark code:
```
unsigned long src1p[nr_cpumask_longs] = {pattern1};
unsigned long src2p[nr_cpumask_longs] = {pattern2};
for (/*a bunch of repetitions*/) {
  for (int n = -1; n <= nr_cpu_ids; ++n) {
    asm volatile("" : "+rm"(src1p)); // prevent any optimization
    asm volatile("" : "+rm"(src2p));
    unsigned long result = cpumask_next_and(n, src1p, src2p);
    asm volatile("" : "+rm"(result));
  }
}
```
Results:
pattern1    pattern2    time_before/time_after
0x0000ffff  0x0000ffff  1.65
0x0000ffff  0x00005555  2.24
0x0000ffff  0x00001111  2.94
0x0000ffff  0x00000000  14.0
0x00005555  0x0000ffff  1.67
0x00005555  0x00005555  1.71
0x00005555  0x00001111  1.90
0x00005555  0x00000000  6.58
0x00001111  0x0000ffff  1.46
0x00001111  0x00005555  1.49
0x00001111  0x00001111  1.45
0x00001111  0x00000000  3.10
0x00000000  0x0000ffff  1.18
0x00000000  0x00005555  1.18
0x00000000  0x00001111  1.17
0x00000000  0x00000000  1.25
-----------------------------
geo.mean                2.06
[2] test_find_next_bit, X86 (skylake)
[ 3913.477422] Start testing find_bit() with random-filled bitmap
[ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations
[ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations
[ 3913.478036] find_last_bit: 201638 cycles, 16483 iterations
[ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations
[ 3913.480216] Start testing find_next_and_bit() with random-filled
bitmap
[ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations
[ 3913.481075] Start testing find_bit() with sparse bitmap
[ 3913.481078] find_next_bit: 2536 cycles, 66 iterations
[ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations
[ 3913.481255] find_last_bit: 2006 cycles, 66 iterations
[ 3913.481265] find_first_bit: 17488 cycles, 66 iterations
[ 3913.481266] Start testing find_next_and_bit() with sparse bitmap
[ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations
[3] test_find_next_bit, arm (v7 odroid XU3).
[ 267.206928] Start testing find_bit() with random-filled bitmap
[ 267.214752] find_next_bit: 4474 cycles, 16419 iterations
[ 267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations
[ 267.229294] find_last_bit: 4209 cycles, 16419 iterations
[ 267.279131] find_first_bit: 1032991 cycles, 16420 iterations
[ 267.286265] Start testing find_next_and_bit() with random-filled
bitmap
[ 267.302386] find_next_and_bit: 2290 cycles, 8140 iterations
[ 267.309422] Start testing find_bit() with sparse bitmap
[ 267.316054] find_next_bit: 191 cycles, 66 iterations
[ 267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations
[ 267.329803] find_last_bit: 84 cycles, 66 iterations
[ 267.336169] find_first_bit: 4118 cycles, 66 iterations
[ 267.342627] Start testing find_next_and_bit() with sparse bitmap
[ 267.356919] find_next_and_bit: 91 cycles, 1 iterations
[courbet@google.com: v6]
Link: http://lkml.kernel.org/r/20171129095715.23430-1-courbet@google.com
[geert@linux-m68k.org: m68k/bitops: always include <asm-generic/bitops/find.h>]
Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com
Signed-off-by: Clement Courbet <courbet@google.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
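The difference between the two join strategies can be shown with a small sketch. `next_and_lookup()` walks the set bits of the first bitmap and probes the second one bit by bit, while `next_and_direct()` ANDs whole words and skips any word whose intersection is zero. Both names and the shared `next_set()` helper are hypothetical; they only illustrate the idea behind find_next_and_bit(), not its actual code.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Next set bit at or after @start in @a, or in (@a & @b) when @b is
 * non-NULL. Returns @size if there is none.
 */
static unsigned long next_set(const unsigned long *a, const unsigned long *b,
			      unsigned long size, unsigned long start)
{
	while (start < size) {
		unsigned long idx = start / BITS_PER_LONG;
		unsigned long word = a[idx] & (b ? b[idx] : ~0UL);

		word &= ~0UL << (start % BITS_PER_LONG); /* drop bits below start */
		if (word) {
			unsigned long bit = idx * BITS_PER_LONG +
					    (unsigned long)__builtin_ctzl(word);
			return bit < size ? bit : size;
		}
		start = (idx + 1) * BITS_PER_LONG;       /* move to the next word */
	}
	return size;
}

/* Lookup join: find a set bit in @a, then test that bit in @b. */
static unsigned long next_and_lookup(const unsigned long *a, const unsigned long *b,
				     unsigned long size, unsigned long start)
{
	unsigned long bit;

	for (bit = next_set(a, NULL, size, start); bit < size;
	     bit = next_set(a, NULL, size, bit + 1))
		if ((b[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL)
			return bit;
	return size;
}

/* Direct join: search the word-wise intersection @a & @b. */
static unsigned long next_and_direct(const unsigned long *a, const unsigned long *b,
				     unsigned long size, unsigned long start)
{
	return next_set(a, b, size, start);
}
```

When the second mask is empty (the 14x row in the table above), the lookup join still visits every set bit of the first mask, whereas the direct join touches each word once.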
2018-02-07 07:38:34 +08:00
#ifndef find_next_and_bit
/**
 * find_next_and_bit - find the next set bit in both memory regions
 * @addr1: The first address to base the search on
 * @addr2: The second address to base the search on
 * @size: The bitmap size in bits
2022-04-11 23:05:55 +08:00
 * @offset: The bitnumber to start searching at
lib: optimize cpumask_next_and()
We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
It's essentially a joined iteration in search for a non-zero bit, which is
currently implemented as a lookup join (find a nonzero bit on the lhs,
lookup the rhs to see if it's set there).
Implement a direct join (find a nonzero bit on the incrementally built
join). Also add generic bitmap benchmarks in the new `test_find_bit`
module for new function (see `find_next_and_bit` in [2] and [3] below).
For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory
usage. Note that on Arm, the new pure-C implementation still outperforms
the old one that uses a mix of C and asm (`find_next_bit`) [3].
[1] Approximate benchmark code:
```
unsigned long src1p[nr_cpumask_longs] = {pattern1};
unsigned long src2p[nr_cpumask_longs] = {pattern2};
for (/*a bunch of repetitions*/) {
for (int n = -1; n <= nr_cpu_ids; ++n) {
asm volatile("" : "+rm"(src1p)); // prevent any optimization
asm volatile("" : "+rm"(src2p));
unsigned long result = cpumask_next_and(n, src1p, src2p);
asm volatile("" : "+rm"(result));
}
}
```
Results:
pattern1 pattern2 time_before/time_after
0x0000ffff 0x0000ffff 1.65
0x0000ffff 0x00005555 2.24
0x0000ffff 0x00001111 2.94
0x0000ffff 0x00000000 14.0
0x00005555 0x0000ffff 1.67
0x00005555 0x00005555 1.71
0x00005555 0x00001111 1.90
0x00005555 0x00000000 6.58
0x00001111 0x0000ffff 1.46
0x00001111 0x00005555 1.49
0x00001111 0x00001111 1.45
0x00001111 0x00000000 3.10
0x00000000 0x0000ffff 1.18
0x00000000 0x00005555 1.18
0x00000000 0x00001111 1.17
0x00000000 0x00000000 1.25
-----------------------------
geo.mean 2.06
[2] test_find_next_bit, X86 (skylake)
[ 3913.477422] Start testing find_bit() with random-filled bitmap
[ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations
[ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations
[ 3913.478036] find_last_bit: 201638 cycles, 16483 iterations
[ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations
[ 3913.480216] Start testing find_next_and_bit() with random-filled
bitmap
[ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations
[ 3913.481075] Start testing find_bit() with sparse bitmap
[ 3913.481078] find_next_bit: 2536 cycles, 66 iterations
[ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations
[ 3913.481255] find_last_bit: 2006 cycles, 66 iterations
[ 3913.481265] find_first_bit: 17488 cycles, 66 iterations
[ 3913.481266] Start testing find_next_and_bit() with sparse bitmap
[ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations
[3] test_find_next_bit, arm (v7 odroid XU3).
[ 267.206928] Start testing find_bit() with random-filled bitmap
[ 267.214752] find_next_bit: 4474 cycles, 16419 iterations
[ 267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations
[ 267.229294] find_last_bit: 4209 cycles, 16419 iterations
[ 267.279131] find_first_bit: 1032991 cycles, 16420 iterations
[ 267.286265] Start testing find_next_and_bit() with random-filled
bitmap
[ 267.302386] find_next_and_bit: 2290 cycles, 8140 iterations
[ 267.309422] Start testing find_bit() with sparse bitmap
[ 267.316054] find_next_bit: 191 cycles, 66 iterations
[ 267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations
[ 267.329803] find_last_bit: 84 cycles, 66 iterations
[ 267.336169] find_first_bit: 4118 cycles, 66 iterations
[ 267.342627] Start testing find_next_and_bit() with sparse bitmap
[ 267.356919] find_next_and_bit: 91 cycles, 1 iterations
[courbet@google.com: v6]
Link: http://lkml.kernel.org/r/20171129095715.23430-1-courbet@google.com
[geert@linux-m68k.org: m68k/bitops: always include <asm-generic/bitops/find.h>]
Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com
Signed-off-by: Clement Courbet <courbet@google.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
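The message above contrasts a lookup join (find a set bit in the first bitmap, then test it in the second) with a direct join (AND the words first, then search the combined word). Below is a minimal user-space sketch of both. It assumes GCC/Clang for `__builtin_ctzl()`. `next_set()`, `next_and_lookup()` and `next_and_direct()` are hypothetical names and not kernel functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (CHAR_BIT * sizeof(unsigned long))

/* Next set bit in a[] at or after start, or size if there is none. */
static size_t next_set(const unsigned long *a, size_t size, size_t start)
{
	size_t idx, bit;
	unsigned long w;

	if (start >= size)
		return size;
	idx = start / BITS_PER_WORD;
	w = a[idx] & (~0UL << (start % BITS_PER_WORD));
	while (!w) {
		if (++idx * BITS_PER_WORD >= size)
			return size;
		w = a[idx];
	}
	bit = idx * BITS_PER_WORD + (size_t)__builtin_ctzl(w);
	return bit < size ? bit : size;
}

/* Lookup join: walk the set bits of a[] and test each one in b[]. */
static size_t next_and_lookup(const unsigned long *a, const unsigned long *b,
			      size_t size, size_t start)
{
	size_t i;

	for (i = next_set(a, size, start); i < size; i = next_set(a, size, i + 1))
		if ((b[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL)
			return i;
	return size;
}

/* Direct join: AND one word of each bitmap, then search the combined word. */
static size_t next_and_direct(const unsigned long *a, const unsigned long *b,
			      size_t size, size_t start)
{
	size_t idx, bit;
	unsigned long w;

	if (start >= size)
		return size;
	idx = start / BITS_PER_WORD;
	w = a[idx] & b[idx] & (~0UL << (start % BITS_PER_WORD));
	while (!w) {
		if (++idx * BITS_PER_WORD >= size)
			return size;
		w = a[idx] & b[idx];
	}
	bit = idx * BITS_PER_WORD + (size_t)__builtin_ctzl(w);
	return bit < size ? bit : size;
}
```

Both functions return the same answers. Suppose the first bitmap has many set bits that are clear in the second. The lookup join then tests those bits one at a time, while the direct join discards them a whole word at a time.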
2018-02-07 07:38:34 +08:00
 *
 * Returns the bit number for the next set bit
 * If no bits are set, returns @size.
 */
2021-05-07 09:03:03 +08:00
static inline
unsigned long find_next_and_bit(const unsigned long *addr1,
2018-02-07 07:38:34 +08:00
		const unsigned long *addr2, unsigned long size,
2021-05-07 09:03:03 +08:00
		unsigned long offset)
{
lib: add fast path for find_next_*_bit()
Similarly to bitmap functions, find_next_*_bit() users will benefit if
we handle the case of bitmaps that fit into a single word inline. In the
very best case, the compiler may replace a function call with a few
instructions.
This is a typical find_next_bit() user:
unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
	/* -1 is a legal arg here. */
	if (n != -1)
		cpumask_check(n);
	return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
}
EXPORT_SYMBOL(cpumask_next);
Currently, on ARM64 the generated code looks like this:
0000000000000000 <cpumask_next>:
0: a9bf7bfd stp x29, x30, [sp, #-16]!
4: 11000402 add w2, w0, #0x1
8: aa0103e0 mov x0, x1
c: d2800401 mov x1, #0x40 // #64
10: 910003fd mov x29, sp
14: 93407c42 sxtw x2, w2
18: 94000000 bl 0 <find_next_bit>
1c: a8c17bfd ldp x29, x30, [sp], #16
20: d65f03c0 ret
24: d503201f nop
After applying this patch:
0000000000000140 <cpumask_next>:
140: 11000400 add w0, w0, #0x1
144: 93407c00 sxtw x0, w0
148: f100fc1f cmp x0, #0x3f
14c: 54000168 b.hi 178 <cpumask_next+0x38> // b.pmore
150: f9400023 ldr x3, [x1]
154: 92800001 mov x1, #0xffffffffffffffff // #-1
158: 9ac02020 lsl x0, x1, x0
15c: 52800802 mov w2, #0x40 // #64
160: 8a030001 and x1, x0, x3
164: dac00020 rbit x0, x1
168: f100003f cmp x1, #0x0
16c: dac01000 clz x0, x0
170: 1a800040 csel w0, w2, w0, eq // eq = none
174: d65f03c0 ret
178: 52800800 mov w0, #0x40 // #64
17c: d65f03c0 ret
The find_next_bit() call is replaced with 6 instructions, while
find_next_bit() itself is 41 instructions plus function call overhead.
Despite the inlining, scripts/bloat-o-meter reports a smaller .text size
after applying the series:
add/remove: 11/9 grow/shrink: 233/176 up/down: 5780/-6768 (-988)
Link: https://lkml.kernel.org/r/20210401003153.97325-10-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
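The fast path relies on one fact: a bitmap that fits in one word needs no loop. You mask off the bits below @offset and at or above @size, then take the lowest set bit. Below is a user-space sketch that assumes GCC/Clang builtins. `BITS_PER_LONG` and `GENMASK()` are re-defined locally to mirror the kernel macros, and `find_next_bit_word()` is a hypothetical name. A runtime assert stands in for the kernel's compile-time `small_const_nbits()` check.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
/* Local stand-in for the kernel macro: bits l..h set, 0 <= l <= h < BITS_PER_LONG. */
#define GENMASK(h, l) \
	((~0UL << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

/* Single-word version of find_next_bit(); only valid for size <= BITS_PER_LONG. */
static inline unsigned long find_next_bit_word(const unsigned long *addr,
					       unsigned long size,
					       unsigned long offset)
{
	unsigned long val;

	assert(size <= BITS_PER_LONG);
	if (offset >= size)
		return size;

	/* Keep only bits offset..size-1, then pick the lowest one. */
	val = *addr & GENMASK(size - 1, offset);
	/* __builtin_ctzl(0) is undefined, hence the explicit zero check. */
	return val ? (unsigned long)__builtin_ctzl(val) : size;
}
```

In the kernel, this branch is taken only when `small_const_nbits(size)` holds, that is, when size is a compile-time constant no larger than BITS_PER_LONG. That condition is what lets the compiler replace the call with a few inline instructions.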
2021-05-07 09:03:11 +08:00
	if (small_const_nbits(size)) {
		unsigned long val;

		if (unlikely(offset >= size))
			return size;

		val = *addr1 & *addr2 & GENMASK(size - 1, offset);
		return val ? __ffs(val) : size;
	}

lib/find_bit: optimize find_next_bit() functions
Over the past couple years, the function _find_next_bit() was extended
with parameters that modify its behavior to implement and-, zero- and le-
flavors. The parameters are passed at compile time, but the current design
prevents a compiler from optimizing out the conditionals.
As the find_next_bit() API grows, I expect more parameters to be added.
The current design would require more conditional code in _find_next_bit(),
which would bloat the helper even more and make it barely readable.
This patch replaces _find_next_bit() with a macro FIND_NEXT_BIT, and adds
a set of wrappers, so that the compile-time optimizations become possible.
The common logic is moved to the new macro, and all flavors may be
generated by providing a FETCH macro parameter, like in this example:
#define FIND_NEXT_BIT(FETCH, MUNGE, size, start) ...
find_next_xornot_and_bit(addr1, addr2, addr3, size, start)
{
	return FIND_NEXT_BIT(addr1[idx] ^ ~addr2[idx] & addr3[idx],
				/* nop */, size, start);
}
FETCH may be of any complexity, as long as it only refers to the bitmap(s)
and an iterator idx.
MUNGE is there to support _le code generation for BE builds. It may be
empty.
I ran find_bit_benchmark 16 times on top of 6.0-rc2 and 16 times on top
of 6.0-rc2 + this series. The results for kvm/x86_64 are:
                      v6.0-rc2   Optimized       Difference  Z-score
Random dense bitmap         ns          ns        ns      %
find_next_bit:          787735      670546    117189   14.9     3.97
find_next_zero_bit:     777492      664208    113284   14.6    10.51
find_last_bit:          830925      687573    143352   17.3     2.35
find_first_bit:        3874366     3306635    567731   14.7     1.84
find_first_and_bit:   40677125    37739887   2937238    7.2     1.36
find_next_and_bit:      347865      304456     43409   12.5     1.35
Random sparse bitmap
find_next_bit:           19816       14021      5795   29.2     6.10
find_next_zero_bit:    1318901     1223794     95107    7.2     1.41
find_last_bit:           14573       13514      1059    7.3     6.92
find_first_bit:        1313321     1249024     64297    4.9     1.53
find_first_and_bit:       8921        8098       823    9.2     4.56
find_next_and_bit:        9796        7176      2620   26.7     5.39
Where the difference is statistically significant (z-score > 3), the
improvement is ~15%.
According to the bloat-o-meter, the Image size is 10-11K less:
x86_64/defconfig:
add/remove: 32/14 grow/shrink: 61/782 up/down: 6344/-16521 (-10177)
arm64/defconfig:
add/remove: 3/2 grow/shrink: 50/714 up/down: 608/-11556 (-10948)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
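The idea of generating every flavor from one body, parameterized only by the expression that fetches a word, can be sketched in user-space C. `DEFINE_FIND_NEXT()` and the `my_find_next_*()` functions are hypothetical and assume GCC/Clang for `__builtin_ctzl()`. The sketch differs from the kernel's FIND_NEXT_BIT in two ways:

- It emits whole function definitions instead of a statement expression.
- It omits the MUNGE parameter, so there are no big-endian `_le` variants.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/*
 * Define a search function 'name' with parameter list 'params', which must
 * include 'size' and 'start'. FETCH yields the word at index 'idx' and may
 * only refer to the bitmap parameters and 'idx'.
 */
#define DEFINE_FIND_NEXT(name, params, FETCH)				\
static unsigned long name params					\
{									\
	unsigned long idx, tmp;						\
									\
	if (start >= size)						\
		return size;						\
	idx = start / BITS_PER_LONG;					\
	/* First word: ignore bits below start. */			\
	tmp = (FETCH) & (~0UL << (start % BITS_PER_LONG));		\
	while (!tmp) {							\
		if (++idx * BITS_PER_LONG >= size)			\
			return size;					\
		tmp = (FETCH);						\
	}								\
	idx = idx * BITS_PER_LONG + (unsigned long)__builtin_ctzl(tmp);	\
	return idx < size ? idx : size;					\
}

DEFINE_FIND_NEXT(my_find_next_bit,
		 (const unsigned long *addr, unsigned long size, unsigned long start),
		 addr[idx])

DEFINE_FIND_NEXT(my_find_next_and_bit,
		 (const unsigned long *addr1, const unsigned long *addr2,
		  unsigned long size, unsigned long start),
		 addr1[idx] & addr2[idx])

/* Searching for a zero bit is the same search on the inverted word. */
DEFINE_FIND_NEXT(my_find_next_zero_bit,
		 (const unsigned long *addr, unsigned long size, unsigned long start),
		 ~addr[idx])
```

Each flavor differs only in FETCH. The compiler therefore sees a straight-line body per function, with no runtime flags to test.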
2022-09-15 10:07:29 +08:00
	return _find_next_and_bit(addr1, addr2, size, offset);
2021-05-07 09:03:03 +08:00
}
2018-02-07 07:38:34 +08:00
#endif

2022-10-03 23:34:17 +08:00
#ifndef find_next_andnot_bit
/**
 * find_next_andnot_bit - find the next set bit in *addr1 excluding all the bits
 *                        in *addr2
 * @addr1: The first address to base the search on
 * @addr2: The second address to base the search on
 * @size: The bitmap size in bits
 * @offset: The bitnumber to start searching at
 *
 * Returns the bit number for the next set bit
 * If no bits are set, returns @size.
 */
static inline
unsigned long find_next_andnot_bit(const unsigned long *addr1,
		const unsigned long *addr2, unsigned long size,
		unsigned long offset)
{
	if (small_const_nbits(size)) {
		unsigned long val;

		if (unlikely(offset >= size))
			return size;

		val = *addr1 & ~*addr2 & GENMASK(size - 1, offset);
		return val ? __ffs(val) : size;
	}

	return _find_next_andnot_bit(addr1, addr2, size, offset);
}
#endif

2011-05-27 07:26:09 +08:00
#ifndef find_next_zero_bit
2010-09-29 17:08:51 +08:00
/**
 * find_next_zero_bit - find the next cleared bit in a memory region
 * @addr: The address to base the search on
 * @size: The bitmap size in bits
2022-04-11 23:05:55 +08:00
 * @offset: The bitnumber to start searching at
2013-11-13 07:09:48 +08:00
 *
 * Returns the bit number of the next zero bit
 * If no bits are zero, returns @size.
2010-09-29 17:08:51 +08:00
 */
2021-05-07 09:03:03 +08:00
static inline
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
				 unsigned long offset)
{
2021-05-07 09:03:11 +08:00
	if (small_const_nbits(size)) {
		unsigned long val;

		if (unlikely(offset >= size))
			return size;

		val = *addr | ~GENMASK(size - 1, offset);
		return val == ~0UL ? size : ffz(val);
	}

2022-09-15 10:07:29 +08:00
	return _find_next_zero_bit(addr, size, offset);
2021-05-07 09:03:03 +08:00
}
2011-05-27 07:26:09 +08:00
#endif
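The single-word branch of find_next_zero_bit() above uses the opposite trick from find_next_bit(). Instead of clearing the out-of-range bits, it sets them, so they can never be reported as zero. An all-ones result then means there is no cleared bit in range.

Below is a user-space sketch assuming GCC/Clang. `find_next_zero_bit_word()` is a hypothetical name, the local `GENMASK()` mirrors the kernel macro, and `ffz(x)` is written as `__builtin_ctzl(~x)`.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
/* Local stand-in for the kernel macro: bits l..h set, 0 <= l <= h < BITS_PER_LONG. */
#define GENMASK(h, l) \
	((~0UL << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

static unsigned long find_next_zero_bit_word(const unsigned long *addr,
					     unsigned long size,
					     unsigned long offset)
{
	unsigned long val;

	assert(size <= BITS_PER_LONG);
	if (offset >= size)
		return size;

	/* Force every bit outside [offset, size) to 1 so it is never reported. */
	val = *addr | ~GENMASK(size - 1, offset);
	/* All ones: no cleared bit in range. Otherwise ffz(val) == ctz(~val). */
	return val == ~0UL ? size : (unsigned long)__builtin_ctzl(~val);
}
```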
2006-03-26 17:39:11 +08:00

2021-08-15 05:16:57 +08:00
#ifndef find_first_bit
2010-09-29 17:08:50 +08:00
/**
 * find_first_bit - find the first set bit in a memory region
 * @addr: The address to start the search at
2013-11-13 07:09:48 +08:00
 * @size: The maximum number of bits to search
2010-09-29 17:08:50 +08:00
 *
 * Returns the bit number of the first set bit.
2013-11-13 07:09:48 +08:00
 * If no bits are set, returns @size.
2010-09-29 17:08:50 +08:00
 */
2021-05-07 09:03:14 +08:00
static inline
unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
{
	if (small_const_nbits(size)) {
		unsigned long val = *addr & GENMASK(size - 1, 0);

		return val ? __ffs(val) : size;
	}

	return _find_first_bit(addr, size);
}
2021-08-15 05:16:57 +08:00
#endif
2010-09-29 17:08:50 +08:00

lib: add find_nth{,_and,_andnot}_bit()
The kernel lacks a function that searches for the Nth set bit in a bitmap.
Usually people do it like this:
	for_each_set_bit(bit, mask, size)
		if (n-- == 0)
			return bit;
We can do it more efficiently if we:
1. find the word containing the Nth bit, using hweight(); and
2. find the bit, using a helper fns(), that works similarly to
   __ffs() and ffz().
fns() is implemented as a simple loop. For x86_64, there's the PDEP
instruction to do that: ret = ctz(pdep(1 << idx, num)). However, for large
bitmaps most of the improvement comes from using hweight(), so I kept fns()
simple.
New find_nth_bit() is ~70 times faster on x86_64/kvm in find_bit benchmark:
find_nth_bit: 7154190 ns, 16411 iterations
for_each_bit: 505493126 ns, 16315 iterations
With all that, a family of 3 new functions is added, and used where
appropriate in the following patches.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
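The two steps described above can be sketched in user-space C. `__builtin_popcountl()` stands in for hweight(), and a loop stands in for fns(). `nth_bit()` and `fns_sketch()` are hypothetical stand-ins assuming GCC/Clang, not the kernel's `__find_nth_bit()` or `fns()`.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Index of the n-th (0-based) set bit of word, or BITS_PER_LONG if absent. */
static unsigned long fns_sketch(unsigned long word, unsigned long n)
{
	while (word) {
		unsigned long bit = (unsigned long)__builtin_ctzl(word);

		if (n-- == 0)
			return bit;
		word &= word - 1;	/* clear the lowest set bit */
	}
	return BITS_PER_LONG;
}

/* Index of the n-th (0-based) set bit in addr[0..size), or size if absent. */
static unsigned long nth_bit(const unsigned long *addr, unsigned long size,
			     unsigned long n)
{
	unsigned long idx, nwords = (size + BITS_PER_LONG - 1) / BITS_PER_LONG;

	for (idx = 0; idx < nwords; idx++) {
		unsigned long w = addr[idx], weight;

		/* Drop bits at or beyond size in a final partial word. */
		if ((idx + 1) * BITS_PER_LONG > size)
			w &= ~0UL >> ((idx + 1) * BITS_PER_LONG - size);

		/* Step 1: skip whole words by their population count. */
		weight = (unsigned long)__builtin_popcountl(w);
		if (n >= weight) {
			n -= weight;
			continue;
		}
		/* Step 2: the target bit is inside this word. */
		return idx * BITS_PER_LONG + fns_sketch(w, n);
	}
	return size;
}
```

Skipping a word costs one popcount rather than one iteration per set bit. That is where the speedup over the for_each_set_bit() loop comes from on large bitmaps.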
2022-09-18 11:07:13 +08:00
/**
 * find_nth_bit - find N'th set bit in a memory region
 * @addr: The address to start the search at
 * @size: The maximum number of bits to search
 * @n: The index of the set bit whose position is needed, counting from 0
 *
 * The following is semantically equivalent:
 *	idx = find_nth_bit(addr, size, 0);
 *	idx = find_first_bit(addr, size);
 *
 * Returns the bit number of the N'th set bit.
 * If no such bit exists, returns @size.
 */
static inline
unsigned long find_nth_bit(const unsigned long *addr, unsigned long size, unsigned long n)
{
	if (n >= size)
		return size;

	if (small_const_nbits(size)) {
		unsigned long val = *addr & GENMASK(size - 1, 0);

		return val ? fns(val, n) : size;
	}

	return __find_nth_bit(addr, size, n);
}

/**
 * find_nth_and_bit - find N'th set bit in 2 memory regions
 * @addr1: The 1st address to start the search at
 * @addr2: The 2nd address to start the search at
 * @size: The maximum number of bits to search
 * @n: The index of the set bit whose position is needed, counting from 0
 *
 * Returns the bit number of the N'th set bit.
 * If no such bit exists, returns @size.
 */
static inline
unsigned long find_nth_and_bit(const unsigned long *addr1, const unsigned long *addr2,
				unsigned long size, unsigned long n)
{
	if (n >= size)
		return size;

	if (small_const_nbits(size)) {
		unsigned long val = *addr1 & *addr2 & GENMASK(size - 1, 0);

		return val ? fns(val, n) : size;
	}

	return __find_nth_and_bit(addr1, addr2, size, n);
}

/**
 * find_nth_andnot_bit - find N'th set bit in 2 memory regions,
 *			 flipping bits in 2nd region
 * @addr1: The 1st address to start the search at
 * @addr2: The 2nd address to start the search at
 * @size: The maximum number of bits to search
 * @n: The index of the set bit whose position is needed, counting from 0
 *
 * Returns the bit number of the N'th set bit.
 * If no such bit exists, returns @size.
 */
static inline
unsigned long find_nth_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
				unsigned long size, unsigned long n)
{
	if (n >= size)
		return size;

	if (small_const_nbits(size)) {
		unsigned long val = *addr1 & (~*addr2) & GENMASK(size - 1, 0);

		return val ? fns(val, n) : size;
	}

	return __find_nth_andnot_bit(addr1, addr2, size, n);
}

2021-08-15 05:17:01 +08:00
#ifndef find_first_and_bit
/**
 * find_first_and_bit - find the first set bit in both memory regions
 * @addr1: The first address to base the search on
 * @addr2: The second address to base the search on
 * @size: The bitmap size in bits
 *
 * Returns the bit number for the first set bit
 * If no bits are set, returns @size.
 */
static inline
unsigned long find_first_and_bit(const unsigned long *addr1,
				 const unsigned long *addr2,
				 unsigned long size)
{
	if (small_const_nbits(size)) {
		unsigned long val = *addr1 & *addr2 & GENMASK(size - 1, 0);

		return val ? __ffs(val) : size;
	}

	return _find_first_and_bit(addr1, addr2, size);
}
#endif

2021-08-15 05:16:57 +08:00
#ifndef find_first_zero_bit
2010-09-29 17:08:50 +08:00
/**
 * find_first_zero_bit - find the first cleared bit in a memory region
 * @addr: The address to start the search at
2013-11-13 07:09:48 +08:00
 * @size: The maximum number of bits to search
2010-09-29 17:08:50 +08:00
 *
 * Returns the bit number of the first cleared bit.
2013-11-13 07:09:48 +08:00
 * If no bits are zero, returns @size.
2010-09-29 17:08:50 +08:00
 */
2021-05-07 09:03:14 +08:00
static inline
unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
{
	if (small_const_nbits(size)) {
		unsigned long val = *addr | ~GENMASK(size - 1, 0);

		return val == ~0UL ? size : ffz(val);
	}

	return _find_first_zero_bit(addr, size);
}
2021-08-15 05:16:57 +08:00
#endif

2021-05-07 09:03:14 +08:00
#ifndef find_last_bit
/**
 * find_last_bit - find the last set bit in a memory region
 * @addr: The address to start the search at
 * @size: The number of bits to search
 *
 * Returns the bit number of the last set bit, or @size if no bits are set.
 */
static inline
unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
	if (small_const_nbits(size)) {
		unsigned long val = *addr & GENMASK(size - 1, 0);

		return val ? __fls(val) : size;
	}

	return _find_last_bit(addr, size);
}
#endif

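In the single-word branch of find_last_bit(), `__fls()` returns the index of the most significant set bit. Outside the kernel, this index can be derived from a count-leading-zeros builtin.

Below is a sketch assuming GCC/Clang. `find_last_bit_word()` is a hypothetical name, and the local `GENMASK()` mirrors the kernel macro. A zero size is checked explicitly because `GENMASK(size - 1, 0)` is meaningless for it.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
/* Local stand-in for the kernel macro: bits l..h set, 0 <= l <= h < BITS_PER_LONG. */
#define GENMASK(h, l) \
	((~0UL << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

static unsigned long find_last_bit_word(const unsigned long *addr,
					unsigned long size)
{
	unsigned long val;

	assert(size <= BITS_PER_LONG);
	if (size == 0)
		return 0;

	val = *addr & GENMASK(size - 1, 0);
	/* __fls(val) == BITS_PER_LONG - 1 - clz(val); clz(0) is undefined. */
	return val ? (unsigned long)(BITS_PER_LONG - 1 - __builtin_clzl(val)) : size;
}
```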
2022-09-20 05:05:56 +08:00
/**
 * find_next_and_bit_wrap - find the next set bit in both memory regions
 * @addr1: The first address to base the search on
 * @addr2: The second address to base the search on
 * @size: The bitmap size in bits
 * @offset: The bitnumber to start searching at
 *
 * Returns the bit number of the next set bit at or after @offset or, failing
 * that, of the first set bit before @offset.
 * If no bits are set, returns @size.
 */
static inline
unsigned long find_next_and_bit_wrap(const unsigned long *addr1,
					const unsigned long *addr2,
					unsigned long size, unsigned long offset)
{
	unsigned long bit = find_next_and_bit(addr1, addr2, size, offset);

	if (bit < size)
		return bit;

	bit = find_first_and_bit(addr1, addr2, offset);
	return bit < offset ? bit : size;
}

/**
 * find_next_bit_wrap - find the next set bit in a memory region
 * @addr: The address to base the search on
 * @size: The bitmap size in bits
 * @offset: The bitnumber to start searching at
 *
 * Returns the bit number of the next set bit at or after @offset or, failing
 * that, of the first set bit before @offset.
 * If no bits are set, returns @size.
 */
static inline
unsigned long find_next_bit_wrap(const unsigned long *addr,
					unsigned long size, unsigned long offset)
{
	unsigned long bit = find_next_bit(addr, size, offset);

	if (bit < size)
		return bit;

	bit = find_first_bit(addr, offset);
	return bit < offset ? bit : size;
}

2022-09-20 05:05:57 +08:00
/*
 * Helper for for_each_set_bit_wrap(). Make sure you're doing the right thing
 * before using it alone.
 */
static inline
unsigned long __for_each_wrap(const unsigned long *bitmap, unsigned long size,
				 unsigned long start, unsigned long n)
{
	unsigned long bit;

	/* If not wrapped around */
	if (n > start) {
		/* and have a bit, just return it. */
		bit = find_next_bit(bitmap, size, n);
		if (bit < size)
			return bit;

		/* Otherwise, wrap around and ... */
		n = 0;
	}

	/* Search the other part. */
	bit = find_next_bit(bitmap, start, n);
	return bit < start ? bit : size;
}

bitops: introduce the for_each_set_clump8 macro
Patch series "Introduce the for_each_set_clump8 macro", v18.
While adding GPIO get_multiple/set_multiple callback support for various
drivers, I noticed a recurring looping pattern that would be useful to
standardize as a macro.
This patchset introduces the for_each_set_clump8 macro and utilizes it
in several GPIO drivers. The for_each_set_clump8 macro provides a
for-loop syntax that iterates over a memory region one 8-bit group of
bits at a time, visiting only the groups that contain set bits.
For example, suppose you would like to iterate over a 32-bit integer 8
bits at a time, skipping over 8-bit groups with no set bit, where
XXXXXXXX represents the current 8-bit group:
Example:     10111110 00000000 11111111 00110011
First loop:  10111110 00000000 11111111 XXXXXXXX
Second loop: 10111110 00000000 XXXXXXXX 00110011
Third loop:  XXXXXXXX 00000000 11111111 00110011
Each iteration of the loop returns the next 8-bit group that has at
least one set bit.
The for_each_set_clump8 macro has four parameters:
* start: set to the bit offset of the current clump
* clump: set to the current clump value
* bits: bitmap to search within
* size: bitmap size in number of bits
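To make the clump walk concrete, here is a userspace sketch that reproduces the 32-bit example, where 10111110 00000000 11111111 00110011 is 0xBE00FF33. It assumes a find_next_clump8() strategy of finding the next set bit and rounding its offset down to a multiple of 8; demo_find_next_bit(), demo_get_value8(), demo_find_next_clump8() and demo_for_each_set_clump8() are hypothetical stand-ins, not the kernel helpers.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bit-at-a-time stand-in for find_next_bit(). */
static unsigned long demo_find_next_bit(const unsigned long *addr,
					unsigned long size, unsigned long offset)
{
	for (; offset < size; offset++)
		if (addr[offset / DEMO_BITS_PER_LONG] &
		    (1UL << (offset % DEMO_BITS_PER_LONG)))
			return offset;
	return size;
}

/* Read the 8-bit clump that starts at @start (a multiple of 8). */
static unsigned long demo_get_value8(const unsigned long *addr,
				     unsigned long start)
{
	assert(start % 8 == 0);
	return (addr[start / DEMO_BITS_PER_LONG] >>
		(start % DEMO_BITS_PER_LONG)) & 0xFFUL;
}

/* Find the next set bit, then report the whole clump that contains it. */
static unsigned long demo_find_next_clump8(unsigned long *clump,
					   const unsigned long *addr,
					   unsigned long size,
					   unsigned long offset)
{
	offset = demo_find_next_bit(addr, size, offset);
	if (offset == size)
		return size;

	offset -= offset % 8;		/* round down to the clump boundary */
	*clump = demo_get_value8(addr, offset);
	return offset;
}

/* Same loop shape as for_each_set_clump8(). */
#define demo_for_each_set_clump8(start, clump, bits, size)		\
	for ((start) = demo_find_next_clump8(&(clump), (bits), (size), 0); \
	     (start) < (size);						\
	     (start) = demo_find_next_clump8(&(clump), (bits), (size), (start) + 8))
```

Run on 0xBE00FF33 with size 32, the loop body executes three times, with (start, clump) equal to (0, 0x33), (8, 0xFF) and (24, 0xBE); the empty clump at bit 16 is skipped.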
In this version of the patchset, the for_each_set_clump macro has been
reimplemented and simplified based on the suggestions provided by Rasmus
Villemoes and Andy Shevchenko in the version 4 submission.
In particular, the function of the for_each_set_clump macro has been
restricted to handle only 8-bit clumps; the drivers that use the
for_each_set_clump macro only handle 8-bit ports so a generic
for_each_set_clump implementation is not necessary. Thus, a solution
for large clumps (i.e. those larger than the width of a bitmap word)
can be postponed until a driver appears that actually requires such a
generic for_each_set_clump implementation.
For what it's worth, a semi-generic for_each_set_clump (i.e. for clumps
smaller than the width of a bitmap word) can be implemented by simply
replacing the hardcoded '8' and '0xFF' instances with respective
variables. I have not yet had a need for such an implementation, and
since it falls short of a true generic for_each_set_clump function, I
have decided to forgo such an implementation for now.
In addition, the bitmap_get_value8 and bitmap_set_value8 functions are
introduced to get and set 8-bit values respectively. Their use is based
on the behavior suggested in the patchset version 4 review.
This patch (of 14):
This macro iterates over each 8-bit group of bits (clump) with set bits
within a bitmap memory region. For each iteration, "start" is set to
the bit offset of the found clump, while the respective clump value is
stored in the location pointed to by "clump". Additionally, the
bitmap_get_value8 and bitmap_set_value8 functions are introduced to
get and set, respectively, an 8-bit value in a bitmap memory region.
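A sketch of what 8-bit get/set helpers can look like, assuming the clump start is a multiple of 8 so that a clump never straddles two words. demo_get_value8() and demo_set_value8() are hypothetical userspace stand-ins for bitmap_get_value8() and bitmap_set_value8(), not their kernel definitions; the 0xFFUL mask keeps the arithmetic in unsigned long, in line with the s/ULL/UL/ trailer below.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return the 8-bit value stored at bit offset @start. */
static unsigned long demo_get_value8(const unsigned long *map,
				     unsigned long start)
{
	assert(start % 8 == 0);
	return (map[start / DEMO_BITS_PER_LONG] >>
		(start % DEMO_BITS_PER_LONG)) & 0xFFUL;
}

/* Overwrite the 8 bits at offset @start with the low byte of @value. */
static void demo_set_value8(unsigned long *map, unsigned long value,
			    unsigned long start)
{
	const unsigned long index = start / DEMO_BITS_PER_LONG;
	const unsigned long shift = start % DEMO_BITS_PER_LONG;

	assert(start % 8 == 0);
	map[index] &= ~(0xFFUL << shift);		/* clear the old clump */
	map[index] |= (value & 0xFFUL) << shift;	/* store the new one */
}
```

Setting 0x1CD at bit 8 stores only its low byte, so a subsequent get at bit 8 returns 0xCD.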
[gustavo@embeddedor.com: fix potential sign-extension overflow]
Link: http://lkml.kernel.org/r/20191015184657.GA26541@embeddedor
[akpm@linux-foundation.org: s/ULL/UL/, per Joe]
[vilhelm.gray@gmail.com: add for_each_set_clump8 documentation]
Link: http://lkml.kernel.org/r/20191016161825.301082-1-vilhelm.gray@gmail.com
Link: http://lkml.kernel.org/r/893c3b4f03266c9496137cc98ac2b1bd27f92c73.1570641097.git.vilhelm.gray@gmail.com
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Suggested-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Phil Reid <preid@electromag.com.au>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Morten Hein Tiljeset <morten.tiljeset@prevas.dk>
Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-05 08:50:57 +08:00
/**
 * find_next_clump8 - find next 8-bit clump with set bits in a memory region
 * @clump: location to store copy of found clump
 * @addr: address to base the search on
 * @size: bitmap size in number of bits
 * @offset: bit offset at which to start searching
 *
 * Returns the bit offset for the next set clump; the found clump value is
 * copied to the location pointed to by @clump. If no bits are set, returns @size.
 */
extern unsigned long find_next_clump8(unsigned long *clump,
				      const unsigned long *addr,
				      unsigned long size, unsigned long offset);

#define find_first_clump8(clump, bits, size) \
	find_next_clump8((clump), (bits), (size), 0)

2021-08-15 05:16:58 +08:00
#if defined(__LITTLE_ENDIAN)

static inline unsigned long find_next_zero_bit_le(const void *addr,
		unsigned long size, unsigned long offset)
{
	return find_next_zero_bit(addr, size, offset);
}

static inline unsigned long find_next_bit_le(const void *addr,
		unsigned long size, unsigned long offset)
{
	return find_next_bit(addr, size, offset);
}

static inline unsigned long find_first_zero_bit_le(const void *addr,
		unsigned long size)
{
	return find_first_zero_bit(addr, size);
}

#elif defined(__BIG_ENDIAN)

#ifndef find_next_zero_bit_le
static inline
unsigned long find_next_zero_bit_le(const void *addr,
		unsigned long size, unsigned long offset)
{
	if (small_const_nbits(size)) {
		unsigned long val = *(const unsigned long *)addr;

		if (unlikely(offset >= size))
			return size;

		val = swab(val) | ~GENMASK(size - 1, offset);
		return val == ~0UL ? size : ffz(val);
	}

lib/find_bit: optimize find_next_bit() functions
Over the past couple of years, the function _find_next_bit() was extended
with parameters that modify its behavior to implement the and-, zero- and
le- flavors. The parameters are passed at compile time, but the current
design prevents the compiler from optimizing out the conditionals.
As the find_next_bit() API grows, I expect that more parameters will be added.
The current design would require more conditional code in _find_next_bit(),
which would bloat the helper even more and make it barely readable.
This patch replaces _find_next_bit() with a macro FIND_NEXT_BIT, and adds
a set of wrappers, so that the compile-time optimizations become possible.
The common logic is moved to the new macro, and all flavors may be
generated by providing a FETCH macro parameter, like in this example:
  #define FIND_NEXT_BIT(FETCH, MUNGE, size, start) ...

  unsigned long find_next_xornot_and_bit(const unsigned long *addr1,
					 const unsigned long *addr2,
					 const unsigned long *addr3,
					 unsigned long size, unsigned long start)
  {
	return FIND_NEXT_BIT(addr1[idx] ^ ~addr2[idx] & addr3[idx],
				/* nop */, size, start);
  }

The FETCH expression may be of any complexity, as long as it only refers to
the bitmap(s) and the iterator idx.
MUNGE is there to support _le code generation for BE builds. It may be
empty.
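A simplified userspace sketch of the FETCH idea, assuming a word-wise scan like the one described above but without the MUNGE step (which only matters for _le on big-endian builds). DEMO_FIND_NEXT_BIT expands to a whole function body and, like FIND_NEXT_BIT, lets FETCH refer to the word index idx; this macro, demo_ffs() and the demo_find_* wrappers are hypothetical and are not the kernel's lib/find_bit.c code.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the lowest set bit; @word must be non-zero (like __ffs()). */
static unsigned long demo_ffs(unsigned long word)
{
	unsigned long bit = 0;

	assert(word != 0);
	while (!(word & 1UL)) {
		word >>= 1;
		bit++;
	}
	return bit;
}

/* Expands to a complete function body; FETCH may use the word index idx. */
#define DEMO_FIND_NEXT_BIT(FETCH, size, start)				\
	unsigned long sz_ = (size), start_ = (start), idx, tmp;		\
									\
	if (start_ >= sz_)						\
		return sz_;						\
	idx = start_ / DEMO_BITS_PER_LONG;				\
	/* Ignore bits below start_ in the first word. */		\
	tmp = (FETCH) & (~0UL << (start_ % DEMO_BITS_PER_LONG));	\
	while (!tmp) {							\
		if ((idx + 1) * DEMO_BITS_PER_LONG >= sz_)		\
			return sz_;					\
		idx++;							\
		tmp = (FETCH);						\
	}								\
	idx = idx * DEMO_BITS_PER_LONG + demo_ffs(tmp);			\
	return idx < sz_ ? idx : sz_

static unsigned long demo_find_next_bit(const unsigned long *addr,
					unsigned long size, unsigned long start)
{
	DEMO_FIND_NEXT_BIT(addr[idx], size, start);
}

static unsigned long demo_find_next_zero_bit(const unsigned long *addr,
					     unsigned long size, unsigned long start)
{
	DEMO_FIND_NEXT_BIT(~addr[idx], size, start);
}

static unsigned long demo_find_next_and_bit(const unsigned long *addr1,
					    const unsigned long *addr2,
					    unsigned long size, unsigned long start)
{
	DEMO_FIND_NEXT_BIT(addr1[idx] & addr2[idx], size, start);
}
```

Each flavor differs only in its FETCH expression, so adding one is a one-line wrapper. The final comparison clamps hits in the tail of the last word, which matters for the zero-bit flavor, where unused high bits read as set after inversion.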
I ran find_bit_benchmark 16 times on top of 6.0-rc2 and 16 times on top
of 6.0-rc2 + this series. The results for kvm/x86_64 are:
                      v6.0-rc2  Optimized      Difference Z-score
Random dense bitmap         ns         ns       ns      %
find_next_bit:          787735     670546   117189   14.9    3.97
find_next_zero_bit:     777492     664208   113284   14.6   10.51
find_last_bit:          830925     687573   143352   17.3    2.35
find_first_bit:        3874366    3306635   567731   14.7    1.84
find_first_and_bit:   40677125   37739887  2937238    7.2    1.36
find_next_and_bit:      347865     304456    43409   12.5    1.35

Random sparse bitmap
find_next_bit:           19816      14021     5795   29.2    6.10
find_next_zero_bit:    1318901    1223794    95107    7.2    1.41
find_last_bit:           14573      13514     1059    7.3    6.92
find_first_bit:        1313321    1249024    64297    4.9    1.53
find_first_and_bit:       8921       8098      823    9.2    4.56
find_next_and_bit:        9796       7176     2620   26.7    5.39
Where the results are statistically significant (z-score > 3), the
improvement is ~15%.
According to the bloat-o-meter, the Image size is 10-11K less:
x86_64/defconfig:
add/remove: 32/14 grow/shrink: 61/782 up/down: 6344/-16521 (-10177)
arm64/defconfig:
add/remove: 3/2 grow/shrink: 50/714 up/down: 608/-11556 (-10948)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-09-15 10:07:29 +08:00
	return _find_next_zero_bit_le(addr, size, offset);
2021-08-15 05:16:58 +08:00
}
#endif

2022-09-15 10:07:28 +08:00
#ifndef find_first_zero_bit_le
static inline
unsigned long find_first_zero_bit_le(const void *addr, unsigned long size)
{
	if (small_const_nbits(size)) {
		unsigned long val = swab(*(const unsigned long *)addr) | ~GENMASK(size - 1, 0);

		return val == ~0UL ? size : ffz(val);
	}

	return _find_first_zero_bit_le(addr, size);
}
#endif

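On a big-endian CPU, bit n of a little-endian bitmap lives in byte n / 8, so the loaded word has to be byte-swapped before native bit numbering applies. The sketch below simulates that portably by building the big-endian view of eight bytes, assuming 64-bit words; demo_load_be64(), demo_swab64(), demo_genmask64() and demo_find_first_zero_bit_le() are hypothetical helpers standing in for the kernel's swab(), GENMASK() and ffz(), not their definitions.

```c
#include <assert.h>
#include <stdint.h>

/* What a big-endian CPU sees when it loads bytes[0..7] as one 64-bit word. */
static uint64_t demo_load_be64(const uint8_t *bytes)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | bytes[i];
	return v;
}

/* Reverse the byte order of a 64-bit word (the role swab() plays above). */
static uint64_t demo_swab64(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xFF);
		v >>= 8;
	}
	return r;
}

/* Bits l..h set, like GENMASK(h, l); assumes l <= h < 64. */
static uint64_t demo_genmask64(unsigned int h, unsigned int l)
{
	return (~UINT64_C(0) >> (63 - h)) & (~UINT64_C(0) << l);
}

/* Small-bitmap path of find_first_zero_bit_le() on a BE CPU, 1 <= size <= 64. */
static unsigned int demo_find_first_zero_bit_le(const uint8_t *bytes,
						unsigned int size)
{
	uint64_t val;
	unsigned int bit = 0;

	assert(size >= 1 && size <= 64);
	/* Swap to LE numbering, then force every bit >= size to 1. */
	val = demo_swab64(demo_load_be64(bytes)) | ~demo_genmask64(size - 1, 0);
	if (val == ~UINT64_C(0))
		return size;		/* no zero bit below size */

	while (val & 1) {		/* ffz(): index of the lowest clear bit */
		val >>= 1;
		bit++;
	}
	return bit;
}
```

For the bytes FF 0F 00 ..., the swapped word is 0x0FFF, so the first zero bit is 12; with size 10, the bits from 10 upward are forced to 1 and the function returns @size.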
2021-08-15 05:16:58 +08:00
#ifndef find_next_bit_le
static inline
unsigned long find_next_bit_le(const void *addr,
		unsigned long size, unsigned long offset)
{
	if (small_const_nbits(size)) {
		unsigned long val = *(const unsigned long *)addr;

		if (unlikely(offset >= size))
			return size;

		val = swab(val) & GENMASK(size - 1, offset);
		return val ? __ffs(val) : size;
	}

2022-09-15 10:07:29 +08:00
	return _find_next_bit_le(addr, size, offset);
2021-08-15 05:16:58 +08:00
}
#endif

#else
#error "Please fix <asm/byteorder.h>"
#endif

2021-08-15 05:17:06 +08:00
#define for_each_set_bit(bit, addr, size) \
2022-09-20 05:05:58 +08:00
	for ((bit) = 0; (bit) = find_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)

2021-08-15 05:17:06 +08:00
2022-09-20 05:05:55 +08:00
#define for_each_and_bit(bit, addr1, addr2, size) \
2022-09-20 05:05:58 +08:00
	for ((bit) = 0;									\
	     (bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
	     (bit)++)

2022-09-20 05:05:55 +08:00
2022-10-03 23:34:18 +08:00
#define for_each_andnot_bit(bit, addr1, addr2, size) \
	for ((bit) = 0;									\
	     (bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
	     (bit)++)

2021-08-15 05:17:06 +08:00
/* same as for_each_set_bit() but use bit as value to start with */
#define for_each_set_bit_from(bit, addr, size) \
2022-09-20 05:05:58 +08:00
	for (; (bit) = find_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)

2021-08-15 05:17:06 +08:00
#define for_each_clear_bit(bit, addr, size) \
2022-09-20 05:05:58 +08:00
	for ((bit) = 0;									\
	     (bit) = find_next_zero_bit((addr), (size), (bit)), (bit) < (size);	\
	     (bit)++)

2021-08-15 05:17:06 +08:00
/* same as for_each_clear_bit() but use bit as value to start with */
#define for_each_clear_bit_from(bit, addr, size) \
2022-09-20 05:05:58 +08:00
	for (; (bit) = find_next_zero_bit((addr), (size), (bit)), (bit) < (size); (bit)++)

2021-08-15 05:17:06 +08:00
2021-08-15 05:17:11 +08:00
/**
 * for_each_set_bitrange - iterate over all set bit ranges [b; e)
 * @b: bit offset of start of current bitrange (first set bit)
 * @e: bit offset of end of current bitrange (first unset bit)
 * @addr: bitmap address to base the search on
 * @size: bitmap size in number of bits
 */
#define for_each_set_bitrange(b, e, addr, size)			\
2022-09-20 05:05:58 +08:00
	for ((b) = 0;							\
	     (b) = find_next_bit((addr), (size), (b)),			\
	     (e) = find_next_zero_bit((addr), (size), (b) + 1),	\
2021-08-15 05:17:11 +08:00
	     (b) < (size);						\
2022-09-20 05:05:58 +08:00
	     (b) = (e) + 1)

2021-08-15 05:17:11 +08:00
/**
 * for_each_set_bitrange_from - iterate over all set bit ranges [b; e)
 * @b: bit offset of start of current bitrange (first set bit); must be initialized
 * @e: bit offset of end of current bitrange (first unset bit)
 * @addr: bitmap address to base the search on
 * @size: bitmap size in number of bits
 */
#define for_each_set_bitrange_from(b, e, addr, size)		\
2022-09-20 05:05:58 +08:00
	for (;								\
	     (b) = find_next_bit((addr), (size), (b)),			\
	     (e) = find_next_zero_bit((addr), (size), (b) + 1),	\
2021-08-15 05:17:11 +08:00
	     (b) < (size);						\
2022-09-20 05:05:58 +08:00
	     (b) = (e) + 1)

2021-08-15 05:17:11 +08:00
/**
 * for_each_clear_bitrange - iterate over all unset bit ranges [b; e)
 * @b: bit offset of start of current bitrange (first unset bit)
 * @e: bit offset of end of current bitrange (first set bit)
 * @addr: bitmap address to base the search on
 * @size: bitmap size in number of bits
 */
#define for_each_clear_bitrange(b, e, addr, size)		\
2022-09-20 05:05:58 +08:00
	for ((b) = 0;							\
	     (b) = find_next_zero_bit((addr), (size), (b)),		\
	     (e) = find_next_bit((addr), (size), (b) + 1),		\
2021-08-15 05:17:11 +08:00
	     (b) < (size);						\
2022-09-20 05:05:58 +08:00
	     (b) = (e) + 1)

2021-08-15 05:17:11 +08:00
/**
 * for_each_clear_bitrange_from - iterate over all unset bit ranges [b; e)
 * @b: bit offset of start of current bitrange (first unset bit); must be initialized
 * @e: bit offset of end of current bitrange (first set bit)
 * @addr: bitmap address to base the search on
 * @size: bitmap size in number of bits
 */
#define for_each_clear_bitrange_from(b, e, addr, size)		\
2022-09-20 05:05:58 +08:00
	for (;								\
	     (b) = find_next_zero_bit((addr), (size), (b)),		\
	     (e) = find_next_bit((addr), (size), (b) + 1),		\
2021-08-15 05:17:11 +08:00
	     (b) < (size);						\
2022-09-20 05:05:58 +08:00
	     (b) = (e) + 1)

2021-08-15 05:17:11 +08:00
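A userspace sketch of the range walk done by for_each_set_bitrange(), assuming bit-at-a-time helpers in place of find_next_bit() and find_next_zero_bit(). All demo_* names are hypothetical. After each range the next search starts at e + 1, because bit e is already known to be clear.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bit-at-a-time search for the next bit >= @offset whose value is @want. */
static unsigned long demo_find_next(const unsigned long *addr, unsigned long size,
				    unsigned long offset, int want)
{
	assert(want == 0 || want == 1);
	for (; offset < size; offset++) {
		int bit = (addr[offset / DEMO_BITS_PER_LONG] >>
			   (offset % DEMO_BITS_PER_LONG)) & 1;

		if (bit == want)
			return offset;
	}
	return size;
}

#define demo_find_next_bit(addr, size, offset)      demo_find_next(addr, size, offset, 1)
#define demo_find_next_zero_bit(addr, size, offset) demo_find_next(addr, size, offset, 0)

/* Same loop shape as for_each_set_bitrange(): each pass yields [b, e). */
#define demo_for_each_set_bitrange(b, e, addr, size)			\
	for ((b) = 0;							\
	     (b) = demo_find_next_bit((addr), (size), (b)),		\
	     (e) = demo_find_next_zero_bit((addr), (size), (b) + 1),	\
	     (b) < (size);						\
	     (b) = (e) + 1)
```

For the byte 0xE6 (bits 1, 2, 5, 6 and 7 set, size 8), the loop yields [1, 3) and [5, 8); once e reaches size, b becomes size + 1 and the next search returns size, ending the loop.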
2022-09-20 05:05:57 +08:00
/**
 * for_each_set_bit_wrap - iterate over all set bits starting from @start, and
 * wrapping around the end of the bitmap.
 * @bit: offset for current iteration
 * @addr: bitmap address to base the search on
 * @size: bitmap size in number of bits
 * @start: Starting bit for bitmap traversing, wrapping around the bitmap end
 */
#define for_each_set_bit_wrap(bit, addr, size, start) \
	for ((bit) = find_next_bit_wrap((addr), (size), (start));		\
	     (bit) < (size);							\
	     (bit) = __for_each_wrap((addr), (size), (start), (bit) + 1))

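A userspace sketch of the visiting order of for_each_set_bit_wrap(), assuming bit-at-a-time helpers in place of find_next_bit(); demo_for_each_wrap() follows the structure of __for_each_wrap() above. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bit-at-a-time stand-in for find_next_bit(). */
static unsigned long demo_find_next_bit(const unsigned long *addr,
					unsigned long size, unsigned long offset)
{
	for (; offset < size; offset++)
		if (addr[offset / DEMO_BITS_PER_LONG] &
		    (1UL << (offset % DEMO_BITS_PER_LONG)))
			return offset;
	return size;
}

/* First set bit at or after @offset, else the first one below @offset. */
static unsigned long demo_find_next_bit_wrap(const unsigned long *addr,
					     unsigned long size, unsigned long offset)
{
	unsigned long bit;

	assert(offset <= size);
	bit = demo_find_next_bit(addr, size, offset);
	if (bit < size)
		return bit;
	bit = demo_find_next_bit(addr, offset, 0);
	return bit < offset ? bit : size;
}

/* Mirrors __for_each_wrap(): continue from @n, wrap once, stop before @start. */
static unsigned long demo_for_each_wrap(const unsigned long *bitmap,
					unsigned long size, unsigned long start,
					unsigned long n)
{
	unsigned long bit;

	if (n > start) {		/* still in the [start, size) pass */
		bit = demo_find_next_bit(bitmap, size, n);
		if (bit < size)
			return bit;
		n = 0;			/* wrap around */
	}
	bit = demo_find_next_bit(bitmap, start, n);	/* the [0, start) pass */
	return bit < start ? bit : size;
}

#define demo_for_each_set_bit_wrap(bit, addr, size, start)		\
	for ((bit) = demo_find_next_bit_wrap((addr), (size), (start));	\
	     (bit) < (size);						\
	     (bit) = demo_for_each_wrap((addr), (size), (start), (bit) + 1))
```

With bits 1, 4 and 7 set in an 8-bit map, starting at 5 visits 7, 1, 4, and starting at 4 visits 4, 7, 1: every set bit is reported exactly once.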
2021-08-15 05:17:06 +08:00
/**
 * for_each_set_clump8 - iterate over bitmap for each 8-bit clump with set bits
 * @start: set to the bit offset of the current clump on each iteration
 * @clump: location to store copy of current 8-bit clump
 * @bits: bitmap address to base the search on
 * @size: bitmap size in number of bits
 */
#define for_each_set_clump8(start, clump, bits, size) \
	for ((start) = find_first_clump8(&(clump), (bits), (size)); \
	     (start) < (size); \
	     (start) = find_next_clump8(&(clump), (bits), (size), (start) + 8))

2021-08-15 05:16:59 +08:00
#endif /* __LINUX_FIND_H_ */