linux-sg2042/tools/testing/radix-tree/multiorder.c

/*
 * multiorder.c: Multi-order radix tree entry testing
 * Copyright (c) 2016 Intel Corporation
 * Author: Ross Zwisler <ross.zwisler@linux.intel.com>
 * Author: Matthew Wilcox <matthew.r.wilcox@intel.com>
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms and conditions of the GNU General Public License,
 * version 2, as published by the Free Software Foundation.
 *
 * This program is distributed in the hope it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
 * more details.
 */
#include <linux/radix-tree.h>
#include <linux/slab.h>
#include <linux/errno.h>
/*
 * radix tree test suite: multi-order iteration race
 * (2018-05-19 07:09:01 +08:00)
 *
 * Add a test which shows a race in the multi-order iteration code. This
 * test reliably hits the race in under a second on my machine, and is the
 * result of a real bug report against kernel a production v4.15 based
 * kernel (4.15.6-300.fc27.x86_64). With a real kernel this issue is hit
 * when using order 9 PMD DAX radix tree entries.
 *
 * The race has to do with how we tear down multi-order sibling entries
 * when we are removing an item from the tree. Remember that an order 2
 * entry looks like this:
 *
 *   struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
 *
 * where 'entry' is in some slot in the struct radix_tree_node, and the
 * three slots following 'entry' contain sibling pointers which point back
 * to 'entry.'
 *
 * When we delete 'entry' from the tree, we call :
 *
 *   radix_tree_delete()
 *     radix_tree_delete_item()
 *       __radix_tree_delete()
 *         replace_slot()
 *
 * replace_slot() first removes the siblings in order from the first to the
 * last, then at then replaces 'entry' with NULL. This means that for a
 * brief period of time we end up with one or more of the siblings removed,
 * so:
 *
 *   struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
 *
 * This causes an issue if you have a reader iterating over the slots in
 * the tree via radix_tree_for_each_slot() while only under
 * rcu_read_lock()/rcu_read_unlock() protection. This is a common case in
 * mm/filemap.c.
 *
 * The issue is that when __radix_tree_next_slot() => skip_siblings() tries
 * to skip over the sibling entries in the slots, it currently does so with
 * an exact match on the slot directly preceding our current slot.
 * Normally this works:
 *
 *                                       V preceding slot
 *   struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
 *                                               ^ current slot
 *
 * This lets you find the first sibling, and you skip them all in order.
 * But in the case where one of the siblings is NULL, that slot is skipped
 * and then our sibling detection is interrupted:
 *
 *                                             V preceding slot
 *   struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
 *                                                     ^ current slot
 *
 * This means that the sibling pointers aren't recognized since they point
 * all the way back to 'entry', so we think that they are normal internal
 * radix tree pointers. This causes us to think we need to walk down to a
 * struct radix_tree_node starting at the address of 'entry'.
 *
 * In a real running kernel this will crash the thread with a GP fault when
 * you try and dereference the slots in your broken node starting at
 * 'entry'.
 *
 * In the radix tree test suite this will be caught by the address
 * sanitizer:
 *
 *   ==27063==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60c0008ae400 at pc 0x00000040ce4f bp 0x7fa89b8fcad0 sp 0x7fa89b8fcac0
 *   READ of size 8 at 0x60c0008ae400 thread T3
 *       #0 0x40ce4e in __radix_tree_next_slot /home/rzwisler/project/linux/tools/testing/radix-tree/radix-tree.c:1660
 *       #1 0x4022cc in radix_tree_next_slot linux/../../../../include/linux/radix-tree.h:567
 *       #2 0x4022cc in iterator_func /home/rzwisler/project/linux/tools/testing/radix-tree/multiorder.c:655
 *       #3 0x7fa8a088d50a in start_thread (/lib64/libpthread.so.0+0x750a)
 *       #4 0x7fa8a03bd16e in clone (/lib64/libc.so.6+0xf516e)
 *
 * Link: http://lkml.kernel.org/r/20180503192430.7582-5-ross.zwisler@linux.intel.com
 * Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
 * Cc: Christoph Hellwig <hch@lst.de>
 * Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
 * Cc: Dan Williams <dan.j.williams@intel.com>
 * Cc: Dave Chinner <david@fromorbit.com>
 * Cc: Jan Kara <jack@suse.cz>
 * Cc: Matthew Wilcox <willy@infradead.org>
 * Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
 * Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
 */
#include <pthread.h>
#include "test.h"
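
The race described in the "multi-order iteration race" commit note comes down to how siblings are recognised after the iterator has stepped over NULL slots. The sketch below is a hypothetical, self-contained model: the `slot` struct, the `SLOT_*` kinds and `first_misread_sibling()` are invented for illustration and are not the kernel's entry encoding or its skip_siblings() code. With `fixed == false` it applies the exact match against the slot directly preceding the first non-NULL slot, as the commit note describes; `fixed == true` skips every sibling and is only one plausible repair, not a quote of the kernel patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slot model for this sketch only; not the kernel encoding. */
enum slot_kind { SLOT_NULL, SLOT_ENTRY, SLOT_SIBLING };

struct slot {
	enum slot_kind kind;
	int canon;	/* SLOT_SIBLING only: offset of the canonical entry */
};

/*
 * Walk forward from the entry at offset 'cur': NULL slots are stepped
 * over first, and sibling skipping starts at the first non-NULL slot.
 *
 * fixed == false: a sibling is skipped only if it points at the slot
 *                 directly preceding that first non-NULL slot.
 * fixed == true:  every sibling is skipped.
 *
 * Returns the offset of the first sibling that would be mistaken for a
 * node pointer, or -1 if all siblings are skipped correctly.
 */
static int first_misread_sibling(const struct slot *slots, int nr,
				 int cur, bool fixed)
{
	int off = cur + 1;
	int ref;

	assert(cur >= 0 && cur < nr);
	while (off < nr && slots[off].kind == SLOT_NULL)
		off++;
	if (off >= nr || slots[off].kind != SLOT_SIBLING)
		return -1;

	ref = off - 1;	/* the "preceding slot" used by the exact match */
	for (; off < nr; off++) {
		if (slots[off].kind == SLOT_NULL)
			continue;
		if (slots[off].kind == SLOT_ENTRY)
			return -1;	/* reached an unrelated entry */
		if (!fixed && slots[off].canon != ref)
			return off;
	}
	return -1;
}
```

With the intact layout `[entry][sibling][sibling][sibling]` neither variant misreads anything; with the torn layout `[entry][NULL][sibling][sibling]` the exact-match variant returns offset 2, the point at which the real iterator tried to descend into 'entry' as if it were a node.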

#define for_each_index(i, base, order) \
	for (i = base; i < base + (1 << order); i++)

/*
 * radix-tree: fix radix_tree_range_tag_if_tagged() for multiorder entries
 * (2016-05-21 08:02:52 +08:00)
 *
 * I had previously decided that tagging a single multiorder entry would
 * count as tagging 2^order entries for the purposes of 'nr_to_tag'. I now
 * believe that decision to be a mistake, and it should count as a single
 * entry. That's more likely to be what callers expect.
 *
 * When walking back up the tree from a newly-tagged entry, the current
 * code assumed we were starting from the lowest level of the tree; if we
 * have a multiorder entry with an order at least RADIX_TREE_MAP_SHIFT in
 * size then we need to shift the index by 'shift' before we start walking
 * back up the tree, or we will end up not setting tags on higher entries,
 * and then mistakenly thinking that entries below a certain point in the
 * tree are not tagged.
 *
 * If the first index we examine is a sibling entry of a tagged multiorder
 * entry, we were not tagging it. We need to examine the canonical entry,
 * and the easiest way to do that is to use radix_tree_descend(). We then
 * have to skip over sibling slots when looking for the next entry in the
 * tree or we will end up walking back to the canonical entry.
 *
 * Add several tests for radix_tree_range_tag_if_tagged().
 *
 * Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
 * Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
 * Cc: Konstantin Khlebnikov <koct9i@gmail.com>
 * Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
 * Cc: Jan Kara <jack@suse.com>
 * Cc: Neil Brown <neilb@suse.de>
 * Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
 * Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
 */
static void __multiorder_tag_test(int index, int order)
{
	RADIX_TREE(tree, GFP_KERNEL);
	int base, err, i;

	/* our canonical entry */
	base = index & ~((1 << order) - 1);

	printv(2, "Multiorder tag test with index %d, canonical entry %d\n",
			index, base);

	err = item_insert_order(&tree, index, order);
	assert(!err);

	/*
	 * Verify we get collisions for covered indices. We try and fail to
	 * insert a value entry so we don't leak memory via
	 * item_insert_order().
	 */
	for_each_index(i, base, order) {
		err = __radix_tree_insert(&tree, i, order, xa_mk_value(0xA0));
		assert(err == -EEXIST);
	}

	for_each_index(i, base, order) {
		assert(!radix_tree_tag_get(&tree, i, 0));
		assert(!radix_tree_tag_get(&tree, i, 1));
	}

	assert(radix_tree_tag_set(&tree, index, 0));

	for_each_index(i, base, order) {
		assert(radix_tree_tag_get(&tree, i, 0));
		assert(!radix_tree_tag_get(&tree, i, 1));
	}

	assert(tag_tagged_items(&tree, NULL, 0, ~0UL, 10, 0, 1) == 1);
	assert(radix_tree_tag_clear(&tree, index, 0));

	for_each_index(i, base, order) {
		assert(!radix_tree_tag_get(&tree, i, 0));
		assert(radix_tree_tag_get(&tree, i, 1));
	}

	assert(radix_tree_tag_clear(&tree, index, 1));

	assert(!radix_tree_tagged(&tree, 0));
	assert(!radix_tree_tagged(&tree, 1));

	item_kill_tree(&tree);
}
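
__multiorder_tag_test() derives the canonical entry by clearing the low `order` bits of `index`, and for_each_index() then walks the `1 << order` indices from there. A minimal sketch of that arithmetic follows; `canonical_base()` and `last_covered()` are hypothetical helper names, and they shift `1UL` rather than the test's `int` so that larger orders cannot overflow an `int`.

```c
#include <assert.h>

/* Canonical (first) index of the order-'order' entry covering 'index'. */
static unsigned long canonical_base(unsigned long index, unsigned order)
{
	assert(order < sizeof(unsigned long) * 8);
	return index & ~((1UL << order) - 1);
}

/* Last index covered by that entry, i.e. base + (1 << order) - 1. */
static unsigned long last_covered(unsigned long index, unsigned order)
{
	assert(order < sizeof(unsigned long) * 8);
	return index | ((1UL << order) - 1);
}
```

Fed the (index, order) pairs that multiorder_tag_tests() uses, these reproduce the ranges named in its comments, for example 32-63 for index 44 at order 5 and 256-511 for index 300 at order 8.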

static void __multiorder_tag_test2(unsigned order, unsigned long index2)
{
	RADIX_TREE(tree, GFP_KERNEL);
	unsigned long index = (1 << order);
	index2 += index;

	assert(item_insert_order(&tree, 0, order) == 0);
	assert(item_insert(&tree, index2) == 0);

	assert(radix_tree_tag_set(&tree, 0, 0));
	assert(radix_tree_tag_set(&tree, index2, 0));

	assert(tag_tagged_items(&tree, NULL, 0, ~0UL, 10, 0, 1) == 2);

	item_kill_tree(&tree);
}
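
__multiorder_tag_test2() expects tag_tagged_items() to report 2 for one multi-order entry plus one order-0 entry: a multi-order entry counts once, which is the counting rule the radix_tree_range_tag_if_tagged() commit note argues for. The flat model below shows that rule in isolation. `model_entry` and `model_tag_tagged()` are hypothetical, not the radix tree or test-suite API, and the sketch leaves out the start/end range and lock batching of the real helper.

```c
#include <assert.h>

/* Hypothetical flat model of tagged entries; not the radix tree API. */
struct model_entry {
	unsigned long index;
	unsigned order;		/* kept only to show it does not affect the count */
	unsigned tags;		/* bit n set => tag n is set */
};

/*
 * Set tag 'thentag' on every entry that has tag 'iftag' and return the
 * number of entries tagged. A multi-order entry counts once no matter
 * how many indices (1 << order) it covers.
 */
static unsigned model_tag_tagged(struct model_entry *e, unsigned nr,
				 unsigned iftag, unsigned thentag)
{
	unsigned i, tagged = 0;

	assert(iftag < 32 && thentag < 32);
	for (i = 0; i < nr; i++) {
		if (e[i].tags & (1U << iftag)) {
			e[i].tags |= 1U << thentag;
			tagged++;
		}
	}
	return tagged;
}
```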

static void multiorder_tag_tests(void)
{
	int i, j;

	/* test multi-order entry for indices 0-7 with no sibling pointers */
	__multiorder_tag_test(0, 3);
	__multiorder_tag_test(5, 3);

	/* test multi-order entry for indices 8-15 with no sibling pointers */
	__multiorder_tag_test(8, 3);
	__multiorder_tag_test(15, 3);

	/*
	 * Our order 5 entry covers indices 0-31 in a tree with height=2.
	 * This is broken up as follows:
	 * 0-7:     canonical entry
	 * 8-15:    sibling 1
	 * 16-23:   sibling 2
	 * 24-31:   sibling 3
	 */
	__multiorder_tag_test(0, 5);
	__multiorder_tag_test(29, 5);

	/* same test, but with indices 32-63 */
	__multiorder_tag_test(32, 5);
	__multiorder_tag_test(44, 5);

	/*
	 * Our order 8 entry covers indices 0-255 in a tree with height=3.
	 * This is broken up as follows:
	 * 0-63:    canonical entry
	 * 64-127:  sibling 1
	 * 128-191: sibling 2
	 * 192-255: sibling 3
	 */
	__multiorder_tag_test(0, 8);
	__multiorder_tag_test(190, 8);

	/* same test, but with indices 256-511 */
	__multiorder_tag_test(256, 8);
	__multiorder_tag_test(300, 8);

	__multiorder_tag_test(0x12345678UL, 8);

	for (i = 1; i < 10; i++)
		for (j = 0; j < (10 << i); j++)
			__multiorder_tag_test2(i, j);
}
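
The comments in multiorder_tag_tests() split an order-5 entry into four 8-index slots and an order-8 entry into four 64-index slots, which implies three index bits per tree level in this test build. The sketch below captures that layout rule under the assumption of `map_shift` bits per level; the value is passed in as a parameter because RADIX_TREE_MAP_SHIFT depends on the build. `mo_layout`, `multiorder_layout()` and `slot_first_index()` are hypothetical names. The `shift` it computes is the same `(order / RADIX_TREE_MAP_SHIFT) * RADIX_TREE_MAP_SHIFT` that multiorder_iteration() expects in `iter.shift`.

```c
#include <assert.h>

struct mo_layout {
	unsigned shift;		/* index bits covered by one slot */
	unsigned nr_slots;	/* canonical slot plus its siblings */
};

/*
 * Hypothetical helper: an order-'order' entry lives in the level whose
 * shift is the largest multiple of 'map_shift' not above 'order', and
 * needs 1 << (order - shift) consecutive slots there.
 */
static struct mo_layout multiorder_layout(unsigned order, unsigned map_shift)
{
	struct mo_layout l;

	assert(map_shift > 0);
	l.shift = (order / map_shift) * map_shift;
	l.nr_slots = 1U << (order - l.shift);
	return l;
}

/* First index covered by slot 'n' (0 = canonical) of the entry at 'base'. */
static unsigned long slot_first_index(unsigned long base,
				      struct mo_layout l, unsigned n)
{
	return base + ((unsigned long)n << l.shift);
}
```

With `map_shift == 3`, order 3 needs a single slot (the "no sibling pointers" cases), order 5 needs four slots starting at 0, 8, 16 and 24, and order 8 needs four slots starting at 0, 64, 128 and 192.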

static void multiorder_check(unsigned long index, int order)
{
	unsigned long i;
	unsigned long min = index & ~((1UL << order) - 1);
	unsigned long max = min + (1UL << order);
	void **slot;
	struct item *item2 = item_create(min, order);
	RADIX_TREE(tree, GFP_KERNEL);

	printv(2, "Multiorder index %ld, order %d\n", index, order);

	assert(item_insert_order(&tree, index, order) == 0);

	for (i = min; i < max; i++) {
		struct item *item = item_lookup(&tree, i);
		assert(item != 0);
		assert(item->index == index);
	}
	for (i = 0; i < min; i++)
		item_check_absent(&tree, i);
	for (i = max; i < 2*max; i++)
		item_check_absent(&tree, i);
	for (i = min; i < max; i++)
		assert(radix_tree_insert(&tree, i, item2) == -EEXIST);

	slot = radix_tree_lookup_slot(&tree, index);
	free(*slot);
	radix_tree_replace_slot(&tree, slot, item2);
	for (i = min; i < max; i++) {
		struct item *item = item_lookup(&tree, i);
		assert(item != 0);
		assert(item->index == min);
	}

	assert(item_delete(&tree, min) != 0);

	for (i = 0; i < 2*max; i++)
		item_check_absent(&tree, i);
}

/*
 * radix-tree: fix several shrinking bugs with multiorder entries
 * (2016-05-21 08:02:17 +08:00)
 *
 * Setting the indirect bit on the user data entry used to be unambiguous
 * because the tree walking code knew not to expect internal nodes in the
 * last level of the tree. Multiorder entries can appear at any level of
 * the tree, and a leaf with the indirect bit set is indistinguishable from
 * a pointer to a node.
 *
 * Introduce a special entry (RADIX_TREE_RETRY) which is neither a valid
 * user entry, nor a valid pointer to a node. The radix_tree_deref_retry()
 * function continues to work the same way, but tree walking code can
 * distinguish it from a pointer to a node.
 *
 * Also fix the condition for setting slot->parent to NULL; it does not
 * matter what height the tree is, it only matters whether slot is an
 * indirect pointer. Move this code above the comment which is referring to
 * the assignment to root->rnode.
 *
 * Also fix the condition for preventing the tree from shrinking to a
 * single entry if it's a multiorder entry.
 *
 * Add a test-case to the test suite that checks that the tree goes back
 * down to its original height after an item is inserted & deleted from a
 * higher index in the tree.
 *
 * Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
 * Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
 * Cc: Konstantin Khlebnikov <koct9i@gmail.com>
 * Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
 * Cc: Jan Kara <jack@suse.com>
 * Cc: Neil Brown <neilb@suse.de>
 * Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
 * Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
 */
static void multiorder_shrink(unsigned long index, int order)
{
	unsigned long i;
	unsigned long max = 1 << order;
	RADIX_TREE(tree, GFP_KERNEL);
	struct radix_tree_node *node;

	printv(2, "Multiorder shrink index %ld, order %d\n", index, order);

	assert(item_insert_order(&tree, 0, order) == 0);

	node = tree.rnode;

	assert(item_insert(&tree, index) == 0);
	assert(node != tree.rnode);

	assert(item_delete(&tree, index) != 0);
	assert(node == tree.rnode);

	for (i = 0; i < max; i++) {
		struct item *item = item_lookup(&tree, i);
		assert(item != 0);
		assert(item->index == 0);
	}
	for (i = max; i < 2*max; i++)
		item_check_absent(&tree, i);

	if (!item_delete(&tree, 0)) {
		printv(2, "failed to delete index %ld (order %d)\n", index, order);
		abort();
	}

	for (i = 0; i < 2*max; i++)
		item_check_absent(&tree, i);
}
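
multiorder_shrink() checks that inserting an item at a higher `index` replaces `tree.rnode` (the tree grows) and that deleting it restores the original root node. The arithmetic behind "growing" can be sketched as the number of levels needed before an index has a slot. This is a simplified model under stated assumptions: each level consumes `map_shift` index bits, a lone entry at index 0 needs no node, and where multi-order entries are placed (which may need additional levels) is ignored. `levels_for()` is a hypothetical name, not the kernel's radix_tree_extend() or radix_tree_shrink() logic.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: node levels needed before 'max_index' has a slot,
 * assuming 'map_shift' index bits per level. Simplified model only.
 */
static unsigned levels_for(unsigned long max_index, unsigned map_shift)
{
	unsigned levels = 0;

	assert(map_shift > 0);
	/* The first condition keeps the shift below the width of the type. */
	while (levels * map_shift < sizeof(max_index) * CHAR_BIT &&
	       (max_index >> (levels * map_shift)) != 0)
		levels++;
	return levels;
}
```

In this model, an insert only forces a new root when the new index needs more levels than the existing contents (for example 64 versus 63 with 6 bits per level), and deleting that index brings the requirement back down, which is the property the `node == tree.rnode` assertion checks on the real tree.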

static void multiorder_insert_bug(void)
{
	RADIX_TREE(tree, GFP_KERNEL);

	item_insert(&tree, 0);
	radix_tree_tag_set(&tree, 0, 0);
	item_insert_order(&tree, 3 << 6, 6);

	item_kill_tree(&tree);
}

/*
 * radix-tree: improve multiorder iterators
 * (2016-12-15 07:08:49 +08:00)
 *
 * This fixes several interlinked problems with the iterators in the
 * presence of multiorder entries.
 *
 * 1. radix_tree_iter_next() would only advance by one slot, which would
 *    result in the iterators returning the same entry more than once if
 *    there were sibling entries.
 *
 * 2. radix_tree_next_slot() could return an internal pointer instead of a
 *    user pointer if a tagged multiorder entry was immediately followed by
 *    an entry of lower order.
 *
 * 3. radix_tree_next_slot() expanded to a lot more code than it used to
 *    when multiorder support was compiled in. And I wasn't comfortable
 *    with entry_to_node() being in a header file.
 *
 * Fixing radix_tree_iter_next() for the presence of sibling entries
 * necessarily involves examining the contents of the radix tree, so we now
 * need to pass 'slot' to radix_tree_iter_next(), and we need to change the
 * calling convention so it is called *before* dropping the lock which
 * protects the tree. Also rename it to radix_tree_iter_resume(), as some
 * people thought it was necessary to call radix_tree_iter_next() each time
 * around the loop.
 *
 * radix_tree_next_slot() becomes closer to how it looked before multiorder
 * support was introduced. It only checks to see if the next entry in the
 * chunk is a sibling entry or a pointer to a node; this should be rare
 * enough that handling this case out of line is not a performance impact
 * (and such impact is amortised by the fact that the entry we just
 * processed was a multiorder entry). Also, radix_tree_next_slot() used to
 * force a new chunk lookup for untagged entries, which is more expensive
 * than the out of line sibling entry skipping.
 *
 * Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.com
 * Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
 * Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
 * Cc: Konstantin Khlebnikov <koct9i@gmail.com>
 * Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
 * Cc: Matthew Wilcox <mawilcox@microsoft.com>
 * Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
 * Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
 */
void multiorder_iteration(void)
{
	RADIX_TREE(tree, GFP_KERNEL);
	struct radix_tree_iter iter;
	void **slot;
	int i, j, err;

	printv(1, "Multiorder iteration test\n");

#define NUM_ENTRIES 11
	int index[NUM_ENTRIES] = {0, 2, 4, 8, 16, 32, 34, 36, 64, 72, 128};
	int order[NUM_ENTRIES] = {1, 1, 2, 3, 4, 1, 0, 1, 3, 0, 7};

	for (i = 0; i < NUM_ENTRIES; i++) {
		err = item_insert_order(&tree, index[i], order[i]);
		assert(!err);
	}

	for (j = 0; j < 256; j++) {
		for (i = 0; i < NUM_ENTRIES; i++)
			if (j <= (index[i] | ((1 << order[i]) - 1)))
				break;

		radix_tree_for_each_slot(slot, &tree, &iter, j) {
			int height = order[i] / RADIX_TREE_MAP_SHIFT;
			int shift = height * RADIX_TREE_MAP_SHIFT;
			unsigned long mask = (1UL << order[i]) - 1;
			struct item *item = *slot;

			assert((iter.index | mask) == (index[i] | mask));
			assert(iter.shift == shift);
			assert(!radix_tree_is_internal_node(item));
			assert((item->index | mask) == (index[i] | mask));
			assert(item->order == order[i]);
			i++;
		}
	}

	item_kill_tree(&tree);
}
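
Before each radix_tree_for_each_slot() pass, multiorder_iteration() works out which entry the iteration starting at `j` should return first: the first entry (in index order) whose last covered index `index | ((1 << order) - 1)` is at least `j`. It then expects `iter.shift` to be the order rounded down to a multiple of RADIX_TREE_MAP_SHIFT. The sketch below isolates both computations. `first_expected()` and `expected_shift()` are hypothetical names, and the map shift is a parameter because its value is build-dependent.

```c
#include <assert.h>

/*
 * Position (in the sorted idx[]/ord[] arrays) of the first entry an
 * iteration starting at 'start' should return: the first entry whose
 * last covered index is >= start. Returns nr when none remains.
 */
static int first_expected(const int *idx, const int *ord, int nr,
			  unsigned long start)
{
	int i;

	for (i = 0; i < nr; i++)
		if (start <= (unsigned long)(idx[i] | ((1 << ord[i]) - 1)))
			break;
	return i;
}

/* The iter.shift expected for an entry of the given order. */
static unsigned expected_shift(unsigned order, unsigned map_shift)
{
	assert(map_shift > 0);
	return (order / map_shift) * map_shift;
}
```

Using the test's own arrays, an iteration starting at 5 should first return the order-2 entry at index 4 (which covers 4-7), and one starting at 130 should return the order-7 entry at 128.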

void multiorder_tagged_iteration(void)
{
	RADIX_TREE(tree, GFP_KERNEL);
	struct radix_tree_iter iter;
	void **slot;
	int i, j;

	printv(1, "Multiorder tagged iteration test\n");

#define MT_NUM_ENTRIES 9
	int index[MT_NUM_ENTRIES] = {0, 2, 4, 16, 32, 40, 64, 72, 128};
	int order[MT_NUM_ENTRIES] = {1, 0, 2, 4, 3, 1, 3, 0, 7};

#define TAG_ENTRIES 7
	int tag_index[TAG_ENTRIES] = {0, 4, 16, 40, 64, 72, 128};

	for (i = 0; i < MT_NUM_ENTRIES; i++)
		assert(!item_insert_order(&tree, index[i], order[i]));

	assert(!radix_tree_tagged(&tree, 1));

	for (i = 0; i < TAG_ENTRIES; i++)
		assert(radix_tree_tag_set(&tree, tag_index[i], 1));

	for (j = 0; j < 256; j++) {
		int k;

		for (i = 0; i < TAG_ENTRIES; i++) {
			for (k = i; index[k] < tag_index[i]; k++)
				;
			if (j <= (index[k] | ((1 << order[k]) - 1)))
				break;
		}

		radix_tree_for_each_tagged(slot, &tree, &iter, j, 1) {
			unsigned long mask;
			struct item *item = *slot;
			for (k = i; index[k] < tag_index[i]; k++)
				;
			mask = (1UL << order[k]) - 1;
assert((iter.index | mask) == (tag_index[i] | mask));
assert(!radix_tree_is_internal_node(item));
assert((item->index | mask) == (tag_index[i] | mask));
assert(item->order == order[k]);
i++;
}
}
assert(tag_tagged_items(&tree, NULL, 0, ~0UL, TAG_ENTRIES, 1, 2) ==
TAG_ENTRIES);
for (j = 0; j < 256; j++) {
int mask, k;
for (i = 0; i < TAG_ENTRIES; i++) {
for (k = i; index[k] < tag_index[i]; k++)
;
if (j <= (index[k] | ((1 << order[k]) - 1)))
break;
}
radix_tree_for_each_tagged(slot, &tree, &iter, j, 2) {
struct item *item = *slot;
for (k = i; index[k] < tag_index[i]; k++)
;
mask = (1 << order[k]) - 1;
assert((iter.index | mask) == (tag_index[i] | mask));
assert(!radix_tree_is_internal_node(item));
assert((item->index | mask) == (tag_index[i] | mask));
assert(item->order == order[k]);
i++;
}
}
assert(tag_tagged_items(&tree, NULL, 1, ~0UL, MT_NUM_ENTRIES * 2, 1, 0)
== TAG_ENTRIES);
i = 0;
radix_tree_for_each_tagged(slot, &tree, &iter, 0, 0) {
assert(iter.index == tag_index[i]);
i++;
}
item_kill_tree(&tree);
}
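/*
 * Illustrative sketch, not part of the radix tree API (the helper name
 * sketch_same_block() is hypothetical): the tagged-iteration checks above
 * compare (iter.index | mask) with (tag_index[i] | mask), where
 * mask = 2^order - 1.  Setting every bit below 'order' erases the offset
 * within a block, so two indices compare equal exactly when they fall in
 * the same naturally aligned block of 2^order indices, i.e. when a single
 * multi-order entry of that order would cover both.
 */
#include <assert.h>
#include <stdbool.h>

static inline bool sketch_same_block(unsigned long a, unsigned long b,
				     unsigned int order)
{
	/* Only bit 'order' and above survive the comparison: the block id. */
	unsigned long mask = (1UL << order) - 1;

	return (a | mask) == (b | mask);
}

/*
 * For example, with order 2 the indices 4..7 share one block, so
 * sketch_same_block(5, 7, 2) is true while sketch_same_block(3, 4, 2) is
 * false.
 */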
/*
 * Basic join checks: make sure we can't find an entry in the tree after
 * a larger entry has replaced it.
*/
static void multiorder_join1(unsigned long index,
unsigned order1, unsigned order2)
{
unsigned long loc;
void *item, *item2 = item_create(index + 1, order1);
RADIX_TREE(tree, GFP_KERNEL);
item_insert_order(&tree, index, order2);
item = radix_tree_lookup(&tree, index);
radix_tree_join(&tree, index + 1, order1, item2);
loc = find_item(&tree, item);
if (loc == -1)
free(item);
item = radix_tree_lookup(&tree, index + 1);
assert(item == item2);
item_kill_tree(&tree);
}
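/*
 * Illustrative sketch with hypothetical names, not the kernel API: a flat
 * array with one pointer per index stands in for the tree.  Joining at
 * 'index' with 'order' makes every index of the aligned 2^order block that
 * contains 'index' refer to the new entry.  An older entry at an index
 * inside that block is overwritten, while one outside it survives, which
 * is presumably why multiorder_join1() frees the old item only when
 * find_item() can no longer see it.
 */
#include <assert.h>
#include <stddef.h>

#define SKETCH_JOIN_SLOTS 64

struct sketch_flat_tree {
	void *slot[SKETCH_JOIN_SLOTS];	/* one pointer per index */
};

static inline void sketch_join(struct sketch_flat_tree *t,
			       unsigned long index, unsigned int order,
			       void *entry)
{
	unsigned long first = index & ~((1UL << order) - 1);
	unsigned long i;

	/* Point every index of the aligned block at the larger entry. */
	for (i = first; i < first + (1UL << order) && i < SKETCH_JOIN_SLOTS; i++)
		t->slot[i] = entry;
}

/* Return the first index still holding 'entry', or -1 if none does. */
static inline long sketch_find(const struct sketch_flat_tree *t,
			       const void *entry)
{
	long i;

	for (i = 0; i < SKETCH_JOIN_SLOTS; i++)
		if (t->slot[i] == entry)
			return i;
	return -1;
}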
/*
* Check that the accounting of value entries is handled correctly
* by joining a value entry to a normal pointer.
*/
static void multiorder_join2(unsigned order1, unsigned order2)
{
RADIX_TREE(tree, GFP_KERNEL);
struct radix_tree_node *node;
void *item1 = item_create(0, order1);
void *item2;
item_insert_order(&tree, 0, order2);
radix_tree_insert(&tree, 1 << order2, xa_mk_value(5));
item2 = __radix_tree_lookup(&tree, 1 << order2, &node, NULL);
assert(item2 == xa_mk_value(5));
assert(node->exceptional == 1);
item2 = radix_tree_lookup(&tree, 0);
free(item2);
radix_tree_join(&tree, 0, order1, item1);
item2 = __radix_tree_lookup(&tree, 1 << order2, &node, NULL);
assert(item2 == item1);
assert(node->exceptional == 0);
item_kill_tree(&tree);
}
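/*
 * Illustrative sketch with hypothetical names: a toy node that keeps the
 * two counters the tests inspect, 'count' for occupied slots and
 * 'exceptional' for slots holding value entries.  Value entries are
 * modelled on xa_mk_value(), which stores the integer shifted left by one
 * with the low bit set.  Every slot replacement has to take the old
 * entry's contribution off both counters and add the new entry's, which is
 * the property multiorder_join2() checks when a pointer entry is joined
 * over a value entry.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ACCT_SLOTS 8

struct sketch_acct_node {
	void *slot[SKETCH_ACCT_SLOTS];
	unsigned int count;		/* non-NULL slots */
	unsigned int exceptional;	/* slots holding value entries */
};

/* Value entries carry an integer in a pointer with the low bit set. */
static inline void *sketch_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

static inline bool sketch_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

static inline void sketch_replace(struct sketch_acct_node *n, unsigned int i,
				  void *entry)
{
	void *old = n->slot[i];

	/* Remove the old entry's contribution to both counters... */
	if (old)
		n->count--;
	if (sketch_is_value(old))
		n->exceptional--;
	/* ...then add the new entry's. */
	if (entry)
		n->count++;
	if (sketch_is_value(entry))
		n->exceptional++;
	n->slot[i] = entry;
}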
/*
 * This test once exposed an accounting bug for value entries: nodes were
 * freed back into the pool by radix_tree_join() with a non-zero exceptional
 * count, and radix_tree_split() then failed to zero the count of
 * exceptional entries.
 */
static void multiorder_join3(unsigned int order)
{
RADIX_TREE(tree, GFP_KERNEL);
struct radix_tree_node *node;
void **slot;
struct radix_tree_iter iter;
unsigned long i;
for (i = 0; i < (1 << order); i++) {
radix_tree_insert(&tree, i, xa_mk_value(5));
}
radix_tree_join(&tree, 0, order, xa_mk_value(7));
rcu_barrier();
radix_tree_split(&tree, 0, 0);
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
radix_tree_iter_replace(&tree, &iter, slot, xa_mk_value(5));
}
__radix_tree_lookup(&tree, 0, &node, NULL);
assert(node->exceptional == node->count);
item_kill_tree(&tree);
}
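/*
 * Illustrative sketch with hypothetical names, not the kernel's node
 * allocator: a LIFO free list that recycles nodes.  The bug described above
 * came from a node returning to the pool with a stale, non-zero exceptional
 * count that was never cleared.  One way to rule that out, assumed here, is
 * to reset every counter when a node is handed out again, so a recycled
 * node always starts with count == exceptional == 0.
 */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_pool_node {
	unsigned int count;
	unsigned int exceptional;
	struct sketch_pool_node *next_free;
};

struct sketch_pool {
	struct sketch_pool_node *free_list;
};

static inline struct sketch_pool_node *sketch_node_alloc(struct sketch_pool *p)
{
	struct sketch_pool_node *n = p->free_list;

	if (n)
		p->free_list = n->next_free;
	else
		n = malloc(sizeof(*n));
	/* Clear any counters left behind by the node's previous user. */
	if (n)
		memset(n, 0, sizeof(*n));
	return n;
}

/* Freeing leaves the counters untouched, just as in the buggy case. */
static inline void sketch_node_free(struct sketch_pool *p,
				    struct sketch_pool_node *n)
{
	n->next_free = p->free_list;
	p->free_list = n;
}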
static void multiorder_join(void)
{
int i, j, idx;
for (idx = 0; idx < 1024; idx = idx * 2 + 3) {
for (i = 1; i < 15; i++) {
for (j = 0; j < i; j++) {
multiorder_join1(idx, i, j);
}
}
}
for (i = 1; i < 15; i++) {
for (j = 0; j < i; j++) {
multiorder_join2(i, j);
}
}
for (i = 3; i < 10; i++) {
multiorder_join3(i);
}
}
static void check_mem(unsigned old_order, unsigned new_order, unsigned alloc)
{
struct radix_tree_preload *rtp = &radix_tree_preloads;
if (rtp->nr != 0)
printv(2, "split(%u %u) remaining %u\n", old_order, new_order,
rtp->nr);
/*
* Can't check for equality here as some nodes may have been
* RCU-freed while we ran. But we should never finish with more
* nodes allocated since they should have all been preloaded.
*/
if (nr_allocated > alloc)
printv(2, "split(%u %u) allocated %u %u\n", old_order, new_order,
alloc, nr_allocated);
}
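/*
 * Illustrative sketch with hypothetical names, modelling the reasoning in
 * check_mem() rather than the real preload code: preloading allocates
 * nodes up front and counts them, so an operation that draws only from
 * the preloaded reserve never pushes the allocation counter above the
 * value sampled right after the preload.  Nodes freed in the meantime
 * (by RCU in the real code) can only lower it, which is why check_mem()
 * tests "nr_allocated > alloc" rather than equality.
 */
#include <assert.h>

struct sketch_preload {
	unsigned int reserved;		/* nodes held in reserve */
	unsigned int nr_allocated;	/* nodes currently allocated */
};

static inline void sketch_preload_fill(struct sketch_preload *p,
				       unsigned int want)
{
	while (p->reserved < want) {
		p->reserved++;
		p->nr_allocated++;
	}
}

/* Take a node from the reserve; fall back to a fresh allocation. */
static inline void sketch_take_node(struct sketch_preload *p)
{
	if (p->reserved)
		p->reserved--;
	else
		p->nr_allocated++;
}

/* Stands in for a node being freed, e.g. after an RCU grace period. */
static inline void sketch_release_node(struct sketch_preload *p)
{
	p->nr_allocated--;
}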
static void __multiorder_split(int old_order, int new_order)
{
RADIX_TREE(tree, GFP_ATOMIC);
void **slot;
struct radix_tree_iter iter;
unsigned alloc;
struct item *item;
radix_tree_preload(GFP_KERNEL);
assert(item_insert_order(&tree, 0, old_order) == 0);
radix_tree_preload_end();
/* Wipe out the preloaded cache or it'll confuse check_mem() */
radix_tree_cpu_dead(0);
item = radix_tree_tag_set(&tree, 0, 2);
radix_tree_split_preload(old_order, new_order, GFP_KERNEL);
alloc = nr_allocated;
radix_tree_split(&tree, 0, new_order);
check_mem(old_order, new_order, alloc);
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
radix_tree_iter_replace(&tree, &iter, slot,
item_create(iter.index, new_order));
}
radix_tree_preload_end();
item_kill_tree(&tree);
free(item);
}
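/*
 * Illustrative sketch (sketch_split_indices() is a hypothetical helper):
 * splitting an entry of order 'old_order' that starts at 'start' (assumed
 * aligned to 2^old_order) into entries of order 'new_order' produces
 * 2^(old_order - new_order) pieces, the i-th beginning at
 * start + (i << new_order).  These are the kind of iter.index values the
 * split tests in this file work with, for example 1 << new_order in
 * __multiorder_split3().
 */
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Fill 'out' (capacity 'max') with the start index of each piece and
 * return the number of pieces, or 0 if the orders are invalid or 'out' is
 * too small.
 */
static inline size_t sketch_split_indices(unsigned int old_order,
					  unsigned int new_order,
					  unsigned long start,
					  unsigned long *out, size_t max)
{
	size_t n, i;

	if (new_order > old_order ||
	    old_order - new_order >= sizeof(size_t) * CHAR_BIT)
		return 0;
	n = (size_t)1 << (old_order - new_order);
	if (n > max)
		return 0;
	for (i = 0; i < n; i++)
		out[i] = start + ((unsigned long)i << new_order);
	return n;
}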
static void __multiorder_split2(int old_order, int new_order)
{
RADIX_TREE(tree, GFP_KERNEL);
void **slot;
struct radix_tree_iter iter;
struct radix_tree_node *node;
void *item;
__radix_tree_insert(&tree, 0, old_order, xa_mk_value(5));
item = __radix_tree_lookup(&tree, 0, &node, NULL);
assert(item == xa_mk_value(5));
assert(node->exceptional > 0);
radix_tree_split(&tree, 0, new_order);
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
radix_tree_iter_replace(&tree, &iter, slot,
item_create(iter.index, new_order));
}
item = __radix_tree_lookup(&tree, 0, &node, NULL);
assert(item != xa_mk_value(5));
assert(node->exceptional == 0);
item_kill_tree(&tree);
}
static void __multiorder_split3(int old_order, int new_order)
{
RADIX_TREE(tree, GFP_KERNEL);
void **slot;
struct radix_tree_iter iter;
struct radix_tree_node *node;
void *item;
__radix_tree_insert(&tree, 0, old_order, xa_mk_value(5));
item = __radix_tree_lookup(&tree, 0, &node, NULL);
assert(item == xa_mk_value(5));
assert(node->exceptional > 0);
radix_tree_split(&tree, 0, new_order);
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
radix_tree_iter_replace(&tree, &iter, slot, xa_mk_value(7));
}
item = __radix_tree_lookup(&tree, 0, &node, NULL);
assert(item == xa_mk_value(7));
assert(node->exceptional > 0);
item_kill_tree(&tree);
__radix_tree_insert(&tree, 0, old_order, xa_mk_value(5));
item = __radix_tree_lookup(&tree, 0, &node, NULL);
assert(item == xa_mk_value(5));
assert(node->exceptional > 0);
radix_tree_split(&tree, 0, new_order);
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
if (iter.index == (1 << new_order))
radix_tree_iter_replace(&tree, &iter, slot,
xa_mk_value(7));
else
radix_tree_iter_replace(&tree, &iter, slot, NULL);
}
item = __radix_tree_lookup(&tree, 1 << new_order, &node, NULL);
assert(item == xa_mk_value(7));
assert(node->count == node->exceptional);
do {
node = node->parent;
if (!node)
break;
assert(node->count == 1);
assert(node->exceptional == 0);
} while (1);
item_kill_tree(&tree);
}
static void multiorder_split(void)
{
int i, j;
for (i = 3; i < 11; i++)
for (j = 0; j < i; j++) {
__multiorder_split(i, j);
__multiorder_split2(i, j);
__multiorder_split3(i, j);
}
}
static void multiorder_account(void)
{
RADIX_TREE(tree, GFP_KERNEL);
struct radix_tree_node *node;
void **slot;
item_insert_order(&tree, 0, 5);
__radix_tree_insert(&tree, 1 << 5, 5, xa_mk_value(5));
__radix_tree_lookup(&tree, 0, &node, NULL);
assert(node->count == node->exceptional * 2);
radix_tree_delete(&tree, 1 << 5);
assert(node->exceptional == 0);
__radix_tree_insert(&tree, 1 << 5, 5, xa_mk_value(5));
__radix_tree_lookup(&tree, 1 << 5, &node, &slot);
assert(node->count == node->exceptional * 2);
__radix_tree_replace(&tree, node, slot, NULL, NULL);
assert(node->exceptional == 0);
item_kill_tree(&tree);
}
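/*
 * Illustrative sketch with hypothetical names, resting on an assumption
 * about the accounting: suppose a node whose slots each cover 2^shift
 * indices adds every slot an entry occupies, canonical and sibling alike,
 * to 'count', and also to 'exceptional' when the entry is a value entry.
 * Two entries of the same order, one pointer and one value, then give
 * count == 2 * exceptional, the relation multiorder_account() above
 * asserts for its two order-5 entries at 0 and 1 << 5.
 */
#include <assert.h>
#include <stdbool.h>

struct sketch_span_node {
	unsigned int shift;		/* each slot covers 2^shift indices */
	unsigned int count;
	unsigned int exceptional;
};

/* Number of slots an entry of 'order' occupies in this node. */
static inline unsigned int sketch_slots_for(const struct sketch_span_node *n,
					    unsigned int order)
{
	/* Smaller entries would live in a lower node; not modelled here. */
	assert(order >= n->shift);
	return 1U << (order - n->shift);
}

static inline void sketch_account_insert(struct sketch_span_node *n,
					 unsigned int order, bool is_value)
{
	unsigned int slots = sketch_slots_for(n, order);

	n->count += slots;
	if (is_value)
		n->exceptional += slots;
}

static inline void sketch_account_delete(struct sketch_span_node *n,
					 unsigned int order, bool is_value)
{
	unsigned int slots = sketch_slots_for(n, order);

	n->count -= slots;
	if (is_value)
		n->exceptional -= slots;
}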
bool stop_iteration = false;
static void *creator_func(void *ptr)
{
/*
 * With order RADIX_TREE_MAP_SHIFT - 1 the entry spans several slots of a
 * single node, so inserting it creates sibling entries to tear down.
 */
unsigned int order = RADIX_TREE_MAP_SHIFT - 1;
struct radix_tree_root *tree = ptr;
int i;
for (i = 0; i < 10000; i++) {
item_insert_order(tree, 0, order);
item_delete_rcu(tree, 0);
}
stop_iteration = true;
return NULL;
}
static void *iterator_func(void *ptr)
{
struct radix_tree_root *tree = ptr;
struct radix_tree_iter iter;
struct item *item;
void **slot;
while (!stop_iteration) {
rcu_read_lock();
radix_tree_for_each_slot(slot, tree, &iter, 0) {
item = radix_tree_deref_slot(slot);
if (!item)
continue;
if (radix_tree_deref_retry(item)) {
slot = radix_tree_iter_retry(&iter);
continue;
}
item_sanity(item, iter.index);
}
rcu_read_unlock();
}
return NULL;
}
static void multiorder_iteration_race(void)
{
const int num_threads = sysconf(_SC_NPROCESSORS_ONLN);
pthread_t worker_thread[num_threads];
RADIX_TREE(tree, GFP_KERNEL);
int i;
pthread_create(&worker_thread[0], NULL, &creator_func, &tree);
for (i = 1; i < num_threads; i++)
pthread_create(&worker_thread[i], NULL, &iterator_func, &tree);
for (i = 0; i < num_threads; i++)
pthread_join(worker_thread[i], NULL);
item_kill_tree(&tree);
}
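/*
 * Illustrative toy model with a hypothetical encoding, not the kernel's:
 * a slot holds 0 when empty, a positive item id for a canonical entry, or
 * -(k + 1) for a sibling that refers back to canonical slot k.  A deleter
 * that clears sibling slots one at a time can briefly expose
 * [entry][NULL][sibling][sibling] to an RCU reader.  If the reader only
 * accepts siblings that refer to the slot just before where it starts
 * skipping ('exact'), it starts after the NULL and misreads the first
 * remaining sibling as a real entry; in the kernel that means treating it
 * as a node pointer.  Recognising any sibling encoding avoids this.  This
 * is the kind of reader-side race multiorder_iteration_race() is designed
 * to provoke.
 */
#include <assert.h>
#include <stdbool.h>

static inline bool sketch_is_sibling(int v)
{
	return v < 0;
}

/*
 * Return the first slot at or after 'start' that the iterator should
 * report, or 'nr' if there is none.  With 'exact' set, a slot is skipped
 * only if it is empty or a sibling of slot start - 1; otherwise every
 * sibling is skipped.
 */
static inline int sketch_next_slot(const int *slots, int nr, int start,
				   bool exact)
{
	int i;

	for (i = start; i < nr; i++) {
		int v = slots[i];

		if (v == 0)
			continue;
		if (exact ? v == -start : sketch_is_sibling(v))
			continue;
		return i;
	}
	return nr;
}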
void multiorder_checks(void)
{
int i;
for (i = 0; i < 20; i++) {
multiorder_check(200, i);
multiorder_check(0, i);
multiorder_check((1UL << i) + 1, i);
}
for (i = 0; i < 15; i++)
multiorder_shrink((1UL << (i + RADIX_TREE_MAP_SHIFT)), i);
multiorder_insert_bug();
multiorder_tag_tests();
multiorder_iteration();
multiorder_tagged_iteration();
multiorder_join();
multiorder_split();
multiorder_account();
multiorder_iteration_race();
radix_tree_cpu_dead(0);
}
int __weak main(void)
{
radix_tree_init();
multiorder_checks();
return 0;
}