2007-06-12 21:07:21 +08:00
/*
 * Copyright (C) 2007 Oracle. All rights reserved.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public
 * License v2 as published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public
 * License along with this program; if not, write to the
 * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
 * Boston, MA 02111-1307, USA.
 */

Btrfs: Cache free inode numbers in memory

Currently btrfs stores the highest objectid of the fs tree, and it always
returns (highest+1) as the inode number when we create a file. Inode numbers
are therefore never reclaimed when we delete files, so we'll run out of
inode numbers as we keep creating and deleting files on 32-bit machines.

This fixes it, and it works similarly to how we cache free space in block
groups.

We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.

Because we are searching the commit root, we have to carefully handle the
cross-transaction case.

The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K of RAM
for extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted at runtime.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-20 10:06:11 +08:00
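
The gap-scanning idea in the commit message above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the btrfs implementation: `struct free_range` and `collect_free_ranges()` are hypothetical names, and a sorted array of in-use inode numbers stands in for the inode items read from the commit root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one cached chunk of free inode numbers. */
struct free_range {
	uint64_t start;	/* first free inode number in the chunk */
	uint64_t count;	/* how many consecutive numbers are free */
};

/*
 * Walk a sorted array of in-use inode numbers and record every gap
 * between neighbours, plus the tail gap below 'highest'.
 * Returns the number of ranges written to 'out' (at most 'max').
 */
static size_t collect_free_ranges(const uint64_t *used, size_t nr_used,
				  uint64_t highest,
				  struct free_range *out, size_t max)
{
	uint64_t last = UINT64_MAX;	/* "nothing seen yet", like (u64)-1 */
	size_t n = 0;

	for (size_t i = 0; i < nr_used; i++) {
		/* The input must be strictly increasing. */
		assert(i == 0 || used[i] > used[i - 1]);

		if (used[i] >= highest)
			break;

		/* A hole between the previous number and this one is free. */
		if (last != UINT64_MAX && last + 1 != used[i] && n < max) {
			out[n].start = last + 1;
			out[n].count = used[i] - last - 1;
			n++;
		}
		last = used[i];
	}

	/* Tail gap: numbers after the last used one, still below 'highest'. */
	if (last != UINT64_MAX && last < highest - 1 && n < max) {
		out[n].start = last + 1;
		out[n].count = highest - last - 1;
		n++;
	}
	return n;
}
```

With used numbers 256, 257, 260, 261, 270 and `highest` set to 275, the sketch reports three free chunks starting at 258, 262 and 271. The kernel code additionally has to cope with the commit root changing between transactions, which this sketch does not model.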

#include <linux/delay.h>
#include <linux/kthread.h>
#include <linux/pagemap.h>

#include "ctree.h"
#include "disk-io.h"
#include "free-space-cache.h"
#include "inode-map.h"
#include "transaction.h"

static int caching_kthread(void *data)
{
	struct btrfs_root *root = data;
	struct btrfs_fs_info *fs_info = root->fs_info;
	struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
	struct btrfs_key key;
	struct btrfs_path *path;
	struct extent_buffer *leaf;
	u64 last = (u64)-1;
	int slot;
	int ret;

	if (!btrfs_test_opt(root, INODE_MAP_CACHE))
		return 0;

	path = btrfs_alloc_path();
	if (!path)
		return -ENOMEM;

	/* Since the commit root is read-only, we can safely skip locking. */
	path->skip_locking = 1;
	path->search_commit_root = 1;
	path->reada = 2;

	key.objectid = BTRFS_FIRST_FREE_OBJECTID;
	key.offset = 0;
	key.type = BTRFS_INODE_ITEM_KEY;
again:
	/* need to make sure the commit_root doesn't disappear */
	mutex_lock(&root->fs_commit_mutex);

	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
	if (ret < 0)
		goto out;

	while (1) {
		if (btrfs_fs_closing(fs_info))
			goto out;

		leaf = path->nodes[0];
		slot = path->slots[0];
		if (slot >= btrfs_header_nritems(leaf)) {
			ret = btrfs_next_leaf(root, path);
			if (ret < 0)
				goto out;
			else if (ret > 0)
				break;

			if (need_resched() ||
			    btrfs_transaction_in_commit(fs_info)) {
				leaf = path->nodes[0];

				if (WARN_ON(btrfs_header_nritems(leaf) == 0))
					break;

				/*
				 * Save the key so we can advance forward
				 * in the next search.
				 */
				btrfs_item_key_to_cpu(leaf, &key, 0);
				btrfs_release_path(path);
				root->cache_progress = last;
				mutex_unlock(&root->fs_commit_mutex);
				schedule_timeout(1);
				goto again;
			} else
				continue;
		}

		btrfs_item_key_to_cpu(leaf, &key, slot);

		if (key.type != BTRFS_INODE_ITEM_KEY)
			goto next;

		if (key.objectid >= root->highest_objectid)
			break;

		if (last != (u64)-1 && last + 1 != key.objectid) {
			__btrfs_add_free_space(ctl, last + 1,
					       key.objectid - last - 1);
			wake_up(&root->cache_wait);
		}

		last = key.objectid;
next:
		path->slots[0]++;
	}

	if (last < root->highest_objectid - 1) {
		__btrfs_add_free_space(ctl, last + 1,
				       root->highest_objectid - last - 1);
	}

	spin_lock(&root->cache_lock);
	root->cached = BTRFS_CACHE_FINISHED;
	spin_unlock(&root->cache_lock);

	root->cache_progress = (u64)-1;
	btrfs_unpin_free_ino(root);
out:
	wake_up(&root->cache_wait);
	mutex_unlock(&root->fs_commit_mutex);

	btrfs_free_path(path);

	return ret;
}

static void start_caching(struct btrfs_root *root)
{
	struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
	struct task_struct *tsk;
	int ret;
	u64 objectid;

	if (!btrfs_test_opt(root, INODE_MAP_CACHE))
		return;

	spin_lock(&root->cache_lock);
	if (root->cached != BTRFS_CACHE_NO) {
		spin_unlock(&root->cache_lock);
		return;
	}

	root->cached = BTRFS_CACHE_STARTED;
	spin_unlock(&root->cache_lock);

	ret = load_free_ino_cache(root->fs_info, root);
	if (ret == 1) {
		spin_lock(&root->cache_lock);
		root->cached = BTRFS_CACHE_FINISHED;
		spin_unlock(&root->cache_lock);
		return;
	}

	/*
	 * It can be quite time-consuming to fill the cache by searching
	 * through the extent tree, and this can keep ino allocation path
	 * waiting. Therefore at start we quickly find out the highest
	 * inode number and we know we can use inode numbers which fall in
	 * [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID].
	 */
	ret = btrfs_find_free_objectid(root, &objectid);
	if (!ret && objectid <= BTRFS_LAST_FREE_OBJECTID) {
		__btrfs_add_free_space(ctl, objectid,
				       BTRFS_LAST_FREE_OBJECTID - objectid + 1);
	}

	tsk = kthread_run(caching_kthread, root, "btrfs-ino-cache-%llu",
			  root->root_key.objectid);
	BUG_ON(IS_ERR(tsk)); /* -ENOMEM */
}

int btrfs_find_free_ino(struct btrfs_root *root, u64 *objectid)
{
	if (!btrfs_test_opt(root, INODE_MAP_CACHE))
		return btrfs_find_free_objectid(root, objectid);

again:
	*objectid = btrfs_find_ino_for_alloc(root);

	if (*objectid != 0)
		return 0;

	start_caching(root);

	wait_event(root->cache_wait,
		   root->cached == BTRFS_CACHE_FINISHED ||
		   root->free_ino_ctl->free_space > 0);

	if (root->cached == BTRFS_CACHE_FINISHED &&
	    root->free_ino_ctl->free_space == 0)
		return -ENOSPC;
	else
		goto again;
}

void btrfs_return_ino(struct btrfs_root *root, u64 objectid)
{
	struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
	struct btrfs_free_space_ctl *pinned = root->free_ino_pinned;

	if (!btrfs_test_opt(root, INODE_MAP_CACHE))
		return;

again:
	if (root->cached == BTRFS_CACHE_FINISHED) {
		__btrfs_add_free_space(ctl, objectid, 1);
	} else {
		/*
		 * If we are in the process of caching free ino chunks,
		 * to avoid adding the same inode number to the free_ino
		 * tree twice due to cross transaction, we'll leave it
		 * in the pinned tree until a transaction is committed
		 * or the caching work is done.
		 */

		mutex_lock(&root->fs_commit_mutex);
		spin_lock(&root->cache_lock);
		if (root->cached == BTRFS_CACHE_FINISHED) {
			spin_unlock(&root->cache_lock);
			mutex_unlock(&root->fs_commit_mutex);
			goto again;
		}
		spin_unlock(&root->cache_lock);

		start_caching(root);

		if (objectid <= root->cache_progress ||
Btrfs: Don't allocate inode that is already in use
Due to an off-by-one error, it is possible to reproduce a bug
when the inode cache is used.
The same inode number is assigned twice, the second time this
leads to an EEXIST in btrfs_insert_empty_items().
The issue can happen when a file is removed right after a subvolume
is created and then a new inode number is created before the
inodes in free_inode_pinned are processed.
unlink() calls btrfs_return_ino() which calls start_caching() in this
case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by
searching for the highest inode (which already cannot find the
unlinked one anymore in btrfs_find_free_objectid()). So if this
unlinked inode's number is equal to the highest_ino + 1 (or >= this value
instead of > this value which was the off-by-one error), we mustn't add
the inode number to free_ino_pinned (caching_thread() does it right).
In this case we need to try directly to add the number to the inode_cache
which will fail in this case.
When this inode number is allocated while it is still in free_ino_pinned,
it is allocated and still added to the free inode cache when the
pinned inodes are processed, thus one of the following inode number
allocations will get an inode that is already in use and fail with EEXIST
in btrfs_insert_empty_items().
One example which was created with the reproducer below:
Create a snapshot, work in the newly created snapshot for the rest.
In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
start_caching() calls add_free_space [34284, 18446744073709517077].
In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
btrfs_unpin_free_ino calls add_free_space [34284, 1].
mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
EEXIST when the new inode is inserted.
One possible reproducer is this one:
#!/bin/sh
# preparation
TEST_DEV=/dev/sdc1
TEST_MNT=/mnt
umount ${TEST_MNT} 2>/dev/null || true
mkfs.btrfs -f ${TEST_DEV}
mount ${TEST_DEV} ${TEST_MNT} -o \
rw,relatime,compress=lzo,space_cache,inode_cache
btrfs subv create ${TEST_MNT}/s1
for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'`
rm ${TEST_MNT}/s2/$FILENAME
touch ${TEST_MNT}/s2/$FILENAME
# the following steps can be repeated to reproduce the issue again and again
[ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
rm ${TEST_MNT}/s3/$FILENAME
touch ${TEST_MNT}/s3/$FILENAME
ls -alFi ${TEST_MNT}/s?/$FILENAME
touch ${TEST_MNT}/s3/_1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_1
touch ${TEST_MNT}/s3/_2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_2
touch ${TEST_MNT}/s3/__1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__1
touch ${TEST_MNT}/s3/__2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__2
# if the above is not enough, add the following loop:
for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
#for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
# one of the touch(1) calls in s3 fail due to EEXIST because the inode is
# already in use that btrfs_find_ino_for_alloc() returns.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-16 02:08:15 +08:00
		    objectid >= root->highest_objectid)
			__btrfs_add_free_space(ctl, objectid, 1);
		else
			__btrfs_add_free_space(pinned, objectid, 1);

		mutex_unlock(&root->fs_commit_mutex);
	}
}
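
The off-by-one described in the "Don't allocate inode that is already in use" commit message can be shown in isolation. This is a hedged userspace sketch with hypothetical helper names; it only models the comparison that decides between the free tree and the pinned tree, and it assumes that `highest_objectid` already equals the returned inode number (34284 in the commit's example) when the check runs. The `cache_progress` value of 1000 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the fixed check: true means the returned inode
 * number is added to the free tree directly, false means it is parked
 * in the pinned tree until the next commit.
 */
static bool goes_to_free_tree(uint64_t objectid, uint64_t cache_progress,
			      uint64_t highest_objectid)
{
	return objectid <= cache_progress || objectid >= highest_objectid;
}

/* Same check with the old '>' comparison, kept only for comparison. */
static bool goes_to_free_tree_buggy(uint64_t objectid, uint64_t cache_progress,
				    uint64_t highest_objectid)
{
	return objectid <= cache_progress || objectid > highest_objectid;
}
```

For `objectid == highest_objectid == 34284`, the old comparison sends the number to the pinned tree even though start_caching() has already added `[34284, ...]` to the free tree, which is the double insertion the commit message describes. The fixed comparison routes it to the free tree instead.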
|
|
|
|
|
|
|
|
/*
|
|
|
|
* When a transaction is committed, we'll move those inode numbers which
|
|
|
|
* are smaller than root->cache_progress from pinned tree to free_ino tree,
|
|
|
|
* and others will just be dropped, because the commit root we were
|
|
|
|
* searching has changed.
|
|
|
|
*
|
|
|
|
* Must be called with root->fs_commit_mutex held
|
|
|
|
*/
|
|
|
|
void btrfs_unpin_free_ino(struct btrfs_root *root)
|
|
|
|
{
|
|
|
|
struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
|
|
|
|
struct rb_root *rbroot = &root->free_ino_pinned->free_space_offset;
|
|
|
|
struct btrfs_free_space *info;
|
|
|
|
struct rb_node *n;
|
|
|
|
u64 count;
|
|
|
|
|
2011-06-03 21:36:29 +08:00
|
|
|
if (!btrfs_test_opt(root, INODE_MAP_CACHE))
|
|
|
|
return;
|
|
|
|
|
|
|
|
while (1) {
|
|
|
|
n = rb_first(rbroot);
|
|
|
|
if (!n)
|
|
|
|
break;
|
|
|
|
|
|
|
|
info = rb_entry(n, struct btrfs_free_space, offset_index);
|
2012-03-12 23:03:00 +08:00
|
|
|
BUG_ON(info->bitmap); /* Logic error */
|
|
|
|
|
|
|
|
if (info->offset > root->cache_progress)
|
|
|
|
goto free;
|
|
|
|
else if (info->offset + info->bytes > root->cache_progress)
|
|
|
|
count = root->cache_progress - info->offset + 1;
|
|
|
|
else
|
|
|
|
count = info->bytes;
|
|
|
|
|
|
|
|
__btrfs_add_free_space(ctl, info->offset, count);
|
|
|
|
free:
|
|
|
|
rb_erase(&info->offset_index, rbroot);
|
|
|
|
kfree(info);
|
|
|
|
}
|
|
|
|
}
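/*
 * Illustrative sketch, not part of the kernel source: the per-extent
 * decision made in btrfs_unpin_free_ino() above can be modelled as a pure
 * userspace function. The name unpin_count() and its parameters are
 * hypothetical; a return value of 0 stands for the "goto free" path where
 * the pinned extent is dropped without being added to the free_ino tree.
 */

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given a pinned extent [offset, offset + bytes) and the caching
 * progress, return how many inode numbers would be moved to the
 * free_ino tree. Mirrors the if / else if / else chain above.
 */
static uint64_t unpin_count(uint64_t offset, uint64_t bytes,
			    uint64_t cache_progress)
{
	assert(bytes > 0);

	if (offset > cache_progress)
		return 0;	/* entirely past the progress point: dropped */
	if (offset + bytes > cache_progress)
		return cache_progress - offset + 1;	/* straddles it */
	return bytes;		/* entirely covered by the progress point */
}
```

/*
 * For example, an extent starting at 100 with 10 inode numbers and a
 * progress of 104 yields 5: the numbers 100..104 are kept, 105..109 are
 * discarded.
 */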
|
|
|
|
|
|
|
|
#define INIT_THRESHOLD (((1024 * 32) / 2) / sizeof(struct btrfs_free_space))
|
|
|
|
#define INODES_PER_BITMAP (PAGE_CACHE_SIZE * 8)
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The goal is to keep the memory used by the free_ino tree from
|
|
|
|
* exceeding the memory that a bitmaps-only representation would use.
|
|
|
|
*/
|
|
|
|
static void recalculate_thresholds(struct btrfs_free_space_ctl *ctl)
|
|
|
|
{
|
|
|
|
struct btrfs_free_space *info;
|
|
|
|
struct rb_node *n;
|
|
|
|
int max_ino;
|
|
|
|
int max_bitmaps;
|
|
|
|
|
|
|
|
n = rb_last(&ctl->free_space_offset);
|
|
|
|
if (!n) {
|
|
|
|
ctl->extents_thresh = INIT_THRESHOLD;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
info = rb_entry(n, struct btrfs_free_space, offset_index);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Find the maximum inode number in the filesystem. Note we
|
|
|
|
* ignore the fact that this can be a bitmap, because we are
|
|
|
|
* not doing a precise calculation.
|
|
|
|
*/
|
|
|
|
max_ino = info->bytes - 1;
|
|
|
|
|
|
|
|
max_bitmaps = ALIGN(max_ino, INODES_PER_BITMAP) / INODES_PER_BITMAP;
|
|
|
|
if (max_bitmaps <= ctl->total_bitmaps) {
|
|
|
|
ctl->extents_thresh = 0;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
ctl->extents_thresh = (max_bitmaps - ctl->total_bitmaps) *
|
|
|
|
PAGE_CACHE_SIZE / sizeof(*info);
|
|
|
|
}
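/*
 * Illustrative sketch, not part of the kernel source: the threshold
 * arithmetic in recalculate_thresholds() above, reproduced in userspace.
 * The SKETCH_* names and sketch_extents_thresh() are hypothetical. A
 * 4 KiB page is an assumption, and the size of one extent entry is passed
 * in as a parameter because sizeof(struct btrfs_free_space) depends on
 * the kernel build.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE		4096u	/* assumption: 4 KiB pages */
#define SKETCH_INODES_PER_BITMAP	(SKETCH_PAGE_SIZE * 8u)

/*
 * Number of extent entries allowed before falling back to bitmaps:
 * the memory that the not-yet-allocated bitmaps would occupy, divided
 * by the size of one extent entry.
 */
static uint64_t sketch_extents_thresh(uint64_t max_ino,
				      uint64_t total_bitmaps,
				      size_t entry_size)
{
	uint64_t max_bitmaps;

	assert(entry_size > 0);

	/* Same result as ALIGN(max_ino, N) / N: round up to whole bitmaps. */
	max_bitmaps = (max_ino + SKETCH_INODES_PER_BITMAP - 1) /
		      SKETCH_INODES_PER_BITMAP;
	if (max_bitmaps <= total_bitmaps)
		return 0;

	return (max_bitmaps - total_bitmaps) * SKETCH_PAGE_SIZE / entry_size;
}
```

/*
 * With max_ino = 100000, one bitmap already in use and a 64-byte entry
 * (an assumed size), four bitmaps would cover the range, so three pages
 * worth of entries are allowed: 3 * 4096 / 64 = 192.
 */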
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We don't fall back to a bitmap if we are below the extents threshold
|
|
|
|
* or this chunk of inode numbers is a big one.
|
|
|
|
*/
|
|
|
|
static bool use_bitmap(struct btrfs_free_space_ctl *ctl,
|
|
|
|
struct btrfs_free_space *info)
|
|
|
|
{
|
|
|
|
if (ctl->free_extents < ctl->extents_thresh ||
|
|
|
|
info->bytes > INODES_PER_BITMAP / 10)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
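/*
 * Illustrative sketch, not part of the kernel source: the extent-versus-
 * bitmap decision from use_bitmap() above as a standalone predicate.
 * sketch_use_bitmap() and SKETCH_BITS_PER_PAGE are hypothetical names,
 * and the 4 KiB page size is an assumption.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BITS_PER_PAGE	(4096u * 8u)	/* assumption: 4 KiB pages */

/* With 4 KiB pages, a chunk counts as "big" above 3276 inode numbers. */
static_assert(SKETCH_BITS_PER_PAGE / 10 == 3276, "unexpected big-chunk cutoff");

/*
 * Return true when a free chunk of 'bytes' inode numbers should be
 * stored in a bitmap rather than as an extent entry.
 */
static bool sketch_use_bitmap(uint64_t free_extents, uint64_t extents_thresh,
			      uint64_t bytes)
{
	if (free_extents < extents_thresh ||
	    bytes > SKETCH_BITS_PER_PAGE / 10)
		return false;	/* room for an extent, or the chunk is big */

	return true;
}
```

/*
 * A single freed inode number goes into a bitmap only once the number of
 * extent entries has reached the threshold; a chunk larger than a tenth
 * of a bitmap always stays an extent.
 */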
|
|
|
|
|
|
|
|
static struct btrfs_free_space_op free_ino_op = {
|
|
|
|
.recalc_thresholds = recalculate_thresholds,
|
|
|
|
.use_bitmap = use_bitmap,
|
|
|
|
};
|
|
|
|
|
|
|
|
static void pinned_recalc_thresholds(struct btrfs_free_space_ctl *ctl)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool pinned_use_bitmap(struct btrfs_free_space_ctl *ctl,
|
|
|
|
struct btrfs_free_space *info)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* We always use extents for two reasons:
|
|
|
|
*
|
|
|
|
* - The pinned tree is only used during the process of caching
|
|
|
|
* work.
|
|
|
|
* - It makes the code simpler. See btrfs_unpin_free_ino().
|
|
|
|
*/
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct btrfs_free_space_op pinned_free_ino_op = {
|
|
|
|
.recalc_thresholds = pinned_recalc_thresholds,
|
|
|
|
.use_bitmap = pinned_use_bitmap,
|
|
|
|
};
|
|
|
|
|
|
|
|
void btrfs_init_free_ino_ctl(struct btrfs_root *root)
|
|
|
|
{
|
|
|
|
struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
|
|
|
|
struct btrfs_free_space_ctl *pinned = root->free_ino_pinned;
|
|
|
|
|
|
|
|
spin_lock_init(&ctl->tree_lock);
|
|
|
|
ctl->unit = 1;
|
|
|
|
ctl->start = 0;
|
|
|
|
ctl->private = NULL;
|
|
|
|
ctl->op = &free_ino_op;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Initially we allow 16K of RAM to be used to cache chunks of
|
|
|
|
* inode numbers before we resort to bitmaps. This is somewhat
|
|
|
|
* arbitrary, but it will be adjusted at runtime.
|
|
|
|
*/
|
|
|
|
ctl->extents_thresh = INIT_THRESHOLD;
|
|
|
|
|
|
|
|
spin_lock_init(&pinned->tree_lock);
|
|
|
|
pinned->unit = 1;
|
|
|
|
pinned->start = 0;
|
|
|
|
pinned->private = NULL;
|
|
|
|
pinned->extents_thresh = 0;
|
|
|
|
pinned->op = &pinned_free_ino_op;
|
|
|
|
}
|
|
|
|
|
2011-04-20 10:33:24 +08:00
|
|
|
int btrfs_save_ino_cache(struct btrfs_root *root,
|
|
|
|
struct btrfs_trans_handle *trans)
|
|
|
|
{
|
|
|
|
struct btrfs_free_space_ctl *ctl = root->free_ino_ctl;
|
|
|
|
struct btrfs_path *path;
|
|
|
|
struct inode *inode;
|
2011-11-11 09:45:04 +08:00
|
|
|
struct btrfs_block_rsv *rsv;
|
|
|
|
u64 num_bytes;
|
2011-04-20 10:33:24 +08:00
|
|
|
u64 alloc_hint = 0;
|
|
|
|
int ret;
|
|
|
|
int prealloc;
|
|
|
|
bool retry = false;
|
|
|
|
|
2011-06-01 17:42:49 +08:00
|
|
|
/* Only the fs tree and subvolumes/snapshots need an ino cache */
|
|
|
|
if (root->root_key.objectid != BTRFS_FS_TREE_OBJECTID &&
|
|
|
|
(root->root_key.objectid < BTRFS_FIRST_FREE_OBJECTID ||
|
|
|
|
root->root_key.objectid > BTRFS_LAST_FREE_OBJECTID))
|
|
|
|
return 0;
|
|
|
|
|
2011-06-01 03:33:33 +08:00
|
|
|
/* Don't save inode cache if we are deleting this root */
|
2013-09-05 22:58:43 +08:00
|
|
|
if (btrfs_root_refs(&root->root_item) == 0)
|
2011-06-01 03:33:33 +08:00
|
|
|
return 0;
|
|
|
|
|
2011-06-03 21:36:29 +08:00
|
|
|
if (!btrfs_test_opt(root, INODE_MAP_CACHE))
|
|
|
|
return 0;
|
|
|
|
|
2011-04-20 10:33:24 +08:00
|
|
|
path = btrfs_alloc_path();
|
|
|
|
if (!path)
|
|
|
|
return -ENOMEM;
|
2011-06-03 21:36:29 +08:00
|
|
|
|
2011-11-11 09:45:04 +08:00
|
|
|
rsv = trans->block_rsv;
|
|
|
|
trans->block_rsv = &root->fs_info->trans_block_rsv;
|
|
|
|
|
|
|
|
num_bytes = trans->bytes_reserved;
|
|
|
|
/*
|
|
|
|
* 1 item for inode item insertion, if needed
|
2013-05-13 21:55:09 +08:00
|
|
|
* 4 items for inode item update (in the worst case)
|
|
|
|
* 1 item for slack space if we need to do truncation
|
2011-11-11 09:45:04 +08:00
|
|
|
* 1 item for free space object
|
|
|
|
* 3 items for pre-allocation
|
|
|
|
*/
|
2013-05-13 21:55:09 +08:00
|
|
|
trans->bytes_reserved = btrfs_calc_trans_metadata_size(root, 10);
|
Btrfs: improve the noflush reservation
In some places (such as when evicting an inode), we cannot flush the reserved
space of delalloc. Flushing the delayed directory index and delayed inode
is OK, but we don't try to flush those things and just go back when there is
not enough space to be reserved. This patch fixes this problem.
We define 3 types of flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
If we are in a transaction, we should not flush anything, or a deadlock
would happen, so use NO_FLUSH. If flushing the reserved space of delalloc
would cause a deadlock, use FLUSH_LIMIT. In other cases, FLUSH_ALL is used,
and we flush everything.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-10-16 19:33:38 +08:00
|
|
|
ret = btrfs_block_rsv_add(root, trans->block_rsv,
|
|
|
|
trans->bytes_reserved,
|
|
|
|
BTRFS_RESERVE_NO_FLUSH);
|
2011-11-11 09:45:04 +08:00
|
|
|
if (ret)
|
|
|
|
goto out;
|
2012-02-24 23:39:05 +08:00
|
|
|
trace_btrfs_space_reservation(root->fs_info, "ino_cache",
|
2012-03-29 21:57:44 +08:00
|
|
|
trans->transid, trans->bytes_reserved, 1);
|
2011-04-20 10:33:24 +08:00
|
|
|
again:
|
|
|
|
inode = lookup_free_ino_inode(root, path);
|
2012-03-12 23:03:00 +08:00
|
|
|
if (IS_ERR(inode) && (PTR_ERR(inode) != -ENOENT || retry)) {
|
2011-04-20 10:33:24 +08:00
|
|
|
ret = PTR_ERR(inode);
|
2011-11-11 09:45:04 +08:00
|
|
|
goto out_release;
|
2011-04-20 10:33:24 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (IS_ERR(inode)) {
|
2012-03-12 23:03:00 +08:00
|
|
|
BUG_ON(retry); /* Logic error */
|
2011-04-20 10:33:24 +08:00
|
|
|
retry = true;
|
|
|
|
|
|
|
|
ret = create_free_ino_inode(root, trans, path);
|
|
|
|
if (ret)
|
2011-11-11 09:45:04 +08:00
|
|
|
goto out_release;
|
2011-04-20 10:33:24 +08:00
|
|
|
goto again;
|
|
|
|
}
|
|
|
|
|
|
|
|
BTRFS_I(inode)->generation = 0;
|
|
|
|
ret = btrfs_update_inode(trans, root, inode);
|
2012-03-12 23:03:00 +08:00
|
|
|
if (ret) {
|
|
|
|
btrfs_abort_transaction(trans, root, ret);
|
|
|
|
goto out_put;
|
|
|
|
}
|
2011-04-20 10:33:24 +08:00
|
|
|
|
|
|
|
if (i_size_read(inode) > 0) {
|
2013-09-20 21:46:51 +08:00
|
|
|
ret = btrfs_truncate_free_space_cache(root, trans, inode);
|
2012-03-12 23:03:00 +08:00
|
|
|
if (ret) {
|
2013-05-13 21:55:08 +08:00
|
|
|
if (ret != -ENOSPC)
|
|
|
|
btrfs_abort_transaction(trans, root, ret);
|
2011-04-20 10:33:24 +08:00
|
|
|
goto out_put;
|
2012-03-12 23:03:00 +08:00
|
|
|
}
|
2011-04-20 10:33:24 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
spin_lock(&root->cache_lock);
|
|
|
|
if (root->cached != BTRFS_CACHE_FINISHED) {
|
|
|
|
ret = -1;
|
|
|
|
spin_unlock(&root->cache_lock);
|
|
|
|
goto out_put;
|
|
|
|
}
|
|
|
|
spin_unlock(&root->cache_lock);
|
|
|
|
|
|
|
|
spin_lock(&ctl->tree_lock);
|
|
|
|
prealloc = sizeof(struct btrfs_free_space) * ctl->free_extents;
|
|
|
|
prealloc = ALIGN(prealloc, PAGE_CACHE_SIZE);
|
|
|
|
prealloc += ctl->total_bitmaps * PAGE_CACHE_SIZE;
|
|
|
|
spin_unlock(&ctl->tree_lock);
|
|
|
|
|
|
|
|
/* Just to make sure we have enough space */
|
|
|
|
prealloc += 8 * PAGE_CACHE_SIZE;
|
|
|
|
|
2011-08-30 22:19:10 +08:00
|
|
|
ret = btrfs_delalloc_reserve_space(inode, prealloc);
|
2011-04-20 10:33:24 +08:00
|
|
|
if (ret)
|
|
|
|
goto out_put;
|
|
|
|
|
|
|
|
ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
|
|
|
|
prealloc, prealloc, &alloc_hint);
|
2011-08-30 22:19:10 +08:00
|
|
|
if (ret) {
|
|
|
|
btrfs_delalloc_release_space(inode, prealloc);
|
2011-04-20 10:33:24 +08:00
|
|
|
goto out_put;
|
2011-08-30 22:19:10 +08:00
|
|
|
}
|
2011-04-20 10:33:24 +08:00
|
|
|
btrfs_free_reserved_data_space(inode, prealloc);
|
|
|
|
|
2013-09-20 21:43:28 +08:00
|
|
|
ret = btrfs_write_out_ino_cache(root, trans, path, inode);
|
2011-04-20 10:33:24 +08:00
|
|
|
out_put:
|
|
|
|
iput(inode);
|
2011-11-11 09:45:04 +08:00
|
|
|
out_release:
|
2012-02-24 23:39:05 +08:00
|
|
|
trace_btrfs_space_reservation(root->fs_info, "ino_cache",
|
2012-03-29 21:57:44 +08:00
|
|
|
trans->transid, trans->bytes_reserved, 0);
|
2011-11-11 09:45:04 +08:00
|
|
|
btrfs_block_rsv_release(root, trans->block_rsv, trans->bytes_reserved);
|
2011-04-20 10:33:24 +08:00
|
|
|
out:
|
2011-11-11 09:45:04 +08:00
|
|
|
trans->block_rsv = rsv;
|
|
|
|
trans->bytes_reserved = num_bytes;
|
2011-04-20 10:33:24 +08:00
|
|
|
|
|
|
|
btrfs_free_path(path);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int btrfs_find_highest_objectid(struct btrfs_root *root, u64 *objectid)
|
2007-04-06 01:35:25 +08:00
|
|
|
{
|
|
|
|
struct btrfs_path *path;
|
|
|
|
int ret;
|
2007-10-16 04:14:19 +08:00
|
|
|
struct extent_buffer *l;
|
2007-04-06 01:35:25 +08:00
|
|
|
struct btrfs_key search_key;
|
2007-10-16 04:14:19 +08:00
|
|
|
struct btrfs_key found_key;
|
2007-04-06 01:35:25 +08:00
|
|
|
int slot;
|
|
|
|
|
|
|
|
path = btrfs_alloc_path();
|
2011-03-23 16:14:16 +08:00
|
|
|
if (!path)
|
|
|
|
return -ENOMEM;
|
2007-04-06 01:35:25 +08:00
|
|
|
|
2008-09-06 04:43:53 +08:00
|
|
|
search_key.objectid = BTRFS_LAST_FREE_OBJECTID;
|
|
|
|
search_key.type = -1;
|
2007-04-06 01:35:25 +08:00
|
|
|
search_key.offset = (u64)-1;
|
|
|
|
ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
|
|
|
|
if (ret < 0)
|
|
|
|
goto error;
|
2012-03-12 23:03:00 +08:00
|
|
|
BUG_ON(ret == 0); /* Corruption */
|
2007-04-06 01:35:25 +08:00
|
|
|
if (path->slots[0] > 0) {
|
|
|
|
slot = path->slots[0] - 1;
|
2007-10-16 04:14:19 +08:00
|
|
|
l = path->nodes[0];
|
|
|
|
btrfs_item_key_to_cpu(l, &found_key, slot);
|
2009-09-22 03:56:00 +08:00
|
|
|
*objectid = max_t(u64, found_key.objectid,
|
|
|
|
BTRFS_FIRST_FREE_OBJECTID - 1);
|
2007-04-06 01:35:25 +08:00
|
|
|
} else {
|
2009-09-22 03:56:00 +08:00
|
|
|
*objectid = BTRFS_FIRST_FREE_OBJECTID - 1;
|
2007-04-06 01:35:25 +08:00
|
|
|
}
|
|
|
|
ret = 0;
|
|
|
|
error:
|
|
|
|
btrfs_free_path(path);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid)
|
2007-03-21 02:38:32 +08:00
|
|
|
{
|
|
|
|
int ret;
|
2008-06-26 04:01:30 +08:00
|
|
|
mutex_lock(&root->objectid_mutex);
|
2007-03-21 02:38:32 +08:00
|
|
|
|
2009-09-22 03:56:00 +08:00
|
|
|
if (unlikely(root->highest_objectid < BTRFS_FIRST_FREE_OBJECTID)) {
|
|
|
|
ret = btrfs_find_highest_objectid(root,
|
|
|
|
&root->highest_objectid);
|
2009-09-22 03:56:00 +08:00
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
}
|
2008-09-26 22:05:38 +08:00
|
|
|
|
2009-09-22 03:56:00 +08:00
|
|
|
if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
|
|
|
|
ret = -ENOSPC;
|
|
|
|
goto out;
|
2007-03-21 02:38:32 +08:00
|
|
|
}
|
2009-09-22 03:56:00 +08:00
|
|
|
|
|
|
|
*objectid = ++root->highest_objectid;
|
|
|
|
ret = 0;
|
|
|
|
out:
|
2008-06-26 04:01:30 +08:00
|
|
|
mutex_unlock(&root->objectid_mutex);
|
2007-03-21 02:38:32 +08:00
|
|
|
return ret;
|
|
|
|
}
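/*
 * Illustrative sketch, not part of the kernel source: the "highest + 1"
 * allocation performed by btrfs_find_free_objectid() above, modelled in
 * userspace. struct sketch_root and the SKETCH_* constants are
 * hypothetical stand-ins; the values 256 and (u64)-256 are assumed to
 * mirror BTRFS_FIRST_FREE_OBJECTID and BTRFS_LAST_FREE_OBJECTID. The
 * sketch omits objectid_mutex and replaces the tree search in
 * btrfs_find_highest_objectid() with a simple clamp.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_FIRST_FREE	256ULL			/* assumed value */
#define SKETCH_LAST_FREE	((uint64_t)-256)	/* assumed value */

struct sketch_root {
	uint64_t highest_objectid;
};

/*
 * Hand out the next object id, or -ENOSPC once the id space is
 * exhausted. Ids are never reused, which is the limitation the free
 * inode number cache was introduced to address.
 */
static int sketch_find_free_objectid(struct sketch_root *root,
				     uint64_t *objectid)
{
	assert(root && objectid);

	/* Stand-in for the lazy lookup of the highest id in the tree. */
	if (root->highest_objectid < SKETCH_FIRST_FREE - 1)
		root->highest_objectid = SKETCH_FIRST_FREE - 1;

	if (root->highest_objectid >= SKETCH_LAST_FREE)
		return -ENOSPC;

	*objectid = ++root->highest_objectid;
	return 0;
}
```

/*
 * Starting from an empty root, the first two calls return 256 and 257;
 * a root whose highest id has reached the last free id returns -ENOSPC.
 */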
|