OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Chandra Seetharaman	5a52c2a581	xfs: Remove the macro XFS_BUF_ERROR and family Remove the definitions and usage of the macros XFS_BUF_ERROR, XFS_BUF_GETERROR and XFS_BUF_ISERROR. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 14:57:46 -05:00
Christoph Hellwig	69ef921b55	xfs: byteswap constants instead of variables Micro-optimize various comparisms by always byteswapping the constant instead of the variable, which allows to do the swap at compile instead of runtime. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:36:05 +02:00
Dave Chinner	0b932cccbd	xfs: Convert remaining cmn_err() callers to new API Once converted, kill the remainder of the cmn_err() interface. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:08:35 +11:00
Dave Chinner	5348778699	xfs: convert xfs_fs_cmn_err to new error logging API Continue to clean up the error logging code by converting all the callers of xfs_fs_cmn_err() to the new API. Once done, remove the unused old API function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-03-07 10:05:35 +11:00
Christoph Hellwig	1a1a3e97ba	xfs: remove xfs_buf wrappers Stop having two different names for many buffer functions and use the more descriptive xfs_buf_* names directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-10-18 15:08:07 -05:00
Dave Chinner	4536f2ad8b	xfs: fix untrusted inode number lookup Commit `7124fe0a5b` ("xfs: validate untrusted inode numbers during lookup") changes the inode lookup code to do btree lookups for untrusted inode numbers. This change made an invalid assumption about the alignment of inodes and hence incorrectly calculated the first inode in the cluster. As a result, some inode numbers were being incorrectly considered invalid when they were actually valid. The issue was not picked up by the xfstests suite because it always runs fsr and dump (the two utilities that utilise the bulkstat interface) on cache hot inodes and hence the lookup code in the cold cache path was not sufficiently exercised to uncover this intermittent problem. Fix the issue by relaxing the btree lookup criteria and then checking if the record returned contains the inode number we are lookup for. If it we get an incorrect record, then the inode number is invalid. Cc: <stable@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-08-24 11:42:30 +10:00
Christoph Hellwig	3400777ff0	xfs: remove unneeded #include statements Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2010-07-26 13:16:33 -05:00
Christoph Hellwig	288699feca	xfs: drop dmapi hooks Dmapi support was never merged upstream, but we still have a lot of hooks bloating XFS for it, all over the fast pathes of the filesystem. This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM support in mainline at least the namespace events can be done much saner in the VFS instead of the individual filesystem, so it's not like this is much help for future work. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:33 -05:00
Dave Chinner	7b6259e7a8	xfs: remove block number from inode lookup code The block number comes from bulkstat based inode lookups to shortcut the mapping calculations. We ar enot able to trust anything from bulkstat, so drop the block number as well so that the correct lookups and mappings are always done. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:35:17 +10:00
Dave Chinner	1920779e67	xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTED Inode numbers may come from somewhere external to the filesystem (e.g. file handles, bulkstat information) and so are inherently untrusted. Rename the flag we use for these lookups to make it obvious we are doing a lookup of an untrusted inode number and need to verify it completely before trying to read it from disk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:15:47 +10:00
Dave Chinner	7124fe0a5b	xfs: validate untrusted inode numbers during lookup When we decode a handle or do a bulkstat lookup, we are using an inode number we cannot trust to be valid. If we are deleting inode chunks from disk (default noikeep mode), then we cannot trust the on disk inode buffer for any given inode number to correctly reflect whether the inode has been unlinked as the di_mode nor the generation number may have been updated on disk. This is due to the fact that when we delete an inode chunk, we do not write the clusters back to disk when they are removed - instead we mark them stale to avoid them being written back potentially over the top of something that has been subsequently allocated at that location. The result is that we can have locations of disk that look like they contain valid inodes but in reality do not. Hence we cannot simply convert the inode number to a block number and read the location from disk to determine if the inode is valid or not. As a result, and XFS_IGET_BULKSTAT lookup needs to actually look the inode up in the inode allocation btree to determine if the inode number is valid or not. It should be noted even on ikeep filesystems, there is the possibility that blocks on disk may look like valid inode clusters. e.g. if there are filesystem images hosted on the filesystem. Hence even for ikeep filesystems we really need to validate that the inode number is valid before issuing the inode buffer read. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:15:33 +10:00
Christoph Hellwig	0cadda1c5f	xfs: remove duplicate buffer flags Currently we define aliases for the buffer flags in various namespaces, which only adds confusion. Remove all but the XBF_ flags to clean this up a bit. Note that we still abuse XFS_B_ASYNC/XBF_ASYNC for some non-buffer uses, but I'll clean that up later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-21 13:44:36 -06:00
Dave Chinner	1c1c6ebcf5	xfs: Replace per-ag array with a radix tree The use of an array for the per-ag structures requires reallocation of the array when growing the filesystem. This requires locking access to the array to avoid use after free situations, and the locking is difficult to get right. To avoid needing to reallocate an array, change the per-ag structures to an allocated object per ag and index them using a tree structure. The AGs are always densely indexed (hence the use of an array), but the number supported is 2^32 and lookups tend to be random and hence indexing needs to scale. A simple choice is a radix tree - it works well with this sort of index. This change also removes another large contiguous allocation from the mount/growfs path in XFS. The growing process now needs to change to only initialise the new AGs required for the extra space, and as such only needs to exclusively lock the tree for inserts. The rest of the code only needs to lock the tree while doing lookups, and hence this will remove all the deadlocks that currently occur on the m_perag_lock as it is now an innermost lock. The lock is also changed to a spinlock from a read/write lock as the hold time is now extremely short. To complete the picture, the per-ag structures will need to be reference counted to ensure that we don't free/modify them while they are still in use. This will be done in subsequent patch. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-15 15:33:52 -06:00
Dave Chinner	44b56e0a1a	xfs: convert remaining direct references to m_perag Convert the remaining direct lookups of the per ag structures to use get/put accesses. Ensure that the loops across AGs and prior users of the interface balance gets and puts correctly. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-15 15:33:39 -06:00
Christoph Hellwig	b8f82a4a6f	xfs: kill the STATIC_INLINE macro Remove our own STATIC_INLINE macro. For small function inside implementation files just use STATIC and let gcc inline it, and for those in headers do the normal static inline - they are all small enough to be inlined for debug builds, too. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:22 -06:00
Eric Sandeen	3b826386d3	xfs: free temporary cursor in xfs_dialloc Commit `bd16956599` seems to have a slight regression where this code path: if (!--searchdistance) { /* * Not in range - save last search * location and allocate a new inode */ ... goto newino; } doesn't free the temporary cursor (tcur) that got dup'd in this function. This leaks an item in the xfs_btree_cur zone, and it's caught on module unload: =========================================================== BUG xfs_btree_cur: Objects remaining on kmem_cache_close() ----------------------------------------------------------- It seems like maybe a single free at the end of the function might be cleaner, but for now put a del_cursor right in this code block similar to the handling in the rest of the function. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Christoph Hellwig <hch@lst.de>	2009-10-30 09:27:07 +01:00
Christoph Hellwig	81e251766e	xfs: un-static xfs_inobt_lookup xfs_inobt_lookup is also used in xfs_itable.c, remove the STATIC modifier from it's declaration to fix non-debug builds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 20:43:01 -05:00
Dave Chinner	bd16956599	xfs: speed up free inode search Don't search too far - abort if it is outside a certain radius and simply do a linear search for the first free inode. In AGs with a million inodes this can speed up allocation speed by 3-4x. [hch: ported to the new xfs_ialloc.c world order] Signed-off-by: Dave Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:48 -05:00
Christoph Hellwig	2187550525	xfs: rationalize xfs_inobt_lookup* Currenly we have a xfs_inobt_lookup* variant for each comparism direction, and all these get all three fields of the inobt records passed, while the common case is just looking for the inode number and we have only marginally more callers than xfs_inobt_lookup* variants. So opencode a direct call to xfs_btree_lookup for the single case where we need all fields, and replace xfs_inobt_lookup* with a xfs_inobt_looku that just takes the inode number and the direction for all other callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:39 -05:00
Christoph Hellwig	4254b0bbb1	xfs: untangle xfs_dialloc Clarify the control flow in xfs_dialloc. Factor out a helper to go to the next node from the current one and improve the control flow by expanding composite if statements and using gotos. The xfs_ialloc_next_rec helper is borrowed from Dave Chinners dynamic allocation policy patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:29 -05:00
Dave Chinner	0b48db80ba	xfs: factor out debug checks from xfs_dialloc and xfs_difree Factor out a common helper from repeated debug checks in xfs_dialloc and xfs_difree. [hch: split out from Dave's dynamic allocation policy patches] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:18 -05:00
Christoph Hellwig	afabc24a73	xfs: improve xfs_inobt_update prototype Both callers of xfs_inobt_update have the record in form of a xfs_inobt_rec_incore_t, so just pass a pointer to it instead of the individual variables. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:08 -05:00
Christoph Hellwig	2e287a731e	xfs: improve xfs_inobt_get_rec prototype Most callers of xfs_inobt_get_rec need to fill a xfs_inobt_rec_incore_t, and those who don't yet are fine with a xfs_inobt_rec_incore_t, instead of the three individual variables, too. So just change xfs_inobt_get_rec to write the output into a xfs_inobt_rec_incore_t directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:44:56 -05:00
Dave Chinner	85c0b2ab5e	xfs: factor out inode initialisation Factor out code to initialize new inode clusters into a function of it's own. This keeps xfs_ialloc_ag_alloc smaller and better structured and enables a future inode cluster initialization transaction. Also initialize the agno variable earlier in xfs_ialloc_ag_alloc to avoid repeated byte swaps. [hch: The original patch is from Dave from his unpublished inode create transaction patch series, with some modifcations by me to apply stand-alone] Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:44:27 -05:00
Malcolm Parsons	9da096fd13	xfs: fix various typos Signed-off-by: Malcolm Parsons <malcolm.parsons@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2009-03-29 09:55:42 +02:00
Christoph Hellwig	0d87e656dd	xfs: remove superflous inobt macros xfs_ialloc_btree.h has a a cuple of macros that only obsfucate the code but don't provide any abstraction benefits. This patches removes those and cleans up the reamaining defintions up a little. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-02-09 08:37:14 +01:00
Eric Sandeen	9d87c3192d	[XFS] Remove the rest of the macro-to-function indirections. Remove the last of the macros-defined-to-static-functions. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2009-01-16 17:10:42 +11:00
Christoph Hellwig	b48d8d6437	[XFS] kill the XFS_IMAP_BULKSTAT flag Just pass down the XFS_IGET_* flags all the way down to xfs_imap instead of translating them mid-way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:13 +11:00
Christoph Hellwig	92bfc6e7c4	[XFS] embededd struct xfs_imap into xfs_inode Most uses of struct xfs_imap are to map and inode to a buffer. To avoid copying around the inode location information we should just embedd a strcut xfs_imap into the xfs_inode. To make sure it doesn't bloat an inode the im_len is changed to a ushort, which is fine as that's what the users exepect anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:08 +11:00
Christoph Hellwig	94e1b69d1a	[XFS] merge xfs_imap into xfs_dilocate xfs_imap is the only caller of xfs_dilocate and doesn't add any significant value. Merge the two functions and document the various cases we have for inode cluster lookup in the new xfs_imap. Also remove the unused im_agblkno and im_ioffset fields from struct xfs_imap while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:03 +11:00
Christoph Hellwig	a194189503	[XFS] remove dead code for old inode item recovery We have removed the support for old-style inode items a while ago and xlog_recover_do_inode_trans is now only called for XFS_LI_INODE items. That means we can remove the call to xfs_imap there and with it the XFS_IMAP_LOOKUP that is set by all other callers. We can also mark xfs_imap static now. (First sent on October 21st) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:58 +11:00
Christoph Hellwig	51ce16d519	[XFS] kill XFS_DINODE_VERSION_ defines These names don't add any value at all over just using the numerical values. (First sent on October 9th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:42 +11:00
Christoph Hellwig	81591fe2db	[XFS] kill xfs_dinode_core_t Now that we have a separate xfs_icdinode_t for the in-core inode which gets logged there is no need anymore for the xfs_dinode vs xfs_dinode_core split - the fact that part of the structure gets logged through the inode log item and a small part not can better be described in a comment. All sizeof operations on the dinode_core either really wanted the icdinode and are switched to that one, or had already added the size of the agi unlinked list pointer. Later both will be replaced with helpers once we get the larger CRC-enabled dinode. Removing the data and attribute fork unions also has the advantage that xfs_dinode.h doesn't need to pull in every header under the sun. While we're at it also add some more comments describing the dinode structure. (First sent on October 7th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:35 +11:00
Christoph Hellwig	d42f08f61c	[XFS] kill xfs_ialloc_log_di xfs_ialloc_log_di is only used to log the full inode core + di_next_unlinked. That means all the offset magic is not nessecary and we can simply use xfs_trans_log_buf directly. Also add a comment describing what we should do here instead. (First sent on October 7th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:31 +11:00
Christoph Hellwig	5e1be0fb1a	[XFS] factor out xfs_read_agi helper Add a helper to read the AGI header and perform basic verification. Based on hunks from a larger patch from Dave Chinner. (First sent on Juli 23rd) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:15 +11:00
Christoph Hellwig	8cc938fe42	[XFS] implement generic xfs_btree_get_rec Not really much reason to make it generic given that it's so small, but this is the last non-method in xfs_alloc_btree.c and xfs_ialloc_btree.c, so it makes the whole btree implementation more structured. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32206a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 16:58:11 +11:00
Christoph Hellwig	91cca5df9b	[XFS] implement generic xfs_btree_delete/delrec Make the btree delete code generic. Based on a patch from David Chinner with lots of changes to follow the original btree implementations more closely. While this loses some of the generic helper routines for inserting/moving/removing records it also solves some of the one off bugs in the original code and makes it easier to verify. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32205a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 16:58:01 +11:00
Christoph Hellwig	4b22a57188	[XFS] implement generic xfs_btree_insert/insrec Make the btree insert code generic. Based on a patch from David Chinner with lots of changes to follow the original btree implementations more closely. While this loses some of the generic helper routines for inserting/moving/removing records it also solves some of the one off bugs in the original code and makes it easier to verify. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32202a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 16:57:40 +11:00
Christoph Hellwig	278d0ca14e	[XFS] implement generic xfs_btree_update From: Dave Chinner <dgc@sgi.com> The most complicated part here is the lastrec tracking for the alloc btree. Most logic is in the update_lastrec method which has to do some hopefully good enough dirty magic to maintain it. [hch: split out from bigger patch and a rework of the lastrec logic] SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32194a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 16:56:32 +11:00
Christoph Hellwig	fe033cc848	[XFS] implement generic xfs_btree_lookup From: Dave Chinner <dgc@sgi.com> [hch: split out from bigger patch and minor adaptions] SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32192a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 16:56:09 +11:00
Christoph Hellwig	8df4da4a0a	[XFS] implement generic xfs_btree_decrement From: Dave Chinner <dgc@sgi.com> [hch: split out from bigger patch and minor adaptions] SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32191a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 16:55:58 +11:00
Christoph Hellwig	637aa50f46	[XFS] implement generic xfs_btree_increment From: Dave Chinner <dgc@sgi.com> Because this is the first major generic btree routine this patch includes some infrastrucure, first a few routines to deal with a btree block that can be either in short or long form, second xfs_btree_read_buf_block, which is the new central routine to read a btree block given a cursor, and third the new xfs_btree_ptr_addr routine to calculate the address for a given btree pointer record. [hch: split out from bigger patch and minor adaptions] SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32190a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 16:55:45 +11:00
Christoph Hellwig	561f7d1739	[XFS] split up xfs_btree_init_cursor xfs_btree_init_cursor contains close to little shared code for the different btrees and will get even more non-common code in the future. Split it up into one routine per btree type. Because xfs_btree_dup_cursor needs to call the init routine for a generic btree cursor add a new btree operation vector that contains a dup_cursor method that initializes a new cursor based on an existing one. The btree operations vector is based on an idea and code from Dave Chinner and will grow more entries later during this series. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32176a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 16:53:59 +11:00
David Chinner	359346a965	[XFS] Don't initialise new inode generation numbers to zero When we allocation new inode chunks, we initialise the generation numbers to zero. This works fine until we delete a chunk and then reallocate it, resulting in the same inode numbers but with a reset generation count. This can result in inode/generation pairs of different inodes occurring relatively close together. Given that the inode/gen pair makes up the "unique" portion of an NFS filehandle on XFS, this can result in file handles cached on clients being seen on the wire from the server but refer to a different file. This causes .... issues for NFS clients. Hence we need a unique generation number initialisation for each inode to prevent reuse of a small portion of the generation number space. Use a random number to initialise the generation number so we don't need to keep any new state on disk whilst making the new number difficult to guess from previous allocations. SGI-PV: 979416 SGI-Modid: xfs-linux-melb:xfs-kern:31001a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-29 15:58:56 +10:00
David Chinner	75de2a91c9	[XFS] Account for inode cluster alignment in all allocations At ENOSPC, we can get a filesystem shutdown due to a cancelling a dirty transaction in xfs_mkdir or xfs_create. This is due to the initial allocation attempt not taking into account inode alignment and hence we can prepare the AGF freelist for allocation when it's not actually possible to do an allocation. This results in inode allocation returning ENOSPC with a dirty transaction, and hence we shut down the filesystem. Because the first allocation is an exact allocation attempt, we must tell the allocator that the alignment does not affect the allocation attempt. i.e. we will accept any extent alignment as long as the extent starts at the block we want. Unfortunately, this means that if the longest free extent is less than the length + alignment necessary for fallback allocation attempts but is long enough to attempt a non-aligned allocation, we will modify the free list. If we then have the exact allocation fail, all other allocation attempts will also fail due to the alignment constraint being taken into account. Hence the initial attempt needs to set the "alignment slop" field so that alignment, while not required, must be taken into account when determining if there is enough space left in the AG to do the allocation. That means if the exact allocation fails, we will not dirty the freelist if there is not enough space available fo a subsequent allocation to succeed. Hence we get an ENOSPC error back to userspace without shutting down the filesystem. SGI-PV: 978886 SGI-Modid: xfs-linux-melb:xfs-kern:30699a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:42:09 +10:00
Eric Sandeen	6211870992	[XFS] remove shouting-indirection macros from xfs_sb.h Remove macro-to-small-function indirection from xfs_sb.h, and remove some which are completely unused. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30528a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-10 16:24:45 +10:00
Josef Jeff Sipek	1bd960ee2b	[XFS] If you mount an XFS filesystem with no mount options at all, then the "ikeep" option is set rather than "noikeep". This regression was introduced in 970451. With no mount options specified, xfs_parseargs() does the following: int ikeep = 0; args->flags \|= XFSMNT_BARRIER; args->flags2 \|= XFSMNT2_COMPAT_IOSIZE; if (!options) goto done; It only sets the above two options by default and before, it also used to set XFSMNT_IDELETE by default. If options are specified, then if (!(args->flags & XFSMNT_DMAPI) && !ikeep) args->flags \|= XFSMNT_IDELETE; is executed later on which is skipped by the "goto done;" above. The solution is to invert the logic. SGI-PV: 977771 SGI-Modid: xfs-linux-melb:xfs-kern:30590a Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-28 20:37:56 -08:00
Marcin Slusarz	413d57c990	xfs: convert beX_add to beX_add_cpu (new common API) remove beX_add functions and replace all uses with beX_add_cpu Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Dave Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:19 -08:00
Christoph Hellwig	347d1c0195	[XFS] dinode endianess annotations Biggest bit is duplicating the dinode structure so we have one annotated for native endianess and one for disk endianess. The other significant change is that xfs_xlate_dinode_core is split into one helper per direction to allow for proper annotations, everything else is trivial. As a sidenode splitting out the incore dinode means we can move it into xfs_inode.h in a later patch and severely improving on the include hell in xfs. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29476a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-15 16:48:30 +10:00
David Chinner	92821e2ba4	[XFS] Lazy Superblock Counters When we have a couple of hundred transactions on the fly at once, they all typically modify the on disk superblock in some way. create/unclink/mkdir/rmdir modify inode counts, allocation/freeing modify free block counts. When these counts are modified in a transaction, they must eventually lock the superblock buffer and apply the mods. The buffer then remains locked until the transaction is committed into the incore log buffer. The result of this is that with enough transactions on the fly the incore superblock buffer becomes a bottleneck. The result of contention on the incore superblock buffer is that transaction rates fall - the more pressure that is put on the superblock buffer, the slower things go. The key to removing the contention is to not require the superblock fields in question to be locked. We do that by not marking the superblock dirty in the transaction. IOWs, we modify the incore superblock but do not modify the cached superblock buffer. In short, we do not log superblock modifications to critical fields in the superblock on every transaction. In fact we only do it just before we write the superblock to disk every sync period or just before unmount. This creates an interesting problem - if we don't log or write out the fields in every transaction, then how do the values get recovered after a crash? the answer is simple - we keep enough duplicate, logged information in other structures that we can reconstruct the correct count after log recovery has been performed. It is the AGF and AGI structures that contain the duplicate information; after recovery, we walk every AGI and AGF and sum their individual counters to get the correct value, and we do a transaction into the log to correct them. An optimisation of this is that if we have a clean unmount record, we know the value in the superblock is correct, so we can avoid the summation walk under normal conditions and so mount/recovery times do not change under normal operation. One wrinkle that was discovered during development was that the blocks used in the freespace btrees are never accounted for in the AGF counters. This was once a valid optimisation to make; when the filesystem is full, the free space btrees are empty and consume no space. Hence when it matters, the "accounting" is correct. But that means the when we do the AGF summations, we would not have a correct count and xfs_check would complain. Hence a new counter was added to track the number of blocks used by the free space btrees. This is an on-disk format change. As a result of this, lazy superblock counters are a mkfs option and at the moment on linux there is no way to convert an old filesystem. This is possible - xfs_db can be used to twiddle the right bits and then xfs_repair will do the format conversion for you. Similarly, you can convert backwards as well. At some point we'll add functionality to xfs_admin to do the bit twiddling easily.... SGI-PV: 964999 SGI-Modid: xfs-linux-melb:xfs-kern:28652a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:28:50 +10:00

1 2

65 Commits