Merge branches 'core/debugobjects', 'core/iommu', 'core/locking', 'core/printk', 'core/rcu', 'core/resources', 'core/softirq' and 'core/stacktrace' into core/core

2008-12-25 14:06:29 +01:00 · 2008-12-25 14:06:29 +01:00 · 6638101c11
parent cc37d3d206 3ae7020543 a08636690d 00ef9f7348 26cc271db7 12d79bafb7 3ac52669c7 8b752e3ef6 9212ddb5ea
commit 6638101c11
34 changed files with 3058 additions and 173 deletions
--- a/Documentation/RCU/00-INDEX
+++ b/Documentation/RCU/00-INDEX
@ -16,6 +16,8 @@ RTFP.txt
 	- List of RCU papers (bibliography) going back to 1980.
 torture.txt
 	- RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST)
 trace.txt
 	- CONFIG_RCU_TRACE debugfs files and formats
 UP.txt
 	- RCU on Uniprocessor Systems
 whatisRCU.txt
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@ -0,0 +1,413 @@
 CONFIG_RCU_TRACE debugfs Files and Formats
 The rcupreempt and rcutree implementations of RCU provide debugfs trace
 output that summarizes counters and state.  This information is useful for
 debugging RCU itself, and can sometimes also help to debug abuses of RCU.
 Note that the rcuclassic implementation of RCU does not provide debugfs
 trace output.
 The following sections describe the debugfs files and formats for
 preemptable RCU (rcupreempt) and hierarchical RCU (rcutree).
 Preemptable RCU debugfs Files and Formats
 This implementation of RCU provides three debugfs files under the
 top-level directory RCU: rcu/rcuctrs (which displays the per-CPU
 counters used by preemptable RCU), rcu/rcugp (which displays grace-period
 counters), and rcu/rcustats (which displays internal counters for
 debugging RCU).
 The output of "cat rcu/rcuctrs" looks as follows:
 CPU last cur F M
  0    5  -5 0 0
  1   -1   0 0 0
  2    0   1 0 0
  3    0   1 0 0
  4    0   1 0 0
  5    0   1 0 0
  6    0   2 0 0
  7    0  -1 0 0
  8    0   1 0 0
 ggp = 26226, state = waitzero
 The per-CPU fields are as follows:
 o	"CPU" gives the CPU number.  Offline CPUs are not displayed.
 o	"last" gives the value of the counter that is being decremented
 	for the current grace period phase.  In the example above,
 	the counters sum to 4, indicating that four RCU read-side
 	critical sections that started before the last counter flip
 	are still running.
 o	"cur" gives the value of the counter that is currently being
 	both incremented (by rcu_read_lock()) and decremented (by
 	rcu_read_unlock()).  In the example above, the counters sum to
 	1, indicating that there is only one RCU read-side critical section
 	still running that started after the last counter flip.
 o	"F" indicates whether RCU is waiting for this CPU to acknowledge
 	a counter flip.  In the above example, RCU is not waiting on any,
 	which is consistent with the state being "waitzero" rather than
 	"waitack".
 o	"M" indicates whether RCU is waiting for this CPU to execute a
 	memory barrier.  In the above example, RCU is not waiting on any,
 	which is consistent with the state being "waitzero" rather than
 	"waitmb".
 o	"ggp" is the global grace-period counter.
 o	"state" is the RCU state, which can be one of the following:
 	o	"idle": there is no grace period in progress.
 	o	"waitack": RCU just incremented the global grace-period
 		counter, which has the effect of reversing the roles of
 		the "last" and "cur" counters above, and is waiting for
 		all the CPUs to acknowledge the flip.  Once the flip has
 		been acknowledged, CPUs will no longer be incrementing
 		what are now the "last" counters, so that their sum will
 		decrease monotonically down to zero.
 	o	"waitzero": RCU is waiting for the sum of the "last" counters
 		to decrease to zero.
 	o	"waitmb": RCU is waiting for each CPU to execute a memory
 		barrier, which ensures that instructions from a given CPU's
 		last RCU read-side critical section cannot be reordered
 		with instructions following the memory-barrier instruction.
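The column sums quoted above (4 for "last", 1 for "cur") can be reproduced from a captured snapshot. The following is a minimal userspace sketch, not part of this commit: the helper name rcuctrs_sum() and the struct are hypothetical, and the code assumes the rcuctrs text has already been read into memory in the format shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Column sums for one rcu/rcuctrs snapshot (hypothetical type). */
struct rcuctrs_sums {
	long last;	/* sum of the "last" column */
	long cur;	/* sum of the "cur" column */
	int ncpus;	/* number of per-CPU lines parsed */
};

/*
 * Hypothetical helper: sum the "last" and "cur" columns of rcuctrs-style
 * text held in memory.  Lines that do not consist of five integers (the
 * header and the trailing "ggp = ..." line) are skipped.
 */
static struct rcuctrs_sums rcuctrs_sum(const char *text)
{
	struct rcuctrs_sums s = { 0, 0, 0 };
	const char *p = text;

	assert(text != NULL);
	while (*p) {
		size_t len = strcspn(p, "\n");	/* length of this line */
		char line[128];
		int cpu, f, m;
		long last, cur;

		if (len < sizeof(line)) {
			memcpy(line, p, len);
			line[len] = '\0';
			if (sscanf(line, "%d %ld %ld %d %d",
				   &cpu, &last, &cur, &f, &m) == 5) {
				s.last += last;
				s.cur += cur;
				s.ncpus++;
			}
		}
		p += len;
		if (*p == '\n')
			p++;
	}
	return s;
}
```

In the "waitzero" state, a "last" sum that stays nonzero across repeated snapshots suggests that some old reader has not yet left its critical section.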
 The output of "cat rcu/rcugp" looks as follows:
 oldggp=48870  newggp=48873
 Note that reading from this file provokes a synchronize_rcu().  The
 "oldggp" value is that of "ggp" from rcu/rcuctrs above, taken before
 executing the synchronize_rcu(), and the "newggp" value is also the
 "ggp" value, but taken after the synchronize_rcu() command returns.
 The output of "cat rcu/rcustats" looks as follows:
 na=1337955 nl=40 wa=1337915 wl=44 da=1337871 dl=0 dr=1337871 di=1337871
 1=50989 e1=6138 i1=49722 ie1=82 g1=49640 a1=315203 ae1=265563 a2=49640
 z1=1401244 ze1=1351605 z2=49639 m1=5661253 me1=5611614 m2=49639
 These are counters tracking internal preemptable-RCU events; however,
 some of them may be useful for debugging algorithms using RCU.  In
 particular, the "nl", "wl", and "dl" values track the number of RCU
 callbacks in various states.  The fields are as follows:
 o	"na" is the total number of RCU callbacks that have been enqueued
 	since boot.
 o	"nl" is the number of RCU callbacks waiting for the previous
 	grace period to end so that they can start waiting on the next
 	grace period.
 o	"wa" is the total number of RCU callbacks that have started waiting
 	for a grace period since boot.  "na" should be roughly equal to
 	"nl" plus "wa".
 o	"wl" is the number of RCU callbacks currently waiting for their
 	grace period to end.
 o	"da" is the total number of RCU callbacks whose grace periods
 	have completed since boot.  "wa" should be roughly equal to
 	"wl" plus "da".
 o	"dr" is the total number of RCU callbacks that have been removed
 	from the list of callbacks ready to invoke.  "dr" should be roughly
 	equal to "da".
 o	"di" is the total number of RCU callbacks that have been invoked
 	since boot.  "di" should be roughly equal to "da", though some
 	early versions of preemptable RCU had a bug so that only the
 	last CPU's count of invocations was displayed, rather than the
 	sum of all CPUs' counts.
 o	"1" is the number of calls to rcu_try_flip().  This should be
 	roughly equal to the sum of "e1", "i1", "a1", "z1", and "m1"
 	described below.  In other words, the number of times that
 	the state machine is visited should be equal to the sum of the
 	number of times that each state is visited plus the number of
 	times that the state-machine lock acquisition failed.
 o	"e1" is the number of times that rcu_try_flip() was unable to
 	acquire the fliplock.
 o	"i1" is the number of calls to rcu_try_flip_idle().
 o	"ie1" is the number of times rcu_try_flip_idle() exited early
 	due to the calling CPU having no work for RCU.
 o	"g1" is the number of times that rcu_try_flip_idle() decided
 	to start a new grace period.  "i1" should be roughly equal to
 	"ie1" plus "g1".
 o	"a1" is the number of calls to rcu_try_flip_waitack().
 o	"ae1" is the number of times that rcu_try_flip_waitack() found
 	that at least one CPU had not yet acknowledged the new grace period
 	(AKA "counter flip").
 o	"a2" is the number of times rcu_try_flip_waitack() found that
 	all CPUs had acknowledged.  "a1" should be roughly equal to
 	"ae1" plus "a2".  (This particular output was collected on
 	a 128-CPU machine, hence the smaller-than-usual fraction of
 	calls to rcu_try_flip_waitack() finding all CPUs having already
 	acknowledged.)
 o	"z1" is the number of calls to rcu_try_flip_waitzero().
 o	"ze1" is the number of times that rcu_try_flip_waitzero() found
 	that not all of the old RCU read-side critical sections had
 	completed.
 o	"z2" is the number of times that rcu_try_flip_waitzero() finds
 	the sum of the counters equal to zero, in other words, that
 	all of the old RCU read-side critical sections had completed.
 	The value of "z1" should be roughly equal to "ze1" plus
 	"z2".
 o	"m1" is the number of calls to rcu_try_flip_waitmb().
 o	"me1" is the number of times that rcu_try_flip_waitmb() finds
 	that at least one CPU has not yet executed a memory barrier.
 o	"m2" is the number of times that rcu_try_flip_waitmb() finds that
 	all CPUs have executed a memory barrier.
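The "roughly equal" relations among the callback counters can be checked mechanically. The sketch below is illustrative only: the struct and both function names are hypothetical, and it assumes the first line of the rcustats output has the exact "na=... di=..." layout shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Callback counters from the first rcustats line (hypothetical type). */
struct rcustats_cb {
	long na, nl, wa, wl, da, dl, dr, di;
};

/* Parse the "na=... di=..." line; returns 0 on success, -1 otherwise. */
static int rcustats_parse_cb(const char *line, struct rcustats_cb *c)
{
	assert(line != NULL && c != NULL);
	return sscanf(line,
		      "na=%ld nl=%ld wa=%ld wl=%ld da=%ld dl=%ld dr=%ld di=%ld",
		      &c->na, &c->nl, &c->wa, &c->wl,
		      &c->da, &c->dl, &c->dr, &c->di) == 8 ? 0 : -1;
}

/*
 * Largest absolute deviation among the relations stated in the text:
 * na ~ nl + wa, wa ~ wl + da, dr ~ da, di ~ da.
 */
static long rcustats_cb_skew(const struct rcustats_cb *c)
{
	long dev[4];
	long worst = 0;
	int i;

	assert(c != NULL);
	dev[0] = c->na - (c->nl + c->wa);
	dev[1] = c->wa - (c->wl + c->da);
	dev[2] = c->dr - c->da;
	dev[3] = c->di - c->da;
	for (i = 0; i < 4; i++)
		if (labs(dev[i]) > worst)
			worst = labs(dev[i]);
	return worst;
}
```

For the sample output above every relation holds exactly, so the skew is zero. A small nonzero value is expected on a live system because the counters are not sampled atomically.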
 Hierarchical RCU debugfs Files and Formats
 This implementation of RCU provides three debugfs files under the
 top-level directory RCU: rcu/rcudata (which displays fields in struct
 rcu_data), rcu/rcugp (which displays grace-period counters), and
 rcu/rcuhier (which displays the struct rcu_node hierarchy).
 The output of "cat rcu/rcudata" looks as follows:
 rcu:
  0 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=1 rp=3c2a dt=23301/73 dn=2 df=1882 of=0 ri=2126 ql=2 b=10
  1 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=3 rp=39a6 dt=78073/1 dn=2 df=1402 of=0 ri=1875 ql=46 b=10
  2 c=4010 g=4010 pq=1 pqc=4010 qp=0 rpfq=-5 rp=1d12 dt=16646/0 dn=2 df=3140 of=0 ri=2080 ql=0 b=10
  3 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=2b50 dt=21159/1 dn=2 df=2230 of=0 ri=1923 ql=72 b=10
  4 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1644 dt=5783/1 dn=2 df=3348 of=0 ri=2805 ql=7 b=10
  5 c=4012 g=4013 pq=0 pqc=4011 qp=1 rpfq=3 rp=1aac dt=5879/1 dn=2 df=3140 of=0 ri=2066 ql=10 b=10
  6 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=ed8 dt=5847/1 dn=2 df=3797 of=0 ri=1266 ql=10 b=10
  7 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1fa2 dt=6199/1 dn=2 df=2795 of=0 ri=2162 ql=28 b=10
 rcu_bh:
  0 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-145 rp=21d6 dt=23301/73 dn=2 df=0 of=0 ri=0 ql=0 b=10
  1 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-170 rp=20ce dt=78073/1 dn=2 df=26 of=0 ri=5 ql=0 b=10
  2 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-83 rp=fbd dt=16646/0 dn=2 df=28 of=0 ri=4 ql=0 b=10
  3 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-105 rp=178c dt=21159/1 dn=2 df=28 of=0 ri=2 ql=0 b=10
  4 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-30 rp=b54 dt=5783/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
  5 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-29 rp=df5 dt=5879/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
  6 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-28 rp=788 dt=5847/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
  7 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-53 rp=1098 dt=6199/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
 The first section lists the rcu_data structures for rcu, the second for
 rcu_bh.  Each section has one line per CPU, or eight for this 8-CPU system.
 The fields are as follows:
 o	The number at the beginning of each line is the CPU number.
 	CPU numbers followed by an exclamation mark are offline,
 	but have been online at least once since boot.	There will be
 	no output for CPUs that have never been online, which can be
 	a good thing in the surprisingly common case where NR_CPUS is
 	substantially larger than the number of actual CPUs.
 o	"c" is the count of grace periods that this CPU believes have
 	completed.  CPUs in dynticks idle mode may lag quite a ways
 	behind, for example, CPU 4 under "rcu" above, which has slept
 	through the past 25 RCU grace periods.	It is not unusual to
 	see CPUs lagging by thousands of grace periods.
 o	"g" is the count of grace periods that this CPU believes have
 	started.  Again, CPUs in dynticks idle mode may lag behind.
 	If the "c" and "g" values are equal, this CPU has already
 	reported a quiescent state for the last RCU grace period that
 	it is aware of, otherwise, the CPU believes that it owes RCU a
 	quiescent state.
 o	"pq" indicates that this CPU has passed through a quiescent state
 	for the current grace period.  It is possible for "pq" to be
 	"1" and "c" different than "g", which indicates that although
 	the CPU has passed through a quiescent state, either (1) this
 	CPU has not yet reported that fact, (2) some other CPU has not
 	yet reported for this grace period, or (3) both.
 o	"pqc" indicates which grace period the last-observed quiescent
 	state for this CPU corresponds to.  This is important for handling
 	the race between CPU 0 reporting an extended dynticks-idle
 	quiescent state for CPU 1 and CPU 1 suddenly waking up and
 	reporting its own quiescent state.  If CPU 1 was the last CPU
 	for the current grace period, then the CPU that loses this race
 	will attempt to incorrectly mark CPU 1 as having checked in for
 	the next grace period!
 o	"qp" indicates that RCU still expects a quiescent state from
 	this CPU.
 o	"rpfq" is the number of rcu_pending() calls on this CPU required
 	to induce this CPU to invoke force_quiescent_state().
 o	"rp" is the low-order four hex digits of the count of how many times
 	rcu_pending() has been invoked on this CPU.
 o	"dt" is the current value of the dyntick counter that is incremented
 	when entering or leaving dynticks idle state, either by the
 	scheduler or by irq.  The number after the "/" is the interrupt
 	nesting depth when in dyntick-idle state, or one greater than
 	the interrupt-nesting depth otherwise.
 	This field is displayed only for CONFIG_NO_HZ kernels.
 o	"dn" is the current value of the dyntick counter that is incremented
 	when entering or leaving dynticks idle state via NMI.  If both
 	the "dt" and "dn" values are even, then this CPU is in dynticks
 	idle mode and may be ignored by RCU.  If either of these two
 	counters is odd, then RCU must be alert to the possibility of
 	an RCU read-side critical section running on this CPU.
 	This field is displayed only for CONFIG_NO_HZ kernels.
 o	"df" is the number of times that some other CPU has forced a
 	quiescent state on behalf of this CPU due to this CPU being in
 	dynticks-idle state.
 	This field is displayed only for CONFIG_NO_HZ kernels.
 o	"of" is the number of times that some other CPU has forced a
 	quiescent state on behalf of this CPU due to this CPU being
 	offline.  In a perfect world, this might never happen, but it
 	turns out that offlining and onlining a CPU can take several grace
 	periods, and so there is likely to be an extended period of time
 	when RCU believes that the CPU is online when it really is not.
 	Please note that erring in the other direction (RCU believing a
 	CPU is offline when it is really alive and kicking) is a fatal
 	error, so it makes sense to err conservatively.
 o	"ri" is the number of times that RCU has seen fit to send a
 	reschedule IPI to this CPU in order to get it to report a
 	quiescent state.
 o	"ql" is the number of RCU callbacks currently residing on
 	this CPU.  This is the total number of callbacks, regardless
 	of what state they are in (new, waiting for grace period to
 	start, waiting for grace period to end, ready to invoke).
 o	"b" is the batch limit for this CPU.  If more than this number
 	of RCU callbacks is ready to invoke, then the remainder will
 	be deferred.
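Two of the rules above lend themselves to a quick check on a captured rcu/rcudata line: a CPU is in dynticks idle mode when both "dt" and "dn" are even, and it believes it owes RCU a quiescent state when "c" and "g" differ. The following userspace sketch assumes the line layout shown in the sample output. All function names are hypothetical and nothing here is kernel API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find "key" in the line and parse the decimal value that follows it. */
static int rcudata_field(const char *line, const char *key, long *val)
{
	const char *p;

	assert(line != NULL && key != NULL && val != NULL);
	p = strstr(line, key);
	if (!p)
		return -1;
	return sscanf(p + strlen(key), "%ld", val) == 1 ? 0 : -1;
}

/*
 * Returns 1 if both "dt" and "dn" are even (dynticks idle), 0 if either
 * is odd, and -1 if the fields are absent (non-CONFIG_NO_HZ output).
 */
static int rcudata_dynticks_idle(const char *line)
{
	long dt, dn;

	if (rcudata_field(line, " dt=", &dt) ||
	    rcudata_field(line, " dn=", &dn))
		return -1;
	return dt % 2 == 0 && dn % 2 == 0;
}

/* Returns 1 if "c" differs from "g", 0 if equal, -1 on parse failure. */
static int rcudata_owes_qs(const char *line)
{
	long c, g;

	/* The first "c=" on the line is the "c" field, ahead of "pqc=". */
	if (rcudata_field(line, "c=", &c) ||
	    rcudata_field(line, " g=", &g))
		return -1;
	return c != g;
}
```

Applied to the "rcu" section above, CPU 2 (dt=16646, dn=2) is reported as dynticks idle, while CPU 0 (dt=23301) is not and, with c=4011 and g=4012, still owes a quiescent state.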
 The output of "cat rcu/rcugp" looks as follows:
 rcu: completed=33062  gpnum=33063
 rcu_bh: completed=464  gpnum=464
 Again, this output is for both "rcu" and "rcu_bh".  The fields are
 taken from the rcu_state structure, and are as follows:
 o	"completed" is the number of grace periods that have completed.
 	It is comparable to the "c" field from rcu/rcudata in that a
 	CPU whose "c" field matches the value of "completed" is aware
 	that the corresponding RCU grace period has completed.
 o	"gpnum" is the number of grace periods that have started.  It is
 	comparable to the "g" field from rcu/rcudata in that a CPU
 	whose "g" field matches the value of "gpnum" is aware that the
 	corresponding RCU grace period has started.
 	If these two fields are equal (as they are for "rcu_bh" above),
 	then there is no grace period in progress, in other words, RCU
 	is idle.  On the other hand, if the two fields differ (as they
 	do for "rcu" above), then an RCU grace period is in progress.
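The idle test described above is a single comparison once a line has been parsed. The sketch below is a hypothetical userspace helper, not code from this commit, and it assumes each line has the "flavor: completed=N  gpnum=M" shape shown in the sample output.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed line of hierarchical rcu/rcugp output (hypothetical type). */
struct rcugp_line {
	char flavor[16];	/* "rcu" or "rcu_bh" */
	long completed;
	long gpnum;
};

/* Parse one line; returns 0 on success, -1 otherwise. */
static int rcugp_parse(const char *line, struct rcugp_line *out)
{
	assert(line != NULL && out != NULL);
	return sscanf(line, " %15[^:]: completed=%ld gpnum=%ld",
		      out->flavor, &out->completed, &out->gpnum) == 3 ? 0 : -1;
}

/* A grace period is in progress exactly when the two counts differ. */
static int rcugp_in_progress(const struct rcugp_line *gp)
{
	assert(gp != NULL);
	return gp->completed != gp->gpnum;
}
```

For the sample above, the "rcu" line parses as in progress and the "rcu_bh" line as idle.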
 The output of "cat rcu/rcuhier" looks as follows, with very long lines:
 c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6
 1/1 0:127 ^0    
 3/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
 3/3f 0:5 ^0    2/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3    
 rcu_bh:
 c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0
 0/1 0:127 ^0    
 0/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
 0/3f 0:5 ^0    0/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3
 This is once again split into "rcu" and "rcu_bh" portions.  The fields are
 as follows:
 o	"c" is exactly the same as "completed" under rcu/rcugp.
 o	"g" is exactly the same as "gpnum" under rcu/rcugp.
 o	"s" is the "signaled" state that drives force_quiescent_state()'s
 	state machine.
 o	"jfq" is the number of jiffies remaining for this grace period
 	before force_quiescent_state() is invoked to help push things
 	along.  Note that CPUs in dyntick-idle mode throughout the grace
 	period will not report on their own, but rather must be checked by
 	some other CPU via force_quiescent_state().
 o	"j" is the low-order four hex digits of the jiffies counter.
 	Yes, Paul did run into a number of problems that turned out to
 	be due to the jiffies counter no longer counting.  Why do you ask?
 o	"nfqs" is the number of calls to force_quiescent_state() since
 	boot.
 o	"nfqsng" is the number of useless calls to force_quiescent_state(),
 	where there wasn't actually a grace period active.  This can
 	happen due to races.  The number in parentheses is the difference
 	between "nfqs" and "nfqsng", or the number of times that
 	force_quiescent_state() actually did some real work.
 o	"fqlh" is the number of calls to force_quiescent_state() that
 	exited immediately (without even being counted in nfqs above)
 	due to contention on ->fqslock.
 o	Each element of the form "1/1 0:127 ^0" represents one struct
 	rcu_node.  Each line represents one level of the hierarchy, from
 	root to leaves.  It is best to think of the rcu_data structures
 	as forming yet another level after the leaves.  Note that there
 	might be either one, two, or three levels of rcu_node structures,
 	depending on the relationship between CONFIG_RCU_FANOUT and
 	CONFIG_NR_CPUS.
 	o	The numbers separated by the "/" are the qsmask followed
 		by the qsmaskinit.  The qsmask will have one bit
 		set for each entity in the next lower level that
 		has not yet checked in for the current grace period.
 		The qsmaskinit will have one bit for each entity that is
 		currently expected to check in during each grace period.
 		The value of qsmaskinit is assigned to qsmask
 		at the beginning of each grace period.
 		For example, for "rcu", the qsmask of the first entry
 		of the lowest level is 0x3, meaning that we are still
 		waiting for CPUs 0 and 1 to check in for the current
 		grace period.
 	o	The numbers separated by the ":" are the range of CPUs
 		served by this struct rcu_node.  This can be helpful
 		in working out how the hierarchy is wired together.
 		For example, the first entry at the lowest level shows
 		"0:5", indicating that it covers CPUs 0 through 5.
 	o	The number after the "^" indicates the bit in the
 		next higher level rcu_node structure that this
 		rcu_node structure corresponds to.
 		For example, the first entry at the lowest level shows
 		"^0", indicating that it corresponds to bit zero in
 		the first entry at the middle level.
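A leaf-level element such as "3/3f 0:5 ^0" can be turned into the list of CPUs that RCU is still waiting on by combining its qsmask with the low end of its CPU range. The following is a hedged userspace sketch: the struct and function names are hypothetical, and it assumes the masks are printed in hexadecimal, as the "3f" in the sample suggests.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* One "qsmask/qsmaskinit lo:hi ^bit" element (hypothetical type). */
struct rcuhier_node {
	unsigned long qsmask;
	unsigned long qsmaskinit;
	int grplo;
	int grphi;
	int parent_bit;
};

/* Parse one element; returns 0 on success, -1 otherwise. */
static int rcuhier_parse(const char *elem, struct rcuhier_node *n)
{
	assert(elem != NULL && n != NULL);
	return sscanf(elem, "%lx/%lx %d:%d ^%d",
		      &n->qsmask, &n->qsmaskinit,
		      &n->grplo, &n->grphi, &n->parent_bit) == 5 ? 0 : -1;
}

/*
 * For a leaf node only: store the CPUs whose qsmask bit is still set
 * and return how many were stored.  Bit i corresponds to CPU grplo + i.
 * In non-leaf nodes the bits denote child nodes, not CPUs.
 */
static int rcuhier_pending_cpus(const struct rcuhier_node *n,
				int *cpus, int max)
{
	const int nbits = (int)(sizeof(unsigned long) * CHAR_BIT);
	int count = 0;
	int cpu;

	assert(n != NULL && cpus != NULL);
	for (cpu = n->grplo; cpu <= n->grphi; cpu++) {
		int bit = cpu - n->grplo;

		if (bit >= nbits)
			break;	/* range wider than the mask */
		if ((n->qsmask & (1UL << bit)) && count < max)
			cpus[count++] = cpu;
	}
	return count;
}
```

With the sample "rcu" output, the second leaf "2/3 6:11 ^1" decodes to a single pending CPU, number 7.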
--- a/arch/powerpc/platforms/pseries/rtasd.c
+++ b/arch/powerpc/platforms/pseries/rtasd.c
@ -208,6 +208,7 @@ void pSeries_log_error(char *buf, unsigned int err_type, int fatal)
 		break;
 	case ERR_TYPE_KERNEL_PANIC:
 	default:
 		WARN_ON_ONCE(!irqs_disabled()); /* @@@ DEBUG @@@ */
 		spin_unlock_irqrestore(&rtasd_log_lock, s);
 		return;
 	}
@ -227,6 +228,7 @@ void pSeries_log_error(char *buf, unsigned int err_type, int fatal)
 	/* Check to see if we need to or have stopped logging */
 	if (fatal || !logging_enabled) {
 		logging_enabled = 0;
 		WARN_ON_ONCE(!irqs_disabled()); /* @@@ DEBUG @@@ */
 		spin_unlock_irqrestore(&rtasd_log_lock, s);
 		return;
 	}
@ -249,11 +251,13 @@ void pSeries_log_error(char *buf, unsigned int err_type, int fatal)
 		else
 			rtas_log_start += 1;
 		WARN_ON_ONCE(!irqs_disabled()); /* @@@ DEBUG @@@ */
 		spin_unlock_irqrestore(&rtasd_log_lock, s);
 		wake_up_interruptible(&rtas_log_wait);
 		break;
 	case ERR_TYPE_KERNEL_PANIC:
 	default:
 		WARN_ON_ONCE(!irqs_disabled()); /* @@@ DEBUG @@@ */
 		spin_unlock_irqrestore(&rtasd_log_lock, s);
 		return;
 	}
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@ -65,7 +65,7 @@ static inline struct dma_mapping_ops *get_dma_ops(struct device *dev)
 		return dma_ops;
 	else
 		return dev->archdata.dma_ops;
-#endif /* _ASM_X86_DMA_MAPPING_H */
+#endif
 }
 /* Make sure we keep the same behaviour */
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@ -7,8 +7,6 @@ extern struct dma_mapping_ops nommu_dma_ops;
 extern int force_iommu, no_iommu;
 extern int iommu_detected;
 extern unsigned long iommu_nr_pages(unsigned long addr, unsigned long len);
 /* 10 seconds */
 #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@ -82,6 +82,8 @@ static inline void pci_dma_burst_advice(struct pci_dev *pdev,
 static inline void early_quirks(void) { }
 #endif
 extern void pci_iommu_alloc(void);
 #endif  /* __KERNEL__ */
 #ifdef CONFIG_X86_32
--- a/arch/x86/include/asm/pci_64.h
+++ b/arch/x86/include/asm/pci_64.h
@ -23,7 +23,6 @@ extern int (*pci_config_write)(int seg, int bus, int dev, int fn,
 			       int reg, int len, u32 value);
 extern void dma32_reserve_bootmem(void);
 extern void pci_iommu_alloc(void);
 /* The PCI address space does equal the physical memory
 * address space.  The networking and block device layers use
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@ -105,6 +105,8 @@ microcode-$(CONFIG_MICROCODE_INTEL)	+= microcode_intel.o
 microcode-$(CONFIG_MICROCODE_AMD)	+= microcode_amd.o
 obj-$(CONFIG_MICROCODE)			+= microcode.o
 obj-$(CONFIG_SWIOTLB)			+= pci-swiotlb_64.o # NB rename without _64
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
@ -118,7 +120,6 @@ ifeq ($(CONFIG_X86_64),y)
        obj-$(CONFIG_GART_IOMMU)	+= pci-gart_64.o aperture_64.o
        obj-$(CONFIG_CALGARY_IOMMU)	+= pci-calgary_64.o tce_64.o
        obj-$(CONFIG_AMD_IOMMU)		+= amd_iommu_init.o amd_iommu.o
        obj-$(CONFIG_SWIOTLB)		+= pci-swiotlb_64.o
        obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
 endif
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@ -105,11 +105,15 @@ static void __init dma32_free_bootmem(void)
 	dma32_bootmem_ptr = NULL;
 	dma32_bootmem_size = 0;
 }
 #endif
 void __init pci_iommu_alloc(void)
 {
 #ifdef CONFIG_X86_64
 	/* free the range so iommu could get some range less than 4G */
 	dma32_free_bootmem();
 #endif
 	/*
 	 * The order of these functions is important for
 	 * fall-back/fail-over reasons
@ -125,15 +129,6 @@ void __init pci_iommu_alloc(void)
 	pci_swiotlb_init();
 }
 unsigned long iommu_nr_pages(unsigned long addr, unsigned long len)
 {
 	unsigned long size = roundup((addr & ~PAGE_MASK) + len, PAGE_SIZE);
 	return size >> PAGE_SHIFT;
 }
 EXPORT_SYMBOL(iommu_nr_pages);
 #endif
 void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 				 dma_addr_t *dma_addr, gfp_t flag)
 {
--- a/arch/x86/kernel/pci-swiotlb_64.c
+++ b/arch/x86/kernel/pci-swiotlb_64.c
@ -3,6 +3,8 @@
 #include <linux/pci.h>
 #include <linux/cache.h>
 #include <linux/module.h>
 #include <linux/swiotlb.h>
 #include <linux/bootmem.h>
 #include <linux/dma-mapping.h>
 #include <asm/iommu.h>
@ -11,6 +13,31 @@
 int swiotlb __read_mostly;
 void *swiotlb_alloc_boot(size_t size, unsigned long nslabs)
 {
 	return alloc_bootmem_low_pages(size);
 }
 void *swiotlb_alloc(unsigned order, unsigned long nslabs)
 {
 	return (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN, order);
 }
 dma_addr_t swiotlb_phys_to_bus(phys_addr_t paddr)
 {
 	return paddr;
 }
 phys_addr_t swiotlb_bus_to_phys(dma_addr_t baddr)
 {
 	return baddr;
 }
 int __weak swiotlb_arch_range_needs_mapping(void *ptr, size_t size)
 {
 	return 0;
 }
 static dma_addr_t
 swiotlb_map_single_phys(struct device *hwdev, phys_addr_t paddr, size_t size,
 			int direction)
@ -50,8 +77,10 @@ struct dma_mapping_ops swiotlb_dma_ops = {
 void __init pci_swiotlb_init(void)
 {
 	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
 #ifdef CONFIG_X86_64
 	if (!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN)
 	       swiotlb = 1;
 #endif
 	if (swiotlb_force)
 		swiotlb = 1;
 	if (swiotlb) {
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@ -21,6 +21,7 @@
 #include <linux/init.h>
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
 #include <linux/pci.h>
 #include <linux/pfn.h>
 #include <linux/poison.h>
 #include <linux/bootmem.h>
@ -971,6 +972,8 @@ void __init mem_init(void)
 	start_periodic_check_for_corruption();
 	pci_iommu_alloc();
 #ifdef CONFIG_FLATMEM
 	BUG_ON(!mem_map);
 #endif
--- a/include/linux/bottom_half.h
+++ b/include/linux/bottom_half.h
@ -2,7 +2,6 @@
 #define _LINUX_BH_H
 extern void local_bh_disable(void);
 extern void __local_bh_enable(void);
 extern void _local_bh_enable(void);
 extern void local_bh_enable(void);
 extern void local_bh_enable_ip(unsigned long ip);
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@ -118,13 +118,17 @@ static inline void account_system_vtime(struct task_struct *tsk)
 }
 #endif
-#if defined(CONFIG_PREEMPT_RCU) && defined(CONFIG_NO_HZ)
+#if defined(CONFIG_NO_HZ) && !defined(CONFIG_CLASSIC_RCU)
 extern void rcu_irq_enter(void);
 extern void rcu_irq_exit(void);
 extern void rcu_nmi_enter(void);
 extern void rcu_nmi_exit(void);
 #else
 # define rcu_irq_enter() do { } while (0)
 # define rcu_irq_exit() do { } while (0)
-#endif /* CONFIG_PREEMPT_RCU */
+# define rcu_nmi_enter() do { } while (0)
 # define rcu_nmi_exit() do { } while (0)
 #endif /* #if defined(CONFIG_NO_HZ) && !defined(CONFIG_CLASSIC_RCU) */
 /*
 * It is safe to do non-atomic ops on ->hardirq_context,
@ -134,7 +138,6 @@ extern void rcu_irq_exit(void);
 */
 #define __irq_enter()					\
 	do {						\
 		rcu_irq_enter();			\
 		account_system_vtime(current);		\
 		add_preempt_count(HARDIRQ_OFFSET);	\
 		trace_hardirq_enter();			\
@ -153,7 +156,6 @@ extern void irq_enter(void);
 		trace_hardirq_exit();			\
 		account_system_vtime(current);		\
 		sub_preempt_count(HARDIRQ_OFFSET);	\
 		rcu_irq_exit();				\
 	} while (0)
 /*
@ -161,7 +163,7 @@ extern void irq_enter(void);
 */
 extern void irq_exit(void);
-#define nmi_enter()		do { lockdep_off(); __irq_enter(); } while (0)
+#define nmi_enter()		do { lockdep_off(); rcu_nmi_enter(); __irq_enter(); } while (0)
-#define nmi_exit()		do { __irq_exit(); lockdep_on(); } while (0)
+#define nmi_exit()		do { __irq_exit(); rcu_nmi_exit(); lockdep_on(); } while (0)
 #endif /* LINUX_HARDIRQ_H */
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@ -314,8 +314,15 @@ extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 extern void lock_release(struct lockdep_map *lock, int nested,
 			 unsigned long ip);
-extern void lock_set_subclass(struct lockdep_map *lock, unsigned int subclass,
+extern void lock_set_class(struct lockdep_map *lock, const char *name,
-			      unsigned long ip);
+			   struct lock_class_key *key, unsigned int subclass,
 			   unsigned long ip);
 static inline void lock_set_subclass(struct lockdep_map *lock,
 		unsigned int subclass, unsigned long ip)
 {
 	lock_set_class(lock, lock->name, lock->key, subclass, ip);
 }
 # define INIT_LOCKDEP				.lockdep_recursion = 0,
@ -333,6 +340,7 @@ static inline void lockdep_on(void)
 # define lock_acquire(l, s, t, r, c, n, i)	do { } while (0)
 # define lock_release(l, n, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
 # define lockdep_init()				do { } while (0)
 # define lockdep_info()				do { } while (0)
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@ -52,11 +52,15 @@ struct rcu_head {
 	void (*func)(struct rcu_head *head);
 };
-#ifdef CONFIG_CLASSIC_RCU
+#if defined(CONFIG_CLASSIC_RCU)
 #include <linux/rcuclassic.h>
-#else /* #ifdef CONFIG_CLASSIC_RCU */
+#elif defined(CONFIG_TREE_RCU)
 #include <linux/rcutree.h>
 #elif defined(CONFIG_PREEMPT_RCU)
 #include <linux/rcupreempt.h>
-#endif /* #else #ifdef CONFIG_CLASSIC_RCU */
+#else
 #error "Unknown RCU implementation specified to kernel configuration"
 #endif /* #else #if defined(CONFIG_CLASSIC_RCU) */
 #define RCU_HEAD_INIT 	{ .next = NULL, .func = NULL }
 #define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@ -0,0 +1,329 @@
 /*
 * Read-Copy Update mechanism for mutual exclusion (tree-based version)
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright IBM Corporation, 2008
 *
 * Author: Dipankar Sarma <dipankar@in.ibm.com>
 *	   Paul E. McKenney <paulmck@linux.vnet.ibm.com> Hierarchical algorithm
 *
 * Based on the original work by Paul McKenney <paulmck@us.ibm.com>
 * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
 *
 * For detailed explanation of Read-Copy Update mechanism see -
 * 	Documentation/RCU
 */
 #ifndef __LINUX_RCUTREE_H
 #define __LINUX_RCUTREE_H
 #include <linux/cache.h>
 #include <linux/spinlock.h>
 #include <linux/threads.h>
 #include <linux/percpu.h>
 #include <linux/cpumask.h>
 #include <linux/seqlock.h>
 /*
 * Define shape of hierarchy based on NR_CPUS and CONFIG_RCU_FANOUT.
 * In theory, it should be possible to add more levels straightforwardly.
 * In practice, this has not been tested, so there is probably some
 * bug somewhere.
 */
 #define MAX_RCU_LVLS 3
 #define RCU_FANOUT	      (CONFIG_RCU_FANOUT)
 #define RCU_FANOUT_SQ	      (RCU_FANOUT * RCU_FANOUT)
 #define RCU_FANOUT_CUBE	      (RCU_FANOUT_SQ * RCU_FANOUT)
 #if NR_CPUS <= RCU_FANOUT
 #  define NUM_RCU_LVLS	      1
 #  define NUM_RCU_LVL_0	      1
 #  define NUM_RCU_LVL_1	      (NR_CPUS)
 #  define NUM_RCU_LVL_2	      0
 #  define NUM_RCU_LVL_3	      0
 #elif NR_CPUS <= RCU_FANOUT_SQ
 #  define NUM_RCU_LVLS	      2
 #  define NUM_RCU_LVL_0	      1
 #  define NUM_RCU_LVL_1	      (((NR_CPUS) + RCU_FANOUT - 1) / RCU_FANOUT)
 #  define NUM_RCU_LVL_2	      (NR_CPUS)
 #  define NUM_RCU_LVL_3	      0
 #elif NR_CPUS <= RCU_FANOUT_CUBE
 #  define NUM_RCU_LVLS	      3
 #  define NUM_RCU_LVL_0	      1
 #  define NUM_RCU_LVL_1	      (((NR_CPUS) + RCU_FANOUT_SQ - 1) / RCU_FANOUT_SQ)
 #  define NUM_RCU_LVL_2	      (((NR_CPUS) + (RCU_FANOUT) - 1) / (RCU_FANOUT))
 #  define NUM_RCU_LVL_3	      NR_CPUS
 #else
 # error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
 #endif /* #if (NR_CPUS) <= RCU_FANOUT */
 #define RCU_SUM (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3)
 #define NUM_RCU_NODES (RCU_SUM - NR_CPUS)
 /*
 * Dynticks per-CPU state.
 */
 struct rcu_dynticks {
 	int dynticks_nesting;	/* Track nesting level, sort of. */
 	int dynticks;		/* Even value for dynticks-idle, else odd. */
 	int dynticks_nmi;	/* Even value for either dynticks-idle or */
 				/*  not in nmi handler, else odd.  So this */
 				/*  remains even for nmi from irq handler. */
 };
 /*
 * Definition for node within the RCU grace-period-detection hierarchy.
 */
 struct rcu_node {
 	spinlock_t lock;
 	unsigned long qsmask;	/* CPUs or groups that need to switch in */
 				/*  order for current grace period to proceed.*/
 	unsigned long qsmaskinit;
 				/* Per-GP initialization for qsmask. */
 	unsigned long grpmask;	/* Mask to apply to parent qsmask. */
 	int	grplo;		/* lowest-numbered CPU or group here. */
 	int	grphi;		/* highest-numbered CPU or group here. */
 	u8	grpnum;		/* CPU/group number for next level up. */
 	u8	level;		/* root is at level 0. */
 	struct rcu_node *parent;
 } ____cacheline_internodealigned_in_smp;
 /* Index values for nxttail array in struct rcu_data. */
 #define RCU_DONE_TAIL		0	/* Also RCU_WAIT head. */
 #define RCU_WAIT_TAIL		1	/* Also RCU_NEXT_READY head. */
 #define RCU_NEXT_READY_TAIL	2	/* Also RCU_NEXT head. */
 #define RCU_NEXT_TAIL		3
 #define RCU_NEXT_SIZE		4
 /* Per-CPU data for read-copy update. */
 struct rcu_data {
 	/* 1) quiescent-state and grace-period handling : */
 	long		completed;	/* Track rsp->completed gp number */
 					/*  in order to detect GP end. */
 	long		gpnum;		/* Highest gp number that this CPU */
 					/*  is aware of having started. */
 	long		passed_quiesc_completed;
 					/* Value of completed at time of qs. */
 	bool		passed_quiesc;	/* User-mode/idle loop etc. */
 	bool		qs_pending;	/* Core waits for quiesc state. */
 	bool		beenonline;	/* CPU online at least once. */
 	struct rcu_node *mynode;	/* This CPU's leaf of hierarchy */
 	unsigned long grpmask;		/* Mask to apply to leaf qsmask. */
 	/* 2) batch handling */
 	/*
 	 * If nxtlist is not NULL, it is partitioned as follows.
 	 * Any of the partitions might be empty, in which case the
 	 * pointer to that partition will be equal to the pointer for
 	 * the following partition.  When the list is empty, all of
 	 * the nxttail elements point to nxtlist, which is NULL.
 	 *
 	 * [*nxttail[RCU_NEXT_READY_TAIL], NULL = *nxttail[RCU_NEXT_TAIL]):
 	 *	Entries that might have arrived after current GP ended
 	 * [*nxttail[RCU_WAIT_TAIL], *nxttail[RCU_NEXT_READY_TAIL]):
 	 *	Entries known to have arrived before current GP ended
 	 * [*nxttail[RCU_DONE_TAIL], *nxttail[RCU_WAIT_TAIL]):
 	 *	Entries that batch # <= ->completed - 1: waiting for current GP
 	 * [nxtlist, *nxttail[RCU_DONE_TAIL]):
 	 *	Entries that batch # <= ->completed
 	 *	The grace period for these entries has completed, and
 	 *	the other grace-period-completed entries may be moved
 	 *	here temporarily in rcu_process_callbacks().
 	 */
 	struct rcu_head *nxtlist;
 	struct rcu_head **nxttail[RCU_NEXT_SIZE];
 	long		qlen; 	 	/* # of queued callbacks */
 	long		blimit;		/* Upper limit on a processed batch */
 #ifdef CONFIG_NO_HZ
 	/* 3) dynticks interface. */
 	struct rcu_dynticks *dynticks;	/* Shared per-CPU dynticks state. */
 	int dynticks_snap;		/* Per-GP tracking for dynticks. */
 	int dynticks_nmi_snap;		/* Per-GP tracking for dynticks_nmi. */
 #endif /* #ifdef CONFIG_NO_HZ */
 	/* 4) reasons this CPU needed to be kicked by force_quiescent_state */
 #ifdef CONFIG_NO_HZ
 	unsigned long dynticks_fqs;	/* Kicked due to dynticks idle. */
 #endif /* #ifdef CONFIG_NO_HZ */
 	unsigned long offline_fqs;	/* Kicked due to being offline. */
 	unsigned long resched_ipi;	/* Sent a resched IPI. */
 	/* 5) state to allow this CPU to force_quiescent_state on others */
 	long n_rcu_pending;		/* rcu_pending() calls since boot. */
 	long n_rcu_pending_force_qs;	/* when to force quiescent states. */
 	int cpu;
 };
 /* Values for signaled field in struct rcu_state. */
 #define RCU_GP_INIT		0	/* Grace period being initialized. */
 #define RCU_SAVE_DYNTICK	1	/* Need to scan dyntick state. */
 #define RCU_FORCE_QS		2	/* Need to force quiescent state. */
 #ifdef CONFIG_NO_HZ
 #define RCU_SIGNAL_INIT		RCU_SAVE_DYNTICK
 #else /* #ifdef CONFIG_NO_HZ */
 #define RCU_SIGNAL_INIT		RCU_FORCE_QS
 #endif /* #else #ifdef CONFIG_NO_HZ */
 #define RCU_JIFFIES_TILL_FORCE_QS	 3	/* for rsp->jiffies_force_qs */
 #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
 #define RCU_SECONDS_TILL_STALL_CHECK   (10 * HZ)  /* for rsp->jiffies_stall */
 #define RCU_SECONDS_TILL_STALL_RECHECK (30 * HZ)  /* for rsp->jiffies_stall */
 #define RCU_STALL_RAT_DELAY		2	  /* Allow other CPUs time */
 						  /*  to take at least one */
 						  /*  scheduling clock irq */
 						  /*  before ratting on them. */
 #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
 /*
 * RCU global state, including node hierarchy.  This hierarchy is
 * represented in "heap" form in a dense array.  The root (first level)
 * of the hierarchy is in ->node[0] (referenced by ->level[0]), the second
 * level in ->node[1] through ->node[m] (->node[1] referenced by ->level[1]),
 * and the third level in ->node[m+1] and following (->node[m+1] referenced
 * by ->level[2]).  The number of levels is determined by the number of
 * CPUs and by CONFIG_RCU_FANOUT.  Small systems will have a "hierarchy"
 * consisting of a single rcu_node.
 */
 struct rcu_state {
 	struct rcu_node node[NUM_RCU_NODES];	/* Hierarchy. */
 	struct rcu_node *level[NUM_RCU_LVLS];	/* Hierarchy levels. */
 	u32 levelcnt[MAX_RCU_LVLS + 1];		/* # nodes in each level. */
 	u8 levelspread[NUM_RCU_LVLS];		/* kids/node in each level. */
 	struct rcu_data *rda[NR_CPUS];		/* array of rdp pointers. */
 	/* The following fields are guarded by the root rcu_node's lock. */
 	u8	signaled ____cacheline_internodealigned_in_smp;
 						/* Force QS state. */
 	long	gpnum;				/* Current gp number. */
 	long	completed;			/* # of last completed gp. */
 	spinlock_t onofflock;			/* exclude on/offline and */
 						/*  starting new GP. */
 	spinlock_t fqslock;			/* Only one task forcing */
 						/*  quiescent states. */
 	unsigned long jiffies_force_qs;		/* Time at which to invoke */
 						/*  force_quiescent_state(). */
 	unsigned long n_force_qs;		/* Number of calls to */
 						/*  force_quiescent_state(). */
 	unsigned long n_force_qs_lh;		/* ~Number of calls leaving */
 						/*  due to lock unavailable. */
 	unsigned long n_force_qs_ngp;		/* Number of calls leaving */
 						/*  due to no GP active. */
 #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
 	unsigned long gp_start;			/* Time at which GP started, */
 						/*  but in jiffies. */
 	unsigned long jiffies_stall;		/* Time at which to check */
 						/*  for CPU stalls. */
 #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
 #ifdef CONFIG_NO_HZ
 	long dynticks_completed;		/* Value of completed @ snap. */
 #endif /* #ifdef CONFIG_NO_HZ */
 };
 extern struct rcu_state rcu_state;
 DECLARE_PER_CPU(struct rcu_data, rcu_data);
 extern struct rcu_state rcu_bh_state;
 DECLARE_PER_CPU(struct rcu_data, rcu_bh_data);
 /*
 * Increment the quiescent state counter.
 * The counter is a bit degenerated: We do not need to know
 * how many quiescent states passed, just if there was at least
 * one since the start of the grace period. Thus just a flag.
 */
 static inline void rcu_qsctr_inc(int cpu)
 {
 	struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 	rdp->passed_quiesc = 1;
 	rdp->passed_quiesc_completed = rdp->completed;
 }
 static inline void rcu_bh_qsctr_inc(int cpu)
 {
 	struct rcu_data *rdp = &per_cpu(rcu_bh_data, cpu);
 	rdp->passed_quiesc = 1;
 	rdp->passed_quiesc_completed = rdp->completed;
 }
 extern int rcu_pending(int cpu);
 extern int rcu_needs_cpu(int cpu);
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 extern struct lockdep_map rcu_lock_map;
 # define rcu_read_acquire()	\
 			lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
 # define rcu_read_release()	lock_release(&rcu_lock_map, 1, _THIS_IP_)
 #else
 # define rcu_read_acquire()	do { } while (0)
 # define rcu_read_release()	do { } while (0)
 #endif
 static inline void __rcu_read_lock(void)
 {
 	preempt_disable();
 	__acquire(RCU);
 	rcu_read_acquire();
 }
 static inline void __rcu_read_unlock(void)
 {
 	rcu_read_release();
 	__release(RCU);
 	preempt_enable();
 }
 static inline void __rcu_read_lock_bh(void)
 {
 	local_bh_disable();
 	__acquire(RCU_BH);
 	rcu_read_acquire();
 }
 static inline void __rcu_read_unlock_bh(void)
 {
 	rcu_read_release();
 	__release(RCU_BH);
 	local_bh_enable();
 }
 #define __synchronize_sched() synchronize_rcu()
 #define call_rcu_sched(head, func) call_rcu(head, func)
 static inline void rcu_init_sched(void)
 {
 }
 extern void __rcu_init(void);
 extern void rcu_check_callbacks(int cpu, int user);
 extern void rcu_restart_cpu(int cpu);
 extern long rcu_batches_completed(void);
 extern long rcu_batches_completed_bh(void);
 #ifdef CONFIG_NO_HZ
 void rcu_enter_nohz(void);
 void rcu_exit_nohz(void);
 #else /* CONFIG_NO_HZ */
 static inline void rcu_enter_nohz(void)
 {
 }
 static inline void rcu_exit_nohz(void)
 {
 }
 #endif /* CONFIG_NO_HZ */
 #endif /* __LINUX_RCUTREE_H */
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@ -7,9 +7,31 @@ struct device;
 struct dma_attrs;
 struct scatterlist;
 /*
 * Maximum allowable number of contiguous slabs to map,
 * must be a power of 2.  What is the appropriate value ?
 * The complexity of {map,unmap}_single is linearly dependent on this value.
 */
 #define IO_TLB_SEGSIZE	128
 /*
 * log of the size of each IO TLB slab.  The number of slabs is command line
 * controllable.
 */
 #define IO_TLB_SHIFT 11
 extern void
 swiotlb_init(void);
 extern void *swiotlb_alloc_boot(size_t bytes, unsigned long nslabs);
 extern void *swiotlb_alloc(unsigned order, unsigned long nslabs);
 extern dma_addr_t swiotlb_phys_to_bus(phys_addr_t address);
 extern phys_addr_t swiotlb_bus_to_phys(dma_addr_t address);
 extern int swiotlb_arch_range_needs_mapping(void *ptr, size_t size);
 extern void
 *swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 			dma_addr_t *dma_handle, gfp_t flags);
--- a/init/Kconfig
+++ b/init/Kconfig
@ -928,10 +928,90 @@ source "block/Kconfig"
 config PREEMPT_NOTIFIERS
 	bool
 choice
 	prompt "RCU Implementation"
 	default CLASSIC_RCU
 config CLASSIC_RCU
-	def_bool !PREEMPT_RCU
+	bool "Classic RCU"
 	help
 	  This option selects the classic RCU implementation that is
 	  designed for best read-side performance on non-realtime
-	  systems.  Classic RCU is the default.  Note that the
+	  systems.
-	  PREEMPT_RCU symbol is used to select/deselect this option.
+
 	  Select this option if you are unsure.
 config TREE_RCU
 	bool "Tree-based hierarchical RCU"
 	help
 	  This option selects the RCU implementation that is
 	  designed for very large SMP systems with hundreds or
 	  thousands of CPUs.
 config PREEMPT_RCU
 	bool "Preemptible RCU"
 	depends on PREEMPT
 	help
 	  This option reduces the latency of the kernel by making certain
 	  RCU sections preemptible. Normally RCU code is non-preemptible, if
 	  this option is selected then read-only RCU sections become
 	  preemptible. This helps latency, but may expose bugs due to
 	  now-naive assumptions about each RCU read-side critical section
 	  remaining on a given CPU through its execution.
 endchoice
 config RCU_TRACE
 	bool "Enable tracing for RCU"
 	depends on TREE_RCU || PREEMPT_RCU
 	help
 	  This option provides tracing in RCU which presents stats
 	  in debugfs for debugging RCU implementation.
 	  Say Y here if you want to enable RCU tracing
 	  Say N if you are unsure.
 config RCU_FANOUT
 	int "Tree-based hierarchical RCU fanout value"
 	range 2 64 if 64BIT
 	range 2 32 if !64BIT
 	depends on TREE_RCU
 	default 64 if 64BIT
 	default 32 if !64BIT
 	help
 	  This option controls the fanout of hierarchical implementations
 	  of RCU, allowing RCU to work efficiently on machines with
 	  large numbers of CPUs.  This value must be at least the cube
 	  root of NR_CPUS, which allows NR_CPUS up to 32,768 for 32-bit
 	  systems and up to 262,144 for 64-bit systems.
 	  Select a specific number if testing RCU itself.
 	  Take the default if unsure.
 config RCU_FANOUT_EXACT
 	bool "Disable tree-based hierarchical RCU auto-balancing"
 	depends on TREE_RCU
 	default n
 	help
 	  This option forces use of the exact RCU_FANOUT value specified,
 	  regardless of imbalances in the hierarchy.  This is useful for
 	  testing RCU itself, and might one day be useful on systems with
 	  strong NUMA behavior.
 	  Without RCU_FANOUT_EXACT, the code will balance the hierarchy.
 	  Say N if unsure.
 config TREE_RCU_TRACE
 	def_bool RCU_TRACE && TREE_RCU
 	select DEBUG_FS
 	help
 	  This option provides tracing for the TREE_RCU implementation,
 	  permitting Makefile to trivially select kernel/rcutree_trace.c.
 config PREEMPT_RCU_TRACE
 	def_bool RCU_TRACE && PREEMPT_RCU
 	select DEBUG_FS
 	help
 	  This option provides tracing for the PREEMPT_RCU implementation,
 	  permitting Makefile to trivially select kernel/rcupreempt_trace.c.
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@ -52,28 +52,3 @@ config PREEMPT
 endchoice
 config PREEMPT_RCU
 	bool "Preemptible RCU"
 	depends on PREEMPT
 	default n
 	help
 	  This option reduces the latency of the kernel by making certain
 	  RCU sections preemptible. Normally RCU code is non-preemptible, if
 	  this option is selected then read-only RCU sections become
 	  preemptible. This helps latency, but may expose bugs due to
 	  now-naive assumptions about each RCU read-side critical section
 	  remaining on a given CPU through its execution.
 	  Say N if you are unsure.
 config RCU_TRACE
 	bool "Enable tracing for RCU - currently stats in debugfs"
 	depends on PREEMPT_RCU
 	select DEBUG_FS
 	default y
 	help
 	  This option provides tracing in RCU which presents stats
 	  in debugfs for debugging RCU implementation.
 	  Say Y here if you want to enable RCU tracing
 	  Say N if you are unsure.
--- a/kernel/Makefile
+++ b/kernel/Makefile
@ -74,10 +74,10 @@ obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
 obj-$(CONFIG_CLASSIC_RCU) += rcuclassic.o
 obj-$(CONFIG_TREE_RCU) += rcutree.o
 obj-$(CONFIG_PREEMPT_RCU) += rcupreempt.o
-ifeq ($(CONFIG_PREEMPT_RCU),y)
+obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
-obj-$(CONFIG_RCU_TRACE) += rcupreempt_trace.o
+obj-$(CONFIG_PREEMPT_RCU_TRACE) += rcupreempt_trace.o
-endif
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@ -673,6 +673,18 @@ int request_irq(unsigned int irq, irq_handler_t handler,
 	struct irq_desc *desc;
 	int retval;
 	/*
 	 * handle_IRQ_event() always ignores IRQF_DISABLED except for
 	 * the _first_ irqaction (sigh).  That can cause oopsing, but
 	 * the behavior is classified as "will not fix" so we need to
 	 * start nudging drivers away from using that idiom.
 	 */
 	if ((irqflags & (IRQF_SHARED|IRQF_DISABLED))
 			== (IRQF_SHARED|IRQF_DISABLED))
 		pr_warning("IRQ %d/%s: IRQF_DISABLED is not "
 				"guaranteed on shared IRQs\n",
 				irq, devname);
 #ifdef CONFIG_LOCKDEP
 	/*
 	 * Lockdep wants atomic interrupt handlers:
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@ -291,14 +291,12 @@ void lockdep_off(void)
 {
 	current->lockdep_recursion++;
 }
 EXPORT_SYMBOL(lockdep_off);
 void lockdep_on(void)
 {
 	current->lockdep_recursion--;
 }
 EXPORT_SYMBOL(lockdep_on);
 /*
@ -580,7 +578,8 @@ static void print_lock_class_header(struct lock_class *class, int depth)
 /*
 * printk all lock dependencies starting at <entry>:
 */
-static void print_lock_dependencies(struct lock_class *class, int depth)
+static void __used
+print_lock_dependencies(struct lock_class *class, int depth)
 {
 	struct lock_list *entry;
@ -2512,7 +2511,6 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
 	if (subclass)
 		register_lock_class(lock, subclass, 1);
 }
 EXPORT_SYMBOL_GPL(lockdep_init_map);
 /*
@ -2693,8 +2691,9 @@ static int check_unlock(struct task_struct *curr, struct lockdep_map *lock,
 }
 static int
-__lock_set_subclass(struct lockdep_map *lock,
+__lock_set_class(struct lockdep_map *lock, const char *name,
-		    unsigned int subclass, unsigned long ip)
+		 struct lock_class_key *key, unsigned int subclass,
+		 unsigned long ip)
 {
 	struct task_struct *curr = current;
 	struct held_lock *hlock, *prev_hlock;
@ -2721,6 +2720,7 @@ __lock_set_subclass(struct lockdep_map *lock,
 	return print_unlock_inbalance_bug(curr, lock, ip);
 found_it:
 	lockdep_init_map(lock, name, key, 0);
 	class = register_lock_class(lock, subclass, 0);
 	hlock->class_idx = class - lock_classes + 1;
@ -2905,9 +2905,9 @@ static void check_flags(unsigned long flags)
 #endif
 }
-void
+void lock_set_class(struct lockdep_map *lock, const char *name,
-lock_set_subclass(struct lockdep_map *lock,
+		    struct lock_class_key *key, unsigned int subclass,
-		  unsigned int subclass, unsigned long ip)
+		    unsigned long ip)
 {
 	unsigned long flags;
@ -2917,13 +2917,12 @@ lock_set_subclass(struct lockdep_map *lock,
 	raw_local_irq_save(flags);
 	current->lockdep_recursion = 1;
 	check_flags(flags);
-	if (__lock_set_subclass(lock, subclass, ip))
+	if (__lock_set_class(lock, name, key, subclass, ip))
 		check_chain_key(current);
 	current->lockdep_recursion = 0;
 	raw_local_irq_restore(flags);
 }
-
+EXPORT_SYMBOL_GPL(lock_set_class);
-EXPORT_SYMBOL_GPL(lock_set_subclass);
 /*
 * We are not always called with irqs disabled - do that here,
@ -2947,7 +2946,6 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	current->lockdep_recursion = 0;
 	raw_local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(lock_acquire);
 void lock_release(struct lockdep_map *lock, int nested,
@ -2965,7 +2963,6 @@ void lock_release(struct lockdep_map *lock, int nested,
 	current->lockdep_recursion = 0;
 	raw_local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(lock_release);
 #ifdef CONFIG_LOCK_STAT
@ -3450,7 +3447,6 @@ retry:
 	if (unlock)
 		read_unlock(&tasklist_lock);
 }
 EXPORT_SYMBOL_GPL(debug_show_all_locks);
 /*
@ -3471,7 +3467,6 @@ void debug_show_held_locks(struct task_struct *task)
 {
 		__debug_show_held_locks(task);
 }
 EXPORT_SYMBOL_GPL(debug_show_held_locks);
 void lockdep_sys_exit(void)
--- a/kernel/printk.c
+++ b/kernel/printk.c
@ -662,7 +662,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	if (recursion_bug) {
 		recursion_bug = 0;
 		strcpy(printk_buf, recursion_bug_msg);
-		printed_len = sizeof(recursion_bug_msg);
+		printed_len = strlen(recursion_bug_msg);
 	}
 	/* Emit the output into the temporary buffer */
 	printed_len += vscnprintf(printk_buf + printed_len,
--- a/kernel/rcupreempt.c
+++ b/kernel/rcupreempt.c
@ -551,6 +551,16 @@ void rcu_irq_exit(void)
 	}
 }
 void rcu_nmi_enter(void)
 {
 	rcu_irq_enter();
 }
 void rcu_nmi_exit(void)
 {
 	rcu_irq_exit();
 }
 static void dyntick_save_progress_counter(int cpu)
 {
 	struct rcu_dyntick_sched *rdssp = &per_cpu(rcu_dyntick_sched, cpu);
--- a/kernel/rcupreempt_trace.c
+++ b/kernel/rcupreempt_trace.c
@ -149,12 +149,12 @@ static void rcupreempt_trace_sum(struct rcupreempt_trace *sp)
 		sp->done_length += cp->done_length;
 		sp->done_add += cp->done_add;
 		sp->done_remove += cp->done_remove;
-		atomic_set(&sp->done_invoked, atomic_read(&cp->done_invoked));
+		atomic_add(atomic_read(&cp->done_invoked), &sp->done_invoked);
 		sp->rcu_check_callbacks += cp->rcu_check_callbacks;
-		atomic_set(&sp->rcu_try_flip_1,
+		atomic_add(atomic_read(&cp->rcu_try_flip_1),
-			   atomic_read(&cp->rcu_try_flip_1));
+			   &sp->rcu_try_flip_1);
-		atomic_set(&sp->rcu_try_flip_e1,
+		atomic_add(atomic_read(&cp->rcu_try_flip_e1),
-			   atomic_read(&cp->rcu_try_flip_e1));
+			   &sp->rcu_try_flip_e1);
 		sp->rcu_try_flip_i1 += cp->rcu_try_flip_i1;
 		sp->rcu_try_flip_ie1 += cp->rcu_try_flip_ie1;
 		sp->rcu_try_flip_g1 += cp->rcu_try_flip_g1;
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@ -39,6 +39,7 @@
 #include <linux/moduleparam.h>
 #include <linux/percpu.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
 #include <linux/freezer.h>
 #include <linux/cpu.h>
 #include <linux/delay.h>
@ -108,7 +109,6 @@ struct rcu_torture {
 	int rtort_mbtest;
 };
-static int fullstop = 0;	/* stop generating callbacks at test end. */
 static LIST_HEAD(rcu_torture_freelist);
 static struct rcu_torture *rcu_torture_current = NULL;
 static long rcu_torture_current_version = 0;
@ -136,6 +136,30 @@ static int stutter_pause_test = 0;
 #endif
 int rcutorture_runnable = RCUTORTURE_RUNNABLE_INIT;
 #define FULLSTOP_SIGNALED 1	/* Bail due to signal. */
 #define FULLSTOP_CLEANUP  2	/* Orderly shutdown. */
 static int fullstop;		/* stop generating callbacks at test end. */
 DEFINE_MUTEX(fullstop_mutex);	/* protect fullstop transitions and */
 				/*  spawning of kthreads. */
 /*
 * Detect and respond to a signal-based shutdown.
 */
 static int
 rcutorture_shutdown_notify(struct notifier_block *unused1,
 			   unsigned long unused2, void *unused3)
 {
 	if (fullstop)
 		return NOTIFY_DONE;
 	if (signal_pending(current)) {
 		mutex_lock(&fullstop_mutex);
 		if (!ACCESS_ONCE(fullstop))
 			fullstop = FULLSTOP_SIGNALED;
 		mutex_unlock(&fullstop_mutex);
 	}
 	return NOTIFY_DONE;
 }
 /*
 * Allocate an element from the rcu_tortures pool.
 */
@ -199,11 +223,12 @@ rcu_random(struct rcu_random_state *rrsp)
 static void
 rcu_stutter_wait(void)
 {
-	while (stutter_pause_test || !rcutorture_runnable)
+	while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) {
 		if (rcutorture_runnable)
 			schedule_timeout_interruptible(1);
 		else
 			schedule_timeout_interruptible(round_jiffies_relative(HZ));
 	}
 }
 /*
@ -599,7 +624,7 @@ rcu_torture_writer(void *arg)
 		rcu_stutter_wait();
 	} while (!kthread_should_stop() && !fullstop);
 	VERBOSE_PRINTK_STRING("rcu_torture_writer task stopping");
-	while (!kthread_should_stop())
+	while (!kthread_should_stop() && fullstop != FULLSTOP_SIGNALED)
 		schedule_timeout_uninterruptible(1);
 	return 0;
 }
@ -624,7 +649,7 @@ rcu_torture_fakewriter(void *arg)
 	} while (!kthread_should_stop() && !fullstop);
 	VERBOSE_PRINTK_STRING("rcu_torture_fakewriter task stopping");
-	while (!kthread_should_stop())
+	while (!kthread_should_stop() && fullstop != FULLSTOP_SIGNALED)
 		schedule_timeout_uninterruptible(1);
 	return 0;
 }
@ -734,7 +759,7 @@ rcu_torture_reader(void *arg)
 	VERBOSE_PRINTK_STRING("rcu_torture_reader task stopping");
 	if (irqreader && cur_ops->irqcapable)
 		del_timer_sync(&t);
-	while (!kthread_should_stop())
+	while (!kthread_should_stop() && fullstop != FULLSTOP_SIGNALED)
 		schedule_timeout_uninterruptible(1);
 	return 0;
 }
@ -831,7 +856,7 @@ rcu_torture_stats(void *arg)
 	do {
 		schedule_timeout_interruptible(stat_interval * HZ);
 		rcu_torture_stats_print();
-	} while (!kthread_should_stop());
+	} while (!kthread_should_stop() && !fullstop);
 	VERBOSE_PRINTK_STRING("rcu_torture_stats task stopping");
 	return 0;
 }
@ -899,7 +924,7 @@ rcu_torture_shuffle(void *arg)
 	do {
 		schedule_timeout_interruptible(shuffle_interval * HZ);
 		rcu_torture_shuffle_tasks();
-	} while (!kthread_should_stop());
+	} while (!kthread_should_stop() && !fullstop);
 	VERBOSE_PRINTK_STRING("rcu_torture_shuffle task stopping");
 	return 0;
 }
@ -914,10 +939,10 @@ rcu_torture_stutter(void *arg)
 	do {
 		schedule_timeout_interruptible(stutter * HZ);
 		stutter_pause_test = 1;
-		if (!kthread_should_stop())
+		if (!kthread_should_stop() && !fullstop)
 			schedule_timeout_interruptible(stutter * HZ);
 		stutter_pause_test = 0;
-	} while (!kthread_should_stop());
+	} while (!kthread_should_stop() && !fullstop);
 	VERBOSE_PRINTK_STRING("rcu_torture_stutter task stopping");
 	return 0;
 }
@ -934,12 +959,27 @@ rcu_torture_print_module_parms(char *tag)
 		stutter, irqreader);
 }
 static struct notifier_block rcutorture_nb = {
 	.notifier_call = rcutorture_shutdown_notify,
 };
 static void
 rcu_torture_cleanup(void)
 {
 	int i;
-	fullstop = 1;
+	mutex_lock(&fullstop_mutex);
 	if (!fullstop) {
 		/* If being signaled, let it happen, then exit. */
 		mutex_unlock(&fullstop_mutex);
 		schedule_timeout_interruptible(10 * HZ);
 		if (cur_ops->cb_barrier != NULL)
 			cur_ops->cb_barrier();
 		return;
 	}
 	fullstop = FULLSTOP_CLEANUP;
 	mutex_unlock(&fullstop_mutex);
 	unregister_reboot_notifier(&rcutorture_nb);
 	if (stutter_task) {
 		VERBOSE_PRINTK_STRING("Stopping rcu_torture_stutter task");
 		kthread_stop(stutter_task);
@ -1015,6 +1055,8 @@ rcu_torture_init(void)
 		{ &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops,
 		  &srcu_ops, &sched_ops, &sched_ops_sync, };
 	mutex_lock(&fullstop_mutex);
 	/* Process args and tell the world that the torturer is on the job. */
 	for (i = 0; i < ARRAY_SIZE(torture_ops); i++) {
 		cur_ops = torture_ops[i];
@ -1024,6 +1066,7 @@ rcu_torture_init(void)
 	if (i == ARRAY_SIZE(torture_ops)) {
 		printk(KERN_ALERT "rcutorture: invalid torture type: \"%s\"\n",
 		       torture_type);
 		mutex_unlock(&fullstop_mutex);
 		return (-EINVAL);
 	}
 	if (cur_ops->init)
@ -1146,9 +1189,12 @@ rcu_torture_init(void)
 			goto unwind;
 		}
 	}
 	register_reboot_notifier(&rcutorture_nb);
 	mutex_unlock(&fullstop_mutex);
 	return 0;
 unwind:
 	mutex_unlock(&fullstop_mutex);
 	rcu_torture_cleanup();
 	return firsterr;
 }
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@ -0,0 +1,271 @@
 /*
 * Read-Copy Update tracing for hierarchical implementation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright IBM Corporation, 2008
 *
 * Papers:  http://www.rdrop.com/users/paulmck/RCU
 *
 * For detailed explanation of Read-Copy Update mechanism see -
 * 		Documentation/RCU
 *
 */
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/spinlock.h>
 #include <linux/smp.h>
 #include <linux/rcupdate.h>
 #include <linux/interrupt.h>
 #include <linux/sched.h>
 #include <asm/atomic.h>
 #include <linux/bitops.h>
 #include <linux/module.h>
 #include <linux/completion.h>
 #include <linux/moduleparam.h>
 #include <linux/percpu.h>
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/mutex.h>
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
 static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 {
 	if (!rdp->beenonline)
 		return;
 	seq_printf(m, "%3d%cc=%ld g=%ld pq=%d pqc=%ld qp=%d rpfq=%ld rp=%x",
 		   rdp->cpu,
 		   cpu_is_offline(rdp->cpu) ? '!' : ' ',
 		   rdp->completed, rdp->gpnum,
 		   rdp->passed_quiesc, rdp->passed_quiesc_completed,
 		   rdp->qs_pending,
 		   rdp->n_rcu_pending_force_qs - rdp->n_rcu_pending,
 		   (int)(rdp->n_rcu_pending & 0xffff));
 #ifdef CONFIG_NO_HZ
 	seq_printf(m, " dt=%d/%d dn=%d df=%lu",
 		   rdp->dynticks->dynticks,
 		   rdp->dynticks->dynticks_nesting,
 		   rdp->dynticks->dynticks_nmi,
 		   rdp->dynticks_fqs);
 #endif /* #ifdef CONFIG_NO_HZ */
 	seq_printf(m, " of=%lu ri=%lu", rdp->offline_fqs, rdp->resched_ipi);
 	seq_printf(m, " ql=%ld b=%ld\n", rdp->qlen, rdp->blimit);
 }
 #define PRINT_RCU_DATA(name, func, m) \
 	do { \
 		int _p_r_d_i; \
 		\
 		for_each_possible_cpu(_p_r_d_i) \
 			func(m, &per_cpu(name, _p_r_d_i)); \
 	} while (0)
 static int show_rcudata(struct seq_file *m, void *unused)
 {
 	seq_puts(m, "rcu:\n");
 	PRINT_RCU_DATA(rcu_data, print_one_rcu_data, m);
 	seq_puts(m, "rcu_bh:\n");
 	PRINT_RCU_DATA(rcu_bh_data, print_one_rcu_data, m);
 	return 0;
 }
 static int rcudata_open(struct inode *inode, struct file *file)
 {
 	return single_open(file, show_rcudata, NULL);
 }
 static struct file_operations rcudata_fops = {
 	.owner = THIS_MODULE,
 	.open = rcudata_open,
 	.read = seq_read,
 	.llseek = seq_lseek,
 	.release = single_release,
 };
 static void print_one_rcu_data_csv(struct seq_file *m, struct rcu_data *rdp)
 {
 	if (!rdp->beenonline)
 		return;
 	seq_printf(m, "%d,%s,%ld,%ld,%d,%ld,%d,%ld,%ld",
 		   rdp->cpu,
 		   cpu_is_offline(rdp->cpu) ? "\"Y\"" : "\"N\"",
 		   rdp->completed, rdp->gpnum,
 		   rdp->passed_quiesc, rdp->passed_quiesc_completed,
 		   rdp->qs_pending,
 		   rdp->n_rcu_pending_force_qs - rdp->n_rcu_pending,
 		   rdp->n_rcu_pending);
 #ifdef CONFIG_NO_HZ
 	seq_printf(m, ",%d,%d,%d,%lu",
 		   rdp->dynticks->dynticks,
 		   rdp->dynticks->dynticks_nesting,
 		   rdp->dynticks->dynticks_nmi,
 		   rdp->dynticks_fqs);
 #endif /* #ifdef CONFIG_NO_HZ */
 	seq_printf(m, ",%lu,%lu", rdp->offline_fqs, rdp->resched_ipi);
 	seq_printf(m, ",%ld,%ld\n", rdp->qlen, rdp->blimit);
 }
 static int show_rcudata_csv(struct seq_file *m, void *unused)
 {
 	seq_puts(m, "\"CPU\",\"Online?\",\"c\",\"g\",\"pq\",\"pqc\",\"qp\",\"rpfq\",\"rp\",");
 #ifdef CONFIG_NO_HZ
 	seq_puts(m, "\"dt\",\"dt nesting\",\"dn\",\"df\",");
 #endif /* #ifdef CONFIG_NO_HZ */
 	seq_puts(m, "\"of\",\"ri\",\"ql\",\"b\"\n");
 	seq_puts(m, "\"rcu:\"\n");
 	PRINT_RCU_DATA(rcu_data, print_one_rcu_data_csv, m);
 	seq_puts(m, "\"rcu_bh:\"\n");
 	PRINT_RCU_DATA(rcu_bh_data, print_one_rcu_data_csv, m);
 	return 0;
 }
 static int rcudata_csv_open(struct inode *inode, struct file *file)
 {
 	return single_open(file, show_rcudata_csv, NULL);
 }
 static struct file_operations rcudata_csv_fops = {
 	.owner = THIS_MODULE,
 	.open = rcudata_csv_open,
 	.read = seq_read,
 	.llseek = seq_lseek,
 	.release = single_release,
 };
 static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 {
 	int level = 0;
 	struct rcu_node *rnp;
 	seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x "
 	              "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu\n",
 		   rsp->completed, rsp->gpnum, rsp->signaled,
 		   (long)(rsp->jiffies_force_qs - jiffies),
 		   (int)(jiffies & 0xffff),
 		   rsp->n_force_qs, rsp->n_force_qs_ngp,
 		   rsp->n_force_qs - rsp->n_force_qs_ngp,
 		   rsp->n_force_qs_lh);
 	for (rnp = &rsp->node[0]; rnp - &rsp->node[0] < NUM_RCU_NODES; rnp++) {
 		if (rnp->level != level) {
 			seq_puts(m, "\n");
 			level = rnp->level;
 		}
 		seq_printf(m, "%lx/%lx %d:%d ^%d    ",
 			   rnp->qsmask, rnp->qsmaskinit,
 			   rnp->grplo, rnp->grphi, rnp->grpnum);
 	}
 	seq_puts(m, "\n");
 }
 static int show_rcuhier(struct seq_file *m, void *unused)
 {
 	seq_puts(m, "rcu:\n");
 	print_one_rcu_state(m, &rcu_state);
 	seq_puts(m, "rcu_bh:\n");
 	print_one_rcu_state(m, &rcu_bh_state);
 	return 0;
 }
 static int rcuhier_open(struct inode *inode, struct file *file)
 {
 	return single_open(file, show_rcuhier, NULL);
 }
 static struct file_operations rcuhier_fops = {
 	.owner = THIS_MODULE,
 	.open = rcuhier_open,
 	.read = seq_read,
 	.llseek = seq_lseek,
 	.release = single_release,
 };
 static int show_rcugp(struct seq_file *m, void *unused)
 {
 	seq_printf(m, "rcu: completed=%ld  gpnum=%ld\n",
 		   rcu_state.completed, rcu_state.gpnum);
 	seq_printf(m, "rcu_bh: completed=%ld  gpnum=%ld\n",
 		   rcu_bh_state.completed, rcu_bh_state.gpnum);
 	return 0;
 }
 static int rcugp_open(struct inode *inode, struct file *file)
 {
 	return single_open(file, show_rcugp, NULL);
 }
 static struct file_operations rcugp_fops = {
 	.owner = THIS_MODULE,
 	.open = rcugp_open,
 	.read = seq_read,
 	.llseek = seq_lseek,
 	.release = single_release,
 };
 static struct dentry *rcudir, *datadir, *datadir_csv, *hierdir, *gpdir;
 static int __init rcuclassic_trace_init(void)
 {
 	rcudir = debugfs_create_dir("rcu", NULL);
 	if (!rcudir)
 		goto out;
 	datadir = debugfs_create_file("rcudata", 0444, rcudir,
 						NULL, &rcudata_fops);
 	if (!datadir)
 		goto free_out;
 	datadir_csv = debugfs_create_file("rcudata.csv", 0444, rcudir,
 						NULL, &rcudata_csv_fops);
 	if (!datadir_csv)
 		goto free_out;
 	gpdir = debugfs_create_file("rcugp", 0444, rcudir, NULL, &rcugp_fops);
 	if (!gpdir)
 		goto free_out;
 	hierdir = debugfs_create_file("rcuhier", 0444, rcudir,
 						NULL, &rcuhier_fops);
 	if (!hierdir)
 		goto free_out;
 	return 0;
 free_out:
 	if (datadir)
 		debugfs_remove(datadir);
 	if (datadir_csv)
 		debugfs_remove(datadir_csv);
 	if (gpdir)
 		debugfs_remove(gpdir);
 	debugfs_remove(rcudir);
 out:
 	return 1;
 }
 static void __exit rcuclassic_trace_cleanup(void)
 {
 	debugfs_remove(datadir);
 	debugfs_remove(datadir_csv);
 	debugfs_remove(gpdir);
 	debugfs_remove(hierdir);
 	debugfs_remove(rcudir);
 }
 module_init(rcuclassic_trace_init);
 module_exit(rcuclassic_trace_cleanup);
 MODULE_AUTHOR("Paul E. McKenney");
 MODULE_DESCRIPTION("Read-Copy Update tracing for hierarchical implementation");
 MODULE_LICENSE("GPL");
--- a/kernel/resource.c
+++ b/kernel/resource.c
@ -853,6 +853,15 @@ int iomem_map_sanity_check(resource_size_t addr, unsigned long size)
 		if (PFN_DOWN(p->start) <= PFN_DOWN(addr) &&
 		    PFN_DOWN(p->end) >= PFN_DOWN(addr + size - 1))
 			continue;
 		/*
 		 * if a resource is "BUSY", it's not a hardware resource
 		 * but a driver mapping of such a resource; we don't want
 		 * to warn for those; some drivers legitimately map only
 		 * partial hardware resources. (example: vesafb)
 		 */
 		if (p->flags & IORESOURCE_BUSY)
 			continue;
 		printk(KERN_WARNING "resource map sanity check conflict: "
 		       "0x%llx 0x%llx 0x%llx 0x%llx %s\n",
 		       (unsigned long long)addr,
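For readers following the kernel/resource.c hunk above, here is a minimal user-space sketch of the decision it modifies. It assumes 4 KiB pages. `struct res_sketch` and `map_conflicts()` are hypothetical names, and the two non-overlap tests at the top are assumed from the rest of the function; they are not visible in this hunk.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12				/* assumption: 4 KiB pages */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Hypothetical stand-in for struct resource; only the fields used here. */
struct res_sketch {
	uint64_t start;
	uint64_t end;	/* inclusive */
	int busy;	/* models the IORESOURCE_BUSY flag */
};

/* Returns 1 if mapping [addr, addr + size - 1] would trigger the warning. */
static int map_conflicts(const struct res_sketch *p, uint64_t addr, uint64_t size)
{
	assert(size > 0);	/* addr + size - 1 is only meaningful for size > 0 */

	/* Assumed pre-checks: ranges that do not overlap never conflict. */
	if (p->start >= addr + size || p->end < addr)
		return 0;
	/* The mapping lies, at page granularity, fully inside the resource. */
	if (PFN_DOWN(p->start) <= PFN_DOWN(addr) &&
	    PFN_DOWN(p->end) >= PFN_DOWN(addr + size - 1))
		return 0;
	/* New in this hunk: a BUSY entry is a driver mapping, so stay quiet. */
	if (p->busy)
		return 0;
	return 1;
}
```

With a resource covering 0x1000-0x2fff, a mapping of 0x2000 bytes at 0x2000 straddles the end of the resource and is reported, unless the resource is marked busy.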
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@ -102,20 +102,6 @@ void local_bh_disable(void)
 EXPORT_SYMBOL(local_bh_disable);
 void __local_bh_enable(void)
 {
 	WARN_ON_ONCE(in_irq());
 	/*
 	 * softirqs should never be enabled by __local_bh_enable(),
 	 * it always nests inside local_bh_enable() sections:
 	 */
 	WARN_ON_ONCE(softirq_count() == SOFTIRQ_OFFSET);
 	sub_preempt_count(SOFTIRQ_OFFSET);
 }
 EXPORT_SYMBOL_GPL(__local_bh_enable);
 /*
 * Special-case - softirqs can safely be enabled in
 * cond_resched_softirq(), or by __do_softirq(),
@ -269,6 +255,7 @@ void irq_enter(void)
 {
 	int cpu = smp_processor_id();
 	rcu_irq_enter();
 	if (idle_cpu(cpu) && !in_interrupt()) {
 		__irq_enter();
 		tick_check_idle(cpu);
@ -295,9 +282,9 @@ void irq_exit(void)
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
 	if (!in_interrupt() && idle_cpu(smp_processor_id()) && !need_resched())
 		tick_nohz_stop_sched_tick(0);
 	rcu_irq_exit();
 	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
 		tick_nohz_stop_sched_tick(0);
 #endif
 	preempt_enable_no_resched();
 }
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@ -6,6 +6,7 @@
 *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
 */
 #include <linux/sched.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/kallsyms.h>
 #include <linux/stacktrace.h>
@ -24,3 +25,13 @@ void print_stack_trace(struct stack_trace *trace, int spaces)
 }
 EXPORT_SYMBOL_GPL(print_stack_trace);
 /*
 * Architectures that do not implement save_stack_trace_tsk get this
 * weak alias and a once-per-bootup warning (whenever this facility
 * is utilized - for example by procfs):
 */
 __weak void
 save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
 	WARN_ONCE(1, KERN_INFO "save_stack_trace_tsk() not implemented yet.\n");
 }
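The weak-fallback pattern used by save_stack_trace_tsk() above can be sketched in user space as follows. This assumes GCC or Clang, where a weak definition is spelled `__attribute__((weak))`. `WEAK_FALLBACK`, `warn_once()`, `warn_count` and `save_trace_sketch()` are hypothetical names for illustration, and `warn_once()` is only a simplified stand-in for WARN_ONCE().

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: GCC or Clang attribute syntax. */
#define WEAK_FALLBACK __attribute__((weak))

static int warn_count;	/* hypothetical counter so the behaviour can be checked */

/* Simplified stand-in for WARN_ONCE(): only the first call reports. */
static void warn_once(const char *msg)
{
	static int warned;

	if (!warned) {
		warned = 1;
		warn_count++;
		fputs(msg, stderr);
	}
}

/*
 * Generic fallback.  A non-weak definition of the same symbol in another
 * object file replaces this one at link time.
 */
WEAK_FALLBACK int save_trace_sketch(unsigned long *entries, int max_entries)
{
	assert(max_entries >= 0);
	(void)entries;
	warn_once("save_trace_sketch() not implemented yet.\n");
	return 0;	/* no entries recorded */
}
```

Calling the fallback any number of times prints the message once and records nothing. An architecture that provides its own strong definition never reaches this code.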
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@ -252,6 +252,14 @@ config DEBUG_OBJECTS_TIMERS
 	  timer routines to track the life time of timer objects and
 	  validate the timer operations.
 config DEBUG_OBJECTS_ENABLE_DEFAULT
 	int "debug_objects bootup default value (0-1)"
        range 0 1
        default "1"
        depends on DEBUG_OBJECTS
        help
          Debug objects boot parameter default value
 config DEBUG_SLAB
 	bool "Debug slab memory allocations"
 	depends on DEBUG_KERNEL && SLAB
@ -629,6 +637,19 @@ config RCU_CPU_STALL_DETECTOR
 	  Say N if you are unsure.
 config RCU_CPU_STALL_DETECTOR
 	bool "Check for stalled CPUs delaying RCU grace periods"
 	depends on CLASSIC_RCU || TREE_RCU
 	default n
 	help
 	  This option causes RCU to printk information on which
 	  CPUs are delaying the current grace period, but only when
 	  the grace period extends for excessive time periods.
 	  Say Y if you want RCU to perform such checks.
 	  Say N if you are unsure.
 config KPROBES_SANITY_TEST
 	bool "Kprobes sanity tests"
 	depends on DEBUG_KERNEL
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@ -45,7 +45,9 @@ static struct kmem_cache	*obj_cache;
 static int			debug_objects_maxchain __read_mostly;
 static int			debug_objects_fixups __read_mostly;
 static int			debug_objects_warnings __read_mostly;
-static int			debug_objects_enabled __read_mostly;
+static int			debug_objects_enabled __read_mostly
+				= CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT;
 static struct debug_obj_descr	*descr_test  __read_mostly;
 static int __init enable_object_debug(char *str)
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@ -21,9 +21,12 @@
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/spinlock.h>
 #include <linux/swiotlb.h>
 #include <linux/string.h>
 #include <linux/swiotlb.h>
 #include <linux/types.h>
 #include <linux/ctype.h>
 #include <linux/highmem.h>
 #include <asm/io.h>
 #include <asm/dma.h>
@ -36,22 +39,6 @@
 #define OFFSET(val,align) ((unsigned long)	\
 	                   ( (val) & ( (align) - 1)))
 #define SG_ENT_VIRT_ADDRESS(sg)	(sg_virt((sg)))
 #define SG_ENT_PHYS_ADDRESS(sg)	virt_to_bus(SG_ENT_VIRT_ADDRESS(sg))
 /*
 * Maximum allowable number of contiguous slabs to map,
 * must be a power of 2.  What is the appropriate value ?
 * The complexity of {map,unmap}_single is linearly dependent on this value.
 */
 #define IO_TLB_SEGSIZE	128
 /*
 * log of the size of each IO TLB slab.  The number of slabs is command line
 * controllable.
 */
 #define IO_TLB_SHIFT 11
 #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
 /*
@ -102,7 +89,10 @@ static unsigned int io_tlb_index;
 * We need to save away the original address corresponding to a mapped entry
 * for the sync operations.
 */
-static unsigned char **io_tlb_orig_addr;
+static struct swiotlb_phys_addr {
+	struct page *page;
+	unsigned int offset;
+} *io_tlb_orig_addr;
 /*
 * Protect the above data structures in the map and unmap calls
@ -126,6 +116,72 @@ setup_io_tlb_npages(char *str)
 __setup("swiotlb=", setup_io_tlb_npages);
 /* make io_tlb_overflow tunable too? */
 void * __weak swiotlb_alloc_boot(size_t size, unsigned long nslabs)
 {
 	return alloc_bootmem_low_pages(size);
 }
 void * __weak swiotlb_alloc(unsigned order, unsigned long nslabs)
 {
 	return (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN, order);
 }
 dma_addr_t __weak swiotlb_phys_to_bus(phys_addr_t paddr)
 {
 	return paddr;
 }
 phys_addr_t __weak swiotlb_bus_to_phys(dma_addr_t baddr)
 {
 	return baddr;
 }
 static dma_addr_t swiotlb_virt_to_bus(volatile void *address)
 {
 	return swiotlb_phys_to_bus(virt_to_phys(address));
 }
 static void *swiotlb_bus_to_virt(dma_addr_t address)
 {
 	return phys_to_virt(swiotlb_bus_to_phys(address));
 }
 int __weak swiotlb_arch_range_needs_mapping(void *ptr, size_t size)
 {
 	return 0;
 }
 static dma_addr_t swiotlb_sg_to_bus(struct scatterlist *sg)
 {
 	return swiotlb_phys_to_bus(page_to_phys(sg_page(sg)) + sg->offset);
 }
 static void swiotlb_print_info(unsigned long bytes)
 {
 	phys_addr_t pstart, pend;
 	dma_addr_t bstart, bend;
 	pstart = virt_to_phys(io_tlb_start);
 	pend = virt_to_phys(io_tlb_end);
 	bstart = swiotlb_phys_to_bus(pstart);
 	bend = swiotlb_phys_to_bus(pend);
 	printk(KERN_INFO "Placing %luMB software IO TLB between %p - %p\n",
 	       bytes >> 20, io_tlb_start, io_tlb_end);
 	if (pstart != bstart || pend != bend)
 		printk(KERN_INFO "software IO TLB at phys %#llx - %#llx"
 		       " bus %#llx - %#llx\n",
 		       (unsigned long long)pstart,
 		       (unsigned long long)pend,
 		       (unsigned long long)bstart,
 		       (unsigned long long)bend);
 	else
 		printk(KERN_INFO "software IO TLB at phys %#llx - %#llx\n",
 		       (unsigned long long)pstart,
 		       (unsigned long long)pend);
 }
 /*
 * Statically reserve bounce buffer space and initialize bounce buffer data
 * structures for the software IO TLB used to implement the DMA API.
@ -145,7 +201,7 @@ swiotlb_init_with_default_size(size_t default_size)
 	/*
 	 * Get IO TLB memory from the low pages
 	 */
-	io_tlb_start = alloc_bootmem_low_pages(bytes);
+	io_tlb_start = swiotlb_alloc_boot(bytes, io_tlb_nslabs);
 	if (!io_tlb_start)
 		panic("Cannot allocate SWIOTLB buffer");
 	io_tlb_end = io_tlb_start + bytes;
@ -159,7 +215,7 @@ swiotlb_init_with_default_size(size_t default_size)
 	for (i = 0; i < io_tlb_nslabs; i++)
 		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
 	io_tlb_index = 0;
-	io_tlb_orig_addr = alloc_bootmem(io_tlb_nslabs * sizeof(char *));
+	io_tlb_orig_addr = alloc_bootmem(io_tlb_nslabs * sizeof(struct swiotlb_phys_addr));
 	/*
 	 * Get the overflow emergency buffer
@ -168,8 +224,7 @@ swiotlb_init_with_default_size(size_t default_size)
 	if (!io_tlb_overflow_buffer)
 		panic("Cannot allocate SWIOTLB overflow buffer!\n");
-	printk(KERN_INFO "Placing software IO TLB between 0x%lx - 0x%lx\n",
-	       virt_to_bus(io_tlb_start), virt_to_bus(io_tlb_end));
+	swiotlb_print_info(bytes);
 }
 void __init
@ -202,8 +257,7 @@ swiotlb_late_init_with_default_size(size_t default_size)
 	bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
-		io_tlb_start = (char *)__get_free_pages(GFP_DMA | __GFP_NOWARN,
-		                                        order);
+		io_tlb_start = swiotlb_alloc(order, io_tlb_nslabs);
 		if (io_tlb_start)
 			break;
 		order--;
@ -235,12 +289,12 @@ swiotlb_late_init_with_default_size(size_t default_size)
 		io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
 	io_tlb_index = 0;
-	io_tlb_orig_addr = (unsigned char **)__get_free_pages(GFP_KERNEL,
+	io_tlb_orig_addr = (struct swiotlb_phys_addr *)__get_free_pages(GFP_KERNEL,
-	                           get_order(io_tlb_nslabs * sizeof(char *)));
+	                           get_order(io_tlb_nslabs * sizeof(struct swiotlb_phys_addr)));
 	if (!io_tlb_orig_addr)
 		goto cleanup3;
-	memset(io_tlb_orig_addr, 0, io_tlb_nslabs * sizeof(char *));
+	memset(io_tlb_orig_addr, 0, io_tlb_nslabs * sizeof(struct swiotlb_phys_addr));
 	/*
 	 * Get the overflow emergency buffer
@ -250,9 +304,7 @@ swiotlb_late_init_with_default_size(size_t default_size)
 	if (!io_tlb_overflow_buffer)
 		goto cleanup4;
-	printk(KERN_INFO "Placing %luMB software IO TLB between 0x%lx - "
-	       "0x%lx\n", bytes >> 20,
-	       virt_to_bus(io_tlb_start), virt_to_bus(io_tlb_end));
+	swiotlb_print_info(bytes);
 	return 0;
@ -279,16 +331,69 @@ address_needs_mapping(struct device *hwdev, dma_addr_t addr, size_t size)
 	return !is_buffer_dma_capable(dma_get_mask(hwdev), addr, size);
 }
 static inline int range_needs_mapping(void *ptr, size_t size)
 {
 	return swiotlb_force || swiotlb_arch_range_needs_mapping(ptr, size);
 }
 static int is_swiotlb_buffer(char *addr)
 {
 	return addr >= io_tlb_start && addr < io_tlb_end;
 }
 static struct swiotlb_phys_addr swiotlb_bus_to_phys_addr(char *dma_addr)
 {
 	int index = (dma_addr - io_tlb_start) >> IO_TLB_SHIFT;
 	struct swiotlb_phys_addr buffer = io_tlb_orig_addr[index];
 	buffer.offset += (long)dma_addr & ((1 << IO_TLB_SHIFT) - 1);
 	buffer.page += buffer.offset >> PAGE_SHIFT;
 	buffer.offset &= PAGE_SIZE - 1;
 	return buffer;
 }
 static void
 __sync_single(struct swiotlb_phys_addr buffer, char *dma_addr, size_t size, int dir)
 {
 	if (PageHighMem(buffer.page)) {
 		size_t len, bytes;
 		char *dev, *host, *kmp;
 		len = size;
 		while (len != 0) {
 			unsigned long flags;
 			bytes = len;
 			if ((bytes + buffer.offset) > PAGE_SIZE)
 				bytes = PAGE_SIZE - buffer.offset;
 			local_irq_save(flags); /* protects KM_BOUNCE_READ */
 			kmp  = kmap_atomic(buffer.page, KM_BOUNCE_READ);
 			dev  = dma_addr + size - len;
 			host = kmp + buffer.offset;
 			if (dir == DMA_FROM_DEVICE)
 				memcpy(host, dev, bytes);
 			else
 				memcpy(dev, host, bytes);
 			kunmap_atomic(kmp, KM_BOUNCE_READ);
 			local_irq_restore(flags);
 			len -= bytes;
 			buffer.page++;
 			buffer.offset = 0;
 		}
 	} else {
 		void *v = page_address(buffer.page) + buffer.offset;
 		if (dir == DMA_TO_DEVICE)
 			memcpy(dma_addr, v, size);
 		else
 			memcpy(v, dma_addr, size);
 	}
 }
 /*
 * Allocates bounce buffer and returns its kernel virtual address.
 */
 static void *
-map_single(struct device *hwdev, char *buffer, size_t size, int dir)
+map_single(struct device *hwdev, struct swiotlb_phys_addr buffer, size_t size, int dir)
 {
 	unsigned long flags;
 	char *dma_addr;
@ -298,11 +403,16 @@ map_single(struct device *hwdev, char *buffer, size_t size, int dir)
 	unsigned long mask;
 	unsigned long offset_slots;
 	unsigned long max_slots;
+	struct swiotlb_phys_addr slot_buf;
 	mask = dma_get_seg_boundary(hwdev);
-	start_dma_addr = virt_to_bus(io_tlb_start) & mask;
+	start_dma_addr = swiotlb_virt_to_bus(io_tlb_start) & mask;
 	offset_slots = ALIGN(start_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
 	/*
 	 * Carefully handle integer overflow which can occur when mask == ~0UL.
 	 */
 	max_slots = mask + 1
 		    ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
 		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
@ -378,10 +488,15 @@ found:
 	 * This is needed when we sync the memory.  Then we sync the buffer if
 	 * needed.
 	 */
-	for (i = 0; i < nslots; i++)
-		io_tlb_orig_addr[index+i] = buffer + (i << IO_TLB_SHIFT);
+	slot_buf = buffer;
+	for (i = 0; i < nslots; i++) {
+		slot_buf.page += slot_buf.offset >> PAGE_SHIFT;
+		slot_buf.offset &= PAGE_SIZE - 1;
+		io_tlb_orig_addr[index+i] = slot_buf;
+		slot_buf.offset += 1 << IO_TLB_SHIFT;
+	}
 	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
-		memcpy(dma_addr, buffer, size);
+		__sync_single(buffer, dma_addr, size, DMA_TO_DEVICE);
 	return dma_addr;
 }
@ -395,17 +510,17 @@ unmap_single(struct device *hwdev, char *dma_addr, size_t size, int dir)
 	unsigned long flags;
 	int i, count, nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
 	int index = (dma_addr - io_tlb_start) >> IO_TLB_SHIFT;
-	char *buffer = io_tlb_orig_addr[index];
+	struct swiotlb_phys_addr buffer = swiotlb_bus_to_phys_addr(dma_addr);
 	/*
 	 * First, sync the memory before unmapping the entry
 	 */
-	if (buffer && ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
+	if ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL))
 		/*
 		 * bounce... copy the data back into the original buffer * and
 		 * delete the bounce buffer.
 		 */
-		memcpy(buffer, dma_addr, size);
+		__sync_single(buffer, dma_addr, size, DMA_FROM_DEVICE);
 	/*
 	 * Return the buffer to the free list by setting the corresponding
@ -437,21 +552,18 @@ static void
 sync_single(struct device *hwdev, char *dma_addr, size_t size,
 	    int dir, int target)
 {
-	int index = (dma_addr - io_tlb_start) >> IO_TLB_SHIFT;
-	char *buffer = io_tlb_orig_addr[index];
-	buffer += ((unsigned long)dma_addr & ((1 << IO_TLB_SHIFT) - 1));
+	struct swiotlb_phys_addr buffer = swiotlb_bus_to_phys_addr(dma_addr);
 	switch (target) {
 	case SYNC_FOR_CPU:
 		if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-			memcpy(buffer, dma_addr, size);
+			__sync_single(buffer, dma_addr, size, DMA_FROM_DEVICE);
 		else
 			BUG_ON(dir != DMA_TO_DEVICE);
 		break;
 	case SYNC_FOR_DEVICE:
 		if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-			memcpy(dma_addr, buffer, size);
+			__sync_single(buffer, dma_addr, size, DMA_TO_DEVICE);
 		else
 			BUG_ON(dir != DMA_FROM_DEVICE);
 		break;
@ -473,7 +585,7 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 		dma_mask = hwdev->coherent_dma_mask;
 	ret = (void *)__get_free_pages(flags, order);
-	if (ret && !is_buffer_dma_capable(dma_mask, virt_to_bus(ret), size)) {
+	if (ret && !is_buffer_dma_capable(dma_mask, swiotlb_virt_to_bus(ret), size)) {
 		/*
 		 * The allocated memory isn't reachable by the device.
 		 * Fall back on swiotlb_map_single().
@ -488,13 +600,16 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 		 * swiotlb_map_single(), which will grab memory from
 		 * the lowest available address range.
 		 */
-		ret = map_single(hwdev, NULL, size, DMA_FROM_DEVICE);
+		struct swiotlb_phys_addr buffer;
+		buffer.page = virt_to_page(NULL);
+		buffer.offset = 0;
+		ret = map_single(hwdev, buffer, size, DMA_FROM_DEVICE);
 		if (!ret)
 			return NULL;
 	}
 	memset(ret, 0, size);
-	dev_addr = virt_to_bus(ret);
+	dev_addr = swiotlb_virt_to_bus(ret);
 	/* Confirm address can be DMA'd by device */
 	if (!is_buffer_dma_capable(dma_mask, dev_addr, size)) {
@ -554,8 +669,9 @@ dma_addr_t
 swiotlb_map_single_attrs(struct device *hwdev, void *ptr, size_t size,
 			 int dir, struct dma_attrs *attrs)
 {
-	dma_addr_t dev_addr = virt_to_bus(ptr);
+	dma_addr_t dev_addr = swiotlb_virt_to_bus(ptr);
 	void *map;
+	struct swiotlb_phys_addr buffer;
 	BUG_ON(dir == DMA_NONE);
 	/*
@ -563,19 +679,22 @@ swiotlb_map_single_attrs(struct device *hwdev, void *ptr, size_t size,
 	 * we can safely return the device addr and not worry about bounce
 	 * buffering it.
 	 */
-	if (!address_needs_mapping(hwdev, dev_addr, size) && !swiotlb_force)
+	if (!address_needs_mapping(hwdev, dev_addr, size) &&
+	    !range_needs_mapping(ptr, size))
 		return dev_addr;
 	/*
 	 * Oh well, have to allocate and map a bounce buffer.
 	 */
-	map = map_single(hwdev, ptr, size, dir);
+	buffer.page   = virt_to_page(ptr);
+	buffer.offset = (unsigned long)ptr & ~PAGE_MASK;
+	map = map_single(hwdev, buffer, size, dir);
 	if (!map) {
 		swiotlb_full(hwdev, size, dir, 1);
 		map = io_tlb_overflow_buffer;
 	}
-	dev_addr = virt_to_bus(map);
+	dev_addr = swiotlb_virt_to_bus(map);
 	/*
 	 * Ensure that the address returned is DMA'ble
@ -605,7 +724,7 @@ void
 swiotlb_unmap_single_attrs(struct device *hwdev, dma_addr_t dev_addr,
 			   size_t size, int dir, struct dma_attrs *attrs)
 {
-	char *dma_addr = bus_to_virt(dev_addr);
+	char *dma_addr = swiotlb_bus_to_virt(dev_addr);
 	BUG_ON(dir == DMA_NONE);
 	if (is_swiotlb_buffer(dma_addr))
@ -635,7 +754,7 @@ static void
 swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
 		    size_t size, int dir, int target)
 {
-	char *dma_addr = bus_to_virt(dev_addr);
+	char *dma_addr = swiotlb_bus_to_virt(dev_addr);
 	BUG_ON(dir == DMA_NONE);
 	if (is_swiotlb_buffer(dma_addr))
@ -666,7 +785,7 @@ swiotlb_sync_single_range(struct device *hwdev, dma_addr_t dev_addr,
 			  unsigned long offset, size_t size,
 			  int dir, int target)
 {
-	char *dma_addr = bus_to_virt(dev_addr) + offset;
+	char *dma_addr = swiotlb_bus_to_virt(dev_addr) + offset;
 	BUG_ON(dir == DMA_NONE);
 	if (is_swiotlb_buffer(dma_addr))
@ -714,18 +833,20 @@ swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
 		     int dir, struct dma_attrs *attrs)
 {
 	struct scatterlist *sg;
-	void *addr;
+	struct swiotlb_phys_addr buffer;
 	dma_addr_t dev_addr;
 	int i;
 	BUG_ON(dir == DMA_NONE);
 	for_each_sg(sgl, sg, nelems, i) {
-		addr = SG_ENT_VIRT_ADDRESS(sg);
-		dev_addr = virt_to_bus(addr);
-		if (swiotlb_force ||
+		dev_addr = swiotlb_sg_to_bus(sg);
+		if (range_needs_mapping(sg_virt(sg), sg->length) ||
 		    address_needs_mapping(hwdev, dev_addr, sg->length)) {
-			void *map = map_single(hwdev, addr, sg->length, dir);
+			void *map;
+			buffer.page   = sg_page(sg);
+			buffer.offset = sg->offset;
+			map = map_single(hwdev, buffer, sg->length, dir);
 			if (!map) {
 				/* Don't panic here, we expect map_sg users
 				   to do proper error handling. */
@ -735,7 +856,7 @@ swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
 				sgl[0].dma_length = 0;
 				return 0;
 			}
-			sg->dma_address = virt_to_bus(map);
+			sg->dma_address = swiotlb_virt_to_bus(map);
 		} else
 			sg->dma_address = dev_addr;
 		sg->dma_length = sg->length;
@ -765,11 +886,11 @@ swiotlb_unmap_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
 	BUG_ON(dir == DMA_NONE);
 	for_each_sg(sgl, sg, nelems, i) {
-		if (sg->dma_address != SG_ENT_PHYS_ADDRESS(sg))
+		if (sg->dma_address != swiotlb_sg_to_bus(sg))
-			unmap_single(hwdev, bus_to_virt(sg->dma_address),
+			unmap_single(hwdev, swiotlb_bus_to_virt(sg->dma_address),
 				     sg->dma_length, dir);
 		else if (dir == DMA_FROM_DEVICE)
-			dma_mark_clean(SG_ENT_VIRT_ADDRESS(sg), sg->dma_length);
+			dma_mark_clean(swiotlb_bus_to_virt(sg->dma_address), sg->dma_length);
 	}
 }
 EXPORT_SYMBOL(swiotlb_unmap_sg_attrs);
@ -798,11 +919,11 @@ swiotlb_sync_sg(struct device *hwdev, struct scatterlist *sgl,
 	BUG_ON(dir == DMA_NONE);
 	for_each_sg(sgl, sg, nelems, i) {
-		if (sg->dma_address != SG_ENT_PHYS_ADDRESS(sg))
+		if (sg->dma_address != swiotlb_sg_to_bus(sg))
-			sync_single(hwdev, bus_to_virt(sg->dma_address),
+			sync_single(hwdev, swiotlb_bus_to_virt(sg->dma_address),
 				    sg->dma_length, dir, target);
 		else if (dir == DMA_FROM_DEVICE)
-			dma_mark_clean(SG_ENT_VIRT_ADDRESS(sg), sg->dma_length);
+			dma_mark_clean(swiotlb_bus_to_virt(sg->dma_address), sg->dma_length);
 	}
 }
@ -823,7 +944,7 @@ swiotlb_sync_sg_for_device(struct device *hwdev, struct scatterlist *sg,
 int
 swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
 {
-	return (dma_addr == virt_to_bus(io_tlb_overflow_buffer));
+	return (dma_addr == swiotlb_virt_to_bus(io_tlb_overflow_buffer));
 }
 /*
@ -835,7 +956,7 @@ swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
 int
 swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-	return virt_to_bus(io_tlb_end - 1) <= mask;
+	return swiotlb_virt_to_bus(io_tlb_end - 1) <= mask;
 }
 EXPORT_SYMBOL(swiotlb_map_single);
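The central change in the lib/swiotlb.c diff above is that each bounce-buffer slot now remembers its origin as a (page, offset) pair instead of a kernel virtual address. The user-space sketch below models that bookkeeping. It assumes 4 KiB pages and a tiny eight-slot pool, and it represents a page by a plain frame index, not a `struct page *`. `struct phys_addr_sketch`, `record_slots()` and `lookup()` are hypothetical names that mirror the loop in map_single() and the arithmetic in swiotlb_bus_to_phys_addr().

```c
#include <assert.h>

#define PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define IO_TLB_SHIFT	11	/* 2 KiB slabs, as in the patch */
#define NSLOTS		8	/* hypothetical tiny pool */

/* Models struct swiotlb_phys_addr; 'page' is a frame index here. */
struct phys_addr_sketch {
	unsigned long page;
	unsigned int offset;
};

static struct phys_addr_sketch orig_addr[NSLOTS];

/* Record the origin of every slot of a mapping, normalising as we go. */
static void record_slots(int index, int nslots, struct phys_addr_sketch buffer)
{
	struct phys_addr_sketch slot_buf = buffer;
	int i;

	assert(index >= 0 && nslots >= 0 && index + nslots <= NSLOTS);
	for (i = 0; i < nslots; i++) {
		/* Carry whole pages out of the offset into the page index. */
		slot_buf.page += slot_buf.offset >> PAGE_SHIFT;
		slot_buf.offset &= PAGE_SIZE - 1;
		orig_addr[index + i] = slot_buf;
		slot_buf.offset += 1 << IO_TLB_SHIFT;
	}
}

/* Map a byte offset within the bounce pool back to its original location. */
static struct phys_addr_sketch lookup(unsigned long pool_off)
{
	struct phys_addr_sketch buffer = orig_addr[pool_off >> IO_TLB_SHIFT];

	buffer.offset += pool_off & ((1UL << IO_TLB_SHIFT) - 1);
	buffer.page += buffer.offset >> PAGE_SHIFT;
	buffer.offset &= PAGE_SIZE - 1;
	return buffer;
}
```

For example, a buffer starting at page 10, offset 3000 that occupies slots 2 to 4 has its second slot recorded as page 11, offset 952, because 3000 + 2048 crosses the 4096-byte page boundary. Keeping the pair normalised is what lets __sync_single() map one page at a time when the original buffer is in high memory.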