llvm-project

Commit Graph

Author	SHA1	Message	Date
Saleem Abdulrasool	7230b377df	CodeGen: silence a C++11 feature warning llvm-svn: 198133	2013-12-28 22:47:55 +00:00
Andrew Trick	7afe481801	Uninitialized variable (in never taken path) after factoring. llvm-svn: 198131	2013-12-28 22:25:57 +00:00
Andrew Trick	33e05d7665	Added debugging options: -misched-only-func/block llvm-svn: 198124	2013-12-28 21:57:02 +00:00
Andrew Trick	d14d7c20f5	Add a PostMachineScheduler pass with generic implementation. PostGenericScheduler uses either the new machine model or the hazard checker for top-down scheduling. Most of the infrastructure for PreRA machine scheduling is reused. With some tuning, this should allow MachineScheduler to be default for all ARM targets, including cortex-A9, using the new machine model. Likewise, with additional tuning, it should be able to replace PostRAScheduler for all targets. The PostMachineScheduler pass does not currently run the AntiDepBreaker. There is less need for it on targets that are already running preRA MachineScheduler. I want to prove it's necessary before committing to the maintenance burden. The PostMachineScheduler also currently removes kill flags and adds them all back later. This is a bit ridiculous. I'd prefer passes to directly use a liveness utility than rely on flags. A test case that enables this scheduler will be included in a subsequent checkin that updates the A9 model. llvm-svn: 198122	2013-12-28 21:56:57 +00:00
Andrew Trick	17080b9bf2	Stub out a PostMachineScheduler pass. Placeholder and boilerplate for a PostRA MachineScheduler pass. llvm-svn: 198120	2013-12-28 21:56:51 +00:00
Andrew Trick	d7f890edb0	Factor MI-Sched in preparation for post-ra scheduling support. Factor the MachineFunctionPass into MachineSchedulerBase. Split the DAG class into ScheduleDAGMI and ScheduleDAGMILive. llvm-svn: 198119	2013-12-28 21:56:47 +00:00
Andrew Trick	fc127d1197	Factor out the SchedRemainder/SchedBoundary from GenericScheduler strategy. These helper classes take care of the book-keeping that drives the GenericScheduler heuristics. It is likely that developers writing target-specific schedulers that work similarly to GenericScheduler will want to use these helpers too. The immediate goal is to develop a GenericPostScheduler that can run in place of the old PostRAScheduler, but will use the new machine model. No functionality change intended. llvm-svn: 196643	2013-12-07 05:59:44 +00:00
Andrew Trick	f7760a24e5	comment grammar llvm-svn: 196585	2013-12-06 17:19:20 +00:00
Daniel Jasper	0d92abdfd2	Fix bug introduced in r196517. Not only does it trigger -Wparentheses, I think the assert actually relies on incorrect operator precedence. Also, the grammar is questionable, but I might not know enough about the problem at hand. llvm-svn: 196567	2013-12-06 08:58:22 +00:00
Andrew Trick	5a22df498e	MI-Sched: Model "reserved" processor resources. This allows a target to use MI-Sched as an in-order scheduler that will model strict resource conflicts without defining a processor itinerary. Instead, the target can now use the new per-operand machine model and define in-order resources with BufferSize=0. For example, this would allow restricting the type of operations that can be formed into a dispatch group. (Normally NumMicroOps is sufficient to enforce dispatch groups). If the intent is to model latency in an in-order pipeline, as opposed to resource conflicts, then a resource with BufferSize=1 should be defined instead. This feature is only casually tested as there are no in-tree targets using it yet. However, Hal will be experimenting with POWER7. llvm-svn: 196517	2013-12-05 17:56:02 +00:00
Andrew Trick	880e573d98	MI-Sched: handle latency of in-order operations with the new machine model. The per-operand machine model allows the target to define "unbuffered" processor resources. This change is a quick, cheap way to model stalls caused by the latency of operations that use such resources. This only applies when the processor's micro-op buffer size is non-zero (Out-of-Order). We can't precisely model in-order stalls during out-of-order execution, but this is an easy and effective heuristic. It benefits cortex-a9 scheduling when using the new machine model, which is not yet on by default. MI-Sched for armv7 was evaluated on Swift (and only not enabled because of a performance bug related to predication). However, we never evaluated Cortex-A9 performance on MI-Sched in its current form. This change adds MI-Sched functionality to reach performance goals on A9. The only remaining change is to allow MI-Sched to run as a PostRA pass. I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7: -mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false For a simple saxpy loop I see a 1.7x speedup. 
Here are the llvm-testsuite results: (min run time over 2 runs, filtering tiny changes)
Speedups:
| Benchmarks/BenchmarkGame/recursive | 52.39% |
| Benchmarks/VersaBench/beamformer | 20.80% |
| Benchmarks/Misc/pi | 19.97% |
| Benchmarks/Misc/mandel-2 | 19.95% |
| SPEC/CFP2000/188.ammp | 18.72% |
| Benchmarks/McCat/08-main/main | 18.58% |
| Benchmarks/Misc-C++/Large/sphereflake | 18.46% |
| Benchmarks/Olden/power | 17.11% |
| Benchmarks/Misc-C++/mandel-text | 16.47% |
| Benchmarks/Misc/oourafft | 15.94% |
| Benchmarks/Misc/flops-7 | 14.99% |
| Benchmarks/FreeBench/distray | 14.26% |
| SPEC/CFP2006/470.lbm | 14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode | 12.28% |
| Benchmarks/SmallPT/smallpt | 10.36% |
| Benchmarks/Misc-C++/Large/ray | 8.97% |
| Benchmarks/Misc/fp-convert | 8.75% |
| Benchmarks/Olden/perimeter | 7.10% |
| Benchmarks/Bullet/bullet | 7.03% |
| Benchmarks/Misc/mandel | 6.75% |
| Benchmarks/Olden/voronoi | 6.26% |
| Benchmarks/Misc/flops-8 | 5.77% |
| Benchmarks/Misc/matmul_f64_4x4 | 5.19% |
| Benchmarks/MiBench/security-rijndael | 5.15% |
| Benchmarks/Misc/flops-6 | 5.10% |
| Benchmarks/Olden/tsp | 4.46% |
| Benchmarks/MiBench/consumer-lame | 4.28% |
| Benchmarks/Misc/flops-5 | 4.27% |
| Benchmarks/mafft/pairlocalalign | 4.19% |
| Benchmarks/Misc/himenobmtxpa | 4.07% |
| Benchmarks/Misc/lowercase | 4.06% |
| SPEC/CFP2006/433.milc | 3.99% |
| Benchmarks/tramp3d-v4 | 3.79% |
| Benchmarks/FreeBench/pifft | 3.66% |
| Benchmarks/Ptrdist/ks | 3.21% |
| Benchmarks/Adobe-C++/loop_unroll | 3.12% |
| SPEC/CINT2000/175.vpr | 3.12% |
| Benchmarks/nbench | 2.98% |
| SPEC/CFP2000/183.equake | 2.91% |
| Benchmarks/Misc/perlin | 2.85% |
| Benchmarks/Misc/flops-1 | 2.82% |
| Benchmarks/Misc-C++-EH/spirit | 2.80% |
| Benchmarks/Misc/flops-2 | 2.77% |
| Benchmarks/NPB-serial/is | 2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk | 2.33% |
| Benchmarks/BenchmarkGame/n-body | 2.28% |
| Benchmarks/SciMark2-C/scimark2 | 2.27% |
| Benchmarks/Olden/bh | 2.03% |
| skidmarks10/skidmarks | 1.81% |
| Benchmarks/Misc/flops | 1.72% |
Slowdowns:
| Benchmarks/llubenchmark/llu | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d | -5.67% |
| Benchmarks/Adobe-C++/functionobjects | -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8 | -5.00% |
| Benchmarks/Shootout/hash | -2.35% |
| Benchmarks/Prolangs-C++/ocean | -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall | -1.98% |
| Polybench/linear-algebra/kernels/3mm | -1.95% |
| Benchmarks/McCat/09-vor/vor | -1.68% |
llvm-svn: 196516	2013-12-05 17:55:58 +00:00
Andrew Trick	bb1247b9f0	comment typo and reformat llvm-svn: 196513	2013-12-05 17:55:47 +00:00
Juergen Ributzka	d12ccbd343	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. The memory leaks in this version have been fixed. Thanks Alexey for pointing them out. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 195064	2013-11-19 00:57:56 +00:00
Alexey Samsonov	49109a279c	Revert r194865 and r194874. This change is incorrect. If you delete the virtual destructor of both a base class and a subclass, then the following code: Base *foo = new Child(); delete foo; will not call the destructor for members of the Child class. As a result, I observe plenty of memory leaks. Notable examples I investigated are: ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl. llvm-svn: 194997	2013-11-18 09:31:53 +00:00
Juergen Ributzka	dbedae89b9	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 194865	2013-11-15 22:34:48 +00:00
Matthias Braun	88dd0abd2d	Pass LiveQueryResult by value This makes the API a bit more natural to use and makes it easier to make LiveRanges implementation details private. llvm-svn: 192394	2013-10-10 21:28:52 +00:00
Andrew Trick	dc4c1adfc7	Comment typo. llvm-svn: 191312	2013-09-24 17:11:19 +00:00
Andrew Trick	978674b2bc	Allow subtarget selection of the default MachineScheduler and document the interface. The global registry is used to allow command line override of the scheduler selection, but does not work well as the normal selection API. For example, the same LLVM process should be able to target multiple targets or subtargets. llvm-svn: 191071	2013-09-20 05:14:41 +00:00
Andrew Trick	665d3ec3d3	Rename ConvergingScheduler to GenericScheduler. This was an experimental scheduler a year ago. It's now used by several subtargets, both in-order and out-of-order, and it is about to be enabled by default for x86 and armv7. It will be the new GenericScheduler for subtargets that don't provide their own SchedulingStrategy. llvm-svn: 191051	2013-09-19 23:10:59 +00:00
Andrew Trick	6c88b35090	Enable -misched-cyclicpath by default. llvm-svn: 190367	2013-09-09 23:31:14 +00:00
Andrew Trick	e1f7bf2c02	mi-sched: smooth out the cyclicpath heuristic. Arnold's idea. I generally try to avoid stateful heuristics because it can make debugging harder. However, we need a way to prevent the latency priority from dominating, and it somewhat makes sense to schedule aggressively for latency only within an issue group. Swift in particular likes this, and it doesn't hurt anyone else: \| Benchmarks/MiBench/consumer-lame \| 10.39% \| \| Benchmarks/Misc/himenobmtxpa \| 9.63% \| llvm-svn: 190360	2013-09-09 22:28:08 +00:00
Andrew Trick	b248b4a1de	mi-sched: cleanup register pressure update, remove a FIXME. llvm-svn: 190181	2013-09-06 17:32:47 +00:00
Andrew Trick	c573cd905a	mi-sched: improve regpressure tracing. llvm-svn: 190180	2013-09-06 17:32:44 +00:00
Andrew Trick	7609b7d1b5	mi-sched: print tree size in -view-misched-dags llvm-svn: 190179	2013-09-06 17:32:42 +00:00
Andrew Trick	ffdbefb90c	mi-sched: register pressure update tracing. llvm-svn: 190178	2013-09-06 17:32:39 +00:00
Andrew Trick	ddffae9027	mi-sched: Reorder Cyclicpath (latency) and CriticalMax (pressure) heuristics. The latency based scheduling could induce spills in some cases. llvm-svn: 190177	2013-09-06 17:32:36 +00:00
Andrew Trick	75e411cc8e	Added MachineSchedPolicy. Allow subtargets to customize the generic scheduling strategy. This is convenient for targets that don't need to add new heuristics by specializing the strategy. llvm-svn: 190176	2013-09-06 17:32:34 +00:00
Andrew Trick	ed20075d19	mi-sched: Force bottom up scheduling for generic targets. Fast register pressure tracking currently only takes effect during bottom up scheduling. Forcing this is a bit faster and simpler for targets that don't have many scheduling constraints and don't need top-down scheduling. llvm-svn: 190014	2013-09-04 23:54:00 +00:00
Andrew Trick	b05db8e0b9	comment typo llvm-svn: 189997	2013-09-04 21:12:05 +00:00
Andrew Trick	2a749ee0b9	Remove dead subtree limit code. llvm-svn: 189995	2013-09-04 21:00:20 +00:00
Andrew Trick	856ecd9ab3	-view-misched-dags, better pruning. llvm-svn: 189994	2013-09-04 21:00:18 +00:00
Andrew Trick	ef54c59490	mi-sched: DEBUG cleanup, call tracePick for unidirectional scheduling. llvm-svn: 189993	2013-09-04 21:00:16 +00:00
Andrew Trick	1ab16d9ecf	80 columns llvm-svn: 189992	2013-09-04 21:00:13 +00:00
Andrew Trick	66c3dfbf8c	mi-sched: Suppress register pressure tracking when the scheduling window is too small. If the instruction window is < NumRegs/2, pressure tracking is not likely to be effective. The scheduler has to process a very large number of tiny blocks. We want this to be fast. llvm-svn: 189991	2013-09-04 21:00:11 +00:00
Andrew Trick	a6e877707f	mi-sched: Load clustering is a bit too expensive to enable unconditionally. llvm-svn: 189990	2013-09-04 21:00:08 +00:00
Andrew Trick	8c699c93b2	mi-sched: Reuse an invalid HazardRecognizer to save compile time. llvm-svn: 189989	2013-09-04 21:00:05 +00:00
Andrew Trick	310190e21f	mi-sched: bypass heuristic checks when regpressure tracking is disabled. llvm-svn: 189988	2013-09-04 21:00:02 +00:00
Andrew Trick	b6e74712b6	Added -misched-regpressure option. Register pressure tracking is half the complexity of the scheduler. It's useful to be able to turn it off for compile time and performance comparisons. llvm-svn: 189987	2013-09-04 20:59:59 +00:00
Andrew Trick	2c4f8b7ee8	Fix my previous checkin to updatePressureDiffs. There was one case that we could hit a DebugValue where I didn't think to check. DebugValues are evil. No checkinable test case, sorry. It's an obvious fix. llvm-svn: 189717	2013-08-31 05:17:58 +00:00
Andrew Trick	2bc74c2887	mi-sched: update PressureDiffs on-the-fly for liveness. This removes all expensive pressure tracking logic from the scheduling critical path of node comparison. llvm-svn: 189643	2013-08-30 04:36:57 +00:00
Andrew Trick	b1a45b6c61	mi-sched: improve the generic register pressure comparison. Only compare pressure within the same set. When multiple sets are affected, we prioritize the most constrained set. llvm-svn: 189641	2013-08-30 04:27:29 +00:00
Andrew Trick	1a8313458f	mi-sched: Precompute a PressureDiff for each instruction, adjust for liveness later. Created SUPressureDiffs array to hold the per node PDiff computed during DAG building. Added a getUpwardPressureDelta API that will soon replace the old one. Compute PressureDelta here from the precomputed PressureDiffs. Updating for liveness will come next. llvm-svn: 189640	2013-08-30 03:49:48 +00:00
Andrew Trick	ef80f50058	comment typo llvm-svn: 189635	2013-08-30 02:02:12 +00:00
Andrew Trick	483f4199f3	Comment and revise the cyclic critical path code. This should be much more clear now. It's still disabled pending testing. llvm-svn: 189597	2013-08-29 18:04:49 +00:00
Andrew Trick	c01b00400d	Adds cyclic critical path computation and heuristics, temporarily disabled. Estimate the cyclic critical path within a single block loop. If the acyclic critical path is longer, then the loop will exhaust OOO resources after some number of iterations. If the lag between the acyclic critical path and cyclic critical path is longer than the time it takes to issue those loop iterations, then aggressively schedule for latency. llvm-svn: 189120	2013-08-23 17:48:43 +00:00
Andrew Trick	a53e101627	mi-sched: Don't call MBB.size() in initSUnits. The driver already has instr count. This fixes a pathological compile time problem with very large blocks and lots of scheduling boundaries. llvm-svn: 189116	2013-08-23 17:48:33 +00:00
Andrew Trick	2f7667e018	Confusing comment typo. llvm-svn: 187895	2013-08-07 17:20:32 +00:00
Andrew Trick	9c17eab761	MI Sched: Track live-thru registers. When registers must be live throughout the scheduling region, increase the limit for the register class. Once we exceed the original limit, they will be spilled, and there's no point further reducing pressure. This isn't a perfect heuristic but avoids a situation where the scheduler could become trapped by trying to achieve the impossible. llvm-svn: 187436	2013-07-30 19:59:12 +00:00
Andrew Trick	d9761776bc	MI Sched fix: assert "Disconnected LRG within the scheduling region." llvm-svn: 187435	2013-07-30 19:59:08 +00:00
Andrew Trick	401b6959ae	MI Sched: Register pressure heuristics. Consider which set is being increased or decreased before comparing. llvm-svn: 187110	2013-07-25 07:26:35 +00:00


192 Commits