llvm-project

Commit Graph

Author	SHA1	Message	Date
Benjamin Kramer	951b15eb09	Revamp error checking in the ms inline asm parser. - Actually abort when an error occurred. - Check that the frontend lookup worked when parsing length/size/type operators. Tested by a clang test. PR18096. llvm-svn: 196044	2013-12-01 11:47:42 +00:00
Hal Finkel	42daeae9bd	Add a scheduling model (with itinerary) for the PPC POWER7 This adds a scheduling model for the POWER7 (P7) core, and enables the machine-instruction scheduler when targeting the P7. Scheduling for the P7, like earlier ooo PPC cores, requires considering both dispatch group hazards, and functional unit resources and latencies. These are both modeled in a combined itinerary. Dispatch group formation is still handled by the post-RA scheduler (which still needs to be updated for the P7, but nevertheless does a pretty good job). One interesting aspect of this change is that I've also enabled the use of AA during CodeGen for the P7 (just as it is for the embedded cores). The benchmark results seem to support this decision (see below), and while this is normally useful for in-order cores, and not for ooo cores like the P7, I think that the dispatch slot hazards are enough like in-order resources to make the AA useful. Test suite significant performance differences (where negative is a speedup, and positive is a regression) vs. the current situation: MultiSource/Benchmarks/BitBench/drop3/drop3 with AA: N/A without AA: -28.7614% +/- 19.8356% (significantly against AA) MultiSource/Benchmarks/FreeBench/neural/neural with AA: -17.7406% +/- 11.2712% without AA: N/A (significantly in favor of AA) MultiSource/Benchmarks/SciMark2-C/scimark2 with AA: -11.2079% +/- 1.80543% without AA: -11.3263% +/- 2.79651% MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt with AA: -41.8649% +/- 17.0053% without AA: -34.5256% +/- 23.7072% MultiSource/Benchmarks/mafft/pairlocalalign with AA: 25.3016% +/- 17.8614% without AA: 38.6629% +/- 14.9391% (significantly in favor of AA) MultiSource/Benchmarks/sim/sim with AA: N/A without AA: 13.4844% +/- 7.18195% (significantly in favor of AA) SingleSource/Benchmarks/BenchmarkGame/Large/fasta with AA: 15.0664% +/- 6.70216% without AA: 12.7747% +/- 8.43043% SingleSource/Benchmarks/BenchmarkGame/puzzle with AA: 82.2713% +/- 26.3567% without AA: 75.7525% +/- 41.1842% SingleSource/Benchmarks/Misc/flops-2 with AA: -37.1621% +/- 20.7964% without AA: -35.2342% +/- 20.2999% (significantly in favor of AA) These are 99.5% confidence intervals from 5 runs per configuration. Regarding the choice to turn on AA during CodeGen, of these results, four seem significantly in favor of using AA, and one seems significantly against. I'm not making this decision based on these numbers alone, but these results seem consistent with results I have from other tests, and so I think that, on balance, using AA is a win. llvm-svn: 195981	2013-11-30 20:55:12 +00:00
Hal Finkel	46402a4211	Split some PPC itinerary classes In preparation for adding scheduling definitions for the POWER7, split some PPC itinerary classes so that the P7's latencies and hazards can be better described. For the most part, this means differentiating indexed from non-index pre-increment loads and stores. Also, differentiate single from double-precision sqrt. No functionality change intended (except for a more-specific latency for single-precision sqrt on the A2). llvm-svn: 195980	2013-11-30 20:41:13 +00:00
Zoran Jovanovic	9d86e26e62	Fixed issue with microMIPS long branch. llvm-svn: 195975	2013-11-30 19:12:28 +00:00
Daniel Sanders	7fd68d6018	[mips][msa] MSA loads and stores have a 10-bit offset. Account for this when lowering FrameIndex. This prevents the compiler from emitting invalid ld.[bhwd]'s and st.[bhwd]'s when the stack frame is between 512 and 32,768 bytes in size. llvm-svn: 195973	2013-11-30 13:47:57 +00:00
Daniel Sanders	7153414768	[mips][msa] A small refactor to reduce patch noise in my next commit No functional change. An if-statement has been split into two nested if-statements. llvm-svn: 195972	2013-11-30 13:15:21 +00:00
Reed Kotler	ad450f239f	Part 1 of 3 patches that completes very long conditional branches in constant islands for Mips16. We introduce JalB16 as a synonym for Jal16. It makes it easier to read and is also necessary because Jal16 is a call instruction but JalB16 is being used as a branch. Various parts of LLVM will not work properly even in this late stage of the backend if we use what was declared as a call instruction to function as a branch. For one, basic block labels may not get emitted in some situations. llvm-svn: 195968	2013-11-29 22:32:56 +00:00
Zoran Jovanovic	1bc3cce040	Revert revision 195965. llvm-svn: 195967	2013-11-29 22:10:02 +00:00
Zoran Jovanovic	ff2a40ce4d	Fixed issue with microMIPS long branch. llvm-svn: 195965	2013-11-29 21:41:24 +00:00
Hal Finkel	1df3205e8c	Adjust PPC A2 input operand latencies On the PPC A2, instructions are only issued after their input operands are ready. Model this by specifying that input operands are read at dispatch (0 cycles after issue). This changes all input operand latencies from 1 to 0. Significant test-suite performance changes (these are 99.5% confidence intervals on 6 runs for both before and after): speedups: MultiSource/Benchmarks/sim/sim -1.21915% +/- 0.175063% MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -1.23946% +/- 1.05133% SingleSource/Benchmarks/Misc/flops-2 -1.24237% +/- 0.681362% MultiSource/Applications/JM/lencod/lencod -1.33992% +/- 0.757498% MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt -1.51802% +/- 1.21468% MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt -2.18818% +/- 1.28605% MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt -2.21977% +/- 1.19499% SingleSource/Benchmarks/BenchmarkGame/spectral-norm -2.29822% +/- 0.671871% MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl -2.40975% +/- 0.355931% SingleSource/Benchmarks/Misc/fp-convert -2.41899% +/- 1.04751% MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl -2.50349% +/- 0.126765% SingleSource/Benchmarks/Misc/flops-3 -3.00214% +/- 0.700795% MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt -3.56995% +/- 3.2929% MultiSource/Applications/sgefa/sgefa -4.24908% +/- 2.00413% MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk -18.1294% +/- 3.96489% regressions: MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl 1.03249% +/- 0.178547% MultiSource/Applications/hexxagon/hexxagon 1.16597% +/- 0.285235% MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt 1.39576% +/- 1.07855% SingleSource/Benchmarks/Misc-C++/stepanov_v1p2 1.71539% +/- 0.173182% MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1 1.90013% +/- 0.866472% MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl 2.39854% +/- 1.05914% MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl 2.4402% +/- 0.817904% MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl 5.87997% +/- 3.3172% MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc 9.02643% +/- 5.79591% MultiSource/Benchmarks/VersaBench/bmm/bmm 10.3517% +/- 1.227% Obviously, there are data points on both sides of this; but I think, overall, this supports making the change. llvm-svn: 195951	2013-11-29 07:04:59 +00:00
Hal Finkel	5a7162f36b	Create a PPC440 SchedMachineModel Some of the older PPC processor definitions don't have associated SchedMachineModels; correct this for the PPC440. llvm-svn: 195949	2013-11-29 06:32:17 +00:00
Hal Finkel	4035e8d86a	Fixup PPC440 load/store operand latencies The operand latencies for loads and stores in the PPC440 itinerary were wrong (the store operands are all inputs, and the "with update" (pre-increment) instructions need a latency for the additional output). llvm-svn: 195948	2013-11-29 06:19:43 +00:00
Hal Finkel	a10bd1d23a	Adjust PPC440 operand latencies The operand latencies for the PPC440 should be specified relative to dispatch, not relative to the initial fetch-and-decode stages. Because most instructions (ignoring bypass) wait in dispatch until their operands are ready, this is modeled as reading input operands "at dispatch" (0 cycles after issue), and so every input and output operand has 4 cycles subtracted from it. This could alter scheduling slightly, but I don't expect a large effect. llvm-svn: 195947	2013-11-29 05:59:00 +00:00
Hal Finkel	dd06369913	Don't model the fetch and decode units for the PPC440 Modeling the fetch and decode units in the PPC440 itinerary does not add anything to the hazard detection capability (and so modeling them just wastes compile time). No functionality change intended. llvm-svn: 195946	2013-11-29 05:58:38 +00:00
Lang Hames	39609996d9	Refactor a lot of patchpoint/stackmap related code to simplify and make it target independent. Most of the x86 specific stackmap/patchpoint handling was necessitated by the use of the native address-mode format for frame index operands. PEI has now been modified to treat stackmap/patchpoint similarly to DEBUG_INFO, allowing us to use a simple, platform independent register/offset pair for frame indexes on stackmap/patchpoints. Notes: - Folding is now platform independent and automatically supported. - Emitting patchpoints with direct memory references now just involves calling the TargetLoweringBase::emitPatchPoint utility method from the target's XXXTargetLowering::EmitInstrWithCustomInserter method. (See X86TargetLowering for an example). - No more ugly platform-specific operand parsers. This patch shouldn't change the generated output for X86. llvm-svn: 195944	2013-11-29 03:07:54 +00:00
Hao Liu	ba38eee8ac	AArch64: The pattern match should check the range of the immediate value; otherwise we can generate illegal instructions, e.g. shrn2 v0.4s, v1.2d, #35. The legal range should be in [1, 16]. llvm-svn: 195941	2013-11-29 02:11:22 +00:00
Jiangning Liu	c429c00f3b	Add missing pattern for supporting intrinsic function vbsl_f64 with argument double floating point. llvm-svn: 195938	2013-11-29 01:37:15 +00:00
Kevin Qin	337cfcc83c	[AArch64 NEON] Fix an assertion failure when disassembling the SHLL instruction. llvm-svn: 195936	2013-11-29 01:29:16 +00:00
Rafael Espindola	d5bd5a4716	Refactor to remove a bit of duplication. No functionality change. llvm-svn: 195933	2013-11-28 20:12:44 +00:00
Benjamin Kramer	ea1982aff9	Silence sign-compare warning and reduce nesting. No functionality change. llvm-svn: 195932	2013-11-28 19:58:56 +00:00
NAKAMURA Takumi	226e10edff	[CMake] Let add_public_tablegen_target() provide intrinsics_gen, too. I think, in principle, intrinsics_gen may be added explicitly. That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen. Almost all source files depend on both CommonTableGen and intrinsics_gen. Explicit add_dependencies() have been pruned under lib/Target. llvm-svn: 195929	2013-11-28 17:04:31 +00:00
NAKAMURA Takumi	ce746c6c49	[CMake] Make add_public_tablegen_target responsible for providing the dependency on CommonTableGen. add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS. LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope. llvm-svn: 195927	2013-11-28 17:04:04 +00:00
Rafael Espindola	848493d886	The global prefix is always one char. Don't use a string for it. llvm-svn: 195926	2013-11-28 17:00:49 +00:00
NAKAMURA Takumi	b2abd160b3	[CMake] Prune include_directories() in llvm/lib/Target, take #2 . I forgot to commit them. They were staging in my local repo. llvm-svn: 195924	2013-11-28 15:30:37 +00:00
Daniel Sanders	063b74ad4e	[mips] Revert test commit r195922. llvm-svn: 195923	2013-11-28 15:26:33 +00:00
Daniel Sanders	eb16443fca	[mips] A test commit to test my Herald and Audit workflow Will be reverted in the next commit llvm-svn: 195922	2013-11-28 15:25:43 +00:00
NAKAMURA Takumi	413518f1f8	[CMake] Prune include_directories() in llvm/lib/Target. add_llvm_target() sets them. llvm-svn: 195921	2013-11-28 14:53:30 +00:00
NAKAMURA Takumi	979e604d8c	Add newline at eof. llvm-svn: 195920	2013-11-28 14:52:52 +00:00
Rafael Espindola	3e3a3f1f85	Use the mangler consistently instead of using getGlobalPrefix directly. llvm-svn: 195911	2013-11-28 08:59:52 +00:00
Hal Finkel	92720ab1b2	Don't share functional units among the PPC itineraries Instead of sharing functional unit names between the various PPC itineraries, give each core its own unit names prefixed with the core name. This follows the convention used by other backends (such as ARM), and removes a non-obvious ordering dependency between the various PPCSchedule*.td files. No functionality change intended. llvm-svn: 195908	2013-11-28 06:05:59 +00:00
Jiangning Liu	4bc9dbd846	Remove the variable only used by assert to avoid the build failure caused by build options [-Werror,-Wunused-variable]. llvm-svn: 195905	2013-11-28 01:34:55 +00:00
Hao Liu	f9f468abee	AArch64: Fix a bug about disassembling post-index load single element to 4 vectors llvm-svn: 195903	2013-11-28 01:07:45 +00:00
Reed Kotler	0d409e2dfe	Check in conditional branches for constant islands. Still need to finish conditional branches for very large targets. That will be the next small patch. Everything now should in principle work as well (functionality wise) as without constant islands, so we decided at Mips/Imagination to make constant islands the default for Mips16 now so that it will get exercised a lot. This port is still experimental, though hopefully soon we will change the status. Some more cleanup and code review is in order but things are converging fast. llvm-svn: 195902	2013-11-28 00:56:37 +00:00
Akira Hatanaka	f6109e4ad7	[mips] Redefine TAILCALL as a pseudo instruction. No functionality change. llvm-svn: 195896	2013-11-27 23:58:32 +00:00
Akira Hatanaka	f9a0ec4fc4	Add MipsOptimizePICCall.cpp to CMakeLists.txt. llvm-svn: 195894	2013-11-27 23:47:25 +00:00
Akira Hatanaka	168d4e5b20	[mips] Implement the following optimizations using dominance information to make PIC calls a little more efficient: 1. Remove instructions setting up $gp if it is known that a function has been called at least once. 2. Save the address of a called function in a register instead of loading it from the GOT at every call site. llvm-svn: 195892	2013-11-27 23:38:42 +00:00
Hal Finkel	3e5a360ba3	Add IIC_ prefix to PPC instruction-class names This adds the IIC_ prefix to the instruction itinerary class names, giving the PPC backend a naming convention for itinerary classes that is more consistent with that used by the X86 and ARM backends. Instruction scheduling in the PPC backend needs a bunch of cleanup and improvement (especially for the ooo cores). This is just a preliminary step. No functionality change intended. llvm-svn: 195890	2013-11-27 23:26:09 +00:00
Rafael Espindola	c90584b6f6	Don't set GlobalPrefix to the default value. llvm-svn: 195884	2013-11-27 21:57:54 +00:00
Rafael Espindola	429e3fb068	The R600 has its own asm printer which doesn't use GlobalPrefix. Drop it. llvm-svn: 195883	2013-11-27 21:52:37 +00:00
Tom Stellard	175e7a8c97	R600: Expand vector FABS NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195881	2013-11-27 21:23:39 +00:00
Tom Stellard	c149dc02d3	R600/SI: Implement spilling of SGPRs v5 SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions. v2: - Fix encoding of Lane Mask - Use correct register flags, so we don't overwrite the low dword when restoring multi-dword registers. v3: - Register spilling seems to hang the GPU, so replace all shaders that need spilling with a dummy shader. v4: - Fix *LANE definitions - Change destination reg class for 32-bit SMRD instructions v5: - Remove small optimization that was crashing Serious Sam 3. https://bugs.freedesktop.org/show_bug.cgi?id=68224 https://bugs.freedesktop.org/show_bug.cgi?id=71285 NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195880	2013-11-27 21:23:35 +00:00
Tom Stellard	859199dad8	R600/SI: Use SGPR_32 register class for 32-bit SMRD outputs Writing to the M0 register from an SMRD instruction hangs the GPU, so we need to use the SGPR_32 register class, which does not include M0. NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195879	2013-11-27 21:23:29 +00:00
Tom Stellard	4d566b2edf	R600: Add support for ISD::FROUND NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195878	2013-11-27 21:23:20 +00:00
Rafael Espindola	3dc549dbe3	Remove dead code. MO_ExternalSymbol and MO_JumpTableIndex don't show up in inline asm. llvm-svn: 195861	2013-11-27 18:38:14 +00:00
Rafael Espindola	52434f9673	Convert two if sequences to switches. llvm-svn: 195859	2013-11-27 18:26:51 +00:00
Rafael Espindola	ed20f478bc	Use a switch. llvm-svn: 195857	2013-11-27 18:18:24 +00:00
Rafael Espindola	c5c7bb6b20	Remove more dead code now that this is only used for inline asm. MO_ConstantPoolIndex is handled in printLeaMemReference. MO_JumpTableIndex and MO_ExternalSymbol don't show up in inline asm. llvm-svn: 195847	2013-11-27 15:13:06 +00:00
Jiangning Liu	97aa8cf8b7	Fix the AArch64 NEON bug exposed by checking constant integer argument range of ACLE intrinsics. llvm-svn: 195843	2013-11-27 14:02:25 +00:00
Rafael Espindola	e370147b8c	Convert more methods in static helpers. llvm-svn: 195826	2013-11-27 07:34:09 +00:00
Rafael Espindola	7caa135677	Convert these methods into static functions. llvm-svn: 195825	2013-11-27 07:14:26 +00:00

26336 Commits