llvm-project

Commit Graph

Author	SHA1	Message	Date
Akira Hatanaka	c6496e2cb6	Test case for MIPS long branch pass. llvm-svn: 158438	2012-06-14 02:12:21 +00:00
Akira Hatanaka	843aca9328	Fix test cases. llvm-svn: 158435	2012-06-14 01:21:00 +00:00
Akira Hatanaka	df5205ef3d	Implement a DAGCombine in MipsISelLowering.cpp which transforms the following pattern: (add v0, (add v1, abs_lo(tjt))) => (add (add v0, v1), abs_lo(tjt)) "tjt" is a TargetJumpTable node. llvm-svn: 158419	2012-06-13 20:33:18 +00:00
Akira Hatanaka	1daf8c2a16	Set a higher value for maxStoresPerMemcpy in MipsISelLowering.cpp. llvm-svn: 158414	2012-06-13 19:33:32 +00:00
Akira Hatanaka	f0273603f5	Implement fastcc calling convention for MIPS. llvm-svn: 158410	2012-06-13 18:06:00 +00:00
Richard Osborne	ab7d788eb5	Fix pattern for MKMSK instruction. llvm-svn: 158409	2012-06-13 17:59:12 +00:00
Craig Topper	71dc02d659	Fix intrinsics for XOP frczss/sd instructions. These instructions only take one source register and zero the upper bits of the destination rather than preserving them. llvm-svn: 158396	2012-06-13 07:18:53 +00:00
Akira Hatanaka	5fa541231b	disable use of directive .set nomicromips until this directive is pushed in gas to open source fsf Patch by Reed Kotler. llvm-svn: 158381	2012-06-13 02:41:14 +00:00
Andrew Trick	344fb64fa3	sched: fix latency of memory dependence chain edges for consistency. For store->load dependencies that may alias, we should always use TrueMemOrderLatency, which may eventually become a subtarget hook. In effect, we should guarantee at least TrueMemOrderLatency on at least one DAG path from a store to a may-alias load. This should fix the standard mode as well as -enable-aa-sched-mi. llvm-svn: 158380	2012-06-13 02:39:03 +00:00
Chad Rosier	c6916f88a8	[arm-fast-isel] Add support for -arm-long-calls. Patch by Jush Lu <jush.msn@gmail.com>. llvm-svn: 158368	2012-06-12 19:25:13 +00:00
Jakob Stoklund Olesen	e782fa649f	Fix test that depends on register allocation. The test is really checking the prolog/epilog load/store multiple formation. llvm-svn: 158328	2012-06-11 21:14:28 +00:00
Jakob Stoklund Olesen	4e28777465	Fix test case to work on ARM. Patch by James Benton! llvm-svn: 158316	2012-06-11 16:01:14 +00:00
Bill Wendling	4b79647a6e	Re-enable the CMN instruction. We turned off the CMN instruction because it had semantics which we weren't getting correct. If we are comparing with an immediate, then it's okay to use the CMN instruction. <rdar://problem/7569620> llvm-svn: 158302	2012-06-11 08:07:26 +00:00
Hal Finkel	4e9f1a859f	Enable ILP scheduling for all nodes by default on PPC. Over the entire test-suite, this has an insignificantly negative average performance impact, but reduces some of the worst slowdowns from the anti-dep. change (r158294). Largest speedups: SingleSource/Benchmarks/Stanford/Quicksort - 28% SingleSource/Benchmarks/Stanford/Towers - 24% SingleSource/Benchmarks/Shootout-C++/matrix - 23% MultiSource/Benchmarks/SciMark2-C/scimark2 - 19% MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15% (matrix and automotive-bitcount were both in the top-5 slowdown list from the anti-dep. change) Largest slowdowns: MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28% MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26% MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21% SingleSource/Benchmarks/CoyoteBench/lpbench - 20% MultiSource/Applications/d/make_dparser - 16% llvm-svn: 158296	2012-06-10 19:32:29 +00:00
Hal Finkel	2edfbddcf0	Improve ext/trunc patterns on PPC64. The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that would leave self-moves in the final assembly. Replacing those patterns with ones based on the SUBREG builtins yields better-looking code. Thanks to Jakob and Owen for their suggestions in this matter. llvm-svn: 158283	2012-06-09 22:10:19 +00:00
Craig Topper	3352ba55b9	Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument. llvm-svn: 158278	2012-06-09 16:46:13 +00:00
Hal Finkel	eb50c2d4a4	Enable tail merging on PPC. Tail merging had been disabled on PPC because it would disturb bundling decisions made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions are made during post-RA scheduling, and tail merging is generally beneficial (the average test-suite speedup is insignificantly positive). Largest test-suite speedups: MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30% MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23% SingleSource/Benchmarks/Shootout-C++/ary - 21% SingleSource/Benchmarks/Stanford/Queens - 17% Largest slowdowns: MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24% MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22% MultiSource/Applications/JM/ldecod/ldecod - 14% MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9% This is improved by using full (instead of just critical) anti-dependency breaking, but doing so still causes miscompiles and so cannot yet be enabled by default. llvm-svn: 158259	2012-06-09 03:14:50 +00:00
Jakob Stoklund Olesen	33a1b416ac	Don't run RAFast in the optimizing regalloc pipeline. The fast register allocator is not supposed to work in the optimizing pipeline. It doesn't make sense to compute live intervals, run full copy coalescing, and then run RAFast. Fast register allocation in the optimizing pipeline is better done by RABasic. llvm-svn: 158242	2012-06-08 23:15:12 +00:00
Hal Finkel	c6b5debb40	Enable PPC CTR loop formation by default. Thanks to Jakob's help, this now causes no new test suite failures! Over the entire test suite, this gives an average 1% speedup. The largest speedups are: SingleSource/Benchmarks/Misc/pi - 108% SingleSource/Benchmarks/CoyoteBench/lpbench - 54% MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50% SingleSource/Benchmarks/Shootout/ary3 - 32% SingleSource/Benchmarks/Shootout-C++/matrix - 30% The largest slowdowns are: MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30% MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25% MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22% MultiSource/Applications/d/make_dparser - -14% SingleSource/Benchmarks/Shootout-C++/ary - -13% In light of these slowdowns, additional profiling work is obviously needed! llvm-svn: 158223	2012-06-08 19:19:53 +00:00
Manman Ren	bf86b295bb	Test case for r158160 llvm-svn: 158218	2012-06-08 18:42:37 +00:00
Chad Rosier	3d464d8068	Fix a crash in APInt::lshr when shiftAmt > BitWidth. Patch by James Benton <jbenton@vmware.com>. llvm-svn: 158213	2012-06-08 18:04:52 +00:00
NAKAMURA Takumi	5412cef77d	test/CodeGen/Generic/APIntLoadStore.ll: Mark as XFAIL:ppc since r157911. llvm-svn: 158209	2012-06-08 16:28:06 +00:00
Hal Finkel	821e00121c	Disable the PPC CTR-Loops pass by default. The pass itself works well, but something in the Machine* infrastructure does not understand terminators which define registers. Without the ability to use the block-placement pass, etc. this causes performance regressions (and so is turned off by default). Turning off the analysis turns off the problems with the Machine* infrastructure. llvm-svn: 158206	2012-06-08 15:38:25 +00:00
Hal Finkel	8b01503ee5	Fix a bug in the new PPC CTR-Loops pass. The code which tests for an induction operation cannot assume that any ADDI instruction will have a register operand because the operand could also be a frame index; for example: %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16 llvm-svn: 158205	2012-06-08 15:38:23 +00:00
Hal Finkel	96c2d4d945	Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code. This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are no longer otherwise used. Also, invalid preheader DebugLoc is not used. llvm-svn: 158204	2012-06-08 15:38:21 +00:00
Manman Ren	2cdc8afccf	X86: optimize generated code for integer ABS This patch will generate the following for integer ABS: movl %edi, %eax negl %eax cmovll %edi, %eax INSTEAD OF movl %edi, %ecx sarl $31, %ecx leal (%rdi,%rcx), %eax xorl %ecx, %eax There exists a target-independent DAG combine for integer ABS, which converts integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. This is implemented in PerformXorCombine. rdar://10695237 llvm-svn: 158175	2012-06-07 22:39:10 +00:00
Rafael Espindola	55d1145bd5	Use a base register instead of an index register with the local dynamic model. Fixes pr13048. llvm-svn: 158158	2012-06-07 18:39:19 +00:00
Manman Ren	ae02c5a93e	X86: replace SUB with CMP if possible This patch will optimize the following movq %rdi, %rax subq %rsi, %rax cmovsq %rsi, %rdi movq %rdi, %rax to cmpq %rsi, %rdi cmovsq %rsi, %rdi movq %rdi, %rax Perform this optimization if the actual result of SUB is not used. rdar: 11540023 llvm-svn: 158126	2012-06-07 00:42:47 +00:00
Manman Ren	9c9641812c	Revert r157755. The commit is intended to fix rdar://11540023. It is implemented as part of peephole optimization. We can actually implement this in the SelectionDAG lowering phase. llvm-svn: 158122	2012-06-06 23:53:03 +00:00
Chad Rosier	5d6f01ad77	Add support for dynamic stack realignment in the presence of dynamic allocas on X86. rdar://11496434 llvm-svn: 158087	2012-06-06 17:37:40 +00:00
Joel Jones	7f2ac7a2c8	Revert commit r157966 llvm-svn: 157972	2012-06-05 00:47:21 +00:00
Joel Jones	d08534f82e	This change handles another case for generating the bic instruction when a compile time constant is known. This occurs when implicitly zero extending function arguments from 16 bits to 32 bits. <rdar://problem/11481151> llvm-svn: 157966	2012-06-04 23:38:57 +00:00
Akira Hatanaka	3ee0405231	Add a test case for mips64 unaligned load/store instructions. llvm-svn: 157939	2012-06-04 17:57:06 +00:00
Akira Hatanaka	b964932e70	Rename test/CodeGen/Mips/load-shift-left-right.ll. llvm-svn: 157938	2012-06-04 17:50:36 +00:00
Roman Divacky	e3f15c98d1	Implement local-exec TLS on PowerPC. llvm-svn: 157935	2012-06-04 17:36:38 +00:00
Nadav Rotem	b7bb72e4f3	Remove the "-promote-elements" flag. This flag is now enabled by default. llvm-svn: 157925	2012-06-04 11:27:21 +00:00
Hal Finkel	595817eebe	Enable generating PPC pre-increment (r+imm) instructions by default. It seems that this no longer causes test suite failures on PPC64 (after r157159), and often gives a performance benefit, so it can be enabled by default. llvm-svn: 157911	2012-06-04 02:21:00 +00:00
Craig Topper	79dbb0c6e4	Rename FMA3 feature flag to just FMA to match gcc so it can be added to clang. llvm-svn: 157903	2012-06-03 18:58:46 +00:00
Craig Topper	fd53b80219	Rename fma4 intrinsics to just fma since they are now used for both FMA4 and FMA3. Autoupgrade support coming in a separate commit. llvm-svn: 157898	2012-06-03 07:26:46 +00:00
Manman Ren	5097e4f38a	Revert r157831 llvm-svn: 157896	2012-06-03 03:14:24 +00:00
Craig Topper	29eafea292	Use sse_load_f32/64 for scalar FMA3 intrinsic patterns instead of 128-bit loads to match instruction behavior. llvm-svn: 157895	2012-06-03 01:40:43 +00:00
Manman Ren	be10421c17	ARM: add testing case for struct byval rdar://9877866 llvm-svn: 157876	2012-06-02 05:37:44 +00:00
Akira Hatanaka	27512b167b	Add another test case which tests Mips' unaligned load/store instructions. llvm-svn: 157874	2012-06-02 01:13:10 +00:00
Akira Hatanaka	63c0e2c58c	Fix test cases in test/CodeGen/Mips. llvm-svn: 157868	2012-06-02 00:05:45 +00:00
Manman Ren	879ca9d47d	X86: peephole optimization to remove cmp instruction This patch will optimize the following: sub r1, r3 cmp r3, r1 or cmp r1, r3 bge L1 TO sub r1, r3 bge L1 or ble L1 If the branch instruction can use flag from "sub", then we can eliminate the "cmp" instruction. llvm-svn: 157831	2012-06-01 19:49:33 +00:00
Chris Lattner	b1359894f3	testcase for PR13006, thanks to Duncan for filing it. llvm-svn: 157824	2012-06-01 18:19:46 +00:00
Hans Wennborg	789acfb63d	Implement the local-dynamic TLS model for x86 (PR3985) This implements codegen support for accesses to thread-local variables using the local-dynamic model, and adds a clean-up pass so that the base address for the TLS block can be re-used between local-dynamic access on an execution path. llvm-svn: 157818	2012-06-01 16:27:21 +00:00
Craig Topper	00649d5111	Remove fadd(fmul) patterns for FMA3. This needs to be implemented by paying attention to FP_CONTRACT and matching @llvm.fma which is not available yet. This will allow us to enable intrinsic use at least though. llvm-svn: 157804	2012-06-01 06:07:48 +00:00
Chris Lattner	466076b95f	enhance the logic for looking through tailcalls to look through transparent casts in multiple-return value scenarios, like what happens on X86-64 when returning small structs. llvm-svn: 157800	2012-06-01 05:29:15 +00:00
Chris Lattner	182fe3eef1	enhance getNoopInput to know about vector<->vector bitcasts of legal types, as well as int<->ptr casts. This allows us to tailcall functions with some trivial casts between the call and return (i.e. because the return types disagree). llvm-svn: 157798	2012-06-01 05:16:33 +00:00
Chris Lattner	22afea7689	add some simple 64-bit tail call tests. llvm-svn: 157797	2012-06-01 05:03:31 +00:00
Chris Lattner	21b1e6bbdc	merge some tests. llvm-svn: 157795	2012-06-01 05:00:54 +00:00
Chris Lattner	d82ae12d8c	rename test llvm-svn: 157794	2012-06-01 04:58:50 +00:00
Owen Anderson	ff458f89aa	Make this testcase independent of register allocation. llvm-svn: 157761	2012-05-31 18:07:02 +00:00
Manman Ren	9bccb64e56	X86: replace SUB with CMP if possible This patch will optimize the following movq %rdi, %rax subq %rsi, %rax cmovsq %rsi, %rdi movq %rdi, %rax to cmpq %rsi, %rdi cmovsq %rsi, %rdi movq %rdi, %rax Perform this optimization if the actual result of SUB is not used. rdar: 11540023 llvm-svn: 157755	2012-05-31 17:20:29 +00:00
Elena Demikhovsky	602f3a26d6	Added FMA3 Intel instructions. I disabled FMA3 autodetection, since the result may differ from expected for some benchmarks. I added tests for CodeGen and intrinsics. I did not change llvm.fma.f32/64 - it may be done later. llvm-svn: 157737	2012-05-31 09:20:20 +00:00
Craig Topper	c1ac05dad5	Add intrinsic for pclmulqdq instruction. llvm-svn: 157731	2012-05-31 04:37:40 +00:00
Jakob Stoklund Olesen	05e2245fc6	Prioritize smaller register classes for urgent evictions. It helps compile exotic inline asm. In the test case, normal GR32 virtual registers use up eax-edx so the final GR32_ABCD live range has no registers left. Since all the live ranges were tiny, we had no way of prioritizing the smaller register class. This patch allows tiny unspillable live ranges to be evicted by tiny unspillable live ranges from a smaller register class. <rdar://problem/11542429> llvm-svn: 157715	2012-05-30 21:46:58 +00:00
Eric Christopher	f481ab3877	Add support for the mips inline asm 'm' output modifier. Patch by Jack Carter. llvm-svn: 157709	2012-05-30 19:05:19 +00:00
Owen Anderson	0eda3e1de6	Switch the canonical FMA term operand order to match both the comment I wrote and the usual LLVM convention. llvm-svn: 157708	2012-05-30 18:54:50 +00:00
Owen Anderson	c7aaf523e1	Teach DAGCombine to canonicalize the position of a constant in the term operands of an FMA node. llvm-svn: 157707	2012-05-30 18:50:39 +00:00
Chris Lattner	1622a99e58	it's pointed out that R11 can be used for magic things, and doing things just for 64-bit registers is silly. Just optimize 3 more. llvm-svn: 157699	2012-05-30 18:08:02 +00:00
Chris Lattner	04d722a68d	Extend the (abi-irrelevant) return convention to be able to return more than two values in integer registers. This is already supported by the fastcc convention, but it doesn't hurt to support it in the standard conventions as well. In cases where we can cheat at the calling convention, this allows us to avoid returning things through memory in more cases. llvm-svn: 157698	2012-05-30 17:50:14 +00:00
Chad Rosier	820d248c4d	[arm-fast-isel] Add support for the llvm.frameaddress() intrinsic. Patch by Jush Lu <jush.msn@gmail.com>. llvm-svn: 157696	2012-05-30 17:23:22 +00:00
Evan Cheng	bc2453dd3d	Teach taildup to update livein set. rdar://11538365 llvm-svn: 157663	2012-05-30 00:42:39 +00:00
Bob Wilson	33e5188c27	Add an insertPass API to TargetPassConfig. <rdar://problem/11498613> Besides adding the new insertPass function, this patch uses it to enhance the existing -print-machineinstrs so that the MachineInstrs after a specific pass can be printed. Patch by Bin Zeng! llvm-svn: 157655	2012-05-30 00:17:12 +00:00
Benjamin Kramer	ef479ea854	Add intrinsics, code gen, assembler and disassembler support for the SSE4a extrq and insertq instructions. This required light surgery on the assembler and disassembler because the instructions use an uncommon encoding. They are the only two instructions in x86 that use register operands and two immediates. llvm-svn: 157634	2012-05-29 19:05:25 +00:00
Peter Collingbourne	913869be45	Add llvm.fabs intrinsic. llvm-svn: 157594	2012-05-28 21:48:37 +00:00
Chris Lattner	f7f59b15aa	These tests used intrinsics with the wrong prototype. They weren't caught because the old verifier just checked that something "was a pointer", but not that the pointee was correct. llvm-svn: 157544	2012-05-27 19:35:41 +00:00
Benjamin Kramer	f2beccf6b4	SelectionDAGBuilder: When emitting small compare chains for switches order them by using edge weights. SimplifyCFG tends to form a lot of 2-3 case switches when merging branches. Move the most likely condition to the front so it is checked first and the others can be skipped. This is currently not as effective as it could be because SimplifyCFG destroys profiling metadata when merging branches and switches. Merging branch weight metadata is tricky though. This code touches at most 3 cases so I didn't use a proper sorting algorithm. llvm-svn: 157521	2012-05-26 20:01:32 +00:00
Justin Holewinski	c98041d4d9	[NVPTX] Add a new test case for the newly-enabled call handling NV_CONTRIB llvm-svn: 157485	2012-05-25 17:20:38 +00:00
NAKAMURA Takumi	3eca973bf8	test/CodeGen/X86/bigstructret.ll: Suppress one test. It is msvc-incompatible. (compatible to mingw32 and netbsd, though) llvm-svn: 157474	2012-05-25 15:40:54 +00:00
NAKAMURA Takumi	501dbd06ae	test/CodeGen/X86/bigstructret.ll: Relax stack offsets for hosts of stack-align=8, eg. win32 and netbsd. llvm-svn: 157471	2012-05-25 15:12:21 +00:00
Eli Friedman	315a0c79f3	Simplify code for calling a function where CanLowerReturn fails, fixing a small bug in the process. llvm-svn: 157446	2012-05-25 00:09:29 +00:00
David Blaikie	c575c80c3b	Fix for CHECK-NOT misspelling. Patch by Nicklas Bo Jensen. llvm-svn: 157421	2012-05-24 22:08:29 +00:00
Justin Holewinski	907f7606f2	Remove the PTX back-end and all of its artifacts (triple, etc.) This back-end was deprecated in favor of the NVPTX back-end. NV_CONTRIB llvm-svn: 157417	2012-05-24 21:38:21 +00:00
Akira Hatanaka	a649cc75b3	Turn on mips16 pseudo op when compiling for mips16. Expand test case for this. Patch by Reed Kotler. llvm-svn: 157410	2012-05-24 18:37:43 +00:00
Akira Hatanaka	df98a7a34d	Enable Mips16 compiler to compile a null program. First code from the Mips16 compiler. Includes trivial test program. Patch by Reed Kotler. llvm-svn: 157408	2012-05-24 18:32:33 +00:00
Jakob Stoklund Olesen	41ebcda8f4	Add a test case for global live range splitting. llvm-svn: 157357	2012-05-23 23:42:23 +00:00
Jakob Stoklund Olesen	0ce90494e6	Add a last resort tryInstructionSplit() to RAGreedy. Live ranges with a constrained register class may benefit from splitting around individual uses. It allows the remaining live range to use a larger register class where it may allocate. This is like spilling to a different register class. This is only attempted on constrained register classes. <rdar://problem/11438902> llvm-svn: 157354	2012-05-23 22:37:27 +00:00
Jakob Stoklund Olesen	5b8f476037	Correctly deal with identity copies in RegisterCoalescer. Now that the coalescer keeps live intervals and machine code in sync at all times, it needs to deal with identity copies differently. When merging two virtual registers, all identity copies are removed right away. This means that other identity copies must come from somewhere else, and they are going to have a value number. Deal with such copies by merging the value numbers before erasing the copy instruction. Otherwise, we leave dangling value numbers in the live interval. This fixes PR12927. llvm-svn: 157340	2012-05-23 20:21:06 +00:00
Chad Rosier	223faf719c	[arm-fast-isel] Add support for non-global callee. Patch by Jush Lu <jush.msn@gmail.com>. llvm-svn: 157336	2012-05-23 18:38:57 +00:00
Nuno Lopes	ad40c0a425	revert my previous patches that introduced an additional parameter to the objectsize intrinsic. After a lot of discussion, we realized it's not the best option for run-time bounds checking llvm-svn: 157255	2012-05-22 15:25:31 +00:00
Jakob Stoklund Olesen	924279ca0e	Only erase virtregs with no uses left. Also make sure registers aren't erased twice if the dead def mentions the register twice. This fixes PR12911. llvm-svn: 157254	2012-05-22 14:52:12 +00:00
Jim Grosbach	da04fa0d02	FileCheck'ize test, and add a bit to test for r157221. llvm-svn: 157222	2012-05-21 23:50:00 +00:00
Craig Topper	e88f2fd4f7	Allow 256-bit shuffles to still be split even if only half of the shuffle comes from two 128-bit pieces. llvm-svn: 157175	2012-05-21 06:40:16 +00:00
Peter Collingbourne	8eb05fd093	When legalising shifts, do not pre-build a list of operands which may be RAUW'd by the recursive call to LegalizeOps; instead, retrieve the other operands when calling UpdateNodeOperands. Fixes PR12889. llvm-svn: 157162	2012-05-20 18:36:15 +00:00
Hal Finkel	601f555eee	Add a missing PPC 64-bit stwu pattern. This seems to fix the remaining compile-time failures on PPC64 when compiling with -enable-ppc-preinc. llvm-svn: 157159	2012-05-20 17:11:24 +00:00
Jakob Stoklund Olesen	691ae3388f	Use the right register class for LDRrs. llvm-svn: 157152	2012-05-20 06:38:47 +00:00
Jakob Stoklund Olesen	4fd0e4f415	Transfer memory operands to the right instruction. They need to go on the PICLDR as the verifier points out. llvm-svn: 157151	2012-05-20 06:38:42 +00:00
Jakob Stoklund Olesen	1f1c6add10	Properly constrain register classes for sub-registers. Not all GR64 registers have sub_8bit sub-registers. llvm-svn: 157150	2012-05-20 06:38:37 +00:00
Jakob Stoklund Olesen	a103a516c6	Properly constrain register classes in 2-addr. X86 has 2-addr instructions with different constraints on the tied def and use operands. One is GR32, one is GR32_NOSP. llvm-svn: 157149	2012-05-20 06:38:32 +00:00
Jakob Stoklund Olesen	a34a69ce0c	Fix 12892. Dead code elimination during coalescing could cause a virtual register to be split into connected components. The following rewriting would be confused about the already joined copies present in the code, but without a corresponding value number in the live range. Erase all joined copies instantly when joining intervals such that the MI and LiveInterval representations are always in sync. llvm-svn: 157135	2012-05-19 23:34:59 +00:00
Jakob Stoklund Olesen	25ced18407	Erase joined copies immediately. The late dead code elimination is no longer necessary. The test changes are caused by a register hint that can be either %rdi or %rax. The choice depends on the use list order, which this patch changes. llvm-svn: 157131	2012-05-19 20:54:07 +00:00
Nadav Rotem	c93e91da27	On Haswell, prefer storing YMM registers using a single instruction. llvm-svn: 157129	2012-05-19 20:30:08 +00:00
Nadav Rotem	900c7cb7ce	Add support for additional in-reg vbroadcast patterns llvm-svn: 157127	2012-05-19 19:57:37 +00:00
Eric Christopher	bc5d24999c	Add support for the 'd' mips inline asm output modifier. Patch by Jack Carter. llvm-svn: 157093	2012-05-19 00:51:56 +00:00
Jim Grosbach	4b63d2ae1d	Refactor data-in-code annotations. Use a dedicated MachO load command to annotate data-in-code regions. This is the same format the linker produces for final executable images, allowing consistency of representation and use of introspection tools for both object and executable files. Data-in-code regions are annotated via ".data_region"/".end_data_region" directive pairs, with an optional region type. data_region_directive := ".data_region" { region_type } region_type := "jt8" | "jt16" | "jt32" | "jta32" end_data_region_directive := ".end_data_region" The previous handling of ARM-style "$d.*" labels was broken and has been removed. Specifically, it didn't handle ARM vs. Thumb mode when marking the end of the section. rdar://11459456 llvm-svn: 157062	2012-05-18 19:12:01 +00:00
Eric Christopher	9ca26cfb5f	Add support for the mips 'x' inline asm modifier. Patch by Jack Carter. llvm-svn: 157057	2012-05-18 17:39:35 +00:00
Craig Topper	92db928ee9	Simplify handling of v16i8 shuffles and fix a missed optimization. llvm-svn: 157043	2012-05-18 06:42:06 +00:00
Evan Cheng	22d405f57b	Teach two-address pass to update the "source" map so it doesn't perform a non-profitable commute using outdated info. The test case would still fail because of poor pre-RA schedule. That will be fixed by MI scheduler. rdar://11472010 llvm-svn: 157038	2012-05-18 01:33:51 +00:00
Jakob Stoklund Olesen	874e401382	Remove a test that was only testing for physreg joining. This is the same as the other tests: Clever tricks are required to make the arguments and return value line up in a single-instruction function. It rarely happens in real life. We have plenty other examples of this behavior. llvm-svn: 157030	2012-05-18 00:07:14 +00:00
Jakob Stoklund Olesen	589c6eb95c	Remove -join-physregs from the test suite. This option has been disabled for a while, and it is going away so I can clean up the coalescer code. The tests that required physreg joining to be enabled were almost all of the form "tiny function with interference between arguments and return value". Such functions are usually inlined in the real world. The problem exposed by phys_subreg_coalesce-3.ll is real, but fairly rare. llvm-svn: 157027	2012-05-17 23:44:19 +00:00
Tim Northover	af501a29d3	Remove incorrect pattern for ARM SMML instruction. Patch by Meador Inge. llvm-svn: 156989	2012-05-17 13:12:13 +00:00
Evan Cheng	58a95f0c8a	Avoid creating a cycle when folding load / op with flag / store. PR11451474. rdar://11451474 llvm-svn: 156896	2012-05-16 01:54:27 +00:00
Jakob Stoklund Olesen	984997b3a0	Enable sub-sub-register copy coalescing. It is now possible to coalesce weird skewed sub-register copies by picking a super-register class larger than both original registers. The included test case produces code like this: vld2.32 {d16, d17, d18, d19}, [r0]! vst2.32 {d18, d19, d20, d21}, [r0] We still perform interference checking as if it were a normal full copy join, so this is still quite conservative. In particular, the f1 and f2 functions in the included test case still have remaining copies because of false interference. llvm-svn: 156878	2012-05-15 23:31:35 +00:00
Sirish Pande	91856a1f15	Enable all Hexagon tests. llvm-svn: 156824	2012-05-15 16:13:12 +00:00
Jakob Stoklund Olesen	dc2e0cd44a	Fix PR12821. RAFast must add an <imp-def> operand when it is rewriting a sub-register def that isn't a read-modify-write. llvm-svn: 156777	2012-05-14 21:10:25 +00:00
Brendon Cahoon	f6b687e5d1	Revert 156634 upon request until code improvement changes are made. llvm-svn: 156775	2012-05-14 19:35:42 +00:00
Dan Gohman	164fe18cfe	Rename @llvm.debugger to @llvm.debugtrap. llvm-svn: 156774	2012-05-14 18:58:10 +00:00
Sirish Pande	4bd20c50eb	Support for Hexagon feature, New Value Jump. llvm-svn: 156698	2012-05-12 05:10:30 +00:00
Akira Hatanaka	763ab85690	Fix test cases. llvm-svn: 156697	2012-05-12 03:25:16 +00:00
Akira Hatanaka	8f3573034b	Make the following changes in MipsAsmPrinter.cpp: - Remove code which lowers pseudo SETGP01. - Fix LowerSETGP01. The first two of the three instructions that are emitted to initialize the global pointer register now use register $2. - Stop emitting .cpload directive. llvm-svn: 156689	2012-05-12 00:48:43 +00:00
Akira Hatanaka	d918f77ba3	Insert instructions to the entry basic block which initializes the global pointer register. This is the first of the series of patches which clean up the way global pointer register is used. The patches will make the following improvements: - Make $gp an allocatable temporary register rather than reserving it. - Use a virtual register as the global pointer register and let the register allocator decide which register to assign to it or whether spill/reloads are needed. - Make sure $gp is valid at the entry of a called function, which is necessary for functions using lazy binding. - Remove the need for emitting .cprestore and .cpload directives. llvm-svn: 156671	2012-05-12 00:17:17 +00:00
Akira Hatanaka	0661b81bca	Do not replace operands of pseudo instructions with register $zero. llvm-svn: 156663	2012-05-11 23:22:18 +00:00
Akira Hatanaka	5d60c36f37	Use regular expression to match register names. llvm-svn: 156656	2012-05-11 23:00:40 +00:00
Chad Rosier	aa9cb9df59	[fast-isel] Add support for selecting @llvm.trap(). llvm-svn: 156646	2012-05-11 21:33:49 +00:00
Brendon Cahoon	31f8723ef3	Hexagon constant extender support. Patch by Jyotsna Verma. llvm-svn: 156634	2012-05-11 19:56:59 +00:00
Chad Rosier	3268692aa8	[fast-isel] Remove -disable-arm-fast-isel option. -fast-isel=0 suffices. Minor cleanup. llvm-svn: 156632	2012-05-11 19:40:25 +00:00
Chad Rosier	90f9afe659	[fast-isel] Cleaner fix for when we're unable to handle a non-double multi-reg retval. Hoists check before emitting the call to avoid unnecessary work. rdar://11430407 PR12796 llvm-svn: 156628	2012-05-11 18:51:55 +00:00
Hans Wennborg	addad7388d	Fix test/CodeGen/X86/tls-pie.ll. llvm-svn: 156612	2012-05-11 10:19:54 +00:00
Hans Wennborg	f9d0e44b82	Implement initial-exec TLS model for 32-bit PIC x86 This fixes a TODO from 2007 :) Previously, LLVM would emit the wrong code here (see the update to test/CodeGen/X86/tls-pie.ll). llvm-svn: 156611	2012-05-11 10:11:01 +00:00
Manman Ren	dc8ad0058f	ARM: peephole optimization to remove cmp instruction This patch will optimize the following cases: sub r1, r3 | sub r1, imm cmp r3, r1 or cmp r1, r3 | cmp r1, imm bge L1 TO subs r1, r3 bge L1 or ble L1 If the branch instruction can use flag from "sub", then we can replace "sub" with "subs" and eliminate the "cmp" instruction. rdar: 10734411 llvm-svn: 156599	2012-05-11 01:30:47 +00:00
Dan Gohman	dfab443ae8	Define a new intrinsic, @llvm.debugger. It will be similar to __builtin_trap(), but it generates int3 on x86 instead of ud2. llvm-svn: 156593	2012-05-11 00:19:32 +00:00
Eric Christopher	ed51b9ec0b	Add support for the 'X' inline asm operand modifier. Patch by Jack Carter. llvm-svn: 156577	2012-05-10 21:48:22 +00:00
Sirish Pande	69295b8963	Hexagon V5 FP Support. llvm-svn: 156568	2012-05-10 20:20:25 +00:00
Manman Ren	b555b382bd	Revert: 156550 "ARM: peephole optimization to remove cmp instruction" This commit broke an external linux bot and gave a compile-time warning. llvm-svn: 156556	2012-05-10 18:49:43 +00:00
Manman Ren	c860887b2d	ARM: peephole optimization to remove cmp instruction This patch will optimize the following cases: sub r1, r3 | sub r1, imm cmp r3, r1 or cmp r1, r3 | cmp r1, imm bge L1 TO subs r1, r3 bge L1 or ble L1 If the branch instruction can use flag from "sub", then we can replace "sub" with "subs" and eliminate the "cmp" instruction. rdar: 10734411 llvm-svn: 156550	2012-05-10 16:48:21 +00:00
Nadav Rotem	15946e50c1	AVX2: Add an additional broadcast idiom. llvm-svn: 156540	2012-05-10 12:39:13 +00:00
Nadav Rotem	b86a3fb8d0	Generate AVX/AVX2 shuffles even when there is a memory op somewhere else in the program. Starting r155461 we are able to select patterns for vbroadcast even when the load op is used by other users. Fix PR11900. llvm-svn: 156539	2012-05-10 12:22:05 +00:00
Danil Malyshev	47aba39004	Added a regress test for the bug #9964 before close it. This bug was fixed by Jim Grosbach in #138879, thanks Jim! llvm-svn: 156505	2012-05-09 19:07:04 +00:00
Nuno Lopes	01547b3ad2	change the objectsize intrinsic signature: add a 3rd parameter to denote the maximum runtime performance penalty that the user is willing to accept. This commit only adds the parameter. Code taking advantage of it will follow. llvm-svn: 156473	2012-05-09 15:52:43 +00:00
Akira Hatanaka	ca41d13bbd	Add another peephole pattern for conditional moves. llvm-svn: 156460	2012-05-09 02:29:29 +00:00
Akira Hatanaka	05b9dad1e6	Make register FP allocatable if the compiled function does not have dynamic allocas. llvm-svn: 156458	2012-05-09 01:38:13 +00:00
Akira Hatanaka	0a8ab718cb	Expand 64-bit shifts if target ABI is O32. llvm-svn: 156457	2012-05-09 00:55:21 +00:00
Craig Topper	7daf897678	Remove 256-bit AVX non-temporal store intrinsics. Similar was previously done for 128-bit. llvm-svn: 156375	2012-05-08 06:58:15 +00:00
Owen Anderson	ab63d84252	Teach DAG combine to fold x-x to 0.0 when unsafe FP math is enabled. llvm-svn: 156324	2012-05-07 20:51:25 +00:00
Chad Rosier	d8287fec17	Fix a regression from r147481. This combine should only happen if there is a single use. rdar://11360370 llvm-svn: 156316	2012-05-07 18:47:44 +00:00
Manman Ren	ef4e0479ec	X86: optimization for -(x != 0) This patch will optimize -(x != 0) on X86 FROM cmpl $0x01,%edi sbbl %eax,%eax notl %eax TO negl %edi sbbl %eax %eax In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td: def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>; rdar: 10961709 llvm-svn: 156312	2012-05-07 18:06:23 +00:00
Eric Christopher	9c492e6ebf	Add support for the 'l' constraint. Patch by Jack Carter. llvm-svn: 156294	2012-05-07 06:25:15 +00:00
Eric Christopher	e3c494de82	Add support for the 'c' constraint. Patch by Jack Carter. llvm-svn: 156293	2012-05-07 06:25:10 +00:00
Eric Christopher	c18ae4a3b1	Add support for the 'P' constraint. Patch by Jack Carter. llvm-svn: 156292	2012-05-07 06:25:02 +00:00
Eric Christopher	470578a91b	Add support for the 'O' constraint. Patch by Jack Carter. llvm-svn: 156285	2012-05-07 05:46:48 +00:00
Eric Christopher	e07aa430b8	Add support for the 'N' inline asm constraint. Patch by Jack Carter. llvm-svn: 156284	2012-05-07 05:46:43 +00:00
Eric Christopher	1109b3406d	Add support for the 'L' inline asm constraint. Patch by Jack Carter. llvm-svn: 156283	2012-05-07 05:46:37 +00:00
Eric Christopher	3ff88a05b7	Add support for the inline asm constraint 'K'. llvm-svn: 156282	2012-05-07 05:46:29 +00:00
Craig Topper	d4e1894ec1	Add SSE4A MOVNTSS/MOVNTSD instructions. llvm-svn: 156281	2012-05-07 05:36:19 +00:00
Eric Christopher	7201e1b4b9	Support the 'J' constraint. Patch by Jack Carter. llvm-svn: 156280	2012-05-07 03:13:42 +00:00
Eric Christopher	1d6c89eea1	Add support for the 'I' inline asm constraint. Also add tests from the previous 2 patches. Patch by Jack Carter. llvm-svn: 156279	2012-05-07 03:13:32 +00:00
Benjamin Kramer	3d38c17b59	Switch the select to branch transformation on by default. The primitive conservative heuristic seems to give a slight overall improvement while not regressing stuff. Make it available to wider testing. If you notice any speed regressions (or significant code size regressions) let me know! llvm-svn: 156258	2012-05-06 14:25:16 +00:00
Benjamin Kramer	047d7ca0b1	CodeGenPrepare: Add a transform to turn selects into branches in some cases. This came up when a change in block placement formed a cmov and slowed down a hot loop by 50%: ucomisd (%rdi), %xmm0 cmovbel %edx, %esi cmov is a really bad choice in this context because it doesn't get branch prediction. If we emit it as a branch, an out-of-order CPU can do a better job (if the branch is predicted right) and avoid waiting for the slow load+compare instruction to finish. Of course it won't help if the branch is unpredictable, but those are really rare in practice. This patch uses a dumb conservative heuristic, it turns all cmovs that have one use and a direct memory operand into branches. cmovs usually save some code size, so we disable the transform in -Os mode. In-Order architectures are unlikely to benefit as well, those are included in the "predictableSelectIsExpensive" flag. It would be better to reuse branch probability info here, but BPI doesn't support select instructions currently. It would make sense to use the same heuristics as the if-converter pass, which does the opposite direction of this transform. Test suite shows a small improvement here and there on corei7-level machines, but the actual results depend a lot on the used microarchitecture. The transformation is currently disabled by default and available by passing the -enable-cgp-select2branch flag to the code generator. Thanks to Chandler for the initial test case to him and Evan Cheng for providing me with comments and test-suite numbers that were more stable than mine :) llvm-svn: 156234	2012-05-05 12:49:22 +00:00
Justin Holewinski	ae556d3ef7	This patch adds a new NVPTX back-end to LLVM which supports code generation for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it. The new target machines are: nvptx (old ptx32) => 32-bit PTX nvptx64 (old ptx64) => 64-bit PTX The sources are based on the internal NVIDIA NVPTX back-end, and contain more functionality than the current PTX back-end currently provides. NV_CONTRIB llvm-svn: 156196	2012-05-04 20:18:50 +00:00
Sebastian Pop	2420e8b7d5	Added missing CMN case in Thumb2SizeReduction pass so that LLVM emits 16-bits encoding of CMN instructions. llvm-svn: 156195	2012-05-04 19:53:56 +00:00
Craig Topper	42f2182366	Allow v16i16 and v32i8 shuffles to be rewritten as narrower shuffles. llvm-svn: 156156	2012-05-04 04:44:49 +00:00
Sirish Pande	f8e5e3c072	Support for target dependent Hexagon VLIW packetizer. This patch creates and optimizes packets as per Hexagon ISA rules. llvm-svn: 156109	2012-05-03 21:52:53 +00:00
Craig Topper	315a5cc789	Fix 256-bit vpshuflw and vpshufhw immediate encoding to handle undefs in the lower half correctly. Missed in r155982. llvm-svn: 156059	2012-05-03 07:12:59 +00:00
Evan Cheng	b64e7b778b	Fix two-address pass's aggressive instruction commuting heuristics. It's meant to catch cases like: %reg1024<def> = MOV r1 %reg1025<def> = MOV r0 %reg1026<def> = ADD %reg1024, %reg1025 r0 = MOV %reg1026 By commuting ADD, it let coalescer eliminate all of the copies. However, there was a bug in the heuristics where it ended up commuting the ADD in: %reg1024<def> = MOV r0 %reg1025<def> = MOV 0 %reg1026<def> = ADD %reg1024, %reg1025 r0 = MOV %reg1026 That did no benefit but rather ensure the last MOV would not be coalesced. rdar://11355268 llvm-svn: 156048	2012-05-03 01:45:13 +00:00
Owen Anderson	41b0665b5b	Teach DAGCombine the same multiply-by-1.0 folding trick when doing FMAs, just like it now knows for FMULs. llvm-svn: 156029	2012-05-02 22:17:40 +00:00
Owen Anderson	b5f167c660	Teach DAG combine that multiplication by 1.0 can always be constant folded. llvm-svn: 156023	2012-05-02 21:32:35 +00:00
Manman Ren	f02efc8731	Revert r155853 The commit is intended to fix rdar://10961709. But it is the root cause of PR12720. Revert it for now. llvm-svn: 155992	2012-05-02 15:24:32 +00:00
Craig Topper	c73bc39c22	Add support for selecting AVX2 vpshuflw and vpshufhw. Add decoding support for AsmPrinter. llvm-svn: 155982	2012-05-02 08:03:44 +00:00
Bill Wendling	b6b50c6638	Strip the pointer casts off of allocas so that the selection DAG can find them. PR10799 llvm-svn: 155954	2012-05-01 22:50:45 +00:00
Manman Ren	425a55c1ce	X86: optimization for max-like struct This patch will optimize the following cases on X86 (a > b) ? (a-b) : 0 (a >= b) ? (a-b) : 0 (b < a) ? (a-b) : 0 (b <= a) ? (a-b) : 0 FROM movl %edi, %ecx subl %esi, %ecx cmpl %edi, %esi movl $0, %eax cmovll %ecx, %eax TO xorl %eax, %eax subl %esi, %edi cmovll %eax, %edi movl %edi, %eax rdar: 10734411 llvm-svn: 155919	2012-05-01 17:16:15 +00:00
Jay Foad	8fc810c2ef	Regression test for PR2960. llvm-svn: 155912	2012-05-01 11:11:34 +00:00
Manman Ren	4f4d5c8fc8	X86: optimization for -(x != 0) This patch will optimize -(x != 0) on X86 FROM cmpl $0x01,%edi sbbl %eax,%eax notl %eax TO negl %edi sbbl %eax %eax llvm-svn: 155853	2012-04-30 22:51:25 +00:00
Manman Ren	5b7e08c9d8	test/CodeGen/X86/select.ll: remove spaces llvm-svn: 155840	2012-04-30 18:54:27 +00:00
Derek Schuff	b051adf263	Fix fastcc structure return with fast-isel on x86-32 On x86-32, structure return via sret lets the callee pop the hidden pointer argument off the stack, which the caller then re-pushes. However if the calling convention is fastcc, then a register is used instead, and the caller should not adjust the stack. This is implemented with a check of IsTailCallConvention X86TargetLowering::LowerCall but is now checked properly in X86FastISel::DoSelectCall. (this time, actually commit what was reviewed!) llvm-svn: 155825	2012-04-30 16:57:15 +00:00
Bob Wilson	9245c93656	Don't introduce illegal types when creating vmull operations. <rdar://11324364> ARM BUILD_VECTORs created after type legalization cannot use i8 or i16 operands, since those types are not legal. Instead use i32 operands, which will be implicitly truncated by the BUILD_VECTOR to match the element type. llvm-svn: 155824	2012-04-30 16:53:34 +00:00
Andrew Trick	833f04962a	Reapply 155668: Fix the SD scheduler to avoid gluing the same node twice. This time, also fix the caller of AddGlue to properly handle incomplete chains. AddGlue had failure modes, but shamefully hid them from its caller. Its luck ran out. Fixes rdar://11314175: BuildSchedUnits assert. llvm-svn: 155749	2012-04-28 01:03:23 +00:00
Derek Schuff	a99b168145	Revert r155745 llvm-svn: 155746	2012-04-27 23:37:41 +00:00
Derek Schuff	bbf8b83e90	Fix fastcc structure return with fast-isel on x86-32 On x86-32, structure return via sret lets the callee pop the hidden pointer argument off the stack, which the caller then re-pushes. However if the calling convention is fastcc, then a register is used instead, and the caller should not adjust the stack. This is implemented with a check of IsTailCallConvention X86TargetLowering::LowerCall but is now checked properly in X86FastISel::DoSelectCall. llvm-svn: 155745	2012-04-27 23:27:17 +00:00
Andrew Trick	7a773ec053	Temporarily revert r155668: Fix the SD scheduler to avoid gluing. This definitely caused regression with ARM -mno-thumb. llvm-svn: 155743	2012-04-27 22:55:59 +00:00
Chad Rosier	32c2178ef3	Add x86-specific DAG combine to simplify: x == -y --> x+y == 0 x != -y --> x+y != 0 On x86, the generated code goes from negl %esi cmpl %esi, %edi je .LBB0_2 to addl %esi, %edi je .L4 This case is correctly handled for ARM with "cmn". Patch by Manman Ren. rdar://11245199 PR12545 llvm-svn: 155739	2012-04-27 22:33:25 +00:00
Evan Cheng	73fd08d5bd	Make test less fragile. llvm-svn: 155732	2012-04-27 20:48:18 +00:00
Lang Hames	ea001225c1	Fix the order of the operands in the llvm.fma intrinsic patterns for ARM, <rdar://problem/11325085>. llvm-svn: 155724	2012-04-27 18:51:24 +00:00
Benjamin Kramer	913da4b261	X86: Don't emit conditional floating point moves when targeting pre-pentiumpro architectures. * Model FPSW (the FPU status word) as a register. * Add ISel patterns for the FUCOM, FNSTSW and SAHF instructions. During Legalize/Lowering, build a node sequence to transfer the comparison result from FPSW into EFLAGS. If you're wondering about the right-shift: That's an implicit sub-register extraction (%ax -> %ah) which is handled later on by the instruction selector. Fixes PR6679. Patch by Christoph Erhardt! llvm-svn: 155704	2012-04-27 12:07:43 +00:00
Craig Topper	e57b49ee16	Add mcpu to tests to prevent them from using AVX instructions on Sandy Bridge after r155618. llvm-svn: 155696	2012-04-27 07:11:58 +00:00
Evan Cheng	1ec87ee096	Implement a bastardized ABI. llvm-svn: 155686	2012-04-27 02:11:10 +00:00
Evan Cheng	f52003de56	- thumbv6 shouldn't imply +thumb2. Cortex-M0 doesn't support 32-bit Thumb2 instructions. - However, it does support dmb, dsb, isb, mrs, and msr. rdar://11331541 llvm-svn: 155685	2012-04-27 01:27:19 +00:00
Andrew Trick	03fa574af5	Fix the SD scheduler to avoid gluing the same node twice. DAGCombine strangeness may result in multiple loads from the same offset. They both may try to glue themselves to another load. We could insist that the redundant loads glue themselves to each other, but the better fix is to bail out from bad gluing at the time we detect it. Fixes rdar://11314175: BuildSchedUnits assert. llvm-svn: 155668	2012-04-26 21:48:25 +00:00
Tim Northover	3de97b7a86	Use VLD1 in NEON extending-load patterns instead of VLDR. On some cores it's a bad idea for performance to mix VFP and NEON instructions and since these patterns are NEON anyway, the NEON load should be used. llvm-svn: 155630	2012-04-26 08:46:29 +00:00
Evan Cheng	9f7ad310b5	If triple is armv7 / thumbv7 and a CPU is specified, do not automatically assume the feature set of v7a. This comes about if the user specifies something like -arch armv7 -mcpu=cortex-m3. We shouldn't be generating instructions such as uxtab in this case. rdar://11318438 llvm-svn: 155601	2012-04-26 01:13:36 +00:00
Jakob Stoklund Olesen	6eeeb7e19c	Try to fix llvm-arm-linux builder with -mcpu. llvm-svn: 155589	2012-04-25 21:22:33 +00:00
Preston Gurd	82cac0acc0	Trivial change to make the test use -mcpu=generic so as to avoid a failure if run on an Intel Atom with post RA instruction scheduling. llvm-svn: 155587	2012-04-25 21:04:54 +00:00
Akira Hatanaka	2020e27d6d	Do not use $gp as a dedicated global register if the target ABI is not O32. llvm-svn: 155522	2012-04-25 01:24:52 +00:00
Nadav Rotem	d50c3b2c57	Fix the testcase. We do expect two vblendw on XMMs. llvm-svn: 155477	2012-04-24 19:57:38 +00:00
Nadav Rotem	edef71790b	Add a testcase for 155440 llvm-svn: 155475	2012-04-24 19:45:28 +00:00
Evan Cheng	2d14d8aca1	MachineBasicBlock::SplitCriticalEdge() should follow LLVM IR variant and refuse to break edge to EH landing pad. rdar://11300144 llvm-svn: 155470	2012-04-24 19:06:55 +00:00
Nadav Rotem	aa3ff8da00	AVX: We lower VECTOR_SHUFFLE and BUILD_VECTOR nodes into vbroadcast instructions using the pattern (vbroadcast (i32load src)). In some cases, after we generate this pattern new users are added to the load node, which prevent the selection of the blend pattern. This commit provides fallback patterns which perform in-vector broadcast (using in-vector vbroadcast in AVX2 and pshufd on AVX1). llvm-svn: 155437	2012-04-24 11:07:03 +00:00
Nadav Rotem	3f8acfc3c4	Optimize the vector UINT_TO_FP, SINT_TO_FP and FP_TO_SINT operations where the integer type is i8 (commonly used in graphics). llvm-svn: 155397	2012-04-23 21:53:37 +00:00
Preston Gurd	9a0914753a	This patch fixes a problem which arose when using the Post-RA scheduler on X86 Atom. Some of our tests failed because the tail merging part of the BranchFolding pass was creating new basic blocks which did not contain live-in information. When the anti-dependency code in the Post-RA scheduler ran, it would sometimes rename the register containing the function return value because the fact that the return value was live-in to the subsequent block had been lost. To fix this, it is necessary to run the RegisterScavenging code in the BranchFolding pass. This patch makes sure that the register scavenging code is invoked in the X86 subtarget only when post-RA scheduling is being done. Post RA scheduling in the X86 subtarget is only done for Atom. This patch adds a new function to the TargetRegisterClass to control whether or not live-ins should be preserved during branch folding. This is necessary in order for the anti-dependency optimizations done during the PostRASchedulerList pass to work properly when doing Post-RA scheduling for the X86 in general and for the Intel Atom in particular. The patch adds and invokes the new function trackLivenessAfterRegAlloc() instead of using the existing requiresRegisterScavenging(). It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of requiresRegisterScavenging(). It changes all the targets that implemented requiresRegisterScavenging() to also implement trackLivenessAfterRegAlloc(). It adds an assertion in the Post RA scheduler to make sure that post RA liveness information is available when it is needed. It changes the X86 break-anti-dependencies test to use -mcpu=atom, in order to avoid running into the added assertion. Finally, this patch restores the use of anti-dependency checking (which was turned off temporarily for the 3.1 release) for Intel Atom in the Post RA scheduler. Patch by Andy Zhang! Thanks to Jakob and Anton for their reviews. llvm-svn: 155395	2012-04-23 21:39:35 +00:00
Chandler Carruth	3c3bb55a85	Revert r155365, r155366, and r155367. All three of these have regression test suite failures. The failures occur at each stage, and only get worse, so I'm reverting all of them. Please resubmit these patches, one at a time, after verifying that the regression test suite passes. Never submit a patch without running the regression test suite. llvm-svn: 155372	2012-04-23 18:25:57 +00:00
Sirish Pande	a3f8ba2439	Hexagon V5 (floating point) support. llvm-svn: 155367	2012-04-23 17:49:40 +00:00
Sirish Pande	2c7bf00fba	Support for Hexagon architectural feature, new value jump. llvm-svn: 155366	2012-04-23 17:49:28 +00:00
Sirish Pande	6cd2251598	Support for Hexagon VLIW Packetizer. llvm-svn: 155365	2012-04-23 17:49:20 +00:00
Elena Demikhovsky	6c6cdec3de	cleaned line endings in the newly added test file llvm-svn: 155315	2012-04-22 13:22:48 +00:00
Elena Demikhovsky	8d7e56c409	ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2 llvm-svn: 155309	2012-04-22 09:39:03 +00:00
Nadav Rotem	31caa27bf5	Teach getVectorTypeBreakdown about promotion of vectors in addition to widening of vectors. llvm-svn: 155296	2012-04-21 20:08:32 +00:00
Jakob Stoklund Olesen	d114da6004	Fix PR12599. The X86 target is editing the selection DAG while isel is selecting nodes following a topological ordering. When the DAG hacking triggers CSE, nodes can be deleted and bad things happen. llvm-svn: 155257	2012-04-20 23:36:09 +00:00
Joel Jones	a7691f18a6	Test for the problem with xors being changed into ands when the set bits aren't the same for both args of the xor. This transformation is in the function TargetLowering::SimplifyDemandedBits in the file lib/CodeGen/SelectionDAG/TargetLowering.cpp. I have tested this test using a previous version of llc which has the defect and a version of llc which does not. I got the expected fail and pass, respectively. This test goes with rdar://11195364 and the check in with the fix: svn r154955 llvm-svn: 155156	2012-04-19 20:54:44 +00:00
Joe Groff	3a940250bf	Move win32 SimplifyLibcall test under Transforms llvm-svn: 154967	2012-04-18 00:07:45 +00:00
Joe Groff	a81bcbb9bb	fix pr12559: mark unavailable win32 math libcalls also fix SimplifyLibCalls to use TLI rather than compile-time conditionals to enable optimizations on floor, ceil, round, rint, and nearbyint llvm-svn: 154960	2012-04-17 23:05:54 +00:00
Benjamin Kramer	7ce42c476a	Force cmov on test so block placement doesn't shuffle the code around. This made the test fail with -mcpu=generic (when building on a non-x86 host). llvm-svn: 154926	2012-04-17 13:55:23 +00:00
James Molloy	a9bcf20d22	Fix bad EXTRACT_SUBREG in instruction selection for extending-loads on NEON. llvm-svn: 154915	2012-04-17 08:18:00 +00:00
Andrew Trick	13840499df	Test cases that assume layout should use -disable-code-place. llvm-svn: 154908	2012-04-17 06:20:42 +00:00
Preston Gurd	e63746195d	temporarily XFAIL this test until post RA live-ins is properly enabled. llvm-svn: 154882	2012-04-17 00:21:35 +00:00
Chandler Carruth	1f05b5a4ec	Disable the atom scheduling test after r154874 broke it. llvm-svn: 154877	2012-04-16 23:11:39 +00:00
Chandler Carruth	f594b178c6	Relax this test a touch to cope with different assembly variants. llvm-svn: 154870	2012-04-16 22:20:48 +00:00
Chandler Carruth	1f5580b6f3	Fix updateTerminator to be resilient to degenerate terminators where both fallthrough and a conditional branch target the same successor. Gracefully delete the conditional branch and introduce any unconditional branch needed to reach the actual successor. This fixes memory corruption in 2009-06-15-RegScavengerAssert.ll and possibly other tests. Also, while I'm here fix a latent bug I spotted by inspection. I never applied the same fundamental fix to this fallthrough successor finding logic that I did to the logic used when there are no conditional branches. As a consequence it would have selected landing pads had they been aligned in just the right way here. I don't have a test case as I spotted this by inspection, and the previous time I found this required half of TableGen's source code to produce it. =/ I hate backend bugs. ;] Thanks to Jim Grosbach for helping me reason through this and reviewing the fix. llvm-svn: 154867	2012-04-16 22:03:00 +00:00
Jakob Stoklund Olesen	73d96651ab	FileCheckize these tests. Add an extra test to ldr_post with an immediate increment. llvm-svn: 154859	2012-04-16 20:56:42 +00:00
Jakob Stoklund Olesen	e8ee9d1c8c	Disable code placement for this test. It makes it less sensitive to small changes in heuristics. llvm-svn: 154857	2012-04-16 20:49:06 +00:00
Richard Smith	12da79b859	Fix incorrect atomics codegen introduced in r154705, and extend test to catch it. llvm-svn: 154845	2012-04-16 18:43:53 +00:00
Bill Wendling	7e6be75e06	Move to X86 directory because this fails on non-X86 platforms. llvm-svn: 154825	2012-04-16 16:38:48 +00:00
Chandler Carruth	4190b507c5	Flip the new block-placement pass to be on by default. This is mostly to test the waters. I'd like to get results from FNT build bots and other bots running on non-x86 platforms. This feature has been pretty heavily tested over the last few months by me, and it fixes several of the execution time regressions caused by the inlining work by preventing inlining decisions from radically impacting block layout. I've seen very large improvements in yacr2 and ackermann benchmarks, along with the expected noise across all of the benchmark suite whenever code layout changes. I've analyzed all of the regressions and fixed them, or found them to be impossible to fix. See my email to llvmdev for more details. I'd like for this to be in 3.1 as it complements the inliner changes, but if any failures are showing up or anyone has concerns, it is just a flag flip and so can be easily turned off. I'm switching it on tonight to try and get at least one run through various folks' performance suites in case SPEC or something else has serious issues with it. I'll watch bots and revert if anything shows up. llvm-svn: 154816	2012-04-16 13:49:17 +00:00
Chandler Carruth	a355e7cf82	Remove an overly brittle test. This test will no longer be interesting once we start changing the block layout, so just nuke it. If anyone has ideas about how to craft a code layout agnostic form of the test please let me know. llvm-svn: 154815	2012-04-16 13:49:09 +00:00
Chandler Carruth	8c0b41d656	Add a somewhat hacky heuristic to do something different from whole-loop rotation. When there is a loop backedge which is an unconditional branch, we will end up with a branch somewhere no matter what. Try placing this backedge in a fallthrough position above the loop header as that will definitely remove at least one branch from the loop iteration, where whole loop rotation may not. I haven't seen any benchmarks where this is important but loop-blocks.ll tests for it, and so this will be covered when I flip the default. llvm-svn: 154812	2012-04-16 13:33:36 +00:00
Chandler Carruth	8c74c7b1c6	Tweak the loop rotation logic to check whether the loop is naturally laid out in a form with a fallthrough into the header and a fallthrough out of the bottom. In that case, leave the loop alone because any rotation will introduce unnecessary branches. If either side looks like it will require an explicit branch, then the rotation won't add any, do it to ensure the branch occurs outside of the loop (if possible) and maximize the benefit of the fallthrough in the bottom. llvm-svn: 154806	2012-04-16 09:31:23 +00:00
Hal Finkel	e0cf6397fd	Remove dead SD nodes after the combining pass. Fixes PR12201. llvm-svn: 154786	2012-04-16 03:33:22 +00:00
Chandler Carruth	ccc7e42b1f	Rewrite how machine block placement handles loop rotation. This is a complex change that resulted from a great deal of experimentation with several different benchmarks. The one which proved the most useful is included as a test case, but I don't know that it captures all of the relevant changes, as I didn't have specific regression tests for each, they were more the result of reasoning about what the old algorithm would possibly do wrong. I'm also failing at the moment to craft more targeted regression tests for these changes, if anyone has ideas, it would be welcome. The first big thing broken with the old algorithm is the idea that we can take a basic block which has a loop-exiting successor and a looping successor and use the looping successor as the layout top in order to get that particular block to be the bottom of the loop after layout. This happens to work in many cases, but not in all. The second big thing broken was that we didn't try to select the exit which fell into the nearest enclosing loop (to which we exit at all). As a consequence, even if the rotation worked perfectly, it would result in one of two bad layouts. Either the bottom of the loop would get fallthrough, skipping across a nearer enclosing loop and thereby making it discontiguous, or it would be forced to take an explicit jump over the nearest enclosing loop to reach its successor. The point of the rotation is to get fallthrough, so we need it to fallthrough to the nearest loop it can. The fix to the first issue is to actually layout the loop from the loop header, and then rotate the loop such that the correct exiting edge can be a fallthrough edge. This is actually much easier than I anticipated because we can handle all the hard parts of finding a viable rotation before we do the layout. We just store that, and then rotate after layout is finished. No inner loops get split across the post-rotation backedge because we check for them when selecting the rotation. That fix exposed a latent problem with our exiting block selection -- we should allow the backedge to point into the middle of some inner-loop chain as there is no real penalty to it, the whole point is that it won't be a fallthrough edge. This may have blocked the rotation at all in some cases, I have no idea and no test case as I've never seen it in practice, it was just noticed by inspection. Finally, all of these fixes, and studying the loops they produce, highlighted another problem: in rotating loops like this, we sometimes fail to align the destination of these backwards jumping edges. Fix this by actually walking the backwards edges rather than relying on loopinfo. This fixes regressions on heapsort if block placement is enabled as well as lots of other cases where the previous logic would introduce an abundance of unnecessary branches into the execution. llvm-svn: 154783	2012-04-16 01:12:56 +00:00
Craig Topper	bfc9a5f7d3	Remove AVX2 vpermq and vpermpd intrinsics. These can now be handled with normal shuffle vectors. llvm-svn: 154778	2012-04-15 22:43:31 +00:00
Nadav Rotem	42bcd04ee3	Fix PR12529. The Vxx family of instructions are only supported by AVX. Use non-vex instructions for SSE4. llvm-svn: 154770	2012-04-15 19:36:44 +00:00
Nadav Rotem	02ef0c3524	When emulating vselect using OR/AND/XOR make sure to bitcast the result back to the original type. llvm-svn: 154764	2012-04-15 15:08:09 +00:00
Elena Demikhovsky	779a72b49e	Added VPERM optimization for AVX2 shuffles llvm-svn: 154761	2012-04-15 11:18:59 +00:00
Richard Smith	3e8f1f6aea	Fix X86 codegen for 'atomicrmw nand' to generate x = ~(x & y), not x = ~x & y. llvm-svn: 154705	2012-04-13 22:47:00 +00:00
Evan Cheng	267a4ada52	On Darwin targets, only use vfma etc. if the source use fma() intrinsic explicitly. llvm-svn: 154689	2012-04-13 18:59:28 +00:00
Sirish Pande	1d195b9c25	Disable Hexagon test temporarily. There is an assert at line 558 in ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA). This assert needs to be addressed for the post RA scheduler. Until that assert is addressed, any pass that uses the post RA scheduler will fail. So, I am temporarily disabling the hexagon tests until that fix is in. The assert is as follows: assert(!MI->isTerminator() && !MI->isLabel() && "Cannot schedule terminators or labels!"); llvm-svn: 154617	2012-04-12 21:06:54 +00:00
Craig Topper	d0271b27cb	Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions. llvm-svn: 154580	2012-04-12 07:23:00 +00:00
Akira Hatanaka	c80ae58a5e	Revert changes that were accidentally committed. llvm-svn: 154563	2012-04-11 23:19:55 +00:00
Akira Hatanaka	1e962f250b	Fix string that is being checked. llvm-svn: 154547	2012-04-11 23:11:33 +00:00
Akira Hatanaka	47ad674f67	Emit neg.s or neg.d only if -enable-no-nans-fp-math is supplied by user, otherwise expand FNEG during legalization. llvm-svn: 154546	2012-04-11 22:59:08 +00:00
Akira Hatanaka	7f4c9d1429	Emit abs.s or abs.d only if -enable-no-nans-fp-math is supplied by user. Invalid operation is signaled if the operand of these instructions is NaN. llvm-svn: 154545	2012-04-11 22:49:04 +00:00
Akira Hatanaka	4f5c8421b3	Fix bugs in lowering of FCOPYSIGN nodes. - FCOPYSIGN nodes that have operands of different types were not handled. - Different code was generated depending on the endianness of the target. Additionally, code is added that emits INS and EXT instructions, if they are supported by target (they are R2 instructions). llvm-svn: 154540	2012-04-11 22:13:04 +00:00
Evan Cheng	5efc442290	Add more fused mul+add/sub patterns. rdar://10139676 llvm-svn: 154484	2012-04-11 06:59:47 +00:00
Nadav Rotem	9bc178ac5c	Reapply 154396 after fixing a test. Original message: Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendV uses a register for the selection while Vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154483	2012-04-11 06:40:27 +00:00
Evan Cheng	67a09fc397	Match (fneg (fma) to vfnma. rdar://10139676 llvm-svn: 154469	2012-04-11 01:21:25 +00:00
Evan Cheng	d0f61cbefe	Merge fma.ll into fusedMAC.ll llvm-svn: 154466	2012-04-11 01:03:11 +00:00
Jakob Stoklund Olesen	0bcf8f4bfb	Fix test to be register assignment invariant. llvm-svn: 154453	2012-04-11 00:00:24 +00:00
Owen Anderson	6f1ee1634d	Move the constant-folding support for FP_ROUND in SelectionDAG from the one-operand version of getNode() to the two-operand version, since it became a two-operand node at some point. Zap a testcase that this allows us to completely fold away. llvm-svn: 154447	2012-04-10 22:46:53 +00:00
Evan Cheng	d0007f3c83	Handle llvm.fma.* intrinsics. rdar://10914096 llvm-svn: 154439	2012-04-10 21:40:28 +00:00
Duncan Sands	4f53074cca	Add a comment noting that the fdiv -> fmul conversion won't generate multiplication by a denormal, and some tests checking that. llvm-svn: 154431	2012-04-10 20:35:27 +00:00
Eric Christopher	65ada95b84	Temporarily revert this patch to see if it brings the buildbots back. llvm-svn: 154425	2012-04-10 19:33:16 +00:00
Eric Christopher	e9abba71fe	To ensure that we have more accurate line information for a block don't elide the branch instruction if it's the only one in the block, otherwise it's ok. PR9796 and rdar://11215207 llvm-svn: 154417	2012-04-10 18:18:10 +00:00
Nadav Rotem	f934f91709	Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendv uses a register for the selection while vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154396	2012-04-10 14:33:13 +00:00
Anton Korobeynikov	4d1220de34	Transform div to mul with reciprocal only when fp imm is legal. This fixes PR12516 and uncovers one weird problem in legalize (worked around). llvm-svn: 154394	2012-04-10 13:22:49 +00:00
Evan Cheng	0752624970	Add proper checks. llvm-svn: 154379	2012-04-10 03:15:42 +00:00
Evan Cheng	f8bad08001	Fix a long standing tail call optimization bug. When a libcall is emitted legalizer always use the DAG entry node. This is wrong when the libcall is emitted as a tail call since it effectively folds the return node. If the return node's input chain is not the entry (i.e. call, load, or store) use that as the tail call input chain. PR12419 rdar://9770785 rdar://11195178 llvm-svn: 154370	2012-04-10 01:51:00 +00:00
Rafael Espindola	1d9672bdce	Don't try to zExt just to check if an integer constant is zero, it might not fit in a i64. llvm-svn: 154364	2012-04-10 00:16:22 +00:00
Lang Hames	ec96cd0690	Test case for PR12495. llvm-svn: 154359	2012-04-09 23:58:59 +00:00
Akira Hatanaka	8483a6c47d	Have TargetLowering::getPICJumpTableRelocBase return a node that points to the GOT if jump table uses 64-bit gp-relative relocation. llvm-svn: 154341	2012-04-09 20:32:12 +00:00
Chad Rosier	e0e38f61a5	When performing a truncating store, it's possible to rearrange the data in-register, such that we can use a single vector store rather than a series of scalar stores. For func_4_8 the generated code vldr d16, LCPI0_0 vmov d17, r0, r1 vadd.i16 d16, d17, d16 vmov.u16 r0, d16[3] strb r0, [r2, #3] vmov.u16 r0, d16[2] strb r0, [r2, #2] vmov.u16 r0, d16[1] strb r0, [r2, #1] vmov.u16 r0, d16[0] strb r0, [r2] bx lr becomes vldr d16, LCPI0_0 vmov d17, r0, r1 vadd.i16 d16, d17, d16 vuzp.8 d16, d17 vst1.32 {d16[0]}, [r2, :32] bx lr I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll, but I couldn't think of a way to judiciously apply this combine. This ldrh r0, [r0, #4] strh r0, [r1] becomes vldr d16, [r0] vmov.u16 r0, d16[2] vmov.32 d16[0], r0 vuzp.16 d16, d17 vst1.32 {d16[0]}, [r1, :32] PR11158 rdar://10703339 llvm-svn: 154340	2012-04-09 20:32:02 +00:00

6209 Commits