llvm-project

Commit Graph

Author	SHA1	Message	Date
Andrew Trick	5533adc117	Reintroduce the SelectionDAG scheduler test for r233351. This test returns nonnative integer types which aren't supported on all targets. The real issue with the SelectionDAG scheduler is with x86 EFLAGS. llvm-svn: 233355	2015-03-27 04:42:52 +00:00
David Majnemer	b919dd693f	WinEH: Create a parent frame alloca for HandlerType xdata tables We don't have any logic to emit those tables yet, so the SDAG lowering of this intrinsic is just a stub. We can see the intrinsic in the prepared IR, though. llvm-svn: 233354	2015-03-27 04:17:07 +00:00
Andrew Trick	46863e5565	This test should have been target specific. I missed that. llvm-svn: 233353	2015-03-27 04:04:35 +00:00
Andrew Trick	e97ff5a2ad	Fix a bug in SelectionDAG scheduling backtracking code: PR22304. It can happen (by line CurSU->isPending = true; // This SU is not in AvailableQueue right now.) that a SUnit is marked as available but is not in the AvailableQueue. For a SUnit to be selected for scheduling, both conditions must be met. This patch mainly defensively protects against invalidly removing a node from a queue. Sometimes nodes are marked isAvailable but are not in the queue because they have been deferred due to some hazard. Patch by Pawel Bylica! llvm-svn: 233351	2015-03-27 03:44:13 +00:00
Duncan P. N. Exon Smith	219c8d3876	DebugInfo: Update testcases with invalid variables Fix testcases whose variables are invalid. I'm working on a patch that adds `Verifier` checks for `MDLocalVariable` (and `MDGlobalVariable`), and these failed because: - `scope:` fields need to point at `MDLocalScope` and can't be null. - `file:` fields need to point at `MDFile`. - `inlinedAt:` fields need to point at `MDLocation`. llvm-svn: 233349	2015-03-27 01:58:34 +00:00
Derek Schuff	b051389f04	Use movw/movt instead of constant pool loads to lower byval parameter copies Summary: The ARM backend can use a loop to implement copying byval parameters before a call. In non-thumb2 mode it uses a constant pool load to materialize the trip count. For targets that need movt instead (e.g. Native Client), use the same code as in thumb2 mode to materialize the trip count. Reviewers: jfb, t.p.northover Differential Revision: http://reviews.llvm.org/D8442 llvm-svn: 233324	2015-03-26 22:11:00 +00:00
Vladimir Sukharev	4b18c727a2	[ARM] Add v8.1a "Rounding Double Multiply Add/Subtract" extension Reviewers: t.p.northover Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8503 llvm-svn: 233301	2015-03-26 18:29:02 +00:00
Vladimir Sukharev	c632cda8b2	[AArch64, ARM] Add v8.1a architecture and generic cpu New architecture and cpu added, following http://community.arm.com/groups/processors/blog/2014/12/02/the-armv8-a-architecture-and-its-ongoing-development Reviewers: t.p.northover Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8505 llvm-svn: 233290	2015-03-26 17:05:54 +00:00
Andrea Di Biagio	8f7feec5fd	[X86][FastIsel] Teach how to select vector load instructions. This patch teaches fast-isel how to select 128-bit vector load instructions. Added test CodeGen/X86/fast-isel-vecload.ll Differential Revision: http://reviews.llvm.org/D8605 llvm-svn: 233270	2015-03-26 11:29:02 +00:00
Quentin Colombet	2c6e0597c6	[RegisterCoalescer] Add a rule to consider more profitable copies first when those are in the same basic block. The previous approach was the topological order of the basic block. By default this rule is disabled. Related to PR22768. llvm-svn: 233241	2015-03-26 01:01:48 +00:00
Eric Christopher	9f74ca5e0f	Testcase for r233239. llvm-svn: 233240	2015-03-26 00:57:33 +00:00
Simon Pilgrim	09f3ff9a0a	[DAGCombiner] Add support for TRUNCATE + FP_EXTEND vector constant folding This patch adds support for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match. It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions. Differential Revision: http://reviews.llvm.org/D8593 llvm-svn: 233224	2015-03-25 22:30:31 +00:00
Reid Kleckner	7e9546b378	WinEH: Create an unwind help alloca for __CxxFrameHandler3 xdata tables We don't have any logic to emit those tables yet, so the sdag lowering of this intrinsic is just a stub. We can see the intrinsic in the prepared IR, though. llvm-svn: 233209	2015-03-25 20:10:36 +00:00
Kit Barton	535e69de34	Add Hardware Transactional Memory (HTM) Support This patch adds Hardware Transactional Memory (HTM) support as provided by ISA 2.07 (POWER8). The intrinsic support is based on the GCC one [1], but currently only the 'PowerPC HTM Low Level Built-in Function' are implemented. The HTM instructions follow the RC ones and the transaction initiation result is set on RC0 (with the exception of tcheck). The current approach is to create a register copy from CR0 to a GPR and compare it. Although this is suboptimal, since the branch could be taken directly by comparing the CR0 value, it generates code correctly on both test and branch and just return value. A possible future optimization could be to eliminate the MFCR instruction and branch directly. HTM usage requires a recent kernel with PPC HTM enabled. Tested on powerpc64 and powerpc64le. This is sent along with a clang patch to enable the builtins and option switch. [1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html Phabricator Review: http://reviews.llvm.org/D8247 llvm-svn: 233204	2015-03-25 19:36:23 +00:00
Sanjay Patel	2f8f019daf	[X86, AVX] improve insertion into zero element of 256-bit vector This patch allows AVX blend instructions to handle insertion into the low element of a 256-bit vector for the appropriate data types. For f32, instead of: vblendps $1, %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1,2,3] vblendps $15, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] we get: vblendps $1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7] For f64, instead of: vmovsd %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1] vblendpd $3, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1],ymm0[2,3] we get: vblendpd $1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3] For the hardware-neglected integer data types, I left a TODO comment in the code and added regression tests for a follow-on patch. Differential Revision: http://reviews.llvm.org/D8609 llvm-svn: 233199	2015-03-25 17:36:01 +00:00
Sanjay Patel	defd9b9b4c	use update_llc_test_checks.py to tighten checking in these tests 1. There were no CHECK-LABELs, so we could match instructions from the wrong function. 2. The use of zero operands meant multiple xor instructions could match some CHECKs. 3. The test was over-specified to need a Sandybridge CPU and Darwin triple. llvm-svn: 233198	2015-03-25 17:34:11 +00:00
Andrea Di Biagio	07a26d6b2f	[X86] Simplify check lines in tests. No functional change. Also, removed unused check lines from test atomic6432.ll. llvm-svn: 233181	2015-03-25 11:44:19 +00:00
Paul Robinson	284f0451cf	'optnone' should not disable DAG combiner. Reverts the code change from r221168 and the relevant test. It was a mistake to disable the combiner, and based on the ultimate definition of 'optnone' we shouldn't have considered the test case as failing in the first place. llvm-svn: 233153	2015-03-25 00:10:24 +00:00
Reid Kleckner	11470c48d0	X86: Fix frameescape when not using an FP We can't use TargetFrameLowering::getFrameIndexOffset directly, because Win64 really wants the offset from the stack pointer at the end of the prologue. Instead, use X86FrameLowering::getFrameIndexOffsetFromSP(), which is a pretty close approximation of that. It fails to handle cases with interestingly large stack alignments, which is pretty uncommon on Win64 and is TODO. llvm-svn: 233137	2015-03-24 23:46:01 +00:00
Sanjay Patel	99d246d7d7	[X86, AVX] recognize shufflevector with zero input as a vperm2 (PR22984) vperm2x128 instructions have the special ability (aka free hardware capability) to shuffle zero values into a vector. This patch recognizes that type of shuffle and generates the appropriate control byte. https://llvm.org/bugs/show_bug.cgi?id=22984 Differential Revision: http://reviews.llvm.org/D8563 llvm-svn: 233100	2015-03-24 19:19:07 +00:00
Daniel Sanders	c676f2a8bb	[mips] Support 16-bit offsets for 'm' inline assembly memory constraint. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8435 llvm-svn: 233086	2015-03-24 15:19:14 +00:00
Marek Olsak	949f5dab95	R600/SI: Select V_BFE_U32 for and+shift with a non-literal offset llvm-svn: 233079	2015-03-24 13:40:34 +00:00
Marek Olsak	9b72868d17	R600/SI: Custom-select 32-bit S_BFE from bitwise opcodes llvm-svn: 233078	2015-03-24 13:40:27 +00:00
Marek Olsak	63a7b084eb	R600/SI: Improve BFM support llvm-svn: 233077	2015-03-24 13:40:21 +00:00
Marek Olsak	7d77728c97	R600/SI: Use V_FRACT_F64 for faster 64-bit floor on SI Other f64 opcodes not supported on SI can be lowered in a similar way. v2: use complex VOP3 patterns llvm-svn: 233076	2015-03-24 13:40:15 +00:00
Marek Olsak	43650e45c3	R600/SI: Expand fract to floor, then only select V_FRACT on CI V_FRACT is buggy on SI. R600-specific code is left intact. v2: drop the multiclass, use complex VOP3 patterns llvm-svn: 233075	2015-03-24 13:40:08 +00:00
Daniel Sanders	a73d8fe2ad	[mips] Distinguish 'R', 'ZC', and 'm' inline assembly memory constraint. Summary: Previous behaviour of 'R' and 'm' has been preserved for now. They will be improved in subsequent commits. The offset permitted by ZC varies according to the subtarget since it is intended to match the restrictions of the pref, ll, and sc instructions. The restrictions on these instructions are: * For microMIPS: 12-bit signed offset. * For Mips32r6/Mips64r6: 9-bit signed offset. * Otherwise: 16-bit signed offset. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8414 llvm-svn: 233063	2015-03-24 11:26:34 +00:00
Simon Pilgrim	481f4146cd	[SelectionDAG] Fixed issue with uitofp vector constant folding being treated as sitofp While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref): %X = sitofp i8 -1 to double ; yields double:-1.0 %Y = uitofp i8 -1 to double ; yields double:255.0 The vector constant folding was always using sitofp: %X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0> %Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0> This patch fixes this so that the correct opcode is used for sitofp and uitofp. %X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0> %Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0> Differential Revision: http://reviews.llvm.org/D8560 llvm-svn: 233033	2015-03-23 22:44:55 +00:00
Duncan P. N. Exon Smith	9b9cc2dad4	DebugInfo: Overload get() in DIDescriptor subclasses Continue to simplify the `DIDescriptor` subclasses, so that they behave more like raw pointers. Remove `getRaw()`, replace it with an overloaded `get()`, and overload the arrow and cast operators. Two testcases started to crash on the arrow operators with this change because of `scope:` references that weren't real scopes. I fixed them. Soon I'll add verifier checks for them too. This also adds explicit dereference operators. Previously, the builtin dereference against `operator MDNode *()` would have worked, but now the builtins are ambiguous. llvm-svn: 233030	2015-03-23 21:54:07 +00:00
Ahmed Bougacha	d1655cb1c0	[AArch64, ARM] Enable GlobalMerge with -O3 rather than -O1. The pass used to be enabled by default with CodeGenOpt::Less (-O1). This is too aggressive, considering the pass indiscriminately merges all globals together. Currently, performance doesn't always improve, and, on code that uses few globals (e.g., the odd file- or function- static), more often than not is degraded by the optimization. Lengthy discussion can be found on llvmdev (AArch64-focused; ARM has similar problems): http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082800.html Also, it makes tooling and debuggers less useful when dealing with globals and data sections. GlobalMerge needs to better identify those cases that benefit, and this will be done separately. In the meantime, move the pass to run with -O3 rather than -O1, on both ARM and AArch64. llvm-svn: 233024	2015-03-23 21:17:36 +00:00
Chad Rosier	384ade9b11	[AArch64] Add FileCheck that was missing from test in r232967. llvm-svn: 233013	2015-03-23 20:25:15 +00:00
Matt Arsenault	f5b2cd891a	R600/SI: Allow commuting compares This enables very common cases to switch to the smaller encoding. All of the standard LLVM canonicalizations of comparisons are the opposite of what we want. Compares with constants are moved to the RHS, but the first operand can be an inline immediate, literal constant, or SGPR using the 32-bit VOPC encoding. There are additional bad canonicalizations that should also be fixed, such as canonicalizing ge x, k to gt x, (k + 1) if this makes k no longer an inline immediate value. llvm-svn: 232988	2015-03-23 18:45:30 +00:00
Chad Rosier	affe181b39	[AArch64] Enable rematerialization of float 0 values. Patch by Geoff Berry<gberry@codeaurora.org>. llvm-svn: 232967	2015-03-23 17:19:34 +00:00
Bradley Smith	ae0ad9c95d	Revert "[ARM] Add more pattern matching for f16 <-> f64 conversions" This change is incorrect since it converts double rounding into single rounding, which can produce different results. Instead this optimization will be done by modifying Clang's codegen to not produce double rounding in the first place. This reverts commit r232954. llvm-svn: 232962	2015-03-23 16:52:52 +00:00
Tom Stellard	f0a575f6be	R600/SI: Fix crash in SIInstrInfo::areLoadsFromSameBasePtr() This function assumed that SMRD instructions always have immediate offsets, which is not always the case. llvm-svn: 232957	2015-03-23 16:06:01 +00:00
Bradley Smith	bc0f0d8c49	[ARM] Add more pattern matching for f16 <-> f64 conversions Specifically when the conversion is done in two steps, f16 -> f32 -> f64. For example: %1 = tail call float @llvm.convert.from.fp16.f32(i16 %0) %conv = fpext float %1 to double to: vcvtb.f64.f16 llvm-svn: 232954	2015-03-23 15:59:54 +00:00
Petar Jovanovic	5b4362276b	Fix sign extension for MIPS64 in makeLibCall function Fixing sign extension in makeLibCall for MIPS64. In MIPS64 architecture all 32 bit arguments (int, unsigned int, float 32 (soft float)) must be sign extended. This fixes test "MultiSource/Applications/oggenc/". Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D7791 llvm-svn: 232943	2015-03-23 12:28:13 +00:00
Hal Finkel	8f7c5a7f18	[SDAG] Don't widen VSETCC during type legalization for split operands Because the operands of a vector SETCC node can be of a different type from the result (and often are), it can happen that even if we'd prefer to widen the result type of the SETCC, the operands have been split instead. In this case, the SETCC result also must be split. This mirrors what is done in WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not widened the following calls to GetWidenedVector will assert (which is what was happening in the test case). llvm-svn: 232935	2015-03-23 08:22:43 +00:00
Matt Arsenault	da5ece8e35	R600: Cleanup test with multiple check prefixes llvm-svn: 232901	2015-03-21 19:15:46 +00:00
Simon Pilgrim	307cb8fe5d	Tidied up vec_zero_cse.ll test. NFCI. Added target triple and refactored the CHECKs to be per function. llvm-svn: 232894	2015-03-21 14:05:12 +00:00
Tim Northover	000f994633	AArch64: simplify test case llvm-svn: 232886	2015-03-21 04:37:08 +00:00
Eric Christopher	faad620569	Remove the bare getSubtargetImpl call from the AArch64 port. As part of this add a test that shows we can generate code for functions that specifically enable a subtarget feature. llvm-svn: 232884	2015-03-21 04:04:50 +00:00
Eric Christopher	83eb13c967	Remove the bare getSubtargetImpl call from the PPC port. As part of this add a test that shows we can generate code for functions that differ by subtarget feature. llvm-svn: 232882	2015-03-21 03:36:02 +00:00
Eric Christopher	c5a85af3b2	Cache the Function dependent subtarget on the MachineFunction. As preparation for removing the getSubtargetImpl() call from TargetMachine go ahead and flip the switch on caching the function dependent subtarget and remove the bare getSubtargetImpl call from the X86 port. As part of this add a few tests that show we can generate code and assemble on X86 based on features/cpu on the Function. llvm-svn: 232879	2015-03-21 03:13:10 +00:00
Ahmed Bougacha	7173b669b4	[CodeGen][IfCvt] Don't re-ifcvt blocks with unanalyzable terminators. If we couldn't analyze its terminator (i.e., it's an indirectbr, or some other weirdness), we can't safely re-if-convert a predicated block, because we can't tell whether the predicated terminator can fallthrough (it does). Currently, we would completely ignore the fallthrough successor. In the added testcase, this means we used to generate: ... @ %entry: cmp r5, #21 ittt ne @ %cc1f: cmpne r7, #42 @ %cc2t: strne.w r5, [r8] movne pc, r10 @ %cc1t: ... Whereas the successor of %cc1f was originally %bb1. With the fix, we get the correct: ... @ %entry: cmp r5, #21 itt eq @ %cc1t: streq.w r5, [r11] moveq pc, r0 @ %cc1f: cmp r7, #42 itt ne @ %cc2t: strne.w r5, [r8] movne pc, r10 @ %bb1: ... rdar://20192768 Differential Revision: http://reviews.llvm.org/D8509 llvm-svn: 232872	2015-03-21 01:23:15 +00:00
Ahmed Bougacha	e6bb09ac3f	[AArch64] Prefer UZP for concat_vector of illegal truncs. Follow-up to r232459: prefer a UZP shuffle to the intermediate truncs. llvm-svn: 232871	2015-03-21 01:08:39 +00:00
Andrew Kaylor	3170e5620e	Fixing a bug with WinEH PHI handling llvm-svn: 232851	2015-03-20 21:42:54 +00:00
Sanjay Patel	c88f724fed	[X86] Prefer blendps over insertps codegen for one special case With this patch, for this one exact case, we'll generate: blendps %xmm0, %xmm1, $1 instead of: insertps %xmm0, %xmm1, $0 If there's a memory operand available for load folding and we're optimizing for size, we'll still generate the insertps. The detailed performance data motivation for this may be found in D7866; in summary, blendps has 2-3x throughput vs. insertps on widely used chips. Differential Revision: http://reviews.llvm.org/D8332 llvm-svn: 232850	2015-03-20 21:19:52 +00:00
Rafael Espindola	36a15cb975	Don't declare all text sections at the start of the .s The code this patch removes was there to make sure the text sections went before the dwarf sections. That is necessary because MachO uses offsets relative to the start of the file, so adding a section can change relaxations. The dwarf sections were being printed at the start just to produce symbols pointing at the start of those sections. The underlying issue was fixed in r231898. The dwarf sections are now printed when they are about to be used, which is after we printed the text sections. To make sure we don't regress, the patch makes the MachO streamer assert if CodeGen puts anything unexpected after the DWARF sections. llvm-svn: 232842	2015-03-20 20:00:01 +00:00
John Brawn	1f26a47630	[ARM] Fix handling of thumb1 out-of-range frame offsets LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its answer when the base register changes. Unfortunately this isn't true in thumb1, where SP-based loads allow a larger offset than non-SP-based loads, and this causes the base register reuse code to generate instructions that are unencodable, causing an assertion failure. Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which ARMBaseRegisterInfo can then make use of to give the correct answer. Differential Revision: http://reviews.llvm.org/D8419 llvm-svn: 232825	2015-03-20 17:20:07 +00:00
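
The uitofp/sitofp distinction described in r233033 (481f4146cd) can be sketched in a few lines of Python. This is only an illustration of the semantics quoted in that commit message, not LLVM's implementation; the helper names `fold_int_to_fp` and `fold_vector` are hypothetical and do not exist in LLVM.

```python
def fold_int_to_fp(bits: int, width: int, signed: bool) -> float:
    """Interpret a `width`-bit pattern as signed (sitofp) or unsigned (uitofp)
    and convert it to a float. Hypothetical helper, for illustration only."""
    value = bits & ((1 << width) - 1)          # keep only the low `width` bits
    if signed and value >= 1 << (width - 1):   # top bit set: negative in two's complement
        value -= 1 << width
    return float(value)


def fold_vector(elements, width: int, signed: bool):
    """Fold each lane with the opcode that was actually requested."""
    return [fold_int_to_fp(e, width, signed) for e in elements]


# <2 x i8> <i8 -1, i8 -1> converted to <2 x double>
print(fold_vector([-1, -1], 8, signed=True))   # sitofp: [-1.0, -1.0]
print(fold_vector([-1, -1], 8, signed=False))  # uitofp: [255.0, 255.0]
```

The bug described in the commit message corresponds to always folding vectors with `signed=True`, regardless of the opcode.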

12308 Commits