Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets.
I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.
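For reference, a scalar model of what a v64i8 multiply computes per lane (illustrative only - the real lowering does this in vector registers, roughly by widening to i16 and truncating back):
```
#include <cstdint>

// Per-lane semantics of a vXi8 multiply: widen to 16 bits, multiply,
// truncate back to 8 bits. The vector lowering performs the same steps
// on whole registers.
void mul_v64i8(const uint8_t *a, const uint8_t *b, uint8_t *out) {
  for (int i = 0; i < 64; ++i)
    out[i] = static_cast<uint8_t>(uint16_t(a[i]) * uint16_t(b[i]));
}
```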
Differential Revision: http://reviews.llvm.org/D18937
llvm-svn: 265902
Summary:
After we make the adjustment, we can assume that for local allocas, but
not for stack parameters, the return address, or any other fixed stack
object (which has a negative offset and therefore lies prior to the
adjusted SP).
Fixes PR26662.
Reviewers: hfinkel, qcolombet, rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D18471
llvm-svn: 265886
This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle.
The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp.
Differential Revision: http://reviews.llvm.org/D18441
llvm-svn: 265874
It caused PR27275: "ARM: Bad machine code: Using an undefined physical register"
Also reverting the following commits that were landed on top:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"
llvm-svn: 265790
It seems that llc cannot be used in assembler tests, so a test that
checks the asm for a particular target needs to be moved to codegen.
llvm-svn: 265770
Summary:
EHPad BBs are not entered the classic way and therefore do not need to be placed after their predecessors. This patch makes sure EHPad BBs are not chosen amongst successors to form chains, and are selected as a last resort when selecting the best candidate.
EHPads are scheduled in reverse probability order so that they flow into each other naturally.
Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17625
llvm-svn: 265726
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
(icmp slt x, 0) -> (icmp sle (add x, 1), 0)
(icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
(icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
(icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)
Original Message:
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fall back to LXADD.
Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.
This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).
This uses the LOCK ISD opcodes added by r262244.
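A minimal source-level example of the pattern this targets (hypothetical function name; the comment sketches the intended codegen):
```
#include <atomic>

// The old value's sign test can reuse the EFLAGS produced by a plain
// `lock add`, so no LXADD + CMP pair is needed.
bool old_value_was_negative(std::atomic<int> &counter) {
  return counter.fetch_add(1, std::memory_order_seq_cst) < 0;
}
```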
Differential Revision: http://reviews.llvm.org/D17633
llvm-svn: 265636
Third time's the charm? The previous attempt (r265345) caused ASan test
failures on X86, as broken CFI caused stack traces to not work.
This version of the patch makes sure not to merge with stack adjustments
that have CFI, and to not add merged instructions' offsets to the CFI
about to be generated.
This is already covered by the lit tests; I just got the expectations
wrong previously.
llvm-svn: 265623
when DenseMap grew and moved memory. I verified it fixed the bootstrap
problem on x86_64-linux-gnu but I cannot verify whether it fixes
the bootstrap error on clang-ppc64be-linux. I will watch the build-bot
result closely.
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.
analyzeSiblingValues is an N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes significant compile time issues when N is large, it is also
important for performance since it removes redundant spills and
enables rematerialization.
To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the defining expression of the original register for
rematerialization and keeps it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.
Differential Revision: http://reviews.llvm.org/D15302
llvm-svn: 265547
While preserving the return value for @llvm.experimental.deoptimize at
the IR level is useful during mid-level optimization, doing so at the
machine instruction level requires generating some extra code and a
return that is non-ideal. This change has LLVM lower
```
%val = call @llvm.experimental.deoptimize
ret %val
```
to effectively
```
call @__llvm_deoptimize()
unreachable
```
instead.
llvm-svn: 265502
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.
llvm-svn: 265481
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fall back to LXADD.
Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.
This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).
This uses the LOCK ISD opcodes added by r262244.
Differential Revision: http://reviews.llvm.org/D17633
llvm-svn: 265450
utils/update_test_checks.py was improved with:
http://reviews.llvm.org/rL265414
to include the first line of the function (expected to be
a comment line). This ensures that nothing bad has happened
before the first actual line of checked asm. It also matches
the existing behavior of the old script.
llvm-svn: 265416
Presently, CodeGenPrepare deletes all nearly empty (only phi and branch)
basic blocks. This pass can delete loop preheaders which frequently creates
critical edges. A preheader can be a convenient place to spill registers to
the stack. If the entrance to a loop body is a critical edge, then spills
may occur in the loop body rather than immediately before it. This patch
protects loop preheaders from deletion in CodeGenPrepare even if they are
nearly empty.
Since the patch alters the CFG, it affects a large number of test cases.
In most cases, the changes are merely cosmetic (basic blocks have different
names or instruction orders change slightly). I am somewhat concerned about
the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not
deleted, then the MIPS backend does not take advantage of a branch delay
slot. Consequently, I would like some close review by a MIPS expert.
The patch also partially subsumes D16893 from George Burgess IV. George
correctly notes that CodeGenPrepare does not actually preserve the dominator
tree. I think the dominator tree was usually not valid when CodeGenPrepare
ran, but I am using LoopInfo to mark preheaders, so the dominator tree is
now always valid before CodeGenPrepare.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng
http://reviews.llvm.org/D16984
llvm-svn: 265397
Summary:
I encountered this issue when constant folding during inlining tried to
fold away a bitcast of a double to an x86_mmx, which is not an integral
type. The test case exposes the same issue with a smaller code snippet
during early CSE.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18528
llvm-svn: 265367
The original commit miscompiled things on 32-bit Windows, e.g. a Clang
bootstrap. It turns out that mergeSPUpdates() was a bit too generous in
what it interpreted as a stack adjustment, causing the following code:
addl $12, %esp
leal -4(%ebp), %esp
To be "optimized" into simply:
addl $8, %esp
This commit tightens up mergeSPUpdates() and includes a new test
(test14 in movtopush.ll) for this situation.
llvm-svn: 265345
Adds some dllexport tests to verify that:
- Variables in bss are exported appropriately
- Non-dllexport symbols aliased to dllexport symbols are not exported
- Symbols declared as dllexport but are not defined are not exported
We plan to enable dllimport/dllexport support for the PS4, and these
additional tests are for points we noticed in our internal testing.
Patch by Warren Ristow!
Differential Revision: http://reviews.llvm.org/D18682
llvm-svn: 265333
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.
analyzeSiblingValues is an N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes significant compile time issues when N is large, it is also
important for performance since it removes redundant spills and
enables rematerialization.
To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the defining expression of the original register for
rematerialization and keeps it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.
Differential Revision: http://reviews.llvm.org/D15302
llvm-svn: 265309
Implemented truncstore for KNL and skylake-avx512.
Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes.
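A scalar model of the bit-packing (illustrative only):
```
#include <cstdint>

// 32 one-bit lanes pack into 4 bytes - one bit per lane, not one byte.
uint32_t pack_v32i1(const bool lanes[32]) {
  uint32_t bits = 0;
  for (int i = 0; i < 32; ++i)
    bits |= uint32_t(lanes[i]) << i;
  return bits;  // stored with a 4-byte store
}
```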
Differential Revision: http://reviews.llvm.org/D18740
llvm-svn: 265283
Was there really no other way to splat a byte in SSE2?
punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
llvm-svn: 265172
Note however that this is identical to the existing SSE2 run.
What we really want is yet another run for an SSE2 machine that
also has fast unaligned 16-byte accesses.
llvm-svn: 265167
Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 -
where we noticed that an intermediate splat was being generated for memsets of
non-zero chars.
That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.
The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
Note that the SSE1 path is not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.
llvm-svn: 265161
Follow-up to D18566 - where we noticed that an intermediate splat was being
generated for memsets of non-zero chars.
That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.
The tests that were added in the last patch are now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
In the new tests, the splat via shuffling looks ok to me, but there might be some
room for improvement depending on uarch there.
Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.
Differential Revision: http://reviews.llvm.org/D18676
llvm-svn: 265148
This changes some dllexport tests, to verify that some symbols that
should not be exported are not, in a way that improves the robustness
of CHECK-SAME interaction with CHECK-NOT.
We plan to enable dllimport/dllexport support for the PS4, and these
changes are for points we noticed in our internal testing.
Patch by Warren Ristow!
llvm-svn: 265106
Re-enable an assertion enabled by Justin Lebar in rL265092. rL265092
was breaking test/CodeGen/X86/deopt-intrinsic.ll because webkit_jscc
does not like non-i64 return types. Change the test case to not do
that.
llvm-svn: 265099
This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder
into DICompileUnit. DIBuilder is not the right place for this enum to live
in — a metadata consumer should not have to include DIBuilder.h.
I also added a Verifier check that checks that the emission kind of a
DICompileUnit is actually legal.
http://reviews.llvm.org/D18612
<rdar://problem/25427165>
llvm-svn: 265077
We don't really support non-constant shuffle masks, but these tests are for cases where BUILD_VECTOR is made up from vector extracts (as well as undef/zero scalars).
llvm-svn: 265045
The test case was defining and using a function 'notExported()', but
the FileCheck checks were checking for the name 'not_exported'. This
changes the test to use 'notExported' across the board. Also, the test
defined a function 'not_defined()', but doesn't have any checks related
to it. For consistency, this name is changed to 'notDefined'. A later
commit will add checks for 'notDefined'.
Patch by Warren Ristow!
llvm-svn: 264984
The size savings are significant, and from what I can tell, both ICC and GCC do this.
Differential Revision: http://reviews.llvm.org/D18573
llvm-svn: 264966
Fix for an issue introduced by D17297, where we were breaking early from the loop detecting consecutive loads, which could leave us thinking a consecutive load with zeros was possible.
llvm-svn: 264922
XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.
llvm-svn: 264870
We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here.
llvm-svn: 264867
operations.
Specifically, we had code that tried to badly approximate reconstructing
all of the possible variations on addressing modes in two x86
instructions based on those in one pseudo instruction. This is not the
first bug uncovered with doing this, so stop doing it altogether.
Instead generically and pedantically copy every operand from the address
over to both new instructions, and strip kill flags from any register
operands.
This fixes a subtle bug seen in the wild where we would mysteriously
drop parts of the addressing mode, causing for example the index
argument in the added test case to just be completely ignored.
Hypothetically, this was an extremely bad miscompile because it actually
caused a predictable and leveragable write of a 64bit quantity to an
unintended offset (the first element of the array instead of whatever
other element was intended). As a consequence, in theory this could even
have introduced security vulnerabilities.
However, this was only something that could happen with an atomic
floating point add. No other operation could trigger this bug, so it
seems extremely unlikely to have occurred widely in the wild.
But it did in fact occur, and frequently in scientific applications
which were using relaxed atomic updates of a floating point value after
adding a delta. Those would end up being quite badly miscompiled by
LLVM, which is how we found this. Of course, this often looks like
a race condition in the code, but it was actually a miscompile.
I suspect that this whole RELEASE_FADD thing was a complete mistake.
There is no such operation, and I worry that anything other than add
will get remarkably worse code generation. But that's not for this
change....
llvm-svn: 264845
1. Removed the run line for mingw32 and made the Darwin triples unknown.
This is a test of 32-bit vs. 64-bit platform and the underlying hardware.
We have other tests for checking behavioral differences of the OS platform.
2. Changed the CPU specifiers to the attributes they were meant to represent.
Any CPU that doesn't have SSE4.2 is assumed to have slow unaligned 16-byte accesses,
so it won't use those here.
3. Although the stores really could all be CHECK-DAG, I left them as CHECK-NEXT to
show the strange behavior of the instruction scheduler in the SLOW_32 case.
4. The odd-looking instructions are due to the use of a null pointer in the IR, so
we have integer immediate store addresses. Cute.
llvm-svn: 264796
Add function soft attribute to the generation of Jump Tables in CodeGen
as initial step towards clang support of gcc's no-jump-table support
Reviewers: hans, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18321
llvm-svn: 264756
Fixed fp_to_uint instruction selection on KNL.
One pattern was missing for <4 x double> to <4 x i32>
Differential Revision: http://reviews.llvm.org/D18512
llvm-svn: 264701
Minimum density for both optsize and non optsize are now options
-sparse-jump-table-density (default 10) for non optsize functions
-dense-jump-table-density (default 40) for optsize functions, which
matches the current default. This improves several benchmarks at google
at the cost of a small codesize increase. For code compiled with -Os,
the old behavior continues
llvm-svn: 264689
If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors.
The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent.
It's not in our interest to start making a general-purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least; see the sketch below.
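A scalar sketch of the pattern the combine targets (hypothetical values):
```
#include <cstdint>

// Every lane performs the same bit op (AND) with a constant operand,
// so the four scalar ANDs can become a single vector AND against the
// constant vector {0xFF, 0xFF00, 0xF0F0, 0x0F0F}.
void and_lanes(const uint32_t *x, uint32_t *out) {
  out[0] = x[0] & 0xFF;
  out[1] = x[1] & 0xFF00;
  out[2] = x[2] & 0xF0F0;
  out[3] = x[3] & 0x0F0F;
}
```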
Differential Revision: http://reviews.llvm.org/D18492
llvm-svn: 264666
MachineFunctionProperties represents a set of properties that a MachineFunction
can have at particular points in time. Existing examples of this idea are
MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which
will eventually be switched to use this mechanism.
This change introduces the AllVRegsAllocated property; i.e. the property that
all virtual registers have been allocated and there are no VReg operands
left.
With this mechanism, passes can declare that they require a particular property
to be set, or that they set or clear properties by implementing e.g.
MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class
verifies that the requirements are met, and handles the setting and clearing
based on the declarations. Passes can also directly query and update the current
properties of the MF if they want to have conditional behavior.
This change annotates the target-independent post-regalloc passes; future
changes will also annotate target-specific ones.
Reviewers: qcolombet, hfinkel
Differential Revision: http://reviews.llvm.org/D18421
llvm-svn: 264593
ICMP instruction selection fails on SKX and KNL for i1 operands.
I use XOR to resolve:
(A == B) is equivalent to (A xor B) == 0
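The equivalence is easy to check exhaustively for one-bit values:
```
#include <cassert>

int main() {
  // A == B holds exactly when (A xor B) == 0, for all i1 values.
  for (int a = 0; a <= 1; ++a)
    for (int b = 0; b <= 1; ++b)
      assert((a == b) == ((a ^ b) == 0));
  return 0;
}
```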
Differential Revision: http://reviews.llvm.org/D18511
llvm-svn: 264566
Currently this is to mainly to prevent scalarization of integer division by constants.
Differential Revision: http://reviews.llvm.org/D18307
llvm-svn: 264511
64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes,
respectively, whereas and/or with 8-bit immediate is only three bytes.
Since these instructions imply an additional memory read (which the CPU could
elide, but we don't think it does), restrict these patterns to minsize functions.
Differential Revision: http://reviews.llvm.org/D18374
llvm-svn: 264440
This is the same as r255936, with added logic for avoiding clobbering of the
red zone (PR26023).
Differential Revision: http://reviews.llvm.org/D18246
llvm-svn: 264375
It is incorrect to get the corresponding MBB for a ReturnInst before
SelectAllBasicBlocks since SelectAllBasicBlocks can change the
correspondence between a ReturnInst and the MBB it is in.
PR27062
llvm-svn: 264358
Earlier we were ignoring varargs in LowerCallSiteWithDeoptBundle because
populateCallLoweringInfo does not set CallLoweringInfo::IsVarArg.
llvm-svn: 264354
Summary:
Only adds support for "naked" calls to llvm.experimental.deoptimize.
Support for round-tripping through RewriteStatepointsForGC will come
as a separate patch (should be simpler than this one).
Reviewers: reames
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18429
llvm-svn: 264329
Given that StatepointLowering now uniques derived pointers before
putting them in the per-statepoint spill map, we may end up with missing
entries for derived pointers when we visit a gc.relocate on a pointer
that was de-duplicated away.
Fix this by keeping two maps, one mapping gc pointers to their
de-duplicated values, and one mapping a de-duplicated value to the slot
it is spilled in.
llvm-svn: 264320
KTEST instruction may be used instead of TEST in this case:
%int_sel3 = bitcast <8 x i1> %sel3 to i8
%res = icmp eq i8 %int_sel3, zeroinitializer
br i1 %res, label %L2, label %L1
Differential Revision: http://reviews.llvm.org/D18444
llvm-svn: 264298
This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.
Differential Revision: http://reviews.llvm.org/D18189
llvm-svn: 264260
We need the "return address" of a noreturn call to be within the
bounds of the calling function; TrapUnreachable turns 'unreachable'
into a 'ud2' instruction, which has that desired effect.
Differential Revision: http://reviews.llvm.org/D18414
llvm-svn: 264224
Currently, AnalyzeBranch() fails for non-equality comparisons between floating
point values on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:
jne .BB1
jp .BB1
jmp .BB2
.BB1:
...
.BB2:
...
AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:
jne .BB1
jnp .BB2
.BB1:
...
.BB2:
...
However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.
Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.
In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.
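For context, the two-jump pattern above comes from ordinary floating point code like this (illustrative only; the comment shows the assumed lowering):
```
// On x86, `a != b` must also be true for unordered (NaN) operands, so
// ucomiss is followed by both a jne and a jp to the same target.
bool differs(float a, float b) {
  return a != b;
}
```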
Differential Revision: http://reviews.llvm.org/D11393
llvm-svn: 264199
We were just completely ignoring the types when determining whether we could
safely emit a libcall as a tail call. This is clearly wrong.
Theoretically, we could dig deeper looking for incidental matches (much like
the generic code in Analysis.cpp does), but it's probably not worth it for the
few libcalls that exist.
llvm-svn: 264084
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.
We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.
Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).
Reapplied with a fix for PR26953 (missing vector widening legalization).
Differential Revision: http://reviews.llvm.org/D17932
llvm-svn: 264062
Summary:
After this change, deopt operand bundles can be lowered directly by
SelectionDAG into STATEPOINT instructions (which are then lowered to a
call or a sequence of nops, with an associated __llvm_stackmaps entry).
This obviates the need to round-trip deoptimization state through
gc.statepoint via RewriteStatepointsForGC.
Reviewers: reames, atrick, majnemer, JosephTremoulet, pgavlin
Subscribers: sanjoy, mcrosier, majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D18257
llvm-svn: 264015
Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes of different element sizes to the shuffle mask.
Differential Revision: http://reviews.llvm.org/D14261
llvm-svn: 263906
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in an endless loop of contrasting combines.
This patch stops the combine if we already have a blend in place (means we miss some domain corrections)
llvm-svn: 263717
We can currently only match zeroable vector elements of the same size as the shuffle type - these tests demonstrate the problem and a solution will be shortly added in an updated D14261
llvm-svn: 263606
This patch adds support for the MachO .alt_entry assembly directive, and uses
it for global aliases with non-zero GEP offsets. The alt_entry flag indicates
that a symbol should be laid out immediately after the preceding symbol.
Conceptually it introduces an alternate entry point for a function or data
structure. E.g.:
safe_foo:
// check preconditions for foo
.alt_entry fast_foo
fast_foo:
// body of foo, can assume preconditions.
The .alt_entry flag is also implicitly set on assembly aliases of the form:
a = b + C
where C is a non-zero constant, since these have the same effect as an
alt_entry symbol: they introduce a label that cannot be moved relative to the
preceding one. Setting the alt_entry flag on aliases of this form fixes
http://llvm.org/PR25381.
llvm-svn: 263521
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.
1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.
Differential Revision: http://reviews.llvm.org/D18094
llvm-svn: 263446
For cases where we are truncating an integer vector arithmetic result, it may be better to pre-truncate the input operands - no code to support this yet (scalar is done with SimplifyDemandedBits but adding vector support could be a lot of work) but these tests represent the current codegen status.
Example bugs: PR14666, PR22703
llvm-svn: 263384
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.
llvm-svn: 263383
cmpxchg[8|16]b uses RBX as one of its arguments.
In other words, using this instruction clobbers RBX, as it is defined to hold one
of the inputs. When the backend uses a dynamically allocated stack, RBX is used as a
reserved register for the base pointer.
Reserved registers have special semantics that only the target understands and
enforces. Because of that, the register allocator doesn't use them, but also
doesn't try to make sure they are used properly (remember, it does not know how
they are supposed to be used).
Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. This is the responsibility of the target to do the right thing with
its reserved register.
To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that saves RBX.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
comparison.
- The other is the virtual register holding the save of the value of RBX as the
base pointer. This saving is done as part of isel (i.e., we emit a copy from
rbx).
cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp
This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp
Note: The actual modeling of the pseudo is a bit more complicated to make sure
the interferences that appear after the pseudo gets expanded are properly modeled
before that expansion.
This fixes PR26883.
llvm-svn: 263325
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.
We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.
Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).
Differential Revision: http://reviews.llvm.org/D17932
llvm-svn: 263303
It's not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for the 256/512-bit vector cases.
llvm-svn: 263239
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 263159
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 263158
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. Except that it's not enough to check the register
itself isn't defined, we also need to make sure no overlapping registers
are defined either.
This fixes PR26711.
Differential Revision: http://reviews.llvm.org/D18029
llvm-svn: 263139
This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask.
This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining.
There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for.
Differential Revision: http://reviews.llvm.org/D17858
llvm-svn: 263102
Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on".
Differential Revision: http://reviews.llvm.org/D17994
llvm-svn: 263097
Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.
Differential Revision: http://reviews.llvm.org/D17899
llvm-svn: 263067
The fix consisting in using the library call for atomic compare and swap when
the instruction is not safe to use may be incorrect. Indeed, the library call may
not exist on all platforms. In other words, we need a better fix!
llvm-svn: 262943
I noticed this test as part of:
http://reviews.llvm.org/D11393
...which is confusing enough as-is.
Let's show the exact codegen, so the changes will be more obvious.
llvm-svn: 262874
Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles.
This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed.
Removed non-constant pool mask decode path as we have no way of testing it right now.
Differential Revision: http://reviews.llvm.org/D17916
llvm-svn: 262809
Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool.
Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization
Followup to D17681
llvm-svn: 262784
When the lowering of the setjmp intrinsic requires
a global base pointer to be set, make sure such pointer
gets defined by the CGBR pass.
This fixes PR26742.
llvm-svn: 262762
cmpxchgXXb uses RBX as one of its implicit arguments. I.e., when
we use that instruction we need to clobber RBX. This is generally
fine, except when RBX is a reserved register because in that case,
the register allocator will not track its value and will not
save and restore it when interferences occur.
rdar://problem/24851412
llvm-svn: 262759
The x86 ret instruction has a 16 bit immediate indicating how many bytes
to pop off of the stack beyond the return address.
There is a problem when extremely large structs are passed by value: we
might not be able to fit the number of bytes to pop into the return
instruction.
To fix this, expand RET_FLAG a little later and use a special sequence
to clean the stack:
pop %ecx ; return address is now in %ecx
add $n, %esp ; clean the stack
push %ecx ; bring the return address back on the stack
ret ; pop the return address and jmp to its value
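A sketch of a trigger (hypothetical names, MSVC-style __stdcall on 32-bit x86): any callee-pops convention with a by-value argument larger than 64 KiB cannot encode the pop amount in the 16-bit immediate.
```
// `ret $imm16` caps at 65535 bytes, so popping ~70000 bytes of
// argument space needs the pop/add/push/ret expansion shown above.
struct Huge { char bytes[70000]; };
void __stdcall consume(Huge h);  // callee must pop the argument bytes
```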
llvm-svn: 262755
The divrem combine assumed the type of the div/rem is simple, which isn't
necessarily true. This probably worked fine until r250825, since it only
saw legal types, but now breaks when it runs as a pre-type-legalization
combine.
This fixes PR26835.
Differential Revision: http://reviews.llvm.org/D17878
llvm-svn: 262746
The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic.
This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.
Differential Revision: http://reviews.llvm.org/D17681
llvm-svn: 262635
That's not the case for VPERMV/VPERMV3, which cover all possible
combinations (the C intrinsics use a different order; the AVX vs
AVX512 intrinsics are different still).
Since:
r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.
This breaks assumptions in most callers, as they expect
the non-mask operands to start at index 0.
VPERMV has the mask as operand #0; VPERMV3 has it in the middle.
Instead of the faulty assumption, have getTargetShuffleMask return
its operands as well.
One alternative we considered was to change the operand order of
VPERMV, but we agreed to stick to the instruction order, as there
is more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).
Differential Revision: http://reviews.llvm.org/D17041
llvm-svn: 262627
This test failed in some ARM bots after a divmod change because it was
running on a native llc, instead of targeted one. This makes sure the test
is target-specific (as intended), and also copies to ARM and AArch64
directories. If it is also supposed to work on other architectures, I'll
leave as an exercise to the respective maintainers.
llvm-svn: 262620
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 262599
The code was previously not able to track a boolean argument
at a call site back to the formal argument of the caller.
Differential Revision: http://reviews.llvm.org/D17786
llvm-svn: 262575
If we have a loop with a rarely taken path, we will prune that from the blocks which get added as part of the loop chain. The problem is that we weren't then recognizing the loop chain as schedulable when considering the preheader when forming the function chain. We'd then fall to various non-predecessors before finally scheduling the loop chain (as if the CFG were unnatural). The net result was that there could be lots of garbage between a loop preheader and the loop, even though we could have directly fallen into the loop. It also meant we separated hot code with regions of colder code.
The particular reason for the rejection of the loop chain was that we were scanning the predecessors of the header, seeing the backedge, believing that was a globally more important predecessor (true), but forgetting to account for the fact that the backedge predecessor was already part of the existing loop chain (oops!).
Differential Revision: http://reviews.llvm.org/D17830
llvm-svn: 262547
Catch objects with a displacement of zero do not initialize a catch
object. The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.
If we place an object at the top-of-stack, we will end up with a
displacement of zero resulting in our catch object remaining
uninitialized.
Address this by creating our catch objects as fixed objects. We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.
Differential Revision: http://reviews.llvm.org/D17823
llvm-svn: 262546
This reverts commit r262370.
It turns out there is code out there that does sequences of allocas
greater than 4K: http://crbug.com/591404
The goal of this change was to improve the code size of inalloca call
sequences, but we got tangled up in the mess of dynamic allocas.
Instead, we should come back later with a separate MI pass that uses
dominance to optimize the full sequence. This should also be able to
remove the often unneeded stacksave/stackrestore pairs around the call.
llvm-svn: 262505
We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of.
This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well.
It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts.
Differential Revision: http://reviews.llvm.org/D17680
llvm-svn: 262478
We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS.
While technically correct, this is not desirable for folks who want
to examine aspects of the FLAGS register which are not related to
computation like whether or not CPUID is a valid instruction.
Differential Revision: http://reviews.llvm.org/D17782
llvm-svn: 262465
The _chkstk function is called by the compiler to probe the stack in an
order consistent with Windows' expectations. However, it is possible to
elide the call to _chkstk and manually adjust the stack pointer if we
can prove that the allocation is fixed size and smaller than the probe
size.
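A sketch of the eligible case (hypothetical `use` helper; the probe size is assumed to be one 4 KiB page):
```
extern void use(char *buf);

// The frame is fixed-size and well under a page, so the stack pointer
// can be adjusted directly with no _chkstk call.
void small_frame() {
  char buf[512];
  use(buf);
}
```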
This shrinks chrome.dll, chrome_child.dll and chrome.exe by a
cumulative ~133 KB.
Differential Revision: http://reviews.llvm.org/D17679
llvm-svn: 262370
In the code below on 32-bit targets, x would previously get forwarded to g()
without sign-extension to 32 bits as required by the parameter attribute.
void g(signed short);
void f(unsigned short x) {
g(x);
}
llvm-svn: 262352
The CatchObjOffset is relative to the end of the EH registration node
for 32-bit x86 WinEH targets. A special sentinel value, 0, is used to
indicate that no catch object should be initialized.
This means that a catch object allocated immediately before the
registration node would be assigned a CatchObjOffset of 0, leading the
runtime to believe that a catch object should not be initialized.
To handle this, allocate the registration node prior to any other frame
object. This will ensure that catch objects will not be allocated
before the registration node.
This fixes PR26757.
Differential Revision: http://reviews.llvm.org/D17689
llvm-svn: 262294
In the case where op = add, y = base_ptr, and x = offset, this
transform:
(op y, (op x, c1)) -> (op (op x, y), c1)
breaks the canonical form of add by putting the base pointer in the
second operand and the offset in the first.
This fix is important for the R600 target, because for some address
spaces the base pointer and the offset are stored in separate register
classes. The old pattern caused the ISel code for matching addressing
modes to put the base pointer and offset in the wrong register classes,
which required non-trivial code transformations to fix.
llvm-svn: 262148
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
Shift and negate is what InstCombine appears to prefer, so I've started with that pattern.
Note that the 'pcmpeq' instructions are always generating the negative one for the actual
'pcmpgt' comparison in each case (side note: why isn't there an alias mnemonic for that?).
Differential Revision: http://reviews.llvm.org/D17630
llvm-svn: 262036
MBB slot index intervals are half open, not closed. getMBBEndIndex()
returns the slot index of the start of the next block in layout order.
Placing a register mask there is incorrect if the successor of the
funclet return is not laid out after the return. Clang generates IR for
catch bodies before generating the following normal code, so we never
noticed this issue until the D frontend authors filed a bug about it.
Instead, we can put the clobber mask on the last instruction of the
funclet return block. We still aren't using a register mask operand on
the CATCHRET instruction because it would cause PEI to spill all CSRs,
including XMM regs, in the prologue.
Fixes PR26679.
llvm-svn: 262035
This also simplifies the code by removing the overly conservative
NoInterveningSideEffect() function. This function checked:
- That the two copies belong to the same block: We only process one
block at a time and clear our maps in between, so it is impossible to find a
copy from a different block.
- There is no terminator between the two copy instructions: This is not
allowed anyway (the MachineVerifier would complain)
- Does not have instructions with hasUnmodeledSideEffects() or isCall()
set: Even for those instructions we must have all clobbers/defs of
registers explicit as operands. If the register is explicitly
clobbered we would never come to the point of checking for
NoInterveningSideEffect() anyway.
(I also checked this with a temporary build of the test-suite with all
potentially failing conditions in NoInterveningSideEffect() turned into
asserts)
Differential Revision: http://reviews.llvm.org/D17474
llvm-svn: 261965
Summary:
Both the hardware and LLVM have changed since 2012.
Now, the load-based heuristic doesn't show big differences any more on OoO cores.
There are no notable regressions or improvements on spec2000/2006 (Cortex-A57, Core i5).
Reviewers: spatel, zansari
Differential Revision: http://reviews.llvm.org/D16836
llvm-svn: 261809
This fixes bugs in copy elimination code in llvm. It slightly changes the
semantics of clearRegisterKills(). This is appropriate because:
- Users in lib/CodeGen/MachineCopyPropagation.cpp and
lib/Target/AArch64RedundantCopyElimination.cpp and
lib/Target/SystemZ/SystemZElimCompare.cpp are incorrect without it
(see included testcase).
- All other users in llvm are unaffected (they pass TRI==nullptr)
- (Kill flags are optional anyway so removing too many shouldn't hurt.)
Differential Revision: http://reviews.llvm.org/D17554
llvm-svn: 261763
Part 2 of 2
This patch add support for combining target shuffles into blends-with-zero.
Differential Revision: http://reviews.llvm.org/D17483
llvm-svn: 261745
Part 1 of 2
This patch attempts to replace the insertion of zero scalars with a vector blend with zero, avoiding the use of the integer insertion instructions (which are particularly slow on many targets).
(Part 2 will add support for combining multiple blends-with-zero).
Differential Revision: http://reviews.llvm.org/D17483
llvm-svn: 261743
Summary: If a function is hot, put it in text.hot section.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17532
llvm-svn: 261607
Summary: If a function is hot, put it in text.hot section.
Reviewers: davidxl
Subscribers: eraman, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17460
llvm-svn: 261582
Add support for the case where we have a consecutive load (which must include the first + last elements) with a mixture of undef/zero elements. We load the vector and then apply a shuffle to clear the zero'd elements.
Differential Revision: http://reviews.llvm.org/D17297
llvm-svn: 261490
Summary:
- Rename `"skylake"` == SkylakeServerProc to `"skylake-avx512"`
- Change `"skylake"` to denote SkylakeClientProc
- Fix the detection of cpu family 6 and model 94 to be
SkylakeClientProc instead of SkylakeServerProc
- Remove the `"cnl"` for CannonLake
Reviewers: craig.topper, delena
Subscribers: zansari, echristo, qcolombet, RKSimon, spatel, DavidKreitzer, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17090
llvm-svn: 261482
COFF doesn't have sections with mergeable contents. Instead, each
constant pool entry ends up in a COMDAT section. The linker, when
choosing between COMDAT sections, doesn't choose the max alignment of
the two sections. You just get whatever alignment was on the section.
If one constant needs a higher alignment in one object file than in
another, then we will get into trouble if the linker chooses the
lower-alignment one.
Instead, let's promote the alignment of the constant pool entry to make
sure we don't use an under-aligned constant with an instruction which
assumed otherwise.
This fixes PR26680.
llvm-svn: 261462
As discussed on PR24580, this patch adds some (more to come) initial fast-isel codegen tests to match the IR generated in clang/test/CodeGen/sse41-builtins.c
llvm-svn: 261438
Fixed a bug introduced by D16683: when a binary shuffle is simplified to a unary shuffle (with undef/zero sentinel mask indices), if this resulted in only the second input being used, combineX86ShuffleChain failed to take this into account and still referenced the first input.
llvm-svn: 261434
TLSADDR nodes are lowered into actual calls inside MC. In order to prevent
shrink-wrapping from pushing the prologue/epilogue past them (which would result
in TLS variables being accessed before the stack frame is set up), we
put markers, so that the stack gets adjusted properly.
Thanks to Quentin Colombet for guidance/help on how to fix this problem!
llvm-svn: 261387
Summary:
When optimizing for size, sqrt calls can be incorrectly selected as
AVX512 VSQRT instructions. This is because X86InstrAVX512.td has a
`Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass
definition. Even if the target does not support AVX512, the class can
apparently still be chosen, leading to an incorrect selection of
`vsqrtss`.
In PR26625, this led to an assertion: Reg >= X86::FP0 && Reg <=
X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction
requires an XMM register, which is not available on i686 CPUs.
Reviewers: grosbach, resistor, joker.eph
Subscribers: spatel, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D17414
llvm-svn: 261360
allocateStackSlot did not consider the size of the value to be spilled
before deciding to re-use a spill slot. This was originally okay (since
originally we'd only ever spill pointers), but it became not okay when
we changed our scheme to directly spill vectors of pointers.
While this change fixes the bug pointed out, it has two performance
caveats:
- It matches spill slot and spillee size exactly, while in theory we
can spill, e.g., an 8 byte pointer into a 16 byte slot. This is
slightly complicated to fix since in the stackmaps section, we report
the size of the spill slot as the size of the "indirect value"; and
if they're no longer equivalent, we'll have to keep track of the
(indirect) value size separately from the stack slot size.
- It will "spuriously run out" of reusable slots, since we now have an
second check in the search loop in addition to the availablity
check (e.g. you had two free scalar slots, and you first ask for a
vector slot followed by a scalar slot). I'll fix this in a later
commit.
llvm-svn: 261336
As discussed on PR24580, this patch adds some (more to come) initial fast-isel codegen tests to match the IR generated in clang/test/CodeGen/avx-builtins.c
llvm-svn: 261329
Summary:
Without this, this command
$ llvm-run llc -stop-after machine-cp -o - <( echo '' )
outputs an error, because we close stdout twice -- once when closing the
file opened for "-o", and again when closing outs().
Also clarify in the outs() definition that you can't ever call it if you
want to open your own raw_fd_ostream on stdout.
Reviewers: jroelofs, tstellarAMD
Subscribers: jholewinski, qcolombet, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D17422
llvm-svn: 261286
If we know that all of our successors want to be in the exact same
state, it makes sense to hoist the state transition into their common
predecessor.
Differential Revision: http://reviews.llvm.org/D17391
llvm-svn: 261262
In r260133, LLVM was changed to no longer extend i8/i16 return values,
as it's not required by the ABI. However, code was found in the wild
that relies on the old behaviour on Darwin, so this commit reverts
back to that old behaviour for Darwin.
On other platforms, it's less likely that code would be depending on
the old behaviour, as GCC and MSVC haven't been extending such return
values.
llvm-svn: 261235
In cases where the PSHUFB shuffle mask is shared it might not be bitcasted to a vXi8 byte vector. This patch adds support for decoding these wider shuffle masks from the ConstantPool.
The test case in question makes use of this to recognise that the shuffle mask is a unary UNPCKL pattern and simplifies accordingly.
llvm-svn: 261201
32-bit x86 Windows targets use a linked-list of nodes allocated on the
stack, referenced to via thread-local storage. The personality routine
interprets one of the fields in the node as a 'state number' which
indicates where the personality routine should transfer control.
State transitions are possible only before call-sites which may throw
exceptions. Our previous scheme had us update the state number before
all call-sites which may throw.
Instead, we can try to minimize the number of times we need to store by
reasoning about the nearest store which dominates the current call-site.
If the last store agrees with the current call-site, then we know that
the state-update is redundant and can be elided.
This is largely straightforward: an RPO walk of the blocks allows us to
correctly forward propagate the information when the function is a DAG.
Currently, loops are not handled optimally and may trigger superfluous
state stores.
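A sketch of the elided store (hypothetical callees that may throw and share an EH state):
```
extern void may_throw_a();
extern void may_throw_b();

void f() {
  try {
    may_throw_a();  // store the state number, then call
    may_throw_b();  // dominating store already set this state: elided
  } catch (...) {
  }
}
```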
Differential Revision: http://reviews.llvm.org/D16763
llvm-svn: 261122
We are getting better at combining constant pshufb masks - use a real input instead of undef.
Add test for decoding multi-use bitcasted masks as well (actual support will come soon).
llvm-svn: 261101
Bug description:
The bug was discovered when a test was compiled with -O0.
When the scatter result is the DAG root, VectorLegalizer failed (asserted) because LowerMSCATTER() returned the kmask as its result.
Change LowerMSCATTER() to return the chain, as the original node does.
Differential Revision: http://reviews.llvm.org/D17331
llvm-svn: 261090
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour.
Part 2 of 2
Differential Revision: http://reviews.llvm.org/D17292
llvm-svn: 261082
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors.
Part 1 of 2
Differential Revision: http://reviews.llvm.org/D17292
llvm-svn: 261081
__chkstk clobbers EAX. If EAX is live across the prologue, then we have
to take extra steps to save it. We already had code to do this if EAX
was a register parameter. This change adapts it to work when shrink
wrapping is used.
llvm-svn: 261039
Currently, we sometimes miscompile this vector pattern:
(c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
(~c & v) | (c & -v)
When we have SSSE3, we incorrectly lower that to PSIGN, which does:
(c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
(c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.
Note that the PSIGN tests are also incorrect. Consider:
%b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %a, %0
%2 = and <4 x i32> %b.lobit, %sub
%cond = or <4 x i32> %1, %2
ret <4 x i32> %cond
if %b is zero:
%b.lobit = <4 x i32> zeroinitializer
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = <4 x i32> %a
%2 = <4 x i32> zeroinitializer
%cond = or <4 x i32> %a, zeroinitializer
ret <4 x i32> %a
whereas we currently generate:
psignd %xmm1, %xmm0
retq
which returns 0, as %xmm1 is 0.
Instead, use a pure logic sequence, as described in:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate
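One common scalar form of that trick, for reference (the vector version applies it lane-wise, with the <4 x i1> mask materialized as 0 / -1):
```
// c is all-zeros or all-ones: (v ^ c) - c yields -v when c == -1
// (~v + 1) and leaves v unchanged when c == 0.
int conditional_negate(int v, int c) {
  return (v ^ c) - c;
}
```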
Fixes PR26110.
Differential Revision: http://reviews.llvm.org/D17181
llvm-svn: 261023
We're going to stop generating PSIGN, so calling a test "psign"
isn't ideal. Instead, call these tests what they really are:
variable blends using logic.
Also add a test to exhibit a case we're currently missing in
the PSIGN combine.
llvm-svn: 261022
If KMOVB is not supported (it requires AVX512DQ), only KMOVW can be used, so the store size should be 2 bytes.
Differential Revision: http://reviews.llvm.org/D17138
llvm-svn: 260878
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.
On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.
This patch has several benefits:
* Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
* Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
* Matching the repeating shuffle makes use of a lot of existing shuffle lowering.
There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review.
Differential Revision: http://reviews.llvm.org/D16537
llvm-svn: 260834
As shown in:
https://llvm.org/bugs/show_bug.cgi?id=23203
...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64,
but the instruction def doesn't know that.
I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :)
Differential Revision: http://reviews.llvm.org/D17219
llvm-svn: 260828
Summary:
This patch skips DAG combine of fp_round (fp_round x) if it results in
an fp_round from f80 to f16.
fp_round from f80 to f16 always generates an expensive (and as yet,
unimplemented) libcall to __truncxfhf2. This prevents selection of
native f16 conversion instructions from f32 or f64. Moreover, the first
(value-preserving) fp_round from f80 to either f32 or f64 may become a
NOP in platforms like x86.
Reviewers: ab
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D17221
llvm-svn: 260769
The code change is simple enough: instead of attaching an anonymous SDLoc to splatted
vector constants, use the scalar constant's existing SDLoc since that is what is passed
into getConstant() as a param. But this changes instruction scheduling, so I'll explain
why that happens.
The motivation for this patch starts near:
http://reviews.llvm.org/rL258833
...x86's getZeroVector() could be similarly cleaned up and I thought it would be 'NFC'.
But when I made that change locally, several x86 codegen tests wiggled.
It turns out that the lack of SDLoc consistency in getConstant() changes the way
ScheduleDAGRRList behaves. This is because the SDLoc contains 'IROrder' and some DAG
scheduler algorithms use IROrder for tie-breaking.
Differential Revision: http://reviews.llvm.org/D16972
llvm-svn: 260582
On AVX2 target we are poorly legalizing SIGN_EXTEND ops for which the input's legalized type doesn't have the same number of elements as the destination, resulting in an ANY_EXTEND followed by a SIGN_EXTEND_INREG.
This patch uses the existing SIGN_EXTEND -> SIGN_EXTEND_VECTOR_INREG combine to extend the input to the size of the result and using SIGN_EXTEND_VECTOR_INREG instead.
Differential Revision: http://reviews.llvm.org/D16994
llvm-svn: 260210
As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents.
This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines.
Differential Revision: http://reviews.llvm.org/D16956
llvm-svn: 260168
Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set'
transform in InstCombine, but this should be a win for any AVX machine.
Code comments note that this transform could be extended for other targets / cases.
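An elementwise model of the 'one mask element is set' case (illustrative only):
```
#include <cstdint>

// With mask = <0,0,1,0,0,0,0,0>, the masked store touches a single
// lane, so it collapses to one scalar store.
void masked_store_one_lane(const int32_t v[8], int32_t *p) {
  p[2] = v[2];
}
```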
Differential Revision: http://reviews.llvm.org/D16828
llvm-svn: 260145
This matches GCC and MSVC's behaviour, and saves on code size.
We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to x86 target as well, and also for i8
and i16.
The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
values do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].
Differential Revision: http://reviews.llvm.org/D16907
[1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
[2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ
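Sketch of what the new rule means for callers (hypothetical names; the comment shows the assumed caller-side codegen):
```
extern signed char pick(int x);

// The caller can no longer assume the upper bits of %al are defined,
// so it must sign-extend the returned i8 itself before widening use.
int use_pick(int x) {
  return pick(x) + 1;
}
```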
llvm-svn: 260133
The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input.
This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle.
Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if it's not a target shuffle.
Differential Revision: http://reviews.llvm.org/D16683
llvm-svn: 260063