llvm-project

Commit Graph

Author	SHA1	Message	Date
Rafael Espindola	f2898d73a5	Convert test to FileCheck. llvm-svn: 273609	2016-06-23 20:37:49 +00:00
Michael Kuperstein	0194d30e09	[X86] Extract HiPE prologue constants into metadata X86FrameLowering::adjustForHiPEPrologue() contains a hard-coded offset into an Erlang Runtime System-internal data structure (the PCB). As the layout of this data structure is prone to change, this poses problems for maintaining compatibility. To address this problem, the compiler can produce this information as module-level named metadata. For example (where P_NSP_LIMIT is the offending offset): !hipe.literals = !{ !2, !3, !4 } !2 = !{ !"P_NSP_LIMIT", i32 152 } !3 = !{ !"X86_LEAF_WORDS", i32 24 } !4 = !{ !"AMD64_LEAF_WORDS", i32 24 } Patch by Magnus Lang Differential Revision: http://reviews.llvm.org/D20363 llvm-svn: 273593	2016-06-23 18:17:25 +00:00
Pablo Barrio	7a64346533	[ARM] Lower (select_cc k k (select_cc ~k ~k x)) into (SSAT l_k x) Summary: SSAT saturates an integer, making sure that its value lies within an interval [-k, k]. Since the constant is given to SSAT as the number of bytes set to one, k + 1 must be a power of 2, otherwise the optimization is not possible. Also, the select_cc must use < and > respectively so that they define an interval. Reviewers: mcrosier, jmolloy, rengolin Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D21372 llvm-svn: 273581	2016-06-23 16:53:49 +00:00
Artur Pilipenko	80771b9ad9	Upgrade other old memset/memcpy signatures in tests causing buildbot failures with rL273568. llvm-svn: 273580	2016-06-23 16:34:52 +00:00
Artur Pilipenko	4fec7b7131	Fix an old memset signature in 2009-09-01-PostRAProlog.ll test causing a buildbot failure llvm-svn: 273573	2016-06-23 16:07:10 +00:00
Simon Pilgrim	595dddb103	[X86][AVX512] Added AVX512F vector sign extend tests Now that Elena has confirmed that PR26474 has been fixed llvm-svn: 273560	2016-06-23 14:01:45 +00:00
Daniel Sanders	de393329b9	[mips] Don't derive the default ABI from the CPU in the backend. Summary: The backend has no reason to behave like a driver and should generally do as it's told (and error out if it can't) instead of trying to figure out what the API user meant. The default ABI is still derived from the arch component as a concession to backwards compatibility. API-users that previously passed an explicit CPU and a triple that was inconsistent with the CPU (e.g. mips-linux-gnu and mips64r2) may get a different ABI to what they got before. However, it's expected that there are no such users on the basis that CodeGen has been asserting that the triple is consistent with the selected ABI for several releases. API-users that were consistent or passed '' or 'generic' as the CPU will see no difference. Reviewers: sdardis, rafael Subscribers: rafael, dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21466 llvm-svn: 273557	2016-06-23 12:42:53 +00:00
Diana Picus	e440f99913	[AMDGPU] Remove exit-on-error in test (PR27761) The exit-on-error flag was necessary in order to avoid an assertion when handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize. We can avoid the assertion by creating some dummy nodes. This enables us to remove the exit-on-error flag on the first 2 run lines (SI), but on the third run line (R600) we would run into another assertion when trying to reserve indirect registers. This patch also replaces that assertion with an early exit from the function. Fixes PR27761. Differential Revision: http://reviews.llvm.org/D20852 llvm-svn: 273550	2016-06-23 09:19:16 +00:00
Craig Topper	597aa42fec	[AVX512] Remove masked unpack intrinsics and autoupgrade to vectorshuffle and selects. llvm-svn: 273543	2016-06-23 07:37:33 +00:00
Matt Arsenault	3cb4ddeb4e	AMDGPU: Fix liveness when expanding m0 loop llvm-svn: 273514	2016-06-22 23:40:57 +00:00
Sanjoy Das	e57bf680ec	[ImplicitNullChecks] Hoist trivial dependencies if possible When trying to convert a loading instruction into a FAULTING_LOAD, we sometimes face code like this: if %R10 is not null: %R9<def> = MOV32ri Immediate %R9<def, tied> = AND32rm %R9, 0x20(%R10) else: goto TRAP In these cases we would like to use the AND32rm instruction as the faulting operation by hoisting the "dependency" def-ing %R9 also above the control flow, transforming the program into: %R9<def> = MOV32ri Immediate %R9<def, tied> = FAULTING_LOAD_OP(AND32rm %R9, 0x20(%R10), FailPath: TRAP) This change teaches ImplicitNullChecks to do the above, when safe. llvm-svn: 273501	2016-06-22 22:16:51 +00:00
Rafael Espindola	928a95d0b0	Use shouldAssumeDSOLocal. With this it handle -fPIE. llvm-svn: 273499	2016-06-22 22:09:17 +00:00
Changpeng Fang	47efe1f6db	AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32 Reviewers: tstellarAMD, arsenm Differential Revision: http://reviews.llvm.org/D21533 llvm-svn: 273496	2016-06-22 21:33:49 +00:00
Peter Collingbourne	6d88fde3af	IR: Introduce Module::global_objects(). This is a convenience iterator that allows clients to enumerate the GlobalObjects within a Module. Also start using it in a few places where it is obviously the right thing to use. Differential Revision: http://reviews.llvm.org/D21580 llvm-svn: 273470	2016-06-22 20:29:42 +00:00
Matt Arsenault	9babdf4265	AMDGPU: Fix verifier errors in SILowerControlFlow The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predecessors. Split the blocks up to avoid this and introduce new pseudo instructions for branches taken with exec masking. Also use a pseudo instead of emitting s_endpgm and erasing it in the special case of a non-void return. llvm-svn: 273467	2016-06-22 20:15:28 +00:00
Krzysztof Parzyszek	f7f7068109	[Hexagon] Add SDAG preprocessing step to expose shifted addressing modes Transform: (store ch addr (add x (add (shl y c) e))) to: (store ch addr (add x (shl (add y d) c))), where e = (shl d c) for some integer d. The purpose of this is to enable generation of loads/stores with shifted addressing mode, i.e. mem(x+y<<#c). For that, the shift value c must be 0, 1 or 2. llvm-svn: 273466	2016-06-22 20:08:27 +00:00
Chad Rosier	8c106bcbe8	[AArch64] Remove an overly aggressive assert. llvm-svn: 273458	2016-06-22 19:18:52 +00:00
Rafael Espindola	8474fdf90d	Start using shouldAssumeDSOLocal on Hexagon. Include a token test showing that access to private is now the same as to internal. llvm-svn: 273457	2016-06-22 19:09:14 +00:00
Wei Ding	0526e7f8d9	AMDGPU: Add convergent flag to INLINEASM instruction. Differential Revision: http://reviews.llvm.org/D21214 llvm-svn: 273455	2016-06-22 18:51:08 +00:00
Zhan Jun Liau	0df350589f	[SystemZ] Recognize RISBG opportunities involving a truncate Summary: Recognize RISBG opportunities where the end result is narrower than the original input - where a truncate separates the shift/and operations. The motivating case is some code in postgres which looks like: srlg %r2, %r0, 11 nilh %r2, 255 Reviewers: uweigand Author: RolandF Differential Revision: http://reviews.llvm.org/D21452 llvm-svn: 273433	2016-06-22 16:16:27 +00:00
Krzysztof Parzyszek	f228c95f87	[Hexagon] Handle expansion of cmpxchg llvm-svn: 273432	2016-06-22 16:07:10 +00:00
Artur Pilipenko	1cec4fdddf	Upgrade old memset/memcpy signatures (without isVolatile argument) in tests We no longer have corresponding code in autoupgrade and the vast majority of the tests were fixed long time ago. Fix the remaining few. One of the verifier test cases is marked as XFAIL because it was passing only because the signature was incorrect. llvm-svn: 273428	2016-06-22 15:16:06 +00:00
Simon Pilgrim	1536c19642	Regenerated test llvm-svn: 273404	2016-06-22 12:58:15 +00:00
Jan Vesely	fea814d531	AMDGPU: Add implicitarg.ptr intrinsic. Points to the start of implicit arguments (appended after explicit arguments) Differential Revision: http://reviews.llvm.org/D20297 llvm-svn: 273317	2016-06-21 20:46:20 +00:00
Artem Belevich	d7ebcfb291	[NVPTX] Improve lowering of byval args of device functions. Avoid unnecessary spills of such vars to local space on SASS level and pointer space conversion. Instead, make a local copy with appropriate addrspacecasts and let LLVM optimize them away when possible. This allows loading value of the argument using [symbol+offset] instead of converting argument to general space pointer and using it for indexing (which also implicitly converts param space pointer to local space one on SASS level and triggers copying of argument into local space in the process). This reduces call overhead, uses less registers and reduces overall SASS size by 2-4%. Differential Review: http://reviews.llvm.org/D21421 llvm-svn: 273313	2016-06-21 20:30:26 +00:00
Silviu Baranga	03b6a4fc88	[AArch64] Fix merge-store.ll regression test after r273271 r273271 changed the RUN line of the regression test to use -march=cyclone instead of -mtriple=aarch64-none-none. This caused a change in the output syntax for the ext instruction, causing the test to fail. Change this test back to using -mtriple=aarch64-none-none. llvm-svn: 273286	2016-06-21 17:15:49 +00:00
Etienne Bergeron	f6be62f2c8	[StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4 Summary: Fix the computation of the offsets present in the scopetable when using the SEH (__except_handler4). This patch added an intrinsic to track the position of the allocation on the stack of the EHGuard. This position is needed when producing the ScopeTable. ``` struct _EH4_SCOPETABLE { DWORD GSCookieOffset; DWORD GSCookieXOROffset; DWORD EHCookieOffset; DWORD EHCookieXOROffset; _EH4_SCOPETABLE_RECORD ScopeRecord[1]; }; struct _EH4_SCOPETABLE_RECORD { DWORD EnclosingLevel; long (*FilterFunc)(); union { void (*HandlerAddress)(); void (*FinallyFunc)(); }; }; ``` The code to generate the EHCookie is added in `X86WinEHState.cpp`. Which is adding these instructions when using SEH4. ``` Lfunc_begin0: # BB#0: # %entry pushl %ebp movl %esp, %ebp pushl %ebx pushl %edi pushl %esi subl $28, %esp movl %ebp, %eax <<-- Loading FramePtr movl %esp, -36(%ebp) movl $-2, -16(%ebp) movl $L__ehtable$use_except_handler4_ssp, %ecx xorl ___security_cookie, %ecx movl %ecx, -20(%ebp) xorl ___security_cookie, %eax <<-- XOR FramePtr and Cookie movl %eax, -40(%ebp) <<-- Storing EHGuard leal -28(%ebp), %eax movl $__except_handler4, -24(%ebp) movl %fs:0, %ecx movl %ecx, -28(%ebp) movl %eax, %fs:0 movl $0, -16(%ebp) calll _may_throw_or_crash LBB1_1: # %cont movl -28(%ebp), %eax movl %eax, %fs:0 addl $28, %esp popl %esi popl %edi popl %ebx popl %ebp retl ``` And the corresponding offset is computed: ``` Luse_except_handler4_ssp$parent_frame_offset = -36 .p2align 2 L__ehtable$use_except_handler4_ssp: .long -2 # GSCookieOffset .long 0 # GSCookieXOROffset .long -40 # EHCookieOffset <<---- .long 0 # EHCookieXOROffset .long -2 # ToState .long _catchall_filt # FilterFunction .long LBB1_2 # ExceptionHandler ``` Clang is not yet producing function using SEH4, but it's a work in progress. This patch is a step toward having a valid implementation of SEH4. Unfortunately, it is not yet fully working. The EH registration block is not allocated at the right offset on the stack. Reviewers: rnk, majnemer Subscribers: llvm-commits, chrisha Differential Revision: http://reviews.llvm.org/D21231 llvm-svn: 273281	2016-06-21 15:58:55 +00:00
Evandro Menezes	230083ff9d	[AArch64] Change the preferred alignment for char and short to word alignment Differential Revision: http://reviews.llvm.org/D21414 llvm-svn: 273279	2016-06-21 15:55:18 +00:00
Silviu Baranga	dc43d61a25	[AArch64] Switch regression tests to test features not CPUs Summary: We have switched to using features for all heuristics, but the tests for these are still using -mcpu, which means we are not directly testing the features. This converts at least some of the existing regression tests to use the new features. This still leaves the following features untested: merge-narrow-ld predictable-select-expensive alternate-sextload-cvt-f32-pattern disable-latency-sched-heuristic Reviewers: mcrosier, t.p.northover, rengolin Subscribers: MatzeB, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D21288 llvm-svn: 273271	2016-06-21 15:16:34 +00:00
Daniel Sanders	bf2c03ee69	[arm+x86] Make GNU variants behave like GNU w.r.t combining sin+cos into sincos. Summary: canCombineSinCosLibcall() would previously combine sin+cos into sincos for GNUX32/GNUEABI/GNUEABIHF regardless of whether UnsafeFPMath were set or not. However, GNU would only combine them for UnsafeFPMath because sincos does not set errno like sin and cos do. It seems likely that this is an oversight. Reviewers: t.p.northover Subscribers: t.p.northover, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D21431 llvm-svn: 273259	2016-06-21 12:29:03 +00:00
Craig Topper	283418fbb6	[AVX512] Add patterns for any-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX. llvm-svn: 273253	2016-06-21 07:37:32 +00:00
Craig Topper	9038aa3001	[AVX512] Use update_llc_test_checks.py to regenerate a test in preparation for a future commit. llvm-svn: 273252	2016-06-21 07:37:27 +00:00
James Y Knight	03c1415b8f	Revert "Change RelaxELFRelocations for llc." This reverts commit r273019. From email I sent to list: > I don't think this makes sense. Either the linker you're using supports > this feature, or it doesn't. Having it enabled for llc if your linker > doesn't support it is not fun. > > Further note that this also affects basically all other code using llvm > libraries -- other than Clang, which explicitly sets it back to false by > default, unless you set the ENABLE_X86_RELAX_RELOCATIONS cmake flag to > true. > > If you want to enable the relax mode across all llvm tools in some > circumstances, I think it should be via moving the cmake flag from clang > down into llvm. > > I'm going to revert this commit, since I both think it intrinsically > doesn't make sense to do this, and because it's breaking some of our > tools. llvm-svn: 273245	2016-06-21 05:40:41 +00:00
Craig Topper	0a0fb0fda1	[AVX512] Remove the masked vpcmpeq/vcmpgt intrinsics and autoupgrade them to native icmps. llvm-svn: 273240	2016-06-21 03:53:24 +00:00
Simon Pilgrim	225b2e37a0	[X86][X87] Fix issue with sitofp i64 -> fp128 on 32-bit targets Fix for PR27726 - sitofp i64 to fp128 was loading the merged load i64 to a x87 register preventing legalization for conversion to fp128. Added 32-bit tests for fp128 cast/conversions. llvm-svn: 273210	2016-06-20 22:41:17 +00:00
Matt Arsenault	2209625387	AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32 The implicit operand is added by the initial instruction construction, so this was adding an additional vcc use. The original one was missing the undef flag the original condition had, so the verifier would complain. llvm-svn: 273182	2016-06-20 18:34:00 +00:00
Matt Arsenault	b6d8c37e1a	AMDGPU: Fold more custom nodes to undef This will help sneak undefs past GVN into the DAG for some tests. Also add missing intrinsic for rsq_legacy, even though the node was already selected to the instruction. Also start passing the debug location to intrinsic errors. llvm-svn: 273181	2016-06-20 18:33:56 +00:00
Matt Arsenault	ff98241f37	Generalize DiagnosticInfoStackSize to support other limits Backends may want to report errors on resources other than stack size. llvm-svn: 273177	2016-06-20 18:13:04 +00:00
Matt Arsenault	a9720c67f1	AMDGPU: Use correct method for determining instruction size llvm-svn: 273172	2016-06-20 17:51:32 +00:00
Rafael Espindola	959e9c8d01	Use shouldAssumeDSOLocal. With this ARM fast isel knows that PIE variable are not preemptable. llvm-svn: 273169	2016-06-20 17:45:33 +00:00
Tom Stellard	5350894265	AMDGPU: Add support for R_AMDGPU_REL32 relocations Reviewers: arsenm, kzhuravl, rafael Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21401 llvm-svn: 273168	2016-06-20 17:33:43 +00:00
Tom Stellard	1c89eb7db0	AMDGPU: Emit R_AMDGPU_ABS32_{HI,LO} for scratch buffer relocations Reviewers: arsenm, rafael, kzhuravl Subscribers: rafael, arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21400 llvm-svn: 273166	2016-06-20 16:59:44 +00:00
Sam Parker	d616cf07b2	[ARM] Enable isel of UMAAL TargetLowering and DAGToDAG are used to combine ADDC, ADDE and UMLAL dags into UMAAL. Selection is split into the two phases because it is easier to match the two patterns at those different times. Differential Revision: http://reviews.llvm.org/D21461 llvm-svn: 273165	2016-06-20 16:47:09 +00:00
Simon Pilgrim	0a81b13f31	[X86][F16C] Added half <-> double conversion tests llvm-svn: 273153	2016-06-20 12:51:55 +00:00
Pankaj Gode	0aab2e398a	[AARCH64] Add support for Broadcom Vulcan Adding core tuning support for new Broadcom Vulcan core (ARMv8.1A). Differential Revision: http://reviews.llvm.org/D21500 llvm-svn: 273148	2016-06-20 11:13:31 +00:00
Igor Breger	e59165ca63	[AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering. Differential Revision: http://reviews.llvm.org/D20897 llvm-svn: 273138	2016-06-20 07:05:43 +00:00
Simon Pilgrim	0887d5b02e	[X86][AVX512] Added 512-bit BITREVERSE tests and enabled AVX512BW lowering support llvm-svn: 273125	2016-06-19 20:59:19 +00:00
Simon Pilgrim	3d881a0230	[X86][SSE] Allow target shuffle combining to match masks with SM_Sentinel values We currently only allow exact matches of shuffle mask patterns during target shuffle combining. This patch relaxes this to permit SM_SentinelUndef in the combined shuffle to always be accepted as well as allowing exact matching of the SM_SentinelZero value. I've adjusted some tests that were requiring exact shuffle masks to now include undef values. Differential Revision: http://reviews.llvm.org/D21495 llvm-svn: 273119	2016-06-19 18:03:52 +00:00
Chris Dewhurst	a294541c05	[SPARC] Correcting out-of-date unit tests checked in as part of r273108 llvm-svn: 273110	2016-06-19 12:52:39 +00:00
Chris Dewhurst	0c1e0026aa	[SPARC] Fixes for hardware errata on LEON processor. Passes to fix three hardware errata that appear on some LEON processor variants. The instructions FSMULD, FMULS and FDIVS do not work as expected on some LEON processors. This change allows those instructions to be replaced with alternative instruction sequences that are known to work. These passes only run when selected individually, or as part of a processor definition. They are not included in general SPARC processor compilations for non-LEON processors or for those LEON processors that do not have these hardware errata. llvm-svn: 273108	2016-06-19 11:03:28 +00:00
Simon Pilgrim	9a09652a3a	[X86][AVX] Added test case for PR28136 llvm-svn: 273098	2016-06-18 22:59:08 +00:00
Simon Pilgrim	cd6d4352bc	[X86][SSSE3] Added examples of target shuffle combining failing to match undefs in shuffle masks llvm-svn: 273097	2016-06-18 21:18:21 +00:00
Simon Pilgrim	ab009e9f41	[X86][XOP] Added fast-isel tests matching tools/clang/test/CodeGen/xop-builtins.c llvm-svn: 273096	2016-06-18 21:07:31 +00:00
Simon Pilgrim	b201678763	[X86][TBM] Added fast-isel tests matching tools/clang/test/CodeGen/tbm-builtins.c llvm-svn: 273087	2016-06-18 17:20:52 +00:00
Vasileios Kalintiris	0cf68df6cc	[mips] Emit a JALR with $rd equal to $zero, instead of a JR in MIPS32R6. Summary: JR is an alias of JALR with $rd=0 in the R6 ISA. Also, this fixes recursive builds in MIPS32R6. Reviewers: dsanders, sdardis Subscribers: jfb, dschuff, dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21370 llvm-svn: 273085	2016-06-18 15:39:43 +00:00
Matt Arsenault	e935f05a94	AMDGPU: Fix kernel argument alignment impacting stack size Don't use AllocateStack because kernel arguments have nothing to do with the stack. The ensureMaxAlignment call was still changing the stack alignment. llvm-svn: 273080	2016-06-18 05:15:53 +00:00
Simon Pilgrim	f4b2af1b9f	[X86][SSE4A] Autoupgrade and remove MOVNTSD/MOVNTSS intrinsics Required better annotation of the instruction defs upon removal of the builtin intrinsic pattern. llvm-svn: 273077	2016-06-18 02:38:26 +00:00
Matt Arsenault	0bb294b224	AMDGPU: Temporarily select trap to s_endpgm This should select to s_trap, but that requires additional work to set up and enable the trap handler. For now emit s_endpgm so bugpoint stops getting stuck on the unsupported call to abort. Emit a warning that this will only terminate the wave and not really trap. llvm-svn: 273062	2016-06-17 22:27:03 +00:00
Matt Arsenault	8885910f8e	AMDGPU: Remove llvm.SI.tid intrinsic Mesa doesn't emit this for llvm >= 3.8 anymore. llvm-svn: 273050	2016-06-17 21:18:41 +00:00
Marcin Koscielnicki	fd4b6b9e51	[SelectionDAG] Don't treat library calls specially if marked with nobuiltin. To be used by D19781. Differential Revision: http://reviews.llvm.org/D19801 llvm-svn: 273039	2016-06-17 20:24:07 +00:00
Michael Kuperstein	18d6d3d95e	[X86] Add missing AVX512 anyext patterns. Add AVX512 anyext patterns for i16 and i64, modeled on the existing i8 and i32 patterns. llvm-svn: 273038	2016-06-17 20:21:17 +00:00
Tim Northover	28a9e7f4ba	ARM: take account of possible bundle when erasing an instruction. Fortunately this appears to be the only ARM-specific pass that runs while bundles might be in play, so no other cases need modifying. llvm-svn: 273029	2016-06-17 18:40:46 +00:00
James Y Knight	148a6469dc	Support expanding partial-word cmpxchg to full-word cmpxchg in AtomicExpandPass. Many CPUs only have the ability to do a 4-byte cmpxchg (or ll/sc), not 1 or 2-byte. For those, you need to mask and shift the 1 or 2 byte values appropriately to use the 4-byte instruction. This change adds support for cmpxchg-based instruction sets (only SPARC, in LLVM). The support can be extended for LL/SC-based PPC and MIPS in the future, supplanting the ISel expansions those architectures currently use. Tests added for the IR transform and SPARCv9. Differential Revision: http://reviews.llvm.org/D21029 llvm-svn: 273025	2016-06-17 18:11:48 +00:00
Rafael Espindola	9f86baebe0	Change RelaxELFRelocations for llc. As a developer tool it makes sense for it to use the new relocations. llvm-svn: 273019	2016-06-17 17:43:41 +00:00
Simon Pilgrim	6a35e5ab97	[X86][SSE4A] Remove the GCCBuiltins from the movntsd/movntss intrinsic defs so we can emit native IR from clang. Clang-side sibling commit to follow. llvm-svn: 273002	2016-06-17 14:27:38 +00:00
Ranjeet Singh	39d2d097d6	[ARM] Add support for mrrc/mrrc2 intrinsics. Reapplying patch as it was reverted when it was first committed because of an assertion failure when the mrrc2 intrinsic was called in ARM mode. The failure was happening because the instruction was being built in ARMISelDAGToDAG.cpp and the tablegen description for mrrc2 instruction doesn't allow you to use a predicate. The ARM architecture manuals do say that mrrc2 in ARM mode can be predicated with AL in assembly but this has no effect on the encoding of the instruction as the top 4 bits will always be 1111 not 1110 which is the encoding for the condition AL. Differential Revision: http://reviews.llvm.org/D21408 llvm-svn: 272982	2016-06-17 00:52:41 +00:00
Sanjay Patel	0e9afea3c8	[x86] autoupgrade and remove AVX2 integer min/max intrinsics This will (hopefully very temporarily) break clang. The clang side of this should be the next commit. llvm-svn: 272932	2016-06-16 18:44:20 +00:00
Rafael Espindola	5a07687a8e	dos2unix this test. NFC. llvm-svn: 272928	2016-06-16 18:21:11 +00:00
Sanjay Patel	d09a21682f	remove old FileCheck lines that are no longer used llvm-svn: 272921	2016-06-16 17:04:16 +00:00
Sanjay Patel	f664f3a578	[DAG] Remove redundant FMUL in Newton-Raphson SQRT code When calculating a square root using Newton-Raphson with two constants, a naive implementation is to use five multiplications (four muls to calculate reciprocal square root and another one to calculate the square root itself). However, after some reassociation and CSE the same result can be obtained with only four multiplications. Unfortunately, there's no reliable way to do such a reassociation in the back-end. So, the patch modifies NR code itself so that it directly builds optimal code for SQRT and doesn't rely on any further reassociation. Patch by Nikolai Bozhenov! Differential Revision: http://reviews.llvm.org/D21127 llvm-svn: 272920	2016-06-16 16:58:54 +00:00
Rafael Espindola	afade35003	Don't print (PLT) on arm. The R_ARM_PLT32 relocation is deprecated and is not produced by MC. This means that the code being deleted is dead from the .o point of view and was making the .s more confusing. llvm-svn: 272909	2016-06-16 16:09:53 +00:00
Sanjay Patel	51ab757941	[x86] autoupgrade and remove SSE2/SSE41 integer min/max intrinsics Follow-up to: http://reviews.llvm.org/rL272806 http://reviews.llvm.org/rL272807 llvm-svn: 272907	2016-06-16 15:48:30 +00:00
Daniel Sanders	de7816b0cd	[mips][mips16] Fix machine verifier errors about incorrect register classes on load/stores. Summary: [ls][bh] and [ls][bh]u cannot use sp-relative addresses and must therefore lower frameindex nodes such that there is a copy to a CPU16Regs register. This is now done consistently using a separate addressing mode that does not permit frameindex nodes. As part of this I've had to remove an optimization that reduced the number of instructions needed to work around the lack of sp-relative addresses on [ls][bh] and [ls][bh]u. This optimization used one of the eight CPU16Regs registers as a copy of the stack pointer and its implementation was the root cause of many of the register vs register class mismatches. lw/sw can use sp-relative addresses but we ought to ensure that we use the correct version of lw/sw internally for things like IAS. This is not currently the case and this change does not fix this. However, this change does clean it up sufficiently well to fix the machine verifier failures. Also removed irrelevant functions from stchar.ll. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21062 llvm-svn: 272882	2016-06-16 10:20:59 +00:00
Daniel Sanders	1d14864bb3	[llvm-objdump] Support detection of feature bits from the object and implement this for Mips. Summary: The Mips implementation only covers the feature bits described by the ELF e_flags so far. Mips stores additional feature bits such as MSA in the .MIPS.abiflags section. Also fixed a small bug this revealed where microMIPS wouldn't add the EF_MIPS_MICROMIPS flag when using -filetype=obj. Reviewers: echristo, rafael Subscribers: rafael, mehdi_amini, dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21125 llvm-svn: 272880	2016-06-16 09:17:03 +00:00
Hrvoje Varga	f1e0a03d08	[mips][micromips] Implement DCLO, DCLZ, DROTR, DROTR32 and DROTRV instructions Differential Revision: http://reviews.llvm.org/D16917 llvm-svn: 272876	2016-06-16 07:06:25 +00:00
Tim Northover	daa1c018b0	AArch64: allow MOV (imm) alias to be printed The backend has been around for years, it's pretty ridiculous that we can't even use the preferred form for printing "MOV" aliases. Unfortunately, TableGen can't handle the complex predicates when printing so it's a bunch of nasty C++. Oh well. llvm-svn: 272865	2016-06-16 01:42:25 +00:00
Matt Arsenault	191763026c	AMDGPU: Disable scheduling in some slow tests Disabling the pre-RA scheduler on large-work-group-registers causes it to be ~50% slower. llvm-svn: 272860	2016-06-16 00:56:47 +00:00
Sanjay Patel	74b40bdb53	[x86, SSE] update packed FP compare tests for direct translation from builtin to IR The clang side of this was r272840: http://reviews.llvm.org/rL272840 A follow-up step would be to auto-upgrade and remove these LLVM intrinsics completely. Differential Revision: http://reviews.llvm.org/D21269 llvm-svn: 272841	2016-06-15 21:22:15 +00:00
Sanjay Patel	0b526676ab	[x86] delete unnecessary function declarations Missed this in r272806, r272807. llvm-svn: 272834	2016-06-15 20:51:47 +00:00
Tim Northover	389a1e39ea	AArch64: stop trying to use 32-bit MOVZs when expanding patchpoints. Of course the assembly was right but because the opcode was MOVZWi it was encoded as "movz w16, #65535, lsl #32" which is an unallocated encoding and would go horribly wrong on a CPU. No idea how this bug survived this long. It seems nobody is using that aspect of patchpoints. llvm-svn: 272831	2016-06-15 20:33:36 +00:00
Sanjay Patel	1a4569df54	[x86] add folds for x86 vector compare nodes (PR27924) Ideally, we can get rid of most x86 LLVM intrinsics by transforming them to IR (and some of that happened with http://reviews.llvm.org/rL272807), but it doesn't cost much to have some simple folds in the backend too while we're working on that and as a backstop. This fixes: https://llvm.org/bugs/show_bug.cgi?id=27924 Differential Revision: http://reviews.llvm.org/D21356 llvm-svn: 272828	2016-06-15 20:26:58 +00:00
Kevin B. Smith	acbda9ef30	[X86]: Updated r272801 to promote 16 bit compares with immediate operand to 32 bits. This is in response to a comment by Eli Friedman. llvm-svn: 272814	2016-06-15 18:18:05 +00:00
Sanjay Patel	a6c6f09967	[x86, SSE] remove the GCCBuiltins from the integer min/max intrinsics This allows us to emit native IR in Clang (next commit). Also, update the intrinsic tests to show that codegen already knows how to handle the IR that Clang will soon produce. llvm-svn: 272806	2016-06-15 17:17:27 +00:00
Kevin B. Smith	54566a0e9a	[X86]: Quit promoting 8 and 16 bit compares to 32 bit. Differential Revision: http://reviews.llvm.org/D21144 llvm-svn: 272801	2016-06-15 16:37:46 +00:00
Kevin B. Smith	c3c82cdbd0	[X86]: Improve Liveness checking for X86FixupBWInsts.cpp Differential Revision: http://reviews.llvm.org/D21085 llvm-svn: 272797	2016-06-15 16:03:06 +00:00
Ranjeet Singh	0db7be886e	Reverting r272778 because there's an assertion failure when running the test CodeGen/ARM/intrinsics-coprocessor.ll llvm-svn: 272791	2016-06-15 14:23:29 +00:00
Simon Dardis	7bdf183ac1	[mips] Missing test case Add missing testcase from r272666. llvm-svn: 272784	2016-06-15 13:49:58 +00:00
Ranjeet Singh	351364fe76	[ARM] Add support for mrrc/mrrc2 intrinsics. Differential Revision: http://reviews.llvm.org/D21178 llvm-svn: 272778	2016-06-15 11:32:24 +00:00
Daniel Sanders	df3185d2ea	[mips] Removed invalid test from o32_cc.ll MIPS32R1 cannot implement a 64-bit FPU because this was introduced in MIPS32R2. llvm-svn: 272769	2016-06-15 09:47:27 +00:00
Daniel Sanders	d3bb20821d	[mips][msa] Fix register/register-class mismatches in emitINSERT_DF_VIDX(). Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21068 llvm-svn: 272765	2016-06-15 08:43:23 +00:00
Zlatko Buljan	d2ed9c6c2c	[mips][microMIPS] Add CodeGen support for AND, OR16, OR, XOR*, NOT16 and NOR instructions Differential Revision: http://reviews.llvm.org/D16719 llvm-svn: 272764	2016-06-15 07:46:24 +00:00
Igor Breger	64cfd3a442	[AVX512] Fix BLENDM lowering patterns. Operands should be swapped to match SELECT behavior. Use BLENDM instead of masked move instruction. Differential Revision: http://reviews.llvm.org/D21001 llvm-svn: 272763	2016-06-15 07:30:38 +00:00
Nicolai Haehnle	a609259832	AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics Summary: This fixes two related bugs. First, the generic optimization passes unfortunately generate negative constant offsets but the hardware treats SOffset as an unsigned value. Second, there is a hardware bug on SI and CI, where address clamping in MUBUF instructions does not work correctly when SOffset is larger than the buffer size. This patch works around this bug by never using SOffset. An alternative workaround would be to do the clamping manually when SOffset is too large, but generating the required code sequence during instruction selection would be rather involved, and in any case the resulting code would probably be worse. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21326 llvm-svn: 272761	2016-06-15 07:13:05 +00:00
Sanjoy Das	0272be206a	Don't force SP-relative addressing for statepoints Summary: ... when the offset is not statically known. Prioritize addresses relative to the stack pointer in the stackmap, but fall back gracefully to other modes of addressing if the offset to the stack pointer is not a known constant. Patch by Oscar Blumberg! Reviewers: sanjoy Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm Differential Revision: http://reviews.llvm.org/D21259 llvm-svn: 272756	2016-06-15 05:35:14 +00:00
David Majnemer	cbf614a93b	Remove the ScalarReplAggregates pass Nearly all the changes to this pass have been done while maintaining and updating other parts of LLVM. LLVM has had another pass, SROA, which has superseded ScalarReplAggregates for quite some time. Differential Revision: http://reviews.llvm.org/D21316 llvm-svn: 272737	2016-06-15 00:19:09 +00:00
Matt Arsenault	f42c69206d	AMDGPU: Run pointer optimization passes llvm-svn: 272736	2016-06-15 00:11:01 +00:00
Xinliang David Li	8052238ac0	Fix a test case to match its intention llvm-svn: 272733	2016-06-14 23:05:46 +00:00
Dehao Chen	9f2bdfb40f	Set machine block placement hot prob threshold for both static and runtime profile. Summary: With runtime profile, we have more confidence in branch probability, thus during basic block layout, we set a lower hot prob threshold so that blocks can be laid out optimally. Reviewers: djasper, davidxl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D20991 llvm-svn: 272729	2016-06-14 22:27:17 +00:00
Sanjay Patel	4c3cb8b6c0	[x86] add current codegen tests for PR27924 llvm-svn: 272714	2016-06-14 21:25:46 +00:00
Peter Collingbourne	96efdd6107	IR: Introduce local_unnamed_addr attribute. If a local_unnamed_addr attribute is attached to a global, the address is known to be insignificant within the module. It is distinct from the existing unnamed_addr attribute in that it only describes a local property of the module rather than a global property of the symbol. This attribute is intended to be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. It is possible to exclude a global from the symbol table if three things are true: - This attribute is present on every instance of the global (which means that the normal rule that the global must have a unique address can be broken without being observable by the program by performing comparisons against the global's address) - The global has linkonce_odr linkage (which means that each linkage unit must have its own copy of the global if it requires one, and the copy in each linkage unit must be the same) - It is a constant or a function (which means that the program cannot observe that the unique-address rule has been broken by writing to the global) Although this attribute could in principle be computed from the module contents, LTO clients (i.e. linkers) will normally need to be able to compute this property as part of symbol resolution, and it would be inefficient to materialize every module just to compute it. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html for earlier discussion. Part of the fix for PR27553. Differential Revision: http://reviews.llvm.org/D20348 llvm-svn: 272709	2016-06-14 21:01:22 +00:00
Wei Mi	b799a625f9	[X86] Reduce the width of multiplication when its operands are extended from i8 or i16 For <N x i32> type mul, pmuludq will be used for targets without SSE41, which often introduces many extra pack and unpack instructions in vectorized loop body because pmuludq generates <N/2 x i64> type value. However when the operands of <N x i32> mul are extended from smaller size values like i8 and i16, the type of mul may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which generates better code. For targets with SSE41, pmulld is supported so no shrinking is needed. Differential Revision: http://reviews.llvm.org/D20931 llvm-svn: 272694	2016-06-14 18:53:20 +00:00
Nirav Dave	f8d00d5cac	Fix BSS global handling in AsmPrinter Change EmitGlobalVariable to check final assembler section is in BSS before using .lcomm/.comm directive. This prevents globals from being put into .bss erroneously when -data-sections is used. This fixes PR26570. Reviewers: echristo, rafael Subscribers: llvm-commits, mehdi_amini Differential Revision: http://reviews.llvm.org/D21146 llvm-svn: 272674	2016-06-14 15:09:30 +00:00
Simon Dardis	878c0b1b76	[mips] Optimize stack pointer adjustments. Instead of always using addu to adjust the stack pointer when the size is out of the range of an addiu instruction, use subu so that a smaller constant can be generated. This can give savings of ~3 instructions whenever a function has a stack frame whose size is out of range of an addiu instruction. This change may break some naive stack unwinders. Partially resolves PR/26291. Thanks to David Chisnall for reporting the issue. Reviewers: dsanders, vkalintiris Differential Review: http://reviews.llvm.org/D21321 llvm-svn: 272666	2016-06-14 13:39:43 +00:00
James Molloy	65b6be1d3a	[Thumb] Fix off-by-one error in r272007 We can only generate immediates up to #510 with a MOV+ADD, not #511, because there's no such instruction as add #256. Found by Oliver Stannard and csmith! llvm-svn: 272665	2016-06-14 13:33:07 +00:00
Simon Dardis	4fbf76f7c3	[mips][atomics] Fix atomic instruction descriptions and uses. PR27458 highlights that the MIPS backend does not have well formed MIR for atomic operations (among other errors). This patch expands and corrects the LL/SC descriptions and uses for MIPS(64). Reviewers: dsanders, vkalintiris Differential Review: http://reviews.llvm.org/D19719 llvm-svn: 272655	2016-06-14 11:29:28 +00:00
Simon Pilgrim	cf1165b86e	[X86][SSE4A] Added patterns for nontemporal stores of scalar float/doubles using MOVNTSD/MOVNTSS llvm-svn: 272651	2016-06-14 09:43:38 +00:00
Simon Dardis	e661e528db	[mips] MIPS32/64 itineraries Itineraries for some pre MIPSR6 and EVA instructions. Some pseudo expanded instructions are marked as having no scheduling info. Reviewers: dsanders, vkalintiris Differential Review: http://reviews.llvm.org/D20418 llvm-svn: 272648	2016-06-14 09:35:29 +00:00
Daniel Sanders	435a653437	[mips][dsp] Fix use without def on DSPCtrl registers read by rddsp intrinsic. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21063 llvm-svn: 272647	2016-06-14 09:29:46 +00:00
Daniel Sanders	d2a49ec3ab	[mips][msa] copyPhysReg() should not set RegState::Define on result of CTCMSA. Summary: The machine verifier reports 'Explicit operand marked as def' when it is manually specified even though it agrees with the operand info. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21065 llvm-svn: 272646	2016-06-14 09:11:33 +00:00
Diana Picus	bae1d89e45	[SelectionDAG] Remove exit-on-error flag from test (PR27765) The exit-on-error flag in the ARM test is necessary in order to avoid an unreachable in the DAGTypeLegalizer, when trying to expand a physical register. We can also avoid this situation by introducing a bitcast early on, where the invalid scalar-to-vector conversion is detected. We also add a test for PowerPC, which goes through a similar code path in the SelectionDAGBuilder. Fixes PR27765. Differential Revision: http://reviews.llvm.org/D21061 llvm-svn: 272644	2016-06-14 07:30:20 +00:00
Igor Breger	484bace21b	re-generate the tests using the update_llc_test_checks.py script llvm-svn: 272643	2016-06-14 07:05:10 +00:00
Craig Topper	99e30e6a66	[AVX512] Use MOVZX32 instead of MOVZ16 for loading single v8/v4/v2/v1 masks when KMOVB is not available. This has better behavior with respect to partial register stalls since it won't need to preserve the upper 16-bits of the GPR. llvm-svn: 272626	2016-06-14 03:13:00 +00:00
Craig Topper	ddab395397	[AVX512] Add patterns for zero-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX. llvm-svn: 272625	2016-06-14 03:12:54 +00:00
Craig Topper	cbe54a4bd9	[AVX512] Add tests for zero extending masks that show an unnecessary movzx instruction. A followup patch will remove that instruction, but adding the tests first to make the change more obvious. llvm-svn: 272624	2016-06-14 03:12:48 +00:00
Sanjoy Das	98ac278b86	Move previously added test case to the right location In rL272580 I accidentally added a test case to test/CodeGen when test/Transforms/DeadStoreElimination/ is a better place for it. llvm-svn: 272581	2016-06-13 20:12:07 +00:00
Sanjoy Das	d0bdf3e02b	Fix AAResults::callCapturesBefore for operand bundles Summary: AAResults::callCapturesBefore would previously ignore operand bundles. It was possible for a later instruction to miss its memory dependency on a call site that would only access the pointer through a bundle. Patch by Oscar Blumberg! Reviewers: sanjoy Differential Revision: http://reviews.llvm.org/D21286 llvm-svn: 272580	2016-06-13 19:55:04 +00:00
Simon Pilgrim	582b9ce36e	[X86][SSE] Added extract to scalar nontemporal store tests llvm-svn: 272577	2016-06-13 19:08:28 +00:00
David Majnemer	248190ba69	[X86] Remove llvm.x86.bit.scan.{forward,reverse}.32 The need for these intrinsics has been obviated by r272564 which reimplements their functionality using generic IR. llvm-svn: 272566	2016-06-13 17:33:13 +00:00
Marek Olsak	e93f6d6923	AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing Summary: Mesa and other users must set this to enable coalescing: - STRIDE = 0 - SWIZZLE_ENABLE = 1 This makes one particular compute shader 8x faster. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21136 llvm-svn: 272556	2016-06-13 16:05:57 +00:00
Ulrich Weigand	daae87aa21	[SystemZ] Enable index register memory constraints for inline ASM This enables use of the 'R' and 'T' memory constraints for inline ASM operands on SystemZ, which allow an index register as well as an immediate displacement. This patch includes corresponding documentation and test case updates. As with the last patch of this kind, I moved the 'm' constraint to the most general case, which is now 'T' (base + 20-bit signed displacement + index register). Author: colpell Differential Revision: http://reviews.llvm.org/D21239 llvm-svn: 272547	2016-06-13 14:24:05 +00:00
Ranjeet Singh	933e1aa39f	[ARM] Reverting r272544 because clang patch needs to go in as soon as llvm patch has gone in because tests will start breaking in Clang. llvm-svn: 272546	2016-06-13 10:58:24 +00:00
Ranjeet Singh	8feacb330d	[ARM] Add mrrc/mrrc2 co-processor intrinsics MRRC/MRRC2 instruction writes to two registers. The intrinsic definition returns a single uint64_t to represent the write; this is a compact way of representing a write to two 32 bit registers. The alternative might have been to return a struct of 2 uint32_t's but this isn't as nice. Differential Revision: llvm-svn: 272544	2016-06-13 10:43:50 +00:00
Strahinja Petrovic	f0980e4dc0	This patch fixes handling long double type when it is constant in soft float mode on PowerPC 32 architecture. llvm-svn: 272543	2016-06-13 10:29:29 +00:00
Simon Pilgrim	377bc2ea43	[X86][SSE4A] Renamed tests to correspond with the instruction being tested llvm-svn: 272542	2016-06-13 10:14:42 +00:00
Craig Topper	13cf7cac07	[AVX512] Remove masked pshufd, pshuflw, and pshufhw intrinsics and autoupgrade them to selects and shufflevector. llvm-svn: 272527	2016-06-13 02:36:48 +00:00
Sanjay Patel	977530a8c9	[x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044) This patch is intended to solve: https://llvm.org/bugs/show_bug.cgi?id=28044 By changing the definition of X86ISD::CMPP to use float types, we allow it to be created and pass legalization for an SSE1-only target where v4i32 is not legal. The motivational trail for this change includes: https://llvm.org/bugs/show_bug.cgi?id=28001 and eventually makes this trigger: http://reviews.llvm.org/D21190 I.e., after this step, we should be free to have Clang generate FP compare IR instead of x86 intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86 intrinsics, a big pile of generic optimizations can trigger. Differential Revision: http://reviews.llvm.org/D21235 llvm-svn: 272511	2016-06-12 15:03:25 +00:00
Craig Topper	1067986c5b	[X86] Remove sse2 pshufd/pshuflw/pshufhw intrinsics and upgrade them to shufflevector. llvm-svn: 272510	2016-06-12 14:11:32 +00:00
Simon Pilgrim	9d8bed1796	[X86][BMI] Added fast-isel tests for BMI1 intrinsics A lot of the codegen is pretty awful for these as they are mostly implemented as generic bit twiddling ops llvm-svn: 272508	2016-06-12 09:56:05 +00:00
Craig Topper	b7713e413b	[X86] Move tests for llvm.x86.avx.vpermil.* intrinsics to a -upgrade test since they are autoupgraded to shufflevector. llvm-svn: 272494	2016-06-12 01:41:06 +00:00
Simon Pilgrim	2b7c02a04f	[X86] Updated test checks script to generalise LCPI symbol refs The script now replaces '.LCPI888_8' style asm symbols with the {{\.LCPI.*}} re pattern - this helps stop hardcoded symbols in 32-bit x86 tests changing with every edit of the file Refreshed some tests to demonstrate the new check llvm-svn: 272488	2016-06-11 20:39:21 +00:00
Simon Pilgrim	5b9bade8dd	[X86][SSSE3] Added PSHUFB LUT implementation of BITREVERSE PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach. llvm-svn: 272477	2016-06-11 15:44:13 +00:00
Craig Topper	46f49fb407	[AVX512] Re-generate v8i64 shuffle test now that we use pshufd for some cases. llvm-svn: 272474	2016-06-11 13:57:08 +00:00
Craig Topper	504fba5c8a	[AVX512] Lower v8i64 and v16i32 to pshufd when possible. llvm-svn: 272473	2016-06-11 13:43:21 +00:00
Simon Pilgrim	6800a45790	[X86][SSE] Added PSLLDQ/PSRLDQ as a target shuffle type Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining llvm-svn: 272471	2016-06-11 13:38:28 +00:00
Simon Pilgrim	8dd73e3ffa	[X86][AVX2] Added PSLLDQ/PSRLDQ shuffle combining tests llvm-svn: 272469	2016-06-11 13:18:21 +00:00
Craig Topper	40abd1cc61	[AVX512] Add support for lowering v32i16 shuffles with repeated lanes. This allows us to create 512-bit PSHUFLW/PSHUFHW. llvm-svn: 272450	2016-06-11 03:27:42 +00:00
Quentin Colombet	f2a1909bb5	[IRTranslator] Support the translation of or. Now or instructions get translated into G_OR. llvm-svn: 272433	2016-06-10 20:50:35 +00:00
Sanjay Patel	b114fd65fc	[x86] enable bitcasted fabs/fneg transforms The vector cases don't change because we already have folds in X86ISelLowering to look through and remove bitcasts. llvm-svn: 272427	2016-06-10 20:33:50 +00:00
Zhan Jun Liau	ab42cbce98	[SystemZ] Support Compare and Traps Support and generate Compare and Traps like CRT, CIT, etc. Support Trap as legal DAG opcodes and generate "j .+2" for them by default. Add support for Conditional Traps and use the If Converter to convert them into the corresponding compare and trap opcodes. Differential Revision: http://reviews.llvm.org/D21155 llvm-svn: 272419	2016-06-10 19:58:10 +00:00
Tom Stellard	f3af841462	AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations Summary: We need to set the fixup type to FK_Data_4 for the SCRATCH_RSRC_DWORD[01] symbols, since these require absolute relocations, and fixup_si_rodata is for relative relocations. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21153 llvm-svn: 272417	2016-06-10 19:26:38 +00:00
Mehdi Amini	cbd68ecf04	Move CodeGen test from Generic to X86 specific directory llvm-svn: 272416	2016-06-10 19:14:01 +00:00
Mehdi Amini	1d396832d3	Interprocedural Register Allocation (IPRA): add a Transformation Pass Adds a MachineFunctionPass that scans the body to find calls, and updates the register mask with the one saved by the RegUsageInfoCollector analysis in PhysicalRegisterUsageInfo. Patch by Vivek Pandya <vivekvpandya@gmail.com> Differential Revision: http://reviews.llvm.org/D21180 llvm-svn: 272414	2016-06-10 18:37:21 +00:00
Sanjay Patel	d558bdadd2	[x86] add test for PR28044 llvm-svn: 272411	2016-06-10 18:05:55 +00:00
Mehdi Amini	bbacddfe92	Interprocedural Register Allocation (IPRA) Analysis Add an option to enable the analysis of MachineFunction register usage to extract the list of clobbered registers. When enabled, the CodeGen order is changed to be bottom up on the Call Graph. The analysis is split in two parts. RegUsageInfoCollector is the MachineFunction Pass that runs post-RA and collects the list of clobbered registers to produce a register mask. An immutable pass, RegisterUsageInfo, stores the RegMask produced by RegUsageInfoCollector, and keeps them available. A future transformation pass will use this information to update every call site after instruction selection. Patch by Vivek Pandya <vivekvpandya@gmail.com> Differential Revision: http://reviews.llvm.org/D20769 llvm-svn: 272403	2016-06-10 16:19:46 +00:00
Sanjay Patel	27f06ae7a5	[x86] fix test attributes and autogenerate checks llvm-svn: 272398	2016-06-10 15:30:52 +00:00
Sanjay Patel	cccccd9ab5	[x86] add missing tests for fcmp ueq/one Somehow, the codegen logic for these sequences has gone completely untested until now (note the 2 compare instructions generated per test). There's also an Intel AVX optimization opportunity exposed in these cases and the existing tests. Intel's (but not AMD's) AVX spec shows that extra FP predicates were added, so a single comparison should always be sufficient, and operand commutation should never be necessary. llvm-svn: 272397	2016-06-10 15:17:54 +00:00
Sanjay Patel	330a359fb3	[x86] regenerate checks llvm-svn: 272396	2016-06-10 14:48:50 +00:00
Simon Pilgrim	2fa2690bca	[X86][SSE] Added target shuffle combine tests for byte shift/rotates (PSLLDQ/PSRLDQ/PALIGNR) llvm-svn: 272392	2016-06-10 13:03:22 +00:00
Simon Pilgrim	34263ad995	[X86][AVX512] Added VPSLLDQ/VPSRLDQ memory fold tests Memory operand is new for AVX512 (SSE/AVX2 didn't support it). Also dropped the 'mask' from the tests (VPSLLDQ/VPSRLDQ don't support masked operations). Regenerated VPALIGNR test now that the shuffle comments work llvm-svn: 272383	2016-06-10 09:56:20 +00:00
Craig Topper	200d237e57	[AVX512] Add shuffle comment printing for masked VPERMPD/VPERMQ. llvm-svn: 272371	2016-06-10 05:12:40 +00:00
Craig Topper	89c1761474	[AVX512] Fix shuffle comment printing to handle the masked versions of some shuffles. Previously we were printing the mask operands as the register names. llvm-svn: 272367	2016-06-10 04:48:05 +00:00
Quentin Colombet	3198649199	[LiveRangeEdit] Add a test case for r272314. The test case is not great especially because it is still cumbersome to run the regalloc pass with run-pass. (We miss a bunch of initializers to be properly implemented.) Related to llvm.org/PR27983 llvm-svn: 272360	2016-06-10 01:57:48 +00:00
Quentin Colombet	129458a7ed	[llc] Add support for several run-pass options. Previously we could run only one machine pass with the run-pass option. With that patch, we can now specify several passes with several run-pass options (or just one option with a list of comma separated passes) and llc will build the related pipeline. This is great to test the interaction of two passes that are not necessarily next to each other in the pipeline, or play with pass ordering. Now, we should be at parity with opt for the flexibility of running passes. Note: I also moved the run pass option from CommandFlags.h to llc.cpp because, really, this is needed only there! llvm-svn: 272356	2016-06-10 00:52:10 +00:00
Matt Arsenault	58ddad5bd6	AMDGPU: v_cndmask_b32 does not def vcc Fixes verifier errors after SIShrinkInstructions. llvm-svn: 272351	2016-06-10 00:18:41 +00:00
Tom Stellard	26a2ab7477	AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_permute Summary: This fixes a bug with ds_permute instructions where if it was passed a constant address, then the offset operand would get assigned a register operand instead of an immediate. Reviewers: scchan, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19994 llvm-svn: 272349	2016-06-10 00:01:04 +00:00
Matt Arsenault	7757c59e48	AMDGPU: Fix flat atomics The flat atomics could already be selected, but only when using flat instructions for global memory. Add patterns for flat addresses. llvm-svn: 272345	2016-06-09 23:42:54 +00:00
Matt Arsenault	887018179a	AMDGPU: Fix i64 global cmpxchg This was using extract_subreg sub0 to extract the low register of the result instead of sub0_sub1, producing an invalid copy. There doesn't seem to be a way to use the compound subreg indices in tablegen since those are generated, so manually select it. llvm-svn: 272344	2016-06-09 23:42:48 +00:00
Matt Arsenault	25363d37fc	AMDGPU: Fix missing and broken check lines in atomic tests llvm-svn: 272343	2016-06-09 23:42:44 +00:00
Eric Christopher	1dbb23e162	Add aliases for mfvrsave/mtvrsave. Update a test as we're now going to emit it for easier reading of generated assembly as well. llvm-svn: 272339	2016-06-09 23:27:48 +00:00
Simon Pilgrim	643734c565	[X86][AVX512] Added avx512 VPSLLDQ/VPSRLDQ instruction comments llvm-svn: 272319	2016-06-09 22:03:15 +00:00
Simon Pilgrim	f718682eb9	[X86][AVX512] Dropped avx512 VPSLLDQ/VPSRLDQ intrinsics Auto-upgrade to generic shuffles like sse/avx2 implementations now that we can lower to VPSLLDQ/VPSRLDQ llvm-svn: 272308	2016-06-09 21:09:03 +00:00
Simon Pilgrim	47c76e201a	[X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR llvm-svn: 272307	2016-06-09 20:53:12 +00:00
Simon Pilgrim	0ab9d3026a	[X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte shifts 512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget llvm-svn: 272300	2016-06-09 20:13:58 +00:00
Justin Lebar	ed2c282d4b	[NVPTX] Add intrinsics for shfl instructions. Summary: Currently clang emits these instructions via inline (volatile) asm in the CUDA headers. Switching to intrinsics will let the optimizer reason across calls to these intrinsics. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D21160 llvm-svn: 272298	2016-06-09 20:04:08 +00:00
Wei Ding	ed0f97fad2	AMDGPU/SI: Fix 32-bit fdiv lowering We were using the fast fdiv lowering for all division; an implementation of IEEE754 fdiv is added. http://reviews.llvm.org/D20557 llvm-svn: 272292	2016-06-09 19:17:15 +00:00
Davide Italiano	1a7e32cc48	Also fix a typo. Need more coffee today. llvm-svn: 272278	2016-06-09 17:06:01 +00:00
Davide Italiano	f326b30a15	Improve r272262, check that __stack_chk_guard is used. Thanks to Rafael for the suggestion. llvm-svn: 272277	2016-06-09 17:04:38 +00:00
Jan Vesely	2da0cba5fb	SelectionDAG: Implement expansion of {S,U}MIN/MAX in integer legalization Fixes {u,}long_{min,max,clamp} opencl piglit regressions on EG. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D17898 llvm-svn: 272272	2016-06-09 16:04:00 +00:00
Haicheng Wu	5b458cc1f6	Reapply "[MBP] Reduce code size by running tail merging in MBP."" This reapplies commit r271930, r271915, r271923. They hit a bug in Thumb which is fixed in r272258 now. The original message: The code layout that TailMerging (inside BranchFolding) works on is not the final layout optimized based on the branch probability. Generally, after BlockPlacement, many new merging opportunities emerge. This patch calls Tail Merging after MBP and calls MBP again if Tail Merging merges anything. llvm-svn: 272267	2016-06-09 15:24:29 +00:00
Ulrich Weigand	79564611d9	[SystemZ] Enable long displacement constraints for inline ASM operands This enables use of the 'S' constraint for inline ASM operands on SystemZ, which allows for a memory reference with a signed 20-bit immediate displacement. This patch includes corresponding documentation and test case updates. I've changed the 'T' constraint to match the new behavior for 'S', as 'T' also uses a long displacement (though index constraints are still not implemented). I also changed 'm' to match the behavior for 'S' as this will allow for a wider range of displacements for 'm', though correct me if that's not the right decision. Author: colpell Differential Revision: http://reviews.llvm.org/D21097 llvm-svn: 272266	2016-06-09 15:19:16 +00:00
Davide Italiano	24f1f62dca	Move stackguard test to X86/ directory as it's not generic. llvm-svn: 272264	2016-06-09 15:16:58 +00:00
Davide Italiano	bd4243c519	[CodeGen] Change getSDagStackGuard to get an internal sym. Fixes a crash in the backend during an LTO build of rtld(1) in FreeBSD. llvm-svn: 272262	2016-06-09 14:23:38 +00:00
Igor Breger	f635367e2b	[AVX512] Remove masked_move/blendm intrinsic from back-end. This is complement patch to D21060. Differential Revision: http://reviews.llvm.org/D21174 llvm-svn: 272257	2016-06-09 11:46:55 +00:00
Zlatko Buljan	cd242c1655	[mips][microMIPS] Add CodeGen support for SEL.*, SELEQZ, SELNEZ, SELEQZ.*, SELNEZ.* and CMP.condn.fmt instructions Differential Revision: http://reviews.llvm.org/D20862 llvm-svn: 272256	2016-06-09 11:15:53 +00:00
Diana Picus	db2aff0ab4	[llc] Remove exit-on-error flag from MIR tests (PR27770) This is made possible by removing an assert in llc that assumed MIRParser::parseLLVMModule would exit on error. MIRParser's documentation states that it returns null if a parsing error occurs, so there's no reason to assert. We can instead just fall through to where the check for a module is performed and exit if it is null. This commit is part of the clean-up after r269655. Fixes PR27770 Differential Revision: http://reviews.llvm.org/D20371 llvm-svn: 272254	2016-06-09 10:31:05 +00:00
Craig Topper	6f7288dc44	[AVX512] Fix shuffle decode printing for several instructions with write masks. There are still more bugs here with UNPCK and PALIGN for sure. But these were the easiest ones to fix. llvm-svn: 272252	2016-06-09 07:49:08 +00:00
James Molloy	feb9f4243b	[Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead; int i(int a) { return a & 0xfffffeec; } Used to produce: ldr r1, [CONSTPOOL] ands r0, r1 CONSTPOOL: 0xfffffeec And now produces: movs r1, #255 adds r1, #20 ; Less costly immediate generation bics r0, r1 llvm-svn: 272251	2016-06-09 07:39:08 +00:00
Craig Topper	8537c11ff3	[X86] Fix a test I failed to re-generate in r272249. llvm-svn: 272250	2016-06-09 07:10:34 +00:00
Craig Topper	7a2993093e	[X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions. llvm-svn: 272249	2016-06-09 07:06:38 +00:00
Quentin Colombet	2c6469687d	[MIR] Check that generic virtual registers get a size. Without that check it was possible to write test cases where the size was not specified and we ended up with weird asserts down the road, because the default value (1) would not make sense. llvm-svn: 272226	2016-06-08 23:27:46 +00:00
Dehao Chen	769219b11a	Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and backward-hot-prob consistently. Summary: Consider the following diamond CFG: A / \ B C \/ D Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81 * 20% = 16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81 * 25% = 20.25, and the desired ABDC layout will be generated. Reviewers: djasper, davidxl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D20989 llvm-svn: 272203	2016-06-08 21:30:12 +00:00
Quentin Colombet	d1cd30b218	[AArch64][RegisterBankInfo] G_OR are fine on either GPR or FPR. Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or FPR for 64-bit or 32-bit values. Add test cases demonstrating how this information is used to coalesce a computation on a single register bank. llvm-svn: 272170	2016-06-08 16:53:32 +00:00
Oliver Stannard	b3378e2f3c	[ARM] MSR instructions implicitly set CPSR The MSR instructions can write to the CPSR, but we did not model this fact, so we could emit them in the middle of IT blocks, changing the condition flags for later instructions in the block. The tests use two calls to llvm.write_register.i32 because it is valid to use these instructions at the end of an IT block, which if conversion does do in some cases. With two calls, the first clobbers the flags, so a branch has to be used to make the second one conditional. Differential Revision: http://reviews.llvm.org/D21139 llvm-svn: 272154	2016-06-08 15:26:34 +00:00
Matthias Braun	3ef7df9cdf	MIR: Fix parsing of stack object references in MachineMemOperands The MachineMemOperand parser lacked the code to handle %stack.X references (%fixed-stack.X was working). llvm-svn: 272082	2016-06-08 00:47:07 +00:00
Nicolai Haehnle	c00e03b8f5	AMDGPU: Add amdgpu-ps-wqm-outputs function attributes Summary: The presence of this attribute indicates that VGPR outputs should be computed in whole quad mode. This will be used by Mesa for prolog pixel shaders, so that derivatives can be taken of shader inputs computed by the prolog, fixing a bug. The generated code could certainly be improved: if a prolog pixel shader is used (which isn't common in modern OpenGL - they're used for gl_Color, polygon stipples, and forcing per-sample interpolation), Mesa will use this attribute unconditionally, because it has to be conservative. So WQM may be used in the prolog when it isn't really needed, and furthermore a silly back-and-forth switch is likely to happen at the boundary between prolog and main shader parts. Fixing this is a bit involved: we'd first have to add a mechanism by which LLVM writes the WQM-related input requirements to the main shader part binary, and then Mesa specializes the prolog part accordingly. At that point, we may as well just compile a monolithic shader... Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130 Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D20839 llvm-svn: 272063	2016-06-07 21:37:17 +00:00
Simon Pilgrim	536434e80f	[X86][SSE4A] Regenerated SSE4A intrinsics tests There are no VEX encoded versions of SSE4A instructions, make sure that AVX targets give the same output llvm-svn: 272060	2016-06-07 21:15:45 +00:00
Eric Christopher	538d09d0dd	Revert "Differential Revision: http://reviews.llvm.org/D20557 " Author: Wei Ding <wei.ding2@amd.com> Date: Tue Jun 7 19:04:44 2016 +0000 Differential Revision: http://reviews.llvm.org/D20557 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044 91177308-0d34-0410-b5e6-96231b3b80d8 as it was breaking the bots. This reverts commit r272044. llvm-svn: 272056	2016-06-07 20:27:12 +00:00
Etienne Bergeron	22bfa83208	[stack-protection] Add support for MSVC buffer security check Summary: This patch is adding support for the MSVC buffer security check implementation The buffer security check is turned on with the '/GS' compiler switch. * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx * To be added to clang here: http://reviews.llvm.org/D20347 Some overview of buffer security check feature and implementation: * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/ * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html For the following example: ``` int example(int offset, int index) { char buffer[10]; memset(buffer, 0xCC, index); return buffer[index]; } ``` The MSVC compiler is adding these instructions to perform stack integrity check: ``` push ebp mov ebp,esp sub esp,50h [1] mov eax,dword ptr [__security_cookie (01068024h)] [2] xor eax,ebp [3] mov dword ptr [ebp-4],eax push ebx push esi push edi mov eax,dword ptr [index] push eax push 0CCh lea ecx,[buffer] push ecx call _memset (010610B9h) add esp,0Ch mov eax,dword ptr [index] movsx eax,byte ptr buffer[eax] pop edi pop esi pop ebx [4] mov ecx,dword ptr [ebp-4] [5] xor ecx,ebp [6] call @__security_check_cookie@4 (01061276h) mov esp,ebp pop ebp ret ``` The instrumentation above is: * [1] is loading the global security canary, * [3] is storing the local computed ([2]) canary to the guard slot, * [4] is loading the guard slot and ([5]) re-compute the global canary, * [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling. Overview of the current stack-protection implementation: * lib/CodeGen/StackProtector.cpp * There is a default stack-protection implementation applied on intermediate representation. * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie. 
* An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast). * Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and error handling. * Guard manipulation and comparison are added directly to the intermediate representation. * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp * There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls). * see long comment above 'class StackProtectorDescriptor' declaration. * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr). * 'getSDagStackGuard' returns the appropriate stack guard (security cookie) * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'. * include/llvm/Target/TargetLowering.h * Contains function to retrieve the default Guard 'Value'; should be overridden by each target to select which implementation is used and provide Guard 'Value'. * lib/Target/X86/X86ISelLowering.cpp * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm. Function-based Instrumentation: * The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instruction. * To support function-based instrumentation, this patch is * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h), * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the epilogue. 
* modifying (SelectionDAGISel.cpp) to avoid producing basic blocks used for inline instrumentation, * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp), * if FastISEL (not SelectionDAG), using the fallback which relies on the same function-based instrumentation implemented over intermediate representation (StackProtector.cpp). Modifications * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp) * adding support for function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h) Results * IR generated instrumentation: ``` clang-cl /GS test.cc /Od /c -mllvm -print-isel-input ``` ``` * Final LLVM Code input to ISel * ; Function Attrs: nounwind sspstrong define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 { entry: %StackGuardSlot = alloca i8* <<<-- Allocated guard slot %0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot) %index.addr = alloca i32, align 4 %offset.addr = alloca i32, align 4 %buffer = alloca [10 x i8], align 1 store i32 %index, i32* %index.addr, align 4 store i32 %offset, i32* %offset.addr, align 4 %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0 %1 = load i32, i32* %index.addr, align 4 call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false) %2 = load i32, i32* %index.addr, align 4 %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2 %3 = load i8, i8* %arrayidx, align 1 %conv = sext i8 %3 to i32 %4 = load volatile i8, i8* %StackGuardSlot <<<-- Loading Guard slot call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check ret i32 %conv } ``` * SelectionDAG generated instrumentation: ``` clang-cl /GS test.cc /O1 /c /FA ``` ``` "?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z" # BB#0: # %entry pushl %esi subl $16, %esp movl ___security_cookie, %eax <<<-- Loading Stack Guard 
value movl 28(%esp), %esi movl %eax, 12(%esp) <<<-- Store to Guard slot leal 2(%esp), %eax pushl %esi pushl $204 pushl %eax calll _memset addl $12, %esp movsbl 2(%esp,%esi), %esi movl 12(%esp), %ecx <<<-- Loading Guard slot calll @__security_check_cookie@4 <<<-- Epilogue function-based check movl %esi, %eax addl $16, %esp popl %esi retl ``` Reviewers: kcc, pcc, eugenis, rnk Subscribers: majnemer, llvm-commits, hans, thakis, rnk Differential Revision: http://reviews.llvm.org/D20346 llvm-svn: 272053	2016-06-07 20:15:35 +00:00
Wei Ding	a70216f1b3	Differential Revision: http://reviews.llvm.org/D20557 llvm-svn: 272044	2016-06-07 19:04:44 +00:00
Geoff Berry	486f49cc63	Reapply [AArch64] Fix isLegalAddImmediate() to return true for valid negative values. Originally reviewed here: http://reviews.llvm.org/D17463 llvm-svn: 272023	2016-06-07 16:48:43 +00:00
Haicheng Wu	4fa9f3ae45	Revert "[MBP] Reduce code size by running tail merging in MBP." This reverts commit r271930, r271915, r271923. They break a thumb selfhosting bot. llvm-svn: 272017	2016-06-07 15:17:21 +00:00
Simon Pilgrim	15c6ab5fac	[X86][AVX512] Added 512-bit integer vector non-temporal load tests llvm-svn: 272016	2016-06-07 15:12:47 +00:00
Simon Pilgrim	9a89623b57	[X86][SSE] Add general lowering of nontemporal vector loads Currently the only way to use the (V)MOVNTDQA nontemporal vector load instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 llvm-svn: 272010	2016-06-07 13:34:24 +00:00
James Molloy	b101383fb5	[Thumb-1] Add optimized constant materialization for integers [256..512) We can materialize these integers using a MOV; ADDi8 pair. llvm-svn: 272007	2016-06-07 13:10:14 +00:00
Igor Breger	61e628591f	[AVX512] Fix load opcode for fast isel. Differential Revision: http://reviews.llvm.org/D21067 llvm-svn: 272006	2016-06-07 13:08:45 +00:00
Ulrich Weigand	6b0634b304	[PowerPC] Support multiple return values with fast isel Using an LLVM IR aggregate return value type containing three or more integer values causes an abort in the fast isel pass. This patch adds two more registers to RetCC_PPC64_ELF_FIS to allow returning up to four integers with fast isel, just the same as is currently supported with regular isel (RetCC_PPC). This is needed for Swift and (possibly) other non-clang frontends. Fixes PR26190. llvm-svn: 272005	2016-06-07 12:48:22 +00:00
Simon Pilgrim	ca1da1bf07	[X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened. This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases. llvm-svn: 272003	2016-06-07 12:20:14 +00:00
James Molloy	53298a1808	[ARM] Shrink post-indexed LDR and STR to LDM/STM A Thumb-2 post-indexed LDR instruction such as: ldr.w r0, [r1], #4 Can be rewritten as: ldm.n r1!, {r0} LDMs can be more expensive than LDRs on some cores, so this has been enabled only in minsize mode. llvm-svn: 272002	2016-06-07 12:13:34 +00:00
James Molloy	75afc95112	[ARM] Transform LDMs into writeback form to save code size If we have an LDM that uses only low registers and doesn't write to its base register: ldm.w r0, {r1, r2, r3} And that base register is dead after the LDM, then we can convert it to writeback form and use a narrow encoding: ldm.n r0!, {r1, r2, r3} Obviously, this introduces a new register write and so can cause WAW hazards, so I've enabled it only in minsize mode. This is a code size trick that ARM Compiler 5 ("armcc") does that we don't. llvm-svn: 272000	2016-06-07 11:47:24 +00:00
Saleem Abdulrasool	532dcbc2c5	ARM: correct TLS access on WoA TLS access requires an offset from the TLS index. The index itself is the section-relative distance of the symbol. For ARM, the relevant relocation (IMAGE_REL_ARM_SECREL) is applied as a constant. This means that the value may not be an immediate and must be lowered into a constant pool. This offset will not be base relocated. We were previously emitting the actual address of the symbol which would be base relocated and would therefore be the value offset by the ImageBase + TLS Offset. llvm-svn: 271974	2016-06-07 03:15:07 +00:00


16483 Commits