llvm-project

Commit Graph

Author	SHA1	Message	Date
Amara Emerson	bf43004ff1	[AArch64][GlobalISel] Fix the G_EXTLOAD combiner creating non-extending illegal instructions. This fixes loads like 's1 = load %p (load 1 from %p)' being combined with an extend into an illegal 's8 = g_extload %p (load 1 from %p)' which doesn't do any extension, by avoiding touching those < s8 size loads. This bug was uncovered by a verifier update r351584, which I reverted it to keep the bots green. llvm-svn: 352311	2019-01-27 10:56:20 +00:00
Amara Emerson	203760ab9c	[GlobalISel][IRTranslator] Fix crash on translation of fneg. When the fneg IR instruction was added the code to do translation wasn't tested, and tried to get an invalid operand. llvm-svn: 352296	2019-01-26 23:47:09 +00:00
Matt Arsenault	cdc201fcde	GlobalISel: Fix address space limit in LLT The IR enforced limit for the address space is 24-bits, but LLT was only using 23-bits. Additionally, the argument to the constructor was truncating to 16-bits. A similar problem still exists for the number of vector elements. The IR enforces no limit, so if you try to use a vector with > 65535 elements the IRTranslator asserts in the LLT constructor. llvm-svn: 352264	2019-01-26 01:42:13 +00:00
Aditya Nandakumar	3ba0d94bce	[GISel]: Change how CSE is enabled by default for each pass https://reviews.llvm.org/D57178 Now add a hook in TargetPassConfig to query if CSE needs to be enabled. By default this hook returns false only for O0 opt level but this can be overridden by the target. As a consequence of the default of enabled for non O0, a few tests needed to be updated to not use CSE (by passing in -O0) to the run line. reviewed by: arsenm llvm-svn: 352126	2019-01-24 23:11:25 +00:00
Jessica Paquette	245047dfe8	[GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceil This patch adds support for vector @llvm.ceil intrinsics when full 16 bit floating point support isn't available. To do this, this patch... - Implements basic isel for G_UNMERGE_VALUES - Teaches the legalizer about 16 bit floats - Teaches AArch64RegisterBankInfo to respect floating point registers on G_BUILD_VECTOR and G_UNMERGE_VALUES - Teaches selectCopy about 16-bit floating point vectors It also adds - A legalizer test for the 16-bit vector ceil which verifies that we create a G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported - An instruction selection test which makes sure we lower to G_FCEIL when full fp16 is supported - A test for selecting G_UNMERGE_VALUES And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types work as expected. https://reviews.llvm.org/D56682 llvm-svn: 352113	2019-01-24 22:00:41 +00:00
Kristof Beyls	e70ba0a1bc	[SLH][AArch64] Remove accidentally retained -debug-only line from test. llvm-svn: 351932	2019-01-23 09:10:12 +00:00
Kristof Beyls	3ff5dfd735	[SLH] AArch64: correctly pick temporary register to mask SP As part of speculation hardening, the stack pointer gets masked with the taint register (X16) before a function call or before a function return. Since there are no instructions that can directly mask writing to the stack pointer, the stack pointer must first be transferred to another register, where it can be masked, before that value is transferred back to the stack pointer. Before, that temporary register was always picked to be x17, since the ABI allows clobbering x17 on any function call, resulting in the following instruction pattern being inserted before function calls and returns/tail calls: mov x17, sp and x17, x17, x16 mov sp, x17 However, x17 can be live in those locations, for example when the call is an indirect call, using x17 as the target address (blr x17). To fix this, this patch looks for an available register just before the call or terminator instruction and uses that. In the rare case when no register turns out to be available (this situation is only encountered twice across the whole test-suite), just insert a full speculation barrier at the start of the basic block where this occurs. Differential Revision: https://reviews.llvm.org/D56717 llvm-svn: 351930	2019-01-23 08:18:39 +00:00
Peter Collingbourne	73078ecd38	hwasan: Move memory access checks into small outlined functions on aarch64. Each hwasan check requires emitting a small piece of code like this: https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses The problem with this is that these code blocks typically bloat code size significantly. An obvious solution is to outline these blocks of code. In fact, this has already been implemented under the -hwasan-instrument-with-calls flag. However, as currently implemented this has a number of problems: - The functions use the same calling convention as regular C functions. This means that the backend must spill all temporary registers as required by the platform's C calling convention, even though the check only needs two registers on the hot path. - The functions take the address to be checked in a fixed register, which increases register pressure. Both of these factors can diminish the code size effect and increase the performance hit of -hwasan-instrument-with-calls. The solution that this patch implements is to involve the aarch64 backend in outlining the checks. An intrinsic and pseudo-instruction are created to represent a hwasan check. The pseudo-instruction is register allocated like any other instruction, and we allow the register allocator to select almost any register for the address to check. A particular combination of (register selection, type of check) triggers the creation in the backend of a function to handle the check for specifically that pair. The resulting functions are deduplicated by the linker. The pseudo-instruction (really the function) is specified to preserve all registers except for the registers that the AAPCS specifies may be clobbered by a call. To measure the code size and performance effect of this change, I took a number of measurements using Chromium for Android on aarch64, comparing a browser with inlined checks (the baseline) against a browser with outlined checks. Code size: Size of .text decreases from 243897420 to 171619972 bytes, or a 30% decrease. Performance: Using Chromium's blink_perf.layout microbenchmarks I measured a median performance regression of 6.24%. The fact that a perf/size tradeoff is evident here suggests that we might want to make the new behaviour conditional on -Os/-Oz. But for now I've enabled it unconditionally, my reasoning being that hwasan users typically expect a relatively large perf hit, and ~6% isn't really adding much. We may want to revisit this decision in the future, though. I also tried experimenting with varying the number of registers selectable by the hwasan check pseudo-instruction (which would result in fewer variants being created), on the hypothesis that creating fewer variants of the function would expose another perf/size tradeoff by reducing icache pressure from the check functions at the cost of register pressure. Although I did observe a code size increase with fewer registers, I did not observe a strong correlation between the number of registers and the performance of the resulting browser on the microbenchmarks, so I conclude that we might as well use ~all registers to get the maximum code size improvement. My results are below: Regs \| .text size \| Perf hit -----+------------+--------- ~all \| 171619972 \| 6.24% 16 \| 171765192 \| 7.03% 8 \| 172917788 \| 5.82% 4 \| 177054016 \| 6.89% Differential Revision: https://reviews.llvm.org/D56954 llvm-svn: 351920	2019-01-23 02:20:10 +00:00
Matt Arsenault	30989e492b	GlobalISel: Allow shift amount to be a different type For AMDGPU the shift amount is never 64-bit, and this needs to use a 32-bit shift. X86 uses i8, but seemed to be hacking around this before. llvm-svn: 351882	2019-01-22 21:42:11 +00:00
Eli Friedman	23e60c7893	[AArch64] Add patterns for zext/sext of shift amount. Not sure this is the best fix, but it saves an instruction for certain constructs involving variable shifts. Differential Revision: https://reviews.llvm.org/D55572 llvm-svn: 351768	2019-01-22 00:21:35 +00:00
Sanjay Patel	fe3a1b56eb	[AArch64] add more tests for buildvec to shuffle transform; NFC These are copied from the sibling x86 file. I'm not sure which of the current outputs (if any) is considered optimal, but someone more familiar with AArch may want to take a look. llvm-svn: 351754	2019-01-21 17:46:35 +00:00
Sanjay Patel	e713c47d49	[DAGCombiner] fix crash when converting build vector to shuffle The regression test is reduced from the example shown in D56281. This does raise a question as noted in the test file: do we want to handle this pattern? I don't have a motivating example for that on x86 yet, but it seems like we could have that pattern there too, so we could avoid the back-and-forth using a shuffle. llvm-svn: 351753	2019-01-21 17:30:14 +00:00
Matt Arsenault	bd3a5b29cb	GlobalISel: Verify G_BITCAST llvm-svn: 351594	2019-01-18 21:04:59 +00:00
Sanjin Sijaric	4d1450298c	Fix the buildbot failure introduced by r351404 EXPENSIVE_CHECKS buildbots are failing due to r351404. Add x1 as live in to the funclet basic block for SEH funclets, as well as -verify-machineinstrs to the test case that triggered the failure. llvm-svn: 351472	2019-01-17 20:24:14 +00:00
Sanjin Sijaric	b694030647	[ARM64][Windows] Share unwind codes between epilogues There are cases where we have multiple epilogues that have the exact same unwind code sequence. In that case, the epilogues can share the same unwind codes in the .xdata section. This should get us past the assert "SEH unwind data splitting not yet implemented" in many cases. We still need to add support for generating multiple .pdata/.xdata sections for those functions that need to be split into fragments. Differential Revision: https://reviews.llvm.org/D56813 llvm-svn: 351421	2019-01-17 09:45:17 +00:00
Sanjin Sijaric	685565ae9a	[SEH] [ARM64] Retrieve the frame pointer from SEH funclets The Windows ARM64 runtime passes the establisher frame to funclets as the first argument. llvm-svn: 351404	2019-01-17 00:24:38 +00:00
Mandeep Singh Grang	33c49c0c82	[COFF, ARM64] Implement support for SEH extensions __try/__except/__finally Summary: This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler. We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape. Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric Reviewed By: rnk, efriedma Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53540 llvm-svn: 351370	2019-01-16 19:52:59 +00:00
Aditya Nandakumar	500e3ead9f	[GISel]: Add support for CSEing continuously during GISel passes. https://reviews.llvm.org/D52803 This patch adds support to continuously CSE instructions during each of the GISel passes. It consists of a GISelCSEInfo analysis pass that can be used by the CSEMIRBuilder. llvm-svn: 351283	2019-01-16 00:40:37 +00:00
James Y Knight	693d39dd12	Remove irrelevant references to legacy git repositories from compiler identification lines in test-cases. (Doing so only because it's then easier to search for references which are actually important and need fixing.) llvm-svn: 351200	2019-01-15 16:18:52 +00:00
Evandro Menezes	f793fe1402	[AArch64] Adjust the feature set for Exynos Enable the fusion of arithmetic and logic instructions for Exynos M4. llvm-svn: 351149	2019-01-15 01:53:49 +00:00
Evandro Menezes	111033d917	[AArch64] Fix typo (NFC) Fix another typo, this time in the `RUN` line, which used a syntax not universally supported, in test case added by D56572. llvm-svn: 351144	2019-01-15 00:58:59 +00:00
Evandro Menezes	4b6c290fbd	[AArch64] Fix typo (NFC) Fix typo in test case added by D56572 (rL351139). llvm-svn: 351143	2019-01-15 00:20:57 +00:00
Eli Friedman	295e346da4	[EarlyIfConversion] Don't if-convert unconditional branches. A block ending in an unconditional branch can have two successors if one is a landing pad. In practice, I think this only has an effect on Windows because landing pads are never empty for Itanium unwinding. (Alternatively, I could add a check to AArch64InstrInfo::canInsertSelect, but this seems more obvious.) Differential Revision: https://reviews.llvm.org/D56468 llvm-svn: 351142	2019-01-15 00:19:46 +00:00
Eli Friedman	33aecc8182	[AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 . Otherwise, with D56544, the intrinsic will be expanded to an integer csel, which is probably not what the user expected. This matches the general convention of using "v1" types to represent scalar integer operations in vector registers. While I'm here, also add some error checking so we don't generate illegal ABS nodes. Differential Revision: https://reviews.llvm.org/D56616 llvm-svn: 351141	2019-01-15 00:15:24 +00:00
Evandro Menezes	bf59cb02c3	[AArch64] Add new target feature to fuse arithmetic and logic operations This feature enables the fusion of some arithmetic and logic instructions together. Differential revision: https://reviews.llvm.org/D56572 llvm-svn: 351139	2019-01-14 23:54:36 +00:00
Francis Visoiu Mistrih	b7cef81fd3	Replace "no-frame-pointer-" function attributes with "frame-pointer" Part of the effort to refactoring frame pointer code generation. We used to use two function attributes "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" to represent three kinds of frame pointer usage: (all) frames use frame pointer, (non-leaf) frames use frame pointer, (none) frame use frame pointer. This CL makes the idea explicit by using only one enum function attribute "frame-pointer" Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as llc. "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still supported for easy migration to "frame-pointer". tests are mostly updated with // replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’ grep -iIrnl '\-disable-fp-elim=false' \| xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g" // replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’ grep -iIrnl '\-disable-fp-elim' * \| xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g" Patch by Yuanfang Chen (tabloid.adroit)! Differential Revision: https://reviews.llvm.org/D56351 llvm-svn: 351049	2019-01-14 10:55:55 +00:00
Simon Pilgrim	ca0de0363b	[X86][AARCH64] Improve ISD::ABS support This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types. Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350998	2019-01-12 09:59:32 +00:00
Bryan Chan	7ce5775e62	[AArch64] Fix operation actions for FP16 vector intrinsics Summary: This patch changes the legalization action for some half-precision floating- point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops are not supported in hardware for half-precision vectors, but promotion is not always possible (for v8f16 operands). Changing the action to Expand fixes an assertion failure in the legalizer when the frontend produces such ops. In addition, a quick microbenchmark shows that, in the v4f16 case, expanding introduces fewer spills and is therefore slightly faster than promoting. Reviewers: t.p.northover, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56296 llvm-svn: 350825	2019-01-10 15:02:37 +00:00
Mandeep Singh Grang	859cb2e35d	[AArch64] Emit the correct MCExpr relocations specifiers like VK_ABS_G0, etc Summary: D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc. This patch adds the necessary enums and MCExpr needed for lowering these. Reviewers: rnk, mstorsjo, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56037 llvm-svn: 350798	2019-01-10 04:59:44 +00:00
Florian Hahn	7c122c1255	[AArch64] Add test for constant shrinking with multiple users (NFC). Test to avoid regression fixed by rL350684. llvm-svn: 350762	2019-01-09 21:04:36 +00:00
Francis Visoiu Mistrih	ac6454a7f6	[CodeGen] Ignore return sext/zext attributes of unused results for tail calls If the caller's return type does not have a zeroext attribute but the callee does a tail call zeroext, we won't consider the tail call during CodeGenPrepare because the attributes don't match. However, if the result of the tail call has no uses, it makes sense to drop the sext/zext attributes. Differential Revision: https://reviews.llvm.org/D56486 llvm-svn: 350753	2019-01-09 19:46:15 +00:00
Kristof Beyls	c650ff77eb	Initial AArch64 SLH implementation. This is an initial implementation for Speculative Load Hardening for AArch64. It builds on top of the recently introduced AArch64SpeculationHardening pass. This doesn't implement (yet) some of the optimizations implemented for the X86SpeculativeLoadHardening pass. I thought introducing the optimizations incrementally in follow-up patches should make this easier to review. Differential Revision: https://reviews.llvm.org/D55929 llvm-svn: 350729	2019-01-09 15:13:34 +00:00
Matt Arsenault	3dddb163dd	GlobalISel: Implement fewerElements for implicit_def llvm-svn: 350697	2019-01-09 07:51:52 +00:00
Matt Arsenault	befee402ff	GlobalISel: Implement widenScalar for implicit_def llvm-svn: 350695	2019-01-09 07:34:14 +00:00
Petr Pavlu	bf4fdecc51	[GlobalISel] Fix choice of instruction selector for AArch64 at -O0 with -global-isel=0 Commit rL347861 introduced an unintentional change in the behaviour when compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly disabling GlobalISel resulted in using FastISel but an updated condition in the commit changed it to using SelectionDAG. The patch fixes this condition and slightly better organizes the code that chooses the instruction selector. Fixes PR40131. Differential Revision: https://reviews.llvm.org/D56266 llvm-svn: 350626	2019-01-08 14:19:06 +00:00
Tim Northover	964eea7ad2	AArch64: avoid splitting vector truncating stores. We have code to split vector splats (of zero and non-zero) for performance reasons, but it ignores the fact that a store might be truncating. Actually, truncating stores are formed for vNi8 and vNi16 types. Since the truncation is from a legal type, the size of the store is always <= 64-bits and so they don't actually benefit from being split up anyway, so this patch just disables that transformation. llvm-svn: 350620	2019-01-08 13:30:27 +00:00
Ayonam Ray	e00606a1b2	Reversing the commit in revision 350186. Revision causes regression in 4 tests. llvm-svn: 350187	2019-01-01 07:28:55 +00:00
Ayonam Ray	c471bb2e67	Omit range checks from jump tables when lowering switches with unreachable default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Review Reference: D52002 llvm-svn: 350186	2019-01-01 06:37:50 +00:00
Roman Lebedev	da1df56e5d	NFC][CodeGen][X86][AArch64] Tests for bit extract (pat. a/c/d) with trunc (PR36419) llvm-svn: 350000	2018-12-22 10:38:05 +00:00
Roman Lebedev	c90611db06	[NFC][CodeGen][X86][AArch64] Bit extract: add nounwind attr to drop .cfi noise Forgot about that. llvm-svn: 349999	2018-12-22 09:58:13 +00:00
Roman Lebedev	29d8af283a	[NFC][CodeGen][X86][AArch64] Tests for bit extract (pat. b) with trunc (PR36419) @bextr64_32_b1 is extracted from hotpath of real-world code (RawSpeed BitStream<>::peekBitsNoFill()) after `clang -O3`. @bextr64_32_b2/@bextr64_32_b0 is the same pattern, but with trunc done last, showing how i think it can be handled: https://rise4fun.com/Alive/K4B https://rise4fun.com/Alive/qC9 It is possible that middle-end should do some of this, too. https://bugs.llvm.org/show_bug.cgi?id=36419 llvm-svn: 349998	2018-12-22 09:40:14 +00:00
Jessica Paquette	453ab1db5b	[GlobalISel][AArch64] Add support for widening G_FCEIL This adds support for widening G_FCEIL in LegalizerHelper and AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't support full FP 16. This also updates AArch64/f16-instructions.ll to show that we perform the correct transformation. llvm-svn: 349927	2018-12-21 17:05:26 +00:00
Jessica Paquette	a6b9c68a85	[GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback. Add it to the switch, and add a test verifying that this happens. llvm-svn: 349822	2018-12-20 21:14:15 +00:00
Amilendra Kodithuwakku	388bb86d7b	Test commit Fix a simple typo. llvm-svn: 349771	2018-12-20 16:44:26 +00:00
Amara Emerson	321bfb210a	Fix build errors introduced by r349712 on aarch64 bots. llvm-svn: 349723	2018-12-20 03:27:42 +00:00
Amara Emerson	8cb186ce17	[AArch64][GlobalISel] Implement selection og G_MERGE of two s32s into s64. This code pattern is an unfortunate side effect of the way some types get split at call lowering. Ideally we'd either not generate it at all or combine it away in the legalizer artifact combiner. Until then, add selection support anyway which is a significant proportion of our current fallbacks on CTMark. rdar://46491420 llvm-svn: 349712	2018-12-20 01:11:04 +00:00
Jessica Paquette	3560e93dc1	[GlobalISel][AArch64] Add support for @llvm.ceil This adds a G_FCEIL generic instruction and uses it in AArch64. This adds selection for floating point ceil where it has a supported, dedicated instruction. Other cases aren't handled here. It updates the relevant gisel tests and adds a select-ceil test. It also adds a check to arm64-vcvt.ll which ensures that we don't fall back when we run into one of the relevant cases. llvm-svn: 349664	2018-12-19 19:01:36 +00:00
Simon Pilgrim	6c95bea072	[TargetLowering] Fix propagation of undefs in zero extension ops (PR40091) As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero. SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue. Thanks to @dmgreen for catching this. Differential Revision: https://reviews.llvm.org/D55883 llvm-svn: 349625	2018-12-19 13:37:59 +00:00
Simon Pilgrim	73c8685295	[AARCH64] Added test case for PR40091 llvm-svn: 349543	2018-12-18 21:05:22 +00:00
Michael Berg	c6a5245cf7	Add FMF management to common fp intrinsics in GlobalIsel Summary: This the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE. Reviewers: aditya_nandakumar, bogner Reviewed By: bogner Subscribers: rovka, kristof.beyls, javed.absar Differential Revision: https://reviews.llvm.org/D55668 llvm-svn: 349514	2018-12-18 17:54:52 +00:00

1 2 3 4 5 ...

2438 Commits