llvm-project

Commit Graph

Author	SHA1	Message	Date
David Bolvansky	2fa7fb14ea	[DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed Summary: Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op. Patch by: sameconrad (Sam Conrad) Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar Reviewed By: lebedev.ri Subscribers: efriedma, kparzysz, llvm-commits Differential Revision: https://reviews.llvm.org/D47681 llvm-svn: 338270	2018-07-30 16:50:00 +00:00
Jessica Paquette	7816531f3c	Attempt to fix Windows test failure caused by r338133 It seems like the pass pipeline on Windows is slightly different than on Linux and macOS. As a result, the arm64-opt-remarks-lazy-bfi test has been failing. This switches a CHECK-NEXT to a CHECK-DAG to try and get this running properly again. It'd be nice to switch it back to a CHECK-NEXT if possible, but the CHECK-NEXT lines following the line we care about (the optimization remark emitter) do a pretty good job of enforcing the ordering we want. Hopefully this works, since I don't have a Windows machine. ;) Example failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11295 llvm-svn: 338267	2018-07-30 16:36:22 +00:00
Craig Topper	50b1d4303d	[DAGCombiner] Teach DAG combiner that A-(B-C) can be folded to A+(C-B) This can be useful since addition is commutable, and subtraction is not. This matches a transform that is also done by InstCombine. llvm-svn: 338181	2018-07-28 00:27:25 +00:00
Jessica Paquette	f90edbe3d6	Recommit "Enable MachineOutliner by default under -Oz for AArch64" Fixed the ASAN failure from before in r338148, so recommiting. This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338160	2018-07-27 20:18:27 +00:00
Sanjay Patel	06c7d5aef6	[AArch64, PowerPC, x86] add more signbit math tests; NFC The tests with a constant sub operand were added with rL338143, but the potential transform doesn't have that requirement, so adding more tests with variable operands. llvm-svn: 338150	2018-07-27 18:31:21 +00:00
Sanjay Patel	efac39eef6	[AArch64, PowerPC, x86] add more signbit math tests; NFC llvm-svn: 338143	2018-07-27 18:12:29 +00:00
Jessica Paquette	faea2d3130	Revert "Enable MachineOutliner by default under -Oz for AArch64" It failed an Asan test on a bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/21543/steps/check-llvm%20asan/logs/stdio Fixing that before recommitting. llvm-svn: 338136	2018-07-27 17:25:38 +00:00
Jessica Paquette	d4229b985c	Enable MachineOutliner by default under -Oz for AArch64 This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338133	2018-07-27 16:44:42 +00:00
Sanjay Patel	c7abb416dc	[DAGCombiner] fold 'not' with signbit math This is a follow-up suggested in D48970. Alive proofs: https://rise4fun.com/Alive/sII We can eliminate an instruction in the usual select-of-constants to bit hack transform by adjusting the add/sub with constant. This is always a win. There are more transforms that are likely wins, but they may need target hooks in case some targets do not benefit. This is another step towards making up for canonicalizing to select-of-constants in rL331486. llvm-svn: 338132	2018-07-27 16:42:55 +00:00
Sanjay Patel	f815bc658b	[AArch64] add more tests for signbit math; NFC llvm-svn: 338129	2018-07-27 16:21:56 +00:00
Matthias Braun	09810c9269	MacroFusion: Fix macro fusion with ExitSU failing in top-down scheduling When fusing instructions A and B, we must add all predecessors of B as predecessors of A to avoid instructions getting scheduling in between. There is a special case involving ExitSU: Every other node must be scheduled before it by design and we don't need to make this explicit in the graph, however when fusing with a different node we need to schedule every othere node before the fused node too and we need to make this explicit now: This patch adds a dependency from the fused node to all roots in the graph. Differential Revision: https://reviews.llvm.org/D49830 llvm-svn: 338046	2018-07-26 17:43:56 +00:00
Roman Lebedev	41ba5c1455	[DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle ule,ugt CondCodes. Summary: A follow-up for D49266 / rL337166. At least one of these cases is more canonical, so we really do have to handle it. https://godbolt.org/g/pkzP3X https://rise4fun.com/Alive/pQyhZZ We won't get to these cases with I1 being -1, as that will be constant-folded to true or false. I'm also not sure we actually hit the 'ule' case, but i think the worst think that could happen is that being dead code. Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49497 llvm-svn: 338044	2018-07-26 17:34:28 +00:00
Martin Storsjo	9dafd6f6d9	Revert "[COFF] Use comdat shared constants for MinGW as well" This reverts commit r337951. While that kind of shared constant generally works fine in a MinGW setting, it broke some cases of inline assembly that worked before: $ cat const-asm.c int MULH(int a, int b) { int rt, dummy; __asm__ ( "imull %3" :"=d"(rt), "=a"(dummy) :"a"(a), "rm"(b) ); return rt; } int func(int a) { return MULH(a, 1); } $ clang -target x86_64-win32-gnu -c const-asm.c -O2 const-asm.c:4:9: error: invalid variant '00000001' "imull %3" ^ <inline asm>:1:15: note: instantiated into assembly here imull __real@00000001(%rip) ^ A similar error is produced for i686 as well. The same test with a target of x86_64-win32-msvc or i686-win32-msvc works fine. llvm-svn: 338018	2018-07-26 10:48:20 +00:00
Amara Emerson	fdd089aa14	[GlobalISel] Fall back to SDISel for swifterror/swiftself attributes. We don't currently support these, fall back until we do. llvm-svn: 337994	2018-07-26 01:25:58 +00:00
Sanjay Patel	215dcbf4db	[SelectionDAG] try to convert funnel shift directly to rotate if legal If the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. This sidesteps the issue of custom lowering for rotates raised in PR38243: https://bugs.llvm.org/show_bug.cgi?id=38243 ...by only dealing with legal operations. llvm-svn: 337966	2018-07-25 21:38:30 +00:00
Sanjay Patel	f94c4c84e6	[AArch, PowerPC] add more tests for legal rotate ops; NFC llvm-svn: 337964	2018-07-25 21:25:50 +00:00
Martin Storsjo	ff33a95ed4	[COFF] Use comdat shared constants for MinGW as well GNU binutils tools have no problems with this kind of shared constants, provided that we actually hook it up completely in AsmPrinter and produce a global symbol. This effectively reverts SVN r335918 by hooking the rest of it up properly. This feature was implemented originally in SVN r213006, with no reason for why it can't be used for MinGW other than the fact that GCC doesn't do it while MSVC does. Differential Revision: https://reviews.llvm.org/D49646 llvm-svn: 337951	2018-07-25 18:35:42 +00:00
Martin Storsjo	d2662c32fb	[COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter In SVN r334523, the first half of comdat constant pool handling was hoisted from X86WindowsTargetObjectFile (which despite the name only was used for msvc targets) into the arch independent TargetLoweringObjectFileCOFF, but the other half of the handling was left behind in X86AsmPrinter::GetCPISymbol. With only half of the handling in place, inconsistent comdat sections/symbols are created, causing issues with both GNU binutils (avoided for X86 in SVN r335918) and with the MS linker, which would complain like this: fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4 Differential Revision: https://reviews.llvm.org/D49644 llvm-svn: 337950	2018-07-25 18:35:31 +00:00
Martin Storsjo	c2b701408e	[AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes This matches the structure used on X86 and ARM. This requires a little bit of duplication of the parts that are equal in both AArch64 COFF variants though. Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF didn't, but now they do. This makes AArch64 match X86 in how comdat is used for float constants for MinGW. Differential Revision: https://reviews.llvm.org/D49637 llvm-svn: 337755	2018-07-23 22:15:14 +00:00
Simon Pilgrim	bfb900d363	[DAGCombiner] Add rotate-extract tests Add new tests from D47681 to current codegen. Also added i686 codegen tests. llvm-svn: 337445	2018-07-19 09:27:34 +00:00
Roman Lebedev	5317e88300	[NFC][X86][AArch64][DAGCombine] More tests for optimizeSetCCOfSignedTruncationCheck() At least one of these cases is more canonical, so we really do have to handle it. https://godbolt.org/g/pkzP3X https://rise4fun.com/Alive/pQyh llvm-svn: 337400	2018-07-18 16:19:06 +00:00
Sanjay Patel	c71adc8040	[Intrinsics] define funnel shift IR intrinsics + DAG builder support As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" operation which may also be a single hardware instruction. Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working on that (much larger patch), but then I concluded that we don't need it. At least as a first step, we have all of the backend support necessary to match these ops...because it was required. And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are likely all that we'll ever need. There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes (along with improving the oversized shift documentation). Again, I don't think that's strictly necessary (as the test results here prove). That can be an efficiency improvement as a small follow-up patch. So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. Differential Revision: https://reviews.llvm.org/D49242 llvm-svn: 337221	2018-07-16 22:59:31 +00:00
Roman Lebedev	de506632aa	[X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But the IR-optimal patter does not lower efficiently, so we want to undo it.. This handles the simple pattern. There is a second pattern with predicate and constants inverted. NOTE: we do not check uses here. we always do the transform. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49266 llvm-svn: 337166	2018-07-16 12:44:10 +00:00
Sanjay Patel	a41c886c55	[DAGCombiner] extend(ifpositive(X)) -> shift-right (not X) This is almost the same as an existing IR canonicalization in instcombine, so I'm assuming this is a good early generic DAG combine too. The motivation comes from reduced bit-hacking for select-of-constants in IR after rL331486. We want to restore that functionality in the DAG as noted in the commit comments for that change and the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html The PPC and AArch tests show that those targets are already doing something similar. x86 will be neutral in the minimal case and generally better when this pattern is extended with other ops as shown in the signbit-shift.ll tests. Note the asymmetry: we don't include the (extend (ifneg X)) transform because it already exists in SimplifySelectCC(), and that is verified in the later unchanged tests in the signbit-shift.ll files. Without the 'not' op, the general transform to use a shift is always a win because that's a single instruction. Alive proofs: https://rise4fun.com/Alive/ysli Name: if pos, get -1 %c = icmp sgt i16 %x, -1 %r = sext i1 %c to i16 => %n = xor i16 %x, -1 %r = ashr i16 %n, 15 Name: if pos, get 1 %c = icmp sgt i16 %x, -1 %r = zext i1 %c to i16 => %n = xor i16 %x, -1 %r = lshr i16 %n, 15 Differential Revision: https://reviews.llvm.org/D48970 llvm-svn: 337130	2018-07-15 16:27:07 +00:00
Roman Lebedev	b64e74feed	[NFC][X86][AArch64] Negative tests for 'check for [no] signed truncation' pattern See D49247, D49266 I'm only adding the sane negative tests, and not adding the one-use tests yet. Also, not adding negative tests for the second pattern with inverted operands yet, since it's handling will be added in later differential. llvm-svn: 337014	2018-07-13 16:14:37 +00:00
Simon Pilgrim	9fe0bf3be7	[AArch64] Updated bigendian buildvector tests As suggested by @efriedma on D49262 - changed the extractelement to a store to prevent SimplifyDemandedVectorElts from simplifying the build vectors - this keeps the immediate generation which was the point of the tests. llvm-svn: 336981	2018-07-13 09:25:32 +00:00
Roman Lebedev	1574e49792	[NFC][X86][AArch64] Add tests for the 'check for [no] signed truncation' pattern Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But as it looks from these tests, i think we want to revert at least some cases in DAGCombine. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49247 llvm-svn: 336917	2018-07-12 17:00:11 +00:00
Joel E. Denny	9fa9c9368d	[FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47171 This commit drops that patch's changes to: llvm/test/CodeGen/NVPTX/f16x2-instructions.ll llvm/test/CodeGen/NVPTX/param-load-store.ll For some reason, the dos line endings there prevent me from commiting via the monorepo. A follow-up commit (not via the monorepo) will finish the patch. llvm-svn: 336843	2018-07-11 20:25:49 +00:00
Simon Pilgrim	f6ff75c4c2	Fix check-prefix vs check-prefixes typo in updated test llvm-svn: 336787	2018-07-11 10:42:51 +00:00
Simon Pilgrim	1975efe555	[AArch64] Regenerate SDIV tests Will make codegen diffs much easier to grok in a future patch llvm-svn: 336786	2018-07-11 10:39:50 +00:00
Daniel Sanders	9481399c0f	[globalisel][irtranslator] Add support for atomicrmw and (strong) cmpxchg Summary: This patch adds support for the atomicrmw instructions and the strong cmpxchg instruction to the IRTranslator. I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what difference it makes to the backend. As far as I can tell from the code, it only matters to AtomicExpandPass which is run at the LLVM-IR level. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D40092 llvm-svn: 336589	2018-07-09 19:33:40 +00:00
Yvan Roux	7382cf8225	[MachineOutliner] Add missing liveness tracking info in MIR test. This should bring the bots back to green state. llvm-svn: 336482	2018-07-07 08:42:31 +00:00
Nico Weber	038dbf3c24	Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084. llvm-svn: 336453	2018-07-06 17:37:24 +00:00
Diogo N. Sampaio	81e9dd1ed7	Commit rL336426 cause buildbot failures http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/50537/testReport/junit/LLVM/CodeGen_AArch64/FoldRedundantShiftedMasking_ll/ This removes the comments of the function label causing this error. llvm-svn: 336440	2018-07-06 14:41:09 +00:00
Diogo N. Sampaio	742bf1a255	[SelectionDAG] https://reviews.llvm.org/D48278 D48278 Allow to reduce redundant shift masks. For example: x1 = x & 0xAB00 x2 = (x >> 8) & 0xAB can be reduced to: x1 = x & 0xAB00 x2 = x1 >> 8 It only allows folding when the masks and shift values are constants. llvm-svn: 336426	2018-07-06 09:42:25 +00:00
Sanjay Patel	bce899ff59	[AArch64, PowerPC, x86] add tests for signbit bit hacks; NFC llvm-svn: 336348	2018-07-05 13:16:46 +00:00
Mikael Holmen	8505f34b29	Partial revert of "NFC - Various typo fixes in tests" This partially reverts r336268 since it causes buildbot failures. Added FIXME at the places where the CHECKs are misspelled. llvm-svn: 336323	2018-07-05 08:42:16 +00:00
Gabor Buella	da4a966e1c	NFC - Various typo fixes in tests llvm-svn: 336268	2018-07-04 13:28:39 +00:00
Amara Emerson	d912ffaba5	[AArch64][GlobalISel] Fix fallbacks introduced in r336120 due to unselectable stores. r336120 resulted in falling back to SelectionDAG more often due to the G_STORE MMOs not matching the vreg size. This fixes that by explicitly any-extending the value. llvm-svn: 336209	2018-07-03 15:59:26 +00:00
Amara Emerson	846f2436e8	[AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin. We currently don't any-extend vararg parameters before storing them to the stack locations on Darwin. However, SelectionDAG however does this, and so user code is in the wild which inadvertently relies on this extension. This can manifest in cases where the value stored is (int)0, but the actual parameter is interpreted by va_arg as a pointer, and so not extending to 64 bits causes the callee to load additional undefined bits. llvm-svn: 336120	2018-07-02 16:39:09 +00:00
Jessica Paquette	8bda1881ca	[MachineOutliner] Add support for target-default outlining. This adds functionality to the outliner that allows targets to specify certain functions that should be outlined from by default. If a target supports default outlining, then it specifies that in its TargetOptions. In the case that it does, and the user hasn't specified that they never want to outline, the outliner will be added to the pass pipeline and will run on those default functions. This is a preliminary patch for turning the outliner on by default under -Oz for AArch64. https://reviews.llvm.org/D48776 llvm-svn: 336040	2018-06-30 03:56:03 +00:00
Jessica Paquette	79917b9686	[MachineOutliner] Add always and never options to -enable-machine-outliner This is a recommit of r335887, which was erroneously committed earlier. To enable the MachineOutliner by default on AArch64, we need to be able to disable the MachineOutliner and also provide an option to "always" enable the outliner. This adds that capability. It allows the user to still use the old -enable-machine-outliner option, which defaults to "always". This is building up to allowing the user to specify "always" versus the target default outlining behaviour. https://reviews.llvm.org/D48682 llvm-svn: 335986	2018-06-29 16:12:45 +00:00
Jessica Paquette	0c5d3ffbb8	[MachineOutliner] Never add the outliner in -O0 This is a recommit of r335879. We shouldn't add the outliner when compiling at -O0 even if -enable-machine-outliner is passed in. This makes sure that we don't add it in this case. This also removes -O0 from the outliner DWARF test. llvm-svn: 335930	2018-06-28 21:49:24 +00:00
Jessica Paquette	d6261bef7b	Revert "[MachineOutliner] Add always and never options to -enable-machine-outliner" I accidentally committed this instead of D48683 because I haven't had coffee yet. llvm-svn: 335883	2018-06-28 17:26:19 +00:00
Jessica Paquette	f3a44fe833	Revert "[MachineOutliner] Never add the outliner in -O0" This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091. It relies on r335872 since that introduces the machine outliner flags test. I meant to commit D48683 in that commit, but got mixed up and committed D48682 instead. So, I'm reverting this and r335872, since D48682 hasn't made it through review yet. llvm-svn: 335882	2018-06-28 17:26:18 +00:00
Jessica Paquette	c9d675266e	[MachineOutliner] Never add the outliner in -O0 We shouldn't add the outliner when compiling at -O0 even if -enable-machine-outliner is passed in. This makes sure that we don't add it in this case. This also updates machine-outliner-flags to reflect the change and improves the comment describing what that test does. llvm-svn: 335879	2018-06-28 17:05:57 +00:00
Jessica Paquette	1ccb66c5fb	[MachineOutliner] Add always and never options to -enable-machine-outliner To enable the MachineOutliner by default on AArch64, we need to be able to disable the MachineOutliner and also provide an option to "always" enable the outliner. This adds that capability. It allows the user to still use the old -enable-machine-outliner option, which defaults to "always". This is building up to allowing the user to specify "always" versus the target-default outlining behaviour. llvm-svn: 335872	2018-06-28 16:39:42 +00:00
Daniel Sanders	bdeb880d14	[globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64 Now that we have the ability to legalize based on MMO's. Add support for legalizing based on AtomicOrdering and use it to correct the legalization of the atomic instructions. Also extend all() to be a variadic template as this ruleset now requires 3 and 4 argument versions. llvm-svn: 335767	2018-06-27 19:03:21 +00:00
Sanjay Patel	d052de856d	[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not: #include <stdio.h> int main(int argc) { float x; x = -0.8 * argc; printf("%f\n", (float)((int)x)); return 0; } $ clang -O0 -mavx fp.c ; ./a.out 0.000000 $ clang -O1 -mavx fp.c ; ./a.out -0.000000 Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option. Differential Revision: https://reviews.llvm.org/D48085 llvm-svn: 335761	2018-06-27 18:16:40 +00:00
Jessica Paquette	f472f6159a	[MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across It isn't safe to outline sequences of instructions where x16/x17/nzcv live across the sequence. This teaches the outliner to check whether or not a specific canidate has x16/x17/nzcv live across it and discard the candidate in the case that that is true. https://bugs.llvm.org/show_bug.cgi?id=37573 https://reviews.llvm.org/D47655 llvm-svn: 335758	2018-06-27 17:43:27 +00:00
Luke Geeson	316327150b	[AArch64] Reverting FP16 vcvth_n_s64_f16 to fix llvm-svn: 335737	2018-06-27 14:34:40 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int src, int width, unsigned char dst) { for (int i = 0; i < width; i++) dst++ = src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Luke Geeson	68cb233c0f	[AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns llvm-svn: 335715	2018-06-27 09:20:13 +00:00
Vedant Kumar	b725c69f12	[SelectionDAG] Remove debug locations from ConstantSD(FP)Nodes This removes debug locations from ConstantSDNode and ConstantSDFPNode. When this kind of node is materialized we no longer create a line table entry which jumps back to the constant's first point of use. This makes single-stepping behavior smoother, and it matches the model used by IR, where Constants have no locations. See this thread for more context: http://lists.llvm.org/pipermail/llvm-dev/2018-June/124164.html I'd like to handle constant BuildVectorSDNodes and to try to eliminate passing SDLocs to SelectionDAG::getConstant*() in follow-up commits. Differential Revision: https://reviews.llvm.org/D48468 llvm-svn: 335497	2018-06-25 17:06:18 +00:00
Aditya Nandakumar	e2a7f31064	[GISel]: Add G_ADDRSPACE_CAST Opcode Added IRTranslator support for addrspacecast. https://reviews.llvm.org/D48469 reviewed by: volkan llvm-svn: 335388	2018-06-22 20:58:51 +00:00
Sirish Pande	b60acb9e48	Revert "[AArch64] Coalesce Copy Zero during instruction selection" This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7. Original Commit: Author: Haicheng Wu <haicheng@codeaurora.org> Date: Sun Feb 18 13:51:33 2018 +0000 [AArch64] Coalesce Copy Zero during instruction selection Add special case for copy of zero to avoid a double copy. Differential Revision: https://reviews.llvm.org/D36104 Author's intention is to remove a BB that has one mov instruction. In order to do that, d8f571050 pessmizes MachineSinking by introducing a copy, such that mov instruction is NOT moved to the BB. Optimization downstream gets rid of the BB with only mov instruction. This works well if we have only one fall through branch as there is only one "extra" mov instruction. If we have multiple fall throughs, we will have a lot of redundant movs. In such a case, it's better to have this BB which has one mov instruction. This is causing degradation in jpeg, fft and other codebases. I believe if we want to remove a BB with only one branch instruction, we should not pessimize Machine Sinking at all, and find some other solution. llvm-svn: 335251	2018-06-21 16:05:24 +00:00
Tim Northover	70666e7765	[AArch64] Implement FLT_ROUNDS macro. Very similar to ARM implementation, just maps to an MRS. Should fix PR25191. Patch by Michael Brase. llvm-svn: 335118	2018-06-20 12:09:01 +00:00
Michael Berg	7b993d762f	Utilize new SDNode flag functionality to expand current support for fadd Summary: This patch originated from D46562 and is a proper subset, with some issues addressed. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar Reviewed By: spatel Subscribers: wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D47909 llvm-svn: 334996	2018-06-18 23:44:59 +00:00
Michael Berg	8e570c3390	Utilize new SDNode flag functionality to expand current support for fma Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai Reviewed By: rampitec, nhaehnle Subscribers: tpr, nemanjai, wdng Differential Revision: https://reviews.llvm.org/D47918 llvm-svn: 334876	2018-06-16 00:03:06 +00:00
Justin Bogner	3b83edb037	Re-apply "[VirtRegRewriter] Avoid clobbering registers when expanding copy bundles" This is r334750 (which was reverted in r334754) with a fix for an uninitialized variable that was caught by msan. Original commit message: > If a copy bundle happens to involve overlapping registers, we can end > up with emitting the copies in an order that ends up clobbering some > of the subregisters. Since instructions in the copy bundle > semantically happen at the same time, this is incorrect and we need to > make sure we order the copies such that this doesn't happen. llvm-svn: 334756	2018-06-14 19:24:03 +00:00
Justin Bogner	36c7f40f20	Revert "[VirtRegRewriter] Avoid clobbering registers when expanding copy bundles" There's an msan failure: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/19549 This reverts r334750. llvm-svn: 334754	2018-06-14 19:10:57 +00:00
Justin Bogner	866d9f02be	[VirtRegRewriter] Avoid clobbering registers when expanding copy bundles If a copy bundle happens to involve overlapping registers, we can end up with emitting the copies in an order that ends up clobbering some of the subregisters. Since instructions in the copy bundle semantically happen at the same time, this is incorrect and we need to make sure we order the copies such that this doesn't happen. Differential Revision: https://reviews.llvm.org/D48154 llvm-svn: 334750	2018-06-14 18:32:55 +00:00
Sanjay Patel	7d4929611c	[DAGCombiner] remove hasOneUse() check from fadd constants transform We're constant folding here, so we shouldn't check uses. This matches the IR optimizer behavior. The x86 test shows the expected win. The AArch64 test shows something else. This only seems to happen if the "generic" AArch64 CPU model is used by MachineCombiner, so I'll file a bug report to follow-up. llvm-svn: 334608	2018-06-13 15:22:48 +00:00
Sanjay Patel	82bc842650	[AArch64] add tests for fadd with more than one use; NFC llvm-svn: 334556	2018-06-12 22:50:37 +00:00
Petr Hosek	7250908016	[AArch64] Support reserving x20 register Register x20 is a callee-saved register which may be used for other purposes in certain contexts, for example to hold special variables within the kernel. This change adds support for reserving this register both to frontend and backend to make this register usable for these purposes. Differential Revision: https://reviews.llvm.org/D46552 llvm-svn: 334531	2018-06-12 20:00:50 +00:00
Roman Tereshin	b2d3f2e5da	[MIR][MachineCSE] Implementing proper MachineInstr::getNumExplicitDefs() Apparently, MachineInstr class definition as well as pretty much all of the machine passes assume that the only kind of MachineInstr's operands that is variadic for variadic opcodes is explicit non-definitions. In particular, this assumption is made by MachineInstr::defs(), uses(), and explicit_uses() methods, as well as by MachineCSE pass. The assumption is incorrect judging from at least TableGen backend implementation, that recognizes variable_ops in OutOperandList, and the very existence of G_UNMERGE_VALUES generic opcode, or ARM load multiple instructions, all of which have variadic defs. In particular, MachineCSE pass breaks MIR with CSE'able G_UNMERGE_VALUES instructions in it. This commit implements MachineInstr::getNumExplicitDefs() similar to pre-existing MachineInstr::getNumExplicitOperands(), fixes MachineInstr::defs(), uses(), and explicit_uses(), and fixes MachineCSE pass. As the issue addressed seems to affect only machine passes that could be ran mid-GlobalISel pipeline at the moment, the other passes aren't fixed by this commit, like MachineLICM: that could be done on per-pass basis when (if ever) they get adopted for GlobalISel. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D45640 llvm-svn: 334520	2018-06-12 18:30:37 +00:00
Michael Berg	95f3a430a8	NFC, some additional tests added and some renaming for planned fma support changes llvm-svn: 334461	2018-06-12 00:52:43 +00:00
Roman Lebedev	cb56f7a550	[NFC][X86][AArch64] Reorganize/cleanup BZHI test patterns Summary: In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation. As it is seen from these tests, there is a reason for that. AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!). The other way around for X86. It would be much better to canonicalize. It would seem that there is too much tests, but this is most of all the auto-generated possible variants of C code that one would expect for BZHI to be formed, and then manually cleaned up a bit. So this should be pretty representable, which somewhat good coverage... Related links: https://bugs.llvm.org/show_bug.cgi?id=36419 https://bugs.llvm.org/show_bug.cgi?id=37603 https://bugs.llvm.org/show_bug.cgi?id=37610 https://rise4fun.com/Alive/idM Reviewers: javed.absar, craig.topper, RKSimon, spatel Reviewed By: RKSimon Subscribers: kristof.beyls, llvm-commits, RKSimon, craig.topper, spatel Differential Revision: https://reviews.llvm.org/D47452 llvm-svn: 334124	2018-06-06 19:38:10 +00:00
Evandro Menezes	b2c8244715	[AArch64, ARM] Add support for Samsung Exynos M4 Create a separate feature set for Exynos M4 and add test cases. llvm-svn: 334115	2018-06-06 18:56:00 +00:00
David Green	25312b2b6c	[GlobalMerge] Set the alignment on merged global structs If no alignment is set, the abi/preferred alignment of structs will be used which may be higher than required. This can lead to extra padding and in the end an increase in data size. Differential Revision: https://reviews.llvm.org/D47633 llvm-svn: 334099	2018-06-06 14:48:32 +00:00
Francis Visoiu Mistrih	ca69b3bf6d	[ShrinkWrap] Add optimization remarks to the shrink-wrapping pass Start by emitting remarks for very basic unsupported cases such as irreducible CFGs and EHFunclets. The end goal is to be able to cover all the cases where we give up with an explanation. llvm-svn: 333972	2018-06-05 00:27:24 +00:00
Amara Emerson	5a3bb68e12	[AArch64][GlobalISel] Zero-extend s1 values when returning. Before we were relying on the any extend of the s1 to s32, but for AAPCS we need to zero-extend it to at least s8. Fixes PR36719 Differential Revision: https://reviews.llvm.org/D47425 llvm-svn: 333747	2018-06-01 13:20:32 +00:00
Luke Geeson	2e09995d42	[AArch64] Reverted rL333427 fixing Clang UnitTest Failure llvm-svn: 333634	2018-05-31 08:27:53 +00:00
Roman Tereshin	5a65eb75c7	[GlobalISel][AArch64] LegalizerInfo verifier: Fixing bugs exposed by LegalizerInfo::verify(...) Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333618	2018-05-31 01:56:05 +00:00
Roman Tereshin	8f1753e994	[GlobalISel][AArch64] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call w/o fixing bugs This is to make it clear what kind of bugs the LegalizerInfo::verifier is able to catch and test its output Reviewers: aemerson, qcolombet Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D46338 llvm-svn: 333597	2018-05-30 22:10:04 +00:00
Evandro Menezes	f8425340e4	[AArch64] Fix PR32384: bump up the number of stores per memset and memcpy As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change makes the inlining of `memset()` and `memcpy()` more aggressive when compiling for speed. The tuning remains the same when optimizing for size. Patch by: Sebastian Pop <s.pop@samsung.com> Evandro Menezes <e.menezes@samsung.com> Differential revision: https://reviews.llvm.org/D45098 llvm-svn: 333429	2018-05-29 15:58:50 +00:00
Amara Emerson	d5a9e7bbc9	Revert "[AArch64] added FP16 vcvth intrinsic support" This reverts commit r333410 due to bot failures. llvm-svn: 333427	2018-05-29 15:34:22 +00:00
Luke Geeson	16092ab3c5	[AArch64] added FP16 vcvth intrinsic support Summary: Change-Id: I0df845749c7689dfc99150ba7c19c7d0dadbd705 Reviewers: javed.absar, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: llvm-commits, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46311 llvm-svn: 333410	2018-05-29 11:40:33 +00:00
Eli Friedman	9e177882aa	[AArch64] Improve orr+movk sequences for MOVi64imm. The existing code has three different ways to try to lower a 64-bit immediate to the sequence ORR+MOVK. The result is messy: it misses some possible sequences, and the order of the checks means we sometimes emit two MOVKs when we only need one. Instead, just use a simple loop to try all possible two-instruction ORR+MOVK sequences. Differential Revision: https://reviews.llvm.org/D47176 llvm-svn: 333218	2018-05-24 19:38:23 +00:00
Geoff Berry	98150e3a62	[AArch64] Take advantage of variable shift/rotate amount implicit mod operation. Summary: Optimize code generated for variable shifts/rotates by taking advantage of the implicit and/mod done on the variable shift amount register. Resolves bug 27582 and bug 37421. Reviewers: t.p.northover, qcolombet, MatzeB, javed.absar Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46844 llvm-svn: 333214	2018-05-24 18:29:42 +00:00
Eli Friedman	042dc9e092	[MachineOutliner] Add "thunk" outlining for AArch64. When we're outlining a sequence that ends in a call, we can save up to three instructions in the outlined function by turning the call into a tail-call. I refer to this as thunk outlining because the resulting outlined function looks like a thunk; suggestions welcome for a better name. In addition to making the outlined function shorter, thunk outlining allows outlining calls which would otherwise be illegal to outline: we don't need to save/restore LR, so we don't need to prove anything about the stack access patterns of the callee. To make this work effectively, I also added MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner code; this allows treating an arbitrary instruction as a terminator in the suffix tree. Differential Revision: https://reviews.llvm.org/D47173 llvm-svn: 333015	2018-05-22 19:11:06 +00:00
Sanjay Patel	17a870f07c	[DAG] fold FP binops with undef operands to NaN This is the FP sibling of D43141 with the corresponding IR change in rL327212. We can't propagate undef here because if a variable operand is a NaN, these binops must propagate NaN. Neither global nor node-level fast-math makes a difference. If we have 'nnan', I think later folds can turn the NaN into undef. The tests in X86/fp-undef.ll are meant to be the definitive verification for these folds - everything reduces identically now. The other test changes are collateral damage. They may need to be altered to preserve their intent. Differential Revision: https://reviews.llvm.org/D47026 llvm-svn: 332920	2018-05-21 23:54:19 +00:00
Roman Lebedev	7772de25d0	[DAGCombine][X86][AArch64] Masked merge unfolding: vector edition. Summary: This appears to be the last missing piece for the masked merge pattern handling in the backend. This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 \| PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`), and we need to make sure that they are generated. Differential Revision: https://reviews.llvm.org/D46528 llvm-svn: 332904	2018-05-21 21:41:02 +00:00
Roman Lebedev	fd79bc3aa2	[X86][AArch64][NFC] Add tests for vector masked merge unfolding Summary: This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 \| PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`). Differential Revision: https://reviews.llvm.org/D46008 llvm-svn: 332903	2018-05-21 21:40:51 +00:00
Haicheng Wu	69ba0613f2	[GlobalMerge] Exit early if only one global is to be merged To save some compilation time and prevent some unnecessary changes. Differential Revision: https://reviews.llvm.org/D46640 llvm-svn: 332813	2018-05-19 18:00:02 +00:00
Eli Friedman	4081a57af7	[MachineOutliner] Count savings from outlining in bytes. Counting the number of instructions is both unintuitive and inaccurate. On AArch64, this only affects the generated remarks and certain rare pseudo-instructions, but it will have a bigger impact on other targets. Differential Revision: https://reviews.llvm.org/D46921 llvm-svn: 332685	2018-05-18 01:52:16 +00:00
Sanjay Patel	ffd349577d	[AArch64] preserve test intent by removing undef We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. Follow-up to: https://reviews.llvm.org/rL332534 ...because that change wasn't enough. llvm-svn: 332636	2018-05-17 18:07:02 +00:00
Sanjay Patel	60fe206793	[AArch64] preserve test intent by removing undef We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332534	2018-05-16 21:57:57 +00:00
Eli Friedman	ddbf6d6514	[MachineOutliner] Don't outline instructions that modify SP. This breaks the code which saves and restores LR, so we can't outline without doing something more complicated for stack adjustment. Found by inspection; we get lucky in most cases because getMemOpInfo only handles STRWpost, not any other pre/post-increment forms. But it hits a couple of artificial testcases in the tree. Differential Revision: https://reviews.llvm.org/D46920 llvm-svn: 332529	2018-05-16 21:20:16 +00:00
Eli Friedman	02709bcb78	[MachineOutliner] Don't save/restore LR for tail calls. The cost computation assumes we do this correctly, but the actual lowering was wrong. Differential Revision: https://reviews.llvm.org/D46923 llvm-svn: 332514	2018-05-16 19:49:01 +00:00
Sirish Pande	cabe50a308	[AArch64] Gangup loads and stores for pairing. Keep loads and stores together (target defines how many loads and stores to gang up), such that it will help in pairing and vectorization. Differential Revision https://reviews.llvm.org/D46477 llvm-svn: 332482	2018-05-16 15:36:52 +00:00
Amara Emerson	0d6a26dffc	[GlobalISel][IRTranslator] Split aggregates during IR translation. We currently handle all aggregates by creating one large LLT, and letting the legalizer deal with splitting them up. However using this approach means that we can't support big endian code correctly. This patch changes the way that the IRTranslator deals with aggregate values, by splitting them up into their constituent element values. To do this, parts of the translator need to be modified to deal with multiple VRegs for a single Value. A new Value to VReg mapper is introduced to help keep compile time under control, currently there is no measurable impact on CTMark despite the extra code being generated in some cases. Patch is based on the original work of Tim Northover. Differential Revision: https://reviews.llvm.org/D46018 llvm-svn: 332449	2018-05-16 10:32:02 +00:00
Peter Smith	c811758da6	[AArch64] Support "S" inline assembler constraint This patch re-introduces the "S" inline assembler constraint. This matches an absolute symbolic address or a label reference. The primary use case is asm("adrp %0, %1\n\t" "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var)); I say re-introduces as it seems like "S" was implemented in the original AArch64 backend, but it looks like it wasn't carried forward to the merged backend. The original implementation had A and L modifiers that could be used to print ":lo12:" to the string. It looks like gcc doesn't use these and :lo12: is expected to be written in the inline assembly string so I've not implemented A and L. Clang already supports the S modifier. Fixes PR37180 Differential Revision: https://reviews.llvm.org/D46745 llvm-svn: 332444	2018-05-16 09:33:25 +00:00
Eli Friedman	25bef201c5	[MachineOutliner] Add optsize markings to outlined functions. It doesn't matter much this late in the pipeline, but one place that does check for it is the function alignment code. Differential Revision: https://reviews.llvm.org/D46373 llvm-svn: 332415	2018-05-15 23:36:46 +00:00
Evandro Menezes	8d522d811a	[AArch64] Improve single vector lane unscaled stores When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead. In this case, also using the unscaled offset. Differential revision: https://reviews.llvm.org/D46762 llvm-svn: 332394	2018-05-15 20:41:12 +00:00
Geoff Berry	32d07d59f5	[AArch64] Fix mir test case liveins info. The test case added in r332265 had incomplete livein information which was caught by the EXPENSIVE_CHECKS bot. Fix the livein information and add -verify-machineinstrs to the test case. llvm-svn: 332367	2018-05-15 16:27:34 +00:00
Sanjay Patel	8652c53d29	[DAG] propagate FMF for all FPMathOperators This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. It gets us most of the FMF functionality that we want without adding any state bits to the flags. It also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch. It should provide a superset of the functionality from D46563 - the extra tests show propagation and codegen diffs for fcmp, vecreduce, and FP libcalls. The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, so that shouldn't make any difference. Differential Revision: https://reviews.llvm.org/D46854 llvm-svn: 332358	2018-05-15 14:16:24 +00:00
Sanjay Patel	165587b424	[AArch64] enhance test to show FMF loss; NFC llvm-svn: 332301	2018-05-14 21:53:21 +00:00
Geoff Berry	64a2ea41ea	[BranchFolding] Allow hoisting to block with a single conditional branch. Summary: The BranchFolding pass is currently missing opportunities to hoist common code if the hoisted-to block contains a single conditional branch that has register uses. This occurs somewhat frequently on AArch64 with CBZ/TBZ opcodes. This change also eliminates some code differences when debug info is present since the presence of e.g. DBG_VALUE instructions in the hoisted-to block can enable hoisting that wouldn't have occurred without them. Reviewers: MatzeB, rnk, kparzysz, twoh, aprantl, javed.absar Subscribers: kristof.beyls, JDevlieghere, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46324 llvm-svn: 332265	2018-05-14 17:31:18 +00:00
Evandro Menezes	14fa2e4fa5	[AArch64] Improve single vector lane stores When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead. Differential revision: https://reviews.llvm.org/D46655 llvm-svn: 332251	2018-05-14 15:26:35 +00:00
Vedant Kumar	fd340a4047	[DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N) This teaches tryToFoldExtOfLoad to set the right location on a newly-created extload. With that in place, the logic for performing a certain ([s\|z]ext (load ...)) combine becomes identical for sexts and zexts, and we can get rid of one copy of the logic. The test case churn is due to dependencies on IROrders inherited from the wrong SDLoc. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46158 llvm-svn: 332118	2018-05-11 18:40:08 +00:00
Geoff Berry	60460268c0	[AArch64] Fix performPostLD1Combine to check for constant lane index. Summary: performPostLD1Combine in AArch64ISelLowering looks for vector insert_vector_elt of a loaded value which it can optimize into a single LD1LANE instruction. The code checking for the pattern was not checking if the lane index was a constant which could cause two problems: - an assert when lowering the LD1LANE ISD node since it assumes an constant operand - an assert in isel if the lane index value depends on the post-incremented base register Both of these issues are avoided by simply checking that the lane index is a constant. Fixes bug 35822. Reviewers: t.p.northover, javed.absar Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46591 llvm-svn: 332103	2018-05-11 16:25:06 +00:00
Roman Tereshin	6d26638c90	[GlobalISel][Legalizer] Widening the second src op of shifts bug fix The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its value as a (small) unsigned integer, therefore its incorrect to widen it in any way but by zero extending it. G_SHL was using G_ANYEXT and G_ASHR - G_SEXT (which is correct for their destination and first source operands, but not the "number of bits to shift" operand). Generally, shifts aren't as similar to regular binary operations as it might seem, for instance, they aren't commutative nor associative and the second source operand usually requires a special treatment. Reviewers: bogner, javed.absar, aivchenk, rovka Reviewed By: bogner Subscribers: igorb, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D46413 llvm-svn: 331926	2018-05-09 21:43:30 +00:00
Roman Tereshin	d5fa9fde58	Reapplying r331819 [GlobalISel][Legalizer] More concise and faster widenScalar, NFC The commit was a suspect for clang-cmake-aarch64-global-isel and clang-cmake-aarch64-quick bot failures, proved to be innocent. llvm-svn: 331898	2018-05-09 17:28:18 +00:00
Daniel Sanders	618437459c	Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Reverting this to see if the clang-cmake-aarch64-global-isel and clang-cmake-aarch64-quick bots are failing because of this commit. We know it wasn't r331819. llvm-svn: 331846	2018-05-09 05:00:17 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Roman Tereshin	27bba4495a	Revert r331819 [GlobalISel][Legalizer] More concise and faster widenScalar, NFC Reverting this to see if the clang-cmake-aarch64-global-isel and clang-cmake-aarch64-quick bots are failing because of this commit llvm-svn: 331839	2018-05-09 01:43:12 +00:00
Roman Tereshin	25cbfe680e	[GlobalISel][Legalizer] More concise and faster widenScalar, NFC Refactoring LegalizerHelper::widenScalar member function reducing its size by approximately a factor of 2 and (hopefuly) making it more straightforward and regular by introducing widenScalarSrc and widenScalarDst helper methods. The new widenScalar* methods mutate the instructions in place instead of recreating them from scratch and removing the originals. The compile time implications of this were measured on sqlite3 amalgamation, targeting AArch64 in -O0: LegalizerHelper::widenScalar: > 25% faster Legalizer::runOnMachineFunction: ~ 4.0 - 4.5% faster Also adding MachineOperand::setCImm and refactoring out MachineIRBuilder::recordInsertion methods to make the change possible. Reviewers: aditya_nandakumar, bogner, javed.absar, t.p.northover, ab, dsanders, arsenm Reviewed By: aditya_nandakumar Subscribers: wdng, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D46414 llvm-svn: 331819	2018-05-08 22:53:09 +00:00
Daniel Sanders	d24dcdd1f7	[globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Summary: Depends on D45541 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson Reviewed By: aemerson Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45543 llvm-svn: 331816	2018-05-08 22:26:39 +00:00
Roman Lebedev	0412c90236	[DAGCombine][NFC] Masked merge unfolding: comment: some tests are non-canonical As requested in https://reviews.llvm.org/D46494#inline-407282 llvm-svn: 331650	2018-05-07 16:42:47 +00:00
Daniel Sanders	acc008cb0c	[globalisel] Remove redundant -global-isel option from tests that use -run-pass. NFC As Roman Tereshin pointed out in https://reviews.llvm.org/D45541, the -global-isel option is redundant when -run-pass is given. -global-isel sets up the GlobalISel passes in the pass manager but -run-pass skips that entirely and configures it's own pipeline. llvm-svn: 331603	2018-05-05 21:19:59 +00:00
Daniel Sanders	f84bc3793e	[globalisel] Update GlobalISel emitter to match new representation of extending loads Summary: Previously, a extending load was represented at (G_EXT (G_LOAD x)). This had a few drawbacks: G_LOAD had to be legal for all sizes you could extend from, even if registers didn't naturally hold those sizes. * All sizes you could extend from had to be allocatable just in case the extend went missing (e.g. by optimization). * At minimum, G_EXT and G_TRUNC had to be legal for these sizes. As we improve optimization of extends and truncates, this legality requirement would spread without considerable care w.r.t when certain combines were permitted. The SelectionDAG importer required some ugly and fragile pattern rewriting to translate patterns into this style. This patch changes the representation to: * (G_[SZ]EXTLOAD x) * (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits() which resolves these issues by allowing targets to work entirely in their native register sizes, and by having a more direct translation from SelectionDAG patterns. Each extending load can be lowered by the legalizer into separate extends and loads, however a target that supports s1 will need the any-extending load to extend to at least s8 since LLVM does not represent memory accesses smaller than 8 bit. The legalizer can widenScalar G_LOAD into an any-extending load but sign/zero-extending loads need help from something else like a combiner pass. A follow-up patch that adds combiner helpers for for this will follow. The new representation requires that the MMO correctly reflect the memory access so this has been corrected in a couple tests. I've also moved the extending loads to their own tests since they are (mostly) separate opcodes now. Additionally, the re-write appears to have invalidated two tests from select-with-no-legality-check.mir since the matcher table no longer contains loads that result in s1's and they aren't legal in AArch64 anymore. Depends on D45540 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar Reviewed By: rtereshin Subscribers: javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45541 llvm-svn: 331601	2018-05-05 20:53:24 +00:00
Roman Lebedev	a3b0b59f54	[DAGCombiner] Masked merge: don't touch "not" xor's. Summary: Split off form D46031. It seems we don't want to transform the pattern if the `xor`'s are actually `not`'s. In vector case, this breaks `andnpd` / `vandnps` patterns. That being said, we may want to re-visit this `not` handling, maybe in D46073. Reviewers: spatel, craig.topper, javed.absar Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46492 llvm-svn: 331595	2018-05-05 15:45:40 +00:00
Geoff Berry	8e4958e760	[MachineLICM] Debug intrinsics shouldn't affect hoist decisions Summary: When checking if an instruction stores to a given frame index, check that the instruction can write to memory before looking at the memory operands list to avoid e.g. DBG_VALUE instructions that reference a frame index preventing a load from that index from being hoisted. Reviewers: dblaikie, MatzeB, qcolombet, reames, javed.absar Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46284 llvm-svn: 331549	2018-05-04 19:25:09 +00:00
Adhemerval Zanella	6d56294c7f	[AArch64] Add missing testcase for r331522 llvm-svn: 331541	2018-05-04 17:21:26 +00:00
Adhemerval Zanella	a57ef17ab6	[AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32 This patch adds a custom lowering for ISD::MULH{S,U} used on divide by constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV). New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL can be correctly lowered to smull2 and umull2. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46009 llvm-svn: 331522	2018-05-04 14:33:55 +00:00
Piotr Padlewski	5dde809404	Rename invariant.group.barrier to launder.invariant.group Summary: This is one of the initial commit of "RFC: Devirtualization v2" proposal: https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing Reviewers: rsmith, amharc, kuhar, sanjoy Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45111 llvm-svn: 331448	2018-05-03 11:03:01 +00:00
Eli Friedman	763d161eda	[AArch64] Add more tests for 64-bit immediate lowering. This adds a some more tests, and adds some notes to tests which are using a suboptimal lowering. The constants with suboptimal lowerings seem to be relatively rare in practice, but it might be a fun project to work on improvements. llvm-svn: 331304	2018-05-01 20:00:14 +00:00
Vedant Kumar	ee4bfcaa5a	[DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N) Setting the right SDLoc on a newly-created zextload fixes a line table bug which resulted in non-linear stepping behavior. Several backend tests contained CHECK lines which relied on the IROrder inherited from the wrong SDLoc. This patch breaks that dependence where feasbile and regenerates test cases where not. In some cases, changing a node's IROrder may alter register allocation and spill behavior. This can affect performance. I have chosen not to prevent this by applying a "known good" IROrder to SDLocs, as this may hide a more general bug in the scheduler, or cause regressions on other test inputs. rdar://33755881, Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D45995 llvm-svn: 331300	2018-05-01 19:26:15 +00:00
Daniel Sanders	2de9d4ad5d	Fix infinite loop after r331115 There are two separate fixes here: * The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction. * The target should not be requesting lowering of non-extending loads. llvm-svn: 331201	2018-04-30 17:20:01 +00:00
Daniel Sanders	5eb9f581b6	[globalisel][legalizerinfo] Introduce dedicated extending loads and add lowerings for them Summary: Previously, a extending load was represented at (G_EXT (G_LOAD x)). This had a few drawbacks: G_LOAD had to be legal for all sizes you could extend from, even if registers didn't naturally hold those sizes. * All sizes you could extend from had to be allocatable just in case the extend went missing (e.g. by optimization). * At minimum, G_EXT and G_TRUNC had to be legal for these sizes. As we improve optimization of extends and truncates, this legality requirement would spread without considerable care w.r.t when certain combines were permitted. The SelectionDAG importer required some ugly and fragile pattern rewriting to translate patterns into this style. This patch begins changing the representation to: * (G_[SZ]EXTLOAD x) * (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits() which resolves these issues by allowing targets to work entirely in their native register sizes, and by having a more direct translation from SelectionDAG patterns. This patch introduces the new generic instructions and new variation on G_LOAD and adds lowering for them to convert back to the existing representations. Depends on D45466 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar Reviewed By: aemerson Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D45540 llvm-svn: 331115	2018-04-28 18:14:50 +00:00
Jessica Paquette	0b6724917a	[MachineOutliner] Add defs to calls + don't track liveness on outlined functions This commit makes it so that if you outline a def of some register, then the call instruction created by the outliner actually reflects that the register is defined by the call. It also makes it so that outlined functions don't have the TracksLiveness property. Outlined calls shouldn't break liveness assumptions that someone might make. This also un-XFAILs the noredzone test, and updates the calls test. llvm-svn: 331095	2018-04-27 23:36:35 +00:00
Jun Bum Lim	9e3e14b5f9	[PostRASink] extend the live-in check for all aliased registers Extend the live-in check for all aliased registers so that we can allow sinking Copy instructions when only implicit def is in successor's live-in. llvm-svn: 331072	2018-04-27 19:59:20 +00:00
Daniel Sanders	27fe8a5011	[globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand Summary: Currently only the memory size is supported but others can be added as needed. narrowScalar for G_LOAD and G_STORE now correctly update the MachineMemOperand and will refuse to legalize atomics since those need more careful expansions to maintain atomicity. Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar Reviewed By: aemerson Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D45466 llvm-svn: 331071	2018-04-27 19:48:53 +00:00
Francis Visoiu Mistrih	c855e92ca9	[AArch64] Place the first ldp at the end when ReverseCSRRestoreSeq is true Put the first ldp at the end, so that the load-store optimizer can run and merge the ldp and the add into a post-index ldp. This didn't work in case no frame was needed and resulted in code size regressions. llvm-svn: 331044	2018-04-27 15:30:54 +00:00
Oliver Stannard	76088a5929	[AArch64] Codegen for v8.2A dot product intrinsics This adds IR intrinsics for the AArch64 dot-product instructions introduced in v8.2-A. Differential revisioon: https://reviews.llvm.org/D46107 llvm-svn: 331036	2018-04-27 13:45:32 +00:00
Eli Friedman	da018e5687	[MachineOutliner] Don't outline from functions with a section marking. The program might have unusual expectations for functions; for example, the Linux kernel's build system warns if it finds references from .text to .init.data. I'm not sure this is something we actually want to make any guarantees about (there isn't any explicit rule that would disallow outlining in this case), but we might want to be conservative anyway. Differential Revision: https://reviews.llvm.org/D46091 llvm-svn: 331007	2018-04-27 00:21:34 +00:00
Geoff Berry	08ab8c9544	[AArch64] Fix scavenged spill slot base when stack realignment required. Summary: Use the FP for scavenged spill slot accesses to prevent corruption of the callee-save region when the SP is re-aligned. Based on problem and patch reported by @paulwalker-arm This is an alternative to solution proposed in D45770 Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46063 llvm-svn: 330976	2018-04-26 18:50:45 +00:00
Francis Visoiu Mistrih	57fcd3454a	[MIR] Add support for debug metadata for fixed stack objects Debug var, expr and loc were only supported for non-fixed stack objects. This patch adds the following fields to the "fixedStack:" entries, and renames the ones from "stack:" to: * debug-info-variable * debug-info-expression * debug-info-location Differential Revision: https://reviews.llvm.org/D46032 llvm-svn: 330859	2018-04-25 18:58:06 +00:00
Amara Emerson	1f5d994119	[AArch64][GlobalISel] Implement selection for the llvm.trap intrinsic. rdar://38674040 llvm-svn: 330831	2018-04-25 14:43:59 +00:00
Roman Lebedev	cfa9e58ccf	[X86][AArch64][NFC] Finish adding 'bad' tests for masked merge unfolding with constants. I have initially committed basic tests in, rL330771, but then quickly discovered that there are a few more interesting patterns. llvm-svn: 330819	2018-04-25 12:48:23 +00:00
Jessica Paquette	4f56428de1	[MachineOutliner] Check for explicit uses of LR/W30 in MI operands Before, the outliner would grab ADRPs that used LR/W30. This patch fixes that by checking for explicit uses of those registers before the special-casing for ADRPs. This also adds a test that ensures that those sorts of ADRPs won't be outlined. llvm-svn: 330783	2018-04-24 22:38:15 +00:00
Roman Lebedev	54df09e8fe	[X86][AArch64][NFC] Add tests for masked merge unfolding with %y = const The fold was added in D45733. This appears to be a regression. llvm-svn: 330771	2018-04-24 21:23:22 +00:00
Petar Jovanovic	e2bfcd6394	Correct dwarf unwind information in function epilogue This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: * CFI instructions do not affect code generation (they are not counted as instructions when tail duplicating or tail merging) * Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Added CFIInstrInserter pass: * analyzes each basic block to determine cfa offset and register are valid at its entry and exit * verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors * inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D42848 llvm-svn: 330706	2018-04-24 10:32:08 +00:00
Roman Tereshin	3c6ea7e28c	[GlobalISel][Legalizer] Look thro copies while combining G_UNMERGE's As we're becoming stricter w/ respect to not allowing vregs having LLTs and regclasses assigned both mid-globalisel pipeline, the number of extra copies grows, some of which separate G_UNMERGE's from their corresponding G_MERGE's, becoming a performance concern. It's worth mentioning that we're already looking through copies while combining legalization artifacts for every kind of artifact but G_UNMERGE. Reviewed By: aditya_nandakumar Reviewers: ab, t.p.northover, volkan, javed.absar Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45644 llvm-svn: 330660	2018-04-23 22:28:36 +00:00
Roman Lebedev	95c6eaf530	[DAGCombiner] Unfold scalar masked merge if profitable Summary: This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 \| PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`). So we need to make sure that they are still generated. If the mask is constant, we do nothing. InstCombine should have unfolded it. Else, i use `hasAndNot()` TLI hook. For now, only handle scalars. https://rise4fun.com/Alive/bO6 ---- I really don't like the code i wrote in `DAGCombiner::unfoldMaskedMerge()`. It is super fragile. Is there something like IR Pattern Matchers for this? Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: andreadb, courbet, kristof.beyls, javed.absar, rengolin, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D45733 llvm-svn: 330646	2018-04-23 20:38:49 +00:00
Roman Lebedev	bf18cc56d3	[X86][AArch64][NFC] Add tests for masked merge unfolding Summary: This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 \| PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`). I'm guessing `llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp` should be able to unfold this. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: nemanjai, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45563 llvm-svn: 330645	2018-04-23 20:38:42 +00:00
Peter Collingbourne	5ab4a4793e	Reland r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses.", with a fix for the bot failure. This reland includes a check to prevent the DAG combiner from folding an offset that is smaller than the existing one. This can cause oscillations between two possible DAGs, which was the cause of the hang and later assertion failure observed on the lnt-ctmark-aarch64-O3-flto bot. http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/ Original commit message: > This is a code size win in code that takes offseted addresses > frequently, such as C++ constructors that typically need to compute > an offseted address of a vtable. This reduces the size of Chromium > for Android's .text section by 108KB. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 330630	2018-04-23 19:09:34 +00:00
Eli Friedman	0644130612	[AArch64] Don't crash trying to resolve __stack_chk_guard. In certain cases, the compiler might try to merge __stack_chk_guard with another global variable. (Or someone could theoretically define __stack_chk_guard as an alias.) In that case, make sure we don't crash. Differential Revision: https://reviews.llvm.org/D45746 llvm-svn: 330495	2018-04-21 00:07:46 +00:00
Jessica Paquette	e5d279e6d6	Fix typo in test (verify-machine-instrs -> verify-machineinstrs) llvm-svn: 330494	2018-04-20 23:37:48 +00:00
Jessica Paquette	d442c3a632	[MachineOutliner] XFAIL machine-outliner-noredzone.ll The verifier began complaining about an undefined physical register in this test. XFAILing for the purposes of getting a bot up while I look into it. Failure: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/11385/ llvm-svn: 330493	2018-04-20 23:35:54 +00:00
Jessica Paquette	2e5ada5c81	[MachineOutliner] Change B instruction for tail calls to TCRETURNdi First off, this is more correct than having the B. Second off, this was making a bot upset. This fixes that. Update the test to include -verify-machineinstrs as well to prevent stuff like this slipping by non debug/assert builds in the future. llvm-svn: 330459	2018-04-20 18:03:21 +00:00
Sanjay Patel	3d453ad711	[DAGCombine] (float)((int) f) --> ftrunc (PR36617) This was originally committed at rL328921 and reverted at rL329920 to investigate failures in Chrome. This time I've added to the ReleaseNotes to warn users of the potential of exposing UB and let me repeat that here for more exposure: Optimization of floating-point casts is improved. This may cause surprising results for code that is relying on undefined behavior. Code sanitizers can be used to detect affected patterns such as this: int main() { float x = 4294967296.0f; x = (float)((int)x); printf("junk in the ftrunc: %f\n", x); return 0; } $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of representable values of type 'int' junk in the ftrunc: 0.000000 Original commit message: fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are undefined. Differential Revision: https://reviews.llvm.org/D44909 llvm-svn: 330437	2018-04-20 15:07:55 +00:00
Amara Emerson	9de072f8ae	[AArch64] Add isel pattern for v8i8->v2f32 NVCASTs. rdar://39454635 llvm-svn: 330276	2018-04-18 17:10:19 +00:00
Momchil Velikov	d501a609c4	Add tests for shrink wrapping and VLAs Differential revision: https://reviews.llvm.org/D45727 llvm-svn: 330253	2018-04-18 13:37:12 +00:00
Tim Northover	271d3d2771	MachO: trap unreachable instructions Debugability is more important than saving 4 bytes to let us to fall through to nonense. llvm-svn: 330073	2018-04-13 22:25:20 +00:00
Peter Collingbourne	ab40e44ba1	Revert r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses." Caused a hang and eventually an assertion failure in LTO builds of 7zip-benchmark on aarch64 iOS targets. http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/ llvm-svn: 330063	2018-04-13 20:21:00 +00:00
Peter Collingbourne	00db326b0d	AArch64: Introduce a DAG combine for folding offsets into addresses. This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. This reduces the size of Chromium for Android's .text section by 108KB. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329956	2018-04-12 21:23:55 +00:00
Jessica Paquette	8aa6cd5cb9	[AArch64] Move AFI->setRedZone(false) to top of emitPrologue AFI->setRedZone(false) was put in the wrong place before, and so it only fired on functions that didn't have stack frames. This moves that to the top of emitPrologue to make sure that every function without a redzone has it set correctly. This also adds a function representing one of the early exit cases (GHC calling convention) to the MachineOutliner noredzone test to ensure that we can outline from functions like these, where we never use a redzone. llvm-svn: 329922	2018-04-12 16:16:18 +00:00
Sanjay Patel	5ace2b765a	revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617) This change is exposing UB in source code - as was warned/predicted. :) See D44909 for discussion. Reverting while we figure out how to fix things. llvm-svn: 329920	2018-04-12 15:27:01 +00:00

1 2 3 4 5 ...

2292 Commits