Commit Graph

2292 Commits

Author SHA1 Message Date
David Bolvansky 2fa7fb14ea [DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed
Summary:
Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed.  This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op.

Patch by: sameconrad (Sam Conrad)

Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar

Reviewed By: lebedev.ri

Subscribers: efriedma, kparzysz, llvm-commits

Differential Revision: https://reviews.llvm.org/D47681

llvm-svn: 338270
2018-07-30 16:50:00 +00:00
Jessica Paquette 7816531f3c Attempt to fix Windows test failure caused by r338133
It seems like the pass pipeline on Windows is slightly different than on Linux
and macOS. As a result, the arm64-opt-remarks-lazy-bfi test has been failing.

This switches a CHECK-NEXT to a CHECK-DAG to try and get this running properly
again.

It'd be nice to switch it back to a CHECK-NEXT if possible, but the CHECK-NEXT
lines following the line we care about (the optimization remark emitter)
do a pretty good job of enforcing the ordering we want.

Hopefully this works, since I don't have a Windows machine. ;)

Example failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11295

llvm-svn: 338267
2018-07-30 16:36:22 +00:00
Craig Topper 50b1d4303d [DAGCombiner] Teach DAG combiner that A-(B-C) can be folded to A+(C-B)
This can be useful since addition is commutable, and subtraction is not.

This matches a transform that is also done by InstCombine.

llvm-svn: 338181
2018-07-28 00:27:25 +00:00
Jessica Paquette f90edbe3d6 Recommit "Enable MachineOutliner by default under -Oz for AArch64"
Fixed the ASAN failure from before in r338148, so recommiting.

This patch enables the MachineOutliner by default in AArch64 under -Oz.

The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.

We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!

llvm-svn: 338160
2018-07-27 20:18:27 +00:00
Sanjay Patel 06c7d5aef6 [AArch64, PowerPC, x86] add more signbit math tests; NFC
The tests with a constant sub operand were added with rL338143,
but the potential transform doesn't have that requirement, so
adding more tests with variable operands.

llvm-svn: 338150
2018-07-27 18:31:21 +00:00
Sanjay Patel efac39eef6 [AArch64, PowerPC, x86] add more signbit math tests; NFC
llvm-svn: 338143
2018-07-27 18:12:29 +00:00
Jessica Paquette faea2d3130 Revert "Enable MachineOutliner by default under -Oz for AArch64"
It failed an Asan test on a bot:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/21543/steps/check-llvm%20asan/logs/stdio

Fixing that before recommitting.

llvm-svn: 338136
2018-07-27 17:25:38 +00:00
Jessica Paquette d4229b985c Enable MachineOutliner by default under -Oz for AArch64
This patch enables the MachineOutliner by default in AArch64 under -Oz.

The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.

We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!

llvm-svn: 338133
2018-07-27 16:44:42 +00:00
Sanjay Patel c7abb416dc [DAGCombiner] fold 'not' with signbit math
This is a follow-up suggested in D48970. 

Alive proofs:
https://rise4fun.com/Alive/sII

We can eliminate an instruction in the usual select-of-constants 
to bit hack transform by adjusting the add/sub with constant.
This is always a win. 

There are more transforms that are likely wins, but they may need 
target hooks in case some targets do not benefit. 

This is another step towards making up for canonicalizing to 
select-of-constants in rL331486.

llvm-svn: 338132
2018-07-27 16:42:55 +00:00
Sanjay Patel f815bc658b [AArch64] add more tests for signbit math; NFC
llvm-svn: 338129
2018-07-27 16:21:56 +00:00
Matthias Braun 09810c9269 MacroFusion: Fix macro fusion with ExitSU failing in top-down scheduling
When fusing instructions A and B, we must add all predecessors of B as
predecessors of A to avoid instructions getting scheduling in between.

There is a special case involving ExitSU: Every other node must be
scheduled before it by design and we don't need to make this explicit in
the graph, however when fusing with a different node we need to schedule
every othere node before the fused node too and we need to make this
explicit now: This patch adds a dependency from the fused node to all
roots in the graph.

Differential Revision: https://reviews.llvm.org/D49830

llvm-svn: 338046
2018-07-26 17:43:56 +00:00
Roman Lebedev 41ba5c1455 [DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle ule,ugt CondCodes.
Summary:
A follow-up for D49266 / rL337166.

At least one of these cases is more canonical,
so we really do have to handle it.
https://godbolt.org/g/pkzP3X
https://rise4fun.com/Alive/pQyhZZ

We won't get to these cases with I1 being -1,
as that will be constant-folded to true or false.

I'm also not sure we actually hit the 'ule' case,
but i think the worst think that could happen is that being dead code.

Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49497

llvm-svn: 338044
2018-07-26 17:34:28 +00:00
Martin Storsjo 9dafd6f6d9 Revert "[COFF] Use comdat shared constants for MinGW as well"
This reverts commit r337951.

While that kind of shared constant generally works fine in a MinGW
setting, it broke some cases of inline assembly that worked before:

$ cat const-asm.c
int MULH(int a, int b) {
    int rt, dummy;
    __asm__ (
        "imull %3"
        :"=d"(rt), "=a"(dummy)
        :"a"(a), "rm"(b)
    );
    return rt;
}
int func(int a) {
    return MULH(a, 1);
}
$ clang -target x86_64-win32-gnu -c const-asm.c -O2
const-asm.c:4:9: error: invalid variant '00000001'
        "imull %3"
        ^
<inline asm>:1:15: note: instantiated into assembly here
        imull __real@00000001(%rip)
                     ^

A similar error is produced for i686 as well. The same test with a
target of x86_64-win32-msvc or i686-win32-msvc works fine.

llvm-svn: 338018
2018-07-26 10:48:20 +00:00
Amara Emerson fdd089aa14 [GlobalISel] Fall back to SDISel for swifterror/swiftself attributes.
We don't currently support these, fall back until we do.

llvm-svn: 337994
2018-07-26 01:25:58 +00:00
Sanjay Patel 215dcbf4db [SelectionDAG] try to convert funnel shift directly to rotate if legal
If the DAGCombiner's rotate matching was working as expected, 
I don't think we'd see any test diffs here. 

This sidesteps the issue of custom lowering for rotates raised in PR38243:
https://bugs.llvm.org/show_bug.cgi?id=38243
...by only dealing with legal operations.

llvm-svn: 337966
2018-07-25 21:38:30 +00:00
Sanjay Patel f94c4c84e6 [AArch, PowerPC] add more tests for legal rotate ops; NFC
llvm-svn: 337964
2018-07-25 21:25:50 +00:00
Martin Storsjo ff33a95ed4 [COFF] Use comdat shared constants for MinGW as well
GNU binutils tools have no problems with this kind of shared constants,
provided that we actually hook it up completely in AsmPrinter and
produce a global symbol.

This effectively reverts SVN r335918 by hooking the rest of it up
properly.

This feature was implemented originally in SVN r213006, with no reason
for why it can't be used for MinGW other than the fact that GCC doesn't
do it while MSVC does.

Differential Revision: https://reviews.llvm.org/D49646

llvm-svn: 337951
2018-07-25 18:35:42 +00:00
Martin Storsjo d2662c32fb [COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name only
was used for msvc targets) into the arch independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.

With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues with both GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:

fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4

Differential Revision: https://reviews.llvm.org/D49644

llvm-svn: 337950
2018-07-25 18:35:31 +00:00
Martin Storsjo c2b701408e [AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes
This matches the structure used on X86 and ARM. This requires
a little bit of duplication of the parts that are equal in both
AArch64 COFF variants though.

Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF
didn't, but now they do.

This makes AArch64 match X86 in how comdat is used for float constants
for MinGW.

Differential Revision: https://reviews.llvm.org/D49637

llvm-svn: 337755
2018-07-23 22:15:14 +00:00
Simon Pilgrim bfb900d363 [DAGCombiner] Add rotate-extract tests
Add new tests from D47681 to current codegen. Also added i686 codegen tests.

llvm-svn: 337445
2018-07-19 09:27:34 +00:00
Roman Lebedev 5317e88300 [NFC][X86][AArch64][DAGCombine] More tests for optimizeSetCCOfSignedTruncationCheck()
At least one of these cases is more canonical,
so we really do have to handle it.
https://godbolt.org/g/pkzP3X
https://rise4fun.com/Alive/pQyh

llvm-svn: 337400
2018-07-18 16:19:06 +00:00
Sanjay Patel c71adc8040 [Intrinsics] define funnel shift IR intrinsics + DAG builder support
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html

We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, 
and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation 
by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" 
operation which may also be a single hardware instruction.

Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working 
on that (much larger patch), but then I concluded that we don't need it. At least as a first 
step, we have all of the backend support necessary to match these ops...because it was required. 
And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are 
likely all that we'll ever need.

There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes
(along with improving the oversized shift documentation). Again, I don't think that's strictly 
necessary (as the test results here prove). That can be an efficiency improvement as a small 
follow-up patch.

So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. 

Differential Revision: https://reviews.llvm.org/D49242

llvm-svn: 337221
2018-07-16 22:59:31 +00:00
Roman Lebedev de506632aa [X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern
Summary:

[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

But the IR-optimal patter does not lower efficiently, so we want to undo it..

This handles the simple pattern.
There is a second pattern with predicate and constants inverted.

NOTE: we do not check uses here. we always do the transform.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49266

llvm-svn: 337166
2018-07-16 12:44:10 +00:00
Sanjay Patel a41c886c55 [DAGCombiner] extend(ifpositive(X)) -> shift-right (not X)
This is almost the same as an existing IR canonicalization in instcombine, 
so I'm assuming this is a good early generic DAG combine too.

The motivation comes from reduced bit-hacking for select-of-constants in IR 
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html

The PPC and AArch tests show that those targets are already doing something 
similar. x86 will be neutral in the minimal case and generally better when 
this pattern is extended with other ops as shown in the signbit-shift.ll tests.

Note the asymmetry: we don't include the (extend (ifneg X)) transform because 
it already exists in SimplifySelectCC(), and that is verified in the later 
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the 
general transform to use a shift is always a win because that's a single 
instruction.

Alive proofs:
https://rise4fun.com/Alive/ysli

Name: if pos, get -1
  %c = icmp sgt i16 %x, -1
  %r = sext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = ashr i16 %n, 15

Name: if pos, get 1
  %c = icmp sgt i16 %x, -1
  %r = zext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = lshr i16 %n, 15

Differential Revision: https://reviews.llvm.org/D48970

llvm-svn: 337130
2018-07-15 16:27:07 +00:00
Roman Lebedev b64e74feed [NFC][X86][AArch64] Negative tests for 'check for [no] signed truncation' pattern
See D49247, D49266

I'm only adding the sane negative tests, and not
adding the one-use tests yet. Also, not adding
negative tests for the second pattern with inverted operands yet,
since it's handling will be added in later differential.

llvm-svn: 337014
2018-07-13 16:14:37 +00:00
Simon Pilgrim 9fe0bf3be7 [AArch64] Updated bigendian buildvector tests
As suggested by @efriedma on D49262 - changed the extractelement to a store to prevent SimplifyDemandedVectorElts from simplifying the build vectors - this keeps the immediate generation which was the point of the tests.

llvm-svn: 336981
2018-07-13 09:25:32 +00:00
Roman Lebedev 1574e49792 [NFC][X86][AArch64] Add tests for the 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

But as it looks from these tests,
i think we want to revert at least some cases in DAGCombine.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49247

llvm-svn: 336917
2018-07-12 17:00:11 +00:00
Joel E. Denny 9fa9c9368d [FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests
See https://reviews.llvm.org/D47106 for details.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D47171

This commit drops that patch's changes to:

  llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
  llvm/test/CodeGen/NVPTX/param-load-store.ll

For some reason, the dos line endings there prevent me from commiting
via the monorepo.  A follow-up commit (not via the monorepo) will
finish the patch.

llvm-svn: 336843
2018-07-11 20:25:49 +00:00
Simon Pilgrim f6ff75c4c2 Fix check-prefix vs check-prefixes typo in updated test
llvm-svn: 336787
2018-07-11 10:42:51 +00:00
Simon Pilgrim 1975efe555 [AArch64] Regenerate SDIV tests
Will make codegen diffs much easier to grok in a future patch

llvm-svn: 336786
2018-07-11 10:39:50 +00:00
Daniel Sanders 9481399c0f [globalisel][irtranslator] Add support for atomicrmw and (strong) cmpxchg
Summary:
This patch adds support for the atomicrmw instructions and the strong
cmpxchg instruction to the IRTranslator.

I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what
difference it makes to the backend. As far as I can tell from the code, it
only matters to AtomicExpandPass which is run at the LLVM-IR level.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D40092

llvm-svn: 336589
2018-07-09 19:33:40 +00:00
Yvan Roux 7382cf8225 [MachineOutliner] Add missing liveness tracking info in MIR test.
This should bring the bots back to green state.

llvm-svn: 336482
2018-07-07 08:42:31 +00:00
Nico Weber 038dbf3c24 Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084.
llvm-svn: 336453
2018-07-06 17:37:24 +00:00
Diogo N. Sampaio 81e9dd1ed7 Commit rL336426 cause buildbot failures
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/50537/testReport/junit/LLVM/CodeGen_AArch64/FoldRedundantShiftedMasking_ll/

This removes the comments of the function label causing this error.

llvm-svn: 336440
2018-07-06 14:41:09 +00:00
Diogo N. Sampaio 742bf1a255 [SelectionDAG] https://reviews.llvm.org/D48278
D48278

Allow to reduce redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB

can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.

llvm-svn: 336426
2018-07-06 09:42:25 +00:00
Sanjay Patel bce899ff59 [AArch64, PowerPC, x86] add tests for signbit bit hacks; NFC
llvm-svn: 336348
2018-07-05 13:16:46 +00:00
Mikael Holmen 8505f34b29 Partial revert of "NFC - Various typo fixes in tests"
This partially reverts r336268 since it causes buildbot failures.

Added FIXME at the places where the CHECKs are misspelled.

llvm-svn: 336323
2018-07-05 08:42:16 +00:00
Gabor Buella da4a966e1c NFC - Various typo fixes in tests
llvm-svn: 336268
2018-07-04 13:28:39 +00:00
Amara Emerson d912ffaba5 [AArch64][GlobalISel] Fix fallbacks introduced in r336120 due to unselectable stores.
r336120 resulted in falling back to SelectionDAG more often due to the G_STORE
MMOs not matching the vreg size. This fixes that by explicitly any-extending the
value.

llvm-svn: 336209
2018-07-03 15:59:26 +00:00
Amara Emerson 846f2436e8 [AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin.
We currently don't any-extend vararg parameters before storing them to the stack
locations on Darwin. However, SelectionDAG however does this, and so user code
is in the wild which inadvertently relies on this extension. This can manifest
in cases where the value stored is (int)0, but the actual parameter is interpreted
by va_arg as a pointer, and so not extending to 64 bits causes the callee to
load additional undefined bits.

llvm-svn: 336120
2018-07-02 16:39:09 +00:00
Jessica Paquette 8bda1881ca [MachineOutliner] Add support for target-default outlining.
This adds functionality to the outliner that allows targets to
specify certain functions that should be outlined from by default.

If a target supports default outlining, then it specifies that in
its TargetOptions. In the case that it does, and the user hasn't
specified that they *never* want to outline, the outliner will
be added to the pass pipeline and will run on those default functions.

This is a preliminary patch for turning the outliner on by default
under -Oz for AArch64.

https://reviews.llvm.org/D48776

llvm-svn: 336040
2018-06-30 03:56:03 +00:00
Jessica Paquette 79917b9686 [MachineOutliner] Add always and never options to -enable-machine-outliner
This is a recommit of r335887, which was erroneously committed earlier.

To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.

This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target default
outlining behaviour.

https://reviews.llvm.org/D48682

llvm-svn: 335986
2018-06-29 16:12:45 +00:00
Jessica Paquette 0c5d3ffbb8 [MachineOutliner] Never add the outliner in -O0
This is a recommit of r335879.

We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.

This also removes -O0 from the outliner DWARF test.

llvm-svn: 335930
2018-06-28 21:49:24 +00:00
Jessica Paquette d6261bef7b Revert "[MachineOutliner] Add always and never options to -enable-machine-outliner"
I accidentally committed this instead of D48683 because I haven't had coffee
yet.

llvm-svn: 335883
2018-06-28 17:26:19 +00:00
Jessica Paquette f3a44fe833 Revert "[MachineOutliner] Never add the outliner in -O0"
This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091.

It relies on r335872 since that introduces the machine outliner
flags test. I meant to commit D48683 in that commit, but got mixed
up and committed D48682 instead. So, I'm reverting this and
r335872, since D48682 hasn't made it through review yet.

llvm-svn: 335882
2018-06-28 17:26:18 +00:00
Jessica Paquette c9d675266e [MachineOutliner] Never add the outliner in -O0
We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.

This also updates machine-outliner-flags to reflect the change
and improves the comment describing what that test does.

llvm-svn: 335879
2018-06-28 17:05:57 +00:00
Jessica Paquette 1ccb66c5fb [MachineOutliner] Add always and never options to -enable-machine-outliner
To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.

This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target-default
outlining behaviour.

llvm-svn: 335872
2018-06-28 16:39:42 +00:00
Daniel Sanders bdeb880d14 [globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64
Now that we have the ability to legalize based on MMO's. Add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.

Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.

llvm-svn: 335767
2018-06-27 19:03:21 +00:00
Sanjay Patel d052de856d [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
doesn't currently allow fast-math-flags on the cast instructions. So for now, 
just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
option.

Differential Revision: https://reviews.llvm.org/D48085

llvm-svn: 335761
2018-06-27 18:16:40 +00:00
Jessica Paquette f472f6159a [MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across
It isn't safe to outline sequences of instructions where x16/x17/nzcv live
across the sequence.

This teaches the outliner to check whether or not a specific canidate has
x16/x17/nzcv live across it and discard the candidate in the case that that is
true.

https://bugs.llvm.org/show_bug.cgi?id=37573
https://reviews.llvm.org/D47655

llvm-svn: 335758
2018-06-27 17:43:27 +00:00
Luke Geeson 316327150b [AArch64] Reverting FP16 vcvth_n_s64_f16 to fix
llvm-svn: 335737
2018-06-27 14:34:40 +00:00
Adhemerval Zanella cadcfed7aa [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extended the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectores. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjust the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Luke Geeson 68cb233c0f [AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns
llvm-svn: 335715
2018-06-27 09:20:13 +00:00
Vedant Kumar b725c69f12 [SelectionDAG] Remove debug locations from ConstantSD(FP)Nodes
This removes debug locations from ConstantSDNode and ConstantSDFPNode.

When this kind of node is materialized we no longer create a line table
entry which jumps back to the constant's first point of use. This makes
single-stepping behavior smoother, and it matches the model used by IR,
where Constants have no locations. See this thread for more context:

  http://lists.llvm.org/pipermail/llvm-dev/2018-June/124164.html

I'd like to handle constant BuildVectorSDNodes and to try to eliminate
passing SDLocs to SelectionDAG::getConstant*() in follow-up commits.

Differential Revision: https://reviews.llvm.org/D48468

llvm-svn: 335497
2018-06-25 17:06:18 +00:00
Aditya Nandakumar e2a7f31064 [GISel]: Add G_ADDRSPACE_CAST Opcode
Added IRTranslator support for addrspacecast.

https://reviews.llvm.org/D48469

reviewed by: volkan

llvm-svn: 335388
2018-06-22 20:58:51 +00:00
Sirish Pande b60acb9e48 Revert "[AArch64] Coalesce Copy Zero during instruction selection"
This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7.

Original Commit:

Author: Haicheng Wu <haicheng@codeaurora.org>
Date:   Sun Feb 18 13:51:33 2018 +0000

    [AArch64] Coalesce Copy Zero during instruction selection

    Add special case for copy of zero to avoid a double copy.

    Differential Revision: https://reviews.llvm.org/D36104

Author's intention is to remove a BB that has one mov instruction. In
order to do that, d8f571050 pessmizes MachineSinking by introducing a
copy, such that mov instruction is NOT moved to the BB. Optimization
downstream gets rid of the BB with only mov instruction. This works well
if we have only one fall through branch as there is only one "extra"
mov instruction.

If we have multiple fall throughs, we will have a lot of redundant movs.
In such a case, it's better to have this BB which has one mov instruction.

This is causing degradation in jpeg, fft and other codebases. I believe
if we want to remove a BB with only one branch instruction, we should not
pessimize Machine Sinking at all, and find some other solution.

llvm-svn: 335251
2018-06-21 16:05:24 +00:00
Tim Northover 70666e7765 [AArch64] Implement FLT_ROUNDS macro.
Very similar to ARM implementation, just maps to an MRS.

Should fix PR25191.

Patch by Michael Brase.

llvm-svn: 335118
2018-06-20 12:09:01 +00:00
Michael Berg 7b993d762f Utilize new SDNode flag functionality to expand current support for fadd
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.

Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar

Reviewed By: spatel

Subscribers: wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D47909

llvm-svn: 334996
2018-06-18 23:44:59 +00:00
Michael Berg 8e570c3390 Utilize new SDNode flag functionality to expand current support for fma
Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.

Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai

Reviewed By: rampitec, nhaehnle

Subscribers: tpr, nemanjai, wdng

Differential Revision: https://reviews.llvm.org/D47918

llvm-svn: 334876
2018-06-16 00:03:06 +00:00
Justin Bogner 3b83edb037 Re-apply "[VirtRegRewriter] Avoid clobbering registers when expanding copy bundles"
This is r334750 (which was reverted in r334754) with a fix for an
uninitialized variable that was caught by msan.

Original commit message:
> If a copy bundle happens to involve overlapping registers, we can end
> up with emitting the copies in an order that ends up clobbering some
> of the subregisters. Since instructions in the copy bundle
> semantically happen at the same time, this is incorrect and we need to
> make sure we order the copies such that this doesn't happen.

llvm-svn: 334756
2018-06-14 19:24:03 +00:00
Justin Bogner 36c7f40f20 Revert "[VirtRegRewriter] Avoid clobbering registers when expanding copy bundles"
There's an msan failure:

  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/19549

This reverts r334750.

llvm-svn: 334754
2018-06-14 19:10:57 +00:00
Justin Bogner 866d9f02be [VirtRegRewriter] Avoid clobbering registers when expanding copy bundles
If a copy bundle happens to involve overlapping registers, we can end
up with emitting the copies in an order that ends up clobbering some
of the subregisters. Since instructions in the copy bundle
semantically happen at the same time, this is incorrect and we need to
make sure we order the copies such that this doesn't happen.

Differential Revision: https://reviews.llvm.org/D48154

llvm-svn: 334750
2018-06-14 18:32:55 +00:00
Sanjay Patel 7d4929611c [DAGCombiner] remove hasOneUse() check from fadd constants transform
We're constant folding here, so we shouldn't check uses. This matches
the IR optimizer behavior.

The x86 test shows the expected win. The AArch64 test shows something
else. This only seems to happen if the "generic" AArch64 CPU model is 
used by MachineCombiner, so I'll file a bug report to follow-up.

llvm-svn: 334608
2018-06-13 15:22:48 +00:00
Sanjay Patel 82bc842650 [AArch64] add tests for fadd with more than one use; NFC
llvm-svn: 334556
2018-06-12 22:50:37 +00:00
Petr Hosek 7250908016 [AArch64] Support reserving x20 register
Register x20 is a callee-saved register which may be used for other
purposes in certain contexts, for example to hold special variables
within the kernel. This change adds support for reserving this register
both to frontend and backend to make this register usable for these
purposes.

Differential Revision: https://reviews.llvm.org/D46552

llvm-svn: 334531
2018-06-12 20:00:50 +00:00
Roman Tereshin b2d3f2e5da [MIR][MachineCSE] Implementing proper MachineInstr::getNumExplicitDefs()
Apparently, MachineInstr class definition as well as pretty much all of
the machine passes assume that the only kind of MachineInstr's operands
that is variadic for variadic opcodes is explicit non-definitions.

In particular, this assumption is made by MachineInstr::defs(), uses(),
and explicit_uses() methods, as well as by MachineCSE pass.

The assumption is incorrect judging from at least TableGen backend
implementation, that recognizes variable_ops in OutOperandList, and the
very existence of G_UNMERGE_VALUES generic opcode, or ARM load multiple
instructions, all of which have variadic defs.

In particular, MachineCSE pass breaks MIR with CSE'able G_UNMERGE_VALUES
instructions in it.

This commit implements MachineInstr::getNumExplicitDefs() similar to
pre-existing MachineInstr::getNumExplicitOperands(), fixes
MachineInstr::defs(), uses(), and explicit_uses(), and fixes MachineCSE
pass.

As the issue addressed seems to affect only machine passes that could be
ran mid-GlobalISel pipeline at the moment, the other passes aren't fixed
by this commit, like MachineLICM: that could be done on per-pass basis
when (if ever) they get adopted for GlobalISel.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D45640

llvm-svn: 334520
2018-06-12 18:30:37 +00:00
Michael Berg 95f3a430a8 NFC, some additional tests added and some renaming for planned fma support changes
llvm-svn: 334461
2018-06-12 00:52:43 +00:00
Roman Lebedev cb56f7a550 [NFC][X86][AArch64] Reorganize/cleanup BZHI test patterns
Summary:
In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.

AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!).
The other way around for X86.
It would be much better to canonicalize.

It would seem that there is too much tests, but this is most of all the auto-generated possible variants
of C code that one would expect for BZHI to be formed, and then manually cleaned up a bit.
So this should be pretty representable, which somewhat good coverage...

Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419
https://bugs.llvm.org/show_bug.cgi?id=37603
https://bugs.llvm.org/show_bug.cgi?id=37610
https://rise4fun.com/Alive/idM

Reviewers: javed.absar, craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: kristof.beyls, llvm-commits, RKSimon, craig.topper, spatel

Differential Revision: https://reviews.llvm.org/D47452

llvm-svn: 334124
2018-06-06 19:38:10 +00:00
Evandro Menezes b2c8244715 [AArch64, ARM] Add support for Samsung Exynos M4
Create a separate feature set for Exynos M4 and add test cases.

llvm-svn: 334115
2018-06-06 18:56:00 +00:00
David Green 25312b2b6c [GlobalMerge] Set the alignment on merged global structs
If no alignment is set, the abi/preferred alignment of structs will be
used which may be higher than required. This can lead to extra padding
and in the end an increase in data size.

Differential Revision: https://reviews.llvm.org/D47633

llvm-svn: 334099
2018-06-06 14:48:32 +00:00
Francis Visoiu Mistrih ca69b3bf6d [ShrinkWrap] Add optimization remarks to the shrink-wrapping pass
Start by emitting remarks for very basic unsupported cases such as
irreducible CFGs and EHFunclets. The end goal is to be able to cover all
the cases where we give up with an explanation.

llvm-svn: 333972
2018-06-05 00:27:24 +00:00
Amara Emerson 5a3bb68e12 [AArch64][GlobalISel] Zero-extend s1 values when returning.
Before we were relying on the any extend of the s1 to s32, but
for AAPCS we need to zero-extend it to at least s8.

Fixes PR36719

Differential Revision: https://reviews.llvm.org/D47425

llvm-svn: 333747
2018-06-01 13:20:32 +00:00
Luke Geeson 2e09995d42 [AArch64] Reverted rL333427 fixing Clang UnitTest Failure
llvm-svn: 333634
2018-05-31 08:27:53 +00:00
Roman Tereshin 5a65eb75c7 [GlobalISel][AArch64] LegalizerInfo verifier: Fixing bugs exposed by LegalizerInfo::verify(...)
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333618
2018-05-31 01:56:05 +00:00
Roman Tereshin 8f1753e994 [GlobalISel][AArch64] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call w/o fixing bugs
This is to make it clear what kind of bugs the LegalizerInfo::verifier
is able to catch and test its output

Reviewers: aemerson, qcolombet

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D46338

llvm-svn: 333597
2018-05-30 22:10:04 +00:00
Evandro Menezes f8425340e4 [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy
As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change
makes the inlining of `memset()` and `memcpy()` more aggressive when
compiling for speed.  The tuning remains the same when optimizing for size.

Patch by: Sebastian Pop <s.pop@samsung.com>
          Evandro Menezes <e.menezes@samsung.com>

Differential revision: https://reviews.llvm.org/D45098

llvm-svn: 333429
2018-05-29 15:58:50 +00:00
Amara Emerson d5a9e7bbc9 Revert "[AArch64] added FP16 vcvth intrinsic support"
This reverts commit r333410 due to bot failures.

llvm-svn: 333427
2018-05-29 15:34:22 +00:00
Luke Geeson 16092ab3c5 [AArch64] added FP16 vcvth intrinsic support
Summary: Change-Id: I0df845749c7689dfc99150ba7c19c7d0dadbd705

Reviewers: javed.absar, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: llvm-commits, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46311

llvm-svn: 333410
2018-05-29 11:40:33 +00:00
Eli Friedman 9e177882aa [AArch64] Improve orr+movk sequences for MOVi64imm.
The existing code has three different ways to try to lower a 64-bit
immediate to the sequence ORR+MOVK.  The result is messy: it misses
some possible sequences, and the order of the checks means we sometimes
emit two MOVKs when we only need one.

Instead, just use a simple loop to try all possible two-instruction
ORR+MOVK sequences.

Differential Revision: https://reviews.llvm.org/D47176

llvm-svn: 333218
2018-05-24 19:38:23 +00:00
Geoff Berry 98150e3a62 [AArch64] Take advantage of variable shift/rotate amount implicit mod operation.
Summary:
Optimize code generated for variable shifts/rotates by taking advantage
of the implicit and/mod done on the variable shift amount register.

Resolves bug 27582 and bug 37421.

Reviewers: t.p.northover, qcolombet, MatzeB, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46844

llvm-svn: 333214
2018-05-24 18:29:42 +00:00
Eli Friedman 042dc9e092 [MachineOutliner] Add "thunk" outlining for AArch64.
When we're outlining a sequence that ends in a call, we can save up to
three instructions in the outlined function by turning the call into
a tail-call. I refer to this as thunk outlining because the resulting
outlined function looks like a thunk; suggestions welcome for a better
name.

In addition to making the outlined function shorter, thunk outlining
allows outlining calls which would otherwise be illegal to outline:
we don't need to save/restore LR, so we don't need to prove anything
about the stack access patterns of the callee.

To make this work effectively, I also added
MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner
code; this allows treating an arbitrary instruction as a terminator in
the suffix tree.

Differential Revision: https://reviews.llvm.org/D47173

llvm-svn: 333015
2018-05-22 19:11:06 +00:00
Sanjay Patel 17a870f07c [DAG] fold FP binops with undef operands to NaN
This is the FP sibling of D43141 with the corresponding IR change in rL327212.

We can't propagate undef here because if a variable operand is a NaN, these 
binops must propagate NaN. Neither global nor node-level fast-math makes a 
difference. If we have 'nnan', I think later folds can turn the NaN into undef.

The tests in X86/fp-undef.ll are meant to be the definitive verification for 
these folds - everything reduces identically now.

The other test changes are collateral damage. They may need to be altered to
preserve their intent.

Differential Revision: https://reviews.llvm.org/D47026

llvm-svn: 332920
2018-05-21 23:54:19 +00:00
Roman Lebedev 7772de25d0 [DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.

This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`), and we need to make sure that they are generated.

Differential Revision: https://reviews.llvm.org/D46528

llvm-svn: 332904
2018-05-21 21:41:02 +00:00
Roman Lebedev fd79bc3aa2 [X86][AArch64][NFC] Add tests for vector masked merge unfolding
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).

Differential Revision: https://reviews.llvm.org/D46008

llvm-svn: 332903
2018-05-21 21:40:51 +00:00
Haicheng Wu 69ba0613f2 [GlobalMerge] Exit early if only one global is to be merged
To save some compilation time and prevent some unnecessary changes.

Differential Revision: https://reviews.llvm.org/D46640

llvm-svn: 332813
2018-05-19 18:00:02 +00:00
Eli Friedman 4081a57af7 [MachineOutliner] Count savings from outlining in bytes.
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.

Differential Revision: https://reviews.llvm.org/D46921

llvm-svn: 332685
2018-05-18 01:52:16 +00:00
Sanjay Patel ffd349577d [AArch64] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

Follow-up to:
https://reviews.llvm.org/rL332534
...because that change wasn't enough.

llvm-svn: 332636
2018-05-17 18:07:02 +00:00
Sanjay Patel 60fe206793 [AArch64] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332534
2018-05-16 21:57:57 +00:00
Eli Friedman ddbf6d6514 [MachineOutliner] Don't outline instructions that modify SP.
This breaks the code which saves and restores LR, so we can't outline
without doing something more complicated for stack adjustment.

Found by inspection; we get lucky in most cases because getMemOpInfo
only handles STRWpost, not any other pre/post-increment forms. But it
hits a couple of artificial testcases in the tree.

Differential Revision: https://reviews.llvm.org/D46920

llvm-svn: 332529
2018-05-16 21:20:16 +00:00
Eli Friedman 02709bcb78 [MachineOutliner] Don't save/restore LR for tail calls.
The cost computation assumes we do this correctly, but the actual
lowering was wrong.

Differential Revision: https://reviews.llvm.org/D46923

llvm-svn: 332514
2018-05-16 19:49:01 +00:00
Sirish Pande cabe50a308 [AArch64] Gangup loads and stores for pairing.
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.

Differential Revision https://reviews.llvm.org/D46477

llvm-svn: 332482
2018-05-16 15:36:52 +00:00
Amara Emerson 0d6a26dffc [GlobalISel][IRTranslator] Split aggregates during IR translation.
We currently handle all aggregates by creating one large LLT, and letting the
legalizer deal with splitting them up. However using this approach means that
we can't support big endian code correctly.

This patch changes the way that the IRTranslator deals with aggregate values,
by splitting them up into their constituent element values. To do this, parts
of the translator need to be modified to deal with multiple VRegs for a single
Value.

A new Value to VReg mapper is introduced to help keep compile time under
control, currently there is no measurable impact on CTMark despite the extra
code being generated in some cases.

Patch is based on the original work of Tim Northover.

Differential Revision: https://reviews.llvm.org/D46018

llvm-svn: 332449
2018-05-16 10:32:02 +00:00
Peter Smith c811758da6 [AArch64] Support "S" inline assembler constraint
This patch re-introduces the "S" inline assembler constraint. This matches
an absolute symbolic address or a label reference. The primary use case is

asm("adrp %0, %1\n\t"
    "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var));

I say re-introduces as it seems like "S" was implemented in the original
AArch64 backend, but it looks like it wasn't carried forward to the merged
backend. The original implementation had A and L modifiers that could be
used to print ":lo12:" to the string. It looks like gcc doesn't use these
and :lo12: is expected to be written in the inline assembly string so I've
not implemented A and L. Clang already supports the S modifier.

Fixes PR37180

Differential Revision: https://reviews.llvm.org/D46745

llvm-svn: 332444
2018-05-16 09:33:25 +00:00
Eli Friedman 25bef201c5 [MachineOutliner] Add optsize markings to outlined functions.
It doesn't matter much this late in the pipeline, but one place that
does check for it is the function alignment code.

Differential Revision: https://reviews.llvm.org/D46373

llvm-svn: 332415
2018-05-15 23:36:46 +00:00
Evandro Menezes 8d522d811a [AArch64] Improve single vector lane unscaled stores
When storing the 0th lane of a vector, use a simpler and usually more
efficient scalar store instead.  In this case, also using the unscaled
offset.

Differential revision: https://reviews.llvm.org/D46762

llvm-svn: 332394
2018-05-15 20:41:12 +00:00
Geoff Berry 32d07d59f5 [AArch64] Fix mir test case liveins info.
The test case added in r332265 had incomplete livein information which
was caught by the EXPENSIVE_CHECKS bot.  Fix the livein information and
add -verify-machineinstrs to the test case.

llvm-svn: 332367
2018-05-15 16:27:34 +00:00
Sanjay Patel 8652c53d29 [DAG] propagate FMF for all FPMathOperators
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. 
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It 
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.

It should provide a superset of the functionality from D46563 - the extra tests show propagation and 
codegen diffs for fcmp, vecreduce, and FP libcalls.

The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last 
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, 
so that shouldn't make any difference.

Differential Revision: https://reviews.llvm.org/D46854

llvm-svn: 332358
2018-05-15 14:16:24 +00:00
Sanjay Patel 165587b424 [AArch64] enhance test to show FMF loss; NFC
llvm-svn: 332301
2018-05-14 21:53:21 +00:00
Geoff Berry 64a2ea41ea [BranchFolding] Allow hoisting to block with a single conditional branch.
Summary:
The BranchFolding pass is currently missing opportunities to hoist
common code if the hoisted-to block contains a single conditional branch
that has register uses.  This occurs somewhat frequently on AArch64 with
CBZ/TBZ opcodes.

This change also eliminates some code differences when debug info is
present since the presence of e.g. DBG_VALUE instructions in the
hoisted-to block can enable hoisting that wouldn't have occurred without
them.

Reviewers: MatzeB, rnk, kparzysz, twoh, aprantl, javed.absar

Subscribers: kristof.beyls, JDevlieghere, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46324

llvm-svn: 332265
2018-05-14 17:31:18 +00:00
Evandro Menezes 14fa2e4fa5 [AArch64] Improve single vector lane stores
When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.

Differential revision: https://reviews.llvm.org/D46655

llvm-svn: 332251
2018-05-14 15:26:35 +00:00
Vedant Kumar fd340a4047 [DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N)
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.

The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46158

llvm-svn: 332118
2018-05-11 18:40:08 +00:00
Geoff Berry 60460268c0 [AArch64] Fix performPostLD1Combine to check for constant lane index.
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction.  The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:

- an assert when lowering the LD1LANE ISD node since it assumes an
  constant operand

- an assert in isel if the lane index value depends on the
  post-incremented base register

Both of these issues are avoided by simply checking that the lane index
is a constant.

Fixes bug 35822.

Reviewers: t.p.northover, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46591

llvm-svn: 332103
2018-05-11 16:25:06 +00:00
Roman Tereshin 6d26638c90 [GlobalISel][Legalizer] Widening the second src op of shifts bug fix
The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its
value as a (small) unsigned integer, therefore its incorrect to widen it
in any way but by zero extending it.

G_SHL was using G_ANYEXT and G_ASHR - G_SEXT (which is correct for their
destination and first source operands, but not the "number of bits to
shift" operand).

Generally, shifts aren't as similar to regular binary operations as it
might seem, for instance, they aren't commutative nor associative and
the second source operand usually requires a special treatment.

Reviewers: bogner, javed.absar, aivchenk, rovka

Reviewed By: bogner

Subscribers: igorb, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46413

llvm-svn: 331926
2018-05-09 21:43:30 +00:00
Roman Tereshin d5fa9fde58 Reapplying r331819 [GlobalISel][Legalizer] More concise and faster widenScalar, NFC
The commit was a suspect for clang-cmake-aarch64-global-isel and
    clang-cmake-aarch64-quick bot failures, proved to be innocent.

llvm-svn: 331898
2018-05-09 17:28:18 +00:00
Daniel Sanders 618437459c Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.

llvm-svn: 331846
2018-05-09 05:00:17 +00:00
Shiva Chen 2c864551df [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.

We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

llvm-svn: 331841
2018-05-09 02:40:45 +00:00
Roman Tereshin 27bba4495a Revert r331819 [GlobalISel][Legalizer] More concise and faster widenScalar, NFC
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit

llvm-svn: 331839
2018-05-09 01:43:12 +00:00
Roman Tereshin 25cbfe680e [GlobalISel][Legalizer] More concise and faster widenScalar, NFC
Refactoring LegalizerHelper::widenScalar member function reducing its
size by approximately a factor of 2 and (hopefuly) making it more
straightforward and regular by introducing widenScalarSrc and
widenScalarDst helper methods.

The new widenScalar* methods mutate the instructions in place instead
of recreating them from scratch and removing the originals. The
compile time implications of this were measured on sqlite3
amalgamation, targeting AArch64 in -O0:

LegalizerHelper::widenScalar: > 25% faster
Legalizer::runOnMachineFunction: ~ 4.0 - 4.5% faster

Also adding MachineOperand::setCImm and refactoring out
MachineIRBuilder::recordInsertion methods to make the change possible.

Reviewers: aditya_nandakumar, bogner, javed.absar, t.p.northover, ab, dsanders, arsenm

Reviewed By: aditya_nandakumar

Subscribers: wdng, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46414

llvm-svn: 331819
2018-05-08 22:53:09 +00:00
Daniel Sanders d24dcdd1f7 [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Reviewed By: aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 331816
2018-05-08 22:26:39 +00:00
Roman Lebedev 0412c90236 [DAGCombine][NFC] Masked merge unfolding: comment: some tests are non-canonical
As requested in https://reviews.llvm.org/D46494#inline-407282

llvm-svn: 331650
2018-05-07 16:42:47 +00:00
Daniel Sanders acc008cb0c [globalisel] Remove redundant -global-isel option from tests that use -run-pass. NFC
As Roman Tereshin pointed out in https://reviews.llvm.org/D45541, the
-global-isel option is redundant when -run-pass is given. -global-isel sets up
the GlobalISel passes in the pass manager but -run-pass skips that entirely and
configures it's own pipeline.

llvm-svn: 331603
2018-05-05 21:19:59 +00:00
Daniel Sanders f84bc3793e [globalisel] Update GlobalISel emitter to match new representation of extending loads
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch changes the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bit. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch that adds combiner helpers for
for this will follow.

The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.

Depends on D45540

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar

Reviewed By: rtereshin

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45541

llvm-svn: 331601
2018-05-05 20:53:24 +00:00
Roman Lebedev a3b0b59f54 [DAGCombiner] Masked merge: don't touch "not" xor's.
Summary:
Split off form D46031.

It seems we don't want to transform the pattern if the `xor`'s are actually `not`'s.
In vector case, this breaks `andnpd` / `vandnps` patterns.

That being said, we may want to re-visit this `not` handling, maybe in D46073.

Reviewers: spatel, craig.topper, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46492

llvm-svn: 331595
2018-05-05 15:45:40 +00:00
Geoff Berry 8e4958e760 [MachineLICM] Debug intrinsics shouldn't affect hoist decisions
Summary:
When checking if an instruction stores to a given frame index, check
that the instruction can write to memory before looking at the memory
operands list to avoid e.g. DBG_VALUE instructions that reference a
frame index preventing a load from that index from being hoisted.

Reviewers: dblaikie, MatzeB, qcolombet, reames, javed.absar

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46284

llvm-svn: 331549
2018-05-04 19:25:09 +00:00
Adhemerval Zanella 6d56294c7f [AArch64] Add missing testcase for r331522
llvm-svn: 331541
2018-05-04 17:21:26 +00:00
Adhemerval Zanella a57ef17ab6 [AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32
This patch adds a custom lowering for ISD::MULH{S,U} used on divide by
constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV).

New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL
can be correctly lowered to smull2 and umull2.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46009

llvm-svn: 331522
2018-05-04 14:33:55 +00:00
Piotr Padlewski 5dde809404 Rename invariant.group.barrier to launder.invariant.group
Summary:
This is one of the initial commit of "RFC: Devirtualization v2" proposal:
https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing

Reviewers: rsmith, amharc, kuhar, sanjoy

Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45111

llvm-svn: 331448
2018-05-03 11:03:01 +00:00
Eli Friedman 763d161eda [AArch64] Add more tests for 64-bit immediate lowering.
This adds a some more tests, and adds some notes to tests which are using
a suboptimal lowering.

The constants with suboptimal lowerings seem to be relatively rare in
practice, but it might be a fun project to work on improvements.

llvm-svn: 331304
2018-05-01 20:00:14 +00:00
Vedant Kumar ee4bfcaa5a [DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N)
Setting the right SDLoc on a newly-created zextload fixes a line table
bug which resulted in non-linear stepping behavior.

Several backend tests contained CHECK lines which relied on the IROrder
inherited from the wrong SDLoc. This patch breaks that dependence where
feasbile and regenerates test cases where not.

In some cases, changing a node's IROrder may alter register allocation
and spill behavior. This can affect performance. I have chosen not to
prevent this by applying a "known good" IROrder to SDLocs, as this may
hide a more general bug in the scheduler, or cause regressions on other
test inputs.

rdar://33755881, Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D45995

llvm-svn: 331300
2018-05-01 19:26:15 +00:00
Daniel Sanders 2de9d4ad5d Fix infinite loop after r331115
There are two separate fixes here:
* The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction.
* The target should not be requesting lowering of non-extending loads.

llvm-svn: 331201
2018-04-30 17:20:01 +00:00
Daniel Sanders 5eb9f581b6 [globalisel][legalizerinfo] Introduce dedicated extending loads and add lowerings for them
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch begins changing the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

This patch introduces the new generic instructions and new variation on
G_LOAD and adds lowering for them to convert back to the existing
representations.

Depends on D45466

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45540

llvm-svn: 331115
2018-04-28 18:14:50 +00:00
Jessica Paquette 0b6724917a [MachineOutliner] Add defs to calls + don't track liveness on outlined functions
This commit makes it so that if you outline a def of some register, then the
call instruction created by the outliner actually reflects that the register
is defined by the call. It also makes it so that outlined functions don't
have the TracksLiveness property.

Outlined calls shouldn't break liveness assumptions that someone might make.

This also un-XFAILs the noredzone test, and updates the calls test.

llvm-svn: 331095
2018-04-27 23:36:35 +00:00
Jun Bum Lim 9e3e14b5f9 [PostRASink] extend the live-in check for all aliased registers
Extend the live-in check for all aliased registers so that we can
allow sinking Copy instructions when only implicit def is in successor's
live-in.

llvm-svn: 331072
2018-04-27 19:59:20 +00:00
Daniel Sanders 27fe8a5011 [globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand
Summary:
Currently only the memory size is supported but others can be added as
needed.

narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45466

llvm-svn: 331071
2018-04-27 19:48:53 +00:00
Francis Visoiu Mistrih c855e92ca9 [AArch64] Place the first ldp at the end when ReverseCSRRestoreSeq is true
Put the first ldp at the end, so that the load-store optimizer can run
and merge the ldp and the add into a post-index ldp.

This didn't work in case no frame was needed and resulted in code size
regressions.

llvm-svn: 331044
2018-04-27 15:30:54 +00:00
Oliver Stannard 76088a5929 [AArch64] Codegen for v8.2A dot product intrinsics
This adds IR intrinsics for the AArch64 dot-product instructions introduced in
v8.2-A.

Differential revisioon: https://reviews.llvm.org/D46107

llvm-svn: 331036
2018-04-27 13:45:32 +00:00
Eli Friedman da018e5687 [MachineOutliner] Don't outline from functions with a section marking.
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.

I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.

Differential Revision: https://reviews.llvm.org/D46091

llvm-svn: 331007
2018-04-27 00:21:34 +00:00
Geoff Berry 08ab8c9544 [AArch64] Fix scavenged spill slot base when stack realignment required.
Summary:
Use the FP for scavenged spill slot accesses to prevent corruption of
the callee-save region when the SP is re-aligned.

Based on problem and patch reported by @paulwalker-arm

This is an alternative to solution proposed in D45770

Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar

Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46063

llvm-svn: 330976
2018-04-26 18:50:45 +00:00
Francis Visoiu Mistrih 57fcd3454a [MIR] Add support for debug metadata for fixed stack objects
Debug var, expr and loc were only supported for non-fixed stack objects.

This patch adds the following fields to the "fixedStack:" entries, and
renames the ones from "stack:" to:

* debug-info-variable
* debug-info-expression
* debug-info-location

Differential Revision: https://reviews.llvm.org/D46032

llvm-svn: 330859
2018-04-25 18:58:06 +00:00
Amara Emerson 1f5d994119 [AArch64][GlobalISel] Implement selection for the llvm.trap intrinsic.
rdar://38674040

llvm-svn: 330831
2018-04-25 14:43:59 +00:00
Roman Lebedev cfa9e58ccf [X86][AArch64][NFC] Finish adding 'bad' tests for masked merge unfolding with constants.
I have initially committed basic tests in, rL330771,
but then quickly discovered that there are a few more
interesting patterns.

llvm-svn: 330819
2018-04-25 12:48:23 +00:00
Jessica Paquette 4f56428de1 [MachineOutliner] Check for explicit uses of LR/W30 in MI operands
Before, the outliner would grab ADRPs that used LR/W30. This patch fixes
that by checking for explicit uses of those registers before the special-casing
for ADRPs.

This also adds a test that ensures that those sorts of ADRPs won't be outlined.

llvm-svn: 330783
2018-04-24 22:38:15 +00:00
Roman Lebedev 54df09e8fe [X86][AArch64][NFC] Add tests for masked merge unfolding with %y = const
The fold was added in D45733.

This appears to be a regression.

llvm-svn: 330771
2018-04-24 21:23:22 +00:00
Petar Jovanovic e2bfcd6394 Correct dwarf unwind information in function epilogue
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

* CFI instructions do not affect code generation (they are not counted as
  instructions when tail duplicating or tail merging)
* Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Added CFIInstrInserter pass:

* analyzes each basic block to determine cfa offset and register are valid
  at its entry and exit
* verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
* inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D42848

llvm-svn: 330706
2018-04-24 10:32:08 +00:00
Roman Tereshin 3c6ea7e28c [GlobalISel][Legalizer] Look thro copies while combining G_UNMERGE's
As we're becoming stricter w/ respect to not allowing vregs having LLTs
and regclasses assigned both mid-globalisel pipeline, the number of
extra copies grows, some of which separate G_UNMERGE's from their
corresponding G_MERGE's, becoming a performance concern.

It's worth mentioning that we're already looking through copies while
combining legalization artifacts for every kind of artifact but
G_UNMERGE.

Reviewed By: aditya_nandakumar

Reviewers: ab, t.p.northover, volkan, javed.absar

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45644

llvm-svn: 330660
2018-04-23 22:28:36 +00:00
Roman Lebedev 95c6eaf530 [DAGCombiner] Unfold scalar masked merge if profitable
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).
So we need to make sure that they are still generated.

If the mask is constant, we do nothing. InstCombine should have unfolded it.
Else, i use `hasAndNot()` TLI hook.

For now, only handle scalars.

https://rise4fun.com/Alive/bO6

----

I *really* don't like the code i wrote in `DAGCombiner::unfoldMaskedMerge()`.
It is super fragile. Is there something like IR Pattern Matchers for this?

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: andreadb, courbet, kristof.beyls, javed.absar, rengolin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D45733

llvm-svn: 330646
2018-04-23 20:38:49 +00:00
Roman Lebedev bf18cc56d3 [X86][AArch64][NFC] Add tests for masked merge unfolding
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).
I'm guessing `llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp` should be able to unfold this.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: nemanjai, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45563

llvm-svn: 330645
2018-04-23 20:38:42 +00:00
Peter Collingbourne 5ab4a4793e Reland r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses.", with a fix for the bot failure.
This reland includes a check to prevent the DAG combiner from folding an
offset that is smaller than the existing one. This can cause oscillations
between two possible DAGs, which was the cause of the hang and later assertion
failure observed on the lnt-ctmark-aarch64-O3-flto bot.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

Original commit message:
> This is a code size win in code that takes offseted addresses
> frequently, such as C++ constructors that typically need to compute
> an offseted address of a vtable. This reduces the size of Chromium
> for Android's .text section by 108KB.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 330630
2018-04-23 19:09:34 +00:00
Eli Friedman 0644130612 [AArch64] Don't crash trying to resolve __stack_chk_guard.
In certain cases, the compiler might try to merge __stack_chk_guard with
another global variable.  (Or someone could theoretically define
__stack_chk_guard as an alias.)  In that case, make sure we don't crash.

Differential Revision: https://reviews.llvm.org/D45746

llvm-svn: 330495
2018-04-21 00:07:46 +00:00
Jessica Paquette e5d279e6d6 Fix typo in test (verify-machine-instrs -> verify-machineinstrs)
llvm-svn: 330494
2018-04-20 23:37:48 +00:00
Jessica Paquette d442c3a632 [MachineOutliner] XFAIL machine-outliner-noredzone.ll
The verifier began complaining about an undefined physical register in this
test. XFAILing for the purposes of getting a bot up while I look into it.

Failure:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/11385/

llvm-svn: 330493
2018-04-20 23:35:54 +00:00
Jessica Paquette 2e5ada5c81 [MachineOutliner] Change B instruction for tail calls to TCRETURNdi
First off, this is more correct than having the B. Second off, this was making
a bot upset. This fixes that.

Update the test to include -verify-machineinstrs as well to prevent stuff like
this slipping by non debug/assert builds in the future.

llvm-svn: 330459
2018-04-20 18:03:21 +00:00
Sanjay Patel 3d453ad711 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This was originally committed at rL328921 and reverted at rL329920 to
investigate failures in Chrome. This time I've added to the ReleaseNotes
to warn users of the potential of exposing UB and let me repeat that
here for more exposure:

  Optimization of floating-point casts is improved. This may cause surprising
  results for code that is relying on undefined behavior. Code sanitizers can
  be used to detect affected patterns such as this:

    int main() {
      float x = 4294967296.0f;
      x = (float)((int)x);
      printf("junk in the ftrunc: %f\n", x);
      return 0;
    }

    $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out
    ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of 
                   representable values of type 'int'
    junk in the ftrunc: 0.000000


Original commit message:

fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 330437
2018-04-20 15:07:55 +00:00
Amara Emerson 9de072f8ae [AArch64] Add isel pattern for v8i8->v2f32 NVCASTs.
rdar://39454635

llvm-svn: 330276
2018-04-18 17:10:19 +00:00
Momchil Velikov d501a609c4 Add tests for shrink wrapping and VLAs
Differential revision: https://reviews.llvm.org/D45727

llvm-svn: 330253
2018-04-18 13:37:12 +00:00
Tim Northover 271d3d2771 MachO: trap unreachable instructions
Debugability is more important than saving 4 bytes to let us to fall
through to nonense.

llvm-svn: 330073
2018-04-13 22:25:20 +00:00
Peter Collingbourne ab40e44ba1 Revert r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses."
Caused a hang and eventually an assertion failure in LTO builds
of 7zip-benchmark on aarch64 iOS targets.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

llvm-svn: 330063
2018-04-13 20:21:00 +00:00
Peter Collingbourne 00db326b0d AArch64: Introduce a DAG combine for folding offsets into addresses.
This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. This reduces the size of Chromium
for Android's .text section by 108KB.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 329956
2018-04-12 21:23:55 +00:00
Jessica Paquette 8aa6cd5cb9 [AArch64] Move AFI->setRedZone(false) to top of emitPrologue
AFI->setRedZone(false) was put in the wrong place before, and so it only fired
on functions that didn't have stack frames. This moves that to the top of
emitPrologue to make sure that every function without a redzone has it set
correctly.

This also adds a function representing one of the early exit cases (GHC calling
convention) to the MachineOutliner noredzone test to ensure that we can outline
from functions like these, where we never use a redzone.

llvm-svn: 329922
2018-04-12 16:16:18 +00:00
Sanjay Patel 5ace2b765a revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This change is exposing UB in source code - as was warned/predicted. :)
See D44909 for discussion. Reverting while we figure out how to fix things.

llvm-svn: 329920
2018-04-12 15:27:01 +00:00