Commit Graph

54771 Commits

Author SHA1 Message Date
Wolfgang Pieb 9ea65082ff [DWARF v5] Reposting r337981, which was reverted in r337997 due to a test failure in debuginfo_tests.
The test failure was caused by the compiler not emitting a __debug_ranges section with DWARF 4 and
earlier when no ranges are needed. The test checks for the existence regardless.

llvm-svn: 338081
2018-07-26 22:48:52 +00:00
Zachary Turner 23df1319ca [MS Demangler] Properly handle function parameter back-refs.
Properly demangle function parameter back-references.

Previously we treated lists of function parameters and template
parameters the same. There are some important differences with regards
to back-references, and some less important differences regarding which
characters can appear before or after the name.

The important differences are that with a given type T, all instances of
a function parameter list share the same global back-ref table.
Specifically, if X and Y are function pointers, then there are 3
entities in the declaration X func(Y) which all affect and are affected
by the master parameter back-ref table:
  1) The parameter list of X's function type
  2) the parameter list of func itself
  3) The parameter list of Y's function type.

The previous code would create a back-reference table that was local to
a single parameter list, so it would not be shared across parameter
lists.

This was discovered when porting ms-back-references.test from clang's
mangling tests. All of these tests should now pass with the new changes.

In doing so, I split the function for parsing template and function
parameters into two separate functions. This makes the template
parameter list parsing code in particular very small and easy to
understand now.

Differential Revision: https://reviews.llvm.org/D49875

llvm-svn: 338075
2018-07-26 22:13:39 +00:00
Keno Fischer 864fbd8e9a [SCEV] Don't expand Wrap predicate using inttoptr in ni addrspaces
Summary:
In non-integral address spaces, we're not allowed to introduce inttoptr/ptrtoint
intrinsics. Instead, we need to expand any pointer arithmetic as geps on the
base pointer. Luckily this is a common task for SCEV, so all we have to do here
is hook up the corresponding helper function and add test case.

Fixes PR38290

Reviewers: sanjoy
Differential Revision: https://reviews.llvm.org/D49832

llvm-svn: 338073
2018-07-26 21:55:06 +00:00
Vedant Kumar b572f64212 [DebugInfo] LowerDbgDeclare: Add derefs when handling CallInst users
LowerDbgDeclare inserts a dbg.value before each use of an address
described by a dbg.declare. When inserting a dbg.value before a CallInst
use, however, it fails to append DW_OP_deref to the DIExpression.

The DW_OP_deref is needed to reflect the fact that a dbg.value describes
a source variable directly (as opposed to a dbg.declare, which relies on
pointer indirection).

This patch adds in the DW_OP_deref where needed. This results in the
correct values being shown during a debug session for a program compiled
with ASan and optimizations (see https://reviews.llvm.org/D49520). Note
that ConvertDebugDeclareToDebugValue is already correct -- no changes
there were needed.

One complication is that SelectionDAG is unable to distinguish between
direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also
fixes this long-standing issue in order to not regress integration tests
relying on the incorrect assumption that all frame-index SDDbgValues are
indirect. This is a necessary fix: the newly-added DW_OP_derefs cannot
be lowered properly otherwise. Basically the fix prevents a direct
SDDbgValue with DIExpression(DW_OP_deref) from being dereferenced twice
by a debugger. There were a handful of tests relying on this incorrect
"FRAMEIX => indirect" assumption which actually had incorrect
DW_AT_locations: these are all fixed up in this patch.

Testing:

- check-llvm, and an end-to-end test using lldb to debug an optimized
  program.
- Existing unit tests for DIExpression::appendToStack fully cover the
  new DIExpression::append utility.
- check-debuginfo (the debug info integration tests)

Differential Revision: https://reviews.llvm.org/D49454

llvm-svn: 338069
2018-07-26 20:56:53 +00:00
Zachary Turner 024e1762aa [MS Demangler] Print calling convention inside parentheses.
For function pointers, we would print something like

int __cdecl (*)(int)

We need to move the calling convention inside, and print

int (__cdecl *)(int)

This patch implements this change for regular function pointers as
well as member function pointers.

llvm-svn: 338068
2018-07-26 20:33:48 +00:00
Zachary Turner ca7aef10c4 [MS Demangler] Add ms-arg-qualifiers.test
This converts the arg qualifier mangling tests from
clang/CodeGenCXX/mangle-ms-arg-qualifiers.cpp to demangling tests.
Most tests already pass, so this patch doesn't come with any
functional change, just the addition of new tests.  The few tests
that don't pass are left in with a FIXME label so that they don't
run but serve as documentation about what still doesn't work.

llvm-svn: 338067
2018-07-26 20:25:35 +00:00
Zachary Turner f4c4519532 Add missing tests from ms-mangle.cpp.
None of these tests pass yet so they are commented out, but I'm
adding them with a FIXME label so that they don't get lost when
copying tests over from clang's mangling tests.  Currently these
tests are all commented out.

llvm-svn: 338066
2018-07-26 20:20:29 +00:00
Zachary Turner 38b78a7f0e [MS Demangler] Demangle pointers to member functions.
After this patch, we can now properly demangle pointers to member
functions.  The calling convention is located in the wrong place,
but this will be fixed in a followup since it also affects non
member function pointers.

Differential Revision: https://reviews.llvm.org/D49639

llvm-svn: 338065
2018-07-26 20:20:10 +00:00
Martin Storsjo 390bce4322 [MC] Add support for the .rva assembler directive for COFF targets
Even though gas doesn't document it, it has been supported there for
a very long time.

This produces the 32 bit relative virtual address (aka image relative
address) for a given symbol. ".rva foo" is essentially equal to
".long foo@imgrel".

Differential Revision: https://reviews.llvm.org/D49821

llvm-svn: 338063
2018-07-26 20:11:26 +00:00
Stephen Hines e6e75bf84c Handle the lack of a symbol table correctly.
Summary:
These two cases will trigger a dereference on a nullptr, since the
SymbolTable can be nonexistent for a given library, in addition to just
being empty.

Reviewers: alexshap

Reviewed By: alexshap

Subscribers: meikeb, kongyi, chh, jakehehrlich, llvm-commits, pirama

Differential Revision: https://reviews.llvm.org/D49534

llvm-svn: 338062
2018-07-26 20:05:31 +00:00
Zachary Turner d742d645a1 [MS Demangler] Demangle data member pointers.
Differential Revision: https://reviews.llvm.org/D49630

llvm-svn: 338061
2018-07-26 19:56:09 +00:00
Scott Linder eb1f75d561 [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits
Scale the offset of VGPR spills by the wave size when it cannot fit in the
12-bit offset immediate field and so is added to the soffset SGPR. This
accounts for hardware swizzling of scratch memory.

Differential Revision: https://reviews.llvm.org/D49448

llvm-svn: 338060
2018-07-26 19:47:51 +00:00
Sanjay Patel 6d6eab66e0 [InstCombine] fold udiv with common factor from muls with nuw
Unfortunately, sdiv isn't as simple because of UB due to overflow.

This fold is mentioned in PR38239:
https://bugs.llvm.org/show_bug.cgi?id=38239

llvm-svn: 338059
2018-07-26 19:22:41 +00:00
Ana Pazos 2e4106b73d [RISCV] Add support for _interrupt attribute
- Save/restore only registers that are used.
This includes Callee saved registers and Caller saved registers
(arguments and temporaries) for integer and FP registers.
- If there is a call in the interrupt handler, save/restore all
Caller saved registers (arguments and temporaries) and all FP registers.
- Emit special return instructions depending on "interrupt"
attribute type.
Based on initial patch by Zhaoshi Zheng.

Reviewers: asb

Reviewed By: asb

Subscribers: rkruppe, the_o, MartinMosbeck, brucehoult, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, llvm-commits

Differential Revision: https://reviews.llvm.org/D48411

llvm-svn: 338047
2018-07-26 17:49:43 +00:00
Matthias Braun 09810c9269 MacroFusion: Fix macro fusion with ExitSU failing in top-down scheduling
When fusing instructions A and B, we must add all predecessors of B as
predecessors of A to avoid instructions getting scheduling in between.

There is a special case involving ExitSU: Every other node must be
scheduled before it by design and we don't need to make this explicit in
the graph, however when fusing with a different node we need to schedule
every othere node before the fused node too and we need to make this
explicit now: This patch adds a dependency from the fused node to all
roots in the graph.

Differential Revision: https://reviews.llvm.org/D49830

llvm-svn: 338046
2018-07-26 17:43:56 +00:00
Roman Lebedev 41ba5c1455 [DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle ule,ugt CondCodes.
Summary:
A follow-up for D49266 / rL337166.

At least one of these cases is more canonical,
so we really do have to handle it.
https://godbolt.org/g/pkzP3X
https://rise4fun.com/Alive/pQyhZZ

We won't get to these cases with I1 being -1,
as that will be constant-folded to true or false.

I'm also not sure we actually hit the 'ule' case,
but i think the worst think that could happen is that being dead code.

Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49497

llvm-svn: 338044
2018-07-26 17:34:28 +00:00
Alexey Bataev 4dd7558fab [DEBUGINFO, NVPTX] Emit correct debug information for local variables.
Summary:
NVPTX target dos not use register-based frame information. Instead it
relies on the artificial local_depot that is used instead of the frame
and the data for variables must be emitted relatively to this
local_depot.

Reviewers: tra, jlebar, echristo

Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D45963

llvm-svn: 338039
2018-07-26 16:29:52 +00:00
Sanjay Patel b381e7e59a [InstCombine] add tests for udiv with common factor; NFC
This fold is mentioned in PR38239:
https://bugs.llvm.org/show_bug.cgi?id=38239

The general case probably belongs in -reassociate, but given that we do 
basic reassociation optimizations similar to this in instcombine already, 
we might as well be consistent within instcombine and handle this pattern?

llvm-svn: 338038
2018-07-26 16:14:53 +00:00
Alexey Bataev 7ae86fe71c [DEBUGINFO, NVPTX] Set `DW_AT_frame_base` to `DW_OP_call_frame_cfa`.
Summary:
For NVPTX target the value of `DW_AT_frame_base` attribute must be set
to `DW_OP_call_frame_cfa`.

Reviewers: tra, jlebar, echristo

Subscribers: jholewinski, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D45785

llvm-svn: 338036
2018-07-26 16:10:05 +00:00
Jonas Devlieghere 640e790af2 [test] Disable dsymutil update test on windows
Apparently, the issue with dsymutil update functionality on Windows was
that Windows doesn't like dsymutil renaming files that have open handles
to them. This disables the new accelerator test and updates the comment
in the other two test.

We should be able to enable the tests again once we updated the
implementation to use TempFile::keep() to keep the temporary files in
MachOUtils.

A big thank you to Jeremy Morse from Sony for figuring this out and
bringing it to my attention.

llvm-svn: 338030
2018-07-26 14:16:19 +00:00
Luke Cheeseman 66b5e7da4c Enable some pointer authentication instructions for aarch64 v8a targets
- Some of the v8.3 pointer authentication instruction inhabit the Hint space
- These instructions can be assembled to hint instructions which act as NOP instructions prior to v8.3
- This patch permits using the hint instructions for all v8a targets
- Also, correct the RETA{A,B} instructions to match the instruction attributes of RET (set isTerminator and isBarrier)

Differential Revision: https://reviews.llvm.org/D49786

llvm-svn: 338029
2018-07-26 14:00:50 +00:00
Stefan Maksimovic 4a612d4bf2 [mips] Sign extend i32 return values on MIPS64
Override getTypeForExtReturn so that functions returning
an i32 typed value have it sign extended on MIPS64.

Also provide patterns to get rid of unneeded sign extensions
for arithmetic instructions which implicitly sign extend
their results.

Differential Revision: https://reviews.llvm.org/D48374

llvm-svn: 338019
2018-07-26 10:59:35 +00:00
Martin Storsjo 9dafd6f6d9 Revert "[COFF] Use comdat shared constants for MinGW as well"
This reverts commit r337951.

While that kind of shared constant generally works fine in a MinGW
setting, it broke some cases of inline assembly that worked before:

$ cat const-asm.c
int MULH(int a, int b) {
    int rt, dummy;
    __asm__ (
        "imull %3"
        :"=d"(rt), "=a"(dummy)
        :"a"(a), "rm"(b)
    );
    return rt;
}
int func(int a) {
    return MULH(a, 1);
}
$ clang -target x86_64-win32-gnu -c const-asm.c -O2
const-asm.c:4:9: error: invalid variant '00000001'
        "imull %3"
        ^
<inline asm>:1:15: note: instantiated into assembly here
        imull __real@00000001(%rip)
                     ^

A similar error is produced for i686 as well. The same test with a
target of x86_64-win32-msvc or i686-win32-msvc works fine.

llvm-svn: 338018
2018-07-26 10:48:20 +00:00
Jonas Devlieghere f290256dfb [test] Do dsymutil update in place
Update the dSYM bundle in place when swapping out the accelerator
tables. This should unbreak the windows bot that have been failing with
an access denied.

llvm-svn: 338014
2018-07-26 09:23:10 +00:00
Sjoerd Meijer 31d38586e7 [AArch64][NFC] Removed tab characters from test files.
llvm-svn: 338011
2018-07-26 07:59:39 +00:00
Sjoerd Meijer dc198344ce [AArch64] Armv8.2-A: add the crypto extensions
This adds MC support for the crypto instructions that were made optional
extensions in Armv8.2-A (AArch64 only).

Differential Revision: https://reviews.llvm.org/D49370

llvm-svn: 338010
2018-07-26 07:13:59 +00:00
Fangrui Song f2822e2d9d [ConstProp] Fix calls-math-finite.ll on FreeBSD
FreeBSD's log(3.0) is less precise than glibc and musl.
Let's forgive its rounding error of more than half an ulp.

llvm-svn: 338009
2018-07-26 06:24:11 +00:00
Fangrui Song c32561ea8b [AsmParser] Fix preserve-comments-crlf.s on FreeBSD
--strip-trailing-cr is a diffutils option which is also available on
BSD-licensed diff introduced in FreeBSD 11.2, however, it has a bug
comparing files mixing \r and \r\n. Use -b (POSIX) instead.

llvm-svn: 338008
2018-07-26 06:07:03 +00:00
Craig Topper 4e687d5bb2 [X86] Don't use CombineTo to skip adding new nodes to the DAGCombiner worklist in combineMul.
I'm not sure if this was trying to avoid optimizing the new nodes further or what. Or maybe to prevent a cycle if something tried to reform the multiply? But I don't think its a reliable way to do that. If the user of the expanded multiply is visited by the DAGCombiner after this conversion happens, the DAGCombiner will check its operands, see that they haven't been visited by the DAGCombiner before and it will then add the first node to the worklist. This process will repeat until all the new nodes are visited.

So this seems like an unreliable prevention at best. So this patch just returns the new nodes like any other combine. If this starts causing problems we can try to add target specific nodes or something to more directly prevent optimizations.

Now that we handle the combine normally, we can combine any negates the mul expansion creates into their users since those will be visited now.

llvm-svn: 338007
2018-07-26 05:40:10 +00:00
Alex Lorenz 7d808c19ff Revert r337981: it breaks the debuginfo-tests
This commit caused a regression in the debuginfo-tests:

FAIL: debuginfo-tests :: apple-accel.cpp (40748 of 46595)
llvm-svn: 337997
2018-07-26 03:21:40 +00:00
Amara Emerson fdd089aa14 [GlobalISel] Fall back to SDISel for swifterror/swiftself attributes.
We don't currently support these, fall back until we do.

llvm-svn: 337994
2018-07-26 01:25:58 +00:00
Wolfgang Pieb 1d56b4ae40 [DWARF v5] Don't report an error when the .debug_rnglists section is empty or non-existent. Fixes PR38297.
Reviewer: JDevlieghere

Differential Revision: https://reviews.llvm.org/D49815
 

llvm-svn: 337993
2018-07-26 01:12:41 +00:00
Wolfgang Pieb c42087df7c [DWARF v5] Don't emit multiple DW_AT_rnglists_base attributes. Some refactoring of
range lists emissions and added test cases.

Reviewer: dblaikie

Differential Revision: https://reviews.llvm.org/D49522

llvm-svn: 337981
2018-07-25 23:03:22 +00:00
Jonas Devlieghere 743d351120 [dsymutil] Add support for generating DWARF5 accelerator tables.
This patch add support for emitting DWARF5 accelerator tables
(.debug_names) from dsymutil. Just as with the Apple style accelerator
tables, it's possible to update existing dSYMs. This patch includes a
test that show how you can convert back and forth between the two types.

If no kind of table is specified, dsymutil will default to generating
Apple-style accelerator tables whenever it finds those in its input. The
same is true when there are no accelerator tables at all. Finally, in
the remaining case, where there's at least one DWARF v5 table and no
Apple ones, the output will contains a DWARF accelerator tables
(.debug_names).

Differential revision: https://reviews.llvm.org/D49137

llvm-svn: 337980
2018-07-25 23:01:38 +00:00
Yonghong Song 71d81e5c8f bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in order
Some BPF JIT backends would want to optimize memcpy in their own
architecture specific way.

However, at the moment, there is no way for JIT backends to see memcpy
semantics in a reliable way. This is due to LLVM BPF backend is expanding
memcpy into load/store sequences and could possibly schedule them apart from
each other further. So, BPF JIT backends inside kernel can't reliably
recognize memcpy semantics by peephole BPF sequence.

This patch introduce new intrinsic expand infrastructure to memcpy.

To get stable in-order load/store sequence from memcpy, we first lower
memcpy into BPF::MEMCPY node which then expanded into in-order load/store
sequences in expandPostRAPseudo pass which will happen after instruction
scheduling. By this way, kernel JIT backends could reliably recognize
memcpy through scanning BPF sequence.

This new memcpy expand infrastructure is gated by a new option:

  -bpf-expand-memcpy-in-order

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 337977
2018-07-25 22:40:02 +00:00
Eli Friedman d6baff65f7 [GlobalMerge] Handle llvm.compiler.used correctly.
Reuse the handling for llvm.used, and don't transform such globals.

Fixes a failure on the asan buildbot caused by my previous commit.

llvm-svn: 337973
2018-07-25 22:03:35 +00:00
Sanjay Patel 215dcbf4db [SelectionDAG] try to convert funnel shift directly to rotate if legal
If the DAGCombiner's rotate matching was working as expected, 
I don't think we'd see any test diffs here. 

This sidesteps the issue of custom lowering for rotates raised in PR38243:
https://bugs.llvm.org/show_bug.cgi?id=38243
...by only dealing with legal operations.

llvm-svn: 337966
2018-07-25 21:38:30 +00:00
Roman Tereshin 4f10a9d3a3 [LSV] Look through selects for consecutive addresses
In some cases LSV sees (load/store _ (select _ <pointer expression>
<pointer expression>)) patterns in input IR, often due to sinking and
other forms of CFG simplification, sometimes interspersed with
bitcasts and all-constant-indices GEPs. With this
patch`areConsecutivePointers` method would attempt to handle select
instructions. This leads to an increased number of successful
vectorizations.

Technically, select instructions could appear in index arithmetic as
well, however, we don't see those in our test suites / benchmarks.
Also, there is a lot more freedom in IR shapes computing integral
indices in general than in what's common in pointer computations, and
it appears that it's quite unreliable to do anything short of making
select instructions first class citizens of Scalar Evolution, which
for the purposes of this patch is most definitely an overkill.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49428

llvm-svn: 337965
2018-07-25 21:33:00 +00:00
Sanjay Patel f94c4c84e6 [AArch, PowerPC] add more tests for legal rotate ops; NFC
llvm-svn: 337964
2018-07-25 21:25:50 +00:00
Eli Friedman 0887cf9cab [GlobalMerge] Allow merging globals with arbitrary alignment.
Instead of depending on implicit padding from the structure layout code,
use a packed struct and emit the padding explicitly.

Differential Revision: https://reviews.llvm.org/D49710

llvm-svn: 337961
2018-07-25 20:58:01 +00:00
Florian Hahn b6613ac665 Revert r337904: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
I suspect it is causing the clang-stage2-Rthinlto failures.

llvm-svn: 337956
2018-07-25 19:44:19 +00:00
Martin Storsjo ff33a95ed4 [COFF] Use comdat shared constants for MinGW as well
GNU binutils tools have no problems with this kind of shared constants,
provided that we actually hook it up completely in AsmPrinter and
produce a global symbol.

This effectively reverts SVN r335918 by hooking the rest of it up
properly.

This feature was implemented originally in SVN r213006, with no reason
for why it can't be used for MinGW other than the fact that GCC doesn't
do it while MSVC does.

Differential Revision: https://reviews.llvm.org/D49646

llvm-svn: 337951
2018-07-25 18:35:42 +00:00
Martin Storsjo d2662c32fb [COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name only
was used for msvc targets) into the arch independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.

With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues with both GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:

fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4

Differential Revision: https://reviews.llvm.org/D49644

llvm-svn: 337950
2018-07-25 18:35:31 +00:00
Eli Friedman 733f4ed1bb [ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.
Saves materializing the immediate for the "ands".

Corresponding patterns exist for lsrs+lsls, but that seems less common
in practice.

Now implemented as a DAGCombine.

Differential Revision: https://reviews.llvm.org/D49585

llvm-svn: 337945
2018-07-25 18:22:22 +00:00
Roman Tereshin ed047b0184 [SCEV] Add [zs]ext{C,+,x} -> (D + [zs]ext{C-D,+,x})<nuw><nsw> transform
as well as sext(C + x + ...) -> (D + sext(C-D + x + ...))<nuw><nsw>
similar to the equivalent transformation for zext's

if the top level addition in (D + (C-D + x * n)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x * n), ensuring homogeneous behaviour of the
transformation and better canonicalization of such AddRec's

(indeed, there are 2^(2w) different expressions in `B1 + ext(B2 + Y)` form for
the same Y, but only 2^(2w - k) different expressions in the resulting `B3 +
ext((B4 * 2^k) + Y)` form, where w is the bit width of the integral type)

This patch generalizes sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformations added in
r209568 relaxing the requirements the following way:

1. C2 doesn't have to be a power of 2, it's enough if it's divisible by 2
 a sufficient number of times;
2. C1 doesn't have to be less than C2, instead of extracting the entire
  C1 we can split it into 2 terms: (00...0XXX + YY...Y000), keep the
  second one that may cause wrapping within the extension operator, and
  move the first one that doesn't affect wrapping out of the extension
  operator, enabling further simplifications;
3. C1 and C2 don't have to be positive, splitting C1 like shown above
 produces a sum that is guaranteed to not wrap, signed or unsigned;
4. in AddExpr case there could be more than 2 terms, and in case of
  AddExpr the 2nd and following terms and in case of AddRecExpr the
  Step component don't have to be in the C2*X form or constant
  (respectively), they just need to have enough trailing zeros,
  which in turn could be guaranteed by means other than arithmetics,
  e.g. by a pointer alignment;
5. the extension operator doesn't have to be a sext, the same
  transformation works and profitable for zext's as well.

Apparently, optimizations like SLPVectorizer currently fail to
vectorize even rather trivial cases like the following:

 double bar(double *a, unsigned n) {
   double x = 0.0;
   double y = 0.0;
   for (unsigned i = 0; i < n; i += 2) {
     x += a[i];
     y += a[i + 1];
   }
   return x * y;
 }

If compiled with `clang -std=c11 -Wpedantic -Wall -O3 main.c -S -o - -emit-llvm`
(!{!"clang version 7.0.0 (trunk 337339) (llvm/trunk 337344)"})

it produces scalar code with the loop not unrolled with the unsigned `n` and
`i` (like shown above), but vectorized and unrolled loop with signed `n` and
`i`. With the changes made in this commit the unsigned version will be
vectorized (though not unrolled for unclear reasons).

How it all works:

Let say we have an AddExpr that looks like (C + x + y + ...), where C
is a constant and x, y, ... are arbitrary SCEVs. Let's compute the
minimum number of trailing zeroes guaranteed of that sum w/o the
constant term: (x + y + ...). If, for example, those terms look like
follows:

        i
XXXX...X000
YYYY...YY00
   ...
ZZZZ...0000

then the rightmost non-guaranteed-zero bit (a potential one at i-th
position above) can change the bits of the sum to the left (and at
i-th position itself), but it can not possibly change the bits to the
right. So we can compute the number of trailing zeroes by taking a
minimum between the numbers of trailing zeroes of the terms.

Now let's say that our original sum with the constant is effectively
just C + X, where X = x + y + .... Let's also say that we've got 2
guaranteed trailing zeros for X:

         j
CCCC...CCCC
XXXX...XX00  // this is X = (x + y + ...)

Any bit of C to the left of j may in the end cause the C + X sum to
wrap, but the rightmost 2 bits of C (at positions j and j - 1) do not
affect wrapping in any way. If the upper bits cause a wrap, it will be
a wrap regardless of the values of the 2 least significant bits of C.
If the upper bits do not cause a wrap, it won't be a wrap regardless
of the values of the 2 bits on the right (again).

So let's split C to 2 constants like follows:

0000...00CC  = D
CCCC...CC00  = (C - D)

and represent the whole sum as D + (C - D + X). The second term of
this new sum looks like this:

CCCC...CC00
XXXX...XX00
-----------  // let's add them up
YYYY...YY00

The sum above (let's call it Y)) may or may not wrap, we don't know,
so we need to keep it under a sext/zext. Adding D to that sum though
will never wrap, signed or unsigned, if performed on the original bit
width or the extended one, because all that that final add does is
setting the 2 least significant bits of Y to the bits of D:

YYYY...YY00 = Y
0000...00CC = D
-----------  <nuw><nsw>
YYYY...YYCC

Which means we can safely move that D out of the sext or zext and
claim that the top-level sum neither sign wraps nor unsigned wraps.

Let's run an example, let's say we're working in i8's and the original
expression (zext's or sext's operand) is 21 + 12x + 8y. So it goes
like this:

0001 0101  // 21
XXXX XX00  // 12x
YYYY Y000  // 8y

0001 0101  // 21
ZZZZ ZZ00  // 12x + 8y

0000 0001  // D
0001 0100  // 21 - D = 20
ZZZZ ZZ00  // 12x + 8y

0000 0001  // D
WWWW WW00  // 21 - D + 12x + 8y = 20 + 12x + 8y

therefore zext(21 + 12x + 8y) = (1 + zext(20 + 12x + 8y))<nuw><nsw>

This approach could be improved if we move away from using trailing
zeroes and use KnownBits instead. For instance, with KnownBits we could
have the following picture:

    i
10 1110...0011  // this is C
XX X1XX...XX00  // this is X = (x + y + ...)

Notice that some of the bits of X are known ones, also notice that
known bits of X are interspersed with unknown bits and not grouped on
the rigth or left.

We can see at the position i that C(i) and X(i) are both known ones,
therefore the (i + 1)th carry bit is guaranteed to be 1 regardless of
the bits of C to the right of i. For instance, the C(i - 1) bit only
affects the bits of the sum at positions i - 1 and i, and does not
influence if the sum is going to wrap or not. Therefore we could split
the constant C the following way:

    i
00 0010...0011  = D
10 1100...0000  = (C - D)

Let's compute the KnownBits of (C - D) + X:

XX1 1            = carry bit, blanks stand for known zeroes
 10 1100...0000  = (C - D)
 XX X1XX...XX00  = X
--- -----------
 XX X0XX...XX00

Will this add wrap or not essentially depends on bits of X. Adding D
to this sum, however, is guaranteed to not to wrap:

0    X
 00 0010...0011  = D
 sX X0XX...XX00  = (C - D) + X
--- -----------
 sX XXXX   XX11

As could be seen above, adding D preserves the sign bit of (C - D) +
X, if any, and has a guaranteed 0 carry out, as expected.

The more bits of (C - D) we constrain, the better the transformations
introduced here canonicalize expressions as it leaves less freedom to
what values the constant part of ((C - D) + x + y + ...) can take.

Reviewed By: mzolotukhin, efriedma

Differential Revision: https://reviews.llvm.org/D48853

llvm-svn: 337943
2018-07-25 18:01:41 +00:00
Ulrich Weigand 5f75371c5d Fix corruption of result number in LegalizeVectorOps.cpp
When VectorLegalizer::LegalizeOp creates a new SDValue after iterating
over its arguments, we need to refer to the same result number of the
new node that the original value used.

Reviewed by: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D49805

llvm-svn: 337939
2018-07-25 17:08:13 +00:00
Stanislav Mekhanoshin 7e7268ac1c [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion
Differential Revision: https://reviews.llvm.org/D49761

llvm-svn: 337938
2018-07-25 17:02:11 +00:00
Stanislav Mekhanoshin b8269a9589 Fix llvm::ComputeNumSignBits with some operations and llvm.assume
Currently ComputeNumSignBits does early exit while processing some
of the operations (add, sub, mul, and select). This prevents the
function from using AssumptionCacheTracker if passed.

Differential Revision: https://reviews.llvm.org/D49759

llvm-svn: 337936
2018-07-25 16:39:24 +00:00
Krzysztof Parzyszek 4e07509d18 [Hexagon] Properly scale bit index when extracting elements from vNi1
For example v = <2 x i1> is represented as bbbbaaaa in a predicate register,
where b = v[1], a = v[0]. Extracting v[1] is equivalent to extracting bit 4
from the predicate register.

llvm-svn: 337934
2018-07-25 16:20:59 +00:00
Petar Jovanovic 58c0210023 [MIPS GlobalISel] Lower pointer arguments
Add support for lowering pointer arguments.
Changing type from pointer to integer is already done in
MipsTargetLowering::getRegisterTypeForCallingConv.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D49419

llvm-svn: 337912
2018-07-25 12:35:01 +00:00