Commit Graph

70214 Commits

Author SHA1 Message Date
Heejin Ahn c87b5e7e22 [WebAssembly] Fix subregion relationship in CFGSort
Summary:
The previous code for determining the innermost region in CFGSort was
not correct. We determine subregion relationship by domination of their
headers, i.e., if region A's header dominates region B's header, B is a
subregion of A. Previously we assumed that if a BB belongs to both a
loop and an exception, the region with fewer number of BBs is the
innermost one. This may not be true, because while WebAssemblyException
contains BBs in all its subregions (loops or exceptions), MachineLoop
may not, because MachineLoop does not contain BBs that don't have a path
to its header even if they are dominated by its header.

                Loop header  <---|
                    |            |
              Exception header   |
                    | \          |
                    A  B         |
                    |   \        |
                    |    C       |
                    |            |
                Loop latch       |
                    |            |
                    -------------|

For example, in this CFG, the loop does not contain B and C, because
they don't have a path back to the loop's header. But for CFGSort we
consider the exception here belongs to the loop and the exception should
be a subregion of the loop and scheduled together.

So here we should use `WE->contains(ML->getHeader())` (but not
`ML->contains(WE->getHeader())`, for the stated region above).
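
As a rough illustration of the check described above, the following Python
sketch models the two regions from the CFG as plain sets of block names. The
`Region` class, the block names, and `innermost_is_loop` are hypothetical
stand-ins rather than the LLVM classes; only the `contains(header)` test
mirrors the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Toy stand-in for MachineLoop / WebAssemblyException (hypothetical)."""
    header: str
    blocks: frozenset

    def contains(self, block: str) -> bool:
        return block in self.blocks


# Blocks from the CFG above. The loop omits B and C because they have no
# path back to the loop header; the exception contains all of its blocks.
loop = Region("loop_header",
              frozenset({"loop_header", "exc_header", "A", "latch"}))
exc = Region("exc_header", frozenset({"exc_header", "A", "B", "C"}))


def innermost_is_loop(ml: Region, we: Region) -> bool:
    # The loop is the inner region only if the exception contains the
    # loop's header; otherwise the exception is the inner region.
    return we.contains(ml.header)


# Comparing block counts cannot decide this case: both regions have 4 blocks.
same_size = len(loop.blocks) == len(exc.blocks)
loop_is_inner = innermost_is_loop(loop, exc)
```

In this toy model the block-count comparison is a tie, while the header check
reports the exception as the inner region, which matches the intent stated
above.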

This also fixes some comments and deletes `Regions` vector in
`RegionInfo` class, which was not used anywhere.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77181
2020-04-01 08:12:41 -07:00
Georgii Rymar f527e6f2e1 [llvm-readobj] - Do not crash when SHT_HASH table is broken.
We have scenarios when the logic of --elf-hash-histogram/--hash-symbols/--hash-table
options might crash when given a broken hash table.

This patch adds pre-checks for tables for these 3 options
and provides test cases.

Differential revision: https://reviews.llvm.org/D77147
2020-04-01 18:03:02 +03:00
Jessica Clarke 616289ed29 [LegalizeTypes][RISCV] Correctly sign-extend comparison for ATOMIC_CMP_XCHG
Summary:
Currently, the comparison argument used for ATOMIC_CMP_XCHG is legalised
with GetPromotedInteger, which leaves the upper bits of the value
undefined. Since this is used for comparing in an LR/SC loop with a
full-width comparison, we must sign extend it. We introduce a new
getExtendForAtomicCmpSwapArg to complement getExtendForAtomicOps, since
many targets have compare-and-swap instructions (or pseudos) that
correctly handle an any-extend input, and the existing function
determines the extension of the result, whereas we are concerned with
the input.

This is related to https://reviews.llvm.org/D58829, which solved the
issue for ATOMIC_CMP_SWAP_WITH_SUCCESS, but not the simpler
ATOMIC_CMP_SWAP.
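
A minimal sketch of why the extension matters, assuming a 32-bit value held in
a 64-bit register and a loop that loads a sign-extended value. The helper name
and the concrete bit patterns are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def sign_extend_32_to_64(value: int) -> int:
    """Sign-extend the low 32 bits of value into a 64-bit register image."""
    low = value & 0xFFFFFFFF
    return (low | ~0xFFFFFFFF) & MASK64 if low & 0x80000000 else low


# Assumption for illustration: the loop loads a sign-extended 32-bit value.
loaded = sign_extend_32_to_64(0x80000001)

# Promoted comparison operand whose upper 32 bits are unspecified garbage.
any_extended = 0x1234567880000001

# A full-width compare fails even though the low 32 bits agree.
matches_without_fix = loaded == any_extended
# After sign-extending the comparison operand the compare succeeds.
matches_with_fix = loaded == sign_extend_32_to_64(any_extended)
```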

Reviewers: asb, lenary, efriedma

Reviewed By: asb

Subscribers: arichardson, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, evandro, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74453
2020-04-01 15:51:26 +01:00
Puyan Lotfi e3033c0ce5 [llvm][clang][IFS] Enhancing the llvm-ifs yaml format for symbol lists.
Prior to this change the clang interface stubs format resembled
something ending with a symbol list like this:

 Symbols:
   a: { Type: Func }

This was problematic because we didn't actually want a map format and
also because we didn't like that an empty symbol list required
"Symbols: {}". That is to say without the empty {} llvm-ifs would crash
on an empty list.

With this new format it is much more clear which field is the symbol
name, and instead the [] that is used to express an empty symbol vector
is optional, ie:

Symbols:
 - { Name: a, Type: Func }

or

Symbols: []

or

Symbols:

This further diverges the format from existing llvm-elftapi. This is a
good thing because although the format originally came from the same
place, they are not the same in any way.

Differential Revision: https://reviews.llvm.org/D76979
2020-04-01 10:49:06 -04:00
Simon Pilgrim eb8880562e [X86][SSE] combinePTESTCC - fold TESTZ(X,~Y) -> TESTC(Y,X) 2020-04-01 15:10:53 +01:00
Kai Wang 501522b5b2 [RISCV] Support RISC-V ELF attributes sections in llvm-readobj.
Enable llvm-readobj to handle RISC-V ELF attribute sections.

Differential Revision: https://reviews.llvm.org/D75833
2020-04-01 21:50:11 +08:00
David Green a0c537834a [ARM] Extra vmull loop tests. NFC 2020-04-01 14:07:45 +01:00
shchenz e344f8b9db Revert "[LSR] re-add testcase for wrongly phi node elimination - NFC"
This reverts commit f25a1b4f58.
ARM and Hexagon fail on the newly added case.
2020-04-01 12:58:06 +00:00
Pierre-vh 2effe8f5e7 [Target][ARM] Improvements to the VPT Block Insertion Pass
This allows the MVE VPT Block insertion pass to remove VPNOTs in
order to create more complex VPT blocks such as TE, TEET, TETE, etc.

Differential Revision: https://reviews.llvm.org/D75993
2020-04-01 12:34:20 +01:00
shchenz f25a1b4f58 [LSR] re-add testcase for wrongly phi node elimination - NFC
Retest the case on X86/SystemZ/AArch64/PowerPC
2020-04-01 11:11:17 +00:00
Cullen Rhodes 84aa6cf1a9 [Transforms][SROA] Promote allocas with mem2reg for scalable types
Summary:
Aggregate types containing scalable vectors aren't supported and as far
as I can tell this pass is mostly concerned with optimisations on
aggregate types, so the majority of this pass isn't very useful for
scalable vectors.

This patch modifies SROA such that mem2reg is run on allocas with
scalable types that are promotable, but nothing else such as slicing is
done.

The use of TypeSize in this pass has also been updated to be explicitly
fixed size. When invoking the following methods in DataLayout:

    * getTypeSizeInBits
    * getTypeStoreSize
    * getTypeStoreSizeInBits
    * getTypeAllocSize

we now call getFixedSize on the resultant TypeSize. This is quite an
extensive change with around 50 calls to these functions, and also the
first change of this kind (being explicit about fixed vs scalable
size) as far as I'm aware, so feedback welcome.

A test is included containing IR with scalable vectors that this pass is
able to optimise.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76720
2020-04-01 10:34:11 +00:00
Simon Pilgrim 918ccb64b0 [X86][SSE] Handle basic inversion of PTEST/TESTP operands (PR38522)
PTEST/TESTP sets EFLAGS as:
TESTZ: ZF = (Op0 & Op1) == 0
TESTC: CF = (~Op0 & Op1) == 0
TESTNZC: ZF == 0 && CF == 0

If we are inverting the 0'th operand of a PTEST/TESTP instruction we can adjust the comparisons to correctly handle the inversion implicitly.

Additionally, for "TESTZ" (ZF) cases, the allones case, PTEST(X,-1) can be simplified to PTEST(X,X).

We can expand this for the TESTZ(X,~Y) pattern and also handle KTEST/KORTEST in the future.

Differential Revision: https://reviews.llvm.org/D76984
2020-04-01 11:33:28 +01:00
shchenz 8b8cd150a4 Revert "[LSR] add testcase for wrongly phi node elimination - NFC"
This reverts commit dbf5e4f6c7.
The testcase has different behaviour on PowerPC and X86.
2020-04-01 10:28:43 +00:00
shchenz dbf5e4f6c7 [LSR] add testcase for wrongly phi node elimination - NFC 2020-04-01 09:58:58 +00:00
Qiu Chaofan d8b51789fd [NFC] [PowerPC] Add test for frsp elimination 2020-04-01 17:54:24 +08:00
Bjorn Pettersson ef49895da8 [X86] Do not assume types are legal in getFauxShuffleMask
Summary:
Make sure we do not assert on value types not being
simple in getFauxShuffleMask when analysing operations
such as "v8i16 = truncate v8i24".

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77136
2020-04-01 11:40:18 +02:00
Georgii Rymar 93fc0ba145 [yaml2obj] - Add NBucket and NChain fields for the SHT_HASH section.
These fields allow overriding the nchain and nbucket fields of a SHT_HASH section.

Differential revision: https://reviews.llvm.org/D76834
2020-04-01 12:28:16 +03:00
Florian Hahn d307174e1d [ConstantRange] Use APInt::or/APInt::and for single elements.
Currently ConstantRange::binaryAnd/binaryOr results are too pessimistic
for single element constant ranges.

If both operands are single element ranges, we can use APInt's AND and
OR implementations directly.

Note that some other binary operations on constant ranges can cover the
single element cases naturally, but for OR and AND this unfortunately is
not the case.
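
A small sketch of the single-element special case, modelling a range as a
half-open `(lo, hi)` pair of unsigned integers. The tuple representation and
the fallback rule are simplifying assumptions and do not reproduce
ConstantRange's actual implementation; OR would be handled analogously.

```python
def is_single_element(rng) -> bool:
    lo, hi = rng
    return hi - lo == 1


def binary_and(lhs, rhs):
    """AND of two half-open unsigned ranges (hypothetical model)."""
    if is_single_element(lhs) and is_single_element(rhs):
        # Both operands are known constants: compute the exact result.
        value = lhs[0] & rhs[0]
        return (value, value + 1)
    # Simplified conservative fallback (an assumption, not LLVM's exact rule):
    # an unsigned AND can never exceed the smaller of the two maxima.
    return (0, min(lhs[1], rhs[1]))


exact = binary_and((12, 13), (10, 11))
conservative = binary_and((0, 16), (0, 4))
```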

Reviewers: nikic, spatel, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D76446
2020-04-01 09:50:24 +01:00
Florian Hahn e20cac3650 [Matrix] Add new test case with getelementptr constant exprs.
The new test mostly ensures we keep doing the right thing for constant
expressions while lowering matrix instructions.
2020-04-01 09:32:13 +01:00
Qiu Chaofan 95bcab8272 [DAGCombiner] Require ninf for sqrt recip estimation
Currently, DAG combiner uses (fmul (rsqrt x) x) to estimate square
root of x. However, this method would return NaN if x is +Inf, which
is incorrect.
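
The failure mode can be reproduced with ordinary IEEE-754 doubles. In this
sketch an exact reciprocal square root stands in for the hardware estimate,
which is an assumption made to keep the example short.

```python
import math


def sqrt_via_rsqrt(x: float) -> float:
    """Estimate sqrt(x) as x * rsqrt(x), using an exact rsqrt for clarity."""
    rsqrt = 1.0 / math.sqrt(x)
    return x * rsqrt


finite = sqrt_via_rsqrt(4.0)
# rsqrt(+Inf) is 0.0, and Inf * 0.0 is NaN, whereas sqrt(+Inf) is +Inf.
at_inf = sqrt_via_rsqrt(math.inf)
```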

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D76853
2020-04-01 16:23:43 +08:00
Florian Hahn 862766e01e [Verifier] Verify matrix dimensions operands match vector size.
This patch adds checks to the verifier to ensure the dimension arguments
passed to the matrix intrinsics match the vector types for their
arguments/return values.

Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D77129
2020-04-01 09:21:39 +01:00
Simon Pilgrim f9f401dba1 [X86][AVX] Add additional 256/512-bit test cases for PACKSS/PACKUS shuffle patterns
Also add lowerShuffleWithPACK call to lowerV32I16Shuffle - shuffle combining was catching it but we avoid a lot of temporary shuffle creations if we catch it at lowering first.
2020-04-01 08:19:03 +01:00
Simon Pilgrim 3c9064ed96 [X86] Run XOP vector rotation tests with/without AVX2
I noticed this while reviewing D77152 - by only testing bdver4 we weren't checking an XOP target that only had AVX1
2020-04-01 08:19:03 +01:00
Kai Luo 8eb40e41f6 [PowerPC] Don't generate ST_VSR_SCAL_INT if power8-vector is disabled
Summary:
In https://bugs.llvm.org/show_bug.cgi?id=45297, it fails selecting
instructions for `PPCISD::ST_VSR_SCAL_INT`. The reason it generates
`PPCISD::ST_VSR_SCAL_INT` with `-power8-vector` in the IR is that PPC's
combiner checks `hasP8Altivec` rather than `hasP8Vector`. This patch
should resolve PR45297.

Differential Revision: https://reviews.llvm.org/D76773
2020-04-01 02:15:25 +00:00
Shengchen Kan d0efd7bfcf [X86][MC] Disable Prefix padding after hardcode/prefix
Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight, eli.friedman

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76475
2020-04-01 09:49:52 +08:00
Matt Arsenault 43e576593e AMDGPU/GlobalISel: Fix insert point when lowering G_FMAD 2020-03-31 19:57:06 -04:00
Evgenii Stepanov f9471b0010 Fix MSan false positive due to select folding.
Summary:
Select folding in JumpThreading can create a conditional branch on a
code path that did not have one in the original program. This is not a
valid transformation in sanitize_memory functions.

Note that JumpThreading does select folding in 3 different places. Two
of them seem safe - they apply to a select instruction in a BB that ends
with an unconditional branch to another BB, which (in turn) ends with a
conditional branch or a switch with the same condition.

Fixes PR45220.

Reviewers: glider, dvyukov, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76332
2020-03-31 15:25:42 -07:00
Fangrui Song 4af7560b37 [PPCInstPrinter] Print conditional branches as `bt 2, $target` instead of `bt 2, .+$imm`
Follow-up of D76591.

Reviewed By: #powerpc, sfertile

Differential Revision: https://reviews.llvm.org/D76907
2020-03-31 15:05:38 -07:00
Joel E. Denny 8f8c4950fe [FileCheck] Add missing %ProtectFileCheckOutput to FileCheck tests
I'm committing this fixup without review because it's an obvious
continuation of D65121 (committed at f471eb8e99).
2020-03-31 17:29:11 -04:00
Hubert Tong 478af4479a [Object] Update ObjectFile::makeTriple for XCOFF
Summary:
When we encounter an XCOFF file, reflect that in the triple information.
In addition to knowing the object file format, we know that the
associated OS is AIX.

This means that we can expect that there is no output difference in the
processing of an XCOFF32 input file between cases where the triple is
left unspecified by the user and cases where the user specifies
`--triple powerpc-ibm-aix` explicitly.

Reviewers: jhenderson, sfertile, jasonliu, daltenty

Reviewed By: jasonliu

Subscribers: wuzish, nemanjai, hiraditya, MaskRay, rupprecht, steven.zhang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77025
2020-03-31 17:26:30 -04:00
Daniel Frampton 494abe139a [AArch64] Change AArch64 Windows EH UnwindHelp object to be a fixed object
The UnwindHelp object is used during exception handling by runtime
code. It must be findable from a fixed offset from FP.

This change allocates the UnwindHelp object as a fixed object (as is
done for x86_64) to ensure that both the generated code and runtime
agree on the location of the object.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45346

Differential Revision: https://reviews.llvm.org/D77016
2020-03-31 14:21:21 -07:00
Daniel Frampton 522b4c4b88 [AArch64] Fix mismatch in prologue and epilogue for funclets on Windows
The generated code for a funclet can have an add to sp in the epilogue
for which there is no corresponding sub in the prologue.

This patch removes the early return from emitPrologue that was
preventing the sub to sp, and instead conditionalizes the appropriate
parts of the rest of the function.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45345

Differential Revision: https://reviews.llvm.org/D77015
2020-03-31 14:21:18 -07:00
Anna Thomas 58a05675da Revert "[InlineFunction] Handle return attributes on call within inlined body"
This reverts commit 28518d9ae3.
There is a failure in MsgPackReader.cpp when built with clang. It
complains that "signext and zeroext" are incompatible. Investigating
offline whether it is in fact UB in the MsgPackReader code.
2020-03-31 16:16:34 -04:00
Eli Friedman dacf8d3562 [AArch64][SVE] Add support for fcmp.
This also requires support for boolean "not", so I added boolean logic
while I was there.

Differential Revision: https://reviews.llvm.org/D76901
2020-03-31 12:04:39 -07:00
Guozhi Wei 6d20937c29 [CodeGenPrepare] Delete intrinsic call to llvm.assume to enable more tailcall
The attached test case is simplified from tcmalloc. Both function calls should be optimized as tailcall. But llvm can only optimize the first call. The second call can't be optimized because function dupRetToEnableTailCallOpts failed to duplicate ret into block case2.

Two problems blocked the duplication:

  1 Intrinsic call llvm.assume is not handled by dupRetToEnableTailCallOpts.
  2 The control flow is more complex than expected, dupRetToEnableTailCallOpts can only duplicate ret into its predecessor, but here we have an intermediate block between call and ret.

The solutions:

  1 Since CodeGenPrepare is already at the end of LLVM IR phase, we can simply delete the intrinsic call to llvm.assume.
  2 A general solution to the complex control flow is hard, but for this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so they can be combined through eliminateFallThrough. But this function is called too late, there is no more dupRetToEnableTailCallOpts after it. We can add an earlier call to eliminateFallThrough to solve it.

Differential Revision: https://reviews.llvm.org/D76539
2020-03-31 11:55:51 -07:00
Stanislav Mekhanoshin 08682dcc86 [AMDGPU] Define 16 bit VGPR subregs
We have loads preserving low and high 16 bits of their
destinations. However, we always use a whole 32 bit register
for these. The same happens with 16 bit stores, we have to
use full 32 bit register so if high bits are clobbered the
register needs to be copied. One example of such code is
added to the load-hi16.ll.

The proper solution to the problem is to define 16 bit subregs
and use them in the operations which do not read another half
of a VGPR or preserve it if the VGPR is written.

This patch simply defines subregisters and register classes.
At the moment there should be no difference in code generation.
A lot more work is needed to actually use these new register
classes. Therefore, there are no new tests at this time.

Register weight calculation has changed with new subregs so
appropriate changes were made to keep all calculations just
as they are now, especially calculations of register pressure.

Differential Revision: https://reviews.llvm.org/D74873
2020-03-31 11:49:06 -07:00
Anna Thomas 28518d9ae3 [InlineFunction] Handle return attributes on call within inlined body
Consider a callee function that has a call (C) within it which feeds
into the return.  When we inline that callee into a callsite that has
return attributes, we can backward propagate those attributes to the
call (C) within that inlined callee body.

This is safe only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.

See added test cases.

Reviewed-By: reames, jdoerfert

Differential Revision: https://reviews.llvm.org/D76140
2020-03-31 14:35:40 -04:00
Ulrich Weigand c726c920e0 [SystemZ] Allow %r0 in address context for AsmParser
Registers used in any address (as well as in a few other contexts)
have special semantics when a "zero" register is used, which is
why the back-end defines extra register classes ADDR32, ADDR64 etc
to be used to prevent the register allocator from using %r0 there.

However, when writing assembler code "by hand", you sometimes need
to trigger those special semantics.  Currently, though, the AsmParser
will reject %r0 in those places.  In some cases it may be possible
to write that instruction differently - but in others it is currently
not possible at all.

This check in AsmParser simply seems overly strict, so this patch
just removes the check completely.  This brings the behaviour of
AsmParser in line with the GNU assembler as well.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45092
2020-03-31 19:48:50 +02:00
Uday Bondhugula dc817b2dea [InstCombine] Deduce attributes for aligned_alloc in InstCombine
Make InstCombine aware of the aligned_alloc library function.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Depends on D76970.

Differential Revision: https://reviews.llvm.org/D76971
2020-03-31 23:17:28 +05:30
zhizhouy 94d912296d [NFC] Do not run CGProfilePass when not using integrated assembler
Summary:
CGProfilePass is run by default in certain new pass manager optimization pipelines. Assemblers other than llvm as (such as gnu as) cannot recognize the .cgprofile entries generated and emitted from this pass, causing a build-time error.

This patch adds new options in clang CodeGenOpts and PassBuilder options so that we can turn cgprofile off when not using integrated assembler.

Reviewers: Bigcheese, xur, george.burgess.iv, chandlerc, manojgupta

Reviewed By: manojgupta

Subscribers: manojgupta, void, hiraditya, dexonsmith, llvm-commits, tcwang, llozano

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D62627
2020-03-31 10:31:31 -07:00
Simon Pilgrim 30436a1ce7 [X86][SSE] Add additional PTEST/TESTP inversion tests 2020-03-31 18:02:27 +01:00
Simon Pilgrim 8b925440d1 [X86][SSE] Simplify PTEST/TESTP tests for D76984
We don't need to use an allones for the second operand - test the general case.
2020-03-31 18:02:27 +01:00
Sterling Augustine 21d9d0855b New symbolizer option to print files relative to the compilation directory.
Summary: New "--relative" option to allow printing files relative to the compilation directory.

Reviewers: jhenderson

Subscribers: MaskRay, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76733
2020-03-31 09:29:24 -07:00
Florian Hahn b0cd7b2799 [SCCP] Limit use of range info for binops to integers for now.
This fixes a crash when building the test suite.
2020-03-31 17:08:09 +01:00
Tyker 4aeb7e1ef4 [AssumeBundles] Preserve information in EarlyCSE
Summary: This patch preserves information from various places in EarlyCSE into assume bundles.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76769
2020-03-31 17:47:04 +02:00
Tyker 7093b92a13 [AssumeBundles] Preserve Information from Load/Store
Summary: This patch preserves dereferenceable, nonnull and alignment from loads and stores.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76759
2020-03-31 17:47:04 +02:00
Jonas Paulsson f481d48893 [SystemZ] Improve foldMemoryOperandImpl().
Fold MS(G)RKC -> MS(G)C.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76771
2020-03-31 17:17:51 +02:00
Georgii Rymar b3f13bc165 [obj2yaml] - Teach tool to dump program headers.
Currently obj2yaml does not dump program headers,
this patch teaches it to do that.

Differential revision: https://reviews.llvm.org/D75342
2020-03-31 18:10:19 +03:00
Simon Pilgrim 7e0e5fa499 Revert rGefe59d6717dcdf7777acb9b7a734e1a520bdf22a "[X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations"
This might be causing an issue on the fuchsia-x86_64-linux buildbot - reverting to see what happens.
2020-03-31 15:47:30 +01:00
Simon Pilgrim efe59d6717 [X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations
If canLowerByDroppingEvenElements indicates that the shuffle is a N:1 compaction pattern and the inputs are suitably sign/zero extended then we can use a chain of PACKSS/PACKUS to compact.

This helps avoid PSHUFB (and its mask load) for short shuffle chains, shuffle combining will still replace with a PSHUFB if we have enough shuffles as getFauxShuffleMask can recognise PACKSS/PACKUS chains.
2020-03-31 14:48:48 +01:00
Sanjay Patel fa61b5059a [InstCombine] remove stray auto-generated test comment; NFC
The script now includes extra info about command-line options used
when generating its advertisement heading, but we don't want that
here. This is a special-case because we have enhanced the check
lines (as noted in the 2nd comment line).
2020-03-31 09:19:12 -04:00
Florian Hahn b37543750c [ValueLattice] Distinguish between constant ranges with/without undef.
This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.

A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snippet below

define i32 @f(i32 %a, i1 %c) {
  br i1 %c, label %true, label %false
true:
  %a.255 = and i32 %a, 255
  br label %exit
false:
  br label %exit
exit:
  %p = phi i32 [ %a.255, %true ], [ undef, %false ]
  %f.1 = icmp eq i32 %p, 300
  call void @use(i1 %f.1)
  %res = and i32 %p, 255
  ret i32 %res
}

In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the second use might not be
< 256.

Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.
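
A tiny sketch of the `%res` case described above: for every value actually in
the range [0, 256) the AND is a no-op, but an undef `%p` may take a value
outside that range, so replacing `%res` with `%p` is not sound. The value 300
is just one hypothetical choice an undef could make.

```python
def and_255(p: int) -> int:
    """Models %res = and i32 %p, 255."""
    return p & 255


# For every value inside the range the AND leaves the value unchanged.
in_range_is_noop = all(and_255(p) == p for p in range(0, 256))

# Hypothetical value an undef %p could take; it lies outside [0, 256).
undef_choice = 300
res_for_undef = and_255(undef_choice)
```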

Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D76931
2020-03-31 12:50:20 +01:00
Denis Antrushin 06c58f11a9 [SCEV] Use backedge SCEV of PHI only if its input is loop invariant
For the PHI node

      %1 = phi [%A, %entry], [%X, %latch]

it is incorrect to use SCEV of backedge val %X as an exit value
of PHI unless %X is loop invariant.
This is because exit value of %1 is value of %X at one-before-last
iteration of the loop.
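
The off-by-one can be seen by simulating the PHI directly. This is a sketch
under simple assumptions: the loop runs at least once, `xs[i]` is the value of
`%X` computed in iteration `i`, and the function name is hypothetical.

```python
def phi_value_in_last_iteration(a, xs):
    """Simulate %1 = phi [%A, %entry], [%X, %latch] over len(xs) iterations."""
    phi = a
    observed = a
    for x in xs:
        observed = phi  # value of %1 seen during this iteration
        phi = x         # the backedge feeds %X into the next iteration's PHI
    return observed


# %X varies per iteration: the PHI ends with %X from the one-before-last one.
varying = phi_value_in_last_iteration(0, [10, 20, 30])
# %X is loop invariant: using its value for the PHI is then harmless.
invariant = phi_value_in_last_iteration(0, [7, 7, 7])
```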

Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D73181
2020-03-31 18:39:24 +07:00
Simon Pilgrim 98357dee1c [X86] Combine concat(palignr,palignr) -> palignr(concat,concat)
combineX86ShufflesRecursively should handle this someday
2020-03-31 11:06:35 +01:00
Daan Sprenkels 464b9aeafe [InstCombine] Transform extelt-trunc -> bitcast-extelt
Canonicalize the case when a scalar extracted from a vector is
truncated.  Transform such cases to bitcast-then-extractelement.
This will enable erasing the truncate operation.
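
The equivalence behind this canonicalization can be sketched with byte
packing. The example assumes a little-endian layout and an i32-to-i16
truncation; the helper names are illustrative only.

```python
import struct


def trunc_of_extract(vec, index):
    """trunc i32 -> i16 applied to extractelement <N x i32>."""
    return vec[index] & 0xFFFF


def extract_of_bitcast(vec, index):
    """extractelement from the same bits viewed as <2N x i16>.

    Assumes little-endian layout, so the low half of element `index`
    is at position 2 * index in the bitcast vector.
    """
    raw = struct.pack(f"<{len(vec)}I", *vec)
    halves = struct.unpack(f"<{2 * len(vec)}H", raw)
    return halves[2 * index]


vec = [0x11223344, 0xAABBCCDD, 0xDEADBEEF, 0x00010002]
agree = all(trunc_of_extract(vec, i) == extract_of_bitcast(vec, i)
            for i in range(len(vec)))
```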

This commit fixes PR45314.

reviewers: spatel

Differential revision: https://reviews.llvm.org/D76983
2020-03-31 11:53:41 +02:00
David Green 2c5f43f9dd [ARM] Fix qdadd operand order
qdadd is defined as sat(Rm + sat(2*Rn)). We had the Rm and Rn switched
the wrong way around.
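
A short model of the stated definition shows that the operand order is
observable. The helpers below assume signed 32-bit saturation and are not
taken from the ARM backend.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def sat32(value: int) -> int:
    """Clamp to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def qdadd(rm: int, rn: int) -> int:
    """sat(Rm + sat(2 * Rn))"""
    return sat32(rm + sat32(2 * rn))


correct = qdadd(1, 0x40000000)   # doubling Rn saturates first
swapped = qdadd(0x40000000, 1)   # operands exchanged
```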

Differential Revision: https://reviews.llvm.org/D77049
2020-03-31 10:11:36 +01:00
Guillaume Chatelet c9d5c19597 [Alignment][NFC] Transitioning more getMachineMemOperand call sites
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77121
2020-03-31 08:36:18 +00:00
Sebastian Neubauer 5d3a69feca [AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.
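
The described result can be modelled as a per-lane bit set. In this sketch the
predicate list, the active-lane mask encoding, and the function name are
assumptions used only to illustrate the semantics stated above.

```python
def ballot(predicates, active_mask: int) -> int:
    """Return a bitfield with bit i set when lane i is active and its
    boolean argument is true; inactive lanes contribute zero."""
    result = 0
    for lane, pred in enumerate(predicates):
        if pred and (active_mask >> lane) & 1:
            result |= 1 << lane
    return result


# Lanes 0..3; lane 3 is inactive, so its true predicate is ignored.
mask = ballot([True, False, True, True], active_mask=0b0111)
```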

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088
2020-03-31 10:35:39 +02:00
Florian Hahn 0c9c58ada0 [SCCP] Use constant ranges for casts.
For casts with constant range operands, we can use
ConstantRange::castOp.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71938
2020-03-31 09:22:04 +01:00
Kai Wang 581ba35291 [RISCV] ELF attribute section for RISC-V.
Leverage ARM ELF build attribute section to create ELF attribute section
for RISC-V. Extract the common part of parsing logic for this section
into ELFAttributeParser.[cpp|h] and ELFAttributes.[cpp|h].

Differential Revision: https://reviews.llvm.org/D74023
2020-03-31 16:16:19 +08:00
Djordje Todorovic bcbd60aeb5 [Mips] Make MipsBranchExpansion aware of BBIT family of branch
Octeon branches (bbit0/bbit032/bbit1/bbit132) have an immediate operand,
so it is legal to have such replacement within
MipsBranchExpansion::replaceBranch().

According to the specification, a branch (e.g. bbit0 ) looks like:

bbit0  rs p offset  // p is an immediate operand
  if !rs<p> then branch
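
The branch condition above can be written out as a one-line predicate. This is
a sketch of the quoted pseudo-code only; the function name is hypothetical and
the offset computation is left out.

```python
def bbit0_taken(rs: int, p: int) -> bool:
    """Branch is taken when bit p of rs is clear (p is the immediate)."""
    return ((rs >> p) & 1) == 0


taken = bbit0_taken(0b0100, 1)      # bit 1 is clear
not_taken = bbit0_taken(0b0100, 2)  # bit 2 is set
```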

Without this patch, an assertion triggers in the method,
and the problem was found in a real example.

Differential Revision: https://reviews.llvm.org/D76842
2020-03-31 09:20:51 +02:00
Dylan McKay 7b808b105f [AVR] Generalize the previous interrupt bugfix to signal handlers too 2020-03-31 19:33:34 +13:00
Dylan McKay 339b34266c [AVR] Respect the 'interrupt' function attribute
In the past, AVR functions were only lowered with interrupt-specific
machine code if the function was defined with the "avr-interrupt" or
"avr-signal" calling conventions.

This patch modifies the backend so that if the function does not have a
special calling convention, but does have an "interrupt" attribute,
that function is interpreted as a function with interrupts.

This also extracts the "is this function an interrupt" logic from
several disparate places in the backend into one AVRMachineFunctionInfo
attribute.

Bug found by Wilhelm Meier.
2020-03-31 19:00:18 +13:00
Wei Mi ebad678857 [SampleFDO] Port MD5 name table support to extbinary format.
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it.

Note that using MD5 in the name table carries a very small chance of name conflicts, leading to profile mismatch. Besides, a profile using the feature won't have profile remapping support.

Differential Revision: https://reviews.llvm.org/D76255
2020-03-30 22:07:08 -07:00
QingShan Zhang 4eeb56d088 [PowerPC] Don't do the folding if the operand is R0/X0
We have this transformation in PowerPC peephole:

Replace instruction:
  renamable $x28 = ADDI8 renamable $x7, -8
  renamable $x28 = ADD8 killed renamable $x28, renamable $x0
  STFD killed renamable $f0, -8, killed renamable $x28 :: (store 8 into %ir._ind_cast99.epil)
with:
  renamable $x28 = ADDI8 renamable $x7, -16
  STFDX killed renamable $f0, $x0, killed $x28 :: (store 8 into %ir._ind_cast99.epil)

It is invalid as the '$x0' in STFDX is constant 0, not register r0.
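
To make the mismatch concrete, the sketch below models the two address
computations. It assumes the indexed store computes its address as
(RA|0) + RB, with register number 0 in the RA slot read as the constant 0; the
register values are made up.

```python
def indexed_address(regs, ra: int, rb: int) -> int:
    """Effective address of an indexed store: (RA|0) + RB.

    Register number 0 in the RA slot means the constant 0, not r0's value.
    """
    base = 0 if ra == 0 else regs[ra]
    return base + regs[rb]


regs = {0: 100, 7: 1000}

# Original sequence: x28 = x7 - 8; x28 = x28 + x0; store at -8(x28).
original = (regs[7] - 8 + regs[0]) - 8

# Folded sequence: x28 = x7 - 16; indexed store with x0 in the RA slot.
regs[28] = regs[7] - 16
folded = indexed_address(regs, ra=0, rb=28)
```

With a non-zero value in x0 the two addresses differ, which is the problem the
commit message describes.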

Reviewed By: Nemanjai

Differential Revision: https://reviews.llvm.org/D77034
2020-03-31 02:50:19 +00:00
Jessica Paquette d5ee72065b [GlobalISel] Implement identity transforms for x op x -> x
When we have

```
a = G_OR x, x
```

or

```
b = G_AND y, y
```

We can drop the G_OR/G_AND and just use x/y respectively.
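
The identities can be confirmed by brute force over a small value range. This
sketch uses plain Python integers as a stand-in for the generic virtual
registers and also shows that the property does not hold for every operator.

```python
values = range(256)

# x | x == x and x & x == x for every value.
or_is_identity = all((x | x) == x for x in values)
and_is_identity = all((x & x) == x for x in values)

# Counter-example operator: x ^ x is 0, so XOR cannot be dropped this way.
xor_is_identity = all((x ^ x) == x for x in values)
```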

Also update arm64-fallback.ll because there was an or in there which hits this
transformation.

Differential Revision: https://reviews.llvm.org/D77105
2020-03-30 18:22:37 -07:00
Juneyoung Lee 519f5c3796 [LegalizeTypes] Add SoftenFloatRes_FREEZE
Summary: This adds SoftenFloatRes_FREEZE.

Reviewers: bkramer, JamesNagurne, craig.topper, efriedma

Reviewed By: craig.topper

Subscribers: AbigailLinden, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76980
2020-03-31 10:16:38 +09:00
Jessica Paquette 63d70ea6a0 [GlobalISel] Combine (x op 0) -> x for operations with a right identity of 0
Implement identity combines for operations like the following:

```
%a = G_SUB %b, 0
```

This can just be replaced with %b.
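
A brute-force sketch of the "right identity of 0" property. Only G_SUB is
named above; the other operators in the table are assumptions listed as
plausible examples, and 8-bit wraparound is modelled with a mask.

```python
MASK8 = 0xFF

# Candidate operators; every entry except "sub" is an illustrative assumption.
ops = {
    "sub": lambda a, b: (a - b) & MASK8,
    "add": lambda a, b: (a + b) & MASK8,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "shl": lambda a, b: (a << b) & MASK8,
    "lshr": lambda a, b: a >> b,
}

# x op 0 == x for every listed operator and every 8-bit x.
right_identity = {name: all(op(x, 0) == x for x in range(256))
                  for name, op in ops.items()}

# 0 is not a left identity of sub: 0 - 1 wraps to 255, not 1.
left_identity_for_sub = all(ops["sub"](0, x) == x for x in range(256))
```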

Over CTMark, this gives some minor size improvements at -O3.

Differential Revision: https://reviews.llvm.org/D76640
2020-03-30 16:49:52 -07:00
Matt Arsenault b8fc192d42 Revert "[GISel]: Fix incorrect IRTranslation while translating null pointer types"
This reverts commit b3297ef051.

This change is incorrect. The current semantic of null in the IR is a
pointer with the bitvalue 0. It is not a cast from an integer 0, so
this should preserve the pointer type.
2020-03-30 19:30:42 -04:00
Daan Sprenkels 5227fa0c72 Recommit "[InstCombine] Update assertions in InstCombine test; NFC" 2020-03-31 00:00:41 +02:00
Matt Arsenault db9f0d1ce5 AMDGPU: Form v_cvt_ubyte* with f16 results
We get 2 conversion instructions anyway. Previously we would get a
conversion with SDWA reading from a byte source, which has a larger
encoding.
2020-03-30 17:59:49 -04:00
Matt Arsenault b27d255e1e AMDGPU/GlobalISel: Form CVT_F32_UBYTE0 2020-03-30 17:45:55 -04:00
Matt Arsenault bcb643c8af AMDGPU/GlobalISel: Handle image atomics 2020-03-30 17:41:04 -04:00
Matt Arsenault 48eda37282 AMDGPU/GlobalISel: Start selecting image intrinsics
Does not handle atomics yet.
2020-03-30 17:33:04 -04:00
Matt Arsenault 570a578e46 AMDGPU: Account for dmask when computing image mem size
Only the number of elements in the dmask will really be accessed.
2020-03-30 17:30:58 -04:00
Jay Foad cee65d51fe AMDGPU: Implement getMemcpyLoopLoweringType
Summary: Based on a patch by Matt Arsenault.

Reviewers: rampitec, kerbowa, nhaehnle, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77057
2020-03-30 22:21:01 +01:00
Matt Arsenault 2641ba52a9 AMDGPU/GlobalISel: Round up image operations with 5, 6 or 7 addresses
The instruction definitions are missing for these register types, so
round up to 8 like the DAG.
2020-03-30 17:02:47 -04:00
Matt Arsenault 42d5609809 AMDGPU/GlobalISel: Start handling _L to _LZ optimization
We currently don't have a way to map to the equivalent intrinsic
opcode, so track immediate 0s in place of the address for the
selection to know to change the final opcode.
2020-03-30 17:02:30 -04:00
Daan Sprenkels 273b0d7766 Revert "[InstCombine] Update assertions in InstCombine test; NFC"
This reverts commit 4243bd494d.
2020-03-30 22:41:33 +02:00
Daan Sprenkels 4243bd494d [InstCombine] Update assertions in InstCombine test; NFC 2020-03-30 22:15:50 +02:00
Sanjay Patel f2fbdf76d8 [InstCombine] do not exclude min/max from icmp with casted operand fold
InstCombine has a mess of logic that tries to preserve min/max patterns,
but AFAICT, this one is not necessary because we can always narrow the
corresponding select in this sequence to match the narrow compare.

The biggest danger for this patch is inducing infinite looping or
assert from exceeding max iterations. If any bots hit that in the
vicinity of this commit, this is the likely patch to blame.
2020-03-30 16:10:51 -04:00
Eli Friedman 9eb1b41811 [llvm-cov] Improve error message for missing profdata
I got a report recently that a user was having trouble interpreting the
meaning of the error message.  Hopefully this is more readable; produces
something like the following:

error: No such file or directory: Could not read profile data!

Differential Revision: https://reviews.llvm.org/D76796
2020-03-30 12:54:07 -07:00
Matt Arsenault 4919f2e1c5 AMDGPU/GlobalISel: Basic legalize rules for G_FSHR
Only handles easy 32-bit cases.
2020-03-30 11:53:01 -07:00
Bill Wendling fa496ce3c6 [Intrinsic] Give "is.constant" the "convergent" attribute
Summary:
Code frequently relies upon the results of "is.constant" intrinsics to
DCE invalid code paths. We don't want the intrinsic to be made control-
dependent on any additional values. For instance, we can't split a PHI
into a "constant" and "non-constant" part via jump threading in order
to "optimize" the constant part, because the "is.constant" intrinsic is
meant to return "false".

Reviewers: wmi, kazu, MaskRay

Reviewed By: kazu

Subscribers: jdoerfert, efriedma, joerg, lebedev.ri, nikic, xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75799
2020-03-30 11:47:12 -07:00
Matt Arsenault 23da702d69 GlobalISel: Translate llvm.fshl/llvm.fshr 2020-03-30 11:34:42 -07:00
Jakub Kuderski 77ce2e21a8 [AMDGPU] Add Relocation Constant Support
Summary:
This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf.

The intrinsic takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry.

`SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel.

Author: csyonghe <yonghe@google.com>

Reviewers: tpr, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76440
2020-03-30 13:49:20 -04:00
Sameer Sahasrabuddhe 3cbbded68c Introduce unify-loop-exits pass.
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.

The bulk of the transformation is a new function introduced in
BasicBlockUtils that can redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".
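
The hub idea can be sketched on a toy CFG. This is a Python illustration under stated assumptions, not the BasicBlockUtils function: `redirect_through_hub`, the dict-based CFG, and the `guards` table are all hypothetical names for this example.

```python
def redirect_through_hub(cfg, incoming, outgoing, hub="hub"):
    """Reroute every edge from an `incoming` block to an `outgoing` block via `hub`.

    `cfg` maps a block name to the list of its successors. Returns the new CFG
    and a `guards` table recording, per incoming block, which original target
    the hub has to dispatch to (a real pass would materialize this as a value
    that the hub branches on).
    """
    new_cfg = {block: list(succs) for block, succs in cfg.items()}
    guards = {}
    for block in incoming:
        rewritten = []
        for succ in new_cfg[block]:
            if succ in outgoing:
                guards.setdefault(block, []).append(succ)
                if hub not in rewritten:
                    rewritten.append(hub)
            else:
                rewritten.append(succ)
        new_cfg[block] = rewritten
    # The hub fans out to all of the original targets.
    new_cfg[hub] = list(outgoing)
    return new_cfg, guards

# A loop with two exiting blocks (body, latch) and two exit blocks.
cfg = {
    "header": ["body"],
    "body": ["latch", "exit1"],
    "latch": ["header", "exit2"],
    "exit1": [],
    "exit2": [],
}
new_cfg, guards = redirect_through_hub(cfg, ["body", "latch"], ["exit1", "exit2"])
print(new_cfg["body"], new_cfg["latch"], new_cfg["hub"])
```

After the rewrite, both exiting blocks branch to the single hub, and only the hub reaches the original exit blocks.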

This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.

The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.

Reviewers: madhur13490, arsenm, nhaehnle

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D75865
2020-03-30 13:23:56 -04:00
Vedant Kumar dcc410b5cf [LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259)
In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken
count is a SCEV add expression, its type is defined by the type of the
last operand of the add expression.

In the test case from PR45259, this last operand happens to be a
pointer, which (according to llvm::Type) does not have a primitive size
in bits. In this case, LoopVectorize fails to truncate the SCEV and
crashes as a result.

Using ScalarEvolution::getTypeSizeInBits makes the truncation work as expected.

https://bugs.llvm.org/show_bug.cgi?id=45259

Differential Revision: https://reviews.llvm.org/D76669
2020-03-30 10:14:14 -07:00
Yuanfang Chen ece79f4708 [X86] make sure POP has implicit def/use of stack pointer when materializing 8-bit immediates for minsize
Summary:
Otherwise the PostRA list scheduler may reorder instructions, such as

schedule this
'''
pushq  $0x8
pop    %rbx
lea    0x2a0(%rsp),%r15
'''
to
'''
pushq  $0x8
lea    0x2a0(%rsp),%r15
pop    %rbx
'''
by mistake. The patch prevents this from happening by making sure POP has an
implicit use of SP.
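
A simplified model of why the missing operands matter, sketched in Python. The `depends_on` helper and the dict encoding of instructions are hypothetical; a real scheduler tracks dependencies on MachineInstr operands, but the register-set logic is the same idea:

```python
def depends_on(later, earlier):
    """True if `later` must stay after `earlier` (RAW, WAR or WAW on a register)."""
    return bool(
        earlier["defs"] & later["uses"]      # read after write
        or earlier["uses"] & later["defs"]   # write after read
        or earlier["defs"] & later["defs"]   # write after write
    )

lea = {"name": "lea 0x2a0(%rsp),%r15", "defs": {"r15"}, "uses": {"rsp"}}

# POP modelled without the implicit stack-pointer operands (the buggy case).
pop_missing_sp = {"name": "pop %rbx", "defs": {"rbx"}, "uses": set()}
# POP modelled with an implicit def/use of the stack pointer (the fixed case).
pop_with_sp = {"name": "pop %rbx", "defs": {"rbx", "rsp"}, "uses": {"rsp"}}

# Without the implicit operands nothing ties LEA to POP, so they may be swapped.
assert not depends_on(lea, pop_missing_sp)
# With them, LEA reads the rsp value that POP writes and must stay after it.
assert depends_on(lea, pop_with_sp)
```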

Reviewers: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77031
2020-03-30 09:25:31 -07:00
Matt Arsenault bb009498c2 AMDGPU/GlobalISel: Hack to fix i24 argument lowering
I still think the call lowering type legalization logic split between
the generic code and target is too confusing, but largely induced by
the reliance on the DAG infrastructure.
2020-03-30 11:00:45 -04:00
Matt Arsenault 90a36bbd7c AMDGPU/GlobalISel: Legalize 64-bit G_UDIV/G_UREM
Mostly ported from the DAG version. This results in much worse code
than the DAG version, largely due to a much worse expansion for
G_UMULH.
2020-03-30 10:57:37 -04:00
Chris Jackson f6b2c003f3 [DebugInfo] Ensure that a demanded bits optimisation in
InstCombine does not result in an incorrect debuginfo variable
value

- Add an additional salvage and a test.

Reviewers: aprantl, djtodoro

Differential Revision: https://reviews.llvm.org/D76854

Bugzilla:  https://bugs.llvm.org/show_bug.cgi?id=44371
2020-03-30 15:39:22 +01:00
Florian Hahn 7899a111ea Revert "[Darwin] Respect -fno-unroll-loops during LTO."
As per post-commit comment at https://reviews.llvm.org/D76916, this
should better be done at the TU level.

This reverts commit 9ce198d6ed.
2020-03-30 15:20:30 +01:00
Chris Jackson 135709aa90 [DebugInfo] Ensure dead store elimination can mark an operand
value as undefined

    - Correct a debug info salvage and add a test

    Reviewers: aprantl, vsk

    Differential Revision: https://reviews.llvm.org/D76930
    Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45080
2020-03-30 14:58:14 +01:00
Sanjay Patel bc60cdcc3f [InstCombine] add test for trunc-extelt; NFC
Goes with D76983
2020-03-30 09:43:03 -04:00
Georgii Rymar 4cbfb98eb3 [llvm-readobj] - Improve test of --elf-hash-histogram option.
This test missed the check of histograms printed for .hash sections.
It was removed by mistake in D71606 where I tried to get rid of precompiled objects
and did not realize at that time that both SHT_GNU_HASH and SHT_HASH sections
were tested and not just GNU version.

Also it never tested aliases for the --elf-hash-histogram option.

Differential revision: https://reviews.llvm.org/D76920
2020-03-30 15:46:45 +03:00
Georgii Rymar 821439a45a [llvm-readobj][test] - Simplify hash-symbols test.
We are able to use `-DBITS=32/64` to reduce this test case.
I've rewritten the comments we had to generalize them and
fix wrong computations they contained.

Differential revision: https://reviews.llvm.org/D76924
2020-03-30 14:44:30 +03:00
Simon Pilgrim e95d04f4f1 [X86][AVX] lowerV4X128Shuffle - attempt to widen to 2x256 to simplify shuffles
If we are lowering to X86ISD::SHUF128 we are going to lose track of individual 128-bit lanes that are UNDEF, so if we can widen these to guarantee that they are sequential with their neighbour we should. This helps with later shuffle combines.
2020-03-30 12:22:26 +01:00
Florian Hahn 84c1fbab5d [CVP] Add additional icmp for ranges with undef to test. 2020-03-30 10:59:25 +01:00
Qiu Chaofan 9aa884ccc2 [NFC] [PowerPC] Update and add tests for ori
Use script to update test for ori with 32-bit imms, and add test for
ori with 64-bit imms.
2020-03-30 17:46:12 +08:00
Sam Parker 94b195ff12 [ARM][LowOverheadLoops] Add horizontal reduction support
Add a bit more logic into the 'FalseLaneZeros' tracking to enable
horizontal reductions and also make the VADDV variants
validForTailPredication.

Differential Revision: https://reviews.llvm.org/D76708
2020-03-30 09:55:41 +01:00
David Green c9eaed5149 [ARM] MVE VMOV.i64
In the original batch of MVE VMOVimm code generation VMOV.i64 was left
out due to the way it was done downstream. It turns out that it's fairly
simple though. This adds the codegen for it, similar to NEON.

Bigendian is technically incorrect in this version, which John is fixing
in a Neon patch.
2020-03-30 07:44:23 +01:00
Craig Topper b4695351cb [TTI][X86] Fix the value passed to IsUnsigned for cost modeling of experimental.vector.reduce.smin/smax/umin/umax.
We were passing true for smax/smin and false for umax/umin.
2020-03-29 23:34:22 -07:00
Jun Ma 31a1d85c53 [Coroutines 2/2] Improve symmetric control transfer feature
Differential Revision: https://reviews.llvm.org/D76913
2020-03-30 09:53:09 +08:00
Jun Ma a94fa2c049 [Coroutines 1/2] Improve symmetric control transfer feature
Differential Revision: https://reviews.llvm.org/D76911
2020-03-30 09:53:09 +08:00
Craig Topper d74533a18b [X86] Add sse4.1 RUN lines to the min/max reduction cost model tests.
Mostly this matches the sse4.2 we already had command lines for.
Except in the i64 case since sse4.1 doesn't have pcmpgtq.
2020-03-29 16:05:35 -07:00
Daan Sprenkels 24562c6588 [InstCombine] Add tests for trunc (extelt x); (NFC)
Baseline tests for D76983 (PR45314)

Differential Revision: https://reviews.llvm.org/D77024
2020-03-29 17:30:54 -04:00
Craig Topper 2451e4c597 [X86] Add sse4.2 command lines to min/max reduction tests.
SSE4.2 has the pcmpgtq instruction which we will use in
vXi64 reductions when it's available.
2020-03-29 13:51:03 -07:00
David Green 7c1a6873aa [ARM] VMOV.64 immediate tests. NFC 2020-03-29 21:08:43 +01:00
Simon Pilgrim 9c8ec99c80 [X86][AVX] Combine 128/256-bit lane shuffles with zeroable upper subvectors to EXTRACT_SUBVECTOR (PR40720)
As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128/SHUF128, and we can use the implicit zeroing of the uppers.
2020-03-29 19:51:38 +01:00
Uday Bondhugula c0955edfd6 Introduce support for lib function aligned_alloc in TLI / memory builtins
Aligned_alloc is a standard lib function and has been in glibc since
2.16 and in the C11 standard. It has semantics similar to malloc/calloc
for several analyses/transforms. This patch introduces aligned_alloc
in target library info and memory builtins. Subsequent ones will
make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062

This change will also be useful to LLVM generators that need to allocate
buffers of vector elements larger than 16 bytes (e.g. 256-bit ones),
element boundary alignment for which is not typically provided by glibc malloc.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76970
2020-03-29 23:36:24 +05:30
Matt Arsenault 0b68ca5162 AMDGPU: Add some additional tests for v_cvt_ubyte* formation
Use functions now that we have them for less boilerplate in the
output.
2020-03-29 14:03:07 -04:00
Sanjay Patel febcb24f14 [InstCombine] make test independent of branch undef/UB; NFC 2020-03-29 13:32:47 -04:00
Simon Pilgrim 443dcc0e00 [X86][AVX] Add tests for 512-bit shuffle patterns that could reduce to subvector extractions 2020-03-29 18:27:18 +01:00
Simon Pilgrim b44f07045c Remove unnecessary empty comments from test check lines. NFC. 2020-03-29 18:27:18 +01:00
Simon Pilgrim 7734e4b3a3 [X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720)
As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128, and we can use the implicit zeroing of the upper half.

I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.
2020-03-29 16:41:59 +01:00
Simon Pilgrim 10439f9e32 [X86][AVX] Add X86ISD::VALIGN target shuffle decode support
Allows us to combine VALIGN instructions with other shuffles - the combiner doesn't create VALIGN yet though.
2020-03-29 16:41:58 +01:00
Simon Pilgrim a7115d51be [X86] X86CallFrameOptimization - generalize slow push code path
Replace the explicit isAtom() || isSLM() test with the more general (and more specific) slowTwoMemOps() check to avoid the use of the PUSHrmm push from memory case.

This is actually very tricky to test in anything but quite complex code, but the atomic-idempotent.ll tests seem to be the most straightforward to use.

Differential Revision: https://reviews.llvm.org/D76239
2020-03-29 11:01:59 +01:00
Richard Diamond 4bf015c035 [AlignmentFromAssumptions] Fix a SCEV assertion resulting from address space differences.
Summary:
On targets with different pointer sizes, -alignment-from-assumptions could attempt to create SCEV expressions which use different effective SCEV types. The provided test illustrates the issue.

In `getNewAlignment`, AASCEV would be the (only) alloca, which would have an effective SCEV type of i32. But PtrSCEV, the GEP in this case, due to being in the flat/default address space, will have an effective SCEV of i64.

This patch resolves the issue by truncating PtrSCEV to AASCEV's effective type.

Reviewers: hfinkel, jdoerfert

Reviewed By: jdoerfert

Subscribers: jvesely, nhaehnle, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75471
2020-03-29 01:26:31 -05:00
Craig Topper c0aa97b632 [X86] Add cost model test cases for fmin/fmax reduction. 2020-03-28 17:12:49 -07:00
Fangrui Song fc93787d7e [MC][PowerPC] Make .reloc support arbitrary relocation types
Generalizes ad7199f3e6 (R_PPC_NONE/R_PPC64_NONE).
2020-03-28 17:04:31 -07:00
Yonghong Song ced0d1f42b [BPF] support 128bit int explicitly in layout spec
Currently, bpf does not specify 128bit alignment in its
layout spec. So for a structure like
  struct ipv6_key_t {
    unsigned pid;
    unsigned __int128 saddr;
    unsigned short lport;
  };
clang will generate IR type
  %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }
Additional padding is to ensure later IR->MIR can generate correct
stack layout with target layout spec.

But it is common practice for a tracing program to be
first compiled with target flag (e.g., x86_64 or aarch64) through
clang to generate IR and then go through llc to generate bpf
byte code. Tracing programs often refer to kernel-internal
data structures, which need to be compiled with a non-bpf target.

But such a compilation model may cause a problem on aarch64.
The bcc issue https://github.com/iovisor/bcc/issues/2827
reported such a problem.

For the above structure, since aarch64 has "i128:128" in its
layout string, the generated IR will have
  %struct.ipv6_key_t = type { i32, i128, i16 }

Since bpf does not have "i128:128" in its spec string,
the selectionDAG assumes alignment 8 for i128 and
computes the stack storage size for the above as 32 bytes,
which leads to incorrect code later.

x86_64 does not have this issue, as it does not have
"i128:128" in its layout spec and permits i128 to
be aligned at 8 bytes on the stack. Its IR type looks like
  %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }

The fix here is to add i128 support in layout spec, the same as
aarch64. The only downside is we may have less optimal stack
allocation in certain cases since we require 16byte alignment
for i128 instead of 8. But this is probably fine as i128 is
not used widely and in most cases users should already
have proper alignment.
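
The size mismatch can be reproduced with a small layout calculation. This is a Python sketch using the usual C-style struct layout rules; `struct_layout` is a hypothetical helper, not LLVM's DataLayout code:

```python
def struct_layout(fields, alignments):
    """Return (offsets, total_size) for fields given as (name, type) pairs.

    `alignments` maps a type name to (size_in_bytes, alignment_in_bytes).
    """
    offset = 0
    max_align = 1
    offsets = {}
    for name, ty in fields:
        size, align = alignments[ty]
        offset = (offset + align - 1) // align * align  # pad up to the field's alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align  # tail padding
    return offsets, total

fields = [("pid", "i32"), ("saddr", "i128"), ("lport", "i16")]
base = {"i32": (4, 4), "i16": (2, 2)}

# i128 aligned to 8 bytes (no "i128:128" in the layout string).
offsets8, size8 = struct_layout(fields, {**base, "i128": (16, 8)})
# i128 aligned to 16 bytes ("i128:128", as on aarch64 and with this fix).
offsets16, size16 = struct_layout(fields, {**base, "i128": (16, 16)})

print(offsets8, size8)    # {'pid': 0, 'saddr': 8, 'lport': 24} 32
print(offsets16, size16)  # {'pid': 0, 'saddr': 16, 'lport': 32} 48
```

The 48-byte result matches the explicitly padded type above: 4 + 12 + 16 + 2 + 14 bytes.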

Differential Revision: https://reviews.llvm.org/D76587
2020-03-28 11:46:29 -07:00
Reid Kleckner e5bf5037d8 [CodeGen] Fix sinking local values in lpads with phis
There was already a test case for landingpads to handle this case, but I
had forgotten to consider PHI instructions preceding the EH_LABEL in the
landingpad.

PR45261
2020-03-28 11:10:33 -07:00
Nikita Popov 30d712103f [InstCombine] Use replaceOperand() API in GEP transforms
To make sure that replaced operands get DCEd. This drops one
iteration from gepphigep.ll, which is still not optimal.

This was the last test case performing more than 3 iterations.

NFC-ish, only worklist order should change.
2020-03-28 19:07:25 +01:00
Nikita Popov 672e8bfbfc [InstCombine] Fix worklist management in foldXorOfICmps()
Because this code does not use the IC-aware replaceInstUsesWith()
helper, we need to manually push users to the worklist.

This is NFC-ish, in that it may only change worklist order.
2020-03-28 18:25:21 +01:00
Nikita Popov 337b671b0d [InstCombine] Change limit-max-iterations test case; NFC
This particular case will stop needing multiple iterations in
a followup change.
2020-03-28 18:25:20 +01:00
Martin Storsjö e6112a56dd [AsmPrinter] Emit .weak directive for weak linkage on COFF for symbols without a comdat
MC already knows how to emulate the .weak directive (with its ELF
semantics; i.e., an undefined weak symbol resolves to 0, and a defined
weak symbol has lower link precedence than a strong symbol of the same
name) using COFF weak externals. Plumb this through the ASM printer too,
so that definitions marked with __attribute__((weak)) at the language
level (which gets translated to weak linkage at the IR level) have the
corresponding .weak directive emitted. Note that declarations marked
with __attribute__((weak)) at the language level (which translates to
extern_weak at the IR level) already have .weak directives emitted.

Weak*/linkonce* symbols without an associated comdat (in particular, ones
generated with __attribute__((weak)) in C/C++) were earlier emitted as
normal unique globals, as the comdat is required to provide the linkonce
semantics. This change makes sure they are emitted as .weak instead,
allowing other symbols to override them.

Rename the existing coff-weak.ll test to coff-linkonce.ll. I'm not
quite sure what that test covers, since the behavior being tested in it
(the emission of a one_only section) is just a result of passing
-function-sections to llc; the linkonce_odr makes no difference.

Add a new coff-weak.ll which tests the new directive emission.

Based on a previous patch by Shoaib Meenai.

Differential Revision: https://reviews.llvm.org/D44543
2020-03-28 18:48:58 +02:00
Martin Storsjö 8330dcadb8 [llvm-rc] Allow -1 for menu item IDs
This seems to be used in some resource files, e.g.
f3217573d7/include/wx/msw/wx.rc (L28).

MSVC rc.exe and GNU windres both allow any value here, and silently
just truncate to uint16_t range. This just explicitly allows the
-1 value and errors out on others - the same was done for control
IDs in dialogs in c1a67857ba.
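
The accepted range can be sketched as follows. This is a Python illustration with a hypothetical `check_menu_item_id` helper; the actual llvm-rc check and its error wording are not shown in this message and are assumed here:

```python
def check_menu_item_id(value):
    """Return the 16-bit ID to emit, allowing -1 as a special case.

    Hypothetical helper: accepts 0..65535 and -1, rejects everything else.
    """
    if value == -1:
        return value & 0xFFFF  # -1 truncated to uint16_t range is 65535
    if 0 <= value <= 0xFFFF:
        return value
    raise ValueError(f"menu item ID {value} does not fit in 16 bits")

print(check_menu_item_id(-1))   # 65535
print(check_menu_item_id(100))  # 100
```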

Differential Revision: https://reviews.llvm.org/D76951
2020-03-28 14:32:08 +02:00
Simon Pilgrim 8c1dbd5c1e [X86][SSE] Add testnzc(~X,Y) -> testnzc(X,Y) test cases 2020-03-28 10:56:57 +00:00
Simon Pilgrim d34d2ec28b [X86][SSE] Add original PR38522 test case 2020-03-28 10:56:57 +00:00
Simon Pilgrim 8d85da5f5a [X86][SSE] Add combine tests for PTEST/TESTPS/TESTPD instructions
Including some test coverage for PR38522
2020-03-28 10:56:57 +00:00
Serge Pavlov f398739152 [FEnv] Constfold some unary constrained operations
This change implements constant folding to constrained versions of
intrinsics, implementing rounding: floor, ceil, trunc, round, rint and
nearbyint.

Differential Revision: https://reviews.llvm.org/D72930
2020-03-28 12:28:33 +07:00
Jessica Paquette 98d05f88d5 [GlobalISel] Fix equality for copies from physregs in matchEqualDefs
When we see this:

```
%a = COPY $physreg
...
SOMETHING implicit-def $physreg
...
%b = COPY $physreg
```

The two copies are not equivalent, and so we shouldn't perform any folding
on them.

When we have two instructions which use a physical register, check that they
define the same virtual register(s) as well.

e.g., if we run into this case

```
%a = COPY $physreg
...
%b = COPY %a
```

we can say that the two copies are the same, and can be folded.

Differential Revision: https://reviews.llvm.org/D76890
2020-03-27 17:52:21 -07:00
Kamlesh Kumar aabc24acf0 [RISCV] Support llvm.thread.pointer
Fixes https://bugs.llvm.org/show_bug.cgi?id=45303 (clang crashed on __builtin_thread_pointer)

Reviewed By: lenary, MaskRay, luismarques

Differential Revision: https://reviews.llvm.org/D76828
2020-03-27 17:30:12 -07:00
Nemanja Ivanovic 4821411347 [DAGCombine] Fix splitting indexed loads in ForwardStoreValueToDirectLoad()
In DAGCombiner::visitLOAD() we perform some checks before breaking up an indexed
load. However, we don't do the same checking in ForwardStoreValueToDirectLoad()
which can lead to failures later during combining
(see: https://bugs.llvm.org/show_bug.cgi?id=45301).

This patch just adds the same checks to this function as well.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45301

Differential revision: https://reviews.llvm.org/D76778
2020-03-27 18:03:47 -05:00
Florian Hahn 9ce198d6ed [Darwin] Respect -fno-unroll-loops during LTO.
Currently -fno-unroll-loops is ignored when doing LTO on Darwin. This
patch adds a new -lto-no-unroll-loops option to the LTO code generator
and forwards it to the linker if -fno-unroll-loops is passed.

Reviewers: thegameg, steven_wu

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D76916
2020-03-27 22:19:03 +00:00
Sanjay Patel 0f56bbc1a5 [InstCombine] reduce FP-casted and bitcasted signbit check
PR45305:
https://bugs.llvm.org/show_bug.cgi?id=45305

Alive2 proofs:
http://volta.cs.utah.edu:8080/z/bVyrko
http://volta.cs.utah.edu:8080/z/Vxpz9q
2020-03-27 17:33:59 -04:00
Sanjay Patel e72730ee3a [InstCombine] add tests for FP cast+bitcast signbit checks; NFC
PR45305:
https://bugs.llvm.org/show_bug.cgi?id=45305
2020-03-27 17:25:25 -04:00
Matt Arsenault a8cc9047de CodeGen: Add -denormal-fp-math-f32 flag
Make the set of FP related attributes and command flags closer.
2020-03-27 14:00:39 -07:00
Jay Foad a6dfd827e5 [AMDGPU] Fix getEUsPerCU for gfx10 in CU mode
Summary:
"Per CU" is a bit simplistic for gfx10, but I couldn't think of a better
name.

Reviewers: arsenm, rampitec, nhaehnle, dstuttard, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76861
2020-03-27 20:36:49 +00:00
Fangrui Song 152d14da64 [MC][X86] Make .reloc support arbitrary relocation types
Generalizes D62014 (R_386_NONE/R_X86_64_NONE).

Unlike ARM (D76746) and AArch64 (D76754), we cannot delete FK_NONE from
getFixupKindSize because FK_NONE is still used by R_386_TLS_DESC_CALL/R_X86_64_TLSDESC_CALL.
2020-03-27 13:33:15 -07:00
Matt Arsenault 0fd8030be3 Fix line endings in test 2020-03-27 16:26:06 -04:00
Matt Arsenault 348735b723 AMDGPU: Stop setting attributes based on TargetOptions
Having arbitrary passes looking at the TargetOptions is pretty
messy. This was also disregarding if a function already had an
explicit attribute setting on it. opt/llc now add the attributes to
functions that don't specify the attribute. clang and lld do not call
the function to do this, which they maybe should.

This was also treating unsafe-fp-math as implying the others, and
setting the other attributes based on it. This is not done anywhere
else, and I'm not sure is correct based on the current description of
the option bit.

Effectively reverts 1d8cf2be89
2020-03-27 13:13:43 -07:00
Fangrui Song 34d77516b8 [MC][AArch64] Make .reloc support arbitrary relocation types
Depends on D76746. Generalizes D61973.

Differential Revision: https://reviews.llvm.org/D76754
2020-03-27 12:30:52 -07:00
Fangrui Song c389526171 [MC][ARM] Make .reloc support arbitrary relocation types
Generalizes D61992. In GNU as, the .reloc directive supports arbitrary relocation types.

A MCFixupKind value `V` larger than or equal to FirstLiteralRelocationKind
is used to represent the relocation type whose number is V-FirstLiteralRelocationKind.
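
The encoding can be sketched in a few lines of Python. The constant's value below is a placeholder assumption (the real value of FirstLiteralRelocationKind is not given in this message), and both helper names are hypothetical:

```python
# Placeholder value, assumed for illustration only.
FIRST_LITERAL_RELOCATION_KIND = 256

def literal_kind_for(reloc_type):
    """Encode a raw relocation type number as a fixup kind value."""
    return FIRST_LITERAL_RELOCATION_KIND + reloc_type

def reloc_type_of(kind):
    """Return the relocation type for a literal kind, or None for an ordinary fixup kind."""
    if kind >= FIRST_LITERAL_RELOCATION_KIND:
        return kind - FIRST_LITERAL_RELOCATION_KIND
    return None

# 70 is an arbitrary relocation number picked for the round trip.
assert reloc_type_of(literal_kind_for(70)) == 70
# Values below the threshold are ordinary fixup kinds, not literal relocations.
assert reloc_type_of(3) is None
```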

This is useful for linker tests. Without the feature the assembler
cannot produce certain relocation records (e.g.  R_ARM_ALU_PC_G0/R_ARM_LDR_PC_G0)
This helps move forward D75349 and D76575.

Differential Revision: https://reviews.llvm.org/D76746
2020-03-27 12:29:49 -07:00
Craig Topper cdd1cd7120 [X86] Don't form masked instructions if the operation has an additional user.
This will cause the operation to be repeated in both a masked and another masked
or unmasked form. This can be a waste of execution resources.

Differential Revision: https://reviews.llvm.org/D60940
2020-03-27 10:44:22 -07:00
Simon Pilgrim 763c87309d [X86][SSE] Add some additional v8i16 'truncation' style shuffle tests 2020-03-27 17:29:29 +00:00
Dennis Felsing aa0be69e74 Export Segment.IsGapRegion to JSON
Summary:
So that external tools can make use of that information and not display such lines as uncovered.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45300

Reviewers: vsk

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D76763
2020-03-27 18:05:01 +01:00
jasonliu d60d7d69de [llvm-objdump][XCOFF][AIX] Implement -r option
Summary:
Implement several XCOFF hooks to get '-r' option working for llvm-objdump -r.

Reviewer: DiggerLin, hubert.reinterpretcast, jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D75131
2020-03-27 16:05:42 +00:00
Sam Parker d7084fa34a [ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros
Given that some instructions generate wider result elements than
their inputs, flag them as being able to generate non zeros in the
false lanes.

Differential Revision: https://reviews.llvm.org/D76766
2020-03-27 15:26:13 +00:00