Commit Graph

182179 Commits

Author SHA1 Message Date
Sean Fertile 324d33dd4e [PowerPC] Fix comment on MO_PLT Target Operand Flag. [NFC]
Patch by Xiangling Liao.

llvm-svn: 366724
2019-07-22 18:47:59 +00:00
Sean Fertile 8034daca5f [Object][XCOFF] Remove extra includes from XCOFF related files. [NFC]
Differential Revision: https://reviews.llvm.org/D60885

llvm-svn: 366723
2019-07-22 18:47:55 +00:00
Peter Collingbourne c3b8661df5 LowerTypeTests: Teach the pass to respect global alignments.
We were previously ignoring alignment entirely when combining globals
together in this pass. There are two main things that we need to do here:
add additional padding before each global to meet the alignment requirements,
and set the combined global's alignment to the maximum of all of the original
globals' alignments.

Since we now need to calculate layout as we go anyway, use the calculated
layout to produce GlobalLayout instead of using StructLayout.

Differential Revision: https://reviews.llvm.org/D65033

llvm-svn: 366722
2019-07-22 18:47:03 +00:00
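
The padding rule described in the LowerTypeTests commit above can be sketched in a few lines. This is only an illustration of the arithmetic, not the pass's actual code: the function name `layout_globals` and the `(size, alignment)` tuple representation are hypothetical, and alignments are assumed to be powers of two.

```python
def layout_globals(globals_):
    """Lay out (size, alignment) pairs back to back.

    Hypothetical helper: pads before each global so it starts at a
    multiple of its alignment, and tracks the maximum alignment seen,
    which becomes the alignment of the combined global.
    """
    offset = 0
    max_align = 1
    offsets = []
    for size, align in globals_:
        # Bytes needed to round the current offset up to `align`.
        padding = (-offset) % align
        offset += padding
        offsets.append(offset)
        offset += size
        max_align = max(max_align, align)
    return offsets, offset, max_align


offsets, total_size, combined_align = layout_globals([(1, 1), (4, 4), (2, 2), (8, 8)])
print(offsets, total_size, combined_align)
```

Computing the offsets while walking the globals is what makes a separate layout query unnecessary in this sketch, which mirrors the message's remark about calculating layout "as we go".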
Nilanjana Basu 06b8fe8d03 Changes to emit CodeView debug info nested type records properly using MCStreamer directives
llvm-svn: 366720
2019-07-22 18:22:55 +00:00
Stanislav Mekhanoshin 401461584d [AMDGPU] Test update. NFC.
llvm-svn: 366715
2019-07-22 18:08:53 +00:00
Simon Pilgrim 3ebd2fe91a [SLPVectorizer] Fix some MSVC/cppcheck uninitialized variable warnings. NFCI.
llvm-svn: 366712
2019-07-22 17:57:36 +00:00
Vlad Tsyrklevich 5874a28ac5 Revert "Reland [ELF] Loose a condition for relocation with a symbol"
This reverts commit r366686 as it appears to be causing buildbot
failures on sanitizer-x86_64-linux-android and sanitizer-x86_64-linux.

llvm-svn: 366708
2019-07-22 17:48:53 +00:00
Matt Arsenault 542720b2bc TableGen: Support physical register inputs > 255
This was truncating register values that didn't fit in unsigned char.
Switch AMDGPU sendmsg intrinsics to using a tablegen pattern.

llvm-svn: 366695
2019-07-22 15:02:34 +00:00
Sam Parker 4379a40088 [ARM][LowOverheadLoops] Revert remaining pseudos
ARMLowOverheadLoops would assert a failure if it did not find all the
pseudo instructions that comprise the hardware loop. Instead of doing
this, iterate through all the instructions of the function and revert
any remaining pseudo instructions that haven't been converted.

Differential Revision: https://reviews.llvm.org/D65080

llvm-svn: 366691
2019-07-22 14:16:40 +00:00
Matt Arsenault 4668ea4072 AMDGPU/GlobalISel: Fix broken tests
llvm-svn: 366688
2019-07-22 13:33:11 +00:00
Nikola Prica 0166cff09b Reland [ELF] Loose a condition for relocation with a symbol
This patch was not the cause of the buildbot failure.

The deleted code was introduced as a workaround for a bug in the gold linker
(http://sourceware.org/PR16794). The test case given as the reason for
this code, the one at the link above, now works with gold.
The condition is too strict: when code is compiled with debug info,
it forces generation of numerous relocations with a symbol for architectures
that do not have a relocation addend.

Reviewers: arsenm, espindola

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D64327

llvm-svn: 366686
2019-07-22 13:07:01 +00:00
Matt Arsenault 937d0ee5d8 AMDGPU/GlobalISel: Remove unnecessary code
The minnum/maxnum cases are dead, and the cvt is handled by the
default.

llvm-svn: 366685
2019-07-22 13:05:25 +00:00
David Green 8876a312a8 [ARM] Fix for MVE VPT block pass
We need to ensure that the number of T's is correct when adding multiple
instructions into the same VPT block.

Differential revision: https://reviews.llvm.org/D65049

llvm-svn: 366684
2019-07-22 12:51:38 +00:00
Simon Pilgrim b3d719e1cf [X86] EltsFromConsecutiveLoads - support common source loads (REAPPLIED)
This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.

A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offset loads are then matched against a previous load that has already been confirmed to be a consecutive match.

Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.

Fixed out of bounds load assert identified in rL366501

Differential Revision: https://reviews.llvm.org/D64551

llvm-svn: 366681
2019-07-22 12:44:10 +00:00
Matt Arsenault 8d372008b1 AMDGPU/GlobalISel: Fix tests without asserts
The legality check is only done under NDEBUG, so the failure cases are
different in a release build.

llvm-svn: 366680
2019-07-22 12:43:41 +00:00
Christudasan Devadasan 006cf8c03d Added address-space mangling for stack related intrinsics
Modified the following 3 intrinsics:
int_addressofreturnaddress,
int_frameaddress & int_sponentry.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D64561

llvm-svn: 366679
2019-07-22 12:42:48 +00:00
Simon Pilgrim bdb9295520 [X86][SSE] Add EltsFromConsecutiveLoads test case identified in rL366501
Test case that led to rL366441 being reverted at rL366501

llvm-svn: 366678
2019-07-22 12:17:56 +00:00
George Rimar 13a364e1cc [yaml2obj] - Change how we handle implicit sections.
Instead of having the special list of implicit sections,
that are mixed with the sections read from YAML on late
stages, I just create the placeholders and add them to
the main sections list early.

That allows the code to be simplified significantly.

Differential revision: https://reviews.llvm.org/D64999

llvm-svn: 366677
2019-07-22 12:01:52 +00:00
Stefan Granitz 3a52e50d73 Add location of SVN staging dir to git-llvm error output
Summary:
In pre-monorepo times the svn staging directory was `.git/svn`. The below error message wasn't mentioning the new name yet.

Example before:
```
Can't push git rev 104cfa289d9 because svn status is not empty:
!     llvm/trunk/include/llvm
```

Example after:
```
Can't push git rev 104cfa289d9 because status in svn staging dir (.git/llvm-upstream-svn) is not empty:
!     llvm/trunk/include/llvm
```

Reviewers: mehdi_amini, jlebar, teemperor

Reviewed By: mehdi_amini

Subscribers: llvm-commits, #llvm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65038

llvm-svn: 366671
2019-07-22 09:47:40 +00:00
Oliver Stannard 6771a89fa0 [IPRA][ARM] Make use of the "returned" parameter attribute
ARM has code to recognise uses of the "returned" function parameter
attribute which guarantee that the value passed to the function in r0
will be returned in r0 unmodified. IPRA replaces the regmask on call
instructions, so needs to be told about this to avoid reverting the
optimisation.

Differential revision: https://reviews.llvm.org/D64986

llvm-svn: 366669
2019-07-22 08:44:36 +00:00
George Rimar 6522a7df54 [llvm-readobj] - Stop using precompiled objects in file-headers.test
This converts all sub-tests except one to YAML instead of precompiled inputs.

Differential revision: https://reviews.llvm.org/D64800

llvm-svn: 366668
2019-07-22 08:10:02 +00:00
Jay Foad 298500ae33 [AMDGPU] Save some work when an atomic op has no uses
Summary:
In the atomic optimizer, save doing a bunch of work and generating a
bunch of dead IR in the fairly common case where the result of an
atomic op (i.e. the value that was in memory before the atomic op was
performed) is not used. NFC.

Reviewers: arsenm, dstuttard, tpr

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64981

llvm-svn: 366667
2019-07-22 07:19:44 +00:00
Kai Luo 3d72a58981 [PowerPC][NFC] Precommit a test case where ppc-mi-peepholes miscompiles extswsli
Added a test case to show codegen differences.

llvm-svn: 366666
2019-07-22 05:32:20 +00:00
Serguei Katkov c6c31da867 [Loop Peeling] Fix the handling of branch weights of peeled off branches.
Current algorithm to update branch weights of latch block and its copies is
based on the assumption that number of peeling iterations is approximately equal
to trip count.

However, this assumption is not correct. According to the profitability check, in one case we can decide to peel
because it helps to reduce the number of phi nodes. In this case the number of peeled iterations
can be less than the estimated trip count.

This patch introduces another way to set the branch weights of peeled-off branches.
Let F be the weight of the edge from latch to header.
Let E be the weight of the edge from latch to exit.
F/(F+E) is the probability of going to the loop and E/(F+E) is the probability of going to the exit.
Then, Estimated TripCount = F / E.
For the I-th (counting from 0) peeled-off iteration we set the weights for
the peeled latch as (TC - I, 1). This gives us a reasonable distribution:
the probability of going to the exit, 1/(TC-I), increases. At the same time
the estimated trip count of the remaining loop reduces by I.

As a result, after peeling off N iterations the weights will be
(F - N * E, E) and the trip count of the loop becomes
F / E - N or TC - N.

The idea is taken from the review of the patch D63918 proposed by Philip.

Reviewers: reames, mkuper, iajbar, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D64235

llvm-svn: 366665
2019-07-22 05:15:34 +00:00
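
The weight arithmetic in the Loop Peeling message above can be worked through numerically. The sketch below follows only the formulas stated in the message; the function name `peeled_weights` is hypothetical, and exact fractions are used purely for illustration (the real pass works with integer branch-weight metadata).

```python
from fractions import Fraction


def peeled_weights(f, e, n):
    """Return (TC, per-iteration latch weights, remaining loop weights).

    f: weight of the latch -> header edge
    e: weight of the latch -> exit edge
    n: number of peeled iterations
    """
    tc = Fraction(f, e)  # estimated trip count, F / E
    # The I-th peeled latch (counting from 0) gets weights (TC - I, 1).
    peeled = [(tc - i, 1) for i in range(n)]
    # The loop that remains after peeling keeps (F - N * E, E).
    remaining = (f - n * e, e)
    return tc, peeled, remaining


tc, peeled, remaining = peeled_weights(100, 10, 3)
print(tc, peeled, remaining)
print(Fraction(*remaining))  # trip count of the remaining loop, TC - N
```

With F = 100, E = 10 and N = 3, the remaining weights are (70, 10), so the remaining trip count is 7, which matches TC - N = 10 - 3.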
Fangrui Song 6ef23e6581 [utils] Clean up UpdateTestChecks/common.py
llvm-svn: 366664
2019-07-22 04:59:01 +00:00
Craig Topper ee5dc7e7ad [InstCombine] Add foldAndOfICmps test cases inspired by PR42691.
icmp ne %x, INT_MIN can be treated similarly to icmp sgt %x, INT_MIN.
icmp ne %x, INT_MAX can be treated similarly to icmp slt %x, INT_MAX.
icmp ne %x, UINT_MAX can be treated similarly to icmp ult %x, UINT_MAX.

We already treat icmp ne %x, 0 similarly to icmp ugt %x, 0

llvm-svn: 366662
2019-07-22 02:43:43 +00:00
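
One reading of "treated similarly" in the InstCombine message above is that each `ne` comparison agrees with the corresponding ordered comparison on every input. A minimal brute-force check of that reading, at an assumed width of 8 bits and with a hypothetical function name, might look like this:

```python
def predicates_agree(width=8):
    """Exhaustively compare `ne` against the ordered predicate at `width` bits."""
    int_min = -(1 << (width - 1))
    int_max = (1 << (width - 1)) - 1
    uint_max = (1 << width) - 1

    # Signed view: ne INT_MIN vs sgt INT_MIN, ne INT_MAX vs slt INT_MAX.
    signed_ok = all(
        (x != int_min) == (x > int_min) and (x != int_max) == (x < int_max)
        for x in range(int_min, int_max + 1)
    )
    # Unsigned view: ne UINT_MAX vs ult UINT_MAX, ne 0 vs ugt 0.
    unsigned_ok = all(
        (x != uint_max) == (x < uint_max) and (x != 0) == (x > 0)
        for x in range(0, uint_max + 1)
    )
    return signed_ok and unsigned_ok


print(predicates_agree())
```

The equivalences hold because each constant is an extreme of its range, so "not equal to it" and "strictly on the other side of it" describe the same set of values.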
Nemanja Ivanovic 3d68adebc5 [PowerPC][NFC] Precomit test case for upcoming patch
Just committing a test case for an upcoming patch so that the review can show
only the codegen differences.

llvm-svn: 366661
2019-07-21 21:03:45 +00:00
Simon Pilgrim 86fa3270ef [X86] SimplifyDemandedVectorEltsForTargetNode - Move SUBV_BROADCAST narrowing handling. NFCI.
Move the narrowing of SUBV_BROADCAST to where we handle all the other opcodes.

llvm-svn: 366660
2019-07-21 19:04:44 +00:00
Nemanja Ivanovic 73d641a23c [PowerPC][NFC] Regenerate test using script
This test case ended up as a hybrid of generated checks and manually inserted
checks. Regenerate using script to make it consistent.

llvm-svn: 366659
2019-07-21 18:42:29 +00:00
Craig Topper e6cd20ba53 [InstCombine] Update comment I missed in r366649. NFC
llvm-svn: 366658
2019-07-21 16:15:03 +00:00
Simon Pilgrim 630be14ac6 [SmallBitVector] Fix bug in find_next_unset for small types with indices >=32
We were creating a bitmask from a shift of unsigned instead of uintptr_t, meaning we couldn't create masks for indices above 31.

Noticed due to an MSVC analyzer warning.

llvm-svn: 366657
2019-07-21 16:06:26 +00:00
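
The SmallBitVector fix above comes down to which integer type the shifted `1` has. The sketch below models that in Python by keeping only the bits that fit in a type of a given width; `mask_with_width` is a hypothetical helper, and `uintptr_t` is assumed to be 64 bits wide. In C++ a shift by at least the type's width is undefined behavior, so this model only shows that the desired bit cannot be represented in the narrower type.

```python
def mask_with_width(index, type_bits):
    """Model `1 << index` evaluated in an unsigned type of `type_bits` bits."""
    return (1 << index) & ((1 << type_bits) - 1)


# Assumed widths: 32-bit `unsigned`, 64-bit `uintptr_t`.
print(hex(mask_with_width(40, 32)))  # bit 40 does not fit in 32 bits
print(hex(mask_with_width(40, 64)))  # bit 40 fits in 64 bits
```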
Aditya Nandakumar d7504a1569 [GISel]: Attach missing range metadata while translating G_LOADs
https://reviews.llvm.org/D65048

Attach range information to G_LOAD when only defining one register.

reviewed by: arsenm

llvm-svn: 366656
2019-07-21 14:07:54 +00:00
David Green c38899fc26 [ARM] Move MVE VPT block tests into the Thumb2 directory. NFC
llvm-svn: 366655
2019-07-21 13:09:19 +00:00
Roman Lebedev 8a431874e9 [NFC][InstCombine] Add a few extra srem-by-power-of-two tests - extra uses
llvm-svn: 366652
2019-07-21 09:05:49 +00:00
Craig Topper 1d149d08d3 [InstCombine] Remove insertRangeTest code that handles the equality case.
For equality, the function called getTrue/getFalse with the VT
of the comparison input. But getTrue/getFalse need the boolean VT.
So if this code ever executed, it would assert.

I believe these cases are removed by InstSimplify so we don't get here.

So this patch just fixes up an assert to exclude the equality
possibility and removes the broken code.

llvm-svn: 366649
2019-07-21 06:43:38 +00:00
Craig Topper 8fabdfe9fc [InstCombine] Don't use AddOne/SubOne to see if two APInts are 1 apart. Use APInt operations instead. NFCI
AddOne/SubOne create new Constant objects. That seems heavy for
comparing ConstantInts which wrap APInts. Just do the math
on the APInts and compare them.

llvm-svn: 366648
2019-07-21 05:26:05 +00:00
Nico Weber b910956202 gn build: Merge r366622
llvm-svn: 366646
2019-07-21 00:03:55 +00:00
Roman Lebedev a2dd672c5f [NFC][InstCombine] Autogenerate a few tests
llvm-svn: 366643
2019-07-20 21:34:00 +00:00
Roman Lebedev 056640f8b3 [NFC][InstCombine] Add srem-by-signbit tests - still can fold to bittest
https://rise4fun.com/Alive/IIeS

llvm-svn: 366642
2019-07-20 21:33:50 +00:00
Roman Lebedev 7f0c23576f [NFC][Codegen][X86][AArch64] Add "(x s% C) == 0" tests
Much like with `urem`, the same optimization (albeit with slightly
different algorithm) applies for the signed case, too.

I'm simply copying the test coverage from `urem` case for now,
i believe it should be (close to?) sufficient.

llvm-svn: 366640
2019-07-20 19:25:44 +00:00
Roman Lebedev cd9b19484b [Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements
Summary:
Four things here:
1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
2. Unban power-of-two divisors. I don't see any reason why they should
   be illegal.
   * There is no ban in Hacker's Delight
   * I think the ban came from the same bug that caused the miscompile
      in the base patch - in `floor((2^W - 1) / D)` we were dividing by
      `D0` instead of `D`, and we **were** ensuring that `D0` is not `1`,
      which made sense.
3. Unban `1` divisors. I no longer believe Hacker's Delight actually says
   that the fold is invalid for `D = 0`. Further considerations:
   * We know that
     * `(X u% 1) == 0`  can be constant-folded to `1`,
     * `(X u% 1) != 0`  can be constant-folded to `0`,
   *  Also, we know that
     * `X u<= -1` can be constant-folded to `1`,
     * `X u>  -1` can be constant-folded to `0`,
   * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p
   * We know we will end up with the following:
       `(setule/setugt (rotr (mul N, P), K), Q)`
   * Therefore, for given new DAG nodes and comparison predicates
     (`ule`/`ugt`), we will still produce the correct answer if:
     `Q` is an all-ones constant; and both `P` and `K` are *anything*
     other than `undef`.
   * The fold will indeed produce `Q = all-ones`.
4. Try to re-splat the `P` and `K` vectors - we don't care about
   their values for the lanes where divisor was `1`.

Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00

Reviewed By: RKSimon

Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63963

llvm-svn: 366637
2019-07-20 16:33:15 +00:00
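
The shape `(setule (rotr (mul N, P), K), Q)` from the fold described above can be checked by brute force at a small bit width. The sketch below is a scalar model under stated assumptions, not the SelectionDAG code: it assumes the usual construction where `D = D0 * 2^K` with `D0` odd, `P` is the multiplicative inverse of `D0` modulo `2^W`, and `Q = floor((2^W - 1) / D)`. The function name and the 8-bit default width are hypothetical.

```python
def urem_is_zero_fold(n, d, width=8):
    """Evaluate (N u% D) == 0 as rotr(N * P, K) u<= Q at `width` bits."""
    mask = (1 << width) - 1
    k = (d & -d).bit_length() - 1   # number of trailing zero bits in D
    d0 = d >> k                     # odd part of the divisor
    p = pow(d0, -1, 1 << width)     # modular inverse (needs Python 3.8+)
    q = mask // d                   # floor((2^W - 1) / D)
    prod = (n * p) & mask
    rotated = ((prod >> k) | (prod << (width - k))) & mask  # rotate right by K
    return rotated <= q


# Compare against the direct computation, including D = 1 and powers of two.
mismatches = [
    (n, d)
    for d in range(1, 256)
    for n in range(256)
    if urem_is_zero_fold(n, d) != (n % d == 0)
]
print(mismatches)
```

For `D = 1` this model yields `K = 0`, `P = 1` and `Q` equal to the all-ones value, so the comparison is always true, consistent with point 3 of the message.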
Simon Pilgrim adec0f2252 [X86][SSE] Use PSADBW to improve vXi8 sum reduction (PR42674)
As detailed on PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero, to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result.

llvm-svn: 366636
2019-07-20 15:20:11 +00:00
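
The final step of the PSADBW reduction above can be modelled on plain integers. This is a sketch of the arithmetic only, with hypothetical function names: PSADBW against zero is modelled as the sum of absolute differences of eight unsigned bytes, giving a 16-bit result, from which the bottom 8 bits are kept.

```python
def psadbw_zero(lanes):
    """Model PSADBW of eight unsigned bytes against an all-zero operand."""
    if len(lanes) != 8 or not all(0 <= b <= 255 for b in lanes):
        raise ValueError("expected eight unsigned byte values")
    # Sum of |b - 0| over the eight lanes, held in a 16-bit result.
    return sum(abs(b - 0) for b in lanes) & 0xFFFF


def reduce_add_i8(lanes):
    """Sum eight i8 lanes with wraparound by keeping the bottom byte."""
    return psadbw_zero(lanes) & 0xFF


lanes = [200, 100, 50, 25, 12, 6, 3, 1]
print(psadbw_zero(lanes), reduce_add_i8(lanes))
```

Because the eight-byte sum is at most 8 * 255 = 2040, it always fits in the 16-bit result, so truncating to 8 bits gives the same value as a wrapping i8 addition.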
Florian Hahn 0a7faa4e3d [Local] Zap blockaddress without users in ConstantFoldTerminator.
If the blockaddress is not destroyed, the destination block will still
be marked as having its address taken, limiting further transformations.

I think there are other places where the dead blockaddress constants are kept
around, I'll look into that as follow up.

Reviewers: craig.topper, brzycki, davide

Reviewed By: brzycki, davide

Differential Revision: https://reviews.llvm.org/D64936

llvm-svn: 366633
2019-07-20 12:25:47 +00:00
Jessica Paquette 41affad967 [GlobalISel][AArch64] Contract trivial same-size cross-bank copies into G_STOREs
Sometimes, you can end up with cross-bank copies between same-sized GPRs and
FPRs, which feed into G_STOREs. When these copies feed only into stores, they
aren't necessary; we can just store using the original register bank.

This provides some minor code size savings for some floating point SPEC
benchmarks. (Around 0.2% for 453.povray and 450.soplex)

This issue doesn't seem to show up due to regbankselect or anything similar. So,
this patch introduces an early select function, `contractCrossBankCopyIntoStore`
which performs the contraction when possible. The selector then continues
normally and selects the correct store opcode, eliminating needless copies
along the way.

Differential Revision: https://reviews.llvm.org/D65024

llvm-svn: 366625
2019-07-20 01:55:35 +00:00
Guanzhong Chen 5204f7611f [WebAssembly] Compute and export TLS block alignment
Summary:
Add immutable WASM global `__tls_align` which stores the alignment
requirements of the TLS segment.

Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang.

The expected usage has now changed to:

    __wasm_init_tls(memalign(__builtin_wasm_tls_align(),
                             __builtin_wasm_tls_size()));

Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton

Reviewed By: tlively

Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65028

llvm-svn: 366624
2019-07-19 23:34:16 +00:00
Daniel Sanders 578e8fa833 Re-commit: r366610 and r366612: Expand pseudo-components before embedding in llvm-config
There were two main problems:
* The 'nativecodegen' pseudo-component was unconditionally adding
  ${native_tgt}CodeGen even though it conditionally added ${native_tgt}Info and
  ${native_tgt}Desc. This has been fixed by making ${native_tgt}CodeGen
  conditional too
* The 'all' pseudo-component was causing library names like LLVMLLVMDemangle as
  the expansion was to a library name and not a component. There doesn't seem to
  be a list of available components anywhere so this has been fixed by moving the
  expansion of 'all' back where it was before. This manifested in different ways
  on different builders but it was the same root cause

llvm-svn: 366622
2019-07-19 22:46:47 +00:00
Matt Arsenault f3bfb85bce AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces
llvm-svn: 366621
2019-07-19 22:28:44 +00:00
Stanislav Mekhanoshin 05d9e6a2a3 [AMDGPU] Autogenerate register sequences in tuples
Differential Revision: https://reviews.llvm.org/D65007

llvm-svn: 366619
2019-07-19 21:43:42 +00:00
Stanislav Mekhanoshin 7b5a54e369 [AMDGPU] Fixed occupancy calculation for gfx10
Differential Revision: https://reviews.llvm.org/D65010

llvm-svn: 366616
2019-07-19 21:29:51 +00:00
Daniel Sanders 34da8dfba0 Revert r366610 and r366612: Expand pseudo-components before embedding in llvm-config
Some targets are missing LLVMDemangle, one is adding the LLVM prefix twice, and two
are hitting the very error this patch fixes for my target. Reverting while I work
through the reports.

llvm-svn: 366615
2019-07-19 21:11:05 +00:00