Commit Graph

65849 Commits

Author SHA1 Message Date
Simon Pilgrim fdc0917b46 Fix OCaml/core.ml fneg check (try 2)
llvm-svn: 374355
2019-10-10 14:13:55 +00:00
Dmitri Gribenko eaf6dd482b Revert "[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator"
This reverts commit r374240. It broke OCaml tests:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19014

llvm-svn: 374354
2019-10-10 14:13:54 +00:00
Thomas Preud'homme 48edae336b Revert "[test] Use system locale for mri-utf8.test"
This reverts commit r374318 / b6f1d1fa0e.

llvm-svn: 374349
2019-10-10 13:39:12 +00:00
Simon Pilgrim fbf8b0bc0d Fix OCaml/core.ml fneg check
llvm-svn: 374346
2019-10-10 13:29:47 +00:00
George Rimar 55f1be0996 [llvm-readelf] - Do not enter an infinite loop when printing histogram.
This is similar to D68086.
We are entering an infinite loop when dumping a histogram for a specially crafted
.hash section with a loop in a chain.

Differential revision: https://reviews.llvm.org/D68771

llvm-svn: 374344
2019-10-10 13:26:26 +00:00
Kai Nacke 819f01d917 [Tests] Output of od can be lower or upper case (llvm-objcopy/yaml2obj).
The command `od -t x` is used to dump data in hex format.
The LIT tests assume that the hex characters are in lowercase.
However, there are also platforms which use uppercase letters.

To solve this issue the tests are updated to use the new
`--ignore-case` option of FileCheck.

Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson

Differential Revision: https://reviews.llvm.org/D68693

llvm-svn: 374343
2019-10-10 13:24:00 +00:00
Amaury Sechet aaf0507896 [DAGCombine] Match more patterns for half word bswap
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.
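
As a rough illustration (a minimal sketch, not the exact new subtree this patch matches), this is the classic shift/or form that DAGCombine recognizes as a halfword byte swap:

```
define i16 @swap16(i16 %x) {
  ; (x << 8) | (x >> 8) is a byte swap of a 16-bit value
  %hi = shl i16 %x, 8
  %lo = lshr i16 %x, 8
  %r = or i16 %hi, %lo
  ret i16 %r
}
```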

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68250

llvm-svn: 374340
2019-10-10 13:20:10 +00:00
Kai Nacke dfd2b6f07f [FileCheck] Implement --ignore-case option.
The FileCheck utility is enhanced to support a `--ignore-case`
option. This is useful in cases where the output of Unix tools
differs in case (e.g. when the case is not specified by POSIX).

Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D68146

llvm-svn: 374339
2019-10-10 13:15:41 +00:00
Pavel Labath 3aa7e76677 MinidumpYAML: Add support for the memory info list stream
Summary:
The implementation is fairly straight-forward and uses the same patterns
as the existing streams. The yaml form does not attempt to preserve the
data in the "gaps" that can be created by setting a larger-than-required
header or entry size in the stream header, because the existing consumer
(lldb) does not make use of the information in the gap in any way, and
attempting to preserve that would make the implementation more
complicated.

Reviewers: amccarth, jhenderson, clayborg

Subscribers: llvm-commits, lldb-commits, markmentovai, zturner, JosephTremoulet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68645

llvm-svn: 374337
2019-10-10 13:05:46 +00:00
David Green 39596ec2fe [ARM] VQADD instructions
This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat
intrinsics.
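
A minimal sketch of the kind of IR that now selects to MVE VQADD (function name hypothetical):

```
declare <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16>, <8 x i16>)

define <8 x i16> @vqadd_s16(<8 x i16> %a, <8 x i16> %b) {
  ; saturating signed add on a 128-bit MVE vector
  %r = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> %a, <8 x i16> %b)
  ret <8 x i16> %r
}
```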

Differential Revision: https://reviews.llvm.org/D68566

llvm-svn: 374336
2019-10-10 13:05:04 +00:00
Sanjay Patel 3370d4d2b7 [AArch64][x86] add tests for (v)select bit magic; NFC
llvm-svn: 374334
2019-10-10 12:53:24 +00:00
Mirko Brkusanin c2e481679b [Mips] Fix 374055
EXPENSIVE_CHECKS build was failing on new test.
This is fixed by marking $ra register as undef.
Test now has -verify-machineinstrs to check for operand flags.

llvm-svn: 374320
2019-10-10 12:02:14 +00:00
Thomas Preud'homme b6f1d1fa0e [test] Use system locale for mri-utf8.test
Summary:
llvm-ar's mri-utf8.test test relies on the en_US.UTF-8 locale to be
installed for its last RUN line to work. If not installed, the unicode
string gets encoded (interpreted) as ASCII, which fails since the most
significant byte is non-zero. This commit changes the test to only rely
on the system being able to encode the pound sign in its default
encoding (e.g. UTF-16 for Microsoft Windows) by always opening the file
via input/output redirection. This avoids forcing a given locale to be
present and supported. A Byte Order Mark is also added to help
recognize the encoding of the file and its endianness.

Reviewers: gbreynoo, MaskRay, rupprecht, JamesNagurne, jfb

Subscribers: dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68472

llvm-svn: 374318
2019-10-10 11:48:30 +00:00
Oliver Stannard 4f454b2275 [IfCvt][ARM] Optimise diamond if-conversion for code size
Currently, the heuristics the if-conversion pass uses for diamond if-conversion
are based on execution time, with no consideration for code size. This adds a
new set of heuristics to be used when optimising for code size.

This is mostly target-independent, because the if-conversion pass can
see the code size of the instructions which it is removing. For Thumb,
there are a few passes (insertion of IT instructions, selection of
narrow branches, and selection of CBZ instructions) which are run after
if conversion and affect these heuristics, so I've added target hooks to
better predict the code-size effect of a proposed if-conversion.

Differential revision: https://reviews.llvm.org/D67350

llvm-svn: 374301
2019-10-10 09:58:28 +00:00
Matt Arsenault 12994a70cf AMDGPU: Use SGPR_128 instead of SReg_128 for vregs
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.

llvm-svn: 374284
2019-10-10 07:11:33 +00:00
Craig Topper 0a84576262 [X86] Add test case for trunc_packus_v16i32_v16i8 with avx512vl+avx512bw and prefer-vector-width=256 and min-legal-vector-width=256. NFC
llvm-svn: 374283
2019-10-10 06:25:00 +00:00
Johannes Doerfert 72adda1740 [Attributor] Handle `null` differently in capture and alias logic
Summary:
`null` in the default address space (=AS 0) cannot be captured nor can
it alias anything. We make this clear now as it can be important for
callbacks and other cases later on. In addition, this patch improves the
debug output for noalias deduction.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68624

llvm-svn: 374280
2019-10-10 05:33:21 +00:00
Chen Zheng 92e00293fd [PowerPC] add testcase for ppc loop instr form prep - NFC
llvm-svn: 374273
2019-10-10 03:00:15 +00:00
Reid Kleckner 9d8f0b3519 [codeview] Try to avoid emitting .cv_loc with line zero
Summary:
Visual Studio doesn't like it while stepping. It kicks you out of the
source view of the file being stepped through and tries to fall back to
the disassembly view.

Fixes PR43530

The fix is incomplete, because it's possible to have a basic block with
no source locations at all. In this case, we don't emit a .cv_loc, but
that will result in wrong stepping behavior in the debugger if the
layout predecessor of the location-less BB has an unrelated source
location. We could try harder to find a valid location that dominates or
post-dominates the current BB, but in general it's a dataflow problem,
and one still might not exist. I left a FIXME about this.

As an alternative, we might want to consider having the middle-end check
if it's emitting codeview and get it to stop using line zero.

Reviewers: akhuang

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68747

llvm-svn: 374267
2019-10-10 01:06:01 +00:00
Thomas Lively 3414bce07a [WebAssembly] Fix tests missed in rL374235
llvm-svn: 374259
2019-10-09 23:06:38 +00:00
Matt Arsenault f8bf7d7f42 AMDGPU: Don't fold copies to physregs
In a future patch, this will help cleanup m0 handling.

The register coalescer handles copies from a register that
materializes an immediate, but doesn't handle move immediates
itself. The virtual register uses will often be allocated to the same
register, so there ends up being no real copy.

llvm-svn: 374257
2019-10-09 22:51:42 +00:00
Matt Arsenault 85dfa82302 AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointer
This was ignoring the register bank of the input pointer, and
isUniformMMO seems overly aggressive.

This will now conservatively assume a VGPR in cases where the incoming
bank hasn't been determined yet (i.e. is from a loop phi).

llvm-svn: 374255
2019-10-09 22:44:49 +00:00
Matt Arsenault 3cd3959fe2 GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR
Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR.

llvm-svn: 374252
2019-10-09 22:44:43 +00:00
Stanislav Mekhanoshin c6dec1d828 [AMDGPU] Fixed dpp combine of VOP1
If the original instruction did not have source modifiers, they were
not added to the new DPP instruction either, even when needed.

Differential Revision: https://reviews.llvm.org/D68729

llvm-svn: 374241
2019-10-09 22:02:58 +00:00
Cameron McInally 47363a148f [IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator
Also update Clang to call Builder.CreateFNeg(...) for UnaryMinus.
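
A sketch of the IR-level difference (illustrative function names):

```
; Before: CreateFNeg emitted a BinaryOperator, a subtraction from -0.0.
define float @neg_old(float %x) {
  %neg = fsub float -0.000000e+00, %x
  ret float %neg
}

; After: CreateFNeg emits the dedicated unary instruction.
define float @neg_new(float %x) {
  %neg = fneg float %x
  ret float %neg
}
```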

Differential Revision: https://reviews.llvm.org/D61675

llvm-svn: 374240
2019-10-09 21:52:15 +00:00
Thomas Lively 00f9e5aa76 [WebAssembly] Make returns variadic
Summary:
This is necessary and sufficient to get simple cases of multiple
return working with multivalue enabled. More complex cases will
require block and loop signatures to be generalized to potentially be
type indices as well.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68684

llvm-svn: 374235
2019-10-09 21:42:08 +00:00
Wei Mi 09dcfe6805 [SampleFDO] Add indexing for function profiles so they can be loaded on demand
in ExtBinary format

Currently for Text, Binary and ExtBinary format profiles, when we compile a
module with samplefdo, even if there is no function showing up in the profile,
we have to load all the function profiles from the profile input. That is a
waste of compile time.

The CompactBinary profile format already supports loading function
profiles on demand. In this patch, we add that support for the ExtBinary
format. It works whether or not the sections in an ExtBinary
format profile are compressed. An experiment shows it reduces the time to
compile a server benchmark by 30%.

When profile remapping and loading function profiles on demand are both used,
extra work needs to be done so that the loading on demand process will take
the name remapping into consideration. It will be addressed in a follow-up
patch.

Differential Revision: https://reviews.llvm.org/D68601

llvm-svn: 374233
2019-10-09 21:36:03 +00:00
David Blaikie 411497c6c7 llvm-dwarfdump: Support multiple debug_loclists contributions
Also fixing the incorrect "offset" field being computed/printed for each
location list.

llvm-svn: 374232
2019-10-09 21:25:28 +00:00
Sanjay Patel 232b9dc46a [ConstProp] add tests for extractelement with undef index; NFC
llvm-svn: 374210
2019-10-09 20:14:17 +00:00
Sanjay Patel 0845ac7331 [InstCombine] add another test for gep inbounds; NFC
llvm-svn: 374190
2019-10-09 17:52:26 +00:00
Thomas Lively 3419e90dc1 [WebAssembly] Add builtin and intrinsic for v8x16.swizzle
Summary:
This clang builtin and corresponding LLVM intrinsic are necessary to
expose the exact semantics of the underlying WebAssembly instruction
to users. LLVM produces a poison value if the dynamic swizzle indices
are greater than the vector size, but the WebAssembly instruction sets
the corresponding output lane to zero. Users who depend on this
behavior can safely use this builtin.
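
A minimal sketch, assuming the intrinsic is exposed as llvm.wasm.swizzle:

```
declare <16 x i8> @llvm.wasm.swizzle(<16 x i8>, <16 x i8>)

define <16 x i8> @swizzle(<16 x i8> %data, <16 x i8> %indices) {
  ; lanes whose index is >= 16 become zero, matching the wasm instruction
  %r = call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %data, <16 x i8> %indices)
  ret <16 x i8> %r
}
```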

Depends on D68527.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68531

llvm-svn: 374189
2019-10-09 17:45:47 +00:00
Thomas Lively d5b7a4e2e8 [WebAssembly] v8x16.swizzle and rewrite BUILD_VECTOR lowering
Summary:
Adds the new v8x16.swizzle SIMD instruction as specified at
https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#swizzling-using-variable-indices.
In addition to adding swizzles as a candidate lowering in
LowerBUILD_VECTOR, also rewrites and simplifies the lowering to
minimize the number of replace_lanes necessary rather than trying to
minimize code size. This leads to more uses of v128.const instead of
splats, which is expected to increase performance.

The new code will be easier to tune once V8 implements all the vector
construction operations, and it will also be easier to add new
candidate instructions in the future if necessary.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68527

llvm-svn: 374188
2019-10-09 17:39:19 +00:00
Kevin P. Neal 44e988ab14 [FPEnv][NFC] Change test to conform to strictfp attribute rules.
In particular, the function definition is not marked strictfp despite
containing a call marked strictfp. Also, if any function call is marked
strictfp then all function calls in that function must be marked.

This change to move the one strictfp call to a new properly marked function
meets all the new rules.
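
A minimal sketch of IR that satisfies the rules described above (names hypothetical):

```
declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)

; Both the call site and the containing function carry strictfp.
define double @strict_sqrt(double %x) strictfp {
  %r = call double @llvm.experimental.constrained.sqrt.f64(double %x,
                 metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
  ret double %r
}
```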

Tested with a stricter version of D68233.

Reviewed by:	spatel
Approved by:	spatel
Differential Revision:	https://reviews.llvm.org/D68713

llvm-svn: 374186
2019-10-09 17:24:56 +00:00
Sanjay Patel df14bd315d [SLP] respect target register width for GEP vectorization (PR43578)
We failed to account for the target register width (max vector factor)
when vectorizing starting from GEPs. This causes vectorization to
proceed to obviously illegal widths as in:
https://bugs.llvm.org/show_bug.cgi?id=43578

For x86, this also means that SLP can produce rogue AVX or AVX512
code even when the user specifies a narrower vector width.

The AArch64 test in ext-trunc.ll appears to be better using the
narrower width. I'm not exactly sure what getelementptr.ll is trying
to do, but it's testing with "-slp-threshold=-18", so I'm not worried
about those diffs. The x86 test is an over-reduction from SPEC h264;
this patch appears to restore the perf loss caused by SLP when using
-march=haswell.

Differential Revision: https://reviews.llvm.org/D68667

llvm-svn: 374183
2019-10-09 16:32:49 +00:00
Momchil Velikov d037a5f065 [AArch64] Ensure no tagged memory is left in the unallocated portion of the
stack

This patch makes sure that if we tag some memory, we untag that memory before
the function returns/throws via any exit, reachable from the tag operation. For
that we place the untag operation either at:

  a) the lifetime end call for the alloca, if that call post-dominates the
     lifetime start call (where the tag operation is placed), or it (the
     lifetime end call) dominates all reachable exits, otherwise
  b) at the reachable exits

Differential Revision: https://reviews.llvm.org/D68469

llvm-svn: 374182
2019-10-09 16:31:50 +00:00
Jonas Devlieghere e7affcdbd2 Re-land "[dsymutil] Fix handling of common symbols in multiple object files."
The original patch got reverted because it hit a long-standing legacy
issue on Windows that prevents files from being named `com`. Thanks
Kristina & Jeremy for pointing this out.

llvm-svn: 374178
2019-10-09 16:19:13 +00:00
Alina Sbirlea 7faa14a98b [MemorySSA] Make the use of moveAllAfterMergeBlocks consistent.
Summary:
The rule for the moveAllAfterMergeBlocks API is for all instructions
from `From` to have been moved to `To`, while keeping the CFG edges (and
block terminators) unchanged.
Update all the callsites for moveAllAfterMergeBlocks to follow this.

Pending follow-up: since the same behavior is needed every time, merge
all callsites into one. The common denominator may be the call to
`MergeBlockIntoPredecessor`.

Resolves PR43569.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68659

llvm-svn: 374177
2019-10-09 15:54:24 +00:00
David Green fcc9c4627e Add and adjust saturating tests. NFC
This adds some extra testing to the existing [su][add/sub]_sat X86 and AArch64
tests and adds equivalent tests for ARM.

llvm-svn: 374169
2019-10-09 14:17:38 +00:00
Sjoerd Meijer d1170dbe58 [LV] Emitting SCEV checks with OptForSize
When optimising for size and SCEV runtime checks need to be emitted to check
overflow behaviour, the loop vectorizer can run into this assert:

  LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks(
  llvm::Loop *, llvm::BasicBlock *): Assertion `!BB->getParent()->hasOptSize()
  && "Cannot SCEV check stride or overflow when opt

We should not generate predicates while optimising for size because
code will be generated for predicates such as these SCEV overflow runtime
checks.

This should fix PR43371.

Differential Revision: https://reviews.llvm.org/D68082

llvm-svn: 374166
2019-10-09 13:19:41 +00:00
Simon Pilgrim d7ac255325 [CostModel][X86] Add tests for insertelement to non-immediate vector element indices
llvm-svn: 374161
2019-10-09 12:36:34 +00:00
Simon Pilgrim a21176ffb1 [CostModel][X86] Add tests for extractelement from non-immediate vector element indices
llvm-svn: 374160
2019-10-09 12:36:22 +00:00
David Green e2c72929c8 [ARM] Add saturating arithmetic tests for MVE. NFC
llvm-svn: 374159
2019-10-09 12:29:51 +00:00
James Molloy 9948fe6997 [TableGen] Fix crash when using HwModes in CodeEmitterGen
When an instruction has an encoding definition for only a subset of
the available HwModes, ensure we just avoid generating an encoding
rather than crash.

llvm-svn: 374150
2019-10-09 09:15:34 +00:00
Clement Courbet c3a7fb7599 [llvm-exegesis] Explore LEA addressing modes.
Summary:
This will help for PR32326.

This shows the well-known issue with `RBP` and `R13` as base registers.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits, RKSimon, andreadb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68646

llvm-svn: 374146
2019-10-09 08:49:13 +00:00
Jeremy Morse e9c8f6fea6 Revert r374139, "[dsymutil] Fix handling of common symbols in multiple object files."
The added test files ("com", "com1.o", "com2.o") are reserved names on
Windows, and makes 'git checkout' fail with a filesystem error.

llvm-svn: 374144
2019-10-09 08:27:48 +00:00
Jonas Devlieghere 4ac388f7ca [dsymutil] Fix handling of common symbols in multiple object files.
For common symbols the linker emits only a single symbol entry in the
debug map. This caused dsymutil to not relocate common symbols when
linking DWARF coming from object files that did not have this entry.
This patch fixes that by keeping track of common symbols in the object
files and synthesizing a debug map entry for them using the address from
the main binary.

Differential revision: https://reviews.llvm.org/D68680

llvm-svn: 374139
2019-10-09 04:16:18 +00:00
Bill Wendling 4d69ca8c67 [IA] Add tests for a few other edge cases
Test with the last eight bits within the range [7F, FF] and with
lower-case hex letters.

llvm-svn: 374124
2019-10-08 22:06:09 +00:00
Jonas Devlieghere a3f794e9b4 [dsymutil] Improve verbose output (NFC)
The verbose output for finding relocations assumed that we'd always dump
the DIE after (which starts with a newline) and therefore didn't include
one itself. However, this isn't always true, leading to garbled output.

This patch adds a newline to the verbose output and adds a line that
says that the DIE is being kept (which isn't obvious otherwise). It also
adds a 0x prefix to the relocations.

llvm-svn: 374123
2019-10-08 22:03:13 +00:00
Roman Lebedev 354ba6985c [CVP] Replace SExt with ZExt if the input is known-non-negative
Summary:
zero-extension is far more friendly for further analysis.
While this doesn't directly help with the shift-by-signext problem, this is not unrelated.
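
A hypothetical example of the fold (names illustrative): inside a branch that proves the value non-negative, the sext can be rewritten.

```
define i64 @f(i32 %x) {
entry:
  %nonneg = icmp sge i32 %x, 0
  br i1 %nonneg, label %then, label %else

then:
  ; %x is known non-negative here, so this can become: zext i32 %x to i64
  %ext = sext i32 %x to i64
  ret i64 %ext

else:
  ret i64 0
}
```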

This has the following effect on test-suite (numbers collected after the finish of middle-end module pass manager):
| Statistic                            |     old |     new | delta | percent change |
| correlated-value-propagation.NumSExt |       0 |    6026 |  6026 |   +100.00%     |
| instcount.NumAddInst                 |  272860 |  271283 | -1577 |     -0.58%     |
| instcount.NumAllocaInst              |   27227 |   27226 | -1    |      0.00%     |
| instcount.NumAndInst                 |   63502 |   63320 | -182  |     -0.29%     |
| instcount.NumAShrInst                |   13498 |   13407 | -91   |     -0.67%     |
| instcount.NumAtomicCmpXchgInst       |    1159 |    1159 |  0    |      0.00%     |
| instcount.NumAtomicRMWInst           |    5036 |    5036 |  0    |      0.00%     |
| instcount.NumBitCastInst             |  672482 |  672353 | -129  |     -0.02%     |
| instcount.NumBrInst                  |  702768 |  702195 | -573  |     -0.08%     |
| instcount.NumCallInst                |  518285 |  518205 | -80   |     -0.02%     |
| instcount.NumExtractElementInst      |   18481 |   18482 |  1    |      0.01%     |
| instcount.NumExtractValueInst        |   18290 |   18288 | -2    |     -0.01%     |
| instcount.NumFAddInst                |  139035 |  138963 | -72   |     -0.05%     |
| instcount.NumFCmpInst                |   10358 |   10348 | -10   |     -0.10%     |
| instcount.NumFDivInst                |   30310 |   30302 | -8    |     -0.03%     |
| instcount.NumFenceInst               |     387 |     387 |  0    |      0.00%     |
| instcount.NumFMulInst                |   93873 |   93806 | -67   |     -0.07%     |
| instcount.NumFPExtInst               |    7148 |    7144 | -4    |     -0.06%     |
| instcount.NumFPToSIInst              |    2823 |    2838 |  15   |      0.53%     |
| instcount.NumFPToUIInst              |    1251 |    1251 |  0    |      0.00%     |
| instcount.NumFPTruncInst             |    2195 |    2191 | -4    |     -0.18%     |
| instcount.NumFSubInst                |   92109 |   92103 | -6    |     -0.01%     |
| instcount.NumGetElementPtrInst       | 1221423 | 1219157 | -2266 |     -0.19%     |
| instcount.NumICmpInst                |  479140 |  478929 | -211  |     -0.04%     |
| instcount.NumIndirectBrInst          |       2 |       2 |  0    |      0.00%     |
| instcount.NumInsertElementInst       |   66089 |   66094 |  5    |      0.01%     |
| instcount.NumInsertValueInst         |    2032 |    2030 | -2    |     -0.10%     |
| instcount.NumIntToPtrInst            |   19641 |   19641 |  0    |      0.00%     |
| instcount.NumInvokeInst              |   21789 |   21788 | -1    |      0.00%     |
| instcount.NumLandingPadInst          |   12051 |   12051 |  0    |      0.00%     |
| instcount.NumLoadInst                |  880079 |  878673 | -1406 |     -0.16%     |
| instcount.NumLShrInst                |   25919 |   25921 |  2    |      0.01%     |
| instcount.NumMulInst                 |   42416 |   42417 |  1    |      0.00%     |
| instcount.NumOrInst                  |  100826 |  100576 | -250  |     -0.25%     |
| instcount.NumPHIInst                 |  315118 |  314092 | -1026 |     -0.33%     |
| instcount.NumPtrToIntInst            |   15933 |   15939 |  6    |      0.04%     |
| instcount.NumResumeInst              |    2156 |    2156 |  0    |      0.00%     |
| instcount.NumRetInst                 |   84485 |   84484 | -1    |      0.00%     |
| instcount.NumSDivInst                |    8599 |    8597 | -2    |     -0.02%     |
| instcount.NumSelectInst              |   45577 |   45913 |  336  |      0.74%     |
| instcount.NumSExtInst                |   84026 |   78278 | -5748 |     -6.84%     |
| instcount.NumShlInst                 |   39796 |   39726 | -70   |     -0.18%     |
| instcount.NumShuffleVectorInst       |  100272 |  100292 |  20   |      0.02%     |
| instcount.NumSIToFPInst              |   29131 |   29113 | -18   |     -0.06%     |
| instcount.NumSRemInst                |    1543 |    1543 |  0    |      0.00%     |
| instcount.NumStoreInst               |  805394 |  804351 | -1043 |     -0.13%     |
| instcount.NumSubInst                 |   61337 |   61414 |  77   |      0.13%     |
| instcount.NumSwitchInst              |    8527 |    8524 | -3    |     -0.04%     |
| instcount.NumTruncInst               |   60523 |   60484 | -39   |     -0.06%     |
| instcount.NumUDivInst                |    2381 |    2381 |  0    |      0.00%     |
| instcount.NumUIToFPInst              |    5549 |    5549 |  0    |      0.00%     |
| instcount.NumUnreachableInst         |    9855 |    9855 |  0    |      0.00%     |
| instcount.NumURemInst                |    1305 |    1305 |  0    |      0.00%     |
| instcount.NumXorInst                 |   10230 |   10081 | -149  |     -1.46%     |
| instcount.NumZExtInst                |   60353 |   66840 |  6487 |     10.75%     |
| instcount.TotalBlocks                |  829582 |  829004 | -578  |     -0.07%     |
| instcount.TotalFuncs                 |   83818 |   83817 | -1    |      0.00%     |
| instcount.TotalInsts                 | 7316574 | 7308483 | -8091 |     -0.11%     |

TLDR: we produce 0.11% fewer instructions, 6.84% fewer `sext`, and 10.75% more `zext`.
To be noted, clearly, not all new `zext`'s are produced by this fold.

(And now I guess it might have been interesting to measure this for D68103 :S)

Reviewers: nikic, spatel, reames, dberlin

Reviewed By: nikic

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68654

llvm-svn: 374112
2019-10-08 20:29:48 +00:00
Roman Lebedev 347f6a770b [CVP][NFC] Revisit sext vs. zext test
llvm-svn: 374111
2019-10-08 20:29:36 +00:00
Yonghong Song 05e46979d2 [BPF] do compile-once run-everywhere relocation for bitfields
A bpf specific clang intrinsic is introduced:
   u32 __builtin_preserve_field_info(member_access, info_kind)
Depending on info_kind, different information will
be returned to the program. A relocation is also
recorded for this builtin so that bpf loader can
patch the instruction on the target host.
This clang intrinsic is used to get certain information
to facilitate struct/union member relocations.

The offset relocation is extended by 4 bytes to
include relocation kind.
Currently supported relocation kinds are
 enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
 };
for __builtin_preserve_field_info. The old
access offset relocation is covered by
    FIELD_BYTE_OFFSET = 0.

An example:
struct s {
    int a;
    int b1:9;
    int b2:4;
};
enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
};

void bpf_probe_read(void *, unsigned, const void *);
int field_read(struct s *arg) {
  unsigned long long ull = 0;
  unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
  unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
 #ifdef USE_PROBE_READ
  bpf_probe_read(&ull, size, (const void *)arg + offset);
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  lshift = lshift + (size << 3) - 64;
 #endif
 #else
  switch(size) {
  case 1:
    ull = *(unsigned char *)((void *)arg + offset); break;
  case 2:
    ull = *(unsigned short *)((void *)arg + offset); break;
  case 4:
    ull = *(unsigned int *)((void *)arg + offset); break;
  case 8:
    ull = *(unsigned long long *)((void *)arg + offset); break;
  }
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #endif
  ull <<= lshift;
  if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
    return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
  return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
}

There is a minor overhead for bpf_probe_read() on big endian.

The code and relocation generated for field_read where bpf_probe_read() is
used to access argument data on little endian mode:
        r3 = r1
        r1 = 0
        r1 = 4  <=== relocation (FIELD_BYTE_OFFSET)
        r3 += r1
        r1 = r10
        r1 += -8
        r2 = 4  <=== relocation (FIELD_BYTE_SIZE)
        call bpf_probe_read
        r2 = 51 <=== relocation (FIELD_LSHIFT_U64)
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r2
        r2 = 60 <=== relocation (FIELD_RSHIFT_U64)
        r0 = r1
        r0 >>= r2
        r3 = 1  <=== relocation (FIELD_SIGNEDNESS)
        if r3 == 0 goto LBB0_2
        r1 s>>= r2
        r0 = r1
LBB0_2:
        exit

Compared to the above code between relocations FIELD_LSHIFT_U64 and
FIELD_RSHIFT_U64, the code in big endian mode has four more
instructions.
        r1 = 41   <=== relocation (FIELD_LSHIFT_U64)
        r6 += r1
        r6 += -64
        r6 <<= 32
        r6 >>= 32
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r6
        r2 = 60   <=== relocation (FIELD_RSHIFT_U64)

The code and relocation generated when using direct load.
        r2 = 0
        r3 = 4
        r4 = 4
        if r4 s> 3 goto LBB0_3
        if r4 == 1 goto LBB0_5
        if r4 == 2 goto LBB0_6
        goto LBB0_9
LBB0_6:                                 # %sw.bb1
        r1 += r3
        r2 = *(u16 *)(r1 + 0)
        goto LBB0_9
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
        if r4 == 8 goto LBB0_8
        goto LBB0_9
LBB0_8:                                 # %sw.bb9
        r1 += r3
        r2 = *(u64 *)(r1 + 0)
        goto LBB0_9
LBB0_5:                                 # %sw.bb
        r1 += r3
        r2 = *(u8 *)(r1 + 0)
        goto LBB0_9
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51
        r2 <<= r1
        r1 = 60
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

Considering that the verifier is able to do limited constant
propagation following branches, the following is the code actually
traversed.
        r2 = 0
        r3 = 4   <=== relocation
        r4 = 4   <=== relocation
        if r4 s> 3 goto LBB0_3
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51   <=== relocation
        r2 <<= r1
        r1 = 60   <=== relocation
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

For the native load case, the load size is calculated to be the
same as the width of the load LLVM would otherwise use to load
the value, which is then used to extract the bitfield value.

Differential Revision: https://reviews.llvm.org/D67980

llvm-svn: 374099
2019-10-08 18:23:17 +00:00
Matt Arsenault 190a17bbd1 AMDGPU: Fix i16 arithmetic pattern redundancy
There were 2 problems here. First, these patterns were duplicated to
handle the inverted shift operands instead of using the commuted
PatFrags.

Second, the zext folding patterns don't apply to the subtargets that
don't zero the high bits. They should be skipped instead of inserting
the extension. The zeroing high code would be emitted when necessary
anyway. This was also emitting unnecessary zexts in cases where the
high bits were undefined.

llvm-svn: 374092
2019-10-08 17:36:38 +00:00
Jinsong Ji 9912232b46 Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0.
This reverts commit 18b6fe07bc.

The patch is breaking the PowerPC internal build. Checked with the author;
reverting on his behalf for now due to time zones.

llvm-svn: 374091
2019-10-08 17:32:56 +00:00
Sanjay Patel 796a58107a [SLP] add test with prefer-vector-width function attribute; NFC (PR43578)
llvm-svn: 374090
2019-10-08 17:18:32 +00:00
Tom Stellard 3a8d80944b AMDGPU: Add offsets to MMO when lowering buffer intrinsics
Summary:
Without offsets on the MachineMemOperands (MMOs),
MachineInstr::mayAlias() will return true for all reads and writes to the
same resource descriptor.  This leads to O(N^2) complexity in the MachineScheduler
when analyzing dependencies of buffer loads and stores.  It also limits
the SILoadStoreOptimizer from merging more instructions.

This patch reduces the compile time of one pathological compute shader
from 12 seconds to 1 second.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65097

llvm-svn: 374087
2019-10-08 17:04:51 +00:00
Hideto Ueno fb8218f252 [Attributor][Fix] Temporary fix for windows build bot failure
D65402 causes a test failure related to attributor-max-iterations.
This commit removes attributor-max-iterations-verify for now.
I'll investigate the cause, after which the flag should be restored.

llvm-svn: 374086
2019-10-08 17:01:56 +00:00
Roman Lebedev d1fe34cc93 [NFC][CVP] Add tests where we can replace sext with zext
If the sign bit of the value that is being sign-extended is not set,
i.e. the value is non-negative (s>= 0), then zero-extension will suffice,
and is better for analysis: https://rise4fun.com/Alive/a8PD

llvm-svn: 374075
2019-10-08 16:21:13 +00:00
Amaury Sechet 7df5b2f79f (Re)generate various tests. NFC
llvm-svn: 374074
2019-10-08 16:16:26 +00:00
Heejin Ahn 6a37c5d6fc [WebAssembly] Fix a bug in 'try' placement
Summary:
When searching for local expression tree created by stackified
registers, for 'block' placement, we start the search from the previous
instruction of a BB's terminator. But in 'try''s case, we should start
from the previous instruction of a call that can throw, or a EH_LABEL
that precedes the call, because the return values of the call's previous
instructions can be stackified and consumed by the throwing call.

For example,
```
  i32.call @foo
  call @bar         ; may throw
  br $label0
```
In this case, if we start the search from the previous instruction of
the terminator (`br` here), we end up stopping at `call @bar` and place
a 'try' between `i32.call @foo` and `call @bar`, because `call @bar`
does not have a return value so it is not a local expression tree of
`br`.

But in this case, unlike when placing 'block's, we should start the
search from `call @bar`, because the return value of `i32.call @foo` is
stackified and used by `call @bar`.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68619

llvm-svn: 374073
2019-10-08 16:15:39 +00:00
Nikola Prica 98603a8153 [DebugInfo][If-Converter] Update call site info during the optimization
During the If-Converter optimization pay attention when copying or
deleting call instructions in order to keep call site information in
valid state.

Reviewers: aprantl, vsk, efriedma

Reviewed By: vsk, efriedma

Differential Revision: https://reviews.llvm.org/D66955

llvm-svn: 374068
2019-10-08 15:43:12 +00:00
Hideto Ueno 96e6ce4cd3 [Attributor][MustExec] Deduce dereferenceable and nonnull attribute using MustBeExecutedContextExplorer
Summary:
In D65186 and related patches, MustBeExecutedContextExplorer is introduced. This enables us to traverse instructions guaranteed to execute from function entry. If we can know the argument is used as `dereferenceable` or `nonnull` in these instructions, we can mark `dereferenceable` or `nonnull` in the argument definition:

1. Memory instruction (similar to D64258)
Trace memory instruction pointer operand. Currently, only inbounds GEPs are traced.
```
define i64* @f(i64* %a) {
entry:
  %add.ptr = getelementptr inbounds i64, i64* %a, i64 1
; (because of inbounds GEP we can know that %a is at least dereferenceable(16))
  store i64 1, i64* %add.ptr, align 8
  ret i64* %add.ptr ; dereferenceable 8 (because above instruction stores into it)
}
```

2. Propagation from callsite (similar to D27855)
If `deref` or `nonnull` is known from call site parameter attributes, we can also attach that attribute to the argument definition.

```
declare void @use3(i8* %x, i8* %y, i8* %z);
declare void @use3nonnull(i8* nonnull %x, i8* nonnull %y, i8* nonnull %z);

define void @parent1(i8* %a, i8* %b, i8* %c) {
  call void @use3nonnull(i8* %b, i8* %c, i8* %a)
; The above instruction is always executed, so we can deduce @parent1(i8* nonnull %a, i8* nonnull %b, i8* nonnull %c)
  call void @use3(i8* %c, i8* %a, i8* %b)
  ret void
}
```

Reviewers: jdoerfert, sstefan1, spatel, reames

Reviewed By: jdoerfert

Subscribers: xbolva00, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65402

llvm-svn: 374063
2019-10-08 15:25:56 +00:00
Mirko Brkusanin 45e0f24373 [Mips] Emit proper ABI for _mcount calls
When the -pg option is present, a call to _mcount is inserted into every
function. However, since the proper ABI was not followed, the generated
gmon.out did not give proper results. By inserting the needed instructions
before every _mcount call we can fix this.

Differential Revision: https://reviews.llvm.org/D68390

llvm-svn: 374055
2019-10-08 14:32:03 +00:00
Clement Courbet 2cd0f28959 [llvm-exegesis] Add options to SnippetGenerator.
Summary:
This adds a `-max-configs-per-opcode` option to limit the number of
configs per opcode.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68642

llvm-svn: 374054
2019-10-08 14:30:24 +00:00
Sebastian Pop d0d52edae9 fix fmls fp16
Tim Northover remarked that the added patterns for fmls fp16
produce wrong code in case the fsub instruction has a
multiplication as its first operand, i.e., all the patterns FMLSv*_OP1:

> define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
> ; CHECK-LABEL: test_FMLSv8f16_OP1:
> ; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
> entry:
>
>   %mul = fmul fast <8 x half> %c, %b
>   %sub = fsub fast <8 x half> %mul, %a
>   ret <8 x half> %sub
> }
>
> This doesn't look right to me. The exact instruction produced is "fmls
> v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the
> IR is calculating "v2*v1-v0". The equivalent <4 x float> code also
> doesn't emit an fmls.

This patch generates an fmla and negates the value of the operand2 of the fsub.

Inspecting the pattern match, I found that there was another mistake in the
opcode to be selected: matching FMULv4*16 should generate FMLSv4*16
and not FMLSv2*32.

Tested on aarch64-linux with make check-all.

Differential Revision: https://reviews.llvm.org/D67990

llvm-svn: 374044
2019-10-08 13:23:57 +00:00
Amaury Sechet aa53d6eb01 Add test for rotating truncated vectors. NFC
llvm-svn: 374043
2019-10-08 13:08:51 +00:00
Graham Hunter b302561b76 [SVE][IR] Scalable Vector size queries and IR instruction support
* Adds a TypeSize struct to represent the known minimum size of a type
  along with a flag to indicate that the runtime size is an integer multiple
  of that size
* Converts existing size query functions from Type.h and DataLayout.h to
  return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
  to uint64_t) so that most existing code 'just works' as if the return
  values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
  supported instructions used with scalable vectors can be constructed
  in IR.

Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen

Reviewed By: rovka, sdesmalen

Differential Revision: https://reviews.llvm.org/D53137

llvm-svn: 374042
2019-10-08 12:53:54 +00:00
Nicolai Haehnle df6e67697b AMDGPU: Propagate undef flag during pre-RA exec mask optimizations
Summary: Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68184

llvm-svn: 374041
2019-10-08 12:46:32 +00:00
Nicolai Haehnle 7febdb7f27 MachineSSAUpdater: insert IMPLICIT_DEF at top of basic block
Summary:
When getValueInMiddleOfBlock happens to be called for a basic block
that has no incoming value at all, an IMPLICIT_DEF is inserted in that
block via GetValueAtEndOfBlockInternal. This IMPLICIT_DEF must be at
the top of its basic block or it will likely not reach the use that
the caller intends to insert.

Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204

Reviewers: arsenm, rampitec

Subscribers: jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68183

llvm-svn: 374040
2019-10-08 12:46:20 +00:00
Sanjay Patel 5cce533525 [SLP] add test with prefer-vector-width function attribute; NFC
llvm-svn: 374039
2019-10-08 12:43:46 +00:00
Andrea Di Biagio 8d6651f7b1 [MCA][LSUnit] Track loads and stores until retirement.
Before this patch, loads and stores were only tracked by their corresponding
queues in the LSUnit from dispatch until execute stage. In practice we should be
more conservative and assume that memory opcodes leave their queues at
retirement stage.

Basically, loads should leave the load queue only when they have completed and
delivered their data. We conservatively assume that a load is completed when it
is retired. Stores should be tracked by the store queue from dispatch until
retirement. In practice, stores can only leave the store queue if their data can
be written to the data cache.

This is mostly a mechanical change. With this patch, the retire stage notifies
the LSUnit when a memory instruction is retired. That would triggers the release
of LDQ/STQ entries.  The only visible change is in memory tests for the bdver2
model. That is because bdver2 is the only model that defines the load/store
queue size.

This patch partially addresses PR39830.

Differential Revision: https://reviews.llvm.org/D68266

llvm-svn: 374034
2019-10-08 10:46:01 +00:00
Nikola Prica 02682498b8 [ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers
Support for tracking registers that forward function parameters into the
following function frame. For now we only support cases when a parameter
is forwarded through a single register.

Reviewers: aprantl, vsk, t.p.northover

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D66953

llvm-svn: 374033
2019-10-08 09:43:05 +00:00
Clement Courbet 4919534ae4 [llvm-exegesis] Finish plumbing the `Config` field.
Summary:
Right now there are no snippet generators that emit the `Config` field,
but I plan to add it to investigate LEA operands for PR32326.

What was broken was:
 - `Config` was not propagated up to the BenchmarkResult::Key.
 - Clustering should really consider different configs as measuring
 different things, so we should stabilize on (Opcode, Config) instead of
 just Opcode.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits, lebedev.ri

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68629

llvm-svn: 374031
2019-10-08 09:06:48 +00:00
George Rimar eec9896960 [llvm-readobj/llvm-readelf] - Add checks for GNU-style output to "all.test" test case.
We do not check the GNU-style output when -all is given.
This patch does that.

Differential revision: https://reviews.llvm.org/D68462

llvm-svn: 374028
2019-10-08 08:59:12 +00:00
Zi Xuan Wu 2edc69c05d [NFC] Add REQUIRES for r374017 in testcase
llvm-svn: 374027
2019-10-08 08:49:15 +00:00
Kristof Beyls 78bfe3ab94 [ARM] Generate vcmp instead of vcmpe
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not be producing an Invalid
Operation exception when either arguments in a floating point compare
are quiet NaNs.

In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.

This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
  http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
  implemented fine-tuning for when to produce vcmpe (i.e. not do it for
  equality comparisons). The complexity introduced by those patches
  isn't needed anymore if we just always produce vcmp instead. Maybe
  these patches need to be reintroduced again once support is needed to
  map potential LLVM-IR constrained floating point compare intrinsics to
  the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
  lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
  of these tests, the intent of what is tested for isn't related to
  whether the vcmp should produce an Invalid Operation exception or not.

Fixes PR43374.

Differential Revision: https://reviews.llvm.org/D68463

llvm-svn: 374025
2019-10-08 08:25:42 +00:00
Clement Courbet f1ac8151f9 [llvm-exegesis] Add stabilization test with config
In preparation for D68629.

llvm-svn: 374020
2019-10-08 07:08:48 +00:00
Bill Wendling 411f1885b6 [IA] Recognize hexadecimal escape sequences
Summary:
Implement support for hexadecimal escape sequences to match how GNU 'as'
handles them. I.e., read all hexadecimal characters and truncate to the
lower 16 bits.

Reviewers: nickdesaulniers, jcai19

Subscribers: llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68598

llvm-svn: 374018
2019-10-08 04:39:52 +00:00
Zi Xuan Wu 9f41deccc0 [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, the interleave count and vectorization factor depend on the number of target registers. Currently, it does not
estimate register pressure for different register classes separately (in particular, for scalar types,
float types should not be treated the same as int types), so it's not accurate. Specifically,
it causes too much interleaving/unrolling, resulting in too many register spills in the loop body and hurting performance.

So we need to classify the register classes at the IR level. Importantly, these are abstract register classes,
not the target register classes the backend provides in td files. They are used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, on the POWER target, the register count is special when VSX is enabled: the number of int scalar registers is 32 (GPR)
and the number of float scalar registers is 64 (VSR), but int and float vector registers both number 64 (VSR). So there should be 2 kinds of register class when VSX is enabled,
and 3 kinds of register class when VSX is NOT enabled.

On the POWER target, it makes a big (+~30%) performance improvement in one specific benchmark (503.bwaves_r) of SPEC2017 with no other obvious regressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374017
2019-10-08 03:28:33 +00:00
Heejin Ahn a58ddba113 [WebAssembly] Add REQUIRES: asserts to cfg-stackify-eh.ll
This was missing in D68552.

llvm-svn: 374015
2019-10-08 02:50:27 +00:00
Matt Arsenault c8a6df7130 AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sources
llvm-svn: 373989
2019-10-07 23:33:08 +00:00
Johannes Doerfert 766f2cc1a4 [Attributor] Use local linkage instead of internal
Local linkage is internal or private, and private is a specialization of
internal, so either is fine for all our "local linkage" queries.

llvm-svn: 373986
2019-10-07 23:21:52 +00:00
Johannes Doerfert 661db04b98 [Attributor] Use abstract call sites for call site callback
Summary:
When we iterate over uses of functions and expect them to be call sites,
we now use abstract call sites to allow callback calls.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, hfinkel, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67871

llvm-svn: 373985
2019-10-07 23:14:58 +00:00
Craig Topper be7f81ece9 [X86] Shrink zero extends of gather indices from type less than i32 to types larger than i32.
Gather instructions can use i32 or i64 elements for indices. If
the index is zero extended from a type smaller than i32 to i64, we
can shrink the extend to just extend to i32.
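
A hedged sketch of the situation described (intrinsic mangling assumed for this era of IR; names illustrative):

```
declare <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*>, i32, <4 x i1>, <4 x float>)

define <4 x float> @gather(float* %base, <4 x i16> %idx, <4 x i1> %mask) {
  ; for the gather's index operand, extending %idx only to <4 x i32>
  ; would suffice
  %ext = zext <4 x i16> %idx to <4 x i64>
  %ptrs = getelementptr float, float* %base, <4 x i64> %ext
  %r = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> %ptrs, i32 4, <4 x i1> %mask, <4 x float> undef)
  ret <4 x float> %r
}
```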

llvm-svn: 373982
2019-10-07 23:03:12 +00:00
Craig Topper 7647d3ec70 [X86] Add test cases for zero extending a gather index from less than i32 to i64.
We should be able to use a smaller zero extend.

llvm-svn: 373981
2019-10-07 23:02:03 +00:00
Reid Kleckner f9b67b810e [X86] Add new calling convention that guarantees tail call optimization
When the target option GuaranteedTailCallOpt is specified, calls with
the fastcc calling convention will be transformed into tail calls if
they are in tail position. This diff adds a new calling convention,
tailcc, currently supported only on X86, which behaves the same way as
fastcc, except that the GuaranteedTailCallOpt flag does not need to
be enabled in order to enable tail call optimization.
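
A minimal sketch of the new convention in IR (function names hypothetical):

```
define tailcc i32 @callee(i32 %x) {
  ret i32 %x
}

define tailcc i32 @caller(i32 %x) {
  ; eligible for guaranteed tail call optimization without -tailcallopt
  %r = tail call tailcc i32 @callee(i32 %x)
  ret i32 %r
}
```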

Patch by Dwight Guth <dwight.guth@runtimeverification.com>!

Reviewed By: lebedev.ri, paquette, rnk

Differential Revision: https://reviews.llvm.org/D67855

llvm-svn: 373976
2019-10-07 22:28:58 +00:00
Heejin Ahn daeead4b02 [WebAssembly] Fix unwind mismatch stat computation
Summary:
There was a bug when computing the number of unwind destination
mismatches in CFGStackify. When there are many mismatched calls that
share the same (original) destination BB, they have to be counted
separately.

This also fixes a typo and runs `fixUnwindMismatches` only when the wasm
exception handling is enabled. This is to prevent unnecessary
computations and does not change behavior.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68552

llvm-svn: 373975
2019-10-07 22:19:40 +00:00
Heejin Ahn 58af5be28d [WebAssembly] Add memory intrinsics handling to mayThrow()
Summary:
Previously, `WebAssembly::mayThrow()` assumed all inputs are global
addresses. But intrinsics such as `memcpy`, `memmove`, or `memset`
are lowered to external symbols in instruction selection and later
emitted as library calls, and these functions don't throw.

This patch adds handling for those memory intrinsics to the `mayThrow`
function. But while most libcalls don't throw, we can't guarantee all
of them don't throw, so currently we conservatively return true for all
other external symbols.

I think a better way to solve this problem is to embed 'nounwind' info
in `TargetLowering::CallLoweringInfo`, so that we can access the info
from the backend. This will also enable transferring 'nounwind'
properties of LLVM IR instructions. Currently we don't transfer that
info and we can only access properties of callee functions, if the
callees are within the module. Other targets don't need this info in the
backend because they do all the processing before isel, but it will help
us because that info will reduce code size increase in fixing unwind
destination mismatches in CFGStackify.

But for now we return false for these memory intrinsics and true for all
other libcalls conservatively.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68553

llvm-svn: 373967
2019-10-07 21:14:45 +00:00
Johannes Doerfert 1097fab1cf [Attributor] Deduce memory behavior of functions and arguments
Deduce the memory behavior, aka "read-none", "read-only", or
"write-only", for functions and arguments.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67384

llvm-svn: 373965
2019-10-07 21:07:57 +00:00
Roman Lebedev 7cdeac43e5 [InstCombine] Fold conditional sign-extend of high-bit-extract into high-bit-extract-with-signext (PR42389)
This can come up in Bit Stream abstractions.

The pattern looks big/scary, but it can't be simplified any further.
It is only this simple because a number of my preparatory folds had
happened already (shift amount reassociation / shift amount
reassociation in bit test, sign bit test detection).

Highlights:
* There are two main flavors: https://rise4fun.com/Alive/zWi
  The difference is add vs. sub, and left-shift of -1 vs. 1
* Since we only change the shift opcode,
  we can preserve the exact-ness: https://rise4fun.com/Alive/4u4
* There can be truncation after high-bit-extraction:
  https://rise4fun.com/Alive/slHc1   (the main pattern i'm after!)
  Which means that we need to ignore zext of shift amounts and of NBits.
* The sign-extending magic can be extended itself (in add pattern
  via sext, in sub pattern via zext. not the other way around!)
  https://rise4fun.com/Alive/NhG
  (or those sext/zext can be sinked into `select`!)
  Which again means we should pay attention when matching NBits.
* We can have both truncation of extraction and widening of magic:
  https://rise4fun.com/Alive/XTw
  In other words, i don't believe we need to have any checks on
  bitwidths of any of these constructs.

This is worsened in general by the fact that we may have `sext` instead
of `zext` for shift amounts, and we don't yet canonicalize to `zext`,
although we should. I have not done anything about that here.

Also, we really should have something to weed out `sub` like these,
by folding them into `add` variant.

https://bugs.llvm.org/show_bug.cgi?id=42389

llvm-svn: 373964
2019-10-07 20:53:27 +00:00
Roman Lebedev 3da71714cb [InstCombine][NFC] Tests for "conditional sign-extend of high-bit-extract" pattern (PR42389)
https://bugs.llvm.org/show_bug.cgi?id=42389

llvm-svn: 373963
2019-10-07 20:53:16 +00:00
Roman Lebedev c3b394ffba [InstCombine] dropRedundantMaskingOfLeftShiftInput(): propagate undef shift amounts
Summary:
When we do `ConstantExpr::getZExt()`, that "extends" `undef` to `0`,
which means that for patterns a/b we'd assume that we must not produce
any bits for that channel, while in reality we simply didn't care
about that channel - i.e. we don't need to mask it.

Reviewers: spatel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68239

llvm-svn: 373960
2019-10-07 20:52:52 +00:00
Matt Arsenault 538b73b797 AMDGPU/GlobalISel: Handle more G_INSERT cases
Start manually writing a table to get the subreg index. TableGen
should probably generate this, but I'm not sure what it looks like in
the arbitrary case where subregisters are allowed to not fully cover
the super-registers.

llvm-svn: 373947
2019-10-07 19:16:26 +00:00
Matt Arsenault 4bcdcad91b GlobalISel: Partially implement lower for G_INSERT
llvm-svn: 373946
2019-10-07 19:13:27 +00:00
Matt Arsenault 1237aa2996 AMDGPU/GlobalISel: Fix selection of 16-bit shifts
llvm-svn: 373945
2019-10-07 19:10:44 +00:00
Matt Arsenault 09ec6918bc AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32
llvm-svn: 373944
2019-10-07 19:10:43 +00:00
Matt Arsenault 0b2ea91d6d AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants
This hides some defects in SIFoldOperands when the immediates are
split.

llvm-svn: 373943
2019-10-07 19:07:19 +00:00
Matt Arsenault 578fa2819f AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sources
Continue making a mess of merge/unmerge legality.

llvm-svn: 373942
2019-10-07 19:05:58 +00:00
Matt Arsenault b4cbf9862c AMDGPU/GlobalISel: Select more G_INSERT cases
At minimum handle the s64 insert type, which are emitted in real cases
during legalization.

We really need TableGen to emit something like the inverse of
composeSubRegIndices to determine the subreg index to use.

llvm-svn: 373938
2019-10-07 18:43:31 +00:00
Matt Arsenault 27269054d2 GlobalISel: Add target pre-isel instructions
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.

Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.

llvm-svn: 373937
2019-10-07 18:43:29 +00:00
Wei Mi b523790ae1 [SampleFDO] Add compression support for any section in ExtBinary profile format
Previously the ExtBinary profile format only supported compression using zlib for
the profile symbol list. In this patch, we extend the compression support to any
section. Users can select some or all of the sections to compress. In an
experiment, for a 45M profile in ExtBinary format, compressing name table
reduced its size to 24M, and compressing all the sections reduced its size
to 11M.

Differential Revision: https://reviews.llvm.org/D68253

llvm-svn: 373914
2019-10-07 16:12:37 +00:00
Sanjay Patel b743f18b1f [LoopVectorize] add test that asserted after cost model change (PR43582); NFC
llvm-svn: 373913
2019-10-07 14:48:27 +00:00
Amaury Sechet a6a70415c8 Regenerate ptr-rotate.ll . NFC
llvm-svn: 373908
2019-10-07 14:10:21 +00:00
Simon Atanasyan 55ac745828 [Mips] Always save RA when disabling frame pointer elimination
This ensures that frame-based unwinding will continue to work when
calling a noreturn function; there is not much use having the caller's
frame pointer saved if you don't also have the caller's program counter.

Patch by James Clarke.

Differential Revision: https://reviews.llvm.org/D68542

llvm-svn: 373907
2019-10-07 14:01:37 +00:00
Simon Atanasyan 19ede2f53b [Mips] Fix evaluating J-format branch targets
J/JAL/JALX/JALS are absolute branches, but stay within the current
256 MB-aligned region, so we must include the high bits of the
instruction address when calculating the branch target.

Patch by James Clarke.

Differential Revision: https://reviews.llvm.org/D68548

llvm-svn: 373906
2019-10-07 14:01:22 +00:00
whitequark b63db94fa5 [LLVM-C] Add bindings to create macro debug info
Summary: The C API doesn't have the bindings to create macro debug information.

Reviewers: whitequark, CodaFi, deadalnix

Reviewed By: whitequark

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58334

llvm-svn: 373903
2019-10-07 13:57:13 +00:00
Kevin P. Neal 1c3d19c82d [FPEnv] Add constrained intrinsics for lrint and lround
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.

Reviewed by:	andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by:	craig.topper
Differential Revision:	https://reviews.llvm.org/D64746

llvm-svn: 373900
2019-10-07 13:20:00 +00:00
Nico Weber 0fedc26a0d Revert r373888 "[IA] Recognize hexadecimal escape sequences"
It broke MC/AsmParser/directive_ascii.s on all bots:

    Assertion failed: (Index < Length && "Invalid index!"), function operator[],
        file ../../llvm/include/llvm/ADT/StringRef.h, line 243.

llvm-svn: 373898
2019-10-07 11:46:26 +00:00
Jay Foad 301decd93d [AMDGPU] Fix test checks
The GFX10-DENORM-STRICT checks were only passing by accident. Fix them
to make the test more robust in the face of scheduling or register
allocation changes.

llvm-svn: 373893
2019-10-07 10:57:41 +00:00
George Rimar 5ce8c39149 [llvm-readelf/llvm-objdump] - Improve/refactor the implementation of SHT_LLVM_ADDRSIG section dumping.
This patch:

* Adds a llvm-readobj/llvm-readelf test file for SHT_LLVM_ADDRSIG sections. (we do not have any)
* Enables dumping of SHT_LLVM_ADDRSIG with --all.
* Changes the logic to report a warning instead of an error when something goes wrong during dumping
  (this allows dumping of SHT_LLVM_ADDRSIG and other sections to continue on error).
* Refactors a piece of logic to a new toULEB128Array helper which might be used for GNU-style
  dumping implementation.

Differential revision: https://reviews.llvm.org/D68383

llvm-svn: 373890
2019-10-07 10:29:38 +00:00
Bill Wendling 6942327a8f [IA] Recognize hexadecimal escape sequences
Summary:
Implement support for hexadecimal escape sequences to match how GNU 'as'
handles them. I.e., read all hexadecimal characters and truncate to the
lower 16 bits.

Reviewers: nickdesaulniers

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68483

llvm-svn: 373888
2019-10-07 09:54:53 +00:00
Martin Storsjo dfc1aee25b Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine"
This reverts SVN r373833, as it caused a failed assert "Non-zero loop
cost expected" on building numerous projects, see PR43582 for details
and reproduction samples.

llvm-svn: 373882
2019-10-07 08:21:37 +00:00
Craig Topper 6785108356 [X86] Autogenerate checks in leaFixup32.mir and leaFixup64.mir. NFC
llvm-svn: 373878
2019-10-07 06:50:56 +00:00
Craig Topper 2c4f078877 [X86] Support LEA64_32r in processInstrForSlow3OpLEA and use INC/DEC when possible.
Move the erasing and iterator updating inside to match the
other slow LEA function.

I've adapted code from optTwoAddrLEA and basically rebuilt the
implementation here. We do lose the kill flags now just like
optTwoAddrLEA. This runs late enough in the pipeline that it
shouldn't really be a problem.

llvm-svn: 373877
2019-10-07 06:27:55 +00:00
Yi-Hong Lyu 6088f84398 [NFC][CGP] Tests for making ICMP_EQ use CR result of ICMP_S(L|G)T dominators
llvm-svn: 373876
2019-10-07 05:29:11 +00:00
Simon Pilgrim b4ba3cbda0 [X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217)
If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.
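
In IR terms the motivating shape is roughly the following (a hypothetical
sketch, not the exact test case):

  ; %s is used both as a scalar and as a splat; the load can be performed
  ; as a broadcast and the scalar extracted from element 0 for free
  define double @f(double* %p, <4 x double> %v) {
    %s = load double, double* %p
    %ins = insertelement <4 x double> undef, double %s, i32 0
    %splat = shufflevector <4 x double> %ins, <4 x double> undef, <4 x i32> zeroinitializer
    %mul = fmul <4 x double> %splat, %v
    %e = extractelement <4 x double> %mul, i32 1
    %r = fadd double %s, %e
    ret double %r
  }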

This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.

Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.

Fixes PR43217

Differential Revision: https://reviews.llvm.org/D68544

llvm-svn: 373871
2019-10-06 21:11:45 +00:00
Craig Topper 570ae49d03 [X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8 truncate when v8i64 isn't legal
Summary:
The default legalization for v16i64->v16i8 tries to create a multiple stage truncate concatenating after each stage and truncating again. But avx512 implements truncates with multiple uops. So it should be better to truncate all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2 uop truncates. The unpcks are all single uop instructions.

I tried to handle this by just custom splitting the v16i64->v16i8 shuffle. And hoped that the DAG combiner would leave the two halves in the state needed to make D68374 do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8 when v8i64 needs to be split to produce the VTRUNCs directly.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68428

llvm-svn: 373864
2019-10-06 18:43:08 +00:00
Craig Topper 842dde6be4 [LegalizeTypes][X86] When splitting a vselect for type legalization, don't split a setcc condition if the setcc input is legal and vXi1 conditions are supported
Summary: The VSELECT splitting code tries to split a setcc input as well. But on avx512 where mask registers are well supported it should be better to just split the mask and use a single compare.

Reviewers: RKSimon, spatel, efriedma

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68359

llvm-svn: 373863
2019-10-06 18:43:03 +00:00
Sanjay Patel f643fabb52 Revert [DAGCombine] Match more patterns for half word bswap
This reverts r373850 (git commit 25ba49824d)

This patch appears to cause multiple codegen regression test failures - http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10680

llvm-svn: 373853
2019-10-06 15:27:34 +00:00
Sanjay Patel aab8b3ab9c [InstCombine] fold fneg disguised as select+fmul (PR43497)
Extends rL373230 and solves the motivating bug (although in a narrow way):
https://bugs.llvm.org/show_bug.cgi?id=43497

llvm-svn: 373851
2019-10-06 14:15:48 +00:00
Amaury Sechet 25ba49824d [DAGCombine] Match more patterns for half word bswap
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68250

llvm-svn: 373850
2019-10-06 14:14:55 +00:00
Sanjay Patel 61c22a83de [InstCombine] add fast-math-flags for better test coverage; NFC
llvm-svn: 373848
2019-10-06 13:19:05 +00:00
Sanjay Patel c38881a6b7 [InstCombine] don't assume 'inbounds' for bitcast pointer to GEP transform (PR43501)
https://bugs.llvm.org/show_bug.cgi?id=43501
We can't declare a GEP 'inbounds' in general. But we may salvage that information if
we have known dereferenceable bytes on the source pointer.

Differential Revision: https://reviews.llvm.org/D68244

llvm-svn: 373847
2019-10-06 13:08:08 +00:00
Simon Pilgrim 032dd9b086 [X86][SSE] matchVectorShuffleAsBlend - use Zeroable element mask directly.
We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.

This allows us to remove createTargetShuffleMask.

This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.

llvm-svn: 373846
2019-10-06 12:38:38 +00:00
David Zarzycki 7653ff398d [X86] Enable AVX512BW for memcmp()
llvm-svn: 373845
2019-10-06 10:25:52 +00:00
Matt Arsenault c0ec72d4f8 AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics
llvm-svn: 373840
2019-10-06 01:37:38 +00:00
Matt Arsenault bcd6b1d209 AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS
llvm-svn: 373839
2019-10-06 01:37:37 +00:00
Matt Arsenault a5b9c75674 GlobalISel: Partially implement lower for G_EXTRACT
Turn into shift and truncate. Doesn't yet handle pointers.

llvm-svn: 373838
2019-10-06 01:37:35 +00:00
Matt Arsenault 69c65a8609 AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics
This wasn't updated for the immarg handling change.

llvm-svn: 373837
2019-10-06 01:37:34 +00:00
Craig Topper 2decdf42b9 [FastISel] Copy the inline assembly dialect to the INLINEASM instruction.
Fixes PR43575.

llvm-svn: 373836
2019-10-05 23:21:17 +00:00
Simon Pilgrim 8815be04ec [X86][AVX] Push sign extensions of comparison bool results through bitops (PR42025)
As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop.

This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so it's just the original SETCC ops that get extended.
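
Since sign-extension from i1 distributes over bitwise logic, the rewrite
is an identity; a hypothetical sketch in IR:

  %a = icmp sgt <4 x i32> %x, %y
  %b = icmp sgt <4 x i32> %z, %w
  %ab = and <4 x i1> %a, %b
  %r = sext <4 x i1> %ab to <4 x i32>
  ; is equivalent to extending each setcc result first:
  %a.ext = sext <4 x i1> %a to <4 x i32>
  %b.ext = sext <4 x i1> %b to <4 x i32>
  %r2 = and <4 x i32> %a.ext, %b.ext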

Differential Revision: https://reviews.llvm.org/D68226

llvm-svn: 373834
2019-10-05 20:49:34 +00:00
Sanjay Patel e2321bb448 [SLP] avoid reduction transform on patterns that the backend can load-combine
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but does not do the optimization
itself (it's not a vector combine anyway, so it's probably out of scope for SLP).

Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 373833
2019-10-05 18:03:58 +00:00
David Bolvansky 41c934acaf [SelectionDAG] Add tests for LKK algorithm
Added some tests testing urem and srem operations with a constant divisor.

Patch by TG908 (Tim Gymnich)

Differential Revision: https://reviews.llvm.org/D68421

llvm-svn: 373830
2019-10-05 14:29:25 +00:00
Philip Reames d5a4dad206 Fix a *nasty* miscompile in experimental unordered atomic lowering
This is an omission in rL371441.  Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't be ordered w/respect to side effects which followed before the end of the block.

Included test case is how I spotted this.  We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with.  I'm sure it showed up a bunch of other ways as well.

Spotted via manual inspection of assembly differences in a corpus w/ and w/o the new experimental mode.  Finding this with testing would have been "unpleasant".  

llvm-svn: 373814
2019-10-05 00:32:10 +00:00
Philip Reames 9fe5d730c7 [Test] Add a test case for a missed opportunity in implicit null checking
llvm-svn: 373813
2019-10-04 23:46:26 +00:00
Aditya Kumar 6a2673605e Invalidate assumption cache before outlining.
Subscribers: llvm-commits

Tags: #llvm

Reviewers: compnerd, vsk, sebpop, fhahn, tejohnson

Reviewed by: vsk

Differential Revision: https://reviews.llvm.org/D68478

llvm-svn: 373807
2019-10-04 22:46:42 +00:00
Reid Kleckner 67cfa79c01 Revert [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks
This reverts r371177 (git commit f879c68755)

It caused PR43566 by removing empty, address-taken MachineBasicBlocks.
Such blocks may have references from blockaddress or other operands, and
need more consideration to be removed.

See the PR for a test case to use when relanding.

llvm-svn: 373805
2019-10-04 22:24:21 +00:00
Roman Lebedev fb5af8b9b9 [InstCombine] Fold 'icmp eq/ne (?trunc (lshr/ashr %x, bitwidth(x)-1)), 0' -> 'icmp sge/slt %x, 0'
We do indeed already get it right in some cases, but only transitively,
with one-use restrictions. Since we only need to produce a single
comparison, it makes sense to match the pattern directly:
  https://rise4fun.com/Alive/kPg
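
For example (a hypothetical instance with fixed widths):

  %s = lshr i32 %x, 31
  %t = trunc i32 %s to i8
  %r = icmp eq i8 %t, 0
  =>
  %r = icmp sge i32 %x, 0   ; i.e. %x is non-negative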

llvm-svn: 373802
2019-10-04 22:16:22 +00:00
Roman Lebedev f304d4d185 [InstCombine] Right-shift shift amount reassociation with truncation (PR43564, PR42391)
Initially (D65380) i believed that if we have rightshift-trunc-rightshift,
we can't do any folding. But as it usually happens, i was wrong.

https://rise4fun.com/Alive/GEw
https://rise4fun.com/Alive/gN2O

In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have
this very sequence of two right shifts separated by a trunc.
And it "just" so happens that we apparently can fold the pattern
if the total shift amount is either 0, or it's equal to the bitwidth
of the innermost widest shift - i.e. if we are left with only the
original sign bit. Which is exactly what is wanted there.
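
A sketch of the PR43564 shape (the widths here are illustrative):

  %s = ashr i64 %x, 32
  %t = trunc i64 %s to i32
  %r = ashr i32 %t, 31
  ; the total shift of 63 leaves only the original sign bit:
  =>
  %s2 = ashr i64 %x, 63
  %r = trunc i64 %s2 to i32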

llvm-svn: 373801
2019-10-04 22:16:11 +00:00
Roman Lebedev ae738641d5 [NFC][InstCombine] Autogenerate shift.ll test
llvm-svn: 373800
2019-10-04 22:15:57 +00:00
Roman Lebedev 007452532b [NFC][InstCombine] Autogenerate icmp-shr-lt-gt.ll test
llvm-svn: 373799
2019-10-04 22:15:49 +00:00
Roman Lebedev 3c56cc920f [NFC][InstCombine] Tests for bit test via highest sign-bit extract (w/ trunc) (PR43564)
https://rise4fun.com/Alive/x5IS

llvm-svn: 373798
2019-10-04 22:15:41 +00:00
Roman Lebedev 6a954748c8 [NFC][InstCombine] Tests for right-shift shift amount reassociation (w/ trunc) (PR43564, PR42391)
https://rise4fun.com/Alive/GEw

llvm-svn: 373797
2019-10-04 22:15:32 +00:00
Jessica Paquette 784892c964 [MachineOutliner] Disable outlining from noreturn functions
Outlining from noreturn functions doesn't do the correct thing right now. The
outliner should respect that the caller is marked noreturn. In the event that
we have a noreturn function, and the outlined code is in tail position, the
outliner will not see that the outlined function should be tail called. As a
result, you end up with a regular call containing a return.

Fixing this requires that we check that all candidates live inside noreturn
functions. So, for the sake of correctness, don't outline from noreturn
functions right now.

Add machine-outliner-noreturn.mir to test this.

llvm-svn: 373791
2019-10-04 21:24:12 +00:00
Sanjay Patel 6e312388b6 [InstCombine] add tests for fneg disguised as fmul; NFC
llvm-svn: 373788
2019-10-04 20:54:14 +00:00
Eli Friedman 23ae13d51f [ScheduleDAG] When a node is cloned, add an edge between the nodes.
InstrEmitter's virtual register handling assumes that clones are emitted
after the cloned node.  Make sure this assumption actually holds.

Fixes a "Node emitted out of order - early" assertion on the testcase.

This is probably a very rare case to actually hit in practice; even
without the explicit edge, the scheduler will usually end up scheduling
the nodes in the expected order due to other constraints.

Differential Revision: https://reviews.llvm.org/D68068

llvm-svn: 373782
2019-10-04 19:51:40 +00:00
Martin Storsjo 174604c93c [test] Remove another two unnecessary uses of REQUIRES: target-windows. NFC.
Differential Revision: https://reviews.llvm.org/D68449

llvm-svn: 373780
2019-10-04 19:47:48 +00:00
Craig Topper 87aa59a0c7 [X86] Remove isel patterns for mask vpcmpgt/vpcmpeq. Switch vpcmp to these based on the immediate in MCInstLower
The immediate form of VPCMP can represent these completely. The
vpcmpgt/eq are just shorter encodings.

This patch removes the isel patterns and just swaps the opcodes
and removes the immediate in MCInstLower. This matches where we do
some other encodings tricks.

Removes over 10K bytes from the isel table.

Differential Revision: https://reviews.llvm.org/D68446

llvm-svn: 373766
2019-10-04 18:02:46 +00:00
Craig Topper 074fa390d2 [X86] Add DAG combine to form saturating VTRUNCUS/VTRUNCS from VTRUNC
We already do this for ISD::TRUNCATE, but we can do the same for X86ISD::VTRUNC

Differential Revision: https://reviews.llvm.org/D68432

llvm-svn: 373765
2019-10-04 17:53:18 +00:00
Kevin P. Neal 68b8052121 [FPEnv] Strict FP tests should use the requisite function attributes.
A set of function attributes is required in any function that uses constrained
floating point intrinsics. None of our tests use these attributes.

This patch fixes this.

These tests have been tested against the IR verifier changes in D68233.

Reviewed by:	andrew.w.kaylor, cameron.mcinally, uweigand
Approved by:	andrew.w.kaylor
Differential Revision:	https://reviews.llvm.org/D67925

llvm-svn: 373761
2019-10-04 17:03:46 +00:00
Dmitry Preobrazhensky 434d59250e [AMDGPU][MC][GFX10][WS32] Corrected decoding of dst operand for v_cmp_*_sdwa opcodes
See bug 43484: https://bugs.llvm.org/show_bug.cgi?id=43484

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68349

llvm-svn: 373745
2019-10-04 13:04:17 +00:00
Dmitry Preobrazhensky 9bd763679f [AMDGPU][MC][GFX10] Enabled decoding of 'null' operand
See bug 43485: https://bugs.llvm.org/show_bug.cgi?id=43485

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68348

llvm-svn: 373740
2019-10-04 12:38:36 +00:00
Tim Northover a7d90af1be ARM-Darwin: keep the frame register reserved even if not updated.
Darwin platforms need the frame register to always point at a valid record even
if it's not updated in a leaf function. Backtraces are more important than one
extra GPR.

llvm-svn: 373738
2019-10-04 12:29:32 +00:00
Owen Reynolds e64369e76e [llvm-ar][test] Clarified comment
The test is dependent on the installation of the en_US.UTF-8
locale. The reasoning for this is clarified in the amended comment.

llvm-svn: 373737
2019-10-04 12:26:25 +00:00
Dmitry Preobrazhensky 94d040706d [AMDGPU][MC][GFX10] Corrected definition of FLAT GLOBAL/SCRATCH instructions
See bug 43483: https://bugs.llvm.org/show_bug.cgi?id=43483

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68347

llvm-svn: 373736
2019-10-04 12:10:22 +00:00
Simon Atanasyan 8c1dd31a08 [llvm-readobj][mips] Implement GNU-style printing of .MIPS.abiflags section
In this patch `llvm-readobj` prints ASE flags on a single line
separated by commas. GNU `readelf` prints each ASE flag on
a separate line. This will be fixed later.

llvm-svn: 373732
2019-10-04 11:59:16 +00:00
Owen Reynolds 4682b9c46b Revert [test] Remove locale dependency for mri-utf8.test
This reverts r373700 (git commit b455ebf921)

llvm-svn: 373728
2019-10-04 11:12:37 +00:00
Jeremy Morse 61800a75b7 [DebugInfo] LiveDebugValues: move DBG_VALUE creation into VarLoc class
Rather than having a mixture of location-state shared between DBG_VALUEs
and VarLoc objects in LiveDebugValues, this patch makes VarLoc the
master record of variable locations. The refactoring means that the
transfer of locations from one place to another is always a performed by
an operation on an existing VarLoc, that produces another transferred
VarLoc. DBG_VALUEs are only created at the end of LiveDebugValues, once
all locations are known. As a plus, there is now only one method where
DBG_VALUEs can be created.

The test case added covers a circumstance that is now impossible to
express in LiveDebugValues: if an already-indirect DBG_VALUE is spilt,
previously it would have been restored-from-spill as a direct DBG_VALUE.
We now don't lose this information along the way, as VarLocs always
refer back to the "original" non-transfer DBG_VALUE, and we can always
work out whether a location was "originally" indirect.

Differential Revision: https://reviews.llvm.org/D67398

llvm-svn: 373727
2019-10-04 10:53:47 +00:00
Jeremy Morse 0ca48de26c [DebugInfo] LiveDebugValues: defer DBG_VALUE creation during analysis
When transferring variable locations from one place to another,
LiveDebugValues immediately creates a DBG_VALUE representing that
transfer. This causes trouble if the variable location should
subsequently be invalidated by a loop back-edge, such as in the added
test case: the transfer DBG_VALUE from a now-invalid location is used
as proof that the variable location is correct. This is effectively a
self-fulfilling prophecy.

To avoid this, defer the insertion of transfer DBG_VALUEs until after
analysis has completed. Some of those transfers are still sketchy, but
we don't propagate them into other blocks now.

Differential Revision: https://reviews.llvm.org/D67393

llvm-svn: 373720
2019-10-04 09:38:05 +00:00
Matt Arsenault d7cad4fb41 AMDGPU/GlobalISel: Fix using wrong addrspace for aperture
This was always passing the destination flat address space, when it
should be picking between the two valid source options.

llvm-svn: 373716
2019-10-04 08:35:38 +00:00
Matt Arsenault 412e0bf8f3 AMDGPU/GlobalISel: Select G_PTRTOINT
llvm-svn: 373715
2019-10-04 08:35:37 +00:00
Matt Arsenault be9521acaa AMDGPU/GlobalISel: Support wave32 waterfall loops
llvm-svn: 373714
2019-10-04 08:35:35 +00:00
David Zarzycki 03b216d854 [X86] Enable inline memcmp() to use AVX512
llvm-svn: 373706
2019-10-04 07:42:34 +00:00
Martin Storsjo b8f790234f Revert "[Symbolize] Use the local MSVC C++ demangler instead of relying on dbghelp. NFC."
This reverts SVN r373698, as it broke sanitizer tests, e.g. in
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/52441.

llvm-svn: 373701
2019-10-04 07:22:37 +00:00
Thomas Preud'homme b455ebf921 [test] Remove locale dependency for mri-utf8.test
Summary:
llvm-ar's mri-utf8.test test relies on the en_US.UTF-8 locale to be
installed for its last RUN line to work. If not installed, the unicode
string gets encoded (interpreted) as ascii, which fails since the most
significant byte is non-zero. This commit changes the call to open to
use a binary literal of the UTF-8 encoding for the pound sign instead,
thus bypassing the encoding step.

Note that the echo to create the <pound sign>.txt file will work
regardless of the locale because both the shell and the echo (in case
it's not a builtin of the shell concerned) only care about ascii
characters to operate. Indeed, the mri-utf8.test file (and in particular
the pound sign) is encoded in UTF-8 and UTF-8 guarantees only ascii
characters can create bytes that can be interpreted as ascii characters
(i.e. bytes with the most significant bit null).

So the process to break down the filename in the line goes something
like this:
- find an ascii chevron '>'
- find beginning of the filename by removing ascii space-like characters
- find the ascii newline character indicating the end of the redirection
  (no semicolon ';', closing curly bracket '}', parenthesis ')' or the
  like)
- create a file whose name is made of all the bytes in between the
  beginning and end of the filename *without interpreting them*

Reviewers: gbreynoo, MaskRay, rupprecht, JamesNagurne, jfb

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68418

llvm-svn: 373700
2019-10-04 07:13:46 +00:00
Martin Storsjo 1ca074b86a [Symbolize] Use the local MSVC C++ demangler instead of relying on dbghelp. NFC.
This allows making a couple llvm-symbolizer tests run in all
environments.

Differential Revision: https://reviews.llvm.org/D68133

llvm-svn: 373698
2019-10-04 07:05:42 +00:00
Martin Storsjo 30cb220115 [test] Remove a needless declaration of REQUIRES: target-windows
This test only relies on running on a case insensitive file system,
the exact target triple of the toolchain shouldn't matter.

Differential Revision: https://reviews.llvm.org/D68136

llvm-svn: 373697
2019-10-04 07:05:29 +00:00
Lang Hames 4e920e58e6 [JITLink] Switch from an atom-based model to a "blocks and symbols" model.
In the Atom model the symbols, content and relocations of a relocatable object
file are represented as a graph of atoms, where each Atom represents a
contiguous block of content with a single name (or no name at all if the
content is anonymous), and where edges between Atoms represent relocations.
If more than one symbol is associated with a contiguous block of content then
the content is broken into multiple atoms and layout constraints (represented by
edges) are introduced to ensure that the content remains effectively contiguous.
These layout constraints must be kept in mind when examining the content
associated with a symbol (it may be spread over multiple atoms) or when applying
certain relocation types (e.g. MachO subtractors).

This patch replaces the Atom model in JITLink with a blocks-and-symbols model.
The blocks-and-symbols model represents relocatable object files as bipartite
graphs, with one set of nodes representing contiguous content (Blocks) and
another representing named or anonymous locations (Symbols) within a Block.
Relocations are represented as edges from Blocks to Symbols. This scheme
removes layout constraints (simplifying handling of MachO alt-entry symbols,
and hopefully ELF sections at some point in the future) and simplifies some
relocation logic.

llvm-svn: 373689
2019-10-04 03:55:26 +00:00
Shiva Chen ff55e2e047 [RISCV] Split SP adjustment to reduce the offset of callee saved register spill and restore
We would like to split the SP adjustment to reduce the number of instructions
in the prologue and epilogue, as in the following case. In this way, the offset
of the callee saved register fits in a single store.

    add     sp,sp,-2032
    sw      ra,2028(sp)
    sw      s0,2024(sp)
    sw      s1,2020(sp)
    sw      s3,2012(sp)
    sw      s4,2008(sp)
    add     sp,sp,-64

Differential Revision: https://reviews.llvm.org/D68011

llvm-svn: 373688
2019-10-04 02:00:57 +00:00
Peter Collingbourne 71662116fd LowerTypeTests: Rename local functions to avoid collisions with identically named functions in ThinLTO modules.
Without this we can encounter link errors or incorrect behaviour
at runtime as a result of the wrong function being referenced.

Differential Revision: https://reviews.llvm.org/D67945

llvm-svn: 373678
2019-10-03 23:42:44 +00:00
Jordan Rupprecht d84e942703 [llvm-objdump][test] Move test to X86 dir to avoid errors disassembling on non-x86
llvm-svn: 373676
2019-10-03 23:08:22 +00:00
Alina Sbirlea 145cdad119 [MemorySSA] Don't hoist stores if interfering uses (as calls) exist.
llvm-svn: 373674
2019-10-03 22:20:04 +00:00
Jordan Rupprecht 9d4a6b1bb2 [llvm-objdump] Further rearrange llvm-objdump sections for compatibility
Summary:
rL371826 rearranged some output from llvm-objdump for GNU objdump compatibility, but there still seems to be more to do.

I think this rearrangement is a little closer. Overview of the ordering which matches GNU objdump:
* Archive headers
* File headers
* Section headers
* Symbol table
* Dwarf debugging
* Relocations (if `--disassemble` is not used)
* Section contents
* Disassembly

Reviewers: jhenderson, justice_adams, grimar, ychen, espindola

Reviewed By: jhenderson

Subscribers: aprantl, emaste, arichardson, jrtc27, atanasyan, seiya, llvm-commits, MaskRay

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68066

llvm-svn: 373671
2019-10-03 22:01:08 +00:00
Sanjay Patel 288079aafd [DAGCombiner] add operation legality checks before creating shift ops (PR43542)
As discussed on llvm-dev and:
https://bugs.llvm.org/show_bug.cgi?id=43542
...we have transforms that assume shift operations are legal and transforms to
use them are profitable, but that may not hold for simple targets.

In this case, the MSP430 target custom lowers shifts by repeating (many)
simpler/fixed ops. That can be avoided by keeping this code as setcc/select.
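
As a rough illustration (hypothetical example), a combine of the form

  %c = icmp slt i16 %x, 0
  %r = select i1 %c, i16 -1, i16 0
  =>
  %r = ashr i16 %x, 15

is only profitable when the target can perform the shift cheaply.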

Differential Revision: https://reviews.llvm.org/D68397

llvm-svn: 373666
2019-10-03 21:34:04 +00:00
Philip Reames 82cb5bc302 [Tests] Add a unordered atomic load combine test
llvm-svn: 373659
2019-10-03 20:28:59 +00:00
Philip Reames 65d63ac05a [Test] Fix inconsistency in alignment in test case
The IR was using a fixed 8 byte alignment, but the MIR portion was using native alignment.  Since the test doesn't appear to be deliberately testing overalignment, just make the IR match the MIR.

llvm-svn: 373658
2019-10-03 20:24:18 +00:00
Jinsong Ji 230cf9a360 [AArch64][SVE] Move the testcase into CodeGen dir
https://reviews.llvm.org/rL373600 added an AArch64 testcase in the top dir
which should be moved to the CodeGen dir.

llvm-svn: 373657
2019-10-03 20:21:23 +00:00
Nick Desaulniers ede784ff5a [AArch64InstPrinter] prefer bfi to bfc for < armv8.2-a
Summary:
Fixes pr/42576.

Link: https://github.com/ClangBuiltLinux/linux/issues/697

Reviewers: t.p.northover

Reviewed By: t.p.northover

Subscribers: kristof.beyls, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68356

llvm-svn: 373655
2019-10-03 20:10:02 +00:00
Jinsong Ji 4a6881eabc [PowerPC] Adjust the naming and operand order of fnmsub patterns
Summary:
This is follow up patch of https://reviews.llvm.org/D67595.
Adjust naming and the Commutable operands for additional patterns
to make it easier to read.

The testcase update also show that we can save some unecessary fmr as
well.

Reviewers: #powerpc, steven.zhang, hfinkel, nemanjai

Reviewed By: #powerpc, nemanjai

Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68112

llvm-svn: 373652
2019-10-03 19:36:42 +00:00
Craig Topper 185ee6ec7c [X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes.
This patch recognizes the shuffle pattern we get from a
v8i64->v8i8 truncate when v8i64 isn't a legal type.

With VLX we can use two VTRUNCs, unpckldq, and a insert_subvector.

Differential Revision: https://reviews.llvm.org/D68374

llvm-svn: 373645
2019-10-03 18:34:42 +00:00
Matt Arsenault ed77b27441 AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT
llvm-svn: 373639
2019-10-03 17:59:03 +00:00
Matt Arsenault 233ff982c7 AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect
Register indexing 64-bit elements is possible on the SALU, but not the
VALU. Handle splitting this into two 32-bit indexes. Extend waterfall
loop handling to allow moving a range of instructions.

llvm-svn: 373638
2019-10-03 17:55:27 +00:00
Matt Arsenault 56271fe180 AMDGPU/GlobalISel: Allow VGPR to index SGPR register
We can still do a waterfall loop over the index if using a VGPR to
index an SGPR. The result will still be a VGPR, but we can avoid the
wide copy of the source register to a VGPR.

llvm-svn: 373637
2019-10-03 17:50:32 +00:00
Matt Arsenault 9256183994 AMDGPU/GlobalISel: Add some more tests for G_INSERT legalization
llvm-svn: 373636
2019-10-03 17:50:31 +00:00
Matt Arsenault 3d23e58dbe AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and
This would try to do FewerElements to v9s8

llvm-svn: 373635
2019-10-03 17:50:29 +00:00
James Molloy 9972c992eb [ModuloSchedule] removeBranch() *before* creating the trip count condition
The Hexagon code assumes there's no existing terminator when inserting its
trip count condition check.

This causes swp-stages5.ll to break. The generated code looks good to me,
it is likely a permutation. I have disabled the new codegen path to keep
everything green and will investigate along with the other 3-4 tests
that have different codegen.

Fixes expensive-checks build.

llvm-svn: 373629
2019-10-03 17:10:32 +00:00
Jonas Devlieghere eddc1a4e95 [dsymutil] Tablegenify option parsing
This patch reimplements command line option parsing in dsymutil with
Tablegen and libOption. The main motivation for this change is to
prevent clashes with other cl::opt options defined in llvm. Although
it's a bit more heavyweight, it has some nice advantages such as no
global static initializers and better separation between the code and
the option definitions.

I also used this opportunity to improve how dsymutil deals with
incompatible options. Instead of having checks spread across the code,
everything is now grouped together in verifyOptions. The fact that the
options are no longer global means that we need to pass them around a
bit more, but I think it's worth the trade-off.

Differential revision: https://reviews.llvm.org/D68361

llvm-svn: 373622
2019-10-03 16:34:41 +00:00
Yonghong Song 02ac75092d [BPF] Handle offset reloc endpoint ending in the middle of chain properly
While studying support for bitfields, I found an issue with
an example like the one in the test offset-reloc-middle-chain.ll.
  struct t1 { int c; };
  struct s1 { struct t1 b; };
  struct r1 { struct s1 a; };
  #define _(x) __builtin_preserve_access_index(x)
  void test1(void *p1, void *p2, void *p3);
  void test(struct r1 *arg) {
    struct s1 *ps = _(&arg->a);
    struct t1 *pt = _(&arg->a.b);
    int *pi = _(&arg->a.b.c);
    test1(ps, pt, pi);
  }

The IR looks like:
  %0 = llvm.preserve.struct.access(base, ...)
  %1 = llvm.preserve.struct.access(%0, ...)
  %2 = llvm.preserve.struct.access(%1, ...)
  using %0, %1 and %2

In this case, we need to generate three relocations
corresponding to the chains: (%0), (%0, %1) and (%0, %1, %2).
After collecting all the chains, the current implementation
processes each chain (in a map) with code generation sequentially.
For example, after (%0) is processed, the code may look like:
  %0 = base + special_global_variable
  // llvm.preserve.struct.access(base, ...) is delisted
  // from the instruction stream.
  %1 = llvm.preserve.struct.access(%0, ...)
  %2 = llvm.preserve.struct.access(%1, ...)
  using %0, %1 and %2

When processing the chain (%0, %1), the current implementation
tries to visit the intrinsic llvm.preserve.struct.access(base, ...)
to get some of its properties, and this caused a segfault.

This patch fixed the issue by remembering all necessary
information (kind, metadata, access_index, base) during
analysis phase, so in code generation phase there is
no need to examine the intrinsic call instructions.
This also simplifies the code.

Differential Revision: https://reviews.llvm.org/D68389

llvm-svn: 373621
2019-10-03 16:30:29 +00:00
Edward Jones f5177a7db4 [RISCV] Add obsolete aliases of fscsr, frcsr (fssr, frsr)
These old aliases were renamed, but are still used by some projects (eg newlib).

Differential Revision: https://reviews.llvm.org/D68392

llvm-svn: 373618
2019-10-03 15:47:28 +00:00
George Rimar c18585e32e [yaml2obj] - Add a Size tag support for SHT_LLVM_ADDRSIG sections.
It allows using "Size" with or without "Content" in YAML descriptions of
SHT_LLVM_ADDRSIG sections.

Differential revision: https://reviews.llvm.org/D68334

llvm-svn: 373610
2019-10-03 15:02:18 +00:00
Sanjay Patel 38c265fe26 [MSP430] add tests for unwanted shift codegen; NFC (PR43542)
llvm-svn: 373607
2019-10-03 14:54:03 +00:00
George Rimar fc9104d42a Recommit r373598 "[yaml2obj/obj2yaml] - Add support for SHT_LLVM_ADDRSIG sections."
Fix: call `consumeError()` for a case missed.

Original commit message:

SHT_LLVM_ADDRSIG is described here:
https://llvm.org/docs/Extensions.html#sht-llvm-addrsig-section-address-significance-table

This patch teaches tools to dump them and to parse the YAML declarations of such sections.

Differential revision: https://reviews.llvm.org/D68333

llvm-svn: 373606
2019-10-03 14:52:33 +00:00
George Rimar 9f6cf2a081 Revert r373598 "[yaml2obj/obj2yaml] - Add support for SHT_LLVM_ADDRSIG sections."
It broke BB:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/18655/steps/test/logs/stdio

llvm-svn: 373599
2019-10-03 14:04:47 +00:00
George Rimar 32cbabfecb [yaml2obj/obj2yaml] - Add support for SHT_LLVM_ADDRSIG sections.
SHT_LLVM_ADDRSIG is described here:
https://llvm.org/docs/Extensions.html#sht-llvm-addrsig-section-address-significance-table

This patch teaches tools to dump them and to parse the YAML declarations of such sections.

Differential revision: https://reviews.llvm.org/D68333

llvm-svn: 373598
2019-10-03 13:57:08 +00:00
Roman Lebedev c780645736 [NFC][InstCombine] Some tests for sub-of-negatible pattern
As we have previously established, `sub` is an outcast,
and should be considered non-canonical iff it can be converted to `add`.

It can be converted to `add` if its second operand can be negated.
So far we mostly only do that for constants and negation itself,
but we should be more thorough.
llvm-svn: 373597
2019-10-03 13:36:00 +00:00
George Rimar 6079498c51 [llvm-readobj] - Stop using a precompiled binary in all.test
Having a precompiled binary here is excessive.
I also added a few missing tags.

Differential revision: https://reviews.llvm.org/D68386

llvm-svn: 373594
2019-10-03 13:13:23 +00:00
Simon Atanasyan f6551ddfce [mips] Push `fixup_Mips_LO16` fixup for `jialc` and `jic` instructions
llvm-svn: 373591
2019-10-03 12:08:26 +00:00
Simon Atanasyan afe7197f13 [mips] Use llvm-readobj `-A` flag in test cases. NFC
llvm-svn: 373589
2019-10-03 12:08:04 +00:00
Simon Atanasyan 952d71b794 [llvm-readobj][mips] Display MIPS specific info under --arch-specific flag
Old options `--mips-plt-got`, `--mips-abi-flags`, `--mips-reginfo`,
and `--mips-options` will be deleted in a separate patch.

llvm-svn: 373588
2019-10-03 12:07:07 +00:00
Simon Atanasyan 8c6bed4396 [llvm-readobj][mips] Do not show an error if GOT is missing
It is not an error if a file does not contain a GOT.

llvm-svn: 373587
2019-10-03 12:06:56 +00:00
Sander de Smalen 4f99b6f0fe [AArch64] Static (de)allocation of SVE stack objects.
Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects.

The focus of this patch is purely to allow the stack frame to
allocate/deallocate space for scalable SVE objects. More dynamic
allocation (at compile-time, i.e. determining placement of SVE objects
on the stack), or resolving frame-index references that include
scalable-sized offsets, are left for subsequent patches.

SVE objects are allocated in the stack frame as a separate region below
the callee-save area, and above the alignment gap. This is done so that
the SVE objects can be accessed directly from the FP at (runtime)
VL-based offsets to benefit from using the VL-scaled addressing modes.

The layout looks as follows:

     +-------------+
     | stack arg   |   
     +-------------+
     | Callee Saves|
     |   X29, X30  |       (if available)
     |-------------| <- FP (if available)
     |     :       |   
     |  SVE area   |   
     |     :       |   
     +-------------+
     |/////////////| alignment gap.
     |     :       |   
     | Stack objs  |
     |     :       |   
     +-------------+ <- SP after call and frame-setup

SVE and non-SVE stack objects are distinguished using different
StackIDs. The offsets for objects with TargetStackID::SVEVector should be
interpreted as purely scalable offsets within their respective SVE region.

Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D61437

llvm-svn: 373585
2019-10-03 11:33:50 +00:00
Craig Topper 3a6950d3f0 [X86] Add test case for v8i64->v8i8 truncate with avx512 and prefer-vector-width/min-legal-vector-width=256. NFC
With vpmovqb, we should be able to do better here until we get
AVX512VBMI on Cannonlake/Icelake.

llvm-svn: 373569
2019-10-03 06:18:45 +00:00
Matt Arsenault 1c135a39aa AMDGPU/GlobalISel: Expand G_BITCAST legality
llvm-svn: 373567
2019-10-03 05:46:08 +00:00
Craig Topper eb420aa379 [X86] Add DAG combine to turn (bitcast (vbroadcast_load)) into just a vbroadcast_load if the scalar size is the same.
This improves broadcast load folding of i64 elements on 32-bit
targets where i64 isn't legal.

Previously we had to represent these as vXf64 vbroadcast_loads and
a bitcast to vXi64. But we didn't have any isel patterns
looking for that.

This also allows us to remove or simplify some isel patterns that
were looking for bitcasted vbroadcast_loads.

llvm-svn: 373566
2019-10-03 05:30:02 +00:00
Craig Topper f849f41469 [X86] Add broadcast load folding patterns to NoVLX VPMULLQ/VPMAXSQ/VPMAXUQ/VPMINSQ/VPMINUQ patterns.
More fixes for PR36191.

llvm-svn: 373560
2019-10-03 03:16:27 +00:00
Stanislav Mekhanoshin 1384c3a5b8 [AMDGPU] Fix illegal agpr use by VALU
When SIFixSGPRCopies attempts to fix an illegal copy from vector to
scalar register it calls moveToVALU(). A copy from an agpr to sgpr
becomes a copy from agpr to agpr, which may result in the illegal
register class at a use of this copy.

The solution is to always copy it into a vgpr. This may result in a
subsequent copy into an agpr if that is what is really needed; however,
this should not happen too often and will likely be folded later.

The opposite situation may not happen because an sgpr is always
illegal where agpr is legal, so such user instructions may not
exist.

Differential Revision: https://reviews.llvm.org/D68358

llvm-svn: 373544
2019-10-02 23:23:46 +00:00
Roman Lebedev ae3315af07 [InstCombine] Bypass high bit extract before variable sign-extension (PR43523)
https://rise4fun.com/Alive/8BY - valid for lshr+trunc+variable sext
https://rise4fun.com/Alive/7jk - the variable sext can be redundant

https://rise4fun.com/Alive/Qslu - 'exact'-ness of the first shift can be preserved

https://rise4fun.com/Alive/IF63 - without trunc we could view this as
                                  more general "drop redundant mask before right-shift",
                                  but let's handle it here for now
https://rise4fun.com/Alive/iip - likewise, without trunc, variable sext can be redundant.
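
One concrete instance with fixed widths (hypothetical, just to show the
shape; the patch handles the variable-shift forms per the Alive proofs
above):

  %hi = lshr i64 %x, 32
  %t = trunc i64 %hi to i32
  %r = sext i32 %t to i64
  =>
  %r = ashr i64 %x, 32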

There are more patterns for sure - e.g. we can have 'lshr' as the final shift,
but that might be best handled by some more generic transform, e.g.
"drop redundant masking before right-shift" (PR42456)

I'm singling out this sext patch because you can only extract
high bits with `*shr` (unlike abstract bit masking),
and I *know* this fold is wanted by existing code.

I don't believe there is much to review here,
so I'm going to opt into post-review mode.

https://bugs.llvm.org/show_bug.cgi?id=43523

llvm-svn: 373542
2019-10-02 23:02:12 +00:00
Roman Lebedev 29339149c3 [NFC][InstCombine] Add tests for 'variable sext of variable high bit extract' pattern (PR43523)
https://bugs.llvm.org/show_bug.cgi?id=43523

llvm-svn: 373541
2019-10-02 23:01:58 +00:00
David Bolvansky 6b45029676 [InstCombine] Transform bcopy to memmove
bcopy is still widely used, mainly for network apps. Sadly, LLVM has no optimizations for bcopy, but there are some for memmove.
Since bcopy == memmove, it is profitable to transform bcopy to memmove and get the existing memmove optimizations for free.
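
The transform itself is tiny; note that the pointer arguments swap, since
bcopy takes the source first (a sketch):

  call void @bcopy(i8* %src, i8* %dst, i64 %len)
  =>
  call void @llvm.memmove.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %len, i1 false)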

llvm-svn: 373537
2019-10-02 22:49:20 +00:00
Craig Topper f5bda7fe24 [X86] Add test cases for suboptimal vselect+setcc splitting.
If the vselect result type needs to be split, it will also try to
split the condition if it happens to be a setcc.

With avx512 where k-registers are legal, its probably better
to just use a kshift to split the mask register.

llvm-svn: 373536
2019-10-02 22:35:03 +00:00
Yi-Hong Lyu c7be067974 [PowerPC] Fix SH field overflow issue
Store rlwinm Rx, Ry, 32, 0, 31 as rlwinm Rx, Ry, 0, 0, 31 and store
rldicl Rx, Ry, 64, 0 as rldicl Rx, Ry, 0, 0. Otherwise the SH field overflows
and an assertion fails in the assembly printing stage.

Differential Revision: https://reviews.llvm.org/D66991

llvm-svn: 373519
2019-10-02 20:25:16 +00:00
Evgeniy Stepanov 464df87288 Handle llvm.launder.invariant.group in msan.
Summary:
[MSan] handle llvm.launder.invariant.group

    MSan used to give false positives in

    class Foo {
     public:
      virtual ~Foo() {};
    };

    // Return true iff *x is set.
    bool f1(void **x, bool flag);

    Foo* f() {
      void *p;
      bool found;
      found = f1(&p,flag);
      if (found) {
        // p is always set here.
        return static_cast<Foo*>(p); // False positive here.
      }
      return nullptr;
    }

Patch by Ilya Tokar.

Reviewers: #sanitizers, eugenis

Reviewed By: #sanitizers, eugenis

Subscribers: eugenis, Prazek, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68236

llvm-svn: 373515
2019-10-02 19:53:19 +00:00
Alina Sbirlea 24ae5ce54b [MemorySSA] Update Phi creation when inserting a Def.
MemoryPhis should be added in the IDF of the blocks newly gaining Defs.
This includes the blocks that gained a Phi and the block gaining a Def,
if the block did not have one before.
Resolves PR43427.

llvm-svn: 373505
2019-10-02 18:42:33 +00:00
Craig Topper 74c7d6be28 [X86] Rewrite to the vXi1 subvector insertion code to not rely on the value of bits that might be undef
The previous code tried to do a trick where we would extract the subvector from the location we were inserting. Then xor that with the new value. Take the xored value and clear out the bits above the subvector size. Then shift that xored subvector to the insert location. And finally xor that with the original vector. Since the old subvector was used in both xors, this would leave just the new subvector at the inserted location. Since the surrounding bits had been zeroed no other bits of the original vector would be modified.

Unfortunately, if the old subvector came from undef we might aggressively propagate the undef. Then we end up with the XORs not cancelling because they aren't using the same value for the two uses of the old subvector. @bkramer gave me a case that demonstrated this, but we haven't reduced it enough to make it easily readable to see what's happening.

This patch uses a safer, but more costly approach. It isolates the bits above the insertion point and the bits below the insert point and ORs those together, leaving 0 at the insertion location. Then it widens the subvector with 0s in the upper bits, shifts it into position with 0s in the lower bits, and then we do another OR.

Differential Revision: https://reviews.llvm.org/D68311

llvm-svn: 373495
2019-10-02 17:47:09 +00:00
Thomas Lively 5b74c39d72 [WebAssembly] Error when using wasm64 for ISel
Summary:
64-bit WebAssembly (wasm64) is not specified and not supported in the
WebAssembly backend. We do have support for it in clang, however, and
we would like to keep that support because we expect wasm64 to be
specified and supported in the future. For now add an error when
trying to use wasm64 from the backend to minimize user confusion from
unexplained crashes.

Reviewers: aheejin, dschuff, sunfish

Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68254

llvm-svn: 373493
2019-10-02 17:34:44 +00:00
Piotr Sobczak 265e94e657 [AMDGPU] Extend buffer intrinsics with swizzling
Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.

Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store

Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.

The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.

There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.

Reviewers: arsenm, nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68200

llvm-svn: 373491
2019-10-02 17:22:36 +00:00
Teresa Johnson 077cc3fcb0 [ThinLTO/WPD] Ensure devirtualized targets use promoted symbol when necessary
Summary:
This fixes a hole in the handling of devirtualized targets that were
local but need promoting due to devirtualization in another module. We
were not correctly referencing the promoted symbol in some cases. Make
sure the code that updates the name also looks at the ExportedGUIDs set
by utilizing a callback that checks all conditions (the callback
utilized by the internalization/promotion code).

Reviewers: pcc, davidxl, hiraditya

Subscribers: mehdi_amini, Prazek, inglorion, steven_wu, dexonsmith, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68159

llvm-svn: 373485
2019-10-02 16:36:59 +00:00
Sanjay Patel 3f4726b818 [SLP] add test for vectorization of different widths (PR28457); NFC
llvm-svn: 373483
2019-10-02 16:12:42 +00:00
Hans Wennborg 9330005a54 Reapply r373431 "Switch lowering: omit range check for bit tests when default is unreachable (PR43129)"
This was reverted in r373454 due to breaking the expensive-checks bot.
This version addresses that by omitting the addSuccessorWithProb() call
when omitting the range check.

> Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
>
> This is modeled after the same functionality for jump tables, which was
> added in r357067.
>
> Differential revision: https://reviews.llvm.org/D68131

llvm-svn: 373477
2019-10-02 14:35:06 +00:00
George Rimar 4496f07497 [llvm-readelf] - Report a warning when .hash section contains a chain with a cycle.
It is possible to craft a .hash section that triggers an infinite loop
in llvm-readelf code. This patch fixes the issue and introduces
a warning.

Differential revision: https://reviews.llvm.org/D68086

llvm-svn: 373476
2019-10-02 14:11:35 +00:00
George Rimar 6fa696fb08 [yaml2obj] - Allow Size tag for describing SHT_HASH sections.
This is a follow-up for D68085 which allows using "Size"
tag together with "Content" tag or alone.

Differential revision: https://reviews.llvm.org/D68216

llvm-svn: 373473
2019-10-02 13:52:37 +00:00
Djordje Todorovic 45297645aa [llvm-dwarfdump] Fix dumping of wrong locstats map
llvm-svn: 373469
2019-10-02 13:24:45 +00:00
Kerry McLaughlin 822b298958 [AArch64][SVE] Implement int_aarch64_sve_cnt intrinsic
Summary: This patch includes tests for the VecOfBitcastsToInt type added by D68021

Reviewers: c-rhodes, sdesmalen, rovka

Reviewed By: c-rhodes

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, cfe-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68023

llvm-svn: 373468
2019-10-02 13:09:54 +00:00
James Molloy 9026518e73 [ModuloSchedule] Peel out prologs and epilogs, generate actual code
Summary:
This extends the PeelingModuloScheduleExpander to generate prolog and epilog code,
and correctly stitch uses through the prolog, kernel, epilog DAG.

The key concept in this patch is to ensure that all transforms are *local*; only a
function of a block and its immediate predecessor and successor. By defining the problem in this way
we can inductively rewrite the entire DAG using only local knowledge that is easy to
reason about.

For example, we assume that all prologs and epilogs are near-perfect clones of the
steady-state kernel. This means that if a block has an instruction that is predicated out,
we can redirect all users of that instruction to that equivalent instruction in our
immediate predecessor. As all blocks are clones, every instruction must have an equivalent in
every other block.

Similarly we can make the assumption by construction that if a value defined in a block is used
outside that block, the only possible users are its immediate successors. We maintain this
even for values that are used outside the loop by creating a limited form of LCSSA.

This code isn't small, but it isn't complex.

Enabled a bunch of testing from Hexagon. There are a couple of tests not enabled yet;
I'm about 80% sure there isn't buggy codegen but the tests are checking for patterns
that we don't produce. Those still need a bit more investigation. In the meantime we
(Google) are happy with the code produced by this on our downstream SMS implementation,
and believe it generates correct code.

Subscribers: mgorny, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68205

llvm-svn: 373462
2019-10-02 12:46:44 +00:00
Fangrui Song 671fb34358 [llvm-objcopy] Add --set-section-alignment
Fixes PR43181. This option was recently added to GNU objcopy (binutils
PR24942).

`llvm-objcopy -I binary -O elf64-x86-64 --set-section-alignment .data=8` can set the alignment of .data.

Reviewed By: grimar, jhenderson, rupprecht

Differential Revision: https://reviews.llvm.org/D67656

llvm-svn: 373461
2019-10-02 12:41:25 +00:00
Florian Hahn f2ffa7a1c0 [InstCombine] Precommit tests for D68265
llvm-svn: 373458
2019-10-02 12:32:37 +00:00
Sanjay Patel be21ceb565 [InstSimplify] fold fma/fmuladd with a NaN or undef operand
This is intended to be similar to the constant folding results from
D67446
and earlier, but not all operands are constant in these tests, so the
responsibility for folding is left to InstSimplify.

Differential Revision: https://reviews.llvm.org/D67721

llvm-svn: 373455
2019-10-02 12:12:02 +00:00
Hans Wennborg 372aece777 Revert r373431 "Switch lowering: omit range check for bit tests when default is unreachable (PR43129)"
This broke http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/19967

> Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
>
> This is modeled after the same functionality for jump tables, which was
> added in r357067.
>
> Differential revision: https://reviews.llvm.org/D68131

llvm-svn: 373454
2019-10-02 12:08:44 +00:00
David Green c9b5ab8b1c [ARM] Identity shuffles are legal
Identity shuffles, of the form (0, 1, 2, 3, ...), are perfectly OK under MVE
(they essentially just become bitcasts). We were not catching them in the
existing set of what we considered legal, though. On NEON they would be covered
by vext's, but vext is not generally available in MVE.

This uses ShuffleVectorInst::isIdentityMask, which is a little odd to use here
but does what we want and prevents us from rewriting what is essentially the
same function.
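
For reference, the property being tested can be sketched standalone (in the
spirit of ShuffleVectorInst::isIdentityMask, not its actual code): every lane
either selects its own index or is undefined (-1).

```cpp
#include <iostream>
#include <vector>

// Sketch of an identity-mask test: each lane either selects its own
// index or is undefined (-1 in LLVM's shuffle mask encoding).
static bool isIdentityMask(const std::vector<int> &Mask) {
  for (int I = 0, E = static_cast<int>(Mask.size()); I != E; ++I)
    if (Mask[I] != -1 && Mask[I] != I)
      return false;
  return true;
}

int main() {
  std::cout << isIdentityMask({0, 1, 2, 3}) << '\n';  // 1: a no-op shuffle
  std::cout << isIdentityMask({0, -1, 2, 3}) << '\n'; // 1: undef lane is OK
  std::cout << isIdentityMask({3, 2, 1, 0}) << '\n';  // 0: a reverse
}
```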

Differential Revision: https://reviews.llvm.org/D68241

llvm-svn: 373446
2019-10-02 11:40:51 +00:00
Hans Wennborg cbefc36fcc Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
This is modeled after the same functionality for jump tables, which was
added in r357067.
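
The kind of source pattern that benefits looks like the following (the
function name is made up for illustration): with an unreachable default, the
input is known to be in range, so the bit-test sequence needs no range check.

```cpp
#include <cstdlib>

// With an unreachable default, x is guaranteed to lie in [0, 5], so a
// bit-test lowering of this switch needs no preliminary range check.
static bool isEvenSmallCase(unsigned x) {
  switch (x) {
  case 0: case 2: case 4:
    return true;
  case 1: case 3: case 5:
    return false;
  default:
    __builtin_unreachable(); // caller promises x is always in range
  }
}

int main() { return isEvenSmallCase(2) ? EXIT_SUCCESS : EXIT_FAILURE; }
```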

Differential revision: https://reviews.llvm.org/D68131

llvm-svn: 373431
2019-10-02 08:32:15 +00:00
Djordje Todorovic 2ef18fb41a Reland "[utils] Implement the llvm-locstats tool"
The tool reports verbose output for DWARF debug location coverage. For each
variable or formal parameter DIE, llvm-locstats computes what percentage of
the code section bytes in which it is in scope are covered by a location
description. The 0 line shows the number (and the percentage) of DIEs with no
location information; the 100 line shows the number (and the percentage) of
DIEs whose location information covers all of the code section bytes where
the variable or parameter is in scope. The 50..59 line shows the number (and
the percentage) of DIEs whose location information covers between 50 and 59
percent of their scope.
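
A rough sketch of that bucketing on hypothetical per-DIE data (not the tool's
implementation):

```cpp
#include <cstdio>
#include <utility>
#include <vector>

int main() {
  // Hypothetical per-DIE data: (covered scope bytes, total scope bytes).
  std::vector<std::pair<unsigned, unsigned>> Dies = {
      {0, 40}, {40, 40}, {22, 40}, {36, 40}};

  // Buckets: [0] = 0%, [1] = 1..9%, ..., [10] = 90..99%, [11] = 100%.
  unsigned Buckets[12] = {};
  for (const auto &D : Dies) {
    unsigned Pct = 100 * D.first / D.second;
    if (Pct == 0)
      ++Buckets[0];
    else if (Pct == 100)
      ++Buckets[11];
    else
      ++Buckets[1 + Pct / 10];
  }
  std::printf("0%%: %u  50..59%%: %u  100%%: %u  (of %zu DIEs)\n",
              Buckets[0], Buckets[6], Buckets[11], Dies.size());
}
```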

Differential Revision: https://reviews.llvm.org/D66526

The cause of the test failure was resolved.

llvm-svn: 373427
2019-10-02 07:00:01 +00:00
Rui Ueyama 60e9df3362 [llvm-lib] Detect duplicate input files
Differential Revision: https://reviews.llvm.org/D68320

llvm-svn: 373426
2019-10-02 06:41:52 +00:00
Rui Ueyama 64a362e721 [llvm-lib] Correctly handle .lib input files
If archive files are passed as input files, llvm-lib needs to append
the members of the input archive files to the output file. This patch
implements that behavior.

This patch splits an existing function into smaller functions.
Effectively, the new code is only the `if (Magic == file_magic::archive)
{ ... }` part.
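
A generic sketch of the intended behavior (names and types are made up; this
is not the llvm-lib code): an input that is itself an archive contributes its
members to the output, not the archive file.

```cpp
#include <iostream>
#include <string>
#include <vector>

// An input is either a plain object file or an archive whose members
// should be appended to the output instead of the archive itself.
struct Input {
  std::string Name;
  bool IsArchive;                   // ~ Magic == file_magic::archive
  std::vector<std::string> Members; // only meaningful for archives
};

int main() {
  std::vector<Input> Inputs = {
      {"a.obj", false, {}},
      {"old.lib", true, {"x.obj", "y.obj"}}, // hypothetical archive
  };

  std::vector<std::string> OutputMembers;
  for (const Input &In : Inputs) {
    if (In.IsArchive)
      OutputMembers.insert(OutputMembers.end(), In.Members.begin(),
                           In.Members.end());
    else
      OutputMembers.push_back(In.Name);
  }

  for (const std::string &M : OutputMembers)
    std::cout << M << '\n'; // a.obj, x.obj, y.obj
}
```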

Fixes https://bugs.llvm.org/show_bug.cgi?id=32674

Differential Revision: https://reviews.llvm.org/D68204

llvm-svn: 373424
2019-10-02 05:24:24 +00:00
Craig Topper 8d6a863b02 [X86] Add broadcast load folding patterns to the NoVLX compare patterns.
These patterns use zmm registers for 128/256-bit compares when
the VLX instructions aren't available. Previously we only
supported register operands but, as PR36191 notes, we can fold
broadcast loads, though not regular loads.

llvm-svn: 373423
2019-10-02 04:45:02 +00:00
David Blaikie bfc68885d9 DebugInfo: Update support for detecting C++ language variants in debug info emission
llvm-svn: 373420
2019-10-02 01:39:48 +00:00
Matt Arsenault cdfe5efe9b AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX
In principle this should behave like any other constant. However,
eliminateFrameIndex currently assumes a VALU use and uses a vector
shift. Work around this by selecting to a VGPR for now, until
eliminateFrameIndex is fixed.

llvm-svn: 373415
2019-10-02 01:02:24 +00:00
Matt Arsenault bfce0c2664 AMDGPU/GlobalISel: Private loads always use VGPRs
llvm-svn: 373414
2019-10-02 01:02:21 +00:00
Matt Arsenault 05aa8a733e AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR
This will be needed to support AGPR operations.

llvm-svn: 373413
2019-10-02 01:02:18 +00:00
Matt Arsenault 3a657afb3a AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values
llvm-svn: 373412
2019-10-02 01:02:14 +00:00
Stanislav Mekhanoshin 075bc48a7f [AMDGPU] separate accounting for agprs
Account for and report AGPRs separately on gfx908. Reporting for other
targets is unchanged.

Differential Revision: https://reviews.llvm.org/D68307

llvm-svn: 373411
2019-10-02 00:26:58 +00:00
Craig Topper 8c19925f42 [X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32
The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, a v16i64 index can force a v16i32 gather to be split in two since the index needs 2 registers. If we can shrink the index to i32, we can avoid the split. It should always be safe to shrink the index regardless of the number of elements. We have gather/scatter instructions that can use a v2i32 index stored in a v4i32 register with a v2i64 data size.

I've limited this to before type legalization to avoid creating a v2i32 type afterwards. We could check for that, but we'd also need testing. I'm also only handling build_vectors with no bitcasts, to be sure the truncate will constant fold.
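
The safety condition can be illustrated standalone (this is not the DAG
combine itself): narrowing is safe exactly when sign-extending the low 32
bits round-trips, i.e. the value has at least 33 sign bits.

```cpp
#include <cstdint>
#include <cstdio>

// A 64-bit index can be narrowed to i32 iff sign-extending its low 32
// bits reproduces the original value, since the hardware sign-extends
// the narrow index anyway.
static bool shrinkableToI32(int64_t Idx) {
  return static_cast<int64_t>(static_cast<int32_t>(Idx)) == Idx;
}

int main() {
  std::printf("%d %d %d\n",
              shrinkableToI32(42),                // 1: fits in i32
              shrinkableToI32(-7),                // 1: sign-extends back
              shrinkableToI32(INT64_C(1) << 35)); // 0: would be mangled
}
```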

Differential Revision: https://reviews.llvm.org/D68247

llvm-svn: 373408
2019-10-01 23:18:31 +00:00
Changpeng Fang e4ee28d14c AMDGPU: Fix an out of date assert in addressing FrameIndex
Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D67574

llvm-svn: 373404
2019-10-01 23:07:14 +00:00
Craig Topper 0da163a2cf Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops."
This seems to be causing some performance regressions that I'm
trying to investigate.

One thing that stands out is that this transform can increase
the live range of the operands of the earlier logic op. This
can be bad for register allocation. If there are two logic
op inputs, we should really combine the one that is closest, but
SelectionDAG doesn't have a good way to do that. Maybe we need
to do this as a basic block transform in Machine IR.

llvm-svn: 373401
2019-10-01 22:40:03 +00:00
Craig Topper 912870573c [X86] convertToThreeAddress, make sure second operand of SUB32ri is really an immediate before calling getImm().
It might be a symbol instead. We can't fold those since we can't
negate them.

The same applies to other SUB instructions with immediates.
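
The shape of the fix, sketched with a generic tagged operand type (not the
actual X86 code): check the operand kind before reading an immediate, since a
symbolic operand has nothing to negate.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <variant>

// Stand-in for a machine operand that is either an immediate or a symbol.
using Operand = std::variant<int64_t, std::string>;

// Fold "sub dst, op" into a three-address form only when the operand
// really is an immediate that we can negate.
static bool tryNegateForFold(const Operand &Op, int64_t &NegImm) {
  if (const int64_t *Imm = std::get_if<int64_t>(&Op)) {
    NegImm = -*Imm; // safe: a concrete value can be negated
    return true;
  }
  return false; // a symbol has no value to negate; leave the sub alone
}

int main() {
  int64_t N = 0;
  std::cout << tryNegateForFold(Operand{int64_t{5}}, N) << ' '
            << tryNegateForFold(Operand{"extern_sym"}, N) << '\n'; // 1 0
}
```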

Fixes PR43529.

llvm-svn: 373397
2019-10-01 21:55:55 +00:00
Sanjay Patel 9738fd6387 [BypassSlowDivision][CodeGenPrepare] avoid crashing on unused code (PR43514)
https://bugs.llvm.org/show_bug.cgi?id=43514

llvm-svn: 373394
2019-10-01 21:25:36 +00:00
Bardia Mahjour 91b62d5c89 [DDG] Data Dependence Graph - Root Node
Summary:
This patch adds a Root Node to the DDG. The purpose of the root node is to
create a single entry node that allows graph walk iterators to iterate through
all nodes of the graph, making sure that no node is left unvisited during a
graph walk (e.g. SCC or DFS). Once the DDG is fully constructed, it will have
exactly one root node, and every node in the graph is reachable from the root.
The algorithm for connecting the root node is based on depth-first search and
keeps track of visited nodes to avoid creating unnecessary edges.
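
A small sketch of that connection step on a generic digraph (not the DDG
classes): DFS from nodes already reachable, then add a root edge only to
nodes still unvisited.

```cpp
#include <iostream>
#include <vector>

int main() {
  // Adjacency list; node 0 is the root, initially with no edges.
  std::vector<std::vector<unsigned>> Adj = {{}, {2}, {1}, {4}, {}};
  std::vector<bool> Visited(Adj.size(), false);

  // Iterative DFS marking everything reachable from N as visited.
  auto Dfs = [&](unsigned N) {
    std::vector<unsigned> Stack{N};
    while (!Stack.empty()) {
      unsigned Cur = Stack.back();
      Stack.pop_back();
      if (Visited[Cur])
        continue;
      Visited[Cur] = true;
      for (unsigned Succ : Adj[Cur])
        Stack.push_back(Succ);
    }
  };

  Visited[0] = true; // the root itself
  for (unsigned N = 1; N < Adj.size(); ++N)
    if (!Visited[N]) {     // not yet reachable from the root
      Adj[0].push_back(N); // add a root -> N edge
      Dfs(N);              // skip everything N already covers
    }

  std::cout << "root edges:";
  for (unsigned N : Adj[0])
    std::cout << ' ' << N;
  std::cout << '\n'; // prints "root edges: 1 3" for this graph
}
```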

Authored By: bmahjour

Reviewers: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67970

llvm-svn: 373386
2019-10-01 19:32:42 +00:00
Alina Sbirlea 890090f7f5 [MemorySSA] Check for unreachable blocks when getting last definition.
If a single predecessor is found, still check whether the block is
unreachable. The test that found this had an unreachable block with a self
loop. Resolves PR43493.

llvm-svn: 373383
2019-10-01 19:09:50 +00:00
Jakub Kuderski 7ed4fb389b Add a missing pass in ARM O3 pipeline
llvm-svn: 373382
2019-10-01 18:53:54 +00:00
Alina Sbirlea ae40dfc1e3 [MemorySSA] Update last_access_in_block check.
The check for "was there an access in this block" should be: is the last
access in this block and is it not a newly inserted phi.
Resolves new test in PR43438.

Also fix a typo when simplifying trivial Phis to match the comment.

llvm-svn: 373380
2019-10-01 18:34:39 +00:00
Jakub Kuderski 856c1cd852 [Dominators][CodeGen] Don't mark MachineDominatorTree as preserved in MachineLICM
llvm-svn: 373378
2019-10-01 18:27:44 +00:00
David Green a3ebcfe5a6 [ARM] Some MVE shuffle plus extend tests. NFC
llvm-svn: 373368
2019-10-01 18:04:02 +00:00