Commit Graph

59079 Commits

Teresa Johnson 87cc05055a Try to make new test more resilient to different orderings
The new test added in r352441 is getting a bot failure, which I believe is
due to different orderings in the dumping that aren't being handled
well. Try to make the test more resilient to ordering differences.

llvm-svn: 352446
2019-01-29 02:04:01 +00:00
Sam Clegg b54927cc48 [WebAssembly] Handle more types of uses in WebAssemblyAddMissingPrototypes
Previously we were only handling bitcast operations, however
prototypeless functions can also appear in other places such as
comparisons and as function params.

Switch to using replaceAllUsesWith() to replace the prototype-less
function uses.  This new approach results in some redundant bitcasting
but is much simpler and handles all cases.

Differential Revision: https://reviews.llvm.org/D56938

llvm-svn: 352445
2019-01-29 00:30:46 +00:00
Thomas Lively 33f87b8aef [WebAssembly] Expand BUILD_PAIR nodes
Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish

Differential Revision: https://reviews.llvm.org/D57276

llvm-svn: 352442
2019-01-28 23:44:31 +00:00
Teresa Johnson 2f616e479b [ThinLTO] Add option to dump per-module summary dot graph
Summary:
I found that there currently isn't a way to invoke exportToDot from
the command line for a per-module summary index, and therefore no
testing of that case. Add an internal option and use it to test dumping
of per module summary indexes.

In particular, I am looking at fixing the limitation that causes the
aliasee GUID in the per-module summary to be 0, and want to be able to
test that change.

Reviewers: evgeny777

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D57206

llvm-svn: 352441
2019-01-28 23:43:26 +00:00
Philip Reames 6c5341bc5a Demanded elements support for vector GEPs
GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation, and (eventually) allowing further optimizations

Differential Revision: https://reviews.llvm.org/D57177

llvm-svn: 352440
2019-01-28 23:24:49 +00:00
Sanjay Patel a36a293a56 [CGP] auto-generate complete checks for add overflow tests; NFC
llvm-svn: 352437
2019-01-28 22:07:37 +00:00
Craig Topper 390ac61b93 Recommit r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer"
This did not cause the buildbot failure it was previously reverted for.

Original commit message:

I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.

This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg.

On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.

llvm-svn: 352433
2019-01-28 21:38:47 +00:00
Jessica Paquette 2d73ecd0a3 [GlobalISel][AArch64] Add legalization for G_FLOG
This adds support for legalizing G_FLOG into a RTLib call.

It adds a legalizer test, and updates the existing floating point tests.

https://reviews.llvm.org/D57347

llvm-svn: 352429
2019-01-28 21:27:23 +00:00
Sanjay Patel 8965411619 [InstCombine] add another saturating uadd test (no undefs); NFC
I forgot that our undef matching hadn't been completed in the previous commit.

llvm-svn: 352424
2019-01-28 20:37:18 +00:00
Sanjay Patel dc543300a9 [InstCombine] add tests for saturating uadd with constant; NFC
llvm-svn: 352423
2019-01-28 20:32:48 +00:00
Matt Arsenault cdd191d9db AMDGPU: Add DS append/consume intrinsics
Since these pass the pointer in m0 unlike other DS instructions, these
need to worry about whether the address is uniform or not. This
assumes the address is dynamically uniform, and just uses
readfirstlane to get a copy into an SGPR.

I don't know whether these have the same 16-bit add addressing mode
offset problem on SI or not, but I've just assumed they do.

Also includes some misc. changes to avoid test differences between the
LDS and GDS versions.

llvm-svn: 352422
2019-01-28 20:14:49 +00:00
Jessica Paquette c49428a97d [GlobalISel][AArch64] Add instruction selection support for @llvm.log10
This adds instruction selection support for @llvm.log10 in AArch64. It teaches
GISel to lower it to a library call, updates the relevant tests, and adds a
legalizer test for log10.

https://reviews.llvm.org/D57341

llvm-svn: 352418
2019-01-28 19:53:14 +00:00
Scott Linder b5d6292822 [MC] Do not consider .ifdef/.ifndef as a use
This is allowed by GAS and seems correct.

Differential Revision: https://reviews.llvm.org/D55439

llvm-svn: 352414
2019-01-28 19:32:08 +00:00
Francis Visoiu Mistrih 556ea7d2e0 [AArch64] Add 'apple-latest' CPU alias
The 'apple-latest' alias is supposed to provide a CPU that contains the
latest Apple processor model supported by LLVM.

This is supposed to be used by tools like lldb to provide a target that
supports most of the CPU features.

For now, this is mapped to Cyclone.

Differential Revision: https://reviews.llvm.org/D56384

llvm-svn: 352412
2019-01-28 19:27:33 +00:00
Jessica Paquette 2e35dc5185 [GlobalISel] Add ISel support for @llvm.lifetime.start and @llvm.lifetime.end
This adds ISel support for lifetime markers in opt levels above O0.

It also updates the arm64-irtranslator test, and updates some AArch64 tests that
use them for added coverage.

It also adds a testcase taken from the X86 codegen tests which verified a bug
caused by lifetime markers + stack colouring in the past. This is intended to
make sure that GISel doesn't re-introduce the bug.

(This is basically a straight copy from what SelectionDAG does in
SelectionDAGBuilder.cpp)

https://reviews.llvm.org/D57187

llvm-svn: 352410
2019-01-28 19:22:29 +00:00
Nikita Popov 8e1a464e6a [CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD
Followup to D56636, this time handling the UADDSAT case by expanding
uadd.sat(a, b) to umin(a, ~b) + b.
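
A rough C illustration of the identity being used (a sketch, not the actual DAG expansion):

```
unsigned uadd_sat_expanded(unsigned a, unsigned b) {
    unsigned not_b = ~b;                  /* ~b == UINT_MAX - b                    */
    unsigned m = (a < not_b) ? a : not_b; /* umin(a, ~b)                           */
    return m + b;                         /* saturates to UINT_MAX when a + b wraps */
}
```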

Differential Revision: https://reviews.llvm.org/D56869

llvm-svn: 352409
2019-01-28 19:19:09 +00:00
Vedant Kumar 1c3694a4d4 [CodeExtractor] Add support for the `swifterror` attribute
When passing a `swifterror` argument or alloca as an input to an
extraction region, mark the input parameter `swifterror`.

llvm-svn: 352408
2019-01-28 19:13:37 +00:00
Jessica Paquette 7db82d7257 [GlobalISel][AArch64] Add instruction selection support for G_FCOS and G_FSIN
This contains all of the legalizer changes from D57197 necessary to select
G_FCOS and G_FSIN. It also updates several existing IR tests in
test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN
instructions.

https://reviews.llvm.org/D57197
3/3

llvm-svn: 352402
2019-01-28 18:34:18 +00:00
Jessica Paquette 296f19b3d9 [GlobalISel][AArch64] Add IRTranslator support for G_FCOS and G_FSIN
This adds IRTranslator support for the G_FCOS and G_FSIN generic instructions.

https://reviews.llvm.org/D57197
2/3

llvm-svn: 352401
2019-01-28 18:34:17 +00:00
Jessica Paquette 9f6afad913 [GlobalISel] Add G_FSIN and G_FCOS generic instructions
This introduces generic instructions for floating point sin and cos, G_FCOS and
G_FSIN. It updates the tests, etc.

https://reviews.llvm.org/D57197
1/3

llvm-svn: 352400
2019-01-28 18:34:16 +00:00
Simon Pilgrim 2c17512456 [X86][AVX] Remove lowerShuffleByMerging128BitLanes 2-lane restriction
First step towards adding support for 64-bit unary "sublane" handling (a bit like lowerShuffleAsRepeatedMaskAndLanePermute). 

This allows us to add lowerV64I8Shuffle handling.

llvm-svn: 352389
2019-01-28 17:02:35 +00:00
Sanjay Patel 94cca60b82 [x86] allow more shuffle splitting to avoid vpermps (PR40434)
This is tricky to make optimal: sometimes we're better off using
a single wider op, but other times it makes more sense to combine
narrow ops to achieve the same result.

This solves the case from:
https://bugs.llvm.org/show_bug.cgi?id=40434

There's potentially a similar change for vectors with 64-bit elements,
but it needs adjustments similar to rL352333 to avoid creating infinite
loops.

llvm-svn: 352380
2019-01-28 15:51:34 +00:00
George Rimar 7d6fd6d73d [llvm-objdump] - Update test after r352366. NFC.
Change the column name.

llvm-svn: 352379
2019-01-28 15:49:41 +00:00
George Rimar 3168496822 [obj2yaml] - Dump the sh_entsize section field.
I was faced with the fact that obj2yaml does not dump the sh_entsize field.
The problem arose when I tried to dump ELF versioning sections.

This is close to what D50235 did, but D50235 did the change for yaml2obj, and now
I had to do the same for obj2yaml.

Differential revision: https://reviews.llvm.org/D57229

llvm-svn: 352373
2019-01-28 15:05:10 +00:00
Jordan Rupprecht b2702d6a45 [llvm-objcopy] Fix crash when writing empty binary output
Summary: When using llvm-objcopy -O binary and the resulting file will be empty (e.g. removing the only section that would be written, or using --only-keep with a section that doesn't exist/isn't SHF_ALLOC), we crash because FileOutputBuffer expects Size > 0. Add a regression test, and change Buffer to open/truncate the output file in this case.

Reviewers: alexshap, jhenderson, jakehehrlich, espindola

Reviewed By: alexshap, jhenderson

Subscribers: jfb, llvm-commits, emaste, arichardson

Differential Revision: https://reviews.llvm.org/D56806

llvm-svn: 352371
2019-01-28 15:02:40 +00:00
Aleksandar Beserminji 6c5dfcb89e [mips] Support for +abs2008 attribute
Instruction abs.[ds] does not generate a correct result when working
with NaNs for revisions prior to mips32r6 and mips64r6.

To generate a sequence which always produces a correct result, and also
to allow the user more control over how the code is compiled, the attribute
+abs2008 is added so the user can choose between the legacy and 2008 behavior.

By default, legacy mode is used on revisions prior to R6. Mips32r6 and
mips64r6 use abs2008 mode by default.

Differential Revision: https://reviews.llvm.org/D35983

llvm-svn: 352370
2019-01-28 14:59:30 +00:00
George Rimar 87fa2e66e7 [llvm-objdump] - Print LMAs when dumping section headers.
When --section-headers is used, GNU objdump prints both LMA and VMA for sections.
llvm-objdump does not do that, which makes its output slightly inconsistent.

This patch teaches llvm-objdump to print the LMA/VMA for ELF file formats.
The behavior for other formats remains unchanged.

Differential revision: https://reviews.llvm.org/D57146

llvm-svn: 352366
2019-01-28 14:11:35 +00:00
Tim Corringham 824ca3f3dd [AMDGPU] Add intrinsics for 16 bit interpolation
Summary:
Added the intrinsics llvm.amdgcn.interp.p1.f16() and
llvm.amdgcn.interp.p2.f16() and related LIT test.

The p1 intrinsic generates code appropriate for both 16 and 32
bank LDS.

Reviewers: #amdgpu, dstuttard, arsenm, tpr

Reviewed By: #amdgpu, arsenm

Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46754

llvm-svn: 352357
2019-01-28 13:48:59 +00:00
Petar Avramovic 7cecadb9af [MIPS GlobalISel] Select sub
Lower G_USUBO and G_USUBE. Add narrowScalar for G_SUB.
Legalize and select G_SUB for MIPS 32.

Differential Revision: https://reviews.llvm.org/D53416

llvm-svn: 352351
2019-01-28 12:10:17 +00:00
Jeremy Morse 8ebffb4b82 [DebugInfo][DAG] Avoid re-ordering of DBG_VALUEs
This patch improves the placement of DBG_VALUEs emitted by SelectionDAG, which,
as documented in PR40427, can go very wrong. At the core of this is
ProcessSourceNode, which assumes the last instruction in a BB is the start
of the last processed IR instruction, which isn't always true.

Instead, use a helper function to call InstrEmitter::EmitNode that records
before-and-after iterators and determines the first of any new instructions
created during emission. This is passed to ProcessSourceNode, which can
then make more enlightened decisions about ordering for DBG_VALUE placement.

Differential revision: https://reviews.llvm.org/D57163

llvm-svn: 352350
2019-01-28 12:08:31 +00:00
George Rimar 4c3b297621 [llvm-objdump] - Implement the --adjust-vma option.
GNU objdump's help says: "--adjust-vma: Add OFFSET to all displayed section addresses"
In real life what it does is a bit more complicated (and IMO not always
reasonable). For example, GNU objdump prints not only the VMA but also the LMA
for sections, and with --adjust-vma it adjusts the LMA, but only when a section
has relocations. llvm-objdump does not seem to support printing LMAs yet, but
GNU's logic does not make sense to me here anyway.

This patch tries to adjust the VMA. I tried to implement a reasonable approach:
I do not adjust sections that are not allocatable, since, for example, adjusting the
VAs of debug sections and rel[a] sections does not make sense. This behavior seems
to be GNU compatible.

Differential revision: https://reviews.llvm.org/D57051

llvm-svn: 352347
2019-01-28 10:44:01 +00:00
Diana Picus 574e0c5e32 [ARM GlobalISel] Support integer division for Thumb2
Support G_SDIV, G_UDIV, G_SREM and G_UREM.

The only significant difference between arm and thumb mode is that we
need to check a different subtarget feature.

llvm-svn: 352346
2019-01-28 10:37:30 +00:00
Craig Topper 453150bc18 [X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument.
Remove and autoupgrade the old intrinsics

llvm-svn: 352343
2019-01-28 07:03:03 +00:00
Craig Topper b23d5ccafc [X86] Add vbmi2 compressstore and expandload tests that aren't fast-isel tests.
These got removed when we autoupgraded to target independent intrinsics, but we didn't have coverage anywhere else. The avx512f/avx512vl versions do have coverage.

Also move some tests back from the upgrade file that aren't really upgraded.

llvm-svn: 352342
2019-01-28 05:42:39 +00:00
Amara Emerson fd31bf95c1 [AArch64][GlobalISel] Teach RBS about G_FNEG default mapping.
llvm-svn: 352340
2019-01-28 03:21:14 +00:00
Amara Emerson 0bfa2faccc [AArch64][GlobalISel] Add some missing vector support for FP arithmetic ops.
Moved the fneg lowering legalization test from AArch64 to X86, as we want to
specify that it's already legal.

llvm-svn: 352338
2019-01-28 02:28:22 +00:00
Amara Emerson 92ffb305cc [AArch64][GlobalISel] Add some vector support for fp <-> int conversions.
Some unrelated, but benign, test changes as well due to the test update script.

llvm-svn: 352337
2019-01-28 02:27:59 +00:00
Matt Arsenault cfca2a7adf GlobalISel: Don't reduce elements for atomic load/store
This is invalid for the same reason as in the narrowScalar handling
for load.

llvm-svn: 352334
2019-01-27 22:36:24 +00:00
Sanjay Patel ebe6b43aec [x86] add restriction for lowering to vpermps
This transform was added with rL351346, and we had
an escape for shufps, but we also want one for
unpckps vs. vpermps because vpermps doesn't take
an immediate shuffle index operand.

llvm-svn: 352333
2019-01-27 21:53:33 +00:00
Sanjay Patel 9ceaf2932a [x86] add tests for extract/extract/unpack; NFC
llvm-svn: 352331
2019-01-27 21:34:51 +00:00
Simon Pilgrim 670a6971f8 [X86][SSE] Add UNDEF handling to combineSelect ISD::USUBSAT matching (PR40083)
llvm-svn: 352330
2019-01-27 21:01:23 +00:00
Simon Pilgrim e5cf884018 [X86][SSE] Add UNDEF test case for combineSelect ISD::USUBSAT matching (PR40083)
llvm-svn: 352329
2019-01-27 20:52:34 +00:00
Simon Pilgrim f10b6623cc [X86][SSE] Permit UNDEFs in combineAddToSUBUS matching (PR40083)
llvm-svn: 352328
2019-01-27 20:36:37 +00:00
Sanjay Patel 6c865deedd [x86] add more tests for lowerShuffleWithUndefHalf; NFC
Some other transform is creating the opposite form and causing 
an infinite loop if we try to split some of these.

llvm-svn: 352327
2019-01-27 20:17:02 +00:00
Simon Pilgrim 976b093ecb [X86][SSE] Add PSUBUS undef element test case (PR40083)
llvm-svn: 352326
2019-01-27 20:09:30 +00:00
Simon Pilgrim c9d32e20d5 [X86] Add test cases for PR36721 (unnecessary andl for %cl when shifting)
llvm-svn: 352321
2019-01-27 18:31:33 +00:00
Matt Arsenault fdfb7d78f1 GlobalISel: Verify load/store has a pointer input
I expected this to be automatically verified, but it seems nothing makes
use of the fact that the type index was declared as a "ptype".
llvm-svn: 352319
2019-01-27 15:57:23 +00:00
Roman Lebedev d35424a2b3 [X86][NFC] Replace "<%s" with "< %s" in run-lines.
While I have no intention of actually committing regeneration
of the check lines in these test files with update_llc_test_checks,
the lack of that whitespace breaks that utility, which is mildly inconvenient.

llvm-svn: 352318
2019-01-27 15:36:35 +00:00
Roman Lebedev 661577466e [NFC][MCA][X86][BdVer2] Cherry-pick int-to-ivec forwarding tests from BtVer2
llvm-svn: 352317
2019-01-27 14:35:54 +00:00
Simon Pilgrim f6d7cfef39 [X86] Add CGP tests for PR40486
llvm-svn: 352316
2019-01-27 14:04:45 +00:00
Simon Pilgrim adca820927 [TTI] Add generic SADDSAT/SSUBSAT costs
Add generic cost calculation for the SADDSAT/SSUBSAT intrinsics; this uses the generic costs for sadd_with_overflow/ssub_with_overflow plus an extra sign comparison and selects based on the sign/overflow.
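
As a hedged sketch of the expansion being priced (illustration in C using the Clang/GCC overflow builtin, not the actual TTI code):

```
#include <limits.h>

int sadd_sat_expanded(int a, int b) {
    int sum;
    int overflowed = __builtin_sadd_overflow(a, b, &sum); /* sadd_with_overflow        */
    int saturated = (a < 0) ? INT_MIN : INT_MAX;          /* the extra sign comparison */
    return overflowed ? saturated : sum;                  /* select on the overflow bit */
}
```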

This completes PR40316

Differential Revision: https://reviews.llvm.org/D57239

llvm-svn: 352315
2019-01-27 13:51:59 +00:00
Simon Pilgrim c09a4db3b7 [X86] Regenerate reverse branch test to explicitly show branching and condition codes.
llvm-svn: 352314
2019-01-27 12:39:38 +00:00
Simon Pilgrim 7b980ad368 [X86] Regenerate test to explicitly show branching and condition codes.
llvm-svn: 352313
2019-01-27 12:38:09 +00:00
Amara Emerson 711bbdc894 Re-apply "r351584: "GlobalISel: Verify g_zextload and g_sextload""
I reverted it originally due to a bot failing. The underlying bug has been fixed
as of r352311.

llvm-svn: 352312
2019-01-27 11:34:41 +00:00
Amara Emerson bf43004ff1 [AArch64][GlobalISel] Fix the G_EXTLOAD combiner creating non-extending illegal instructions.
This fixes loads like 's1 = load %p (load 1 from %p)' being combined with an
extend into an illegal 's8 = g_extload %p (load 1 from %p)' which doesn't do any
extension, by avoiding touching those < s8 size loads.

This bug was uncovered by a verifier update, r351584, which I reverted to keep
the bots green.

llvm-svn: 352311
2019-01-27 10:56:20 +00:00
Thomas Preud'homme 447abc57c5 Revert "Detect incorrect FileCheck variable CLI definition"
This reverts commit r351039.

llvm-svn: 352309
2019-01-27 09:02:19 +00:00
Thomas Preud'homme e97834e28f Revert "Fix defines.txt"
This reverts commit r351042.

llvm-svn: 352308
2019-01-27 09:02:05 +00:00
Gabor Buella a0f743b77a [X86] Add some missing blsr patterns
The add+and sequence followed by a branch can
happen e.g. when looping over the set bits of an integer:

```
while (x != 0) {
   func(x & ~x);
   x &= x - 1;
}
```

Reviewed By: ctopper

Differential Revision: https://reviews.llvm.org/D57296

llvm-svn: 352306
2019-01-27 06:15:39 +00:00
Gabor Buella 23b04798ad [NFC][X86] Add a few more blsr test cases
llvm-svn: 352305
2019-01-27 06:05:40 +00:00
Craig Topper e65d4c5525 [X86] Add a pattern for (i64 (and (anyext def32:), 0x00000000FFFFFFFF)) to produce SUBREG_TO_REG
def32 here means the producing instruction zeroed bits 63:32. We already do this for zext, but it looks like we can get an and+anyext sometimes.

Spotted in the diffs from D33587.

llvm-svn: 352303
2019-01-27 03:37:05 +00:00
Matt Arsenault 211e89d4dd GlobalISel: Implement narrowScalar for mul
llvm-svn: 352300
2019-01-27 00:52:51 +00:00
Matt Arsenault 2e5f900849 GlobalISel: fewerElementsVector for intrinsic_trunc/intrinsic_round
llvm-svn: 352298
2019-01-27 00:12:21 +00:00
Amara Emerson 203760ab9c [GlobalISel][IRTranslator] Fix crash on translation of fneg.
When the fneg IR instruction was added the code to do translation wasn't
tested, and tried to get an invalid operand.

llvm-svn: 352296
2019-01-26 23:47:09 +00:00
Matt Arsenault 26a6c74fbe AMDGPU/GlobalISel: Legalize more bit ops
llvm-svn: 352295
2019-01-26 23:47:07 +00:00
Matt Arsenault 4d47594fc5 AMDGPU/GlobalISel: Widen small uaddo/usubo
llvm-svn: 352294
2019-01-26 23:44:51 +00:00
Johannes Doerfert 00102c7d95 [ValueTracking] Look through casts when determining non-nullness
Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the
value of their operand and can therefore be looked through when we
determine non-nullness.
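
A small C sketch of the kind of code this helps with (hypothetical example; the casts correspond to the ptrtoint/inttoptr instructions mentioned above):

```
#include <stdint.h>

void store_through(int *p) {
    *p = 1;                     /* p must be non-null for this store to be valid */
    intptr_t v = (intptr_t)p;   /* like a ptrtoint: the value is unchanged       */
    int *q = (int *)v;          /* like an inttoptr: same value, so also non-null */
    *q = 2;                     /* a null check on q here would be redundant     */
}
```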

Differential Revision: https://reviews.llvm.org/D54956

llvm-svn: 352293
2019-01-26 23:40:35 +00:00
Simon Pilgrim 37a8e65a60 [X86] combineCarryThroughADD - add support for X86::COND_A commutations (PR24545)
As discussed on PR24545, we should try to commute X86::COND_A 'icmp ugt' cases to X86::COND_B 'icmp ult' to more optimally bind the carry flag output to a SBB instruction.

Differential Revision: https://reviews.llvm.org/D57281

llvm-svn: 352289
2019-01-26 20:23:04 +00:00
Simon Pilgrim b7a15acd38 [X86] Fold X86ISD::SBB(ISD::SUB(X,Y),0) -> X86ISD::SBB(X,Y) (PR25858)
We often generate X86ISD::SBB(X, 0) for carry flag arithmetic.

I had tried to create test cases for the ADC equivalent (which often uses the same pattern) but haven't managed to find anything yet.

Differential Revision: https://reviews.llvm.org/D57169

llvm-svn: 352288
2019-01-26 20:13:44 +00:00
Amaury Sechet be03018384 Generate test results for combine-fcopysign.ll using update_llc_test_checks.py . NFC
llvm-svn: 352285
2019-01-26 18:13:53 +00:00
Simon Pilgrim 6162fba57c [X86][SSE] Generalized unsigned compares to support nonsplat constant vectors (PR39859)
llvm-svn: 352283
2019-01-26 16:40:03 +00:00
Simon Pilgrim 7d6c58e843 [X86] Add nonsplat increment/decrement constant vector with min/max test (PR39859)
llvm-svn: 352281
2019-01-26 16:27:48 +00:00
Simon Pilgrim 0199838883 [X86] Add test case from PR34292
llvm-svn: 352274
2019-01-26 13:56:53 +00:00
Simon Pilgrim c9d33907ef [llvm-mca][X86] Add some missing DQI tests
Match more of the coverage of test\CodeGen\X86\avx512-schedule.ll as discussed on D57244 

llvm-svn: 352273
2019-01-26 13:00:46 +00:00
Simon Pilgrim 3cdf3f681d [X86] Add 'less_than_ideal' followup test case from PR24545
llvm-svn: 352272
2019-01-26 12:51:52 +00:00
Craig Topper 21cdcd7b2b [X86] Autoupgrade some of the intrinsics used by stack folding tests that have been previously removed.
llvm-svn: 352271
2019-01-26 06:27:04 +00:00
Craig Topper 3b5e01b386 [X86] Remove and autoupgrade vpconflict intrinsics that take a mask and passthru argument.
We have unmasked versions as of r352172

llvm-svn: 352270
2019-01-26 06:27:01 +00:00
Craig Topper 58e6b37e62 Revert r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer"
This might be breaking an lldb windows buildbot.

llvm-svn: 352268
2019-01-26 02:44:58 +00:00
Craig Topper 6c9c7d0796 [X86] Remove GCCBuiltins from 512-bit cvt(u)qqtops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics. Add new variadic uitofp/sitofp with rounding mode intrinsics.
Summary: See clang patch D56998 for a full description.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56999

llvm-svn: 352266
2019-01-26 02:41:54 +00:00
Matt Arsenault cdc201fcde GlobalISel: Fix address space limit in LLT
The IR enforced limit for the address space is 24-bits, but LLT was
only using 23-bits. Additionally, the argument to the constructor was
truncating to 16-bits.

A similar problem still exists for the number of vector elements. The
IR enforces no limit, so if you try to use a vector with > 65535
elements the IRTranslator asserts in the LLT constructor.

llvm-svn: 352264
2019-01-26 01:42:13 +00:00
Nemanja Ivanovic 7d007ddedf [PowerPC] Update Vector Costs for P9
For the power9 CPU, vector operations consume a pair of execution units rather
than one execution unit like a scalar operation. Update the target transform
cost functions to reflect the higher cost of vector operations when targeting
Power9.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D55461

llvm-svn: 352261
2019-01-26 01:18:48 +00:00
Craig Topper 7a8e74775c [X86] Add DAG combine to merge vzext_movl with the various fp<->int conversion operations that only write the lower 64-bits of an xmm register and zero the rest.
Summary: We have isel patterns for this, but we're missing some load patterns and all broadcast patterns. A DAG combine seems like a better fit for this.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56971

llvm-svn: 352260
2019-01-26 01:17:09 +00:00
Vedant Kumar 8ca0875617 [llvm-nm] Print out N_COLD_FUNC as "cold func"
Per post-commit feedback from Mike, have llvm-nm print out this symbol
attribute as "[cold func]".

llvm-svn: 352258
2019-01-26 00:33:15 +00:00
Artem Belevich dfad526943 [NVPTX] Some nvvm.read.ptx.sreg intrinsics should have IntrInaccessibleMemOnly attribute.
These intrinsics may return different values every time they are called
and should not be CSE'd. IntrInaccessibleMemOnly appears to be the right
attribute to model this behavior.

Differential Revision: https://reviews.llvm.org/D57259

llvm-svn: 352256
2019-01-26 00:28:32 +00:00
Craig Topper b1d3457c03 [SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer
Summary:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.

This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization.

On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57186

llvm-svn: 352255
2019-01-26 00:26:37 +00:00
Alex Bradbury 0092df0669 [RISCV] Add target DAG combine for bitcast fabs/fneg on RV32FD
DAGCombiner::visitBITCAST will perform:
 fold (bitconvert (fneg x)) -> (xor (bitconvert x), signbit)
 fold (bitconvert (fabs x)) -> (and (bitconvert x), (not signbit))

As shown in double-bitmanip-dagcombines.ll, this can be advantageous. But
RV32FD doesn't use bitcast directly (as i64 isn't a legal type), and instead
uses RISCVISD::SplitF64. This patch adds an equivalent DAG combine for
SplitF64.
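
For reference, a bit-level C sketch of what the two folds above compute for an IEEE-754 double (illustration only, not the DAG combine itself):

```
#include <stdint.h>

#define DOUBLE_SIGN_BIT 0x8000000000000000ULL

/* (bitconvert (fneg x)) == (xor (bitconvert x), signbit) */
uint64_t fneg_bits(uint64_t x) { return x ^ DOUBLE_SIGN_BIT; }

/* (bitconvert (fabs x)) == (and (bitconvert x), (not signbit)) */
uint64_t fabs_bits(uint64_t x) { return x & ~DOUBLE_SIGN_BIT; }
```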

llvm-svn: 352247
2019-01-25 21:55:48 +00:00
Mircea Trofin 519f42d914 [llvm] Opt-in flag for X86DiscriminateMemOps
Summary:
Currently, if an instruction with a memory operand has no debug information,
X86DiscriminateMemOps will generate one based on the first line of the
enclosing function, or the last seen debug info.

This may cause confusion in certain debugging scenarios. The long term
approach would be to use the line number '0' in such cases; however, that
brings in challenges: the base discriminator value range is limited
(4096 values).

For the short term, adding an opt-in flag for this feature.

See bug 40319 (https://bugs.llvm.org/show_bug.cgi?id=40319)

Reviewers: dblaikie, jmorse, gbedwell

Reviewed By: dblaikie

Subscribers: aprantl, eraman, hiraditya

Differential Revision: https://reviews.llvm.org/D57257

llvm-svn: 352246
2019-01-25 21:49:54 +00:00
Alex Bradbury d760910d3d [RISCV] Add another potential combine to {double,float}-bitmanip-dagcombines.ll
(fcopysign a, (fneg b)) will be expanded to bitwise operations by
DAGTypeLegalizer::SoftenFloatRes_FCOPYSIGN if the floating point type isn't
legal. Arguably it might be worth doing a combine even if it is legal.

llvm-svn: 352240
2019-01-25 21:06:47 +00:00
Ana Pazos 05a6064385 Reapply: [RISCV] Set isAsCheapAsAMove for ADDI, ORI, XORI, LUI
This reapplies commit r352010 with RISC-V test fixes.

llvm-svn: 352237
2019-01-25 20:22:49 +00:00
Guozhi Wei 81f3fd4bf8 [MBP] Don't move bottom block before header if it can't reduce taken branches
If the bottom block BB has only one successor OldTop, in most cases it is profitable to move it before OldTop, except in the following case:

-->OldTop<-
|    .    |
|    .    |
|    .    |
---Pred   |
     |    |
    BB-----

Moving BB before OldTop can't reduce the number of taken branches here; this patch detects this case and prevents the move.

Differential Revision: https://reviews.llvm.org/D57067

llvm-svn: 352236
2019-01-25 19:45:13 +00:00
Craig Topper 4cf28bad5b [X86] Combine masked store and truncate into masked truncating stores.
We also need to combine to masked truncating-with-saturation stores, but I'm leaving that for a future patch.

This does regress some tests that used truncate with saturation followed by a masked store. Those now use a truncating store and use min/max to saturate.

Differential Revision: https://reviews.llvm.org/D57218

llvm-svn: 352230
2019-01-25 18:37:36 +00:00
Vedant Kumar db3f9774ee [HotColdSplit] Introduce a cost model to control splitting behavior
The main goal of the model is to avoid *increasing* function size, as
that would eradicate any memory locality benefits from splitting. This
happens when:

  - There are too many inputs or outputs to the cold region. Argument
    materialization and reloads of outputs have a cost.

  - The cold region has too many distinct exit blocks, causing a large
    switch to be formed in the caller.

  - The code size cost of the split code is less than the cost of a
    set-up call.

A secondary goal is to prevent excessive overall binary size growth.

With the cost model in place, I experimented to find a splitting
threshold that works well in practice. To make warm & cold code easily
separable for analysis purposes, I moved split functions to a "cold"
section. I experimented with thresholds between [0, 4] and set the
default to the threshold which minimized geomean __text size.

Experiment data from building LNT+externals for X86 (N = 639 programs,
all sizes in bytes):

| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os**       | 1736.3           | 0, n=0           | 10961.6        |
| -Os, thresh=0 | 1740.53          | 124.482, n=134   | 11014          |
| -Os, thresh=1 | 1734.79          | 57.8781, n=90    | 10978.6        |
| -Os, thresh=2 | ** 1733.85 **    | 65.6604, n=61    | 10977.6        |
| -Os, thresh=3 | 1733.85          | 65.3071, n=61    | 10977.6        |
| -Os, thresh=4 | 1735.08          | 67.5156, n=54    | 10965.7        |
| **-Oz**       | 1554.4           | 0, n=0           | 10153          |
| -Oz, thresh=2 | ** 1552.2 **     | 65.633, n=61     | 10176          |
| **-O3**       | 2563.37          | 0, n=0           | 13105.4        |
| -O3, thresh=2 | ** 2559.49 **    | 71.1072, n=61    | 13162.4        |

Picking thresh=2 reduces the geomean __text section size by 0.14% at
-Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that
TEXT size is page-aligned, whereas section sizes are byte-aligned.

Experiment data from building LNT+externals for ARM64 (N = 558 programs,
all sizes in bytes):

| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os**       | 1763.96          | 0, n=0           | 42934.9        |
| -Os, thresh=2 | ** 1760.9 **     | 76.6755, n=61    | 42934.9        |

Picking thresh=2 reduces the geomean __text section size by 0.17% at
-Os and causes no growth in the TEXT segment.

Measurements were done with D57082 (r352080) applied.

Differential Revision: https://reviews.llvm.org/D57125

llvm-svn: 352228
2019-01-25 18:30:37 +00:00
Vedant Kumar 13ef84fced [MC] Teach the MachO object writer about N_FUNC_COLD
N_FUNC_COLD is a new MachO symbol attribute. It's a hint to the linker
to order a symbol towards the end of its section, to improve locality.

Example:

```
void a1() {}
__attribute__((cold)) void a2() {}
void a3() {}
int main() {
  a1();
  a2();
  a3();
  return 0;
}
```

A linker that supports N_FUNC_COLD will order _a2 to the end of the text
section. From `nm -njU` output, we see:

```
_a1
_a3
_main
_a2
```

Differential Revision: https://reviews.llvm.org/D57190

llvm-svn: 352227
2019-01-25 18:30:22 +00:00
Florian Hahn fd7ee47940 [opt-viewer] Add javascript to expand/hide full message for multiline remarks.
This patch adds support for displaying remarks with multiple
lines. For such remarks, it creates a hidden div
containing the message's lines except the first one in a <pre>
tag. It also prepends a link (with '+' as text) to the regular remark
line. This link can be used to show/hide the div containing the
full remark.

In combination with D57159, this allows for better displaying of
multiline remarks in the html pages generated by opt-viewer.

The Javascript is very simple and should be supported by any recent
major browser.

Reviewers: hfinkel, anemet, thegameg, serge-sans-paille

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D57167

llvm-svn: 352223
2019-01-25 17:48:31 +00:00
Alex Bradbury c67515d542 [RISCV][NFC] s/f32/f64 in double-arith.ll
The intrinsic names erroneously used the .f32 variant. As the return and
argument types were still double, the intrinsic calls worked properly.

llvm-svn: 352211
2019-01-25 16:04:04 +00:00
Simon Pilgrim f56298f4b9 [X86] Simplify X86ISD::ADD/SUB if we don't use the result flag
Simplify to the generic ISD::ADD/SUB if we don't make use of the result flag.

This mainly helps with ADDCARRY/SUBBORROW intrinsics which get expanded to X86ISD::ADD/SUB but could be simplified further.
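
For illustration, multi-word addition is one common source of these carry nodes; a hypothetical C example (not part of this patch), using the standard x86 _addcarry_u64 intrinsic:

```
#include <immintrin.h>

/* 128-bit addition chained through the carry flag; this kind of code is
   where ADDCARRY/SUBBORROW-style nodes typically come from. */
void add128(unsigned long long a[2], const unsigned long long b[2]) {
    unsigned char carry = _addcarry_u64(0, a[0], b[0], &a[0]);
    (void)_addcarry_u64(carry, a[1], b[1], &a[1]);
}
```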

Noticed in some of the test cases in PR31754

Differential Revision: https://reviews.llvm.org/D57234

llvm-svn: 352210
2019-01-25 15:58:28 +00:00
Sanjay Patel 21aa6ddc14 [x86] narrow a shuffle that doesn't use or set any high elements
This isn't the final fix for our reduction/horizontal codegen, but it takes care 
of a lot of the problems. After we narrow the shuffle, existing combines for 
insert/extract and binops kick in, and we end up with cheaper 128-bit ops.

The avg and mul reduction tests show an existing shuffle lowering hole for 
AVX2/AVX512. I think in its most minimal form this is:
https://bugs.llvm.org/show_bug.cgi?id=40434
...but we might need multiple fixes to get it right.

Differential Revision: https://reviews.llvm.org/D57156

llvm-svn: 352209
2019-01-25 15:37:42 +00:00
Alex Bradbury 38c4ec31cb [RISCV] Add tests to demonstrate bitcasted fneg/fabs dagcombines
This target-independent code won't trigger for cases such as RV32FD where
custom SelectionDAG nodes are generated. These new tests demonstrate such
cases. Additionally, float-arith.ll was updated so that fneg.s, fsgnjn.s, and
fabs.s selection patterns are actually exercised.

llvm-svn: 352199
2019-01-25 14:33:08 +00:00
Simon Pilgrim d41ccddda9 [X86] Add addcarry/subborrow combine tests
Show failure to simplify cases with zero op/flags

llvm-svn: 352196
2019-01-25 12:26:27 +00:00
James Henderson 759d5e6783 [llvm-symbolizer] Add switch to adjust addresses by fixed offset
If a stack trace or similar has a list of addresses from an executable
or DSO loaded at a variable address (e.g. due to ASLR), the addresses
will not directly correspond to the addresses stored in the object file.
If a user wishes to use llvm-symbolizer, they have to subtract the load
address from every address. This is somewhat inconvenient, especially as
the output of --print-address will result in the adjusted address being
listed, rather than the address coming from the stack trace, making it
harder to map results between the two.

This change adds a new switch to llvm-symbolizer --adjust-vma which
takes an offset, which is then used to automatically do this
calculation. The printed address remains the input address (allowing for
easy mapping), whilst the specified offset is applied to the addresses
when performing the lookup.

The switch is conceptually similar to llvm-objdump's new switch of the
same name (see D57051), which in turn mirrors a GNU switch. There is no
equivalent switch in addr2line.

Reviewed by: grimar

Differential Revision: https://reviews.llvm.org/D57151

llvm-svn: 352195
2019-01-25 11:49:21 +00:00
Max Kazantsev 7822d25de3 [NFC] One more crashing test on LoopSimplifyCFG
llvm-svn: 352194
2019-01-25 11:47:16 +00:00
Max Kazantsev e5116e9b4a [NFC] Add failing test on LCSSA forming
llvm-svn: 352190
2019-01-25 11:32:21 +00:00
Diana Picus 8976ad12a9 [ARM GlobalISel] Support shifts for Thumb2
Same as ARM.

On this occasion we split some of the instruction select tests for more
complicated instructions into their own files, so we can reuse them for
ARM and Thumb mode. Likewise for the legalizer tests.

llvm-svn: 352188
2019-01-25 10:48:42 +00:00
Javed Absar a3e3d85286 [TblGen] Extend !if semantics through new feature !cond
This patch extends the TableGen language with the !cond operator.
Instead of embedding !if inside !if, which can get cumbersome,
one can now use !cond.
Below is an example that converts an integer 'x' into a string:

    !cond(!lt(x,0) : "Negative",
          !eq(x,0) : "Zero",
          !eq(x,1) : "One",
          1        : "MoreThanOne")

Reviewed By: hfinkel, simon_tatham, greened
Differential Revision: https://reviews.llvm.org/D55758

llvm-svn: 352185
2019-01-25 10:25:25 +00:00
Douglas Yung 914e838e63 [llvm-objcopy] Add support for -g as an alias for --strip-debug
This change adds an option -g to llvm-objcopy which is an alias for the existing option --strip-debug.

This fixes PR40003.

Reviewed by: alexshap

Differential Revision: https://reviews.llvm.org/D57217

llvm-svn: 352182
2019-01-25 09:57:20 +00:00
Simon Pilgrim d36f7730cd [llvm-mca][X86] Add missing shuffle tests
Match the coverage of test\CodeGen\X86\avx512-shuffle-schedule.ll so we can get rid of -print-schedule (and fix PR37160) without losing schedule tests

llvm-svn: 352179
2019-01-25 09:17:30 +00:00
Anton Korobeynikov 509d5c4a7d [MSP430] Fix absolute addressing mode printing in AsmPrinter
Align checks for absolute addressing mode with its current
implementation (SR is used as a base register).

This fixes https://bugs.llvm.org/show_bug.cgi?id=39993

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56785

llvm-svn: 352178
2019-01-25 09:14:05 +00:00
Max Kazantsev 6f2a0c6827 [NFC] Add test with multiple loops
llvm-svn: 352176
2019-01-25 08:46:00 +00:00
Zi Xuan Wu 308a609c6e [PowerPC] Enhance the fast selection of cmp instruction and clean up related asserts
Fast selection of llvm icmp and fcmp instructions does not handle VSX instruction support well.

We'd use a VSX float comparison instruction instead of a non-VSX float comparison instruction
if the operand register class is VSSRC or VSFRC, because i32 and i64 are mapped to VSSRC and
VSFRC correspondingly when the VSX feature is enabled.

If the target does not have a corresponding VSX comparison instruction for some type,
just copy the VSX-related register to the common float register class and use the non-VSX comparison instruction.

Differential Revision: https://reviews.llvm.org/D57078

llvm-svn: 352174
2019-01-25 07:24:59 +00:00
Alex Bradbury 456d3798d6 [RISCV] Custom-legalise i32 SDIV/UDIV/UREM on RV64M
Follow the same custom legalisation strategy as used in D57085 for
variable-length shifts (see that patch summary for more discussion). Although
we may lose out on some late-stage DAG combines, I think this custom
legalisation strategy is ultimately easier to reason about.

There are some codegen changes in rv64m-exhaustive-w-insts.ll but they are all
neutral in terms of the number of instructions.

Differential Revision: https://reviews.llvm.org/D57096

llvm-svn: 352171
2019-01-25 05:11:34 +00:00
Max Kazantsev 38cd9acbb9 [LoopSimplifyCFG] Fix inconsistency in blocks in loop markup
2nd part of D57095, for the same reason, just in another place. We never
fold branches that are not immediately in the current loop, but this check
is missing in `IsEdgeLive`. As a result, it may think that an edge in a subloop is
dead while it's live. It's a pessimization as things currently stand.

Differential Revision: https://reviews.llvm.org/D57147
Reviewed By: rupprecht	

llvm-svn: 352170
2019-01-25 05:05:02 +00:00
Alex Bradbury 299d690a50 [RISCV] Custom-legalise 32-bit variable shifts on RV64
The previous DAG combiner-based approach had an issue with infinite loops
between the target-dependent and target-independent combiner logic (see
PR40333). Although this was worked around in rL351806, the combiner-based
approach is still potentially brittle and can fail to select the 32-bit shift
variant when profitable to do so, as demonstrated in the pr40333.ll test case.

This patch instead introduces target-specific SelectionDAG nodes for
SHLW/SRLW/SRAW and custom-lowers variable i32 shifts to them. pr40333.ll is a
good example of how this approach can improve codegen.

This adds a DAG combine that does SimplifyDemandedBits on the operands (only the
lower 32 bits of the first operand and the lower 5 bits of the second operand are read).
This seems better than implementing SimplifyDemandedBitsForTargetNode as there
is no guarantee that would be called (and it's not for e.g. the anyext return
test cases). Also implements ComputeNumSignBitsForTargetNode.
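
As a rough C sketch of the demanded-bits observation above (assuming the usual W-instruction behavior of reading only the low 32 bits of the value and the low 5 bits of the shift amount; illustration only):

```
#include <stdint.h>

/* Only the low 32 bits of `value` and the low 5 bits of `amount` can affect
   the result of a 32-bit ("W") shift; the 32-bit result is sign-extended. */
int64_t shlw_like(int64_t value, int64_t amount) {
    uint32_t lo = (uint32_t)value;        /* low 32 bits of the first operand */
    unsigned sh = (unsigned)amount & 31;  /* low 5 bits of the second operand */
    return (int64_t)(int32_t)(lo << sh);  /* sign-extend the 32-bit result    */
}
```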

There are codegen changes in atomic-rmw.ll and atomic-cmpxchg.ll but the new
instruction sequences are semantically equivalent.

Differential Revision: https://reviews.llvm.org/D57085

llvm-svn: 352169
2019-01-25 05:04:00 +00:00
Matt Arsenault 3e08b772b3 AMDGPU/GlobalISel: Scalarize add/sub
llvm-svn: 352167
2019-01-25 04:53:57 +00:00
Matt Arsenault e6cebd0d69 GlobalISel: fewerElementsVector for more cast types
llvm-svn: 352166
2019-01-25 04:37:33 +00:00
Matt Arsenault 95fd95cfe0 GlobalISel: fewerElementsVector for a few more trivial ops
llvm-svn: 352165
2019-01-25 04:03:38 +00:00
Matt Arsenault 5d622fbcc1 AMDGPU/GlobalISel: Legalize smulh/umulh and scalarize mul
llvm-svn: 352162
2019-01-25 03:23:04 +00:00
Vedant Kumar 65de025d64 [HotColdSplit] Split more aggressively before/after cold invokes
While a cold invoke itself and its unwind destination can't be
extracted, code which unconditionally executes before/after the invoke
may still be profitable to extract.

With cost model changes from D57125 applied, this gives a 3.5% increase
in split text across LNT+externals on arm64 at -Os.

llvm-svn: 352160
2019-01-25 03:22:23 +00:00
Matt Arsenault 1b1e685f10 GlobalISel: Support fewerElementsVector for icmp/fcmp
Also legalize 64-bit compares for AMDGPU

llvm-svn: 352157
2019-01-25 02:59:34 +00:00
Matt Arsenault ca676343a9 GlobalISel: Implement fewerElementsVector for extensions
llvm-svn: 352155
2019-01-25 02:36:32 +00:00
Peter Collingbourne 1a8acfb768 hwasan: If we split the entry block, move static allocas back into the entry block.
Otherwise they are treated as dynamic allocas, which ends up increasing
code size significantly. This reduces the size of Chromium base_unittests
by 2MB (6.7%).

Differential Revision: https://reviews.llvm.org/D57205

llvm-svn: 352152
2019-01-25 02:08:46 +00:00
Bob Haarman 6710cc7db5 simplify COFF module assembly test and move it to Object
Reviewers: pcc, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57192

llvm-svn: 352142
2019-01-25 00:33:05 +00:00
Vedant Kumar a48cd9aedd Try to address Windows bot failure after r352080
See the bot error message reported in https://reviews.llvm.org/D57082.

Avoid trying to match full class names in -debug-pass-manager output,
because they aren't portable.

llvm-svn: 352138
2019-01-25 00:15:16 +00:00
Nemanja Ivanovic b9b75de0ae [PowerPC] Exploit store instructions that store a single vector element
This patch exploits the instructions that store a single element from a vector
to perform a (store (extract_elt)). We already have code that does this with
ISA 3.0 instructions that were added to handle i8/i16 types. However, we had
never exploited the existing ones that handle f32/f64/i32/i64 types.

Differential revision: https://reviews.llvm.org/D56175

llvm-svn: 352131
2019-01-24 23:44:28 +00:00
Aditya Nandakumar 3ba0d94bce [GISel]: Change how CSE is enabled by default for each pass
https://reviews.llvm.org/D57178

Now add a hook in TargetPassConfig to query whether CSE needs to be
enabled. By default this hook returns false only at the O0 opt level, but
this can be overridden by the target.
As a consequence of CSE being enabled by default above O0, a few tests
needed to be updated to not use CSE (by passing -O0) in the run
line.

reviewed by: arsenm

llvm-svn: 352126
2019-01-24 23:11:25 +00:00
Matt Arsenault baa5d2e69c RegBankSelect: Support some more complex part mappings
llvm-svn: 352123
2019-01-24 22:47:04 +00:00
Armando Montanez 8367b0750f [elfabi] Add support for reading dynamic symbols from binaries
This patch adds initial support for reading dynamic symbols from ELF binaries. Currently, STT_NOTYPE, STT_OBJECT, STT_FUNC, and STT_TLS are explicitly supported. Other symbol types are mapped to ELFSymbolType::Unknown to improve signal/noise ratio.

Symbols must meet two criteria to be read into an ELFStub:

 - The symbol's binding must be STB_GLOBAL or STB_WEAK.
 - The symbol's visibility must be STV_DEFAULT or STV_PROTECTED.

This filters out symbols that aren't of interest during compile-time linking against a shared object.

This change uses DT_HASH and DT_GNU_HASH to determine the size of .dynsym. Using hash tables to determine the number of symbols in .dynsym allows llvm-elfabi to work on binaries without relying on section headers.

Differential Revision: https://reviews.llvm.org/D56031

llvm-svn: 352121
2019-01-24 22:39:21 +00:00
Jessica Paquette 245047dfe8 [GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceil
This patch adds support for vector @llvm.ceil intrinsics when full 16 bit
floating point support isn't available.

To do this, this patch...

- Implements basic isel for G_UNMERGE_VALUES
- Teaches the legalizer about 16 bit floats
- Teaches AArch64RegisterBankInfo to respect floating point registers on
  G_BUILD_VECTOR and G_UNMERGE_VALUES
- Teaches selectCopy about 16-bit floating point vectors

It also adds

- A legalizer test for the 16-bit vector ceil which verifies that we create a
  G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
- An instruction selection test which makes sure we lower to G_FCEIL when
  full fp16 is supported
- A test for selecting G_UNMERGE_VALUES

And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types
work as expected.

https://reviews.llvm.org/D56682

llvm-svn: 352113
2019-01-24 22:00:41 +00:00
Bob Haarman 38ebaf7d5d allow COFF .def directive in module assembly when using ThinLTO
Summary:
Using COFF's .def directive in module assembly used to crash ThinLTO
with "this directive only supported on COFF targets" when getting
symbol information in ModuleSymbolTable.  This change allows
ModuleSymbolTable to process such code and adds a test to verify that
the .def directive has the desired effect on the native object file,
with and without ThinLTO.

Fixes https://bugs.llvm.org/show_bug.cgi?id=36789

Reviewers: rnk, pcc, vlad.tsyrklevich

Subscribers: mehdi_amini, eraman, hiraditya, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D57073

llvm-svn: 352112
2019-01-24 21:41:03 +00:00
Eli Friedman 525ef0159d [Analysis] Fix isSafeToLoadUnconditionally handling of volatile.
A volatile operation cannot be used to prove an address points to normal
memory.  (LangRef was recently updated to state it explicitly.)

Differential Revision: https://reviews.llvm.org/D57040

llvm-svn: 352109
2019-01-24 21:31:13 +00:00
Michael Trent f4c902bd77 Limit dyld image suffixes guessed by guessLibraryShortName()
Summary:
guessLibraryShortName() separates a full Mach-O dylib install name path
into a short name and a dyld image suffix. The short name is the name
of the dylib without its path or extension. The dyld image suffix is a
string used by dyld to load variants of dylibs if available at runtime;
for example, "when binding this process, load 'debug' variants of all
required dylibs." dyld knows exactly what the image suffix is, but
by convention diagnostic tools such as llvm-nm attempt to guess suffix
names by looking at the install name path. 

These dyld image suffixes are separated from the short name by a '_'
character. Because the '_' character is commonly used to separate words
in filenames, guessLibraryShortName() cannot reliably separate a dylib's
short name from an arbitrary image suffix; imagine if both the short
name and the suffix contain an '_' character! To better deal with this
ambiguity, guessLibraryShortName() will recognize only "_debug" and
"_profile" as valid Suffix values. Calling code needs to be tolerant of
guessLibraryShortName() guessing incorrectly.

The previous implementation of guessLibraryShortName() did not allow
'_' characters to appear in short names. When present, the short name
would be  truncated, e.g., "libcompiler_rt" => "libcompiler". This
change allows "libcompiler_rt" and "libcompiler_rt_debug" to both be
recognized as "libcompiler_rt".
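
A minimal C sketch of that suffix rule (a hypothetical helper for illustration, not the actual LLVM implementation):

```
#include <string.h>

/* Treat only "_debug" and "_profile" as dyld image suffixes; any other
   trailing "_..." is considered part of the short name. */
static const char *guess_image_suffix(const char *short_name) {
    const char *us = strrchr(short_name, '_');
    if (us && (strcmp(us, "_debug") == 0 || strcmp(us, "_profile") == 0))
        return us;   /* "libcompiler_rt_debug" -> "_debug"  */
    return NULL;     /* "libcompiler_rt"       -> no suffix */
}
```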

rdar://47412244

Reviewers: kledzik, lhames, pete

Reviewed By: pete

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56978

llvm-svn: 352104
2019-01-24 20:59:44 +00:00
Alina Sbirlea 52f6e2a173 [MemorySSA +LICM CFHoist] Solve PR40317.
Summary:
MemorySSA needs updating each time an instruction is moved.
LICM and control flow hoisting re-hoist instructions, thus needing another update when re-moving those instructions.
Pending cleanup: the MSSA update is duplicated; it should be moved inside moveInstructionBefore.

Reviewers: jnspaulsson

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D57176

llvm-svn: 352092
2019-01-24 19:48:35 +00:00
Philip Reames 4775721dda Test cases for demanded elements on vector GEPs
This is the first part of splitting apart https://reviews.llvm.org/D57140 into usable pieces. Landing the tests in advance of posting a review specifically for the demanded elements part.

llvm-svn: 352091
2019-01-24 19:35:28 +00:00
Simon Pilgrim c12a634326 [X86] Regenerate SBB test to fix buildbots.
Some local WIP code unexpectedly managed to get in the way.

llvm-svn: 352081
2019-01-24 18:57:48 +00:00
Vedant Kumar ef1ebed1c6 [HotColdSplit] Move splitting earlier in the pipeline
Performing splitting early has several advantages:

  - Inhibiting inlining of cold code early improves code size. Compared
    to scheduling splitting at the end of the pipeline, this cuts code
    size growth in half within the iOS shared cache (0.69% to 0.34%).

  - Inhibiting inlining of cold code improves compile time. There's no
    need to inline split cold functions, or to inline as much *within*
    those split functions as they are marked `minsize`.

  - During LTO, extra work is only done in the pre-link step. Less code
    must be inlined during cross-module inlining.

An additional motivation here is that the most common cold regions
identified by the static/conservative splitting heuristic can (a) be
found before inlining and (b) do not grow after inlining. E.g.
__assert_fail, os_log_error.

The disadvantages are:

  - Some opportunities for splitting out cold code may be missed. This
    gap can potentially be narrowed by adding a worklist algorithm to the
    splitting pass.

  - Some opportunities to reduce code size may be lost (e.g. store
    sinking, when one side of the CFG diamond is split). This does not
    outweigh the code size benefits of splitting earlier.

On net, splitting early in the pipeline has substantial code size
benefits, and no major effects on memory locality or performance. We
measured memory locality using ktrace data, and consistently found that
10% fewer pages were needed to capture 95% of text page faults in key
iOS benchmarks. We measured performance on frequency-stabilized iOS
devices using LNT+externals.

This reverses course on the decision made to schedule splitting late in
r344869 (D53437).

Differential Revision: https://reviews.llvm.org/D57082

llvm-svn: 352080
2019-01-24 18:55:49 +00:00
James Y Knight 2c36240a82 Fix emission of _fltused for MSVC.
It should be emitted when any floating-point operations (including
calls) are present in the object, not just when calls to printf/scanf
with floating point args are made.

The difference caused by this is very subtle: in static (/MT) builds,
on x86-32, in a program that uses floating point but doesn't print it,
the default x87 rounding mode may not be set properly upon
initialization.

This commit also removes the walk of the types pointed to by pointer
arguments in calls. (To assist in opaque pointer types migration --
eventually the pointee type won't be available.)

That latter implies that it will no longer consider a call like
`scanf("%f", &floatvar)` as sufficient to emit _fltused on its
own. And without _fltused, `scanf("%f")` will abort with error R6002. This
new behavior is unlikely to bite anyone in practice (you'd have to
read a float, and do nothing with it!), and also, is consistent with
MSVC.

Differential Revision: https://reviews.llvm.org/D56548

llvm-svn: 352076
2019-01-24 18:34:00 +00:00
Simon Pilgrim f4a1b54097 [X86] Add PR25858 test cases
llvm-svn: 352075
2019-01-24 18:30:45 +00:00
Julian Lettner b62e9dc46b Revert "[Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls"
This reverts commit cea84ab93a.

llvm-svn: 352069
2019-01-24 18:04:21 +00:00
Philip Reames 86bbf7ccee [RS4GC] Expand/standardize tests introduced in rL352059
Write a couple of variations on vector geps w/both scalars and vectors live over safepoints.  Use update_test_checks to show all the IR.

llvm-svn: 352062
2019-01-24 16:45:23 +00:00
Philip Reames 4d683ee7e3 [RS4GC] Be slightly less conservative for gep vector_base, scalar_idx
After submitting https://reviews.llvm.org/D57138, I realized it was slightly more conservative than needed. The scalar indices don't appear to be a problem on a vector gep; we even had a test for that.

Differential Revision: https://reviews.llvm.org/D57161

llvm-svn: 352061
2019-01-24 16:34:00 +00:00
Philip Reames a657510eb7 [RS4GC] Avoid crashing on gep scalar_base, vector_idx
This is an alternative to https://reviews.llvm.org/D57103. After discussion, we decided to check this in as a temporary workaround, and pursue a true fix under the original thread.

The issue at hand is that the base rewriting algorithm doesn't consider the fact that GEPs can turn a scalar input into a vector of outputs. We had handling for scalar GEPs and fully vector GEPs (i.e. all vector operands), but not the scalar-base + vector-index forms. A true fix here requires treating GEP analogously to extractelement or shufflevector.

This patch is merely a workaround. It simply hides the crash at the cost of some ugly code gen for this presumably very rare pattern.

Differential Revision: https://reviews.llvm.org/D57138

llvm-svn: 352059
2019-01-24 16:08:18 +00:00
Sanjay Patel 55787a7e77 [x86] add tests for unpack shuffle lowering; NFC
https://bugs.llvm.org/show_bug.cgi?id=40434

llvm-svn: 352048
2019-01-24 14:12:34 +00:00
Simon Pilgrim 30b206b5da [CostModel][X86] Add SMUL fixed point cost tests
llvm-svn: 352046
2019-01-24 13:48:20 +00:00
Simon Pilgrim 47ca8606ba [TTI] Add generic SADDO/SSUBO costs
Added x86 scalar sadd_with_overflow/ssub_with_overflow costs.

llvm-svn: 352045
2019-01-24 13:36:45 +00:00
Simon Pilgrim a131e4e296 [TTI] Add generic UADDSAT/USUBSAT costs
Add generic cost calculation for the UADDSAT/USUBSAT intrinsics; this falls back to using the generic costs for uadd_with_overflow/usub_with_overflow plus a select.
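
A rough C sketch of the fallback expansion the cost is based on (illustration only, using the Clang/GCC overflow builtin):

```
/* uadd.sat modeled as uadd_with_overflow plus a select on the overflow bit. */
unsigned uadd_sat_via_overflow(unsigned a, unsigned b) {
    unsigned sum;
    return __builtin_uadd_overflow(a, b, &sum) ? ~0u : sum; /* UINT_MAX on overflow */
}
```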

Differential Revision: https://reviews.llvm.org/D56907

llvm-svn: 352044
2019-01-24 12:27:10 +00:00
Simon Pilgrim 2d1964b90f [TTI] Add generic UADDO/USUBO costs
Added x86 scalar uadd_with_overflow/usub_with_overflow costs.

Differential Revision: https://reviews.llvm.org/D56907

llvm-svn: 352043
2019-01-24 12:10:20 +00:00
Petar Avramovic 79df859685 [MIPS GlobalISel] Select zero extending and sign extending load
Select zero extending and sign extending load for MIPS32.
Use size from MachineMemOperand to determine number of bytes to load.

Differential Revision: https://reviews.llvm.org/D57099

llvm-svn: 352038
2019-01-24 10:27:21 +00:00
Petar Avramovic b5a939d246 [MIPS GlobalISel] Combine extending loads
Use CombinerHelper to combine extending load instructions.
G_LOAD combined with G_ZEXT, G_SEXT or G_ANYEXT gives G_ZEXTLOAD,
G_SEXTLOAD or G_LOAD with same type as def of extending instruction
respectively.
Similarly G_ZEXTLOAD combined with G_ZEXT gives G_ZEXTLOAD and
G_SEXTLOAD combined with G_SEXT gives G_SEXTLOAD with same type
as def of extending instruction.

Differential Revision: https://reviews.llvm.org/D56914

llvm-svn: 352037
2019-01-24 10:09:52 +00:00
Simon Atanasyan b6d3c50a36 Reapply: [mips] Handle MipsMCExpr sub-expression for the MEK_DTPREL tag
This reapplies commit r351987 with a fix for the failed test. Now the test
accepts both the DW_OP_GNU_push_tls_address and DW_OP_form_tls_address
opcodes.

Original commit message:
```
  This is a fix for a regression introduced by the rL348194 commit. In
  that change a new type (MEK_DTPREL) of MipsMCExpr expression was added,
  but in some places of the code this type of expression was considered
  unexpected.

  This change fixes the bug. The MEK_DTPREL type of expression is used for
  marking TLS DIEExpr only and contains a regular sub-expression. Where we
  need to handle the expression, we retrieve the sub-expression and
  handle it in a common way.
```

llvm-svn: 352034
2019-01-24 09:13:14 +00:00
Jonas Paulsson 5916dea338 [SystemZ] Remember to reset the NoPHIs property on MF in createPHIsForSelects()
After creating new PHI instructions during isel pseudo expansion, the NoPHIs
property of MF should be reset in case it was previously set.
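
Roughly, the reset looks like this (a minimal sketch, not the actual diff; the helper name is mine):

```
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"

using namespace llvm;

// Sketch: after emitting new PHI instructions during pseudo expansion,
// clear the NoPHIs property so later passes and the verifier stay consistent.
static void clearNoPHIs(MachineBasicBlock &MBB) {
  MachineFunction &MF = *MBB.getParent();
  MF.getProperties().reset(MachineFunctionProperties::Property::NoPHIs);
}
```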

Review: Ulrich Weigand
llvm-svn: 352030
2019-01-24 07:54:41 +00:00
Craig Topper e79b779fbb [X86] Add test cases for opportunities to fold a truncate and a masked store into a truncating masked store.
llvm-svn: 352027
2019-01-24 06:15:03 +00:00
Max Kazantsev 66f92df761 [NFC] Add another failing test on LoopSimplifyCFG
llvm-svn: 352026
2019-01-24 05:43:19 +00:00
Max Kazantsev 56515a2c76 [LoopSimplifyCFG] Fix inconsistency in live blocks markup
When we choose whether or not we should mark a block as dead, we have
inconsistent logic in the markup of live blocks.
- We take a block as a candidate IF its terminator branches on a constant AND the block
  is immediately within the current loop;
- We mark a successor live IF its predecessor's terminator doesn't branch on a constant OR it
  branches on a constant and the successor is the always-taken block.

What we are missing here is that when a terminator branches on a constant but its block is
not taken as a candidate because it is not immediately within the current loop, we will
mark only one (the always-taken) successor as live. Therefore, we do NOT do the actual
folding but may also fail to mark one of the successors as live. So the result of the markup
is wrong in this case, and we may then hit various asserts.

Thanks to Jordan Rupprecht for reporting this!

Differential Revision: https://reviews.llvm.org/D57095
Reviewed By: rupprecht

llvm-svn: 352024
2019-01-24 05:20:29 +00:00
Max Kazantsev 11d3314241 [NFC] Add a failing test on live block markup in term folding
llvm-svn: 352023
2019-01-24 05:05:55 +00:00
David Blaikie 7b585673d1 DebugInfo: Use assembly label arithmetic for address pool size for easier reading/editing
Recommits r350048 and r350050, which broke buildbots because of some typos in
the test case.

llvm-svn: 352019
2019-01-24 03:27:57 +00:00
David Blaikie 79c3d8b127 llvm-symbolizer: Extract individual test cases now that it's easier to use directly (without a piped input file)
Pulling out the split-dwarf tests by way of example of how I think
llvm-symbolizer should be tested going forward. Open to
debate/discussion, though.

llvm-svn: 352004
2019-01-24 01:19:17 +00:00
Julian Lettner cea84ab93a [Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls
Summary:
UBSan wants to detect when unreachable code is actually reached, so it
adds instrumentation before every `unreachable` instruction. However,
the optimizer will remove code after calls to functions marked with
`noreturn`. To avoid this UBSan removes `noreturn` from both the call
instruction as well as from the function itself. Unfortunately, ASan
relies on this annotation to unpoison the stack by inserting calls to
`__asan_handle_no_return` before `noreturn` functions. This is important
for functions that do not return but access the stack memory, e.g.,
unwinder functions *like* `longjmp` (`longjmp` itself is actually
"double-proofed" via its interceptor). The result is that when ASan and
UBSan are combined, the `noreturn` attributes are missing and ASan
cannot unpoison the stack, so it has false positives when stack
unwinding is used.

Changes:
  # UBSan now adds the `expect_noreturn` attribute whenever it removes
    the `noreturn` attribute from a function
  # ASan additionally checks for the presence of this attribute

Generated code:
```
call void @__asan_handle_no_return    // Additionally inserted to avoid false positives
call void @longjmp
call void @__asan_handle_no_return
call void @__ubsan_handle_builtin_unreachable
unreachable
```

The second call to `__asan_handle_no_return` is redundant. This will be
cleaned up in a follow-up patch.

rdar://problem/40723397

Reviewers: delcypher, eugenis

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D56624

llvm-svn: 352003
2019-01-24 01:06:19 +00:00
David Callahan d2eeb2516d Update entry count for cold calls
Summary:
Profile sample files include the number of times each entry or inlined
call site is sampled. This is translated into the entry count metadata
on functions.

When sample data is being read, if a call site that was inlined
in the sample program is considered cold and not inlined, then
the entry counts of the out-of-line functions do not reflect
the current compilation.

In this patch, we note call sites where the function was not inlined
and, as a last action of sample profile loading, we update the
called function's entry count to reflect the calls from these
call sites which are not included in the profile file.

Reviewers: danielcdh, wmi, Kader, modocache

Reviewed By: wmi

Subscribers: davidxl, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D52845

llvm-svn: 352001
2019-01-24 00:55:23 +00:00
Douglas Yung 7876c0ecf2 [llvm-symbolizer] Add support for -i and -inlines as aliases for -inlining
This change adds two options, -i and -inlines, as aliases for the -inlining option to llvm-symbolizer to improve compatibility with the GNU addr2line utility, which accepts these options.

It also modifies existing tests that use -inlining to exercise these new aliases as well.

This fixes PR40073.

Reviewed by: jhenderson, Quolyk, ruiu

Differential Revision: https://reviews.llvm.org/D57083

llvm-svn: 351999
2019-01-24 00:34:09 +00:00
Amara Emerson addb7ab2ae Revert "[mips] Handle MipsMCExpr sub-expression for the MEK_DTPREL tag"
This reverts commit r351987 as it broke some bots.

llvm-svn: 351998
2019-01-24 00:24:59 +00:00
Peter Collingbourne 020ce3f026 hwasan: Read shadow address from ifunc if we don't need a frame record.
This saves a cbz+cold call in the interceptor ABI, as well as a realign
in both ABIs, trading off a dcache entry against some branch predictor
entries and some code size.

Unfortunately the functionality is hidden behind a flag because ifunc is
known to be broken on static binaries on Android.

Differential Revision: https://reviews.llvm.org/D57084

llvm-svn: 351989
2019-01-23 22:39:11 +00:00
Simon Atanasyan 812f1c55b1 [mips] Handle MipsMCExpr sub-expression for the MEK_DTPREL tag
This is a fix for a regression introduced by the rL348194 commit. In
that change a new type (MEK_DTPREL) of MipsMCExpr expression was added,
but in some places of the code this type of expression was considered
unexpected.

This change fixes the bug. The MEK_DTPREL type of expression is used for
marking TLS DIEExpr only and contains a regular sub-expression. Where we
need to handle the expression, we retrieve the sub-expression and
handle it in a common way.

llvm-svn: 351987
2019-01-23 22:02:53 +00:00
Reid Kleckner f9ebacfd29 Revert r351938 "[ARM] Alter the register allocation order for minsize on Thumb2"
This change caused fatal backend errors when compiling a file in libvpx
for Android.

llvm-svn: 351979
2019-01-23 21:10:48 +00:00
Alexey Bataev 897129dc3f [DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target.
Enable full support for the debug info.

Differential revision: https://reviews.llvm.org/D46189

llvm-svn: 351974
2019-01-23 18:59:54 +00:00
Alexey Bataev 25624e2e5b Revert "[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target."
This reverts commit r351972. Some pieces of the patch were not applied
correctly.

llvm-svn: 351973
2019-01-23 18:48:36 +00:00
Alexey Bataev fe0b356063 [DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target.
Enable full support for the debug info. Recommit to fix the emission of
an unneeded closing brace.

Differential revision: https://reviews.llvm.org/D46189

llvm-svn: 351972
2019-01-23 18:28:59 +00:00
Craig Topper aa0e74c1fc [X86] Autogenerate complete checks. NFC
llvm-svn: 351970
2019-01-23 18:25:49 +00:00
James Henderson 25ce596cd1 [llvm-symbolizer] Improve compatibility of --functions with GNU addr2line
This fixes https://bugs.llvm.org/show_bug.cgi?id=40072.

GNU addr2line's --functions switch is off by default, has a short alias
of -f, and does not take an argument. This patch changes llvm-symbolizer
to support the second and third points (changing the default behaviour may
have negative impacts on users). If the option is missing a value, it
now treats it as "linkage".

This change does cause one previously valid command-line to behave
differently. Before, --functions <value> was accepted, but now only
--functions=<value> is allowed (as well as --functions). The old
behaviour will result in the value being treated as a positional
argument.

The previous testing for --functions=short has been pulled out into a
new test that also tests the other accepted values and option formats.

Reviewed by: ruiu

Differential Revision: https://reviews.llvm.org/D57049

llvm-svn: 351968
2019-01-23 17:27:48 +00:00
Haojian Wu 15a77418a9 Revert "[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target."
This reverts commit r351846.

This patch may generate illegal assembly code, see:

```
$ ./bin/clang -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-grtev4-linux-gnu -S -disable-free -disable-llvm-verifier -discard-value-names -main-file-name new.cc -mrelocation-model pic -pic-level 2 -mthread-model posix -fmerge-all-constants -mdisable-fp-elim -relaxed-aliasing -no-integrated-as -mpie-copy-relocations -munwind-tables -fcuda-is-device -target-feature +ptx60 -target-cpu sm_35 -dwarf-column-info -debug-info-kind=line-directives-only -dwarf-version=2 -debugger-tuning=gdb -o empty.s -x cuda empty.cc
$  cat empty.s
//
// Generated by LLVM NVPTX Back-End
//

.version 6.0
.target sm_35
.address_size 64

	}
```

llvm-svn: 351966
2019-01-23 16:39:57 +00:00
Andrea Di Biagio d768d35515 [MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit.
This patch adds a new ReadAdvance definition named ReadInt2Fpu.
ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by
data transfers from the integer unit to the floating point unit.
ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all
x86 models excluding BtVer2. That means this patch is a functional change
for the Jaguar cpu model only.

Tablegen definitions for instructions (V)PINSR* have been updated to account for
the new ReadInt2Fpu. That read is mapped to the GPR input operand.
On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch,
that extra delay was added to the opcode latency. In practice, the insert opcode
only executes for 1cy. Most of the latency is actually contributed by the
so-called operand latency. According to the AMD SOG for family 16h, (V)PINSR*
latency is defined by expression f+1, where f is defined as a forwarding delay
from the integer unit to the fpu.

When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC
(only when the flag -print-schedule is specified), we now need to account for any
extra forwarding delays. We do this by checking if scheduling classes declare
any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td:
"A negative advance effectively increases latency, which may be used for
cross-domain stalls". When computing the instruction latency for the purpose of
our scheduling tests, we now add any extra delay to the formula. This avoids
regressing existing codegen and mca schedule tests. It comes with the cost of an
extra (but very simple) hook in MCSchedModel.

Differential Revision: https://reviews.llvm.org/D57056

llvm-svn: 351965
2019-01-23 16:35:07 +00:00
James Henderson 21ed868390 [llvm-readelf] Don't suppress static symbol table with --dyn-symbols + --symbols
In r287786, a bug was introduced into llvm-readelf where it didn't print
the static symbol table if both --symbols and --dyn-symbols were
specified, even if there was no dynamic symbol table. This is obviously
incorrect.

This patch fixes this issue, by delegating the decision of which symbol
tables should be printed to the final dumper, rather than trying to
decide in the command-line option handling layer. The decision was made
to follow the approach taken in this patch because the LLVM style dumper
uses a different order to the original GNU style behaviour (and GNU
readelf) for ELF output. Other approaches resulted in behaviour changes
for other dumpers which felt wrong. In particular, I wanted to avoid
changing the order of the output for --symbols --dyn-symbols for LLVM
style, keep what is emitted by --symbols unchanged for all dumpers, and
avoid having different orders of .dynsym and .symtab dumping for GNU
"--symbols" and "--symbols --dyn-symbols".

Reviewed by: grimar, rupprecht

Differential Revision: https://reviews.llvm.org/D57016

llvm-svn: 351960
2019-01-23 16:15:39 +00:00
Simon Pilgrim f87226eb70 [IR] Match intrinsic parameter by scalar/vectorwidth
This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth.

The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth.

I've updated the _overflow intrinsics to demonstrate this - allowing it to return a i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type.

The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1>, so there is no change in behaviour.

Differential Revision: https://reviews.llvm.org/D57090

llvm-svn: 351957
2019-01-23 16:00:22 +00:00
Tim Renouf f64f8efe13 [AMDGPU] With XNACK, cannot clause a load with result coalesced with operand
Summary:
With XNACK, an smem load whose result is coalesced with an operand (thus
it overwrites its own operand) cannot appear in a clause, because some
other instruction might XNACK and restart the whole clause.

The clause breaker already realized that an smem that overwrites an
operand cannot appear in a clause, and broke the clause. The problem
that this commit fixes is that the SIFormMemoryClauses optimization
formed a bundle with early clobber, which caused the earlier code that
set up the coalesced operand to be removed as dead.

Differential Revision: https://reviews.llvm.org/D57008

Change-Id: I703c4d5b0bf7d6060222bec491f45c18bb3c0016
llvm-svn: 351950
2019-01-23 13:38:06 +00:00
Martin Storsjo 1be91958b3 [llvm-objcopy] [COFF] Fix handling of aux symbols for big objects
The aux symbols were stored in an opaque std::vector<uint8_t>,
with contents interpreted according to the rest of the symbol.

All aux symbol types but one fit in 18 bytes (sizeof(coff_symbol16)),
and if written to a bigobj, two extra padding bytes are written (as
sizeof(coff_symbol32) is 20). In the storage agnostic intermediate
representation, store the aux symbols as a series of coff_symbol16
sized opaque blobs. (In practice, all such aux symbols only consist
of one aux symbol, so this is more flexible than what reality needs.)
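
A rough sketch of that intermediate storage (the type and field names are hypothetical, not the actual llvm-objcopy classes):

```
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: each aux record is kept as an opaque, fixed-size blob of
// sizeof(coff_symbol16) == 18 bytes; a bigobj writer appends 2 padding bytes per record.
constexpr std::size_t AuxRecordSize = 18;
using AuxSymbolBlob = std::array<uint8_t, AuxRecordSize>;

struct IntermediateSymbol {
  // ... name, value, section number, etc. ...
  std::vector<AuxSymbolBlob> AuxData; // one opaque entry per aux symbol slot
};
```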

The special case is the file aux symbols, which are written in
potentially more than one aux symbol slot, without any padding,
as one single long string. This can't be stored in the same opaque
vector of fixed sized aux symbol entries. The file aux symbols will
occupy a different number of aux symbol slots depending on the type
of output object file. As nothing in the intermediate process needs
to have accurate raw symbol indices, updating that is moved into the
writer class.

Differential Revision: https://reviews.llvm.org/D57009

llvm-svn: 351947
2019-01-23 11:54:51 +00:00
Martin Storsjo 481334056f [llvm-objcopy] [COFF] Remove testcase debugging lines. NFC.
These are no longer necessary as the testcase now seems to run fine
on the buildbots that previously failed on this case, after SVN r351934.

llvm-svn: 351946
2019-01-23 11:54:36 +00:00
Jonas Paulsson 6046d087c5 [SystemZ] Fix test case for buildbot.
llvm-clang-x86_64-expensive-checks-win triggered this assert:

"llvm.dbg.value intrinsic requires a !dbg attachment"

Hopefully, adding reasonable !dbg operands solves this.

llvm-svn: 351939
2019-01-23 10:29:12 +00:00
David Green 6a858a9425 [ARM] Alter the register allocation order for minsize on Thumb2
Currently in Arm code, we allocate LR first, under the assumption that
it needs to be saved anyway. Unfortunately this has the disadvantage
that it will require any instructions using it to be the longer thumb2
instructions, not the shorter thumb1 ones.

This switches the order when we are optimising for minsize, returning to
the default order so that more lower registers can be used. It can end
up requiring more pushed registers, but on average produces smaller code.

Differential Revision: https://reviews.llvm.org/D56008

llvm-svn: 351938
2019-01-23 10:18:30 +00:00
Dmitry Venikov cce66874a8 [llvm-symbolizer] Allow single letter command flags grouping.
Summary: Currently llvm-symbolizer doesn't allow combining single-letter flags. This patch allows such grouping behavior, just like addr2line. Motivation: https://bugs.llvm.org/show_bug.cgi?id=40304

Reviewers: jhenderson, ruiu

Reviewed By: jhenderson

Subscribers: rupprecht, llvm-commits

Differential Revision: https://reviews.llvm.org/D57046

llvm-svn: 351936
2019-01-23 09:49:37 +00:00
Sam Parker 31bef63bb4 [ARM][CGP] Check trunc type before replacing
In the last stage of type promotion, we replace any zext that uses a
new trunc with the operand of the trunc. This is okay when we only
allowed one type to be optimised, but now it's the case that the trunc
may be needed to produce a narrower type than the one we were
optimising for. So we need to check this before doing the replacement.

Differential Revision: https://reviews.llvm.org/D57041

llvm-svn: 351935
2019-01-23 09:18:44 +00:00
Sam Parker 9a2a89d58f [DAGCombine] Enable more pre-indexed stores
The current check in CombineToPreIndexedLoadStore is too
conservative, preventing a pre-indexed store when the base pointer
is a predecessor of the value being stored. Instead, we should check
the pointer operand of the store.

Differential Revision: https://reviews.llvm.org/D56719

llvm-svn: 351933
2019-01-23 09:11:49 +00:00
Kristof Beyls e70ba0a1bc [SLH][AArch64] Remove accidentally retained -debug-only line from test.
llvm-svn: 351932
2019-01-23 09:10:12 +00:00
Martin Storsjo 12b6b80208 Reapply: [llvm-objcopy] [COFF] Implement --add-gnu-debuglink
This was reverted since it broke a couple of buildbots. The reason
for the breakage is not yet known, but this time the test has more
diagnostics added, to hopefully make it possible to figure out
what goes wrong.

Differential Revision: https://reviews.llvm.org/D57007

llvm-svn: 351931
2019-01-23 08:25:28 +00:00
Kristof Beyls 3ff5dfd735 [SLH] AArch64: correctly pick temporary register to mask SP
As part of speculation hardening, the stack pointer gets masked with the
taint register (X16) before a function call or before a function return.
Since there are no instructions that can directly mask writing to the
stack pointer, the stack pointer must first be transferred to another
register, where it can be masked, before that value is transferred back
to the stack pointer.
Before, that temporary register was always picked to be x17, since the
ABI allows clobbering x17 on any function call, resulting in the
following instruction pattern being inserted before function calls and
returns/tail calls:

mov x17, sp
and x17, x17, x16
mov sp, x17

However, x17 can be live in those locations, for example when the call
is an indirect call, using x17 as the target address (blr x17).

To fix this, this patch looks for an available register just before the
call or terminator instruction and uses that.

In the rare case when no register turns out to be available (this
situation is only encountered twice across the whole test-suite), just
insert a full speculation barrier at the start of the basic block where
this occurs.

Differential Revision: https://reviews.llvm.org/D56717

llvm-svn: 351930
2019-01-23 08:18:39 +00:00
Jonas Paulsson 961c47ec98 [SystemZ] Handle DBG_VALUE instructions in two places in backend.
Two backend optimizations failed to handle cases when compiled with -g, due
to failing to consider DBG_VALUE instructions. This was in
SystemZTargetLowering::emitSelect() and
SystemZElimCompare::getRegReferences().

This patch makes sure that DBG_VALUEs are recognized so that they do not
affect these optimizations.

Tests for branch-on-count, load-and-trap and consecutive selects.

Review: Ulrich Weigand
https://reviews.llvm.org/D57048

llvm-svn: 351928
2019-01-23 07:42:26 +00:00
Max Kazantsev d9aee3c0d1 [IRCE] Support narrow latch condition for wide range checks
This patch relaxes the restrictions on the types of the latch condition and the range check.
In the current implementation, they are required to match. This patch allows handling
wide range checks against a narrow condition. The motivating example is the
following:

  int N = ...
  for (long i = 0; (int) i < N; i++) {
    if (i >= length) deopt;
  }

In this patch, the option that enables this support is turned off by
default. We'll wait until it is switched to true.

Differential Revision: https://reviews.llvm.org/D56837
Reviewed By: reames

llvm-svn: 351926
2019-01-23 07:20:56 +00:00
Brendon Cahoon 59d9973146 [Pipeliner] Add two pragmas to control software pipelining optimization
#pragma clang loop pipeline(disable)

    Disables software pipelining (SWP) for the next loop.
    "disable" is the only possible value.

#pragma clang loop pipeline_initiation_interval(number)

    Sets the initiation interval used by the SWP optimization
    for the next loop to the specified number. The number must
    be a positive value greater than 0.

These pragmas can be used for debugging or for reducing compile
time. It is possible to disable SWP for specific loops to save
compilation time or to find bugs by not pipelining certain loops,
and to set the initiation interval to a concrete number to save
compilation time by skipping extra pipeliner passes or to check
the created schedule for a specific initiation interval.
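
A usage sketch (the loop bodies and the interval value are illustrative, not taken from the patch):

```
// C++ sketch: the new pragmas attach to the loop that follows them.
void scale(float *a, const float *b, int n) {
  // Skip software pipelining for this loop entirely.
  #pragma clang loop pipeline(disable)
  for (int i = 0; i < n; ++i)
    a[i] = b[i] * 2.0f;

  // Ask the pipeliner to use an initiation interval of 10 for this loop.
  #pragma clang loop pipeline_initiation_interval(10)
  for (int i = 0; i < n; ++i)
    a[i] += b[i];
}
```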

That is the LLVM part of the fix.

Clang part of fix: https://reviews.llvm.org/D55710

Patch by Alexey Lapshin!

Differential Revision: https://reviews.llvm.org/D56403

llvm-svn: 351923
2019-01-23 03:26:10 +00:00
Peter Collingbourne 73078ecd38 hwasan: Move memory access checks into small outlined functions on aarch64.
Each hwasan check requires emitting a small piece of code like this:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses

The problem with this is that these code blocks typically bloat code
size significantly.

An obvious solution is to outline these blocks of code. In fact, this
has already been implemented under the -hwasan-instrument-with-calls
flag. However, as currently implemented this has a number of problems:
- The functions use the same calling convention as regular C functions.
  This means that the backend must spill all temporary registers as
  required by the platform's C calling convention, even though the
  check only needs two registers on the hot path.
- The functions take the address to be checked in a fixed register,
  which increases register pressure.
Both of these factors can diminish the code size effect and increase
the performance hit of -hwasan-instrument-with-calls.

The solution that this patch implements is to involve the aarch64
backend in outlining the checks. An intrinsic and pseudo-instruction
are created to represent a hwasan check. The pseudo-instruction
is register allocated like any other instruction, and we allow the
register allocator to select almost any register for the address to
check. A particular combination of (register selection, type of check)
triggers the creation in the backend of a function to handle the check
for specifically that pair. The resulting functions are deduplicated by
the linker. The pseudo-instruction (really the function) is specified
to preserve all registers except for the registers that the AAPCS
specifies may be clobbered by a call.

To measure the code size and performance effect of this change, I
took a number of measurements using Chromium for Android on aarch64,
comparing a browser with inlined checks (the baseline) against a
browser with outlined checks.

Code size: Size of .text decreases from 243897420 to 171619972 bytes,
or a 30% decrease.

Performance: Using Chromium's blink_perf.layout microbenchmarks I
measured a median performance regression of 6.24%.

The fact that a perf/size tradeoff is evident here suggests that
we might want to make the new behaviour conditional on -Os/-Oz.
But for now I've enabled it unconditionally, my reasoning being that
hwasan users typically expect a relatively large perf hit, and ~6%
isn't really adding much. We may want to revisit this decision in
the future, though.

I also tried experimenting with varying the number of registers
selectable by the hwasan check pseudo-instruction (which would result
in fewer variants being created), on the hypothesis that creating
fewer variants of the function would expose another perf/size tradeoff
by reducing icache pressure from the check functions at the cost of
register pressure. Although I did observe a code size increase with
fewer registers, I did not observe a strong correlation between the
number of registers and the performance of the resulting browser on the
microbenchmarks, so I conclude that we might as well use ~all registers
to get the maximum code size improvement. My results are below:

Regs | .text size | Perf hit
-----+------------+---------
~all | 171619972  | 6.24%
  16 | 171765192  | 7.03%
   8 | 172917788  | 5.82%
   4 | 177054016  | 6.89%

Differential Revision: https://reviews.llvm.org/D56954

llvm-svn: 351920
2019-01-23 02:20:10 +00:00
Jordan Rupprecht 302393d4da [llvm-objcopy] Remove os-dependent message from test
llvm-svn: 351914
2019-01-23 01:42:02 +00:00
Josh Stone 7a3108ff0f [CodeView] Allow empty types in member functions
Summary:
`CodeViewDebug::lowerTypeMemberFunction` used to default to a `Void`
return type if the function's type array was empty. After D54667, it
started blindly indexing the 0th item for the return type, which fails
in `getOperand` for empty arrays if assertions are enabled.

This patch restores the `Void` return type for empty type arrays, and
adds a test generated by Rust in line-only debuginfo mode.

Reviewers: zturner, rnk

Reviewed By: rnk

Subscribers: hiraditya, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D57070

llvm-svn: 351910
2019-01-23 00:53:22 +00:00
Jordan Rupprecht b4465f12dc [llvm-objcopy] Fix error message for msvc tests
llvm-svn: 351905
2019-01-23 00:35:04 +00:00
Jordan Rupprecht 881cae7a45 [llvm-objcopy] Return Error from Buffer::allocate(), [ELF]Writer::finalize(), and [ELF]Writer::commit()
Summary:
This patch changes a few methods to return Error instead of manually calling error/reportError to abort. This will make it easier to extract into a library.

Note that error() takes just a string (this patch also adds an overload that takes an Error), while reportError() takes a string + [error/code]. To help unify things, use FileError to associate a given filename with an error. Note that this takes some special care (for now), e.g. calling reportError(FileName, <something that could be FileError>) will duplicate the filename. The goal is to eventually remove reportError(), have every error associated with a file be a FileError, and have just one error handling block at the tool level.
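
A minimal sketch of the FileError pattern described here (the function and message are illustrative only, assuming the helpers in llvm/Support/Error.h):

```
#include "llvm/Support/Errc.h"
#include "llvm/Support/Error.h"

using namespace llvm;

// Illustrative only: wrap a failure in a FileError so a single handler at the
// tool level can print "<filename>: <message>".
static Error allocateBuffer(StringRef Filename, size_t Size) {
  if (Size == 0)
    return createFileError(Filename,
                           createStringError(errc::invalid_argument,
                                             "cannot allocate an empty buffer"));
  return Error::success();
}
```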

This change was suggested in D56806. I took it a little further than suggested, but completely fixing llvm-objcopy will take a couple more patches. If this approach looks good, I'll commit this and apply similar patches for the rest.

This change is NFC in terms of non-error related code, although the error message changes in one context.

Reviewers: alexshap, jhenderson, jakehehrlich, mstorsjo, espindola

Reviewed By: alexshap, jhenderson

Subscribers: llvm-commits, emaste, arichardson

Differential Revision: https://reviews.llvm.org/D56930

llvm-svn: 351896
2019-01-22 23:49:16 +00:00
Matt Arsenault 4c5e8f51e7 AMDGPU/GlobalISel: Start selectively legalizing 16-bit operations
It might be a bit nicer to use the fancy .legalIf and co. predicates,
but that requires more boilerplate and disables the coverage
assertions.

llvm-svn: 351886
2019-01-22 22:00:19 +00:00
Matt Arsenault 736cfa9ffb AMDGPU/GlobalISel: Handle legality/regbanks for 32/64-bit shifts
llvm-svn: 351884
2019-01-22 21:51:38 +00:00
Matt Arsenault 30989e492b GlobalISel: Allow shift amount to be a different type
For AMDGPU the shift amount is never 64-bit, and
this needs to use a 32-bit shift.

X86 uses i8, but seemed to be hacking around this before.

llvm-svn: 351882
2019-01-22 21:42:11 +00:00
Joel E. Denny 352695c336 [FileCheck] Suppress old -v/-vv diags if dumping input
The old diagnostic form of the trace produced by -v and -vv looks
like:

```
check1:1:8: remark: CHECK: expected string found in input
CHECK: abc
       ^
<stdin>:1:3: note: found here
; abc def
  ^~~
```

When dumping annotated input is requested (via -dump-input), I find
that this old trace is not useful and is sometimes harmful:

1. The old trace is mostly redundant because the same basic
   information also appears in the input dump's annotations.

2. The old trace buries any error diagnostic between it and the input
   dump, but I find it useful to see any error diagnostic up front.

3. FILECHECK_OPTS=-dump-input=fail requests annotated input dumps only
   for failed FileCheck calls.  However, I have to also add -v or -vv
   to get a full set of annotations, and that can produce massive
   output from all FileCheck calls in all tests.  That's a real
   problem when I run this in the IDE I use, which grinds to a halt as
   it tries to capture all that output.

When -dump-input=fail|always, this patch suppresses the old trace from
-v or -vv.  Error diagnostics still print as usual.  If you want the
old trace, perhaps to see variable expansions, you can set
-dump-input=none (the default).

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D55825

llvm-svn: 351881
2019-01-22 21:41:42 +00:00
Craig Topper a13edd3ef2 [X86][AVX512F_SCALAR]: Adding full coverage of MC encoding for the AVX512F_SCALAR isa sets. NFC
Adding MC regression tests to cover the AVX512F_SCALAR isa sets.
This patch is part of a larger task to cover MC encoding of all X86 isa sets started in revision: https://reviews.llvm.org/D39952

Differential Revision: https://reviews.llvm.org/D41174

llvm-svn: 351874
2019-01-22 20:48:24 +00:00
Matt Arsenault 6378629609 GlobalISel: Implement widen for extract_vector_elt elt type
llvm-svn: 351871
2019-01-22 20:38:15 +00:00
Matt Arsenault aebb2ee036 GlobalISel: Implement fewerElementsVector for basic FP ops
llvm-svn: 351866
2019-01-22 20:14:29 +00:00
Matt Arsenault 6614f852b6 GlobalISel: Support narrowing zextload/sextload
llvm-svn: 351856
2019-01-22 19:02:10 +00:00
Matt Arsenault a7cd83bc88 GlobalISel: Disallow vectors for G_CONSTANT/G_FCONSTANT
llvm-svn: 351853
2019-01-22 18:53:41 +00:00
Matt Arsenault a5840c3c39 Codegen support for atomicrmw fadd/fsub
llvm-svn: 351851
2019-01-22 18:36:06 +00:00
Matt Arsenault 39508331ef Reapply "IR: Add fp operations to atomicrmw"
This reapplies commits r351778 and r351782 with
RISCV test fixes.

llvm-svn: 351850
2019-01-22 18:18:02 +00:00