Commit Graph

111525 Commits

Author SHA1 Message Date
Sanjay Patel a4f42f2cfd [PatternMatch, InstSimplify] allow undef elements when matching any vector FP zero
This matcher implementation appears to be slightly more efficient than 
the generic constant check that it is replacing because every use was 
for matching FP patterns, but the previous code would check int and 
pointer type nulls too. 

llvm-svn: 327627
2018-03-15 14:29:27 +00:00
Sanjay Patel 8f063d0c70 [InstSimplify] remove 'nsz' requirement for frem 0, X
From the LangRef definition for frem: 
"The value produced is the floating-point remainder of the two operands. 
This is the same output as a libm ‘fmod‘ function, but without any 
possibility of setting errno. The remainder has the same sign as the 
dividend. This instruction is assumed to execute in the default 
floating-point environment."

llvm-svn: 327626
2018-03-15 14:04:31 +00:00
Ulrich Weigand f4ceef8d3f [Debug] Retain both copies of debug intrinsics in HoistThenElseCodeToIf
When hoisting common code from the "then" and "else" branches of a condition
to before the "if", the HoistThenElseCodeToIf routine will attempt to merge
the debug location associated with the two original copies of the hoisted
instruction.

This is a problem in the special case where the hoisted instruction is a
debug info intrinsic, since for those the debug location is considered
part of the intrinsic and attempting to modify it may resut in invalid
IR.  This is the underlying cause of PR36410.

This patch fixes the problem by handling debug info intrinsics specially:
instead of hoisting one copy and merging the two locations, the code now
simply hoists both copies, each with its original location intact.  Note
that this is still only done in the case where both original copies are
otherwise (i.e. apart from location metadata) identical.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D44312

llvm-svn: 327622
2018-03-15 12:28:48 +00:00
Brock Wyma 3cc5710cec [CodeView] Initial support for emitting S_BLOCK32 symbols for lexical scopes
This patch sorts local variables by lexical scope and emits them inside
an appropriate S_BLOCK32 CodeView symbol.

Differential Revision: https://reviews.llvm.org/D42926

llvm-svn: 327620
2018-03-15 11:52:17 +00:00
Fedor Sergeev 194a407bda [New PM][IRCE] port of Inductive Range Check Elimination pass to the new pass manager
There are two nontrivial details here:
* Loop structure update interface is quite different with new pass manager,
  so the code to add new loops was factored out

* BranchProbabilityInfo is not a loop analysis, so it can not be just getResult'ed from
  within the loop pass. It cant even be queried through getCachedResult as LoopCanonicalization
  sequence (e.g. LoopSimplify) might invalidate BPI results.

  Complete solution for BPI will likely take some time to discuss and figure out,
  so for now this was partially solved by making BPI optional in IRCE
  (skipping a couple of profitability checks if it is absent).

Most of the IRCE tests got their corresponding new-pass-manager variant enabled.
Only two of them depend on BPI, both marked with TODO, to be turned on when BPI
starts being available for loop passes.

Reviewers: chandlerc, mkazantsev, sanjoy, asbirlea
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D43795

llvm-svn: 327619
2018-03-15 11:01:19 +00:00
Andrei Elovikov f9b8035f3c [LoopUnroll] Ignore ephemeral values when checking full unroll profitability.
Summary:
Before this patch call graph is like this in the LoopUnrollPass:

  tryToUnrollLoop
    ApproximateLoopSize
      collectEphemeralValues
      /* Use collected ephemeral values */
    computeUnrollCount
      analyzeLoopUnrollCost
        /* Bail out from the analysis if loop contains CallInst */

This patch moves collection of the ephemeral values to the tryToUnrollLoop
function and passes the collected values into both ApproximateLoopsize (as
before) and additionally starts using them in analyzeLoopUnrollCost:

  tryToUnrollLoop
    collectEphemeralValues
    ApproximateLoopSize(EphValues)
      /* Use EphValues */
    computeUnrollCount(EphValues)
      analyzeLoopUnrollCost(EphValues)
        /* Ignore ephemeral values - they don't contribute to the final cost */
        /* Bail out from the analysis if loop contains CallInst */

Reviewers: mzolotukhin, evstupac, sanjoy

Reviewed By: evstupac

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43931

llvm-svn: 327617
2018-03-15 09:59:15 +00:00
Max Kazantsev 4f9c7c5086 [SCEV][NFC] Remove TBB, FBB parameters from exit limit computations
Methods `computeExitLimitFromCondCached` and `computeExitLimitFromCondImpl` take
true and false branches as parameters and only use them for asserts and for identifying
whether true/false branch belongs to the loop (which can be done once earlier). This fact
complicates generalization of exit limit computation logic on guards because the guards
don't have blocks to which they go in case of failure explicitly.

The motivation of this patch is that currently this part of SCEV knows nothing about guards
and only works with explicit branches. As result, it fails to prove that a loop

  for (i = 0; i < 100; i++)
    guard(i < 10);

exits after 10th iteration, while in the equivalent example

  for (i = 0; i < 100; i++)
    if (i >= 10) break;

SCEV easily proves this fact. We are going to change it in near future, and this is why
we need to make these methods operate on more abstract level.

This patch refactors this code to get rid of these parameters as meaningless and prepare
ground for teaching these methods to work with guards as well as they work with explicit
branching instructions.

Differential Revision: https://reviews.llvm.org/D44419

llvm-svn: 327615
2018-03-15 09:38:00 +00:00
Craig Topper 26a3a80c87 [X86] Add support for matching FMSUBADD from build_vector.
llvm-svn: 327604
2018-03-15 06:14:55 +00:00
Craig Topper a5e712f402 [X86] Remove old TODO. We have coverage for this now.
Coverage was added in r320950.

llvm-svn: 327603
2018-03-15 06:14:53 +00:00
Craig Topper b9526e9fdb [X86] Use MVT in a couple places where we know the type is legal.
llvm-svn: 327602
2018-03-15 06:14:51 +00:00
Aaron Smith 40198f5905 [DebugInfo] Add a new method IPDBSession::findLineNumbersBySectOffset
Summary:
Some PDB symbols do not have a valid VA or RVA but have Addr by Section and Offset. For example, a variable in thread-local storage has the following properties:

     get_addressOffset: 0
     get_addressSection: 5
     get_lexicalParentId: 2
     get_name: g_tls
     get_symIndexId: 12
     get_typeId: 4
     get_dataKind: 6
     get_symTag: 7
     get_locationType: 2

This change provides a new method to locate line numbers by Section and Offset from those symbols.

Reviewers: zturner, rnk, llvm-commits

Subscribers: asmith, JDevlieghere

Differential Revision: https://reviews.llvm.org/D44407

llvm-svn: 327601
2018-03-15 06:04:51 +00:00
Lei Huang 1f8da3ae19 [PowerPC][NFC] formatting-only fix
llvm-svn: 327599
2018-03-15 03:06:44 +00:00
George Burgess IV cedfa6da81 Remove unused variable; NFC
llvm-svn: 327597
2018-03-15 02:58:36 +00:00
Lang Hames 5721ee48a2 [ORC] Re-apply r327566 with a fix for test-global-ctors.ll.
Also clang-formats the patch, which I should have done the first time around.

llvm-svn: 327594
2018-03-15 00:30:14 +00:00
Matt Davis 9407bb5f54 [CleanUp] Remove NumInstructions field from LoopVectorizer's RegisterUsage struct.
Summary:
This variable is largely going unused; aside from reporting number of instructions for in DEBUG builds.

The only use of NumInstructions is in debug output to represent the LoopSize.  That value can be can be misleading as it also includes metadata instructions (e.g., DBG_VALUE) which have no real impact.  If we do choose to keep this around, we probably should guard it by a DEBUG macro, as it's not used in production builds.



Reviewers: majnemer, congh, rengolin

Reviewed By: rengolin

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D44495

llvm-svn: 327589
2018-03-14 23:30:31 +00:00
Simon Pilgrim 48fbf0c69a [X86][Btver2] Add support for multiple pipelines stages for fpu schedules. NFCI.
This allows us to use JWriteResFpuPair for complex schedule classes as well as single pipe instructions.

llvm-svn: 327588
2018-03-14 23:12:09 +00:00
Mark Searles c3c02bde73 [AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so it's score bracket is needed.
Differential Revision: https://reviews.llvm.org/D44434

llvm-svn: 327583
2018-03-14 22:04:32 +00:00
Simon Pilgrim dfeebdbed7 [X86][Btver2] Add ResourceCycles and NumMicroOps overrides to scalar instructions. NFCI.
Currently still use default values - this is setup for a future patch.

llvm-svn: 327582
2018-03-14 21:55:54 +00:00
Reid Kleckner 3a7a2e4a0a [FastISel] Sink local value materializations to first use
Summary:
Local values are constants, global addresses, and stack addresses that
can't be folded into the instruction that uses them. For example, when
storing the address of a global variable into memory, we need to
materialize that address into a register.

FastISel doesn't want to materialize any given local value more than
once, so it generates all local value materialization code at
EmitStartPt, which always dominates the current insertion point. This
allows it to maintain a map of local value registers, and it knows that
the local value area will always dominate the current insertion point.

The downside is that local value instructions are always emitted without
a source location. This is done to prevent jumpy line tables, but it
means that the local value area will be considered part of the previous
statement. Consider this C code:
  call1();      // line 1
  ++global;     // line 2
  ++global;     // line 3
  call2(&global, &local); // line 4

Today we end up with assembly and line tables like this:
  .loc 1 1
  callq call1
  leaq global(%rip), %rdi
  leaq local(%rsp), %rsi
  .loc 1 2
  addq $1, global(%rip)
  .loc 1 3
  addq $1, global(%rip)
  .loc 1 4
  callq call2

The LEA instructions in the local value area have no source location and
are treated as being on line 1. Stepping through the code in a debugger
and correlating it with the assembly won't make much sense, because
these materializations are only required for line 4.

This is actually problematic for the VS debugger "set next statement"
feature, which effectively assumes that there are no registers live
across statement boundaries. By sinking the local value code into the
statement and fixing up the source location, we can make that feature
work. This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and
https://crbug.com/793819.

This change is obviously not enough to make this feature work reliably
in all cases, but I felt that it was worth doing anyway because it
usually generates smaller, more comprehensible -O0 code. I measured a
0.12% regression in code generation time with LLC on the sqlite3
amalgamation, so I think this is worth doing.

There are some special cases worth calling out in the commit message:
1. local values materialized for phis
2. local values used by no-op casts
3. dead local value code

Local values can be materialized for phis, and this does not show up as
a vreg use in MachineRegisterInfo. In this case, if there are no other
uses, this patch sinks the value to the first terminator, EH label, or
the end of the BB if nothing else exists.

Local values may also be used by no-op casts, which adds the register to
the RegFixups table. Without reversing the RegFixups map direction, we
don't have enough information to sink these instructions.

Lastly, if the local value register has no other uses, we can delete it.
This comes up when fastisel tries two instruction selection approaches
and the first materializes the value but fails and the second succeeds
without using the local value.

Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo

Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D43093

llvm-svn: 327581
2018-03-14 21:54:21 +00:00
Francis Visoiu Mistrih e85b06d65f [CodeGen] Use MIR syntax for MachineMemOperand printing
Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)".

rdar://38163529

Differential Revision: https://reviews.llvm.org/D42377

llvm-svn: 327580
2018-03-14 21:52:13 +00:00
Philip Reames 0adbb19409 [EarlyCSE] Exploit open ended invariant.start scopes
If we have an invariant.start with no corresponding invariant.end, then the memory location becomes invariant indefinitely after the invariant.start. As a result, anything dominated by the start is guaranteed to see the value the memory location had when the invariant.start executed.

This patch adds an AvailableInvariants table which tracks the generation a particular memory location became invariant and then uses that information to allow value forwarding that would otherwise be disallowed by potentially aliasing stores. (Reminder: In EarlyCSE everything clobbers everything by default.)

This should be compatible with the MemorySSA variant, but design is generational. We can and should add first class support for invariant.start within MemorySSA at a later time.  I took a quick look at doing so, but probably need some input from a MemorySSA expert.

Differential Revision: https://reviews.llvm.org/D43716

llvm-svn: 327577
2018-03-14 21:35:06 +00:00
Reid Kleckner c7fd1540b3 Revert "[ORC] Switch from shared_ptr to unique_ptr for addModule methods."
This reverts commit r327566, it breaks
test/ExecutionEngine/OrcMCJIT/test-global-ctors.ll.

The test doesn't crash with a stack trace, unfortunately. It merely
returns 1 as the exit code.

ASan didn't produce a report, and I reproduced this on my Linux machine
and Windows box.

llvm-svn: 327576
2018-03-14 21:32:34 +00:00
Sanjay Patel 11f7f9908b [InstSimplify] fix folds for (0.0 - X) + X --> 0 (PR27151)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=27151
...the existing fold could miscompile when X is NaN.

The fold was also dependent on 'ninf' but that's not necessary.

From IEEE-754 (with default rounding which we can assume for these opcodes):
"When the sum of two operands with opposite signs (or the difference of two 
operands with like signs) is exactly zero, the sign of that sum (or difference) 
shall be +0...However, x + x = x − (−x) retains the same sign as x even when 
x is zero."

llvm-svn: 327575
2018-03-14 21:23:27 +00:00
Francis Visoiu Mistrih 164560bd74 [AArch64] Emit CSR loads in the same order as stores
Optionally allow the order of restoring the callee-saved registers in the
epilogue to be reversed.

The flag -reverse-csr-restore-seq generates the following code:

```
stp     x26, x25, [sp, #-64]!
stp     x24, x23, [sp, #16]
stp     x22, x21, [sp, #32]
stp     x20, x19, [sp, #48]

; [..]

ldp     x24, x23, [sp, #16]
ldp     x22, x21, [sp, #32]
ldp     x20, x19, [sp, #48]
ldp     x26, x25, [sp], #64
ret
```

Note how the CSRs are restored in the same order as they are saved.

One exception to this rule is the last `ldp`, which allows us to merge
the stack adjustment and the ldp into a post-index ldp. This is done by
first generating:

ldp x26, x27, [sp]
add sp, sp, #64

which gets merged by the arm64 load store optimizer into

ldp x26, x25, [sp], #64

The flag is disabled by default.

llvm-svn: 327569
2018-03-14 20:34:03 +00:00
Lang Hames 7bea03c2bb [ORC] Switch from shared_ptr to unique_ptr for addModule methods.
Layer implementations typically mutate module state, and this is better
reflected by having layers own the Module they are operating on.

llvm-svn: 327566
2018-03-14 20:29:45 +00:00
Reid Kleckner 5a5ac65768 [MC] Always emit relocations for same-section function references
Summary:
We already emit relocations in this case when the "incremental linker
compatible" flag is set, but it turns out these relocations are also
required for /guard:cf. Now that we have two use cases for this
behavior, let's make it unconditional to try to keep things simple.

We never hit this problem in Clang because it always sets the
"incremental linker compatible" flag when targeting MSVC. However, LLD
LTO doesn't set this flag, so we'd get CFG failures at runtime when
using ThinLTO and /guard:cf. We probably don't want LLD LTO to set the
"incremental linker compatible" assembler flag, since this has nothing
to do with incremental linking, and we don't need to timestamp LTO
temporary objects.

Fixes PR36624.

Reviewers: inglorion, espindola, majnemer

Subscribers: mehdi_amini, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D44485

llvm-svn: 327557
2018-03-14 19:24:32 +00:00
Reid Kleckner e66458a841 [LLVM-C] [bindings/go] Add C and Golang bindings for COMDAT
Patch by Ben Clayton

Differential Revision: https://reviews.llvm.org/D44086

llvm-svn: 327551
2018-03-14 18:33:53 +00:00
Craig Topper 9c098ed819 [X86] Add back fast-isel code for handling i8 shifts.
I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds.

This adds back the code. We'll codegen out of bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a strange quirk of x86 that shift amounts are always masked to 5-bits(except 64-bits). So if the masked value is still out bounds the result will be 0.

Fixes PR36731.

llvm-svn: 327540
2018-03-14 17:57:19 +00:00
Francis Visoiu Mistrih 084e7d8770 [AArch64] Keep track of MIFlags in the LoadStoreOptimizer
Merging:

* $x26, $x25 = frame-setup LDPXi $sp, 0
* $sp = frame-destroy ADDXri $sp, 64, 0

into an LDPXpost should preserve the flags from both instructions as
following:

* frame-setup frame-destroy LDPXpost

Differential Revision: https://reviews.llvm.org/D44446

llvm-svn: 327533
2018-03-14 17:10:58 +00:00
Craig Topper b36cb20ef9 [X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps created an and mask that can be matched as a zero extend.
I had to modify the bswap recognition to allow unshrunk masks to make this work.

Fixes PR36689.

Differential Revision: https://reviews.llvm.org/D44442

llvm-svn: 327530
2018-03-14 16:55:15 +00:00
Simon Pilgrim d1c3c995c0 [X86][AVX] Use WriteFShuffleLd for broadcast reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327524
2018-03-14 15:47:08 +00:00
Nicholas Wilson 027b9357a8 [WebAssembly] Identify COMDATs by index rather than string. NFC
This will enable an optimisation in LLD.

Differential Revision: https://reviews.llvm.org/D44343

llvm-svn: 327522
2018-03-14 15:44:45 +00:00
Arnold Schwaighofer bf1638daa8 SjLjEHPrepare: Don't reg-to-mem swifterror values
swifterror llvm values model the swifterror register as memory at the
LLVM IR level. ISel will perform adhoc mem-to-reg on them. swifterror
values are constraint in how they can be used. Spilling them to memory
is not allowed.

SjLjEHPrepare tried to lower swifterror values to memory which is
unecessary since the back-end will spill and reload the register as
neccessary (as long as clobbering calls are marked as such which is the
case here) and further leads to invalid IR because swifterror values
can't be stored to memory.

rdar://38164004

llvm-svn: 327521
2018-03-14 15:44:07 +00:00
Alexander Ivchenko 86ef9ab28f [GlobalIsel][X86] Support for G_SDIV instruction
Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44430

llvm-svn: 327520
2018-03-14 15:41:11 +00:00
Sanjay Patel 5773ac3ee8 [CodeGen] allow printing of zero latency in sched comments
I don't know how to expose this in a test. There are ARM / AArch64 
sched classes that include zero latency instructions, but I'm not 
seeing sched info printed for those targets. X86 will almost 
certainly have these soon (see PR36671), but no model has
'let Latency = 0' currently.

llvm-svn: 327518
2018-03-14 15:28:48 +00:00
Petar Jovanovic 3408caf686 [mips] Add support for CRC ASE
This includes

  Instructions: crc32b, crc32h, crc32w, crc32d,
                crc32cb, crc32ch, crc32cw, crc32cd

  Assembler directives: .set crc, .set nocrc, .module crc, .module nocrc

  Attribute: crc

  .MIPS.abiflags: CRC (0x8000)

Patch by Vladimir Stefanovic.

Differential Revision: https://reviews.llvm.org/D44176

llvm-svn: 327511
2018-03-14 14:13:31 +00:00
Simon Pilgrim d594942928 [X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costs
Account for ymm double pumping and add proper pshufb/permutevar support

llvm-svn: 327510
2018-03-14 14:05:19 +00:00
Simon Pilgrim de995e6e37 [X86][SSE] Use WriteFShuffleLd for MOVDDUP/MOVSHDUP/MOVSLDUP reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327505
2018-03-14 13:22:56 +00:00
Martin Storsjo bde677289a [AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC
Support for this relocation is missing in both LLD and GNU binutils
at the moment.

This reverts the ELF parts of SVN r327316.

llvm-svn: 327503
2018-03-14 13:09:10 +00:00
Simon Pilgrim f0ccaae5bc Fix 'not all control paths return a value' MSVC warning. NFCI.
llvm-svn: 327502
2018-03-14 12:04:51 +00:00
Pavel Labath 8ed6582bb0 Fix msvc compiler error in r327498
msvc reports an "illegal indirection" error here. Attempt to appease it
with a different initialization syntax.

llvm-svn: 327500
2018-03-14 11:31:17 +00:00
Alexander Ivchenko 0bd4d8c901 [GlobalISel][X86] Support G_LSHR/G_ASHR/G_SHL
Support G_LSHR/G_ASHR/G_SHL. We have 3 variance for
shift instructions : shift gpr, shift imm, shift 1.
Currently GlobalIsel TableGen generate patterns for
shift imm and shift 1, but with shiftCount i8.
In G_LSHR/G_ASHR/G_SHL like LLVM-IR both arguments
has the same type, so for now only shift i8 can use
auto generated TableGen patterns.

The support of G_SHL/G_ASHR enables tryCombineSExt
from LegalizationArtifactCombiner.h to hit, which
results in different legalization for the following tests:
    LLVM :: CodeGen/X86/GlobalISel/ext-x86-64.ll
    LLVM :: CodeGen/X86/GlobalISel/gep.ll
    LLVM :: CodeGen/X86/GlobalISel/legalize-ext-x86-64.mir

-; X64-NEXT:    movsbl %dil, %eax
+; X64-NEXT:    movl $24, %ecx
+; X64-NEXT:    # kill: def $cl killed $ecx
+; X64-NEXT:    shll %cl, %edi
+; X64-NEXT:    movl $24, %ecx
+; X64-NEXT:    # kill: def $cl killed $ecx
+; X64-NEXT:    sarl %cl, %edi
+; X64-NEXT:    movl %edi, %eax

..which is not optimal and should be addressed later.

Rework of the patch by igorb

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44395

llvm-svn: 327499
2018-03-14 11:23:57 +00:00
Pavel Labath 0dd81bab92 Explicitly initialize dwarf::FormParams in DIEInteger::SizeOf
This could end up inititialized if someone called the function with a
null AsmPrinter. Right now this only happens in DIEHash unit tests,
presumably because it was hard to create an AsmPrinter in the context of
unit tests. This only worked before r327486 because those tests did not
use any dwarf forms whose size actually depended on the dwarf version
(otherwise, they would have crashed due to null dereference).

I fix the uninitialized error, by explicitly initializing FormParams to
an invalid value, which will cause getFixedFormByteSize to return None
if called with a form with version-dependent size. A more principled
solution might be to fix the DIEHash tests to always pass in a valid
AsmPrinter.

llvm-svn: 327498
2018-03-14 11:14:43 +00:00
Nicolai Haehnle a511dddced TableGen: Explicitly forbid some nestings of class, multiclass, and foreach
These previously all failed one way or another, but now we produce a more
helpful error message.

Change-Id: I8ffd2e87c8e35a5134c3be289e0a1fecaa2bb8ca

Differential revision: https://reviews.llvm.org/D44115

llvm-svn: 327497
2018-03-14 11:01:01 +00:00
Nicolai Haehnle aa9ca691cd TableGen: Add !ne, !le, !lt, !ge, and !gt comparisons
Change-Id: I8e2ece677268972d578a787467f7ef52a1f33a71

Differential revision: https://reviews.llvm.org/D44114

llvm-svn: 327496
2018-03-14 11:00:57 +00:00
Nicolai Haehnle b61c26e614 TableGen: Allow dag operators to be resolved late
Change-Id: I51bb80fd5c48c8ac441ab11e43d43c1b91b4b590

Differential revision: https://reviews.llvm.org/D44113

llvm-svn: 327495
2018-03-14 11:00:48 +00:00
Nicolai Haehnle 77841b159a TableGen: Type-check BinOps
Additionally, allow more than two operands to !con, !add, !and, !or
in the same way as is already allowed for !listconcat and !strconcat.

Change-Id: I9659411f554201b90cd8ed7c7e004d381a66fa93

Differential revision: https://reviews.llvm.org/D44112

llvm-svn: 327494
2018-03-14 11:00:43 +00:00
Nicolai Haehnle ef60a26817 TableGen: Allow ? in lists
This makes using !dag more convenient in some cases.

Change-Id: I0a8c35e15ccd1ecec778fd1c8d64eee38d74517c

Differential revision: https://reviews.llvm.org/D44111

llvm-svn: 327493
2018-03-14 11:00:33 +00:00
Nicolai Haehnle 6c11865638 TableGen: Add !dag function for construction
This allows constructing DAG nodes with programmatically determined
names, and can simplify constructing DAG nodes in other cases as
well.

Also, add documentation and some very simple tests for the already
existing !con.

Change-Id: Ida61cd82e99752548d7109ce8da34d29da56a5f7

Differential revision: https://reviews.llvm.org/D44110

llvm-svn: 327492
2018-03-14 11:00:26 +00:00
Pavel Labath 322711f529 DWARF: Unify form size handling code
Summary:
This patch replaces the two switches which are deducing the size of
various forms with a single implementation. I have put the new
implementation into BinaryFormat, to avoid introducing dependencies
between the two independent libraries (DebugInfo and CodeGen) that need
this functionality.

Reviewers: aprantl, JDevlieghere, dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44418

llvm-svn: 327486
2018-03-14 09:39:54 +00:00
Alexander Ivchenko 327de80529 [GlobalIsel][X86] Support for G_ZEXT instruction
Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44378

llvm-svn: 327482
2018-03-14 09:11:23 +00:00
Robert Widmann 4bb481b2f2 [LLVM-C] Redo unnamed_address attribute bindings
Summary:
The old bindings should have used an enum instead of a boolean.  This
deprecates LLVMHasUnnamedAddr and LLVMSetUnnamedAddr , replacing them
with LLVMGetUnnamedAddress and LLVMSetUnnamedAddress respectively that do.
Though it is unlikely LLVM will gain more supported global value linker
hints, the new API can scale to accommodate this.

Reviewers: deadalnix, whitequark

Reviewed By: whitequark

Subscribers: llvm-commits, harlanhaskins

Differential Revision: https://reviews.llvm.org/D43448

llvm-svn: 327479
2018-03-14 06:45:51 +00:00
Lang Hames 2d603a1860 [RuntimeDyld] Silence a compiler error.
This should fix the error at
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/19008

llvm-svn: 327478
2018-03-14 06:39:49 +00:00
Lang Hames b2facd6479 [ORC] Fix a data race in the lookup function.
The Error locals need to be protected by a mutex. (This could be fixed by
having the promises / futures contain Expected and Error values, but
MSVC's future implementation does not support this yet).

Hopefully this will fix some of the errors seen on the builders due to
r327474.

llvm-svn: 327477
2018-03-14 06:25:08 +00:00
Lang Hames 313f590aee [ExecutionEngine] Add a getSymbolTable method to RuntimeDyld.
This can be used to extract the symbol table from a RuntimeDyld instance prior
to disposing of it.

This patch also updates RTDyldObjectLinkingLayer to use the new method, rather
than requesting symbols one at a time via getSymbol.

llvm-svn: 327476
2018-03-14 06:25:07 +00:00
Lang Hames d4a768e78f [ORC] Silence a compiler error.
This should fix the builder error at
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/19006

llvm-svn: 327475
2018-03-14 05:23:56 +00:00
Lang Hames 817f1f64d9 [ORC] Add a 'lookup' convenience function for finding symbols in a list of VSOs.
The lookup function takes a list of VSOs, a set of symbol names (or just one
symbol name) and a materialization function object. It returns an
Expected<SymbolMap> (if given a set of names) or an Expected<JITEvaluatedSymbol>
(if given just one name). The lookup method constructs an
AsynchronousSymbolQuery for the given names, applies that query to each VSO in
the list in turn, and then blocks waiting for the query to complete. If
threading is enabled then the materialization function object can be used to
execute the materialization on different threads. If threading is disabled the
MaterializeOnCurrentThread utility must be used.

llvm-svn: 327474
2018-03-14 04:18:04 +00:00
Matt Arsenault 41e5ac4fa4 TargetMachine: Add address space to getPointerSize
llvm-svn: 327467
2018-03-14 00:36:23 +00:00
Craig Topper ec4881ad53 [X86] Simplify the LowerAVXCONCAT_VECTORS code a little by creating a single path for insert_subvector handling.
We now only create recursive concats if we have more than two non-zero values. This keeps our subvector broadcast DAG combine functioning.

llvm-svn: 327457
2018-03-13 22:36:07 +00:00
Craig Topper cc060e921b [X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats.
This better able to detect undef and zeros pieces in the concat. Or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs.

This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best.

We probably want to merge this with the vXi1 concat code since they are very similar.

llvm-svn: 327454
2018-03-13 22:05:25 +00:00
Hiroshi Yamauchi e6a3dc7699 Simplify more cases of logical ops of masked icmps.
Summary:
For example,

((X & 255) != 0) && ((X & 15) == 8) -> ((X & 15) == 8).
((X & 7) != 0) && ((X & 15) == 8) -> false.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43835

llvm-svn: 327450
2018-03-13 21:13:18 +00:00
Craig Topper 4aeec51986 [DAGCombiner] Allow visitEXTRACT_SUBVECTOR to combine with BUILD_VECTORS between LegalizeVectorOps and LegalizeDAG.
BUILD_VECTORs aren't themselves legalized until LegalizeDAG so we should still be able to create an "illegal" one before that. This helps combine with BUILD_VECTORS that are introduced during LegalizeVectorOps due to unrolling.

llvm-svn: 327446
2018-03-13 20:36:28 +00:00
Francis Visoiu Mistrih 3abf05739f [MIR] Allow frame-setup and frame-destroy on the same instruction
Nothing prevents us from having both frame-setup and frame-destroy on
the same instruction.

When merging:
* frame-setup OPCODE1
* frame-destroy OPCODE2
into
* frame-setup frame-destroy OPCODE3

we want to be able to print and parse both flags.

llvm-svn: 327442
2018-03-13 19:53:16 +00:00
Anna Thomas 5ac72f94f3 Test Commit NFC. Updated comment
llvm-svn: 327436
2018-03-13 19:38:45 +00:00
Haicheng Wu aee0af3e23 [SLP] clean some formats
llvm-svn: 327433
2018-03-13 18:44:19 +00:00
Brian M. Rzycki 252165b27a [LazyValueInfo] PR33357 prevent infinite recursion on BinaryOperator
Summary:
It is possible for LVI to encounter instructions that are not in valid
SSA form and reference themselves. One example is the following:
  %tmp4 = and i1 %tmp4, undef
Before this patch LVI would recurse until running out of stack memory
and crashed.  This patch marks these self-referential instructions as
Overdefined and aborts analysis on the instruction.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33357

Reviewers: craig.topper, anna, efriedma, dberlin, sebpop, kuhar

Reviewed by: dberlin

Subscribers: uabelho, spatel, a.elovikov, fhahn, eli.friedman, mzolotukhin, spop, evandro, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D34135

llvm-svn: 327432
2018-03-13 18:14:10 +00:00
Eugene Zemtsov 82d60d6b29 Handle mixed-OS paths in DWARF reader
Make sure that DWARF line information generated by Windows can be properly read by Posix OS and vice versa.

Differential Revision: https://reviews.llvm.org/D44290

llvm-svn: 327430
2018-03-13 17:54:29 +00:00
Sanjay Patel c6cbbc899b [MC] fix documentation comments; NFC
llvm-svn: 327429
2018-03-13 17:50:27 +00:00
Zachary Turner 679aeadda1 [PDB] Support dumping injected sources via the DIA reader.
Injected sources are basically a way to add actual source file content
to your PDB. Presumably you could use this for shipping your source code
with your debug information, but in practice I can only find this being
used for embedding natvis files inside of PDBs.

In order to effectively test LLVM's natvis file injection, we need a way
to dump the injected sources of a PDB in a way that is authoritative
(i.e. based on Microsoft's understanding of the PDB format, and not
LLVM's). To this end, I've added support for dumping injected sources
via DIA. I made a PDB file that used the /natvis option to generate a
test case.

Differential Revision: https://reviews.llvm.org/D44405

llvm-svn: 327428
2018-03-13 17:46:06 +00:00
Simon Dardis e5f72dd5e1 Revert "[mips] Guard traps for microMIPS correctly"
This appears to have broken the expensive checks bot in
a strange fashion. Reverting until I can investigate.

This reverts r327409.

llvm-svn: 327427
2018-03-13 17:31:11 +00:00
Simon Pilgrim 9855b39380 [DAGCombine] visitREM - Don't assume that one divrem isn't driving another
Under some circumstances the divrems won't have been combined together before getting to this code.

So replace the assertion with a if() guard to not expand to X-((X/C)*C) to give the other combine chance to happen.

Reduced from OSS-Fuzz #6883
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6883

llvm-svn: 327424
2018-03-13 17:17:15 +00:00
Daniel Neilson 5182113f07 [SelectionDAGBuilder] Replace deprecated calls to MemoryIntrinsic::getAlignment() (NFCI)
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
SelectionDAGBuilder to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting
source & dest specific alignments through the new API.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816, rL327398 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

llvm-svn: 327421
2018-03-13 16:31:19 +00:00
Andrea Di Biagio 7faea7cb53 [MC] Move the reciprocal throughput computation from TargetSchedModel to MCSchedModel.
The goal is to make the reciprocal throughput computation accessible through the
MCSchedModel interface. This is particularly important for llvm-mca because it
can only query the MCSchedModel interface.

No functional change intended.

Differential Revision: https://reviews.llvm.org/D44392

llvm-svn: 327420
2018-03-13 16:28:55 +00:00
Craig Topper 7e711a6822 [X86] Remove SplitBinaryOpsAndApply and use SplitOpsAndApply by adding curly braces around the ops.
Summary: Unless you were intentionally avoiding this syntax? I saw you mentioned makeArrayRef in your commit that added SplitOpsAndApply.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44403

llvm-svn: 327418
2018-03-13 16:23:27 +00:00
Brock Wyma f52e192293 Revert r327397 [CodeView] Omit forward references for unnamed structs and ...
This reverts commit r327397 to investigate a buildbot failure.

llvm-svn: 327414
2018-03-13 15:56:20 +00:00
Zaara Syeda df28fb6ac2 test commit: fix formatting of a comment
This is  a simple change to do the test commit.

llvm-svn: 327412
2018-03-13 15:49:05 +00:00
Simon Dardis d5ae61d49d [mips] Guard traps for microMIPS correctly
This is part of fixing the instruction predicates for MIPS.

Reviewers: atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D44212

llvm-svn: 327409
2018-03-13 15:46:58 +00:00
Rafael Espindola f5220fb68f [ThinLTO] Clear dllimport when setting dso_local.
This is PR36686.

If a user of a library is LTOed with that library we take the
opportunity to set dso_local, but we don't clear dllimport, which
creates an invalid IR.

llvm-svn: 327408
2018-03-13 15:24:51 +00:00
Simon Pilgrim 3d4c86d399 [X86][Btver2] Split i8/i16/i32/i64 div/idiv costs
We were assuming a mixture of 32/64 division costs.

llvm-svn: 327407
2018-03-13 15:22:24 +00:00
Andrea Di Biagio 30c1ba4834 [MC] Move the instruction latency computation from TargetSchedModel to MCSchedModel.
The goal is to make the latency information accessible through the MCSchedModel
interface. This is particularly important for tools like llvm-mca that only have
access to the MCSchedModel API.

This partially fixes PR36676.
No functional change intended.

Differential Revision: https://reviews.llvm.org/D44383

llvm-svn: 327406
2018-03-13 15:22:13 +00:00
Sanjay Patel 204edeca56 [InstCombine] fix fmul reassociation to avoid creating an extra fdiv
This was supposed to be an NFC refactoring that will eventually allow
eliminating the isFast() predicate, but there's a rare possibility
that we would pessimize the code as shown in the test case because
we failed to check 'hasOneUse()' properly. This version also removes
an inefficiency of the old code; we would look for: 
(X * C) * C1 --> X * (C * C1)
...but that pattern is always handled by 
SimplifyAssociativeOrCommutative().

llvm-svn: 327404
2018-03-13 14:46:32 +00:00
Simon Dardis 476ed8f26e [mips] Fix the definitions of the EVA instructions
Correct their availability to their respective ISAs.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D44209

llvm-svn: 327403
2018-03-13 14:39:44 +00:00
Daniel Neilson 41e781d5f1 [SROA] Take advantage of separate alignments for memcpy source and destination
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
SROA pass to cease using the old getAlignment() & setAlignment() APIs of MemoryIntrinsic in
favour of getting source & dest specific alignments through the new API. This allows us
to enhance visitMemTransferInst to be more aggressive setting the alignment in memcpy
calls that it creates, as well as to only change the alignment of a memcpy/memmove
argument that it replaces.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

Reviewers: chandlerc, bollu, efriedma

Reviewed By: efriedma

Subscribers: efriedma, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D42974

llvm-svn: 327398
2018-03-13 14:25:33 +00:00
Brock Wyma 4fb9184558 [CodeView] Omit forward references for unnamed structs and unions
Codeview references to unnamed structs and unions are expected to refer to the
complete type definition instead of a forward reference so Visual Studio can
resolve the type properly.

Differential Revision: https://reviews.llvm.org/D32498

llvm-svn: 327397
2018-03-13 14:14:16 +00:00
Simon Dardis 9d7e9032f1 [mips] Don't create nested CALLSEQ_START..CALLSEQ_END nodes.
For the MIPS O32 ABI, the current call lowering logic naively lowers each
call, creating the reserved argument area to hold the argument spill areas for
$a0..$a3 and the outgoing parameter area if one is required at each call site.

In the case of a sufficently large byval argument, a call to memcpy is used
to write the start+16..end of the argument into the outgoing parameter area.
This is done within the CALLSEQ_START..CALLSEQ_END of the callee. The CALLSEQ
nodes are responsible for performing the necessary stack adjustments.

Since the O32/N32/N64 MIPS ABIs do not have a red-zone and writing below the
stack pointer and reading the values back is unpredictable, the call to memcpy
cannot be hoisted out of the callee's CALLSEQ nodes.

However, for the O32 ABI requires the reserved argument area for functions
which have parameters. The naive lowering of calls will then create nested
CALLSEQ sequences. For N32 and N64 these nodes are also created, but with
zero stack adjustments as those ABIs do not have a reserved argument area.

This patch addresses the correctness issue by recognizing the special case
of lowering a byval argument that uses memcpy. By recognizing that the
incoming chain already has a CALLSEQ_START node on it when calling memcpy,
the CALLSEQ nodes are not created. For the N32 and N64 ABIs, this is not an
issue, as no stack adjustment has to be performed.

For the O32 ABI, the correctness reasoning is different. In the case of a
sufficently large byval argument, registers a0..a3 are going to be used for
the callee's arguments, mandating the creation of the reserved argument area.
The call to memcpy in the naive case will also create its own reserved
argument area. However, since the reserved argument area consists of undefined
values, both calls can use the same reserved argument area.

Reviewers: abeserminji, atanasyan

Differential Revision: https://reviews.llvm.org/D44296

llvm-svn: 327388
2018-03-13 12:50:03 +00:00
Simon Pilgrim 93bd7187f4 [X86][SSE41] createVariablePermute v2X64 - PCMPEQQ can test for index 0/1 and select between them.
llvm-svn: 327385
2018-03-13 12:22:58 +00:00
Eugene Leviant 6f42a2cd91 [Evaluator] Evaluate load/store with bitcast
Differential revision: https://reviews.llvm.org/D43457

llvm-svn: 327381
2018-03-13 10:19:50 +00:00
Jonas Paulsson 5612bb292c [CodeGenPrepare] Respect endianness in splitMergedValStore.
splitMergedValStore will split a store into two if target prefers this, or if
-force-split-store is passed.

This patch adds the missing handling for endianness in this function along
with a test case.

Review: Eli Friedman
https://reviews.llvm.org/D44396

llvm-svn: 327375
2018-03-13 08:36:20 +00:00
Max Kazantsev 2371e0a1c8 [SCEV][NFC] Smarter implementation of isAvailableAtLoopEntry
isAvailableAtLoopEntry duplicates logic of `properlyDominates` after checking invariance.
This patch replaces this logic with invocation of this method which is more profitable
because it supports caching.

Differential Revision: https://reviews.llvm.org/D43997

llvm-svn: 327373
2018-03-13 07:46:06 +00:00
Clement Courbet 9f0b3170bc [MergeICmps] Make sure that the comparison only has one use.
Summary: Fixes PR36557.

Reviewers: trentxintong, spatel

Subscribers: mstorsjo, llvm-commits

Differential Revision: https://reviews.llvm.org/D44083

llvm-svn: 327372
2018-03-13 07:05:55 +00:00
Yonghong Song 82bf8bcb4f bpf: Enhance debug information for peephole optimization passes
Add more debug information for peephole optimization passes.

These would only be enabled for debug version binary and could help
analyzing why some optimization opportunities were missed.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327371
2018-03-13 06:47:07 +00:00
Yonghong Song e91802f336 bpf: New post-RA peephole optimization pass to eliminate bad RA codegen
This new pass eliminate identical move:

  MOV rA, rA

This is particularly possible to happen when sub-register support
enabled. The special type cast insn MOV_32_64 involves different
register class on src (i32) and dst (i64), RA could generate useless
instruction due to this.

This pass also could serve as the bast for further post-RA optimization.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327370
2018-03-13 06:47:06 +00:00
Yonghong Song 80b882ecc5 bpf: Don't expand BSWAP on i32, promote it
Currently, there is no ALU32 bswap support in eBPF ISA.

BSWAP on i32 was set to EXPAND which would need about eight instructions
for single BSWAP.

It would be more efficient to promote it to i64, then doing BSWAP on i64.
For eBPF programs, most of the promotion are zero extensions which are
likely be elimiated later by peephole optimizations.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327369
2018-03-13 06:47:05 +00:00
Yonghong Song 1d28a759d9 bpf: Support subregister definition check on PHI node
This patch relax the subregister definition check on Phi node.
Previously, we just cancel the optimizatoin when the definition is Phi
node while actually we could further check the definitions of incoming
parameters of PHI node.

This helps catch more elimination opportunities.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327368
2018-03-13 06:47:04 +00:00
Yonghong Song c88bcdec43 bpf: Extends zero extension elimination beyond comparison instructions
The current zero extension elimination was restricted to operands of
comparison. It actually could be extended to more cases.

For example:

  int *inc_p (int *p, unsigned a)
  {
    return p + a;
  }

'a' will be promoted to i64 during addition, and the zero extension could
be eliminated as well.

For the elimination optimization, it should be much better to start
recognizing the candidate sequence from the SRL instruction instead of J*
instructions.

This patch makes it an generic zero extension elimination pass instead of
one restricted with comparison.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327367
2018-03-13 06:47:03 +00:00
Yonghong Song 905d13c123 bpf: J*_RR should check both operands
There is a mistake in current code that we "break" out the optimization
when the first operand of J*_RR doesn't qualify the elimination. This
caused some elimination opportunities missed, for example the one in the
testcase.

The code should just fall through to handle the second operand.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327366
2018-03-13 06:47:02 +00:00
Yonghong Song 89e47ac671 bpf: Tighten subregister definition check
The current subregister definition check stops after the MOV_32_64
instruction.

This means we are thinking all the following instruction sequences
are safe to be eliminated:

  MOV_32_64 rB, wA
  SLL_ri    rB, rB, 32
  SRL_ri    rB, rB, 32

However, this is *not* true. The source subregister wA of MOV_32_64 could
come from a implicit truncation of 64-bit register in which case the high
bits of the 64-bit register is not zeroed, therefore we can't eliminate
above sequence.

For example, for i32_val, we shouldn't do the elimination:

  long long bar ();

  int foo (int b, int c)
  {
    unsigned int i32_val = (unsigned int) bar();

    if (i32_val < 10)
      return b;
    else
      return c;
  }

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327365
2018-03-13 06:47:00 +00:00
Serguei Katkov bbfbf21ddc Revert [SCEV] Fix isKnownPredicate
It is a revert of rL327362 which causes build bot failures with assert like

Assertion `isAvailableAtLoopEntry(RHS, L) && "RHS is not available at Loop Entry"' failed.

llvm-svn: 327363
2018-03-13 06:36:00 +00:00
Serguei Katkov b05574c0d3 [SCEV] Fix isKnownPredicate
IsKnownPredicate is updated to implement the following algorithm
proposed by @sanjoy and @mkazantsev :
isKnownPredicate(Pred, LHS, RHS) {
  Collect set S all loops on which either LHS or RHS depend.
  If S is non-empty
    a. Let PD be the element of S which is dominated by all other elements of S
    b. Let E(LHS) be value of LHS on entry of PD.
       To get E(LHS), we should just take LHS and replace all AddRecs that
       are attached to PD on with their entry values.
       Define E(RHS) in the same way.
    c. Let B(LHS) be value of L on backedge of PD.
       To get B(LHS), we should just take LHS and replace all AddRecs that
       are attached to PD on with their backedge values.
       Define B(RHS) in the same way.
    d. Note that E(LHS) and E(RHS) are automatically available on entry of PD,
       so we can assert on that.
    e. Return true if isLoopEntryGuardedByCond(Pred, E(LHS), E(RHS)) &&
                      isLoopBackedgeGuardedByCond(Pred, B(LHS), B(RHS))
Return true if Pred, L, R is known from ranges, splitting etc.
}
This is follow-up for https://reviews.llvm.org/D42417.

Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43507

llvm-svn: 327362
2018-03-13 06:10:27 +00:00
Vlad Tsyrklevich aab6000684 Reland r327041: [ThinLTO] Keep available_externally symbols live
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.

Marking available_externally functions dead is incorrect, the functions
are used though the function definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.

(Reland with a suspected fix for a unit test failure I haven't been able
to reproduce locally)

Reviewers: pcc, tejohnson

Reviewed By: tejohnson

Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D43690

llvm-svn: 327360
2018-03-13 05:08:48 +00:00
Adam Nemet ade40dd3c8 [LTO] Return proper error object rather than null LTOModule
This caused a crash in LTOModule::createInLocalContext.

rdar://37926841

llvm-svn: 327359
2018-03-13 04:37:01 +00:00
Taewook Oh 7646e77969 [ThinLTO] Add funtions in callees metadata to CallGraphEdges
Summary:
If there's a callees metadata attached to the indirect call instruction, add CallGraphEdges to the callees mentioned in the metadata when computing FunctionSummary.

* Why this is necessary:
Consider following code example:
```
(foo.c)
static int f1(int x) {...}
static int f2(int x);
static int (*fptr)(int) = f2;
static int f2(int x) {
  if (x) fptr=f1; return f1(x);
}
int foo(int x) {
  (*fptr)(x); // !callees metadata of !{i32 (i32)* @f1, i32 (i32)* @f2} would be attached to this call.
}

(bar.c)
int bar(int x) {
  return foo(x);
}
```

At LTO time when `foo.o` is imported into `bar.o`, function `foo` might be inlined into `bar` and PGO-guided indirect call promotion will run after that. If the profile data tells that the promotion of `@f1` or `@f2` is beneficial, the optimizer will check if the "promoted" `@f1` or `@f2` (such as `@f1.llvm.0` or `@f2.llvm.0`) is available. Without this patch, importing `!callees` metadata would only add promoted declarations of `@f1` and `@f2` to the `bar.o`, but still the optimizer will assume that the function is available and perform the promotion. The result of that is link failure with `undefined reference to @f1.llvm.0`.

This patch fixes this problem by adding callees in the `!callees` metadata to CallGraphEdges so that their definition would be properly imported into.

One may ask that there already is a logic to add indirect call promotion targets to be added to CallGraphEdges. However, if profile data says "indirect call promotion is only beneficial under a certain inline context", the logic wouldn't work. In the code example above, if profile data is like
```
bar:1000000:100000
  1:100000
    1: foo:100000
        1: 100000 f1:100000
```
, Computing FunctionSummary for `foo.o` wouldn't add `foo->f1` to CallGraphEdges. (Also, it is at least "possible" that one can provide profile data to only link step but not to compilation step).

Reviewers: tejohnson, mehdi_amini, pcc

Reviewed By: tejohnson

Subscribers: inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D44399

llvm-svn: 327358
2018-03-13 04:26:58 +00:00
Craig Topper 80058e30cc [LegalizeTypes] In SplitVecOp_TruncateHelper, use GetSplitVector on the input instead of creating new extract_subvectors.
llvm-svn: 327355
2018-03-13 01:17:40 +00:00
Saleem Abdulrasool f159a389df ObjCARC: address review comments from majnemer
I forgot to incorporate these comments into the original revision.  This
is just code cleanup addressing the feedback, NFC.

llvm-svn: 327351
2018-03-12 23:48:20 +00:00
Volkan Keles 4ecdb44a64 BlockExtractor: Don’t delete functions directly
Blocks may have function calls, so don’t erase functions
directly to avoid erasing a function that has a user.

llvm-svn: 327340
2018-03-12 22:28:18 +00:00
Saleem Abdulrasool 8b342680bf ObjCARC: teach the cloner about funclets
In the case that the CallInst that is being moved has an associated
operand bundle which is a funclet, the move will construct an invalid
instruction.  The new site will have a different token and needs to be
reassociated with the new instruction.

Unfortunately, there is no way to alter the bundle after the
construction of the instruction.  Replace the call instruction cloning
with a custom helper to clone the instruction and reassociate the
funclet token.

llvm-svn: 327336
2018-03-12 21:46:09 +00:00
Simon Pilgrim 7f1b9196cb [X86][Btver2] Clean up formatting/comments in scheduler model. NFCI.
Moved 'special cases' to be closer to other system classes.

llvm-svn: 327332
2018-03-12 21:35:12 +00:00
Vedant Kumar 3a408538f0 Remove the LoopInstSimplify pass (-loop-instsimplify)
LoopInstSimplify is unused and untested. Reading through the commit
history the pass also seems to have a high maintenance burden.

It would be best to retire the pass for now. It should be easy to
recover if we need something similar in the future.

Differential Revision: https://reviews.llvm.org/D44053

llvm-svn: 327329
2018-03-12 20:49:42 +00:00
Michael Zolotukhin a3d8ef0f08 Improve caching scheme in ProvenanceAnalysis.
Summary:
ProvenanceAnalysis::related(A, B) currently memoizes its results, and on big
tests the cache grows too large, and we're spending most of the time
growing/looking through DenseMap.

This patch reduces the size of the cache by normalizing keys first: we do that
by calling GetUnderlyingObjCPtr on the input values. The results of
GetUnderlyingObjCPtr are also memoized in a separate cache.

The patch doesn't bring noticable changes to compile time on CTMark, however
significantly helps one of our internal tests.

Reviewers: gottesmm

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D44270

llvm-svn: 327328
2018-03-12 20:36:25 +00:00
Lei Huang cd4f385795 [PowerPC][NFC] Explicitly state types on FP SDAG patterns in anticipation of adding the f128 type
llvm-svn: 327319
2018-03-12 19:26:18 +00:00
Martin Storsjo 7bc64bd889 [AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str
Differential Revision: https://reviews.llvm.org/D44355

llvm-svn: 327316
2018-03-12 18:47:43 +00:00
Craig Topper ee99aa4dd0 [InstCombine] Replace calls to getNumUses with hasNUses or hasNUsesOrMore
getNumUses is a linear time operation. It traverses the user linked list to the end and counts as it goes. Since we are only interested in small constant counts, we should use hasNUses or hasNUsesMore more that terminate the traversal as soon as it can provide the answer.

There are still two other locations in InstCombine, but changing those would force a rebase of D44266 which if accepted would remove them.

Differential Revision: https://reviews.llvm.org/D44398

llvm-svn: 327315
2018-03-12 18:46:05 +00:00
Craig Topper 3b4ad9c12d [CallSiteSplitting] Use !Instruction::use_empty instead of checking for a non-zero return from getNumUses
getNumUses is a linear operation. It walks a linked list to get a count. So in this case its better to just ask if there are any users rather than how many.

llvm-svn: 327314
2018-03-12 18:40:59 +00:00
Jan Korous 8d315b8ede [NFC] Replace iterators in PrintHelp with range-based for
llvm-svn: 327312
2018-03-12 18:31:07 +00:00
Jan Korous 41e9a15deb [NFC] PrintHelp cleanup
llvm-svn: 327311
2018-03-12 18:30:47 +00:00
Krzysztof Parzyszek 2d08f2ebf8 [Hexagon] Counting leading/trailing bits is cheap
llvm-svn: 327308
2018-03-12 18:18:23 +00:00
Simon Pilgrim f0a9b25394 [X86][Btver2] FSqrt/FDiv reg-reg instructions don't use the AGU.
I love you llvm-mca.

llvm-svn: 327306
2018-03-12 18:12:46 +00:00
Bjorn Pettersson a223f815dd [SelectionDAG] Improve handling of dangling debug info
Summary:
1) Make sure to discard dangling debug info if the variable (or
variable fragment) is mapped to something new before we had a
chance to resolve the dangling debug info.

2) When resolving debug info, make sure to bump the associated
SDNodeOrder to ensure that the DBG_VALUE is emitted after the
instruction that defines the value used in the DBG_VALUE.
This will avoid a debug-use before def scenario as seen in
https://bugs.llvm.org/show_bug.cgi?id=36417.

The new test case, test/DebugInfo/X86/sdag-dangling-dbgvalue.ll,
show some other limitations in how dangling debug info is
handled in the SelectionDAG. Since we currently only support
having one dangling dbg.value per Value, we will end up dropping
debug info when there are more than one variable that is described
by the same "dangling value".

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: aprantl, eraman, llvm-commits, JDevlieghere

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D44369

llvm-svn: 327303
2018-03-12 18:02:39 +00:00
Krzysztof Parzyszek 5d41cc19bd [Hexagon] Subtarget feature to emit one instruction per packet
This adds two features: "packets", and "nvj".

Enabling "packets" allows the compiler to generate instruction packets,
while disabling it will prevent it and disable all optimizations that
generate them. This feature is enabled by default on all subtargets.
The feature "nvj" allows the compiler to generate new-value jumps and it
implies "packets". It is enabled on all subtargets.

The exception is made for packets with endloop instructions, since they
require a certain minimum number of instructions in the packets to which
they apply. Disabling "packets" will not prevent hardware loops from
being generated.

llvm-svn: 327302
2018-03-12 17:47:46 +00:00
Simon Pilgrim a0536c17b9 [X86] Deleting README-MMX.txt now that all tasks have been completed.
MMX buildvectors were improved at rL327247 - new MMX bugs should be raised on bugzilla

llvm-svn: 327300
2018-03-12 17:29:54 +00:00
Dmitry Preobrazhensky d98c97b4f9 [AMDGPU][MC][GFX8] Added BUFFER_STORE_LDS_DWORD Instruction
See bug 36558: https://bugs.llvm.org/show_bug.cgi?id=36558

Differential Revision: https://reviews.llvm.org/D43950

Reviewers: artem.tamazov, arsenm
llvm-svn: 327299
2018-03-12 17:29:24 +00:00
Simon Pilgrim deface9c73 [X86][Btver2] Prefix all scheduler defs. NFCI.
These are all global, so prefix with 'J' to help prevent accidental name clashes with other models.

llvm-svn: 327296
2018-03-12 17:07:08 +00:00
Craig Topper acaba3b402 [X86] Remove use of MVT class from the ShuffleDecode library.
MVT belongs to the CodeGen layer, but ShuffleDecode is used by the X86 InstPrinter which is part of the MC layer. This only worked because MVT is completely implemented in a header file with no other library dependencies.

Differential Revision: https://reviews.llvm.org/D44353

llvm-svn: 327292
2018-03-12 16:43:11 +00:00
Yaxun Liu a99e7d8e44 [AMDGPU] Fix lowering enqueue kernel when kernel has no name
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.

Differential Revision: https://reviews.llvm.org/D44322

llvm-svn: 327291
2018-03-12 16:34:06 +00:00
Simon Pilgrim 6f01e654b4 [X86][Btver2] Extend JWriteResFpuPair to accept resource/uop counts. NFCI.
This allows the single resource classes (VarBlend, MPSAD, VarVecShift) to use the JWriteResFpuPair macro.

llvm-svn: 327289
2018-03-12 16:02:56 +00:00
Simon Pilgrim bc216b440f [X86][Btver2] Use JWriteResFpuPair wrapper for AES/CLMUL/HADD scheduler cases. NFCI.
These are single pipe and have the default resource/uop counts like JWriteResFpuPair so there's no need to handle them separately.

llvm-svn: 327283
2018-03-12 15:29:00 +00:00
Dmitry Preobrazhensky da4a7c01bf [AMDGPU][MC] Corrected GATHER4 opcodes
See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252

Differential Revision: https://reviews.llvm.org/D43874

Reviewers: artem.tamazov, arsenm
llvm-svn: 327278
2018-03-12 15:03:34 +00:00
Jonas Devlieghere b4c85cf4a4 [DebugInfo] Replace unreachable with None
Invalid user input should not trigger assertions and unreachables. We
already return an Option so we should just return None here.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5532

llvm-svn: 327274
2018-03-12 14:45:08 +00:00
Sam McCall bbfe434185 [Hexagon] fix 'must explicitly initialize the const member' error which clang 3.8 emits
llvm-svn: 327273
2018-03-12 14:40:48 +00:00
Matt Arsenault 7b9ed89dcf AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELT
llvm-svn: 327269
2018-03-12 13:35:53 +00:00
Matt Arsenault c0aefd561e AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES
llvm-svn: 327268
2018-03-12 13:35:49 +00:00
Matt Arsenault 503afda95f AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal
llvm-svn: 327267
2018-03-12 13:35:43 +00:00
Simon Dardis 1f0fe56460 [mips] Split out ASEPredicate from InsnPredicates (NFC)
This simplifies tagging instructions with the correct ISA and ASE, albeit making
instruction definitions a bit more verbose.

Reviewers: atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D44299

llvm-svn: 327265
2018-03-12 13:16:12 +00:00
Nico Weber 73a699e592 MC intel asm parser: Allow @ at the start of function names.
Ports parts of r193000 to the intel parser. Fixes part of PR36676.

https://reviews.llvm.org/D44359

llvm-svn: 327262
2018-03-12 12:47:27 +00:00
Simon Pilgrim 6618e2a09c [X86][SSE] createVariablePermute - PSHUFB requires SSSE3 not just SSE3
llvm-svn: 327259
2018-03-12 12:30:04 +00:00
Jonas Devlieghere 2b8b90a768 Fix compilation on Darwin with expensive checks.
After r327219 was landed, the bot with expensive checks on GreenDragon
started failing. The problem was missing symbols `regex_t` and
`regmatch_t` in `xlocale/_regex.h`. The latter was included because
after the change in r327219, `random` is needed, which transitively
includes `xlocale.h.` which in turn conditionally includes
`xlocale/_regex.h` when _REGEX_H_ is defined. Because this is the header
guard in `regex_impl.h` and because `regex_impl.h` was included before
the other LLVM includes, `xlocale/_regex.h` was included without the
necessary types being available.

This commit fixes this by moving the include of `regex_impl.h` all the
way down. I also added a comment to stress the significance of its
position.

llvm-svn: 327256
2018-03-12 11:01:05 +00:00
Eugene Leviant 19e238746b [ThinLTO] Recommit of import global variables
This wasreverted in r326638 due to link problems and fixed
afterwards

llvm-svn: 327254
2018-03-12 10:30:50 +00:00
Justin Lebar 24b6640b1b Back out "Re-land: Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions."
This reverts r326908, originally landed as D44102.

Reverted for causing performance regressions on x86.  (These regressions
are not yet understood.)

llvm-svn: 327252
2018-03-12 09:26:09 +00:00
Craig Topper 7cc1b1fc84 [X86] Don't compute known bits twice for the same SDValue in LowerMUL.
We called MaskedValueIsZero with two different masks, but underneath that calls computeKnownBits before applying the mask. This means we compute the same known bits twice due to the two calls. Instead just call computeKnownBits directly and apply the two masks ourselves.

llvm-svn: 327251
2018-03-12 05:35:02 +00:00
Serguei Katkov a20e05bb94 [CGP] Fix the remove of matched phis in complex addressing mode
When we replace the Phi we created with matched ones it is possible that
there are two identical phi nodes in IR. And matcher is smart enough to find that
new created phi matches both of them. So we try to replace our phi node with
matched ones twice and what is bad we delete our phi node twice causing a crash.

As soon as we found that we have two identical Phi nodes it makes sense to do
a clean-up and replace one phi node by other one.
The patch implements it.

Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43758

llvm-svn: 327250
2018-03-12 03:50:07 +00:00
Simon Pilgrim d09cc9c62c [X86][MMX] Support MMX build vectors to avoid SSE usage (PR29222)
64-bit MMX vector generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type.

This patch creates a MMX vector from MMX source values, taking the lowest element from each source and constructing broadcasts/build_vectors with direct calls to the MMX PUNPCKL/PSHUFW intrinsics.

We're missing a few consecutive load combines that could be handled in a future patch if that would be useful - my main interest here is just avoiding a lot of the MMX/SSE crossover.

Differential Revision: https://reviews.llvm.org/D43618

llvm-svn: 327247
2018-03-11 19:22:13 +00:00
Simon Pilgrim 30f74c14ff [X86][AVX] createVariablePermute - scale v16i16 variable permutes to use v32i8 codegen
XOP was already doing this, and now AVX performs v32i8 variable permutes as well.

llvm-svn: 327245
2018-03-11 17:23:54 +00:00
Simon Pilgrim b306501796 [X86][AVX] createVariablePermute - widen permutes for cases where the source vector is wider than the destination type
llvm-svn: 327244
2018-03-11 17:00:46 +00:00
Simon Pilgrim 9a5d0c7540 [X86][AVX] createVariablePermute - use PSHUFB+PCMPGT+SELECT for v32i8 variable permutes
Same as the VPERMILPS/VPERMILPD approach for v8f32/v4f64 cases, rely on PSHUFB using bits[3:0] for indexing - we can ignore the sign bit (zero element) as those index vector values are considered undefined. The select between the lo/hi permute results based on the index size.

llvm-svn: 327242
2018-03-11 16:28:11 +00:00
Simon Pilgrim d2fbd87ce8 Fix for buildbots which didn't like makeArrayRef with initializer lists.
llvm-svn: 327241
2018-03-11 14:31:55 +00:00
Simon Pilgrim e60afdf9eb [X86][SSE] Generalized SplitBinaryOpsAndApply to SplitOpsAndApply to support any number of ops.
I've kept SplitBinaryOpsAndApply as a wrapper to avoid a lot of makeArrayRef code.

llvm-svn: 327240
2018-03-11 14:04:53 +00:00
Simon Pilgrim f9cc80d218 [X86][AVX] createVariablePermute - use 2xVPERMIL+PCMPGT+SELECT for v8i32/v8f32 and v4i64/v4f64 variable permutes
As VPERMILPS/VPERMILPD only selects elements based on the bits[1:0]/bit[1] then we can permute both the (repeated) lo/hi 128-bit vectors in each case and then select between these results based on whether the index was for for lo/hi.

For v4i64/v4f64 this avoids some rather nasty v4i64 multiples on the AVX2 implementation, which seems to be worse than the extra port5 pressure from the additional shuffles/blends.

llvm-svn: 327239
2018-03-11 11:52:26 +00:00
Simon Pilgrim 2565bd421e [X86][AVX512] createVariablePermute - Non-VLX targets can widen v4i64/v8f64 variable permutes to v8i64/v8f64
Permutes in the upper elements will be undefined, but they will be discarded anyway.

llvm-svn: 327238
2018-03-11 11:19:19 +00:00
Simon Pilgrim 64b899f0f3 [x86][SSE] Add widenSubVector helper. NFCI.
Helper function to insert a subvector into the bottom elements of a larger zero/undef vector with the same scalar type.

I've converted a couple of INSERT_SUBVECTOR calls to use it, there are plenty more although in some cases I was worried it might make the code more ambiguous. 

llvm-svn: 327236
2018-03-11 10:50:48 +00:00
George Burgess IV 44477c61a0 [MemorySSA] Fix comment + remove redundant dyn_casts; NFC
StartingAccess is already a MemoryUseOrDef.

llvm-svn: 327235
2018-03-11 04:16:12 +00:00