114763 Commits

Author SHA1 Message Date
Sanjay Patel 284ba0c18f [ValueTracking] allow undef elements when matching vector abs
llvm-svn: 336111
2018-07-02 14:43:40 +00:00
David Stenberg 23bba56fce [CodeGen] Make block removal order deterministic in CodeGenPrepare
Summary:
Replace use of a SmallPtrSet with a SmallSetVector to make the worklist
iteration order deterministic. This is done as the order the blocks are
removed may affect whether or not PHI nodes in successor blocks are
removed.

For example, consider the following case where %bb1 and %bb2 are
removed:

    bb1:
      br i1 undef, label %bb3, label %bb4
    bb2:
      br i1 undef, label %bb4, label %bb3
    bb3:
      pv1 = phi type [ undef, %bb1 ], [ undef, %bb2], [ v0, %other ]
      br label %bb4
    bb4:
      pv2 = phi type [ undef, %bb1 ], [ undef, %bb2 ],
                     [ pv1, %bb3 ], [ v0, %other ]

If %bb2 is removed before %bb1, the incoming values from %bb1 and %bb2
to pv1 will be removed before %bb1 is removed as a predecessor to %bb4.
The pv1 node will thus be optimized out (to v0) at the time %bb1 is
removed as a predecessor to %bb4, leaving the blocks as follows when
the incoming value from %bb1 has been removed:

    bb3: ; pv1 optimized out, incoming value to pv2 is v0
      br label %bb4
    bb4:
      pv2 = phi type [ v0, %bb3 ], [ v0, %other ]

The pv2 PHI node will be optimized away by removePredecessor() as all
incoming values are identical.

In case %bb2 is removed after %bb1, pv1 will not be optimized out at the
time %bb2 is removed as a predecessor to %bb4, leaving the blocks as
follows when the incoming value from %bb2 to pv2 has been removed:

    bb3:
      pv1 = phi type [ undef, %bb2 ], [ v0, %other ]
      br label %bb4
    bb4:
      pv2 = phi type [ pv1, %bb3 ], [ v0, %other ]

The pv2 PHI node will thus not be removed in this case, ultimately
leading to the following output

    bb3: ; pv1 optimized out, incoming value to pv2 is v0
      br label %bb4
    bb4:
      pv2 = phi type [ v0, %bb3 ], [ v0, %other ]

I have not looked into changing DeleteDeadBlock() so that the redundant
PHI nodes are removed.

I have not added a test case, as I was not able to create a particularly
small (and not messy) reproducer. This is likely due to SmallPtrSet
behaving deterministically when in small mode.
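
The determinism argument above can be illustrated with a small standard-library sketch. `OrderedWorklist` is a hypothetical stand-in for a set-vector, not the LLVM `SmallSetVector` class: it deduplicates like a set but iterates in insertion order, so the removal order no longer depends on how a pointer-keyed set happens to lay out its elements. Blocks are modeled as plain integer ids, which is an assumption made only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a set-vector: unique elements, iteration in
// insertion order. This is not the LLVM implementation.
template <typename T>
class OrderedWorklist {
public:
  // Returns true if the value was newly inserted.
  bool insert(const T &Value) {
    if (!Seen.insert(Value).second)
      return false;
    Order.push_back(Value);
    return true;
  }
  const std::vector<T> &items() const { return Order; }
  std::size_t size() const { return Order.size(); }

private:
  std::unordered_set<T> Seen; // membership only, never iterated
  std::vector<T> Order;       // defines the iteration order
};

// Returns the order in which candidate "blocks" (ids) would be visited.
std::vector<int> removalOrder(const std::vector<int> &Candidates) {
  OrderedWorklist<int> WL;
  for (int BB : Candidates)
    WL.insert(BB);
  return WL.items();
}
```

With this structure, visiting the same candidates in the same order always yields the same removal order, regardless of the values involved.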

Reviewers: void, dexonsmith, spatel, skatkov, fhahn, bkramer, nhaehnle

Reviewed By: fhahn

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D48369

llvm-svn: 336109
2018-07-02 14:23:48 +00:00
Alex Bradbury c48908781d [X86] Use addAliasForDirective to support the .word directive (reland)
The X86 asm parser currently has custom parsing logic for .word. Rather than
use this custom logic, we can just use addAliasForDirective to enable the
reuse of AsmParser::parseDirectiveValue.

See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon
(rL332607) backends.

Differential Revision: https://reviews.llvm.org/D47004

This is a fixed reland of rL336100. This should have been caught in 
pre-commit testing so apologies for the noise.

llvm-svn: 336104
2018-07-02 13:49:52 +00:00
Alex Bradbury c000e4dcb5 Revert r336100
This was a bad change: .word is 2 bytes on x86.

llvm-svn: 336103
2018-07-02 13:43:45 +00:00
Simon Pilgrim d5fb50e3bf [SLPVectorizer] Remove nullptr early-outs from Instruction::ShuffleVector getEntryCost
This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure.

llvm-svn: 336102
2018-07-02 13:41:29 +00:00
Alex Bradbury 42485ec9ca [X86] Use addAliasForDirective to support the .word directive
The X86 asm parser currently has custom parsing logic for .word. Rather than 
use this custom logic, we can just use addAliasForDirective to enable the 
reuse of AsmParser::parseDirectiveValue.

See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon 
(rL332607) backends.

Differential Revision: https://reviews.llvm.org/D47004

llvm-svn: 336100
2018-07-02 13:37:15 +00:00
Florian Hahn 4ebba909a2 Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters.
This version contains a fix to add values for which the state in ParamState
changes to the worklist if the state in ValueState did not change. To avoid
adding the same value multiple times, mergeInValue returns true if it added
the value to the worklist. The value is added to the worklist depending on
its state in ValueState.

Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to using
ValueLatticeElement for all values.

Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
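
As a rough illustration of how range information can decide a comparison, here is a minimal sketch. `IntRange` and `foldSignedLessThan` are hypothetical names for this example and are not the IPSCCP solver or `ValueLatticeElement` API; the range is assumed to be a non-empty inclusive interval.

```cpp
#include <cassert>
#include <optional>

// Hypothetical inclusive integer range [Lo, Hi] known for a parameter.
struct IntRange {
  long Lo;
  long Hi;
};

// Tries to decide "X < C" for every X in the range.
// Returns true/false when the range decides the comparison, nullopt otherwise.
std::optional<bool> foldSignedLessThan(const IntRange &R, long C) {
  assert(R.Lo <= R.Hi && "empty ranges are not modeled in this sketch");
  if (R.Hi < C)
    return true;  // every value in the range is below C
  if (R.Lo >= C)
    return false; // no value in the range is below C
  return std::nullopt; // the range straddles C, nothing can be concluded
}
```

A parameter known to lie in [0, 9] makes `X < 10` fold to true without the parameter being a single constant.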

Reviewers: dberlin, mssimpso, davide, efriedma

Reviewed By: mssimpso

Differential Revision: https://reviews.llvm.org/D43762

llvm-svn: 336098
2018-07-02 12:44:04 +00:00
Simon Pilgrim 265793d52a [SLPVectorizer] Fix alternate opcode + shuffle cost function to correctly handle SK_Select patterns.
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.

This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...

llvm-svn: 336095
2018-07-02 11:28:01 +00:00
Simon Pilgrim 409bd5f487 [SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for getEntryCost/vectorizeTree. NFCI.
Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators.

llvm-svn: 336092
2018-07-02 10:54:19 +00:00
Sander de Smalen 8d4c01a702 [AArch64][SVE] Asm: Support for (SQ)INCP/DECP (scalar, vector)
Increments/decrements the result with the number of active bits
from the predicate.

The inc/dec variants added are:
- incp   x0, p0.h     (scalar)
- incp   z0.h, p0     (vector)

The unsigned saturating inc/dec variants added are:
- uqincp x0, p0.h     (scalar)
- uqincp w0, p0.h     (scalar, 32bit)
- uqincp z0.h, p0     (vector)

The signed saturating inc/dec variants added are:
- sqincp x0, p0.h     (scalar)
- sqincp x0, p0.h, w0 (scalar, 32bit)
- sqincp z0.h, p0     (vector)

llvm-svn: 336091
2018-07-02 10:08:36 +00:00
Sander de Smalen c504101781 [AArch64][SVE] Asm: Support for (saturating) vector INC/DEC instructions.
Increment/decrement vector by multiple of predicate constraint
element count.

The variants added by this patch are:
 - INCH, INCW, INCD

and (saturating):
 - SQINCH, SQINCW, SQINCD
 - UQINCH, UQINCW, UQINCD
 - SQDECH, SQDECW, SQDECD
 - UQDECH, UQDECW, UQDECD

For example:
  incw z0.s, all, mul #4

llvm-svn: 336090
2018-07-02 09:31:11 +00:00
Simon Pilgrim e389434a8a [X86][BtVer2] Added Jaguar FPU Pipe0/1 uop counters to permit basic llvm-exegesis uop testing
We don't have PMCs to cover many of the Jaguar resources but we can at least monitor the FPU issue pipes which give an indication of the fpu uop count, just not the execution resources.

llvm-svn: 336089
2018-07-02 09:15:01 +00:00
Petar Jovanovic 3af2c992dc [Mips][FastISel] Do not duplicate condition while lowering branches
This change fixes the issue that arises when we duplicate condition from
the predecessor block. If the condition's arguments are not considered alive
across the blocks, fast regalloc gets confused and starts generating reloads
from the slots that have never been spilled to. This change also leads to
smaller code given that, unlike on architectures with condition codes, on
Mips we can branch directly on register value, thus we gain nothing by
duplication.

Patch by Dragan Mladjenovic.

Differential Revision: https://reviews.llvm.org/D48642

llvm-svn: 336084
2018-07-02 08:56:57 +00:00
Sander de Smalen 8eea4f1c7d [AArch64][SVE] Asm: Support for vector element compares (immediate).
Compare vector elements with a signed/unsigned immediate, e.g.
  cmpgt   p0.s, p0/z, z0.s, #-16
  cmphi   p0.s, p0/z, z0.s, #127

llvm-svn: 336081
2018-07-02 08:20:59 +00:00
Sander de Smalen 0325e304b9 Reapply r334980 and r334983.
These patches were previously reverted as they led to
buildbot time-outs caused by a large switch statement in
printAliasInstr when using UBSan and O3.  The issue has
been addressed with a workaround (r335525).

llvm-svn: 336079
2018-07-02 07:34:52 +00:00
Craig Topper e06dabd3ca [X86] Put some cases in switch statements back on one line to be more compact and make it easier to see the similarities. NFC
It looks like someone ran clang-format over this entire file which reformatted these switches into a multiline form. But I think the single line form is more useful here.

llvm-svn: 336077
2018-07-02 06:42:42 +00:00
Craig Topper 0661f67296 [X86] Remove FMA3Info DenseMap. Break into sorted tables that we can binary search.
I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier.

Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but it's probably not worth it right now.

llvm-svn: 336075
2018-07-02 06:23:39 +00:00
QingShan Zhang 3b2aa2b4b4 [PowerPC] Don't treat it as a pre-inc candidate if the displacement isn't a multiple of 4 for i64 pre-inc load/store
For the case below, pre-inc prep thinks the bucket is a good candidate for pre-inc, but the 64-bit integer load/store with update (pre-inc) instructions on Power require the displacement field to be DS-form (a multiple of 4). Since the candidate can't satisfy that constraint, we have to do some fix-ups later. In the example below the original loads/stores could already be well-formed, so the transformation makes things worse.

unsigned long long result = 0;
unsigned long long foo(char *p, unsigned long long n) {
  for (unsigned long long i = 0; i < n; i++) {
    unsigned long long x1 = *(unsigned long long *)(p - 50000 + i);
    unsigned long long x2 = *(unsigned long long *)(p - 61024 + i);
    unsigned long long x3 = *(unsigned long long *)(p - 62048 + i);
    unsigned long long x4 = *(unsigned long long *)(p - 64096 + i);
    result *= x1 * x2 * x3 * x4;
  }
  return result;
}
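
The DS-form constraint mentioned above can be sketched as a simple predicate. `isValidDSFormDisplacement` is a hypothetical helper written for illustration and is not the check used in PPCLoopPreIncPrep; it assumes the usual DS-form encoding, in which the displacement is a signed 16-bit value with its low two bits equal to zero.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: a DS-form displacement must fit in a signed 16-bit field and be a
// multiple of 4. Hypothetical helper, not the pass's actual code.
bool isValidDSFormDisplacement(std::int64_t Disp) {
  const bool FitsInt16 = Disp >= -32768 && Disp <= 32767;
  return FitsInt16 && (Disp % 4 == 0);
}
```

A candidate whose displacement fails such a check would need extra address arithmetic, which is the fix-up cost the message refers to.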

Patch by jedilyn(Kewen Lin).

Differential Revision: https://reviews.llvm.org/D48813 

llvm-svn: 336074
2018-07-02 05:46:09 +00:00
Piotr Padlewski 5b3db45e8f Implement strip.invariant.group
Summary:
This patch introduces a new intrinsic,
strip.invariant.group, that was described in the
RFC: Devirtualization v2

Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar

Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47103

Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
2018-07-02 04:49:30 +00:00
Eric Christopher 53054141a7 Add an entry for rodata constant merge sections to the default
section flags in the ELF assembler. This matches the defaults
given in the rest of MC.

Fixes PR37997 where we couldn't assemble our own assembly output
without warnings.

llvm-svn: 336072
2018-07-02 00:16:39 +00:00
Craig Topper c004aa6c5f [X86] Remove the places that return nullptr from X86InstrInfo::commuteInstructionImpl.
findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl.

llvm-svn: 336070
2018-07-01 23:27:41 +00:00
Simon Pilgrim 3dafb553d9 [SLPVectorizer] Call InstructionsState.isOpcodeOrAlt with Instruction instead of an opcode. NFCI.
llvm-svn: 336069
2018-07-01 20:22:46 +00:00
Simon Pilgrim ef9c97c343 [SLPVectorizer] Replace sameOpcodeOrAlt with InstructionsState.isOpcodeOrAlt helper. NFCI.
This is a basic step towards matching more general instructions types than just opcodes.

llvm-svn: 336068
2018-07-01 20:07:30 +00:00
Craig Topper 4d8ec92fb0 [X86][Disassembler] Remove TYPE_BNDR from translateImmediate.
I've checked the disassembler tables and this shouldn't be reachable. Which is good, since if it was reachable there should have been a 'return' after the addOperand line.

llvm-svn: 336066
2018-07-01 17:50:29 +00:00
Simon Pilgrim 77d2067677 [SLPVectorizer] Use InstructionsState Op/Alt opcodes directly. NFCI.
llvm-svn: 336063
2018-07-01 13:41:58 +00:00
David Green 963401d2be [UnrollAndJam] New Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

  for i..
    ForeBlocks(i)
    for j..
      SubLoopBlocks(i, j)
    AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

  for i... i+=2
    ForeBlocks(i)
    ForeBlocks(i+1)
    for j..
      SubLoopBlocks(i, j)
      SubLoopBlocks(i+1, j)
    AftBlocks(i)
    AftBlocks(i+1)
  Remainder Loop

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
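
The transformation described above can be written out by hand on a concrete loop nest. The functions below are an illustrative source-level sketch, not output of the pass: the example computation (a sum of products) and all names are assumptions chosen so that the inner-loop load of `B[j]` is visibly shared after jamming.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop nest: Out[i] = sum over j of A[i] * B[j].
std::vector<long> rowSumsPlain(const std::vector<long> &A,
                               const std::vector<long> &B) {
  std::vector<long> Out(A.size(), 0);
  for (std::size_t i = 0; i < A.size(); ++i) {
    long Acc = 0;                         // ForeBlocks(i)
    for (std::size_t j = 0; j < B.size(); ++j)
      Acc += A[i] * B[j];                 // SubLoopBlocks(i, j)
    Out[i] = Acc;                         // AftBlocks(i)
  }
  return Out;
}

// Same computation with the outer loop unrolled by 2 and the two inner
// loops jammed into one.
std::vector<long> rowSumsUnrollAndJam(const std::vector<long> &A,
                                      const std::vector<long> &B) {
  std::vector<long> Out(A.size(), 0);
  std::size_t i = 0;
  for (; i + 1 < A.size(); i += 2) {
    long Acc0 = 0;                        // ForeBlocks(i)
    long Acc1 = 0;                        // ForeBlocks(i+1)
    for (std::size_t j = 0; j < B.size(); ++j) {
      const long Bj = B[j];               // load shared by both iterations
      Acc0 += A[i] * Bj;                  // SubLoopBlocks(i, j)
      Acc1 += A[i + 1] * Bj;              // SubLoopBlocks(i+1, j)
    }
    Out[i] = Acc0;                        // AftBlocks(i)
    Out[i + 1] = Acc1;                    // AftBlocks(i+1)
  }
  for (; i < A.size(); ++i) {             // remainder loop
    long Acc = 0;
    for (std::size_t j = 0; j < B.size(); ++j)
      Acc += A[i] * B[j];
    Out[i] = Acc;
  }
  return Out;
}
```

Here the reordering is trivially safe because each outer iteration writes a distinct element of `Out`; in general that is what the dependence analysis has to establish.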

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 336062
2018-07-01 12:47:30 +00:00
Eugene Leviant 6e4134459b [Evaluator] Improve evaluation of call instruction
Recommit of r335324 after buildbot failure fix

llvm-svn: 336059
2018-07-01 11:02:07 +00:00
Craig Topper a2d30b3134 [X86] Remove unnecessary include. NFC
Leftover from when the pass contained a DenseMap before it switched to binary search.

llvm-svn: 336057
2018-07-01 05:54:22 +00:00
Craig Topper 4e78213ae4 [X86] Move the memory unfolding table creation into its own class and make it a ManagedStatic.
Also move the static folding tables, their search functions and the new class into new cpp/h files.

The unfolding table is effectively static data. It's just a different ordering and a subset of the static folding tables.

By putting it in a separate ManagedStatic we ensure we only have one copy instead of one per X86InstrInfo object. This way also makes it only get initialized when really needed.
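
The "one lazily built copy" idea can be approximated in standard C++ with a function-local static. This is only an analogy and does not use `llvm::ManagedStatic` (which additionally ties destruction to `llvm_shutdown`); `getSharedTable`, `HypotheticalInstrInfo`, and the table contents are made up for the example.

```cpp
#include <cassert>
#include <vector>

// Counts how often the table is built; present only to make the sketch
// observable.
int TableBuildCount = 0;

std::vector<int> buildHypotheticalTable() {
  ++TableBuildCount;
  return {3, 1, 2};
}

// Built on first use, then shared by every caller.
const std::vector<int> &getSharedTable() {
  static const std::vector<int> Table = buildHypotheticalTable();
  return Table;
}

// Stand-in for an object that previously would have owned its own copy.
struct HypotheticalInstrInfo {
  const std::vector<int> &table() const { return getSharedTable(); }
};
```

Creating several `HypotheticalInstrInfo` objects costs nothing until one of them asks for the table, and all of them then see the same instance.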

llvm-svn: 336056
2018-07-01 05:47:49 +00:00
Craig Topper 84199deb17 [X86] Move the X86InstrFMA3Info class into the cpp file. Expose only a getFMA3Group free function. NFCI
The class only exists to hold a DenseMap and is only created as a ManagedStatic. It used to expose a single static method that outside code was expected to use.

This patch moves that static function out of the class and moves it implementation into the cpp file. It can now access the ManagedStatic directly by name without the need for the other static method that accessed the ManagedStatic.

llvm-svn: 336055
2018-06-30 22:38:42 +00:00
Craig Topper 731740744f [X86] Remove the AsmName from the HAX,HDX,HCX,HBX,HSI,HDI,HBP,HSP,HIP artificial registers so they can't be parsed by the assembly parser.
There are no instructions that use them so they weren't causing any bad matches. But they weren't being diagnosed as "invalid register name" if they were used and would instead trigger some form of invalid operand.

llvm-svn: 336054
2018-06-30 22:38:41 +00:00
Craig Topper 1b7b9b8596 [X86] Use MVT::i8 for scalar shift amounts since that is what they ultimately need to legalize to.
I believe all of these are constants so legalizing them should be pretty trivial, but this saves a step.

In one case it looks like we may have been creating a shift amount larger than the shift input itself.

llvm-svn: 336052
2018-06-30 18:30:31 +00:00
Craig Topper 5f28d50d27 [X86] When combining load to BZHI, make sure we create the shift instruction with an i8 type.
This combine runs pretty late and causes us to introduce a shift after the op legalization phase has run. We need to be sure we create the shift with the proper type for the shift amount. If we don't do this, we will still re-legalize the operation properly, but we won't get a chance to fully optimize the truncate that gets inserted.

So this patch adds the necessary truncate when the shift is created. I've also narrowed the subtract that gets created to always be an i32 type. The truncate would have triggered SimplifyDemandedBits to optimize it anyway. But using a more appropriate VT here is free and saves an optimization step.

llvm-svn: 336051
2018-06-30 17:49:42 +00:00
Simon Pilgrim fae337704e [DAGCombiner] Handle correctly non-splat power of 2 -1 divisor (PR37119)
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1; -1 is the only power-of-two value for which the sdiv expansion recipe breaks.

Thanks to @zvi for the original patch.

Differential Revision: https://reviews.llvm.org/D45806

llvm-svn: 336048
2018-06-30 12:22:55 +00:00
Tom Stellard eebbfc2809 AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal.
Summary:
We could split sizes that are not power of two into smaller sized
G_IMPLICIT_DEF instructions, but this ends up generating
G_MERGE_VALUES instructions which we then have to handle in the instruction
selector.  Since G_IMPLICIT_DEF is really a no-op it's easier just to
keep everything that can fit into a register legal.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48777

llvm-svn: 336041
2018-06-30 04:09:44 +00:00
Jessica Paquette 8bda1881ca [MachineOutliner] Add support for target-default outlining.
This adds functionality to the outliner that allows targets to
specify certain functions that should be outlined from by default.

If a target supports default outlining, then it specifies that in
its TargetOptions. In the case that it does, and the user hasn't
specified that they *never* want to outline, the outliner will
be added to the pass pipeline and will run on those default functions.

This is a preliminary patch for turning the outliner on by default
under -Oz for AArch64.

https://reviews.llvm.org/D48776

llvm-svn: 336040
2018-06-30 03:56:03 +00:00
Craig Topper 59f2f38fe0 [X86] Remove masking from avx512 rotate intrinsics. Use select in IR instead.
llvm-svn: 336035
2018-06-30 01:32:04 +00:00
Chandler Carruth 7c557f804d [instsimplify] Move the instsimplify pass to use more obvious file names
and directory.

Also cleans up all the associated naming to be consistent and removes
the public access to the pass ID which was unused in LLVM.

Also runs clang-format over parts that changed, which generally cleans
up a bunch of formatting.

This is in preparation for doing some internal cleanups to the pass.

Differential Revision: https://reviews.llvm.org/D47352

llvm-svn: 336028
2018-06-29 23:36:03 +00:00
Zachary Turner 68e1919d14 [CodeView] Correctly compute the name of S_PROCREF symbols.
We have a function which switches on the type of a symbol record
to return a hardcoded offset into the record that contains the
symbol name.  Not all symbols have names to begin with, and for
those records we return -1 for the offset.

Names are used for various things.  Importantly for this particular
bug, a hash of the record name is used as a key for certain hash
tables which are serialized into the PDB file.  One of these hash
tables is for the global symbol stream, which is basically a
collection of S_PROCREF symbols which contain the name of the
symbol, a module, and an address offset.

However, for S_PROCREF symbols, the function to return the offset
of the name was returning -1: basically it wasn't implemented.
As a result, all global symbols were hashing to the same
value; essentially, it was as if every single global symbol's name
was the empty string.

This manifests in the VS debugger when you try to call a function
(global or member, doesn't matter) through the immediate window
and the debugger simply reports an error because it can't find the
function.  This makes perfect sense, because it is hashing the name
for real, looking in the global symbol hash table, and there is only
1 entry there which corresponds to a symbol whose name is the empty
string.

Fixing this fixes the MSVC debugger in this case.

llvm-svn: 336024
2018-06-29 22:19:02 +00:00
Heejin Ahn a86152d0a7 [WebAssembly] Comment out a switch block in ISelDAGToDAG
Summary: Fixes PR37977.

Reviewers: RKSimon

Subscribers: dschuff, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48737

llvm-svn: 336017
2018-06-29 21:19:22 +00:00
Alina Sbirlea da1e80feb7 [MemorySSA] Add APIs to MemoryPhis to delete incoming blocks/values, and an updater API to remove blocks.
Summary:
MemoryPhis now have APIs analogous to BB Phis to remove an incoming value/block.
The MemorySSAUpdater uses the above APIs when updating MemorySSA given a set of dead blocks about to be deleted.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D48396

llvm-svn: 336015
2018-06-29 20:46:16 +00:00
Alex Shlyapnikov 788764ca12 [HWASan] Do not retag allocas before return from the function.
Summary:
Retagging allocas before returning from the function might help
detect use-after-return bugs, but it does not work at all in real
life, when instrumented and non-instrumented code is intermixed.
Consider the following code:

F_non_instrumented() {
  T x;
  F1_instrumented(&x);
  ...
}

{
  F_instrumented();
  F_non_instrumented();
}

- F_instrumented call leaves the stack below the current sp tagged
  randomly for UAR detection
- F_non_instrumented allocates its own vars on that tagged stack,
  not generating any tags, that is the address of x has tag 0, but the
  shadow memory still contains tags left behind by F_instrumented on the
  previous step
- F1_instrumented verifies &x before using it and traps on tag mismatch,
  0 vs whatever tag was set by F_instrumented

Reviewers: eugenis

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D48664

llvm-svn: 336011
2018-06-29 20:20:17 +00:00
Vedant Kumar 69ee62cef8 [LLVMContext] Detecting leaked instructions with metadata
When instructions with metadata are accidentally leaked, the result is a
difficult-to-find memory corruption in ~LLVMContextImpl that leads to
random crashes.

Patch by Arvīds Kokins!

llvm-svn: 336010
2018-06-29 20:13:13 +00:00
Paul Robinson 50f8ca38ee Pass DWARFUnit to verifier by reference not by value. I am moderately
sure this should not cause a memory leak.

llvm-svn: 336007
2018-06-29 19:17:44 +00:00
Sean Fertile cd0d7634f6 Revert "Extend CFGPrinter and CallPrinter with Heat Colors"
This reverts r335996 which broke graph printing in Polly.

llvm-svn: 336000
2018-06-29 17:48:58 +00:00
Matt Arsenault f5be3ad7f8 AMDGPU: Don't use struct type for argument layout
This was introducing unnecessary padding after the explicit
arguments, depending on the alignment of the total struct type.
Also has the side effect of avoiding creating an extra GEP for
the offset from the base kernel argument to the explicit kernel
argument offset.

llvm-svn: 335999
2018-06-29 17:31:42 +00:00
Craig Topper 87b107dd69 [X86] Limit the number of target specific nodes emitted in LowerShiftParts
The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And it's just better to limit the number of places we emit target specific nodes.

The changed test cases still aren't optimal.

Differential Revision: https://reviews.llvm.org/D48619

llvm-svn: 335998
2018-06-29 17:24:07 +00:00
Sean Fertile 3b0535b424 Extend CFGPrinter and CallPrinter with Heat Colors
Extends the CFGPrinter and CallPrinter with heat colors based on heuristics or
profiling information. The colors are enabled by default and can be toggled
on/off for CFGPrinter by using the option -cfg-heat-colors for both
-dot-cfg[-only] and -view-cfg[-only].  Similarly, the colors can be toggled
on/off for CallPrinter by using the option -callgraph-heat-colors for both
-dot-callgraph and -view-callgraph.

Patch by Rodrigo Caetano Rocha!

Differential Revision: https://reviews.llvm.org/D40425

llvm-svn: 335996
2018-06-29 17:13:58 +00:00
Craig Topper 7c96f051d2 [X86] Use a std::vector for the memory unfolding table.
Previously we used a DenseMap which is costly to set up due to multiple full table rehashes as the size increases and causes the table to be reallocated.

This patch changes the table to a vector of structs. We now walk the reg->mem tables and push new entries in the mem->reg table for each row not marked TB_NO_REVERSE. Once all the table entries have been created, we sort the vector. Then we can use a binary search for lookups.
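
The build-then-sort-then-search scheme described above might look roughly like this. The struct names, field names, and the `NoReverse` flag (standing in for TB_NO_REVERSE) are assumptions made for the sketch and do not mirror the actual X86 table types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical row of a reg->mem folding table.
struct FoldEntry {
  unsigned RegOpcode;
  unsigned MemOpcode;
  bool NoReverse; // plays the role of TB_NO_REVERSE
};

// Hypothetical row of the mem->reg unfolding table.
struct UnfoldEntry {
  unsigned MemOpcode;
  unsigned RegOpcode;
};

// Walk the folding table, keep reversible rows, then sort once by key.
std::vector<UnfoldEntry> buildUnfoldTable(const std::vector<FoldEntry> &Fold) {
  std::vector<UnfoldEntry> Table;
  Table.reserve(Fold.size());
  for (const FoldEntry &E : Fold)
    if (!E.NoReverse)
      Table.push_back({E.MemOpcode, E.RegOpcode});
  std::sort(Table.begin(), Table.end(),
            [](const UnfoldEntry &L, const UnfoldEntry &R) {
              return L.MemOpcode < R.MemOpcode;
            });
  return Table;
}

// Binary search; returns nullptr when the opcode has no unfolding entry.
const UnfoldEntry *lookupUnfold(const std::vector<UnfoldEntry> &Table,
                                unsigned MemOpcode) {
  auto It = std::lower_bound(Table.begin(), Table.end(), MemOpcode,
                             [](const UnfoldEntry &E, unsigned Key) {
                               return E.MemOpcode < Key;
                             });
  if (It != Table.end() && It->MemOpcode == MemOpcode)
    return &*It;
  return nullptr;
}
```

Compared with a hash map, the vector is filled with a single allocation and sorted once, and lookups stay logarithmic.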

Differential Revision: https://reviews.llvm.org/D48585

llvm-svn: 335994
2018-06-29 17:11:26 +00:00
Petar Jovanovic cccc236a96 [mips] Support shrink-wrapping
Except for -O0, it's enabled by default.

Patch by Vladimir Stefanovic.

Differential Revision: https://reviews.llvm.org/D47947

llvm-svn: 335989
2018-06-29 16:37:16 +00:00
Stanislav Mekhanoshin 20d4795d93 [AMDGPU] Enable LICM in the BE pipeline
This allows hoisting the code that computes the reciprocal of a
loop-invariant denominator in integer division after the codegen
prepare expansion.

Differential Revision: https://reviews.llvm.org/D48604

llvm-svn: 335988
2018-06-29 16:26:53 +00:00
Jessica Paquette 79917b9686 [MachineOutliner] Add always and never options to -enable-machine-outliner
This is a recommit of r335887, which was erroneously committed earlier.

To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.

This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target default
outlining behaviour.

https://reviews.llvm.org/D48682

llvm-svn: 335986
2018-06-29 16:12:45 +00:00
Alexey Bataev 2a03d4296a [DEBUG_INFO, NVPTX] Do not emit .debug_loc section.
Summary:
.debug_loc section is not supported for NVPTX target. If there is an
object whose location can change during its lifetime, we do not generate
debug location info for this variable.

Reviewers: echristo

Subscribers: jholewinski, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48730

llvm-svn: 335976
2018-06-29 14:23:28 +00:00
Krzysztof Parzyszek ce3a66804a [Hexagon] Remove unused instruction itineraries, NFC
llvm-svn: 335975
2018-06-29 13:55:28 +00:00
Sanjay Patel da66753e01 [InstCombine] enhance shuffle-of-binops to allow different variable ops (PR37806)
This was discussed in D48401 as another improvement for:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have 2 different variable values, then we shuffle (select) those lanes, 
shuffle (select) the constants, and then perform the binop. This eliminates a binop.

The new shuffle uses the same shuffle mask as the existing shuffle, so there's no 
danger of creating a difficult shuffle.

All of the earlier constraints still apply, but we also check for extra uses to 
avoid creating more instructions than we'll remove.

Additionally, we're disallowing the fold for div/rem because that could expose a
UB hole.
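
The equivalence behind the fold can be checked lane by lane with a small model. The sketch below uses `add` as the binop and a boolean mask for the select-style shuffle; `Vec4`, `Mask4`, and the function names are hypothetical and say nothing about how InstCombine represents the pattern.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;
using Mask4 = std::array<bool, 4>; // true: take the lane from the first operand

Vec4 selectLanes(const Mask4 &M, const Vec4 &A, const Vec4 &B) {
  Vec4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = M[I] ? A[I] : B[I];
  return R;
}

Vec4 addLanes(const Vec4 &A, const Vec4 &B) {
  Vec4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Before the fold: two binops followed by a select-style shuffle.
Vec4 beforeFold(const Mask4 &M, const Vec4 &X, const Vec4 &C1, const Vec4 &Y,
                const Vec4 &C2) {
  return selectLanes(M, addLanes(X, C1), addLanes(Y, C2));
}

// After the fold: select the variables, select the constants, one binop.
Vec4 afterFold(const Mask4 &M, const Vec4 &X, const Vec4 &C1, const Vec4 &Y,
               const Vec4 &C2) {
  return addLanes(selectLanes(M, X, Y), selectLanes(M, C1, C2));
}
```

Both selects in `afterFold` reuse the same mask, which matches the message's point that no new, harder shuffle is introduced.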

Differential Revision: https://reviews.llvm.org/D48678

llvm-svn: 335974
2018-06-29 13:44:06 +00:00
Roman Shirokiy 272eac85c7 Fix overconfident assert in ScalarEvolution::isImpliedViaMerge
We can have AddRec with loops having many predecessors.
This changes an assert to an early return.

Differential Revision: https://reviews.llvm.org/D48766

llvm-svn: 335965
2018-06-29 11:46:30 +00:00
Sjoerd Meijer 3b599d75d5 [AArch64] Armv8.4-A: Virtualization system registers
This adds the Secure EL2 extension.

Differential Revision: https://reviews.llvm.org/D48711

llvm-svn: 335962
2018-06-29 11:03:15 +00:00
Simon Pilgrim aab8660e23 [X86][SSE] Support v16i8/v32i8 vector rotations
This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op.
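
A scalar model of a single byte lane shows the 4/2/1 decomposition. This is a sketch of the idea only: the real lowering operates on whole vectors and uses blends rather than branches, and the function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Rotate an 8-bit value left by a fixed amount K in the range 1..7.
static std::uint8_t rotl8Const(std::uint8_t V, unsigned K) {
  return static_cast<std::uint8_t>((V << K) | (V >> (8 - K)));
}

// Variable rotate built from three fixed partial rotates. Each bit of the
// amount selects whether the corresponding partial rotate is applied, which
// is the role a per-lane blend plays in the vector version.
std::uint8_t rotl8Variable(std::uint8_t V, unsigned Amt) {
  Amt &= 7; // rotation amounts are taken modulo the lane width
  if (Amt & 4)
    V = rotl8Const(V, 4);
  if (Amt & 2)
    V = rotl8Const(V, 2);
  if (Amt & 1)
    V = rotl8Const(V, 1);
  return V;
}
```

Three selects per lane are needed here, one per amount bit.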

Unfortunately I haven't found a decent way to share much of this code with the shift equivalent.

Differential Revision: https://reviews.llvm.org/D48655

llvm-svn: 335957
2018-06-29 09:36:39 +00:00
Sjoerd Meijer 195e904002 [ARM][AArch64] Armv8.4-A Enablement
Initial patch adding assembly support for Armv8.4-A.

Besides adding v8.4 as a supported architecture to the usual places, this also
adds target features for the different crypto algorithms. Armv8.4-A introduced
new crypto algorithms, made them optional, and allows different combinations:

- none of the v8.4 crypto functions are supported, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- the v8.4 SHA512 and SHA3 support is implemented, in this case the Armv8.0
  SHA1 and SHA2 instructions must also be implemented.
- the v8.4 SM3 and SM4 support is implemented, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- all of the v8.4 crypto functions are supported, in this case the Armv8.0 SHA1
  and SHA2 instructions must also be implemented.

The v8.4 crypto instructions are added to AArch64 only, and not AArch32,
and are made optional extensions to Armv8.2-A.

The user-facing Clang options will map on these new target features, their
naming will be compatible with GCC and added in follow-up patches.

The Armv8.4-A instruction sets can be downloaded here:
https://developer.arm.com/products/architecture/a-profile/exploration-tools

Differential Revision: https://reviews.llvm.org/D48625

llvm-svn: 335953
2018-06-29 08:43:19 +00:00
Roman Lebedev 8d081b78e4 SCEVExpander::expandAddRecExprLiterally(): check before casting as Instruction
Summary:
An alternative to D48597.
Fixes PR37936 (https://bugs.llvm.org/show_bug.cgi?id=37936).

The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
    with a type assertion in cast, because `%dec` is no longer an `Instruction`.

If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.

So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.
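
The "check before casting" pattern can be shown with standard C++ RTTI. This is an analogy, not LLVM code: LLVM would use `dyn_cast<Instruction>` instead of an asserting `cast<Instruction>`, and the class hierarchy, the flag, and `clearWrapFlagIfInstruction` below are all hypothetical.

```cpp
#include <cassert>

// Minimal hypothetical hierarchy standing in for Value/Instruction/Constant.
struct Value {
  virtual ~Value() = default;
};
struct Instruction : Value {
  bool NoUnsignedWrap = false;
};
struct Constant : Value {};

// Only touches the flag when V really is an Instruction. A value that was
// constant-folded in the meantime is skipped instead of tripping an assertion.
// Returns true if V was an Instruction.
bool clearWrapFlagIfInstruction(Value *V) {
  if (auto *I = dynamic_cast<Instruction *>(V)) {
    I->NoUnsignedWrap = false;
    return true;
  }
  return false;
}
```

As the message notes, a guard like this only avoids the crash locally; it does not address stale analysis results elsewhere.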

Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi

Reviewed By: mkazantsev

Subscribers: evstupac, javed.absar, spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D48599

llvm-svn: 335950
2018-06-29 07:44:20 +00:00
Craig Topper 875e9f8fa4 [X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in IR instead.
While there, improve the coverage of the intrinsic testing and add fast-isel tests.

llvm-svn: 335944
2018-06-29 05:43:26 +00:00
Tom Stellard c5a154db48 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

llvm-svn: 335942
2018-06-28 23:47:12 +00:00
Eli Friedman 65d885e376 [ARM] Assert that ARMDAGToDAGISel creates valid UBFX/SBFX nodes.
We don't ever check these again (unless you're using
-fno-integrated-as), so make sure the extracted bits are well-defined.

I don't think it's possible to trigger any of the assertions on trunk,
but it's difficult to prove.  (The first one depends on DAGCombine to
minimize the number of set bits in AND masks; I think the others are
mathematically impossible to hit.)

llvm-svn: 335931
2018-06-28 21:49:41 +00:00
Jessica Paquette 0c5d3ffbb8 [MachineOutliner] Never add the outliner in -O0
This is a recommit of r335879.

We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.

This also removes -O0 from the outliner DWARF test.

llvm-svn: 335930
2018-06-28 21:49:24 +00:00
Jake Ehrlich 0f440d832f [llvm-readobj] Add experimental support for SHT_RELR sections
This change adds experimental support for SHT_RELR sections, proposed
here: https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg

Definitions for the new ELF section type and dynamic array tags, as well
as the encoding used in the new section are all under discussion and are
subject to change. Use with caution!

Author: rahulchaudhry

Differential Revision: https://reviews.llvm.org/D47919

llvm-svn: 335922
2018-06-28 21:07:34 +00:00
Sanjay Patel d512853aa3 [InstCombine] fix opcode check in shuffle fold
There's no way to expose this difference currently, 
but we should use the updated variable because the
original opcodes can go stale if we transform into
something new.

llvm-svn: 335920
2018-06-28 20:52:43 +00:00
Martin Storsjo 2a9bd7b756 [COFF] Fix constant sharing regression for MinGW
This fixes a regression since SVN r334523, where the object files
built targeting MinGW were rejected by GNU binutils tools. Prior to
that commit, we only put constants in comdat for MSVC configurations.

Differential Revision: https://reviews.llvm.org/D48567

llvm-svn: 335918
2018-06-28 20:28:29 +00:00
Teresa Johnson e87868b7e9 [ThinLTO] Port InlinerFunctionImportStats handling to new PM
Summary:
The InlinerFunctionImportStats will collect and dump stats regarding how
many functions inlined into the module were imported by ThinLTO.

Reviewers: wmi, dexonsmith

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D48729

llvm-svn: 335914
2018-06-28 20:07:47 +00:00
Benjamin Kramer 23d8282047 [NVPTX] Delete dead code
No functionality change.

llvm-svn: 335913
2018-06-28 20:05:35 +00:00
Eli Friedman 6613efbd4e [ARM] Add missing Thumb2 assembler diagnostics.
Mostly just adding checks for Thumb2 instructions which correspond to
ARM instructions which already had diagnostics. While I'm here, also fix
ARM-mode strd to check the input registers correctly.

Differential Revision: https://reviews.llvm.org/D48610

llvm-svn: 335909
2018-06-28 19:53:12 +00:00
Anastasis Grammenos 425df22ee3 [SROA] Preserve DebugLoc when rewriting alloca partitions
When rewriting an alloca partition copy the DL from the
old alloca over to the new one.

Differential Revision: https://reviews.llvm.org/D48640

llvm-svn: 335904
2018-06-28 18:58:30 +00:00
Zachary Turner 1adca7c4a5 Add a flag to FileOutputBuffer that allows modification.
FileOutputBuffer creates a temp file and on commit atomically
renames the temp file to the destination file.  Sometimes we
want to modify an existing file in place, but still have the
atomicity guarantee.  To do this we can initialize the contents
of the temp file from the destination file (if it exists), that
way the resulting FileOutputBuffer can have only selective
bytes modified.  Committing will then atomically replace the
destination file as desired.

llvm-svn: 335902
2018-06-28 18:49:09 +00:00
Simon Pilgrim c09b5e31d7 Remove unnecessary semicolon. NFCI.
Fixes -Wpedantic warning.

llvm-svn: 335901
2018-06-28 18:37:16 +00:00
Craig Topper 90317d1d94 [X86] Suppress load folding into and/or/xor if it will prevent matching btr/bts/btc.
This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy.

Differential Revision: https://reviews.llvm.org/D48706

llvm-svn: 335895
2018-06-28 17:58:01 +00:00
Jonas Devlieghere b757fc3878 Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models""
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.

  LLVM ERROR: unsupported relocation with subtraction expression, symbol
  '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
  expression

llvm-svn: 335894
2018-06-28 17:56:43 +00:00
Sanjay Patel 57bda365bf [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806)
This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806

We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other 
direction because that's generally better of course). This allows us to remove the shuffle 
as we do in the regular opcodes-are-the-same cases.

This requires a small hack to make sure we don't introduce any extra poison:
https://rise4fun.com/Alive/ZGv

Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already 
canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There 
are planned enhancements for opcode transforms such as or -> add.

Note that there's a different fold needed if we've already managed to simplify away a binop 
as seen in the test based on PR37806, but we manage to get that one case here because this 
fold is positioned above the demanded elements fold currently.
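
The arithmetic behind the fold can be sketched in plain C++. The lane pattern, the constants 3 and 5, and the function names are assumptions made for illustration. Unsigned wrapping arithmetic is used, so the poison concern mentioned above is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Original form: even lanes take (X << 3), odd lanes take (X * 5),
// which models a select-style shuffle of two different binops.
Vec4 selectOfShlAndMul(const Vec4 &X) {
  Vec4 R{};
  for (int i = 0; i < 4; ++i)
    R[i] = (i % 2 == 0) ? (X[i] << 3) : (X[i] * 5u);
  return R;
}

// Folded form: the shift is rewritten as a multiply by 1 << 3, so a single
// multiply by the per-lane constant vector {8, 5, 8, 5} is enough.
Vec4 singleMul(const Vec4 &X) {
  const Vec4 C = {8, 5, 8, 5};
  Vec4 R{};
  for (int i = 0; i < 4; ++i)
    R[i] = X[i] * C[i];
  return R;
}
```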

Differential Revision: https://reviews.llvm.org/D48485

llvm-svn: 335888
2018-06-28 17:48:04 +00:00
Jessica Paquette dafa198c96 [MachineOutliner] Define MachineOutliner support in TargetOptions
Targets should be able to define whether or not they support the outliner
without the outliner being added to the pass pipeline. Before this, the
outliner pass would be added, and ask the target whether or not it supports the
outliner.

After this, it's possible to query the target in TargetPassConfig, before the
outliner pass is created. This ensures that passing -enable-machine-outliner
will not modify the pass pipeline of any target that does not support it.

https://reviews.llvm.org/D48683

llvm-svn: 335887
2018-06-28 17:45:43 +00:00
Simon Pilgrim 9c70d48cb2 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED)
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.

llvm-svn: 335886
2018-06-28 17:33:41 +00:00
Simon Pilgrim 99f701673d [WebAssembly] Add getSetCCResultType placeholder override to handle vector compare results.
Necessary to get the rL335821 bugfix (which was reverted at rL335871) un-reverted.

llvm-svn: 335884
2018-06-28 17:27:09 +00:00
Jessica Paquette d6261bef7b Revert "[MachineOutliner] Add always and never options to -enable-machine-outliner"
I accidentally committed this instead of D48683 because I haven't had coffee
yet.

llvm-svn: 335883
2018-06-28 17:26:19 +00:00
Jessica Paquette f3a44fe833 Revert "[MachineOutliner] Never add the outliner in -O0"
This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091.

It relies on r335872 since that introduces the machine outliner
flags test. I meant to commit D48683 in that commit, but got mixed
up and committed D48682 instead. So, I'm reverting this and
r335872, since D48682 hasn't made it through review yet.

llvm-svn: 335882
2018-06-28 17:26:18 +00:00
Jessica Paquette c9d675266e [MachineOutliner] Never add the outliner in -O0
We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.

This also updates machine-outliner-flags to reflect the change
and improves the comment describing what that test does.

llvm-svn: 335879
2018-06-28 17:05:57 +00:00
Matthias Braun da5e7e11d1 SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.

Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.

rdar://41530228

Differential Revision: https://reviews.llvm.org/D48674
llvm-svn: 335877
2018-06-28 17:00:45 +00:00
Jessica Paquette 1ccb66c5fb [MachineOutliner] Add always and never options to -enable-machine-outliner
To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.

This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target-default
outlining behaviour.

llvm-svn: 335872
2018-06-28 16:39:42 +00:00
Haojian Wu 2103990e63 Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV"
This reverts commit r335821.

This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce.

llvm-svn: 335871
2018-06-28 16:25:57 +00:00
Stanislav Mekhanoshin 67aa18f165 [AMDGPU] Early expansion of 32 bit udiv/urem
This allows hoisting of common code, for instance if the denominator
is loop invariant. The current change is expansion only; adding licm to
the target pass list is going to be a separate patch. Given this patch,
changes to codegen are minor as the expansion is similar to that on
DAG. DAG expansion still must remain for R600.

Differential Revision: https://reviews.llvm.org/D48586

llvm-svn: 335868
2018-06-28 15:59:18 +00:00
Stanislav Mekhanoshin 298a61590a [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16
Differential Revision: https://reviews.llvm.org/D48677

llvm-svn: 335866
2018-06-28 15:24:46 +00:00
John Brawn bdbbd8381f Add a PhiValuesAnalysis pass to calculate the underlying values of phis
This pass is being added in order to make the information available to BasicAA,
which can't do caching of this information itself, but possibly this information
may be useful for other passes.

Incorporates code based on Daniel Berlin's implementation of Tarjan's algorithm.
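
To illustrate what "underlying values of phis" means, here is a minimal sketch. It assumes a toy representation where phis are named strings in a map, and it uses a plain worklist traversal rather than Tarjan's algorithm. `PhiGraph` and `underlyingValues` are hypothetical names, not part of the pass.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each phi name to the names of its incoming values. Any name that is
// not a key in the map is treated as a non-phi value (assumption).
using PhiGraph = std::map<std::string, std::vector<std::string>>;

// Collect the non-phi values reachable from Phi by looking through other
// phis, including phis that form a cycle.
std::set<std::string> underlyingValues(const PhiGraph &G, const std::string &Phi) {
  std::set<std::string> Result, Visited;
  std::vector<std::string> Worklist{Phi};
  while (!Worklist.empty()) {
    std::string V = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(V).second)
      continue; // already handled, which also breaks phi cycles
    auto It = G.find(V);
    if (It == G.end()) {
      Result.insert(V); // not a phi: this is an underlying value
      continue;
    }
    for (const std::string &In : It->second)
      Worklist.push_back(In);
  }
  return Result;
}
```

A strongly-connected-component approach such as Tarjan's can compute the answer for all phis in a cycle at once, whereas this sketch repeats the walk for each query.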

Differential Revision: https://reviews.llvm.org/D47893

llvm-svn: 335857
2018-06-28 14:13:06 +00:00
Benjamin Kramer 269eb21e1c Revert "Add support for generating a call graph profile from Branch Frequency Info."
This reverts commits r335794 and r335797. Breaks ThinLTO+FDO selfhost.

llvm-svn: 335851
2018-06-28 13:15:03 +00:00
Sjoerd Meijer c89ca5582a [ARM] Parallel DSP Pass
Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of
this pass is to do some straightforward IR pattern matching to create ACLE DSP
intrinsics, which map on these 32-bit SIMD operations.

Currently, only the SMLAD instruction gets recognised. This instruction
performs two multiplications with 16-bit operands, and stores the result in an
accumulator. We will follow this up with patches to recognise SMLAD in more
cases, and also to generate other DSP instructions (like e.g. SADD16).

Patch by: Sam Parker and Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D48128

llvm-svn: 335850
2018-06-28 12:55:29 +00:00
Jesper Antonsson 514b6b5796 Comment change to verify commit rights. NFC.
Summary: Just a silly one-character correction.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48709

llvm-svn: 335832
2018-06-28 10:55:04 +00:00
Hans Wennborg a257376003 s/TablesChecked/TableChecked/ after r335823
llvm-svn: 335831
2018-06-28 10:24:38 +00:00
Matt Arsenault 75e7192ba3 AMDGPU: Remove MFI::ABIArgOffset
We have too many mechanisms for tracking the various offsets
used for kernel arguments, so remove one. There's still a lot of
confusion with these because there are two different "implicit"
argument areas located at the beginning and end of the kernarg
segment.

Additionally, the offset was determined based on the memory
size of the split element types. This would break in a future
commit where v3i32 is decomposed into separate i32 pieces.

llvm-svn: 335830
2018-06-28 10:18:55 +00:00
Matt Arsenault 1fb9013368 AMDGPU: Error on calls from graphics shaders
In principle nothing should stop these from working, but
work is necessary to create an ABI for dealing with the stack
related registers.

llvm-svn: 335829
2018-06-28 10:18:36 +00:00
Matt Arsenault 12269dda5c AMDGPU: Fix AMDGPUCodeGenPrepare using uninitialized AMDGPUAS struct
Not sure how this wasn't noticed before.

llvm-svn: 335828
2018-06-28 10:18:23 +00:00
Matt Arsenault 513e0c0ea4 AMDGPU: Fix assert on aggregate type kernel arguments
Just fix the crash for now by not doing the optimization since
figuring out how to properly convert the bits for an arbitrary
struct is a pain.

Also fix a crash when there is only an empty struct argument.

llvm-svn: 335827
2018-06-28 10:18:11 +00:00
Benjamin Kramer f9613b2995 Unify sorted asserts to use the existing atomic pattern
These are all benign races and only visible in !NDEBUG. tsan complains
about it, but a simple atomic bool is sufficient to make it happy.
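
A minimal sketch of a check-once pattern guarded by an atomic bool is shown below. The table contents, the `lookup` function, and the `SortChecks` counter are assumptions added for illustration. The counter exists only so the behaviour can be observed.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <iterator>

static const int Table[] = {1, 3, 5, 7, 11};
static std::atomic<bool> TableChecked(false);
static std::atomic<int> SortChecks(0); // illustration only: counts verifications

bool lookup(int Key) {
  // Several threads may run the check concurrently before the flag is set.
  // That is harmless because the check only reads the table, and the atomic
  // flag means there is no data race on the flag itself.
  if (!TableChecked.load(std::memory_order_relaxed)) {
    assert(std::is_sorted(std::begin(Table), std::end(Table)) &&
           "table is not sorted");
    SortChecks.fetch_add(1, std::memory_order_relaxed);
    TableChecked.store(true, std::memory_order_relaxed);
  }
  return std::binary_search(std::begin(Table), std::end(Table), Key);
}
```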

llvm-svn: 335823
2018-06-28 10:03:45 +00:00
Simon Pilgrim abebe4c746 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

llvm-svn: 335821
2018-06-28 09:54:28 +00:00
Florian Hahn 388af14f85 [SCCP] Mark CFG as preserved.
SCCP does not change the CFG, so we can mark it as preserved.

Reviewers: dberlin, efriedma, davide

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D47149

llvm-svn: 335820
2018-06-28 09:53:38 +00:00
Simon Pilgrim 49cb65bb7b [DAGCombiner] Remove unused variable. NFCI.
Noticed in D45806 review.

llvm-svn: 335817
2018-06-28 09:29:08 +00:00
Max Kazantsev f5ba37182e [IndVarSimplify] Ignore unreachable users of truncs
If a trunc has a user in a block which is not reachable from entry,
we can safely perform trunc elimination as if this user didn't exist.

llvm-svn: 335816
2018-06-28 08:20:03 +00:00
Petar Jovanovic d175aeb881 [DwarfDebug] Remove unused argument (NFC)
Remove unused ByteStreamer argument from function emitDebugLocValue.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D48590

llvm-svn: 335811
2018-06-28 04:50:40 +00:00
Craig Topper ec5d568ac1 [X86] Use PatFrag with hardcoded numbers for FROUND_NO_EXC/FROUND_CURRENT instead of ImmLeafs with predicates where one of the two numbers was hardcoded.
This is more efficient for the isel table generator since we can use CheckChildInteger instead of MoveChild, CheckPredicate, MoveParent. This reduced the table size by 1-2K.

I wish there was a way to share the values with X86BaseInfo.h and still use a PatFrag like this. These numbers are fixed by the X86 intrinsic spec going back many years and we should never need to change them. So we shouldn't waste table bytes to support sharing.

llvm-svn: 335806
2018-06-28 01:45:44 +00:00
Craig Topper ab70f58891 [X86] Change how we prefer shift by immediate over folding a load into a shift.
BMI2 added new shift by register instructions that have the ability to fold a load.

Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load.

We used to enforce this priority by artificially lowering the complexity of the load pattern.

This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky.

llvm-svn: 335804
2018-06-28 00:47:41 +00:00
Michael J. Spencer 98f5475f44 [CGProfile] Fix unused variable warning.
llvm-svn: 335797
2018-06-28 00:12:04 +00:00
Michael J. Spencer 5bf1ead377 Add support for generating a call graph profile from Branch Frequency Info.
=== Generating the CG Profile ===

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions.  For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
```
!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
```
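
The edge construction described above can be sketched with a toy data model. The `Block` and `Function` structs and `buildCallGraphProfile` are hypothetical, and summing the weights of repeated caller/callee pairs is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model (assumption): a block has a profile count and a list of callees.
struct Block {
  uint64_t Count;
  std::vector<std::string> Callees;
};
struct Function {
  std::string Name;
  std::vector<Block> Blocks;
};
using EdgeWeights = std::map<std::pair<std::string, std::string>, uint64_t>;

// For every call in every block, add an edge caller -> callee weighted by
// the count of the block that contains the call.
EdgeWeights buildCallGraphProfile(const std::vector<Function> &Module) {
  EdgeWeights Edges;
  for (const Function &F : Module)
    for (const Block &BB : F.Blocks)
      for (const std::string &Callee : BB.Callees)
        Edges[std::make_pair(F.Name, Callee)] += BB.Count;
  return Edges;
}
```

With `a` calling `b` from a block with count 32, and `freq` calling `a` and `b` from blocks with counts 11 and 20, this yields the three edges shown in the metadata above.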

Differential Revision: https://reviews.llvm.org/D48105

llvm-svn: 335794
2018-06-27 23:58:08 +00:00
Zachary Turner ee8010abe3 Move some code from PDBFileBuilder to MSFBuilder.
The code to emit the pieces of the MSF file was actually in
PDBFileBuilder.  Move this to MSFBuilder so that we can
theoretically emit an MSF without having a PDB file.

llvm-svn: 335789
2018-06-27 21:18:15 +00:00
Benjamin Kramer e214f046af [X86] Make folding table checking threadsafe
This is a benign race, but tsan likes to complain about it. Just make it
happy.

llvm-svn: 335788
2018-06-27 21:01:53 +00:00
Craig Topper 880e34ed45 [X86] In X86DAGToDAGISel::PreprocessISelDAG, make sure we don't access N after we delete it.
If we turn X86ISD::AND into ISD::AND, we delete N. But we were continuing onto the next block of code even though N no longer existed.

Just happened to notice it. I assume asan didn't notice it because we explicitly unpoison deleted nodes and give them a DELETE_NODE opcode.

llvm-svn: 335787
2018-06-27 20:58:46 +00:00
Sameer AbuAsal 9b65ffb097 [RISCV] Add machine function pass to merge base + offset
Summary:
   In r333455 we added a peephole to fix the corner cases that result
   from separating base + offset lowering of global address. The
   peephole didn't handle some of the cases because it only has a basic
   block view instead of a function level view.

   This patch replaces that logic with a machine function pass. In
   addition to handling the original cases it handles uses of the global
   address across blocks in function and folding an offset from LW\SW
   instruction. This pass won't run for OptNone compilation, so there
   will be a negative impact overall vs the old approach at O0.

Reviewers: asb, apazos, mgrang

Reviewed By: asb

Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones

Differential Revision: https://reviews.llvm.org/D47857

llvm-svn: 335786
2018-06-27 20:51:42 +00:00
Nirav Dave 7c57ae57a8 [DAGCombine] Disable TokenFactor simplifications when optnone.
llvm-svn: 335773
2018-06-27 19:41:25 +00:00
Fangrui Song b0d57a535b [X86] Fix unmatched parenthesis in r335768
llvm-svn: 335769
2018-06-27 19:12:07 +00:00
Craig Topper 6bea2c7f9b [X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the SIB byte is present, but doesn't encode an index register and there was another shorter encoding that would achieve the same result.
The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte.

This should match the behavior of objdump from binutils.

llvm-svn: 335768
2018-06-27 19:03:36 +00:00
Daniel Sanders bdeb880d14 [globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64
Now that we have the ability to legalize based on MMOs, add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.

Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.

llvm-svn: 335767
2018-06-27 19:03:21 +00:00
Sanjay Patel d052de856d [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
doesn't currently allow fast-math-flags on the cast instructions. So for now, 
just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
option.
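
The difference can also be observed directly by comparing sign bits. This is a small sketch with hypothetical helper names, and it assumes the compiler is not run with options such as -ffast-math that ignore signed zeros.

```cpp
#include <cassert>
#include <cmath>

// Round trip through int: the integer 0 has no sign, so the result is +0.0.
float viaIntRoundTrip(float F) {
  return static_cast<float>(static_cast<int>(F));
}

// Truncation in floating point keeps the sign of the input, giving -0.0
// for inputs in (-1.0, -0.0].
float viaTrunc(float F) { return std::trunc(F); }
```

The two results compare equal with `==`, since +0.0 and -0.0 are equal, which is why the difference only shows up through the sign bit or printed output.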

Differential Revision: https://reviews.llvm.org/D48085

llvm-svn: 335761
2018-06-27 18:16:40 +00:00
Teresa Johnson 7e7b13d016 [ThinLTO] Print names in function import debug messages when available
Summary:
Rather than just print the GUID, when it is available in the index,
print the global name as well in the function import thin link debug
messages. Names will be available when the combined index is being
built by the same process, e.g. a linker or "llvm-lto2 run".

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D48612

llvm-svn: 335760
2018-06-27 18:03:39 +00:00
Jessica Paquette f472f6159a [MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across
It isn't safe to outline sequences of instructions where x16/x17/nzcv are live
across the sequence.

This teaches the outliner to check whether or not a specific candidate has
x16/x17/nzcv live across it and discard the candidate in the case that that is
true.

https://bugs.llvm.org/show_bug.cgi?id=37573
https://reviews.llvm.org/D47655

llvm-svn: 335758
2018-06-27 17:43:27 +00:00
Craig Topper 812fcb35e7 [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position
If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1 (or rotating a -1) and doing a binop. These instructions also ignore the upper bits of their index input so we can also remove an and if one is present on the index.

Fixes PR37938.
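
The source-level shapes in question look roughly like the sketch below, written for 32-bit values with hypothetical function names. Whether a given compiler version actually selects bts/btr/btc for them depends on the target and options, so treat that as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Set, clear and complement a single bit at a variable position.
// The (N & 31) mask keeps the shift well defined in C++. The bit-test
// instructions ignore the upper index bits themselves, which is why the
// explicit mask can be dropped during instruction selection.
uint32_t setBit(uint32_t X, uint32_t N) {
  return X | (UINT32_C(1) << (N & 31));
}
uint32_t clearBit(uint32_t X, uint32_t N) {
  return X & ~(UINT32_C(1) << (N & 31));
}
uint32_t complementBit(uint32_t X, uint32_t N) {
  return X ^ (UINT32_C(1) << (N & 31));
}
```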

llvm-svn: 335754
2018-06-27 16:47:39 +00:00
Jakub Kuderski 555e41bbf2 [AliasSet] Fix UnknownInstructions printing
Summary:
AliasSet::print uses `I->printAsOperand` to print UnknownInstructions. The problem is that not all UnknownInstructions have names (e.g. call instructions). When such instructions are printed, they appear as `<badref>` in AliasSets, which is very confusing, as the values are perfectly valid.

This patch fixes that by printing UnknownInstructions without a name using `print` instead of `printAsOperand`.

Reviewers: asbirlea, chandlerc, sanjoy, grosser

Reviewed By: asbirlea

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48609

llvm-svn: 335751
2018-06-27 16:34:30 +00:00
Craig Topper 31cbe75b3b [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Stanislav Mekhanoshin 1a1687f1bb [AMDGPU] Convert rcp to rcp_iflag
If the source of an rcp instruction is the result of any conversion from
an integer, convert it into an rcp_iflag instruction. No FP exception
can ever happen except division by zero if a single precision rcp
argument is a representation of an integral number.

Differential Revision: https://reviews.llvm.org/D48569

llvm-svn: 335742
2018-06-27 15:33:33 +00:00
Luke Geeson 316327150b [AArch64] Reverting FP16 vcvth_n_s64_f16 to fix
llvm-svn: 335737
2018-06-27 14:34:40 +00:00
Adhemerval Zanella cadcfed7aa [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and the default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extend the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectors. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjusts the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Ivan A. Kosarev 7231598fce [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it to other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as AArch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct me if that is wrong.

That's what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

Differential Revision: https://reviews.llvm.org/D48439

llvm-svn: 335733
2018-06-27 13:57:52 +00:00
Simon Pilgrim d3e583a52d [DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in pow2 expansion
For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs

llvm-svn: 335727
2018-06-27 12:45:31 +00:00
Simon Pilgrim e835f662fa [DAGCombiner] visitSDIV - simplify pow2 handling. NFCI.
Use the builtin constant folding of getNode() etc. instead of doing it manually.

llvm-svn: 335720
2018-06-27 10:51:55 +00:00
Simon Pilgrim dfbcc66adc [DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0)
Fixes PR37569.
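
The fold can be checked exhaustively for a small type. The sketch below assumes 8-bit signed integers and hypothetical function names. The division is done in `int` after integer promotion, so it cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Reference: signed division by the minimum signed value.
int8_t sdivByMinSigned(int8_t X) {
  return static_cast<int8_t>(X / std::numeric_limits<int8_t>::min());
}

// Folded form: only MIN_SIGNED / MIN_SIGNED is 1; every other quotient
// truncates to 0 because the dividend is smaller in magnitude.
int8_t foldedSelect(int8_t X) {
  return X == std::numeric_limits<int8_t>::min() ? 1 : 0;
}

// Compare both forms for every 8-bit input.
bool foldHoldsForAllI8() {
  for (int V = -128; V <= 127; ++V) {
    int8_t X = static_cast<int8_t>(V);
    if (sdivByMinSigned(X) != foldedSelect(X))
      return false;
  }
  return true;
}
```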

llvm-svn: 335719
2018-06-27 10:21:06 +00:00
Simon Pilgrim 0a566bc0ae [DAGCombiner] Don't accept signbit sdiv divisors in sdiv-by-pow2 vector expansion (PR37569)
llvm-svn: 335717
2018-06-27 09:41:22 +00:00
Luke Geeson 68cb233c0f [AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns
llvm-svn: 335715
2018-06-27 09:20:13 +00:00
Konstantin Zhuravlyov 30f03b3bc0 AMDGPU/NFC: Fix typo in comment
llvm-svn: 335707
2018-06-27 05:36:03 +00:00
Vedant Kumar f6c0b41fb7 [InstCombine] Avoid creating mis-sized dbg.values in commonCastTransforms()
This prevents InstCombine from creating mis-sized dbg.values when
replacing a sequence of casts with a simpler cast. For example, in:

  (fptrunc (floor (fpext X))) -> (floorf X)

We no longer emit dbg.value(X) (with a 32-bit float operand) to describe
(fpext X) (which is a 64-bit float).

This was diagnosed by the debugify check added in r335682.

llvm-svn: 335696
2018-06-27 00:47:53 +00:00
Craig Topper 33aba0eb4c [X86] Don't store register and memory FMA3 opcodes in the same X86InstrFMA3Group.
Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group.

The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed.

llvm-svn: 335694
2018-06-27 00:42:24 +00:00
Evgeniy Stepanov 289a7d4c7d Revert "[asan] Instrument comdat globals on COFF targets"
Causes false positive ODR violation reports on __llvm_profile_raw_version.

llvm-svn: 335681
2018-06-26 22:43:48 +00:00
Lang Hames 2f17824463 [ORC] Don't call isa<> on a null value.
This should fix the recent builder failures in the test-global-ctors.ll testcase.

llvm-svn: 335680
2018-06-26 22:43:01 +00:00
Lang Hames 8f9dbb1d64 [ORC] Fix a missing return value.
llvm-svn: 335677
2018-06-26 22:30:42 +00:00
Michael Zolotukhin d3b8bdef01 [JumpThreading] Don't try to rewrite a use if it's already valid.
Summary:
When recording the uses we need to rewrite after cloning a loop, we need to
check if the use is not dominated by the original def. The initial
assumption was that the cloned basic block will introduce a new path and
thus the original def will only dominate the use if they are in the same
BB, but as the reproducer from PR37745 shows it's not always the case.

This fixes PR37745.

Reviewers: haicheng, Ka-Ka

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48111

llvm-svn: 335675
2018-06-26 22:19:48 +00:00
Lang Hames bf7b532cbc [ORC] Add a dependence on MC to LLVMBuild.txt
llvm-svn: 335673
2018-06-26 22:12:02 +00:00
Lang Hames 6a94134b11 [ORC] Add LLJIT and LLLazyJIT, and replace OrcLazyJIT in LLI with LLLazyJIT.
LLJIT is a prefabricated ORC based JIT class that is meant to be the go-to
replacement for MCJIT. Unlike OrcMCJITReplacement (which will continue to be
supported) it is not API or bug-for-bug compatible, but targets the same
use cases: Simple, non-lazy compilation and execution of LLVM IR.

LLLazyJIT extends LLJIT with support for function-at-a-time lazy compilation,
similar to what was provided by LLVM's original (now long deprecated) JIT APIs.

This commit also contains some simple utility classes (CtorDtorRunner2,
LocalCXXRuntimeOverrides2, JITTargetMachineBuilder) to support LLJIT and
LLLazyJIT.

Both of these classes are works in progress. Feedback from JIT clients is very
welcome!

llvm-svn: 335670
2018-06-26 21:35:48 +00:00
Konstantin Zhuravlyov 777477705a AMDGPU: Silence unused warnings in waitcnt insertion pass in release build
Differential Revision: https://reviews.llvm.org/D48607

llvm-svn: 335669
2018-06-26 21:33:38 +00:00
Jessica Paquette 67599c2e1e [X86][AsmParser] Recommit r335658
Recommit of r335658 so that it does not change the behaviour of any
existing error output.

llvm-svn: 335668
2018-06-26 21:30:34 +00:00
Vedant Kumar 1cb63dc2d5 Rename skipDebugInfo -> skipDebugIntrinsics, NFC
This addresses post-commit feedback about the name 'skipDebugInfo' being
misleading. This name could be interpreted as meaning 'a function that
skips instructions with debug locations'.

The new name, 'skipDebugIntrinsics', makes it clear that this function
only skips debug info intrinsics.

Thanks to Adrian Prantl for pointing this out!

llvm-svn: 335667
2018-06-26 21:16:59 +00:00
Lang Hames 2795a0a06e [ORC] Reset AsynchronousSymbolQuery's NotifySymbolsResolved callback on error.
AsynchronousSymbolQuery::canStillFail checks the value of the callback to
prevent sending it redundant error notifications, so we need to reset it after
running it.
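
A minimal sketch of the reset-after-run pattern described here, assuming a simplified query whose callback takes only an error string. The `Query` class and `countNotifications` helper are hypothetical and are not the ORC API:

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a query that reports failure through a callback.
class Query {
public:
  explicit Query(std::function<void(const std::string &)> OnError)
      : NotifyError(std::move(OnError)) {}

  // A still-set callback means no error has been delivered yet.
  bool canStillFail() const { return static_cast<bool>(NotifyError); }

  void fail(const std::string &Msg) {
    if (!canStillFail())
      return;              // the client has already been told about an error
    NotifyError(Msg);      // run the callback once...
    NotifyError = nullptr; // ...then reset it so canStillFail() turns false
  }

private:
  std::function<void(const std::string &)> NotifyError;
};

// Calls fail() the given number of times and reports how many notifications
// actually reached the client.
int countNotifications(int Failures) {
  int Delivered = 0;
  Query Q([&Delivered](const std::string &) { ++Delivered; });
  for (int I = 0; I < Failures; ++I)
    Q.fail("lookup failed");
  return Delivered;
}
```

Without the reset in `fail()`, every repeated failure would reach the client again.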

llvm-svn: 335664
2018-06-26 20:59:50 +00:00
Lang Hames 831c575829 [ORC] Move the VSOList typedef out of VSO.
llvm-svn: 335663
2018-06-26 20:59:49 +00:00
Lang Hames ec8f5c8e5a [ORC] Fix a FIXME by moving MangleAndInterner to Core.h.
llvm-svn: 335661
2018-06-26 20:59:46 +00:00
Jessica Paquette 0a80af0761 Revert "[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode"
This reverts commit 4850a9aae8b38c7deadc103d634ec7397e6c323b.

It caused MC/X86/x86_errors.s to fail. Will fix and recommit shortly.

llvm-svn: 335660
2018-06-26 20:57:19 +00:00
Jessica Paquette 0e40d4bfc3 [X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode
Right now, when we use RIP-relative instructions in 32-bit mode, we'll just
assert and crash.

This adds an error message which tells the user that they can't do that in
32-bit mode, so that we don't crash (and also can see the issue outside of
assert builds).

llvm-svn: 335658
2018-06-26 20:33:46 +00:00
Stanislav Mekhanoshin dacda79ee6 [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic
This intrinsic selects v_mad_f32 regardless of fp32 denorm support.

Differential Revision: https://reviews.llvm.org/D48573

llvm-svn: 335654
2018-06-26 20:04:19 +00:00
Sanjay Patel fb9c440ba5 [DAGCombiner] use isBitwiseNot to simplify code; NFC
llvm-svn: 335652
2018-06-26 19:46:56 +00:00
Matt Arsenault 8c4a35237a AMDGPU: Add pass to lower kernel arguments to loads
This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitrary types. Since
all kernel arguments are passed in memory, we just want the
raw types.

I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem altogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.

I'm not sure what the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.

Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed on them as the equivalent !range
metadata is not valid on pointer typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.

I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.

llvm-svn: 335650
2018-06-26 19:10:00 +00:00
Matt Arsenault 7e991d30c0 ConstantFold: Don't fold global address vs. null for addrspace != 0
Not sure why this logic seems to be repeated in 2 different places,
one called by the other.

On AMDGPU addrspace(3) globals start allocating at 0, so these
checks will be incorrect (not that real code actually tries
to compare these addresses)

llvm-svn: 335649
2018-06-26 18:55:43 +00:00
Vedant Kumar 78ff0f1b83 Use a variable to appease a no-asserts bot, NFC
Failure URL:
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/22836

llvm-svn: 335648
2018-06-26 18:55:26 +00:00
Tim Shen b32823cbe9 [ConstantRange] Add support of mul in makeGuaranteedNoWrapRegion.
Summary: This is trying to add support for r334428.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48399

llvm-svn: 335646
2018-06-26 18:54:10 +00:00
Matt Arsenault 2c1a570aab LoopUnroll: Allow analyzing intrinsic call costs
I'm not sure why the code here is skipping calls since
TTI does try to do something for general calls, but it
at least should allow intrinsics.

Skip intrinsics that should not be emitted as calls, which
is by far the most common case on AMDGPU.

llvm-svn: 335645
2018-06-26 18:51:17 +00:00
Vedant Kumar c85ca4cdab [Local] Add a convenient insertReplacementDbgValues overload, NFC
Add an overload for the common case where the replacement dbg.values
have the same DIExpressions as the originals.

llvm-svn: 335643
2018-06-26 18:44:53 +00:00
Vedant Kumar de46f65bbd [Local] Sink salvageDI's early exit into helper functions, NFC
salvageDebugInfo() performs a check that allows it to exit early without
doing a DenseMap lookup. It's a bit neater and marginally more useful to
sink this early exit into the findDbg{Addr,Users,Values} helpers.

llvm-svn: 335642
2018-06-26 18:44:52 +00:00
Brendon Cahoon b7169c435a [Hexagon] Add a "generic" cpu
Add the generic processor for Hexagon so that it can be used
with 3rd party programs that create a back-end with the
"generic" CPU. This patch also enables the JIT for Hexagon.

Differential Revision: https://reviews.llvm.org/D48571

llvm-svn: 335641
2018-06-26 18:44:05 +00:00
Simon Pilgrim 7f55af37f4 [DAGCombiner] Don't accept -1 sdiv divisors in sdiv-by-pow2 vector expansion (PR37119)
Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported.

llvm-svn: 335637
2018-06-26 17:46:51 +00:00
Sanjay Patel ad0bfb844d [InstSimplify] fold shifts by sext bool
https://rise4fun.com/Alive/c3Y

llvm-svn: 335633
2018-06-26 17:31:38 +00:00
Sanjay Patel 9adea01c9f [InstCombine] simplify code for urem fold; NFCI
llvm-svn: 335623
2018-06-26 16:39:29 +00:00
Sanjay Patel 3575f0c0b3 [InstCombine] fold urem with sext bool divisor
Similar to other patches in this series:
https://reviews.llvm.org/rL335512
https://reviews.llvm.org/rL335527
https://reviews.llvm.org/rL335597
https://reviews.llvm.org/rL335616

...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform.
I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold.

Note that in this case, the backend might want to convert the select into math:
Name: sext urem
%e = sext i1 %x to i32
%r = urem i32 %y, %e
=>
%c = icmp eq i32 %y, -1
%z = zext i1 %c to i32
%r = add i32 %z, %y
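
The equivalence above can be checked with a small C++ sketch, assuming 32-bit unsigned wraparound. Only the case %x == true is modeled, since a false %x would make the divisor zero; the function names are illustrative:

```cpp
#include <cstdint>

// Original form with %x == true: sext i1 true to i32 is all-ones, so the
// divisor is 0xFFFFFFFF.
uint32_t uremByAllOnes(uint32_t Y) { return Y % 0xFFFFFFFFu; }

// Folded form: %c = icmp eq %y, -1; %z = zext %c; %r = add %z, %y.
uint32_t uremFolded(uint32_t Y) {
  uint32_t Z = (Y == 0xFFFFFFFFu) ? 1u : 0u;
  return Y + Z; // wraps around to 0 when Y is all-ones
}
```

Any Y below all-ones is smaller than the divisor and is returned unchanged, and Y equal to all-ones yields 0 in both forms.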

llvm-svn: 335622
2018-06-26 16:30:00 +00:00
Simon Pilgrim bbfc18b5b5 [SLPVectorizer] Recognise non uniform power of 2 constants
Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them.

As SLP works with arrays of values I don't think we can easily use the pattern match helpers here.

Differential Revision: https://reviews.llvm.org/D48214

llvm-svn: 335621
2018-06-26 16:20:16 +00:00
Simon Pilgrim 133b1cdf08 [DAGCombiner] Pull out VT bitwidth in visitSDIV. NFCI.
llvm-svn: 335617
2018-06-26 15:39:16 +00:00
Sanjay Patel 2b7e31095d [InstSimplify] fold srem with sext bool divisor
llvm-svn: 335616
2018-06-26 15:32:54 +00:00
Krzysztof Parzyszek 9f199ebec0 Silence "unused variable" warning in LiveIntervals.cpp after r335607
llvm-svn: 335610
2018-06-26 14:55:04 +00:00
Krzysztof Parzyszek 70f027022c Account for undef values from predecessors in extendSegmentsToUses
It is legal for a PHI node not to have a live value in a predecessor
as long as the end of the predecessor is jointly dominated by an undef
value.

llvm-svn: 335607
2018-06-26 14:37:16 +00:00
Simon Pilgrim aa2bf2be31 [TargetLowering] isVectorClearMaskLegal - use ArrayRef<int> instead of const SmallVectorImpl<int>&
This is more generic and matches isShuffleMaskLegal.

Differential Revision: https://reviews.llvm.org/D48591

llvm-svn: 335605
2018-06-26 14:15:31 +00:00
Than McIntosh 3190993a02 [X86,ARM] Retain split-stack prolog check for sibling calls
Summary:
If a routine with no stack frame makes a sibling call, we need to
preserve the stack space check even if the local stack frame is empty,
since the call target could be a "no-split" function (in which case
the linker needs to be able to fix up the prolog sequence in order to
switch to a larger stack).

This fixes PR37807.

Reviewers: cherry, javed.absar

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D48444

llvm-svn: 335604
2018-06-26 14:11:30 +00:00
Simon Pilgrim cfe2f9d4d2 Fix spelling mistakes in comments. NFCI.
llvm-svn: 335603
2018-06-26 14:06:23 +00:00
Teresa Johnson 63ee0e73e4 [ThinLTO] Parse module summary index from assembly
Summary:
Adds assembly parsing support for the module summary index (follow on
to r333335 which added the assembly writing support).

I added support to llvm-as to invoke the index parsing, so that it can
create either a bitcode file with a Module and a per-module index, or
a combined index without a Module.

I will send follow on patches soon to do the following:
- add support to tools such as llvm-lto2 to parse the per-module indexes
from assembly instead of bitcode when testing the thin link.
- verification support.

Depends on D47844 and D47842.

Reviewers: pcc, dexonsmith, mehdi_amini

Subscribers: inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D47905

llvm-svn: 335602
2018-06-26 13:56:49 +00:00
Sanjay Patel 7c45debaea [InstCombine] fold udiv with sext bool divisor
Note: I didn't add a hasOneUse() check because the existing,
related fold doesn't have that check. I suspect that the
improved analysis and codegen make these some of the rare
canonicalization cases where we allow an increase in
instructions.

llvm-svn: 335597
2018-06-26 12:41:15 +00:00
Tim Northover b73efb85ba ARM: correctly decode VFP instructions following unpredictable t2IT
When the condition code for an IT instruction is "AL" we get strange "15"
predicates on subsequent instructions. These are dealt with for most
instructions by treating them as "ARMCC::AL", but VFP takes a different path
which didn't have this code.

llvm-svn: 335594
2018-06-26 11:39:20 +00:00
Tim Northover bf54858115 ARM: diagnose unpredictable IT instructions
IT instructions are allowed to have the 'AL' predicate, but it must never
result in an 'NV' predicated instruction. Essentially this means that all
branches must be 't' rather than 'e' if the predicate is 'AL'.

This patch adds a diagnostic for this during assembly (error because parsing
hits an assertion if allowed to continue) and an annotation during disassembly.

llvm-svn: 335593
2018-06-26 11:38:41 +00:00
Simon Pilgrim bfaa09220b [X86] Just use ArrayRef instead of SmallVectorImpl in a few static method arguments. NFCI.
llvm-svn: 335590
2018-06-26 10:45:41 +00:00
Florian Hahn 4a69b0bb36 [IPSCCP] Change dead blocks to unreachable after visiting all executable blocks.
changeToUnreachable may remove PHI nodes from executable blocks we found values
for and we would fail to replace them. By changing dead blocks to unreachable after
we have replaced constants in all executable blocks, we ensure such PHI nodes are replaced
by their known values first.

Fixes PR37780.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D48421

llvm-svn: 335588
2018-06-26 10:15:02 +00:00
Simon Pilgrim dcf5bd271f Fix MSVC "signed/unsigned mismatch" warning. NFCI.
llvm-svn: 335587
2018-06-26 10:02:12 +00:00
Simon Pilgrim 9b3b0fe763 Fix MSVC "not all control paths return a value" warnings. NFCI.
llvm-svn: 335584
2018-06-26 09:31:18 +00:00
Bjorn Pettersson 550517bcab Improve ConvertDebugDeclareToDebugValue
Summary:
This is a follow-up to r334830 and r335031.

In the valueCoversEntireFragment check we now also handle
the situation when there is a variable length array (VLA)
involved, and the length of the array has been reduced to
a constant.

The ConvertDebugDeclareToDebugValue functions that are related
to PHI nodes and load instructions now avoid inserting dbg.value
intrinsics when the value does not, for certain, cover the
variable/fragment that should be described.
In r334830 we assumed that the value always covered the entire
var/fragment and we had assertions in the code to show that
assumption. However, those asserts failed when compiling code
with VLAs, so we removed the asserts in r335031. Now that we
know that the valueCoversEntireFragment check can also fail for
PHI/Load instructions, we avoid inserting the faulty dbg.value
intrinsic in such situations. Compared to the Store instruction
scenario we simply drop the dbg.value here (as the variable does
not change its value due to PHI/Load, so an earlier dbg.value
describing the variable should still be valid).

Reviewers: aprantl, vsk, efriedma

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48547

llvm-svn: 335580
2018-06-26 06:17:00 +00:00
Gil Rapaport da2e2caa6c [InstCombine] (A + 1) + (B ^ -1) --> A - B
Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B).
This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B.
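
A short C++ sketch of the identity behind this fold, assuming 32-bit wraparound arithmetic. Since B ^ -1 equals -1 - B, the sum (A + 1) + (-1 - B) collapses to A - B. The function names are illustrative:

```cpp
#include <cstdint>

// Pattern before the fold: (A + 1) + (B ^ -1).
uint32_t beforeFold(uint32_t A, uint32_t B) {
  return (A + 1u) + (B ^ 0xFFFFFFFFu);
}

// Pattern after the fold: A - B.
uint32_t afterFold(uint32_t A, uint32_t B) { return A - B; }
```

The two functions agree for all inputs, including those where the intermediate additions wrap.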

Differential Revision: https://reviews.llvm.org/D48535

llvm-svn: 335579
2018-06-26 05:31:18 +00:00
Craig Topper 08dae1682d [X86] Don't use getScalarShiftAmountTy to get the immediate type for target specific VSHLDQ/VSRLDQ nodes.
These opcodes have a fixed type of i8 for their immediate and shouldn't have anything to do with the scalar shift amount used by target independent shift nodes.

llvm-svn: 335578
2018-06-26 04:53:42 +00:00
Dan Gohman 910ba33d0c [WebAssembly] Fix lowering of varargs functions with non-legal fixed arguments.
CallLoweringInfo's NumFixedArgs field gives the number of fixed arguments
before legalization. The ISD::OutputArg "Outs" array holds legalized
arguments, so when indexing into it to find the non-fixed arguments, we need
to use the number of arguments after legalization.
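
A minimal sketch of the indexing mismatch, assuming a hypothetical `OutPiece` record that marks each legalized piece as fixed or variadic. The names and the splitting model are illustrative and are not the SelectionDAG data structures:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one legalized outgoing argument piece.
struct OutPiece {
  bool IsFixed; // true if the piece belongs to a fixed (named) argument
};

// Expands source-level arguments into legalized pieces. PartsPerArg[i] is the
// number of legal pieces argument i is split into; the first NumFixedArgs
// arguments are fixed, the rest are variadic.
std::vector<OutPiece> legalize(const std::vector<unsigned> &PartsPerArg,
                               std::size_t NumFixedArgs) {
  std::vector<OutPiece> Outs;
  for (std::size_t I = 0; I < PartsPerArg.size(); ++I)
    for (unsigned P = 0; P < PartsPerArg[I]; ++P)
      Outs.push_back({I < NumFixedArgs});
  return Outs;
}

// Index of the first variadic piece, counted after legalization.
std::size_t firstVarArgPiece(const std::vector<OutPiece> &Outs) {
  std::size_t Index = 0;
  while (Index < Outs.size() && Outs[Index].IsFixed)
    ++Index;
  return Index;
}
```

For a call with one fixed argument that splits into two pieces, followed by one variadic argument, the pre-legalization count is 1 but the first variadic piece sits at index 2.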

Fixes PR37934.

llvm-svn: 335576
2018-06-26 03:18:38 +00:00
Craig Topper c42ed4e3c4 [X86] Use XOR for SUB (C, X) during isel if will help fold an immediate
Summary:
Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining.

This seems like the safest way to do this trick. And we can consider doing it as a DAG combine later.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48557

llvm-svn: 335575
2018-06-26 03:11:15 +00:00
Dan Gohman fd2f7aeb12 [WebAssembly] Fix a typo in a comment.
llvm-svn: 335574
2018-06-26 03:03:41 +00:00
Teresa Johnson 519055336d [ThinLTO] Add string saver onto index for value names
Summary:
Adds a string saver to the ModuleSummaryIndex so it can store value
names in the case of adding a ValueInfo for a GUID when we don't
have the name stored in a Module string table. This is motivated
by the upcoming summary parser patch, where we will read value names
from the summary entry and want to store them, even when a Module
is not available.

Currently this allows us to store the name in the legacy bitcode case,
and I have added a test to show that.

Reviewers: pcc, dexonsmith

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D47842

llvm-svn: 335570
2018-06-26 02:29:08 +00:00
Craig Topper 689e363ff2 [X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction.
This recommits r335562 and 335563 as a single commit.

The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the signature of the builtin that software expects.

By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away.

llvm-svn: 335568
2018-06-26 01:37:02 +00:00
Teresa Johnson 9766fd64fb [ThinLTO] Add per-module indexes to combined index consistently
Summary:
Without this change we only add module paths to the combined index when
there is a module hash or at least one global value. Make this more
consistent by adding the module to the index whenever there is a summary
section, and it is a per-module summary (had a MODULE_CODE_SOURCE_FILENAME
record).

Since we will no longer add module paths lazily, add a new interface to get
the module info from the index that asserts it is already added.

Fixes PR37899.

Reviewers: Vlad, pcc

Subscribers: mehdi_amini, inglorion, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D48511

llvm-svn: 335567
2018-06-26 01:32:58 +00:00
Craig Topper 6f4fdfa9af Revert r335562 and 335563 "[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction."
These were supposed to have been squashed to a single commit.

llvm-svn: 335566
2018-06-26 01:31:53 +00:00
Lang Hames ce72161ddf [ORC] Add a symbolAliases function to the Core APIs.
symbolAliases can be used to define symbol aliases within a VSO.

llvm-svn: 335565
2018-06-26 01:22:29 +00:00
Craig Topper 9b4322ce31 foo
llvm-svn: 335562
2018-06-26 00:43:34 +00:00
Teresa Johnson 7bea1aad6a [ThinLTO] Compute GUID directly from GV when building per-module index
Summary:
I discovered when writing the summary parsing support that the
per-module index builder and writer are computing the GUID from the
value name alone (ignoring the linkage type). This was OK since those
GUIDs were not emitted in the bitcode, and there are never multiple
conflicting names in a single module.

However, I don't see a reason for making the GUID computation different
for the per-module case. It also makes things simpler on the parsing
side to have the GUID computation consistent. So this patch changes the
summary analysis phase and the per-module summary writer to compute the
GUID using the facility on the GlobalValue.

Reviewers: pcc, dexonsmith

Subscribers: llvm-commits, inglorion

Differential Revision: https://reviews.llvm.org/D47844

llvm-svn: 335560
2018-06-26 00:20:49 +00:00
Eric Christopher b7a52bb28a Add a warning if someone attempts to add extra section flags to sections
with well defined semantics like .rodata.

llvm-svn: 335558
2018-06-25 23:53:54 +00:00
Tim Shen 802c31cc28 [APInt] Add helpers for rounding u/sdivs.
Reviewers: sanjoy, craig.topper

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48498

llvm-svn: 335557
2018-06-25 23:49:20 +00:00
Chandler Carruth 1652996fd6 [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial
unswitching of switches.

This works much like trivial unswitching of switches in that it reliably
moves the switch out of the loop. Here we potentially clone the entire
loop into each successor of the switch and re-point the cases at these
clones.

Due to the complexity of actually doing nontrivial unswitching, this
patch doesn't create a dedicated routine for handling switches -- it
would duplicate far too much code. Instead, it generalizes the existing
routine to handle both branches and switches as it largely reduces to
looping in a few places instead of doing something once. This actually
improves the results in some cases with branches due to being much more
careful about how dead regions of code are managed. With branches,
because exactly one clone is created and there are exactly two edges
considered, somewhat sloppy handling of the dead regions of code was
sufficient in most cases. But with switches, there are much more
complicated patterns of dead code and so I've had to move to a more
robust model generally. We still do as much pruning of the dead code
early as possible because that allows us to avoid even cloning the code.

This also surfaced another problem with nontrivial unswitching before
which is that we weren't as precise in reconstructing loops as we could
have been. This seems to have been mostly harmless, but resulted in
pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we
have to get this *right*, and everything benefits from it.

While the testing may seem a bit light here because we only have two
real cases with actual switches, they do a surprisingly good job of
exercising numerous edge cases. Also, because we share the logic with
branches, most of the changes in this patch are reasonably well covered
by existing tests.

The new unswitch now has all of the same fundamental power as the old
one with the exception of the single unsound case of *partial* switch
unswitching -- that really is just loop specialization and not
unswitching at all. It doesn't fit into the canonicalization model in
any way. We can add a loop specialization pass that runs late based on
profile data if important test cases ever come up here.

Differential Revision: https://reviews.llvm.org/D47683

llvm-svn: 335553
2018-06-25 23:32:54 +00:00
Sanjay Patel 38a86d3136 [InstCombine] cleanup udiv folds; NFCI
This removes a "UDivFoldAction" in favor of a simple constant
matcher. In theory, the existing code could do more matching,
but I don't see any evidence or need for it. I've left a TODO
about using ValueTracking in case we see any regressions.

llvm-svn: 335545
2018-06-25 22:50:26 +00:00
Benjamin Kramer 1649774816 [Instrumentation] Remove unused include
It's also a layering violation.

llvm-svn: 335528
2018-06-25 21:43:09 +00:00
Sanjay Patel 6a96d90acd [InstCombine] fold sdiv with sext bool divisor
llvm-svn: 335527
2018-06-25 21:39:41 +00:00
Florian Hahn b10b141a79 Revert r335513: [SCEVExp] Advance found insertion point
llvm-svn: 335522
2018-06-25 20:55:26 +00:00
Craig Topper 27847868b7 [LoopIdiomRecognize] Fix a couple places where it appears we were unintentionally making copies of DebugLoc.
llvm-svn: 335521
2018-06-25 20:45:45 +00:00
Craig Topper 913abc8b58 [X86] Simplify intrinsic table binary search to not require a temporary struct.
std::lower_bound doesn't require the thing to search for to be the same type as the table entries. We just need to define an appropriate comparison function that can take a table entry and an intrinsic number.
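
A minimal sketch of such a heterogeneous search, assuming a hypothetical `IntrinsicData` entry and made-up table contents:

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical table entry, keyed by intrinsic number.
struct IntrinsicData {
  unsigned Id;
  unsigned Opcode;
};

// Table sorted by Id (the values are made up for illustration).
static const IntrinsicData Table[] = {{3, 30}, {7, 70}, {12, 120}, {20, 200}};

// Searches with a bare unsigned key; no temporary IntrinsicData is built.
const IntrinsicData *lookup(unsigned Id) {
  const IntrinsicData *End = std::end(Table);
  const IntrinsicData *It = std::lower_bound(
      std::begin(Table), End, Id,
      // The comparator takes (table entry, key), in that order.
      [](const IntrinsicData &Entry, unsigned Key) { return Entry.Id < Key; });
  return (It != End && It->Id == Id) ? It : nullptr;
}
```

`lookup` returns a pointer to the matching entry, or `nullptr` when the number is not in the table.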

llvm-svn: 335518
2018-06-25 20:27:46 +00:00
Craig Topper 614f192471 [X86] Add comment about the sorting of the memory folding tables added in r335501.
llvm-svn: 335517
2018-06-25 20:11:16 +00:00
Lei Huang 5d109ee3d4 [PowerPC] Fix incorrectly encoded wait instruction
Encoding for the wait instruction was wrong. Fix according to ISA 3.0.

Differential Revision: https://reviews.llvm.org/D48550

llvm-svn: 335514
2018-06-25 19:28:27 +00:00
Florian Hahn 5947c17fd4 [SCEVExp] Advance found insertion point until we find a non-dbg instruction.
This avoids creating unnecessary casts if the IP used to be a dbg info
intrinsic. Fixes PR37727.

Reviewers: vsk, aprantl, sanjoy, efriedma

Reviewed By: vsk, efriedma

Differential Revision: https://reviews.llvm.org/D47874

llvm-svn: 335513
2018-06-25 19:17:29 +00:00
Sanjay Patel 1e911fa746 [InstSimplify] fold div/rem of zexted bool
I was looking at an unrelated fold and noticed that
we don't have this simplification (because the other
fold would break existing tests).

Name: zext udiv
  %z = zext i1 %x to i32
  %r = udiv i32 %y, %z
=>
  %r = %y

Name: zext urem
  %z = zext i1 %x to i32
  %r = urem i32 %y, %z
=>
  %r = 0

Name: zext sdiv
  %z = zext i1 %x to i32
  %r = sdiv i32 %y, %z
=>
  %r = %y

Name: zext srem
  %z = zext i1 %x to i32
  %r = srem i32 %y, %z
=>
  %r = 0

https://rise4fun.com/Alive/LZ9
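
The reasoning behind these four folds is that a zero-extended bool divisor is either 0 or 1, and 0 would be a division by zero, so only 1 remains. A small C++ sketch, with illustrative function names and an assert standing in for the "divisor is non-zero" assumption:

```cpp
#include <cassert>
#include <cstdint>

// Each helper models zext i1 %x to i32 as the divisor. X == false would
// divide by zero, so it is ruled out up front.
uint32_t udivZext(uint32_t Y, bool X) {
  assert(X && "divisor would be zero");
  return Y / static_cast<uint32_t>(X); // Y / 1 == Y
}

uint32_t uremZext(uint32_t Y, bool X) {
  assert(X && "divisor would be zero");
  return Y % static_cast<uint32_t>(X); // Y % 1 == 0
}

int32_t sdivZext(int32_t Y, bool X) {
  assert(X && "divisor would be zero");
  return Y / static_cast<int32_t>(X); // Y / 1 == Y
}

int32_t sremZext(int32_t Y, bool X) {
  assert(X && "divisor would be zero");
  return Y % static_cast<int32_t>(X); // Y % 1 == 0
}
```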

llvm-svn: 335512
2018-06-25 18:51:21 +00:00
Kamil Rytarowski a8448ad098 Handle NetBSD specific path in findDebugBinary()
Summary:
The NetBSD Operating System installs debuginfo
files into /usr/libdata/debug, rather than the paths
used by some other popular distributions.

This change makes llvm-symbolizer functional with
the base system executables.

Reviewers: joerg, vitalybuka

Reviewed By: vitalybuka

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48525

llvm-svn: 335511
2018-06-25 18:49:13 +00:00
Reid Kleckner 88fee5fdbc Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC
code model is not implemented for MachO yet.

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335508
2018-06-25 18:16:27 +00:00
Craig Topper 3cc6cb1d35 [X86] Sort the static memory folding tables by reg opcode. Remove the reg->mem DenseMaps in favor of binary search.
With the static tables sorted we can binary search them directly for reg->mem lookups. This removes 6 DenseMaps that had to be created when X86InstrInfo is constructed.

We still have one Mem->Reg DenseMap for the reverse direction. This is created just as before by walking the reg->mem arrays to populate it.
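
A rough sketch of this arrangement, using `std::unordered_map` in place of DenseMap and a hypothetical three-entry table with made-up opcode numbers:

```cpp
#include <algorithm>
#include <iterator>
#include <unordered_map>

// Hypothetical folding entry: a register-form opcode and its memory form.
struct FoldEntry {
  unsigned RegOp;
  unsigned MemOp;
};

// Static table kept sorted by RegOp.
static const FoldEntry FoldTable[] = {{10, 110}, {11, 111}, {25, 125}};

// Sanity check that the table really is sorted by its search key.
bool tableIsSorted() {
  return std::is_sorted(std::begin(FoldTable), std::end(FoldTable),
                        [](const FoldEntry &A, const FoldEntry &B) {
                          return A.RegOp < B.RegOp;
                        });
}

// Reg->mem: binary search the static table directly, no map needed.
// Returns 0 when there is no entry.
unsigned regToMem(unsigned RegOp) {
  const FoldEntry *End = std::end(FoldTable);
  const FoldEntry *It = std::lower_bound(
      std::begin(FoldTable), End, RegOp,
      [](const FoldEntry &E, unsigned Key) { return E.RegOp < Key; });
  return (It != End && It->RegOp == RegOp) ? It->MemOp : 0;
}

// Mem->reg: built once by walking the reg->mem table.
std::unordered_map<unsigned, unsigned> buildMemToReg() {
  std::unordered_map<unsigned, unsigned> MemToReg;
  for (const FoldEntry &E : FoldTable)
    MemToReg.emplace(E.MemOp, E.RegOp);
  return MemToReg;
}
```

The forward direction needs no construction work at all, while the reverse direction still pays for one map that is filled from the same static data.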

Differential Revision: https://reviews.llvm.org/D48527

llvm-svn: 335501
2018-06-25 17:26:56 +00:00
Craig Topper b9cb88a4b0 [X86] Allow base and index for gather instructions to appear in other order for Intel syntax.
llvm-svn: 335500
2018-06-25 17:26:51 +00:00
Vedant Kumar b725c69f12 [SelectionDAG] Remove debug locations from ConstantSD(FP)Nodes
This removes debug locations from ConstantSDNode and ConstantSDFPNode.

When this kind of node is materialized we no longer create a line table
entry which jumps back to the constant's first point of use. This makes
single-stepping behavior smoother, and it matches the model used by IR,
where Constants have no locations. See this thread for more context:

  http://lists.llvm.org/pipermail/llvm-dev/2018-June/124164.html

I'd like to handle constant BuildVectorSDNodes and to try to eliminate
passing SDLocs to SelectionDAG::getConstant*() in follow-up commits.

Differential Revision: https://reviews.llvm.org/D48468

llvm-svn: 335497
2018-06-25 17:06:18 +00:00
Alexander Richardson 85e200e934 Add Triple::isMIPS()/isMIPS32()/isMIPS64(). NFC
There are quite a few if statements that enumerate all these cases. It gets
even worse in our fork of LLVM where we also have a Triple::cheri (which
is mips64 + CHERI instructions) and we had to update all if statements that
check for Triple::mips64 to also handle Triple::cheri. This patch helps to
reduce our diff to upstream and should also make some checks more readable.
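
A minimal sketch of the kind of helper this adds, using a hypothetical `MiniTriple` with a reduced architecture enum rather than the real Triple class:

```cpp
// Hypothetical subset of architectures, for illustration only.
enum class Arch { arm, mips, mipsel, mips64, mips64el, x86_64 };

struct MiniTriple {
  Arch A;

  // Each predicate enumerates the matching architectures in one place, so
  // callers no longer repeat the list in their own if statements.
  bool isMIPS32() const { return A == Arch::mips || A == Arch::mipsel; }
  bool isMIPS64() const { return A == Arch::mips64 || A == Arch::mips64el; }
  bool isMIPS() const { return isMIPS32() || isMIPS64(); }
};
```

A fork that adds another MIPS-like architecture would then only need to extend the predicate instead of every call site.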

Reviewed By: atanasyan

Differential Revision: https://reviews.llvm.org/D48548

llvm-svn: 335493
2018-06-25 16:49:20 +00:00
Matt Arsenault b1cc4f52ff AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr
Note a normal select test is not currently possible because this
relies on input registers tracked in SIMachineFunctionInfo which
are not currently serializable in MIR, but this does work end-to-end
from the IR.

llvm-svn: 335490
2018-06-25 16:17:48 +00:00
Matt Arsenault 921f7a27cc StackSlotColoring: Decide colors per stack ID
I thought I fixed this in r308673, but that fix was
very broken. The assumption that any frame index can be used
in place of another was more widespread than I realized.
Even when stack slot sharing was disabled, this was still
replacing frame index uses with a different ID with a different
stack slot.

Really fix this by doing the coloring per-stack ID, so all of
the coloring is logically done in a separate namespace. This is a lot
simpler than trying to figure out how to change the color if
the stack ID is different.

llvm-svn: 335488
2018-06-25 16:05:55 +00:00
Matt Arsenault 2811a20f77 AMDGPU: Remove commented out code
llvm-svn: 335486
2018-06-25 15:42:20 +00:00
Matt Arsenault b3feccd7fa AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers
llvm-svn: 335485
2018-06-25 15:42:12 +00:00
Wei Mi e555127435 [SampleFDO] Add an option to turn on/off warning about samples unused.
If a function has samples to use but cannot use them because debug
information is missing, a warning is currently issued to flag the missed
opportunity.

This warning assumes the binary generating the profile and the binary using
the profile are similar enough. That is not always the case. Sometimes, even
if the binaries are not quite similar, we may still get some benefit from
using sampleFDO. In those cases, we may still want to apply sampleFDO without
seeing a lot of such warnings pop up.

The patch adds an option for the warning.

Differential Revision: https://reviews.llvm.org/D48510

llvm-svn: 335484
2018-06-25 15:40:31 +00:00
David Green 8699492304 [DA] Delinearise AddRecs if we can prove they don't wrap
We can prove that some delinearized subscripts do not wrap around to become
negative by the fact that they are from inbound geps of load/store locations.
This helps improve the delinearisation in cases where we can't prove that they
are non-negative from SCEV alone.

Differential Revision: https://reviews.llvm.org/D48481

llvm-svn: 335481
2018-06-25 15:13:26 +00:00
Matt Arsenault 73eeb42e50 AMDGPU: Respect align argument parameter
This should avoid relying on the pointee type
to get the alignment, particularly since pointee
types are supposed to be removed at some point.

Also fixes not getting the alignment for unsized types.

llvm-svn: 335478
2018-06-25 14:29:04 +00:00
Artur Pilipenko ac6e6864e8 SafepointIRVerifier should ignore dead blocks and dead edges
Not only should SafepointIRVerifier ignore unreachable blocks (as suggested in https://reviews.llvm.org/D47011) but it also has to ignore dead blocks.

In @test2 (see the new tests):

  br i1 true, label %right, label %left
left:
  ...
right:
  ...
merge:
  %val = phi i8 addrspace(1)* [ ..., %left ], [ ..., %right ]
  use %val
both left and right branches are reachable.
If they collide then SafepointIRVerifier reports an error.

Because of the foldable branch condition GVN finds the left branch dead and removes the phi node entry that merges values from right and left. Then the use comes from the right branch. This results in no collision.

So, SafepointIRVerifier produces different results depending on whether GVN is run or not.

To solve this issue, this patch adds dead block detection to SafepointIRVerifier so that it can ignore dead blocks while validating IR. The dead block detection algorithm is taken from GVN but modified to not split critical edges, which is needed so that SafepointIRVerifier leaves the CFG unchanged.
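
A much simplified sketch of the idea, assuming dead edges (the not-taken side
of a branch on a constant) have already been marked: a block counts as live
only if it can be reached from the entry through live edges, and nothing in the
CFG is modified. This is plain reachability, not the GVN-derived algorithm the
patch uses; Edge, Block, computeLiveBlocks and makeTest2CFG are hypothetical
names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG model: each block lists its successor edges, and an edge
// is marked dead when a constant branch condition means it is never taken.
struct Edge {
  std::size_t To;
  bool Dead;
};

struct Block {
  std::vector<Edge> Succs;
};

// Returns, for each block, whether it is reachable from Entry via live edges.
// The CFG is taken by const reference: nothing is split or rewritten.
std::vector<bool> computeLiveBlocks(const std::vector<Block> &CFG,
                                    std::size_t Entry) {
  std::vector<bool> Live(CFG.size(), false);
  std::vector<std::size_t> Worklist{Entry};
  Live[Entry] = true;
  while (!Worklist.empty()) {
    std::size_t B = Worklist.back();
    Worklist.pop_back();
    for (const Edge &E : CFG[B].Succs) {
      if (E.Dead || Live[E.To])
        continue;
      Live[E.To] = true;
      Worklist.push_back(E.To);
    }
  }
  return Live;
}

// Shape of @test2: entry(0) branches on 'true' to right(2); the edge to
// left(1) is dead. Both left and right fall through to merge(3).
std::vector<Block> makeTest2CFG() {
  std::vector<Block> CFG(4);
  CFG[0].Succs = {{2, false}, {1, true}};
  CFG[1].Succs = {{3, false}};
  CFG[2].Succs = {{3, false}};
  return CFG;
}

void checkTest2() {
  std::vector<bool> Live = computeLiveBlocks(makeTest2CFG(), 0);
  assert(Live[0] && Live[2] && Live[3]);
  assert(!Live[1]); // 'left' is dead, so its phi input at 'merge' is ignored.
}
```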

Patch by Yevgeny Rouban.

Reviewed By: anna, apilipenko, DaniilSuchkov

Differential Revision: https://reviews.llvm.org/D47441

llvm-svn: 335473
2018-06-25 13:51:11 +00:00
Krzysztof Parzyszek 4581f37e7c Improve handling of COPY instructions with identical value numbers
Testcases provided by Tim Renouf.

Differential Revision: https://reviews.llvm.org/D48102

llvm-svn: 335472
2018-06-25 13:46:41 +00:00
Artur Pilipenko ddc7f391d2 Revert change 335077 "[InlineSpiller] Fix a crash due to lack of forward progress from remat specifically for STATEPOINT"
This change caused widespread assertion failures in our downstream testing:
lib/CodeGen/LiveInterval.cpp:409: bool llvm::LiveRange::overlapsFrom(const llvm::LiveRange&, llvm::LiveRange::const_iterator) const: Assertion `!empty() && "empty range"' failed.

llvm-svn: 335462
2018-06-25 12:58:13 +00:00
Simon Pilgrim 79e474bf46 Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning (again). NFCI.
llvm-svn: 335457
2018-06-25 11:46:24 +00:00
Simon Pilgrim 3a0e13f347 Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning. NFCI.
llvm-svn: 335454
2018-06-25 11:38:27 +00:00
Simon Pilgrim 5b6b500687 Fix -Wparentheses gcc warning. NFCI.
llvm-svn: 335451
2018-06-25 11:19:05 +00:00
Craig Topper facea6b4a6 [X86] Block commuting operand 1 of FMA*_Int instructions in findThreeSrcCommutedOpIndices. Remove uncommutable returns from getThreeSrcCommuteCase/getFMA3OpcodeToCommuteOperands.
We should be blocking the operand while we are in the routine that tries to find commutable operand indices. Doing it later means we might have missed out on another valid set of operands we could have commuted.

The intrinsic case was the only case that could really prevent commuting in getFMA3OpcodeToCommuteOperands. All the other cases in getThreeSrcCommuteCase were not reachable conditions as they were protected by findThreeSrcCommutedOpIndices.

With that abort case pushed earlier, we can remove all the abort checks and replace with asserts.

llvm-svn: 335446
2018-06-25 06:05:37 +00:00
George Burgess IV 97ec62455d [MSSA] Add domination number verifier; NFC
It's easy for domination numbers to get out-of-date, and this is no more
costly than any of the other verifiers we already have, so it seems nice
to have.

A stage3 build with this Works On My Machine, so this hasn't caught any
bugs... yet. :)

llvm-svn: 335444
2018-06-25 05:30:36 +00:00
Heejin Ahn 04c4894911 [WebAssembly] Add WebAssemblyException information analysis
Summary:
A WebAssemblyException object contains BBs that belong to a 'catch' part
of the try-catch-end structure. Because CFGSort requires all the BBs
within a catch part to be sorted together as it does for loops, this
pass calculates the nesting structure of catch part of exceptions in a
function. Now this assumes the use of Windows EH instructions.

Reviewers: dschuff, majnemer

Subscribers: jfb, mgorny, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D44134

llvm-svn: 335439
2018-06-25 01:20:21 +00:00
Heejin Ahn 4934f76b58 [WebAssembly] Add WebAssemblyLateEHPrepare pass
Summary:
Add WebAssemblyLateEHPrepare pass that does several small jobs for
exception handling. This runs before CFGSort, and is different from
WasmEHPrepare pass that runs before ISel, even though the names are
similar.

Reviewers: dschuff, majnemer

Subscribers: sbc100, jgravelle-google, sunfish, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D46803

llvm-svn: 335438
2018-06-25 01:07:11 +00:00
Craig Topper 3b18bdc46d [X86] Simplify some code by using isOneConstant. NFC
llvm-svn: 335437
2018-06-25 01:01:47 +00:00
Craig Topper 4331d6218d [X86] Remove the changes to combineScalarToVector made in r335037.
They appear to be untested other than the test case for p37879.ll and I believe we should be using SimplifyDemandedElts here to handle these cases.

llvm-svn: 335436
2018-06-25 00:21:53 +00:00
Craig Topper ecf7c5b75f [X86] Reduce the number of patterns needed for masked scalar ceil/floor isel.
The scalar to vector on the mask register should not be part of the patterns.

llvm-svn: 335435
2018-06-25 00:05:09 +00:00
Brad Smith df1f50579f [mips][ias] Enable IAS by default for OpenBSD / FreeBSD mips64/mips64el.
Reviewers: atanasyan

Differential Review: https://reviews.llvm.org/D31557

llvm-svn: 335434
2018-06-24 15:44:47 +00:00
Sanjay Patel 962ee178fa [DAGCombiner] eliminate setcc bool math when input is low-bit of some value
This patch has the same motivating example as D48466:
define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
    %c.0282 = and i32 %c.0282.in, 268435455
    %a16 = lshr i64 32508, %x
    %a17 = and i64 %a16, 1
    %tobool = icmp eq i64 %a17, 0
    %. = select i1 %tobool, i32 1, i32 2
    %.286 = select i1 %tobool, i32 27, i32 26
    %shr97 = lshr i32 %c.0282, %.
    %shl98 = shl i32 %c.0282.in, %.286
    %or99 = or i32 %shr97, %shl98
    %shr100 = lshr i32 %d.0280, %.
    %shl101 = shl i32 %d.0280, %.286
    %or102 = or i32 %shr100, %shl101
    store i32 %or99, i32* %ptr0
    store i32 %or102, i32* %ptr1
    ret void
}

...but I'm trying to kill the setcc bool math sooner rather than later.

By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, 
we can create a universally good fold because we always eliminate the condition code 
intermediate value.

Here are Alive proofs for these (currently instcombine folds the 'add' variants, but 
misses the 'sub' patterns):
https://rise4fun.com/Alive/Gsyp

Name: sub of zext cmp mask
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %z = zext i1 %c to i32
  %r = sub i32 C1, %z
  =>
  %optional_cast = zext i8 %a to i32
  %r = add i32 %optional_cast, C1-1

Name: add of zext cmp mask
  %a = and i32 %x, 1
  %c = icmp eq i32 %a, 0
  %z = zext i1 %c to i8
  %r = add i8 %z, C1
  =>
  %optional_cast = trunc i32 %a to i8
  %r = sub i8 C1+1, %optional_cast

All of the tests look like improvements or neutral to me. But it is possible that x86 
test+set+bitop is better than what we now show here. I suspect we could do better by 
adding another fold for the 'sub' variants.

We start with select-of-constant in IR in the larger motivating test, so that's why I 
included tests with selects. Proofs for those variants:
https://rise4fun.com/Alive/Bx1

Name: true const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C2, i64 C1
  =>
  %z = zext i8 %a to i64
  %r = sub i64 C2, %z

Name: false const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C1, i64 C2
  =>
  %z = zext i8 %a to i64
  %r = add i64 C1, %z
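
As an informal cross-check of the 'sub' and 'add' folds above (the Alive links
remain the actual proofs), the following sketch models both patterns with
fixed-width unsigned integers and compares the original and folded forms. All
function names are hypothetical, and unsigned wraparound stands in for the IR's
modular arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Original 'sub' pattern: C1 - zext((x & 1) == 0), computed in i32.
uint32_t subOriginal(uint8_t X, uint32_t C1) {
  uint8_t A = X & 1;
  uint32_t Z = (A == 0) ? 1u : 0u;
  return C1 - Z;
}

// Folded 'sub' pattern: zext(x & 1) + (C1 - 1).
uint32_t subFolded(uint8_t X, uint32_t C1) {
  return static_cast<uint32_t>(X & 1) + (C1 - 1u);
}

// Original 'add' pattern: zext((x & 1) == 0) + C1, computed in i8.
uint8_t addOriginal(uint32_t X, uint8_t C1) {
  uint32_t A = X & 1u;
  uint8_t Z = (A == 0) ? 1 : 0;
  return static_cast<uint8_t>(Z + C1);
}

// Folded 'add' pattern: (C1 + 1) - trunc(x & 1).
uint8_t addFolded(uint32_t X, uint8_t C1) {
  return static_cast<uint8_t>((C1 + 1) - static_cast<uint8_t>(X & 1u));
}

// Compare original and folded forms over all i8 inputs and a range of
// constants, including values that wrap around.
bool foldsAgree() {
  const uint32_t SubConstants[] = {0u, 1u, 42u, 0xFFFFFFFFu};
  for (unsigned X = 0; X < 256; ++X) {
    for (uint32_t C1 : SubConstants)
      if (subOriginal(static_cast<uint8_t>(X), C1) !=
          subFolded(static_cast<uint8_t>(X), C1))
        return false;
    for (unsigned C1 = 0; C1 < 256; ++C1)
      if (addOriginal(X, static_cast<uint8_t>(C1)) !=
          addFolded(X, static_cast<uint8_t>(C1)))
        return false;
  }
  return true;
}

void checkFolds() { assert(foldsAgree()); }
```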

Differential Revision: https://reviews.llvm.org/D48466

llvm-svn: 335433
2018-06-24 14:37:30 +00:00
Craig Topper 03523f6741 [X86] Regroup some isel patterns. NFC
For some reason the 64-bit patterns were separated from their 8/16/32-bit friends, but only for add/sub/mul. For and/or/xor they were together.

llvm-svn: 335429
2018-06-24 06:56:49 +00:00
Craig Topper 19772c89c7 [X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions.
llvm-svn: 335428
2018-06-24 06:29:50 +00:00
Brad Smith 8c17d5921d Add OpenBSD support to the Threading code
llvm-svn: 335426
2018-06-23 22:02:59 +00:00
Duncan P. N. Exon Smith f4c82cffc6 ADT: Use EBO to shrink SmallVector size 1
SmallVectorStorage is empty when its size is 1; use inheritance so that
the empty base class optimization kicks in.
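
The effect can be illustrated with a small sketch. The types below are
hypothetical stand-ins, not the real SmallVector classes: an empty class held
as a member still needs its own address and therefore occupies storage, while
the same class used as a base typically costs nothing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only.
struct EmptyStorage {}; // Models inline storage that has nothing to hold.

struct Header {
  void *Begin;
  unsigned Size;
  unsigned Capacity;
};

// Empty storage as a member: it must have a distinct address, so it takes at
// least one byte, usually rounded up by padding.
struct VecWithMember {
  Header H;
  EmptyStorage S;
};

// Empty storage as a base: the empty base class optimization lets it occupy
// no space at all.
struct VecWithBase : EmptyStorage {
  Header H;
};

// The member version is always strictly larger than the header alone.
static_assert(sizeof(VecWithMember) > sizeof(Header),
              "an empty member still occupies storage");

// With EBO applied, the base version is no larger than the header.
void checkSizes() {
  assert(sizeof(VecWithBase) == sizeof(Header));
  assert(sizeof(VecWithBase) < sizeof(VecWithMember));
}
```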

llvm-svn: 335421
2018-06-23 18:39:44 +00:00
Jonas Devlieghere c02bf01f64 [TableGen] Use WithColor for printing errors/warnings
Use the WithColor helper from support to print errors and warnings.

llvm-svn: 335415
2018-06-23 16:48:03 +00:00
Craig Topper d8d64a56b5 [X86] Make %eiz usage in 64-bit mode force a 0x67 address size prefix. Fix some test CHECK lines.
llvm-svn: 335414
2018-06-23 06:15:04 +00:00
Craig Topper 2545529034 [X86] Teach disassembler to use %eip instead of %rip when 0x67 prefix is used on a rip-relative address.
llvm-svn: 335413
2018-06-23 06:03:48 +00:00
Craig Topper 68d64e3859 [X86][AsmParser] Improve base/index register checks.
-Ensure EIP isn't used with an index register.
-Ensure EIP isn't used as index register.
-Ensure base register isn't a vector register.
-Ensure eiz/riz usage matches the size of their base register.

llvm-svn: 335412
2018-06-23 05:53:00 +00:00
Stanislav Mekhanoshin d8c9374797 Fix invariant fdiv hoisting in LICM
FDiv is replaced with multiplication by reciprocal and invariant
reciprocal is hoisted out of the loop, while multiplication remains
even if invariant.

Switch checks for all invariant operands and only invariant
denominator to fix the issue.
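
For readers unfamiliar with the transformation, here is a source-level sketch
of replacing a division by a loop-invariant denominator with a multiplication
by a hoisted reciprocal. The function names are hypothetical. The rewrite is
only acceptable when reciprocal arithmetic is allowed, because in general the
two forms can round differently; the check below uses a power-of-two
denominator so that both are exact.

```cpp
#include <cassert>
#include <vector>

// Before: the division by the loop-invariant D happens on every iteration.
std::vector<double> divideEach(const std::vector<double> &In, double D) {
  std::vector<double> Out;
  Out.reserve(In.size());
  for (double V : In)
    Out.push_back(V / D);
  return Out;
}

// After: the reciprocal is computed once outside the loop and only a
// multiplication remains inside it.
std::vector<double> multiplyByReciprocal(const std::vector<double> &In,
                                         double D) {
  const double Recip = 1.0 / D; // Hoisted loop-invariant reciprocal.
  std::vector<double> Out;
  Out.reserve(In.size());
  for (double V : In)
    Out.push_back(V * Recip);
  return Out;
}

// With D = 4.0 the reciprocal 0.25 is exactly representable, so both
// versions produce identical results.
void checkHoist() {
  const std::vector<double> In{1.0, 2.0, 8.0};
  assert(divideEach(In, 4.0) == multiplyByReciprocal(In, 4.0));
}
```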

Differential Revision: https://reviews.llvm.org/D48447

llvm-svn: 335411
2018-06-23 04:01:28 +00:00
Reid Kleckner fd7c9ab971 [AMDGPU] Update includes for intrinsic changes :(
llvm-svn: 335409
2018-06-23 03:05:39 +00:00
Lang Hames d716a26e8b [ORC] Fix formatting and list pending queries in VSO::dump.
llvm-svn: 335408
2018-06-23 02:22:10 +00:00
Reid Kleckner f5890e4e43 [IR] Split Intrinsics.inc into enums and implementations
Implements PR34259

Intrinsics.h is a very popular header. Most LLVM TUs care about things
like dbg_value, but they don't care how they are implemented. After I
split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU
from scanning 1.7 MB of source that gets pre-processed away.

It also means we can modify intrinsic properties without triggering a
full rebuild, but that's probably less of a win.

I think the next best thing to do would be to split out the target
intrinsics into their own header. Very, very few TUs care about
target-specific intrinsics. It's very hard to split up the target
independent intrinsics like llvm.expect, assume, and dbg.value, though.

llvm-svn: 335407
2018-06-23 02:02:38 +00:00
Craig Topper abdbb2c67a [X86][AsmParser] Rework that allows (%dx) to be used in place of %dx with in/out instructions.
Previously, to support (%dx) we left a wide open hole in our 16-bit memory address checking. This let this address value be used with any instruction without error in the parser. It would later fail in the encoder with an assertion failure on debug builds and who knows what on release builds.

This patch passes the mnemonic down to the memory operand parsing function so we can allow the (%dx) form only on specific instructions.

llvm-svn: 335403
2018-06-23 00:03:20 +00:00
Reid Kleckner 330f65b3e8 [RuntimeDyld] Implement the ELF PIC large code model relocations
Prerequisite for https://reviews.llvm.org/D47211 which improves our ELF
large PIC codegen.

llvm-svn: 335402
2018-06-22 23:53:22 +00:00
Eli Friedman 203eaaf5ba [LoopReroll] Rewrite induction variable rewriting.
This gets rid of a bunch of weird special cases; instead, just use SCEV
rewriting for everything.  In addition to being simpler, this fixes a
bug where we would use the wrong stride in certain edge cases.

The one bit I'm not quite sure about is the trip count handling,
specifically the FIXME about overflow.  In general, I think we need to
widen the exit condition, but that's probably not profitable if the new
type isn't legal, so we probably need a check somewhere.  That said, I
don't think I'm making the existing problem any worse.

As a followup to this, a bunch of IV-related code in root-finding could
be cleaned up; with SCEV-based rewriting, there isn't any reason to
assume a loop will have exactly one or two PHI nodes.

Differential Revision: https://reviews.llvm.org/D45191

llvm-svn: 335400
2018-06-22 22:58:55 +00:00
George Burgess IV 2cbf9730b0 [MSSA] Remove incorrect comment + `auto`ify dyn_cast results; NFC
llvm-svn: 335399
2018-06-22 22:34:07 +00:00
Craig Topper 10e2f73793 [X86][AsmParser] Keep track of whether an explicit scale was specified while parsing an address in Intel syntax. Use it for improved error checking.
This allows us to check these:
-16-bit addressing doesn't support scale so we should error if we find one there.
-Multiplying ESP/RSP by a scale even if the scale is 1 should be an error because ESP/RSP can't be an index.

llvm-svn: 335398
2018-06-22 22:28:39 +00:00
Craig Topper 1d707539e4 [X86][AsmParser] In Intel syntax make sure we support ESP/RSP being the second register in memory expressions like [EAX+ESP].
By default, the second register gets assigned to the index register slot. But ESP can't be an index register so we need to swap it with the other register.

There's still a slight bug that we allow [EAX+ESP*1]. The existence of the multiply, even though it's with 1, should force ESP to the index register and trigger an error, but it doesn't currently.

llvm-svn: 335394
2018-06-22 21:57:24 +00:00
Tobias Edler von Koch 7609cb83e6 Re-land "[LTO] Enable module summary emission by default for regular LTO"
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).

See https://reviews.llvm.org/D34156 for details on the original change.

This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.

llvm-svn: 335385
2018-06-22 20:23:21 +00:00
Craig Topper 9bc2c059c3 [X86] Don't accept (%si,%bp) 16-bit address expressions.
The second register is the index register and should only be %si or %di if used with a base register. And in that case the base register should be %bp or %bx.

This makes us compatible with gas.

We do still need to support both orders with Intel syntax, which uses [bp+si] and [si+bp].
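
The rule described above can be sketched as a pair of predicates. This is an
illustration only, not the parser code: Reg16 and both function names are
hypothetical, and only the two-register case is modelled.

```cpp
#include <cassert>

// Hypothetical register enum for illustration; not the X86 backend's own.
enum class Reg16 { AX, BX, BP, SI, DI };

// AT&T order is (base,index): the base must be %bp or %bx and the index
// must be %si or %di.
bool isValidATTBaseIndex(Reg16 Base, Reg16 Index) {
  const bool BaseOK = Base == Reg16::BP || Base == Reg16::BX;
  const bool IndexOK = Index == Reg16::SI || Index == Reg16::DI;
  return BaseOK && IndexOK;
}

// Intel syntax accepts the two registers in either order.
bool isValidIntelPair(Reg16 First, Reg16 Second) {
  return isValidATTBaseIndex(First, Second) ||
         isValidATTBaseIndex(Second, First);
}

void checkAddressing() {
  assert(isValidATTBaseIndex(Reg16::BP, Reg16::SI));  // (%bp,%si) is fine.
  assert(!isValidATTBaseIndex(Reg16::SI, Reg16::BP)); // (%si,%bp) is rejected.
  assert(isValidIntelPair(Reg16::SI, Reg16::BP));     // [si+bp] is accepted.
}
```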

llvm-svn: 335384
2018-06-22 20:20:38 +00:00