Commit Graph

145388 Commits

Author SHA1 Message Date
Dan Gohman 9e188e3366 [WebAssembly] Define an initial set of relocation types for Wasm.
This set will likely evolve, along with the Wasm linking ABI.

llvm-svn: 296177
2017-02-24 21:21:44 +00:00
Tim Northover ef29e7284b GlobalISel: check for CImm rather than Imm on G_CONSTANTs.
All G_CONSTANTS created by the MachineIRBuilder have an operand of type CImm
(i.e. a ConstantInt), so that's what the selector needs to look for.

llvm-svn: 296176
2017-02-24 21:21:38 +00:00
Sanjay Patel cd72f156d6 [ARM] auto-generate complete checks; NFC
The affected test may change with a patch I'm looking at for DAGCombiner,
so I want to make sure it's not a regression.

llvm-svn: 296175
2017-02-24 21:19:09 +00:00
Dan Gohman 6999c4fd28 [WebAssembly] Handle f16 in fast-isel.
llvm-svn: 296172
2017-02-24 21:05:35 +00:00
Xin Tong 68ea9aa23a Fix Indentation. NFCI
llvm-svn: 296169
2017-02-24 20:59:26 +00:00
Lang Hames 630d2639f9 [Orc][RPC] Accept both const char* and char* arguments for string serialization.
llvm-svn: 296168
2017-02-24 20:56:43 +00:00
Eli Friedman c12a5a7595 [CodeGenPrepare] Make -addr-sink-using-gep work with address spaces.
When we construct addressing modes, we use isNoopAddrSpaceCast to ignore
addrspacecast instructions. Make sure we insert the correct addrspacecast
when we reconstruct the addressing mode.

Differential Revision: https://reviews.llvm.org/D30114

llvm-svn: 296167
2017-02-24 20:51:36 +00:00
Yaxun Liu e6d1ce59c0 [InstCombine] Fix bug in pointer replacement
This optimisation was crashing when there was a chain of more than one bitcast
instruction to replace, as a result of the changes in D27283.

Patch by James Price.

Differential Revision: https://reviews.llvm.org/D30347

llvm-svn: 296163
2017-02-24 20:27:25 +00:00
Davide Italiano 74f27b80d4 [Target/MIPS] Kill dead code, no functional change intended.
Hopefully placates gcc with -Werror.

llvm-svn: 296153
2017-02-24 18:48:10 +00:00
Michael Kuperstein 46b131e3f8 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296149
2017-02-24 18:41:32 +00:00
Simon Pilgrim cdf2bd656a Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt subrange
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296147
2017-02-24 18:31:04 +00:00
Matthew Simpson bdc9c78880 [LV] Merge floating-point and integer induction widening code
This patch merges the existing floating-point induction variable widening code
into the integer induction variable widening code, creating a single set of
functions for both kinds of inductions. The primary motivation for doing this
is to enable vector phi node creation for floating-point induction variables.

Differential Revision: https://reviews.llvm.org/D30211

llvm-svn: 296145
2017-02-24 18:20:12 +00:00
Nemanja Ivanovic 195c5452d3 [PowerPC] Use subfic instruction for subtract from immediate
Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate.
The corresponding pattern already exists for 32-bit integers.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29387

llvm-svn: 296144
2017-02-24 18:16:06 +00:00
Nemanja Ivanovic 82d53ed492 [PowerPC] Use rldicr instruction for AND with an immediate if possible
Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that
clear bits from the right hand size.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29388

llvm-svn: 296143
2017-02-24 18:03:16 +00:00
Simon Pilgrim bd9fb2ae95 [APInt] Add APInt::extractBits() method to extract APInt subrange
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296141
2017-02-24 17:46:18 +00:00
Simon Pilgrim d030291d6d Fixed IntOperandMatcher::emitCxxPredicateExpr arguments
Extra const in the StringRef argument meant that MSVC complained about it not correctly overriding from OperandPredicateMatcher::emitCxxPredicateExpr (which didn't have the const)

llvm-svn: 296138
2017-02-24 17:20:27 +00:00
Sanjay Patel 832b1622d8 [DAGCombiner] add missing folds for scalar select of {-1,0,1}
The motivation for filling out these select-of-constants cases goes back to D24480, 
where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
https://reviews.llvm.org/rL75531
https://reviews.llvm.org/rL159230

The idea is that we should always canonicalize patterns like this to a select-of-constants 
in IR because that's the smallest IR and the best for value tracking. Note that we currently 
do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in 
this patch already exist in InstCombine today:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151

As this patch shows, most targets generate better machine code for simple ext/add/not ops 
rather than a select of constants. So the follow-up steps to make this less of a patchwork 
of special-case folds and missing IR canonicalization:

1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2  Have InstCombine canonicalize in the other direction (create more selects).

Differential Revision: https://reviews.llvm.org/D30180

llvm-svn: 296137
2017-02-24 17:17:33 +00:00
Simon Dardis ae6f2bcb25 Recommit "[mips] Fix atomic compare and swap at O0."
This time with the missing files.

Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

llvm-svn: 296134
2017-02-24 16:32:18 +00:00
Simon Dardis 3c58c18ff0 Revert "[mips] Fix atomic compare and swap at O0."
This reverts r296132. I forgot to include the tests.

llvm-svn: 296133
2017-02-24 16:30:27 +00:00
Simon Dardis cf0e06d375 [mips] Fix atomic compare and swap at O0.
Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

llvm-svn: 296132
2017-02-24 16:27:45 +00:00
Daniel Sanders 066ebbfd46 [globalisel] Decouple src pattern operands from dst pattern operands.
Summary:
This isn't testable for AArch64 by itself so this patch also adds
support for constant immediates in the pattern and physical
register uses in the result.

The new IntOperandMatcher matches the constant in patterns such as
'(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold
immediates into an instruction so this is the first rule that will match
across multiple BB's.

The Renderer hierarchy is responsible for adding operands to the result
instruction. Renderers can copy operands (CopyRenderer) or add physical
registers (in particular %wzr and %xzr) to the result instruction
in any order (OperandMatchers now import the operand names from
SelectionDAG to allow renderers to access any operand). This allows us to
emit the result instruction for:
  %1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0
  %1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0
although the latter is untested since the matcher/importer has not been
taught about commutativity yet.

Added BuildMIAction which can build new instructions and mutate them where
possible. W.r.t the mutation aspect, MatchActions are now told the name of
an instruction they can recycle and BuildMIAction will emit mutation code
when the renderers are appropriate. They are appropriate when all operands
are rendered using CopyRenderer and the indices are the same as the matcher.
This currently assumes that all operands have at least one matcher.

Finally, this change also fixes a crash in
AArch64InstructionSelector::select() caused by an immediate operand
passing isImm() rather than isCImm(). This was uncovered by the other
changes and was detected by existing tests.

Depends on D29711

Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar

Reviewed By: rovka

Subscribers: aemerson, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29712

llvm-svn: 296131
2017-02-24 15:43:30 +00:00
Simon Pilgrim 7f6a7c97a7 [X86][SSE] Target shuffle combine can try to combine up to 16 vectors
Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth).

llvm-svn: 296130
2017-02-24 15:35:52 +00:00
Sanjay Patel ec9a8de0e6 [InstCombine] don't try SimplifyDemandedInstructionBits from zext/sext because it's slow and unnecessary
This one seems more obvious than D30270 that it can't make improvements because an extension always needs
all of the incoming bits. There's one specific transform in SimplifyDemandedInstructionBits of converting
a sext to a zext when the sign-bit is known zero, but that is handled explicitly in visitSext() with
ComputeSignBit().

Like D30270, there are no IR differences (other than instruction names) for the case in PR32037:
https://bugs.llvm.org//show_bug.cgi?id=32037
...and no regression test differences.

Zext/sext are a smaller part of the profile, but this still appears to shave off another 0.5% or so from
'opt -O2'.

Differential Revision: https://reviews.llvm.org/D30280

llvm-svn: 296129
2017-02-24 15:18:42 +00:00
Sanjay Patel 9f0fa52aa2 [x86] use DAG.getAllOnesConstant(); NFCI
llvm-svn: 296128
2017-02-24 15:09:59 +00:00
Daniel Sanders 8d4d72f16b Fix missing call to base class constructor in r296121.
The 'Kind' member used in RTTI for InstructionPredicateMatcher was not
initialized but went undetected since I always ended up with the correct value.

llvm-svn: 296126
2017-02-24 14:53:35 +00:00
Simon Dardis aa20881749 [mips] Handle 64 bit immediate in and/or/xor pseudo instructions on mips64
Previously LLVM was assuming 32-bit signed immediates which results in and with
a bitmask that has bit 31 set to incorrectly include bits 63-32 in the result.
After applying this patch I can now compile all of the FreeBSD mips assembly
code with clang.

This issue also affects the nor, slt and sltu macros and I will fix those in a
separate review.

Patch By: Alexander Richardson

Commit message reformatted by sdardis.

Reviewers: atanasyan, theraven, sdardis

Differential Revision: https://reviews.llvm.org/D30298

llvm-svn: 296125
2017-02-24 14:34:32 +00:00
Diana Picus 3b99c64ba1 [ARM] GlobalISel: Select G_STORE
Same as selecting G_LOAD.

llvm-svn: 296122
2017-02-24 14:01:27 +00:00
Daniel Sanders 759ff41f31 [globalisel] Sort RuleMatchers by priority.
Summary:
This makes more important rules have priority over less important rules.
For example, '%a = G_ADD $b:s64, $c:s64' has priority over
'%a = G_ADD $b:s32, $c:s32'. Previously these rules were emitted in the
correct order by chance.

NFC in this patch but it is required to make the next patch work correctly.

Depends on D29710

Reviewers: t.p.northover, ab, qcolombet, aditya_nandakumar, rovka

Reviewed By: ab, rovka

Subscribers: javed.absar, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D29711

llvm-svn: 296121
2017-02-24 13:58:11 +00:00
Diana Picus b31a259198 Minor test fix
The test was using a size of 8 for loading/storing pointers. It should be 4.

llvm-svn: 296120
2017-02-24 13:27:55 +00:00
Diana Picus 1f432f995a [ARM] GlobalISel: Add reg bank mappings for stores
Same as the ones for loads.

llvm-svn: 296115
2017-02-24 13:07:25 +00:00
Simon Dardis ebc35129e5 [mips][mc] Fix a crash when disassembling odd sized sections
Attempt to fix failing test.

llvm-svn: 296112
2017-02-24 12:47:41 +00:00
Diana Picus 767d053d0d Fixup r296105 - only run tests on Mips
llvm-svn: 296111
2017-02-24 12:47:11 +00:00
Simon Pilgrim 4f8a443798 Fix signed/unsigned comparison warnings
llvm-svn: 296109
2017-02-24 11:31:00 +00:00
Diana Picus a2b632a353 [ARM] GlobalISel: Legalize stores
Allow the same types that we allow for loads.

llvm-svn: 296108
2017-02-24 11:28:24 +00:00
Simon Dardis f7923db066 [mips][mc] Fix a crash when disassembling odd sized sections
Corresponding test.

llvm-svn: 296106
2017-02-24 10:51:27 +00:00
Simon Dardis a5f52dc00d [mips][mc] Fix a crash when disassembling odd sized sections
Make the MIPS disassembler consistent with the other targets in returning
a Size of zero when the input buffer cannot contain an instruction due
to it's size. Previously it reported the minimum instruction size when
it failed due to the buffer not being big enough for an instruction
causing llvm-objdump to crash when disassembling all sections.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D29984

llvm-svn: 296105
2017-02-24 10:50:27 +00:00
Diana Picus c21d1e5d94 Revert "[ARM] GlobalISel: Legalize stores"
This reverts commit r296103 because the test broke on one of the bots. Sorry!

llvm-svn: 296104
2017-02-24 10:35:39 +00:00
Diana Picus a5f1cfd1a7 [ARM] GlobalISel: Legalize stores
Allow the same types that we allow for loads.

llvm-svn: 296103
2017-02-24 10:19:23 +00:00
Simon Pilgrim aed352273e [APInt] Add APInt::setBits() method to set all bits in range
The current pattern for setting bits in range is typically:

Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation memory for the temporary variable.

This is one of the key compile time issues identified in PR32037.

This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely, this first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible.

I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial.

Differential Revision: https://reviews.llvm.org/D30265

llvm-svn: 296102
2017-02-24 10:15:29 +00:00
Justin Bogner 259a0cf3ee Add missing initialization for MachineOptimizationRemarkEmitter
This was missed in r293110.

llvm-svn: 296096
2017-02-24 07:42:35 +00:00
Dan Gohman 411dc07aba [WebAssembly] Add a README.txt entry for mergeable sections.
llvm-svn: 296095
2017-02-24 07:33:55 +00:00
Craig Topper 8783bbb598 [AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC
llvm-svn: 296094
2017-02-24 07:21:10 +00:00
Craig Topper c446b1ffae [ExecutionDepsFix] Use range-based for loop. NFC
llvm-svn: 296093
2017-02-24 06:38:24 +00:00
Craig Topper c43f3f3291 [IR][X86] Fix llvm version number in comments in AutoUpgrade. Forgot the next release is 5.0 not 4.1
llvm-svn: 296092
2017-02-24 05:35:07 +00:00
Craig Topper f2529c188b [AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz intrinsics with select.
Clang has been emitting cltz intrinsics for a while now.

llvm-svn: 296091
2017-02-24 05:35:04 +00:00
Craig Topper dc13344150 [AVX-512] Move lzcnt and conflict intrinsic tests to avx512cd intrinsic test file since that's their feature.
llvm-svn: 296090
2017-02-24 05:34:59 +00:00
Craig Topper 700edc3275 [AVX-512] Use update_llc_test_checks.py to generate a test.
llvm-svn: 296089
2017-02-24 05:34:57 +00:00
Petr Hosek a7d5916308 [Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack
The Fuchsia ABI defines slots from the thread pointer where the
stack-guard value for stack-protector, and the unsafe stack pointer
for safe-stack, are stored. This parallels the Android ABI support.

Patch by Roland McGrath

Differential Revision: https://reviews.llvm.org/D30237

llvm-svn: 296081
2017-02-24 03:10:10 +00:00
Eli Friedman 7e0ce82c4a Add some testcases for bitfields with illegal widths.
clang will generate IR like this for input using packed bitfields;
very simple semantically, but it's a bit tricky to actually
generate good code.

llvm-svn: 296080
2017-02-24 03:04:11 +00:00
Eli Friedman 8f34746c72 Fix old testcase for dead store to match the original intent.
The x86 backend has a special case for load+xor+store, which isn't really
what this is trying to test.

llvm-svn: 296077
2017-02-24 02:58:49 +00:00