Commit Graph

44489 Commits

Author SHA1 Message Date
Jonas Paulsson 1e8648577c [SystemZ] Update kill-flag in splitMove().
EarlierMI needs to clear the kill flag on the first operand in case of a store.

Review: Ulrich Weigand
llvm-svn: 301177
2017-04-24 12:40:28 +00:00
Renato Golin 54c736f833 [DWARF] Move test to x86 directory
llvm-svn: 301176
2017-04-24 12:37:11 +00:00
George Rimar ca53211beb [DWARF] - Take relocations in account when extracting ranges from .debug_ranges
I found this when investigated "Bug 32319 - .gdb_index is broken/incomplete" for LLD.

When we have object file with .debug_ranges section it may be filled with zeroes.
Relocations are exist in file to relocate this zeroes into real values later, but until that
a pair of zeroes is treated as terminator. And DWARF parser thinks there is no ranges at all
when I am trying to collect address ranges for building .gdb_index.

Solution implemented in this patch is to take relocations in account when parsing ranges.

Differential revision: https://reviews.llvm.org/D32228

llvm-svn: 301170
2017-04-24 10:19:45 +00:00
Diana Picus f53865daa4 [ARM] GlobalISel: Legalize s8 and s16 G_(S|U)DIV
We have to widen the operands to 32 bits and then we can either use
hardware division if it is available or lower to a libcall otherwise.

At the moment it is not enough to set the Legalizer action to
WidenScalar, since for libcalls it won't know what to do (it won't be
able to find what size to widen to, because it will find Libcall and not
Legal for 32 bits). To hack around this limitation, we request Custom
lowering, and as part of that we widen first and then we run another
legalizeInstrStep on the widened DIV.

llvm-svn: 301166
2017-04-24 09:12:19 +00:00
Sjoerd Meijer e5b8557d5b [Arch64AsmParser] better diagnostic for isb
Instruction isb takes as an operand either 'sy' or an immediate value. This
improves the diagnostic when the string is not 'sy' and adds a test case for
this which was missing. This also adds tests to check invalid inputs for dsb
and dmb.

Differential Revision: https://reviews.llvm.org/D32227

llvm-svn: 301165
2017-04-24 08:22:20 +00:00
Diana Picus b70e88bdec [ARM] GlobalISel: Support G_(S|U)DIV for s32
Add support for both targets with hardware division and without. For
hardware division we have to add support throughout the pipeline
(legalizer, reg bank select, instruction select). For targets without
hardware division, we only need to mark it as a libcall.

llvm-svn: 301164
2017-04-24 08:20:05 +00:00
Diana Picus 95a8aa93e2 [ARM] GlobalISel: Select G_CONSTANT with CImm operands
When selecting a G_CONSTANT to a MOVi, we need the value to be an Imm
operand. We used to just leave the G_CONSTANT operand unchanged, which
works in some cases (such as the GEP offsets that we create when
referring to stack slots). However, in many other places the G_CONSTANTs
are created with CImm operands. This patch makes sure to handle those as
well, and to error out gracefully if in the end we don't end up with an
Imm operand.

Thanks to Oliver Stannard for reporting this issue.

llvm-svn: 301162
2017-04-24 06:30:56 +00:00
Dean Michael Berris ca780b5a27 [XRay] A tool for Comparing xray function call graphs
Summary:
This is a tool for comparing the function graphs produced by the
llvm-xray graph too. It takes the form of a new subcommand of the
llvm-xray tool 'graph-diff'.

This initial version of the patch is very rough, but it is close to
feature complete.

Depends on D29363

Reviewers: dblaikie, dberris

Reviewed By: dberris

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D29320

llvm-svn: 301160
2017-04-24 05:54:33 +00:00
Sanjoy Das 0cdcdf018e Revert "[SCEV] Enable SCEV verification by default in EXPENSIVE_CHECKS builds"
This reverts commit r301150.  It breaks CodeGen/Hexagon/hwloop-wrap2.ll, reverting
while I investigate.

llvm-svn: 301154
2017-04-24 02:35:19 +00:00
Sanjoy Das 8919303b0a [SCEV] Enable SCEV verification by default in EXPENSIVE_CHECKS builds
llvm-svn: 301150
2017-04-24 00:41:58 +00:00
Sanjoy Das bdbc4938f9 [SCEV] Fix exponential time complexity by caching
llvm-svn: 301149
2017-04-24 00:09:46 +00:00
Xinliang David Li db8d09b6c2 [PartialInine]: add triaging options
There are more bugs (runtime failures) triggered when partial
inlining is turned on. Add options to help triaging problems.

llvm-svn: 301148
2017-04-23 23:39:04 +00:00
Simon Pilgrim 12df01c3c7 [X86][AVX] Add scheduling latency/throughput tests for some AVX1 instructions
More instructions will be added in future commits

llvm-svn: 301145
2017-04-23 22:08:17 +00:00
Sanjay Patel e0c26e0640 [InstCombine] add/move folds for [not]-xor
We handled all of the commuted variants for plain xor already,
although they were scattered around and sometimes folded less
efficiently using distributive laws. We had no folds for not-xor.

Handling all of these patterns consistently is part of trying to 
reinstate:
https://reviews.llvm.org/rL300977

llvm-svn: 301144
2017-04-23 22:00:02 +00:00
Simon Pilgrim 06d6263309 [X86][SSE] Add scheduler class support for SSE42 (PCMPGT) instructions
llvm-svn: 301142
2017-04-23 21:23:27 +00:00
Simon Pilgrim 7d71ed503d [X86][SSE] Add scheduling latency/throughput tests for (most) SSE42 instructions
llvm-svn: 301141
2017-04-23 21:00:25 +00:00
Sanjay Patel afa371fd1d [InstCombine] add tests for not-xor and remove redundant tests; NFC
llvm-svn: 301140
2017-04-23 20:59:00 +00:00
Xin Tong f98602a1ab [JumpThread] We want to fold (not thread) when all predecessor go to single BB's successor.
Summary:
In case all predecessor go to a single successor of current BB. We want to fold (not thread).

I failed to update the phi nodes properly in the last patch https://reviews.llvm.org/rL300657.

Phi nodes values are per predecessor in LLVM.

Reviewers: sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32400

llvm-svn: 301139
2017-04-23 20:56:29 +00:00
Simon Pilgrim 19a173ac23 [X86][SSE] Add scheduling latency/throughput tests for (most) SSE41 instructions
llvm-svn: 301137
2017-04-23 20:05:21 +00:00
Simon Pilgrim 57fea6879b [X86][SSE] Add missing scheduling latency/throughput test for PINSRW
llvm-svn: 301136
2017-04-23 19:56:49 +00:00
Sanjay Patel 42a84ac710 [InstCombine] add tests for or-to-xor; NFC
llvm-svn: 301131
2017-04-23 16:37:36 +00:00
Sanjay Patel d13b0bfdac [InstCombine] add pattern matches for commuted variants of xor-to-xor
There's probably some better way to write this that eliminates the
code duplication without hurting readability, but at least this
eliminates the logic holes and is hopefully slightly more efficient
than creating new instructions.

llvm-svn: 301129
2017-04-23 16:03:00 +00:00
Sanjay Patel 9081808521 [InstCombine] add tests for xor-to-xor; NFC
Besides missing 2 commuted patterns, the way we handle these folds is inefficient.

llvm-svn: 301128
2017-04-23 14:51:03 +00:00
Simon Pilgrim c781c0f630 [X86][SSE] Add scheduling latency/throughput tests for SSSE3 instructions
llvm-svn: 301127
2017-04-23 14:01:55 +00:00
Simon Pilgrim e8f8422fe5 [X86][SSE] Add scheduling latency/throughput tests for SSE3 instructions
llvm-svn: 301126
2017-04-23 13:59:29 +00:00
Sanjay Patel 794c34dc35 [InstCombine] add tests for add-to-xor commuted variants; NFC
1 out of the 4 tests commuted the operands, so there's an asymmetry
somewhere under this in how we handle these transforms.

llvm-svn: 301125
2017-04-23 13:37:05 +00:00
Ayman Musa 544988f34d [X86] Convert test checks to generated checks of update_llc_test_checks.py. NFC
llvm-svn: 301107
2017-04-23 07:41:40 +00:00
Artyom Skrobov 53cf1897cc [ARM] ScheduleDAGRRList::DelayForLiveRegsBottomUp must consider OptionalDefs
Summary:
D30400 has enabled tADC and tSBC instructions to be unglued, thereby allowing CPSR to remain live between Thumb1 scheduling units.

Most Thumb1 instructions have an OptionalDef for CPSR; but the scheduler ignored the OptionalDefs, and could unwittingly insert a flag-setting instruction in between an ADDS and the corresponding ADC.

Reviewers: javed.absar, atrick, MatzeB, t.p.northover, jmolloy, rengolin

Reviewed By: javed.absar

Subscribers: rogfer01, efriedma, aemerson, rengolin, llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D31081

llvm-svn: 301106
2017-04-23 06:58:08 +00:00
Adrian Prantl 4677205010 Revert "Use DW_OP_stack_value when reconstructing variable values with arithmetic."
This reverts commit r301093 while investigating stage2 bot breakage.

llvm-svn: 301099
2017-04-23 00:44:40 +00:00
Jonathan Roelofs 1233fe5ac3 Fix testcase: s/CHECKNEXT/CHECK-NEXT/
llvm-svn: 301098
2017-04-22 23:43:44 +00:00
Sanjay Patel ceff20fe50 [InstCombine] clean up tests and regenerate checks; NFC
llvm-svn: 301097
2017-04-22 23:36:47 +00:00
Adrian Prantl a2d25ac14a Use DW_OP_stack_value when reconstructing variable values with arithmetic.
When the location description of a source variable involves arithmetic
on the value itself, it needs to be marked with DW_OP_stack_value since it
is not describing the variable's location, but rather its value.

This is a follow-up to r297971 and fixes the source testcase quoted in
the comment in debuginfo-dce.ll.

rdar://problem/30725338

llvm-svn: 301093
2017-04-22 20:54:06 +00:00
Simon Pilgrim f27a714a9e [X86] Regenerate TLS tests
Use the correct check prefix for X86/X32/X64 target types.

llvm-svn: 301092
2017-04-22 20:13:58 +00:00
Daniel Sanders 658541fe69 [globalisel][tablegen] Add support for RegisterOperand.
Summary:
It functions just like RegisterClass except that the class is obtained
from a field.

Depends on D31761.

Reviewers: ab, qcolombet, t.p.northover, rovka, kristof.beyls, aditya_nandakumar

Reviewed By: ab

Subscribers: dberris, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D32229

llvm-svn: 301080
2017-04-22 15:53:21 +00:00
Daniel Sanders 2deea1878e [globalisel][tablegen] Revise API for ComplexPattern operands to improve flexibility.
Summary:
Some targets need to be able to do more complex rendering than just adding an
operand or two to an instruction. For example, it may need to insert an
instruction to extract a subreg first, or it may need to perform an operation
on the operand.

In SelectionDAG, targets would create SDNode's to achieve the desired effect
during the complex pattern predicate. This worked because SelectionDAG had a
form of garbage collection that would take care of SDNode's that were created
but not used due to a later predicate rejecting a match. This doesn't translate
well to GlobalISel and the churn was wasteful.

The API changes in this patch enable GlobalISel to accomplish the same thing
without the waste. The API is now:
	InstructionSelector::OptionalComplexRendererFn selectArithImmed(MachineOperand &Root) const;
where Root is the root of the match. The return value can be omitted to
indicate that the predicate failed to match, or a function with the signature
ComplexRendererFn can be returned. For example:
	return OptionalComplexRendererFn(
	       [=](MachineInstrBuilder &MIB) { MIB.addImm(Immed).addImm(ShVal); });
adds two immediate operands to the rendered instruction. Immed and ShVal are
captured from the predicate function.

As an added bonus, this also reduces the amount of information we need to
provide to GIComplexOperandMatcher.

Depends on D31418

Reviewers: aditya_nandakumar, t.p.northover, qcolombet, rovka, ab, javed.absar

Reviewed By: ab

Subscribers: dberris, kristof.beyls, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D31761

llvm-svn: 301079
2017-04-22 15:11:04 +00:00
Daniel Sanders 3016d3c6c9 [globalisel][tablegen] Fix PR32733 by checking which instruction operands belong to.
canMutate() was returning true when the operands were all in the same order as
the matched instruction. However, it wasn't checking the operands were actually
on that instruction. This worked when we could only match a single instruction
but the addition of nested instruction matching led to cases where the operands
could be split across multiple instructions. canMutate() now returns false if
operands belong to instructions other than the root of the match.

llvm-svn: 301077
2017-04-22 14:31:28 +00:00
David Blaikie 5477b97d45 Fix test to handle .rel and .rela sections (& to actually specify the target architecture as X86)
llvm-svn: 301073
2017-04-22 08:17:39 +00:00
David Blaikie 85366acf15 Avoid using relocations for ref_addr in .dwo files
In dwo files the fixed offset can be used - if the dwos are linked into
a dwp, the dwo consumer must use the dwp tables to find out where the
original range of the debug_info was and resolve the "section relative"
value relative to that original range - effectively
avoiding/reimplementing the relocation handling.

llvm-svn: 301072
2017-04-22 07:53:44 +00:00
David Blaikie 6cce69020c Fix test from polluting the source tree
(though this seems like a "does this not crash" test - which isn't very
good. Should be fixed)

llvm-svn: 301071
2017-04-22 07:53:40 +00:00
Artur Pilipenko 0632bdc648 Fix for PR32740 - Invalid floating type, unreachable between r300969 and r301029
The bug was introduced by r301018 "[InstCombine] fadd double (sitofp x), y check that the promotion is valid". The patch didn't expect that fadd can be on vectors not necessarily scalars. Add vector support along with the test.

llvm-svn: 301070
2017-04-22 07:24:52 +00:00
Matt Arsenault 01d17e7c5f LowerSwitch: Fix producing invalid IR on unreachable code
If a switch was in an unreachable block that branched
to a block with a phi, it would leave phis with missing
predecessors.

llvm-svn: 301064
2017-04-21 23:54:12 +00:00
David Blaikie 96b1ed50e8 Move Split DWARF handling to an MC option/command line argument rather than using metadata
Since Split DWARF needs to name the actual .dwo file that is generated,
it can't be known at the time the llvm::Module is produced as it may be
merged with other Modules before the object is generated and that object
may be generated with any name.

By passing the Split DWARF file name when LLVM is producing object code
the .dwo file name in the object file can match correctly.

The support for Split DWARF for implicit modules remains the same -
using metadata to store the dwo name and dwo id so that potentially
multiple skeleton CUs referring to different dwo files can be generated
from one llvm::Module.

llvm-svn: 301062
2017-04-21 23:35:26 +00:00
Matthias Braun d78597ec08 AArch64FrameLowering: Check if the ExtraCSSpill register is actually unused
The code assumed that when saving an additional CSR register
(ExtraCSSpill==true) we would have a free register throughout the
function. This was not true if this CSR register is also used to pass
values as in the swiftself case.

rdar://31451816

llvm-svn: 301057
2017-04-21 22:42:08 +00:00
Adrian Prantl ff384546f5 Add test coverage for mem2reg dbg.declare lowering.
llvm-svn: 301050
2017-04-21 22:13:55 +00:00
Hans Wennborg 9b9a5358dd Re-commit r301040 "X86: Don't emit zero-byte functions on Windows"
In addition to the original commit, tighten the condition for when to
pad empty functions to COFF Windows.  This avoids running into problems
when targeting e.g. Win32 AMDGPU, which caused test failures when this
was committed initially.

llvm-svn: 301047
2017-04-21 21:48:41 +00:00
Matt Arsenault c07bda7b87 InferAddressSpaces: Infer for just GEPs
Fixes leaving intermediate flat addressing computations
where a GEP instruction's source is a constant expression.

Still leaves behind a trivial addrspacecast + gep pair that
instcombine is able to handle, which ideally could be folded
here directly.

llvm-svn: 301044
2017-04-21 21:35:04 +00:00
Xinliang David Li 0e9f6df169 [PartialInliner] Partial inliner needs to check use kind before transformation
Differential Revision: https://reviews.llvm.org/D32373

llvm-svn: 301042
2017-04-21 21:20:56 +00:00
Hans Wennborg 04593000d8 Revert r301040 "X86: Don't emit zero-byte functions on Windows"
This broke almost all bots. Reverting while fixing.

llvm-svn: 301041
2017-04-21 21:10:37 +00:00
Hans Wennborg cb3e810714 X86: Don't emit zero-byte functions on Windows
Empty functions can lead to duplicate entries in the Guard CF Function
Table of a binary due to multiple functions sharing the same RVA,
causing the kernel to refuse to load that binary.

We had a terrific bug due to this in Chromium.

It turns out we were already doing this for Mach-O in certain
situations. This patch expands the code for that in
AsmPrinter::EmitFunctionBody() and renames
TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it
seems it was used for not just Mach-O anyway.

Differential Revision: https://reviews.llvm.org/D32330

llvm-svn: 301040
2017-04-21 20:58:12 +00:00
Zachary Turner 0fc009b008 Add a dependency from llvm/test to llvm-cvtres.
llvm-svn: 301038
2017-04-21 20:45:11 +00:00
Tim Northover 1efaa3a88f AArch64: add test for "fence singlethread"
Forgot a git add yesterday.

llvm-svn: 301037
2017-04-21 20:36:08 +00:00
Tim Northover e31cf3f824 ARM: make sure we use all entries in a vector before forming a vpaddl.
Otherwise there's some mismatch, and we'll either form an illegal type or an
illegal node.

Thanks to Eli Friedman for pointing out the problem with my original solution.

llvm-svn: 301036
2017-04-21 20:35:52 +00:00
Sanjay Patel 8ce1d4cbe1 [InstCombine] revert r300977 and r301021
This can cause an inf-loop. Investigating...

llvm-svn: 301035
2017-04-21 20:29:17 +00:00
Konstantin Zhuravlyov 3d1cc88c68 AMDGPU: Temporarily disable packed inlinable literals (v2f16, v2i16)
Differential Revision: https://reviews.llvm.org/D32361

llvm-svn: 301028
2017-04-21 19:45:22 +00:00
Konstantin Zhuravlyov 88938d4e67 AMDGPU: Fix S_PACK_HH_B32_B16
- We really ought to zero out lower 16 bits

Differential Revision: https://reviews.llvm.org/D32356

llvm-svn: 301026
2017-04-21 19:35:05 +00:00
Yaxun Liu 15a96b1dc8 [AMDGPU] Handle SI_MASKED_UNREACHABLE in instruction emitter
SI_MASKED_UNREACHABLE does not have machine instruction encoding.
It needs special handling in AMDGPUAsmPrinter::EmitInstruction like some
other pseudo instructions.

This patch fixes compilation failure of RadeonRays.

Differential Revision: https://reviews.llvm.org/D32364

llvm-svn: 301025
2017-04-21 19:32:02 +00:00
Konstantin Zhuravlyov c4b18e7099 AMDGPU: Do not lower fast unsafe div for safe, f32, with fp32 denormals
Differential Revision: https://reviews.llvm.org/D32085

llvm-svn: 301023
2017-04-21 19:25:33 +00:00
Akira Hatanaka 22e839f4b2 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300932 and r300930, which was causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.

This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 301019
2017-04-21 18:53:12 +00:00
Artur Pilipenko 134d94f9a3 [InstCombine] fadd double (sitofp x), y check that the promotion is valid
Doing these transformations check that the result of integer addition is representable in the FP type.

(fadd double (sitofp x), fpcst) --> (sitofp (add int x, intcst))
(fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))

This is a fix for https://bugs.llvm.org//show_bug.cgi?id=27036

Reviewed By: andrew.w.kaylor, scanon, spatel

Differential Revision: https://reviews.llvm.org/D31182

llvm-svn: 301018
2017-04-21 18:45:25 +00:00
Zachary Turner 087edfa2e8 Add empty shell of llvm-cvtres.
This marks the beginning of an effort to port remaining
MSVC toolchain miscellaneous utilities to all platforms.

Currently clang-cl shells out to certain additional tools
such as the IDL compiler, resource compiler, and a few
other tools, but as these tools are Windows-only it
limits the ability of clang to target Windows on other
platforms.  having a full suite of these tools directly
in LLVM should eliminate this constraint.

The current implementation provides no actual functionality,
it is just an empty skeleton executable for the purposes
of making incremental changes.

Differential Revision: https://reviews.llvm.org/D32095
Patch by Eric Beckmann (ecbeckmann@google.com)

llvm-svn: 301004
2017-04-21 17:30:29 +00:00
Tim Northover 1061ccca8c ARM: don't try to create an i8 -> i32 vpaddl.
DAG combine was mistakenly assuming that the step-up it was looking at was
always a doubling, but it can sometimes be a larger extension in which case
we'd crash.

llvm-svn: 301002
2017-04-21 17:21:59 +00:00
Daniel Sanders e7b0d66080 [globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule.
Summary:
The SelectionDAG importer now imports rules with Predicate's attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitset's to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for it's own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).

Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.

Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab

Reviewed By: rovka

Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D31418

llvm-svn: 300993
2017-04-21 15:59:56 +00:00
Craig Topper 7af078847c [SimplifyCFG] Fix the determination of PostBB in conditional store merging to handle the targets on the second branch being commuted
Currently we choose PostBB as the single successor of QFB, but its possible that QTB's single successor is QFB which would make QFB the correct choice.

Differential Revision: https://reviews.llvm.org/D32323

llvm-svn: 300992
2017-04-21 15:53:42 +00:00
Wei Mi 337d4d95c2 [ConstHoisting] Add BFI in constanthoisting pass and select the best insertion
places based on it.

Existing constant hoisting pass will merge a group of contants in a small range
and hoist the const materialization code to the common dominator of their uses.
However, if the uses are all in cold pathes, existing implementation may hoist
the materialization code from cold pathes to a hot place. This may hurt performance.
The patch introduces BFI to the pass and selects the best insertion places based
on it.

The change is controlled by an option consthoist-with-block-frequency which is
off by default for now.

Differential Revision: https://reviews.llvm.org/D28962

llvm-svn: 300989
2017-04-21 15:50:16 +00:00
Matthew Simpson e2037d24f9 [LV] Model if-converted phi node costs
Phi nodes in non-header blocks are converted to select instructions after
if-conversion. This patch updates the cost model to account for the selects.

Differential Revision: https://reviews.llvm.org/D31906

llvm-svn: 300980
2017-04-21 14:14:54 +00:00
Daniel Sanders 419efdd55b Revert r300964 + r300970 - [globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule.
It's causing llvm-clang-x86_64-expensive-checks-win to fail to compile and I
haven't worked out why. Reverting to make it green while I figure it out.

llvm-svn: 300978
2017-04-21 14:09:20 +00:00
Sanjay Patel 347b54b093 [InstCombine] prefer xor with -1 because 'not' is easier to understand (PR32706)
This matches the demanded bits behavior in the DAG and should fix:
https://bugs.llvm.org/show_bug.cgi?id=32706

Differential Revision: https://reviews.llvm.org/D32255

llvm-svn: 300977
2017-04-21 14:03:54 +00:00
Diana Picus 64a33431eb [ARM] GlobalISel: Add support for G_TRUNC
Select them as copies. We only select if both the source and the
destination are on the same register bank, so this shouldn't cause any
trouble.

llvm-svn: 300971
2017-04-21 13:16:50 +00:00
Diana Picus f941ec0ecc [ARM] GlobalISel: Make struct arguments fail elegantly
The condition in isSupportedType didn't handle struct/array arguments
properly. Fix the check and add a test to make sure we use the fallback
path in this kind of situation. The test deals with some common cases
where the call lowering should error out. There are still some issues
here that need to be addressed (tail calls come to mind), but they can
be addressed in other patches.

llvm-svn: 300967
2017-04-21 11:53:01 +00:00
Daniel Sanders 279d03527e [globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule.
Summary:
The SelectionDAG importer now imports rules with Predicate's attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitset's to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for it's own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).

Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.

Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab

Reviewed By: rovka

Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D31418

llvm-svn: 300964
2017-04-21 10:27:20 +00:00
Clement Courbet 00b51bf123 add skylake
llvm-svn: 300962
2017-04-21 09:21:01 +00:00
Clement Courbet 2430e25ba3 add 32 bit tests
llvm-svn: 300961
2017-04-21 09:20:58 +00:00
Clement Courbet d5f6182bec use repmovsb when optimizing forminsize
llvm-svn: 300960
2017-04-21 09:20:55 +00:00
Clement Courbet 203fc17797 Rename FastString flag.
llvm-svn: 300959
2017-04-21 09:20:50 +00:00
Clement Courbet a7c233fbe0 add more tests
llvm-svn: 300958
2017-04-21 09:20:44 +00:00
Clement Courbet 1ce3b82dea X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copies
when the subtarget has fast strings.

This has two advantages:
  - Speed is improved. For example, on Haswell thoughput improvements increase
    linearly with size from 256 to 512 bytes, after which they plateau:
    (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes).
  - Code is much smaller (no need to handle boundaries).

llvm-svn: 300957
2017-04-21 09:20:39 +00:00
Artyom Skrobov 8d9643009f [Thumb1] The recently added tADCS and tSBCS pseudo-instructions were missing `Uses = [CPSR]`
Summary: Thanks to Oliver Stannard for helping catch this.

Reviewers: olista01, efriedma

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D31815

llvm-svn: 300951
2017-04-21 07:35:21 +00:00
Davide Italiano fa15de34b7 [PartialInliner] Fix crash when inlining functions with unreachable blocks.
CodeExtractor looks up the dominator node corresponding to return blocks
when splitting them. If one of these blocks is unreachable, there's no
node in the Dom and CodeExtractor crashes because it doesn't check
for domtree node validity.
In theory, we could add just a check for skipping null DTNodes in
`splitReturnBlock` but the fix I propose here is slightly different. To the
best of my knowledge, unreachable blocks are irrelevant for the algorithm,
therefore we can just skip them when building the candidate set in the
constructor.

Differential Revision:  https://reviews.llvm.org/D32335

llvm-svn: 300946
2017-04-21 04:25:00 +00:00
Akira Hatanaka 78ccba6a20 Revert r300932 and r300930.
It seems that r300930 was creating an infinite loop in dag-combine when
compling the following file:

MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c

llvm-svn: 300940
2017-04-21 01:31:50 +00:00
Akira Hatanaka 19077aaee0 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 300930
2017-04-21 00:05:16 +00:00
Eli Friedman d0e6ae5678 Revert r300746 (SCEV analysis for or instructions).
There have been multiple reports of this causing problems: a
compile-time explosion on the LLVM testsuite, and a stack
overflow for an opencl kernel.

llvm-svn: 300928
2017-04-20 23:59:05 +00:00
Akira Hatanaka 7b06cebe73 Revert "[AArch64] Improve code generation for logical instructions taking"
This reverts r300913.

This broke bots.

llvm-svn: 300916
2017-04-20 23:03:30 +00:00
Craig Topper 1cb0e5afb0 [Simplify] Add testcase to show that merging conditional stores for triangles is sensitive to the order of the branch targets on the conditional branches. NFC
llvm-svn: 300915
2017-04-20 22:57:36 +00:00
Akira Hatanaka e327f09832 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 300913
2017-04-20 22:47:56 +00:00
Sanjay Patel c9485ca895 [InstCombine] allow shl+shr demanded bits folds with splat constants
llvm-svn: 300911
2017-04-20 22:33:54 +00:00
Sanjay Patel be2dcaf45a [InstCombine] add tests for shl+shr demanded bits splat vector folds; NFC
llvm-svn: 300907
2017-04-20 22:18:47 +00:00
Tim Northover 46e58354da ARM: lower "fence singlethread" to a pure compiler barrier.
Single-threaded fences aren't required to provide any synchronization with
other processing elements so there's no need for a DMB. They should still be a
barrier for compiler optimizations though.

llvm-svn: 300904
2017-04-20 21:56:52 +00:00
Sanjay Patel 3e1ae72fcf [InstCombine] allow shl demanded bits folds with splat constants
More fixes are needed to enable the helper SimplifyShrShlDemandedBits().

llvm-svn: 300898
2017-04-20 21:33:02 +00:00
Sanjay Patel fb5b3e773a [InstCombine] allow ashr/lshr demanded bits folds with splat constants
llvm-svn: 300888
2017-04-20 20:59:02 +00:00
Sanjay Patel 7e77bed813 [InstCombine] add tests for demanded bits ashr/lshr splat constants; NFC
llvm-svn: 300884
2017-04-20 20:44:54 +00:00
Adrian Prantl ada104888e Don't emit locations that need a DW_OP_stack_value in DWARF 2 & 3.
https://bugs.llvm.org/show_bug.cgi?id=32382

llvm-svn: 300883
2017-04-20 20:42:33 +00:00
Tim Northover 8b1240b0f0 ARM: handle post-indexed NEON ops where the offset isn't the access width.
Before, we assumed that any ConstantInt offset was precisely the access width,
so we could use the "[rN]!" form. ISelLowering only ever created that kind, but
further simplification during combining could lead to unexpected constants and
incorrect codegen.

Should fix PR32658.

llvm-svn: 300878
2017-04-20 19:54:02 +00:00
Paul Robinson 70b34533c2 [DWARF] Versioning for DWARF constants; verify FORMs
Associate the version-when-defined with definitions of standard DWARF
constants.  Identify the "vendor" for DWARF extensions.
Use this information to verify FORMs in .debug_abbrev are defined as
of the DWARF version specified in the associated unit.
Removed two tests that had specified DWARF v1 (which essentially does
not exist).

Differential Revision: http://reviews.llvm.org/D30785

llvm-svn: 300875
2017-04-20 19:16:51 +00:00
Yaxun Liu 5d977f8ed4 CodeGen: Let frame index value type match alloca addr space
Recently alloca address space has been added to data layout. Due to this
change, pointer returned by alloca may have different size as pointer in
address space 0.

However, currently the value type of frame index is assumed to be of the
same size as pointer in address space 0.

This patch fixes that.

Most targets assume alloca returning pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.

AMDGCN target with amdgiz environment requires this change since it
assumes alloca returning pointer to addr space 5 and its size is 32,
which is different from the size of pointer in addr space 0 which is 64.

Differential Revision: https://reviews.llvm.org/D32021

llvm-svn: 300864
2017-04-20 18:15:34 +00:00
Amara Emerson bfbdebd00e [MVT][SVE] Scalable vector MVTs (2/3)
Adds scalable vector machine value types, and updates
the switch statements required for tablegen.

Patch by Graham Hunter.

Differential Revision: https://reviews.llvm.org/D32018

llvm-svn: 300840
2017-04-20 13:36:58 +00:00
Petar Jovanovic 2b6fe3ffa6 [mips][msa] Mask vectors holding shift amounts
Masked vectors which hold shift amounts when creating the following nodes:
ISD::SHL, ISD::SRL or ISD::SRA.
Instructions that use said nodes, which have had their arguments altered are
sll, srl, sra, bneg, bclr and bset.

For said instructions, the shift amount or the bit position that is
specified in the corresponding vector elements will be interpreted as the
shift amount/bit position modulo the size of the element in bits.

The problem lies in compiling with -O2 enabled, where the instructions for
formats .w and .d are not generated, but are instead optimized away.
In this case, having shift amounts that are either negative or greater than
the element bit size results in generation of incorrect results when
constant folding.

We remedy this by masking the operands for the nodes mentioned above before
actually creating them, so that the final result is correct before placed
into the constant pool.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D31331

llvm-svn: 300839
2017-04-20 13:26:46 +00:00
John Brawn 66719f63d0 [ARM] Fix handling of mapping symbols when changing sections
ChangeSection incorrectly registers LastEMSInfo as belonging to the previous
section, not the current section. This happens to work when changing sections
using .section, as the previous section is set to the current section before
the call to ChangeSection, but not when using .popsection.

Differential Revision: https://reviews.llvm.org/D32225

llvm-svn: 300831
2017-04-20 10:18:13 +00:00
John Brawn 5ca5daa6b9 [AArch64] Fix handling of zero immediate in fmov instructions
Currently fmov #0 with a vector destination is handle incorrectly and results in
fmov #-1.9375 being emitted but should instead give an error. This is due to the
way we cope with fmov #0 with a scalar destination being an alias of fmov zr, so
fix this by actually doing it through an alias.

Differential Revision: https://reviews.llvm.org/D31949

llvm-svn: 300830
2017-04-20 10:13:54 +00:00
John Brawn dcf037a6f0 [AArch64] Fix handling of integer fp immediates
When an integer is used as an fp immediate we're failing to check the return
value of getFP64Imm, so invalid values are silently permitted. Fix this by
merging together the integer and real handling.

llvm-svn: 300828
2017-04-20 10:10:10 +00:00
Adrian Prantl c12cee3600 Fix bug that caused DwarfExpression to drop DW_OP_deref from FI locations
- introduced in r300522 and found via the Swift LLDB testsuite.

The fix is to set the location kind to memory whenever an FrameIndex
location is emitted.

rdar://problem/31707602

llvm-svn: 300793
2017-04-19 23:42:25 +00:00
Reid Kleckner aa0cec7d6d Simplify test for sret attribute in instcombine
This change is correct because the verifier requires that at most one
argument be marked 'sret'.

NFC, removes a use of AttributeList slot APIs.

llvm-svn: 300784
2017-04-19 23:17:47 +00:00
Galina Kistanova 2cc97d92ce Temporarily revert r299221 to fix nondeterminism in ThinLTO builder.
llvm-svn: 300783
2017-04-19 23:16:14 +00:00
Matthias Braun 372ee59766 X86FrameLowering: Fix getFrameIndexReference() for 'fixed' objects
Debug information is calculated with getFrameIndexReference() which was
missing some logic for the fixed object cases (= parameters on the stack).

rdar://24557797

Differential Revision: https://reviews.llvm.org/D32204

llvm-svn: 300781
2017-04-19 23:10:43 +00:00
Kostya Serebryany c5d3d49034 [sanitizer-coverage] remove some more stale code
llvm-svn: 300778
2017-04-19 22:42:11 +00:00
Sanjay Patel 0658a95a35 [DAG] add splat vector support for 'or' in SimplifyDemandedBits
I've changed one of the tests to not fold away, but we didn't and still don't do the transform
that the comment claims we do (and I don't know why we'd want to do that).

Follow-up to:
https://reviews.llvm.org/rL300725
https://reviews.llvm.org/rL300763

llvm-svn: 300772
2017-04-19 22:00:00 +00:00
Kostya Serebryany be87d480ff [sanitizer-coverage] remove stale code
llvm-svn: 300769
2017-04-19 21:48:09 +00:00
Sanjay Patel ae382bb6af [DAG] add splat vector support for 'xor' in SimplifyDemandedBits
This allows forming more 'not' ops, so we get improvements for ISAs that have and-not.

Follow-up to:
https://reviews.llvm.org/rL300725

llvm-svn: 300763
2017-04-19 21:23:09 +00:00
Matthias Braun 8aaa368d00 ARMFrameLowering: Reserve emergency spill slot for large arguments
Re-commit after revert in r300668. Changed getMaxFPOffset() to a
more conservative heuristic instead of trying to be clever and missing
for some exotic calling conventions.

We need to reserve an emergency spill slot in cases with large argument
types that could overflow immediate offsets for FP relative address
calculations.

rdar://31317893

Differential Revision: https://reviews.llvm.org/D31643

llvm-svn: 300761
2017-04-19 21:11:44 +00:00
Simon Pilgrim 9a828c1f2b [InstCombine] Add frem constant folding test (PR3316)
llvm-svn: 300757
2017-04-19 21:09:19 +00:00
Matt Arsenault 4a48623e4f AMDGPU: Custom lower illegal small select types
Promote them to i32 vectors to avoid unpacking and re-packing
the vectors.

llvm-svn: 300754
2017-04-19 20:53:07 +00:00
Simon Pilgrim 19a6dddd6d [InstCombine] Add frem constant folding test (PR32177)
llvm-svn: 300750
2017-04-19 20:47:58 +00:00
Eli Friedman f281d490cc [ARM] Use TableGen patterns to select vtbl. NFC.
Differential Revision: https://reviews.llvm.org/D32103

llvm-svn: 300749
2017-04-19 20:39:39 +00:00
Eli Friedman e77d2b86b4 [SCEV] Make SCEV or modeling more aggressive.
Use haveNoCommonBitsSet to figure out whether an "or" instruction
is equivalent to addition. This handles more cases than just
checking for a constant on the RHS.

Differential Revision: https://reviews.llvm.org/D32239

llvm-svn: 300746
2017-04-19 20:19:58 +00:00
Dehao Chen a364f09f18 Using address range map to speedup finding inline stack for address.
Summary:
In the current implementation, to find inline stack for an address incurs expensive linear search in 2 places:

* linear search for the top-level DIE
* recursive linear traverse the DIE tree to find the path to the leaf DIE

In this patch, a map is built from address to its corresponding leaf DIE. The inline stack is built by traversing from the leaf DIE up to the root DIE. This speeds up batch symbolization by ~10X without noticible memory overhead.

Reviewers: dblaikie

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32177

llvm-svn: 300742
2017-04-19 20:09:38 +00:00
Dehao Chen cfe5b2bf74 Update the madd.ll test with utils/update_llc_test_checks.py (NFC)
llvm-svn: 300740
2017-04-19 20:08:14 +00:00
Dehao Chen 58601674d2 PR32710: Disable using PMADDWD for unsigned short.
Summary: PMADDWD can only handle signed short.

Reviewers: mkuper, wmi

Reviewed By: mkuper

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32236

llvm-svn: 300737
2017-04-19 19:50:34 +00:00
Matt Arsenault 021a218dd2 AMDGPU: Don't emit amd_kernel_code_t for callable functions
This is inserted directly in the text section. The relocation
for the function ends up resolving to the beginning of the
amd_kernel_code_t header rather than the actual function
entry point.

Also skip some of the comments for initialization
that only makes sense for kernels.

llvm-svn: 300736
2017-04-19 19:38:10 +00:00
Artem Tamazov 557a057d4f [AMDGPU][mc][tests][NFC] Update bulk ISA tests for Gfx7 and Gfx8
Added approx. 1100 gfx7 and 1040 gfx8 test cases.

llvm-svn: 300734
2017-04-19 19:12:06 +00:00
Matt Arsenault d3406bc45c StructurizeCFG: Directly invert cmp instructions
The most common case for a branch condition is
a single use compare. Directly invert the branch
predicate rather than adding a lot of xor i1 true
which the DAG will have to fold later.

This produces nicer to read structurizer output.

This produces some random changes in codegen
due to the DAG swapping branch conditions itself,
and then does a poor job of dealing with those
inverts.

llvm-svn: 300732
2017-04-19 18:29:07 +00:00
Sanjoy Das 5945447d84 [GVN] Don't coerce non-integral pointers to integers or vice versa
Summary:
See http://llvm.org/docs/LangRef.html#non-integral-pointer-type

The NewGVN test does not fail without these changes (perhaps it does
try to coerce pointers <-> integers to begin with?), but I added the
test case anyway.

Reviewers: dberlin

Subscribers: mcrosier, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D32208

llvm-svn: 300730
2017-04-19 18:21:09 +00:00
Tim Northover ff168c68dc ARM: TLS calling convention doesn't preserve r9 or r12 on Darwin.
llvm-svn: 300726
2017-04-19 18:07:54 +00:00
Sanjay Patel ded7d59f0e [DAG] add splat vector support for 'and' in SimplifyDemandedBits
The patch itself is simple: stop discriminating against vectors in visitAnd() and again in 
SimplifyDemandedBits().

Some notes for reference:

1. We're not consistent about calls to SimplifyDemandedBits in the various visitXXX functions. 
   Sometimes, we check if the RHS is a constant first. Other times (like here), we just dive in.
2. I'd like to break the vector shackles in steps for the sake of risk minimization, but we could
    make similar simultaneous changes in other places if we think that would be better.
3. I don't know what the intent of the changed tests in this patch was supposed to be, but since 
   they wiggled in a positive way, I'm just going with that. :)
4. In the rotate tests, note that we can see through non-splat constants. This is a result of D24253.
5. My motivation for being here now is to make D31944 look better, so this is step 1 of N towards 
   improving the vector codegen in that patch without writing any actual new code.

Differential Revision: https://reviews.llvm.org/D32230

llvm-svn: 300725
2017-04-19 18:05:06 +00:00
Matt Arsenault 6cb7b8a42f AMDGPU: Don't align callable functions to 256
llvm-svn: 300720
2017-04-19 17:42:39 +00:00
Matt Arsenault 4c1ecded63 AMDGPU: Change DivergenceAnalysis for function arguments
Stop assuming all functions are kernels.

llvm-svn: 300719
2017-04-19 17:42:34 +00:00
Sanjay Patel a3c297dba4 [InstSimplify] fold identity shuffles (recursing if needed)
This patch simplifies the examples from D31509 and D31927 (PR30630) and catches 
the basic identity shuffle tests that Zvi recently added.

I'm not sure if we have something like this in DAGCombiner, but we should?

It's worth noting that "MaxRecurse / RecursionLimit" is only 3 on entry at the moment. 
We might want to bump that up if there are longer shuffle chains like this in the wild.

For now, we're ignoring shuffles that have undef mask elements because it's not
clear how those should be handled.

Differential Revision: https://reviews.llvm.org/D31960

llvm-svn: 300714
2017-04-19 16:48:22 +00:00
Krzysztof Parzyszek 333b2bf2ed [Hexagon] Generate proper offset in opt-addr-mode
Also, make a few changes to allow using the pass in .mir testcases.
Among other things, change the abbreviation from opt-amode to amode-opt,
because otherwise lit would expand the "opt" part to the full path to
the opt binary.

llvm-svn: 300707
2017-04-19 15:15:51 +00:00
Sanjay Patel c9d36f181f [PowerPC] add test and auto-generate checks; NFC
llvm-svn: 300700
2017-04-19 14:58:09 +00:00
Sanjay Patel 5a2235bbd0 [ARM] add test and auto-generate checks; NFC
llvm-svn: 300698
2017-04-19 14:55:50 +00:00
Davide Italiano a9f047a594 [InstSimplify] Deduce correct type for vector GEP.
InstSimplify returned the wrong type when simplifying a vector GEP
and we ended up crashing when trying to replace all uses with the
new value. Fixes PR32697.

Differential Revision: https://reviews.llvm.org/D32180

llvm-svn: 300693
2017-04-19 14:23:42 +00:00
Dylan McKay da2d74642a [AVR] Remove the 'multibyte' asm test
It tests registers which are not actually used on AVR.

llvm-svn: 300684
2017-04-19 12:13:45 +00:00
Simon Pilgrim 5536ecd9f0 Regenerate test. NFCI.
llvm-svn: 300683
2017-04-19 12:06:40 +00:00
Dylan McKay 7838104382 [AVR] Fix the test suite
A bunch of tests failed because memory operations have been reordered.

I am unsure which commit changed this behaviour as the AVR build was
failing at that point with an unrelated error.

This commit just reoders some of the CHECK lines in some tests to suit
current llc output.

llvm-svn: 300682
2017-04-19 12:02:52 +00:00
Igor Breger 4fdf1e489c [GlobalIsel][X86] support G_TRUNC selection.
Summary:
[GlobalIsel][X86] support G_TRUNC selection.
Add regbank-select and legalizer tests. Currently legalization of trunc i64 on 32bit platform not supported.

Reviewers: ab, zvi, rovka

Reviewed By: zvi

Subscribers: dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D32115

llvm-svn: 300678
2017-04-19 11:34:59 +00:00
Simon Pilgrim 94d5dba8ae [X86] Add D32039/PR31357 tests to show current BSWAP codegen
llvm-svn: 300672
2017-04-19 11:06:22 +00:00
Simon Pilgrim b13ab63503 [X86][SSE] Add scheduling latency/throughput tests for (most) SSE2 instructions
llvm-svn: 300671
2017-04-19 10:52:09 +00:00
Renato Golin 742aed8683 Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments"
This reverts commit r300639, as it broke self-hosting on ARM. PR32709.

llvm-svn: 300668
2017-04-19 09:02:52 +00:00
Igor Breger 746b3c336f [GlobalISel][X86] Split select tests. NFC.
llvm-svn: 300666
2017-04-19 08:40:44 +00:00
Diana Picus 49472ff1cf [ARM] GlobalISel: Add support for G_MUL
Support G_MUL, very similar to G_ADD and G_SUB. The only difference is
in the instruction selector, where we have to select either MUL or MULv5
depending on the target.

llvm-svn: 300665
2017-04-19 07:29:46 +00:00
Kristof Beyls 0f36e68f62 [GlobalISel] Support vector-of-pointers in LLT
This fixes PR32471.

As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.

I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality.  Once we gain more experience on the use of LLT, we can
of course reconsider this direction.

Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.

llvm-svn: 300664
2017-04-19 07:23:57 +00:00
Chandler Carruth ae3386aa74 Revert r300657 due to crashes in stage2 of bootstraps:
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/2476/steps/build-stage2-LLVMgold.so/logs/stdio
http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/15036/steps/build_llvmclang/logs/stdio

I've updated the commit thread, reverting to get the bots back to green.

Original commit summary:
[JumpThread] We want to fold (not thread) when all predecessor go to single BB's successor.

llvm-svn: 300662
2017-04-19 06:23:20 +00:00
Xin Tong 636a332906 [JumpThread] We want to fold (not thread) when all predecessor go to single BB's successor. .
Summary: In case all predecessor go to a single successor of current BB. We want to fold (not thread).

Reviewers: efriedma, sanjoy

Reviewed By: sanjoy

Subscribers: dberlin, majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D30869

llvm-svn: 300657
2017-04-19 05:15:57 +00:00
Matthias Braun 661d3d4b00 ARMFrameLowering: Reserve emergency spill slot for large arguments
We need to reserve an emergency spill slot in cases with large argument
types that could overflow immediate offsets for FP relative address
calculations.

rdar://31317893

Differential Revision: https://reviews.llvm.org/D31643

llvm-svn: 300639
2017-04-19 01:16:07 +00:00
Dean Michael Berris 97923db891 [XRay][tools] Fix yaml matching to be more permissive
Account for a potentially empty function name.

Follow-up to D32153.

llvm-svn: 300631
2017-04-19 00:10:09 +00:00
Dean Michael Berris 918802bed4 [XRay][tools] Add option to llvm-xray extract to symbolize functions
Summary:
This allows us to, if the symbol names are available in the binary, be
able to provide the function name in the YAML output.

Reviewers: dblaikie, pelikan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32153

llvm-svn: 300624
2017-04-18 23:23:54 +00:00
Sanjay Patel ff981f9256 [x86] add tests for potential andn optimization; NFC
llvm-svn: 300617
2017-04-18 22:36:59 +00:00
Chih-Hung Hsieh 877923a87f [X86] Keep EXTRACT_VECTOR_ELT result type as f128 for Android x86_64.
Android x86_64 target uses f128 type and stores f128 values in %xmm* registers.
SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert result value
from f128 to i128.

Differential Revision: http://reviews.llvm.org/D32102

llvm-svn: 300583
2017-04-18 20:15:18 +00:00
Simon Pilgrim 9398649fea [X86][SSE] Add scheduling latency/throughput tests for (most) SSE1 instructions
llvm-svn: 300576
2017-04-18 19:04:40 +00:00
Easwaran Raman 76aba5f6d7 [SLP vectorizer] Allow phi node reordering in tryToVectorizeList.
In tryToVectorizeList, under a very limited circumstance (when entered
from tryToVectorizePair), the values may be reordered (swapped) and the
SLP tree is built with the new order. This extends that to the case when
starting from phis in vectorizeChainsInBlock when there are exactly two
phis. The textual order of phi nodes shouldn't really matter. Without
this change, the loop body in the accompnaying test case is fully vectorized
when we swap the orde of the phis but not with this order. While this
doesn't solve the phi-ordering problem in a general way (for more than 2
phis), this is simple fix that piggybacks on an existing mechanism and
is useful in cases like multiplying two complex numbers.

Differential revision: https://reviews.llvm.org/D32065

llvm-svn: 300574
2017-04-18 18:16:57 +00:00
Nirav Dave 855ef45602 [DAG] Improve store merge candidate pruning.
Remove non-consecutive stores from store merge candidate search as
they cannot be merged and will prevent us from finding subsequent
mergeable store cases.

Reviewers: jyknight, bogner, javed.absar, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32086

llvm-svn: 300561
2017-04-18 15:36:34 +00:00
Nirav Dave e50544cdf2 Add base-index-based store merge test
llvm-svn: 300559
2017-04-18 15:12:13 +00:00
Nikolai Bozhenov 95fc644148 Make globalaa-retained.ll test catching more cases.
Summary:
* Add checks for store. That is needed because GlobalsAA is called
  twice in the current pipeline with different sets of Function passes
  following it. However, the loads are eliminated using instcombine
  which happens everywhere. On the other hand, DeadStoreElimination is
  performed only once so by checking for store we'll be able to catch
  more cases when GlobalsAA is invalidated unintentionally.
* Add empty function above/below the test so that we don't depend on
  the relative order of instcombine/dead-store-elimination and the
  pass that invalidates the analysis (inside the same
  FunctionPassManager).

Reviewers: kristof.beyls

Reviewed By: kristof.beyls

Subscribers: llvm-commits, n.bozhenov

Differential Revision: https://reviews.llvm.org/D32015
Patch by Andrei Elovikov <andrei.elovikov@intel.com>

llvm-svn: 300553
2017-04-18 13:29:26 +00:00
Nirav Dave b97768499f Add store Merge test.
llvm-svn: 300551
2017-04-18 13:25:19 +00:00
Oliver Stannard 7ad2e8aae1 [ARM] Add hardware build attributes in assembler
In the assembler, we should emit build attributes based on the target
selected with command-line options. This matches the GNU assembler's
behaviour. We only do this for build attributes which describe the
hardware that is expected to be available, not the ones that describe
ABI compatibility.

This is done by moving some of the attribute emission code to
ARMTargetStreamer, so that it can be shared between the assembly and
code-generation code paths. Since the assembler only creates a
MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to
check raw features, and not use the convenience functions in
ARMSubtarget.

If different attributes are later specified using the .eabi_attribute
directive, then they will take precedence, as happens when the same
.eabi_attribute is specified twice.

This must be enabled by an option, because we don't want to do this when
parsing inline assembly. The attributes would match the ones emitted at
the start of the file, so wouldn't actually change the emitted object
file, but the extra directives would be added to every inline assembly
block when emitting assembly, which we'd like to avoid.

The majority of the changes in the build-attributes.ll test are just
re-ordering the directives, because the hardware attributes are now
emitted before the ABI ones. However, I did fix one bug which I spotted:
Tag_CPU_arch_profile was not being emitted for v6M.

Differential revision: https://reviews.llvm.org/D31812

llvm-svn: 300547
2017-04-18 12:52:35 +00:00
Diana Picus a3a0cccb2c [ARM] GlobalISel: Add support for G_SUB
Support G_SUB throughout the GlobalISel pipeline. It is exactly the same
as G_ADD, nothing fancy.

llvm-svn: 300546
2017-04-18 12:35:28 +00:00
Kristof Beyls a4e79cca77 Revert "[GlobalISel] Support vector-of-pointers in LLT"
This reverts r300535 and r300537.
The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
produces slightly different code between LLVM versions being built with different compilers.
E.g., dependent on the compiler LLVM is built with, either one of the following
can be produced:

remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement)
remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement)

Non-determinism like this is clearly a bad thing, so reverting this until
I can find and fix the root cause of the non-determinism.

llvm-svn: 300538
2017-04-18 09:26:36 +00:00
Diana Picus e2626bb7c2 [ARM] Check for correct HW div when lowering divmod
For subtargets that use the custom lowering for divmod, e.g. gnueabi,
we used to check if the subtarget has hardware divide and then lower to
a div-mul-sub sequence if true, or to a libcall if false.

However, judging by the usage of hasDivide vs hasDivideInARMMode, it
seems that hasDivide only refers to Thumb. For instance, in the
ARMTargetLowering constructor, the code that specifies whether to use
libcalls for (S|U)DIV looks like this:

bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide()
                                      : Subtarget->hasDivideInARMMode();

In the case of divmod for arm-gnueabi, using only hasDivide() to
determine what to do means that instead of lowering to __aeabi_idivmod
to get the remainder, we lower to div-mul-sub and then further lower the
div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but
not in Thumb, we generate a libcall instead of using it (this is not an
issue in practice since AFAICT none of the cores that we support have
hardware divide in ARM but not Thumb).

This patch fixes the code dealing with custom lowering to take into
account the mode (Thumb or ARM) when deciding whether or not hardware
division is available.

Differential Revision: https://reviews.llvm.org/D32005

llvm-svn: 300536
2017-04-18 08:32:27 +00:00
Kristof Beyls fb73eb0324 [GlobalISel] Support vector-of-pointers in LLT
This fixes PR32471.

As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.

I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.

Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.

llvm-svn: 300535
2017-04-18 08:12:45 +00:00
Adrian Prantl 6825fb64e9 PR32382: Fix emitting complex DWARF expressions.
The DWARF specification knows 3 kinds of non-empty simple location
descriptions:
1. Register location descriptions
  - describe a variable in a register
  - consist of only a DW_OP_reg
2. Memory location descriptions
  - describe the address of a variable
3. Implicit location descriptions
  - describe the value of a variable
  - end with DW_OP_stack_value & friends

The existing DwarfExpression code is pretty much ignorant of these
restrictions. This used to not matter because we only emitted very
short expressions that we happened to get right by accident.  This
patch makes DwarfExpression aware of the rules defined by the DWARF
standard and now chooses the right kind of location description for
each expression being emitted.

This would have been an NFC commit (for the existing testsuite) if not
for the way that clang describes captured block variables. Based on
how the previous code in LLVM emitted locations, DW_OP_deref
operations that should have come at the end of the expression are put
at its beginning. Fixing this means changing the semantics of
DIExpression, so this patch bumps the version number of DIExpression
and implements a bitcode upgrade.

There are two major changes in this patch:

I had to fix the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.

When lowering a DBG_VALUE, the decision of whether to emit a register
location description or a memory location description depends on the
MachineLocation — register machine locations may get promoted to
memory locations based on their DIExpression. (Future) optimization
passes that want to salvage implicit debug location for variables may
do so by appending a DW_OP_stack_value. For example:
  DBG_VALUE, [RBP-8]                        --> DW_OP_fbreg -8
  DBG_VALUE, RAX                            --> DW_OP_reg0 +0
  DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0

All testcases that were modified were regenerated from clang. I also
added source-based testcases for each of these to the debuginfo-tests
repository over the last week to make sure that no synchronized bugs
slip in. The debuginfo-tests compile from source and run the debugger.

https://bugs.llvm.org/show_bug.cgi?id=32382
<rdar://problem/31205000>

Differential Revision: https://reviews.llvm.org/D31439

llvm-svn: 300522
2017-04-18 01:21:53 +00:00
Dehao Chen 1ea8bd8109 Build SymbolMap in SampleProfileLoader to help matchin function names with suffix.
Summary: If there is suffix added in the function name (e.g. module hash added by thinLTO), we will not be able to find a match in profile as the suffix does not exist in profile. This patch build a map from function name to Function *. The map includes the entry for the stripped function name so that inlineHotFunctions can find the corresponding function to promote/inline.

Reviewers: davidxl, dnovillo, tejohnson

Reviewed By: davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D31952

llvm-svn: 300507
2017-04-17 22:23:05 +00:00
Haicheng Wu 68f82a31d3 Change the testcase tail-merge-after-mbp.ll to tail-merge-after-mbp.mir
Differential Revision: https://reviews.llvm.org/D32037

llvm-svn: 300506
2017-04-17 22:22:38 +00:00
Jacob Gravelle 0bb7541233 [WebAssembly] Fix WebAssemblyOptimizeReturned after r300367
Summary:
Refactoring changed paramHasAttr(1 + i) to paramHasAttr(0), fix that to
paramHasAttr(i).
Add more tests to WebAssemblyOptimizeReturned that catch that
regression.

Reviewers: dschuff

Subscribers: jfb, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D32136

llvm-svn: 300502
2017-04-17 21:40:28 +00:00
Davide Italiano cdc937d0fc [InstCombine] Matchers work with both ConstExpr and Instructions.
So, `cast<Instruction>` is not guaranteed to succeed. Change the
code so that we create a new constant and use it in the newly
created instruction, as it's done in other places in InstCombine.

OK'ed by Sanjay/Craig. Fixes PR32686.

llvm-svn: 300495
2017-04-17 20:49:50 +00:00
Sanjay Patel 9083dc9680 [InstSimplify] add/move tests for (icmp X, C1 & icmp X, C2); NFC
We simplify based on range intersection, but we're missing folds.

llvm-svn: 300493
2017-04-17 20:38:33 +00:00
Dehao Chen 6afb3167e9 Update the test to fix the buildbot failure introduced by r300486 (NFC)
llvm-svn: 300492
2017-04-17 20:35:32 +00:00
Dehao Chen ef700d550e Add GNU_discriminator support for inline callsites in llvm-symbolizer.
Summary: LLVM symbolize cannot recognize GNU_discriminator for inline callsites. This patch adds support for it.

Reviewers: dblaikie

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32134

llvm-svn: 300486
2017-04-17 20:10:39 +00:00
Matt Arsenault a3566f2149 AMDGPU: Use MachineRegisterInfo to find max used register
Avoid looping through program to determine register counts.
This avoids needing to look at regmask operands.

Also fixes some counting errors with flat_scr when there
are no stack objects.

llvm-svn: 300482
2017-04-17 19:48:30 +00:00
Brendon Cahoon 7769a0854e [CodeGenPrepare] Fix crash due to an invalid CFG
The splitIndirectCriticalEdges function generates and invalid CFG when the
'Target' basic block is a loop to itself. When this occurs, the code that
updates the predecessor terminator needs to update the terminator in the split
basic block.

This occurs when there is an edge from block D back to D. Since D is split in
to D0 and D1, the code needs to update the terminator in D1. But D1 is not in
the OtherPreds vector, so it was not getting updated.

Differential Revision: https://reviews.llvm.org/D32126

llvm-svn: 300480
2017-04-17 19:11:04 +00:00
Tim Northover 46e36f0953 AArch64: put nonlazybind special handling behind a flag for now.
It's basically a terrible idea anyway but objc_msgSend gets emitted like that.
We can decide on a better way to deal with it in the unlikely event that anyone
actually uses it.

llvm-svn: 300474
2017-04-17 18:18:47 +00:00
Konstantin Zhuravlyov dfe2ffe8c5 AMDGPU: Test handling of R_AMDGPU_ABS64 in RelocVisitor
llvm-svn: 300472
2017-04-17 18:12:45 +00:00
Konstantin Zhuravlyov 12096848fd AMDGPU: Set CodePointerSize to 8 for amdgcn
llvm-svn: 300470
2017-04-17 18:02:09 +00:00
Peter Collingbourne a0f371a106 Bitcode: Add a string table to the bitcode format.
Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.

This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.

On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html

Differential Revision: https://reviews.llvm.org/D31838

llvm-svn: 300464
2017-04-17 17:51:36 +00:00
Tim Northover 879a0b2e1b AArch64: support nonlazybind
It's almost certainly not a good idea to actually use it in most cases (there's
a pretty large code size overhead on AArch64), but we can't do those
experiments until it's supported.

llvm-svn: 300462
2017-04-17 17:27:56 +00:00
Matt Arsenault 7205f3c2e4 AMDGPU: SimplifyDemandedElts for image intrinsics
Causes some VGPR usage improvements in shaderdb, but
introduces some SGPR spilling regressions due to random
scheduling changes later.

llvm-svn: 300453
2017-04-17 15:12:44 +00:00
Max Kazantsev 751579cac0 [LoopPeeling] Get rid of Phis that become invariant after N steps
This patch is a generalization of the improvement introduced in rL296898.
Previously, we were able to peel one iteration of a loop to get rid of a Phi that becomes
an invariant on the 2nd iteration. In more general case, if a Phi becomes invariant after
N iterations, we can peel N times and turn it into invariant.
In order to do this, we for every Phi in loop's header we define the Invariant Depth value
which is calculated as follows:

Given %x = phi <Inputs from above the loop>, ..., [%y, %back.edge].

If %y is a loop invariant, then Depth(%x) = 1.
If %y is a Phi from the loop header, Depth(%x) = Depth(%y) + 1.
Otherwise, Depth(%x) is infinite.
Notice that if we peel a loop, all Phis with Depth = 1 become invariants,
and all other Phis with finite depth decrease the depth by 1.
Thus, peeling N first iterations allows us to turn all Phis with Depth <= N
into invariants.

Reviewers: reames, apilipenko, mkuper, skatkov, anna, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31613

llvm-svn: 300446
2017-04-17 09:52:02 +00:00
Max Kazantsev 8ed6b66d85 [LoopPeeling] Fix condition for phi-eliminating peeling
When peeling loops basing on phis becoming invariants, we make a wrong loop size check.
UP.Threshold should be compared against the total numbers of instructions after the transformation,
which is equal to 2 * LoopSize in case of peeling one iteration.
We should also check that the maximum allowed number of peeled iterations is not zero.

Reviewers: sanjoy, anna, reames, mkuper

Reviewed By: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31753

llvm-svn: 300441
2017-04-17 05:38:28 +00:00
Serguei Katkov 2616bbb16d [BPI] Use metadata info before any other heuristics
Metadata potentially is more precise than any heuristics we use, so
it makes sense to use first metadata info if it is available. However it makes
sense to examine it against other strong heuristics like unreachable one.
If edge coming to unreachable block has higher probability then it is expected 
by unreachable heuristic then we use heuristic and remaining probability is
distributed among other reachable blocks equally.

An example where metadata might be more strong then unreachable heuristic is
as follows: it is possible that there are two branches and for the branch A
metadata says that its probability is (0, 2^25). For the branch B
the probability is (1, 2^25).
So the expectation is that first edge of B is hotter than first edge of A
because first edge of A did not executed at least once.
If first edge of A points to the unreachable block then using the unreachable
heuristics we'll set the probability for A to (1, 2^20) and now edge of A
becomes hotter than edge of B.
This is unexpected behavior.

This fixed the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214

Reviewers: sanjoy, junbuml, vsk, chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits, reames, davidxl

Differential Revision: https://reviews.llvm.org/D30631

llvm-svn: 300440
2017-04-17 04:33:04 +00:00
Craig Topper 218a359fbd [InstCombine] Simplify 1/X for vectors.
llvm-svn: 300439
2017-04-17 03:41:47 +00:00
Craig Topper eee53c030a [InstCombine] Add test cases for missing support for simplifying 1/X for vectors. NFC
llvm-svn: 300438
2017-04-17 03:41:44 +00:00
Craig Topper 1a18a7c51e [InstCombine] Add support for vector srem->urem.
llvm-svn: 300437
2017-04-17 01:51:24 +00:00
Craig Topper b60f300afb [InstCombine] Add missing testcases for srem->urem conversion. The vector version isn't currently supported. NFC
llvm-svn: 300436
2017-04-17 01:51:21 +00:00
Craig Topper f248468359 [InstCombine] Add support for turning vector sdiv into udiv.
llvm-svn: 300435
2017-04-17 01:51:19 +00:00
Craig Topper 43b012b1b3 [InstCombine] Add test cases for missing support for turning vector sdiv into udiv. NFC
llvm-svn: 300434
2017-04-17 01:51:16 +00:00
Benjamin Kramer f5f593b674 [X86] Remove special handling for 16 bit for A asm constraints.
Our 16 bit support is assembler-only + the terrible hack that is
.code16gcc. Simply using 32 bit registers does the right thing for the
latter.

Fixes PR32681.

llvm-svn: 300429
2017-04-16 20:13:08 +00:00
Michael Zuckerman 16b20d2fc5 [X86][X86 intrinsics]Folding cmp(sub(a,b),0) into cmp(a,b) optimization
This patch adds new optimization (Folding cmp(sub(a,b),0) into cmp(a,b))
to instCombineCall pass and was written specific for X86 CMP intrinsics.

Differential Revision: https://reviews.llvm.org/D31398

llvm-svn: 300422
2017-04-16 13:26:08 +00:00
Dimitry Andric 909b3376ba Use correct registers for "A" inline asm constraint
Summary:
In PR32594, inline assembly using the 'A' constraint on x86_64 causes
llvm to crash with a "Cannot select" stack trace.  This is because
`X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A'
means the EAX and EDX registers.

However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86
(ia16?) it means the old AX and DX registers.

Add new register classes in `X86RegisterInfo.td` to support these cases,
and amend the logic in `getRegForInlineAsmConstraint` to cope with
different subtargets.  Also add a test case, derived from PR32594.

Reviewers: craig.topper, qcolombet, RKSimon, ab

Reviewed By: ab

Subscribers: ab, emaste, royger, llvm-commits

Differential Revision: https://reviews.llvm.org/D31902

llvm-svn: 300404
2017-04-15 22:15:01 +00:00
Sanjay Patel ef9f586bb2 [InstCombine] allow (X != C1 && X != C2) and similar patterns to match splat vector constants
llvm-svn: 300402
2017-04-15 17:55:06 +00:00
Sanjay Patel c8405b82a1 [InstCombine] add tests to show missing transforms for vectors; NFC
llvm-svn: 300401
2017-04-15 17:50:45 +00:00
Sam Clegg 135a4b8ea1 [WebAssembly] Improve readobj and nm support for wasm
Now that the libObect support for wasm is better we can
have readobj and nm produce more useful output too.

Differential Revision: https://reviews.llvm.org/D31514

llvm-svn: 300365
2017-04-14 19:50:44 +00:00
Sanjay Patel 7cfe41659c [InstCombine] (X != C1 && X != C2) --> (X | (C1 ^ C2)) != C2
...when C1 differs from C2 by one bit and C1 <u C2:
http://rise4fun.com/Alive/Vuo

And move related folds to a helper function. This reduces code duplication and
will make it easier to remove the scalar-only restriction as a follow-up step.

llvm-svn: 300364
2017-04-14 19:23:50 +00:00
Craig Topper fb71b7d3e0 [InstCombine] Support folding a subtract with a constant LHS into a phi node
We currently only support folding a subtract into a select but not a PHI. This fixes that.

I had to fix an assumption in FoldOpIntoPhi that assumed the PHI node was always in operand 0. Now we pass it in like we do for FoldOpIntoSelect. But we still require some dancing to find the Constant when we create the BinOp or ConstantExpr. This is based code is similar to what we do for selects.

Since I touched all call sites, this also renames FoldOpIntoPhi to foldOpIntoPhi to match coding standards.

Differential Revision: https://reviews.llvm.org/D31686

llvm-svn: 300363
2017-04-14 19:20:12 +00:00
Stanislav Mekhanoshin eff0bc7839 [AMDGPU] set read_only access qualifier for pointers
If a kernel's pointer argument is known to be readonly
set access qualifier accordingly. This allows RT not to
flush caches before dispatches.

Differential Revision: https://reviews.llvm.org/D32091

llvm-svn: 300362
2017-04-14 19:11:40 +00:00
Sam Clegg dd2d7bf100 [Test commit] Cleanup some whitespace in a test file
llvm-svn: 300361
2017-04-14 18:43:57 +00:00
Craig Topper d61ccd735e [InstCombine] Regenerate test checks using script. NFC
llvm-svn: 300360
2017-04-14 18:42:55 +00:00
Sanjay Patel 9d39a9d860 [InstCombine] add/move tests for and/or-of-icmps equality folds; NFC
llvm-svn: 300357
2017-04-14 18:19:27 +00:00
Alexey Bataev 9c27d79520 Update tests for the patch.
llvm-svn: 300351
2017-04-14 17:47:07 +00:00
Simon Pilgrim 5a22eaa2bf [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (LLVM)
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.

Clang companion patch: D31766.

Differential Revision: https://reviews.llvm.org/D31767

llvm-svn: 300325
2017-04-14 15:05:35 +00:00
Dmitry Preobrazhensky e6ef099dcd [AMDGPU][MC] Corrected ds_write_src2_* to require one offset instead of two.
Fixed bug 32551: https://bugs.llvm.org//show_bug.cgi?id=32551

Reviewers: vpykhtin

Differential Revision: https://reviews.llvm.org/D31809

llvm-svn: 300319
2017-04-14 12:28:07 +00:00
Dmitry Preobrazhensky 5714860ee4 [AMDGPU][MC] Enabled constants for src operands of s_cbranch_g_fork
Fixed bug 32619: https://bugs.llvm.org//show_bug.cgi?id=32619

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D31973

llvm-svn: 300318
2017-04-14 11:52:26 +00:00
Andrew V. Tischenko 4e7bcd5216 Fix for PR#30562: Selection DAG error: Detected cycle in SelectionDAG.
Patch by Dinar Temirbulatov

llvm-svn: 300314
2017-04-14 09:17:09 +00:00
Andrew V. Tischenko 75745d0c3e This patch closes PR#32216: Better testing of schedule model instruction latencies/throughputs.
The details are here: https://reviews.llvm.org/D30941

llvm-svn: 300311
2017-04-14 07:44:23 +00:00