Commit Graph

152727 Commits

Author SHA1 Message Date
Matt Arsenault 9d288e69f5 AMDGPU: Remove redundant opt level check
addOptimizedRegAlloc isn't used for -O0 already.

llvm-svn: 310275
2017-08-07 18:12:48 +00:00
Matt Arsenault 3db456820d AMDGPU: Remove FixControlFlowLiveIntervals pass
This hasn't done anything in a long time. This was
running after the the control flow pseudos were expanded,
so this would never find them. The control flow pseudo
expansion was moved to solve the problem this pass was
supposed to solve in the first place, except handling
it earlier also fixes it for fast regalloc which doesn't
use LiveIntervals.

Noticed by checking LCOV reports.

llvm-svn: 310274
2017-08-07 18:12:47 +00:00
Craig Topper 7091a743b4 [InstCombine] Support (X | C1) & C2 --> (X & C2^(C1&C2)) | (C1&C2) for vector splats
Note the original code I deleted incorrectly listed this as (X | C1) & C2 --> (X & C2^(C1&C2)) | C1 Which is only valid if C1 is a subset of C2. This relied on SimplifyDemandedBits to remove any extra bits from C1 before we got to that code.

My new implementation avoids relying on that behavior so that it can be naively verified with alive.

Differential Revision: https://reviews.llvm.org/D36384

llvm-svn: 310272
2017-08-07 18:10:39 +00:00
Matt Arsenault aac47c1c00 AMDGPU: Use a custom areInlineCompatible
Fixes not inlining OpenCL library functions on AMDGPU,
which don't have an explicitly set target-cpu.

llvm-svn: 310269
2017-08-07 17:08:44 +00:00
Simon Pilgrim 0242cead2c [X86][AVX] Add full test coverage of subvector_broadcasts from registers
X86SubVBroadcast is for memory subvector broadcasts, but we must test that it handles all cases without the load as well just in case.

This was noticed while I was triaging the test cases from PR34041.

llvm-svn: 310268
2017-08-07 16:49:09 +00:00
Simon Dardis b1b52c0200 [DebugInfo][DWARF] Address paulr's comment on rL310253.
llvm-svn: 310267
2017-08-07 16:08:11 +00:00
Simon Pilgrim 8b9c4f2855 [X86][AVX] Cleanup subvector broadcast tests - remove old prefixes.
llvm-svn: 310265
2017-08-07 15:50:43 +00:00
Sanjay Patel 807f92b8ff [x86] revert r310208 to investigate test-suite failures (PR34105 / PR34097)
llvm-svn: 310264
2017-08-07 15:47:48 +00:00
Simon Dardis 02d9945e6f [DebugInfo][DWARF] Correct some usages of PRIx32 to PRIx64
These lead to tests failing spuriously as the values after being rendered to a
string were incorrect.

Reviewers: clayborg

Differential Revision: https://reviews.llvm.org/D36319

llvm-svn: 310262
2017-08-07 15:37:57 +00:00
Alexey Bataev 9581b42589 [SLP] General improvements of SLP vectorization process.
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:

1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the
array. This array is processed only after the vectorization of the
first-after-these instructions key node is finished. Vectorization goes
in reverse order to try to vectorize as much code as possible.

Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon

Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits

Differential Revision: https://reviews.llvm.org/D29826

llvm-svn: 310260
2017-08-07 15:25:49 +00:00
Matt Arsenault faeac6b15e Fix typo in comment
llvm-svn: 310259
2017-08-07 14:58:43 +00:00
Matt Arsenault 8728c5f2db AMDGPU: Cleanup subtarget features
Try to avoid mutually exclusive features. Don't use
a real default GPU, and use a fake "generic". The goal
is to make it easier to see which set of features are
incompatible between feature strings.

Most of the test changes are due to random scheduling changes
from not having a default fullspeed model.

llvm-svn: 310258
2017-08-07 14:58:04 +00:00
Alexey Bataev 53d523c9eb Revert "[SLP] General improvements of SLP vectorization process."
This reverts commit r310255.

llvm-svn: 310257
2017-08-07 14:51:52 +00:00
Nirav Dave 3d3bde7682 [DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector.
Relanding after case to insert explicit truncation as necessary.

Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to
EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally
improves vector shuffle computations.

Reviewers: efriedma, RKSimon, spatel

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35566

llvm-svn: 310256
2017-08-07 14:07:49 +00:00
Alexey Bataev faace8f1f1 [SLP] General improvements of SLP vectorization process.
Summary:
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:
1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible.

Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon

Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits

Differential Revision: https://reviews.llvm.org/D29826

llvm-svn: 310255
2017-08-07 14:03:17 +00:00
Nirav Dave b2f3fad40c [TableGen] AsmMatcher: fix OpIdx computation when HasOptionalOperands is true
Relanding after fixing UB issue with DefaultOffsets.

Consider the following instruction: "inst.eq $dst, $src" where ".eq"
is an optional flag operand.  The $src and $dst operands are
registers.  If we parse the instruction "inst r0, r1", the flag is not
present and it will be marked in the "OptionalOperandsMask" variable.
After the matching is complete we call the "convertToMCInst" method.

The current implementation works only if the optional operands are at
the end of the array.  The "Operands" array looks like [token:"inst",
reg:r0, reg:r1].  The first operand that must be added to the MCInst
is the destination, the r0 register.  The "OpIdx" (in the Operands
array) for this register is 2.  However, since the flag is not present
in the Operands, the actual index for r0 should be 1.  The flag is not
present since we rely on the default value.

This patch removes the "NumDefaults" variable and replaces it with an
array (DefaultsOffset).  This array contains an index for each operand
(excluding the mnemonic).  At each index, the array contains the
number of optional operands that should be subtracted.  For the
previous example, this array looks like this: [0, 1, 1].  When we need
to access the r0 register, we compute its index as 2 -
DefaultsOffset[1] = 1.

Patch by Alexandru Guduleasa!

Reviewers: SamWot, nhaustov, niravd

Reviewed By: niravd

Subscribers: vitalybuka, llvm-commits

Differential Revision: https://reviews.llvm.org/D35998

llvm-svn: 310254
2017-08-07 13:55:27 +00:00
Simon Dardis ec4ea99766 [DebugInfo][DWARF] Use PRIx64 explicitly in output.
llvm-svn: 310253
2017-08-07 13:30:03 +00:00
Michael Zuckerman 680ac10aa7 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4).
This patch expands the support of lowerInterleavedStore to 16x8i stride 4.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=16) and we plan to include more patterns in the future.

The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 16 chars:

c0, c1, , c16
m0, m1, , m16
y0, y1, , y16
k0, k1, ., k16

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....

Differential Revision: https://reviews.llvm.org/D35829

llvm-svn: 310252
2017-08-07 13:22:39 +00:00
Dmitry Preobrazhensky 50805a0b83 [AMDGPU][MC] Corrected VOP3 version of v_interp_* instructions for VI
See bug 32621: https://bugs.llvm.org//show_bug.cgi?id=32621

Reviewers: vpykhtin, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D35902

llvm-svn: 310251
2017-08-07 13:14:12 +00:00
Simon Dardis 3886705689 [llvm-objdump] Use PRIx64 for output of ARM64_RELOC_ADDEND
llvm-svn: 310250
2017-08-07 12:29:38 +00:00
Simon Pilgrim e2bb83a4ee [X86][AVX] Added test for broadcast shuffle with undefs (PR34041)
llvm-svn: 310249
2017-08-07 12:24:33 +00:00
Andre Vieira 7dffb9bfa6 [ARM] Fix assembly and disassembly for VMRS/VMSR
This patch addresses two issues with assembly and disassembly for VMRS/VMSR:

1.currently VMRS/VMSR instructions accessing fpsid, mvfr{0-2} and fpexc, are
  accepted for non ARMv8-A targets.

2. all VMRS/VMSR instructions accept writing/reading to PC and SP, when only
   ARMv7-A and ARMv8-A should be allowed to write/read to SP and none to PC.

This patch addresses those issues and adds tests for these cases.

Differential Revision: https://reviews.llvm.org/D36306

llvm-svn: 310243
2017-08-07 08:41:05 +00:00
Vitaly Buka 5d432ec929 [asan] Fix asan dynamic shadow check before copyArgsPassedByValToAllocas
llvm-svn: 310242
2017-08-07 07:35:33 +00:00
Vitaly Buka 629047de8e [asan] Disable checking of arguments passed by value for --asan-force-dynamic-shadow
Fails with "Instruction does not dominate all uses!"

llvm-svn: 310241
2017-08-07 07:12:34 +00:00
Vitaly Buka f20b9bc218 Add -asan-force-dynamic-shadow test
llvm-svn: 310240
2017-08-07 07:12:33 +00:00
Guy Blank 5ca01695f7 [SelectionDAG] reset NewNodesMustHaveLegalTypes flag between basic blocks
The NewNodesMustHaveLegalTypes flag is set to false at the beginning of CodeGenAndEmitDAG, and set to true after legalizing types.
But before calling CodeGenAndEmitDAG we build the DAG for the basic block.
So for the first basic block NewNodesMustHaveLegalTypes would be 'false' during the SDAG building, and for all other basic blocks it would be 'true'.

This patch sets the flag to false before SDAG building each basic block.

Differential Revision:
https://reviews.llvm.org/D33435

llvm-svn: 310239
2017-08-07 05:51:14 +00:00
Davide Italiano b53b075bb1 [Reassociate] Use a range loop for clarity. NFCI.
While here, rename `i` to `Rank` as the latter is more
self-explanatory (and this code also uses `I` two lines below to
identify an Instruction).

llvm-svn: 310238
2017-08-07 01:57:21 +00:00
Davide Italiano a5cdc22e70 [Reassociate] Try to bail out early when canonicalizing.
This commit rearranges the checks to avoid calls to getRank()
when not needed (e.g. when RHS == LHS).

llvm-svn: 310237
2017-08-07 01:49:09 +00:00
Craig Topper 576fb91aef [InstCombine] Remove shift handling from OptAndOp.
Summary: This is all handled by SimplifyDemandedBits.

Reviewers: spatel, davide

Reviewed By: davide

Subscribers: davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D36382

llvm-svn: 310234
2017-08-06 23:30:49 +00:00
Craig Topper a1693a2ed3 [InstCombine] Support (X ^ C1) & C2 --> (X & C2) ^ (C1&C2) for vector splats.
llvm-svn: 310233
2017-08-06 23:11:49 +00:00
Craig Topper 9cbdbefd0f [InstCombine] Support '(C - X) ^ signmask -> (C + signmask - X)' and '(X + C) ^ signmask -> (X + C + signmask)' for vector splats.
llvm-svn: 310232
2017-08-06 22:17:21 +00:00
Simon Pilgrim 7c53c0f586 [SLPVectorizer][X86] Cleanup test case. NFCI
Remove excess attributes/metadata

llvm-svn: 310227
2017-08-06 20:50:19 +00:00
Martin Storsjo c9263f4e49 [llvm-dlltool] Map the "arm64" machine type
Differential Revision: https://reviews.llvm.org/D36365

llvm-svn: 310223
2017-08-06 19:58:13 +00:00
Matt Arsenault a7eb14afc7 AMDGPU: Fix typo in feature description
llvm-svn: 310217
2017-08-06 18:13:23 +00:00
Sanjay Patel a923c2ee95 [x86] use more shift or LEA for select-of-constants
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d

For this patch, I'm enhancing an existing x86 transform that uses fake multiplies 
(they always become shl/lea) to avoid cmov or branching. The current code misses 
cases where we have a negative constant and a positive constant, so this is just 
trying to plug that hole.

The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start 
with a select in IR, create a select DAG node, convert it into a sext, convert it 
back into a select, and then lower it to sext machine code.

Some notes about the test diffs:

1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. I 
   think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on 
   a different reg? That's a post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if 
   that's a regression, but I think those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs.
5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.

Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.

Differential Revision: https://reviews.llvm.org/D35340

llvm-svn: 310208
2017-08-06 16:27:07 +00:00
Simon Pilgrim 17e290f1d3 [X86] Add comment to match closing Defs = [FPSW]. NFCI.
llvm-svn: 310202
2017-08-06 13:21:09 +00:00
Simon Pilgrim aeb2b1e36d [X86][X87] Regenerate inline-asm tests
llvm-svn: 310201
2017-08-06 12:17:10 +00:00
Meador Inge 70ab7cc55c [AVR] Compute code model if one is not provided
The patch from r310028 fixed things to work with the new
`LLVMTargetMachine` constructor that came in on r309911.
However, the fix was partial since an object of type
`CodeModel::Model` must be passed to `LLVMTargetMachine`
(not one of `Optional<CodeModel::Model>`).

This patch fixes the problem in the same fashion that r309911
did for other machines: by checking if the passed optional
code model has a value and using `CodeModel::Small` if not.

llvm-svn: 310200
2017-08-06 12:02:17 +00:00
Simon Pilgrim 5489781cc7 [X86][X87] Add test case for PR34080
Test with/without the sandybridge (default) model for SSE2, SSE3 and AVX targets.

pre-SSE3 the issue is the order of the fpsw and fpcw load/stores (with SSE3 trunc-store FIST instructions avoid the sw/cw manipulations).

llvm-svn: 310198
2017-08-06 11:22:33 +00:00
Craig Topper b5bf016015 [InstCombine] Support ~(c-X) --> X+(-c-1) and ~(X-c) --> (-c-1)-X for splat vectors.
llvm-svn: 310195
2017-08-06 06:28:41 +00:00
Craig Topper 6bfa2aee78 [X86] Enable isel to use the PAUSE instruction even when SSE2 is disabled
Summary:
On older processors this instruction encoding is treated as a NOP.

MSVC doesn't disable intrinsics based on features the way clang/gcc does. Because the PAUSE instruction encoding doesn't crash older processors, some software out there uses these intrinsics without checking for SSE2.

This change also seems to also be consistent with gcc behavior.

Fixes PR34079

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36361

llvm-svn: 310190
2017-08-05 23:34:44 +00:00
Chandler Carruth 6b78bac9fb [ADT] Add a much simpler loop to DenseMap::clear when the types are
POD-like and we can just splat the empty key across memory.

Sadly we can't optimize the normal loop well enough because we can't
turn the conditional store into an unconditional store according to the
memory model.

This loop actually showed up in a profile of code that was calling clear
as a serious source of time. =[

llvm-svn: 310189
2017-08-05 22:48:37 +00:00
Craig Topper 7e84697e59 [InstCombine] Regenerate test28_sub test case in xor.ll that I forgot to commit after fixing a typo in r310186.
llvm-svn: 310188
2017-08-05 22:44:38 +00:00
Craig Topper 9ffda5ab86 [InstCombine] Fold (C - X) ^ signmask -> (C + signmask - X).
llvm-svn: 310186
2017-08-05 20:00:44 +00:00
Craig Topper 65dd32afbc [InstCombine] Teach the code that pulls logical operators through constant shifts to handle vector splats too.
llvm-svn: 310185
2017-08-05 20:00:42 +00:00
Craig Topper 1bbcab9ca5 [InstCombine] Support vector splats in foldSelectICmpAnd.
Unfortunately, it looks like there's some other missed optimizations in the generated code for some of these cases. I'll try to look at some of those next.

llvm-svn: 310184
2017-08-05 20:00:41 +00:00
Dinar Temirbulatov cc2294a4eb [SLPVectorizer] Add extra parameter to setInsertPointAfterBundle to handle different opcodes, NFCI.
Differential Revision: https://reviews.llvm.org/D35769

llvm-svn: 310183
2017-08-05 18:43:52 +00:00
Sanjay Patel 94da1de1ce [InstCombine] refactor trunc(binop) transforms; NFCI
In addition to moving the shift transforms over, we may want to
detect too-wide rotate patterns here (PR34046). 

llvm-svn: 310181
2017-08-05 15:19:18 +00:00
Florian Hahn d51a35e339 [ARM] The ARM backend is MachineVerifier clean now.
Summary: Thanks everyone involved in fixing the outstanding issues.

Reviewers: rovka, MatzeB, efriedma

Reviewed By: MatzeB

Subscribers: aemerson, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D36153

llvm-svn: 310180
2017-08-05 15:14:06 +00:00
Florian Hahn 414d00139b [ARM] Add registers to debuginfo MIR test cases.
Summary:
MIRParserImpl::computeFunctionProperties uses MRI.getNumVirtRegs() to
set the NoVReg property. By adding a bunch of registers to the MIR test
cases, the NoVReg property is not set when importing the MIR. Otherwise
NoVReg is set after instruction selection while the machine instructions
still contain virtual registers, causing expensive checks to fail.

Reviewers: efriedma, MatzeB, aprantl

Reviewed By: MatzeB, aprantl

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D36152

llvm-svn: 310178
2017-08-05 12:13:13 +00:00