Commit Graph

23380 Commits

Author SHA1 Message Date
Craig Topper b8d7b1620b [X86] Custom legalize (v2i1 (fp_to_uint/fp_to_sint v2f64)) without AVX512VL.
Strangely the code was already present, just the setOperationAction wasn't being called without VLX.

llvm-svn: 324806
2018-02-10 08:39:31 +00:00
Craig Topper c3aab4bbe1 [X86] Legalize zero extends from vXi1 to vXi16/vXi32/vXi64 using a sign extend and a shift.
This avoids a constant pool load to create 1.

The int->float are showing converts to mask and back. We probably need to widen inputs to sint_to_fp/uint_to_fp before type legalization.

llvm-svn: 324805
2018-02-10 08:06:52 +00:00
Craig Topper d34af6f636 [X86] Teach combineExtSetcc to handle ZERO_EXTEND by widening the setcc and then masking. A later DAG combine will convert to a shift.
This helps to avoid a constant pool load needed to zero extend from the mask.

llvm-svn: 324804
2018-02-10 08:06:49 +00:00
Craig Topper fa6113b3d7 [X86] Teach combineInsertSubvector how to combine some k-register insert_subvectors and extract_subvector sequences to remove extra zeroing.wq
llvm-svn: 324791
2018-02-10 01:00:41 +00:00
Craig Topper 99db883d55 [X86] Teach lower1BitVectorShuffle to recognize shuffles that are just filling upper elements with zero. Replace with insert_subvector.
There's still some extra kshifts in one of the modified test cases here, but hopefully that's only a DAG combine away.

llvm-svn: 324782
2018-02-09 23:32:27 +00:00
Dan Gohman db1916a646 [WebAssembly] Add mechanisms for specifying an explicit import module name.
This adds a wasm-import-module function attribute and a .import_module
assembler directive, for specifying module import names for WebAssembly.
Currently these may only be used for function symbols; global variables
may be considered in the future.

WebAssembly has a two-level namespace scheme for symbols, and it's
normally the linker's job to assign the module name, which is the
first-level name. The attributes here allow users to specify their
own module names explicitly, which is useful for tools generating
bindings to modules defined in other languages.

This feature is not fully usable yet. It will evolve along with the
ongoing symbol table and lld changes.

Differential Revision: https://reviews.llvm.org/D42520

llvm-svn: 324778
2018-02-09 23:13:22 +00:00
Krzysztof Parzyszek 9b48e8d233 [Hexagon] Add code to select QTRUE and QFALSE
Fixes http://llvm.org/PR36320.

llvm-svn: 324763
2018-02-09 19:10:46 +00:00
Sanjay Patel 4031ce15b8 [x86] remove duplicate undef tests; NFC
These are incomplete and were made redundant with the consolidation in:
https://reviews.llvm.org/rL324678

llvm-svn: 324754
2018-02-09 17:46:38 +00:00
Rafael Espindola c052fa0bd3 Emit smaller exception tables for non-SJLJ mode.
* Use uleb128 for code offsets in the LSDA call site table.
* Omit the TTBase offset if the type table is empty.

This change can reduce the size of the DWARF/Itanium LSDA by about half.

Patch by Ryan Prichard!

llvm-svn: 324750
2018-02-09 17:13:37 +00:00
Rafael Espindola d09b416943 Use assembler expressions to lay out the EH LSDA.
Rely on the assembler to finalize the layout of the DWARF/Itanium
exception-handling LSDA. Rather than calculate the exact size of each
thing in the LSDA, use assembler directives:

    To emit the offset to the TTBase label:

.uleb128 .Lttbase0-.Lttbaseref0
.Lttbaseref0:

    To emit the size of the call site table:

.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
... call site table entries ...
.Lcst_end0:

    To align the type info table:

... action table ...
.balign 4
.long _ZTIi
.long _ZTIl
.Lttbase0:

Using assembler directives simplifies the compiler and allows switching
the encoding of offsets in the call site table from udata4 to uleb128 for
a large code size savings. (This commit does not change the encoding.)

The combination of the uleb128 followed by a balign creates an unfortunate
dependency cycle that the assembler must sometimes resolve either by
padding an LEB or by inserting zero padding before the type table. See
PR35809 or GNU as bug 4029.

Patch by Ryan Prichard!

llvm-svn: 324749
2018-02-09 17:00:25 +00:00
Matt Arsenault 923712b6b5 Reapply "AMDGPU: Add 32-bit constant address space"
This reverts r324494 and reapplies r324487.

llvm-svn: 324747
2018-02-09 16:57:57 +00:00
Krzysztof Parzyszek 7cfe7cbccc [Hexagon] Express calling conventions via .td file instead of hand-coding
Additionally, simplify the rest of the argument/parameter lowering code.

llvm-svn: 324737
2018-02-09 15:30:02 +00:00
Stefan Maksimovic 991af7a558 [DebugInfo] Don't insert DEBUG_VALUE after terminators
r314974 introduced insertion of DEBUG_VALUEs after
each redefinition of debug value register in the slot index range.

In case the instruction redefining the debug value register
was a terminator, machine verifier would complain since it
enforces the rule of no non-terminator instructions
following the first terminator.

Differential Revision: https://reviews.llvm.org/D42801

llvm-svn: 324734
2018-02-09 14:03:26 +00:00
Stefan Maksimovic dc66ae78c6 [SelectionDAG] Provide adequate register class for RegisterSDNode
When adding operands to machine instructions in case of
RegisterSDNodes, generate a COPY node in case the register class
does not match the one in the instruction definition.

Differental Revision: https://reviews.llvm.org/D35561

llvm-svn: 324733
2018-02-09 13:55:25 +00:00
Simon Dardis 9ab7f42adc [mips] UnXFAIL gprestore.ll test.
Repurpose this previously XFAIL'd test to check that jalr uses $25
as per ABI requirements for PIC code.

llvm-svn: 324729
2018-02-09 10:46:16 +00:00
Jonas Paulsson 7850601fa3 [AArch64] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Martin Storsjö
llvm-svn: 324720
2018-02-09 09:22:20 +00:00
Craig Topper ca5841b4e4 [X86] Simplify some code in lowerV4X128VectorShuffle and lowerV2X128VectorShuffle
Previously we extracted two subvectors and concatenate. But the concatenate will be lowered to two insert subvectors. Then DAG combine will merge once of the inserts and one of the extracts back into the original vector. We might as well just directly use one extract and one insert.

llvm-svn: 324710
2018-02-09 05:54:36 +00:00
Craig Topper 28166a877d [X86] Teach shuffle lowering to recognize 128/256 bit insertions into a zero vector.
This regresses a couple cases in the shuffle combining test. But those cases use intrinsics that InstCombine knows how to turn into a generic shuffle earlier. This should give opportunities to fold this earlier in InstCombine or DAG combine.

llvm-svn: 324709
2018-02-09 05:54:34 +00:00
Craig Topper 090e41d0cc [X86] Add 512-bit shuffle test cases for concatenating 128/256-bits with zeros in the upper portion.
We should recognize this and just use a mov that will zero the upper bits.

llvm-svn: 324708
2018-02-09 05:54:31 +00:00
Aditya Nandakumar b14fd2608c [GISel]: Verify COPIES involving generic registers.
Add verification for copies involving generic registers if they are
compatible - ie if it is a generic copy, then the types are the
same, and if a COPY b/w generic and target virtual register, then
the sizes should be the same. Only checks if there are no sub registers
involved for now.

https://reviews.llvm.org/D37775

llvm-svn: 324696
2018-02-09 01:27:23 +00:00
Francis Visoiu Mistrih fb7b14f70d [CodeGen] Unify the syntax of MBB liveins in MIR and -debug output
Instead of:

Live Ins: %r0 %r1

print:

liveins: %r0, %r1
llvm-svn: 324694
2018-02-09 01:14:44 +00:00
Craig Topper 79c3255fe4 [x86] Add test cases to demonstrate some dumb mask->gpr->mask transition sequences.
llvm-svn: 324693
2018-02-09 01:14:17 +00:00
Francis Visoiu Mistrih 39ec2e95ae [CodeGen] Unify the syntax of MBB successors in MIR and -debug output
Instead of:

Successors according to CFG: %bb.6(0x12492492 / 0x80000000 = 14.29%)

print:

successors: %bb.6(0x12492492); %bb.6(14.29%)
llvm-svn: 324685
2018-02-09 00:10:31 +00:00
Sanjay Patel b7e13938a9 [x86] consolidate and add tests for undef binop folds; NFC
As was already shown in the div/rem tests and noted in PR36305,
the behavior is inconsistent, but it's not limited to div/rem only.

llvm-svn: 324678
2018-02-08 23:21:44 +00:00
Paul Robinson ceafcd41cf [DWARFv5] Fix dumper to show the file table starts at index 0.
Emitting the correct (root of compilation) file at index 0 will be
posted for review later; I wanted to get this minor change out of the
way first.

llvm-svn: 324669
2018-02-08 23:08:02 +00:00
Matt Arsenault c24d5e2819 AMDGPU: Minor cleanups
Column limit, typo, unnecessary reference

llvm-svn: 324666
2018-02-08 22:46:38 +00:00
Alexander Ivchenko da9e81c462 [GlobalISel][X86] Fixing failures after https://reviews.llvm.org/D37775
The patch essentially makes sure that X86CallLowering adds proper
G_COPY/G_TRUNC and G_ANYEXT/G_COPY when we are doing lowering of
arguments/returns for floating point values passed on registers.

Tests are updated accordingly

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D42287

llvm-svn: 324665
2018-02-08 22:41:47 +00:00
Alexander Ivchenko a85c4fc029 [GlobalIsel][X86] Making {G_IMPLICIT_DEF, s128} legal
The patch is a split from D42287 and is related to
fixing failures after https://reviews.llvm.org/D37775

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D42287

llvm-svn: 324664
2018-02-08 22:40:31 +00:00
Craig Topper 9e030c9e00 [X86] Improve combineCastedMaskArithmetic to fold (bitcast (vXi1 (and/or/xor X, C)))->(vXi1 (and/or/xor (bitcast X), (bitcast C)) where C is a constant build_vector.
Most vxi1 constant build vectors have to be implemented in the scalar domain anyway so we'll probably end up with a cast there later. But by then its too late to do the combine to get rid of it.

llvm-svn: 324662
2018-02-08 22:26:39 +00:00
Craig Topper 1b5b4ccb77 [X86] Add DAG combine to constant fold a bitcast of a vXi1 constant build_vector into a scalar integer.
llvm-svn: 324661
2018-02-08 22:26:36 +00:00
Craig Topper dccf72b583 [X86] Remove kortest intrinsics and replace with native IR.
llvm-svn: 324646
2018-02-08 20:16:06 +00:00
David Woodhouse 76eb26aa92 [X86] Support 'V' register operand modifier
This allows the register name to be printed without the leading '%'.
This can be used for emitting calls to the retpoline thunks from inline
asm.

llvm-svn: 324645
2018-02-08 20:06:05 +00:00
Simon Pilgrim 21fb6bccc4 [X86] Add common CHECK prefix to shift combine tests
llvm-svn: 324638
2018-02-08 19:23:47 +00:00
Simon Pilgrim ae6873f91c [X86] Add shift undef, %X tests
llvm-svn: 324637
2018-02-08 19:20:34 +00:00
Craig Topper 7aee1a838e [X86] Add a few new test cases for shrunkblend combine
One of them shows a missed opportunity to use SimplifyDemandedBits on the condition when its used by multiple vselects.

The other is a case we shouldn't optimize because the condition has a non-vselect use.

llvm-svn: 324630
2018-02-08 18:34:25 +00:00
Alexander Ivchenko dd5b2396d3 [x86] Add test/CodeGen/X86/vmaskmov-offset.ll. NFC.
Needed for checking current code generation.

llvm-svn: 324601
2018-02-08 13:16:42 +00:00
Stefan Maksimovic 8989940557 Revert accidental changes that snuck in r324584
llvm-svn: 324585
2018-02-08 09:31:48 +00:00
Stefan Maksimovic b3e7ed3b94 [mips] Define certain instructions in microMIPS32r3
Instructions affected:
mthc1, mfhc1, add.d, sub.d, mul.d, div.d,
mov.d, neg.d, cvt.w.d, cvt.d.s, cvt.d.w, cvt.s.d

These instructions are now defined for
microMIPS32r3 + microMIPS32r6 in MicroMipsInstrFPU.td
since they shared their encoding with those already defined
in microMIPS32r6InstrInfo.td and have been therefore
removed from the latter file.

Some instructions present in MicroMipsInstrFPU.td which
did not have both AFGR64 and FGR64 variants defined have
been altered to do so.

Differential revision: https://reviews.llvm.org/D42738

llvm-svn: 324584
2018-02-08 09:25:17 +00:00
Dylan McKay 820553fdb1 [AVR] Fix the testsuite after '%' changed to '$' in MIR
llvm-svn: 324583
2018-02-08 09:17:11 +00:00
Sjoerd Meijer 5ea465ded7 [AArch64] Don't materialize 0 with "fmov h0, .." when FullFP16 is not supported
We were generating "fmov h0, wzr" instructions when FullFP16 is not enabled.
I've not added any tests, because the problem was visible in:
test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll,
which I had to change: I don't think Cyclone has FullFP16 enabled
by default, so it shouldn't be using this v8.2a instruction.

I've also removed these rdar tags, please shout if there are any objections.

Differential Revision: https://reviews.llvm.org/D43020

llvm-svn: 324581
2018-02-08 08:39:05 +00:00
Craig Topper 8d0c8c9be1 [X86] Support folding in a k-register OR when creating KORTEST from scalar compare of a bitcast from vXi1.
This should allow us to remove the kortest intrinsic from IR and use compare+bitcast+or in IR instead.

llvm-svn: 324580
2018-02-08 08:29:43 +00:00
Craig Topper 93505707b6 [X86] Allow KORTEST instruction to be used for testing if a mask is all ones
The KTEST instruction sets the C flag if the result of anding both operands together is all 1s. We can use this to lower (icmp eq/ne (bitcast (vXi1 X), -1)

Differential Revision: https://reviews.llvm.org/D42772

llvm-svn: 324577
2018-02-08 07:54:16 +00:00
Craig Topper f5465f98d2 [X86] Don't emit KTEST instructions unless only the Z flag is being used
Summary:
KTEST has weird flag behavior. The Z flag is set for all bits in the AND of the k-registers being 0, and the C flag is set for all bits being 1. All other flags are cleared.

We currently emit this instruction in EmitTEST and don't check the condition code. This can lead to strange things like using the S flag after a KTEST for a signed compare.

The domain reassignment pass can also transform TEST instructions into KTEST and is not protected against the flag usage either. For now I've disabled this part of the domain reassignment pass. I tried to comment out the checks in the mir test so that we could recover them later, but I couldn't figure out how to get that to work.

This patch moves the KTEST handling into LowerSETCC and now creates a ktest+x86setcc. I've chosen this approach because I'd like to add support for the C flag for all ones in a followup patch. To do that requires that I can rewrite the condition code going in the x86setcc to be different than the original SETCC condition code.

This fixes PR36182. I'll file a PR to fix domain reassignment once this goes in. Should this be merged to 6.0?

Reviewers: spatel, guyblank, RKSimon, zvi

Reviewed By: guyblank

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42770

llvm-svn: 324576
2018-02-08 07:45:55 +00:00
Francis Visoiu Mistrih da89d1812a [CodeGen] Print MachineBasicBlock labels using MIR syntax in -debug output
Instead of:

%bb.1: derived from LLVM BB %for.body

print:

bb.1.for.body:

Also use MIR syntax for MBB attributes like "align", "landing-pad", etc.

llvm-svn: 324563
2018-02-08 05:02:00 +00:00
Yonghong Song f2075aef68 bpf: Improve expanding logic in LowerSELECT_CC
LowerSELECT_CC is not generating optimal Select_Ri pattern at the moment. It
is not guaranteed to place ConstantNode at RHS which would miss matching
Select_Ri.

A new testcase added into the existing select_ri.ll, also there is an
existing case in cmp.ll which would be improved to use Select_Ri after this
patch, it is adjusted accordingly.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 324560
2018-02-08 04:37:49 +00:00
Matt Arsenault b02cebf552 AMDGPU: Fix incorrect reordering when inline asm defines LDS address
Defs of operands outside of the instruction's explicit defs need
to be checked.

llvm-svn: 324554
2018-02-08 01:56:14 +00:00
Matt Arsenault c908e3f77a AMDGPU: Don't crash when trying to fold implicit operands
llvm-svn: 324550
2018-02-08 01:12:46 +00:00
Chandler Carruth 0be0cfa65b [x86] Fix nasty bug in the x86 backend that is essentially impossible to
hit from IR but creates a minefield for MI passes.

The x86 backend has fairly powerful logic to try and fold loads that
feed register operands to instructions into a memory operand on the
instruction. This is almost always a good thing, but there are specific
relocated loads that are only allowed to appear in specific
instructions. Notably, R_X86_64_GOTTPOFF is only allowed in `movq` and
`addq`. This patch blocks folding of memory operands using this
relocation unless the target is in fact `addq`.

The particular relocation indicates why we simply don't hit this under
normal circumstances. This relocation is only used for TLS, and it gets
used in very specific ways in conjunction with %fs-relative addressing.
The result is that loads using this relocation are essentially never
eligible for folding into an instruction's memory operands. Unless, of
course, you have an MI pass that inserts usage of such a load. I have
exactly such an MI pass and was greeted by truly mysterious miscompiles
where the linker replaced my instruction with a completely garbage byte
sequence. Go team.

This is the only such relocation I'm aware of in x86, but there may be
others that need to be similarly restricted.

Fixes PR36165.

Differential Revision: https://reviews.llvm.org/D42732

llvm-svn: 324546
2018-02-07 23:59:14 +00:00
Craig Topper 8baa9c77e3 [X86] When doing callee save/restore for k-registers make sure we don't use KMOVQ on non-BWI targets
If we are saving/restoring k-registers, the default behavior of getMinimalRegisterClass will find the VK64 class with a spill size of 64 bits. This will cause the KMOVQ opcode to be used for save/restore. If we don't have have BWI instructions we need to constrain the class returned to give us VK16 with a 16-bit spill size. We can do this by passing the either v16i1 or v64i1 into getMinimalRegisterClass.

Also add asserts to make sure BWI is enabled anytime we use KMOVD/KMOVQ. These are what caught this bug.

Fixes PR36256

Differential Revision: https://reviews.llvm.org/D42989

llvm-svn: 324533
2018-02-07 21:41:50 +00:00
Craig Topper ce26819f9e [X86] Auto-generate complete checks. NFC
llvm-svn: 324530
2018-02-07 21:29:30 +00:00