Commit Graph

40046 Commits

Author SHA1 Message Date
Yaxun Liu c5bf4b831d AMDGPU: Attempt to fix build failure on x86-64 selfhost build
Remove redundant include file.

llvm-svn: 286552
2016-11-11 02:48:50 +00:00
Sean Fertile e1ca561b0a Add a blank line for a test commit.
llvm-svn: 286550
2016-11-11 02:33:17 +00:00
Stanislav Mekhanoshin 6fc8a1cdaa Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies"
This reverts commit r286171, it breaks piglit test fs-discard-exit-2

llvm-svn: 286530
2016-11-11 00:22:34 +00:00
Joerg Sonnenberger 618d475c03 Fix requirements.
llvm-svn: 286527
2016-11-10 23:53:45 +00:00
Matthias Braun d67fa9dc6a Timer: Remove group-less NamedRegionTimer constructor.
The NamedRegionTimer initializer without a group name puts the Timer
into the "Misc" group and is (nearly) unused. Remove it.

The only user of this constructor appears to be the HexagonGenInsert pass,
which creates a counter without group to count the complete execution
time of that pass, however since every pass gets a counter by the
PassManager anyway this should be unnecessary. Also removed the
pointless TimerGroup there.

Differential Revision: https://reviews.llvm.org/D25582

llvm-svn: 286524
2016-11-10 23:36:44 +00:00
Evandro Menezes 21f9ce1a0d [DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself.  However, the original code did not properly consider this condition
if returned by a target.  This patch addresses the issues to allow a target
to compute the series on its own.

Differential revision: https://reviews.llvm.org/D22975

llvm-svn: 286523
2016-11-10 23:31:06 +00:00
Yaxun Liu d6fbe65040 AMDGPU: Emit runtime metadata as a note element in .note section
Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata.

However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).

This patch lets AMDGPU backend emits runtime metadata as a note element in .note section.

Differential Revision: https://reviews.llvm.org/D25781

llvm-svn: 286502
2016-11-10 21:18:49 +00:00
Davide Italiano a22ddddfea [Target] Rename X86/ARM Assembly printer to reflect reality.
This shows up a lot profiling LTO testcases with -time-passes, so
better have a non confusing name.

llvm-svn: 286488
2016-11-10 18:39:31 +00:00
Tom Stellard 115a61560e AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 286464
2016-11-10 16:02:37 +00:00
Oliver Stannard 18ca2adf2d [ARM] Thumb2 LDR (literal) should accept PC as the destination
The version of this instruction with the .w suffix already correctly accepts
this, but the alias without the .w did not.

Differential Revision: https://reviews.llvm.org/D26499

llvm-svn: 286446
2016-11-10 13:20:41 +00:00
Craig Topper bd298c37d1 [AVX-512] Allow legacy cvtpd2dq intrinsics to select EVEX encoded instruction when available.
llvm-svn: 286435
2016-11-10 07:47:17 +00:00
Craig Topper e0845d8e8c [AVX-512][X86] Convert avx_cvtt_ps2dq_256 and sse2_cvttps2dq intrinsics to ISD::FP_TO_SINT in the intrinsics table and delete patterns. While nearby also move CVTDQ2PS patterns into their instructions.
This allows these intrinsics to also use EVEX instructons.

llvm-svn: 286434
2016-11-10 07:24:52 +00:00
Craig Topper f37b9b9b5f [X86] Convert int_x86_avx_cvtt_pd2dq_256 to fp_to_sint using the intrinsics table. Removes extra patterns and allows legacy intrinsic to select EVEX encoded instructions when available.
llvm-svn: 286433
2016-11-10 06:45:39 +00:00
Craig Topper 2afed2c790 [X86] Move some custom patterns into the currently empty pattern of their corresponding instructions. NFC
llvm-svn: 286432
2016-11-10 06:45:37 +00:00
Craig Topper 1d2e74f030 [X86] Remove some patterns still referencing int_x86_sse2_cvttpd2dq that should have been removed in r286344. NFC
llvm-svn: 286431
2016-11-10 06:45:34 +00:00
Peter Collingbourne 32ab3a817d Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate.", with a fix for 32-bit x86.
Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions
that take a global address operand.

llvm-svn: 286420
2016-11-09 23:53:43 +00:00
Tim Northover a9105be437 GlobalISel: translate invoke and landingpad instructions
Pretty bare-bones support for exception handling (no weird MSVC stuff, no SjLj
etc), but it should get things going.

llvm-svn: 286407
2016-11-09 22:39:54 +00:00
Peter Collingbourne a9cadeddd4 Revert r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate."
Suspected to be the cause of a sanitizer-windows bot failure:
Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420

llvm-svn: 286385
2016-11-09 18:17:50 +00:00
Peter Collingbourne 4c15db45e4 X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate.
A relocatable immediate is either an immediate operand or an operand that
can be relocated by the linker to an immediate, such as a regular symbol
in non-PIC code.

Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands
of type "imm32_su". Remove a number of now-redundant patterns.

Differential Revision: https://reviews.llvm.org/D25812

llvm-svn: 286384
2016-11-09 17:51:58 +00:00
Krzysztof Parzyszek f817efbbb0 [Hexagon] Silence "sometimes uninitialized" warning in HexagonCopyToCombine
llvm-svn: 286383
2016-11-09 17:50:46 +00:00
Krzysztof Parzyszek a540997ce4 [Hexagon] Separate Hexagon subreg indices for different register classes
For pairs of 32-bit registers: isub_lo, isub_hi.
For pairs of vector registers: vsub_lo, vsub_hi.

Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function
  HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg)
that returns the appropriate subreg index for RegClass.

llvm-svn: 286377
2016-11-09 16:19:08 +00:00
Krzysztof Parzyszek 601d7eb11a [Hexagon] Eliminate Insert4 pseudo-instruction, use combines instead
llvm-svn: 286368
2016-11-09 14:16:29 +00:00
Jonas Paulsson e127fe7083 [SystemZ] A few fixes in scheduler files.
Review: U Weigand
llvm-svn: 286362
2016-11-09 12:47:57 +00:00
Jonas Paulsson 28f29487b9 [MachineScheduler] Comments fixing.
The name/comment of the third argument to the ScheduleDAGMI constructor
is RemoveKillFlags and not IsPostRA. Only the comments are changed.

Review: A Trick
llvm-svn: 286350
2016-11-09 09:59:27 +00:00
Craig Topper f334ac19ad [AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.

If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.

It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.

Differential Revision: https://reviews.llvm.org/D26331

llvm-svn: 286345
2016-11-09 07:48:51 +00:00
Craig Topper 731bf9c5d6 [X86] Lower AVX512 and SSE intrinsics for CVTTPD2DQ to X86ISD::CVTTPD2DQ.
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.

Reviewers: delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26330

llvm-svn: 286344
2016-11-09 07:31:32 +00:00
Craig Topper 28e3dfc02b [AVX-512] Use alignedstore256 in patterns that look for stores of the lower 256-bits of a 512-bit vector to use a 256-bit aligned store.
Previously we were only checking for 16 byte alignment instead of 32 byte alignment. Fixes PR30947.

llvm-svn: 286342
2016-11-09 05:31:57 +00:00
Craig Topper 5c842be9a0 [AVX-512] Make VBMI instruction set enabling imply that the BWI instruction set is also enabled.
Summary:
This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912.

Reviewers: delena, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26322

llvm-svn: 286339
2016-11-09 04:50:48 +00:00
Matthias Braun c53cbbb1d1 AArch64DeadRegisterDefinitionsPass: Fix Changed flag
Fix a bug in the calculation of the changed flag introduced in r285488.

llvm-svn: 286293
2016-11-08 20:59:03 +00:00
Ulrich Weigand 05effca2d8 [SystemZ] Add missing FP extension instructions
This completes assembler / disassembler support for all BFP
instructions provided by the floating-point extensions facility.
The instructions added here are not currently used for codegen.

llvm-svn: 286285
2016-11-08 20:18:41 +00:00
Ulrich Weigand 4006e09d1d [SystemZ] Add program mask and addressing mode instructions
Add several instructions that operate on the program mask
or the addressing mode.  These are not really needed for
code generation under Linux, but are provided for completeness
for the assembler/disassembler.

llvm-svn: 286284
2016-11-08 20:17:02 +00:00
Ulrich Weigand fffc7110d6 [SystemZ] Model access registers as LLVM registers
Add the 16 access registers as LLVM registers.  This allows removing
a lot of special cases in the assembler and disassembler where we
were handling access registers; this can all just use the generic
register code now.

Also add a bunch of instructions to operate on access registers,
for assembler/disassembler use only.  No change in code generation
intended.

llvm-svn: 286283
2016-11-08 20:15:26 +00:00
Dan Gohman e81021a5cb [WebAssembly] Convert stackified IMPLICIT_DEF into constant 0.
Since IMPLIFIT_DEF instructions are omitted in the output, when the output
of an IMPLICIT_DEF instruction is stackified, the resulting register lacks
an explicit push, leading to a push/pop mismatch. Fix this by converting
such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit
pushes.

llvm-svn: 286274
2016-11-08 19:40:38 +00:00
Ulrich Weigand 3d07d45089 [SystemZ] Always use semantic instruction classes
Define a couple of additional semantic classes and use them
throughout the .td files to make them more consistent and
more easily readable.

No functional change.

llvm-svn: 286268
2016-11-08 18:37:48 +00:00
Ulrich Weigand bfcfa0e207 [SystemZ] Refactor InstRR* instruction format patterns
This changes the InstRR (and related) patterns to no longer
automatically add an "r" at the end of the mnemonic.  This
makes the .td files more obviously understandable, and also
allows using the patterns for those few instructions that
do not follow the *r scheme.

Also add some more sub-formats of the RRF format class, to
match operand names and sequence from the PoP better.

No functional change.

llvm-svn: 286267
2016-11-08 18:36:31 +00:00
Ulrich Weigand 37bd451a55 [SystemZ] Rename some Inst* instruction format classes
Now that we've added instruction format subclasses like
InstRIb, it makes sense to rename the old InstRI to InstRIa.

Similar for InstRX, InstRXY, InstRS, InstRSY, and InstSS.

No functional change.

llvm-svn: 286266
2016-11-08 18:32:50 +00:00
Nirav Dave e833c6c61a [MC][AArch64] Cleanup end-of-line parsing in AArch64 AsmParser.
Reviewers: t.p.northover, rengolin

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D26309

llvm-svn: 286265
2016-11-08 18:31:04 +00:00
Ulrich Weigand d2148caffc [SystemZ] Refactor branch and conditional instruction patterns
Rework patterns for branches, call & return instructions,
compare-and-branch, compare-and-trap, and conditional move
instructions.

In particular, simplify creation of patterns for the extended
opcodes of instructions that take a CC mask.

Also, use semantical instruction classes for all the instructions
instead of open-coding them in SystemZInstrInfo.td.

Adds a couple of the basic branch instructions (that are unused
for codegen) for the assembler/disassembler.

llvm-svn: 286263
2016-11-08 18:30:50 +00:00
Tim Northover 5f7dea85c2 GlobalISel: support selecting fpext/fptrunc instructions on AArch64.
llvm-svn: 286253
2016-11-08 17:44:07 +00:00
Anton Korobeynikov 243a4700ce Fix PR27500: on MSP430 the branch destination offset is measured in words, not bytes.
Summary: In addition, the branch instructions will have proper BB destinations, not offsets, like before.

Reviewers: asl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23718

llvm-svn: 286252
2016-11-08 17:19:59 +00:00
Simon Pilgrim d02c55204b [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233
2016-11-08 14:10:28 +00:00
Roger Ferrer Ibanez 80c0f33c29 [AArch64] Fix incorrect CSEL node created
Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64
transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to
"a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have
the same type which can lead to a wrong CSEL node that crashes later
due to nonsensical copies.

Differential Revision: https://reviews.llvm.org/D26394

llvm-svn: 286231
2016-11-08 13:34:41 +00:00
Tim Northover 9ac0eba672 GlobalISel: support selecting G_SELECT on AArch64.
llvm-svn: 286185
2016-11-08 00:45:29 +00:00
Tim Northover 7d88da6a46 GlobalISel: constrain PHI registers on AArch64.
Self-referencing PHI nodes need their destination operands to be constrained
because nothing else is likely to do so. For now we just pick a register class
naively.

Patch mostly by Ahmed again.

llvm-svn: 286183
2016-11-08 00:34:06 +00:00
Stanislav Mekhanoshin 92e01ee90b [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user is we have only one register
for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.

With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue a forward propagation of a v_cmp 64 bit result
to an user is implemented. Additional side effect of this is that we may
consume less VGPRs in a cost of more SGPRs in case if holding of multiple
conditions is needed, and that is a clear win in most cases.

llvm-svn: 286171
2016-11-07 23:04:50 +00:00
Sanjin Sijaric 6f020d91a1 [AArch64] Transfer memory operands when lowering vector load/store intrinsics
Summary:
Some vector loads and stores generated from AArch64 intrinsics alias each other
unnecessarily, preventing better scheduling.  We just need to transfer memory
operands during lowering.

Reviewers: mcrosier, t.p.northover, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26313

llvm-svn: 286168
2016-11-07 22:39:02 +00:00
Derek Schuff 0d41b7b3f3 [WebAssembly] Emit a BasePointer when we have overly-aligned stack objects
Because we shift the stack pointer by an unknown amount, we need an
additional pointer. In the case where we have variable-size objects
as well, we can't reuse the frame pointer, thus three pointers.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D26263

llvm-svn: 286160
2016-11-07 22:00:48 +00:00
Davide Italiano 5df6066ec1 [AArch64] Remove dead store. Found by gcc7.
llvm-svn: 286137
2016-11-07 19:11:25 +00:00
Matt Arsenault f530e8b3f0 AMDGPU: Remove unnecessary and on conditional branch
The comment explaining why this was necessary is incorrect
in its description of v_cmp's behavior for inactive workitems.

llvm-svn: 286134
2016-11-07 19:09:33 +00:00
Matt Arsenault 52f14ec596 AMDGPU: Preserve vcc undef flags when inverting branch
If the branch was on a read-undef of vcc, passes that used
analyzeBranch to invert the branch condition wouldn't preserve
the undef flag resulting in a verifier error.

Fixes verifier failures in a future commit.

Also fix verifier error when inserting copy for vccz
corruption bug.

llvm-svn: 286133
2016-11-07 19:09:27 +00:00