Commit Graph

39238 Commits

Author SHA1 Message Date
Tim Northover bc1701c7fb GlobalISel: mark G_FPEXT legal from float to double.
llvm-svn: 279845
2016-08-26 17:46:22 +00:00
Tim Northover 30bd36e3fc GlobalISel: mark G_FCMP legal on float & double.
llvm-svn: 279844
2016-08-26 17:46:19 +00:00
Tim Northover 051b8ad3d9 GlobalISel: simplify G_ICMP legalization regime.
It's unclear how the old

    %res(32) = G_ICMP { s32, s32 } intpred(eq), %0, %1

is actually different from an s1 verison

    %res(1) = G_ICMP { s1, s32 } intpred(eq), %0, %1

so we'll remove it for now.

llvm-svn: 279843
2016-08-26 17:46:17 +00:00
Tim Northover cecee56abb GlobalISel: legalize sdiv and srem operations.
llvm-svn: 279842
2016-08-26 17:46:13 +00:00
Tim Northover 7a753d9bec GlobalISel: legalize under-width divisions.
llvm-svn: 279841
2016-08-26 17:46:06 +00:00
Tim Northover 1d18a99a53 GlobalISel: mark selects legal
llvm-svn: 279840
2016-08-26 17:46:03 +00:00
Tim Northover 5d0eaa4e79 GlobalISel: mark float/int conversions legal
llvm-svn: 279839
2016-08-26 17:45:58 +00:00
Chad Rosier 39c1dbb845 [AArch64] Avoid materializing constant 1 by using csinc, rather than csel.
This is similar to what was done in r261675, but for CSINC rather than CSINV.

Differential Revision: https://reviews.llvm.org/D23892

llvm-svn: 279822
2016-08-26 14:01:55 +00:00
Pablo Barrio b8ec630583 Handle empty functions with debug info in load/store opt pass
Summary:
In fuctions that contained debug info but were empty otherwise,
the ARM load/store optimizer could abort. This was because
function MergeReturnIntoLDM handled the special case where a
Machine Basic BLock is empty by calling MBB.empty(). However, this
returns false in presence of debug info, although the function
should be considered empty in the eyes of the load/store optimizer.
This has been fixed by handling the case where searching through the
block finds only debug instructions.

Reviewers: rengolin, dexonsmith, llvm-commits, jmolloy

Subscribers: t.p.northover, aemerson, rengolin, samparker

Differential Revision: https://reviews.llvm.org/D23847

llvm-svn: 279820
2016-08-26 13:00:39 +00:00
Simon Pilgrim 091c4c781c [X86][SSE4A] The EXTRQ/INSERTQ bit extraction/insertion ops should be in the integer domain
llvm-svn: 279811
2016-08-26 09:55:41 +00:00
Craig Topper 8f27f51192 [X86][SSE] Add CMPSS/CMPSD intrinsic scalar load folding support.
llvm-svn: 279806
2016-08-26 07:08:00 +00:00
Michael Kuperstein 2ee911e985 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289),
due to miscompiles in the test suite. The FastISel change was left in, because
it apparently fixes an unrelated issue.

(Recommit of r279782 which was broken due to a bad merge.)

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279788
2016-08-25 22:48:11 +00:00
Michael Kuperstein 6e271f4ce8 Revert r279782 due to debug buildbot breakage.
llvm-svn: 279785
2016-08-25 22:14:45 +00:00
Michael Kuperstein a6ccc8d365 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 and its follow-ups (r276347, r277289), due to
miscompiles in the test suite. The FastISel change was left in, because it
apparently fixes an unrelated issue.

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279782
2016-08-25 21:55:41 +00:00
Tim Northover 3495647d0d ARM: by default don't set the Thumb bit on MachO relocated values.
Its existence is largely historical, apparently we tried to make ARM object
files look maybe-almost-possibly runnable by putting our best guess at the
actual value into relocated locations. Of course, the real linker then comes
along and can completely change things.

But it should only be there for word-sized and movw/movt relocations. It can't
be encoded in branch relocations, and I've seen it mess up validity
calculations twice in the last couple of weeks so the default is clearly problematic.

llvm-svn: 279773
2016-08-25 20:41:30 +00:00
Tim Northover d8a6d7ce91 GlobalISel: mark overflow bit of overflow ops legal.
It's expected this will map to NZCV register class and be properly selectable.

llvm-svn: 279761
2016-08-25 17:37:41 +00:00
Tim Northover fe880a8801 GlobalISel: mark simple ops legal even on types < 32-bit.
The 32-bit variants of these operations don't depend on the bits not being
operated on, so they also naturally model operations narrower than the actual
register width.

llvm-svn: 279760
2016-08-25 17:37:39 +00:00
Tim Northover 7a1ec0141a GlobalISel: mark pointer constants as legal on AArch64.
llvm-svn: 279759
2016-08-25 17:37:35 +00:00
Tim Northover 438c77ca1a GlobalISel: perform multi-step legalization
llvm-svn: 279758
2016-08-25 17:37:32 +00:00
Tim Northover 2c4a838e24 GlobalISel: mark small extends as legal on AArch64
llvm-svn: 279757
2016-08-25 17:37:25 +00:00
Michael Kuperstein 40887c5566 [X86] 512-bit VPAVG requires AVX512BW
Fix VPAVG detection to require AVX512BW, not AVX512F for 512-bit widths,
and change associated asserts to assert in the right direction...

This fixes PR29111.

llvm-svn: 279755
2016-08-25 17:17:46 +00:00
Simon Pilgrim 5aa9c203ac [X86][SSE] INSERTPS is only combined on v4f32 types. NFCI.
llvm-svn: 279751
2016-08-25 17:02:00 +00:00
Ron Lieberman a3c739b977 [Hexagon] Remove extraneous debug output from HexagonCopyToCombine.cpp
BB# ...

llvm-svn: 279750
2016-08-25 16:46:09 +00:00
Simon Pilgrim 6fe4a9ed1e Fix line endings
llvm-svn: 279745
2016-08-25 15:45:27 +00:00
Ron Lieberman c93d123b86 [Hexagon] vector store print tracing.
Add vector store print tracing option for hexagon vector instructions.

https://reviews.llvm.org/D23870

llvm-svn: 279739
2016-08-25 13:35:48 +00:00
Simon Pilgrim 0ad9f3e93b [X86][AVX] Provide SubVectorBroadcast fallback if load fold fails (PR29133)
Fix for PR29133, matching the approach that was taken for AVX1 scalar broadcasts.

llvm-svn: 279735
2016-08-25 12:45:16 +00:00
Craig Topper 5ef7a0f45a [X86] Simplify getOperandBias as a bit. NFC
There's no reason for it to return a signed type. Just return the operand bias in each if instead of starting from 0 and adding in the 'if'.

llvm-svn: 279720
2016-08-25 04:16:10 +00:00
Craig Topper 969e56a2cc [X86] Fix indentation per coding standards. NFC
llvm-svn: 279719
2016-08-25 04:16:08 +00:00
Matthias Braun 1eb473680a MachineFunctionProperties/MIRParser: Rename AllVRegsAllocated->NoVRegs, compute it
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.

Differential Revision: http://reviews.llvm.org/D23850

llvm-svn: 279698
2016-08-25 01:27:13 +00:00
George Burgess IV 381fc0ee3c Make some LLVM_CONSTEXPR variables const. NFC.
This patch changes LLVM_CONSTEXPR variable declarations to const
variable declarations, since LLVM_CONSTEXPR expands to nothing if the
current compiler doesn't support constexpr. In all of the changed
cases, it looks like the code intended the variable to be const instead
of sometimes-constexpr sometimes-not.

llvm-svn: 279696
2016-08-25 01:05:08 +00:00
Heejin Ahn b6cd5121b7 [WebAssembly] Change a comment line
Test for commit access.

llvm-svn: 279683
2016-08-24 22:53:00 +00:00
Krzysztof Parzyszek 6dff336ad1 [Hexagon] Check for block end when skipping debug instructions
llvm-svn: 279681
2016-08-24 22:36:35 +00:00
Krzysztof Parzyszek 951fb36120 [Hexagon] Change insertion of expand-condsets pass to avoid memory leaks
llvm-svn: 279678
2016-08-24 22:27:36 +00:00
Tim Northover 9c3633f516 ARM: don't diagnose cbz/cbnz to Thumb functions.
A branch-distance to a Thumb function shouldn't be forced to be odd for
CBZ/CBNZ instructions because (assuming it's within range), it's going to be a
valid, even offset.

llvm-svn: 279665
2016-08-24 21:21:29 +00:00
Changpeng Fang 75f0968b39 AMDGCN/SI: Implement readlane/readfirstlane intrinsics
Summary:
  This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.

Reviewed by:
  arsenm and tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D22489

llvm-svn: 279660
2016-08-24 20:35:23 +00:00
Rafael Espindola 70c6a3976b Use isTargetMachO instead of isTargetDarwin.
llvm-svn: 279655
2016-08-24 19:02:29 +00:00
Simon Pilgrim e14653e17d [X86][SSE] Add MINSD/MAXSD/MINSS/MAXSS intrinsic scalar load folding support
These are no different in load behaviour to the existing ADD/SUB/MUL/DIV scalar ops but were missing from isNonFoldablePartialRegisterLoad

llvm-svn: 279652
2016-08-24 18:40:53 +00:00
Evandro Menezes 5395187fe5 [AArch64] Adjust the feature set for Exynos M1.
Enable zero cycle zeroing.

llvm-svn: 279648
2016-08-24 18:17:30 +00:00
Simon Pilgrim 941bd6bbae [X86][SSE] Add support for combining VZEXT_MOVL target shuffles
Includes adding more general support for the pattern: VZEXT_MOVL(VZEXT_LOAD(ptr)) -> VZEXT_LOAD(ptr)

This has unearthed a couple of latent poor codegen issues (MINSS/MAXSS scalar load folding and MOVDDUP/BROADCAST load folding patterns), which will be fixed shortly.

Its also reduced a couple of tests so that they no longer reach the instruction threshold necessary to be combined to PSHUFB (see PR26183).

llvm-svn: 279646
2016-08-24 18:07:53 +00:00
Krzysztof Parzyszek b5ec48755d [Hexagon] Enable subregister liveness tracking
llvm-svn: 279642
2016-08-24 17:17:39 +00:00
Krzysztof Parzyszek cbd559f507 [Hexagon] Remove the utilization of IMPLICIT_DEFs from expand-condsets
This is no longer necessary, because since r279625 the subregister
liveness properly accounts for read-undefs.

llvm-svn: 279637
2016-08-24 16:36:37 +00:00
Wei Ding 1041a646a9 AMDGPU : Add V_SAD_U32 instruction pattern.
Differential Revision: http://reviews.llvm.org/D23069

llvm-svn: 279629
2016-08-24 14:59:47 +00:00
Krzysztof Parzyszek a7ed090bba Create subranges for new intervals resulting from live interval splitting
The register allocator can split a live interval of a register into a set
of smaller intervals. After the allocation of registers is complete, the
rewriter will modify the IR to replace virtual registers with the corres-
ponding physical registers. At this stage, if a register corresponding
to a subregister of a virtual register is used, the rewriter will check
if that subregister is undefined, and if so, it will add the <undef> flag
to the machine operand. The function verifying liveness of the subregis-
ter would assume that it is undefined, unless any of the subranges of the
live interval proves otherwise.
The problem is that the live intervals created during splitting do not
have any subranges, even if the original parent interval did. This could
result in the <undef> flag placed on a register that is actually defined.

Differential Revision: http://reviews.llvm.org/D21189

llvm-svn: 279625
2016-08-24 13:37:55 +00:00
Simon Dardis f114820912 [mips] Preparatory work for a generic scheduler
Extend instruction definitions from nearly all ISAs to include
appropriate instruction itineraries. Change MIPS16s gp prologue
generation to use real instructions instead of using a pseudo
instruction.

Reviewers: dsanders, vkalintiris

Differential Review: https://reviews.llvm.org/D23548

llvm-svn: 279623
2016-08-24 13:00:47 +00:00
Simon Pilgrim 7a50c8c2ba [X86][AVX2] Ensure on 32-bit targets that we broadcast f64 types not i64 (PR29101)
llvm-svn: 279622
2016-08-24 12:42:31 +00:00
Simon Pilgrim 6392b8d4ce [X86][SSE] Add support for 32-bit element vectors to X86ISD::VZEXT_LOAD
Consecutive load matching (EltsFromConsecutiveLoads) currently uses VZEXT_LOAD (load scalar into lowest element and zero uppers) for vXi64 / vXf64 vectors only.

For vXi32 / vXf32 vectors it instead creates a scalar load, SCALAR_TO_VECTOR and finally VZEXT_MOVL (zero upper vector elements), relying on tablegen patterns to match this into an equivalent of VZEXT_LOAD.

This patch adds the VZEXT_LOAD patterns for vXi32 / vXf32 vectors directly and updates EltsFromConsecutiveLoads to use this.

This has proven necessary to allow us to easily make VZEXT_MOVL a full member of the target shuffle set - without this change the call to combineShuffle (which is the main caller of EltsFromConsecutiveLoads) tended to recursively recreate VZEXT_MOVL nodes......

Differential Revision: https://reviews.llvm.org/D23673

llvm-svn: 279619
2016-08-24 10:46:40 +00:00
Matthias Braun 733fe3676c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279602
2016-08-24 01:52:46 +00:00
Philip Reames e83c4b30ca [stackmaps] More extraction of common code [NFCI]
General cleanup before starting to work on the part I want to actually change.

llvm-svn: 279586
2016-08-23 23:33:29 +00:00
Richard Smith 8c3fbdc6c4 Revert r279564. It introduces undefined behavior (binding a reference to a
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.

llvm-svn: 279580
2016-08-23 22:08:27 +00:00
Matthias Braun 90799ce8b2 MachineFunction: Introduce NoPHIs property
I want to compute the SSA property of .mir files automatically in
upcoming patches. The problem with this is that some inputs will be
reported as static single assignment with some passes claiming not to
support SSA form.  In reality though those passes do not support PHI
instructions => Track the presence of PHI instructions separate from the
SSA property.

Differential Revision: https://reviews.llvm.org/D22719

llvm-svn: 279573
2016-08-23 21:19:49 +00:00
Tim Northover 6cd4b23a0f GlobalISel: legalize integer comparisons on AArch64.
Next step is doing both legalizations at the same time! Marvel at GlobalISel's
cunning.

llvm-svn: 279566
2016-08-23 21:01:26 +00:00
Tim Northover b3a0be4d38 GlobalISel: legalize conditional branches on AArch64.
llvm-svn: 279565
2016-08-23 21:01:20 +00:00
Matthias Braun 4c1f1f120c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279564
2016-08-23 20:58:29 +00:00
Tim Northover a01bece1dc GlobalISel: extend legalizer interface to handle multiple types.
Instructions like G_ICMP have multiple types that may need to be legalized (the
boolean output and nearly arbitrary inputs in this case). So the legalizer must
be capable of deciding what to do for each of them separately.

llvm-svn: 279554
2016-08-23 19:30:42 +00:00
Tim Northover 456a3c03ac GlobalISel: mark pointer casts legal on AArch64.
llvm-svn: 279553
2016-08-23 19:30:38 +00:00
Tim Northover 3c73e367c0 GlobalISel: legalize 1-bit load/store and mark 8/16 bit variants legal on AArch64.
llvm-svn: 279548
2016-08-23 18:20:09 +00:00
Krzysztof Parzyszek 38e2ccc8d0 [Hexagon] Packetize return value setup with the return instruction
Commit r279241 unintentionally reverted that ability.

llvm-svn: 279526
2016-08-23 16:01:01 +00:00
Jacques Pienaar 8ed1cd291d [lanai] Use const instead of constexpr
The windows build bot did not like constexpr.

llvm-svn: 279517
2016-08-23 14:36:53 +00:00
Elliot Colp a4092104cb Fix SystemZ hang caused by r279105
The change in r279105 causes an infinite loop in some cases, as it sets the upper bits of an AND mask constant, which DAGCombiner::SimplifyDemandedBits then unsets.
This patch reverts that part of the behaviour, instead relying on .td peepholes to perform the transformation to NILL. I reapplied my original fix for the problem addressed by r279105 (unsetting the upper bits, which prevents a compiler abort for a different reason).

Differential Revision: https://reviews.llvm.org/D23781

llvm-svn: 279515
2016-08-23 14:03:02 +00:00
NAKAMURA Takumi 708591d231 LLVMLanaDesc: Update libdesp.
llvm-svn: 279510
2016-08-23 10:47:40 +00:00
NAKAMURA Takumi 031a6330c4 Change the target's name, s/LanaiMCTargetDesc/LanaiDesc/g.
"AllTargetsDescs" in llvm-mc/CMakeLists.txt expects not ${target}MCTargetDesc, but ${target}Desc.

llvm-svn: 279509
2016-08-23 10:43:01 +00:00
Oliver Stannard 9aa6f010a4 [ARM] Generate consistent frame records for Thumb2
There is not an official documented ABI for frame pointers in Thumb2,
but we should try to emit something which is useful.

We use r7 as the frame pointer for Thumb code, which currently means
that if a function needs to save a high register (r8-r11), it will get
pushed to the stack between the frame pointer (r7) and link register
(r14). This means that while a stack unwinder can follow the chain of
frame pointers up the stack, it cannot know the offset to lr, so does
not know which functions correspond to the stack frames.

To fix this, we need to push the callee-saved registers in two batches,
with the first push saving the low registers, fp and lr, and the second
push saving the high registers. This is already implemented, but
previously only used for iOS. This patch turns it on for all Thumb2
targets when frame pointers are required by the ABI, and the frame
pointer is r7 (Windows uses r11, so this isn't a problem there). If
frame pointer elimination is enabled we still emit a single push/pop
even if we need a frame pointer for other reasons, to avoid increasing
code size.

We must also ensure that lr is pushed to the stack when using a frame
pointer, so that we end up with a complete frame record. Situations that
could cause this were rare, because we already push lr in most
situations so that we can return using the pop instruction.

Differential Revision: https://reviews.llvm.org/D23516

llvm-svn: 279506
2016-08-23 09:19:22 +00:00
Matthias Braun 7f66202d38 Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.

This reverts commit r279502.

llvm-svn: 279503
2016-08-23 05:17:11 +00:00
Matthias Braun fd936841eb CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279502
2016-08-23 03:20:09 +00:00
Matt Arsenault 567631bdd4 BranchRelaxation: Fix handling of blocks with multiple conditional
branches

Looping over all terminators exposed AArch64 tests hitting
an assert from analyzeBranch failing. I believe these cases
were miscompiled before.

e.g.
  fcmp s0, s1
  b.ne LBB0_1
  b.vc LBB0_2
  b LBB0_2
LBB0_1:
  ; Large block
LBB0_2:
 ; ...

Both of the individual conditional branches need to
be expanded, since neither can reach the final block.

Split the original block into ones which analyzeBranch
will be able to understand.

llvm-svn: 279499
2016-08-23 01:30:30 +00:00
Jacques Pienaar 0e2171904e [lanai] Exit early in Mem Alu combiner if sentinel reach.
LanaiMemAluCombiner could try to query the debug value of a list sentinel. Add check to exit early instead.

llvm-svn: 279497
2016-08-23 01:04:41 +00:00
Matt Arsenault 78fc9daf8d AMDGPU: Split SILowerControlFlow into two pieces
Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464
2016-08-22 19:33:16 +00:00
Simon Pilgrim c8ad5c069c [X86][AVX] Don't use SubVectorBroadcast if there are additional users of the chain (PR29088)
We could improve on this by making X86SubVBroadcast a full memory intrinsic similar to X86vzload

llvm-svn: 279441
2016-08-22 16:47:55 +00:00
Simon Atanasyan eb9ed61021 [mips][ias] Support .dtprel[d]word and .tprel[d]word directives
Assembler directives .dtprelword, .dtpreldword, .tprelword, and
.tpreldword generates relocations R_MIPS_TLS_DTPREL32, R_MIPS_TLS_DTPREL64,
R_MIPS_TLS_TPREL32, and R_MIPS_TLS_TPREL64 respectively.

The main motivation for this patch is to be able to write test cases
for checking correctness of the LLD linker's behaviour.

Differential Revision: https://reviews.llvm.org/D23669

llvm-svn: 279439
2016-08-22 16:18:42 +00:00
Simon Pilgrim 13fa33012b [X86] Only accept SM_SentinelUndef (-1) as an undefined shuffle mask in range
As discussed on D23027 we should be trying to be more strict on what is an undefined mask value.

llvm-svn: 279435
2016-08-22 13:18:56 +00:00
Simon Pilgrim 2279e59573 [X86][SSE] Avoid specifying unused arguments in SHUFPD lowering
As discussed on PR26491, we are missing the opportunity to make use of the smaller MOVHLPS instruction because we set both arguments of a SHUFPD when using it to lower a single input shuffle.

This patch sets the lowered argument to UNDEF if that shuffle element is undefined. This in turn makes it easier for target shuffle combining to decode UNDEF shuffle elements, allowing combines to MOVHLPS to occur.

A fix to match against MOVHPD stores was necessary as well.

This builds on the improved MOVLHPS/MOVHLPS lowering and memory folding support added in D16956

Adding similar support for SHUFPS will have to wait until have better support for target combining of binary shuffles.

Differential Revision: https://reviews.llvm.org/D23027

llvm-svn: 279430
2016-08-22 12:56:54 +00:00
Hrvoje Varga f0ed16eae5 [mips][microMIPS] Implement BLTZC, BLEZC, BGEZC and BGTZC instructions, fix disassembly and add operand checking to existing B<cond>C implementations
Differential Revision: https://reviews.llvm.org/D22667

llvm-svn: 279429
2016-08-22 12:17:59 +00:00
Craig Topper 5f8419da34 [X86] Create a new instruction format to handle 4VOp3 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling.
llvm-svn: 279424
2016-08-22 07:38:50 +00:00
Craig Topper 9b20fece81 [X86] Create a new instruction format to handle MemOp4 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling.
llvm-svn: 279423
2016-08-22 07:38:45 +00:00
Craig Topper 61b62e56b7 [X86] Space out the encodings of X86 instruction formats. I plan to add some new encodings in future commits and this will reduce the size of those commits. NFC
This tries to keep all the ModRM memory and register forms in their own regions of the encodings. Hoping to make it simple on some of the switch statements that operate on these encodings.

llvm-svn: 279422
2016-08-22 07:38:41 +00:00
Craig Topper ca0eda3e6a [X86] Merge hasVEX_i8ImmReg into the ImmFormat type which had extra unused encodings. This saves one bit in TSFlags. NFC
llvm-svn: 279412
2016-08-22 01:37:19 +00:00
Craig Topper 522541231a [X86] Remove ignoreVEX_L from TSFlags. Only the disassembler needs it and the disassembler doesn't use TSFlags. NFC
llvm-svn: 279411
2016-08-22 01:37:16 +00:00
NAKAMURA Takumi 9d0b53129c Reformat.
llvm-svn: 279409
2016-08-22 00:58:47 +00:00
NAKAMURA Takumi 59a20649c6 Untabify.
llvm-svn: 279408
2016-08-22 00:58:04 +00:00
Simon Pilgrim 67e7e22462 [X86][AVX] Dropped combineShuffle256 - this can now be performed by EltsFromConsecutiveLoads
llvm-svn: 279397
2016-08-21 15:39:45 +00:00
Guy Blank 9ae797a798 [AVX512][FastISel] Do not use K registers in TEST instructions
In some cases, FastIsel was emitting TEST instruction with K reg input, which is illegal.
Changed to using KORTEST when dealing with K regs.

Differential Revision: https://reviews.llvm.org/D23163

llvm-svn: 279393
2016-08-21 08:02:27 +00:00
Duncan P. N. Exon Smith 8f44c98d04 ARM: Avoid dereferencing end() in ARMFrameLowering::emitEpilogue
This fixes the crash from PR29072, where the MachineBasicBlock::iterator
wasn't being properly checked against MachineBasicBlock::end() before
iterating.  This was another bug exposed by the new
ilist::iterator::operator*() assertion from r279314.

This testcase is poor quality.  bugpoint couldn't reduce any further,
and I haven't had time to dig into what's going on so I can't invent a
better one.  I didn't even get good CHECK lines in: this is just a
crasher.

I'm committing anyway since this is a real crash with an obvious fix,
but I'll leave PR29072 open and ask an ARM maintainer to help improve
the testcase.

llvm-svn: 279391
2016-08-21 00:08:10 +00:00
Tim Northover a11be04769 GlobalISel: support legalization of G_FCONSTANTs
llvm-svn: 279341
2016-08-19 22:40:08 +00:00
Tim Northover ea904f9424 GlobalISel: teach legalizer how to handle integer constants.
llvm-svn: 279340
2016-08-19 22:40:00 +00:00
Tim Shen b5e0f5ac95 [GraphTraits] Make nodes_iterator dereference to NodeType*/NodeRef
Currently nodes_iterator may dereference to a NodeType* or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later.

Differential Revision: https://reviews.llvm.org/D23704
Differential Revision: https://reviews.llvm.org/D23705

llvm-svn: 279326
2016-08-19 21:20:13 +00:00
Krzysztof Parzyszek 29a6a2eb8f [Hexagon] Avoid register dependencies on indirect branches in packetizer
Do not packetize the instruction setting the branch address with the
indirect branch itself.

llvm-svn: 279324
2016-08-19 21:07:35 +00:00
Justin Lebar d13880a750 [NVPTX] Switch nvptx-use-infer-addrspace to true.
Summary:
This switches us to use a different, more powerful algorithm for address
space inference.  I've tested this locally and it seems to work great.
Once we're more confident in it, we can remove the old pass altogether.

Reviewers: jingyue

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D23694

llvm-svn: 279317
2016-08-19 20:46:45 +00:00
Krzysztof Parzyszek fb4c4178a2 [Hexagon] Fix subesthetic indentation
llvm-svn: 279303
2016-08-19 19:29:15 +00:00
Krzysztof Parzyszek 505eb498bd [Hexagon] Allow i1 values for 'r' constraint in inline-asm
llvm-svn: 279302
2016-08-19 19:17:28 +00:00
Krzysztof Parzyszek 8849a51370 [Hexagon] Do not cache alloca instructions during isel
They can be deleted or replicated, so the cache may become outdated.
They only need to be visited once during frame lowering, so just scan
the function instead.

llvm-svn: 279297
2016-08-19 18:46:13 +00:00
Krzysztof Parzyszek 3d9946eb23 [Hexagon] Fixes for new-value jump formation
- Recognize C2_cmpgtui, S2_tstbit_i, and S4_ntstbit_i.
- Avoid creating new-value instructions with both source operands equal.

llvm-svn: 279286
2016-08-19 17:54:49 +00:00
Krzysztof Parzyszek 5a7bef9c14 [Hexagon] Fix a few omissions in HexagonInstrInfo
llvm-svn: 279280
2016-08-19 17:20:57 +00:00
Simon Pilgrim d7a3782ae4 [X86][SSE] Generalised combining to VZEXT_MOVL to any vector size
This doesn't change tests codegen as we already combined to blend+zero which is what we lower VZEXT_MOVL to on SSE41+ targets, but it does put us in a better position when we improve shuffling for optsize.

llvm-svn: 279273
2016-08-19 17:02:00 +00:00
Krzysztof Parzyszek 639545b4d8 [Hexagon] Enforce LLSC packetization rules
Ensure that load locked and store conditional instructions are only
packetized with ALU32 instructions.

Patch by Ben Craig.

llvm-svn: 279272
2016-08-19 16:57:05 +00:00
Krzysztof Parzyszek b7640d4df0 [Hexagon] Minor updates to register definitions
llvm-svn: 279269
2016-08-19 16:40:19 +00:00
Krzysztof Parzyszek 9335bf0ec5 [Hexagon] Fix incorrect generation of S4_subi_asl_ri
Patch by Jyotsna Verma.

llvm-svn: 279267
2016-08-19 16:35:05 +00:00
Krzysztof Parzyszek dddb097a1f [Hexagon] Add missing pattern for C4_cmplte
llvm-svn: 279265
2016-08-19 16:11:33 +00:00
Krzysztof Parzyszek 0b8672269c [Hexagon] Make p0 an explicit operand in VA1_clr* subinstructions, NFC
llvm-svn: 279255
2016-08-19 15:17:19 +00:00
Krzysztof Parzyszek 6ce82951c3 [Hexagon] Add explicit default constructor for HexagonSelectionDAGInfo
llvm-svn: 279254
2016-08-19 15:13:54 +00:00
Krzysztof Parzyszek 0ba9754584 [Hexagon] Allow tail-call optimization when mixing C and fast calling conv
Patch by Arnold Schwaighofer.

llvm-svn: 279251
2016-08-19 15:02:18 +00:00
Krzysztof Parzyszek 66dd6797e8 [Hexagon] Check for empty live interval
Patch by Brendon Cahoon.

llvm-svn: 279249
2016-08-19 14:29:43 +00:00
Krzysztof Parzyszek db019ae801 [Hexagon] Consider zext/sext of a load to i32 to be free
llvm-svn: 279248
2016-08-19 14:22:07 +00:00
Anton Korobeynikov b38195c1a8 Revert r279242 - it's failing the tests
llvm-svn: 279247
2016-08-19 14:18:34 +00:00
Krzysztof Parzyszek a243adfd27 [Hexagon] Handle J2_jumptpt and J2_jumpfpt instructions
llvm-svn: 279246
2016-08-19 14:14:09 +00:00
Krzysztof Parzyszek 067debe0a0 [Hexagon] Fix indentation, NFC
llvm-svn: 279245
2016-08-19 14:12:51 +00:00
Krzysztof Parzyszek 9273ecc176 [Hexagon] Remove unnecessary llvm::, NFC
llvm-svn: 279244
2016-08-19 14:10:57 +00:00
Krzysztof Parzyszek 75e74ee699 [Hexagon] Rename the HEXAGON_MC namespace to Hexagon_MC, NFC
llvm-svn: 279243
2016-08-19 14:09:47 +00:00
Anton Korobeynikov 2aae31a945 Fix PR27500: on MSP430 the branch destination offset is measured in words, not bytes.
In addition, the branch instructions will have proper BB destinations, not offsets, like before.

Patch by Vadzim Dambrouski!

Differential Revision: https://reviews.llvm.org/D20162

llvm-svn: 279242
2016-08-19 14:07:10 +00:00
Krzysztof Parzyszek 6421b934ec [Hexagon] Mark PS_jumpret as pseudo-instruction, expand it into J2_jumpr
llvm-svn: 279241
2016-08-19 14:04:45 +00:00
Krzysztof Parzyszek bd8ef4b8ce [Hexagon] Improvements to handling and generation of FP instructions
Improved handling of fma, floating point min/max, additional load/store
instructions for floating point types.

Patch by Jyotsna Verma.

llvm-svn: 279239
2016-08-19 13:34:31 +00:00
Simon Pilgrim f1b8fdc074 [X86][SSE] Add support for matching commuted insertps patterns
INSERTPS doesn't fit well with our shuffle mask canonicalization, so we need to attempt both the original mask and the commuted mask to more likely get a match

llvm-svn: 279230
2016-08-19 10:31:53 +00:00
Dean Michael Berris 1dd1ca9727 [XRay] Synthesize a reference to the xray_instr_map
Without the synthesized reference to a symbol in the xray_instr_map,
linker section garbage collection will helpfully remove the whole
xray_instr_map section from the final executable (or archive). This will
cause the runtime to not be able to identify the sleds and hot-patch the
calls/jumps into the runtime trampolines.

This change adds a reference from the text section at the end of the
function to keep around the associated xray_instr_map section as well.

We also make sure that we catch this reference in the test.

Reviewers: chandlerc, echristo, majnemer, mehdi_amini

Subscribers: mehdi_amini, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D23398

llvm-svn: 279204
2016-08-19 04:44:30 +00:00
Andrew Kaylor 81901d658f Include X86CallFrameOptimization in the opt-bisect process.
Differential Revision: https://reviews.llvm.org/D23683

llvm-svn: 279175
2016-08-18 22:49:51 +00:00
Saleem Abdulrasool dab786fb78 AArch64: remove extraneous padding
The structs BarrierOp, PrefetchOp, PSBHintOp are in AArch64AsmParser.cpp
(inside anonymous namespace).  This diff changes the order of fields and
removes the excessive padding (8 bytes).

Patch by Alexander Shaposhnikov!

llvm-svn: 279173
2016-08-18 22:35:06 +00:00
Zhan Jun Liau cf2f4b3251 [SystemZ] Use valid base/index regs for inline asm
Summary:
Inline asm memory constraints can have the base or index register be assigned
to %r0 right now. Make sure that we assign only ADDR64 registers to the base
and index.

Reviewers: uweigand

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23367

llvm-svn: 279157
2016-08-18 21:44:15 +00:00
Michael Kuperstein 2bc3d4d46c [SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.

Differential Revision: https://reviews.llvm.org/D23597

llvm-svn: 279129
2016-08-18 20:08:15 +00:00
Wei Ding 52bb661dec AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type.
Differential Revision: http://reviews.llvm.org/D23689

llvm-svn: 279126
2016-08-18 19:51:14 +00:00
Valery Pykhtin 609c2f8137 [AMDGPU] add s_incperflevel/s_decperflevel intrinsics.
Differential revision: https://reviews.llvm.org/D23666

llvm-svn: 279106
2016-08-18 18:06:20 +00:00
Elliot Colp 687691aeac Fix SystemZ compilation abort caused by negative AND mask
Normally, when an AND with a constant is lowered to NILL, the constant value is truncated to 16 bits. However, since r274066, ANDs whose results are used in a shift are caught by a different pattern that does not truncate. The instruction printer expects a 16-bit unsigned immediate operand for NILL, so this results in an abort.

This patch adds code to manually truncate the constant in this situation. The rest of the bits are then set, so we will detect a case for NILL "naturally" rather than using peephole optimizations.

Differential Revision: http://reviews.llvm.org/D21854

llvm-svn: 279105
2016-08-18 18:04:26 +00:00
Duncan P. N. Exon Smith 84c2da47f9 AArch64: Don't call getIterator() on iterators
Remove an unnecessary round-trip:

    iterator => operator->() => getIterator()

In some cases, the iterator is end(), so the dereference of operator->
is invalid (UB).

The testcase only crashes with r278974 (currently reverted to
investigate this), which adds an assertion for invalid dereferences of
ilist nodes.

Fixes PR29035.

llvm-svn: 279104
2016-08-18 17:58:09 +00:00
Eugene Zelenko 61a72d8850 [LLVM] Fix some Clang-tidy modernize-use-using and Include What You Use warnings
Differential revision: https://reviews.llvm.org/D23675

llvm-svn: 279102
2016-08-18 17:56:27 +00:00
Dan Gohman c9623db884 [WebAssembly] Disable the store-results optimization.
The WebAssemly spec removing the return value from store instructions, so
remove the associated optimization from LLVM.

This patch leaves the store instruction operands in place for now, so stores
now always write to "$drop"; these will be removed in a seperate patch.

llvm-svn: 279100
2016-08-18 17:51:27 +00:00
Ahmed Bougacha 33e19fe1c4 [AArch64][GlobalISel] Select floating-point binary ops.
There is no FREM instruction, but the others are straightforward.

llvm-svn: 279081
2016-08-18 16:05:11 +00:00
Derek Schuff ccdceda128 [WebAssembly] Refactor WebAssemblyLowerEmscriptenException pass for setjmp/longjmp
This patch changes the code structure of
WebAssemblyLowerEmscriptenException pass to support both exception
handling and setjmp/longjmp. It also changes the name of the pass and
the source file.

1. Change the file/pass name to WebAssemblyLowerEmscriptenExceptions ->
WebAssemblyLowerEmscriptenEHSjLj to make it clear that it supports both
EH and SjLj
2. List function / global variable names at the top so they
can be changed easily
3. Some cosmetic changes

Patch by Heejin Ahn

Differential Revision: https://reviews.llvm.org/D23588

llvm-svn: 279075
2016-08-18 15:27:25 +00:00
Ahmed Bougacha 1d0560b14d [AArch64][GlobalISel] Select G_SDIV/G_UDIV.
There is no REM instruction; that will require an expansion.
It's not obvious that should be done in select, rather than as a
(custom?) legalization.

llvm-svn: 279074
2016-08-18 15:17:13 +00:00
Sanjay Patel 7d37b221a2 fix typo; NFC
llvm-svn: 279068
2016-08-18 14:17:34 +00:00
Krzysztof Parzyszek b1b0372337 [Hexagon] Create vcombine in HexagonCopyToCombine
llvm-svn: 279067
2016-08-18 14:12:34 +00:00
Simon Dardis ea3431598e [mips] Correct tail call encoding for MIPSR6
r277708 enabled tails calls for MIPS but used the 'jr' instruction when the 
jump target was held in a register. For MIPSR6, 'jalr $zero, $reg' should
have been used. Additionally, add missing patterns for external and global
symbols for tail calls.

Reviewers: dsanders, vkalintiris

Differential Review: https://reviews.llvm.org/D23301

llvm-svn: 279064
2016-08-18 13:22:43 +00:00
Simon Pilgrim 916485c765 Remove trailing whitespace
llvm-svn: 279054
2016-08-18 11:22:22 +00:00
Dominic Chen a8a638292c [WebAssembly] Handle debug information and virtual registers without crashing (reland r278967)
Summary: Currently, enabling debug information when compiling for WebAssembly crashes the backend. This commit fixes these by skipping debug values in backend passes.

Reviewers: jfb, aprantl, dschuff, echristo

Subscribers: llvm-commits, dschuff, jfb, MatzeB, dexonsmith, yurydelendik, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23635

llvm-svn: 279011
2016-08-17 23:42:27 +00:00
Duncan P. N. Exon Smith afdd8e541b Revert "[WebAssembly] Handle debug information and virtual registers without crashing"
This reverts commit r278967, since the new test is failing when you
don't build the WebAssembly target (most people, since it's
off-by-default).

llvm-svn: 278973
2016-08-17 20:41:50 +00:00
Justin Bogner cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
Tim Northover de3aea0412 GlobalISel: support irtranslation of icmp instructions.
llvm-svn: 278969
2016-08-17 20:25:25 +00:00
Dominic Chen 4326167a37 [WebAssembly] Handle debug information and virtual registers without crashing
Summary: Currently, enabling debug information when compiling for WebAssembly crashes the backend. This commit fixes these by skipping debug values in backend passes.

Reviewers: jfb, aprantl, dschuff, echristo

Subscribers: mehdi_amini, yurydelendik, dexonsmith, MatzeB, jfb, dschuff, llvm-commits

Differential Revision: https://reviews.llvm.org/D21808

llvm-svn: 278967
2016-08-17 20:11:03 +00:00
Matt Arsenault d42d58cf21 AMDGPU: Remove dead option
llvm-svn: 278965
2016-08-17 20:07:16 +00:00
Simon Dardis ac96ec7906 [mips] Add l.[sd] and s.[sd] instruction aliases
Reviewers: dsanders, vkalintiris

Differential Review: https://reviews.llvm.org/D23121

llvm-svn: 278930
2016-08-17 14:45:09 +00:00
Jonas Paulsson 7a79422536 [LoopStrenghtReduce] Refactoring and addition of a new target cost function.
Refactored so that a LSRUse owns its fixups, as oppsed to letting the
LSRInstance own them. This makes it easier to rate formulas for
LSRUses, since the fixups are available directly. The Offsets vector
has been removed since it was no longer necessary.

New target hook isFoldableMemAccessOffset(), which is used during formula
rating.

For SystemZ, this is useful to express that loads and stores with
float or vector types with a big/negative offset should be avoided in
loops. Without this, LSR will generate a lot of negative offsets that
would require extra instructions for loading the address.

Updated tests:
test/CodeGen/SystemZ/loop-01.ll

Reviewed by: Quentin Colombet and Ulrich Weigand.
https://reviews.llvm.org/D19152

llvm-svn: 278927
2016-08-17 13:24:19 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Chuang-Yu Cheng f7ba716bcb [ppc64] Don't apply sibling call optimization if callee has any byval arg
This is a quick work around, because in some cases, e.g. caller's stack
size > callee's stack size, we are still able to apply sibling call
optimization even callee has any byval arg.

This patch fix: https://llvm.org/bugs/show_bug.cgi?id=28328

Reviewers: hfinkel kbarton nemanjai amehsan
Subscribers: hans, tjablin

https://reviews.llvm.org/D23441

llvm-svn: 278900
2016-08-17 03:17:44 +00:00
Chandler Carruth 67fc52f067 [PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.

This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:

- Array alloca merging
  - To support the above, bottom-up inlining with careful history
    tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
  Instead, it focuses on inlining functions with that attribute.

The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.

The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.

The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.

One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.

Anyways, hopefully a reasonable starting point for this pass.

Differential Revision: https://reviews.llvm.org/D23299

llvm-svn: 278896
2016-08-17 02:56:20 +00:00
Zijiao Ma 53d55f45a1 Some places that could using TargetParser in LLVM. NFC.
llvm-svn: 278888
2016-08-17 02:08:28 +00:00
Duncan P. N. Exon Smith ec083b59ed ARM: Avoid dereferencing end() in ARMFrameLowering::emitPrologue
llvm::tryFoldSPUpdateIntoPushPop assumes its arguments are valid
MachineInstrs.  Update ARMFrameLowering::emitPrologue to respect that;
when LastPush==end(), it can't possibly be a push instruction anyway.

llvm-svn: 278880
2016-08-17 00:53:04 +00:00
Duncan P. N. Exon Smith e04fe1a394 Hexagon: Avoid dereferencing end() in HexagonInstrInfo::InsertBranch
llvm-svn: 278878
2016-08-17 00:34:00 +00:00
Duncan P. N. Exon Smith db53d99d02 AMDGPU: Avoid looking for the DebugLoc in end()
The end() iterator isn't a safe thing to dereference.  Pass the DebugLoc
into EmitFetchClause and EmitALUClause to avoid it.

llvm-svn: 278873
2016-08-17 00:06:43 +00:00
Konstantin Zhuravlyov e0b87181cf [AMDGPU] Remove duplicate initialization of SIDebuggerInsertNops pass
Differential Revision: https://reviews.llvm.org/D23556

llvm-svn: 278863
2016-08-16 22:30:11 +00:00
Sanjay Patel 904cd39b05 [x86] Allow merging multiple instances of an immediate within a basic block for code size savings, for 64-bit constants.
This patch handles 64-bit constants which can be encoded as 32-bit immediates.

It extends the functionality added by https://reviews.llvm.org/D11363 for 32-bit constants to 64-bit constants.

Patch by Sunita Marathe!

Differential Revision: https://reviews.llvm.org/D23391

llvm-svn: 278857
2016-08-16 21:35:16 +00:00
Evandro Menezes 5a5b8dcd32 [AArch64] Adjust the scheduling model for Exynos M1.
Refine the model for the FP division unit.

llvm-svn: 278846
2016-08-16 20:35:01 +00:00
Evandro Menezes d03aff2e11 [AArch64] Adjust the scheduling model for Exynos M1.
Refine the model for the integer division unit.

llvm-svn: 278845
2016-08-16 20:34:58 +00:00
Matt Arsenault 7f19298bfa AMDGPU: Remove excessive padding from ImmOp and RegOp.
The structs ImmOp and RegOp are in AArch64AsmParser.cpp (inside
anonymous namespace).
This diff changes the order of fields and removes the excessive padding
(8 bytes).

Patch by Alexander Shaposhnikov

llvm-svn: 278844
2016-08-16 20:28:06 +00:00
Krzysztof Parzyszek 1d01a79304 [Hexagon] Standardize next batch of pseudo instructions
ALIGNA          PS_aligna
ALLOCA          PS_alloca
TFR_FI          PS_fi
TFR_FIA         PS_fia
TFR_PdFalse     PS_false
TFR_PdTrue      PS_true
VMULW           PS_vmulw
VMULW_ACC       PS_vmulw_acc

llvm-svn: 278832
2016-08-16 18:08:40 +00:00
Simon Dardis 4893aff94e [mips] Enforce compact branch restrictions
Check both operands for use of the $zero register which cannot be used with
a compact branch instruction.

Reviewers: dsanders, vkalintris

Differential Review: https://reviews.llvm.org/D23547

llvm-svn: 278824
2016-08-16 17:16:11 +00:00
Krzysztof Parzyszek eabc0d0fd5 [Hexagon] Clean up some miscellaneous V60 intrinsics a bit
llvm-svn: 278823
2016-08-16 17:14:44 +00:00
Krzysztof Parzyszek 17aa4136a2 [Hexagon] Standardize vector predicate load/store pseudo instructions
- Remove unused instructions: LDriq_pred_vec_V6, STriq_pred_vec_V6, and
  the 128B counterparts.
- Rename:
    LDriq_pred_V6         PS_vloadrq_ai
    LDriq_pred_V6_128B    PS_vloadrq_ai_128B
    STriq_pred_V6         PS_vstorerq_ai
    STriq_pred_V6_128B    PS_vstorerq_ai_128B

llvm-svn: 278813
2016-08-16 15:43:54 +00:00
Ahmed Bougacha e4c03abddd [AArch64][GlobalISel] Select G_MUL.
llvm-svn: 278810
2016-08-16 14:37:46 +00:00
Ahmed Bougacha 59e160a19c [AArch64][GlobalISel] Factor out unsupported binop check. NFC.
We're going to need it for G_MUL, and, if other targets end up using
something similar, we can easily put it in the generic selector.

llvm-svn: 278808
2016-08-16 14:37:40 +00:00
Ahmed Bougacha 2ac5bf94bc [AArch64][GlobalISel] Select (variable) shifts.
For now, no support for immediates.

llvm-svn: 278804
2016-08-16 14:02:47 +00:00
Ahmed Bougacha 0306b5ef07 [AArch64][GlobalISel] Select p0 G_FRAME_INDEX.
And mark it as legal.

llvm-svn: 278802
2016-08-16 14:02:42 +00:00
Pierre Gousseau 051db7d838 [x86] Refactor a PowerPC specific ctlz/srl transformation (NFC).
Following the discussion on D22038, this refactors a PowerPC specific setcc -> srl(ctlz) transformation so it can be used by other targets.

Differential Revision: https://reviews.llvm.org/D23445

llvm-svn: 278799
2016-08-16 13:53:53 +00:00
Simon Pilgrim cc316f013a [X86][SSE] Add support for combining v2f64 target shuffles to VZEXT_MOVL byte rotations
The combine was only matching v2i64 as it assumed lowering to MOVQ - but we have v2f64 patterns that match in a similar fashion

llvm-svn: 278794
2016-08-16 12:52:06 +00:00
Prakhar Bahuguna a27c4a0e66 Correct the upper bound for a CBZ/CBNZ branch target.
Summary:
Fix for the upper bound check that was causing a build failure.

Reviewers: olista01, rengolin, t.p.northover

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23501

llvm-svn: 278789
2016-08-16 10:41:56 +00:00
Prakhar Bahuguna 15ed7ec5aa [Thumb] Validate branch target for CBZ/CBNZ instructions.
Summary:
The assembler currently does not check the branch target for CBZ/CBNZ
instructions, which only permit branching forwards with a positive offset. This
adds validation for the branch target to ensure negative PC-relative offsets are
not encoded into the instruction, whether specified as a literal or as an
assembler symbol.

Reviewers: rengolin, t.p.northover

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D23312

llvm-svn: 278788
2016-08-16 10:41:52 +00:00
Simon Pilgrim f16cd361d4 [X86][SSE] Add support for combining target shuffles to PALIGNR byte rotations
llvm-svn: 278787
2016-08-16 10:03:23 +00:00
Job Noorman 6cd8c9a9d6 [AVR] Fix compile errors
Differential Revision: https://reviews.llvm.org/D23450

llvm-svn: 278784
2016-08-16 08:41:35 +00:00
Guy Blank 722caebdae [X86] Add xgetbv/xsetbv intrinsics to non-windows platforms
Differential Revision: https://reviews.llvm.org/D21958

llvm-svn: 278782
2016-08-16 06:41:00 +00:00
Reid Kleckner 229d32abfc [AMDGPU] Give enum an explicit 64-bit type to fix MSVC 2013 failures
Recall that MSVC always gives enums the type 'int', nothing else.  MSVC
2015 does not appear to have this problem anymore.

Clang-cl -Wmicrosoft-enum-value flags this, FWIW, so now I have a true
positive for my warning. :)

llvm-svn: 278762
2016-08-15 23:54:44 +00:00
Jan Vesely 0486f739a4 AMDGPU/R600: Convert buffer id to VTX_READ input
Use patterns instead of multiple instructions
Add buffer id to asm string

https://reviews.llvm.org/D22650

llvm-svn: 278749
2016-08-15 21:38:30 +00:00
Matthias Braun b948c52416 Revert "[Thumb] Validate branch target for CBZ/CBNZ instructions."
This currently breaks the greendragon clang-stage1-configure-RA/ and
brotli. It is probably just uncovering a pre-existing problem. Reverting
temporarily to get the buildbots green again. A reduced testcase will
follow shortly.

This reverts commit r278659.

llvm-svn: 278711
2016-08-15 18:50:13 +00:00
Yaxun Liu c7cbd72921 AMDGPU: Update AMDGPURuntimeMetadata.h for enums of address space qualifiers
llvm-svn: 278682
2016-08-15 16:54:25 +00:00
Matt Arsenault 3661e90e71 AMDGPU: Don't fold subregister extracts into tied operands
llvm-svn: 278676
2016-08-15 16:18:36 +00:00
Valery Pykhtin c761675ef4 [AMDGPU] fix failure on printing of non-existing instruction operands.
Differential revision: https://reviews.llvm.org/D23323

llvm-svn: 278665
2016-08-15 10:56:48 +00:00
Sjoerd Meijer 58156715b4 MachineLoop: add methods findLoopControlBlock and findLoopPreheader
This adds two new utility functions findLoopControlBlock and findLoopPreheader
to MachineLoop and MachineLoopInfo. These functions are refactored and taken
from the Hexagon target as they are target independent; thus this is intendend to
be a non-functional change.

Differential Revision: https://reviews.llvm.org/D22959

llvm-svn: 278661
2016-08-15 08:22:42 +00:00
Prakhar Bahuguna a305a435a6 [Thumb] Validate branch target for CBZ/CBNZ instructions.
Summary:
The assembler currently does not check the branch target for CBZ/CBNZ
instructions, which only permit branching forwards with a positive offset. This
adds validation for the branch target to ensure negative PC-relative offsets are
not encoded into the instruction, whether specified as a literal or as an
assembler symbol.

Reviewers: rengolin, t.p.northover

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D23312

llvm-svn: 278659
2016-08-15 07:57:44 +00:00
Craig Topper f774de6d54 [X86] PADDUSB/W instructions should be commutable.
llvm-svn: 278654
2016-08-15 06:31:57 +00:00
Craig Topper 80c8b80919 [X86] Mark some of the X86 SDNodes as commutative.
llvm-svn: 278653
2016-08-15 04:47:30 +00:00
Craig Topper dbc387cfc9 [X86] X86ISD::FANDN is not commutative or associative.
llvm-svn: 278652
2016-08-15 04:47:28 +00:00
Craig Topper 37e8c5443c [AVX-512] Mark VPMADDWD as commutable to match SSE/AVX version.
llvm-svn: 278629
2016-08-14 17:57:22 +00:00
Craig Topper c677e97dff [AVX-512] Add masked commutable floating point max/min instructions to folding tables.
llvm-svn: 278628
2016-08-14 17:57:19 +00:00
Craig Topper 29fbdc309a [AVX-512] Add masked logical operations to memory folding tables.
llvm-svn: 278627
2016-08-14 17:57:16 +00:00
Igor Breger 505f2cc468 [AVX512] Fix VFPCLASSSD/VFPCLASSSS intrinsic lowering. The i1 result should be zero extended according to SPEC.
Differential Revision: http://reviews.llvm.org/D23489

llvm-svn: 278626
2016-08-14 13:58:57 +00:00
Igor Breger 8672408db0 [AVX512] Fix insertelement i1 lowering.
1. Use shuffle to insert element i1 into vector. The previous implementation was incorrect ( dest_bit OR src_bit , it doesn't clear the bit if src_bit=0 )
2. Improve shuffle i1 vector, use CVT2MASK if supported instead TRUNCATE.

Differential Revision: http://reviews.llvm.org/D23347

llvm-svn: 278623
2016-08-14 05:25:07 +00:00
Ron Lieberman 822ee88ab8 Fix unsupported relocation type R_HEX_6_X' for symbol .rodata
LowerTargetConstantPool is not properly setting the TargetFlag to indicate
desired relocation. Coding error, the offset parameter was omitted, so the
TargetFlag was used as the offset, and the TargetFlag defaulted to zero.

This only affects -fpic compilation, and only those items created in a
Constant Pool, for example a vector of constants. Halide ran into this issue.

llvm-svn: 278614
2016-08-13 23:41:11 +00:00
Craig Topper 8c372a31b7 [X86] Add a check of isCommutable at the top of X86InstrInfo::findCommutedOpIndices. Most callers don't check if the instruction is commutable before calling.
This saves us the trouble of ending up in the default of the switch and having to determine if this is an FMA or not.

llvm-svn: 278597
2016-08-13 06:48:44 +00:00
Craig Topper eafdbecc44 [AVX-512] Add isCommutable to scalar FMA3 instructions.
llvm-svn: 278596
2016-08-13 06:48:41 +00:00
Craig Topper 5f2441d8f3 [AVX-512] Add commutable flags to 132 form FMA3 instructions.
llvm-svn: 278595
2016-08-13 06:48:39 +00:00
Craig Topper e5115aa4ca [X86] Remove patterns for (vzmovl (insert_subvector undef, (scalar_to_vector))) as the (vzmovl VR256) pattern has higher priority. NFC
llvm-svn: 278594
2016-08-13 06:02:19 +00:00
Craig Topper 3f8126e6fa [AVX-512] Remove an AddedComplexity that was prioritizing basic vzmovl patterns over more complex ones that produce better code.
llvm-svn: 278593
2016-08-13 05:43:20 +00:00
Craig Topper 600685d510 [AVX-512] Add patterns to support VZEXT_MOVL from 512-bit vectors with 64-bit and 32-bit elements.
Fixes PR28961.

llvm-svn: 278592
2016-08-13 05:33:12 +00:00
Matt Arsenault c1ebd82ebe AMDGPU: Fix not estimating MBB operand sizes correctly
llvm-svn: 278590
2016-08-13 01:43:54 +00:00
Matt Arsenault 3cc1e0066d AMDGPU: Fix missing test for addressing mode with odd offsets
Add test if the constant offset looks unaligned.

llvm-svn: 278589
2016-08-13 01:43:51 +00:00
Matt Arsenault 44f6d694b3 AMDGPU/R600: Remove macros
llvm-svn: 278588
2016-08-13 01:43:46 +00:00
Hans Wennborg 0dd9ed1d45 Fix more dereferenced end() iterators after r278532
llvm-svn: 278587
2016-08-13 01:12:49 +00:00
Hans Wennborg 2d87ccfd58 X86: Fix another dereferenced end() iterator after r278532
llvm-svn: 278577
2016-08-12 23:35:59 +00:00
Duncan P. N. Exon Smith 69b0650548 X86: Stop dereferencing end() in X86FrameLowering::emitEpilogue
On a Windows build of Chromium, r278532 (up to r278539)
X86FrameLowering::emitEpilogue because it wasn't wary enough of the
return of MachineBasicBlock::getFirstTerminator.  Guard all the uses
here.

Note that r278532 *looks* like an NFC commit (just an API change), but
it removes a couple of layers of abstraction and is probably causing
optimization differences in MSVC.

llvm-svn: 278572
2016-08-12 22:43:33 +00:00
Artem Belevich 2f0a3dfe64 [NVPTX] Use untyped (.b) integer registers in PTX.
This bring LLVM-generated PTX closer to what nvcc generates and avoids
triggering issues in ptxas.

For instance, ptxas does not accept .s16 (or .u16) registers as operands
for .fp16 instructions.

Differential Revision: https://reviews.llvm.org/D23460

llvm-svn: 278568
2016-08-12 22:02:19 +00:00
Krzysztof Parzyszek f285963608 [Hexagon] Cleanup and standardize vector load/store pseudo instructions
Remove the following single-vector load/store pseudo instructions, use real
instructions instead:
  LDriv_pseudo_V6         STriv_pseudo_V6
  LDriv_pseudo_V6_128B    STriv_pseudo_V6_128B
  LDrivv_indexed          STrivv_indexed
  LDrivv_indexed_128B     STrivv_indexed_128B

Rename the double-vector load/store pseudo instructions, add unaligned
counterparts:

  -- old --               -- new --            -- unaligned --
  LDrivv_pseudo_V6        PS_vloadrw_io        PS_vloadrwu_io
  LDrivv_pseudo_V6_128B   PS_vloadrw_io_128B   PS_vloadrwu_io_128B
  STrivv_pseudo_V6        PS_vstorerw_io       PS_vstorerwu_io
  STrivv_pseudo_V6_128B   PS_vstorerw_io_128   PS_vstorerwu_io_128

llvm-svn: 278564
2016-08-12 21:05:05 +00:00
Eli Friedman f184e4befc [AArch64LoadStoreOptimizer] Check aliasing correctly when creating paired loads/stores.
The existing code accidentally skipped the aliasing check in edge cases.

Differential revision: https://reviews.llvm.org/D23372

llvm-svn: 278562
2016-08-12 20:39:51 +00:00
Mike Aizatsky f4fdb5ddf3 [AArch64] Registering default MCInstrAnalysis
Even in this form it is useful: it can detect branch instructions.

https://github.com/google/sanitizers/issues/706

Subscribers: aemerson, rengolin

Differential Revision: https://reviews.llvm.org/D23426

llvm-svn: 278560
2016-08-12 20:28:05 +00:00
Eli Friedman 8585e9d33d [AArch64LoadStoreOpt] Handle offsets correctly for post-indexed paired loads.
Trunk would try to create something like "stp x9, x8, [x0], #512", which isn't actually a valid instruction.

Differential revision: https://reviews.llvm.org/D23368

llvm-svn: 278559
2016-08-12 20:28:02 +00:00
Tim Shen dc698c3e91 [PPC] Memoize getValueBits. NFC.
Summary: It triggers exponential behavior when the DAG has many branches.

Reviewers: hfinkel, kbarton

Subscribers: iteratee, nemanjai, echristo

Differential Revision: https://reviews.llvm.org/D23428

llvm-svn: 278548
2016-08-12 18:40:04 +00:00
Benjamin Kramer 9bc1b230fd [WebAssembly] Plug MachineMemOperand leaks.
llvm-svn: 278545
2016-08-12 18:33:50 +00:00
Artur Pilipenko 87e4038a91 [x86] X86ISelLowering zext(add_nuw(x, C)) --> add(zext(x), C_zext)
Currently X86ISelLowering has a similar transformation for sexts:
sext(add_nsw(x, C)) --> add(sext(x), C_sext)

In this change I extend this code to handle zexts as well.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D23359

llvm-svn: 278520
2016-08-12 16:08:30 +00:00
Geoff Berry 22dfbc5637 [AArch64] Re-factor code shared by AArch64LoadStoreOpt and AArch64InstrInfo.
This re-factoring could cause the following slight changes in generated
code, though none were observed during testing:

- MachineScheduler could decide not to cluster some loads/stores if
  there are other load/stores with non-pairable opcodes that have the
  same base register and offset as a pairable set of load/stores.  One
  case of different MachineScheduler pairing did show up in my testing,
  but it wasn't due to this issue, but due
  BaseMemOpClusterMutation::clusterNeighboringMemOps() being unstable
  w.r.t. the order it considers memory operations.  See PR28942.

- The ImplicitNullChecks optimization could be done for more load/store
  opcodes.  This optimization isn't done for C/C++ code, so it didn't
  show up in my testing.

Reviewers: mcrosier, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23365

llvm-svn: 278515
2016-08-12 15:26:00 +00:00
James Y Knight 2cc9da9a65 Revert "[Sparc] Leon errata fix passes."
...and the two followup commits:
Revert "[Sparc][Leon] Missed resetting option flags from check-in 278489."
Revert "[Sparc][Leon] Errata fixes for various errata in different
versions of the Leon variants of the Sparc 32 bit processor."

This reverts commit r274856, r278489, and r278492.

llvm-svn: 278511
2016-08-12 14:48:09 +00:00
Simon Pilgrim 687d71e877 [X86][SSE] Add support for combining target shuffles to PSLLDQ/PSRLDQ byte shifts
llvm-svn: 278502
2016-08-12 11:24:34 +00:00
Krzysztof Parzyszek be976d4ea9 [Hexagon] Standardize pseudo-instructions for calls and returns
- CALLv3nr        PS_call_nr
- CALLRv3nr       PS_callr_nr
- CALLstk         PS_call_stk

- TCRETURNi       PS_tailcall_i
- TCRETURNr       PS_tailcall_r

- JMPret          PS_jmpret
- JMPrett         PS_jmprett
- JMPretf         PS_jmpretf
- JMPrettnew      PS_jmprettnew
- JMPretfnew      PS_jmpretfnew
- JMPrettnewpt    PS_jmprettnewpt
- JMPretfnewpt    PS_jmpretfnewpt

llvm-svn: 278499
2016-08-12 11:12:02 +00:00
Krzysztof Parzyszek ab9127ca3c [Hexagon] Treat non-returning indirect calls as scheduling boundaries
llvm-svn: 278498
2016-08-12 11:01:10 +00:00
Simon Pilgrim ed96b9adfb [X86][SSE] Fixed PALIGNR target shuffle decode
The PALIGNR target shuffle decode was not taking into account that DecodePALIGNRMask (rather oddly) expects the operands to be in reverse order, nor was it detecting unary patterns, causing combines to combine with the incorrect input.

The cgbuiltin, auto upgrade and instruction comments code correctly swap the operands so are not affected.

llvm-svn: 278494
2016-08-12 10:10:51 +00:00
Chris Dewhurst 5247af24c3 [Sparc][Leon] Missed resetting option flags from check-in 278489.
llvm-svn: 278492
2016-08-12 09:54:39 +00:00
Chris Dewhurst 829f8efe55 [Sparc][Leon] Errata fixes for various errata in different versions of the Leon variants of the Sparc 32 bit processor.
The nature of the errata are listed in the comments preceding the errata fix passes. Relevant unit tests are implemented for each of these.

These changes update older versions of these errata fixes with improvements to code and unit tests.

Differential Revision: https://reviews.llvm.org/D21960

llvm-svn: 278489
2016-08-12 09:34:26 +00:00
Duncan P. N. Exon Smith f197b1f78f ADT: Remove all ilist_iterator => pointer casts, NFC
Remove all ilist_iterator to pointer casts.  There were two reasons for
casts:

  - Checking for an uninitialized (i.e., null) iterator.  I added
    MachineInstrBundleIterator::isValid() to check for that case.

  - Comparing an iterator against the underlying pointer value while
    avoiding converting the pointer value to an iterator.  This is
    occasionally necessary in MachineInstrBundleIterator, since there is
    an assertion in the constructors that the underlying MachineInstr is
    not bundled (but we don't care about that if we're just checking for
    pointer equality).

To support the latter case, I rewrote the == and != operators for
ilist_iterator and MachineInstrBundleIterator.

  - The implicit constructors now use enable_if to exclude
    const-iterator => non-const-iterator conversions from overload
    resolution (previously it was a compiler error on instantiation, now
    it's SFINAE).

  - The == and != operators are now global (friends), and are not
    templated.

  - MachineInstrBundleIterator has overloads to compare against both
    const_pointer and const_reference.  This avoids the implicit
    conversions to MachineInstrBundleIterator that assert, instead just
    checking the address (and I added unit tests to confirm this).

Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked
are in ilist.h, and no code outside of ilist*.h directly relies on this
UB end-iterator-to-pointer conversion anymore.  It's still needed for
ilist_*sentinel_traits, but I'll clean that up soon.

llvm-svn: 278478
2016-08-12 05:05:36 +00:00
David Majnemer 91a02f5bee Use the range variant of transform instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278477
2016-08-12 04:32:45 +00:00
David Majnemer 2d006e7673 Use the range variant of transform instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278476
2016-08-12 04:32:42 +00:00
David Majnemer c700490f48 Use the range variant of remove_if instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278475
2016-08-12 04:32:37 +00:00
David Majnemer 0da5afe717 Use the range variant of count_if instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278474
2016-08-12 04:32:29 +00:00
David Majnemer 42531260b3 Use the range variant of find/find_if instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278469
2016-08-12 03:55:06 +00:00
David Majnemer 562e82945e Use the range variant of find_if instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278443
2016-08-12 00:18:03 +00:00
David Majnemer 0d955d0bf5 Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278433
2016-08-11 22:21:41 +00:00
Vyacheslav Klochkov 6daefcf626 X86-FMA3: Implemented commute transformation for EVEX/AVX512 FMA3 opcodes.
This helped to improved memory-folding and register coalescing optimizations.

Also, this patch fixed the tracker #17229.

Reviewer: Craig Topper.
Differential Revision: https://reviews.llvm.org/D23108

llvm-svn: 278431
2016-08-11 22:07:33 +00:00
David Majnemer 0a16c22846 Use range algorithms instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278417
2016-08-11 21:15:00 +00:00
Krzysztof Parzyszek 1b689da04e [Hexagon] Allow non-returning calls in hardware loops
llvm-svn: 278416
2016-08-11 21:14:25 +00:00
Matt Arsenault 18da70dd2d AMDGPU: Remove unused tablegen utilities
llvm-svn: 278414
2016-08-11 21:08:43 +00:00
Wei Ding 70cda07526 AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32
Differential Revision: http://reviews.llvm.org/D23336

llvm-svn: 278403
2016-08-11 20:34:48 +00:00
Matt Arsenault 2ffe8fd2ce AMDGPU: Prune includes
llvm-svn: 278391
2016-08-11 19:18:50 +00:00
Krzysztof Parzyszek 258af19d99 [Hexagon] Standardize "select" pseudo-instructions
- PS_pselect: general register pairs
- PS_vselect: vector registers (+ 128B version)
- PS_wselect: vector register pairs (+ 128B version)

llvm-svn: 278390
2016-08-11 19:12:18 +00:00
Krzysztof Parzyszek 60f0b51485 [Hexagon] Skip byval arguments when checking parameter attributes
From the point of view of register assignment, byval parameters are
ignored: a byval parameter is not going to be assigned to a register,
and it will not affect the assignments of subsequent parameters.
When matching registers with parameters in the bit tracker, make sure
to skip byval parameters before advancing the registers.

llvm-svn: 278375
2016-08-11 18:15:16 +00:00
Matt Arsenault 56684d4538 AMDGPU: Fix crashes on memory functions
llvm-svn: 278369
2016-08-11 17:31:42 +00:00
Matt Arsenault 76837df6ff AArch64: Assert on analyzeBranch failing
llvm-svn: 278366
2016-08-11 17:22:59 +00:00
Matt Arsenault 4b5fc093d0 AMDGPU: Remove custom getSubReg
This was kind of confusing, the subregister
class shouldn't really be necessary.

llvm-svn: 278362
2016-08-11 17:15:32 +00:00
Matt Arsenault 69fd2c1179 AMDGPU: Remove unused tracking of flat instructions
llvm-svn: 278361
2016-08-11 17:15:28 +00:00
Duncan P. N. Exon Smith ec30cc2171 Hexagon: Avoid dereferencing end() in HexagonCopyToCombine::findPairable
Check for end() before skipping through debug values.  This avoids
dereferencing end() when the instruction is the final one in the basic
block.  (It still assumes that a debug value will not be the final
instruction in the basic block.  No tests seemed to violate that.)

Many Hexagon tests trigger this, but they happen to magically pass right
now.  I found this because WIP patches for PR26753 convert them into
crashes.

llvm-svn: 278355
2016-08-11 16:40:03 +00:00
Wei Ding 34e1753585 AMDGPU : Add LLVM intrinsics for SAD related instructions.
Differential Revision: http://reviews.llvm.org/D23133

llvm-svn: 278354
2016-08-11 16:33:53 +00:00
Duncan P. N. Exon Smith 62e351f5a4 X86: Use operator lookup for operator==, NFC
Avoid relying on the MachineInstrBundleIterator operator== being
implemented as a member function.

llvm-svn: 278347
2016-08-11 15:51:29 +00:00
Valery Pykhtin 82c73bee2b Revert "[AMDGPU] fix failure on printing of non-existing instruction operands."
This reverts revision 278333, newly added test failed.

llvm-svn: 278336
2016-08-11 14:22:05 +00:00
Valery Pykhtin 3048ff6ec3 [AMDGPU] fix failure on printing of non-existing instruction operands.
Differential revision: https://reviews.llvm.org/D23323

llvm-svn: 278333
2016-08-11 13:49:46 +00:00
Simon Pilgrim 5c91764af5 Fixed VS2015 (Update 3) warning - differing const/volatile qualifiers for overridden function
Dropped the const qualifier to match llvm::CallLowering::lowerCall

llvm-svn: 278329
2016-08-11 12:19:43 +00:00
Igor Breger a77b14d02c [AVX512] Fix extractelement i1 lowering.
The previous implementation (not custom) doesn't enforce zeroing off upper bits. The assumption is that i1 PRODUCER (truncate and extractelement) must zero all upper bits, so i1 CONSUMER instructions ( test, zext, save, etc) can be done without additional zeroing.
Make extractelement i1 lowering custom for all vector i1.

Differential Revision: http://reviews.llvm.org/D23246

llvm-svn: 278328
2016-08-11 12:13:46 +00:00
Marina Yatsina 88f0c31f13 Avoid false dependencies of undef machine operands
This patch helps avoid false dependencies on undef registers by updating the machine instructions' undef operand to use a register that the instruction is truly dependent on, or use a register with clearance higher than Pref.

Pseudo example:

loop:
xmm0 = ...
xmm1 = vcvtsi2sdl eax, xmm0<undef>
... = inst xmm0
jmp loop

In this example, selecting xmm0 as the undef register creates false dependency between loop iterations.
This false dependency cannot be solved by inserting an xor before vcvtsi2sdl because xmm0 is alive at the point of the vcvtsi2sdl instruction.
Selecting a different register instead of xmm0, especially a register that is not used in the loop, will eliminate this problem.

Differential Revision: https://reviews.llvm.org/D22466

llvm-svn: 278321
2016-08-11 07:32:08 +00:00
Craig Topper a78b768ed4 [AVX-512] Promote 512-bit integer loads to v8i64 similar to what is done for 128/256-bit vectors for overall consistency.
llvm-svn: 278318
2016-08-11 06:04:07 +00:00
Craig Topper 14aa2665d3 [AVX-512] Add patterns to allow EVEX encoded stores of v16i16/v8i16/v16i8/v32i8 even when BWI is not supported.
llvm-svn: 278317
2016-08-11 06:04:04 +00:00
Craig Topper 3563d0f622 [AVX-512] Fix the 128-bit and 256-bit nontemporal load patterns with elements type other than i64. These loads have all been promoted to v2i64/v4i64 loads so we need bitcasts or we end up selecting VMOVDQA32/VMOVDQU32 instead.
llvm-svn: 278316
2016-08-11 06:04:00 +00:00
Dominic Chen 4173fffa08 [WebAssembly] Cleanup trailing whitespace
Summary: Test for commit access.

Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D23392

llvm-svn: 278313
2016-08-11 04:10:56 +00:00
Tim Northover 406024a108 GlobalISel: implement simple function calls on AArch64.
We're still limited in the arguments we support, but this at least handles the
basic cases.

llvm-svn: 278293
2016-08-10 21:44:01 +00:00
Changpeng Fang fb9c3818dd AMDGPU/SI: Implement amdgcn image intrinsics with sampler
Summary:
  This patch define and implement amdgcn image intrinsics with sampler.

    1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty,
       and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name
       to have three suffixes to overload each of these three types;

    2. D128 as well as two other flags are implied in the three types, for example,
       if you use v8i32 as resource type, then r128 is 0!

    3. don't expose TFE flag, and other flags are exposed in the instruction order:
       unrm, glc, slc, lwe and da.

Differential Revision: http://reviews.llvm.org/D22838

Reviewed by:
  arsenm and tstellarAMD

llvm-svn: 278291
2016-08-10 21:15:30 +00:00
Matt Arsenault 61f8ba8b79 AMDGPU: s_setpc_b64 should be an indirect branch
llvm-svn: 278278
2016-08-10 19:20:02 +00:00
Matt Arsenault c6b1350039 AMDGPU: Set sizes on control flow pseudos
llvm-svn: 278276
2016-08-10 19:11:51 +00:00
Matt Arsenault f4af802381 AMDGPU: Remove empty file comment
llvm-svn: 278275
2016-08-10 19:11:48 +00:00
Matt Arsenault 11587d97be AMDGPU: Remove unnecessary cast
llvm-svn: 278274
2016-08-10 19:11:45 +00:00
Matt Arsenault 57431c9680 AMDGPU: Change insertion point of si_mask_branch
Insert before the skip branch if one is created.
This is a somewhat more natural placement relative
to the skip branches, and makes it possible to implement
analyzeBranch for skip blocks.

The test changes are mostly due to a quirk where
the block label is not emitted if there is a terminator
that is not also a branch.

llvm-svn: 278273
2016-08-10 19:11:42 +00:00
Matt Arsenault b920e9987d AMDGPU: Use CreateStackObject instead of CreateSpillStackObject
I'm not sure what the difference is, but no other target
uses this for emergency spill slots.

llvm-svn: 278272
2016-08-10 19:11:36 +00:00
Sanjay Patel 5ccc85fe83 [x86, AVX] allow FP vector select folding to bitwise logic ops (PR28895)
This handles the case in:
https://llvm.org/bugs/show_bug.cgi?id=28895

...but we are not getting all of the possibilities yet. 
Eg, we use 'X86::FANDN' for scalar FP select combines.

That enhancement is filed as:
https://llvm.org/bugs/show_bug.cgi?id=28925

Differential Revision: https://reviews.llvm.org/D23337

llvm-svn: 278270
2016-08-10 19:00:11 +00:00