Commit Graph

43409 Commits

Author SHA1 Message Date
Rafael Espindola 9f92995781 Don't pass the code model to MC
I was surprised to see the code model being passed to MC. After all,
it assembles code, it doesn't create it.

The one place it is used is in the expansion of .cfi directives to
handle .eh_frame being more that 2gb away from the code.

As far as I can tell, gnu assembler doesn't even have an option to
enable this. Compiling a c file with gcc -mcmodel=large produces a
regular looking .eh_frame. This is probably because in practice linker
parse and recreate .eh_frames.

In llvm this is used because the JIT can place the code and .eh_frame
very far apart. Ideally we would fix the jit and delete this
option. This is hard.

Apart from confusion another problem with the current interface is
that most callers pass CodeModel::Default, which is bad since MC has
no way to map it to the target default if it actually needed to.

This patch then replaces the argument with a boolean with a default
value. The vast majority of users don't ever need to look at it. In
fact, only CodeGen and llvm-mc use it and llvm-mc just to enable more
testing.

llvm-svn: 309884
2017-08-02 20:32:26 +00:00
Stefan Pintilie 873889ca16 [Power9] Exploit vector absolute difference instructions on Power 9
Power 9 has instructions to do absolute difference (VABSDUB, VABSDUH, VABSDUW)
for byte, halfword and word. We should take advantage of these.

Differential Revision: https://reviews.llvm.org/D34684

llvm-svn: 309876
2017-08-02 20:07:21 +00:00
Matt Arsenault 2738ede6b2 AMDGPU: Restore using MRI to find highest used regs
If there are no calls, this is a faster path than
searching the entire program for calls.

This was supposed to be left in r309781.
Fixes unused variable warning.

llvm-svn: 309832
2017-08-02 17:15:01 +00:00
Adrian Prantl 61b39b2aec Remove unused includes of MachineLocation.h (NFC)
llvm-svn: 309824
2017-08-02 15:32:18 +00:00
Florian Hahn 31f78fd0ae [AArch64] Simplify AES*Tied pseudo expansion (NFC).
Summary:
Suggested by @t.p.northover in https://bugs.llvm.org/show_bug.cgi?id=34015.


Reviewers: javed.absar, t.p.northover, rengolin

Reviewed By: t.p.northover

Subscribers: aemerson, kristof.beyls, llvm-commits, t.p.northover

Differential Revision: https://reviews.llvm.org/D36223

llvm-svn: 309821
2017-08-02 15:17:19 +00:00
Matt Arsenault 8e8f8f43b0 AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it
llvm-svn: 309783
2017-08-02 01:52:45 +00:00
Matt Arsenault 1d6317c3ad AMDGPU: Fix emitting encoded calls
This was failing on out of bounds access to the extra operands
on the s_swappc_b64 beyond those in the instruction definition.

This was working, but somehow regressed within the past few weeks,
although I don't see any obvious commit.

llvm-svn: 309782
2017-08-02 01:42:04 +00:00
Matt Arsenault 6ed7b9bfc0 AMDGPU: Analyze callee resource usage in AsmPrinter
llvm-svn: 309781
2017-08-02 01:31:28 +00:00
Stanislav Mekhanoshin f23ae4fbe9 [AMDGPU] Fix asan error after last commit
Previous change "Turn s_and_saveexec_b64 into s_and_b64 if
result is unused" introduced asan use-after-poison error.
Instruction was analyzed after eraseFromParent() calls.

Move analysys higher than erase.

llvm-svn: 309779
2017-08-02 01:18:57 +00:00
Matt Arsenault d1867c0345 AMDGPU: Don't place arguments in emergency stack slot
When finding the fixed offsets for function arguments,
this needs to skip over the 4 bytes reserved for the
emergency stack slot.

llvm-svn: 309776
2017-08-02 00:59:51 +00:00
Stanislav Mekhanoshin da0edef1bd [AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unused
With SI_END_CF elimination for some nested control flow we can now
eliminate saved exec register completely by turning a saveexec version
of instruction into just a logical instruction.

Differential Revision: https://reviews.llvm.org/D36007

llvm-svn: 309766
2017-08-01 23:44:35 +00:00
Stanislav Mekhanoshin 37e7f959c0 [AMDGPU] Collapse adjacent SI_END_CF
Add a pass to remove redundant S_OR_B64 instructions enabling lanes in
the exec. If two SI_END_CF (lowered as S_OR_B64) come together without any
vector instructions between them we can only keep outer SI_END_CF, given
that CFG is structured and exec bits of the outer end statement are always
not less than exec bit of the inner one.

This needs to be done before the RA to eliminate saved exec bits registers
but after register coalescer to have no vector registers copies in between
of different end cf statements.

Differential Revision: https://reviews.llvm.org/D35967

llvm-svn: 309762
2017-08-01 23:14:32 +00:00
Haicheng Wu 50692a203c [AArch64] Fix a typo in isExtFreeImpl()
next => not

Differential Revision: https://reviews.llvm.org/D36104

llvm-svn: 309748
2017-08-01 21:26:45 +00:00
Eugene Zelenko 52889219ef [Hexagon] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 309746
2017-08-01 21:20:10 +00:00
Martin Storsjo eacf4e408b [AArch64] Rewrite stack frame handling for win64 vararg functions
The previous attempt, which made do with a single offset in
computeCalleeSaveRegisterPairs, wasn't quite enough. The previous
attempt only worked as long as CombineSPBump == true (since the
offset would be adjusted later in fixupCalleeSaveRestoreStackOffset).

Instead include the size for the fixed stack area used for win64
varargs in calculations in emitPrologue/emitEpilogue. The stack
consists of mainly three parts;
- AFI->getLocalStackSize()
- AFI->getCalleeSavedStackSize()
- FixedObject

Most of the places in the code which previously used the CSStackSize
now use PrologueSaveSize instead, which is the sum of the latter
two, while some cases which need exactly the middle one use
AFI->getCalleeSavedStackSize() explicitly instead of a local variable.

In addition to moving the offsetting into emitPrologue/emitEpilogue
(which fixes functions with CombineSPBump == false), also set the
frame pointer to point to the right location, where the frame pointer
and link register actually are stored. In addition to the prologue/epilogue,
this also requires changes to resolveFrameIndexReference.

Add tests for a function that keeps a frame pointer and another one
that uses a VLA.

Differential Revision: https://reviews.llvm.org/D35919

llvm-svn: 309744
2017-08-01 21:13:54 +00:00
Matt Arsenault 206f826348 AMDGPU: Fix handling of div_scale with undef inputs
The src0 register must match src1 or src2, but if these
were undefined they could end up using different implicit_defed
virtual registers. Force these to use one undef vreg or pick the
defined other register.

Also fixes producing invalid nodes without the right number of
inputs when src2 is undef.

llvm-svn: 309743
2017-08-01 20:49:41 +00:00
Matt Arsenault b62a4eb524 AMDGPU: Initial implementation of calls
Includes a hack to fix the type selected for
the GlobalAddress of the function, which will be
fixed by changing the default datalayout to use
generic pointers for 0.

llvm-svn: 309732
2017-08-01 19:54:18 +00:00
Davide Italiano ffb1098e92 [AMDGPU] Put a function used only inside assert() under NDEBUG.
llvm-svn: 309723
2017-08-01 19:07:20 +00:00
Jacques Pienaar 922928b62d [lanai] Add getIntImmCost in LanaiTargetTransformInfo.
Add simple int immediate cost function.

llvm-svn: 309721
2017-08-01 18:40:08 +00:00
Simon Pilgrim 486072d3d6 [X86][SSE] Added missing vector logic intrinsic schedules
Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

Merged SSE_VEC_BIT_ITINS_P + SSE_BIT_ITINS_P as we were interchanging between them.

llvm-svn: 309715
2017-08-01 17:51:20 +00:00
Craig Topper 2a5bba7325 [X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large mask
Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36129

llvm-svn: 309706
2017-08-01 17:18:14 +00:00
Simon Pilgrim 3f24ff6130 [X86][SSE] Added missing PACKSS/PACKUS intrinsic schedules
Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

Checked on Agner that these actually match the UNPACK schedules, but better to include a separate class

llvm-svn: 309701
2017-08-01 16:47:48 +00:00
Simon Pilgrim 810677eba2 [X86][SSSE3] Added missing PHADDS/PHSUBS/PSIGN intrinsic schedules
llvm-svn: 309699
2017-08-01 16:18:25 +00:00
Craig Topper 2462a713ae [AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32.
We were already using the 32 bit element opcode if BWI isn't enabled, but there's no reason to change opcode if we have BWI. We will still use the 8/16 opcodes for masked stores though.

This allows us to use the aligned opcode when we can which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need.

This is a slight inconsistency with loads which default to 64 bit element opcodes. I'll probably rectify that in a future patch.

Differential Revision: https://reviews.llvm.org/D35978

llvm-svn: 309693
2017-08-01 15:31:24 +00:00
Strahinja Petrovic a2b4748bdc [Mips] Fix for BBIT octeon instruction
This patch enables control flow optimization for
variations of BBIT instruction. In this case
optimization removes unnecessary branch after
BBIT instruction.

Differential Revision: https://reviews.llvm.org/D35359

llvm-svn: 309679
2017-08-01 13:42:45 +00:00
Krzysztof Parzyszek 91ff5c6d47 [Hexagon] Convert HVX vector constants of i1 to i8
Certain operations require vector of i1 values. However, for Hexagon
architecture compatibility, they need to be represented as vector of i8.

Patch by Suyog Sarda.

llvm-svn: 309677
2017-08-01 13:12:53 +00:00
Tom Stellard 9d8337d857 AMDGPU/GlobalISel: Add support for amdgpu_vs calling convention
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35916

llvm-svn: 309675
2017-08-01 12:38:33 +00:00
Craig Topper 410d252f5b [AVX-512] Add unmasked subvector inserts and extract to the execution domain tables.
llvm-svn: 309632
2017-07-31 22:07:29 +00:00
Konstantin Belochapka b77d0a5cf1 [X86][MMX] Added custom lowering action for MMX SELECT (PR30418)
Fix for pr30418 - error in backend: Cannot select: t17: x86mmx = select_cc t2, Constant:i64<0>, t7, t8, seteq:ch
Differential Revision: https://reviews.llvm.org/D34661

llvm-svn: 309614
2017-07-31 20:11:49 +00:00
Craig Topper cb0e74975a [AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer vmovdqa64/vmovdqu64 instead.
These were taking priority over the aligned load instructions since there is no vmovda8/16. I don't think there is really a difference between aligned and unaligned on newer cpus so I don't think it matters which instructions we use.

But with this change we reduce the size of the isel table a little and we allow the aligned information to pass through to the evex->vec pass and produce the same output has avx/avx2 in some cases.

I also generally dislike patterns rooted in a bitcast which these were.

Differential Revision: https://reviews.llvm.org/D35977

llvm-svn: 309589
2017-07-31 17:35:44 +00:00
Simon Pilgrim 7b89ab5887 Strip trailing whitespace. NFCI.
llvm-svn: 309584
2017-07-31 17:09:27 +00:00
Simon Pilgrim 77bdbc197e Fix typo in comment.
llvm-svn: 309583
2017-07-31 17:06:55 +00:00
Aditya Nandakumar 02c602e18c [GISel]: Support Widening G_ICMP's destination operand.
Updated AArch64 to widen destination to s32.
https://reviews.llvm.org/D35737

Reviewed by Tim

llvm-svn: 309579
2017-07-31 17:00:16 +00:00
Amaury Sechet 4358c5217d Do not recombine FMA when that is not needed.
Summary: As per title. This creates useless recombines.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33848

llvm-svn: 309578
2017-07-31 16:56:25 +00:00
Florian Hahn a3ad61d874 Exclude more unused functions from release build.
llvm-svn: 309576
2017-07-31 16:44:28 +00:00
Alexey Bataev 3e9b3eb91d [Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC.
llvm-svn: 309563
2017-07-31 14:19:32 +00:00
Florian Hahn 6b3216aad8 Guard print() functions only used by dump() functions.
Summary:
Since  r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.

Reviewers: MatzeB

Reviewed By: MatzeB

Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D35949

llvm-svn: 309553
2017-07-31 10:07:49 +00:00
Guy Blank b169d56dc3 [X86][AVX512] Add masked MOVS[S|D] patterns
Added patterns to recognize AND 1 on the mask of a scalar masked
move is not needed since only the lower bit is relevant for the
instruction.

Differential Revision:
https://reviews.llvm.org/D35897

llvm-svn: 309546
2017-07-31 08:26:14 +00:00
Hiroshi Inoue 5703fe37ab [PowerPC] Change method names; NFC
Changed method names based on the discussion in https://reviews.llvm.org/D34986:
getInt64 -> selectI64Imm,
getInt64Count -> selectI64ImmInstrCount.

llvm-svn: 309541
2017-07-31 06:27:09 +00:00
Craig Topper 97e9fa7954 [X86] Add pattern to use bzhi for 64-bit 'and' with a mask when there is a load involved.
We already had a pattern without load, but with a load we were falling back to a regular 'and' due to pattern complexity priority.

llvm-svn: 309535
2017-07-31 05:55:54 +00:00
Coby Tayree 48d67cdbb4 [x86][inline-asm][ms-compat] legalize the use of "jc/jz short <op>"
MS ignores the keyword "short" when used after a jc/jz instruction, LLVM ought to do the same.
Test: D35893

Differential Revision: https://reviews.llvm.org/D35892

llvm-svn: 309509
2017-07-30 11:12:47 +00:00
Craig Topper 951f0ca104 [X86] Add addsub intrinsics to the intrinsic lowering table so we have a single set of isel patterns.
llvm-svn: 309502
2017-07-30 06:02:59 +00:00
Florian Hahn f63a5e91db [AArch64] Tie source and destination operands for AESMC/AESIMC.
Summary:
Most CPUs implementing AES fusion require instruction pairs of the form
    AESE Vn, _
    AESMC Vn, Vn
and
    AESD Vn, _
    AESIMC Vn, Vn

The constraint is added to AES(I)MC instructions which use the result of
an AES(E|D) instruction by using AES(I)MCTrr pseudo instructions, which
constraint source and destination registers to be the same.

A nice side effect of this change is that now all possible pairs are
scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll
test case.

I had to update aes_load_store. The version I added initially was very
reduced and with the new constraint, AESE/AESMC could not be scheduled
back-to-back. I updated the test to be more realistic and still expose
the same scheduling problem as the initial test case.

Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga

Reviewed By: t.p.northover, evandro

Subscribers: aemerson, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35299

llvm-svn: 309495
2017-07-29 20:35:28 +00:00
Florian Hahn 2f86e3d494 [AArch64] Use 8 bytes as preferred function alignment on Cortex-A53.
Summary:
This change gives a 0.25% speedup on execution time, a 0.82% improvement
in benchmark scores and a 0.20% increase in binary size on a Cortex-A53.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite and a range of proprietary suites.

Reviewers: t.p.northover, aadg, silviu.baranga, mcrosier, rengolin

Reviewed By: rengolin

Subscribers: grimar, davide, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35568

llvm-svn: 309494
2017-07-29 20:04:54 +00:00
Simon Pilgrim 718cb0ea62 [SelectionDAG][X86] CombineBT - more aggressively determine demanded bits
This patch is in 2 parts:

1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT.

2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match.

Differential Revision: https://reviews.llvm.org/D35896

llvm-svn: 309486
2017-07-29 14:50:25 +00:00
Tom Stellard 503fd446ad AMDGPU: Remove deadcode from AMDGPUInstPrinter
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D36034

llvm-svn: 309477
2017-07-29 03:56:53 +00:00
Tom Stellard 5c50cdf0e8 AMDGPU: Move INDIRECT_BASE_ADDR definition out of common files
Summary: This is only used by R600.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35926

llvm-svn: 309476
2017-07-29 03:44:07 +00:00
Jessica Paquette d87f54493d [MachineOutliner] NFC: Change IsTailCall to a call class + frame class
This commit

- Removes IsTailCall and replaces it with a target-defined unsigned
- Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
- Adds a call class + frame class classification to OutlinedFunction and Candidate respectively

This accomplishes a couple things.

Firstly, we don't need the notion of *tail call* in the general outlining algorithm.

Secondly, we now can have different "outlining classes" for each candidate within a set of candidates.
This will make it easy to add new ways to outline sequences for certain targets and dynamically choose
an appropriate cost model for a sequence depending on the context that that sequence lives in.

Ultimately, this should get us closer to being able to do something like, say avoid saving the link
register when outlining AArch64 instructions.

llvm-svn: 309475
2017-07-29 02:55:46 +00:00
Matt Arsenault 9608a2891d AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flat
Checking the encoding is insufficient since now there can
be global or scratch instructions.

llvm-svn: 309472
2017-07-29 01:26:21 +00:00
Matt Arsenault dc8f5cc39c AMDGPU: Teach isLegalAddressingMode about global_* instructions
Also refine the flat check to respect flat-for-global feature,
and constant fallback should check global handling, not
specifically MUBUF.

llvm-svn: 309471
2017-07-29 01:12:31 +00:00