50999 Commits

Author SHA1 Message Date
Simon Atanasyan 8cb497027d [mips] Emit `.module softfloat` directive
This change fixes an assertion failure when using the
`soft float` ABI for the mips32r6 target.

llvm-svn: 354882
2019-02-26 14:45:17 +00:00
Igor Kudrin 2d3faad706 [llvm-objdump] Implement -Mreg-names-raw/-std options.
The --disassembler-options, or -M, are used to customize
the disassembler and affect its output.

The two implemented options allow selecting register names on ARM:
* With -Mreg-names-raw, the disassembler uses rNN for all registers.
* With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15,
  which is the default behavior of llvm-objdump.

Differential Revision: https://reviews.llvm.org/D57680

llvm-svn: 354870
2019-02-26 12:15:14 +00:00
Luke Cheeseman 9e285bef2b [ARM] Add Cortex-M35P
- Add LLVM backend support for Cortex-M35P
- Documentation can be found at
  https://developer.arm.com/products/processors/cortex-m/cortex-m35p

Differential Revision: https://reviews.llvm.org/D57763

llvm-svn: 354868
2019-02-26 12:02:12 +00:00
Dan Gohman c71132c0be [WebAssembly] Properly align fp128 arguments in outgoing varargs arguments
For outgoing varargs arguments, it's necessary to check the OrigAlign field
of the corresponding OutputArg entry to determine argument alignment, rather
than just computing an alignment from the argument value type. This is
because types like fp128 are split into multiple argument values, with
narrower types that don't reflect the ABI alignment of the full fp128.

This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase.

Differential Revision: https://reviews.llvm.org/D58656

llvm-svn: 354846
2019-02-26 05:20:19 +00:00
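The alignment issue described in the WebAssembly commit above (c71132c0be) can be illustrated with a small layout calculation. This is a minimal sketch in Python, not the backend's code: the function names, the 4-byte `int`, the 8-byte halves, the 16-byte fp128 alignment, and the idea that the trailing half carries an original alignment of 1 are all assumptions chosen for illustration.

```python
def align_to(offset, align):
    """Round offset up to the next multiple of align (a power of two)."""
    return (offset + align - 1) & ~(align - 1)


def vararg_offsets(parts, use_orig_align):
    """Return the buffer offset assigned to each argument part.

    Each part is (size, part_align, orig_align):
      part_align is the alignment of the (possibly split) value type,
      orig_align is the alignment recorded for the original argument.
    """
    offsets = []
    offset = 0
    for size, part_align, orig_align in parts:
        # Hypothetical rule: honor the original alignment when asked to.
        align = max(part_align, orig_align) if use_orig_align else part_align
        offset = align_to(offset, align)
        offsets.append(offset)
        offset += size
    return offsets


# An int followed by an fp128 that was split into two 8-byte halves.
# Assumption: only the first half carries the full 16-byte alignment.
PARTS = [(4, 4, 4), (8, 8, 16), (8, 8, 1)]

by_value_type = vararg_offsets(PARTS, use_orig_align=False)
by_orig_align = vararg_offsets(PARTS, use_orig_align=True)
```

Under these assumptions, computing alignment from the narrow value type places the fp128 at offset 8, while consulting the original alignment places it at offset 16, which is what a callee reading a 16-byte-aligned fp128 would expect.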
Philip Reames 38b14e33a8 [ARM] Be super conservative about atomics
As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.

Differential Revision: https://reviews.llvm.org/D58490

Note: D58498 landed in several pieces as individual backends were approved.  This is the last chunk.
llvm-svn: 354845
2019-02-26 04:30:33 +00:00
Heejin Ahn d2a56ac661 [WebAssembly] Fix a bug deleting instruction in a ranged for loop
Summary: We shouldn't delete elements while iterating a ranged for loop.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58519

llvm-svn: 354844
2019-02-26 04:08:49 +00:00
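The hazard named in the commit above (d2a56ac661) has a close analog in Python, sketched here with hypothetical function names and string "instructions". In C++ the erased element's iterator becomes invalid; in Python the failure mode is different (elements are silently skipped), but the fix has the same shape: collect first, delete afterwards.

```python
def erase_dead_unsafe(instrs):
    """Delete while iterating: removal shifts later items left, so the
    element right after each deleted one is never visited."""
    for ins in instrs:
        if ins.startswith("dead"):
            instrs.remove(ins)
    return instrs


def erase_dead_safe(instrs):
    """Collect the elements to delete first, then delete them afterwards."""
    to_delete = [ins for ins in instrs if ins.startswith("dead")]
    for ins in to_delete:
        instrs.remove(ins)
    return instrs


unsafe_result = erase_dead_unsafe(["dead1", "dead2", "add"])
safe_result = erase_dead_safe(["dead1", "dead2", "add"])
```

The unsafe version leaves `dead2` behind because the loop index has already moved past it when `dead1` is removed.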
Reid Kleckner 2f055f026a [X86] Fix bug in x86_intrcc with arg copy elision
Summary:
Use a custom calling convention handler for interrupts instead of fixing
up the locations in LowerMemArgument. This way, the offsets are correct
when constructed and we don't need to account for them in as many
places.

Depends on D56883

Replaces D56275

Reviewers: craig.topper, phil-opp

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56944

llvm-svn: 354837
2019-02-26 02:11:25 +00:00
Matt Arsenault 752579736e RegBankSelect: Handle slightly more complex value mappings
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.

llvm-svn: 354828
2019-02-25 22:24:13 +00:00
Matt Arsenault f4bfe4cd17 AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes
llvm-svn: 354825
2019-02-25 21:32:48 +00:00
Matt Arsenault 82b103998b AMDGPU/GlobalISel: Clamp max implicit_def elements
llvm-svn: 354818
2019-02-25 20:46:06 +00:00
Matt Arsenault f97ace5639 AMDGPU: Remove IntrReadMem from memtime/memrealtime intrinsics
EarlyCSE with MemorySSA was able to use this to merge multiple calls
with no intervening store.

llvm-svn: 354814
2019-02-25 20:16:11 +00:00
Craig Topper 316c58e8f1 [X86] Improve detection of unneeded shift amount masking to also handle the case that the LHS has known zeroes in it
If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case.

Differential Revision: https://reviews.llvm.org/D58475

llvm-svn: 354811
2019-02-25 19:42:47 +00:00
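The check described in the commit above (316c58e8f1) can be sketched as follows. This is an illustrative Python model, not the LLVM implementation: `shift_mask_is_unneeded` and its parameters are hypothetical names, and `known_zero` stands in for the result of a known-bits query on the AND's left-hand side. A 32-bit x86 shift reads only the low 5 bits of its count, so an AND that preserves those 5 bits is redundant.

```python
def count_trailing_ones(value):
    """Number of consecutive 1 bits starting at bit 0 (non-negative input)."""
    n = 0
    while value & 1:
        value >>= 1
        n += 1
    return n


def shift_mask_is_unneeded(mask_imm, known_zero, shift_bits):
    """Return True if `amount & mask_imm` cannot change the bits the shift reads.

    mask_imm   : immediate of the AND applied to the shift amount
    known_zero : bitmask of amount bits already known to be zero
    shift_bits : number of low bits the shift reads (5 for 32-bit)
    """
    # Bits known to be zero are unaffected by the AND, so treat them as if
    # the mask covered them.
    return count_trailing_ones(mask_imm | known_zero) >= shift_bits


plain = shift_mask_is_unneeded(0b11111, 0, 5)
shrunk_without_known_bits = shift_mask_is_unneeded(0b11100, 0, 5)
shrunk_with_known_bits = shift_mask_is_unneeded(0b11100, 0b00011, 5)
```

The third case models an amount such as `y << 2`: its low two bits are known zero, so an earlier simplification may have shrunk the mask from 31 to 28, and only the known-bits information shows that the mask is still redundant.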
Matt Arsenault fd6fd00773 AMDGPU: Correct definitions for bitset instructions
These really read and write the result register, so these need a tied
input.

llvm-svn: 354809
2019-02-25 19:24:46 +00:00
Nikita Popov fcbd7f6495 [Mips] Fix missing masking in fast-isel of br (PR40325)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending
(and x, 1) the condition before branching on it.

To avoid regressing trivial cases, I'm combining emission of cmp+br
sequences for the single-use + same block case (similar to what we
do in x86). icmpbr1.ll still regresses due to the cross-bb usage
of the condition.

Differential Revision: https://reviews.llvm.org/D58576

llvm-svn: 354808
2019-02-25 18:54:17 +00:00
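Why the `(and x, 1)` in the commit above (fcbd7f6495) matters can be shown with a tiny model. This is a sketch under the assumption that an i1 condition lives in a wider register whose upper bits are not guaranteed to be zero; the function names are hypothetical.

```python
def branch_taken_unmasked(cond_reg):
    """Branch if the whole register is non-zero (what a bnez-style test sees)."""
    return cond_reg != 0


def branch_taken_masked(cond_reg):
    """Zero-extend the i1 first: only bit 0 carries the condition."""
    return (cond_reg & 1) != 0


# An i1 'false' whose register still holds stale upper bits.
stale_false = 0b10

unmasked = branch_taken_unmasked(stale_false)
masked = branch_taken_masked(stale_false)
```

Without the mask the stale upper bit makes the branch fire even though the condition is false.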
Amara Emerson 6bcfa1c419 [AArch64][GlobalISel] Refactor selectBuildVector to use MachineIRBuilder. NFC.
This is a preparatory change as I want to use emitScalarToVector() elsewhere,
and in general we want to transition to MIRBuilder instead of using BuildMI
directly.

Differential Revision: https://reviews.llvm.org/D58528

llvm-svn: 354807
2019-02-25 18:52:54 +00:00
Philip Reames a64de6720b [Lanai] Be super conservative about atomics
As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.

Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.

llvm-svn: 354800
2019-02-25 17:36:10 +00:00
David Green b504f104b2 [ARM] Add some more missing T1 opcodes for the peephole optimiser
This adds a few extra Thumb1 opcodes to improve the peephole optimiser's
ability to remove redundant cmp instructions. tADC and tSBC require
a small fixup to prevent MOVS being moved past the instruction, giving
the wrong flags.

Differential Revision: https://reviews.llvm.org/D58281

llvm-svn: 354791
2019-02-25 15:50:54 +00:00
Luke Cheeseman 59f77e7891 [AArch64] Add support for Cortex-A76 and Cortex-A76AE
- Add LLVM backend support for Cortex-A76 and Cortex-A76AE
- Documentation can be found at
  https://developer.arm.com/products/processors/cortex-a/cortex-a76

llvm-svn: 354788
2019-02-25 15:08:27 +00:00
Simon Pilgrim c61f1e8e6c [X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483)
Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results.

Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB.

Differential Revision: https://reviews.llvm.org/D58597

llvm-svn: 354771
2019-02-25 11:19:37 +00:00
Simon Tatham b70fc0c5fd [ARM] Make fullfp16 instructions not conditionalisable.
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.

In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.

(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)

So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.

I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.

Reviewers: SjoerdMeijer, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57823

llvm-svn: 354768
2019-02-25 10:39:53 +00:00
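The decision logic described in the commit above (b70fc0c5fd) can be summarized in a few lines. This is a hedged Python sketch of the rule as the message states it, not the TableGen backend's code; the function and parameter names are hypothetical.

```python
def instruction_is_predicable(has_predicate_operand,
                              is_predicable_flag,
                              is_unpredicable_flag):
    """Model of the predicability decision described in the commit message.

    has_predicate_operand : an operand was recognized as a predicate
    is_predicable_flag    : overrides the operand check positively
    is_unpredicable_flag  : the new override in the negative direction
    """
    if is_unpredicable_flag:
        return False
    return is_predicable_flag or has_predicate_operand


# An fp16 instruction keeps its predicate operand but is marked unpredicable.
fp16_result = instruction_is_predicable(True, False, True)
# An ordinary instruction with a predicate operand stays predicable.
ordinary_result = instruction_is_predicable(True, False, False)
```

Keeping the predicate operand and adding a negative override means the operand lists stay uniform across data types, which matches the trade-off the message describes.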
Kang Zhang 4faa4090c9 [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts
Summary:
Fast selection of the LLVM fptoi and fptrunc instructions does not handle
VSX instruction support well.
We should use the VSX float-to-integer conversion instruction instead of the
non-VSX one if the operand register class is VSSRC or VSFRC, because i32
and i64 are mapped to VSSRC and VSFRC respectively when the VSX feature is
enabled.
For the float truncation instruction, we do similar work to try to use the
VSX instruction.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D58430

llvm-svn: 354762
2019-02-25 02:46:16 +00:00
Simon Pilgrim cfaf663a35 [X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637)
It's proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step.

llvm-svn: 354757
2019-02-24 19:57:52 +00:00
Craig Topper 3fe4bd464c [X86] Fix tls variable lowering issue with large code model
Summary:
The problem here is the lowering of TLS variables. Below is the DAG for the code.
SelectionDAG has 11 nodes:

t0: ch = EntryToken
      t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64
        t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
      t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
    t12: i64 = add t8, t11
  t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64
t6: ch = CopyToReg t0, Register:i32 %0, t4
When mcmodel is large, the instructions below cannot be folded.

  t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0"

When LLVM starts to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails.

The patch is to fold the load and X86ISD::WrapperRIP.

Fixes PR26906

Patch by LuoYuanke

Reviewers: craig.topper, rnk, annita.zhang, wxiao3

Reviewed By: rnk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58336

llvm-svn: 354756
2019-02-24 19:33:37 +00:00
Craig Topper 5532a98737 [X86][SSE] Use pblendw for v4i32/v2i64 during isel.
Summary:

Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was an integer. So use pblendw instead.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58574

llvm-svn: 354755
2019-02-24 19:23:41 +00:00
Craig Topper ce2bd19c49 [X86] Correct some ADC/SBB with immediate scheduler data for Broadwell and Skylake.
Summary:
The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate.

The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate.

Reviewers: RKSimon, andreadb

Reviewed By: RKSimon

Subscribers: gbedwell, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58581

llvm-svn: 354754
2019-02-24 19:23:39 +00:00
Craig Topper be3348573e [LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents
Summary:
When promoting the overflow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.

We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have put the boolean value only in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.

Reviewers: spatel, RKSimon, nikic

Reviewed By: nikic

Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58567

llvm-svn: 354753
2019-02-24 19:23:36 +00:00
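The boolean-contents conversion described in the commit above (be3348573e) amounts to a select between two constants. The following Python sketch models it with plain integers; the function names are hypothetical, and the 32-bit element width is an assumption for illustration.

```python
def naive_vector_bool(scalar_bool, elt_bits):
    """Previous behavior as described: copy the scalar result as-is,
    so only bit 0 of the element can be set."""
    return scalar_bool & 1


def select_vector_bool(scalar_bool, elt_bits):
    """select(cond, all-ones, 0): every bit of the element agrees."""
    all_ones = (1 << elt_bits) - 1
    return all_ones if (scalar_bool & 1) else 0


naive_true = naive_vector_bool(1, 32)
fixed_true = select_vector_bool(1, 32)
fixed_false = select_vector_bool(0, 32)
```

A consumer that expects all-ones for "true" would misread the naive element `0x00000001`, which is the mismatch the message describes for X86 and AArch64.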
Simon Pilgrim 4f4f9abdfa [X86][AVX] Rename lowerShuffleByMerging128BitLanes to lowerShuffleAsLanePermuteAndRepeatedMask. NFC.
Name better matches the other similar 'lane permute' and 'repeated mask' functions we have.

llvm-svn: 354749
2019-02-24 17:30:06 +00:00
Heejin Ahn 20cf0749cb [WebAssembly] Rename a variable in CFGStackify (NFC)
llvm-svn: 354744
2019-02-24 08:30:06 +00:00
Heejin Ahn 25d924b41f [WebAssembly] Merge two identical switch case routines into one (NFC)
llvm-svn: 354743
2019-02-24 08:19:55 +00:00
Philip Reames 33d7e49bb7 [Hexagon, SystemZ] Be super conservative about atomics
As requested during review of D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.

Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.  

llvm-svn: 354740
2019-02-24 00:45:09 +00:00
Craig Topper be9eeb5526 Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
And its follow ups r354511, r354640.

A follow-up patch will fix the issue that caused it to be reverted.

llvm-svn: 354737
2019-02-23 21:41:42 +00:00
Craig Topper ccc860cb81 Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract"
r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."

These were reverted in r354713 as their context depended on other patches that were reverted for a bug.

llvm-svn: 354734
2019-02-23 19:51:32 +00:00
Nikita Popov e661f946a7 [WebAssembly] Fix select of and (PR40805)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by
patterns added in D53676.

I'm removing the patterns entirely here, as they are not correct
in the general case. If necessary something more specific can be
added in the future.

Differential Revision: https://reviews.llvm.org/D58575

llvm-svn: 354733
2019-02-23 18:59:01 +00:00
Simon Pilgrim f383a47b7d [X86][AVX] combineInsertSubvector - remove concat_vectors(load(x),load(x)) --> sub_vbroadcast(x)
D58053/rL354340 added this to EltsFromConsecutiveLoads directly

llvm-svn: 354732
2019-02-23 18:53:03 +00:00
Simon Pilgrim 398d0b9e96 Fix MSVC constant truncation warnings. NFCI.
llvm-svn: 354731
2019-02-23 18:49:02 +00:00
Simon Pilgrim e08f177ea2 [X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x)
For AVX1, limit this to i32/f32/i64/f64 loading cases only.

llvm-svn: 354730
2019-02-23 18:34:05 +00:00
Simon Pilgrim 31793733a0 [X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place
Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 it's just a single VPERMPD), followed by a BLENDPD.

llvm-svn: 354729
2019-02-23 17:10:47 +00:00
Craig Topper 75afc0105c [X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel.
Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t.

This commit makes sure we do the same for commuting. The test changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost.

The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results.

llvm-svn: 354724
2019-02-23 08:34:10 +00:00
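The CSE mismatch described in the commit above (75afc0105c) comes down to two integer encodings of the same 8-bit immediate. The Python sketch below models that; the helper names are hypothetical, and the blend is assumed to have 8 elements so that all 8 immediate bits are select bits.

```python
def sign_extend_imm8(imm):
    """Interpret the low 8 bits of imm as a signed value, as isel does when
    it converts the constant to a 64-bit immediate."""
    imm &= 0xFF
    return imm - 0x100 if imm & 0x80 else imm


def commute_blend_imm(imm, num_elts):
    """Swapping the two blend sources inverts every per-element select bit.
    Returns the raw (zero-extended) immediate."""
    mask = (1 << num_elts) - 1
    return (imm ^ mask) & 0xFF


from_isel = sign_extend_imm8(0xF0)           # immediate as isel would emit it
commuted_raw = commute_blend_imm(0x0F, 8)    # same bit pattern, zero-extended
commuted_fixed = sign_extend_imm8(commuted_raw)
```

Both values describe the bit pattern `0xF0`, but only the sign-extended form compares equal to what isel produced, which is why normalizing the commuted immediate lets otherwise identical instructions be merged.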
Jordan Rupprecht 6387fa2715 [NFC] Fix typos: preceeding -> preceding
llvm-svn: 354715
2019-02-23 01:28:32 +00:00
Reid Kleckner e3876637cf Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
r354363 caused https://crbug.com/934963#c1, which has a plain C reduced
test case.

I also had to revert some dependent changes:
- r354648
- r354647
- r354640
- r354511

llvm-svn: 354713
2019-02-23 01:19:42 +00:00
Craig Topper a9697f24cf [X86] Enable custom splitting of v8i64/v16i32 sext/zext for avx/avx2 when input type will be promoted by the type legalizer to 128 bits.
If the input type will be promoted to 128 bits, it's better to put a sign_extend_inreg/and in the 128 bit register before the split occurs. Otherwise we end up doing it on each half in the wider register.

Some of the overflow arithmetic tests are regressions, but I think we can make some improvement using getSetccResultType in DAG combine and/or type legalization.

llvm-svn: 354709
2019-02-23 00:35:02 +00:00
Konstantin Zhuravlyov 9a278bf6b5 Revert "AMDGPU/NFC: Cleanup subtarget predicates"
It breaks one of our downstream merges, so revert it
temporarily while investigating failures downstream

llvm-svn: 354700
2019-02-22 23:21:06 +00:00
Sam Clegg 8fffa1dfa3 [WebAssembly] Remove unneeded MCSymbolRefExpr variants
We record the type of the symbol (event/function/data/global) in the
MCWasmSymbol and so it should always be clear how to handle a relocation
based on the symbol itself.

The exception is a function, which still needs the special @TYPEINDEX
when the relocation contains the signature rather than the address
of the function.

Differential Revision: https://reviews.llvm.org/D58472

llvm-svn: 354697
2019-02-22 22:29:34 +00:00
Matt Arsenault 476e26b5d3 AMDGPU: Use removeAllRegUnitsForPhysReg
llvm-svn: 354686
2019-02-22 19:03:36 +00:00
Sam Clegg a5e68748bf [WebAssembly] Remove debug statement submitted in rL354657
Subscribers: dschuff, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58549

llvm-svn: 354684
2019-02-22 19:00:03 +00:00
Sanjay Patel a9e289174a [x86] allow narrowing of vector UINT_TO_FP
As discussed in:
D56864
D58197

Always use the narrow (128-bit) instruction when possible.
We already had the signed int version of this transform.

llvm-svn: 354675
2019-02-22 15:47:45 +00:00
Sanjay Patel 1baf7896cc [x86] simplify code in combineExtractSubvector; NFC
Only the 1st fold is attempted pre-legalization, but it requires
legal (simple) types too, so we don't need an EVT in any of the code.

llvm-svn: 354674
2019-02-22 15:28:22 +00:00
Petar Jovanovic 6083106b12 [mips][micromips] fix filling delay slots for PseudoIndirectBranch_MM
Filling a delay slot in 32bit jump instructions with a 16bit instruction
can cause issues. According to the documentation such an operation is
unpredictable.
This patch adds opcode Mips::PseudoIndirectBranch_MM alongside
Mips::PseudoIndirectBranch and other instructions that are expanded to jr
instruction and do not allow a 16bit instruction in their delay slots.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D58507

llvm-svn: 354672
2019-02-22 14:53:58 +00:00
David Green acb628b2af [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs
This adds a number of missing Thumb1 opcodes so that the peephole optimiser can
remove redundant CMP instructions.

Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri
instruction can be used with frame indices. In thumb1 we use tADDframe.

Differential Revision: https://reviews.llvm.org/D57833

llvm-svn: 354667
2019-02-22 12:23:31 +00:00
Diana Picus 35e1c6663c [ARM GlobalISel] Support floating point for Thumb2
This is exactly the same as arm mode, so for the instruction selector
tests we just extract them to a new file and run with the same checks
for both arm and thumb mode.

For the legalizer we need to update the tests for soft float a bit, but
only because BL and tBL are slightly different. We could be pedantic and
check that we get a well-formed BL for arm mode and a tBL for thumb, but
for the purposes of the legalizer test it's sufficient to just skip over
the predicate operands in the checks. Also note that we have the
pedantic checks in the divmod test, so we're covered.

llvm-svn: 354665
2019-02-22 09:54:54 +00:00