Commit Graph

15592 Commits

Author SHA1 Message Date
Craig Topper 8555c91337 [X86] Use more accurate increments for the induction variables in sad.ll. NFC
I think some copy/pasting was used to create loops of different
VFs. But the increment of the induction variable wasn't updated
to match the VF.

This has no effect on the pattern matching we're testing, it just
helps the test make sense to the reader.
2020-05-01 18:55:22 -07:00
Simon Pilgrim 7cb5a51f38 [DAG] SimplifyDemandedVectorElts - add INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling 2020-05-01 16:20:51 +01:00
Simon Pilgrim 65d32a9892 [DAG] SimplifyDemandedVectorElts - remove INSERT_SUBVECTOR if we don't demand the subvector 2020-05-01 16:20:51 +01:00
Simon Pilgrim e3c0be596c [DAG] SimplifyDemandedVectorElts - add EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling 2020-05-01 13:48:07 +01:00
Simon Pilgrim 8cbd8194c1 [X86] Improving folding of concat_vectors of subvectors from the same broadcast
Handle concat_vectors(extract_subvector(broadcast(x)), extract_subvector(broadcast(x))) -> broadcast(x)

To expose this we also need collectConcatOps to recognise the insert_subvector(x, extract_subvector(x, lo), hi) subvector splat pattern
2020-05-01 11:23:10 +01:00
Hubert Tong 0e8608b3c3 [tests] Revert unhelpful change from d73eed42d1 2020-04-30 22:18:54 -04:00
Hubert Tong d73eed42d1 [tests] Speculative fix for buildbot breakage from c5f7c039ef 2020-04-30 22:04:53 -04:00
Craig Topper c5f7c039ef [X86] Add x, t and g modifiers for inline asm
This patch adds the x, t and g modifiers for inline asm from GCC. These will print a vector register as xmm*, ymm* or zmm* respectively.

I also fixed register names with modifiers with inteldialect so they are no longer printed with a leading %.

Patch by Amanieu d'Antras

Differential Revision: https://reviews.llvm.org/D78977
2020-04-30 17:45:45 -07:00
Craig Topper 6a1ad76dab [X86] Don't return true from isTruncateFree for vectors
Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to.

Differential Revision: https://reviews.llvm.org/D79045
2020-04-30 16:43:35 -07:00
Simon Pilgrim bf468f4349 [X86][SSE] Canonicalize UNARYSHUFFLE(XOR(X,-1) -> XOR(UNARYSHUFFLE(X),-1)
This pushes the NOT pattern up the DAG to help expose it for further combines (AND->ANDN in particular).

The PSHUFD/MOVDDUP 'splat' cases are the only ones I've seen in the wild so far, we can further generalize if/when we need to.
2020-04-30 19:18:51 +01:00
Simon Pilgrim 51308ee30c [X86] Extend combine-bitselect tests
Add AVX512VL tests and v2i64/v8i64 mask broadcast tests that match the existing v4i64 versions
2020-04-30 16:24:17 +01:00
Simon Pilgrim 30211c4783 [X86] combineANDXORWithAllOnesIntoANDNP - add BROADCAST handling
Fold BROADCAST(NOT(Y)) -> BROADCAST(Y) as part of finding a NOT inversion.
2020-04-30 16:24:17 +01:00
Simon Pilgrim f6cdcb0a5a [X86][SSE] Add bitselect tests where the mask is a broadcasted scalar
Shows issue that the IsNot() test can't see through shuffles/broadcasts
2020-04-30 14:45:21 +01:00
Simon Pilgrim 96238486ed [DAGCombine] Move the remaining X86 funnel shift patterns to DAGCombine
X86 matches several 'shift+xor' funnel shift patterns:

  fold (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y))  -> (fshl x0, x1, y)
  fold (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y))  -> (fshr x0, x1, y)
  fold (or (shl (add x0, x0), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)

These patterns are also what we end up with the proposed expansion changes in D77301.

This patch moves these to DAGCombine's generic MatchFunnelPosNeg.

All existing X86 test cases still pass, and we just have a small codegen change in pr32282.ll.

Reviewed By: @spatel

Differential Revision: https://reviews.llvm.org/D78935
2020-04-30 12:57:17 +01:00
Evgeniy Brevnov 3e68a66704 [BPI][NFC] Reuse post dominantor tree from analysis manager when available
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.

Reviewers: skatkov, taewookoh, yrouban

Reviewed By: skatkov

Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78987
2020-04-30 11:31:03 +07:00
Sanjay Patel cecee111e4 [x86] add tests for awkward 'icmp eq i1'; NFC 2020-04-29 14:39:47 -04:00
Simon Pilgrim f0903de1aa [x86] Enable bypassing 64-bit division on generic x86-64
This is currently enabled for Intel big cores from Sandy Bridge onward, as well as Atom, Silvermont, and KNL, due to 64-bit division being so slow on these cores. AMD cores can do this in hardware (use 32-bit division based on input operand width), so it's not a win there. But since the majority of x86 CPUs benefit from this optimization, and since the potential upside is significantly greater than the downside, we should enable this for the generic x86-64 target.

Patch By: @atdt

Reviewed By: @craig.topper, @RKSimon

Differential Revision: https://reviews.llvm.org/D75567
2020-04-29 16:55:48 +01:00
Craig Topper 446a3be8f1 [X86] Add PACK instructions to hasUndefRegUpdate so the BreakFalseDeps pass will reassign an undef second source to match the first source
We generate PACK instructions with an undef second source when we are truncating from a 128-bit vector to something narrower and we don't care about the upper bits of the vector register. The register allocation process will always assign untied undef uses to xmm0. This creates a false dependency on xmm0.

By adding these instructions to hasUndefRegUpdate, we can get the BreakFalseDeps pass to reassign the source to match the other input. Normally this interface is used for instructions that might need an xor inserted to break the dependency. But the pass also has a heuristic that tries to use the same register as other sources. That should always be possible for these instructions so we'll never trigger the xor dependency break.

Differential Revision: https://reviews.llvm.org/D79032
2020-04-28 15:11:32 -07:00
Craig Topper 0de7ddbfb0 [X86] Handle more cases in combineAddOrSubToADCOrSBB.
This adds support for

X + SETAE --> sbb X, -1
X - SETAE --> adc X, -1

Fixes PR45700

Differential Revision: https://reviews.llvm.org/D78984
2020-04-28 10:39:39 -07:00
Craig Topper c480dc6b47 [X86] Pre-commit tests for D78984. NFC
These tests show some missed opportunities to use sbb/adc.
2020-04-28 10:39:37 -07:00
LemonBoy f30416fdde [AsmPrinter] Fix emission of non-standard integer constants for BE targets
The code assumed that zero-extending the integer constant to the
designated alloc size would be fine even for BE targets, but that's not
the case as that pulls in zeros from the MSB side while we actually
expect the padding zeros to go after the LSB.

I've changed the codepath handling the constant integers to use the
store size for both small(er than u64) and big constants and then add
zero padding right after that.

Differential Revision: https://reviews.llvm.org/D78011
2020-04-27 14:57:29 -07:00
Wei Mi 68d2301e12 Recommit "Generate Callee Saved Register (CSR) related cfi directives
like .cfi_restore"

Insert .cfi_offset/.cfi_register when IncomingCSRSaved of current block
is larger than OutgoingCSRSaved of its previous block.

Original commit message:
https://reviews.llvm.org/D42848 only handled CFA related cfi directives but
didn't handle CSR related cfi. The patch adds the CSR part. Basically it reuses
the framework created in D42848. For each basicblock, the patch tracks which
CSR set have been saved at its CFG predecessors's exits, and compare the CSR
set with the set at its previous basicblock's exit (The previous block is the
block laid before the current block). If the saved CSR set at its previous
basicblock's exit is larger, .cfi_restore will be inserted.

The patch also generates proper .cfi_restore in epilogue to make sure the
saved CSR set is consistent for the incoming edges of each block.

Differential Revision: https://reviews.llvm.org/D74303
2020-04-27 12:46:58 -07:00
Simon Pilgrim bd60b2983e [X86][SSE] Regenerate oddsubvector.ll test checks
Fixes some missed address symbol regexs
2020-04-27 18:48:02 +01:00
Simon Pilgrim d9e174dbf7 [X86][SSE] getFauxShuffle - account for PEXTW/PEXTB implicit zero-extension
The insert(truncate/extend(extract(vec0,c0)),vec1,c1) case in rGacbc5ede99 wasn't combining the 'mineltsize' with the src vector elt size which may be smaller due to implicit extension during extraction.

Reduced from test case provided by @mstorsjo
2020-04-27 12:46:50 +01:00
Simon Pilgrim acbc5ede99 [X86][SSE] getFauxShuffle - support insert(truncate/extend(extract(vec0,c0)),vec1,c1) shuffle patterns at the byte level
Followup to the PR45604 fix at rGe71dd7c011a3 where we disabled most of these cases.

By creating the shuffle at the byte level we can handle any extension/truncation as long as we track how small the scalar got and assume that the upper bytes will need to be zero.
2020-04-26 15:31:01 +01:00
Craig Topper c1cb733db6 [X86] Improve lowering of v16i8->v16i1 truncate under prefer-vector-width=256. 2020-04-25 15:20:33 -07:00
Sanjay Patel 7f4ff782d4 [x86] use vector instructions to lower even more FP->int->FP casts
This is another enhancement to D77895/D78362
to avoid a round-trip from XMM->GPR->XMM.
This time we handle the case of starting/ending with different FP types
but always with signed i32 as the intermediate value.
I think this covers all of the faux vector optimization possibilities
for pre-AVX512.

There is at least 1 other transform mentioned in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c19
...where we fold an 'fpext' into a preceding 'sitofp'. I think we will
want to handle that earlier (DAGCombiner or instcombine) because that's
a target-independent optimization.

Differential Revision: https://reviews.llvm.org/D78758
2020-04-25 11:38:54 -04:00
Snehasish Kumar 0cc063a8ff Use .text.unlikely and .text.eh prefixes for MachineBasicBlock sections.
Summary:
Instead of adding a ".unlikely" or ".eh" suffix for machine basic blocks,
this change updates the behaviour to use an appropriate prefix
instead. This allows lld to group basic block sections together
when -z,keep-text-section-prefix is specified and matches the behaviour
observed in gcc.

Reviewers: tmsriram, mtrofin, efriedma

Reviewed By: tmsriram, efriedma

Subscribers: eli.friedman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78742
2020-04-24 15:07:38 -07:00
Fangrui Song 10bc12588d [XRay] Change Sled.Function to PC-relative for sled version 2 and make llvm-xray support sled version 2 addresses
Follow-up of D78082 and D78590.

Otherwise, because xray_instr_map is now read-only, the absolute
relocation used for Sled.Function will cause a text relocation.
2020-04-24 14:41:56 -07:00
James Y Knight 248a5db3f2 Change callbr to only define its output SSA variable on the normal
path, not the indirect targets.

Fixes: PR45565.

Differential Revision: https://reviews.llvm.org/D78341
2020-04-23 19:36:44 -04:00
aartbik 907871d9ad [llvm] [CodeGen] Fixed vector halving bug for masked load
Summary:
Given a VL=14 that is enveloped by a proper VL=16, splitting the
masked load using the enveloping halving VL=8/8 should yields
should eventually yield V=8/5. This fixes various assert failures
in getHalfNumVectorElementsVT() and IncrementMemoryAddress().

Note, I suspect similar fixes will be needed for other masked
operations, but for now I send out a fix for masked load only.

Bugzilla issue 45563
https://bugs.llvm.org/show_bug.cgi?id=45563

Reviewers: craig.topper, mehdi_amini, nicolasvasilache

Reviewed By: craig.topper

Subscribers: hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78608
2020-04-23 15:12:44 -07:00
aartbik e4e187d203 [llvm] [X86] Processed test with update_llc_test_checks
Summary:
As requested in another review for a similar regression test,
I updated this test with the same utility.

Reviewers: dmgreen, craig.topper, mehdi_amini, nicolasvasilache

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78739
2020-04-23 15:09:53 -07:00
Sanjay Patel b53fd70b9e [x86] add tests for FP->int->FP with different FP types; NFC 2020-04-23 17:07:16 -04:00
Simon Pilgrim 757c7c244b [X86][SSE] Add SSE2 extract-concat tests
Check pre-SSE41 codegen where we have less PEXTR*/PINSR* instructions
2020-04-23 19:40:34 +01:00
Lucas Prates 727e6fb84a [NFC][llvm][X86] Adding missing -mtiple to X86 test.
The modified test was missing the specification of the intended triple
in its run line, assuming X86 is the default.
2020-04-22 11:55:57 +01:00
aartbik 5397f29087 [llvm] [X86] Make test more robust against different builds
Summary:
Rationale:
Using the --debug-only flag requires a debug build. Also, the debug output is not always consistent over different builds.
This change avoids all problems by just testing the generated assembly for AVX.

Reviewers: craig.topper, mehdi_amini, nicolasvasilache

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78609
2020-04-22 00:23:46 -07:00
aartbik 8387bee94d [llvm] [X86] Fixed type bug in vselect for AVX masked load
Summary:
Bugzilla issue 45563
https://bugs.llvm.org/show_bug.cgi?id=45563

Reviewers: nicolasvasilache, mehdi_amini, craig.topper

Reviewed By: craig.topper

Subscribers: RKSimon, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78527
2020-04-21 11:11:35 -07:00
Fangrui Song 5771c98562 [XRay] Change xray_instr_map sled addresses from absolute to PC relative for x86-64
xray_instr_map contains absolute addresses of sleds, which are relocated
by `R_*_RELATIVE` when linked in -pie or -shared mode.

By making these addresses relative to PC, we can avoid the dynamic
relocations and remove the SHF_WRITE flag from xray_instr_map.  We can
thus save VM pages containg xray_instr_map (because they are not
modified).

This patch changes x86-64 and bumps the sled version to 2. Subsequent
changes will change powerpc64le and AArch64.

Reviewed By: dberris, ianlevesque

Differential Revision: https://reviews.llvm.org/D78082
2020-04-21 09:36:09 -07:00
Xiang1 Zhang 0980038a5e Handle CET for -exception-model sjlj
Summary:
In SjLj exception mode, the old landingpad BB will create a new landingpad BB and use indirect branch jump to the old landingpad BB in lowering.
So we should add 2 endbr for this exception model.

Reviewers: hjl.tools, craig.topper, annita.zhang, LuoYuanke, pengfei, efriedma

Reviewed By: LuoYuanke

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77124
2020-04-20 11:13:40 +08:00
Simon Pilgrim e71dd7c011 [X86][SSE] getFauxShuffle - don't combine shuffles with small truncated scalars (PR45604)
getFauxShuffle attempts to combine INSERT_VECTOR_ELT(TRUNCATE/EXTEND(EXTRACT_VECTOR_ELT(x))) patterns into a target shuffle chain.

PR45604 identified an issue where the scalar was truncated to a size smaller than the destination vector element and then zero extended back, which requires the upper bits to be zero'd which we don't currently do.

To avoid the bug I've added an early out in these truncation cases, a future commit should allow us to handle this by inserting the necessary SM_SentinelZero padding.
2020-04-19 13:35:22 +01:00
Sanjay Patel cceb630a07 [x86] use vector instructions to lower more FP->int->FP casts
This is an enhancement to D77895 to avoid another
round-trip from XMM->GPR->XMM. This time we handle
the case of starting/ending with an f64 and casting
to signed i32 as the intermediate value.

It's a bit more involved than I initially assumed
because we need to use target-specific opcodes to
represent the non-standard cast ops.

Differential Revision: https://reviews.llvm.org/D78362
2020-04-19 08:33:17 -04:00
Simon Pilgrim d6db919bee [X86][SSE] Add test case for PR45604 2020-04-19 13:13:54 +01:00
Andrew Litteken 8d5024f7fe fix to outline cfi instruction when can be grouped in a tail call
[MachineOutliner] fix test for excluding CFI and add test to include CFI in outlining

New test to check that we only outline CFI instruction if all CFI
Instructions in the function would be captured by the outlining

adding x86 tests analagous to AARCH64 cfi tests

Revision: https://reviews.llvm.org/D77852
2020-04-17 22:26:34 -07:00
Craig Topper 31a166e4cb [X86] Clean up some mir tests with INLINEASM to avoid regdef or to correct the immediate for the regdef.
The immediate used for the regdef is the encoding for the register
class in the enum generated by tablegen. This encoding will change
any time a new register class is added. Since the number is part
of the input, this means it can become stale.

This change modifies some test to avoid this kind of immediate
all together. And updates one test to use the current encoding of
GR64.
2020-04-17 21:55:44 -07:00
Craig Topper 5f69e53e55 [X86] Remove single incoming value phis from tests for the loop SAD pattern. NFC
InstCombine should ensure these don't exist.

I'm looking at making some changes to how we detect these
patterns and not having to worry about these phis will help.
2020-04-17 13:39:47 -07:00
Sanjay Patel a6fc687e34 [x86] add/adjust tests for FP<->int casts; NFC 2020-04-17 08:22:42 -04:00
Craig Topper 944cc5e0ab [SelectionDAGBuilder][CGP][X86] Move some of SDB's gather/scatter uniform base handling to CGP.
I've always found the "findValue" a little odd and
inconsistent with other things in SDB.

This simplfifies the code in SDB to just handle a splat constant
address or a 2 operand GEP in the same BB. This removes the
need for "findValue" since the operands to the GEP are
guaranteed to be available. The splat constant handling is
new, but was needed to avoid regressions due to constant
folding combining GEPs created in CGP.

CGP is now responsible for canonicalizing gather/scatters into
this form. The pattern I'm using for scalarizing, a scalar GEP
followed by a GEP with an all zeroes index, seems to be subject
to constant folding that the insertelement+shufflevector was not.

Differential Revision: https://reviews.llvm.org/D76947
2020-04-16 17:49:22 -07:00
Sanjay Patel b29fca30fa [x86] auto-generate complete test checks; NFC 2020-04-16 17:16:51 -04:00
bd1976llvm 86478d3de9 [MC][ELF] Put explicit section name symbols into entry size compatible sections
Ensure that symbols explicitly* assigned a section name are placed into
a section with a compatible entry size.

This is done by creating multiple sections with the same name** if
incompatible symbols are explicitly given the name of an incompatible
section, whilst:

  - Avoiding using uniqued sections where possible (for readability and
    to maximize compatibly with assemblers).

  - Creating as few SHF_MERGE sections as possible (for efficiency).

Given that each symbol is assigned to a section in a single pass, we
must decide which section each symbol is assigned to without seeing the
properties of all symbols. A stable and easy to understand assignment is
desirable. The following rules facilitate this: The "generic" section
for a given section name will be mergeable if the name is a mergeable
"default" section name (such as .debug_str), a mergeable "implicit"
section name (such as .rodata.str2.2), or MC has already created a
mergeable "generic" section for the given section name (e.g. in response
to a section directive in inline assembly). Otherwise, the "generic"
section for a given name is non-mergeable; and, non-mergeable symbols
are assigned to the "generic" section, while mergeable symbols are
assigned to uniqued sections.

Terminology:
"default" sections are those always created by MC initially, e.g. .text
or .debug_str.

"implicit" sections are those created normally by MC in response to the
symbols that it encounters, i.e. in the absence of an explicit section
name assignment on the symbol, e.g. a function foo might be placed into
a .text.foo section.

"generic" sections are those that are referred to when a unique section
ID is not supplied, e.g. if there are multiple unique .bob sections then
".quad .bob" will reference the generic .bob section. Typically, the
generic section is just the first section of a given name to be created.
Default sections are always generic.

* Typically, section names might be explicitly assigned in source code
using a language extension e.g. a section attribute: _attribute_
((section ("section-name"))) -
https://clang.llvm.org/docs/AttributeReference.html

** I refer to such sections as unique/uniqued sections. In assembly the
", unique," assembly syntax is used to express such sections.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43457.

See https://reviews.llvm.org/D68101 for previous discussions leading to
this patch.

Some minor fixes were required to LLVM's tests, for tests had been using
the old behavior - which allowed for explicitly assigning globals with
incompatible entry sizes to a section.

This fix relies on the ",unique ," assembly feature. This feature is not
available until bintuils version 2.35
(https://sourceware.org/bugzilla/show_bug.cgi?id=25380). If the
integrated assembler is not being used then we avoid using this feature
for compatibility and instead try to place mergeable symbols into
non-mergeable sections or issue an error otherwise.

Differential Revision: https://reviews.llvm.org/D72194
2020-04-16 19:12:49 +00:00
Konstantin Schwarz 1a3e89aa2b [MIR] Add comments to INLINEASM immediate flag MachineOperands
Summary:
The INLINEASM MIR instructions use immediate operands to encode the values of some operands.
The MachineInstr pretty printer function already handles those operands and prints human readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, however uses the new MIROperandComment feature.

Reviewers: SjoerdMeijer, arsenm, efriedma

Reviewed By: arsenm

Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78088
2020-04-16 13:46:14 +02:00