Commit Graph

134713 Commits

Author SHA1 Message Date
Ayal Zaks 840450549c [LV] Clamp MaxVF to power of 2.
If a loop has a constant trip count known to be a multiple of MaxVF (times user
UF), LV infers that no tail will be generated for any chosen VF. This relies on
the chosen VF's being powers of 2 bound by MaxVF, and assumes MaxVF is a power
of 2. Make sure the latter holds, in particular when MaxVF is set by a memory
dependence distance which may not be a power of 2.

Differential Revision: https://reviews.llvm.org/D80491
2020-05-25 11:24:33 +03:00
Fangrui Song 1b79509f97 [MCDwarf] Delete unneeded DW_AT_unspecified_parameters 2020-05-24 22:36:57 -07:00
Fangrui Song 20e9fc55fe [MCDwarf] Delete unneeded DW_AT_prototyped for DW_TAG_label 2020-05-24 22:24:24 -07:00
Orivej Desh 838d12207b [TargetLoweringObjectFileImpl] Use llvm::transform
Fixes a build issue with libc++ configured with _LIBCPP_RAW_ITERATORS (ADL not effective)

```
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp:1602:3: error: no matching function for call to 'transform'
  transform(HexString.begin(), HexString.end(), HexString.begin(), tolower);
  ^~~~~~~~~
```

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D80475
2020-05-24 20:59:24 -07:00
Craig Topper 51dec88c5d [X86] Remove isCommutable flag from MULX instructions.
The fixed register constraint on EDX/RDX as an input
makes this not really commutable.
2020-05-24 15:02:36 -07:00
Simon Pilgrim 8a5aea7b50 [X86][AVX] Fold extract_subvector(subv_broadcast(x),c) -> (x)
If we're extracting an subvector from a broadcasted subvector of the same type then we can use the source vector directly.
2020-05-24 18:49:39 +01:00
Simon Pilgrim e508d643cf [X86][AVX] Fold extract_subvector(broadcast(x),c) -> extract_subvector(broadcast(x),0) iff c != 0
If we're extracting an upper subvector from a broadcast we're better off extracting the lowest subvector instead as it avoids an actual extract instruction and might help SimplifyDemandedVectorElts further simplify the code.
2020-05-24 18:05:54 +01:00
Sanjay Patel 57bb4787d7 [Pass Manager] remove EarlyCSE as clean-up for VectorCombine
EarlyCSE was added with D75145, but the motivating test is
not regressed by removing the extra pass now. That might be
because VectorCombine altered the way it processes instructions,
or it might be from (re)moving VectorCombine in the pipeline.

The extra round of EarlyCSE appears to cost approximately
0.26% in compile-time as discussed in D80236, so we need some
evidence to justify its inclusion here, but we do not have
that (yet).

I suspect that between SLP and VectorCombine, we are creating
patterns that InstCombine and/or codegen are not prepared for,
but we will need to reduce those examples and include them as
PhaseOrdering and/or test-suite benchmarks.
2020-05-24 12:36:21 -04:00
Florian Hahn 0deab8a54f [LV] Either get invariant condition OR vector condition.
Currently we unconditionally get the first lane of the condition
operand, even if we later use the full vector condition. This can result
in some unnecessary instructions being generated.

Suggested as follow-up in D80219.
2020-05-24 17:16:42 +01:00
Simon Pilgrim 1e7865d946 [X86] SimplifyMultipleUseDemandedBitsForTargetNode - add initial X86ISD::VSRAI handling.
This initial version only peeks through cases where we just demand the sign bit of an ashr shift, but we could generalize this further depending on how many sign bits we already have.

The pr18014.ll case is a minor annoyance - we've failed to to move the psrad/paddd after the blendvps which would have avoided the extra move, but we have still increased the ILP.
2020-05-24 16:07:46 +01:00
Simon Pilgrim 71bed8206b AMDGPU.h - reduce TargetMachine.h include. NFC.
Replace TargetMachine.h include with forward declaration and CodeGen.h include in AMDGPU.h.

Exposes a couple of implicit dependencies that require additional forward declarations/includes.
2020-05-24 15:27:41 +01:00
Kang Zhang 86e3abc9e6 [PowerPC] Add some InstAlias definitions
Summary:
This patch add the InstAlias definitions for below instructions.

ADDI ADDIS ADDI8 ADDIS8
RLWINM8
ISEL ISEL8
OR OR_rec ORI ORI8 XORI8
CNTLZW8 CNTLZW8_rec
TEND TSR
RFEBB
NOR NOR_rec
MTCRF
SUBF SUBF_rec SUBFC SUBFC_rec
RLDICL_32_64
TW

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D77559
2020-05-24 14:05:28 +00:00
Sanjay Patel c048a02b5b [InstCombine] fold FP trunc into exact itofp
Similar to D79116 and rGbfd512160fe0 - if the 1st cast
is exact, then we can go directly to the destination
type because there is no double-rounding.
2020-05-24 09:30:19 -04:00
Sanjay Patel 7eed772a27 [PatternMatch] abbreviate vector inst matchers; NFC
Readability is not reduced with these opcodes/match lines,
so reduce odds of awkward wrapping from 80-col limit.
2020-05-24 09:19:47 -04:00
Simon Pilgrim b05b69e056 AMDGPUInstPrinter.cpp - add CommandLine.h include. NFC.
Fixes implicit dependency that will be exposed by a future patch.
2020-05-24 14:17:04 +01:00
Florian Hahn 15224408f0 [VPlan] Use VPUser for VPWidenSelectRecipe operands (NFC).
VPWidenSelectRecipe already contains a VPUser, but it is not used. This
patch updates the code related to VPWidenSelectRecipe to use VPUser for
its operands.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D80219
2020-05-24 13:58:08 +01:00
Simon Pilgrim 725b3463c5 AMDGPUTargetObjectFile.h - remove unnecessary includes. NFC.
As we're inheriting from TargetLoweringObjectFileELF, TargetLoweringObjectFileImpl.h already declares all types we require in the overrides.
2020-05-24 13:57:02 +01:00
Simon Pilgrim a650256062 AMDGPULibFunc - fix include order. NFC.
Ensure AMDGPULibFunc.h module header is first, and fix exposed missing forward declaration.
2020-05-24 13:25:59 +01:00
Simon Pilgrim d0f2a8a049 X86Subtarget.h - remove unnecessary TargetMachine.h include. NFC.
By moving X86Subtarget::isPositionIndependent() into X86Subtarget.cpp we can remove the header dependency and move the few uses into source files.
2020-05-24 12:30:22 +01:00
Simon Pilgrim 478f2ce5d3 [X86] Pull out repeated DemandedBits signmask variable. NFC.
Both paths always create the same DemandedBits mask.
2020-05-24 12:01:58 +01:00
Simon Pilgrim 1603106725 [TargetLowering] Improve expandFunnelShift shift amount masking
For the 'inverse shift', we currently always perform a subtraction of the original (masked) shift amount.

But for the case where we are handling power-of-2 type widths, we can replace:

(sub bw-1, (and amt, bw-1) ) -> (and (xor amt, bw-1), bw-1) -> (and ~amt, bw-1)

This allows x86 shifts to fold away the and-mask.

Followup to D77301 + D80466.

http://volta.cs.utah.edu:8080/z/Nod0Gr

Differential Revision: https://reviews.llvm.org/D80489
2020-05-24 11:25:09 +01:00
Simon Pilgrim ffb367217d [X86] Move CONCAT_VECTORS/INSERT_SUBVECTOR actions inside loop. NFC.
CONCAT_VECTORS/INSERT_SUBVECTOR both are custom on v32i1/v64i1 like the other ops in the loop.
2020-05-24 10:59:33 +01:00
Simon Pilgrim 04d32d7ac1 X86TargetMachine.h - remove unnecessary X86Subtarget forward declaration. NFC.
We have to include X86Subtarget.h.
2020-05-24 10:52:23 +01:00
Simon Pilgrim 8310c9b741 [X86][AVX] Call SimplifyDemandedBits on MaskedLoadSDNode with non-boolean masks
On X86 (AVX1/AVX2), non-boolean masked loads only demand the sign bit of the mask, we already do the equivalent for masked stores.

Annoyingly I can't easily handle this inside TargetLowering::SimplifyDemandedBits as this is an x86 specific case for a generic node.

Differential Revision: https://reviews.llvm.org/D80478
2020-05-24 09:51:21 +01:00
Craig Topper 2bb822bc90 [X86] Add family/model for Intel Comet Lake CPUs for -march=native and function multiversioning
This adds the family/model returned by CPUID for some Intel
Comet Lake CPUs. Instruction set and tuning wise these are
the same as "skylake".

These are not in the Intel SDM yet, but these should be correct.
2020-05-24 00:29:25 -07:00
Craig Topper 7940123084 [X86] Fix typo in comment. NFC 2020-05-24 00:29:24 -07:00
Simon Pilgrim cc65a7a5ea [X86] Improve i8 + 'slow' i16 funnel shift codegen
This is a preliminary patch before I deal with the xor+and issue raised in D77301.

We get much better code for i8/i16 funnel shifts by concatenating the operands together and performing the shift as a double width type, it avoids repeated use of the shift amount and partial registers.

fshl(x,y,z) -> (((zext(x) << bw) | zext(y)) << (z & (bw-1))) >> bw.
fshr(x,y,z) -> (((zext(x) << bw) | zext(y)) >> (z & (bw-1))) >> bw.

Alive2: http://volta.cs.utah.edu:8080/z/CZx7Cn

This doesn't do as well for i32 cases on x86_64 (the xor+and followup patch is much better) so I haven't bothered with that.

Cases with constant amounts are more dubious as well so I haven't currently bothered with those - its these kind of 'edge' cases that put me off trying to put this in TargetLowering::expandFunnelShift.

Differential Revision: https://reviews.llvm.org/D80466
2020-05-24 08:08:53 +01:00
Amara Emerson 99660217e9 [AArch64][GlobalISel] When generating SUBS for compares, don't write to wzr/xzr.
Although writing to wzr/xzr is correct since we don't care about the result
of the sub, only the flags, doing so causes tail merge blocks to fail.

Writing to an unused virtual register instead allows the optimization to fire,
improving performance significantly on 256.bzip2.

Differential Revision: https://reviews.llvm.org/D80460
2020-05-23 22:59:49 -07:00
Amy Kwan b631f86ac5 [TLI][PowerPC] Introduce TLI query to check if MULH is cheaper than MUL + SHIFT
This patch introduces a TargetLowering query, isMulhCheaperThanMulShift.

Currently in DAG Combine, it will transform mulhs/mulhu into a
wider multiply and a shift if the wide multiply is legal.

This TLI function is implemented on 64-bit PowerPC, as it is more desirable to
have multiply-high over multiply + shift for words and doublewords. Having
multiply-high can also aid in further transformations that can be done.

Differential Revision: https://reviews.llvm.org/D78271
2020-05-23 16:47:12 -05:00
Fangrui Song de172ef61e [CFIInstrInserter] Delete unneeded checks 2020-05-23 14:13:31 -07:00
Florian Hahn 8d04181198 [ValueTracking] Use assumptions in computeConstantRange.
This patch updates computeConstantRange to optionally take an assumption
cache as argument and use the available assumptions to limit the range
of the result.

Currently this is limited to assumptions that are comparisons.

Reviewers: reames, nikic, spatel, jdoerfert, lebedev.ri

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D76193
2020-05-23 20:07:52 +01:00
Nikita Popov 2833c46f75 [DwarfEHPrepare] Don't prune unreachable resumes at optnone
Disable pruning of unreachable resumes in the DwarfEHPrepare pass
at optnone. While I expect the pruning itself to be essentially free,
this does require a dominator tree calculation, that is not used for
anything else. Saving this DT construction makes for a 0.4% O0
compile-time improvement.

Differential Revision: https://reviews.llvm.org/D80400
2020-05-23 20:58:01 +02:00
Simon Pilgrim fe0006c882 TargetLowering.h - remove unnecessary TargetMachine.h include. NFC
Replace with forward declaration and move dependency down to source files that actually need it.

Both TargetLowering.h and TargetMachine.h are 2 of the most expensive headers (top 10) in the ClangBuildAnalyzer report when building llc.
2020-05-23 19:49:38 +01:00
Matt Arsenault cdd006eec9 SimplifyCFG: Clean up optforfuzzing implementation
This should function as any other SimplifyCFGOption rather than having
the transform check and specially consider the attribute itself.
2020-05-23 13:49:50 -04:00
Matt Arsenault 27fe841aa6 AMDGPU: Refine rcp/rsq intrinsic folding for modern FP rules
We have to assume undef could be an snan, which would need quieting so
returning qnan is safer than undef. Also consider strictfp, and don't
care if the result rounded.
2020-05-23 13:28:36 -04:00
Matt Arsenault 76e3dd0a49 AMDGPU: Implement isConstantPhysReg
I don't think any of these registers are used in contexts where this
would do anything yet.
2020-05-23 13:24:42 -04:00
Matt Arsenault 2e82667f60 AMDGPU: Define mode register
This should eventually model FP mode constraints as well as the other
special fields it tracks.
2020-05-23 13:24:42 -04:00
Georgii Rymar 304b0ed403 [yaml2obj] - Move "repeated section/fill name" check earlier.
This allows to simplify the code.
Doing checks early is generally useful.

Differential revision: https://reviews.llvm.org/D79985
2020-05-23 17:40:48 +03:00
Georgii Rymar 38c5d6f700 [yaml2obj] - Add a technical prefix for each unnamed chunk.
This change does not affect the produced binary.

In this patch I assign a technical suffix to each section/fill
(i.e. chunk) name when it is empty. It allows to simplify the code
slightly and improve error messages reported.

In the code we have the section to index mapping, SN2I, which is
globally used. With this change we can use it to map "empty"
names to indexes now, what is helpful.

Differential revision: https://reviews.llvm.org/D79984
2020-05-23 17:22:23 +03:00
Michal Paszkowski 335de55fa3 Revert "Added a new IRCanonicalizer pass."
This reverts commit 14d358537f.
2020-05-23 13:51:43 +02:00
Michal Paszkowski 14d358537f Added a new IRCanonicalizer pass.
Summary:
Added a new IRCanonicalizer pass which aims to transform LLVM modules into
a canonical form by reordering and renaming instructions while preserving the
same semantics. The canonicalizer makes it easier to spot semantic differences
when diffing two modules which have undergone different passes.

Presentation: https://www.youtube.com/watch?v=c9WMijSOEUg

Reviewed by: plotfi

Differential Revision: https://reviews.llvm.org/D66029
2020-05-23 12:45:53 +02:00
Nikita Popov 0c6bba71e3 [TargetPassConfig] Don't add alias analysis at optnone
When performing codegen at optnone, don't add alias analysis to
the pipeline. We don't need it, but it causes an unnecessary
dominator tree calculation.

I've also moved the module verifier call to the top so that a bunch
of disabled-at-optnone passes group more nicely.

Differential Revision: https://reviews.llvm.org/D80378
2020-05-23 10:35:03 +02:00
Craig Topper 7392820f98 [Align] Remove operations on MaybeAlign that asserted that it had a defined value.
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.

I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.

I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.

Differential Revision: https://reviews.llvm.org/D80455
2020-05-22 21:54:28 -07:00
Fangrui Song 0f6bd9cda6 [MC] Drop unneeded std::abs for DW_def_cfa_offset in DarwinX86AsmBackend::generateCompactUnwindEncoding
This clean-up is available after double negation bugs are fixed.
2020-05-22 21:12:47 -07:00
Fangrui Song 773f8dbd1d [MC] Fix double negation of DW_CFA_def_cfa
Negations are incorrectly added in numerous places and the code just happens to work.
Also fix a missed DW_CFA_def_cfa_offset negation in c693b9c321d5a40d012340619674cf790c9ac86c:
ARMAsmBackendDarwin::generateCompactUnwindEncoding
2020-05-22 21:02:53 -07:00
Fangrui Song c693b9c321 [MC] Fix double negation of DW_CFA_def_cfa_offset
Negations are incorrectly added in two places and the code works just
because the negations cancel each other.
2020-05-22 20:01:40 -07:00
Fangrui Song 0840d725c4 [MC] Change MCCFIInstruction::createDefCfaOffset to cfiDefCfaOffset which does not negate Offset
The negative Offset has caused a bunch of problems and confused quite a
few call sites. Delete the unneeded negation and fix all call sites.
2020-05-22 17:07:11 -07:00
Fangrui Song 7e49dc6184 [MC] Change MCCFIInstruction::createDefCfa to cfiDefCfa which does not negate Offset
The negative Offset has caused a bunch of problems and confused quite a
few call sites. Delete the unneeded negation and fix all call sites.
2020-05-22 15:47:26 -07:00
Stanislav Mekhanoshin 62fb3fa6d9 [AMDGPU] Define 6 dword subregs
This prevents autogeneration of degenerate names for these.

Differential Revision: https://reviews.llvm.org/D80451
2020-05-22 13:53:29 -07:00
Sanjay Patel 024098ae53 [VectorCombine] set preserve alias analysis
As noted in D80236, moving the pass in the pipeline exposed this
shortcoming. Extra work to recalculate the alias results showed
up as a compile-time slowdown.
2020-05-22 16:25:16 -04:00