BitPermutationSelector sets Repl32 flag for bit groups which can be (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies lower 32 bits into upper 32 bits on 64-bit PowerPC before rotation.
However, enforcing 32-bit instruction sometimes results in redundant generated code.
For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into only rldimi instruction if Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF).
uint64_t func(uint64_t a, uint64_t b) {
return (a & 0xFFFFFFFF) | (b << 32) ;
}
To avoid such problem, this patch checks the potential benefit of Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from Repl32 flag on this group.
Differential Revision: https://reviews.llvm.org/D47867
llvm-svn: 334195
isORCopyInst and isReadOrWriteToDSPReg functions were producing warning
that some statements my fall through.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D47876
llvm-svn: 334194
Simplify combineVectorTruncationWithPACKSS to just a SIGN_EXTEND_INREG followed by using the existing truncateVectorWithPACK instead of duplicating code.
llvm-svn: 334193
We do the same thing in rewriteSingleStoreAlloca.
Fixes PR37632.
Reviewers: chandlerc, davide, efriedma
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D47825
llvm-svn: 334187
This has two main components. First, widen
widen short constant loads in DAG when they have
the correct alignment. This is already done a bit in
AMDGPUCodeGenPrepare, since that has access to
DivergenceAnalysis. This can't help kernarg loads
created in the DAG. Start to use DAG divergence analysis
to help this case.
The second part is to avoid kernel argument lowering
breaking the alignment of short vector elements because
calling convention lowering wants to split everything
into legal register types.
When loading a split type, load the nearest 4-byte aligned
segment and shift to get the desired bits. This extra
load of the earlier argument piece ends up merging,
and the bit extract hopefully folds out.
There are a number of improvements and regressions with
this, but I think as-is this is a better compromise between
several of the worst parts of SelectionDAG.
Particularly when i16 is legal, this produces worse code
for i8 and i16 element vector kernel arguments. This is
partially due to the very weak load merging the DAG does.
It only looks for fairly specific combines between pairs
of loads which no longer appear. In particular this
causes v4i16 loads to be split into 2 components when
previously the two halves were merged.
Worse, because of the newly introduced shifts, there
is a lot more unnecessary vector packing and unpacking code
emitted. At least some of this is due to reporting
false for isTypeDesirableForOp for i16 as a workaround for
the lack of divergence information in the DAG. The cases
where this happens it doesn't actually matter, but the
relevant code in SimplifyDemandedBits doens't have the context
to know to ignore this.
The use of the scalar cache is probably more important
than the mess of mostly scalar instructions doing this packing
and unpacking. Future work can fix this, possibly by making better
use of the new DAG divergence information for controlling promotion
decisions, or adding another version of shift + trunc + shift
combines that doesn't only know about the used types.
llvm-svn: 334180
Summary: Prevent folding of operations with memory loads when one of the sources has undefined register update.
Reviewers: craig.topper
Subscribers: llvm-commits, mike.dvoretsky, ashlykov
Differential Revision: https://reviews.llvm.org/D47621
llvm-svn: 334175
Summary:
When the branch folder hoist code into a predecessor it adjust live-in's
in the blocks it hoist code from. However it fail to handle hoisted code
that contain a defed register that originally is live-in in the block
through a super register.
This is fixed by replacing the live-in handling code with calls to
utility functions in LivePhysRegs.
Reviewers: kparzysz, gberry, MatzeB, uweigand, aprantl
Reviewed By: kparzysz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47529
llvm-svn: 334163
This is needed to get CC operand in right place, as expected by the
SchedModel.
Review: Ulrich Weigand
https://reviews.llvm.org/D47820
llvm-svn: 334161
When denormals are supported we are producing a full division for
1.0f / x. That still can be replaced by the faster version:
bool c = fabs(x) > 0x1.0p+96f;
float s = c ? 0x1.0p-32f : 1.0f;
x *= s;
return s * v_rcp_f32(x)
in case if requested accuracy is 2.5ulp or less. The same version
is used if denormals are not supported for non 1.0 numerators, where
just v_rcp_f32 is then used for 1.0 numerator.
The optimization of 1/x is extended to the case -1/x, which is the
same except for the resulting sign bit.
OpenCL conformance passed with both enabled and disabled denorms.
Differential Revision: https://reviews.llvm.org/D47805
llvm-svn: 334142
With the upcoming patch to add summary parsing support, IsAnalysis would
be true in contexts where we are not performing module summary analysis.
Rename to the more specific and approprate HaveGVs, which is essentially
what this flag is indicating.
llvm-svn: 334140
The bug report:
https://bugs.llvm.org/show_bug.cgi?id=36036
...requests a DAG change for this, but an IR canonicalization
probably handles most cases. If we still want to match this
pattern in the backend, there's a proposal for that too:
D47831
Alive proofs including nsw/nuw cases that were first noted in:
D46988
https://rise4fun.com/Alive/Kmp
This patch is largely copied from the existing code that was
initially added with:
D40984
...but I didn't see much gain from trying to share code.
llvm-svn: 334137
Fixes terrible code on targets without f16 support. The
legalization creates a mess that is difficult to recover
from. Also should avoid randomly breaking these tests
multiple times in sequence in future commits.
Some regressions in cases where it happens to be better
to pull the source modifier after the conversion.
llvm-svn: 334132
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37603 | PR37603 ]].
https://godbolt.org/g/VCMNpShttps://rise4fun.com/Alive/idM
When doing bit manipulations, it is quite common to calculate some bit mask,
and apply it to some value via `and`.
The typical C code looks like:
```
int mask_signed_add(int nbits) {
return (1 << nbits) - 1;
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_add(int)(i32) local_unnamed_addr #0 {
%2 = shl i32 1, %0
%3 = add nsw i32 %2, -1
ret i32 %3
}
```
But there is a second, less readable variant:
```
int mask_signed_xor(int nbits) {
return ~(-(1 << nbits));
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_xor(int)(i32) local_unnamed_addr #0 {
%2 = shl i32 -1, %0
%3 = xor i32 %2, -1
ret i32 %3
}
```
Since we created such a mask, it is quite likely that we will use it in `and` next.
And then we may get rid of `not` op by folding into `andn`.
But now that i have actually looked:
https://godbolt.org/g/VTUDmU
_some_ backend changes will be needed too.
We clearly loose `bzhi` recognition.
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47428
llvm-svn: 334127
Summary:
In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.
AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!).
The other way around for X86.
It would be much better to canonicalize.
This patch is completely monkey-typing.
I don't really understand how this works :)
I have based it on `// x & (-1 >> (32 - y))` pattern.
Also, when we only have `BMI`, i wonder if we could use `BEXTR` with `start=0` ?
Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419https://bugs.llvm.org/show_bug.cgi?id=37603https://bugs.llvm.org/show_bug.cgi?id=37610https://rise4fun.com/Alive/idM
Reviewers: craig.topper, spatel, RKSimon, javed.absar
Reviewed By: craig.topper
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D47453
llvm-svn: 334125
These encodings correspond to the cases in the normal encoding scheme where there is no index and our modrm reading code initially decodes it as such. The VSIB handling code tried to compensate for this, but failed to add the base needed to make later code do the right thing.
Fixes PR37712.
llvm-svn: 334121
The index size is represented by the letter after the 'v'. The number represents the memory size. If an 'x' appears after the number its means the index register can be from VR128X/VR256X instead of VR128/VR256.
As vy512mem uses a VR256X index it should have an x.
And vz256mem uses a VR512 index so it shouldn't have an x.
I admit these names kind of suck and are confusing.
llvm-svn: 334120
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register.
llvm-svn: 334119
Summary:
This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
It contains only context for fsqrt.
Reviewers: spatel, hfinkel, arsenm
Reviewed By: spatel
Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai
Differential Revision: https://reviews.llvm.org/D47749
llvm-svn: 334113
Make TII isCopyInstr() return MachineOperands through pointer to pointer
instead via reference.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D47364
llvm-svn: 334105
If no alignment is set, the abi/preferred alignment of structs will be
used which may be higher than required. This can lead to extra padding
and in the end an increase in data size.
Differential Revision: https://reviews.llvm.org/D47633
llvm-svn: 334099
We should never get different CodeGen based on whether the code is being
compiled in debug mode so we must skip over @llvm.dbg.value (and similar)
calls.
Should fix at least the worst part of PR37690.
llvm-svn: 334090
Only the bottom 16-bits of BEXTR's control op are required (0:8 INDEX, 15:8 LENGTH).
Differential Revision: https://reviews.llvm.org/D47690
llvm-svn: 334083
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.
Differential Revision: https://reviews.llvm.org/D44928
llvm-svn: 334078
Add minimal support to lower function calls.
Support only functions with arguments/return that go through registers
and have type i32.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D45627
llvm-svn: 334071
Fuchsia doesn't use __clear_cache, instead it provide zx_cache_flush
system call. Use it to flush instruction cache.
Differential Revision: https://reviews.llvm.org/D47753
llvm-svn: 334068
This is a fix for the problem arising in D47374 (PR37678):
https://bugs.llvm.org/show_bug.cgi?id=37678
We may not have throughput info because it's not specified in the model
or it's not available with variant scheduling, so assume that those
instructions can execute/complete at max-issue-width.
Differential Revision: https://reviews.llvm.org/D47723
llvm-svn: 334055
Summary: As it turns out, the lowering for the Mips16* family of target is the exact same thing as what the ops expands to, so the code handling them can be removed and the ops only enabled for the MipsSE* family of targets.
Reviewers: smaksimovic, atanasyan, abeserminji
Subscribers: sdardis, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D47703
llvm-svn: 334052
CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions.
Differential Revision: https://reviews.llvm.org/D45537
This is re-commit of r331783, which was reverted by r333305. The performance regression was caused by some unlucky alignment, not a code generation problem.
llvm-svn: 334049
There was only one place in the entire codebase where a non
default value was being passed, and that place was already hidden
in an implementation file. So we can delete the extra parameter
and all existing clients continue to work as they always have,
while making the interface a bit simpler.
Differential Revision: https://reviews.llvm.org/D47789
llvm-svn: 334046
Preserves the low bound of the !range. I don't think
it's legal to do anything with the top half since it's
theoretically reading garbage.
llvm-svn: 334045
Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
Reviewers: spatel, hfinkel
Reviewed By: spatel
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D47389
llvm-svn: 334037
Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support.
Reduces the serial nature of the codegen, which relies on chains of plendvb/pand+pandn+por shifts.
This is a step towards adding support for vXi16 vector rotates.
Differential Revision: https://reviews.llvm.org/D47546
llvm-svn: 334023
Summary:
Allow extended parsing of variable assembler assignment syntax and modify X86 to permit
VAR = register assignment. As we emit these as .set directives when possible, we inline
such expressions in output assembly.
Fixes PR37425.
Reviewers: rnk, void, echristo
Reviewed By: rnk
Subscribers: nickdesaulniers, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47545
llvm-svn: 334022
When legalizing illegal FP load results, this was
for some reason dropping the invariant and dereferencable
memory flags. There doesn't seem to be any reason for this,
and the equivalent isn't done for integer loads.
Fixes an issue in a future AMDGPU commit where some identical
loads fail to merge because one of the loads ends up
dropping the flags.
llvm-svn: 334020
When adjusting a cmp in order to canonicalize an abs/nabs select pattern we need
to use the type of the existing operand when creating a new operand not the
type of a select operand, as the two may be different.
This fixes PR37686.
llvm-svn: 334019
BitPermutationSelector builds the output value by repeating rotate-and-mask instructions with input registers.
Here, we may avoid one rotate instruction if we start building from an input register that does not require rotation.
For example of the test case bitfieldinsert.ll, it first rotates left r4 by 8 bits and then inserts some bits from r5 without rotation.
This can be executed by one rlwimi instruction, which rotates r4 by 8 bits and inserts its bits into r5.
This patch adds a check for rotation amounts in the comparator used in sorting to process the input without rotation first.
Differential Revision: https://reviews.llvm.org/D47765
llvm-svn: 334011
Ideally we'd use resolveTargetShuffleInputs to handle faux shuffles as well but:
(a) that code path doesn't handle general/pre-legalized ops/types very well.
(b) I'm concerned about the compute time as they recurse to calls to computeKnownBits/ComputeNumSignBits which would need depth limiting somehow.
llvm-svn: 334007
Summary:
Bringing some come duplicated in the AT&T and the Intel printers
into a common parent class.
Reviewers: craig.topper
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47682
llvm-svn: 334005
When the branch target of a Thumb2 unconditional or conditonal branch is
resolved at assembly time, no range checking is performed on the result
leading to incorrect immediates. This change adds a range check:
+- 16 Megabytes for unconditional branches, +- 1 Megabyte for the
conditional branch.
Differential Revision: https://reviews.llvm.org/D46306
llvm-svn: 333997
The Thumb BL range is + or - either 16 Megabytes or 4 Megabytes depending
on whether the CPU supports Thumb2 or the v8-m baseline ops. The existing
check for BL range is incorrectly set at +- 32 Megabytes. This change
corrects the higher range and uses the lower range if the featurebits
don't have the necessary support for it.
Differential Revision: https://reviews.llvm.org/D46305
llvm-svn: 333991
This is the new version of D46181, allowing setjmp/longjmp
to work correctly with the Intel CET shadow stack by storing
SSP on setjmp and fixing it on longjmp. The patch has been
updated to use the cf-protection-return module flag instead
of HasSHSTK, and the bug that caused D46181 to be reverted
has been fixed with the test expanded to track that fix.
patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D47311
llvm-svn: 333990
Passing -mattr=-mmx needs to disable these instructions since the MMX register class won't have been set up. But we don't want -mattr=-mmx to disable SSE so we have to do it separately.
llvm-svn: 333984
RegAlloc keeps a insertion-time ordered map of evictee information,
but we only use membership. Replace MapVector with contextually
equivalent DenseMap which is smaller and faster.
llvm-svn: 333981
Start by emitting remarks for very basic unsupported cases such as
irreducible CFGs and EHFunclets. The end goal is to be able to cover all
the cases where we give up with an explanation.
llvm-svn: 333972
We already output true and false in the printer, but the parser isn't able to
read it.
Differential Revision: https://reviews.llvm.org/D47424
llvm-svn: 333970
Code review feedback from r328123 prefers copying the few feature test
macros used by Demangle into there, rather than sinking the header into
an odd corner like Demangle.
llvm-svn: 333965
The ELF version was broken (does not deal with wasm specific fixups),
and now is slightly less broken. It will be removed in its entirety
in the future which this change makes slightly easier (just remove
the IsELF bool).
Differential Revision: https://reviews.llvm.org/D47745
Patch by Wouter van Oortmerssen
llvm-svn: 333964
As noted in rL333782, we can be both better for optimization and
safer with this transform:
BinOp (shuffle V1, Mask), C --> shuffle (BinOp V1, NewC), Mask
The only potentially unsafe-to-speculate binops are integer div/rem.
All other binops are always safe (although I don't see a way to
assert that in code here).
For opcodes like shifts that can produce poison, it can't matter
here because we know the lanes with undef are dropped by the
subsequent shuffle.
Differential Revision: https://reviews.llvm.org/D47686
llvm-svn: 333962
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)
llvm-svn: 333954
This is setting up to fix bug 37573 cleanly.
This moves data structures that are technically both used in some way by the
target and the general-purpose outlining algorithm into MachineOutliner.h. In
particular, the `Candidate` class is of importance.
Before, the outliner passed the locations of `Candidates` to the target, which
would then make some decisions about the prospective outlined function. This
change allows us to just pass `Candidates` along to the target. This will allow
the target to discard `Candidates` that would be considered unsafe before cost
calculation. Thus, we will be able to remove the unsafe candidates described in
the bug without resorting to torching the entire prospective function.
Also, as a side-effect, it makes the outliner a bit cleaner.
https://bugs.llvm.org/show_bug.cgi?id=37573
llvm-svn: 333952
Windows' CRT has a limit of 512 open file descriptors, and fds which are
generated by converting a HANDLE via _get_osfhandle count towards this
limit as well.
Regardless, often you find yourself marshalling back and forth between
native HANDLE objects and fds anyway. If we know from the getgo that
we're going to need to work directly with the handle, we can cut out the
marshalling layer while also not contributing to filling up the CRT's
very limited handle table.
On Unix these functions just delegate directly to the existing set of
functions since an fd *is* the native file type. It would be nice, very
long term, if we could convert most uses of fds to file_t.
Differential Revision: https://reviews.llvm.org/D47688
llvm-svn: 333945
Summary: This include variant for add, uaddo and addcarry. usubo and subcarry require the carry to be flipped to preserve semantic, but we chose to do the transform anyway in that case as to push the transform down the carry chain.
Reviewers: efriedma, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46505
llvm-svn: 333943
Summary: It has been deprecated in favor of SETCCCARRY for a year now and isn't used by any in tree backend.
Reviewers: efriedma, craig.topper, dblaikie, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47685
llvm-svn: 333939
entries to reach the target. Since these calls don't require type checks,
we can short-circuit them to their real targets, except in cases when they
can be pre-empted.
Differential Revision: https://reviews.llvm.org/D46326
llvm-svn: 333937
When checking a select to see if it matches an abs, allow the true/false values
to be a sign-extension of the comparison value instead of requiring that they're
directly the comparison value, as all the comparison cares about is the sign of
the value.
This fixes a regression due to r333702, where we were no longer generating ctlz
due to isKnownNonNegative failing to match such a pattern.
Differential Revision: https://reviews.llvm.org/D47631
llvm-svn: 333927
On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.
Differential Revision: https://reviews.llvm.org/D46616
llvm-svn: 333926
This patch is the last of a sequence of three patches related to LLVM-dev RFC
"MC support for variant scheduling classes".
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123181.html
This fixes PR36672.
The main goal of this patch is to teach llvm-mca how to solve variant scheduling
classes. This patch does that, plus it adds new variant scheduling classes to
the BtVer2 scheduling model to identify so-called zero-idioms (i.e. so-called
dependency breaking instructions that are known to generate zero, and that are
optimized out in hardware at register renaming stage).
Without the BtVer2 change, this patch would not have had any meaningful tests.
This patch is effectively the union of two changes:
1) a change that teaches llvm-mca how to resolve variant scheduling classes.
2) a change to the BtVer2 scheduling model that allows us to special-case
packed XOR zero-idioms (this partially fixes PR36671).
Differential Revision: https://reviews.llvm.org/D47374
llvm-svn: 333909
Summary:
The new rules are straightforward. The main rules to keep in mind
are:
1. NAME is an implicit template argument of class and multiclass,
and will be substituted by the name of the instantiating def/defm.
2. The name of a def/defm in a multiclass must contain a reference
to NAME. If such a reference is not present, it is automatically
prepended.
And for some additional subtleties, consider these:
3. defm with no name generates a unique name but has no special
behavior otherwise.
4. def with no name generates an anonymous record, whose name is
unique but undefined. In particular, the name won't contain a
reference to NAME.
Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.
The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.
Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.
Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.
Change-Id: I694095231565b30f563e6fd0417b41ee01a12589
Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D47430
llvm-svn: 333900
For immediates used in DUP instructions that have the range
-128 to 127, or a multiple of 256 in the range -32768 to 32512,
one could argue that when the result element size is 16bits (.h),
the value can be considered both signed and unsigned.
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47619
llvm-svn: 333873
Print the first indexed element as a FP register, for example:
mov z0.d, z1.d[0]
Is now printed as:
mov z0.d, d1
Next to printing, this patch also adds aliases to parse 'mov z0.d, d1'.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47571
llvm-svn: 333872
Unpredicated copy of indexed SVE element to SVE vector,
along with MOV-aliases.
For example:
dup z0.h, z1.h[0]
duplicates the first 16-bit element from z1 to all elements in
the result vector z0.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D47570
llvm-svn: 333871
Predicated copy of floating-point immediate value to SVE vector,
along with MOV-aliases.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: javed.absar
Differential Revision: https://reviews.llvm.org/D47518
llvm-svn: 333869
Predicated copy of possibly shifted immediate value into SVE
vector, along with MOV-aliases.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47517
llvm-svn: 333868
When we optimize select basing on fact that div by 0 is undef
we should not traverse the instruction which are not guaranteed to
transfer execution to next instruction. Guard intrinsic is an example.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47576
llvm-svn: 333864
Applying synthetic debug info before the bitcode writer pass has no
testing-related purpose. This commit prevents that from happening.
It also adds tests which check that IR produced with/without
-debugify-each enabled is identical after stripping. This makes it
possible to check that individual passes (or full pipelines) are
invariant to debug info.
llvm-svn: 333861
pre-existing SymbolFlags and SymbolToDefinition maps.
This constructor is useful when delegating work from an existing
IRMaterialiaztionUnit to a new one, as it avoids the cost of re-computing these
maps.
llvm-svn: 333852
There's a patchwork of existing transforms trying to handle
these cases, but as seen in the changed test, we weren't
catching them all.
llvm-svn: 333845
Summary: This creates a small perf regression, but after talking with Jacques Pienaar, he was good with it to get things moving toward removng SETCCE.
Reviewers: jpienaar, bryant
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47626
llvm-svn: 333838
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47120
llvm-svn: 333825
Object FIle Representation
At codegen time this is emitted into the ELF file a pair of symbol indices and a weight. In assembly it looks like:
.cg_profile a, b, 32
.cg_profile freq, a, 11
.cg_profile freq, b, 20
When writing an ELF file these are put into a SHT_LLVM_CALL_GRAPH_PROFILE (0x6fff4c02) section as (uint32_t, uint32_t, uint64_t) tuples as (from symbol index, to symbol index, weight).
Differential Revision: https://reviews.llvm.org/D44965
llvm-svn: 333823
As noted in the review thread for rL333782, we could have
made a bug harder to hit if we were simplifying instructions
before trying other folds.
The shuffle transform in question isn't ever a simplification;
it's just a canonicalization. So I've renamed that to make that
clearer.
This is NFCI at this point, but I've regenerated the test file
to show the cosmetic value naming difference of using
instcombine's RAUW vs. the builder.
Possible follow-ups:
1. Move reassociation folds after simplifies too.
2. Refactor common code; we shouldn't have so much repetition.
llvm-svn: 333820
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47121
llvm-svn: 333819
These instructions are unusual in that they operate on 4 consecutive registers so supporting them in codegen will be more difficult than normal.
Includes an assembler check to warn if the source register is not the first register of a 4 register group.
llvm-svn: 333812
Summary:
I noticed this issue because we didn't put the primary cloned loop into
the `NonChildClonedLoops` vector and so never iterated on it. Once
I fixed that, it made it clear why I had to do a really complicated and
unnecesasry dance when updating the loops to remain in canonical form --
I was unwittingly working around the fact that the primary cloned loop
wasn't in the expected list of cloned loops. Doh!
Now that we include it in this vector, we don't need to return it and we
can consolidate the update logic as we correctly have a single place
where it can be handled.
I've just added a test for the iteration order aspect as every time
I changed the update logic partially or incorrectly here, an existing
test failed and caught it so that seems well covered (which is also
evidenced by the extensive working around of this missing update).
Reviewers: asbirlea, sanjoy
Subscribers: mcrosier, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47647
llvm-svn: 333811
and using the latter in DIBuilder::createArtificialType and
DIBuilder::createObjectPointerType methods as well as introducing
mirroring DISubprogram::cloneWithFlags and
DIBuilder::createArtificialSubprogram methods.
The primary goal here is to add createArtificialSubprogram to support
a pass downstream while keeping the method consistent with the
existing ones and making sure we don't encourage changing already
created DI-nodes.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D47615
llvm-svn: 333806
Previously we just returned undef, but really we should be returning the pass thru input. We also need to make sure we preserve the chain output that the original intrinsic node had to maintain connectivity in the DAG. So we should just return the incoming chain as the output chain.
llvm-svn: 333804
The idea behind WindowsSupport.h is that it's in the source directory so
that windows.h'isms don't leak out into the larger LLVM project. To that
end, any symbol that references a symbol from windows.h must be in this
private header, and not in a public header.
However, we had some useful utility functions in WindowsSupport.h which
have no dependency on the Windows API, but still only make sense on
Windows. Those functions should be usable outside of Support since there
is no risk of causing a windows.h leak. Although this introduces some
preprocessor logic in some header files, It's not too egregious and it's
better than the alternative of duplicating a ton of code.
Differential Revision: https://reviews.llvm.org/D47662
llvm-svn: 333798
Summary:
Getelementptr returns a vector of pointers, instead of a single address,
when one or more of its arguments is a vector. In such case it is not
possible to simplify the expression by inserting a bitcast of operand(0)
into the destination type, as it will create a bitcast between different
sizes.
Reviewers: majnemer, mkuper, mssimpso, spatel
Reviewed By: spatel
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D46379
llvm-svn: 333783
This bug:
https://bugs.llvm.org/show_bug.cgi?id=37648
...was created with the enhancement to this transform with rL332479.
The urem test shows the disaster potential: any undef divisor lane makes
the whole op undef.
The test diffs show that vector demanded elements turns some of the potential,
but not all, unused binop operands back into undef already.
llvm-svn: 333782
The `MipsAsmParser::loadImmediate` can load immediates of various sizes
into a register. Idea of this change is to use `loadImmediate` in the
`MipsAsmParser::expandMemInst` method to load offset into a register and
then call required load/store instruction.
The patch removes separate `expandLoadInst` and `expandStoreInst`
methods and does everything in the `expandMemInst` method to escape code
duplication.
Differential Revision: https://reviews.llvm.org/D47316
llvm-svn: 333774
Supporting GOT and TLS related relocations by the `.reloc` directive is
useful for purpose of testing various tools like a linker, for example.
llvm-svn: 333773
Summary:
Emit summaries for bitcode modules that are only destined for the
regular LTO portion of the build so they can participate in
summary-based dead stripping.
This change reduces the size of a nacl_helper build with cfi-icall
enabled by 7%, removing the majority of the overhead due to enabling
cfi-icall. The cfi-icall size increase was caused by compiling in lots
of unused code and cfi-icall generating jumptable references to unused
symbols that could no longer be removed by -Wl,-gc-sections. Increasing
the visibility of summary-based dead stripping prevented jumptable
entries being created for unused symbols from the regular LTO portion
of the build.
Reviewers: pcc
Reviewed By: pcc
Subscribers: dschuff, mehdi_amini, inglorion, eraman, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D47594
llvm-svn: 333768
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.
Target that uses these opcodes are changed in order to ensure their behavior doesn't change.
Reviewers: efriedma, craig.topper, dblaikie, bkramer
Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D47422
llvm-svn: 333748
Before we were relying on the any extend of the s1 to s32, but
for AAPCS we need to zero-extend it to at least s8.
Fixes PR36719
Differential Revision: https://reviews.llvm.org/D47425
llvm-svn: 333747
Unpredicated copy of floating-point immediate value into SVE vector,
along with MOV-aliases.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47482
llvm-svn: 333744
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 333740
Summary:
Back when we were introducing the DWARF v5 name index, there was a
short discussion whether we shouldn't have a nicer api for iterating
over the index. At that time, I did not find it necessary since the
iteration over names was done only from within the index itself (and I
figured the internal implementation can deal with a slightly rough
interface).
However, now I ran into a use for this kind of API in LLDB (for finding
all names matching a regular expression), so it looked like a nice
opportunity to introduce one. To make the API more useful, I've made the
NameTableEntry class a bit smarter: it now stores the string section
reference (so it can return its name) and its position in the name index
(mainly useful for dumping/logging).
I also convert the internal users to use the new API, which also gives
test coverage for the added code.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47590
llvm-svn: 333738
A 5-bit value can occur when EVEX.X is 0 due to it being used to extend modrm.rm to encode XMM16-31. But if modrm.rm instead encodes a GPR, the Intel documentation says EVEX.X should be ignored so just mask it to 4 bits once we know its a GPR.
llvm-svn: 333725
This was an accidental side effect of EVEX.X being used to encode XMM16-XMM31 using modrm.rm with modrm.mod==0x3.
I think there are still more bugs related to this.
llvm-svn: 333722