Clang already stopped using these a couple months ago.
The test cases aren't great as there is nothing forcing the operations to stay in k-registers so some of them moved back to scalar ops due to the bitcasts being moved around.
llvm-svn: 324177
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.
Differential Revision: https://reviews.llvm.org/D42424
llvm-svn: 324174
Without extra instructions and uses, swapMayExposeCSEOpportunities() would change
the icmp (as seen in the check lines), so we were not actually testing patterns
that should be handled by D41480.
llvm-svn: 324143
Turns out I misunderstood the flag behavior of PTEST because I read the documentation for KORTEST which is different than PTEST/KTEST and made a bad assumption.
Keep the test rename though cause that's useful.
llvm-svn: 324129
Summary:
We should always be able to accept AVX512 registers and instructions in llvm-mc. The only subtarget mode that should be checked is 16-bit vs 32-bit vs 64-bit mode.
I've also removed all the mattr/mcpu lines from test RUN lines to be consistent with this. Most were due to AVX512, but a few were for other features.
Fixes PR36202
Reviewers: RKSimon, echristo, bkramer
Reviewed By: echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42824
llvm-svn: 324106
This patch implements analysis for new-format TBAA access tags
with aggregate types as their final access types.
Differential Revision: https://reviews.llvm.org/D41501
llvm-svn: 324092
This fixes PR36187.
Patch teaches ThinLTO to drop non-prevailing variables,
just like we recently did for functions (in r323633).
Differential revision: https://reviews.llvm.org/D42798
llvm-svn: 324075
Summary:
When creating the debug fragments for a SRA'd variable, use the types'
allocation sizes. This fixes issues where the pass would emit too small
fragments, placed at the wrong offset, for padded types.
An example of this is long double on x86. The type is represented using
x86_fp80, which is 10 bytes, but the value is aligned to 12/16 bytes.
The padding is included in the type's DW_AT_byte_size attribute;
therefore, the fragments should also include that. Newer GCC releases
(I tested 7.2.0) emit 12/16-byte pieces for long double. Earlier
releases, e.g. GCC 5.5.0, behaved as LLVM did, i.e. by emitting a
10-byte piece, followed by an empty 2/6-byte piece for the padding.
Failing to cover all `DW_AT_byte_size' bytes of a value with non-empty
pieces results in the value being printed as <optimized out> by GDB.
Patch by: David Stenberg
Reviewers: aprantl, JDevlieghere
Reviewed By: aprantl, JDevlieghere
Subscribers: llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D42807
llvm-svn: 324066
When handling vectors with non byte-sized elements, reverse the order of the
elements in the built integer if the target is Big-Endian.
SystemZ tests updated.
Review: Eli Friedman, Ulrich Weigand.
https://reviews.llvm.org/D42786
llvm-svn: 324063
test/CodeGen/SystemZ/vec-trunc-to-i1.ll was marked as a temporary
FAIL when it was previously updated when it needed one more COPY.
This was however wrong, since the loop body had been reduced
significantly, and it was actually an improvement.
Review: Ulrich Weigand.
llvm-svn: 324060
llvm-objdump could get C feature by ELF::EF_RISCV_RVC e_flag,
so then we don't have to add -mattr=+c on the command line.
Differential Revision: https://reviews.llvm.org/D42629
llvm-svn: 324058
This fixes a crash where the user is a COPY, which deliberately does not
constrain its source operands, resulting in a vreg without a reg class escaping
selection.
Differential Revision: https://reviews.llvm.org/D42697
llvm-svn: 324047
Example situation:
```
BB0:
%0 = ...
use %0
; ...
condjump BB1
jmp BB2
BB1:
%0 = ... ; rematerialized def from above (from earlier split step)
jmp BB2
BB2:
; ...
use %0
```
%0 will have a live interval with 3 value numbers (for the BB0, BB1 and
BB2 parts). Now SplitKit tries and succeeds in rematerializing the value
number in BB2 (This only works because it is a secondary split so
SplitKit is can trace this back to a single original def).
We need to recompute all live ranges affected by a value number that we
rematerialize. The case that we missed before is that when the value
that is rematerialized is at a join (Phi VNI) then we also have to
recompute liveness for the predecessor VNIs.
rdar://35699130
Differential Revision: https://reviews.llvm.org/D42667
llvm-svn: 324039
Summary:
This update now allows users to specify `--blame-context` and `--blame-context-all` to print source file blame information for the source of the blame.
Also updates the inline printing to correctly identify the top of the inlining stack for blame information.
Patch by Mitch Phillips!
Reviewers: vlad.tsyrklevich
Subscribers: llvm-commits, kcc, pcc
Differential Revision: https://reviews.llvm.org/D40111
llvm-svn: 324035
This is the enhancement suggested in D42536 to fix a shortcoming in
regular InstCombine's canEvaluate* functionality.
When we have multiple uses of a value, but they're all in one instruction, we can
allow that expression to be narrowed or widened for the same cost as a single-use
value.
AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified
away if the operands are the same value; add becomes shl; shifts with a variable shift
amount aren't handled.
Differential Revision: https://reviews.llvm.org/D42739
llvm-svn: 324014
This is a rather non-controversial change. We were missing these instructions
from the list of instructions that are lane-sensitive. These two put the result
into lane 0 (BE) or 3 (LE) regardless of the input. This patch fixes PR36068.
llvm-svn: 324005
We were only checking the element count, but not the total width. This could cause illegal bitcasts to be created if for example the output was 512-bits, but N1 is 256 bits, and the extraction size was 128-bits.
Fixes PR36199
Differential Revision: https://reviews.llvm.org/D42809
llvm-svn: 324002
Until we support extending loads properly we're going to fall back for these.
We already handle stores in the same way, so this is just being consistent.
llvm-svn: 324001
Increment the field list member count for base classes and virtual base
classes.
Differential Revision: https://reviews.llvm.org/D41874
llvm-svn: 324000
Summary:
This change extends MachineCopyPropagation to do COPY source forwarding
and adds an additional run of the pass to the default pass pipeline just
after register allocation.
This version of this patch uses the newly added
MachineOperand::isRenamable bit to avoid forwarding registers is such a
way as to violate constraints that aren't captured in the
Machine IR (e.g. ABI or ISA constraints).
This change is a continuation of the work started in D30751.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar
Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits
Differential Revision: https://reviews.llvm.org/D41835
llvm-svn: 323991
This allows us to use PSHUFB for v8i16/v4i32 and VPERMD/PERMPS for v4i64/v4f64 variable shuffles.
Differential Revision: https://reviews.llvm.org/D42487
llvm-svn: 323987
Summary:
EmitTest sometimes creates X86ISD::AND specifically to hide the AND from DAG combine. But this prevents isel patterns that look for (cmp (and X, Y), 0) from being able to see it. So we end up with an AND and a TEST. The TEST gets removed by compare instruction optimization during the peephole pass.
This patch attempts to fix this by converting X86ISD::AND with no flag users back into ISD::AND during the DAG preprocessing just before isel.
In order to do this correctly I had to make the X86ISD::AND node created by EmitTest in this case really have a flag output. Which arguably it should have had anyway so that the number of operands would be consistent for the opcode in all cases. Then I had to modify the ReplaceAllUsesWith to understand that we might be looking at an instruction with 2 outputs. Though in this case there are no uses to replace since we just created the node, but that's what the code did before so I just made it keep working.
Reviewers: spatel, RKSimon, niravd, deadalnix
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42764
llvm-svn: 323982
As shown in the example in PR34994:
https://bugs.llvm.org/show_bug.cgi?id=34994
...we can return a very wrong answer (inf instead of 0.0) for square root when
using a reciprocal square root estimate instruction.
Here, I've conditionalized the filtering out of denorms based on the function
having "denormal-fp-math"="ieee" in its attributes. The other options for this
attribute are 'preserve-sign' and 'positive-zero'.
So we don't generate this extra code by default with just '-ffast-math' (because
then there's no denormal attribute string at all), but it works if you specify
'-ffast-math -fdenormal-fp-math=ieee' from clang.
As noted in the review, there may be other problems in clang that affect the
results depending on platform (Linux x86 at least), but this should allow
creating the desired codegen.
Differential Revision: https://reviews.llvm.org/D42323
llvm-svn: 323981
Summary:
In Instruction Selection UpdateChains replaces all matched Nodes'
chain references including interior token factors and deletes them.
This may allow nodes which depend on these interior nodes but are not
part of the set of matched nodes to be left with a dangling dependence.
Avoid this by doing the replacement for matched non-TokenFactor nodes.
Fixes PR36164.
Reviewers: jonpa, RKSimon, bogner
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D42754
llvm-svn: 323977
Commit r323512 introduced an optimisation in LowerReturn for half-precision
return values. A missing check caused a crash when the return value is "undef"
(i.e. a node that has no operands).
Differential Revision: https://reviews.llvm.org/D42743
llvm-svn: 323968
This patch includes EVA instructions in the Std2MicroMips mapping
tables, which is required for direct object emission.
Differential Revision: https://reviews.llvm.org/D41771
llvm-svn: 323958
This fixes bugzilla 33011
https://bugs.llvm.org/show_bug.cgi?id=33011
Defines bits {19-16} as zero or unpredictable as specified by the ARM ARM in
sections A8.8.116 and A8.8.117.
It fixes also the usage of PC register as destination register for MVN
register-shifted register version as specified in A8.8.117.
Differential Revision: https://reviews.llvm.org/D41905
llvm-svn: 323954
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.
Differential Revision: https://reviews.llvm.org/D42424
llvm-svn: 323951
Summary:
Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.
For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as a pointer and the integer as index.
Patch by: Bevin Hansson
Reviewers: atrick, qcolombet, sanjoy
Reviewed By: qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42103
llvm-svn: 323946
Fix the infinite loop reported in PR35809. It can occur with GCC-style
EH table assembly, where the compiler relies on the assembler to
calculate the offsets in the EH table.
Also see https://sourceware.org/bugzilla/show_bug.cgi?id=4029 for the
equivalent issue in the GNU assembler.
Patch by Ryan Prichard!
llvm-svn: 323934
This covers the case where TruncInst leaf node is a constant expression.
See PR36121 for more details.
Differential Revision: https://reviews.llvm.org/D42622
llvm-svn: 323926
Discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.
llvm-svn: 323922
Summary:
This removes the need for a machine module pass using some deeply
questionable hacks. This should address PR36123 which is a case where in
full LTO the memory usage of a machine module pass actually ended up
being significant.
We should revert this on trunk as soon as we understand and fix the
memory usage issue, but we should include this in any backports of
retpolines themselves.
Reviewers: echristo, MatzeB
Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42726
llvm-svn: 323915
Summary:
Call MRI.freezeReservedRegs() on functions created during outlining so
that calls to isReserved() by the verifier called after this pass won't
assert.
Reviewers: MatzeB, qcolombet, paquette
Subscribers: mcrosier, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42749
llvm-svn: 323905
This change is useful for the upcoming addition of the symbol
table (D41954) since in that world aliases for given function
all share the same function index.
This change does not effect lld because it essentially ignores
the wasm "table". The table exists only to the wasm objects
will validate and disassembly meaningfully.
Patch by Nicholas Wilson!
Differential Revision: https://reviews.llvm.org/D42095
llvm-svn: 323900
Summary:
This was introduced in D42646 but ended up being reverted because the original implementation was buggy.
Depends on D42646
Reviewers: craig.topper, niravd, spatel, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42741
llvm-svn: 323899
Since r322087, glibc's finite lib calls are generated when possible.
However, they are not supported on Android. This change also
disables other functions not available on Android.
Differential Revision: http://reviews.llvm.org/D42668
llvm-svn: 323898
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
CodeGenPrepare pass to be more aggressive in improving the source and destination alignments
of memcpy/memmove/memset by exploiting our new ability to record independent alignments
for each argument.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 323891
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.
Reviewers: craig.topper, niravd, spatel, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42646
llvm-svn: 323888
Selecting of constant HVX vectors involves some "manual processing",
which mishandled an unrelated BITCAST operation causing a selection
error.
llvm-svn: 323887
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the Lint
analysis to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting
source & dest specific alignments through the new API.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 323886
This commit came as a result for revert of patch r317579 (originally
committed as r317100). The patch made CFI instructions duplicable, because
their existence in the epilogue block was affecting the Tail duplication
pass. However, duplicating blocks with CFI instructions was an issue for
compact unwind info on Darwin, which is why the patch was reverted.
This patch allows duplicating tails with CFI instructions, though they are
not duplicable, by copying them 'manually'.
Patch by Djordje Kovacevic.
Differential Revision: https://reviews.llvm.org/D40979
llvm-svn: 323883
Summary:
Instruction Selection preserves relative orders of all nodes save
TokenFactors which we treat specially. As a result Node Ids for
TokenFactors may violate the topological ordering and should not be
considered as valid pruning candidates in predecessor search.
Fixes PR35316.
Reviewers: RKSimon, hfinkel
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42701
llvm-svn: 323880
In D41587, @mssimpso discovered that the order of some patterns for
AArch64 was sub-optimal. I thought a bit about how we could avoid that
case in the future. I do not think there is a need for evaluating all
patterns for now. But this patch adds an extra (expensive) check, that
evaluates the latencies of all patterns, and ensures that the latency
saved decreases for subsequent patterns.
This catches the sub-optimal order fixed in D41587, but I am not
entirely happy with the check, as it only applies to sub-optimal
patterns seen while building with EXPENSIVE_CHECKS on. It did not
discover any other sub-optimal pattern ordering.
Reviewers: Gerolf, spatel, mssimpso
Reviewed By: Gerolf, mssimpso
Differential Revision: https://reviews.llvm.org/D41766
llvm-svn: 323873
When selecting a split candidate for region splitting, the register allocator tries to predict which candidate will have the cheapest spill cost.
Global splitting may cause the creation of local intervals, and they might spill.
This patch makes RA take into account the spill cost of local split intervals in use blocks (we already take into account the spill cost in through blocks).
A flag ("-condsider-local-interval-cost") controls weather we do this advanced cost calculation (it's on by default for X86 target, off for the rest).
Differential Revision: https://reviews.llvm.org/D41585
Change-Id: Icccb8ad2dbf13124f5d97a18c67d95aa6be0d14d
llvm-svn: 323870
Summary:
Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves
In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.
Patch by Marten Svanfeldt.
Reviewers: fhahn, pbarrio
Reviewed By: pbarrio
Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42574
llvm-svn: 323869
Since these methods will assert if the integer does not fit into 64 bits,
it is necessary to do this check before calling them in
supportedAddressingMode().
Review: Ulrich Weigand.
llvm-svn: 323866
Because dead code may contain non-standard IR that causes infinite looping or crashes in underlying analysis.
See PR36134 for more details.
Differential Revision: https://reviews.llvm.org/D42683
llvm-svn: 323862
Half-precision arguments and return values are passed as if it were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.
Differential Revision: https://reviews.llvm.org/D42580
llvm-svn: 323861
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: Nemanja Ivanovic
llvm-svn: 323858
In Thumb 1, with the new ADDCARRY / SUBCARRY the scheduler may need to do
copies CPSR ↔ GPR but not all Thumb1 targets implement them.
The schedule can attempt, before attempting a copy, to clone the instructions
but it does not currently do that for nodes with input glue. In this patch we
introduce a target-hook to let the hook decide if a glued machinenode is still
eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS .
As a follow-up of this change we should actually implement the copies for the
Thumb1 targets that do implement them and restrict the hook to the targets that
can't really do such copy as these clones are not ideal.
This change fixes PR35836.
Differential Revision: https://reviews.llvm.org/D42051
llvm-svn: 323857
These were introduced in r323783 and use an X86 triple. I'll follow up
on the list to check if it would make more sense to remove the triple
and mark them REQUIRES: default_triple instead.
llvm-svn: 323847
When a the Apple link editor builds a kext bundle file type and the
value of the -miphoneos-version-min argument is significantly current
(like 11.0) then the (__TEXT,__text) section is changed to the
(__TEXT_EXEC,__text) section. So it would be nice for llvm-nm to
show symbols in that section with a type of T instead of the generic
type of S for some section other than text, data, etc.
rdar://36262205
llvm-svn: 323836
Sometimes users do not specify data layout in LLVM assembly and let llc set the
data layout by target triple after loading the LLVM assembly.
Currently the parser checks alloca address space no matter whether the LLVM
assembly contains data layout definition, which causes false alarm since the
default data layout does not contain the correct alloca address space.
The parser also calls verifier to check debug info and updating invalid debug
info. Currently there is no way to let the verifier to check debug info only.
If the verifier finds non-debug-info issues the parser will fail.
For llc, the fix is to remove the check of alloca addr space in the parser and
disable updating debug info, and defer the updating of debug info and
verification to be after setting data layout of the IR by target.
For other llvm tools, since they do not override data layout by target but
instead can override data layout by a command line option, an argument for
overriding data layout is added to the parser. In cases where data layout
overriding is necessary for the parser, the data layout can be provided by
command line.
Differential Revision: https://reviews.llvm.org/D41832
llvm-svn: 323826
Summary: ThinLTO may skip object for other reasons, e.g. if there is no summary.
Reviewers: pcc, eugenis
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D42514
llvm-svn: 323818
Summary:
This is exposed during ThinLTO compilation, when we import an alias by
creating a clone of the aliasee. Without this fix the debug type is
unnecessarily cloned and we get a duplicate, undoing the uniquing.
Fixes PR36089.
Reviewers: mehdi_amini, pcc
Subscribers: eraman, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41669
llvm-svn: 323813
Passing -minimize to dsymutil prevents the emission of .debug_inlines,
.debug_pubnames, and .debug_pubtypes in favor of the Apple accelerator
tables.
The actual check in the DWARF linker was added in r323655. This patch
simply enables it.
Differential revision: https://reviews.llvm.org/D42688
llvm-svn: 323812
Without the patch !if() is only evaluated if it's used directly.
If it's passed through more than one level of class inheritance,
we end up with a reference to an anonymous record with unresolved
references to the original arguments !if may have used.
The root cause of the problem is that TernOpInit::isComplete()
was always returning false and that prevented use of the folded
value of !if() as an initializer for the record at the next level
of inheritance.
Differential Revision: https://reviews.llvm.org/D42695
llvm-svn: 323807
Instructions like memd(r0+##global+1) are legal as long as the entire
address is properly aligned. Assuming that "global" is aligned at an
8-byte boundary, the expression "global+1" appears to be misaligned.
Handle such cases in HexagonConstExtenders, and make sure that any non-
extended offsets generated are still aligned accordingly.
llvm-svn: 323799
This reverts r323562, since it wasn't actually necessary. Constant-
extended offsets do not need to be aligned, as long as the effective
address is aligned.
Keep the testcase, with a modification which checks that such offsets
are not unnecessarily avoided.
llvm-svn: 323798
Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types.
Differential Revision: https://reviews.llvm.org/D42526
llvm-svn: 323797
Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will
be marked as not renamable, to avoid copy forwarding violating the
constraint that only one operand may use the constant bus.
These changes fix a few mis-compiles when copy forwarding is enabled in
MachineCopyPropagation by D41835 (and were reviewed as part of that change).
llvm-svn: 323794
-amdgpu-waitcnt-forcezero={1|0} Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n> Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n> Force emit a s_waitcnt vmcnt(0) before the first <n> instrs
This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch).
Differential Revision: https://reviews.llvm.org/D40091
llvm-svn: 323788
When removing return value Dead Argument Elimination pass clobbers first
llvm.dbg.value’s argument for live arguments of that function by replacing
it with nullptr. In the next pass it will be deleted, so debug location
about those arguments are lost. This change fixes it.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D42541
llvm-svn: 323784
Introduce an extension to support passing linker options to the linker.
These would be ignored by older linkers, but newer linkers which support
this feature would be able to process the linker.
Emit a special discarded section `.linker-option`. The content of this
section is a pair of strings (key, value). The key is a type identifier for
the parameter. This allows for an argument free parameter that will be
processed by the linker with the value being the parameter. As an example,
`lib` identifies a library to be linked against, traditionally the `-l`
argument for Unix-based linkers with the parameter being the library name.
Thanks to James Henderson, Cary Coutant, Rafael Espinolda, Sean Silva
for the valuable discussion on the design of this feature.
llvm-svn: 323783
This feature enables the fusion of the address generation and a
corresponding load or store together.
Differential revision: https://reviews.llvm.org/D42393
llvm-svn: 323782
PR36061 showed that during the expansion of ISD::FPOWI, that there
was an incorrect zero extension of the integer argument which for
MIPS64 would then give incorrect results. Address this with the
existing mechanism for correcting sign extensions.
This resolves PR36061.
Thanks to James Cowgill for reporting the issue!
Reviewers: atanasyan, hfinkel
Differential Revision: https://reviews.llvm.org/D42537
llvm-svn: 323781
candidates with coldcc attribute.
This recommits r322721 reverted due to sanitizer memory leak build bot failures.
Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 323778
Summary:
There's an asymmetry in the definitions of findBaseDefiningValueOfVector() and
findBaseDefiningValue() of RS4GC. The later handles call and invoke instructions,
and the former does not. This appears to be simple oversight. This patch remedies
the oversight by adding the call and invoke cases to findBaseDefiningValueOfVector().
Reviewers: DaniilSuchkov, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42653
llvm-svn: 323764
Legal if we have hardware support for floating point, libcalls
otherwise.
Also add the necessary support for libcalls in the legalizer helper.
llvm-svn: 323726
When a function return value can't be directly lowered, such as
returning an i128 on WebAssembly, as indicated by the CanLowerReturn
target hook, SelectionDAGBuilder can translate it to return the
value through a hidden sret-like argument.
If such a function has an argument with the "returned" attribute,
the attribute can't be automatically lowered, because the function
no longer has a normal return value. For now, just discard the
"returned" attribute.
This fixes PR36128.
llvm-svn: 323715
When RAFast sees liveins in on a basic block, it uses that information
to initialize the availability of the registers. The called
method uses an instruction as one of its argument and in the liveins
case, RAFast was dereferencing MBB::begin which can be MBB::end for
empty basic block.
Change the API of definePhysReg to use MachineBasicBlock::iterator
instead of MachineInstr so that we don't dereference an
invalid iterator while making the call.
rdar://problem/36952401
llvm-svn: 323710
Patch by: Bas Nieuwenhuizen
Just use the _e64 variant if needed. This should be possible as per
def : Pat <
(int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))),
(SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
> ;
I don't think we can get an immediate for the other operand for which we
need the second 32-bit word.
https://reviews.llvm.org/D42302
llvm-svn: 323706
We currently emit up to 15-byte NOPs on all targets (apart from Silvermont), which stalls performance on some targets with decoders that struggle with 2 or 3 more '66' prefixes.
This patch flags recent AMD targets (btver1/znver1) to still emit 15-byte NOPs and bdver* targets to emit 11-byte NOPs. All other targets now emit 10-byte NOPs apart from SilverMont CPUs which still emit 7-byte NOPS.
Differential Revision: https://reviews.llvm.org/D42616
llvm-svn: 323693
Summary:
Apparently, we missed on constraining register classes of VReg-operands of all the instructions
built from a destination pattern but the root (top-level) one. The issue exposed itself
while selecting G_FPTOSI for armv7: the corresponding pattern generates VTOSIZS wrapped
into COPY_TO_REGCLASS, so top-level COPY_TO_REGCLASS gets properly constrained,
while nested VTOSIZS (or rather its destination virtual register to be exact) does not.
Fixing this by issuing GIR_ConstrainSelectedInstOperands for every nested GIR_BuildMI.
https://bugs.llvm.org/show_bug.cgi?id=35965
rdar://problem/36886530
Patch by Roman Tereshin
Reviewers: dsanders, qcolombet, rovka, bogner, aditya_nandakumar, volkan
Reviewed By: dsanders, qcolombet, rovka
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42565
llvm-svn: 323692
r323476 added support for DW_FORM_line_strp, and incorrectly made that
depend on having a DWARFUnit available. We shouldn't be tracking
.debug_line_str in DWARFUnit after all. After this patch, I can do an
NFC follow up and undo a bunch of the "plumbing" part of r323476.
Differential Revision: https://reviews.llvm.org/D42609
llvm-svn: 323691
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.
The main noteworthy regression I was able to observe was pattern of the type (setcc (trunc (and X, C)), 0) where C is such as it would benefit from the hi register trick. To prevent this, a new pattern is added to materialize such pattern using a 32 bits test. This has the added benefit of working with any constant that is materializable as a 32bits immediate, not just the ones that can leverage the high register trick, as demonstrated by the test case in test-shrink.ll using the constant 2049 .
Reviewers: craig.topper, niravd, spatel, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42646
llvm-svn: 323690
This reverts commit r322917 due to multiple performance regressions in spec2006
and spec2017. XFAILed llvm/test/CodeGen/AArch64/big-callframe.ll which initially
motivated this change.
llvm-svn: 323683
We now test that pic and static produce different results for bar.
The function names were demangled.
The attributes are written inline.
llvm-svn: 323680
Summary:
Fix a few places that were modifying code after register
allocation to set the renamable bit correctly to avoid failing the
validation added in D42449.
llvm-svn: 323675
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323662
This patch adds support for generating accelerator tables in dsymutil.
This feature was already present in our internal repository but not yet
upstreamed because it requires changes to the Apple accelerator table
implementation.
Differential revision: https://reviews.llvm.org/D42501
llvm-svn: 323655
Summary:
When emitting the location for a global variable with fragmented debug
expressions, make sure that the offset pieces, which represent
optimized-out parts of the variable, are emitted before their succeeding
fragments' expressions. Previously, if the succeeding fragment's
location was a symbol, the offset piece was emitted after, rather than
before, that symbol's expression. This effectively meant that the symbols
were associated with the wrong parts of the variable.
This fixes PR36085.
Patch by: David Stenberg
Reviewers: aprantl, probinson, dblaikie
Reviewed By: aprantl
Subscribers: JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D42527
llvm-svn: 323644
Summary: This was broken long ago in D12208, which failed to account for
the fact that 64-bit SPARC uses a stack bias of 2047, and it is the
*unbiased* value which should be aligned, not the biased one. This was
seen to be an issue with Rust.
Patch by: jrtc27 (James Clarke)
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: jacob_hansen, JDevlieghere, fhahn, fedor.sergeev, llvm-commits
Differential Revision: https://reviews.llvm.org/D39425
llvm-svn: 323643
Summary:
This modifies the dwarfdump output to align it with the new .debug_names
dump. It also renames two header fields to match similar fields in the
dwarf5 header.
A couple of tests needed to be updated to match new output. The changes
were fairly straight-forward, although not really automatable.
Reviewers: JDevlieghere, aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42415
llvm-svn: 323641
Summary:
This commit renames DWARFAcceleratorTable to AppleAcceleratorTable to free up
the first name as an interface for the different accelerator tables.
Then I add a DWARFDebugNames class for the dwarf5 table.
Presently, the only common functionality of the two classes is the dump()
method, because this is the only method that was necessary to implement
dwarfdump -debug-names; and because the rest of the
AppleAcceleratorTable interface does not directly transfer to the dwarf5
tables (the main reason for that is that the present interface assumes
the tables are homogeneous, but the dwarf5 tables can have different
keys associated with each entry).
I expect to make the common interface richer as I add more functionality
to the new class (and invent a way to represent it in generic way).
In terms of sharing the implementation, I found the format of the two
tables sufficiently different to frustrate any attempts to have common
parsing or dumping code, so presently the implementations share just low
level code for formatting dwarf constants.
Reviewers: vleschuk, JDevlieghere, clayborg, aprantl, probinson, echristo, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42297
llvm-svn: 323638
The Large System Extension added an atomic compare-and-swap instruction
that operates on a pair of 64-bit registers, which we can use to
implement a 128-bit cmpxchg.
Because i128 is not a legal type for AArch64 we have to do all of the
instruction selection in C++, and the instruction requires even/odd
register pairs, so we have to wrap it in REG_SEQUENCE and EXTRACT_SUBREG
nodes. This is very similar to what we do for 64-bit cmpxchg in the ARM
backend.
Differential revision: https://reviews.llvm.org/D42104
llvm-svn: 323634
Summary:
There's a check in the code to only check getSetCCResultType after LegalOperations or if the type is MVT::i1. But the i1 check is only allowing scalar types through. I think it should check that the scalar type is MVT::i1 so that it will work for vectors.
The changed test already does this combine with AVX512VL where getSetCCResultType returns vXi1. But with avx512f and no VLX getSetCCResultType returns a type matching the width of the input type.
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42619
llvm-svn: 323631
This pretty much reverts r322006, except that we keep the test,
because we work around the issue exposed in a different way (a
recursion limit in value tracking). There's still probably some
sequence that exposes this problem, and the proper way to fix that
for somebody who has time is outlined in the code review.
llvm-svn: 323630
This prevents functions accessing varargs from being inlined if they
have the alwaysinline attribute.
Reviewers: efriedma, rnk, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D42556
llvm-svn: 323619
We can use the same input for both operands to get a free compare with zero.
We already use this trick in a couple places where we explicitly create PTESTM with the same input twice. This generalizes it.
I'm hoping to remove the ISD opcodes and move this to isel patterns like we do for scalar cmp/test.
llvm-svn: 323605
Legalization is still biased to turn LT compares in to GT by swapping operands to avoid needing extra isel patterns to commute.
I'm hoping to remove TESTM/TESTNM next and this should simplify that by making EQ/NE more similar.
llvm-svn: 323604
If broadcasting from another shuffle, attempt to simplify it.
We can probably generalize this a lot more (embedding in combineX86ShufflesRecursively), but BROADCAST is one of the more troublesome as it accepts inputs of different sizes to the result.
llvm-svn: 323602
The code was using getValueSizeInBits and combining with the result of a call to DAG.ComputeNumSignBits. But for vector types getValueSizeInBits returns the width of the full vector while ComputeNumSignBits is going to give a number no larger than the width of a single element. So we should be using getScalarValueSizeInBits to get the element width.
llvm-svn: 323583
We weren't converting the immediate ConstantFP during legalization, which caused
the wrong bit patterns to be emitted for half type FP constants.
Fixes PR36106.
llvm-svn: 323582
Previously we had to materialize all 1s in a register using vpternlog or pcmpeq and then xor with that. By using vpternlog directly we can do it in one operation.
This is implemented using isel patterns, but we should maybe consider creating a generalized vpternlog combiner.
llvm-svn: 323572
A cast from A to B is eliminable if its result is casted to C, and if
the pair of casts could just be expressed as a single cast. E.g here,
%c1 is eliminable:
%c1 = zext i16 %A to i32
%c2 = sext i32 %c1 to i64
InstCombine optimizes away eliminable casts. This patch teaches it to
insert a dbg.value intrinsic pointing to the final result, so that local
variables pointing to the eliminable result are preserved.
Differential Revision: https://reviews.llvm.org/D42566
llvm-svn: 323570
A correctly aligned address may happen to be separated into a variable
part and a constant part, where the constant part does not match the
alignment needed in a load/store that uses this address. Such a constant
cannot be used as an immediate offset in an indexed instruction.
When lowering a global address, make sure that if there is an offset
folded into the global, the offset is valid for all uses in load/store
instructions.
llvm-svn: 323562
One common source of blocks with no successors is calls to noreturn
functions; we want to preserve pristine registers in case they throw an
exception.
The whole pristine register thing is messy (we should really prefer to
explicitly model registers), but this fills a hole in the model for now.
Fixes https://bugs.llvm.org/show_bug.cgi?id=36073.
Differential Revision: https://reviews.llvm.org/D42509
llvm-svn: 323559
Summary: This is the producer side for DWARF v5 string offsets tables. The reader/consumer
side was committed with r321295. All compile and type units in a module share a
contribution to the string offsets table. Indirect strings use the strx{1,2,3,4} index forms.
Reviewers: dblaikie, aprantl, JDevliegehere
Differential Revision: https://reviews.llvm.org/D42021
llvm-svn: 323546
We currently coalesce v4i32 extracts from all 4 elements to 2 v2i64 extracts + shifts/sign-extends.
This seems to have been added back in the days when we tended to spill vectors and reload scalars, or ended up with repeated shuffles moving everything down to 0'th index. I don't think either of these are likely these days as we have better EXTRACT_VECTOR_ELT and VECTOR_SHUFFLE handling, and the existing code tends to make it very difficult for various vector and load combines.
Differential Revision: https://reviews.llvm.org/D42308
llvm-svn: 323541
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323530
Add support for printing / parsing the addrspace of a MachineMemOperand.
Fixes PR35970.
Differential Revision: https://reviews.llvm.org/D42502
llvm-svn: 323521
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed too common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic
Reviewed by: b-sumner
Differential Revision: https://reviews.llvm.org/D42383
llvm-svn: 323516
load instruction
The function `Thumb1InstrInfo::loadRegFromStackSlot` accepts only the `tGPR`
register class. The function serves to emit a `tLDRspi` instruction and
certainly any subset of the `tGPR` register class is a valid destination of the
load.
Differential revision: https://reviews.llvm.org/D42535
llvm-svn: 323514
This is the groundwork for Armv8.2-A FP16 code generation .
Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:
_Float16 sub(_Float16 a, _Float16 b) {
return a + b;
}
gets lowered to this:
define float @sub(float %a.coerce, float %b.coerce) {
entry:
%0 = bitcast float %a.coerce to i32
%tmp.0.extract.trunc = trunc i32 %0 to i16
%1 = bitcast i16 %tmp.0.extract.trunc to half
<SNIP>
%add = fadd half %1, %3
<SNIP>
}
When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.
When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.
As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issue before in
the FP16 and FullFP16 cases.
I've also added match rules to the VLDRH and VSTRH desriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not be implemented at all, so that
has also been added.
This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.
Thanks to Sam Parker and Oliver Stannard for their help and reviews!
Differential Revision: https://reviews.llvm.org/D38315
llvm-svn: 323512
When pass creates a MOV instruction for
lea (%base,%index,1), %dst => mov %base,%dst; add %index,%dst
modification it should clean the killed flag for base
if base is equal to index.
Otherwise verifier complains about usage of killed register in add instruction.
Reviewers: lsaba, zvi, zansari, aaboud
Reviewed By: lsaba
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42522
llvm-svn: 323497
Inserting a dbg.value instruction at the start of a basic block with a
landingpad instruction triggers a verifier failure. We should be OK if
we insert the instruction a bit later.
Speculative fix for the bot failure described here:
https://reviews.llvm.org/D42551
llvm-svn: 323482
While writing code for input and output formats in llvm-objcopy it became
apparent that there was a code health problem. This change attempts to solve
that problem by refactoring the code to use Reader and Writer objects that can
read in different objects in different formats, convert them to a single shared
internal representation, and then write them to any other representation.
New classes:
Reader: the base class used to construct instances of the internal
representation
Writer: the base class used to write out instances of the internal
representation
ELFBuilder: a helper class for ELFWriter that takes an ELFFile and converts it
to a Object
SectionVisitor: it became necessary to remove writeSection from SectionBase
because, under the new Reader/Writer scheme, it's possible to convert between
ELF Types such as ELF32LE and ELF32BE. This isn't possible with writeSection
because it (dynamically) depends on the underlying section type *and*
(statically) depends on the ELF type. Bad things would happen if the underlying
sections for ELF32LE were used for writing to ELF64BE. To avoid this code smell
(which would have compiled, run, and output some nonsesnse) I decoupled writing
of sections from a class.
SectionWriter: This is just the ELFT templated implementation of
SectionVisitor. Many classes now have this class as a friend so that the
writing methods in this class can write out private data.
ELFWriter: This is the Writer that outputs to ELF
BinaryWriter: This is the Writer that outputs to Binary
ElfType: Because the ELF Type is not a part of the Object anymore we need a way
to construct the correct default Writer based on properties of the Reader. This
enum just keeps track of the ELF type of the input so it can be used as the
default output type as well.
Object has correspondingly undergone some serious changes as well. It now has
more generic methods for building and manipulating ELF binaries. This interface
makes ELFBuilder easy enough to use and will make the BinaryReader/Builder easy
to create as well. Most changes in this diff are cosmetic and deal with the
fact that a method has been moved from one class to another or a change from a
pointer to a reference. Almost no changes should result in a functional
difference (this is after all a refactor). One minor functional change was made
and the result can be seen in remove-shstrtab-error.test. The fact that it
fails hasn't changed but the error message has changed because that failure is
detected at a later point in the code now (because WriteSectionHeaders is a
property of the ElfWriter *not* a property of the Object). I'd say roughly
80-90% of this code is cosmetically different, 10-19% is different but
functionally the same, and 1-5% is functionally different despite not causing a
change in tests.
Differential Revision: https://reviews.llvm.org/D42222
llvm-svn: 323480
This form is like DW_FORM_strp, but points to .debug_line_str instead
of .debug_str as the string section. It's intended to be used from
the line-table header, and allows string-pooling of directory and
filenames across compilation units.
Differential Revision: https://reviews.llvm.org/D42553
llvm-svn: 323476
This patch is an enhancement to propagate dbg.value information when
Phis are created on behalf of LCSSA. I noticed a case where a value
carried across a loop was reported as <optimized out>.
Specifically this case:
int bar(int x, int y) {
return x + y;
}
int foo(int size) {
int val = 0;
for (int i = 0; i < size; ++i) {
val = bar(val, i); // Both val and i are correct
}
return val; // <optimized out>
}
In the above case, after all of the interesting computation completes
our value is reported as "optimized out." This change will add a
dbg.value to correct this.
This patch also moves the dbg.value insertion routine from
LoopRotation.cpp into Local.cpp, so that we can share it in both places
(LoopRotation and LCSSA).
Patch by Matt Davis!
Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 323472
The asm parser puts the lock prefix in the MCInst flags so we need to check that in addition to TSFlags. This matches what the ATT printer does.
llvm-svn: 323469
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323441
This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792
Alive proofs for the cases handled here as well as the bitwise
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97https://rise4fun.com/Alive/Lc5Ehttps://rise4fun.com/Alive/kdf
llvm-svn: 323437
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323430