There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.
rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951
llvm-svn: 309426
The conditional tail call logic did the wrong thing when both
destinations of a conditional branch were the same:
BB#1: derived from LLVM BB %entry
Live Ins: %EFLAGS
Predecessors according to CFG: BB#0
JE_1 <BB#5>, %EFLAGS<imp-use,kill>
JMP_1 <BB#5>
BB#5: derived from LLVM BB %sw.epilog
Predecessors according to CFG: BB#1
TCRETURNdi64 <ga:@mergeable_conditional_tailcall>, 0, ...
We would fold the JE_1 to a TCRETURNdi64cc, and then remove our BB#5
successor. Then BB#5 would be deleted as it had no predecessors, leaving
a dangling "JMP_1 <BB#5>" reference behind to cause assertions later.
This patch checks that both conditional branch destinations are
different before doing the transform. The standard branch folding logic
is able to remove both the JMP_1 and the JE_1, and for my test case we
end up forming a better conditional tail call later.
Fixes PR33980
llvm-svn: 309422
This allows handling of a lot more of the interesting
cases in Blender. Most of the large functions unlikely
to be inlined have this pattern.
This is a special case for what clang emits for OpenCL 3
element vectors. Annoyingly, these are emitted as
<3 x elt>* pointers, but accessed as <4 x elt>* operations.
This also needs to handle cases where a struct containing
a single vector is used.
llvm-svn: 309419
It is better to return arguments directly in registers
if we are making a call rather than introducing expensive
stack usage. In one of sample compile from one of
Blender's many kernel variants, this fires on about
~20 different functions. Future improvements may be to
recognize simple cases where the pointer is indexing a small
array. This also fails when the store to the out argument
is in a separate block from the return, which happens in
a few of the Blender functions. This should also probably
be using MemorySSA which might help with that.
I'm not sure this is correct as a FunctionPass, but
MemoryDependenceAnalysis seems to not work with
a ModulePass.
I'm also not sure where it should run.I think it should
run before DeadArgumentElimination, so maybe either
EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate.
llvm-svn: 309416
We need to pass something to functions for this to work.
It isn't derivable just from the kernarg segment pointer
because the implicit arguments are placed after the
kernel arguments.
Also fixes missing test for the intrinsic.
llvm-svn: 309398
This patch enables choice for accessing thread local
storage pointer (like '-mtp' in gcc).
Differential Revision: https://reviews.llvm.org/D34408
llvm-svn: 309381
The ARM Runtime ABI document (IHI0043) defines the AEABI floating point
helper functions in section 4.1.2 The floating-point helper functions.
The functions listed in this section must always use the base AAPCS calling
convention.
This test generates calls to all the helper functions that llvm supports
and checks that the base AAPCS calling convention has been used. We test
the equivalent of -mfloat-abi=soft, -mfloat-abi=softfp, -mfloat-abi=hardfp
with an FPU that supports single and double precision, and one that only
supports double precision.
Differential Revision: https://reviews.llvm.org/D35904
llvm-svn: 309371
The code assumed that unclobbered/unspilled callee saved registers are
unused in the function. This is not true for callee saved registers that are
also used to pass parameters such as swiftself.
rdar://33401922
llvm-svn: 309350
The X86 tail call eligibility logic was correct when it was written, but
the addition of inalloca and argument copy elision broke its
assumptions. It was assuming that fixed stack objects were immutable.
Currently, we aim to emit a tail call if no arguments have to be
re-arranged in memory. This code would trace the outgoing argument
values back to check if they are loads from an incoming stack object.
If the stack argument is immutable, then we won't need to store it back
to the stack when we tail call.
Fortunately, stack objects track their mutability, so we can just make
the obvious check to fix the bug.
This was http://crbug.com/749826
llvm-svn: 309343
The (seldom-used) TBI-aware optimization had a typo lying dormant since
it was first introduced, in r252573: when asking for demanded bits, it
told TLI that it was running after legalize, where the opposite was
true.
This is an important piece of information, that the demanded bits
analysis uses to make assumptions about the node. r301019 added such an
assumption, which was broken by the TBI combine.
Instead, pass the correct flags to TLO.
llvm-svn: 309323
Improve DAGTypeLegalizer::convertMask's isSETCCorConvertedSETCC assertion to properly check for any mixture of SETCC or BUILD_VECTOR of constants, or a logical mask op of them.
llvm-svn: 309302
In optimizeCompareInstr, a compare instruction is eliminated by using a record form instruction if possible.
If the branch instruction that uses the result of the compare has a static branch hint, the optimization does not happen.
This patch makes this optimization happen regardless of the branch hint by splitting branch hint and branch condition before checking the predicate to identify the possible optimizations.
Differential Revision: https://reviews.llvm.org/D35801
llvm-svn: 309255
Currently SI_IF results in a s_and_saveexec_b64 followed by s_xor_b64.
The xor is used to extract only the changed bits. In case of a simple
if region where the only use of that value is in the SI_END_CF to
restore the old exec mask, we can omit the xor and perform an or of
the exec mask with the original exec value saved by the
s_and_saveexec_b64.
Differential Revision: https://reviews.llvm.org/D35861
llvm-svn: 309185
This patch expands the support of lowerInterleavedStore to 32x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=32) and we plan to include more patterns in the future. To reach our goal of "more patterns". We include two mask creators. The first function creates shuffle's mask equivalent to unpacklo/unpackhi instructions. The other creator creates mask equivalent to a concat of two half vectors(high/low).
The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 32 chars:
c0, c1, , c31
m0, m1, , m31
y0, y1, , y31
k0, k1, ., k31
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Reviewers:
dorit
Farhana
RKSimon
guyblank
DavidKreitzer
Differential Revision: https://reviews.llvm.org/D34601
llvm-svn: 309086
Summary: The aligned load predicates don't suppress themselves if the load is non-temporal the way the unaligned predicates do. For the most part this isn't a problem because the aligned predicates are mostly used for instructions that only load the the non-temporal loads have priority over those. The exception are masked loads.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35712
llvm-svn: 309079
This patch just adds printing of CR bit registers in a more human-readable
form akin to that used by the GNU binutils.
Differential Revision: https://reviews.llvm.org/D31494
llvm-svn: 309001
D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914).
Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).
This patch should be considered for the 5.0.0 release branch as well
Differential Revision: https://reviews.llvm.org/D35830
llvm-svn: 308986
Create a dummy 8 byte fixed object for the unused slot below the first
stored vararg.
Alternative ideas tested but skipped: One could try to align the whole
fixed object to 16, but I haven't found how to add an offset to the stack
frame used in LowerWin64_VASTART.
If only the size of the fixed stack object size is padded but not the offset, via
MFI.CreateFixedObject(alignTo(GPRSaveSize, 16), -(int)GPRSaveSize, false),
PrologEpilogInserter crashes due to "Attempted to reset backwards range!".
This fixes misconceptions about where registers are spilled, since
AArch64FrameLowering.cpp assumes the offset from fixed objects is
aligned to 16 bytes (and the Win64 case there already manually aligns
the offset to 16 bytes).
This fixes cases where local stack allocations could overwrite callee
saved registers on the stack.
Differential Revision: https://reviews.llvm.org/D35720
llvm-svn: 308950
This patch removes unnecessary zero copies in BBs that are targets of b.eq/b.ne
and we know the result of the compare instruction is zero. For example,
BB#0:
subs w0, w1, w2
str w0, [x1]
b.ne .LBB0_2
BB#1:
mov w0, wzr ; <-- redundant
str w0, [x2]
.LBB0_2
Differential Revision: https://reviews.llvm.org/D35075
llvm-svn: 308849
These patterns were only missing to favor using the legacy instructions when the shift was a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns.
llvm-svn: 308834
Check the actual memory type stored and not the extended value size
when considering if truncated store merge is worthwhile.
Reviewers: efriedma, RKSimon, spatel, jyknight
Reviewed By: efriedma
Subscribers: llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D35623
llvm-svn: 308833
-membedded-data changes the location of constant data from the .sdata to
the .rodata section. Previously it was (incorrectly) always located in the
.rodata section.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D35686
llvm-svn: 308758
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262https://reviews.llvm.org/D35049
llvm-svn: 308729
Currently we only support (i32 bitcast(v32i1)) using the AVX2 VPMOVMSKB ymm instruction.
This patch adds support for splitting pre-AVX2 targets into 2 x (V)PMOVMSKB xmm instructions and merging the integer results.
In future we could probably generalize this to handle more cases.
Differential Revision: https://reviews.llvm.org/D35303
llvm-svn: 308723
It revealed a bug in the Localizer pass which has now been fixed.
This includes the fix for SUBREG_TO_REG committed separately last time.
llvm-svn: 308688
If the localizer pass puts one of its constants before the label that tells the
unwinder "jump here to handle your exception" then control-flow will skip it,
leaving uninitialized registers at runtime. That's bad.
llvm-svn: 308687
The patch adds support of i128 params lowering. The changes are quite trivial to
support i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
llvm-svn: 308675
On AMDGPU SGPR spills are really spilled to another register.
The spiller creates the spills to new frame index objects,
which is used as a placeholder.
This will eventually be replaced with a reference to a position
in a VGPR to write to and the frame index deleted. It is
most likely not a real stack location that can be shared
with another stack object.
This is a problem when StackSlotColoring decides it should
combine a frame index used for a normal VGPR spill with
a real stack location and a frame index used for an SGPR.
Add an ID field so that StackSlotColoring has a way
of knowing the different frame index types are
incompatible.
llvm-svn: 308673
Summary:
Also enable no-fsmuld for sparcv7 (which doesn't have the
instruction).
The previous code which used a post-processing pass to do this was
unnecessary; disabling the instruction is entirely sufficient.
Reviewers: jacob_hansen, ekedaigle
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35576
llvm-svn: 308661
We allow wider than 5 bits in the 16 and 32 bit store forms. And we allow wider than 6 bits on the 64-bit regsiter form.:w
I'm assuming this was a mistake made back in r148024.
llvm-svn: 308656
Summary:
When pushing an extension of a constant bitwise operator on a load
into the load, change other uses of the load value if they exist to
prevent the old load from persisting.
Reviewers: spatel, RKSimon, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35030
llvm-svn: 308618
Test constant folding both on node creation (which already works) and once the input nodes have been folded themselves (not working yet).
llvm-svn: 308611
This patch adds handling of the `long_call`, `far`, and `near`
attributes passed by front-end. The patch depends on D35479.
Differential revision: https://reviews.llvm.org/D35480.
llvm-svn: 308606
Most combines currently recognise scalar and splat-vector constants, but not non-uniform vector constants.
This patch introduces a matching mechanism that uses predicates to check against BUILD_VECTOR of ConstantSDNode, as well as scalar ConstantSDNode cases.
I've changed a couple of predicates to demonstrate - the combine-shl changes add currently unsupported cases, while the MatchRotate replaces an existing mechanism.
Differential Revision: https://reviews.llvm.org/D35492
llvm-svn: 308598
Introduced FSELECT node necesary when lowering ISD::SELECT
which has i32, f64, f64 as its operands.
SEL_D instruction required that its output and first operand
of a SELECT node, which it used, have matching types.
MTC1_D64 node introduced to aid FSELECT lowering.
This fixes machine verifier errors on following tests:
CodeGen/Mips/llvm-ir/select-dbl.ll
CodeGen/Mips/llvm-ir/select-flt.ll
CodeGen/Mips/select.ll
Differential Revision: https://reviews.llvm.org/D35408
llvm-svn: 308595
Add optimization remarks support to the PrologueEpilogueInserter. For
now, emit the stack size as an analysis remark, but more additions wrt
shrink-wrapping may be added.
https://reviews.llvm.org/D35645
llvm-svn: 308556
This patch cleans up and fixes issues in the M-Class system register handling:
1. It defines the system registers and the encoding (SYSm values) in one place:
a new ARMSystemRegister.td using SearchableTable, thereby removing the
hand-coded values which existed in multiple places.
2. Some system registers e.g. BASEPRI_MAX_NS which do not exist were being allowed!
Ref: ARMv6/7/8M architecture reference manual.
Reviewed by: @t.p.northover, @olist01, @john.brawn
Differential Revision: https://reviews.llvm.org/D35209
llvm-svn: 308456
Summary:
When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header. Especially when such a loop's
latch block is folded into loop header it results in additional backedges and
LoopSimplify turns it into a nested loop which prevent later optimizations
from being applied (e.g., loop unrolling and loop interleaving).
This patch augments the existing algorithm by further checking if the
destination of the branch belongs to a set of loop headers and defer
eliminating it if yes to LateSimplifyCFG.
Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: efriedma
Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35411
llvm-svn: 308422
That part was reverted because the underlying change necessitating it
(r308025) was reverted in r308271.
Nirav re-landed r308025 again in r308350, so re-landing this fix.
llvm-svn: 308418
Summary:
This patch adds the following
1. Adds a skeleton scheduler model for AMD Znver1.
2. Introduces the znver1 execution units and pipes.
3. Caters the instructions based on the generic scheduler classes.
4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on
a. Instructions types
b. Registers used
5. Since itineraries are not added based on instructions, throughput information are bound to change when incremental changes are added.
6. Scheduler testcases are modified accordingly to suit the new model.
Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me.
Reviewers: craig.topper, RKSimon
Subscribers: javed.absar, shivaram, ddibyend, vprasad
Differential Revision: https://reviews.llvm.org/D35293
llvm-svn: 308411
Re-recommiting after landing DAG extension-crash fix.
Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 308350
Added a feature to the Sparc back-end that replaces the integer multiply and
divide instructions with calls to .mul/.sdiv/.udiv. This is a step towards
having full v7 support.
Patch by: Eric Kedaigle
Differential Revision: https://reviews.llvm.org/D35500
llvm-svn: 308343
When replacing a node and it's operand, replacing the operand node may
cause the deletion of the original node leading to an assertion
failure. Case around these replacements to avoid this without relying
on inspecting the DELETED_NODE opcode in various extend
dagcombiner cases.
Fixes PR32515.
Reviewers: dbabokin, RKSimon, davide, chandlerc
Subscribers: chandlerc, llvm-commits
Differential Revision: https://reviews.llvm.org/D34095
llvm-svn: 308330
As an approximation of the existing handling to avoid
regressions. Fixes using too many registers with calls
on subtargets with the SGPR allocation bug.
llvm-svn: 308326
Introduce pseudo-registers for registers needed for stack
access, which are replaced during finalizeLowering.
Note these pseudo-registers are currently only used for the
used register location, and not for determining their
input argument register.
This is better because it avoids the need to try to predict
whether a call will be emitted from the IR, and also
detects stack objects introduced by legalization.
Test changes are from the HasStackObjects check being more
accurate since stack objects introduced during legalization
are now known.
llvm-svn: 308325
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.
Notes:
Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.
There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.
We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:
https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion.
Committed on behalf of Sanjay Patel
Differential Revision: https://reviews.llvm.org/D35067
llvm-svn: 308322
The flag "-hexagon-emit-lut-text" (defaulted to false) is added to decide
on where to keep the switch generated lookup table.
Differential Revision: https://reviews.llvm.org/D34818
llvm-svn: 308316
Summary:
When an immediate is folded by constant folding, we re-scan the entire
use list for two reasons:
1. The constant folding may have created a new use of the same reg.
2. The constant folding may have removed an additional use in the list
we're currently traversing (e.g., constant folding an S_ADD_I32 c, c).
However, this could previously lead to a crash when an unrelated use was
added twice into the FoldList. Since we re-scan the whole list anyway, we
might as well just clear the FoldList again before we do so.
Using a MIR test to show this because real code seems to trigger the issue
only in connection with some really subtle control flow structures.
Fixes GL45-CTS.shading_language_420pack.binding_images on gfx9.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D35416
llvm-svn: 308314
As discussed by @spatel on D35067:
"I added the cmov attribute to the 32-bit codegen test because it removes some noise for that file. I think the intent for the SSE vs no-SSE runs is to show the potential difference for the 16 and 32 byte cases rather than the lack of cmov (which has been available for all CPUs since SSE1, so that's why it shows up automatically with -mattr=sse2)."
llvm-svn: 308309
Summary:
G_FMA was recently added to GlobalISel which enables the import of rules
involving fma. Add the mapping to allow it.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35130
llvm-svn: 308308
This change introduces additional machine instructions in functions
dealing with the expansion of msa pseudo f16 instructions due to
register classes being inappropriate when checked with machine
verifier.
Differential Revision: https://reviews.llvm.org/D34276
llvm-svn: 308301
The commit r308100 updated WebAssembly tests for r308025. In one case it
merely made the test more resilient but in another case it made
a substantive update. Because r308025 was reverted in r308271, these
changes to the test also need to be reverted. They should be folded into
the recommit of r308025 when it is ready.
llvm-svn: 308273
This isn't legal code, but we shouldn't crash on it. Now we just don't convert the gather intrinsic if the scale isn't constant and let it go through to isel where we'll report an isel failure.
Fixes PR33772.
llvm-svn: 308267
This wasn't necessary before since they are always enabled
for kernels, but this is necessary if they need to be
forwarded to a callable function.
llvm-svn: 308226
Rename the enum value from X86_64_Win64 to plain Win64.
The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.
Differential Revision: https://reviews.llvm.org/D34474
llvm-svn: 308208
This adds support for the new 128-bit vector float instructions of z14.
Note that these instructions actually only operate on the f128 type,
since only each 128-bit vector register can hold only one 128-bit
float value. However, this is still preferable to the legacy 128-bit
float instructions, since those operate on pairs of floating-point
registers (so we can hold at most 8 values in registers), while the
new instructions use single vector registers (so we hold up to 32
value in registers).
Adding support includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions. This includes allocating the f128
type now to the VR128BitRegClass instead of FP128BitRegClass.
- Scheduler description support for the instructions.
Note that for a small number of operations, we have no new vector
instructions (like integer <-> 128-bit float conversions), and so
we use the legacy instruction and then reformat the operand
(i.e. copy between a pair of floating-point registers and a
vector register).
llvm-svn: 308196
This adds support for the new 32-bit vector float instructions of z14.
This includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions, including new LLVM intrinsics.
- Scheduler description support for the instructions.
- Update to the vector cost function calculations.
In general, CodeGen support for the new v4f32 instructions closely
matches support for the existing v2f64 instructions.
llvm-svn: 308195
This patch series adds support for the IBM z14 processor. This part includes:
- Basic support for the new processor and its features.
- Support for new instructions (except vector 32-bit float and 128-bit float).
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of z14 as host processor.
Support for the new 32-bit vector float and 128-bit vector float
instructions is provided by separate patches.
llvm-svn: 308194
The target-independent lowering works fine, except concatenating 32-bit
words. Add a pattern to generate A2_combinew instead of 64-bit asl/or.
llvm-svn: 308186
Summary:
Previously, CodeGen checked first src operand type to determine if omod is supported by instruction. This isn't correct for some instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but desn't support omod.
Changed .td files to check if dst operand instead of src operand.
Reviewers: arsenm, vpykhtin
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D35350
llvm-svn: 308179
LLVM compiler recognizes opportunities to transform a branch into IR select instruction(s) - later it will be lowered into X86::CMOV instruction, assuming no other optimization eliminated the SelectInst.
However, it is not always profitable to emit X86::CMOV instruction. For example, branch is preferable over an X86::CMOV instruction when:
1. Branch is well predicted
2. Condition operand is expensive, compared to True-value and the False-value operands
In CodeGenPrepare pass there is a shallow optimization that tries to convert SelectInst into branch, but it is not enough.
This commit, implements machine optimization pass that converts X86::CMOV instruction(s) into branch, based on a conservative heuristic.
Differential Revision: https://reviews.llvm.org/D34769
llvm-svn: 308142
If the `long-calls` feature flags is enabled, disable use of the `jal`
instruction. Instead of that call a function by by first loading its
address into a register, and then using the contents of that register.
Differential revision: https://reviews.llvm.org/D35168
llvm-svn: 308087
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.
Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.
t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6
llvm-svn: 308082
Currently, for code like below,
===
inner_map = bpf_map_lookup_elem(outer_map, &port_key);
if (!inner_map) {
inner_map = &fallback_map;
}
===
the compiler generates (pseudo) code like the below:
===
I1: r1 = bpf_map_lookup_elem(outer_map, &port_key);
I2: r2 = 0
I3: if (r1 == r2)
I4: r6 = &fallback_map
I5: ...
===
During kernel verification process, After I1, r1 holds a state
map_ptr_or_null. If I3 condition is not taken
(path [I1, I2, I3, I5]), supposedly r1 should become map_ptr.
Unfortunately, kernel does not recognize this pattern
and r1 remains map_ptr_or_null at insn I5. This will cause
verificaiton failure later on.
Kernel, however, is able to recognize pattern "if (r1 == 0)"
properly and give a map_ptr state to r1 in the above case.
LLVM here generates suboptimal code which causes kernel verification
failure. This patch fixes the issue by changing BPF insn pattern
matching and lowering to generate proper codes if the righthand
parameter of the above condition is a constant. A test case
is also added.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 308080
Restricting register class to PointerRegClass for memory operands.
Also fix the PointerRegClass for AArch64 from GPR64 to GPR64sp, since
XZR cannot hold a memory pointer while SP is.
Fixes PR33134.
Differential Revision: https://reviews.llvm.org/D34999
llvm-svn: 308060
Summary:
This patch is the first step in reducing HW prefetcher instruction tag
collisions in inner loops for Falkor. It adds a pass that annotates IR
loads with metadata to indicate that they are known to be strided loads,
and adds a target lowering hook that translates this metadata to a
target-specific MachineMemOperand flag.
A follow on change will use this MachineMemOperand flag to re-write
instructions to reduce tag collisions.
Reviewers: mcrosier, t.p.northover
Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34963
llvm-svn: 308059
In moveToVALU(), move to vector ALU is performed, all instrs in
the use chain will be visited. We do not want the same node to be
pushed to the visit worklist more than once.
Differential Revision: https://reviews.llvm.org/D34726
llvm-svn: 308039
This is the LLVM part, adding definitions for
void @llvm.hexagon.Y2.dccleana(i8*)
void @llvm.hexagon.Y2.dccleaninva(i8*)
void @llvm.hexagon.Y2.dcinva(i8*)
void @llvm.hexagon.Y2.dczeroa(i8*)
void @llvm.hexagon.Y4.l2fetch(i8*, i32)
void @llvm.hexagon.Y5.l2fetch(i8*, i64)
The clang part will follow.
llvm-svn: 308032
Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 308025
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
ADDIU instruction is transformed into 16-bit instruction ADDIUSP
ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
Function InRange is changed to avoid left shifting of negative values, since
that caused some sanitizer tests to fail (so the previous patch
Differential Revision: https://reviews.llvm.org/D34511
llvm-svn: 308011
Insert a TSTri to set the flags and a Bcc to branch based on their
values. This is a bit inefficient in the (common) cases where the
condition for the branch comes from a compare right before the branch,
since we set the flags both as part of the compare lowering and as part
of the branch lowering. We're going to live with that until we settle on
a principled way to handle this kind of situation, which occurs with
other patterns as well (combines might be the way forward here).
llvm-svn: 308009
Constants are crucial for code size in the ARM Thumb-1 instruction
set. The 16 bit instruction size often does not offer enough space
for immediate arguments. This means that additional instructions are
frequently used to load constants into registers. Since constants are
hoisted, this can lead to significant register spillage if they are
used multiple times in a single function. This can be avoided by
rematerialization, i.e. recomputing a constant instead of reloading
it from the stack. This patch fixes the rematerialization of literal
pool loads in the ARM Thumb instruction set.
Patch by Philip Ginsbach
Differential Revision: https://reviews.llvm.org/D33936
llvm-svn: 308004
Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that
is possible to further optimize fcanonicalize and remove it
if applied to min/max given their operands are known not to be
an sNaN or that sNaNs are not supported.
Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.
Differential Revision: https://reviews.llvm.org/D35335
llvm-svn: 307976
As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encoding encodes
a quad-word displacement, a sub-16 byte displacement is meaningless and
ends up being encoded incorrectly.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33671.
Differential Revision: https://reviews.llvm.org/D35007
llvm-svn: 307934
Pass parameters properly in calls to such functions (pass all
floats in integer registers), and handle va_start properly (allocate
stack immediately below the arguments on the stack, to save the
register arguments into a single continuous array).
Differential Revision: https://reviews.llvm.org/D35006
llvm-svn: 307928
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.
For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.
Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.
Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.
This revolves PR32713 and PR33424.
Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D33494
The previous version of this patch was too aggressive in producing fused
integer multiple-addition instructions.
llvm-svn: 307906
This boils down to not crashing in reg bank select due to the lack of
register operands on this instruction, and adding some tests. The
instruction selection is already covered by the TableGen'erated code.
llvm-svn: 307904
Summary: Add target hooks for printing and parsing target MMO flags.
Targets may override getSerializableMachineMemOperandTargetFlags() to
return a mapping from string to flag value for target MMO values that
should be serialized/parsed in MIR output.
Add implementation of this hook for AArch64 SuppressPair MMO flag.
Reviewers: bogner, hfinkel, qcolombet, MatzeB
Subscribers: mcrosier, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D34962
llvm-svn: 307877
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memset intrinsic. This intrinsic is essentially memset with the implementation requirement that all stores used for the assignment are done with unordered-atomic stores of a given element size.
Reviewers: eli.friedman, reames, mkazantsev, skatkov
Reviewed By: reames
Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D34885
llvm-svn: 307854
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
That is possible to omit this multiplication if source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.
Differential Revision: https://reviews.llvm.org/D35218
llvm-svn: 307848