The Promote action doesn't apply until LegalizeDAG. By the time
we get there, we would have already softened all the FP operations
if useSoftFloat was true. So there wouldn't be any operation left
to Promote.
Summary:
This temporarily disables the large working set size behavior in profile guided
size optimization due to internal benchmark regressions.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70207
The bug manifests as replacing a reduction operand with an undef
value.
The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.
In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.
For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.
Differential Revision: https://reviews.llvm.org/D70148
v256i1 on X86 without avx512 breaks down to 256 i8 values when passed between basic blocks. But the NumRegistersForVT was sized at a byte for each VT. This results in 256 being stored as 0.
This patch enlarges the type to 16 bits and adds an assert to ensure that no information is lost when the entry is stored.
Differential Revision: https://reviews.llvm.org/D70138
During register coalescing, we update the live-intervals on-the-fly.
To do that we are in this strange mode where the live-intervals can
be slightly out-of-sync (more precisely they are forward looking)
compared to what the IR actually represents.
This happens because the register coalescer only updates the IR when
it is done with updating the live-intervals and it has to do it this
way because updating the IR on-the-fly would actually clobber some
information on how the live-ranges that are being updated look like.
This is problematic for updates that rely on the IR to accurately
represents the state of the live-ranges. Right now, we have only
one of those: stripValuesNotDefiningMask.
To reconcile this need of out-of-sync IR, this patch introduces a
new argument to LiveInterval::refineSubRanges that allows the code
doing the live range updates to reason about how the code should
look like after the coalescer will have rewritten the registers.
Essentially this captures how a subregister index with be offseted
to match its position in a new register class.
E.g., let say we want to merge:
V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
Put differently we align V2's sub3 with V1's sub1:
V2: sub0 sub1 sub2 sub3
V1: <offset> sub0 sub1
This offset will look like a composed subregidx in the the class:
V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
=> V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
Now if we didn't rewrite the uses and def of V1, all the checks for V1
need to account for this offset to match what the live intervals intend
to capture.
Prior to this patch, we would fail to recognize the uses and def of V1
and would end up with machine verifier errors: No live segment at def.
This could lead to miscompile as we would drop some live-ranges and
thus, miss some interferences.
For this problem to trigger, we need to reach stripValuesNotDefiningMask
while having a mismatch between the IR and the live-ranges (i.e.,
we have to apply a subreg offset to the IR.)
This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the
subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.
looking at the IR to decide what is alive and what is not, i.e., calling
stripValuesNotDefiningMask.
coalescer maintains for the live-ranges information.
None of the targets that currently use subreg liveness (i.e., the targets
that fulfill #3, Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1 and
and #2, so this patch also artificial enables subreg liveness for ARM,
so that a nice test case can be attached.
RETA always implicitly uses LR, unlike RET which merely has an
alias that defaults it to LR.
Additionally, RETA implicitly uses SP as well, which it uses as
a discriminator to authenticate LR.
This isn't usually noticeable, because RET_ReallyLR is used in most
of the backend. However, the post-RA scheduler, if enabled, will
cause miscompiles if the imp-uses are missing.
While there, fix a typo in the lone affected testcase.
The instruction definition has been retroactively expanded to
allow for an alias for '[xN, 0]!' as '[xN]!'.
That wouldn't make sense on LDR, but does for LDRA.
As noted by the FIXME comment, this is not correct based on our current FMF semantics.
We should be propagating FMF from the final value in a sequence (in this case the
'select'). So the behavior even without this patch is wrong, but we did not allow FMF
on 'select' until recently.
But if we do the correct thing right now in this patch, we'll inevitably introduce
regressions because we have not wired up FMF propagation for 'phi' and 'select' in
other passes (like SimplifyCFG) or other places in InstCombine. I'm not seeing a
better incremental way to make progress.
That said, the potential extra damage over the existing wrong behavior from this
patch is very limited. AFAIK, the only way to have different FMF on IR in the same
function is if we have LTO inlined IR from 2 modules that were compiled using
different fast-math settings.
As seen in the tests, we may actually see some improvements with this patch because
adding the FMF to the 'select' allows matching to min/max intrinsics that were
previously missed (in the common case, the 'fcmp' and 'select' should have identical
FMF to begin with).
Next steps in the transition:
Make similar changes in instcombine as needed.
Enable phi-to-select FMF propagation in SimplifyCFG.
Remove dependencies on fcmp with FMF.
Deprecate FMF on fcmp.
Differential Revision: https://reviews.llvm.org/D69720
Summary:
This avoid the need to duplicate the location lists searching logic in
various users. The "inline location list dumping" code (which is the
only user actually updated to handle DWARF v5 location lists) is
switched to this method. After adding v4 location list support, I'll
switch other users too.
Reviewers: dblaikie, probinson, JDevlieghere, aprantl, SouraVX
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70084
I think we have to be a bit more careful when it comes to moving
ops across shuffles, if the op does restrict undef. For example, without
this patch, we would move 'and %v, <0, 0, -1, -1>' over a
'shufflevector %a, undef, <undef, undef, 1, 2>'. As a result, the first
2 lanes of the result are undef after the combine, but they really
should be 0, unless I am missing something.
For ops that do fold to undef on undef operands, the current behavior
should be fine. I've add conservative check OpDoesRestrictUndef, maybe
there's a better existing utility?
Reviewers: spatel, RKSimon, lebedev.ri
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D70093
This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a
first version and it operates on single block loops only. With this change, the
vectoriser will now determine if tail-folding scalar remainder loops is
possible/desired, which is the first step to generate MVE tail-predicated
vector loops.
This is disabled by default for now. I.e,, this is depends on option
-disable-mve-tail-predication, which is off by default.
I will follow up on this soon with a patch for the vectoriser to respect loop
hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled,
we don't want to tail-fold and we shouldn't query this TTI hook, which is
done in D70125.
Differential Revision: https://reviews.llvm.org/D69845
Summary: Removes CFI CFA directives that could incorrectly propagate
beyond the basic block they were inteded for. Specifically it removes
the epilogue CFI directives. See the branch_and_tail_call test for an
example of the issue. Should fix the stack unwinding issues caused by
the incorrect directives.
Reviewers: asb, lenary, shiva0217
Reviewed By: lenary
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69723
These are really just placeholders that use approximately the right resources - once we have CPUs scheduler models that support these instructions they will need revisiting.
In the meantime this means that all instructions have a class of some kind., meaning models can be more easily flagged as complete.
This caused miscompiles of Chromium (https://crbug.com/1023818). The reduced
repro is small enough to fit here:
$ cat /tmp/a.c
unsigned char f(unsigned char *p) {
unsigned char result = 0;
for (int shift = 0; shift < 1; ++shift)
result |= p[0] << (shift * 8);
return result;
}
$ bin/clang -O2 -S -o - /tmp/a.c | grep -A4 f:
f: # @f
.cfi_startproc
# %bb.0: # %entry
xorl %eax, %eax
retq
That's nicely optimized, but I don't think it's the right result :-)
> Same as D60846 but with a fix for the problem encountered there which
> was a missing context adjustment in the handling of PHI nodes.
>
> The test that caused D60846 to be reverted was added in e15ab8f277.
>
> Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam
>
> Subscribers: hiraditya, bollu, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D69571
This reverts commit 57dd4b03e4.
Instruction ldi.fmt can be considered cheap enough to avoid spill and restore
of value that it produces since it's loaded from immediate.
Differential Revision: https://reviews.llvm.org/D69898
When a 64-bit triple is used emit an error if the CPU only supports
32-bit code.
Patch by Miloš Stojanović.
Differential Revision: https://reviews.llvm.org/D70018
Summary:
Entry values are considered for parameters that have register-described
DBG_VALUEs in the entry block (along with other conditions).
If a parameter's value has been propagated from the caller to the
callee, then the parameter's DBG_VALUE in the entry block may be
described using a register defined by some instruction, and entry values
should not be emitted for the parameter, which can currently occur.
One such case was seen in the attached test case, in which the second
parameter, which is described by a redefinition of the first parameter's
register, would incorrectly get an entry value using the first
parameter's register. This commit intends to solve such cases by keeping
track of register defines, and ignoring DBG_VALUEs in the entry block
that are described by such registers.
In a RelWithDebInfo build of clang-8, the average size of the set was
27, and in a RelWithDebInfo+ASan build it was 30.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: djtodoro, vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D69889
Summary:
The conditions that are used to determine if entry values should be
emitted for a parameter are quite many, and will grow slightly
in a follow-up commit, so move those to a helper function, as was
suggested in the code review for D69889.
Reviewers: djtodoro, NikolaPrica
Reviewed By: djtodoro
Subscribers: probinson, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69955
This patch allows the register allocator to spill SVE registers to the stack.
Reviewers: ostannard, efriedma, rengolin, cameron.mcinally
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D70082
In case when all incoming values of a PHI are equal pointers, this
transformation inserts a definition of such a pointer right after
definition of the base pointer and replaces with this value both PHI and
all it's incoming pointers. Primary goal of this transformation is
canonicalization of this pattern in order to enable optimizations that
can't handle PHIs. Non-inbounds pointers aren't currently supported.
Reviewers: spatel, RKSimon, lebedev.ri, apilipenko
Reviewed By: apilipenko
Tags: #llvm
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D68128
This patch adds a target interface to set the StackID for a given type,
which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a
'sve-vec' StackID, so it is allocated in the SVE area of the stack frame.
Reviewers: ostannard, efriedma, rengolin, cameron.mcinally
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D70080
MVE includes instructions that extract an 8- or 16-bit lane from a
vector and sign-extend it into the output 32-bit GPR. `ARMInstrMVE.td`
already included isel patterns to select those instructions in
response to the `ARMISD::VGETLANEs` selection-DAG node type. But
`ARMISD::VGETLANEs` was never actually generated, because the code
that creates it was conditioned on NEON only.
It's an easy fix to enable the same code for integer MVE, and now IR
that sign-extends the result of an extractelement (whether explicitly
or as part of the function call ABI) will use `vmov.s8` instead of
`vmov.u8` followed by `sxtb`.
Reviewers: SjoerdMeijer, dmgreen, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70132
Summary:
Replaces
```
unsigned getShiftAmountThreshold(EVT VT)
```
by
```
bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
```
thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not.
Updates the MSP430 target with a custom implementation.
This continues D69116, D69120, D69326 and updates them, so all of them must be committed before this.
Existing tests apply, a few more have been added.
Reviewers: asl, spatel
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70042
This is no longer needed after widening legalization as we
custom legalize v8i8 ourselves.
Added entries to the cost model, but bumped the cost slightly
to account for the truncate shuffle that wasn't costed before.
Summary:
This patch adds a custom ISA for vector functions for internal use
in LLVM. The <isa> token is set to "_LLVM_", and it is not attached
to any specific instruction Vector ISA, or Vector Function ABI.
The ISA is used as a token for handling Vector Function ABI-style
vectorization for those vector functions that are not directly
associated to any existing Vector Function ABI (for example, some of
the vector functions exposed by TargetLibraryInfo). The demangling
function for this ISA in a Vector Function ABI context is set to be
the same as the common one shared between X86 and AArch64.
Reviewers: jdoerfert, sdesmalen, simoll
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70089
Previously this would default to 256, not the maximum supported size
of 1024. Using a maximum lower than the hardware maximum requires
language runtimes to enforce this limit for correctness, which no
language has correctly done. Switch the default to the conservatively
correct maximum, and force frontends to opt-in to the more optimal 256
default maximum.
I don't really understand why the changes in occupancy-levels.ll
increased the computed occupancy, which I expected to decrease. I'm
not sure if these tests should be forcing the old maximum.
Otherwise just let the v64i8/v32i16 types be split to v32i8/v16i16.
In reality this shouldn't happen because it means we have a 512-bit
vector argument, but min-legal-vector-width says a value less than
512. But a 512-bit argument should have been factored into the
preferred vector width.
Enable to generate BTF_KIND_VARs for non-static
default-section globals which is not allowed previously.
Modified the existing test case to accommodate the new change.
Also removed unused linkage enum members VAR_GLOBAL_TENTATIVE and
VAR_GLOBAL_EXTERNAL.
Differential Revision: https://reviews.llvm.org/D70145
Summary:
If there are any internal methods whose address was taken, conclude there is nothing known in relation of any other internal method and a global.
Reviewers: nlopes, sanjoy.google
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69690
These relocations are specified by the ARM EHABI (section 6.3). As I understand
it, their purpose is to accommodate unwinder implementations that wish to
reduce code size by placing the implementations of the compact unwinding
decoders in a separate translation unit, and using extern weak symbols to
refer to them from the main unwinder implementation, so that they are only
linked when something in the binary needs them in order to unwind.
However, neither of the unwinders used on Android (libgcc, LLVM libunwind)
use this technique, and in fact emitting these relocations ends up being
counterproductive to code size because they cause a copy of the unwinder
to be statically linked into most binaries, regardless of whether it is
actually needed. Furthermore, these relocations create circular dependencies
(between libc and the unwinder) in cases where the unwinder is dynamically
linked and libc contains compact unwind info.
Therefore, deviate from the EHABI here and stop emitting these relocations
on Android.
Differential Revision: https://reviews.llvm.org/D70027
1. Add pseudos PS_vloadrv_ai and PS_vstorerv_ai: those are now used
for single vector registers in loadRegFromStackSlot (and store...).
2. Remove pseudos PS_vloadrwu_ai and PS_vstorerwu_ai. The alignment is
now checked when expanding spill pseudos (both in frame lowering
and in expand-post-ra-pseudos), and a proper instruction is generated.
3. Update MachineMemOperands when dealigning vector spill slots.
4. Return vector predicate registers in getCallerSavedRegs.
Summary:
getAsInstruction is the only non-const member method.
It is impossible to enforce const-correctness because of it.
Reviewers: jmolloy, majnemer
Reviewed By: jmolloy
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70113
If the MOVi operand was renamable, the operands of the expanded
instructions are also renamable.
Reviewers: thegameg, samparker, zatrazz
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D70061
Don't try to canonicalize loads to scalable vector types to loads
of integers.
This removes one assertion when trying to use a TypeSize as a parameter
to DataLayout::isLegalInteger. It does not handle the second part of the
function (which looks at bitcasts).
This patch also contains a NFC fix for Load Analysis, where a variable
initialization that would cause the same assertion is moved closer to
its use. This allows us to run the new test for InstCombine without
having to teach LocationSize to play nicely with scalable vectors.
Differential Revision: https://reviews.llvm.org/D70075
Currently we have limited support for outer loops with multiple basic
blocks after the inner loop exit. But the current checks for creating
PHIs for loop exit values only assumes the header and latches of the
outer loop. It is better to just skip incoming values defined in the
original inner loops. Those are handled earlier.
Reviewers: efriedma, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D70059
Summary:
This patch extracts the logic for computing the "absolute" locations,
which was partially present in the debug_loclists dumper, completes it,
and moves it into a separate function. This makes it possible to later
reuse the same logic for uses other than dumping.
The dumper is changed to reuse the location list interpreter, and its
format is changed somewhat. In "verbose" mode it prints the "raw" value
of a location list, the interpreted location (if available) and the
expression itself. In non-verbose mode it prints only one of the
location forms: it prefers the interpreted form, but falls back to the
"raw" format if interpretation is not possible (for instance, because we
were not given a base address, or the resolution of indirect addresses
failed).
This patch also undos some of the changes made in D69672, namely the
part about making all functions static. The main reason for this is that
I learned that the original approach (dumping only fully resolved
locations) meant that it was impossible to rewrite one of the existing
tests. To make that possible (and make the "inline location" dump work
in more cases), I now reuse the same dumping mechanism as is used for
section-based dumping. As this required having more objects know about
the various location lists classes, it seemed like a good idea to create
an interface abstracting the difference between them.
Therefore, I now create a DWARFLocationTable class, which will serve as
a base class for the location list classes. DWARFDebugLoclists is made
to inherit from that. DWARFDebugLoc will follow.
Another positive effect of this change is that section-based dumping
code will not need to use templates (as originally) envisioned, and that
the argument lists of the dumping functions become shorter.
Reviewers: dblaikie, probinson, JDevlieghere, aprantl, SouraVX
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70081
In MachineCopyPropagation, when propagating the source of a copy into
the operand of a later instruction, bail if a destination overlaps
(partly defines) the copy source. If the instruction where the
substitution is happening is also a copy, allowing the propagation
confuses the tracking mechanism.
Differential Revision: https://reviews.llvm.org/D69953
Change-Id: Ic570754f878f2d91a4a50a9bdcf96fbaa240726d
SHT_LLVM_LINKER_OPTIONS section contains pairs of null-terminated strings.
This patch adds support for them.
Differential revision: https://reviews.llvm.org/D69895
Summary:
This patch introduces align attribute deduction for callsite argument, function argument, function returned and floating value based on must-be-executed-context.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69797
* Add inline to the helper functions because gcc-9 won't inline all of
them without the hint. I've avoided `__attribute__((always_inline))`
because gcc and clang will inline without it, and improves
compatibility.
* Replace the byte-by-byte copy in update() with endian::readbe32()
since perf reports that 1/2 of the time is spent copying into the
buffer before this patch.
When lld uses --build-id=sha1 it spends 30-45% of CPU in SHA1 depending on the binary (not wall-time since it is parallel). This patch speeds up SHA1 by a factor of 2 on clang-8 and 3 on gcc-6. This leads to a >10% improvement in overall linking time.
lld-speed-test benchmarks run on an Intel i9-9900k with Turbo disabled on CPU 0 compiled with clang-9. Stats recorded with `perf stat -r 5`. All inputs are using `--build-id=sha1`.
| Input | Before (seconds) | After (seconds) |
| --- | --- | --- |
| chrome | 2.14 | 1.82 (-15%) |
| chrome-icf | 2.56 | 2.29 (-10%) |
| clang | 0.65 | 0.53 (-18%) |
| clang-fsds | 0.69 | 0.58 (-16%) |
| clang-gdb-index | 21.71 | 19.3 (-11%) |
| gold | 0.42 | 0.34 (-19%) |
| gold-fsds | 0.431 | 0.355 (-17%) |
| linux-kernel | 0.625 | 0.575 (-8%) |
| llvm-as | 0.045 | 0.039 (-14%) |
| llvm-as-fsds | 0.035 | 0.039 (-11%) |
| mozilla | 11.3 | 9.8 (-13%) |
| mozilla-gc | 11.84 | 10.36 (-12%) |
| mozilla-O0 | 8.2 | 5.84 (-28%) |
| scylla | 5.59 | 4.52 (-19%) |
Reviewed By: ruiu, MaskRay
Differential Revision: https://reviews.llvm.org/D69295
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for examples).
Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk
Reviewed By: RKSimon, dtemirbulatov
Subscribers: xbolva00, Carrot, hiraditya, phosek, rnk, rcorcs, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60897
Summary: The attached test case replicates a null dereference crash in
`yaml::Document::skip()`. This was fixed by adding a check and early
return in the method.
Reviewers: Bigcheese, hintonda, beanz
Reviewed By: hintonda
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69974
The attribute is stored at the `FunctionIndex` attribute set, with the
name "vector-function-abi-variant".
The get/set methods of the attribute have assertion to verify that:
1. Each name in the attribute is a valid VFABI mangled name.
2. Each name in the attribute correspond to a function declared in the
module.
Differential Revision: https://reviews.llvm.org/D69976
MVT::i1 should be removed by type legalization before we reach
any code that would act on the promote action.
Mainly to avoid replicating this for strict FP versions of these
operations.
Summary:
This patch redefines freeze instruction from being UnaryOperator to a subclass of UnaryInstruction.
ConstantExpr freeze is removed, as discussed in the previous review.
FreezeOperator is not added because there's no ConstantExpr freeze.
`freeze i8* null` test is added to `test/Bindings/llvm-c/freeze.ll` as well, because the null pointer-related bug in `tools/llvm-c/echo.cpp` is now fixed.
InstVisitor has visitFreeze now because freeze is not unaryop anymore.
Reviewers: whitequark, deadalnix, craig.topper, jdoerfert, lebedev.ri
Reviewed By: craig.topper, lebedev.ri
Subscribers: regehr, nlopes, mehdi_amini, hiraditya, steven_wu, dexonsmith, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69932
If we're using soft floats, then these operations shoudl be
softened during type legalization. They'll never get to
LegalizeVectorOps or LegalizeDAG so they don't need to be
Expanded there.
For XCOFF, globals mapped into the .bss section are linked as COMMON
definitions. This behaviour is incorrect for zero initialized data, so
emit those to the .data section instead.
Differential Revision: https://reviews.llvm.org/D69528
In current Hoist() function of machine licm pass, it will not check the source and destination basic block frequencies that a instruction is hoisted from/to.
There is a chance that instruction is hoisted from a cold to a hot basic block.
In this patch, we add options to disable machine instruction hoisting if destination block is hotter.
Differential Revision: https://reviews.llvm.org/D63676
The new experimental expansion has a problem when a value has a data
dependency with an instruction from a previous stage. This is due to
the way we peel out the kernel. To fix that I'm changing the way we
peel out the kernel. We now peel the kernel NumberStage - 1 times.
The code would be correct at this point if we didn't have to handle
cases where the loop iteration is smaller than the number of stages.
To handle this case we move instructions between different epilogues
based on their stage and remap the PHI instructions correctly.
Differential Revision: https://reviews.llvm.org/D69538
Simple change to call target hook analyzeLoopForPipelining before
changing the loop. After peeling analyzing the loop may be more
complicated for target that don't have a loop instruction. This doesn't
affect Hexagone and PPC as they have hardware loop instructions.
Differential Revision: https://reviews.llvm.org/D69912
For example:
long long test(long long a, long long b) {
if (a << b > 0)
return b;
if (a << b < 0)
return a;
return a*b;
}
Produces:
sld. 5, 3, 4
ble 0, .LBB0_2
mr 3, 4
blr
.LBB0_2: # %if.end
cmpldi 5, 0
li 5, 1
isel 4, 4, 5, 2
mulld 3, 4, 3
blr
But the compare (cmpldi 5, 0) is redundant and can be removed (CR0 already
contains the result of that comparison).
The root cause of this is that LLVM converts signed comparisons into equality
comparison based on dominance. Equality comparisons are unsigned by default, so
we get either a record-form or cmp (without the l for logical) feeding a cmpl.
That is the situation we want to avoid here.
Differential Revision: https://reviews.llvm.org/D60506
The tail-call-kind-ness is known by the ObjCARC analysis and can be
enforced while lowering the intrinsics to calls.
This allows us to get the requested tail calls at -O0 without trying to
preserve the attributes throughout passes that change code even at -O0
,like the Always Inliner, where the ObjCOpt pass doesn't run.
Differential Revision: https://reviews.llvm.org/D69980
The Overflow version of XO-Form instruction uses the SO, OV and
OV32 special registers.
This changes modifies existing multiclasses and instruction
definitions to allow for the use of the XER register to record
the various types if overflow from possible add, subtract and
multiply instructions. It then modifies the existing instructions
as to use these multiclasses as needed.
Patch By: Kamau Bridgeman
Differential Revision: https://reviews.llvm.org/D66902
Re-try because earlier attempts were reverted due to use-after-free.
Hopefully, diagnosed correctly this time - we replace/remove the
invariant.start first rather than the invariant.end to avoid angering
worklist-based iteration.
We gather a set of white-listed instructions in isAllocSiteRemovable() and then
replace/erase them. But we don't know in general if the instructions in the set
have uses amongst themselves, so order of deletion makes a difference.
There's already a special-case for the llvm.objectsize intrinsic, so add another
for llvm.invariant.start.
Should fix:
https://bugs.llvm.org/show_bug.cgi?id=43723
Differential Revision: https://reviews.llvm.org/D69977
Summary:
SimplifySelectsFeedingBinaryOp simplified binary ops when both operands
were selects with the same condition. This patch extends it to handle
these cases where only one operand is a select:
X op (C ? P : Q) -> C ? (X op P) : (X op Q)
// if X op P and X op Q both simplify
(C ? P : Q) op Y -> C ? (P op Y) : (Q op Y)
// if P op Y and Q op Y both simplify
For example: X *fast (C ? 1.0 : 0.0) -> C ? X : 0.0
Reviewers: mcberg2017, majnemer, craig.topper, qcolombet, mcrosier
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64713
Summary:
Additional filtering of undesired shifts for targets that do not support them efficiently.
Related with D69116 and D69120
Applies the TLI.getShiftAmountThreshold hook to prevent undesired generation of shifts for the following IR code:
```
define i16 @testShiftBits(i16 %a) {
entry:
%and = and i16 %a, -64
%cmp = icmp eq i16 %and, 64
%conv = zext i1 %cmp to i16
ret i16 %conv
}
define i16 @testShiftBits_11(i16 %a) {
entry:
%cmp = icmp ugt i16 %a, 63
%conv = zext i1 %cmp to i16
ret i16 %conv
}
define i16 @testShiftBits_12(i16 %a) {
entry:
%cmp = icmp ult i16 %a, 64
%conv = zext i1 %cmp to i16
ret i16 %conv
}
```
The attached diff file shows the piece code in TargetLowering that is responsible for the generation of shifts in relation to the IR above.
Before applying this patch, shifts will be generated to replace non-legal icmp immediates. However, shifts may be undesired if they are even more expensive for the target.
For all my previous patches in this series (cited above) I added test cases for the MSP430 target. However, in this case, the target is not suitable for showing improvements related with this patch, because the MSP430 does not implement "isLegalICmpImmediate". The default implementation returns always true, therefore the patched code in TargetLowering is never reached for that target. Targets implementing both "isLegalICmpImmediate" and "getShiftAmountThreshold" will benefit from this.
The differential effect of this patch can only be shown for the MSP430 by temporarily implementing "isLegalICmpImmediate" to return false for large immediates. This is simulated with the implementation of a command line flag that was incorporated in D69975
This patch belongs to a initiative to "relax" the generation of shifts by LLVM for targets requiring it
Reviewers: spatel, lebedev.ri, asl
Reviewed By: spatel
Subscribers: lenary, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69326
Currently there is no way to describe the data that is not a part of an output section.
It can be a data used to align sections or to fill the gaps with something,
or another kind of custom data. In this patch I suggest a way to describe it. It looks like that:
```
Sections:
- Type: CustomFiller
Pattern: "CCDD"
Size: 4
- Name: .bar
Type: SHT_PROGBITS
Content: "FF"
```
I.e. I've added a kind of synthetic section with a synthetic type "CustomFiller".
In the code it is called a "SyntheticFiller", which is "a synthetic section which
might be used to write the custom data around regular output sections. It does
not present in the sections header table, but it might affect the output file size and
program headers produced. Think about it as about piece of data."
`SyntheticFiller` currently has a `Pattern` field and a `Size` field + an optional `Name`.
When written, `Size` of bytes in the output will be filled with a `Pattern`.
It is possible to reference a named filler it by name from the program headers description,
just like any other normal section.
Differential revision: https://reviews.llvm.org/D69709
The _m64 type is represented in IR as <1 x i64>. The x86-64 ABI
on Linux passes <1 x i64> as a double. MMX intrinsics use x86_mmx
type in IR.These things result in a lot of bitcasts in mmx code.
There's another instcombine that tries to turn bitcast <1 x i64>
to double into extractelement and a bitcast.
The combine here tries to reverse this extractelement conversion
if we see an mmx type.
Re-try rGef02831f0a4e (reverted due to use-after-free), but bail out completely
if we encounter an unexpected llvm.invariant.start.
We gather a set of white-listed instructions in isAllocSiteRemovable() and then
replace/erase them. But we don't know in general if the instructions in the set
have uses amongst themselves, so order of deletion makes a difference.
There's already a special-case for the llvm.objectsize intrinsic, so add another
for llvm.invariant.end.
Should fix:
https://bugs.llvm.org/show_bug.cgi?id=43723
Differential Revision: https://reviews.llvm.org/D69977
Summary: A helper function to get argument number of a arg operand Use.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66844
Summary: When using the split sp adjustment and using the frame-pointer
we were still emitting CFI CFA directives based on the sp value. The
final sp-based offset also didn't reflect the two-stage sp adjust. There
remain CFI issues that aren't related to the split sp adjustment, and
thus will be addressed in a separate patch.
Reviewers: asb, lenary, shiva0217
Reviewed By: lenary, shiva0217
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69385
Summary: Adds tests necessary to properly show the impact of other
patches that affect the emission of CFI directives.
Reviewers: asb, lenary
Reviewed By: lenary
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69721
We gather a set of white-listed instructions in isAllocSiteRemovable() and then
replace/erase them. But we don't know in general if the instructions in the set
have uses amongst themselves, so order of deletion makes a difference.
There's already a special-case for the llvm.objectsize intrinsic, so add another
for llvm.invariant.end.
Should fix:
https://bugs.llvm.org/show_bug.cgi?id=43723
Differential Revision: https://reviews.llvm.org/D69977
Summary:This patch fixes the following warnings uncovered by PVS
Studio:
/home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp
353 warn V612 An unconditional 'return' within a loop.
/home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp
456 err V502 Perhaps the '?:' operator works in a different way than it
was expected. The '?:' operator has a lower priority than the '=='
operator.
Authored By:etiotto
Reviewer:Meinersbur, kbarton, bmahjour, Whitney, xbolva00
Reviewed By:xbolva00
Subscribers:hiraditya, llvm-commits
Tag:LLVM
Differential Revision:https://reviews.llvm.org/D69821
This recommits 11ed1c0239 (reverted in
9f08ce0d21 for failing an assert) with a fix:
tryToWidenMemory() now first checks if the widening decision is to interleave,
thus maintaining previous behavior where tryToInterleaveMemory() was called
first, giving priority to interleave decisions over widening/scalarization. This
commit adds the test case that exposed this bug as a LIT.
Summary: A user can force a function to be inlined by specifying the always_inline attribute. Currently, thinlto implementation is not aware of always_inline functions and does not guarantee import of such functions, which in turn can prevent inlining of such functions.
Patch by Bharathi Seshadri <bseshadr@cisco.com>
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70014
This was arbitrarily appearing in only the last section emitted - which
made tests more sensitive than they needed to be (removing the last
section - like the macinfo section change that's coming after this)
would, surprisingly, move the blank line to the previous section.
Recommit r373168, which was reverted by r373242. This actually exposed a
boringssl bug which has been fixed for more than one month.
For the following two cases, we currently suppress the symbols. This
patch emits them (compatible with GNU as).
* `test2_a = undef`: if `undef` is otherwise unused.
* `.hidden hidden`: if `hidden` is unused. This is the main point of the
patch, because omitting the symbol would cause a linker semantic
difference.
It causes a behavior change that is not compatible with GNU as:
.weakref foo1, bar1
When neither foo1 nor bar1 is used, we now emit bar1, which is arguably
more consistent.
Another change is that we will emit .TOC. for .TOC.@tocbase . For this
directive, suppressing .TOC. can be seen as a size optimization, but we
choose to drop it for simplicity and consistency.
Summary:
This is baseline tests for D69326
Incorporates a command line flag for the MSP430 and adds a test cases to help showing the effects of applying D69326
More details and motivation for this patch in D69326
Reviewers: spatel, asl, lebedev.ri
Reviewed By: spatel, asl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69975
The macinfo support was broken for LTO situations, by terminating
macinfo lists only once - multiple macinfo contributions were correctly
labeled, but they all continued/flowed into later contributions until
only one terminator appeared at the end of the section.
Correctly terminate each contribution & fix the parsing to handle this
situation too. The parsing fix is also necessary for dumping linked
binaries - the previous code would stop at the end of the first
contribution - missing all later contributions in a linked binary.
It'd be nice to improve the dumping to print the offsets of each
contribution so it'd be easier to know which CU AT_macro_info refers to
which macinfo contribution.
Summary:
This patch adds Pi Blocks to the DDG. A pi-block represents a group of DDG
nodes that are part of a strongly-connected component of the graph.
Replacing all the SCCs with pi-blocks results in an acyclic representation
of the DDG. For example if we have:
{a -> b}, {b -> c, d}, {c -> a}
the cycle a -> b -> c -> a is abstracted into a pi-block "p" as follows:
{p -> d} with "p" containing: {a -> b}, {b -> c}, {c -> a}
In this implementation the edges between nodes that are part of the pi-block
are preserved. The crossing edges (edges where one end of the edge is in the
set of nodes belonging to an SCC and the other end is outside that set) are
replaced with corresponding edges to/from the pi-block node instead.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D68827
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.
(I think this started showing up recently because we added an
optimization that converts pow to powi.)
Differential Revision: https://reviews.llvm.org/D69013
Fix cache invalidation by not guarding the dereferenced pointer cache
erasure by SeenBlocks. SeenBlocks is only populated when actually
caching a value in the block, which doesn't necessarily have to happen
just because dereferenced pointers were calculated.
-----
Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.
The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.
This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this change not fully NFC, because we can now fold
terminator icmps based on the dereferencability information in the
same block. This is the reason why I changed one JumpThreading test
(it would optimize the condition away without the change).
Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.
I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.
Differential Revision: https://reviews.llvm.org/D69914
Patch enables import of write-only variables with non-trivial initializers
to fix linker errors. Initializers of imported variables are converted to
'zeroinitializer' to avoid promotion of referenced objects.
Differential revision: https://reviews.llvm.org/D70006
This reverts commit 15bc4dc9a8.
clang-cmake-x86_64-sde-avx512-linux buildbot reported quite a few
compile-time regressions in test-suite, will investigate.
Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.
The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.
This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this change not fully NFC, because we can now fold
terminator icmps based on the dereferencability information in the
same block. This is the reason why I changed one JumpThreading test
(it would optimize the condition away without the change).
Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.
I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.
Differential Revision: https://reviews.llvm.org/D69914