Confusingly, the intrinsic operands do not match the
instruction/custom node. The order is shuffled, and the 3rd operand is
an immediate to select operands.
I'm not 100% sure I did this right, but fdiv still doesn't select end
to end and it will be easier to tell when it does. This at least
avoids an assertion in RegBankSelect and allows hitting the fallback
on selection.
Having this function be recursive could use up way too much stack space.
Rewrite it as an iterative traversal in the tree instead to prevent this.
Fixes PR44344.
It isn't necessary to create DIEs for all of the declaration subprograms
in a CU's retainedTypes list. We can defer creating these subprograms
until we need to prepare a call site tag that refers to one.
This cleanup was mentioned in passing in D70350.
This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.
We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.
This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.
As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.
Update:
Reland with a fix to create a declaration DIE when the declaration is
missing from the CU's retainedTypes list. The declaration is left out
of the retainedTypes list in two cases:
1) Re-compiling pre-r266445 bitcode (in which declarations weren't added
to the retainedTypes list), and
2) Doing LTO function importing (which doesn't update the retainedTypes
list).
It's possible to handle (1) and (2) by modifying the retainedTypes list
(in AutoUpgrade, or in the LTO importing logic resp.), but I don't see
an advantage to doing it this way, as it would cause more DWARF to be
emitted compared to creating the declaration DIEs lazily.
Tested with a stage2 ThinLTO+RelWithDebInfo build of clang, and with a
ReleaseLTO-g build of the test suite.
rdar://46577651, rdar://57855316, rdar://57840415
Differential Revision: https://reviews.llvm.org/D70350
Extends DWARF expression language to express locals/globals locations. (via
target-index operands atm) (possible variants are: non-virtual registers
or address spaces)
The WebAssemblyExplicitLocals can replace virtual registers to targertindex
operand type at the time when WebAssembly backend introduces
{get,set,tee}_local instead of corresponding virtual registers.
Reviewed By: aprantl, dschuff
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D52634
Summary:
This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows to specify how many iterations is considered getting stuck in an infinite loop.
Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details.
The two limits can be specified via command line options.
Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71673
This is a purely cosmetic change that is NFC in terms of the binary
output. I bugs me that I called the attribute DW_AT_LLVM_isysroot
since the "i" is an artifact of GCC command line option syntax
(-isysroot is in the category of -i options) and doesn't carry any
useful information otherwise.
This attribute only appears in Clang module debug info.
Differential Revision: https://reviews.llvm.org/D71722
Demote member functions to static functions where possible
Use early continue/early return to reduce nesting
Clarify comments slightly.
Reuse previously define expression in one case.
Should have caught this in review, but only noticed when addressing post commit style items. We were creating a new instance of the X86MCInstrInfo class, and then never reclaiming the memory. This wasn't even conditional on the new off by default flags, so it was an unconditional leak.
WARNING: If you're looking at this patch because you're looking for a full
performace mitigation of the Intel JCC Erratum, this is not it!
This is a preliminary patch on the patch towards mitigating the performance
regressions caused by Intel's microcode update for Jump Conditional Code
Erratum. For context, see:
https://www.intel.com/content/www/us/en/support/articles/000055650.html
The patch adds the required assembler infrastructure and command line options
needed to exercise the logic for INTERNAL TESTING. These are NOT public flags,
and should not be used for anything other than LLVM's own testing/debugging
purposes. They are likely to change both in spelling and meaning.
WARNING: This patch is knowingly incorrect in some cornercases. We need, and
do not yet provide, a mechanism to selective enable/disable the padding.
Conversation on this will continue in parellel with work on extending this
infrastructure to support prefix padding.
The goal here is to have the assembler align specific instructions such that
they neither cross or end at a 32 byte boundary. The impacted instructions are:
a. Conditional jump.
b. Fused conditional jump.
c. Unconditional jump.
d. Indirect jump.
e. Ret.
f. Call.
The new options for llvm-mc are:
-x86-align-branch-boundary=NUM aligns branches within NUM byte boundary.
-x86-align-branch=TYPE[+TYPE...] specifies types of branches to align.
A new MCFragment type, MCBoundaryAlignFragment, is added, which may emit
NOP to align the fused/unfused branch.
alignBranchesBegin inserts MCBoundaryAlignFragment before instructions,
alignBranchesEnd marks the end of the branch to be aligned,
relaxBoundaryAlign grows or shrinks sizes of NOP to align the target branch.
Nop padding is disabled when the instruction may be rewritten by the linker,
such as TLS Call.
Process Note: I am landing a patch by skan as it has been LGTMed, and
continuing to iterate on the review is simply slowing us down at this point.
We can and will continue to iterate in tree.
Patch By: skan
Differential Revision: https://reviews.llvm.org/D70157
static void *ifunc(void) __attribute__((ifunc("resolver")));
void foo() { ifunc(); }
The relocation produced by the ifunc() call:
1. gcc -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
2. gcc -msecure-plt -PIE => R_PPC_PLTREL24 r_addend=0x8000
3. clang -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
4. clang -msecure-plt -fPIE => R_PPC_REL24
4 is incorrect. The R_PPC_REL24 needs a call stub due to ifunc. If this
relocation is mixed with other R_PPC_PLTREL24(r_addend=0x8000) in a
function, both GNU ld and lld (after D71621 fix) may produce a wrong
result.
This patch fixes 4 to use R_PPC_PLTREL24, which matches GCC.
Both GNU ld and lld (after D71621) will be happy.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D71649
The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code.
A later piece of code swaps the operands and changing the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands.
To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes are enough to fix the issue, but doing both to make this code very safe.
I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that.
Differential Revision: https://reviews.llvm.org/D71736
Summary: Replace the integer immediate intrisics with splat vector variants so they can be applied as optimizations for the C/C++ intrinsics.
Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71614
The SELR(Mux) instructions can be converted to two-address form as LOCR(Mux)
instructions whenever one of the sources are the same reg as dest. By adding
this mapping in getTwoOperandOpcode(), we get:
- Two-address hints in getRegAllocationHints() for select register
instructions.
- No need anymore for special handling in SystemZShortenInst.cpp -
shortenSelect() removed.
The two-address hints are now added before the GRX32 hints, which should be
preferred.
Review: Ulrich Weigand
https://reviews.llvm.org/D68870
Summary:
With -gdwarf-5 local variable locations are emitted as DW_FORM_loclistx
form instead of the regular DW_FORM_sec_offset. Teach
DWARFDie::getLocations to understand the new format and use it in
llvm-symbolizer "FRAME" command.
Reviewers: pcc, jdoerfert
Subscribers: srhines, aprantl, hiraditya, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70756
It was recently discovered that the handling of CC values was actually broken
since overflow was not properly handled ('nsw' flag not checked for).
Add and sub instructions now have a new target specific instruction flag
named SystemZII::CCIfNoSignedWrap. It means that the CC result can be used
instead of a compare with 0, but only if the instruction has the 'nsw' flag
set.
This patch also adds the improvements of conversion to logical instructions
and the analyzing of add with immediates, to be able to eliminate more
compares.
Review: Ulrich Weigand
https://reviews.llvm.org/D66868
The back-end currently has special DAGCombine code to detect
cases where two floating-point extend or truncate operations
can be combined into a single vector operation.
This patch extends that support to also handle strict FP operations.
Note that currently only the case where both operations have the
same input chain are supported. This already suffices to cover
the common case where the operations result from scalarizing a
non-legal vector type. More general cases can be supported in
the future.
In general SVE intrinsics are considered predicated and merging
with everything else having suitable decoration. For predicated
zeroing operations (like the predicate logical instructions) we
use the "_z" suffix. After this change all intrinsics use their
expected names (i.e. orr instead of or and eor instead of xor).
I've removed intrinsics and patterns for condition code setting
instructions as that data is not returned as part of the intrinsic.
The expectation is to ask for a cc flag explicitly.
For example:
a = and_z(pg, p1, p2)
cc = ptest_<flag>(pg, a)
With the code generator expected to use "s" variants of instructions
when available.
Differential Revision: https://reviews.llvm.org/D71715
The calculator was considering instructions such as KILLs as clobbers
of a physical address. This is wrong as meta instructions such as KILLs
produce no output in the final program and thus don't clobber or change
any physical location's value. As a result they're safe to ignore whilst
calculating location list ranges.
reviewers: aprantl, vsk
diff revision: https://reviews.llvm.org/D70497
fixes: https://bugs.llvm.org/show_bug.cgi?id=38753
A sequence of additions or multiplications that is known not to wrap, may wrap
if it's order is changed (i.e., reassociated). Therefore when vectorizing
integer sum or product reductions, their no-wrap flags need to be removed.
Fixes PR43828
Patch by Denis Antrushin
Differential Revision: https://reviews.llvm.org/D69563
Recommit 23c28c4043 (reverted in
dcb48f50bd) with a fix for an assert
"Request for a fixed size on a scalable object" being triggered in
`LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the
TypeSize object.
1) Fix an issue with the incorrect value being used for the number of
elements being passed to [d|w]lstp. We were trying to check that
the value was available at LoopStart, but this doesn't consider
that the last instruction in the block could also define the
register. Two helpers have been added to RDA for this.
2) Insert some code to now try to move the element count def or the
insertion point so that we can perform more tail predication.
3) Related to (1), the same off-by-one could prevent us from
generating a low-overhead loop when a mov lr could have been
the last instruction in the block.
4) Fix up some instruction attributes so that not all the
low-overhead loop instructions are labelled as branches and
terminators - as this is not true for dls/dlstp.
Differential Revision: https://reviews.llvm.org/D71609
Record the discovered VPT blocks while checking for validity and, for
now, only handle blocks that begin with VPST and not VPT. We're now
allowing more than one instruction to define vpr, but each block must
somehow be predicated using the vctp. This leaves us with several
scenarios which need fixing up:
1) A VPT block with is only predicated by the vctp and has no
internal vpr defs.
2) A VPT block which is only predicated by the vctp but has an
internal vpr def.
3) A VPT block which is predicated upon the vctp as well as another
vpr def.
4) A VPT block which is not predicated upon a vctp, but contains it
and all instructions within the block are predicated upon in.
The changes needed are, for:
1) The easy one, just remove the vpst and unpredicate the
instructions in the block.
2) Remove the vpst and unpredicate the instructions up to the
internal vpr def. Need insert a new vpst to predicate the
remaining instructions.
3) No nothing.
4) The vctp will be inside a vpt and the instruction will be removed,
so adjust the size of the mask on the vpst.
Differential Revision: https://reviews.llvm.org/D71107
The only thing its getting from the X86TargetLowering class is
the subtarget which we can easily pass. This function only has
one call site now since this might help the compiler inline it.
Explicitly return both the flag result and the chain result for
STRICT_FCMP nodes. This removes an assumption in the caller that
getValue(1) is the right way to get the chain.
EmitCmp will just immediately call EmitTest and discard the null
constant only to have EmitTest create it again if it doesn't fold.
So just skip all that and go directly to EmitTest.
Summary:
All the use cases of CallAnalyzer use the same call site parameter to
both construct the CallAnalyzer, and then pass to the analysis member.
This change removes this duplication.
Reviewers: davidxl, eraman, Jim
Reviewed By: davidxl
Subscribers: Jim, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71645
Summary:
It is pretty common to assume that something is not zero.
Even optimizer itself sometimes emits such assumptions
(e.g. `addAssumeNonNull()` in `PromoteMemoryToRegister.cpp`).
But we currently don't deal with such assumptions :)
The only way `isKnownNonZero()` handles assumptions is
by calling `computeKnownBits()` which calls `computeKnownBitsFromAssume()`.
But `x != 0` does not tell us anything about set bits,
it only says that there are *some* set bits.
So naturally, `KnownBits` does not get populated,
and we fail to make use of this assumption.
I propose to deal with this special case by special-casing it
via adding a `isKnownNonZeroFromAssume()` that returns boolean
when there is an applicable assumption.
While there, we also deal with other predicates,
mainly if the comparison is with constant.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43267 | PR43267 ]].
Differential Revision: https://reviews.llvm.org/D71660
This is a pretty rare case, when CxtI and assume are
in the same basic block, with assume being located later.
We were already checking that assumption was guaranteed to be
executed, but we omitted CxtI itself from consideration,
and as the test (miscompile) shows, that is incorrect.
As noted in D71660 review by @nikic.
A function marked `noreturn` may contain unreachable terminators: these
should not be considered cold, as the function may be a trampoline.
rdar://58068594
Recommit after making the same API change in non-x86 targets. This has been build for all targets, and tested for effected ones. Why the difference? Because my disk filled up when I tried make check for all.
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
as it causes a layering violation/dependency cycle:
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp -> llvm/DebugInfo/DWARF/DWARFExpression.h
llvm/include/llvm/DebugInfo/DWARF/DWARFOptimizer.h -> llvm/CodeGen/NonRelocatableStringpool.h
This reverts commit abc7f6800d.
Summary:
When we use undefined symbol with its qualname, we are not able
to generate that symbol because of the logic of early "continue"
that skip the qualname symbol. This patch fixes it.
Differential revision: https://reviews.llvm.org/D71667
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
This is in advance of assembler padding directives support where we'll need to bundle the label w/the corresponding faulting instruction to avoid padding being inserted between.
Summary: Instead of crashing due to the `llvm_unreachable`, provide a proper
error when invalid fixups/relocations are encountered.
Reviewers: asb, lenary
Reviewed By: asb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71536
This patch enables the machine outliner for RISC-V and adds the
necessary logic for checking whether sequences can be safely outlined,
and describing how they should be outlined. Outlined functions are
called using the register t0 (x5) as the return address register, which
must be available for an occurrence of a sequence to be safely outlined.
Differential Revision: https://reviews.llvm.org/D66210
Summary:
This patch associates ordinal numbers to the DDG Nodes allowing
the builder to order nodes within a pi-block in program order. The
algorithm works by simply assuming the order in which the BBList
is fed into the builder. The builder already relies on the blocks being
in program order so that it can compute the dependencies correctly.
Similarly the order of instructions in their parent basic blocks
determine their program order.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70986
Summary:
The default static (non-PIC, non-PIE) model for 32-bit powerpc does not
use @PLT annotations and relocations in GCC. LLVM shouldn't use @PLT
annotations either, because it breaks secure-PLT linking with (some
versions of?) GNU LD.
Update the available-externally.ll test to reflect that default mode should be
the same as the static relocation, by using the same check prefix.
Reviewed by: sfertile
Differential Revision: https://reviews.llvm.org/D70570
Summary:
Ignore looking at blocks that are unreachable from entry when
collecting candidates for hosting.
Normally the consthoist pass is executed in the llc pipeline,
just after unreachableblockelim. So it is abnormal to have code
that is unreachable from the entry block. But when running the
pass as part of opt, for example as part of fuzzy testing, we
might trigger various kinds of asserts when collecting candidates
if we include unreachable blocks in that analysis.
It seems like a waste of time to hoist constants in unreachble
blocks, so the solution is to simply ignore such blocks when
collecting the hoisting candidates.
The two added test cases used to end up in two different asserts,
and the intention with the checks is just to verify that we no
longer fail.
Fixes: PR43903
Reviewers: spatel
Reviewed By: spatel
Subscribers: hiraditya, uabelho, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71678
The debug line verbose printing was printing the wrong values for rows
added via DW_LNE_end_sequence, because the row was being printed AFTER
its state had been reset following it being appended to the line table.
This patch fixes this issue by printing the row before appending it.
Reviewers: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D71664
That patch is extracted from the D70709. It moves CompileUnit, DeclContext
into llvm/DebugInfo/DWARF. It also adds new file DWARFOptimizer with
AddressesMap class. AddressesMap generalizes functionality
from RelocationManager.
Differential Revision: https://reviews.llvm.org/D71271
In certain situations after inlining and simplification we end up with
code that is _almost_ a min/max pattern, but contains constants that
have been demand-bit optimised to the wrong values, ending up with code
like:
%1 = icmp slt i32 %shr, -128
%2 = select i1 %1, i32 128, i32 %shr
%.inv = icmp sgt i32 %shr, 127
%spec.select.i = select i1 %.inv, i32 127, i32 %2
%conv7 = trunc i32 %spec.select.i to i8
This should be turned into a min/max pattern, but the -128 in the first
select was instead transformed into 128, as only the bottom byte was
ever demanded.
To fix this, I've put in further canonicalisation for the immediates of
selects, preferring to use the same value as the icmp if available.
Differential Revision: https://reviews.llvm.org/D71516
The inconsistency caused uops mode to fail on an older version of libpfm
since the dispatched_port was added as an alias for executed_port only
after v4.6.0 of libpfm.
Differential revision: https://reviews.llvm.org/D71665
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.
These improvements cover architectures implementing ARMv5TE or Thumb-2.
Reviewers: dmgreen, efriedma, john.brawn
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70072
Since the address pool doesn't get populated in this case (due to the
lack of inlining, no child DIEs are added to the CU - so no addresses
are needed for the DIEs themselves) until the range list is emitted - at
the time the attributes are added to the CU, the address pool is empty.
So check whether the address pool will be used for the range lists & add
an addr_base if that's the case.
Move these data structures closer together so their emission code can
eventually share more of its implementation.
Was an egregious bug (completely untested, evidently) where I hadn't
inverted a DWARFv5 test as needed, so it was doing the exact opposite of
what was required & thus tried to emit a DWARFv5 range list header in
DWARFv4.
Reapply 8e04896288 which was
reverted in a8154e5e0c.
Summary:
The vector pattern `(a + b + 1) / 2` was previously selected to an
avgr_u instruction regardless of nuw flags, but this is incorrect in
the case where either addition may have an unsigned wrap. This CL
changes the existing pattern to require both adds to have nuw flags
and adds builtin functions and intrinsics for the avgr_u instructions
because the corrected pattern is not representable in C.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71648
We really need to update the isel patterns to prevent this, but
that requires some tablegen de-tangling. So this hack will work
for correctness in the short term.
LLJITBuilder will now use JITLink on supported platforms even if a custom
JITTargetMachineBuilder is supplied, provided that neither the code model,
nor the relocation model, nor the ObjectLinkingLayerCreator is set.
Add new intrinsics
llvm.experimental.constrained.minimum
llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.
Includes SystemZ back-end support.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71624
Loop fusion previously had a method to check whether a loop was in rotated form. This method has
been moved into the LoopInfo class. This patch removes the old isRotated method from loop fusion,
in favour of the new one in LoopInfo.
Summary:
This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get `<badref>`s printed when logging added instructions.
It also adds a check in `Worklist::Add` that ensures that all added instructions have parents.
Simple test case that illustrates the difference when run with `--debug-only=instcombine`:
```
define i32 @test35(i32 %a, i32 %b) {
%1 = or i32 %a, 1135
%2 = or i32 %1, %b
ret i32 %2
}
```
Before this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting: %1 = or i32 %a, 1135
IC: Visiting: %2 = or i32 %1, %b
IC: ADD: %2 = or i32 %a, %b
IC: Old = %3 = or i32 %1, %b
New = <badref> = or i32 %2, 1135
IC: ADD: <badref> = or i32 %2, 1135
...
```
With this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting: %1 = or i32 %a, 1135
IC: Visiting: %2 = or i32 %1, %b
IC: ADD: %2 = or i32 %a, %b
IC: Old = %3 = or i32 %1, %b
New = <badref> = or i32 %2, 1135
IC: ADD: %3 = or i32 %2, 1135
...
```
Reviewers: fhahn, davide, spatel, foad, grosser, nikic
Reviewed By: nikic
Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71093
Since VFS paths can be in either Posix or Windows style, we have to use
a more flexible definition of "absolute" path.
The key here is that FileSystem::makeAbsolute is now virtual, and the
RedirectingFileSystem override checks for either concept of absolute
before trying to make the path absolute by combining it with the current
directory.
Differential Revision: https://reviews.llvm.org/D70701
This reverts commit 830e08b98b and eb1857ce0d.
This commit leads to an unexpected failure on test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll.
The review will need more changes before its re-commited.
Let the "mnop-mcount" function attribute simply be present or non-present.
Update SystemZ backend as well to use hasFnAttribute() instead.
Review: Ulrich Weigand
https://reviews.llvm.org/D71669
Summary:
This patch teaches InstCombine to accept a new parameter: maximum number of iterations over functions.
InstCombine tries to simplify instructions by iterating over the whole function until the function stops changing. As a consequence, the last iteration before reaching a fixpoint visits all instructions in the worklist and never performs any rewrites.
Bounding the number of iterations can have 2 benefits:
* In case the users of the pass can make a good guess about the number of required iterations, we can save the time normally spent on the last iteration that doesn't change anything.
* When the wants to use InstCombine as a cleanup pass, it may be enough to run just a few iterations and stop even before reaching a fixpoint. This can be also useful for implementing a lightweight pass pipeline (think `-O1`).
This patch does not change the behavior of opt or Clang -- limiting the number of iterations is entirely opt-in.
Reviewers: fhahn, davide, spatel, foad, nlopes, grosser, lebedev.ri, nikic, xbolva00
Reviewed By: spatel
Subscribers: craig.topper, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71145
Remappings involving extern "C" names were already supported in the
context of <local-name>s, but this support didn't work for remapping the
complete mangling itself. (Eg, we would remap X<foo> but not foo itself,
if foo is an extern "C" function.)
getTargetConstant prevents any optimizations from operating on the
value and basically says its already been iseled. But since we
want the index to be in a register, this isn't true.
Prior to this we were generating a vbroadcast with an immediate
argument which is illegal and was flagged by the expensive checks
bot.
Refactor the splatting of a constant to a vector so that common code is used
both for Power9 and Power8.
Patch by: Anil Mahmud
Differential Revision: https://reviews.llvm.org/D71481
Summary: Replace the integer immediate intrisics with splat vector variants so they can be applied as optimizations for the C/C++ intrinsics.
Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71614
(This commit restores the original branch (4272372c57) and applies an
additional change dropped from the original in a bad merge. This change
should address the previous bot failures. Both changes reviewed by pete.)
Summary:
This commit builds upon Derek Schuff's 2014 commit for attaching labels to
existing fragments ( Diff Revision: http://reviews.llvm.org/D5915 )
When temporary labels appear ahead of a fragment, MCObjectStreamer will
track the temporary label symbol in a "Pending Labels" list. Labels are
associated with fragments when a real fragment arrives; otherwise, an empty
data fragment will be created if the streamer's section changes or if the
stream finishes.
This commit moves the "Pending Labels" list into each MCStream, so that
this label-fragment matching process is resilient to section changes. If
the streamer emits a label in a new section, switches to another section to
do other work, then switches back to the first section and emits a
fragment, that initial label will be associated with this new fragment.
Labels will only receive empty data fragments in the case where no other
fragment exists for that section.
The downstream effects of this can be seen in Mach-O relocations. The
previous approach could produce local section relocations and external
symbol relocations for the same data in an object file, and this mix of
relocation types resulted in problems in the ld64 Mach-O linker. This
commit ensures relocations triggered by temporary labels are consistent.
Reviewers: pete, ab, dschuff
Reviewed By: pete, dschuff
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71368
This reverts commit 1f3dd83cc1, reapplying
commit bb1b0bc4e5.
The original commit failed on some builds seemingly due to the use of a
bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.
This should eliminate a regression seen in D63815.
If we are FP extending the high half extract of a vector,
we should be able to peek through a bitcast sitting
between the extract and extend.
This replaces tablegen patterns with a more general
DAG to DAG override, so we can handle any casted type.
Differential Revision: https://reviews.llvm.org/D71515
Summary:
This is used by the extending_loads combine to tell the apply step which
use is the preferred one to fold and the other uses should be re-written
to consume.
Depends on D69117
Reviewers: volkan, bogner
Reviewed By: volkan
Subscribers: hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69147
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. This patch enables the salvaging of casts by using the
DW_OP_LLVM_convert operator for SExt and Trunc instructions.
There is another issue which is exposed by this fix, in which fragment
DIExpressions (which are preserved more readily by this patch) for
values that must be split across registers in ISel trigger an assertion,
as the 'split' fragments extend beyond the bounds of the fragment
DIExpression causing an error. This patch also fixes this issue by
checking the fragment status of DIExpressions which are to be split, and
dropping fragments that are invalid.
Summary:
Instead of generating two i64 instructions for each load or store of a
volatile i128 value (two LDRs or STRs), now emit a single LDP or STP.
Reviewers: labrinea, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69559
Summary:
r347747 added support for clustering mem ops with FI base operands
including support for fixed stack objects in shouldClusterFI, but
apparently this was never tested.
This patch fixes shouldClusterFI to work with scaled as well as
unscaled load/store instructions, and fixes the ordering of memory ops
in MemOpInfo::operator< to ensure that memory addresses always
increase, regardless of which direction the stack grows.
Subscribers: MatzeB, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71334