We were using the correct pseudo-instruction, but because the operand's flags
weren't set correctly we still ended up emitting incorrect relocations during
MC lowering.
llvm-svn: 289566
Summary:
This is last in of a series of patches to evolve ADCE.cpp to support
removing of unnecessary control flow.
This patch adds the code to update the control and data flow graphs
to remove the dead control flow.
Also update unit tests to test the capability to remove dead,
may-be-infinite loop which is enabled by the switch
-adce-remove-loops.
Previous patches:
D23824 [ADCE] Add handling of PHI nodes when removing control flow
D23559 [ADCE] Add control dependence computation
D23225 [ADCE] Modify data structures to support removing control flow
D23065 [ADCE] Refactor anticipating new functionality (NFC)
D23102 [ADCE] Refactoring for new functionality (NFC)
Reviewers: dberlin, majnemer, nadav, mehdi_amini
Subscribers: llvm-commits, david2050, freik, twoh
Differential Revision: https://reviews.llvm.org/D24918
llvm-svn: 289548
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.
Assuming little endian target:
i8 *a = ...
i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
i32 val = *((i32)a)
i8 *a = ...
i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
i32 val = BSWAP(*((i32)a))
This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.
Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)
Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.
The general scheme is to match OR expressions by recursively calculating the origin of individual bits which constitute the resulting OR value. If all the OR bits come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.
Reviewed By: hfinkel, RKSimon, filcab
Differential Revision: https://reviews.llvm.org/D26149
llvm-svn: 289538
In certain cases it is possible that transient instructions such as
%reg = IMPLICIT_DEF as a single instruction in a basic block to reach
the MipsHazardSchedule pass. This patch teaches MipsHazardSchedule to
properly look through such cases.
Reviewers: vkalintiris, zoran.jovanovic
Differential Revision: https://reviews.llvm.org/D27209
llvm-svn: 289529
Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed.
Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support.
llvm-svn: 289523
Summary:
Since we don't break BBs for function calls. We might get some insane counts
(wrap of unsigned) in the presence of noreturn calls.
This patch sets these counts to zero instead of the wrapped number.
Reviewers: davidxl
Subscribers: xur, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D27602
llvm-svn: 289521
Summary:
This pass will be used to relax instructions which use out of bounds
memory accesses to equivalent operations that can work with the
addresses.
The pass currently implements relaxation for the STDWPtrQRr instruction.
Without this pass, an assertion error would be hit in the pseudo expansion pass.
In the future, we will need to add more instructions to this pass. We can do
that on a case-by-case basic.
Reviewers: arsenm, kparzysz
Subscribers: wdng, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D27650
llvm-svn: 289517
The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are:
Support for folding multiple operands at once which reference the same load
Support for folding multiple loads into a single instruction
Walk all the operands of the instruction for varidic instructions (this is a bug fix!)
Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here.
Differential Revision: https://reviews.llvm.org/D24103
llvm-svn: 289510
The stack slot reuse code had a really amusing bug. We ended up only reusing a stack slot exact once (initial use + reuse) within a basic block. If we had a third statepoint to process, we ended up allocating a new set of stack slots. If we crossed a basic block boundary, the set got cleared. As a result, code which is invoke heavy doesn't see the problem, but multiple calls within a basic block does. Net result: as we optimize invokes into calls, lowering gets worse.
The root error here is that the bitmap uses by the custom allocator wasn't kept in sync. The result was that we ended up resizing the bitmap on the next statepoint (to handle the cross block case), reset the bit once, but then never reset it again.
Differential Revision: https://reviews.llvm.org/D25243
llvm-svn: 289509
Turns out if you were on windows and your default target wasn't windows the system-windows feature wasn't getting enabled.
This fixes that and updates the coff-dwarf test to rely on the new "target-windows" feature. That test was the reason why system-windows was changed to not always be enabled on Windows hosts.
llvm-svn: 289503
This reverts commit r260386.
These tests all pass for me locally. I have no idea if they will pass on all configurations, so I'll watch the bots closely.
llvm-svn: 289490
Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so move 1 or 2 bytes to VSR through MTVSRWZ is much faster than store the extended value into stack and load it with LXSIWZX.
This patch fixes pr31144.
Differential Revision: https://reviews.llvm.org/D27287
llvm-svn: 289473
This patch ensures the correct minimum bit width during type-shrinking.
Previously when type-shrinking, we always sign-extended values back to their
original width. However, if we are going to sign-extend, and the sign bit is
unknown, we have to increase the minimum bit width by one bit so the
sign-extend will fill the upper bits correctly. If the sign bit is known to be
zero, we can perform a zero-extend instead. This should fix PR31243.
Reference: https://llvm.org/bugs/show_bug.cgi?id=31243
Differential Revision: https://reviews.llvm.org/D27466
llvm-svn: 289470
DWARF specifies that "line 0" really means "no appropriate source
location" in the line table. By default, use this for branch targets
and some other cases that have no specified source location, to
prevent inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).
Updated patch allows enabling or suppressing this behavior for all
unspecified source locations.
Differential Revision: http://reviews.llvm.org/D24180
llvm-svn: 289468
Reverts r289412. It caused an OOB PHI operand access in instcombine when
ASan is enabled. Reduction in progress.
Also reverts "[SCEVExpander] Add a test case related to r289412"
llvm-svn: 289453
We could truncate the condition and then try to fold the add into the
original condition value causing wrong case constants to be used.
Move the offset transform ahead of the truncate transform and return
after each transform, so there's no chance of getting confused values.
Fix for:
https://llvm.org/bugs/show_bug.cgi?id=31260
llvm-svn: 289442
Summary:
As discussed on mailing list, for ThinLTO importing we don't need
to import all the fields of the DICompileUnit. Don't import enums,
macros, retained types lists. Also only import local scoped imported
entities. Since we don't currently import any global variables,
we also don't need to import the list of global variables (added an
assert to verify none are being imported).
This is being done by pre-populating the value map entries to map
the unneeded metadata to nullptr. For the imported entities, we can
simply replace the source module's list with a new list containing
only those needed imported entities. This is done in the IRLinker
constructor so that value mapping automatically does the desired
mapping.
Reviewers: mehdi_amini, dexonsmith, dblaikie, aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27635
llvm-svn: 289441
PMULDQ returns the 64-bit result of the signed multiplication of the lower 32-bits of vXi64 vector inputs, we can lower with this if the sign bits stretch that far.
Differential Revision: https://reviews.llvm.org/D27657
llvm-svn: 289426
Summary:
These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits.
As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded.
Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too.
Reviewers: spatel, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27611
llvm-svn: 289419
SCEVExpand computes the insertion point for the components of a SCEV to be code
generated. When it comes to generating code for a division, SCEVexpand would
not be able to check (at compilation time) all the conditions necessary to avoid
a division by zero. The patch disables hoisting of expressions containing
divisions by anything other than non-zero constants in order to avoid hoisting
these expressions past conditions that should hold before doing the division.
The patch passes check-all on x86_64-linux.
Differential Revision: https://reviews.llvm.org/D27216
llvm-svn: 289412
When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded.
A fallback pattern is added to catch these cases and provide another solution.
Differential Revision: https://reviews.llvm.org/D27661
llvm-svn: 289404
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.
Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:
- That by the time an struct access tuple `(base-type, offset)` is
"reduced" to a scalar base type, the offset is `0`. For instance, in
C++ you can't start from, say `("struct-a", 16)`, and end up with
`("int", 4)` -- by the time the base type is `"int"`, the offset
better be zero. In particular, a variant of this invariant is needed
for `llvm::getMostGenericTBAA` to be correct.
- That there are no cycles in a struct path.
- That struct type nodes have their offsets listed in an ascending
order.
- That when generating the struct access path, you eventually reach the
access type listed in the tbaa tag node.
Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26438
llvm-svn: 289402
We have found that -- when the selected subarchitecture has a scheduling model
and we are not optimizing for size -- the machine-instruction combiner uses a
too-simple algorithm to compute the cost of one of the two alternatives [before
and after running a combining pass on a section of code], and therefor it throws
away the combination results too often.
This fix has the potential to help any ISA with the potential to combine
instructions and for which at least one subarchitecture has a scheduling model.
As of now, this is only known to definitely affect AArch64 subarchitectures with
a scheduling model.
Regression tested on AMD64/GNU-Linux, new test case tested to fail on an
unpatched compiler and pass on a patched compiler.
Patch by Abe Skolnik and Sebastian Pop.
llvm-svn: 289399
Regcall calling convention passes mask types arguments in x86 GPR registers.
The review includes the changes required in order to support v32i1, v16i1 and v8i1.
Differential Revision: https://reviews.llvm.org/D27148
llvm-svn: 289383
This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them.
llvm-svn: 289377