The original change was reverted in rL317217 because of the failure in
the RS4GC testcase. I couldn't reproduce the failure on my local machine
(macbook) but could reproduce it on a linux box.
The failure was around removing the uses of invariant.start. The fix
here is to just RAUW undef (which was the first implementation in D39388).
This is perfectly valid IR as discussed in the review.
llvm-svn: 317225
Summary:
Invariant.start on memory locations has the property that the memory
location is unchanging. However, this is not true in the face of
rewriting statepoints for GC.
Teach RS4GC about removing invariant.start so that optimizations after
RS4GC does not incorrect sink a load from the memory location past a
statepoint.
Added test showcasing the issue.
Reviewers: reames, apilipenko, dneilson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39388
llvm-svn: 317215
undefined reference to `llvm::TargetPassConfig::ID' on
clang-ppc64le-linux-multistage
This reverts commit eea333c33fa73ad225ef28607795984829f65688.
llvm-svn: 317213
Summary:
This is mostly a noop (most of the test diffs are renamed blocks).
There are a few temporary register renames (eax<->ecx) and a few blocks are
shuffled around.
See the discussion in PR33325 for more details.
Reviewers: spatel
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D39456
llvm-svn: 317211
When splitting a large load to smaller legally-typed loads, the last load should be padded to reach the size of the previous one so a CONCAT_VECTORS node could reunite them again.
The code currently pads the last load to reach the size of the first load (instead of the previous).
Differential Revision: https://reviews.llvm.org/D38495
Change-Id: Ib60b55ed26ce901fabf68108daf52683fbd5013f
llvm-svn: 317206
MSA stores and loads to the stack are more likely to require an
emergency GPR spill slot due to the smaller offsets available
with those instructions.
Handle this by overestimating the size of the stack by determining
the largest offset presuming that all callee save registers are
spilled and accounting of incoming arguments when determining
whether an emergency spill slot is required.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39056
llvm-svn: 317204
As of today we only use .cfi_offset to specify the offset of a CSR, but
we never use .cfi_restore when the CSR is restored.
If we want to perform a more advanced type of shrink-wrapping, we need
to use .cfi_restore in order to switch the CFI state between blocks.
This patch only aims at adding support for the directive.
Differential Revision: https://reviews.llvm.org/D36114
llvm-svn: 317199
Summary:
SpeculativelyExecuteBB can flatten the CFG by doing
speculative execution followed by a select instruction.
When the speculatively executed BB contained dbg intrinsics
the result could be a little bit weird, since those dbg
intrinsics were inserted before the select in the flattened
CFG. So when single stepping in the debugger, printing the
value of the variable referenced in the dbg intrinsic, it
could happen that it looked like the variable had values
that never actually were assigned to the variable.
This patch simply discards all dbg intrinsics that were found
in the speculatively executed BB.
Reviewers: aprantl, chandlerc, craig.topper
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39494
llvm-svn: 317198
The generic dag combiner will fold:
(shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
(shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
This can create constants which are too large to use as an immediate.
Many ALU operations are also able of performing the shl, so we can
unfold the transformation to prevent a mov imm instruction from being
generated.
Other patterns, such as b + ((a << 1) | 510), can also be simplified
in the same manner.
Differential Revision: https://reviews.llvm.org/D38084
llvm-svn: 317197
Rather than looking at model numbers just check for the mmx feature flag. While there promote INTEL_PENTIUM_MMX to a CPU type instead of a subtype so that we don't have weird type with only one subtype.
llvm-svn: 317184
Sometimes program headers have larger alignments than any of the
sections they contain. Currently yaml2obj can't produce such files. A
bug recently appeared in llvm-objcopy that failed in such a case. I'd
like to be able to add tests to llvm-objcopy for such cases.
This change adds an optional alignment parameter to program headers that
will be used instead of calculating the alignment.
Differential Revision: https://reviews.llvm.org/D39130
llvm-svn: 317139
These include:
* Several functions for creating an LLVMDIBuilder,
* LLVMDIBuilderCreateCompileUnit,
* LLVMDIBuilderCreateFile,
* LLVMDIBuilderCreateDebugLocation.
Patch by Harlan Haskins.
Differential Revision: https://reviews.llvm.org/D32368
llvm-svn: 317135
Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.
We have to account for pre-SSE41 targets not supporting PACKUSDW
llvm-svn: 317128
This patch is to rewrite FileOutputBuffer as two separate classes;
one for file-backed output buffer and the other for memory-backed
output buffer. I think the new code is easier to follow because two
different implementations are now actually separated as different
classes.
Unlike the previous implementation, the class that does not replace the
final output file using rename(2) does not create a temporary file at
all. Instead, it allocates memory using mmap(2) and use it. I think
this is an improvement because it is now guaranteed that the temporary
memory region doesn't trigger any I/O and there's now zero chance to
leave a temporary file behind. Also, it shouldn't impose new restrictions
because were using mmap IO too.
Differential Revision: https://reviews.llvm.org/D39449
llvm-svn: 317127
This will enable us to prefer VALIGND/Q during shuffle lowering in order to get the extended register encoding space when BWI isn't available. But if we end up not using the extended registers we can switch VPALIGNR for the shorter VEX encoding.
Differential Revision: https://reviews.llvm.org/D39401
llvm-svn: 317122
Summary: In the compile phase of SamplePGO+ThinLTO, ICP is not invoked. Instead, indirect call targets will be included as function metadata for ThinIndex to buidl the call graph. This should not only include functions defined in other modules, but also functions defined in the same module, otherwise ThinIndex may find the callee dead and eliminate it, while ICP in backend will revive the symbol, which leads to undefined symbol.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D39480
llvm-svn: 317118
This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses.
We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch.
Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want.
Differential Revision: https://reviews.llvm.org/D39402
llvm-svn: 317112
This is necessary because DCE is applied to full LTO modules. Without
this change, a reference from a dead ThinLTO global to a dead full
LTO global will result in an undefined reference at link time.
This problem is only observable when --gc-sections is disabled, or
when targeting COFF, as the COFF port of lld requires all symbols to
have a definition even if all references are dead (this is consistent
with link.exe).
This change also adds an EliminateAvailableExternally pass at -O0. This
is necessary to handle the situation on Windows where a non-prevailing
copy of a linkonce_odr function has an SEH filter function; any
such filters must be DCE'd because they will contain a call to the
llvm.localrecover intrinsic, passing as an argument the address of the
function that the filter belongs to, and llvm.localrecover requires
this function to be defined locally.
Fixes PR35142.
Differential Revision: https://reviews.llvm.org/D39484
llvm-svn: 317108
Summary:
[X86] Teach fast isel to handle i64 sitofp with AVX.
For some reason we only handled i32 sitofp with AVX. But with SSE only we support i64 so we should do the same with AVX.
Also add i686 command lines for the 32-bit tests. 64-bit tests are in a separate file to avoid a fast-isel abort failure in 32-bit mode.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39450
llvm-svn: 317102
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Changed CFI instructions so that they:
- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal
Added CFIInstrInserter pass:
- analyzes each basic block to determine cfa offset and register valid at
its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D35844
llvm-svn: 317100
Summary:
Compute the strongly connected components of the CFG and fall back to
use these for blocks that are in loops that are not detected by
LoopInfo when computing loop back-edge and exit branch probabilities.
Reviewers: dexonsmith, davidxl
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D39385
llvm-svn: 317094
This patch reverts rL311205 that was initially a wrong fix. The real problem
was in intersection of signed and unsigned ranges (see rL316552), and the
patch being reverted masked the problem instead of fixing it.
By now, the test against which rL311205 was made works OK even without this
code. This revert patch also contains a test case that demonstrates incorrect
behavior caused by rL311205: it is caused by incorrect choise of signed max
instead of unsigned.
llvm-svn: 317088
So far we've only been using PACKSS truncations with 'all-bits or zero-bits' patterns (vector comparison results etc.). When really we can safely use it for any case as long as the number of sign bits reach down to the last 16-bits (or 8-bits if we're truncating to bytes).
The next steps after this is add the equivalent support for PACKUS and to support packing to sub-128 bit vectors for truncating stores etc.
Differential Revision: https://reviews.llvm.org/D39476
llvm-svn: 317086
Summary:
By replacing branches to CommonExitBlock, we remove the node from
CommonExitBlock's predecessors, invalidating the iterator. The problem
is exposed when the common exit block has multiple predecessors and
needs to sink lifetime info. The modification in the test case trigger
the issue.
Reviewers: davidxl, davide, wmi
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39112
llvm-svn: 317084
fmod specification requires the sign of the remainder is
the same as numerator in case remainder is zero.
Reviewers: gottesmm, scanon, arsenm, davide, craig.topper
Reviewed By: scanon
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D39225
llvm-svn: 317081
This formulation might be slightly slower since I eagerly compute the cheap replacements. If anyone sees this having a compile time impact, let me know and I'll use lazy population instead.
llvm-svn: 317048
Change the map key from DIFile* to the absolute path string. Computing
the absolute path isn't expensive because we already have a map that
caches the full path keyed on DIFile*.
llvm-svn: 317041
Currently the selects are created with the names of their inputs concatenated together. It's possible to get cases that chain these selects together resulting in long names due to multiple levels of concatenation. Our internal branch of llvm managed to generate names over 100000 characters in length on a particular test due to an extreme compounding of the names.
This patch changes the name to a generic name that is not dependent on its inputs.
Differential Revision: https://reviews.llvm.org/D39440
llvm-svn: 317024
Previously, we call write(2) for each 32767 byte chunk. That is not
efficient because Linux can handle much larger write requests.
This patch changes the chunk size on Linux to 1 GiB.
This patch also changes the default chunks size to SSIZE_MAX. I think
that doesn't in practice change this function's behavior on any operating
system because SSIZE_MAX on 64-bit machine is unrealistically large,
and writing 2 GiB (SSIZE_MAX on 32-bit) on a 32-bit machine by a single
call of write(2) is also unrealistic, as the userspace is usually
limited to 2 GiB. That said, it is in general a good thing to do because
a write larger than SSIZE_MAX is implementation-defined in POSIX.
Differential Revision: https://reviews.llvm.org/D39444
llvm-svn: 317015
Fixes a bunch of assert-on-invalid-bitcode regressions after 315483.
Expected<> calls assertIsChecked() in its dtor, and operator bool() only calls
setChecked() if there's no error. So for functions that don't return an error
itself, the Expected<> version needs explicit code to disarm the error that the
ErrorOr<> code didn't need.
https://reviews.llvm.org/D39437
llvm-svn: 317010
If a select instruction tests the returned flag of a cmpxchg instruction and
selects between the returned value of the cmpxchg instruction and its compare
operand, the result of the select will always be equal to its false value.
Differential Revision: https://reviews.llvm.org/D39383
llvm-svn: 316994
The optimisation remarks for loop unrolling with an unrolled remainder looks something like:
test.c:7:18: remark: completely unrolled loop with 3 iterations [-Rpass=loop-unroll]
C[i] += A[i*N+j];
^
test.c:6:9: remark: unrolled loop by a factor of 4 with run-time trip count [-Rpass=loop-unroll]
for(int j = 0; j < N; j++)
^
This removes the first of the two messages.
Differential revision: https://reviews.llvm.org/D38725
llvm-svn: 316986
extract subvector of vXi1 from vYi1 is poorly supported by LLVM and most of the time end with an assertion.
This patch fixes this issue by adding new patterns to the TD file.
Reviewers:
1. guyblank
2. igorb
3. zvi
4. ayman
5. craig.topper
Differential Revision: https://reviews.llvm.org/D39292
Change-Id: Ideb4d7e946c8d40cfce2920891f2d89fe64c58f8
llvm-svn: 316981
Rename `Offset`, `Scale`, `Length` into `Begin`, `Step`, `End` respectively
to make naming of similar entities for Ranges and Range Checks more
consistent.
Differential Revision: https://reviews.llvm.org/D39414
llvm-svn: 316979
Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior.
Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used?
llvm-svn: 316978
As noted in the nice block comment, the previous code didn't actually handle multi-entry loops correctly, it just assumed SCEV didn't analyze such loops. Given SCEV has comments to the contrary, that seems a bit suspect. More importantly, the pass actually requires loopsimplify form which ensures a loop-preheader is available. Remove the excessive generaility and shorten the code greatly.
Note that we do successfully analyze many multi-entry loops, but we do so by converting them to single entry loops. See the added test case.
llvm-svn: 316976
This patch fixes the miscompile that happens when PRE hoists loads across guards and
other instructions that don't always pass control flow to their successors. PRE is now prohibited
to hoist across such instructions because there is no guarantee that the load standing after such
instruction is still valid before such instruction. For example, a load from under a guard may be
invalid before the guard in the following case:
int array[LEN];
...
guard(0 <= index && index < LEN);
use(array[index]);
Differential Revision: https://reviews.llvm.org/D37460
llvm-svn: 316975
Previously, the code returned early from the *function* when it couldn't find a free expansion, it should be returning from the *transform*. I don't have a test case, noticed this via inspection.
As a follow up, I'm going to revisit the logic in the extract function. I think that essentially the whole helper routine can be replaced with SCEVExpander, but I wanted to do that in a series of separate commits.
llvm-svn: 316974
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725
If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.
llvm-svn: 316967
InferAddressSpaces assumes the pointee type of addrspacecast
is the same as the operand, which is not always true and causes
invalid IR.
This bug cause build failure in HCC.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39432
llvm-svn: 316957
It's not guaranteed. There's a bug open to sort them in predecessor
order, but it won't happen anytime soon. In the meanwhile, passes
will have to do an O(#preds) scan. Such is life.
llvm-svn: 316953
Revert r316478.
A test case has failed.
Will recommit this change once we find and fix the failure.
This reverts commit 7c330fabaedaba3d02c58bc3cc1198896c895f34.
llvm-svn: 316952
Summary:
For reference, see: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116589.html
This patch fleshes out the instruction class hierarchy with respect to atomic and
non-atomic memory intrinsics. With this change, the relevant part of the class
hierarchy becomes:
IntrinsicInst
-> MemIntrinsicBase (methods-only class)
-> MemIntrinsic (non-atomic intrinsics)
-> MemSetInst
-> MemTransferInst
-> MemCpyInst
-> MemMoveInst
-> AtomicMemIntrinsic (atomic intrinsics)
-> AtomicMemSetInst
-> AtomicMemTransferInst
-> AtomicMemCpyInst
-> AtomicMemMoveInst
-> AnyMemIntrinsic (both atomicities)
-> AnyMemSetInst
-> AnyMemTransferInst
-> AnyMemCpyInst
-> AnyMemMoveInst
This involves some class renaming:
ElementUnorderedAtomicMemCpyInst -> AtomicMemCpyInst
ElementUnorderedAtomicMemMoveInst -> AtomicMemMoveInst
ElementUnorderedAtomicMemSetInst -> AtomicMemSetInst
A script for doing this renaming in downstream trees is included below.
An example of where the Any* classes should be used in LLVM is when reasoning
about the effects of an instruction (ex: aliasing).
---
Script for renaming AtomicMem* classes:
PREFIXES="[<,([:space:]]"
CLASSES="MemIntrinsic|MemTransferInst|MemSetInst|MemMoveInst|MemCpyInst"
SUFFIXES="[;)>,[:space:]]"
REGEX="(${PREFIXES})ElementUnorderedAtomic(${CLASSES})(${SUFFIXES})"
REGEX2="visitElementUnorderedAtomic(${CLASSES})"
FILES=$( grep -E "(${REGEX}|${REGEX2})" -r . | tr ':' ' ' | awk '{print $1}' | sort | uniq )
SED_SCRIPT="s~${REGEX}~\1Atomic\2\3~g"
SED_SCRIPT2="s~${REGEX2}~visitAtomic\1~g"
for f in $FILES; do
echo "Processing: $f"
sed -i ".bak" -E "${SED_SCRIPT};${SED_SCRIPT2};${EA_SED_SCRIPT};${EA_SED_SCRIPT2}" $f
done
Reviewers: sanjoy, deadalnix, apilipenko, anna, skatkov, mkazantsev
Reviewed By: sanjoy
Subscribers: hfinkel, jholewinski, arsenm, sdardis, nhaehnle, JDevlieghere, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38419
llvm-svn: 316950
Summary: The two 32-bit words were swapped. Update a test omitted in reverted r316270.
Reviewers: jtony, aaron.ballman
Subscribers: nemanjai, kbarton
Differential Revision: https://reviews.llvm.org/D39163
llvm-svn: 316916
Summary:
INC/DEC don't update the carry flag so we need to make sure we don't try to use it.
This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested.
The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860.
This should fully fix PR35068 finishing the fix started in r316860.
Reviewers: RKSimon, zvi, spatel
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39411
llvm-svn: 316913
Identifies kernels which performs device side kernel enqueues and emit
metadata for the associated hidden kernel arguments. Such kernels are
marked with calls-enqueue-kernel function attribute by
AMDGPUOpenCLEnqueueKernelLowering pass and later on
hidden kernel arguments metadata HiddenDefaultQueue and
HiddenCompletionAction are emitted for them.
Differential Revision: https://reviews.llvm.org/D39255
llvm-svn: 316907
- Targets that want to support memcmp expansions now return the list of
supported load sizes.
- Expansion codegen does not assume that all power-of-two load sizes
smaller than the max load size are valid. For examples, this is not the
case for x86(32bit)+sse2.
Fixes PR34887.
llvm-svn: 316905
This version of the patch includes a fix addressing a stage2 LTO buildbot
failure and addressed some additional nits.
Original commit message:
This updates the SCCP solver to use of the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.
For the following function, f() can be optimized to ret i32 2 with
this change
source_filename = "sccp.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define i32 @main() local_unnamed_addr #0 {
entry:
%call = tail call fastcc i32 @f(i32 1)
%call1 = tail call fastcc i32 @f(i32 47)
%add3 = add nsw i32 %call, %call1
ret i32 %add3
}
; Function Attrs: noinline norecurse nounwind readnone uwtable
define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
entry:
%c1 = icmp sle i32 %x, 100
%cmp = icmp sgt i32 %x, 300
%. = select i1 %cmp, i32 1, i32 2
ret i32 %.
}
attributes #1 = { noinline }
Reviewers: davide, sanjoy, efriedma, dberlin
Reviewed By: davide, dberlin
Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D36656
llvm-svn: 316891
This version of the patch includes a fix addressing a stage2 LTO buildbot
failure and addressed some additional nits.
Original commit message:
This updates the SCCP solver to use of the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.
For the following function, f() can be optimized to ret i32 2 with
this change
source_filename = "sccp.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define i32 @main() local_unnamed_addr #0 {
entry:
%call = tail call fastcc i32 @f(i32 1)
%call1 = tail call fastcc i32 @f(i32 47)
%add3 = add nsw i32 %call, %call1
ret i32 %add3
}
; Function Attrs: noinline norecurse nounwind readnone uwtable
define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
entry:
%c1 = icmp sle i32 %x, 100
%cmp = icmp sgt i32 %x, 300
%. = select i1 %cmp, i32 1, i32 2
ret i32 %.
}
attributes #1 = { noinline }
Reviewers: davide, sanjoy, efriedma, dberlin
Reviewed By: davide, dberlin
Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D36656
llvm-svn: 316887
I have a future patch that wants to make use of the one of the partial functions in one of the earlier memory folding methods and the current ordering prevents that.
llvm-svn: 316883
The old PM sets the options of what used to be known as "latesimplifycfg" on the
instantiation after the vectorizers have run, so that's what we'redoing here.
FWIW, there's a later SimplifyCFGPass instantiation in both PMs where we do not
set the "late" options. I'm not sure if that's intentional or not.
Differential Revision: https://reviews.llvm.org/D39407
llvm-svn: 316869
Introduce a isConstOrDemandedConstSplat helper function that can recognise a constant splat build vector for at least the demanded elts we care about.
llvm-svn: 316866
If the carry flag is being used, this transformation isn't safe.
This does prevent some test cases from using DEC now, but I'll try to look into that separately.
Fixes PR35068.
llvm-svn: 316860
This code attempted to say that v8i16/v16i16 VSELECT is legal if BWI and VLX are enabled, but the only way we could reach this point is if the condition was not a vXi1 type. Which means it really wasn't legal.
We don't have any tests that exercise this code. So I'm hoping it wasn't really reachable.
llvm-svn: 316851
I think this code is unreachable due to some promotions that occur elsewhere. I'll look into that to be sure, but for now I thought I should at least fix the obvious typo.
llvm-svn: 316840
This is no-functional-change-intended.
This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired).
The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.
Differential Revision: https://reviews.llvm.org/D38631
llvm-svn: 316835
For cases where we know the floating point representations match the bitcasted integer equivalent, allow bitcasting to these types.
This is especially useful for the X86 floating point compare results which return all/zero bits but as a floating point type.
Differential Revision: https://reviews.llvm.org/D39289
llvm-svn: 316831
LLVM crashes when factoring out an out-of-bound index into preceding dimension
and the preceding dimension uses vector index. Simply bail out now when this
case happens.
Differential Revision: https://reviews.llvm.org/D38677
llvm-svn: 316824
Summary:
We shouldn't do this transformation if the function is marked nobuitlin.
We were only checking that the return type is floating point, we really should be checking the argument types and argument count as well. This can be accomplished by using the other version of getLibFunc that takes the Function and not just the name.
We should also be checking TLI::has since sqrtf is a macro on Windows.
Fixes PR32559.
Reviewers: hfinkel, spatel, davide, efriedma
Reviewed By: davide, efriedma
Subscribers: efriedma, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D39381
llvm-svn: 316819
Summary:
The removed code checks that we are able to handle a 64-bit number, but
the code we're calling takes two dwords (for a total of 64 bits), so this
is always true.
Reviewers: zturner, rnk, majnemer, compnerd
Reviewed By: zturner
Subscribers: amccarth, hiraditya, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D39263
llvm-svn: 316814
In function DAGCombiner::visitSIGN_EXTEND_INREG, sext can be combined with extload even if sextload is not supported by target, then
if sext is the only user of extload, there is no big difference, no harm no benefit.
if extload has more than one user, the combined sextload may block extload from combining with other zext, causes extra zext instructions generated. As demonstrated by the attached test case.
This patch add the constraint that when sextload is not supported by target, sext can only be combined with extload if it is the only user of extload.
Differential Revision: https://reviews.llvm.org/D39108
llvm-svn: 316802
When accessing a member for a symbol with an offset greater than 2^32 -
1 the current Archive::Symbol::getMember implementation will overflow
and cause unexpected behavior. This change simply fixes that. In
particular if you call "llvm-nm --print-armap" on an archive that has
this behavior you'll get an error.
Differential Revision: https://reviews.llvm.org/D39379
llvm-svn: 316801
We were handling the non-hidden case in lib/Target/TargetMachine.cpp,
but the hidden case was handled in architecture dependent code and
only X86_64 and AArch64 were covered.
While it is true that some code sequences in some ABIs might be able
to produce the correct value at runtime, that doesn't seem to be the
common case.
I left the AArch64 code in place since it also forces a got access for
non-pic code. It is not clear if that is needed, but it is probably
better to change that in another commit.
llvm-svn: 316799
gtest depends on this #define to determine whether it can
use various classes like std::tuple, or whether it has to fall
back to experimental classes in the std::tr1 namespace. The
check in the current version of gtest relies on the value of
the `__cplusplus` macro, but MSVC provides a non-conformant
value of this macro, making it effectively impossible to detect
C++11. In short, LLVM compiled with MSVC has been silently
using the tr1 versions of several classes since the beginning of
time.
This would normally be pretty benign, except that in the latest
preview of MSVC they have marked all of the tr1 classes
deprecated, so it spews thousands of warnings.
llvm-svn: 316798
Summary:
ValueTracking was recognizing not all variations of clamp. Swapping of
true value and false value of select was added to fix this problem. The
first patch was reverted because it caused miscompile in NVPTX target.
Added corresponding test cases.
Reviewers: spatel, majnemer, efriedma, reames
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D39240
llvm-svn: 316795
Summary:
Original oss-fuzz report:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3727#c2
The minimized test case that causes this failure:
5b 5b 5b 3d 47 53 00 5b 3d 5d 5b 5d 0a [[[=GS.[=][].
Note the string "=GS\x00". The failure happens because the code is
searching the string against an array of known collated names. "GS\x00"
is a hit, but since len takes into account an extra NUL byte, indexing
into cp->name[len] goes one byte past it's allocated memory. Fix this to
use a strlen(cp->name) comparison to account for NUL bytes in the input.
Reviewers: pcc
Reviewed By: pcc
Subscribers: hctim, kcc
Differential Revision: https://reviews.llvm.org/D39380
llvm-svn: 316786
The Android relocation packing format is a more compact
format for dynamic relocations in executables and DSOs
that is based on delta encoding and SLEBs. An overview
of the format can be found in the Android source code:
https://android.googlesource.com/platform/bionic/+/refs/heads/master/tools/relocation_packer/src/delta_encoder.h
This patch implements relocation packing using that format.
This implementation uses a more intelligent algorithm for compressing
relative relocations than Android's own relocation packer. As a
result it can generally create smaller relocation sections than
that packer. If I link Chromium for Android targeting ARM32 I get a
.rel.dyn of size 174693 bytes, as compared to 371832 bytes with gold
and the Android packer.
Differential Revision: https://reviews.llvm.org/D39152
llvm-svn: 316775
This is a follow up change for D37569.
Currently the transformation is limited to the case when:
* The loop has a single latch with the condition of the form: ++i <pred> latchLimit, where <pred> is u<, u<=, s<, or s<=.
* The step of the IV used in the latch condition is 1.
* The IV of the latch condition is the same as the post increment IV of the guard condition.
* The guard condition is of the form i u< guardLimit.
This patch enables the transform in the case when the latch is
latchStart + i <pred> latchLimit, where <pred> is u<, u<=, s<, or s<=.
And the guard is
guardStart + i u< guardLimit
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D39097
llvm-svn: 316768
Patch by Robert Widmann.
Expose getters for MetadataType and TokenType publicly in the C API.
Discovered a need for these while trying to wrap the intrinsics API.
Differential Revision: https://reviews.llvm.org/D38809
llvm-svn: 316762
Patch improves next things:
* Fixes assert/crash in getOpDesc when giving it a invalid expression op code.
* DWARFExpression::print() called DWARFExpression::Operation::getEndOffset() which
returned and used uninitialized field EndOffset. Patch fixes that.
* Teaches verifier to verify DW_AT_location and error out on broken expressions.
Differential revision: https://reviews.llvm.org/D39294
llvm-svn: 316756
Not having the subclass data on an MemIntrinsicSDNodes means it was possible
to try to fold 2 nodes with the same operands but differing MMO flags. This
would trip an assertion when trying to refine the alignment between the 2
MachineMemOperands.
Differential Revision: https://reviews.llvm.org/D38898
llvm-svn: 316737
This ensures that each segment has a unique address.
Without this, consecutive zero sized symbols would
end up with the same address and the linker cannot
map symbols to unique data segments.
Differential Revision: https://reviews.llvm.org/D39107
llvm-svn: 316717
Summary:
Currently we skip merging when extra moves may be added in the header of switch instead of the case block, if the case block is used as an incoming
block of a PHI. If all the incoming values of the PHIs are non-constants and the destination block is dominated by the switch block then extra moves are likely not added by ISel, so there is no need to skip merging in this case.
Reviewers: efriedma, junbuml, davidxl, hfinkel, qcolombet
Reviewed By: efriedma
Subscribers: dberlin, kuhar, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37343
llvm-svn: 316711
As far as I can tell, this matches gcc: -mfloat-abi determines the
calling convention for all functions except those explicitly defined as
soft-float in the ARM RTABI.
This change only affects cases where the user specifies -mfloat-abi to
override the default calling convention derived from the target triple.
Fixes https://bugs.llvm.org//show_bug.cgi?id=34530.
Differential Revision: https://reviews.llvm.org/D38299
llvm-svn: 316708
These headers have static variables in them, which would easily create
ODR violations if the header was included in another header, and the
constants were used by an inline function, for example.
llvm-svn: 316706
Summary: There are certain requirements for debug location of debug intrinsics, e.g. the scope of the DILocalVariable should be the same as the scope of its debug location. As a result, we should not add discriminator encoding for debug intrinsics.
Reviewers: dblaikie, aprantl
Reviewed By: aprantl
Subscribers: JDevlieghere, aprantl, bjope, sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D39343
llvm-svn: 316703
If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/UDIVREM8_SEXT_HREG operation.
This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case.
This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG.
Differential Revision: https://reviews.llvm.org/D38275
llvm-svn: 316702
When going to explain this to someone else, I got tripped up by the complicated meaning of IsKnownNonEscapingObject in load-store promotion. Extract a helper routine and clarify naming/scopes to make this a bit more obvious.
llvm-svn: 316699
Both GNU ld and MS link.exe support declaring ordinals this way.
A test will be added in lld.
Differential Revision: https://reviews.llvm.org/D39327
llvm-svn: 316690
Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it.
Differential Revision: https://reviews.llvm.org/D38756
llvm-svn: 316685
Summary:
This causes a segfault on ARM when (I think) the pass manager is used multiple times.
Reset set the (last) current section to NULL without saving the corresponding LastEMSInfo back into the map. The next use of the streamer then save the LastEMSInfo for the NULL section leaving the LastEMSInfo mapping for the last current section (the one that was there before the reset) NULL which cause the LastEMSInfo to be set to NULL when the section is being used again.
The reuse of the section (pointer) might mean that the map was holding dangling pointers previously which is why I went for clearing the map and resetting the info, making it as similar to the state right after the constructor run as possible. The AArch64 one doesn't have segfault (since LastEMS isn't a pointer) but it seems to have the same issue.
The segfault is likely caused by https://reviews.llvm.org/D30724 which turns LastEMSInfo into a pointer. As mentioned above, it seems that the actual issue was older though.
No test is included since the test is believed to be too complicated for such an obvious fix and not worth doing.
Reviewers: llvm-commits, shankare, t.p.northover, peter.smith, rengolin
Reviewed By: rengolin
Subscribers: mgorny, aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38588
llvm-svn: 316679
Summary:
On musl libc, stdin/out/err are defined as `FILE* const` globals,
and their address is not implicitly convertible to void *,
or at least gcc 6 doesn't allow it, giving errors like:
```
error: cannot initialize return object of type 'void *' with an rvalue of type 'FILE *const *' (aka '_IO_FILE *const *')
EXPLICIT_SYMBOL(stderr);
^~~~~~~~~~~~~~~~~~~~~~~
```
Add an explicit cast to fix that problem.
Reviewers: marsupial, krytarowski, dim
Reviewed By: dim
Differential Revision: https://reviews.llvm.org/D39297
llvm-svn: 316672
Summary:
This seems to be the only place in llvm we directly call qsort. We can replace
this with a call to array_pod_sort. Also minor cleanup of the sorting function.
Reviewers: bkramer, Eugene.Zelenko, rafael
Reviewed By: bkramer
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D39214
llvm-svn: 316671
Currently we do not represent runtime preemption in the IR, which has several
drawbacks:
1) The semantics of GlobalValues differ depending on the object file format
you are targeting (as well as the relocation-model and -fPIE value).
2) We have no way of disabling inlining of run time interposable functions,
since in the IR we only know if a function is link-time interposable.
Because of this llvm cannot support elf-interposition semantics.
3) In LTO builds of executables we will have extra knowledge that a symbol
resolved to a local definition and can't be preemptable, but have no way to
propagate that knowledge through the compiler.
This patch adds preemptability specifiers to the IR with the following meaning:
dso_local --> means the compiler may assume the symbol will resolve to a
definition within the current linkage unit and the symbol may be accessed
directly even if the definition is not within this compilation unit.
dso_preemptable --> means that the compiler must assume the GlobalValue may be
replaced with a definition from outside the current linkage unit at runtime.
To ease transitioning dso_preemptable is treated as a 'default' in that
low-level codegen will still do the same checks it did previously to see if a
symbol should be accessed indirectly. Eventually when IR producers emit the
specifiers on all Globalvalues we can change dso_preemptable to mean 'always
access indirectly', and remove the current logic.
Differential Revision: https://reviews.llvm.org/D20217
llvm-svn: 316668
Summary:
We no longer add vectors of pointers as candidates for
load/store vectorization. It does not seem to work anyway,
but without this patch we can end up in asserts when trying
to create casts between an integer type and the pointer of
vectors type.
The test case I've added used to assert like this when trying to
cast between i64 and <2 x i16*>:
opt: ../lib/IR/Instructions.cpp:2565: Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
#0 PrintStackTraceSignalHandler(void*)
#1 SignalHandler(int)
#2 __restore_rt
#3 __GI_raise
#4 __GI_abort
#5 __GI___assert_fail
#6 llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*)
#7 llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>::CreateBitOrPointerCast(llvm::Value*, llvm::Type*, llvm::Twine const&)
#8 Vectorizer::vectorizeStoreChain(llvm::ArrayRef<llvm::Instruction*>, llvm::SmallPtrSet<llvm::Instruction*, 16u>*)
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D39296
llvm-svn: 316665
Summary:
The code comments indicate that no effort has been spent on
handling load/stores when the size isn't a multiple of the
byte size correctly. However, the code only avoided types
smaller than 8 bits. So for example a load of an i28 could
still be considered as a candidate for vectorization.
This patch adjusts the code to behave according to the code
comment.
The test case used to hit the following assert when
trying to use "cast" an i32 to i28 using CreateBitOrPointerCast:
opt: ../lib/IR/Instructions.cpp:2565: Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
#0 PrintStackTraceSignalHandler(void*)
#1 SignalHandler(int)
#2 __restore_rt
#3 __GI_raise
#4 __GI_abort
#5 __GI___assert_fail
#6 llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*)
#7 llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>::CreateBitOrPointerCast(llvm::Value*, llvm::Type*, llvm::Twine const&)
#8 (anonymous namespace)::Vectorizer::vectorizeLoadChain(llvm::ArrayRef<llvm::Instruction*>, llvm::SmallPtrSet<llvm::Instruction*, 16u>*)
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39295
llvm-svn: 316663
These instructions were previously marked as codegen only preventing
them from being assembled as microMIPS or disassembled.
Reviewers: atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D39123
llvm-svn: 316656
PR35071 exposed the fact that MipsInstrInfo::removeBranch did not walk past
debug instructions when removing branches for the control flow optimizer, which
lead to duplicated conditional branches. If the target of the branch was a
removable block, only the conditional branch in the terminating position would
have it's MBB operands updated, leaving the first branch with a dangling MBB
operand. The MIPS long branch pass would then trigger an assertion when
attempting to examine the instruction with dangling MBB operand.
This resolves PR35071.
Thanks to Alex Richardson for reporting the issue!
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39288
llvm-svn: 316654
Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0.
This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities.
Differential Revision: https://reviews.llvm.org/D38941
llvm-svn: 316647
Add the option to lookup an address in the debug information and print
out the file, function, block and line table details.
Differential revision: https://reviews.llvm.org/D38409
llvm-svn: 316619
Max backedge taken count is always expected to be a constant; and this is
usually true by construction -- it is a SCEV expression with constant inputs.
However, if the max backedge expression ends up being computed to be a udiv with
a constant zero denominator[0], SCEV does not fold the result to a constant
since there is no constant it can fold it to (SCEV has no representation for
"infinity" or "undef").
However, in computeMaxBECountForLT we already know the denominator is positive,
and thus at least 1; and we can use this fact to avoid dividing by zero.
[0]: We can end up with a constant zero denominator if the signed range of the
stride is more precise than the unsigned range.
llvm-svn: 316615