Revert r316478.
A test case has failed.
Will recommit this change once we find and fix the failure.
This reverts commit 7c330fabaedaba3d02c58bc3cc1198896c895f34.
llvm-svn: 316952
Summary: The two 32-bit words were swapped. Update a test omitted in reverted r316270.
Reviewers: jtony, aaron.ballman
Subscribers: nemanjai, kbarton
Differential Revision: https://reviews.llvm.org/D39163
llvm-svn: 316916
Summary:
INC/DEC don't update the carry flag so we need to make sure we don't try to use it.
This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested.
The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860.
This should fully fix PR35068 finishing the fix started in r316860.
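As a rough illustration (hypothetical functions, not the actual tests in the tree) of when the new lowering applies and when the carry flag blocks it:
define void @inc_mem(i32* %p) {
  ; result unused and no flag consumers: may now lower to a locked INC when INC is not slow
  %unused = atomicrmw add i32* %p, i32 1 seq_cst
  ret void
}
define i1 @sub_then_ugt(i32* %p) {
  ; the compare consumes the carry flag of the subtract, so locked DEC must not be used here
  %old = atomicrmw sub i32* %p, i32 1 seq_cst
  %r = icmp ugt i32 %old, 1
  ret i1 %r
}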
Reviewers: RKSimon, zvi, spatel
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39411
llvm-svn: 316913
Identifies kernels which perform device-side kernel enqueues and emits
metadata for the associated hidden kernel arguments. Such kernels are
marked with the calls-enqueue-kernel function attribute by the
AMDGPUOpenCLEnqueueKernelLowering pass, and later on the hidden kernel
argument metadata HiddenDefaultQueue and HiddenCompletionAction is
emitted for them.
Differential Revision: https://reviews.llvm.org/D39255
llvm-svn: 316907
- Targets that want to support memcmp expansions now return the list of
supported load sizes.
- Expansion codegen does not assume that all power-of-two load sizes
smaller than the max load size are valid. For example, this is not the
case for x86(32bit)+sse2.
Fixes PR34887.
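As an illustration (hypothetical function, not from the tree), a 16-byte comparison whose expansion must be built only from the load sizes the target reports as supported; on 32-bit x86 with SSE2 there is a 16-byte vector load but no 8-byte scalar integer load:
declare i32 @memcmp(i8*, i8*, i32)
define i1 @eq16(i8* %a, i8* %b) {
  %c = call i32 @memcmp(i8* %a, i8* %b, i32 16)
  %r = icmp eq i32 %c, 0
  ret i1 %r
}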
llvm-svn: 316905
This version of the patch includes a fix addressing a stage2 LTO buildbot
failure and addresses some additional nits.
Original commit message:
This updates the SCCP solver to use the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.
For the following function, f() can be optimized to ret i32 2 with
this change
source_filename = "sccp.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define i32 @main() local_unnamed_addr #0 {
entry:
%call = tail call fastcc i32 @f(i32 1)
%call1 = tail call fastcc i32 @f(i32 47)
%add3 = add nsw i32 %call, %call1
ret i32 %add3
}
; Function Attrs: noinline norecurse nounwind readnone uwtable
define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
entry:
%c1 = icmp sle i32 %x, 100
%cmp = icmp sgt i32 %x, 300
%. = select i1 %cmp, i32 1, i32 2
ret i32 %.
}
attributes #1 = { noinline }
Reviewers: davide, sanjoy, efriedma, dberlin
Reviewed By: davide, dberlin
Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D36656
llvm-svn: 316891
This version of the patch includes a fix addressing a stage2 LTO buildbot
failure and addresses some additional nits.
Original commit message:
This updates the SCCP solver to use the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.
For the following function, f() can be optimized to ret i32 2 with
this change
source_filename = "sccp.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define i32 @main() local_unnamed_addr #0 {
entry:
%call = tail call fastcc i32 @f(i32 1)
%call1 = tail call fastcc i32 @f(i32 47)
%add3 = add nsw i32 %call, %call1
ret i32 %add3
}
; Function Attrs: noinline norecurse nounwind readnone uwtable
define internal fastcc i32 @f(i32 %x) unnamed_addr #1 {
entry:
%c1 = icmp sle i32 %x, 100
%cmp = icmp sgt i32 %x, 300
%. = select i1 %cmp, i32 1, i32 2
ret i32 %.
}
attributes #1 = { noinline }
Reviewers: davide, sanjoy, efriedma, dberlin
Reviewed By: davide, dberlin
Subscribers: mcrosier, gberry, mssimpso, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D36656
llvm-svn: 316887
The old PM sets the options of what used to be known as "latesimplifycfg" on the
instantiation after the vectorizers have run, so that's what we're doing here.
FWIW, there's a later SimplifyCFGPass instantiation in both PMs where we do not
set the "late" options. I'm not sure if that's intentional or not.
Differential Revision: https://reviews.llvm.org/D39407
llvm-svn: 316869
Introduce an isConstOrDemandedConstSplat helper function that can recognise a constant splat build vector for at least the demanded elts we care about.
llvm-svn: 316866
If the carry flag is being used, this transformation isn't safe.
This does prevent some test cases from using DEC now, but I'll try to look into that separately.
Fixes PR35068.
llvm-svn: 316860
This is no-functional-change-intended.
This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired).
The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.
Differential Revision: https://reviews.llvm.org/D38631
llvm-svn: 316835
For cases where we know the floating point representations match the bitcasted integer equivalent, allow bitcasting to these types.
This is especially useful for the X86 floating point compare results which return all/zero bits but as a floating point type.
Differential Revision: https://reviews.llvm.org/D39289
llvm-svn: 316831
LLVM crashes when factoring out an out-of-bound index into preceding dimension
and the preceding dimension uses vector index. Simply bail out now when this
case happens.
Differential Revision: https://reviews.llvm.org/D38677
llvm-svn: 316824
Summary:
We shouldn't do this transformation if the function is marked nobuiltin.
We were only checking that the return type is floating point; we really should be checking the argument types and argument count as well. This can be accomplished by using the other version of getLibFunc that takes the Function and not just the name.
We should also be checking TLI::has since sqrtf is a macro on Windows.
Fixes PR32559.
Reviewers: hfinkel, spatel, davide, efriedma
Reviewed By: davide, efriedma
Subscribers: efriedma, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D39381
llvm-svn: 316819
In DAGCombiner::visitSIGN_EXTEND_INREG, sext can be combined with an extload even if sextload is not supported by the target. Then:
- If sext is the only user of the extload, there is no big difference; no harm, no benefit.
- If the extload has more than one user, the combined sextload may block the extload from combining with other zexts, causing extra zext instructions to be generated, as demonstrated by the attached test case.
This patch adds the constraint that when sextload is not supported by the target, sext can only be combined with an extload if it is the only user of that extload.
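A hypothetical IR illustration of the multi-user case:
define i64 @two_users(i16* %p) {
  %v = load i16, i16* %p    ; becomes an extending load
  %s = sext i16 %v to i64   ; folding this into a sextload ...
  %z = zext i16 %v to i64   ; ... would block this zext from folding with the load
  %r = add i64 %s, %z
  ret i64 %r
}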
Differential Revision: https://reviews.llvm.org/D39108
llvm-svn: 316802
We were handling the non-hidden case in lib/Target/TargetMachine.cpp,
but the hidden case was handled in architecture dependent code and
only X86_64 and AArch64 were covered.
While it is true that some code sequences in some ABIs might be able
to produce the correct value at runtime, that doesn't seem to be the
common case.
I left the AArch64 code in place since it also forces a got access for
non-pic code. It is not clear if that is needed, but it is probably
better to change that in another commit.
llvm-svn: 316799
Summary:
ValueTracking was not recognizing all variations of clamp. Swapping of the
true value and false value of the select was added to fix this problem. The
first patch was reverted because it caused a miscompile in the NVPTX target.
Added corresponding test cases.
Reviewers: spatel, majnemer, efriedma, reames
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D39240
llvm-svn: 316795
This is a follow up change for D37569.
Currently the transformation is limited to the case when:
* The loop has a single latch with the condition of the form: ++i <pred> latchLimit, where <pred> is u<, u<=, s<, or s<=.
* The step of the IV used in the latch condition is 1.
* The IV of the latch condition is the same as the post increment IV of the guard condition.
* The guard condition is of the form i u< guardLimit.
This patch enables the transform in the case when the latch is
latchStart + i <pred> latchLimit, where <pred> is u<, u<=, s<, or s<=.
And the guard is
guardStart + i u< guardLimit
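A sketch of the newly handled shape (hypothetical names and values; only meant to show the guard and latch forms described above, the exact IV placement the pass requires may differ slightly):
declare void @llvm.experimental.guard(i1, ...)
define void @pred_loop(i32 %guardStart, i32 %guardLimit, i32 %latchStart, i32 %latchLimit) {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %gidx = add i32 %guardStart, %i
  %guard.cond = icmp ult i32 %gidx, %guardLimit   ; guardStart + i u< guardLimit
  call void (i1, ...) @llvm.experimental.guard(i1 %guard.cond) [ "deopt"() ]
  %i.next = add nuw i32 %i, 1
  %lidx = add i32 %latchStart, %i.next
  %latch.cond = icmp ult i32 %lidx, %latchLimit   ; latchStart + i u< latchLimit (post-increment i)
  br i1 %latch.cond, label %loop, label %exit
exit:
  ret void
}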
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D39097
llvm-svn: 316768
This patch improves the following things:
* Fixes an assert/crash in getOpDesc when giving it an invalid expression opcode.
* DWARFExpression::print() called DWARFExpression::Operation::getEndOffset(), which
returned the uninitialized field EndOffset that was then used. The patch fixes that.
* Teaches verifier to verify DW_AT_location and error out on broken expressions.
Differential revision: https://reviews.llvm.org/D39294
llvm-svn: 316756
Not having the subclass data on MemIntrinsicSDNodes meant it was possible
to try to fold 2 nodes with the same operands but differing MMO flags. This
would trip an assertion when trying to refine the alignment between the 2
MachineMemOperands.
Differential Revision: https://reviews.llvm.org/D38898
llvm-svn: 316737
This ensures that each segment has a unique address.
Without this, consecutive zero sized symbols would
end up with the same address and the linker cannot
map symbols to unique data segments.
Differential Revision: https://reviews.llvm.org/D39107
llvm-svn: 316717
Summary:
Currently we skip merging when extra moves may be added in the header of the switch instead of the case block, if the case block is used as an incoming
block of a PHI. If all the incoming values of the PHIs are non-constants and the destination block is dominated by the switch block, then extra moves are likely not added by ISel, so there is no need to skip merging in this case.
Reviewers: efriedma, junbuml, davidxl, hfinkel, qcolombet
Reviewed By: efriedma
Subscribers: dberlin, kuhar, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37343
llvm-svn: 316711
As far as I can tell, this matches gcc: -mfloat-abi determines the
calling convention for all functions except those explicitly defined as
soft-float in the ARM RTABI.
This change only affects cases where the user specifies -mfloat-abi to
override the default calling convention derived from the target triple.
Fixes https://bugs.llvm.org//show_bug.cgi?id=34530.
Differential Revision: https://reviews.llvm.org/D38299
llvm-svn: 316708
Summary: There are certain requirements for debug location of debug intrinsics, e.g. the scope of the DILocalVariable should be the same as the scope of its debug location. As a result, we should not add discriminator encoding for debug intrinsics.
Reviewers: dblaikie, aprantl
Reviewed By: aprantl
Subscribers: JDevlieghere, aprantl, bjope, sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D39343
llvm-svn: 316703
If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/UDIVREM8_SEXT_HREG operation.
This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case.
This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG.
Differential Revision: https://reviews.llvm.org/D38275
llvm-svn: 316702
Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it.
Differential Revision: https://reviews.llvm.org/D38756
llvm-svn: 316685
Currently we do not represent runtime preemption in the IR, which has several
drawbacks:
1) The semantics of GlobalValues differ depending on the object file format
you are targeting (as well as the relocation-model and -fPIE value).
2) We have no way of disabling inlining of run time interposable functions,
since in the IR we only know if a function is link-time interposable.
Because of this llvm cannot support elf-interposition semantics.
3) In LTO builds of executables we will have extra knowledge that a symbol
resolved to a local definition and can't be preemptable, but have no way to
propagate that knowledge through the compiler.
This patch adds preemptability specifiers to the IR with the following meaning:
dso_local --> means the compiler may assume the symbol will resolve to a
definition within the current linkage unit and the symbol may be accessed
directly even if the definition is not within this compilation unit.
dso_preemptable --> means that the compiler must assume the GlobalValue may be
replaced with a definition from outside the current linkage unit at runtime.
To ease transitioning dso_preemptable is treated as a 'default' in that
low-level codegen will still do the same checks it did previously to see if a
symbol should be accessed indirectly. Eventually when IR producers emit the
specifiers on all GlobalValues, we can change dso_preemptable to mean 'always
access indirectly', and remove the current logic.
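A minimal sketch of the new syntax (hypothetical symbols):
@internal_state = dso_local global i32 0           ; may be accessed directly
@interposable   = dso_preemptable global i32 0     ; may be replaced from outside the linkage unit
define dso_local void @helper() {
  ret void
}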
Differential Revision: https://reviews.llvm.org/D20217
llvm-svn: 316668
Summary:
We no longer add vectors of pointers as candidates for
load/store vectorization. It does not seem to work anyway,
but without this patch we can end up in asserts when trying
to create casts between an integer type and the pointer of
vectors type.
The test case I've added used to assert like this when trying to
cast between i64 and <2 x i16*>:
opt: ../lib/IR/Instructions.cpp:2565: Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
#0 PrintStackTraceSignalHandler(void*)
#1 SignalHandler(int)
#2 __restore_rt
#3 __GI_raise
#4 __GI_abort
#5 __GI___assert_fail
#6 llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*)
#7 llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>::CreateBitOrPointerCast(llvm::Value*, llvm::Type*, llvm::Twine const&)
#8 Vectorizer::vectorizeStoreChain(llvm::ArrayRef<llvm::Instruction*>, llvm::SmallPtrSet<llvm::Instruction*, 16u>*)
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D39296
llvm-svn: 316665
Summary:
The code comments indicate that no effort has been spent on
handling load/stores when the size isn't a multiple of the
byte size correctly. However, the code only avoided types
smaller than 8 bits. So for example a load of an i28 could
still be considered as a candidate for vectorization.
This patch adjusts the code to behave according to the code
comment.
The test case used to hit the following assert when
trying to use "cast" an i32 to i28 using CreateBitOrPointerCast:
opt: ../lib/IR/Instructions.cpp:2565: Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
#0 PrintStackTraceSignalHandler(void*)
#1 SignalHandler(int)
#2 __restore_rt
#3 __GI_raise
#4 __GI_abort
#5 __GI___assert_fail
#6 llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*)
#7 llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>::CreateBitOrPointerCast(llvm::Value*, llvm::Type*, llvm::Twine const&)
#8 (anonymous namespace)::Vectorizer::vectorizeLoadChain(llvm::ArrayRef<llvm::Instruction*>, llvm::SmallPtrSet<llvm::Instruction*, 16u>*)
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39295
llvm-svn: 316663
These instructions were previously marked as codegen-only, preventing
them from being assembled as microMIPS or disassembled.
Reviewers: atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D39123
llvm-svn: 316656
PR35071 exposed the fact that MipsInstrInfo::removeBranch did not walk past
debug instructions when removing branches for the control flow optimizer, which
led to duplicated conditional branches. If the target of the branch was a
removable block, only the conditional branch in the terminating position would
have its MBB operands updated, leaving the first branch with a dangling MBB
operand. The MIPS long branch pass would then trigger an assertion when
attempting to examine the instruction with the dangling MBB operand.
This resolves PR35071.
Thanks to Alex Richardson for reporting the issue!
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39288
llvm-svn: 316654
Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0.
This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities.
Differential Revision: https://reviews.llvm.org/D38941
llvm-svn: 316647
Add the option to lookup an address in the debug information and print
out the file, function, block and line table details.
Differential revision: https://reviews.llvm.org/D38409
llvm-svn: 316619
Summary:
On FreeBSD11.0 the FileCheck NOT string "1.0" will be matched by
`.amd_amdgpu_isa "amdgcn-unknown-freebsd11.0--gfx802"` at the end of the
file. Add a CHECK for that directive to avoid failing the test.
Reviewers: rampitec, kzhuravl
Reviewed By: rampitec, kzhuravl
Subscribers: emaste, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, krytarowski
Differential Revision: https://reviews.llvm.org/D39306
llvm-svn: 316616
Max backedge taken count is always expected to be a constant; and this is
usually true by construction -- it is a SCEV expression with constant inputs.
However, if the max backedge expression ends up being computed to be a udiv with
a constant zero denominator[0], SCEV does not fold the result to a constant
since there is no constant it can fold it to (SCEV has no representation for
"infinity" or "undef").
However, in computeMaxBECountForLT we already know the denominator is positive,
and thus at least 1; and we can use this fact to avoid dividing by zero.
[0]: We can end up with a constant zero denominator if the signed range of the
stride is more precise than the unsigned range.
llvm-svn: 316615
In getOffsetRange, Max can be set to 0 to force the extender replacement
to be at or below the original value. This would cause the new offset to
be non-negative, which is preferred for memory instructions (to reduce
the likelihood of it getting constant-extended due to predication). The
problem happens when the range is shifted by an offset (present in the
instruction being examined) and the offset is negative. The entire range
for the allowable deviation will then be strictly negative. This creates
a problem, since 0 is assumed to be a valid deviation.
llvm-svn: 316601
This patch adds a new pass for attaching !callees metadata to indirect call
sites. The pass propagates values to call sites by performing an IPSCCP-like
analysis using the generic sparse propagation solver. For indirect call sites
having a small set of possible callees, the attached metadata indicates what
those callees are. The metadata can be used to facilitate optimizations like
intersecting the function attributes of the possible callees, refining the call
graph, performing indirect call promotion, etc.
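A hypothetical example of what the attached metadata might look like, assuming it lists the possible callees as function references:
declare void @impl_a()
declare void @impl_b()
define void @dispatch(void ()* %cb) {
  call void %cb(), !callees !0
  ret void
}
!0 = !{void ()* @impl_a, void ()* @impl_b}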
Differential Revision: https://reviews.llvm.org/D37355
llvm-svn: 316576
In the case when the number of output constraint operands that have matched input operands
doesn't fit in a signed char, TargetLowering::ParseConstraints() can try to access
ConstraintOperands (which is a std::vector) with a negative index.
Reviewers: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D39125
llvm-svn: 316574
Remove the G_FADD testcases from arm-legalizer.mir, they are covered by
arm-legalizer-fp.mir (I probably forgot to delete them when I created
that test).
llvm-svn: 316573
We were generating BLX for all the calls, which was incorrect in most
cases. Update ARMCallLowering to generate BL for direct calls, and BLX,
BX_CALL or BMOVPCRX_CALL for indirect calls.
llvm-svn: 316570
Separate the test cases that deal with calls from the rest of the IR
Translator tests.
We split into 2 different files, one for testing parameter and result
lowering, and one for testing the various different kinds of calls that
can occur (BL, BLX, BX_CALL etc).
llvm-svn: 316569
This patch allows the SCEVFindUnsafe algorithm to treat division by any non-positive
value as safe. Previously, it could only recognize non-zero constants.
Differential Revision: https://reviews.llvm.org/D39228
llvm-svn: 316568
Compute the actual decomposition only after deciding whether to expand
or not. Else, it's easy to make the compiler OOM with:
`memcpy(dst, src, 0xffffffffffffffff);`, which typically happens if
someone mistakenly passes a negative value. Add a test.
This reverts commit f8fc02fbd4ab33383c010d33675acf9763d0bd44.
llvm-svn: 316567
This fixes possible out of bound access in
DWARFDie::getFirstChild()
which might happen when .debug_info section is corrupted,
like shown in testcase.
Differential revision: https://reviews.llvm.org/D39185
llvm-svn: 316566
Swap the compare operands if the LHS is a shift and the RHS isn't,
since in ARM and Thumb2 the shift can be performed by the compare for its
second operand.
Differential Revision: https://reviews.llvm.org/D39004
llvm-svn: 316562
Previously, the dllimport attribute did the right thing in terms
of treating it as a pointer to a value, but this makes sure the
names get mangled properly, and calls to such functions load the
function from the __imp_ pointer.
This is based on SVN r212431 and r212430 where the same was
implemented for ARM.
Differential Revision: https://reviews.llvm.org/D38530
llvm-svn: 316555
This code added in r297930 assumed that it could create
a select with a condition type that is just an integer
bitcast of the selected type. For AMDGPU any vselect is
going to be scalarized (although the vector types are legal),
and all select conditions must be i1 (the same as getSetCCResultType).
This logic doesn't really make sense to me, but there's
never really been a consistent policy in what the select
condition mask type is supposed to be. Try to extend
the logic for skipping the transform for condition types
that aren't setccs. It doesn't seem quite right to me though,
but checking conditions that seem more sensible (like whether the
vselect is going to be expanded) doesn't work since this
seems to depend on that also.
llvm-svn: 316554
IRCE for unsigned latch conditions was temporarily disabled by rL314881. The motivating
example contained an unsigned latch condition and a signed range check. One of the safe
iteration ranges was `[1, SINT_MAX + 1]`. Its right border was incorrectly interpreted as a negative
value in the `IntersectRange` function; this led to a miscompile in which we deleted a range check
without inserting a postloop where it was needed.
This patch brings back IRCE for unsigned latch conditions. Now we treat range intersection more
carefully. If the latch condition was unsigned, we only try to consider a range check for deletion if:
1. The range check is also unsigned, or
2. Safe iteration range of the range check lies within `[0, SINT_MAX]`.
The same is done for signed latch.
Values from `[0, SINT_MAX]` are unambiguous, these values are non-negative under any interpretation,
and all values of a range intersected with such range are also non-negative.
We also use signed/unsigned min/max functions for range intersection depending on type of the
latch condition.
Differential Revision: https://reviews.llvm.org/D38581
llvm-svn: 316552
Summary:
Memory dependence analysis no longer counts DbgInfoIntrinsics towards the
limit where to abort the analysis. Before, a bunch of calls to dbg.value
could affect the generated code, meaning that with -g we could generate
different code than without.
Reviewers: chandlerc, Prazek, davide, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39181
llvm-svn: 316551
This patch replaces the naive emptiness check for SCEV ranges, which looks
like `Begin == End`, with a SCEV-based check. The range is guaranteed to be
empty if `Begin >= End`. We should filter such ranges out and not try to perform
IRCE for them.
For example, we can get such range when intersecting range `[A, B)` and `[C, D)`
where `A < B < C < D`. The resulting range is `[max(A, C), min(B, D)) = [C, B)`.
This range is empty, but its `Begin` does not match with `End`.
Making IRCE for an empty range is basically safe but unprofitable because we
never actually get into the main loop where the range checks are supposed to
be eliminated. This patch uses SCEV mechanisms to treat loops with proved
`Begin >= End` as empty.
Differential Revision: https://reviews.llvm.org/D39082
llvm-svn: 316550
With r314527, promoted values get a suffix that is a decimal value of
the module hash instead of hex. Change the regex to match only decimal
suffix values.
llvm-svn: 316544
This is in preparation for testing lld's upcoming relocation packing
feature (D39152). I have verified that this implementation correctly
unpacks the relocations from a Chromium DSO built with gold and the
Android relocation packer for ARM32 and ARM64.
Differential Revision: https://reviews.llvm.org/D39272
llvm-svn: 316543
Similar to how llvm::salvageDebugInfo hooks into InstCombine, this adds
a hook that can be invoked before an SDNode that is associated with an
SDDbgValue is erased to capture the effect of the deleted node in a
DIExpression.
The motivating example is an SDDbgValue attached to an ADD operation
that gets folded into a LOAD+OFFSET operation.
rdar://problem/32121503
llvm-svn: 316525
If a particular target supports volatile memory access operations, we can
avoid AS casting to the generic AS. Currently this is only enabled in NVPTX for
loads and stores that access the global & shared AS.
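A rough sketch of the intended end state (NVPTX global AS is 1, shared AS is 3; hypothetical globals):
@gvar = addrspace(1) global i32 0
@svar = addrspace(3) global i32 undef
define i32 @sum_volatile() {
  ; previously such volatile accesses would go through pointers addrspacecast to the generic AS
  %g = load volatile i32, i32 addrspace(1)* @gvar
  %s = load volatile i32, i32 addrspace(3)* @svar
  %r = add i32 %g, %s
  ret i32 %r
}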
Differential Revision: https://reviews.llvm.org/D39026
llvm-svn: 316495
Adding the scheduling information for the Broadwell (BDW) CPU target.
This patch adds the instruction scheduling information for the Broadwell (BDW) architecture target by adding the file X86SchedBroadwell.td located under the X86 Target.
We used the scheduling information retrieved from the Broadwell architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each BDW instruction.
The patch continues the scheduling replacement and insertion effort started with the SandyBridge (SNB) target in r310792, the Haswell (HSW) target in r311879, the SkylakeClient (SKL) target in rL313613 + rL315978 and the SkylakeServer (SKX) in rL315175.
Performance fluctuations may be expected due to code alignment effects.
Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39054
Change-Id: If6f799e5ff60e1091c8d43b05ea78c53581bae01
llvm-svn: 316492
This updates the MIRPrinter to include the regclass when printing
virtual register defs, which is already valid syntax for the
parser. That is, given 64 bit %0 and %1 in a "gpr" regbank,
%1(s64) = COPY %0(s64)
would now be written as
%1:gpr(s64) = COPY %0(s64)
While this change alone introduces a bit of redundancy with the
registers block, it allows us to update the tests to be more concise
and understandable and brings us closer to being able to remove the
registers block completely.
Note: We generally only print the class in defs, but there is one
exception. If there are uses without any defs whatsoever, we'll print
the class on all uses. I'm not completely convinced this comes up in
meaningful machine IR, but for now the MIRParser and MachineVerifier
both accept that kind of stuff, so we don't want to have a situation
where we can print something we can't parse.
llvm-svn: 316479
If we have the situation where a Swap feeds a Splat we can sometimes change the
index on the Splat and then remove the Swap instruction.
Fixed the test case that was failing and recommitted after the original
commit was pulled.
Original revision is here: https://reviews.llvm.org/D39009
llvm-svn: 316478
By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits.
llvm-svn: 316448
These tests checked for the line number without a leading ":", so for example,
a missed diagnostic on line 123 could match one on line 1123, 2123, etc,
desynchronising the test for hundreds of lines.
This couldn't cause it to incorrectly pass or fail, but made it hard to track
down test failures.
Differential revision: https://reviews.llvm.org/D39238
llvm-svn: 316442
Report a diagnostic when we fail to parse a shift in a memory operand because
the shift type is not an identifier. Without this, we were silently ignoring
the whole instruction.
Differential revision: https://reviews.llvm.org/D39237
llvm-svn: 316441
Summary:
r264440 added or/and patterns for storing -1 or 0 with the intention of decreasing code size. However,
X86CallFrameOptimization does not recognize these memory accesses so it will not replace them with push's when profitable.
This patch fixes this problem by teaching X86CallFrameOptimization these store 0/-1 idioms.
An alternative fix would be to prevent the 'store 0/-1 idioms' patterns from firing when accessing the stack. This would save
the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire, we may end up
with cases where neither X86CallFrameOptimization nor the patterns for 'store 0/-1 idioms' fire.
Fixes PR34863
Reviewers: DavidKreitzer, guyblank, aymanmus
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38738
llvm-svn: 316431
Summary:
Got asserts in llvm::CastInst::getCastOpcode saying:
`DestBits == SrcBits && "Illegal cast to vector (wrong type or size)"' failed.
Problem seemed to be that llvm::ConstantFoldCastInstruction did
not handle ptrtoint cast of a getelementptr returning a vector
correctly. I assume such situations are quite rare, since the
GEP needs to be considered as a constant value (base pointer
being null).
The solution used here is to simply avoid the constant fold
of ptrtoint when the value is a vector. It is not supported,
and by bailing out we do not fail on assertions later on.
Reviewers: craig.topper, majnemer, davide, filcab, efriedma
Reviewed By: efriedma
Subscribers: efriedma, filcab, llvm-commits
Differential Revision: https://reviews.llvm.org/D38546
llvm-svn: 316430
This change fixes the values in the test so that it passes
-verify without errors and also adds comments.
The test was introduced in D39119, and the intention was to check
that the tool is able to dump a few
DW_*GNU_call_site* tags and attributes, so this
change is an NFC cleanup.
llvm-svn: 316428
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.
Also allow kill in all shader stages.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D38544
llvm-svn: 316427
This alias caused a crash when trying to print the "cps #0" instruction in a
diagnostic for thumbv6 (which doesn't have that instruction).
The comment was incorrect; this instruction is UNPREDICTABLE if no flag bits
are set, so I don't think it's worth keeping.
Differential Revision: https://reviews.llvm.org/D39191
llvm-svn: 316420
SelectionDAG inserts a copy of ESP into a virtual register.
X86CallFrameOptimization assumed that the COPY, if present, is always
right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a
wrong assumption as the COPY can be located anywhere between the call-frame setup
instruction and its first use. If the COPY happened to be located in a different
location than what X86CallFrameOptimization assumed, visiting it while
processing the call chain would lead to a conservative bail-out.
The fix is quite straightforward: scan ahead for the stack-pointer copy and make note
of it so it can be ignored while processing the call chain.
Fixes PR34903
Differential Revision: https://reviews.llvm.org/D38730
llvm-svn: 316416
Infrastructure designed for padding code with nop instructions in key places such that a performance improvement will be achieved.
The infrastructure is implemented such that the padding is done in the assembler after the layout is done and all IPs and alignments are known.
This patch by itself is an NFC. Future patches will make use of this infrastructure to implement required policies for code padding.
Reviewers: aaboud, zvi, craig.topper, gadi.haber
Differential revision: https://reviews.llvm.org/D34393
Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e
llvm-svn: 316413
Summary:
The motivation of this change is to enable .mir testing for this pass.
Added one test case to cover the functionality; this same case will be improved by
a future patch.
Reviewers: igorb, guyblank, DavidKreitzer
Reviewed By: guyblank, DavidKreitzer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38729
llvm-svn: 316412
The `BasicBlock::getFirstInsertionPt` call may return the end iterator of the
BB. Dereferencing the end iterator results in an assertion failure
"(!NodePtr->isKnownSentinel()), function operator*". Ensure that the
returned iterator is valid before dereferencing it. If the end is
returned, move one position backward to get a valid insertion point.
llvm-svn: 316401
This adds type index discovery and dumper support for symbol record kind
0x1168, which is a list of inlined function ids. This symbol kind is
undocumented, but S_INLINEES is consistent with the existing
nomenclature.
Fixes PR34222
llvm-svn: 316398
This commit adds optimisation remarks for outlining which fire when a function
is successfully outlined.
To do this, OutlinedFunctions must now contain references to their Candidates.
Since the Candidates must still be sorted and worked on separately, this is
done by working on everything in terms of shared_ptrs to Candidates. This is
good; it means that we can easily move everything to outlining in terms of
the OutlinedFunctions rather than the individual Candidates. This is far more
intuitive than what's currently there!
(Remarks are output when a function is created for some group of Candidates.
In a later commit, all of the outlining logic should be rewritten so that we
loop over OutlinedFunctions rather than over Candidates.)
llvm-svn: 316396
This is in preparation for a verifier check that makes sure
copies are of the same size (when generic virtual registers are involved).
llvm-svn: 316388
This is in preparation for a verifier check that makes sure copies are
of the same size (when generic virtual registers are involved).
llvm-svn: 316387
This patch adds the pgo-memop-opt pass to the new pass manager.
It is in the old pass manager but was somehow left out of the new pass manager.
Differential Revision: http://reviews.llvm.org/D39145
llvm-svn: 316384
This fixes a bug where we'd crash given code like the test-case from
https://bugs.llvm.org/show_bug.cgi?id=30792 . Instead, we let the
offending clobber silently slide through.
This doesn't fully fix said bug, since the assembler will still complain
the moment it sees a crypto/fp/vector op, and we still don't diagnose
calls that require vector regs.
Differential Revision: https://reviews.llvm.org/D39030
llvm-svn: 316374
In HexagonISelLowering, there is code to handle the case when
a function returns an i1 type. In this case, we need to generate
extra nodes to copy the result from R0 to a predicate register.
The code was returning the wrong value for the chain edge which
caused an assert "Wrong topological sorting" when converting the
instructions to MIs.
This patch fixes the problem by returning the chain for the final
copy.
Patch by Brendon Cahoon.
llvm-svn: 316367
If we have the situation where a Swap feeds a Splat we can sometimes change the
index on the Splat and then remove the Swap instruction.
Differential Revision: https://reviews.llvm.org/D39009
llvm-svn: 316366
This patch enables the import of stores. Unfortunately, doing so by itself,
loses an optimization where storing 0 to memory makes use of WZR/XZR.
To mitigate this, this patch also introduces a new feature that allows register
operands to nominate a zero register. When this is done, GlobalISel will
substitute (G_CONSTANT 0) with the nominated register automatically. This
is currently configured to only apply to the stores.
Applying it to the GPR32/GPR64 register classes in general will be done after
review (see https://reviews.llvm.org/D39150).
llvm-svn: 316360
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.
llvm-svn: 316346
combineShuffleOfScalars is very conservative about shuffled BUILD_VECTORs that can be combined together.
This patch adds one additional case - if both BUILD_VECTORs represent splats of the same scalar value but with different UNDEF elements, then we should create a single splat BUILD_VECTOR, sharing only the UNDEF elements defined by the shuffle mask.
Differential Revision: https://reviews.llvm.org/D38696
llvm-svn: 316331
Before, loop unrolling was only enabled for loops with a single
block. This restriction has been removed and replaced by:
- allow a maximum of two exiting blocks,
- a four basic block limit for cores with a branch predictor.
Differential Revision: https://reviews.llvm.org/D38952
llvm-svn: 316313
Summary:
It's unclear if this is the only thing we can do but at least this is consistent with the check
of address space agreement in `isBitCastable`.
The code is used at least in both instcombine and jumpthreading though
I could only find a way to trigger the invalid cast in instcombine.
Reviewers: loladiro, sanjoy, majnemer
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34335
llvm-svn: 316302
As discussed in D39011:
https://reviews.llvm.org/D39011
...replacing constants with a variable is inverting the transform done
by other IR passes, so we definitely don't want to do this early.
In fact, it's questionable whether this transform belongs in SimplifyCFG
at all. I'll look at moving this to codegen as a follow-up step.
llvm-svn: 316298
This fixes Bugzilla 26810 (https://bugs.llvm.org/show_bug.cgi?id=26810).
This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4 - byte Reload
Such sequences are created in 2 scenarios:
Scenario #1:
- vreg0 is evicted from physreg0 by vreg1.
- Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from).
- Region splitting creates a local interval because of interference with the evictor vreg1 (normally region splitting creates 2 intervals, the "by reg" and "by stack" intervals; a local interval is created when interference occurs).
- One of the split intervals ends up evicting vreg2 from physreg1.
- Evictee vreg2 is intended for region splitting with split candidate physreg1.
- One of the split intervals ends up evicting vreg3 from physreg2, and so on until someone spills.
Scenario #2:
- vreg0 is evicted from physreg0 by vreg1.
- vreg2 is evicted from physreg2 by vreg3, and so on.
- Evictee vreg0 is intended for region splitting with split candidate physreg1.
- Region splitting creates a local interval because of interference with the evictor vreg1.
- One of the split intervals ends up evicting back the original evictor vreg1 from physreg0 (the reg vreg0 was evicted from).
- Another evictee vreg2 is intended for region splitting with split candidate physreg1.
- One of the split intervals ends up evicting vreg3 from physreg2, and so on until someone spills.
As compile time was a concern, I've added a flag to control whether we do cost calculations for local intervals we expect to be created (it's on by default for the X86 target, off for the rest).
Differential Revision: https://reviews.llvm.org/D35816
Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
llvm-svn: 316295
The missed canonicalization/optimization in the motivating test from PR34471 leads to very different codegen:
int switcher(int x) {
switch(x) {
case 17: return 17;
case 19: return 19;
case 42: return 42;
default: break;
}
return 0;
}
int comparator(int x) {
if (x == 17) return 17;
if (x == 19) return 19;
if (x == 42) return 42;
return 0;
}
For the first example, we use a bit-test optimization to avoid a series of compare-and-branch:
https://godbolt.org/g/BivDsw
Differential Revision: https://reviews.llvm.org/D39011
llvm-svn: 316293
This patch implements dynamic stack (re-)alignment for 16-bit Thumb. When
targeting processors that support only the 16-bit Thumb instruction set,
the compiler used to ignore the alignment attributes of automatic variables
and could silently generate incorrect code.
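For example, an over-aligned automatic variable such as the following (hypothetical) one now triggers dynamic realignment of the stack instead of being silently under-aligned:
declare void @consume(i8*)
define void @over_aligned_local() {
  %buf = alloca [16 x i8], align 32
  %p = getelementptr inbounds [16 x i8], [16 x i8]* %buf, i32 0, i32 0
  call void @consume(i8* %p)
  ret void
}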
Differential revision: https://reviews.llvm.org/D38143
llvm-svn: 316289
The pass scans the function to find instruction chains that define
registers in the same domain (closures).
It then calculates the cost of converting the closure to another domain.
If found profitable, the instructions are converted to instructions in
the other domain and the register classes are changed accordingly.
This commit adds the pass infrastructure and a simple conversion from
the GPR domain to the Mask domain.
Differential Revision: https://reviews.llvm.org/D37251
Change-Id: Ic2cf1d76598110401168326d411128ae2580a604
llvm-svn: 316288
This introduces a new operand type to encode whether the index register should be XMM/YMM/ZMM, and new code to fix up the results created by readSIB.
This has the nice effect of removing a bunch of code that hard coded the name of every GATHER and SCATTER instruction to map the index type.
This fixes PR32807.
llvm-svn: 316273
The way that splitInnerLoopHeader splits blocks requires that
the induction PHI will be the first PHI in the inner loop
header. This makes sure that is actually the case when there
are both IV and reduction phis.
Differential Revision: https://reviews.llvm.org/D38682
llvm-svn: 316261
Summary: __multi3 is not available on x86 (32-bit). Setting lib call name for MULI_128 to nullptr forces DAGTypeLegalizer::ExpandIntRes_MUL to generate instructions for 128-bit multiply instead of a call to an undefined function. This fixes PR20871 though it may be worth looking at why licm and indvars combine to generate 65-bit multiplies in that test.
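For illustration, a (hypothetical) 128-bit multiply that previously became a call to the unavailable __multi3 on i686 and is now expanded to instructions:
define i128 @mul_i128(i128 %a, i128 %b) {
  %r = mul i128 %a, %b
  ret i128 %r
}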
Patch by Riyaz V Puthiyapurayil
Reviewers: craig.topper, schweitz
Reviewed By: craig.topper, schweitz
Subscribers: RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D38668
llvm-svn: 316254
This was suggested in PR35003:
https://bugs.llvm.org/show_bug.cgi?id=35003
32-bit checks may be identical to 64-bit (if we avoid those pesky scalar params!).
I'll check in the script change shortly assuming this doesn't anger any bots.
llvm-svn: 316223
Summary:
This change comes from using lld for i686-windows-msvc. Before this change, lld
emits an error of:
error: relocation against symbol in discarded section: .xdata
It's possible that this could be addressed in lld, but I think this change is
reasonable on its own.
At a high level, this is being generated:
A (.text comdat) -> B (.text) -> C (.xdata comdat)
Where A is a C++ inline function, which references B, an exception handler
thunk, which references C, the exception handling info.
With this structure, lld will error when applying relocations to B if the C it
references has been discarded (some other C has been selected).
This change checks if A is comdat, and if so places the exception registration
thunk (B) in the comdat group of A (and B).
It appears that MSVC makes the __ehhandler function comdat.
Is it possible that duplicate thunks are being emitted into the final binary
with other linkers, or are they stripping the unused thunks?
Reviewers: rnk, majnemer, compnerd, smeenai
Reviewed By: rnk, compnerd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38940
llvm-svn: 316219
Normally, if the registers holding the induction variable's bounds
are redefined inside of the loop's body, the loop cannot be converted
to a hardware loop. However, if the redefining instruction is actually
loading an immediate value into the register, this conversion is both
possible and legal (since the immediate itself will be used in the
loop setup in the preheader).
llvm-svn: 316218
(Recommit #2 after checking for the timeout issue.)
The original patch was an improvement to IR ValueTracking on
non-negative integers. It was checked in to trunk (D18777,
r284022) but was disabled by default due to performance regressions.
The performance impact has since improved, so the patch is now enabled by default.
Reviewers: reames, hfinkel
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
llvm-svn: 316208
This test checks that load from constant memory will be sunk regardless of
aliasing stores in the loop.
Patch by Daniil Suchkov!
Differential Revision: https://reviews.llvm.org/D39113
llvm-svn: 316207
The commit at https://reviews.llvm.org/rL315888 is causing some failures
with internal testing. Disabling this code until we can resolve the issues.
llvm-svn: 316199
This adds the minimum necessary to support codegen for simple ALU operations
on RV32. Prolog and epilog insertion, support for memory operations etc etc
follow in future patches.
Leave guessInstructionProperties=1 until https://reviews.llvm.org/D37065 is
reviewed and lands.
Differential Revision: https://reviews.llvm.org/D29933
llvm-svn: 316188
x86 has its own copy of integer absolute pattern matching to combine directly to a SUB+CMOV.
This patch removes the x86 combine and adds custom lowering support for ISD::ABS instead, allowing us to use the DAGCombiner version.
Additional test cases are already covered by iabs.ll (rL315706 and rL315711).
Differential Revision: https://reviews.llvm.org/D38895
llvm-svn: 316162
This runs `update_mir_test_checks --add-vreg-checks` on the tests that
are already more or less in the format it generates, so that there
will be less churn in some upcoming changes.
llvm-svn: 316139
This converts a large and somewhat arbitrary set of tests to use
update_mir_test_checks. I ran the script on all of the tests I expect
to need to modify for an upcoming mir syntax change and kept the ones
that obviously didn't change the tests in ways that might make it
harder to understand.
llvm-svn: 316137
llvm-cov tends to highlight too many regions because its policy is to
highlight all region entry segments. This can look confusing to users:
not all region entry segments are interesting and deserve highlighting.
Emitting these highlights only when the region count differs from the
line count is a more user-friendly policy.
llvm-svn: 316109
llvm-cov typically doesn't highlight gap segments, but it should if the
gap occurs after an uncovered region in order to preserve continuity.
llvm-svn: 316107
This patch lets the llvm tools handle the new HVX target features that
are added by the frontend (clang). The target features are of the form
"hvx-length64b" for 64-byte HVX mode and "hvx-length128b" for 128-byte HVX mode.
"hvx-double" is an alias for "hvx-length128b" and will soon be deprecated.
The HVX version target feature is upgraded from "+hvx" to "+hvxv{version_number}",
e.g. "+hvxv62".
For the correct HVX code generation, the user must use the following
target features.
For 64B mode: "+hvxv62" "+hvx-length64b"
For 128B mode: "+hvxv62" "+hvx-length128b"
Clang picks a default length if none is specified. If for some reason,
no hvx-length is specified to llvm, the compilation will bail out.
There is a corresponding clang patch.
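A minimal sketch of how this might appear on a function (feature strings taken from the description above; the CPU attribute is assumed):
define void @hvx_kernel() #0 {
  ret void
}
attributes #0 = { "target-cpu"="hexagonv62" "target-features"="+hvxv62,+hvx-length128b" }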
Differential Revision: https://reviews.llvm.org/D38851
llvm-svn: 316101
r315275 set the IsLittleEndian parameter incorrectly. This patch corrects
this, and adds a test to ensure such mistakes will be caught in the future.
llvm-svn: 316091
Fix a couple of tests that were extending the wrong vreg, and
regenerate their checks with update_mir_test_checks. This looks like
it was a copy-paste or test update error.
llvm-svn: 316087
In the case where there was a conditional branch followed by an unconditional
branch with a debug instruction separating them, MipsInstrInfo::analyzeBranch
would not skip past the debug instruction when searching for the second branch,
which gave erroneous results about the control flow of the block.
This could lead the branch folder to merge the non-fall-through case
into its predecessor, leaving the conditional branch with a dangling
basic block operand.
This resolves PR34975.
Thanks to Alexander Richardson for reporting the issue!
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39003
llvm-svn: 316084
Bug fix for r316067 (https://bugs.llvm.org/show_bug.cgi?id=34978).
This test checks that the x86 interleaved-access lowering ends without any
assertion.
Change-Id: I1e970482a4d0404516cbc85517fc091bb21c35a8
llvm-svn: 316080
This patch adds accurate instruction costs.
The formula covers two cases (stride 3 and stride 4) and calculates the cost according to the VF and stride.
Reviewers: delena, Farhana, zvi, dorit, Ayal
Differential Revision: https://reviews.llvm.org/D38762
Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7
llvm-svn: 316072
Helper functions to identify sign- and zero-extending machine instructions were introduced in rL315888.
This patch makes PPCInstrInfo::optimizeCompareInstr use the helper functions. It simplifies the code and also makes possible more optimizations since the helper can do more analysis than the original check code; I observed about 5000 more compare instructions are eliminated while building LLVM.
Also, this patch fixes a bug in the helpers' handling of the ANDIo instruction due to the order of checks. This bug causes a failure in an existing test case for optimizeCompareInstr.
Differential Revision: https://reviews.llvm.org/D38988
llvm-svn: 316071
Summary:
When we have the following case:
%cond = cmp iN %x, CmpConst
%tr = trunc iN %x to iK
%narrowsel = select i1 %cond, iK %tr, iK C
We can possibly match only a min/max pattern after looking through the cast,
so it is more profitable if the widened C constant is equal to CmpConst.
That is why we just set the widened C constant equal to CmpConst; there
is a further check in this function that trunc(CmpConst) == C.
Also, a description of the lookThroughCast function was added.
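A concrete (hypothetical) instance where the widened constant is chosen to equal CmpConst:
define i8 @narrow_smin(i32 %x) {
  %cond = icmp slt i32 %x, 100                   ; CmpConst = 100
  %tr = trunc i32 %x to i8
  %narrowsel = select i1 %cond, i8 %tr, i8 100   ; C = 100 = trunc(CmpConst), an smin pattern
  ret i8 %narrowsel
}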
Reviewers: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38536
Patch by: Artur Gainullin <artur.gainullin@intel.com>
llvm-svn: 316070
Summary:
If a compare instruction is the same as, or the inverse of, the compare in the
branch of the loop latch, then return a constant evolution node.
Currently the scope of evaluation is limited to SCEV computation for
PHI nodes.
This will facilitate computation of loop exit counts in cases
where the compare appears in the evolution chain of induction variables.
This will fix PR34538.
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 316054
When more than one Module is imported into the same context, such as during
an LTO build before linking the modules, ODR type uniquing may cause types
to point to a different CU. This check does not make sense in this case.
This fixes the error reported in PR34944.
https://bugs.llvm.org/show_bug.cgi?id=34944
rdar://problem/34940685
This reapplies a cleaner implementation of r316049.
llvm-svn: 316052
When more than one Module is imported into the same context, such as during
an LTO build before linking the modules, ODR type uniquing may cause types
to point to a different CU. This check does not make sense in this case.
This fixes the error reported in PR34944.
https://bugs.llvm.org/show_bug.cgi?id=34944
rdar://problem/34940685
llvm-svn: 316049
If the address of a local is used in a comparison, AArch64 can fold the
address-calculation into the comparison via "adds". Unfortunately, a couple of
places (both hit in this one test) are not ready to deal with that yet and just
assume the first source operand is a register.
llvm-svn: 316035
NFC.
Added the Broadwell CPU and the BROADWELL prefix to all the scheduling regression tests, as part of preparation for a larger commit adding all the Broadwell scheduling.
Reviewers: RKSimon, zvi, aaboud
Differential Revision: https://reviews.llvm.org/D38994
Change-Id: I54bc9065168844c107b1729fcdc1d311ce3ea0a9
llvm-svn: 315998
Summary:
ValueTracking was not recognizing all variations of clamp. Swapping of the
true value and false value of the select was added to fix this problem. This
change breaks the canonical form of the cmp inside the matchMinMax function,
which is why additional checks for compare predicates are needed. Added
corresponding test cases.
Reviewers: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38531
Patch by: Artur Gainullin <artur.gainullin@intel.com>
llvm-svn: 315992
Summary:
It seems that a negative offset was accidentally allowed in D17967.
AFAICT a small negative offset should be valid (it always raises a segfault) on all archs that I'm aware of (especially x86, which is the only one with this optimization enabled), and such a case can be useful when loading hidden metadata from an object.
However, like the positive side, it should only be done within a certain limit.
For now, use the same limit as on the positive side for the negative side.
A separate option can be added if the need arises.
Reviewers: mcrosier, skatkov
Reviewed By: skatkov
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38925
llvm-svn: 315991
Updated the scheduling information for the SkylakeClient target with the following changes:
1. regrouped the instructions after adding load and store latencies.
2. regrouped the instructions after adding the identified missing ports in several groups.
The changes were made after revisiting the latency impact of all the load and store uOps.
Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D38727
Change-Id: I778a308cc11e490e8fa5e27e2047412a1dca029f
llvm-svn: 315978