If we don't have cmov, X87 compares write to FPSW, and we need to
move the bits to EFLAGS so they can be used as JCC/SETCC/CMOV conditions.
Previously this was done by calling ConvertCmpIfNecessary in
multiple places, which would emit the extra FNSTSW, shift, truncate,
and SAHF instructions. ISel would then
select the trunc+X86ISD::CMP to a FUCOM instruction that produces FPSW.
This patch centralizes all of the handling into a single custom
isel handler. This allows us to remove ConvertCmpIfNecessary and
a couple of target-specific ISD opcodes.
Differential Revision: https://reviews.llvm.org/D73863
Summary:
This enables it for large working set size cases only.
This does not enable it under sample PGO.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74073
Only the 32- and 64-bit forms of SBB are dependency-breaking instructions
on some CPUs; the 8- and 16-bit forms have to preserve the upper bits of
the GPR. This patch removes the smaller forms and selects the wider form
instead. I had to do this with custom code, as the tblgen-generated
code glued the EFLAGS CopyToReg to the extract_subreg instead of
to the SETB pseudo.
Longer term, I think we can remove X86ISD::SETCC_CARRY and use
(X86ISD::SBB zero, zero). We'll want to keep the pseudo and select
(X86ISD::SBB zero, zero) to either MOV32r0+SBB for targets where
there is no dependency break, or SETB_C32/SETB_C64 for targets
that have a dependency break. We may want some way to avoid the MOV32r0
if the instruction that produced the carry flag happened to def a
register that we can use for the dependency.
I think the flag copy lowering should be using NEG instead of SUB to
handle SETB; that would avoid the MOV32r0 there. Or maybe it should
use an ADC with -1 to recreate the carry flag and keep the SETB?
That would avoid a MOVZX on the input of the SUB.
Differential Revision: https://reviews.llvm.org/D74024
When multiple instructions are moved into a waterfall loop, it's
possible some of them reuse the same operands. Avoid creating
multiple sequences of readfirstlanes for them. None of the current
uses will hit this, but it will be used in a future patch.
This patch implements the caller side of placing function call arguments
in stack memory. This removes the current limitation where LLVM on AIX
reports a fatal error when arguments can't be contained in registers.
There is a particular oddity: a float argument that is passed both in a
register and in stack memory requires that the caller initialize
both. From what AIX "ABI" documentation I have, it's not clear that this
needs to be done; however, it is necessary for compatibility with the
AIX XL compiler, so I think it's best to implement it the same way.
Note a later patch will follow to address the callee side.
Differential Revision: https://reviews.llvm.org/D73209
This reverts commit ed29dbaafa.
I'm backing out D68945 which, as the discussion on D73526 shows, doesn't
seem to handle the -O0 path through the codegen backend correctly. I'll
reland the patch once a fix is worked out; apologies for all the churn.
The two parent commits are part of this revert too.
Conflicts:
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
llvm/test/DebugInfo/X86/dbg-addr-dse.ll
SelectionDAGBuilder conflict is due to a nearby change in e39e2b4a79
that's technically unrelated. dbg-addr-dse.ll conflicted because
41206b61e3 (legitimately) changes the order of two lines.
There are further modifications to dbg-value-func-arg.ll: it landed after
the patch being reverted, and I've converted indirection to be represented
by the isIndirect field rather than DW_OP_deref.
This reverts commit 3137fe4d23.
I'm backing out D68945, which this patch is a follow-up to. It'll be
re-landed when D68945 is fixed.
The changes to dbg-value-func-arg.ll occur because our handling of certain
kinds of locations now mixes up indirection that happens at different points
in a DIExpression. While this is a regression, it's a return to the prior
behaviour while a better patch is sought.
This reverts commit 2d3174c4df.
The overall solution for this problem is reverting D68945, which wasn't
handling the -O0 path through the codegen backend correctly. See:
discussion in D73526.
This test case was XFAIL'ed because the peepholer was missing an optimisation.
The peepholer is now able to handle this case, so enable the test. I will
close the corresponding (and very old) PR11364.
To find the instruction in the block for a given ID, first a count and then a
lookup were performed on the map. These do almost the same work, so the map
was effectively searched twice.
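A minimal standalone sketch of the pattern, with illustrative names and types
(the patch's actual map and key types may differ): a single find() replaces
the count()-then-lookup pair.

```
#include <iostream>
#include <map>
#include <string>

// Before: M.count(ID) followed by a second M.find(ID)/M[ID] traverses the
// map twice. After: one find() serves as both the membership test and the
// lookup.
void printInstrForID(const std::map<int, std::string> &BlockInstrs, int ID) {
  auto It = BlockInstrs.find(ID); // single traversal
  if (It != BlockInstrs.end())    // doubles as the count() check
    std::cout << It->second << '\n';
}
```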
Differential Revision: https://reviews.llvm.org/D73866
This adds some LLD-specific scopes and picks up optimisation scopes
via LTO/ThinLTO. It makes use of the TimeProfiler multi-thread support added
in 77e6bb3c.
Differential Revision: https://reviews.llvm.org/D71060
It broke e.g. all tests under tools/llvm-exegesis/X86/ when libpfm is
not available; see the comment on D74085.
This reverts commits b3576f60eb and
141915963b.
Followup to D74085.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.
Differential Revision: https://reviews.llvm.org/D74113
Fix inconsistencies in error reporting created by mixing
`report_fatal_error()` and `ExitOnErr()`, and add additional information
to the error messages to make them more user friendly. Minimize the use of
`report_fatal_error()`, because it's meant for very rare cases and
results in error messages with low information density.
Summary of the new design:
* For command line argument errors, output `llvm-exegesis: <error_message>`,
which is consistent with the error output format emitted by the backend
that checks the correctness of the command line arguments.
* For other errors, the format `llvm-exegesis error: <error_message>` is used.
** If the error occurred during file access, `<error_message>` will consist
of two parts: `'<file_name>': <rest_of_the_error_message>`; one way to
produce that shape is shown in the sketch below.
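A hedged sketch of producing the two-part file-error format; I'm assuming
something equivalent to llvm::createFileError is used, and openSnippetFile
plus the error code below are purely illustrative:

```
#include "llvm/ADT/Twine.h"
#include "llvm/Support/Error.h"
#include <system_error>
using namespace llvm;

// createFileError prefixes an existing Error with a file name, rendering as
// "'<file_name>': <rest_of_the_error_message>".
Error openSnippetFile(const Twine &Path) {
  std::error_code EC =
      std::make_error_code(std::errc::no_such_file_or_directory);
  return createFileError(Path, errorCodeToError(EC));
}
```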
Differential Revision: https://reviews.llvm.org/D74085
As detailed on PR43943, we're seeing static analyzer use-after-move warnings in the iplist_impl move constructor/operator, as they apply std::move to the same object for both the TraitsT and IntrusiveListT base classes.
As suggested by @dexonsmith, this patch casts the moved value to each base class to silence the warnings.
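A reduced illustration of the pattern (not the actual iplist_impl code):

```
#include <utility>

struct TraitsT {};
struct IntrusiveListT {};

// Moving from X twice looks like a use-after-move to the static analyzer.
// Casting X to each base class first makes it explicit that two distinct
// subobjects are being moved.
struct IPListLike : TraitsT, IntrusiveListT {
  IPListLike() = default;
  IPListLike(IPListLike &&X)
      : TraitsT(std::move(static_cast<TraitsT &>(X))),
        IntrusiveListT(std::move(static_cast<IntrusiveListT &>(X))) {}
};
```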
Differential Revision: https://reviews.llvm.org/D74062
The IRCE pass checks that it can calculate the loop bounds by checking
SCEV availability at loop entry. However, it is possible that the loop
bound SCEV is loop-invariant while the instruction used to compute it
resides within the loop. In such a case, adjusting the loop bound in the
preheader using IRBuilder leads to malformed SSA.
Use SCEVExpander instead to generate proper instructions.
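A minimal sketch of the approach, assuming the header location of this era
and an illustrative function name:

```
#include "llvm/Analysis/ScalarEvolutionExpander.h"
using namespace llvm;

// Materialize a loop-invariant SCEV bound at the preheader terminator.
// SCEVExpander rebuilds the computation from the SCEV itself, so the new
// code never references an instruction that lives inside the loop.
static Value *expandBound(ScalarEvolution &SE, const DataLayout &DL,
                          const SCEV *BoundSCEV, Type *Ty,
                          BasicBlock *Preheader) {
  SCEVExpander Expander(SE, DL, "irce");
  return Expander.expandCodeFor(BoundSCEV, Ty, Preheader->getTerminator());
}
```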
Reviewed-by: mkazantsev
Differential Revision: https://reviews.llvm.org/D73496
Summary:
The ARM Type Promotion pass does not clear the container that records
whether a value has been visited, missing optimization opportunities purely
by luck when two llvm::Values from different functions are allocated at the
same memory address.
Also fixes a comment and uses the existing method to pop and obtain the
last element of the worklist.
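A reduced sketch of the hazard and the fix (TypePromotionLike and the
void-pointer keys are illustrative, not the pass's real types):

```
#include <set>

struct TypePromotionLike {
  // Keyed on addresses: if this survives across functions, a new Value that
  // the allocator placed at a previously freed address looks "visited".
  std::set<const void *> Visited;

  void runOnFunction(/* Function &F */) {
    Visited.clear(); // the fix: reset cross-function state on every run
    // ... visit values; Visited.insert(V).second reports whether V is new ...
  }
};
```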
Reviewers: samparker
Reviewed By: samparker
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73970
* Hide unrelated options.
* Add "OVERVIEW: " to yaml2obj -h/--help.
* Place options under a yaml2obj category.
* Disallow -docnum. Currently -docnum is the only yaml2obj-specific long option affected.
* Specify `cl::init("-")` and `cl::Prefix` for OutputFilename. The
latter allows `-ofile`; see the sketch after this list.
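A sketch of what the described declaration could look like (the names and
the category string are illustrative):

```
#include "llvm/Support/CommandLine.h"
using namespace llvm;

static cl::OptionCategory Cat("yaml2obj Options");

// cl::init("-") defaults the output to stdout; cl::Prefix lets the value be
// glued directly to the flag, so `-ofile` behaves like `-o file`.
static cl::opt<std::string>
    OutputFilename("o", cl::desc("Output filename"),
                   cl::value_desc("filename"), cl::init("-"), cl::Prefix,
                   cl::cat(Cat));
```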
Reviewed By: grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D73982
Do not iterate on scalable vector types in BitCastConstantVector.
This continues the work of D70985 and D71147.
Support for folding a bitcast into a splat value is kept in D74095, as
it depends on D71637.
Differential Revision: https://reviews.llvm.org/D71389
When we have a G_ICMP which checks SLT, and the comparison is against 0, we
can emit a TBNZ instead of a CBZ.
This lets us fold things into the branch, which can provide some code size
savings.
This is similar to the case in `AArch64TargetLowering::LowerBR_CC`.
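The rewrite leans on a simple equivalence: for a signed integer, x < 0 holds
exactly when the sign bit is set, so the SLT-against-0 branch can test one
bit. A small self-contained check of that equivalence (illustrative code,
not the selector itself):

```
#include <cassert>
#include <cstdint>

// For i64, G_ICMP SLT x, 0 is equivalent to "bit 63 of x is set", which is
// what TBNZ x, #63 branches on.
static bool signBitSet(int64_t X) {
  return (static_cast<uint64_t>(X) >> 63) & 1;
}

int main() {
  for (int64_t X : {INT64_MIN, -5LL, 0LL, 7LL, INT64_MAX})
    assert(signBitSet(X) == (X < 0));
}
```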
https://reviews.llvm.org/D74090
Factor it out into `emitTestBit` and add some asserts to the new function.
This will be useful for implementing TB(N)Z emission for SLT/SGT compares.
Differential Revision: https://reviews.llvm.org/D74080
Add support for walking through G_LSHR in `getTestBitReg`. Equivalent to the
code in `getTestBitOperand` in AArch64ISelLowering.
```
(tbz (lshr x, c), b) -> (tbz x, b+c) when b + c is < # bits in x
```
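A quick self-contained sanity check of the fold above (illustrative code,
not the selector itself): bit b of (x >> c) is bit b+c of x whenever b + c
is in range.

```
#include <cassert>
#include <cstdint>

static bool bitIsSet(uint64_t V, unsigned B) { return (V >> B) & 1; }

int main() {
  uint64_t X = 0xF00D;
  unsigned C = 3, B = 5; // B + C < 64, the side condition above
  assert(bitIsSet(X >> C, B) == bitIsSet(X, B + C));
}
```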
Differential Revision: https://reviews.llvm.org/D74077
This is a compile-time optimization for PHIElimination (splitting of critical
edges), which was reported at https://bugs.llvm.org/show_bug.cgi?id=44249. As
discussed there, the way to remedy the slowdowns with huge functions is to
pre-compute the live-in registers for each MBB in an efficient way in
PHIElimination.cpp and then pass that information along to
LiveVariables::addNewBlock().
In all the huge test programs where this slowdown has been noticeable, it has
disappeared entirely with this patch.
Review: Björn Pettersson, Quentin Colombet.
Differential Revision: https://reviews.llvm.org/D73152
The legalizer produces a lot of these, and they make reading legalized
MIR annoying. For some reason, this does seem to sometimes introduce
copies of implicit defs, which is dumb.
The "{=v0}" constraint did not result in the expected error message in the
abscence of the vector facility, because 'v0' matches as a string into the
AnyRegBitRegClass in common code.
This patch adds checks for vector support in case of "{v" and soft-float in
case of "{f" to remedy this.
Review: Ulrich Weigand.
The load ports need a cycle for each potentially loaded element, just like Haswell and Skylake. Unlike Haswell and Broadwell, the number of uops does not scale with the number of elements; instead, the load uops run for multiple cycles.
I've taken the latency number from uops.info. The port binding for the non-load uops is taken from the original IACA data I have.
Differential Revision: https://reviews.llvm.org/D74000
Summary: This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44388, where an ABI alignment was incorrectly assigned to memset when no explicit alignment was given.
Reviewers: gchatelet, lenary, nikic
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74083
The problem was noticed by the Chrome OS toolchain folks
(crbug.com/1048445) because llvm-objcopy --add-gnu-debuglink would
insert the wrong checksum when processing a binary larger than 4 GB.
That use case regressed in 1e1e3ba252 when we started using
llvm::crc32() in more places.
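A sketch of the underlying pitfall, assuming the zlib-backed path (crc32Large
is an illustrative name): zlib's crc32() takes its length as a 32-bit uInt,
so a size_t larger than 4 GB silently truncates unless the buffer is fed in
bounded chunks.

```
#include <zlib.h>
#include <cstddef>
#include <cstdint>

uint32_t crc32Large(const unsigned char *Data, size_t Size) {
  uLong CRC = ::crc32(0L, Z_NULL, 0); // canonical initial value
  while (Size > 0) {
    // Keep each call's length well inside uInt's range.
    uInt Chunk = Size > (1u << 30) ? (1u << 30) : static_cast<uInt>(Size);
    CRC = ::crc32(CRC, Data, Chunk);
    Data += Chunk;
    Size -= Chunk;
  }
  return static_cast<uint32_t>(CRC);
}
```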
Differential revision: https://reviews.llvm.org/D74039
Really, the intrinsic definition is wrong, but work around this
here. The DAG lowering introduces an MMO, so we have to introduce a new
operation to avoid the verifier complaining about the missing mayLoad.
I was debug-stepping through an x86 shuffle lowering and
noticed we were doing an N^2 search for the splat index. I
didn't find the equivalent functionality anywhere else in
LLVM, so here's a helper that takes an array of int and
returns the splatted index while ignoring undefs (any
negative value), as sketched below.
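A standalone sketch of the helper (the in-tree version presumably takes
ArrayRef<int>; the name here is illustrative):

```
#include <vector>

// One linear scan: return the common index if every defined (non-negative)
// mask element agrees, otherwise -1. Negative elements encode undef and are
// skipped, so no N^2 search is needed.
int getSplatIndexSketch(const std::vector<int> &Mask) {
  int Splat = -1;
  for (int Elt : Mask) {
    if (Elt < 0)
      continue; // undef, ignore
    if (Splat >= 0 && Splat != Elt)
      return -1; // two different defined indices: not a splat
    Splat = Elt;
  }
  return Splat;
}
```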
This might also be used inside existing
ShuffleVectorInst/ShuffleVectorSDNode functions and/or
help with D72467.
Differential Revision: https://reviews.llvm.org/D74064