Add lowering for G_FPTOUI. The algorithm is similar to the SDAG version
in TargetLowering::expandFP_TO_UINT.
Lower G_FPTOUI for MIPS32.
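For reference, a scalar sketch of that expansion (a hypothetical standalone helper; the real code builds the equivalent G_/ISD nodes):
```
#include <cstdint>

// fptoui via fptosi: values below 2^31 convert directly; larger values are
// shifted into signed range first and the sign bit is restored afterwards.
uint32_t fptoui32(float X) {
  const float Cst = 2147483648.0f; // 2^31 as a float
  if (X < Cst)
    return (uint32_t)(int32_t)X;                     // plain fptosi suffices
  return (uint32_t)(int32_t)(X - Cst) ^ 0x80000000u; // fptosi, then set MSB
}
```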
Differential Revision: https://reviews.llvm.org/D66929
llvm-svn: 370431
When the number of return values exceeds the number of registers available,
SelectionDAGBuilder::visitRet transforms a function's return to use a
pointer to a buffer to hold return values. When the returned value is an
operator such as extractvalue, the value may have a non-zero result number.
Add that number to the indexing when obtaining the values to store.
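A rough sketch of the fix (names approximate; the actual change lives in SelectionDAGBuilder::visitRet):
```
// getValue(RetOp) may refer to result number Val.getResNo() of its node
// (non-zero for e.g. extractvalue), so each stored element must be offset
// by that base result number rather than starting at result 0.
SDValue Val = getValue(RetOp);
unsigned Base = Val.getResNo();
for (unsigned i = 0; i != NumValues; ++i) {
  SDValue Elt = SDValue(Val.getNode(), Base + i);
  // ... store Elt through the return-value buffer pointer ...
}
```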
This fixes https://bugs.llvm.org/show_bug.cgi?id=43132.
Differential Revision: https://reviews.llvm.org/D66978
llvm-svn: 370430
Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs,
ppc32 just uses LDgotTprelL32, so it does not make much sense to use
_LO without a paired _HA.
Emit R_PPC_GOT_TPREL16 instead of R_PPC_GOT_TPREL16_LO to match GCC, and
get better linker relocation checks. Note that R_PPC_GOT_TPREL16_{HA,LO}
don't have good linker support:
(a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}.
(b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation:
// a.o
addis 3, 3, tsd_tls@got@tprel@ha
lwz 3, tsd_tls@got@tprel@l(3)
add 3, 3, tsd_tls@tls
// b.o
.section .tdata,"awT"; .globl tsd_tls; tsd_tls:
// ld/ld-new a.o b.o
internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section
Reviewed By: adalava
Differential Revision: https://reviews.llvm.org/D66925
llvm-svn: 370426
Add a default with an llvm_unreachable for anything we don't expect.
This seems safer than just blindly returning true for anything
missing from the switch.
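Illustrative shape of the change (opcode names hypothetical):
```
switch (Opcode) {
case SomeKnownOpcode:   // the real switch lists every expected opcode
case OtherKnownOpcode:
  return true;
default:
  // Fail loudly on anything unexpected instead of silently returning true.
  llvm_unreachable("unexpected opcode");
}
```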
llvm-svn: 370424
Add an WASM_SYMBOL_NO_STRIP flag, so that __attribute__((used)) doesn't
need to imply exporting. When targeting Emscripten, have
WASM_SYMBOL_NO_STRIP imply exporting.
Differential Revision: https://reviews.llvm.org/D62542
llvm-svn: 370415
Summary:
Reported in https://github.com/opencv/opencv/issues/15413.
We have several extended mnemonics for the Move To/From Vector-Scalar Register instructions,
e.g. mffprd, mtfprd, etc.
We only support one of them; this patch adds the others.
Reviewers: nemanjai, steven.zhang, hfinkel, #powerpc
Reviewed By: hfinkel
Subscribers: wuzish, qcolombet, hiraditya, kbarton, MaskRay, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66963
llvm-svn: 370411
This teaches GISel to select patterns which fold an extend plus optional shift
into the addressing mode. In particular, adds and subs.
Factor out the arith extended register ComplexPatterns in AArch64InstrFormats.td
and create GISel equivalents.
Add some equivalent functions to the ones in AArch64ISelDAGToDAG:
- `selectArithExtendedRegister`
- `narrowExtendRegIfNeeded`
- `getExtendTypeForInst`
`getExtendTypeForInst` includes the checks for loads and stores. This will be
used for WRO addressing modes in loads + stores.
Teach selectCopy to properly handle subregister copies on the same bank in
order to support `narrowExtendRegIfNeeded`. The extended register must be a
GPR32, so we need to support same-bank subregister copies.
Fix a bug in getSubRegForClass which would cause registers on things like
GPR32common to end up getting ssub. Just change the check to look for FPR32
rather than GPR32.
For tests:
- Add select-arith-extended-reg.mir
- Update addsub_ext.ll to include GlobalISel checks
Differential Revision: https://reviews.llvm.org/D66835
llvm-svn: 370410
Summary:
This is a minor improvement on our past attempts to do this. Fixes
PR43155.
Reviewers: hans
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66905
llvm-svn: 370409
Summary:
There is no reason to differ in assembler behavior here between -msvc
and -gnu targets. Without this setting, the text after the '@' is
interpreted as a symbol variable, like foo@IMGREL.
Reviewers: mstorsjo
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66974
llvm-svn: 370408
ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element.
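A per-lane model of why the fold is sound (a sketch; PMULUDQ zero-extends the low halves, PMULDQ sign-extends):
```
#include <cstdint>

// One 64-bit lane of PMULUDQ: only the low 32 bits of each operand matter.
uint64_t pmuludq_lane(uint64_t A, uint64_t B) {
  return (uint64_t)(uint32_t)A * (uint64_t)(uint32_t)B;
}

// One 64-bit lane of PMULDQ: the low 32 bits, sign-extended.
int64_t pmuldq_lane(uint64_t A, uint64_t B) {
  return (int64_t)(int32_t)(uint32_t)A * (int64_t)(int32_t)(uint32_t)B;
}
// Either way, a zero in the low 32 bits of one operand zeroes the whole
// lane, so the combine must return a true zero vector, not the (possibly
// undef-containing) input build vector.
```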
llvm-svn: 370404
-Deprecate -mmpx and -mno-mpx command line options
-Remove CPUID detection of mpx for -march=native
-Remove MPX from all CPUs
-Remove MPX preprocessor define
I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything.
gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX
Differential Revision: https://reviews.llvm.org/D66669
llvm-svn: 370393
We can also apply the earlier updates to the lazy DTU, instead of
applying them directly.
Reviewers: kuhar, brzycki, asbirlea, SjoerdMeijer
Reviewed By: brzycki, asbirlea, SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D66918
llvm-svn: 370391
AMDGPU uses this for some addressing mode selection patterns. The
analysis run itself doesn't do anything, so it seems easier to just
always require this than to add a way to opt in.
llvm-svn: 370388
Summary:
I'm not planning to check this in at the moment, but feedback is very welcome, in particular on how this affects performance.
The feedback obtained here will guide the next steps towards enabling this.
This patch enables the use of MemorySSA in the loop pass manager.
Passes that currently use MemorySSA:
- EarlyCSE
Passes that use MemorySSA after this patch:
- EarlyCSE
- LICM
- SimpleLoopUnswitch
Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch):
- LoopInstSimplify
- LoopSimplifyCFG
- LoopUnswitch
- LoopRotate
- LoopSimplify
- LCSSA
Loop passes that do *not* update MemorySSA:
- IndVarSimplify
- LoopDelete
- LoopIdiom
- LoopSink
- LoopUnroll
- LoopInterchange
- LoopUnrollAndJam
- LoopVectorize
- LoopReroll
- IRCE
Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry
Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58311
llvm-svn: 370384
Add a GISelPredicateCode to the stxr_* PatFrags in AArch64InstrAtomics.td.
This allows us to select these intrinsics.
Differential Revision: https://reviews.llvm.org/D65779
llvm-svn: 370382
Remove manual selection code for this intrinsic and use a GISelPredicateCode
instead.
This allows us to fully select this intrinsic without any tricky custom C++
matching.
Differential Revision: https://reviews.llvm.org/D65780
llvm-svn: 370380
Same thing as D66897, but for ldxr.* instead. Add a GISelPredicateCode to the
ldxr_* definitions, which allows us to import them.
Add select-ldxr-intrin.mir, and update arm64-ldxr-stxr.ll.
Differential Revision: https://reviews.llvm.org/D66898
llvm-svn: 370378
Add a GISelPredicateCode to ldaxr_*. This allows us to import the patterns for
@llvm.aarch64.ldaxr.*, and thus select them.
Add `isLoadStoreOfNumBytes` for the GISelPredicateCode, since each of these
intrinsics involves the same check.
Add select-ldaxr-intrin.mir, and update arm64-ldxr-stxr.ll.
Differential Revision: https://reviews.llvm.org/D66897
llvm-svn: 370377
Summary:
- Similar to the workaround in the fix of PR30188, skip sinking common
lifetime markers of `alloca`. They are mostly left there after
inlining functions in branches.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66950
llvm-svn: 370376
Summary:
While examining this class for possible use in lldb, I noticed two
things:
- it spits out parsing errors directly to stderr
- the loclists parser can incorrectly return valid location lists when
parsing malformed (truncated) data
I improve the stderr situation by making the parseOneLocationList
functions return Expected<T>s. The errors are still dumped to stderr by
their callers, so this is only a partial fix, but it is enough for my
use case, as I intend to parse the locations lists one by one.
I fix the behavior in the truncated scenario by using the newly
introduced DataExtractor Cursor API.
I also add tests for handling the error cases, as they currently have no
coverage.
Reviewers: dblaikie, JDevlieghere, probinson
Subscribers: lldb-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63591
llvm-svn: 370363
Both methods `MipsTargetStreamer::emitStoreWithSymOffset` and
`MipsTargetStreamer::emitLoadWithSymOffset` are almost the same and
differ only in argument names. These methods are used in a single place,
so it's better to inline their code and remove the original methods.
llvm-svn: 370354
When a "base" in the `lw/sw $reg1, symbol($reg2)` instruction is
a register and generated code is position independent, backend
does not add the "base" value to the symbol address.
```
lw $reg1, %got(symbol)($gp)
lw/sw $reg1, 0($reg1)
```
This patch fixes the bug and adds the missing `addu` instruction by
passing `BaseReg` into the `loadAndAddSymbolAddress` routine, and handles
the case when `BaseReg` is the zero register to avoid a redundant
`move reg, reg` instruction:
```
lw $reg1, %got(symbol)($gp)
addu $reg1, $reg1, $reg2
lw/sw $reg1, 0($reg1)
```
Differential Revision: https://reviews.llvm.org/D66894
llvm-svn: 370353
Summary:
Now that with D65143/D65144 we produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we can now see that
the guard that may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow or zero
%iszero = icmp eq i4 %y, 0
%umul = smul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%umul.ov.not = xor %umul.ov, -1
%retval.0 = or i1 %iszero, %umul.ov.not
ret i1 %retval.0
=>
%iszero = icmp eq i4 %y, 0
%umul = smul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%umul.ov.not = xor %umul.ov, -1
%retval.0 = or i1 %iszero, %umul.ov.not
ret i1 %umul.ov.not
Done: 1
Optimization is correct!
```
Note that this is inverted from what we have in a previous patch,
here we are looking for the inverted overflow bit.
And that inversion is somewhat problematic - given this particular
pattern, we neither hoist that `not` closer to `ret` (then the pattern
would have been identical to the one without inversion,
and would have been handled by the previous patch), nor
do the opposite transform. But regardless, we should handle this too.
I've filed [[ https://bugs.llvm.org/show_bug.cgi?id=42720 | PR42720 ]].
Reviewers: nikic, spatel, xbolva00, RKSimon
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65151
llvm-svn: 370351
Summary:
Now that with D65143/D65144 we produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we can now see that
the guard that may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow and not zero
%iszero = icmp ne i4 %y, 0
%umul = umul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%retval.0 = and i1 %iszero, %umul.ov
ret i1 %retval.0
=>
%iszero = icmp ne i4 %y, 0
%umul = umul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%retval.0 = and i1 %iszero, %umul.ov
ret i1 %umul.ov
Done: 1
Optimization is correct!
```
Reviewers: nikic, spatel, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65150
llvm-svn: 370350
Summary:
As can be seen in the tests in D65143/D65144, even though we have formed an '@llvm.umul.with.overflow'
and got rid of the potential for division-by-zero, the control flow remains; we still have that branch.
We have this condition:
```
// Don't fold i1 branches on PHIs which contain binary operators
// These can often be turned into switches and other things.
if (PN->getType()->isIntegerTy(1) &&
(isa<BinaryOperator>(PN->getIncomingValue(0)) ||
isa<BinaryOperator>(PN->getIncomingValue(1)) ||
isa<BinaryOperator>(IfCond)))
return false;
```
which was added back in rL121764 to help with `select` formation, I think.
That check prevents us from flattening the CFG here, even though we know
we no longer need that guard and will be able to drop everything
but the '@llvm.umul.with.overflow' + `not`.
As can be seen from the tests, we end up here because the `not` is being
sunk into the PHI's incoming values by InstCombine,
so we can't work around this by hoisting it to after the PHI.
Thus I suggest that we relax that check to not bail out if we'd get to hoist the `not`.
Reviewers: craig.topper, spatel, fhahn, nikic
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65147
llvm-svn: 370349
The missing line added by this patch ensures that only spilt variable
locations are candidates for being restored from the stack. Otherwise,
register or constant-value information can be interpreted as a spill
location, through a union.
The added regression test replicates a scenario where this occurs: the
stack load from [rsp] causes the register-location DBG_VALUE to be
"restored" to rsi, when it should be left alone. See PR43058 for details.
Un-XFAIL a test that was suffering from this after a previous patch.
Differential Revision: https://reviews.llvm.org/D66895
llvm-svn: 370334
This allows us to produce broken binaries with local
symbols placed after global ones in '.dynsym'/'.symtab'.
It also simplifies the code.
Differential revision: https://reviews.llvm.org/D66799
llvm-svn: 370331
Masked loads and store fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc.
The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.
We also need to do something with unaligned loads/stores. Currently this uses a
method similar to the one used for big endian, using a VLDRB.8 (and potentially a VREV in
BE). This does mean that the predicate mask is converted from, for example, a
v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of
the relevant mask lane, so this could potentially load different results if the
predicate is a little odd. As the input is a v4i1, however, I believe this is OK
and all the bits required should be set in the predicate, making the VLDRB.8
load the same data.
Differential Revision: https://reviews.llvm.org/D66534
llvm-svn: 370329
The "join" method in LiveDebugValues does not attempt to join unseen
predecessor blocks if their out-locations aren't yet initialized, instead
the block should be re-visited later to see if any locations have changed
validity. However, because the set of blocks were all being "process"'d
once before "join" saw them, that logic in "join" was actually ignoring
legitimate out-locations on the first pass through. This meant that some
invalidated locations were not removed from the head of loops, allowing
illegal locations to persist.
Fix this by removing the run of "process" before the main join/process loop
in ExtendRanges. Now the unseen predecessors that "join" skips truly are
uninitialized, and we come back to the block at a later time to re-run
"join", see the @baz function added.
This also fixes another fault where stack/register transfers in the entry
block (or any other before-any-loop block) were initially
ignored and then never revisited. The added MIR test checks for this
behaviour.
XFail a test that exposes another bug; a fix for this is coming in D66895.
Differential Revision: https://reviews.llvm.org/D66663
llvm-svn: 370328
Summary:
We were previously doing it in DAGCombine.
But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine.
So if we had `sub %x, -1`, we'll transform it to `add %x, 1`,
which `combineIncDecVector()` will immediately transform back into `sub %x, -1`,
and here we go again...
I've marked this as NFC since not a single test changes,
but since that 'changes' DAGCombine, probably this isn't fully NFC.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: craig.topper
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62327
llvm-svn: 370327
Summary: This is beneficial when the shuffle is only used once and ends up being generated in a few places when some node is combined into a shuffle.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66718
llvm-svn: 370326
Summary:
Finally, the fold I was looking forward to :)
The legality check is muddy; I doubt I've grokked the full generalization,
but it handles all the cases I care about, and can come up with:
https://rise4fun.com/Alive/26j
I.e. we can perform the fold if **any** of the following is true:
* The shift amount is either zero or one less than widest bitwidth
* Either of the values being shifted has at most lowest bit set
* The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount;
* The value that is being shifted by `lshr` (which **is** truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one
I strongly suspect there is some better generalization, but I'm not aware of it as of right now.
For now I also avoided using actual `computeKnownBits()`, but restricted it to constants.
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66383
llvm-svn: 370324
Instead of blindly incrementing pointers in llvm-readobj, use this
helper, which does bounds checking against the available section
data.
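The helper itself isn't quoted here; a minimal sketch of the idea (names hypothetical):
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/Error.h"
#include <cstdint>
using namespace llvm;

// Return a pointer to Size bytes at Offset, or an error if the section data
// is too small, instead of blindly advancing a raw pointer.
static Expected<const uint8_t *> checkedRead(ArrayRef<uint8_t> SectionData,
                                             uint64_t Offset, uint64_t Size) {
  if (Offset + Size < Offset || Offset + Size > SectionData.size())
    return createStringError(inconvertibleErrorCode(),
                             "read goes past the end of the section");
  return SectionData.data() + Offset;
}
```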
Differential Revision: https://reviews.llvm.org/D66818
llvm-svn: 370310
Previously, the expression (Reader.readFoo()) was expanded twice,
triggering asserts as one of the Error types ends up not checked
(and as it was expanded twice, the method would end up called twice
if it failed first).
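The failure mode in miniature (a hedged reconstruction, not the actual macro):
```
// BAD: Expr is expanded twice. The read runs twice on failure, and the
// Expected from the first evaluation is destroyed unchecked, tripping the
// assert.
#define CHECKED_READ_BAD(Expr)                                                 \
  do {                                                                         \
    if (!(Expr))                                                               \
      return (Expr).takeError(); /* second expansion re-runs the read */       \
  } while (0)

// GOOD: evaluate exactly once into a local, then test and consume that.
#define CHECKED_READ_GOOD(Expr)                                                \
  do {                                                                         \
    auto ValOrErr_ = (Expr);                                                   \
    if (!ValOrErr_)                                                            \
      return ValOrErr_.takeError();                                            \
  } while (0)
```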
Differential Revision: https://reviews.llvm.org/D66817
llvm-svn: 370309
We had an isel pattern to perform this, but it's better to
do it in DAG combine as a simplification. This also fixes the lack
of patterns for AVX512 targets.
llvm-svn: 370294
Including a type legalizer fix to make bitcast operand promotion
work correctly when getSoftenedFloat returns f128 instead of i128.
Fixes PR43157
llvm-svn: 370293
We do not access the DT in the loop, so we do not have to apply updates
eagerly. We can apply them lazyly and flush them after we are done
merging blocks.
As follow-up work, we might be able to use the DTU above as well,
instead of manually updating the DT.
This brings the example from PR43134 from ~100s to ~4s for a release +
assertions build on my machine.
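A minimal sketch of the lazy-update pattern (assuming the standard DomTreeUpdater API):
```
#include "llvm/Analysis/DomTreeUpdater.h"
#include "llvm/IR/Dominators.h"
using namespace llvm;

void mergeBlocks(DominatorTree &DT /*, ... */) {
  // Lazy strategy: updates are queued rather than applied on each call.
  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
  // While merging blocks, record CFG changes, e.g.:
  //   DTU.applyUpdates({{DominatorTree::Delete, Pred, MergedBB}});
  // We never query the DT inside the loop, so deferring is safe.
  DTU.flush(); // apply everything once at the end
}
```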
Reviewers: efriedma, kuhar, asbirlea, brzycki
Reviewed By: kuhar, brzycki
Differential Revision: https://reviews.llvm.org/D66911
llvm-svn: 370292
SGPR spills aren't really handled after SILowerSGPRSpills. In order to
directly control what happens if the scavenger needs to spill, the
scavenger needs to be used directly. There is an alternative to
spilling in these contexts anyway since the frame register can be
incremented and restored.
This does present another possible issue: spilling may be needed for the
unused carry out if an add is needed. I think this can be avoided by
using a scalar add (although that clobbers SCC, which happens anyway).
llvm-svn: 370281
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.
This mostly enables G_FCMP for AMDGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.
llvm-svn: 370280
Also add a FIXME because I'm not sure why these patterns exist. Looks like a missing combine.
And another FIXME because the AVX512 equivalent of one of the patterns is missing.
llvm-svn: 370276
This patch fixes an issue where RV64 didn't clear the upper bits
when returning a complex floating-point value with the lp64 ABI.
float _Complex
complex_add(float _Complex a, float _Complex b)
{
return a + b;
}
RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))
The patch introduces the shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering a floating-point libcall.
Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820
Differential Revision: https://reviews.llvm.org/D65497
llvm-svn: 370275
If result of 64-bit address loading combines with 32-bit mask, LLVM
tries to optimize the code and remove "redundant" loading of upper
32-bits of the address. It leads to incorrect code on MIPS64 targets.
The MIPS backend creates the following chain of instructions to load a 64-bit
address in the `MipsTargetLowering::getAddrNonPICSym64` method:
```
(add (shl (add (shl (add %highest(sym), %higher(sym)),
16),
%hi(sym)),
16),
%lo(%sym))
```
If the mask is present, LLVM decides to optimize the chain of instructions. It
really does not make sense to load the upper 32 bits because the 0x0fffffff
mask clears them anyway. After removing the redundant instructions we get this
chain:
```
(add (shl (%hi(sym), 16), %lo(%sym))
```
There are no patterns matching `(MipsHi (i64 symbol))`. Due to a bug in the `SYM_32`
predicate definition, the backend incorrectly selects a pattern for 32-bit
symbols and uses the `lui` instruction for loading `%hi(sym)`.
As a result we get an incorrect set of instructions with unnecessary 16-bit
left shifting:
```
lui at,0x0
R_MIPS_HI16 foo
dsll at,at,0x10
daddiu at,at,0
R_MIPS_LO16 foo
```
This patch resolves two problems:
- Fix the `SYM_32/SYM_64` predicates to prevent selection of patterns dedicated
to 32-bit symbols when using the N64 ABI.
- Add missing patterns for `%hi/%lo` of 64-bit symbols.
Fix PR42736.
Differential Revision: https://reviews.llvm.org/D66228
llvm-svn: 370268
...cloning a function from a different module
Currently when a function with debug info is cloned from a different module, the
cloned function may have hanging DICompileUnits, so that the module with the
cloned function fails debug info verification.
The proposed fix inserts all DICompileUnits reachable from the cloned function
to "llvm.dbg.cu" metadata operands of the cloned function module.
Reviewed By: aprantl, efriedma
Differential Revision: https://reviews.llvm.org/D66510
Patch by Oleg Pliss (Oleg.Pliss@azul.com)
llvm-svn: 370265
By default ASan calls a versioned function
`__asan_version_mismatch_check_vXXX` from the ASan module constructor to
check that the compiler ABI version and runtime ABI version are
compatible. This ensures that we get a predictable linker error instead
of hard-to-debug runtime errors.
Sometimes, however, we want to skip this safety guard. This new command
line option allows us to do just that.
rdar://47891956
Reviewed By: kubamracek
Differential Revision: https://reviews.llvm.org/D66826
llvm-svn: 370258
Before this change, if multiple binary files were provided, all of them had to be instrumented or the load would fail with coverage_map_error::no_data_found.
Patch by Dean Sturtevant.
Differential Revision: https://reviews.llvm.org/D66763
llvm-svn: 370257
Stop counting explicitly disabled user_sgprs in the user_sgpr_count field of the kernel descriptor.
Differential Revision: https://reviews.llvm.org/D66900
llvm-svn: 370250
Always true/false checks were flagged by static analysis;
https://bugs.llvm.org/show_bug.cgi?id=43143
I have not confirmed the logic difference in propagating nsw vs. nuw,
but presumably we would have noticed a bug by now if that was wrong.
llvm-svn: 370248
Summary:
This functionality was added when Mapper::mapMetadata was recursive. It
is no longer needed after r265456, which switched it to be iterative.
Reviewers: dexonsmith, srhines
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66860
llvm-svn: 370236
Dependences between abstract attributes can become stale, e.g., if
one was sufficient to imply another one at some point but it has since
been weakened to the point it is not usable for the formerly implied one.
To weed out spurious dependences, and thereby eliminate unneeded
updates, we introduce an option to determine how often the dependence
cache is cleared and recomputed during the fixpoint iteration.
Note that the initial value was determined such that we see a positive
result on our tests.
Differential Revision: https://reviews.llvm.org/D63315
llvm-svn: 370230
This implements constrained floating point intrinsics for FP to signed and
unsigned integers.
Quoting from D32319:
The purpose of the constrained intrinsics is to force the optimizer to
respect the restrictions that will be necessary to support things like the
STDC FENV_ACCESS ON pragma without interfering with optimizations when
these restrictions are not needed.
Reviewed by: Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton
Approved by: Craig Topper
Differential Revision: http://reviews.llvm.org/D63782
llvm-svn: 370228
These are currently translated as normal function calls in AArch64.
Until we have proper tail call lowering, we shouldn't translate these.
Differential Revision: https://reviews.llvm.org/D66842
llvm-svn: 370225
This reduces the number of SGPRs due to some concerns about running
out of SGPRs if you make all the SGPRs that aren't reserved available
for the calling convention.
Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
llvm-svn: 370215
Summary:
Until we have proper call-site information we should not recompute
liveness and return information for each call site. This patch directly
uses the function versions and introduces TODOs at the usage sites.
The required iterations to get to the fixpoint are most of the time
reduced by this change and we always avoid work duplication.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66562
llvm-svn: 370208
This relands this commit; I mistakenly reverted the original change
thinking it was the cause of the observed MSan failures, but it was not.
llvm-svn: 370206
Neither libgcc nor compiler-rt is usually used on Windows, so these
functions can't be called.
Differential revision: https://reviews.llvm.org/D66880
llvm-svn: 370204
There is no pattern matching `add hi, (MipsLo texternalsym)`. As a result,
loading the address of a 32-bit symbol requires two registers and one
additional instruction:
```
addiu $1, $zero, %lo(foo)
lui $2, %hi(foo)
addu $25, $2, $1
```
This patch adds the missing pattern and enables generation of a more effective
set of instructions:
```
lui $1, %hi(foo)
addiu $25, $1, %lo(foo)
```
Differential Revision: https://reviews.llvm.org/D66771
llvm-svn: 370196
Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicitly check for both options, but other places do not when they would benefit from doing so. This patch refactors the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66804
llvm-svn: 370190
This just pulls the MVEVPTBlockPass into a separate file, as opposed to being
wrapped up in Thumb2ITBlockPass.
Differential revision: https://reviews.llvm.org/D66579
llvm-svn: 370187
This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some
adjustments for MVE. It allows us to move fp16 registers without going into and
out of gprs.
VMOVX is able to move the top bits from a fp16 in a fp reg into the bottom bits
of another register, zeroing the rest. This can be used for odd MVE register
lanes. The top bits are not read by fp16 instructions, so no move is required
there if we are dealing with even lanes.
Differential revision: https://reviews.llvm.org/D66793
llvm-svn: 370184
With the introduction of the typed byval attribute change there was no
way that the LLVM-C API could create the correct class Attribute. If a
program that uses the C API creates a ByVal attribute and annotates a
function with that attribute, LLVM will crash when it assembles or writes
the module containing the function out as bitcode.
This change is a minimal fix to at least allow such code to work; this is
because the byval change is on the 9.0 branch and I don't want to introduce
a new LLVM-C API this late in the release cycle.
By Jakob Bornecrantz!
Differential revision: https://reviews.llvm.org/D66144
llvm-svn: 370176
Allow vectorizing loops that have reductions when tail is folded by masking.
A select is introduced in VPlan, choosing between the last value carried by the
loop-exit/live-out instruction of the reduction, and the penultimate value
carried by the reduction phi, according to the "i < n" mask of fold-tail.
This select replaces the last value as the live-out value of the loop.
Differential Revision: https://reviews.llvm.org/D66720
llvm-svn: 370173
rL369567 reverted a couple of recent changes made to ARMParallelDSP
because of a miscompilation error: PR43073.
The issue stemmed from an underlying bug that was caused by adding
muls into a reduction before it was proved that they could be executed
in parallel with another mul.
Most of the changes here are from the previously reverted commits.
The additional changes made are:
1) The Search function now doesn't insert any muls into the Reduction
object. That now happens once the search has successfully finished.
2) For any muls added into the reduction but that weren't paired, we
accumulate their values as an input into the smlad.
Differential Revision: https://reviews.llvm.org/D66660
llvm-svn: 370171
This change moves the actual stack pointer manipulation into the legalizer,
available to targets via lower(). The codegen is slightly different because
we're using explicit masks instead of G_PTRMASK, and using G_SUB rather than
adding a negative amount via G_GEP.
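Conceptually, the lowered sequence computes the following (a scalar model for a downward-growing stack):
```
#include <cstdint>

// Dynamic stack allocation: subtract the size from SP, then apply an
// explicit alignment mask (instead of G_PTRMASK).
uintptr_t lowerDynAlloca(uintptr_t SP, uintptr_t Size, uintptr_t Align) {
  SP -= Size;          // G_SUB, rather than G_GEP with a negated amount
  SP &= ~(Align - 1);  // round down to the requested alignment
  return SP;           // new stack pointer == address of the allocation
}
```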
Differential Revision: https://reviews.llvm.org/D66678
llvm-svn: 370104
The code we had in isSafeToLoadUnconditionally was blatantly wrong. This function takes a "Size" argument which is supposed to describe the span loaded from. Instead, the code used the size of the pointer passed (which may be unrelated!) and only checked that span. For any Size > LoadSize, this can and does lead to miscompiles.
Worse, the generic code just a few lines above correctly handles the cases which *are* valid. So, let's delete said code.
Removing this code revealed two issues:
1) As noted by jdoerfert the removed code incorrectly handled external globals. The test update in SROA is to stop testing incorrect behavior.
2) SROA was confusing bytes and bits, but this wasn't obvious as the Size parameter was being essentially ignored anyway. Fixed.
Differential Revision: https://reviews.llvm.org/D66778
llvm-svn: 370102
Copied directly from the IR version.
Most of the testcases I've added for this are somewhat problematic
because they really end up testing the yet-to-be-implemented version
for MUL_I24/MUL_U24.
llvm-svn: 370099
Summary:
This patch implements main entry and auxiliary entries of symbol table generation for llvm-readobj on AIX.
The source code of aix_xcoff_xlc_test8.o (compiled with xlc) is:
-bash-4.2$ cat test8.c
extern int i;
extern int TestforXcoff;
extern int fun(int i);
static int static_i;
char* p="abcd";
int fun1(int j) {
  static_i++;
  j++;
  j=j+*p;
  return j;
}
int main() {
  i++;
  fun(i);
  return fun1(i);
}
Patch provided by DiggerLin
Differential Revision: https://reviews.llvm.org/D65240
llvm-svn: 370097
Summary:
This patch introduces SequenceBBQuery, a new heuristic to find likely next callable functions; it tries to find the blocks with calls in the order of the execution sequence of blocks.
It still uses BlockFrequencyAnalysis to find high frequency blocks. For a handful of the hottest blocks (planned to be customizable), the algorithm traverses and discovers the call blocks along the way to the entry basic block and the exit basic block. It uses a block hint to stop traversing already visited blocks in both directions. It implicitly assumes that once a block is visited while discovering entry or exit nodes, revisiting it again does not add much. It also uses branch probability info (a cached result) to traverse only hot edges (planned to be customizable) from hot blocks. Without BPI, the algorithm would mostly return all the blocks in the CFG with calls.
It also changes the heuristic queries so they don't maintain state. Hence they are safe to call from multiple threads.
It also implements new instrumentation to avoid jumping into the JIT on every call to the function, with the help of _orc_speculate.decision.block and _orc_speculate.block.
"Speculator Registration Mechanism is also changed" - kudos to @lhames
Open to review; mostly looking to change the implementation of the SequenceBBQuery heuristics with good data structure choices.
Reviewers: lhames, dblaikie
Reviewed By: lhames
Subscribers: mgorny, hiraditya, mgrang, llvm-commits, lhames
Tags: #speculative_compilation_in_orc, #llvm
Differential Revision: https://reviews.llvm.org/D66399
llvm-svn: 370092
Before this patch, users were not allowed to optionally mark processor resource
groups as load/store queues. That is because the tablegen class MemoryQueue was
originally declared as expecting a ProcResource template argument (instead of a
more generic ProcResourceKind).
That was an oversight, since the original intention from D54957 was to let users
mark any processor resource as either a load or a store queue. This patch adds the
ability to use processor resource groups in MemoryQueue definitions. This is not
a user visible change.
Differential Revision: https://reviews.llvm.org/D66810
llvm-svn: 370091
There are 5 instructions here that are converted from TAILJMP opcodes to regular JMP/JCC opcodes during MCInstLowering, so normally their encoding information isn't used. The exception is when XRay wraps them in PATCHABLE_TAIL_CALL.
For the ones that weren't already handled in MCInstLowering, add handling for those and remove their encoding information.
This patch fixes PATCHABLE_TAIL_CALL to do the same opcode conversion as the regular lowering path, and then removes the encoding information.
Differential Revision: https://reviews.llvm.org/D66561
llvm-svn: 370079
On MachO, processing of the eh-frame section should stop if the end of the
__eh_frame section is reached, regardless of whether or not there is a null CFI
length field at the end of the section. This patch tracks the eh-frame section
size and threads it through the appropriate APIs so that processing can be
terminated correctly.
No testcase yet: This patch is all API plumbing (rather than modification of
linked memory) which the existing infrastructure does not provide a way of
testing. Committing without a testcase until I have an idea of how to write
one.
llvm-svn: 370074
If content sections have lower alignment than zero-fill sections then bump the
overall segment alignment to avoid under-aligning the zero-fill sections.
llvm-svn: 370072
(-X) * (-Y) + Z --> X * Y + Z
This is a missing optimization that shows up as a potential regression in D66050,
so we should solve it first. We appear to be partly missing this fold in IR as well.
We do handle the simpler case already:
(-X) * (-Y) --> X * Y
And it might be beneficial to make the constraint less conservative (eg, if both
operands are cheap, but not necessarily cheaper), but that causes infinite looping
for the existing fmul transform.
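Why the fold is exact: FP negation only flips the sign bit, and the two flips cancel in the product, so the results are bitwise identical (a quick check; NaN compares are sidestepped by using non-NaN inputs):
```
#include <cassert>
#include <cmath>

void checkFold(double X, double Y, double Z) {
  // (-X) * (-Y) + Z == X * Y + Z, in both the fused and unfused forms.
  assert(std::fma(-X, -Y, Z) == std::fma(X, Y, Z));
  assert((-X) * (-Y) + Z == X * Y + Z);
}

int main() { checkFold(1.5, -2.25, 3.0); }
```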
Differential Revision: https://reviews.llvm.org/D66755
llvm-svn: 370071
Summary:
Adds support for emitting common local global symbols to an XCOFF object file.
Local commons are emitted into the .bss section with a storage class of
C_HIDEXT.
Patch by: daltenty
Reviewers: sfertile, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D66097
llvm-svn: 370070
This reverts commit b3d258fc44.
@skatkov is reporting a crash in D63972#1646303.
I contacted @ZhangKang and am reverting the commit on his behalf.
llvm-svn: 370069
The main difference is in the way Hi for the long shift (HiL) is made.
G_LSHR fills HiL with zeros, while G_ASHR fills HiL with the sign bit value.
Differential Revision: https://reviews.llvm.org/D66589
llvm-svn: 370064
Fix typos. Use Hi and Lo prefixes for Or instead of LHS and RHS
to match names of surrounding variables.
Differential Revision: https://reviews.llvm.org/D66587
llvm-svn: 370062
Summary:
This patch adds support for scalable vectors in intrinsics, enabling
intrinsics such as the following to be defined:
declare <vscale x 4 x i32> @llvm.something.nxv4i32(<vscale x 4 x i32>)
Support for this is implemented by defining a new type descriptor for
scalable vectors and adding mangling support for scalable vector types
in the name mangling scheme used by 'any' types in intrinsic signatures.
Tests have been added for IRBuilder to test that scalable vectors work as
expected when using intrinsics through this interface. This required
implementing an intrinsic that is explicitly defined with scalable
vectors, e.g. LLVMType<nxv4i32>; an SVE floating-point convert
intrinsic was used for this. The behaviour of the overloaded type
LLVMScalarOrSameVectorWidth with scalable vectors is tested using the
existing masked load intrinsic. Also added an .ll test to check that the
Verifier catches a bad intrinsic argument when passing a fixed-width
predicate (mask) to the masked.load intrinsic where a scalable vector is
expected.
Patch by Paul Walker
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D65930
llvm-svn: 370053
Summary:
This is motivated by D63591, where we realized that there isn't a really
good way of telling whether a DataExtractor is reading actual data, or
whether it is just returning default values because it reached the end of
the buffer.
This patch resolves that by providing a new "Cursor" class. A Cursor
object encapsulates two things:
- the current position/offset in the DataExtractor
- an error object
Storing the error object inside the Cursor enables one to use the same
pattern as the std::{io}stream API, where one can blindly perform a
sequence of reads and only check for errors once at the end of the
operation. Similarly to the stream API, as soon as we encounter one
error, all of the subsequent operations are skipped (return default
values) too, even if they would succeed with a clear error state. Unlike the
std::stream API (but in line with other llvm APIs), we force the error
state to be checked through usage of llvm::Error.
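Typical use, as enabled by this patch (a sketch against the API introduced here):
```
#include "llvm/Support/DataExtractor.h"
#include "llvm/Support/Error.h"
using namespace llvm;

Error readPair(StringRef Bytes) {
  DataExtractor Data(Bytes, /*IsLittleEndian=*/true, /*AddressSize=*/8);
  DataExtractor::Cursor C(0);
  uint32_t A = Data.getU32(C); // once C holds an error, further reads are
  uint32_t B = Data.getU32(C); // skipped and just return default values
  (void)A; (void)B;
  return C.takeError();        // single, mandatory check at the end
}
```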
Reviewers: probinson, dblaikie, JDevlieghere, aprantl, echristo
Subscribers: kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63713
llvm-svn: 370042
Inserting a value into Visited has the effect of terminating a search for
predecessors if that node is seen. This is legitimate for the base address, and
acts as a slight performance optimization, but the vector-building node can be
part of a legitimate cycle, so we shouldn't stop searching there.
PR43056.
llvm-svn: 370036
This is a follow up discussed in the comments of D66583.
Currently, if, for example, we have both StOther and Other set in the YAML document for a symbol,
then yaml2obj reports an "unknown key 'Other'" error.
It happens because 'mapOptional()' is never called for 'Other/Visibility' in this case,
leaving those unhandled.
This message does not describe the reason for the error well. This patch fixes it.
Differential revision: https://reviews.llvm.org/D66642
llvm-svn: 370032
ConstantDataVector is a specialized version of ConstantVector
that stores data in a packed array of bits instead of as
individual pointers to other Constants. But we really shouldn't
expose that if we can avoid it. And we should handle regular
ConstantVector equally well.
This removes a dyn_cast to ConstantDataVector and just calls
getSplatValue directly on a Constant* if the type is a vector.
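The call site then becomes uniform over both constant-vector representations (a sketch):
```
#include "llvm/IR/Constants.h"
using namespace llvm;

const APInt *getSplatInt(Constant *C) {
  if (!C->getType()->isVectorTy())
    return nullptr;
  // getSplatValue works for ConstantVector and ConstantDataVector alike;
  // no dyn_cast to the packed representation is needed.
  if (auto *Splat = dyn_cast_or_null<ConstantInt>(C->getSplatValue()))
    return &Splat->getValue();
  return nullptr;
}
```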
llvm-svn: 370018
Summary:
During the fixpoint iteration, including the manifest stage, we should
not delete stuff as other abstract attributes might have a reference to
the value. Through the API this can now be done safely at the very end.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66779
llvm-svn: 370014
This change causes instrumented builds of Clang to have a fatal error in the
backend. https://reviews.llvm.org/D66537 has the details.
llvm-svn: 370006
Summary:
This is an alternate approach to D63396
Currently funclets reuse the same stack slots that are used in the
parent function for saving callee-saved xmm registers. If the parent
function modifies a callee-saved xmm register before an exception is
thrown, the catch handler will overwrite the original saved value.
This patch allocates space in the funclet's stack for saving callee-saved xmm
registers and uses RSP instead of RBP to access memory.
Signed-off-by: Pengfei Wang <pengfei.wang@intel.com>
Reviewers: rnk, RKSimon, craig.topper, annita.zhang, LuoYuanke, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66596
Signed-off-by: Pengfei Wang <pengfei.wang@intel.com>
llvm-svn: 370005
The problem these are supposed to work around can occur before the
intrinsics are lowered into the nodes. Try to directly simplify them
so they are matched before the bit assert operations can be optimized
out.
llvm-svn: 369994
The mul24 matching could interfere with SLSR and the other addressing
mode related passes. This probably is not the optimal placement, but
is an intermediate step. This should probably be moved after all the
generic IR passes, particularly LSR. Moving this after LSR seems to
help in some cases, and hurts others.
As-is in this patch, in idiv-licm, it saves 1-2 instructions inside
some of the loop bodies, but increases the number in others. Moving
this later helps these loops. In the new lsr tests in
mul24-pass-ordering, the intrinsic prevents introducing more
instructions in the loop preheader, so moving this later ends up
hurting them. This shouldn't be any worse than before the intrinsics
were introduced in r366094, and LSR should probably be smarter. I
think it's because it doesn't know the and inside the loop will be
folded away.
llvm-svn: 369991
Probably better to keep add over sub in early DAG combines.
It might make sense to push this to lowering or delay it all
the way to isel. But this was the simplest change.
llvm-svn: 369981
Summary:
Previously we skipped uses within the same BB as a def when rebuilding
SSA after SjLj transformation. For example, before transformation,
```
for.cond:
%0 = phi i32 [ %var, %for.inc ] ...
%var = ...
br label %for.inc
for.inc: ; preds = %for.cond
call i32 @setjmp(...)
br %for.cond
```
In this BB, %var should be defined in all paths from %for.inc to make %0
valid. In the input it was true; %for.inc's only predecessor was
%for.cond. But after SjLj transformation, it is possible that %for.inc
has other predecessors that are reachable without reaching %for.cond.
```
entry.split:
...
br i1 %a, label %bb.1, label %for.inc
for.cond:
%0 = phi i32 [ %var, %for.inc ] ... ; Not valid!
%var = ...
br label %for.inc
for.inc: ; preds = %for.cond, %entry.split
call i32 @setjmp(...)
...
br %for.cond
```
In this case, we can't use %var in the `phi` instruction in %for.cond,
because %var is not defined in all paths through %for.inc (If the
control flow is %entry -> %entry.split -> %for.inc -> %for.cond, %var
has not been defined until we reach the `phi`). But the previous code
excluded users within the same BB, so those instructions were not
rewritten properly. User instructions within the same BB should also be
candidates for rewriting if they are _before_ the original definition.
Fixes PR43097.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66729
llvm-svn: 369978
Try harder to emulate "old runtime" in the test.
To get the old behavior with the new runtime library, we need to both
disable personality function wrapping and enable landing pad
instrumentation.
llvm-svn: 369977
In r369808 the failure scheme for ORC symbols was changed to make
MaterializationResponsibility objects responsible for failing the symbols
they represented. This simplifies error logic in the case where symbols are
still covered by a MaterializationResponsibility, but left a gap in error
handling: Symbols that have been emitted but are not yet ready (due to a
dependence on some unemitted symbol) are not covered by a
MaterializationResponsibility object. Under the scheme introduced in r369808
such symbols would be moved to the error state, but queries on those symbols
were never notified. This led to deadlocks when such symbols were failed.
This commit updates error logic to immediately fail queries on any symbol that
has already been emitted if one of its dependencies fails.
llvm-svn: 369976
Symbols that have not been queried will not have MaterializingInfo entries,
so remove the assert that all failed symbols should have these entries.
Also updates the loop to only remove entries that were found earlier.
llvm-svn: 369975
This implements the DWARF 5 feature described in:
http://dwarfstd.org/ShowIssue.php?issue=141212.1
To support recognizing anonymous structs:
struct A {
  struct { // Anonymous struct
    int y;
  };
} a;
This patch adds support for the new flag in constructTypeDIE(...) and a test to verify this change.
Differential Revision: https://reviews.llvm.org/D66605
llvm-svn: 369969
Summary:
Try to verify how many iterations we need for a fixpoint in our tests.
This patch adjust the way we count to make it easier to follow. It also
adjusts the bounds to actually account for a fixpoint and not only the
minimum number to pass all checks.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66757
llvm-svn: 369945
ANY_EXTEND_VECTOR_INREG isn't currently marked Legal which prevents SimplifyDemandedBits from turning SIGN/ZERO_EXTEND_VECTOR_INREG into it after op legalization. And even if we did make it Legal, combineExtInVec doesn't do shuffle combining on the VECTOR_INREG nodes until AVX1.
This patch adds a quick hack to combinePMULDQ to directly emit a vector shuffle corresponding to an ANY_EXTEND_VECTOR_INREG operation. This avoids both of those issues without creating any other regressions on our tests. The xop-ifma.ll change here also showed up when I tried to resurrect D56306 and seemed to be the only improvement that patch creates now. This is a more direct way to get the benefit.
Differential Revision: https://reviews.llvm.org/D66436
llvm-svn: 369942
This improves the combine I included in D66504 to handle constants in the upper operands of the concat. If we can constant fold them away we can pull the concat after the bin op. This helps with chains of madd reductions on X86 from loop unrolling. The loop madd reduction pattern creates pmaddwd with half the width of the add that follows it using zeroes to fill the upper bits. If we have two of these added together we can pull the zeroes through the accumulating add and then shrink it.
Differential Revision: https://reviews.llvm.org/D66680
llvm-svn: 369937
By default, the Attributor tracks potential dependences between abstract
attributes based on the issued Attributor::getAAFor queries. This
simplifies the development of new abstract attributes but it can also
lead to spurious dependences that might increase compile time and make
internalization harder (D63312). With this patch, abstract attributes
can opt-out of implicit dependence tracking and instead register
dependences explicitly. It is up to the implementation to make sure all
existing dependences are registered.
Differential Revision: https://reviews.llvm.org/D63314
llvm-svn: 369935
Summary:
This comes as a first step toward processing the DAG nodes in topological order. Doing so ensures that the arguments of a node are combined before the node itself is combined, which exposes more opportunities for optimization and/or reduces the number of patterns a node has to match.
DAGCombiner adding nodes to the worklist in various places causes the nodes to be in a different order from what is expected. In addition, this is redundant because these nodes end up being added to the worklist anyway due to the machinery at line 1621.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66537
llvm-svn: 369927
This is a followup of https://reviews.llvm.org/D66513. The code calling each
section reader should be put into a separate function (readOneSection), so
SampleProfileExtBinaryReader can override it. Otherwise, the base class
SampleProfileExtBinaryBaseReader will need to be aware of all different kinds
of section readers. That is not right.
Differential Revision: https://reviews.llvm.org/D66693
llvm-svn: 369919
Summary:
When reconstructing the CFG of the loop after unrolling,
LoopUnroll could in some cases remove the phi operands of
loop-carried values instead of preserving them, resulting
in undef phi values after loop unrolling.
When doing this reconstruction, avoid removing incoming
phi values for phis in the successor blocks if the successor
is the block we are jumping to anyway.
Patch-by: ebevhan
Reviewers: fhahn, efriedma
Reviewed By: fhahn
Subscribers: bjope, lebedev.ri, zzheng, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66334
llvm-svn: 369886
Summary:
Concat_vectors is more canonical during early DAG combine. For example, it's what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner that can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not.
I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering where insert_subvector probably is what we want to produce so I've enabled this via a boolean passed to the function.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66504
llvm-svn: 369872
Summary:
Adds support for generating the .data section in assembly files for global variables with a non-zero initialization. The support for writing the .data section in XCOFF object files will be added in a follow-on patch. Any relocations are not included in this patch.
Reviewers: hubert.reinterpretcast, sfertile, jasonliu, daltenty, Xiangling_L
Reviewed by: hubert.reinterpretcast
Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, wuzish, shchenz, DiggerLin, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66154
llvm-svn: 369869
These can turn up during multiplication legalization. In principle
these should also apply to smul_lohi, but I wasn't able to figure
out how to produce those with the necessary operands.
Differential Revision: https://reviews.llvm.org/D66380
llvm-svn: 369864
This requires std::initializer_list to be a literal type, which it is
starting with C++14. The downside is that std::bitset is still not
constexpr-friendly so this change contains a re-implementation of most
of it.
Shrinks clang by ~60k.
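What C++14 buys here, in a small demonstration:
```
#include <initializer_list>

// std::initializer_list is a literal type, and C++14 allows loops in
// constexpr functions, so tables like this can be folded at compile time.
constexpr int sum(std::initializer_list<int> IL) {
  int S = 0;
  for (int V : IL)
    S += V;
  return S;
}
static_assert(sum({1, 2, 3}) == 6, "evaluated at compile time");
```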
llvm-svn: 369847
Promoting it from InstCombine's tryToReuseConstantFromSelectInComparison().
Return true if this constant and a constant 'Y' are element-wise equal.
This is identical to just comparing the pointers, with the exception that
for vectors, if only one of the constants has an `undef` element in some
lane, the constants still match.
llvm-svn: 369842
Summary:
`matchThreeWayIntCompare()` looks for
```
select i1 (a == b),
i32 Equal,
i32 (select i1 (a < b), i32 Less, i32 Greater)
```
but both of these selects/compares can be in their commuted forms,
so out of 8 variants, only the two most basic ones are handled.
This fixes regression being introduced in D66232.
Reviewers: spatel, nikic, efriedma, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66607
llvm-svn: 369841
Summary:
If we have e.g.:
```
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
```
the constants `65535` and `65536` are suspiciously close.
We could perform a transformation to deduplicate them:
```
Name: ult
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
=>
%t.inv = icmp ugt i32 %x, 65535
%r = select i1 %t.inv, i32 65535, i32 %y
```
https://rise4fun.com/Alive/avb
While this may seem esoteric, this should certainly be good for vectors
(less constant pool usage) and for opt-for-size, where we need only one constant.
But the real fun part here is that it allows further transformation,
in particular it finishes cleaning up the `clamp` folding,
see e.g. `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`.
We start with e.g.
```
%dont_need_to_clamp_positive = icmp sle i32 %X, 32767
%dont_need_to_clamp_negative = icmp sge i32 %X, -32768
%clamp_limit = select i1 %dont_need_to_clamp_positive, i32 -32768, i32 32767
%dont_need_to_clamp = and i1 %dont_need_to_clamp_positive, %dont_need_to_clamp_negative
%R = select i1 %dont_need_to_clamp, i32 %X, i32 %clamp_limit
```
without this patch we currently produce
```
%1 = icmp slt i32 %X, 32768
%2 = icmp sgt i32 %X, -32768
%3 = select i1 %2, i32 %X, i32 -32768
%R = select i1 %1, i32 %3, i32 32767
```
which isn't really a `clamp` - both comparisons are performed on the original value,
this patch changes it into
```
%1.inv = icmp sgt i32 %X, 32767
%2 = icmp sgt i32 %X, -32768
%3 = select i1 %2, i32 %X, i32 -32768
%R = select i1 %1.inv, i32 32767, i32 %3
```
and then the magic happens! Some further transform finishes polishing it and we finally get:
```
%t1 = icmp sgt i32 %X, -32768
%t2 = select i1 %t1, i32 %X, i32 -32768
%t3 = icmp slt i32 %t2, 32767
%R = select i1 %t3, i32 %t2, i32 32767
```
which is beautiful and just what we want.
Proofs for `getFlippedStrictnessPredicateAndConstant()` for de-canonicalization:
https://rise4fun.com/Alive/THl
Proofs for the fold itself: https://rise4fun.com/Alive/THl
Reviewers: spatel, dmgreen, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66232
llvm-svn: 369840
This just adds the opcode and verifier; it will be used to replace the existing
dynamic alloca handling in a subsequent patch.
Differential Revision: https://reviews.llvm.org/D66677
llvm-svn: 369833
Summary: Removes a not so useful function from DataLayout and cleans up Support/MathExtras.h
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66691
llvm-svn: 369824
Summary:
Here is the commit introducing the fields
https://github.com/llvm/llvm-project/commit/cf6749e4c091
It dates back to 2006 and was used by the AArch64 backend.
There are no more references to these fields in the whole codebase, so I think it's fine.
Reviewers: courbet
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66683
llvm-svn: 369810
If the dependencies are not removed then a late failure (one symbol covered by
the query failing after others have already been resolved) can result in an
attempt to detach the query from an already finalized symbol, resulting in an
assert/crash. This patch fixes the issue by removing query dependencies in
JITDylib::resolve for symbols that meet the required state.
llvm-svn: 369809
When symbols are failed (via MaterializationResponsibility::failMaterialization)
any symbols depending on them will now be moved to an error state. Attempting
to resolve or emit a symbol in the error state (via the notifyResolved or
notifyEmitted methods on MaterializationResponsibility) will result in an error.
If notifyResolved or notifyEmitted return an error due to failure of a
dependence then the caller should log or discard the error and call
failMaterialization to propagate the failure to any queries waiting on the
symbols being resolved/emitted (plus their dependencies).
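For illustration, the expected calling pattern is roughly the following (a minimal
sketch; `resolveOrFail` and its parameters are illustrative, not part of the ORC API,
and the exact signatures may differ):
```
#include "llvm/ExecutionEngine/Orc/Core.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;
using namespace llvm::orc;

// Minimal sketch of the pattern described above: if notifyResolved fails
// because a dependence was failed, log the error and propagate the
// failure to waiting queries via failMaterialization.
void resolveOrFail(MaterializationResponsibility &R, SymbolMap Symbols) {
  if (Error Err = R.notifyResolved(Symbols)) {
    logAllUnhandledErrors(std::move(Err), errs(), "resolve failed: ");
    R.failMaterialization();
  }
}
```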
llvm-svn: 369808
Instead of using custom C++ in `earlySelect` for loads and stores, just import
the patterns.
Remove `earlySelectLoad`, since we can just import the work it's doing.
Some minor changes to how `ComplexRendererFns` are returned for the XRO
addressing modes. If you add immediates in two steps, sometimes they are not
imported properly and you only end up with one immediate. I'm not sure if this
is intentional.
- Update load-addressing-modes.mir to include the instructions we can now
import.
- Add a similar test, store-addressing-modes.mir to show which store opcodes we
currently import, and show that we can pull in shifts etc.
- Update arm64-fastisel-gep-promote-before-add.ll to use FastISel instead of
GISel. This test failed with GISel because GISel folds the gep into the load.
The test checks that FastISel doesn't fold non-pointer-width adds into loads.
GISel, on the other hand, produces a G_CONSTANT of -128 for the add, and then
a G_GEP, which must be pointer-width.
Note that we don't get STRBRoX right now. It seems like the importer can't
handle `FPR8Op:{ *:[Untyped] }:$Rt` source operands. So, those are not currently
supported.
Differential Revision: https://reviews.llvm.org/D66679
llvm-svn: 369806
Summary:
Currently, the Legalizer aborts if it's unable to legalize artifacts. However, it's
possible to combine them after processing the rest of the instructions, because
legalization is likely to generate more artifacts that allow the ArtifactCombiner
to combine them away.
Instead, move illegal artifacts to another list called RetryList and wait until all of the
instructions in InstList are legalized. After that, check if there are any new artifacts and
try to combine them again if that's the case. If not, abort. The idea is similar to D59339,
but the approach is a bit different.
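In outline, the strategy is roughly the sketch below (hand-written; `Instr`,
`isArtifact`, `tryCombineArtifact`, and `legalize` are hypothetical stand-ins for
MachineInstr, the ArtifactCombiner, and the LegalizerHelper):
```
#include <deque>
#include <vector>

struct Instr;                                  // stand-in for MachineInstr
bool isArtifact(Instr &);                      // hypothetical helpers that
bool tryCombineArtifact(Instr &);              // stand in for the real
void legalize(Instr &, std::deque<Instr *> &); // combiner/legalizer

// Park artifacts we cannot combine yet, legalize everything else (which
// may create the artifacts we were missing), then retry the parked ones.
bool legalizeWithRetry(std::deque<Instr *> &InstList) {
  std::vector<Instr *> RetryList;
  while (!InstList.empty()) {
    Instr *I = InstList.front();
    InstList.pop_front();
    if (isArtifact(*I)) {
      if (!tryCombineArtifact(*I))
        RetryList.push_back(I); // wait: more artifacts may appear later
      continue;
    }
    legalize(*I, InstList);     // may append new artifacts to InstList
  }
  bool Progress = false;
  for (Instr *I : RetryList)    // second chance after full legalization
    Progress |= tryCombineArtifact(*I);
  return Progress;              // no progress here means we still abort
}
```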
This patch fixes the issue described above, but the legalizer still may be unable to handle
some cases depending on when to legalize artifacts. So, in the long run, we probably need
a different legalization strategy that handles this dependency in a better way.
Reviewers: dsanders, aditya_nandakumar, qcolombet, arsenm, aemerson, paquette
Reviewed By: dsanders
Subscribers: jvesely, wdng, nhaehnle, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65894
llvm-svn: 369805
Summary:
We can now manifest alignment information in load/store instructions if
the pointer is known to have a better alignment.
Reviewers: uenoku, sstefan1, lebedev.ri
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66567
llvm-svn: 369804
CONCAT_VECTORS and INSERT_SUBVECTOR can both call combineConcatVectorOps,
but we shouldn't produce INSERT_SUBVECTOR from there. We should
keep CONCAT_VECTORS until vector legalization.
Noticed while looking at the madd_quad_reduction test from madd.ll
llvm-svn: 369802
This is a patch split from https://reviews.llvm.org/D66374. It tries to add
a new format of profile called ExtBinary. The format adds a section header
table to the profile and organizes the profile into sections, so future
extensions like adding a new section or extending an existing one will be
easier while keeping backward compatibility feasible.
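As a rough illustration (field names are mine, not the actual ExtBinary layout),
a section header table of this kind lets readers skip sections they do not
understand:
```
#include <cstdint>
#include <vector>

// Illustrative sketch only; the real layout is defined by the profile
// reader/writer in LLVM. Each entry records where a section lives, so a
// reader can skip unknown section kinds and stay backward compatible.
struct SectionHeaderEntry {
  uint32_t Kind;   // section kind; unknown kinds are simply skipped
  uint64_t Flags;  // per-section flags for future extension
  uint64_t Offset; // absolute file offset of the section
  uint64_t Size;   // section size in bytes
};

struct ExtBinaryHeaderSketch {
  uint64_t Magic;
  uint64_t Version;
  std::vector<SectionHeaderEntry> Sections;
};
```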
Differential Revision: https://reviews.llvm.org/D66513
llvm-svn: 369798
The smin opcode and friends for v1i128 are incorrectly marked as legal for PPC.
Change them to expand.
Differential Revision: https://reviews.llvm.org/D64960
llvm-svn: 369797
Started implementing the vector case and realized the scalar case hadn't handled the GEP producing a different type than the base correctly. It's entertaining seeing what slips through review when we're focused on the 'hard' parts. :(
Also adding an extra vector test as it happened to be in workspace and wasn't worth separating.
llvm-svn: 369795
This generalizes the isGEPKnownNonNull rule from ValueTracking to apply when we do not know if the base is non-null, and thus need to replace one condition with another.
The core notion is that an inbounds GEP can only produce null if the base pointer is null and the offset is zero. If the offset is non-zero, the "inbounds" marker makes the result poison, so we are free to ignore that case. Similarly, there is no case in which a non-null base can produce a null result without generating poison.
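As a hand-written model of the rule (not the actual ValueTracking code):
```
// Model of the reasoning above, not the ValueTracking implementation.
// A non-poison 'getelementptr inbounds Base, Offset' can only be null
// when Base is null and Offset is zero; a null base with a non-zero
// offset would be poison, which we are free to ignore.
bool gepKnownNonNull(bool BaseKnownNonNull, bool OffsetKnownNonZero) {
  return BaseKnownNonNull || OffsetKnownNonZero;
}
```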
Differential Revision: https://reviews.llvm.org/D66608
llvm-svn: 369789
Summary:
We already use the fact that an object of known size X cannot alias
another object accessed with size Y > X. With this commit, we use
dereferenceability information to determine a lower bound for Y instead of
relying only on the user-provided query size.
The result for @global_and_deref_arg_2() and @local_and_deref_ret_2()
in test/Analysis/BasicAA/dereferenceable.ll improved with this patch.
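Schematically (a sketch, not the BasicAA code; names are illustrative):
```
#include <algorithm>
#include <cstdint>

// Sketch of the size-based rule: an object at most ObjectSize bytes big
// cannot fully contain an access of more than ObjectSize bytes. The
// patch raises the lower bound for the access size using known
// dereferenceability, instead of relying only on the query size.
bool cannotAliasBySize(uint64_t ObjectSize, uint64_t QuerySize,
                       uint64_t KnownDerefBytes) {
  uint64_t AccessLowerBound = std::max(QuerySize, KnownDerefBytes);
  return AccessLowerBound > ObjectSize;
}
```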
Reviewers: asbirlea, chandlerc, hfinkel, sanjoy
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66157
llvm-svn: 369786
Summary:
If the unique return value is a constant we now replace call uses with
that constant.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66551
llvm-svn: 369785
Summary:
If we have a loop in which the dereferenceability of a pointer decreases,
we used to decrease it slowly, iteration by iteration, leading to a timeout.
With this patch we detect such circular reasoning and indicate a
fixpoint early.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66558
llvm-svn: 369784
Summary:
Since clang does not support comment-style fallthrough annotations,
these should be switched to the macros defined in Compiler.h. This
requires some fixes to Compiler.h.
Original patch: https://reviews.llvm.org/D66487
Reviewers: nickdesaulniers, aaron.ballman, xbolva00, rsmith
Reviewed By: nickdesaulniers, aaron.ballman, rsmith
Subscribers: rsmith, sfertile, ormris, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66609
llvm-svn: 369782
Patch showing the effect of enabling bool vector oversimplification.
Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify:
insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i
preventing the removal of the AND that clears the upper bits of the result.
Differential Revision: https://reviews.llvm.org/D53022
llvm-svn: 369780
LiveDebugValues gives variable locations to blocks, but it should also take
away. There are various circumstances where a variable location is known
until a loop backedge with a different location is detected. In those
circumstances, where there's no agreement on the variable location, it
should be undef / removed, otherwise we end up picking a location that's
valid on some loop iterations but not others.
However, LiveDebugValues doesn't currently do this, see the new testcase
attached. Without this patch, the location of !3 is assumed to be %bar
through the loop. Once it's added to the In-Locations list, it's never
removed, even though the later dbg.value(0... of !3 makes the location
un-knowable.
This patch checks during block-location-joining to see whether any
previously-present locations have been removed in a predecessor. If they
have, the live-ins have changed, and the block needs reprocessing.
Similarly, in transferTerminator, assign rather than |= the Out-Locations
after processing a block, as we may have deleted some previously valid
locations. This will mean that LiveDebugValues performs more propagation
-- but that's necessary for it being correct.
Differential Revision: https://reviews.llvm.org/D66599
llvm-svn: 369778
Summary:
If we have a negative inbounds offset, dereferenceability "grows". However,
until we handle the overflow that can occur in the
dereferenceable bytes, and the problem with loops, we simply do not grow
the state.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66557
llvm-svn: 369771
If the number of potentially returned values did not change since the last
traversal, we do not need to visit the returned values again. This works
as we only add values to the returned values set now.
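A minimal sketch of this memoization (illustrative names, not the Attributor code):
```
#include <cstddef>
#include <set>

// The returned-values set only grows, so an unchanged size since the
// last traversal implies unchanged contents and nothing new to visit.
template <typename ValueT>
bool needsRevisit(const std::set<ValueT> &ReturnedValues,
                  std::size_t &LastSeenSize) {
  if (ReturnedValues.size() == LastSeenSize)
    return false;
  LastSeenSize = ReturnedValues.size();
  return true;
}
```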
Differential Revision: https://reviews.llvm.org/D66484
llvm-svn: 369770
Summary:
When we have new attributes and we end the fixpoint iteration because
the iteration limit is reached, we need to treat the new ones as if they
changed in the last iteration, as they might have.
This adds a test for which we should not derive anything regardless of
the iteration limit, e.g., if we abort there should not be any
attributes manifested in the IR.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66549
llvm-svn: 369768
Summary:
Keep aliasees alive if their alias is live, otherwise we end up with an
alias to a declaration, which is invalid. This can happen when the
aliasee is weak and non-prevailing.
This fix exposed the fact that we were then attempting to internalize
the weak symbol, which was not exported as it was not prevailing. We
should not internalize interposable symbols in general, unless this is
the prevailing copy, since it can lead to incorrect inlining and other
optimizations. Most of the changes in this patch are due to the
restructuring required to pass down the prevailing callback.
Finally, while implementing the test cases, I found that in the case of
a weak aliasee that is still marked not live because its alias isn't
live, after dropping the definition we incorrectly marked the
declaration with weak linkage when resolving prevailing symbols in the
module. This was due to some special case handling for symbols marked
WeakLinkage in the summary located before instead of after a subsequent
check for the symbol being a declaration. It turns out that we don't
actually need this special case handling any more (looking back at the
history, when that was added the code was structured quite differently)
- we will correctly mark with weak linkage further below when the
definition hasn't been dropped.
Fixes PR42542.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66264
llvm-svn: 369766
Given an instruction I, the MustBeExecutedContextExplorer makes it easy to
traverse instructions that are guaranteed to be executed whenever
I is. For now, these instructions have to be statically "after" I, in
the same or a different basic block.
This patch also adds a pass which prints the must-be-executed-context
for each instruction in a module. It is used to test the
MustBeExecutedContextExplorer, for now on the examples given in the
class comment of the MustBeExecutedIterator.
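The simplest version of such a walk looks roughly like this (a sketch under the
"statically after" restriction; it also ignores instructions that may throw or
not return, which the real explorer must account for):
```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Sketch only: visit instructions guaranteed to execute whenever I does,
// restricted to instructions statically "after" I. Within a block each
// following instruction must execute (ignoring may-throw/no-return
// calls); across blocks we only follow a unique successor.
void visitMustBeExecutedContext(Instruction &I,
                                void (*Visit)(const Instruction &)) {
  const BasicBlock *BB = I.getParent();
  const Instruction *Cur = &I;
  while (true) {
    Visit(*Cur);
    if (Cur != BB->getTerminator()) {
      Cur = Cur->getNextNode();
      continue;
    }
    BB = BB->getSingleSuccessor();
    if (!BB)
      break; // control flow may diverge here; stop the traversal
    Cur = &BB->front();
  }
}
```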
Differential Revision: https://reviews.llvm.org/D65186
llvm-svn: 369765
Now `lw/sw $reg, sym+offset` pseudo instructions for a global symbol `sym`
are lowered into the following three instructions.
```
lw $reg, %got(symbol)($gp)
addiu $reg, $reg, offset
lw/sw $reg, 0($reg)
```
It's possible to reduce the number of instructions by taking the offset
into account in the final `lw/sw` instruction. This patch implements that
optimization.
```
lw $reg, %got(symbol)($gp)
lw/sw $reg, offset($reg)
```
Differential Revision: https://reviews.llvm.org/D66553
llvm-svn: 369756
Now the pseudo instruction `la $6, symbol+8($6)` expands into the following
chain of instructions:
```
lw $1, %got(symbol+8)($gp)
addiu $1, $1, 8
addu $6, $1, $6
```
This is incorrect. When a linker handles the `R_MIPS_GOT16` relocation,
it does not expect to get any addend and breaks on an assertion; otherwise
it would have to create a new GOT entry for each unique "sym + offset" pair.
The offset for a global symbol should be added to the result of the GOT
load by a separate `add` instruction.
The patch fixes the problem by stripping the offset off the expression
passed to `%got`. Interestingly, the current code already inserts
a separate `add` instruction.
Differential Revision: https://reviews.llvm.org/D66552
llvm-svn: 369755
This is a follow up of r369642.
This patch assigns a ReadAfterLd to every implicit register use of instructions
CMPXCHG8B and CMPXCHG16B. Perf micro-benchmarks show that the implicit
registers are read 3cy after the start of execution.
llvm-svn: 369750
Excluding ADC/SBB and the bit-test instructions (BTR/BTS/BTC), the observed
latency of all other RMW integer arithmetic/logic instructions is 6cy and not
5cy.
Example (ADD):
```
addb $0, (%rsp) # Latency: 6cy
addb $7, (%rsp) # Latency: 6cy
addb %sil, (%rsp) # Latency: 6cy
addw $0, (%rsp) # Latency: 6cy
addw $511, (%rsp) # Latency: 6cy
addw %si, (%rsp) # Latency: 6cy
addl $0, (%rsp) # Latency: 6cy
addl $511, (%rsp) # Latency: 6cy
addl %esi, (%rsp) # Latency: 6cy
addq $0, (%rsp) # Latency: 6cy
addq $511, (%rsp) # Latency: 6cy
addq %rsi, (%rsp) # Latency: 6cy
```
The same latency profile applies to SUB/AND/OR/XOR/INC/DEC.
The observed latency of ADC/SBB is 7-8cy. So we need a different write to model
those. Latency of BTS/BTR/BTC is not fixed by this patch (they are much slower
than what the model for btver2 currently reports).
Differential Revision: https://reviews.llvm.org/D66636
llvm-svn: 369748
ExtName should not be decorated, just like Name.
This avoids double decoration on symbols in import libraries
that use = for renaming functions. (Weak aliases, which use ==,
worked fine with respect to decoration.)
Differential Revision: https://reviews.llvm.org/D66617
llvm-svn: 369747
If the accumulator and either of the multiply operands are negatable then we can negate the entire expression.
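The underlying identity is just -(a*b + c) == (-a)*b + (-c): negating the
accumulator and one multiply operand negates the whole multiply-accumulate.
A tiny check of the identity (illustration only, not the patch's code):
```
#include <cassert>

int main() {
  // Negating one multiply operand and the accumulator negates the whole
  // multiply-accumulate expression.
  double a = 3.0, b = -5.0, c = 7.0;
  assert(-(a * b + c) == (-a) * b + (-c));
  assert(-(a * b + c) == a * (-b) + (-c));
  return 0;
}
```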
Differential Revision: https://reviews.llvm.org/D63141
llvm-svn: 369746
Summary:
Add support for gfx10, where all DPP operations are confined to work
within a single row of 16 lanes, and wave32.
Reviewers: arsenm, sheredom, critson, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65644
llvm-svn: 369745
The st_other field of a symbol usually contains its visibility.
The other bits are usually 0, though some targets, like
MIPS, can set them using the named bit field values.
The problem is that there is currently no way to set an arbitrary value,
though that might be useful for our test cases.
In this patch I introduce a way to set st_other to any numeric
value using the new StOther field.
I added a test and simplified the existing one to show the effect/benefit.
Differential revision: https://reviews.llvm.org/D66583
llvm-svn: 369742
For v2i32 we only feed 2 i8 elements into the psadbw instructions
with 0s in the other 14 bytes. The resulting psadbw instruction
will produce zeros in bits [127:16] of the output. We need to take
the result and feed it to a v2i32 add where the first element
includes bits [15:0] of the sad result. The other element should
be zero.
Prior to this patch we were using a truncate to take 0 from
bits 95:64 of the psadbw. This results in a pshufd to move those
bits to 63:32. But since we also have zeroes in bits 63:32 of
the psadbw output, we should just take those bits.
The previous code probably worked better with promoting legalization,
but now we use widening legalization. I've preserved the old
behavior if -x86-experimental-vector-widening-legalization=false
until we get that option removed.
llvm-svn: 369733
We were computing the loop exit value, but not ensuring the addrec belonged to the loop whose exit value we were computing. I couldn't actually trip this; the test case shows the basic setup which *might* trip this, but none of the variations I've tried actually do.
llvm-svn: 369730
The alignment is calculated incorrectly, so sometimes aligned mov instructions are not generated, as shown by the example below:
```
// b.cc
typedef long long index;
extern "C" index g_tid;
extern "C" index g_num;
void add3(float* __restrict__ a, float* __restrict__ b, float* __restrict__ c) {
index n = 64*1024;
index m = 16*1024;
index k = 4*1024;
index tid = g_tid;
index num = g_num;
__builtin_assume_aligned(a, 32);
__builtin_assume_aligned(b, 32);
__builtin_assume_aligned(c, 32);
for (index i0=tid*k; i0<m; i0+=num*k)
for (index i1=0; i1<n*m; i1+=m)
for (index i2=0; i2<k; i2++)
c[i1+i0+i2] = b[i0+i2] + a[i1+i0+i2];
}
```
Compile with `clang b.cc -Ofast -march=skylake -mavx2 -S`
```
vmovaps -224(%rdi,%rbx,4), %ymm0
vmovups -192(%rdi,%rbx,4), %ymm1 # should be movaps
vmovups -160(%rdi,%rbx,4), %ymm2 # should be movaps
vmovups -128(%rdi,%rbx,4), %ymm3 # should be movaps
vaddps -224(%rsi,%rbx,4), %ymm0, %ymm0
vaddps -192(%rsi,%rbx,4), %ymm1, %ymm1
vaddps -160(%rsi,%rbx,4), %ymm2, %ymm2
vaddps -128(%rsi,%rbx,4), %ymm3, %ymm3
vmovaps %ymm0, -224(%rdx,%rbx,4)
vmovups %ymm1, -192(%rdx,%rbx,4) # should be movaps
vmovups %ymm2, -160(%rdx,%rbx,4) # should be movaps
vmovups %ymm3, -128(%rdx,%rbx,4) # should be movaps
```
Differential Revision: https://reviews.llvm.org/D66575
Patch by Dun Liang
llvm-svn: 369723
One problem with untagging memory in landing pads is that it only works
correctly if the function that catches the exception is instrumented.
If the function is uninstrumented, we have no opportunity to untag the
memory.
To address this, replace landing pad instrumentation with personality function
wrapping. Each function with an instrumented stack has its personality function
replaced with a wrapper provided by the runtime. Functions that did not have
a personality function to begin with also get wrappers if they may be unwound
past. As the unwinder calls personality functions during stack unwinding,
the original personality function is called and the function's stack frame is
untagged by the wrapper if the personality function instructs the unwinder
to keep unwinding. If unwinding stops at a landing pad, the function is
still responsible for untagging its stack frame if it resumes unwinding.
The old landing pad mechanism is preserved for compatibility with old runtimes.
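In terms of the Itanium unwind interface, the wrapper is conceptually something
like the sketch below (`real_personality` and `untag_frame` are illustrative
names, not the actual runtime API):
```
#include <cstdint>
#include <unwind.h>

// Conceptual sketch of the wrapping scheme; helper names are
// hypothetical, not the real HWASAN runtime interface.
extern "C" _Unwind_Reason_Code
real_personality(int, _Unwind_Action, uint64_t, _Unwind_Exception *,
                 _Unwind_Context *);
void untag_frame(_Unwind_Context *); // hypothetical: untags this frame

extern "C" _Unwind_Reason_Code
personality_wrapper(int Version, _Unwind_Action Actions,
                    uint64_t ExceptionClass, _Unwind_Exception *Exc,
                    _Unwind_Context *Ctx) {
  _Unwind_Reason_Code RC =
      real_personality(Version, Actions, ExceptionClass, Exc, Ctx);
  // If the original personality tells the unwinder to keep going, this
  // frame is being unwound past: untag its stack memory now. If we stop
  // at a landing pad instead, the function itself untags on resume.
  if (RC == _URC_CONTINUE_UNWIND)
    untag_frame(Ctx);
  return RC;
}
```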
Differential Revision: https://reviews.llvm.org/D66377
llvm-svn: 369721
Prefer `MCFixupKind` where possible and add getTargetKind() to
convert to `unsigned` when needed rather than scattering cast
operators around the place.
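The idea, sketched with an illustrative enum (the real accessor lives on MCFixup,
and the enumerators below are made up):
```
// Illustrative sketch: keep the strongly-typed kind everywhere and name
// the one place where the raw value is needed, instead of scattering
// unsigned casts around the code.
enum MCFixupKindSketch : unsigned { FK_NONE, FK_Data_4 };

struct FixupSketch {
  MCFixupKindSketch Kind;
  unsigned getTargetKind() const { return unsigned(Kind); }
};
```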
Differential Revision: https://reviews.llvm.org/D59890
llvm-svn: 369720
I noticed another instance of the issue where references to aliases were
being replaced with aliasees, this time in InstCombine. In the instance that
I saw it turned out to be only a QoI issue (a symbol ended up being missing
from the symbol table due to the last reference to the alias being removed,
preventing HWASAN from symbolizing a global reference), but it could easily
have manifested as incorrect behaviour.
Since this is the third such issue encountered (previously: D65118, D65314)
it seems to be time to address this common error/QoI issue once and for all
and make the strip* family of functions not look through aliases.
Includes a test for the specific issue that I saw, but no doubt there are
other similar bugs fixed here.
As with D65118 this has been tested to make sure that the optimization isn't
load bearing. I built Clang, Chromium for Linux, Android and Windows as well
as the test-suite and there were no size regressions.
Differential Revision: https://reviews.llvm.org/D66606
llvm-svn: 369697
The x86 tests are now broken (in particular add-scalar.ll now hits the
DAG fallback) due to not handling G_UADDO. The DAG x86 backend has a
custom lowering for this, so that will need to be implemented.
llvm-svn: 369673
Local symbols in the indirect symbol table contain the value
`INDIRECT_SYMBOL_LOCAL` and the corresponding __pointers entry must
contain the address of the target.
In r349060, I added support for local symbols in the indirect symbol
table, which was checking if the symbol `isDefined` && `!isExternal` to
determine if the symbol is local or not.
It turns out that `isDefined` will return false if the use of the
symbol comes before its definition, and we'll again generate .long 0,
which refers to the symbol at address 0x0.
Instead of doing that, use GlobalValue::hasLocalLinkage() to check if
the symbol is local.
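Schematically (a sketch of the check described above):
```
#include "llvm/IR/GlobalValue.h"

using namespace llvm;

// The IR-level linkage is stable regardless of where the use appears,
// unlike MCSymbol::isDefined(), which depends on whether the definition
// has been seen yet.
bool isIndirectSymbolLocal(const GlobalValue &GV) {
  return GV.hasLocalLinkage();
}
```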
Differential Revision: https://reviews.llvm.org/D66563
llvm-svn: 369671
It appears the FIXME here was handled at some point. r159728 from 2012 seems to be at least a portion of the fix.
Differential Revision: https://reviews.llvm.org/D66570
llvm-svn: 369665
Patch https://reviews.llvm.org/D43256 introduced a more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. If the user program doesn't behave as BranchProbabilityInfo.cpp expects, the layout may be worse.
To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 369664
Summary: These nodes end up being processed regardless due to DAGCombiner ensuring arguments are processed. This changes the order in which nodes are processed, which fixes an issue on PowerPC.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri, mcberg2017, stefanp, hfinkel
Subscribers: nemanjai, MaskRay, jsji, steven.zhang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66548
llvm-svn: 369662
A lot of places in the code combine checks for both ABI (SVR4/Darwin/AIX) and
addressing mode (64-bit vs 32-bit). In an attempt to make some of the code more
readable I've added a couple of functions that combine checking for the ELF ABI and
64-bit/32-bit code at once. As we add more AIX support I intend to add similar
functions for the AIX ABI.
Differential Revision: https://reviews.llvm.org/D65814
llvm-svn: 369658
Previously we would get the csect a symbol was contained in through its
fragment. This works only if we are writing an object file, and only for
defined symbols. To fix this we set the containing csect explicitly on the
MCSymbolXCOFF object.
Differential Revision: https://reviews.llvm.org/D66032
llvm-svn: 369657
Summary: In D65402, I want to get DerefState from AADereferenceable, but it was not accessible. This patch moves the DerefState definition into Attributor.h and makes AADereferenceable inherit from StateWrapper.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66585
llvm-svn: 369653
Summary:
When we print the IR with --print-after/before-*,
SlotIndexes will be printed whenever available (i.e., when we haven't freed them).
This introduces some noise when we try to compare the IR
among different optimizations.
e.g.:
-print-before=machine-cp will print SlotIndexes for the 1st machine-cp
pass, but NOT for the 2nd machine-cp;
-print-after=machine-cp will NOT print SlotIndexes for either
machine-cp pass.
So the SlotIndexes printed for the 1st pass introduce noise when diffing these IRs.
This patch introduces an option to hide indexes.
Reviewers: stoklund, thegameg, qcolombet
Reviewed By: thegameg
Subscribers: hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66500
llvm-svn: 369650
This fixes https://bugs.llvm.org/show_bug.cgi?id=40337.
Previously, it was always assumed that relocations referenced symbols in the static symbol table.
Now, if the Link field references a section called ".dynsym", it will look up these symbols
in the dynamic symbol table.
This patch is heavily based on D59097 by James Henderson.
Differential revision: https://reviews.llvm.org/D66532
llvm-svn: 369645
On Jaguar, XCHG has a latency of 1cy and decodes to 2 macro-opcodes. Maximum
throughput for XCHG is 1 IPC. The byte exchange has worse latency and decodes to
1 extra uOP; maximum observed throughput is 0.5 IPC.
```
xchgb %cl, %dl # Latency: 2cy - uOPs: 3 - 2 ALU
xchgw %cx, %dx # Latency: 1cy - uOPs: 2 - 2 ALU
xchgl %ecx, %edx # Latency: 1cy - uOPs: 2 - 2 ALU
xchgq %rcx, %rdx # Latency: 1cy - uOPs: 2 - 2 ALU
```
The reg-mem forms of XCHG are atomic operations with an observed latency of
16cy. The resource usage is similar to the XCHGrr variants. The biggest
difference is obviously the bus-locking, which prevents the LS unit from issuing
other memory uOPs in parallel until the unlocking store uOP is executed.
```
xchgb %cl, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy
xchgw %cx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy
xchgl %ecx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy
xchgq %rcx, (%rsp) # Latency: 16cy - uOPs: 3 - ECX latency: 11cy
```
The exchanged in/out register operand becomes available after 11cy from the
start of execution. Added test xchg.s to verify that we correctly see that
register write committed in 11cy (and not 16cy).
Reg-reg XADD instructions have the same latency/throughput as the byte
exchange (register-register variant).
```
xaddb %cl, %dl # latency: 2cy - uOPs: 3 - 3 ALU
xaddw %cx, %dx # latency: 2cy - uOPs: 3 - 3 ALU
xaddl %ecx, %edx # latency: 2cy - uOPs: 3 - 3 ALU
xaddq %rcx, %rdx # latency: 2cy - uOPs: 3 - 3 ALU
```
The non-atomic RM variants have a latency of 11cy, and decode to 4
macro-opcodes. They still consume 2 ALU pipes, and the exchange in/out register
operand becomes available in 3cy (it matches the 'load-to-use latency').
```
xaddb %cl, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU
xaddw %cx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU
xaddl %ecx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU
xaddq %rcx, (%rsp) # latency: 11cy - uOPs: 4 - 3 ALU
```
The atomic XADD variants execute in 16cy. The in/out register operand is
available after 11cy from the start of execution.
```
lock xaddb %cl, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddw %cx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddl %ecx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddq %rcx, (%rsp) # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
```
Added test xadd.s to verify those latencies as well as read-advance values.
Differential Revision: https://reviews.llvm.org/D66535
llvm-svn: 369642