When we have
```
a = G_OR x, x
```
or
```
b = G_AND y, y
```
We can drop the G_OR/G_AND and just use x/y respectively.
Also update arm64-fallback.ll because there was an or in there which hits this
transformation.
Differential Revision: https://reviews.llvm.org/D77105
Implement identity combines for operations like the following:
```
%a = G_SUB %b, 0
```
This can just be replaced with %b.
Over CTMark, this gives some minor size improvements at -O3.
Differential Revision: https://reviews.llvm.org/D76640
This reverts commit b3297ef051.
This change is incorrect. The current semantic of null in the IR is a
pointer with the bitvalue 0. It is not a cast from an integer 0, so
this should preserve the pointer type.
Summary:
This code was throwing away the opcode for a boolean, which was then
reconstructing the opcode from that boolean. Just pass the opcode, and
forget the boolean.
Reviewers: srhines
Reviewed By: srhines
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77100
Summary:
This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf.
The intrinsics takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry.
`SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel.
Author: csyonghe <yonghe@google.com>
Reviewers: tpr, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76440
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77059
Summary:
Also deprecate getOriginalAlignment, getAlignment will take much more time as it is pervasive through the codebase (including TableGened files).
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76933
There was already a test case for landingpads to handle this case, but I
had forgotten to consider PHI instructions preceding the EH_LABEL in the
landingpad.
PR45261
MC already knows how to emulate the .weak directive (with its ELF
semantics; i.e., an undefined weak symbol resolves to 0, and a defined
weak symbol has lower link precedence than a strong symbol of the same
name) using COFF weak externals. Plumb this through the ASM printer too,
so that definitions marked with __attribute__((weak)) at the language
level (which gets translated to weak linkage at the IR level) have the
corresponding .weak directive emitted. Note that declarations marked
with __attribute__((weak)) at the language level (which translates to
extern_weak at the IR level) already have .weak directives emitted.
Weak*/linkonce* symbols without an associated comdat (in particular, ones
generated with __attribute__((weak)) in C/C++) were earlier emitted as
normal unique globals, as the comdat is required to provide the linkonce
semantics. This change makes sure they are emitted as .weak instead,
allowing other symbols to override them.
Rename the existing coff-weak.ll test to coff-linkonce.ll. I'm not
quite sure what that test covers, since the behavior being tested in it
(the emission of a one_only section) is just a result of passing
-function-sections to llc; the linkonce_odr makes no difference.
Add a new coff-weak.ll which tests the new directive emission.
Based on an previous patch by Shoaib Meenai.
Differential Revision: https://reviews.llvm.org/D44543
When we see this:
```
%a = COPY $physreg
...
SOMETHING implicit-def $physreg
...
%b = COPY $physreg
```
The two copies are not equivalent, and so we shouldn't perform any folding
on them.
When we have two instructions which use a physical register check that they
define the same virtual register(s) as well.
e.g., if we run into this case
```
%a = COPY $physreg
...
%b = COPY %a
```
we can say that the two copies are the same, and can be folded.
Differential Revision: https://reviews.llvm.org/D76890
Make these behave the same way unsafe-fp-math and co. The command line
flag should add the attribute to functions that do not already have
it, and leave existing attributes. The attribute is the actual
implementation, but the flag is useful in some testing situations.
AMDGPU has a variety of tests with denormals enabled/disabled that
would require a painful level of test duplication without a flag. This
doesn't expose setting the separate input/output modes, or add a flag
for the f32 version yet.
Tests will be included in future patch.
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76925
Summary: This patch is the first effort to adding basic optimizations for FREEZE in SelDag.
Reviewers: spatel, lebedev.ri
Reviewed By: spatel
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76707
These transforms rely on a vector reduction flag on the SDNode
set by SelectionDAGBuilder. This flag exists because SelectionDAG
can't see across basic blocks so SelectionDAGBuilder is looking
across and saving the info. X86 is the only target that uses this
flag currently. By removing the X86 code we can remove the flag
and the SelectionDAGBuilder code.
This pass adds a dedicated IR pass for X86 that looks across the
blocks and transforms the IR into a form that the X86 SelectionDAG
can finish.
An advantage of this new approach is that we can enhance it to
shrink the phi nodes and final reduction tree based on the zeroes
that we need to concatenate to bring the partially reduced
reduction back up to the original width.
Differential Revision: https://reviews.llvm.org/D76649
SUMMARY:
SUMMARY
for a source file "test.c"
void foo() {};
llc will generate assembly code as (assembly patch)
.globl foo
.globl .foo
.csect foo[DS]
foo:
.long .foo
.long TOC[TC0]
.long 0
and symbol table as (xcoff object file)
[4] m 0x00000004 .data 1 unamex foo
[5] a4 0x0000000c 0 0 SD DS 0 0
[6] m 0x00000004 .data 1 extern foo
[7] a4 0x00000004 0 0 LD DS 0 0
After first patch, the assembly will be as
.globl foo[DS] # -- Begin function foo
.globl .foo
.align 2
.csect foo[DS]
.long .foo
.long TOC[TC0]
.long 0
and symbol table will as
[6] m 0x00000004 .data 1 extern foo
[7] a4 0x00000004 0 0 DS DS 0 0
Change the code for the assembly path and xcoff objectfile patch for llc.
Reviewers: Jason Liu
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D76162
Summary:
The existing helper function can only create a libcall to functions available in
RTLIB. Add a helper function that can create a libcall to a given function name
using the provided calling convention.
Reviewers: aditya_nandakumar, t.p.northover, rovka, arsenm, dsanders
Reviewed By: arsenm
Subscribers: wdng, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76845
In some scalarize/split result methods (unary, binary, ...), flags in
SDNode were not passed down, which may lead to unexpected results in
unsafe float-point optimization. This patch fixes them. (maybe not
complete)
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D76832
For some operations, the type is unimportant and only the number of
bits matters. For example I don't want to treat <4 x s8> as a legal
type, but I also don't want to decompose loads of this into smaller
pieces to get legal register types.
On AMDGPU in SelectionDAG, we legalize a number of operations (most
notably load and store) by coercing all types to vectors of i32. For
GlobalISel, I'm trying very hard to avoid doing this for every type,
but I don't think this strategy can be completely avoided. I'm trying
to avoid bitcasts for any legitimately legal type we can operate on,
since the intervening bitcasts have proven to be a hassle.
For loads, I think I can get away without ever casting the result
type, and handling any arbitrary bitwidth during selection (I will
eventually want new tablegen support to help with this, rather than
having to add every possible type as legal). The unmerge required to
do anything with the value should expand to the expected shifts. This
is trickier for stores, since it would now require handling a wide
array of truncates during selection which I don't want.
Future potentially interesting case are for vector indexing, where
sub-dword type should be indexed in s32 pieces.
Record the address of a tail-calling branch instruction within its call
site entry using DW_AT_call_pc. This allows a debugger to determine the
address to use when creating aritificial frames.
This creates an extra attribute + relocation at tail call sites, which
constitute 3-5% of all call sites in xnu/clang respectively.
rdar://60307600
Differential Revision: https://reviews.llvm.org/D76336
The current implementation collects all Preds/Succs of a Dep of kind Output, creating a long chain and subsequently a schedule with an unnecessarily large II.
Was this done on purpose for a reason I'm missing?
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D75424
When we find something like this:
```
%a:_(s32) = G_SOMETHING ...
...
%select:_(s32) = G_SELECT %cond(s1), %a, %a
```
We can remove the select and just replace it entirely with `%a` because it's
always going to result in `%a`.
Same if we have
```
%select:_(s32) = G_SELECT %cond(s1), %a, %b
```
where we can deduce that `%a == %b`.
This implements the following cases:
- `%select:_(s32) = G_SELECT %cond(s1), %a, %a` -> `%a`
- `%select:_(s32) = G_SELECT %cond(s1), %a, %some_copy_from_a` -> `%a`
- `%select:_(s32) = G_SELECT %cond(s1), %a, %b` -> `%a` when `%a` and `%b`
are defined by identical instructions
This gives a few minor code size improvements on CTMark at -O3 for AArch64.
Differential Revision: https://reviews.llvm.org/D76523
I think we can save the MRI argument from these since it's in
GISelKnownBits already, but currently not accessible.
Implementation deferred to avoid dependency on other patches.
Otherwise, the Win64 unwinder considers direct branches to such empty
trailing BBs to be a branch out of the function. It treats such a branch
as a tail call, which can only be part of an epilogue. If the unwinder
misclassifies such a branch as part of the epilogue, it will fail to
unwind the stack further. This can lead to bad stack traces, or failure
to handle exceptions properly. This is described in
https://llvm.org/PR45064#c4, and by the comment at the top of the
X86AvoidTrailingCallPass.cpp file.
It should be safe to insert int3 for such blocks. An empty trailing BB
that reaches this pass is pretty much guaranteed to be unreachable. If
a program executed such a block, it would fall off the end of the
function.
Most of the complexity in this patch comes from threading through the
"EHFuncletEntry" boolean on the MIRParser and registering the pass so we
can stop and start codegen around it. I used an MIR test because we
should teach LLVM to optimize away these branches as a follow-up.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D76531
Summary:
Add new generic MIR opcodes G_SADDSAT etc. Add support in IRTranslator
for translating the saturating add/subtract intrinsics to the new
opcodes.
Reviewers: aemerson, dsanders, paquette, arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76600
We have some long-standing missing shuffle optimizations that could
use this transform via VectorCombine now:
https://bugs.llvm.org/show_bug.cgi?id=35454
(and we still don't get that case in the backend either)
This function is apparently templated because there's existing code
in IR that treats mask values as unsigned and backend code that
treats masks values as signed.
The mask values are not endian-dependent (as shown by the existing
bitcast transform from DAGCombiner).
Differential Revision: https://reviews.llvm.org/D76508
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: dylanmckay, sdardis, nemanjai, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76551
When decided whether to generate a post-inc load/store, look at the
other memory nodes that use the same base address and, if any proceed
the current node, then don't do the combine.
The change only seems to be affecting the Arm backend, which I was
surprised at, but it appears to fix a lot of our issues around MVE
masked load/stores having to store a temporary address after an early
post-increment on a shared base address.
Differential Revision: https://reviews.llvm.org/D75847
Extract the decision to combine into a post-inc address into a
couple of functions to make the logic more clear and re-usable.
Differential Revision: https://reviews.llvm.org/D76060
Summary:
Widening G_UNMERGE_VALUES to a type which is larger than the
original source type is the same as widening it to the same
type as the source type: in both cases, G_UNMERGE_VALUES has
to be replaced with bit arithmetic which. Although the arithmetic
itself is independent of whether the source type is smaller
or equal to the widen type, widening the source type to the
widen type should result in less artifacts being emitted,
since this is the type that the user explicitly requested.
Reviewers: arsenm, dsanders, aemerson, aditya_nandakumar
Reviewed By: arsenm, dsanders
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76494
For folding pattern `x-(fma y,z,u*v) -> (fma -y,z,(fma -u,v,x))`, if
`yz` is 1, `uv` is -1 and `x` is -0, sign of result would be changed.
Differential Revision: https://reviews.llvm.org/D76419
Technically we can permit EXTLOAD of the LHS operand but only if all the extended bits are shifted out. Until we test coverage for that case, I'm just disabling this to fix PR45265.
-fuse-init-array is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false.
The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called.
clang -target i386 -c a.c
clang -target x86_64 -c a.c
This cleanup fixes this as a bonus.
X86SpeculativeLoadHardeningPass::tracePredStateThroughCall can call
MCContext::createTempSymbol before TargetLoweringObjectFileELF::Initialize().
We need to call TargetLoweringObjectFileELF::Initialize() ealier.
test/CodeGen/X86/speculative-load-hardening-indirect.ll
Differential Revision: https://reviews.llvm.org/D71360
Use the advanceToLowerBound operation available on CoalescingBitVector
iterators to speed up collection of variables which reside within some
set of registers.
The speedup comes from avoiding repeated top-down traversals in
IntervalMap::find. The linear scan forward from one register interval to
the next is unlikely to be as expensive as a full IntervalMap search
starting from the root.
This reduces time spent in LiveDebugValues when compiling sqlite3 by
200ms (about 0.1% - 0.2% of the total User Time).
Depends on D76466.
rdar://60046261
Differential Revision: https://reviews.llvm.org/D76467
Avoid making a heap allocation when constructing a CoalescingBitVector.
This reduces time spent in LiveDebugValues when compiling sqlite3 by
700ms (0.5% of the total User Time).
rdar://60046261
Differential Revision: https://reviews.llvm.org/D76465
UseInitArray is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false.
The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called.
clang -target i386 -c a.c
clang -target x86_64 -c a.c
This cleanup fixes this as a bonus.
Differential Revision: https://reviews.llvm.org/D71360
Summary:
It can be the case that a vector type is legal but the corresponding
scalar type is not legal for an architecture (i8 vs. v16i8 on AArch64).
Check if the scalar type created when folding
truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))
is legal if we are running after the type legalizer.
This fixes https://github.com/android/ndk/issues/1207.
Reviewers: RKSimon, srhines
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76312
Summary:
For some reason the order in which we call getNegatedExpression
for the involved operands, after a call to isCheaperToUseNegatedFPOps,
seem to matter. This patch includes a new test case in
test/CodeGen/X86/fdiv.ll that crashes if we reverse the order of
those calls. Before this patch that could happen depending on
which compiler that were used when buildind llvm. With my GCC
version (7.4.0) I got the crash, because it seems like it is
using a different order for the argument evaluation compared
to clang.
All other users of isCheaperToUseNegatedFPOps already used this
pattern with unfolded/ordered calls to getNegatedExpression, so
this patch is aligning visitFDIV with the other use cases.
This patch simply deals with the non-determinism for FDIV. While
the underlying problem with getNegatedExpression is discussed
further in D76439.
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: hiraditya, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76319
This reverts commit e9f22fd429.
When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70
failing tests with this revision, and 29 without this revision.
Port over the following:
- shuffle undef, undef, any_mask -> undef
- shuffle anything, anything, undef_mask -> undef
This sort of thing shows up a lot when you try to bugpoint code containing
shufflevector.
Differential Revision: https://reviews.llvm.org/D76382
Summary:
* Remove a bunch of asserts checking for unsupported scalable types and
add some more now that they are supported.
* Propagate the scalable flag where necessary.
* Add another `EVT::getExtendedVectorVT` method that takes an
ElementCount parameter.
* Add `EVT::isExtendedScalableVector` and
`EVT::getExtendedVectorElementCount` - latter is currently unused.
Reviewers: sdesmalen, efriedma, rengolin, craig.topper, huntergr
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75672
Summary:
Related to D75672, this patch adds EVT::isFixedLengthVector to determine
if the underlying vector type is of fixed length.
An assert is introduced in EVT::getVectorNumElements that triggers for
types that aren't fixed length. This is currently guarded by a flag
added D75297 that is off by default and has been renamed to the more
generic ENABLE_STRICT_FIXED_SIZE_VECTORS.
Ideally we want to get rid of getVectorNumElements but a quick grep
shows there are >350 uses in lib/CodeGen and 75 in lib/Target/AArch64
alone. All of these probably aren't EVT::getVectorNumElements (some may
be the MVT equivalent), but there are many places to fixup and having
the assert on by default would make the SVE upstreaming effort
difficult.
Reviewers: sdesmalen, efriedma, ctetreau, huntergr, rengolin
Reviewed By: efriedma
Subscribers: mgorny, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76376
Gather/scatter don't access one memory location, they access multiple disjoint locations. So using a fixed size isn't accurate. But we don't have a way to represent the true behavior so just use UnknownSize.
Previously we "split" the memory VT and use that size for the MMO of each half. But the memory VT is scalar so splitting usually just returned the original scalar VT, but on 32-bit X86 if the scalar VT was i64 it probably returned i32?
Differential Revision: https://reviews.llvm.org/D76388
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.
Differential Revision: https://reviews.llvm.org/D75660
SelectionDAG CSEs nodes based on their result type and operands, but not their flags. The flags are expected to be intersected when they are CSEd. In SelectionDAGBuilder, for FP nodes we manage both the fast math flags and the nofpexcept flag after the nodes have already been CSEd when they were created with getNode. The management of the fastmath flags before the constrained nodes prevents the nofpexcept management from working correctly.
This commit moves the FMF handling for constrained intrinsics into their visitor and disables the common FMF handling for these nodes.
Differential Revision: https://reviews.llvm.org/D75224
This patch generates TableGen descriptions for the specified register
banks which contain a list of register sizes corresponding to the
available HwModes. The appropriate size is used during codegen according
to the current HwMode. As this HwMode was not available on generation,
it is set upon construction of the RegisterBankInfo class. Targets
simply need to provide the HwMode argument to the
<target>GenRegisterBankInfo constructor.
The RISC-V RegisterBankInfo constructor has been updated accordingly
(plus an unused argument removed).
Differential Revision: https://reviews.llvm.org/D76007
This ports some combines from DAGCombiner.cpp which perform some trivial
transformations on instructions with undef operands.
Not having these can make it extremely annoying to find out where we differ
from SelectionDAG by looking at existing lit tests. Without them, we tend to
produce pretty bad code generation when we run into instructions which use
undef operands.
Also remove the nonpow2_store_narrowing testcase from arm64-fallback.ll, since
we no longer fall back on the add.
Differential Revision: https://reviews.llvm.org/D76339
Summary: The following change is to allow the machine outlining can be applied for Nth times, where N is specified by the compiler option. By default the value of N is 1. The motivation is that the repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" in LLVM Developers' Meeting in 2019.
Reviewers: aschwaighofer, tellenbach, paquette
Reviewed By: paquette
Subscribers: tellenbach, hiraditya, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71027
When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.
We already have an optimisation which tries fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.
Differential revision: https://reviews.llvm.org/D69936
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76348
Skip debug instructions before calling functions not expecting them.
In particular, LIS.getInstructionIndex(*mi) would fail if mi was a debg instr.
Differential Revision: https://reviews.llvm.org/D76129
If it is a*b-c*d, it could be also folded into fma(a, b, -c*d) or fma(-c, d, a*b).
This patch is trying to respect the uses of a*b and c*d to make the best choice.
Differential Revision: https://reviews.llvm.org/D75982
Summary: The following change is to allow the machine outlining can be applied for Nth times, where N is specified by the compiler option. By default the value of N is 1. The motivation is that the repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" in LLVM Developers' Meeting in 2019.
Reviewers: aschwaighofer, tellenbach, paquette
Reviewed By: paquette
Subscribers: tellenbach, hiraditya, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71027
ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.
Differential Revision: https://reviews.llvm.org/D76201
When compiling
```
struct S {
float w;
};
void f(long w, long b);
void g(struct S s) {
int w = s.w;
f(w, w*4);
}
```
I get Assertion failed: ((!CombinedExpr || CombinedExpr->isValid()) && "Combined debug expression is invalid").
That's because we combine two epxressions that both end in DW_OP_stack_value:
```
(lldb) p Expr->dump()
!DIExpression(DW_OP_LLVM_convert, 32, DW_ATE_signed, DW_OP_LLVM_convert, 64, DW_ATE_signed, DW_OP_stack_value)
(lldb) p Param.Expr->dump()
!DIExpression(DW_OP_constu, 4, DW_OP_mul, DW_OP_LLVM_convert, 32, DW_ATE_signed, DW_OP_LLVM_convert, 64, DW_ATE_signed, DW_OP_stack_value)
(lldb) p CombinedExpr->isValid()
(bool) $0 = false
(lldb) p CombinedExpr->dump()
!DIExpression(4097, 32, 5, 4097, 64, 5, 16, 4, 30, 4097, 32, 5, 4097, 64, 5, 159, 159)
```
I believe that in this particular case combining two stack values is
safe, but I didn't want to sink the special handling into
DIExpression::append() because I do want everyone to think about what
they are doing.
Patch by Adrian Prantl.
Fixes PR45181.
rdar://problem/60383095
Differential Revision: https://reviews.llvm.org/D76164
RDF is designed to be target agnostic. Therefore it would be useful to have it available for other targets, such as X86.
Based on a previous patch by Krzysztof Parzyszek
Differential Revision: https://reviews.llvm.org/D75932
I believe we were previously calculating a pointer info with the scalar base and an offset of 0. But that's not really where the gather is pointing. The offset is a function of the indices of the GEP we looked through.
Also set the size of the MachineMemOperand to UnknownSize
Differential Revision: https://reviews.llvm.org/D76157
Summary: The following change is to allow the machine outlining can be applied for Nth times, where N is specified by the compiler option. By default the value of N is 1. The motivation is that the repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" in LLVM Developers' Meeting in 2019.
Reviewers: aschwaighofer, tellenbach, paquette
Reviewed By: paquette
Subscribers: tellenbach, hiraditya, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71027
Under certain circumstances we'll end up in the position where the negated shift amount will get truncated to the type specified getScalarShiftAmountTy(), so we need to test for a truncated version of the shift amount as well.
This allows us to remove half of the remaining patterns tested for by X86ISelLowering's combineOrShiftToFunnelShift.
MCTargetOptionsCommandFlags.inc and CommandFlags.inc are headers which contain
cl::opt with static storage.
These headers are meant to be incuded by tools to make it easier to parametrize
codegen/mc.
However, these headers are also included in at least two libraries: lldCommon
and handle-llvm. As a result, when creating DYLIB, clang-cpp holds a reference
to the options, and lldCommon holds another reference. Linking the two in a
single executable, as zig does[0], results in a double registration.
This patch explores an other approach: the .inc files are moved to regular
files, and the registration happens on-demand through static declaration of
options in the constructor of a static object.
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1756977#c5
Differential Revision: https://reviews.llvm.org/D75579
With -fstack-protector-strong we check if a non-array variable has its address
taken in a way that could cause a potential out-of-bounds access. However what
we don't catch is when the address is directly used to create an out-of-bounds
memory access.
Fix this by examining the offsets of GEPs that are ultimately derived from
allocas and checking if the resulting address is out-of-bounds, and by checking
that any memory operations using such addresses are not over-large.
Fixes PR43478.
Differential revision: https://reviews.llvm.org/D75695
This is the second patch in a series of patches to enable basic block
sections support.
This patch adds support for:
* Creating direct jumps at the end of basic blocks that have fall
through instructions.
* New pass, bbsections-prepare, that analyzes placement of basic blocks
in sections.
* Actual placing of a basic block in a unique section with special
handling of exception handling blocks.
* Supports placing a subset of basic blocks in a unique section.
* Support for MIR serialization and deserialization with basic block
sections.
Parent patch : D68063
Differential Revision: https://reviews.llvm.org/D73674
I used the implementation for floor instead of round. It also turns
out the OpenCL builtin library wasn't using the round builtin, but
implemented the expanded form.
Summary:
This is a part of the series of efforts for correcting alignment of memory operations.
(Another related bugs: https://bugs.llvm.org/show_bug.cgi?id=44388 , https://bugs.llvm.org/show_bug.cgi?id=44543 )
This fixes https://bugs.llvm.org/show_bug.cgi?id=43880 by giving default alignment of loads to 1.
The test CodeGen/AArch64/bcmp-inline-small.ll should have been changed; it was introduced by https://reviews.llvm.org/D64805 . I talked with @evandro, and confirmed that the test is okay to be changed.
Other two tests from PowerPC needed changes as well, but fixes were straightforward.
Reviewers: courbet
Reviewed By: courbet
Subscribers: nlopes, gchatelet, wuzish, nemanjai, kristof.beyls, hiraditya, steven.zhang, danielkiss, llvm-commits, evandro
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76113
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not frozen, instsimplify or the later pipeline can potentially exploit undefined behavior.
The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).
Reviewers: jdoerfert, spatel, lebedev.ri, efriedma
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76179
Followup to D75114, this patch reuses the existing MatchRotate ROTL/ROTR rotation pattern code to also recognize the more general FSHL/FSHR funnel shift patterns when we have variable shift amounts, matched with MatchFunnelPosNeg which acts in an (almost) equivalent manner to MatchRotatePosNeg.
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not freezed, instsimplify or the later pipeline can potentially exploit undefined behavior.
The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).
Reviewers: jdoerfert, spatel, lebedev.ri, efriedma
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76179