As there is no header in pre-DWARFv5 address tables, and we fill
the class data members with some artificial values, we should not
dump them as that might be misleading.
Differential Revision: https://reviews.llvm.org/D74195
As addresses in the address tables may have relocations, thus,
the relocations should be resolved to read the correct address.
That is especially important for targets that use RELA relocations
because in that case addends are stored in relocation sections.
Differential Revision: https://reviews.llvm.org/D74404
This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses
them to implement soft promotion for the half type. This is
enough to provide basic support for __fp16 with strictfp.
Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C
is enabled.
This reverts commit rGcd5b308b828e, rGcd5b308b828e, rG8cedf0e2994c.
There are issues to be investigated for polly bots and bots turning on
EXPENSIVE_CHECKS.
This was creating a select on true/false values, and then comparing
that later. This produced more work for later combines, which can be
avoided by just using the boolean values. This was copied from the
original DAG expansion, which also has the same problem. This doesn't
have a observable change using SelectionDAG, but since GlobalISel is
missing these optimizations, the final code was noticeably longer.
These have nicer expansions implemented in the DAG. Ideally we would
either directly implement all of these special expansions, or stop
expanding division in the IR.
Mark the CrashRecoveryContextImpl constructor noexcept, so that MSVC
won't emit an unwind helper to clean up the allocation from `new` if the
constructor throws an exception.
Otherwise, MSVC complains:
llvm\lib\Support\CrashRecoveryContext.cpp(220): error C2712: \
Cannot use __try in functions that require object unwinding
The other simple fix would be to wrap `new` in a static helper or
lambda.
Users have reported that Tensorflow builds LLVM with /EHsc.
This is apparently worse than 1-byte alignment. This does not attempt
to decompose 2-byte aligned wide stores, but will stop trying to
produce them.
Also fix bug in LoadStoreVectorizer which was decreasing the alignment
and vectorizing stack accesses. It was assuming a stack object was an
alloca that could have its base alignment changed, which is not true
if the pointer is derived from a function argument.
Since natural fdiv lowering is now more conservative even with
denormals disabled, we get a slower expansion from just a plain
1.0/fdiv. Directly emit the rcp intrinsic when using it to implement
integer division to avoid a pointlessly complex sequence.
https://reviews.llvm.org/D74273
Pad macho section data to pointer size bytes, so that relocation
table and symbol table following section data will be pointer size
aligned.
Patch by pguo.
We aren't doing a good job of optimizing AVX512 outside of this code. So remove the bail out for AVX512 and replace with a FIXME. This at least gets us the AVX2 codegen.
Differential Revision: https://reviews.llvm.org/D74431
Instructions marked as FrameSetup do not cause requestLabelAfterInsn to
be called and so no such label is generated. Call instructions which
require call site entries to be generated require this label to be
present in order to calculate the return PC offset/address, but the
check for whether the call instruction is marked as FrameSetup was not
present.
Therefore in the case where a call instruction is marked as FrameSetup,
an assertion failure occurs if a call site entry is to be generated.
This is the case with RISC-V's implementation of save/restore via
library calls.
Differential Revision: https://reviews.llvm.org/D71593
This patch adds the support required for using the __riscv_save and
__riscv_restore libcalls to implement a size-optimization for prologue
and epilogue code, whereby the spill and restore code of callee-saved
registers is implemented by common functions to reduce code duplication.
Logic is also included to ensure that if both this optimization and
shrink wrapping are enabled then the prologue and epilogue code can be
safely inserted into the basic blocks chosen by shrink wrapping.
Differential Revision: https://reviews.llvm.org/D62686
As an approximation to a dead edge we can check if the terminator is
dead. If so, the corresponding operand use in a PHI node is dead even if
the PHI node itself is not.
ObjectLinkingLayer was not correctly propagating dependencies through local
symbols within an object. This could cause symbol lookup to return before a
searched-for symbol is ready if the following conditions are met:
(1) The definition of the symbol being searched for transitively depends on a
local symbol within the same object, and that local symbol in turn
transitively depends on an external symbol provided by some other module
in the JIT.
(2) Concurrent compilation is enabled.
(3) Thread scheduling causes the lookup of the searched-for symbol to return
before all transitive dependencies of the looked-up symbol are emitted.
This bug was found by inspection and has not been observed in practice.
A jitlink test case has been added to verify that symbol dependencies are
correctly propagated through local symbol definitions.
Summary:
SIInstrInfo::expandPostRAPseudo converts ENTER_WWM in-place into an
S_OR_SAVEEXEC instruction that needs certain implicit operands. Without
this patch I get errors like this that make it harder to use -stop-after
to bisect the pass pipeline:
$ llc -march=amdgcn test/CodeGen/AMDGPU/wqm.ll -stop-after=postrapseudos -o - | sed -E 's/ (from|into) custom "TargetCustom[0-9]+"//' | llc -march=amdgcn -x=mir
error: <stdin>:1295:70: missing implicit register operand 'implicit-def $scc'
renamable $sgpr2_sgpr3 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec
^
Note that this error is currently only generated by MIParser but it
comes with a FIXME comment:
// FIXME: Move the implicit operand verification to the machine verifier.
Reviewers: critson, arsenm, rampitec, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74428
Summary:
Dwarf stores source-file names the three parts:
<compilation_directory><include_directory><filename>
Prior to this change, the code only allowed retrieving either all
three as the absolute path, or just the filename. But many
compile-command lines--especially those in hermetic build systems
don't specify an absolute path, nor just the filename, but rather the
path relative to the compilation directory. This features allows
retrieving them in that style.
Add tests for path printing styles.
Modify createBasicPrologue to handle include directories.
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73383
Summary:
Part of the changes in D44564 made BasicAA not CFG only due to it using
PhiAnalysisValues which may have values invalidated.
Subsequent patches (rL340613) appear to have addressed this limitation.
BasicAA should not be invalidated by non-CFG-altering passes.
A concrete example is MemCpyOpt which preserves CFG, but we are testing
it invalidates BasicAA.
llvm-dev RFC: https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM
Reviewers: john.brawn, sebpop, hfinkel, brzycki
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74353
Based on uops.info these should have 5 cycle latency as they did on Haswell/Broadwell. I have no additional internal information from Intel.
This was also shown as a discrepancy in the spreadsheet that was sent with an early llvm-dev post about llvm-exegesis.
It also matches Agner Fog.
Differential Revision: https://reviews.llvm.org/D74357
This restores commit 748bb5a0f1, along
with a fix for a Chromium test suite build issue (and a new test for
that case).
Differential Revision: https://reviews.llvm.org/D73242
Summary:
* for <= tbd_v3, simulator platforms appear the same as the real
platform and we distinct the difference from the architecture.
fixes: rdar://problem/59161559
Reviewers: ributzka, steven_wu
Reviewed By: ributzka
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74416
Currently, isTruncateFree() and isZExtFree() callbacks return false
as they are not implemented in BPF backend. This may cause suboptimal
code generation. For example, if the load in the context of zero extension
has more than one use, the pattern zextload{i8,i16,i32} will
not be generated. Rather, the load will be matched first and
then the result is zero extended.
For example, in the test together with this commit, we have
I1: %0 = load i32, i32* %data_end1, align 4, !tbaa !2
I2: %conv = zext i32 %0 to i64
...
I3: %2 = load i32, i32* %data, align 4, !tbaa !7
I4: %conv2 = zext i32 %2 to i64
...
I5: %4 = trunc i64 %sub.ptr.lhs.cast to i32
I6: %conv13 = sub i32 %4, %2
...
The I1 and I2 will match to one zextloadi32 DAG node, where SUBREG_TO_REG is
used to convert a 32bit register to 64bit one. During code generation,
SUBREG_TO_REG is a noop.
The %2 in I3 is used in both I4 and I6. If isTruncateFree() is false,
the current implementation will generate a SLL_ri and SRL_ri
for the zext part during lowering.
This patch implement isTruncateFree() in the BPF backend, so for the
above example, I3 and I4 will generate a zextloadi32 DAG node with
SUBREG_TO_REG is generated during lowering to Machine IR.
isZExtFree() is also implemented as it should help code gen as well.
This patch also enables the change in https://reviews.llvm.org/D73985
since it won't kick in generates MOV_32_64 machine instruction.
Differential Revision: https://reviews.llvm.org/D74101
The changeXXXAfterManifest functions are better suited to deal with
changes so we should prefer them. These functions also recursively
delete dead instructions which is why we see test changes.
This is a followup to D73803, which uses the replaceOperand()
helper in more places.
This should be NFC apart from changes to worklist order.
Differential Revision: https://reviews.llvm.org/D73919
Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539.
As discussed there, this pass makes some overly optimistic
assumptions, as it does not have access to actual branch weights.
This patch makes the computation of the depth of the optimized cmov
more conservative, by assuming a distribution of 75/25 rather than
50/50 and placing the weights to get the more conservative result
(larger depth). The fully conservative choice would be
std::max(TrueOpDepth, FalseOpDepth), but that would break at least
one existing test (which may or may not be an issue in practice).
Differential Revision: https://reviews.llvm.org/D74155
Summary:
Add a new method (tryParseRegister) that attempts to parse a register specification.
MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't.
Reviewers: thakis
Reviewed By: thakis
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73486
When more than one SelectPseudo instruction is handled a new MBB is
returned. This must not be done if that would result in leaving an undhandled
isel pseudo behind in the original MBB.
Fixes https://bugs.llvm.org/show_bug.cgi?id=44849.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D74352
This patch removes forcedconstant to simplify things for the
move to ValueLattice, which includes constant ranges, but no
forced constants.
This patch removes forcedconstant and changes ResolvedUndefsIn
to mark instructions with unknown operands as overdefined. This
means we do not do simplifications based on undef directly in SCCP
any longer, but this seems to hardly come up in practice (see stats
below), presumably because InstCombine & others take care
of most of the relevant folds already.
It is still beneficial to keep ResolvedUndefIn, as it allows us delaying
going to overdefined until we propagated all known information.
I also built MultiSource, SPEC2000 and SPEC2006 and compared
sccp.IPNumInstRemoved and sccp.NumInstRemoved. It looks like the impact
is quite low:
Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...arks/VersaBench/dbms/dbms.test 4.00 3.00 -25.0%
test-suite...TimberWolfMC/timberwolfmc.test 38.00 34.00 -10.5%
test-suite...006/453.povray/453.povray.test 158.00 155.00 -1.9%
test-suite.../CINT2000/176.gcc/176.gcc.test 668.00 668.00 0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test 1209.00 1209.00 0.0%
test-suite...arks/mafft/pairlocalalign.test 76.00 76.00 0.0%
Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.NumInstRemoved
Program base patch diff
test-suite...arks/mafft/pairlocalalign.test 185.00 175.00 -5.4%
test-suite.../CINT2006/403.gcc/403.gcc.test 2059.00 2056.00 -0.1%
test-suite.../CINT2000/176.gcc/176.gcc.test 2358.00 2357.00 -0.0%
test-suite...006/453.povray/453.povray.test 317.00 317.00 0.0%
test-suite...TimberWolfMC/timberwolfmc.test 12.00 12.00 0.0%
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61314
A small IR change in calculating the active lanes resulted in no longer
recognising tail-predication. Now recognise both an 'add' and 'or' in
the expression that calculates the active lanes.
Differential Revision: https://reviews.llvm.org/D74394
Added a test for #pragma clang __debug llvm_fatal_error to test for the original issue.
Added llvm::sys::Process::Exit() and replaced ::exit() in places where it was appropriate. This new function would call the current CrashRecoveryContext if one is running on the same thread; or call ::exit() otherwise.
Fixes PR44705.
Differential Revision: https://reviews.llvm.org/D73742
ADDI(C.ADDI) may achieve better code size than XORI, since XORI has no C extension.
This patch transforms two patterns and gets almost equivalent results.
Differential Revision: https://reviews.llvm.org/D71774
This reverts commit d0c4d4fe09.
Revert "[DSE,MSSA] Move more passing test cases from todo to simple.ll."
This reverts commit 02266e64bb.
Revert "[DSE,MSSA] Adjust mda-with-dbg-values.ll to MSSA backed DSE."
This reverts commit 74f03e4ff0.
The variable was added to the initial commit via copy/paste of existing
code, but it wasn't actually used in the code. We can add it back with
the proper usage if/when that is needed.
Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.
Summary:
That patch is extracted from https://reviews.llvm.org/D74308.
Currently there are two patterns to name error handling functions:
using "Callback" and "Handler". This patch uses "Handler" for all
usage places.
Reviewers: jhenderson, dblaikie, probinson, aprantl
Reviewed By: jhenderson, dblaikie
Subscribers: hiraditya, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D74354
New intrinisics are implemented for when we need to port SIMD code from other
arhitectures and only load or store portions of MSA registers.
Following intriniscs are added which only load/store element 0 of a vector:
v4i32 __builtin_msa_ldrq_w (const void *, imm_n2048_2044);
v2i64 __builtin_msa_ldr_d (const void *, imm_n4096_4088);
void __builtin_msa_strq_w (v4i32, void *, imm_n2048_2044);
void __builtin_msa_str_d (v2i64, void *, imm_n4096_4088);
Differential Revision: https://reviews.llvm.org/D73644
Fixup the UserValue methods to use FragmentInfo instead of DIExpression because
the DIExpression is only ever used to get the to get the FragmentInfo. The
DIExpression is meaningless in the UserValue class because each definition point
added to a UserValue may have a unique DIExpression.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74057
Rename the class DbgValueLocation to DbgVariableValue and instances from Loc to
DbgValue. These names better express the new semantics introduced in D74053.
The class previously represented a { Location } only. It now represents a
{ Location, DIExpression } pair which together describe a value.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74055
LiveDebugVariables uses interval maps to explicitly represent DBG_VALUE
intervals. DBG_VALUEs are filtered into an interval map based on their {
Variable, DIExpression }. The interval map will coalesce adjacent entries that
use the same { Location }. Under this model, DBG_VALUEs which refer to the same
bits of the same variable will be filtered into different interval maps if they
have different DIExpressions which means the original intervals will not be
properly preserved.
This patch fixes the problem by using { Variable, Fragment } to filter the
DBG_VALUEs into maps, and coalesces adjacent entries iff they have the same
{ Location, DIExpression } pair.
The solution is not perfect because we see the similar issues appear when
partially overlapping fragments are encountered, but is far simpler than a
complete solution (i.e. D70121).
Fixes: pr41992, pr43957
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74053
Summary:
As far as I know this did not affect code generation, but it did affect
the order of -debug-only=si-wqm output and the naming of autonamed
values in -print-after=si-wqm output.
Reviewers: arsenm, rampitec, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, mgrang, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74317
We need to use vector instructions for these operations. Previously
we handled this with isel patterns that used extra instructions
and copies to handle the the conversions.
Now we use custom lowering to emit the conversions. This allows
them to be pattern matched and optimized on their own. For
example we can now emit vpextrw to store the result if its going
directly to memory.
I've forced the upper elements to VCVTPHS2PS to zero to keep some
code similar. Zeroes will be needed for strictfp. I've added a
DAG combine for (fp16_to_fp (fp_to_fp16 X)) to avoid extra
instructions in between to be closer to the previous codegen.
This is a step towards strictfp support for f16 conversions.
Summary:
There are a few field init values that are concrete but not complete/foldable (e.g. `?`). This allows for using those values as initializers without erroring out.
Example:
```
class A {
string value = ?;
}
class B<A impl> : A {
let value = impl.value; // This currently emits an error.
let value = ?; // This doesn't emit an error.
}
```
Differential Revision: https://reviews.llvm.org/D74360
A downstream test exposed a simple logic bug with the manual pointer
stripping code, fix that by just using stripPointerCasts() on the value.
I don't think there's a way to expose this issue upstream.
As discussed in PR41083:
https://bugs.llvm.org/show_bug.cgi?id=41083
...we can assert/crash in EarlyCSE using the current hashing scheme and
instructions with flags.
ValueTracking's matchSelectPattern() may rely on overflow (nsw, etc) or
other flags when detecting patterns such as min/max/abs composed of
compare+select. But the value numbering / hashing mechanism used by
EarlyCSE intersects those flags to allow more CSE.
Several alternatives to solve this are discussed in the bug report.
This patch avoids the issue by doing simple matching of min/max/abs
patterns that never requires instruction flags. We give up some CSE
power because of that, but that is not expected to result in much
actual performance difference because InstCombine will canonicalize
these patterns when possible. It even has this comment for abs/nabs:
/// Canonicalize all these variants to 1 pattern.
/// This makes CSE more likely.
(And this patch adds PhaseOrdering tests to verify that the expected
transforms are still happening in the standard optimization pipelines.
I left this code to use ValueTracking's "flavor" enum values, so we
don't have to change the callers' code. If we decide to go back to
using the ValueTracking call (by changing the hashing algorithm
instead), it should be obvious how to replace this chunk.
Differential Revision: https://reviews.llvm.org/D74285
Summary: It attempts to devirtualize a call on alloca through vtable loads.
Reviewers: davidxl
Subscribers: mgorny, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71308
This patch:
- enable frame pointer for AIX;
- update some of red zone comments;
- add/update testcases;
Differential Revision: https://reviews.llvm.org/D72454
We were checking for extra uses of the negated operand even
if we were not going to create it as part of this canonicalization.
This was showing up as a regression when we limit EarlyCSE as
proposed in D74285.
SUMMARY:
The patch is enable to support Mergeable2ByteCString and Mergeable4ByteCString
Reviewers: daltenty
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D74164
LoopCacheAnalysis currently assumes the loop will be iterated over in
a forward direction. This patch addresses the issue by using the
absolute value of the stride when iterating backwards.
Note: this patch will treat negative and positive array access the
same, resulting in the same cost being calculated for single and
bi-directional access patterns. This should be improved in a
subsequent patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D73064
Each function is with this compiled with the SystemZSubtarget initialized
from the functions attributes.
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D74086
Non-AVX512BW targets failed to concatenate 256-bit shifts back to 512-bits (split during 512-bit shuffle lowering as they don't have v32i16/v64i8 types).
This reverts commit b54a8ec1bc.
The commit triggered debug invariance (different output with/without
-g). The patch seems to have exposed a pre-existing invariance problem
in GlobalOpt, which I'll write a bug report for.
These are generated and do not need to have the same values.
We are defining separate subregs for R600 and GCN but then
using AMDGPU subregs on R600.
Differential Revision: https://reviews.llvm.org/D74248
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.
This patch lowers to uniform ISD:ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.
There might be cases where targets without ISD:ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.
REAPPLIED rGe82e17d4d4ca after reversion at rG39eade73a567 - fixed offset matching in matchShuffleAsBitRotate.
If a debug line section with version of greater than 5 is encountered,
prior to this change the parser would accept it and treat it as version
5. This might work to some extent, but then it might not at all, as it
really depends on the format of the unspecified future version, which
will be different (otherwise there would be no point in changing the
version number). Any information we could provide has a good chance of
being invalid, so we should just refuse to parse such tables.
Reviewed by: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D74204
This patch makes the following System Registers Read Only:
- CurrentEL
- ICH_MISR_EL2
- PMBIDR_EL1
- PMSIDR_EL1
as found in:
https://developer.arm.com/docs/ddi0595/e/aarch64-system-registers
Relative line numbers were also added to the tests so we get more
informative error messages on failure.
Change-Id: I963b4f01ca5737b58f9e8e7abe9ca1d99e328758
Add a simplification to fuse a manual vector extract with shifts and
truncate into a bitcast.
Unpacking and packing values into vectors is only optimized with
extractelement instructions, not when manually unpacked using shifts
and truncates.
This patch simplifies shifts and truncates into a bitcast if possible.
Simplify (build_vec (trunc $1)
(trunc (srl $1 width))
(trunc (srl $1 (2 * width))) ...)
to (bitcast $1)
Differential Revision: https://reviews.llvm.org/D73892
This change implements the llvm intrinsic llvm.read_register for
the SystemZ platform which returns the value of the specified
register
(http://llvm.org/docs/LangRef.html#llvm-read-register-and-llvm-write-register-intrinsics).
This implementation returns the value of the stack register, and
can be extended to return the value of other registers. The
implementation for this intrinsic exists on various other platforms
including Power, x86, ARM, etc. but missing on SystemZ.
Reviewers: uweigand
Differential Revision: https://reviews.llvm.org/D73378
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.
This patch lowers to uniform ISD:ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.
There might be cases where targets without ISD:ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.
Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
---
Internal shuffle tests indicate theres a bug somewhere that I haven't been able to track down yet.
This patch adds a first version of a MemorySSA based DSE. It is missing
a lot of features, which will get added as follow-ups, to help to keep
the review manageable.
The patch uses the following general approach: given a MemoryDef, walk
upwards to find clobbering MemoryDefs that may be killed by the
starting def. Then check that there are no uses that may read the
location of the original MemoryDef in between both MemoryDefs. A bit
more concretely:
For all MemoryDefs StartDef:
1. Get the next dominating clobbering MemoryDef (DomAccess) by walking upwards.
2. Check that there no reads between DomAccess and the StartDef by checking
all uses starting at DomAccess and walking until we see StartDef.
3. For each found DomDef, check that:
1. There are no barrier instructions between DomDef and StartDef (like
throws or stores with ordering constraints).
2. StartDef is executed whenever DomDef is executed.
3. StartDef completely overwrites DomDef.
4. Erase DomDef from the function and MemorySSA.
The patch uses a very simple approach to guarantee that no throwing
instructions are between 2 stores: We only allow accesses to stack
objects, access that are in the same basic block if the block does not
contain any throwing instructions or accesses in functions that do
not contain any throwing instructions. This will get lifted later.
Besides adding support for the missing cases, there is plenty of additional
potential for improvements as follow-up work, e.g. the way we visit stores
(could be just a traversal of the MemorySSA, rather than collecting them
up-front), using the alias information discovered during walking to optimize
the MemorySSA.
This is loosely based on D40480 by Dave Green.
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72700
This copies the DSE tests into a MSSA subdirectory to test the MemorySSA
backed DSE implementation, without disturbing the original tests.
Differential Revision: https://reviews.llvm.org/D72145
When specifying -march=arch[8|9|10], those CPU types do NOT support
the vector extension. In this case the vector ABI must be disabled.
The generated data layout should NOT contain 64-v128.
Reviewers: uweigand
Differential Revision: https://reviews.llvm.org/D74146
Use the isCandidateForCallSiteEntry().
This should mostly be an NFC, but there are some parts ensuring
the moveCallSiteInfo() and copyCallSiteInfo() operate with call site
entry candidates (both Src and Dest should be the call site entry
candidates).
Differential Revision: https://reviews.llvm.org/D74122
Based on D72931
This adds a new feature called A16 which is enabled for gfx10.
gfx9 keeps the R128A16 feature so it can share all the instruction encodings
with gfx7/8.
Differential Revision: https://reviews.llvm.org/D73956
This is a minimal but important advancement over the existing code. A
cast with an operand that is only used in the cast retains the no-alias
property of the operand.
Traversing PHI nodes is natural with the genericValueTraversal but also
a bit tricky. The problem is similar to the ones we have seen in AAAlign
and AADereferenceable, namely that we continue to increase the range in
each iteration. We use a pessimistic approach here to stop the
iterations. Nevertheless, optimistic information can now be propagated
through a PHI node.
The change is performed as stated by the FIXME and the tests are
adjusted. All changes look fine to me and values can be inferred as
undef without it being an error.
Casts can be handled natively by the ConstantRange class. We do limit it
to extends for now as we assume an integer type in different locations.
A TODO and a test case with a FIXME was added to remove that restriction
in the future.
Using sign extend forces the adjacent element to either all zeros
or all ones. But all ones is a NAN. So that doesn't seem like a
great idea.
Trying to work on supporting this with strict FP where NAN would
definitely be bad.
When the FP exists, the FP base CFI directive offset should take the size of variable arguments into account.
Differential Revision: https://reviews.llvm.org/D73862
Vector indexing with a constant index should be folded out in the
legalizer, but this was accidentally falling through. This would
produce the indexing operation with $noreg. Handle this case as a
dynamic index just in case a bug like this happens again in the
future.
We were failing to find constants that were casted. I feel like the
artifact combiner should have folded the constant in the trunc before
the custom lowering, but that doesn't happen.
Reverts part of 6524a7a2b9. Since that
commit, the expansion was ignoring the actual save exec register
produced by the instruction, and looking at other instructions. I do
not understand why it was looking at other instructions, but relying
on this scan was wrong.
Fixes verifier errors after SI_IF is tail duplicated, which should be
correct to do. The results were fed into a phi, which was lowered to
the S_MOV_B64_term instructions.
Fix issue mentioned on rGe82e17d4d4ca - non-AVX512BW targets failed to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
The flag isn't used, but I believe this matches the MOV32r0 that
would be created by the table emitter. This should allow this node
to be CSEed with any others created by the table.
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.
This patch lowers to uniform ISD:ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.
There might be cases where targets without ISD:ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.
Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
A vselect+strictfp node is not equivalent to a masked operation.
The exceptions of the strictfp node are not masked by a vselect
after it so we can't match it to a masked operation.
We already had a hack in IsLegalToFold to prevent these patterns from
matching. This patch removes that hack and removes the patterns.
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.
There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:
InstCombine - can't happen without TTI, but we don't want target-specific
folds there.
SDAG - too late to assist other vectorization passes; TLI is not equipped
for these kind of cost queries; limited to a single basic block.
CGP - too late to assist other vectorization passes; would need to re-implement
basic cleanups like CSE/instcombine.
SLP - doesn't fit with existing transforms; limited to a single basic block.
This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.
We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.
Differential Revision: https://reviews.llvm.org/D73480
The LoopExtractor created new functions (by definition), which violates
the restrictions of a LoopPass.
The correct implementation of this pass should be as a ModulePass.
Includes reverting rL82990 implications on the LoopExtractor.
Fixes PR3082 and PR8929.
Differential Revision: https://reviews.llvm.org/D69069
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.
Differential Revision: https://reviews.llvm.org/D68720
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.
Differential Revision: https://reviews.llvm.org/D68720
In addition to the module pass, this patch introduces a CGSCC pass that
runs the Attributor on a strongly connected component of the call graph
(both old and new PM). The Attributor was always design to be used on a
subset of functions which makes this patch mostly mechanical.
The one change is that we give up `norecurse` deduction in the module
pass in favor of doing it during the CGSCC pass. This makes the
interfaces simpler but can be revisited if needed.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D70767
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304
Parallel regions known to be read-only, e.g., after we removed all dead
write accesses, and terminating (`willreturn`) can be removed.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69954
This adds ~27 more runtime calls to the OpenMPKinds.def file, all with
attributes. We deduplicate 16 of those automatically in function =
thread scope. And we annotate all of them automatically during the
OpenMPOpt discovery step. A test with all omp_XXXX runtime calls to
track annotation coverage is included.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69984
Making sure not to use them with patterns for masked instructions.
Also fix FMA patterns that were matching strict_fma+x86selects to
masked instructions.
The OpenMPOpt pass is a CGSCC pass in which OpenMP specific
optimizations can reside.
The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime
calls and their uses. This allows targeted transformations and eases
their implementation.
This initial patch deduplicates `__kmpc_global_thread_num` and
`omp_get_thread_num` calls. We can also identify arguments that are
equivalent to such a call result and use it instead. Later we can
determine "gtid" arguments based on the use in kernel functions etc.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69930
The CallGraphUpdater is a helper that simplifies the process of updating
the call graph, both old and new style, while running an CGSCC pass.
The uses are contained in different commits, e.g. D70767.
More functionality is added as we need it.
Reviewed By: modocache, hfinkel
Differential Revision: https://reviews.llvm.org/D70927
Bionic has had `__strlen_chk` for a while. Optimizing that into a
constant is quite profitable, when possible.
Differential Revision: https://reviews.llvm.org/D74079
While D72944 also fixes https://bugs.llvm.org/show_bug.cgi?id=44541,
it does so in a more roundabout manner and there might be other
loopholes to trigger the same issue. This is a more direct fix,
that prevents the transform if the min/max is based on a
non-canonical sub X, 0 instruction.
Differential Revision: https://reviews.llvm.org/D73849
Remove code from LegalizeTypes that allowed this to work.
We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
As discussed on D73919, this replaces a few cases where we were
modifying multiple operands of instructions in-place with the
creation of a new instruction, which we generally prefer nowadays.
This tends to be more readable and less prone to worklist management
bugs.
Test changes are only superficial (instruction naming and order).
Fixes https://bugs.llvm.org/show_bug.cgi?id=44835. Skip the transform
if it wouldn't actually do anything (apart from removing and reinserting
the same instructions).
Note that the test case doesn't loop on current master anymore, only
on the LLVM 10 release branch. The issue is already mitigated on master
due to worklist order fixes, but we should fix the root cause there as well.
As a side note, we should probably assert in combineLoadToNewType()
that it does not combine to the same type. Not doing this here, because
this assertion would also be triggered in another place right now.
Differential Revision: https://reviews.llvm.org/D74278
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with better option
handling and more portable testing
Differential Revision: https://reviews.llvm.org/D68720
This hasn't been used for years, its original implementation, D35700, had bugs that caused the reversion of most of the code, and since then x86 shuffle lowering/combining has handled most cases and can deal with the rest as well.
Summary:
For CTTZ we place a set bit just past where the non-promoted type
stopped so the extended bits won't be used for the count. For
CTTZ_ZERO_UNDEF we don't care what happens if no bits are set in
the original type and we end up counting into the extended bits.
So we can just use ANY_EXTEND for both cases.
This matches what is done in type legalization for these operations.
We make no effort to force the upper bits to zero.
Differential Revision: https://reviews.llvm.org/D74111
This improves on the following patch, which removed ARC runtime calls
taking inert global variables:
https://reviews.llvm.org/D62433
rdar://problem/59137105
Summary:
Debug Info Version was changed to use "Max" instead of "Warning" per the
original design intent - but this maxes old/new IR unlinkable, since
mismatched merge styles are a linking failure.
It seems possible/maybe reasonable to actually support the combination
of these two flags: Warn, but then use the maximum value rather than the
first value/earlier module's value.
Reviewers: tejohnson
Differential Revision: https://reviews.llvm.org/D74257
Calls to ObjC's objc_msgSend function are done by bitcasting the function global
to the required function type signature. This patch looks through this bitcast
so that we can do a direct call with bl on arm64 instead of using an indirect blr.
Differential Revision: https://reviews.llvm.org/D74241
Update lambda function
static auto InitializeRegisterBankOnce = [this](const auto &TRI) {
with
static auto InitializeRegisterBankOnce = [&]() {
Capture reference instead of passing argument, as there are buildbot
compiling errors related when passing argument.
Update lambda function
static auto InitializeRegisterBankOnce = [this](const auto &TRI) {
with
static auto InitializeRegisterBankOnce = [&]() {
Capture reference instead of passing argument, as there are buildbot
compiling errors related when passing argument.
On little endian targets prior to Power9, we spill vector registers using a
swapping store (i.e. stdxvd2x saves the vector with the two doublewords in
big endian order regardless of endianness). This is generally not a problem
since we restore them using the corresponding swapping load (lxvd2x). However
if the restore is done by the unwinder, the vector register contains data in
the incorrect order.
This patch fixes that by using Altivec loads/stores for vector saves and
restores in PEI (which keep the order correct) under those specific conditions:
- EH aware function
- Subtarget requires swaps for VSX memops (Little Endian prior to Power9)
Differential revision: https://reviews.llvm.org/D73692
Summary:
The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp.
Also, afn instead of arcp is used to allow inaccurate rcp to be used.
Reviewers:
arsenm
Differential Revision: https://reviews.llvm.org/D73588
The issue in the previous commits was that we swap the LHS and RHS while
looking for the constant. In SLT/SGT, the constant must be on the RHS, or the
optimization is invalid.
Move the swapping logic after the check for the SLT/SGT case and update tests.
Original commits:
d78cefb160a373841407
Summary:
Current implementation of matchSwap in SIShrinkInstructions searches the entire
use_nodbg_operands set to find the possible pattern to generate v_swap instruction.
This approach will lead to a O(N^3) in compile time for SIShrinkInstructions.
But in reality, the matching pattern only exists within nearby instructions in the
same basic block. This work limits the search to a maximum of 16 instructions, and has
a linear compile time comsumption.
Reviewers:
rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D74180
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with correct option
flags set.
Differential Revision: https://reviews.llvm.org/D68720
hasReservedSpillSlot returns a dummy frame index of '0' on PPC64 for the
non-volatile condition registers, which leads to the CalleSavedInfo
either referencing an unrelated stack object, or an invalid object if
there are no stack objects. The latter case causes the mir-printer to
crash due to assertions that checks if the frame index referenced by a
CalleeSavedInfo is valid.
To fix the problem create an immutable FixedStack object at the correct offset
in the linkage area of the previous stack frame (ie SP + positive offset).
Differential Revision: https://reviews.llvm.org/D73709
Previously we took the restored flag in a GPR, extended it 32 or 64 bits. Then used as an input to a sub from 0. This requires creating a zero extend and creating a 0.
This patch changes this to just use an ADD with 255 to restore the carry flag and keep the SETB_C32r/SETB_C64r. Exactly like we handle SBB which is what SETB becomes.
Differential Revision: https://reviews.llvm.org/D74152
Add the isCandidateForCallSiteEntry predicate to MachineInstr to
determine whether a DWARF call site entry should be created for an
instruction.
For now, it's enough to have any call instruction that doesn't belong to
a blacklisted set of opcodes. For these opcodes, a call site entry isn't
meaningful.
Differential Revision: https://reviews.llvm.org/D74159
Allows more flexible use of buildMerge in places where
use operands are available as SrcOp since it does not
require explicit conversion to Register.
Simplify code with new buildMerge.
Differential Revision: https://reviews.llvm.org/D74223
This is a one off special case, since actually implementing full inline asm
support will be much more involved. This lets us compile a lot more code as a
common simple case.
Differential Revision: https://reviews.llvm.org/D74201
Printing floating point number in decimal is inconvenient for humans.
Verbose asm output will print out floating point values in comments, it
helps.
But in lots of cases, users still need additional work to covert the
decimal back to hex or binary to check the bit patterns,
especially when there are small precision difference.
Hexadecimal form is one of the supported form in LLVM IR, and easier for
debugging.
This patch try to print all FP constant in hex form instead.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D73566
We were executing this in a waterfall loop as a placeholder, but this
should really be converted to a MUBUF load. Also execute in a
waterfall loop if the resource isn't an SGPR. This is a case where the
DAG handling was wrong because doing the right thing was too hard.
Currently, this will mishandle 96-bit loads. There's currently no way
to track the original memory size with an MMO, so these loads will be
widened andd the resulting memory size will be 128-bits.
Summary:
The following example gives the error message "expected value of type
'bits<32>', got 'bit'" on the assignment.
class Instruction { bits<32> encoding; }
def foo: Instruction { let encoding{10} = !eq(0, 1); }
But there's nothing wrong with this code: 'bit' is a perfectly good
type for the RHS of an assignment to a //single bit// of an
instruction encoding.
The problem is that `ParseBodyItem` is accidentally type-checking the
RHS against the full type of the `encoding` field, without adjusting
it in the case where we're only assigning to a subset of the bits. The
fix is trivial.
Reviewers: nhaehnle, hfinkel
Reviewed By: hfinkel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74220
The type passed to lower was invalid, so I'm not sure how this was
even working before. The source and destination type also do not have
to match, so make sure to use the right ones.
The registers TRCEXTINSELR and TRCEXTINSELR0 are distinct registers,
defined by separate extension specifications (ETM and ETE,
respectively), yet they use the same encoding in MSR/MRS.
When performing a system register lookup by encoding, we would
essentially return a random one, depending on the number, relative
position in the TableGen file, whether the TableGen records for system
registers are named or not, and, if they are named, depending on
record (not register!) name as well.
This patch works around the issue by explictly checking for the
TRCEXTINSELR/TRCEXTINSELR0 encoding and always returning TRCEXTINSELR.
Differential Revision: https://reviews.llvm.org/D74074
If we know that a >= b (unsigned), usub.with.overflow(a, b) cannot
overflow. Similarly, if b > a, the same expression overflows.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic, Gerolf
Differential Revision: https://reviews.llvm.org/D74066
This reverts commit 39f50da2a3.
The -fstack-clash-protection is being passed to the linker too, which
is not intended.
Reverting and fixing that in a later commit.
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.
Reviewers: courbet
Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73964
This patch adds versions of isImpliedCondition and
isImpliedByDomCondition that take a predicate, LHS and RHS operands as
instead of a Value representing the condition.
This allows using those functions to check conditions without having a
concrete ICmp instruction.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D74065
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
Differential Revision: https://reviews.llvm.org/D68720
Summary:
This patch reorders the emission of debug_str section, so that
string can come after macros.
This is necessary for macro forms like DW_MACRO_define_strp,
which emits macro as a string in debug_str section.