Summary: The APFloat and Constant APIs taking an APInt allow arbitrary payloads,
and that's great. There's a convenience API which takes an unsigned, and that's
silly because it then directly creates a 64-bit APInt. Just change it to 64-bits
directly.
At the same time, add ConstantFP NaN getters which match the APFloat ones (with
getQNaN / getSNaN and APInt parameters).
Improve the APFloat testing to set more payload bits.
Reviewers: scanon, rjmccall
Subscribers: jkorous, dexonsmith, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D55460
llvm-svn: 348791
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.
This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.
Differential Revisions: https://reviews.llvm.org/D53629
llvm-svn: 348788
If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node.
Zero constant folding will be handled in a future patch - its a little trickier as we often have bitcasted zero values.
Differential Revision: https://reviews.llvm.org/D55511
llvm-svn: 348784
As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now.
llvm-svn: 348781
This should really be generalized to allow increment and/or
we should replace it by using ISD::matchUnaryPredicate().
See D55515 for context.
llvm-svn: 348776
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, for the Exynos processors.
Differential revision: https://reviews.llvm.org/D55345
llvm-svn: 348774
This commit changes which l1 flush instruction is used for AMDPAL and
MESA3d workloads to flush the entire l1 cache instead of just the
volatile lines.
Differential Revision: https://reviews.llvm.org/D55367
llvm-svn: 348771
Refactor the scheduling predicates based on `MCInstPredicate`. Augment the
number of helper predicates used by processor specific predicates.
Differential revision: https://reviews.llvm.org/D55375
llvm-svn: 348768
Record the stack protector index in MachineFrameInfo when translating
Intrinsic::stackprotector similarly as is done by SelectionDAG when
processing the same intrinsic.
Setting this index allows the Prologue/Epilogue Insertion to recognize
that the stack protection is enabled. The pass can then make sure that
the stack protector comes before local variables on the stack and
assigns potentially vulnerable objects first so they are close to the
stack protector slot.
Differential Revision: https://reviews.llvm.org/D55418
llvm-svn: 348761
When replacing jal with jalr, also emit '.reloc R_MIPS_JALR' (R_MICROMIPS_JALR
for micromips). The linker might then be able to turn jalr into a direct
call.
Add '-mips-jalr-reloc' to enable/disable this feature (default is true).
Differential revision: https://reviews.llvm.org/D55292
llvm-svn: 348760
This triggers an assert when combining concat_vectors of a bitcast of
merge_values.
With asserts disabled, it fails to select:
fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8
0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1
0x7ff19d000b50: f64 = Register %1
In function: d
Differential Revision: https://reviews.llvm.org/D55507
llvm-svn: 348759
The list generated in the target parser tests is the
same as the one in the AArch64 target parser.
Use that one instead.
Differential Revision: https://reviews.llvm.org/D55509
llvm-svn: 348757
A new pass to manage the Mode register.
Currently this just manages the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.
The immediate motivation comes from the requirement to use the
round-to-zero rounding mode for the 16 bit interpolation
instructions, where the rounding mode setting is shared between
16 and 64 bit operations.
llvm-svn: 348754
Currently, dbg.value's of "nullptr" are dropped when entering a SelectionDAG --
apparently just because of an oversight when recognising Values that are
constant (see PR39787). This patch adds ConstantPointerNull to the list of
constants that can be turned into DBG_VALUEs.
The matter of what bit-value a null pointer constant in LLVM has was raised
in this mailing list thread:
http://lists.llvm.org/pipermail/llvm-dev/2018-December/128234.html
Where it transpires LLVM relies on (IR) null pointers being zero valued,
thus I've baked this assumption into the patch.
Differential Revision: https://reviews.llvm.org/D55227
llvm-svn: 348753
This is a fix for PR39896, where dbg.value's of SDNodes that have been
optimised out do not lead to "DBG_VALUE undef" instructions being created.
Such undef instructions are necessary to terminate earlier variable
ranges, otherwise variable values leak past the point where they're valid.
The "invalidated" flag of SDDbgValue is currently being abused to mean two
things:
* The corresponding SDNode is now invalid
* This SDDbgValue should not be emitted
Of which there are several legitimate combinations of meaning:
* The SDNode has been invalidated and we should emit "DBG_VALUE undef"
* The SDNode has been invalidated but the debug data was salvaged, don't
emit anything for this SDDbgValue
* This SDDbgValue has been emitted
This patch introduces distinct "Emitted" and "Invalidated" fields to the
SDDbgValue class, updates users accordingly, and generates "undef"
DBG_VALUEs for invalidated records. Awkwardly, there are circumstances
where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll
which I've preserved.
Differential Revision: https://reviews.llvm.org/D55372
llvm-svn: 348751
Fixes https://bugs.llvm.org/show_bug.cgi?id=39926.
The size of the first copy was computed as
std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in
skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as
I can see, this should just be LdDisp2 - LdDisp1. The case where
LdDisp1 > LdDisp2 is already handled in the code above, in which case
LdDisp2 is set to LdDisp1 and this subtraction will evaluate to
Size1 = 0, which is the correct value to skip an overlapping copy.
Differential Revision: https://reviews.llvm.org/D55485
llvm-svn: 348750
Summary: This should avoid failing on old CPUs that do not have a cycle counter.
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D55416
llvm-svn: 348740
Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above. Aren't we just extracting the single element after taking the min/max in the vector register?
Reviewers: RKSimon, spatel, ABataev
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55480
llvm-svn: 348739
Both intrinsics do the exact same thing so we really only need one.
Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency.
llvm-svn: 348737
Since TBEHandler doesn't maintain state or otherwise have any need to be
a class right now, the read and write functions have been moved out and
turned into standalone functions. Additionally, the TBE read function
has been updated to return an Expected value for better error handling.
Tests have been updated to reflect these changes.
Differential Revision: https://reviews.llvm.org/D55450
llvm-svn: 348735
Summary:
When bugpoint attempts to find the other executables it needs to run,
such as `opt` or `clang`, it tries searching the user's PATH. However,
in many cases, the 'bugpoint' executable is part of an LLVM build, and
the 'opt' executable it's looking for is in that same directory.
Many LLVM tools handle this case by using the `Paths` parameter of
`llvm::sys::findProgramByName`, passing the parent path of the currently
running executable. Do this same thing for bugpoint. However, to
preserve the current behavior exactly, first search the user's PATH,
and then search for 'opt' in the directory containing 'bugpoint'.
Test Plan:
`check-llvm`. Many of the existing bugpoint tests no longer need to use the
`--opt-command` option as a result of these changes.
Reviewers: MatzeB, silvas, davide
Reviewed By: MatzeB, davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D54884
llvm-svn: 348734
Summary:
`llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods
defined on these classes, such as `addAttribute`, return a new immutable
object with the attribute added. In https://reviews.llvm.org/D55217 I attempted
to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since
calling these methods has no side-effects, and so ignoring the result
that is returned is almost certainly a programmer error.
However, committing the change resulted in new warnings in the AMDGPU target.
The AMDGPU simplify libcalls pass added in https://reviews.llvm.org/D36436
attempts to add the readonly and nounwind attributes to simplified
library functions, but instead calls the `addAttribute` methods and
ignores the result.
Modify the simplify libcalls pass to actually add the nounwind and
readonly attributes. Also update the simplify libcalls test to assert
that these attributes are actually being set.
Reviewers: rampitec, vpykhtin, rnk
Reviewed By: rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D55435
llvm-svn: 348732
This trait is used by several AST visitor classes to control whether the AST is visiting const nodes or non-const nodes. These uses cannot be easily replaced with the STL traits directly due to use of an unspecialized templated when a type is expected (due to the template template parameter involved).
llvm-svn: 348729
Someday we'd like to remove old autoupgrade code so it helps to annotate how long its been there so we don't have to go digging through commit history.
llvm-svn: 348728
Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know its 0 then we don't need to bother.
This should go a long way towards fixing PR24545.
llvm-svn: 348727
The dependency was added in r213995 in response to r213986 which did make
X86/Utils depend on IR, but r256680 later removed that dependency again.
llvm-svn: 348724
The existing code tries to handle an undef operand while transforming an add to an LEA,
but it's incomplete because we will crash on the i16 test with the debug output shown below.
It's better to just give up instead. Really, GlobalIsel should have folded these before we
could get into trouble.
# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected
bb.0 (%ir-block.0):
liveins: $edi
%1:gr32 = COPY killed $edi
%0:gr16 = COPY %1.sub_16bit:gr32
%5:gr64_nosp = IMPLICIT_DEF
%5.sub_16bit:gr64_nosp = COPY %0:gr16
%6:gr64_nosp = IMPLICIT_DEF
%6.sub_16bit:gr64_nosp = COPY %2:gr16
%4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg
%3:gr16 = COPY killed %4.sub_16bit:gr32
$ax = COPY killed %3:gr16
RET 0, implicit killed $ax
# End machine code for function add_undef_i16.
*** Bad machine code: Reading virtual register without a def ***
- function: add_undef_i16
- basic block: %bb.0 (0x7fe6cd83d940)
- instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16
- operand 1: %2:gr16
LLVM ERROR: Found 1 machine code errors.
Differential Revision: https://reviews.llvm.org/D54710
llvm-svn: 348722
Extension to rL348617, turns out llvm-exegesis doesn't need to match the perf counter name against a scheduler model resource name - so I've added a few more counters that I could find in the libpfm4 source code (and fix a typo in the knl/knm retired_uops counter - which uses 'all' instead of 'any').
llvm-svn: 348721
The test file shows a case where the avoid store forwarding block
pass misses to copy a range (-1..1) when the load displacement
changes sign.
Baseline test for D55485.
llvm-svn: 348712
PE/COFF sections can have section names truncated to 8 chars, in order to
have the name available at runtime. (The string table, where long untruncated
names are stored, isn't loaded at runtime.)
This allows various llvm tools to dump the .eh_frame section from such
executables.
Patch by Peiyuan Song!
Differential Revision: https://reviews.llvm.org/D55407
llvm-svn: 348708
This is effectively re-committing the changes from:
rL347917 (D54640)
rL348195 (D55126)
...which were effectively reverted here:
rL348604
...because the code had a bug that could induce infinite looping
or eventual out-of-memory compilation.
The bug was that this code did not guard against transforming
opaque constants. More details are in the post-commit mailing
list thread for r347917. A reduced test for that is included
in the x86 bool-math.ll file. (I wasn't able to reduce a PPC
backend test for this, but it was almost the same pattern.)
Original commit message for r347917:
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.
Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc
sequences that don't get folded in IR.
As the TODO comments suggest, there will be regressions if we extend this (for x86,
we mostly seem to be missing LEA opportunities, but there are likely vector folds
missing too). I think those should be considered existing bugs because this is the
same transform that we do as an IR canonicalization in instcombine. We just need
more tests to make those visible independent of this patch.
llvm-svn: 348706
Summary:
WasmSignature used to use its `WasmSignature` member variable only for
function types, but now it also can be used for events as well.
Reviewers: sbc100
Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55247
llvm-svn: 348702
Summary:
This anoymous function actually has same logic with `Obj->toMappedAddr`.
Besides, I have a question on resolving illegal value. `gnu-readelf`, `gnu-objdump` and `llvm-objdump` could parse the test file 'test/tools/llvm-objdump/Inputs/private-headers-x86_64.elf', but `llvm-readobj` will fail when parse `DT_RELR` segment. Because, the value is 0x87654321 which is illegal. So, shall we do this clean up rather then remove the checking statements inside anoymous function?
```
if (Delta >= Phdr.p_filesz)
return createError("Virtual address is not in any segment");
```
Reviewers: rupprecht, jhenderson
Reviewed By: jhenderson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55329
llvm-svn: 348701
These nodes should have two results. A real VT and a Glue. But this code would have returned Undef which would only be a single result. But we're in the single result version of getNode so these opcodes should never be seen by this function anyway.
llvm-svn: 348670
It seems to be unexpectedly passing on some bots probably because it requires asserts to fail, but doesn't say that. But we already have a patch in review to make it not xfail so I'd rather just focus on getting it passing rather than trying to figure out an unexpected pass.
llvm-svn: 348661
We were still using the rounded down offset and alignment even though
they aren't handled because you can't trivially bitcast the loaded
value.
llvm-svn: 348658
`Saver` is a StringSaver, which has a few overloads of `save` that all
ultimately just call `StringRef save(StringRef)`. Just take a StringRef
here instead of building up a std::string to convert it to a StringRef.
llvm-svn: 348650
Summary:
- LLVM clang-format style doesn't allow one-line ifs.
- LLVM clang-tidy style says method names should start with a lowercase
letter. But currently WebAssemblyAsmParser's parent class
MCTargetAsmParser is mixing lowercase and uppercase method names
itself so overridden methods cannot be renamed now.
- Changed else ifs after returns to ifs.
- Added some newlines for readability.
Reviewers: aardappel, sbc100
Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55350
llvm-svn: 348648
Currently memcpyopt optimizes cases like
memset(a, byte, N);
memcpy(b, a, M);
to
memset(a, byte, N);
memset(b, byte, M);
if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.
This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.
This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.
For the implementation, I'm reusing a bit of code for a similar existing
optimization (direct memcpy of undef). I've also added memset support to
MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be
used, but it seems to make more sense to add this to GetLocation and thus
make the computation cachable.
Differential Revision: https://reviews.llvm.org/D55120
llvm-svn: 348645
The splitting pass uses its 'unlikelyExecuted' predicate to statically
decide which blocks are cold.
- Do not treat noreturn calls as if they are cold unless they are actually
marked cold. This is motivated by functions like exit() and longjmp(), which
are not beneficial to outline.
- Do not treat inline asm as an outlining barrier. In practice asm("") is
frequently used to inhibit basic block merging; enabling outlining in this case
results in substantial memory savings.
- Treat invokes of cold functions as cold.
As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's
no longer needed. The pass can identify & outline blocks dominated by EH pads,
so there's no need to special-case __cxa_begin_catch etc.
Differential Revision: https://reviews.llvm.org/D54244
llvm-svn: 348640
Algorithm: Identify maximal cold regions and put them in a worklist. If
a candidate region overlaps with another, discard it. While the worklist
is full, remove a single-entry sub-region from the worklist and attempt
to outline it. By the non-overlap property, this should not invalidate
parts of the domtree pertaining to other outlining regions.
Testing: LNT results on X86 are clean. With test-suite + externals, llvm
outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file
483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm
outlines over 100 times in some functions (mostly EH paths). There was
not a significant performance impact pre vs. post-patch.
Differential Revision: https://reviews.llvm.org/D53887
llvm-svn: 348639
Previously we would create an lldb::Function object for each function
parsed, but we would not add these to the clang AST. This is a first
step towards getting local variable support working, as we first need an
AST decl so that when we create local variable entries, they have the
proper DeclContext.
Differential Revision: https://reviews.llvm.org/D55384
llvm-svn: 348631
We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width.
So for example on 8i32 on an sse target. Were were calculating the cost of an 8i32 op which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops. For a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.
There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two src permute shuffles.
Differential Revision: https://reviews.llvm.org/D55397
llvm-svn: 348621
To make X86CondBrFoldingPass can be run with --run-pass option, this can test one wrong assertion on analyzeCompare function for SUB32ri when its operand is not imm
Patch by Jianping Chen
Differential Revision: https://reviews.llvm.org/D55412
llvm-svn: 348620
This patch attempts to improve pfm perf counter coverage for all the x86 CPUs that libpfm4 supports.
Intel/AMD CPU families tend to share names for cycle/uops counters so even if they don't have a scheduler model yet they can at least use the default values (checked against the libpfm4 source code).
The remaining CPUs (where their port/pipe resource counters are known) I've tried to add to the existing model mappings.
These are untested but don't represent a regression to current llvm-exegesis behaviour for these CPUs.
Differential Revision: https://reviews.llvm.org/D55432
llvm-svn: 348617
As discussed in the post-commit thread of r347917, this
transform is fighting with an existing transform causing
an infinite loop or out-of-memory, so this is effectively
reverting r347917 and its follow-up r348195 while we
investigate the bug.
llvm-svn: 348604
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.
Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbirary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)
The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.
Differential Revision: https://reviews.llvm.org/D55297
llvm-svn: 348602
This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable.
It performs:
AND s0, s0, ~(1 << n) -> BITSET0 s0, n
OR s0, s0, (1 << n) -> BITSET1 s0, n
AND s0, s1, x -> ANDN2 s0, s1, ~x
OR s0, s1, x -> ORN2 s0, s1, ~x
XOR s0, s1, x -> XNOR s0, s1, ~x
In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7ffffffff -> bitset0 x, 31 and or x, 0x80000000 -> bitset1 x, 31).
llvm-svn: 348601
This patch introduces a new instinsic `@llvm.experimental.widenable_condition`
that allows explicit representation for guards. It is an alternative to using
`@llvm.experimental.guard` intrinsic that does not contain implicit control flow.
We keep finding places where `@llvm.experimental.guard` is not supported or
treated too conservatively, and there are 2 reasons to that:
- `@llvm.experimental.guard` has memory write side effect to model implicit control flow,
and this sometimes confuses passes and analyzes that work with memory;
- Not all passes and analysis are aware of the semantics of guards. These passes treat them
as regular throwing call and have no idea that the condition of guard may be used to prove
something. One well-known place which had caused us troubles in the past is explicit loop
iteration count calculation in SCEV. Another example is new loop unswitching which is not
aware of guards. Whenever a new pass appears, we potentially have this problem there.
Rather than go and fix all these places (and commit to keep track of them and add support
in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible.
The only significant difference between guards and regular explicit branches is that guard's condition
can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor,
and it is always legal to go there no matter what the guard condition is. The other successor is
a guarded block, and it is only legal to go there if the condition is true.
This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard`
intrinsic. Now a widenable guard can be represented in the CFG explicitly like this:
%widenable_condition = call i1 @llvm.experimental.widenable.condition()
%new_condition = and i1 %cond, %widenable_condition
br i1 %new_condition, label %guarded, label %deopt
guarded:
; Guarded instructions
deopt:
call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]
The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an
`undef`, but the intrinsic prevents the optimizer from folding it early. This form
should exploit all optimization boons provided to `br` instuction, and it still can be
widened by replacing the result of `@llvm.experimental.widenable.condition()`
with `and` with any arbitrary boolean value (as long as the branch that is taken when
it is `false` has a deopt and has no side-effects).
For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using
implicit control flow in guards".
This patch introduces this new intrinsic with respective LangRef changes and a pass
that converts old-style guards (expressed as intrinsics) into the new form.
The naming discussion is still ungoing. Merging this to unblock further items. We can
later change the name of this intrinsic.
Reviewed By: reames, fedor.sergeev, sanjoy
Differential Revision: https://reviews.llvm.org/D51207
llvm-svn: 348593
When we had dynamic call frames (i.e. sp adjustment around each call) we
were including that adjustment into offsets calculated based on r6, even
though it's only sp that changes. This led to incorrect stack slot
accesses.
llvm-svn: 348591
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.
Differential Revision: https://reviews.llvm.org/D50141
llvm-svn: 348585
Summary: This line is longer than 80 characters.
Subscribers: llvm-commits, jakehehrlich
Differential Revision: https://reviews.llvm.org/D55419
llvm-svn: 348580
Summary: This line is longer than 80 characters.
Subscribers: llvm-commits, jakehehrlich
Differential Revision: https://reviews.llvm.org/D55419
llvm-svn: 348578
It was failing as below. Adding a triple seems to help.
--
: 'RUN: at line 2'; /work/llvm.combined/build.release/bin/llvm-mca -march=aarch64 -mcpu=exynos-m1 -resource-pressure=false < /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s | /work/llvm.combined/build.release/bin/FileCheck /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s -check-prefixes=ALL,M1
: 'RUN: at line 3'; /work/llvm.combined/build.release/bin/llvm-mca -march=aarch64 -mcpu=exynos-m3 -resource-pressure=false < /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s | /work/llvm.combined/build.release/bin/FileCheck /work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s -check-prefixes=ALL,M3
--
Exit Code: 1
Command Output (stderr):
--
/work/llvm.combined/llvm/test/tools/llvm-mca/AArch64/Exynos/direct-branch.s:36:12: error: M1-NEXT: expected string not found in input
^
<stdin>:21:2: note: scanning from here
1 0 0.25 b Ltmp0
^
--
llvm-svn: 348577
has_key has been removed in Python 3. The in comparison operator can be used
instead.
Differential Revision: https://reviews.llvm.org/D55310
llvm-svn: 348576
Summary:
Allow clients to suppress setup of default RPATHs in designated library targets. This is used in LLDB when emitting liblldb as a framework bundle, which itself doesn't load further RPATH-dependent libraries.
This follows the approach in add_llvm_executable().
Reviewers: aprantl, JDevlieghere, davide, friss
Reviewed By: aprantl
Subscribers: mgorny, lldb-commits, llvm-commits, #lldb
Differential Revision: https://reviews.llvm.org/D55316
llvm-svn: 348573
In some cases different alignments for function might be used to save
space e.g. thumb mode with -Oz will try to use 2 byte function
alignment. Similar patch that fixed this in other areas exists here
https://reviews.llvm.org/D46110
This was approved previously https://reviews.llvm.org/D55115 (r348215)
but when committed it caused failures on the sanitizer buildbots when
building llvm with clang (containing this patch). This is now fixed
because I've added a check to see if getting the parent module returns
null if it does then set the alignment to 0.
Differential Revision: https://reviews.llvm.org/D55115
llvm-svn: 348571
The current algorithm that collects live/dead/inloop blocks relies on some invariants
related to RPO and PO traversals. In particular, the important fact it requires is that
the only loop's latch is the first block in PO traversal. It also relies on fact that during
RPO we visit all prececessors of a block before we visit this block (backedges ignored).
If a loop has irreducible non-loop cycle inside, both these assumptions may break.
This patch adds detection for this situation and prohibits the terminator folding
for loops with irreducible CFG.
We can in theory support this later, for this some algorithmic changes are needed.
Besides, irreducible CFG is not a frequent situation and we can just don't bother.
Thanks @uabelho for finding this!
Differential Revision: https://reviews.llvm.org/D55357
Reviewed By: skatkov
llvm-svn: 348567
Fix assert about using an undefined physical register in machine instruction verify pass.
The reason is that register flag undef is missing when doing transformation from If Conversion Pass.
```
Bad machine code: Using an undefined physical register
- function: func_65
- basic block: %bb.0 entry (0x10024740738)
- instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3
- operand 0: killed $cr5lt
LLVM ERROR: Found 1 machine code errors.
```
There are also other existing testcases with same issue. So I add -verify-machineinstrs option to open verifying.
Differential Revision: https://reviews.llvm.org/D55408
llvm-svn: 348566
When CodeExtractor outlines values which are used by the original
function, it must store those values in some in-out parameter. This
store instruction must not be inserted in between a PHI and an EH pad
instruction, as that results in invalid IR.
This fixes the following verifier failure seen while outlining within
ObjC methods with live exit values:
The unwind destination does not have an exception handling instruction!
%call35 = invoke i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %exn.adjusted, i8* %1)
to label %invoke.cont34 unwind label %lpad33, !dbg !4183
The unwind destination does not have an exception handling instruction!
invoke void @objc_exception_throw(i8* %call35) #12
to label %invoke.cont36 unwind label %lpad33, !dbg !4184
LandingPadInst not the first non-PHI instruction in the block.
%3 = landingpad { i8*, i32 }
catch i8* null, !dbg !1411
rdar://46540815
llvm-svn: 348562
If this is not a valid way to assign an SDLoc, then we get this
wrong all over SDAG.
I don't know enough about the SDAG to explain this. IIUC, theoretically,
debug info is not supposed to affect codegen. But here it has clearly
affected 3 different targets, and the x86 change is an actual improvement.
llvm-svn: 348552
Change the ELF YAML implementation of TextAPI so NeededLibs uses flow
sequence vector correctly instead of overriding the YAML implementation
for std::vector<std::string>>.
This should fix the test failure with the LLVM_LINK_LLVM_DYLIB build mentioned in D55381.
Still passes existing tests that cover this.
Differential Revision: https://reviews.llvm.org/D55390
llvm-svn: 348551
We shouldn't care about the debug location for a node that
we're creating, but attaching the root of the pattern should
be the best effort. (If this is not true, then we are doing
it wrong all over the SDAG).
This is no-functional-change-intended, and there are no
regression test diffs...and that's what I expected. But
there's a similar line above this diff, where those
assumptions apparently do not hold.
llvm-svn: 348550
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.
The getDemandedBits() method can now only be called on instructions that
have integer or vector of integer type. Previously it could be called on
any sized instruction (even if it was not particularly useful). The size
of the return value is now always the scalar size in bits (while
previously it was the type size in bits).
Differential Revision: https://reviews.llvm.org/D55297
llvm-svn: 348549
This addresses a FIXME and avoids depending on an isel pattern match I think. I've remove the isel patterns too since he have no lit tests left that cover them. Hopefully that really means they are unused.
I'm trying to decide if we need SETCC_CARRY. This removes one of its usages.
Differential Revision: https://reviews.llvm.org/D55355
llvm-svn: 348536
This was probably organized as it was because bswap is a unary op.
But that's where the similarity to the other opcodes ends. We should
not limit this transform to scalars, and we should not try it if
either input has other uses. This is another step towards trying to
clean this whole function up to prevent it from causing infinite loops
and memory explosions.
Earlier commits in this series:
rL348501
rL348508
rL348518
llvm-svn: 348534
Unlike some of the folds in hoistLogicOpWithSameOpcodeHands()
above this shuffle transform, this has the expected hasOneUse()
checks in place.
llvm-svn: 348523
This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:
concat_vectors( bitcast (scalar_to_vector %A), UNDEF)
--> bitcast (scalar_to_vector %A)
This patch only partially addresses PR39257. In particular, it is enough to fix
one of the two problematic cases mentioned in PR39257. However, it is not enough
to fix the original test case posted by Craig; that particular case would
probably require a more complicated approach (and knowledge about used bits).
Before this patch, we used to generate the following code for function PR39257
(-mtriple=x86_64 , -mattr=+avx):
vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
vxorps %xmm1, %xmm1, %xmm1
vblendps $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
vmovaps %ymm0, (%rsi)
vzeroupper
retq
Now we generate this:
vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
vmovaps %ymm0, (%rsi)
vzeroupper
retq
As a side note: that VZEROUPPER is completely redundant...
I guess the vzeroupper insertion pass doesn't realize that the definition of
%xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on
%-mcpu=btver2, we don't get that vzeroupper because pass vzeroupper insertion
%pass is disabled.
Differential Revision: https://reviews.llvm.org/D55274
llvm-svn: 348522
The PPC test with 2 extra uses seems clearly better by avoiding this transform.
With 1 extra use, we also prevent an extra register move (although that might
be an RA problem). The general rule should be to only make a change here if
it is always profitable. The x86 diffs are all neutral.
llvm-svn: 348518
This reverts commit r348203 and reapplies D55085 with an additional
GCOV bugfix to make the change NFC for relative file paths in .gcno files.
Thanks to Ilya Biryukov for additional testing!
Original commit message:
Update Diagnostic handling for changes in CFE.
The clang frontend no longer emits the current working directory for
DIFiles containing an absolute path in the filename: and will move the
common prefix between current working directory and the file into the
directory: component.
https://reviews.llvm.org/D55085
llvm-svn: 348512
The AVX512 diffs are neutral, but the bswap test shows a clear overreach in
hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can
increase the instruction count.
This could also fight with transforms trying to go in the opposite direction
and possibly blow up/infinite loop. This might be enough to solve the bug
noted here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html
I did not add the hasOneUse() checks to all opcodes because I see a perf
regression for at least one opcode. We may decide that's irrelevant in the
face of potential compiler crashing, but I'll see if I can salvage that first.
llvm-svn: 348508
Tweak write_cmake_config.py to also handle variable references looking @FOO@
(matching CMake's configure_file() function), and make it replace '\' 'n' in
values with a newline literal since there's no good portable way of passing a
real newline literal on a command line.
Use that to process all the .def.in files in llvm/include/Config and add
llvm/lib/Target/BUILD.gn, which (indirectly, through llvm-c/Target.h) includes
them.
Differential Revision: https://reviews.llvm.org/D55184
llvm-svn: 348503
VarStreamArray was built on the assumption that it is backed by a
StreamRef, and offset 0 of that StreamRef is the first byte of the first
record in the array.
This is a logical and intuitive assumption, but unfortunately we have
use cases where it doesn't hold. Specifically, a PDB module's symbol
stream is prefixed by 4 bytes containing a magic value, and the first
byte of record data in the array is actually at offset 4 of this byte
sequence.
Previously, we would just truncate the first 4 bytes and then construct
the VarStreamArray with the resulting StreamRef, so that offset 0 of the
underlying stream did correspond to the first byte of the first record,
but this is problematic, because symbol records reference other symbol
records by the absolute offset including that initial magic 4 bytes. So
if another record wants to refer to the first record in the array, it
would say "the record at offset 4".
This led to extremely confusing hacks and semantics in loading code, and
after spending 30 minutes trying to get some math right and failing, I
decided to fix this in the underlying implementation of VarStreamArray.
Now, we can say that a stream is skewed by a particular amount. This
way, when we access a record by absolute offset, we can use the same
values that the records themselves contain, instead of having to do
fixups.
Differential Revision: https://reviews.llvm.org/D55344
llvm-svn: 348499
Initial step towards making the function more generic (and probably move into SelectionDAG).
This is necessary to avoid massive codegen bloat for PR38243 (Add modulo rotate support to LowerRotate).
llvm-svn: 348498
Summary:
If the output of debug directives only is requested, we should drop
emission of ',debug' option from the target directive. Required for
supporting of nvprof profiler.
Reviewers: echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46061
llvm-svn: 348497
Partial Redundancy Elimination of GEPs prevents CodeGenPrepare from
sinking the addressing mode computation of memory instructions back
to its uses. The problem comes from the insertion of PHIs, which
confuse CGP and make it bail.
I've autogenerated the check lines of an existing test and added a
store instruction to demonstrate the motivation behind this change.
The store is now using the gep instead of a phi.
Differential Revision: https://reviews.llvm.org/D55009
llvm-svn: 348496
Summary:
We may end up with not emitted debug directives at the end of the module
emission. Patch fixes this problem emitting those last directives the
end of the module emission.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D54320
llvm-svn: 348495
This patch splits backend features currently
hidden behind architecture versions.
For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.
This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html
Reviewers: DavidSpickett, olista01, t.p.northover
Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio
Differential revision: https://reviews.llvm.org/D54633
--
It was reverted in rL348249 due a build bot failure in one of the
regression tests:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386
The problem seems to be that FileCheck behaves
different in windows and linux. This new patch
splits the test file in multiple,
and does more exact pattern matching attempting
to circumvent the issue.
llvm-svn: 348493
This reverts commit r348457.
The original commit causes clang to crash when doing an instrumented
build with a new pass manager. Reverting to unbreak our integrate.
llvm-svn: 348484
...yet!
A lot of the current code should be shared for arm and thumb mode, but
until we add tests and work out some of the details (e.g. checking the
correct subtarget feature for G_SDIV) it's safer to bail out as early as
possible for thumb targets.
This should have arguably been part of r348347, which allowed Thumb
functions to be handled by the IR Translator.
llvm-svn: 348472
The test was fully rewritten for simplification.
New test code was suggested by David Blaikie.
Differential revision: https://reviews.llvm.org/D55261
llvm-svn: 348464
I was finally able to quantify what i thought was missing in the fix,
it was vector constants. If we have a scalar (and %x, -1),
it will be instsimplified before we reach this code,
but if it is a vector, we may still have a -1 element.
Thus, we want to avoid the fold if *at least one* element is -1.
Or in other words, ignoring the undef elements, no sign bits
should be set. Thus, m_NonNegative().
A follow-up for rL348181
https://bugs.llvm.org/show_bug.cgi?id=39861
llvm-svn: 348462
This patch teaches LoopSimplifyCFG to delete loop blocks that have
become unreachable after terminator folding has been done.
Differential Revision: https://reviews.llvm.org/D54023
Reviewed By: anna
llvm-svn: 348457
I just hard core goofed when I wrote this and created a different name
for no good reason. I'm failry aware of most "fresh" users of llvm-objcopy
(that is, users which are not using it as a drop in replacement for GNU
objcopy) and can say that only "-j" is being used by such people so this
patch should strictly increase compatibility and not remove it.
Differential Revision: https://reviews.llvm.org/D52180
llvm-svn: 348446
The code emitting AND-subtrees used to check whether any of the operands
was an OR in order to figure out if the result needs to be negated.
However the OR could be hidden in further subtrees and not immediately
visible.
Change the code so that canEmitConjunction() determines whether the
result of the generated subtree needs to be negated. Cleanup emission
logic to use this. I also changed the code a bit to make all negation
decisions early before we actually emit the subtrees.
This fixes http://llvm.org/PR39550
Differential Revision: https://reviews.llvm.org/D54137
llvm-svn: 348444
Refactoring.
This map was only used when we used a string of integers to output the outlined
sequence. Since it's no longer used for anything, there's no reason to keep it
around.
llvm-svn: 348432
These opcodes are intended to subsume some of the capability of G_MERGE_VALUES,
as it was too powerful and thus complex to add deal with throughout the GISel
pipeline.
G_BUILD_VECTOR creates a vector value from a sequence of uniformly typed
scalar values. G_BUILD_VECTOR_TRUNC is a special opcode for handling scalar
operands which are larger than the destination vector element type, and
therefore does an implicit truncate.
G_CONCAT_VECTOR creates a vector by concatenating smaller, uniformly typed,
vectors together.
These will be used in a subsequent commit. This commit just adds the initial
infrastructure.
Differential Revision: https://reviews.llvm.org/D53594
llvm-svn: 348430
More refactoring.
Since the pruning logic has changed, and the candidate list is gone,
everything can be sunk into findCandidates.
We no longer need to keep track of the length of the longest substring, so we
can drop all of that logic as well.
After this, we just find all of the candidates and move to outlining.
llvm-svn: 348428
More refactoring.
After the changes to the pruning logic, and removing CandidateList, there's
no reason for Candiates to be shared_ptrs (or pointers at all).
std::shared_ptr<Candidate> -> Candidate.
llvm-svn: 348427
This change caused SEGVs in instcombine. (The r347934 change seems to me to be a
precipitating cause, not a root cause. Details are on the llvm-commits thread
for r347934.)
llvm-svn: 348426
Summary:
We decided to change the event section code from 12 to 13 as new
`DataCount` section in the bulk memory operations proposal will take the
code 12 instead.
Reviewers: sbc100
Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55343
llvm-svn: 348424
Extracting from a splat constant is always handled by InstSimplify.
Move the test for this from InstCombine to InstSimplify to make
sure that stays true.
llvm-svn: 348423
Since we're now performing outlining per OutlinedFunction rather than per
Candidate, we can simply outline each candidate as it shows up.
Instead of having a pruning phase, instead, we'll outline entire functions.
Then we'll update the UnsignedVec we mapped to reflect the deletion. If any
candidate is in a space that's marked dirty, then we'll drop it.
This lets us remove the pruning logic entirely, and greatly simplifies the
code.
llvm-svn: 348420
It looks like this isn't necessary (in any tests I've done, it results
in the global being described with no location or value in the imported
side - while it's still fully described in the place it's imported from)
& results in significant/pathological debug info growth to home these
location-less global variable descriptions on the import side.
This is a rather pressing/important issue to address - this regressed
executable size for one example I'm looking at by 15%, object size is probably
similar though I haven't measured it, and a 22x increase in the number of CUs
in the cu_index in split DWARF DWP files, creating a similarly large regression
in the time it takes llvm-symbolizer to run on such binaries.
Reviewers: tejohnson, evgeny777
Differential Revision: https://reviews.llvm.org/D55309
llvm-svn: 348416
Mostly NFC, only change is the order of outlined function names.
Loop over the outlined functions instead of walking the candidate list.
This is a bit easier to understand. It's far more natural to create a function,
then replace all of its occurrences with calls than the other way around.
The functions outlined after this do not change, but their names will be
decided by their benefit. E.g, OUTLINED_FUNCTION_0 will now always be the
most beneficial function, rather than the first one seen.
This makes it easier to enforce an ordering on the outlined functions. So,
this also adds a test to make sure that the ordering works as expected.
llvm-svn: 348414
https://reviews.llvm.org/D54980
This provides a standard API across GISel passes to observe and notify
passes about changes (insertions/deletions/mutations) to MachineInstrs.
This patch also removes the recordInsertion method in MachineIRBuilder
and instead provides method to setObserver.
Reviewed by: vkeles.
llvm-svn: 348406
Treat terminators which resume exception propagation as returning instructions
(at least, for the purposes of marking outlined functions `noreturn`). This is
to avoid inserting traps after calls to outlined functions which unwind.
rdar://46129950
llvm-svn: 348404
Some gardening/refactoring.
It's cleaner to copy the instructions into the MachineFunction using the first
candidate instead of going to the mapper.
Also, by doing this we can remove the Seq member from OutlinedFunction entirely.
llvm-svn: 348390
Summary:
r336838 allowed these to be toggleable.
r336858 reverted r336838.
r336943 made the generation of these sections conditional on LDPO_REL.
This commit brings back the toggle-ability. You can specify:
-plugin-opt=-function-sections
-plugin-opt=-data-sections
For your linker flags to disable the changes made in r336943.
Without toggling r336943 off, arm64 linux kernels linked with gold-plugin
see significant boot time regressions, but with r336943 outright reverted
x86_64 linux kernels linked with gold-plugin fail to boot.
Reviewers: pcc, void
Reviewed By: pcc
Subscribers: javed.absar, kristof.beyls, llvm-commits, srhines
Differential Revision: https://reviews.llvm.org/D55291
llvm-svn: 348389
Because we're potentially peeking through a bitcast in this transform,
we need to use overall bitwidths rather than number of elements to
determine when it's safe to proceed.
Should fix:
https://bugs.llvm.org/show_bug.cgi?id=39893
llvm-svn: 348383
Summary: debug intrinsics might be marked norecurse to enable the caller function to be norecurse and optimized if needed. This avoids code gen optimisation differences when -g is used, as in globalOpt.cpp:processInternalGlobal checks.
Reviewers: chandlerc, jmolloy, aprantl
Reviewed By: aprantl
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D55187
llvm-svn: 348381
Revert https://reviews.llvm.org/D55217 due to warnings-turned-into-errors in
AMGPU targets. I'll fix the warnings first, then re-commit this patch.
llvm-svn: 348375
Whenever we effectively take the address of a basic block we need to
manually update that basic block to reflect that fact or later passes
such as tail duplication and tail merging can break the invariants of
the code. =/ Sadly, there doesn't appear to be any good way of
automating this or even writing a reasonable assert to catch it early.
The change seems trivially and obviously correct, but sadly the only
really good test case I have is 1000s of basic blocks. I've tried
directly writing a test case that happens to make tail duplication do
something that crashes later on, but this appears to require an
*amazingly* complex set of conditions that I've not yet reproduced.
The change is technically covered by the tests because we mark the
blocks as having their address taken, but that doesn't really count as
properly testing the functionality.
llvm-svn: 348374
fidelity checking of RIP-based references to basic blocks and other
labels.
These labels are super important for SLH tests so we should keep them
readable in the test cases.
llvm-svn: 348373
Summary:
Many functions on `llvm::AttributeList` and `llvm::AttributeSet` are
documented with "returns a new {list,set} because attribute
{lists,sets} are immutable." This documentation can be aided by the
addition of an attribute, `LLVM_NODISCARD`. Adding this prevents
unsuspecting users of the API from expecting
`AttributeList::setAttributes` from modifying the underlying list.
At the very least, it would have saved me a few hours of debugging, since I
had been doing just that! I had a bug in my program where I was calling
`setAttributes` but then passing in the unmutated `AttributeList`.
I tried adding LLVM_NODISCARD and confirmed that it would have made my bug
immediately obvious.
Reviewers: rnk, javed.absar
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55217
llvm-svn: 348372
The tests here are based on the motivating cases from D54827.
More background:
1. We don't get these cases in general with SimplifyCFG because the root
of the pattern match is an icmp, not a branch. I'm not sure how often
we encounter this pattern vs. the seemingly more likely case with
branches, but I don't see evidence to leave the minimal pattern
unoptimized.
2. This has a chance of increasing compile-time because we're using a
ValueTracking call to handle the match. The motivating cases could be
handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/
isImpliedFalseByMatchingCmp, but I saw that we have a more
comprehensive wrapper around those, so we might as well use it here
unless there's evidence that it's significantly slower.
3. Ideally, we'd handle the fold to constants in InstSimplify, but as
with the existing code here, we could extend this to handle cases
where the result is not a constant, but a new combined predicate.
That would mean splitting the logic across the 2 passes and possibly
duplicating the pattern-matching cost.
4. As mentioned in D54827, this seems like the kind of thing that should
be handled in Correlated Value Propagation, but that pass is currently
limited to dealing with instructions with constant operands, so extending
this bit of InstCombine is the smallest/easiest way to get these patterns
optimized.
llvm-svn: 348367
Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions).
I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions.
llvm-svn: 348366