Summary: This implements searching for function symbols and public symbols by address.
More specifically,
-Implements NativeSession::findSymbolByAddress for function symbols and
public symbols. I think data symbols are also searched for, but isn't
implemented in this patch.
-Adds classes for NativeFunctionSymbol and NativePublicSymbol
-Adds a '-use-native-pdb-reader' option to llvm-symbolizer, for testing
purposes.
Reviewers: rnk, amccarth, labath
Subscribers: mgorny, hiraditya, MaskRay, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79269
Sometimes you want to disable a FileCheck directive without removing
it entirely, or you want to write comments that mention a directive by
name. The `COM:` directive makes it easy to do this. For example,
you might have:
```
; X32: pinsrd_1:
; X32: pinsrd $1, 4(%esp), %xmm0
; COM: FIXME: X64 isn't working correctly yet for this part of codegen, but
; COM: X64 will have something similar to X32:
; COM:
; COM: X64: pinsrd_1:
; COM: X64: pinsrd $1, %edi, %xmm0
```
Without this patch, you need to use some combination of rewording and
directive syntax mangling to prevent FileCheck from recognizing the
commented occurrences of `X32:` and `X64:` above as directives.
Moreover, FileCheck diagnostics have been proposed that might complain
about the occurrences of `X64` that don't have the trailing `:`
because they look like directive typos:
<http://lists.llvm.org/pipermail/llvm-dev/2020-April/140610.html>
I think dodging all these problems can prove tedious for test authors,
and directive syntax mangling already makes the purpose of existing
test code unclear. `COM:` can avoid all these problems.
This patch also updates the small set of existing tests that define
`COM` as a check prefix:
- clang/test/CodeGen/default-address-space.c
- clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
- clang/test/Driver/hip-device-libs.hip
- llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll
I think lit should support `COM:` as well. Perhaps `clang -verify`
should too.
Reviewed By: jhenderson, thopre
Differential Revision: https://reviews.llvm.org/D79276
Under MVE a vdup will always take a gpr register, not a floating point
value. During DAG combine we convert the types to a bitcast to an
integer in an attempt to fold the bitcast into other instructions. This
is OK, but only works inside the same basic block. To do the same trick
across a basic block boundary we need to convert the type in
codegenprepare, before the splat is sunk into the loop.
This adds a convertSplatType function to codegenprepare to do that,
putting bitcasts around the splat to force the type to an integer. There
is then some adjustment to the code in shouldSinkOperands to handle the
extra bitcasts.
Differential Revision: https://reviews.llvm.org/D78728
Similar to fmul/fadd, we can sink a splat into a loop containing a fma
in order to use more register instruction variants. For that there are
also adjustments to the sinking code to handle more than 2 arguments.
Differential Revision: https://reviews.llvm.org/D78386
This patch adds a new TTI hook to allow targets to tell LSR that
a chain including some instruction is already profitable and
should not be optimized. This patch also adds an implementation
of this TTI hook for ARM so LSR doesn't optimize chains that include
the VCTP intrinsic.
Differential Revision: https://reviews.llvm.org/D79418
Summary:
In the assembler or inline assembler,
attempting to use an invalid fixup type
gives a crash with a segmentation fault.
__attribute__((naked))
void foo(void) {
__asm__("mov r9, :lower16:bar(prel31)");
}
This should give a proper error message when building for ARM or Thumb.
This brings it in line with AARCH64.
This fixes all 8 instances of llvm_unreachable("Unsupported Modifier");
in ARM/MCTargetDesc/ARMELFObjectWriter.cpp.
A test is provided for each instance.
Reviewers: llvm-commits, MarkMurrayARM
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, danielkiss
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79782
Change-Id: I6971ba37f129cc453568fe71514ccb2ac9d16831
This was reverted because of a miscompilation. At closer inspection, the
problem was actually visible in a changed llvm regression test too. This
one-line follow up fix/recommit will splat the IV, which is what we are trying
to avoid if unnecessary in general, if tail-folding is requested even if all
users are scalar instructions after vectorisation. Because with tail-folding,
the splat IV will be used by the predicate of the masked loads/stores
instructions. The previous version omitted this, which caused the
miscompilation. The original commit message was:
If tail-folding of the scalar remainder loop is applied, the primary induction
variable is splat to a vector and used by the masked load/store vector
instructions, thus the IV does not remain scalar. Because we now mark
that the IV does not remain scalar for these cases, we don't emit the vector IV
if it is not used. Thus, the vectoriser produces less dead code.
Thanks to Ayal Zaks for the direction how to fix this.
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
Fix PR41509
Differential Revision: https://reviews.llvm.org/D79037
This patch extends DIModule Debug metadata in LLVM to support
Fortran modules. DIModule is extended to contain File and Line
fields, these fields will be used by Flang FE to create debug
information necessary for representing Fortran modules at IR level.
Furthermore DW_TAG_module is also extended to contain these fields.
If these fields are missing, debuggers like GDB won't be able to
show Fortran modules information correctly.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79484
Hide the method that allows setting probability for particular
edge and introduce a public method that sets probabilities for
all outgoing edges at once.
Setting individual edge probability is error prone. More over
it is difficult to check that the total probability is 1.0
because there is no easy way to know when the user finished
setting all the probabilities.
Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396
xsnegdp, xsabsdp and xsnabsdp can be used to operate on f32 operand.
This patch adds the missing patterns since we prefer VSX instructions
when available.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D75344
Legalizer should respect both command-line options or SDNode-level
fast-math flags.
Also, this patch propagates other flags during custom simplifying.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D79074
Summary:
The ppc-early-ret pass use the addReg() to add operand to the new
instruction, it can't reserve the flag of old operand. This has caused
machine verfications failed.
This patch use add() to instead of addReg().
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D77997
Fixes PR41696
The loop-reroll pass generates an invalid IR (or its assertion
fails in debug build) if values of the base instruction and
other root instructions (terms used in the loop-reroll pass)
are used outside the loop block. See IRs written in PR41696
as examples.
The current implementation of the loop-reroll pass can reroll
only loops that don't have values that are used outside the
loop, except reduced values (the last values of reduction chains).
This is described in the comment of the `LoopReroll::reroll`
function.
https://github.com/llvm/llvm-project/blob/llvmorg-10.0.0/llvm/lib/Transforms/Scalar/LoopRerollPass.cpp#L1600
This is checked in the `LoopReroll::DAGRootTracker::validate`
function.
https://github.com/llvm/llvm-project/blob/llvmorg-10.0.0/llvm/lib/Transforms/Scalar/LoopRerollPass.cpp#L1393
However, the base instruction and other root instructions skip
this check in the validation loop.
https://github.com/llvm/llvm-project/blob/llvmorg-10.0.0/llvm/lib/Transforms/Scalar/LoopRerollPass.cpp#L1229
Moving the check in front of the skip is the logically simplest
fix. However, inserting the check in an earlier stage is better
in terms of compilation time of unrerollable loops. This fix
inserts the check for the base instruction into the function
to validate possible base/root instructions. Check for other
root instructions is unnecessary because they don't match any
base instructions if they have uses outside the loop.
Differential Revision: https://reviews.llvm.org/D79549
For AAReturnedValues we treated new and existing information differently
in the updateImpl. Only the latter was properly analyzed and
categorized. The former was thought to be analyzed in the subsequent
update. Since the Attributor does not support "self-updates" we need to
make sure the state is "stable" after each updateImpl invocation. That
is, if the surrounding information does not change, the state is valid.
Now we make sure all return values have been handled and properly
categorized each iteration. We might not update again if we have not
requested a non-fix attribute so we cannot "wait" for the next update to
analyze a new return value.
Bug reported by @sdmitriev.
Summary:
This fixes PR45885 by fixing isGuaranteedNotToBeUndefOrPoison so it does not look into dominating
branch conditions of V when V is an instruction in an unreachable block.
Reviewers: spatel, nikic, lebedev.ri
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79790
We want to add a way to avoid merging identical calls so as to keep the
separate debug-information for those calls. There is also an asan
usecase where having this attribute would be beneficial to avoid
alternative work-arounds.
Here is the link to the feature request:
https://bugs.llvm.org/show_bug.cgi?id=42783.
`nomerge` is different from `noline`. `noinline` prevents function from
inlining at callsites, but `nomerge` prevents multiple identical calls
from being merged into one.
This patch adds `nomerge` to disable the optimization in IR level. A
followup patch will be needed to let backend understands `nomerge` and
avoid tail merge at backend.
Reviewed By: asbirlea, rnk
Differential Revision: https://reviews.llvm.org/D78659
We can produce such vectors in the Promote Alloca pass,
but we are unable to use movrel to operate it and lower
via scratch. Making it legal makes SI_INDIRECT patterns
work.
There is more work to do in subsequent changes:
1. We initialize m0 twice to access each dword. It shall
be possible to only do it once and increment base register
number instead.
2. We also need v16i64/v16f64 but these first need to be
added to tablegen.
Differential Revision: https://reviews.llvm.org/D79808
SDAG suffers when it can't see that a funnel operand is a splat value
(due to single-basic-block visibility), so invert the normal loop
hoisting rules to move a splat op closer to its use.
This would be part 1 of an enhancement similar to D63233.
This is needed to re-fix PR37426:
https://bugs.llvm.org/show_bug.cgi?id=37426
...because we got better at canonicalizing IR to funnel shift intrinsics.
The existing CGP code for shift opcodes is likely overstepping what it was
intended to do, so that will be fixed in a follow-up.
Differential Revision: https://reviews.llvm.org/D79718
Summary:
The SPE doesn't have a 'fma' instruction, so the intrinsic becomes a
libcall. It really should become an expansion to two instructions, but
for some reason the compiler doesn't think that's as optimal as a
branch. Since this lowering is done after CTR is allocated for loops,
tell the optimizer that CTR may be used in this case. This prevents a
"Invalid PPC CTR loop!" assertion in the case that a fma() function call
is used in a C/C++ file, and clang converts it into an intrinsic.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D78668
Summary:
This patch refactors handling of VarArgs in
X86TargetLowering::LowerFormalArguments.
That refactoring was requested while reviewing
D69372. Code related to varargs handling is removed
from X86TargetLowering::LowerFormalArguments and
is divided into smaller routines.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D74794
GNU ld's internal linker script uses (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=add44f8d5c5c05e08b11e033127a744d61c26aee)
.text :
{
*(.text.unlikely .text.*_unlikely .text.unlikely.*)
*(.text.exit .text.exit.*)
*(.text.startup .text.startup.*)
*(.text.hot .text.hot.*)
*(SORT(.text.sorted.*))
*(.text .stub .text.* .gnu.linkonce.t.*)
/* .gnu.warning sections are handled specially by elf.em. */
*(.gnu.warning)
}
Because `*(.text.exit .text.exit.*)` is ordered before `*(.text .text.*)`, in a -ffunction-sections build, the C library function `exit` will be placed before other functions.
gold's `-z keep-text-section-prefix` has the same problem.
In lld, `-z keep-text-section-prefix` recognizes `.text.{exit,hot,startup,unlikely,unknown}.*`, but not `.text.{exit,hot,startup,unlikely,unknown}`, to avoid the strange placement problem.
In -fno-function-sections or -fno-unique-section-names mode, a function whose `function_section_prefix` is set to `.exit"`
will go to the output section `.text` instead of `.text.exit` when linked by lld.
To address the problem, append a dot to become `.text.exit.`
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D79600
This patch folds redundant load immediates into a zero for instructions
which recognise this as the value zero and not the register. If the load
immediate is no longer in use it is then deleted.
This is already done in earlier passes but the ppc-mi-peephole allows for
a more general implementation.
Differential Revision: https://reviews.llvm.org/D69168
Summary:
This patch makes propagatesPoison be more accurate by returning true on
more bin ops/unary ops/casts/etc.
The changed test in ScalarEvolution/nsw.ll was introduced by
a19edc4d15 .
IIUC, the goal of the tests is to show that iv.inc's SCEV expression still has
no-overflow flags even if the loop isn't in the wanted form.
It becomes more accurate with this patch, so think this is okay.
Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, sanjoy
Reviewed By: spatel, nikic
Subscribers: regehr, nlopes, efriedma, fhahn, javed.absar, llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78615
We have a couple main strategies for legalizing MULH.
-If the vXi16 type is legal, extend to do the full i16 multiply
and then shift and truncate the results.
-Use unpcks to split each 128 bit lane into high and low halves.a
For signed we have an extra case to split a v32i8 to v16i8 and then
use the extending to v16i16 strategy.
This patch proposes to use the unpck strategy instead. Which is
what we already do for unsigned.
This seems to be 1 instruction shorter when the RHS is constant
like the idiv case. It's 1 instruction longer for the smulo case.
But we're trading cross lane shuffles for inlane shuffles and a
shift.
Differential Revision: https://reviews.llvm.org/D79652
Summary:
In 2e24219d3c, a number of ARM pcrel fixups were resolved at assembly
time, to solve PR44929. This only covered little-endian ARM however, so
add similar fixups for big-endian ARM. Also extend the test case to
cover big-endian ARM.
Reviewers: hans, psmith, MaskRay
Reviewed By: psmith, MaskRay
Subscribers: kristof.beyls, hiraditya, danielkiss, emaste, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79774
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79742
gcov 4.8 (r189778) moved the exit block from the last to the second.
The .gcda format is compatible with 4.7 but
* decoding libgcov 4.7 produced .gcda with gcov [4.7,8) can mistake the
exit block, emit bogus `%s:'%s' has arcs from exit block\n` warnings,
and print wrong `" returned %s` for branch statistics (-b).
* decoding libgcov 4.8 produced .gcda with gcov 4.7 has similar issues.
Also, rename "return block" to "exit block" because the latter is the
appropriate term.
Summary:
As commented in the code, ProfileSummaryAnalysis is required for inliner
pass to query, so this patch moved
RequireAnalysisPass<ProfileSummaryAnalysis> in the recently created
buildInlinerPipeline.
Reviewer: mtrofin, davidxl, tejohnson, dblaikie, jdoerfert, sstefan1
Reviewed By: mtrofin, davidxl, jdoerfert
Subscribers: hiraditya, steven_wu, dexonsmith, wuzish, llvm-commits,
jsji
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D79696
Summary:
ConstantExprs involving operations on <1 x Ty> could translate into MIR
that failed to verify with:
*** Bad machine code: Reading virtual register without a def ***
The problem was that translate(const Constant &C, Register Reg) had
recursive calls that passed the same Reg in for the translation of a
subexpression, but without updating VMap for the subexpression first as
translate(const Constant &C, Register Reg) expects.
Fix this by using the same translateCopy helper function that we use for
translating Instructions. In some cases this causes extra G_COPY
MIR instructions to be generated.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45576
Reviewers: arsenm, volkan, t.p.northover, aditya_nandakumar
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78378
getARMVPTBlockMask was an outdated function that only handled basic
block masks: T, TT, TTT and TTTT. This worked fine before the MVE
VPT Block Insertion Pass improvements as it was the only kind of
masks that it could generate, but now it can generate more complex
masks that uses E predicates, so it's dangerous to use that function
to calculate VPT/VPST block masks.
I replaced it with 2 different functions:
- expandPredBlockMask, in ARMBaseInfo. This adds an "E" or "T" at
the end of an existing PredBlockMask.
- recomputeVPTBlockMask, in Thumb2InstrInfo. This takes an iterator
to a VPT/VPST instruction and recomputes its block mask by looking
at the predicated instructions that follows it. This should be
used to recompute a block mask after removing/adding a predicated
instruction to the block.
The expandPredBlockMask function is pretty much imported from the MVE
VPT Blocks pass.
I had to change the ARMLowOverheadLoops and MVEVPTBlocks passes as well
so they could use these new functions.
Differential Revision: https://reviews.llvm.org/D78201