The code wasn't zero-extending correctly, so the comparison could
spuriously fail.
Adds some AArch64 tests to cover this case.
Inspired by D41791.
Differential Revision: https://reviews.llvm.org/D41798
llvm-svn: 322767
It appears that we haven't been prioritizing rules that contain nested
instructions properly. InstructionOperandMatcher didn't override
isHigherPriorityThan so it never compared the instructions/operands/predicates
inside nested instructions.
Fixes PR35926. Thanks to Diana Picus for the bug report.
llvm-svn: 322754
Get rid of DEBUG_FUNCTION_NAME symbols. When we actually debug
data, maybe we'll want somewhere to put it... but having a symbol
that just stores the name of another symbol seems odd.
It means you have multiple Symbols with the same name, one
containing the actual function and another containing the name!
Store the names in a vector on the WasmObjectFile when reading
them in. Also stash them on the WasmFunctions themselves.
The names are //not// "symbol names" or aliases or anything,
they're just the name that a debugger should show against the
function body itself. NB. The WasmObjectFile stores them so that
they can be exported in the YAML losslessly, and hence the tests
can be precise.
Enforce that the CODE section has been read in before reading
the "names" section. Requires minor adjustment to some tests.
Patch by Nicholas Wilson!
Differential Revision: https://reviews.llvm.org/D42075
llvm-svn: 322741
Trying to link
__attribute__((weak, visibility("hidden"))) extern int foo;
int *main(void) {
return &foo;
}
on OS X fails with
ld: 32-bit RIP relative reference out of range (-4294971318 max is +/-2GB): from _main (0x100000FAB) to _foo@0x00001000 (0x00000000) in '_main' from test.o for architecture x86_64
The problem being that 0 cannot be computed as a fixed difference from
%rip. Exactly the same issue exists on ELF and we can use the same
solution.
llvm-svn: 322739
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG.
Differential revision: https://reviews.llvm.org/D40922
llvm-svn: 322738
The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with a "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other).
Differential revision: https://reviews.llvm.org/D38378
llvm-svn: 322737
Summary:
Discovered while working on a patch to move alignment in
@llvm.memcpy/move/set from an arg into parameter attributes.
The current implementations of AttributeSet::removeAttribute() and
AttributeList::removeAttribute crash when attempting to remove the
alignment attribute. Currently, these implementations add the
to-be-removed attributes to an AttrBuilder and then remove
the builder from the list/set. Alignment is special in that it
must be added to a builder with an integer value for the alignment;
attempts to add alignment to a builder without a value is an error.
This change fixes the removeAttribute implementations for AttributeSet and
AttributeList to make them able to remove the alignment, and other similar,
attributes.
Reviewers: rnk, chandlerc, pete, javed.absar, reames
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41951
llvm-svn: 322735
Most are just replaced with instrs lists, but a few regexps have been further generalized to match more instructions with a single pattern.
llvm-svn: 322734
If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done.
We could probably could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement.
I've also restricted this to AVX2 only because we can only broadcast loads under AVX.
Differential Revision: https://reviews.llvm.org/D42086
llvm-svn: 322730
Summary: llc sometimes may not emit .cfi_startproc which makes func_dict to have less entries.
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D42144
llvm-svn: 322725
We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32 which is fine.
The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to cleanup the bitcast.
This fixes pr35972.
llvm-svn: 322724
candidates with coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
BRCTH is capable of a long branch which needs to be recognized during branch
relaxation. This is done by checking for ExtraRelaxSize == 0.
Review: Ulrich Weigand
llvm-svn: 322688
This assert typically happens if an unstructured CFG is passed
to the pass. This can happen if the pass is run independently
without the structurizer.
llvm-svn: 322685
With my bad luck I separately implemented the DomTree preservation
for ConstantFoldTerminator before r322401 was committed. Commit the
tests which I think still provide some value.
llvm-svn: 322683
Summary:
Loading a vector of 4 half-precision FP sometimes results in an LD1
of 2 single-precision FP + a reversal. This results in an incorrect
byte swap due to the conversion from little endian to big endian.
In order to generate the correct byte swap, it is easier to
generate the correct LD1 of 4 half-precision FP, thus avoiding the
subsequent reversal.
Reviewers: craig.topper, jmolloy, olista01
Reviewed By: olista01
Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D41863
llvm-svn: 322663
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.
llvm-svn: 322662
When the compressed instruction set is enabled, the 16-bit c.nop can be
generated if necessary.
Differential Revision: https://reviews.llvm.org/D41221
Patch by Shiva Chen.
llvm-svn: 322658
Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware
support, but only for conversions between float and double.
Also add the necessary boilerplate so that the LegalizerHelper can
introduce the required libcalls. This also works only for float and
double, but isn't too difficult to extend when the need arises.
llvm-svn: 322651
The match* functions have the annoying behavior of modifying its inputs.
Save and restore the inputs, just in case the early out for AVX512 is
hit. This is still not great and its only a matter of time this kind of
bug happens again, but I couldn't come up with a better pattern without
rewriting significant chunks of this code. Fixes PR35977.
llvm-svn: 322644
Summary:
Currently -glldb turns on emission of apple tables on all targets, but
lldb is only really capable of consuming them on darwin. Furthermore,
making lldb consume these tables is not straight-forward because of the
differences in how the debug info is distributed on darwin vs. elf
targets.
The darwin debug model assumes that the debug info (along with
accelerator tables) will either remain in the .o files or it will be
linked into a dsym bundle by a linker that knows how to merge these
tables. In the elf world, all present linkers will simply concatenate
these accelerator tables into the shared object. Since the tables are
not self-terminating, this renders the tables unusable, as the debugger
cannot pry the individual tables apart anymore.
It might theoretically be possible to make the tables work with split
dwarf, as that is somewhat similar to the apple .o model, but
unfortunately right now the combination of -glldb and -gsplit-dwarf
produces broken object files.
Until these issues are resolved there is no point in emitting the apple
tables for these targets. At best, it wastes space; at worst, it breaks
compilation and prevents the user from getting other benefits of -glldb.
Reviewers: probinson, aprantl, dblaikie
Subscribers: emaste, dim, llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D41986
llvm-svn: 322633
Summary:
In a follow-up commit I'll change the rules for emission of accelerator
tables, which means we won't be able to use them as a litmus test for
the debugger tuning options. Instead of sections, I base the test on the
presence/absence of some debug info attributes and opcodes:
LLDB - prefers DW_OP_form_tls_address and uses DW_AT_APPLE_optimized
GDB - prefers DW_OP_GNU_push_tls_address and does not use the optimized
attribute
SCE - prefers DW_OP_form_tls_address and does not use the optimized
attribute
Reviewers: probinson, aprantl, dblaikie
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41985
llvm-svn: 322630
Change symbol values in the stack_size section from being 8 bytes, to being a target dependent size.
Differential Revision: https://reviews.llvm.org/D42108
llvm-svn: 322619
We seem to be (logically) returning ArchExtKinds here in all cases, so
the return type should reflect that.
The static_cast is necessary because `A.ID` is actually an `unsigned`,
presumably since we use `decltype(A)` to represent extended attributes
for both ARM and AArch64, which use distinct `ArchExtKinds`.
We can't trivially make the same change for ARM, because one of the
values it returns is the bitwise-or of two `ARM::ArchExtKind`s.
llvm-svn: 322613