As frame pointer elimination isn't implemented until a later patch and we make
extensive use of update_llc_test_checks.py, this changes touches a lot of the
RISC-V tests.
Differential Revision: https://reviews.llvm.org/D39849
llvm-svn: 320357
This is a preparatory step for D34515.
This change:
- makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
- lowering is done by first converting the boolean value into the carry flag
using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
operations does the actual addition.
- for subtraction, given that ISD::SUBCARRY second result is actually a
borrow, we need to invert the value of the second operand and result before
and after using ARMISD::SUBE. We need to invert the carry result of
ARMISD::SUBE to preserve the semantics.
- given that the generic combiner may lower ISD::ADDCARRY and
ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
as well otherwise i64 operations now would require branches. This implies
updating the corresponding test for unsigned.
- add new combiner to remove the redundant conversions from/to carry flags
to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
- fixes PR34045
- fixes PR34564
- fixes PR35103
Differential Revision: https://reviews.llvm.org/D35192
llvm-svn: 320355
Introduces the AddrFI "addressing mode", which is necessary simply because
it's not possible to write a pattern that directly matches a frameindex.
Ensure callee-saved registers are accessed relative to the stackpointer. This
is necessary as callee-saved register spills are performed before the frame
pointer is set.
Move HexagonDAGToDAGISel::isOrEquivalentToAdd to SelectionDAGISel, so we can
make use of it in the RISC-V backend.
Differential Revision: https://reviews.llvm.org/D39848
llvm-svn: 320353
All files and parts of files related to microMIPS4R6 are removed.
When target is microMIPS4R6, errors are printed.
This is LLVM part of patch.
Differential Revision: https://reviews.llvm.org/D35625
llvm-svn: 320350
This matches AVX512 version and is more consistent overall. And improves our scheduler models.
In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses.
llvm-svn: 320325
This separates the CPU specific scheduler model includes to occur after the instructions. Moves the instruction includes between the basic scheduler information and the CPU specific scheduler models.
llvm-svn: 320313
CreateAddRecFromPHIWithCastsImpl() adds an IncrementNUSW overflow predicate
which allows the PSCEV rewriter to rewrite this scev expression:
(zext i8 {0, + , (trunc i32 step to i8)} to i32)
into
{0, +, (sext i8 (trunc i32 step to i8) to i32)}
But then it adds the wrong Equal predicate:
%step == (zext i8 (trunc i32 %step to i8) to i32).
instead of:
%step == (sext i8 (trunc i32 %step to i8) to i32)
This is fixed here.
Differential Revision: https://reviews.llvm.org/D40641
llvm-svn: 320298
This adds assembly & disassembly support for the e500mc "external pid"
instructions.
See https://reviews.llvm.org/D39249.
Patch by vit9696 <vit9696@avp.su>
llvm-svn: 320287
This affects CVTSD2SS, FMA, RCP28, RSQRT28, and SQRT scalar instructions
'b' here refers to 'sae' not broadcast. These aren't memory instructions.
llvm-svn: 320281
If the question mark is inside the parentheses it only applies to the single character proceeding it.
I had to make a few additional cleanups to fix some duplicate warnings that were exposed by fixing this.
llvm-svn: 320279
When the lowest bits of the operands to an integer multiply are known, the low bits of the result are deducible.
Code to deduce known-zero bottom bits already existed, but this change improves on that by deducing known-ones.
Patch by: Pedro Ferreira
Reviewers: craig.topper, sanjoy, efriedma
Differential Revision: https://reviews.llvm.org/D34029
llvm-svn: 320269
We may need to widen the vector to make the shifts legal, but if we do that we need to make sure we shift left/right after accounting for the new size. If not we can't guarantee we are shifting in zeros.
The test cases affected actually show cases where we should move the shifts all together, but that's another problem.
llvm-svn: 320248
We were previously using kunpck with zero inputs unnecessarily. And we had cases where we would insert into a zero vector and then insert into larger zero vector incurring two sets of shifts.
llvm-svn: 320244
Summary:
This relaxes an assertion inside SelectionDAGBuilder which is overly
restrictive on targets which have no concept of alignment (such as AVR).
In these architectures, all types are aligned to 8-bits.
After this, LLVM will only assert that accesses are aligned on targets
which actually require alignment.
This patch follows from a discussion on llvm-dev a few months ago
http://llvm.1065342.n5.nabble.com/llvm-dev-Unaligned-atomic-load-store-td112815.html
Reviewers: bogner, nemanjai, joerg, efriedma
Reviewed By: efriedma
Subscribers: efriedma, cactus, llvm-commits
Differential Revision: https://reviews.llvm.org/D39946
llvm-svn: 320243
The outliner previously would never outline calls. Calls are pretty common in
files, so it makes sense to outline them. In fact, in the LLVM test suite, if
you count the number of instructions that the outliner misses when you outline
calls vs when you don't, it turns out that, on average, around 6% of the
instructions encountered are calls. So, if we outline calls, we can find more
candidates, and thus save some more space.
This commit adds that functionality and updates the mir test to reflect that.
llvm-svn: 320229
Summary:
Reuse the Linux new mapping as it is.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, eugenis, vitalybuka
Reviewed By: vitalybuka
Subscribers: llvm-commits, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D41022
llvm-svn: 320219
Summary:
This is LLVM instrumentation for the new HWASan tool. It is basically
a stripped down copy of ASan at this point, w/o stack or global
support. Instrumenation adds a global constructor + runtime callbacks
for every load and store.
HWASan comes with its own IR attribute.
A brief design document can be found in
clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier).
Reviewers: kcc, pcc, alekseyshl
Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D40932
llvm-svn: 320217
MachineSink attempts to place instructions near the basic blocks where
they are needed. Once an instruction has been sunk, its location
relative to other instructions no longer is consistent with the
original source code. In order to ensure correct stepping in the
debugger, the debug location for sunk instructions is either merged
with the insertion point or erased if the target successor block is
empty.
Originally submitted as r318679, revised to fix sanitizer failure and
improve testing.
Patch by Matthew Voss!
Differential Revision: https://reviews.llvm.org/D39933
llvm-svn: 320216
Both had a declaration of EmitXRayTable, but there is no method defined in either with that name. There is a emitXRayTable in the base class with a lower case 'e' and they both call that.
llvm-svn: 320213
Work towards the unification of MIR and debug output by refactoring the
interfaces.
Add support for operand subreg index as an immediate to debug printing
and use ::print in the MIRPrinter.
Differential Review: https://reviews.llvm.org/D40965
llvm-svn: 320209
Summary:
If a partially inlined function has debug info, we have to add debug
locations to the call instruction calling the outlined function.
We use the debug location of the first instruction in the outlined
function, as the introduced call transfers control to this statement and
there is no other equivalent line in the source code.
We also use the same debug location for the branch instruction added
to jump from artificial entry block for the outlined function, which just
jumps to the first actual basic block of the outlined function.
Reviewers: davide, aprantl, rriddle, dblaikie, danielcdh, wmi
Reviewed By: aprantl, rriddle, danielcdh
Subscribers: eraman, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D40413
llvm-svn: 320199
This includes a fix so that it doesn't transform declarations, and it
puts the functionality under control of a command-line option which is off
by default to avoid breaking existing setups.
llvm-svn: 320196
For narrow sizes we'll widen the zero vector and widen the insert. Then do an extract_subvector to get back down to correct size.
This allows us to remove some patterns from the isel table that had to COPY_TO_REGCLASS to an oversized register, do the shift and then COPY_TO_REGCLASS back to the narrow register. Now this is represented explicitly in the DAG.
This seems to have perturbed the register allocation in one of the tests, but the number of instructions didn't change.
llvm-svn: 320190
Causes unexpected memory issue with New PM this time.
The new PM invalidates BPI but not BFI, leaving the
reference to BPI from BFI invalid.
Abandon this patch. There is a more general solution
which also handles runtime infinite loop (but not statically).
llvm-svn: 320180
Currently tagged these as system instructions, once we have uses for them (ASAN?) and they are faster we will need to improve on this.
llvm-svn: 320173
These are aliases, but the thing we're checking here is that the target has
vpsllv*, not that the data type is 256-bit. Those instructions exist for
128-bit vectors too...but sadly, not for all element sizes.
llvm-svn: 320170
Summary:
llvm-objdump's Mach-O parser was updated in r306037 to display external
relocations for MH_KEXT_BUNDLE file types. This change extends the Macho-O
parser to display local relocations for MH_PRELOAD files. When used with
the -macho option relocations will be displayed in a historical format.
rdar://35778019
Reviewers: enderby
Reviewed By: enderby
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40867
llvm-svn: 320166
Summary:
If we have the code like this:
```
float a, b;
a = std::max(a ,b);
```
it is converted into something like this:
```
%call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %a.addr, float* nonnull dereferenceable(4) %b.addr)
%1 = bitcast float* %call to i32*
%2 = load i32, i32* %1, align 4
%3 = bitcast float* %a.addr to i32*
store i32 %2, i32* %3, align 4
```
After inlinning this code is converted to the next:
```
%1 = load float, float* %a.addr
%2 = load float, float* %b.addr
%cmp.i = fcmp fast olt float %1, %2
%__b.__a.i = select i1 %cmp.i, float* %a.addr, float* %b.addr
%3 = bitcast float* %__b.__a.i to i32*
%4 = load i32, i32* %3, align 4
%5 = bitcast float* %arrayidx to i32*
store i32 %4, i32* %5, align 4
```
This pattern is not recognized as minmax pattern.
Patch solves this problem by converting sequence
```
store (bitcast, (load bitcast (select ((cmp V1, V2), &V1, &V2))))
```
to a sequence
```
store (,load (select((cmp V1, V2), &V1, &V2)))
```
After this the code is recognized as minmax pattern.
Reviewers: RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40304
llvm-svn: 320157
Summary:
+DumpCode is a hack to embed disassembly in the ELF file. This commit
fixes it to include labels, to make it slightly more useful.
Reviewers: arsenm, kzhuravl
Subscribers: nhaehnle, timcorringham, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl
Differential Revision: https://reviews.llvm.org/D40169
llvm-svn: 320146
In this method, we invoke `SimplifyICmpOperands` which takes the `Cond` predicate
by reference and may change it along with `LHS` and `RHS` SCEVs. But then we invoke
`computeShiftCompareExitLimit` with Values from which the SCEVs have been derived,
these Values have not been modified while `Cond` could be.
One of possible outcomes of this is that we may falsely prove that an infinite loop ends
within some finite number of iterations.
In this patch, we save the original `Cond` and pass it along with original operands.
This logic may be removed in future once `computeShiftCompareExitLimit` works
with SCEVs instead of value operands.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D40953
llvm-svn: 320142
Updated the scheduling information for the Haswell subtarget with the following changes:
Regrouped the instructions after adding appropriate load + store latencies.
Added scheduling for missing instructions such as the GATHER instrs.
The changes were made after revisiting the latencies impact of all memory uOps.
Reviewers: RKSimon, zvi, craig.topper, apilipenko
Differential Revision: https://reviews.llvm.org/D40021
Change-Id: Iaf6c1f5169add1552845a8a566af4e5a359217a7
llvm-svn: 320137
Previously we only allowed these through if the subvector came from a compare or test instruction which we would again check for during isel.
With this change we only check for the compare and test instructions during isel and have fallback patterns that emit the shifts if needed.
I noticed that in a lot of cases we don't actually see the compare during lowering and rely on an odd legalization of concat_vectors with a zero vector as the second argument. This keeps the concat_vectors around long enough for a later dag combine to expose the compare then we re-legalize the concat_vectors and catch the compare.
llvm-svn: 320134
Replace interleaved store instructions by equivalent and more efficient instructions based on latency cost model.
Https://reviews.llvm.org/D38196
llvm-svn: 320123
This reverts commit 959e37e669b0c3cfad4cb9f1f7c9261ce9f5e9ae.
That commit doesn't handle the case where main is declared rather than defined,
in particular the even-more special case where main is a prototypeless
declaration (which is of course the one actually used by musl currently).
llvm-svn: 320121
We previously only supported inserting to the LSB or MSB where it was easy to zero to perform an OR to insert.
This change effectively extracts the old value and the new value, xors them together and then xors that single bit with the correct location in the original vector. This will cancel out the old value in the first xor leaving the new value in the position.
The way I've implemented this uses 3 shifts and two xors and uses an additional register. We can avoid the additional register at the cost of another shift.
llvm-svn: 320120
In more recent Linux kernels with 47 bit VMAs the layout of virtual memory
for powerpc64 changed causing the address sanitizer to not work properly. This
patch adds support for 47 bit VMA kernels for powerpc64 and fixes up test
cases.
https://reviews.llvm.org/D40907
There is an associated patch for compiler-rt.
Tested on several 4.x and 3.x kernel releases.
llvm-svn: 320109
Previously, when linking against libcmt from the MSVC runtime,
lld-link /verbose would show "Ignoring unknown symbol record
with kind 0x1006". It turns out this was because
TypeIndexDiscovery did not handle S_REGISTER records, so these
records were not getting properly remapped.
Patch by: Alexnadre Ganea
Differential Revision: https://reviews.llvm.org/D40919
llvm-svn: 320108
Summary:
Make enum ModRefInfo an enum class. Changes to ModRefInfo values should
be done using inline wrappers.
This should prevent future bit-wise opearations from being added, which can be more error-prone.
Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40933
llvm-svn: 320107
It is causing sanitizer failures on llvm tests in a bootstrapped compiler. No bot link since it's currently down, but following up to get the bot up.
This reverts commit r319218.
llvm-svn: 320106
The offset overflow check before was incorrect. It would always give the
correct result, but it was comparing the SCALED potential fixed-up offset
against an UNSCALED minimum/maximum. As a result, the outliner was missing a
bunch of frame setup/destroy instructions that ought to have been safe to
outline. This fixes that, and adds an instruction to the .mir test that
failed the old test.
llvm-svn: 320090
-amdgpu-waitcnt-forcezero={1|0} Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n> Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n> Force emit a s_waitcnt vmcnt(0) before the first <n> instrs
Differential Revision: https://reviews.llvm.org/D40091
llvm-svn: 320084
There's no v2i1 or v4i1 kshift, and v8i1 is only supported with AVXDQ. Isel has fake patterns to extend these types to native shifts, but makes no guarantees about the value of any bits shifted in when shifting right.
This patch promotes the vector to a type that supports a native shift first and only allows inserting into the msb of a native sized shift.
I've constructed this in a way that doesn't do the promotion if we're going to fallback to using a xmm/ymm/zmm shuffle. I think I have a plan to remove the shuffle fall back entirely. In which case we this can be simplified, but I wanted to fix the correctness issue first.
llvm-svn: 320081
Tagged as IMUL instructions for a reasonable approximation (ALU tends to be a lot faster) - POPCNT is currently tagged as FAdd which I think should be replaced with IMUL as well
llvm-svn: 320051
I noticed this pattern in D38316 / D38388. We failed to combine a shuffle that is either
repeating a scalar insertion at the same position in a vector or translated to a different
element index.
Like the earlier patch, this could be an instcombine too, but since we opted to make this
a DAG transform earlier, I've made this one a DAG patch too.
We do not need any legality checking because the new insert is identical to the existing
insert except that it may have a different constant insertion operand.
The constant insertion test in test/CodeGen/X86/vector-shuffle-combining.ll was the
motivation for D38756.
Differential Revision: https://reviews.llvm.org/D40209
llvm-svn: 320050
WebAssembly requires caller and callee signatures to match, so the usual
C runtime trick of calling main and having it just work regardless of
whether main is defined as '()' or '(int argc, char *argv[])' doesn't
work. Extend the FixFunctionBitcasts pass to rewrite main to use the
latter form.
llvm-svn: 320041
Simply checking for register class equality will break once additional
register classes are added (as is done for the RVC instruction set extension).
llvm-svn: 320036
This patch includes all missing functionality needed to provide first
compilation of a simple program that just returns from a function.
I've added a test case that checks for "ret" instruction printed in assembly
output.
Patch by Andrei Grischenko (andrei.l.grischenko@intel.com)
Differential revision: https://reviews.llvm.org/D39688
llvm-svn: 320035
As the FPR32 and FPR64 registers have the same names, use
validateTargetOperandClass in RISCVAsmParser to coerce a parsed FPR32 to an
FPR64 when necessary. The rest of this patch is very similar to the RV32F
patch.
Differential Revision: https://reviews.llvm.org/D39895
llvm-svn: 320023
The most interesting part of this patch is probably the handling of
rounding mode arguments. Sadly, the RISC-V assembler handles floating point
rounding modes as a special "argument" when it would be more consistent to
handle them like the atomics, opcode suffixes. This patch supports parsing
this optional parameter, using InstAlias to allow parsing these floating point
instructions when no rounding mode is specified.
Differential Revision: https://reviews.llvm.org/D39893
llvm-svn: 320020
A number of architectures re-use the same register names (e.g. for both 32-bit
FPRs and 64-bit FPRs). They are currently unable to use the tablegen'erated
MatchRegisterName and MatchRegisterAltName, as tablegen (when built with
asserts enabled) will fail.
When the AllowDuplicateRegisterNames in AsmParser is set, duplicated register
names will be tolerated. A backend can then coerce registers to the desired
register class by (for instance) implementing validateTargetOperandClass.
At least the in-tree Sparc backend could benefit from this, as does RISC-V
(single and double precision floating point registers).
Differential Revision: https://reviews.llvm.org/D39845
llvm-svn: 320018
Summary:
Changed use_instructions() to use_nodbg_instructions() when
building an instruction set.
We don't want the presence of debug info to affect the code
we generate.
Reviewers: dblaikie, Eugene.Zelenko, chandlerc, aprantl
Reviewed By: aprantl
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D40882
llvm-svn: 320010
Currently, when creating a named section, the Wasm
frontend forces it to use `SectionKind::Data`, whereas
in fact C++ does generate code sections with custom
names.
Patch by Nicholas Wilson
Differential Revision: https://reviews.llvm.org/D40906
llvm-svn: 320002
dsymutil doesn't yet understand the new format and the change,
among others, breaks a large fraction of the debugger tests on
mac OS.
rdar://problem/35856354
llvm-svn: 319995
This extends r319391. It teaches the segment builder to emit the right
completed segment when more than one region ends at the same location.
Fixes PR35495.
llvm-svn: 319990
Instead of having .o files contain linear-memory and function table
definitions, use imports. This is more consistent with the stack pointer
being imported, and it's consistent with the linker being the one to
decide whether linear memory and function table are imported or defined
in the linked output. This implements tool-conventions #23.
Differential Revision: https://reviews.llvm.org/D40875
llvm-svn: 319989
Summary:
This patch adds MachineCombiner patterns for transforming
(fsub (fmul x y) z) into (fma x y (fneg z)). This has a lower
latency on micro architectures where fneg is cheap.
Patch based on work by George Steed.
Reviewers: rengolin, joelkevinjones, joel_k_jones, evandro, efriedma
Reviewed By: evandro
Subscribers: aemerson, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D40306
llvm-svn: 319980
As a new access is generated spanning across multiple fields, we need to
propagate alias info from all the fields to form the most generic alias info.
rdar://35602528
Differential Revision: https://reviews.llvm.org/D40617
llvm-svn: 319979
This patch factors out the main code transformation utilities in the pgo-driven
indirect call promotion pass and places them in Transforms/Utils. The change is
intended to be a non-functional change, letting non-pgo-driven passes share a
common implementation with the existing pgo-driven pass.
The common utilities are used to conditionally promote indirect call sites to
direct call sites. They perform the underlying transformation, and do not
consider profile information. The pgo-specific details (e.g., the computation
of branch weight metadata) have been left in the indirect call promotion pass.
Differential Revision: https://reviews.llvm.org/D40658
llvm-svn: 319963
Summary:
When calculating the RootLatency, we add up all the latencies of the
deleted instructions. But for NewRootLatency we only add the latency of
the new root instructions, ignoring the latencies of the other
instructions inserted. This leads the combiner to underestimate the cost
of patterns which add multiple instructions. This patch fixes that by
summing up the latencies of all new instructions. For NewRootNode, the
more complex getLatency function is used.
Note that we may be slightly more precise than just summing up
all latencies. For example, consider a pattern like
r1 = INS1 ..
r2 = INS2 ..
r3 = INS3 r1, r2
I think in some other places, the total latency of the pattern would be
estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider
that worth changing, I think it would be best to do in a follow-up
patch.
Reviewers: Gerolf, sebpop, spop, fhahn
Reviewed By: fhahn
Subscribers: evandro, llvm-commits
Differential Revision: https://reviews.llvm.org/D40307
llvm-svn: 319951
Summary:
There is no need to replace the original call instruction if no
VarArgs need to be forwarded.
Reviewers: davide, rnk, majnemer, efriedma
Reviewed By: efriedma
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D40412
llvm-svn: 319947
Patch by David Major.
The NSS project's .def files make heavy use of semicolons in a
frightening attempt at portability:
https://hg.mozilla.org/projects/nss/raw-file/tip/lib/ckfw/capi/nsscapi.def
lld-link was treating the semicolon as part of the export name,
resulting in unresolved symbols. This patch includes ';' in the list of
characters to split on.
Differential Revision: https://reviews.llvm.org/D39968
llvm-svn: 319933
Previously the lambda for AVX512 passed out a flag that indicated whether AVX512BW was required and that was checked against the AVX512BW subtarget flag outside.
This patch changes the interface to pass the AVX512BW subtarget bit in and return its value if we detect 16 or 8 bit types.
llvm-svn: 319919
Summary:
An undef extract index can be arbitrarily chosen to be an
out-of-range index value, which would result in the instruction being undef.
This change closes a gap identified while working on lowering vector permute intrinsics
with variable index vectors to pure LLVM IR.
Reviewers: arsenm, spatel, majnemer
Reviewed By: arsenm, spatel
Subscribers: fhahn, nhaehnle, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D40231
llvm-svn: 319910