I don't know how to expose this in a test. There are ARM / AArch64
sched classes that include zero latency instructions, but I'm not
seeing sched info printed for those targets. X86 will almost
certainly have these soon (see PR36671), but no model has
'let Latency = 0' currently.
llvm-svn: 327518
Before this patch, the register file was always updated at instruction creation
time. That means, new read-after-write dependencies, and new temporary registers
were allocated at instruction creation time.
This patch refactors the code in InstrBuilder, and move all the logic that
updates the register file into the dispatch unit. We only want to update the
register file when instructions are effectively dispatched (not before).
This refactoring also helps removing a bad dependency between the InstrBuilder
and the DispatchUnit.
No functional change intended.
llvm-svn: 327514
Summary:
(Restores r327459 with handling for old plugin-api.h)
Utilize new gold plugin api interface for obtaining --wrap option
arguments, and LTO API handling (added for --wrap support in lld LTO),
to mark symbols so that LTO does not optimize them inappropriately.
Note the test cases will be in a new gold test subdirectory that
is dependent on the next release of gold which will contain the new
interfaces.
Reviewers: pcc, tmsriram
Subscribers: mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D44235
llvm-svn: 327506
Support G_LSHR/G_ASHR/G_SHL. We have 3 variance for
shift instructions : shift gpr, shift imm, shift 1.
Currently GlobalIsel TableGen generate patterns for
shift imm and shift 1, but with shiftCount i8.
In G_LSHR/G_ASHR/G_SHL like LLVM-IR both arguments
has the same type, so for now only shift i8 can use
auto generated TableGen patterns.
The support of G_SHL/G_ASHR enables tryCombineSExt
from LegalizationArtifactCombiner.h to hit, which
results in different legalization for the following tests:
LLVM :: CodeGen/X86/GlobalISel/ext-x86-64.ll
LLVM :: CodeGen/X86/GlobalISel/gep.ll
LLVM :: CodeGen/X86/GlobalISel/legalize-ext-x86-64.mir
-; X64-NEXT: movsbl %dil, %eax
+; X64-NEXT: movl $24, %ecx
+; X64-NEXT: # kill: def $cl killed $ecx
+; X64-NEXT: shll %cl, %edi
+; X64-NEXT: movl $24, %ecx
+; X64-NEXT: # kill: def $cl killed $ecx
+; X64-NEXT: sarl %cl, %edi
+; X64-NEXT: movl %edi, %eax
..which is not optimal and should be addressed later.
Rework of the patch by igorb
Reviewed By: igorb
Differential Revision: https://reviews.llvm.org/D44395
llvm-svn: 327499
This could end up inititialized if someone called the function with a
null AsmPrinter. Right now this only happens in DIEHash unit tests,
presumably because it was hard to create an AsmPrinter in the context of
unit tests. This only worked before r327486 because those tests did not
use any dwarf forms whose size actually depended on the dwarf version
(otherwise, they would have crashed due to null dereference).
I fix the uninitialized error, by explicitly initializing FormParams to
an invalid value, which will cause getFixedFormByteSize to return None
if called with a form with version-dependent size. A more principled
solution might be to fix the DIEHash tests to always pass in a valid
AsmPrinter.
llvm-svn: 327498
These previously all failed one way or another, but now we produce a more
helpful error message.
Change-Id: I8ffd2e87c8e35a5134c3be289e0a1fecaa2bb8ca
Differential revision: https://reviews.llvm.org/D44115
llvm-svn: 327497
Additionally, allow more than two operands to !con, !add, !and, !or
in the same way as is already allowed for !listconcat and !strconcat.
Change-Id: I9659411f554201b90cd8ed7c7e004d381a66fa93
Differential revision: https://reviews.llvm.org/D44112
llvm-svn: 327494
This makes using !dag more convenient in some cases.
Change-Id: I0a8c35e15ccd1ecec778fd1c8d64eee38d74517c
Differential revision: https://reviews.llvm.org/D44111
llvm-svn: 327493
This allows constructing DAG nodes with programmatically determined
names, and can simplify constructing DAG nodes in other cases as
well.
Also, add documentation and some very simple tests for the already
existing !con.
Change-Id: Ida61cd82e99752548d7109ce8da34d29da56a5f7
Differential revision: https://reviews.llvm.org/D44110
llvm-svn: 327492
Summary:
This patch replaces the two switches which are deducing the size of
various forms with a single implementation. I have put the new
implementation into BinaryFormat, to avoid introducing dependencies
between the two independent libraries (DebugInfo and CodeGen) that need
this functionality.
Reviewers: aprantl, JDevlieghere, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44418
llvm-svn: 327486
Make the architecture part of the warning in the DebugMapParser. This
makes things consistent with the Apple's internal version of dsymutil.
llvm-svn: 327485
Summary:
This is needed so that external projects (e.g. a standalone build of
lldb) can link to the LLVM shared library via the USE_SHARED argument of
llvm_config. Without this, llvm_config would add LLVM to the link list,
but then also add the constituent static libraries, resulting in
multiply defined symbols.
Reviewers: beanz, mgorny
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44391
llvm-svn: 327484
Summary:
The old bindings should have used an enum instead of a boolean. This
deprecates LLVMHasUnnamedAddr and LLVMSetUnnamedAddr , replacing them
with LLVMGetUnnamedAddress and LLVMSetUnnamedAddress respectively that do.
Though it is unlikely LLVM will gain more supported global value linker
hints, the new API can scale to accommodate this.
Reviewers: deadalnix, whitequark
Reviewed By: whitequark
Subscribers: llvm-commits, harlanhaskins
Differential Revision: https://reviews.llvm.org/D43448
llvm-svn: 327479
The Error locals need to be protected by a mutex. (This could be fixed by
having the promises / futures contain Expected and Error values, but
MSVC's future implementation does not support this yet).
Hopefully this will fix some of the errors seen on the builders due to
r327474.
llvm-svn: 327477
This can be used to extract the symbol table from a RuntimeDyld instance prior
to disposing of it.
This patch also updates RTDyldObjectLinkingLayer to use the new method, rather
than requesting symbols one at a time via getSymbol.
llvm-svn: 327476
The lookup function takes a list of VSOs, a set of symbol names (or just one
symbol name) and a materialization function object. It returns an
Expected<SymbolMap> (if given a set of names) or an Expected<JITEvaluatedSymbol>
(if given just one name). The lookup method constructs an
AsynchronousSymbolQuery for the given names, applies that query to each VSO in
the list in turn, and then blocks waiting for the query to complete. If
threading is enabled then the materialization function object can be used to
execute the materialization on different threads. If threading is disabled the
MaterializeOnCurrentThread utility must be used.
llvm-svn: 327474
Summary:
Utilize new gold plugin api interface for obtaining --wrap option
arguments, and LTO API handling (added for --wrap support in lld LTO),
to mark symbols so that LTO does not optimize them inappropriately.
Note the test cases will be in a new gold test subdirectory that
is dependent on the next release of gold which will contain the new
interfaces.
Reviewers: pcc, tmsriram
Subscribers: mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D44235
llvm-svn: 327459
We now only create recursive concats if we have more than two non-zero values. This keeps our subvector broadcast DAG combine functioning.
llvm-svn: 327457
This better able to detect undef and zeros pieces in the concat. Or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs.
This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best.
We probably want to merge this with the vXi1 concat code since they are very similar.
llvm-svn: 327454
BUILD_VECTORs aren't themselves legalized until LegalizeDAG so we should still be able to create an "illegal" one before that. This helps combine with BUILD_VECTORS that are introduced during LegalizeVectorOps due to unrolling.
llvm-svn: 327446
Nothing prevents us from having both frame-setup and frame-destroy on
the same instruction.
When merging:
* frame-setup OPCODE1
* frame-destroy OPCODE2
into
* frame-setup frame-destroy OPCODE3
we want to be able to print and parse both flags.
llvm-svn: 327442
Nops should have zero latency because there is no result.
Idioms like 'xorps xmm0, xmm0' may have zero latency because
they are handled without using an execution unit.
llvm-svn: 327435
Summary:
It is possible for LVI to encounter instructions that are not in valid
SSA form and reference themselves. One example is the following:
%tmp4 = and i1 %tmp4, undef
Before this patch LVI would recurse until running out of stack memory
and crashed. This patch marks these self-referential instructions as
Overdefined and aborts analysis on the instruction.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33357
Reviewers: craig.topper, anna, efriedma, dberlin, sebpop, kuhar
Reviewed by: dberlin
Subscribers: uabelho, spatel, a.elovikov, fhahn, eli.friedman, mzolotukhin, spop, evandro, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34135
llvm-svn: 327432
Make sure that DWARF line information generated by Windows can be properly read by Posix OS and vice versa.
Differential Revision: https://reviews.llvm.org/D44290
llvm-svn: 327430
Injected sources are basically a way to add actual source file content
to your PDB. Presumably you could use this for shipping your source code
with your debug information, but in practice I can only find this being
used for embedding natvis files inside of PDBs.
In order to effectively test LLVM's natvis file injection, we need a way
to dump the injected sources of a PDB in a way that is authoritative
(i.e. based on Microsoft's understanding of the PDB format, and not
LLVM's). To this end, I've added support for dumping injected sources
via DIA. I made a PDB file that used the /natvis option to generate a
test case.
Differential Revision: https://reviews.llvm.org/D44405
llvm-svn: 327428
Since r327420, the tool can query the MCSchedModel interface to obtain the
reciprocal throughput information.
As a consequence, method `ResourceManager::getRThroughput`, and
method `Backend::getRThroughput` are no longer needed.
This patch simplifies the code by removing the custom RThroughput computation.
This patch also refactors class SummaryView by removing the dependency with
the Backend object.
No functional change intended.
llvm-svn: 327425
Under some circumstances the divrems won't have been combined together before getting to this code.
So replace the assertion with a if() guard to not expand to X-((X/C)*C) to give the other combine chance to happen.
Reduced from OSS-Fuzz #6883https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6883
llvm-svn: 327424
Summary: Build system changes for RISCV. Makes it possible to build just the RISCV target alone.
Reviewers: asb, apazos, mgrang, beanz
Reviewed By: asb
Subscribers: mgorny, kito-cheng, shiva0217, llvm-commits
Differential Revision: https://reviews.llvm.org/D44153
llvm-svn: 327423
Summary:
These changes are to allow to a Result object to have nested Result objects in
order to support microbenchmarks. Currently lit is restricted to reporting one
result object for one test, this change provides support tests that want to
report individual timings for individual kernels.
This revision is the result of the discussions in
https://reviews.llvm.org/D32272#794759,
https://reviews.llvm.org/D37421#f8003b27 and https://reviews.llvm.org/D38496.
It is a separation of the changes purposed in https://reviews.llvm.org/D40077.
This change will enable adding LCALS (Livermore Compiler Analysis Loop Suite)
collection of loop kernels to the llvm test suite using the google benchmark
library (https://reviews.llvm.org/D43319) with tracking of individual kernel
timings.
Previously microbenchmarks had been handled by using macros to section groups
of microbenchmarks together and build many executables while still getting a
grouped timing (MultiSource/TSVC). Recently the google benchmark library was
added to the test suite and utilized with a litsupport plugin. However the
limitation of 1 test 1 result limited its use to passing a runtime option to
run only 1 microbenchmark with several hand written tests
(MicroBenchmarks/XRay). This runs the same executable many times with different
hand-written tests. I will update the litsupport plugin to utilize the new
functionality (https://reviews.llvm.org/D43316).
These changes allow lit to report micro test results if desired in order to get
many precise timing results from 1 run of 1 test executable.
Reviewers: MatzeB, hfinkel, rengolin, delcypher
Differential Revision: https://reviews.llvm.org/D43314
llvm-svn: 327422
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
SelectionDAGBuilder to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting
source & dest specific alignments through the new API.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816, rL327398 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 327421
The goal is to make the reciprocal throughput computation accessible through the
MCSchedModel interface. This is particularly important for llvm-mca because it
can only query the MCSchedModel interface.
No functional change intended.
Differential Revision: https://reviews.llvm.org/D44392
llvm-svn: 327420
Summary: Unless you were intentionally avoiding this syntax? I saw you mentioned makeArrayRef in your commit that added SplitOpsAndApply.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44403
llvm-svn: 327418
This is a follow-up to r327137 where we unified error handling for the
DwarfLinker. This replaces calls to errs() and outs() with the
appropriate ostream wrapper everywhere in dsymutil.
llvm-svn: 327411
This is part of fixing the instruction predicates for MIPS.
Reviewers: atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D44212
llvm-svn: 327409
This is PR36686.
If a user of a library is LTOed with that library we take the
opportunity to set dso_local, but we don't clear dllimport, which
creates an invalid IR.
llvm-svn: 327408
The goal is to make the latency information accessible through the MCSchedModel
interface. This is particularly important for tools like llvm-mca that only have
access to the MCSchedModel API.
This partially fixes PR36676.
No functional change intended.
Differential Revision: https://reviews.llvm.org/D44383
llvm-svn: 327406
This was supposed to be an NFC refactoring that will eventually allow
eliminating the isFast() predicate, but there's a rare possibility
that we would pessimize the code as shown in the test case because
we failed to check 'hasOneUse()' properly. This version also removes
an inefficiency of the old code; we would look for:
(X * C) * C1 --> X * (C * C1)
...but that pattern is always handled by
SimplifyAssociativeOrCommutative().
llvm-svn: 327404
This patch makes dsymutil perform analyzeContextInfo and CloneDIEs in
parallel. For the same object file, there is a dependency between the
two. However, we can do analyzeContextInfo for the next object file
while cloning DIEs for the current. This is exactly the approach taken
in this patch.
For WebCore, this leads to a performance improvement of 29% and for
clang we see similar results with at 32% improvement.
A big thanks to Pete Cooper who came up with the original idea and
the PoC.
Differential revision: https://reviews.llvm.org/D43945
llvm-svn: 327399
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
SROA pass to cease using the old getAlignment() & setAlignment() APIs of MemoryIntrinsic in
favour of getting source & dest specific alignments through the new API. This allows us
to enhance visitMemTransferInst to be more aggressive setting the alignment in memcpy
calls that it creates, as well as to only change the alignment of a memcpy/memmove
argument that it replaces.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: chandlerc, bollu, efriedma
Reviewed By: efriedma
Subscribers: efriedma, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D42974
llvm-svn: 327398
Codeview references to unnamed structs and unions are expected to refer to the
complete type definition instead of a forward reference so Visual Studio can
resolve the type properly.
Differential Revision: https://reviews.llvm.org/D32498
llvm-svn: 327397
Summary: This is a first step towards making the pipeline configurable.
Subscribers: llvm-commits, andreadb
Differential Revision: https://reviews.llvm.org/D44309
llvm-svn: 327389
For the MIPS O32 ABI, the current call lowering logic naively lowers each
call, creating the reserved argument area to hold the argument spill areas for
$a0..$a3 and the outgoing parameter area if one is required at each call site.
In the case of a sufficently large byval argument, a call to memcpy is used
to write the start+16..end of the argument into the outgoing parameter area.
This is done within the CALLSEQ_START..CALLSEQ_END of the callee. The CALLSEQ
nodes are responsible for performing the necessary stack adjustments.
Since the O32/N32/N64 MIPS ABIs do not have a red-zone and writing below the
stack pointer and reading the values back is unpredictable, the call to memcpy
cannot be hoisted out of the callee's CALLSEQ nodes.
However, for the O32 ABI requires the reserved argument area for functions
which have parameters. The naive lowering of calls will then create nested
CALLSEQ sequences. For N32 and N64 these nodes are also created, but with
zero stack adjustments as those ABIs do not have a reserved argument area.
This patch addresses the correctness issue by recognizing the special case
of lowering a byval argument that uses memcpy. By recognizing that the
incoming chain already has a CALLSEQ_START node on it when calling memcpy,
the CALLSEQ nodes are not created. For the N32 and N64 ABIs, this is not an
issue, as no stack adjustment has to be performed.
For the O32 ABI, the correctness reasoning is different. In the case of a
sufficently large byval argument, registers a0..a3 are going to be used for
the callee's arguments, mandating the creation of the reserved argument area.
The call to memcpy in the naive case will also create its own reserved
argument area. However, since the reserved argument area consists of undefined
values, both calls can use the same reserved argument area.
Reviewers: abeserminji, atanasyan
Differential Revision: https://reviews.llvm.org/D44296
llvm-svn: 327388
This patch introduces the LinkContext which is necessary to have
dsymutil perform analysis and cloning of DIEs in parallel. As requested
in D43945, I'm landing this as two separate commits.
llvm-svn: 327382
splitMergedValStore will split a store into two if target prefers this, or if
-force-split-store is passed.
This patch adds the missing handling for endianness in this function along
with a test case.
Review: Eli Friedman
https://reviews.llvm.org/D44396
llvm-svn: 327375
isAvailableAtLoopEntry duplicates logic of `properlyDominates` after checking invariance.
This patch replaces this logic with invocation of this method which is more profitable
because it supports caching.
Differential Revision: https://reviews.llvm.org/D43997
llvm-svn: 327373
Add more debug information for peephole optimization passes.
These would only be enabled for debug version binary and could help
analyzing why some optimization opportunities were missed.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327371
This new pass eliminate identical move:
MOV rA, rA
This is particularly possible to happen when sub-register support
enabled. The special type cast insn MOV_32_64 involves different
register class on src (i32) and dst (i64), RA could generate useless
instruction due to this.
This pass also could serve as the bast for further post-RA optimization.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327370
Currently, there is no ALU32 bswap support in eBPF ISA.
BSWAP on i32 was set to EXPAND which would need about eight instructions
for single BSWAP.
It would be more efficient to promote it to i64, then doing BSWAP on i64.
For eBPF programs, most of the promotion are zero extensions which are
likely be elimiated later by peephole optimizations.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327369
This patch relax the subregister definition check on Phi node.
Previously, we just cancel the optimizatoin when the definition is Phi
node while actually we could further check the definitions of incoming
parameters of PHI node.
This helps catch more elimination opportunities.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327368
The current zero extension elimination was restricted to operands of
comparison. It actually could be extended to more cases.
For example:
int *inc_p (int *p, unsigned a)
{
return p + a;
}
'a' will be promoted to i64 during addition, and the zero extension could
be eliminated as well.
For the elimination optimization, it should be much better to start
recognizing the candidate sequence from the SRL instruction instead of J*
instructions.
This patch makes it an generic zero extension elimination pass instead of
one restricted with comparison.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327367
There is a mistake in current code that we "break" out the optimization
when the first operand of J*_RR doesn't qualify the elimination. This
caused some elimination opportunities missed, for example the one in the
testcase.
The code should just fall through to handle the second operand.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327366
The current subregister definition check stops after the MOV_32_64
instruction.
This means we are thinking all the following instruction sequences
are safe to be eliminated:
MOV_32_64 rB, wA
SLL_ri rB, rB, 32
SRL_ri rB, rB, 32
However, this is *not* true. The source subregister wA of MOV_32_64 could
come from a implicit truncation of 64-bit register in which case the high
bits of the 64-bit register is not zeroed, therefore we can't eliminate
above sequence.
For example, for i32_val, we shouldn't do the elimination:
long long bar ();
int foo (int b, int c)
{
unsigned int i32_val = (unsigned int) bar();
if (i32_val < 10)
return b;
else
return c;
}
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327365
Improve the test accuracy by adding more check directives.
Shifts are expected to be eliminated for zero extension but not for signed
extension.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327364
It is a revert of rL327362 which causes build bot failures with assert like
Assertion `isAvailableAtLoopEntry(RHS, L) && "RHS is not available at Loop Entry"' failed.
llvm-svn: 327363
IsKnownPredicate is updated to implement the following algorithm
proposed by @sanjoy and @mkazantsev :
isKnownPredicate(Pred, LHS, RHS) {
Collect set S all loops on which either LHS or RHS depend.
If S is non-empty
a. Let PD be the element of S which is dominated by all other elements of S
b. Let E(LHS) be value of LHS on entry of PD.
To get E(LHS), we should just take LHS and replace all AddRecs that
are attached to PD on with their entry values.
Define E(RHS) in the same way.
c. Let B(LHS) be value of L on backedge of PD.
To get B(LHS), we should just take LHS and replace all AddRecs that
are attached to PD on with their backedge values.
Define B(RHS) in the same way.
d. Note that E(LHS) and E(RHS) are automatically available on entry of PD,
so we can assert on that.
e. Return true if isLoopEntryGuardedByCond(Pred, E(LHS), E(RHS)) &&
isLoopBackedgeGuardedByCond(Pred, B(LHS), B(RHS))
Return true if Pred, L, R is known from ranges, splitting etc.
}
This is follow-up for https://reviews.llvm.org/D42417.
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43507
llvm-svn: 327362
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.
Marking available_externally functions dead is incorrect, the functions
are used though the function definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.
(Reland with a suspected fix for a unit test failure I haven't been able
to reproduce locally)
Reviewers: pcc, tejohnson
Reviewed By: tejohnson
Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D43690
llvm-svn: 327360
Summary:
If there's a callees metadata attached to the indirect call instruction, add CallGraphEdges to the callees mentioned in the metadata when computing FunctionSummary.
* Why this is necessary:
Consider following code example:
```
(foo.c)
static int f1(int x) {...}
static int f2(int x);
static int (*fptr)(int) = f2;
static int f2(int x) {
if (x) fptr=f1; return f1(x);
}
int foo(int x) {
(*fptr)(x); // !callees metadata of !{i32 (i32)* @f1, i32 (i32)* @f2} would be attached to this call.
}
(bar.c)
int bar(int x) {
return foo(x);
}
```
At LTO time when `foo.o` is imported into `bar.o`, function `foo` might be inlined into `bar` and PGO-guided indirect call promotion will run after that. If the profile data tells that the promotion of `@f1` or `@f2` is beneficial, the optimizer will check if the "promoted" `@f1` or `@f2` (such as `@f1.llvm.0` or `@f2.llvm.0`) is available. Without this patch, importing `!callees` metadata would only add promoted declarations of `@f1` and `@f2` to the `bar.o`, but still the optimizer will assume that the function is available and perform the promotion. The result of that is link failure with `undefined reference to @f1.llvm.0`.
This patch fixes this problem by adding callees in the `!callees` metadata to CallGraphEdges so that their definition would be properly imported into.
One may ask that there already is a logic to add indirect call promotion targets to be added to CallGraphEdges. However, if profile data says "indirect call promotion is only beneficial under a certain inline context", the logic wouldn't work. In the code example above, if profile data is like
```
bar:1000000:100000
1:100000
1: foo:100000
1: 100000 f1:100000
```
, Computing FunctionSummary for `foo.o` wouldn't add `foo->f1` to CallGraphEdges. (Also, it is at least "possible" that one can provide profile data to only link step but not to compilation step).
Reviewers: tejohnson, mehdi_amini, pcc
Reviewed By: tejohnson
Subscribers: inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D44399
llvm-svn: 327358
This diff extends the output of -elf-section-groups
(llvm style, gnu style is unchanged since it's meant to be
compatible with binutils readelf) with sh_link and sh_info.
This change will enable us to use llvm-readobj -elf-section-groups
for testing llvm-objcopy's support for .group sections.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D44280
llvm-svn: 327341
In the case that the CallInst that is being moved has an associated
operand bundle which is a funclet, the move will construct an invalid
instruction. The new site will have a different token and needs to be
reassociated with the new instruction.
Unfortunately, there is no way to alter the bundle after the
construction of the instruction. Replace the call instruction cloning
with a custom helper to clone the instruction and reassociate the
funclet token.
llvm-svn: 327336
LoopInstSimplify is unused and untested. Reading through the commit
history the pass also seems to have a high maintenance burden.
It would be best to retire the pass for now. It should be easy to
recover if we need something similar in the future.
Differential Revision: https://reviews.llvm.org/D44053
llvm-svn: 327329
Summary:
ProvenanceAnalysis::related(A, B) currently memoizes its results, and on big
tests the cache grows too large, and we're spending most of the time
growing/looking through DenseMap.
This patch reduces the size of the cache by normalizing keys first: we do that
by calling GetUnderlyingObjCPtr on the input values. The results of
GetUnderlyingObjCPtr are also memoized in a separate cache.
The patch doesn't bring noticable changes to compile time on CTMark, however
significantly helps one of our internal tests.
Reviewers: gottesmm
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D44270
llvm-svn: 327328
Clean up the parsing of notes in llvm-readobj, improve bounds checking, and
allow the parsing code to be reused.
Differential Revision: https://reviews.llvm.org/D43958
llvm-svn: 327320
getNumUses is a linear time operation. It traverses the user linked list to the end and counts as it goes. Since we are only interested in small constant counts, we should use hasNUses or hasNUsesMore more that terminate the traversal as soon as it can provide the answer.
There are still two other locations in InstCombine, but changing those would force a rebase of D44266 which if accepted would remove them.
Differential Revision: https://reviews.llvm.org/D44398
llvm-svn: 327315
getNumUses is a linear operation. It walks a linked list to get a count. So in this case its better to just ask if there are any users rather than how many.
llvm-svn: 327314
This is the FP equivalent of D42818. Use it for the few cases in InstSimplify
with -0.0 folds (that's the only current use of m_NegZero()).
Differential Revision: https://reviews.llvm.org/D43792
llvm-svn: 327307
Summary:
1) Make sure to discard dangling debug info if the variable (or
variable fragment) is mapped to something new before we had a
chance to resolve the dangling debug info.
2) When resolving debug info, make sure to bump the associated
SDNodeOrder to ensure that the DBG_VALUE is emitted after the
instruction that defines the value used in the DBG_VALUE.
This will avoid a debug-use before def scenario as seen in
https://bugs.llvm.org/show_bug.cgi?id=36417.
The new test case, test/DebugInfo/X86/sdag-dangling-dbgvalue.ll,
show some other limitations in how dangling debug info is
handled in the SelectionDAG. Since we currently only support
having one dangling dbg.value per Value, we will end up dropping
debug info when there are more than one variable that is described
by the same "dangling value".
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: aprantl, eraman, llvm-commits, JDevlieghere
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D44369
llvm-svn: 327303
This adds two features: "packets", and "nvj".
Enabling "packets" allows the compiler to generate instruction packets,
while disabling it will prevent it and disable all optimizations that
generate them. This feature is enabled by default on all subtargets.
The feature "nvj" allows the compiler to generate new-value jumps and it
implies "packets". It is enabled on all subtargets.
The exception is made for packets with endloop instructions, since they
require a certain minimum number of instructions in the packets to which
they apply. Disabling "packets" will not prevent hardware loops from
being generated.
llvm-svn: 327302
Summary:
This pattern came up in PR36682:
https://bugs.llvm.org/show_bug.cgi?id=36682https://godbolt.org/g/LhuD9A
Tests for proposed fix in D44367.
Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations i checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `sitofp`+`bitcast` can be dropped:
* `eq 0`
* `ne 0`
* `slt 0`
* `sle 0`
* `sge 0`
* `sgt 0`
* `slt 1`
* `sge 1`
* `sle -1`
* `sgt -1`
I did not check vectors, but i'm guessing it's the same there.
{F5887419}
Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized.
Generated with {F5887551}
Reviewers: spatel, majnemer, efriedma, arsenm
Reviewed By: spatel
Subscribers: nlopes, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D44390
llvm-svn: 327301
Remove the special casing for MRM_F8 by using HANDLE_OPTIONAL.
This should be NFC as the forms that were missing aren't used by any instructions today. They exist in the enum so that we didn't have to put them in one at a time when instructions are added. But looks like we failed here.
llvm-svn: 327298
MVT belongs to the CodeGen layer, but ShuffleDecode is used by the X86 InstPrinter which is part of the MC layer. This only worked because MVT is completely implemented in a header file with no other library dependencies.
Differential Revision: https://reviews.llvm.org/D44353
llvm-svn: 327292
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.
Differential Revision: https://reviews.llvm.org/D44322
llvm-svn: 327291
This broke some Windows buildbots; see llvm-commits thread.
> These were just copies of the relevant fuzzer binary with (presumably)
> meaningful suffixes, but accounted for more than 10% of my build
> directory (> 8GB). Hard drive space is cheap, but not that cheap.
(Also reverts follow-up r326710 which didn't help.)
llvm-svn: 327266
This simplifies tagging instructions with the correct ISA and ASE, albeit making
instruction definitions a bit more verbose.
Reviewers: atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D44299
llvm-svn: 327265
After r327219 was landed, the bot with expensive checks on GreenDragon
started failing. The problem was missing symbols `regex_t` and
`regmatch_t` in `xlocale/_regex.h`. The latter was included because
after the change in r327219, `random` is needed, which transitively
includes `xlocale.h.` which in turn conditionally includes
`xlocale/_regex.h` when _REGEX_H_ is defined. Because this is the header
guard in `regex_impl.h` and because `regex_impl.h` was included before
the other LLVM includes, `xlocale/_regex.h` was included without the
necessary types being available.
This commit fixes this by moving the include of `regex_impl.h` all the
way down. I also added a comment to stress the significance of its
position.
llvm-svn: 327256
This reverts r326908, originally landed as D44102.
Reverted for causing performance regressions on x86. (These regressions
are not yet understood.)
llvm-svn: 327252
We called MaskedValueIsZero with two different masks, but underneath that calls computeKnownBits before applying the mask. This means we compute the same known bits twice due to the two calls. Instead just call computeKnownBits directly and apply the two masks ourselves.
llvm-svn: 327251
When we replace the Phi we created with matched ones it is possible that
there are two identical phi nodes in IR. And matcher is smart enough to find that
new created phi matches both of them. So we try to replace our phi node with
matched ones twice and what is bad we delete our phi node twice causing a crash.
As soon as we found that we have two identical Phi nodes it makes sense to do
a clean-up and replace one phi node by other one.
The patch implements it.
Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43758
llvm-svn: 327250
64-bit MMX vector generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type.
This patch creates a MMX vector from MMX source values, taking the lowest element from each source and constructing broadcasts/build_vectors with direct calls to the MMX PUNPCKL/PSHUFW intrinsics.
We're missing a few consecutive load combines that could be handled in a future patch if that would be useful - my main interest here is just avoiding a lot of the MMX/SSE crossover.
Differential Revision: https://reviews.llvm.org/D43618
llvm-svn: 327247
Same as the VPERMILPS/VPERMILPD approach for v8f32/v4f64 cases, rely on PSHUFB using bits[3:0] for indexing - we can ignore the sign bit (zero element) as those index vector values are considered undefined. The select between the lo/hi permute results based on the index size.
llvm-svn: 327242
As VPERMILPS/VPERMILPD only selects elements based on the bits[1:0]/bit[1] then we can permute both the (repeated) lo/hi 128-bit vectors in each case and then select between these results based on whether the index was for for lo/hi.
For v4i64/v4f64 this avoids some rather nasty v4i64 multiples on the AVX2 implementation, which seems to be worse than the extra port5 pressure from the additional shuffles/blends.
llvm-svn: 327239
Helper function to insert a subvector into the bottom elements of a larger zero/undef vector with the same scalar type.
I've converted a couple of INSERT_SUBVECTOR calls to use it, there are plenty more although in some cases I was worried it might make the code more ambiguous.
llvm-svn: 327236
Else, the test fails in LLVM_TARGETS_TO_BUILD=X86 builds like so:
bin/llvm-mc: : error: unable to get target for 'arm64-apple-ios7.0.0'
llvm-svn: 327233
The intent of revision r300311 was to add a check for invalid scheduling class
descriptors. However, it ended up adding a redundant call in a basic block that
should not be reachable.
llvm-svn: 327231
Summary:
There are 3 different operand orders for FMA instructions so figuring out the exact operation being performed requires a lot of thought.
This patch adds a comment to the end of the assembly line to print the exact operation.
I think I've got all the instructions in here except the ones with builtin rounding.
I didn't update all tests, but I assume we can get them as we regenerate tests in the future.
Reviewers: spatel, v_klochkov, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44345
llvm-svn: 327225
Summary:
std::sort and array_pod_sort both use non-stable sorting algorithms.
This means that the relative order of elements with the same key is
undefined. This patch is an attempt to uncover such scenarios by
randomly shuffling all containers before sorting, if EXPENSIVE_CHECKS
is enabled.
Here's the bugzilla for this: https://bugs.llvm.org/show_bug.cgi?id=35135
Reviewers: dblaikie, dexonsmith, chandlerc, efriedma, RKSimon
Reviewed By: RKSimon
Subscribers: fhahn, davide, RKSimon, vsk, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D39245
llvm-svn: 327219
This change removes method Backend::getProcResourceMasks() and simplifies some
logic in the Views. This effectively removes yet another dependency between the
views and the Backend.
No functional change intended.
llvm-svn: 327214
With the updated LangRef ( D44216 / rL327138 ) in place, we can proceed with more constant folding.
I'm intentionally taking the conservative path here: no matter what the constant or the FMF, we can
always fold to NaN. This is because the undef operand can be chosen as NaN, and in our simplified
default FP env, nothing else happens - NaN just propagates to the result. If we find some way/need
to propagate undef instead, that can be added subsequently.
The tests show that we always choose the same quiet NaN constant (0x7FF8000000000000 in IR text).
There were suggestions to improve that with a 'NaN' string token or not always print a 64-bit hex
value, but those are independent changes. We might also consider setting/propagating the payload of
NaN constants as an enhancement.
Differential Revision: https://reviews.llvm.org/D44308
llvm-svn: 327208
Use isInlineViable to prevent inlining of functions with non-inlinable
constructs, in case cost analysis is skipped.
Reviewers: efriedma, sfertile, davide, davidxl
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D42846
llvm-svn: 327207
X86InstComments.h is used by tools that only have the MC layer. We shouldn't be importing a file from CodeGen into this.
X86InstrInfo.h isn't a great place, but I couldn't find a better one.
llvm-svn: 327202
We're persisting AliasResults in some places in MemorySSA, so the size
of these now matters a little bit (well, 8 regular-sized bits, to be
precise).
Do the same for ModRefInfo for consistency.
llvm-svn: 327201
r327100 made us stop producing vecreduce-propagate-sd-flags.s, but it's
still sticking around on some bots. This makes the bots unhappy.
I'll revert this tomorrow.
llvm-svn: 327199
This fixes pr36674.
While it is valid for shouldAssumeDSOLocal to return false anytime,
always returning false for intrinsics is not optimal on i386 and also
hits a bug in the backend.
To use a plt, the caller must first setup ebx to handle the case of
that file being linked into a PIE executable or shared library. In
those cases the generated PLT uses ebx.
Currently we can produce "calll expf@plt" without setting ebx. We
could fix that by correctly setting ebx, but this would produce worse
code for the case where the runtime library is statically linked. It
would also required other tools to handle R_386_PLT32.
llvm-svn: 327198