Summary:
Existing heuristic uses the ratio between the function entry
frequency and the loop invocation frequency to find cold loops. However,
even if the loop executes frequently, if it has a small trip count per
each invocation, vectorization is not beneficial. On the other hand,
even if the loop invocation frequency is much smaller than the function
invocation frequency, if the trip count is high it is still beneficial
to vectorize the loop.
This patch uses estimated trip count computed from the profile metadata
as a primary metric to determine coldness of the loop. If the estimated
trip count cannot be computed, it falls back to the original heuristics.
Reviewers: Ayal, mssimpso, mkuper, danielcdh, wmi, tejohnson
Reviewed By: tejohnson
Subscribers: tejohnson, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32451
llvm-svn: 305729
Summary:
Some optimizations in AddReachableCodeToWorklist did not update
the MadeIRChange state. This could happen both when removing
trivially dead instructions (DCE) and at constant folds.
It is essential that changes to the IR is reported correctly,
since for example InstCombinePass::run() will indicate that all
analyses are preserved otherwise.
And the CGPassManager determines if the CallGraph is up-to-date
based on status from InstructionCombiningPass::runOnFunction().
The new test case early_dce_clobbers_callgraph.ll is a reproducer
for some asserts that started to trigger after changes in the
inliner in r305245. With this patch the test case passes again.
Reviewers: sanjoy, craig.topper, dblaikie
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34346
llvm-svn: 305725
This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.
> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 305720
Summary:
This is a first step towards getting line info to show up in VS and
windbg. So far, only llvm-pdbutil can parse the PDBs that we produce.
cvdump doesn't like something about our file checksum tables. I'll have
to dig into that next.
This patch adds a new DebugSubsectionRecordBuilder which takes bytes
directly from some other producer, such as a linker, and sticks it into
the PDB. Line tables only need to be relocated. No data needs to be
rewritten.
File checksums and string tables, on the other hand, need to be re-done.
Reviewers: zturner, ruiu
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D34257
llvm-svn: 305713
Summary:
These 4 patterns have the same one use check repeated twice for each. Once without a cast and one with. But the cast has no effect on what method is called.
For the OR case I believe it is always profitable regardless of the number of uses since we'll never increase the instruction count.
For the AND case I believe it is profitable if the pair of xors has one use such that we'll get rid of it completely. Or if the C value is something freely invertible, in which case the not doesn't cost anything.
Reviewers: spatel, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34308
llvm-svn: 305705
Summary:
Currently we don't try to do anything with vector xors.
This patch adds support for removing duplicate pairs from a chain of vector xors as its pretty easy to support. We still dont' try to combine the xors with and/ors, but I might try that in a future patch.
Reviewers: mcrosier, davide, resistor
Reviewed By: mcrosier
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34338
llvm-svn: 305704
As all store merges checks are based on the memory operation
performed, allow use of truncated stores and extended loads as valid
input candidates for merging.
Relanding after fixing selection between truncated and normal store.
llvm-svn: 305701
Summary:
After a single predecessor is merged into a basic block, we need to invalidate
the LVI information for the new merged block, when LVI is not provably true for
all of instructions in the new block.
The test cases added show the correct LVI information using the LVI printer
pass.
Reviewers: reames, dberlin, davide, sanjoy
Reviewed by: dberlin, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34108
llvm-svn: 305699
Summary: use AA to tell whether a load can be moved before a call that writes to memory.
Reviewers: dberlin, davide, sanjoy, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D34115
llvm-svn: 305698
Summary: Implement some of the simplest addressing modes.It should help to test ABI.
Reviewers: zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D33888
llvm-svn: 305691
Use llvm::make_unique to avoid ambiguity with MSVC.
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Differential Revision: https://reviews.llvm.org/D34144
llvm-svn: 305690
Summary:
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Reviewers: craig.topper, evandro, t.p.northover, atrick, MatzeB
Reviewed By: MatzeB
Subscribers: atrick, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34144
llvm-svn: 305677
Add support throughout the pipeline:
- mark as legal for s32 and pointers
- map to GPRs
- lower to a sequence of instructions, which moves 0 or 1 into the
result register based on the flags set by a CMPrr
We have copied from FastISel a helper function which maps CmpInst
predicates into ARMCC codes. Ideally, we should be able to move it
somewhere that both FastISel and GlobalISel can use.
llvm-svn: 305672
Current implementation of SCEVExpander demonstrates a very naive behavior when
it deals with power calculation. For example, a SCEV for x^8 looks like
(x * x * x * x * x * x * x * x)
If we try to expand it, it generates a very straightforward sequence of muls, like:
x2 = mul x, x
x3 = mul x2, x
x4 = mul x3, x
...
x8 = mul x7, x
This is a non-efficient way of doing that. A better way is to generate a sequence of
binary power calculation. In this case the expanded calculation will look like:
x2 = mul x, x
x4 = mul x2, x2
x8 = mul x4, x4
In some cases the code size reduction for such SCEVs is dramatic. If we had a loop:
x = a;
for (int i = 0; i < 3; i++)
x = x * x;
And this loop have been fully unrolled, we have something like:
x = a;
x2 = x * x;
x4 = x2 * x2;
x8 = x4 * x4;
The SCEV for x8 is the same as in example above, and if we for some reason
want to expand it, we will generate naively 7 multiplications instead of 3.
The BinPow expansion algorithm here allows to keep code size reasonable.
This patch teaches SCEV Expander to generate a sequence of BinPow multiplications
if we have repeating arguments in SCEVMulExpressions.
Differential Revision: https://reviews.llvm.org/D34025
llvm-svn: 305663
Merge the functionality into the random access type collection.
This class was only being used in 2 places, so getting rid of it
simplifies the code.
llvm-svn: 305653
Summary:
This allows strlen to be moved out of the loop in case its argument is
not modified in the loop in LICM.
Reviewers: hfinkel, davide, sanjoy, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34323
llvm-svn: 305641
No behavior is changed if LLVM_TARGET_TRIPLE_ENV is blank or undefined.
If LLVM_TARGET_TRIPLE_ENV is "TEST_TARGET_TRIPLE" and $TEST_TARGET_TRIPLE is not blank,
llvm::sys::getDefaultTargetTriple() returns $TEST_TARGET_TRIPLE.
Lit resets config.target_triple and config.environment[LLVM_TARGET_TRIPLE_ENV] to change the default target.
Without changing LLVM_DEFAULT_TARGET_TRIPLE nor rebuilding, lit can be run;
TEST_TARGET_TRIPLE=i686-pc-win32 bin/llvm-lit -sv path/to/test/
TEST_TARGET_TRIPLE=i686-pc-win32 ninja check-clang-tools
Differential Revision: https://reviews.llvm.org/D33662
llvm-svn: 305632
Re-apply r276044/r279124/r305516. Fixed a problem where we would refuse
to place spills as the very first instruciton of a basic block and thus
artifically increase pressure (test in
test/CodeGen/PowerPC/scavenging.mir:spill_at_begin)
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 305625
Summary:
This is my misunderstanding on isBarrier. It's not for memory barriers,
but for other control flow purposes. lwsync doesn't have it either.
This fixes a simple crash with -verify-machineinstrs like below:
define void @Foo() {
entry:
%tmp = load atomic i64, i64* undef acquire, align 8
unreachable
}
I deliberately don't want to check in the test, since there is little
chance to regress on such a mistake. Such a test adds noise to the code
base.
I plan to check in first, since it fixes a crash, and the fix is obvious.
Reviewers: kbarton, echristo
Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D34314
llvm-svn: 305624
This ensures that symbolic relocations are generated for stack
pointer manipulations.
These relocations are of type R_WEBASSEMBLY_GLOBAL_INDEX_LEB.
This change also adds support for reading relocations of this
type in WasmObjectFile.cpp.
Since its a globally imported symbol this does mean that
the get_global/set_global instruction won't be valid until
the objects are linked that global used in no longer an
imported global.
Differential Revision: https://reviews.llvm.org/D34172
llvm-svn: 305616
Suppose we had a type index offsets array with a boundary at type index
N. Then you request the name of the type with index N+1, and that name
requires the name of index N-1 (think a parameter list, for example). We
didn't handle this, and we would print something like (<unknown UDT>,
<unknown UDT>).
The fix for this is not entirely trivial, and speaks to a larger
problem. I think we need to kill TypeDatabase, or at the very least kill
TypeDatabaseVisitor. We need a thing that doesn't do any caching
whatsoever, just given a type index it can compute the type name "the
slow way". The reason for the bug is that we don't have anything like
that. Everything goes through the type database, and if we've visited a
record, then we're "done". It doesn't know how to do the expensive thing
of re-visiting dependent records if they've not yet been visited.
What I've done here is more or less copied the code (albeit greatly
simplified) from TypeDatabaseVisitor, but wrapped it in an interface
that just returns a std::string. The logic of caching the name is now in
LazyRandomTypeCollection. Eventually I'd like to move the record
database here as well and the visited record bitfield here as well, at
which point we can actually just delete TypeDatabase. I don't see any
reason for it if a "sequential" collection is just a special case of a
random access collection with an empty partial offsets array.
Differential Revision: https://reviews.llvm.org/D34297
llvm-svn: 305612
Davide Italiano reported the following issue if llvm
is compiled with gcc -Wstrict-aliasing -Werror:
.....
lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BPFISelDAGToDAG.cpp.o
../lib/Target/BPF/BPFISelDAGToDAG.cpp: In member function ‘virtual
void {anonymous}::BPFDAGToDAGISel::PreprocessISelDAG()’:
../lib/Target/BPF/BPFISelDAGToDAG.cpp:264:26: warning: dereferencing
type-punned pointer will break strict-aliasing rules
[-Wstrict-aliasing]
val = *(uint16_t *)new_val;
.....
The error is caused by my previous commit (revision 305560).
This patch fixed the issue by introducing an union to avoid
type casting.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 305608
Summary: As far as I can tell we should be able to implement these almost the same way we do unsigned, but using signed comparisons and checks for min signed value instead of min unsigned value.
Reviewers: pete, davide, sanjoy
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33815
llvm-svn: 305607
For the following motivating example
bool c();
void f();
bool start() {
bool result = c();
if (!c()) {
result = false;
goto exit;
}
f();
result = true;
exit:
return result;
}
we would previously generate a single DW_AT_const_value(1) because
only the DBG_VALUE in the second-to-last basic block survived
codegen. This patch improves the heuristic used to determine when a
DBG_VALUE is available at the beginning of its variable's enclosing
lexical scope:
- Stop giving singular constants blanket permission to take over the
entire scope. There is still a special case for constants in the
function prologue that we also miight want to retire later.
- Use the lexical scope information to determine available-at-entry
instead of proximity to the function prologue.
After this patch we generate a location list with a more accurate
narrower availability for the constant true value. As a pleasant side
effect, we also generate inline locations instead of location lists
where a loacation covers the entire range of the enclosing lexical
scope.
Measured on compiling llc with four targets this doesn't have an
effect on compile time and reduces the size of the debug info for llc
by ~600K.
rdar://problem/30286912
llvm-svn: 305599
The verifier should not output any message in such a case.
Added test case with no .apple_name section in the file to verify new functionality.
Made existing test case more specific.
llvm-svn: 305597
In this patch, I flip the switch in DriverUtils from using the external
cvtres.exe tool to using the Windows Resource library in llvm.
I also fixed a bug where .rsrc sections were marked as discardable
memory and therefore were placed in the wrong order in the final PE.
Furthermore, I modified WindowsResource to write the coff directly to a
memory buffer instead of to file, also had it use the machine types
already declared in COFF.h instead creating my own enum.
Finally, I flipped the switch to allow all unit tests that had
previously run only on windows due to a winres dependency to run
cross-platform.
Reviewers: zturner, ruiu
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D34265
llvm-svn: 305592
Summary:
When we fold vector constants that are operands of phi's that feed into select,
we need to set the correct insertion point for the *new* selects that get generated.
The correct insertion point is the incoming block for the phi.
Such cases can occur with patch r298845, which fixed folding of
vector constants, but the new selects could be inserted incorrectly (as the added
test case shows).
Reviewers: majnemer, spatel, sanjoy
Reviewed by: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34162
llvm-svn: 305591
The recommit fixes two bugs: The first one is to use CurrentBlock instead of
PREInstr's Parent as param of performScalarPREInsertion because the Parent
of a clone instruction may be uninitialized. The second one is stop PRE when
CurrentBlock to its predecessor is a backedge and an operand of CurInst is
defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
Differential Revision: https://reviews.llvm.org/D32252
llvm-svn: 305578
Revert because of reports of some PPC input starting to spill when it
was predicted that it wouldn't and no spillslot was reserved.
This reverts commit r305516.
llvm-svn: 305566
If users tried to have a structure decl/init code like below
struct test_t t = { .memeber1 = 45 };
It is very likely that compiler will generate a readonly section
to hold up the init values for variable t. Later load of t members,
e.g., t.member1 will result in a read from readonly section.
BPF program cannot handle relocation. This will force users to
write:
struct test_t t = {};
t.member1 = 45;
This is just inconvenient and unintuitive.
This patch addresses this issue by implementing BPF PreprocessISelDAG.
For any load from a global constant structure or an global array of
constant struct, it attempts to
translate it into a constant directly. The traversal of the
constant struct and other constant data structures are similar
to where the assembler emits read-only sections.
Four different unit test cases are also added to cover
different scenarios.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 305560
o This is discovered during my study of 32-bit subregister
support.
o This is no impact on current functionality since we
only support 64-bit registers.
o Searching the web, looks like the issue has been discovered
before, so fix it now.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 305559
Summary:
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.
Reviewers: reames, sanjoy, efriedma
Reviewed By: reames
Subscribers: mzolotukhin, anna, llvm-commits, skatkov
Differential Revision: https://reviews.llvm.org/D33240
llvm-svn: 305558
This reverts commit r305455. This commit was reported as breaking one of
the sanitizer buildbots. Reverting until lab.llvm.org comes back online.
llvm-svn: 305557
The second part of r305300: when placing the mux at the later location,
make sure that it won't use any register that was killed between the
two original instructions. Remove any such kills and transfer them to
the mux.
llvm-svn: 305553
- Topologocal is abbreviated as "topo" in comments, but "top" is used in only one comment. Modify it for consistency.
- Capitalize "succ" and "pred" for consistency in one figure.
- Other trivial fixes.
llvm-svn: 305552
Summary: This is the demorganed version of the case we already handle for the OR of iszero.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34244
llvm-svn: 305548
This patch should fix sanitizer-x86_64-linux-fast bot.
The problem was that the contents of this stream are aligned to 4 byte,
and the paddings were created just by incrementing `Offset`, so paddings
had undefined values. When the entire stream is written to an output,
it triggered msan.
llvm-svn: 305541
This resubmits commit c0c249e9f2ef83e1d1e5f166b50673d92f3579d7.
It was broken due to some weird template issues, which have
since been fixed.
llvm-svn: 305517
Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64
problems reported in the stage2 build last time, which I cannot
reproduce right now.
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 305516
This reverts commit 83ea17ebf2106859a51fbc2a86031b44d33696ad.
This is failing due to some strange template problems, so reverting
until it can be straightened out.
llvm-svn: 305505
Summary:
Split the PGOMemOPSizeOpt pass out from IndirectCallPromotion.cpp into
its own file.
Reviewers: davidxl
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D34248
llvm-svn: 305501
After some internal discussions, we agreed that the raw output style had
outlived its usefulness. It was originally created before we had even
thought of dumping to YAML, and it was intended to give us some insight
into the internals of a PDB file. Now we have YAML mode which does
almost exactly this but is more powerful in that it can round-trip back
to a PDB, which the raw mode could not do. So the raw mode had become
purely a maintenance burden.
One option was to just delete it. However, its original goal was to be
as readable as possible while staying close to the "metal" - i.e.
presenting the output in a way that maps directly to the underlying file
format. We don't actually need that last requirement anymore since it's
covered by the yaml mode, so we could repurpose "raw" mode to actually
just be as readable as possible.
This patch implements about 80% of the functionality previously in raw
mode, but in a completely different style that is more akin to what
cvdump outputs. Records are very compressed, often times appearing on
just one line. One nice thing about this is that it makes full record
matching easier, because you can grep for indices, names, and leaf types
on a single line often.
See the tests for some examples of what the new output looks like.
Note that this patch actually regresses the functionality of raw mode in
a few areas, but only because the patch was already unreasonably large
and going 100% would have been even worse. Specifically, this patch is
missing:
The ability to dump module debug subsections (checksums, lines, etc)
The ability to dump section headers
Aside from that everything is here. While goign through the tests fixing
them all up, I found many duplicate tests. They've been deleted. In
subsequent patches I will go through and re-add the missing
functionality.
Differential Revision: https://reviews.llvm.org/D34191
llvm-svn: 305495
Add condition for MachineLICM to safely hoist instructions that utilize
non constant registers that are reserved.
On PPC, global variable access is done through the table of contents (TOC)
which is always in register X2. The ABI reserves this register in any
functions that have calls or access global variables.
A call through a function pointer involves saving, changing and restoring
this register around the call and thus MachineLICM does not consider it to
be invariant. We can however guarantee the register is preserved across the
call and thus is invariant.
Differential Revision: https://reviews.llvm.org/D33562
llvm-svn: 305490
Currently we expect A to be on the same side in both Ands but nothing guarantees that.
While there also switch to using matchers for some of the code.
Differential Revision: https://reviews.llvm.org/D34230
llvm-svn: 305487
The code assumed that we process instructions in basic block order. FastISel
processes instructions in reverse basic block order. We need to pre-assign
virtual registers before selecting otherwise we get def-use relationships wrong.
This only affects code with swifterror registers.
rdar://32659327
llvm-svn: 305484
If a regular LTO module has a summary index, then instead of linking
it into the combined regular LTO module right away, add it to the
combined summary index and associate it with a special module that
represents the combined regular LTO module.
Any such modules are linked during LTO::run(), at which time we use
the results of summary-based dead stripping to control whether to
link prevailing symbols.
Differential Revision: https://reviews.llvm.org/D33922
llvm-svn: 305482
This is a fix for the test case in PR32314.
Basic Alias Analysis can ask if two nodes are known non-equal after looking through a phi node to find a GEP. isAddOfNonZero saw an add of a constant from the same phi and said that its output couldn't be equal. But Basic Alias Analysis was really asking about the value from the previous loop iteration.
This patch at least makes that case not happen anymore, I'm not sure if there were still other ways this can fail. As was discussed in the bug, it looks like fixing BasicAA would be difficult so this patch seemed like a possible workaround
Differential Revision: https://reviews.llvm.org/D33136
llvm-svn: 305481
This patch fixes a potential verification error (64-bit register operands for cmpw) with -verify-machineinstrs.
Differential Revision: https://reviews.llvm.org/D34208
llvm-svn: 305479
In preparation for doing storemerge post-legalization, reorder
visitSTORE passes to move pre/post-index combining after store
merge. Reordered passes other than store merge are unaffected.
llvm-svn: 305473
As all store merges checks are based on the memory operation
performed, allow use of truncated stores and extended loads as valid
input candidates for merging.
llvm-svn: 305468
AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.
When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.
Differential Revision: https://reviews.llvm.org/D33188
llvm-svn: 305465
This is a fix for PR33292 that shows a case of extremely long compilation
of a single .c file with clang, with most time spent within SCEV.
We have a mechanism of limiting recursion depth for getAddExpr to avoid
long analysis in SCEV. However, there are calls from getAddExpr to getMulExpr
and back that do not propagate the info about depth. As result of this, a chain
getAddExpr -> ... .> getAddExpr -> getMulExpr -> getAddExpr -> ... -> getAddExpr
can be extremely long, with every segment of getAddExpr's being up to max depth long.
This leads either to long compilation or crash by stack overflow. We face this situation while
analyzing big SCEVs in the test of PR33292.
This patch applies the same limit on max expression depth for getAddExpr and getMulExpr.
Differential Revision: https://reviews.llvm.org/D33984
llvm-svn: 305463
Add support for modulo for targets that have hardware division and for
those that don't. When hardware division is not available, we have to
choose the correct libcall to use. This is generally straightforward,
except for AEABI.
The AEABI variant is trickier than the other libcalls because it
returns { quotient, remainder }, instead of just one value like the
other libcalls that we've seen so far. Therefore, we need to use custom
lowering for it. However, we don't want to have too much special code,
so we refactor the target-independent code in the legalizer by adding a
helper for replacing an instruction with a libcall. This helper is used
by the legalizer itself when dealing with simple calls, and also by the
custom ARM legalization for the more complicated AEABI divmod calls.
llvm-svn: 305459
Lowering mixed struct args, params and returns used G_INSERT, which is a
bit more convoluted to support through the entire pipeline. Since they
don't occur that often in practice, it's probably wiser to leave them
out until later.
Meanwhile, we can lower homogeneous structs using G_MERGE_VALUES, which
has good support in the legalizer. These occur e.g. as the return of
__aeabi_idivmod, so it's nice to be able to support them.
llvm-svn: 305458
Summary:
Scheduling AESE/AESMC and AESD/AESIMC instruction pairs back-to-back
gives a double digit speedup on benchmarks using those instructions on
Cortex-A processors. In GCC, this optimization is part of the generic
processor model as well.
This change should not have a major performance impact on processors
that do not optimize AES instruction pairs, although I only had access
to Cortex-A processors for benchmarking.
Reviewers: rengolin, kristof.beyls, javed.absar, evandro, silviu.baranga, MatzeB, mcrosier, joelkevinjones, joel_k_jones, bmakam, t.p.northover
Reviewed By: evandro
Subscribers: sbaranga, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D33836
llvm-svn: 305457
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
ADDIU instruction is transformed into 16-bit instruction ADDIUSP
ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
Differential Revision: https://reviews.llvm.org/D33887
llvm-svn: 305455
The current name (addModulePath) and return value
(ModulePathStringTableTy::iterator) is a little confusing. This
API adds a module, not just a path. And the iterator is basically
just an implementation detail of the summary index. Address
both of those issues by renaming to addModule and introducing a
ModuleSummaryIndex::ModuleInfo type that the function returns.
Differential Revision: https://reviews.llvm.org/D34124
llvm-svn: 305422
Summary:
At present, `-profile-guided-section-prefix` is a `cl::Optional` option, which means it demands to be passed exactly zero or one times. Our build system makes it pretty tricky to guarantee this. We often accidentally pass the flag more than once (but always with the same "false" value) which results in an error, after which compilation fails:
```
clang (LLVM option parsing): for the -profile-guided-section-prefix option: may only occur zero or one times!
```
While we work on improving our build system, it also seems reasonable just to allow `-profile-guided-section-prefix` to be passed more than once, by to `cl::ZeroOrMore`. Quoting [[ http://llvm.org/docs/CommandLine.html#controlling-the-number-of-occurrences-required-and-allowed | the documentation ]]:
> The cl::ZeroOrMore modifier ... indicates that your program will allow the option to be specified zero or more times.
> ...
> If an option is specified multiple times for an option of the cl::opt class, only the last value will be retained.
Reviewers: danielcdh
Reviewed By: danielcdh
Subscribers: twoh, david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D34219
llvm-svn: 305413
This way we end up not looking at PHI args already removed.
MemSSA now goes through the updater so we can prune
it to avoid having redundant MemoryPHI arguments, but that
doesn't quite work for the general case.
Discussed with Daniel Berlin, fixes PR33406.
llvm-svn: 305409
There's an early out that's trying to detect when we don't know any bits that make up the legal range of a shift. The code subtracts one from BitWidth which creates a mask in the lower bits for power of 2 bit widths. This is then ANDed with the known bits to see if any of those bits are known. If the bit width isn't a power of 2 this creates a non-sensical mask.
This patch corrects this by rounding up to a power of 2 before doing the subtract and mask.
Differential Revision: https://reviews.llvm.org/D34165
llvm-svn: 305400
We know that shuffle masks are power-of-2 sizes, but there's no way (?) for LLVM to know that,
so hack combineX86ShufflesRecursively() to be much faster by replacing div/rem with shift/mask.
This makes the motivating compile-time bug in PR32037 ( https://bugs.llvm.org/show_bug.cgi?id=32037 )
about 9% faster overall.
Differential Revision: https://reviews.llvm.org/D34174
llvm-svn: 305398
Many times unit tests for different libraries would like to use
the same helper functions for checking common types of errors.
This patch adds a common library with helpers for testing things
in Support, and introduces helpers in here for integrating the
llvm::Error and llvm::Expected<T> classes with gtest and gmock.
Normally, we would just be able to write:
EXPECT_THAT(someFunction(), succeeded());
but due to some quirks in llvm::Error's move semantics, gmock
doesn't make this easy, so two macros EXPECT_THAT_ERROR() and
EXPECT_THAT_EXPECTED() are introduced to gloss over the difficulties.
Consider this an exception, and possibly only temporary as we
look for ways to improve this.
Differential Revision: https://reviews.llvm.org/D33059
llvm-svn: 305395
This was originally reverted because of some non-deterministic
failures on certain buildbots. Luckily ASAN eventually caught
this as a stack-use-after-scope, so the fix is included in
this patch.
llvm-svn: 305393
This reverts commit 3a204faa093c681a1e96c5e0622f50649b761ee0.
I've upset a buildbot which runs the address sanitizer:
ERROR: AddressSanitizer: stack-use-after-scope
lib/Target/ARM/ARMISelLowering.cpp:2690
That Twine variable is used illegally.
llvm-svn: 305390
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.
For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.
Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.
Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.
This revolves PR32713 and PR33424.
Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D33494
llvm-svn: 305389
The ARM backend asserts against constant pool lowering when it generates
execute-only code in order to prevent the generation of constant pools in
the text section. It appears that target independent optimizations might
generate DAG nodes that represent constant pools. By lowering such nodes
as global addresses we don't violate the semantics of execute-only code
and also it is guaranteed that execute-only behaves correct with the
position-independent addressing modes that support execute-only code.
Differential Revision: https://reviews.llvm.org/D33773
llvm-svn: 305387
Summary:
This patch is part of 3 patches that together form a single patch, but must be introduced in stages in order not to break things.
The way that LLVM interprets DW_OP_plus in DIExpression nodes is basically that of the DW_OP_plus_uconst operator since LLVM expects an unsigned constant operand. This unnecessarily restricts the DW_OP_plus operator, preventing it from being used to describe the evaluation of runtime values on the expression stack. These patches try to align the semantics of DW_OP_plus and DW_OP_minus with that of the DWARF definition, which pops two elements off the expression stack, performs the operation and pushes the result back on the stack.
This is done in three stages:
• The first patch (LLVM) adds support for DW_OP_plus_uconst.
• The second patch (Clang) contains changes all its uses from DW_OP_plus to DW_OP_plus_uconst.
• The third patch (LLVM) changes the semantics of DW_OP_plus and DW_OP_minus to be in line with its DWARF meaning. This patch includes the bitcode upgrade from legacy DIExpressions.
Patch by Sander de Smalen.
Reviewers: echristo, pcc, aprantl
Reviewed By: aprantl
Subscribers: fhahn, javed.absar, aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D33894
llvm-svn: 305386
This patch fixes two systemic machine verifier errors in the long
branch pass. The first is the incorrect basic block successors
and the second was the incorrect construction of several jump
instructions.
This partially resolves PR27458 and the associated PR32146.
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D33378
llvm-svn: 305382
This is causing failures on linux bots with an invalid stream
read. It doesn't repro in any configuration on Windows, so
reverting until I have a chance to investigate on Linux.
llvm-svn: 305371
This allows us to use yaml2obj and obj2yaml to round-trip CodeView
symbol and type information without having to manually specify the bytes
of the section. This makes for much easier to maintain tests. See the
tests under lld/COFF in this patch for example. Before they just said
SectionData: <blob> whereas now we can use meaningful record
descriptions. Note that it still supports the SectionData yaml field,
which could be useful for initializing a section to invalid bytes for
testing, for example.
Differential Revision: https://reviews.llvm.org/D34127
llvm-svn: 305366
This patch adds code which verifies that each bucket in the .apple_names
accelerator table is either empty or has a valid hash index.
Differential Revision: https://reviews.llvm.org/D34177
llvm-svn: 305344
Summary:
When legalizing G_LOAD/G_STORE using NarrowScalar, we should avoid emitting
%0 = G_CONSTANT ty 0
%1 = G_GEP %x, %0
since it's cheaper to not emit the redundant instructions than it is to fold them
away later.
Reviewers: qcolombet, t.p.northover, ab, rovka, aditya_nandakumar, kristof.beyls
Reviewed By: qcolombet
Subscribers: javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D32746
llvm-svn: 305340
Doing so breaks compilation of the following C program
(under -fprofile-instr-generate):
__attribute__((always_inline)) inline int foo() { return 0; }
int main() { return foo(); }
At link time, we fail because taking the address of an
available_externally function creates an undefined external reference,
which the TU cannot provide.
Emitting the function definition into the object file at all appears to
be a violation of the langref: "Globals with 'available_externally'
linkage are never emitted into the object file corresponding to the LLVM
module."
Differential Revision: https://reviews.llvm.org/D34134
llvm-svn: 305327
Summary:
We were writing the length of the string based on system-endianness, and
not universally little-endian. This fixes that.
Reviewers: zturner
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D34159
llvm-svn: 305322
Summary:
Leave an updated VP metadata on the fallback memcpy intrinsic after
specialization. This can be used for later possible expansion based on
the average of the remaining values.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34164
llvm-svn: 305321
Summary: Apparently we need to write using a void* pointer on some architectures, or else alignment error is caused.
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D34166
llvm-svn: 305320
Summary: Added output to stderr so that we can actually see what is happening when the test fails on big endian.
Reviewers: zturner
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D34155
llvm-svn: 305314
Store-immediate instructions have a non-extendable offset. Since the
actual offset for a stack object is not known until much later, only
generate these stores when the stack size (at the time of instruction
selection) is small.
llvm-svn: 305305
Summary:
This patch is part of 3 patches that together form a single patch, but must be introduced in stages in order not to break things.
The way that LLVM interprets DW_OP_plus in DIExpression nodes is basically that of the DW_OP_plus_uconst operator since LLVM expects an unsigned constant operand. This unnecessarily restricts the DW_OP_plus operator, preventing it from being used to describe the evaluation of runtime values on the expression stack. These patches try to align the semantics of DW_OP_plus and DW_OP_minus with that of the DWARF definition, which pops two elements off the expression stack, performs the operation and pushes the result back on the stack.
This is done in three stages:
• The first patch (LLVM) adds support for DW_OP_plus_uconst.
• The second patch (Clang) contains changes all its uses from DW_OP_plus to DW_OP_plus_uconst.
• The third patch (LLVM) changes the semantics of DW_OP_plus and DW_OP_minus to be in line with its DWARF meaning. This patch includes the bitcode upgrade from legacy DIExpressions.
Patch by Sander de Smalen.
Reviewers: pcc, echristo, aprantl
Reviewed By: aprantl
Subscribers: fhahn, aprantl, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33892
llvm-svn: 305304
Summary: Fixes an issue using RegisterStandardPasses from a statically linked object before PassManagerBuilder::addGlobalExtension is called from a dynamic library.
Reviewers: efriedma, theraven
Reviewed By: efriedma
Subscribers: mehdi_amini, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D33515
llvm-svn: 305303
When a mux instruction is created from a pair of complementary conditional
transfers, it can be placed at the location of either the earlier or the
later of the transfers. Since it will use the operands of the original
transfers, putting it in the earlier location may hoist a kill of a source
register that was originally further down. Make sure the kill flag is
removed if the register is still used afterwards.
llvm-svn: 305300
Summary:
Expose the module descriptor index and fill it in for section
contributions.
Reviewers: zturner
Subscribers: llvm-commits, ruiu, hiraditya
Differential Revision: https://reviews.llvm.org/D34126
llvm-svn: 305296
While simplifying branches in the MachineInstr representation, the
routine BuildCondBr must preserve flags on register MachineOperands. In
particular, it must preserve the <undef> flag.
This fixes a bug that is unlikely to occur in any real scenario, but
which bugpoint is likely to introduce.
Patch By Nick Johnson!
Reviewers: ahatanak, sdardis
Differential Revision: https://reviews.llvm.org/D34041
llvm-svn: 305290
The VFNM[AS] instructions did not have scheduling information attached, which
was causing assertion failures with the Cortex-A57 scheduling model and
-fp-contract=fast, because the Cortex-A57 sched model claims to be complete.
Differential Revision: https://reviews.llvm.org/D34139
llvm-svn: 305288
Much of PR32037's compile time regression is due to getTargetConstantBitsFromNode always creating large (>64bit) APInts during the bitcasting from the source data to the destination bitwidth.
This commit avoids this bitcast stage if the data is already the correct bitwidth.
llvm-svn: 305284
This restores the order of evaluation (& conditionalized evaluation) of
isTriviallyDeadInstruction, InlineHistoryIncludes, and shouldInline
(with the addition of a shouldInline call after
isTriviallyDeadInstruction) from before r305245.
llvm-svn: 305267
These symbols were previously not being marked as functions
so were appearing as globals instead, and with the incorrect
relocation type.
Without this fix, objects that take address of external
functions include them as global imports rather than function
imports which then fails at link time.
Differential Revision: https://reviews.llvm.org/D34068
llvm-svn: 305263
This revert was done so that my other patch to add test framework could
land separately. Now the revert can be reverted and this patch can
reland.
This reverts commit 18b3c75b2b0d32601fb60a06b9672c33d6f0dff9.
llvm-svn: 305259
Summary: Added test cases for multiple machine types, file merging, multiple languages, and more resource types. Also fixed new bugs these tests exposed.
Subscribers: javed.absar, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D34047
llvm-svn: 305258
I accidentally combined this patch with one for adding more tests, they
should be separated.
This reverts commit 3da218a523be78df32e637d3446ecf97c9ea0465.
llvm-svn: 305257
Previously we were writing the value function index space
value but for these types of relocations we want to be
writing the table element index space value.
Add a test case for these relocation types that fails
without this change.
Differential Revision: https://reviews.llvm.org/D33962
llvm-svn: 305253
User has 3 signatures for operator new today. They take a single size, a size and a number of users, and a size, number of users, and descriptor size.
Historically there used to only be one signature that took size and a number of uses. Long ago derived classes implemented their own versions that took just a size and would call the size and use count version. Then they left an unimplemented signature for the size and use count signature from User. As we moved to C++11 this unimplemented signature because = delete.
Since then operator new has picked up two new signatures for operator new. But when the 3 argument version was added it was never added to the delete list in all of the derived classes where the 2 argument version is deleted. This makes things inconsistent.
I believe once one version of operator new is created in a derived class name hiding will take care of making all of the base class signatures unavailable. So I don't think the deleted lines are needed at all.
This patch removes all of the deletes in cases where there is an override or there is already a delete of another signature (that should trigger name hiding too).
Differential Revision: https://reviews.llvm.org/D34120
llvm-svn: 305251
When we get an unknown symbol type, we might as well at least
dump it. Same goes for round-tripping through YAML, we can
dump the record contents as raw bytes even if we don't know
how to interpret it semantically.
llvm-svn: 305248
This fixes PR33157.
https://bugs.llvm.org//show_bug.cgi?id=33157
We might also think about disallowing duplicate dbg.declare intrinsics
entirely, but this may complicate some passes needlessly.
llvm-svn: 305244
It doesn't seem relevant to set an address space limit - this isn't
important in any sense that I'm aware & it gets in the way of things
that use a lot of address space, like llvm-symbolizer.
This came up when I realized that bugpoint regression tests were much
slower with -gsplit-dwarf than plain -g. Turned out that bugpoint
subprocesses (opt, etc) were crashing and doing symbolization - but
bugpoint runs those subprocesses with a 400MB memory limit. So with
plain -g, mmaping the opt binary would exceed the memory limit, fail,
and thus be really fast - no symbolization occurred. Whereas with
-gsplit-dwarf, comically, having less to map in, it would succeed and
then spend lots of time symbolizing.
I've fixed at least the critical part of bugpoint's perf problem there
by adding an option to allow bugpoint to disable symbolization. Thus
improving the perfromance for -gsplit-dwarf and making the -g-esque
speed available without this quirk/accidental benefit.
llvm-svn: 305242
The last fix required the user to manually add the required
feature. This caused an LLD test to fail because I failed to
update LLD. In practice we can hide this logic so it can just
be transparently added when we write the PDB.
llvm-svn: 305236
Older PDBs don't have this. Its presence is detected by using
the various "feature" flags that come at the end of the PDB
Stream. Detect this, and don't try to dump the ID stream if the
features tells us it's not present.
llvm-svn: 305235
Summary:
After RS4GC, we should drop metadata that is no longer valid. These metadata
is used by optimizations scheduled after RS4GC, and can cause a miscompile.
One such metadata is invariant.load which is used by LICM sinking transform.
After rewriting statepoints, the address of a load maybe relocated. With
invariant.load metadata on a load instruction, LICM sinking assumes the
loaded value (from a dererenceable address) to be invariant, and
rematerializes the load operand and the load at the exit block.
This transforms the IR to have an unrelocated use of the
address after a statepoint, which is incorrect.
Other metadata we conservatively remove are related to
dereferenceability and noalias metadata.
This patch drops such metadata on store and load instructions after
rewriting statepoints.
Reviewers: reames, sanjoy, apilipenko
Reviewed by: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33756
llvm-svn: 305234
This is a precursor to another change (coming soon) that aims to make
FoldingSet's API more type-safe. Without this, the type-safety change
would just duplicate 4 more public methods between the already very
similar classes.
This renames FoldingSetImpl to FoldingSetBase so it's consistent with
the FooBase -> FooImpl<T> -> Foo<T> convention we seem to have with
other containers.
llvm-svn: 305231
The "Add/sub (shifted reg)" instructions use the 31 encoding for xzr and wzr
rather than the SP, so we need to use different variants.
Situations where this actually comes up are rare enough (see test-case) that I
think falling back to DAG is fine.
llvm-svn: 305230
Static data members were causing a problem because I mistakenly
assumed all members would affect a class's layout and so the
Layout member would be non-null.
llvm-svn: 305229
Fix thinko/typo in subreg aware liverange splitting logic. I'm not sure
how to write a proper testcase for this. The original problem only
happens on an out-of-tree target. Forcing subreg enabled targets to
spill and split in a predictable way is near impossible.
llvm-svn: 305228
Summary:
Use the filepath used to open the archive member as the archive member
name instead of the file basename. This path might be absolute or
relative. This is important because the archive member name will show
up in the PDB, and we want our PDBs to look as much like MSVC's as
possible.
This also helps avoid an issue in our PDB module descriptor writing
code, which assumes that all module names are unique. Relative paths
still aren't guaranteed to be unique, but they're much better than
basenames, which definitely aren't unique.
Reviewers: ruiu, zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33575
llvm-svn: 305223
Power9 has instructions that will reverse the bytes within an element for all
sizes (half-word, word, double-word and quad-word). These can be used for the
vec_revb builtins in altivec.h. However, we implement these to match vector
shuffle nodes as that will cover both the builtins and vector shuffles that
occur in the SDAG through other means.
Differential Revision: https://reviews.llvm.org/D33690
llvm-svn: 305214
Note that if we need the result of both the divide and the modulo then we
compute the modulo based on the result of the divide and not using the new
hardware instruction.
Commit on behalf of STEFAN PINTILIE.
Differential Revision: https://reviews.llvm.org/D33940
llvm-svn: 305210
Summary:
This change enables the sin(x) cos(x) -> sincos(x) optimization on GNU
target triples. This optimization was being inhibited when -ffast-math
wasn't set because sincos in GLibC does not set errno, while sin and cos
do. However, this optimization will only run if the attributes on the
sin/cos calls include readnone, which is how clang represents the fact
that it doesn't care about the errno values set by these functions (via
the -fno-math-errno flag).
Reviewers: hfinkel, bogner
Subscribers: mcrosier, javed.absar, llvm-commits, paul.redmond
Differential Revision: https://reviews.llvm.org/D32921
llvm-svn: 305204
Summary:
The old check for slot overlap treated 2 slots `S` and `T` as
overlapping if there existed a CFG node in which both of the slots could
possibly be active. That is overly conservative and caused stack blowups
in Rust programs. Instead, check whether there is a single CFG node in
which both of the slots are possibly active *together*.
Fixes PR32488.
Patch by Ariel Ben-Yehuda <ariel.byd@gmail.com>
Reviewers: thanm, nagisa, llvm-commits, efriedma, rnk
Reviewed By: thanm
Subscribers: dotdash
Differential Revision: https://reviews.llvm.org/D31583
llvm-svn: 305193
This step is just intended to reduce code duplication rather than change any functionality.
A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper.
Differential Revision: https://reviews.llvm.org/D33649
llvm-svn: 305192
Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.
Reviewers: chandlerc, rnk, reames
Reviewed By: reames
Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D33903
llvm-svn: 305189
First possible step towards merging SSE/AVX memory folding pattern fragments.
Also allows us to remove the duplicate non-temporal load logic.
Differential Revision: https://reviews.llvm.org/D33902
llvm-svn: 305184
I was looking closer at the x86 test diffs in D33866, and the first change seems like it
shouldn't happen in the first place. So this patch will resolve that.
Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for
any given CPU model, so we should be able to interchange those without affecting perf.
But as we can see in some of the diffs here, using vperm2f128 allows load folding, so
we should take that opportunity to reduce code size and register pressure.
A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128
was introduced with AVX1, we should be selecting it in all of the same situations that we
would with AVX2. If there's some reason that an AVX1 CPU would not want to use this
instruction, that should be fixed up in a later pass.
Differential Revision: https://reviews.llvm.org/D33938
llvm-svn: 305171
Summary: UADDO has 2 result, and one must check the result no before doing any kind of combine. Without it, the transform is invalid.
Reviewers: joerg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34088
llvm-svn: 305162
Summary:
Use MemorySSA for memory dependency checking in the EarlyCSE pass at the
start of the function simplification portion of the pipeline. We rely
on the fact that GVNHoist runs just after this pass of EarlyCSE to
amortize the MemorySSA construction cost since GVNHoist uses MemorySSA
and EarlyCSE preserves it.
This is turned off by default. A follow-up change will turn it on to
allow for easier reversion in case it breaks something.
llvm-svn: 305146
We're currently passing endian-ness around as a param (and not uniformly),
so this eliminates the need for that. I'd like to add a constant fold
call too, and that requires a DL.
llvm-svn: 305129
Summary:
- Fix assertion failures on F16 to/from int types in FastISel by falling
back to regular ISel
- Add a testcase of various conversion cases with FastISel (-O0)
Reviewers: kristof.beyls, jmolloy, SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D33734
llvm-svn: 305127
Previously it was non-const reference named Result which would tend to make someone think that it was an outparam when really its an input.
llvm-svn: 305114
Currently there is a bug in SROA::presplitLoadsAndStores which causes assertion in
GEPOperator::accumulateConstantOffset.
Basically it does not consider the situation that the pointer operand of load or store
may be in a non-zero address space and its size may be different from the size of
a pointer in address space 0.
This patch fixes assertion when compiling Blender Cycles kernels for amdgpu backend.
Diffferential Revision: https://reviews.llvm.org/D33298
llvm-svn: 305107
Summary:
isSafeToSpeculativelyExecute is the wrong predicate to use here.
All that checks for is whether it is safe to hoist a value due to
unaligned/un-dereferencable accesses. However, not only are we doing
sinking rather than hoisting, our concern is that the location
we're loading from may have been modified. Instead forbid sinking
any load across a critical edge.
Reviewers: majnemer
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D33179
llvm-svn: 305102
Previously extractors tried to be stateless with any additional
context information needed in order to parse items being passed
in via the extraction method. This led to quite cumbersome
implementation challenges and awkwardness of use. This patch
brings back support for stateful extractors, making the
implementation and usage simpler.
llvm-svn: 305093
Summary: Add the WindowsResourceCOFFWriter class for producing the final COFF after all parsing is done.
Reviewers: hiraditya!, zturner, ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34020
llvm-svn: 305092
Summary:
Unless I'm mistaken, the special handling for EQ/NE should cover everything and there is no reason to fallthrough to the more complex code. For that matter I'm not sure there's any reason to special case EQ/NE other than avoiding creating temporary ConstantRanges.
This patch moves the complex code into an else so we only do it when we are handling a predicate other than EQ/NE.
Reviewers: anna, reames, resistor, Farhana
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34000
llvm-svn: 305086
Summary:
During DAG legalization loop in SelectionDAG::Legalize(),
bookkeeping of the SDNodes that were already legalized is implemented
with SmallPtrSet (LegalizedNodes). This kind of set stores only pointers
to objects, not the objects themselves. Unfortunately, if SDNode is
deleted during legalization for some reason, LegalizedNodes set is not
informed about this fact. This wouldn’t be so bad, if SelectionDAG wouldn’t reuse
space deallocated after deletion of unused nodes, for creation of new
ones. Because of this, new nodes, created during legalization often can
have pointers identical to ones that have been previously legalized,
added to the LegalizedNodes set, and deleted afterwards. This in turn
causes, that newly created nodes, sharing the same pointer as deleted
old ones, are present in LegalizedNodes *already at the moment of
creation*, so we never call Legalize on them.
The fix facilitates the fact, that DAG notifies listeners about each
modification. I have registered DAGNodeDeletedListener inside
SelectionDAG::Legalize, with a callback function that removes any
pointer of any deleted SDNode from the LegalizedNodes set. With this
modification, LegalizeNodes set does not contain pointers to nodes that
were deleted, so newly created nodes can always be inserted to it, even
if they share pointers with old deleted nodes.
Patch by pawel.szczerbuk@intel.com
The issue this patch addresses causes failures in an out-of-tree target,
and i was not able to create a reproducer for an in-tree target, hence
there is no test-case.
Reviewers: delena, spatel, RKSimon, hfinkel, davide, qcolombet
Reviewed By: delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33891
llvm-svn: 305084
By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown,
backends can request that LLVM to scalarize vector types for calls
and returns.
The MIPS vector ABI requires that vector arguments and returns are passed in
integer registers. With SelectionDAG's new hooks, the MIPS backend can now
handle LLVM-IR with vector types in calls and returns. E.g.
'call @foo(<4 x i32> %4)'.
Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for
calls and returns if vector types were not legal. If vector types were legal,
a single 128bit vector argument would be assigned to a single 32 bit / 64 bit
integer register.
By teaching the MIPS backend to inspect the original types, it can now
implement the MIPS vector ABI which requires a particular method of
scalarizing vectors.
Previously, the MIPS backend relied on clang to scalarize types such as "call
@foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3,
i32 inreg %4)".
This patch enables the MIPS backend to take either form for vector types.
The previous version of this patch had a "conditional move or jump depends on
uninitialized value".
Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur
Differential Revision: https://reviews.llvm.org/D27845
llvm-svn: 305083
Summary:
Alloca promotion pass not dealing with non-canonical input
Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical)
Also added some test cases in non-canonical form to check that it no longer crashes
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D31710
llvm-svn: 305079
This patch creates a customised machine-scheduler for ARM targets,
so that subsequently DAG mutations etc can be added.
Reviewed by: hahn, rengolin, rovka.
Differential Revision: https://reviews.llvm.org/D34039
llvm-svn: 305078
When an empty comment is present in an assembly file, the compiler will crash because it checks the first character for '\n' or '\r'.
The fix consists of also checking if the string is empty before accessing the *front* method of the StringRef.
A test is included for the x86 target, but this issue is reproducible with other targets as well.
Patch by Alexandru Guduleasa!
Reviewers: niravd, grosbach, llvm-commits
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D33993
llvm-svn: 305077