The nuw constraint will not be satisfied unless <expr> == 0.
This bug has been around since r102234 (in 2010!), but was uncovered by
r251052, which introduced more aggressive optimization of nuw scev expressions.
Differential Revision: http://reviews.llvm.org/D14850
llvm-svn: 253627
coloring pass. Turn the logic into "look for an insert point and
then move things past the insert point".
No functional change intended.
llvm-svn: 253626
This adds a new API, LTOCodeGenerator::setFileType, to choose the output file
format for LTO CodeGen. A corresponding change to use this new API from
llvm-lto and a test case is coming in a separate commit.
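A minimal usage sketch, assuming an LTOCodeGenerator instance named CodeGen (illustrative only; the real client-side change is in the follow-up commit):
  // Request native object-file output from LTO codegen
  // (CGFT_AssemblyFile would request assembly instead).
  CodeGen.setFileType(TargetMachine::CGFT_ObjectFile);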
Differential Revision: http://reviews.llvm.org/D14554
llvm-svn: 253622
Now that the register allocator knows about the barriers on funclet
entry and exit, testing has shown that this is unnecessary.
We still demote PHIs on unsplittable blocks due to the differences
between the IR CFG and the Machine CFG.
llvm-svn: 253619
The change exposed a bug in IndVarSimplify (PR25578), which led to a
failure (PR25538). When the bug is fixed, this patch can be reapplied.
The tests are kept in tree, as they're useful anyway, and will not break
with this revert.
llvm-svn: 253596
Summary: The new algorithm is more efficient (O(n), where n is the number of basic blocks), and it is guaranteed to cover all cases of multiple basic blocks mapped to the same line.
Reviewers: dblaikie, davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14738
llvm-svn: 253594
Summary:
* Rename isSmallTypeLdMerge() to isNarrowLoad().
* Rename NumSmallTypeMerged to NumNarrowTypePromoted.
* Use Subtarget defined as a member variable.
llvm-svn: 253587
We currently bail out of global localization if the global has non-instruction users. However, these are often simple bitcasts or constant GEPs, which we can easily turn into instructions before localizing. Be a bit more aggressive.
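A sketch of the idea (variable names are hypothetical, not the actual patch):
  // If a user of the global is a ConstantExpr (e.g. a bitcast or a
  // constant GEP), materialize it as a real instruction first, so that
  // localization only ever sees instruction users. Assumes all of the
  // ConstantExpr's users are instructions in a single function.
  if (auto *CE = dyn_cast<ConstantExpr>(GV->user_back())) {
    Instruction *I = CE->getAsInstruction(); // same operation, as an Instruction
    I->insertBefore(InsertPt);               // InsertPt: hypothetical insertion point
    CE->replaceAllUsesWith(I);
    CE->destroyConstant();
  }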
llvm-svn: 253584
This change extends r251438 to handle more narrow load promotions
including byte type, unscaled, and signed. For example, this change will
convert:
ldursh w1, [x0, #-2]
ldurh w2, [x0, #-4]
into:
ldur w2, [x0, #-4]
asr w1, w2, #16
and w2, w2, #0xffff
llvm-svn: 253577
This is another step towards allowing SimplifyCFG to speculate harder, but then have
CGP clean things up if the target doesn't like it.
Previous patches in this series:
http://reviews.llvm.org/D12882
http://reviews.llvm.org/D13297
D13297 should catch most expensive ops, but speculation of cttz/ctlz requires special
handling because of weirdness in the intrinsic definition for handling a zero input
(that definition can probably be blamed on x86).
For example, if we have the usual speculated-by-select expensive op pattern like this:
%tobool = icmp eq i64 %A, 0
%0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true) ; is_zero_undef == true
%cond = select i1 %tobool, i64 64, i64 %0
ret i64 %cond
There's an instcombine that will turn it into:
%0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 false) ; is_zero_undef == false
This CGP patch is looking for that case and despeculating it back into:
entry:
%tobool = icmp eq i64 %A, 0
br i1 %tobool, label %cond.end, label %cond.true
cond.true:
%0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true) ; is_zero_undef == true
br label %cond.end
cond.end:
%cond = phi i64 [ %0, %cond.true ], [ 64, %entry ]
ret i64 %cond
This unfortunately may lead to poorer codegen (see the changes in the existing x86 test),
but if we increase speculation in SimplifyCFG (the next step in this patch series), then
we should avoid those kinds of cases in the first place.
The need for this patch was originally mentioned here:
http://reviews.llvm.org/D7506
with follow-up here:
http://reviews.llvm.org/D7554
Differential Revision: http://reviews.llvm.org/D14630
llvm-svn: 253573
When dumping function samples or writing them out in text format, it
helps if the samples are emitted sorted by source location. The sorting
of the maps is a bit slow, so we only do it on demand.
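A sketch of the on-demand sort (container and type names are illustrative, not the exact ones in the patch):
  // Copy the map entries into a vector and sort by source location
  // only when we are about to print them.
  std::vector<std::pair<LineLocation, SampleRecord>> Sorted(
      Samples.begin(), Samples.end());
  std::sort(Sorted.begin(), Sorted.end(),
            [](const auto &A, const auto &B) { return A.first < B.first; });
  for (const auto &Rec : Sorted)
    OS << Rec.second; // emit in source-location order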
llvm-svn: 253568
Copying one mask register to another under BW should be done with the kmovq instruction, otherwise we can lose some bits.
Copying 8 bits under DQ may be done with kmovb.
Differential Revision: http://reviews.llvm.org/D14812
llvm-svn: 253563
The lowering patterns for X86ISD::VZEXT_MOVL for 128-bit to 256-bit vectors were just copying the lower xmm instead of actually masking off the first scalar using a blend.
Fix for PR25320.
Differential Revision: http://reviews.llvm.org/D14151
llvm-svn: 253561
Make X86AsmBackend generate smarter nops instead of a bunch of 0x90 for code alignment for CPUs which don't support long nop instructions.
Differential Revision: http://reviews.llvm.org/D14178
llvm-svn: 253557
This provides a way to force a function to have certain attributes from the command line. This can be useful when debugging or doing workload exploration, where manually editing IR is tedious or not possible (due to build systems, etc.).
The syntax is -force-attribute=function_name:attribute_name
All function attributes are parsed except alignstack, as it requires an argument.
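For example, to force a hypothetical function foo to be noinline (a sketch; assumes the consuming pass is scheduled, e.g. via -forceattrs in opt):
  opt -forceattrs -force-attribute=foo:noinline -S input.ll -o output.ll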
llvm-svn: 253550
The masked intrinsics support all integer and floating point data types. I added the pointer type to this list.
Added tests for CodeGen and for Loop Vectorizer.
Updated the Language Reference.
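A minimal IRBuilder sketch (Builder, Ptr, Mask and PassThru are assumed to exist; the alignment value is arbitrary):
  // Masked load producing a vector of pointers.
  // Ptr : <8 x i32*>*, Mask : <8 x i1>, PassThru : <8 x i32*>
  Value *V = Builder.CreateMaskedLoad(Ptr, /*Align=*/8, Mask, PassThru, "mload");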
Differential Revision: http://reviews.llvm.org/D14150
llvm-svn: 253544
The LLVMContext was only used for Diagnostic. Pass a DiagnosticHandler
instead.
Differential Revision: http://reviews.llvm.org/D14794
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 253540
Optimizations like LoadPRE in GVN will insert new instructions.
If the insertion point is in an already-processed BB, the new
instructions should get a value number explicitly; if the insertion point
is after the current instruction, they can simply be left for the normal
walk. However, the current GVN framework has no support for this.
In this patch, we just bail out if a VN can't be found.
Differential Revision: http://reviews.llvm.org/D14670
llvm-svn: 253536
This bug would manifest in some very specific cases where all the following
conditions are fulfilled:
- GVN didn't remove any block
- The regular GVN iteration didn't change the IR
- PRE is enabled
- PRE will not split critical edge
- The last instruction processed by PRE didn't change the IR
Under these conditions, GVN would return an incorrect modified-IR flag.
Because the CallGraph PassManager relies on this returned value to decide
if it needs to recompute a node after the execution of Function passes,
not returning the right value can lead to unexpected results.
Fix for: https://llvm.org/bugs/show_bug.cgi?id=24715
Patch by Wenxiang Qiu <vincentqiuuu@gmail.com>
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 253518
Note: this was reviewed, and more details are available, at http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
These intrinsics currently have an explicit alignment argument which is
required to be a constant integer. It represents the alignment of the
source and dest, and so must be the minimum of those.
This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments. The alignment
argument itself is removed.
There are a few places in the code which need to be checked by an expert
as to whether using separate src/dest alignments is safe. For those
places, the code currently takes the minimum of the src/dest alignments,
which matches the previous behaviour.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
For out-of-tree owners, I was able to strip alignment from calls using sed by replacing:
(call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
$1i1 false)
and similarly for memmove and memcpy.
I then added back in alignment to test cases which needed it.
A similar commit will be made to clang, which has many actual alignment
differences now that IRBuilder can generate different source/dest alignments on calls.
In IRBuilder itself, a new argument was added. Instead of calling:
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call:
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool. This prevents old call sites from silently compiling with
their isVolatile argument interpreted as the source alignment.
Note: future changes can now be made to codegen. I didn't change anything here, but this
change should enable better memcpy code sequences.
Reviewed by Hal Finkel.
llvm-svn: 253511
This patch adds support for vector constant folding of integer/float comparisons.
This requires FoldConstantVectorArithmetic to support scalar constant operands (in this case ISD::CONDCODE). In the future, we should be able to support other scalar constant types as necessary (and possibly start calling FoldConstantVectorArithmetic for all node creations).
Differential Revision: http://reviews.llvm.org/D14683
llvm-svn: 253504
It turns out we decide whether to use SjLj exceptions or some alternative in
two separate places in the backend, and they disagreed with each other. This
led to inconsistent code and is generally a terrible idea.
So make them consistent and add an assert that they *do* match (unfortunately
MCAsmInfo isn't available in opt, so it can't be used to initialise the CodeGen
version directly).
llvm-svn: 253502
This change introduces an instrumentation intrinsic instruction for
value profiling purposes, the lowering of the instrumentation intrinsic
and raw reader updates. The raw profile data files for llvm-profdata
testing are updated.
llvm-svn: 253484
This patch adds a cost estimate for some missing sign and zero extensions. The
costs were determined by counting the number of shift instructions generated
without context for each new extension.
Differential Revision: http://reviews.llvm.org/D14730
llvm-svn: 253482
This also takes the push/pop syntax another step forward, introducing stack
slot numbers to make it easier to see how expressions are connected. For
example, the value pushed in $push7 is popped in $pop7.
And, this begins an experiment with making get_local and set_local implicit
when an operation directly uses or defines a register. This greatly reduces
clutter. If this experiment succeeds, it may make sense to do this for
const instructions as well.
And, this introduces more special code for ARGUMENTS; hopefully this code
will soon be obviated by proper support for live-in virtual registers.
llvm-svn: 253465
Starting on an input stream that is not at offset 0 would trigger the
assert in WinCOFFObjectWriter.cpp:1065:
assert(getStream().tell() <= (*i)->Header.PointerToRawData &&
"Section::PointerToRawData is insane!");
llvm-svn: 253464
The virtual register containing the address of a value returned on the
stack should be represented in the DAG with a CopyFromReg node, not a
Register node. Otherwise, InstrEmitter will not make sure that it
ends up in the right register class for the target instruction.
SystemZ needs this, because the reg class for address registers is a
subset of the general 64-bit register class.
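A sketch of the DAG node in question (variable names are illustrative):
  // Materialize the address from its virtual register with CopyFromReg,
  // so InstrEmitter constrains it to the address register class.
  SDValue RetAddr = DAG.getCopyFromReg(Chain, DL, VReg, PtrVT);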
test/CodeGen/SystemZ/args-07.ll and args-04.ll updated to run with
-verify-machineinstrs.
Reviewed by Hal Finkel.
llvm-svn: 253461
We used to have an odd difference between MapVector and SetVector. The map
used a DenseMap, but the set used a SmallSet, which in turn uses a
std::set.
I have changed SetVector to use a DenseSet. If you were depending on the
old behaviour, you can pass an explicit set type or use SmallSetVector
(see the sketch after the list below).
The common cases for needing to do it are:
* Optimizing for small sets.
* Sets for types not supported by DenseSet.
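For example (a small sketch; DenseMapInfo must exist for the element type in the default case):
  #include "llvm/ADT/SetVector.h"
  #include "llvm/ADT/SmallSet.h"
  #include <vector>

  SetVector<int> SV;                                             // now backed by DenseSet<int>
  SetVector<int, std::vector<int>, SmallSet<int, 8>> ExplicitSV; // old-style small set
  SmallSetVector<int, 8> SSV;                                    // convenience wrapper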
llvm-svn: 253439
Summary:
This change teaches LLVM's inliner to track and suitably adjust
deoptimization state (tracked via deoptimization operand bundles) as it
inlines through call sites. The operation is described in more detail
in the LangRef changes.
Reviewers: reames, majnemer, chandlerc, dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14552
llvm-svn: 253438
If a section is rw, it is irrelevant if the dynamic linker will write to
it or not.
It looks like llvm implemented this because gcc was doing it. It looks
like gcc implemented this in the hope that it would put all the
relocated items close together and speed up the dynamic linker.
There are two problems with this:
* It doesn't work. Both bfd and gold will map .data.rel to .data and
concatenate the input sections in the order they are seen.
* If we want a feature like that, it can be implemented directly in the
linker, since it knows where the dynamic relocations are.
llvm-svn: 253436
When looking for the best successor from the outer loop for a block
belonging to an inner loop, the edge probability computation can be
improved so that edges in the inner loop are ignored. For example,
suppose we are building chains for the non-loop part of the following
code, and looking for B1's best successor. Assume the true body is very
hot, then B3 should be the best candidate. However, because of the
existence of the back edge from B1 to B0, the probability from B1 to B3
can be very small, preventing B3 from being chosen as its successor. In this patch, when
computing the probability of the edge from B1 to B3, the weight on the
back edge B1->B0 is ignored, so that B1->B3 will have 100% probability.
if (...)
  do {
    B0;
    ... // some branches
    B1;
  } while(...);
else
  B2;
B3;
Differential Revision: http://reviews.llvm.org/D10825
llvm-svn: 253414
While still allowing CodeGen/AsmPrinter in llvm to own them using a bump
ptr allocator. (might be nice to replace the pointers there with
something that at least automatically calls their dtors, if that's
necessary/useful, rather than having it done explicitly (I think a typed
BumpPtrAllocator already does this, or maybe a unique_ptr with a custom
deleter, etc))
llvm-svn: 253409
The logic for handling the pattern without a shift is identical
to the logic for handling the pattern with a shift if you set
the shift amount to zero for the former.
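A one-line sketch of the unification (ShiftC is a hypothetical optional shift-amount constant):
  // A missing shift is just a shift by zero.
  uint64_t ShAmt = ShiftC ? ShiftC->getZExtValue() : 0;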
This should make it easier to see that we probably don't even need
optimizeIntToFloatBitCast().
If we call something like foldVecTruncToExtElt() from visitTrunc(),
we'll solve PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543
llvm-svn: 253403
Summary:
This change tries to make the root cause of instrumented profile data merge failures clearer.
Previous:
$ llvm-profdata merge test_0.profraw test_1.profraw -o test_merged.profdata
test_1.profraw: foo: Function count mismatch
test_1.profraw: bar: Function count mismatch
test_1.profraw: baz: Function count mismatch
...
Changed:
$ llvm-profdata merge test_0.profraw test_1.profraw -o test_merged.profdata
test_1.profraw: foo: Function basic block count change detected (counter mismatch)
Make sure that all profile data to be merged is generated from the same binary.
test_1.profraw: bar: Function basic block count change detected (counter mismatch)
test_1.profraw: baz: Function basic block count change detected (counter mismatch)
...
Reviewers: dnovillo, davidxl, bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14739
llvm-svn: 253384
Summary:
Now that there is a one-to-one mapping from MachineFunction to
WinEHFuncInfo, we don't need to use a DenseMap to select the right
WinEHFuncInfo for the current funclet.
The main challenge here is that X86WinEHStatePass is an IR pass that
doesn't have access to the MachineFunction. I gave it its own
WinEHFuncInfo object that it uses to calculate state numbers, which it
then throws away. As long as nobody creates or removes EH pads between
this pass and SDAG construction, we will get the same state numbers.
The other thing X86WinEHStatePass does is to mark the EH registration
node. Instead of communicating which alloca was the registration through
WinEHFuncInfo, I added the llvm.x86.seh.ehregnode intrinsic. This
intrinsic generates no code and simply marks the alloca in use.
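A sketch of how such a marker might be emitted (IRBuilder-style; the exact emission code in the pass may differ):
  // Declare and call the no-op marker intrinsic on the registration alloca.
  Function *Marker =
      Intrinsic::getDeclaration(TheModule, Intrinsic::x86_seh_ehregnode);
  Builder.CreateCall(Marker, {RegNodeAlloca});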
Reviewers: JCTremoulet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14668
llvm-svn: 253378
The instruction combiner previously removed types from filter clauses in Landing Pad instructions if the type had previously been seen in a catch clause. This is incorrect and prevents unexpected exception handlers from rethrowing the caught type.
Differential Revision: http://reviews.llvm.org/D14669
llvm-svn: 253370
When resolving R_PPC64_REL24, code used to check for an address delta
that fits in 24 bits, while the instructions that take this relocation
actually can process address deltas that fit into *26* bits (as those
instructions have a 24 bit field, but implicitly append two zero bits
at the end since all instruction addresses are a multiple of 4).
This means that code would signal overflow once a single object's text
section exceeds 8 MB, while we can actually support up to 32 MB.
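The corrected check amounts to something like (a sketch, not the exact resolver code):
  // The instruction stores (Delta >> 2) in a 24-bit signed field, so the
  // reachable displacement is a signed 26-bit value.
  if (!isInt<26>(Delta))
    report_fatal_error("R_PPC64_REL24 relocation out of range");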
Partially fixes PR25540.
llvm-svn: 253369
This patch removes the std::string& argument from a number of C++ LTO API calls
and instead makes them use the installed diagnostic handler. This also
improves the consistency of the diagnostic handling infrastructure: if an LTO client used
lto_codegen_set_diagnostic_handler() to install a custom error handler, we do
not want some error messages to go through the custom error handler, and some
other error messages to go into sLastErrorString.
llvm-svn: 253367
While setting function attributes, we check all instructions that may access memory. For a call instruction, we check all arguments; a special check is required for pointer arguments.
I added vector-of-pointers to the call argument types that should be checked.
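A sketch of the extended check (the helper name is hypothetical):
  // Treat a vector of pointers the same as a pointer argument when
  // scanning call operands.
  if (auto *Call = dyn_cast<CallInst>(&I))
    for (Value *Arg : Call->arg_operands())
      if (Arg->getType()->isPtrOrPtrVectorTy())
        checkPointerArgument(Arg); // hypothetical helper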
Differential Revision: http://reviews.llvm.org/D14693
llvm-svn: 253363
The underlying issues surrounding codegen for 32-bit vselects have been resolved. The pessimistic costs for 64-bit vselects remain due to the bad
scalarization that is still happening there.
I tested this on A57 in T32, A32 and A64 modes. I saw no regressions, and some improvements.
From my benchmarks, I saw these improvements in A57 (T32)
spec.cpu2000.ref.177_mesa 5.95%
lnt.SingleSource/Benchmarks/Shootout/strcat 12.93%
lnt.MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 11.89%
I also measured A57 A32, A53 T32 and A9 T32 and found no performance regressions. I see much bigger wins in third-party benchmarks with this change.
Differential Revision: http://reviews.llvm.org/D14743
llvm-svn: 253349
Summary:
This patch changes the behavior of path::system_temp_directory() on Windows to be closer to the GetTempPath Windows API call. It enforces the native path separator, makes the path absolute, etc. GetTempPath is not used directly because of limitations/implementation bugs on Windows 7.
Windows-specific unit tests are added. Most of them run in a separate process with modified environment variables.
This change fixes the FileSystemTest.CreateDir unit test, which had been failing when run from a Unix-like shell on Windows (a Unix-like path separator (/) was used in environment variables).
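Usage of the affected API, for reference:
  // TmpDir receives an absolute path using the native separator.
  SmallString<128> TmpDir;
  sys::path::system_temp_directory(/*ErasedOnReboot=*/true, TmpDir);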
Reviewers: chapuni, rafael, aaron.ballman
Subscribers: rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D14231
llvm-svn: 253345
SELECT_CC has the nasty property of having operands with unrelated
types. So if you do something like:
f32 = select_cc f16, f16, f32, f32, cc
You'd only look for the action for <select_cc, f32>, but never f16.
If the types are all legal, but the op isn't (as for f16 on AArch64,
or for f128 on x86_64/AArch64?), then you get into trouble.
For f128, we have softenSetCCOperands to handle this case.
Similarly, for f16, we can directly promote the CC operands.
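A sketch of the promotion (illustrative; DL is the debug location):
  // Widen the f16 compare operands so the <select_cc, f32> action query
  // covers only legal, supported operand types.
  LHS = DAG.getNode(ISD::FP_EXTEND, DL, MVT::f32, LHS);
  RHS = DAG.getNode(ISD::FP_EXTEND, DL, MVT::f32, RHS);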
llvm-svn: 253344