Summary:
This is the attribute purpose-made for e.g. __syncthreads. It appears
that NoDuplicate may not be sufficient to prevent Sink from touching a
call to __syncthreads.
Reviewers: jingyue, hfinkel
Subscribers: llvm-commits, jholewinski, jhen, rnk, tra, majnemer
Differential Revision: http://reviews.llvm.org/D16941
llvm-svn: 260005
First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement.
This should also make it easier to decode X86ISD::VZEXT target shuffles in the future.
llvm-svn: 259995
Summary:
Adds the linkage type to both the per-module and combined function
summaries, which subsumes the current islocal bit. This will eventually
be used to optimized linkage types based on global summary-based
analysis.
Reviewers: joker.eph
Subscribers: joker.eph, davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D16943
llvm-svn: 259993
Summary:
When alias analysis is uncertain about the aliasing between any two accesses,
it will return MayAlias. This uncertainty from alias analysis restricts LICM
from proceeding further. In cases where alias analysis is uncertain we might
use loop versioning as an alternative.
Loop Versioning will create a version of the loop with aggressive aliasing
assumptions in addition to the original with conservative (default) aliasing
assumptions. The version of the loop making aggressive aliasing assumptions
will have all the memory accesses marked as no-alias. These two versions of
loop will be preceded by a memory runtime check. This runtime check consists
of bound checks for all unique memory accessed in loop, and it ensures the
lack of memory aliasing. The result of the runtime check determines which of
the loop versions is executed: If the runtime check detects any memory
aliasing, then the original loop is executed. Otherwise, the version with
aggressive aliasing assumptions is used.
The pass is off by default and can be enabled with command line option
-enable-loop-versioning-licm.
Reviewers: hfinkel, anemet, chatur01, reames
Subscribers: MatzeB, grosser, joker.eph, sanjoy, javed.absar, sbaranga,
llvm-commits
Differential Revision: http://reviews.llvm.org/D9151
llvm-svn: 259986
There is a legitimate use-case in clang where we need to replace a
temporary placeholder node with the temporary node that may be a
forward declaration.
<rdar://problem/24493203>
llvm-svn: 259973
a significant fraction of the file size (for files that otherwise have few
records). Also include an average size per record in the summary information.
llvm-svn: 259965
check wrong when inheriting a member through two levels of private inheritance,
where the middle one is a class template specialization.
llvm-svn: 259943
-fsized-deallocation. Disable sized deallocation for all objects derived from
TrailingObjects, as we expect the storage allocated for these objects to be
larger than the size of their dynamic type.
llvm-svn: 259942
In r252595, I inadvertently changed the condition to "Cost <= Threshold",
which caused a significant size regression in Chrome. This commit rectifies
that.
llvm-svn: 259915
Summary:
This makes it possible to specify some operands as optional to the AsmMatcher.
Setting this field to true will prevent the AsmMatcher from emitting
'too few operands' errors when there are missing optional operands.
Reviewers: olista01, ab
Subscribers: nhaustov, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15755
llvm-svn: 259913
The current situation isn't great, because the amount of padding
requires is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.
Another problem is the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.
llvm-svn: 259912
LiveRangeEdit::eliminateDeadDef is used to remove dead define instructions
after rematerialization. To remove a VNI for a vreg from its LiveInterval,
LiveIntervals::removeVRegDefAt is used. However, after non-PHI VNIs are all
removed, PHI VNI are still left in the LiveInterval. Such unused vregs will
be kept in RegsToSpill[] at the end of InlineSpiller::reMaterializeAll and
spiller will allocate stackslot for them.
The fix is to get rid of unused reg by checking whether it has non-dbg
reference instead of whether it has non-empty interval.
llvm-svn: 259895
This makes it less likely to clash with other stuff that might be linked
in by change, e.g. ncurses exposes an external function called simply
"echo", so linking ncurses statically into the binary explodes in funny
ways.
llvm-svn: 259882
In r255133 (reapplied r253126) we started to avoid redundant
recomputation of LCSSA after loop-unrolling. This patch moves one step
further in this direction - now we can avoid it for much wider range of
loops, as we start to look at IR and try to figure out if the
transformation actually breaks LCSSA phis or makes it necessary to
insert new ones.
Differential Revision: http://reviews.llvm.org/D16838
llvm-svn: 259869
CodeView, like most other debug formats, represents the live range of a
variable so that debuggers might print them out.
They use a variety of records to represent how a particular variable
might be available (in a register, in a frame pointer, etc.) along with
a set of ranges where this debug information is relevant.
However, the format only allows us to use ranges which are limited to a
maximum of 0xF000 in size. This means that we need to split our debug
information into chunks of 0xF000.
Because the layout of code is not known until *very* late, we must use a
new fragment to record the information we need until we can know
*exactly* what the range is.
llvm-svn: 259868
Summary:
Passing the rematerialized values map to insertRematerializationStores by
value looks to be a simple oversight; update it to pass by reference.
Reviewers: reames, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16911
llvm-svn: 259867
Summary: This diff increase the tested surface of the C API.
Reviewers: bogner, chandlerc, echristo, dblaikie, joker.eph, Wallbraker
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16910
llvm-svn: 259863
Only single and double FP immediates are correctly printed by
MachineInstr::print() during debug output. Half float type goes to
APFloat::convertToDouble() and hits assertion it is not a double
semantics. This diff prints half machine operands correctly.
This cannot currently be hit by any in-tree target.
Patch by Stanislav Mekhanoshin
llvm-svn: 259857
We don't currently have many tests that deal with operations on multiple
local MemoryLocations. This new test helps out a bit in that regard.
llvm-svn: 259854
Summary: As per title. It is required and don't get linked in in some builds.
Reviewers: chapuni, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16903
llvm-svn: 259853
Summary computation is not just for instrumented profiling and so I have moved
the ProfileSummary class to ProfileCommon.h (named so to allow code unrelated
to summary but common to instrumented and sampled profiling to be placed there)
Differential Revision: http://reviews.llvm.org/D16661
llvm-svn: 259846
Summary:
This basically add an echo test case in C. The support is limited right now, but full support would just be too much to review at once.
The echo test case simply get a module as input and try to output the same exact module. This allow to check the both reading and writing API are working as expected.
I want to improve this test over time to support more and more of the API, in order to improve coverage (coverage is quite poor right now).
Test Plan: Run the test.
Reviewers: chandlerc, bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10725
llvm-svn: 259844
Using the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.
llvm-svn: 259840
This only impacts the creation of pre-/post-index instructions. The bound was
set high enough such that it did not change code generation for SPEC200X.
llvm-svn: 259828
This is the right location for platform-specific files.
On some distributions (e. g. Exherbo), a package can be installed for several
architectures in parallel, but the architecture-independent files are shared.
Therefore, we must not install architecture-dependent files (like the CMake
config and export files) to share/.
llvm-svn: 259821
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.
This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.....
llvm-svn: 259816
When SCEV expansion tries to reuse an existing value, it is needed to ensure
that using the Value at the InsertPt will not break LCSSA. The fix adds a
check that InsertPt is either inside the candidate Value's parent loop, or
the candidate Value's parent loop is nullptr.
llvm-svn: 259815
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
This is a reapplication of r246769 and r259790. The tramp3d failure was caused
by an incorrect refactoring in the patch. Specifically, we weren't always
properly clearing the SExtIdx flag.
llvm-svn: 259812
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:
(mul (sext i32), (sext i32))
However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:
(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.
Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.
Patch by Chris Diamand!
llvm-svn: 259800
Fix the lit bug that enabled this "feature" (empty triple is substring
of all possible target triples) and change the two outliers to use the
documented * syntax.
llvm-svn: 259799
This patch corresponds to review:
http://reviews.llvm.org/D16847
There are some files in glibc that use the output operand modifier even though
it was deprecated in GCC. This patch just adds support for it to prevent issues
with such files.
llvm-svn: 259798
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.
Differential Revision: http://reviews.llvm.org/D16729
llvm-svn: 259796
This patch implements softening of long double type (ppcf128) on ppc32
architecture and enables operations for this type for soft float.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D15811
llvm-svn: 259791
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure. I'm unable to reproduce the issue at this time.
llvm-svn: 259790
Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time.
Differential Revision: http://reviews.llvm.org/D16404
llvm-svn: 259770
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.
This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.
The original commit triggered regressions in Polly tests. The regressions
exposed two problems which have been fixed in current version.
1. Polly will generate a new function based on the old one. To generate an
instruction for the new function, it builds SCEV for the old instruction,
applies some tranformation on the SCEV generated, then expands the transformed
SCEV and insert the expanded value into new function. Because SCEV expansion
may reuse value cached in ExprValueMap, the value in old function may be
inserted into new function, which is wrong.
In SCEVExpander::expand, there is a logic to check the cached value to
be used should dominate the insertion point. However, for the above
case, the check always passes. That is because the insertion point is
in a new function, which is unreachable from the old function. However
for unreachable node, DominatorTreeBase::dominates thinks it will be
dominated by any other node.
The fix is to simply add a check that the cached value to be used in
expansion should be in the same function as the insertion point instruction.
2. When the SCEV is of scConstant type, expanding it directly is cheaper than
reusing a normal value cached. Although in the cached value set in ExprValueMap,
there is a Constant type value, but it is not easy to find it out -- the cached
Value set is not sorted according to the potential cost. Existing reuse logic
in SCEVExpander::expand simply chooses the first legal element from the cached
value set.
The fix is that when the SCEV is of scConstant type, don't try the reuse
logic. simply expand it.
Differential Revision: http://reviews.llvm.org/D12090
llvm-svn: 259736
D16251)
Summary:
This is a simpler fix to the problem than the dominator approach in
http://reviews.llvm.org/D16251. It adds only values into the gather() while loop
that have been seen before.
The actual endless loop is in the constant compare gather() routine in
Utils/SimplifyCFG.cpp. The same value ret.0.off0.i is pushed back into the
queue:
%.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i
Here is what happens at the IR level:
for.cond.i: ; preds = %if.end6.i,
%if.end.i54
%ix.0.i = phi i32 [ 0, %if.end.i54 ], [ %inc.i55, %if.end6.i ]
%ret.0.off0.i = phi i1 [false, %if.end.i54], [%.ret.0.off0.i, %if.end6.i] <<<
%cmp2.i = icmp ult i32 %ix.0.i, %11
br i1 %cmp2.i, label %for.body.i, label %LBJ_TmpSimpleNeedExt.exit
if.end6.i: ; preds = %for.body.i
%cmp10.i = icmp ugt i32 %conv.i, %add9.i
%.ret.0.off0.i = or i1 %ret.0.off0.i, %cmp10.i <<<
When if.end.i54 gets eliminated which removes the definition of ret.0.off0.i.
The result is the expression %.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i
(Note the first ‘or’ operand is now %.ret.0.off0.i, and *NOT* %ret.0.off0.i).
And
now there is use of .ret.0.off0.i before a definition which triggers the
“endless” loop in gather():
while(!DFT.empty()) {
V = DFT.pop_back_val(); // V is .ret.0.off0.i
if (Instruction *I = dyn_cast<Instruction>(V)) {
// If it is a || (or && depending on isEQ), process the operands.
if (I->getOpcode() == (isEQ ? Instruction::Or : Instruction::And)) {
DFT.push_back(I->getOperand(1)); // This is now .ret.0.off0.i also
DFT.push_back(I->getOperand(0));
continue; // “endless loop” for .ret.0.off0.i
}
Reviewers: reames, ahatanak
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16839
llvm-svn: 259730
Unfortunately, ProgramInfo::ProcessId is signed on Unix and unsigned on
Windows, breaking the standard fix of using '0U' in the gtest
expectation.
llvm-svn: 259704
Summary:
Fixed pointers to go on the right hand side following coding guidelines. NFC.
Patch by Mandeep Singh Grang.
Reviewers: majnemer, arsenm, sanjoy
Differential Revision: http://reviews.llvm.org/D16866
llvm-svn: 259703
Bail out if we have a PHI on an EHPad that gets a value from a
CatchSwitchInst. Because the CatchSwitchInst cannot be split, there is
no good place to stick any instructions.
This fixes PR26373.
llvm-svn: 259702
This is mostly about having shorter lines and standardizing on one
interface, but it also avoids some needless indirection.
No functional change.
llvm-svn: 259697
Summary:
In r257979, I added code to ensure that we wouldn't merge DebugLocEntries if
the pieces they describe overlap. Unfortunately, I failed to cover the case,
where there may have multiple active Expressions in the entry, in which case we
need to make sure that no two values overlap before we can perform the merge.
This fixed PR26148.
Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D16742
llvm-svn: 259696
The IR/Value class had a linkage issue present when LLVM was built
as a library, and the LLVM library build time had different settings
for NDEBUG than the client of the LLVM library. Clients could get
into a state where the LLVM lib expected
Value::assertModuleIsMaterialized() to be inline-defined in the header
but clients expected that method to be defined in the LLVM library.
See this llvm-commits thread for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160201/329667.html
llvm-svn: 259695
This patch consists of two parts: a performance fix in DAGCombiner.cpp
and a correctness fix in SelectionDAG.cpp.
The test case tests the bug that's uncovered by the performance fix, and
fixed by the correctness fix.
The performance fix keeps the containers required by the
hasPredecessorHelper (which is a lazy DFS) and reuse them. Since
hasPredecessorHelper is called in a loop, the overall efficiency reduced
from O(n^2) to O(n), where n is the number of SDNodes.
The correctness fix keeps iterating the neighbor list even if it's time
to early return. It will return after finishing adding all neighbors to
Worklist, so that no neighbors are discarded due to the original early
return.
llvm-svn: 259691
Summary:
This patch adds a reserve call to an expensive function
(`llvm::LoadIntrinsics`), and may fix a few other low hanging
performance fruit (I've put them in comments for now, so we can
discuss).
**Motivation:**
As I'm sure other developers do, when I build LLVM, I build the entire
project with the same config (`Debug`, `MinSizeRel`, `Release`, or
`RelWithDebInfo`). However, the `Debug` config also builds llvm-tblgen
in `Debug` mode. Later build steps that run llvm-tblgen then can
actually be the slowest steps in the entire build. Nobody likes slow
builds.
Reviewers: rnk, dblaikie
Differential Revision: http://reviews.llvm.org/D16832
Patch by Alexander G. Riccio
llvm-svn: 259683
Add support for TLS access for Windows on ARM. This generates a similar access
to MSVC for ARM.
The changes to the tablegen data is needed to support loading an external symbol
global that is not for a call. The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.
llvm-svn: 259676
According to git bisect, this is the root cause of a miscompile for Regex in
libLLVMSupport. I am still working on reducing a test case.
The actual bug may be elsewhere and this commit just exposed it.
Anyway, at the moment, to reproduce, follow these steps:
1. Build clang and libLTO in release mode.
2. Create a new build directory <stage2> and cd into it.
3. Use clang and libLTO from #1 to build llvm-extract in Release mode + asserts
using -O2 -flto
4. Run llvm-extract -ralias '.*bar' -S test/Other/extract-alias.ll
Result:
program doesn't contain global named '.*bar'!
Expected result:
@a0a0bar = alias void ()* @bar
@a0bar = alias void ()* @bar
declare void @bar()
Note: In step #3, if you don't use lto or asserts, the miscompile disappears.
llvm-svn: 259674
Recommited, after some fixing with test cases.
Updated test cases:
test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
test/CodeGen/AArch64/tailcall_misched_graph.ll
Temporarily disabled test cases:
test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll
test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated)
test/CodeGen/PowerPC/vsx-fma-m.ll
test/CodeGen/PowerPC/vsx-fma-sp.ll
http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.
llvm-svn: 259673
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.
This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.
Differential Revision: http://reviews.llvm.org/D12090
llvm-svn: 259662
The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores
which happens to be a lot faster than __divsi3/__modsi3 when the core
has hardware divide instructions. Do the same here.
Fixes PR26450.
llvm-svn: 259657
This regresses a test in LoopVectorize, so I'll need to go away and think about how to solve this in a way that isn't broken.
From the writeup in PR26071:
What's happening is that ComputeKnownZeroes is telling us that all bits except the LSB are zero. We're then deciding that only the LSB needs to be demanded from the icmp's inputs.
This is where we're wrong - we're assuming that after simplification the bits that were known zero will continue to be known zero. But they're not - during trivialization the upper bits get changed (because an XOR isn't shrunk), so the icmp fails.
The fault is in demandedbits - its contract does clearly state that a non-demanded bit may either be zero or one.
llvm-svn: 259649
MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL
flag. See Figure 4–7 on page 69 in the following document:
ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf.
Differential Revision: http://reviews.llvm.org/D15740
llvm-svn: 259641
Summary:
This adds a new attribute which targets can set in TableGen which causes a function to be generated which matches register alternative names. This is very similar to `ShouldEmitMatchRegisterName`, except it works on alt names.
This patch is currently used by the out of tree part of the AVR backend. It reduces code duplication greatly, and has the effect that you do not need to hardcode altname to register mappings in C++.
It will not work on targets which have registers which share the same aliases.
Reviewers: stoklund, arsenm, dsanders, hfinkel, vkalintiris
Subscribers: hfinkel, dylanmckay, llvm-commits
Differential Revision: http://reviews.llvm.org/D16312
llvm-svn: 259636
Follow up to D16217 and D16729
This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD
Differential Revision: http://reviews.llvm.org/D16768
llvm-svn: 259635
With this patch, the profile summary data will be available in indexed
profile data file so that profiler reader/compiler optimizer can start
to make use of.
Differential Revision: http://reviews.llvm.org/D16258
llvm-svn: 259626
The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms
on PPC.
%vreg6<def> = COPY %vreg96
%vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7
;v6 = v6 + v5 * v7
is replaced by
%vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96
;v5 = v5 * v7 + v96
This was broken in the case where the target register was also used as a
multiplicand. Fix this case by checking for it and replacing both uses
with the copied register.
%vreg6<def> = COPY %vreg96
%vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6
;v6 = v6 + v5 * v6
is replaced by
%vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96
;v5 = v5 * v96 + v96
llvm-svn: 259617
The register coalescer can rematerialize constants that define
more of a register than the copy it is going to replace was going
to do.
This is valid in the case the register was undef before the
copy happened.
This patch makes sure that all the subranges defined by the new
rematerialization instructions have at least a dead def.
Review: http://reviews.llvm.org/D16693
llvm-svn: 259614
Summary:
LoopVersioning is a transform utility that transform passes can use to
run-time disambiguate may-aliasing accesses. I'd like to also expose as
pass to allow it to be unit-tested.
I am planning to add support for non-aliasing annotation in
LoopVersioning and I'd like to be able to write tests directly using
this pass.
(After that feature is done, the pass could also be used to look for
optimization opportunities that are hidden behind incomplete alias
information at compile time.)
The pass drives LoopVersioning in its default way which is to fully
disambiguate may-aliasing accesses no matter how many checks are
required.
Reviewers: hfinkel, ashutosh.nema, sbaranga
Subscribers: zzheng, mssimpso, llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D16612
llvm-svn: 259610
Please see include/llvm/Transforms/Utils/MemorySSA.h for a description
of MemorySSA, and what it does.
Differential Revision: http://reviews.llvm.org/D7864
llvm-svn: 259595
Due to staleness in a patch I committed yesterday, the debug output was reporting overdefined cases as being undefined. Confusing to say the least. The mistake appears to have only effected the debug output thankfully.
llvm-svn: 259594
I introduced a declaration in 259583 to keep the diff readable. This change just moves the definition up to remove the declaration again.
llvm-svn: 259585
This patch uses the newly introduced 'intersect' utility (from 259461: [LVI] Introduce an intersect operation on lattice values) to simplify existing code in LVI.
While not introducing any new concepts, this change is probably not NFC. The common 'intersect' function is more powerful that the ad-hoc implementations we'd had in a couple of places. Given that, we may see optimizations triggering a bit more often.
llvm-svn: 259583
This didn't affect X86_64, which is the only client of this code at the moment,
as stubs and pointers are both 8-bytes there. It will affect other platforms
though.
llvm-svn: 259575
If we can't assume the pointer value isn't within the bounds
of the object, it seems risky to try to replace the pointer
calculations.
llvm-svn: 259573
CodeView requires us to accurately describe the extent of the inlined
code. We did this by grabbing the next debug location in source order
and using *that* to denote where we stopped inlining. However, this is
not sufficient or correct in instances where there is no next debug
location or the next debug location belongs to the start of another
function.
To get this correct, use the end symbol of the function to denote the
last possible place the inlining could have stopped at.
llvm-svn: 259548
When promoting allocas to LDS, we know we are indexing
into a specific area just created, and the calculation
will also never overflow.
Also emit some of the muls as nsw nuw, because instcombine
infers this already from the range metadata. I think
putting this on the other adds and muls might be OK too,
but I'm not 100% sure.
llvm-svn: 259545
This directive emits the binary annotations that describe line and code
deltas in inlined call sites. Single-stepping through inlined frames in
windbg now works.
llvm-svn: 259535
Summary:
Enables eip-based addressing, e.g.,
lea constant(%eip), %rax
lea constant(%eip), %eax
in MC, (used for the x32 ABI). EIP-base addressing is also valid in x86_64,
it is left enabled for that architecture as well.
Patch by João Porto
Differential Revision: http://reviews.llvm.org/D16581
llvm-svn: 259528
Re-commit of r258951 after fixing layering violation.
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.
There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.
llvm-svn: 259498
description and changed the regression test accordingly.
The default configuration of a Cortex-R7 is to implement the
VFPv3-D16 architecture and the feature line as it was is too
restrictive.
llvm-svn: 259480
When rematerializing a computation by replacing the copy, use the copy's
location. The location of the copy is more representative of the
original program.
This partially fixes PR10003.
llvm-svn: 259469
differentiate between indirect references to functions an direct calls.
This doesn't do a whole lot yet other than change the print out produced
by the analysis, but it lays the groundwork for a very major change I'm
working on next: teaching the call graph to actually be a call graph,
modeling *both* the indirect reference graph and the call graph
simultaneously. More details on that in the next patch though.
The rest of this is essentially a bunch of over-engineering that won't
be interesting until the next patch. But this also isolates essentially
all of the churn necessary to introduce the edge abstraction from the
very important behavior change necessary in order to separately model
the two graphs. So it should make review of the subsequent patch a bit
easier at the cost of making this patch seem poorly motivated. ;]
Differential Revision: http://reviews.llvm.org/D16038
llvm-svn: 259463
LVI has several separate sources of facts - edge local conditions, recursive queries, assumes, and control independent value facts - which all apply to the same value at the same location. The existing implementation was very conservative about exploiting all of these facts at once.
This change introduces an "intersect" function specifically to abstract the action of picking a good set of facts from all of the separate facts given. At the moment, this function is relatively simple (i.e. mostly just reuses the bits which were already there), but even the minor additions reveal the inherent power. For example, JumpThreading is now capable of doing an inductive proof that a particular value is always positive and removing a half range check.
I'm currently only using the new intersect function in one place. If folks are happy with the direction of the work, I plan on making a series of small changes without review to replace mergeIn with intersect at all the appropriate places.
Differential Revision: http://reviews.llvm.org/D14476
llvm-svn: 259461
Fix a crash in `getMemOpBaseRegImmOfs` that happens if the base of
`MemOp` is a frame index memory operand. The fix is to have
`getMemOpBaseRegImmOfs` bail out in such cases. We can possibly be more
clever here, if needed.
llvm-svn: 259456
Officially, we don't acknowledge non-default configurations of MXCSR,
as getting there would require usage of the FENV_ACCESS pragma (at
least insofar as rounding mode is concerned).
We don't support the pragma, so we can assume that the default
rounding mode - round to nearest, ties to even - is always used.
However, it's inconsistent with the rest of the instruction set,
where MXCSR is always effective (unless otherwise specified).
Also, it's an unnecessary obstacle to the few brave souls that use
fenv.h with LLVM.
Avoid the hard-coded rounding mode for fp_to_f16; use MXCSR instead.
llvm-svn: 259448
This routine was returning Undefined for most queries. This was utterly wrong. Amusingly, we do not appear to have any callers of this which are actually trying to exploit unreachable code or this would have broken the world.
A better approach would be to explicit describe the intersection of facts. That's blocked behind http://reviews.llvm.org/D14476 and I wanted to fix the current bug.
llvm-svn: 259446
I'll submit a test case shortly which covers this, but it's causing clang self host problems in the builders so I wanted to get it removed.
llvm-svn: 259432
Teach LVI to handle select instructions in the exact same way it handles PHI nodes. This is useful since various parts of the optimizer convert PHI nodes into selects and we don't want these transformations to cause inferior optimization.
Note that this patch does nothing to exploit the implied constraint on the inputs represented by the select condition itself. That will be a later patch and is blocked on http://reviews.llvm.org/D14476
llvm-svn: 259429
These sets do linear searching in small mode; It is not a good idea to
use huge numbers as the small value here, save people from themselves by
adding a static_assert.
Differential Revision: http://reviews.llvm.org/D16706
llvm-svn: 259419
Summary:
If the normal destination of the invoke or the parent block of the call site is unreachable-terminated, there is little point in inlining the call site unless there is literally zero cost. Unlike my previous change (D15289), this change specifically handle the call sites followed by unreachable in the same basic block for call or in the normal destination for the invoke. This change could be a reasonable first step to conservatively inline call sites leading to an unreachable-terminated block while BFI / BPI is not yet available in inliner.
Reviewers: manmanren, majnemer, hfinkel, davidxl, mcrosier, dblaikie, eraman
Subscribers: dblaikie, davidxl, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16616
llvm-svn: 259403
- ScalarEvolution::isKnownPredicateViaConstantRanges duplicates some
logic already present in ConstantRange, use ConstantRange for those
bits.
- In some cases ScalarEvolution::isKnownPredicateViaConstantRanges
returns `false` to mean "definitely false" (e.g. see the
`LHSRange.getSignedMin().sge(RHSRange.getSignedMax())` case for
`ICmpInst::ICMP_SLT`), but for `isKnownPredicateViaConstantRanges`,
`false` actually means "don't know". Get rid of this extra bit of
code to avoid confusion.
llvm-svn: 259401
If a target can only emit 8-bits data, we would loop in EmitValueImpl
since it will try to split a 32-bits data in 1 chunk of 32-bits.
No test since all current targets can emit 32bits at a time.
Patch by Alexandru Guduleasa!
llvm-svn: 259399
Iterate over the function list instead of a DenseMap of Function pointers
when emitting the function summary into the module.
This fixes PR26419.
llvm-svn: 259398
A masked store with a zero mask means there's no store.
A masked store with an allOnes mask means it's a normal vector store.
This is a continuation of:
http://reviews.llvm.org/rL259369
llvm-svn: 259392
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.
Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB
Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry
Differential Revision: http://reviews.llvm.org/D16291
llvm-svn: 259387
Summary:
Factor out common code for callee-save register pair calculation. This
is intended to simplify follow-on changes that reduce the number of
registers saved/restored.
Depends on D16732
Reviewers: mcrosier, jmolloy, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16734
llvm-svn: 259384
We've found another bug in the code generation logic conditions for a
certain class of always-false conditions, those of the form
if ((a & 1) < 0)
These only reach the back end when compiling without optimization.
The bug was introduced by the choice of using TEST UNDER MASK
to implement a check for
if ((a & MASK) < VAL)
as
if ((a & MASK) == 0)
where VAL is less than the the lowest bit of MASK. This is correct
in all cases except for VAL == 0, in which case the original
condition is always false, but the replacement isn't.
Fixed by excluding that particular case.
llvm-svn: 259381
This miscompile came about because we tried to use a transform which was
only appropriate for xor operators when addition was present.
This fixes PR26407.
llvm-svn: 259375
A masked load with a zero mask means there's no load.
A masked load with an allOnes mask means it's a normal vector load.
Differential Revision: http://reviews.llvm.org/D16691
llvm-svn: 259369
Summary:
Simplify callee-save register save/restore code generation by
remembering the size of the callee-save area when it is computed so we
don't have to scan the prologue/epilogue instructions again later to
reconstruct it.
This is intended to simplify follow-on changes that reduce the number of
registers saved/restored.
Reviewers: mcrosier, jmolloy, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16732
llvm-svn: 259365
In the future, we will vectorize recurrences other than reductions. This patch
renames a few variables and updates their associated comments to enable them to
be reused for non-reduction PHI nodes.
This change was requested in the review for D16197.
llvm-svn: 259364
Remove the old select.ll file and use select-int.ll, select-flt.ll,
select-dbl.ll for testing selects on integers, floats & doubles respectivelly.
llvm-svn: 259361
Summary:
The bugs were:
* teq and similar take 4-bit unsigned immediates on microMIPS.
* teqi and similar have side-effects like teq do.
* shll_s.w and shra_r.w take 5-bit unsigned immediates.
* The various DSP ext* instructions take a 5-bit immediate.
* repl.qh takes an 8-bit unsigned immediate.
* repl.ph takes a 10-bit unsigned immediate.
* rddsp/wrdsp take a 10-bit unsigned immediate.
* teqi and similar take signed 16-bit immediates (10-bit for microMIPS).
* Out-of-range immediate macros for or/xor take a simm32/simm64 depending
on architecture. I'll fix the simm64 case properly when I reach simm32.
lui is a bit more lenient than GAS and accepts signed immediates in addition
to unsigned. This is because MipsMCExpr can produce signed values when
constant folding and it currently lacks a way of knowing it should fold to
an unsigned value.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15446
llvm-svn: 259360
Changed emitting offset of macinfo entry into compiler unit DIE to use "addSectionLabel" method rather than explicitly calculating size/offset of macro entry.
Differential Revision: http://reviews.llvm.org/D16292
llvm-svn: 259358
Patch adds a DWARF language vendor extension for RenderScript.
We are already using this identifier in LLDB with a hard coded value, so it's preferable to use a LLVM generated enum instead.
The language is intended to be added to the next version of the standard.
See http://www.dwarfstd.org/ShowIssue.php?issue=150331.1
Reviewers: dexonsmith, echristo
Subscribers: probinson domipheus, srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D16409
llvm-svn: 259348
Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle.
Differential Revision: http://reviews.llvm.org/D16652
llvm-svn: 259343
Noticed while working on scattered relocations.
I do not think these relocs can actually happen in the debug_info section,
but if they happen the code would mishandle them. Explicitely skip them
and warn if we encounter one.
llvm-svn: 259341
Although it seems like clang will never emit scattered relocations in
the debug information (at least I couldn't find a way), we have too
support them for the benefit of other compilers.
As clang doesn't generate them, the included testcase was produced
from hacked up assembly.
llvm-svn: 259339