Summary:
Move the function renaming logic into the Function class, and the
MD5Hash routine into the MD5 header.
This will enable these routines to be shared with ThinLTO, which
will be changed to store the MD5 hash instead of full function name
in the combined index for significant size reductions. And using the same
function naming for locals in the function index facilitates future
integration with indirect call value profiles.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17006
llvm-svn: 260197
In general, memory restrictions on a called function (e.g. readnone)
cannot be transferred to a CallSite that has operand bundles. It is
possible to make this inference smarter, but lets fix the behavior to be
correct first.
llvm-svn: 260193
compiler-specific issues. Instead, repeat an 'operator delete' definition in
each derived class that is actually deleted, and give up on the static type
safety of an error when sized delete is accidentally used on a type derived
from TrailingObjects.
llvm-svn: 260190
Summary:
Passes that call `getAnalysisIfAvailable<T>` also need to call
`addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the
legacy pass manager that it uses `T`. This contract was being
violated by passes that used `createLegacyPMAAResults`. This change
fixes this by exposing a helper in AliasAnalysis.h,
`addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults
and does the right thing when called from `getAnalysisUsage`.
Reviewers: chandlerc
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17010
llvm-svn: 260183
Summary:
createLegacyPMAAResults is only called by CGSCC and Module passes, so
the call to getAnalysisIfAvailable<SCEVAAWrapperPass>() never
succeeds (SCEVAAWrapperPass is a function pass).
Reviewers: chandlerc
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17009
llvm-svn: 260182
This fixes undefined behavior in C++14 due to the size of the object being
deleted being different from sizeof(dynamic type) when it is allocated with
trailing objects.
MSVC seems to have several bugs around using-declarations changing the access
of a member inherited from a base class, so use forwarding functions instead of
using-declarations to make TrailingObjects::operator delete accessible where
desired.
llvm-svn: 260180
IndVarSimplify assumes scAddRecExpr to be expanded in literal form instead of
canonical form by calling disableCanonicalMode after it creates SCEVExpander.
When CanonicalMode is disabled, SCEVExpander::expand should always return PHI
node for scAddRecExpr. r259736 broke the assumption.
The fix is to let SCEVExpander::expand skip the reuse Value logic if
CanonicalMode is false.
In addition, Besides IndVarSimplify, LSR pass also calls disableCanonicalMode
before doing rewrite. We can remove the original check of LSRMode in reuse
Value logic and use CanonicalMode instead.
llvm-svn: 260174
Summary:
Unrolling Analyzer is already pretty complicated, and it becomes harder and harder to exercise it with usual IR tests, as with them we can only check the final decision: whether the loop is unrolled or not. This change factors this framework out from LoopUnrollPass to analyses, which allows to use unit tests.
The change itself is supposed to be NFC, except adding a couple of tests.
I plan to add more tests as I add new functionality and find/fix bugs.
Reviewers: chandlerc, hfinkel, sanjoy
Subscribers: zzheng, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D16623
llvm-svn: 260169
As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents.
This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines.
Differential Revision: http://reviews.llvm.org/D16956
llvm-svn: 260168
In order for recent gcov versions to read the coverage data, you have
to use UseCfgChecksum=true and FunctionNamesInData=false options for
coverage profiling pass. This is because gcov is expecting the
function section in .gcda to be exactly 3 words in size, containing
ident and two checksums.
While llvm-cov is compatible with UseCfgChecksum=true, it always
expects a function name in .gcda function sections (it's not
compatible with FunctionNamesInData=false). Thus it's currently
impossible to generate one set of coverage files that works with both
gcov and llvm-cov.
This change fixes the reading of coverage information to only read the
function name if it's present.
Patch by Arseny Kapoulkine. Thanks!
llvm-svn: 260162
This patch uses one bit in profile version to differentiate Clang
instrumentation and IR level instrumentation profiles.
PGOInstrumenation generates a COMDAT variable __llvm_profile_raw_version so
that the compiler runtime can set the right profile kind.
PGOInstrumenation now checks this bit to make sure it's an IR level
instrumentation profile.
Differential Revision: http://reviews.llvm.org/D15540
llvm-svn: 260146
Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set'
transform in InstCombine, but this should be a win for any AVX machine.
Code comments note that this transform could be extended for other targets / cases.
Differential Revision: http://reviews.llvm.org/D16828
llvm-svn: 260145
Mehdi suggested in a review of r259766 that it's also useful to easily
set the type of LTO. Augment the cmake variable to support that.
llvm-svn: 260143
Summary:
We will hit this once we have enabled uniform branches. The
smrd-vccz-bug.ll test will be added with the uniform branch commit.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16725
llvm-svn: 260137
This matches GCC and MSVC's behaviour, and saves on code size.
We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to x86 target as well, and also for i8
and i16.
The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
vales do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].
Differential Revision: http://reviews.llvm.org/D16907
[1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
[2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ
llvm-svn: 260133
Summary:
Available externally definitions are considered declarations for the
linker and eventually dropped. As such they are not allowed to be
in comdats. Remove any such imported functions from comdats.
Reviewers: rafael
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D16120
llvm-svn: 260122
This reduces sizes of instrumented object files, final binaries,
process images, and raw profile data.
The format of the indexed profile data remain the same.
Differential Revision: http://reviews.llvm.org/D16388
llvm-svn: 260117
sanitizer issue. The PredicatedScalarEvolution's copy constructor
wasn't copying the Generation value, and was leaving it un-initialized.
Original commit message:
[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection
Summary:
This change adds no wrap SCEV predicates with:
- support for runtime checking
- support for expression rewriting:
(sext ({x,+,y}) -> {sext(x),+,sext(y)}
(zext ({x,+,y}) -> {zext(x),+,sext(y)}
Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.
We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.
Reviewers: mzolotukhin, sanjoy, anemet
Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel
Differential Revision: http://reviews.llvm.org/D15412
llvm-svn: 260112
Change a return statement of ComputeValueKnownInPredecessors() to be the same as
the rest return statements of the function. Otherwise, it might return true with
an empty Result when the current basic block has no predecessors and trigger the
first assert of JumpThreading::ProcessThreadableEdges().
llvm-svn: 260110
If a range has a lower bound of 0, add an AssertZext from the
nearest floor power of two.
This allows operations with some workitem intrinsics with known
maximum ranges to use fast 24-bit multiplies.
llvm-svn: 260109
We shouldn't assert when there are no memchecks, since we
can have SCEV checks. There is already an assert covering
the case where there are no SCEV checks or memchecks.
This also changes the LAA pointer wrapping versioning test
to use the loop versioning pass (this was how I managed to
trigger the assert in the loop versioning pass).
llvm-svn: 260086
Summary:
This change adds no wrap SCEV predicates with:
- support for runtime checking
- support for expression rewriting:
(sext ({x,+,y}) -> {sext(x),+,sext(y)}
(zext ({x,+,y}) -> {zext(x),+,sext(y)}
Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.
We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.
Reviewers: mzolotukhin, sanjoy, anemet
Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel
Differential Revision: http://reviews.llvm.org/D15412
llvm-svn: 260085
As discussed in https://github.com/google/sanitizers/issues/398, with current
implementation of poisoning globals we can have some CHECK failures or false
positives in case of mixing instrumented and non-instrumented code due to ASan
poisons innocent globals from non-sanitized binary/library. We can use private
aliases to avoid such errors. In addition, to preserve ODR violation detection,
we introduce new __odr_asan_gen_XXX symbol for each instrumented global that
indicates if this global was already registered. To detect ODR violation in
runtime, we should only check the value of indicator and report an error if it
isn't equal to zero.
Differential Revision: http://reviews.llvm.org/D15642
llvm-svn: 260075
The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input.
This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle.
Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if its not a target shuffle.
Differential Revision: http://reviews.llvm.org/D16683
llvm-svn: 260063
The Windows bots have been failing for the last two days, with:
FAILED: C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\cl.exe -c LLVMContextImpl.cpp
D:\buildslave\clang-x64-ninja-win7\llvm\lib\IR\LLVMContextImpl.cpp(137) :
error C2248: 'llvm::TrailingObjects<llvm::AttributeSetImpl,
llvm::IndexAttrPair>::operator delete' :
cannot access private member declared in class 'llvm::AttributeSetImpl'
TrailingObjects.h(298) : see declaration of
'llvm::TrailingObjects<llvm::AttributeSetImpl,
llvm::IndexAttrPair>::operator delete'
AttributeImpl.h(213) : see declaration of 'llvm::AttributeSetImpl'
llvm-svn: 260053
Watching new contributors trying to build LLVM on Windows, one of the
very common failure modes was getting a version of Visual Studio
that did not have a C++ compiler for CMake to put up. Trying to create
a C++ project in Visual Studio will cause Visual Studio to go and
download the C++ tools.
llvm-svn: 260049
This just incrementally improves what was already there; it's questionable whether this content belongs in the getting started guide at all.
Patch by Ben Nathanson w/permission w/minor edtis by me.
llvm-svn: 260040
The mentioned environment variable doesn't appear to have any use in the LLVM repository. If it is still relevant for clang, we can consider adding it to the clang getting started page.
Patch inspired by documentation work by Ben Nathanson at the LLVM Bloomberg sprint.
llvm-svn: 260037
Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions.
The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively.
llvm-svn: 260032
Summary:
This is the attribute purpose-made for e.g. __syncthreads. It appears
that NoDuplicate may not be sufficient to prevent Sink from touching a
call to __syncthreads.
Reviewers: jingyue, hfinkel
Subscribers: llvm-commits, jholewinski, jhen, rnk, tra, majnemer
Differential Revision: http://reviews.llvm.org/D16941
llvm-svn: 260005
First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement.
This should also make it easier to decode X86ISD::VZEXT target shuffles in the future.
llvm-svn: 259995
Summary:
Adds the linkage type to both the per-module and combined function
summaries, which subsumes the current islocal bit. This will eventually
be used to optimized linkage types based on global summary-based
analysis.
Reviewers: joker.eph
Subscribers: joker.eph, davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D16943
llvm-svn: 259993
Summary:
When alias analysis is uncertain about the aliasing between any two accesses,
it will return MayAlias. This uncertainty from alias analysis restricts LICM
from proceeding further. In cases where alias analysis is uncertain we might
use loop versioning as an alternative.
Loop Versioning will create a version of the loop with aggressive aliasing
assumptions in addition to the original with conservative (default) aliasing
assumptions. The version of the loop making aggressive aliasing assumptions
will have all the memory accesses marked as no-alias. These two versions of
loop will be preceded by a memory runtime check. This runtime check consists
of bound checks for all unique memory accessed in loop, and it ensures the
lack of memory aliasing. The result of the runtime check determines which of
the loop versions is executed: If the runtime check detects any memory
aliasing, then the original loop is executed. Otherwise, the version with
aggressive aliasing assumptions is used.
The pass is off by default and can be enabled with command line option
-enable-loop-versioning-licm.
Reviewers: hfinkel, anemet, chatur01, reames
Subscribers: MatzeB, grosser, joker.eph, sanjoy, javed.absar, sbaranga,
llvm-commits
Differential Revision: http://reviews.llvm.org/D9151
llvm-svn: 259986
There is a legitimate use-case in clang where we need to replace a
temporary placeholder node with the temporary node that may be a
forward declaration.
<rdar://problem/24493203>
llvm-svn: 259973
a significant fraction of the file size (for files that otherwise have few
records). Also include an average size per record in the summary information.
llvm-svn: 259965
check wrong when inheriting a member through two levels of private inheritance,
where the middle one is a class template specialization.
llvm-svn: 259943
-fsized-deallocation. Disable sized deallocation for all objects derived from
TrailingObjects, as we expect the storage allocated for these objects to be
larger than the size of their dynamic type.
llvm-svn: 259942
In r252595, I inadvertently changed the condition to "Cost <= Threshold",
which caused a significant size regression in Chrome. This commit rectifies
that.
llvm-svn: 259915
Summary:
This makes it possible to specify some operands as optional to the AsmMatcher.
Setting this field to true will prevent the AsmMatcher from emitting
'too few operands' errors when there are missing optional operands.
Reviewers: olista01, ab
Subscribers: nhaustov, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15755
llvm-svn: 259913
The current situation isn't great, because the amount of padding
requires is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.
Another problem is the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.
llvm-svn: 259912
LiveRangeEdit::eliminateDeadDef is used to remove dead define instructions
after rematerialization. To remove a VNI for a vreg from its LiveInterval,
LiveIntervals::removeVRegDefAt is used. However, after non-PHI VNIs are all
removed, PHI VNI are still left in the LiveInterval. Such unused vregs will
be kept in RegsToSpill[] at the end of InlineSpiller::reMaterializeAll and
spiller will allocate stackslot for them.
The fix is to get rid of unused reg by checking whether it has non-dbg
reference instead of whether it has non-empty interval.
llvm-svn: 259895
This makes it less likely to clash with other stuff that might be linked
in by change, e.g. ncurses exposes an external function called simply
"echo", so linking ncurses statically into the binary explodes in funny
ways.
llvm-svn: 259882
In r255133 (reapplied r253126) we started to avoid redundant
recomputation of LCSSA after loop-unrolling. This patch moves one step
further in this direction - now we can avoid it for much wider range of
loops, as we start to look at IR and try to figure out if the
transformation actually breaks LCSSA phis or makes it necessary to
insert new ones.
Differential Revision: http://reviews.llvm.org/D16838
llvm-svn: 259869
CodeView, like most other debug formats, represents the live range of a
variable so that debuggers might print them out.
They use a variety of records to represent how a particular variable
might be available (in a register, in a frame pointer, etc.) along with
a set of ranges where this debug information is relevant.
However, the format only allows us to use ranges which are limited to a
maximum of 0xF000 in size. This means that we need to split our debug
information into chunks of 0xF000.
Because the layout of code is not known until *very* late, we must use a
new fragment to record the information we need until we can know
*exactly* what the range is.
llvm-svn: 259868
Summary:
Passing the rematerialized values map to insertRematerializationStores by
value looks to be a simple oversight; update it to pass by reference.
Reviewers: reames, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16911
llvm-svn: 259867
Summary: This diff increase the tested surface of the C API.
Reviewers: bogner, chandlerc, echristo, dblaikie, joker.eph, Wallbraker
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16910
llvm-svn: 259863
Only single and double FP immediates are correctly printed by
MachineInstr::print() during debug output. Half float type goes to
APFloat::convertToDouble() and hits assertion it is not a double
semantics. This diff prints half machine operands correctly.
This cannot currently be hit by any in-tree target.
Patch by Stanislav Mekhanoshin
llvm-svn: 259857
We don't currently have many tests that deal with operations on multiple
local MemoryLocations. This new test helps out a bit in that regard.
llvm-svn: 259854
Summary: As per title. It is required and don't get linked in in some builds.
Reviewers: chapuni, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16903
llvm-svn: 259853
Summary computation is not just for instrumented profiling and so I have moved
the ProfileSummary class to ProfileCommon.h (named so to allow code unrelated
to summary but common to instrumented and sampled profiling to be placed there)
Differential Revision: http://reviews.llvm.org/D16661
llvm-svn: 259846
Summary:
This basically add an echo test case in C. The support is limited right now, but full support would just be too much to review at once.
The echo test case simply get a module as input and try to output the same exact module. This allow to check the both reading and writing API are working as expected.
I want to improve this test over time to support more and more of the API, in order to improve coverage (coverage is quite poor right now).
Test Plan: Run the test.
Reviewers: chandlerc, bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10725
llvm-svn: 259844
Using the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.
llvm-svn: 259840
This only impacts the creation of pre-/post-index instructions. The bound was
set high enough such that it did not change code generation for SPEC200X.
llvm-svn: 259828
This is the right location for platform-specific files.
On some distributions (e. g. Exherbo), a package can be installed for several
architectures in parallel, but the architecture-independent files are shared.
Therefore, we must not install architecture-dependent files (like the CMake
config and export files) to share/.
llvm-svn: 259821
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.
This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.....
llvm-svn: 259816
When SCEV expansion tries to reuse an existing value, it is needed to ensure
that using the Value at the InsertPt will not break LCSSA. The fix adds a
check that InsertPt is either inside the candidate Value's parent loop, or
the candidate Value's parent loop is nullptr.
llvm-svn: 259815
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
This is a reapplication of r246769 and r259790. The tramp3d failure was caused
by an incorrect refactoring in the patch. Specifically, we weren't always
properly clearing the SExtIdx flag.
llvm-svn: 259812
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:
(mul (sext i32), (sext i32))
However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:
(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.
Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.
Patch by Chris Diamand!
llvm-svn: 259800
Fix the lit bug that enabled this "feature" (empty triple is substring
of all possible target triples) and change the two outliers to use the
documented * syntax.
llvm-svn: 259799
This patch corresponds to review:
http://reviews.llvm.org/D16847
There are some files in glibc that use the output operand modifier even though
it was deprecated in GCC. This patch just adds support for it to prevent issues
with such files.
llvm-svn: 259798
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.
Differential Revision: http://reviews.llvm.org/D16729
llvm-svn: 259796
This patch implements softening of long double type (ppcf128) on ppc32
architecture and enables operations for this type for soft float.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D15811
llvm-svn: 259791
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure. I'm unable to reproduce the issue at this time.
llvm-svn: 259790
Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time.
Differential Revision: http://reviews.llvm.org/D16404
llvm-svn: 259770
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.
This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.
The original commit triggered regressions in Polly tests. The regressions
exposed two problems which have been fixed in current version.
1. Polly will generate a new function based on the old one. To generate an
instruction for the new function, it builds SCEV for the old instruction,
applies some tranformation on the SCEV generated, then expands the transformed
SCEV and insert the expanded value into new function. Because SCEV expansion
may reuse value cached in ExprValueMap, the value in old function may be
inserted into new function, which is wrong.
In SCEVExpander::expand, there is a logic to check the cached value to
be used should dominate the insertion point. However, for the above
case, the check always passes. That is because the insertion point is
in a new function, which is unreachable from the old function. However
for unreachable node, DominatorTreeBase::dominates thinks it will be
dominated by any other node.
The fix is to simply add a check that the cached value to be used in
expansion should be in the same function as the insertion point instruction.
2. When the SCEV is of scConstant type, expanding it directly is cheaper than
reusing a normal value cached. Although in the cached value set in ExprValueMap,
there is a Constant type value, but it is not easy to find it out -- the cached
Value set is not sorted according to the potential cost. Existing reuse logic
in SCEVExpander::expand simply chooses the first legal element from the cached
value set.
The fix is that when the SCEV is of scConstant type, don't try the reuse
logic. simply expand it.
Differential Revision: http://reviews.llvm.org/D12090
llvm-svn: 259736
D16251)
Summary:
This is a simpler fix to the problem than the dominator approach in
http://reviews.llvm.org/D16251. It adds only values into the gather() while loop
that have been seen before.
The actual endless loop is in the constant compare gather() routine in
Utils/SimplifyCFG.cpp. The same value ret.0.off0.i is pushed back into the
queue:
%.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i
Here is what happens at the IR level:
for.cond.i: ; preds = %if.end6.i,
%if.end.i54
%ix.0.i = phi i32 [ 0, %if.end.i54 ], [ %inc.i55, %if.end6.i ]
%ret.0.off0.i = phi i1 [false, %if.end.i54], [%.ret.0.off0.i, %if.end6.i] <<<
%cmp2.i = icmp ult i32 %ix.0.i, %11
br i1 %cmp2.i, label %for.body.i, label %LBJ_TmpSimpleNeedExt.exit
if.end6.i: ; preds = %for.body.i
%cmp10.i = icmp ugt i32 %conv.i, %add9.i
%.ret.0.off0.i = or i1 %ret.0.off0.i, %cmp10.i <<<
When if.end.i54 gets eliminated which removes the definition of ret.0.off0.i.
The result is the expression %.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i
(Note the first ‘or’ operand is now %.ret.0.off0.i, and *NOT* %ret.0.off0.i).
And
now there is use of .ret.0.off0.i before a definition which triggers the
“endless” loop in gather():
while(!DFT.empty()) {
V = DFT.pop_back_val(); // V is .ret.0.off0.i
if (Instruction *I = dyn_cast<Instruction>(V)) {
// If it is a || (or && depending on isEQ), process the operands.
if (I->getOpcode() == (isEQ ? Instruction::Or : Instruction::And)) {
DFT.push_back(I->getOperand(1)); // This is now .ret.0.off0.i also
DFT.push_back(I->getOperand(0));
continue; // “endless loop” for .ret.0.off0.i
}
Reviewers: reames, ahatanak
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16839
llvm-svn: 259730
Unfortunately, ProgramInfo::ProcessId is signed on Unix and unsigned on
Windows, breaking the standard fix of using '0U' in the gtest
expectation.
llvm-svn: 259704
Summary:
Fixed pointers to go on the right hand side following coding guidelines. NFC.
Patch by Mandeep Singh Grang.
Reviewers: majnemer, arsenm, sanjoy
Differential Revision: http://reviews.llvm.org/D16866
llvm-svn: 259703
Bail out if we have a PHI on an EHPad that gets a value from a
CatchSwitchInst. Because the CatchSwitchInst cannot be split, there is
no good place to stick any instructions.
This fixes PR26373.
llvm-svn: 259702
This is mostly about having shorter lines and standardizing on one
interface, but it also avoids some needless indirection.
No functional change.
llvm-svn: 259697
Summary:
In r257979, I added code to ensure that we wouldn't merge DebugLocEntries if
the pieces they describe overlap. Unfortunately, I failed to cover the case,
where there may have multiple active Expressions in the entry, in which case we
need to make sure that no two values overlap before we can perform the merge.
This fixed PR26148.
Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D16742
llvm-svn: 259696
The IR/Value class had a linkage issue present when LLVM was built
as a library, and the LLVM library build time had different settings
for NDEBUG than the client of the LLVM library. Clients could get
into a state where the LLVM lib expected
Value::assertModuleIsMaterialized() to be inline-defined in the header
but clients expected that method to be defined in the LLVM library.
See this llvm-commits thread for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160201/329667.html
llvm-svn: 259695
This patch consists of two parts: a performance fix in DAGCombiner.cpp
and a correctness fix in SelectionDAG.cpp.
The test case tests the bug that's uncovered by the performance fix, and
fixed by the correctness fix.
The performance fix keeps the containers required by the
hasPredecessorHelper (which is a lazy DFS) and reuse them. Since
hasPredecessorHelper is called in a loop, the overall efficiency reduced
from O(n^2) to O(n), where n is the number of SDNodes.
The correctness fix keeps iterating the neighbor list even if it's time
to early return. It will return after finishing adding all neighbors to
Worklist, so that no neighbors are discarded due to the original early
return.
llvm-svn: 259691
Summary:
This patch adds a reserve call to an expensive function
(`llvm::LoadIntrinsics`), and may fix a few other low hanging
performance fruit (I've put them in comments for now, so we can
discuss).
**Motivation:**
As I'm sure other developers do, when I build LLVM, I build the entire
project with the same config (`Debug`, `MinSizeRel`, `Release`, or
`RelWithDebInfo`). However, the `Debug` config also builds llvm-tblgen
in `Debug` mode. Later build steps that run llvm-tblgen then can
actually be the slowest steps in the entire build. Nobody likes slow
builds.
Reviewers: rnk, dblaikie
Differential Revision: http://reviews.llvm.org/D16832
Patch by Alexander G. Riccio
llvm-svn: 259683
Add support for TLS access for Windows on ARM. This generates a similar access
to MSVC for ARM.
The changes to the tablegen data is needed to support loading an external symbol
global that is not for a call. The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.
llvm-svn: 259676
According to git bisect, this is the root cause of a miscompile for Regex in
libLLVMSupport. I am still working on reducing a test case.
The actual bug may be elsewhere and this commit just exposed it.
Anyway, at the moment, to reproduce, follow these steps:
1. Build clang and libLTO in release mode.
2. Create a new build directory <stage2> and cd into it.
3. Use clang and libLTO from #1 to build llvm-extract in Release mode + asserts
using -O2 -flto
4. Run llvm-extract -ralias '.*bar' -S test/Other/extract-alias.ll
Result:
program doesn't contain global named '.*bar'!
Expected result:
@a0a0bar = alias void ()* @bar
@a0bar = alias void ()* @bar
declare void @bar()
Note: In step #3, if you don't use lto or asserts, the miscompile disappears.
llvm-svn: 259674
Recommited, after some fixing with test cases.
Updated test cases:
test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
test/CodeGen/AArch64/tailcall_misched_graph.ll
Temporarily disabled test cases:
test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll
test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated)
test/CodeGen/PowerPC/vsx-fma-m.ll
test/CodeGen/PowerPC/vsx-fma-sp.ll
http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.
llvm-svn: 259673
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.
This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.
Differential Revision: http://reviews.llvm.org/D12090
llvm-svn: 259662
The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores
which happens to be a lot faster than __divsi3/__modsi3 when the core
has hardware divide instructions. Do the same here.
Fixes PR26450.
llvm-svn: 259657
This regresses a test in LoopVectorize, so I'll need to go away and think about how to solve this in a way that isn't broken.
From the writeup in PR26071:
What's happening is that ComputeKnownZeroes is telling us that all bits except the LSB are zero. We're then deciding that only the LSB needs to be demanded from the icmp's inputs.
This is where we're wrong - we're assuming that after simplification the bits that were known zero will continue to be known zero. But they're not - during trivialization the upper bits get changed (because an XOR isn't shrunk), so the icmp fails.
The fault is in demandedbits - its contract does clearly state that a non-demanded bit may either be zero or one.
llvm-svn: 259649
MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL
flag. See Figure 4–7 on page 69 in the following document:
ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf.
Differential Revision: http://reviews.llvm.org/D15740
llvm-svn: 259641
Summary:
This adds a new attribute which targets can set in TableGen which causes a function to be generated which matches register alternative names. This is very similar to `ShouldEmitMatchRegisterName`, except it works on alt names.
This patch is currently used by the out of tree part of the AVR backend. It reduces code duplication greatly, and has the effect that you do not need to hardcode altname to register mappings in C++.
It will not work on targets which have registers which share the same aliases.
Reviewers: stoklund, arsenm, dsanders, hfinkel, vkalintiris
Subscribers: hfinkel, dylanmckay, llvm-commits
Differential Revision: http://reviews.llvm.org/D16312
llvm-svn: 259636
Follow up to D16217 and D16729
This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD
Differential Revision: http://reviews.llvm.org/D16768
llvm-svn: 259635
With this patch, the profile summary data will be available in indexed
profile data file so that profiler reader/compiler optimizer can start
to make use of.
Differential Revision: http://reviews.llvm.org/D16258
llvm-svn: 259626
The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms
on PPC.
%vreg6<def> = COPY %vreg96
%vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7
;v6 = v6 + v5 * v7
is replaced by
%vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96
;v5 = v5 * v7 + v96
This was broken in the case where the target register was also used as a
multiplicand. Fix this case by checking for it and replacing both uses
with the copied register.
%vreg6<def> = COPY %vreg96
%vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6
;v6 = v6 + v5 * v6
is replaced by
%vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96
;v5 = v5 * v96 + v96
llvm-svn: 259617
The register coalescer can rematerialize constants that define
more of a register than the copy it is going to replace was going
to do.
This is valid in the case the register was undef before the
copy happened.
This patch makes sure that all the subranges defined by the new
rematerialization instructions have at least a dead def.
Review: http://reviews.llvm.org/D16693
llvm-svn: 259614
Summary:
LoopVersioning is a transform utility that transform passes can use to
run-time disambiguate may-aliasing accesses. I'd like to also expose as
pass to allow it to be unit-tested.
I am planning to add support for non-aliasing annotation in
LoopVersioning and I'd like to be able to write tests directly using
this pass.
(After that feature is done, the pass could also be used to look for
optimization opportunities that are hidden behind incomplete alias
information at compile time.)
The pass drives LoopVersioning in its default way which is to fully
disambiguate may-aliasing accesses no matter how many checks are
required.
Reviewers: hfinkel, ashutosh.nema, sbaranga
Subscribers: zzheng, mssimpso, llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D16612
llvm-svn: 259610
Please see include/llvm/Transforms/Utils/MemorySSA.h for a description
of MemorySSA, and what it does.
Differential Revision: http://reviews.llvm.org/D7864
llvm-svn: 259595
Due to staleness in a patch I committed yesterday, the debug output was reporting overdefined cases as being undefined. Confusing to say the least. The mistake appears to have only effected the debug output thankfully.
llvm-svn: 259594
I introduced a declaration in 259583 to keep the diff readable. This change just moves the definition up to remove the declaration again.
llvm-svn: 259585
This patch uses the newly introduced 'intersect' utility (from 259461: [LVI] Introduce an intersect operation on lattice values) to simplify existing code in LVI.
While not introducing any new concepts, this change is probably not NFC. The common 'intersect' function is more powerful that the ad-hoc implementations we'd had in a couple of places. Given that, we may see optimizations triggering a bit more often.
llvm-svn: 259583
This didn't affect X86_64, which is the only client of this code at the moment,
as stubs and pointers are both 8-bytes there. It will affect other platforms
though.
llvm-svn: 259575
If we can't assume the pointer value isn't within the bounds
of the object, it seems risky to try to replace the pointer
calculations.
llvm-svn: 259573
CodeView requires us to accurately describe the extent of the inlined
code. We did this by grabbing the next debug location in source order
and using *that* to denote where we stopped inlining. However, this is
not sufficient or correct in instances where there is no next debug
location or the next debug location belongs to the start of another
function.
To get this correct, use the end symbol of the function to denote the
last possible place the inlining could have stopped at.
llvm-svn: 259548
When promoting allocas to LDS, we know we are indexing
into a specific area just created, and the calculation
will also never overflow.
Also emit some of the muls as nsw nuw, because instcombine
infers this already from the range metadata. I think
putting this on the other adds and muls might be OK too,
but I'm not 100% sure.
llvm-svn: 259545