Summary:
Unrolling Analyzer is already pretty complicated, and it becomes harder and harder to exercise it with usual IR tests, as with them we can only check the final decision: whether the loop is unrolled or not. This change factors this framework out from LoopUnrollPass to analyses, which allows to use unit tests.
The change itself is supposed to be NFC, except adding a couple of tests.
I plan to add more tests as I add new functionality and find/fix bugs.
Reviewers: chandlerc, hfinkel, sanjoy
Subscribers: zzheng, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D16623
llvm-svn: 260169
As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents.
This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines.
Differential Revision: http://reviews.llvm.org/D16956
llvm-svn: 260168
In order for recent gcov versions to read the coverage data, you have
to use UseCfgChecksum=true and FunctionNamesInData=false options for
coverage profiling pass. This is because gcov is expecting the
function section in .gcda to be exactly 3 words in size, containing
ident and two checksums.
While llvm-cov is compatible with UseCfgChecksum=true, it always
expects a function name in .gcda function sections (it's not
compatible with FunctionNamesInData=false). Thus it's currently
impossible to generate one set of coverage files that works with both
gcov and llvm-cov.
This change fixes the reading of coverage information to only read the
function name if it's present.
Patch by Arseny Kapoulkine. Thanks!
llvm-svn: 260162
This patch uses one bit in profile version to differentiate Clang
instrumentation and IR level instrumentation profiles.
PGOInstrumenation generates a COMDAT variable __llvm_profile_raw_version so
that the compiler runtime can set the right profile kind.
PGOInstrumenation now checks this bit to make sure it's an IR level
instrumentation profile.
Differential Revision: http://reviews.llvm.org/D15540
llvm-svn: 260146
Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set'
transform in InstCombine, but this should be a win for any AVX machine.
Code comments note that this transform could be extended for other targets / cases.
Differential Revision: http://reviews.llvm.org/D16828
llvm-svn: 260145
Summary:
We will hit this once we have enabled uniform branches. The
smrd-vccz-bug.ll test will be added with the uniform branch commit.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16725
llvm-svn: 260137
This matches GCC and MSVC's behaviour, and saves on code size.
We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to x86 target as well, and also for i8
and i16.
The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
vales do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].
Differential Revision: http://reviews.llvm.org/D16907
[1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
[2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ
llvm-svn: 260133
Summary:
Available externally definitions are considered declarations for the
linker and eventually dropped. As such they are not allowed to be
in comdats. Remove any such imported functions from comdats.
Reviewers: rafael
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D16120
llvm-svn: 260122
This reduces sizes of instrumented object files, final binaries,
process images, and raw profile data.
The format of the indexed profile data remain the same.
Differential Revision: http://reviews.llvm.org/D16388
llvm-svn: 260117
sanitizer issue. The PredicatedScalarEvolution's copy constructor
wasn't copying the Generation value, and was leaving it un-initialized.
Original commit message:
[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection
Summary:
This change adds no wrap SCEV predicates with:
- support for runtime checking
- support for expression rewriting:
(sext ({x,+,y}) -> {sext(x),+,sext(y)}
(zext ({x,+,y}) -> {zext(x),+,sext(y)}
Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.
We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.
Reviewers: mzolotukhin, sanjoy, anemet
Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel
Differential Revision: http://reviews.llvm.org/D15412
llvm-svn: 260112
Change a return statement of ComputeValueKnownInPredecessors() to be the same as
the rest return statements of the function. Otherwise, it might return true with
an empty Result when the current basic block has no predecessors and trigger the
first assert of JumpThreading::ProcessThreadableEdges().
llvm-svn: 260110
If a range has a lower bound of 0, add an AssertZext from the
nearest floor power of two.
This allows operations with some workitem intrinsics with known
maximum ranges to use fast 24-bit multiplies.
llvm-svn: 260109
We shouldn't assert when there are no memchecks, since we
can have SCEV checks. There is already an assert covering
the case where there are no SCEV checks or memchecks.
This also changes the LAA pointer wrapping versioning test
to use the loop versioning pass (this was how I managed to
trigger the assert in the loop versioning pass).
llvm-svn: 260086
Summary:
This change adds no wrap SCEV predicates with:
- support for runtime checking
- support for expression rewriting:
(sext ({x,+,y}) -> {sext(x),+,sext(y)}
(zext ({x,+,y}) -> {zext(x),+,sext(y)}
Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.
We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.
Reviewers: mzolotukhin, sanjoy, anemet
Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel
Differential Revision: http://reviews.llvm.org/D15412
llvm-svn: 260085
As discussed in https://github.com/google/sanitizers/issues/398, with current
implementation of poisoning globals we can have some CHECK failures or false
positives in case of mixing instrumented and non-instrumented code due to ASan
poisons innocent globals from non-sanitized binary/library. We can use private
aliases to avoid such errors. In addition, to preserve ODR violation detection,
we introduce new __odr_asan_gen_XXX symbol for each instrumented global that
indicates if this global was already registered. To detect ODR violation in
runtime, we should only check the value of indicator and report an error if it
isn't equal to zero.
Differential Revision: http://reviews.llvm.org/D15642
llvm-svn: 260075
The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input.
This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle.
Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if its not a target shuffle.
Differential Revision: http://reviews.llvm.org/D16683
llvm-svn: 260063
The Windows bots have been failing for the last two days, with:
FAILED: C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\cl.exe -c LLVMContextImpl.cpp
D:\buildslave\clang-x64-ninja-win7\llvm\lib\IR\LLVMContextImpl.cpp(137) :
error C2248: 'llvm::TrailingObjects<llvm::AttributeSetImpl,
llvm::IndexAttrPair>::operator delete' :
cannot access private member declared in class 'llvm::AttributeSetImpl'
TrailingObjects.h(298) : see declaration of
'llvm::TrailingObjects<llvm::AttributeSetImpl,
llvm::IndexAttrPair>::operator delete'
AttributeImpl.h(213) : see declaration of 'llvm::AttributeSetImpl'
llvm-svn: 260053
Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions.
The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively.
llvm-svn: 260032
First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement.
This should also make it easier to decode X86ISD::VZEXT target shuffles in the future.
llvm-svn: 259995
Summary:
Adds the linkage type to both the per-module and combined function
summaries, which subsumes the current islocal bit. This will eventually
be used to optimized linkage types based on global summary-based
analysis.
Reviewers: joker.eph
Subscribers: joker.eph, davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D16943
llvm-svn: 259993
Summary:
When alias analysis is uncertain about the aliasing between any two accesses,
it will return MayAlias. This uncertainty from alias analysis restricts LICM
from proceeding further. In cases where alias analysis is uncertain we might
use loop versioning as an alternative.
Loop Versioning will create a version of the loop with aggressive aliasing
assumptions in addition to the original with conservative (default) aliasing
assumptions. The version of the loop making aggressive aliasing assumptions
will have all the memory accesses marked as no-alias. These two versions of
loop will be preceded by a memory runtime check. This runtime check consists
of bound checks for all unique memory accessed in loop, and it ensures the
lack of memory aliasing. The result of the runtime check determines which of
the loop versions is executed: If the runtime check detects any memory
aliasing, then the original loop is executed. Otherwise, the version with
aggressive aliasing assumptions is used.
The pass is off by default and can be enabled with command line option
-enable-loop-versioning-licm.
Reviewers: hfinkel, anemet, chatur01, reames
Subscribers: MatzeB, grosser, joker.eph, sanjoy, javed.absar, sbaranga,
llvm-commits
Differential Revision: http://reviews.llvm.org/D9151
llvm-svn: 259986
There is a legitimate use-case in clang where we need to replace a
temporary placeholder node with the temporary node that may be a
forward declaration.
<rdar://problem/24493203>
llvm-svn: 259973