Rationale :
// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
return _mm_addsub_pd(A, B);
}
// mmx
void shift(__m64 a, __m64 b, int c) {
_mm_slli_pi16(a, c);
_mm_slli_pi32(a, c);
_mm_slli_si64(a, c);
_mm_srli_pi16(a, c);
_mm_srli_pi32(a, c);
_mm_srli_si64(a, c);
_mm_srai_pi16(a, c);
_mm_srai_pi32(a, c);
}
clang -msse3 -mno-mmx file.c -c
For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.
This is a preparatory patch to the actual diagnosis code which is
coming in a future patch. This sets us up to have the correct information
where we need it and verifies that it's being emitted for the backend
to handle.
llvm-svn: 249733
that we can build up an accurate set of features rather than relying on
TargetInfo initialization via handleTargetFeatures to munge the list
of features.
llvm-svn: 249732
its own variable.
This is needed so that we can explicitly turn off MMX without turning
off SSE and also so that we can diagnose feature set incompatibilities
that involve MMX without SSE.
Rationale:
// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
return _mm_addsub_pd(A, B);
}
// mmx
void shift(__m64 a, __m64 b, int c) {
_mm_slli_pi16(a, c);
_mm_slli_pi32(a, c);
_mm_slli_si64(a, c);
_mm_srli_pi16(a, c);
_mm_srli_pi32(a, c);
_mm_srli_si64(a, c);
_mm_srai_pi16(a, c);
_mm_srai_pi32(a, c);
}
clang -msse3 -mno-mmx file.c -c
For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.
This matches the existing gcc behavior and follows the spirit of
the SSE/MMX separation in llvm where we can (and do) turn off
MMX code generation except in the presence of intrinsics.
Updated a couple of tests, but primarily tested with a couple of tests
for turning on only mmx and only sse.
This is paired with a patch to clang to take advantage of this behavior.
llvm-svn: 249731
This fixes memory allocation problems by making the merge operation keep
the profile readers around until the merged profile has been emitted.
This is needed to prevent the inlined function names to disappear from
the function profiles. Since all the names are kept as references, once
the reader disappears, the names are also deallocated.
Additionally, XFAIL on big-endian architectures. The test case uses a
gcov file generated on a little-endian system.
llvm-svn: 249724
Address a FIXME in ELF/Writer.cpp: Make VAStart a target-dependent property.
I've set the values for the existing targets to what I believe to be the
correct values, and updated the regression tests accordingly.
llvm-svn: 249723
early.
This is needed in a patch I plan to commit later, in which a null Decl
pointer is passed to SetLLVMFunctionAttributesForDefinition.
Relevant discussion is in http://reviews.llvm.org/D13525.
llvm-svn: 249722
In preparation for making the size of a .plt entry target dependent, use the
existing EntrySize variable when writing (instead of a hard-coded value). NFC.
llvm-svn: 249720
o Before this patch, BPF backend will expand UNDEF node
to i64 constant 0.
o For second pass of dag combiner, legalizer will run through
each to-be-processed dag node.
o If any new SDNode is generated and has an undef operand,
dag combiner will put undef node, newly-generated constant-0 node,
and any node which uses these nodes in the working list.
o During this process, it is possible undef operand is
generated again, and this will form an infinite loop
for dag combiner pass2.
o This patch allows UNDEF to be a legal type.
Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
llvm-svn: 249718
Summary:
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`
assumed all phi nodes in the loop header have the same order of incoming
values. This is not correct, and this commit changes
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`
to lookup the backedge value of a phi node using the loop's latch block.
Unfortunately, there is still some code duplication
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`.
At some point in the future we should extract out a helper class /
method that can evolve constant evolution phi nodes across iterations.
Fixes 25060. Thanks to Mattias Eriksson for the spot-on analysis!
Depends on D13457.
Reviewers: atrick, hfinkel
Subscribers: materi, llvm-commits
Differential Revision: http://reviews.llvm.org/D13458
llvm-svn: 249712
These changes improve the wait/release mechanism for threads spinning in
barriers that are handling tasks while spinnin by providing feedback to the
barriers about any task stealing that occurs.
Differential Revision: http://reviews.llvm.org/D13353
llvm-svn: 249711
Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable.
Some limitations:
* The number of sockets and then optional offset should be specified first (before other parameters).
* The letter designation is mandatory for sockets and then for other parameters.
* If number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped.
* If number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively.
* The number of cores per socket cannot be specified before sockets or after threads per core.
* The number of threads per core can be specified before or after core-offset (old syntax required it to be before core-offset);
* Parameters delimiter can be: empty, comma, lower-case x;
* Spaces are allowed around numbers, around letters, around delimiter.
Approximate shorthand specification:
KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]"
Differential Revision: http://reviews.llvm.org/D13175
llvm-svn: 249708
This fixes yet another scenario where tryBuildVectorShuffle would
attempt to create a BUILD_VECTOR node with an invalid combination
of types. This can happen if the incoming BUILD_VECTOR has elements
of a type different from the vector element type, which is allowed
in certain cases as long as they are all the same type.
When one of these elements is used in the residual vector, and
UNDEF elements are added to fill up the residual vector, those
UNDEFs then have to use the type of the original element, not
the vector element type, or else the resulting BUILD_VECTOR
will have an invalid type combination.
llvm-svn: 249706
The <no-defaults> special entry will prevent that specific directory's excludes from
receiving the global default source excludes. This allows overriding the global default
in a specific instance, simplifying the syntax for that case.
llvm-svn: 249705
C++ exceptions are still off by default, which is similar to how C++
cleanups are off by default in MSVC.
If you use clang instead of clang-cl, exceptions are also still off by
default. In the future, when C++ EH is proven to be stable, we may flip
the default for that driver to be consistent with other platforms.
llvm-svn: 249704
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886
Without this IR transform, the backend (x86 at least) was producing inefficient code.
This patch is making 2 assumptions:
1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.
Differential Revision: http://reviews.llvm.org/D13076
llvm-svn: 249702
This was requested in D13076: if we're going to canonicalize to fabs(), ValueTracking
should know that fabs() clears sign bits.
In this patch (as in D13076), we're not handling vectors yet even though computeKnownBits'
fabs() case itself should be vector-ready via the splat in this patch.
Fixing this will require follow-on patches to correct other logic that uses 'getScalarType'.
Differential Revision: http://reviews.llvm.org/D13222
llvm-svn: 249701
Simplifying the convoluted CPU handling in ARMTargetInfo.
The default base CPU on ARM is ARM7TDMI, arch ARMv4T, and
ARMTargetInfo had a different one. This wasn't visible from
Clang because the driver selects the defaults and sets the
Arch/CPU features directly, but the constructor depended
on the CPU, which was never used.
This patch corrects the mistake and greatly simplifies
how CPU is dealt with (essentially by removing the duplicated
DefaultCPU field).
Tests updated.
llvm-svn: 249699
Problem was in SearchPathW function that does not attach an extension if file already has one.
That does not work for executables like ld.lld2 for example which require to have .exe extension but SearchPath thinks that its "lld2".
Solution was to add the extension manually.
Differential Revision: http://reviews.llvm.org/D13536
llvm-svn: 249696
Summary:
Adds support for automatically detecting and printing strings
represented by Array abbrev operands, analogous to the string dumping
performed for Blob abbrev operands.
Enhanced the ThinLTO combined index test to check for the appropriate
module and function strings.
Reviewers: dexonsmith, joker.eph, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13553
llvm-svn: 249695
We emit 1 compact unwind encoding per function, and this can’t represent
the varying stack pointer that will be generated by X86CallFrameOptimization.
Disable the optimization on Darwin.
(It might be possible to split the function into multiple ranges
and emit 1 compact unwind info per range. The compact unwind emission
code isn’t ready for that and this kind of info certainly isn’t
tested/used anywhere. It might be worth exploring this path if we want
to get the space savings at some point though)
llvm-svn: 249694
Removed an unused abbrev op in the VST_CODE_COMBINED_FNENTRY abbrev.
I noticed while writing/testing an array string dumper for
llvm-bcanalyze that the combined function's VST entry abbrevs contained
an old field that I am not using. Everything was working fine since the
bitcode writer and reader were in sync on how the record fields were
actually being set up and interpreted.
llvm-svn: 249691
This instructions doesn't have intrincis.
Added tests for lowering and encoding.
Differential Revision: http://reviews.llvm.org/D12317
llvm-svn: 249688
Instead of bailing out when we see an icmp, we can instead at least
say that if the upper bits of both operands are known zero, they are
not demanded. This doesn't help with signed comparisons, but it's at
least better than bailing out.
llvm-svn: 249687
Like adds and subtracts, muls ripple only to the left so we can use
the same logic.
While we're here, add a print method to DemandedBits so it can be used
with -analyze, which we'll use in the testcase.
llvm-svn: 249686
The algorithm itself is still eager, but it doesn't get run until a
query function is called. This greatly reduces the compile-time impact
of requiring DemandedBits when at runtime it is not often used.
NFCI.
llvm-svn: 249685