This reverts commit r291973.
The test fails in a Release build with LLVM_BUILD_GLOBAL_ISEL enabled.
AFAICT, llc segfaults. I'll add a few more details to the original
commit.
llvm-svn: 292061
First, I've moved a test of IVUsers from the LSR tree to a dedicated
IVUsers test directory. I've also simplified its RUN line now that the
new pass manager's loop PM provides analyses on its own.
No functionality changed, but it makes subsequent changes cleaner.
llvm-svn: 292060
events.
This pass sometimes has a pointer to BlockFrequencyInfo so it needs
custom invalidation logic. It is also otherwise immutable so we can
reduce the number of invalidations that happen substantially.
llvm-svn: 292058
mark it as never invalidated in the new PM.
The old PM already required this to work, and after a discussion with
Hal this seems to really be the only sensible answer. The cache
gracefully degrades as the IR is mutated, and most things which do this
should already be incrementally updating the cache.
This gets rid of a bunch of logic preserving and testing the
invalidation of this analysis.
llvm-svn: 292039
cover domtree and alias analysis. These are the pretty clear analyses
that we would always want to survive this pass.
To make these survive, we also need to preserve the assumption cache.
Added a test that verifies the important bits of this preservation.
llvm-svn: 292037
VPMACSDQH/VPMACSDQL act as VPADDQ(VPMULDQ(x, y), z): they multiply and sign-extend the odd/even v4i32 input elements, respectively, and add the products to a v2i64 accumulator.
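For reference, the semantics described above can be modelled in scalar C++. This is only a sketch: the names V4I32, V2I64 and macsdq are hypothetical and do not appear in the patch, and the accumulate step is assumed to wrap on overflow like VPADDQ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4I32 = std::array<int32_t, 4>; // hypothetical stand-in for v4i32
using V2I64 = std::array<int64_t, 2>; // hypothetical stand-in for v2i64

// Scalar model: High selects the odd lanes (VPMACSDQH), otherwise the even
// lanes (VPMACSDQL) are used.
V2I64 macsdq(const V4I32 &X, const V4I32 &Y, const V2I64 &Z, bool High) {
  V2I64 R{};
  for (std::size_t I = 0; I < 2; ++I) {
    std::size_t Lane = 2 * I + (High ? 1 : 0);
    assert(Lane < X.size() && "lane index out of range");
    // Sign-extend both inputs to 64 bits; a 32x32 product always fits.
    int64_t Prod = int64_t{X[Lane]} * int64_t{Y[Lane]};
    // Accumulate with wraparound by going through unsigned arithmetic.
    R[I] = static_cast<int64_t>(static_cast<uint64_t>(Prod) +
                                static_cast<uint64_t>(Z[I]));
  }
  return R;
}
```

Calling macsdq with High set to false models VPMACSDQL, and with High set to true models VPMACSDQH.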
llvm-svn: 292020
Tests showing missed opportunities to use XOP's integer fma instructions
Some of these are pretty awkward to match as they often have implicit sext/trunc stages, but many just ignore overflow bits, which makes things pretty straightforward.
llvm-svn: 292017
Isel now selects masked move instructions for vselect instead of blendm. But sometimes it is beneficial to register allocation to remove the tied register constraint by using blendm instructions.
This also picks up cases where the masked move was created due to a masked load intrinsic.
Differential Revision: https://reviews.llvm.org/D28454
llvm-svn: 292005
We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX is available. If it's not, but register allocation has selected a non-extended register, we will use VEX VXORPS. And if it's an extended register without VLX, we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD.
This makes it possible for the register allocator to have all 32 registers available to work with.
llvm-svn: 292004
Allows LLVM to optimize sequences like the following:
%add = add nuw i32 %x, 1
%cmp = icmp ugt i32 %add, %y
Into:
%cmp = icmp uge i32 %x, %y
Previously, only signed comparisons were being handled.
Decrements could also be handled, but 'sub nuw %x, 1' is currently canonicalized to
'add %x, -1' in InstCombineAddSub, losing the nuw flag. Removing that canonicalization
seems like it might have far-reaching ramifications so I kept this simple for now.
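As a sanity check of the fold, the equivalence can be tested exhaustively on 8-bit values. This is a sketch, not code from the patch: the function names are made up, and uint8_t stands in for i32 to keep the search small.

```cpp
#include <cassert>
#include <cstdint>

// Original form: icmp ugt (add nuw x, 1), y. The nuw flag promises no wrap.
bool beforeFold(uint8_t X, uint8_t Y) {
  assert(X != UINT8_MAX && "nuw: x + 1 must not wrap");
  uint8_t Add = static_cast<uint8_t>(X + 1);
  return Add > Y;
}

// Folded form: icmp uge x, y.
bool afterFold(uint8_t X, uint8_t Y) { return X >= Y; }

// Checks every (x, y) pair for which the nuw promise holds.
bool foldHoldsUnderNuw() {
  for (unsigned X = 0; X < UINT8_MAX; ++X)
    for (unsigned Y = 0; Y <= UINT8_MAX; ++Y)
      if (beforeFold(static_cast<uint8_t>(X), static_cast<uint8_t>(Y)) !=
          afterFold(static_cast<uint8_t>(X), static_cast<uint8_t>(Y)))
        return false;
  return true;
}

// Without nuw the add may wrap, and the two forms disagree for x = 255, y = 0.
bool wrappingAddBreaksFold() {
  uint8_t X = UINT8_MAX, Y = 0;
  uint8_t Add = static_cast<uint8_t>(X + 1); // wraps to 0
  return (Add > Y) != (X >= Y);
}
```

The second helper illustrates why the nuw flag is required for the unsigned case.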
Patch by Matti Niemenmaa!
Differential Revision: https://reviews.llvm.org/D24700
llvm-svn: 291975
Correctly populating Machine PHIs relies on knowing exactly how the IR level
CFG was lowered to MachineIR. This needs to be tracked by any translation
phases that meddle (currently only SwitchInst handling).
llvm-svn: 291973
Summary:
This is a testcase where phi node cycling happens, and because we do
not order the leaders by domination or anything similar, the leader
keeps changing.
Using std::set for the members is too expensive, and we actually don't
need them sorted all the time, only at leader changes.
We could keep both a set and a vector, and keep them mostly sorted and
resort as necessary, or use a set and a fibheap, but all of this seems
premature.
After running some statistics, we are able to avoid the vast majority
of sorting by keeping a "next leader" field. Most congruence classes only have
leader changes once or twice during GVN.
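The idea can be illustrated with a small standalone sketch. It is not the NewGVN data structure: the class and its members are hypothetical, members are identified by an integer rank (lowest rank is the preferred leader), and the rescan counter exists only to make the effect visible.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <unordered_set>

// Hypothetical congruence class with an unsorted member set and a cached
// "next leader" that is promoted when the current leader leaves.
class CongruenceClass {
public:
  void addMember(unsigned Rank) {
    if (!Members.insert(Rank).second)
      return; // already a member
    if (!Leader) {
      Leader = Rank;
    } else if (Rank < *Leader) {
      // The old leader was the minimum, so it becomes the runner-up.
      NextLeader = Leader;
      Leader = Rank;
    } else if (Members.size() == 2 || (NextLeader && Rank < *NextLeader)) {
      NextLeader = Rank;
    }
  }

  void removeMember(unsigned Rank) {
    if (Members.erase(Rank) == 0)
      return;
    assert(Leader && "a non-empty class always has a leader");
    if (NextLeader && *NextLeader == Rank)
      NextLeader.reset(); // runner-up is now unknown
    if (*Leader != Rank)
      return;
    if (NextLeader) { // cheap path: promote the cached runner-up
      Leader = NextLeader;
      NextLeader.reset();
      return;
    }
    if (Members.empty()) {
      Leader.reset();
      return;
    }
    ++Rescans; // slow path: scan the members for the new minimum
    Leader = *std::min_element(Members.begin(), Members.end());
  }

  std::optional<unsigned> leader() const { return Leader; }
  unsigned rescans() const { return Rescans; }

private:
  std::unordered_set<unsigned> Members;
  std::optional<unsigned> Leader, NextLeader;
  unsigned Rescans = 0;
};
```

In this sketch, the first leader change after a run of insertions costs no scan at all; only a second consecutive leader removal falls back to searching the members.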
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28594
llvm-svn: 291968
Only scalar half-precision operations are supported at the moment.
- Adds general support for 'half' type in NVPTX.
- fp16 math operations are supported on sm_53+ GPUs only
(can be disabled with --nvptx-no-f16-math).
- Type conversions to/from fp16 are supported on all GPU variants.
- On GPU variants that do not have full fp16 support (or if it's disabled),
fp16 operations are promoted to fp32 and results are converted back
to fp16 for storage.
Differential Revision: https://reviews.llvm.org/D28540
llvm-svn: 291956
reserved physreg in RegisterCoalescer.
Previously, we only checked for clobbers when merging into a READ of
the physreg, but not when merging from a WRITE to the physreg.
Differential Revision: https://reviews.llvm.org/D28527
llvm-svn: 291942
Previously we'd always lower @llvm.{sin,cos}.f32 to {sin,cos}.approx.f32
instructions even when unsafe FP math was not allowed.
Clang-generated IR is not affected by this as it uses precise sin/cos
from CUDA's libdevice when unsafe math is disabled.
Differential Revision: https://reviews.llvm.org/D28619
llvm-svn: 291936
Summary:
Revert [ARM] Fix ubig32_t read in ARMAttributeParser
Now using support functions to read data instead of trying to
perform casts.
===========================================================
Revert [ARM] Enable objdump to construct triple for ARM
Now that the ARMAttributeParser has been moved into the library,
it has been modified so that it can parse the attributes without
printing them and stores them in a map. ELFObjectFile now queries
the attributes to fill out the architecture details of a provided
triple for 'arm' and 'thumb' targets. llvm-objdump uses this new
functionality.
Subscribers: llvm-commits, samparker, aemerson, mgorny
Differential Revision: https://reviews.llvm.org/D28683
llvm-svn: 291911
GCC changes the CC between the user-code and the builtins based on the
value of `-target` rather than `-mfloat-abi`. When a HF target is used,
the VFP variant of the AAPCS CC is used. Otherwise, the AAPCS variant
is used. In all cases, the AEABI functions use the AAPCS CC. Adjust
the calling convention based on the target.
Resolves PR30543!
llvm-svn: 291909
Use v8i64 variable ASHR instructions if we don't have VLX.
This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll.
Differential Revision: https://reviews.llvm.org/D28604
llvm-svn: 291901
Now that the ARMAttributeParser has been moved into the library,
it has been modified so that it can parse the attributes without
printing them and stores them in a map. ELFObjectFile now queries
the attributes to fill out the architecture details of a provided
triple for 'arm' and 'thumb' targets. llvm-objdump uses this new
functionality.
Differential Revision: https://reviews.llvm.org/D28281
llvm-svn: 291898
Some shuffles can be lowered to blend mask instructions (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ).
This patch adds a new pattern match for this case.
Reviewers:
1. craig.topper
2. guyblank
3. RKSimon
4. igorb
Differential Revision: https://reviews.llvm.org/D28483
llvm-svn: 291888
Running tests with expensive checks enabled exhibits some problems with
verification of pass results.
First, pass verification may require analysis results that are not
available. For instance, verification of loop info requires the results of
dominator tree analysis. A pass may be marked as preserving loop info without
depending on DominatorTreePass. When a pass manager tries to verify that loop
info is valid, it needs the dominator tree, but the corresponding analysis may
already have been destroyed because no user of it remained.
Another case is a pass that is skipped. For instance, entities with linkage
available_externally do not need code generation and such passes are skipped for
them. In this case result verification must also be skipped.
To solve these problems this change introduces a special flag to the Pass
structure to mark passes that have valid results. If this flag is reset,
verifications dependent on the pass result are skipped.
Differential Revision: https://reviews.llvm.org/D27190
llvm-svn: 291882
Other than on COFF with incremental linking, global metadata should
not need any extra alignment.
Differential Revision: https://reviews.llvm.org/D28628
llvm-svn: 291859
Summary:
We can sometimes end up with multiple copies of a local function that
have the same GUID in the index. This happens when there are local
functions with the same name that are in different source files with the
same name (but in different directories), and they were compiled in
their own directory so had the same path at compile time.
In this case make sure we import the copy in the caller's module. While
it isn't a correctness problem (the renamed reference which is based on the
module IR hash will be unique since the module must have had an
externally visible function that was imported), importing the wrong copy
will result in a lost performance opportunity since it won't be referenced
and inlined.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28440
llvm-svn: 291841
Revision 289661 introduced the function DILocation::getMergedLocation for
merging of debug locations. At the time it was simply a stub which always
returned no location. This patch modifies getMergedLocation to handle the
case where the two locations are the same or can't be discriminated.
Differential Revision: https://reviews.llvm.org/D28521
llvm-svn: 291809
Emit SHRQ/SHLQ instead of ANDQ with a 64 bit constant mask if the result
is unused and the mask has only higher/lower bits set. For example, with
this patch LLVM emits
shrq $41, %rdi
je
instead of
movabsq $0xFFFFFE0000000000, %rcx
testq %rcx, %rdi
je
This reduces the number of instructions, code size and register pressure.
The transformation is applied only for cases where the mask cannot be
encoded as an immediate value within TESTQ instruction.
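The equivalence behind the example can be sketched in C++. The helper names are hypothetical; the mask is the one from the example above, whose set bits are exactly bits 41 through 63.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t Mask = 0xFFFFFE0000000000ULL;
static_assert(Mask == (~0ULL << 41), "mask covers exactly bits 41..63");

// movabsq + testq form: are all masked bits zero?
bool highBitsClearViaTest(uint64_t X) { return (X & Mask) == 0; }

// shrq form: shifting out the low 41 bits leaves only the masked bits.
bool highBitsClearViaShift(uint64_t X) { return (X >> 41) == 0; }

// Convenience check that both forms agree for one input.
bool formsAgree(uint64_t X) {
  bool Same = highBitsClearViaTest(X) == highBitsClearViaShift(X);
  assert(Same && "shift and mask forms must agree");
  return Same;
}
```

The shift form only works because the result of the AND is otherwise unused, so destroying the low bits is harmless.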
Differential Revision: https://reviews.llvm.org/D28198
llvm-svn: 291806
For tests on bypassing slow division there's no need to be
Atom-specific. The patch renames all tests on division bypassing
and makes their names more consistent:
atom-bypass-slow-division.ll -> bypass-slow-division-32.ll
(tests verifying correctness of divl-to-divb bypassing)
atom-bypass-slow-division-64.ll -> bypass-slow-division-64.ll
(tests verifying correctness of divq-to-divl bypassing)
slow-div.ll -> bypass-slow-division-tune.ll
(tests verifying that bypassing is enabled only when appropriate)
Differential Revision: https://reviews.llvm.org/D28197
llvm-svn: 291802
64-bit integer division in Intel CPUs is extremely slow, much slower
than 32-bit division. On the other hand, 8-bit and 16-bit divisions
aren't any faster. The only important exception is Atom where DIV8
is fastest. Because of that, the patch
1) Enables bypassing of 64-bit division for Atom, Silvermont and
all big cores.
2) Modifies 64-bit bypassing to use 32-bit division instead of
16-bit one. This doesn't make the shorter division slower but
increases chances of taking it. Moreover, it's much more likely
to prove at compile-time that a value fits 32 bits and doesn't
require a run-time check (e.g. zext i32 to i64).
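The bypass can be pictured at the source level as follows. This is a sketch of the general technique under the assumption of unsigned division, not the code the pass emits; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// If both operands fit in 32 bits, use the cheaper 32-bit division;
// otherwise fall back to the full 64-bit division.
uint64_t udiv64WithBypass(uint64_t A, uint64_t B) {
  assert(B != 0 && "division by zero");
  if (((A | B) >> 32) == 0) // run-time check: are the upper halves zero?
    return static_cast<uint32_t>(A) / static_cast<uint32_t>(B);
  return A / B;
}
```

When the operands are known at compile time to come from 32-bit values (for example through zext i32 to i64), the run-time check becomes unnecessary.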
Differential Revision: https://reviews.llvm.org/D28196
llvm-svn: 291800