Rather than gating on "isSwitchDense" (resulting in necessesarily
sparse lookup tables even when they were generated), always run
this quite cheap transform.
This transform is useful not just for generating tables.
LowerSwitch also wants this: read LowerSwitch.cpp:257.
Be careful to not generate worse code, by introducing a
SubThreshold heuristic.
Instead of just sorting by signed, generalize the finding of the
best base.
And now that it is run unconditionally, do not replicate its
functionality in SwitchToLookupTable (which could use a Sub
when having a hole is smaller, hence the SubThreshold
heuristic located in a single place).
This simplifies SwitchToLookupTable, and fixes
some ugly corner cases due to the use of signed numbers,
such as a table containing i16 32768 and 32769, of which
32769 would be interpreted as -32768, and now the code thinks
the table is size 65536.
(We still use unconditional subtraction when building a single-register mask,
but I think this whole block should go when the more general sparse
map is added, which doesn't leave empty holes in the table.)
And the reason test4 and test5 did not trigger was documented wrong:
it was because they were not considered sufficiently "dense".
Also, fix generation of invalid LLVM-IR: shl by bit-width.
llvm-svn: 361727
and replace with an equilivent countTrailingZeros.
GCD is much more expensive than this, with repeated division.
This depends on D60823
llvm-svn: 361726
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().
(as well as __builtin_clzll())
llvm-svn: 361724
The implementation in ValueTracking and ConstantRange are equally
powerful, reuse the one in ConstantRange, which will make this easier
to extend.
llvm-svn: 361723
Extract method to compute overflow based on binop and signedness,
and then make the result handling code generic. This extends the
always-overflow handling to signed muls, but has currently no effect,
as we don't compute always overflow for them (thus NFC).
llvm-svn: 361721
This add patterns for fp16 round and ceil etc. Same as the float and double
patterns.
Differential Revision: https://reviews.llvm.org/D62326
llvm-svn: 361718
Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.
Differential Revision: https://reviews.llvm.org/D62325
llvm-svn: 361717
We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well.
There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well.
llvm-svn: 361716
While people mostly care about 64-bit, some systems need basic lib32
support. The plan is to make lld (see PR40888) capable of linking some
applications (PR40888).
llvm-svn: 361711
This is a follow up to r361432 and r361504 which addresses issues
introduced by those changes. Specifically, it avoids duplicating
file and runtime paths in case when the effective triple is the
same as the cannonical one. Furthermore, it fixes the broken multilib
setup in the Fuchsia driver and deduplicates some of the code.
Differential Revision: https://reviews.llvm.org/D62442
llvm-svn: 361709
If we have a known non-nan operand, place it in the second operand
of fmin/fmax that is returned if either operand is nan.
Differential Revision: https://reviews.llvm.org/D62448
llvm-svn: 361704
Adds support for the uadd.sat family of intrinsics in LVI, based on
ConstantRange methods from D60946.
Differential Revision: https://reviews.llvm.org/D62447
llvm-svn: 361703
The test diffs show improved vector narrowing for integer min/max opcodes because
those were all absent from the list. I'm not sure if we can expose functional diffs
for all of the moved/added opcodes though.
It seems like we are missing an AVX512 opportunity to use 256-bit ops in place of
512-bit ops on some tests/targets, but I think that can be a follow-up.
Preliminary steps to make sure the callers are not misusing these queries:
rL361268
rL361547
Differential Revision: https://reviews.llvm.org/D62191
llvm-svn: 361701
The guaranteed no-wrap region is never empty, it always contains at
least zero, so these optimizations don't ever apply.
To make this more obviously true, replace the conversative return
in makeGNWR with an assertion.
llvm-svn: 361698
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.
llvm-svn: 361696
In LVI, calculate the range of extractvalue(op.with.overflow(%x, %y), 0)
as the range of op(%x, %y). This is mainly useful in conjunction with
D60650: If the result of the operation is extracted in a branch guarded
against overflow, then the value of %x will be appropriately constrained
and the result range of the operation will be calculated taking that
into account.
Differential Revision: https://reviews.llvm.org/D60656
llvm-svn: 361693
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.
This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.
One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.
Differential Revision: https://reviews.llvm.org/D61472
llvm-svn: 361691
This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate.
Differential Revision: https://reviews.llvm.org/D62360
llvm-svn: 361690
This reverts commit r361340. The following builder has been broken for
the past few days because of this commit:
http://green.lab.llvm.org/green/job/clang-stage2-cmake-RgSan/
Also revert r361399, which was committed to fix r361340.
llvm-svn: 361685
This lead to errors when dumping binaries with v4 and v5 units linked
together (but could've also errored on v5 units that did/didn't use
str_offsets).
Also improves error handling and messages around invalid str_offsets
contributions.
llvm-svn: 361683
When initialization of virtual base classes is skipped, we now tell the user
about it, because this aspect of C++ isn't very well-known.
The implementation is based on the new "note tags" feature (r358781).
In order to make use of it, allow note tags to produce prunable notes,
and move the note tag factory to CoreEngine.
Differential Revision: https://reviews.llvm.org/D61817
llvm-svn: 361682
This patch adds the run-time CFG branch that would skip initialization of
virtual base classes depending on whether the constructor is called from a
superclass constructor or not. Previously the Static Analyzer was already
skipping virtual base-class initializers in such constructors, but it wasn't
skipping their arguments and their potential side effects, which was causing
pr41300 (and was generally incorrect). The previous skipping behavior is
now replaced with a hard assertion that we're not even getting there due
to how our CFG works.
The new CFG element is under a CFG build option so that not to break other
consumers of the CFG by this change. Static Analyzer support for this change
is implemented.
Differential Revision: https://reviews.llvm.org/D61816
llvm-svn: 361681
In a few places in getInstrMapping, we check if use/def instructions for the
instruction we're mapping have floating point constraints.
We can improve this check and reduce the number of copies in GISel-compiled code
if we make a couple observations:
- For a def instruction, it only matters if the def instruction must always
output a value stored on a FPR
- For a use instruction, it only matters if the use instruction must always
only take in values stored in FPRs
This adds two new functions:
- onlyUsesFP
- onlyDefinesFP
Then we can use those when we're checking the uses/defs instead.
Without this patch, the load, unmerge, store, and select in the added test
would have unnecessary copies.
Differential Revision: https://reviews.llvm.org/D62426
llvm-svn: 361679