x86 has its own copy of integer absolute pattern matching to combine directly to a SUB+CMOV.
This patch removes the x86 combine and adds custom lowering support for ISD::ABS instead, allowing us to use the DAGCombiner version.
Additional test cases are already covered by iabs.ll (rL315706 and rL315711).
Differential Revision: https://reviews.llvm.org/D38895
llvm-svn: 316162
This patch adds accurate instructions cost.
The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride.
Reviewers:
1. delena
2. Farhana
3. zvi
4. dorit
5. Ayal
Differential Revision: https://reviews.llvm.org/D38762
Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7
llvm-svn: 316072
Summary:
Make sure the map is cleared before processing a new module. Similar to what is done on `StackMaps`.
This issue is similar to D38588, though this time for FaultMaps (on x86) rather than ARM/AArch64. Other than possible mixing of information between modules, the crash is caused by the pointers values in the map that was allocated by the bump pointer allocator that is unwinded when emitting the next file. This issue has been around since 3.8.
This issue is likely much harder to write a test for since AFAICT it requires emitting something much more compilcated (and possibly real code) instead of just some random bytes.
Reviewers: skatkov, sanjoy
Reviewed By: skatkov, sanjoy
Subscribers: sanjoy, aemerson, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38924
llvm-svn: 315990
Updated the scheduling information for the SkylakeClient target with the following changes:
1. regrouped the instructions after adding load and store latencies.
2. regrouped the instructions after adding identified missing ports in several groups.
The changes were made after revisiting the latencies impact of all the load and store uOps.
Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D38727
Change-Id: I778a308cc11e490e8fa5e27e2047412a1dca029f
llvm-svn: 315978
Turns out we have no patterns on the instructions that were using this feature flag for other reasons. These instructions are slow on all modern CPUs so it seems unlikely that we will spend any effort supporting these instructions going forward. So we might as well just kill of the feature flag and just fix up the comments.
llvm-svn: 315862
Summary:
This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes.
There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8.
Reviewers: RKSimon, zvi, delena
Reviewed By: delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38714
llvm-svn: 315860
Summary: I see nothing in Agner Fog's tables to indicate that this improved between Ivy Bridge and Haswell. It's also set for all Atom CPUs so I assume KNL should have it too.
Reviewers: RKSimon, zvi, gadi.haber
Reviewed By: gadi.haber
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38890
llvm-svn: 315859
Summary:
It's better to use our shuffle lowering code to handle these than loading an immediate into a k-register.
It really feels like this should be a DAG combine optimization rather than a lowering operation, but that's a problem for another day.
Reviewers: RKSimon, delena, zvi
Reviewed By: delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38932
llvm-svn: 315849
If we are applying a byte mask to a value extracted from a shuffle, see if we can combine the mask into shuffle.
Fixes the last issue with PR22415
llvm-svn: 315807
We don't need a bitconvert as a root pattern in these cases. The types in the other parts of the pattern are sufficient to express the behavior of these instructions.
llvm-svn: 315798
I believe these were added incorrectly under the belief that the load size was smaller than the input register size, but that's not true.
llvm-svn: 315795
This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations.
For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP.
We may be able to use this for AVX1 as well which would allow us to remove more isel patterns.
I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP.
Differential Revision: https://reviews.llvm.org/D38836
llvm-svn: 315768
Summary:
There's only a tablegen testcase for IntImmLeaf and not a CodeGen one
because the relevant rules are rejected for other reasons at the moment.
On AArch64, it's because there's an SDNodeXForm attached to the operand.
On X86, it's because the rule either emits multiple instructions or has
another predicate using PatFrag which cannot easily be supported at the
same time.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D36569
llvm-svn: 315761
This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now its an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it.
Differential Revision: https://reviews.llvm.org/D38811
llvm-svn: 315722
Summary: We seem to inconsistently create CMOV nodes some with a Glue result and some without. But I can't find any cases that use the Glue result. So I've tried to remove all the place that did this.
Reviewers: RKSimon, spatel, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38664
llvm-svn: 315686
There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table.
llvm-svn: 315674
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.
This reverts commit r315633.
llvm-svn: 315637
Merge LLVMTargetMachine into TargetMachine.
- There is no in-tree target anymore that just implements TargetMachine
but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
interface.
Differential Revision: https://reviews.llvm.org/D38489
llvm-svn: 315633
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.
Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.
Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.
Differential Revision: https://reviews.llvm.org/D38406
llvm-svn: 315590
MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that,
and allows us to remove the last instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.
llvm-svn: 315531
Summary:
This adds a set of new directives that describe 32-bit x86 prologues.
The directives are limited and do not expose the full complexity of
codeview FPO data. They are merely a convenience for the compiler to
generate more readable assembly so we don't need to generate tons of
labels in CodeGen. If our prologue emission changes in the future, we
can change the set of available directives to suit our needs. These are
modelled after the .seh_ directives, which use a different format that
interacts with exception handling.
The directives are:
.cv_fpo_proc _foo
.cv_fpo_pushreg ebp/ebx/etc
.cv_fpo_setframe ebp/esi/etc
.cv_fpo_stackalloc 200
.cv_fpo_endprologue
.cv_fpo_endproc
.cv_fpo_data _foo
I tried to follow the implementation of ARM EHABI CFI directives by
sinking most directives out of MCStreamer and into X86TargetStreamer.
This helps avoid polluting non-X86 code with WinCOFF specific logic.
I used cdb to confirm that this can show locals in parent CSRs in a few
cases, most importantly the one where we use ESI as a frame pointer,
i.e. the one in http://crbug.com/756153#c28
Once we have cdb integration in debuginfo-tests, we can add integration
tests there.
Reviewers: majnemer, hans
Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D38776
llvm-svn: 315513