Commit Graph

44427 Commits

Author SHA1 Message Date
Matthias Braun a2f96b5bde AArch64: Enable AES instruction fusion on Cyclone.
Note that cyclone itself doesn't fuse, but newer apple chips do and we
are using cyclone as the default when targeting apple OSes.

The current code also does not capture all fusion patterns of apple CPUs
yet; I am still looking for ways to refactor the code nicely to extend
it.

llvm-svn: 316036
2017-10-17 21:46:15 +00:00
Tim Northover 350a87eaf1 AArch64: account for possible frame index operand in compares.
If the address of a local is used in a comparison, AArch64 can fold the
address-calculation into the comparison via "adds". Unfortunately, a couple of
places (both hit in this one test) are not ready to deal with that yet and just
assume the first source operand is a register.

llvm-svn: 316035
2017-10-17 21:43:52 +00:00
Eugene Zelenko 6cadde7f40 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316034
2017-10-17 21:27:42 +00:00
Konstantin Zhuravlyov 7dabe9ced7 AMDGPU: Start generating metadata for MaxFlatWorkGroupSize
Differential Revision: https://reviews.llvm.org/D38958

llvm-svn: 316024
2017-10-17 20:03:21 +00:00
Yichao Yu a46eb8e649 Fix `FaultMaps` crash when the out streamer is reused
Summary:
Make sure the map is cleared before processing a new module. Similar to what is done on `StackMaps`.

This issue is similar to D38588, though this time for FaultMaps (on x86) rather than ARM/AArch64. Other than possible mixing of information between modules, the crash is caused by the pointers values in the map that was allocated by the bump pointer allocator that is unwinded when emitting the next file. This issue has been around since 3.8.

This issue is likely much harder to write a test for since AFAICT it requires emitting something much more compilcated (and possibly real code) instead of just some random bytes.

Reviewers: skatkov, sanjoy

Reviewed By: skatkov, sanjoy

Subscribers: sanjoy, aemerson, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D38924

llvm-svn: 315990
2017-10-17 11:44:34 +00:00
Gadi Haber 1e0f1f476a [X86][SKL] Updated scheduling information for the SkylakeClient target
Updated the scheduling information for the SkylakeClient target with the following changes:

1. regrouped the instructions after adding load and store latencies.
2. regrouped the instructions after adding identified missing ports in several groups.
The changes were made after revisiting the latencies impact of all the load and store uOps.

Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D38727

Change-Id: I778a308cc11e490e8fa5e27e2047412a1dca029f
llvm-svn: 315978
2017-10-17 06:47:04 +00:00
Craig Topper fbb1985c14 [X86] Fix typo in comment. NFC
llvm-svn: 315969
2017-10-17 04:17:54 +00:00
Mark Searles 4e3d6160db Use the return value of UpdateNodeOperands(); in some cases, UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node, but may return a resultant node if it already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used to avoid such scenarios as an infinite loop (node is assumed to have been updated, so added back to the worklist, and re-processed; however, node hasn’t changed so it is once again passed to UpdateNodeOperands(), assumed modified, added back to worklist; cycle infinitely repeats).
Differential Revision: https://reviews.llvm.org/D38466

llvm-svn: 315957
2017-10-16 23:38:53 +00:00
Quentin Colombet 0bd2825517 Re-apply [AArch64][RegisterBankInfo] Use the statically computed mappings for COPY
This reverts commit r315823, thus re-applying r315781.

Also make sure we don't use G_BITCAST mapping for non-generic registers.
Non-generic registers don't have a type but do have a reg bank.
Something the COPY mapping now how to deal with but the G_BITCAST
mapping don't.

-- Original Commit Message --
We use to resort on the generic implementation to get the mappings for
COPYs. The generic implementation resorts on table lookup and
dynamically allocated objects to get the valid mappings.

Given we already know how to map G_BITCAST and have the static mappings
for them, use that code path for COPY as well. This is much more
efficient.

Improve the compile time of RegBankSelect by up to 20%.

Note: When we eventually generate all the mappings via TableGen, we
wouldn't have to do that dance to shave compile time. The intent of this
change was to make sure that moving to static structure really pays off.

NFC.

llvm-svn: 315947
2017-10-16 22:28:40 +00:00
Quentin Colombet 9f20af6135 [AArch64][RegisterBankInfo] Add mapping support for G_BITCAST of s128
Anything bigger than 64-bit just map to FPR.

llvm-svn: 315946
2017-10-16 22:28:38 +00:00
Quentin Colombet 7c114d3d70 [AArch64][LegalizerInfo] Mark s128 G_BITCAST legal
We used to mark all G_BITCAST of 128-bit legal but only for vector
types. Scalars of this size are just fine as well.

llvm-svn: 315945
2017-10-16 22:28:27 +00:00
Krzysztof Parzyszek 72518eaa6f Add iterator range MachineRegisterInfo::liveins(), adopt users, NFC
llvm-svn: 315927
2017-10-16 19:08:41 +00:00
Krzysztof Parzyszek 02893de4ef [Hexagon] Rangify some loops, NFC
Recommit r315763 with a fix.

llvm-svn: 315925
2017-10-16 18:43:08 +00:00
Simon Dardis 0d378a9eed [mips][micromips] Fix (dis)assembly of bc1(t|f)
Previously these instructions were marked codegen only and had
an under-specified instruction description that did not record the
fcc register.

Reviewers: atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D38847

llvm-svn: 315905
2017-10-16 14:20:22 +00:00
Simon Pilgrim 73bd5aa049 Fix or vs || typo.
llvm-svn: 315903
2017-10-16 14:01:59 +00:00
Stefan Maksimovic ee6b5a79dc [mips] Provide alternate predicates for constant synthesis
Ordering of patterns should not be of importance anymore
since the predicates used are mutually exclusive now.

llvm-svn: 315901
2017-10-16 13:18:21 +00:00
Hiroshi Inoue a7eb78b47f [PowerPC] fix up in sign-/zero-extension elimination
This patch fixes a potential problem in my previous commit (https://reviews.llvm.org/rL315888) by adding a null check.

llvm-svn: 315900
2017-10-16 12:11:15 +00:00
Andrew V. Tischenko bfc9061593 This patch is a result of D37262: The issues with X86 prefixes. It closes PR7709, PR17697, PR19251, PR32809 and PR21640. There could be other bugs closed by this patch.
llvm-svn: 315899
2017-10-16 11:14:29 +00:00
Daniel Sanders 01805b6747 [aarch64][globalisel] Fix a crash in selectAddrModeIndexed() caused by incorrect G_FRAME_INDEX handling
The wrong operand was being rendered to the result instruction.

The crash was detected by Bitcode/simd_ops/AArch64_halide_runtime.bc

llvm-svn: 315890
2017-10-16 05:39:30 +00:00
Yonghong Song 6621cf67cf bpf: fix bug on silently truncating 64-bit immediate
We came across an llvm bug when compiling some testcases that 64-bit
immediates are silently truncated into 32-bit and then packed into
BPF_JMP | BPF_K encoding.  This caused comparison with wrong value.

This bug looks to be introduced by r308080.  The Select_Ri pattern is
supposed to be lowered into J*_Ri while the latter only support 32-bit
immediate encoding, therefore Select_Ri should have similar immediate
predicate check as what J*_Ri are doing.

Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 315889
2017-10-16 04:14:53 +00:00
Hiroshi Inoue e3a3e3c9e9 [PowerPC] Eliminate sign- and zero-extensions if already sign- or zero-extended
This patch enables redundant sign- and zero-extension elimination in PowerPC MI Peephole pass.
If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated.
One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in PPC ELF ABI. 
For example of the following simple code, two extsw instructions are generated before the invocation of int_func and before the return. With this patch, both extsw are eliminated.

void int_func(int);
void ii_test(int a) {
    if (a & 1) return int_func(a);
}

Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling the LLVM+CLANG.

Differential Revision: https://reviews.llvm.org/D31319

llvm-svn: 315888
2017-10-16 04:12:57 +00:00
Daniel Sanders ea8711b88e Re-commit r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed*
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.

At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.

The previous commit failed on MSVC due to a failure to convert an
initializer_list to a std::vector. Hopefully, MSVC will accept this version.

Depends on D37457

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D37458

llvm-svn: 315887
2017-10-16 03:36:29 +00:00
Daniel Sanders ce72d611af Revert r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed*
MSVC doesn't like one of the constructors.

llvm-svn: 315886
2017-10-16 02:15:39 +00:00
Daniel Sanders 6735ea86cd [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed*
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.

At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.

Depends on D37457

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D37458

llvm-svn: 315885
2017-10-16 01:16:35 +00:00
Krzysztof Parzyszek 7467119149 [Hexagon] Add LLVM_ATTRIBUTE_UNUSED to operator<<, NFC
This should silence "unused function" warnings.

llvm-svn: 315883
2017-10-16 00:29:47 +00:00
Daniel Sanders df39cbae2f Re-commit r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.

This patch adds support for this in order to enable the import of load/store
patterns.

Depends on D37445

Hopefully fixed the ambiguous constructor that a large number of bots reported.

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D37456

llvm-svn: 315869
2017-10-15 18:22:54 +00:00
Daniel Sanders bb082a36d3 Revert r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator
A large number of bots are failing on an ambiguous constructor call.

llvm-svn: 315866
2017-10-15 17:51:07 +00:00
Daniel Sanders b95b867dd8 [globalisel][tablegen] Import ComplexPattern when used as an operator
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.

This patch adds support for this in order to enable the import of load/store
patterns.

Depends on D37445

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D37456

llvm-svn: 315863
2017-10-15 17:03:36 +00:00
Craig Topper 2738117326 [X86] Remove the SlowBTMem feature flag entirely
Turns out we have no patterns on the instructions that were using this feature flag for other reasons. These instructions are slow on all modern CPUs so it seems unlikely that we will spend any effort supporting these instructions going forward. So we might as well just kill of the feature flag and just fix up the comments.

llvm-svn: 315862
2017-10-15 16:57:33 +00:00
Craig Topper a5af4a64d0 [AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering.
Summary:
This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes.

There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8.

Reviewers: RKSimon, zvi, delena

Reviewed By: delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38714

llvm-svn: 315860
2017-10-15 16:41:17 +00:00
Craig Topper a1f9c9dd8b [X86] Add FeatureSlowBTMem to Haswell, Broadwell, Skylake, Cannonlake, and Knights Landing CPUs.
Summary: I see nothing in Agner Fog's tables to indicate that this improved between Ivy Bridge and Haswell. It's also set for all Atom CPUs so I assume KNL should have it too.

Reviewers: RKSimon, zvi, gadi.haber

Reviewed By: gadi.haber

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38890

llvm-svn: 315859
2017-10-15 16:41:15 +00:00
Aaron Ballman 615eb47035 Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people.
Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1

llvm-svn: 315854
2017-10-15 14:32:27 +00:00
Amjad Aboud c8d67979c0 [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565
Differential Revision: https://reviews.llvm.org/D38359

llvm-svn: 315851
2017-10-15 11:00:56 +00:00
Craig Topper a9cd59fb5d [X86] Lower vselect with constant condition to vector_shuffle even with AVX512 instructions.
Summary:
It's better to use our shuffle lowering code to handle these than loading an immediate into a k-register.

It really feels like this should be a DAG combine optimization rather than a lowering operation, but that's a problem for another day.

Reviewers: RKSimon, delena, zvi

Reviewed By: delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38932

llvm-svn: 315849
2017-10-15 06:39:07 +00:00
Vitaly Buka 7450398e01 Remove unused variables
llvm-svn: 315847
2017-10-15 05:35:02 +00:00
Davide Italiano 76067588dc [Hexagon] Mark RangeTree::dump() with LLVM_DUMP_METHOD.
GCC otherwise emits a "defined but not used" warning on the
member function.

llvm-svn: 315838
2017-10-14 23:46:01 +00:00
Konstantin Zhuravlyov 8c18f5b3d4 AMDGPU: Don't use TargetStreamer if it has not been initialized
Fixes cfe/trunk/test/Misc/backend-resource-limit-diagnostics.cl
test after r315808

We may hit few other similar issues, but I want to discuss good
solution offline.

llvm-svn: 315830
2017-10-14 22:16:26 +00:00
Simon Pilgrim 36fe00ee17 [X86][SSE] Don't attempt to reduce the imul vector width of odd sized vectors (PR34947)
llvm-svn: 315825
2017-10-14 19:57:19 +00:00
Bruno Cardoso Lopes caac2fbd19 Revert "[AArch64][RegisterBankInfo] Use the statically computed mappings for COPY"
This reverts commit r315781, breaks:
http://green.lab.llvm.org/green/job/Compiler_Verifiers_GlobalISEL/9882

llvm-svn: 315823
2017-10-14 19:31:03 +00:00
Konstantin Zhuravlyov a01d8b0b63 AMDGPU: Bring HSA metadata on par with the specification
Differential Revision: https://reviews.llvm.org/D38753

llvm-svn: 315821
2017-10-14 19:03:51 +00:00
Simon Pilgrim f5b9f353c3 Pull out repeated calls to VT.getVectorNumElements(). NFCI.
llvm-svn: 315818
2017-10-14 17:37:42 +00:00
Simon Pilgrim cded82837d Use DAG::getBitcast() helper. NFCI.
llvm-svn: 315815
2017-10-14 17:14:42 +00:00
Konstantin Zhuravlyov 219066bab8 AMDGPU: Improve note directive verification in assembler
- Do not allow amd_amdgpu_isa directives on non-amdgcn architectures
  - Do not allow amd_amdgpu_hsa_metadata on non-amdhsa OSes
  - Do not allow amd_amdgpu_pal_metadata on non-amdpal OSes

Differential Revision: https://reviews.llvm.org/D38750

llvm-svn: 315812
2017-10-14 16:15:28 +00:00
Konstantin Zhuravlyov eda425edd4 AMDGPU: Do not emit deprecated notes for code object v3
Differential Revision: https://reviews.llvm.org/D38749

llvm-svn: 315810
2017-10-14 15:59:07 +00:00
Konstantin Zhuravlyov 9c05b2bc3b AMDGPU: Add support for isa version note
- Emit NT_AMD_AMDGPU_ISA
  - Add assembler parsing for isa version directive
    - If isa version directive does not match command line arguments, then return error

Differential Revision: https://reviews.llvm.org/D38748

llvm-svn: 315808
2017-10-14 15:40:33 +00:00
Simon Pilgrim f367c27d2d [X86][SSE] Support combining AND(EXTRACT(SHUF(X)), C) -> EXTRACT(SHUF(X))
If we are applying a byte mask to a value extracted from a shuffle, see if we can combine the mask into shuffle.

Fixes the last issue with PR22415

llvm-svn: 315807
2017-10-14 15:01:36 +00:00
Craig Topper f7e777763d [X86] Add patterns for vzmovl+cvtpd2dq/cvttpd2dq with a load.
llvm-svn: 315802
2017-10-14 07:04:48 +00:00
Craig Topper 61010a85b8 [X86] Add AVX512 versions of VCVTPD2PS to load folding tables.
llvm-svn: 315801
2017-10-14 05:55:43 +00:00
Craig Topper ee277e190c [X86] Add patterns for vzmovl+cvtpd2ps with a load.
llvm-svn: 315800
2017-10-14 05:55:42 +00:00
Craig Topper aec05a9303 [X86] Remove some patterns for bitcasted alignednonedtemporalloads.
These select the same instruction as the non-bitcasted pattern. So this provides no additional value.

llvm-svn: 315799
2017-10-14 04:18:11 +00:00
Craig Topper 009f0aaeb0 [X86] Remove unnecessary bitconverts as the root of patterns for zero extended VCVTPD2UDQZ128rr and VCVTTPD2UDQZ128rr.
We don't need a bitconvert as a root pattern in these cases. The types in the other parts of the pattern are sufficient to express the behavior of these instructions.

llvm-svn: 315798
2017-10-14 04:18:10 +00:00
Craig Topper d746747d03 [X86] Add additional patterns for folding loads with 128-bit VCVTDQ2PD and VCVTUDQ2PD.
This matches the patterns we have for the SSE/AVX version.

This is a prerequisite for D38714.

llvm-svn: 315797
2017-10-14 04:18:09 +00:00
Craig Topper 134241e4af [X86] Add AVX512 flavors of VCVTDQ2PD plus VCVTUDQ2PD to the load folding tables.
llvm-svn: 315796
2017-10-14 04:18:08 +00:00
Craig Topper 0b64e67b0d [X86] Remove TB_NO_REVERSE from VCVTDQ2PDYrr and VCVTPS2PDYrr in the load folding tables.
I believe these were added incorrectly under the belief that the load size was smaller than the input register size, but that's not true.

llvm-svn: 315795
2017-10-14 04:18:07 +00:00
Craig Topper 53b0cb7fa9 [X86] Add an additional isel pattern to CVTDQ2PDrm/VCVTDQ2PDrm to enable load folding without the peephole pass.
This pattern is already used in AVX512VL version of these instructions. Though AVX512VL version is missing other patterns.

llvm-svn: 315794
2017-10-14 04:18:06 +00:00
Quentin Colombet dc2da06c55 [AArch64][RegisterBankInfo] Use the statically computed mappings for COPY
We use to resort on the generic implementation to get the mappings for
COPYs. The generic implementation resorts on table lookup and
dynamically allocated objects to get the valid mappings.

Given we already know how to map G_BITCAST and have the static mappings
for them, use that code path for COPY as well. This is much more
efficient.

Improve the compile time of RegBankSelect by up to 20%.

Note: When we eventually generate all the mappings via TableGen, we
wouldn't have to do that dance to shave compile time. The intent of this
change was to make sure that moving to static structure really pays off.

NFC.

llvm-svn: 315781
2017-10-14 00:43:48 +00:00
Krzysztof Parzyszek a7e5c84590 Revert r315763: "[Hexagon] Rangify some loops, NFC"
Broke some builds (using libstdc++).

llvm-svn: 315769
2017-10-13 21:57:11 +00:00
Craig Topper f6c69564e7 [X86] Use X86ISD::VBROADCAST in place of v2f64 X86ISD::MOVDDUP when AVX2 is available
This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations.

For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP.

We may be able to use this for AVX1 as well which would allow us to remove more isel patterns.

I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP.

Differential Revision: https://reviews.llvm.org/D38836

llvm-svn: 315768
2017-10-13 21:56:48 +00:00
Krzysztof Parzyszek 63ca5d6196 [Hexagon] Rangify some loops, NFC
llvm-svn: 315763
2017-10-13 21:43:00 +00:00
Daniel Sanders 11300cead8 [globalisel][tablegen] Add support for fpimm and import of APInt/APFloat based ImmLeaf.
Summary:
There's only a tablegen testcase for IntImmLeaf and not a CodeGen one
because the relevant rules are rejected for other reasons at the moment.
On AArch64, it's because there's an SDNodeXForm attached to the operand.
On X86, it's because the rule either emits multiple instructions or has
another predicate using PatFrag which cannot easily be supported at the
same time.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D36569

llvm-svn: 315761
2017-10-13 21:28:03 +00:00
Matt Arsenault e11d8aca77 AMDGPU: Implement hasBitPreservingFPLogic
llvm-svn: 315754
2017-10-13 21:10:22 +00:00
Benjamin Kramer 9f21ca6361 [Hexagon] Avoid unused variable warnings in release builds.
No functionality change intended.

llvm-svn: 315749
2017-10-13 20:46:14 +00:00
Matt Arsenault 550c66d10f AMDGPU: Look for src mods before fp_extend
When selecting modifiers for mad_mix instructions,
look at fneg/fabs that occur before the conversion.

llvm-svn: 315748
2017-10-13 20:45:49 +00:00
Daniel Sanders 649c585710 [aarch64] Support APInt and APFloat in ImmLeaf subclasses and make AArch64 use them.
Summary:
The purpose of this patch is to expose more information about ImmLeaf-like
PatLeaf's so that GlobalISel can learn to import them. Previously, ImmLeaf
could only be used to test int64_t's produced by sign-extending an APInt.
Other tests on immediates had to use the generic PatLeaf and extract the
constant using C++.

With this patch, tablegen will know how to generate predicates for APInt,
and APFloat. This will allow it to 'do the right thing' for both SelectionDAG
and GlobalISel which require different methods of extracting the immediate
from the IR.

This is NFC for SelectionDAG since the new code is equivalent to the
previous code. It's also NFC for FastISel because FastIselShouldIgnore is 1
for the ImmLeaf subclasses. Enabling FastIselShouldIgnore == 0 for these new
subclasses will require a significant re-factor of FastISel.

For GlobalISel, it's currently NFC because the relevant code to import the
affected rules is not yet present. This will be added in a later patch.

Depends on D36086

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: bjope, aemerson, rengolin, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D36534

llvm-svn: 315747
2017-10-13 20:42:18 +00:00
Matt Arsenault 4d70754e3c AMDGPU: Implement isFPExtFoldable
This helps match v_mad_mix* in some cases.

llvm-svn: 315744
2017-10-13 20:18:59 +00:00
Matt Arsenault f2db97d8fa DAG: Add opcode and source type to isFPExtFree
This is only currently used for mad/fma transforms.
This is the only case where it should be used for AMDGPU,
so add an opcode to be sure.

llvm-svn: 315740
2017-10-13 19:55:45 +00:00
Krzysztof Parzyszek 7c9c05888c [Hexagon] Minimize number of repeated constant extenders
Each constant extender requires an extra instruction, which adds to the
code size and also reduces the number of available slots in an instruction
packet. In most cases, the value of a repeated constant extender could be
loaded into a register, and the instructions using the extender could be
replaced with their counterparts that use that register instead.

This patch adds a pass that tries to reduce the number of constant
extenders, including extenders which differ only in an immediate offset
known at compile time, e.g. @global and @global+12.

llvm-svn: 315735
2017-10-13 19:02:59 +00:00
Craig Topper 5d692917f4 [X86] Add initial skeleton support for knm cpu
This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now its an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it.

Differential Revision: https://reviews.llvm.org/D38811

llvm-svn: 315722
2017-10-13 18:10:17 +00:00
Craig Topper 5805fb3dfc [X86] Fix some inconsistent formatting in the processor feature lists.
llvm-svn: 315696
2017-10-13 16:06:06 +00:00
Craig Topper 54541c4675 [X86] Add ProcIntelBDW to BroadwellProc class not BDWFeatures class.
This isn't a property we want inherited.

llvm-svn: 315695
2017-10-13 16:04:08 +00:00
Krzysztof Parzyszek a0f2f7c413 [Hexagon] Add patterns for cmpb/cmph with immediate arguments
Patch by Sumanth Gundapaneni.

llvm-svn: 315692
2017-10-13 15:43:12 +00:00
Craig Topper 0817346aef [X86] Stop creating CMOV nodes with a second MVT::Glue result
Summary: We seem to inconsistently create CMOV nodes some with a Glue result and some without. But I can't find any cases that use the Glue result. So I've tried to remove all the place that did this.

Reviewers: RKSimon, spatel, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38664

llvm-svn: 315686
2017-10-13 15:28:35 +00:00
Craig Topper bf0de9d3b6 [X86] Remove patterns that select unmasked vbroadcastf2x32/vbroadcasti2x32. Prefer vbroadcastsd/vpbroadcastq instead.
There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table.

llvm-svn: 315674
2017-10-13 06:07:10 +00:00
Matthias Braun bb8507e63c Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine"
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.

This reverts commit r315633.

llvm-svn: 315637
2017-10-12 22:57:28 +00:00
Matthias Braun 3a9c114b24 TargetMachine: Merge TargetMachine and LLVMTargetMachine
Merge LLVMTargetMachine into TargetMachine.

- There is no in-tree target anymore that just implements TargetMachine
  but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
  case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
  interface.

Differential Revision: https://reviews.llvm.org/D38489

llvm-svn: 315633
2017-10-12 22:28:54 +00:00
Craig Topper 060cb43721 [X86] Add CLWB intrinsic. llvm part
llvm-svn: 315613
2017-10-12 20:08:31 +00:00
Wei Ding 5676acad9e Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Differential Revision: http://reviews.llvm.org/D37348

llvm-svn: 315610
2017-10-12 19:37:14 +00:00
Konstantin Zhuravlyov 70303c011f AMDGPU/NFC: Move AMDGPU specific note types to ELF.h
Differential Revision: https://reviews.llvm.org/D38747

llvm-svn: 315608
2017-10-12 18:59:54 +00:00
Artem Belevich 3bafc2f0d9 [NVPTX] Implemented wmma intrinsics and instructions.
WMMA = "Warp Level Matrix Multiply-Accumulate".
These are the new instructions introduced in PTX6.0 and available
on sm_70 GPUs.

Differential Revision: https://reviews.llvm.org/D38645

llvm-svn: 315601
2017-10-12 18:27:55 +00:00
Reid Kleckner 1a7e387849 [codeview] Don't emit FPO data in funclet prologues
Attempt 3 to work around bugs in FPO data with funclets.

llvm-svn: 315600
2017-10-12 18:20:35 +00:00
Konstantin Zhuravlyov 63e87f5a02 AMDGPU: Fix warnings introduced in r315526
llvm-svn: 315596
2017-10-12 17:34:05 +00:00
Lei Huang 0724fea2da [PowerPC] Add profitablilty check for conversion to mtctr loops
Add profitability checks for modifying counted loops to use the mtctr instruction.

The latency of mtctr is only justified if there are more than 4 comparisons that
will be removed as a result.  Usually counted loops are formed relatively early
and before unrolling, so most low trip count loops often don't survive.  However
we want to ensure that if they do, we do not mistakenly update them to mtctr loops.

Use CodeMetrics to ensure we are only doing this for small loops with small trip counts.

Differential Revision: https://reviews.llvm.org/D38212

llvm-svn: 315592
2017-10-12 16:43:33 +00:00
Tim Renouf c8ffffe462 [AMDGPU] For amdpal, widen interpolation mode workaround
Summary:
The interpolation mode workaround ensures that at least one
interpolation mode is enabled in PSInputAddr. It does not also check
PSInputEna on the basis that the user might enable bits in that
depending on run-time state.

However, for amdpal os type, the user does not enable some bits after
compilation based on run-time states; the register values being
generated here are the final ones set in the hardware. Therefore, apply
the workaround to PSInputAddr and PSInputEnable together. (The case
where a bit is set in PSInputAddr but not in PSInputEnable is where the
frontend set up an input arg for a particular interpolation mode, but
nothing uses that input arg. Really we should have an earlier pass that
removes such an arg.)

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37758

llvm-svn: 315591
2017-10-12 16:16:41 +00:00
Don Hinton 3e0199f7eb [dump] Remove NDEBUG from test to enable dump methods [NFC]
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.

Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.

Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.

Differential Revision: https://reviews.llvm.org/D38406

llvm-svn: 315590
2017-10-12 16:16:06 +00:00
Sanjay Patel 3a72909b7e [x86] replace isEqualTo with == for efficiency
This is a follow-up suggested in D37534.
Patch by Yulia Koval.

llvm-svn: 315589
2017-10-12 16:15:38 +00:00
Simon Pilgrim 0903085ec3 [X86][SSE] Pull out repeated INSERT_VECTOR_ELT code from LowerBUILD_VECTOR v16i8/v8i16 insertion. NFCI.
llvm-svn: 315587
2017-10-12 15:52:01 +00:00
Reid Kleckner d925f98375 Speculative build fix 2
llvm-svn: 315542
2017-10-12 00:28:28 +00:00
Wei Mi 1736efd16a Revert r307036 because of PR34919.
llvm-svn: 315540
2017-10-12 00:24:52 +00:00
Reid Kleckner 9c0126ec0b Speculative build fix, apparently I built llc without my patch applied to test it
llvm-svn: 315539
2017-10-12 00:20:50 +00:00
Reid Kleckner 29cfa6f11f [codeview] Disable FPO in functions using EH funclets
Funclets are emitted by WinException which doesn't have access to
X86TargetStreamer so it's hard to make a quick fix for this.

llvm-svn: 315538
2017-10-12 00:06:57 +00:00
Reid Kleckner c18c12e385 Fix AMDGPU build issue
llvm-svn: 315535
2017-10-11 23:53:36 +00:00
Reid Kleckner ec4ff24f79 [X86] Sink X86AsmPrinter ctor into .cpp file, NFC
I keep adding and removing code here, so let's sink it.

llvm-svn: 315534
2017-10-11 23:53:12 +00:00
Lang Hames 2241ffa43c [MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr.
MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that,
and allows us to remove the last instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.

llvm-svn: 315531
2017-10-11 23:34:47 +00:00
Konstantin Zhuravlyov 516651b154 AMDGPU/NFC: Minor clean ups in HSA metadata
- Use HSA metadata streamer directly from AMDGPUAsmPrinter
  - Make naming consistent with PAL metadata

Differential Revision: https://reviews.llvm.org/D38746

llvm-svn: 315526
2017-10-11 22:59:35 +00:00
Konstantin Zhuravlyov c3beb6a075 AMDGPU/NFC: Minor clean ups in PAL metadata
- Move PAL metadata definitions to AMDGPUMetadata
  - Make naming consistent with HSA metadata

Differential Revision: https://reviews.llvm.org/D38745

llvm-svn: 315523
2017-10-11 22:41:09 +00:00
Konstantin Zhuravlyov a63b0f9d20 AMDGPU/NFC: Rename code object metadata as HSA metadata
- Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change)
  - Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer
  - Introduce HSAMD namespace
  - Other minor name changes in function and test names

llvm-svn: 315522
2017-10-11 22:18:53 +00:00
Reid Kleckner 9cdd4df81a [codeview] Implement FPO data assembler directives
Summary:
This adds a set of new directives that describe 32-bit x86 prologues.
The directives are limited and do not expose the full complexity of
codeview FPO data. They are merely a convenience for the compiler to
generate more readable assembly so we don't need to generate tons of
labels in CodeGen. If our prologue emission changes in the future, we
can change the set of available directives to suit our needs. These are
modelled after the .seh_ directives, which use a different format that
interacts with exception handling.

The directives are:
  .cv_fpo_proc _foo
  .cv_fpo_pushreg ebp/ebx/etc
  .cv_fpo_setframe ebp/esi/etc
  .cv_fpo_stackalloc 200
  .cv_fpo_endprologue
  .cv_fpo_endproc
  .cv_fpo_data _foo

I tried to follow the implementation of ARM EHABI CFI directives by
sinking most directives out of MCStreamer and into X86TargetStreamer.
This helps avoid polluting non-X86 code with WinCOFF specific logic.

I used cdb to confirm that this can show locals in parent CSRs in a few
cases, most importantly the one where we use ESI as a frame pointer,
i.e. the one in http://crbug.com/756153#c28

Once we have cdb integration in debuginfo-tests, we can add integration
tests there.

Reviewers: majnemer, hans

Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D38776

llvm-svn: 315513
2017-10-11 21:24:33 +00:00
Krzysztof Parzyszek c4a9a8d8e0 [Hexagon] Make sure that new-value jump is packetized with producer
llvm-svn: 315510
2017-10-11 21:20:43 +00:00
Lei Huang 263dc4ef3a [PowerPC] Utilize DQ-Form instructions for spill/restore and fix FrameIndex elimination to only use `lis/addi` if necessary.
Currently we produce a bunch of unnecessary code when emitting the
prologue/epilogue for spills/restores.  Namely, if the load from stack
slot/store to stack slot instruction is an X-Form instruction, we will
always produce an LIS/ORI sequence for the stack offset.

Furthermore, we have not exploited the P9 vector D-Form loads/stores for this
purpose.

This patch address both issues.

Specifying the D-Form load as the instruction to use for stack spills/reloads
should be safe because:

1. The stack should be aligned according to the ABI
2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will
   check for the offset being a multiple of 16 and will convert it to an
   X-Form instruction if it isn't.

Differential Revision : https://reviews.llvm.org/D38758

llvm-svn: 315500
2017-10-11 20:20:58 +00:00
Sanjay Patel 6c0aef77aa [x86] avoid infinite loop from SoftenFloatOperand (PR34866)
Legalization of fp128 assumes things that we should have asserts for,
so that's another potential improvement.

Differential Revision: https://reviews.llvm.org/D38771

llvm-svn: 315485
2017-10-11 18:24:21 +00:00