Commit Graph

279832 Commits

Author SHA1 Message Date
Craig Topper c1ec57c3e2 [X86] Remove unneeded code from combineGatherScatter that used to delte SIGN_EXTEND_INREG nodes created during legalization of v2i1/v4i1 masks on KNL.
v2i1/v4i1 are now legal on KNL so no sign_extend_inreg is generated.

llvm-svn: 321968
2018-01-07 18:34:08 +00:00
Craig Topper d58c165545 [X86] Make v2i1 and v4i1 legal types without VLX
Summary:
There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type.

It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway.

This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.

We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.

I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.

There's definitely room for improvement with some follow up patches.

Reviewers: RKSimon, zvi, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41560

llvm-svn: 321967
2018-01-07 18:20:37 +00:00
Marshall Clow 464de6ca09 Mark the transparent version set::count() as const. Thanks to Ivan Matek for the bug report.
llvm-svn: 321966
2018-01-07 17:39:57 +00:00
Jonas Hahnfeld 3ffca790f6 Correct types of pointers to doacross_num_done
This field is defined as kmp_int32, so we should use neither
pointers to kmp_int64 nor 64 bit atomic instructions.
(Found while testing on a Raspberry Pi, 32 bit ARM)

Differential Revision: https://reviews.llvm.org/D41656

llvm-svn: 321964
2018-01-07 16:54:36 +00:00
Dimitry Andric 672f7adce5 Add pre-C++11 is_constructible wrappers for 3 arguments
Summary:
After rL319736 for D28253 (which fixes PR28929), gcc cannot compile `<memory>` anymore in pre-C+11 modes, complaining:

```
In file included from /usr/include/c++/v1/memory:648:0,
                 from test.cpp:1:
/usr/include/c++/v1/memory: In static member function 'static std::__1::shared_ptr<_Tp> std::__1::shared_ptr<_Tp>::make_shared(_A0&, _A1&, _A2&)':
/usr/include/c++/v1/memory:4365:5: error: wrong number of template arguments (4, should be at least 1)
     static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in make_shared" );
     ^
In file included from /usr/include/c++/v1/memory:649:0,
                 from test.cpp:1:
/usr/include/c++/v1/type_traits:3198:29: note: provided for 'template<class _Tp, class _A0, class _A1> struct std::__1::is_constructible'
 struct _LIBCPP_TEMPLATE_VIS is_constructible
                             ^~~~~~~~~~~~~~~~
In file included from /usr/include/c++/v1/memory:648:0,
                 from test.cpp:1:
/usr/include/c++/v1/memory:4365:5: error: template argument 1 is invalid
     static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in make_shared" );
     ^
/usr/include/c++/v1/memory: In static member function 'static std::__1::shared_ptr<_Tp> std::__1::shared_ptr<_Tp>::allocate_shared(const _Alloc&, _A0&, _A1&, _A2&)':
/usr/include/c++/v1/memory:4444:5: error: wrong number of template arguments (4, should be at least 1)
     static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in allocate_shared" );
     ^
In file included from /usr/include/c++/v1/memory:649:0,
                 from test.cpp:1:
/usr/include/c++/v1/type_traits:3198:29: note: provided for 'template<class _Tp, class _A0, class _A1> struct std::__1::is_constructible'
 struct _LIBCPP_TEMPLATE_VIS is_constructible
                             ^~~~~~~~~~~~~~~~
In file included from /usr/include/c++/v1/memory:648:0,
                 from test.cpp:1:
/usr/include/c++/v1/memory:4444:5: error: template argument 1 is invalid
     static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in allocate_shared" );
     ^
```

This is also reported in https://bugs.freebsd.org/224946 (FreeBSD is apparently one of the very few projects that regularly builds programs against libc++ with gcc).

The reason is that the static assertions are invoking `is_constructible` with three arguments, while gcc does not have the built-in `is_constructible` feature, and the pre-C++11 `is_constructible` wrappers in `<type_traits>` only provide up to two arguments.

I have added additional wrappers for three arguments, modified the `is_constructible` entry point to take three arguments instead, and added a simple test to is_constructible.pass.cpp.

Reviewers: EricWF, mclow.lists

Reviewed By: EricWF

Subscribers: krytarowski, cfe-commits, emaste

Differential Revision: https://reviews.llvm.org/D41805

llvm-svn: 321963
2018-01-07 16:45:11 +00:00
Hal Finkel 0f1314c5ee [LV][VPlan] NFC patch to move LoopVectorizationPlanner class out of LoopVectorize.cpp
Another small step forward to move VPlan stuff outside of LoopVectorize.cpp.

VPlanBuilder.h is renamed to LoopVectorizationPlanner.h
LoopVectorizationPlanner class is moved from LoopVectorize.cpp to
LoopVectorizationPlanner.h LoopVectorizationCostModel::VectorizationFactor
class is moved to LoopVectorizationPlanner.h (used by the planner class) ---
this needs further streamlining work in later patches and thus all I did was
take it out of the CostModel class and moved to the header file.  The callback
function had to stay inside LoopVectorize.cpp since it calls an
InnerLoopVectorizer member function declared in it.  Next Steps: Make
InnerLoopVectorizer, LoopVectorizationCostModel, and other classes more modular
and more aligned with VPlan direction, in small increments.

Previous step was: r320900 (https://reviews.llvm.org/D41045)

Patch by Hideki Saito, thanks!

Differential Revision: https://reviews.llvm.org/D41420

llvm-svn: 321962
2018-01-07 16:02:58 +00:00
Florian Hahn 55be37e7d4 [CodeExtractor] Use subset of function attributes for extracted function.
In addition to target-dependent attributes, we can also preserve a
white-listed subset of target independent function attributes. The white-list
excludes problematic attributes, most prominently:

* attributes related to memory accesses, as alloca instructions
  could be moved in/out of the extracted block

* control-flow dependent attributes, like no_return or thunk, as the
  relerelevant instructions might or might not get extracted.

Thanks @efriedma and @aemerson for providing a set of attributes that cannot be
propagated.


Reviewers: efriedma, davidxl, davide, silvas

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D41334

llvm-svn: 321961
2018-01-07 11:22:25 +00:00
Benjamin Kramer a674416d90 Remove outdated doxygen comment [-Wdocumentation]
No functionality change.

llvm-svn: 321960
2018-01-07 09:11:16 +00:00
Craig Topper d461aefe5f [PowerPC] Add an ISD::TRUNCATE to the legalization for ppc_is_decremented_ctr_nonzero
Summary:
I believe legalization is really expecting that ReplaceNodeResults will return something with the same type as the thing that's being legalized. Ultimately, it uses the output to replace the uses in the DAG so the type should match to make that work.

There are two relevant cases here. When crbits are enabled, then i1 is a legal type and getSetCCResultType should return i1. In this case, the truncate will be between i1 and i1 and should be removed (SelectionDAG::getNode does this). Otherwise, getSetCCResultType will be i32 and the legalizer will promote the truncate to be i32 -> i32 which will be similarly removed.

With this fixed we can remove some code from PromoteIntRes_SETCC that seemed to only exist to deal with the intrinsic being replaced with a larger type without changing the other operand. With the truncate being used for connectivity this doesn't happen anymore.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: nemanjai, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D41654

llvm-svn: 321959
2018-01-07 07:51:36 +00:00
Craig Topper a21f551109 [X86] Add the 16 and 8-bit CRC32 instructions to the load folding tables.
llvm-svn: 321958
2018-01-07 06:48:20 +00:00
John McCall 56331e2864 Simplify the internal API for checking whether swiftcall passes a type indirectly and expose that API externally.
llvm-svn: 321957
2018-01-07 06:28:49 +00:00
Craig Topper d0859a03b5 [X86] Correct the load folding flags for xmm fp->mmx conversion instructions.
The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment.

llvm-svn: 321956
2018-01-07 06:24:30 +00:00
Craig Topper aa73941176 [X86] Add TB_NO_REVERSE to some scalar intrinsic instructions in the load folding table.
llvm-svn: 321955
2018-01-07 06:24:29 +00:00
Craig Topper 85657d59a9 [X86] Don't put any EVEX_B instructions in the tablegen generated load folding tables.
EVEX_B means different things for memory and register forms. The instructions should not be considered equivalent.

llvm-svn: 321954
2018-01-07 06:24:28 +00:00
Craig Topper 89293a2a94 [X86] Add 128 and 256-bit VPOPCNTD/Q instructions to load folding tables.
llvm-svn: 321953
2018-01-07 06:24:27 +00:00
Craig Topper a124ab10ef [X86] Add some 8 and 16-bit instructions to the load folding tables.
llvm-svn: 321952
2018-01-07 06:24:25 +00:00
Craig Topper 11aede13db [X86] Add EVEX vcvtph2ps to the load folding tables.
llvm-svn: 321951
2018-01-07 06:24:24 +00:00
Craig Topper 40cc8338f7 [X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex versions of cvtps2ph to the store folding tables.
The memory form of the xmm->xmm version only writes 64-bits. If we use it in the folding tables and its get used for a stack spill, only half the slot will be written. Then a reload may read all 128-bits which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros.

llvm-svn: 321950
2018-01-07 06:24:23 +00:00
Craig Topper 8fa800b834 [X86] Add CMP8ri8 to load folding tables.
llvm-svn: 321949
2018-01-07 06:24:21 +00:00
Benjamin Kramer dfecbe9ad8 Add support for a limited subset of TS 18661-3 math builtins.
These just overloads for _Float128. They're supported by GCC 7 and used
by glibc. APFloat support is already there so just add the overloads.

__builtin_copysignf128
__builtin_fabsf128
__builtin_huge_valf128
__builtin_inff128
__builtin_nanf128
__builtin_nansf128

This is the same support that GCC has, according to the documentation,
but limited to _Float128.

llvm-svn: 321948
2018-01-06 21:49:54 +00:00
Craig Topper cf93feb981 [X86] Remove assembler predicates from all AVX512 related feature flags.
We don't do fine grained feature control like this on features prior to AVX512.

We do still have checks in place in the assembly parser itself that prevents %zmm references or %xmm16-31 from being parsed without at least -mattr=avx512f. Same for rounding control and mask operands. That will prevent the table matcher from matching for any instructions that need those features and that's probably good enough.

llvm-svn: 321947
2018-01-06 21:45:30 +00:00
Craig Topper 61d8a60e23 [X86] Remove memory forms of EVEX encoded vcvttss2si/vcvttsd2si from asm matcher table.
This is also needed to fix PR35837.

llvm-svn: 321946
2018-01-06 21:27:25 +00:00
Craig Topper 0f4ccb7806 [X86] Add load folding pattern to EVEX vcvttss2si/vcvtsd2si.
llvm-svn: 321945
2018-01-06 21:02:26 +00:00
Craig Topper 90353a9f42 [X86] Remove an unnecessary VCVTTSD2SIrrb/VCVTSS2SIrrb instruction with no isel pattern that only existed for the assembler. Use VCVTTSD2SIrrb_Int instead.
For consistency use the _Int version of VCVTTSD2SIrr_Int and VCVTTSD2SIrm_Int for the assembler as well.

llvm-svn: 321944
2018-01-06 21:02:22 +00:00
Florian Hahn a82eef2363 [InlineFunction] Preserve calling convention when forwarding VarArgs.
Reviewers: efriedma, rnk, davide

Reviewed By: rnk, davide

Differential Revision: https://reviews.llvm.org/D41556

llvm-svn: 321943
2018-01-06 20:56:27 +00:00
Florian Hahn de10e6e064 [InlineFunction] Preserve attributes when forwarding VarArgs.
Reviewers: rnk, efriedma

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D41555

llvm-svn: 321942
2018-01-06 20:46:00 +00:00
Lang Hames 0b93cd7351 [ORC] Remove AsynchronousSymbolQuery while I debug an issue on one of the
builders.

llvm-svn: 321941
2018-01-06 20:14:22 +00:00
Florian Hahn 80788d8088 [InlineFunction] Inline vararg functions that do not access varargs.
If the varargs are not accessed by a function, we can inline the
function.

Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D41335

llvm-svn: 321940
2018-01-06 19:45:40 +00:00
Craig Topper a49c354a08 [X86] Remove memory forms of EVEX encoded vcvtsd2si/vcvtss2si from the assembler matcher table
We should always prefer the VEX encoded version of these instructions. There is no advantage to the EVEX version.

Fixes PR35837.

llvm-svn: 321939
2018-01-06 19:20:33 +00:00
Craig Topper ad89541ae9 [TableGen] Make the ambiguous match debug messages from the AsmMatcherEmitter slightly more useful.
Don't report ambiguous matches on different variants. Print the variant number in the output.

llvm-svn: 321938
2018-01-06 19:20:32 +00:00
Saleem Abdulrasool ae20590c17 Correct mistake in pragma usage for Windows
The autolink pragma was missing the pragma name itself.  This would
result in the pragma being silently dropped.

llvm-svn: 321937
2018-01-06 18:47:03 +00:00
Sanjay Patel 26a6fcde83 [InstCombine] relax use constraint for min/max (~a, ~b) --> ~min/max(a, b)
In the minimal case, this won't remove instructions, but it still improves
uses of existing values.

In the motivating example from PR35834, it does remove instructions, and
sets that case up to be optimized by something like D41603:
https://reviews.llvm.org/D41603

llvm-svn: 321936
2018-01-06 17:34:22 +00:00
Sanjay Patel f7e775291e [InstCombine] add more tests for max(~a, ~b) and PR35834; NFC
llvm-svn: 321935
2018-01-06 17:14:46 +00:00
Sanjay Patel 5a48aef3f0 [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)
This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325

We're trading branch and compares for loads and logic ops. 
This makes the code smaller and hopefully faster in most cases.

The 24-byte test shows an interesting construct: we load the trailing scalar 
elements into vector registers and generate the same pcmpeq+movmsk code that 
we expected for a pair of full vector elements (see the 32- and 64-byte tests).

Differential Revision: https://reviews.llvm.org/D41714

llvm-svn: 321934
2018-01-06 16:16:04 +00:00
Gabor Horvath b77bc6bb8b [analyzer] Fix some check's output plist not containing the check name
Differential Revision: https://reviews.llvm.org/D41538

llvm-svn: 321933
2018-01-06 10:51:00 +00:00
Michal Gorny 15a86ef5af [test] Use full PATH lookup for tools
Use full PATH when looking up test tools rather than just llvm tools
directory. r320813 has added a lookup for 'lldb-test' which is part
of LLDB tools rather than LLVM, and therefore is not present
in llvm_tools_dir before LLDB is installed.

While technically we could introduce separate per-directory lookup
logic, there is no real reason not to use the PATH formed earlier here,
and this is what other tools are doing.

Differential Revision: https://reviews.llvm.org/D41726

llvm-svn: 321932
2018-01-06 10:20:25 +00:00
Craig Topper b18d6221ba [X86] Rename the EVEX encoded GFNI instructions to start with a 'V'. NFC
This makes the names consistent with the mnemonics like every other instruction.

llvm-svn: 321931
2018-01-06 07:18:08 +00:00
Craig Topper 36d8da3358 [X86] When parsing rounding mode operands, provide a proper end location so we don't crash when trying to print an error message using it.
llvm-svn: 321930
2018-01-06 06:41:07 +00:00
Craig Topper 8c2ea74e74 [X86] Call lowerShuffleAsRepeatedMaskAndLanePermute from lowerV4I64VectorShuffle.
llvm-svn: 321929
2018-01-06 06:08:04 +00:00
Craig Topper af1d257571 [X86] Run dos2unix on a test file. NFC
llvm-svn: 321928
2018-01-06 06:08:02 +00:00
Lang Hames 4b6cae190d [ORC] Yet more debugging output to diagnose test failures.
llvm-svn: 321927
2018-01-06 05:19:07 +00:00
Lang Hames d80ce40d3d [ORC] Fix the counter type on SymbolStringPool entries.
Hopefully this will fix the build failure in
http://lab.llvm.org:8011/builders/llvm-mips-linux/builds/3417

llvm-svn: 321926
2018-01-06 05:19:06 +00:00
Lang Hames 623bd270cc [ORC] More debugging output to track down tester failures.
llvm-svn: 321925
2018-01-06 04:35:51 +00:00
Richard Trieu 719757039f Test case for r321396
Any hashing for methods should be able to compile this test case without
emitting an error.  Since the class and method come from the same header from
each module, there should be no messages about ODR violations.

llvm-svn: 321924
2018-01-06 03:20:59 +00:00
Billy Robert O'Neal III 231d15e086 Add casts to prevent narrowing warnings.
llvm-svn: 321923
2018-01-06 02:50:03 +00:00
Billy Robert O'Neal III 1e1195dce5 [libcxx] [test] Remove nonstandard things and resolve warnings in Xxx_scan tests
Reviewed as https://reviews.llvm.org/D41748

* These tests use function objects from functional, back_inserter from iterator, and equal from algorithm, so add those headers.
* The use of iota targeting vector<unsigned char> with an int parameter triggers warnings on MSVC++ assigning an into a unsigned char&; so change the parameter to unsigned char with a static_cast.
* Avoid naming unary_function in identity here as that is removed in '17. (This also fixes naming _VSTD, _NOEXCEPT_, and other libcxx-isms)
* Change the predicate in the transform tests to add_ten so that problems with multiple application are caught.

llvm-svn: 321922
2018-01-06 02:18:20 +00:00
Richard Smith a263c346e5 Serialize the IDNS for a UsingShadowDecl rather than recomputing it.
Attempting to recompute it are doomed to fail because the IDNS of a declaration
is not necessarily preserved across serialization and deserialization (in turn
because whether a friend declaration is visible depends on whether some prior
non-friend declaration exists).

llvm-svn: 321921
2018-01-06 01:07:05 +00:00
Lang Hames 0f74d273b0 [ORC] Temporarily adding some redundant asserts / debug output to aid in
debugging a tester failure.

llvm-svn: 321920
2018-01-06 01:06:07 +00:00
Lang Hames c2ba9059d0 [ORC] Fix a think-o in the current AsynchronousSymbolQuery test.
This *should* be a no-op as far as the current failure is concerned, but needs
to be fixed anyway.

llvm-svn: 321919
2018-01-06 01:06:05 +00:00
Vedant Kumar 1f6f5f1df9 [Debugify] Handled unsized types
llvm-svn: 321918
2018-01-06 00:37:01 +00:00