(Target)FrameLowering::determineCalleeSaves can be called multiple
times. I don't think it should have side-effects as creating stack
objects and setting global MachineFunctionInfo state as it is doing
today (in other back-ends as well).
This moves the creation of stack objects from determineCalleeSaves to
assignCalleeSavedSpillSlots.
Differential Revision: https://reviews.llvm.org/D41703
llvm-svn: 321987
Previously, in r320472, I moved the calculation of section offsets and sizes
for compressed debug sections into maybeCompress, which happens before
assignAddresses, so that the compression had the required information. However,
I failed to take account of relocations that patch such sections. This had two
effects:
1. A race condition existed when a debug section referred to a different debug
section (see PR35788).
2. References to symbols in non-debug sections would be patched incorrectly.
This is because the addresses of such symbols are not calculated until after
assignAddresses (this was a partial regression caused by r320472, but they
could still have been broken before, in the event that a custom layout was used
in a linker script).
assignAddresses does not need to know about the output section size of
non-allocatable sections, because they do not affect the value of Dot. This
means that there is no longer a reason not to support custom layout of
compressed debug sections, as far as I'm aware. These two points allow for
delaying when maybeCompress can be called, removing the need for the loop I
previously added to calculate the section size, and therefore the race
condition. Furthermore, by delaying, we fix the issues of relocations getting
incorrect symbol values, because they have now all been finalized.
llvm-svn: 321986
Previously, the COFF driver would call exit(1) from the
ErrorHandler in the case of a link error, even if
CanExitEarly=false was specified. Now it initializes
the ErrorHandler in the same way that the ELF driver does.
Patch by Andrew Kelley.
Differential Revision: https://reviews.llvm.org/D41803
llvm-svn: 321983
LLD previously used to handle dynamic lists and version scripts in the
exact same way, even though they have very different semantics for
shared libraries and subtly different semantics for executables. r315114
untangled their semantics for executables (building on previous work to
correct their semantics for shared libraries). With that change, dynamic
lists won't set the default version to VER_NDX_LOCAL, and so resetting
the version to VER_NDX_GLOBAL in scanShlibUndefined is unnecessary.
This was causing an issue because version scripts containing `local: *`
work by setting the default version to VER_NDX_LOCAL, but scanShlibUndefined
would override this default, and therefore symbols which should have
been local would end up in the dynamic symbol table, which differs from
both bfd and gold's behavior. gold silently keeps the symbol hidden in
such a scenario, whereas bfd issues an error. I prefer bfd's behavior
and plan to implement that in LLD in a follow-up (and the test case
added here will be updated accordingly).
Differential Revision: https://reviews.llvm.org/D41639
llvm-svn: 321982
These tests assumes availability of external symbols provided by the
C++ library, but those won't be available in case when the C++ library
is statically linked because lli itself doesn't need these.
This uses llvm-readobj -needed-libs to check if C++ library is linked as
shared library and exposes that information as a feature to lit.
Differential Revision: https://reviews.llvm.org/D41272
llvm-svn: 321981
Check whether we are comparing the same entity, not merely the same
declaration, and don't assume that weak declarations resolve to distinct
entities.
llvm-svn: 321976
The approach was never discussed, I wasn't able to reproduce this
non-determinism, and the original author went AWOL.
After a discussion on the ML, Philip suggested to revert this.
llvm-svn: 321974
I had removed the qualifiers around the autogenerated folding table so I could compare with the manual table, but didn't intend to commit the change.
llvm-svn: 321971
Allow SimplifyDemandedBits to use TargetLoweringOpt::computeKnownBits to look through bitcasts. This can help simplifying in some cases where bitcasts of constants generated during or after legalization can't be folded away, and thus didn't get picked up by SimplifyDemandedBits. This fixes PR34620, where a redundant pand created during legalization from lowering and lshr <16xi8> wasn't being simplified due to the presence of a bitcasted build_vector as an operand.
Committed on the behalf of @sameconrad (Sam Conrad)
Differential Revision: https://reviews.llvm.org/D41643
llvm-svn: 321969
Summary:
There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type.
It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway.
This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.
We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.
I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.
There's definitely room for improvement with some follow up patches.
Reviewers: RKSimon, zvi, guyblank
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41560
llvm-svn: 321967
This field is defined as kmp_int32, so we should use neither
pointers to kmp_int64 nor 64 bit atomic instructions.
(Found while testing on a Raspberry Pi, 32 bit ARM)
Differential Revision: https://reviews.llvm.org/D41656
llvm-svn: 321964
Summary:
After rL319736 for D28253 (which fixes PR28929), gcc cannot compile `<memory>` anymore in pre-C+11 modes, complaining:
```
In file included from /usr/include/c++/v1/memory:648:0,
from test.cpp:1:
/usr/include/c++/v1/memory: In static member function 'static std::__1::shared_ptr<_Tp> std::__1::shared_ptr<_Tp>::make_shared(_A0&, _A1&, _A2&)':
/usr/include/c++/v1/memory:4365:5: error: wrong number of template arguments (4, should be at least 1)
static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in make_shared" );
^
In file included from /usr/include/c++/v1/memory:649:0,
from test.cpp:1:
/usr/include/c++/v1/type_traits:3198:29: note: provided for 'template<class _Tp, class _A0, class _A1> struct std::__1::is_constructible'
struct _LIBCPP_TEMPLATE_VIS is_constructible
^~~~~~~~~~~~~~~~
In file included from /usr/include/c++/v1/memory:648:0,
from test.cpp:1:
/usr/include/c++/v1/memory:4365:5: error: template argument 1 is invalid
static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in make_shared" );
^
/usr/include/c++/v1/memory: In static member function 'static std::__1::shared_ptr<_Tp> std::__1::shared_ptr<_Tp>::allocate_shared(const _Alloc&, _A0&, _A1&, _A2&)':
/usr/include/c++/v1/memory:4444:5: error: wrong number of template arguments (4, should be at least 1)
static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in allocate_shared" );
^
In file included from /usr/include/c++/v1/memory:649:0,
from test.cpp:1:
/usr/include/c++/v1/type_traits:3198:29: note: provided for 'template<class _Tp, class _A0, class _A1> struct std::__1::is_constructible'
struct _LIBCPP_TEMPLATE_VIS is_constructible
^~~~~~~~~~~~~~~~
In file included from /usr/include/c++/v1/memory:648:0,
from test.cpp:1:
/usr/include/c++/v1/memory:4444:5: error: template argument 1 is invalid
static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in allocate_shared" );
^
```
This is also reported in https://bugs.freebsd.org/224946 (FreeBSD is apparently one of the very few projects that regularly builds programs against libc++ with gcc).
The reason is that the static assertions are invoking `is_constructible` with three arguments, while gcc does not have the built-in `is_constructible` feature, and the pre-C++11 `is_constructible` wrappers in `<type_traits>` only provide up to two arguments.
I have added additional wrappers for three arguments, modified the `is_constructible` entry point to take three arguments instead, and added a simple test to is_constructible.pass.cpp.
Reviewers: EricWF, mclow.lists
Reviewed By: EricWF
Subscribers: krytarowski, cfe-commits, emaste
Differential Revision: https://reviews.llvm.org/D41805
llvm-svn: 321963
Another small step forward to move VPlan stuff outside of LoopVectorize.cpp.
VPlanBuilder.h is renamed to LoopVectorizationPlanner.h
LoopVectorizationPlanner class is moved from LoopVectorize.cpp to
LoopVectorizationPlanner.h LoopVectorizationCostModel::VectorizationFactor
class is moved to LoopVectorizationPlanner.h (used by the planner class) ---
this needs further streamlining work in later patches and thus all I did was
take it out of the CostModel class and moved to the header file. The callback
function had to stay inside LoopVectorize.cpp since it calls an
InnerLoopVectorizer member function declared in it. Next Steps: Make
InnerLoopVectorizer, LoopVectorizationCostModel, and other classes more modular
and more aligned with VPlan direction, in small increments.
Previous step was: r320900 (https://reviews.llvm.org/D41045)
Patch by Hideki Saito, thanks!
Differential Revision: https://reviews.llvm.org/D41420
llvm-svn: 321962
In addition to target-dependent attributes, we can also preserve a
white-listed subset of target independent function attributes. The white-list
excludes problematic attributes, most prominently:
* attributes related to memory accesses, as alloca instructions
could be moved in/out of the extracted block
* control-flow dependent attributes, like no_return or thunk, as the
relerelevant instructions might or might not get extracted.
Thanks @efriedma and @aemerson for providing a set of attributes that cannot be
propagated.
Reviewers: efriedma, davidxl, davide, silvas
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41334
llvm-svn: 321961
Summary:
I believe legalization is really expecting that ReplaceNodeResults will return something with the same type as the thing that's being legalized. Ultimately, it uses the output to replace the uses in the DAG so the type should match to make that work.
There are two relevant cases here. When crbits are enabled, then i1 is a legal type and getSetCCResultType should return i1. In this case, the truncate will be between i1 and i1 and should be removed (SelectionDAG::getNode does this). Otherwise, getSetCCResultType will be i32 and the legalizer will promote the truncate to be i32 -> i32 which will be similarly removed.
With this fixed we can remove some code from PromoteIntRes_SETCC that seemed to only exist to deal with the intrinsic being replaced with a larger type without changing the other operand. With the truncate being used for connectivity this doesn't happen anymore.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: nemanjai, llvm-commits, kbarton
Differential Revision: https://reviews.llvm.org/D41654
llvm-svn: 321959
The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment.
llvm-svn: 321956
The memory form of the xmm->xmm version only writes 64-bits. If we use it in the folding tables and its get used for a stack spill, only half the slot will be written. Then a reload may read all 128-bits which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros.
llvm-svn: 321950
These just overloads for _Float128. They're supported by GCC 7 and used
by glibc. APFloat support is already there so just add the overloads.
__builtin_copysignf128
__builtin_fabsf128
__builtin_huge_valf128
__builtin_inff128
__builtin_nanf128
__builtin_nansf128
This is the same support that GCC has, according to the documentation,
but limited to _Float128.
llvm-svn: 321948
We don't do fine grained feature control like this on features prior to AVX512.
We do still have checks in place in the assembly parser itself that prevents %zmm references or %xmm16-31 from being parsed without at least -mattr=avx512f. Same for rounding control and mask operands. That will prevent the table matcher from matching for any instructions that need those features and that's probably good enough.
llvm-svn: 321947
If the varargs are not accessed by a function, we can inline the
function.
Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41335
llvm-svn: 321940