We were handling the non-hidden case in lib/Target/TargetMachine.cpp,
but the hidden case was handled in architecture dependent code and
only X86_64 and AArch64 were covered.
While it is true that some code sequences in some ABIs might be able
to produce the correct value at runtime, that doesn't seem to be the
common case.
I left the AArch64 code in place since it also forces a got access for
non-pic code. It is not clear if that is needed, but it is probably
better to change that in another commit.
llvm-svn: 316799
As far as I can tell, this matches gcc: -mfloat-abi determines the
calling convention for all functions except those explicitly defined as
soft-float in the ARM RTABI.
This change only affects cases where the user specifies -mfloat-abi to
override the default calling convention derived from the target triple.
Fixes https://bugs.llvm.org//show_bug.cgi?id=34530.
Differential Revision: https://reviews.llvm.org/D38299
llvm-svn: 316708
If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/UDIVREM8_SEXT_HREG operation.
This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case.
This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG.
Differential Revision: https://reviews.llvm.org/D38275
llvm-svn: 316702
Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it.
Differential Revision: https://reviews.llvm.org/D38756
llvm-svn: 316685
Summary:
This causes a segfault on ARM when (I think) the pass manager is used multiple times.
Reset set the (last) current section to NULL without saving the corresponding LastEMSInfo back into the map. The next use of the streamer then save the LastEMSInfo for the NULL section leaving the LastEMSInfo mapping for the last current section (the one that was there before the reset) NULL which cause the LastEMSInfo to be set to NULL when the section is being used again.
The reuse of the section (pointer) might mean that the map was holding dangling pointers previously which is why I went for clearing the map and resetting the info, making it as similar to the state right after the constructor run as possible. The AArch64 one doesn't have segfault (since LastEMS isn't a pointer) but it seems to have the same issue.
The segfault is likely caused by https://reviews.llvm.org/D30724 which turns LastEMSInfo into a pointer. As mentioned above, it seems that the actual issue was older though.
No test is included since the test is believed to be too complicated for such an obvious fix and not worth doing.
Reviewers: llvm-commits, shankare, t.p.northover, peter.smith, rengolin
Reviewed By: rengolin
Subscribers: mgorny, aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38588
llvm-svn: 316679
Currently we do not represent runtime preemption in the IR, which has several
drawbacks:
1) The semantics of GlobalValues differ depending on the object file format
you are targeting (as well as the relocation-model and -fPIE value).
2) We have no way of disabling inlining of run time interposable functions,
since in the IR we only know if a function is link-time interposable.
Because of this llvm cannot support elf-interposition semantics.
3) In LTO builds of executables we will have extra knowledge that a symbol
resolved to a local definition and can't be preemptable, but have no way to
propagate that knowledge through the compiler.
This patch adds preemptability specifiers to the IR with the following meaning:
dso_local --> means the compiler may assume the symbol will resolve to a
definition within the current linkage unit and the symbol may be accessed
directly even if the definition is not within this compilation unit.
dso_preemptable --> means that the compiler must assume the GlobalValue may be
replaced with a definition from outside the current linkage unit at runtime.
To ease transitioning dso_preemptable is treated as a 'default' in that
low-level codegen will still do the same checks it did previously to see if a
symbol should be accessed indirectly. Eventually when IR producers emit the
specifiers on all Globalvalues we can change dso_preemptable to mean 'always
access indirectly', and remove the current logic.
Differential Revision: https://reviews.llvm.org/D20217
llvm-svn: 316668
These instructions were previously marked as codegen only preventing
them from being assembled as microMIPS or disassembled.
Reviewers: atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D39123
llvm-svn: 316656
PR35071 exposed the fact that MipsInstrInfo::removeBranch did not walk past
debug instructions when removing branches for the control flow optimizer, which
lead to duplicated conditional branches. If the target of the branch was a
removable block, only the conditional branch in the terminating position would
have it's MBB operands updated, leaving the first branch with a dangling MBB
operand. The MIPS long branch pass would then trigger an assertion when
attempting to examine the instruction with dangling MBB operand.
This resolves PR35071.
Thanks to Alex Richardson for reporting the issue!
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39288
llvm-svn: 316654
Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0.
This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities.
Differential Revision: https://reviews.llvm.org/D38941
llvm-svn: 316647
In getOffsetRange, Max can be set to 0 to force the extender replacement
to be at or below the original value. This would cause the new offset to
be non-negative, which is preferred for memory instructions (to reduce
the likelihood of it getting constant-extended due to predication). The
problem happens when the range is shifted by an offset (present in the
instruction being examined) and the offset is negative. The entire range
for the allowable deviation will then be strictly negative. This creates
a problem, since 0 is assumed to be a valid deviation.
llvm-svn: 316601
As indicated by Table 1-1 in Intel Architecture Instruction Set Extensions and Future Features Programming Reference from October 2017.
llvm-svn: 316592
We were generating BLX for all the calls, which was incorrect in most
cases. Update ARMCallLowering to generate BL for direct calls, and BLX,
BX_CALL or BMOVPCRX_CALL for indirect calls.
llvm-svn: 316570
Swap the compare operands if the lhs is a shift and the rhs isn't,
as in arm and T2 the shift can be performed by the compare for its
second operand.
Differential Revision: https://reviews.llvm.org/D39004
llvm-svn: 316562
Previously, the dllimport attribute did the right thing in terms
of treating it as a pointer to a value, but this makes sure the
names get mangled properly, and calls to such functions load the
function from the __imp_ pointer.
This is based on SVN r212431 and r212430 where the same was
implemented for ARM.
Differential Revision: https://reviews.llvm.org/D38530
llvm-svn: 316555
If particular target supports volatile memory access operations, we can
avoid AS casting to generic AS. Currently it's only enabled in NVPTX for
loads and stores that access global & shared AS.
Differential Revision: https://reviews.llvm.org/D39026
llvm-svn: 316495
Adding the scheduling information for the Browadwell (BDW) CPU target.
This patch adds the instruction scheduling information for the Broadwell (BDW) architecture target by adding the file X86SchedBroadwell.td located under the X86 Target.
We used the scheduling information retrieved from the Broadwell architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each BDW instruction.
The patch continues the scheduling replacement and insertion effort started with the SandyBridge (SNB) target in r310792, the Haswell (HSW) target in r311879, the SkylakeClient (SKL) target in rL313613 + rL315978 and the SkylakeServer (SKX) in rL315175.
Performance fluctuations may be expected due to code alignment effects.
Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39054
Change-Id: If6f799e5ff60e1091c8d43b05ea78c53581bae01
llvm-svn: 316492
If we have the situation where a Swap feeds a Splat we can sometimes change the
index on the Splat and then remove the Swap instruction.
Fixed the test case that was failing and recommit after pulling the original
commit.
Original revision is here: https://reviews.llvm.org/D39009
llvm-svn: 316478
In BPF backend, we try to optimize away redundant
trunc operations so that kernel verifier rewrite
remains valid. Previous implementation only works
for a single function.
This patch fixed the issue for multiple functions.
It clears internal map data structure before
performing optimization for each function.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 316469
By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits.
llvm-svn: 316448
Report a diagnostic when we fail to parse a shift in a memory operand because
the shift type is not an identifier. Without this, we were silently ignoring
the whole instruction.
Differential revision: https://reviews.llvm.org/D39237
llvm-svn: 316441
Summary:
r264440 added or/and patterns for storing -1 or 0 with the intention of decreasing code size. However,
X86CallFrameOptimization does not recognize these memory accesses so it will not replace them with push's when profitable.
This patch fixes this problem by teaching X86CallFrameOptimization these store 0/-1 idioms.
An alternative fix would be to prevent the 'store 0/1 idioms' patterns from firing when accessing the stack. This would save
the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire we may result
in cases where neither X86CallFrameOptimization not the patterns for 'store 0/1 idioms' fire.
Fixes pr34863
Reviewers: DavidKreitzer, guyblank, aymanmus
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38738
llvm-svn: 316431