Commit Graph

3170 Commits

Author SHA1 Message Date
Jinsong Ji 42eea2b69b [AIX] Enable int128 in 64 bit mode
This patch remove the override in AIX target,
so the int128 is enabled in 64 bit mode or with ForceEnableInt128.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D111078
2021-10-15 16:23:04 +00:00
Qiu Chaofan 9e9b0f4621 [PowerPC] Support ppc-asm-full-reg-names for AIX
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D94282
2021-10-15 12:22:44 +08:00
Albion Fung b4b9f9b4b3 [PowerPC] Emit dcbt and dcbtst in place of their extended mnemonics on AIX
On AIX, the system assembler does not support the extended mnemonics
dcbtt and dcbtstt. This patch stops them from being emitted on
AIX and emits the base mnemonics instead, dcbt X, X, 16 and
dcbtstt X, X, 16 respectively.

Differential revision: https://reviews.llvm.org/D111258
2021-10-12 15:47:57 -05:00
Roland Froese 28e648b29e [PowerPC] Simplify PPC codegen test pre-inc-disable.ll
Simplify the test case to make it easier to look at. Change from auto-generated
checks to targeted manual checks to reduce sensitivity to register allocation
and scheduling changes.

Differential Revision: https://reviews.llvm.org/D111333
2021-10-12 20:12:31 +00:00
Qiu Chaofan 1f253e4fd6 Pre-commit pre-inc-disable.ll to avoid dead code
The case was added in 728e139, testing it outputs lxsibzx instead of
lbzux. Here we need some minimal update to avoid DCE in future patches.
2021-10-12 16:03:17 +08:00
Chen Zheng 4ead32d1cf [PowerPC] update test case using the scripts; nfc 2021-10-10 14:39:20 +00:00
Qiu Chaofan 573531fb1f Fix typo of colon to semicolon in lit tests 2021-10-09 10:03:50 +08:00
Chen Zheng 5f4c91583e [XCOFF] support DWARF for 32-bit XCOFF for object output
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D97184
2021-10-08 02:35:11 +00:00
Stefan Pintilie 740086596c [PowerPC] Fix issue with lowering byval parameters.
Lowering of byval parameters with sizes that are not represented by a single
store require multiple stores to properly address the correct size of the
parameter.

Sizes that cannot be done with a single store are 3 bytes, 5 bytes, 6 bytes,
7 bytes. It is not correct to simply perform an 8 byte store and for these
elements because then the store would be larger than the element and alias
analysis would assume that this is undefined behaivour and return NoAlias
for them.

This patch adds the correct stores so that the size of the store is not larger
than the size of the element.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D108795
2021-10-06 13:19:15 -05:00
Kamau Bridgeman 8737c74fab [PowerPC][MMA] Allow MMA builtin types in pre-P10 compilation units
This patch allows the use of __vector_quad and __vector_pair, PPC MMA builtin
types, on all PowerPC 64-bit compilation units. When these types are
made available the builtins that use them automatically become available
so semantic checking for mma and pair vector memop __builtins is also
expanded to ensure these builtin function call are only allowed on
Power10 and new architectures. All related test cases are updated to
ensure test coverage.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D109599
2021-10-05 07:59:32 -05:00
Jinsong Ji 933e2469a2 [PowerPC][NFC] Remove reg name option in int128 test
The test is generated by script, so we don't really need the regname to
be meaniful here.

AIX doesn't support the reg name option, removing it for now so that we
can reuse the CHECKs for AIX triple as well.
2021-10-04 15:31:25 +00:00
Stefan Pintilie 4fc2f4979c [PowerPC] Fix __builtin_ppc_load2r to return short instead of int.
This patch fixes the return value of the builtin __builtin_ppc_load2r to
correctly return short instead of int.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D110771
2021-10-04 06:17:02 -05:00
Philip Reames f39978b84f [SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec
This fixes a violation of the wrap flag rules introduced in c4048d8f. This is an alternate fix to D106852.

The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined.

In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation.

One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece.

The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output enough to have tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.)

Differential Revision: https://reviews.llvm.org/D109845
2021-10-03 15:19:33 -07:00
Stefan Pintilie 40f382ad10 [NFC][PowerPC] Add test case for byval store.
Added a test case for situations where a struct of size 1-7 bytes is
passed by value.
2021-10-01 16:54:29 -05:00
Albion Fung 4195ed9959 [PowerPC] Improved codegen related to xscvdpsxws/xscvdpuxws
This patch removes the uneccessary mf/mtvsr generated in conjunction
with xscvdpsxws/xscvdpuxws.

Differential revision: https://reviews.llvm.org/D109902
2021-09-30 14:31:00 -05:00
Quinn Pham 67a3d1e275 [PowerPC] swdiv builtins for XL compatibility
This patch is in a series of patches to provide builtins for compatibility with
the XL compiler. This patch implements the software divide builtin as
wrappers for a floating point divide. XL provided these builtins because it
didn't produce software estimates by default at `-Ofast`. When compiled
with `-Ofast` these builtins will produce the software estimate for divide.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D106959
2021-09-29 11:31:07 -05:00
Nemanja Ivanovic 09b67aa1c3 [PowerPC] Implement builtin for vbpermd
The instruction has similar semantics to vbpermq but for doublewords.
It was added in Power9 and the ABI documents the builtin.

Differential revision: https://reviews.llvm.org/D107899
2021-09-29 06:34:31 -05:00
Quinn Pham 70391b3468 [PowerPC] FP compare and test XL compat builtins.
This patch is in a series of patches to provide builtins for
compatability with the XL compiler. This patch adds builtins for compare
exponent and test data class operations on floating point values.

Reviewed By: #powerpc, lei

Differential Revision: https://reviews.llvm.org/D109437
2021-09-28 11:01:51 -05:00
Albion Fung 3678df5ae6 [PowerPC][NFC] Add test case in preparation for codegen change
This test case tests doubles inserted into vector ints,
and help make apparent the optimizations a future patch
will make.
2021-09-24 12:17:50 -05:00
Victor Huang 6e1aaf18af [PowerPC] Mark splat immediate instructions as rematerializable
This patch marks splat immediate instructions XXSPLTIW and XXSPLTIDP as
rematerializable to prevent MachineLICM from moving them out of loops.

Reviewed By: lei, amy

Differential revision: https://reviews.llvm.org/D108823
2021-09-24 12:03:34 -05:00
Chen Zheng 957514eb9e [PowerPC] add testcase for chain commoning; nfc 2021-09-22 05:08:00 +00:00
Chen Zheng ffa9fa9ed2 [PowerPC] prepare for udpate form with non-const increment.
This is a follow-up of D105872. Now we are able to prepare for update
form with non-const increment.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106032
2021-09-22 02:54:28 +00:00
Amy Kwan 2af57b6099 [PowerPC] Add prefix load pattern for fpext to v2f64
This patch adds a prefixed load pattern involving v2f32 fpext v2f64, where we
are dealing with a value with an offset that fits into a 34-bit signed immediate.
A reduced test case is also added to patch that tests the pattern, in which the
pattern is tested in the big endian CHECKs of the newly added test.

Differential Revision: https://reviews.llvm.org/D109887
2021-09-21 12:45:24 -05:00
Chen Zheng 80584f0056 Revert "[PowerPC][ELF] make sure local variable space does not overlap with parameter save area"
This causes mix-compile issues on PowerPC Linux.

This reverts commit 324bd467a2.
2021-09-17 08:07:18 +00:00
Matt Arsenault 54d755a034 DAG: Fix incorrect folding of fmul -1 to fneg
The fmul is a canonicalizing operation, and fneg is not so this would
break denormals that need flushing and also would not quiet signaling
nans. Fold to fsub instead, which is also canonicalizing.
2021-09-14 21:25:02 -04:00
Matt Arsenault 4a36e96c3f RegAllocGreedy: Account for reserved registers in num regs heuristic
This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.

There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging
some allocation failures where overlapping tuples seem to not be
handled correctly.

The test changes are mostly noise. There are a handful of x86 tests
that look like regressions with an additional spill, and a handful
that now avoid a spill. The worst looking regression is likely
test/Thumb2/mve-vld4.ll which introduces a few additional
spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll
shows a massive improvement by completely eliminating a large number
of spills inside a loop.
2021-09-14 21:00:29 -04:00
Amy Kwan 5041a485b9 [PowerPC] Exploit Prefixed Load/Stores using the refactored Load/Store Implementation
This patch exploits the prefixed load and store instructions utilizing the
refactored load/store implementation introduced in D93370.

Prefixed load and store instructions are emitted whenever we are loading or
storing a value with an offset that fits into a 34-bit signed immediate.
Patterns for the prefixed load and stores are added in this patch, as well as
the implementation that detects when we are loading and storing a value with an
offset that fits in 34-bits.

Differential Revision: https://reviews.llvm.org/D96075
2021-09-14 08:39:49 -05:00
Chen Zheng 946e69d253 [PowerPC] prepare more loop load/store instructions
PPCLoopInstrFormPrep pass now can prepare for load store instructions
in a loop whose increment is not a constant integer.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D105872
2021-09-14 05:00:48 +00:00
Amy Kwan 351a0d8a90 [PowerPC] Update PC-Relative Load/Store Patterns to use the refactored Load/Store Implementation
This patch updates the PC-Relative load and store patterns to utilize the
refactored load/store implementation introduced in D93370.

PC-Relative implementation has been added to PPCISelLowering.cpp, and also the
patterns in PPCInstrPrefix.td have been updated and no longer require AddedComplexity.
All existing test cases pass with this update.

Differential Revision: https://reviews.llvm.org/D95116
2021-09-09 15:38:42 -05:00
David Green d8d24c64fe [DAG] Fix GT -> GE condition when creating SetCC
79845ed6df folded some setcc(ashr) conditions to setcc, but got
the condition for NE incorrect, using GT where it should be using GE.
2021-09-08 12:41:51 +01:00
Victor Huang 4a226529e2 [PowerPC] Fixed the crash due to early if conversion with fixed CR fields
This patch adds a fix to do early if conversion to select when
conditional branch not using physical register to prevent the crash when
expanding ISEL instruction.

Reviewed By: lei, kamaub, PowerPC

Differential revision: https://reviews.llvm.org/D108302
2021-09-07 10:51:03 -05:00
Jinsong Ji 042a6564d3 [PowerPC] Guard XSRSP in P8 for FastISel
This is exposed by enabling FastIsel on 64bit AIX.
We are generating XSRSP regardless of the arch,
which may be wrong when -mcpu=pwr7.

The fix is to guard the generation in P8 only.

Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D109365
2021-09-07 15:17:51 +00:00
David Green 1b83aaaefa [DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold
This appears to produce better code, even if the condition may need to
be replicated.
2021-09-05 16:18:31 +01:00
David Green 8523fb96a6 [DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Given a select_cc producing a constant and a invertion of the constant
for a comparison more than zero, we can produce an xor with ashr
instead, which produces smaller code. The ashr either sets all bits or
clear all bits depending on if the value is negative. This is then xor'd
with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ

This includes a OneUseCheck on the Cmp, which seems to make thinks a
little worse and will be removed in a followup.

Differential Revision: https://reviews.llvm.org/D109149
2021-09-05 16:04:01 +01:00
David Green 79845ed6df [DAG] Fold setcc eq with ashr to compare to zero.
Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 ->
set_cc setlt X, 0 to prevent some regressions later on when folding
select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C

Differential Revision: https://reviews.llvm.org/D109214
2021-09-05 14:06:47 +01:00
David Green 7801d7963d [DAG] Add tests for select_cc and setcc with constant patterns. 2021-09-05 10:17:21 +01:00
Qiu Chaofan d0f9553ef5 [PowerPC] Enable fast-isel on AIX 64 subtarget
This patch basically enables fast-isel for AIX 64-bit subtarget
(previously enabled only for ELF 64). The initial motivation is to
introduce branch folding to AIX generated code for correct debug
behavior. I also saw some compiling time improvement in a few LLVM
test-suite benchmarks. (toast, dbms, cjpeg, burg, etc.)

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D98844
2021-09-03 11:33:45 +08:00
Chen Zheng 34badc409c Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount."
This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and
is not a right patch according to comments in D91724

This reverts commit 42eaf4fe0a.
2021-09-03 02:55:43 +00:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Kai Luo 5eaebd5d64 [PowerPC] Implement quadword atomic load/store
Add support to load/store i128 atomically.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D105612
2021-09-01 06:55:40 +00:00
Nick Desaulniers d8b6ae072d [PPCISelLowering] avoid emitting libcalls to __mulodi4()
Similar to D108842, D108844, and D108926.

__has_builtin(builtin_mul_overflow) returns true for 32b PPC targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks ppc44x_defconfig + CONFIG_BLK_DEV_NBD=y builds of the Linux
kernel that are using builtin_mul_overflow with these types for these
targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support these builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D108936
2021-08-31 11:09:58 -07:00
Qiu Chaofan 3bdd850d0c [PowerPC] Set branch/call instructions as no hasSideEffects
PowerPC can model these instructions, so we don't need this flag set.

Reviewed By: shchenz, jsji

Differential Revision: https://reviews.llvm.org/D71983
2021-08-30 12:23:35 +08:00
Chen Zheng 324bd467a2 [PowerPC][ELF] make sure local variable space does not overlap with parameter save area
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D105271
2021-08-27 01:58:41 +00:00
Eli Friedman 09dcf31d74 [NFC] Add tests for i128 fshl on a few targets.
In preparation for D108058.
2021-08-24 11:43:35 -07:00
Simon Pilgrim 6de0b55188 [DAG] TransformFPLoadStorePair - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for fp->int load/store copies) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

Differential Revision: https://reviews.llvm.org/D108318
2021-08-24 13:11:27 +01:00
Zarko Todorovski b575bbd0c7 [PowerPC][AIX] Set the HasAlloca flag in the AIX Traceback Table only if R31 is used as a frame pointer
After c063946476 usage of R31 doesn't necessarily mean
that alloca is used. The `TracebackTable::IsAllocaUsedMask` flag should be set only
when R31 is used as a frame pointer.

On AIX the `function calls alloca' bit seems to be set whenever R31 is
set up as a frame pointer, even when there is no alloca call.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D108141
2021-08-23 15:20:41 -04:00
Kai Luo 7165e6713f [PowerPC] Use int64_t to represent stack object offset and frame size
This is the first step to enable PPC64 support huge frame size(>2G). Also fix an assertion error for frame size, i.e.,`int x; !isInt<32>(x);` should be always evaluated false, so the guard code for frame size is impossible to hit.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D107435
2021-08-23 02:13:21 +00:00
David Green d10f23a25d [ISel] Expand saddsat and ssubsat via asr and xor
This changes the lowering of saddsat and ssubsat so that instead of
using:
  r,o = saddo x, y
  c = setcc r < 0
  s = c ? INTMAX : INTMIN
  ret o ? s : r
into using asr and xor to materialize the INTMAX/INTMIN constants:
  r,o = saddo x, y
  s = ashr r, BW-1
  x = xor s, INTMIN
  ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD

This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.

Differential Revision: https://reviews.llvm.org/D105853
2021-08-19 16:08:07 +01:00
Simon Pilgrim ba1f6ffb8d [PowerPC] Regenerate 2007-09-08-unaligned.ll test checks 2021-08-18 19:54:11 +01:00
Qiu Chaofan 2e5e33807e Pre-commit frem test in PowerPC 2021-08-18 17:52:53 +08:00