Instead of asserting when using the def_cfa directive with a register
different from fp, fall back on DWARF.
Easily triggered with:
.cfi_def_cfa x1, 32;
rdar://40249694
Differential Revision: https://reviews.llvm.org/D47593
llvm-svn: 333667
Noticed while fixing PR37426: for splat rotations (rotation by a uniform value) it's better to just expand back to shift ops than to perform them as a general non-uniform rotation.
llvm-svn: 333661
Summary:
{FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI} were using WriteMicrocoded.
- I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
- For ZnVer1 and Atom, values were transferred from InstRWs.
- For SLM and BtVer2, I've guessed some values :(
Reviewers: RKSimon, craig.topper, andreadb
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D47585
llvm-svn: 333656
Summary:
- I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
- For ZnVer1 and Atom, values were transferred from `InstRW`s.
- For SLM and BtVer2, values are from Agner.
This is split off from https://reviews.llvm.org/D47377
Reviewers: RKSimon, andreadb
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D47523
llvm-svn: 333642
This improves splat rotations (rotation by a uniform value), to avoid having to use the generic non-uniform shift code (extension to PR37426).
llvm-svn: 333641
Keep track of achieved occupancy in SIMachineFunctionInfo.
At the moment we have a lot of duplicated or even missing code to
query and maintain occupancy info. Record it in the MFI and
query it in a single call. Interfaces:
- getOccupancy() - returns current recorded achieved occupancy.
- getMinAllowedOccupancy() - returns the lesser of the achieved occupancy
and the lowest occupancy we are ready to tolerate. For example, if
a kernel is memory bound we are ready to tolerate 4 waves.
- limitOccupancy() - record occupancy level if we have to lower it.
- increaseOccupancy() - record occupancy if scheduler managed to
increase the occupancy.
MFI takes care of integrating different checks affecting occupancy,
including LDS use and the waves-per-eu attribute. Note that the scheduler
starts with register pressure that is not yet known, so it has to record
either a limit or an increase in occupancy after it is done. Later passes
can simply query the resulting value.
The new interface is used in the active scheduler and is NFC with respect to its work.
Changes are also made to experimental schedulers to use it and record
an occupancy after they are done. Before this change, waves-per-eu was
ignored by the experimental schedulers and the tolerance window for
memory-bound kernels was not used.
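A minimal sketch of how a scheduler would use the new record; this is hypothetical pass code, with only the four interface names above taken from the change:
```
// Hypothetical scheduler code; only the four method names are from this change.
SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
unsigned StartOcc = MFI->getOccupancy();         // current recorded occupancy
unsigned MinOcc = MFI->getMinAllowedOccupancy(); // e.g. 4 waves if memory bound
// ... run scheduling, compute the occupancy actually achieved ...
if (AchievedOcc < StartOcc)
  MFI->limitOccupancy(AchievedOcc);              // record that we had to lower it
else if (AchievedOcc > StartOcc)
  MFI->increaseOccupancy(MF, AchievedOcc);       // record a scheduler win
```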
Differential Revision: https://reviews.llvm.org/D47509
llvm-svn: 333629
Make functions cache line aligned.
v2: use "ensureAlignment"
Fixes GPU hangs since r333219:
"AMDGPU: Split R600 AsmPrinter code into its own class"
Differential Revision: https://reviews.llvm.org/D47516
llvm-svn: 333622
This is to make it clear what kind of bugs the LegalizerInfo::verifier
is able to catch, and to test its output.
Reviewers: aemerson, qcolombet
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D46338
llvm-svn: 333597
Created the IsSplatValue helper from the splat detection code in LowerScalarVariableShift as a first NFC step towards improving support for splat rotations, which is an extension of PR37426.
llvm-svn: 333580
This was just emitting loads with the ABI alignment
for the raw type. The true alignment is often better,
especially when an illegal vector type was scalarized.
The better alignment allows using a scalar load
more often.
llvm-svn: 333558
When inserting waitcnts, if necessary, the waitcnt pass forces convergence
for a loop. Previously, that kicked in after more than two passes over a loop,
which doesn't account for loops with many bottom blocks. So, increase the threshold to
(n+1), where n is the number of bottom blocks. This gives the pass an
opportunity to consider the contribution of each bottom block to the overall
loop, before the forced convergence potentially kicks in.
Differential Revision: https://reviews.llvm.org/D47488
llvm-svn: 333556
Support for Clang lowering of fused intrinsics. This patch:
1. Removes bindings to clang fma intrinsics.
2. Introduces new LLVM unmasked intrinsics with rounding mode:
int_x86_avx512_vfmadd_pd_512
int_x86_avx512_vfmadd_ps_512
int_x86_avx512_vfmaddsub_pd_512
int_x86_avx512_vfmaddsub_ps_512
supported with a new intrinsic type (INTR_TYPE_3OP_RM).
3. Introduces new x86 fmaddsub/fmsubadd folding.
4. Introduces new tests for code emitted by the sequences introduced in the Clang part.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper, RKSimon
Differential Revision: https://reviews.llvm.org/D47443
llvm-svn: 333554
Summary: This code is now dead as the ARM backend uses ADDCARRY/SUBCARRY/SETCCCARRY.
Reviewers: rogfer01, efriedma, rengolin, javed.absar
Subscribers: kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D47413
llvm-svn: 333544
Previously, PredicateControl was in some cases a member of <X>Inst classes
for some X (DSP, EVA) or occupied a more irregular place in the hierarchy
for any given instruction.
This patch moves PredicateControl down to the root so that it is consistently
available. It then corrects the base class of microMIPS instructions to use
EncodingPredicates instead of the general Predicates field of Instruction.
Reviewers: smaksimovic, abeserminji, atanasyan
Differential Revision: https://reviews.llvm.org/D47526
llvm-svn: 333536
As part of this effort, duplicate and correct the predicates of some
aliases. Also disable code generation of some short form instructions
for FastISel, as it would otherwise reject them.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D47075
llvm-svn: 333530
A floating point immediate combining a negative sign and
a hexadecimal number, e.g. #-0x0, caused the compiler to crash.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: javed.absar
Differential Revision: https://reviews.llvm.org/D47483
llvm-svn: 333524
They get type Other when used in the clobber list in inline assembly.
This fixes tests fp128.ll and float.ll that failed after r333512.
llvm-svn: 333523
Summary: The fX version of floating-point registers only supports
single precision. We need to map the name to dX for doubles and qX
for long doubles if we want getRegForInlineAsmConstraint() to be
able to pick the correct register class.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D47258
llvm-svn: 333512
Resolve fixup_riscv_call in the assembler when linker relaxation is disabled
and the function and call site are within the same compile unit.
Also add a static_assert after the Infos array declaration
to avoid missing any new fixup in MCFixupKindInfo in the future.
Differential Revision: https://reviews.llvm.org/D47126
llvm-svn: 333487
We only need the extractelt that corresponds to the register we're trying to insert back into. We can't guarantee the others haven't been optimized out depending on how those operands were produced.
So instead just look for an FR32/FR64 input and emit a COPY_TO_REGCLASS to VR128 in the output pattern. This matches what we do for ADD/SUB/MUL/DIV.
llvm-svn: 333473
AFAIK the driver's allocation will actually have to round this
up anyway. It is useful to track the rounded-up size so that
the end of the kernel segment is known to be dereferenceable and
a wider s_load_dword can be used for a short argument at the end
of the segment.
llvm-svn: 333456
Summary:
Base and offset are always separated when a GlobalAddress node is lowered
(rL332641) as an optimization to reduce instruction count. However, this
optimization is not profitable if the Global Address ends up being used in only
one instruction.
This patch adds peephole optimizations that merge an offset of
an address calculation into the LUI %hi and ADDI %lo of the lowering sequence.
The peephole handles three patterns:
1) ADDI (ADDI (LUI %hi(global)) %lo(global)), offset
--->
ADDI (LUI %hi(global + offset)) %lo(global + offset).
This generates:
lui a0, hi (global + offset)
addi a0, a0, lo (global + offset)
Instead of
lui a0, hi (global)
addi a0, lo (global)
addi a0, offset
This pattern is for cases when the offset is small enough to fit in the
immediate field of ADDI (less than 12 bits).
2) ADD ((ADDI (LUI %hi(global)) %lo(global)), (LUI hi_offset))
--->
offset = hi_offset << 12
ADDI (LUI %hi(global + offset)) %lo(global + offset)
Which generates the ASM:
lui a0, hi(global + offset)
addi a0, lo(global + offset)
Instead of:
lui a0, hi(global)
addi a0, lo(global)
lui a1, (offset)
add a0, a0, a1
This pattern is for cases when the offset doesn't fit in an immediate field
of ADDI but the lower 12 bits are all zeros.
3) ADD ((ADDI (LUI %hi(global)) %lo(global)), (ADDI lo_offset, (LUI hi_offset)))
--->
offset = global + offhi20<<12 + offlo12
ADDI (LUI %hi(global + offset)) %lo(global + offset)
Which generates the ASM:
lui a1, %hi(global + offset)
addi a1, %lo(global + offset)
Instead of:
lui a0, hi(global)
addi a0, lo(global)
lui a1, (offhi20)
addi a1, (offlo12)
add a0, a0, a1
This pattern is for cases when the offset doesn't fit in an immediate field
of ADDI and both the lower 12 bits and the high 20 bits are non-zero.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos,
niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang
llvm-svn: 333455
We've had Thumb1 support for ARMISD::SUBE for a while now, so this just
works. Reduces codesize a bit for 64-bit integer comparisons.
Differential Revision: https://reviews.llvm.org/D47387
llvm-svn: 333445
There seems to be no real reason to have these separate copies.
The existing implementations just copy each other for x86.
For Mips there is a subtle difference, which is just a bug,
since the behavior changed based on the context in which each one was called.
Dropping this version, all tests pass. If I try to merge them
to match the removed version, a test fails.
llvm-svn: 333440
As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change
makes the inlining of `memset()` and `memcpy()` more aggressive when
compiling for speed. The tuning remains the same when optimizing for size.
Patch by: Sebastian Pop <s.pop@samsung.com>
Evandro Menezes <e.menezes@samsung.com>
Differential revision: https://reviews.llvm.org/D45098
llvm-svn: 333429
Currently the LLVM assembler cannot process the following code and generates an
error. GNU tools support the .set assignment directive with a numeric register
name.
```
.set r4, $4
test.s:1:11: error: invalid token in expression
.set r4, $4
^
```
This patch teaches the assembler to handle such directives correctly.
Unfortunately a numeric register name cannot be represented as an
expression. That's why we have to maintain a separate `StringMap`
in the `MipsAsmParser` to keep a mapping between alias names and
register numbers.
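A minimal sketch of the bookkeeping described; the real member lives in MipsAsmParser and its name may differ:
```
#include "llvm/ADT/StringMap.h"
using namespace llvm;

// Illustrative stand-in for the parser state described above.
static StringMap<unsigned> RegisterSets;

// Record ".set <Alias>, $<RegNo>" so later uses of the alias resolve.
static void defineRegisterAlias(StringRef Alias, unsigned RegNo) {
  RegisterSets[Alias] = RegNo;
}

// Resolve an identifier to a register number; returns -1 if unknown.
static int lookupRegisterAlias(StringRef Name) {
  auto It = RegisterSets.find(Name);
  return It == RegisterSets.end() ? -1 : (int)It->second;
}
```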
Differential revision: https://reviews.llvm.org/D47464
llvm-svn: 333428
1. Introduction of mask scalar TableGen patterns.
2. Introduction of new scalar move TableGen patterns
and refactoring of existing ones.
3. Folding of pattern created by introducing scalar
masking in Clang header files.
Patch by tkrupa
Differential Revision: https://reviews.llvm.org/D47012
llvm-svn: 333419
Instruction selection can insert nodes into the underlying list after the root
node, so iterating will miss them. We should NOT assume that the root node
is the last element in the DAG nodelist.
Patch by: steven.zhang (Qing Shan Zhang)
Differential Revision: https://reviews.llvm.org/D47437
llvm-svn: 333415
This patch addresses the following variants:
- bitmask immediate, e.g. 'and z0.d, z0.d, #0x6'.
- unpredicated data vectors, e.g. 'and z0.d, z1.d, z2.d'.
- predicated data vectors, e.g. 'and z0.d, p0/m, z0.d, z1.d'.
And also several aliases, such as:
- ORN, alias of ORR.
- EON, alias of EOR.
- BIC, alias of AND (immediate variant)
- MOV, alias of ORR (if unpredicated and source register operands are the same)
Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47363
llvm-svn: 333414
Emit R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_LO16 and
R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_HI16 chains of
relocations for %lo(%neg(%gp_rel())) and %hi(%neg(%gp_rel()))
expressions in case of microMIPS.
Differential Revision: http://reviews.llvm.org/D47220
llvm-svn: 333409
This patch adds addsub_imm8_opt_lsl_(i8|i16|i32|i64) operands
that are unsigned values in the range 0 to 255. For element widths of
16 bits or higher it may also be a signed multiple of 256 in the
range 0 to 65280.
Note: This also does some refactoring to reuse convenience function
getShiftedVal<shift>(), and now allows AArch64 scalar 'ADD #-4096' to be
accepted to be mapped to SUB #4096.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47310
llvm-svn: 333408
Emit R_MICROMIPS_HIGHER / R_MICROMIPS_HIGHEST relocations for %higher()
and %highest() expressions in case of microMIPS. These relocations do
exactly the same things as R_MIPS_HIGHER / R_MIPS_HIGHEST, but for
consistency it's better to write microMIPS variants.
Differential Revision: http://reviews.llvm.org/D47219
llvm-svn: 333407
Previously, their listed predicates were overridden at the scope level.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D46947
llvm-svn: 333405
Before this fix the following code triggers two error messages. The
second one is at least useless:
test.s:1:9: error: expected identifier after .set
.set 123, $a0
^
test-set.s:1:9: error: unexpected token, expected comma
.set 123, $a0
^
llvm-svn: 333402
Summary: We already get this right if the i64 didn't come from a load.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47439
llvm-svn: 333393
We have unmasked intrinsics now and wrap them with a select. This is a net reduction of 36 intrinsics from before the unmasked intrinsics were added.
llvm-svn: 333388
This will allow us to remove the 3 different flavors of masked intrinsics. I'm leaving the actual intrinsic removal for another patch.
llvm-svn: 333386
These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node.
This is a step towards reducing the number of intrinsics needed.
A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial.
llvm-svn: 333383
Implement patterns to extract HWord and Byte vector elements and convert to
quad-precision.
Differential Revision: https://reviews.llvm.org/D46774
llvm-svn: 333377
The X-form TLS load/store instructions added for optimizing the initial-exec
sequence in https://reviews.llvm.org/rL327635 fail to assemble. llvm-mc fails
with the error: invalid operand for instruction. This patch adds these
instructions into a block with isAsmParserOnly, similar to how ADD8TLS_ is
currently handled.
Differential Revision: https://reviews.llvm.org/D47382
llvm-svn: 333374
Summary:
Adding these makes it easier to assemble the output from GCC, which
generates a lot of .uahalf and .uaword directives.
GAS treats .uahalf and .half the same unless the --enforce-aligned-data
flag is used. I could not find a similar flag for LLVM so it seems that
.half does not have any alignment requirement and is treated the same as
.uahalf should be. If that would change later on then the tests in
sparc-directives.s would fail due to bad alignment.
Reviewers: jyknight, asb
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D47319
llvm-svn: 333372
This basically reverts r280696 in favor of using extra patterns as mentioned as an alternative in that commit message. For now I've only added the cases we have test cases for, but it should be easy to add more in the future.
This will help to convert VPERMI2PS/VPERMT2PS intrinsics to use a single ISD node opcode. And hopefully allow some intrinsics to be removed.
llvm-svn: 333365
Summary:
For a block with WQM on entry and exit and containing no exact mode
code, but containing some WWM code, the WQM pass forgot to process the
block at all and so did not insert code to enter and leave WWM.
This commit fixes that.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47027
Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6
llvm-svn: 333362
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
for i..
ForeBlocks(i)
for j..
SubLoopBlocks(i, j)
AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
for i... i+=2
ForeBlocks(i)
ForeBlocks(i+1)
for j..
SubLoopBlocks(i, j)
SubLoopBlocks(i+1, j)
AftBlocks(i)
AftBlocks(i+1)
Remainder
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
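For illustration, a small C++ example (not from the patch) of the transformation, assuming the dependence checks pass:
```
// Before: the inner loop re-reads B[j] for every outer iteration.
void before(int *A, const int *B, int N, int M) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      A[i] += B[j];
}

// After unroll-and-jam by 2: one jammed inner loop, sharing the B[j] load.
void after(int *A, const int *B, int N, int M) {
  int i = 0;
  for (; i + 1 < N; i += 2)
    for (int j = 0; j < M; ++j) {
      A[i] += B[j];     // SubLoopBlocks(i, j)
      A[i + 1] += B[j]; // SubLoopBlocks(i+1, j)
    }
  for (; i < N; ++i)    // Remainder
    for (int j = 0; j < M; ++j)
      A[i] += B[j];
}
```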
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 333358
The argument was used as an additional negative condition and can be
expressed in the if conditional without needing to pass it down.
Update bss commentary around main use.
llvm-svn: 333357
The uint64_ts that we pass around AA to represent MemoryLocation sizes
are logically an Optional<uint64_t>. In D44748, we want to add an extra
'imprecise' bit to this Optional<uint64_t> to represent whether a given
MemoryLocation size is an upper-bound or an exact size. For more context
on why, please see D44748.
That patch is quite large, but reviewers seem to be OK with the
approach. In D45581 (my first attempt to split 'noise' out of D44748),
reames asked that I land a precursor that is solely replacing uint64_t
with LocationSize, which starts out as `using LocationSize = uint64_t;`.
He also gave me the OK to submit this rename without further review.
llvm-svn: 333314
With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it.
Differential Revision: https://reviews.llvm.org/D47378
llvm-svn: 333303
This is an adoption of the HSAIL perfhint pass. Two types of hints are produced:
1. Function is memory bound.
2. Kernel can use wave limiter.
Currently these hints are used in the scheduler. If a function is suspected
to be memory bound we allow occupancy to decrease to 4 waves in the course
of scheduling.
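A minimal sketch of how the scheduler side could consume the hints, assuming attribute-style markers; the attribute names here are assumptions:
```
// Hypothetical consumer of the two hints; the attribute names are assumptions.
const Function &F = MF.getFunction();
bool MemoryBound = F.hasFnAttribute("amdgpu-memory-bound");
bool WaveLimiter = F.hasFnAttribute("amdgpu-wave-limiter");
if (MemoryBound || WaveLimiter)
  MFI->limitOccupancy(4); // allow occupancy to drop to 4 waves while scheduling
```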
Differential Revision: https://reviews.llvm.org/D46992
llvm-svn: 333289
Rather than using a regpair operand of these instructions, use two separate
operands and a custom converter to handle the implicit second register operand.
Additionally, remove the microMIPS32R6 definition as it is redundant.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D47255
llvm-svn: 333288
When the shuffle mask selected a subvector of the second input vector,
and aligning of the source was performed, the shuffle mask was updated
incorrectly, resulting in an ICE further in the selection process.
llvm-svn: 333279
As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves
Another chapter in the never ending crusade to remove useless InstRW overrides from the x86 scheduler models......
llvm-svn: 333271
Unpredicated copy of optionally-shifted immediate to SVE vector,
along with MOV-aliases.
This patch contains parsing and printing support for
cpy_imm8_opt_lsl_(i8|i16|i32|i64). This operand allows a signed value in
the range -128 to +127. For element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512.
For an element width of 8 bits a range of -128 to 255 is accepted, since a copy
of a byte can be considered either signed or unsigned.
Note: This patch renames tryParseAddSubImm() -> tryParseImmWithOptionalShift()
and moves the behaviour of trying to shift a plain immediate by an allowed
shift-value to its addImmWithOptionalShiftOperands() method, so that the
parsing itself is generic and allows immediates from multiple shifted operands.
This is done because an immediate can be divisible by both shifted operands.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47309
llvm-svn: 333263
Summary:
Lower control flow did not correctly handle the case that a loop break
in if/else was on a condition that was not guaranteed to be masked by
exec. The first test kernel shows an example of this going wrong; after
exiting the loop, exec is all ones, even if it was not before the loop.
The fix is for lowering of if-break and else-break to insert an
S_AND_B64 to mask the break condition with exec. This commit also
includes the optimization of not inserting that S_AND_B64 if it is
obviously not needed because the break condition is the result of a
V_CMP in the same basic block.
V2: Addressed some review comments.
V3: Test fixes.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D44046
Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c
llvm-svn: 333258
Re-add the feature flag for invpcid, which was removed in r294561.
Add an intrinsic, which always uses a 32 bit integer as first argument,
while the instruction actually uses a 64 bit register in 64 bit mode
for the INVPCID_TYPE argument.
Reviewers: craig.topper
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47141
llvm-svn: 333255
Summary:
The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp,
so we don't need a header file.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47264
llvm-svn: 333254
The existing code has three different ways to try to lower a 64-bit
immediate to the sequence ORR+MOVK. The result is messy: it misses
some possible sequences, and the order of the checks means we sometimes
emit two MOVKs when we only need one.
Instead, just use a simple loop to try all possible two-instruction
ORR+MOVK sequences.
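A rough sketch of that loop, assuming a helper equivalent to AArch64_AM::isLogicalImmediate; the surrounding expansion code is simplified away:
```
#include <cstdint>

// Stand-in for AArch64_AM::isLogicalImmediate(Imm, 64).
bool isLogicalImmediate64(uint64_t Imm);

// Try every 16-bit chunk position: if forcing the chunk to all-zeros or
// all-ones yields a valid ORR logical immediate, a single MOVK at that
// shift can then patch in the real chunk.
bool canBuildWithOrrMovk(uint64_t Imm, unsigned &MovkShift) {
  for (unsigned Shift = 0; Shift < 64; Shift += 16) {
    uint64_t Mask = UINT64_C(0xFFFF) << Shift;
    if (isLogicalImmediate64(Imm & ~Mask) || isLogicalImmediate64(Imm | Mask)) {
      MovkShift = Shift;
      return true;
    }
  }
  return false;
}
```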
Differential Revision: https://reviews.llvm.org/D47176
llvm-svn: 333218
Ideally we'd be able to test a CPU by using __builtin_readcyclecounter()/RDTSC instead (PR37193) if a model/cycle-counter is not specified.
NOTE: Jaguar PMCs don't give good coverage of resource pipes specified in the model (at the macro-vs-micro-op levels) but we should be able to cover at least a few resources.
llvm-svn: 333190
To do this:
1. Add a fixup_riscv_relax fixup type which will eventually be translated
to the R_RISCV_RELAX relocation type.
2. Insert the R_RISCV_RELAX relocation type on the auipc function call
expression when linker relaxation is enabled.
Differential Revision: https://reviews.llvm.org/D44886
llvm-svn: 333158
The match pattern in the definition of LXSDX is xoaddr, so the Pseudo
instruction XFLOADf64 never gets selected. XFLOADf64 expands to LXSDX/LFDX post
RA based on the register pressure. To avoid ambiguity, we need to remove the
select pattern for LXSDX, same as what was done for LXSD. STXSDX also has
the same issue.
Patch by Qing Shan Zhang (steven.zhang).
Differential Revision: https://reviews.llvm.org/D47178
llvm-svn: 333150
Summary:
Set CostPerUse higher for registers that are not used in the compressed
instruction set. This will influence the greedy register allocator to reduce
the use of registers that can't be encoded in 16 bit instructions. This
affects register allocation even when the compressed instruction set isn't targeted;
we see no major negative codegen impact.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang
Differential Revision: https://reviews.llvm.org/D47039
llvm-svn: 333132
Implement patterns to extract [Un]signed Word vector elements and convert to
quad-precision.
Differential Revision: https://reviews.llvm.org/D46536
llvm-svn: 333115
Implement patterns to extract [Un]signed DWord vector elements and convert to
quad-precision.
Differential Revision: https://reviews.llvm.org/D46333
llvm-svn: 333112
r333093 introduced several warnings (-Wlogical-not-parentheses,
-Wbool-compare).
Adding parentheses in MipsSEInstrInfo::isCopyInstr() to silence them.
llvm-svn: 333097
This property is needed in order to follow the movement of values between
registers. It is used in TII to implement a method that
returns true if a simple copy-like instruction is recognized, along
with its source and destination machine operands.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D45204
llvm-svn: 333093
Now that the LLVM_DEBUG() macro has landed in the various sub-projects,
the DEBUG macro can be removed.
Also change the new uses of DEBUG to LLVM_DEBUG.
Differential Revision: https://reviews.llvm.org/D46952
llvm-svn: 333091
For RISC-V it is desirable to have relaxation happen in the linker once
addresses are known, and as such the size between two instructions/byte
sequences in a section could change.
For most assembler expressions, this is fine, as the absolute address results
in the expression being converted to a fixup, and finally relocations.
However, for expressions such as .quad .L2-.L1, the assembler folds this down
to a constant once fragments are laid out, under the assumption that the
difference can no longer change, although in the case of linker relaxation the
differences can change at link time, so the constant is incorrect. One place
where this commonly appears is in debug information, where the size of a
function expression is in a form similar to the above.
This patch extends the assembler to allow an AsmBackend to declare that it
does not want the assembler to fold down this expression, and instead generate
a pair of relocations that allow the linker to carry out the calculation. In
this case, the expression is not folded, but when it comes to emitting a
fixup, the generic FK_Data_* fixups are converted into a pair, one for the
addition half, one for the subtraction, and this is passed to the relocation
generating methods as usual. I have named these FK_Data_Add_* and
FK_Data_Sub_* to indicate which half these are for.
For RISC-V, which supports this via e.g. the R_RISCV_ADD64, R_RISCV_SUB64 pair
of relocations, these are also set to always emit relocations relative to
local symbols rather than section offsets. This is to deal with the fact that
if relocations were calculated on e.g. .text+8 and .text+4, the result 12
would be stored rather than 4 as both addends are added in the linker.
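A condensed sketch of emitting the paired fixups for an unfoldable A - B value; the surrounding context (SymA/SymB, Ctx, Offset, Fixups) is assumed, and the kind names come from this commit:
```
// Assumed context: SymA/SymB are the two symbols of the difference, Ctx is
// the MCContext, Fixups collects the fixups for the fragment.
const MCExpr *AddExpr = MCSymbolRefExpr::create(&SymA, Ctx);
const MCExpr *SubExpr = MCSymbolRefExpr::create(&SymB, Ctx);
// One fixup for the addition half, one for the subtraction half; the target
// backend later maps these to e.g. the R_RISCV_ADD64/R_RISCV_SUB64 pair.
Fixups.push_back(MCFixup::create(Offset, AddExpr, FK_Data_Add_8, Loc));
Fixups.push_back(MCFixup::create(Offset, SubExpr, FK_Data_Sub_8, Loc));
```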
Differential Revision: https://reviews.llvm.org/D45181
Patch by Simon Cook.
llvm-svn: 333079
The Sparc asm parser currently has custom parsing logic for .half, .word,
.nword and .xword. Rather than use this custom logic, we can just use
addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue.
https://reviews.llvm.org/D47003
llvm-svn: 333078
The AArch64 asm parser currently has custom parsing logic for .hword, .word,
and .xword. Rather than use this custom logic, we can just use
addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue.
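The mechanism, sketched; addAliasForDirective is the real MCAsmParser extension point, while the exact call sites here are illustrative:
```
// Illustrative: route target data directives to the generic handler so
// AsmParser::parseDirectiveValue does the parsing.
Parser.addAliasForDirective(".hword", ".2byte");
Parser.addAliasForDirective(".word", ".4byte");
Parser.addAliasForDirective(".xword", ".8byte");
```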
Differential Revision: https://reviews.llvm.org/D47000
llvm-svn: 333077
This is a different approach to fixing the problem described in D46746.
RISCVAsmBackend currently depends on the getSize helper function returning the
number of bytes a fixup may change (note: some other backends have a similar
helper named getFixupNumKindBytes). As noted in that review, this doesn't
return the correct size for FK_Data_1, FK_Data_2, or FK_Data_8 meaning that
too few bytes will be written in the case of FK_Data_8, and there's the
potential of writing outside the Data array for the smaller fixups.
D46746 extends getSize to recognise some of the builtin fixup types. Rather
than having a function that needs to be kept up to date as new builtin or
target-specific fixups are added, We can calculate an appropriate bound on the
number of bytes that might be touched using Info.TargetSize and
Info.TargetOffset.
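A sketch of the bound computation described, assuming the usual MCAsmBackend context (Fixup and Data in scope):
```
// Bytes possibly touched by the fixup, derived from its kind info rather
// than a hand-maintained size table.
const MCFixupKindInfo &Info = getFixupKindInfo(Fixup.getKind());
unsigned NumBytes = alignTo(Info.TargetOffset + Info.TargetSize, 8) / 8;
assert(Fixup.getOffset() + NumBytes <= Data.size() &&
       "Invalid fixup offset!");
```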
Differential Revision: https://reviews.llvm.org/D46965
llvm-svn: 333076
Also bringing ARMRegisterBankInfo::getRegBankFromRegClass
implementation up to speed with the *.td-definition.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D43982
llvm-svn: 333056
The integer operation conversion for some reason only happens
if the source is a bitcast from an integer, which happens to
always be the situation when the result is loaded. Add
an additional pattern for when the source operation is really
an FP operation.
llvm-svn: 333019
When we're outlining a sequence that ends in a call, we can save up to
three instructions in the outlined function by turning the call into
a tail-call. I refer to this as thunk outlining because the resulting
outlined function looks like a thunk; suggestions welcome for a better
name.
In addition to making the outlined function shorter, thunk outlining
allows outlining calls which would otherwise be illegal to outline:
we don't need to save/restore LR, so we don't need to prove anything
about the stack access patterns of the callee.
To make this work effectively, I also added
MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner
code; this allows treating an arbitrary instruction as a terminator in
the suffix tree.
Differential Revision: https://reviews.llvm.org/D47173
llvm-svn: 333015
MipsLongBranchPass and MipsHazardSchedule passes are joined to one pass
because of mutual conflict. When MipsHazardSchedule inserts 'nop's, it
potentially breaks some jumps, so they have to be expanded to long
branches. When some branch is expanded to long branch, it potentially
creates a hazard situation, which should be fixed by adding nops.
New pass is called MipsBranchExpansion, it combines these two passes,
and runs them alternately until one of them reports no changes were made.
Differential Revision: https://reviews.llvm.org/D46641
llvm-svn: 332977
This enables us to detect more fast path sdiv cases under cost analysis.
This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.
Found while working on D46276
Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.
Differential Revision: https://reviews.llvm.org/D46637
llvm-svn: 332969
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.
Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.
llvm-svn: 332953
For both argument and return types, promote illegal types like i24 to i32,
and if a type can't be easily promoted, clear out the signature before
bailing out, to avoid leaving it in a partially complete state.
Fixes PR37546.
llvm-svn: 332947
Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction
and register definitions, which are huge, so we only want to include
them where needed.
This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.
I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46272
llvm-svn: 332930
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].
[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated (see `@in`), and we need to make sure that they are generated.
Differential Revision: https://reviews.llvm.org/D46528
llvm-svn: 332904
This code should really do exactly the same thing for 32-bit x86 and
64-bit small code models, with the exception that RIP-relative
addressing can't use base and index registers.
llvm-svn: 332893
This removes 6 intrinsics since we no longer need separate mask and maskz intrinsics.
Differential Revision: https://reviews.llvm.org/D47124
llvm-svn: 332890
With this we gain a little flexibility in how the generic object
writer is created.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47045
llvm-svn: 332868
AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage
but does not list it in the pass dependencies, which may lead to
a crash.
Differential Revision: https://reviews.llvm.org/D47151
llvm-svn: 332862
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47035
llvm-svn: 332857
Chances are we'll be asked again after type legalization, but before that point
it's better to claim misaligned accesses aren't allowed than to assert.
llvm-svn: 332840
MipsLongBranchPass and MipsHazardSchedule passes are joined to one pass
because of mutual conflict. When MipsHazardSchedule inserts 'nop's, it
potentially breaks some jumps, so they have to be expanded to long
branches. When some branch is expanded to long branch, it potentially
creates a hazard situation, which should be fixed by adding nops.
New pass is called MipsBranchExpansion, it combines these two passes,
and runs them alternately until one of them reports no changes were made.
Differential Revision: https://reviews.llvm.org/D46641
llvm-svn: 332834
As suggested by Fabian on PR37426, we can use PMULUDQ to perform v4i32 vector rotations as the upper 32bits of the multiply will contain the 'wrapped' bits of the rotation.
v8i16/v16i8 rotations would be straightforward to add to lowerRotate in the future - ideally we'd mostly share code with the vector shifts lowering.
Differential Revision: https://reviews.llvm.org/D46954
llvm-svn: 332832
Previously the compiler was using the microMIPSR3 variants, incorrectly.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D46948
llvm-svn: 332820
Eliminate loads from the dispatch packet when they will have
a known value.
Also pattern match the code used by the library to handle partial
workgroup dispatches, which isn't necessary if reqd_work_group_size
is used.
llvm-svn: 332771
Provide some free functions to reduce verbosity of endian-writing
a single value, and replace the endianness template parameter with
a field.
Part of PR37466.
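A small before/after sketch of the convenience this adds; the API shape follows the description above, so treat the exact spellings as approximate:
```
// Before: endianness as a template parameter on the writer.
support::endian::Writer<support::little>(OS).write(Value);

// After: endianness is a field, and a free function covers one-off writes.
support::endian::Writer W(OS, support::little);
W.write(Value);
support::endian::write(OS, Value, support::little); // free-function form
```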
Differential Revision: https://reviews.llvm.org/D47032
llvm-svn: 332757
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47050
llvm-svn: 332749
The code that generates post-increments for Hexagon considered
integer values only. This patch adds support to generate them for
floating point values, f32 and f64.
Differential Revision: https://reviews.llvm.org/D47036
llvm-svn: 332748
BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMM->GPR delays (see rL332737)
The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)
llvm-svn: 332745
The intrinsic legalization for masked truncate uses ISD::TRUNCATE which can be constant folded by getNode. This prevents getVectorMaskingNode from seeing the ISD::TRUNCATE special case where it should emit X86ISD::SELECT instead of ISD::VSELECT. This causes a vselect with a v16i1 or v8i1 condition to be emitted during vector legalization, but vector legalization doesn't revisit nodes it creates. DAG combine will then promote this condition to match the result type. Then op legalization will try to legalize it, but the custom lowering hook returned SDValue(). But op legalization doesn't have an Expand for VSELECT because it expects vector legalization to have taken care of it. So the operation sticks around and fails in isel.
This patch adds a custom legalization hook to morph it to a vXi8 vselect instead.
This also simplifies the normal vXi16 vselect handling because vector legalization was normally expanding to AND/ANDN/OR and DAG combine was turning that into VBLENDVB. So we can skip a step by doing it directly.
Fixes PR37499
Differential Revision: https://reviews.llvm.org/D47025
llvm-svn: 332743
Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc.
Fixes BtVer2/SLM which have different behaviours for GPR stores.
llvm-svn: 332718
Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc.
Fixes BtVer2/SLM which have different behaviours for GPR stores.
llvm-svn: 332714
Sorry, the commit comment for r332703 is completely broken.
My mind slipped - the right description would be:
In SystemZDAGToDAGISel::Select(), in the handling for SELECT_CCMASK:
Check if UpdateNodeOperands() returns a different SDNode and in that
case call ReplaceNode.
Review: Ulrich Weigand.
llvm-svn: 332706
This patch aims to match the changes introduced in gcc by
https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The
IBT feature definition is removed, with the IBT instructions
being freely available on all X86 targets. The shadow stack
instructions are also being made freely available, and the
use of all these CET instructions is controlled by the module
flags derived from the -fcf-protection clang option. The hasSHSTK
option remains since clang uses it to determine availability of
shadow stack instruction intrinsics, but it is no longer directly used.
Comes with a clang patch (D46881).
Patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D46882
llvm-svn: 332705
For RISCV branch instructions, we need to preserve relocation types when linker
relaxation is enabled, so the linker can modify offsets when the branch offsets
change.
We preserve relocation types by defining shouldForceRelocation.
IsResolved returned by evaluateFixup will always be false when shouldForceRelocation
returns true. This makes RISCV MC branch relaxation always relax 16-bit
branches to the 32-bit form, even if the symbol actually could be resolved.
To avoid 16-bit branches always relaxing to the 32-bit form when linker relaxation
is enabled, we add a new parameter WasForced to indicate that the symbol actually
couldn't be resolved and was not merely forced by shouldForceRelocation returning true.
RISCVAsmBackend::fixupNeedsRelaxationAdvanced can relax branches with
unresolved symbols by checking (!IsResolved && !WasForced).
RISCV MC branch relaxation is needed because RISCV can perform 32-bit
to 16-bit transformation in the MC layer.
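A condensed sketch of the resulting test; the hook name is from the commit, the body is simplified:
```
bool fixupNeedsRelaxationAdvanced(const MCFixup &Fixup, bool Resolved,
                                  uint64_t Value,
                                  const MCRelaxableFragment *DF,
                                  const MCAsmLayout &Layout,
                                  bool WasForced) const {
  // Relax when the symbol genuinely could not be resolved, not when
  // shouldForceRelocation merely forced a relocation to be kept.
  if (!Resolved && !WasForced)
    return true;
  // Otherwise, decide by range-checking Value against the branch's
  // immediate width (details elided in this sketch).
  return false;
}
```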
Differential Revision: https://reviews.llvm.org/D46350
llvm-svn: 332696
This reapplies commits: r330271, r330592, r330779.
[DEBUG] Initial adaptation of NVPTX target for debug info emission.
Summary:
The patch adds initial emission of debug info for the NVPTX target.
Currently, only .file and .loc directives are emitted; everything else is
commented out to not break the compilation of CUDA.
llvm-svn: 332689
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.
Differential Revision: https://reviews.llvm.org/D46921
llvm-svn: 332685
Summary:
The Closure allocated in the main loop is allocated on the stack. However,
later in the code its address is taken (and used for comparisons). This
obviously doesn't work. In fact, the Closure will get the same stack address
during every loop iteration, rendering the check intended to identify
Closure conflicts entirely ineffective. Fix this bug by giving every Closure
a unique ID and using that for comparison. Alternatively, we could heap
allocate the closure object.
Fixes PR37396
Fixes JuliaLang/julia#27032
Reviewers: craig.topper, guyblank
Reviewed By: craig.topper
Subscribers: vchuravy, llvm-commits
Differential Revision: https://reviews.llvm.org/D46800
llvm-svn: 332682
Summary:
We cannot simply delete IMPLICIT_DEF nodes. They may be used
later (e.g. by a PHI) and deleting them will cause later passes (e.g.
LiveVariables) to crash. However, it seems fine to ignore them for
purposes of the domain reassignment (as we do with PHI).
Fixes PR37430
Fixes JuliaLang/julia#27080
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D46797
llvm-svn: 332680
Summary:
When lowering a global address, lower the base as a TargetGlobal first, then
create an SDNode for the offset separately and chain it to the address calculation.
This optimization will create a DAG where the base address of a global access will
be reused between different accesses. The offset can later be folded into the immediate
part of the memory access instruction.
With this optimization we generate:
lui a0, %hi(s)
addi a0, a0, %lo(s) ; shared base address.
addi a1, zero, 20 ; 2 instructions per access.
sw a1, 44(a0)
addi a1, zero, 10
sw a1, 8(a0)
addi a1, zero, 30
sw a1, 80(a0)
Instead of:
lui a0, %hi(s+44) ; 3 instructions per access.
addi a1, zero, 20
sw a1, %lo(s+44)(a0)
lui a0, %hi(s+8)
addi a1, zero, 10
sw a1, %lo(s+8)(a0)
lui a0, %hi(s+80)
addi a1, zero, 30
sw a1, %lo(s+80)(a0)
Which will save one instruction per access.
Reviewers: asb, apazos
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, apazos, asb, llvm-commits
Differential Revision: https://reviews.llvm.org/D46989
llvm-svn: 332641
Summary:
This patch implements MC support for the tail pseudo instruction.
A follow-up patch implements the codegen support as well as handling of the indirect tail pseudo instruction.
Reviewers: asb, apazos
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, llvm-commits
Differential Revision: https://reviews.llvm.org/D46221
llvm-svn: 332634
Summary:
The current StructurizeCFG pass only works for CFGs with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks
into one exit block so the Structurizer can work. However, an infinite loop is another kind of special "exit", and if we don't handle it, the case of multiple exits will prevent the structurizer from working.
In this work, for each infinite loop, we add a dummy edge to the "return" block, and thus the AMDGPUUnifyDivergentExitNodes pass will work with infinite loops.
This will make CFG with infinite loops be structurized.
Reviewer: nhaehnle
Differential Revision: https://reviews.llvm.org/D46340
llvm-svn: 332625
The isReMaterializable flag is somewhat confusing; unlike most other instruction
flags it is currently interpreted as a hint (mightBeRematerializable would be
a better name). While LUI is always rematerialisable, for an instruction like
ADDI it depends on its operands. TargetInstrInfo::isTriviallyReMaterializable
will call TargetInstrInfo::isReallyTriviallyReMaterializable, which in turn
calls TargetInstrInfo::isReallyTriviallyReMaterializableGeneric. We rely on
the logic in the latter to pick out instances of ADDI that really are
rematerializable.
The isReMaterializable flag does make a difference on a variety of test
programs. The recently committed remat.ll test case demonstrates how stack
usage is reduced and an unnecessary lw/sw can be removed. Stack usage in the
Proc0 function in dhrystone reduces from 192 bytes to 112 bytes.
For the sake of completeness, this patch also implements
RISCVRegisterInfo::isConstantPhysReg. Although this is called from a number of
places, it doesn't seem to result in different codegen for any programs I've
thrown at it. However, it is called in the rematerialisation codepath and it
seems sensible to implement something correct here.
Differential Revision: https://reviews.llvm.org/D46182
llvm-svn: 332617
Data directives such as .word, .half, .hword are currently parsed using
HexagonAsmParser::ParseDirectiveValue which effectively duplicates logic from
AsmParser::parseDirectiveValue. This patch deletes that duplicated logic in
favour of using addAliasForDirective.
Differential Revision: https://reviews.llvm.org/D46999
llvm-svn: 332607
These directives are recognised by gas. Support is added through the use of
addAliasForDirective.
Also match RISC-V gcc in preferring .half and .word for 16-bit and 32-bit data
directives.
llvm-svn: 332574
The FIXME comments were about preventing load folding to avoid a partial xmm update. But these instructions use a GPR as input when the load isn't folded. This won't help prevent a partial xmm update.
llvm-svn: 332573
This breaks the code which saves and restores LR, so we can't outline
without doing something more complicated for stack adjustment.
Found by inspection; we get lucky in most cases because getMemOpInfo
only handles STRWpost, not any other pre/post-increment forms. But it
hits a couple of artificial testcases in the tree.
Differential Revision: https://reviews.llvm.org/D46920
llvm-svn: 332529