Although we can't write a tablegen pattern to remove redundant
splitf64+buildf64 pairs due to the multiple return values, we can handle it
with some C++ selection code. This is simpler than removing them after
instruction selection through RISCVDAGToDAGISel::PostprocessISelDAG, as was
done previously.
llvm-svn: 343712
This patch adds a 'WriteCopy' [WriteLoad, WriteStore] schedule sequence instead to better model the behaviour
Found by @andreadb during llvm-mca testing on btver2 which was crashing on "zero uop" WriteRMW only instructions
llvm-svn: 343708
-verify-machineinstrs was implemented as a simple bool. As a result, the
'VerifyMachineCode == cl::BOU_UNSET' used by EXPENSIVE_CHECKS to make it on by
default but possible to disable didn't work as intended. Changed
-verify-machineinstrs to a boolOrDefault to correct this.
llvm-svn: 343696
1. Fix include ordering.
2. Improve variable name (width is bitwidth not number-of-elements).
3. Add local Opcode variable to reduce code duplication.
llvm-svn: 343694
This fixes a problem where the register allocator fails to eliminate a PHI
because there's a non-PHI in the middle of the PHI instructions at the start
of a BB.
This G_TRUNC can be better placed but this at least fixes the correctness issue
quickly. I'll follow up with a patch to the verifier to catch this kind of bug
in future.
llvm-svn: 343693
This patch teaches class RegisterFile how to analyze register writes from
instructions that are move elimination candidates.
In particular, it teaches it how to check if a move can be effectively eliminated
by the underlying PRF, and (if necessary) how to perform move elimination.
The long term goal is to allow processor models to describe instructions that
are valid move elimination candidates.
The idea is to let register file definitions in tablegen declare if/when moves
can be eliminated.
This patch is a non functional change.
The logic that performs move elimination is currently disabled. A future patch
will add support for move elimination in the processor models, and enable this
new code path.
llvm-svn: 343691
deserializeMCOperand - ensure that we at least match the first character of the sscanf pattern before calling
This reduces llvm-exegesis uops analysis of the instructions supported from btver2 from 5m13s to 2m1s on debug builds.
llvm-svn: 343690
Fix use of SSE1 registers for f32 ops in no-x87 mode.
Notably, allow use of SSE instructions for f32 operations in 64-bit
mode (but not 32-bit which is disallowed by callign convention).
Also avoid translating memset/memcopy/memmove into SSE registers
without X87 for 32-bit mode.
This fixes PR38738.
Reviewers: nickdesaulniers, craig.topper
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D52555
llvm-svn: 343689
Two cases in a ThinLTO test were passing for the wrong reasons, since
rL340374. The tests were supposed to be testing that files were being
pruned due to the cache size, but they were in fact being pruned because
they were older than the default expiration period of 1 week.
This change fixes the tests by explicitly setting the expiration time to
the maximum value. This required the option to be exposed in llvm-lto.
By assigning all files in the cache a similar time, it is possible to see
that the newest files are still being kept, and that we aren't passing
for the wrong reason again. In the event that the entry expiration were
to expire for them, then the test would start failing, because these
files would be removed too.
Reviewed by: rnk, inglorion
Differential Revision: https://reviews.llvm.org/D51992
llvm-svn: 343687
This patch makes sure that a register is only hinted once to RA. In extreme
cases the same register can otherwise be hinted numerous times and cause a
compile time slowdown.
Review: Simon Pilgrim
https://reviews.llvm.org/D52826
llvm-svn: 343686
The patterns as defined are correct only when XLen==32.
This is another preparatory patch for a set of patches that flesh out RV64
codegen.
llvm-svn: 343679
1. brcond operates on an condition.
2. atomic_fence and the pseudo AMO instructions should all take xlen immediates
This allows the same definitions and patterns to work for RV64 (XLenVT==i64).
llvm-svn: 343678
Summary:
The new buffer/tbuffer intrinsics handle an out-of-range immediate
offset by moving/adding offset&-4096 to a vgpr, leaving an in-range
immediate offset, with a chance of the move/add being CSEd for similar
loads/stores.
However it turns out that a negative offset in a vgpr is illegal, even
if adding the immediate offset makes it legal again.
Therefore, this commit disables the offset&-4096 thing if the offset is
negative.
Differential Revision: https://reviews.llvm.org/D52683
Change-Id: Ie02f0a74f240a138dc2a29d17cfbd9e350e4ed13
llvm-svn: 343672
I was expecting this to be a nfc but Silvermont seems to be setup a little differently:
// A folded store needs a cycle on MEC_RSV for the store data, but it does not need an extra port cycle to recompute the address.
def : WriteRes<WriteRMW, [SLM_MEC_RSV]>;
So moving from WriteStore to WriteRMW reduces predicted port pressure, confirmed by @craig.topper that this is correct.
Differential Revision: https://reviews.llvm.org/D52740
llvm-svn: 343670
Modified the testcases to use both pass managers
Use single commandline flag for both pass managers.
Differential Revision: https://reviews.llvm.org/D52708
Reviewers: sebpop, tejohnson, brzycki, SirishP
Reviewed By: tejohnson, brzycki
llvm-svn: 343662
Summary: Depends on D45541
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson
Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45543
The previous commit failed portions of the test-suite on GreenDragon due to
duplicate COPY instructions and iterator invalidation. Both issues have now
been fixed. To assist with this, a helper (cloneVirtualRegister) has been added
to MachineRegisterInfo that can be used to get another register that has the same
type and class/bank as an existing one.
llvm-svn: 343654
We don't need to match the precise type index number here. It's not
important. The type name is what matters to make this test useful.
llvm-svn: 343642
Previously we were creating weakly defined helper function in
each translation unit:
- setThrew
- setTempRet0
Instead we now assume these will be provided at link time. In
emscripten they are provided in compiler-rt:
https://github.com/kripken/emscripten/pull/7203
Additionally we previously created three global variable which are
also now required to exist at link time instead.
- __THREW__
- _threwValue
- __tempRet0
Differential Revision: https://reviews.llvm.org/D49208
llvm-svn: 343640
The behaviour of this bot indicates that -verify-machineinstrs has been forced
on and is therefore inserting the verifier on builds that don't expect it.
Explicitly specify whether it's enabled or disabled for each test.
llvm-svn: 343633
Summary:
Use the newly added DebugInfo (DI) Trivial flag, which indicates if a C++ record is trivial or not, to determine Codeview::FunctionOptions.
Clang and MSVC generate slightly different Codeview for C++ records. For example, here is the C++ code for a class with a defaulted ctor,
class C {
public:
C() = default;
};
Clang will produce a LF for the defaulted ctor while MSVC does not. For more details, refer to FIXMEs in the test cases in "function-options.ll" included with this set of changes.
Reviewers: zturner, rnk, llvm-commits, aleksandr.urakov
Reviewed By: rnk
Subscribers: Hui, JDevlieghere
Differential Revision: https://reviews.llvm.org/D45123
llvm-svn: 343626
The 0x63 opcodes in 64-bit mode have a fixed source size of 32-bits, but the destination size is controlled by REX.W and the 0x66 opsize prefix. This instruction is normally used with a REX.W prefix which provides desired behavior. The other encodings are interpretted as valid by the processor, but aren't useful.
This patch makes us recognize them for the disassembler to match objdump.
llvm-svn: 343614
-verify-machineinstrs inserts the MachineVerifier after every MachineInstr-based
pass. However, GlobalISel creates MachineInstr-based passes earlier than DAGISel
and the corresponding verifiers are not being added. This patch fixes that.
If GlobalISel triggers the fallback path then the MIR can be left in a bad
state that is going to be cleared by ResetMachineFunctions. In this situation
verifying between GlobalISel passes will prevent the fallback path from
recovering from this. As a result, we bail out of verifying a function if the
FailedISel attribute is present.
llvm-svn: 343613
Add the .cv_fpo_stackalign directive so that we can define $T0, or the
VFRAME virtual register, with it. This was overlooked in the initial
implementation because unlike MSVC, we push CSRs before allocating stack
space, so this value is only needed to describe local variable
locations. Variables that the compiler now addresses via ESP are instead
described as being stored at offsets from VFRAME, which for us is ESP
after alignment in the prologue.
This adds tests that show that we use the VFRAME register properly in
our S_DEFRANGE records, and that we emit the correct FPO data to define
it.
Fixes PR38857
llvm-svn: 343603
The ARM elf emitter would omit printing data
symbol when constant data. This patch
overrides the emitFill method as to enforce that
the symbol is correctly printed.
Differential revision: https://reviews.llvm.org/D52737
llvm-svn: 343594
These are candidates for the same fold that was implemented in
D52439, but FP types require bitcasting (and that changes the
extra uses profitability calculation).
llvm-svn: 343587
This adds new instructions to manipluate tagged pointers, and to load
and store the tags associated with memory.
Patch by Pablo Barrio, David Spickett and Oliver Stannard!
Differential revision: https://reviews.llvm.org/D52490
llvm-svn: 343572
This adds new system registers introduced by the Memory Tagging
extension.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52488
llvm-svn: 343571
The Memory Tagging Extension adds system instructions for data cache
maintenance, implemented as new operands to the DC instruction.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52487
llvm-svn: 343570
This is an attempt to get out of a local-minimum that instcombine currently
gets stuck in. We essentially combine two optimisations at once, ~a - ~b = b-a
and min(~a, ~b) = ~max(a, b), only doing the transform if the result is at
least neutral. This involves using IsFreeToInvert, which has been expanded a
little to include selects that can be easily inverted.
This is trying to fix PR35875, using the ideas from Sanjay. It is a large
improvement to one of our rgb to cmy kernels.
Differential Revision: https://reviews.llvm.org/D52177
llvm-svn: 343569
This adds the memory tagging extension, which is an optional extension
introduced in v8.5A. The new instructions and registers will be added by
subsequent patches.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52486
llvm-svn: 343563
Consistently try to use APFloat::toString for floating point constant comments to get rid of differences between Constant / ConstantDataSequential values - it should help stop some of the linux-windows buildbot failures matching NaN/INF etc. as well.
Differential Revision: https://reviews.llvm.org/D52702
llvm-svn: 343562
Summary:
This is redundant, as FetchStage::getNextInstruction already checks this
and returns llvm::ErrorSuccess() as appropriate.
NFC.
Reviewers: andreadb
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D52642
llvm-svn: 343555
getNumUses is linear in the number of uses. Since we're looking for a specific use count, we can use hasNUses which will stop as soon as it determines there are more than N uses instead of walking all of them.
llvm-svn: 343550
There's a strange assertion on two of the Green Dragon bots that goes away when
this is reverted. The assertion is in RegBankAlloc and if it is this commit then
-verify-machine-instrs should have caught it earlier in the pipeline.
llvm-svn: 343546
Summary:
Before this change, LLVM would always describe locals on the stack as
being relative to some specific register, RSP, ESP, EBP, ESI, etc.
Variables in stack memory are pretty common, so there is a special
S_DEFRANGE_FRAMEPOINTER_REL symbol for them. This change uses it to
reduce the size of our debug info.
On top of the size savings, there are cases on 32-bit x86 where local
variables are addressed from ESP, but ESP changes across the function.
Unlike in DWARF, there is no FPO data to describe the stack adjustments
made to push arguments onto the stack and pop them off after the call,
which makes it hard for the debugger to find the local variables in
frames further up the stack.
To handle this, CodeView has a special VFRAME register, which
corresponds to the $T0 variable set by our FPO data in 32-bit. Offsets
to local variables are instead relative to this value.
This is part of PR38857.
Reviewers: hans, zturner, javed.absar
Subscribers: aprantl, hiraditya, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D52217
llvm-svn: 343543
Clang-cl was complaining about some sort of constexpr narrowing bug:
C:\src\llvm-project\llvm\lib\CodeGen\GlobalISel\CombinerHelper.cpp(136,31): error: non-constant-expression cannot be narrowed from type 'llvm::TargetOpcode::(anonymous enum at C:\src\llvm-project\llvm\include\llvm/CodeGen/TargetOpcodes.h:22:1)' to 'unsigned int' in initializer list [-Wc++11-narrowing]
unsigned(MI.getOpcode()) == unsigned(TargetOpcode::G_LOAD)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\src\llvm-project\llvm\lib\CodeGen\GlobalISel\CombinerHelper.cpp(136,31): note: insert an explicit cast to silence this issue
unsigned(MI.getOpcode()) == unsigned(TargetOpcode::G_LOAD)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
static_cast<unsigned int>(
llvm-svn: 343541
This includes a fix to prevent i16 compares with i32/i64 ands from being shrunk if bit 15 of the and is set and the sign bit is used.
Original commit message:
Currently we skip looking through truncates if the sign flag is used. But that's overly restrictive.
It's safe to look through the truncate as long as we ensure one of the 3 things when we shrink. Either the MSB of the mask at the shrunken size isn't set. If the mask bit is set then either the shrunk size needs to be equal to the compare size or the sign
There are still missed opportunities to shrink a load and fold it in here. This will be fixed in a future patch.
llvm-svn: 343539
Going from XForm Load to DSForm Load requires that the immediate be 4 byte
aligned.
If we are not aligned we must leave the load as LDX (XForm).
This bug is causing a compile-time failure in the benchmark h264ref.
Differential Revision: https://reviews.llvm.org/D51988
llvm-svn: 343525
This reverts commit r342387 as it's showing significant performance
regressions in a number of benchmarks. Followed up with the
committer and original thread with an example and will get performance
numbers before recommitting.
llvm-svn: 343522
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).
Differential Revision: https://reviews.llvm.org/D52125
llvm-svn: 343520
There's a subtle bug in the handling of truncate from i32/i64 to i32 without minsize.
I'll be adding more test cases and trying to find a fix.
llvm-svn: 343516
The pattern had a couple of problems:
- It was checking for loads of bytes in the reverse order to what it
should have been looking for.
- It would replace loads of bytes with a load of a word without making
sure that the alignment was correct.
Thanks to Eli Friedman for pointing it out.
llvm-svn: 343514
Currently it returns incorrect operand size for a target independet
node such as COPY if operand is a register with subreg. Instead of
correct subreg size it returns a size of the whole superreg.
Differential Revision: https://reviews.llvm.org/D52736
llvm-svn: 343508
These work a little differently because they are actually in
the globals stream and are treated as symbol records, even though
DIA presents them as types. So this also adds the necessary
infrastructure to cache records that live somewhere other than
the TPI stream as well.
llvm-svn: 343507
Summary:
The AsmParser Lexer regards these as a seperate token.
Here we expand the instruction name with them if they are
adjacent (no whitespace).
Tested: the basic-assembly.s test case has one case with a / in it.
The currently are also instructions with : in them, which we intend
to rename rather than fix them here.
Reviewers: tlively, dschuff
Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52442
llvm-svn: 343501
This patch adds load folding support to the test shrinking code. This was noticed missing in the review for D52669
Differential Revision: https://reviews.llvm.org/D52699
llvm-svn: 343499
Currently we skip looking through truncates if the sign flag is used. But that's overly restrictive.
It's safe to look through the truncate as long as we ensure one of the 3 things when we shrink. Either the MSB of the mask at the shrunken size isn't set. If the mask bit is set then either the shrunk size needs to be equal to the compare size or the sign flag needs to be unused.
There are still missed opportunities to shrink a load and fold it in here. This will be fixed in a future patch.
Differential Revision: https://reviews.llvm.org/D52669
llvm-svn: 343498
This fixes a case of bad index calculation when merging mismatching
vector types. This changes the existing code to just use the existing
extract_{subvector|element} and a bitcast (instead of bitcast first and
then newly created extract_xxx) so we don't need to adjust any indices
in the first place.
rdar://44584718
Differential Revision: https://reviews.llvm.org/D52681
llvm-svn: 343493
Summary:
This is a continuation of the fix for PR34627 "InstCombine assertion at vector gep/icmp folding". (I just realized bugpoint had fuzzed the original test for me, so I had fixed another trigger of the same assert in adjacent code in InstCombine.)
This patch avoids optimizing an icmp (to look only at the base pointers) when the resulting icmp would have a different type.
The patch adds a testcase and also cleans up and shrinks the pre-existing test for the adjacent assert trigger.
Reviewers: lebedev.ri, majnemer, spatel
Reviewed By: lebedev.ri
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52494
llvm-svn: 343486
This was originally committed at rL343407, but reverted at
rL343458 because it crashed trying to handle a case where
the destination type is FP. This version of the patch adds
a check for that possibility. Tests added at rL343480.
Original commit message:
This transform is requested for the backend in:
https://bugs.llvm.org/show_bug.cgi?id=39016
...but I figured it was worth doing in IR too, and it's probably
easier to implement here, so that's this patch.
In the simplest case, we are just truncating a scalar value. If the
extract index doesn't correspond to the LSBs of the scalar, then we
have to shift-right before the truncate. Endian-ness makes this tricky,
but hopefully the ASCII-art helps visualize the transform.
Differential Revision: https://reviews.llvm.org/D52439
llvm-svn: 343482
The first attempt at this transform:
rL343407
...was reverted:
rL343458
...because it did not handle the case where we bitcast to FP.
The patch was already limited to avoid the case where we
bitcast from FP, but we might want to transform that too.
llvm-svn: 343480
Summary:
Address fixme in r301762. And would simplify the cmake file in
clang-tools-extra.
Reviewers: sammccall
Subscribers: mgorny, llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D52713
llvm-svn: 343473
Summary: Allows for retrieving the type of a metadata node. Has the added benefit of ensuring that the C and C++ kind APIs stay in sync as a failure to add a corresponding LLVMMetadataKind will result in the switch in the accessor being semantically malformed.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52693
llvm-svn: 343469
Summary:
Reporting this as an error required stat()ing every file, as well as seeming
semantically questionable.
Reviewers: vsk, bkramer
Subscribers: mgrang, kristina, llvm-commits, liaoyuke
Differential Revision: https://reviews.llvm.org/D52648
llvm-svn: 343460
This caused Chromium builds to fail with "Illegal Trunc" assertion.
See https://crbug.com/890723 for repro.
> This transform is requested for the backend in:
> https://bugs.llvm.org/show_bug.cgi?id=39016
> ...but I figured it was worth doing in IR too, and it's probably
> easier to implement here, so that's this patch.
>
> In the simplest case, we are just truncating a scalar value. If the
> extract index doesn't correspond to the LSBs of the scalar, then we
> have to shift-right before the truncate. Endian-ness makes this tricky,
> but hopefully the ASCII-art helps visualize the transform.
>
> Differential Revision: https://reviews.llvm.org/D52439
llvm-svn: 343458
Summary: This is prelimineary to moving random functions to SnippetGenerator.
Reviewers: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D52718
llvm-svn: 343456
Summary: This change enables VOP3 shifts to be explicitly selected
dependent on the divergence.
Differential Revision: https://reviews.llvm.org/D52559
Reviewers: rampitec
llvm-svn: 343455
This patch adds another variant class to identify zero-idiom VPERM2F128rr
instructions.
On Jaguar, a VPERM wih bit 3 and 7 of the mask set, is a zero-idiom.
Differential Revision: https://reviews.llvm.org/D52663
llvm-svn: 343452
Summary: I had added support for compressing dwarf sections in a prior commit,
this one adds support for decompressing. Usage is:
llvm-objcopy --decompress-debug-sections input.o output.o
Reviewers: jakehehrlich, jhenderson, alexshap
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D51841
llvm-svn: 343451
Summary:
While looking at PR35606, I found out that the scheduling info is incorrect.
One can check that it's really a P5+P6 and not a 2*P56 with:
echo -e 'vzeroall\nvandps %xmm1, %xmm2, %xmm3' | ./bin/llvm-exegesis -mode=uops -snippets-file=-
(vandps executes on P5 only)
Reviewers: craig.topper, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52541
llvm-svn: 343447
When MachineCopyPropagation eliminates a dead 'copy', its associated debug information becomes invalid. as the recorded register has been removed. It causes the debugger to display wrong variable value.
Differential Revision: https://reviews.llvm.org/D52614
llvm-svn: 343445
We can only copy between a k-register and a GR32/GR64 register.
This patch detects that the copy will be illegal and prevents the domain reassignment from happening for that closure.
This probably isn't the best fix, and we should probably figure out how to handle this correctly.
Fixes PR38803.
llvm-svn: 343443
for the target machine.
This simplifies usage during setup of concurrent JIT stacks where the client
needs a DataLayout, but not a TargetMachine (TargetMachines are created on
the fly by the compile threads later).
llvm-svn: 343429
There's a conditional report_fatal_error just above this llvm_unreachable. The optimizer when seeing the unreachable removes the conditional and just makes any other error trigger the existing report_fatal_error.
llvm-svn: 343428
There are a few leftovers in rL343163 which span two lines. This commit
changes these llvm::sort(C.begin(), C.end, ...) to llvm::sort(C, ...)
llvm-svn: 343426
(1) Adds comments for the API.
(2) Removes the setArch method: This is redundant: the setArchStr method on the
triple should be used instead.
(3) Turns EmulatedTLS on by default. This matches EngineBuilder's behavior.
llvm-svn: 343423
Summary:
The lowering of PHI nodes used to detect if all inputs originated
from IMPLICIT_DEF's. If so the PHI node was replaced by an
IMPLICIT_DEF. Now we also consider undef uses when checking the
inputs. So if all inputs are implicitly defined or undef we
lower the PHI to an IMPLICIT_DEF. This makes
PHIElimination::LowerPHINode more consistent as it checks
both implicit and undef properties at later stages.
Reviewers: MatzeB, tstellar
Reviewed By: MatzeB
Subscribers: jvesely, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D52558
llvm-svn: 343417
Summary:
When PR16508 was solved (in rL185363) a regression test was
added as test/CodeGen/PowerPC/2013-07-01-PHIElimBug.ll.
I discovered that the test case no longer reproduced the
scenario from PR16508. This problem could have been amended
by adding an extra RUN line with "-O1" (or possibly "-O0"),
but instead I added a mir-reproducer
test/CodeGen/PowerPC/2013-07-01-PHIElimBug.mir
to get a reproducer that is less sensitive to changes in
earlier passes (including O-level).
While being at it I also corrected a code comment in
PHIElimination::EliminatePHINodes that has been incorrect
since the related bugfix from rL185363.
Reviewers: MatzeB, hfinkel
Reviewed By: MatzeB
Subscribers: nemanjai, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D52553
llvm-svn: 343416
This transform is requested for the backend in:
https://bugs.llvm.org/show_bug.cgi?id=39016
...but I figured it was worth doing in IR too, and it's probably
easier to implement here, so that's this patch.
In the simplest case, we are just truncating a scalar value. If the
extract index doesn't correspond to the LSBs of the scalar, then we
have to shift-right before the truncate. Endian-ness makes this tricky,
but hopefully the ASCII-art helps visualize the transform.
Differential Revision: https://reviews.llvm.org/D52439
llvm-svn: 343407
As noted in post-commit comments for D52548, the limitation on
increasing vector length can be applied by opcode.
As a first step, this patch only allows insertelement to be
widened because that has no logical downsides for IR and has
little risk of pessimizing codegen.
This may cause PR39132 to go into hiding during a full compile,
but that bug is not fixed.
llvm-svn: 343406
The SINT_TO_FP<->UINT_TO_FP combines for non-negative integers should only occur for legal ops once LegalOperations = true
No test case to hand, noticed when investigating PR38226 + PR38970
llvm-svn: 343405
Summary:
This function turns (X >> C1) & C2 into a BMI BEXTR or TBM BEXTRI instruction. For BMI BEXTR we have to materialize an immediate into a register to feed to the BEXTR instruction.
The BMI BEXTR instruction is 2 uops on Intel CPUs. It looks like on SKL its one port 0/6 uop and one port 1/5 uop. Despite what Agner's tables say. I know one of the uops is a regular shift uop so it would have to go through the port 0/6 shifter unit. So that's the same or worse execution wise than the shift+and which is one 0/6 uop and one 0/1/5/6 uop. The move immediate into register is an additional 0/1/5/6 uop.
For now I've limited this transform to AMD CPUs which have a single uop BEXTR. If may also might make sense if we can fold a load or if the and immediate is larger than 32-bits and can't be encoded as a sign extended 32-bit value or if LICM or CSE can hoist the move immediate and share it. But we'd need to look more carefully at that. In the regression I looked at it doesn't look load folding or large immediates were occurring so the regression isn't caused by the loss of those. So we could try to be smarter here if we find a compelling case.
Reviewers: RKSimon, spatel, lebedev.ri, andreadb
Reviewed By: RKSimon
Subscribers: llvm-commits, andreadb, RKSimon
Differential Revision: https://reviews.llvm.org/D52570
llvm-svn: 343399
We added support for dumping pointers but pointers to arrays
won't correctly dump until we add support for dumping arrays.
Instead of trying to dump everything, which this test isn't
even interested in, just dump enums and typedefs.
llvm-svn: 343398
CompileOnDemandLayer2 now supports user-supplied partition functions (the
original CompileOnDemandLayer already supported these).
Partition functions are called with the list of requested global values
(i.e. global values that currently have queries waiting on them) and have an
opportunity to select extra global values to materialize at the same time.
Also adds testing infrastructure for the new feature to lli.
llvm-svn: 343396
We didn't properly detect when a pointer was a member
pointer, and when that was the case we were not
properly returning class parent info. This caused
member pointers to render incorrectly in pretty mode.
However, we didn't even have pretty tests for pointers
in native mode, so those are also added now to ensure
this.
llvm-svn: 343393
By removing demanded target shuffles that simplify to zero/undef/identity before simplifying its inputs we improve chances of further simplification, as only the immediate parent user of the combined is added back to the work list - this still doesn't help us if its passed through other ops though (bitcasts....).
llvm-svn: 343390
The shift amount might have peeked through a extract_subvector, altering the number of vector elements in the 'Amt' variable - so we were incorrectly calculating the ratio when peeking through bitcasts, resulting in incorrectly detecting splats.
llvm-svn: 343373
This makes it available for use in IRTransformLayer2::TransformFunction
instances (since a const MaterializationResponsibility& parameter was
added in r343365).
llvm-svn: 343367
(1) A const accessor for the LLVMContext held by a ThreadSafeContext.
(2) A const accessor for the ThreadSafeModules held by an IRMaterializationUnit.
(3) A const MaterializationResponsibility reference to IRTransformLayer2's
transform function. This makes IRTransformLayer2 useful for JIT debugging
(since it can inspect JIT state through the responsibility argument) as well
as program transformations.
llvm-svn: 343365
Summary:
This CL allows constant vectors of floats to be recognized as non-NaN
and non-zero in select patterns. This change makes
`matchSelectPattern` more powerful generally, but was motivated
specifically because I wanted fminnan and fmaxnan to be created for
vector versions of the scalar patterns they are created for.
Tested with check-all on all targets. A testcase in the WebAssembly
backend that tests the non-nan codepath is in an upcoming CL.
Reviewers: aheejin, dschuff
Subscribers: sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52324
llvm-svn: 343364
Summary: Before this, there was no reasonable way to retrieve the type of a global value (most notably, a function) that was created with the C API.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52659
llvm-svn: 343363
Correctly check for relocations in the constant to promote. And don't
allow promoting a constant multiple times.
This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ;
it's not a complete fix because we also need to prevent
ARMConstantIslands from cloning the constant.
(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)
Differential Revision: https://reviews.llvm.org/D51472
llvm-svn: 343361
This mostly affects IR generated by non-clang frontends because clang
generally sets the alignment of globals explicitly.
Fixes https://bugs.llvm.org//show_bug.cgi?id=32394 .
(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)
Differential Revision: https://reviews.llvm.org/D51469
llvm-svn: 343359
The code in X86ISelDAGToDAG only looks through truncates if the sign flag isn't used, but that is overly restrictive. A future patch will improve this.
llvm-svn: 343355
Split the `zcz` feature into specific ones got GP and FP registers, `zcz-gp`
and `zcz-fp`, respectively, while retaining the original feature option to
mean both.
Differential revision: https://reviews.llvm.org/D52621
llvm-svn: 343354
Always generating a temporary file is not always suitable, especially for tests
Differential Revision: https://reviews.llvm.org/D52636
llvm-svn: 343351
We don't correctly model the latency and resource usage information for
zero-idiom VPERM2F128rr on Jaguar.
This is demonstrated by the incorrect numbers in the resource pressure view, and
the timeline view.
A follow up patch will fix this problem.
llvm-svn: 343346
Summary: This patch adds bindings to C and Go for addCoroutinePassesToExtensionPoints, which is used to add coroutine passes to the correct locations in PassManagerBuilder.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: mehdi_amini, modocache, llvm-commits
Differential Revision: https://reviews.llvm.org/D51642
llvm-svn: 343336
If any prefixes have been specified on the RUN lines that do not end up
ever actually getting printed, raise an Error. This is either an
indication that the run lines just need cleaning up, or that something
is more fundamentally wrong with the test.
Also raise an Error if there are any blocks which cannot be checked
because they are not uniquely covered by a prefix.
Fixed up a couple of tests where the extra checking flagged up issues.
Differential Revision: https://reviews.llvm.org/D48276
llvm-svn: 343332
Insert empty blocks to cause the positions of matching blocks to match
across lists where possible so that later stages of the algorithm can
actually identify them as being identical.
Regenerated all tests with this change.
Differential Revision: https://reviews.llvm.org/D52560
llvm-svn: 343331