Summary:
One for register based, much like the existing definitions,
and one for stack based (suffix _S).
This allows us to use registers in most of LLVM (which works better),
and stack based in MC (which results in a simpler and more readable
assembler / disassembler).
Tried to keep this change as small as possible while passing tests,
follow-up commit will:
- Add reg->stack conversion in MI.
- Fix asm/disasm in MC to be stack based.
- Fix emitter to be stack based.
tests passing:
llvm-lit -v `find test -name WebAssembly`
test/CodeGen/WebAssembly
test/MC/WebAssembly
test/MC/Disassembler/WebAssembly
test/DebugInfo/WebAssembly
test/CodeGen/MIR/WebAssembly
test/tools/llvm-objdump/WebAssembly
Reviewers: dschuff, sbc100, jgravelle-google, sunfish
Subscribers: aheejin, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48183
llvm-svn: 334985
Summary: Refactoring for all constant cases which require AllowNewConst and some staging for future fmf usage.
Reviewers: spatel, hfinkel, wristow
Reviewed By: spatel
Subscribers: nhaehnle
Differential Revision: https://reviews.llvm.org/D48289
llvm-svn: 334984
This patch uses the DiagnosticPredicate for SVE predicate patterns
to improve their diagnostics, now giving a 'invalid operand' diagnostic
if the type is not an immediate or one of the expected pattern
labels.
Reviewers: samparker, SjoerdMeijer, javed.absar, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D48220
llvm-svn: 334983
The variants added by this patch are:
- SQINC signed increment, e.g. sqinc x0, w0, all, mul #4
- SQDEC signed decrement, e.g. sqdec x0, w0, all, mul #4
- UQINC unsigned increment, e.g. uqinc w0, all, mul #4
- UQDEC unsigned decrement, e.g. uqdec w0, all, mul #4
This patch includes asmparser changes to parse a GPR64 as a GPR32 in
order to satisfy the constraint check:
x0 == GPR64(w0)
in:
sqinc x0, w0, all, mul #4
^___^ (must match)
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47716
llvm-svn: 334980
2 of these tests were clearly not doing what the comments
said they were doing.
The last test was added at rL177933 with no assertions
(presumably it used to crash). But either we don't have
that problem anymore, or this test is folded sooner,
so we don't hit the bug that was fixed by disabling late
FP constant creation. Looking at this as part of reviewing
D48289.
llvm-svn: 334977
Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.
llvm-svn: 334971
This patch introduces a VPInstructionToVPRecipe transformation, which
allows us to generate code for a VPInstruction based VPlan re-using the
existing infrastructure.
Reviewers: dcaballe, hsaito, mssimpso, hfinkel, rengolin, mkuper, javed.absar, sguggill
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D46827
llvm-svn: 334969
CompileOnDemandLayer2 is a replacement for CompileOnDemandLayer built on the ORC
Core APIs. Functions in added modules are extracted and compiled lazily.
CompileOnDemandLayer2 supports multithreaded JIT'd code, and compilation on
multiple threads.
llvm-svn: 334967
materializing weak symbols as strong.
This removes some elaborate flag tweaking and plays nicer with RuntimeDyld,
which relies of weak/common flags to determine whether it should emit a given
weak definition. (Switching to strong up-front makes it appear as if there is
already an overriding definition, which would require an extra back-channel to
override).
llvm-svn: 334966
Ensure we keep track of the input vectors in all cases instead of just for SK_Select.
Ideally we'd reuse the shuffle mask pattern matching in TargetTransformInfo::getInstructionThroughput here to easily add support for all TargetTransformInfo::ShuffleKind without mass code duplication, I've added a TODO for now but D48236 should help us here.
Differential Revision: https://reviews.llvm.org/D48023
llvm-svn: 334958
When the destination register of a XOP instruction is an XMM register, bits
[255:128] of the corresponding YMM register are cleared.
When the destination register of a EVEX encoded instruction is an XMM/YMM
register, the upper bits of the corresponding ZMM are cleared.
On processors that feature AVX512, a write to an XMM registers always clears the
upper portion of the corresponding ZMM register if the instruction is VEX or
EVEX encoded.
These new tests show some interesting cases which aren't correctly analyzed by
llvm-mca. The lack of knowledge related to the implicit update on the
super-registers is addressed by D48225.
llvm-svn: 334945
Allow a tied operand of a different operand class in InstAliases,
so that the operand can be printed (and added to the MC instruction)
as the appropriate register. For example, 'GPR64as32', which would
be printed/parsed as a 32bit register and should match a tied 64bit
register operand, where the former is a sub-register of the latter.
This patch also generalizes the constraint checking to an overrideable
method in MCTargetAsmParser, so that target asmparsers can specify
whether a given operand satisfies the tied register constraint.
Reviewers: olista01, rengolin, fhahn, SjoerdMeijer, samparker, dsanders, craig.topper
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47714
llvm-svn: 334942
This patch adds instructions for comparing elements from two vectors, e.g.
cmpgt p0.s, p0/z, z0.s, z1.s
and also adds support for comparing to a 64-bit wide element vector, e.g.
cmpgt p0.s, p0/z, z0.s, z1.d
The patch also contains aliases for certain comparisons, e.g.:
cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s
cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s
cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s
cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s
llvm-svn: 334931
Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables.
Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store have a register and memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0.
This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted.
This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method.
llvm-svn: 334925
We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions.
Also remove the vpextrw.s EVEX alias. That's not something gas implements.
llvm-svn: 334922
The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.
Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to state where only EVEX was supported.
llvm-svn: 334920
Unlike CodeGenInstruction, CodeGenInstAlias was flatting asm strings in its constructor. For instructions it was the users responsibility to flatten the string.
AsmMatcherEmitter didn't know this and treated them the same. This caused double flattening of InstAliases. This is mostly harmless unless the desired assembly string contains curly braces. The second flattening wouldn't know to ignore these and would remove the curly braces. And for variant 1 it would remove the contents of them as well.
To mitigate this, this patch makes removes the flattening from the CodeGenIntAlias constructor and modifies AsmWriterEmitter to account for the flattening not having been done.
llvm-svn: 334919
These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.
llvm-svn: 334915
symbols in debug mode.
The MaterializationResponsibility class hijacks the Materializing flag to track
symbols that have not yet been resolved in order to guard against redundant
resolution. Since this is an API contract check and only enforced in debug mode
there is no reason to maintain the flag state in release mode.
llvm-svn: 334909
Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.
This fixed at least fma-scalar-memfold.ll to succed without the peephole pass.
llvm-svn: 334908
There are a lot of instructions to add under these ISAs (and the other AVX512 variants) but this should demonstrate how to test for the EVEX instructions with different maskings
llvm-svn: 334907
Support for SVE's predicated select instructions to select elements
from either vector, both in a data-vector and a predicate-vector
variant.
llvm-svn: 334905
We don't want to prevent inlining because of target-cpu and -features
attributes that were added to newer versions of LLVM/Clang: There are
no incompatible functions in PTX, ptxas will throw errors in such cases.
Differential Revision: https://reviews.llvm.org/D47691
llvm-svn: 334904
These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td fie. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority.
llvm-svn: 334897
VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.
VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.
llvm-svn: 334896
Summary:
We only modify CFG in a couple of places, and we can preserve DT there
with a little effort.
Reviewers: davide, vsk
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48059
llvm-svn: 334895
DominatorTreeBase::getNode does not modify its parameter and this change
allows callers that only have access to const pointers to use it without
casting.
Reviewers: kuhar, dblaikie, chandlerc
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D48231
llvm-svn: 334892
This patch adds a simple const_iterator implementation for SmallSet by
delegating to either a SmallVector::const_iterator or
std::set::const_iterator, depending on which storage is used by the
SmallSet.
Reviewers: dblaikie, craig.topper
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D47942
llvm-svn: 334887
This is the common case in the BE when we serialize condition and then
rematerialize it. Use either original or inverted condition.
Differential Revision: https://reviews.llvm.org/D48246
llvm-svn: 334882
Relanding after fixing expensive check from modifying tables.
To avoid redundant work, during DAG legalization we keep tables
mapping pre-legalized SDValues to post-legalized SDValues and a
SDValue-to-SDValue map to enable fast node replacements. However, as
the keys are nodes which may be reused it is possible that an entry in
a table refers to a now deleted node N (that should have been renamed
by the value replacement map) while a new node N' exists. If N' is
then replaced that entry would be wrong. Previously we avoided this by
when potentially violating this property, walking every table and
updating all node pointers. This is very expensive but hopefully rare
occurance.
This patch assigns each instance of a SDValue used in legalization a
unique id and uses these ids in the legalization tables. This avoids
any such aliasing issue, avoiding the full table search and allowing
more aggressive incremental table pruning.
In some cases this is a 1000x speedup to compilation.
Reviewers: jyknight, echristo, bogner, tra
Reviewed By: bogner
Subscribers: dberris, grandinj, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47959
llvm-svn: 334880
Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai
Reviewed By: rampitec, nhaehnle
Subscribers: tpr, nemanjai, wdng
Differential Revision: https://reviews.llvm.org/D47918
llvm-svn: 334876
Summary:
Sending for presubmit review out of an abundance of caution; it would be
bad to mess this up.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48238
llvm-svn: 334875
Summary:
Obviates the need for mask/clear/setFlags helpers.
There are some expressions here which can be simplified, but to keep
this easy to review, I have not simplified them in this patch.
No functional change.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48237
llvm-svn: 334874
So far, we've only handled special cases of PatFrag like ImmLeaf. This patch
adds support for the remaining cases using similar mechanisms.
Like most C++ code from SelectionDAG, GISel and DAGISel expect to operate on
different types and representations and as such the code is not compatible
between the two. It's therefore necessary to add an alternative implementation
in the GISelPredicateCode field.
The target test for this feature could easily be done with IntImmLeaf and this
would save on a little boilerplate. The reason I've chosen to implement this
using PatFrag.GISelPredicateCode and not IntImmLeaf is because I was unable to
find a rule that was blocked solely by lack of support for PatFrag predicates. I
found that the ones I investigated as being likely candidates for the test
were further blocked by other things.
llvm-svn: 334871
Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we don't be suprised in the future.
llvm-svn: 334867
Modify ExpandStrictFPOp(...) to handle nodes that have scalar
operands.
Also, add a Strict FMA test and do some other light cleanup in the
Strict FP code.
Differential Revision: https://reviews.llvm.org/D48149
llvm-svn: 334863
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng, nhaehnle
Differential Revision: https://reviews.llvm.org/D47954
llvm-svn: 334862
Summary:
Using associated metadata rather than llvm.used allows linkers to
perform dead stripping with -fsanitize-coverage=pc-table. Unfortunately
in my local tests, LLD was the only linker that made use of this metadata.
Partially addresses https://bugs.llvm.org/show_bug.cgi?id=34636 and fixes
https://github.com/google/sanitizers/issues/971.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: Dor1s, hiraditya, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D48203
llvm-svn: 334858
Enables using the high and high-adjusted symbol modifiers on thread local
storage modifers in powerpc assembly. Needed to be able to support 64 bit
thread-pointer and dynamic-thread-pointer access sequences.
Differential Revision: https://reviews.llvm.org/D47754
llvm-svn: 334856
Add support for the "@high" and "@higha" symbol modifiers in powerpc64 assembly.
The modifiers represent accessing the segment consiting of bits 16-31 of a
64-bit address/offset.
Differential Revision: https://reviews.llvm.org/D47729
llvm-svn: 334855
redundant-vf2-cost.ll is X86 specific. Moved from
test/Transforms/LoopVectorize/redundant-vf2-cost.ll to
test/Transforms/LoopVectorize/X86/redundant-vf2-cost.ll
llvm-svn: 334854
Added a Generic x86 cpu set of resource tests to allow us to check all ISAs.
We currently use SandyBridge as our generic CPU model, but it's better if we actually duplicate these tests for if/when we change the model, it also means we don't end up polluting the SandyBridge folder with tests for ISAs it doesn't support.
llvm-svn: 334853
An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too.
llvm-svn: 334848
When coalescing a small register into a subregister of a larger register,
if the larger register is rematerialized, the function updateRegDefUses
can add an <undef> flag to the rematerialized definition (since it's
treating it as only definining the coalesced subregister). While with that
assumption doing so is not incorrect, make sure to remove the flag later
on after the call to updateRegDefUses.
llvm-svn: 334845
Summary:
When iterating users of a multiply in processUMulZExtIdiom, the
call to setOperand in the truncation case may replace the use
being visited; make sure the iterator has been advanced before
doing that replacement.
Reviewers: majnemer, davide
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48192
llvm-svn: 334844
We were unnecessarily going from SmallString to std::string just to
get a null-terminated C string. So just...don't do that. Crash
slightly faster!
llvm-svn: 334841
This is a minor fix for LV cost model, where the cost for VF=2 was
computed twice when the vectorization of the loop was forced without
specifying a VF.
Reviewers: xusx595, hsaito, fhahn, mkuper
Reviewed By: hsaito, xusx595
Differential Revision: https://reviews.llvm.org/D48048
llvm-svn: 334840
Increment/decrement scalar register by (scaled) element count given by
predicate pattern, e.g. 'incw x0, all, mul #4'.
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D47713
llvm-svn: 334838
Try to access pieces 4 bytes at a time. This helps
various hasOneUse extract_vector_elt combines, such
as load width reductions.
Avoids test regressions in a future commit.
llvm-svn: 334836
Summary:
While that is indeed a quite interesting summary stat,
there are cases where it does not really add anything
other than consuming extra lines.
Declutters the output of D48190.
Reviewers: RKSimon, andreadb, courbet, craig.topper
Reviewed By: andreadb
Subscribers: javed.absar, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D48209
llvm-svn: 334833
Summary:
There does not seem to be any other tests for this.
Split off from D47676.
Reviewers: RKSimon, craig.topper, courbet, andreadb
Reviewed By: andreadb
Subscribers: javed.absar, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D48190
llvm-svn: 334832
This is r334704 (which was reverted in r334732) with a fix for
types like x86_fp80. We need to use getTypeAllocSizeInBits and
not getTypeStoreSizeInBits to avoid dropping debug info for
such types.
Original commit msg:
> Summary:
> Do not convert a DbgDeclare to DbgValue if the store
> instruction only refer to a fragment of the variable
> described by the DbgDeclare.
>
> Problem was seen when for example having an alloca for an
> array or struct, and there were stores to individual elements.
> In the past we inserted a DbgValue intrinsics for each store,
> just as if the store wrote the whole variable.
>
> When handling store instructions we insert a DbgValue that
> indicates that the variable is "undefined", as we do not know
> which part of the variable that is updated by the store.
>
> When ConvertDebugDeclareToDebugValue is used with a load/phi
> instruction we assert that the referenced value is large enough
> to cover the whole variable. Afaict this should be true for all
> scenarios where those methods are used on trunk. If the assert
> blows in the future I guess we could simply skip to insert a
> dbg.value instruction.
>
> In the future I think we should examine which part of the variable
> that is accessed, and add a DbgValue instrinsic with an appropriate
> DW_OP_LLVM_fragment expression.
>
> Reviewers: dblaikie, aprantl, rnk
>
> Reviewed By: aprantl
>
> Subscribers: JDevlieghere, llvm-commits
>
> Tags: #debug-info
>
> Differential Revision: https://reviews.llvm.org/D48024
llvm-svn: 334830
Some instructions require of a limited set of FP immediates as operands,
for example '#0.5 or #1.0' for SVE's FADD instruction.
This patch adds support for parsing and printing such FP immediates as
exact values (e.g. #0.499999 is not accepted for #0.5).
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D47711
llvm-svn: 334826
Summary:
We already do it for splat constants, but not just values.
Also, undef cases are mostly non-functional.
The original commit was reverted because
it broke tests for amdgpu backend, which i didn't check.
Now, the backed was updated to recognize these new
patterns, so we are good.
https://bugs.llvm.org/show_bug.cgi?id=37603https://rise4fun.com/Alive/cplX
Reviewers: spatel, craig.topper, mareko, bogner, rampitec, nhaehnle, arsenm
Reviewed By: spatel, rampitec, nhaehnle
Subscribers: wdng, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D47980
llvm-svn: 334818
Summary: The same pattern as D48010, but this one is IR-canonical as of D47428.
Reviewers: nhaehnle, bogner, tstellar, arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #amdgpu
Differential Revision: https://reviews.llvm.org/D48012
llvm-svn: 334817
Summary:
As a followup for D48007.
Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern,
which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`),
i think also handling a pattern that is ub for `y == bitwidth` should be fine.
Reviewers: nhaehnle, bogner, tstellar, arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #amdgpu
Differential Revision: https://reviews.llvm.org/D48010
llvm-svn: 334816
Summary:
D47980 will canonicalize the `x << (32 - y) >> (32 - y)`,
which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`,
which is not recognized by AMDGPU.
Thus, it needs to be recognized, too.
Reviewers: nhaehnle, bogner, tstellar, arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #amdgpu
Differential Revision: https://reviews.llvm.org/D48007
llvm-svn: 334815
Instruction bundling is only supported on descendants of the
MCEncodedFragment type. By moving the bundling functionality and
MCSubtargetInfo to this class it makes it easier to set and extract the
MCSubtargetInfo when it is necessary.
This is a refactoring change that will make it easier to pass the
MCSubtargetInfo through to writeNops when nop padding is required.
Differential Revision: https://reviews.llvm.org/D45959
llvm-svn: 334814
Summary:
On hover, the whole asm snippet is displayed, including operands.
This requires the actual assembly output instead of just the MCInsts:
This is because some pseudo-instructions get lowered to actual target
instructions during codegen (e.g. ABS_Fp32 -> SSE or X87).
Reviewers: gchatelet
Subscribers: mgorny, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D48164
llvm-svn: 334805
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.
There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.
llvm-svn: 334800
LLVM currently assumes that Apple platforms will always use ld64. In the
future, LLD Mach-O might also be supported, so add the beginnings of
linker detection support. ld64 is currently the only detected linker,
since `ld64.lld -v` doesn't yield any useful version output, but we can
add that detection later, and in the meantime it's still useful to have
the ld64 identification.
Switch clang's order file check to use this new detection rather than
just checking for the presence of an ld64 executable.
Differential Revision: https://reviews.llvm.org/D48201
llvm-svn: 334780
IEEE 754 defines the expected result on overflow. As far as I know,
hardware implementations (of f16), and compiler-rt (__floatuntisf)
correctly return +-Inf on overflow. And I can't think of any useful
transform that would take advantage of overflow being undefined here.
Differential Revision: https://reviews.llvm.org/D47807
llvm-svn: 334777
Once a symbol has been selected for materialization it can no longer be
overridden. Stripping the weak flag guarantees this (override attempts will
then be treated as duplicate definitions and result in a DuplicateDefinition
error).
llvm-svn: 334771
Summary:
Here we relax the old constraint which utilized unsafe with the TargetOption flag HonorSignDependentRoundingFPMathOption, with the assertion that unsafe is no longer needed or never was required for correctness on FDIV/FMUL.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar
Reviewed By: spatel
Subscribers: efriedma, wdng, tpr
Differential Revision: https://reviews.llvm.org/D48057
llvm-svn: 334769
The return value of TreePatternNode::getChild is never null. This patch also
updates various places that use return values of getChild to also use
references. Those changes were suggested post-commit for D47463.
llvm-svn: 334764
In particular, when asked to print a MemoryAccess, we'll now print where
defs are optimized to, and we'll print optimized access types.
This patch also introduces an operator<< to make printing AliasResults
easier.
Patch by Juneyoung Lee!
Differential Revision: https://reviews.llvm.org/D47860
llvm-svn: 334760
isVectorClearMaskLegal() is the TLI hook used by the generic
DAGCombiner::XformToShuffleWithZero().
We've grown to accomodate/expect this transform to shuffle
(disabling it more generally results in many regressions).
So I'm narrowly excluding the 256-bit types that clearly
are not worthwhile for AVX1.
I think in most cases we are able to recover by converting
the shuffle back into 'and' ops, but the cases in:
https://bugs.llvm.org/show_bug.cgi?id=37749
...show that there are cracks.
llvm-svn: 334759
This is r334750 (which was reverted in r334754) with a fix for an
uninitialized variable that was caught by msan.
Original commit message:
> If a copy bundle happens to involve overlapping registers, we can end
> up with emitting the copies in an order that ends up clobbering some
> of the subregisters. Since instructions in the copy bundle
> semantically happen at the same time, this is incorrect and we need to
> make sure we order the copies such that this doesn't happen.
llvm-svn: 334756
Summary: A FMF constraint is added to FADD with unsafe still available as the fallback
Reviewers: spatel, wristow, arsenm, hfinkel
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D48180
llvm-svn: 334753
WebAssembly doesn't support more than one function per section
and we rely on function sections being unique. This change ignores
the section provided by the function to avoid two functions being
in the same section.
Without this change the object writer produces the following
error for this test:
LLVM ERROR: section already has a defining function: baz
Differential Revision: https://reviews.llvm.org/D48178
llvm-svn: 334752
If a copy bundle happens to involve overlapping registers, we can end
up with emitting the copies in an order that ends up clobbering some
of the subregisters. Since instructions in the copy bundle
semantically happen at the same time, this is incorrect and we need to
make sure we order the copies such that this doesn't happen.
Differential Revision: https://reviews.llvm.org/D48154
llvm-svn: 334750
On MacOS, if CMAKE_OSX_SYSROOT is used and the user has command line tools
installed, we currently get the include path for libxml2 as
/usr/include/libxml2, instead of ${CMAKE_OSX_SYSROOT}/usr/include/libxml2.
Make it consistent on MacOS by prefixing ${CMAKE_OSX_SYSROOT} when
possible.
rdar://problem/41103601
llvm-svn: 334746
On x86-64, a write to register EAX implicitly clears the upper half or RAX.
128-bit AVX instructions clear the upper 128-bit of the YMM register that
aliases the XMM definition register.
llvm-mca doesn't know about register writes that implicitly clear the upper
portion of an aliasing super-register. This issue will be fixed in a future patch.
llvm-svn: 334742
Summary:
Specifically, we transform
zext(2^K * (trunc X to iN)) to iM ->
2^K * (zext(trunc X to i{N-K}) to iM)<nuw>
This is helpful because pulling the 2^K out of the zext allows further
optimizations.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits, timshen
Differential Revision: https://reviews.llvm.org/D48158
llvm-svn: 334737
Summary:
Previously we would do this simplification only if it did not introduce
any new truncs (excepting new truncs which replace other cast ops).
This change weakens this condition: If the number of truncs stays the
same, but we're able to transform trunc(X + Y) to X + trunc(Y), that's
still simpler, and it may open up additional transformations.
While we're here, also clean up some duplicated code.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48160
llvm-svn: 334736
This reverts rL331412. We didn't up using fragment atoms
in the wasm object writer after all.
Differential Revision: https://reviews.llvm.org/D48173
llvm-svn: 334734
This test checks that a physical register is correctly allocated for the partial
write to register BX.
The ADD instruction has to wait for the write to RBX (and BX) before being
executed.
llvm-svn: 334730
To avoid redundant work, during DAG legalization we keep tables
mapping pre-legalized SDValues to post-legalized SDValues and a
SDValue-to-SDValue map to enable fast node replacements. However, as
the keys are nodes which may be reused it is possible that an entry in
a table refers to a now deleted node N (that should have been renamed
by the value replacement map) while a new node N' exists. If N' is
then replaced that entry would be wrong. Previously we avoided this by
when potentially violating this property, walking every table and
updating all node pointers. This is very expensive but hopefully rare
occurance.
This patch assigns each instance of a SDValue used in legalization a
unique id and uses these ids in the legalization tables. This avoids
any such aliasing issue, avoiding the full table search and allowing
more aggressive incremental table pruning.
In some cases this is a 1000x speedup to compilation.
Reviewers: jyknight, echristo, bogner, tra
Reviewed By: bogner
Subscribers: dberris, grandinj, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47959
llvm-svn: 334729
If WaitUntilReady is set to true then blockingLookup will return once all
requested symbols are ready. If WaitUntilReady is set to false then
blockingLookup will return as soon as all requested symbols have been
resolved. In the latter case, if any error occurs in finalizing the symbols it
will be reported to the ExecutionSession, rather than returned by
blockingLookup.
llvm-svn: 334722
In some cases, for example when compiling a preprocessed file, the
front-end is not able to provide an MD5 checksum for all files. When
that happens, omit the MD5 checksums from the final DWARF, because
DWARF doesn't have a way to indicate that some but not all files have
a checksum.
When assembling a .s file, and some but not all .file directives
provide an MD5 checksum, issue a warning and don't emit MD5 into the
DWARF.
Fixes PR37623.
Differential Revision: https://reviews.llvm.org/D48135
llvm-svn: 334710
This patches teaches EarlyCSE to figure out that if `and i1 %x, %y` is true then both
`%x` and `%y` are true in the taken branch, and if `or i1 %x, %y` is false then both
`%x` and `%y` are false in non-taken branch. Fix for PR37635.
Differential Revision: https://reviews.llvm.org/D47574
Reviewed By: reames
llvm-svn: 334707
Summary:
Do not convert a DbgDeclare to DbgValue if the store
instruction only refer to a fragment of the variable
described by the DbgDeclare.
Problem was seen when for example having an alloca for an
array or struct, and there were stores to individual elements.
In the past we inserted a DbgValue intrinsics for each store,
just as if the store wrote the whole variable.
When handling store instructions we insert a DbgValue that
indicates that the variable is "undefined", as we do not know
which part of the variable that is updated by the store.
When ConvertDebugDeclareToDebugValue is used with a load/phi
instruction we assert that the referenced value is large enough
to cover the whole variable. Afaict this should be true for all
scenarios where those methods are used on trunk. If the assert
blows in the future I guess we could simply skip to insert a
dbg.value instruction.
In the future I think we should examine which part of the variable
that is accessed, and add a DbgValue instrinsic with an appropriate
DW_OP_LLVM_fragment expression.
Reviewers: dblaikie, aprantl, rnk
Reviewed By: aprantl
Subscribers: JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D48024
llvm-svn: 334704
This is part of the work to cleanup use of 'alternate' ops so we can use the more general SK_Select shuffle type.
Only getSameOpcode calls getMainOpcode and much of the logic is repeated in both functions. This will require some reworking of D28907 but that patch has hit trouble and is unlikely to be completed anytime soon.
Differential Revision: https://reviews.llvm.org/D48120
llvm-svn: 334701
Summary:
Get rid of OpcodeName.
To remove the opcode name from an old file:
```
cat old_file | sed '/opcode_name.*/d'
```
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D48121
llvm-svn: 334691
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.
This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll
Reviewers: RKSimon, gbedwell, spatel
Reviewed By: spatel
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D47993
llvm-svn: 334685
Summary: This patch transforms the Scheduler class into the ExecuteStage. Most of the logic remains.
Reviewers: andreadb, RKSimon, courbet
Reviewed By: andreadb
Subscribers: mgorny, javed.absar, tschuett, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D47246
llvm-svn: 334679
This is failing to compile when LLVM_ENABLE_THREADS is false,
and the fix is not immediately obvious, so reverting while I look
into it.
llvm-svn: 334658
When using clang --save-stats -mllvm -time-passes, both timers and stats
end up in the same json file.
We could end up with things like:
{
"asm-printer.EmittedInsts": 1,
"time.pass.Virtual Register Map.wall": 2.9015541076660156e-04,
"time.pass.Virtual Register Map.user": 2.0500000000000379e-04,
"time.pass.Virtual Register Map.sys": 8.5000000000001741e-05,
}
This patch makes use of the pass argument name (if available) in the
JSON key to end up with things like:
{
"asm-printer.EmittedInsts": 1,
"time.pass.virtregmap.wall": 2.9015541076660156e-04,
"time.pass.virtregmap.user": 2.0500000000000379e-04,
"time.pass.virtregmap.sys": 8.5000000000001741e-05,
}
This also helps avoiding to write another JSON printer to handle all the
cases that we could have in our pass names.
Fixed test instead of adding a new one originally from r334649.
Differential Revision: https://reviews.llvm.org/D48109
llvm-svn: 334657
Such globals are very likely to be part of a sorted section array, such
the .CRT sections used for dynamic initialization. The uses its own
sorted sections called ATL$__a, ATL$__m, and ATL$__z. Instead of special
casing them, just look for the dollar sign, which is what invokes linker
section sorting for COFF.
Avoids issues with ASan and the ATL uncovered after we started
instrumenting comdat globals on COFF.
llvm-svn: 334653
When using clang --save-stats -mllvm -time-passes, both timers and stats
end up in the same json file.
We could end up with things like:
{
"asm-printer.EmittedInsts": 1,
"time.pass.Virtual Register Map.wall": 2.9015541076660156e-04,
"time.pass.Virtual Register Map.user": 2.0500000000000379e-04,
"time.pass.Virtual Register Map.sys": 8.5000000000001741e-05,
}
This patch makes use of the pass argument name (if available) in the
JSON key to end up with things like:
{
"asm-printer.EmittedInsts": 1,
"time.pass.virtregmap.wall": 2.9015541076660156e-04,
"time.pass.virtregmap.user": 2.0500000000000379e-04,
"time.pass.virtregmap.sys": 8.5000000000001741e-05,
}
This also helps avoiding to write another JSON printer to handle all the
cases that we could have in our pass names.
Differential Revision: https://reviews.llvm.org/D48109
llvm-svn: 334649
Previously ThreadPool could only queue async "jobs", i.e. work
that was done for its side effects and not for its result. It's
useful occasionally to queue async work that returns a value.
From an API perspective, this is very intuitive. The previous
API just returned a shared_future<void>, so all we need to do is
make it return a shared_future<T>, where T is the type of value
that the operation returns.
Making this work required a little magic, but ultimately it's not
too bad. Instead of keeping a shared queue<packaged_task<void()>>
we just keep a shared queue<unique_ptr<TaskBase>>, where TaskBase
is a class with a pure virtual execute() method, then have a
templated derived class that stores a packaged_task<T()>. Everything
else works out pretty cleanly.
Differential Revision: https://reviews.llvm.org/D48115
llvm-svn: 334643
Returning optional is much safer.
The previous API had potential to cause use of undefined variables, if
the value passed by pointer was accidentally read afterwards.
Differential Revision: https://reviews.llvm.org/D48137
llvm-svn: 334634
Fixes PR37790.
In some (very rare) cases, the LSUnit (Load/Store unit) was wrongly marking a
load (or store) as "ready to execute" effectively bypassing older memory barrier
instructions.
To reproduce this bug, the memory barrier must be the first instruction in the
input assembly sequence, and it doesn't have to perform any register writes.
llvm-svn: 334633
On Windows we've observed that if you open a file, write to it, map it into
memory and close the file handle, the contents of the memory mapping can
sometimes be incorrect. That was what we did when adding an entry to the
ThinLTO cache using the TempFile and MemoryBuffer classes, and it was causing
intermittent build failures on Chromium's ThinLTO bots on Windows. More
details are in the associated Chromium bug (crbug.com/786127).
We can prevent this from happening by keeping a handle to the file open while
the mapping is active. So this patch changes the mapped_file_region class to
duplicate the file handle when mapping the file and close it upon unmapping it.
One gotcha is that the file handle that we keep open must not have been
created with FILE_FLAG_DELETE_ON_CLOSE, as otherwise the operating system
will prevent other processes from opening the file. We can achieve this
by avoiding the use of FILE_FLAG_DELETE_ON_CLOSE altogether. Instead,
we use SetFileInformationByHandle with FileDispositionInfo to manage the
delete-on-close bit. This lets us remove the hack that we used to use to
clear the delete-on-close bit on a file opened with FILE_FLAG_DELETE_ON_CLOSE.
A downside of using SetFileInformationByHandle/FileDispositionInfo as
opposed to FILE_FLAG_DELETE_ON_CLOSE is that it prevents us from using
CreateFile to open the file while the flag is set, even within the same
process. This doesn't seem to matter for almost every client of TempFile,
except for LockFileManager, which calls sys::fs::create_link to create a
hard link from the lock file, and in the process of doing so tries to open
the file. To prevent this change from breaking LockFileManager I changed it
to stop using TempFile by effectively reverting r318550.
Differential Revision: https://reviews.llvm.org/D48051
llvm-svn: 334630
Currently the handle type is a global pointer which holds 8 bytes.
We need a larger type which hold 16 bytes, therefore change it
to [i64 x 2].
Differential Revision: https://reviews.llvm.org/D48094
llvm-svn: 334625
Not sure why, but it breaks buildbot clang-cmake-armv8-full.
It causes a failure in TEST 'Xray-armhf-linux :: TestCases/Posix/profiling-single-threaded.cc'.
llvm-svn: 334617
We're constant folding here, so we shouldn't check uses. This matches
the IR optimizer behavior.
The x86 test shows the expected win. The AArch64 test shows something
else. This only seems to happen if the "generic" AArch64 CPU model is
used by MachineCombiner, so I'll file a bug report to follow-up.
llvm-svn: 334608
Summary:
The code that handles ISD:Register and ISD::CopyFromReg assumes
the target is amdgcn, so this is broken on r600. We don't
need this analysis on r600 anyway so we can safely move
it to SITargetLowering.
Reviewers: alex-t, arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D46298
llvm-svn: 334607
The equivalent AArch64 test added at rL334556 isn't showing
the expected output from the DAGCombiner code change that
would fix this example. That's a machine combiner bug from
what I see.
llvm-svn: 334605
Add a helper function to expand constrained FP operations as needed.
Note that the Strict POWI operation is not handled in this patch since
the format is slightly different from the others.
Differential Revision: https://reviews.llvm.org/D47491
llvm-svn: 334603