Summary:
According to
https://docs.nvidia.com/cuda/archive/10.0/ptx-writers-guide-to-interoperability/index.html#cuda-specific-dwarf,
the compiler should emit the DW_AT_address_class attribute for all
variable and parameter. It means, that DW_AT_address_class attribute
should be used in the non-standard way to support compatibility with the
cuda-gdb debugger.
Clang is able to generate the information about the variable address
class. This information is emitted as the expression sequence
`DW_OP_constu <DWARF Address Space> DW_OP_swap DW_OP_xderef`. The patch
tries to find all such expressions and transform them into
`DW_AT_address_class <DWARF Address Space>` if target is NVPTX and the debugger is gdb.
If the expression is not found, then default values are used. For the
local variables <DWARF Address Space> is set to ADDR_local_space(6), for
the globals <DWARF Address Space> is set to ADDR_global_space(5). The
values are taken from the table in the same section 5.2. CUDA-Specific
DWARF Definitions.
Reviewers: echristo, probinson
Subscribers: jholewinski, aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D57157
llvm-svn: 353203
This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.
I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.
The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.
Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.
Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.
This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.
As for performance, we have internal code that uses intrinsics and can
be speed up by 10% by this change.
Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D57377
llvm-svn: 353152
The fewerElementsVectors implementation for load/stores
handles the scalar reduction case just as well, so drop
the redundant code in narrowScalar. This also introduces
support for narrowing irregular size breakdowns for
scalars.
llvm-svn: 353125
Summary:
If the index isn't constant, this transform inserts a multiply and an add on the index to calculating the base pointer for a scalar load. But we still create a memory operand with an offset of 0 and the size of the scalar access. But the access is really to an unknown offset within the original access size.
This can cause the machine scheduler to incorrectly calculate dependencies between this load and other accesses. In the case we saw, there was a 32 byte vector store that was split into two 16 byte stores, one with offset 0 and one with offset 16. The size of the memory operand for both was 16. The scheduler correctly detected the alias with the offset 0 store, but not the offset 16 store.
This patch discards the pointer info so we don't incorrectly detect aliasing. I wasn't sure if we could keep using the original offset and size without risking some other transform on the load changing the size.
I tried to reduce a test case, but there's still a lot of memory operations needed to get the scheduler to do the bad reordering. So it looked pretty fragile to maintain.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57616
llvm-svn: 353124
Don't handle vector conditions.
I think this can be merged in the future with
fewerElementsVectorSelect, although this becomes slightly tricky with
a vector condition.
llvm-svn: 353122
Try to use the underlying source registers.
This enables legalization in more cases where some irregular
operations are widened and others narrowed.
This seems to make the test_combines_2 AArch64 test worse, since the
MERGE_VALUES has multiple uses. Since this should be required for
legalization, a hasOneUse check is probably inappropriate (or maybe
should only be used if the merge is legal?).
llvm-svn: 353121
A number of of tests were using imm operands, not cimm. Since CSE
relies on the exact ConstantInt* pointer used, and implicit
conversions are generally evil, also enforce the bitsize of the types.
llvm-svn: 353113
Aliases of functions are now marked as function symbols even if
they are bitcast to some other other non-function type.
This is important for WebAssembly where object and function
symbols can't alias each other.
Fixes PR38866
Differential Revision: https://reviews.llvm.org/D57538
llvm-svn: 353109
The LiveDebugValues pass recognizes spills but not restores, which can
cause large gaps in location information for some variables, depending
on control flow. This patch make LiveDebugValues recognize restores and
generate appropriate DBG_VALUE instructions.
This patch was posted previously with r352642 and reverted in r352666 due
to buildbot errors. A missing return statement was the cause for the
failures.
Reviewers: aprantl, NicolaPrica
Differential Revision: https://reviews.llvm.org/D57271
llvm-svn: 353089
This fixes two problems with CSE done in buildConstant. First, this
would hit an assert when used with a vector result type. Solve this by
allowing CSE on the vector elements, but not on the result vector for
now.
Second, this was also performing the CSE based on the input
ConstantInt pointer. The underlying buildConstant could potentially
convert the constant depending on the result type, giving in a
different ConstantInt*. Stop allowing the APInt and ConstantInt forms
from automatically casting to the result type to avoid any similar
problems in the future.
llvm-svn: 353077
This reverts commit 8bbd570fd5205a04d88d2e5513a6e4adbd028039.
Apparently adding ffloor breaks AMDGPU somehow, so I need to back this out
while I look into it.
llvm-svn: 353064
Add an intrinsic that takes 2 unsigned integers with the scale of them
provided as the third argument and performs fixed point multiplication on
them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D55625
llvm-svn: 353059
This is no-functional-change-intended although there could
be intermediate variations caused by a difference in the
debug info produced by setting that from the builder's
insertion point.
I'm updating the IR test file associated with this code just
to show that the naming differences from using the builder
are visible.
The motivation for adding a helper function is that we are
likely to extend this code to deal with other overflow ops.
llvm-svn: 353056
Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case.
We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage.
llvm-svn: 353044
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).
Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".
These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.
When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.
Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.
Differential Revision: https://reviews.llvm.org/D57244
llvm-svn: 353043
There are 2 changes visible here:
1. There's no reason to limit this transform based on number
of condition registers. That diff allows PPC to produce
slightly better (dot-instructions should be generally good)
code.
Note: someone that cares about PPC codegen might want to
look closer at that output because it seems like we could
still improve this.
2. We (probably?) should not bother trying to form uaddo (or
other overflow ops) when there's no target support for such
an op. This goes beyond checking whether the op is expanded
because both PPC and AArch64 show better codegen for standard
types regardless of whether the op is legal/custom.
llvm-svn: 353001
This is not truly NFC because we are bailing out without
a TLI now. That should not be a real concern though because
there should be a TLI in any real-world scenario.
That seems better than passing around a pointer and then
checking it for null-ness all over the place.
The motivation is to fix what appears to be an unintended
restriction on the uaddo transform -
hasMultipleConditionRegisters() shouldn't be reason to limit
the transform.
llvm-svn: 352988
For the scalar case only.
Also move the similar G_MERGE_VALUES handling to a separate function
and cleanup to make them look more similar.
llvm-svn: 352979
We already have the getConstantOperandVal helper which returns a uint64_t, but along comes the fuzzer and inserts a i128 -1 constant or something and the whole thing asserts.......
I've updated a few obvious cases, and tried to make use of the const reference where possible, but there's more to do. A number of existing oss-fuzz tickets should be fixed if we start using APInt and perform value clamping where necessary.
llvm-svn: 352961
Background: At the moment, we record the AtomicOrdering of an access in the MMO, but also mark any atomic access as volatile in SelectionDAG. I'm working towards separating that. See https://reviews.llvm.org/D57601 for context.
Update all usages of isVolatile in lib/CodeGen to preserve behaviour once atomic MMOs stop being also volatile. This is NFC in it's current form, but is essential for correctness once we make that final change.
It useful to keep in mind that AtomicSDNode is not a parent of LoadSDNode, StoreSDNode, or LSBaseSDNode. As a result, any call to isVolatile on one of those static types doesn't need a companion isAtomic check. We should probably adjust that class hierarchy long term, but for now, that seperation is useful.
I'm deliberately being conservative about handling. I want the change to stop adding volatile to be NFC itself, and then will work through places where we can be less conservative for atomics one by one in separate changes w/tests.
Differential Revision: https://reviews.llvm.org/D57596
llvm-svn: 352937
Summary: This fixes using the correct stack registers for SEH when stack realignment is needed or when variable size objects are present.
Reviewers: rnk, efriedma, ssijaric, TomTan
Reviewed By: rnk, efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D57183
llvm-svn: 352923
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57173
llvm-svn: 352913
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57172
llvm-svn: 352911
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57170
llvm-svn: 352909
The version of FoldConstantArithmetic() that takes arbitrary nodes
was confusingly naming those nodes as constants when they might
not be; also "Cst" reads like "Cast".
llvm-svn: 352884
This might be the start of tracking all vector element constants generally if we take it to its
logical conclusion, but let's stop here and make sure this is correct/beneficial so far.
The affected tests require a convoluted path before they get simplified currently because we
don't call SimplifyDemandedVectorElts() from binops directly and don't modify the binop operands
directly in SimplifyDemandedVectorElts().
That's why the tests all have a trailing shuffle to induce a chain reaction of transforms. So
something like this is happening:
1. Improve the knowledge of undefs in the binop via a SimplifyDemandedVectorElts() call that
originates from a shuffle.
2. Transfer that undef knowledge back to the shuffle mask user as more undef lanes.
3. Combine the modified shuffle by calling SimplifyDemandedVectorElts() again.
4. Translate the improved shuffle mask as undemanded lanes of build vector constants causing
those to become full undef constants.
5. Simplify the binop now that it has a full undef operand.
As we can see from the unchanged 'and' and 'or' tests, tracking undefs alone isn't a full solution.
We would need to track zero and all-ones constants to improve those opcodes. We'd probably need to
track NaN for FP ops too (assuming we don't have fast-math-flags set).
Differential Revision: https://reviews.llvm.org/D57066
llvm-svn: 352880
Previously, LiveRegUnits was assuming that if a block has no successors
and does not return, then no registers are live at the end of it
(because the end of the block is unreachable). This was causing the
register scavenger to use callee-saved registers to materialise stack
frame addresses without saving them in the prologue. This would normally
be fine, because the end of the block is unreachable, but this is not
legal if the block ends by throwing a C++ exception. If this happens,
the scratch register will be modified, but its previous value won't be
preserved, so it doesn't get restored by the exception unwinder.
Differential revision: https://reviews.llvm.org/D57381
llvm-svn: 352844
For targets where i32 is not a legal type (e.g. 64-bit RISC-V),
LegalizeIntegerTypes must promote the integer operand of ISD::FPOWI. As this
is a signed value, this should be sign-extended.
This patch enables all tests in test/CodeGen/RISCVfloat-intrinsics.ll for
RV64, as prior to this patch that file couldn't be compiled for RV64 due to an
assertion when performing codegen for fpowi.
Differential Revision: https://reviews.llvm.org/D54574
llvm-svn: 352832
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.
Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352827
This reverts commit f47d6b38c7 (r352791).
Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.
llvm-svn: 352800
This patch fixes pr39098.
For the attached test case, CombineZExtLogicopShiftLoad can optimize it to
t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
t58: i64 = srl t57, Constant:i8<1>
t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64
But later visitANDLike transforms it to
t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
t61: i32 = truncate t57
t63: i32 = srl t61, Constant:i8<1>
t64: i32 = and t63, Constant:i32<524287>
t65: i64 = zero_extend t64
t58: i64 = srl t57, Constant:i8<1>
t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64
And it triggers CombineZExtLogicopShiftLoad again, causes a dead loop.
Both forms should generate same instructions, CombineZExtLogicopShiftLoad generated IR looks cleaner. But it looks more difficult to prevent visitANDLike to do the transform, so I prevent CombineZExtLogicopShiftLoad to do the transform if the ZExt is free.
Differential Revision: https://reviews.llvm.org/D57491
llvm-svn: 352792
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352791
While dangling nodes will eventually be pruned when they are
considered, leaving them disables combines requiring single-use.
Reviewers: Carrot, spatel, craig.topper, RKSimon, efriedma
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D57520
llvm-svn: 352784
r zero scale SMULFIX, expand into MUL which produces better code for X86.
For vector arguments, expand into MUL if SMULFIX is provided with a zero scale.
Otherwise, expand into MULH[US] or [US]MUL_LOHI.
Differential Revision: https://reviews.llvm.org/D56987
llvm-svn: 352783
This ensures that if we make it to the backend w/o lowering widenable_conditions first, that we generate correct code. Doing it in CGP - instead of isel - let's us fold control flow before hitting block local instruction selection.
Differential Revision: https://reviews.llvm.org/D57473
llvm-svn: 352779
And instead just generate a libcall. My motivating example on ARM was a simple:
shl i64 %A, %B
for which the code bloat is quite significant. For other targets that also
accept __int128/i128 such as AArch64 and X86, it is also beneficial for these
cases to generate a libcall when optimising for minsize. On these 64-bit targets,
the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS
lowering operation action is not set to custom/expand.
Differential Revision: https://reviews.llvm.org/D57386
llvm-svn: 352736
This change reverts r351626.
The changes in r351626 cause quadratic work in several cases. (See r351626 thread on llvm-commits for details.)
llvm-svn: 352722
Also fix an alignment bug getMachineMemOperand. If the
tracked value is null, the offset isn't tracked so the
base alignment needs to be reduced.
llvm-svn: 352716
Summary:
Fixes PR40267, in which the removed assertion was triggering on
perfectly valid IR. As far as I can tell, constant out of bounds
indices should be allowed when splitting extract_vector_elt, since
they will simply be propagated as out of bounds indices in the
resulting split vector and handled appropriately elsewhere.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya
Differential Revision: https://reviews.llvm.org/D57471
llvm-svn: 352702
These can be triggered by mistakenly using a 64-bit mode only intrinsics with a -mtriple=i686. Using report_fatal_error gives a better experience for this mistake in release builds instead of probably crashing.
We already do this for some of the vector type legalization handles.
llvm-svn: 352699
This teaches the legalizer to handle G_FEXP in AArch64. As a result, it also
allows us to select G_FEXP.
It...
- Updates the legalizer-info tests
- Adds a test for legalizing exp
- Updates the existing fp tests to show that we can now select G_FEXP
https://reviews.llvm.org/D57483
llvm-svn: 352692
This extends the existing transform for:
add X, 0/1 --> sub X, 0/-1
...to allow the sibling subtraction fold.
This pattern could regress with the proposed change in D57401.
llvm-svn: 352680
This teaches GlobalISel to emit a RTLib call for @llvm.log2 when it encounters
it.
It updates the existing floating point tests to show that we don't fall back on
the intrinsic, and select the correct instructions. It also adds a legalizer
test for G_FLOG2.
https://reviews.llvm.org/D57357
llvm-svn: 352673
This teaches the legalizer about G_FSQRT in AArch64. Also adds a legalizer
test for G_FSQRT, a selection test for it, and updates existing floating point
tests.
https://reviews.llvm.org/D57361
llvm-svn: 352671
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.
rdar://32212419
Differential revision: https://reviews.llvm.org/D56761
llvm-svn: 352664
The LiveDebugValues pass recognizes spills but not restores, which can
cause large gaps in location information for some variables, depending
on control flow. This patch make LiveDebugValues recognize restores and
generate appropriate DBG_VALUE instructions.
Reviewers: aprantl, NicolaPrica
Differential Revision: https://reviews.llvm.org/D57271
llvm-svn: 352642
I've repeatedly encountered bugs resulting from custom legalize
mutations returning nonsense legalize results, such as increasing the
number of elements for FewerElements. Add an assert function to make
sure the type to mutate to is consistent with the legalize action.
llvm-svn: 352636
Summary:
This switches the EH implementation to the new proposal:
https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md
(The previous proposal was
https://github.com/WebAssembly/exception-handling/blob/master/proposals/old/Exceptions.md)
- Instruction changes
- Now we have one single `catch` instruction that returns a except_ref
value
- `throw` now can take variable number of operations
- `rethrow` does not have 'depth' argument anymore
- `br_on_exn` queries an except_ref to see if it matches the tag and
branches to the given label if true.
- `extract_exception` is a pseudo instruction that simulates popping
values from wasm stack. This is to make `br_on_exn`, a very special
instruction, work: `br_on_exn` puts values onto the stack only if it
is taken, and the # of values can vay depending on the tag.
- Now there's only one `catch` per `try`, this patch removes all special
handling for terminate pad with a call to `__clang_call_terminate`.
Before it was the only case there are two catch clauses (a normal
`catch` and `catch_all` per `try`).
- Make `rethrow` act as a terminator like `throw`. This splits BB after
`rethrow` in WasmEHPrepare, and deletes an unnecessary `unreachable`
after `rethrow` in LateEHPrepare.
- Now we stop at all catchpads (because we add wasm `catch` instruction
that catches all exceptions), this creates new
`findWasmUnwindDestinations` function in SelectionDAGBuilder.
- Now we use `br_on_exn` instrution to figure out if an except_ref
matches the current tag or not, LateEHPrepare generates this sequence
for catch pads:
```
catch
block i32
br_on_exn $__cpp_exception
end_block
extract_exception
```
- Branch analysis for `br_on_exn` in WebAssemblyInstrInfo
- Other various misc. changes to switch to the new proposal.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D57134
llvm-svn: 352598
This is the sibling fold for insert-of-insert that was added with D56604.
Now that we have x86 shuffle narrowing (D57156), this change shows improvements for
lots of AVX512 reduction code (not sure that we would ever expect extract-of-extract otherwise).
There's a small regression in some of the partial-permute tests (extracting followed by splat).
That is tracked by PR40500:
https://bugs.llvm.org/show_bug.cgi?id=40500
Differential Revision: https://reviews.llvm.org/D57336
llvm-svn: 352528
This was ignoring the memory size, and producing multiple loads/stores
if the operand size was different from the memory size.
I assume this is the intent of not having an explicit G_ANYEXTLOAD
(although I think that would probably be better).
llvm-svn: 352523
This fixes most references to the paths:
llvm.org/svn/
llvm.org/git/
llvm.org/viewvc/
github.com/llvm-mirror/
github.com/llvm-project/
reviews.llvm.org/diffusion/
to instead point to https://github.com/llvm/llvm-project.
This is *not* a trivial substitution, because additionally, all the
checkout instructions had to be migrated to instruct users on how to
use the monorepo layout, setting LLVM_ENABLE_PROJECTS instead of
checking out various projects into various subdirectories.
I've attempted to not change any scripts here, only documentation. The
scripts will have to be addressed separately.
Additionally, I've deleted one document which appeared to be outdated
and unneeded:
lldb/docs/building-with-debug-llvm.txt
Differential Revision: https://reviews.llvm.org/D57330
llvm-svn: 352514
During the lowering of a switch that would result in the generation of a
jump table, a range check is performed before indexing into the jump
table, for the switch value being outside the jump table range and a
conditional branch is inserted to jump to the default block. In case the
default block is unreachable, this conditional jump can be omitted. This
patch implements omitting this conditional branch for unreachable
defaults.
Review ID: D52002
Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev
llvm-svn: 352484
A FrameIndex should be valid throughout a block regardless of what instructions
get selected in that block -- therefore we shouldn't harness dbg.values that
refer to FrameIndexes to an SDNode. There are numerous codegen reasons why
an SDNode never appears or doesn't become a location that a DBG_VALUE can
refer to. None of them actually affect the variable location.
Therefore, before any other tests to encode dbg_values in a SelectionDAG,
identify FrameIndex operands and encode them unattached to any SDNode.
Differential Revision: https://reviews.llvm.org/D57328
llvm-svn: 352467
This patch makes sure that a debug value that is after the bitcast in
dupRetToEnableTailCallOpts() is also skipped.
The reduced test case is from SPEC-2006 on SystemZ.
Review: Vedant Kumar, Wolfgang Pieb
https://reviews.llvm.org/D57050
llvm-svn: 352462
This did not cause the buildbot failure it was previously reverted for.
Original commit message:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.
This patch changes this and then modifies the X86 post legalization combine to emit a extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inre
On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.
llvm-svn: 352433
This adds support for legalizing G_FLOG into a RTLib call.
It adds a legalizer test, and updates the existing floating point tests.
https://reviews.llvm.org/D57347
llvm-svn: 352429
This adds instruction selection support for @llvm.log10 in AArch64. It teaches
GISel to lower it to a library call, updates the relevant tests, and adds a
legalizer test for log10.
https://reviews.llvm.org/D57341
llvm-svn: 352418
This adds ISel support for lifetime markers in opt levels above O0.
It also updates the arm64-irtranslator test, and updates some AArch64 tests that
use them for added coverage.
It also adds a testcase taken from the X86 codegen tests which verified a bug
caused by lifetime markers + stack colouring in the past. This is intended to
make sure that GISel doesn't re-introduce the bug.
(This is basically a straight copy from what SelectionDAG does in
SelectionDAGBuilder.cpp)
https://reviews.llvm.org/D57187
llvm-svn: 352410
Followup to D56636, this time handling the UADDSAT case by expanding
uadd.sat(a, b) to umin(a, ~b) + b.
Differential Revision: https://reviews.llvm.org/D56869
llvm-svn: 352409
This contains all of the legalizer changes from D57197 necessary to select
G_FCOS and G_FSIN. It also updates several existing IR tests in
test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN
instructions.
https://reviews.llvm.org/D57197
3/3
llvm-svn: 352402
Lower G_USUBO and G_USUBE. Add narrowScalar for G_SUB.
Legalize and select G_SUB for MIPS 32.
Differential Revision: https://reviews.llvm.org/D53416
llvm-svn: 352351
This patch improves the placement of DBG_VALUEs when by SelectionDAG, which
as documented in PR40427 can go very wrong. At the core of this is
ProcessSourceNode, which assumes the last instruction in a BB is the start
of the last processed IR instruction, which isn't always true.
Instead, use a helper function to call InstrEmitter::EmitNode, that records
before-and-after iterators and determines the first of any new instruction
created during emission. This is passed to ProcessSourceNode, which can
then make more elightened decisions about ordering for DBG_VALUE placement.
Differential revision: https://reviews.llvm.org/D57163
llvm-svn: 352350
This fixes loads like 's1 = load %p (load 1 from %p)' being combined with an
extend into an illegal 's8 = g_extload %p (load 1 from %p)' which doesn't do any
extension, by avoiding touching those < s8 size loads.
This bug was uncovered by a verifier update r351584, which I reverted it to keep
the bots green.
llvm-svn: 352311
Summary:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.
This patch changes this and then modifies the X86 post legalization combine to emit a extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization.
On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57186
llvm-svn: 352255
If bottom of block BB has only one successor OldTop, in most cases it is profitable to move it before OldTop, except the following case:
-->OldTop<-
| . |
| . |
| . |
---Pred |
| |
BB-----
Move BB before OldTop can't reduce the number of taken branches, this patch detects this case and prevent the moving.
Differential Revision: https://reviews.llvm.org/D57067
llvm-svn: 352236
N_FUNC_COLD is a new MachO symbol attribute. It's a hint to the linker
to order a symbol towards the end of its section, to improve locality.
Example:
```
void a1() {}
__attribute__((cold)) void a2() {}
void a3() {}
int main() {
a1();
a2();
a3();
return 0;
}
```
A linker that supports N_FUNC_COLD will order _a2 to the end of the text
section. From `nm -njU` output, we see:
```
_a1
_a3
_main
_a2
```
Differential Revision: https://reviews.llvm.org/D57190
llvm-svn: 352227
https://reviews.llvm.org/D57178
Now add a hook in TargetPassConfig to query if CSE needs to be
enabled. By default this hook returns false only for O0 opt level but
this can be overridden by the target.
As a consequence of the default of enabled for non O0, a few tests
needed to be updated to not use CSE (by passing in -O0) to the run
line.
reviewed by: arsenm
llvm-svn: 352126
This patch adds support for vector @llvm.ceil intrinsics when full 16 bit
floating point support isn't available.
To do this, this patch...
- Implements basic isel for G_UNMERGE_VALUES
- Teaches the legalizer about 16 bit floats
- Teaches AArch64RegisterBankInfo to respect floating point registers on
G_BUILD_VECTOR and G_UNMERGE_VALUES
- Teaches selectCopy about 16-bit floating point vectors
It also adds
- A legalizer test for the 16-bit vector ceil which verifies that we create a
G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
- An instruction selection test which makes sure we lower to G_FCEIL when
full fp16 is supported
- A test for selecting G_UNMERGE_VALUES
And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types
work as expected.
https://reviews.llvm.org/D56682
llvm-svn: 352113
It should be emitted when any floating-point operations (including
calls) are present in the object, not just when calls to printf/scanf
with floating point args are made.
The difference caused by this is very subtle: in static (/MT) builds,
on x86-32, in a program that uses floating point but doesn't print it,
the default x87 rounding mode may not be set properly upon
initialization.
This commit also removes the walk of the types pointed to by pointer
arguments in calls. (To assist in opaque pointer types migration --
eventually the pointee type won't be available.)
That latter implies that it will no longer consider a call like
`scanf("%f", &floatvar)` as sufficient to emit _fltused on its
own. And without _fltused, `scanf("%f")` will abort with error R6002. This
new behavior is unlikely to bite anyone in practice (you'd have to
read a float, and do nothing with it!), and also, is consistent with
MSVC.
Differential Revision: https://reviews.llvm.org/D56548
llvm-svn: 352076
Summary:
Previously no client of ilist traits has needed to know about transfers
of nodes within the same list, so as an optimization, ilist doesn't call
transferNodesFromList in that case. However, now there are clients that
want to use ilist traits to cache instruction ordering information to
optimize dominance queries of instructions in the same basic block.
This change updates the existing ilist traits users to detect in-list
transfers and do nothing in that case.
After this change, we can start caching instruction ordering information
in LLVM IR data structures. There are two main ways to do that:
- by putting an order integer into the Instruction class
- by maintaining order integers in a hash table on BasicBlock
I plan to implement and measure both, but I wanted to commit this change
first to enable other out of tree ilist clients to implement this
optimization as well.
Reviewers: lattner, hfinkel, chandlerc
Subscribers: hiraditya, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D57120
llvm-svn: 351992
This patch adds a new ReadAdvance definition named ReadInt2Fpu.
ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by
data transfers from the integer unit to the floating point unit.
ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all
x86 models excluding BtVer2. That means, this patch is only a functional change
for the Jaguar cpu model only.
Tablegen definitions for instructions (V)PINSR* have been updated to account for
the new ReadInt2Fpu. That read is mapped to the the GPR input operand.
On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch,
that extra delay was added to the opcode latency. In practice, the insert opcode
only executes for 1cy. Most of the actual latency is actually contributed by the
so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR*
latency is defined by expression f+1, where f is defined as a forwarding delay
from the integer unit to the fpu.
When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC
(only when flag -print-schedule is speified), we now need to account for any
extra forwarding delays. We do this by checking if scheduling classes declare
any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td:
"A negative advance effectively increases latency, which may be used for
cross-domain stalls". When computing the instruction latency for the purpose of
our scheduling tests, we now add any extra delay to the formula. This avoids
regressing existing codegen and mca schedule tests. It comes with the cost of an
extra (but very simple) hook in MCSchedModel.
Differential Revision: https://reviews.llvm.org/D57056
llvm-svn: 351965
The current check in CombineToPreIndexedLoadStore is too
conversative, preventing a pre-indexed store when the base pointer
is a predecessor of the value being stored. Instead, we should check
the pointer operand of the store.
Differential Revision: https://reviews.llvm.org/D56719
llvm-svn: 351933
#pragma clang loop pipeline(disable)
Disable SWP optimization for the next loop.
“disable” is the only possible value.
#pragma clang loop pipeline_initiation_interval(number)
Set value of initiation interval for SWP
optimization to specified number value for
the next loop. Number is the positive value
greater than 0.
These pragmas could be used for debugging or reducing
compile time purposes. It is possible to disable SWP for
concrete loops to save compilation time or to find bugs
by not doing SWP to certain loops. It is possible to set
value of initiation interval to concrete number to save
compilation time by not doing extra pipeliner passes or
to check created schedule for specific initiation interval.
That is llvm part of the fix
Clang part of fix: https://reviews.llvm.org/D55710
Patch by Alexey Lapshin!
Differential Revision: https://reviews.llvm.org/D56403
llvm-svn: 351923
Summary:
`CodeViewDebug::lowerTypeMemberFunction` used to default to a `Void`
return type if the function's type array was empty. After D54667, it
started blindly indexing the 0th item for the return type, which fails
in `getOperand` for empty arrays if assertions are enabled.
This patch restores the `Void` return type for empty type arrays, and
adds a test generated by Rust in line-only debuginfo mode.
Reviewers: zturner, rnk
Reviewed By: rnk
Subscribers: hiraditya, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D57070
llvm-svn: 351910
Also add debug prints in the default case of the switches in these routines.
Most if not all of the type legalization handlers already do this so this makes promoting floats consistent
llvm-svn: 351890
For AMDGPU the shift amount is never 64-bit, and
this needs to use a 32-bit shift.
X86 uses i8, but seemed to be hacking around this before.
llvm-svn: 351882
Summary: Initial function labels must follow the debug location for the correct relocation info generation.
Reviewers: tra, jlebar, echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D45784
llvm-svn: 351843
vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z
This is another step in generic vector narrowing. It's also a step towards more horizontal op
formation specifically for x86 (although we still failed to match those in the affected tests).
The scalarization cases are also not optimal (we should be scalarizing those), but it's still
an improvement to use a narrower vector op when we know part of the result must be constant
because both inputs are undef in some vector lanes.
I think a similar match but checking for a constant operand might help some of the cases in
D51553.
Differential Revision: https://reviews.llvm.org/D56875
llvm-svn: 351825
This broke the RISCV build, and even with that fixed, one of the RISCV
tests behaves surprisingly differently with asserts than without,
leaving there no clear test pattern to use. Generally it seems bad for
hte IR to differ substantially due to asserts (as in, an alloca is used
with asserts that isn't needed without!) and nothing I did simply would
fix it so I'm reverting back to green.
This also required reverting the RISCV build fix in r351782.
llvm-svn: 351796
The regression test is reduced from the example shown in D56281.
This does raise a question as noted in the test file: do we want
to handle this pattern? I don't have a motivating example for
that on x86 yet, but it seems like we could have that pattern
there too, so we could avoid the back-and-forth using a shuffle.
llvm-svn: 351753
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This reverts commit r351618.
Compiler RT + ASAN tests are failing for PowerPC. Not sure
how would I reproduce these on macOS, so reverting (again)
until I do.
llvm-svn: 351619
Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of
the same integer value while sinking address computations, but rather
CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking
that related pointers have nothing to do with each other.
This problem blocks LoadStoreVectorizer from vectorizing some of the
loads / stores in a downstream target.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D56838
llvm-svn: 351582
Summary:
This patch makes some changes related to -dag-dump-verbose.
Main use case has been when debugging how SelectionDAG is
dealing with debug info (SDDbgValue nodes).
1) We now print the number of DbgValues that are mapped to each
SDNode.
2) Removed duplicated printing of DebugLoc (nowadays DebugLoc is
printed also when not using -dag-dump-verbose).
3) Renamed SDDbgValue::dump to SDDbgValue::print, and added a
new SDDbgValue::dump that will start a new line after calling
print.
4) SDDbgValue::print now prints "Order", and it also prints
some additional information when kind is CONST/FRAMEIX/VREG.
5) SelectionDAG::dump() now dumps all SDDbgValue nodes after
the list of SDNodes (both "regular" and "ByVal" SDDbgValue:s).
Invalidated nodes are not printed.
6) Prohibit inline printing of SDNode operands that has SDDbgValue
nodes associated to them.
Reviewers: jmorse, aprantl
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56793
llvm-svn: 351581
Similar to D55073. Without this change, the DAG combiner crashes on code
with more than 64k of stores in a single basic block that form parallelizable
chains.
No test case, as it would be very IR file.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D56740
llvm-svn: 351571
Defer inline asm's output fixup work until after we've generated the
inline asm node itself. Remove StoresToEmit, IndirectStoresToEmit, and
RetValRegs in favor of using ConstraintOperands.
llvm-svn: 351558
This functionality is required at multiple places which potentially
create large operand lists, like SelectionDAGBuilder or DAGCombiner.
Differential Revision: https://reviews.llvm.org/D56739
llvm-svn: 351552
Summary:
Use this helper to make sure we use the same value at various places.
This will likely be needed at more places were we currently crash
because we use more operands than possible.
Also makes it easier to change in the future.
Reviewers: RKSimon, craig.topper, efriedma, aemerson
Reviewed By: RKSimon
Subscribers: hiraditya, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D56859
llvm-svn: 351537
We should not pre-scheduled the node has ADJCALLSTACKDOWN parent,
or else, when bottom-up scheduling, ADJCALLSTACKDOWN and
ADJCALLSTACKUP may hold CallResource too long and make other
calls can't be scheduled. If there's no other available node
to schedule, the scheduler will try to rename the register by
creating copy to avoid the conflict which will fail because
CallResource is not a real physical register.
llvm-svn: 351527
The callee address is added as an optional operand (MCSymbol) in
AdjustInstrPostInstrSelection() and then used by asm printer to insert:
'.reloc tmplabel, R_MIPS_JALR, symbol
tmplabel:'.
Controlled with '-mips-jalr-reloc', default is true.
Differential revision: https://reviews.llvm.org/D56694
llvm-svn: 351485
Currently we do not always collapse subsequent .loc 0 0 directives. The
reason is that we were checking for a PrevInstLoc which is not set when
we emit a line-0 record. We should only check the LastAsmLine, which
seems to be created exactly for this purpose.
// When we emit a line-0 record, we don't update PrevInstLoc; so look at
// the last line number actually emitted, to see if it was line 0.
unsigned LastAsmLine =
Asm->OutStreamer->getContext().getCurrentDwarfLoc().getLine();
Differential revision: https://reviews.llvm.org/D56767
llvm-svn: 351395
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.
We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.
Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric
Reviewed By: rnk, efriedma
Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D53540
llvm-svn: 351370
dbg.value intrinsics can appear in blocks where their operand is not used,
meaning the operand never receives an SDNode, and thus no DBG_VALUE will
be created. Get around this by looking to see whether the operand has already
been allocated a virtual register. This allows dbg.values of Phi node and
Values that are used across basic blocks to successfully be translated into
DBG_VALUEs.
Differential Revision: https://reviews.llvm.org/D56678
llvm-svn: 351358
The value returned by max() is the last valid value, adjust the
comparison accordingly.
The code added in D55073 creates TokenFactors with max() operands.
Reviewers: aemerson, efriedma, RKSimon, craig.topper
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D56738
llvm-svn: 351318
ReduceLoadWidth can trigger using a shifted mask is used and this
requires that the function return a shl node to correct for the
offset. However, the way that this was implemented meant that the
returned result could be an existing node, which would be incorrect.
This fixes the method of inserting the new node and replacing uses.
Differential Revision: https://reviews.llvm.org/D50432
llvm-svn: 351310
https://reviews.llvm.org/D52803
This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.
llvm-svn: 351283
Summary:
Make recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc.
Refer D53541 for the context. Clang counterpart D56748.
Reviewers: rnk, efriedma
Reviewed By: rnk, efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D56747
llvm-svn: 351281
The motivating case for this is shown in the first regression test. We are
transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.
That's a special-case for a more general pattern as shown here. In all tests,
we're avoiding the vector-scalar-vector moves in favor of vector ops.
We aren't producing optimal shuffle code in some cases though, so the patch is
limited to reduce regressions.
Differential Revision: https://reviews.llvm.org/D56281
llvm-svn: 351198
A block ending in an unconditional branch can have two successors if one
is a landing pad. In practice, I think this only has an effect on
Windows because landing pads are never empty for Itanium unwinding.
(Alternatively, I could add a check to
AArch64InstrInfo::canInsertSelect, but this seems more obvious.)
Differential Revision: https://reviews.llvm.org/D56468
llvm-svn: 351142
Split MachinePipeliner code into header and cpp files to allow
inheritance from SwingSchedulerDAG.
This reapplies https://reviews.llvm.org/D56084 after moving the
implementation of the dump functions into the .cpp files. This fixes a
linker error when building with Clang modules enables and local
submodule visibility disabled.
Original patch by Lama Saba <lama.saba@intel.com>!
llvm-svn: 351077
Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"
Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.
"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".
tests are mostly updated with
// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
Patch by Yuanfang Chen (tabloid.adroit)!
Differential Revision: https://reviews.llvm.org/D56351
llvm-svn: 351049
I accidentally triggered this code while doing some experiments and it doesn't look lke it could possibly work.
It calls 'getNOT' on a node that should be a CondCode.
I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC?
llvm-svn: 351022
This pattern:
t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>
...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.
In the affected tests here, it looks like it just makes RA wiggle. But we might
as well squash this to prevent it interfering with other pattern-matching.
Differential Revision:
https://reviews.llvm.org/D56604
llvm-svn: 351008
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.
Fix PR40273.
Reviewers: efriedma, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56566
llvm-svn: 350951
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).
I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.
llvm-svn: 350835
If the caller's return type does not have a zeroext attribute but the
callee does a tail call zeroext, we won't consider the tail call during
CodeGenPrepare because the attributes don't match.
However, if the result of the tail call has no uses, it makes sense to
drop the sext/zext attributes.
Differential Revision: https://reviews.llvm.org/D56486
llvm-svn: 350753
Summary:
This fixes PR39710. In that case we emitted a location list looking like
this:
.Ldebug_loc0:
.quad .Lfunc_begin0-.Lfunc_begin0
.quad .Lfunc_begin0-.Lfunc_begin0
.short 1 # Loc expr size
.byte 85 # DW_OP_reg5
.quad .Lfunc_begin0-.Lfunc_begin0
.quad .Lfunc_end0-.Lfunc_begin0
.short 1 # Loc expr size
.byte 85 # super-register DW_OP_reg5
.quad 0
.quad 0
As seen, the first entry's beginning and ending addresses evalute to 0,
which meant that the entry inadvertently became an "end of list" entry,
resulting in the location list ending sooner than expected.
To fix this, omit all entries with empty ranges. Location list entries
with empty ranges do not have any effect, as specified by DWARF, so we
might as well drop them:
"A location list entry (but not a base address selection or end of list
entry) whose beginning and ending addresses are equal has no effect
because the size of the range covered by such an entry is zero."
Reviewers: davide, aprantl, dblaikie
Reviewed By: aprantl
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D55919
llvm-svn: 350698
This removes check for single use from general ShrinkDemandedConstant
to the BE because of the AArch64 regression after D56289/rL350475.
After several hours of experiments I did not come up with a testcase
failing on any other targets if check is not performed.
Moreover, direct call to ShrinkDemandedConstant is not really needed
and superceed by SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D56406
llvm-svn: 350684
Currently it's possible for following
check on V.WriteLanes (which is not really meaningful
during SubRangeJoin) to pass for one half of the pair,
and then fall through to to one of the impossible
or unresolved states. This then fails as inconsistent
on the other half.
During the main range join, the check between V.WriteLanes
and OtherV.ValidLanes must have passed, meaning this
should be a CR_Replace.
Fixes most of the testcases in bugs 39542 and 39602
llvm-svn: 350678
We can't go back and recover the lanes if it turns
out the implicit_def really can't be erased.
Assume all lanes are valid if an unresolved conflict
is encountered. There aren't any tests where this
seems to matter either way, but this seems like a
safer option.
Fixes bug 39602
llvm-svn: 350676
This patch adds a convenience report() method for physical registers and
uses it to print the offending register with the 'MBB has allocatable
live-in' error.
Reviewers: MatzeB, rtereshin, dsanders
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D55946
llvm-svn: 350630
Commit rL347861 introduced an unintentional change in the behaviour when
compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly
disabling GlobalISel resulted in using FastISel but an updated condition
in the commit changed it to using SelectionDAG. The patch fixes this
condition and slightly better organizes the code that chooses the
instruction selector.
Fixes PR40131.
Differential Revision: https://reviews.llvm.org/D56266
llvm-svn: 350626
This reverts commit rL350497
reported remaining issues seem to be unrelated to modules or this change.
more info: https://reviews.llvm.org/D56084
llvm-svn: 350621
If a copy was needed to handle the condition of brcond, it was being
inserted before the defining instruction. Add tests for iterator edge
cases.
I find the existing code here suspect for the case where it's looking
for terminators that modify the register. It's going to insert a copy
in the middle of the terminators, which isn't allowed (it might be
necessary to have a COPY_terminator if anybody actually needs this).
Also legalize brcond for AMDGPU.
llvm-svn: 350595
merging a src register in ToBeUpdated set.
This is to fix PR40061 related with https://reviews.llvm.org/rL339035.
In https://reviews.llvm.org/rL339035, live interval of source pseudo register
in rematerialized copy may be saved in ToBeUpdated set and its update may be
postponed.
In PR40061, %t2 = %t1 is rematerialized and %t1 is added into toBeUpdated set
to postpone its live interval update. After the rematerialization, the live
interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1
gets removed. After the merge, %t3 contains live interval larger than necessary.
Because %t3 is not in toBeUpdated set, its live interval is not updated after
register coalescing and it will break some assumption in regalloc.
The patch requires the live interval of destination register in a merge to be
updated if the source register is in ToBeUpdated.
Differential revision: https://reviews.llvm.org/D55867
llvm-svn: 350586
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.
Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.
This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.
Differential Revision: https://reviews.llvm.org/D56087
llvm-svn: 350560
The FSHL/FSHR nodes are handled in the expand function, but they need to also be listed in the code that queries for the operation action too.
llvm-svn: 350490
Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink
source node to demanded bits only even if there are other uses.
Differential Revision: https://reviews.llvm.org/D56289
llvm-svn: 350475
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits.
My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.
Differential Revision: https://reviews.llvm.org/D56283
llvm-svn: 350432
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:
// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)
We want to have this in the DAG too because as we can see in some of the test diffs (reductions),
the pattern may not be visible in IR.
Given that this is already an IR canonicalization, any backend that would prefer a vector op over
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.
Differential Revision: https://reviews.llvm.org/D55722
llvm-svn: 350354
A DBG_VALUE between a two-address instruction and a following COPY
would prevent rescheduleMIBelowKill optimization inside
TwoAddressInstructionPass.
Differential Revision: https://reviews.llvm.org/D55987
llvm-svn: 350289
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.
Improves the test case from PR38217. There may be additional opportunities after this.
Differential Revision: https://reviews.llvm.org/D56145
llvm-svn: 350239
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.
By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.
This fixes the regression on X86 in D56156.
Differential Revision: https://reviews.llvm.org/D56176
llvm-svn: 350236
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
Differential Revision: https://reviews.llvm.org/D56156
llvm-svn: 350235
default
During the lowering of a switch that would result in the generation of a jump
table, a range check is performed before indexing into the jump table, for the
switch value being outside the jump table range and a conditional branch is
inserted to jump to the default block. In case the default block is
unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.
Review Reference: D52002
llvm-svn: 350186
Fixes crash reported after r347354 for frontends that don't always emit
'this' pointers for methods. Now we will silently produce debug info
that makes functions like this look like static methods, which seems
reasonable.
llvm-svn: 350073
The patch adds a possibility to make library calls on NVPTX.
An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.
Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.
Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.
Patch by Denys Zariaiev!
Differential Revision: https://reviews.llvm.org/D34708
llvm-svn: 350069
Add widen scalar for type index 1 (i1 condition) for G_SELECT.
Select G_SELECT for pointer, s32(integer) and smaller low level
types on MIPS32.
Differential Revision: https://reviews.llvm.org/D56001
llvm-svn: 350063
This is an alternative to what I attempted in D56057.
GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.
I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.
Based on a patch that Simon Pilgrim sent me in email.
Fixes PR40142.
llvm-svn: 350059
More migration so we can disable the implicit int -> LocationSize
conversion.
All of these are either scatter/gather'ed vector instructions, or direct
loads. Hence, they're all precise.
Perhaps if we see way more getTypeStoreSize calls, we can make a
getTypeStoreLocationSize (or similar) as a wrapper that applies this
::precise. Doesn't appear that it's a good idea to make getTypeStoreSize
return a LocationSize itself, however.
llvm-svn: 350042
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.
This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146
llvm-svn: 350034
trunc (add X, C ) --> add (trunc X), C'
If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).
This change used to show regressions for x86, but those are gone after D55494.
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic)
that does almost the same thing.
Differential Revision: https://reviews.llvm.org/D55866
llvm-svn: 350006
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
This was suggested as a cleanup in D55967.
Differential Revision: https://reviews.llvm.org/D56019
llvm-svn: 349964
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.
This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.
llvm-svn: 349927
This saves materializing the immediate. The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.
I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.
Differential Revision: https://reviews.llvm.org/D55630
llvm-svn: 349857
When deciding lazily whether a CU would be split or non-split I
accidentally dropped some handling for the line tables comp_dir (by
doing it lazily it was too late to be handled properly by the MC line
table code).
Move that bit of the code back to the non-lazy place.
llvm-svn: 349819
Emit static locals within the correct lexical scope so variables with the same
name will not confuse the debugger into getting the wrong value.
Differential Revision: https://reviews.llvm.org/D55336
llvm-svn: 349777
This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations.
AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK.
Differential Revision: https://reviews.llvm.org/D55747
llvm-svn: 349765
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.
Reviewers: spatel, gchatelet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55263
llvm-svn: 349731
Creating the IR builder, then modifying the CFG, leads to an IRBuilder
where the BB and insertion point are inconsistent, so new instructions
have the wrong parent.
Modified an existing test because the test wasn't covering anything
useful (the "invoke" was not actually an invoke by the time we hit the
code in question).
Differential Revision: https://reviews.llvm.org/D55729
llvm-svn: 349693
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.
It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.
llvm-svn: 349664
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349629
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349628
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.
SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.
Thanks to @dmgreen for catching this.
Differential Revision: https://reviews.llvm.org/D55883
llvm-svn: 349625
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.
This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.
I've updated SelectionDAG::simplifyShift to demonstrate its use.
Differential Revision: https://reviews.llvm.org/D55819
llvm-svn: 349616
This patch moved the following files in lib/CodeGen/AsmPrinter/
AsmPrinterHandler.h
DbgEntityHistoryCalculator.h
DebugHandlerBase.h
to include/llvm/CodeGen directory.
Such a change will enable Target to extend DebugHandlerBase
and emit Target specific debug info sections.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D55755
llvm-svn: 349564
Clang uses weak linkage for objc runtime functions when they are not available on the platform.
The intrinsic has this linkage so we just need to pass that on to the runtime call.
llvm-svn: 349559
For performance reasons, clang set nonlazybind on these functions. Now that we
are using intrinsics instead of runtime calls, we should set this attribute when
creating the runtime functions.
llvm-svn: 349558
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.
llvm-svn: 349552
Summary: This the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE.
Reviewers: aditya_nandakumar, bogner
Reviewed By: bogner
Subscribers: rovka, kristof.beyls, javed.absar
Differential Revision: https://reviews.llvm.org/D55668
llvm-svn: 349514
Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM
and use integer type of correct size when creating arguments for
CLI.lowerCall.
Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64
on MIPS32.
Differential Revision: https://reviews.llvm.org/D55651
llvm-svn: 349499
Add narrowScalar for G_AND and G_XOR.
Legalize G_AND G_OR and G_XOR for types other then s32
with clampScalar on MIPS32.
Differential Revision: https://reviews.llvm.org/D55362
llvm-svn: 349475
- Reapply changes intially introduced in r343089
- The archtecture info is no longer loaded whenever a DWARFContext is created
- The runtimes libraries (santiziers) make use of the dwarf context classes but
do not intialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
string printing later on
Differential Revision: https://reviews.llvm.org/D55774
llvm-svn: 349472
For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.
llvm-svn: 349466
In PDBs, symbol records must be aligned to four bytes. However, in the
object file, symbol records may not be aligned. MSVC does not pad out
symbol records to make sure they are aligned. That means the linker has
to do extra work to insert the padding. Currently, LLD calculates the
required space with alignment, and copies each record one at a time
while padding them out to the correct size. It has a fast path that
avoids this copy when the records are already aligned.
This change fixes a bug in that codepath so that the copy is actually
saved, and tweaks LLVM's symbol record emission to align symbol records.
Here's how things compare when doing a plain clang Release+PDB build:
- objs are 0.65% bigger (negligible)
- link is 3.3% faster (negligible)
- saves allocating 441MB
- new LLD high water mark is ~1.05GB
llvm-svn: 349431
Mucking about simplifying a test case ( https://reviews.llvm.org/D55261 ) I stumbled across something I've hit before - that LLVM's (GCC's does too, FWIW) assembly output includes a hardcode length for a DWARF unit in its header. Instead we could emit a label difference - making the assembly easier to read/edit (though potentially at a slight (I haven't tried to observe it) performance cost of delaying/sinking the length computation into the MC layer).
Fix: Predicated all the changes (including creating the labels, even if they aren't used/needed) behind the NVPTX useSectionsAsReferences, avoiding emitting labels in NVPTX where ptxas can't parse them.
Reviewers: JDevlieghere, probinson, ABataev
Differential Revision: https://reviews.llvm.org/D55281
llvm-svn: 349430
The assertion type is always supposed to be a scalar type. So if the result VT of the assertion is a vector, we need to get the scalar VT before we can compare them.
Similarly for the assert above it.
I don't have a test case because I don't know of any place we violate this today. A coworker found this while trying to use r347287 on the 6.0 branch without also having r336868
llvm-svn: 349390
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.
I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.
Differential Revision: https://reviews.llvm.org/D55768
llvm-svn: 349374
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointint at dangling
memory.
llvm-svn: 349365
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)
There are a bunch of other checks that should prevent doing this when
it might be harmful.
We already do this transform for scalars in this spot. The vector
limitation was shared with a check for the case when the operands are
extended. I'm not sure if that limit is needed either, but that would
be a separate patch.
Differential Revision: https://reviews.llvm.org/D55448
llvm-svn: 349303
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type).
llvm-svn: 349298
Summary:
Make machine PHIs optimization to work for single value register taken from
several different copies. This is the first step to fix PR38917. This change
allows to get rid of redundant PHIs (see opt_phis2.mir test) to make
the subsequent optimizations (like CSE) possible and simpler.
For instance, before this patch the code like this:
%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b
could be optimized to:
%a = %b
but the code like this:
%c = COPY %z
...
%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b; %bb3, %c
would remain unchanged.
With this patch the latter case will be optimized:
%a = %z```.
Committed on behalf of: Anton Afanasyev anton.a.afanasyev@gmail.com
Reviewers: RKSimon, MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54839
llvm-svn: 349271
In ThinLTO many split CUs may be effectively empty because of the lack
of support for cross-unit references in split DWARF.
Using a split unit in those cases is just a waste/overhead - and turned
out to be one contributor to a significant symbolizer performance issue
when global variable debug info was being imported (see r348416 for the
primary fix) due to symbolizers seeing CUs with no ranges, assuming
there might still be addresses covered and walking into the split CU to
see if there are any ranges (when that split CU was in a DWP file, that
meant loading the DWP and its index, the index was extra large because
of all these fractured/empty CUs... and so was very expensive to load).
(the 3rd fix which will follow, is to assume that a CU with no ranges is
empty rather than merely missing its CU level range data - and to not
walk into its DIEs (split or otherwise) in search of address information
that is generally not present)
llvm-svn: 349207
Previously beginning a symbol record was excessively verbose. Now it's a
bit simpler. This follows the same pattern as begin/endCVSubsection.
llvm-svn: 349205
Summary:
This allows us to register it with the MachineFunction delegate and be
notified automatically about erasure and creation of instructions. However,
we still need explicit notification for modifications such as those caused
by setReg() or replaceRegWith().
There is a catch with this though. The notification for creation is
delivered before any operands can be added. While appropriate for
scheduling combiner work. This is unfortunate for debug output since an
opcode by itself doesn't provide sufficient information on what happened.
As a result, the work list remembers the instructions (when debug output is
requested) and emits a more complete dump later.
Another nit is that the MachineFunction::Delegate provides const pointers
which is inconvenient since we want to use it to schedule future
modification. To resolve this GISelWorkList now has an optional pointer to
the MachineFunction which describes the scope of the work it is permitted
to schedule. If a given MachineInstr* is in this function then it is
permitted to schedule work to be performed on the MachineInstr's. An
alternative to this would be to remove the const from the
MachineFunction::Delegate interface, however delegates are not permitted
to modify the MachineInstr's they receive.
In addition to this, the observer has three interface changes.
* erasedInstr() is now erasingInstr() to indicate it is about to be erased
but still exists at the moment.
* changingInstr() and changedInstr() have been added to report changes
before and after they are made. This allows us to trace the changes
in the debug output.
* As a convenience changingAllUsesOfReg() and
finishedChangingAllUsesOfReg() will report changingInstr() and
changedInstr() for each use of a given register. This is primarily useful
for changes caused by MachineRegisterInfo::replaceRegWith()
With this in place, both combine rules have been updated to report their
changes to the observer.
Finally, make some cosmetic changes to the debug output and make Combiner
and CombinerHelp
Reviewers: aditya_nandakumar, bogner, volkan, rtereshin, javed.absar
Reviewed By: aditya_nandakumar
Subscribers: mgorny, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D52947
llvm-svn: 349167
Implement options in clang to enable recording the driver command-line
in an ELF section.
Implement a new special named metadata, llvm.commandline, to support
frontends embedding their command-line options in IR/ASM/ELF.
This differs from the GCC implementation in some key ways:
* In GCC there is only one command-line possible per compilation-unit,
in LLVM it mirrors llvm.ident and multiple are allowed.
* In GCC individual options are separated by NULL bytes, in LLVM entire
command-lines are separated by NULL bytes. The advantage of the GCC
approach is to clearly delineate options in the face of embedded
spaces. The advantage of the LLVM approach is to support merging
multiple command-lines unambiguously, while handling embedded spaces
with escaping.
Differential Revision: https://reviews.llvm.org/D54487
Clang Differential Revision: https://reviews.llvm.org/D54489
llvm-svn: 349155
It costs nothing to spill an IMPLICIT_DEF value (the only spill code that's
generated is a KILL of the value), so when creating split constraints if the
live-out value is IMPLICIT_DEF the exit constraint should be DontCare instead
of PrefReg.
Differential Revision: https://reviews.llvm.org/D55652
llvm-svn: 349151
Summary:
If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes those causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.
To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.
Reviewers: RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55459
llvm-svn: 349137
build version load commands in the object file
This commit introduces a new metadata node called "SDK Version". It will be set
by the frontend to mark the platform SDK (macOS/iOS/etc) version which was used
during that particular compilation.
This node is used when machine code is emitted, by either saving the SDK version
into the appropriate macho load command (version min/build version), or by
emitting the assembly for these load commands with the SDK version specified as
well.
The assembly for both load commands is extended by allowing it to contain the
sdk_version X, Y [, Z] trailing directive to represent the SDK version
respectively.
rdar://45774000
Differential Revision: https://reviews.llvm.org/D55612
llvm-svn: 349119
This isn't quite NFC, but I don't know how to expose
any outward diffs from these changes. Mostly, this
was confusing because it used 'VT' to refer to the
operand type rather the usual type of the input node.
There's also a large block at the end that is dedicated
solely to matching loads, but that wasn't obvious. This
could probably be split up into separate functions to
make it easier to see.
It's still not clear to me when we make certain transforms
because the legality and constant conditions are
intertwined in a way that might be improved.
llvm-svn: 349095
On 32-bit archs, before, we would assume that an indirect symbol will
never have local linkage. This can lead to miscompiles where the
symbol's value would be 0 and the linker would use that value, because
the indirect symbol table would contain the value
`INDIRECT_SYMBOL_LOCAL` for that specific symbol.
Differential Revision: https://reviews.llvm.org/D55573
llvm-svn: 349060
This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from
number of uses to an opcode test for DELETED_NODE based on existing similar code.
Differential Revision: https://reviews.llvm.org/D55655
llvm-svn: 349058
Summary:
Sometimes MIR-level passes create DILocations that were not present in the
LLVM-IR. For example, it may merge two DILocations together to produce a
DILocation that points to line 0.
Previously, the address of these DILocations were printed which prevented the
MIR from being read back into LLVM. With this patch, DILocations will use
metadata references where possible and fall back on serializing them inline like so:
MOV32mr %stack.0.x.addr, 1, _, 0, _, %0, debug-location !DILocation(line: 1, scope: !15)
Reviewers: aprantl, vsk, arphaman
Reviewed By: aprantl
Subscribers: probinson, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D55243
llvm-svn: 349035
Move existing rotation expansion code into TargetLowering and set it up for vectors as well.
Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment.
Begun removing x86 vector rotate custom lowering to use the expansion.
llvm-svn: 349025
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 349016
Summary:
In addition to knowing that an instruction is changed. It's also useful to
know when it's about to change. For example, it might print the instruction so
you can track the changes in a debug log, it might remove it from some queue
while it's being worked on, or it might want to change several instructions as
a single transaction and act on all the changes at once.
Added changingInstr() to all existing uses of changedInstr()
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55623
llvm-svn: 348992
Summary:
There's little of interest that can be done to an already-erased instruction.
You can't inspect it, write it to a debug log, etc. It ought to be notification
that we're about to erase it. Rename the function to clarify the timing of the
event and reflect current usage.
Also fixed one case where we were trying to print an erased instruction.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55611
llvm-svn: 348976
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.
It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.
A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).
I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection.
Differential Revision: https://reviews.llvm.org/D55426
llvm-svn: 348953
If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef).
Differential Revision: https://reviews.llvm.org/D55558
llvm-svn: 348926
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D54719
llvm-svn: 348912
Summary:
Any time a symbol record, whether it's S_UDT, S_LOCAL, or S_[GL]DATA32,
references a record type, it should use the complete type index, even if
there's a typedef in the way.
Fixes the compiler part of PR39853.
Reviewers: zturner, aganea
Subscribers: hiraditya, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D55236
llvm-svn: 348902
Temporarily reverts commit r348806 due to strange asm compilation issues in certain modes (combination of asan+cuda+other things). Will provide repro soon.
llvm-svn: 348898
This fixes PR39845. CodeGenPrepare employs a transactional model when
performing optimizations, i.e. it changes the IR to attempt an optimization
and rolls back the change when it finds the change inadequate. It is during
the rollback that references to locals were dropped from debug value
intrinsics. This patch reinstates debuginfo references during rollbacks.
Reviewers: aprantl, vsk
Differential Revision: https://reviews.llvm.org/D55396
llvm-svn: 348896
https://reviews.llvm.org/D55516
Add the ability to pass in flags to buildInstr calls. Currently no
validation is performed but that can be easily performed based on the
opcode (if necessary).
Reviewed by: paquette.
llvm-svn: 348893
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 348843
https://reviews.llvm.org/D55294
Previously MachineIRBuilder::buildInstr used to accept variadic
arguments for sources (which were either unsigned or
MachineInstrBuilder). While this worked well in common cases, it doesn't
allow us to build instructions that have multiple destinations.
Additionally passing in other optional parameters in the end (such as
flags) is not possible trivially. Also a trivial call such as
B.buildInstr(Opc, Reg1, Reg2, Reg3)
can be interpreted differently based on the opcode (2defs + 1 src for
unmerge vs 1 def + 2srcs).
This patch refactors the buildInstr to
buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>)
where DstOps and SrcOps are typed unions that know how to add itself to
MachineInstrBuilder.
After this patch, most invocations would look like
B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..});
Now all the other calls (such as buildAdd, buildSub etc) forward to
buildInstr. It also makes it possible to build instructions with
multiple defs.
Additionally in a subsequent patch, we should make it possible to add
flags directly while building instructions.
Additionally, the main buildInstr method is now virtual and other
builders now only have to override buildInstr (for say constant
folding/cseing) is straightforward.
Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy
patch that should upgrade the API calls if necessary.
llvm-svn: 348815
Mucking about simplifying a test case ( https://reviews.llvm.org/D55261 ) I stumbled across something I've hit before - that LLVM's (GCC's does too, FWIW) assembly output includes a hardcode length for a DWARF unit in its header. Instead we could emit a label difference - making the assembly easier to read/edit (though potentially at a slight (I haven't tried to observe it) performance cost of delaying/sinking the length computation into the MC layer).
Reviewers: JDevlieghere, probinson, ABataev
Differential Revision: https://reviews.llvm.org/D55281
llvm-svn: 348806
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.
This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.
Differential Revisions: https://reviews.llvm.org/D53629
llvm-svn: 348788
If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node.
Zero constant folding will be handled in a future patch - its a little trickier as we often have bitcasted zero values.
Differential Revision: https://reviews.llvm.org/D55511
llvm-svn: 348784
As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now.
llvm-svn: 348781
Record the stack protector index in MachineFrameInfo when translating
Intrinsic::stackprotector similarly as is done by SelectionDAG when
processing the same intrinsic.
Setting this index allows the Prologue/Epilogue Insertion to recognize
that the stack protection is enabled. The pass can then make sure that
the stack protector comes before local variables on the stack and
assigns potentially vulnerable objects first so they are close to the
stack protector slot.
Differential Revision: https://reviews.llvm.org/D55418
llvm-svn: 348761
This triggers an assert when combining concat_vectors of a bitcast of
merge_values.
With asserts disabled, it fails to select:
fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8
0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1
0x7ff19d000b50: f64 = Register %1
In function: d
Differential Revision: https://reviews.llvm.org/D55507
llvm-svn: 348759