DAGCombiner::ReassociateOps was correctly testing for an constant integer scalar but failed to correctly test for constant integer vectors (it was testing for any constant vector).
llvm-svn: 233482
The original f32->f64 promotion logic was refactored into roughly the
currently shape in r37781. However, starting with r132263, the
legalizer has been split into different kinds, and the previous
"Promote" (which did the right thing) was search-and-replace'd into
"PromoteInteger". The divide gradually deepened, with type legalization
("PromoteInteger") being separated from ops legalization
("Promote", which still works for floating point ops).
Fast-forward to today: there's no in-tree target with legal f64 but
illegal f32 (rather: no tests were harmed in the making of this patch).
With such a target, i.e., if you trick the legalizer into going through
the PromoteInteger path for FP, you get the expected brokenness.
For instance, there's no PromoteIntRes_FADD (the name itself sounds
wrong), so we'll just hit some assert in the PromoteInteger path.
Don't pretend we can promote f32 to f64. Instead, always soften.
llvm-svn: 233464
Tailcalls are only OK with forwarded sret pointers. With explicit sret,
one approximation is to check that the pointer isn't an Instruction, as
in that case it might point into some local memory (alloca). That's not
OK with tailcalls.
Explicit sret counterpart to r233409.
Differential Revison: http://reviews.llvm.org/D8510
llvm-svn: 233410
Tailcalls are only OK with forwarded sret pointers. With sret demotion,
they're not, as we'd have a pointer into a soon-to-be-dead stack frame.
Differential Revison: http://reviews.llvm.org/D8510
llvm-svn: 233409
nodes.
When a node is terminal it is pushed at the end of the list of the copies to
coalesce instead of being completely ignored. In effect, this reduces its
priority over non-terminal nodes.
Because of that, we do not miss the rematerialization opportunities, nor the
copies that can be merged with more complex, than the terminal rule,
interference checks.
Related to PR22768.
llvm-svn: 233395
"Fix the MachineScheduler's logic for updating ready times for in-order.
Now the scheduler updates a node's ready time as soon as it is
scheduled, before releasing dependent nodes."
This fix was only made in one variant of the ScheduleDAGMI driver.
Francois de Ferriere reported the issue in the other bit of code where
it was also needed.
I never got around to coming up with a test case, but it's an
obvious fix that shouldn't be delayed any longer.
I'll try to refactor this code a little better.
I did verify performance on a wide variety of targets and saw no
negative impact with this fix.
llvm-svn: 233366
This was discussed a while back and I left it optional for migration. Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory.
llvm-svn: 233357
This patch adds support for explicitly provided spill slots in the GC arguments of a gc.statepoint. This is somewhat analogous to gcroot, but leverages the STATEPOINT MI node and StackMap infrastructure. The motivation for this is:
1) The stack spilling code for gc.statepoints hasn't advanced as fast as I'd like. One major option is to give up on doing spilling in the backend and do it at the IR level instead. We'd give up the ability to have gc values in registers, but that's a minor cost in practice. We are not neccessarily moving in that direction, but having the ability to prototype such a thing cheaply is interesting.
2) I want to port the gcroot lowering to use the statepoint infastructure. Given the metadata printers for gcroot expect a fixed set of stack roots, it's easiest to just reuse the explicit stack slots and pass them directly to the underlying statepoint.
I'm holding off on the documentation for the new feature until I'm reasonable sure this is going to stick around.
llvm-svn: 233356
We don't have any logic to emit those tables yet, so the SDAG lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.
llvm-svn: 233354
It can happen (by line CurSU->isPending = true; // This SU is not in
AvailableQueue right now.) that a SUnit is mark as available but is
not in the AvailableQueue. For SUnit being selected for scheduling
both conditions must be met.
This patch mainly defensively protects from invalid removing a node
from a queue. Sometimes nodes are marked isAvailable but are not in
the queue because they have been defered due to some hazard.
Patch by Pawel Bylica!
llvm-svn: 233351
We used to mark a bunch of libm nodes as Expand for f16. There are no
libcalls we can use for those, so we eventually just hit an unhelpful
llvm_unreachable in ExpandFPLibCall.
Instead, just ignore them altogether. If nothing else changes, we'll
then get the more descriptive and pleasant "Cannot select" fatal error.
There's an argument to be made for consistency, but f16 is already
special in all the good ways, and as long as there's no f16 support in
the ops expander (this patch), as well as the Soften/Expand float
legalizers (which, when hit, will currently segfault), I think there's
no point in even pretending we can legalize any of this.
This shouldn't affect anything that's not already broken.
llvm-svn: 233328
those are in the same basic block.
The previous approach was the topological order of the basic block.
By default this rule is disabled.
Related to PR22768.
llvm-svn: 233241
This patch adds supports for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.
It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.
Differential Revision: http://reviews.llvm.org/D8593
llvm-svn: 233224
If liveranges induced by an IMPLICIT_DEF get completely covered by a
proper liverange the IMPLICIT_DEF instructions and its corresponding
definitions have to be removed from the live ranges. This has to happen
in the subregister live ranges as well (I didn't see this case earlier
because in most programs only some subregisters are covered and the
IMPLCIT_DEF won't get removed).
No testcase, I spent hours trying to create one for one of the public
targets, but ultimately failed because I couldn't manage to properly
control the placement of COPY and IMPLICIT_DEF instructions from an .ll
file.
llvm-svn: 233217
We don't have any logic to emit those tables yet, so the sdag lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.
llvm-svn: 233209
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.
llvm-svn: 233153
We can't use TargetFrameLowering::getFrameIndexOffset directly, because
Win64 really wants the offset from the stack pointer at the end of the
prologue. Instead, use X86FrameLowering::getFrameIndexOffsetFromSP(),
which is a pretty close approximiation of that. It fails to handle cases
with interestingly large stack alignments, which is pretty uncommon on
Win64 and is TODO.
llvm-svn: 233137
The changes to InstCombine do seem a bit silly - it doesn't make
anything obviously better to have the caller access the pointers element
type (the thing I'm trying to remove) than the GEP itself, but it's a
helpful migration step. This will allow me to more obviously lock down
GEP (& Load, etc) API usage, then fix all the code that accesses pointer
element types except the places that need to be removed (most of the
InstCombines) anyway - at which point I'll need to just remove all that
code because it won't be meaningful anymore (there will be no pointer
types, so no bitcasts to combine)
llvm-svn: 233126
While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref):
%X = sitofp i8 -1 to double ; yields double:-1.0
%Y = uitofp i8 -1 to double ; yields double:255.0
The vector constant folding was always using sitofp:
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
This patch fixes this so that the correct opcode is used for sitofp and uitofp.
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0>
Differential Revision: http://reviews.llvm.org/D8560
llvm-svn: 233033
There is now a canonical symbol at the end of a section that different
passes can request.
This also allows us to assert that we don't switch back to a section whose
end symbol has already been printed.
llvm-svn: 233026
Fixing sign extension in makeLibCall for MIPS64. In MIPS64 architecture all
32 bit arguments (int, unsigned int, float 32 (soft float)) must be sign
extended. This fixes test "MultiSource/Applications/oggenc/".
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D7791
llvm-svn: 232943
Because the operands of a vector SETCC node can be of a different type from the
result (and often are), it can happen that even if we'd prefer to widen the
result type of the SETCC, the operands have been split instead. In this case,
the SETCC result also must be split. This mirrors what is done in
WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not
widened the following calls to GetWidenedVector will assert (which is what was
happening in the test case).
llvm-svn: 232935
As preparation for removing the getSubtargetImpl() call from
TargetMachine go ahead and flip the switch on caching the function
dependent subtarget and remove the bare getSubtargetImpl call
from the X86 port. As part of this add a few tests that show we
can generate code and assemble on X86 based on features/cpu on
the Function.
llvm-svn: 232879
thumb-ness similar to the rest of the Module level asm printing
infrastructure as debug info finalization happens after the function
may be missing.
llvm-svn: 232875
If we couldn't analyze its terminator (i.e., it's an indirectbr, or some
other weirdness), we can't safely re-if-convert a predicated block,
because we can't tell whether the predicated terminator can
fallthrough (it does).
Currently, we would completely ignore the fallthrough successor. In
the added testcase, this means we used to generate:
...
@ %entry:
cmp r5, #21
ittt ne
@ %cc1f:
cmpne r7, #42
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %cc1t:
...
Whereas the successor of %cc1f was originally %bb1.
With the fix, we get the correct:
...
@ %entry:
cmp r5, #21
itt eq
@ %cc1t:
streq.w r5, [r11]
moveq pc, r0
@ %cc1f:
cmp r7, #42
itt ne
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %bb1:
...
rdar://20192768
Differential Revision: http://reviews.llvm.org/D8509
llvm-svn: 232872
The code this patch removes was there to make sure the text sections went
before the dwarf sections. That is necessary because MachO uses offsets
relative to the start of the file, so adding a section can change relaxations.
The dwarf sections were being printed at the start just to produce symbols
pointing at the start of those sections.
The underlying issue was fixed in r231898. The dwarf sections are now printed
when they are about to be used, which is after we printed the text sections.
To make sure we don't regress, the patch makes the MachO streamer assert
if CodeGen puts anything unexpected after the DWARF sections.
llvm-svn: 232842
Check return of `getDISubprogram()` before using it. A WIP patch makes
`DIDescriptor` accessors more strict (and would crash on this).
llvm-svn: 232838
`DL` might be null, so check for that before using accessors. A WIP
patch to make `DIDescriptors` more strict fails otherwise.
As a bonus, I think the logic is easier to follow now (despite the extra
nesting depth).
llvm-svn: 232836
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
Differential Revision: http://reviews.llvm.org/D8419
llvm-svn: 232825
numbers before emission.
This removes a dependency on being able to access TRI at the module
level and is similar to the DwarfExpression handling. I've modified
the debug support into print/dump routines that'll do the same dumping
but is now callable anywhere and if TRI isn't available will go ahead
and just print out raw register numbers.
llvm-svn: 232821
nullptr so that users get an earlier dereferencing error and
so that we can use it to conditionalize access to MachineFunction
specific data.
llvm-svn: 232820
With the option -outline-optional-branches, LLVM will place optional
branches out of line (more details on r231230).
With this patch, this is not done for short optional branches. A short
optional branch is a branch containing a single block with an
instruction count below a certain threshold (defaulting to 3). Still
everything is guarded under -outline-optional-branches).
Outlining a short branch can't significantly improve code locality. It
can however decrease performance because of the additional jmp and in
cases where the optional branch is hot. This fixes a compile time
regression I have observed in a benchmark.
Review: http://reviews.llvm.org/D8108
llvm-svn: 232802
This is very related to the bug fixed in r174431. The problem is that
SelectionDAG does not include alignment in the uniquing of loads and
stores. When an otherwise no-op DAGCombine would increase the alignment
of a load or store, the original node would be returned (with the
alignment increased), which would cause the node not to be processed by
any further DAGCombines.
I don't have a direct testcase for this that manifests on an in-tree
target, but I did see some noise in the tests for other targets and have
updated them for it.
llvm-svn: 232780
This enables us to remove calls to the subtarget from the TargetMachine
and with a small hack for backends that require global subtarget
information for module level code generation, e.g. mips abi flags, as
mentioned in a fixme in the code.
llvm-svn: 232776
This switches the sense of the i32 values and updates the test cases.
We can also use CHECK-SAME to clean up some tests, and reduce the visual
noise from bitcasts.
llvm-svn: 232774
Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass.
Instead, call into the `DebugInfoVerifier` from inside
`VerifierLegacyPass::finalizeModule()`. This better matches the logic
in `verifyModule()` (used by the new PassManager), avoids requiring two
separate passes to verify the IR, and makes the API for "add a pass to
verify the IR" simple.
Note: the `-verify-debug-info` flag still works (for now, at least;
eventually it might make sense to just remove it).
llvm-svn: 232772
Some subregisters are only to indicate different access sizes, while not
providing any way to actually divide the register up into multiple
disjunct parts. Avoid tracking subregister liveness in these cases as it
is not beneficial.
Differential Revision: http://reviews.llvm.org/D8429
llvm-svn: 232695
No outlining is necessary for SEH catch blocks. Use the blockaddr of the
handler in place of the usual outlined function.
Reviewers: majnemer, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D8370
llvm-svn: 232664
Memcpy, and other memory intrinsics, typically tries to use LDM/STM if
the source and target addresses are 4-byte aligned. In CodeGenPrepare
look for calls to memory intrinsics and, if the object is on the
stack, 4-byte align it if it's large enough that we expect that memcpy
would want to use LDM/STM to copy it.
Differential Revision: http://reviews.llvm.org/D7908
llvm-svn: 232627
Targets which provide a rotate make it possible to replace a sequence of
(XOR (SHL 1, x), -1) with (ROTL ~1, x). This saves an instruction on
architectures like X86 and POWER(64).
Differential Revision: http://reviews.llvm.org/D8350
llvm-svn: 232572
COFF COMDATs (for selection kinds other than 'select any') require at
least one non-section symbol in the symbol table.
Satisfy this by morally enhancing the linkage from private to internal.
Differential Revision: http://reviews.llvm.org/D8394
llvm-svn: 232570
Summary:
COFF COMDATs (for selection kinds other than 'select any') require at
least one non-section symbol in the symbol table.
Satisfy this by morally enhancing the linkage from private to internal.
Reviewers: rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8374
llvm-svn: 232539
Before this patch code wanting to create temporary labels for a given entity
(function, cu, exception range, etc) had to keep its own counter to have stable
symbol names.
createTempSymbol would still add a suffix to make sure a new symbol was always
returned, but it kept a single counter. Because of that, if we were to use
just createTempSymbol("cu_begin"), the label could change from cu_begin42 to
cu_begin43 because some other code started using temporary labels.
Simplify this by just keeping one counter per prefix and removing the various
specialized counters.
llvm-svn: 232535
Now that we check `MDExpression` during `-verify` (r232299), make
the `DIExpression` wrapper more strict:
- remove redundant checks in `DebugInfoVerifier`,
- overload `get()` to `cast_or_null<MDExpression>` (superseding
`getRaw()`),
- stop checking for null in any accessor, and
- remove `DIExpression::Verify()` entirely in favour of
`MDExpression::isValid()`.
There is still some logic in this class, mostly to do with high-level
iterators; I'll defer cleaning up those until the rest of the wrappers
are similarly strict.
llvm-svn: 232412
are not at the file level.
Previously, the default subtarget created from the target triple was used to
emit inline asm instructions. Compilation would fail in cases where the feature
bits necessary to assemble an inline asm instruction in a function weren't set.
llvm-svn: 232392
Specifically, if there are copy-like instructions in the loop header
they are moved into the loop close to their uses. This reduces the live
intervals of the values and can avoid register spills.
This is working towards a fix for http://llvm.org/PR22230.
Review: http://reviews.llvm.org/D7259
Next steps:
- Find a better cost model (which non-copy instructions should be sunk?)
- Make this dependent on register pressure
llvm-svn: 232262