In order to introduce v8.1a-specific entities, Mappers should be aware of SubtargetFeatures available.
This patch introduces refactoring, that will then allow to easily introduce:
- v8.1-specific "pan" PState for PStateMapper (PAN extension)
- v8.1-specific sysregs for SysRegMapper (LOR,VHE extensions)
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8496
Patch by Tom Coxon
llvm-svn: 235089
Summary:
This assembler directive marks the current label as an instruction label in microMIPS and MIPS16.
This initial implementation works only for microMIPS.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8006
llvm-svn: 235084
BXJ was incorrectly said to be unsupported in ARMv8-A. It is not
supported in the A64 instruction set, but it is supported in the T32
and A32 instruction sets, because it's listed as an instruction in the
ARM ARM section F7.1.28.
Using SP as an operand to BXJ changed from UNPREDICTABLE to
PREDICTABLE in v8-A. This patch reflects that update as well.
This was found by MCHammer.
llvm-svn: 235024
This is a 1-line patch (with a TODO for AVX because that will affect
even more regression tests) that lets us substitute the appropriate
64-bit store for the float/double/int domains.
It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and
0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice.
Differential Revision: http://reviews.llvm.org/D8691
llvm-svn: 235014
Set the transform bar at 2 divisions because the fastest current
x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle
latency (best case) relative to a 5 cycle multiplier.
So that's the worst case for this transform (no latency win),
but multiplies are obviously pipelined while divisions are not,
so there's still a big throughput win which we would expect to
show up in typical FP code.
These are the sequences I'm comparing:
divss %xmm2, %xmm0
mulss %xmm1, %xmm0
divss %xmm2, %xmm0
Becomes:
movss LCPI0_0(%rip), %xmm3 ## xmm3 = mem[0],zero,zero,zero
divss %xmm2, %xmm3
mulss %xmm3, %xmm0
mulss %xmm1, %xmm0
mulss %xmm3, %xmm0
[Ignore for the moment that we don't optimize the chain of 3 multiplies
into 2 independent fmuls followed by 1 dependent fmul...this is the DAG
version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that,
then the transform becomes even more profitable on all targets.]
Differential Revision: http://reviews.llvm.org/D8941
llvm-svn: 235012
Summary:
MSP430 doesn't seem to have any additional constraints. Therefore remove
the target hook.
No functional change intended.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8208
llvm-svn: 235003
Summary:
Refactor MipsAsmParser::getATReg to return an internal register number instead of a register index.
Also change all the int's to unsigned, seeing as the current AT register index is stored as an unsigned in MipsAssemblerOptions.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8478
llvm-svn: 234996
The ARMv8 ARMARM states that for these instructions in A64 state:
"Unspecified bits in "imm5" are ignored but should be set to zero by an assembler.", (imm4 for INS).
Make the disassembler accept any encoding with these ignored bits set to 1.
llvm-svn: 234896
This pass will always try to insert llvm.SI.ifbreak intrinsics
in the same block that its conditional value is computed in. This is
a problem when conditions for breaks or continue are computed outside
of the loop, because the llvm.SI.ifbreak intrinsic ends up being inserted
outside of the loop.
This patch fixes this problem by inserting the llvm.SI.ifbreak
intrinsics in the loop header when the condition is computed outside
the loop.
llvm-svn: 234891
Some targets (ie. Mips) have additional rules for ordering the relocation
table entries. Allow them to override generic sortRelocs(), which sorts
entries by Offset.
Then override this function for Mips, to emit HI16 and GOT16 relocations
against the local symbol in pair with the corresponding LO16 relocation.
Patch by Vladimir Stefanovic.
Differential Revision: http://reviews.llvm.org/D7414
llvm-svn: 234883
Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses. Note
that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`.
llvm-svn: 234850
Gut all the non-pointer API from the variable wrappers, except an
implicit conversion from `DIGlobalVariable` to `DIDescriptor`. Note
that if you're updating out-of-tree code, `DIVariable` wraps
`MDLocalVariable` (`MDVariable` is a common base class shared with
`MDGlobalVariable`).
llvm-svn: 234840
Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)"
Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO"
Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB"
Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll
on hexagon, nvptx, and r600. Revert while I investigate.
llvm-svn: 234768
v2: tighten the sub64 tests
v3: rename to CARRY/BORROW
v4: fixup test cmdline
add known bits computation
use sign extend instead of sub 0,x
better add test
v5: remove redundant break
move lowering to separate functions
fix comments
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewers: arsenm
llvm-svn: 234759
When I fixed these a couple of days ago to iterate over all loops, not just
depth == 1 loops, I inadvertently made it such that we'd only look at the first
top-level loop. Make sure that we really look at all of them.
llvm-svn: 234705
As it turns out, even though these are part of ISA 2.06, the P7 does not
support them (or, at least, not any P7s we're tested so far).
llvm-svn: 234686
This patch corresponds to review:
http://reviews.llvm.org/D8928
It adds direct move instructions to/from VSX registers to GPR's. These are
exploited for FP <-> INT conversions.
llvm-svn: 234682
The patch is generated using clang-tidy misc-use-override check.
This command was used:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
-checks='-*,misc-use-override' -header-filter='llvm|clang' \
-j=32 -fix -format
http://reviews.llvm.org/D8925
llvm-svn: 234679
This pass had the same problem as the data-prefetching pass: it was only
checking for depth == 1 loops in practice. Fix that, add some debugging
statements, and make sure that, when we grab an AddRec, it is for the loop we
expect.
llvm-svn: 234670
Currently, there's a single flag, checked by the pass itself.
It can't force-enable the pass (and is on by default), because it
might not even have been created, as that's the targets decision.
Instead, have separate explicit flags, so that the decision is
consistently made in the target.
Keep the flag as a last-resort "force-disable GlobalMerge" for now,
for backwards compatibility.
llvm-svn: 234666
The spilled registers are pristine and thus, correctly handled by
the register scavenger and so on, but the liveness information is
strictly speaking wrong at this point.
Fix that.
llvm-svn: 234664
Iterating over loops from the LoopInfo instance only provides top-level loops.
We need to search the whole tree of loops to find the inner ones.
llvm-svn: 234603
Using SchedAliases is convenient and works well for latency and resource
lookup for instructions. However, this creates an entry in
AArch64WriteLatencyTable with a WriteResourceID of 0, breaking any
SchedReadAdvance since the lookup will fail.
http://reviews.llvm.org/D8043
Patch by Dave Estes <cestes@codeaurora.org>!
llvm-svn: 234594
Summary:
Some optimizations such as jump threading and loop unswitching can negatively
affect performance when applied to divergent branches. The divergence analysis
added in this patch conservatively estimates which branches in a GPU program
can diverge. This information can then help LLVM to run certain optimizations
selectively.
Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
Reviewers: resistor, hfinkel, eliben, meheff, jholewinski
Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8576
llvm-svn: 234567
When we have an instruction for this (and, thus, don't generate a runtime
call), we need to custom type legalize this (in a trivial way, just as we do
for fp_to_sint).
Fixes PR23173.
llvm-svn: 234561
For the most common ones (such as fadd), we already did the promotion.
Do the same thing for all the others.
Currently, we'll just crash/assert on all these operations, as
there's no hardware or libcall support whatsoever.
f16 (half) is specified as an interchange - not arithmetic - format,
and is expected to be promoted to single-precision for arithmetic
operations.
While there, teach the legalizer about promoting some of the (mostly
floating-point) operations that we never needed before.
Differential Revision: http://reviews.llvm.org/D8648
See related discussion on the thread for: http://reviews.llvm.org/D8755
llvm-svn: 234550
This is the patch corresponding to review:
http://reviews.llvm.org/D8406
It adds some missing instructions from ISA 2.06 to the PPC back end.
llvm-svn: 234546
formatted_raw_ostream is a wrapper over another stream to add column and line
number tracking.
It is used only for asm printing.
This patch moves the its creation down to where we know we are printing
assembly. This has the following advantages:
* Simpler lifetime management: std::unique_ptr
* We don't compute column and line number of object files :-)
llvm-svn: 234535
The integer extend optimization tries to fold the extend into the load
instruction. This requires us to identify if the extend has already been
emitted or not and act accordingly on it.
The check that was originally performed for this was not sufficient. Besides
checking the ValueMap for a mapped register we also need to check if the
virtual register has already an associated machine instruction that defines it.
This fixes rdar://problem/20470788.
llvm-svn: 234529