the ones that get or set the frame index for the $gp save slot.
Remove the piece of code in MipsFunctionInfo::getGlobalBaseReg() which returns
GP. This function should always return a virtual register.
llvm-svn: 156695
- Stop creating stack frame objects needed for saving $gp.
- Insert a node that copies the global pointer register to register $gp
before the call node. This will ensure $gp is valid at the entry of the
called function.
llvm-svn: 156692
- Stop emitting instructions needed to initialize the global pointer register.
- Stop emitting .cprestore directive.
- Do not take into account the $gp save slot when computing stack size.
llvm-svn: 156691
- Remove code which lowers pseudo SETGP01.
- Fix LowerSETGP01. The first two of the three instructions that are emitted to
initialize the global pointer register now use register $2.
- Stop emitting .cpload directive.
llvm-svn: 156689
pointer register.
This is the first of the series of patches which clean up the way global pointer
register is used. The patches will make the following improvements:
- Make $gp an allocatable temporary register rather than reserving it.
- Use a virtual register as the global pointer register and let the register
allocator decide which register to assign to it or whether spill/reloads are
needed.
- Make sure $gp is valid at the entry of a called function, which is necessary
for functions using lazy binding.
- Remove the need for emitting .cprestore and .cpload directives.
llvm-svn: 156671
This patch will optimize the following cases:
sub r1, r3 | sub r1, imm
cmp r3, r1 or cmp r1, r3 | cmp r1, imm
bge L1
TO
subs r1, r3
bge L1 or ble L1
If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.
rdar: 10734411
llvm-svn: 156599
This patch will optimize the following cases:
sub r1, r3 | sub r1, imm
cmp r3, r1 or cmp r1, r3 | cmp r1, imm
bge L1
TO
subs r1, r3
bge L1 or ble L1
If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.
rdar: 10734411
llvm-svn: 156550
The getPointerRegClass() hook can return register classes that depend on
the calling convention of the current function (ptr_rc_tailcall).
So far, we have been able to infer the calling convention from the
subtarget alone, but as we add support for multiple calling conventions
per target, that no longer works.
Patch by Yiannis Tsiouris!
llvm-svn: 156328
This function is a generalization of getMatchingSuperRegClass() to the
symmetric case where both sides are using a sub-register index. It will
find a super-register class and sub-register indexes that make this
diagram commute:
PreA
SuperRC ----------> RCA
| |
| |
PreB | | SubA
| |
| |
V V
RCB ----------> SubRC
SubB
This can be used to coalesce copies like:
%vreg1:sub16 = COPY %vreg2:sub16; GR64:%vreg1, GR32: %vreg2
llvm-svn: 156317
This patch will optimize -(x != 0) on X86
FROM
cmpl $0x01,%edi
sbbl %eax,%eax
notl %eax
TO
negl %edi
sbbl %eax %eax
In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
rdar: 10961709
llvm-svn: 156312
This will be used to determine whether it's profitable to turn a select into a
branch when the branch is likely to be predicted.
Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM.
I'm not entirely happy with the name of this flag, suggestions welcome ;)
llvm-svn: 156233
In file included from ../lib/Target/NVPTX/VectorElementize.cpp:53:
../lib/Target/NVPTX/NVPTX.h:44:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default]
default: assert(0 && "Unknown condition code");
^
1 warning generated.
The prevailing pattern in LLVM is to not use a default label, and instead to
use llvm_unreachable to denote that the switch in fact covers all return paths
from the function.
llvm-svn: 156209
The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX
The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.
NV_CONTRIB
llvm-svn: 156196
This moves the logic for selecting a TLS model to a single place,
instead of the previous three (ARM, Mips, and X86 which already
uses this function).
llvm-svn: 156162
This iterator class provides a more abstract interface to the (Idx,
Mask) lists of super-registers for a register class. The layout of the
tables shouldn't be exposed to clients.
llvm-svn: 156144
for the assembler and disassembler. Which were not being set/read correctly
for offsets greater than 22 bits in some cases.
Changes to lib/Target/ARM/ARMAsmBackend.cpp from Gideon Myles!
llvm-svn: 156118
The ensures that virtual registers always belong to an allocatable class.
If your target attempts to create a vreg for an operand that has no
allocatable register subclass, you will crash quickly.
This ensures that targets define register classes as intended.
llvm-svn: 156046
Expressions for movw/movt don't always have an :upper16: or :lower16:
on them and that's ok. When they don't, it's just a plain [0-65536]
immediate result, effectively the same as a :lower16: variant kind.
rdar://10550147
llvm-svn: 155941
in order to avoid assertion failures in the register scavenger. The assertion
failures were “Bad machine code: Using an undefined physical register” and
“Bad machine code: MBB exits via unconditional fall-through but its successor
differs from its CFG successor!”.
llvm-svn: 155930
This patch will optimize the following cases on X86
(a > b) ? (a-b) : 0
(a >= b) ? (a-b) : 0
(b < a) ? (a-b) : 0
(b <= a) ? (a-b) : 0
FROM
movl %edi, %ecx
subl %esi, %ecx
cmpl %edi, %esi
movl $0, %eax
cmovll %ecx, %eax
TO
xorl %eax, %eax
subl %esi, %edi
cmovll %eax, %edi
movl %edi, %eax
rdar: 10734411
llvm-svn: 155919
The TargetPassManager's default constructor wants to initialize the PassManager
to 'null'. But it's illegal to bind a null reference to a null l-value. Make the
ivar a pointer instead.
PR12468
llvm-svn: 155902
Replace some assert() calls w/ actual diagnostics. In a perfect world,
there'd be range checks on these values long before things ever reached
this code. For now, though, issuing a better-late-than-never diagnostic
is still a big improvement over assert().
rdar://11347287
llvm-svn: 155851
This was exposed by SingleSource/UnitTests/Vector/constpool.c.
The computed size of a basic block isn't always a multiple of its known
alignment, and that can introduce extra alignment padding after the
block.
<rdar://problem/11347135>
llvm-svn: 155845
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.
(this time, actually commit what was reviewed!)
llvm-svn: 155825
ARM BUILD_VECTORs created after type legalization cannot use i8 or i16
operands, since those types are not legal. Instead use i32 operands, which
will be implicitly truncated by the BUILD_VECTOR to match the element type.
llvm-svn: 155824
The code could search past the end of the basic block when there was
already a constant pool entry after the block.
Test case with giant basic block in SingleSource/UnitTests/Vector/constpool.c
llvm-svn: 155753
Make sure when parsing the Thumb1 sp+register ADD instruction that
the source and destination operands match. In thumb2, just use the
wide encoding if they don't. In Thumb1, issue a diagnostic.
rdar://11219154
llvm-svn: 155748
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.
llvm-svn: 155745
Previously, ARMConstantIslandPass would conservatively compute the
address of an aligned basic block as:
RoundUpToAlignment(Offset + UnknownPadding)
This worked fine for the layout algorithm itself, but it could fool the
verify() function because it accounts for alignment padding twice: Once
when adding the worst case UnknownPadding, and again by rounding up the
fictional block offset. This meant that when optimizeThumb2Instructions
would shrink an instruction, the conservative distance estimate could
grow. That shouldn't be possible since the woorst case alignment padding
wss already included.
This patch drops the use of RoundUpToAlignment, and depends only on
worst case padding to compute conservative block offsets. This has the
weird effect that the computed offset for an aligned block may not be
aligned.
The important difference is that shrinking an instruction can never
cause the estimated distance between two instructions to grow. The
estimated distance is always larger than the real distance that only the
assembler knows.
<rdar://problem/11339352>
llvm-svn: 155744
x == -y --> x+y == 0
x != -y --> x+y != 0
On x86, the generated code goes from
negl %esi
cmpl %esi, %edi
je .LBB0_2
to
addl %esi, %edi
je .L4
This case is correctly handled for ARM with "cmn".
Patch by Manman Ren.
rdar://11245199
PR12545
llvm-svn: 155739
* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.
Fixes PR6679. Patch by Christoph Erhardt!
llvm-svn: 155704
The base address for the PC-relative load is Align(PC,4), so it's the
address of the word containing the 16-bit instruction, not the address
of the instruction itself. Ugh.
rdar://11314619
llvm-svn: 155659
On some cores it's a bad idea for performance to mix VFP and NEON instructions
and since these patterns are NEON anyway, the NEON load should be used.
llvm-svn: 155630
the feature set of v7a. This comes about if the user specifies something like
-arch armv7 -mcpu=cortex-m3. We shouldn't be generating instructions such as
uxtab in this case.
rdar://11318438
llvm-svn: 155601
When an instruction match is found, but the subtarget features it
requires are not available (missing floating point unit, or thumb vs arm
mode, for example), issue a diagnostic that identifies what the feature
mismatch is.
rdar://11257547
llvm-svn: 155499
immediate. We can't use it here because the shuffle code does not check that
the lower part of the word is identical to the upper part.
llvm-svn: 155440
using the pattern (vbroadcast (i32load src)). In some cases, after we generate
this pattern new users are added to the load node, which prevent the selection
of the blend pattern. This commit provides fallback patterns which perform
in-vector broadcast (using in-vector vbroadcast in AVX2 and pshufd on AVX1).
llvm-svn: 155437
on X86 Atom. Some of our tests failed because the tail merging part of
the BranchFolding pass was creating new basic blocks which did not
contain live-in information. When the anti-dependency code in the Post-RA
scheduler ran, it would sometimes rename the register containing
the function return value because the fact that the return value was
live-in to the subsequent block had been lost. To fix this, it is necessary
to run the RegisterScavenging code in the BranchFolding pass.
This patch makes sure that the register scavenging code is invoked
in the X86 subtarget only when post-RA scheduling is being done.
Post RA scheduling in the X86 subtarget is only done for Atom.
This patch adds a new function to the TargetRegisterClass to control
whether or not live-ins should be preserved during branch folding.
This is necessary in order for the anti-dependency optimizations done
during the PostRASchedulerList pass to work properly when doing
Post-RA scheduling for the X86 in general and for the Intel Atom in particular.
The patch adds and invokes the new function trackLivenessAfterRegAlloc()
instead of using the existing requiresRegisterScavenging().
It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of
requiresRegisterScavenging(). It changes the all the targets that
implemented requiresRegisterScavenging() to also implement
trackLivenessAfterRegAlloc().
It adds an assertion in the Post RA scheduler to make sure that post RA
liveness information is available when it is needed.
It changes the X86 break-anti-dependencies test to use –mcpu=atom, in order
to avoid running into the added assertion.
Finally, this patch restores the use of anti-dependency checking
(which was turned off temporarily for the 3.1 release) for
Intel Atom in the Post RA scheduler.
Patch by Andy Zhang!
Thanks to Jakob and Anton for their reviews.
llvm-svn: 155395
test suite failures. The failures occur at each stage, and only get
worse, so I'm reverting all of them.
Please resubmit these patches, one at a time, after verifying that the
regression test suite passes. Never submit a patch without running the
regression test suite.
llvm-svn: 155372
Use the new TwoOperandAliasConstraint to handle lots of the two-operand aliases
for NEON instructions. There's still more to go, but this is a good chunk of
them.
llvm-svn: 155210
(load only has one operand) and smuggle in some whitespace changes too
NB: I am obviously testing the water here, and believe that the unguarded
cast is still wrong, but why is the getZExtValue of the load's operand
tested against zero here? Any review is appreciated.
llvm-svn: 155190
symbolicated. These have and operand type of TYPE_RELv which was not handled
as isBranch in translateImmediate() in X86Disassembler.cpp. rdar://11268426
llvm-svn: 155074
commits have had several major issues pointed out in review, and those
issues are not being addressed in a timely fashion. Furthermore, this
was all committed leading up to the v3.1 branch, and we don't need piles
of code with outstanding issues in the branch.
It is possible that not all of these commits were necessary to revert to
get us back to a green state, but I'm going to let the Hexagon
maintainer sort that out. They can recommit, in order, after addressing
the feedback.
Reverted commits, with some notes:
Primary commit r154616: HexagonPacketizer
- There are lots of review comments here. This is the primary reason
for reverting. In particular, it introduced large amount of warnings
due to a bad construct in tablegen.
- Follow-up commits that should be folded back into this when
reposting:
- r154622: CMake fixes
- r154660: Fix numerous build warnings in release builds.
- Please don't resubmit this until the three commits above are
included, and the issues in review addressed.
Primary commit r154695: Pass to replace transfer/copy ...
- Reverted to minimize merge conflicts. I'm not aware of specific
issues with this patch.
Primary commit r154703: New Value Jump.
- Primarily reverted due to merge conflicts.
- Follow-up commits that should be folded back into this when
reposting:
- r154703: Remove iostream usage
- r154758: Fix CMake builds
- r154759: Fix build warnings in release builds
- Please incorporate these fixes and and review feedback before
resubmitting.
Primary commit r154829: Hexagon V5 (floating point) support.
- Primarily reverted due to merge conflicts.
- Follow-up commits that should be folded back into this when
reposting:
- r154841: Remove unused variable (fixing build warnings)
There are also accompanying Clang commits that will be reverted for
consistency.
llvm-svn: 155047
also fix SimplifyLibCalls to use TLI rather than compile-time conditionals to enable optimizations on floor, ceil, round, rint, and nearbyint
llvm-svn: 154960
As an example, attach range info to the "invalid instruction" message:
$ clang -arch arm -c asm.c
asm.c:2:11: error: invalid instruction
__asm__("foo r0");
^
<inline asm>:1:2: note: instantiated into assembly here
foo r0
^~~
llvm-svn: 154765