Commit Graph

7511 Commits

Author SHA1 Message Date
Venkatraman Govindaraju f80d72f149 Sparc: Add support for indirect branch and blockaddress in Sparc backend.
llvm-svn: 183094
2013-06-03 05:58:33 +00:00
Venkatraman Govindaraju 774fe2e29a Sparc: When storing 0, use %g0 directly in the store instruction instead of
using two instructions (sethi and store).

llvm-svn: 183090
2013-06-03 00:21:54 +00:00
Venkatraman Govindaraju 0bbe1b210e Sparc: Combine add/or/sethi instruction with restore if possible.
llvm-svn: 183088
2013-06-02 21:48:17 +00:00
Venkatraman Govindaraju 3e8c7d98be Sparc: Perform leaf procedure optimization by default
llvm-svn: 183083
2013-06-02 02:24:27 +00:00
Venkatraman Govindaraju 28e2cd0e7e Sparc: Mark functions calling llvm.vastart and llvm.returnaddress intrinsics as non-leaf functions.
llvm-svn: 183079
2013-06-01 20:42:48 +00:00
Tim Northover 339bf154cc Revert r183069: "TMP: LEA64_32r fixing"
Very sorry, it was committed from the wrong branch by mistake.

llvm-svn: 183070
2013-06-01 10:23:46 +00:00
Tim Northover 57954f04b3 TMP: LEA64_32r fixing
llvm-svn: 183069
2013-06-01 10:21:54 +00:00
Tim Northover 3a1fd4c0ac X86: change MOV64ri64i32 into MOV32ri64
The MOV64ri64i32 instruction required hacky MCInst lowering because it
was allocated as setting a GR64, but the eventual instruction ("movl")
only set a GR32. This converts it into a so-called "MOV32ri64" which
still accepts a (appropriate) 64-bit immediate but defines a GR32.
This is then converted to the full GR64 by a SUBREG_TO_REG operation,
thus keeping everyone happy.

This fixes a typo in the opcode field of the original patch, which
should make the legact JIT work again (& adds test for that problem).

llvm-svn: 183068
2013-06-01 09:55:14 +00:00
Venkatraman Govindaraju 3521dcdcc4 [Sparc] Generate correct code for leaf functions with stack objects
llvm-svn: 183067
2013-06-01 04:51:18 +00:00
Eric Christopher e1e57e5ebd Temporarily Revert "X86: change MOV64ri64i32 into MOV32ri64" as it
seems to have caused PR16192 and other JIT related failures.

llvm-svn: 183059
2013-05-31 23:30:45 +00:00
Quentin Colombet 8aa7abe2ae Modify how the formulae are rated in Loop Strength Reduce.
Namely, check if the target allows to fold more that one register in the
addressing mode and if yes, adjust the cost accordingly.

Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred
to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2
needs a temporary register for the computation, whereas it was correctly
estimated for reg1 + scale * reg2.

<rdar://problem/13973908>

llvm-svn: 183021
2013-05-31 17:20:29 +00:00
Richard Sandiford 30efd87f6e [SystemZ] Don't use LOAD and STORE REVERSED for volatile accesses
Unlike most -- hopefully "all other", but I'm still checking -- memory
instructions we support, LOAD REVERSED and STORE REVERSED may access
the memory location several times.  This means that they are not suitable
for volatile loads and stores.

This patch is a prerequisite for better atomic load and store support.
The same principle applies there: almost all memory instructions we
support are inherently atomic ("block concurrent"), but LOAD REVERSED
and STORE REVERSED are exceptions.

Other instructions continue to allow volatile operands.  I will add
positive "allows volatile" tests at the same time as the "allows atomic
load or store" tests.

llvm-svn: 183002
2013-05-31 13:25:22 +00:00
Justin Holewinski dbb3b2f4b6 [NVPTX] Re-enable support for virtual registers in the final output
Now that 3.3 is branched, we are re-enabling virtual registers to help
iron out bugs before the next release. Some of the post-RA passes do
not play well with virtual registers, so we disable them for now. The
needed functionality of the PrologEpilogInserter pass is copied to a
new backend-specific NVPTXPrologEpilog pass.

The test for this commit is not breaking the existing tests.

llvm-svn: 182998
2013-05-31 12:14:49 +00:00
Tim Northover d4736d67f4 X86: change MOV64ri64i32 into MOV32ri64
The MOV64ri64i32 instruction required hacky MCInst lowering because it was
allocated as setting a GR64, but the eventual instruction ("movl") only set a
GR32. This converts it into a so-called "MOV32ri64" which still accepts a
(appropriate) 64-bit immediate but defines a GR32. This is then converted to
the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy.

llvm-svn: 182991
2013-05-31 09:57:13 +00:00
Akira Hatanaka 2bf97336af [mips] Big-endian code generation for atomic instructions.
Patch by Jyun-Yan You.

llvm-svn: 182984
2013-05-31 03:25:44 +00:00
Rafael Espindola 99bd2ae479 Revert r182937 and r182877.
r182877 broke MCJIT tests on ARM and r182937 was working around another failure
by r182877.

This should make the ARM bots green.

llvm-svn: 182960
2013-05-30 20:37:52 +00:00
Benjamin Kramer 54cd84861e Force a triple so we don't get bitten by windows' different regalloc.
llvm-svn: 182935
2013-05-30 15:39:35 +00:00
Benjamin Kramer dc93c8d50c Force fragile test to the atom scheduler model.
The pattern the test originally checked for doesn't occur on other -mcpu
settings. On atom it's still there though slightly differently scheduled.

llvm-svn: 182933
2013-05-30 15:22:28 +00:00
Tim Northover c0b42a257d X86: allow registers 8-15 in test
This test was failing on some hosts when an unexpected register was used for a
variable. This just extends the regexp to allow the new x86-64 registers.

llvm-svn: 182929
2013-05-30 13:56:32 +00:00
Tim Northover 64ec0ff433 X86: use sub-register sequences for MOV*r0 operations
Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions,
it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg")
and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is
smaller and partial register updates can sometimes be avoided.

Until recently, this sequence was a barrier to rematerialization though. That
should now be fixed so it's an appropriate time to make the change.

llvm-svn: 182928
2013-05-30 13:19:42 +00:00
Justin Holewinski 994d66a345 [NVPTX] Fix case where a sext load of an i1 type may produce an
ld.u1 instead of an ld.u8.

llvm-svn: 182924
2013-05-30 12:22:39 +00:00
Richard Sandiford 46af5a2cdc [SystemZ] Enable unaligned accesses
The code to distinguish between unaligned and aligned addresses was
already there, so this is mostly just a switch-on-and-test process.

llvm-svn: 182920
2013-05-30 09:45:42 +00:00
Rafael Espindola 4f60a38f18 Change how we iterate over relocations on ELF.
For COFF and MachO, sections semantically have relocations that apply to them.
That is not the case on ELF.

In relocatable objects (.o), a section with relocations in ELF has offsets to
another section where the relocations should be applied.

In dynamic objects and executables, relocations don't have an offset, they have
a virtual address. The section sh_info may or may not point to another section,
but that is not actually used for resolving the relocations.

This patch exposes that in the ObjectFile API. It has the following advantages:

* Most (all?) clients can handle this more efficiently. They will normally walk
all relocations, so doing an effort to iterate in a particular order doesn't
save time.

* llvm-readobj now prints relocations in the same way the native readelf does.

* probably most important, relocations that don't point to any section are now
visible. This is the case of relocations in the rela.dyn section. See the
updated relocation-executable.test for example.

llvm-svn: 182908
2013-05-30 03:05:14 +00:00
Bill Wendling 2aa007c59c This testcase tests command line attributes which we don't yet support.
In fact, we're probably going to support these flags in completely different
ways. So this test is no longer valid.

llvm-svn: 182899
2013-05-30 00:32:04 +00:00
Andrew Trick ad6d08ac6f Order CALLSEQ_START and CALLSEQ_END nodes.
Fixes PR16146: gdb.base__call-ar-st.exp fails after
pre-RA-sched=source fixes.

Patch by Xiaoyi Guo!

This also fixes an unsupported dbg.value test case. Codegen was
previously incorrect but the test was passing by luck.

llvm-svn: 182885
2013-05-29 22:03:55 +00:00
JF Bastien f60e0e44ca Enable FastISel on ARM for Linux and NaCl
FastISel was only enabled for iOS ARM and Thumb2, this patch enables it
for ARM (not Thumb2) on Linux and NaCl.

Thumb2 support needs a bit more work, mainly around register class
restrictions.

The patch punts to SelectionDAG when doing TLS relocation on non-Darwin
targets. I will fix this and other FastISel-to-SelectionDAG failures in
a separate patch.

The patch also forces FastISel to retain frame pointers: iOS always
keeps them for backtracking (so emitted code won't change because of
this), but Linux was getting much worse code that was incorrect when
using big frames (such as test-suite's lencod). I'll also fix this in a
later patch, it will probably require a peephole so that FastISel
doesn't rematerialize frame pointers back-to-back.

The test changes are straightforward, similar to:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130513/174279.html
They also add a vararg test that got dropped in that change.

I ran all of test-suite on A15 hardware with --optimize-option=-O0 and
all the tests pass.

llvm-svn: 182877
2013-05-29 20:38:10 +00:00
Tim Northover b65f6b0820 Teach ReMaterialization to be more cunning about subregisters
This allows rematerialization during register coalescing to handle
more cases involving operations like SUBREG_TO_REG which might need to
be rematerialized using sub-register indices.

For example, code like:
    v1(GPR64):sub_32 = MOVZ something
    v2(GPR64) = COPY v1(GPR64)
should be convertable to:
    v2(GPR64):sub_32 = MOVZ something

but previously we just gave up in places like this

llvm-svn: 182872
2013-05-29 19:32:06 +00:00
Richard Sandiford ba97c34bb6 [SystemZ] Two tests missing from previous commit
llvm-svn: 182847
2013-05-29 11:59:26 +00:00
Richard Sandiford e1d9f00f09 [SystemZ] Immediate compare-and-branch support
This patch adds support for the CIJ and CGIJ instructions.

llvm-svn: 182846
2013-05-29 11:58:52 +00:00
Venkatraman Govindaraju ca0fe2f57e [Sparc] Add support for leaf functions in sparc backend.
llvm-svn: 182822
2013-05-29 04:46:31 +00:00
Richard Sandiford 0fb90ab0cb [SystemZ] Register compare-and-branch support
This patch adds support for the CRJ and CGRJ instructions.  Support for
the immediate forms will be a separate patch.

The architecture has a large number of comparison instructions.  I think
it's generally better to concentrate on using the "best" comparison
instruction first and foremost, then only use something like CRJ if
CR really was the natual choice of comparison instruction.  The patch
therefore opportunistically converts separate CR and BRC instructions
into a single CRJ while emitting instructions in ISelLowering.

llvm-svn: 182764
2013-05-28 10:41:11 +00:00
Preston Gurd 048f99de11 Convert sqrt functions into sqrt instructions when -ffast-math is in effect.
When -ffast-math is in effect (on Linux, at least), clang defines
__FINITE_MATH_ONLY__ > 0 when including <math.h>. This causes the
preprocessor to include <bits/math-finite.h>, which renames the sqrt functions.
For instance, "sqrt" is renamed as "__sqrt_finite". 

This patch adds the 3 new names in such a way that they will be treated
as equivalent to their respective original names.

llvm-svn: 182739
2013-05-27 15:44:35 +00:00
Rafael Espindola cca5f562db Add a cpu to try to bring back the atom bots.
llvm-svn: 182734
2013-05-27 13:22:52 +00:00
Hal Finkel 7d8a691b5d Prefer to duplicate PPC Altivec loads when expanding unaligned loads
When expanding unaligned Altivec loads, we use the decremented offset trick to
prevent page faults. Unfortunately, if we have a sequence of consecutive
unaligned loads, this leads to suboptimal code generation because the 'extra'
load from the first unaligned load can be combined with the base load from the
second (but only if the decremented offset trick is not used for the first).
Search up and down the chain, through loads and token factors, looking for
consecutive loads, and if one is found, don't use the offset reduction trick.
These duplicate loads are later combined to yield the desired sequence (in the
future, we might want a more-powerful chain search, but that will require some
changes to allow the combiner routines to access the AA object).

This should complete the initial implementation of the optimized unaligned
Altivec load expansion. There is some refactoring that should be done, but
that will happen when the unaligned store expansion is added.

llvm-svn: 182719
2013-05-26 18:08:30 +00:00
Andrew Trick c66d26adf0 Fix PR16143: Insert DEBUG_VALUE before terminator.
llvm-svn: 182717
2013-05-26 08:58:50 +00:00
Hal Finkel bc2ee4c4e6 PPC: Combine duplicate (offset) lvsl Altivec intrinsics
The lvsl permutation control instruction is a function only of the alignment of
the pointer operand (relative to the 16-byte natural alignment of Altivec
vectors). As a result, multiple lvsl intrinsics where the operands differ by a
multiple of 16 can be combined.

llvm-svn: 182708
2013-05-25 04:05:05 +00:00
Andrew Trick 8972aba193 Track IR ordering of SelectionDAG nodes 4/4.
Unit test cases for -pre-RA-sched=source.

llvm-svn: 182706
2013-05-25 03:26:51 +00:00
Andrew Trick e2431c64bc Track IR ordering of SelectionDAG nodes 3/4.
Remove the old IR ordering mechanism and switch to new one.  Fix unit
test failures.

llvm-svn: 182704
2013-05-25 03:08:10 +00:00
Hal Finkel cf2e908014 PPC: Initial support for permutation-based unaligned Altivec loads
Altivec only directly supports aligned loads, but the loads have a strange
property: If given an unaligned address, they truncate the address to the next
lower aligned address, and load from there.  This property, along with an extra
load and some special-purpose permutation-control instructions that generate
the appropriate permutations from the original unaligned address, allow
efficient lowering of aligned loads. This code uses the trick explained in the
Apple Velocity Engine optimization overview document to prevent the needed
extra load from possibly causing a page fault if the original address happens
to be aligned.

As noted in the FIXMEs, there are several additional optimizations that can be
performed to reduce the cost of these loads even more. These will be
implemented in future commits.

llvm-svn: 182691
2013-05-24 23:00:14 +00:00
Diego Novillo c63995394d Add a new function attribute 'cold' to functions.
Other than recognizing the attribute, the patch does little else.
It changes the branch probability analyzer so that edges into
blocks postdominated by a cold function are given low weight.

Added analysis and code generation tests.  Added documentation for the
new attribute.

llvm-svn: 182638
2013-05-24 12:26:52 +00:00
Tim Northover bc93308489 ARM: implement @llvm.readcyclecounter intrinsic
This implements the @llvm.readcyclecounter intrinsic as the specific
MRC instruction specified in the ARM manuals for CPUs with the Power
Management extensions.

Older CPUs had slightly different methods which may also have to be
implemented eventually, but this should cover all v7 cases.

rdar://problem/13939186

llvm-svn: 182603
2013-05-23 19:11:20 +00:00
Tom Stellard 1b086cbcb8 R600: Fix R600ControlFlowFinalizer not considering VTX_READ 128 bit dst reg
Patch by: Vincent Lejeune

https://bugs.freedesktop.org/show_bug.cgi?id=64877

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 182600
2013-05-23 18:26:42 +00:00
Jakob Stoklund Olesen 43711c51ec Fix PR16110: Handle DBG_VALUE in ConnectedVNInfoEqClasses::Distribute().
Now that the LiveDebugVariables pass is running *after* register
coalescing, the ConnectedVNInfoEqClasses class needs to deal with
DBG_VALUE instructions.

This only comes up when rematerialization during coalescing causes the
remaining live range of a virtual register to separate into two
connected components.

llvm-svn: 182592
2013-05-23 17:02:23 +00:00
Nick Lewycky 7b431030ac Add missing test from r175092.
llvm-svn: 182564
2013-05-23 07:46:13 +00:00
Nadav Rotem 7b66c47051 X86: Fix a bug in EltsFromConsecutiveLoads. We can't generate new loads without chains.
llvm-svn: 182507
2013-05-22 19:28:41 +00:00
Benjamin Kramer d76cc186fc X86: When expanding PCMPGTQ to PCMPGTD we always want to compare the lower halves as unsigned.
Take #2 on fixing PR15977.

llvm-svn: 182486
2013-05-22 17:01:12 +00:00
David Majnemer 7ea2a52a0c X86: Remove test instructions proceeding shift by immediate instructions
Allow LLVM to take advantage of shift instructions that set the ZF flag,
making instructions that test the destination superfluous.

llvm-svn: 182454
2013-05-22 08:13:02 +00:00
Akira Hatanaka be76cd0b8e [mips] Rename option to make it compatible with gcc.
llvm-svn: 182397
2013-05-21 17:17:59 +00:00
Akira Hatanaka 6871031be9 [mips] Add instruction selection patterns for blez and bgez.
llvm-svn: 182396
2013-05-21 17:13:47 +00:00
Justin Holewinski 48f4ad3fc0 [NVPTX] Add @llvm.nvvm.sqrt.f() intrinsic
llvm-svn: 182394
2013-05-21 16:51:30 +00:00