This commit updates the PowerPC back-end (PPCInstrInfo.td and
PPCInstr64Bit.td) to use types instead of register classes in
instruction patterns, along the lines of Jakob Stoklund Olesen's
changes in r177835 for Sparc.
llvm-svn: 177890
This commit updates the PowerPC back-end (PPCInstrInfo.td and
PPCInstr64Bit.td) to use types instead of register classes in
Pat patterns, along the lines of Jakob Stoklund Olesen's
changes in r177829 for Sparc.
llvm-svn: 177889
sure the base register and would-be writeback register don't conflict for
stores. This was already being done for loads.
Unfortunately, it is rather difficult to create a test case for this issue. It
was exposed in 450.soplex at LTO and requires unlucky register allocation.
<rdar://13394908>
llvm-svn: 177874
to have them appear in the right order. Instead append all warnings explicitly
to the language flags. This was already the case for many warnings. Fixes the
issue of -Wno-maybe-uninitialized not being effective because -Wall was being
placed after it rather than before.
llvm-svn: 177866
This simplification happens at 2 places :
- using the nsw attribute when the shl / mul is used by a sign test
- when the shl / mul is compared for (in)equality to zero
llvm-svn: 177856
This syntax is now preferred:
def : Pat<(subc i32:$b, i32:$c), (SUBCCrr $b, $c)>;
There is no reason to repeat the types in the output pattern.
llvm-svn: 177844
DAG arguments can optionally be named:
(dag node, node:$name)
With this change, the node is also optional:
(dag node, node:$name, $name)
The missing node is treated as an UnsetInit, so the above is equivalent
to:
(dag node, node:$name, ?:$name)
This syntax is useful in output patterns where we currently require the
types of variables to be repeated:
def : Pat<(subc i32:$b, i32:$c), (SUBCCrr i32:$b, i32:$c)>;
This is preferable:
def : Pat<(subc i32:$b, i32:$c), (SUBCCrr $b, $c)>;
llvm-svn: 177843
This makes it possible to define instruction patterns like this:
def LDri : F3_2<3, 0b000000,
(outs IntRegs:$dst), (ins MEMri:$addr),
"ld [$addr], $dst",
[(set i32:$dst, (load ADDRri:$addr))]>;
~~~
llvm-svn: 177834
In order for the new ZERO register to be used with MC, etc. we need to specify
its register number (0).
Thanks to Kai for reporting the problem!
llvm-svn: 177833
In preparation for using the new register scavenger capability for providing
more than one register simultaneously, specifically note functions that have
spilled VRSAVE (currently, this can happen only in functions that use the
setjmp intrinsic). As with CR spilling, such functions will need to provide two
emergency spill slots to the scavenger.
No functionality change intended.
llvm-svn: 177832
I recently added a BCL instruction definition as part of implementing SjLj
support. This can also be used to MCize bcl emission in the asm printer.
No functionality change intended.
llvm-svn: 177830
The SelectionDAG graph has MVT type labels, not register classes, so
this makes it clearer what is happening.
This notation is also robust against adding more types to the IntRegs
register class.
llvm-svn: 177829
Just like register classes, value types can be used in two ways in
patterns:
(sext_inreg i32:$src, i16)
In a named leaf node like i32:$src, the value type simply provides the
type of the node directly. This simplifies type inference a lot compared
to the current practice of specifiying types indirectly with register
classes.
As an unnamed leaf node, like i16 above, the value type represents
itself as an MVT::Other immediate.
llvm-svn: 177828
These spilling functions will eventually make use of the register scavenger,
however, they'll do so by taking advantage of PEI's virtual-register-based
delayed scavenging mechanism. As a result, these function parameters will not
be used, and can be removed.
No functionality change intended.
llvm-svn: 177827
A register class can appear as a leaf TreePatternNode with and without a
name:
(COPY_TO_REGCLASS GPR:$src, F8RC)
In a named leaf node like GPR:$src, the register class provides type
information for the named variable represented by the node. The TypeSet
for such a node is the set of value types that the register class can
represent.
In an unnamed leaf node like F8RC above, the register class represents
itself as a kind of immediate. Such a node has the type MVT::i32,
we'll never create a virtual register representing it.
This change makes it possible to remove the special handling of
COPY_TO_REGCLASS in CodeGenDAGPatterns.cpp.
llvm-svn: 177825
The LR register is unconditionally reserved, and its spilling and restoration
is handled by the prologue/epilogue code. As a result, it is never explicitly
spilled by the register allocator.
No functionality change intended.
llvm-svn: 177823
Performing this check unilaterally prevented us from generating FMAs when the incoming IR contained illegal vector types which would eventually be legalized to underlying types that *did* support FMA.
For example, an @llvm.fmuladd on an OpenCL float16 should become a sequence of float4 FMAs, not float4 fmul+fadd's.
NOTE: Because we still call the target-specific profitability hook, individual targets can reinstate the old behavior, if desired, by simply performing the legality check inside their callback hook. They can also perform more sophisticated legality checks, if, for example, some illegal vector types can be productively implemented as FMAs, but not others.
llvm-svn: 177820
177774 broke the lld-x86_64-darwin11 builder; error:
error: comparison of integers of different signs: 'int' and 'size_type' (aka 'unsigned long')
for (SI = 0; SI < Scavenged.size(); ++SI)
~~ ^ ~~~~~~~~~~~~~~~~
Fix this by making SI also unsigned.
llvm-svn: 177780
The new wording cannot be construed as suggesting the use of
SmallVectorImpl<T> as e.g. a class member (just because the class
happens to be in an interface).
llvm-svn: 177778
This patch lets the register scavenger make use of multiple spill slots in
order to guarantee that it will be able to provide multiple registers
simultaneously.
To support this, the RS's API has changed slightly: setScavengingFrameIndex /
getScavengingFrameIndex have been replaced by addScavengingFrameIndex /
isScavengingFrameIndex / getScavengingFrameIndices.
In forthcoming commits, the PowerPC backend will use this capability in order
to implement the spilling of condition registers, and some special-purpose
registers, without relying on r0 being reserved. In some cases, spilling these
registers requires two GPRs: one for addressing and one to hold the value being
transferred.
llvm-svn: 177774
Add "evaluate-tbaa" to print alias queries of loads/stores. Alias queries
between pointers do not include TBAA tags.
Add testing case for "placement new". TBAA currently says NoAlias.
llvm-svn: 177772
We currently have a duplicated set of call instruction patterns depending
on the ABI to be followed (Darwin vs. Linux). This is a bit odd; while the
different ABIs will result in different instruction sequences, the actual
instructions themselves ought to be independent of the ABI. And in fact it
turns out that the only nontrivial difference between the two sets of
patterns is that in the PPC64 Linux ABI, the instruction used for indirect
calls is marked to take X11 as extra input register (which is indeed used
only with that ABI to hold an incoming environment pointer for nested
functions). However, this does not need to be hard-coded at the .td
pattern level; instead, the C++ code expanding calls can simply add that
use, just like it adds uses for argument registers anyway.
No change in generated code expected.
llvm-svn: 177735
Currently, the sub-operand of a memrr address that corresponds to what
hardware considers the base register is called "offreg", while the
sub-operand that corresponds to the offset is called "ptrreg".
To avoid confusion, this patch simply swaps the named of those two
sub-operands and updates all uses. No functional change is intended.
llvm-svn: 177734
PPCTargetLowering::getPreIndexedAddressParts currently provides
the base part of a memory address in the offset result, and the
offset part in the base result. That swap is then undone again
when an MI instruction is generated (in PPCDAGToDAGISel::Select
for loads, and using .md Pat patterns for stores).
This patch reverts this double swap, to make common code and
back-end be in sync as to which part of the address is base
and which is offset.
To avoid performance regressions in certain cases, target code
now checks whether the choice of base register would be rejected
for pre-inc accesses by common code, and attempts to swap base
and offset again in such cases. (Overall, this means that now
pre-ice accesses are generated *more* frequently than before.)
llvm-svn: 177733
The iaddroff ComplexPattern is supposed to recognize displacement
expressions that have been processed by a SelectAddressRegImm,
which means it needs to accept TargetConstant and TargetGlobalAddress
nodes. Currently, it erroneously also accepts some other nodes,
in particular Constant and PPCISD::Lo.
While this problem is currently latent, it would cause wrong-code
bugs with a follow-on patch I'm about to commit, so this patch
tightens the ComplexPattern. The equivalent change is made in
PPCDAGToDAGISel::Select, where pre-inc load patterns are handled
(as opposed to store patterns, the loads are handled in C++ code
without making use of the .td ComplexPattern).
llvm-svn: 177732
The xaddroff pattern is currently (mistakenly) used to recognize
the *base* register in pre-inc store patterns. This patch replaces
those uses by ptr_rc_nor0 (as is elsewhere done to match the base
register of an address), and removes the now unused ComplexPattern.
llvm-svn: 177731
Fixes wrong lighting in some corner cases with r600g and radeonsi, e.g.
manifested by failure of two piglit/glean tests and intermittent black
patches in many apps.
Tested on SI and RS880.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62012 [radeonsi]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58150 [r600g]
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 177730
Before: the function name was stored by the compiler as a constant string
and the run-time was printing it.
Now: the PC is stored instead and the run-time prints the full symbolized frame.
This adds a couple of instructions into every function with non-empty stack frame,
but also reduces the binary size because we store less strings (I saw 2% size reduction).
This change bumps the asan ABI version to v3.
llvm part.
Example of report (now):
==31711==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffa77cf1c5 at pc 0x41feb0 bp 0x7fffa77cefb0 sp 0x7fffa77cefa8
READ of size 1 at 0x7fffa77cf1c5 thread T0
#0 0x41feaf in Frame0(int, char*, char*, char*) stack-oob-frames.cc:20
#1 0x41f7ff in Frame1(int, char*, char*) stack-oob-frames.cc:24
#2 0x41f477 in Frame2(int, char*) stack-oob-frames.cc:28
#3 0x41f194 in Frame3(int) stack-oob-frames.cc:32
#4 0x41eee0 in main stack-oob-frames.cc:38
#5 0x7f0c5566f76c (/lib/x86_64-linux-gnu/libc.so.6+0x2176c)
#6 0x41eb1c (/usr/local/google/kcc/llvm_cmake/a.out+0x41eb1c)
Address 0x7fffa77cf1c5 is located in stack of thread T0 at offset 293 in frame
#0 0x41f87f in Frame0(int, char*, char*, char*) stack-oob-frames.cc:12 <<<<<<<<<<<<<< this is new
This frame has 6 object(s):
[32, 36) 'frame.addr'
[96, 104) 'a.addr'
[160, 168) 'b.addr'
[224, 232) 'c.addr'
[288, 292) 's'
[352, 360) 'd'
llvm-svn: 177724
The original code used i32, and i64 if legal. This introduced unneeded
casts when they aren't legal, or when the index variable i has another
type. In order of preference: try to use i's type; use the smallest
fitting legal type (using an added DataLayout method); default to i32.
A testcase checks that this works when the index gep operand is i16.
Patch by : Ahmed Bougacha <ahmed.bougacha@gmail.com>
Reviewed by : Duncan
llvm-svn: 177712
the ARM build bots, and it adds a weird case to the test suite where
a test uses as inputs files in the parent directory.
Talked about this with Dave on IRC and he's fine with this approach even
though it isn't optimal.
llvm-svn: 177700
-time-ir-parsing flag
This breaks the layering of the Support library. We can't add an
implementation side to IRReader because it refers directly to entities
only accessible as part of the IR, AsmParser, and BitcodeReader
libraries. It can only be used in a context where all of those libraries
will be available.
We'll need to find some other way to get this functionality, and
hopefully solve the long-standing layering problem of IRReader.h...
llvm-svn: 177695
For mips a branch an 18-bit signed offset (the 16-bit
offset field shifted left 2 bits) is added to the
address of the instruction following the branch
(not the branch itself), in the branch delay slot,
to form a PC-relative effective target address.
Previously, the code generator did not perform the
shift of the immediate branch offset which resulted
in wrong instruction opcode. This patch fixes the issue.
Contributor: Vladimir Medic
llvm-svn: 177687
This patch uses the generated instruction info tables to
identify memory/load store instructions.
After successful matching and based on the operand type
and size, it generates additional instructions to the output.
Contributor: Vladimir Medic
llvm-svn: 177685
As Jakob pointed out in his review of r177423, having a shared ZERO
register between the 32- and 64-bit register classes causes this
odd G8RC_NOX0_and_GPRC_NOR0 class to be created. As recommended,
this adds a ZERO8 register which differentiates the 32- and 64-bit
zeros.
No functionality change intended.
llvm-svn: 177683
To use this in conjunction with exuberant ctags to generate a single
combined tags file, run tblgen first and then
$ ctags --append [...]
Since some identifiers have corresponding definitions in C++ code,
it can be useful (if using vim) to also use cscope, and
:set cscopetagorder=1
so that
:tag X
will preferentially select the tablegen symbol, while
:cscope find g X
will always find the C++ symbol.
Patch by Kevin Schoedel!
(a couple small formatting changes courtesy of clang-format)
llvm-svn: 177682
How did this ever work?
Basically, if you have a function that's inlined into the caller, it may not
have any 'call' instructions, but any 'resume' instructions it may have should
still be forwarded to the outer (caller's) landing pad. This requires that all
of the 'landingpad' instructions in the callee have their clauses merged with
the caller's outer 'landingpad' instruction (hence the bit of ugly code in the
`forwardResume' method).
Testcase in a follow commit to the test-suite repository.
<rdar://problem/13360379> & PR15555
llvm-svn: 177680
Thanks to Jakob for isolating the underlying problem from the
test case in r177423. The original commit had introduced
asymmetric copy operations, but these turned out to be a work-around
to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops).
llvm-svn: 177679
The DARWIN_USER_TEMP_DIR and DARWIN_USER_CACHE_DIR configuration
settings are more idiomatic for Darwin than the TMPDIR environment
variable.
llvm-svn: 177669
The .set directive in the Mips the assembler can be
used to set the value of a symbol to an expression.
This changes the symbol's value and type to conform
to the expression's.
Syntax: .set symbol, expression
This patch implements the parsing of the above syntax
and enables the parser to use defined symbols when
parsing operands.
Contributor: Vladimir Medic
llvm-svn: 177667
This implements SJLJ lowering on PPC, making the Clang functions
__builtin_{setjmp/longjmp} functional on PPC platforms. The implementation
strategy is similar to that on X86, with the exception that a branch-and-link
variant is used to get the right jump address. Credit goes to Bill Schmidt for
suggesting the use of the unconditional bcl form (instead of the regular bl
instruction) to limit return-address-cache pollution.
Benchmarking the speed at -O3 of:
static jmp_buf env_sigill;
void foo() {
__builtin_longjmp(env_sigill,1);
}
main() {
...
for (int i = 0; i < c; ++i) {
if (__builtin_setjmp(env_sigill)) {
goto done;
} else {
foo();
}
done:;
}
...
}
vs. the same code using the libc setjmp/longjmp functions on a P7 shows that
this builtin implementation is ~4x faster with Altivec enabled and ~7.25x
faster with Altivec disabled. This comparison is somewhat unfair because the
libc version must also save/restore the VSX registers which we don't yet
support.
llvm-svn: 177666
Although there is only one Altivec VRSAVE register, it is a member of
a register class, and we need the ability to spill it. Because this
register is normally callee-preserved and handled by special code this
has never before been necessary. However, this capability will be required by
a forthcoming commit adding SjLj support.
llvm-svn: 177654
The old code used to lower FRAMEADDR tried to replicate the logic in the real
frame-lowering code that determines whether or not the frame pointer (r31) will
be used. When it seemed as through the frame pointer would not be used, the
stack pointer (r1) was used instead. Unfortunately, because the stack size is
not yet known, this does not work. Instead, this change introduces new
always-reserved pseudo-registers (FP and FP8) that are replaced during prologue
insertion with the real frame-pointer register (either r1 or r31).
It is important that this intrinsic always return a valid frame address because
it is used by Clang to store the frame address as part of code generation for
__builtin_setjmp.
llvm-svn: 177653
NEON is not IEEE 754 compliant, so we should avoid lowering single-precision
floating point operations with NEON unless unsafe-math is turned on. The
equivalent VFP instructions are IEEE 754 compliant, but in some cores they're
much slower, so some archs/OSs might still request it to be on by default,
such as Swift and Darwin.
llvm-svn: 177651
header.
This method is called in the hot path for *many* passes, SROA is what
caught my interest. A common pattern is that which branch of the switch
should be taken is known in the callsite and so it is a very good
candidate for inlining and simplification. Moving it into the header
allows the optimizer to fold a lot of boring, repeatitive code in
callers of this routine.
I'm seeing pretty significant speedups in parts of SROA and I suspect
other passes will see similar speedups if they end up working with type
sizes frequently. I've not seen any significant growth of the binaries
as a consequence, but let me know if you see anything suspicious here.
llvm-svn: 177632
The key part of this is ensuring that name prefixes remain in a Twine
form until we get to a point where we can nuke them under NDEBUG. This
is tricky using the old APIs as they played fast and loose with Twine,
which is prone to serious error. The inserter is much cleaner as it is
actually in the call stack leading to the setName call, and so has
a good opportunity to prepend the prefix.
This matters more than you might imagine because most runs over an
alloca find a single partition, and rewrite 3 or 4 instructions
referring to it. As a consequence doing this lazily and exclusively with
Twine allows the optimizer to delete more of it and shaves another 2% to
3% off of the release build's SROA run time for PR15412. I also think
the APIs are cleaner, and the use of Twine is more reliable, so
I consider it a win-win despite the churn required to reach this state.
llvm-svn: 177631
The simplify-libcalls pass implemented a doInitialization hook to infer
function prototype attributes for well-known functions. Given that the
simplify-libcalls pass is going away *and* that the functionattrs pass
is already in place to deduce function attributes, I am moving this logic
to the functionattrs pass. This approach was discussed during patch
review:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121126/157465.html.
llvm-svn: 177619
Native Windows Python will do line ending translation by default, which
we don't want in bash scripts. If we're not native Windows Python, then
'b' is ignored.
llvm-svn: 177602
- After moving logic recognizing vector shift with scalar amount from
DAG combining into DAG lowering, we declare to customize all vector
shifts even vector shift on AVX is legal. As a result, the cost model
needs special tuning to identify these legal cases.
llvm-svn: 177586
Use the new `llvm_gcov_init' function to register the writeout and flush
functions. The initialization function will also call `atexit' for some cleanups
and final writout calls. But it does this only once. This is better than
checking for the `main' function, because in a library that function may not
exist.
<rdar://problem/12439551>
llvm-svn: 177579
This makes it possible to report multiple errors in one invocation.
There are already calls to PrintError in CodeGenDAGPatterns.cpp which
previously would not cause TableGen to fail.
<rdar://problem/13463339>
llvm-svn: 177573
This reverts commit 06091513c283c863296f01cc7c2e86b56bb50d02.
The code is obviously wrong, but the trivial fix causes
inefficient code generation on X86. Somebody with more
knowledge of the code needs to take a look here.
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 177529
This is espcially important because the new SROA pass goes to great
lengths to provide helpful names for debugging, and as a consequence
they can become very slow to render.
Good for between 5% and 15% of the SROA runtime on some slow test cases
such as the one in PR15412.
llvm-svn: 177495
Moving the DIFile parameter to immediately proceed the tag so that it will be a
common prefix with other DIScopes (once the DIFile is replaced with the raw
file/directory pair).
llvm-svn: 177492
This is the backend portion of a Clang test case
(clang/test/CodeGenCXX/debug-info-namespace.cpp) that was roughly/coarsely
testing LLVM.
llvm-svn: 177487
This makes DIType's first non-tag parameter the same as DIFile's, allowing them
to both share the common implementation of getFilename/getDirectory in DIScope.
llvm-svn: 177467
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.
llvm-svn: 177465
This is another step along the way to making all DIScopes have a common prefix
which can be added to in a general manner to support using directives
(DW_TAG_imported_module).
llvm-svn: 177462
added back in by X86AsmPrinter::printIntelMemReference() during codegen.
Previously, this following example
void t() {
int i;
__asm mov eax, [i]
}
would generate the below assembly
mov eax, dword ptr [[eax]]
which resulted in a fatal error when compiling. Test case coming on the
clang side.
rdar://13444264
llvm-svn: 177440
an X86Operand, but also performs a Sema lookup and adds the sizing directive
when appropriate. Use this when parsing a bracketed statement. This is
necessary to get the instruction matching correct as well. Test case coming
on clang side.
rdar://13455408
llvm-svn: 177439
We don't want to write out >1000 files at the same time. That could make things
prohibitively expensive. Instead, register the "writeout" function so that it's
emitted serially.
<rdar://problem/12439551>
llvm-svn: 177437
- it is trivially known to be used inside the loop in a way that can not be optimized away
- there is no use outside of the loop which can take advantage of the computation hoisting
llvm-svn: 177432
All pre-increment load patterns need to set the mayLoad flag (since
they don't provide a DAG pattern).
This was missing for LHAUX8 and LWAUX, which is added by this patch.
llvm-svn: 177431
As opposed to to pre-increment store patterns, the pre-increment
load patterns were already using standard memory operands, with
the sole exception of LHAU8.
As there's no real reason why LHAU8 should be different here,
this patch simply rewrites the pattern to also use a memri
operand, just like all the other patterns.
llvm-svn: 177430
Currently, pre-increment store patterns are written to use two separate
operands to represent address base and displacement:
stwu $rS, $ptroff($ptrreg)
This causes problems when implementing the assembler parser, so this
commit changes the patterns to use standard (complex) memory operands
like in all other memory access instruction patterns:
stwu $rS, $dst
To still match those instructions against the appropriate pre_store
SelectionDAG nodes, the patch uses the new feature that allows a Pat
to match multiple DAG operands against a single (complex) instruction
operand.
Approved by Hal Finkel.
llvm-svn: 177429
of complex instruction operands (e.g. address modes).
Currently, if a Pat pattern creates an instruction that has a complex
operand (i.e. one that consists of multiple sub-operands at the MI
level), this operand must match a ComplexPattern DAG pattern with the
correct number of output operands.
This commit extends TableGen to alternatively allow match a complex
operands against multiple separate operands at the DAG level.
This allows using Pat patterns to match pre-increment nodes like
pre_store (which must have separate operands at the DAG level) onto
an instruction pattern that uses a multi-operand memory operand,
like the following example on PowerPC (will be committed as a
follow-on patch):
def STWU : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst),
"stwu $rS, $dst", LdStStoreUpd, []>,
RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">;
def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff),
(STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>;
Here, the pair of "ptroff" and "ptrreg" operands is matched onto the
complex operand "dst" of class "memri" in the "STWU" instruction.
Approved by Jakob Stoklund Olesen.
llvm-svn: 177428
The tocentry operand class refers to 64-bit values (it is only used in 64-bit,
where iPTR is a 64-bit type), but its sole suboperand is designated as 32-bit
type. This causes a mismatch to be detected at compile-time with the TableGen
patch I'll check in shortly.
To fix this, this commit changes the suboperand to a 64-bit type as well.
llvm-svn: 177427
def : Pat<(load (i64 (X86Wrapper tglobaltlsaddr :$dst))),
(MOV64rm tglobaltlsaddr :$dst)>;
This pattern is invalid because the MOV64rm instruction expects a
source operand of type "i64mem", which is a subclass of X86MemOperand
and thus actually consists of five MI operands, but the Pat provides
only a single MI operand ("tglobaltlsaddr" matches an SDnode of
type ISD::TargetGlobalTLSAddress and provides a single output).
Thus, if the pattern were ever matched, subsequent uses of the MOV64rm
instruction pattern would access uninitialized memory. In addition,
with the TableGen patch I'm about to check in, this would actually be
reported as a build-time error.
Fortunately, the pattern does in fact never match, for at least two
independent reasons.
First, the code generator actually never generates a pattern of the
form (load (X86Wrapper (tglobaltlsaddr))). For most combinations of
TLS and code models, (tglobaltlsaddr) represents just an offset that
needs to be added to some base register, so it is never directly
dereferenced. The only exception is the initial-exec model, where
(tglobaltlsaddr) refers to the (pc-relative) address of a GOT slot,
which *is* in fact directly dereferenced: but in that case, the
X86WrapperRIP node is used, not X86Wrapper, so the Pat doesn't match.
Second, even if some patterns along those lines *were* ever generated,
we should not need an extra Pat pattern to match it. Instead, the
original MOV64rm instruction pattern ought to match directly, since
it uses an "addr" operand, which is implemented via the SelectAddr
C++ routine; this routine is supposed to accept the full range of
input DAGs that may be implemented by a single mov instruction,
including those cases involving ISD::TargetGlobalTLSAddress (and
actually does so e.g. in the initial-exec case as above).
To avoid build breaks (due to the above-mentioned error) after the
TableGen patch is checked in, I'm removing this Pat here.
llvm-svn: 177426
Currently the PPC r0 register is unconditionally reserved. There are two reasons
for this:
1. r0 is treated specially (as the constant 0) by certain instructions, and so
cannot be used with those instructions as a regular register.
2. r0 is used as a temporary register in the CR-register spilling process
(where, under some circumstances, we require two GPRs).
This change addresses the first reason by introducing a restricted register
class (without r0) for use by those instructions that treat r0 specially. These
register classes have a new pseudo-register, ZERO, which represents the r0-as-0
use. This has the side benefit of making the existing target code simpler (and
easier to understand), and will make it clear to the register allocator that
uses of r0 as 0 don't conflict will real uses of the r0 register.
Once the CR spilling code is improved, we'll be able to allocate r0.
Adding these extra register classes, for some reason unclear to me, causes
requests to the target to copy 32-bit registers to 64-bit registers. The
resulting code seems correct (and causes no test-suite failures), and the new
test case covers this new kind of asymmetric copy.
As r0 is still reserved, no functionality change intended.
llvm-svn: 177423
Remove an accidentally-added instruction definition and add a comment in the
test case. This is in response to a post-commit review by Bill Schmidt.
No functionality change intended.
llvm-svn: 177404
The ARM backend currently has poor codegen for long sext/zext
operations, such as v8i8 -> v8i32. This patch addresses this
by performing a custom expansion in ARMISelLowering. It also
adds/changes the cost of such lowering in ARMTTI.
This partially addresses PR14867.
Patch by Pete Couperus
llvm-svn: 177380
For each compile unit, we want to register a function that will flush that
compile unit. Otherwise, __gcov_flush() would only flush the counters within the
current compile unit, and not any outside of it.
PR15191 & <rdar://problem/13167507>
llvm-svn: 177340
PPC64 supports unaligned loads and stores of 64-bit values, but
in order to use the r+i forms, the offset must be a multiple of 4.
Unfortunately, this cannot always be determined by examining the
immediate itself because it might be available only via a TOC entry.
In order to get around this issue, we additionally predicate the
selection of the r+i form on the alignment of the load or store
(forcing it to be at least 4 in order to select the r+i form).
llvm-svn: 177338