and mips16 on a per function basis.
Because this patch is somewhat involved, I have provided an overview of its
key pieces.
The patch is written so as not to change the behavior of the non-mixed
mode. We have tested this extensively, but switching subtargets is something
new, so we don't want any chance of regression in the mainline compiler until
we have more confidence in this.
Mips32/64 are very different from Mips16, as is the case with ARM vs. Thumb1.
For that reason there are derived versions of the register info, frame info,
instruction info and instruction selection classes.
We now register three separate passes for instruction selection:
one that switches subtargets (MipsModuleISelDAGToDAG.cpp) and
one for each of the current subtargets (Mips16ISelDAGToDAG.cpp and
MipsSEISelDAGToDAG.cpp).
When the ModuleISel pass runs, it determines whether there is a need to switch
subtargets and, if so, changes the owning pointers in MipsTargetMachine
accordingly.
When the Mips16 ISel or SE ISel pass runs, it returns immediately without
doing any work if the current subtarget mode does not apply to it.
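For illustration, a minimal sketch of that early-exit check (class and
accessor names such as Mips16DAGToDAGISel and inMips16Mode() are assumptions
based on the file names above, not necessarily the committed code):
bool Mips16DAGToDAGISel::runOnMachineFunction(MachineFunction &MF) {
  if (!Subtarget.inMips16Mode())   // this function is not in MIPS16 mode
    return false;                  // do nothing; the other ISel pass handles it
  return MipsDAGToDAGISel::runOnMachineFunction(MF);
}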
In addition, MipsAsmPrinter needs to be reset on a per-function basis.
The BasicTargetTransformInfo pass is substituted with a null pass, since that
pass is immutable and really needs to be a function pass to be usable
with changing subtargets. This will be fixed in a follow-on patch.
llvm-svn: 179118
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:
1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).
2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.
3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.
This patch also includes a simple (100 LOC) bottom-up SLP vectorizer that uses the infrastructure, and can vectorize this code:
void SAXPY(int *x, int *y, int a, int i) {
  x[i] = a * x[i] + y[i];
  x[i+1] = a * x[i+1] + y[i+1];
  x[i+2] = a * x[i+2] + y[i+2];
  x[i+3] = a * x[i+3] + y[i+3];
}
llvm-svn: 179117
parse an identifier. Otherwise, parseExpression may parse multiple tokens,
which makes it impossible to properly compute an immediate displacement.
An example of such a case is the source operand (i.e., [Symbol + ImmDisp]) in
the below example:
__asm mov eax, [Symbol + ImmDisp]
The existing test cases exercise this patch.
rdar://13611297
llvm-svn: 179115
therefore not at all) of the pc or statement list. We also don't
need to emit the compilation dir, so save space and time
and don't bother.
Fix up the testcase accordingly and verify that we don't emit
the attributes or the items that they use.
llvm-svn: 179114
Some general cleanup, and only scan the end of a BB for branches (once we're
done with the terminators and debug values, there should not be any other
branches). These address post-commit review suggestions by Bill Schmidt.
No functionality change intended.
llvm-svn: 179112
rather than deriving the StringRef from the Start and End SMLocs.
Using the Start and End SMLocs works fine for operands such as [Symbol], but
not for operands such as [Symbol + ImmDisp]. All existing test cases that
reference a variable exercise this patch.
rdar://13602265
llvm-svn: 179109
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.
The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
        ldr.w r9, [sp]
        vmov d17, r3, r9
        vmov d16, r1, r2
        vst1.8 {d16, d17}, [r0]
        bx lr
Differential Revision: http://llvm-reviews.chandlerc.com/D647
llvm-svn: 179106
On PowerPC, non-vector loads and stores have r+i forms; however, in functions
with large stack frames these were not being used to access slots far from the
stack pointer because such slots were out of range for the signed 16-bit
immediate offset field. This increases register pressure because we need a
separate register for each offset (when the r+r form is used). By enabling
virtual base registers, we can deal with large stack frames without unduly
increasing register pressure.
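As a rough illustration (my example, not from the commit), a function like
the following has stack slots beyond the reach of a signed 16-bit offset:
// Illustrative only: a frame larger than 32 KiB puts some slots beyond the
// signed 16-bit displacement of the r+i (D-form) loads and stores.
void large_frame() {
  volatile char buf[40000];
  buf[0] = 1;      // near slot: reachable with an immediate offset
  buf[39999] = 2;  // far slot: out of range for a 16-bit signed offset
}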
llvm-svn: 179105
Some translations here are not 1:1 because there are grep|grep
chains that are non-trivial to implement in terms of FileCheck features. I
made an effort for the tests to remain as similar as possible; do let me know
if you notice anything fishy. The good news is that some buggy tests were
fixed (grep | not grep - a bug waiting to happen).
llvm-svn: 179102
The save area is twice as big and there is no struct return slot. The
stack pointer is always 16-byte aligned (after adding the bias).
Also eliminate the stack adjustment instructions around calls when the
function has a reserved stack frame.
llvm-svn: 179083
Some parts of PointerIntPair assumed that the IntType of the pair was implicitly
convertible to intptr_t, which is not the case for enum class values. Add a
static_cast<intptr_t> to make these conversions explicit and allow
PointerIntPair to be used with an enum class IntType. While we're here, rename
some of the argument values so we don't have variables named "Int" floating
around.
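A standalone illustration (not the LLVM header itself) of why the cast must
be explicit for scoped enums:
#include <cstdint>

enum class Kind : std::intptr_t { A = 0, B = 1 };

std::intptr_t encode(Kind K) {
  // return K;  // error: scoped enums do not convert implicitly to an integer
  return static_cast<std::intptr_t>(K);  // explicit cast required
}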
llvm-svn: 179073
I brazenly think this change is slightly simpler than r178793 because:
- no "state" in functor
- "OpndPtrs[i]" looks simpler than "&Opnds[OpndIndices[i]]"
While I can reproduce the problem under Valgrind, it is rather difficult to come
up with a standalone test case. The reason is that when an iterator is invalidated,
the stale invalidated elements are not yet clobbered by nonsense data, so the
optimizer can still proceed successfully.
Thanks to Benjamin for fixing this bug and generously providing the test case.
llvm-svn: 179062
The costs are overfitted so that I can still use the legalization factor.
For example, the following kernel has about half the throughput when vectorized
compared to unvectorized when compiled with SSE2. Before this patch we would vectorize it.
unsigned short A[1024];
double B[1024];
void f() {
  int i;
  for (i = 0; i < 1024; ++i) {
    B[i] = (double) A[i];
  }
}
radar://13599001
llvm-svn: 179033
PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.
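For illustration (my example, not from the commit), a small, frameless
function of this shape is the kind of candidate:
// Illustrative only: the conditional branch to the return can become a
// single conditional return (BCLR).
int clamp_negative(int x) {
  if (x < 0)
    return 0;
  return x;
}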
llvm-svn: 179026
I've managed to convince myself that AArch64's acquire/release
instructions are sufficient to guarantee C++11's required semantics,
even in the sequentially-consistent case.
llvm-svn: 179005
I couldn't touch this file and not clean it up some. These reformattings
brought to you by clang-format, with some minor adjustments by me. More
spring cleaning to follow here.
llvm-svn: 179004
internal linkage and so wasn't a patent bug, it doesn't make any sense
here. We can avoid even calling operator<< by just embedding the newline
in the string literals that were already being streamed out. It also
gives the impression of some line-ending agnosticism that is not
present, and that flushing happens when it doesn't.
If we want to use std::endl, we could do that, but honestly it doesn't
seem remotely worth it. Using '\n' directly is much more clear when
working with raw_ostream.
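A minimal sketch of the preferred pattern (the message text is made up):
#include "llvm/Support/raw_ostream.h"

void reportError() {
  llvm::errs() << "something went wrong\n";  // newline embedded in the literal
}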
It also happens to fix builds with old crufty GCC STL implementations
that pull std::endl into the global namespace (or headers written to
be compatible with such atrocities).
llvm-svn: 179003
First, we should not cheat: fsel-based lowering of select_cc is a
finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes
this clear, as does a note in our own README).
This also adds fsel-based lowering of EQ and NE condition codes. As it turned
out, fsel generation was covered by a grand total of zero regression test
cases. I've added some test cases to cover the existing behavior (which is now
finite-math only), as well as the new EQ cases.
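For illustration (my example, not one of the new test cases), the kind of
select_cc involved and why the lowering needs finite math:
// Illustrative only: a floating select of this shape can be lowered with fsel.
// If lowered as fsel(a - b, x, y), infinities break it (inf - inf is NaN) and
// NaN operands change the comparison result, hence finite-math-only.
double sel(double a, double b, double x, double y) {
  return a >= b ? x : y;
}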
llvm-svn: 179000
The code in getTypeConversion attempts to promote the vector element type
before it tries to split or widen the vector.
If it fails to find a legal vector type by promoting, it would continue using
the promoted vector element type, thereby missing legal split vector types.
For example, the type v32i32, which has a legal split of 8 x v4i32 on x86/sse2,
would be transformed to: v32i256 and from there on successively split to:
v16i256, v8i256, v1i256 and then finally ends up as an i64 type.
By resetting the vector element type to the original vector element type that
existed before the promotion, the code will attempt to split the vector type into
smaller vector widths of the same type.
llvm-svn: 178999
LoadCommandInfo was needed to keep a command and its offset in the file. Now
that we always have a pointer to the command, we don't need the offset.
llvm-svn: 178991
The fix for PR14972 in r177055 introduced a real think-o in the *store*
side, likely because I was much more focused on the load side. While we
can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily
widen a value to be stored, as that changes the width of memory access!
Lock down the code path in the store rewriting which would do this to
only handle the intended circumstance.
All of the existing tests continue to pass, and I've added a test from
the PR.
llvm-svn: 178974
a relocation across sections. Do this for DW_AT_stmt_list in the
skeleton CU and check the relocations in the debug_info section.
Add a FIXME for multiple CUs.
llvm-svn: 178969
Integer return values are sign or zero extended by the callee, and
structs up to 32 bytes in size can be returned in registers.
The CC_Sparc64 CallingConv definition is shared between
LowerFormalArguments_64 and LowerReturn_64. Function arguments and
return values are passed in the same registers.
The inreg flag is also used for return values. This is required to handle
C functions returning structs containing floats and ints:
struct ifp {
  int i;
  float f;
};
struct ifp f(void);
LLVM IR:
define inreg { i32, float } @f() {
  ...
  ret { i32, float } %retval
}
The ABI requires that %retval.i is returned in the high bits of %i0
while %retval.f goes in %f1.
Without the inreg return value attribute, %retval.i would go in %i0 and
%retval.f would go in %f3, which is a more efficient way of returning
multiple values, but it is not ABI compliant for returning C structs.
llvm-svn: 178966
64-bit SPARC v9 processes use biased stack and frame pointers, so the
current function's stack frame is located at %sp+BIAS .. %fp+BIAS where
BIAS = 2047.
This makes more local variables directly accessible via [%fp+simm13]
addressing.
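A small worked example of the bias arithmetic (the offset of 16 is made up):
#include <cstdint>

// Worked example only (not compiler code): with BIAS = 2047, a slot at logical
// offset +16 from the frame pointer is addressed as [%fp + 2047 + 16], i.e.
// [%fp + 2063], which fits the signed 13-bit immediate range of -4096..4095.
constexpr std::int64_t BIAS = 2047;
constexpr std::int64_t biased(std::int64_t logicalOffset) { return BIAS + logicalOffset; }
static_assert(biased(16) == 2063, "fits in simm13");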
llvm-svn: 178965
There are certain PPC instructions into which we can fold a zero immediate
operand. We can detect such cases by looking at the register class required
by the using operand (so long as it is not otherwise constrained).
llvm-svn: 178961
This comment documents the current behavior of the ARM implementation of this
callback, and also the soon-to-be-committed PPC version.
llvm-svn: 178959
All arguments are formally assigned to stack positions and then promoted
to floating point and integer registers. Since there are more floating
point registers than integer registers, this can cause situations where
floating point arguments are assigned to registers after integer
arguments that were assigned to the stack.
Use the inreg flag to indicate 32-bit fragments of structs containing
both float and int members.
The three-way shadowing between stack, integer, and floating point
registers requires custom argument lowering. The good news is that
return values are passed in the exact same way, and we can share the
code.
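As a rough illustration of the shadowing (my example; the slot-to-register
mapping follows the V9 ABI as I understand it, not the commit itself):
// The first six 8-byte argument slots map to %o0-%o5 for integers, while a
// float argument in a later slot can still be promoted to an FP register.
extern "C" void callee(long a, long b, long c, long d, long e, long f, // %o0-%o5
                       long g,    // slot 6: passed on the stack
                       double x); // slot 7: still promoted to an FP register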
Still missing:
- Update LowerReturn to handle structs returned in registers.
- LowerCall.
- Variadic functions.
llvm-svn: 178958
The code emitter knows how to encode operands whose name matches one of
the encoding fields. If there is no match, the code emitter relies on
the order of the operand and field definitions to determine how operands
should be encoded. Matching by order makes it easy to accidentally break
the instruction encodings, so we prefer to match by name.
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178930
SITargetLowering::analyzeImmediate() was converting the 64-bit values
to 32-bit and then checking if they were an inline immediate. Some
of these conversions caused this check to succeed and produced
S_MOV instructions with 64-bit immediates, which are illegal.
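An illustrative case (my numbers, not from the commit): truncating a 64-bit
immediate to 32 bits can make a non-encodable value look like a small inline
immediate.
#include <cassert>
#include <cstdint>

int main() {
  std::uint64_t Imm = 0x100000001ULL;  // not encodable as an inline immediate
  std::uint32_t Lo = static_cast<std::uint32_t>(Imm);
  assert(Lo == 1);  // the truncated view *looks* like a small inline immediate
  return 0;
}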
v2:
- Clean up logic
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178927
On cores for which we know the misprediction penalty, and which have
the isel instruction, we can profitably perform early if-conversion.
This enables us to replace some small branch sequences with selects
and avoid the potential stalls from mispredicting the branches.
Enabling this feature required implementing canInsertSelect and
insertSelect in PPCInstrInfo; isel code in PPCISelLowering was
refactored to use these functions as well.
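For illustration (my example), the shape of branch sequence that can become a
select:
// Illustrative only: a small diamond that early if-conversion can turn into an
// isel (integer select) instead of a conditional branch.
int pick(int c, int a, int b) {
  return c ? a : b;
}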
llvm-svn: 178926
The manual states that there is a minimum of 13 cycles from when the
mispredicted branch is issued to when the correct branch target is
issued.
llvm-svn: 178925
The normal dataflow sequence in the ARC optimizer consists of the following
states:
Retain -> CanRelease -> Use -> Release
Before this patch, the optimizer stored the uses that determine the lifetime of
the retainable object pointer when it hit a retain (bottom up) or a release
(top down). This is correct for the imprecise lifetime scenario, since what
we are trying to do is remove retains/releases while making sure that no
``CanRelease'' (which is usually a call) deallocates the given pointer before we
get to the ``Use'' (since that would cause a segfault).
If we are considering the precise lifetime scenario though, this is not
correct. In such a situation, we *DO* care about the previous sequence, but
additionally, we wish to track the uses resulting from the following incomplete
sequences:
Retain -> CanRelease -> Release (TopDown)
Retain <- Use <- Release (BottomUp)
*NOTE* This patch looks large, but most of it consists of updating
test cases. Additionally, this fix exposed another bug. I removed
the test case that expressed said bug and will recommit it with the fix
in a little bit.
llvm-svn: 178921
This fixes PEI as previously described, but correctly handles the case where
the instruction defining the virtual register to be scavenged is the first in
the block. Arnold provided me with a bugpoint-reduced test case, but even that
seems too large to use as a regression test. If I'm successful in cleaning it
up then I'll commit that as well.
Original commit message:
This change fixes a bug that I introduced in r178058. After a register is
scavenged using one of the available spill slots, the instruction defining the
virtual register needs to be moved to after the spill code. The scavenger has
already processed the defining instruction so that registers killed by that
instruction are available for definition in that same instruction. Unfortunately,
after this, the scavenger needs to iterate through the spill code and then
visit, again, the instruction that defines the now-scavenged register. In order
to avoid confusion, the register scavenger needs the ability to 'back up'
through the spill code so that it can again process the instructions in the
appropriate order. Prior to this fix, once the scavenger reached the
just-moved instruction, it would assert if it killed any registers because,
having already processed the instruction, it believed they were undefined.
Unfortunately, I don't yet have a small test case. Thanks to Pranav Bhandarkar
for diagnosing the problem and testing this fix.
llvm-svn: 178919
During LTO, the target options on functions within the same Module may
change. This would necessitate resetting some of the back-end. Do this for X86,
because it's a Friday afternoon.
llvm-svn: 178917
Reverting because this breaks one of the LTO builders. Original commit message:
This change fixes a bug that I introduced in r178058. After a register is
scavenged using one of the available spill slots, the instruction defining the
virtual register needs to be moved to after the spill code. The scavenger has
already processed the defining instruction so that registers killed by that
instruction are available for definition in that same instruction. Unfortunately,
after this, the scavenger needs to iterate through the spill code and then
visit, again, the instruction that defines the now-scavenged register. In order
to avoid confusion, the register scavenger needs the ability to 'back up'
through the spill code so that it can again process the instructions in the
appropriate order. Prior to this fix, once the scavenger reached the
just-moved instruction, it would assert if it killed any registers because,
having already processed the instruction, it believed they were undefined.
Unfortunately, I don't yet have a small test case. Thanks to Pranav Bhandarkar
for diagnosing the problem and testing this fix.
llvm-svn: 178916
This optimization is unstable at the moment; it
1) blocks us on a very important application
2) PR15200
3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll
(the CHECK commands compare the output against the wrong result)
I personally believe this optimization should not have any impact on
autovectorized code, as the auto-vectorizer is supposed to emit gathers and
scatters the "right" way. Although in theory downstream optimizers might reveal
some gather/scatter optimization opportunities, the chance is quite slim.
For hand-crafted vector code, in terms of redundancy elimination,
load CSE, copy propagation, and DSE can collectively achieve the same result,
but in a much simpler way. On the other hand, those optimizers are able to
improve the code incrementally; in contrast, SROA is an all-or-nothing
approach. However, SROA might win slightly on stack size, as it tries to figure
out a stretch of memory that tightly covers the area accessed by the dynamic index.
rdar://13174884
PR15200
llvm-svn: 178912
llvm-mips-linux green.
llvm-mips-linux runs on a big endian machine. This test passes if I change 'e'
to 'E' in the target data layout string.
llvm-svn: 178910
It's possible for the lock file to disappear and the owning process to
return before we're able to see the generated file. Spin for a little
while to see if it shows up before failing.
llvm-svn: 178909
If the directory that will contain the unique file doesn't exist when
we try to create the file, but another process creates it before we
get a chance to create it ourselves, we would bail out rather than retry
creating the unique file.
llvm-svn: 178908
memory operands.
Essentially, this layers an infix calculator on top of the parsing state
machine. The scale on the index register is still expected to be an immediate:
__asm mov eax, [eax + ebx*4]
and will not work with more complex expressions, for example:
__asm mov eax, [eax + ebx*(2*2)]
The plus and minus binary operators assume the numeric value of a register is
zero so as to not change the displacement. Register operands should never
be an operand for a multiply or divide operation; the scale*indexreg
expression is always replaced with a zero on the operand stack to prevent
such a case.
rdar://13521380
llvm-svn: 178881
Summary:
Sets a report hook that emulates pressing "retry" in the "abort, retry,
ignore" dialog box that _CrtDbgReport normally raises. There are many
other ways to disable assertion reports, but this was the only way I
could find that still calls our exception handler.
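A minimal sketch under stated assumptions (Windows/MSVC only; installation via
_CrtSetReportHook from <crtdbg.h>; the return-value handling is my reading of
the CRT documentation, not necessarily the committed code):
#ifdef _MSC_VER
#include <crtdbg.h>

static int __cdecl RetryReportHook(int ReportType, char * /*Message*/,
                                   int *ReturnValue) {
  if (ReportType == _CRT_ASSERT || ReportType == _CRT_ERROR) {
    if (ReturnValue)
      *ReturnValue = 1;  // 1 == "retry": request the debug break
    return 1;            // report handled; no dialog box is shown
  }
  return 0;              // default handling for other report types
}

static void installHook() { _CrtSetReportHook(RetryReportHook); }
#endif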
Reviewers: Bigcheese
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D625
llvm-svn: 178880
InMemoryStruct is extremely dangerous as it returns data from an internal
buffer when the endianness doesn't match. This should fix the tests on big
endian hosts.
llvm-svn: 178875
Change unittests/ExecutionEngine/Makefile to include Makefile.config before
TARGET_HAS_JIT flag is checked.
Fixes bug: http://llvm.org/bugs/show_bug.cgi?id=15669
llvm-svn: 178871
When the RuntimeDyldELF::processRelocationRef routine finds the target
symbol of a relocation in the local or global symbol table, it performs
a section-relative relocation:
Value.SectionID = lsi->second.first;
Value.Addend = lsi->second.second;
At this point, however, any Addend that might have been specified in
the original relocation record is lost. This is somewhat difficult to
trigger for relocations within the code section since they usually
do not contain non-zero Addends (when built with the default JIT code
model, in any case). However, the problem can be reliably triggered
by a relocation within the data section caused by code like:
int test[2] = { -1, 0 };
int *p = &test[1];
The initializer of "p" will need a relocation to "test + 4". On
platforms using RelA relocations this means an Addend of 4 is required.
Current code ignores this addend when processing the relocation,
resulting in incorrect execution.
Fixed by taking the Addend into account when processing relocations
to symbols found in the local or global symbol table.
Tested on x86_64-linux and powerpc64-linux.
llvm-svn: 178869
This change fixes a bug that I introduced in r178058. After a register is
scavenged using one of the available spill slots, the instruction defining the
virtual register needs to be moved to after the spill code. The scavenger has
already processed the defining instruction so that registers killed by that
instruction are available for definition in that same instruction. Unfortunately,
after this, the scavenger needs to iterate through the spill code and then
visit, again, the instruction that defines the now-scavenged register. In order
to avoid confusion, the register scavenger needs the ability to 'back up'
through the spill code so that it can again process the instructions in the
appropriate order. Prior to this fix, once the scavenger reached the
just-moved instruction, it would assert if it killed any registers because,
having already processed the instruction, it believed they were undefined.
Unfortunately, I don't yet have a small test case. Thanks to Pranav Bhandarkar
for diagnosing the problem and testing this fix.
llvm-svn: 178845
Looks like there is a big endian/little endian problem here. Loosen the
test to try to get the bots green while llvm builds on a ppc qemu vm.
The failure was in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/
llvm-svn: 178839
For now, just save the compile time since the ConvergingScheduler
heuristics don't use this analysis. We'll probably enable it later
after compile-time investigation.
llvm-svn: 178822