by XORing the high bit. This fails if the source operand is a vector because we need to negate
each of the elements in the vector.
Fixes rdar://12281066 and PR13813.
llvm-svn: 163802
are within the lifetime zone. Sometimes legitimate usages of allocas are
hoisted outside of the lifetime zone. For example, GEPs may calculate the
address of a member of an allocated struct. This commit makes sure that
we only check (abort regions or assert) for instructions that read and write
memory using stack frames directly. Notice that by allowing legitimate
usages outside the lifetime zone we also stop checking for instructions
which use derivatives of allocas. We will catch fewer bugs in user code
and in the compiler itself.
llvm-svn: 163791
* wrap code blocks in \code ... \endcode;
* refer to parameter names in paragraphs correctly (\arg is not what most
people want -- it starts a new paragraph).
llvm-svn: 163790
The assumption that the target address for the relocation will always be
sizeof(intptr_t) bytes wide and will always contain an addend for the
relocation value is very wrong. Default to no addend for now.
rdar://12157052
llvm-svn: 163765
When comparing to the Mach-O relocation type enum value, make sure we're only
comparing against the bits in the RelType that correspond to it.
llvm-svn: 163764
We don't have enough GR64_TC registers when calling a varargs function
with 6 arguments. Since %al holds the number of vector registers used,
only %r11 is available as a scratch register.
This means that addressing modes using both base and index registers
can't be folded into TCRETURNmi64.
<rdar://problem/12282281>
llvm-svn: 163761
This function writes out the current values of the counters and then resets
them. It can be used similarly to the __gcov_flush function to sync the
counters when need be, for instance in a situation where the application
doesn't exit.
<rdar://problem/12185886>
llvm-svn: 163757
Add some support for dealing with an object pointer on arguments.
Part of rdar://9797999
which now supports adding the object pointer attribute to the
subprogram as it should.
llvm-svn: 163754
- BlockAddress has no support for the BA + offset form and there is no way to
propagate that offset into a machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
simplify target block address forming;
- All targets are modified to use the new interface and the X86 backend is
enhanced to support BA + offset addressing.
llvm-svn: 163743
nonvolatile condition register fields across calls under the SVR4 ABIs.
* With the 64-bit ABI, the save location is at a fixed offset of 8 from
the stack pointer. The frame pointer cannot be used to access this
portion of the stack frame since the distance from the frame pointer may
change with alloca calls.
* With the 32-bit ABI, the save location is just below the general
register save area, and is accessed via the frame pointer like the rest
of the save areas. This is an optional slot, so it must only be created
if any of CR2, CR3, and CR4 were modified.
* For both ABIs, save/restore logic is generated only if one of the
nonvolatile CR fields was modified.
I also took this opportunity to clean up an extra FIXME in
PPCFrameLowering.h. Save area offsets for 32-bit GPRs are meaningless
for the 64-bit ABI, so I removed them for correctness and efficiency.
Fixes PR13708 and partially also PR13623. It lets us enable exception handling
on PPC64.
Patch by William J. Schmidt!
llvm-svn: 163713
SelectionDAG::getConstantFP(double Val, EVT VT, bool isTarget);
should not be used when Val is not a simple constant (as the comment in
SelectionDAG.h indicates). This patch avoids using this function
when folding an unknown constant through a bitcast, where it cannot be
guaranteed that Val will be a simple constant.
llvm-svn: 163703
The search for liveness is clipped to a specific number of instructions around the target MachineInstr, in order to avoid degenerating into an O(N^2) algorithm. It tries to use various clues about how instructions around (both before and after) a given MachineInstr use that register, to determine its state at the MachineInstr.
llvm-svn: 163695
was printing a newline that doesn't occur when printing other kinds
of LLVM values. Move the printing of that newline elsewhere, making
globals print the same as other values while leaving the output when
printing an entire module unchanged. Patch by Saša Tomić.
llvm-svn: 163693
findLastUseBefore was previously considering virtreg liveness only, leading to
incorrect live intervals for reg units when instrs with physreg operands were
moved up.
llvm-svn: 163685
The input program may contain instructions which are not inside lifetime
markers. This can happen due to a bug in the compiler or due to a bug in
user code (for example, returning a reference to a local variable).
This commit adds checks over all of the instructions in the function and
invalidates lifetime ranges which do not contain all of the instructions.
llvm-svn: 163678
a pair of switch/branch where both depend on the value of the same variable and
the default case of the first switch/branch goes to the second switch/branch.
Cleaned up the code and fixed a few issues:
1> handling the case where some cases of the 2nd switch are invalidated
2> correctly calculating the weight for the 2nd switch when it is a conditional eq
The test case is modified from Alastair's original patch.
llvm-svn: 163635
Sub-register lane masks are bitmasks that can be used to determine if
two sub-registers of a virtual register will overlap. For example, ARM's
ssub0 and ssub1 sub-register indices don't overlap each other, but both
overlap dsub0 and qsub0.
The lane masks will be accurate on most targets, but on targets that use
sub-register indexes in an irregular way, the masks may conservatively
report that two sub-register indices overlap when the eventually
allocated physregs don't.
Irregular register banks also mean that the bits in a lane mask can't be
mapped onto register units, but the concept is similar.
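As a rough illustration of the idea (the masks below are made up; the real
values are computed by TableGen per target), overlap between two sub-register
indices reduces to a bitwise AND of their lane masks:

#include <cstdint>
#include <cstdio>

// Illustrative lane masks for the ARM example above.
const uint32_t LaneSsub0 = 0x1;                    // ssub0
const uint32_t LaneSsub1 = 0x2;                    // ssub1
const uint32_t LaneDsub0 = LaneSsub0 | LaneSsub1;  // dsub0 covers both s-lanes
const uint32_t LaneQsub0 = 0xF;                    // qsub0 covers dsub0, dsub1

bool lanesOverlap(uint32_t A, uint32_t B) { return (A & B) != 0; }

int main() {
  std::printf("%d\n", lanesOverlap(LaneSsub0, LaneSsub1)); // 0: disjoint
  std::printf("%d\n", lanesOverlap(LaneSsub1, LaneDsub0)); // 1: overlap
  std::printf("%d\n", lanesOverlap(LaneDsub0, LaneQsub0)); // 1: overlap
  return 0;
}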
llvm-svn: 163630
Preserve the Composites map in the CodeGenSubRegIndex class so it can be
used to determine which sub-register indices can actually be composed.
llvm-svn: 163629
Apparently, NumSubRegIndices was completely unused before. Adjust it by
one to include the null subreg index, just like getNumRegs() includes
the null register.
llvm-svn: 163628
The Hexagon target decided to use a lot of functionality from the
target-independent scheduler. That's fine, and other targets should be
able to do the same. This reorg and API update makes that easy.
For the record, ScheduleDAGMI was not meant to be subclassed. Instead,
new scheduling algorithms should be able to implement
MachineSchedStrategy and be done. But if need be, it's nice to be
able to extend ScheduleDAGMI, so I also made that easier. The target
scheduler is somewhat more apt to break that way though.
llvm-svn: 163580
The ARM backend can eliminate cmp instructions by reusing flags from a
nearby sub instruction with similar arguments.
Don't do that if the sub is predicated - the flags are not written
unconditionally.
<rdar://problem/12263428>
llvm-svn: 163535
This folding happens as early as possible for performance reasons, and to make sure it isn't foiled by other transforms (e.g. forming FMAs).
llvm-svn: 163519
- If a boolean value is generated from CMOV and tested as a boolean value,
simplify the use of the test result by referencing the original condition.
The RDRAND intrinsic is one such case.
llvm-svn: 163516
- The C API should be stable
- InlineAsm::AsmDialect is not exposed to C
- The function didn't match the prototype so this was unreachable code
llvm-svn: 163502
Previously we checked if the register is def'd in a block via the def/use list
and walked the list of kills to check if the register is killed in a block. Both
of these checks can be made much cheaper by walking the block first and
recording all defs and kills.
This reduces the compile time of the test case from PR13651 from 40s to 15s at
-O2. The compile time is still dominated by LV updating but now the main culprit
is SparseBitVector's slowness.
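The shape of the change, sketched with toy types rather than the real
LiveVariables structures (a single walk over the block collects everything the
later queries need):

#include <unordered_set>
#include <vector>

struct ToyInstr {
  std::vector<unsigned> Defs;   // registers defined by this instruction
  std::vector<unsigned> Kills;  // registers killed by this instruction
};

struct BlockRegInfo {
  std::unordered_set<unsigned> Defined;
  std::unordered_set<unsigned> Killed;
};

// One pass over the block replaces the repeated def/use-list and
// kill-list scans described above.
BlockRegInfo collectDefsAndKills(const std::vector<ToyInstr> &Block) {
  BlockRegInfo Info;
  for (const ToyInstr &I : Block) {
    Info.Defined.insert(I.Defs.begin(), I.Defs.end());
    Info.Killed.insert(I.Kills.begin(), I.Kills.end());
  }
  return Info;
}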
llvm-svn: 163478
For some reason .lcomm uses byte alignment and .comm log2 alignment so we can't
use the same setting for both. Fix this by reintroducing the LCOMM enum.
I verified this against mingw's gcc.
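To make the difference concrete (a tiny sketch, not code from the patch): with
the conventions described above, a 16-byte alignment is written as 16 for
.lcomm but as 4, i.e. its log2, for .comm.

#include <cassert>

// Illustrative helper: convert a byte alignment (a power of two), as .lcomm
// takes it here, into the log2 form .comm takes.
unsigned byteAlignToLog2(unsigned AlignBytes) {
  assert(AlignBytes && (AlignBytes & (AlignBytes - 1)) == 0 &&
         "alignment must be a power of two");
  unsigned Log2 = 0;
  while ((1u << Log2) < AlignBytes)
    ++Log2;
  return Log2;
}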
llvm-svn: 163420
- Darwin lied about not supporting .lcomm and turned it into zerofill in the
asm parser. Push the zerofill-conversion down into macho-specific code.
- This makes the tri-state LCOMMType enum superfluous, there are no targets
without .lcomm.
- Do proper error reporting when trying to use .lcomm with alignment on a target
that doesn't support it.
- .comm and .lcomm alignment was parsed in bytes on COFF, should be power of 2.
- Fixes PR13755 (.lcomm crashes on ELF).
llvm-svn: 163395
gas accepts this and it seems to be common enough to be worth supporting. This
doesn't affect the parsing of reg operands outside of .cfi directives.
llvm-svn: 163390
The assembler can alias one instruction into another based
on the operands. For example, the jump instruction "J" takes
an immediate operand, but if the operand is a register the
assembler will change it into a jump register "JR" instruction.
These changes are in the instruction td file.
Test cases included.
Contributor: Vladimir Medic
llvm-svn: 163368
Actually these are just stubs for parsing the directives.
Semantic support will come later.
Test cases included.
Contributor: Vladimir Medic
llvm-svn: 163364
- This patch is inspired by the failure of the following code snippet
which is used to convert enumerable values into encoding bits to
improve the readability of td files.
class S<int s> {
  bits<2> V = !if(!eq(s, 8),  {0, 0},
              !if(!eq(s, 16), {0, 1},
              !if(!eq(s, 32), {1, 0},
              !if(!eq(s, 64), {1, 1}, {?, ?}))));
}
Later, PR8330 was found to report a closely related (though not identical)
issue with bit/bits values.
- Instead of resolving bit/bits values separately through
resolveBitReference(), this patch adds getBit() for all Inits and
resolves a bit value by resolving the whole value and then getting the
specified bit. This unifies the resolving of bit with other values and
removes redundant logic for resolving bit only. In addition,
BitsInit::resolveReferences() is optimized to take advantage of this
organization by resolving VarBitInit's variable reference first and
then getting bits from it.
- The type inference in the '!if' operator is revised to support possible
combinations of int and bits/bit in the MHS and RHS.
- As there may be illegal assignments from an integer value to a bit, say
assigning 2 to a bit, and we only check this during instantiation in some
cases, e.g.
  bit V = !if(!eq(x, 17), 0, 2);
a verbose diagnostic message is generated when an invalid value is
resolved, to help locate the error.
- PR8330 is fixed as well.
llvm-svn: 163360
The RegisterCoalescer understands overlapping live ranges where one
register is defined as a copy of the other. With this change, register
allocators using LiveRegMatrix can do the same, at least for copies
between physical and virtual registers.
When a physreg is defined by a copy from a virtreg, allow those live
ranges to overlap:
%CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11
%vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill>
We can assign %vreg11 to %ECX, overlapping the live range of %CL.
llvm-svn: 163336
We will soon allow virtual register live ranges to overlap regunit live
ranges when the physreg is defined as a copy of the virtreg:
%EAX = COPY %vreg5
FOO %vreg5
BAR %EAX<kill>
There is no real interference since %vreg5 and %EAX have the same value
where they overlap.
This patch prevents addKillFlags from adding virtreg kill flags to FOO
where the assigned physreg is overlapping the virtual register live
range.
llvm-svn: 163335
Kill flags are difficult to maintain, and liveness queries are better
handled by live intervals.
Kill flags are reinserted after register allocation by addKillFlags().
llvm-svn: 163334
Enhances basic alias analysis to recognize phis whose first incoming values are
NoAlias and whose other incoming values reach back to the phi node itself
through some amount of recursion.
Example: With this change basicaa reports that ptr_phi and ptr2_phi do not
alias each other.
bb:
ptr = ptr2 + 1
loop:
ptr_phi = phi [bb, ptr], [loop, ptr_plus_one]
ptr2_phi = phi [bb, ptr2], [loop, ptr2_plus_one]
...
ptr_plus_one = gep ptr_phi, 1
ptr2_plus_one = gep ptr2_phi, 1
This enables the elimination of one load in code like the following:
extern int foo;
int test_noalias(int *ptr, int num, int* coeff) {
  int *ptr2 = ptr;
  int result = (*ptr++) * (*coeff--);
  while (num--) {
    *ptr2++ = *ptr;
    result += (*coeff--) * (*ptr++);
  }
  *ptr = foo;
  return result;
}
Part 2/2 of fix for PR13564.
llvm-svn: 163319
If we can show that the base pointers of two GEPs don't alias each other using
precise analysis and the indices and base offset are equal then the two GEPs
also don't alias each other.
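A small source-level illustration, assuming the analysis can already prove the
two base pointers distinct (here they are separate local arrays):

#include <cstddef>

int sumDistinctBases(std::size_t i) {   // assumes i < 16
  int a[16] = {};
  int b[16] = {};
  // a and b are distinct objects, so with the same index and base offset the
  // element addresses &a[i] and &b[i] cannot alias either; the store to b[i]
  // cannot clobber the value loaded from a[i].
  a[i] = 1;
  b[i] = 2;
  return a[i] + b[i];
}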
This is primarily needed for the follow up patch that analyses NoAlias'ing PHI
nodes.
Part 1/2 of fix for PR13564.
llvm-svn: 163317
The lookup tables did not get built in a deterministic order.
This makes them get built in the order that the corresponding phi nodes
were found.
llvm-svn: 163305
If we have a BUILD_VECTOR that is mostly a constant splat, it is often better to splat that constant and then insertelement the non-constant lanes, instead of insertelementing every lane from an undef base.
llvm-svn: 163304
This adds a transformation to SimplifyCFG that attempts to turn switch
instructions into loads from lookup tables. It works on switches that
are only used to initialize one or more phi nodes in a common successor
basic block, for example:
int f(int x) {
  switch (x) {
  case 0: return 5;
  case 1: return 4;
  case 2: return -2;
  case 5: return 7;
  case 6: return 9;
  default: return 42;
  }
}
This speeds up the code by removing the hard-to-predict jump, and
reduces code size by removing the code for the jump targets.
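Roughly the shape of code the transformation produces for the example above (a
source-level sketch, not the exact IR SimplifyCFG emits):

// The switch becomes a bounds check plus a load from a constant table;
// holes in the case range (3 and 4 here) are filled with the default value.
static const int Table[7] = {5, 4, -2, 42, 42, 7, 9};

int f_lowered(int x) {
  if (static_cast<unsigned>(x) < 7)
    return Table[x];
  return 42;  // default case
}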
llvm-svn: 163302
assembler such as shifts greater than 32. In the case
of direct object, the code gen needs to do this lowering
since the assembler is not involved.
With the advent of the llvm-mc assembler, it also needs
to do the same lowering.
This patch makes that specific lowering code accessible
to both the direct object output and the assembler.
This patch does not affect generated output.
llvm-svn: 163287
Now that it is possible to dynamically tie MachineInstr operands,
predicated instructions are possible in SSA form:
%vreg3<def> = SUBri %vreg1, -2147483647, pred:14, pred:%noreg, opt:%noreg
%vreg4<def,tied1> = MOVCCr %vreg3<tied0>, %vreg1, pred:12, pred:%CPSR
Becomes a predicated SUBri with a tied imp-use:
SUBri %vreg1, -2147483647, pred:13, pred:%CPSR, opt:%noreg, %vreg1<imp-use,tied0>
This means that any instruction that is safe to move can be folded into
a MOVCC, and the *CC pseudo-instructions are no longer needed.
The test case changes reflect that Thumb2SizeReduce recognizes the
predicated instructions. It didn't understand the pseudos.
llvm-svn: 163274
switch, make sure we include the value for the cases when calculating the edge
value from the switch to the default destination.
rdar://12241132
llvm-svn: 163270
The previous patch accidentally decided it couldn't convert a VFP to a
NEON instruction after it had already destroyed the old one. Not a
good move.
llvm-svn: 163230
Simulate a remote target address space by allocating a separate chunk of
memory for the target and re-mapping section addresses to that prior to
execution. Later we'll want to have a truly remote process, but for now
this gets us closer to being able to test the remote target
functionality outside LLDB.
rdar://12157052
llvm-svn: 163216
It relies on clear() being fast and the cache rarely has more than 1 or 2
elements, so give it an inline capacity and always shrink it back down in case
it grows. DenseMap will grow to 64 buckets which makes clear() a lot slower.
llvm-svn: 163215
subreg_hireg of register pair Rp.
* lib/Target/Hexagon/HexagonPeephole.cpp(PeepholeDoubleRegsMap): New
DenseMap similar to PeepholeMap that additionally records subreg info
too.
(runOnMachineFunction): Record information in PeepholeDoubleRegsMap
and copy propagate the high sub-reg of Rp0 in Rp1 = lsr(Rp0, #32) to
the instruction Rx = COPY Rp1:logreg_subreg.
* test/CodeGen/Hexagon/remove_lsr.ll: New test.
llvm-svn: 163214
pointers-to-strong-pointers may be in play. These can lead to retains and
releases happening in unstructured ways, foiling the optimizer. This fixes
rdar://12150909.
llvm-svn: 163180
This doesn't seem ideal, perhaps we could just keep the llvm_site_cfg and have
other config (clang and clang-tools-extra) derive their site_cfg from that.
Suggestions/complaints/ideas welcome.
llvm-svn: 163171
The MachineOperand::TiedTo field was maintained, but not used.
This patch enables it in isRegTiedToDefOperand() and
isRegTiedToUseOperand(), which are the actual functions used by the
register allocator.
llvm-svn: 163153
After much agonizing, use a full 4 bits of precious MachineOperand space
to encode this. This uses existing padding, and doesn't grow
MachineOperand beyond its current 32 bytes.
This allows tied defs among the first 15 operands on a normal
instruction, just like the current MCInstrDesc constraint encoding.
Inline assembly needs to be able to tie more than the first 15 operands,
and gets special treatment.
Tied uses can appear beyond 15 operands, as long as they are tied to a
def that's in range.
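An illustrative encode/decode pair for that 4-bit field (hypothetical helpers,
not MachineOperand's real accessors): value 0 means "not tied" and values
1..15 point at the first 15 operands.

#include <cassert>

unsigned encodeTiedTo(int OpIdx) {             // OpIdx < 0 means "not tied"
  assert(OpIdx < 15 && "beyond 15 operands needs the inline-asm special case");
  return OpIdx < 0 ? 0u : unsigned(OpIdx) + 1; // fits in the 4-bit field
}

int decodeTiedTo(unsigned Field) {
  return Field == 0 ? -1 : int(Field) - 1;
}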
llvm-svn: 163151
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256 (see the sketch after this list).
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presence of the optimization when calculating
either the quotient, remainder, or both.
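A rough source-level sketch of the bypass (illustrative only; the real pass
rewrites IR for the operand types the backend registers via
addBypassSlowDivType):

#include <cstdint>

struct DivRem { uint32_t Quot, Rem; };

DivRem divRemPossiblyBypassed(uint32_t a, uint32_t b) {
  if ((a | b) < 256u) {
    // Fast path: both operands fit in 8 bits, so the cheap 8-bit divide
    // (DIVB on x86) can be used. Both quotient and remainder are computed in
    // this block, mirroring how the pass lets later DIV/REM users reuse the
    // results instead of emitting another divide.
    return { uint32_t(uint8_t(a) / uint8_t(b)),
             uint32_t(uint8_t(a) % uint8_t(b)) };
  }
  // Slow path: full-width divide.
  return { a / b, a % b };
}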
Patch by Tyler Nowicki!
llvm-svn: 163150
Rationale: For each preprocessor macro, either the definedness is what's
meaningful, or the value is what's meaningful, or both. If definedness is
meaningful, we should use #ifdef. If the value is meaningful, we should use
#if. Mixing #if and #ifdef interchangeably for the same macro seems ugly to
me, even if undefined macros are zero if used.
This also has the benefit that including an LLVM header doesn't prevent
you from compiling with -Wundef -Werror.
Patch by John Garvin!
<rdar://problem/12189979>
llvm-svn: 163148
This patch corrects the definition of the umlal/smlal instructions and adds
support for matching them in the ARM DAG combiner.
Bug 12213
Patch by Yin Ma!
llvm-svn: 163136
Scan the body of the loop and find instructions that may trap.
Use this information when deciding if it is safe to hoist or sink instructions.
Notice that we can optimize the search of instructions that may throw in the case of nested loops.
rdar://11518836
llvm-svn: 163132
by instruction address from DWARF.
Add --inlining flag to llvm-dwarfdump to demonstrate and test this functionality,
so that "llvm-dwarfdump --inlining --address=0x..." now works much like
"addr2line -i 0x...", provided that the binary has debug info
(Clang's -gline-tables-only *is* enough).
llvm-svn: 163128
If an allocation has a must-alias relation to the access pointer, we treat it
as a Def. Otherwise, without this check, the code here was just skipping over
the allocation call and ignoring it. I noticed this by inspection and don't
have a specific testcase that it breaks, but it seems like we need to treat
a may-alias allocation as a Clobber.
llvm-svn: 163127