The getSwappedPredicate function can be used in other places (such as in
improvements to the PPCCTRLoops pass). Instead of trapping it as a static
function in PPCInstrInfo, move it into PPCPredicates with other
predicate-related things.
No functionality change intended.
llvm-svn: 179926
The logic that actually compares the types considers pointers and integers the
same if they are of the same size. This created a strange mismatch between hash
and reality and made the test case for this fail on some platforms (yay,
test cases).
llvm-svn: 179905
When matching a compare with a subtract where the arguments of the compare are
swapped w.r.t. the arguments of the subtract, we need to negate the predicates
(or CR bit indices) of the users. This, however, is not the same as inverting
the predicate (negating LT -> GT, but inverting LT -> GE, for example). The ARM
backend seems to do this correctly, but when I adapted the code for the PPC
backend, I introduced an error in this logic.
Comparison optimization is now enabled again by default.
llvm-svn: 179899
Also make some static function class functions to avoid having to mention the
class namespace for enums all the time.
No functionality change intended.
llvm-svn: 179886
This patch adds support for recoded (meaning assembly-language compatible to
standard mips32) arithmetic 32-bit instructions.
Patch by Zoran Jovanovic.
llvm-svn: 179873
Thanks to Evgeniy Stepanov for reporting this.
It might be a good idea to add a command iterator abstraction to MachO.h, but
this fixes the bug for now.
llvm-svn: 179848
Adding another CU-wide list, in this case of imported_modules (since they
should be relatively rare, it seemed better to add a list where each element
had a "context" value, rather than add a (usually empty) list to every scope).
This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll
need to expand this to cover DW_TAG_imported_declaration too.
llvm-svn: 179836
variant/dialect. Addresses a FIXME in the emitMnemonicAliases function.
Use and test case to come shortly.
rdar://13688439 and part of PR13340.
llvm-svn: 179804
Many PPC instructions have a so-called 'record form' which stores to a specific
condition register the result of comparing the result of the instruction with
zero (always as a signed comparison). For integer operations on PPC64, this is
always a 64-bit comparison.
This implementation is derived from the implementation in the ARM backend;
there are some differences because PPC condition registers are allocatable
virtual registers (although the record forms always use a specific one), and we
look for a matching subtraction instruction after the compare (but before the
first use) in addition to before it.
llvm-svn: 179802
Semantics of parameters named Index and Idx were inconsistent between
"include/llvm/IR/Attributes.h", "lib/IR/AttributeImpl.h" and
"lib/IR/Attributes.cpp": sometimes these were fixed 1-based indexes of IR
parameters (or AttributeSet::ReturnIndex for IR return values or
AttributeSet::FunctionIndex for IR functions), other times they were the
internal slot for storage in the underlying AttributeSetImpl. I renamed usage of
the former to "Index" and usage of the latter to "Slot" ("Slot" was already
being used consistently for the latter in a subset of cases)
Patch by Stephen Lin!
llvm-svn: 179791
1. Verify::VerifyParameterAttrs in "lib/IR/Verifier.cpp" and
AttrBuilder::removeFunctionOnlyAttrs in "lib/IR/Attributes.cpp" (only called
by Verify::VerifyFunctionAttrs) separately maintained a list of function-only
attribute types. I've consolidated the logic into a new function used for
both cases in "lib/IR/Verifier.cpp", so this logic is in one place (other
than the AsmParser front-end)
2. Various functions in "lib/IR/Verifier.cpp" passed AttributeSet around by
reference needlessly, as it's just a handle to an immutable pimpl body.
Patch by Stephen Lin!
llvm-svn: 179790
In X86FastISel::X86SelectStore(), improperly aligned stores are rejected and
handled by the DAG-based ISel. However, X86FastISel::X86SelectLoad() makes
no such requirement. There doesn't appear to be an x86 architectural
correctness issue with allowing potentially unaligned store instructions.
This patch removes this restriction.
Patch by Jim Stichnot.
llvm-svn: 179774
A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y)
sequence in LLVM. If we see such a sequence we can treat it just as any other
commutative binary instruction and reduce it.
This appears to help bzip2 by about 1.5% on an imac12,2.
radar://12960601
llvm-svn: 179773
This occurs due to an alloca representing a separate ownership from the
original pointer. Thus consider the following pseudo-IR:
objc_retain(%a)
for (...) {
objc_retain(%a)
%block <- %a
F(%block)
objc_release(%block)
}
objc_release(%a)
From the perspective of the optimizer, the %block is a separate
provenance from the original %a. Thus the optimizer pairs up the inner
retain for %a and the outer release from %a, resulting in segfaults.
This is fixed by noting that the signature of a mismatch of
retain/releases inside the for loop is a Use/CanRelease top down with an
None bottom up (since bottom up the Retain-CanRelease-Use-Release
sequence is completed by the inner objc_retain, but top down due to the
differing provenance from the objc_release said sequence is not
completed). In said case in CheckForCFGHazards, we now clear the state
of %a implying that no pairing will occur.
Additionally a test case is included.
rdar://12969722
llvm-svn: 179747
* We only ever specialize these templates with an instantiation of ELFType,
so we don't need a template template.
* Replace LLVM_ELF_COMMA with just passing the individual parameters to the
macro. This requires a second macro for when we only have ELFT, but that
is still a small win.
llvm-svn: 179726
unable to handle cases such as __asm mov eax, 8*-8.
This patch also attempts to simplify the state machine. Further, the error
reporting has been improved. Test cases included, but more will be added to
the clang side shortly.
rdar://13668445
llvm-svn: 179719
for the sdiv/srem/udiv/urem bitcode instructions. This is done for the i8,
i16, and i32 types, as well as i64 for the x86_64 target.
Patch by Jim Stichnoth
llvm-svn: 179715
PR15000 has a testcase where the time to compile was bordering on 30s. When I
dropped the limit value to 100, it became a much more managable 6s. The compile
time seems to increase in a roughly linear fashion based on increasing the limit
value. (See the runtimes below.)
So, let's lower the limit to 100 so that they can get a more reasonable compile
time.
Limit Value Time
----------- ----
10 0.9744s
20 1.8035s
30 2.3618s
40 2.9814s
50 3.6988s
60 4.5486s
70 4.9314s
80 5.8012s
90 6.4246s
100 7.0852s
110 7.6634s
120 8.3553s
130 9.0552s
140 9.6820s
150 9.8804s
160 10.8901s
170 10.9855s
180 12.0114s
190 12.6816s
200 13.2754s
210 13.9942s
220 13.8097s
230 14.3272s
240 15.7753s
250 15.6673s
260 16.0541s
270 16.7625s
280 17.3823s
290 18.8213s
300 18.6120s
310 20.0333s
320 19.5165s
330 20.2505s
340 20.7068s
350 21.1833s
360 22.9216s
370 22.2152s
380 23.9390s
390 23.4609s
400 24.0426s
410 24.6410s
420 26.5208s
430 27.7155s
440 26.4142s
450 28.5646s
460 27.3494s
470 29.7255s
480 29.4646s
490 30.5001s
llvm-svn: 179713
The reference manual defines only 5 permitted values for the immediate field of the "hint" instruction:
1. nop (imm == 0)
2. yield (imm == 1)
3. wfe (imm == 2)
4. wfi (imm == 3)
5. sev (imm == 4)
Therefore, restrict the permitted values for the "hint" instruction to 0 through 4.
Patch by Mihail Popa <Mihail.Popa@arm.com>
llvm-svn: 179707
GCC complains: Core.cpp:1449:27: warning: overflow in implicit constant conversion [-Woverflow]
I'm not sure if that's really a problem here, but using the enum type is better
style anyways.
llvm-svn: 179696
A couple of recently introduced conditional branch patterns
also need to be marked as isCodeGenOnly since they cannot
be handled by the asm parser.
No change in generated code.
llvm-svn: 179690
This patch allows the Mips assembler to parse and emit nested
expressions as instruction operands. It also extends the
expansion of memory instructions when an offset is given as
an expression.
Contributer: Vladimir Medic
llvm-svn: 179657
If a switch instruction has a case for every possible value of its type,
with the same successor, SimplifyCFG would replace it with an icmp ult,
but the computation of the bound overflows in that case, which inverts
the test.
Patch by Jed Davis!
llvm-svn: 179587
Two return types are not equivalent if one is a pointer and the other is an
integral. This is because we cannot bitcast a pointer to an integral value.
PR15185
llvm-svn: 179569
This patch allows the assembler to recognize $fcc0
as a valid register for conditional move instructions.
Corresponding test cases have been added.
Contributer: Vladimir Medic
llvm-svn: 179567
Instead of emitting config values in a predefined order, the code
emitter will now emit a 32-bit register index followed by the 32-bit
config value.
llvm-svn: 179546
I will remove the isBigEndianHost function once I update clang.
The ifdef logic is designed to
* not use configure/cmake to avoid breaking -arch i686 -arch ppc.
* default to little endian
* be as small as possible
It looks like sys/endian.h is the preferred header on most modern BSD systems,
but it is better to change this in a followup patch as machine/endian.h is
available on FreeBSD, OpenBSD, NetBSD and OS X.
llvm-svn: 179527
This is a rework of the broken parts in r179373 which were subsequently reverted in r179374 due to incompatibility with C++98 compilers. This version should be ok under C++98.
llvm-svn: 179520
Now that the CR spilling issues have been resolved, we can remove the
unmodeled-side-effect attributes from the comparison instructions (and also
mark them as isCompare). By allowing these, by default, to have unmodeled side
effects, we were hiding problems with CR spilling; but everything seems much
happier now.
llvm-svn: 179502
This fixes an ABI bug for non-Darwin PPC64. For the callee-saved condition
registers, the spill location is specified relative to the stack pointer (SP +
8). However, this is not relative to the SP after the new stack frame is
established, but instead relative to the caller's stack pointer (it is stored
into the linkage area of the parent's stack frame).
So, like with the link register, we don't directly spill the CRs with other
callee-saved registers, but just mark them to be spilled during prologue
generation.
In practice, this reverts r179457 for PPC64 (but leaves it in place for PPC32).
llvm-svn: 179500
One performs: (X == 13 | X == 14) -> X-13 <u 2
The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1
The problem is that there are certain values of C1 and C2 that
trigger both transforms but the first one blocks out the second,
this generates suboptimal code.
Reordering the transforms should be better in every case and
allows us to do interesting stuff like turn:
%shr = lshr i32 %X, 4
%and = and i32 %shr, 15
%add = add i32 %and, -14
%tobool = icmp ne i32 %add, 0
into:
%and = and i32 %X, 240
%tobool = icmp ne i32 %and, 224
llvm-svn: 179493
SDNodes and MachineOperands get target flags representing the %hi() and
%lo() assembly annotations that eventually become relocations.
Also define flags to be used by the 64-bit code models.
llvm-svn: 179468
Leaving MFCR has having unmodeled side effects is not enough to prevent
unwanted instruction reordering post-RA. We could probably apply a stronger
barrier attribute, but there is a better way: Add all (not just the first) CR
to be spilled as live-in to the entry block, and add all CRs to the MFCR
instruction as implicitly killed.
Unfortunately, I don't have a small test case.
llvm-svn: 179465
This is basically the same fix in three different places. We use a set to avoid
walking the whole tree of a big ConstantExprs multiple times.
For example: (select cmp, (add big_expr 1), (add big_expr 2))
We don't want to visit big_expr twice here, it may consist of thousands of
nodes.
The testcase exercises this by creating an insanely large ConstantExprs out of
a loop. It's questionable if the optimizer should ever create those, but this
can be triggered with real C code. Fixes PR15714.
llvm-svn: 179458
For functions that need to spill CRs, and have dynamic stack allocations, the
value of the SP during the restore is not what it was during the save, and so
we need to use the FP in these cases (as for all of the other spills and
restores, but the CR restore has a special code path because its reserved slot,
like the link register, is specified directly relative to the adjusted SP).
llvm-svn: 179457