The test case had a couple of FIXMEs where the instruction is in
fact already supported by the back-end. In some other case, while
the generic form of the instruction is not yet supported, a
specialized form is. This adds tests for those already supported
instructions / instruction forms.
llvm-svn: 185347
I believe the full "dmb ish" barrier is not required to guarantee release
semantics for atomic operations. The weaker "dmb ishst" prevents previous
operations being reordered with a store executed afterwards, which is enough.
A key point to note (fortunately already correct) is that this barrier alone is
*insufficient* for sequential consistency, no matter how liberally placed.
llvm-svn: 185339
Fix a case where we were incorrectly sign-extending a value when we should have been zero-extending the value.
Also change some SIGN_EXTEND to ANY_EXTEND because we really dont care and may have more opportunity to fold subexpressions
llvm-svn: 185331
Math functions are mark as readonly because they read the floating point
rounding mode. Because we don't vectorize loops that would contain function
calls that set the rounding mode it is safe to ignore this memory read.
llvm-svn: 185299
Changing the sign when comparing the base pointer would introduce all
sorts of unexpected things like:
%gep.i = getelementptr inbounds [1 x i8]* %a, i32 0, i32 0
%gep2.i = getelementptr inbounds [1 x i8]* %b, i32 0, i32 0
%cmp.i = icmp ult i8* %gep.i, %gep2.i
%cmp.i1 = icmp ult [1 x i8]* %a, %b
%cmp = icmp ne i1 %cmp.i, %cmp.i1
ret i1 %cmp
into:
%cmp.i = icmp slt [1 x i8]* %a, %b
%cmp.i1 = icmp ult [1 x i8]* %a, %b
%cmp = xor i1 %cmp.i, %cmp.i1
ret i1 %cmp
By preserving the original sign, we now get:
ret i1 false
This fixes PR16483.
llvm-svn: 185259
Real world code sometimes has the denominator of a 'udiv' be a
'select'. LLVM can handle such cases but only when the 'select'
operands are symmetric in structure (both select operands are a constant
power of two or a left shift, etc.). This falls apart if we are dealt a
'udiv' where the code is not symetric or if the select operands lead us
to more select instructions.
Instead, we should treat the LHS and each select operand as a distinct
divide operation and try to optimize them independently. If we can
to simplify each operation, then we can replace the 'udiv' with, say, a
'lshr' that has a new select with a bunch of new operands for the
select.
llvm-svn: 185257
We may, after other optimizations, find ourselves with IR that looks
like:
%shl = shl i32 1, %y
%cmp = icmp ult i32 %shl, 32
Instead, we should just compare the shift count:
%cmp = icmp ult i32 %y, 5
llvm-svn: 185242
This fixes PR16418, which reports that a function calling
__builtin_unwind_init() asserts. The cause is that this generates a
spill/restore for VRSAVE, and we support that only on Darwin (because VRSAVE is
only really used on Darwin).
The test case checks only that we don't crash. We can add correctness checks
once someone verifies what behavior the function is supposed to have.
llvm-svn: 185235
To support this we have to insert 'extractelement' instructions to pick the right lane.
We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated.
llvm-svn: 185230
- lit tests verify that each line of input LLVM IR gets a !dbg node and a
corresponding entry of metadata that contains the line number
- unit tests verify that DebugIR works as advertised in the interface
- refactored some useful IR generation functionality from the MCJIT unit tests
so it can be reused
llvm-svn: 185212
On OpenBSD, the stack-smash protection transform uses "__guard_local"
and "__stack_smash_handler" instead of "__stack_chk_guard" and
"__stack_chk_fail". However, CodeGen/PowerPC/stack-protector.ll
doesn't specify a target OS, so on OpenBSD it fails.
Add -mtriple=ppc32-unknown-linux to make the test host-OS agnostic. While
there, convert to FileCheck.
Patch by Matthew Dempsky.
llvm-svn: 185206
Based on GCC's output for TLS variables (OP_constNu, x@dtpoff,
OP_lo_user), this implements debug info support for TLS in ELF. Verified
that this output is correct/sufficient on Linux (using gold - if you're
using binutils-ld, you'll need something with the fix for
http://sourceware.org/bugzilla/show_bug.cgi?id=15685 in it).
Support on non-ELF is sort of "arbitrary" at the moment - if Apple folks
want to discuss (or just go ahead & implement) how this should work in
MachO, etc, I'm open.
llvm-svn: 185203
Under certain (evidently rare) circumstances, this code used to convert OR(a,
AND(x, y)) into OR(a, x). This was incorrect.
While there, I've added a comment to the code immediately above.
llvm-svn: 185201
should expand ATOMIC_CMP_SWAP nodes the same way that it does for ATOMIC_SWAP.
Since ATOMIC_LOADs on some targets (e.g. older ARM variants) get legalized to
ATOMIC_CMP_SWAPs, the missing case had been causing i64 atomic loads to crash
during isel.
<rdar://problem/14074644>
llvm-svn: 185186
Fix ABI handling for function
returning bool -- use st.param.b32 to return the value
and use ld.param.b32 in caller to load the return value.
llvm-svn: 185177
This patch assigns paired GPRs for inline asm with
64-bit data on ARM. It's enabled for both ARM and Thumb to support modifiers
like %H, %Q, %R.
llvm-svn: 185169
We were generating intrinsics for NEON fixed-point conversions that didn't
exist (e.g. float -> i16). There are two cases to consider:
+ iN is smaller than float. In this case we can do the conversion but need an
extend or truncate as well.
+ iN is larger than float. In this case using the NEON conversion would be
incorrect so we don't perform any combining.
llvm-svn: 185158
The mapping between SRS pseudo-instructions and SRS native instructions was incorrect, the correct mapping is:
srsfa -> srsib
srsea -> srsia
srsfd -> srsdb
srsed -> srsda
This fixes <rdar://problem/14214734>.
llvm-svn: 185155
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.
Also update testing cases to make them conform to the format of DI classes.
llvm-svn: 185135
algorithm when assigning EnumValues to the synthesized registers.
The current algorithm, LessRecord, uses the StringRef compare_numeric
function. This function compares strings, while handling embedded numbers.
For example, the R600 backend registers are sorted as follows:
T1
T1_W
T1_X
T1_XYZW
T1_Y
T1_Z
T2
T2_W
T2_X
T2_XYZW
T2_Y
T2_Z
In this example, the 'scaling factor' is dEnum/dN = 6 because T0, T1, T2
have an EnumValue offset of 6 from one another. However, in other parts
of the register bank, the scaling factors are different:
dEnum/dN = 5:
KC0_128_W
KC0_128_X
KC0_128_XYZW
KC0_128_Y
KC0_128_Z
KC0_129_W
KC0_129_X
KC0_129_XYZW
KC0_129_Y
KC0_129_Z
The diff lists do not work correctly because different kinds of registers have
different 'scaling factors'. This new algorithm, LessRecordRegister, tries to
enforce a scaling factor of 1. For example, the registers are now sorted as
follows:
T1
T2
T3
...
T0_W
T1_W
T2_W
...
T0_X
T1_X
T2_X
...
KC0_128_W
KC0_129_W
KC0_130_W
...
For the Mips and R600 I see a 19% and 6% reduction in size, respectively. I
did see a few small regressions, but the differences were on the order of a
few bytes (e.g., AArch64 was 16 bytes). I suspect there will be even
greater wins for targets with larger register files.
Patch reviewed by Jakob.
rdar://14006013
llvm-svn: 185094
The purpose of this test was to check boundary conditions for the size
of an ALU clause. This test is very sensitive to changes to the
optimizer or scheduler, because it requires an exact number of ALU
instructions in order to remain valid. It's not good to have a test
this sensitive, because it is confusing to developers who implement
optimizations and then 'break' the test.
I'm not sure if there is a good way to test these limits using lit, but
if I can come up with replacement test that isn't as sensitive I'll add
it back to the tree.
llvm-svn: 185084
Add pseudo conditional store instructions, so that we use:
branch foo:
store
foo:
instead of:
load
branch foo:
move
foo:
store
z196 has real 32-bit and 64-bit conditional stores, but we don't use
any z196 instructions yet.
llvm-svn: 185065
When we store values for reversed induction stores we must not store the
reversed value in the vectorized value map. Another instruction might use this
value.
This fixes 3 test cases of PR16455.
llvm-svn: 185051
The Builtin attribute is an attribute that can be placed on function call site that signal that even though a function is declared as being a builtin,
rdar://problem/13727199
llvm-svn: 185049
Unfortunately this addresses two issues (by the time I'd disentangled the logic
it wasn't worth putting it back to half-broken):
+ Coprocessor instructions should all be predicable in Thumb mode.
+ BKPT should never be predicable.
llvm-svn: 184965
The assembler currently strictly verifies that immediates for
s16imm operands are in range (-32768 ... 32767). This matches
the behaviour of the GNU assembler, with one exception: gas
allows, as a special case, operands in an extended range
(-65536 .. 65535) for the addis instruction only (and its
extended mnemonic lis).
The main reason for this seems to be to allow using unsigned
16-bit operands for lis, e.g. like lis %r1, 0xfedc.
Since this has been supported by gas for a long time, and
assembler source code seen "in the wild" actually exploits
this feature, this patch adds equivalent support to LLVM
for compatibility reasons.
llvm-svn: 184946
Currently, all instructions taking s16imm operands support symbolic
operands. However, for u16imm operands, we only support actual
immediate integers. This causes the assembler to reject code like
ori %r5, %r5, symbol@l
This patch changes the u16imm operand definition to likewise
accept symbolic operands. In fact, s16imm and u16imm can
share the same encoding routine, now renamed to getImm16Encoding.
llvm-svn: 184944
This is easier to read than the internal fixed-point representation.
If anybody knows the correct algorithm for converting fixed-point
numbers to base 10, feel free to fix it.
llvm-svn: 184881
When a 1-element vector alloca is promoted, a store instruction can often be
rewritten without converting the value to a scalar and using an insertelement
instruction to stuff it into the new alloca. This patch just adds a check
to skip that conversion when it is unnecessary. This turns out to be really
important for some ARM Neon operations where <1 x i64> is used to get around
the fact that i64 is not a legal type.
llvm-svn: 184870
Note: Only adding test for evergreen, not SI yet.
When I attempted to expand vselect for SI, I got the following:
llc: /home/awatry/src/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:522:
llvm::SDValue llvm::DAGTypeLegalizer::PromoteIntRes_SETCC(llvm::SDNode*):
Assertion `SVT.isVector() == N->getOperand(0).getValueType().isVector() &&
"Vector compare must return a vector result!"' failed.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184847
No test/expansion for SI has been added yet. Attempts to expand this
operation for SI resulted in a stacktrace in (IIRC) LegalizeIntegerTypes
which was complaining about vector comparisons being required to return
a vector type.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184845
Also add lit test for both cases on SI, and v2i32 for evergreen.
Note: I followed the guidance of the v4i32 EG check... UREM produces really
complex code, so let's just check that the instruction was lowered
successfully.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184844
Also add lit test for both cases on SI, and v2i32 for evergreen.
Note: I followed the guidance of the v4i32 EG check... UDIV produces really
complex code, so let's just check that the instruction was lowered
successfully.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184843
This is a band-aid to fix the most severe regressions we're seeing from basing
spill decisions on block frequencies, until we have a better solution.
llvm-svn: 184835
This adds pattern for the rldcr and rldic instructions (the last instruction
from the rotate/shift family that were missing). They are currently used
only by the asm parser.
llvm-svn: 184833
In reality, some unaligned memory accesses are legal for 32-bit types and
smaller too, but it all depends on the address space. Allowing
unaligned loads/stores for > 32-bit types is mainly to prevent the
legalizer from splitting one load into multiple loads of smaller types.
https://bugs.freedesktop.org/show_bug.cgi?id=65873
llvm-svn: 184822
This should only make a difference in programs that use a lot of the
vector ALU instructions like BFI_INT and BIT_ALIGN. There is a slight
improvement in the phatk bitcoin mining kernel with this patch on
Evergreen (vector size == 1):
Before:
1173 Instruction Groups / 9520 dwords
After:
1167 Instruction Groups / 9510 dwords
Reviewed-by: Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184819
This adds support for the predicted forms of branches (+/-).
There are three cases to consider:
- Branches using a PPC::Predicate code
For these, I've added new PPC::Predicate codes corresponding
to the BO values for predicted branch forms, and updated insn
printing to print them correctly. I've also added new aliases
for the asm parser matching the new forms.
- bt/bf
I've added new aliases matching to gBC etc.
- bd(n)z variants
I've added new instruction patterns for the predicted forms.
In all cases, the new patterns are used for the asm parser only.
(The new infrastructure ought to be sufficient to allow use by
the compiler too at some point.)
llvm-svn: 184754
This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg
testers.
"LoopVectorize: Use the dependence test utility class
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.
We can now vectorize loops with simple constant dependence distances.
for (i = 8; i < 256; ++i) {
a[i] = a[i+4] * a[i+8];
}
for (i = 8; i < 256; ++i) {
a[i] = a[i-4] * a[i-8];
}
We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us no to) in the test suite now. Results on x86-64 are a wash.
I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSCV loop kernel that speeds up.
radar://13681598"
llvm-svn: 184724
This adds instruction patterns to cover the generic forms of
the conditional branch instructions. This allows the assembler
to support the generic mnemonics.
The compiler will still generate the various specific forms
of the instruction that were already supported.
llvm-svn: 184722
There is currently only limited support for the "absolute" variants
of branch instructions. This patch adds support for the absolute
variants of all branches that are currently otherwise supported.
This requires adding new fixup types so that the correct variant
of relocation type can be selected by the object writer.
While the compiler will continue to usually choose the relative
branch variants, this will allow the asm parser to fully support
the absolute branches, with either immediate (numerical) or
symbolic target addresses.
No change in code generation intended.
llvm-svn: 184721
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.
We can now vectorize loops with simple constant dependence distances.
for (i = 8; i < 256; ++i) {
a[i] = a[i+4] * a[i+8];
}
for (i = 8; i < 256; ++i) {
a[i] = a[i-4] * a[i-8];
}
We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us no to) in the test suite now. Results on x86-64 are a wash.
I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSCV loop kernel that speeds up.
radar://13681598
llvm-svn: 184685
Untill now we detected the vectorizable tree and evaluated the cost of the
entire tree. With this patch we can decide to trim-out branches of the tree
that are not profitable to vectorizer.
Also, increase the max depth from 6 to 12. In the worse possible case where all
of the code is made of diamond-shaped graph this can bring the cost to 2**10,
but diamonds are not very common.
llvm-svn: 184681
This makes it possible to write unit tests that are less susceptible
to minor code motion, particularly copy placement. block-placement.ll
covers this case with -pre-RA-sched=source which will soon be
default. One incorrectly named block is already fixed, but without
this fix, enabling new coalescing and scheduling would cause more
failures.
llvm-svn: 184680
This is an awful implementation of the target hook. But we don't have
abstractions yet for common machine ops, and I don't see any quick way
to make it table-driven.
llvm-svn: 184664
Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks.
It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function.
I removed the support for extracting values from trees.
We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2).
llvm-svn: 184647
Although in reality the symbol table in ELF resides in a section, the
standard requires that there be no more than one SHT_SYMTAB. To enforce
this constraint, it is cleaner to group all the symbols under a
top-level `Symbols` key on the object file.
llvm-svn: 184627
It wouldn't really test anything that doesn't already have a more
targeted test:
`yaml2obj-elf-section-basic.yaml`:
Already tests that section content is correctly passed though.
`yaml2obj-elf-symbol-basic.yaml` (this file):
Tests that the st_value and st_size attributes of `main` are set
correctly.
Between those two tests, disassembling the file doesn't really add
anything, so just remove mention of disassembling the file.
llvm-svn: 184607
This reverts commit r184602. In an upcoming commit, I will just remove
the disassembler part of the test; it was mostly just a "nifty" thing
marking a milestone but it doesn't test anything that isn't tested
elsewhere.
llvm-svn: 184606
A FastISel optimization was causing us to emit no information for such
parameters & when they go missing we end up emitting a different
function type. By avoiding that shortcut we not only get types correct
(very important) but also location information (handy) - even if it's
only live at the start of a function & may be clobbered later.
Reviewed/discussion by Evan Cheng & Dan Gohman.
llvm-svn: 184604
This was causing buildbot failures when build without X86 support.
Is there a way to conditionalize the test on the X86 target being
present?
llvm-svn: 184597
Zero is used by BlockFrequencyInfo as a special "don't know" value. It also
causes a sink for frequencies as you can't ever get off a zero frequency with
more multiplies.
This recovers a 10% regression on MultiSource/Benchmarks/7zip. A zero frequency
was propagated into an inner loop causing excessive spilling.
PR16402.
llvm-svn: 184584
The GNU assembler supports (as extension to the ABI) use of PC-relative
relocations in half16 fields, which allows writing code like:
li 1, base-.
This patch adds support for those relocation types in the assembler.
llvm-svn: 184552
The current code base only supports the minimum set of tls-related
relocations and @modifiers that are necessary to support compiler-
generated code. This patch extends this to the full set defined
in the ABI (and supported by the GNU assembler) for the benefit
of the assembler parser.
llvm-svn: 184551
This adds necessary infrastructure to support the @h modifier.
Note that all required relocation types were already present
(and unused).
This patch provides support for using @h in the assembler;
it would also be possible to now use this feature in code
generated by the compiler, but this is not done yet.
llvm-svn: 184548
The output can be in different orders, which breaks the test in some
situations. I have not yet found out what the root cause of the order
difference is. This fixes our internal build. If it is not the right
solution, feel free to roll back.
llvm-svn: 184535
Previously we unconditionally enforced that section references in
symbols in the YAML had a name that was a section name present in the
object, and linked the references to that section. Now, permit empty
section names (already the default, if the `Section` key is not
provided) to indicate SHN_UNDEF.
llvm-svn: 184513
Instead, just have 3 sub-lists, one for each of
{STB_LOCAL,STB_GLOBAL,STB_WEAK}.
This allows us to be a lot more explicit w.r.t. the symbol ordering in
the object file, because if we allowed explicitly setting the STB_*
`Binding` key for the symbol, then we might have ended up having to
shuffle STB_LOCAL symbols to the front of the list, which is likely to
cause confusion and potential for error.
Also, this new approach is simpler ;)
llvm-svn: 184506
it at the moment.
This allows to form more paired loads even when stack coloring pass destroys the
memoryoperand's value.
<rdar://problem/13978317>
llvm-svn: 184492
This is a bit tricky as the xacquire and xrelease hints use the same bytes,
0xf2 and 0xf3, as the repne and rep prefixes.
Fortunately llvm has different llvm MCInst Opcode enums for rep/xrelease
and repne/xacquire. So to make this work a boolean was added the
InternalInstruction struct as part of the Prefix state which is set with the
added logic in readPrefixes() when decoding an instruction to determine
if these prefix bytes are to be disassembled as xacquire or xrelease. Then
we let the matcher pick the normal prefix instructionID and we change the
Opcode after that when it is set into the MCInst being created.
rdar://11019859
llvm-svn: 184490
Also add a v2i32 test to the existing v4i32 test.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
llvm-svn: 184482
Also add SI tests to existing file and a v2i32 test for both
R600 and SI.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 184481
The custom lowering causes llc to crash with a segfault.
Ideally, the custom lowering can be fixed, but this allows
programs which load/store v2i32 to work without crashing.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
llvm-svn: 184480
After this patch, the ELF file produced by
`yaml2obj-elf-symbol-basic.yaml`, when linked and executed on x86_64
(under SysV ABI, obviously; I tested on Linux), produces a working
executable that goes into an infinite loop!
llvm-svn: 184469
The cdp2 instruction should have the same restrictions as cdp on the
co-processor registers.
VFP instructions on v8/AArch32 share the same encoding space as cdp2.
llvm-svn: 184445
We collect gather sequences when we vectorize basic blocks. Gather sequences are excellent
hints for vectorization of other basic blocks.
llvm-svn: 184444
The assembler parser common code supports recognizing symbol variants
using the @ modifer. On PowerPC, it should also be possible to use
(some of) those modifiers with directional labels, like "1f@l".
This patch adds support for accepting symbol variants on directional
labels as well.
llvm-svn: 184437
This patch adds support for having the assembler optimize fixups
to constructs like "symbol@ha" or "symbol@l" if "symbol" can be
resolved at assembler time.
This optimization is already present in the PPCMCExpr.cpp code
for handling PPC_HA16/PPC_LO16 target expressions. However,
those target expression were used only on Darwin targets.
This patch changes target expression code so that they are
usable also with the GNU assembler (using the @ha / @l syntax
instead of the ha16() / lo16() syntax), and changes the
MCInst lowering code to generate those target expressions
where appropriate.
It also changes the asm parser to generate HA16/LO16 target
expressions when parsing assembler source that uses the
@ha / @l modifiers. The effect is that now the above-
mentioned optimization automatically becomes available
for those situations too.
llvm-svn: 184436
Fix up three tests - one that was relying on abbreviation number,
another relying on a location list in this case (& testing raw asm,
changed that to use dwarfdump on the debug_info now that that's where
the location is), and another which was added in r184368 - exposing a
bug in that fix that is exposed when we emit the location inline rather
than through a location list. Fix that bug while I'm here.
llvm-svn: 184387
We had been papering over a problem with location info for non-trivial
types passed by value by emitting their type as references (this caused
the debugger to interpret the location information correctly, but broke
the type of the function). r183329 corrected the type information but
lead to the debugger interpreting the pointer parameter as the value -
the debug info describing the location needed an extra dereference.
Use a new flag in DIVariable to add the extra indirection (either by
promoting an existing DW_OP_reg (parameter passed in a register) to
DW_OP_breg + 0 or by adding DW_OP_deref to an existing DW_OP_breg + n
(parameter passed on the stack).
llvm-svn: 184368
This is a basic implementation - we still don't have any support (that I
know of) for dumping DWARF expressions in a meaningful way, so the
location information itself is just printed as a sequence of bytes as we
do elsewhere.
llvm-svn: 184361
The compiler occasionally generates multiple .loc directives in a row
(at the same instruction address). These need to be transformed into
multple actual .debug_line table entries, since they are used to signal
certain information to the debugger (e.g. if the opening brace of a
function body is on the same line as the declaration).
The MCAsmStreamer version of EmitDwarfLocDirective handles this
correctly by emitting a .loc directive every time it is called.
However, the MCObjectStream version simply defaults to recording
the information and emitting only a single table entry later,
e.g. when EmitInstruction is called.
This patch introduces a MCAsmStreamer::EmitDwarfLocDirective
version that emits a line table entry for a .loc directive
that may already be pending before recording the new directive.
(This is similar to how this is handled in GNU as.)
With this patch (and the code alignment factor patch) applied,
I'm now getting identical DWARF .debug sections for all test-suite
object files on PowerPC for the internal and the external assembler.
llvm-svn: 184357
Prior to this change, the considered addressing modes may be invalid since the
maximum and minimum offsets were not taking into account.
This was causing an assertion failure.
The added test case exercices that behavior.
<rdar://problem/14199725> Assertion failed: (CurScaleCost >= 0 && "Legal
addressing mode has an illegal cost!")
llvm-svn: 184341
The type <3 x i8> is a common in graphics and we want to be able to vectorize it.
This changes accelerates bullet by 12% and 471_omnetpp by 5%.
llvm-svn: 184317
"When assembling to the ARM instruction set, the .N qualifier produces
an assembler error and the .W qualifier has no effect."
In the pre-matcher handler in the asm parser the ".w" (wide) qualifier
when in ARM mode is now discarded. And an error message is now
produced when the ".n" (narrow) qualifier is used in ARM mode.
Test cases for these were added.
rdar://14064574
llvm-svn: 184224
value is zero.
This allows optmizations to kick in more easily.
Fix some test cases so that they remain meaningful (i.e., not completely dead
coded) when optimizations apply.
<rdar://problem/14096009> superfluous multiply by high part of zero-extended
value.
llvm-svn: 184222