This seems to have been a cargo-culted habit from the very first such
cache which didn't have any specific justification (but might've been a
layering constraint at the time).
llvm-svn: 206003
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.
Patch by Luqman Aden and Alex Crichton!
llvm-svn: 205997
Refactored stack-protector.ll to use new-style function attributes everywhere
and eliminated unnecessary attributes.
This cleanup is in preparation for an upcoming test change.
llvm-svn: 205996
Move the iterators into the range the same way the range's ctor moves
them into the members.
Also remove some redundant top level parens in the return statement.
llvm-svn: 205993
To support compressing the debug_line section that contains multiple
fragments (due, I believe, to variation in choices of line table
encoding depending on the size of instruction ranges in the actual
program code) we needed to support compressing multiple MCFragments in a
single pass.
This patch implements that behavior by mutating the post-relaxed and
relocated section to be the compressed form of its former self,
including renaming the section.
This is a more flexible (and less invasive, to a degree) implementation
that will allow for other features such as "use compression only if it's
smaller than the uncompressed data".
Compressing debug_frame would be a possible further extension to this
work, but I've left it for now. The hurdle there is alignment sections -
which might require going as far as to refactor
MCAssembler.cpp:writeFragment to handle writing to a byte buffer or an
MCObjectWriter (there's already a virtual call there, so it shouldn't
add substantial compile-time cost) which could in turn involve
refactoring MCAsmBackend::writeNopData to use that same abstraction...
which involves touching all the backends. This would remove the limited
handling of fragment writing seen in
ELFObjectWriter.cpp:getUncompressedData which would be nice - but it's
more invasive.
I did discover that I (perhaps obviously) don't need to handle
relocations when I rewrite the fragments - since the relocations have
already been applied and computed (and stored into
ELFObjectWriter::Relocations) by this stage (necessarily, because we
need to have written any immediate values or assembly-time relocations
into the data already before we compress it, which we have). The test
case doesn't necessarily cover that in detail - I can add more test
coverage if that's preferred.
llvm-svn: 205990
To support compression for debug_line and debug_frame a different
approach is required. To simplify review, revert the old implementation
and XFAIL the test case. New implementation to follow shortly.
Reverts r205059 and r204958.
llvm-svn: 205989
Convenience wrapper to make dealing with sub-ranges easier. Like the
iterator_range<> itself, if/when this sort of thing gets standards
blessing, it will be replaced by the official version.
llvm-svn: 205987
alignments on vld/vst instructions. And report errors for
alignments that are not supported.
While this is a large diff and an big test case, the changes
are very straight forward. But pretty much had to touch
all vld/vst instructions changing the addrmode to one of the
new ones that where added will do the proper checking for
the specific instruction.
FYI, re-committing this with a tweak so MemoryOp's default
constructor is trivial and will work with MSVC 2012. Thanks
to Reid Kleckner and Jim Grosbach for help with the tweak.
rdar://11312406
llvm-svn: 205986
This reverts commit r205974, it turns out that this wasn't such a great idea
after all. Using DIVariable as return value is self-documenting and marginally
more type safe.
llvm-svn: 205979
Summary:
Similarly, the HasMips64 on the 64-bit move InstAlias is a test for 64-bit
GPR's.
No functional change.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://reviews.llvm.org/D3263
llvm-svn: 205968
Summary:
It is now the smallest superset for these ISA's.
FeatureMips4 now contains FeatureFPIdx since [ls][dw]xc1 were added in MIPS-IV.
Made the FPIdx feature bit lowercase so that it can be used in the -mattr option.
Depends on D3274
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://reviews.llvm.org/D3275
llvm-svn: 205964
GCC 4.8 complains with:
warning: enumeral and non-enumeral type in conditional expression
Although this is silly and harmless in this case, add an explicit cast to
silence the warning.
llvm-svn: 205949
The immediate cost calculation code was hitting an assertion in the included
test case, because APInt was still internally 128-bits. Truncating it to 64-bits
fixed the issue.
Fixes <rdar://problem/16572521>.
llvm-svn: 205947
It doesn't build with MSVC 2012, because MSVC doesn't allow union
members that have non-trivial default constructors. This change added
'SMLoc AlignmentLoc' to MemoryOp, which made MemoryOp's default ctor
non-trivial.
This reverts commit r205930.
llvm-svn: 205944
As it turns out the source of the sunkaddr can be a constant, in which case
there is not an instruction to delete, causing the cleanup code introduced in
r204833 to crash. This patch adds a dynamic check to ensure the deleted value is
in fact an instruction and not a constant.
Patch by Louis Gerbarg <lgg@apple.com>
llvm-svn: 205941
AVX supports logical operations using an operand from memory. Unfortunately
because integer operations were not added until AVX2 the AVX1 logical
operation's types were preventing the isel from folding the loads. In a limited
number of cases the peephole optimizer would fold the loads, but most were
missed. This patch adds explicit patterns with appropriate casts in order for
these loads to be folded.
The included test cases run on reduced examples and disable the peephole
optimizer to ensure the folds are being pattern matched.
Patch by Louis Gerbarg <lgg@apple.com>
rdar://16355124
llvm-svn: 205938
FoldConstantArithmetic() only knows how to deal with a few target independent
ISD opcodes. Bail early if it sees a target-specific ISD node. These node do
funny things with operand types which may break the assumptions of the code
that follows, and there's no actual folding that can be done anyway. For example,
non-constant 256 bit vector shifts on X86 have a shift-amount operand that's a
128-bit v4i32 vector regardless of what the first operand type is and that breaks
the assumption that the operand types must match.
rdar://16530923
llvm-svn: 205937
alignments on vld/vst instructions. And report errors for
alignments that are not supported.
While this is a large diff and an big test case, the changes
are very straight forward. But pretty much had to touch
all vld/vst instructions changing the addrmode to one of the
new ones that where added will do the proper checking for
the specific instruction.
rdar://11312406
llvm-svn: 205930
In AArch64 i64 to i32 truncate operation is a subregister access.
This allows more opportunities for LSR optmization to eliminate
variables of different types (i32 and i64).
llvm-svn: 205925
sign/zero/any extensions. However a few places were not checking properly the
property of the load and were turning an indexed load into a regular extended
load. Therefore the indexed value was lost during the process and this was
triggering an assertion.
<rdar://problem/16389332>
llvm-svn: 205923
Don't quote octal compatible strings if they are only two wide, they
aren't ambiguous.
This reverts commit r205857 which reverted r205857.
llvm-svn: 205914
obj2yaml would fail when seeing a Weak External auxiliary record with a
characteristics field holding zero instead of one of
IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY, IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY,
or IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY.
llvm-svn: 205911
This commit adds intrinsics and codegen support for the surface read/write and texture read instructions that take an explicit sampler parameter. Codegen operates on image handles at the PTX level, but falls back to direct replacement of handles with kernel arguments if image handles are not enabled. Note that image handles are explicitly disabled for all target architectures in this change (to be enabled later).
llvm-svn: 205907
This also fixes a bug in the annotation cache where the cache will not be cleared between modules if multiple modules are compiled in the same process.
llvm-svn: 205905
Summary:
They behave in accordance with the Has2008 and ABS2008 configuration bits of the processor which are used to select between the 1985 and 2008 versions of IEEE 754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid operation exceptions when given NaN), in 2008 mode they are non-arithmetic (i.e. they are copies).
nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because the ISA spec does not explicitly state that they obey Has2008 and ABS2008.
Fixed the issue with the previous version of this patch (r205628). A pre-existing 'let Predicate =' statement was removing some predicates that were necessary for FP64 to behave correctly.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3274
llvm-svn: 205844
YAMLIO would turn a BinaryRef into the string 0000000004000000.
However, the leading zero causes parsers to interpret it as being an
octal number instead of a hexadecimal one.
Instead, escape such strings as needed.
llvm-svn: 205839
Add type name mappings for the ARM COFF relocations. This allows for objdump to
provide a more useful description of relocations in disassembly inline form.
llvm-svn: 205834
Summary:
Local common symbols were properly inserted into the .bss section.
However, putting external common symbols in the .bss section would give
them a strong definition.
Instead, encode them as undefined, external symbols who's symbol value
is equivalent to their size.
Reviewers: Bigcheese, rafael, rnk
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3324
llvm-svn: 205811
:doc:`...` and :ref:`...` links help Sphinx keep track the dependencies
between documents and ensure that they are not pointing to nowhere.
Raw HTML links work just fine and are easier for people less familiar
with reST/Sphinx. They are easy to change over to the :doc:/:ref: style
after the fact so this is not a problem.
This commit doesn't fix all of them.
llvm-svn: 205792
This implements the target-hooks for ARM64 to enable constant hoisting.
This fixes <rdar://problem/14774662> and <rdar://problem/16381500>.
llvm-svn: 205791
Until r197284, the entry frequency was constant -- i.e., set to 2^14.
Although current ToT still has a constant entry frequency, since r197284
that has been an implementation detail (which is soon going to change).
- r204690 made the wrong assumption for the CSRCost metric. Adjust
callee-saved register cost based on entry frequency.
- r185393 made the wrong assumption (although it was valid at the
time). Update SpillPlacement.cpp::Threshold to be relative to the
entry frequency.
Since ToT still has 2^14 entry frequency, this should have no observable
functionality change.
<rdar://problem/14292693>
llvm-svn: 205789
PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle
predicate should be false.
Noticed by inspection; no test case (yet).
llvm-svn: 205787
Summary:
This adds support in 'opt' to filter pass remarks emitted by
optimization passes. A new flag -pass-remarks specifies which
passes should emit a diagnostic when LLVMContext::emitOptimizationRemark
is invoked.
This will allow the front end to simply pass along the regular
expression from its own -Rpass flag when launching the backend.
Depends on D3227.
Reviewers: qcolombet
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3291
llvm-svn: 205775
Summary:
This patch adds backend support for -Rpass=, which indicates the name
of the optimization pass that should emit remarks stating when it
made a transformation to the code.
Pass names are taken from their DEBUG_NAME definitions.
When emitting an optimization report diagnostic, the lack of debug
information causes the diagnostic to use "<unknown>:0:0" as the
location string.
This is the back end counterpart for
http://llvm-reviews.chandlerc.com/D3226
Reviewers: qcolombet
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3227
llvm-svn: 205774
Confusingly, the NEON fmla instructions put the accumulator first but the
scalar versions put it at the end (like the fma lib function & LLVM's
intrinsic).
This should fix PR19345, assuming there's only one issue.
llvm-svn: 205758
indirectly requires a function analysis.
This bug was reported by Jason Kim. He included a test case here:
http://reviews.llvm.org/D3312
llvm-svn: 205753
Before, we would have conditional operators where one side of the
operator would be of type RelocationTypeAMD64 and the other is of type
RelocationTypeI386. GCC would noisly warn with -Wenum-compare
diagnostic.
Instead, refactor the code so it is more like the X86 ELF object writer.
llvm-svn: 205752
The IO normalizer would essentially lump I386 and AMD64 relocations
together. Relocation types with the same numeric value would then get
mapped in appropriately.
For example:
IMAGE_REL_AMD64_ADDR64 and IMAGE_REL_I386_DIR16 both have a numeric
value of one. We would see IMAGE_REL_I386_DIR16 in obj2yaml conversions
of object files with a machine type of IMAGE_FILE_MACHINE_AMD64.
llvm-svn: 205746
Fixes PR16365 - Extremely slow compilation in -O1 and -O2.
The SD scheduler has a quadratic implementation of load clustering
which absolutely blows up compile time for large blocks with constant
pool loads. The MI scheduler has a better implementation of load
clustering. However, we have not done the work yet to completely
eliminate the SD scheduler. Some benchmarks still seem to benefit from
early load clustering, although maybe by chance.
As an intermediate term fix, I just put a nice limit on the number of
DAG users to search before finding a match. With this limit there are no
binary differences in the LLVM test suite, and the PR16365 test case
does not suffer any compile time impact from this routine.
llvm-svn: 205738
Moving these patterns from TableGen files to PerformDAGCombine()
should allow us to generate better code by eliminating unnecessary
shifts and extensions earlier.
This also fixes a bug where the MAD pattern was calling
SimplifyDemandedBits with a 24-bit mask on the first operand
even when the full pattern wasn't being matched. This occasionally
resulted in some instructions being incorrectly deleted from the
program.
v2:
- Fix bug with 64-bit mul
llvm-svn: 205731
Using this file would result in an odr violation: it defines an llvm::Interval
class that conflicts with the one in Analysis/Interval.h.
llvm-svn: 205726
into a constant size alloca by inlining.
Ran a run over the testsuite, no results out of the noise, fixes
the testcase in the PR.
PR19115.
llvm-svn: 205710
cygwin has llvm-dwarfdump problems and isn't paying attention to the
specific xfail there.
s390x isn't matching for an unknown reason.
llvm-svn: 205708
- take->release: LLVM has moved to C++11. MockWrapper became an instance of unique_ptr.
- method symbol_iterator::increment disappeared recently, in this revision:
r200442 | rafael | 2014-01-29 20:49:50 -0600 (Wed, 29 Jan 2014) | 9 lines
Simplify the handling of iterators in ObjectFile.
None of the object file formats reported error on iterator increment. In
retrospect, that is not too surprising: no object format stores symbols or
sections in a linked list or other structure that requires chasing pointers.
As a consequence, all error checking can be done on begin() and end().
This reduces the text segment of bin/llvm-readobj in my machine from 521233 to
518526 bytes.
My change mimics the change that the revision made to lib/DebugInfo/DWARFContext.cpp .
- const_cast: Shut up a warning from gcc.
I ran unittests/ExecutionEngine/JIT/Debug+Asserts/JITTests to make sure it worked.
- Arch
llvm-svn: 205689
It affected callee's stack pop in x86. It is one of devergences between cygwin and mingw since mingw-gcc-4.6.
Added testcases to llvm/test/CodeGen/X86/win32_sret.ll for cygwin.
llvm-svn: 205688
I really should read the spec more often (and test GCC more often too).
I just assumed that namespace aliases would be the same as using
directives, except with a name. But apparently that's not how the DWARF
standards suggests they be implemented. DWARF4 provides an example and
other non-normative text suggesting that namespace aliases be
implemented by named imported declarations intsead of named imported
modules.
So be it.
llvm-svn: 205685
Also update a few null pointers in this function to be consistent with
new null pointers being added.
Patch by Robert Matusewicz!
Differential Revision: http://reviews.llvm.org/D3123
llvm-svn: 205682
This adds a warning when linker_private or linker_private_weak is provided and
we handle it in a compatible manner.
Suggested by Chris Lattner!
llvm-svn: 205681
Member functions defined within a class definition are implicitly
'inline' for linkage purposes. Compilers might slightly favor inlining
functions explicitly marked 'inline', but LLVM doesn't make a stylistic
habit of doing this generally.
llvm-svn: 205679
This consolidates the duplicated MachO checks in the directive parsing for
various directives that are unsupported for Mach-O. The error message change is
unimportant as this restores the behaviour to that prior to the addition of the
new directive handling. Furthermore, use a more direct check for MachO
targeting rather than an indirect feature check of the assembler.
Also simplify the test execution command to avoid temporary files. Further more,
perform the check in both object and assembly emission.
Whether all non-applicable directives are handled is another question. .fnstart
is marked as being unsupported, however, the complementary .fnend is not. The
additional unwinding directives are also still honoured. This change does not
change that, though, it would be good to validate and mark them as being
unsupported if they are unsupported for the MachO emission.
llvm-svn: 205678
This avoids an extra copy during decompression and avoids the use of
MemoryBuffer which is a weirdly esoteric device that includes unrelated
concepts like "file name" (its rather generic name is a bit misleading).
Similar refactoring of zlib::compress coming up.
llvm-svn: 205676
This restores the linker_private and linker_private_weak lexemes to permit
translation of the deprecated lexmes. The behaviour is identical to the bitcode
handling: linker_private and linker_private_weak are handled as if private had
been specified. This enables compatibility with IR generated by LLVM 3.4.
Reported on IRC by ki9a!
llvm-svn: 205675
This provides more realistic costs for the insert/extractelement instructions
(which are load/store pairs), accounts for the cheap unaligned Altivec load
sequence, and for unaligned VSX load/stores.
Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown
Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup
Unfortunately, estimating the costs of the stack-based scalarization sequences
is hard, and adjusting these costs is like a game of whac-a-mole :( I'll
revisit this again after we have better codegen for vector extloads and
truncstores and unaligned load/stores.
llvm-svn: 205658
This patch adds the Octeon cnMips instructions seqi/snei and v3mulu/vmm0/vmulu.
It is only for the assembler. Test case is included.
Reviewed by: Daniel.Sanders@imgtec.com
llvm-svn: 205631
gcc inline asm supports specifying "cc" as a clobber of all condition
registers. Add just enough modeling of the full register to make this work.
Fixed PR19326.
llvm-svn: 205630
Summary:
They behave in accordance with the Has2008 and ABS2008 configuration bits of the
processor which are used to select between the 1985 and 2008 versions of IEEE
754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid
operation exceptions when given NaN), in 2008 mode they are non-arithmetic
(i.e. they are copies).
nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because
the ISA spec does not explicitly state that they obey Has2008 and ABS2008.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3274
llvm-svn: 205628
When LLVM sees something like (v1iN (vselect v1i1, v1iN, v1iN)) it can
decide that the result is OK (v1i64 is legal on AArch64, for example)
but it still need scalarising because of that v1i1. There was no code
to do this though.
AArch64 and ARM64 have DAG combines to produce efficient code and
prevent that occuring in *most* such situations, but there are edge
cases that they miss. This adds a legalization to cope with that.
llvm-svn: 205626
There were several overlapping problems here, and this solution is
closely inspired by the one adopted in AArch64 in r201381.
Firstly, scalarisation of v1i1 setcc operations simply fails if the
input types are legal. This is fixed in LegalizeVectorTypes.cpp this
time, and allows AArch64 code to be simplified slightly.
Second, vselect with such a setcc feeding into it ends up in
ScalarizeVectorOperand, where it's not handled. I experimented with an
implementation, but found that whatever DAG came out was rather
horrific. I think Hao's DAG combine approach is a good one for
quality, though there are edge cases it won't catch (to be fixed
separately).
Should fix PR19335.
llvm-svn: 205625
Removed "GNU Assembler extension (compatibility)" definitions from ARMInstrInfo.td
Fixed ARMAsmParser::ParseInstruction GNU compatability branch, so it also works for thumb mode from now.
Added new tests.
llvm-svn: 205622
Sorry for the breakage.
For now, it will fail in two ways:
1. To fail for targeting x86_64-mingw32.
<stdin>:131:8: note: possible intended match here
0x30830a0100000002 3 0 1 0 0 is_stmt
2. To fail not to find the target x86.
llc: : error: unable to get target for 'x86_64-unknown-unknown',
see --version and --triple.
llvm-svn: 205621
The previous patterns directly inserted FMOV or INS instructions into
the DAG for scalar_to_vector & bitconvert patterns. This is horribly
inefficient and can generated lots more GPR <-> FPR register traffic
than necessary.
It's much better to emit instructions the register allocator
understands so it can coalesce the copies when appropriate.
It led to at least one ISelLowering hack to avoid the problems, which
was incorrect for v1i64 (FPR64 has no dsub). It can now be removed
entirely.
This should also fix PR19331.
llvm-svn: 205616
Without this change, the llvm_unreachable kicked in. The code pattern
being spotted is rather non-canonical for 128-bit MLAs, but it can
happen and there's no point in generating sub-optimal code for it just
because it looks odd.
Should fix PR19332.
llvm-svn: 205615
recoloring cut-offs are encountered and register allocation failed.
This is related to PR18747
Patch by MAYUR PANDEY <mayur.p@samsung.com>.
llvm-svn: 205601
Removes unnecessary casts from non-generic address spaces to the generic address
space for certain code patterns.
Patch by Jingyue Wu.
llvm-svn: 205571
When rematerializing through truncates, the coalescer may produce instructions
with dead defs, but live implicit-defs of subregs:
E.g.
%X1<def,dead> = MOVi64imm 2, %W1<imp-def>; %X1:GPR64, %W1:GPR32
These instructions are live, and their definitions should not be rewritten.
Fixes <rdar://problem/16492408>
llvm-svn: 205565
Acording to AMD documentation, the correct opcode for
BFE_INT is 0x5, not 0x4
Fixes Arithm/Absdiff.Mat/3 OpenCV test
Patch by: Bruno Jiménez
llvm-svn: 205562
these is very much off and is more than just the branch
from this bug incorrect:
Address Line Column File ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x30830a0100000002 3 0 1 0 0 is_stmt
0x30830a0100000008 3 0 1 0 0 is_stmt end_sequence
llvm-svn: 205551
More updating of tests to be explicit about the target triple rather than
relying on the default target triple supporting ARM mode.
Indicate to lit that object emission is not yet available for Windows on ARM.
llvm-svn: 205545
This changes the tests that were targeting ARM EABI to explicitly specify the
environment rather than relying on the default. This breaks with the new
Windows on ARM support when running the tests on Windows where the default
environment is no longer EABI.
Take the opportunity to avoid a pointless redirect (helps when trying to debug
with providing a command line invocation which can be copy and pasted) and
removing a few greps in favour of FileCheck.
llvm-svn: 205541
Implementing this via ComputeMaskedBits has two advantages:
+ It actually works. DAGISel doesn't deal with the chains properly
in the previous pattern-based solution, so they never trigger.
+ The information can be used in other DAG combines, as well as the
trivial "get rid of truncs". For example if the trunc is in a
different basic block.
rdar://problem/16227836
llvm-svn: 205540
Summary:
test/MC/Mips/<isa1>/invalid-<isa2>.s
Test that <isa1> does not support <isa2>'s instructions.
test/MC/Mips/<isa1>/invalid-<isa2>-xfail.s
Things that should be invalid but currently aren't. Will XPASS if any
become invalid.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3262
llvm-svn: 205538
The terminal barrier of a cmpxchg expansion will be either Acquire or
SequentiallyConsistent. In either case it can be skipped if the
operation has Monotonic requirements on failure.
rdar://problem/15996804
llvm-svn: 205535
Summary:
Adds the 'mips4' processor and a simple test of the ELF e_flags.
Patch by David Chisnall
His work was sponsored by: DARPA, AFRL
I made one small change to the testcase so that it uses
mips64-unknown-linux instead of mips4-unknown-linux.
This patch indirectly adds FeatureCondMov to FeatureMips64. This is ok
because it's supposed to be there anyway and it turns out that
FeatureCondMov is not a predicate of any instructions at the moment
(this is a bug that hasn't been noticed because there are no targets
without the conditional move instructions yet).
CC: theraven
Differential Revision: http://llvm-reviews.chandlerc.com/D3244
llvm-svn: 205530
llc doesn't generate nodes for unconditional fall-through branches for targets
without FastISel implementation (X86 has it, but can be disabled by
"-fast-isel=false") in SelectionDAGBuilder::visitBr().
So for line 4 in the following testcase
1: void foo(int i){
2: switch(i){
3: default:
4: break;
5: }
6: return;
7: }
there is no corresponding line in .debug_line section, and a debugger
cannot set a breakpoint at line 4.
Fix this by always emitting a branch when we're not optimizing and add a
testcase to ensure that there's code on every line we'd want to break.
Patch by Daniil Fukalov.
llvm-svn: 205529
The previous situation where ATOMIC_LOAD_WHATEVER nodes were expanded
at MachineInstr emission time had grown to be extremely large and
involved, to account for the subtly different code needed for the
various flavours (8/16/32/64 bit, cmpxchg/add/minmax).
Moving this transformation into the IR clears up the code
substantially, and makes future optimisations much easier:
1. an atomicrmw followed by using the *new* value can be more
efficient. As an IR pass, simple CSE could handle this
efficiently.
2. Making use of cmpxchg success/failure orderings only has to be done
in one (simpler) place.
3. The common "cmpxchg; did we store?" idiom can be exposed to
optimisation.
I intend to gradually improve this situation within the ARM backend
and make sure there are no hidden issues before moving the code out
into CodeGen to be shared with (at least ARM64/AArch64, though I think
PPC & Mips could benefit too).
llvm-svn: 205525
The trouble as in ARMAsmParser, in ParseInstruction method. It assumes that ARM::R12 + 1 == ARM::SP.
It is wrong, since ARM::<Register> codes are generated by tablegen and actually could be any random numbers.
llvm-svn: 205524
add operation since extract_vector_elt can perform an extend operation. Get the input lane
type from the vector on which we're performing the vpaddl operation on and extend or
truncate it to the output type of the original add node.
llvm-svn: 205523
%highest(sym1 - sym2 + const) relocations. Remove "ABS_" from VK_Mips_HI
and VK_Mips_LO enums in MipsMCExpr, to be consistent with VK_Mips_HIGHER
and VK_Mips_HIGHEST.
This change also deletes test file test/MC/Mips/higher_highest.ll and moves
its CHECK's to the new test file test/MC/Mips/higher-highest-addressing.s.
The deleted file tests that R_MIPS_HIGHER and R_MIPS_HIGHEST relocations are
emitted in the .o file. Since it uses -force-mips-long-branch option, it was
created when MipsLongBranch's implementation was emitting R_MIPS_HIGHER and
R_MIPS_HIGHEST relocations in the .o file. It was disabled when MipsLongBranch
started to directly calculate offsets.
Differential Revision: http://llvm-reviews.chandlerc.com/D3230
llvm-svn: 205522
Switching between i32 and i64 based on the LHS type is a good idea in
theory, but pre-legalisation uses i64 regardless of our choice,
leading to potential ISel errors.
Should fix PR19294.
llvm-svn: 205519
While we were encoding 64 bit values (data8) in the subrange itself,
using a 32 bit type for the subrange was still confusing the gdb. Oh,
and make it unsigned too.
As the comment points out, this could be pushed into the frontend so
that it would be 32 or 64 bit as appropriate, etc.
llvm-svn: 205512
I should have read that comment a little more carefully. ;)
Regression test in the works, committing in the mean time to un-break people.
llvm-svn: 205511
This has the following advantages:
* Less code.
* The old ELF implementation was wrong for non-relocatable objects.
* The old ELF implementation (and I think MachO) was wrong for thumb.
No current testcase since this is only used from MCJIT and it only uses
relocatable objects and I don't think it supports thumb yet.
llvm-svn: 205508
When a vector type legalizes to a larger vector type, and the target does not
support the associated extending load (or truncating store), then legalization
will scalarize the load (or store) resulting in an associated scalarization
cost. BasicTTI::getMemoryOpCost needs to account for this.
Between this, and r205487, PowerPC on the P7 with VSX enabled shows:
MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup
SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup
SingleSource/UnitTests/Vectorizer/gcc-loops 28% speedup
(some of these are new; some of these, such as PAQ8p, just reverse regressions
that VSX support would trigger)
llvm-svn: 205495
This reverts commit r205479.
It turns out that nm does use addresses, it is just that every reasonable
relocatable ELF object has sections with address 0. I have no idea if those
exist in reality, but it at least it shows that llvm-nm should use the name
address.
The added test was includes an unusual .o file with non 0 section addresses. I
created it by hacking ELFObjectWriter.cpp.
Really sorry for the churn.
llvm-svn: 205493
TargetInstrInfo::findCommutedOpIndices to enable VFMA*231 commutation, rather
than abusing commuteInstruction.
Thanks very much for the suggestion guys!
llvm-svn: 205489