David Blaikie's commits r217563 & r217564, which added shared_ptr to the
CostPool have fixed some memory leak issues exposed by the PBQP with
coalescing constraints.
The sanitizer bot was failing because of those leaks. Now that the leaks
are gone, we can reenable the aarch64/pbqp test.
llvm-svn: 217580
We used to crash processing any relevant @llvm.assume on a 32-bit target
(because we'd ask SE to subtract expressions of differing types). I've copied
our 'simple.ll' test, but with the data layout from arm-linux-gnueabihf to get
some meaningful test coverage here.
llvm-svn: 217574
Leveraging both intrusive shared_ptr-ing (std::enable_shared_from_this)
and shared_ptr<T>-owning-U (to allow external users to hold
std::shared_ptr<CostT> while keeping the underlying PoolEntry alive).
The intrusiveness could be removed if we had a weak_set that implicitly
removed items from the set when their underlying data went away.
This /might/ fix an existing memory leak reported by LeakSanitizer in
r217504.
llvm-svn: 217563
Need to convert the 64 element offset into bytes, not just the element
size like the normal case instructions.
Noticed by inspection. This can't be hit now because
st64 instructions aren't emitted during instruction selection,
and the post-RA scheduler isn't enabled.
llvm-svn: 217560
With this a DataLayoutPass can be reused for multiple modules.
Once we have doInitialization/doFinalization, it doesn't seem necessary to pass
a Module to the constructor.
Overall this change seems in line with the idea of making DataLayout a required
part of Module. With it the only way of having a DataLayout used is to add it
to the Module.
llvm-svn: 217548
The routine that determines an alignment given some SCEV returns zero if the
answer is unknown. In a case where we could determine the increment of an
AddRec but not the starting alignment, we would compute the integer modulus by
zero (which is illegal and traps). Prevent this by returning early if either
the start or increment alignment is unknown (zero).
llvm-svn: 217544
The increase of the interleave factor to 4 has side-effects
like performance losses eg. due to reminder loops being executed
more frequently and may increase code size. It requires more
analysis and careful heuristic tuning. Expect double digit gains
in small benchmarks like lowercase.c and losses in puzzle.c.
llvm-svn: 217540
Summary:
Make CallingConv::ID a plain unsigned instead of enum with a
fixed set of valus. LLVM IR allows arbitraty calling conventions (you are
free to write cc12345), and loading them as enum is an undefined
behavior. This was reported by UBSan.
Test Plan: llvm regression test suite
Reviewers: nicholas
Reviewed By: nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5248
llvm-svn: 217529
"Unroll" is not the appropriate name for this variable. Clang already uses
the term "interleave" in pragmas and metadata for this.
Differential Revision: http://reviews.llvm.org/D5066
llvm-svn: 217528
Noticed while trying to understand how the merge of forward decalred types
and defintions work.
Reviewers: echristo, dblaikie, aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5291
llvm-svn: 217514
This adds target specific support for using the PBQP register allocator on the
AArch64, for the A57 cpu.
By default, the PBQP allocator is not used, unless explicitely required
on the command line with "-aarch64-pbqp".
llvm-svn: 217504
using static relocation model and small code model.
Summary: currently we generate GOT based relocations for weak symbol
references regardless of the underlying relocation model. This should
be change so that in static relocation model we use a constant pool
load instead.
Patch from: Keith Walker
Reviewers: Renato Golin, Tim Northover
llvm-svn: 217503
The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which
translates to STMDB, so we shouldn't convert STMIAs.
Patch by Sergey Dmitrouk.
llvm-svn: 217498
This adds support for reading the "bigobj" variant of COFF produced by
cl's /bigobj and mingw's -mbig-obj.
The most significant difference that bigobj brings is more than 2**16
sections to COFF.
bigobj brings a few interesting differences with it:
- It doesn't have a Characteristics field in the file header.
- It doesn't have a SizeOfOptionalHeader field in the file header (it's
only used in executable files).
- Auxiliary symbol records have the same width as a symbol table entry.
Since symbol table entries are bigger, so are auxiliary symbol
records.
Write support will come soon.
Differential Revision: http://reviews.llvm.org/D5259
llvm-svn: 217496
This fixes the generation of broken LLVMExports.cmake file by
the Autoconf/Makefile build system when --enable-shared is passed to
configure.
When --enable_shared is passed the Makefile.rules does not set the
LLVMConfigLibs variable which cmake/modules/Makefile previously relied
on. Now it runs the llvm-config command itself to get the library names.
This still isn't perfect because the generated LLVM targets refer to the
static libraries and not the shared library but that is much larger
problem to fix.
llvm-svn: 217484
It appears that the -filename-equivalence option for testing llvm-cov
doesn't work correctly with -show-expansions. I'm reverting this test
to get the bots green while I look into fixing that.
This partially reverts r217476
llvm-svn: 217478
This commit adds aliases for the sync instruction (synciobdma,
syncs, syncw, syncws) which are used by the Octeon CPU.
Reviewed by D. Sanders
llvm-svn: 217477
So that the two operations in DwarfDebug couldn't get separated (because
I accidentally separated them in some work in progress), put them
together. While we're here, move DwarfUnit::addRange to
DwarfCompileUnit, since it's not relevant to type units.
llvm-svn: 217468
PrevSection/PrevCU are used to detect holes in the address range of a CU
to ensure the DW_AT_ranges does not include those holes. When we see a
function with no debug info, though it may be in the same range as the
prior and subsequent functions, there should be a gap in the CU's
ranges. By setting PrevCU to null in that case, the range would not be
extended to cover the gap.
llvm-svn: 217466
This is the plugin version of pr20882.
This handles the case of every common symbol being in the IR. We will need some
support from gold to handle the case where some symbols are in ELF and some in
the IR.
llvm-svn: 217458
This is a first pass at a scheduling model for Jaguar.
It's structured largely on the existing SandyBridge and SLM sched models.
Using this model, in addition to turning on the PostRA scheduler, results in
some perf wins on internal and 3rd party benchmarks. There's not much difference
in LLVM's test-suite benchmarking subset of tests.
Differential Revision: http://reviews.llvm.org/D5229
llvm-svn: 217457
Summary:
This directive is used to reset the assembler options to their initial values.
Assembly programmers use it in conjunction with the ".set mipsX" directives.
This patch depends on the .set push/pop directive (http://reviews.llvm.org/D4821).
Contains work done by Matheus Almeida.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D4957
llvm-svn: 217438
Summary:
This patch moves the profile reading logic out of the Sample Profile
transformation into a generic profile reader facility in
lib/ProfileData.
The intent is to use this new reader to implement a sample profile
reader/writer that can be used to convert sample profiles from external
sources into LLVM.
This first patch introduces no functional changes. It moves the profile
reading code from lib/Transforms/SampleProfile.cpp into
lib/ProfileData/SampleProfReader.cpp.
In subsequent patches I will:
- Add a bitcode format for sample profiles to allow for more efficient
encoding of the profile.
- Add a writer for both text and bitcode format profiles.
- Add a 'convert' command to llvm-profdata to be able to convert between
the two (and serve as entry point for other sample profile formats).
Reviewers: bogner, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5250
llvm-svn: 217437
Summary:
The GPR size is more a property of the subtarget than that of the ABI so move
this information to the MipsSubtarget.
No functional change.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5009
llvm-svn: 217436
Summary:
In AT&T annotation for both x86_64 and x32 calls should be printed as
callq in assembly. It's only a matter of correct mnemonic, object output
is ok.
Test Plan: trivial test added
Reviewers: nadav, dschuff, craig.topper
Subscribers: llvm-commits, zinovy.nis
Differential Revision: http://reviews.llvm.org/D5213
llvm-svn: 217435
Summary:
These directives are used to save the current assembler options (in the case of ".set push") and restore the previously saved options (in the case of ".set pop").
Contains work done by Matheus Almeida.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D4821
llvm-svn: 217432
This solves the problem of having a kill flag inside a loop
with a definition of the register prior to the loop:
%vreg368<def> ...
Inside loop:
%vreg520<def> = COPY %vreg368
%vreg568<def,tied1> = add %vreg341<tied0>, %vreg520<kill>
=> was coalesced into =>
%vreg568<def,tied1> = add %vreg341<tied0>, %vreg368<kill>
MachineVerifier then complained:
*** Bad machine code: Virtual register killed in block, but needed live out. ***
The kill flag for %vreg368 is incorrect, and is cleared by this patch.
This is similar to the clearing done at the end of
MachineSinking::SinkInstruction().
Patch provided by Jonas Paulsson.
Reviewed by Quentin Colombet and Juergen Ributzka.
llvm-svn: 217427
llvm-cov had a SourceRange type that was nearly identical to a
CountedRegion except that it shaved off a couple of fields. There
aren't likely to be enough of these for the minor memory savings to be
worth the extra complexity here.
llvm-svn: 217417
When compiling without SSE2, isTruncStoreLegal(F64, F32) would return Legal, whereas with SSE2 it would return Expand. And since the Target doesn't seem to actually handle a truncstore for double -> float, it would just output a store of a full double in the space for a float hence overwriting other bits on the stack.
Patch by Luqman Aden!
llvm-svn: 217410
Previously, fast-isel would not clean up after failing to select a call
instruction, because it would have called flushLocalValueMap() which moves
the insertion point, making SavedInsertPt in selectInstruction() invalid.
Fixing this by making SavedInsertPt a member variable, and having
flushLocalValueMap() update it.
This removes some redundant code at -O0, and more importantly fixes PR20863.
Differential Revision: http://reviews.llvm.org/D5249
llvm-svn: 217401
to make sure we don't do invalid load of an enum. Share the
conversion code between llvm::Module implementation and the
verifier.
This bug was reported by UBSan.
llvm-svn: 217395
Assert in scheduler from an inserted copy_to_regclass from
a constant.
This only seems to break sometimes when a constant initializer
address is forced into VGPRs in a non-entry block. No test
since the only case I've managed to hit only happens with a future
patch, and that case will also not be a problem once scalar instructions
are used in non-entry blocks.
llvm-svn: 217380
This adds a basic (but important) use of @llvm.assume calls in ScalarEvolution.
When SE is attempting to validate a condition guarding a loop (such as whether
or not the loop count can be zero), this check should also include dominating
assumptions.
llvm-svn: 217348
From a combination of @llvm.assume calls (and perhaps through other means, such
as range metadata), it is possible that all bits of a return value might be
known. Previously, InstCombine did not check for this (which is understandable
given assumptions of constant propagation), but means that we'd miss simple
cases where assumptions are involved.
llvm-svn: 217346
This change teaches LazyValueInfo to use the @llvm.assume intrinsic. Like with
the known-bits change (r217342), this requires feeding a "context" instruction
pointer through many functions. Aside from a little refactoring to reuse the
logic that turns predicates into constant ranges in LVI, the only new code is
that which can 'merge' the range from an assumption into that otherwise
computed. There is also a small addition to JumpThreading so that it can have
LVI use assumptions in the same block as the comparison feeding a conditional
branch.
With this patch, we can now simplify this as expected:
int foo(int a) {
__builtin_assume(a > 5);
if (a > 3) {
bar();
return 1;
}
return 0;
}
llvm-svn: 217345
This adds a ScalarEvolution-powered transformation that updates load, store and
memory intrinsic pointer alignments based on invariant((a+q) & b == 0)
expressions. Many of the simple cases we can get with ValueTracking, but we
still need something like this for the more complicated cases (such as those
with an offset) that require some algebra. Note that gcc's
__builtin_assume_aligned's optional third argument provides exactly for this
kind of 'misalignment' offset for which this kind of logic is necessary.
The primary motivation is to fixup alignments for vector loads/stores after
vectorization (and unrolling). This pass is added to the optimization pipeline
just after the SLP vectorizer runs (which, admittedly, does not preserve SE,
although I imagine it could). Regardless, I actually don't think that the
preservation matters too much in this case: SE computes lazily, and this pass
won't issue any SE queries unless there are any assume intrinsics, so there
should be no real additional cost in the common case (SLP does preserve DT and
LoopInfo).
llvm-svn: 217344
This builds on r217342, which added the infrastructure to compute known bits
using assumptions (@llvm.assume calls). That original commit added only a few
patterns (to catch common cases related to determining pointer alignment); this
change adds several other patterns for simple cases.
r217342 contained that, for assume(v & b = a), bits in the mask
that are known to be one, we can propagate known bits from the a to v. It also
had a known-bits transfer for assume(a = b). This patch adds:
assume(~(v & b) = a) : For those bits in the mask that are known to be one, we
can propagate inverted known bits from the a to v.
assume(v | b = a) : For those bits in b that are known to be zero, we can
propagate known bits from the a to v.
assume(~(v | b) = a): For those bits in b that are known to be zero, we can
propagate inverted known bits from the a to v.
assume(v ^ b = a) : For those bits in b that are known to be zero, we can
propagate known bits from the a to v. For those bits in
b that are known to be one, we can propagate inverted
known bits from the a to v.
assume(~(v ^ b) = a) : For those bits in b that are known to be zero, we can
propagate inverted known bits from the a to v. For those
bits in b that are known to be one, we can propagate
known bits from the a to v.
assume(v << c = a) : For those bits in a that are known, we can propagate them
to known bits in v shifted to the right by c.
assume(~(v << c) = a) : For those bits in a that are known, we can propagate
them inverted to known bits in v shifted to the right by c.
assume(v >> c = a) : For those bits in a that are known, we can propagate them
to known bits in v shifted to the right by c.
assume(~(v >> c) = a) : For those bits in a that are known, we can propagate
them inverted to known bits in v shifted to the right by c.
assume(v >=_s c) where c is non-negative: The sign bit of v is zero
assume(v >_s c) where c is at least -1: The sign bit of v is zero
assume(v <=_s c) where c is negative: The sign bit of v is one
assume(v <_s c) where c is non-positive: The sign bit of v is one
assume(v <=_u c): Transfer the known high zero bits
assume(v <_u c): Transfer the known high zero bits (if c is know to be a power
of 2, transfer one more)
A small addition to InstCombine was necessary for some of the test cases. The
problem is that when InstCombine was simplifying and, or, etc. it would fail to
check the 'do I know all of the bits' condition before checking less specific
conditions and would not fully constant-fold the result. I'm not sure how to
trigger this aside from using assumptions, so I've just included the change
here.
llvm-svn: 217343
This change, which allows @llvm.assume to be used from within computeKnownBits
(and other associated functions in ValueTracking), adds some (optional)
parameters to computeKnownBits and friends. These functions now (optionally)
take a "context" instruction pointer, an AssumptionTracker pointer, and also a
DomTree pointer, and most of the changes are just to pass this new information
when it is easily available from InstSimplify, InstCombine, etc.
As explained below, the significant conceptual change is that known properties
of a value might depend on the control-flow location of the use (because we
care that the @llvm.assume dominates the use because assumptions have
control-flow dependencies). This means that, when we ask if bits are known in a
value, we might get different answers for different uses.
The significant changes are all in ValueTracking. Two main changes: First, as
with the rest of the code, new parameters need to be passed around. To make
this easier, I grouped them into a structure, and I made internal static
versions of the relevant functions that take this structure as a parameter. The
new code does as you might expect, it looks for @llvm.assume calls that make
use of the value we're trying to learn something about (often indirectly),
attempts to pattern match that expression, and uses the result if successful.
By making use of the AssumptionTracker, the process of finding @llvm.assume
calls is not expensive.
Part of the structure being passed around inside ValueTracking is a set of
already-considered @llvm.assume calls. This is to prevent a query using, for
example, the assume(a == b), to recurse on itself. The context and DT params
are used to find applicable assumptions. An assumption needs to dominate the
context instruction, or come after it deterministically. In this latter case we
only handle the specific case where both the assumption and the context
instruction are in the same block, and we need to exclude assumptions from
being used to simplify their own ephemeral values (those which contribute only
to the assumption) because otherwise the assumption would prove its feeding
comparison trivial and would be removed.
This commit adds the plumbing and the logic for a simple masked-bit propagation
(just enough to write a regression test). Future commits add more patterns
(and, correspondingly, more regression tests).
llvm-svn: 217342
It's probably not a huge deal to not do this - if we could, maybe the
address could be reused by a subprogram low_pc and avoid an extra
relocation, but it's just one per CU at best.
llvm-svn: 217338
This adds a set of utility functions for collecting 'ephemeral' values. These
are LLVM IR values that are used only by @llvm.assume intrinsics (directly or
indirectly), and thus will be removed prior to code generation, implying that
they should be considered free for certain purposes (like inlining). The
inliner's cost analysis, and a few other passes, have been updated to account
for ephemeral values using the provided functionality.
This functionality is important for the usability of @llvm.assume, because it
limits the "non-local" side-effects of adding llvm.assume on inlining, loop
unrolling, etc. (these are hints, and do not generate code, so they should not
directly contribute to estimates of execution cost).
llvm-svn: 217335
This adds an immutable pass, AssumptionTracker, which keeps a cache of
@llvm.assume call instructions within a module. It uses callback value handles
to keep stale functions and intrinsics out of the map, and it relies on any
code that creates new @llvm.assume calls to notify it of the new instructions.
The benefit is that code needing to find @llvm.assume intrinsics can do so
directly, without scanning the function, thus allowing the cost of @llvm.assume
handling to be negligible when none are present.
The current design is intended to be lightweight. We don't keep track of
anything until we need a list of assumptions in some function. The first time
this happens, we scan the function. After that, we add/remove @llvm.assume
calls from the cache in response to registration calls and ValueHandle
callbacks.
There are no new direct test cases for this pass, but because it calls it
validation function upon module finalization, we'll pick up detectable
inconsistencies from the other tests that touch @llvm.assume calls.
This pass will be used by follow-up commits that make use of @llvm.assume.
llvm-svn: 217334
support for MOVDDUP which is really important for matrix multiply style
operations that do lots of non-vector-aligned load and splats.
The original motivation was to add support for MOVDDUP as the lack of it
regresses matmul_f64_4x4 by 5% or so. However, all of the rules here
were somewhat suspicious.
First, we should always be using the floating point domain shuffles,
regardless of how many copies we have to make as a movapd is *crazy*
faster than the domain switching cost on some chips. (Mostly because
movapd is crazy cheap.) Because SHUFPD can't do the copy-for-free trick
of the PSHUF instructions, there is no need to avoid canonicalizing on
UNPCK variants, so do that canonicalizing. This also ensures we have the
chance to form MOVDDUP. =]
Second, we assume SSE2 support when doing any vector lowering, and given
that we should just use UNPCKLPD and UNPCKHPD as they can operate on
registers or memory. If vectors get spilled or come from memory at all
this is going to allow the load to be folded into the operation. If we
want to optimize for encoding size (the only difference, and only
a 2 byte difference) it should be done *much* later, likely after RA.
llvm-svn: 217332
Instead of aligning and moving the CurPtr forward, and then comparing
with End, simply calculate how much space is needed, and compare that
to how much is available.
Hopefully this avoids any doubts about comparing addresses possibly
derived from past the end of the slab array, overflowing, etc.
Also add a test where aligning CurPtr would move it past End.
llvm-svn: 217330
field of RelocationValueRef, rather than the 'Addend' field.
This is consistent with RuntimeDyldELF's use of RelocationValueRef, and more
consistent with the semantics of the data being stored (the offset from the
start of a section or symbol).
llvm-svn: 217328
DWARF address ranges contain a reference to the debug_info section. This offset
is an absolute relocation except on non-PE/COFF targets where it is section
relative. We would emit this incorrectly, and trying to map the debug info from
the address would fail.
llvm-svn: 217317
parsing (and latent bug in the instruction definitions).
This is effectively a revert of r136287 which tried to address
a specific and narrow case of immediate operands failing to be accepted
by x86 instructions with a pretty heavy hammer: it introduced a new kind
of operand that behaved differently. All of that is removed with this
commit, but the test cases are both preserved and enhanced.
The core problem that r136287 and this commit are trying to handle is
that gas accepts both of the following instructions:
insertps $192, %xmm0, %xmm1
insertps $-64, %xmm0, %xmm1
These will encode to the same byte sequence, with the immediate
occupying an 8-bit entry. The first form was fixed by r136287 but that
broke the prior handling of the second form! =[ Ironically, we would
still emit the second form in some cases and then be unable to
re-assemble the output.
The reason why the first instruction failed to be handled is because
prior to r136287 the operands ere marked 'i32i8imm' which forces them to
be sign-extenable. Clearly, that won't work for 192 in a single byte.
However, making thim zero-extended or "unsigned" doesn't really address
the core issue either because it breaks negative immediates. The correct
fix is to make these operands 'i8imm' reflecting that they can be either
signed or unsigned but must be 8-bit immediates. This patch backs out
r136287 and then changes those places as well as some others to use
'i8imm' rather than one of the extended variants.
Naturally, this broke something else. The custom DAG nodes had to be
updated to have a much more accurate type constraint of an i8 node, and
a bunch of Pat immediates needed to be specified as i8 values.
The fallout didn't end there though. We also then ceased to be able to
match the instruction-specific intrinsics to the instructions so
modified. Digging, this is because they too used i32 rather than i8 in
their signature. So I've also switched those intrinsics to i8 arguments
in line with the instructions.
In order to make the intrinsic adjustments of course, I also had to add
auto upgrading for the intrinsics.
I suspect that the intrinsic argument types may have led everything down
this rabbit hole. Pretty happy with the result.
llvm-svn: 217310
The finalizeObject method calls generateCodeForModule on each of the currently
'added' objects, but generateCodeForModule moves objects out of the 'added'
set as it's called. To avoid iterator invalidation issues, the added set is
copied out before any calls to generateCodeForModule.
This should fix http://llvm.org/PR20851 .
llvm-svn: 217291
computation was totally wrong, but somehow it didn't really show up with
llc.
I've added an assert that triggers on multiple existing test cases and
updated one of them to show the correct value.
There appear to still be more bugs lurking around insertps's mask. =/
However, note that this only really impacts the new vector shuffle
lowering.
llvm-svn: 217289
follows '~' in a clobber constraint string.
Previously llc would hit an llvm_unreachable when compiling an inline-asm
instruction with malformed constraint string "~x{21}". This commit enables
LLParser to catch the error earlier and print a more helpful diagnostic.
rdar://problem/14206559
llvm-svn: 217288
This problem is bigger than just fsub, but this is the minimum fix to solve
fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ), and we solve
zero subtraction with the same change.
llvm-svn: 217286
When linking llvm.global_ctors with the optional third element we have to handle
it specially and only copy the elements whose keys were also copied.
llvm-svn: 217281
Summary:
Until r216870 LLVMCreateObjectFile returned nullptr in case of an error,
so callers could check if the call was successful. Now, it always
returns an OwningBinary wrapped as an LLVMObjectFileRef, so callers
can't check if the call was successul.
This results in a segfault running e.g.
llvm-c-test --object-list-sections < /dev/null
So the old behaviour should be restored.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5143
llvm-svn: 217279
This reverts commit r217211.
Both the bfd ld and gold outputs were valid. They were using a Rela relocation,
so the value present in the relocated location was not used, which caused me
to misread the output.
llvm-svn: 217264
Fixes PR20523.
When spilling variables onto the stack, spillVirtReg() is setting the
parent pointer of the cloned DBG_VALUE intrinsic for the stack location
to the parent pointer of the original intrinsic. MachineInstr parent
pointers should however always point to the parent basic block.
MBB is shadowing the MBB member variable. The instruction still ends up
being inserted into the right basic block, because it's inserted after MI
which serves as the iterator.
I failed at constructing a reliable testcase for this, see
http://llvm.org/bugs/show_bug.cgi?id=20523 for a large testcases.
llvm-svn: 217260
Summary: Found a couple of cases where unsigned was still being used. These two should be the last ones in the (entire) Mips backend.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D5028
llvm-svn: 217257
Summary: Use the naming convention from the LLVM Coding Standards.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D4972
llvm-svn: 217254
We must constrain the destination register class of legalized operands
to a VGPR class or else the illegal operand may be folded back into
the instruction by the register coalescer.
This fixes a bug in add.ll that will be uncovered by future commits.
llvm-svn: 217249
shuffle lowering for integer vectors and share it from v4i32, v8i16, and
v16i8 code paths.
Ironically, the SSE2 v16i8 code for this is now better than the SSSE3!
=] Will have to fix the SSSE3 code next to just using a single pshufb.
llvm-svn: 217240
The special case did not work when run under -reassociate and can easily
be expressed by a further generalization of an existing pattern.
llvm-svn: 217227
The header contains an offset to the DWARF line table for the CU. The offset
must be section relative for COFF and absolute for others. The non-assembly
code path for the DWARF header generation already has the correct emission for
the headers. This corrects the assembly input path.
This was identified by BFD objecting to the LLVM generated DWARF information.
llvm-svn: 217222
Patched by Sergey Dmitrouk.
This pass tries to make consecutive compares of values use same operands to
allow CSE pass to remove duplicated instructions. For this it analyzes
branches and adjusts comparisons with immediate values by converting:
GE -> GT
GT -> GE
LT -> LE
LE -> LT
and adjusting immediate values appropriately. It basically corrects two
immediate values towards each other to make them equal.
llvm-svn: 217220
This reverts commit 93c7e6161e1adbd2c7ac81fa081823183035cb64.
This commit got approved first, but was dependant on another one going in (The one pretty printing attribute values). I'll reapply when the other one is in.
llvm-svn: 217183
Summary:
This fixes a long standing issue where we would emit many little .text
sections and only one .pdata and .xdata section. Now we generate one
.pdata / .xdata pair per .text section and associate them correctly.
Fixes PR19667.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5181
llvm-svn: 217176
This fixes an issue where MS inline assembly containing xgetbv wouldn't
be marked as clobbering EAX:EDX. Test for that forthcoming on the Clang
side.
llvm-svn: 217173
Follow up to r217138, extending the logic to other NEON-immediate instructions.
As before, the instruction already performs the correct operation and we're
just using a different type for convenience, so we want a true nop-cast.
Patch by Asiri Rathnayake.
llvm-svn: 217159
Fixes:
CMake Warning (dev) at lib/ExecutionEngine/Interpreter/CMakeLists.txt:16 (target_link_libraries):
Policy CMP0023 is not set: Plain and keyword target_link_libraries
signatures cannot be mixed. Run "cmake --help-policy CMP0023" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.
The keyword signature for target_link_libraries has already been used with
the target "LLVMInterpreter". All uses of target_link_libraries with a
target should be either all-keyword or all-plain.
The uses of the keyword signature are here:
* cmake/modules/AddLLVM.cmake:345 (target_link_libraries)
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217154
Summary: There are still some functions which should be renamed, but they are inherited from the generic MC classes.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D5068
llvm-svn: 217145
We were materialising big-endian constants using DAG nodes with types different
from what was requested, followed by a bitcast. This is fine on little-endian
machines where bitcasting is a nop, but we need a slightly different
representation for big-endian. This adds a new set of NVCAST (natural-vector
cast) operations which are always nops.
Patch by Asiri Rathnayake.
llvm-svn: 217138
vzext patterns and insert-element patterns that for SSE4 have dedicated
instructions.
With this we can enable the experimental mode in a regression test that
happens to cover some of the past set of issues. You can see that the
new logic does significantly better here on the floating point cases.
A follow-up to this change and the previous ones will hoist the logic
into helpers so it can be shared across element type sizes as in this
particular case it generalizes cleanly.
llvm-svn: 217136
The DWARFContext will be used to pass global 'context' down, like
pointers to related debug info sections or command line options.
The first use will be for the debug_info dumper to be able to access
other debug info section to dump eg. Location Expression inline
in the debug_info dump.
llvm-svn: 217128
abilities of INSERTPS which are really powerful and come up in very
important contexts such as forming diagonal matrices, etc.
With this I ended up being able to remove the somewhat weird helper
I added for INSERTPS because we can collapse the entire state to a no-op
mask. Added a bunch of tests for inserting into a zero-ish vector.
llvm-svn: 217117
The basic idea is similar to the existing cross compilation support. A directory must be configured to build host versions of tablegen tools and llvm-config. This directory can be user provided (and configured), or it can be created during the build. During a build the native build directory will be configured and built to supply the tablegen tools used during the build. A user could also explicitly provide the tablegen executables to run on the CMake command line.
llvm-svn: 217105
LinearFunctionTestReplace tries to use the *next* indvar to compare
against when possible. However, it may be the case that the calculation
for the next indvar has NUW/NSW flags and that it may only be safely
used inside the loop. Using it in a comparison to calculate the exit
condition could result in observing poison.
This fixes PR20680.
Differential Revision: http://reviews.llvm.org/D5174
llvm-svn: 217102
'insertps' patterns.
This replaces two shuffles with a single insertps in very common cases.
My next patch will extend this to leverage the zeroing capabilities of
insertps which will allow it to be used in a much wider set of cases.
llvm-svn: 217100
an immediate operand when we don't have instruction-specific comments.
This ensures that instruction-specific comments are attached to the same
line as the instruction which is important for using them to write
readable and maintainable tests. My next commit will just such a test.
llvm-svn: 217099
I'm not sure this is a particularly helpful API (to pass ownership and
then return it unconditionally) rather than just pass the underlying
object by non-const reference, but this was the original API so I'll
just make it more safe/stable and anyone else is free to adjust that at
their whim, of course.
llvm-svn: 217081
Summary:
Split shouldExpandAtomicInIR() into different versions for Stores/Loads/RMWs/CmpXchgs.
Makes runOnFunction cleaner (no more redundant checking/casting), and will help moving
the X86 backend to this pass.
This requires a way of easily detecting which instructions are atomic.
I followed the pattern of mayReadFromMemory, mayWriteOrReadMemory, etc.. in making
isAtomic() a method of Instruction implemented by a switch on the opcodes.
Test Plan: make check
Reviewers: jfb
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D5035
llvm-svn: 217080
Fixes two latent bugs:
- There was no fence inserted before expanded seq_cst load (unsound on Power)
- There was only a fence release before seq_cst stores (again unsound, in particular on Power)
It is not even clear if this is correct on ARM swift processors (where release fences are
DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift
as it is not clear whether it is incorrect. I would love to get documentation stating
whether it is correct or not.
These two bugs were not triggered because Power is not (yet) using this pass, and these
behaviours happen to be (mostly?) working on ARM
(although they completely butchered the semantics of the llvm IR).
See:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html
for an example of the problems that can be caused by the second of these bugs.
I couldn't see a way of fixing these in a completely target-independent way without
adding lots of unnecessary fences on ARM, hence the target-dependent parts of this
patch.
This patch implements the new target-dependent parts only for ARM (the default
of not doing anything is enough for AArch64), other architectures will use this
infrastructure in later patches.
llvm-svn: 217076
This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.
Reviewed by Eric
llvm-svn: 217075
JITEventListener. This used to be in the old JIT (last line of the file)
and everyone just "happened" to pick it up from there. =/ Doh.
llvm-svn: 217073
This patch adds to LLVMSupport the capability of writing files with
international characters encoded in the current system encoding. This
is relevant for Windows, where we can either use UTF16 or the current
code page (the legacy Windows international characters). On UNIX, the
file is always saved in UTF8.
This will be used in a patch for clang to thoroughly support response
files creation when calling other tools, addressing PR15171. On
Windows, to correctly support internationalization, we need the
ability to write response files both in UTF16 or the current code
page, depending on the tool we will call. GCC for mingw, for instance,
requires files to be encoded in the current code page. MSVC tools
requires files to be encoded in UTF16.
Patch by Rafael Auler!
llvm-svn: 217068
Things got a little bit messy over the years and it is time for a little bit
spring cleaning.
This first commit is focused on the FastISel base class itself. It doxyfies all
comments, C++11fies the code where it makes sense, renames internal methods to
adhere to the coding standard, and clang-formats the files.
Reviewed by Eric
llvm-svn: 217060
I took a guess at the changes to the gold plugin, because that doesn't
seem to build by default for me. Not sure what dependencies I might be
missing for that.
llvm-svn: 217056
The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math)
when converting scalar instructions into vectors. But this isn't a simple copy - we need to take
the intersection (the logical 'and') of the sets of flags on the scalars.
The solution is further complicated because we can have non-uniform (non-SIMD) vector ops after:
http://reviews.llvm.org/D4015http://llvm.org/viewvc/llvm-project?view=revision&revision=211339
The vast majority of changed files are existing tests that were not propagating IR flags, but I've
also added a new test file for focused testing of IR flag possibilities.
Differential Revision: http://reviews.llvm.org/D5172
llvm-svn: 217051
This forces callers to use std::move when calling it. It is somewhat odd to have
code with std::move that doesn't always move, but it is also odd to have code
without std::move that sometimes moves.
llvm-svn: 217049
An unpleasant surprise while migrating unique_ptrs (see changes in
lib/Object): ErrorOr<int*> was implicitly convertible to
ErrorOr<std::unique_ptr<int>>.
Keep the explicit conversions otherwise it's a pain to convert
ErrorOr<int*> to ErrorOr<std::unique_ptr<int>>.
I'm not sure if there should be more SFINAE on those explicit ctors (I
could check if !is_convertible && is_constructible, but since the ctor
has to be called explicitly I don't think there's any need to disable
them when !is_constructible - they'll just fail anyway. It's the
converting ctors that can create interesting ambiguities without proper
SFINAE). I had to SFINAE the explicit ones because otherwise they'd be
ambiguous with the implicit ones in an explicit context, so far as I
could tell.
The converting assignment operators seemed unnecessary (and similarly
buggy/dangerous) - just rely on the converting ctors to convert to the
right type for assignment instead.
llvm-svn: 217048
This fixes a crash in the OpenCV test:
ImgprocWarpResizeArea/Resize.Mat/16
There is no test case for this, because this failure depends on a
specific ordering of the loads, which could easily change.
llvm-svn: 217040
This CL replaces the constant DarwinX86AsmBackend.PushInstrSize with a method
that lets the backend account for different sizes of "push %reg" instruction
sizes.
llvm-svn: 217020
This reapplies r216805 with a fix to a copy-past error, which resulted in an
incorrect register class.
Original commit message:
Select the correct register class for the various instructions that are
generated when combining instructions and constrain the registers to the
appropriate register class.
This fixes rdar://problem/18183707.
llvm-svn: 217019
The syntax of the new builtin is 'section_addr(<filename>, <section-name>)'
(similar to the stub_addr builtin, but without a symbol name). It returns the
base address of the given section in the given object file. This builtin makes
it possible to refer to the contents of sections that cannot contain symbols,
e.g. sections added by the linker itself, like __eh_frame.
llvm-svn: 217010
There is already target-dependent instruction selection support for Adds/Subs to
support compares and the intrinsics with overflow check. This takes advantage of
the existing infrastructure to also support Add/Sub, which allows the folding of
immediates, sign-/zero-extends, and shifts.
This fixes rdar://problem/18207316.
llvm-svn: 217007