For one predicate to subsume another, they must both check the same condition
register. Failure to check this prerequisite was causing miscompiles.
Fixes PR18003.
llvm-svn: 197089
This adds two additional functions to the hazard recognizer interface. These
are optional (in the sense that the default implementations preserve the
current behavior), and used by the post-RA scheduler. Upcoming commits will use
this functionality in order to improve dispatch-group formation on the POWER7
and related cores. Dispatch groups are an odd construct: sometimes we need to
insert nops to force a new one to start (for performance reasons), and some
instructions need to appear in certain positions within a group, but the groups
are not fundamentally cycle based (they can contain instructions with data
dependencies with non-trivial latencies).
Motivation:
unsigned PreEmitNoops(SUnit *) - Used to force the post-RA scheduler to insert
nops to force a new dispatch group to begin. We already have a NoopHazard, and
this is also still needed. However, NoopHazard only causes a nop to be inserted
if there are no other available instructions, and so is not always sufficient.
The number of nops to insert depends on state that only the hazard recognizer
has, so a general callback is necessary.
bool ShouldPreferAnother(SUnit *) - Used to avoid scheduling instructions that
would start a new dispatch group when others are available that could be part
of the current dispatch group. In this case, we don't want to issue nops,
because the non-preferred instruction will implicitly start a new dispatch
group regardless.
Although the motivation for these functions is driven by the PowerPC backend,
they are completely general.
llvm-svn: 197084
The linkers on these systems don't have anything special to do with these
symbols. Since the intent is for them to be absent from the final object,
just treat them as private.
llvm-svn: 197080
This reverts commit r197073.
The test seems to be failing on some buildbots for unknown reasons.
Reverting until I can figure that out. If anyone's got a reproduction
(.s and .o together would be great) - I'd really appreciate it.
llvm-svn: 197079
This commit does not complete the type units feature - there are issues
around fission support (skeletal type units, pubtypes/pubnames) and
hashing of some types including those containing references to types in
other type units.
llvm-svn: 197073
point reciprocal exponent, and floating-point reciprocal square root estimate
LLVM AArch64 intrinsics to use f32/f64 types, rather than their vector
equivalents.
llvm-svn: 197066
The tests were no longer using fast-isel at all (MachO needs an "ios" rather
than "darwin" triple at the moment and Linux needs ARM mode). Once that was
corrected, the verifier complained about a t2ADDri created for the alloca.
llvm-svn: 197046
I moved a test from avx512-vbroadcast-crash.ll to avx512-vbroadcast.ll
I defined HasAVX512 predicate as AssemblerPredicate. It means that you should invoke llvm-mc with "-mcpu=knl" to get encoding for AVX-512 instructions. I need this to let AsmMatcher to set different encoding for AVX and AVX-512 instructions that have the same mnemonic and operands (all scalar instructions).
llvm-svn: 197041
DAGCombiner could fold (truncate (load)) -> smaller load if the original
load was the width of the truncation result or wider. This patch extends
it to handle cases where the original load was narrower (and so the
extension type stays the same).
llvm-svn: 197030
This hook reverses the order of assignment for local live ranges. This
will generally allocate shorter local live ranges first. For targets with
many registers, this could reduce regalloc compile time by a large
factor. It should still achieve optimal coloring; however, it can change
register eviction decisions. It is disabled by default for two reasons:
(1) Top-down allocation is simpler and easier to debug for targets that
don't benefit from reversing the order.
(2) Bottom-up allocation could result in poor evicition decisions on some
targets affecting the performance of compiled code.
llvm-svn: 197001
The combination of inline asm, stack realignment, and dynamic allocas
turns out to be too common to reject out of hand.
ASan inserts empy inline asm fragments and uses aligned allocas.
Compiling any trivial function containing a dynamic alloca with ASan is
enough to trigger the check.
XFAIL the test cases that would be miscompiled and add one that uses the
relevant functionality.
llvm-svn: 196986
It was failing because ASan was adding all of the following to one
function:
- dynamic alloca
- stack realignment
- inline asm
This patch avoids making the static alloca dynamic when coverage is
used.
ASan should probably not be inserting empty inline asm blobs to inhibit
duplicate tail elimination.
llvm-svn: 196973
This re-lands commit r196876, which was reverted in r196879.
The tests have been fixed to pass on platforms with a stack alignment
larger than 4.
Update to clang side tests will land shortly.
llvm-svn: 196939
Most users would be surprised if "isCOFF" and "isMachO" were simultaneously
true, unless they'd put the compiler in a box with a gun attached to a photon
detector.
This makes sure precisely one of the three formats is true for any triple and
simplifies some target logic based on that.
llvm-svn: 196934
The docstrings were describing an older interface that has been replaced with
functions.
Also describe the performance characteristics of FindProgramByName() and
ExecuteAndWait() explaining when it's best to avoid them.
llvm-svn: 196932
immediately after SSE scalar fp instructions like addss or mulss.
Added patterns to select SSE scalar fp arithmetic instructions from a scalar
fp operation followed by a blend.
For example, given the following code:
__m128 foo(__m128 A, __m128 B) {
A[0] += B[0];
return A;
}
previously we generated:
addss %xmm0, %xmm1
movss %xmm1, %xmm0
now we generate:
addss %xmm1, %xmm0
llvm-svn: 196925
Save S2(reg 18) only when we are calling floating point stubs that
have a return value of float or complex. Some more work to make this
better but this is the first step.
llvm-svn: 196921
Defaulting to iOS 3.0 when LLVM has to guess the version is no longer a useful
option and can give surprising results (like tail calls being disabled).
5.0 seems like a reasonable compromise as a platform that's still interesting
to some people.
rdar://problem/15567348
llvm-svn: 196912
One unusual feature of the z architecture is that the result of a
previous load can be reused indefinitely for subsequent loads, even if
a cache-coherent store to that location is performed by another CPU.
A special serializing instruction must be used if you want to force
a load to be reattempted.
Since volatile loads are not supposed to be omitted in this way,
we should insert a serializing instruction before each such load.
The same goes for atomic loads.
The patch implements this at the IR->DAG boundary, in a similar way
to atomic fences. It is a no-op for targets other than SystemZ.
llvm-svn: 196906
One unusual feature of the z architecture is that the result of a
previous load can be reused indefinitely for subsequent loads, even if
a cache-coherent store to that location is performed by another CPU.
A special serializing instruction must be used if you want to force
a load to be reattempted.
Since volatile loads are not supposed to be omitted in this way,
we should insert a serializing instruction before each such load.
The same goes for atomic loads.
The patch implements this at the IR->DAG boundary, in a similar way
to atomic fences. It is a no-op for targets other than SystemZ.
llvm-svn: 196905
For stack frames requiring realignment, three pointers may be needed:
- ebp to address incoming arguments
- esi (could be any callee-saved register) to address locals
- esp to address outgoing arguments
We would use esi unconditionally without verifying that it did not
conflict with inline assembly.
This change doesn't do the verification, it simply emits a fatal error
on functions that use stack realignment, dynamic SP adjustments, and
inline assembly.
Because stack realignment is common on Windows, we also no longer assume
that MS inline assembly clobbers esp. Instead, we analyze the inline
instructions for implicit definitions and check if esp is there. If so,
we require the use of a base pointer and consider it in the condition
above.
Mostly fixes PR16830, but we could try harder to find a non-conflicting
base pointer.
Reviewers: sunfish
Differential Revision: http://llvm-reviews.chandlerc.com/D1317
llvm-svn: 196876
MCJIT needs to be able to run in hostile environments, even when PWD
is invalid. There's no need to crash MCJIT in this case.
The obvious fix is to simply leave MCContext's CompilationDir empty
when PWD can't be determined. This way, MCJIT clients,
and other clients that link with LLVM don’t need a valid working directory.
If we do want to guarantee valid CompilationDir, that should be done
only for clients of getCompilationDir(). This is as simple as checking
for an empty string.
The only current use of getCompilationDir is EmitGenDwarfInfo, which
won’t conceivably run with an invalid working dir. However, in the
purely hypothetically and untestable case that this happens, the
AT_comp_dir will be omitted from the compilation_unit DIE.
llvm-svn: 196874
Similar to gcov, llvm-cov will now print out the block count at the end
of each block. Multiple blocks can end on the same line.
One computational difference is by using -a, llvm-cov will no longer
simply add the block counts together to form a line count. Instead, it
will take the maximum of the block counts on that line. This has a
similar effect to what gcov does, but generates more correct counts in
certain scenarios.
Also updated tests.
llvm-svn: 196856
This avoids creating branch weight metadata of length one when we fold
cases into the default of a switch instruction, which was triggering
an assert.
llvm-svn: 196845
Patch by Jiangning Liu.
With some test case changes:
- intrinsic test added to the existing /test/CodeGen/AArch64/neon-aba-abd.ll.
- New test cases to cover movi 1D scenario without using the intrinsic in
test/CodeGen/AArch64/neon-mov.ll.
llvm-svn: 196806
Summary:
When clang is used under GNU/Linux in a chroot without /proc mount, it falls
back on the BSD method. However, since the buf variable is used twice
and fails with snprintf to produce the correct path.
When called as relatived (ie ./clang), it was failing with:
"" -cc1 [...] -x c++ x.cc
error: unable to execute command: Executable "" doesn't exist!
I also took the opportunity to simply the code (the first arg of test_dir
was useless).
Reviewers: rafael
Reviewed By: rafael
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2361
llvm-svn: 196791
Summary:
The MSA ld.[bhwd] and st.[bhwd] instructions scale the immediate by the
element size before use as an offset. The offset must therefore be a
multiple of the element size to be valid in these instructions. However,
an unaligned base address is valid in MSA.
This commit causes the compiler to emit valid code when the calculated
offset is not a multiple of the element size by accounting for the offset
using addiu and using a zero offset in the load/store.
Depends on D2338
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D2339
llvm-svn: 196777
Summary:
The immediate in these instructions is scaled before use as an offset.
They therefore have a wider reach than ld.b/st.b.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D2338
llvm-svn: 196775
As we can't make a complete solution now it has been decided to enable .set directive to handle long jump expressions. This will cause parser to report errors when parsing integer based register assignments, for example:
.set r3, will be reported as error. Still, the need for expressions is higher priority as the integer based register assignments are Mips specific and can be avoided using register names.
llvm-svn: 196773
When trying to eliminate an "sub sp, sp, #N" instruction by folding
it into an existing push/pop using dummy registers, we need to account
for the fact that this might affect precisely how "fp" gets set in the
prologue.
We were attempting this, but assuming that *whenever* we performed a
fold it would make a difference. This is false, for example, in:
push {r4, r7, lr}
add fp, sp, #4
vpush {d8}
sub sp, sp, #8
we can fold the "sub" into the "vpush", forming "vpush {d7, d8}".
However, in that case the "add fp" instruction mustn't change, which
we were getting wrong before.
Should fix PR18160.
llvm-svn: 196725
Before this change, inlining one "invoke" into an outer "invoke" call
site can lead to the outer landingpad's catch/filter clauses being
copied multiple times into the resulting landingpad. This happens:
* when the inlined function contains multiple "resume" instructions,
because forwardResume() copies the clauses but is called multiple
times;
* when the inlined function contains a "resume" and a "call", because
HandleCallsInBlockInlinedThroughInvoke() copies the clauses but is
redundant with forwardResume().
Fix this by deduplicating the code.
This problem doesn't lead to any incorrect execution; it's only
untidy.
This change will make fixing PR17872 a little easier.
llvm-svn: 196710
They were out of place since the introduction of arbitrary precision integer
types.
This also synchronizes the documentation to Types.h, so it refers to first class
types and single value types.
llvm-svn: 196661
These helper classes take care of the book-keeping the drives the
GenericScheduler heuristics. It is likely that developers writing
target-specific schedulers that work similarly to GenericScheduler
will want to use these helpers too. The immediate goal is to develop a
GenericPostScheduler that can run in place of the old PostRAScheduler,
but will use the new machine model.
No functionality change intended.
llvm-svn: 196643
The sefault occurs due to an infinite loop when the verifier tries to
determine the size of a type of the form "%rt = type { %rt }" while
checking an alloca of the type.
llvm-svn: 196626
lib/Transforms/Instrumentation/AddressSanitizer.cpp:1405:36: error: non-constant-expression cannot be narrowed from type 'uint64_t' (aka 'unsigned long long') to 'size_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing]
getAllocaSizeInBytes(AI),
^~~~~~~~~~~~~~~~~~~~~~~~
llvm-svn: 196623
This commit caches the value of the AllowAtInIdentifier variable as
a class variable in AsmLexer. We do this to avoid repeated MAI
queries and string comparisons each time we lex an identifier.
llvm-svn: 196622
Command line arguments that begin with @ but aren't a path to an
existing file currently cause later @file arguments to be ignored.
Correctly skip over these arguments instead of trying to read a
non-existent file 20 times and giving up.
Since the problem manifests in the clang driver, the test is in that
repository.
Fixes rdar://problem/15590906
llvm-svn: 196620
- krait processor currently modeled with the same features as A9.
- Krait processor additionally has VFP4 (fused multiply add/sub)
and hardware division features enabled.
- krait has currently the same Schedule model as A9
- krait cpu flag is not recognized by the GNU assembler yet,
it is replaced with march=armv7-a to avoid a lower march
from being used.
llvm-svn: 196619
This removes another case of spooky action at a distance (building the
same label names in multiple places creating an implicit dependency
between those places) and helps pave the way for type units.
llvm-svn: 196617
The integrated assembler fails to properly lex arm comments when
they are adjacent to an identifier in the input stream. The reason
is that the arm comment symbol '@' is also used as symbol variant in
other assembly languages so when lexing an identifier it allows the
'@' symbol as part of the identifier.
Example:
$ cat comment.s
foo:
add r0, r0@got to parse this as a comment
$ llvm-mc -triple armv7 comment.s
comment.s:4:18: error: unexpected token in argument list
add r0, r0@got to parse this as a comment
^
This should be parsed as correctly as `add r0, r0`.
This commit modifes the assembly lexer to not include the '@' symbol
in identifiers when lexing for targets that use '@' for comments.
llvm-svn: 196607
This more accurately represents the actual walk - pubnames/pubtypes are
emitted into the .o, not the .dwo, and reference the skeletons not the
full units.
Use the newly established ID->index invariant to lookup the underlying
full unit to retrieve its public names and types.
llvm-svn: 196601
This simplifies reasoning about the code and enables simple navigation
from a skeleton to its full unit. (currently there are no type unit
skeletons, so the skeleton list doesn't have the same ID == index
property)
Eventually we should get rid of this ID and just store the labels we
need as the IDs are allowing this code to create difficult to
manage/understand associations (loops over non-skeletal units are
implicitly referencing their skeletal units during pub* emission, for
example). It may be necessary to have some kind of skeleton->full unit
association and a more direct pointer or similar device would be
preferable than an index.
llvm-svn: 196600
The current peephole optimizing for compare inst assumes an instr that
uses CPSR has an MO for ARM Cond code.However, for VSEL instructions
(vseqeq, vselgt, vselgt, vselvs), there is no such operand nor do
they support the modification of Cond Code.
llvm-svn: 196588
Since z has no setcc instruction as such, the choice of setBooleanContents
is a bit arbitrary. Currently it's set to ZeroOrOneBooleanContent,
so we produced a branch-free form when selecting between 0 and 1,
but not when selecting between 0 and -1. This patch handles the latter
case too.
At some point I'd like to measure whether it's better to use conditional
moves for constant selects on z196, but that's future work.
llvm-svn: 196578