In Thumb2, instructions which write to the PC are UNPREDICTABLE if they are in
an IT block but not the last instruction in the block.
Previously, we only diagnosed this for LDM instructions; this patch extends
the diagnostic to cover all of the relevant instructions.
Differential Revision: https://reviews.llvm.org/D30398
llvm-svn: 296459
Without this simplification, for the following loop nest:
void foo(long n1_a, long n1_b, long n1_c, long n1_d,
         long p1_b, long p1_c, long p1_d,
         float A_1[][p1_b][p1_c][p1_d]) {
  for (long i = 0; i < n1_a; i++)
    for (long j = 0; j < n1_b; j++)
      for (long k = 0; k < n1_c; k++)
        for (long l = 0; l < n1_d; l++)
          A_1[i][j][k][l] += i + j + k + l;
}
the assumption:
n1_a <= 0 or (n1_a > 0 and n1_b <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0 and
p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d)
is taken rather than the simpler assumption:
p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d.
The former is less strict, as it allows arbitrary values of p1_* in case the
loop is not executed at all. However, in practice these precise constraints
explode when combined across different accesses and loops. For now it seems
to make more sense to take less precise, but more scalable constraints by
default. In case we find a practical example where more precise constraints
are needed, we can think about allowing such precise constraints in specific
situations where they help.
This change speeds up the new test case from taking very long (I waited at
least a minute, but it probably takes a lot longer) to below a second.
llvm-svn: 296456
The option -mexecute-only is translated into the backend option
-arm-execute-only. But this option only makes sense for the compiler, and
the assembler does not recognize it. This patch stops clang from passing
this option to the assembler.
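For example (target triple hypothetical), assembling a file with the flag
used to fail because -arm-execute-only reached the assembler job:

  clang --target=armv7m-none-eabi -mexecute-only -c startup.S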
Change-Id: I4f4cb1162c13cfd50a0a36702a4ecab1bc0324ba
Review: https://reviews.llvm.org/D30414
llvm-svn: 296454
The "rm" command at the beginning of the test returns 1 when there is no
such file to delete.
This patch removes it, as it is not needed for the test.
llvm-svn: 296453
There are many special cases and a layer of abstraction or two in the
way, but the VA calculation in the typical case is actually very simple
and probably makes perfect sense even to somebody new to linkers.
Also, this line brings together many components and is a good place to
start understanding the linker (or improve one's existing
understanding).
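In the typical case it boils down to something like this (a sketch with
hypothetical names, not the actual code):

  // VA of a symbol: the address of its output section plus the symbol's
  // offset within that section (plus any relocation addend).
  uint64_t VA = OutSec->Addr + OffsetInSec + Addend;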
llvm-svn: 296451
Summary:
Use a common definition of a "this variable is unused" annotation for
variables that exist only so their lambda initializers run during global
initialization, to silence gcc's warning.
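A sketch of the pattern, with a hypothetical macro and variable name:

  // Shared "unused" annotation; the variable exists only so that its lambda
  // initializer runs during global initialization.
  #define XRAY_UNUSED_VARIABLE __attribute__((unused))
  static bool RegistrationDone XRAY_UNUSED_VARIABLE = [] {
    // ... one-time registration work ...
    return true;
  }();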
Reviewers: dberris
Reviewed By: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29860
llvm-svn: 296449
Naively it seemed at first like getVA had the responsibility of adding
the addend, and getSymVA had the responsibility of getting the symbol
VA.
So it was not obvious to me why getVA passes Addend to getSymVA. In fact,
it passes it as a mutable reference.
It turns out that it only matters for SHF_MERGE sections, and in
particular only for STT_SECTION symbols that are used as a hack for
reducing the number of local symbols (e.g. to avoid a local symbol for
each string in the string table).
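The structure in question looks roughly like this (signatures approximate):

  uint64_t SymbolBody::getVA(uint64_t Addend) const {
    uint64_t OutVA = getSymVA(*this, Addend); // may rewrite Addend
    return OutVA + Addend;
  }

For an STT_SECTION symbol pointing into a SHF_MERGE section, getSymVA maps
the original addend to the merged piece's output offset and updates Addend
accordingly, so a single section symbol plus addends can address every
string.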
llvm-svn: 296448
Summary:
Add usage count to find-all-symbols.
FindAllSymbols now finds (most!) main-file usages of the discovered symbols.
The per-TU map output has NumUses=0 or 1 (only one use per file is counted).
The reducer aggregates these to find the number of files that use a symbol.
The NumOccurrences is now set to 1 in the mapper rather than being inferred by
the reducer, for consistency.
The idea here is to use NumUses for ranking: intuitively number of files that
use a symbol is more meaningful than number of files that include the header.
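For example (hypothetical numbers): if a symbol is used five times in a.cc
and twice in b.cc, each per-TU map records NumUses=1 for it, and the reducer
reports NumUses=2, i.e. two files use the symbol.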
Reviewers: hokein, bkramer
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D30210
llvm-svn: 296446
Summary:
Currently, we assume that applications built with XRay would like to
have the instrumentation sleds patched before main starts. This patch
changes the default so that we do not patch the instrumentation sleds
before main. This default is more helpful for deploying applications in
environments where changing the current default is harder (i.e. on
remote machines, or work-pool-like systems).
This default (not to patch pre-main) makes it easier to selectively run
applications with XRay instrumentation enabled than the previous default
did.
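With this default, an application can still enable instrumentation on
demand through the XRay API. A minimal sketch (assumes the xray_interface.h
header shipped with compiler-rt):

  #include "xray/xray_interface.h"

  int main() {
    __xray_patch();    // patch the sleds now; returns an XRayPatchingStatus
    // ... instrumented work ...
    __xray_unpatch();  // restore the unpatched sleds
  }

Alternatively, running with XRAY_OPTIONS="patch_premain=true" (assuming that
flag name) restores the old pre-main patching behaviour.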
Reviewers: echristo, timshen
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D30396
llvm-svn: 296445
This is also useful in cases when llvm is in a shared library. First we dlopen
the llvm shared library and then we register it as a permanent library in order
to keep the JIT and other services working.
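A minimal sketch of that flow, assuming llvm::sys::DynamicLibrary and a
hypothetical library name:

  #include "llvm/Support/DynamicLibrary.h"
  #include <string>

  bool loadLLVMPermanently() {
    std::string Err;
    // dlopens the library and registers it permanently, so JIT symbol
    // resolution keeps working for the lifetime of the process.
    // LoadLibraryPermanently returns true on failure.
    return !llvm::sys::DynamicLibrary::LoadLibraryPermanently("libLLVM.so",
                                                              &Err);
  }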
Patch reviewed by Vedant Kumar (D29955)!
llvm-svn: 296442
Summary:
With this change, the ImplicitNullCheck optimization uses alias analysis
and can use a load/store memory access for the implicit null check even if
there are other loads/stores before it, as long as those memory accesses do
not alias.
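A minimal illustration of the newly handled case (names hypothetical):

  // The store to Q sits between the explicit null check and the load from P.
  // If alias analysis proves *Q and *P do not alias, the load from P can
  // still be hoisted past the store and serve as the implicit null check.
  int f(int *P, int *Q) {
    if (P == nullptr)
      return -1;
    *Q = 42;
    return *P;
  }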
Patch by Serguei Katkov!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30331
llvm-svn: 296440
In other places in LLD, we use write32<E> instead of Elf_Word.
This patch uses the same technique in the hash table classes.
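The difference, roughly (a sketch; function and parameter names
hypothetical):

  #include "llvm/Support/Endian.h"
  #include <cstdint>

  template <llvm::support::endianness E>
  void writeHashEntry(uint8_t *Buf, uint32_t V) {
    // Byte-order-aware store; no need to go through the target's Elf_Word.
    llvm::support::endian::write32<E>(Buf, V);
  }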
The hash table classes need improving as they have almost no
comments. We at least need to describe the hash table structure
and why we have to support two different on-disk hash tables
for the same purpose. I'll do that later.
llvm-svn: 296439
Previously, these two functions put their symbols in different queues.
Now that the queues have been merged, there's no point in keeping two
separate functions.
llvm-svn: 296435
That function doesn't use any member of SymbolTableSection, so I
couldn't see a reason to make it a member of that class. The function
takes a SymbolBody, so it is more natural to make it a member of
SymbolBody.
llvm-svn: 296433
Previously, there were three conditions: .symtab, .dynsym, or we are
producing a relocatable output. It turned out that the third condition is
the same as the first one. This patch removes the third condition and
simplifies the code.
llvm-svn: 296431
We really need to find a way to get this info from a single point of
truth in the LLVM backend, but it seems that the EM_* constants are
buried deep inside the constructors of the MCAsmBackend subclasses.
For now, just fill in entries as we run into cases. AFAIK these mappings
are largely immutable, so we get a 75% discount on the technical debt
(code is duplicated, but little chance of divergence).
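The duplicated mapping is a small table of this shape (a sketch; assumes
llvm::Triple and the ELF::EM_* constants, not the exact code):

  #include "llvm/ADT/Triple.h"
  #include "llvm/Support/ELF.h"
  #include <cstdint>

  static uint16_t getEMachine(const llvm::Triple &T) {
    switch (T.getArch()) {
    case llvm::Triple::x86:    return llvm::ELF::EM_386;
    case llvm::Triple::x86_64: return llvm::ELF::EM_X86_64;
    case llvm::Triple::arm:    return llvm::ELF::EM_ARM;
    default:                   return llvm::ELF::EM_NONE;
    }
  }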
llvm-svn: 296429
The previous code was a bit hard to understand because it unnecessarily
distinguished local and non-local symbols. It had a NumLocals member
variable, but that variable didn't hold the number of local symbols; it
held some value that I cannot describe easily.
This patch rewrites SymbolTableSection::finalizeContents and
SymbolTableSection::writeTo to make them easy to understand. The NumLocals
member variable has been removed, and writeGlobalSymbols and
writeLocalSymbols have been merged into one function.
There's still a piece of code that I think is unnecessary. I'm not removing
that code in this patch, but will do so in a follow-up patch.
llvm-svn: 296423
Summary: Points the user to look at function pointer assignments.
Reviewers: kcc, eugenis, kubamracek
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30432
llvm-svn: 296419
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html
The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.
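As a source-level analogy (hypothetical code; the pass actually operates on
machine instruction sequences just before emission):

  // After outlining: both call sites share one copy of the repeated sequence.
  static int outlined(int v, int w) { return ((v + w) * 3) ^ 0x55; }
  int f(int a, int b) { return outlined(a, b); } // was ((a + b) * 3) ^ 0x55
  int g(int c, int d) { return outlined(c, d); } // was ((c + d) * 3) ^ 0x55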
This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.
The outliner is run like so:
clang -mno-red-zone -mllvm -enable-machine-outliner file.c
Patch by Jessica Paquette <jpaquette@apple.com>!
rdar://29166825
Differential Revision: https://reviews.llvm.org/D26872
llvm-svn: 296418
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
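A reduced computed-goto dispatch loop of the kind that motivates this
(hypothetical opcodes, GNU computed-goto extension; real interpreters such
as python 2.7's have ~100 handlers):

  // LLVM tends to merge the "goto *dispatch[...]" sites into one indirectbr,
  // so each handler block ends up with a single indirectbr predecessor.
  int run(const unsigned char *ops) {
    static void *dispatch[] = {&&op_add, &&op_sub, &&op_halt};
    int acc = 0;
    goto *dispatch[*ops];
  op_add: acc += 1; goto *dispatch[*++ops];
  op_sub: acc -= 1; goto *dispatch[*++ops];
  op_halt: return acc;
  }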
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit ~100
defs of registers containing constants in the predecessor block, where only
one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about 100 constants to stack. The end result is that a
clang-compiled python interpreter can be ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
llvm-svn: 296416
Before, the endianness was specified on each call to read
or write of the StreamReader / StreamWriter, but in practice
it's extremely rare for streams to have data encoded in
multiple different endiannesses, so we should optimize for the
99% use case.
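A sketch of the resulting API shape (type and method names hypothetical):

  // Endianness is fixed once, at construction...
  StreamReader Reader(Stream, llvm::support::little);
  uint32_t Count;
  Reader.readInteger(Count); // ...instead of being passed to every call.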
This makes the code cleaner and more general, but is otherwise NFC.
llvm-svn: 296415