Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not freezed, instsimplify or the later pipeline can potentially exploit undefined behavior.
The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).
Reviewers: jdoerfert, spatel, lebedev.ri, efriedma
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76179
Global symbols with linker-private prefixes should be resolvable across object
boundaries, but internal symbols with linker-private prefixes should not.
This is the first in a series of patches to enable Basic Block Sections
in LLVM.
We introduce a new compiler option, -fbasicblock-sections=, which places every
basic block in a unique ELF text section in the object file along with a
symbol labeling the basic block. The linker can then order the basic block
sections in any arbitrary sequence which when done correctly can encapsulate
block layout, function layout and function splitting optimizations. However,
there are a couple of challenges to be addressed for this to be feasible:
1) The compiler must not allow any implicit fall-through between any two
adjacent basic blocks as they could be reordered at link time to be
non-adjacent. In other words, the compiler must make a fall-through
between adjacent basic blocks explicit by retaining the direct jump
instruction that jumps to the next basic block. These branches can only
be removed later by the linker after the blocks have been reordered.
2) All inter-basic block branch targets would now need to be resolved by
the linker as they cannot be calculated during compile time. This is
done using static relocations which bloats the size of the object files.
Further, the compiler tries to use short branch instructions on some ISAs
for branch offsets that can be accommodated in one byte. This is not
possible with basic block sections as the offset is not determined at
compile time, and long branch instructions have to be used everywhere.
3) Each additional section bloats object file sizes by tens of bytes. The
number of basic blocks can be potentially very large compared to the
size of functions and can bloat object sizes significantly. Option
fbasicblock-sections= also takes a file path which can be used to
specify a subset of basic blocks that needs unique sections to keep
the bloats small.
4) Debug Info and CFI need special handling and will be presented as
separate patches.
Basic Block Labels
With -fbasicblock-sections=labels, or when a basic block is placed in a
unique section, it is labelled with a symbol. This allows easy mapping of
virtual addresses from PMU profiles back to the corresponding basic blocks.
Since the number of basic blocks is large, the labeling bloats the symbol
table sizes and the string table sizes significantly. While the binary size
does increase, it does not affect performance as the symbol table is not
loaded in memory during run-time. The string table size bloat is kept very
minimal using a unary naming scheme that uses string suffix compression.
The basic blocks for function foo are named "a.BB.foo", "aa.BB.foo", ...
This turns out to be very good for string table sizes and the bloat in the
string table size for a very large binary is ~8 %. The naming also allows
using the --symbol-ordering-file option in LLD to arbitrarily reorder the
sections.
Differential Revision: https://reviews.llvm.org/D68063
Gives us coverage of splitting the v32i16/v64i8 when we have
avx512f and not avx512bw.
Considering making v32i16/v64i8 a legal type on avx512f which
needs this test coverage.
Renames the llvm/examples/LLJITExamples directory to llvm/examples/OrcV2Examples
since it is becoming a home for all OrcV2 examples, not just LLJIT.
See http://llvm.org/PR31103.
The name field is optional if the custom code is supplied, so this updates the
documentation for LangOpt and introduces a tablegen warning if both custom code
and a language option name are supplied.
Setting MLIR_TABLEGEN_EXE would prevent building the native tool which is used in cross-compiling
Differential Revision: https://reviews.llvm.org/D75299
Summary:
Cover a new use case when using a 'signed char' as an integer
might lead to issue with non-ASCII characters. Comparing
a 'signed char' with an 'unsigned char' using equality / unequality
operator produces an unexpected result for non-ASCII characters.
Reviewers: aaron.ballman, alexfh, hokein, njames93
Reviewed By: njames93
Subscribers: xazax.hun, cfe-commits
Tags: #clang, #clang-tools-extra
Differential Revision: https://reviews.llvm.org/D75749
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.
The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.
Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.
Reviewers: efriedma, reames, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75120
`PAddr` corresponds to `p_paddr` of a program header, which is the segment's physical
address for systems in which physical addressing is relevant. `p_paddr` is often equal
to `p_vaddr`, which is the virtual address of a segment.
This patch changes the default for `PAddr` from 0 to a value of `VAddr`.
Differential revision: https://reviews.llvm.org/D76131
Merge the INSERT_VECTOR_ELT/SCALAR_TO_VECTOR and PINSRW/PINSRB shuffle mask paths - they both do the same thing (find source vector + handle implicit zero extension). The PINSRW/PINSRB path also handled in the insertion of zero case which needed to be added to the general case as well.
Summary:
- remove stale declarations on flat affine constraints
- avoid allocating small vectors where possible
- clean up code comments, rename some variables
Differential Revision: https://reviews.llvm.org/D76117
transparent context.
(The same crash would happen if a class template with a friend was
declared in an 'export' block, but there are more issues with that
case.)
Summary:
When moving add and sub to memory operand instructions,
aarch64-ldst-opt would prematurally pop the stack pointer,
before memory instructions that do access the stack using
indirect loads.
e.g.
```
int foo(int offset){
int local[4] = {0};
return local[offset];
}
```
would generate:
```
sub sp, sp, #16 ; Push the stack
mov x8, sp ; Save stack in register
stp xzr, xzr, [sp], #16 ; Zero initialize stack, and post-increment, making it invalid
------ If an exception goes here, the stack value might be corrupted
ldr w0, [x8, w0, sxtw #2] ; Access correct position, but it is not guarded by SP
```
Reviewers: fhahn, foad, thegameg, eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, danielkiss, llvm-commits, simon_tatham
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75755
This is a little more complicated than I'd like it to be. We have
to manually match a trunc+srl+load pattern that generic DAG
combine won't do for us due to isTypeDesirableForOp.
Because we have to use a ConstantExpr at some point, the canonical form
isn't set in stone, but this seems reasonable.
The pretty sizeof(<vscale x 4 x i32>) dumping is a relic of ancient
LLVM; I didn't have to touch that code. :)
Differential Revision: https://reviews.llvm.org/D75887
Avoid copying of the orignal variable if it is going to be marked as
firstprivate in task regions. For taskloops, still need to copy the
non-trvially copyable variables to correctly construct them upon task
creation.
Summary:
To enable this, two changes are needed:
1) Add an optional attribute `padding` to linalg.conv.
2) Compute if the indices accessing is out of bound in the loops. If so, use the
padding value `0`. Otherwise, use the value derived from load.
In the patch, the padding only works for lowering without other transformations,
e.g., tiling, fusion, etc.
Differential Revision: https://reviews.llvm.org/D75722