We don't know which test suites are going to be included by runtimes
builds so we cannot include those before running the sub-build, but
that's not possible during the LLVM build configuration. We instead use
a response file that's populated by the runtimes build as a level of
indirection.
This addresses the issue described in:
https://discourse.llvm.org/t/cmake-regeneration-is-broken/62788
Differential Revision: https://reviews.llvm.org/D132438
This allows reading arguments from file using the response file syntax.
We would like to use this in the LLVM build to pass test suites from
subbuilds.
Differential Revision: https://reviews.llvm.org/D132437
The pass tablegen backend has been reworked to remove the monolithic nature of the autogenerated declarations.
The pass public header can be generated with the -gen-pass-decls option. It contains options structs and registrations: the inclusion of options structs can be controlled individually for each pass by defining the GEN_PASS_DECL_PASSNAME macro; the declaration of the registrations have been kept together and can still be included by defining the GEN_PASS_REGISTRATION macro.
The private code used for the pass implementation (i.e. the pass base class and the constructors definitions, if missing from tablegen) can be generated with the -gen-pass-defs option. Similarly to the declarations file, the definitions of each pass can be enabled by defining the GEN_PASS_DEF_PASNAME variable.
While doing so, the pass base class has been enriched to also accept a the aformentioned struct of options and copy them to the actual pass options, thus allowing each pass to also be configurable within C++ and not only through command line.
Reviewed By: rriddle, mehdi_amini, Mogball, jpienaar
Differential Revision: https://reviews.llvm.org/D131839
The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting
down generated `alloca` instructions as well as meaningless `store`s and
this behavior can leave unused (dead) arguments. To eliminate the dead
arguments and therefore let the DeadCodeElimination remove becoming dead
inserted `GEP`s as well as `load`s and `cast`s in the callers, the
DeadArgumentElimination pass should be run after the ArgumentPromotion
one.
Differential Revision: https://reviews.llvm.org/D128830
This will make it easier to find the LICENSE and some
software also looks in the root to automatically find it.
Reviewed By: kristof.beyls, lattner
Differential Revision: https://reviews.llvm.org/D132018
This removes -O1 from the SVE ACLE intrinsics tests and replaces it with
-O0 and "opt -mem2reg -instcombine -tailcallelim". Instrcombine and
TailCallElim are only added to keep the differences smaller and can be
removed in a followup patches. The only remaining differences in the
tests are tbaa nodes not being emitted under -O0, and the removable of
some tailcall flags.
Add zihintntl compressed instructions and some files related to zihintntl.
This patch is base on {D121670}.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D121779
This drops the artificial requirement of producers having a single
result value to be able to fuse with consumers.
The current default also only fuses producer with consumer when the
producer has a single use. This is a simplifying assumption. There are
legitimate use cases where a producer can be fused with consumer and
the fused o pcould be used to replace the uses of the producer as
well. This needs to be done with care to avoid use-def violations. To
allow for downstream users to explore more fusion opportunities, the
core transformation method is exposed as a utility function.
This patch also modifies the control function to take just the fused
operand as the argument. This is enough information for the callers to
get the producer and the consumer operations being considered to
fuse. It also provides information of which producer result is used.
Differential Revision: https://reviews.llvm.org/D132301
Summary: Currently, an error was reported when a thread local symbol has an invalid name. D100956 create a new symbol to prefix the TLS symbol name with a dot. When the symbol name is renamed, the error occurs. This patch uses the original symbol name (name in the symbol table) as the input for the symbol for TOC entry.
Reviewed By: shchenz, lkail
Differential Revision: https://reviews.llvm.org/D132348
In branch relaxation pass, `j`'s with offset over 1MiB will be relaxed
to `jump` pseudo-instructions.
This patch allocates a stack slot for functions with a size greater than
1MiB. If the register scavenger cannot find a scratch register for
`jump`, spill a register to the slot before the jump and restore it
after the jump.
.mbb:
foo
j .dest_bb
bar
bar
bar
.dest_bb:
baz
The above code will be relaxed to the following code.
.mbb:
foo
sd s11, 0(sp)
jump .restore_bb, s11
bar
bar
bar
j .dest_bb
.restore_bb:
ld s11, 0(sp)
.dest_bb:
baz
Depends on D129999.
Reviewed By: StephenFan
Differential Revision: https://reviews.llvm.org/D130560
If a MachineInstr's operand should be Reg in compiler's output but is
currently FrameIndex, `isCompressibleInst()` will terminate at
`MachineOperandType::getReg()`.
This patch adds `.isReg()` checks to make `isCompressibleInst()` return
false for these MachineInstr, allowing `getInstSizeInBytes()` to return
a value and `EstimateFunctionSizeInBytes()` to work as intended.
See https://reviews.llvm.org/D129999#3694222 for details.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D129999
The array size specification of the an alloca can be any integer,
so zext or trunc it to intptr before attempting to multiply it
with an intptr constant.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D131846
We were passing the type of `Val` to `getShadowOriginPtr`, rather
than the type of `Val`'s shadow resulting in broken IR. The fix
is simple.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D131845
If `LLVM_BUILD_MAIN_SRC_DIR` is not defined, just assume we are in
regular monorepo layout. Non-standard (and not really supported) layouts
can still be configured manually.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D132314
extractShiftForRotate may fail to return canonicalized shifts due to constant folding or other simplification that can occur in getNode()
Fixes Issue #57283
This change came in a few hours ago and introduced a warning. The fix
is trivial, so I'm providing it. The original change was reviewed here:
https://reviews.llvm.org/D132331
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
255 to 233 LoC.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132104
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
295 to 255 LoC.
Differential Revision: https://reviews.llvm.org/D132101
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
338 to 295 LoC.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132100
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
377 to 338 LoC.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132099
We're seeing the following warnings with --rtlib=compiler-rt:
lld-link: warning: ignoring unknown argument '--as-needed'
lld-link: warning: ignoring unknown argument '-lunwind'
lld-link: warning: ignoring unknown argument '--no-as-needed'
MSVC doesn't use the unwind library, so just omit it.
Differential Revision: https://reviews.llvm.org/D132440
Some tests still pass even though we don't claim big-endian support. Using
UNSUPPORTED is a better indicator than XFAIL that we don't guarantee that
the tests work.
Dialects can opt-in to providing custom encodings by implementing the
`BytecodeDialectInterface`. This interface provides hooks, namely
`readAttribute`/`readType` and `writeAttribute`/`writeType`, that will be used
by the bytecode reader and writer. These hooks are provided a reader and writer
implementation that can be used to encode various constructs in the underlying
bytecode format. A unique feature of this interface is that dialects may choose
to only encode a subset of their attributes and types in a custom bytecode
format, which can simplify adding new or experimental components that aren't
fully baked.
Differential Revision: https://reviews.llvm.org/D132498
This moves some parsing functionality from BytecodeReader to
AttrTypeReader, and removes some duplication between the attribute/type
code paths.
Differential Revision: https://reviews.llvm.org/D132497
This extracts the string section writer and reader into dedicated
classes, which better separates the logic and will also simplify future
patches that want to interact with the string section.
Differential Revision: https://reviews.llvm.org/D132496
The existing code resulted in the max size and access counts being equal
to the min. Compute the max instead (max lifetime was already correct).
Differential Revision: https://reviews.llvm.org/D132515
The constructors of non-POD array elements are evaluated under
certain conditions. This patch makes sure that in such cases
we also evaluate the destructors.
Differential Revision: https://reviews.llvm.org/D130737
Keep the original data type of integer do-variables
for structured loops. When do-variable's data type
is an integer type shorter than IndexType, processing
the do-variable separately from the DoLoop's iteration index
allows getting rid of type casts, which can make backend
optimizations easier.
For example,
```
do i = 2, n-1
do j = 2, n-1
... = a(j-1, i)
end do
end do
```
If value of 'j' is computed by casting the DoLoop's iteration
index to 'i32', then Flang will produce the following LLVM IR:
```
%1 = trunc i64 %iter_index to i32
%2 = sub i32 %1, 1
%3 = sext i32 %2 to i64
```
LLVM's InstCombine may try to get rid of the sign extension,
and may transform this into:
```
%1 = shl i64 %iter_index, 32
%2 = add i64 %1, -4294967296
%3 = ashr exact i64 %2, 32
```
The extra computations for the element address applied on top
of this awkward pattern confuse LLVM vectorizer so that
it does not recognize the unit-strided access of 'a'.
Measured performance improvements on `SPEC CPU2000@IceLake`:
```
168.wupwise: 11.96%
171.swim: 11.22%
172.mrgid: 56.38%
178.galgel: 7.29%
301.apsi: 8.32%
```
Differential Revision: https://reviews.llvm.org/D132176
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)
Adjust the legality check to avoid the poor codegen on AArch64.
We probably only want to use popcount on this pattern when it
is a single instruction.
fixes#57225
Differential Revision: https://reviews.llvm.org/D132237
The existing tests were added with 2880d7b9e4, but
discussion in D132412 suggests that we should start
with a simpler pattern (the more complicated pattern
may not be a real problem).