Handy when testing specific files; this is already supported by other components.
Example:
cd build; ./bin/llvm-lit ../compiler-rt/test/tsan/ignore_free.cpp
Differential Revision: https://reviews.llvm.org/D103054
During the generic x86-64 support refactor in ecf6466f01 the implementation
of MachO_arm64_GOTAndStubsBuilder::isGOTEdgeToFix was altered to only return
true for external symbols. This behavior is incorrect: GOT entries may be
required for defined symbols (e.g. in the large code model).
This patch fixes the bug and adds a test case for it (renaming an old test
case to avoid any ambiguity).
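A sketch of the shape of the fix, assuming JITLink's MachO arm64 GOT edge
kinds (illustrative, not the exact in-tree predicate):

  bool isGOTEdgeToFix(Edge &E) const {
    // Decide based on the edge kind alone: GOT entries may be required
    // for defined symbols too (e.g. in the large code model), not only
    // for external ones.
    return E.getKind() == GOTPage21 || E.getKind() == GOTPageOffset12;
  }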
When memoized values for a SCEV expression are dropped, we also
drop all BECounts that make use of the SCEV expression. This is done
by iterating over all the ExitNotTaken counts and (recursively)
checking whether they use the SCEV expression. If there are many
exits, this will take a lot of time.
This patch improves the situation by pre-computing a set of all
used operands, so that we can determine whether a certain BEInfo
needs to be invalidated using a simple set lookup. We still need
to loop over all BEInfos, though.
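A rough sketch of the idea, with hypothetical names (the actual data
structures in ScalarEvolution differ):

  // Each BackedgeTakenInfo keeps a precomputed set of every SCEV its
  // exit counts use, filled once when the counts are computed.
  SmallPtrSet<const SCEV *, 8> Operands;

  // Dropping an expression S then costs one set lookup per BEInfo
  // instead of a recursive walk over every ExitNotTaken count.
  for (auto &Entry : BackedgeTakenCounts)
    if (Entry.second.Operands.contains(S))
      Entry.second = BackedgeTakenInfo(); // invalidate (illustrative)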
This makes for a mild improvement on non-degenerate cases:
https://llvm-compile-time-tracker.com/compare.php?from=b661a55a253f4a1cf5a0fbcb86e5ba7b9fb1387b&to=be1393f450e594c53f0ad7e62339a6bc831b16f6&stat=instructions
For the degenerate case from https://bugs.llvm.org/show_bug.cgi?id=50384,
for n=128 I'm seeing run time drop from 1.6s to 1.1s.
Differential Revision: https://reviews.llvm.org/D102796
This diff paves the way for {D102964} which adds a new kind of
InputSection.
We previously maintained section ordering implicitly: we created
InputSections as we parsed each file in command-line order, and passed
on this ordering when we created OutputSections and OutputSegments by
iterating over these InputSections. The implicitness of the ordering
made it difficult to refactor the code to e.g. handle a new type of
InputSection. As such, I've codified the ordering explicitly via
`inputOrder` fields. This also allows us to use `sort` instead of
`stable_sort`.
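A minimal sketch of what the explicit key enables (names illustrative):

  // inputOrder is a unique, explicit key, so there are no ties and a
  // plain (unstable) sort already yields a deterministic order.
  llvm::sort(inputSections, [](const InputSection *a, const InputSection *b) {
    return a->inputOrder < b->inputOrder;
  });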
Benchmarking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
      N      Min      Max   Median      Avg       Stddev
x    20     4.23     4.35     4.27    4.274  0.030157481
+    20     4.24     4.38     4.27   4.2815  0.033759989
No difference proven at 95.0% confidence
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D102972
The ELF format has the concept of merge sections (marked by SHF_MERGE),
which contain data that can be safely deduplicated. The Mach-O
equivalents are called literal sections (marked by S_CSTRING_LITERALS or
S_{4,8,16}BYTE_LITERALS). Since the Mach-O format doesn't use the word
'merge', I've renamed our MergedOutputSection to ConcatOutputSection to
avoid confusion. I believe it's a more descriptive name, too.
This renaming sets the stage for {D102964}.
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D102971
* Move `static_assert`s into the cpp file instead of the header. I noticed
they had been separated from the main class definition in the header, so I
set about cleaning that up, then figured they made more sense in the cpp
file so as not to incur unnecessary compile-time overhead (a minimal sketch
follows this list).
* Remove unnecessary `virtual`s
* Remove unnecessary comment / reword another comment
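For illustration, a minimal sketch with a hypothetical class:

  // Foo.cpp (hypothetical): the assertion is checked once, here, instead
  // of being re-evaluated in every translation unit that includes Foo.h.
  #include "Foo.h"
  static_assert(sizeof(Foo) <= 64, "Foo should stay small");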
The common phi value transform replaces constants with values that
have the same value as the constant on a given edge. However, LVI
generally only provides information that is correct up to poison,
so this can end up replacing a well-defined value with poison.
D69442 addressed an instance of this problem by clearing poison
flags on the generating instruction, which was sufficient at the
time. rGa917fb89dc28 made LVI's edge value analysis slightly more
powerful, and clearing poison flags is no longer sufficient.
This patch changes the transform to instead explicitly guard against
a poison value. This should be satisfied in most cases due to a prior
branch on poison.
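A sketch of the shape of the guard, using LLVM's ValueTracking helper
(the surrounding details are illustrative):

  // Only substitute the edge value V for the constant when V is known
  // not to be undef/poison at the use site, e.g. because a prior branch
  // on V guarantees it is well-defined.
  if (!isGuaranteedNotToBeUndefOrPoison(V, /*AC=*/nullptr, CtxI, DT))
    continue; // keep the constant rather than risk introducing poison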
Fixes https://bugs.llvm.org/show_bug.cgi?id=50399.
Differential Revision: https://reviews.llvm.org/D102966
- When memory intrinsics such as memcpy are lowered, the attached scoped AA
metadata is not passed down to the backend. As a result, the backend
cannot schedule relevant memory operations around them following that
hint. In this patch, SelectionDAG is enhanced to propagate that
metadata (scoped AA only) when the intrinsics are lowered into loads
and stores.
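A rough sketch of the intent, with hypothetical locals (the in-tree
plumbing differs):

  // When expanding memcpy into explicit loads/stores, forward the scoped
  // alias metadata (alias.scope/noalias) from the intrinsic so the
  // scheduler can still move unrelated memory operations across the
  // expanded accesses.
  SDValue Load = DAG.getLoad(VT, dl, Chain, SrcPtr, SrcPtrInfo,
                             Alignment, MachineMemOperand::MONone, AAInfo);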
Differential Revision: https://reviews.llvm.org/D102215
Currently, AbstractOperation fields are function pointers.
Modifying them to unique_function allows them to contain
runtime information.
For instance, this allows operations to be defined at runtime.
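A minimal sketch of the difference (hypothetical example, not MLIR's
actual fields):

  #include "llvm/ADT/FunctionExtras.h"

  // A raw function pointer cannot carry state; unique_function can own a
  // capture, so the hook's behavior can be decided at runtime.
  llvm::unique_function<bool(unsigned)> makeHook(unsigned runtimeId) {
    return [runtimeId](unsigned id) { return id == runtimeId; };
  }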
Differential Revision: https://reviews.llvm.org/D103031
The semantics of select with undefined/poison condition
are not explicitly stated in the LangRef, but this matches
comments in the code and Alive2 appears to concur:
https://alive2.llvm.org/ce/z/KXytmd
We can find this pattern after demanded elements transforms.
As noted in D101191, fuzzers are finding infinite loops because
we may not account for this pattern in other passes.
This patch is the third in a series of patches fixing markdown links and references inside the mlir documentation.
This patch addresses all broken references to other markdown files and sections inside the Tutorials folder.
Differential Revision: https://reviews.llvm.org/D103017
Now that we can fold some transposes into multiplies (CM: A * B^t and RM:
A^t * B), we want to move them around to create the optimal expressions:
* fold away double transposes while still using them to assert the shape
* sink transposes hoping they cancel out
* lift transposes when both operands are transposed
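For example, the lift rewrite relies on the identity

  transpose(A) * transpose(B) == transpose(B * A)

so a product with two transposed operands needs only a single transpose of
the result, which can in turn cancel with or fold into surrounding
transposes.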
This also modifies the matrix remarks to include the number of exposed
transposes (i.e. transposes that we couldn't fold into a multiply).
The adjustment to the test remarks-inlining is a bit subtle: I am changing the
double transpose to a single transpose so that we don't remove it completely.
More importantly, this changes some of the total instruction counts, most
notably stores, because we can no longer use a vector store.
Differential Revision: https://reviews.llvm.org/D102733
Nowadays LLVM does not assume that all loops are finite,
so if we want to produce a finite loop from a potentially-infinite one,
we must ensure that the original loop is known to be a finite one.
For this transform, it only matters for arithmetic right-shifts.
For them, either the function or the loop must be known to
be `mustprogress`, or the original value being shifted must be known
to be non-negative (because if the sign bit is set,
the value will never become zero, but will become `-1` in the "end").
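For example, for an i8 value starting at -8, successive arithmetic
right-shifts by one produce -4, -2, -1, -1, ...: the value never reaches
zero, so a loop that shifts until the value becomes zero would never
terminate.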
It would be really good for alive2 to actually complain about this,
but it currently does not: https://github.com/AliveToolkit/alive2/issues/726
Casting signed types to u64 breaks comparisons.
Also remove the doubled parentheses around operands.
Reviewed By: cryptoad, hctim
Differential Revision: https://reviews.llvm.org/D103060
Make sure that if SCUDO_DEBUG=1 is set for the tests,
the same is set for the scudo library itself.
Reviewed By: cryptoad, hctim
Differential Revision: https://reviews.llvm.org/D103061
Update the paragraph on generic / indexed_generic to reflect the unification of these operations.
Differential Revision: https://reviews.llvm.org/D102775
llvm-profgen uses a profile-summary-based cold threshold to merge and trim cold context profiles. This strikes a good balance between profile size and performance.
We've been using 99.9% as the cutoff to save profile size without affecting performance, so this change switches llvm-profgen to use 99.9% instead of 99.9999% as the default cold threshold cutoff.
The redundant switch csprof-cold-thres is also removed and the tests cleaned up.
Differential Revision: https://reviews.llvm.org/D103071
A recent fix for problems with ENTRY statement handling didn't
correctly handle the case of a procedure dummy argument on an ENTRY
statement in an executable part; the code presumed that those dummy
arguments would be objects, not entities that might be either objects
or procedures. Fix.
Differential Revision: https://reviews.llvm.org/D103098
This function can change the register bank of registers that already have a
selected bank. Depending on the instructions where these registers are used,
this can cause instruction selection to fail.
The 2nd test is based on the fuzzer example in post-commit
comments of D101191 -
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661
The 1st test shows that we don't deal with this symmetrically.
We should be able to reduce both examples (possibly in
instsimplify instead of instcombine).
The parseInputFile function returns an empty unique_ptr to signal an
error, such as when the input file doesn't exist or is malformed. In this
case, the tool should exit immediately rather than segfault by
dereferencing the unique_ptr later.
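A minimal sketch of the guard (the surrounding code is hypothetical):

  std::unique_ptr<Module> M = parseInputFile(InputFilename, Context);
  if (!M) {
    // Bail out on error instead of dereferencing a null unique_ptr later.
    return 1;
  }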
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D102891
Removed some of the older raw "MLIRized" versions that are
no longer needed now that the sparse runtime support library
can focus on the proper sparse tensor types rather than the
opaque pointer approach of the past. This avoids legacy...
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D102960
We are using TOCEntry symbols like `LC..0` in TOC loads.
This is hard to read; at the least, it requires an additional step to
figure out which symbols are loaded.
We should print out the names in comments.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D102949
Match what's documented in the Intel AOM: the XMM variant of PSHUFB requires BOTH ports; this was being incorrectly modelled as EITHER port.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
As determined from llvm-mca analysis, AVX1-capable targets have a higher throughput for VPBLENDVB and shuffle ops, making it cheaper to perform shift+shuffle/select shift patterns.
A test in ir.c casts a void* to an integer type to print its address. This cast is currently done with the datatype `long`, which is only guaranteed to match the pointer width on LP64 systems. Other platforms may use a `long` that is narrower than a pointer: 64-bit Windows, for example, uses 32 bits for `long`, which does not match its 64-bit pointers.
This also results in a clang warning due to `-Wvoid-pointer-to-int-cast`.
Technically speaking, since the test only passes the value 42, this does not cause any issues, but it'd be nice to fix the warning at least.
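One portable way to write such a cast (a sketch, not necessarily the exact
change in this patch):

  #include <cinttypes>
  #include <cstdio>

  // std::uintptr_t matches the pointer width on every supported platform,
  // unlike `long`, which is only 32 bits on 64-bit Windows.
  void printAddr(void *p) {
    std::printf("%" PRIuPTR "\n", (std::uintptr_t)p);
  }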
Differential Revision: https://reviews.llvm.org/D103085
Said function had a few shortfalls:
- didn't set an abort message on Android
- was logged on several lines
- didn't provide extra information like the size requested if OOM'ing
This improves the function to address those points.
Differential Revision: https://reviews.llvm.org/D103034
Currently, BPF only contains three relocations:
R_BPF_NONE for no relocation
R_BPF_64_64 for LD_imm64 and normal 64-bit data relocation
R_BPF_64_32 for call insn and normal 32-bit data relocation
Also, the .BTF and .BTF.ext sections contain symbols in allocated
program and data sections. These two sections reserve 32-bit
space to hold the offset relative to the symbol's section.
When LLVM JIT is used, the LLVM ExecutionEngine RuntimeDyld
may attempt to resolve relocations for .BTF and .BTF.ext,
which we want to prevent. So we used R_BPF_NONE for such relocations.
This all works fine until we try to link multiple objects together.
- R_BPF_64_64 handling of LD_imm64 vs. normal 64-bit data
  is different, so lld target->relocate() needs more context
  to do a correct job.
- The same for R_BPF_64_32: more context is needed for
  lld target->relocate() to differentiate call insn vs.
  normal 32-bit data relocation.
- Since relocations in .BTF and .BTF.ext are set to R_BPF_NONE,
  they will not be relocated properly when multiple .BTF/.BTF.ext
  sections are merged by lld.
This patch intends to address this issue by adding additional
relocation kinds:
R_BPF_64_ABS64 for normal 64-bit data relocation
R_BPF_64_ABS32 for normal 32-bit data relocation
R_BPF_64_NODYLD32 for .BTF and .BTF.ext style relocations.
The old R_BPF_64_{64,32} semantics:
R_BPF_64_64 for LD_imm64 relocation
R_BPF_64_32 for call insn relocation
The existing R_BPF_64_64/R_BPF_64_32 mapping to numeric values
is maintained. They are the most common use cases for
bpf programs and we want to maintain backward compatibility
as much as possible.
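For reference, a sketch of the resulting relocation set (the
R_BPF_64_64/R_BPF_64_32 values are unchanged per the note above; the
remaining values follow LLVM's BPF.def and are shown for illustration):

  R_BPF_NONE          0
  R_BPF_64_64         1   // LD_imm64
  R_BPF_64_ABS64      2   // normal 64-bit data
  R_BPF_64_ABS32      3   // normal 32-bit data
  R_BPF_64_NODYLD32   4   // .BTF/.BTF.ext style, ignored by RuntimeDyld
  R_BPF_64_32        10   // call insn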
ExecutionEngine RuntimeDyld BPF relocations are adjusted as well.
R_BPF_64_{ABS64,ABS32} relocations will be resolved properly and
other relocations will be ignored.
Two tests are added for RuntimeDyld. Not handling R_BPF_64_NODYLD32 in
RuntimeDyldELF.cpp would result in a "Relocation type not implemented yet!"
fatal error.
FK_SecRel_4 usages in BPFAsmBackend.cpp and BPFELFObjectWriter.cpp
are removed as they are not triggered in the BPF backend.
The BPF backend used FK_SecRel_8 for LD_imm64 instruction operands.
Differential Revision: https://reviews.llvm.org/D102712