Update v4i64 -> v4f32/v4f64 uitofp costs based on the worst case costs from the script in D103695.
Fixes a few regressions before we start adding AVX costs for legalized types.
The compiler should not ignore UndefValue when gathering the scalars,
otherwise the resulting code may be less defined than the original one.
Also, grouped scalars to insert them at first to reduce the analysis in
further passes.
Differential Revision: https://reviews.llvm.org/D105275
Fixes bugs [[ https://bugs.llvm.org/show_bug.cgi?id=50580 | 50580 ]] and [[ https://bugs.llvm.org/show_bug.cgi?id=49446 | 49446 ]]
When compiling with -g "DBG_VALUE <reg>" instructions are added in the MIR, if such a instruction is inserted between instructions that use <reg> then MachineCopyPropagation invalidates <reg> , this causes some copies to not be propagated and causes differences in code generation (ex bugs 50580 and 49446 ). DBG_VALUE instructions should be ignored since they don't actually modify the register.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D104394
While this should not matter for most architectures (where the program
address space is 0), it is important for CHERI (and therefore Arm Morello).
We use address space 200 for all of our code pointers and without this
change we assert in the SelectionDAG handling of BlockAddress nodes.
It is also useful for AVR: previously programs targeting
AVR that attempt to read their own machine code
via a pointer to a label would instead read from RAM
using a pointer relative to the the start of program flash.
Reviewed By: dylanmckay, theraven
Differential Revision: https://reviews.llvm.org/D48803
If the store address does not dominate the matrix multiply, try to hoist
address computation instructions without side-effects and/or memory
reads before the multiply, to allow fusion.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D105193
Support using the extended thread-id syntax with Hg packet to select
a subprocess. This makes it possible to start providing support for
running some of the debugger packets against another subprocesses.
Differential Revision: https://reviews.llvm.org/D100261
For files directly under clangd/, -Iclang-tools-extra/clangd (and the
equivalent for generated files) are not required, as CMake/the compiler puts
these directories on the include path by default.
However this means each subdirectory needs to
include_directories(.. ${CMAKE_CURRENT_BINARY_DIR}/..) etc, and this
proved annoying and error-prone to maintain and debug.
Since include_directories is inherited by subdirectories, we just
configure this explicitly at the top level instead.
Reland of 31859f896.
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Differential Revision: https://reviews.llvm.org/D104797
Introduce an integration test folder in the test/python subfolder and move the opsrun.py test into the newly created folder. The test verifies named operations end-to-end using both the yaml and the python path.
Differential Revision: https://reviews.llvm.org/D105276
Add the max operation to the OpDSL and introduce a max pooling operation to test the implementation. As MLIR has no builtin max operation, the max function is lowered to a compare and select pair.
Differential Revision: https://reviews.llvm.org/D105203
If linking directly against a DLL without an import library, the
DLL export symbols might not contain stdcall decorations.
If we have an undefined symbol with decoration, and we happen to have
a matching undecorated symbol (which either is lazy and can be loaded,
or already defined), then alias it against that instead.
This matches what's done in reverse, when we have a def file
declaring to export a symbol without decoration, but we only have
a defined decorated symbol. In that case we do a fuzzy match
(SymbolTable::findMangle). This case is more straightforward; if we
have a decorated undefined symbol, just strip the decoration and look
for the corresponding undecorated symbol name.
Add warnings and options for either silencing the warning or disabling
the whole feature, corresponding to how ld.bfd does it.
(This feature works for any symbol decoration mismatch, not only when
linking against a DLL directly; ld.bfd also tolerates it anywhere,
and also fixes up mismatches in the other direction, like
SymbolTable::findMangle, for any symbol, not only exports. But in
practice, at least for lld, it would primarily end up used for linking
against DLLs.)
Differential Revision: https://reviews.llvm.org/D104532
As the COFF linker is capable of linking directly against a DLL now
(after D104530, as long as it is running in mingw mode), don't error
out here but successfully load libraries specified with "-l" from DLLs
if that's what ld.bfd would have matched.
Differential Revision: https://reviews.llvm.org/D104531
GNU ld.bfd supports linking directly against DLLs without using an
import library, and some projects have picked up on this habit.
(There's no one single unsurmountable issue with using import
libraries, but this is a regularly surfacing missing feature.)
As long as one is linking by name (instead of by ordinal), the DLL
export table contains most of the information needed. (One can
inspect what section a symbol points at, to see if it's a function
or data symbol. The practical implementation of this loops over all
sections for each symbol, but as long as they're not very many, that
should hopefully be tolerable performance wise.)
One exception where the information in the DLL isn't entirely enough
is on i386 with stdcall functions; depending on how they're done,
the exported function name can be a plain undecorated name, while
the import library would contain the full decorated symbol name. This
issue is addressed separately in a different patch.
This is implemented mimicing the structure of a regular import library,
with one InputFile corresponding to the static archive that just adds
lazy symbols, which then are fetched when they are needed. When such
a symbol is fetched, we synthesize a coff_import_header structure
in memory and create a regular ImportFile out of it.
The implementation could be even smaller by just creating ImportFiles
for every symbol available immediately, but that would have the
drawback of actually ending up importing all symbols unless running
with GC enabled (and mingw mode defaults to having it disabled for
historical reasons).
Differential Revision: https://reviews.llvm.org/D104530
With 'for' loop there is is a single place where 'Current' is adjusted. It helps to avoid copy paste and makes a bit easy to understand overall loop controll flow.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D101044
Previously we used the vector type, but we're loading/storing
invididual elements so I think only element alignment should matter.
Noticed while looking at the code for something else so I don't
have a test case.
Differential Revision: https://reviews.llvm.org/D105220
We have been creating many ConcatInputSections with identical values due
to .subsections_via_symbols. This diff factors out the identical values
into a Shared struct, to reduce memory consumption and make copying
cheaper.
I also changed `callSiteCount` from a uint32_t to a 31-bit field to save an
extra word.
All in all, this takes InputSection from 120 to 72 bytes (and
ConcatInputSection from 160 to 112 bytes), i.e. 30% size reduction in
ConcatInputSection.
Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
N Min Max Median Avg Stddev
x 20 4.14 4.24 4.18 4.183 0.027548999
+ 20 4.04 4.11 4.075 4.0775 0.018027756
Difference at 95.0% confidence
-0.1055 +/- 0.0149005
-2.52211% +/- 0.356215%
(Student's t, pooled s = 0.0232803)
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D105305
`__cfstring` is a special literal section, so instead of breaking it up
at symbol boundaries, we break it up at fixed-width boundaries (since
each literal is the same size). Symbols can only occur at one of those
boundaries, so this is strictly more powerful than
`.subsections_via_symbols`.
With that in place, we then run the section through ICF.
This change is about perf-neutral when linking chromium_framework.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D105045
This is a pretty big refactoring diff, so here are the motivations:
Previously, ICF ran after scanRelocations(), where we emitting
bind/rebase opcodes etc. So we had a bunch of redundant leftovers after
ICF. Having ICF run before Writer seems like a better design, and is
what LLD-ELF does, so this diff refactors it accordingly.
However, ICF had two dependencies on things occurring in Writer: 1) it
needs literals to be deduplicated beforehand and 2) it needs to know
which functions have unwind info, which was being handled by
`UnwindInfoSection::prepareRelocations()`.
In order to do literal deduplication earlier, we need to add literal
input sections to their corresponding output sections. So instead of
putting all input sections into the big `inputSections` vector, and then
filtering them by type later on, I've changed things so that literal
sections get added directly to their output sections during the 'gather'
phase. Likewise for compact unwind sections -- they get added directly
to the UnwindInfoSection now. This latter change is not strictly
necessary, but makes it easier for ICF to determine which functions have
unwind info.
Adding literal sections directly to their output sections means that we
can no longer determine `inputOrder` from iterating over
`inputSections`. Instead, we store that order explicitly on
InputSection. Bloating the size of InputSection for this purpose would
be unfortunate -- but LLD-ELF has already solved this problem: it reuses
`outSecOff` to store this order value.
One downside of this refactor is that we now make an additional pass
over the unwind info relocations to figure out which functions have
unwind info, since want to know that before `processRelocations()`. I've
made sure to run that extra loop only if ICF is enabled, so there should
be no overhead in non-optimizing runs of the linker.
The upside of all this is that the `inputSections` vector now contains
only ConcatInputSections that are destined for ConcatOutputSections, so
we can clean up a bunch of code that just existed to filter out other
elements from that vector.
I will test for the lack of redundant binds/rebases in the upcoming
cfstring deduplication diff. While binds/rebases can also happen in the
regular `.text` section, they're more common in `.data` sections, so it
seems more natural to test it that way.
This change is perf-neutral when linking chromium_framework.
Reviewed By: oontvoo
Differential Revision: https://reviews.llvm.org/D105044
There were some missing bazel dependencies for the Tosa dialect.
Added these deps.
Reviewed By: GMNGeoffrey
Differential Revision: https://reviews.llvm.org/D105326
Tosa's PassDetail.h may be used in non-TOSA transforms. Include
TosaDialect to avoid transient dependency.
Differential Revision: https://reviews.llvm.org/D105324
In `IRTranslator::translateGetElementPtr`, when we run into a vector gep with
some scalar operands, we try to normalize those operands using
`buildSplatVector`.
This is fine except for when the getelementptr has a <1 x N> type. In that case
it is treated as a scalar. If we run into one of these then every call to
```
// With VectorWidth = 1
LLT::fixed_vector(VectorWidth, PtrTy)
```
will assert.
Here's an example (equivalent to the added testcase):
https://godbolt.org/z/hGsTnMYdW
To get around this, this patch adds a variable, `WantSplatVector`, which
is true when our vector type ought to actually be represented using a vector.
When it's false, we'll translate as a scalar. This checks if `VectorWidth > 1`.
This fixes this bug:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35496
Differential Revision: https://reviews.llvm.org/D105316
Added InferReturnTypeComponents for NAry operations, reshape, and reverse.
With the additional tosa-infer-shapes pass, we can infer/propagate shapes
across a set of TOSA operations. Current version does not modify the
FuncOp type by inserting an unrealized conversion cast prior to any new
non-matchin returns.
Differential Revision: https://reviews.llvm.org/D105312
Target-independent code only knows how to spill to the stack; instead,
use AArch64ISD::REINTERPRET_CAST.
Differential Revision: https://reviews.llvm.org/D104573
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted which leads to an issue where the
linker might choose either of the weak symbols potentially disabling the
runtime counter relocation.
This change replaces the use of weak definition inside the runtime with
an external weak reference to address the issue. We also place the
compiler generated symbol inside a COMDAT group so dead definition can
be garbage collected by the linker.
Differential Revision: https://reviews.llvm.org/D105176
The context can be created with threading disabled, to avoid creating a thread pool
that may be destroyed when injecting another one later.
Differential Revision: https://reviews.llvm.org/D105302
This patch includes the following changes to address a few issues when
using hidden helper task.
- Assertion is triggered when there are inadvertent calls to hidden
helper functions on non-Linux OS
- Added deinit code in __kmp_internal_end_library function to fix random
shutdown crashes
- Moved task data access into the lock-guarded region in __kmp_push_task
Differential Revision: https://reviews.llvm.org/D105308
This is the cause of the miscompile in:
https://llvm.org/PR50944
The problem has likely existed for some time, but it was made visible with:
5af8bacc94 ( D104661 )
handleOtherCmpSelSimplifications() assumed it can convert select of
constants to bool logic ops, but that does not work with poison.
We had a very similar construct in InstCombine, so the fix here
mimics the fix there.
The bug is in instsimplify, but I'm not sure how to reproduce it outside of
instcombine. The reason this is visible in instcombine is because we have a
hack (FIXME) to bypass simplification of a select when it has an icmp user:
955f125899/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (L2632)
So we get to an unusual case where we are trying to simplify an instruction
that has an operand that would have already simplified if we had processed
it in normal order.
Differential Revision: https://reviews.llvm.org/D105298