In something like "ASSOCIATE(X=>T(1))", the "T(1)" is parsed
as a Variable because it looks like a function reference or
array reference; if it turns out to be a structure constructor,
which is something we can't know until we're able to attempt
generic interface resolution in semantics, the parse tree needs
to be fixed up by replacing the Variable with an Expr.
The compiler could already do this for putative function references
encapsulated as Exprs, so this patch moves some code around and
adds parser::Selector to the overloads of expression analysis.
Differential Revision: https://reviews.llvm.org/D103572
Building LLVM with the LLVM_ENABLE_MODULES cmake option fails when the modules are being compiled due to missing includes. This is a side effect of some transitive includes that changed recently.
Differential Revision: https://reviews.llvm.org/D103645
This moves the implementations for HandleTagMismatch, __hwasan_tag_mismatch4,
and HwasanAtExit from hwasan_linux.cpp to hwasan.cpp and declares them in hwasan.h.
This way, calls to those functions can be shared with the fuchsia implementation
without duplicating code.
Differential Revision: https://reviews.llvm.org/D103562
The constexpr-capable class evaluate::DynamicType represented
CHARACTER length only with a nullable pointer into the declared
parameters of types in the symbol table, which works fine for
anything with a declaration but turns out to not suffice to
describe the results of the ACHAR() and CHAR() intrinsic
functions. So extend DynamicType to also accommodate known
constant CHARACTER lengths, too; use them for ACHAR & CHAR;
clean up several use sites and fix regressions found in test.
Differential Revision: https://reviews.llvm.org/D103571
A recent change (D99683) to support ThinLTO for HIP caused a regression
when compiling cuda code with -flto=thin -fwhole-program-vtables.
Specifically, we now get an error:
error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
This error is coming from the device offload cc1 action being set up for
the cuda compile, for which -flto=thin doesn't apply and gets dropped.
This is a regression, but points to a potential issue that was silently
occurring before the patch, details below.
Before D99683, the check for fwhole-program-vtables in the driver looked
like:
if (WholeProgramVTables) {
if (!D.isUsingLTO())
D.Diag(diag::err_drv_argument_only_allowed_with)
<< "-fwhole-program-vtables"
<< "-flto";
CmdArgs.push_back("-fwhole-program-vtables");
}
And D.isUsingLTO() returned true since we have -flto=thin. However,
because the cuda cc1 compile is doing device offloading, which didn't
support any LTO, there was other code that suppressed -flto* options
from being passed to the cc1 invocation. So the cc1 invocation silently
had -fwhole-program-vtables without any -flto*. This seems potentially
problematic, since if we had any virtual calls we would get type test
assume sequences without the corresponding LTO pass that handles them.
However, with the patch, which adds support for device offloading LTO
option -foffload-lto=thin, the code has changed so that we set a bool
IsUsingLTO based on either -flto* or -foffload-lto*, depending on
whether this is the device offloading action. For the device offload
action in our compile, since we don't have -foffload-lto, IsUsingLTO is
false, and the check for LTO with -fwhole-program-vtables now fails.
What we should do is only pass through -fwhole-program-vtables to the
cc1 invocation that has LTO enabled (either the device offload action
with -foffload-lto, or the non-device offload action with -flto), and
otherwise drop the -fwhole-program-vtables for the non-LTO action.
Then we should error only if we have -fwhole-program-vtables without any
-f*lto* options.
Differential Revision: https://reviews.llvm.org/D103579
This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed.
The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile.
Differential Revision: https://reviews.llvm.org/D103620
A procedure pointer is allowed to name a specific intrinsic function
from F'2018 table 16.2 as its interface, but not other intrinsic
procedures. Catch this error, and thereby also fix a crash resulting
from a failure later in compilation from failed characteristics;
while here, also catch the similar error with initializers.
Differential Revision: https://reviews.llvm.org/D103570
In this particular example, we had a crash when compiling it
for several architectures. This patch extends the legalization
of extract_subvector to avoid this problem.
Differential Revision: https://reviews.llvm.org/D103344
As a benign extension common to other Fortran compilers,
accept BOZ literals in array constructors w/o explicit
types, treating them as integers.
Differential Revision: https://reviews.llvm.org/D103569
PPC_FP128 determines isZero/isNan/isInf using high-order double value
only. Checking isZero/isNegative might return the isNullValue unexpectedly.
eg:
0xM0000000000000000FFFFFFFFFFFFFFFFF
isZero, but it is not NullValue.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D103634
`__profd_*` variables are referenced by code only when value profiling is
enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just
waste space on ELF/Mach-O. We change the comdat symbol from `__profd_*` to
`__profc_*` because an internal symbol does not provide deduplication features
on COFF. The choice doesn't matter on ELF.
(In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_*` symbols.)
On Windows this enables further optimization. We are no longer affected by the
link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can
cause duplicate definition error.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html
We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585).
This avoids many `/INCLUDE:` directives in `.drectve`.
Here is rnk's measurement for Chrome:
```
This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:
#BEFORE
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867
$ du -cksh base_unittests.exe
82M base_unittests.exe
82M total
# AFTER
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499
$ du -cksh base_unittests.exe
78M base_unittests.exe
78M total
```
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103372
The code for folding calls to the intrinsic function CMPLX was
incorrectly dependent on the number of arguments to distinguish its
two cases (conversion from one kind of complex to another, and
composition of a complex value from real & imaginary parts).
This was wrong since the optional KIND= argument has already been
taken into account by intrinsic processing; instead, the type of
the first argument should decide the issue.
Differential Revision: https://reviews.llvm.org/D103568
One exit is unpredictable, the other has a known trip count. For
one function the predictable exit is the latch exit, for the other
the non-latch exit. Currently they are treated differently.
In error recovery situations, the mappings from source locations
to scopes were failing in a way that tripped some asserts.
Specifically, FindPureProcedureContaining() wasn't coping well
when starting at the global scope. (And since the global scope
no longer has a source range, clean up the Semantics constructor
to avoid confusion.)
Differential Revision: https://reviews.llvm.org/D103567
ca6751043d added a dependency on XAR (at
least for the shared libs build), so without this change we get the
following linker error:
Undefined symbols for architecture x86_64:
"_xar_close", referenced from:
lld::macho::BitcodeBundleSection::finalize() in SyntheticSections.cpp.o
Reviewed By: #lld-macho, int3, thakis
Differential Revision: https://reviews.llvm.org/D100999
This patch introduces a new tool, llvm-tapi-diff, that compares and returns the diff of two TBD files.
Reviewed By: ributzka, JDevlieghere
Differential Revision: https://reviews.llvm.org/D101835
If we're not emitting separate fences for the success/failure cases, we
need to pass the merged ordering to the target so it can emit the
correct instructions.
For the PowerPC testcase, we end up with extra fences, but that seems
like an improvement over missing fences. If someone wants to improve
that, the PowerPC backed could be taught to emit the fences after isel,
instead of depending on fences emitted by AtomicExpand.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33332 .
Differential Revision: https://reviews.llvm.org/D103342
Restructure handling of cfg-hide-unreachable-paths and
cfg-hide-deoptimize-paths options so as to make it easier
to introduce new types of hidden blocks.
The os_signpost API already captures the begin/end part and in
Instruments, this just adds visual noise that gets in the way of the
interesting data. By removing the redundant end text, the display in
Instruments gets even less cluttered.
rdar://78636200
Differential Revision: https://reviews.llvm.org/D103577
Most of our private headers need to be treated as submodules so that
Clang modules can export things correctly. Previous commits that split
monolithic headers into smaller chunks were unaware of this requirement,
and so this is being addressed in one fell swoop. Moving forward, most
new headers will need to have their own submodule (anything that's
conditionally included is exempt from this rule, which means `__support`
headers aren't made into submodules).
This hasn't been marked NFC, since I'm not 100% sure that's the case.
Differential Revision: https://reviews.llvm.org/D103551
Prior to this patch when you used `clang -module-file-info` clang would
delete the module on completion because the module was treated as an
output file.
This fixes the issue so you don't need to invoke cc1 directly to get
module file information.
Reviewed By: steven_wu, phosek
Differential Revision: https://reviews.llvm.org/D103547
Template args of outer types were not fully-qualified when calling getFullyQualifiedType() for inner types.
For simplicity the patch is a copy-paste of the same call from getFullyQualifiedType().
Reviewed at: https://reviews.llvm.org/D103039
This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly.
Differential Revision: https://reviews.llvm.org/D103584
No need to recalculate the cost of extractelements, just no need to
compensate the cost of all extractelements, need to check before if this
is actually going to be removed at the vectorization. Also, no need to
generate new extractelement instruction, we may just regenerate the
original one. It may improve the final vectorization.
Differential Revision: https://reviews.llvm.org/D102933
This cleans up the unroll action into two phases. Phase 1 does the mechanical act of unrolling, and leaves all conditional branches in place. Phase 2 optimizes away some of the conditional branches and then simplifies the loop. The primary benefit of the reordering is that we can delete some special cases dom tree update logic.
Differential Revision: https://reviews.llvm.org/D103561
tryToVectorizeList function allows to reorder only 2 scalars. Patch
allows to reorder >2 scalars. Also, to avoid possible regressions, it
allows extra vectorization of the remaining parts of the scalars
elements if possible.
Part of D57059.
Differential Revision: https://reviews.llvm.org/D103247
As noticed by NAKAMURA Takumi back in 2017, we cannot use
properlyDominates for std::stable_sort as properlyDominates only
partially orders blocks. That is, for blocks A, B, C, D, where A
dominates B and C dominates D, we have A == C, B == C, but A < B. This
is not a valid comparison function for std::stable_sort and causes
different results between libstdc++ and libc++. This change uses DFS
numbering to give deterministic results for all reachable blocks.
Unreachable blocks are ignored already, so do not need special
consideration.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103441
This is a followup to D103422. The DenseMapInfo implementations for
ArrayRef and StringRef are moved into the ArrayRef.h and StringRef.h
headers, which means that these two headers no longer need to be
included by DenseMapInfo.h.
This required adding a few additional includes, as many files were
relying on various things pulled in by ArrayRef.h.
Differential Revision: https://reviews.llvm.org/D103491
D101613 added some macros used by Microsofts SAL. D103425 uses `__pre`
and `__post`. They are also used by SAL and cause issues when used on
Windows. Add them to the blacklist making it easier to figure out what
the issue is.
Differential Revision: https://reviews.llvm.org/D103541
In the instruction referencing variable location model, we store variable
locations that point at PHIs in MachineFunction during register
allocation. Unfortunately, register coalescing can substantially change
the locations of registers, and so that PHI-variable-location side table
needs maintenence during the pass.
This patch builds an index from the side table, and whenever a vreg gets
coalesced into another vreg, update the index to record the new vreg that
the PHI happens in. It also accepts a limited range of subregister
coalescing, for example merging a subregister into a larger class.
Differential Revision: https://reviews.llvm.org/D86813