The transform wasn't checking that the LHS of the comparison
*is* the `X` in question...
This is the miscompile that was holding up D87188.
Thanks to Dave Green for producing an actionable reproducer!
The main change is to add a 'IsDecl' field to DIModule so
that when IsDecl is set to true, the debug info entry generated
for the module would be marked as a declaration. That way, the debugger
would look up the definition of the module in the gloabl scope.
Please see the comments in llvm/test/DebugInfo/X86/dimodule.ll
for what the debug info entries would look like.
Differential Revision: https://reviews.llvm.org/D93462
Temporarily revert commit 8b1c4e310c.
After 8b1c4e310c the compile-time for `MultiSource/Benchmarks/MiBench/consumer-lame`
dramatically increases with -O3 & LTO, causing issues for builders with
that configuration.
I filed PR48553 with a smallish reproducer that shows a 10-100x compile
time increase.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Craig Topper <craig.topper@sifive.com>
Differential Revision: https://reviews.llvm.org/D93514
Similar to D69312, and documented in D69839, the IRBuilder needs to add
the strictfp attribute to invoke instructions when constrained floating
point is enabled.
This is try 2, with the test corrected.
Differential Revision: https://reviews.llvm.org/D93134
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
Differential Revision: https://reviews.llvm.org/D92200
According to the documentation, if a spill is required to make a
register available and AllowSpill is false, then NoRegister should be
returned, however, this scenario was actually triggering an assertion
failure.
This patch moves the assertion after the handling of AllowSpill.
Authored by: Lewis Revill
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D92104
The SROA pass tries to be lazy for removing dead instructions that are collected during iterative run of the pass in the DeadInsts list. However it does not remove instructions from the dead list while running eraseFromParent() on those instructions.
This causes (rare) null pointer dereferences. For example, in the speculatePHINodeLoads() instruction, in the following code snippet:
```
while (!PN.use_empty()) {
LoadInst *LI = cast<LoadInst>(PN.user_back());
LI->replaceAllUsesWith(NewPN);
LI->eraseFromParent();
}
```
If the Load instruction LI belongs to the DeadInsts list, it should be removed when eraseFromParent() is called. However, the bug does not show up in most cases, because immediately in the same function, a new LoadInst is created in the following line:
```
LoadInst *Load = PredBuilder.CreateAlignedLoad(
LoadTy, InVal, Alignment,
(PN.getName() + ".sroa.speculate.load." + Pred->getName()));
```
This new LoadInst object takes the same memory address of the just deleted LI using eraseFromParent(), therefore the bug does not materialize. In very rare cases, the addresses differ and therefore, a dangling pointer is created, causing a crash.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D92431
The scope of R_TLS (TP offset relocation types (TPREL/TPOFF) used for the
local-exec TLS model) is actually narrower than its name may imply. R_TLS_NEG
is only used by Solaris R_386_TLS_LE_32.
Rename them so that they will be less confusing.
Reviewed By: grimar, psmith, rprichard
Differential Revision: https://reviews.llvm.org/D93467
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
vmov q0[2], q0[0], r2, r0
vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.
This patch adds some tablegen patterns for them, selecting from vector
inserts elements. Because the insert_elements are know to be
canonicalized to ascending order there are several patterns that we need
to select. These lane indices are:
3 2 1 0 -> vmovqrr 31; vmovqrr 20
3 2 1 -> vmovqrr 31; vmov 2
3 1 -> vmovqrr 31
2 1 0 -> vmovqrr 20; vmov 1
2 0 -> vmovqrr 20
With the top one being the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.
This is a recommit of 6cc3d80a84 after
fixing the backward instruction definitions.
Following up with the comments in D92706.
- Use -passes instead of -enable-new-pm
- CoroEarly should happen before AlwaysInliner, adjust it.
- Remove some unnecessary barriers (still kept one)
- Cleanup unnecessary debug info
Differential Revision: https://reviews.llvm.org/D93342
This only needs to be called once for the function, and it visits all
the necessary blocks in the function. It looks like
631f6b888c accidentally moved this into
the loop over all save blocks.
This updates the test for the `.arch_extension` as directive negatives
to properly enable the extensions being tested on the llvm-mc command
line before validating that the directive correctly disables them.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D93538
This adds support for the 'ls64' AArch64 extension to the `.arch_extension`
asm directive.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D92574
Followup to D92645 - remove the remaining places where we create X86ISD::SUBV_BROADCAST, and fold splatted vector loads to X86ISD::SUBV_BROADCAST_LOAD instead.
Remove all the X86SubVBroadcast isel patterns, including all the fallbacks for if memory folding failed.
Only show the keyword as the hover "Name".
Show whether the type is deduced or undeduced as
the hover "Documentation".
Show the deduced type (if any) as the "Definition".
Don't show any hover information for:
- the "auto" word of "decltype(auto)"
- "auto" in lambda parameters
- "auto" in template arguments
---------------
This diff is a suggestion based on what @sammccall suggested in https://reviews.llvm.org/D92977 about hover on "auto". It somehow "hacks" onto the "Documentation" and "Definition" fields of `HoverInfo`. It sure looks good on VSCode, let me know if this seem acceptable to you.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D93227
This is an enhancement motivated by https://llvm.org/PR16739
(see D92858 for another).
We can look through a GEP to find a base pointer that may be
safe to use for a vector load. If so, then we shuffle (shift)
the necessary vector element over to index 0.
Alive2 proof based on 1 of the regression tests:
https://alive2.llvm.org/ce/z/yPJLkh
The vector translation is independent of endian (verify by
changing to leading 'E' in the datalayout string).
Differential Revision: https://reviews.llvm.org/D93229
Currently, `ELFFile<ELFT>::getEntry` does not check an index of
an entry. Because of that the code might read past the end of the symbol
table silently. I've added a test to `llvm-readobj\ELF\relocations.test`
to demonstrate the possible issue. Also, I've added a unit test for
this method.
After this change, `getEntry` stops reporting the section index and
reuses the `getSectionContentsAsArray` method, which already has
all the validation needed. Our related warnings now provide
more and better context sometimes.
Differential revision: https://reviews.llvm.org/D93209
Redundant Copy Elimination was eliminating a MOVi32imm -1 when it
determined that the value of the destination register is already -1.
However, it didn't take into account that the MOVi32imm zeroes the upper
32 bits (which are FFFFFFFF) and therefore cannot be eliminated.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D93100
During isel there's no need to protect illegal types. Patch also
adds a missing unit test for tbl2 intrinsic using bfloat types.
Differential Revision: https://reviews.llvm.org/D93404
We currently reject this valid C construct by claiming it declares a
non-local variable: for (struct { int i; } s={0}; s.i != 0; s.i--) ;
We expected all declaration in the clause-1 declaration statement to be
a local VarDecl, but there can be other declarations involved such as a
tag declaration. This fixes PR35757.
This lint check is a part of the FLOCL (FPGA Linters for OpenCL)
project out of the Synergy Lab at Virginia Tech.
FLOCL is a set of lint checks aimed at FPGA developers who write code
in OpenCL.
The altera single work item barrier check finds OpenCL kernel functions
that call a barrier function but do not call an ID function. These
kernel functions will be treated as single work-item kernels, which
could be inefficient or lead to errors.
Based on the "Altera SDK for OpenCL: Best Practices Guide."
This patch fixes the following problem:
- open a file with references to the symbol `Foo`
- remove all references to `Foo` (from the dynamic index).
- `MergedIndex::refs()` result will contain positions of removed references (from the static index).
The idea of this patch is to keep a set of files which were used during index build inside the index.
Thus at processing the static index references we can check if the file of processing reference is a part of the dynamic index or not.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D93393
Nearly all of our lldb-server tests have two flavours (lldb-server and
debugserver). Each of them is tagged with an appropriate decorator, and
each of them starts with a call to a matching "init" method. The init
calls are mandatory, and it's not possible to meaningfully combine them
with a different decorator.
This patch leverages the existing decorators to also tag the tests with
the appropriate debug server tag, similar to how we do with debug info
flavours. This allows us to make the "init" calls from inside the common
setUp method.
This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate
addressing modes for scalable masked gathers & scatters.
selectGatherScatterAddrMode checks if the base pointer is null, in which case
we can swap the base pointer and the index, e.g.
getelementptr nullptr, <vscale x N x T> (splat(%offset)) + %indices)
-> getelementptr %offset, <vscale x N x T> %indices
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D93132
This is an addition to the existing Statistical Profiling extension, which
introduces an extra system register that is enabled by the new 'spe-eef'
subtarget feature.
Patch written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D92391
This introduces asm support for the Branch Record Buffer extension, through
the new 'brbe' subtarget feature. It consists of a new set of system registers
that enable the handling of branch records.
Patch written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D92389
This is split off from D91718 and adds a new target hook
supportsScalableVectors that can be queried to check if scalable vectors
are supported by the backend. For AArch64 this returns true if SVE is
enabled.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D93060
When querying the CDB, we stat the underlying file to check it hasn't changed.
We don't do this every time, but only if we didn't check within 5 seconds.
This behavior only exists for compile_commands.json and compile_flags.txt.
The CDB plugin system doesn't expose enough information to handle others.
Slight behavior change: we now only look for `build/compile_commands.json`
rather than trying every CDB strategy under `build` subdirectories.
Differential Revision: https://reviews.llvm.org/D92663
This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ
The intrinsics have overloaded source and result type and support
vector operands:
i32 @llvm.fptoui.sat.i32.f32(float %f)
i100 @llvm.fptoui.sat.i100.f64(double %f)
<4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
// etc
On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands
and one result. The second operand is an integer constant specifying
the scalar saturation width. The idea here is that initially the
second operand and the scalar width of the result type are the same,
but they may change during type legalization. For example:
i19 @llvm.fptsi.sat.i19.f32(float %f)
// builds
i19 fp_to_sint_sat f, 19
// type legalizes (through integer result promotion)
i32 fp_to_sint_sat f, 19
I went for this approach, because saturated conversion does not
compose well. There is no good way of "adjusting" a saturating
conversion to i32 into one to i19 short of saturating twice.
Specifying the saturation width separately allows directly saturating
to the correct width.
There are two baseline expansions for the fp_to_xint_sat opcodes. If
the integer bounds can be exactly represented in the float type and
fminnum/fmaxnum are legal, we can expand to something like:
f = fmaxnum f, FP(MIN)
f = fminnum f, FP(MAX)
i = fptoxi f
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
If the bounds cannot be exactly represented, we expand to something
like this instead:
i = fptoxi f
i = select f ult FP(MIN), MIN, i
i = select f ogt FP(MAX), MAX, i
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
It should be noted that this expansion assumes a non-trapping fptoxi.
Initial tests are for AArch64, x86_64 and ARM. This exercises all of
the scalar and vector legalization. ARM is included to test float
softening.
Original patch by @nikic and @ebevhan (based on D54696).
Differential Revision: https://reviews.llvm.org/D54749
The behaviour triggered with this flag is consistent with `-fparse-only`
in `flang` (i.e. the throwaway driver). This new spelling is consistent
with Clang and gfortran, and was proposed and agreed on for the new
driver in [1].
This patch also adds some minimal logic to communicate whether the
semantic checks have failed or not. When semantic checks fail, a
frontend driver error is generated. The return code from the frontend
driver is then determined by checking the driver diagnostics - the
presence of driver errors means that the compilation has failed. This
logic is consistent with `clang -cc1`.
[1] http://lists.llvm.org/pipermail/flang-dev/2020-November/000588.html
Differential Revision: https://reviews.llvm.org/D92854
The directory_iterator.cpp file did contain an incomplete,
non-working implementation for windows.
Change it to use the wchar version of the APIs.
Don't set the windows specific errors from GetLastError() as code
in the generic category; remap the errors to the std::errc values.
Error out cleanly on empty paths.
Invoke FindFirstFile on <directoryname>/* to actually list the
entries of the directory.
If the first entry retured by FindFirstFile is to be skipped (e.g.
being "." or ".."), call advance() (which calls FindNextFile and loops)
which doesn't return until a valid entry is found (or the end is
reached).
Differential Revision: https://reviews.llvm.org/D91140
On windows, the narrow, char based paths normally don't use utf8, but
can use many different native code pages, and this is what system
functions that operate on files, taking such paths/file names, interpret
them as.
Differential Revision: https://reviews.llvm.org/D91137