Add folder for the case where ExtractStridedSliceOp source comes from a chain
of InsertStridedSliceOp. Also add a folder for the trivial case where the
ExtractStridedSliceOp is a no-op.
Differential Revision: https://reviews.llvm.org/D89850
Use `LineOffsetMapping:get` directly and remove/inline the helper
`ComputeLineNumbers`, simplifying the callers.
Differential Revision: https://reviews.llvm.org/D89922
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.
It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.
While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
Fixes pr/47479.
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D87956
We were returning the default constructed unique_pointer from
TypeSystem.h for which the compiler does not have a definition. Move the
implementation into the cpp file.
On AIX, to support vector types, which should always be 16 bytes aligned,
we set alloca to return 16 bytes aligned memory space.
Differential Revision: https://reviews.llvm.org/D89910
This does not change anything at the moment, but needed for
D89170. In that change I am probing a physical SGPR to see if
it is legal. RC is SReg_32, but DRC for scratch instructions
is SReg_32_XEXEC_HI and test fails.
That is sufficient just to check if DRC contains a register
here in case of physreg. Physregs also do not use subregs
so the subreg handling below is irrelevant for these.
Differential Revision: https://reviews.llvm.org/D90064
The change in 0ba9843397 changed the behaviour of the build when
using an XL build compiler because `-G` is not a pure linker option:
it also implies `-shared`. This was accounted for in the base CMake
configuration, so an analysis of the change from 0ba9843397 in
relation to a build using Clang (where `-shared` is introduced by CMake)
would not identify the issue. This patch resolves this particular issue
by adding `-shared` alongside `-Wl,-G`.
At the same time, the investigation reveals that several aspects of the
various build configurations are not operating in the manner originally
intended.
The other issue related to the `-G` linker option in the build is that
the removal of it (to avoid unnecessary use of run-time linking) is not
effective for the build using the Clang compiler. This patch addresses
this by adjusting the regular expressions used to remove the broadly-
applied `-G`.
Finally, the issue of specifying the export list with `-Wl,` instead of
a compiler option is flagged with a FIXME comment.
Reviewed By: daltenty, amyk
Differential Revision: https://reviews.llvm.org/D90041
For unknown reasons, this test started failing only on the
llvm-avr-linux bot after 5c20d7db9f2791367b9311130eb44afecb16829c:
http://lab.llvm.org:8011/#/builders/112/builds/365
The error message is not helpful, and I have an email out to the bot
owner to help with debugging. XFAIL it on avr for now.
This was initiated from the uses of MCRegUnitIterator, so while likely
not exhaustive, it's a step forward.
Differential Revision: https://reviews.llvm.org/D89975
I'm not sure whether this can cause actual non-determinism in the
compiler output, but at least it causes non-determinism in the
statistics collected by BasicAA.
Use SetVector to have a predictable iteration order.
This partially reverts D85994.
In glibc, elf/dl-sym.c calls the raw `__tls_get_addr` by specifying the
tls_index parameter. Such a call does not have a pairing R_PPC64_TLSGD/R_PPC64_TLSLD.
This is legitimate. Since we cannot distinguish the benign case from cases due
to toolchain issues, we have to be permissive.
Acked by Stefan Pintilie
Avoid some noisy `const_cast`s by making `ContentCache::SourceLineCache`
and `SourceManager::LastLineNoContentCache` both mutable.
Differential Revision: https://reviews.llvm.org/D89914
There are two optimizations here:
1. Consider the following code:
FCMPSrr %0, %1, implicit-def $nzcv
%sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
%sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
FCMPSrr %0, %1, implicit-def $nzcv
%sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
This kind of code where we have 2 FCMPs each feeding a CSEL can happen
when we have a single IR fcmp being used by two selects. During selection,
to ensure that there can be no clobbering of nzcv between the fcmp and the
csel, we have to generate an fcmp immediately before each csel is
selected.
However, often we can essentially CSE these together later in MachineCSE.
This doesn't work though if there are unrelated flag-setting instructions
in between the two FCMPs. In this case, the SUBS defines NZCV
but it doesn't have any users, being overwritten by the second FCMP.
Our solution here is to try to convert flag setting operations between
a interval of identical FCMPs, so that CSE will be able to eliminate one.
2. SelectionDAG imported patterns for arithmetic ops currently select the
flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand
to those instructions. However if those impdef operands are not marked as
dead, the peephole optimizations are not able to optimize them into non-flag
setting variants. The optimization here is to find these dead imp-defs and
mark them as such.
This pass is only enabled when optimizations are enabled.
Differential Revision: https://reviews.llvm.org/D89415
If CUDA version can not be determined based on version.txt file, attempt to find
CUDA_VERSION macro in cuda.h.
This is a follow-up to D89752,
Differntial Revision: https://reviews.llvm.org/D89832
CUDA-11.1 does not carry version.txt which causes clang to assume that it's
CUDA-7.0, which used to be the only CUDA version w/o version.txt.
In order to tell CUDA-7.0 apart from the new versions, clang now probes for the
presence of libdevice.10.bc which is not present in the old CUDA versions.
This should keep Clang working for CUDA-11.1.
PR47332: https://bugs.llvm.org/show_bug.cgi?id=47332
Differential Revision: https://reviews.llvm.org/D89752
This patch redesigns the Target::GetUtilityFunctionForLanguage API:
- Use a unique_ptr instead of a raw pointer for the return type.
- Wrap the result in an llvm::Expected instead of using a Status object as an I/O parameter.
- Combine the action of "getting" and "installing" the UtilityFunction as they always get called together.
- Pass std::strings instead of const char* and std::move them where appropriate.
There's more room for improvement but I think this tackles the most
prevalent issues with the current API.
Differential revision: https://reviews.llvm.org/D90011
These compiler-rt tests should be UNSUPPORTED instead of XFAIL, which seems to be the real intent of the authors.
Reviewed By: vvereschaka
Differential Revision: https://reviews.llvm.org/D89840
Put the guts of `ComputeLineNumbers` into `LineOffsetMapping::get` and
`LineOffsetMapping::LineOffsetMapping`. As a drive-by, store the number
of lines directly in the bump-ptr-allocated array.
Differential Revision: https://reviews.llvm.org/D89913
Runs an executable on a remote host.
This is meant to be used as an executor when running the LLVM and the Libraries tests on a target.
Reviewed By: vvereschaka
Differential Revision: https://reviews.llvm.org/D89349
This re-applies e2fceec2fd with fixes. Apparently we already *do* support
relaxation for ELF, so we need to make sure the test case allocates a slab at
a fixed address, and that the R_X86_64_REX_GOTPCRELX test references an external
that is guaranteed to be out of range.
Immediate must be in an integer range [0,255] for umin/umax instruction.
Extend pattern matching helper SelectSVEArithImm() to take in value type
bitwidth when checking immediate value is in range or not.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D89831
It turns out that `FileInfo` *always* has a ContentCache. Clarify that
in the code:
- Update the private version of `SourceManager::createFileID` to take a
`ContentCache&` instead of `ContentCache*`, and rename it to
`createFileIDImpl` for clarity.
- Change `FileInfo::getContentCache` to return a reference.
Differential Revision: https://reviews.llvm.org/D89554
In this patch, Predicates fix added for the following:
* disable prefix-instrs will disable pcrelative-memops
* set two predicates PairedVectorMemops and PrefixInstrs for PLXVP/PSTXVP definitions
Differential Revision: https://reviews.llvm.org/D89727
Reviewed by: amyk, steven.zhang
I was wrong in thinking that MRI.use_instructions return unique instructions and mislead Jay in his previous patch D64393.
First loop counted more instructions than it was in reality and the second loop went beyond the basic block with that counter.
I used Jay's previous code that relied on MRI.use_operands to constrain the number of instructions to check among.
modifiesRegister is inlined to reduce the number of passes over instruction operands and added assert on BB end boundary.
Differential Revision: https://reviews.llvm.org/D89386
Implementation of instructions table.get, table.set, table.grow,
table.size, table.fill, table.copy.
Missing instructions are table.init and elem.drop as they deal with
element sections which are not yet implemented.
Added more tests to tables.s
Differential Revision: https://reviews.llvm.org/D89797
Deciding where to place debugging instructions when normal instructions
sink between blocks is difficult -- see PR44117. Dealing with this with
instruction-referencing variable locations is simple: we just tolerate
DBG_INSTR_REFs referring to values that haven't been computed yet. This
patch adds support into InstrRefBasedLDV to record when a variable value
appears in the middle of a block, and should have a DBG_VALUE added when it
appears (a debug use before def).
While described simply, this relies heavily on the value-propagation
algorithm in InstrRefBasedLDV. The implementation doesn't attempt to verify
the location of a value unless something non-trivial occurs to merge
variable values in vlocJoin. This means that a variable with a value that
has no location can retain it across all control flow (including loops).
It's only when another debug instruction specifies a different variable
value that we have to check, and find there's no location.
This property means that if a machine value is defined in a block dominated
by a DBG_INSTR_REF that refers to it, all the successor blocks can
automatically find a location for that value (if it's not clobbered). Thus
in a sense, InstrRefBasedLDV is already supporting and implementing
use-before-defs. This patch allows us to specify a variable location in the
block where it's defined.
When loading live-in variable locations, TransferTracker currently discards
those where it can't find a location for the variable value. However, we
can tell from the machine value number whether the value is defined in this
block. If it is, add it to a set of use-before-def records. Then, once the
relevant instruction has been processed, emit a DBG_VALUE immediately after
it.
Differential Revision: https://reviews.llvm.org/D85775
This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.
Differential Revision: https://reviews.llvm.org/D90028
Apparently there are some Microsoft headers which
`#define interface struct`. This method is only used
in pending changes so far.
Change-Id: Ic68fe8e1958ec9b015f817ee218431f4146b888a