PTX does not differentiate between read and write fences. Hence, these a
lowered to a mem_fence call. The mem_fence function compiles to the
“member.cta” instruction, which commits all outstanding reads and writes
of a thread such that these become visible to all other threads in the same
CTA (i.e., work-group). The instruction does not differentiate between
global and local memory. Hence, the flags parameter is ignored, except
for deciding whether a “member.cta” instruction should be issued at all.
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315235
The vcruntime headers are hairy and clash with both libc++ headers
themselves and other libraries. libc++ normally deals with the clashes
by deferring to the vcruntime headers and silencing its own definitions,
but for clients which don't want to depend on vcruntime headers, it's
desirable to support the opposite, i.e. have libc++ provide its own
definitions.
Certain operator new/delete replacement scenarios are not currently
supported in this mode, which requires some tests to be marked XFAIL.
The added documentation has more details.
Differential Revision: https://reviews.llvm.org/D38522
llvm-svn: 315234
Some functions were taking Twine's not by const&, these are all
fixed to take by const&. We also had a case where some functions
were overloaded to accept by const& and &&. Now there is only
one version which accepts by value and move's the value.
llvm-svn: 315229
This generates a "bar.sync 0” instruction, which not only causes the
threads to wait, but does acts as a memory fence, as required by
OpenCL. The fence does not differentiate between local and global
memory. Unfortunately, there is no similar instruction which does
not include a memory fence. Hence, we cannot optimize the case
where neither CLK_LOCAL_MEM_FENCE nor CLK_GLOBAL_MEM_FENCE is
passed.
llvm-svn: 315228
Fuchsia doesn't support signals, so don't use interceptors for signal or
sigaction.
Differential Revision: https://reviews.llvm.org/D38669
llvm-svn: 315227
It's rare but there are a small number of patterns like this:
(set i64:$dst, (add i64:$src1, i64:$src2))
These should be equivalent to register classes except they shouldn't check for
a specific register bank.
This doesn't occur in AArch64/ARM/X86 but does occasionally come up in other
in-tree targets such as BPF.
llvm-svn: 315226
Microsoft's debug implementation of std::copy checks if the destination is an
array and then does some bounds checking. This was causing an assertion
failure in fs::rename_internal which copies to a buffer of the appropriate
size but that's type-punned to an array of length 1 for API compatibility
reasons.
Fix is to make make the destination a pointer rather than an array.
llvm-svn: 315222
Summary:
While the specification says that the 64bit registers are prefixed with
`x`, it seems that many people still use `r`. Until recently, we had been using
the `r` prefix instead of the `x` prefix in ds2. This caused lldb to fail during
unwinding. I think it's reasonable to check for a register prefixed with `r`,
since some people still choose to use `r`.
Reviewers: sas, fjricci, clayborg
Reviewed By: sas, clayborg
Subscribers: aemerson, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38376
Change by Alex Langford <apl@fb.com>
llvm-svn: 315221
Summary:
swiftc emits symbols without flags set, which led dsymutil to ignore
them when searching for global symbols, causing dwarf location data
to be omitted. Xcode's dsymutil handles this case correctly, and emits
valid location data. Add this functionality to llvm-dsymutil by
allowing parsing of symbols with no flags set.
Reviewers: aprantl, friss, JDevlieghere
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38587
llvm-svn: 315218
Summary:
Since D37924 and D37925 were merged, it's now possible to specify
individual sanitizers or CFI modes in sanitizer blacklists. Update the
CFI blacklist entries to only apply to cfi-unrelated-cast checks.
Reviewers: eugenis, pcc
Reviewed By: eugenis
Subscribers: kcc
Differential Revision: https://reviews.llvm.org/D38385
llvm-svn: 315216
Summary: The arg is useful for debugging and creating test cases.
Reviewers: bkramer, krasimir
Reviewed By: bkramer
Subscribers: klimek, cfe-commits
Differential Revision: https://reviews.llvm.org/D37970
llvm-svn: 315214
Summary:
It was previsouly set only in ASTUnit, but it should be set for all client of
PrecompiledPreamble.
Reviewers: erikjv, bkramer, klimek
Reviewed By: bkramer
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D38617
llvm-svn: 315212
Summary:
They are now used in ClangdScheduler instead of deferred std::async
computations.
The results of `std::async` are much less effective and do not provide
a good abstraction for similar purposes, i.e. for storing additional callbacks
to clangd async tasks. The actual callback API will follow a bit later.
Reviewers: klimek, bkramer, sammccall, krasimir
Reviewed By: sammccall
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D38627
llvm-svn: 315210
This allows rc files to have comments. Eventually we should
just use clang's c preprocessor, but that's a bit larger
effort for minimal gain, and this is straightforward.
Differential Revision: https://reviews.llvm.org/D38651
llvm-svn: 315207
E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an
intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the
inverted condition code for the CSEL.
rdar://28495949
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D38160
llvm-svn: 315205
We believe that despite AMD's documentation, that they really do support all 32 comparision predicates under AVX.
Differential Revision: https://reviews.llvm.org/D38609
llvm-svn: 315201
include/clang/Lex/PreprocessorLexer.h:79:3: error: constructor for
'clang::PreprocessorLexer' must explicitly initialize the const member
'FID'
llvm-svn: 315197
Return the combined shuffle from combineX86ShufflesRecursively and perform the combineTo in the caller.
Makes it easier for future patches to use this in functions that aren't actually shuffles themselves.
llvm-svn: 315195
for loop would only report status of the last command
v2: return '1'
call test instead of '['
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315193
Summary:
We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to prefering MOVSS/MOVSD.
Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary.
Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd.
This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type.
The rest of the patch is removing all the excess scalar patterns.
Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38023
llvm-svn: 315181