Summary:
The 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics all have the possibility of taking an explicit rounding mode argument. If the rounding mode is CUR_DIRECTION we'd like to emit a sitofp/uitofp instruction and a select like we do for 256-bit intrinsics.
For cvt(u)qqtopd and cvt(u)dqtops we do this when the form of the software intrinsics that doesn't take a rounding mode argument is used. This is done by using convertvector in the header with the select builtin. But if the explicit rounding mode form of the intrinsic is used and CUR_DIRECTION is passed, we don't do this. We shouldn't have this inconsistency.
For cvt(u)qqtops nothing is done because we can't use the select builtin in the header without avx512vl. So we need to use custom codegen for this.
Even when the rounding mode isn't CUR_DIRECTION we should also use select in IR for consistency. And it will remove another scalar integer mask from our intrinsics.
To accomplish all of these goals I've taken a slightly unusual approach. I've added two new X86 specific intrinsics for sitofp/uitofp with rounding. These intrinsics are variadic on the input and output type so we only need 2 instead of 6. This avoids the need for a switch to map them in CGBuiltin.cpp. We just need to check signed vs unsigned. I believe other targets also use variadic intrinsics like this.
So if the rounding mode is CUR_DIRECTION we'll use an sitofp/uitofp instruction. Otherwise we'll use one of the new intrinsics. After that we'll emit a select instruction if needed.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D56998
llvm-svn: 352267
The IR enforced limit for the address space is 24-bits, but LLT was
only using 23-bits. Additionally, the argument to the constructor was
truncating to 16-bits.
A similar problem still exists for the number of vector elements. The
IR enforces no limit, so if you try to use a vector with > 65535
elements the IRTranslator asserts in the LLT constructor.
llvm-svn: 352264
For the power9 CPU, vector operations consume a pair of execution units rather
than one execution unit like a scalar operation. Update the target transform
cost functions to reflect the higher cost of vector operations when targeting
Power9.
Patch by RolandF.
Differential revision: https://reviews.llvm.org/D55461
llvm-svn: 352261
Summary: We have isel patterns for this, but we're missing some load patterns and all broadcast patterns. A DAG combine seems like a better fit for this.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56971
llvm-svn: 352260
These intrinsics may return different values every time they are called
and should not be CSE'd. IntrInaccessibleMemOnly appears to be the right
attribute to model this behavior.
Differential Revision: https://reviews.llvm.org/D57259
llvm-svn: 352256
Summary:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.
This patch changes this and then modifies the X86 post legalization combine to emit a extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization.
On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57186
llvm-svn: 352255
I need the comdat selection for PR40094. To keep the patch for that smaller,
I'm adding it here, and as a first application I'm using it to reject
associative comdats referring to earlier associative comdats. Depends on
D56929; together with that all associative comdats referring to other
associative comdats are now rejected.
Differential Revision: https://reviews.llvm.org/D56931
llvm-svn: 352254
libclang can be built in shared or static mode. On Windows, with
LLVM_ENABLE_PIC=OFF, it was built in neither mode, leading to clients of
libclang (c-index-test, c-arcmt-test) failing to link with it set.
Since PIC isn't really a thing on Windows, build libclang in shared mode when
LLVM_ENABLE_PIC=OFF there. This is also somewhat symmetric with the existing
ENABLE_STATIC a few lines down.
Differential Revision: https://reviews.llvm.org/D57258
llvm-svn: 352253
Summary:
When mounting LLVM source into a windows container in read-only mode, certain tests fail. Ideally, we want all these tests to pass so that developers can mount the same source folder into multiple (windows) containers simultaneously, allowing them to build/test the same source code using various different configurations simultaneously.
**Fix**: I've found that when attempting to open a file for writing on windows, if you don't have the correct permissions (trying to open a file for writing in a read-only folder), you get [Access is denied](https://support.microsoft.com/en-us/help/2623670/access-denied-or-other-errors-when-you-access-or-work-with-files-and-f). In llvm, we map this error message to a linux based error, see: https://github.com/llvm-mirror/llvm/blob/master/lib/Support/ErrorHandling.cpp
This is why we see "Permission denied" in our output as opposed to the expected "No such file or directory", thus causing the tests to fail.
I've changed the test locally to instead point to the root drive so that they can successfully bypass the Access is denied error when LLVM is mounted in as a read-only directory. This way, the test operate exactly the same, but we can get around the windows-complications of what error to expect in a read-only directory.
Patch By: justice_adams
Reviewers: rsmith, zturner, MatzeB, stella.stamenova
Reviewed By: stella.stamenova
Subscribers: ormris, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D50563
llvm-svn: 352252
With the fixes to the building of LLVM-C.dll in D56781 this should now
be safe to land. This will greatly simplify dealing with LLVM for people
that just want to use the C API on windows. This is a follow up from
D35077.
Patch by Jakob Bornecrantz!
Differential revision: https://reviews.llvm.org/D56774
llvm-svn: 352250
DAGCombiner::visitBITCAST will perform:
fold (bitconvert (fneg x)) -> (xor (bitconvert x), signbit)
fold (bitconvert (fabs x)) -> (and (bitconvert x), (not signbit))
As shown in double-bitmanip-dagcombines.ll, this can be advantageous. But
RV32FD doesn't use bitcast directly (as i64 isn't a legal type), and instead
uses RISCVISD::SplitF64. This patch adds an equivalent DAG combine for
SplitF64.
llvm-svn: 352247
Summary:
Currently, if an instruction with a memory operand has no debug information,
X86DiscriminateMemOps will generate one based on the first line of the
enclosing function, or the last seen debug info.
This may cause confusion in certain debugging scenarios. The long term
approach would be to use the line number '0' in such cases, however, that
brings in challenges: the base discriminator value range is limited
(4096 values).
For the short term, adding an opt-in flag for this feature.
See bug 40319 (https://bugs.llvm.org/show_bug.cgi?id=40319)
Reviewers: dblaikie, jmorse, gbedwell
Reviewed By: dblaikie
Subscribers: aprantl, eraman, hiraditya
Differential Revision: https://reviews.llvm.org/D57257
llvm-svn: 352246
The rest of libunwind already uses placement new, these are the only
places where non-placement new is being used introducing undesirable
C++ library dependency.
Differential Revision: https://reviews.llvm.org/D57251
llvm-svn: 352245
(fcopysign a, (fneg b)) will be expanded to bitwise operations by
DAGTypeLegalizer::SoftenFloatRes_FCOPYSIGN if the floating point type isn't
legal. Arguably it might be worth doing a combine even if it is legal.
llvm-svn: 352240
Fix a bug where we would compare array sizes with incompatible
element types, and look through explicit casts.
rdar://44800168
Differential revision: https://reviews.llvm.org/D57064
llvm-svn: 352239
Summary:
Set default value for retrieved attributes to 1, since the check is against 1.
Eliminates the warning noise generated when the attributes are not present.
Reviewers: sanjoy
Subscribers: jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D57253
llvm-svn: 352238
If bottom of block BB has only one successor OldTop, in most cases it is profitable to move it before OldTop, except the following case:
-->OldTop<-
| . |
| . |
| . |
---Pred |
| |
BB-----
Move BB before OldTop can't reduce the number of taken branches, this patch detects this case and prevent the moving.
Differential Revision: https://reviews.llvm.org/D57067
llvm-svn: 352236
Summary:
When cross-compiling LLDB, we want to use llvm-tblgen built for the
host, not the target.
Reviewers: compnerd, sgraenitz
Subscribers: mgorny, lldb-commits
Differential Revision: https://reviews.llvm.org/D57194
llvm-svn: 352235
Summary:
As reported on llvm-testers, during 8.0.0-rc1 testing I got errors while
building of `XRayTest`, during `check-all`:
```
[100%] Generating XRayTest-x86_64-Test
/home/dim/llvm/8.0.0/rc1/Phase3/Release/llvmCore-8.0.0-rc1.obj/./lib/libLLVMSupport.a(Signals.cpp.o): In function `llvm::sys::PrintStackTrace(llvm::raw_ostream&)':
Signals.cpp:(.text._ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x24): undefined reference to `backtrace'
Signals.cpp:(.text._ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x254): undefined reference to `llvm::itaniumDemangle(char const*, char*, unsigned long*, int*)'
clang-8: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[3]: *** [projects/compiler-rt/lib/xray/tests/unit/CMakeFiles/TXRayTest-x86_64-Test.dir/build.make:73: projects/compiler-rt/lib/xray/tests/unit/XRayTest-x86_64-Test] Error 1
gmake[3]: Target 'projects/compiler-rt/lib/xray/tests/unit/CMakeFiles/TXRayTest-x86_64-Test.dir/build' not remade because of errors.
gmake[2]: *** [CMakeFiles/Makefile2:33513: projects/compiler-rt/lib/xray/tests/unit/CMakeFiles/TXRayTest-x86_64-Test.dir/all] Error 2
gmake[2]: Target 'CMakeFiles/check-all.dir/all' not remade because of errors.
gmake[1]: *** [CMakeFiles/Makefile2:737: CMakeFiles/check-all.dir/rule] Error 2
gmake[1]: Target 'check-all' not remade because of errors.
gmake: *** [Makefile:277: check-all] Error 2
[Release Phase3] check-all failed
```
This is because the `backtrace` function requires `-lexecinfo` on BSD
platforms. To fix this, detect the `execinfo` library in
`cmake/config-ix.cmake`, and add it to the unit test link flags.
Additionally, since the code in `sys::PrintStackTrace` makes use of
`itaniumDemangle`, also add `-lLLVMDemangle`. (Note that this is more
of a general problem with libLLVMSupport, but I'm looking for a quick
fix now so it can be merged to the 8.0 branch.)
Reviewers: dberris, hans, mgorny, samsonov
Reviewed By: dberris
Subscribers: krytarowski, delcypher, erik.pilkington, #sanitizers, emaste, llvm-commits
Differential Revision: https://reviews.llvm.org/D57181
llvm-svn: 352234
This code doesn't need to traverse types, lambdas, template arguments,
etc to detect trivial recursion. We can do a basic statement traversal
instead. This reduces the time spent compiling CodeGenModule.cpp, the
object file size (mostly reduced debug info), and the final executable
size by a small amount. I measured the exe mostly to check how much of
the overhead is from debug info, object file section headers, etc, vs
actual code.
metric | before | after | diff
time (s) | 47.4 | 38.5 | -8.9
obj (kb) | 12888 | 12012 | -876
exe (kb) | 86072 | 85996 | -76
llvm-svn: 352232
Summary:
Because _Float16 was disabled for X86 targets the unit-tests started failing.
Extract the pieces for _Float16 and run theses tests under AArch64.
Reviewers: aaron.ballman, erichkeane, lebedev.ri
Reviewed By: erichkeane
Subscribers: javed.absar, xazax.hun, kristof.beyls, cfe-commits
Differential Revision: https://reviews.llvm.org/D57249
llvm-svn: 352231
We also need to combine to masked truncating with saturation stores, but I'm leaving that for a future patch.
This does regress some tests that used truncate wtih saturation followed by a masked store. Those now use a truncating store and use min/max to saturate.
Differential Revision: https://reviews.llvm.org/D57218
llvm-svn: 352230
Float16 support was disabled recently on many platforms, however that
commit still allowed literals of Float16 type to work. This commit
removes those based on the same logic as Float16 disable.
Change-Id: I72243048ae2db3dc47bd3d699843e3edf9c395ea
llvm-svn: 352229
The main goal of the model is to avoid *increasing* function size, as
that would eradicate any memory locality benefits from splitting. This
happens when:
- There are too many inputs or outputs to the cold region. Argument
materialization and reloads of outputs have a cost.
- The cold region has too many distinct exit blocks, causing a large
switch to be formed in the caller.
- The code size cost of the split code is less than the cost of a
set-up call.
A secondary goal is to prevent excessive overall binary size growth.
With the cost model in place, I experimented to find a splitting
threshold that works well in practice. To make warm & cold code easily
separable for analysis purposes, I moved split functions to a "cold"
section. I experimented with thresholds between [0, 4] and set the
default to the threshold which minimized geomean __text size.
Experiment data from building LNT+externals for X86 (N = 639 programs,
all sizes in bytes):
| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os** | 1736.3 | 0, n=0 | 10961.6 |
| -Os, thresh=0 | 1740.53 | 124.482, n=134 | 11014 |
| -Os, thresh=1 | 1734.79 | 57.8781, n=90 | 10978.6 |
| -Os, thresh=2 | ** 1733.85 ** | 65.6604, n=61 | 10977.6 |
| -Os, thresh=3 | 1733.85 | 65.3071, n=61 | 10977.6 |
| -Os, thresh=4 | 1735.08 | 67.5156, n=54 | 10965.7 |
| **-Oz** | 1554.4 | 0, n=0 | 10153 |
| -Oz, thresh=2 | ** 1552.2 ** | 65.633, n=61 | 10176 |
| **-O3** | 2563.37 | 0, n=0 | 13105.4 |
| -O3, thresh=2 | ** 2559.49 ** | 71.1072, n=61 | 13162.4 |
Picking thresh=2 reduces the geomean __text section size by 0.14% at
-Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that
TEXT size is page-aligned, whereas section sizes are byte-aligned.
Experiment data from building LNT+externals for ARM64 (N = 558 programs,
all sizes in bytes):
| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os** | 1763.96 | 0, n=0 | 42934.9 |
| -Os, thresh=2 | ** 1760.9 ** | 76.6755, n=61 | 42934.9 |
Picking thresh=2 reduces the geomean __text section size by 0.17% at
-Os and causes no growth in the TEXT segment.
Measurements were done with D57082 (r352080) applied.
Differential Revision: https://reviews.llvm.org/D57125
llvm-svn: 352228
N_FUNC_COLD is a new MachO symbol attribute. It's a hint to the linker
to order a symbol towards the end of its section, to improve locality.
Example:
```
void a1() {}
__attribute__((cold)) void a2() {}
void a3() {}
int main() {
a1();
a2();
a3();
return 0;
}
```
A linker that supports N_FUNC_COLD will order _a2 to the end of the text
section. From `nm -njU` output, we see:
```
_a1
_a3
_main
_a2
```
Differential Revision: https://reviews.llvm.org/D57190
llvm-svn: 352227
Currently if a breakpoint site is already present, its ID will be returned, not the LLDB_INVALID_BREAK_ID.
On the other hand, Process::CreateBreakpointSite may have another reasons to return LLDB_INVALID_BREAK_ID.
llvm-svn: 352226
This patch adds support for displaying remarks with multiple
lines. For such remarks, it creates a hidden div
containing the message's lines except the first one in a <pre>
tag. It also prepends a link (with '+' as text) to the regular remark
line. This link can be used to show/hide the div containing the
full remark.
In combination with D57159, this allows for better displaying of
multiline remarks in the html pages generated by opt-viewer.
The Javascript is very simple and should be supported by any recent
major browser.
Reviewers: hfinkel, anemet, thegameg, serge-sans-paille
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D57167
llvm-svn: 352223
As Discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2019-January/129543.html
There are problems exposing the _Float16 type on architectures that
haven't defined the ABI/ISel for the type yet, so we're temporarily
disabling the type and making it opt-in.
Differential Revision: https://reviews.llvm.org/D57188
Change-Id: I5db7366dedf1deb9485adb8948b1deb7e612a736
llvm-svn: 352221
Summary:
D57116 fails on the armv7 bots, which is I assume due to the timing of
the RSS check on the platform. While I don't have a platform to test
that change on, I assume this would do.
The test could be made more reliable by either delaying more the
allocations, or allocating more large-chunks, but both those options
have a somewhat non negligible impact (more memory used, longer test).
Hence me trying to keep the additional sleeping/allocating to a
minimum.
Reviewers: eugenis, yroux
Reviewed By: yroux
Subscribers: javed.absar, kristof.beyls, delcypher, #sanitizers, llvm-commits
Differential Revision: https://reviews.llvm.org/D57241
llvm-svn: 352220
declaration in MSVCCompat mode
Microsoft compiler permits the use of 'static' storage specifier outside
of a class definition if it's on an out-of-line member function template
declaration.
This patch allows 'static' storage specifier on an out-of-line member
function template declaration with a warning in Clang (To be compatible
with Microsoft).
Intel C/C++ compiler allows the 'static' keyword with a warning in
Microsoft mode. GCC allows this with -fpermissive.
Patch By: Manna
Differential Revision: https://reviews.llvm.org/D56473
Change-Id: I97b2d9e9d57cecbcd545d17e2523142a85ca2702
llvm-svn: 352219
This seems unnecessarily complicated because we gave names to
opposite polarity bools and have code comments that don't really
line up with the logic.
Step 1: remove UndefUpper and assert that it is the opposite of
UndefLower after the initial early exit.
llvm-svn: 352217
This patch adds a new type StringBlockVal which can be used to emit a
YAML block scalar, which preserves newlines in a multiline string. It
also updates MappingTraits<DiagnosticInfoOptimizationBase::Argument> to
use it for argument values with more than a single newline.
This is helpful for remarks that want to display more in-depth
information in a more structured way.
Reviewers: thegameg, anemet
Reviewed By: anemet
Subscribers: hfinkel, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D57159
llvm-svn: 352216