This shouldn't really happen in practice, I hope, but we tried to handle other constant cases. We missed this one because we checked for ConstantVector without realizing that an all-zero vector becomes ConstantAggregateZero instead.
So instead just check for Constant and use getAggregateElement, which will do the dirty work for us.
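A minimal sketch of the intended pattern (the surrounding names are illustrative, not the actual code from this change):
```
// Instead of matching only ConstantVector, accept any Constant and let
// getAggregateElement deal with the different representations
// (ConstantVector, ConstantDataVector, and the all-zero
// ConstantAggregateZero case that was missed).
if (auto *C = dyn_cast<Constant>(Op)) {
  for (unsigned I = 0; I != NumElts; ++I) {
    Constant *Elt = C->getAggregateElement(I);
    if (!Elt) // not a vector/aggregate constant, or index out of range
      break;
    // ... handle Elt; an all-zero vector now shows up here as the zero
    // constants produced by ConstantAggregateZero instead of being skipped.
  }
}
```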
llvm-svn: 343270
flag to true in LLJIT when running in multithreaded mode.
The IRLayer::setCloneToNewContextOnEmit method sets a flag within the IRLayer
that causes modules added to that layer to be moved to a new context (by
serializing to/from a memory buffer) when they are emitted. This allows modules
that were all loaded on the same context to be compiled in parallel.
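A rough sketch of the layer-level API from the client side (LLJIT now sets the flag itself when running multithreaded, so this is illustrative only; makeModule and Sources are placeholders, and the compile-layer accessor is assumed):
```
auto J = cantFail(LLJITBuilder().create());
// Ask the IR layer to re-materialize each module on a fresh context at emit
// time, so compile jobs for different modules stop contending on one context.
J->getIRCompileLayer().setCloneToNewContextOnEmit(true);

ThreadSafeContext TSCtx(std::make_unique<LLVMContext>());
for (const auto &Src : Sources) {
  auto M = makeModule(Src, *TSCtx.getContext()); // placeholder IR producer
  cantFail(J->addIRModule(ThreadSafeModule(std::move(M), TSCtx)));
}
```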
llvm-svn: 343266
In a very recent change I introduced a --no-export-default flag,
but after conferring with others it seems that this feature already
exists in GNU ld and lld in the form of the --export-dynamic flag,
which is off by default.
This change replaces export-default with export-dynamic and also
changes the default to match the traditional linker behaviour.
Now, by default, only the entry point is exported. If other symbols
are required by the embedder then --export-dynamic or --export can
be used to export all non-hidden symbols or individual symbols,
respectively.
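For illustration (the function name below is made up; --export and --export-dynamic are the flags described above):
```
/* With the new default only the entry point is exported. If the embedder also
   needs to call `callback`, export it explicitly, e.g. pass
   `--export=callback` to wasm-ld, or `--export-dynamic` to export every
   non-hidden symbol. */
int callback(int x) { return x + 1; }

int main(void) { return 0; }
```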
This change touches a lot of tests that were relying on symbols
being exported by default. I imagine it will also affect many
users, but I do think the change is worth it to match the traditional
behaviour and flag names.
Differential Revision: https://reviews.llvm.org/D52587
llvm-svn: 343265
The Darwin linker was complaining that Toolchains/RISCV.cpp and
Toolchains/Arch/RISCV.cpp had the same name. The fix is to just rename
Toolchains/RISCV.cpp to Toolchains/RISCVToolchain.cpp.
Differential revision: https://reviews.llvm.org/D52574
llvm-svn: 343263
Failure to lock the context can lead to data races if other threads are
operating on other ThreadSafeModules that share the same context.
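As an illustration of the invariant (this uses the later withModuleDo convenience, which takes the context lock around the callback; promoteSymbols is a made-up stand-in for whatever work is being done):
```
// Any IR manipulation must happen while the module's ThreadSafeContext is
// locked: other ThreadSafeModules may share the same context and be used
// concurrently from other threads.
TSM.withModuleDo([](Module &M) {
  for (Function &F : M)
    promoteSymbols(F); // placeholder for the actual per-module work
});
```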
llvm-svn: 343261
Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing.
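Roughly, in SPMD mode the runtime now behaves as if the parallel loop carried an explicit schedule(static, 1), so consecutive threads touch consecutive elements and global memory accesses coalesce. Illustrative fragment only, not a test from this change:
```
#pragma omp target teams distribute parallel for schedule(static, 1)
for (int i = 0; i < n; ++i)
  out[i] = a * in[i]; // within each team's chunk, thread t handles
                      // i = t, t + nthreads, t + 2*nthreads, ...
```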
Reviewers: ABataev, Hahnfeld, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D52629
llvm-svn: 343260
one SymbolLinkagePromoter utility.
SymbolLinkagePromoter renames anonymous and private symbols, and bumps all
linkages to at least global/hidden-visibility. Modules whose symbols have been
promoted by this utility can be decomposed into sub-modules without introducing
link errors. This is used by the CompileOnDemandLayer to extract single-function
modules for lazy compilation.
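Roughly, the promotion amounts to something like the following (an illustrative sketch, not the actual SymbolLinkagePromoter code; the renaming scheme shown is made up):
```
void promoteModule(Module &M) {
  unsigned NextId = 0;
  for (GlobalValue &GV : M.global_values()) {
    if (!GV.hasName()) // give anonymous symbols a name so they can be referenced
      GV.setName("$promoted." + Twine(NextId++));
    if (GV.hasLocalLinkage()) { // private/internal symbols become externally
      GV.setLinkage(GlobalValue::ExternalLinkage);     // linkable across sub-modules...
      GV.setVisibility(GlobalValue::HiddenVisibility); // ...but stay hidden
    }
  }
}
```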
llvm-svn: 343257
Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side.
Reviewers: ABataev, caomhin, Hahnfeld
Reviewed By: ABataev, Hahnfeld
Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D52434
llvm-svn: 343253
Summary:
The default values used for Space/Size for the new SizeClassMap do not work
with Android. The Compact map appears to be in the same boat.
Disable the test on Android for now to turn the bots green, but there is no
reason Compact & Dense should not have an Android test.
Added a FIXME; I will revisit this soon.
Reviewers: eugenis
Subscribers: srhines, kubamracek, delcypher, #sanitizers, llvm-commits
Differential Revision: https://reviews.llvm.org/D52623
llvm-svn: 343252
Summary:
When no scope qualifier is specified, allow completing index symbols
from any scope and insert the proper qualifier automatically. This is still
experimental and hidden behind a flag.
Things missing:
- Scope proximity based scoring.
- FuzzyFind support for weighted scopes.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: kbobyrev, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52364
llvm-svn: 343248
Summary:
_Note_: I am not attached to the name `DenseSizeClassMap`, so if someone has a
better idea, feel free to suggest it.
The current pre-defined `SizeClassMap`s hold a decent amount of cached entries,
either in sheer number or in amount of memory cached.
Empirical testing shows that more compact per-class arrays (whose sizes are
directly correlated with the number of cached entries) are beneficial to
performance, particularly in highly threaded environments.
The new proposed `SizeClassMap` has the following properties:
```
c00 => s: 0 diff: +0 00% l 0 cached: 0 0; id 0
c01 => s: 16 diff: +16 00% l 4 cached: 8 128; id 1
c02 => s: 32 diff: +16 100% l 5 cached: 8 256; id 2
c03 => s: 48 diff: +16 50% l 5 cached: 8 384; id 3
c04 => s: 64 diff: +16 33% l 6 cached: 8 512; id 4
c05 => s: 80 diff: +16 25% l 6 cached: 8 640; id 5
c06 => s: 96 diff: +16 20% l 6 cached: 8 768; id 6
c07 => s: 112 diff: +16 16% l 6 cached: 8 896; id 7
c08 => s: 128 diff: +16 14% l 7 cached: 8 1024; id 8
c09 => s: 144 diff: +16 12% l 7 cached: 7 1008; id 9
c10 => s: 160 diff: +16 11% l 7 cached: 6 960; id 10
c11 => s: 176 diff: +16 10% l 7 cached: 5 880; id 11
c12 => s: 192 diff: +16 09% l 7 cached: 5 960; id 12
c13 => s: 208 diff: +16 08% l 7 cached: 4 832; id 13
c14 => s: 224 diff: +16 07% l 7 cached: 4 896; id 14
c15 => s: 240 diff: +16 07% l 7 cached: 4 960; id 15
c16 => s: 256 diff: +16 06% l 8 cached: 4 1024; id 16
c17 => s: 320 diff: +64 25% l 8 cached: 3 960; id 49
c18 => s: 384 diff: +64 20% l 8 cached: 2 768; id 50
c19 => s: 448 diff: +64 16% l 8 cached: 2 896; id 51
c20 => s: 512 diff: +64 14% l 9 cached: 2 1024; id 48
c21 => s: 640 diff: +128 25% l 9 cached: 1 640; id 49
c22 => s: 768 diff: +128 20% l 9 cached: 1 768; id 50
c23 => s: 896 diff: +128 16% l 9 cached: 1 896; id 51
c24 => s: 1024 diff: +128 14% l 10 cached: 1 1024; id 48
c25 => s: 1280 diff: +256 25% l 10 cached: 1 1280; id 49
c26 => s: 1536 diff: +256 20% l 10 cached: 1 1536; id 50
c27 => s: 1792 diff: +256 16% l 10 cached: 1 1792; id 51
c28 => s: 2048 diff: +256 14% l 11 cached: 1 2048; id 48
c29 => s: 2560 diff: +512 25% l 11 cached: 1 2560; id 49
c30 => s: 3072 diff: +512 20% l 11 cached: 1 3072; id 50
c31 => s: 3584 diff: +512 16% l 11 cached: 1 3584; id 51
c32 => s: 4096 diff: +512 14% l 12 cached: 1 4096; id 48
c33 => s: 5120 diff: +1024 25% l 12 cached: 1 5120; id 49
c34 => s: 6144 diff: +1024 20% l 12 cached: 1 6144; id 50
c35 => s: 7168 diff: +1024 16% l 12 cached: 1 7168; id 51
c36 => s: 8192 diff: +1024 14% l 13 cached: 1 8192; id 48
c37 => s: 10240 diff: +2048 25% l 13 cached: 1 10240; id 49
c38 => s: 12288 diff: +2048 20% l 13 cached: 1 12288; id 50
c39 => s: 14336 diff: +2048 16% l 13 cached: 1 14336; id 51
c40 => s: 16384 diff: +2048 14% l 14 cached: 1 16384; id 48
c41 => s: 20480 diff: +4096 25% l 14 cached: 1 20480; id 49
c42 => s: 24576 diff: +4096 20% l 14 cached: 1 24576; id 50
c43 => s: 28672 diff: +4096 16% l 14 cached: 1 28672; id 51
c44 => s: 32768 diff: +4096 14% l 15 cached: 1 32768; id 48
c45 => s: 40960 diff: +8192 25% l 15 cached: 1 40960; id 49
c46 => s: 49152 diff: +8192 20% l 15 cached: 1 49152; id 50
c47 => s: 57344 diff: +8192 16% l 15 cached: 1 57344; id 51
c48 => s: 65536 diff: +8192 14% l 16 cached: 1 65536; id 48
c49 => s: 81920 diff: +16384 25% l 16 cached: 1 81920; id 49
c50 => s: 98304 diff: +16384 20% l 16 cached: 1 98304; id 50
c51 => s: 114688 diff: +16384 16% l 16 cached: 1 114688; id 51
c52 => s: 131072 diff: +16384 14% l 17 cached: 1 131072; id 48
c53 => s: 64 diff: +0 00% l 0 cached: 8 512; id 4
Total cached: 864928 (152/432)
```
It holds a bit less than 1 MB of cached entries at most, and the cache fits in
a page.
The plan is to use this map by default for Scudo once we make sure that there
is no unforeseen impact for any of the current use cases.
Benchmarks show the biggest performance increase (with Scudo) in highly
threaded/contended environments. For example, rcp2-benchmark experiences a
10K QPS increase (~3%) and a 50MB decrease in max RSS (~10%). On platforms
like Android, where we only have a couple of caches, performance remains
similar.
Reviewers: eugenis, kcc
Reviewed By: eugenis
Subscribers: kubamracek, delcypher, #sanitizers, llvm-commits
Differential Revision: https://reviews.llvm.org/D52371
llvm-svn: 343246
Summary:
-lm is needed for these tests on Linux, but the lit config for this package automatically adds it for Linux and excludes it for Windows. So we should be able to get these tests running again by just dropping -lm and letting the lit config add it when possible.
I was under the impression that -lm worked across platforms because it exists in other tests without an 'UNSUPPORTED: windows' annotation (e.g. divsc3_test.c), but those are actually excluded because they 'REQUIRES: c99-complex', which is excluded on Windows platforms (also by the local lit config).
I don't have easy access to a Windows machine to verify this patch, but I can trigger a buildbot run on clang-x64-ninja-win7 shortly after submitting.
Reviewers: hans
Subscribers: dberris, delcypher, llvm-commits, #sanitizers
Differential Revision: https://reviews.llvm.org/D52563
llvm-svn: 343245
Had we emitted this IR earlier, InstCombine would have removed the icmp, so I'm going to assume that using the i1 directly would be considered canonical.
llvm-svn: 343244
- Add latency timings to the GDB packet log summary if timestamps are enabled in the log
- Add the ability to plot the latencies for each packet type with --plot
- Don't crash the script when the target XML register info is in a weird format
llvm-svn: 343243
'ld{{.*}}"' seems to match the complete line for me, which makes the test
fail. Only allow an optional '.exe' for Windows systems, as most other
tests do.
Another possibility would be to collapse the greedy expression with
the next check to avoid matching the full line.
Differential Revision: https://reviews.llvm.org/D52619
llvm-svn: 343240
Now that D51487 has landed, the last machine verifier tests that failed EXPENSIVE_CHECKS builds have been fixed/removed, so we can remove @MatzeB's isMachineVerifierClean() hack for Sparc targets.
Differential Revision: https://reviews.llvm.org/D52612
llvm-svn: 343232
Bits [23-22] are used in Add and Sub to specify the shift. The value of the
shift field must be 0x; values of 1x are unallocated. MTE adds some instructions
that use such encodings, and this patch refactors the Add/Sub class so that
another class can derive from it to implement other encodings and other
bitfield formats.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52489
llvm-svn: 343231
When looking for the bclib, Clang considered the default library
path first, while it preferred directories in LIBRARY_PATH when
constructing the invocation of nvlink. The latter actually makes
more sense because during development it allows using a non-default
runtime library. So change the search for the bclib to start
looking in directories given by LIBRARY_PATH.
Additionally add a new option --libomptarget-nvptx-path= which
will be searched first. This will be handy for testing purposes.
Differential Revision: https://reviews.llvm.org/D51686
llvm-svn: 343230
This adds two new barrier instructions which can be used to restrict
speculative execution of load instructions.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52483
llvm-svn: 343229
When C is not zero and infinities are not allowed, (C / X) > 0 is a sign
test. Depending on the sign of C, the predicate must be swapped.
E.g.:
foo(double X) {
  if ((-2.0 / X) <= 0) ...
}
=>
foo(double X) {
  if (X >= 0) ...
}
Patch by: @marels (Martin Elshuber)
Differential Revision: https://reviews.llvm.org/D51942
llvm-svn: 343228
Summary:
Add a dominance check to ensure that the possible devirtualizable
call is actually dominated by the type test/checked load intrinsic being
analyzed. With PGO, after indirect call promotion is performed during
the compile step, followed by inlining, we may have a type test in the
promoted and inlined sequence that allows an indirect call in that
sequence to be devirtualized. That indirect call (inserted by inlining
after promotion) will share the same vtable pointer as the fallback
indirect call that cannot be devirtualized.
Before this patch the code was incorrectly devirtualizing the fallback
indirect call.
See the new test and the example described there for more details.
Reviewers: pcc, vitalybuka
Subscribers: mehdi_amini, Prazek, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D52514
llvm-svn: 343226
This adds new instructions used by the Branch Target Identification
feature. When this is enabled, these are the only instructions which can
be targeted by indirect branch instructions.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52485
llvm-svn: 343225