Keep track of achieved occupancy in SIMachineFunctionInfo.
At the moment we have a lot of duplicated or even missed code to
query and maintain occupancy info. Record it in the MFI and
query in a single call. Interfaces:
- getOccupancy() - returns current recorded achieved occupancy.
- getMinAllowedOccupancy() - returns lesser of the achieved occupancy
and the lowest occupancy we are ready to tolerate. For example if
a kernel is memory bound we are ready to tolerate 4 waves.
- limitOccupancy() - record occupancy level if we have to lower it.
- increaseOccupancy() - record occupancy if scheduler managed to
increase the occupancy.
MFI takes care of integrating different checks affecting occupancy,
including LDS use and waves-per-eu attribute. Note that scheduler
starts with not yet known register pressure, so has to record either
limit or increase in occupancy after it is done. Later passes can
just query a resulting value.
New interface is used in the active scheduler and NFC wrt its work.
Changes are also made to experimental schedulers to use it and record
an occupancy after they are done. Before the change waves-per-eu was
ignored by experimental schedulers and tolerance window for memory
bound kernels was not used.
Differential Revision: https://reviews.llvm.org/D47509
llvm-svn: 333629
Summary:
This is part of the larger XRay Profiling Mode effort.
This patch implements a centralised collector for `FunctionCallTrie`
instances, associated per thread. It maintains a global set of trie
instances which can be retrieved through the XRay API for processing
in-memory buffers (when registered). Future changes will include the
wiring to implement the actual profiling mode implementation.
This central service provides the following functionality:
* Posting a `FunctionCallTrie` associated with a thread, to the central
list of tries.
* Serializing all the posted `FunctionCallTrie` instances into
in-memory buffers.
* Resetting the global state of the serialized buffers and tries.
Depends on D45757.
Reviewers: echristo, pelikan, kpw
Reviewed By: kpw
Subscribers: llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D45758
llvm-svn: 333624
v2: use "ensureAlignment"
make functions cache line aligned
Fixes GPU hangs since r333219:
"AMDGPU: Split R600 AsmPrinter code into its own class"
Differential Revision: https://reviews.llvm.org/D47516
llvm-svn: 333622
Besides other changes, this update introduces functions to translate a
maps and sets into lists of their elements. These lists are useful as
we can define iterators for lists, which allow us to replace many uses
of foreach.
llvm-svn: 333621
This fixes projects/compiler-rt/test/fuzzer/sigusr.test, which was
broken by r333614. The trouble was that "&&" changes the command for
which "$!" gives the pid.
llvm-svn: 333620
The majority of the cases were correct. This fixes the few that weren't.
I also removed some superfluous parentheses in non-macros that confused by attempts at grepping for missing casts.
llvm-svn: 333615
(Relands r333584, reverted in 333592.)
When debugging test failures with -vv (or -v in the case of the
internal shell), this makes it easier to locate the RUN line that
failed. For example, clang's test/Driver/linux-ld.c has 892 total RUN
lines, and clang's test/Driver/arm-cortex-cpus.c has 424 RUN lines
after concatenation for line continuations.
When reading the generated shell script, this also makes it easier to
locate the RUN line that produced each command.
To support reporting RUN line numbers in the case of the internal
shell, this patch extends the internal shell to support the null
command, ":", except pipelines are not supported.
To support reporting RUN line numbers in the case of windows cmd.exe
as the external shell, this patch extends -vv to set "echo on" instead
of "echo off" in bat files. (Support for windows cmd.exe as a lit
external shell will likely be dropped later, but I found out too
late.)
Reviewed By: delcypher, asmith, stella.stamenova, jmorse, lebedev.ri, rnk
Differential Revision: https://reviews.llvm.org/D44598
llvm-svn: 333614
I think this is a holdover from when we used to declare variables inside the macros. And then its been copy and pasted forward for years every time a new macro intrinsic gets added.
Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.
It also removed a bogus error message on another test.
llvm-svn: 333613
Previously, the checker was using the nullability of the expression,
which is nonnull IFF both receiver and method are annotated as _Nonnull.
However, the receiver could be known to the analyzer to be nonnull
without being explicitly marked as _Nonnull.
rdar://40635584
Differential Revision: https://reviews.llvm.org/D47510
llvm-svn: 333612
Don't always:
cast (select (cmp x, y), z, C) --> select (cmp x, y), (cast z), C'
This is something that came up as far back as D26556, and I lost track of it.
I suspect that this transform is part of the underlying problem that is
inspiring some of the recent proposals that seek to match larger patterns
that include a cast op. Even if that's not true, this transform causes
problems for codegen (particularly with vector types).
A transform to actively match the size of cmp and select operand sizes should
follow. This patch just removes the harmful canonicalization in the other
direction.
Differential Revision: https://reviews.llvm.org/D47163
llvm-svn: 333611
Discard the last uncompleted deferred region in a decl, if one exists.
This prevents lines at the end of a function containing only whitespace
or closing braces from being marked as uncovered, if they follow a
region terminator (return/break/etc).
The previous behavior was to heuristically complete deferred regions at
the end of a decl. In practice this ended up being too brittle for too
little gain. Users would complain that there was no way to reach full
code coverage because whitespace at the end of a function would be
marked uncovered.
rdar://40238228
Differential Revision: https://reviews.llvm.org/D46918
llvm-svn: 333609
Summary:
Fix PR37625. It's possible for an extern_weak declaration to be emitted
to the merged module when a definition exists in the ThinLTO portion of
the build; discard the linkage on the declaration in that case.
(otherwise we copy the linkage to the alias to the jumptable and fail)
Reviewers: pcc
Reviewed By: pcc
Subscribers: mehdi_amini, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D47494
llvm-svn: 333604
Most of the origial comments used C style /* */ comments, but some C++ // comments had snuck in over time.
Still need to convert all the doxygen comments. Which is much harder to do.
llvm-svn: 333603
Looks like we intended to compare this->Members with Other->Members
here, but ended up comparing this->Members with this->Members. Oops. :)
Since CongruenceClass::Members is a SmallPtrSet anyway, we can probably
skip building std::sets if we're willing to write a bit more code.
This appears to be no functional change (for sufficiently lax values of
"no"): this equality check was only being called inside of an assert.
So, worst case, we'll catch more bugs in the form of assertion failures.
Thanks to d0k for noting this!
llvm-svn: 333601
This is to make it clear what kind of bugs the LegalizerInfo::verifier
is able to catch and test its output
Reviewers: aemerson, qcolombet
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D46338
llvm-svn: 333597
Previously, we printed out two lines of help messages for `--foo bar`
and `--foo=bar` like this:
--soname=<value> Set DT_SONAME
--soname <value> Set DT_SONAME
--sort-section=<value> Specifies sections sorting rule when linkerscript is used
--sort-section <value> Specifies sections sorting rule when linkerscript is used
This change eliminates duplicate lines that doesn't contain `=` for such
options like this.
--soname=<value> Set DT_SONAME
--sort-section=<value> Specifies sections sorting rule when linkerscript is used
Differential Revision: https://reviews.llvm.org/D47558
llvm-svn: 333596
We don't use the result of the query, and all tests pass if I remove it.
During startup, ASan spends a fair amount of time in this handler, and
the query is much more expensive than the call to commit the memory.
llvm-svn: 333595
By using std::shared_ptr for TreePatternNode, we can avoid leaking them.
Reviewers: craig.topper, dsanders, stoklund, tstellar, zturner
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D47463
llvm-svn: 333591
Summary:
Creating the IRBuilder methods:
CreateElementUnorderedAtomicMemSet
CreateElementUnorderedAtomicMemMove
These mirror the methods that create calls to the regular (non-atomic) memmove and
memset intrinsics.
llvm-svn: 333588
(Relands r330755 (reverted in r330848) with fix for PR37239.)
When debugging test failures with -vv (or -v in the case of the
internal shell), this makes it easier to locate the RUN line that
failed. For example, clang's test/Driver/linux-ld.c has 892 total RUN
lines, and clang's test/Driver/arm-cortex-cpus.c has 424 RUN lines
after concatenation for line continuations.
When reading the generated shell script, this also makes it easier to
locate the RUN line that produced each command.
To support reporting RUN line numbers in the case of the internal
shell, this patch extends the internal shell to support the null
command, ":", except pipelines are not supported.
To support reporting RUN line numbers in the case of windows cmd.exe
as the external shell, this patch extends -vv to set "echo on" instead
of "echo off" in bat files. (Support for windows cmd.exe as a lit
external shell will likely be dropped later, but I found out too
late.)
Reviewed By: delcypher, asmith, stella.stamenova, jmorse, lebedev.ri, rnk
Differential Revision: https://reviews.llvm.org/D44598
llvm-svn: 333584
This teaches lldb-test how to launch a process, set up an IRMemoryMap,
and issue memory allocations in the target process through the map. This
makes it possible to test IRMemoryMap in a targeted way.
This has uncovered two bugs so far. The first bug is that Malloc
performs an adjustment on the pointer returned from AllocateMemory (for
alignment purposes) which ultimately allows overlapping memory regions
to be created. The second bug is that after most of the address space on
the host side is exhausted, Malloc may return the same address multiple
times. These bugs (and hopefully more!) can be uncovered and tested for
with targeted lldb-test commands.
At an even higher level, the motivation for addressing these bugs is
that they can lead to strange user-visible failures (e.g, variables
assume the wrong value during expression evaluation, or the debugger
crashes). See my third comment on this swift-lldb PR for an example:
https://github.com/apple/swift-lldb/pull/652
I hope lldb-test is the right place to add this testing harness. Setting
up a gtest-style unit test proved too cumbersome (you need to recreate
or mock way too much debugger state), as did writing end-to-end tests
(it's hard to write a test that actually hits a buggy path).
With lldb-test, it's easy to read/generate the test input and parse the
test output. I'll attach a simple "fuzz" tester which generates failing
test cases to the Phab review. Here's an example:
```
Command: malloc(size=1024, alignment=32)
Malloc: address = 0xca000
Command: malloc(size=64, alignment=16)
Malloc: address = 0xca400
Command: malloc(size=1024, alignment=16)
Malloc: address = 0xca440
Command: malloc(size=16, alignment=8)
Malloc: address = 0xca840
Command: malloc(size=2048, alignment=16)
Malloc: address = 0xcb000
Command: malloc(size=64, alignment=32)
Malloc: address = 0xca860
Command: malloc(size=1024, alignment=16)
Malloc: address = 0xca890
Malloc error: overlapping allocation detected, previous allocation at [0xca860, 0xca8a0)
```
{F6288839}
Differential Revision: https://reviews.llvm.org/D47508
llvm-svn: 333583
The set properties are never used, so a vector is enough. No
functionality change intended.
While there add some std::moves to SparseSolver.
llvm-svn: 333582
Per discussion on the generic-abi mailing list:
https://groups.google.com/forum/#!topic/generic-abi/MPr8TVtnVn4
An object file manipulation tool must either write out a symbol
table with the same number of entries as the original symbol table
and in the same order, or if this is impossible, refuse to operate
on the object file if it has unrecognized sections that are linked
to the symtab section. However, existing tools (namely GNU strip,
GNU objcopy and ld.{bfd,gold,lld} -r) do not comply with this at
present: they change symbol table indexes and set sh_link to 0 on
the unrecognized symtab-linked sections.
We intend to use the latter as a (temporary) signal that a tool has
operated on a proposed new symtab-linked section and invalidated the
symbol table indexes. However, llvm-objcopy currently keeps sh_link
pointing to the new symtab section. This patch changes llvm-objcopy
to set sh_link to 0 to match the behaviour of the other tools.
Differential Revision: https://reviews.llvm.org/D47404
llvm-svn: 333581
Created the IsSplatValue helper from the splat detection code in LowerScalarVariableShift as a first NFC step towards improving support for splat rotations, which is an extension of PR37426.
llvm-svn: 333580