operands and used flags to support matching immediate operands.
This is a bit trickier than register operands, and we still want to fall
back on a register operands even for things that appear to be
"immediates" when they won't actually select into the operation's
immediate operand. This also requires us to handle things like selecting
`sub` vs. `add` to minimize the number of bits needed to represent the
immediate, and picking the shortest immediate encoding. In order to
that, we in turn need to scan to make sure that CF isn't used as it will
get inverted.
The end result seems very nice though, and we're now generating
optimal instruction sequences for these patterns IMO.
A follow-up patch will further expand this to other operations with RMW
memory operands. But handing `add` and `sub` are useful starting points
to flesh out the machinery and make sure interesting and complex cases
can be handled.
Thanks to Craig Topper who provided a few fixes and improvements to this
patch in addition to the review!
Differential Revision: https://reviews.llvm.org/D37139
llvm-svn: 312764
Most callers were not expecting the exit(0) and trying to exit with a
different value.
This also adds back the call to cl::PrintHelpMessage in llvm-ar.
llvm-svn: 312761
r312318 - Debug info for variables whose type is shrinked to bool
r312325, r312424, r312489 - Test case for r312318
Revision 312318 introduced a null dereference bug.
Details in https://bugs.llvm.org/show_bug.cgi?id=34490
llvm-svn: 312758
If --dynamic-list is given, only those symbols are preemptible.
This allows combining --dynamic-list and version scripts too. The
version script controls which symbols are visible, and --dynamic-list
controls which of those are preemptible.
This fixes pr34053.
llvm-svn: 312757
As is indexes above SHN_LORESERVE will not be handled correctly because
they'll be treated as indexes of sections rather than special values
that should just be copied. This change adds support to copy them
though.
Patch by Jake Ehrlich
Differential Revision: https://reviews.llvm.org/D37393
llvm-svn: 312756
Summary:
This is a first half(?) of a fix for the following bug:
https://bugs.llvm.org/show_bug.cgi?id=34147 (gcc -Wtype-limits)
GCC's -Wtype-limits does warn on comparison of unsigned value
with signed zero (as in, with 0), but clang only warns if the
zero is unsigned (i.e. 0U).
Also, be careful not to double-warn, or falsely warn on
comparison of signed/fp variable and signed 0.
Yes, all these testcases are needed.
Testing: $ ninja check-clang-sema check-clang-semacxx
Also, no new warnings for clang stage-2 build.
Reviewers: rjmccall, rsmith, aaron.ballman
Reviewed By: rjmccall
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D37565
llvm-svn: 312750
The ToolChain class validates the -mthread-model flag in the constructor which
doesn't work correctly since the thread model methods are virtual methods. The
check is moved into Clang::ConstructJob() when constructing the internal
command line.
https://reviews.llvm.org/D37496
Patch by: Ian Tessier!
llvm-svn: 312748
We already uses pipefail to detect failure of a redirected command, so
the "|| echo failure" construct was unnecessary.
These tests run and pass on Windows now.
llvm-svn: 312747
This will allow async handlers to be added that return void or Error::success().
Such handlers are expected to be common, since one of the primary uses of
addAsyncHandler is to run the body of the handler in a detached thread, in which
case the main handler returns immediately and does not need to provide an Error
value.
llvm-svn: 312746
Right now Symbols must be either undefined or defined in a specific
section. Some symbols have section indexes like SHN_ABS however. This
change adds support for outputting symbols that have such section
indexes.
Patch by Jake Ehrlich
Differential Revision: https://reviews.llvm.org/D37391
llvm-svn: 312745
It is possible for two modules to have the same name if they are
archive members with the same name, or if we are doing LTO (in which
case all modules will have the name "lto.tmp").
Differential Revision: https://reviews.llvm.org/D37589
llvm-svn: 312744
The tests are filechecking against stderr and use some magic to make stdout go
away and pipe stderr to FileCheck. This broke bots on windows.
llvm-svn: 312739
Summary:
That is, instead of "1 error generated", we now say "1 error generated
when compiling for sm_35".
This (partially) solves a usability foogtun wherein e.g. users call a
function that's only defined on sm_60 when compiling for sm_35, and they
get an unhelpful error message.
Reviewers: tra
Subscribers: sanjoy, cfe-commits
Differential Revision: https://reviews.llvm.org/D37548
llvm-svn: 312736
Even though the content of the minidump does not change in a debugging session,
frames can't be indiscriminately be cached since modules and symbols can be
explicitly added after the minidump is loaded.
The fix is simple, just let the base Thread::ClearStackFrames() do its job.
submitted by amccarth on behalf of lemo
Bug: https://bugs.llvm.org/show_bug.cgi?id=34510
Differential Revision: https://reviews.llvm.org/D37527
llvm-svn: 312735
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.
On LLVM side NVPTX added sm_70 GPU type which bumps required
PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment.
Differential Revision: https://reviews.llvm.org/D37576
llvm-svn: 312734
Second try after fixing a code san problem with iterator reference types.
This change introduces a subcommand to the llvm-xray tool called
"stacks" which allows for analysing XRay traces provided as inputs and
accounting time to stacks instead of just individual functions. This
gives us a more precise view of where in a program the latency is
actually attributed.
The tool uses a trie data structure to keep track of the caller-callee
relationships as we process the XRay traces. In particular, we keep
track of the function call stack as we enter functions. While we're
doing this we're adding nodes in a trie and indicating a "calls"
relatinship between the caller (current top of the stack) and the callee
(the new top of the stack). When we push function ids onto the stack, we
keep track of the timestamp (TSC) for the enter event.
When exiting functions, we are able to account the duration by getting
the difference between the timestamp of the exit event and the
corresponding entry event in the stack. This works even if we somehow
miss the exit events for intermediary functions (i.e. if the exit event
is not cleanly associated with the enter event at the top of the stack).
The output of the tool currently provides just the top N leaf functions
that contribute the most latency, and the top N stacks that have the
most frequency. In the future we can provide more sophisticated query
mechanisms and potentially an export to database feature to make offline
analysis of the stack traces possible with existing tools.
Differential revision: D34863
llvm-svn: 312733
Fixes some combine issues for AMDGPU where we weren't
getting the many extract_vector_elt combines expected
in a future patch.
This should really be checking isOperationLegalOrCustom on
the extract. That improves a number of x86 lit tests, but
a few get stuck in an infinite loop from one place
where a similar looking extract is created. I have a
different workaround in the backend for that which
keeps many of those improvements, but also adds a few
regressions.
llvm-svn: 312730
Block captures can have different physical locations
in memory segments depending on the use case (as a function
call or as a kernel enqueue) and in different vendor
implementations.
Therefore it's unclear how to add address space to capture
addresses uniformly. Currently it has been decided to disallow
taking addresses of captured variables until further
clarifications in the spec.
Differential Revision: https://reviews.llvm.org/D36410
llvm-svn: 312728
To support errata patching on AArch64 we need to be able to overwrite
an arbitrary instruction with a branch. For AArch64 it is sufficient to
always write all the bits of the branch instruction and not just the
immediate field. This is safe as the non-immediate bits of the branch
instruction are always the same.
Differential Revision: https://reviews.llvm.org/D36745
llvm-svn: 312727
Summary:
Test was skipped because -data-evaluate-expression was thought
to not work on globals. This is not the case - the issue was clang
removes debug info for globals in cpp files that are not used.
Add a reference to the globals in question, and fix memory patter in
test to match memory pattern in testcase.
Reviewers: ki.stfu, abidh
Reviewed By: ki.stfu
Subscribers: aprantl, lldb-commits
Differential Revision: https://reviews.llvm.org/D37533
llvm-svn: 312726
These don't add any value as they're just compositions of existing
patterns. However, they can confuse the cost logic in ISel, leading to
duplicated vcvt instructions like in PR33199.
llvm-svn: 312724
This patch expands the support of lowerInterleavedload to {8|16|32}x8i stride 3.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) and we plan to include the store (deinterleved side).
The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7
into
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7
Reviewers
1. zvi
2. igor
3. guyblank
4. dorit
5. Ayal
llvm-svn: 312722
The current description of AllowAllParametersOfDeclarationOnNextLine in
the Clang-Format Style Options guide suggests that it is possible to
format function declaration, which fits in a single line (what is not
supported in current clang-format version). Also the example was not
reproducible and mades no sense.
Patch by Lucja Mazur, thank you!
llvm-svn: 312721
This change converts the `MipsAsmBackend` constructor to the "standard"
form. It makes possible to use `RegisterMCAsmBackend` for the backends
registrations. Now we pass `Triple` instance to the `MipsAsmBackend`
ctor and deduce all required options like endianness and bitness from
the triple. We still need to implement explicit ABI checking for
providing correct options to backends.
Differential revision: https://reviews.llvm.org/D37519
llvm-svn: 312720
Summary:
For large basic blocks with lots of combinable instructions, the
MachineTraceMetrics computations in MachineCombiner can dominate the compile
time, as computing the trace information is quadratic in the number of
instructions in a BB and it's relevant successors/predecessors.
In most cases, knowing the instruction depth should be enough to make
combination decisions. As we already iterate over all instructions in a basic
block, the instruction depth can be computed incrementally. This reduces the
cost of machine-combine drastically in cases where lots of instructions
are combined. The major drawback is that AFAIK, computing the critical path
length cannot be done incrementally. Therefore we only compute
instruction depths incrementally, for basic blocks with more
instructions than inc_threshold. The -machine-combiner-inc-threshold
option can be used to set the threshold and allows for easier
experimenting and checking if using incremental updates for all basic
blocks has any impact on the performance.
Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
Reviewed By: fhahn
Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D36619
llvm-svn: 312719
The type of NewValue might change due to ScalarEvolution
looking though bitcasts. The synthesized NewValue therefore
becomes the type before the bitcast.
llvm-svn: 312718
Summary:
Looks like we are out of sync between the doc and the code.
Reviewers: djasper
Reviewed By: djasper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D37558
llvm-svn: 312716
Summary:
This function is used in D36619 to update the instruction depths
incrementally.
Reviewers: efriedma, Gerolf, MatzeB, fhahn
Reviewed By: fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36696
llvm-svn: 312714
The ARM, BPF, MSP430, Sparc and Mips backends all use a similar code sequence
for lowering SelectCC. As pointed out by @reames in D29937, this code isn't
particularly clear and in most of these backends doesn't actually match the
comments. This patch makes the code sequence clearer for the Sparc backend
through better variable naming and more accurate comments (e.g. we are
inserting triangle control flow, _not_ diamond). There is no functional
change.
Differential Revision: https://reviews.llvm.org/D37194
llvm-svn: 312713