It wasn't actually running the pass, and since it was missing the
'llvm.' prefix, the EH intrinsic was not really an IntrinsicInst.
Also add missing test for lifetime markers.
llvm-svn: 275370
Summary:
In this patch we implement the following parts of XRay:
- Supporting a function attribute named 'function-instrument', which currently only supports the value 'xray-always'. We should be able to use this attribute for other instrumentation approaches as well (a usage sketch follows this list).
- Supporting a function attribute named 'xray-instruction-threshold', used to determine whether a function is instrumented based on a minimum number of instructions (IR instruction count).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.
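For illustration, here is a minimal sketch of how a front end could attach
these attributes through the C++ API (the attribute names come from this
patch; the helper function and the threshold value "200" are made up for
the example):

#include "llvm/IR/Function.h"

// Hypothetical helper: mark F for unconditional XRay instrumentation,
// with an example minimum size of 200 IR instructions.
void markForXRay(llvm::Function &F) {
  F.addFnAttr("function-instrument", "xray-always");
  F.addFnAttr("xray-instruction-threshold", "200");
}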
There are some caveats here:
1) We don't handle PATCHABLE_RET on platforms other than x86_64 yet -- this means that if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time so that, by default, it is unpacked on platforms where XRay is not yet available.
2) The generated section for X86 is different from what is described in the white paper, for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper in this respect to allow us to get richer information from the runtime library.
Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk
Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D19904
llvm-svn: 275367
This should now also work with the interprocedural variant of the pass.
Slightly easier now that the yak is shaved.
Differential Revision: http://reviews.llvm.org/D22329
llvm-svn: 275363
Summary:
In Scalarizer::gather we see if we already have a scattered form of Op,
and in that case use the new form.
In the particular case of PR28108, the found ValueVector SV has size 2,
where the first Value is nullptr, and the second is indeed a proper Value.
The nullptr then triggered an assertion when we tried to do
cast<Instruction>(SV[I]).
With this patch we check SV[I] before doing the cast, and if it's nullptr
we just skip over it.
I don't know the Scalarizer well enough to know if this is the best fix,
or if something should be done elsewhere to prevent the nullptr from
being in the ValueVector at all, but at least this avoids the crash,
and the test case output looks reasonable.
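A minimal sketch of the guard, with the surrounding loop paraphrased (the
exact code in Scalarizer::gather differs):

for (unsigned I = 0, E = SV.size(); I != E; ++I) {
  // SV[I] can be nullptr (see PR28108); skip such gaps instead of
  // asserting inside cast<Instruction>.
  if (!SV[I])
    continue;
  Instruction *Old = cast<Instruction>(SV[I]);
  // ... replace Old with the new scattered value ...
}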
Reviewers: hfinkel, frasercrmck, wala, mehdi_amini
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21518
llvm-svn: 275359
This adds Clang-specific DWARF constants for nullability and ObjC
class properties that are already generated by clang. This patch adds
dwarfdump support and a more comprehensive testcase.
<rdar://problem/27335745>
llvm-svn: 275354
Avoid exposing a cl::opt in a public header and instead promote this
option into the API.
Alternatively, we could land the cl::opt in CommandFlags.h so that
it is available to every tool, but we would still have to find an
option for clang.
llvm-svn: 275348
IPRA tries to optimize caller-saved registers by propagating register
usage information from callee to caller, so it is beneficial to prefer
caller-saved registers over callee-saved registers when IPRA is
enabled. Please find a more detailed explanation here:
https://groups.google.com/d/msg/llvm-dev/XRzGhJ9wtZg/tjAJqb0eEgAJ.
With this change, local functions have no callee-preserved registers
when IPRA is enabled. A simple test case is also added to verify this
change.
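A hypothetical sketch of the idea (not the actual patch; the flag and the
CSR_* list names are placeholders): when IPRA is enabled, a target can
report an empty callee-saved list for functions with local linkage, so
every register becomes caller-saved inside them.

const MCPhysReg *getCalleeSavedRegsSketch(const MachineFunction *MF) {
  // IPRAEnabled and the CSR_* lists are placeholders for this sketch.
  if (IPRAEnabled && MF->getFunction()->hasLocalLinkage())
    return CSR_NoRegs_SaveList; // no callee-preserved registers
  return CSR_Default_SaveList;  // the target's usual save list
}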
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D21561
llvm-svn: 275347
Treat loads which clip before the start of a global initializer the same
way we treat clipping beyond the end of the initializer: use zeros.
llvm-svn: 275345
Code cleanup: Move references to SlotMapping and SourceMgr into the
PerFunctionMIParsingState to avoid unnecessarily passing them around
as parameters.
llvm-svn: 275342
The code was pretty much copy-pasted between SCCP and IPSCCP. The situation
became clearly worse after I introduced the support for folding structs in
SCCP. This commit is NFC as we currently (still) skip the replacement
step in IPSCCP, but I'll change this soon.
llvm-svn: 275339
Summary:
Previously it would say we had an invariant load if any of the memory
operands were invariant. But the load should be invariant only if *all*
the memory operands are invariant.
No testcase because this has proven to be very difficult to tickle in
practice. As just one example, ARM's ldrd instruction, which loads 64
bits into two 32-bit regs, is theoretically affected by this. But when
it's produced, it loses its memoperands' invariance bits!
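A simplified sketch of the corrected predicate (the real check lives on
MachineInstr and considers more than shown here):

bool allMemOperandsInvariant(const MachineInstr &MI) {
  if (MI.memoperands_empty())
    return false; // be conservative when we know nothing
  for (const MachineMemOperand *MMO : MI.memoperands())
    if (!MMO->isInvariant())
      return false; // one mutable operand makes the whole load mutable
  return true;
}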
Reviewers: jfb
Subscribers: llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D22318
llvm-svn: 275331
Code cleanup: The PerFunctionMIParsingState is per function; by moving a
MachineFunction reference into the PFS we can avoid passing the
MachineFunction around as an extra parameter most of the time.
Also change most signatures to consistently pass the PFS reference first.
llvm-svn: 275329
It's safe to print out source coverage views using multiple threads when
using the -output-dir mode of the `llvm-cov show` sub-command.
While testing this on my development machine, I observed that the speed
up is roughly linear with the number of available cores. Avg. time for
`llvm-cov show ./llvm-as -show-line-counts-or-regions`:
1 thread: 7.79s user 0.33s system 98% cpu 8.228 total
4 threads: 7.82s user 0.34s system 283% cpu 2.880 total
llvm-svn: 275321
This happens to make X86CallFrameOptimization run in -O0 / FastISel builds
as well, but it's not clear whether the pass should run in that setup.
http://reviews.llvm.org/D22314
llvm-svn: 275320
Summary:
LSV used to abort vectorizing a chain entirely when it contained interleaved load/store accesses that alias.
Now we allow a valid prefix of the chain to be vectorized, mark just the prefix, and retry vectorizing the remaining chain.
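A hedged sketch of the retry strategy (all helper names are hypothetical;
the real logic lives in the LoadStoreVectorizer):

#include "llvm/ADT/ArrayRef.h"
#include <algorithm>

void vectorizeChainWithRetry(llvm::ArrayRef<llvm::Instruction *> Chain) {
  while (!Chain.empty()) {
    // Longest leading subsequence with no aliasing conflict inside it.
    unsigned Prefix = longestSafePrefix(Chain);
    if (Prefix >= 2)
      vectorizePrefix(Chain.take_front(Prefix));
    // Retry on whatever remains after the (possibly unusable) prefix.
    Chain = Chain.drop_front(std::max(Prefix, 1u));
  }
}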
Reviewers: llvm-commits, jlebar, arsenm
Subscribers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D22119
llvm-svn: 275317
See http://reviews.llvm.org/D22079
Changes the Archive::child_begin and Archive::children to require a reference
to an Error. If iterator increment fails (because the archive header is
damaged) the iterator will be set to 'end()', and the error stored in the
given Error&. The Error value should be checked by the user immediately after
the loop. E.g.:
Error Err;
for (auto &C : A->children(Err)) {
  // Do something with archive child C.
}
// Check the error immediately after the loop.
if (Err)
  return Err;
Failure to check the Error will result in an abort() when the Error goes out of
scope (as guaranteed by the Error class).
llvm-svn: 275316
Currently the MIR framework prints all its outputs (errors and actual
representation) on stderr.
This patch fixes that by printing the regular output to the output file
specified with -o.
Differential Revision: http://reviews.llvm.org/D22251
llvm-svn: 275314
Doing "I++" inside of an EXPECT_* triggers
warning: expression with side effects has no effect in an unevaluated context
because EXPECT_* partially expands to
EqHelper<(sizeof(::testing::internal::IsNullLiteralHelper(MockObjects[I++] + 1)) == 1)>
which is an unevaluated context.
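A minimal sketch of the usual fix (the test context and the name Expected
are hypothetical): hoist the side effect out of the macro so it is
evaluated exactly once.

// Before (warns): EXPECT_EQ(MockObjects[I++] + 1, Expected);
EXPECT_EQ(MockObjects[I] + 1, Expected);
++I; // the increment now happens outside the unevaluated context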
llvm-svn: 275293
Summary: Normally when you do a bitwise operation on an enum value, you
get back an instance of the underlying type (e.g. int). But using this
macro, bitwise ops on your enum will return you back instances of the
enum. This is particularly useful for enums which represent a
combination of flags.
Suppose you have a function which takes an int and a set of flags. One
way to do this would be to take two numeric params:
enum SomeFlags { F1 = 1, F2 = 2, F3 = 4, ... };
void Fn(int Num, int Flags);
void foo() {
  Fn(42, F2 | F3);
}
But now if you get the order of arguments wrong, you won't get an error.
You might try to fix this by changing the signature of Fn so it accepts
a SomeFlags arg:
enum SomeFlags { F1 = 1, F2 = 2, F3 = 4, ... };
void Fn(int Num, SomeFlags Flags);
void foo() {
  Fn(42, static_cast<SomeFlags>(F2 | F3));
}
But now we need a static cast after doing "F2 | F3" because the result
of that computation is the enum's underlying type.
This patch adds a mechanism which gives us the safety of the second
approach with the brevity of the first.
enum SomeFlags {
  F1 = 1, F2 = 2, F3 = 4, ..., F_MAX = 128,
  LLVM_MARK_AS_BITMASK_ENUM(F_MAX)
};
void Fn(int Num, SomeFlags Flags);
void foo() {
  Fn(42, F2 | F3); // No static_cast.
}
The LLVM_MARK_AS_BITMASK_ENUM macro enables overloads for bitwise
operators on SomeFlags. Critically, these operators return the enum
type, not its underlying type, so you don't need any static_casts.
An advantage of this solution over the previously-proposed BitMask class
[0, 1] is that we don't need any wrapper classes -- we can operate
directly on the enum itself.
The approach here is somewhat similar to OpenOffice's typed_flags_set
[2]. But we skirt the need for a wrapper class (and a good deal of
complexity) by judicious use of enable_if. We SFINAE on the presence of
a particular enumerator (added by the LLVM_MARK_AS_BITMASK_ENUM macro)
instead of using a traits class so that it's impossible to use the enum
before the overloads are present. The solution here also seamlessly
works across multiple namespaces.
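For intuition, a simplified sketch of the mechanism (not the actual LLVM
macro or operator set): SFINAE on the marker enumerator so the overload
exists only for enums that opted in.

#include <type_traits>

template <typename E,
          typename = decltype(E::LLVM_BITMASK_LARGEST_ENUMERATOR)>
E operator|(E LHS, E RHS) {
  // Do the math on the underlying type, then return the enum type.
  using U = typename std::underlying_type<E>::type;
  return static_cast<E>(static_cast<U>(LHS) | static_cast<U>(RHS));
}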
[0] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150622/283369.html
[1] http://lists.llvm.org/pipermail/llvm-commits/attachments/20150623/073434b6/attachment.obj
[2] https://cgit.freedesktop.org/libreoffice/core/tree/include/o3tl/typed_flags_set.hxx
Reviewers: chandlerc, rsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22279
llvm-svn: 275292
Because of the goop involved in the EXPECT_EQ macro, we were getting the
following warning
expression with side effects has no effect in an unevaluated context
because the "I++" was being used inside of a template type:
switch (0) case 0: default: if (const ::testing::AssertionResult gtest_ar = (::testing::internal:: EqHelper<(sizeof(::testing::internal::IsNullLiteralHelper(Args[I++])) == 1)>::Compare("Args[I++]", "&A", Args[I++], &A))) ; else ::testing::internal::AssertHelper(::testing::TestPartResult::kNonFatalFailure, "../src/unittests/IR/FunctionTest.cpp", 94, gtest_ar.failure_message()) = ::testing::Message();
llvm-svn: 275291
In D21740, we discussed trying to make this a more general matcher. However, I didn't see a clean
way to handle the regular m_Not cases and these non-splat vector patterns, so I've opted for the
direct approach here. If there are other potential uses of areInverseVectorBitmasks(), we could
move that helper function to a higher level.
There is an open question as to which of these forms should be considered the canonical IR:
%sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
%shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
Differential Revision: http://reviews.llvm.org/D22114
llvm-svn: 275289
Summary:
v2: don't count SGPRs spilled to scratch twice
I think this is sufficient. It doesn't count private memory usage, which
happens often and uses scratch but isn't technically a spill. The private
memory usage can be computed by:
[scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].
The fact that SGPR spills add very high numbers to the scratch size makes
that computation a guessing game, but I don't have a solution to that.
Reviewers: tstellarAMD
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D22197
llvm-svn: 275288
While testing a follow-on change to enable index-based symbol resolution
and internalization in the distributed backends, I realized that a test
case change I made in r275247 was only required because we were not
analyzing symbols in the claimed files in thinlto-index-only mode.
In the fixed test case there should be no internalization because we are
linking in -shared mode, so f() is in fact exported, which is detected
properly when we analyze symbols in thinlto-index-only mode. Note that
this is not (yet) a correctness issue (because we are not yet performing
the index-based linkage optimizations in the distributed backends -
that's coming in a follow-on patch).
llvm-svn: 275277
We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading.
One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the
same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if
it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the
test cases is filed as PR28505:
https://llvm.org/bugs/show_bug.cgi?id=28505
Differential Revision: http://reviews.llvm.org/D22225
llvm-svn: 275276
- Add new TTI instruction checks
- Don't use const for blocks that are mutated.
- Checking isBranch and isTerminator should be redundant
llvm-svn: 275252
Summary:
This is necessary for D21771. In order to add the hotness attribute to
optimization remarks we need BFI to be available in all passes that emit
optimization remarks.
However we don't want to pay for computing BFI unless the hotness
attribute is requested.
This is achieved by making BFI lazy at a very high level through a new
analysis pass -- BFI is not calculated unless requested.
I am adding a test to check the laziness under D21771 where the first
user of the analysis is added.
Reviewers: hfinkel, dexonsmith, davidxl
Subscribers: davidxl, dexonsmith, llvm-commits
Differential Revision: http://reviews.llvm.org/D22141
llvm-svn: 275250
This achieves the same result as before, now using line wrapping. This allows
us to have one keyword per line, which makes adding a new keyword significantly
easier, especially when keywords are kept in lexicographic order, as you no
longer need to reflow the surrounding content.
This only does the keywords as that is the group which changes more often.
llvm-svn: 275248
Internalization was missing cases where we originally had a local symbol
that was promoted eagerly but not actually exported. This is because we
were only internalizing the set of global (non-local) symbols that were
PREVAILAING_DEF_IRONLY. Instead, collect the set of global symbols that
are referenced outside of a single IR file, and skip internalization for
those.
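A hedged sketch of the resulting rule (names hypothetical): only
internalize a symbol when it is not referenced outside its defining IR
file.

#include "llvm/ADT/DenseSet.h"
#include "llvm/IR/GlobalValue.h"

void internalizeIfFileLocal(llvm::GlobalValue &GV,
                            const llvm::DenseSet<llvm::GlobalValue::GUID>
                                &ReferencedOutsideFile) {
  if (ReferencedOutsideFile.count(GV.getGUID()))
    return; // referenced from another IR file; must stay exported
  GV.setLinkage(llvm::GlobalValue::InternalLinkage);
}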
llvm-svn: 275247
The many levels of nesting inside the responsible code made it easy for
bugs to sneak in. Flattening the logic makes it easier to see what's
going on.
llvm-svn: 275244
These patterns just extracted the source down to 128 bits to use the instructions. AVX512 seems to have blindly copied them over for VLX, but did not create similar patterns for 512-bit sources. So I'm hoping the backend can't actually produce these cases.
llvm-svn: 275240
New pass manager for LICM.
Summary: Port LICM to the new pass manager.
Reviewers: davidxl, silvas
Subscribers: krasin, vitalybuka, silvas, davide, sanjoy, llvm-commits, mehdi_amini
Differential Revision: http://reviews.llvm.org/D21772
llvm-svn: 275224
We can freeze the registers after the MachineFrameInfo has been configured (by
telling it about calls, inline asm, ...). This doesn't happen at all yet, but
will be part of IR translation.
Fixes -verify-machineinstrs assertion.
llvm-svn: 275221
The LCSSA pass itself will not generate several redundant PHI nodes in a single
exit block. However, such redundant PHI nodes don't violate LCSSA form, and may
be introduced by passes that preserve LCSSA, and/or preserved by the LCSSA pass
itself. So, assuming a single PHI node per exit block is not safe.
llvm-svn: 275217