Doing "I++" inside of an EXPECT_* triggers
warning: expression with side effects has no effect in an unevaluated context
because EXPECT_* partially expands to
EqHelper<(sizeof(::testing::internal::IsNullLiteralHelper(MockObjects[I++] + 1)) == 1)>
which is an unevaluated context.
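The warning can be reproduced without gtest; this minimal sketch (illustrative, not part of the change) shows the pattern clang complains about:

int I = 0;
static_assert(sizeof(I++) == sizeof(int), "");
// warning: expression with side effects has no effect in an unevaluated
// context -- the operand of sizeof is never evaluated, so I++ never runs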
llvm-svn: 275293
Summary: Normally when you do a bitwise operation on an enum value, you
get back an instance of the underlying type (e.g. int). But using this
macro, bitwise ops on your enum will return you back instances of the
enum. This is particularly useful for enums which represent a
combination of flags.
Suppose you have a function which takes an int and a set of flags. One
way to do this would be to take two numeric params:
enum SomeFlags { F1 = 1, F2 = 2, F3 = 4, ... };
void Fn(int Num, int Flags);
void foo() {
  Fn(42, F2 | F3);
}
But now if you get the order of arguments wrong, you won't get an error.
You might try to fix this by changing the signature of Fn so it accepts
a SomeFlags arg:
enum SomeFlags { F1 = 1, F2 = 2, F3 = 4, ... };
void Fn(int Num, SomeFlags Flags);
void foo() {
  Fn(42, static_cast<SomeFlags>(F2 | F3));
}
But now we need a static cast after doing "F2 | F3" because the result
of that computation is the enum's underlying type.
This patch adds a mechanism which gives us the safety of the second
approach with the brevity of the first.
enum SomeFlags {
  F1 = 1, F2 = 2, F3 = 4, ..., F_MAX = 128,
  LLVM_MARK_AS_BITMASK_ENUM(F_MAX)
};
void Fn(int Num, SomeFlags Flags);
void foo() {
  Fn(42, F2 | F3); // No static_cast.
}
The LLVM_MARK_AS_BITMASK_ENUM macro enables overloads for bitwise
operators on SomeFlags. Critically, these operators return the enum
type, not its underlying type, so you don't need any static_casts.
An advantage of this solution over the previously-proposed BitMask class
[0, 1] is that we don't need any wrapper classes -- we can operate
directly on the enum itself.
The approach here is somewhat similar to OpenOffice's typed_flags_set
[2]. But we skirt the need for a wrapper class (and a good deal of
complexity) by judicious use of enable_if. We SFINAE on the presence of
a particular enumerator (added by the LLVM_MARK_AS_BITMASK_ENUM macro)
instead of using a traits class so that it's impossible to use the enum
before the overloads are present. The solution here also seamlessly
works across multiple namespaces.
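Roughly, the machinery looks like this (a minimal sketch; the macro, trait,
and enumerator names are illustrative rather than the exact ones in the LLVM
header):

#include <type_traits>

#define MY_MARK_AS_BITMASK_ENUM(LargestValue) \
  MY_BITMASK_LARGEST_ENUMERATOR = LargestValue

// Detection trait: the partial specialization only substitutes successfully
// when E has the marker enumerator, i.e. when the enum opted in via the macro.
template <typename E, typename Enable = void>
struct is_bitmask_enum : std::false_type {};

template <typename E>
struct is_bitmask_enum<
    E, typename std::enable_if<(sizeof(E::MY_BITMASK_LARGEST_ENUMERATOR) >
                                0)>::type> : std::true_type {};

// This overload exists only for marked enums, and returns the enum type.
template <typename E>
typename std::enable_if<is_bitmask_enum<E>::value, E>::type
operator|(E LHS, E RHS) {
  typedef typename std::underlying_type<E>::type U;
  return static_cast<E>(static_cast<U>(LHS) | static_cast<U>(RHS));
}

enum SomeFlags { F1 = 1, F2 = 2, F3 = 4, MY_MARK_AS_BITMASK_ENUM(F3) };
SomeFlags FS = F2 | F3; // resolves to the overload above; no static_cast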
[0] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150622/283369.html
[1] http://lists.llvm.org/pipermail/llvm-commits/attachments/20150623/073434b6/attachment.obj
[2] https://cgit.freedesktop.org/libreoffice/core/tree/include/o3tl/typed_flags_set.hxx
Reviewers: chandlerc, rsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22279
llvm-svn: 275292
Because of the goop involved in the EXPECT_EQ macro, we were getting the
following warning
expression with side effects has no effect in an unevaluated context
because the "I++" was being used inside of a template type:
switch (0) case 0: default: if (const ::testing::AssertionResult gtest_ar = (::testing::internal:: EqHelper<(sizeof(::testing::internal::IsNullLiteralHelper(Args[I++])) == 1)>::Compare("Args[I++]", "&A", Args[I++], &A))) ; else ::testing::internal::AssertHelper(::testing::TestPartResult::kNonFatalFailure, "../src/unittests/IR/FunctionTest.cpp", 94, gtest_ar.failure_message()) = ::testing::Message();
llvm-svn: 275291
In D21740, we discussed trying to make this a more general matcher. However, I didn't see a clean
way to handle the regular m_Not cases and these non-splat vector patterns, so I've opted for the
direct approach here. If there are other potential uses of areInverseVectorBitmasks(), we could
move that helper function to a higher level.
There is an open question as to which of these forms should be considered the canonical IR:
%sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
%shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
Differential Revision: http://reviews.llvm.org/D22114
llvm-svn: 275289
Summary:
v2: don't count SGPRs spilled to scratch twice
I think this is sufficient. It doesn't count private memory usage, which
happens often and uses scratch but isn't technically a spill. The private
memory usage can be computed by:
[scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].
The fact that SGPR spills add very high numbers to the scratch size makes that
computation a guessing game, but I don't have a solution to that.
Reviewers: tstellarAMD
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D22197
llvm-svn: 275288
While testing a follow-on change to enable index-based symbol resolution
and internalization in the distributed backends, I realized that a test
case change I made in r275247 was only required because we were not
analyzing symbols in the claimed files in thinlto-index-only mode.
In the fixed test case there should be no internalization because we are
linking in -shared mode, so f() is in fact exported, which is detected
properly when we analyze symbols in thinlto-index-only mode. Note that
this is not (yet) a correctness issue (because we are not yet performing
the index-based linkage optimizations in the distributed backends -
that's coming in a follow-on patch).
llvm-svn: 275277
We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading.
One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the
same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if
it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the
test cases is filed as PR28505:
https://llvm.org/bugs/show_bug.cgi?id=28505
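The shape of the win is roughly the following (an illustrative x86 sketch;
the exact instruction mix is an assumption, not copied from the patch's
tests):

pcmpgtd %xmm1, %xmm0   # each lane is now 0xFFFFFFFF if true, 0x0 if false
psrld   $31, %xmm0     # shift the sign bit down: each lane becomes 1 or 0
# ...instead of 'pand' with a <1,1,1,1> constant loaded from memory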
Differential Revision: http://reviews.llvm.org/D22225
llvm-svn: 275276
- Add new TTI instruction checks.
- Don't use const for blocks that are mutated.
- Checking isBranch and isTerminator should be redundant.
llvm-svn: 275252
Summary:
This is necessary for D21771. In order to add the hotness attribute to
optimization remarks we need BFI to be available in all passes that emit
optimization remarks.
However we don't want to pay for computing BFI unless the hotness
attribute is requested.
This is achieved by making BFI lazy at the very high-level through a new
analysis pass -- BFI is not calculated unless requested.
I am adding a test to check the laziness under D21771 where the first
user of the analysis is added.
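The pattern is roughly the following (an illustrative C++ sketch of the
laziness idea, not the actual LazyBlockFrequencyInfo interface):

#include <functional>
#include <memory>

template <typename Result> class LazyAnalysis {
  std::function<Result()> Compute;  // how to build the result
  std::unique_ptr<Result> Cached;   // built on first use only
public:
  explicit LazyAnalysis(std::function<Result()> C) : Compute(std::move(C)) {}
  Result &get() {
    if (!Cached)
      Cached.reset(new Result(Compute())); // pay the cost only when requested
    return *Cached;
  }
};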
Reviewers: hfinkel, dexonsmith, davidxl
Subscribers: davidxl, dexonsmith, llvm-commits
Differential Revision: http://reviews.llvm.org/D22141
llvm-svn: 275250
This achieves the same result as previously by using line wrapping. This allows
us to have one keyword per line which makes adding a new keyword significantly
easier, especially if they are inserted in a lexicographical sort order as you
no longer need to reflow the content around it.
This only does the keywords as that is the group which changes more often.
llvm-svn: 275248
Internalization was missing cases where we originally had a local symbol
that was promoted eagerly but not actually exported. This is because we
were only internalizing the set of global (non-local) symbols that were
PREVAILING_DEF_IRONLY. Instead, collect the set of global symbols that
are referenced outside of a single IR file, and skip internalization for
those.
llvm-svn: 275247
The many levels of nesting inside the responsible code made it easy for
bugs to sneak in. Flattening the logic makes it easier to see what's
going on.
llvm-svn: 275244
These patterns just extracted the source down to 128-bits to use the instructions. AVX512 seems to have blindly copied them over for VLX, but did not create similar patterns for 512-bit sources. So I'm hoping the backend can't actually produce these cases.
llvm-svn: 275240
New pass manager for LICM.
Summary: Port LICM to the new pass manager.
Reviewers: davidxl, silvas
Subscribers: krasin, vitalybuka, silvas, davide, sanjoy, llvm-commits, mehdi_amini
Differential Revision: http://reviews.llvm.org/D21772
llvm-svn: 275224
We can freeze the registers after the MachineFrameInfo has been configured (by
telling it about calls, inline asm, ...). This doesn't happen at all yet, but
will be part of IR translation.
Fixes -verify-machineinstrs assertion.
llvm-svn: 275221
The LCSSA pass itself will not generate several redundant PHI nodes in a single
exit block. However, such redundant PHI nodes don't violate LCSSA form, and may
be introduced by passes that preserve LCSSA, and/or preserved by the LCSSA pass
itself. So, assuming a single PHI node per exit block is not safe.
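For example (an illustrative snippet, not taken from the patch), both PHIs
below are copies of the same loop-defined value, yet the block is in valid
LCSSA form:

exit:
  %use1 = phi i32 [ %v, %loop ]
  %use2 = phi i32 [ %v, %loop ]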
llvm-svn: 275217
Summary:
Refactored the profitability analysis out of the IC promotion pass and
into lib/Analysis so that it can be accessed by the summary index
builder in a follow-on patch to enable IC promotion in ThinLTO (D21932).
Reviewers: davidxl, xur
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22182
llvm-svn: 275216
This patch corresponds to review:
http://reviews.llvm.org/D20239
It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that
are useful in some cases for inserting and extracting vector elements of
v4[if]32 vectors.
llvm-svn: 275215
With r274952 and r275201 in place there are no cases left where a
forward liveness analysis yields different results than a backward one.
So we can remove the forward stepping logic.
Differential Revision: http://reviews.llvm.org/D22083
llvm-svn: 275204
Use LivePhysRegs with a backwards walking algorithm to update live in
lists, this way the results do not depend on the presence of kill flags
anymore.
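The walk is roughly the following (a simplified sketch on top of the existing
LivePhysRegs helper; exact signatures elided):

LivePhysRegs LiveRegs(TRI);
LiveRegs.addLiveOuts(MBB);                   // start from the block's live-outs
for (MachineInstr &MI : make_range(MBB.rbegin(), MBB.rend()))
  LiveRegs.stepBackward(MI);                 // clear defs, then add uses
// LiveRegs now holds MBB's live-ins, independent of any kill flags.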
This patch also reduces the number of registers added as live-in.
Previously all pristine registers as well as all sub registers of a
super register were added resulting in unnecessarily large live in
lists. This fixes https://llvm.org/PR25263.
Differential Revision: http://reviews.llvm.org/D22027
llvm-svn: 275201
If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now
only Kryo has both), set FMOVS0 and FMOVD0 isAsCheapAsAMove.
Differential Revision: http://reviews.llvm.org/D22256
llvm-svn: 275178
This patch corresponds to review:
http://reviews.llvm.org/D21358
Vector shifts that have the same semantics as a vector swap are canonicalized
as such to provide additional opportunities for swap removal optimization to
remove unnecessary swaps.
llvm-svn: 275168
Added support for:
1. Multi-dimensional arrays.
2. Arrays of structure types that were previously declared incompletely.
3. Dynamically sized arrays.
4. Arrays whose element type is a typedef, volatile, or constant (this should resolve PR28311).
Differential Revision: http://reviews.llvm.org/D21526
llvm-svn: 275167
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it misled
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D22217
llvm-svn: 275160
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr*, mainly by preferring MachineInstr& over MachineInstr* and
using range-based for loops.
llvm-svn: 275149
Summary:
It's useful to have some visibility about which call sites are devirtualized,
especially for debug purposes. Another use case is a regression test on the
application side (e.g., Chromium).
Reviewers: pcc
Differential Revision: http://reviews.llvm.org/D22252
llvm-svn: 275145
Avoid implicit iterator conversions from MachineInstrBundleIterator to
MachineInstr* in the Hexagon backend, mostly by preferring MachineInstr&
over MachineInstr* and switching to range-based for loops.
There's a long tail of API cleanup here, but I'm planning to leave the
rest to the Hexagon maintainers. HexagonInstrInfo defines many of its
own predicates, and most of them still take MachineInstr*. Some of
those actually check for nullptr, so I didn't feel comfortable changing
them to MachineInstr& en masse.
llvm-svn: 275142
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the Mips backend, mainly by preferring MachineInstr&
over MachineInstr* when a pointer isn't nullable and using range-based
for loops.
llvm-svn: 275141
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the SystemZ backend, mainly by preferring MachineInstr&
over MachineInstr* and using range-based for loops.
llvm-svn: 275137
The linker supports a feature to force load an object from a static
archive if it defines an Objective-C category.
This API supports this feature by looking at every section in the
module to find if a category is defined in the module.
llvm-svn: 275125
This patch simplifies the graph builder by encoding nodes as {Value,
Dereference Level} pairs. This lets us kill edge types, and allows us to
get rid of hacks in StratifiedSets (like addAttrsBelow/...). This
simplification also allows us to remove InstantiatedRelations and
InstantiatedAttrs.
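Conceptually, a node now looks like this (names are illustrative):

struct Node {
  const Value *Val;    // the IR value the node stands for
  unsigned DerefLevel; // 0 for the value itself, 1 for one dereference, etc.
};

so "one more dereference" is just the same Value at DerefLevel + 1, with no
separate edge type needed.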
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D22080
llvm-svn: 275122
Summary:
Aiming to correct the ordering of loads/stores. This patch changes the
insert point for loads to the position of the first load.
It updates the ordering method for loads to insert before, rather than after.
Before this patch, the following sequence:
"load a[1], store a[1], store a[0], load a[2]"
would incorrectly vectorize to "store a[0,1], load a[1,2]".
The correctness check was assuming the insertion point for loads is at
the position of the first load, when in practice it was at the last
load. An alternative fix would have been to invert the correctness check.
The current fix changes insert position but also requires reordering of
instructions before the vectorized load.
Updated testcases to reflect the changes.
Reviewers: tstellarAMD, llvm-commits, jlebar, arsenm
Subscribers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D22071
llvm-svn: 275117
Immediate branch targets aren't commonly used, but if they are we should make
sure they can actually be encoded. This means they must be divisible by 2 when
targeting Thumb mode, and by 4 when targeting ARM mode.
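A sketch of the constraint (an illustrative helper, not the patch's exact
code):

static bool isEncodableImmBranchTarget(int64_t Imm, bool IsThumb) {
  // Thumb instructions are 2-byte aligned; ARM instructions are 4-byte aligned.
  return Imm % (IsThumb ? 2 : 4) == 0;
}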
Also do a little naming cleanup while I was changing everything around anyway.
llvm-svn: 275116
Summary:
Setting MIMG to 0 has a bunch of unexpected side effects, including that
isVMEM returns false, which leads to incorrect treatment in the hazard
recognizer. The reason I noticed it is that it also leads to incorrect
treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.
The only reason why MIMG was set to 0 is to signal the special handling of
dmasks, but that can be checked differently.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D22210
llvm-svn: 275113
This will be useful once we start adding the ability to dump type
records and symbol records, since it will allow us to generate
mergeable information instead of information that specifies an
entire file.
llvm-svn: 275109
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.
The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21551
llvm-svn: 275108
Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).
Differential Revision: http://reviews.llvm.org/D22064
llvm-svn: 275106
Blocks to be tail-merged may share more than one successor. Correct the
comment to state that they share a specific successor, SuccBB, rather
than a single successor, which is not true.
llvm-svn: 275104
This bug (llvm.org/PR28124) was introduced by r237977, which refactored
the tail call sequence to be generated in two passes instead of one.
Unfortunately, the stack adjustment produced by the first pass was not
recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing
code such as the following, which clobbers the return address, to be
generated:
  popl %edi
  popl %edi
  pushl %eax
  jmp tailcallee # TAILCALL
To fix the problem, the entire stack adjustment is performed in
X86ExpandPseudo::ExpandMI() for tail calls.
Patch by Magnus Lång <margnus1@gmail.com>
Differential Revision: http://reviews.llvm.org/D21325
llvm-svn: 275103
It is an optimization pass, and should not run at -O0. Especially since Fast RA
will not do the required register coalescing anyway, so it's a loss even from
the optimization standpoint.
This also works around (but doesn't quite fix) PR28489.
llvm-svn: 275099
Summary: Add support for the z13 instructions LOCHI and LOCGHI which
conditionally load immediate values. Add target instruction info hooks so
that if conversion will allow predication of LHI/LGHI.
Author: RolandF
Reviewers: uweigand
Subscribers: zhanjunl
Committing on behalf of Roland.
Differential Revision: http://reviews.llvm.org/D22117
llvm-svn: 275086
There's a little bit of churn in this patch because the initialization
mechanism is now shared between the old and the new PM. Other than
that, it's just a pretty mechanical translation.
llvm-svn: 275082
Summary: http://reviews.llvm.org/D22118 uses metadata to store the call count, which makes it possible for a branch weight to have only one element. Also fix the assertion failure in the inliner when checking the instruction type to include the "invoke" instruction.
Reviewers: mkuper, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22228
llvm-svn: 275079
In preparation for porting this pass to the new PM (which has no
doInitialization()).
Differential Revision: http://reviews.llvm.org/D22223
llvm-svn: 275074
Summary:
For sample-based PGO, using BFI to calculate the callsite count is sometimes inaccurate. This is because with a sampling-based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to the lack of samples in the cold branches.
E.g.
if (A1 && A2 && A3 && ..... && A10) {
  for (i = 0; i < 100000000; i++) {
    callsite();
  }
}
Assume that A1 to A10 are all 100% taken, and that callsite has 1000 samples and is thus considered hot. Because the loop's trip count is huge, it's normal that all branches outside the loop have no samples at all. As a result, we can only use static branch probability to derive the frequency of the loop header. Assuming the static heuristic thinks each branch is 50% taken, the count calculated from BFI will be 1/(2^10) of the actual value.
In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.
Note that this mechanism can also be shared by instrumentation-based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI, as the call count is embedded in the IR.
Reviewers: davidxl, eraman, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22118
llvm-svn: 275073
Summary: Handle the case when there is only one incoming/outgoing edge for a visited basic block: use the block weight to adjust edge weight even when the edge has been visited before. This can help reduce inaccuracies introduced by incorrect basic block profile, as shown in the updated unittest.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22180
llvm-svn: 275072
This subtle change to getModRefInfo(Instruction, ImmutableCallSite) is to
ensure that the semantics are equal to those of getModRefInfo(CS1, CS2) when
the Instruction is a call-site.
This is now more in line with getModRefInfo generally: it returns Mod when
I modifies a memory location that is accessed (read or written) by CS and
Ref when I reads a memory location that is written by CS.
From a grep of the code, the only uses of this particular getModRefInfo
overload are in MemorySSA and MemCpyOptimizer, and they only care about
whether the result is MR_NoModRef or not. Therefore, this change should have
no visible effect.
Separated out from D17279 upon request.
llvm-svn: 275065
At present the only shuffle with a variable mask that we recognise is PSHUFB, which influences whether it's worth the cost of mask creation/loading for a combined target shuffle with a variable mask. This change sets up the infrastructure to support other shuffles in the future but has no effect yet.
llvm-svn: 275059
Preserve assembly comments from the input in the output assembly, and add
flags to toggle this behavior. It is on by default for inline assembly and
off in llvm-mc.
Parsed comments are emitted immediately before an EOL which generally
places them on the expected line.
Reviewers: rtrieu, dwmw2, rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20020
llvm-svn: 275058
For functions which are known to return a specific argument, pointer-comparison
folding can look through the function calls as part of its analysis.
Differential Revision: http://reviews.llvm.org/D9387
llvm-svn: 275039
For functions which are known to return their argument,
isDereferenceableAndAlignedPointer can examine the argument value.
Differential Revision: http://reviews.llvm.org/D9384
llvm-svn: 275038
When building SCEVs, if a function is known to return its argument, then we can
build the SCEV using the corresponding argument value.
Differential Revision: http://reviews.llvm.org/D9381
llvm-svn: 275037
If a function is known to return one of its arguments, we can use that in order
to compute known bits of the return value.
Differential Revision: http://reviews.llvm.org/D9397
llvm-svn: 275036
Motivated by the work on the llvm.noalias intrinsic, teach BasicAA to look
through returned-argument functions when answering queries. This is essential
so that we don't loose all other AA information when supplementing with
llvm.noalias.
Differential Revision: http://reviews.llvm.org/D9383
llvm-svn: 275035
In order to make the optimizer smarter about using the 'returned' argument
attribute (generally, but motivated by my llvm.noalias intrinsic work), add a
utility function to Call/InvokeInst, and CallSite, to make it easy to get the
returned call argument (when one exists).
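Usage might then look like this (an illustrative helper; treat the exact
accessor signature as an assumption):

static Value *lookThroughCall(CallSite CS) {
  if (Value *V = CS.getReturnedArgOperand())
    return V;                 // the call is known to return this argument
  return CS.getInstruction(); // otherwise model the call's own result
}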
P.S. There is already an unfortunate amount of code duplication between
CallInst and InvokeInst, and this adds to it. We should probably clean that up
separately.
Differential Revision: http://reviews.llvm.org/D22204
llvm-svn: 275031
Calls to matchVectorShuffleAsInsertPS only need to ensure the inputs are 128-bit vectors. Only lowerVectorShuffleAsInsertPS needs to ensure that they are v4f32.
llvm-svn: 275028
A function can have one argument with the 'returned' attribute, indicating that
the associated argument is always the return value of the function. Add
FuncAttrs inference logic.
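For example (illustrative IR), a function that always returns its argument
can have that argument marked 'returned':

define i8* @pass_through(i8* returned %p) {
  ret i8* %p
}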
Differential Revision: http://reviews.llvm.org/D22202
llvm-svn: 275027
The description of the 'returned' attribute says that it is only used when
code-generating the caller. I'd like to make the optimizer smarter about
looking through functions with returned arguments (generally, but motivated by
my llvm.noalias work). As David pointed out in the review of D22202, the
LangRef should be updated to make its expanded uses clearer.
Differential Revision: http://reviews.llvm.org/D22205
llvm-svn: 275026