Commit Graph

134921 Commits

Author SHA1 Message Date
Matt Arsenault f071102647 AMDGPU: Remove last AMDIL intrinsics
llvm-svn: 275309
2016-07-13 19:42:06 +00:00
Davide Italiano 390b7ea533 [SCCP] Factor out common code.
llvm-svn: 275308
2016-07-13 19:33:25 +00:00
Davide Italiano 2185001551 [SCCP] Use early return. NFCI.
llvm-svn: 275307
2016-07-13 19:23:30 +00:00
Andrew Kaylor 346dd7f1bd Reverting r275284 due to platform-specific test failures
llvm-svn: 275304
2016-07-13 19:09:16 +00:00
Sanjay Patel eff2aa70fc add more tests for zexty xor sandwiches
...mmm sandwiches

llvm-svn: 275302
2016-07-13 18:58:55 +00:00
Simon Pilgrim 5d664af3c3 [X86][SSE] Regenerate truncated shift test
Check SSE2 and AVX2 implementations

llvm-svn: 275300
2016-07-13 18:50:10 +00:00
Simon Pilgrim 631643e7d9 Regenerate test
llvm-svn: 275299
2016-07-13 18:46:37 +00:00
Sanjay Patel 904a88025a add test for zexty xor sandwich
llvm-svn: 275297
2016-07-13 18:40:38 +00:00
Justin Lebar 544b23d88f Fix header comment in unittests/CodeGen/DIEHashTest.cpp.
llvm-svn: 275296
2016-07-13 18:38:20 +00:00
Krzysztof Parzyszek cb4dd7656b Move mempcpy_call.ll to X86 subdirectory
llvm-svn: 275294
2016-07-13 18:28:45 +00:00
Justin Lebar 0753800383 Fix warning in ObjectTransformLayerTest.
Doing "I++" inside of an EXPECT_* triggers

  warning: expression with side effects has no effect in an unevaluated context

because EXPECT_* partially expands to

  EqHelper<(sizeof(::testing::internal::IsNullLiteralHelper(MockObjects[I++] + 1)) == 1)>

which is an unevaluated context.

llvm-svn: 275293
2016-07-13 18:27:49 +00:00
Justin Lebar 81edbbe259 [ADT] Add LLVM_MARK_AS_BITMASK_ENUM, used to enable bitwise operations on enums without static_cast.
Summary: Normally when you do a bitwise operation on an enum value, you
get back an instance of the underlying type (e.g. int).  But using this
macro, bitwise ops on your enum will return you back instances of the
enum.  This is particularly useful for enums which represent a
combination of flags.

Suppose you have a function which takes an int and a set of flags.  One
way to do this would be to take two numeric params:

  enum SomeFlags { F1 = 1, F2 = 2, F3 = 4, ... };
  void Fn(int Num, int Flags);

  void foo() {
    Fn(42, F2 | F3);
  }

But now if you get the order of arguments wrong, you won't get an error.

You might try to fix this by changing the signature of Fn so it accepts
a SomeFlags arg:

  enum SomeFlags { F1 = 1, F2 = 2, F3 = 4, ... };
  void Fn(int Num, SomeFlags Flags);

  void foo() {
    Fn(42, static_cast<SomeFlags>(F2 | F3));
  }

But now we need a static cast after doing "F2 | F3" because the result
of that computation is the enum's underlying type.

This patch adds a mechanism which gives us the safety of the second
approach with the brevity of the first.

  enum SomeFlags {
    F1 = 1, F2 = 2, F3 = 4, ..., F_MAX = 128,
    LLVM_MARK_AS_BITMASK_ENUM(F_MAX)
  };

  void Fn(int Num, SomeFlags Flags);

  void foo() {
    Fn(42, F2 | F3);  // No static_cast.
  }

The LLVM_MARK_AS_BITMASK_ENUM macro enables overloads for bitwise
operators on SomeFlags.  Critically, these operators return the enum
type, not its underlying type, so you don't need any static_casts.

An advantage of this solution over the previously-proposed BitMask class
[0, 1] is that we don't need any wrapper classes -- we can operate
directly on the enum itself.

The approach here is somewhat similar to OpenOffice's typed_flags_set
[2].  But we skirt the need for a wrapper class (and a good deal of
complexity) by judicious use of enable_if.  We SFINAE on the presence of
a particular enumerator (added by the LLVM_MARK_AS_BITMASK_ENUM macro)
instead of using a traits class so that it's impossible to use the enum
before the overloads are present.  The solution here also seamlessly
works across multiple namespaces.

[0] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150622/283369.html
[1] http://lists.llvm.org/pipermail/llvm-commits/attachments/20150623/073434b6/attachment.obj
[2] https://cgit.freedesktop.org/libreoffice/core/tree/include/o3tl/typed_flags_set.hxx

Reviewers: chandlerc, rsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22279

llvm-svn: 275292
2016-07-13 18:23:16 +00:00
Justin Lebar ab4622cb2a Fix warnings in FunctionTest.cpp.
Because of the goop involved in the EXPECT_EQ macro, we were getting the
following warning

  expression with side effects has no effect in an unevaluated context

because the "I++" was being used inside of a template type:

  switch (0) case 0: default: if (const ::testing::AssertionResult gtest_ar = (::testing::internal:: EqHelper<(sizeof(::testing::internal::IsNullLiteralHelper(Args[I++])) == 1)>::Compare("Args[I++]", "&A", Args[I++], &A))) ; else ::testing::internal::AssertHelper(::testing::TestPartResult::kNonFatalFailure, "../src/unittests/IR/FunctionTest.cpp", 94, gtest_ar.failure_message()) = ::testing::Message();

llvm-svn: 275291
2016-07-13 18:17:46 +00:00
Sanjay Patel c00e48a3db [InstCombine] extend vector select matching for non-splat constants
In D21740, we discussed trying to make this a more general matcher. However, I didn't see a clean
way to handle the regular m_Not cases and these non-splat vector patterns, so I've opted for the
direct approach here. If there are other potential uses of areInverseVectorBitmasks(), we could
move that helper function to a higher level.

There is an open question as to which is of these forms should be considered the canonical IR:
  %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
  %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>

Differential Revision: http://reviews.llvm.org/D22114

llvm-svn: 275289
2016-07-13 18:07:02 +00:00
Marek Olsak 0532c190f7 AMDGPU/SI: Emit the number of SGPR and VGPR spills
Summary:
v2: don't count SGPRs spilled to scratch twice

I think this is sufficient. It doesn't count private memory usage, which
happens often and uses scratch but isn't technically a spill. The private
memory usage can be computed by:
  [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].

The fact SGPR spills add very high numbers to the scratch size make that
computation a guessing game, but I don't have a solution to that.

Reviewers: tstellarAMD

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D22197

llvm-svn: 275288
2016-07-13 17:35:15 +00:00
Andrew Kaylor 12cccdd731 Fix for Bug 26903, adds support to inline __builtin_mempcpy
Patch by Sunita Marathe

Differential Revision: http://reviews.llvm.org/D21920

llvm-svn: 275284
2016-07-13 17:25:11 +00:00
David Blaikie b83cf10899 PR28516: Fix LangRef description of call and invoke to match IR changes for typeless pointers
llvm-svn: 275283
2016-07-13 17:21:34 +00:00
Matthias Braun 512424f28a PatchableFunction: Skip pseudos that do not create code
This fixes http://llvm.org/PR28524

llvm-svn: 275278
2016-07-13 16:37:29 +00:00
Teresa Johnson b907d06151 [ThinLTO/gold] Enable symbol resolution in distributed backend case
While testing a follow-on change to enable index-based symbol resolution
and internalization in the distributed backends, I realized that a test
case change I made in r275247 was only required because we were not
analyzing symbols in the claimed files in thinlto-index-only mode.

In the fixed test case there should be no internalization because we are
linking in -shared mode, so f() is in fact exported, which is detected
properly when we analyze symbols in thinlto-index-only mode. Note that
this is not (yet) a correctness issue (because we are not yet performing
the index-based linkage optimizations in the distributed backends -
that's coming in a follow-on patch).

llvm-svn: 275277
2016-07-13 16:35:56 +00:00
Sanjay Patel 610a2f6525 [x86][SSE/AVX] optimize pcmp results better (PR28484)
We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading.

One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the
same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if
it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the
test cases is filed as PR28505:
https://llvm.org/bugs/show_bug.cgi?id=28505

Differential Revision: http://reviews.llvm.org/D22225

llvm-svn: 275276
2016-07-13 16:04:07 +00:00
David Majnemer 4cff2f8d49 [ConstantFolding] Use sdiv_ov
This is a simplification, there should be no functional change.

llvm-svn: 275273
2016-07-13 15:53:46 +00:00
Simon Pilgrim a99368fa35 [X86][AVX512] Add support for VPERMILPD/VPERMILPS variable shuffle mask comments
llvm-svn: 275272
2016-07-13 15:45:36 +00:00
Simon Pilgrim 48d8340760 [X86][AVX] Add support for target shuffle combining to VPERMILPS variable shuffle mask
Added AVX512F VPERMILPS shuffle decoding support

llvm-svn: 275270
2016-07-13 15:10:43 +00:00
Tom Stellard 418beb7671 AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL
Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21484

llvm-svn: 275268
2016-07-13 14:23:33 +00:00
Nirav Dave c1d8d4b268 Rename llc's -fpreserve-as-comments flag -preserve-as-comments.
llvm-svn: 275266
2016-07-13 14:20:41 +00:00
Nirav Dave 8ea792db60 [MC] Fix lexing ordering in assembly label parsing to preserve same line
comment placement.

llvm-svn: 275265
2016-07-13 14:03:12 +00:00
Simon Pilgrim 57548a6fa6 [X86][SSE] Check for lane crossing shuffles before trying to combine to PSHUFB
Removes a return-on-fail that was making it tricky to add other variable mask shuffles.

llvm-svn: 275262
2016-07-13 12:48:41 +00:00
Etienne Bergeron d8b9735a46 fix incorrect xref in sphinx doc
llvm-svn: 275255
2016-07-13 06:10:37 +00:00
Matt Arsenault 0056868c4a AMDGPU: Fold out no-op kill intrinsics
llvm-svn: 275253
2016-07-13 06:04:22 +00:00
Matt Arsenault 8dff86d878 AMDGPU: WQM cleanups
- Add new TTI instruction checks
- Don't use const for blocks that are mutated.
- Checking isBranch and isTerminator should be redundant

llvm-svn: 275252
2016-07-13 05:55:15 +00:00
David Majnemer 1b3db33e3d [ConstantFolding] Don't treat negative GEP offsets as positive
GEP offsets are signed, don't treat them as huge positive numbers.

llvm-svn: 275251
2016-07-13 05:16:16 +00:00
Adam Nemet c2f791d8a7 [BFI] Add new LazyBFI analysis pass
Summary:
This is necessary for D21771.  In order to add the hotness attribute to
optimization remarks we need BFI to be available in all passes that emit
optimization remarks.

However we don't want to pay for computing BFI unless the hotness
attribute is requested.

This is achieved by making BFI lazy at the very high-level through a new
analysis pass -- BFI is not calculated unless requested.

I am adding a test to check the laziness under D21771 where the first
user of the analysis is added.

Reviewers: hfinkel, dexonsmith, davidxl

Subscribers: davidxl, dexonsmith, llvm-commits

Differential Revision: http://reviews.llvm.org/D22141

llvm-svn: 275250
2016-07-13 05:01:48 +00:00
David Majnemer 90a9704a41 [ConstantFolding] Cleanups
No functional change is intended, just a minor cleanup.

llvm-svn: 275249
2016-07-13 04:22:12 +00:00
Saleem Abdulrasool b21e7834eb vim: separate the keywords into one per line
This achieves the same result as previously by using line wrapping.  This allows
us to have one keyword per line which makes adding a new keyword significantly
easier, especially if they are inserted in a lexicographical sort order as you
no longer need to reflow the content around it.

This only does the keywords as that is the group which changes more often.

llvm-svn: 275248
2016-07-13 03:47:58 +00:00
Teresa Johnson 27694571b1 [ThinLTO/gold] ThinLTO internalization fixes
Internalization was missing cases where we originally had a local symbol
that was promoted eagerly but not actually exported. This is because we
were only internalizing the set of global (non-local) symbols that were
PREVAILAING_DEF_IRONLY. Instead, collect the set of global symbols that
are referenced outside of a single IR file, and skip internalization for
those.

llvm-svn: 275247
2016-07-13 03:42:41 +00:00
David Majnemer 17bdf445e4 [IR] Make getIndexedOffsetInType return a signed result
A GEPed offset can go negative, the result of getIndexedOffsetInType
should according be a signed type.

llvm-svn: 275246
2016-07-13 03:42:38 +00:00
Saleem Abdulrasool f12c28d008 vim: add local_unnamed_addr keyword
The `local_unnamed_addr` was introduced in SVN r272709.  Update the syntax
highlighting rules.

llvm-svn: 275245
2016-07-13 03:36:05 +00:00
David Majnemer a7b6c973e5 [ConstantFold] Don't incorrectly infer inbounds on array GEP
The many levels of nesting inside the responsible code made it easy for
bugs to sneak in.  Flattening the logic makes it easier to see what's
going on.

llvm-svn: 275244
2016-07-13 03:24:41 +00:00
David Majnemer 81d877b392 [LoopVectorize] Further cleanups
No functional change is intended, just a minor cleanup.

llvm-svn: 275243
2016-07-13 03:24:38 +00:00
Craig Topper ff1c327ebb [X86] Remove some seemingly unnecessary patterns that supported vector zext/sext with 256-bit source types producing a 256-bit result.
These patterns just extracted the source down to 128-bits to use the instructions. AVX512 seems to have blindly copied them over for VLX, but did not create similar patterns for 512-bit sources. So I'm hoping the backend can't actually produce these cases.

llvm-svn: 275240
2016-07-13 02:21:25 +00:00
Keno Fischer 1efc3b70c5 Fix ScalarEvolutionExpander step scaling bug
The expandAddRecExprLiterally function incorrectly transforms
`[Start + Step * X]` into `Step * [Start + X]` instead of the correct
transform of `[Step * X] + Start`.

This caused https://github.com/JuliaLang/julia/issues/14704#issuecomment-174126219
due to what appeared to be sufficiently complicated loop interactions.

Patch by Jameson Nash (jameson@juliacomputing.com).

Reviewers: sanjoy
Differential Revision: http://reviews.llvm.org/D16505

llvm-svn: 275239
2016-07-13 01:28:12 +00:00
Teresa Johnson 835df56cb3 Remove another unused variable from r275216
Remove another variable added in r275216 that was only used in debug
mode.

llvm-svn: 275238
2016-07-12 23:49:17 +00:00
Michael Kuperstein 51078b81ca [LV] Do not invalidate use-lists we're iterating over.
Should make sanitizers happier.

llvm-svn: 275230
2016-07-12 23:11:34 +00:00
Dehao Chen f400a099a4 Add missing files for r275222
New pass manager for LICM.

Summary: Port LICM to the new pass manager.

Reviewers: davidxl, silvas

Subscribers: krasin, vitalybuka, silvas, davide, sanjoy, llvm-commits, mehdi_amini

Differential Revision: http://reviews.llvm.org/D21772

llvm-svn: 275224
2016-07-12 22:42:24 +00:00
Dehao Chen 9cba1f4e7e New pass manager for LICM.
Summary: Port LICM to the new pass manager.

Reviewers: davidxl, silvas

Subscribers: krasin, vitalybuka, silvas, davide, sanjoy, llvm-commits, mehdi_amini

Differential Revision: http://reviews.llvm.org/D21772

llvm-svn: 275222
2016-07-12 22:37:48 +00:00
Tim Northover 72eebfa4b0 GlobalISel: freeze reserved regs after IRTranslator.
We can freeze the registers after the MachineFrameInfo has been configured (by
telling it about calls, inline asm, ...). This doesn't happen at all yet, but
will be part of IR translation.

Fixes -verify-machineinstrs assertion.

llvm-svn: 275221
2016-07-12 22:23:42 +00:00
Matt Arsenault 786724a22e AMDGPU: Follow up to r275203
I meant to squash this into it.

llvm-svn: 275220
2016-07-12 21:41:32 +00:00
Teresa Johnson 8950ad12ad Remove unused variable to fix bot failure from r275216
Remove unused variable added in r275216. Should fix bot failure:
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/24665

llvm-svn: 275219
2016-07-12 21:29:05 +00:00
Nemanja Ivanovic f0407e3902 The test case I added is PowerPC specific but I accidentally
had it in the wrong directory. Moved it to CodeGen/PowerPC.

Sorry about the noise.

llvm-svn: 275218
2016-07-12 21:24:08 +00:00
Michael Kuperstein a99c46cc73 [LV] Remove wrong assumption about LCSSA
The LCSSA pass itself will not generate several redundant PHI nodes in a single
exit block. However, such redundant PHI nodes don't violate LCSSA form, and may
be introduced by passes that preserve LCSSA, and/or preserved by the LCSSA pass
itself. So, assuming a single PHI node per exit block is not safe.

llvm-svn: 275217
2016-07-12 21:24:06 +00:00
Teresa Johnson 1e44b5d3ab Refactor indirect call promotion profitability analysis (NFC)
Summary:
Refactored the profitability analysis out of the IC promotion pass and
into lib/Analysis so that it can be accessed by the summary index
builder in a follow-on patch to enable IC promotion in ThinLTO (D21932).

Reviewers: davidxl, xur

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22182

llvm-svn: 275216
2016-07-12 21:13:44 +00:00
Nemanja Ivanovic b43bb6141e [Power9] Add codegen for VSX word insert/extract instructions
This patch corresponds to review:
http://reviews.llvm.org/D20239

It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that
are useful in some cases for inserting and extracting vector elements of
v4[if]32 vectors.

llvm-svn: 275215
2016-07-12 21:00:10 +00:00
Piotr Padlewski fa0cdb371b Review fixes to lit documentation
Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22245

llvm-svn: 275214
2016-07-12 20:59:17 +00:00
David Majnemer 8b401013c1 [LoopAccessAnalysis] Some minor cleanups
Use range-base for loops.
Use auto when appropriate.

No functional change is intended.

llvm-svn: 275213
2016-07-12 20:31:46 +00:00
Simon Pilgrim 6fa71da4a4 [X86][AVX] Add support for target shuffle combining to VPERM2F128/VPERM2I128
llvm-svn: 275212
2016-07-12 20:27:32 +00:00
Davide Italiano 0080269342 [SCCP] Constant fold structs if all the lattice value are constant.
Differential Revision:   http://reviews.llvm.org/D22269

llvm-svn: 275208
2016-07-12 19:54:19 +00:00
David Majnemer 9330b78431 [LoopVectorize] Assorted cleanups
Use range-based for loops instead of doing everything manually.
Use auto when appropriate.

No functional change is intended.

llvm-svn: 275205
2016-07-12 19:35:15 +00:00
Matthias Braun 96ec47db74 X86FixupBWInsts: No need for forward liveness analysis.
With r274952 and r275201 in place there are no cases left where a
forward liveness analysis yields different results than a backward one.
So we can remove the forward stepping logic.

Differential Revision: http://reviews.llvm.org/D22083

llvm-svn: 275204
2016-07-12 19:04:30 +00:00
Matt Arsenault 657f871a4e AMDGPU: Fix verifier error with kill intrinsic
Don't create a terminator in the middle of the block.
We should probably get rid of this intrinsic.

llvm-svn: 275203
2016-07-12 19:01:23 +00:00
Dehao Chen b9f8e29290 [PM] Port LoopIdiomRecognize Pass to new PM
Summary: Port LoopIdiomRecognize Pass to new PM

Reviewers: davidxl

Subscribers: davide, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D22250

llvm-svn: 275202
2016-07-12 18:45:51 +00:00
Matthias Braun aeab09fb8f BranchFolding: Use LivePhysReg to update live in lists.
Use LivePhysRegs with a backwards walking algorithm to update live in
lists, this way the results do not depend on the presence of kill flags
anymore.

This patch also reduces the number of registers added as live-in.
Previously all pristine registers as well as all sub registers of a
super register were added resulting in unnecessarily large live in
lists. This fixed https://llvm.org/PR25263.

Differential Revision: http://reviews.llvm.org/D22027

llvm-svn: 275201
2016-07-12 18:44:33 +00:00
Matt Arsenault 10531d1020 AMDGPU: Set isConvergent on v_cmpx* instructions
No test since these aren't used now, except for one place
in a pre-emit pass.

llvm-svn: 275200
2016-07-12 18:41:03 +00:00
Wei Ding 5b2636a152 AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8
Differential Revision: http://reviews.llvm.org/D22239

llvm-svn: 275197
2016-07-12 18:02:14 +00:00
Krzysztof Parzyszek 98c0f482d6 Fix printing of debugging information in LiveIntervals::shrinkToUses
Print VNI->def before calling VNI->markUnused(), since markUnused makes
the def invalid.

llvm-svn: 275196
2016-07-12 17:55:28 +00:00
Krzysztof Parzyszek f5b9bb61f7 Add print/dump routines to LiveInterval::SubRange
llvm-svn: 275194
2016-07-12 17:37:44 +00:00
Xinliang David Li 9eb472ba4b [PGO] Don't include full file path in static function profile counter names
Patch by Jake VanAdrighem
Differential Revision: http://reviews.llvm.org/D22028

llvm-svn: 275193
2016-07-12 17:14:51 +00:00
Sanjay Patel 4a6a751dce add tests for missing DeMorgan's Law folds
llvm-svn: 275192
2016-07-12 17:05:04 +00:00
Sanjay Patel 3900191ecc auto-generate checks
llvm-svn: 275188
2016-07-12 16:21:55 +00:00
Sanjay Patel 93dffe629a auto-generate checks
llvm-svn: 275187
2016-07-12 16:17:30 +00:00
Sanjay Patel 6d1f227e6b auto-generate checks
llvm-svn: 275186
2016-07-12 16:13:04 +00:00
Nirav Dave 2d84ec67f8 [MC] Flip llc's assembly comment preservation flag to have consistent
orientation with llvm-mc.

llvm-svn: 275179
2016-07-12 15:32:36 +00:00
Haicheng Wu 711ca868fc [AArch64] Set FMOVS0 and FMOVD0 as isAsCheapAsAMove when needed.
If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now
only Kryo has both), set FMOVS0 and FMOVD0 isAsCheapAsAMove.

Differential Revision: http://reviews.llvm.org/D22256

llvm-svn: 275178
2016-07-12 15:31:41 +00:00
Nemanja Ivanovic eebbcb6d57 [PowerPC] Cannonicalize applicable vector shift immediates as swaps
This patch corresponds to review:
http://reviews.llvm.org/D21358

Vector shifts that have the same semantics as a vector swap are cannonicalized
as such to provide additional opportunities for swap removal optimization to
remove unnecessary swaps.

llvm-svn: 275168
2016-07-12 12:16:27 +00:00
Amjad Aboud acee568545 [codeview] Improved array type support.
Added support for:
1. Multi dimension array.
2. Array of structure type, which previously was declared incompletely.
3. Dynamic size array.
4. Array where element type is a typedef, volatile or constant (this should resolve PR28311).

Differential Revision: http://reviews.llvm.org/D21526

llvm-svn: 275167
2016-07-12 12:06:34 +00:00
Nicolai Haehnle 7968c34586 AMDGPU: Unify MOVRELSOffset and MOVRELDOffset
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it mislead
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22217

llvm-svn: 275160
2016-07-12 08:12:16 +00:00
Vitaly Buka 204dc533c5 Revert "New pass manager for LICM."
Summary: This reverts commit r275118.

Subscribers: sanjoy, mehdi_amini

Differential Revision: http://reviews.llvm.org/D22259

llvm-svn: 275156
2016-07-12 06:25:32 +00:00
Craig Topper a6e6febe2c [AVX512] Remove masked logic op intrinsics and autoupgrade them to native IR.
llvm-svn: 275155
2016-07-12 05:27:53 +00:00
Rui Ueyama dbdfe62c3f Dump enum unique names.
llvm-svn: 275152
2016-07-12 03:33:48 +00:00
Rui Ueyama ef5ec2da4a Re-enable TPI hash verification for enum records.
We didn't read unique names correctly. As a result, we computed
hashes on (non-)unique names instead of unique names.

llvm-svn: 275150
2016-07-12 03:25:03 +00:00
Duncan P. N. Exon Smith 7b4c18e8f3 X86: Avoid implicit iterator conversions, NFC
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr*, mainly by preferring MachineInstr& over MachineInstr* and
using range-based for loops.

llvm-svn: 275149
2016-07-12 03:18:50 +00:00
NAKAMURA Takumi 99933f1b51 Fix libdeps in r275125. LTO tools require BitReader.
llvm-svn: 275148
2016-07-12 03:01:22 +00:00
Ivan Krasin 5474645dc8 Print remarks from WholeProgramDevirt pass for each call site.
Summary:
It's useful to have some visibility about which call sites are devirtualized,
especially for debug purposes. Another use case is a regression test on the
application side (like, Chromium).

Reviewers: pcc

Differential Revision: http://reviews.llvm.org/D22252

llvm-svn: 275145
2016-07-12 02:38:37 +00:00
NAKAMURA Takumi e92e2124f6 llvm/test/CodeGen/AMDGPU/selected-stack-object.ll REQUIRES +Asserts, since it expects assertion failure.
llvm-svn: 275144
2016-07-12 02:18:09 +00:00
Haicheng Wu 1e39574e9f [Kryo] Enable ZCZeroing feature
This feature uses immediate #0 to zero a register.

Differential Revision: http://reviews.llvm.org/D19985

llvm-svn: 275143
2016-07-12 02:04:01 +00:00
Duncan P. N. Exon Smith 98226e3d93 Hexagon: Avoid implicit iterator conversions, NFC
Avoid implicit iterator conversions from MachineInstrBundleIterator to
MachineInstr* in the Hexagon backend, mostly by preferring MachineInstr&
over MachineInstr* and switching to range-based for loops.

There's a long tail of API cleanup here, but I'm planning to leave the
rest to the Hexagon maintainers.  HexagonInstrInfo defines many of its
own predicates, and most of them still take MachineInstr*.  Some of
those actually check for nullptr, so I didn't feel comfortable changing
them to MachineInstr& en masse.

llvm-svn: 275142
2016-07-12 01:55:32 +00:00
Duncan P. N. Exon Smith fdd30c620d Mips: Avoid implicit iterator conversions, NFC
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the Mips backend, mainly by preferring MachineInstr&
over MachineInstr* when a pointer isn't nullable and using range-based
for loops.

llvm-svn: 275141
2016-07-12 01:47:02 +00:00
Craig Topper 46b34fe315 [X86,IR] Remove unnecessary or unused LLVMContext parameter from some of the X86 intrinsic upgrade functions.
llvm-svn: 275138
2016-07-12 01:42:33 +00:00
Duncan P. N. Exon Smith 4565ec0c1d SystemZ: Avoid implicit iterator conversions, NFC
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the SystemZ backend, mainly by preferring MachineInstr&
over MachineInstr* and using range-based for loops.

llvm-svn: 275137
2016-07-12 01:39:01 +00:00
Nico Weber c7bf646a99 Teach FastISel about thiscall (and, hence, about callee-pop).
http://reviews.llvm.org/D22115

llvm-svn: 275135
2016-07-12 01:30:35 +00:00
Matt Arsenault fc7e6a0a0e AMDGPU: Cleanup pseudoinstructions
llvm-svn: 275133
2016-07-12 00:23:17 +00:00
Matt Arsenault 840593e19d AMDGPU: Fix missing scc def on control flow pseudos
These are all expanded to instructions that include an scc def.

llvm-svn: 275132
2016-07-12 00:08:14 +00:00
Matt Arsenault e3742466b9 AMDGPU: Enable trackLivenessAfterRegAlloc
This has caught a number of bugs.

llvm-svn: 275131
2016-07-11 23:56:30 +00:00
Mehdi Amini 0da268d13a Do not use bool in C header lto.h, use lto_bool_t instead
llvm-svn: 275130
2016-07-11 23:55:01 +00:00
Matt Arsenault 45f8216cee AMDGPU: Remove superfluous string attributes from tests
Also fix v_mac.ll not testing right thing for fneg

llvm-svn: 275129
2016-07-11 23:35:48 +00:00
George Burgess IV 1cbd039234 Attempt to make buildbots happy.
Woohoo, unused variable warnings in builds without asserts (as a result
of r275122).

llvm-svn: 275126
2016-07-11 23:18:32 +00:00
Mehdi Amini e75aa6f674 Add a libLTO API to query a memory buffer and check if it contains ObjC categories
The linker supports a feature to force load an object from a static
archive if it defines an Objective-C category.
This API supports this feature by looking at every section in the
module to find if a category is defined in the module.

llvm-svn: 275125
2016-07-11 23:10:18 +00:00
George Burgess IV de1be7171a [CFLAA] Simplify CFLGraphBuilder. NFC.
This patch simplifies the graph builder by encoding nodes as {Value,
Dereference Level} pairs. This lets us kill edge types, and allows us to
get rid of hacks in StratifiedSets (like addAttrsBelow/...). This
simplification also allows us to remove InstantiatedRelations and
InstantiatedAttrs.

Patch by Jia Chen.

Differential Revision: http://reviews.llvm.org/D22080

llvm-svn: 275122
2016-07-11 22:59:09 +00:00
Dehao Chen 7ef5820fa3 New pass manager for LICM.
Summary: Port LICM to the new pass manager.

Reviewers: davidxl, silvas

Subscribers: silvas, davide, sanjoy, llvm-commits, mehdi_amini

Differential Revision: http://reviews.llvm.org/D21772

llvm-svn: 275118
2016-07-11 22:45:24 +00:00
Alina Sbirlea cbc6ac2afd Correct ordering of loads/stores.
Summary:
Aiming to correct the ordering of loads/stores. This patch changes the
insert point for loads to the position of the first load.
It updates the ordering method for loads to insert before, rather than after.

Before this patch the following sequence:
"load a[1], store a[1], store a[0], load a[2]"
Would incorrectly vectorize to "store a[0,1], load a[1,2]".
The correctness check was assuming the insertion point for loads is at
the position of the first load, when in practice it was at the last
load. An alternative fix would have been to invert the correctness check.
The current fix changes insert position but also requires reordering of
instructions before the vectorized load.

Updated testcases to reflect the changes.

Reviewers: tstellarAMD, llvm-commits, jlebar, arsenm

Subscribers: mzolotukhin

Differential Revision: http://reviews.llvm.org/D22071

llvm-svn: 275117
2016-07-11 22:34:29 +00:00
Tim Northover 3e0361710a ARM: validate immediate branch targets in AsmParser.
Immediate branch targets aren't commonly used, but if they are we should make
sure they can actually be encoded. This means they must be divisible by 2 when
targeting Thumb mode, and by 4 when targeting ARM mode.

Also do a little naming cleanup while I was changing everything around anyway.

llvm-svn: 275116
2016-07-11 22:29:37 +00:00
Nicolai Haehnle c06bfa1daa AMDGPU: Treat texture gather instructions more like other MIMG instructions
Summary:
Setting MIMG to 0 has a bunch of unexpected side effects, including that
isVMEM returns false which leads to incorrect treatment in the hazard
recognizer. The reason I noticed it is that it also leads to incorrect
treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.

The only reason why MIMG was set to 0 is to signal the special handling of
dmasks, but that can be checked differently.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22210

llvm-svn: 275113
2016-07-11 21:59:43 +00:00
Zachary Turner dbeaea7b35 Refactor the PDB writing to use a builder approach
llvm-svn: 275110
2016-07-11 21:45:26 +00:00
Zachary Turner f6b9382467 [pdb] Add a pdb2yaml option to not dump file headers.
This will be useful once we start adding the ability to dump type
records and symbol records, since it will allow us to generate
mergeable information instead of information that specifies an
entire file.

llvm-svn: 275109
2016-07-11 21:45:09 +00:00
Nicolai Haehnle f52c3cf272 AMDGPU: fix local stack slot allocation bugs
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.

The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21551

llvm-svn: 275108
2016-07-11 21:44:40 +00:00
Michael Kuperstein f0c59330e9 [X86] Make some cast costs more precise
Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).

Differential Revision: http://reviews.llvm.org/D22064

llvm-svn: 275106
2016-07-11 21:39:44 +00:00
Kyle Butt 83a25792c5 Codegen: Fix comment in BranchFolding.cpp
Blocks to be tail-merged may share more than one successor. Correct the
comment to state that they share a specific successor, SuccBB, rather
than a single successor, which is not true.

llvm-svn: 275104
2016-07-11 21:37:03 +00:00
Quentin Colombet fb82c7bc94 [X86] Fix tailcall return address clobber bug.
This bug (llvm.org/PR28124) was introduced by r237977, which refactored
the tail call  sequence to be generated in two passes instead of one.

Unfortunately, the stack adjustment produced by the first pass was not
recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing
code such as the following, which clobbers the return address, to be
generated:

popl    %edi
popl    %edi
pushl   %eax
jmp     tailcallee              # TAILCALL

To fix the problem, the entire stack adjustment is performed in
X86ExpandPseudo::ExpandMI() for tail calls.

Patch by Magnus Lång <margnus1@gmail.com>

Differential Revision: http://reviews.llvm.org/D21325

llvm-svn: 275103
2016-07-11 21:03:03 +00:00
Sanjay Patel bb7d87ee25 fix documentation comments; NFC
llvm-svn: 275101
2016-07-11 20:50:39 +00:00
Alina Sbirlea 327955e057 Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer
Summary: Extend TTI to access TLI.allowsMisalignedMemoryAccesses(). Check condition when vectorizing load and store chains.
Add additional parameters: AddressSpace, Alignment, Fast.

Reviewers: llvm-commits, jlebar

Subscribers: arsenm, mzolotukhin

Differential Revision: http://reviews.llvm.org/D21935

llvm-svn: 275100
2016-07-11 20:46:17 +00:00
Michael Kuperstein cfbac5f361 [X86] Disable FixupSetCC for CodeGenOpt::None
It is an optimization pass, and should not run at -O0. Especially since Fast RA
will not do the required register coalescing anyway, so it's a loss even from
the optimization standpoint.

This also works around (but doesn't quite fix) PR28489.

llvm-svn: 275099
2016-07-11 20:40:44 +00:00
Chad Rosier 4f0dad1674 [IPRA] Properly compute register usage at call sites.
Differential Revision: http://reviews.llvm.org/D21395
Patch by Vivek Pandya.
PR28144

llvm-svn: 275087
2016-07-11 18:45:49 +00:00
Zhan Jun Liau def708a0f9 [SystemZ] Recognize Load On Condition Immediate (LOCHI/LOGHI) opportunities
Summary: Add support for the z13 instructions LOCHI and LOCGHI which
conditionally load immediate values.  Add target instruction info hooks so
that if conversion will allow predication of LHI/LGHI.

Author: RolandF

Reviewers: uweigand

Subscribers: zhanjunl

Commiting on behalf of Roland.

Differential Revision: http://reviews.llvm.org/D22117

llvm-svn: 275086
2016-07-11 18:45:03 +00:00
Davide Italiano 63c4ce8e1b [SCCP] Try to follow the DRY principle, use `OpSt`.
Thanks to Eli Friedman for pointing out in his post-commit review!

llvm-svn: 275084
2016-07-11 18:21:29 +00:00
Jingyue Wu 641cfee976 [SLSR] Call getPointerSizeInBits with the correct address space.
llvm-svn: 275083
2016-07-11 18:13:28 +00:00
Davide Italiano e8ae0b5eb4 [PM/IPO] Port LowerTypeTests to the new PassManager.
There's a little bit of churn in this patch because the initialization
mechanism is now shared between the old and the new PM. Other than
that, it's just a pretty mechanical translation.

llvm-svn: 275082
2016-07-11 18:10:06 +00:00
Jacques Pienaar c3a162c451 [lanai] Add more tests for assembly of conditional ALU ops
llvm-svn: 275081
2016-07-11 17:58:16 +00:00
Dehao Chen 71021cdf47 Fix the assertion failure caused by http://reviews.llvm.org/D22118
Summary: http://reviews.llvm.org/D22118 uses metadata to store the call count, which makes it possible to have branch weight to have only one elements. Also fix the assertion failure in inliner when checking the instruction type to include "invoke" instruction.

Reviewers: mkuper, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22228

llvm-svn: 275079
2016-07-11 17:36:02 +00:00
David Majnemer dd8f6bdd46 [IR] Stop a -Wsign-compare warning from firing
llvm-svn: 275077
2016-07-11 17:09:06 +00:00
Davide Italiano 12a115683b [LowerTypeTests] Don't rely on doInitialization().
In preparation for porting this pass to the new PM (which has no
doInitialization()).

Differential Revision:  http://reviews.llvm.org/D22223

llvm-svn: 275074
2016-07-11 17:00:31 +00:00
Dehao Chen 9232f98279 Implement callsite-hotness based inline cost for Sample-based PGO
Summary:
For sample-based PGO, using BFI to calculate callsite count is sometime not accurate. This is because with sampling based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch.

E.g.

if (A1 && A2 && A3 && ..... && A10) {
  for (i=0; i < 100000000; i++) {
    callsite();
  }
}

Assume that A1 to A100 are all 100% taken, and callsite has 1000 samples and thus is considerred hot. Because the loop's trip count is huge, it's normal that all branches outside the loop has no sample at all. As a result, we can only use static branch probability to derive the the frequency of the loop header. Assuming that static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value.

In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.

Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR.

Reviewers: davidxl, eraman, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22118

llvm-svn: 275073
2016-07-11 16:48:54 +00:00
Dehao Chen 29d2641f52 Tune the weight propagation algorithm for sample profile.
Summary: Handle the case when there is only one incoming/outgoing edge for a visited basic block: use the block weight to adjust edge weight even when the edge has been visited before. This can help reduce inaccuracies introduced by incorrect basic block profile, as shown in the updated unittest.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22180

llvm-svn: 275072
2016-07-11 16:40:17 +00:00
Sanjay Patel 8f1d408c74 [x86] make some of the tests 256-bit for testing diversity
llvm-svn: 275070
2016-07-11 15:08:37 +00:00
Nirav Dave 57033c6336 Add missing include from previous commit
llvm-svn: 275069
2016-07-11 14:32:57 +00:00
Nirav Dave 8603062ee4 Fix branch relaxation in 16-bit mode.
Thread through MCSubtargetInfo to relaxInstruction function allowing relaxation
to generate jumps with 16-bit sized immediates in 16-bit mode.

This fixes PR22097.

Reviewers: dwmw2, tstellarAMD, craig.topper, jyknight

Subscribers: jfb, arsenm, jyknight, llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D20830

llvm-svn: 275068
2016-07-11 14:23:53 +00:00
Sanjay Patel b428951990 [x86] specify triple to avoid bot failures
llvm-svn: 275067
2016-07-11 14:17:54 +00:00
Nicolai Haehnle 889a20cf40 [Sink] Don't move calls to readonly functions across stores
Summary:

Reviewers: hfinkel, majnemer, tstellarAMD, sunfish

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17279

llvm-svn: 275066
2016-07-11 14:11:51 +00:00
Nicolai Haehnle 9343f36f8e AliasAnalysis: unify getModRefInfo(I, CS) semantics with other overloads
This subtle change to getModRefInfo(Instruction, ImmutableCallSite) is to
ensure that the semantics are equal to that of getModRefInfo(CS1, CS2) when
the Instruction is a call-site.

This is now more in line with getModRefInfo generally: it returns Mod when
I modifies a memory location that is accessed (read or written) by CS and
Ref when I reads a memory location that is written by CS.

From a grep of the code, the only uses of this particular getModRefInfo
overload are in MemorySSA and MemCpyOptimizer, and they only care about
where the result is MR_NoModRef or not. Therefore, this change should have
no visible effect.

Separated out from D17279 upon request.

llvm-svn: 275065
2016-07-11 14:11:45 +00:00
Sanjay Patel 0d38830aca [x86] update checks
llvm-svn: 275064
2016-07-11 14:07:31 +00:00
Simon Pilgrim 832463eada [X86][SSE] Generalise target shuffle combine of shuffles using variable masks
At present the only shuffle with a variable mask we recognise is PSHUFB, which influences if its worth the cost of mask creation/loading of a combined target shuffle with a variable mask. This change sets up the infrastructure to support other shuffles in the future but has no effect yet.

llvm-svn: 275059
2016-07-11 12:49:35 +00:00
Nirav Dave 53a72f4d3c Provide support for preserving assembly comments
Preserve assembly comments from input in output assembly and flags to
toggle property. This is on by default for inline assembly and off in
llvm-mc.

Parsed comments are emitted immediately before an EOL which generally
places them on the expected line.

Reviewers: rtrieu, dwmw2, rnk, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20020

llvm-svn: 275058
2016-07-11 12:42:14 +00:00
Artem Tamazov 53c9de08d2 [AMDGPU][llvm-mc] Quickfix for r272748 to enable labels in branch instructions.
Fixes issue mentioned at:
  https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra/issues/13.
Lit tests added.

Differential Revision: http://reviews.llvm.org/D22133

llvm-svn: 275054
2016-07-11 12:07:18 +00:00
Zlatko Buljan cba9f80ba8 [mips][microMIPS] Implement LDC1, SDC1, LDC2, SDC2, LWC1, SWC1, LWC2 and SWC2 instructions and add CodeGen support
Differential Revision: http://reviews.llvm.org/D18824

llvm-svn: 275050
2016-07-11 07:41:56 +00:00
Elena Demikhovsky d84f337953 AVX-512: DAG lowering for scalar MIN/MAX commutable ops
DAG lowering was missing for the scalar FMINC, FMAXC nodes.
The nodes are generated only in the "unsafe-fp-math" mode.
Added tests.

llvm-svn: 275048
2016-07-11 06:08:06 +00:00
Craig Topper 7ee070e7bc [AVX512] Add support for 512-bit ANDN now that all ones build vectors survive long enough to allow the matching.
llvm-svn: 275046
2016-07-11 05:36:53 +00:00
Craig Topper 516e14cd8e [AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors.
llvm-svn: 275045
2016-07-11 05:36:48 +00:00
Craig Topper 8674849d6e [X86] Add the AVX512 SET0 pseudos to foldMemoryOperandImpl since they are marked for CanFoldAsLoad.
I don't really know how to test this.

llvm-svn: 275044
2016-07-11 05:36:41 +00:00
Hal Finkel 02012bcfee Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute
Reverting r275027 and r275033. These seem to cause miscompiles on the AArch64 buildbot.

llvm-svn: 275042
2016-07-11 04:51:23 +00:00
Daniel Berlin e64985fc94 Allow BasicBlockEdge to be used in DenseMap
Summary: Add a DenseMapInfo specialization for BasicBlockEdge

Reviewers: hfinkel, chandlerc, majnemer

Differential Revision: http://reviews.llvm.org/D22207

llvm-svn: 275041
2016-07-11 04:37:53 +00:00
Hal Finkel 2cac58f604 Pointer-comparison folding should look through returned-argument functions
For functions which are known to return a specific argument, pointer-comparison
folding can look through the function calls as part of its analysis.

Differential Revision: http://reviews.llvm.org/D9387

llvm-svn: 275039
2016-07-11 03:37:59 +00:00
Hal Finkel bf3957a553 Teach isDereferenceablePointer to look through returned-argument functions
For functions which are known to return their argument,
isDereferenceableAndAlignedPointer can examine the argument value.

Differential Revision: http://reviews.llvm.org/D9384

llvm-svn: 275038
2016-07-11 03:08:49 +00:00
Hal Finkel e186debb8b Teach SCEV to look through returned-argument functions
When building SCEVs, if a function is known to return its argument, then we can
build the SCEV using the corresponding argument value.

Differential Revision: http://reviews.llvm.org/D9381

llvm-svn: 275037
2016-07-11 02:48:23 +00:00
Hal Finkel 6fd5e1f02b Teach computeKnownBits to look through returned-argument functions
If a function is known to return one of its arguments, we can use that in order
to compute known bits of the return value.

Differential Revision: http://reviews.llvm.org/D9397

llvm-svn: 275036
2016-07-11 02:25:14 +00:00
Hal Finkel 5c12d8fe8f BasicAA should look through functions with returned arguments
Motivated by the work on the llvm.noalias intrinsic, teach BasicAA to look
through returned-argument functions when answering queries. This is essential
so that we don't loose all other AA information when supplementing with
llvm.noalias.

Differential Revision: http://reviews.llvm.org/D9383

llvm-svn: 275035
2016-07-11 01:32:20 +00:00
Hal Finkel 47646c0981 Add a 'Returned' intrinsic property corresponding to the 'returned' argument attribute
This will be used by the upcoming llvm.noalias intrinsic.

Differential Revision: http://reviews.llvm.org/D22201

llvm-svn: 275034
2016-07-11 01:28:42 +00:00
Hal Finkel ce881a41f9 Don't use a SmallSet for returned attribute inference
Suggested post-commit by David Majnemer on IRC (following-up on a pre-commit
review comment).

llvm-svn: 275033
2016-07-11 01:14:21 +00:00
Hal Finkel e87ad547ef Add getReturnedArgOperand to Call/InvokeInst, CallSite
In order to make the optimizer smarter about using the 'returned' argument
attribute (generally, but motivated by my llvm.noalias intrinsic work), add a
utility function to Call/InvokeInst, and CallSite, to make it easy to get the
returned call argument (when one exists).

P.S. There is already an unfortunate amount of code duplication between
CallInst and InvokeInst, and this adds to it. We should probably clean that up
separately.

Differential Revision: http://reviews.llvm.org/D22204

llvm-svn: 275031
2016-07-10 23:01:32 +00:00
Simon Pilgrim ee4a33ae46 [X86][SSE] Relax type assertions for matchVectorShuffleAsInsertPS
Calls to matchVectorShuffleAsInsertPS only need to ensure the inputs are 128-bit vectors. Only lowerVectorShuffleAsInsertPS needs to ensure that they are v4f32.

llvm-svn: 275028
2016-07-10 22:26:05 +00:00
Hal Finkel d66a7b05db Let FuncAttrs infer the 'returned' argument attribute
A function can have one argument with the 'returned' attribute, indicating that
the associated argument is always the return value of the function. Add
FuncAttrs inference logic.

Differential Revision: http://reviews.llvm.org/D22202

llvm-svn: 275027
2016-07-10 22:02:55 +00:00
Hal Finkel 3b66caa290 Update the LangRef description of the 'returned' attribute
The description of the 'returned' attribute says that it is only used when
code-generating the caller. I'd like to make the optimizer smarter about
looking through functions with returned arguments (generally, but motivated by
my llvm.noalias work). As David pointed out in the review of D22202, the
LangRef should be updated to make its expanded uses clearer.

Differential Revision: http://reviews.llvm.org/D22205

llvm-svn: 275026
2016-07-10 21:52:39 +00:00
Sanjay Patel fedc01ad76 [DAG] make isConstantSplatVector() available to the rest of lowering
llvm-svn: 275025
2016-07-10 21:27:06 +00:00