Commit Graph

109919 Commits

Author SHA1 Message Date
Eugene Leviant 453c976a63 [ThinLTO] Implement summary visualizer
Differential revision: https://reviews.llvm.org/D41297

llvm-svn: 323062
2018-01-21 07:27:32 +00:00
Lang Hames 5ff5a30bee [ORC] Add a lookupFlags method to VSO.
lookupFlags returns a SymbolFlagsMap for the requested symbols, along with a
set containing the SymbolStringPtr for any symbol not found in the VSO.

The JITSymbolFlags for each symbol will have been stripped of its transient
JIT-state flags (i.e. NotMaterialized, Materializing).

Calling lookupFlags does not trigger symbol materialization.

llvm-svn: 323060
2018-01-21 03:20:39 +00:00
Lang Hames 86edf53664 [ORC] More cleanup. NFC.
llvm-svn: 323059
2018-01-21 03:20:36 +00:00
Lang Hames 0d7d442699 [ORC] Cleanup. NFC.
llvm-svn: 323057
2018-01-21 02:24:45 +00:00
Philip Reames f57714c3c7 [DSE] Factor out common code [NFC]
We already had the pointer being stored to in the MemLoc, reuse that code.  In merging cases, it turned out the interface of the getLocForWrite had become inconsitent with other related utilities.  Fix that by making sure the input passes hasAnalyzableWrite as well.

llvm-svn: 323056
2018-01-21 02:10:54 +00:00
Philip Reames 424e7a1174 [DSE] Minor rename for clarity sake [NFC]
llvm-svn: 323055
2018-01-21 01:44:33 +00:00
Craig Topper 7fddf2bfef [X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations
Summary:
This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx.

We still need a follow patch to stop moving ands across srl if the and could be represented as a movzx before the shift but not after. I think this should help with some of the cases that D42088 ended up removing during isel.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42265

llvm-svn: 323048
2018-01-20 18:50:09 +00:00
Simon Pilgrim 89540d9665 [X86][SSE] Check for out of bounds PEXTR/PINSR indices during faux shuffle combining.
llvm-svn: 323045
2018-01-20 17:16:01 +00:00
Jonas Paulsson 7ad28863fb [SelectionDAG] Fix codegen of vector stores with non byte-sized elements.
This was completely broken, but hopefully fixed by this patch.

In cases where it is needed, a vector with non byte-sized elements is stored
by extracting, zero-extending, shift:ing and or:ing the elements into an
integer of the same width as the vector, which is then stored.

Review: Eli Friedman, Ulrich Weigand
https://reviews.llvm.org/D42100#inline-369520
https://bugs.llvm.org/show_bug.cgi?id=35520

llvm-svn: 323042
2018-01-20 16:05:10 +00:00
Martin Storsjo f641d0d4f2 [COFF] Keep the underscore on exported decorated stdcall functions in MSVC mode
This (together with the corresponding LLD commit, that contains the
testcase updates) fixes PR35733.

Differential Revision: https://reviews.llvm.org/D41631

llvm-svn: 323035
2018-01-20 11:44:32 +00:00
Saleem Abdulrasool 99f479abcf CodeGen: handle llvm.used properly for COFF
`llvm.used` contains a list of pointers to named values which the
compiler, assembler, and linker are required to treat as if there is a
reference that they cannot see.  Ensure that the symbols are preserved
by adding an explicit `-include` reference to the linker command.

llvm-svn: 323017
2018-01-20 00:28:02 +00:00
Craig Topper 08bd14803c [X86] Teach X86 codegen to use vector width preference to avoid promoting to 512-bit types when VLX is enabled and the preference is for a smaller size.
This change applies to places where we would turn 128/256-bit code into 512-bit in order to get a wider element type through sext/zext. Any 512-bit types that already existed in the IR/DAG will be left that way.

The width preference has no effect on codegen behavior when the target does not have AVX512 enabled. So AVX/AVX2 codegen cannot be limited via this mechanism yet.

If the preference is lower than 256 we may still use a 256 bit type to do the operation. Constraining to 128 bits makes it much more difficult to support some operations. For many of these cases we need to change element width while keeping element count constant which is easiest done by switching between 256 and 128 bit.

The preference is only obeyed when AVX512 and VLX are available. This means the preference is not obeyed for KNL, but is obeyed for SKX, Cannonlake, and Icelake. For KNL, the only way to do masked operation is on 512-bit registers so we would have to completely disable masking to obey the preference. We would also lose support for gather, scatter, ctlz, vXi64 multiplies, etc. This may change in the future, but this simplifies the initial implementation.

Differential Revision: https://reviews.llvm.org/D41895

llvm-svn: 323016
2018-01-20 00:26:12 +00:00
Craig Topper 0d797a34d8 [X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface.
This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for.

I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10.

This has been split from D41895 with the remainder in a follow up commit.

llvm-svn: 323015
2018-01-20 00:26:08 +00:00
Derek Schuff a83a665cd4 [WebAssembly] Fix MSVC build
nullptr_t can't be used left of boolean &&

llvm-svn: 323012
2018-01-20 00:01:18 +00:00
Akira Hatanaka 73ceb50d85 [ObjCARC] Do not turn a call to @objc_autoreleaseReturnValue into a call
to @objc_autorelease if its operand is a PHI and the PHI has an
equivalent value that is used by a return instruction.

For example, ARC optimizer shouldn't replace the call in the following
example, as doing so breaks the AutoreleaseRV/RetainRV optimization:

  %v1 = bitcast i32* %v0 to i8*
  br label %bb3
bb2:
  %v3 = bitcast i32* %v2 to i8*
  br label %bb3
bb3:
  %p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ]
  %retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p
  %v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p)
  ret i32* %retval

Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's
operand uses with its value so that the call gets tail-called.

rdar://problem/15894705

llvm-svn: 323009
2018-01-19 23:51:13 +00:00
Rui Ueyama 20b49240b8 Fix -Wunused-variable.
llvm-svn: 323004
2018-01-19 22:56:04 +00:00
Lang Hames b72f48452c [ORC] Re-apply r322913 with a fix for a read-after-free error.
ExternalSymbolMap now stores the string key (rather than using a StringRef),
as the object file backing the key may be removed at any time.

llvm-svn: 323001
2018-01-19 22:24:13 +00:00
Jessica Paquette a499c3c29d Add optional DICompileUnit to DIBuilder + make outliner debug info use it
Previously, the DIBuilder didn't expose functionality to set its compile unit
in any other way than calling createCompileUnit. This meant that the outliner,
which creates new functions, had to create a new compile unit for its debug
info.

This commit adds an optional parameter in the DIBuilder's constructor which
lets you set its CU at construction.

It also changes the MachineOutliner so that it keeps track of the DISubprograms
for each outlined sequence. If debugging information is requested, then it
uses one of the outlined sequence's DISubprograms to grab a CU. It then uses
that CU to construct the DISubprogram for the new outlined function.

The test has also been updated to reflect this change.

See https://reviews.llvm.org/D42254 for more information. Also see the e-mail
discussion on D42254 in llvm-commits for more context.

llvm-svn: 322992
2018-01-19 21:21:49 +00:00
Ulrich Weigand 426f6bef44 [SystemZ] Prefer LOCHI over generating IPM sequences
On current machines we have load-on-condition instructions that can be
used to directly implement the SETCC semantics.  If we have those, it is
always preferable to use them instead of generating the IPM sequence.

llvm-svn: 322989
2018-01-19 20:56:04 +00:00
Ulrich Weigand 31112895d9 [SystemZ] Directly use CC result of compare-and-swap
In order to implement a test whether a compare-and-swap succeeded, the
SystemZ back-end currently emits a rather inefficient sequence of first
converting the CC result into an integer, and then testing that integer
against zero.  This commit changes the back-end to simply directly test
the CC value set by the compare-and-swap instruction.

llvm-svn: 322988
2018-01-19 20:54:18 +00:00
Ulrich Weigand 849a59fd4b [SystemZ] Rework IPM sequence generation
The SystemZ back-end uses a sequence of IPM followed by arithmetic
operations to implement the SETCC primitive.  This is currently done
early during SelectionDAG.  This patch moves generating those sequences
to much later in SelectionDAG (during PreprocessISelDAG).

This doesn't change much in generated code by itself, but it allows
further enhancements that will be checked-in as follow-on commits.

llvm-svn: 322987
2018-01-19 20:52:04 +00:00
Ulrich Weigand 9eb858c92f [SystemZ] Implement computeKnownBitsForTargetNode
This provides a computeKnownBits implementation for SystemZ target
nodes.  Currently only SystemZISD::SELECT_CCMASK is supported.

llvm-svn: 322986
2018-01-19 20:49:05 +00:00
Ulrich Weigand 5605be9e50 [SelectionDAG] Teach computeKnownBits about ATOMIC_CMP_SWAP_WITH_SUCCESS boolean return value
The second return value of ATOMIC_CMP_SWAP_WITH_SUCCESS is known to be a
boolean, and should therefore be treated by computeKnownBits just like
the second return values of SMULO / UMULO.

Differential Revision: https://reviews.llvm.org/D42067

llvm-svn: 322985
2018-01-19 20:47:14 +00:00
Sam Clegg 30e1bbc106 [WebAssembly] MC: Start table at offset 1 rather than 0
Summary:
For consistency with the output of lld.

This is useful in runnable binaries as can them be sure the
null function pointer will never be a valid argument
call_indirect.

Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D42284

llvm-svn: 322978
2018-01-19 18:57:01 +00:00
Joel Galenson dbc724f764 [ARM] Fix perf regression in compare optimization.
Fix a performance regression caused by r322737.

While trying to make it easier to replace compares with existing adds and
subtracts, I accidentally stopped it from doing so in some cases.  This should
fix that.  I'm also fixing another potential bug in that commit.

Differential Revision: https://reviews.llvm.org/D42263

llvm-svn: 322972
2018-01-19 17:46:27 +00:00
Derek Schuff bfb02aec5a [WebAssembly] Fix libcall signature lookup
RuntimeLibcallSignatures previously manually initialized all the libcall
names into an array and searched it linearly for the first match to lookup
the corresponding index.
r322802 switched that to initializing a map keyed by the libcall name.
Neither of these approaches works correctly because some libcall numbers use
the same name on different platforms (e.g. the "l" suffixed functions
use f80 or f128 or ppcf128).

This change fixes that by ensuring that each name only goes into the map
once. It also adds tests.

Differential Revision: https://reviews.llvm.org/D42271

llvm-svn: 322971
2018-01-19 17:45:54 +00:00
Dan Gohman 5d2b9354b1 [WebAssembly] Make sign-extension opcodes a distinct feature.
Sign-extension opcodes have been split into a separate proposal from
the main threads proposal, so switch them to their own target
feature. See:

https://github.com/WebAssembly/sign-extension-ops

llvm-svn: 322966
2018-01-19 17:16:24 +00:00
Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
Petr Hosek cc7a8f14bd Fallback option for colorized output when terminfo isn't available
Try to detect the terminal color support by checking the value of the
TERM environment variable. This is not great, but it's better than
nothing when terminfo library isn't available, which may still be the
case on some Linux distributions.

Differential Revision: https://reviews.llvm.org/D42055

llvm-svn: 322962
2018-01-19 17:10:55 +00:00
Carey Williams 22c49c6470 Test commit
llvm-svn: 322958
2018-01-19 16:55:23 +00:00
Sanjay Patel 74a1eef7c4 [x86] shrink 'and' immediate values by setting the high bits (PR35907)
Try to reverse the constant-shrinking that happens in SimplifyDemandedBits()
for 'and' masks when it results in a smaller sign-extended immediate.

We are also able to detect dead 'and' ops here (the mask is all ones). In
that case, we replace and return without selecting the 'and'.

Other targets might want to share some of this logic by enabling this under a
target hook, but I didn't see diffs for simple cases with PowerPC or AArch64,
so they may already have some specialized logic for this kind of thing or have
different needs.

This should solve PR35907:
https://bugs.llvm.org/show_bug.cgi?id=35907

Differential Revision: https://reviews.llvm.org/D42088

llvm-svn: 322957
2018-01-19 16:37:25 +00:00
Sanjay Patel 33cb84571f [InstSimplify] use m_Specific and commutative matcher to reduce code; NFCI
llvm-svn: 322955
2018-01-19 16:12:55 +00:00
Nirav Dave 72d32f24f5 [X86] Extend load-op-store fusion merge to ADC/SBB.
Summary: Add handling of EFLAG input to X86 Load-op-store fusion checking.

Reviewers: craig.topper, RKSimon

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D42128

llvm-svn: 322952
2018-01-19 15:37:57 +00:00
Sander de Smalen 909cf956a1 [AArch64][SVE] Asm: Add support for RDVL/ADDVL/ADDPL instructions
Reviewers: fhahn, rengolin, t.p.northover, echristo, olista01, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: SjoerdMeijer, aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41900

llvm-svn: 322951
2018-01-19 15:22:00 +00:00
Alexey Bataev fa80c47c6a [SLP] Fix vectorization for tree with trunc to minimum required bit width.
Summary:
If the vectorized tree has truncate to minimum required bit width and
the vector type of the cast operation after the truncation is the same
as the vector type of the cast operands, count cost of the vector cast
operation as 0, because this cast will be later removed.
Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner.

Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41948

llvm-svn: 322946
2018-01-19 14:40:13 +00:00
Dmitry Preobrazhensky 0e074e349d [AMDGPU][MC] Corrected parsing of image modifiers and encoding of image atomics
See bugs
    35962: https://bugs.llvm.org/show_bug.cgi?id=35962
    35963: https://bugs.llvm.org/show_bug.cgi?id=35963

Differential Revision: https://reviews.llvm.org/D42184

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322942
2018-01-19 13:49:53 +00:00
Francis Visoiu Mistrih 548add992e [CodeGen] Unify printing format of debug-location in both MIR and -debug
Use "debug-location" instead of "; dbg:" in MI::print.

llvm-svn: 322936
2018-01-19 11:44:42 +00:00
Hiroshi Inoue d24ddcd6c4 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 322934
2018-01-19 10:55:29 +00:00
Alina Sbirlea df26cf8117 [ModRefInfo] Return NoModRef for Must and NoModRef.
Summary:
In ModRefInfo "Must" was introduced to track presence of MustAlias, but we still want to return NoModRef when there is neither Mod or Ref, even when MustAlias is found. Patch has small fixes to ensure this happens.
Minor cleanup to remove nesting for 2 if statements when calling getModRefInfo for 2 ImmutableCallSites.

Reviewers: sanjoy

Subscribers: jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42209

llvm-svn: 322932
2018-01-19 10:26:40 +00:00
John Brawn 2867bd72c0 [InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr
Three (or more) operand getelementptrs could plausibly also be handled, but
handling only two-operand fits in easily with the existing BinaryOperator
handling.

Differential Revision: https://reviews.llvm.org/D39958

llvm-svn: 322930
2018-01-19 10:05:15 +00:00
Matthias Braun 4a7c8e7aa2 Split MachineLICM into EarlyMachineLICM and MachineLICM; NFC
This avoids playing games with pseudo pass IDs and avoids using an
unreliable MRI::isSSA() check to determine whether register allocation
has happened.

Note that this renames:
- MachineLICMID -> EarlyMachineLICM
- PostRAMachineLICMID -> MachineLICMID
to be consistent with the EarlyTailDuplicate/TailDuplicate naming.

llvm-svn: 322927
2018-01-19 06:46:10 +00:00
Matthias Braun 3ab9fcb98e Split TailDuplicatePass into pre- and post-RA variant; NFC
Split TailDuplicatePass into EarlyTailDuplicate and TailDuplicate. This
avoids playing games with fake pass IDs and using MRI::isSSA() to
determine pre-/post-RA state.

llvm-svn: 322926
2018-01-19 06:08:17 +00:00
Craig Topper f4cd9083ac [X86] Make better use of instregex for cmovcc/setcc/jcc instructions in the Intel scheduler models.
Combine all the separate condition codes into a singular expression when possible.

llvm-svn: 322924
2018-01-19 05:47:32 +00:00
Serguei Katkov 22bb1c0e17 Revert [CGP] Re-enable Select in complex addressing mode
One of buildbots failed. Revert for now till fix the issue.

llvm-svn: 322923
2018-01-19 04:52:39 +00:00
Matthias Braun 5c290dc206 AArch64: Fix emergency spillslot being out of reach for large callframes
Re-commit of r322200: The testcase shouldn't hit machineverifiers
anymore with r322917 in place.

Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.

This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
  after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
  callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
  pointer.
- Note that `useFPForScavengingIndex()` would previously return false
  when a base pointer was available leading to the emergency spillslot
  getting allocated late (that's the whole effect of this callback).
  Which made no sense to me so I took this case out: Even though the
  emergency spillslot is technically not referenced by FP in this case
  we still want it allocated early.

Differential Revision: https://reviews.llvm.org/D40876

llvm-svn: 322919
2018-01-19 03:16:36 +00:00
Matthias Braun dc4b3e87f4 AArch64: Omit callframe setup/destroy when not necessary
Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to
setup and the callframe size is 0.

- Fixes an invalid callframe nesting for byval arguments, which would
  look like this before this patch (as in `big-byval.ll`):
    ...
    ADJCALLSTACKDOWN 32768, 0, ...   # Setup for extfunc
    ...
    ADJCALLSTACKDOWN 0, 0, ...  # setup for memcpy
    ...
    BL &memcpy ...
    ADJCALLSTACKUP 0, 0, ...    # destroy for memcpy
    ...
    BL &extfunc
    ADJCALLSTACKUP 32768, 0, ...   # destroy for extfunc

- Saves us two instructions in the common case of zero-sized stackframes.
- Remove an unnecessary scheduling barrier (hence the small unittest
  changes).

Differential Revision: https://reviews.llvm.org/D42006

llvm-svn: 322917
2018-01-19 02:45:38 +00:00
Sam Clegg b6c5bc27c4 [WebAssembly] Add test expectations for gcc C++ tests (gcc/testsuite/g++.dg)
Differential Revision: https://reviews.llvm.org/D42226

llvm-svn: 322915
2018-01-19 01:40:52 +00:00
Lang Hames 44efd042a2 [ORC] Revert r322913 while I investigate an ASan failure.
llvm-svn: 322914
2018-01-19 01:40:26 +00:00
Lang Hames 817df9fa0c [ORC] Redesign the JITSymbolResolver interface to support bulk queries.
Bulk queries reduce IPC/RPC overhead for cross-process JITing and expose
opportunities for parallel compilation.

The two new query methods are lookupFlags, which finds the flags for each of a
set of symbols; and lookup, which finds the address and flags for each of a
set of symbols. (See doxygen comments for more details.)

The existing JITSymbolResolver class is renamed LegacyJITSymbolResolver, and
modified to extend the new JITSymbolResolver class using the following scheme:

- lookupFlags is implemented by calling findSymbolInLogicalDylib for each of the
symbols, then returning the result of calling getFlags() on each of these
symbols. (Importantly: lookupFlags does NOT call getAddress on the returned
symbols, so lookupFlags will never trigger materialization, and lookupFlags will
never call findSymbol, so only symbols that are part of the logical dylib will
return results.)

- lookup is implemented by calling findSymbolInLogicalDylib for each symbol and
falling back to findSymbol if findSymbolInLogicalDylib returns a null result.
Assuming a symbol is found its getAddress method is called to materialize it and
the result (if getAddress succeeds) is stored in the result map, or the error
(if getAddress fails) is returned immediately from lookup. If any symbol is not
found then lookup returns immediately with an error.

This change will break any out-of-tree derivatives of JITSymbolResolver. This
can be fixed by updating those classes to derive from LegacyJITSymbolResolver
instead.

llvm-svn: 322913
2018-01-19 01:12:40 +00:00
Craig Topper 84b26b90d1 [X86] Add intrinsic support for the RDPID instruction
This adds a new instrinsic to support the rdpid instruction. The implementation is a bit weird because the intrinsic is defined as always returning 32-bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode. But really it zeros the upper 32 bits. So I had to add separate patterns where 64-bit mode uses an extract_subreg.

Differential Revision: https://reviews.llvm.org/D42205

llvm-svn: 322910
2018-01-18 23:52:31 +00:00
Changpeng Fang ba6240cc71 AMDGPU/SI: Fix typos in d16 support patch the buffer intrinsics.
llvm-svn: 322906
2018-01-18 22:57:57 +00:00
Reid Kleckner 7897a789e7 [CodeView] Add line numbers for inlined call sites
We did this for inline call site line tables, but we hadn't done it for
regular function line tables yet. This patch copies that logic from
encodeInlineLineTable.

llvm-svn: 322905
2018-01-18 22:55:43 +00:00
Reid Kleckner b525872288 [CodeView] Sink complex inline functions to .cpp file, NFC
I'm cleaning up this code before I attempt to fix a line table bug.

llvm-svn: 322904
2018-01-18 22:55:14 +00:00
Changpeng Fang 4737e892de AMDGPU/SI: Add d16 support for image intrinsics.
Summary:
  This patch implements d16 support for image load, image store and image sample intrinsics.

Reviewers:
  Matt, Brian.

Differential Revision:
  https://reviews.llvm.org/D3991

llvm-svn: 322903
2018-01-18 22:08:53 +00:00
Eric Christopher 668e6b4b05 Typo fix SIBABRT -> SIGABRT.
Based on a patch by Henry Wong!

llvm-svn: 322902
2018-01-18 21:45:51 +00:00
Peter Collingbourne 719c1f7405 Support: Add missing #include.
This #include is necessary to provide the definitions of _fpclass
and _FPCLASS_NZ when building with libc++.

llvm-svn: 322885
2018-01-18 20:49:33 +00:00
Paul Robinson 8181d23b3d [DWARFv5] Number the line-table's directory array correctly.
The compilation directory has always been #0, but as of DWARF v5 it is
explicitly listed in the line-table section instead of implicitly
being a reference to the compile_unit DIE's DW_AT_comp_dir attribute.
This means the dumper should number the dumped array starting with 0
or 1 depending on the DWARF version of the line table.

References in the generated DWARF are correct, it's just the dumper
that was wrong.  Also some assembler-coded tests were similarly
confused about directory numbers.

llvm-svn: 322884
2018-01-18 20:33:35 +00:00
Amara Emerson d5785775f8 [AArch64][GlobalISel] Add isel support for global values in the large code model.
Fixes PR35958.

Differential Revision: https://reviews.llvm.org/D42175

llvm-svn: 322878
2018-01-18 19:21:27 +00:00
Ana Pazos 1b57c7a0f4 [RISCV] Fixed setting predicates for compressed instructions.
Summary:
Fixed setting predicates for compressed instructions.
Some instructions were being generated with C extension
enabled only, without proper checks for the other
required extensions like F, D and 32 and 64-bit target checks.
Affected instructions:
C_FLD, C_FLW, C_LD, C_FSD, C_FSW, C_SD,
C_JAL, C_ADDIW, C_SUBW, C_ADDW,
C_FLDSP, C_FLWSP, C_LDSP, C_FSDSP, C_FSWSP, C_SDSP

Reviewers: asb, shiva0217

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, llvm-commits

Differential Revision: https://reviews.llvm.org/D42132

llvm-svn: 322876
2018-01-18 18:54:05 +00:00
Zachary Turner 1bc2ce6b9b Speed up iteration of CodeView record streams.
There's some abstraction overhead in the underlying
mechanisms that were being used, and it was leading to an
abundance of small but not-free copies being made.  This
showed up on a profile.  Eliminating this and going back to
a low-level byte-based implementation speeds up lld with
/DEBUG between 10 and 15%.

Differential Revision: https://reviews.llvm.org/D42148

llvm-svn: 322871
2018-01-18 18:35:01 +00:00
Francis Visoiu Mistrih eb3f76fc83 [CodeGen][NFC] Rename IsVerbose to IsStandalone in Machine*::print
Committed r322867 too soon.

Differential Revision: https://reviews.llvm.org/D42239

llvm-svn: 322868
2018-01-18 18:05:15 +00:00
Francis Visoiu Mistrih 378b5f3de6 [CodeGen] Print RegClasses on MI in verbose mode
r322086 removed the trailing information describing reg classes for each
register.

This patch adds printing reg classes next to every register when
individual operands/instructions/basic blocks are printed. In the case
of dumping MIR or printing a full function, by default don't print it.

Differential Revision: https://reviews.llvm.org/D42239

llvm-svn: 322867
2018-01-18 17:59:06 +00:00
Sanjay Patel 0e690b05d1 [TargetLowering] add punctuation for readability; NFC
llvm-svn: 322855
2018-01-18 15:25:32 +00:00
Francis Visoiu Mistrih 586444e42b [CodeGen][NFC] Refactor MachineInstr::print
* Handle more cases where the MI is not attached yet
* Add similar asserts like in MIRPrinter::print

llvm-svn: 322848
2018-01-18 14:52:14 +00:00
Benjamin Kramer bfc1d976ca [HWAsan] Fix uninitialized variable.
Found by msan.

llvm-svn: 322847
2018-01-18 14:19:04 +00:00
Alex Bradbury 921383828e [RISCV] Codegen support for the standard RV32M instruction set extension
llvm-svn: 322843
2018-01-18 12:36:38 +00:00
Alex Bradbury 7d6aa1f7ae [RISCV] Implement frame pointer elimination
llvm-svn: 322839
2018-01-18 11:34:02 +00:00
Sam Parker 1f8c035d6b [SelectionDAG] Convert assert to condtion
Follow-up to r322120 which can cause assertions for AArch64 because
v1f64 and v1i64 are legal types.

Differential Revision: https://reviews.llvm.org/D42097

llvm-svn: 322823
2018-01-18 09:22:24 +00:00
Craig Topper 83b0a98902 [X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads.
Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything just like do for loads.

llvm-svn: 322820
2018-01-18 07:44:09 +00:00
Craig Topper 21c8a8fa49 [X86] Remove isel patterns for using unmasked vmovdqa32/vmovdqu32 for integer vector loads.
These patterns were just looking for a vXi64 bitcasted to vXi32, but there is no advantage to using vmovdqa32 over vmovdqa64.

llvm-svn: 322819
2018-01-18 07:44:06 +00:00
Rafael Espindola 8a1ca67461 Don't drop dso_local in LTO.
LTO sets dso_local as an optimization, so don't clear it.

This avoid clearing it from undefined hidden symbols, which would then
fail the verifier.

llvm-svn: 322814
2018-01-18 05:38:43 +00:00
Craig Topper 7f0d85ec1e [DAGCombiner] Add a DAG combine to turn a splat build_vector where the splat elemnt is a bitcast from a vector type into a concat_vector
For example, a build_vector of i64 bitcasted from v2i32 can be turned into a concat_vectors of the v2i32 vectors with a bitcast to a vXi64 type

Differential Revision: https://reviews.llvm.org/D42090

llvm-svn: 322811
2018-01-18 04:17:06 +00:00
Rafael Espindola 9fbc040599 Make GlobalValues with non-default visibilility dso_local.
This is similar to r322317, but for visibility. It is not as neat
because we have to special case extern_weak.

The idea is the same as the previous change, make the transition to
explicit dso_local easier for the frontends. With this they only have
to add dso_local to symbols where we need some external information to
decide if it is dso_local (like it being part of an ELF executable).

llvm-svn: 322806
2018-01-18 02:08:23 +00:00
Justin Bogner a9346e050f GlobalISel: Make MachineCSE runnable in the middle of the GlobalISel
Right now, it is not possible to run MachineCSE in the middle of the
GlobalISel pipeline. Being able to run generic optimizations between the
core passes of GlobalISel was one of the goals of the new ISel framework.
This is the first attempt to do it.

The problem is that MachineCSE pass assumes all register operands have a
register class, which, in GlobalISel context, won't be true until after the
InstructionSelect pass. The reason for this behaviour is that before
replacing one virtual register with another, MachineCSE pass (and most of
the other optimization machine passes) must check if the virtual registers'
constraints have a (sufficiently large) intersection, and constrain the
resulting register appropriately if such intersection exists.

GlobalISel extends the representation of such constraints from just a
register class to a triple (low-level type, register bank, register
class).

This commit adds MachineRegisterInfo::constrainRegAttrs method that extends
MachineRegisterInfo::constrainRegClass to such a triple.

The idea is that going forward we should use:

- RegisterBankInfo::constrainGenericRegister within GlobalISel's
  InstructionSelect pass
- MachineRegisterInfo::constrainRegClass within SelectionDAG ISel
- MachineRegisterInfo::constrainRegAttrs everywhere else regardless
  the target and instruction selector it uses.

Patch by Roman Tereshin. Thanks!

llvm-svn: 322805
2018-01-18 02:06:56 +00:00
Derek Schuff 53b3855b2b [WebAssembly] Remove duplicated RTLIB names
Remove the tight coupling between llvm/CodeGenRuntimeLibcalls.def and
the table of supported singatures for wasm. This will allow adding new libcalls
without changing wasm's signature table.

Also, some cleanup:
Use ManagedStatics instead of const tables to avoid memory/binary bloat.
Use a StringMap instead of a linear search for name lookup.

Differential Revision: https://reviews.llvm.org/D35592

llvm-svn: 322802
2018-01-18 01:15:45 +00:00
Volkan Keles 4aa73a649a Fix the failure caused by r322773
Do not run GlobalISel if `-fast-isel=0 -global-isel=false`.

llvm-svn: 322800
2018-01-18 01:10:30 +00:00
Jessica Paquette 729e68693f [MachineOutliner] Add DISubprograms to outlined functions.
Before, it wasn't possible to get backtraces inside outlined functions. This
commit adds DISubprograms to the IR functions created by the outliner which
makes this possible. Also attached a test that ensures that the produced
debug information is correct. This is useful to users that want to debug
outlined code.

llvm-svn: 322789
2018-01-18 00:00:58 +00:00
Reid Kleckner 1aa9061c5f [CodeGen] Hoist common AsmPrinter code out of X86, ARM, and AArch64
Every known PE COFF target emits /EXPORT: linker flags into a .drective
section. The AsmPrinter should handle this.

While we're at it, use global_values() and emit each export flag with
its own .ascii directive. This should make the .s file output more
readable.

llvm-svn: 322788
2018-01-17 23:55:23 +00:00
Evgeniy Stepanov 5bd669dc8f [hwasan] LLVM-level flags for linux kernel-compatible hwasan instrumentation.
Summary:
-hwasan-mapping-offset defines the non-zero shadow base address.
-hwasan-kernel disables calls to __hwasan_init in module constructors.
Unlike ASan, -hwasan-kernel does not force callback instrumentation.
This is controlled separately with -hwasan-instrument-with-calls.

Reviewers: kcc

Subscribers: srhines, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42141

llvm-svn: 322785
2018-01-17 23:24:38 +00:00
Volkan Keles a79b0620a0 Add a TargetOption to enable/disable GlobalISel
Summary:
This patch adds a new target option in order to control GlobalISel.
This will allow the users to enable/disable GlobalISel prior to the
backend by calling `TargetMachine::setGlobalISel(bool Enable)`.

No test case as there is already a test to check GlobalISel
command line options.
See: CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll.

Reviewers: qcolombet, aemerson, ab, dsanders

Reviewed By: qcolombet

Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42137

llvm-svn: 322773
2018-01-17 22:34:21 +00:00
Benjamin Kramer 8b1986b5cb Add support for emitting libcalls for x86_fp80 -> fp128 and vice-versa
compiler_rt doesn't provide them (yet), but libgcc does. PR34076.

llvm-svn: 322772
2018-01-17 22:29:16 +00:00
Easwaran Raman e5b8de2f1f Add a ProfileCount class to represent entry counts.
Summary:
The class wraps a uint64_t and an enum to represent the type of profile
count (real and synthetic) with some helper methods.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41883

llvm-svn: 322771
2018-01-17 22:24:23 +00:00
Eli Friedman c60a23a6af [LegalizeDAG] Fix ATOMIC_CMP_SWAP_WITH_SUCCESS legalization.
The code wasn't zero-extending correctly, so the comparison could
spuriously fail.

Adds some AArch64 tests to cover this case.

Inspired by D41791.

Differential Revision: https://reviews.llvm.org/D41798

llvm-svn: 322767
2018-01-17 22:04:36 +00:00
Javed Absar 1e28194a40 [SCEV] Fix typo. NFC.
Fix confusing typo in comment.

llvm-svn: 322765
2018-01-17 21:58:35 +00:00
Zaara Syeda c9dc7b451b Revert [PowerPC] This reverts commit rL322721
Failing build bots. Revert the commit now.

llvm-svn: 322748
2018-01-17 20:00:15 +00:00
Philip Reames f5ff5d584e [MDA] Use common code instead of reimplementing same. [NFC]
llvm-svn: 322747
2018-01-17 19:57:19 +00:00
Aditya Nandakumar 18b3f9d384 [GISel] Make constrainSelectedInstRegOperands() available to the legalizer. NFC
https://reviews.llvm.org/D42149

llvm-svn: 322743
2018-01-17 19:31:33 +00:00
Sam Clegg 9f3fe42e19 [WebAssembly] Remove debug names from symbol table
Get rid of DEBUG_FUNCTION_NAME symbols. When we actually debug
data, maybe we'll want somewhere to put it... but having a symbol
that just stores the name of another symbol seems odd.
It means you have multiple Symbols with the same name, one
containing the actual function and another containing the name!

Store the names in a vector on the WasmObjectFile when reading
them in. Also stash them on the WasmFunctions themselves.
The names are //not// "symbol names" or aliases or anything,
they're just the name that a debugger should show against the
function body itself. NB. The WasmObjectFile stores them so that
they can be exported in the YAML losslessly, and hence the tests
can be precise.

Enforce that the CODE section has been read in before reading
the "names" section. Requires minor adjustment to some tests.

Patch by Nicholas Wilson!

Differential Revision: https://reviews.llvm.org/D42075

llvm-svn: 322741
2018-01-17 19:28:43 +00:00
Rafael Espindola d700869235 Use a got to access a hidden weak undefined on MachO.
Trying to link

__attribute__((weak, visibility("hidden"))) extern int foo;
int *main(void) {
  return &foo;
}

on OS X fails with

ld: 32-bit RIP relative reference out of range (-4294971318 max is +/-2GB): from _main (0x100000FAB) to _foo@0x00001000 (0x00000000) in '_main' from test.o for architecture x86_64

The problem being that 0 cannot be computed as a fixed difference from
%rip. Exactly the same issue exists on ELF and we can use the same
solution.

llvm-svn: 322739
2018-01-17 19:19:55 +00:00
Joel Galenson bbcaf4ac5c [ARM] Optimize {s,u}mul.with.overflow.
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG.

Differential revision: https://reviews.llvm.org/D40922

llvm-svn: 322738
2018-01-17 19:19:05 +00:00
Joel Galenson fe7fa40869 [ARM] Optimize {s,u}{add,sub}.with.overflow.
The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with a "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other).

Differential revision: https://reviews.llvm.org/D38378

llvm-svn: 322737
2018-01-17 19:19:05 +00:00
Daniel Neilson 88dddb8948 [Attributes] Fix crash when attempting to remove alignment from an attribute list/set
Summary:
 Discovered while working on a patch to move alignment in
@llvm.memcpy/move/set from an arg into parameter attributes.

 The current implementations of AttributeSet::removeAttribute() and
AttributeList::removeAttribute crash when attempting to remove the
alignment attribute. Currently, these implementations add the
to-be-removed attributes to an AttrBuilder and then remove
the builder from the list/set. Alignment is special in that it
must be added to a builder with an integer value for the alignment;
attempts to add alignment to a builder without a value is an error.

 This change fixes the removeAttribute implementations for AttributeSet and
AttributeList to make them able to remove the alignment, and other similar,
attributes.

Reviewers: rnk, chandlerc, pete, javed.absar, reames

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41951

llvm-svn: 322735
2018-01-17 19:15:21 +00:00
Simon Pilgrim 8c87a2e7bd [X86][BTVER2] Reduce instregex usage (PR35955)
Most are just replaced with instrs lists, but a few regexps have been further generalized to match more instructions with a single pattern.

llvm-svn: 322734
2018-01-17 19:12:48 +00:00
Craig Topper b70ca5060f [X86] Teach LowerBUILD_VECTOR to recognize pair-wise splats of 32-bit elements and use a 64-bit broadcast
If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done.

We could probably could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement.

I've also restricted this to AVX2 only because we can only broadcast loads under AVX.

Differential Revision: https://reviews.llvm.org/D42086

llvm-svn: 322730
2018-01-17 18:58:22 +00:00
Craig Topper 279ace187a [X86] When legalizing (v64i1 select i8, v64i1, v64i1) make sure not to introduce bitcasts to i64 in 32-bit mode
We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32 which is fine.

The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to cleanup the bitcast.

This fixes pr35972.

llvm-svn: 322724
2018-01-17 18:46:01 +00:00
Zaara Syeda 8e951fd2f6 [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 322721
2018-01-17 18:22:55 +00:00
Tatyana Krasnukha 8979eea04e [ARC] Add missing condition codes.
Summary: Added VS and VC, required for disassembling.

Reviewers: petecoup

Reviewed By: petecoup

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42172

llvm-svn: 322718
2018-01-17 17:58:28 +00:00
Jonas Paulsson ef785694f2 [SystemZ] Handle BRCTH branches correctly in SystemZLongBranch.cpp.
BRCTH is capable of a long branch which needs to be recognized during branch
relaxation. This is done by checking for ExtraRelaxSize == 0.

Review: Ulrich Weigand
llvm-svn: 322688
2018-01-17 17:16:07 +00:00
Matt Arsenault 1491ca8911 AMDGPU: Error in SIAnnotateControlFlow instead of assert
This assert typically happens if an unstructured CFG is passed
to the pass. This can happen if the pass is run independently
without the structurizer.

llvm-svn: 322685
2018-01-17 16:30:01 +00:00
Diana Picus 01bcfd2112 [ARM GlobalISel] Rename local variable. NFC
llvm-svn: 322667
2018-01-17 15:25:37 +00:00
Pablo Barrio f2c29571da [AArch64] Fix incorrect LD1 of 16-bit FP vectors in big endian
Summary:
Loading a vector of 4 half-precision FP sometimes results in an LD1
of 2 single-precision FP + a reversal. This results in an incorrect
byte swap due to the conversion from little endian to big endian.

In order to generate the correct byte swap, it is easier to
generate the correct LD1 of 4 half-precision FP, thus avoiding the
subsequent reversal.

Reviewers: craig.topper, jmolloy, olista01

Reviewed By: olista01

Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41863

llvm-svn: 322663
2018-01-17 14:39:29 +00:00
Sanjay Patel aa766efd09 [InstCombine] fix demanded-bits propagation for zext/trunc
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.

llvm-svn: 322662
2018-01-17 14:39:28 +00:00
Alex Bradbury d93f889d89 [RISCV] Allow RISCVAsmBackend::writeNopData to generate c.nop when supported
When the compressed instruction set is enabled, the 16-bit c.nop can be
generated if necessary.

Differential Revision: https://reviews.llvm.org/D41221
Patch by Shiva Chen.

llvm-svn: 322658
2018-01-17 14:17:12 +00:00
Diana Picus c62a16234b [ARM GlobalISel] Map G_FPEXT and G_FPTRUNC to FPR
llvm-svn: 322657
2018-01-17 14:14:14 +00:00
Daniil Fukalov d5fca554e2 [AMDGPU] add LDS f32 intrinsics
added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D37985

llvm-svn: 322656
2018-01-17 14:05:05 +00:00
Dmitry Preobrazhensky 6b65f7c380 [AMDGPU][MC][GFX9] Enable inline constants for SDWA operands
See bug 35771: https://bugs.llvm.org/show_bug.cgi?id=35771

Differential Revision: https://reviews.llvm.org/D42058

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322655
2018-01-17 14:00:48 +00:00
Diana Picus 65ed364fac [ARM GlobalISel] Legalize G_FPEXT and G_FPTRUNC
Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware
support, but only for conversions between float and double.

Also add the necessary boilerplate so that the LegalizerHelper can
introduce the required libcalls. This also works only for float and
double, but isn't too difficult to extend when the need arises.

llvm-svn: 322651
2018-01-17 13:34:10 +00:00
Ivan A. Kosarev 4d0ff0c74d [Transforms] Support making mutable versions of new-format TBAA access tags
Differential Revision: https://reviews.llvm.org/D41565

llvm-svn: 322650
2018-01-17 13:29:54 +00:00
Benjamin Kramer 8d073a2c2d [X86] Don't mutate shuffle arguments after early-out for AVX512
The match* functions have the annoying behavior of modifying its inputs.
Save and restore the inputs, just in case the early out for AVX512 is
hit. This is still not great and its only a matter of time this kind of
bug happens again, but I couldn't come up with a better pattern without
rewriting significant chunks of this code. Fixes PR35977.

llvm-svn: 322644
2018-01-17 13:01:06 +00:00
Benjamin Kramer 05dc3527de [X86] Constify DebugLoc parameters. No functionality change.
llvm-svn: 322643
2018-01-17 13:00:58 +00:00
Hiroshi Inoue 8f976ba0bf [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 322636
2018-01-17 12:29:38 +00:00
Pavel Labath 67530e478b Don't emit apple accelerator tables on non-darwin targets
Summary:
Currently -glldb turns on emission of apple tables on all targets, but
lldb is only really capable of consuming them on darwin. Furthermore,
making lldb consume these tables is not straight-forward because of the
differences in how the debug info is distributed on darwin vs. elf
targets.

The darwin debug model assumes that the debug info (along with
accelerator tables) will either remain in the .o files or it will be
linked into a dsym bundle by a linker that knows how to merge these
tables. In the elf world, all present linkers will simply concatenate
these accelerator tables into the shared object. Since the tables are
not self-terminating, this renders the tables unusable, as the debugger
cannot pry the individual tables apart anymore.

It might theoretically be possible to make the tables work with split
dwarf, as that is somewhat similar to the apple .o model, but
unfortunately right now the combination of -glldb and -gsplit-dwarf
produces broken object files.

Until these issues are resolved there is no point in emitting the apple
tables for these targets. At best, it wastes space; at worst, it breaks
compilation and prevents the user from getting other benefits of -glldb.

Reviewers: probinson, aprantl, dblaikie

Subscribers: emaste, dim, llvm-commits, JDevlieghere

Differential Revision: https://reviews.llvm.org/D41986

llvm-svn: 322633
2018-01-17 11:52:13 +00:00
Javed Absar 0b05f327d6 [SCEV] fix typo
llvm-svn: 322629
2018-01-17 11:03:06 +00:00
George Rimar 2421b6fa5c [ThinLTO] - Remove code duplication. NFC.
Refactors 3 copies of isExpected.
Splitted from D42107.

llvm-svn: 322627
2018-01-17 10:33:05 +00:00
Andrew V. Tischenko f7706994a6 Allow usage of X86-prefixes as separate instrs.
Differential Revision: https://reviews.llvm.org/D42102

llvm-svn: 322623
2018-01-17 10:12:06 +00:00
Sean Eveson 2ae6037dd1 [MC] Fix -stack-size-section on ARM
Change symbol values in the stack_size section from being 8 bytes, to being a target dependent size.

Differential Revision: https://reviews.llvm.org/D42108

llvm-svn: 322619
2018-01-17 09:01:29 +00:00
Craig Topper 77ba1e7c08 [X86] In LowerBUILD_VECTOR, rename ExtVT to EltVT so it makes sense.
llvm-svn: 322616
2018-01-17 03:58:21 +00:00
Craig Topper de1d28e053 [X86] Remove duplicate lines from scheduler models. NFC
llvm-svn: 322615
2018-01-17 03:50:21 +00:00
George Burgess IV 41e646d8ea [Support] Return an enum instead of an unsigned; NFC.
We seem to be (logically) returning ArchExtKinds here in all cases, so
the return type should reflect that.

The static_cast is necessary because `A.ID` is actually an `unsigned`,
presumably since we use `decltype(A)` to represent extended attributes
for both ARM and AArch64, which use distinct `ArchExtKinds`.

We can't trivially make the same change for ARM, because one of the
values it returns is the bitwise-or of two `ARM::ArchExtKind`s.

llvm-svn: 322613
2018-01-17 03:12:06 +00:00
Aaron Smith 53a1a1616c Fix pretty printing the unspecified param of a variadic function
Summary:
 - Fix a bug in PrettyBuiltinDumper that returns "void" as the name for
  an unspecified builtin type. Since the unspecified param of a variadic
  function is considered a builtin of unspecified type in PDBs, we set
  "..." for its name.

  - Provide a method to determine if a PDBSymbolFunc is variadic in
  PrettyFunctionDumper since PDBSymbolFunc::getArgument() doesn't return the
  last unspecified-type param.

  - Add a pretty-func-dumper.test to test pretty dumping of variadic
  functions.

Reviewers: zturner, llvm-commits

Reviewed By: zturner

Differential Revision: https://reviews.llvm.org/D41801

llvm-svn: 322608
2018-01-17 01:22:03 +00:00
Evgeniy Stepanov c07e0bd533 [hwasan] Rename sized load/store callbacks to be consistent with ASan.
Summary: __hwasan_load is now __hwasan_loadN.

Reviewers: kcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42138

llvm-svn: 322601
2018-01-16 23:15:08 +00:00
Simon Pilgrim a8e6b885bd [X86][BTVER2] Fix scheduling of VCMPSD/VCMPSS instructions
For some reason they don't have a trailing i like the packed equivalents.

llvm-svn: 322600
2018-01-16 22:15:41 +00:00
Florian Hahn c6c89bffdc [CallSiteSplitting] Pass list of (BB, Conditions) pairs to splitCallSite.
This removes some duplication from splitCallSite and makes it easier to
add additional code dealing with each predecessor. It also allows us to
split for more than 2 predecessors, although that is not enabled for
now.

Reviewers: junbuml, mcrosier, davidxl, davide

Reviewed By: junbuml

Differential Revision: https://reviews.llvm.org/D41858

llvm-svn: 322599
2018-01-16 22:13:15 +00:00
Simon Pilgrim 3c66e2c541 [X86][BTVER2] Use instrs instead of instregex for low match counts (PR35955)
llvm-svn: 322598
2018-01-16 22:08:43 +00:00
Simon Pilgrim e9a2832f32 [X86][BTVER2] Use instrs instead of instregex for single use matches (PR35955)
llvm-svn: 322597
2018-01-16 21:44:48 +00:00
Rui Ueyama af4ddd5a6e Specify inline for isWhitespace in CommandLine.cpp
Patch by Takuto Ikuta.

In chromium's component build, there are many directive sections and
commandline parsing takes much time.
This patch is for speed up of lld in RelWithDebInfo build by forcing
inline heavily called isWhitespace function.

10 times link perf stats of blink_core.dll changed like below.

master:
TotalSeconds: 9.8764878
TotalSeconds: 10.1455242
TotalSeconds: 10.075279
TotalSeconds: 10.3397347
TotalSeconds: 9.8361665
TotalSeconds: 9.9544441
TotalSeconds: 9.8960686
TotalSeconds: 9.8877865
TotalSeconds: 10.0551879
TotalSeconds: 10.0492254
Avg: 10.01159047

with this patch:
TotalSeconds: 8.8696762
TotalSeconds: 9.1021585
TotalSeconds: 9.0233893
TotalSeconds: 9.1886175
TotalSeconds: 9.156954
TotalSeconds: 9.0978564
TotalSeconds: 9.1316824
TotalSeconds: 8.8354606
TotalSeconds: 9.2549431
TotalSeconds: 9.4473085
Avg: 9.11080465

llvm-svn: 322595
2018-01-16 20:52:32 +00:00
Lang Hames 4a793c0667 [ExecutionEngine] Rename JITSymbol::isStrongDefinition to isStrong.
For symmetry with isWeak, isCommon.

llvm-svn: 322594
2018-01-16 20:39:51 +00:00
Guozhi Wei e6fb4e1f8a [PPC] Add a new register XER aliased to CARRY
When "xer" is specified as clobbered register in inline assembler, clang can accept it, but llvm simply ignore it when lowered to machine instructions. It may cause problems later in scheduler.

This patch adds a new register XER aliased to CARRY, and adds it to register class CARRYRC. Now PPCTargetLowering::getRegForInlineAsmConstraint can return correct register number for inline asm constraint "{xer}", and scheduler behave correctly.

Differential Revision: https://reviews.llvm.org/D41967

llvm-svn: 322591
2018-01-16 19:28:50 +00:00
Francis Visoiu Mistrih 54a9e7a400 [CodeGen] Skip some instructions that shouldn't affect shrink-wrapping
r320606 checked for MI.isMetaInstruction which skips all DBG_VALUEs.

This also skips IMPLICIT_DEFs and other instructions that may def / read
a register.

Differential Revision: https://reviews.llvm.org/D42119

llvm-svn: 322584
2018-01-16 18:55:26 +00:00
Volkan Keles f7f2568613 [GlobalISel][TableGen] Add support for SDNodeXForm
Summary:
This patch adds CustomRenderer which renders the matched
operands to the specified instruction.

Targets can enable the matching of SDNodeXForm by adding
a definition that inherits from GICustomOperandRenderer and
GISDNodeXFormEquiv as follows.

def gi_imm8 : GICustomOperandRenderer<"renderImm8”>,
                       GISDNodeXFormEquiv<imm8_xform>;

Custom renderer functions should be of the form:
void render(MachineInstrBuilder &MIB, const MachineInstr &I);

Reviewers: dsanders, ab, rovka

Reviewed By: dsanders

Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet

Differential Revision: https://reviews.llvm.org/D42012

llvm-svn: 322582
2018-01-16 18:44:05 +00:00
Alexey Bataev 6977dbcc7b [SLP] Fix for PR32164: Improve vectorization of reverse order of extract operations.
Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation, if these operands are in the reverse order. Patch tries to improve this situation by the reordering of the operands to remove this extra shuffle operation.

Reviewers: mkuper, hfinkel, RKSimon, spatel

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D33954

llvm-svn: 322579
2018-01-16 18:17:01 +00:00
Simon Pilgrim 3e0aafbfcc [X86][MMX] Accept UNDEF upper bits for MOVD GR32->MMX
llvm-svn: 322574
2018-01-16 17:01:31 +00:00
Petar Jovanovic 0b464e4f0e [LiveDebugValues] recognize spilled reg killed in instruction after spill
Current condition for spill instruction recognition in LiveDebugValues does
not recognize case when register is spilled and killed in next instruction.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D41226

llvm-svn: 322554
2018-01-16 14:46:05 +00:00
Simon Pilgrim 85e6139633 [X86][MMX] Improve MMX constant generation
Extend the MMX zero code to take any constant with zero'd upper 32-bits

llvm-svn: 322553
2018-01-16 14:21:28 +00:00
Jonas Devlieghere 6f24c8778c [DebugInfo] Unify dumping of address ranges
Summary:
This patch unifies the printing of address ranges as [0x0, 0x1).

rdar://34822059

Reviewers: aprantl, dblaikie

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D42056

llvm-svn: 322543
2018-01-16 11:17:57 +00:00
Francis Visoiu Mistrih 5eaddb3f68 [CodeGen] Remove special case of printing subRegIdx from MachineInstr::print
Support in MachineOperand has been added in r320209. No need to special
case this anymore.

llvm-svn: 322542
2018-01-16 10:53:14 +00:00
Francis Visoiu Mistrih ecd0b83312 [CodeGen][NFC] Correct case for printSubRegIdx
llvm-svn: 322541
2018-01-16 10:53:11 +00:00
Yonghong Song 035dd256d5 [BPF] Mark pseudo insn patterns as isCodeGenOnly
These pseudos are not supposed to be visible to user.

This patch reduced the auto-generated instruction matcher. For example,
the following words are removed from keyword list of LLVM BPF assembler.

-  MCK__35_, // '#'
-  MCK__COLON_, // ':'
-  MCK__63_, // '?'
-  MCK_ADJCALLSTACKDOWN, // 'ADJCALLSTACKDOWN'
-  MCK_ADJCALLSTACKUP, // 'ADJCALLSTACKUP'
-  MCK_PSEUDO, // 'PSEUDO'
-  MCK_Select, // 'Select'

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 322535
2018-01-16 07:27:20 +00:00
Yonghong Song b42c7c7863 [BPF] Teach DAG2DAG AND elimination about load intrinsics
As commented on the existing code:

  // The Reg operand should be a virtual register, which is defined
  // outside the current basic block. DAG combiner has done a pretty
  // good job in removing truncating inside a single basic block.

However, when the Reg operand comes from bpf_load_[byte | half | word]
intrinsics, the generic optimizer doesn't understand their results are
zero extended, so these single basic block elimination opportunities were
missed.

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 322534
2018-01-16 07:27:19 +00:00
Hiroshi Inoue 99a8faa615 [SROA] fix assetion failure
This patch fixes the assertion failure in SROA reported in PR35657.
PR35657 reports the assertion failure due to r319522 (splitting for non-whole-alloca slices), but this problem can happen even without r319522.

The problem exists in a check for reusing an existing alloca when rewriting partitions. As the original comment said, we can reuse the existing alloca if the new alloca has the same type and offset with the existing one. But the code checks only type of the alloca and then check the offset using an assert.
In a corner case with out-of-bounds access (e.g. @PR35657 function added in unit test), it is possible that the two allocas have the same type but different offsets.

This patch makes the check of the offset in the if condition, and re-enables the splitting for non-whole-alloca slices.

Differential Revision: https://reviews.llvm.org/D41981

llvm-svn: 322533
2018-01-16 06:23:05 +00:00
Craig Topper 7a0c601f95 [X86] Revisit the fix I made years ago to make 'xchgl %eax, %eax' not encode using the 0x90 encoding in 64-bit mode.
Prior to this we had a separate instruction and register class that excluded eax to prevent matching the instruction that would encode with 0x90.

This patch changes this to just use an InstAlias to force xchgl %eax, %eax to use XCHG32rr instruction in 64-bit mode. This gets rid of the separate instruction and register class.

llvm-svn: 322532
2018-01-16 06:07:16 +00:00
Craig Topper daa385f480 [X86] Make 'xchgq %rax, %rax' an alias for the 0x90 nop encoding to match gas.
Previously we encoded it as 0x48 0x90.

llvm-svn: 322531
2018-01-16 06:07:14 +00:00
Simon Pilgrim e5dad1365c Avoid Wparentheses warning.
llvm-svn: 322526
2018-01-15 22:40:06 +00:00
Simon Pilgrim 85bd9141ca [X86][MMX] Add support for MMX zero vector creation
As mentioned on PR35869, (and came up recently on D41517) we don't create a MMX zero register via the PXOR but instead perform a spill to stack from a XMM zero register.

This patch adds support for direct MMX zero vector creation and should make it easier to add better constant vector creation in the future as well.

Differential Revision: https://reviews.llvm.org/D41908

llvm-svn: 322525
2018-01-15 22:32:40 +00:00
Simon Pilgrim 940eae3cc1 [X86][SSE] Add custom execution domain fixing for BLENDPD/BLENDPS/PBLENDD/PBLENDW (PR34873)
Add support for custom execution domain fixing and implement support for BLENDPD/BLENDPS/PBLENDD/PBLENDW.

Differential Revision: https://reviews.llvm.org/D42042

llvm-svn: 322524
2018-01-15 22:18:45 +00:00
Craig Topper 1393ccf949 [X86] Use MVT::getVectorVT instead of EVT::getVectorVT when splitting 256/512 bit build_vectors. NFC
We must be creating a legal type here which means it can be an MVT.

llvm-svn: 322512
2018-01-15 20:33:53 +00:00
Craig Topper aacc622564 [X86] Generalize some code in LowerBUILD_VECTOR. NFC
llvm-svn: 322511
2018-01-15 20:33:52 +00:00
Craig Topper 4f7fadd029 [X86] Remove unnecessary if statement from LowerBUILD_VECTOR. NFCI
We were checking for 128, 256, or 512 bit vectors, but those are the only types that can get here.

llvm-svn: 322510
2018-01-15 20:33:50 +00:00
Dan Gohman 7aa1fcdf3e [WebAssembly] Update README.txt.
Describe more of the current status, mention Rust as another easy
way to use this backend, and add more documentation links.

llvm-svn: 322508
2018-01-15 20:08:14 +00:00
Stanislav Mekhanoshin 62875fcd6c [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32
Differential Revision: https://reviews.llvm.org/D41617

llvm-svn: 322500
2018-01-15 18:49:15 +00:00