Commit Graph

110057 Commits

Author SHA1 Message Date
Simon Pilgrim 09c56b799f [X86] Apply clang-format to detectUSatPattern. NFCI.
Cleanup from D42544

llvm-svn: 323439
2018-01-25 16:38:56 +00:00
Krzysztof Parzyszek 16610b0a57 Revert "[Hexagon] Replace EmitFunctionEntryCode with a DAG preprocessing code"
This reverts r323374. The fix needs a different approach.

llvm-svn: 323438
2018-01-25 16:36:53 +00:00
Sanjay Patel 1d68112c4b [InstCombine] narrow masked zexted binops (PR35792)
This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792

Alive proofs for the cases handled here as well as the bitwise 
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97
https://rise4fun.com/Alive/Lc5E
https://rise4fun.com/Alive/kdf

llvm-svn: 323437
2018-01-25 16:34:36 +00:00
Alexey Bataev a0b2c78efc Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323430 to fix buildbots.

llvm-svn: 323432
2018-01-25 15:20:29 +00:00
Alexey Bataev ad51fe3644 [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323430
2018-01-25 15:01:36 +00:00
Amjad Aboud f1f57a3137 Another try to commit 323321 (aggressive instruction combine).
llvm-svn: 323416
2018-01-25 12:06:32 +00:00
Jonas Devlieghere 2c14b15538 [Dwarf] Add dsymutil Atom extensions. NFC
This patch extends the atom types used by the Apple accelerator tables
with two dsymutil extensions:

 - DW_ATOM_type_type_flags
 - DW_ATOM_qual_name_hash

llvm-svn: 323414
2018-01-25 11:19:08 +00:00
Mikael Holmen 886edf8f8a [GlobalOpt] Emit fragments using field offsets from struct layout
Summary:
When creating the debug fragments for a SRA'd struct, use the fields'
offsets, taken from the struct layout, as the offsets for the resulting
fragments. This fixes an issue where GlobalOpt would emit fragments with
incorrect offsets for padded fields.

This should solve PR36016.

Patch by David Stenberg.

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42489

llvm-svn: 323411
2018-01-25 10:09:26 +00:00
Igor Laevsky 7a43f26dc8 [FuzzMutate] Inst deleter doesn't work with PhiNodes
Differential Revision: https://reviews.llvm.org/D42412

llvm-svn: 323409
2018-01-25 09:22:18 +00:00
Eugene Leviant 41e45955bb [IRMover] Add comment and fix test case
llvm-svn: 323407
2018-01-25 08:35:52 +00:00
Craig Topper b369cdbaad [X86] Expand IMUL/MUL instregexs in Intel scheduler models. Add load latency to some of them in SkylakeClient model.
The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results.

This changes them all to use explicit instrs instead.

While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too.

llvm-svn: 323406
2018-01-25 06:57:42 +00:00
Craig Topper 795b17f4fb [X86] Expand IMUL/MUL instregexs in Znver1 scheduler to show what's actually implemented.
The IMUL instruction names mixed with the prefix matching of the instregex lead to some strange matches. The worst being that several memory instructions are using the register form latency.

I don't know what the right answer is, so I've left TODOs and will try to work with the AMD folks to get this cleaned up.

llvm-svn: 323405
2018-01-25 06:57:39 +00:00
Craig Topper 066e73762d [X86] Name the MMX phaddd instruction with 3 Ds instead of just 2. NFC
llvm-svn: 323403
2018-01-25 04:45:32 +00:00
Craig Topper dbddac0915 [X86] Remove 64/128/256 from MMX/SSE/AVX instruction names for overall consistency. NFC
MMX instrutions all start with MMX_ so the 64 isn't needed for disambigutation.
SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128.
AVX2 instructions should use a Y to indicate 256-bits.

llvm-svn: 323402
2018-01-25 04:45:30 +00:00
Craig Topper 81c87092d1 [X86] Remove unnecessary '_alt' and '_Int' from scheduler model regular expressions.
These were treated as optional suffixes, but the regular expressions are already prefix matches so this is unnecessary. It breaks the binary search optimization in tablegen due to the top level question mark.

llvm-svn: 323401
2018-01-25 04:45:28 +00:00
Lang Hames c8a74a0448 [ORC] Refactor the various lookupFlags methods to return the flags map via the
first argument.

This makes lookupFlags more consistent with lookup (which takes the query as the
first argument) and composes better in practice, since lookups are usually
linearly chained: Each lookupFlags can populate the result map based on the
symbols not found in the previous lookup. (If the maps were returned rather than
passed by reference there would have to be a merge step at the end).

llvm-svn: 323398
2018-01-25 01:43:00 +00:00
Aditya Nandakumar 81c81b6426 [GISel]: Implement GlobalISel combiner API.
https://reviews.llvm.org/D41373

The various components are

GICombinerHelper contains transformations that are common to all
targets. Targets can pick and choose which transformations (at
function/opcode granularity) each pass uses via configuring a
GICombinerInfo.

GICombiner contains some common code and it does the traversal,
driving of combines, worklist management and iterating until
convergence.

GICombinerInfo is an interface with a virtual method called combine.
The combiner info will allow targets to pick and choose (or
implement their own specific combines). CombineInfos can make
use of available combines in GICombineHelper to configure the
transformations for a particular pass. Currently this approach allows
cherry picking transformations from helpers (at function/opcode
granularity) and also allows early returning on specific
transformations. Targets also get to prioritize whether target specific
combines run before/after the opt-in generic combines. Ideally we would
like this part to be configured by both C++ and Tablegen. The
CombinerInfo also has a field which indicates how to deal with
IllegalOps (ie - should we allow to create them/or legalize them?).

A CombinerPass would configure a CombinerInfo, create the GICombiner
with the Info, and call
GICombiner::combineMachineInstrs(MachineFunction&).
This organization is very similar to the GISelLegalizer.

llvm-svn: 323392
2018-01-25 00:41:58 +00:00
Krzysztof Parzyszek 14f3ef1f0e [Hexagon] Replace EmitFunctionEntryCode with a DAG preprocessing code
The code in EmitFunctionEntryCode needs to know the maximum stack
alignment, but it runs very early in the selection process (before
lowering). The final stack alignment may change during lowering, so
the code needs to be moved to where the alignment is known.

llvm-svn: 323374
2018-01-24 21:19:51 +00:00
Amara Emerson 4f84f8862b [AArch64][GlobalISel] Fall back during AArch64 isel if we have a volatile load.
The tablegen imported patterns for sext(load(a)) don't check for single uses
of the load or delete the original after matching. As a result two loads are
left in the generated code. This particular issue will be fixed by adding
support for a G_SEXTLOAD opcode in future.

There are however other potential issues around this that wouldn't be fixed by
a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile
loads at all in the AArch64 selector.

Fixes/works around PR36018.

llvm-svn: 323371
2018-01-24 20:35:37 +00:00
Amara Emerson f386e2b081 [GlobalISel] Don't fall back to FastISel.
Apparently checking the pass structure isn't enough to ensure that we don't fall
back to FastISel, as it's set up as part of the SelectionDAGISel.

llvm-svn: 323369
2018-01-24 19:59:29 +00:00
Simon Pilgrim 9f551ad604 [X86][SSE] Aggressively use PMADDWD for v4i32 multiplies with 17 or more leading zeros
As discussed in D41484, PMADDWD for 'zero extended' vXi32 is nearly always a better option than PMULLD:
On SNB it will result in code that isn't any faster, but not any slower so we may as well keep it.
On KNL it only has half the throughput, so I've disabled it on there - ideally there'd be a better way than this.

Differential Revision: https://reviews.llvm.org/D42258

llvm-svn: 323367
2018-01-24 19:20:02 +00:00
Rafael Espindola 349fe0aa87 Simplify. NFC.
Thanks to Teresa Johnson for the suggestion.

llvm-svn: 323365
2018-01-24 19:11:24 +00:00
Alexey Bataev 0affccc8d7 Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323348 because of the broken buildbots.

llvm-svn: 323359
2018-01-24 18:36:51 +00:00
Easwaran Raman bf38deef3f Revert "[ThinLTO] Add call edges' relative block frequency to per-module summary."
Causes buildbot regressions.

llvm-svn: 323358
2018-01-24 18:15:29 +00:00
Geoff Berry c4796d4745 [AMDGPU] Make sure all super regs of reserved regs are marked reserved.
Summary:
Move reserveRegisterTuples into AMDGPURegisterInfo and use it in
R600RegisterInfo::getReservedRegs and
R600InstrInfo::reserveIndirectRegisters to ensure that all super
registers of reserved registers are also marked as reserved.

Before this change, under certain circumstances, the registers %t1_x and
%t1_xyzw would be marked as reserved, but %t1_xy and %t1_xyz would not
be, leading to the register allocator sometimes assigning a register to
%t1_xy, which is invalid since %t1_x is reserved.

Reviewers: arsenm, tstellar, MatzeB, qcolombet

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D42448

llvm-svn: 323356
2018-01-24 18:09:53 +00:00
Nicolai Haehnle 4afb64e4c6 Revert r321751, "StructurizeCFG: Fix broken backedge detection"
It causes regressions in various OpenGL test suites.

Keep the test cases introduced by r321751 as XFAIL, and add a test case
for the regression.

Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015
llvm-svn: 323355
2018-01-24 18:02:05 +00:00
Weiming Zhao 665784f170 [ARM] Expand long shifts for Thumb1 to __aeabi_ calls
Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1.

Reviewers: samparker

Reviewed By: samparker

Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42401

llvm-svn: 323354
2018-01-24 18:00:57 +00:00
Craig Topper 05af43fbad [X86] Fix some inconsistencies in the itineraries and Sched for (V)PEXTRW/(V)PINSRW
The weirdest being that PEXTRWrr was tagged as a memory operation.

llvm-svn: 323353
2018-01-24 17:58:57 +00:00
Craig Topper b85b484fee [X86] Adjust names of PINSRW/PEXTRW intructions between MMX/SSE/AVX/AVX512 for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI
llvm-svn: 323352
2018-01-24 17:58:51 +00:00
Craig Topper 23cc866c97 [X86] Remove '(_REV)?' from a bunch of scheduler regular expressions. NFC
The regexs are treated as a prefix match already so the checking for optional text at the end provides no value. Instead it prevents the binary search optimization in tablegen from kicking in due to the top level question mark.

llvm-svn: 323351
2018-01-24 17:58:42 +00:00
Easwaran Raman 5f7aff9a0a [ThinLTO] Add call edges' relative block frequency to per-module summary.
Summary:
This allows relative block frequency of call edges to be passed to the
thinlink stage where it will be used to compute synthetic entry counts
of functions.

Reviewers: tejohnson, pcc

Subscribers: mehdi_amini, llvm-commits, inglorion

Differential Revision: https://reviews.llvm.org/D42212

llvm-svn: 323349
2018-01-24 17:51:23 +00:00
Alexey Bataev 4bd8e5332f [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323348
2018-01-24 17:50:53 +00:00
Krzysztof Parzyszek cf3ad5841b [Hexagon] Run late copy propagation and dead code elimination passes
llvm-svn: 323346
2018-01-24 17:48:11 +00:00
Rafael Espindola fc16f76edb Handle R_386_PLT32 in RuntimeDyldELF.
This should fix the 32 bit buildbots.

llvm-svn: 323344
2018-01-24 17:36:08 +00:00
Zvi Rackover 51f0d64b9c InstSimplify: If divisor element is undef simplify to undef
Summary:
If any vector divisor element is undef, we can arbitrarily choose it be
zero which would make the div/rem an undef value by definition.

Reviewers: spatel, reames

Reviewed By: spatel

Subscribers: magabari, llvm-commits

Differential Revision: https://reviews.llvm.org/D42485

llvm-svn: 323343
2018-01-24 17:22:00 +00:00
Daniel Sanders 262ed0ecd7 [globalisel] Introduce LegalityQuery to better encapsulate the legalizer decisions. NFC.
Summary:
`getAction(const InstrAspect &) const` breaks encapsulation by exposing
the smaller components that are used to decide how to legalize an
instruction.

This is a problem because we need to change the implementation of
LegalizerInfo so that it's able to describe particular type combinations
rather than just cartesian products of types.

For example, declaring the following
  setAction({..., 0, s32}, Legal)
  setAction({..., 0, s64}, Legal)
  setAction({..., 1, s32}, Legal)
  setAction({..., 1, s64}, Legal)
currently declares these type combinations as legal:
  {s32, s32}
  {s64, s32}
  {s32, s64}
  {s64, s64}
but we currently have no means to say that, for example, {s64, s32} is
not legal. Some operations such as G_INSERT/G_EXTRACT/G_MERGE_VALUES/
G_UNMERGE_VALUES has relationships between the types that are currently
described incorrectly.

Additionally, G_LOAD/G_STORE currently have no means to legalize non-atomics
differently to atomics. The necessary information is in the MMO but we have no
way to use this in the legalizer. Similarly, there is currently no way for the
register type and the memory type to differ so there is no way to cleanly
represent extending-load/truncating-store in a way that can't be broken by
optimizers (resulting in illegal MIR).

This patch introduces LegalityQuery which provides all the information
needed by the legalizer to make a decision on whether something is legal
and how to legalize it.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, reames, bogner

Reviewed By: bogner

Subscribers: bogner, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D42244

llvm-svn: 323342
2018-01-24 17:17:46 +00:00
Jonas Devlieghere 5803a6744e [NFC] Make magic number for DJB hash function customizable.
This allows us to specify the magic number for the DJB hash function.
This feature is needed by dsymutil to emit Apple types accelerator
table.

llvm-svn: 323341
2018-01-24 16:53:14 +00:00
Sanjay Patel 1d91ec34b2 [ValueTracking] add recursion depth param to matchSelectPattern
We're getting bug reports:
https://bugs.llvm.org/show_bug.cgi?id=35807
https://bugs.llvm.org/show_bug.cgi?id=35840
https://bugs.llvm.org/show_bug.cgi?id=36045
...where we blow up the stack in value tracking because other passes are sending 
in selects that have an operand that is itself the select.

We don't currently have a reliable way to avoid analyzing dead code that may take 
non-standard forms, so bail out when things go too far.

This mimics the recursion depth limitations in other parts of value tracking.

Unfortunately, this pushes the underlying problems for other passes (jump-threading,
simplifycfg, correlated-propagation) into hiding. If someone wants to uncover those
again, the first draft of this patch on Phab would do that (it would assert rather
than bail out).

Differential Revision: https://reviews.llvm.org/D42442

llvm-svn: 323331
2018-01-24 15:20:37 +00:00
Amjad Aboud d53504e379 Reverted 323321.
llvm-svn: 323326
2018-01-24 14:48:49 +00:00
Pablo Barrio 9b3d4c01a0 [AArch64] Avoid unnecessary vector byte-swapping in big-endian
Summary:
Loads/stores of some NEON vector types are promoted to other vector
types with different lane sizes but same vector size. This is not a
problem in little-endian but, when in big-endian, it requires
additional byte reversals required to preserve the lane ordering
while keeping the right endianness of the data inside each lane.
For example:

%1 = load <4 x half>, <4 x half>* %p

results in the following assembly:

ld1 { v0.2s }, [x1]
rev32 v0.4h, v0.4h

This patch changes the promotion of these loads/stores so that the
actual vector load/store (LD1/ST1) takes care of the endianness
correctly and there is no need for further byte reversals. The
previous code now results in the following assembly:

ld1 { v0.4h }, [x1]

Reviewers: olista01, SjoerdMeijer, efriedma

Reviewed By: efriedma

Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D42235

llvm-svn: 323325
2018-01-24 14:13:47 +00:00
Krzysztof Parzyszek 5aef4b5997 [Hexagon] Remove unused HexagonISD opcodes, NFC
llvm-svn: 323324
2018-01-24 14:07:37 +00:00
Sander de Smalen dc00becd1b [DebugInfo] Emit DWARF reference for DIVariable 'count' in DISubrange
Summary:
This patch implements the codegen of DWARF debug info for non-constant
'count' fields for DISubrange.

This is patch [2/3] in a series to extend LLVM's DISubrange Metadata
node to support debugging of C99 variable length arrays and vectors with
runtime length like the Scalable Vector Extension for AArch64. It is
also a first step towards representing more complex cases like arrays
in Fortran.

Reviewers: echristo, pcc, aprantl, dexonsmith, clayborg, kristof.beyls, dblaikie

Reviewed By: aprantl

Subscribers: fhahn, aemerson, rengolin, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D41696

llvm-svn: 323323
2018-01-24 13:35:54 +00:00
Amjad Aboud e4453233d7 [InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive-instcombine).
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.

For example, this pass reduce width of expressions post-dominated by TruncInst
into smaller width when applicable.

It differs from instcombine pass in that it contains pattern optimization that
requires higher complexity than the O(1), thus, it should run fewer times than
instcombine pass.

Differential Revision: https://reviews.llvm.org/D38313

llvm-svn: 323321
2018-01-24 12:42:42 +00:00
Simon Pilgrim f26df47831 [X86][SSE] Avoid calls to combineX86ShufflesRecursively that can't combine to target shuffles (PR32037)
Don't bother making recursive calls to combineX86ShufflesRecursively if we have more shuffle source operands than will be combined together with the remaining recursive depth.

See https://bugs.llvm.org/show_bug.cgi?id=32037#c26 and https://bugs.llvm.org/show_bug.cgi?id=32037#c27 for the reduction in compile times from this patch.

Differential Revision: https://reviews.llvm.org/D42378

llvm-svn: 323320
2018-01-24 11:41:09 +00:00
Malcolm Parsons 21e545d08d Fix typos of occurred and occurrence
llvm-svn: 323318
2018-01-24 10:33:39 +00:00
Igor Laevsky 50acecf2ab [llvm-opt-fuzzer] Add couple of popular passes
Differential Revision: https://reviews.llvm.org/D42410

llvm-svn: 323314
2018-01-24 09:57:17 +00:00
Sander de Smalen fdf40917d9 [Metadata] Extend 'count' field of DISubrange to take a metadata node
Summary:
This patch extends the DISubrange 'count' field to take either a
(signed) constant integer value or a reference to a DILocalVariable
or DIGlobalVariable.

This is patch [1/3] in a series to extend LLVM's DISubrange Metadata
node to support debugging of C99 variable length arrays and vectors with
runtime length like the Scalable Vector Extension for AArch64. It is
also a first step towards representing more complex cases like arrays
in Fortran.

Reviewers: echristo, pcc, aprantl, dexonsmith, clayborg, kristof.beyls, dblaikie

Reviewed By: aprantl

Subscribers: rnk, probinson, fhahn, aemerson, rengolin, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D41695

llvm-svn: 323313
2018-01-24 09:56:07 +00:00
Sven van Haastregt e8404780c3 [DAGCombiner] Bail out if vector size is not a multiple
For the included test case, the DAG transformation
  concat_vectors(scalar, undef) -> scalar_to_vector(sclr)
would attempt to create a v2i32 vector for a v9i8
concat_vector.  Bail out to avoid creating a bitcast with
mismatching sizes later on.

Differential Revision: https://reviews.llvm.org/D42379

llvm-svn: 323312
2018-01-24 09:53:47 +00:00
Max Kazantsev 0f720e1296 [NFC] Remove overconfident assert from IRCE
This patch removes assert that SCEV is able to prove that a value is
non-negative. In fact, SCEV can sometimes be unable to do this because
its cache does not update properly. This assert will be returned once this
problem is resolved.

llvm-svn: 323309
2018-01-24 07:51:41 +00:00
Martin Storsjo 4ed94a06ac [ARM] Call __chkstk for dynamic stack allocation in all windows environments
This matches what MSVC does for alloca() function calls on ARM.
Even if MSVC doesn't support VLAs at the language level, it does
support the alloca function.

On the clang level, both the _alloca() (when emulating MSVC, which is
what the alloca() function expands to) and __builtin_alloca() builtin
functions, and VLAs, map to the same LLVM IR "alloca" function - so
within LLVM they're not distinguishable from each other.

Differential Revision: https://reviews.llvm.org/D42292

llvm-svn: 323308
2018-01-24 06:40:11 +00:00
Martin Storsjo e8248f2e10 [GlobalMerge] Don't merge dllexport globals
Merging such globals loses the dllexport attribute. Add a test
to check that normal globals still are merged.

Differential Revision: https://reviews.llvm.org/D42127

llvm-svn: 323307
2018-01-24 06:40:04 +00:00
Craig Topper 069e1dd861 [X86] Move 'Y' to correct place in FMA4 regular expression in Znver1 scheduler model.
I think these instructions used to be named differently and the regular expression reflected that. I guess we must have correct itinerary information that made this not matter for the scheduler test?

llvm-svn: 323305
2018-01-24 05:32:51 +00:00
Craig Topper a55ac7b790 [X86] Rename 256-bit VFRCZ instructions to have the Y before the rr/rm to match other instructions. NFC
llvm-svn: 323304
2018-01-24 05:14:39 +00:00
Craig Topper fd68c2d0ae [X86] Remove redundant regular expression from the Znver1 scheduler model. NFC
llvm-svn: 323303
2018-01-24 05:14:33 +00:00
Hiroshi Inoue 501931b117 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323302
2018-01-24 05:04:35 +00:00
Craig Topper 0321ebc054 [X86] Use ISD::SIGN_EXTEND instead of X86ISD::VSEXT for mask to xmm/ymm/zmm conversion
There are a couple tricky things with this patch.

I had to add an override of isVectorLoadExtDesirable to stop DAG combine from combining sign_extend with loads after legalization since we legalize sextload using a load+sign_extend. Overriding this hook actually prevents a lot sextloads from being created in the first place.

I also had to add isel patterns because DAG combine blindly combines sign_extend+truncate to a smaller sign_extend which defeats what legalization was trying to do.

Differential Revision: https://reviews.llvm.org/D42407

llvm-svn: 323301
2018-01-24 04:51:17 +00:00
Jakub Kuderski ffb4fb7f6f [Dominators] Introduce DomTree verification levels
Summary:
Currently, there are 2 ways to verify a DomTree:
* `DT.verify()` -- runs full tree verification and checks all the properties and gives a reason why the tree is incorrect. This is run by when EXPENSIVE_CHECKS are enabled or when `-verify-dom-info` flag is set.
* `DT.verifyDominatorTree()` -- constructs a fresh tree and compares it against the old one. This does not check any other tree properties (DFS number, levels), nor ensures that the construction algorithm is correct. Used by some passes inside assertions.

This patch introduces DomTree verification levels, that try to close the gape between the two ways of checking trees by introducing 3 verification levels:
- Full -- checks all properties, but can be slow (O(N^3)). Used when manually requested (e.g. `assert(DT.verify())`) or when  `-verify-dom-info` is set.
- Basic -- checks all properties except the sibling property, and compares the current tree with a freshly constructed one instead. This should catch almost all errors, but does not guarantee that the construction algorithm is correct. Used when EXPENSIVE checks are enabled.
- Fast -- checks only basic properties (reachablility, dfs numbers, levels, roots), and compares with a fresh tree. This is meant to replace the legacy `DT.verifyDominatorTree()` and in my tests doesn't cause any noticeable performance impact even in the most pessimistic examples.

When used to verify dom tree wrapper pass analysis on sqlite3, the 3 new levels make `opt -O3` take the following amount of time on my machine:
- no verification: 8.3s
- `DT.verify(VerificationLevel::Fast)`: 10.1s
- `DT.verify(VerificationLevel::Basic)`: 44.8s
- `DT.verify(VerificationLevel::Full)`: 1m 46.2s
(and the previous `DT.verifyDominatorTree()` is within the noise of the Fast level)

This patch makes `DT.verifyDominatorTree()` pick between the 3 verification levels depending on EXPENSIVE_CHECKS and `-verify-dom-info`.

Reviewers: dberlin, brzycki, davide, grosser, dmgreen

Reviewed By: dberlin, brzycki

Subscribers: MatzeB, llvm-commits

Differential Revision: https://reviews.llvm.org/D42337

llvm-svn: 323298
2018-01-24 02:40:35 +00:00
Rafael Espindola 432a587cf0 Don't assume a null GV is local for ELF and MachO.
This is already a simplification, and should help with avoiding a plt
reference when calling an intrinsic with -fno-plt.

With this change we return false for null GVs, so the caller only
needs to check the new metadata to decide if it should use foo@plt or
*foo@got.

llvm-svn: 323297
2018-01-24 02:11:18 +00:00
Eric Christopher a8bdf5328d Remove set but unused variable IsUndef.
llvm-svn: 323295
2018-01-24 01:51:57 +00:00
Zvi Rackover b5447b1e7c X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW
Summary:
AVX512BW adds support for variable shift amount for 16-bit element
vectors.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: rengolin, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D42437

llvm-svn: 323292
2018-01-24 01:36:40 +00:00
Aditya Nandakumar f2aa2af24e [GISel]: Remove redundant copies at the end of ISel
https://reviews.llvm.org/D42402

A lot of these copies are useless (copies b/w VRegs having the same
regclass) and should be cleaned up.

llvm-svn: 323291
2018-01-24 01:35:26 +00:00
Sam Clegg 23012e98c9 [WebAssembly] Add minor helper functions to WasmObjectFile
Also, fix crash when exporting an imported function.

Differential Revision: https://reviews.llvm.org/D42454

llvm-svn: 323290
2018-01-24 01:27:17 +00:00
Matthias Braun 70fd374d1e AArch64: Cyclone: Remove SlowMisaligned128Store tuning flag
Remove FeatureSlowMisaligned128Store from cyclone flags.
This flag causes splitting of 16 byte wide stores into 2 stored of 8
bytes. This was useful on older apple CPUs which were slow for 16byte
stores that were not aligned on 16byte. As the compiler often cannot
predict the actual alignment, the splitting was choosen.

This has been a topic for a lot of debate as the splitting also
decreases performance for some benchmarks. Measuring the effects on
newer apple chips (rdar://35525421) shows that it harms more cases than
it helps. So it is time to retire this workaround.

llvm-svn: 323289
2018-01-24 00:39:53 +00:00
Benjamin Kramer 0627a33ed4 [TblGen] Inline an (almost) trivial accessor. No functionality change.
llvm-svn: 323276
2018-01-23 23:03:50 +00:00
Volkan Keles ebf34ea316 BlockExtractor: Remove unused variable. NFC.
llvm-svn: 323271
2018-01-23 22:24:34 +00:00
Tim Shen 7abe9887b0 [PPC] Avoid incorrect fp-i128-fp lowering.
Summary:
Fix an issue that's similar to what D41411 fixed:
  float(__int128(float_var)) shouldn't be optimized to xscvdpsxds +
  xscvsxdsp, as they mean (float)(int64_t)float_var.

Reviewers: jtony, hfinkel, echristo

Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D42400

llvm-svn: 323270
2018-01-23 22:06:57 +00:00
Volkan Keles dc40be75f8 [llvm-extract] Support extracting basic blocks
Summary:
Currently, there is no way to extract a basic block from a function easily. This patch
extends llvm-extract to extract the specified basic block(s).

Reviewers: loladiro, rafael, bogner

Reviewed By: bogner

Subscribers: hintonda, mgorny, qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D41638

llvm-svn: 323266
2018-01-23 21:51:34 +00:00
Craig Topper 1e42a4a735 [X86] Merge some regular expressions in Zen scheduler model and remove 2 unused classes.
I don't know if the unused classes were intended to be used and that the VEX version is really different than the legacy SSE version. Agner's tables don't show any differences. I'm just cleaning up assuming the current behavior is correct.

llvm-svn: 323263
2018-01-23 21:37:56 +00:00
Craig Topper 3067f4ddca [X86] Remove 'Int_' from instregexs in Zen scheduler model.
No instructions have Int_ at the beginning. It's always at the end now. So it should be picked up as a prefix match

llvm-svn: 323262
2018-01-23 21:37:54 +00:00
Craig Topper 002657731b [X86] Move 'Int_' to the end of the name of the VCOMISS/VUCOMISS and instructions to get them picked up by the scheduler model regexs.
All other intrinsic instructions put the _Int on the end. This make these instructions consistent and gets the prefix instregexs in the scheduler models to pick them up.

llvm-svn: 323261
2018-01-23 21:37:51 +00:00
Simon Pilgrim 2cc74ed2be [X86][AVX] LowerBUILD_VECTORAsVariablePermute - add support for VPERMILPV to v2i64/v2f64
Minor refactor to make it possible for LowerBUILD_VECTORAsVariablePermute to be used with a wider variety of shuffles op and types.

I'd have liked to add v4i32/v4f32 support as well but we don't see v4i32 index extractions at the moment (which is why I created D42308)

After this I intend to begin adding scaling support for PSHUFB (v8i16, v4i32, v2i64)) and VPERMPS (v4f64, v4i64).

Differential Revision: https://reviews.llvm.org/D42431

llvm-svn: 323260
2018-01-23 21:33:24 +00:00
Evgeniy Stepanov d5a6fdbe95 [safestack] Inline safestack pointer access when possible.
Summary:
This adds an -mllvm flag that forces the use of a runtime function call to
get the unsafe stack pointer, the same that is currently used on non-x86, non-aarch64 android.
The call may be inlined.

Reviewers: pcc

Subscribers: aemerson, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37405

llvm-svn: 323259
2018-01-23 21:27:07 +00:00
Simon Pilgrim c1e2290d37 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 323258
2018-01-23 21:22:16 +00:00
Alexey Bataev 4f74a31c0e Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323246 because of the broken buildbots.

llvm-svn: 323252
2018-01-23 20:11:27 +00:00
Krzysztof Parzyszek d5e8a260bb [Hexagon] Add patterns for sext_inreg of HVX vector types
llvm-svn: 323250
2018-01-23 19:56:16 +00:00
Alexey Bataev 6719e2418c [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323246
2018-01-23 19:30:26 +00:00
Krzysztof Parzyszek 275ffa4679 [Hexagon] Implement hasLoadFromStackSlot and hasStoreToStackSlot
If the instruction is a bundle, check the instructions inside of it.

Patch by Suyog Sarda.

llvm-svn: 323240
2018-01-23 19:08:40 +00:00
Nico Weber 1c7c45688c Introduce errorToBool() helper and use it.
errorToBool() converts an Error to a bool and puts the Error in a checked
state.  No behavior change.

https://reviews.llvm.org/D42422

llvm-svn: 323238
2018-01-23 19:03:13 +00:00
Sam Clegg 68b425f0bf [WebAssembly] Remove "name" section of object wasm object files
LLD is unaffected, no changes needed there. LLD continues to
write out a name section, using the symbol names.

Fixes: https://github.com/WebAssembly/tool-conventions/issues/37

Patch by Nicholas Wilson!

Differential Revision: https://reviews.llvm.org/D42425

llvm-svn: 323234
2018-01-23 18:30:04 +00:00
Krzysztof Parzyszek ae3e934bd6 [Hexagon] Fix unused variable warning in release build
llvm-svn: 323233
2018-01-23 18:16:52 +00:00
Krzysztof Parzyszek 3780a0e1fa [Hexagon] Implement basic vector operations on vectors vNi1
In addition to that, make sure that there are no boolean vector types that
are associated with multiple register classes. Specifically, remove v32i1
and v64i1 from integer register classes. These types will correspond to
results of vector comparisons, and as such should belong to the vector
predicate class. Having them in scalar registers as well makes legalization
ambiguous.

llvm-svn: 323229
2018-01-23 17:53:59 +00:00
Simon Pilgrim 6ff241fc99 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - extract subvector from oversized index vectors
llvm-svn: 323223
2018-01-23 17:02:15 +00:00
Dan Gohman 5464941a6a [WebAssembly] Add mem.* intrinsics.
The grow_memory and current_memory instructions are expected to be
officially renamed to mem.grow and mem.size. Introduce new intrinsics
with the new names. These new names aren't yet official, so for now,
use them at your own risk.

Also, take this opportunity to add arguments for the currently unused
immediate field in those instructions.

llvm-svn: 323222
2018-01-23 17:02:02 +00:00
Dan Gohman f2c1cae5cb [WebAssembly] Switch to *-wasm as the default target triple.
This makes wasm32-unknown-unknown-wasm the default, which supports
the .o file writer and the new linking ABI. To enable s2wasm-compatible
output, use the wasm32-unknown-unknown-elf triple.

llvm-svn: 323220
2018-01-23 16:55:44 +00:00
Yaxun Liu d0820433b1 Verifier: fix bug treating debug info issue as non-debug info issue
Normally when llvm-as sees only debug info errors in LLVM assembly, it simply
drops the debug info and outputs a valid LLVM bitcode and returns 0.

There is a bug in LLVM verifier which incorrectly treats a debug info error
as non-debug info error, which causes llvm-as returns 1 even though llvm-as
can drop the invalid debug info and outputs a valid LLVM bitcode.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D42391

llvm-svn: 323216
2018-01-23 16:11:15 +00:00
Yaxun Liu 8b7454a8dd CodeGen: Fix assertion in ScheduleDAGMILive::scheduleMI due to llvm.dbg.value
Fix a bug in ScheduleDAGMILive::scheduleMI which causes BotRPTracker not tracking CurrentBottom in some rare cases involving llvm.dbg.value.

This issues causes amdgcn target to assert when compiling some user codes with -g.

Differential Revision: https://reviews.llvm.org/D42394

llvm-svn: 323214
2018-01-23 16:04:53 +00:00
Craig Topper c58c2b5c9b [X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and inserting into a vXi1 vector.
The existing code was already doing something very similar to subvector insertion so this allows us to remove the nearly duplicate code.

This patch is a little larger than it should be due to differences between the DQI handling between the two today.

llvm-svn: 323212
2018-01-23 15:56:36 +00:00
Simon Pilgrim 0c9f77a9f9 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the source vector is not larger than the destination
We might be able to support this in the future with VPERMV3, OR(PSHUFB, PSHUFB) etc.

llvm-svn: 323210
2018-01-23 15:51:03 +00:00
Simon Pilgrim 9b4a097f94 Use EVT::changeVectorElementTypeToInteger() to convert index type to integer
llvm-svn: 323207
2018-01-23 15:30:07 +00:00
Simon Pilgrim e2905c8a0c [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the index vector has the correct number of elements
llvm-svn: 323206
2018-01-23 15:13:37 +00:00
Tim Northover f9b560aa8e AArch64: get type from correct result when forming BFX
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one. Hopefully that's all of them.

llvm-svn: 323205
2018-01-23 15:11:27 +00:00
Tim Northover 9f3003d08f AArch64: get type from correct result when forming BFI/BFM
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one.

llvm-svn: 323202
2018-01-23 14:37:03 +00:00
Craig Topper 76adcc86cd [X86] Legalize v32i1 without BWI via splitting to v16i1 rather than the default of promoting to v32i8.
Summary:
For the most part its better to keep v32i1 as a mask type of a narrower width than trying to promote it to a ymm register.

I had to add some overrides to the methods that get the types for the calling convention so that we still use v32i8 for argument/return purposes.

There are still some regressions in here. I definitely saw some around shuffles. I think we probably should move vXi1 shuffle from lowering to a DAG combine where I think the extend and truncate we have to emit would be better combined.

I think we also need a DAG combine to remove trunc from (extract_vector_elt (trunc))

Overall this removes something like 13000 CHECK lines from lit tests.

Reviewers: zvi, RKSimon, delena, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42031

llvm-svn: 323201
2018-01-23 14:25:39 +00:00
Craig Topper c2df6409c7 [X86] Add missing MOVSX/MOVZX instructions to load folding tables.
I'm not sure there's any way to generate these folding cases especially the movzx ones since even the register form is never emitted by codegen.

I'm just adding them to remove the difference with the autogenerated version of the folding table.

llvm-svn: 323200
2018-01-23 14:09:22 +00:00
Serguei Katkov 17e5794f11 [CGP] Fix the GV handling in complex addressing mode
If in complex addressing mode the difference is in GV then
base reg should not be installed because we plan to use
base reg as a merge point of different GVs.

This is a fix for PR35980.

Reviewers: reames, john.brawn, santosh
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42230

llvm-svn: 323192
2018-01-23 12:07:49 +00:00
Simon Pilgrim 8ea1a0c690 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering
As detailed in rL317463, PSHUFB (like most variable shuffle instructions) uses Op[0] for the source vector and Op[1] for the shuffle index vector, VPERMV works in reverse which is probably where the confusion comes from.

Differential Revision: https://reviews.llvm.org/D42380

llvm-svn: 323190
2018-01-23 11:39:06 +00:00
MinSeong Kim 27f77b4300 [Analysis] Disable exp/exp2/pow finite lib calls on Android with -ffast-math.
Summary:
Since r322087, glibc's finite lib calls are generated when possible.
However, glibc is not supported on Android. Therefore this change
enables llvm to finely distinguish between linux and Android for
unsupported library calls. The change also include some regression
tests.

Reviewers: srhines, pirama

Reviewed By: srhines

Subscribers: kongyi, chh, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42288

llvm-svn: 323187
2018-01-23 11:11:36 +00:00
Stefan Maksimovic 98749e0249 [mips] Properly select abs and sqrt instructions
- Alter abs for micromips to have both AFGR64 and FGR64
  variants, same as sqrt
- Remove sqrt and abs from MicroMips32r6InstrInfo.td,
  use micromips FGR64 variants
- Restrict non-micromips abs/sqrt with NotInMicroMips
  predicate

Differential revision: https://reviews.llvm.org/D41439

llvm-svn: 323184
2018-01-23 10:09:39 +00:00
Ashutosh Nema 007b425b77 This change add's optimization remark in LoopVersioning LICM pass.
Summary:
This patch is adding remark messages to the LoopVersioning LICM pass, 
which will be useful for optimization remark emitter (ORE) infrastructure.

Patch by: Deepak Porwal

Reviewers: anemet, ashutosh.nema, eastig

Subscribers: eastig, vivekvpandya, fhahn, llvm-commits
llvm-svn: 323183
2018-01-23 09:47:28 +00:00
Anton Bikineev 82f61151b3 [InstSimplify] (X << Y) % X -> 0
llvm-svn: 323182
2018-01-23 09:27:47 +00:00
Craig Topper c92edd994e [X86] Don't reorder (srl (and X, C1), C2) if (and X, C1) can be matched as a movzx
Summary:
If we can match as a zero extend there's no need to flip the order to get an encoding benefit. As movzx is 3 bytes with independent source/dest registers. The shortest 'and' we could make is also 3 bytes unless we get lucky in the register allocator and its on AL/AX/EAX which have a 2 byte encoding.

This patch was more impressive before r322957 went in. It removed some of the same Ands that got deleted by that patch.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42313

llvm-svn: 323175
2018-01-23 05:45:52 +00:00
Craig Topper e5aea25980 [X86] Remove 'NOREX' comment from the printing of _NOREX instructions.
Some of the NOREX instructions are used in 32-bit mode making this printing confusing. It also doesn't provide a lot of value since you can see the h-register being used by the instruction.

llvm-svn: 323174
2018-01-23 05:37:00 +00:00
Craig Topper 26a701f24f [X86] Various vXi1 insertion improvements.
Add missing patterns for inserting v1i1 into a zero vector. Use insert_subvector to zero upper bits before inserting an element into a vXi1 vector. Replace kshift based isel pattern with insert_subvector based pattern now that code that caused the pattern has been fixed to emit insert_subvector.

llvm-svn: 323173
2018-01-23 05:36:53 +00:00
David Blaikie 0c64f5a2eb NewPM: Add an extension point for the start of the pipeline.
This applies to most pipelines except the LTO and ThinLTO backend
actions - it is for use at the beginning of the overall pipeline.

This extension point will be used to add the GCOV pass when enabled in
Clang.

llvm-svn: 323166
2018-01-23 01:25:20 +00:00
Sam Clegg 60ec30340f [WebAssembly] Store function index rather than table index in TABLE_INDEX relocations
Relocations of type R_WEBASSEMBLY_TABLE_INDEX represent places
where the table index for a given function is needed.  While the
value stored in this location is a table index, the index in
the relocation entry itself is a function index (the index of
the function which is to be called indirectly).

This is how is was spec'd originally but the LLVM implementation
didn't do this.  This makes things a little simpler in the linker
since the table in the input file can essentially be ignored that
the output table can be created purely based on these relocations.

Patch by Nicholas Wilson!

Differential Revision: https://reviews.llvm.org/D42080

llvm-svn: 323165
2018-01-23 01:23:17 +00:00
Rui Ueyama 322fcfee34 Revert r322595: Specify inline for isWhitespace in CommandLine.cpp
The original change was made based on a misunderstanding that
-DCMAKE_BUILD_TYPE=RelWithDebugInfo would produce the same executable
as -DCMAKE_BUILD_TYPE=Release modulo debug info. Turned out that's not
true -- it at least disables some optimizations such as function inlining.

llvm-svn: 323161
2018-01-22 23:27:50 +00:00
Chandler Carruth c58f2166ab Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre..
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
  __llvm_external_retpoline_r11
```
or on 32-bit:
```
  __llvm_external_retpoline_eax
  __llvm_external_retpoline_ecx
  __llvm_external_retpoline_edx
  __llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.

There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.

The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.

When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.

When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.

However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.

We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723

llvm-svn: 323155
2018-01-22 22:05:25 +00:00
Mark Searles 7687d42052 [AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64
- Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register.
- Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.

Differential Revision: https://reviews.llvm.org/D42124

llvm-svn: 323153
2018-01-22 21:46:43 +00:00
Dmitry Vyukov 68aab34f2d asan: allow inline instrumentation for the kernel
Currently ASan instrumentation pass forces callback
instrumentation when applied to the kernel.
This patch changes the current behavior to allow
using inline instrumentation in this case.

Authored by andreyknvl. Reviewed in:
https://reviews.llvm.org/D42384

llvm-svn: 323140
2018-01-22 19:07:11 +00:00
Evandro Menezes 312443fd83 [AArch64] Create a separate feature set for Exynos M3
Distinguish the features from Exynos M2.

llvm-svn: 323139
2018-01-22 19:03:26 +00:00
Joel Galenson 1d89cd2bb4 [ARM] Cleanup part of ARMBaseInstrInfo::optimizeCompareInstr (NFCI).
As noted in another review, this loop is confusing.  This commit cleans it up
somewhat.

Differential Revision: https://reviews.llvm.org/D42312

llvm-svn: 323136
2018-01-22 17:53:47 +00:00
Petar Jovanovic 29aced1bae [mips] add warnings for using dsp and msa flags with inappropriate revisions
Dsp and dspr2 require MIPS revision 2, while msa requires revision 5. Adding
warnings for cases when these flags are used with earlier revision.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D40490

llvm-svn: 323131
2018-01-22 16:43:30 +00:00
Ulrich Weigand 145d63f1ad [SystemZ] Fix bootstrap failure due to invalid DAG loop
The change in r322988 caused a failure in the bootstrap build bot.
The problem was that directly gluing a BR_CCMASK node to a
compare-and-swap could lead to issues if other nodes were
chained in between.  There is then no way to create a topological
sort that respects both the chain sequence and the glue property.

Fixed for now by rejecting the optimization in this case.  As a
future enhancement, we may be able to handle additional cases
by swapping chain links around.

llvm-svn: 323129
2018-01-22 15:41:49 +00:00
Marina Yatsina 811523cc08 Fix bug in commit 323096 exposed by test in test-suite-verify-machineinstrs-x86_64h-O3
Change-Id: I0a4b10d0d6c8de606d989c567ec07944ae283a87
llvm-svn: 323126
2018-01-22 15:31:05 +00:00
Sander de Smalen 7ab96f534c [AArch64][SVE] Asm: PTRUE and PTRUES instructions
Summary: These instructions initialize a predicate vector from a pattern/immediate.

Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover, samparker, olista01

Reviewed By: samparker

Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41819

llvm-svn: 323124
2018-01-22 15:29:19 +00:00
Carey Williams da15b5b116 [AArch64] optimise v4f16 fcmps to utilise vector instructions
Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported.
Generating FCTVL(s) rather than a longer series of FCVTs.

Differential Revision: https://reviews.llvm.org/D41772

llvm-svn: 323118
2018-01-22 14:16:11 +00:00
Eugene Leviant 28d8a49f42 [ThinLTO] Re-commit of dot dumper after test fix
llvm-svn: 323116
2018-01-22 13:35:40 +00:00
Marina Yatsina e4d63a499d Fixing warnings caused by commit 323095
Change-Id: I4e1f81db2f5382a820f4016c23b243e4d5aebf51
llvm-svn: 323114
2018-01-22 13:24:10 +00:00
Pavel Labath 9b36fd2541 Rename DwarfAcceleratorTable to AppleAcceleratorTable. NFC
This frees up the first name to be used as an base class for the
apple table and the dwarf5 .debug_names accel table. The rename  was
split off from D42297 (adding of debug_names support), which is still
under review.

llvm-svn: 323113
2018-01-22 13:17:23 +00:00
Simon Pilgrim 17682a86da [X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding (Reapplied)
Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.

Reapplied after rL322279 was reverted at rL322335 due to PR35918, underlying issue was fixed at rL322644.

llvm-svn: 323104
2018-01-22 12:05:17 +00:00
Sander de Smalen 245e0e67f3 [AArch64][SVE] Asm: Predicate patterns
Summary:
This patch adds support for parsing/printing of named or unnamed
patterns that are used in SVE's PTRUE instruction, amongst others.

The pattern can be specified as a named pattern to initialize the predicate
vector or it can be specified as an immediate in the range 0-31.

Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41818

llvm-svn: 323098
2018-01-22 10:46:00 +00:00
Marina Yatsina 77a21dbad4 Break false dependencies for POPCNT, LZCNT, TZCNT
Add POPCNT, LZCNT, TZCNT to the list of instructions that have false dependency.
Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions.
Update affected tests.

This fixes bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869

This is the final of multiple patches that fix this bugzilla.
Most of the patches are intended at refactoring the existent code.

Reviews of the refactoring done to enable this change:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333

Differential Revision: https://reviews.llvm.org/D40334

Change-Id: If95cbf1a3f5c7dccff8f1b22ecb397542147303d
llvm-svn: 323096
2018-01-22 10:07:01 +00:00
Marina Yatsina 0bf841ac2a Separate LoopTraversal, ReachingDefAnalysis and BreakFalseDeps into their own files.
This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40333

Change-Id: Ie5f8eb34d98cfdfae23a3072eb69b5794f0e2d56
llvm-svn: 323095
2018-01-22 10:06:50 +00:00
Marina Yatsina 3d8efa4f0c Rename ExecutionDepsFix files to ExecutionDomainFix
This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40332

Change-Id: I6a048cca7fdafbfc42fb1bac94343e483befded8
llvm-svn: 323094
2018-01-22 10:06:33 +00:00
Marina Yatsina c757c0b6ba ExecutionDepsFix refactoring:
- clang-format

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: I131b126af13bc743bc5d69d83699e52b9b720979
llvm-svn: 323093
2018-01-22 10:06:18 +00:00
Marina Yatsina 30234e866a ExecutionDepsFix refactoring:
- Moving comments to class definition in header file
- Changing comments to doxygen style
- Rephrase loop traversal explaining comment

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: I9a12618db5b66128611fa71b54a233414f6012ac
llvm-svn: 323092
2018-01-22 10:06:10 +00:00
Marina Yatsina 273d35db4d ExecutionDepsFix refactoring:
- Removing LiveRegs

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: I8ab56d99951a6d6981542f68d94c1f624f3c9fbf
llvm-svn: 323091
2018-01-22 10:06:01 +00:00
Marina Yatsina 6daa2d25e8 ExecutionDepsFix refactoring:
- Changing LiveRegs to be a vector

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: I9cdd364bd7bf2a0bf61ea41a48d4bd310ec3bce4
llvm-svn: 323090
2018-01-22 10:05:53 +00:00
Marina Yatsina 0f9cda52f5 ExecutionDepsFix refactoring:
- Changing DenseMap<MBB*, LiveReg*> to SmallVector<LiveReg*>
- Now the MBB number will be the index of LiveReg in the vector.
- Adding asserts

This patch is NFC.

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: If4a3f141693d0361ddb292432337dbb63a1e69ee
llvm-svn: 323089
2018-01-22 10:05:45 +00:00
Marina Yatsina 6b34e63ebd ExecutionDepsFix refactoring:
- Remove unneeded includes and unneeded members
- Use range iterators
- Variable renaming, typedefs, extracting constants
- Removing {} from one line ifs

This patch is NFC.

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: Ib59060ab3fa5bee3bf2ca2045c24e572635ee7f6
llvm-svn: 323088
2018-01-22 10:05:37 +00:00
Marina Yatsina 6fc2aaae8d Separate ExecutionDepsFix into 4 parts:
1. ReachingDefsAnalysis - Allows to identify for each instruction what is the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creatws a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.

This also included the following changes to ExcecutionDepsFix original logic:
1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.

Additional changes in affected files:
1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.

Additional refactoring changes will follow.

This commit is (almost) NFC.
The only functional change is that now BreakFalseDeps will break dependency for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.

This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40330

Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
llvm-svn: 323087
2018-01-22 10:05:23 +00:00
Pavel Labath 44197df7a3 [BinaryFormat] Add .debug_names support
Summary:
This adds a definition of the .debug_names section and the new constants
(DW_IDX_???) which are used in it.

Reviewers: JDevlieghere, aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42296

llvm-svn: 323084
2018-01-22 09:41:36 +00:00
Serguei Katkov f38041dc3e Revert [SCEV] Fix isLoopEntryGuardedByCond usage
It causes buildbot failures. New added assert is fired.
It seems not all usages of isLoopEntryGuardedByCond are fixed.

llvm-svn: 323079
2018-01-22 07:47:02 +00:00
Serguei Katkov 50714a1cbc [SCEV] Fix isLoopEntryGuardedByCond usage
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.

Reviewers: sanjoy, mkazantsev, anna, dorit
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42165

llvm-svn: 323077
2018-01-22 07:31:41 +00:00
Hiroshi Inoue 290adb3184 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323074
2018-01-22 05:54:46 +00:00
Lang Hames 635fd9092b [ORC] Add orc::SymbolResolver, a Orc/Legacy API interop header, and an
orc::SymbolResolver to JITSymbolResolver adapter.

The new orc::SymbolResolver interface uses asynchronous queries for better
performance. (Asynchronous queries with bulk lookup minimize RPC/IPC overhead,
support parallel incoming queries, and expose more available work for
distribution). Existing ORC layers will soon be updated to use the
orc::SymbolResolver API rather than the legacy llvm::JITSymbolResolver API.

Because RuntimeDyld still uses JITSymbolResolver, this patch also includes an
adapter that wraps an orc::SymbolResolver with a JITSymbolResolver API.

llvm-svn: 323073
2018-01-22 03:00:31 +00:00
Sanjay Patel 9530f18864 [InstCombine] (X << Y) / X -> 1 << Y
...when the shift is known to not overflow with the matching
signed-ness of the division.

This closes an optimization gap caused by canonicalizing mul
by power-of-2 to shl as shown in PR35709:
https://bugs.llvm.org/show_bug.cgi?id=35709

Patch by Anton Bikineev!

Differential Revision: https://reviews.llvm.org/D42032

llvm-svn: 323068
2018-01-21 16:14:51 +00:00
Eugene Leviant 72b9bdb71a Temporarily revert r323062 to investigate buildbot failures
llvm-svn: 323065
2018-01-21 10:22:19 +00:00
Eugene Leviant 453c976a63 [ThinLTO] Implement summary visualizer
Differential revision: https://reviews.llvm.org/D41297

llvm-svn: 323062
2018-01-21 07:27:32 +00:00
Lang Hames 5ff5a30bee [ORC] Add a lookupFlags method to VSO.
lookupFlags returns a SymbolFlagsMap for the requested symbols, along with a
set containing the SymbolStringPtr for any symbol not found in the VSO.

The JITSymbolFlags for each symbol will have been stripped of its transient
JIT-state flags (i.e. NotMaterialized, Materializing).

Calling lookupFlags does not trigger symbol materialization.

llvm-svn: 323060
2018-01-21 03:20:39 +00:00
Lang Hames 86edf53664 [ORC] More cleanup. NFC.
llvm-svn: 323059
2018-01-21 03:20:36 +00:00
Lang Hames 0d7d442699 [ORC] Cleanup. NFC.
llvm-svn: 323057
2018-01-21 02:24:45 +00:00
Philip Reames f57714c3c7 [DSE] Factor out common code [NFC]
We already had the pointer being stored to in the MemLoc, reuse that code.  In merging cases, it turned out the interface of the getLocForWrite had become inconsitent with other related utilities.  Fix that by making sure the input passes hasAnalyzableWrite as well.

llvm-svn: 323056
2018-01-21 02:10:54 +00:00
Philip Reames 424e7a1174 [DSE] Minor rename for clarity sake [NFC]
llvm-svn: 323055
2018-01-21 01:44:33 +00:00
Craig Topper 7fddf2bfef [X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations
Summary:
This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx.

We still need a follow patch to stop moving ands across srl if the and could be represented as a movzx before the shift but not after. I think this should help with some of the cases that D42088 ended up removing during isel.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42265

llvm-svn: 323048
2018-01-20 18:50:09 +00:00
Simon Pilgrim 89540d9665 [X86][SSE] Check for out of bounds PEXTR/PINSR indices during faux shuffle combining.
llvm-svn: 323045
2018-01-20 17:16:01 +00:00
Jonas Paulsson 7ad28863fb [SelectionDAG] Fix codegen of vector stores with non byte-sized elements.
This was completely broken, but hopefully fixed by this patch.

In cases where it is needed, a vector with non byte-sized elements is stored
by extracting, zero-extending, shift:ing and or:ing the elements into an
integer of the same width as the vector, which is then stored.

Review: Eli Friedman, Ulrich Weigand
https://reviews.llvm.org/D42100#inline-369520
https://bugs.llvm.org/show_bug.cgi?id=35520

llvm-svn: 323042
2018-01-20 16:05:10 +00:00
Martin Storsjo f641d0d4f2 [COFF] Keep the underscore on exported decorated stdcall functions in MSVC mode
This (together with the corresponding LLD commit, that contains the
testcase updates) fixes PR35733.

Differential Revision: https://reviews.llvm.org/D41631

llvm-svn: 323035
2018-01-20 11:44:32 +00:00
Saleem Abdulrasool 99f479abcf CodeGen: handle llvm.used properly for COFF
`llvm.used` contains a list of pointers to named values which the
compiler, assembler, and linker are required to treat as if there is a
reference that they cannot see.  Ensure that the symbols are preserved
by adding an explicit `-include` reference to the linker command.

llvm-svn: 323017
2018-01-20 00:28:02 +00:00
Craig Topper 08bd14803c [X86] Teach X86 codegen to use vector width preference to avoid promoting to 512-bit types when VLX is enabled and the preference is for a smaller size.
This change applies to places where we would turn 128/256-bit code into 512-bit in order to get a wider element type through sext/zext. Any 512-bit types that already existed in the IR/DAG will be left that way.

The width preference has no effect on codegen behavior when the target does not have AVX512 enabled. So AVX/AVX2 codegen cannot be limited via this mechanism yet.

If the preference is lower than 256 we may still use a 256 bit type to do the operation. Constraining to 128 bits makes it much more difficult to support some operations. For many of these cases we need to change element width while keeping element count constant which is easiest done by switching between 256 and 128 bit.

The preference is only obeyed when AVX512 and VLX are available. This means the preference is not obeyed for KNL, but is obeyed for SKX, Cannonlake, and Icelake. For KNL, the only way to do masked operation is on 512-bit registers so we would have to completely disable masking to obey the preference. We would also lose support for gather, scatter, ctlz, vXi64 multiplies, etc. This may change in the future, but this simplifies the initial implementation.

Differential Revision: https://reviews.llvm.org/D41895

llvm-svn: 323016
2018-01-20 00:26:12 +00:00