Commit Graph

167090 Commits

Author SHA1 Message Date
Jonas Devlieghere 743d351120 [dsymutil] Add support for generating DWARF5 accelerator tables.
This patch adds support for emitting DWARF5 accelerator tables
(.debug_names) from dsymutil. Just as with the Apple style accelerator
tables, it's possible to update existing dSYMs. This patch includes a
test that shows how you can convert back and forth between the two types.

If no kind of table is specified, dsymutil will default to generating
Apple-style accelerator tables whenever it finds those in its input. The
same is true when there are no accelerator tables at all. Finally, in
the remaining case, where there's at least one DWARF v5 table and no
Apple ones, the output will contain a DWARF accelerator table
(.debug_names).
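
The default selection described above can be summarized in a few lines. This is only a sketch of the stated rules: `AccelTableKind`, `InputTables`, and `pickDefaultKind` are hypothetical names chosen for illustration and are not taken from dsymutil's sources.

```cpp
#include <cassert>

// Hypothetical names; this only mirrors the rules stated in the message.
enum class AccelTableKind { Apple, Dwarf };

struct InputTables {
  bool HasAppleTables;   // at least one Apple-style table in the input
  bool HasDwarf5Tables;  // at least one .debug_names table in the input
};

// Default used when no table kind was requested explicitly.
constexpr AccelTableKind pickDefaultKind(InputTables In) {
  if (In.HasAppleTables)
    return AccelTableKind::Apple;  // Apple tables present: keep Apple style
  if (!In.HasDwarf5Tables)
    return AccelTableKind::Apple;  // no accelerator tables at all
  return AccelTableKind::Dwarf;    // only DWARF v5 tables: emit .debug_names
}
```

Under these assumptions, an input that mixes both kinds still defaults to Apple-style tables, because the Apple check comes first.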

Differential revision: https://reviews.llvm.org/D49137

llvm-svn: 337980
2018-07-25 23:01:38 +00:00
Yonghong Song 71d81e5c8f bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in order
Some BPF JIT backends would want to optimize memcpy in their own
architecture specific way.

However, at the moment, there is no way for JIT backends to see memcpy
semantics in a reliable way. This is because the LLVM BPF backend expands
memcpy into load/store sequences and could possibly schedule them further
apart from each other. So, BPF JIT backends inside the kernel can't reliably
recognize memcpy semantics by peepholing the BPF sequence.

This patch introduces a new intrinsic expansion infrastructure for memcpy.

To get a stable in-order load/store sequence from memcpy, we first lower
memcpy into a BPF::MEMCPY node, which is then expanded into in-order
load/store sequences in the expandPostRAPseudo pass, which happens after
instruction scheduling. This way, kernel JIT backends can reliably recognize
memcpy by scanning the BPF sequence.

This new memcpy expand infrastructure is gated by a new option:

  -bpf-expand-memcpy-in-order

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 337977
2018-07-25 22:40:02 +00:00
Eli Friedman d6baff65f7 [GlobalMerge] Handle llvm.compiler.used correctly.
Reuse the handling for llvm.used, and don't transform such globals.

Fixes a failure on the asan buildbot caused by my previous commit.

llvm-svn: 337973
2018-07-25 22:03:35 +00:00
Sanjay Patel 215dcbf4db [SelectionDAG] try to convert funnel shift directly to rotate if legal
If the DAGCombiner's rotate matching was working as expected, 
I don't think we'd see any test diffs here. 

This sidesteps the issue of custom lowering for rotates raised in PR38243:
https://bugs.llvm.org/show_bug.cgi?id=38243
...by only dealing with legal operations.
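
For context, a funnel shift whose two data operands are the same value is a rotate, which is why a direct conversion is possible at all. The following is a minimal sketch on 32-bit integers; `fshl32` and `rotl32` are illustrative helper names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Funnel shift left on 32-bit values: concatenate Hi:Lo, shift left by
// (Amt mod 32), and keep the upper 32 bits of the 64-bit result.
constexpr uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint64_t Concat = (static_cast<uint64_t>(Hi) << 32) | Lo;
  return static_cast<uint32_t>((Concat << (Amt % 32)) >> 32);
}

// Rotate left, written so that a shift by 32 never occurs.
constexpr uint32_t rotl32(uint32_t X, uint32_t Amt) {
  Amt %= 32;
  return Amt == 0 ? X : (X << Amt) | (X >> (32 - Amt));
}
```

With both data operands equal, `fshl32(X, X, Amt)` and `rotl32(X, Amt)` agree for every shift amount.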

llvm-svn: 337966
2018-07-25 21:38:30 +00:00
Roman Tereshin 4f10a9d3a3 [LSV] Look through selects for consecutive addresses
In some cases LSV sees (load/store _ (select _ <pointer expression>
<pointer expression>)) patterns in input IR, often due to sinking and
other forms of CFG simplification, sometimes interspersed with
bitcasts and all-constant-indices GEPs. With this patch, the
`areConsecutivePointers` method attempts to handle select
instructions. This leads to an increased number of successful
vectorizations.

Technically, select instructions could appear in index arithmetic as
well, however, we don't see those in our test suites / benchmarks.
Also, there is a lot more freedom in IR shapes computing integral
indices in general than in what's common in pointer computations, and
it appears that it's quite unreliable to do anything short of making
select instructions first class citizens of Scalar Evolution, which
for the purposes of this patch is most definitely overkill.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49428

llvm-svn: 337965
2018-07-25 21:33:00 +00:00
Sanjay Patel f94c4c84e6 [AArch, PowerPC] add more tests for legal rotate ops; NFC
llvm-svn: 337964
2018-07-25 21:25:50 +00:00
Eli Friedman 0887cf9cab [GlobalMerge] Allow merging globals with arbitrary alignment.
Instead of depending on implicit padding from the structure layout code,
use a packed struct and emit the padding explicitly.

Differential Revision: https://reviews.llvm.org/D49710

llvm-svn: 337961
2018-07-25 20:58:01 +00:00
Florian Hahn b6613ac665 Revert r337904: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
I suspect it is causing the clang-stage2-Rthinlto failures.

llvm-svn: 337956
2018-07-25 19:44:19 +00:00
Martin Storsjo d78b394543 Add missing 'override', fixing compilation with some compilers since SVN r337950
llvm-svn: 337952
2018-07-25 19:01:36 +00:00
Martin Storsjo ff33a95ed4 [COFF] Use comdat shared constants for MinGW as well
GNU binutils tools have no problems with this kind of shared constants,
provided that we actually hook it up completely in AsmPrinter and
produce a global symbol.

This effectively reverts SVN r335918 by hooking the rest of it up
properly.

This feature was implemented originally in SVN r213006, with no reason
for why it can't be used for MinGW other than the fact that GCC doesn't
do it while MSVC does.

Differential Revision: https://reviews.llvm.org/D49646

llvm-svn: 337951
2018-07-25 18:35:42 +00:00
Martin Storsjo d2662c32fb [COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name only
was used for msvc targets) into the arch independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.

With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues with both GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:

fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4

Differential Revision: https://reviews.llvm.org/D49644

llvm-svn: 337950
2018-07-25 18:35:31 +00:00
Eli Friedman 0f522bdbac [LangRef] Clarify undefined behavior for function attributes.
Violating the invariants specified by attributes is undefined behavior.
Maybe we could use poison instead for some of the parameter attributes,
but I don't think it's worthwhile.

Differential Revision: https://reviews.llvm.org/D49041

llvm-svn: 337947
2018-07-25 18:26:38 +00:00
Eli Friedman 733f4ed1bb [ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.
Saves materializing the immediate for the "ands".

Corresponding patterns exist for lsrs+lsls, but that seems less common
in practice.

Now implemented as a DAGCombine.
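
As an illustration of the lsrs+ands case, extracting a bit field with a shift and a mask can be rewritten as two shifts, so no mask immediate is needed. This is a sketch of the arithmetic identity only, assuming 32-bit registers; the function names are made up for the example, and the exact patterns the DAGCombine matches may differ.

```cpp
#include <cassert>
#include <cstdint>

// (X >> Shift) & Mask, where Mask selects the low Width bits.
// The mask has to be materialized as an immediate for the AND.
constexpr uint32_t viaShiftAndMask(uint32_t X, unsigned Shift, unsigned Width) {
  uint32_t Mask = (Width == 32) ? 0xFFFFFFFFu : ((1u << Width) - 1u);
  return (X >> Shift) & Mask;
}

// Same value using only two shifts: move the field to the top of the
// register (lsls), then shift it down to bit 0 (lsrs).
// Precondition: Width >= 1 and Shift + Width <= 32.
constexpr uint32_t viaTwoShifts(uint32_t X, unsigned Shift, unsigned Width) {
  return (X << (32 - Shift - Width)) >> (32 - Width);
}
```

For example, the 12-bit field starting at bit 4 of `0xDEADBEEF` is `0xBEE` with either form.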

Differential Revision: https://reviews.llvm.org/D49585

llvm-svn: 337945
2018-07-25 18:22:22 +00:00
Roman Tereshin ed047b0184 [SCEV] Add [zs]ext{C,+,x} -> (D + [zs]ext{C-D,+,x})<nuw><nsw> transform
as well as sext(C + x + ...) -> (D + sext(C-D + x + ...))<nuw><nsw>
similar to the equivalent transformation for zext's

if the top level addition in (D + (C-D + x * n)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x * n), ensuring homogeneous behaviour of the
transformation and better canonicalization of such AddRec's

(indeed, there are 2^(2w) different expressions in `B1 + ext(B2 + Y)` form for
the same Y, but only 2^(2w - k) different expressions in the resulting `B3 +
ext((B4 * 2^k) + Y)` form, where w is the bit width of the integral type)

This patch generalizes sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformations added in
r209568 relaxing the requirements the following way:

1. C2 doesn't have to be a power of 2, it's enough if it's divisible by 2
 a sufficient number of times;
2. C1 doesn't have to be less than C2, instead of extracting the entire
  C1 we can split it into 2 terms: (00...0XXX + YY...Y000), keep the
  second one that may cause wrapping within the extension operator, and
  move the first one that doesn't affect wrapping out of the extension
  operator, enabling further simplifications;
3. C1 and C2 don't have to be positive, splitting C1 like shown above
 produces a sum that is guaranteed to not wrap, signed or unsigned;
4. in AddExpr case there could be more than 2 terms, and in case of
  AddExpr the 2nd and following terms and in case of AddRecExpr the
  Step component don't have to be in the C2*X form or constant
  (respectively), they just need to have enough trailing zeros,
  which in turn could be guaranteed by means other than arithmetics,
  e.g. by a pointer alignment;
5. the extension operator doesn't have to be a sext, the same
  transformation works and is profitable for zext's as well.

Apparently, optimizations like SLPVectorizer currently fail to
vectorize even rather trivial cases like the following:

 double bar(double *a, unsigned n) {
   double x = 0.0;
   double y = 0.0;
   for (unsigned i = 0; i < n; i += 2) {
     x += a[i];
     y += a[i + 1];
   }
   return x * y;
 }

If compiled with `clang -std=c11 -Wpedantic -Wall -O3 main.c -S -o - -emit-llvm`
(!{!"clang version 7.0.0 (trunk 337339) (llvm/trunk 337344)"})

it produces scalar code with the loop not unrolled with the unsigned `n` and
`i` (like shown above), but vectorized and unrolled loop with signed `n` and
`i`. With the changes made in this commit the unsigned version will be
vectorized (though not unrolled for unclear reasons).

How it all works:

Let's say we have an AddExpr that looks like (C + x + y + ...), where C
is a constant and x, y, ... are arbitrary SCEVs. Let's compute the
minimum number of trailing zeroes guaranteed of that sum w/o the
constant term: (x + y + ...). If, for example, those terms look like
follows:

        i
XXXX...X000
YYYY...YY00
   ...
ZZZZ...0000

then the rightmost non-guaranteed-zero bit (a potential one at i-th
position above) can change the bits of the sum to the left (and at
i-th position itself), but it can not possibly change the bits to the
right. So we can compute the number of trailing zeroes by taking a
minimum between the numbers of trailing zeroes of the terms.
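
The minimum rule can be sketched on 8-bit values as follows. The helper names are hypothetical and the code works on concrete numbers rather than SCEV expressions; note that the minimum is only a guaranteed lower bound, since the actual sum may end in more zeroes.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Number of trailing zero bits of an 8-bit value (8 for zero).
constexpr unsigned trailingZeros8(uint8_t V) {
  unsigned N = 0;
  while (N < 8 && ((V >> N) & 1u) == 0)
    ++N;
  return N;
}

// Lower bound on the trailing zeros of a sum: the minimum over the terms.
constexpr unsigned minTrailingZeros(std::initializer_list<uint8_t> Terms) {
  unsigned Min = 8;
  for (uint8_t T : Terms)
    if (trailingZeros8(T) < Min)
      Min = trailingZeros8(T);
  return Min;
}
```

For the terms `0x18`, `0x0C`, and `0x30` the bound is 2, and their sum `0x54` indeed ends in two zero bits.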

Now let's say that our original sum with the constant is effectively
just C + X, where X = x + y + .... Let's also say that we've got 2
guaranteed trailing zeros for X:

         j
CCCC...CCCC
XXXX...XX00  // this is X = (x + y + ...)

Any bit of C to the left of j may in the end cause the C + X sum to
wrap, but the rightmost 2 bits of C (at positions j and j - 1) do not
affect wrapping in any way. If the upper bits cause a wrap, it will be
a wrap regardless of the values of the 2 least significant bits of C.
If the upper bits do not cause a wrap, it won't be a wrap regardless
of the values of the 2 bits on the right (again).

So let's split C to 2 constants like follows:

0000...00CC  = D
CCCC...CC00  = (C - D)

and represent the whole sum as D + (C - D + X). The second term of
this new sum looks like this:

CCCC...CC00
XXXX...XX00
-----------  // let's add them up
YYYY...YY00

The sum above (let's call it Y) may or may not wrap, we don't know,
so we need to keep it under a sext/zext. Adding D to that sum though
will never wrap, signed or unsigned, if performed on the original bit
width or the extended one, because all that that final add does is
setting the 2 least significant bits of Y to the bits of D:

YYYY...YY00 = Y
0000...00CC = D
-----------  <nuw><nsw>
YYYY...YYCC

Which means we can safely move that D out of the sext or zext and
claim that the top-level sum neither sign wraps nor unsigned wraps.

Let's run an example, let's say we're working in i8's and the original
expression (zext's or sext's operand) is 21 + 12x + 8y. So it goes
like this:

0001 0101  // 21
XXXX XX00  // 12x
YYYY Y000  // 8y

0001 0101  // 21
ZZZZ ZZ00  // 12x + 8y

0000 0001  // D
0001 0100  // 21 - D = 20
ZZZZ ZZ00  // 12x + 8y

0000 0001  // D
WWWW WW00  // 21 - D + 12x + 8y = 20 + 12x + 8y

therefore zext(21 + 12x + 8y) = (1 + zext(20 + 12x + 8y))<nuw><nsw>
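
The worked example can be checked exhaustively for i8. The sketch below models the inner sums as wrapping 8-bit arithmetic and the zext as a widening to 16 bits; `splitHoldsForAllI8` is a name invented for this check and is not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Checks zext(21 + 12x + 8y) == 1 + zext(20 + 12x + 8y) for every i8 x, y,
// with the inner sums wrapping modulo 2^8 and the zext going to 16 bits.
inline bool splitHoldsForAllI8() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned Y = 0; Y < 256; ++Y) {
      uint8_t Original = static_cast<uint8_t>(21 + 12 * X + 8 * Y);
      uint8_t Inner = static_cast<uint8_t>(20 + 12 * X + 8 * Y);
      // Inner always ends in 00, so adding D = 1 only sets bit 0: no carry.
      if ((Inner & 3u) != 0)
        return false;
      uint16_t Lhs = Original;                           // zext(original)
      uint16_t Rhs = static_cast<uint16_t>(1u + Inner);  // 1 + zext(inner)
      if (Lhs != Rhs)
        return false;
    }
  }
  return true;
}
```

Because the inner value is always a multiple of 4, adding D = 1 can neither carry out of the 8-bit width nor change the sign bit, which is what the `<nuw><nsw>` flags on the outer add express.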

This approach could be improved if we move away from using trailing
zeroes and use KnownBits instead. For instance, with KnownBits we could
have the following picture:

    i
10 1110...0011  // this is C
XX X1XX...XX00  // this is X = (x + y + ...)

Notice that some of the bits of X are known ones, also notice that
known bits of X are interspersed with unknown bits and not grouped on
the right or left.

We can see at the position i that C(i) and X(i) are both known ones,
therefore the (i + 1)th carry bit is guaranteed to be 1 regardless of
the bits of C to the right of i. For instance, the C(i - 1) bit only
affects the bits of the sum at positions i - 1 and i, and does not
influence if the sum is going to wrap or not. Therefore we could split
the constant C the following way:

    i
00 0010...0011  = D
10 1100...0000  = (C - D)

Let's compute the KnownBits of (C - D) + X:

XX1 1            = carry bit, blanks stand for known zeroes
 10 1100...0000  = (C - D)
 XX X1XX...XX00  = X
--- -----------
 XX X0XX...XX00

Whether this add wraps or not essentially depends on the bits of X. Adding D
to this sum, however, is guaranteed not to wrap:

0    X
 00 0010...0011  = D
 sX X0XX...XX00  = (C - D) + X
--- -----------
 sX XXXX   XX11

As could be seen above, adding D preserves the sign bit of (C - D) +
X, if any, and has a guaranteed 0 carry out, as expected.

The more bits of (C - D) we constrain, the better the transformations
introduced here canonicalize expressions as it leaves less freedom to
what values the constant part of ((C - D) + x + y + ...) can take.

Reviewed By: mzolotukhin, efriedma

Differential Revision: https://reviews.llvm.org/D48853

llvm-svn: 337943
2018-07-25 18:01:41 +00:00
Stella Stamenova bb9fd461a9 [windows] Don't inline fieldFromInstruction on Windows
Summary:
The VS compiler (on Windows) has a bug which results in fieldFromInstruction being optimized out in some circumstances. This only happens in *release no debug info* builds that have assertions *turned off* - in all other situations the function is not inlined, so the functionality is correct. All of the bots have assertions turned on, so this path is not regularly tested. The workaround is to not inline the function on Windows - if the bug is fixed in a later release of the VS compiler, the noinline specification can be removed.

The test that consistently reproduces this is Lanai v11.txt test.

Reviewers: asmith, labath, zturner

Subscribers: dblaikie, stella.stamenova, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D49753

llvm-svn: 337942
2018-07-25 17:33:20 +00:00
Xinliang David Li 45a607e563 Add an option to specify the name of
a function whose CFG is to be viewed/printed.

Differential Revision: https://reviews.llvm.org/D49447

llvm-svn: 337940
2018-07-25 17:22:12 +00:00
Ulrich Weigand 5f75371c5d Fix corruption of result number in LegalizeVectorOps.cpp
When VectorLegalizer::LegalizeOp creates a new SDValue after iterating
over its arguments, we need to refer to the same result number of the
new node that the original value used.

Reviewed by: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D49805

llvm-svn: 337939
2018-07-25 17:08:13 +00:00
Stanislav Mekhanoshin 7e7268ac1c [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion
Differential Revision: https://reviews.llvm.org/D49761

llvm-svn: 337938
2018-07-25 17:02:11 +00:00
Stanislav Mekhanoshin b8269a9589 Fix llvm::ComputeNumSignBits with some operations and llvm.assume
Currently ComputeNumSignBits does early exit while processing some
of the operations (add, sub, mul, and select). This prevents the
function from using AssumptionCacheTracker if passed.

Differential Revision: https://reviews.llvm.org/D49759

llvm-svn: 337936
2018-07-25 16:39:24 +00:00
Pavel Labath da3c4fb5fe Revert "dwarfgen: Add support for generating the debug_str_offsets section, take 2"
This reverts commit r337933. The build error is fixed but the test now
fails on the darwin buildbots. Investigating...

llvm-svn: 337935
2018-07-25 16:34:43 +00:00
Krzysztof Parzyszek 4e07509d18 [Hexagon] Properly scale bit index when extracting elements from vNi1
For example v = <2 x i1> is represented as bbbbaaaa in a predicate register,
where b = v[1], a = v[0]. Extracting v[1] is equivalent to extracting bit 4
from the predicate register.
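
The scaling can be written out as a small sketch. It assumes an 8-bit predicate register, as in the bbbbaaaa picture above, and an element count that divides 8; `predicateBitIndex` and `extractPredicateElt` are illustrative names, not functions from the Hexagon backend.

```cpp
#include <cassert>
#include <cstdint>

// A predicate register is modeled here as 8 bits. A vector of NumElts i1
// values gives each element 8 / NumElts consecutive bits, so element Idx
// starts at bit Idx * (8 / NumElts).
constexpr unsigned predicateBitIndex(unsigned NumElts, unsigned Idx) {
  return Idx * (8 / NumElts);
}

// Reads element Idx by testing the first bit of its group.
constexpr bool extractPredicateElt(uint8_t Pred, unsigned NumElts, unsigned Idx) {
  return ((Pred >> predicateBitIndex(NumElts, Idx)) & 1u) != 0;
}
```

For `<2 x i1>`, element 1 maps to bit 4, matching the example in the message.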

llvm-svn: 337934
2018-07-25 16:20:59 +00:00
Pavel Labath 78ab659bb4 dwarfgen: Add support for generating the debug_str_offsets section, take 2
This recommits r337910 after fixing an "ambiguous call to addAttribute"
error with some compilers (gcc circa 4.9 and MSVC). It seems that these
compilers will consider a "false -> pointer" conversion during overload
resolution. This creates ambiguity because I added an overload which
takes a MCExpr * as an argument.

I fix this by making the new overload take MCExpr&, which avoids the
conversion. It also documents the fact that we expect a valid MCExpr
object.

Original commit message follows:

The motivation for this is D49493, where we'd like to test details of
debug_str_offsets behavior which is difficult to trigger from a
traditional test.

This adds the plumbing necessary for dwarfgen to generate this section.
The more interesting changes are:
- I've moved emitStringOffsetsTableHeader function from DwarfFile to
  DwarfStringPool, so I can generate the section header more easily from
  the unit test.
- added a new addAttribute overload taking an MCExpr*. This is used to
  generate the DW_AT_str_offsets_base, which links a compile unit to the
  offset table.

I've also added a basic test for reading and writing DW_FORM_strx forms.

Reviewers: dblaikie, JDevlieghere, probinson

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D49670

llvm-svn: 337933
2018-07-25 15:33:32 +00:00
Andres Freund ee10ce7137 Move JIT listener C binding fallbacks to ExecutionEngineBindings.cpp.
Initially, in https://reviews.llvm.org/D44890, I had these defined as
empty functions inside the header when the respective event listener
was not built in. As done in that commit, that wasn't correct, because
it was an ODR violation.  Krasimir hot-fixed that in r333265, but that
wasn't quite right either, because it'd lead to the symbol not being
available.

Instead just move the fallbacks to ExecutionEngineBindings.cpp. Could
define them as static inlines in the header too, but I don't think it
matters.

Reviewers: whitequark

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49654

llvm-svn: 337930
2018-07-25 15:04:57 +00:00
Pavel Labath b4e17c29dd Revert "dwarfgen: Add support for generating the debug_str_offsets section"
This reverts commit r337910 as it's generating "ambiguous call to
addAttribute" errors on some bots.

Will resubmit once I get a chance to look into the problem.

llvm-svn: 337924
2018-07-25 12:52:30 +00:00
Petar Jovanovic 58c0210023 [MIPS GlobalISel] Lower pointer arguments
Add support for lowering pointer arguments.
Changing type from pointer to integer is already done in
MipsTargetLowering::getRegisterTypeForCallingConv.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D49419

llvm-svn: 337912
2018-07-25 12:35:01 +00:00
Pavel Labath 7a59e3bf37 dwarfgen: Add support for generating the debug_str_offsets section
Summary:
The motivation for this is D49493, where we'd like to test details of
debug_str_offsets behavior which is difficult to trigger from a
traditional test.

This adds the plumbing necessary for dwarfgen to generate this section.
The more interesting changes are:
- I've moved emitStringOffsetsTableHeader function from DwarfFile to
  DwarfStringPool, so I can generate the section header more easily from
  the unit test.
- added a new addAttribute overload taking an MCExpr*. This is used to
  generate the DW_AT_str_offsets_base, which links a compile unit to the
  offset table.

I've also added a basic test for reading and writing DW_FORM_strx forms.

Reviewers: dblaikie, JDevlieghere, probinson

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D49670

llvm-svn: 337910
2018-07-25 11:55:59 +00:00
Jonas Paulsson 374af8070e [SystemZ] Use tablegen loops in SchedModels
NFC changes to make scheduler TableGen files more readable, by using loops
instead of a lot of similar defs with just e.g. a latency value that changes.

https://reviews.llvm.org/D49598
Review: Ulrich Weigand, Javed Abshar

llvm-svn: 337909
2018-07-25 11:42:55 +00:00
Florian Hahn 6f5c6adbcd Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
r337828 resolves a PredicateInfo issue with unnamed types.

Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

llvm-svn: 337904
2018-07-25 11:13:40 +00:00
Thomas Preud'homme 768d6ce4a3 Fix PR34170: Crash on inline asm with 64bit output in 32bit GPR
Add support for inline assembly with output operands that do not
naturally go in the register class they are constrained to (e.g. a double
in a 32-bit GPR as in the PR).

llvm-svn: 337903
2018-07-25 11:11:12 +00:00
Paul Semel 0913dcd747 [llvm-objdump] Add dynamic section printing to private-headers option
Differential Revision: https://reviews.llvm.org/D49016

llvm-svn: 337902
2018-07-25 11:09:20 +00:00
Paul Semel 5ce8f1598c [llvm-readobj] Generic hex-dump option
Helpers are available to make this option file format independent. This
patch adds the feature for Wasm file format. It doesn't change the
behavior of the other file format handling.

Differential Revision: https://reviews.llvm.org/D49545

llvm-svn: 337896
2018-07-25 10:04:37 +00:00
Chandler Carruth 4f6481dc81 [x86/SLH] Sink the return hardening into the main block-walk + hardening
code.

This consolidates all our hardening calls, and simplifies the code
a bit. It seems much more clear to handle all of these together.

No functionality changed here.

llvm-svn: 337895
2018-07-25 09:18:48 +00:00
Chandler Carruth 196e719acd [x86/SLH] Improve name and comments for the main hardening function.
This function actually does two things: it traces the predicate state
through each of the basic blocks in the function (as that isn't directly
handled by the SSA updater) *and* it hardens everything necessary in the
block as it goes. These need to be done together so that we have the
currently active predicate state to use at each point of the hardening.

However, this also made obvious that the flag to disable actual
hardening of loads was flawed -- it also disabled tracing the predicate
state across function calls within the body of each block. So this patch
sinks this debugging flag test to correctly guard just the hardening of
loads.

Unless load hardening was disabled, no functionality should change with
this patch.

llvm-svn: 337894
2018-07-25 09:00:26 +00:00
Simon Atanasyan b524459288 [mips] Replace custom parsing logic for data directives by the `addAliasForDirective`
The target independent AsmParser doesn't recognise .hword, .word, .dword
which are required for Mips. Currently MipsAsmParser recognises these
through dispatch to MipsAsmParser::parseDataDirective. This contains
equivalent logic to AsmParser::parseDirectiveValue. This patch allows
reuse of AsmParser::parseDirectiveValue by making use of
addAliasForDirective to support .hword, .word and .dword.

Original patch provided by Alex Bradbury at D47001 was modified to fix
handling of microMIPS symbols. The `AsmParser::parseDirectiveValue`
calls either `EmitIntValue` or `EmitValue`. In this patch we override
`EmitIntValue` in the `MipsELFStreamer` to clear a pending set of
microMIPS symbols.

Differential revision: https://reviews.llvm.org/D49539

llvm-svn: 337893
2018-07-25 07:07:43 +00:00
Chijun Sima bd5d80d050 [Dominators] Assert if there is modification to DelBB while it is awaiting deletion
Summary:
Previously, passes used
```
DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy);
DTU.deleteBB(DelBB);
```
to delete a BasicBlock.
But passes which don't have the ability to update DomTree (e.g. tailcallelim, simplifyCFG) cannot recognize a DelBB awaiting deletion and will continue to process this DelBB.
This is a simple approach to notify devs of passes which may use DTU in the future to deal with deleted BasicBlocks under Lazy Strategy correctly.

Reviewers: kuhar, brzycki, dmgreen

Reviewed By: kuhar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49731

llvm-svn: 337891
2018-07-25 06:18:33 +00:00
Craig Topper dc0e8a601d [X86] Use X86ISD::MUL_IMM instead of ISD::MUL for multiply we intend to be selected to LEA.
This prevents other combines from possibly disturbing it.

llvm-svn: 337890
2018-07-25 05:33:36 +00:00
Craig Topper d9fa8147c4 [X86] Autogenerate complete checks and fix a failure introduced in r337875.
llvm-svn: 337889
2018-07-25 05:22:13 +00:00
Tom Stellard 179757ef05 [RegisterBankInfo] Ignore InstrMappings that create impossible to repair operands
Summary:
This is a follow-up to r303043.  In computeMapping(), we need to disqualify an
InstrMapping if it would be impossible to repair one of the registers in the
instruction to match the mapping.

This change is needed in order to be able to define an instruction
mapping for G_SELECT for the AMDGPU target and will be tested
by test/CodeGen/AMDGPU/GlobalISel/regbankselect-select.mir

Reviewers: ab, qcolombet, t.p.northover, dsanders

Reviewed By: qcolombet

Subscribers: tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D49735

llvm-svn: 337882
2018-07-25 03:08:35 +00:00
Petr Hosek 47e5fcba57 [profile] Support profiling runtime on Fuchsia
This ports the profiling runtime on Fuchsia and enables the
instrumentation. Unlike on other platforms, Fuchsia doesn't use
files to dump the instrumentation data since on Fuchsia, filesystem
may not be accessible to the instrumented process. We instead use
the data sink to pass the profiling data to the system the same
sanitizer runtimes do.

Differential Revision: https://reviews.llvm.org/D47208

llvm-svn: 337881
2018-07-25 03:01:35 +00:00
Chandler Carruth 7024921c0a [x86/SLH] Teach the x86 speculative load hardening pass to harden
against v1.2 BCBS attacks directly.

Attacks using spectre v1.2 (a subset of BCBS) are described in the paper
here:
https://people.csail.mit.edu/vlk/spectre11.pdf

The core idea is to speculatively store over the address in a vtable,
jumptable, or other target of indirect control flow that will be
subsequently loaded. Speculative execution after such a store can
forward the stored value to subsequent loads, and if called or jumped
to, the speculative execution will be steered to this potentially
attacker controlled address.

Up until now, this could be mitigated by enabling retpolines. However,
that is a relatively expensive technique to mitigate this particular
flavor. Especially because in most cases SLH will have already mitigated
this. To fully mitigate this with SLH, we need to do two core things:
1) Unfold loads from calls and jumps, allowing the loads to be post-load
   hardened.
2) Force hardening of incoming registers even if we didn't end up
   needing to harden the load itself.

The reason we need to do these two things is because hardening calls and
jumps from this particular variant is importantly different from
hardening against leak of secret data. Because the "bad" data here isn't
a secret, but in fact speculatively stored by the attacker, it may be
loaded from any address, regardless of whether it is read-only memory,
mapped memory, or a "hardened" address. The only 100% effective way to
harden these instructions is to harden their operand itself. But to
the extent possible, we'd like to take advantage of all the other
hardening going on, we just need a fallback in case none of that
happened to cover the particular input to the control transfer
instruction.

For users of SLH, currently they are paying 2% to 6% performance overhead
for retpolines, but this mechanism is expected to be substantially
cheaper. However, it is worth reminding folks that this does not
mitigate all of the things retpolines do -- most notably, variant #2 is
not in *any way* mitigated by this technique. So users of SLH may still
want to enable retpolines, and the implementation is carefully designed to
gracefully leverage retpolines to avoid the need for further hardening
here when they are enabled.

Differential Revision: https://reviews.llvm.org/D49663

llvm-svn: 337878
2018-07-25 01:51:29 +00:00
Craig Topper fc501a9223 [X86] Use a shift plus an lea for multiplying by a constant that is a power of 2 plus 2/4/8.
The LEA allows us to combine an add and the multiply by 2/4/8 together so we just need a shift for the larger power of 2.
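
The arithmetic behind this lowering can be sketched as below. `lea` is a toy model of the x86 `lea` address computation (base plus scaled index) and `mulPow2PlusScale` is a made-up name; the code shows the identity only and makes no claim about the exact instructions selected.

```cpp
#include <cassert>
#include <cstdint>

// Models `lea (Base, Index, Scale)`: Base + Index * Scale, Scale in {1,2,4,8}.
constexpr uint32_t lea(uint32_t Base, uint32_t Index, uint32_t Scale) {
  return Base + Index * Scale;
}

// X * (2^ShAmt + Scale) as one shift plus one LEA, for Scale in {2, 4, 8}.
constexpr uint32_t mulPow2PlusScale(uint32_t X, unsigned ShAmt, uint32_t Scale) {
  uint32_t Shifted = X << ShAmt;  // X * 2^ShAmt
  return lea(Shifted, X, Scale);  // + X * Scale, folded into the LEA
}
```

For instance, a multiply by 20 becomes a shift by 4 followed by an LEA with scale 4.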

llvm-svn: 337875
2018-07-25 01:15:38 +00:00
Craig Topper 5be253d988 [X86] Expand mul by pow2 + 2 using a shift and two adds similar to what we do for pow2 - 2.
llvm-svn: 337874
2018-07-25 01:15:35 +00:00
Craig Topper 56c104f104 [X86] Use a two lea sequence for multiply by 37, 41, and 73.
These fit a pattern used by 11, 21, and 19.
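
The shared pattern is 1 + OuterScale * (1 + InnerScale), with both scales drawn from the LEA scale factors. A minimal sketch, where `lea` is a toy model of the address computation and `mulTwoLea` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Models `lea (Base, Index, Scale)`: Base + Index * Scale, Scale in {1,2,4,8}.
constexpr uint32_t lea(uint32_t Base, uint32_t Index, uint32_t Scale) {
  return Base + Index * Scale;
}

// X * (1 + OuterScale * (1 + InnerScale)) using two LEAs.
constexpr uint32_t mulTwoLea(uint32_t X, uint32_t InnerScale, uint32_t OuterScale) {
  uint32_t T = lea(X, X, InnerScale);  // X * (1 + InnerScale): 5*X or 9*X
  return lea(X, T, OuterScale);        // X + T * OuterScale
}
```

With an inner scale of 4 or 8 (giving 5*X or 9*X) and an outer scale of 2, 4, or 8, this covers 11, 21, 41 and 19, 37, 73 respectively.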

llvm-svn: 337871
2018-07-24 23:44:17 +00:00
Craig Topper b5342b592e [X86] Add test cases for multiply by 37, 41, and 73.
These can all be handled with 2 LEAs similar to what we do for 11, 19, 21.

llvm-svn: 337870
2018-07-24 23:44:15 +00:00
Craig Topper f8fcee70a3 [X86] Change multiply by 26 to use two multiplies by 5 and an add instead of multiply by 3 and 9 and a subtract.
Same number of operations, but ending in an add is friendlier due to it being commutable.

llvm-svn: 337869
2018-07-24 23:44:12 +00:00
Hideki Saito ef380b0fc5 [LV] Fix for PR38110, LV encountered llvm_unreachable()
Summary: truncateToMinimalBitWidths() doesn't handle all Instructions, and the worst case is a compiler crash via llvm_unreachable(). The fix adds a case to handle PHINode and changes the worst case from a compiler crash to a no-op.

Reviewers: sbaranga, mssimpso, hsaito

Reviewed By: hsaito

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49461

llvm-svn: 337861
2018-07-24 22:30:31 +00:00
Roman Tereshin 1ba1f9310c [SCEV] Add zext(C + x + ...) -> D + zext(C-D + x + ...)<nuw><nsw> transform
if the top level addition in (D + (C-D + x + ...)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x + ...), ensuring homogeneous behaviour of the
transformation and better canonicalization of such expressions.

This enables better canonicalization of expressions like

  1 + zext(5 + 20 * %x + 24 * %y)  and
      zext(6 + 20 * %x + 24 * %y)

which get both transformed to

  2 + zext(4 + 20 * %x + 24 * %y)

This pattern is common in address arithmetics and the transformation
makes it easier for passes like LoadStoreVectorizer to prove that 2 or
more memory accesses are consecutive and optimize (vectorize) them.
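
The choice of D in the example above can be reproduced with a small sketch. It assumes the number of guaranteed trailing zeroes of the non-constant part is already known (2 for 20 * %x + 24 * %y); `SplitConstant` and `splitConstant` are hypothetical names and this is not the ScalarEvolution implementation.

```cpp
#include <cassert>
#include <cstdint>

struct SplitConstant {
  uint32_t D;     // low bits hoisted out of the zext
  uint32_t Rest;  // C - D, stays inside the zext
};

// Split C given that the non-constant part of the sum has at least
// TrailingZeros guaranteed trailing zero bits (TrailingZeros < 32).
constexpr SplitConstant splitConstant(uint32_t C, unsigned TrailingZeros) {
  uint32_t LowMask = (1u << TrailingZeros) - 1u;
  return SplitConstant{C & LowMask, C & ~LowMask};
}
```

Splitting 5 yields D = 1 and leaves 4 inside the zext, so the outer constant becomes 1 + 1 = 2; splitting 6 yields D = 2 and also leaves 4, so both expressions end up in the same form.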

Reviewed By: mzolotukhin

Differential Revision: https://reviews.llvm.org/D48853

llvm-svn: 337859
2018-07-24 21:48:56 +00:00
Craig Topper 5ddc0a2b14 [X86] When expanding a multiply by a negative of one less than a power of 2, like 31, don't generate a negate of a subtract that we'll never optimize.
We generated a subtract for the power of 2 minus one then negated the result. The negate can be optimized away by swapping the subtract operands, but DAG combine doesn't know how to do that and we don't add any of the new nodes to the worklist anyway.

This patch makes us explicitly emit the swapped subtract.
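
The two forms compute the same value, which the following sketch shows on wrapping 32-bit integers. Both function names are invented for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Old shape: negate((X << ShAmt) - X), i.e. a subtract followed by a negate.
constexpr uint32_t negateOfSubtract(uint32_t X, unsigned ShAmt) {
  return 0u - ((X << ShAmt) - X);
}

// New shape: a single subtract with swapped operands, X - (X << ShAmt).
constexpr uint32_t swappedSubtract(uint32_t X, unsigned ShAmt) {
  return X - (X << ShAmt);
}
```

With `ShAmt = 5` both return X * -31 modulo 2^32.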

llvm-svn: 337858
2018-07-24 21:31:21 +00:00
Craig Topper 6d29891bef [X86] Generalize the multiply by 30 lowering to generic multiply by power of 2 minus 2.
Use a left shift and 2 subtracts like we do for 30. Move this out from behind the slow lea check since it doesn't even use an LEA.

Use this for multiply by 14 as well.
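
A minimal sketch of the shift-and-two-subtracts shape, assuming wrapping
32-bit arithmetic. The helper name mulPow2Minus2 is hypothetical and only
models the arithmetic, not the DAG nodes the backend emits.

```cpp
#include <cassert>
#include <cstdint>

// Multiply x by (2^n - 2) using one left shift and two subtracts.
uint32_t mulPow2Minus2(uint32_t x, unsigned n) {
  assert(n >= 2 && n < 32 && "constant must be a power of 2 minus 2");
  return (x << n) - x - x;
}
```

Here n = 5 gives the multiply by 30 and n = 4 gives the multiply by 14.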

llvm-svn: 337856
2018-07-24 21:15:41 +00:00
Heejin Ahn 8daef0751d [WebAssembly] Add tests for weaker memory consistency orderings
Summary:
Currently all wasm atomic memory access instructions are sequentially
consistent, so even if LLVM IR specifies weaker orderings than that, we
should upgrade them to sequential ordering and treat them in the same
way.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49194

llvm-svn: 337854
2018-07-24 21:06:44 +00:00
Craig Topper 86d6320b94 [X86] Change multiply by 19 to use (9 * X) * 2 + X instead of (5 * X) * 4 - X.
The new lowering can be done in 2 LEAs. The old code took 1 LEA, 1 shift, and 1 sub.
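
The two sequences can be modelled as follows. This is only a sketch: lea() is
a stand-in for the base + index * scale form of the instruction, and the
function names are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Models the address computation of an LEA: base + index * scale.
uint32_t lea(uint32_t base, uint32_t index, uint32_t scale) {
  assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
  return base + index * scale;
}

// Old sequence: 1 LEA, 1 shift, 1 sub.
uint32_t mul19Old(uint32_t x) {
  uint32_t t = lea(x, x, 4); // 5 * x
  t <<= 2;                   // 20 * x
  return t - x;              // 19 * x
}

// New sequence: 2 LEAs.
uint32_t mul19New(uint32_t x) {
  uint32_t t = lea(x, x, 8); // 9 * x
  return lea(x, t, 2);       // x + 18 * x == 19 * x
}
```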

llvm-svn: 337851
2018-07-24 20:31:48 +00:00
Jessica Paquette 58e706a66a [MachineOutliner][NFC] Move outlined function remark into its own function
This pulls the OutlinedFunction remark out into its own function to make
the code a bit easier to read.

llvm-svn: 337849
2018-07-24 20:20:45 +00:00
Jessica Paquette 69f517df27 [MachineOutliner][NFC] Move target frame info into OutlinedFunction
Just some gardening here.

Similar to how we moved call information into Candidates, this moves outlined
frame information into OutlinedFunction. This allows us to remove
TargetCostInfo entirely.

Anywhere where we returned a TargetCostInfo struct, we now return an
OutlinedFunction. This establishes OutlinedFunctions as more of a general
repeated sequence, and Candidates as occurrences of those repeated sequences.

llvm-svn: 337848
2018-07-24 20:13:10 +00:00
Peter Collingbourne e06bac4796 Put "built-in" function definitions in global Used list, for LTO. (fix bug 34169)
When building with LTO, builtin functions that are defined but whose calls have not been inserted yet get internalized. The Global Dead Code Elimination phase in the new LTO implementation then removes these function definitions. Later optimizations add calls to those functions, and the linker then dies complaining that there are no definitions. This CL fixes the new LTO implementation to check if a function is builtin, and if so, to not internalize (and later DCE) the function. As part of this fix I needed to move the RuntimeLibcalls.{def,h} files from the CodeGen subdirectory to the IR subdirectory. I have updated all the files that accessed those two files to access their new location.

Fixes PR34169

Patch by Caroline Tice!

Differential Revision: https://reviews.llvm.org/D49434

llvm-svn: 337847
2018-07-24 19:34:37 +00:00
Chandler Carruth c9313a9ecb [x86] Teach the x86 backend that it can fold between TCRETURNm* and TCRETURNr* and fix latent bugs with register class updates.
Summary:
Enabling this fully exposes a latent bug in the instruction folding: we
never update the register constraints for the register operands when
fusing a load into another operation. The fused form could, in theory,
have different register constraints on its operands. And in fact,
TCRETURNm* needs its memory operands to use tailcall compatible
registers.

I've updated the folding code to re-constrain all the registers after
they are mapped onto their new instruction.

However, we still can't enable folding in the general case from
TCRETURNr* to TCRETURNm* because doing so may require more registers to
be available during the tail call. If the call itself uses all but one
register, and the folded load would require both a base and index
register, there will not be enough registers to allocate the tail call.

It would be better, IMO, to teach the register allocator to *unfold*
TCRETURNm* when it runs out of registers (or specifically check the
number of registers available during the TCRETURNr*) but I'm not going
to try and solve that for now. Instead, I've just blocked the forward
folding from r -> m, leaving LLVM free to unfold from m -> r as that
doesn't introduce new register pressure constraints.

The down side is that I don't have anything that will directly exercise
this. Instead, I will be immediately using this in my SLH patch. =/

Still worse, without allowing the TCRETURNr* -> TCRETURNm* fold, I don't
have any tests that demonstrate the failure to update the memory operand
register constraints. This patch still seems correct, but I'm nervous
about the degree of testing due to this.

Suggestions?

Reviewers: craig.topper

Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49717

llvm-svn: 337845
2018-07-24 19:04:37 +00:00
Craig Topper 1d504f777e [Inliner] Teach inliner to merge 'min-legal-vector-width' function attribute
When we inline a function with a min-legal-vector-width attribute we need to make sure the caller also ends up with at least that vector width.

This patch is necessary to make always_inline functions like intrinsics propagate their min-legal-vector-width. Though nothing uses min-legal-vector-width yet.

A future patch will add heuristics to prevent inlining when vector widths are mismatched. But that code would need to be in inline cost analysis, which is separate from the code added here.

Differential Revision: https://reviews.llvm.org/D49162

llvm-svn: 337844
2018-07-24 18:49:00 +00:00
Craig Topper 1296c622df [X86] Add test case to show failure to combine away negates that may be created by mul by constant expansion.
Mul by constant can expand to a sequence that ends with a negate. If the next instruction is an add or sub we might be able to fold the negate away.

We currently fail to do this because we explicitly don't add anything to the DAG combine worklist when we expand multiplies. This is primarily to keep the multiply from being reformed, but we should consider adding the users to the worklist.

llvm-svn: 337843
2018-07-24 18:36:46 +00:00
Azharuddin Mohammed cb4d0cd3bb [docker] Fix LLVM_EXTERNAL_PROJECTS cmake variable value
Summary:
LLVM_ENABLE_PROJECTS expects a semicolon separated project list.

Fixes PR38158.

Reviewers: ilya-biryukov

Reviewed By: ilya-biryukov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49712

llvm-svn: 337842
2018-07-24 18:34:13 +00:00
Jessica Paquette fca55129b1 [MachineOutliner][NFC] Make Candidates own their call information
Before this, TCI contained all the call information for each Candidate.

This moves that information onto the Candidates. As a result, each Candidate
can now supply how it ought to be called. Thus, Candidates will be able to,
say, call the same function in cheaper ways when possible. This also removes
that information from TCI, since it's no longer used there.

A follow-up patch for the AArch64 outliner will demonstrate this.

llvm-svn: 337840
2018-07-24 17:42:11 +00:00
Jessica Paquette 1cc52a0079 [MachineOutliner][NFC] Move missed opt remark into its own function
Having the missed remark code in the middle of `findCandidates` made the
function hard to follow. This yanks that out into a new function,
`emitNotOutliningCheaperRemark`.

llvm-svn: 337839
2018-07-24 17:37:28 +00:00
Jessica Paquette f94d1d29c1 [MachineOutliner][NFC] Sink some candidate logic into OutlinedFunction
Just some simple gardening to improve clarity.

Before, we had something along the lines of

1) Create a std::vector of Candidates
2) Create an OutlinedFunction
3) Create a std::vector of pointers to Candidates
4) Copy those over to the OutlinedFunction and the Candidate list

Now, OutlinedFunctions create the Candidate pointers. They're still copied
over to the main list of Candidates, but it makes it a bit clearer what's
going on.

llvm-svn: 337838
2018-07-24 17:36:13 +00:00
Joel Galenson 8dbcc58917 Use SCEV to avoid inserting some bounds checks.
This patch uses SCEV to avoid inserting some bounds checks when they are not needed.  This slightly improves the performance of code compiled with the bounds check sanitizer.

Differential Revision: https://reviews.llvm.org/D49602

llvm-svn: 337830
2018-07-24 15:21:54 +00:00
Florian Hahn 36d2e25d5a [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types.
This is a workaround and it would be better to fix this generally, but
doing it generally is quite tricky. See D48541 and PR38117.

Doing it in PredicateInfo directly allows us to use the type address to
differentiate different unnamed types, because neither the created
declarations nor the ssa_copy calls should be visible after
PredicateInfo got destroyed.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D49126

llvm-svn: 337828
2018-07-24 14:49:52 +00:00
Simon Atanasyan 28ded4ee19 [mips] Fix local dynamic TLS with Sym64
For the final DTPREL addition, rather than a lui/daddiu/daddu triple,
LLVM was erroneously emitting a daddiu/daddiu pair, treating the %dtprel_hi
as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64.
Instead, use a new TlsHi node and, although unnecessary due to the exact
structure of the nodes emitted, use TlsHi for local exec too to prevent
future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes,
and TprelHi since its functionality is provided by the new common TlsHi node.
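
To see why an unshifted high part is wrong, here is a sketch of the generic
16-bit hi/lo split of an offset, assuming the usual convention that the high
part is rounded to compensate for the sign-extended low part. It models the
arithmetic only, for non-negative 32-bit offsets; the function names are
hypothetical and this is not the backend's code.

```cpp
#include <cassert>
#include <cstdint>

int64_t signExtend16(uint16_t v) { return static_cast<int16_t>(v); }

// High 16 bits, rounded so that adding the sign-extended low part is exact.
uint16_t hi16(int64_t off) {
  return static_cast<uint16_t>((off + 0x8000) >> 16);
}
uint16_t lo16(int64_t off) { return static_cast<uint16_t>(off & 0xffff); }

// lui/daddiu/daddu shape: the high part is shifted into place first.
int64_t addShifted(int64_t base, int64_t off) {
  assert(off >= 0 && off <= 0x7fff7fff);
  int64_t upper = signExtend16(hi16(off)) * 65536; // lui
  int64_t full = upper + signExtend16(lo16(off));  // daddiu
  return base + full;                              // daddu
}

// daddiu/daddiu shape: the high part is added as if it were a low part.
int64_t addUnshifted(int64_t base, int64_t off) {
  assert(off >= 0 && off <= 0x7fff7fff);
  return base + signExtend16(hi16(off)) + signExtend16(lo16(off));
}
```

For offsets below 0x8000 the high part is zero and both shapes agree, so the
problem only shows up with larger offsets.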

Patch by James Clarke.

Differential revision: https://reviews.llvm.org/D49259

llvm-svn: 337827
2018-07-24 13:47:52 +00:00
Chandler Carruth 54529146c6 [x86/SLH] Extract the core register hardening logic to a low-level
helper and restructure the post-load hardening to use this.

This isn't as trivial as I would have liked because the post-load
hardening used a trick that only works for it where it swapped in
a temporary register to the load rather than replacing anything.
However, there is a simple way to do this without that trick that allows
this to easily reuse a friendly API for hardening a value in a register.
That API will in turn be usable in subsequent patches.

This also technically changes the position at which we insert the subreg
extraction for the predicate state, but that never resulted in an actual
instruction and so tests don't change at all.

llvm-svn: 337825
2018-07-24 12:44:00 +00:00
Chandler Carruth 376113da89 [x86/SLH] Tidy up a comment, using doxygen structure and wording it to
be more accurate and understandable.

llvm-svn: 337822
2018-07-24 12:19:01 +00:00
Sam Parker 8b93e82c3d [ARM] Disable ARMCodeGenPrepare by default
ARM Stage 2 builders have been suspiciously broken since the pass was
committed. Disabling to hopefully fix the bots and give me time to
debug.

llvm-svn: 337821
2018-07-24 12:04:23 +00:00
Duncan P. N. Exon Smith ab55bc48e6 ADT: Shrink SmallVector size 0 to 16B on 64-bit platforms
SmallVectorTemplateCommon wants to know the address of the first element
so it can detect whether it's in "small size" mode.

The old implementation split the small array, creating the storage for
the first element in SmallVectorTemplateCommon, and pulling the rest
into SmallVectorStorage where we know the size of the array.  This
bloats SmallVector size 0 by the larger of sizeof(void*) and sizeof(T),
and we're not even using the storage.

The new implementation leaves the full small storage to
SmallVectorStorage.  To calculate the offset of the first element in
SmallVectorTemplateCommon, we just need to know how far to jump, which
we can calculate out-of-band.  One subtlety is that we need
SmallVectorStorage to be properly aligned even when the size is 0, to be
sure that (for large alignments) we actually have the padding and it's
well defined to do the pointer math.
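
A simplified sketch of the idea follows. The types (VecHeader, InlineStorage,
HeaderThenElt, TinyVec) are hypothetical stand-ins, not the real SmallVector
classes, and the sketch assumes the common ABI behaviour where the header base
sits at offset 0 and the empty storage base takes no space.

```cpp
#include <cassert>
#include <cstddef>

// Common part: no element storage lives here.
struct VecHeader {
  void *begin = nullptr;
  unsigned size = 0;
  unsigned capacity = 0;
};

// Inline storage for N elements, kept separate from the header.
template <typename T, unsigned N> struct InlineStorage {
  alignas(T) char elts[N * sizeof(T)];
};
// Size 0: no bytes of storage, but still aligned like T.
template <typename T> struct alignas(T) InlineStorage<T, 0> {};

// Layout probe, used only to compute the offset of the first element.
template <typename T> struct HeaderThenElt {
  alignas(VecHeader) char header[sizeof(VecHeader)];
  alignas(T) char firstElt[sizeof(T)];
};

template <typename T, unsigned N>
struct TinyVec : VecHeader, InlineStorage<T, N> {
  TinyVec() {
    this->begin = firstElt();
    this->capacity = N;
  }
  // Address of the inline storage, computed out-of-band from the header.
  void *firstElt() {
    return reinterpret_cast<char *>(static_cast<VecHeader *>(this)) +
           offsetof(HeaderThenElt<T>, firstElt);
  }
  // "Small size" mode means begin still points at the inline storage.
  bool isSmall() { return this->begin == firstElt(); }
};
```

Under those assumptions, TinyVec<int, 0> is no larger than VecHeader itself,
which is 16 bytes on typical 64-bit platforms.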

llvm-svn: 337820
2018-07-24 11:32:13 +00:00
Florian Hahn 6698f9b7db Recommit r334887: [SmallSet] Add SmallSetIterator.
Updated to make sure we properly construct/destroy SetIter if it has
non-trivial ctors/dtors, like in MSVC.

llvm-svn: 337818
2018-07-24 10:32:54 +00:00
Shiva Chen f5938bfbf9 Revert "[DebugInfo] Generate DWARF debug information for labels."
This reverts commit b454fa1b4079b6c0a5b1565982d16516385838d7.

llvm-svn: 337812
2018-07-24 06:17:45 +00:00
Chandler Carruth a25aca21af [x86] Clean up and convert test to use generated CHECK lines.
This test was already checking microscopic behavior of tail call under
specific conditions. This just makes the CHECK lines much more
consistent, clear, and easily updated when intentional changes are made.

I've also switched the test to consistently name the entry block and to
order the helper declarations and comments for specific tests in the
more usual locations.

llvm-svn: 337806
2018-07-24 03:18:08 +00:00
Chandler Carruth d41dca2ddc [x86] Update the CHECK lines of this test to use the latest patterns
from the script. This minimizes the diff in subsequent changes.

llvm-svn: 337805
2018-07-24 03:07:07 +00:00
Shiva Chen d6b2cdf9d4 [DebugInfo] Generate DWARF debug information for labels.
There are two forms for label debug information in DWARF format.

1. Labels in a non-inlined function:

DW_TAG_label
  DW_AT_name
  DW_AT_decl_file
  DW_AT_decl_line
  DW_AT_low_pc

2. Labels in an inlined function:

DW_TAG_label
  DW_AT_abstract_origin
  DW_AT_low_pc

We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.

The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.

We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.

Differential Revision: https://reviews.llvm.org/D45556

Patch by Hsiangkai Wang.

llvm-svn: 337799
2018-07-24 02:22:55 +00:00
Tom Stellard b7f19e6d1e AMDGPU/GlobalISel: Legalize G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49601

llvm-svn: 337798
2018-07-24 02:19:20 +00:00
Dean Michael Berris 833bb6fbdc llvm-xray: Broken chrome trace event format output
Summary:
A missing comma separator for EXIT and TAIL_EXIT RecordTypes produced invalid
JSON output for Chrome Trace Event Format.

Reviewers: dberris

Reviewed By: dberris

Subscribers: sammccall, kpw, llvm-commits

Differential Revision: https://reviews.llvm.org/D49687

llvm-svn: 337795
2018-07-24 01:45:34 +00:00
Tom Stellard 2d37929c10 AMDGPU/GlobalISel: Remove unnecessary legality constraint for G_EXTRACT
Summary:
We were marking G_EXTRACT operations unsupported if the output type
was larger than the input type.  I don't see how this could ever actually
happen, so I dropped the constraint.  Doing this makes it possible to
reuse the same legality code for G_INSERT.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49600

llvm-svn: 337794
2018-07-24 01:43:49 +00:00
Andres Freund 376a3d3659 Add PerfJITEventListener for perf profiling support.
This new JIT event listener supports generating profiling data for
the linux 'perf' profiling tool, allowing it to generate function and
instruction level profiles.

Currently this functionality is not enabled by default, but must be
enabled with LLVM_USE_PERF=yes.  Given that the listener has no
dependencies, it might be sensible to enable by default once the
initial issues have been shaken out.

I followed existing precedent in registering the listener by default
in lli. Should there be a decision to enable this by default on linux,
that should probably be changed.

Please note that until https://reviews.llvm.org/D47343 is resolved,
using this functionality with mcjit rather than orcjit will not
reliably work.

Disregarding the previous comment, here's an example:

$ cat /tmp/expensive_loop.c

#include <stdbool.h>
#include <stdint.h>

bool stupid_isprime(uint64_t num)
{
        if (num == 2)
                return true;
        if (num < 1 || num % 2 == 0)
                return false;
        for(uint64_t i = 3; i < num / 2; i+= 2) {
                if (num % i == 0)
                        return false;
        }
        return true;
}

int main(int argc, char **argv)
{
        int numprimes = 0;

        for (uint64_t num = argc; num < 100000; num++)
        {
                if (stupid_isprime(num))
                        numprimes++;
        }

        return numprimes;
}

$ clang -ggdb -S -c -emit-llvm /tmp/expensive_loop.c -o /tmp/expensive_loop.ll

$ perf record -o perf.data -g -k 1 ./bin/lli -jit-kind=mcjit /tmp/expensive_loop.ll 1

$ perf inject --jit -i perf.data -o perf.jit.data

$ perf report -i perf.jit.data
-   92.59%  lli      jitted-5881-2.so                   [.] stupid_isprime
     stupid_isprime
     main
     llvm::MCJIT::runFunction
     llvm::ExecutionEngine::runFunctionAsMain
     main
     __libc_start_main
     0x4bf6258d4c544155
+    0.85%  lli      ld-2.27.so                         [.] do_lookup_x

And line-level annotations also work:
       │              for(uint64_t i = 3; i < num / 2; i+= 2) {
       │1 30:   movq   $0x3,-0x18(%rbp)
  0.03 │1 38:   mov    -0x18(%rbp),%rax
  0.03 │        mov    -0x10(%rbp),%rcx
       │        shr    $0x1,%rcx
  3.63 │     ┌──cmp    %rcx,%rax
       │     ├──jae    6f
       │     │                if (num % i == 0)
  0.03 │     │  mov    -0x10(%rbp),%rax
       │     │  xor    %edx,%edx
 89.00 │     │  divq   -0x18(%rbp)
       │     │  cmp    $0x0,%rdx
  0.22 │     │↓ jne    5f
       │     │                        return false;
       │     │  movb   $0x0,-0x1(%rbp)
       │     │↓ jmp    73
       │     │        }
  3.22 │1 5f:│↓ jmp    61
       │     │        for(uint64_t i = 3; i < num / 2; i+= 2) {

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D44892

llvm-svn: 337789
2018-07-24 00:54:06 +00:00
Vedant Kumar d6ff43cc71 [Debugify] Export per-pass debug info loss statistics
Add a -debugify-export option to opt. This exports per-pass `debugify`
loss statistics to a file in CSV format.

For some interesting numbers on debug value loss during an -O2 build
of the sqlite3 amalgamation, see the review thread.

Differential Revision: https://reviews.llvm.org/D49003

llvm-svn: 337787
2018-07-24 00:41:29 +00:00
Vedant Kumar ca407c4336 [Debugify] Move interface definitions to a header, NFC
This is a minor cleanup in preparation for a change to export DI
statistics from -check-debugify. To do that, it would be cleaner to have
a dedicated header for the debugify interface.

llvm-svn: 337786
2018-07-24 00:41:28 +00:00
Chandler Carruth 66fbbbca60 [x86/SLH] Simplify the code for hardening a loaded value. NFC.
This is in preparation for extracting this into a re-usable utility in
this code.

llvm-svn: 337785
2018-07-24 00:35:36 +00:00
Chandler Carruth b46c22de00 [x86/SLH] Remove complex SHRX-based post-load hardening.
This code was really nasty, had several bugs in it originally, and
wasn't carrying its weight. While on Zen we have all 4 ports available
for SHRX, on all of the Intel parts with Agner's tables, SHRX can only
execute on 2 ports, giving it 1/2 the throughput of OR.

Worse, all too often this pattern required two SHRX instructions in
a chain, hurting the critical path by a lot.

Even if we end up needing to save/restore EFLAGS, that is no longer so
bad. We pay for a uop to save the flag, but we very likely get fusion
when it is used by forming a test/jCC pair or something similar. In
practice, I don't expect the SHRX to be a significant savings here, so
I'd like to avoid the complex code required. We can always resurrect
this if/when someone has a specific performance issue addressed by it.

llvm-svn: 337781
2018-07-24 00:21:59 +00:00
Fangrui Song 5bad9d835a [DWARF] Use deque in place of SmallVector to fix use-after-free issue
Summary: SmallVector's elements are moved when resizing and cause use-after-free.
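
The property being relied on can be sketched with the standard containers:
growing a std::deque at the end does not move existing elements, whereas a
vector-like container may reallocate and move them. The function name below
is made up for illustration.

```cpp
#include <cassert>
#include <deque>

// Returns true if a pointer taken before growth still refers to the same
// first element after `extra` further push_back calls.
bool pointerSurvivesGrowth(unsigned extra) {
  std::deque<int> entries;
  entries.push_back(42);
  int *first = &entries.front();
  for (unsigned i = 0; i < extra; ++i)
    entries.push_back(static_cast<int>(i));
  return first == &entries.front() && *first == 42;
}
```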

Reviewers: probinson, dblaikie

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D49702

llvm-svn: 337772
2018-07-23 23:27:45 +00:00
Thomas Anderson 8e8a652c2f Fix typo in test/CodeGen/Mips/dins.ll
Differential Revision: https://reviews.llvm.org/D49704

llvm-svn: 337771
2018-07-23 23:19:53 +00:00
Wolfgang Pieb 790d86cefc Embed a template specialization in a namespace to work around a gcc bug.
llvm-svn: 337770
2018-07-23 23:14:23 +00:00
Wolfgang Pieb 439801ba1d [DWARF v5] Refactor range lists dumping by using a more generic way of handling tables of lists.
The intent is to use it for location list tables as well. Change is almost NFC with the exception
of the spelling of some strings used during dumping (all lowercase now).

Reviewer: JDevlieghere

Differential Revision: https://reviews.llvm.org/D49500

llvm-svn: 337763
2018-07-23 22:37:17 +00:00
Teresa Johnson b963c0b658 [LTO] Handle __imp_ (dllimport) symbols consistently with lld
Summary:
Similar to what lld already does for dllimport symbols which are
prefaced with __imp_ (see lld patch r240620), strip off the __imp_
prefix in LTO. Otherwise we can get 2 separate GlobalResolutions for
a single symbol, one for the dllimport declaration and one for the
definition, which leads to incorrect LTO handling.
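
A minimal sketch of the effect, assuming resolutions are keyed by symbol
name. stripImpPrefix and distinctResolutions are hypothetical helpers and do
not reflect the actual LTO data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Drops a leading "__imp_" so the import and the definition share one key.
std::string stripImpPrefix(const std::string &name) {
  const std::string prefix = "__imp_";
  if (name.compare(0, prefix.size(), prefix) == 0)
    return name.substr(prefix.size());
  return name;
}

// Counts how many distinct resolution keys a list of symbol names produces.
std::size_t distinctResolutions(const std::vector<std::string> &names) {
  std::set<std::string> keys;
  for (const std::string &n : names)
    keys.insert(stripImpPrefix(n));
  return keys.size();
}
```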

Fixes PR38105.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49138

llvm-svn: 337762
2018-07-23 22:33:57 +00:00
Erik Pilkington 28e08a0a61 [demangler] call terminate() if allocation failed
We really should set *status to memory_alloc_failure, but we need to refactor
the demangler a bit to properly propagate the failure up the stack. Until then,
it's better to explicitly terminate than rely on a null dereference crash.
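
The stopgap amounts to something like the following sketch. The helper name
allocateOrTerminate is hypothetical; the demangler's real allocator is not
shown here.

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>

// Allocates at least one byte, or terminates instead of returning null.
void *allocateOrTerminate(std::size_t bytes) {
  void *p = std::malloc(bytes ? bytes : 1);
  if (p == nullptr)
    std::terminate(); // explicit failure rather than a later null dereference
  return p;
}
```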

rdar://31240372

llvm-svn: 337759
2018-07-23 22:23:04 +00:00
Martin Storsjo db42d51ee3 [MC] Add a separate flag for skipping comdat constant sections for MinGW. NFC.
This actually has nothing to do with the associative comdat sections
that aren't supported by GNU binutils ld.

Clarify the comments from SVN r335918 and use a separate flag for it.

Differential Revision: https://reviews.llvm.org/D49645

llvm-svn: 337757
2018-07-23 22:15:25 +00:00
Martin Storsjo 100fc97051 [COFF] Fix assembly output of comdat sections without an attached symbol
Since SVN r335286, the .xdata sections are produced without an attached
symbol, which requires using a different syntax when printing assembly
output.

Instead of the usual syntax of '.section <name>,"dr",discard,<symbol>',
use '.section <name>,"dr"' + '.linkonce discard' (which is what GCC
uses for all assembly output).
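
The choice between the two syntaxes can be sketched as below. The function
name is hypothetical, and the exact whitespace between the two directives is
an assumption made for this example.

```cpp
#include <cassert>
#include <string>

// Builds the directive text for a discardable read-only comdat section.
// An empty `symbol` means the section has no attached comdat symbol.
std::string comdatSectionDirective(const std::string &name,
                                   const std::string &symbol) {
  std::string out = ".section " + name + ",\"dr\"";
  if (!symbol.empty())
    return out + ",discard," + symbol; // usual syntax
  return out + "\n.linkonce discard";  // fallback without a symbol
}
```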

This fixes PR38254.

Differential Revision: https://reviews.llvm.org/D49651

llvm-svn: 337756
2018-07-23 22:15:19 +00:00
Martin Storsjo c2b701408e [AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes
This matches the structure used on X86 and ARM. This requires
a little bit of duplication of the parts that are equal in both
AArch64 COFF variants though.

Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF
didn't, but now they do.

This makes AArch64 match X86 in how comdat is used for float constants
for MinGW.

Differential Revision: https://reviews.llvm.org/D49637

llvm-svn: 337755
2018-07-23 22:15:14 +00:00
Vedant Kumar 0970e2e34f [utils] Fix the llvm::Optional data formatter
The llvm::Optional data formatter needs to look through the `Storage`
container if it's present.

Before:

   220    if (Op && Op->getOp() != dwarf::DW_OP_LLVM_fragment)
-> 221      HasComplexExpression = true;
   222
   223    // If the register can only be described by a complex expression (i.e.,
   224    // multiple subregisters) it doesn't safely compose with another complex
Target 0: (llc) stopped.
(lldb) p Op
(llvm::Optional<llvm::DIExpression::ExprOperand>) $0 = None

After:

(lldb) p Op
(llvm::Optional<llvm::DIExpression::ExprOperand>) $0 =
(llvm::DIExpression::ExprOperand) storage = {
  Op = 0x000000010603d460
}

llvm-svn: 337752
2018-07-23 21:59:06 +00:00
Vedant Kumar 22bd6f99fa [SelectionDAG] Reduce DanglingDebugInfo memory traffic, NFC
This avoids approx. 2 x 10^5 DenseMap insertions in both non-debug and
debug -O2 builds of the sqlite3 amalgamation.

llvm-svn: 337751
2018-07-23 21:59:04 +00:00
Teresa Johnson e214fdeb69 [ThinLTO] Ensure the TargetLibraryInfo is constructed early enough
Summary:
Without this change, the WholeProgramDevirt pass, which requires the
TargetLibraryInfo, will construct one from the default triple.

Fixes PR38139.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49278

llvm-svn: 337750
2018-07-23 21:58:19 +00:00
George Burgess IV b00fb46479 [DebugCounters] Keep track of total counts
This patch makes debug counters keep track of the total number of times
we've called `shouldExecute` for each counter, so it's easier to build
automated tooling on top of these.

A patch to print these counts is coming soon.

Patch by Zhizhou Yang!

Differential Revision: https://reviews.llvm.org/D49560

llvm-svn: 337748
2018-07-23 21:49:36 +00:00
Fangrui Song d9c254771d [gdb] Fix SmallVector pretty printer after r337514
llvm-svn: 337747
2018-07-23 21:33:51 +00:00
Manoj Gupta f9f50f634d ConstantFolding: Avoid a crash.
Summary:
Check if the parent basic block and caller exist
before calling CS.getCaller when constant folding the
strip.invariant.group intrinsic.

This avoids a crash when the function containing the intrinsic
is being inlined. The instruction is checked for any simplification
but has not yet been added to a basic block.

Reviewers: Prazek, rsmith, efriedma

Reviewed By: efriedma

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D49690

llvm-svn: 337742
2018-07-23 21:20:00 +00:00
Reid Kleckner 980c4df037 Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
Don't try to generate large PIC code for non-ELF targets. Neither COFF
nor MachO have relocations for large position independent code, and
users have been using "large PIC" code models to JIT 64-bit code for a
while now. With this change, if they are generating ELF code, their
JITed code will truly be PIC, but if they target MachO or COFF, it will
contain 64-bit immediates that directly reference external symbols. For
a JIT, that's perfectly fine.

llvm-svn: 337740
2018-07-23 21:14:35 +00:00
Matt Davis 07dee81a68 [llvm-mca][docs] Define IPC where it is first mentioned. NFC.
Expand the abbreviation where it is first used, and use IPC elsewhere.

llvm-svn: 337739
2018-07-23 21:10:50 +00:00
David Greene 4345df3dad Fix RegScavenger::unprocess
RegScavenger::unprocess walks backward, so it should undo the effects
of defs before undoing effects of kills. Previously it did things in
the opposite order, leaving a register apparently unused (dead) in the
case where an instruction both used (killed) and defined a register.
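
The ordering issue can be shown with a toy model in which the scavenger state
is just a set of in-use registers. The Instr struct and the function names
are invented for this sketch and are much simpler than the real RegScavenger.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct Instr {
  std::vector<int> kills; // registers whose last use is this instruction
  std::vector<int> defs;  // registers written by this instruction
};

// Forward step: killed registers become free, defined registers become used.
void process(std::set<int> &inUse, const Instr &mi) {
  for (int r : mi.kills) inUse.erase(r);
  for (int r : mi.defs) inUse.insert(r);
}

// Backward step, fixed order: undo the defs first, then undo the kills.
void unprocess(std::set<int> &inUse, const Instr &mi) {
  for (int r : mi.defs) inUse.erase(r);
  for (int r : mi.kills) inUse.insert(r);
}

// Previous order, kept for comparison: undo kills first, then defs.
void unprocessKillsFirst(std::set<int> &inUse, const Instr &mi) {
  for (int r : mi.kills) inUse.insert(r);
  for (int r : mi.defs) inUse.erase(r);
}
```

For an instruction that both kills and defines register 1, unprocess leaves
the register in use, as it was before the instruction, while the old order
leaves it looking dead.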

Differential Revision: https://reviews.llvm.org/D42200

llvm-svn: 337735
2018-07-23 20:23:50 +00:00
Nirav Dave 5af81d5bfa Add inline asm aliasing test.
llvm-svn: 337734
2018-07-23 20:19:10 +00:00