Commit Graph

67148 Commits

Author SHA1 Message Date
Sanjay Patel 796e009c31 [AArch64] add tests for fcvtl2; NFC 2019-12-14 11:12:01 -05:00
Johannes Doerfert 6cc2b1d789 [Attributor][Tests] Copy & use the ArgumentPromotion tests 2019-12-14 01:05:36 -06:00
Johannes Doerfert c0cfdd32d0 [ArgPromo][Tests] Run update_test_checks on all ArgumentPromotion tests
Summary:
In preparation of D65531 as well as the reuse of these tests for the
Attributor, we modernize them and use the update_test_checks to simplify
updates.

This was done with the update_test_checks after D68819 and D68850.

Reviewers: hfinkel, vsk, dblaikie, davidxl, tejohnson, tstellar, echristo, chandlerc, efriedma, lebedev.ri

Subscribers: bollu, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68766
2019-12-14 00:29:38 -06:00
Johannes Doerfert 139c9ef45a [Attributor] Annotate call sites of declarations with a callback
Even if a declaration is called, if there is a callback we might need
the information during CG-SCC traversal (D70767).
2019-12-13 23:51:59 -06:00
Johannes Doerfert dab7d515ba [Attributor][NFC] Add more simple test situations for callbacks 2019-12-13 23:51:59 -06:00
Johannes Doerfert 6a05ee05b6 [Attributor][NFC] Reorder test functions
Since one of the functions has a personality the attribute set is
printed. If the function is the first it should (hopefully) always be #0
2019-12-13 23:51:59 -06:00
Johannes Doerfert 3da7efedaa [Attributor] Reuse the IPConstantProp tests for the Attributor
The Attributor can, to some degree, do what IPConstantProp does. We can
consequently use the corner cases already collected and tested for in
the IPConstantProp tests to improve Attributor test coverage.

This exposed various bugs fixed in previous Attributor patches.

Not all functionality of IPConstantProp is available in AAValueSimplify
and AAIsDead so some tests show that we cannot perform the expected
constant propagation.

Reviewers: fhahn, efriedma, mssimpso, davide

Subscribers: bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69748
2019-12-13 22:03:26 -06:00
Fangrui Song a0aa58dad5 [AArch64] Save FP for leaf functions when disabling frame pointer elimination
The change allows clang -mno-omit-leaf-frame-pointer to disable frame
pointer elimination. This behavior matches X86 and Mips, and also GCC
AArch64.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71168
2019-12-13 18:48:58 -08:00
Sean Fertile 93faa237da [PowerPC] Add Support for indirect calls on AIX.
Extends the desciptor-based indirect call support for 32-bit codegen,
and enables indirect calls for AIX.

In-depth Description:
In a function descriptor based ABI, a function pointer points at a
descriptor structure as opposed to the function's entry point. The
descriptor takes the form of 3 pointers: 1 for the function's entry
point, 1 for the TOC anchor of the module containing the function
definition, and 1 for the environment pointer:

struct FunctionDescriptor {
  void *EntryPoint;
  void *TOCAnchor;
  void *EnvironmentPointer;
};

An indirect call has several steps of loading the the information from
the descriptor into the proper registers for setting up the call. Namely
it has to:

1) Save the caller's TOC pointer into the TOC save slot in the linkage
   area, and then load the callee's TOC pointer into the TOC register
   (GPR 2 on AIX).

2) Load the function descriptor's entry point into the count register.

3) Load the environment pointer into the environment pointer register
   (GPR 11 on AIX).

4) Perform the call by branching on count register.

5) Restore the caller's TOC pointer after returning from the indirect call.

A couple important caveats to the above:

- There is no way to directly load a value from memory into the count register.
  Instead we populate the count register by loading the entry point address into
  a gpr and then moving the gpr to the count register.

- The TOC restore has to come immediately after the branch on count register
  instruction (i.e., the 1st instruction executed after we return from the
  call). This is an implementation limitation. We could, in theory, schedule
  the restore elsewhere as long as no uses of the TOC pointer fall in between
  the call and the restore; however, to keep it simple, we insert a pseudo
  instruction that represents both the indirect branch instruction and the
  load instruction that restores the caller's TOC from the linkage area. As
  they flow through the compiler as a single pseudo instruction, nothing can be
  inserted between them and the caller's TOC is then valid at any use.

Differtential Revision: https://reviews.llvm.org/D70724
2019-12-13 20:07:00 -05:00
Roman Tereshin 8731799fc6 [Legalizer] Making artifact combining order-independent
Legalization algorithm is complicated by two facts:
1) While regular instructions should be possible to legalize in
   an isolated, per-instruction, context-free manner, legalization
   artifacts can only be eliminated in pairs, which could be deeply, and
   ultimately arbitrary nested: { [ () ] }, where which paranthesis kind
   depicts an artifact kind, like extend, unmerge, etc. Such structure
   can only be fully eliminated by simple local combines if they are
   attempted in a particular order (inside out), or alternatively by
   repeated scans each eliminating only one innermost pair, resulting in
   O(n^2) complexity.
2) Some artifacts might in fact be regular instructions that could (and
   sometimes should) be legalized by the target-specific rules. Which
   means failure to eliminate all artifacts on the first iteration is
   not a failure, they need to be tried as instructions, which may
   produce more artifacts, including the ones that are in fact regular
   instructions, resulting in a non-constant number of iterations
   required to finish the process.

I trust the recently introduced termination condition (no new artifacts
were created during as-a-regular-instruction-retrial of artifacts not
eliminated on the previous iteration) to be efficient in providing
termination, but only performing the legalization in full if and only if
at each step such chains of artifacts are successfully eliminated in
full as well.

Which is currently not guaranteed, as the artifact combines are applied
only once and in an arbitrary order that has to do with the order of
creation or insertion of artifacts into their worklist, which is a no
particular order.

In this patch I make a small change to the artifact combiner, making it
to re-insert into the worklist immediate (modulo a look-through copies)
artifact users of each vreg that changes its definition due to an
artifact combine.

Here the first scan through the artifacts worklist, while not
being done in any guaranteed order, only needs to find the innermost
pair(s) of artifacts that could be immediately combined out. After that
the process follows def-use chains, making them shorter at each step, thus
combining everything that can be combined in O(n) time.

Reviewers: volkan, aditya_nandakumar, qcolombet, paquette, aemerson, dsanders

Reviewed By: aditya_nandakumar, paquette

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71448
2019-12-13 15:45:18 -08:00
Sam Elliott a0f43b0043 [RISCV] Move DebugLoc Copy into CompressInstEmitter
Summary:
This copy ensures that debug location information is kept for
compressed instructions. There are places where both compressInstruction and
uncompressInstruction are called that were not doing this copy, discarding some
debug info.

This change merely moves the copy into the generated file, so you cannot forget
to copy the location over when compressing or uncompressing.

Reviewers: asb, luismarques

Reviewed By: luismarques

Subscribers: sameer.abuasal, aprantl, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67493
2019-12-13 20:01:04 +00:00
Francesco Petrogalli 19f73f0d1b Revert "[VectorUtils] Introduce the Vector Function Database (VFDatabase)."
This reverts commit 0be81968a2.

The VFDatabase needs some rework to be able to handle vectorization
and subsequent scalarization of intrinsics in out-of-tree versions of
the compiler. For more details, see the discussion in
https://reviews.llvm.org/D67572.
2019-12-13 19:42:04 +00:00
Sanjay Patel 940600ae41 [InstSimplify] improve test coverage for insert+splat; NFC 2019-12-13 14:03:54 -05:00
Sanjay Patel 2f0c7fd2db [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc (2nd try)
The initial attempt (rG89633320) botched the logic by reversing
the source/dest types. Added x86 tests for additional coverage.
The vector tests show a potential improvement (fold vector load
instead of broadcasting), but that's a known/existing problem.

This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm

Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16

Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16

...because D58017 exposes a regression without this fold.
2019-12-13 14:03:54 -05:00
Hiroshi Yamauchi ed50e6060b [PGO][PGSO] Enable size optimizations in code gen / target passes for cold code.
Summary: Split off of D67120.

Reviewers: davidxl

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71288
2019-12-13 11:01:19 -08:00
Momchil Velikov 8e8e3181aa [ARM] Fix in ICE when retrieving the number of micro-ops for vlldm/vlstm
The big switch in `ARMBaseInstrInfo::getNumMicroOps` is missing cases for
`VLLDM` and `VLSTM`, which are currently defined with itineraries having a
dynamic count of micro-ops.

Assuming an optimistic case in which these instruction do not actually perform
loads or stores, and with the idea that Armv8-m cores are supposed to use the
new style scheduling models, this patch just sets the itinerary for those two
instructions to `NoItinerary`.

Differential Revision: https://reviews.llvm.org/D71266
2019-12-13 18:19:40 +00:00
Momchil Velikov d53e61863d [AArch64] Emit PAC/BTI .note.gnu.property flags
This patch make LLVM emit the processor specific program property types
defined in AArch64 ELF spec
https://developer.arm.com/docs/ihi0056/f/elf-for-the-arm-64-bit-architecture-aarch64-abi-2019q2-documentation

A file containing no functions gets both property flags.  Otherwise, a property
is set iff all the functions in the file have the corresponding attribute.

Patch by Daniel Kiss and Momchil Velikov.

Differential Revision: https://reviews.llvm.org/D71019
2019-12-13 17:38:20 +00:00
Fangrui Song f99eedeb72 [MC][PowerPC] Fix a crash when redefining a symbol after .set
Fix PR44284. This is probably not valid assembly but we should not crash.

Reviewed By: luporl, #powerpc, steven.zhang

Differential Revision: https://reviews.llvm.org/D71443
2019-12-13 09:31:54 -08:00
Mark Murray a2cd4600ec [ARM][MVE][Intrinsics] All vqdmulhq/vqrdmulhq tests should be for signed numbers.
Fix broken tests. I can't yet explain how they worked locally pre-commit.
2019-12-13 17:29:59 +00:00
Kristina Bessonova d5655c4d2e [llvm-dwarfdump][Statistics] Don't count coverage less than 1% as 0%
Summary:
This is a follow up for D70548.
Currently, variables with debug info coverage between 0% and 1% are put into
zero-bucket. D70548 changed the way statistics calculate a variable's coverage:
we began to use enclosing scope rather than a possible variable life range.
Thus more variables might be moved to zero-bucket despite they have some debug
info coverage.
The patch is to distinguish between a variable that has location info but
it's significantly less than its enclosing scope and a variable that doesn't
have it at all.

Reviewers: djtodoro, aprantl, dblaikie, avl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71070
2019-12-13 17:34:58 +03:00
Nicola Zaghen 97572775d2 Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.

This fixes the buildbot failures.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-13 14:30:21 +00:00
Sanjay Patel dc9e6ba90b [x86] add tests for shift-trunc-shift; NFC
More coverage for a possible generic transform.
2019-12-13 08:37:06 -05:00
Mikhail Maltsev 99581fd4c8 [ARM][MVE] Add vector reduction intrinsics with two vector operands
Summary:
This patch adds intrinsics for the following MVE instructions:
* VABAV
* VMLADAV, VMLSDAV
* VMLALDAV, VMLSLDAV
* VRMLALDAVH, VRMLSLDAVH

Each of the above 4 groups has a corresponding new LLVM IR intrinsic,
since the instructions cannot be easily represented using
general-purpose IR operations.

Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71062
2019-12-13 13:17:29 +00:00
Kristina Bessonova 1cc4b603ba [llvm-dwarfdump][Statistics] Change the coverage buckets representation. NFC
Summary:
This changes the representation of 'coverage buckets' in llvm-dwarfdump and
llvm-locstats to one that makes more clear what the buckets contain.

See some related details in D71070.

Reviewers: djtodoro, aprantl, cmtice, jhenderson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71366
2019-12-13 16:08:25 +03:00
Simon Tatham 25305a9311 [ARM][MVE] Add intrinsics for more immediate shifts.
Summary:
This fills in the remaining shift operations that take a single vector
input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and
`vshll[bt]` families.

`vshll[bt]` (which shifts each input lane left into a double-width
output lane) is the most interesting one. There are separate MC
instruction ids for shifting by exactly the input lane width and
shifting by less than that, because the instruction encoding is so
completely different for the lane-width special case. So I had to
write two sets of patterns to match based on the immediate shift
count, which involved adding a ComplexPattern matcher to avoid the
general-case pattern accidentally matching the special case too. For
that family I've made sure to add an llc codegen test for both
versions of each instruction.

I'm experimenting with a new strategy for parametrising the isel
patterns for all these instructions: adding extra fields to the
relevant `Instruction` subclass itself, which are ignored by the
Tablegen backends that generate the MC data, but can be retrieved from
each instance of that instruction subclass when it's passed as a
template parameter to the multiclass that generates its isel patterns.
A nice effect of that is that I can fill in those informational fields
using `let` blocks, rather than having to type them out once per
instruction at `defm` time.

(As a result, quite a lot of existing instruction `def`s are
reindented by this patch, so it's clearer to read with whitespace
changes ignored.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71458
2019-12-13 13:07:39 +00:00
Tim Renouf fce1a6f584 Revert "AMDGPU: Try to commute sub of boolean ext"
This reverts commit 69fcfb7d35.

As shown in the test I attached to this commit, the change I reverted
causes a problem with "zext(cc1) - zext(cc2)". It commuted
the operands to the sub and used different logic to select the addc/subc
instruction:
   sub zext (setcc), x => addcarry 0, x, setcc
   sub sext (setcc), x => subcarry 0, x, setcc

... but that is bogus. I believe it is not possible to fold those commuted
patterns into any form of addcarry or subcarry. It may have worked as
intended before "AMDGPU: Change boolean content type to 0 or 1" because
the setcc was considered to be -1 rather than 1.

Differential Revision: https://reviews.llvm.org/D70978

Change-Id: If2139421aa6c935cbd1d925af58fe4a4aa9e8f43
2019-12-13 12:49:06 +00:00
Djordje Todorovic baea913609 [llvm-locstats] Avoid the locstats when no scope bytes coverage found
If the total number of PC range bytes in each variable's enclosing scope
('scope bytes total') is 0, we will have division by zero.

Differential Revision: https://reviews.llvm.org/D71415
2019-12-13 13:45:44 +01:00
Sjoerd Meijer e91420e17d Revert "[ARM][MVE] findVCMPToFoldIntoVPS. NFC."
This reverts commit 9468e3334b.

There's a test that doesn't like this change. The RDA analysis
gets invalided by changes in the block, which is not taken into
account. Revert while I work on a fix for this.
2019-12-13 11:56:44 +00:00
Mark Murray 228c74076d [ARM][MVE][Intrinsics] Add *_x() variants of my *_m() intrinsics.
Summary:
Better use of multiclass is used, and this helped find some existing
bugs in the predicated VMULL* intrinsics, which are now fixed.

The refactored VMULL[TB]Q_(INT|POLY)_M() intrinsics were discovered
to have an argument ("inactive") with incorrect type, and this required
a fix that is included in this whole patch. The argument "inactive"
should have been the same width (per vector element) as the return
type of the intrinsic, but was not in the case where the return type
was double the element width of the input types.

To assist in testing the multiclassing , and to thwart further gremlins,
the unit tests are improved in scope.

The *.ll tests are all generated by a small bit of throw-away scripting
from the corresponding *.c tests, and as such the diffs are large and
nasty. Look at the file rather than the diff.

Reviewers: dmgreen, miyuki, ostannard, simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71421
2019-12-13 11:51:23 +00:00
Kerry McLaughlin 4194ca8e5a Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch
to use reg + imm non-temporal loads to fix previous test failures.

Original commit message:

Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
2019-12-13 10:08:20 +00:00
David Stenberg 5c7cc6f83d [LiveDebugValues] Omit entry values for DBG_VALUEs with pre-existing expressions
Summary:
This is a quickfix for PR44275. An assertion that checks that the
DIExpression is valid failed due to attempting to create an entry value
for an indirect parameter. This started appearing after D69028, as the
indirect parameter started being represented using an DW_OP_deref,
rather than with the DBG_VALUE's second operand, meaning that the
isIndirectDebugValue() check in LiveDebugValues did not exclude such
parameters. A DIExpression that has an entry value operation can
currently not have any other operation, leading to the failed isValid()
check.

This patch simply makes us stop considering emitting entry values
for such parameters. To support such cases I think we at least need
to do the following changes:

 * In DIExpression::isValid(): Remove the limitation that a
   DW_OP_LLVM_entry_value operation can be the only operation in a
   DIExpression.

 * In LiveDebugValues::emitEntryValues(): Create an entry value of size
   1, so that it only wraps the register operand, and not the whole
   pre-existing expression (the DW_OP_deref).

 * In LiveDebugValues::removeEntryValue(): Check that the new debug
   value has the same debug expression as the original, rather than
   checking that the debug expression is empty.

 * In DwarfExpression::addMachineRegExpression(): Modify the logic so
   that a DW_OP_reg* expression is emitted for the entry value.
   That is how GCC emits entry values for indirect parameters. That will
   currently not happen to due the DW_OP_deref causing the
   !HasComplexExpression to fail. The LocationKind needs to be changed
   also, rather than always emitting a DW_OP_stack_value for entry values.

There are probably more things I have missed, but that could hopefully
be a good starting point for emitting such entry values.

Reviewers: djtodoro, aprantl, jmorse, vsk

Reviewed By: aprantl, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D71416
2019-12-13 10:49:46 +01:00
Georgii Rymar 86e652f828 [yaml2obj] - Add a way to override sh_flags section field.
Currently we have the `Flags` property that allows to
set flags for a section. The problem is that it does not
allow us to set an arbitrary value, because of bit fields
validation under the hood. An arbitrary values can be used
to test specific broken cases.

We probably do not want to relax the validation, so this
patch adds a `ShSize` property that allows to
override the `sh_size`. It is inline with others `Sh*` properties
we have already.

Differential revision: https://reviews.llvm.org/D71411
2019-12-13 11:54:37 +03:00
Georgii Rymar 422b078c69 [llvm-readobj] - Fix letters used for dumping section types in GNU style.
I've noticed that when we have all regular flags set, we print "WAEXMSILoGTx"
instead of "WAXMSILOGTCE" printed by GNU readelf.

It happens because:
1) We print SHF_EXCLUDE at the wrong place.
2) We do not recognize SHF_COMPRESSED, we print "x" instead of "C".
3) We print "o" instead of "O" for SHF_OS_NONCONFORMING.

This patch fixes differences and adds test cases.

Differential revision: https://reviews.llvm.org/D71418
2019-12-13 11:31:24 +03:00
Nikita Popov 21fbd5587c Reapply [LVI] Normalize pointer behavior
This is a rebase of the change over D70376, which fixes an LVI cache
invalidation issue that also affected this patch.

-----

Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.

This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this change not fully NFC, because we can now fold
terminator icmps based on the dereferencability information in the
same block. This is the reason why I changed one JumpThreading test
(it would optimize the condition away without the change).

Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.

Differential Revision: https://reviews.llvm.org/D69914
2019-12-13 08:59:58 +01:00
Evgenii Stepanov dabd2622a8 hwasan: add tag_offset DWARF attribute to optimized debug info
Summary:
Support alloca-referencing dbg.value in hwasan instrumentation.
Update AsmPrinter to emit DW_AT_LLVM_tag_offset when location is in
loclist format.

Reviewers: pcc

Subscribers: srhines, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70753
2019-12-12 16:18:54 -08:00
Danilo Carvalho Grael 6bed43f3c4 [AArch64][SVE] Add integer arithmetic with immediate instructions.
Summary:
Add pattern matching for the following instructions:
- add, sub, subr, sqadd, sqsub, uqadd, uqsub

This patch required complex patterns to match the immediate with optinal left shift.

I re-used the Select function from the other SVE repo to implement the complext pattern.

I plan on doing another patch to also match constant vector of the same immediate.

Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71370
2019-12-12 17:28:46 -05:00
Johannes Doerfert 6abd01e462 [Attributor][FIX] Do treat byval arguments special
When we reason about the pointer argument that is byval we actually
reason about a local copy of the value passed at the call site. This was
not the case before and we wrongly introduced attributes based on the
surrounding function.

AAMemoryBehaviorArgument, AAMemoryBehaviorCallSiteArgument and
AANoCaptureCallSiteArgument are made aware of byval now. The code
to skip "subsuming positions" for reasoning follows a common pattern and
we should refactor it. A TODO was added.

Discovered by @efriedma as part of D69748.
2019-12-12 16:04:21 -06:00
Sanjay Patel 9432937190 Revert "[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc"
This reverts commit 8963332c33.
There was a logic bug typo in this code, but it wasn't visible in the asm for the tests.
2019-12-12 16:24:40 -05:00
Sanjay Patel 8963332c33 [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc
This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm

Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16

Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16

...because D58017 exposes a regression without this fold.
2019-12-12 15:44:13 -05:00
Teresa Johnson c8e0bb3b2c [LTO] Support for embedding bitcode section during LTO
Summary:
This adds support for embedding bitcode in a binary during LTO. The libLTO gains supports the `-lto-embed-bitcode` flag. The option allows users of the LTO library to embed a bitcode section. For example, LLD can pass the option via `ld.lld -mllvm=-lto-embed-bitcode`.

This feature allows doing something comparable to `clang -c -fembed-bitcode`, but on the (LTO) linker level. Having bitcode alongside native code has many use-cases. To give an example, the MacOS linker can create a `-bitcode_bundle` section containing bitcode. Also, having this feature built into LLVM is an alternative to 3rd party tools such as [[ https://github.com/travitch/whole-program-llvm | wllvm ]] or [[ https://github.com/SRI-CSL/gllvm | gllvm ]]. As with these tools, this feature simplifies creating "whole-program" llvm bitcode files, but in contrast to wllvm/gllvm it does not rely on a specific llvm frontend/driver.

Patch by Josef Eisl <josef.eisl@oracle.com>

Reviewers: #llvm, #clang, rsmith, pcc, alexshap, tejohnson

Reviewed By: tejohnson

Subscribers: tejohnson, mehdi_amini, inglorion, hiraditya, aheejin, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits, #llvm, #clang

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68213
2019-12-12 12:34:19 -08:00
Jonas Paulsson 61f5ba5c32 [SystemZ] Implement the packed stack layout
Any llvm function with the "packed-stack" attribute will be compiled to use
the packed stack layout which reuses unused parts of the incoming register
save area. This is needed for building the Linux kernel.

Review: Ulrich Weigand
https://reviews.llvm.org/D70821
2019-12-12 10:26:03 -08:00
Sanjay Patel 927a6614bc [AArch64][PowerPC] add tests for shift sandwich; NFC 2019-12-12 12:37:02 -05:00
Florian Hahn bd12a322d7 [BasicAA] Use GEP as context for computeKnownBits in aliasGEP.
In order to use assumptions, computeKnownBits needs a context
instruction. We can use the GEP, if it is an instruction. We already
pass the assumption cache, but it cannot be used without a context
instruction.

Reviewers: anemet, asbirlea, hfinkel, spatel

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D71264
2019-12-12 17:18:04 +00:00
Florian Hahn 526244b187 [Matrix] Add first set of matrix intrinsics and initial lowering pass.
This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html

The first patch introduces four new intrinsics (transpose, multiply,
columnwise load and store) and a LowerMatrixIntrinsics pass, that
lowers those intrinsics to vector operations.

Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters need to be ConstantInt.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.

For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.

This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:
 * Shape propagation to eliminate the embedding/splitting for each
   intrinsic.
 * Fused & tiled lowering of multiply and other operations.
 * Optimization remarks highlighting matrix expressions and costs.
 * Generate loops for operations on large matrixes.
 * More general block processing for operation on large vectors,
   exploiting shape information.

We would like to add dedicated transpose, columnwise load and store
intrinsics, even though they are not strictly necessary. For example, we
could instead emit a large shufflevector instruction instead of the
transpose. But we expect that to
  (1) become unwieldy for larger matrixes (even for 16x16 matrixes,
      the resulting shufflevector masks would be huge),
  (2) risk instcombine making small changes, causing us to fail to
      detect the transpose, preventing better lowerings

For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70456
2019-12-12 15:42:18 +00:00
Sjoerd Meijer 9468e3334b [ARM][MVE] findVCMPToFoldIntoVPS. NFC.
This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can
reimplement findVCMPToFoldIntoVPS with just a few calls to RDA.

Differential Revision: https://reviews.llvm.org/D71330
2019-12-12 15:41:20 +00:00
Sam Parker 1274ac3dc2 [ARM][MVE] Sink vector shift operand
Recommit e0b966643f. sub instructions were being generated for the
negated value, and for some reason they were the register only ones.
I think the problem was because I was grabbing the 'zero' from
vmovimm, which is a target constant. Now I'm just generating a new
Constant zero and so rsb instructions are now generated.

Original commit message:

The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 14:34:00 +00:00
James Henderson 84a9756a72 [llvm-dwarfdump] Add blank line after printing line table
This helps delineate it in the output from later tables or other output.

Reviewed by: JDevlieghere

Differential Revision: https://reviews.llvm.org/D71344
2019-12-12 14:06:10 +00:00
Sam Parker 021b613cdc [NFC][ARM] Add some test triples
Add thumb and thumb2 to a couple of the test files.
2019-12-12 13:51:39 +00:00
stozer e39e2b4a79 [DebugInfo] Prevent invalid fragments at ISel from dropping debug info
During SelectionDAG, if a value which is associated with a DBG_VALUE
needs to be split across multiple registers, the DBG_VALUE will be split
into a set of fragment expressions to recreate the original value.

If one or more of these fragments cannot be created, they would
previously be silently dropped, causing the old debug value to live past
its expiry date. This patch fixes this issue by keeping invalid
fragments while setting their value as Undef.

Differential revision: https://reviews.llvm.org/D70248
2019-12-12 12:28:39 +00:00
Georgii Rymar c752de0505 [llvm-readobj][test] - Add a test for testing regular section flags and cleanup flags testing.
This:
1) Adds a test for testing all section flags (`section-flags.test`).
2) Renames `sec-flags.test`->`section-arch-flags.test`
   and performs a clean up.
3) Removes `compression.zlib.style.elf-x86-64` binary and a test case
   for SHF_COMPRESSED flag, because them are now excessive.
4) Adds missing MIPS flags and a test for SHF_ARM_PURECODE.

Differential revision: https://reviews.llvm.org/D71333
2019-12-12 14:26:40 +03:00
Mirko Brkusanin d7357c52a4 [Mips] Add support for min/max/umin/umax atomics
In order to properly implement these atomic we need one register more than other
binary atomics. It is used for storing result from comparing values in addition
to the one that is used for actual result of operation.

https://reviews.llvm.org/D71028
2019-12-12 11:32:37 +01:00
Nicola Zaghen f798eb21ec Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same."
This reverts commit 5f6208778f.

This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll
const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.
2019-12-12 10:29:54 +00:00
Nicola Zaghen 5f6208778f [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-12 10:07:01 +00:00
Georgii Rymar fff9f049b2 [llvm-readobj][test] - Cleanup and split tests in tools/llvm-readobj folder.
tools/llvm-readobj currently contains tests that are either general for
all file types or that mix file types inside. This patch refactors
these test and leaves only general tests in that folder. All other
tests were moved to ELF/COFF/MachO and wasm accordingly.

I tried to minimize amount of changes, so most of the test parts
remained unchanged. Any further refactorings and improvements for
particular tests should be done independently from this patch.

Differential revision: https://reviews.llvm.org/D71269
2019-12-12 12:21:58 +03:00
Alexey Lapshin 71aaebc824 [DWARF5][DWARFVerifier] Check that Skeleton compilation unit does not have children.
That patch adds checking into DWARFVerifier that the Skeleton
compilation unit does not have children.

Differential Revision: https://reviews.llvm.org/D71244
2019-12-12 10:59:10 +03:00
Sam Parker f8ff3bf55b Revert "[ARM][MVE] Sink vector shift operand"
This reverts commit e0b966643f.

Instruction selection is failing with expensive checks.
2019-12-12 07:52:57 +00:00
Sam Parker e0b966643f [ARM][MVE] Sink vector shift operand
The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 07:35:21 +00:00
Wenlei He d275a06487 [AutoFDO] Statistic for context sensitive profile guided inlining
Summary: AutoFDO compilation has two places that do inlining - the sample profile loader that does inlining with context sensitive profile, and the regular inliner as CGSCC pass. Ideally we want most inlining to come from sample profile loader as that is driven by context sensitive profile and also retains context sensitivity after inlining. However the reality is most of the inlining actually happens during regular inliner. To track the number of inline instances from sample profile loader and help move more inlining to sample profile loader, I'm adding statistics and optimization remarks for sample profile loader's inlining.

Reviewers: wmi, davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70584
2019-12-11 21:37:21 -08:00
Puyan Lotfi f5b7a46837 [llvm][MIRVRegNamerUtils] Adding hashing on memoperands.
No more hash collisions for memoperands. Now the MIRCanonicalization
pass shouldn't hit hash collisions when dealing with nearly identical
memory accessing instructions when their memoperands are in fact different.

Differential Revision: https://reviews.llvm.org/D71328
2019-12-11 22:11:49 -05:00
Cameron McInally 7aa5c16088 [AArch64][SVE] Add patterns for scalable vselect
This patch matches scalable vector selects to predicated move instructions.

Differential Revision: https://reviews.llvm.org/D71298
2019-12-11 20:15:44 -06:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Sanjay Patel 83e1bd36be [AArch64][x86] add tests for possible infinite loops in DAGCombiner; NFC
This is a reduction of a test that failed (infinite looped)
with rGd1f0bdf2d2df (subsequently reverted). I've duplicated
it for 2 targets to increase coverage - everything down here
is wobbly.
2019-12-11 19:41:42 -05:00
Vedant Kumar 56232f950d Revert "[DWARF] Allow cross-CU references of subprogram definitions"
This reverts commit 30038da15b. It causes
the stage2 thinLTO bot to fail with:

Assertion failed: (CU.getDIE(CalleeSP) && "Expected declaration subprogram DIE for callee")

rdar://57840415
2019-12-11 15:55:48 -08:00
Sanjay Patel cdf5cfea8e Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()"
This reverts commit d1f0bdf2d2.
The patch can cause infinite loops in DAGCombiner.
2019-12-11 16:56:58 -05:00
Sam Clegg 881d877846 [WebAssembly] Add new `export_name` clang attribute for controlling wasm export names
This is equivalent to the existing `import_name` and `import_module`
attributes which control the import names in the final wasm binary
produced by lld.

This maps the existing

This attribute currently requires a string rather than using the
symbol name for a couple of reasons:

1. Avoid confusion with static and dynamic linking which is
   based on symbol name.  Exporting a function from a wasm module using
   this directive is orthogonal to both static and dynamic linking.
2. Avoids name mangling.

Differential Revision: https://reviews.llvm.org/D70520
2019-12-11 11:54:57 -08:00
Nikita Popov 8db5143b1a [InstCombine] Optimize overflow check base on uadd.with.overflow result
Fix for https://bugs.llvm.org/show_bug.cgi?id=40846.

This adds a combine for cases where a (a + b) < a style overflow
check is performed, but with a + b being the result of
uadd.with.overflow, so the overflow result is also already available
and we can just use it. Subsequently GVN/CSE will deduplicate the extracts.

We can run into this situation if you have both a uadd.with.overflow
and a manual add + overflow check in the same function (on the same
operands), in which case GVN will rewrite the add to the with.overflow
result and leave you with this pattern.

The implementation is a bit ugly because I'm handling the various
canonicalization edge cases.

This does not yet handle the negated version of this pattern.

Differential Revision: https://reviews.llvm.org/D58644
2019-12-11 20:52:04 +01:00
Danila Kutenin 19e83a9b4c [ValueTracking] Pointer is known nonnull after load/store
If the pointer was loaded/stored before the null check, the check
is redundant and can be removed. For now the optimizers do not
remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG.
The patch allows to use more nonnull constraints. Also, it found
one more optimization in some PowerPC test. This is my first llvm
review, I am free to any comments.

Differential Revision: https://reviews.llvm.org/D71177
2019-12-11 20:32:29 +01:00
Danila Kutenin fc765698e0 [ValueTracking] Add tests for non-null check after load/store; NFC
Tests for D71177.
2019-12-11 20:26:31 +01:00
Nikita Popov b361d3bbcd [MergeFuncs] Remove incorrect attribute copying
Fix for https://bugs.llvm.org/show_bug.cgi?id=44236. This code was
originally introduced in rG36512330041201e10f5429361bbd79b1afac1ea1.
However, the attribute copying was done in the wrong place (in general
call replacement, not thunk generation) and a proper fix was
implemented in D12581.

Previously this code was just unnecessary but harmless (because
FunctionComparator ensured that the attributes of the two functions
are exactly the same), but since byval was changed to accept a type
this copying is actively wrong and may result in malformed IR.

Differential Revision: https://reviews.llvm.org/D71173
2019-12-11 20:09:54 +01:00
Andrzej Warzynski a75463c471 Add intrinsics for unary narrowing operations
Summary:
The following intrinsics for unary narrowing operations are added:
 * @llvm.aarch64.sve.sqxtnb
 * @llvm.aarch64.sve.uqxtnb
 * @llvm.aarch64.sve.sqxtunb
 * @llvm.aarch64.sve.sqxtnt
 * @llvm.aarch64.sve.uqxtnt
 * @llvm.aarch64.sve.sqxtunt

Reviewers: sdesmalen, rengolin, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71270
2019-12-11 18:55:51 +00:00
Florian Hahn 2675a3c880 [AArch64] Be more careful to skip debug operands in LdSt Optimizier.
This fixes crashes with $noreg operands.
2019-12-11 18:47:45 +00:00
Sanjay Patel d1f0bdf2d2 [SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets
to show the problem more generally.

We check the number of uses as a predicate for whether some
value is free to negate, but that use count can change as we
rewrite the expression in getNegatedExpression(). So something
that was marked free to negate during the cost evaluation
phase becomes not free to negate during the rewrite phase (or
the inverse - something that was not free becomes free).
This can lead to a crash/assert because we expect that
everything in an expression that is negatible to be handled
in the corresponding code within getNegatedExpression().

This patch skips the use check during the rewrite phase.
So we determine that some expression isNegatibleForFree
(identically to without this patch), but during the rewrite,
don't rely on use counts to decide how to create the optimal
expression.

Differential Revision: https://reviews.llvm.org/D70975
2019-12-11 13:30:39 -05:00
Bardia Mahjour 916d37a2bc [DA] Improve dump to show source and sink of the dependence
Summary:
The current da printer shows the dependence without indicating
which instructions are being considered as the src vs dst. It
also silently ignores call instructions, despite the fact that
they create confused dependence edges to other memory
instructions. This patch addresses these two issues plus a
couple of minor non-functional improvements.

Authored By: bmahjour

Reviewer: dmgreen, fhahn, philip.pfaffe, chandlerc

Reviewed By: dmgreen, fhahn

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71088
2019-12-11 11:48:16 -05:00
Florian Hahn 4fe92abceb [AArch64] Skip debug ops with regsOverlap in AArch64 LD/ST opt.
This fixes a crash when debug instructions are in between 2 stores.
2019-12-11 16:26:31 +00:00
Ulrich Weigand 5ad67df988 [SystemZ] Add llvm.minimum / llvm.maximum tests
The backend already supports the @llvm.minimum and @llvm.maximum
intrinsics, but we had no test cases for those.  Add tests.
2019-12-11 17:01:13 +01:00
Craig Topper 3adc819b7a [X86] Erase dead LEA instruction after converting it to MOV in FixupLEAPass::processInstrForSlow3OpLEA. 2019-12-11 07:51:23 -08:00
Ulrich Weigand ac473394ff [SystemZ] Fix 128-bit strict FMA expansion pre-z14
Before z14, we did not have any FMA instruction for 128-bit
floating-point, so the @llvm.fma.f128 intrinsic needs to be
expanded to a libcall on those platforms.

This worked correctly for regular FMA, but was implemented
incorrectly for the strict version.  This was not noticed
because we did not have test coverage for this case.

This patch fixes that incorrect expansion and adds the
missing test cases.
2019-12-11 16:32:08 +01:00
Diogo Sampaio ee21934588 [ARM][NFC] Change test to use CHECK-NEXT 2019-12-11 14:25:36 +00:00
Matt Arsenault 49d731b5e0 Verifier: Check frame-pointer attribute values
There are a few places that check specific string attributes have
particular values, and assert if they are something else. The verifier
should catch these kinds of cases.
2019-12-11 19:53:49 +05:30
Matt Arsenault 32137699f7 AMDGPU: Fix copy-pasted test name error 2019-12-11 19:44:47 +05:30
Kerry McLaughlin c0a3ab3655 Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
This reverts commit 3f5bf35f86 as it was
causing build failures in llvm-clang-x86_64-expensive-checks:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045
2019-12-11 13:58:39 +00:00
Florian Hahn 17554b8961 [AArch64] Teach Load/Store optimizier to rename store operands for pairing.
In some cases, we can rename a store operand, in order to enable pairing
of stores.  For store pairs, that cannot be merged because the first
tored register is defined in between the second store, we try to find
suitable rename register.

First, we check if we can rename the given register:

1. The first store register must be killed at the store, which means we
   do not have to rename instructions after the first store.
2. We scan backwards from the first store, to find the definition of the
   stored register and check all uses in between are renamable. Along
   they way, we collect the minimal register classes of the uses for
   overlapping (sub/super)registers.

Second, we try to find an available register from the minimal physical
register class of the original register. A suitable register must not be

1. defined before FirstMI
2. between the previous definition of the register to rename
3. a callee saved register.

We use KILL flags to clear defined registers while scanning from the
beginning to the end of the block.

This triggers quite often, here are the top changes for MultiSource,
SPEC2000, SPEC2006 compiled with -O3 for iOS:

Metric: aarch64-ldst-opt.NumPairCreated

Program                                        base     patch    diff
 test-suite...nch/fourinarow/fourinarow.test     2.00    39.00   1850.0%
 test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test    46.00    80.00   73.9%
 test-suite...chmarks/Olden/power/power.test    70.00    96.00   37.1%
 test-suite...cations/hexxagon/hexxagon.test    29.00    39.00   34.5%
 test-suite...nchmarks/McCat/05-eks/eks.test   100.00   132.00   32.0%
 test-suite.../Trimaran/enc-rc4/enc-rc4.test    46.00    59.00   28.3%
 test-suite...T2006/473.astar/473.astar.test   160.00   200.00   25.0%
 test-suite.../Trimaran/enc-md5/enc-md5.test     8.00    10.00   25.0%
 test-suite...telecomm-gsm/telecomm-gsm.test   113.00   139.00   23.0%
 test-suite...ediabench/gsm/toast/toast.test   113.00   139.00   23.0%
 test-suite...Source/Benchmarks/sim/sim.test    91.00   111.00   22.0%
 test-suite...C/CFP2000/179.art/179.art.test    41.00    49.00   19.5%
 test-suite...peg2/mpeg2dec/mpeg2decode.test   245.00   279.00   13.9%
 test-suite...marks/Olden/health/health.test    16.00    18.00   12.5%
 test-suite...ks/Prolangs-C/cdecl/cdecl.test    90.00   101.00   12.2%
 test-suite...fice-ispell/office-ispell.test    91.00   100.00    9.9%
 test-suite...oxyApps-C/miniGMG/miniGMG.test   430.00   465.00    8.1%
 test-suite...lowfish/security-blowfish.test    39.00    42.00    7.7%
 test-suite.../Applications/spiff/spiff.test    42.00    45.00    7.1%
 test-suite...arks/mafft/pairlocalalign.test   2473.00  2646.00   7.0%
 test-suite.../VersaBench/ecbdes/ecbdes.test    29.00    31.00    6.9%
 test-suite...nch/beamformer/beamformer.test   220.00   235.00    6.8%
 test-suite...CFP2000/177.mesa/177.mesa.test   2110.00  2252.00   6.7%
 test-suite...ve-susan/automotive-susan.test   109.00   116.00    6.4%
 test-suite...s-C/unix-smail/unix-smail.test    65.00    69.00    6.2%
 test-suite...CI_Purple/SMG2000/smg2000.test   1194.00  1265.00   5.9%
 test-suite.../Benchmarks/nbench/nbench.test   472.00   500.00    5.9%
 test-suite...oxyApps-C/miniAMR/miniAMR.test   248.00   262.00    5.6%
 test-suite...quoia/CrystalMk/CrystalMk.test    18.00    19.00    5.6%
 test-suite...rks/tramp3d-v4/tramp3d-v4.test   7331.00  7710.00   5.2%
 test-suite.../Benchmarks/Bullet/bullet.test   5651.00  5938.00   5.1%
 test-suite...ternal/HMMER/hmmcalibrate.test   750.00   788.00    5.1%
 test-suite...T2006/456.hmmer/456.hmmer.test   764.00   802.00    5.0%
 test-suite...ications/JM/ldecod/ldecod.test   1028.00  1079.00   5.0%
 test-suite...CFP2006/444.namd/444.namd.test   1368.00  1434.00   4.8%
 test-suite...marks/7zip/7zip-benchmark.test   4471.00  4685.00   4.8%
 test-suite...6/464.h264ref/464.h264ref.test   3122.00  3271.00   4.8%
 test-suite...pplications/oggenc/oggenc.test   1497.00  1565.00   4.5%
 test-suite...T2000/300.twolf/300.twolf.test   742.00   774.00    4.3%
 test-suite.../Prolangs-C/loader/loader.test    24.00    25.00    4.2%
 test-suite...0.perlbench/400.perlbench.test   1983.00  2058.00   3.8%
 test-suite...ications/JM/lencod/lencod.test   4612.00  4785.00   3.8%
 test-suite...yApps-C++/PENNANT/PENNANT.test   995.00   1032.00   3.7%
 test-suite...arks/VersaBench/dbms/dbms.test    54.00    56.00    3.7%

Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D70450
2019-12-11 13:50:11 +00:00
James Henderson 5224feb7ca [test][llvm-dwarfdump] Add missing testing for some --debug-* options
A number of the --debug-* options in llvm-dwarfdump are not particularly
well tested. In some cases, the option is only tested as part of testing
another feature, or a specific part of the section that the options
dump. This change adds four new tests to address some of these holes. It
is not aiming to address every hole however.

I kept the --debug-line switch test separate to X86/brief.s because the
latter only considers the parts of the line table that are affected by
verbose printing, thus missing out things like the header and different
values for things like the Line, Column etc registers.

Reviewed by: JDevlieghere

Differential Revision: https://reviews.llvm.org/D71276
2019-12-11 13:42:54 +00:00
Andrzej Warzynski 65651f197a [AArch64][SVE] Add DAG combine rules for gather loads and sext/zext
Summary:
These changes allow us to support sign-extending gather loads with the
exisiting intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*).

Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential revision: https://reviews.llvm.org/D70812
2019-12-11 12:56:18 +00:00
Georgii Rymar 9a5c849991 [llvm-readobj][llvm-readelf] - Remove excessive empty lines when reporting errors and warnings.
After recent changes it is now seems possible to get rid of
printing '\n' before each error and warning. This makes the output
cleaner.

Differential revision: https://reviews.llvm.org/D71246
2019-12-11 15:06:33 +03:00
Oliver Stannard 6ae3d310bd Revert "Reland [AArch64][MachineOutliner] Return address signing for outlined functions"
This reverts commit cec2d5c174.

Reverting because this is still creating outlined functions with return
address signing instructions with mismatches SP values. For example:

  int *volatile v;

  void foo(int x) {
    int a[x];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
  }

  void bar(int x) {
    int a[x];
    v = 0;
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
  }

This generates these two outlined functions, both of which modify SP
between the paciasp and retaa instructions:

  $ clang --target=aarch64-arm-none-eabi -march=armv8.3-a -c test2.c -o - -S -Oz -mbranch-protection=pac-ret+leaf
  ...
  OUTLINED_FUNCTION_0:                    // @OUTLINED_FUNCTION_0
          .cfi_sections .debug_frame
          .cfi_startproc
  // %bb.0:
          paciasp
          .cfi_negate_ra_state
          mov     w8, w0
          lsl     x8, x8, #2
          add     x8, x8, #15             // =15
          mov     x9, sp
          and     x8, x8, #0x7fffffff0
          sub     x8, x9, x8
          mov     x29, sp
          mov     sp, x8
          adrp    x9, v
          retaa
  ...
  OUTLINED_FUNCTION_1:                    // @OUTLINED_FUNCTION_1
          .cfi_startproc
  // %bb.0:
          paciasp
          .cfi_negate_ra_state
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          mov     sp, x29
          retaa
2019-12-11 12:06:20 +00:00
Simon Tatham 1fed9a0c0c [TableGen] Add bang-operators !getop and !setop.
Summary:
These allow you to get and set the operator of a dag node, without
affecting its list of arguments.

`!getop` is slightly fiddly because in many contexts you need its
return value to have a static type more specific than 'any record'. It
works to say `!cast<BaseClass>(!getop(...))`, but it's cumbersome, so
I made `!getop` take an optional type suffix itself, so that can be
written as the shorter `!getop<BaseClass>(...)`.

Reviewers: hfinkel, nhaehnle

Reviewed By: nhaehnle

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71191
2019-12-11 12:05:22 +00:00
Kerry McLaughlin 3f5bf35f86 [AArch64][SVE] Implement intrinsics for non-temporal loads & stores
Summary:
Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.

Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71000
2019-12-11 11:13:51 +00:00
czhengsz bf4580b7e7 [PowerPC][NFC] add test case for lwa - loop ds form prep 2019-12-11 06:10:11 -05:00
Sjoerd Meijer d97cf1f889 [ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.

To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.

This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:

    ..
    dlstp.32  lr, r2
  .LBB0_1:
    mov r3, r2
    subs  r2, #4
    vldrh.u32 q2, [r1], #8
    vmov  q1, q0
    vmla.u32  q0, q2, r0
    letp  lr, .LBB0_1
  @ %bb.2:
    vctp.32 r3
    ..

which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.

Differential Revision: https://reviews.llvm.org/D71007
2019-12-11 10:20:19 +00:00
Simon Tatham bd0f271c9e [ARM][MVE] Add intrinsics for immediate shifts. (reland)
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen, MarkMurrayARM

Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-11 10:10:09 +00:00
Sam Parker ee7579409b [ARM][TypePromotion] Enable by default
Enable the TypePromotion pass my default (again).

This patch was originally committed in 393dacacf7.
This patch was reverted in a38396939c.

Differential Revision: https://reviews.llvm.org/D70998
2019-12-11 10:00:16 +00:00
Georgii Rymar 445c3fdd2a [llvm-readelf] - Do no print an empty symbol version as "<corrupt>"
It is discussed here https://reviews.llvm.org/D71118#inline-643172

Currently when a version is empty, llvm-readelf prints:
"000:   0 (*local*)       2 (<corrupt>)"

But GNU readelf does not treat empty section as corrupt.
There is no sense in having empty versions anyways it seems, but
this change is for consistency with GNU.

Differential revision: https://reviews.llvm.org/D71243
2019-12-11 12:24:37 +03:00
Martin Storsjö af39708c2d [llvm-readobj] Fix/improve printing WinEH unwind info for linked PE images
ARMWinEHPrinter was already designed to handle linked PE images
(since d2941b43f4), but resolving symbols didn't consistently
take the image base into account (as linked images seldom have a
symbol table, except for in MinGW setups).

Win64EHDumper wasn't really designed to handle linked images (it would
crash if executed on such a file), but a few concepts (getSymbol,
taking a virtual address instead of a relocation, and
getSectionContaining for finding the section containing a certain
virtual address) can be borrowed from ARMWinEHPrinter.

Adjust ARMWinEHPrinter to print the address of the exception handler
routine as a VA instead of an RVA, consistently with other addresses
in the same printout, and make Win64EHDumper print addresses similarly
for image cases.

Differential Revision: https://reviews.llvm.org/D71303
2019-12-11 10:20:34 +02:00
QingShan Zhang f99297176c [PowerPC] Exploitate the Vector Integer Average Instructions
PowerPC has instruction to do the semantics of this piece of code:

vector int foo(vector int m, vector int n) {
  return (m + n + 1) >> 1;
}
This patch is adding the match rule to select it.

Differential Revision: https://reviews.llvm.org/D71002
2019-12-11 07:25:57 +00:00
Nico Weber caa4120906 Revert "[DebugInfo] Refactored macro related generation, added a test case for macinfo.dwo emission."
This reverts commit 307f60a1a3.

DebugInfo/X86/debug-macinfo-split-dwarf.ll fails on Windows:

Command Output (stdout):
--
$ ":" "RUN: at line 1"
$ "c:\src\llvm-project\out\gn\bin\llc.exe" "-mtriple=x86_64-pc-windows-gnu" "-O0" "-split-dwarf-file=foo.dwo" "-filetype=obj"
Assertion failed: Section && "Cannot switch to a null section!", file ../../llvm/lib/MC/MCStreamer.cpp, line 1103
Stack dump:
0.	Program arguments: c:\src\llvm-project\out\gn\bin\llc.exe -mtriple=x86_64-pc-windows-gnu -O0 -split-dwarf-file=foo.dwo -filetype=obj
2019-12-10 21:32:30 -05:00
Fangrui Song 4d53b99c5d [llvm-ar] Improve tool selection heuristic
If llvm-ar is installed at arm-pokymllib32-linux-gnueabi-llvm-ar, it may
think it is llvm-lib due to the "lib" substring.

Improve the heuristic to make all the following work as intended:

llvm-ar-9 (llvm-9 package on Debian)
llvm-ranlib.exe
Lib.exe (reported by D44808)
arm-pokymllib32-linux-gnueabi-llvm-ar (reported by D71030)

Reviewed By: raj.khem, rupprecht

Differential Revision: https://reviews.llvm.org/D71302
2019-12-10 17:32:50 -08:00
Craig Topper 935d41e4bd [X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256
This is an improvement to 88dacbd436
2019-12-10 17:29:02 -08:00
Puyan Lotfi f364686f34 [llvm][MIRVRegNamerUtil] Adding hashing against MachineInstr flags.
Now, flags will result in differing hashes for a given MI. In effect, if
you have two instructions with everything identical except for their
flags then you should get two different hashes and fewer collisions.

Differential Revision: https://reviews.llvm.org/D70479
2019-12-10 20:16:14 -05:00
Wang, Pengfei 21bc8631fe [FPEnv][X86] Constrained FCmp intrinsics enabling on X86
Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision.

Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70582
2019-12-11 08:23:09 +08:00