Commit Graph

40958 Commits

Author SHA1 Message Date
Hans Wennborg aeacdc258b IRMover: Avoid accidentally mapping types from the destination module (PR30799)
During Module linking, it's possible for SrcM->getIdentifiedStructTypes();
to return types that are actually defined in the destination module
(DstM). Depending on how the bitcode file was read,
getIdentifiedStructTypes() might do a walk over all values, including
metadata nodes, looking for types. In my case, a debug info metadata
node was shared between the two modules, and it referred to a type
defined in the destination module (see test case).

Differential Revision: https://reviews.llvm.org/D26212

llvm-svn: 287353
2016-11-18 17:33:05 +00:00
Simon Pilgrim 7bde5df5f0 [X86][AVX512] Split AVX512F/AVX512VL tests to demonstrate missed int2fp opportunities without AVX512VL
llvm-svn: 287348
2016-11-18 15:31:36 +00:00
Tom Stellard df613198c0 GlobalISel: Fix unconditional fallback with global isel abort is disabled
Reviewers: t.p.northover, ab, qcolombet

Subscribers: mehdi_amini, vkalintiris, wdng, dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D26765

llvm-svn: 287344
2016-11-18 14:14:35 +00:00
Tom Stellard 01e65d2cfc AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts
Summary:
The 32-bit instructions don't zero the high 16-bits like the 16-bit
instructions do.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26828

llvm-svn: 287342
2016-11-18 13:53:34 +00:00
Florian Hahn 77382be56b [simplifycfg][loop-simplify] Preserve loop metadata in 2 transformations.
insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now
propagates existing llvm.loop metadata to newly the added backedge.

llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp
now propagates existing llvm.loop metadata to the branch instructions in the
predecessor blocks of the empty block that is removed.

Differential Revision: https://reviews.llvm.org/D26495

llvm-svn: 287341
2016-11-18 13:12:07 +00:00
Nicolai Haehnle ce2b589df5 AMDGPU: Fix legalization of MUBUF instructions in shaders
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions.  This affects
e.g.  shaders that access buffer textures.

Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention.  If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26747

llvm-svn: 287339
2016-11-18 11:55:52 +00:00
Ehsan Amiri ff0942e6ea [Power9] Add patterns for vnegd, vnegw
Exploit new instructions by adding patterns to .td file.
https://reviews.llvm.org/D26551

llvm-svn: 287334
2016-11-18 11:05:55 +00:00
Simon Pilgrim 3e5045e8f1 [X86][AVX2] Add v8i32->v8i64 mul test (PR30845)
llvm-svn: 287332
2016-11-18 11:00:36 +00:00
Ehsan Amiri 85818684c6 [PPC][DAGCombine] Convert SETCC to subtract when the result is zero extended
When we see a SETCC whose only users are zero extend operations, we can replace
it with a subtraction. This results in doing all calculations in GPRs and
avoids CR use.

Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are
ways that this can be extended. For example for signed condition codes. In that
case we will be introducing additional sign extend instructions, so more careful
profitability analysis may be required.

Another direction to extend this is for equal, not equal conditions. Also when
users of SETCC are any_ext or sign_ext, we might be able to do something 
similar.

llvm-svn: 287329
2016-11-18 10:41:44 +00:00
Craig Topper 1de753f7f5 [InstCombine][AVX-512] Teach InstCombineCalls how to handle the intrinsics for variable shift with 16-bit elements.
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches.

llvm-svn: 287316
2016-11-18 06:04:33 +00:00
Craig Topper 02b5a1b50f [AVX-512] Replace masked 16-bit element variable shift intrinsics with new unmasked versions and selects.
The same thing was done to 32-bit and 64-bit element sizes previously.

This will allow us to support these shuffls in InstCombineCalls along with the other variable shift intrinsics.

llvm-svn: 287312
2016-11-18 05:04:44 +00:00
Matt Arsenault 742deb2495 AMDGPU: Fix crash on illegal type for inlineasm
There are still crashes on non-MVT types in other
places.

llvm-svn: 287310
2016-11-18 04:42:57 +00:00
Alexei Starovoitov 8f9f8210c1 convert bpf assembler to look like kernel verifier output
since bpf instruction set was introduced people learned to
read and understand kernel verifier output whereas llvm asm
output stayed obscure and unknown. Convert llvm to emit
assembler text similar to kernel to avoid this discrepancy

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287300
2016-11-18 02:32:35 +00:00
Craig Topper 07f1c15995 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Anna Zaks 9cd5ed1241 [asan] Turn on Mach-O global metadata liveness tracking by default
This patch turns on the metadata liveness tracking since all known issues
have been resolved. The future has been implemented in
https://reviews.llvm.org/D16737 and enables support of dead code stripping
option on Mach-O platforms.

As part of enabling the feature, I also plan on reverting the following
patch to compiler-rt:

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160704/369910.html

Differential Revision: https://reviews.llvm.org/D26772

llvm-svn: 287235
2016-11-17 16:55:40 +00:00
Konstantin Zhuravlyov 0a1a7b6b23 Revert "AMDGPU: Enable ConstrainCopy DAG mutation"
This reverts commit r287146.

This breaks few conformance tests.

llvm-svn: 287233
2016-11-17 16:41:49 +00:00
Simon Pilgrim 8eca5520dc [X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves
vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves.

If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer).

This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases).

Partial fix for PR30845

Differential Revision: https://reviews.llvm.org/D26590

llvm-svn: 287223
2016-11-17 12:14:49 +00:00
Pablo Barrio c41e856f53 [ARM] Relax restriction on variadic functions for tailcall optimization
Summary:
Variadic functions can be treated in the same way as normal functions
with respect to the number and types of parameters.

Reviewers: grosbach, olista01, t.p.northover, rengolin

Subscribers: javed.absar, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D26748

llvm-svn: 287219
2016-11-17 10:56:58 +00:00
Oren Ben Simhon 489d6eff4f [X86] RegCall - Handling v64i1 in 32/64 bit target
Register Calling Convention defines a new behavior for v64i1 types.
This type should be saved in GPR.
However for 32 bit machine we need to split the value into 2 GPRs (because each is 32 bit).

Differential Revision: https://reviews.llvm.org/D26181

llvm-svn: 287217
2016-11-17 09:59:40 +00:00
Sanjoy Das 4a8fe09040 [ImplicitNullCheck] Fix an edge case where we were hoisting incorrectly
ImplicitNullCheck keeps track of one instruction that the memory
operation depends on that it also hoists with the memory operation.
When hoisting this dependency, it would sometimes clobber a live-in
value to the basic block we were hoisting the two things out of.  Fix
this by explicitly looking for such dependencies.

I also noticed two redundant checks on `MO.isDef()` in IsMIOperandSafe.
They're redundant since register MachineOperands are either Defs or Uses
-- there is no third kind.  I'll change the checks to asserts in a later
commit.

llvm-svn: 287213
2016-11-17 07:29:40 +00:00
Craig Topper dfaf9201cb [X86] Add a test case where, due to a bug in selectScalarSSELoad, we fold the same load twice.
llvm-svn: 287210
2016-11-17 05:37:39 +00:00
Konstantin Zhuravlyov 20ba24e231 [AMDGPU] Add missing test for rL287203
llvm-svn: 287204
2016-11-17 04:33:20 +00:00
Konstantin Zhuravlyov 3f0cdc7a11 [AMDGPU] Promote f16/i16 conversions to f32/i32
llvm-svn: 287201
2016-11-17 04:00:46 +00:00
Konstantin Zhuravlyov 662e01dfbe [AMDGPU] Expand `br_cc` for f16
Differential Revision: https://reviews.llvm.org/D26732

llvm-svn: 287199
2016-11-17 03:49:01 +00:00
Dehao Chen 41d72a8632 Use profile info to adjust loop unroll threshold.
Summary:
For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold.
For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling.

Reviewers: davidxl, mzolotukhin

Subscribers: sanjoy, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D26527

llvm-svn: 287186
2016-11-17 01:17:02 +00:00
Dylan McKay 48c26b2b12 [AVR] Remove some accidentally-commited code that broke the bots
This is a remnant of an on-chip unit testing tool that has since been
moved out-of-tree.

It was accidentally committed in r287162.

llvm-svn: 287180
2016-11-17 00:09:38 +00:00
Peter Collingbourne f72a8d4e08 Introduce GlobalSplit pass.
This pass splits globals into elements using inrange annotations on
getelementptr indices.

Differential Revision: https://reviews.llvm.org/D22295

llvm-svn: 287178
2016-11-16 23:40:26 +00:00
Dylan McKay 6dd69032c9 [AVR] Fix basic block naming in ctlz and cttz tests
The branch selector would change the names.

llvm-svn: 287174
2016-11-16 22:48:38 +00:00
Dylan McKay 9701c42de9 [AVR] Add tests for counting leading/trailing zeros
This adds two test files that verify the 'cttz' and 'ctlz' operations.

llvm-svn: 287172
2016-11-16 22:38:43 +00:00
Sanjay Patel 066139a3ec [x86] allow FP-logic ops when one operand is FP and result is FP
We save an inter-register file move this way. If there's any CPU where
the FP logic is slower, we could transform this back to int-logic in 
MachineCombiner.

This helps, but doesn't solve, PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

The 'andn' test shows that we're missing a pattern match to
recognize the xor with -1 constant as a 'not' op.

llvm-svn: 287171
2016-11-16 22:34:05 +00:00
Kevin Enderby 7fa40c9f2b General clean up of error handling in llvm-objdump to remove its use of report_fatal_error().
No real functional change with this commit.

The problem with report_fatal_error() is it does not include the tool name
and the file name the for which the error message was generated.

Uses of report_fatal_error() were change to report_error() or error()
to get a better error and to make the code smaller and cleaner.

Also changed things like error(errorToErrorCode(SOrErr.takeError())) to
use report_error() with a file name and the llvm::Error (as well as the
ArchitectureName if available) so the error message is printed.

llvm-svn: 287163
2016-11-16 22:17:38 +00:00
Dylan McKay a789f40002 [AVR] Add the pseudo instruction expansion pass
Summary:
A lot of the pseudo instructions are required because LLVM assumes that
all integers of the same size as the pointer size are legal. This means
that it will not currently expand 16-bit instructions to their 8-bit
variants because it thinks 16-bit types are legal for the operations.

This also adds all of the CodeGen tests that required the pass to run.

Reviewers: arsenm, kparzysz

Subscribers: wdng, mgorny, modocache, llvm-commits

Differential Revision: https://reviews.llvm.org/D26577

llvm-svn: 287162
2016-11-16 21:58:04 +00:00
Sanjoy Das df4b162e4d [ImplicitNullChecks] Do not not handle call MachineInstrs
We don't track callee clobbered registers correctly, so avoid hoisting
across calls.

Note: for this bug to trigger we need a `readonly` call target, since we
already have logic to not hoist across potentially storing instructions
either.

llvm-svn: 287159
2016-11-16 21:45:22 +00:00
Peter Collingbourne 7a74803abf Bitcode: Introduce initial multi-module reader API.
Implement getLazyBitcodeModule() and parseBitcodeFile() in terms of it.

Differential Revision: https://reviews.llvm.org/D26719

llvm-svn: 287156
2016-11-16 21:44:45 +00:00
Tim Northover 397f9d9d05 ARM: fix CodeGen for 64-bit shifts.
One half of the shifts obviously needed conditional selection based on whether
the shift amount is more than 32-bits, but leaving the other half as the
natural shift isn't acceptable either: it's undefined behaviour to shift a
32-bit value by more than 31.

llvm-svn: 287149
2016-11-16 20:54:28 +00:00
Matt Arsenault 3b36bb1d87 AMDGPU: Enable ConstrainCopy DAG mutation
This fixes a probably unintended divergence from the default
scheduler behavior.

llvm-svn: 287146
2016-11-16 20:35:23 +00:00
Geoff Berry 8301c645c8 [AArch64] Handle vector types in replaceZeroVectorStore.
Summary:
Extend replaceZeroVectorStore to handle more vector type stores,
floating point zero vectors and set alignment more accurately on split
stores.

This is a follow-up change to r286875.

This change fixes PR31038.

Reviewers: MatzeB

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D26682

llvm-svn: 287142
2016-11-16 19:35:19 +00:00
Tom Stellard 0d162b1c4f AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass
Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies with of registers with immediate values with v_mov/s_mov
   instructions.

The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.

This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23408

llvm-svn: 287131
2016-11-16 18:42:17 +00:00
Sanjay Patel 7f3d51f840 [x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes
We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. 
Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of 
compilers, but logically equivalent int, float, and double variants of bitwise-logic 
instructions are reality in x86, and the float variant may be a shorter instruction 
depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all 
the time.

This is a preliminary step towards solving PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

Differential Revision:
https://reviews.llvm.org/D26712

llvm-svn: 287122
2016-11-16 17:42:40 +00:00
Reid Kleckner 3a83e76811 [sancov] Name the global containing the main source file name
If the global name doesn't start with __sancov_gen, ASan will insert
unecessary red zones around it.

llvm-svn: 287117
2016-11-16 16:50:43 +00:00
Simon Pilgrim 79416ea76a [X86] Add integer division test for PR23590
Shows missed opportunity to recognise reduced integer division result size

llvm-svn: 287110
2016-11-16 14:54:34 +00:00
Simon Pilgrim b57dd17142 [X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.

LLVM counterpart to D26686

Differential Revision: https://reviews.llvm.org/D26736

llvm-svn: 287108
2016-11-16 14:48:32 +00:00
Simon Pilgrim 9e355bc5bb [X86][AVX512] Added some mask/maskz tests for sitofp/uitofp i32 to f64
llvm-svn: 287106
2016-11-16 14:24:04 +00:00
Simon Pilgrim c223aa52b1 [X86] Regenerated integer divide tests to test on 32 and 64 bit targets
llvm-svn: 287104
2016-11-16 14:12:11 +00:00
Simon Pilgrim dd8c71c646 [X86][SSE] Added PSUBUS from SELECT tests from D25987
llvm-svn: 287103
2016-11-16 13:59:03 +00:00
Simon Dardis 8ca1cbccc6 [mips] Fix unsigned/signed type error
MipsFastISel uses a a class to represent addresses with a signed member
to represent the offset. MipsFastISel::emitStore, emitLoad and computeAddress
all treated the offset as being positive. In cases where the offset was
actually negative and a frame pointer was used, this would cause the constant
synthesis routine to crash as it would generate an unexpected instruction
sequence when frame indexes are replaced.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D26192

llvm-svn: 287099
2016-11-16 11:29:07 +00:00
Simon Dardis 7b7cb8d9dd [mips] not instruction alias
This patch adds the single operand form of the not alias to microMIPS and
MIPS along with additional tests.

This partially resolves PR/30381.

Thanks to Sean Bruno for reporting the issue!

llvm-svn: 287097
2016-11-16 11:04:49 +00:00
Ayman Musa 4d60243bfd [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.
Differential Revision: https://reviews.llvm.org/D26128

llvm-svn: 287087
2016-11-16 09:00:28 +00:00
Craig Topper 6910fa0ef4 [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul
Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file.

Reviewers: zvi, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26660

llvm-svn: 287083
2016-11-16 05:24:10 +00:00
Davide Italiano 6cf09265f9 [ELF] Convert ELF.h to Expected<T>.
This has two advantages:
1) We slowly move away from ErrorOr to the new handling interface,
in the hope of having an uniform error handling in LLVM, eventually.
2) We're starting to have *meaningful* error messages for invalid
object ELF files, rather than a generic "parse error". At some point
we should include also the offset to improve the quality of the
diagnostic.

llvm-svn: 287081
2016-11-16 05:10:28 +00:00
Saleem Abdulrasool d05c5aea47 test: use separate input file for test
Rather than using sed to generate the input and pipe the result to
strings, use the static input instead.

llvm-svn: 287079
2016-11-16 04:08:46 +00:00
Matthias Braun 3d51cf0a2c AArch64: Use DeadRegisterDefinitionsPass before regalloc.
Doing this before register allocation reduces register pressure as we do
not even have to allocate a register for those dead definitions.

Differential Revision: https://reviews.llvm.org/D26111

llvm-svn: 287076
2016-11-16 03:38:27 +00:00
Konstantin Zhuravlyov 2a87a42035 [AMDGPU] Handle f16 select{_cc}
- Select `select` to `v_cndmask_b32`
- Expand `select_cc`
- Refactor patterns

Differential Revision: https://reviews.llvm.org/D26714

llvm-svn: 287074
2016-11-16 03:16:26 +00:00
Quentin Colombet fb9b0cdcfe [RegAllocGreedy] Record missed hint for late recoloring.
In https://reviews.llvm.org/D25347, Geoff noticed that we still have
useless copy that we can eliminate after register allocation. At the
time the allocation is chosen for those copies, they are not useless
but, because of changes in the surrounding code, later on they might
become useless.
The Greedy allocator already has a mechanism to deal with such cases
with a late recoloring. However, we missed to record the some of the
missed hints.

This commit fixes that.

llvm-svn: 287070
2016-11-16 01:07:12 +00:00
Vyacheslav Klochkov b3dc774a99 Fixed the lost FastMathFlags for CALL operations in SLPVectorizer.
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26575

llvm-svn: 287064
2016-11-16 00:55:50 +00:00
Justin Lebar 2860573529 [BypassSlowDivision] Handle division by constant numerators better.
Summary:
We don't do BypassSlowDivision when the denominator is a constant, but
we do do it when the numerator is a constant.

This patch makes two related changes to BypassSlowDivision when the
numerator is a constant:

 * If the numerator is too large to fit into the bypass width, don't
   bypass slow division (because we'll never run the smaller-width
   code).

 * If we bypass slow division where the numerator is a constant, don't
   OR together the numerator and denominator when determining whether
   both operands fit within the bypass width.  We need to check only the
   denominator.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26699

llvm-svn: 287062
2016-11-16 00:44:47 +00:00
Joerg Sonnenberger 8c1a9ac52b Always use relative jump table encodings on PowerPC64.
For the default, small and medium code model, use the existing
difference from the jump table towards the label. For all other code
models, setup the picbase and use the difference between the picbase and
the block address.

Overall, this results in smaller data tables at the expensive of one or
two more arithmetic operation at the jump site. Given that we only create
jump tables with a lot more than two entries, it is a net win in size.
For larger code models the assumption remains that individual functions
are no larger than 2GB.

Differential Revision: https://reviews.llvm.org/D26336

llvm-svn: 287059
2016-11-16 00:37:30 +00:00
Jan Vesely e8cc395e4f AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument
wbinvl.* are vector instruction that do not sue vector registers.

v2: check only M?BUF instructions

Differential Revision: https://reviews.llvm.org/D26633

llvm-svn: 287056
2016-11-15 23:55:15 +00:00
Sanjay Patel aaf430452b [x86] regenerate checks; NFC
llvm-svn: 287051
2016-11-15 23:09:53 +00:00
Kevin Enderby 844c4ac55a General clean up of Mach-O error handling in llvm-objdump.
To get a good error message for all files that could contain Mach-O
files the code in llvm-objdump needs to use the archive member name
and name of the architecture of a slice of a universal file in those cases
where the error come from a Mach-O file in an archive or a universal file.

Most of this is fixed by moving the call to checkSymbolTable() into
ProcessMachO() and calling it when the operation needs the symbol
table.  And then calling the form of report_error() that has the
ArchiveName and ArchitectureName arguments.  One other place
needed to call this form of report_error() also with these arguments.

Also changed the code in MachODump.cpp to not use report_fatal_error()
and use report_error() instead to make the code smaller and cleaner.  All
cases of this are for errors with the symbol table which should now never
be tripped since checkSymbolTable() should be called first to get a good
error message in these cases.

llvm-svn: 287050
2016-11-15 23:07:41 +00:00
Sanjay Patel 07529a313a [x86] auto-generate better checks; NFC
llvm-svn: 287049
2016-11-15 23:01:11 +00:00
Sanjay Patel 87cb0745eb [x86] auto-generate better checks; NFC
llvm-svn: 287048
2016-11-15 22:42:20 +00:00
Filipe Cabecinhas ec350b71fa [AddressSanitizer] Add support for (constant-)masked loads and stores.
This patch adds support for instrumenting masked loads and stores under
ASan, if they have a constant mask.

isInterestingMemoryAccess now supports returning a mask to be applied to
the loads, and instrumentMop will use it to generate additional checks.

Added tests for v4i32 v8i32, and v4p0i32 (~v4i64) for both loads and
stores (as well as a test to verify we don't add checks to non-constant
masks).

Differential Revision: https://reviews.llvm.org/D26230

llvm-svn: 287047
2016-11-15 22:37:30 +00:00
Sanjay Patel 9a4ce290d0 [x86] auto-generate better checks; NFC
llvm-svn: 287046
2016-11-15 22:33:16 +00:00
Amaury Sechet 003216b319 [C API] Prevent nullptr dereferences in C API for counting attributes.
See https://reviews.llvm.org/D26392

Patch by @maleadt

llvm-svn: 287044
2016-11-15 22:19:59 +00:00
Peter Collingbourne bc9a574657 Object: replace backslashes with slashes in embedded relative thin archive paths on Windows.
This makes these thin archives portable between *nix and Windows.

Differential Revision: https://reviews.llvm.org/D26696

llvm-svn: 287038
2016-11-15 21:36:35 +00:00
Chad Rosier 201fc1ed26 [AArch64] Add support for Qualcomm's Falkor CPU.
Differential Revision: https://reviews.llvm.org/D26673

llvm-svn: 287036
2016-11-15 21:34:12 +00:00
Tom Stellard d23de360db AMDGPU/SI: Fix pattern for i16 = sign_extend i1
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26670

llvm-svn: 287035
2016-11-15 21:25:56 +00:00
Sanjay Patel 2a51748a5d [x86] add tests for FP-logic equivalent instruction replacement
The ANDN test needs at least 3 different fixes.

llvm-svn: 287032
2016-11-15 21:19:28 +00:00
Kostya Serebryany 9d6dc7b164 [sanitizer-coverage] make sure asan does not instrument coverage guards (reported in https://github.com/google/oss-fuzz/issues/84)
llvm-svn: 287030
2016-11-15 21:12:50 +00:00
Tim Northover bf55f7ea59 llvm-objdump: deal with unexpected object files more gracefully.
Specifically, we don't want to segfault on release builds, so print the problem
instead.

llvm-svn: 287022
2016-11-15 20:26:01 +00:00
Matt Arsenault d4bb5e4831 AMDGPU: Enable store clustering
Also respect the TII hook for these like the generic code does
in case we want a flag later to disable this.

llvm-svn: 287021
2016-11-15 20:22:55 +00:00
Haicheng Wu faee2b71a7 [AArch64] Lower multiplication by a constant int to shl+add+shl
Lower a = b * C where C = (2^n + 1) * 2^m to

add     w0, w0, w0, lsl n
lsl     w0, w0, m

Differential Revision: https://reviews.llvm.org/D229245

llvm-svn: 287019
2016-11-15 20:16:48 +00:00
Matt Arsenault 3666629837 AMDGPU: Analyze mubuf with immediate soffset
Fixes giving up on clustering common addr64 accesses with
constant 0 soffset.

llvm-svn: 287018
2016-11-15 20:14:27 +00:00
Wei Mi 37c4aaaf52 Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
llvm-svn: 287014
2016-11-15 19:42:05 +00:00
Stanislav Mekhanoshin ea91cca593 [AMDGPU] Add wave barrier builtin
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.

Differential Revision: https://reviews.llvm.org/D26585

llvm-svn: 287007
2016-11-15 19:00:15 +00:00
Sanjay Patel 22465125b3 [x86] auto-generate checks; NFC
Also, fix the test params to use an attribute rather than a CPU model
and remove the AVX run because that does nothing but check for a 'v'
prefix in all of these tests.

llvm-svn: 287003
2016-11-15 18:44:53 +00:00
Wei Mi 7ccf7651c0 [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.

Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.

Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.

But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.

Differential Revision: https://reviews.llvm.org/D26429

llvm-svn: 286999
2016-11-15 18:35:53 +00:00
Pawel Bylica c3f6c97f71 Integer legalization: fix MUL expansion
Summary:
This fixes the runtime results produces by the fallback multiplication expansion introduced in r270720.

For tests I created a fuzz tester that compares the results with Boost.Multiprecision.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26628

llvm-svn: 286998
2016-11-15 18:29:24 +00:00
Zaara Syeda a19c9e60e9 vector load store with length (left justified) llvm portion
llvm-svn: 286993
2016-11-15 17:54:19 +00:00
Wei Mi d2948cef70 [IndVars] Change the order to compute WidenAddRec in widenIVUse.
When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence
return non-null but different WideAddRec, if getWideRecurrence is called
before getExtendedOperandRecurrence, we won't bother to call
getExtendedOperandRecurrence again. But As we know it is possible that after
SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned
by getWideRecurrence. Meanwhile if getExtendedOperandRecurrence returns non-null
WideAddRec, we know for sure that it is legal to do widening for current instruction.
So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which
will increase the chance of successful widening.

Differential Revision: https://reviews.llvm.org/D26059

llvm-svn: 286987
2016-11-15 17:34:52 +00:00
Craig Topper c7486af9c9 [AVX-512] Add AVX-512 vector shift intrinsics to memory santitizer.
Just needed to add the intrinsics to the exist switch. The code is generic enough to support the wider vectors with no changes.

llvm-svn: 286980
2016-11-15 16:27:33 +00:00
Simon Pilgrim ceffb43b1b [X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)
This patch helps avoids poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so help improve the truncation logic.

This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.

Fix for PR13248

Differential Revision: https://reviews.llvm.org/D26583

llvm-svn: 286979
2016-11-15 16:24:40 +00:00
Sanjay Patel bb238bb4e5 [InstCombine] add tests for bitcasted selects; NFC
llvm-svn: 286978
2016-11-15 16:01:16 +00:00
Pablo Barrio 4f80c93a2e Revert "[JumpThreading] Unfold selects that depend on the same condition"
This reverts commit ac54d0066c478a09c7cd28d15d0f9ff8af984afc.

llvm-svn: 286976
2016-11-15 15:42:23 +00:00
Robert Lougher b0905209dd [LoopVectorizer] When estimating reg usage, unused insts may "end" another use
The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).

The algorithm first calculates the usages within the loop.  It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).

The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals.  Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.

The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index.  In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.

This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.

Differential Revision: https://reviews.llvm.org/D26554

llvm-svn: 286969
2016-11-15 14:27:33 +00:00
Tony Jiang 5f850cd1b1 [PowerPC] Implement BE VSX load/store builtins - llvm portion.
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.

llvm-svn: 286967
2016-11-15 14:25:56 +00:00
Zvi Rackover f0b9b57bd3 [X86][FastISel] Fix lowering of overflow result on AVX512 targets
Summary:
    Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class.
    We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.

    Fixes pr30981.

    Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet

    Subscribers: qcolombet, llvm-commits

    Differential Revision: https://reviews.llvm.org/D26620

llvm-svn: 286958
2016-11-15 13:29:23 +00:00
Javed Absar f043dac25d [ARM] Add machine scheduler for Cortex-R52
This patch adds the Sched Machine Model for Cortex-R52.

Details of the pipeline and descriptions are in comments
in file ARMScheduleR52.td included in this patch.

Reviewers: rengolin, jmolloy

Differential Revision: https://reviews.llvm.org/D26500

llvm-svn: 286949
2016-11-15 11:34:54 +00:00
Asaf Badouh b573553424 DAGCombiner: fix combine of trunc and select
bugzilla:
https://llvm.org/bugs/show_bug.cgi?id=29002
pr29002

Differential Revision: https://reviews.llvm.org/D26449


 

llvm-svn: 286938
2016-11-15 07:55:22 +00:00
Matt Arsenault 1c8d933881 TableGen: Add operator !or
llvm-svn: 286936
2016-11-15 06:49:28 +00:00
Zvi Rackover 76dbf26599 [X86][GlobalISel] Add minimal call lowering support to the IRTranslator
Summary:
    Add basic functionality to support call lowering for X86.
    Currently only supports functions which return void and take zero arguments.
    Inspired by commit 286573.

Reviewers: ab, qcolombet, t.p.northover

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26593

llvm-svn: 286935
2016-11-15 06:34:33 +00:00
Craig Topper 0637099f24 [AVX-512] Add an example test case for PR31018.
llvm-svn: 286934
2016-11-15 05:21:55 +00:00
Matt Arsenault c79dc70d50 AMDGPU: Fix f16 fabs/fneg
llvm-svn: 286931
2016-11-15 02:25:28 +00:00
Saleem Abdulrasool f7009b42f8 llvm-strings: support the `-n` option
Permit specifying the match length (the `-n` or `--bytes` option).  The
deprecated `-[length]` form is not supported as an option.  This allows the
strings tool to display only the specified length strings rather than the
hardcoded default length of >= 4.

llvm-svn: 286914
2016-11-15 00:43:52 +00:00
Matt Arsenault 972034bda9 AMDGPU: Fix formatting of 1/2pi immediate
llvm-svn: 286912
2016-11-15 00:04:33 +00:00
Tom Stellard 9c884e495c MIRParser: Add support for parsing vreg reg alloc hints
Reviewers: qcolombet, MatzeB

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26573

llvm-svn: 286911
2016-11-15 00:03:14 +00:00
Evandro Menezes 9fc54826e0 [AArch64] Compute the Newton series for reciprocals natively
Implement the Newton series for square root, its reciprocal and reciprocal
natively using the specialized instructions in AArch64 to perform each
series iteration.

Differential revision: https://reviews.llvm.org/D26518

llvm-svn: 286907
2016-11-14 23:29:01 +00:00
Peter Collingbourne 3cb86272fc Linker: Remove unnecessary call to copyMetadata in IRLinker::linkGlobalVariable.
This was causing us to create duplicate metadata on global variables.
Debug info test case by Adrian Prantl, additional test cases by me.

Fixes PR31012.

Differential Revision: https://reviews.llvm.org/D26622

llvm-svn: 286905
2016-11-14 23:18:38 +00:00
Tim Northover e33b175411 GlobalISel: add tests for G_ZEXT/G_SEXT to types smaller than 32-bits.
Support was accidentally added in r286407, but there were no tests at the time.

llvm-svn: 286903
2016-11-14 22:50:22 +00:00
Sanjay Patel aaa06fa486 [InstCombine] add tests to show missing bitcast folds
llvm-svn: 286900
2016-11-14 22:44:06 +00:00
Tom Stellard 11e60ff7da RegAllocGreedy: Properly initialize this pass, so that -run-pass will work
Reviewers: qcolombet, MatzeB

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26572

llvm-svn: 286895
2016-11-14 21:50:13 +00:00
Kuba Brecka ddfdba3b01 [tsan] Add support for C++ exceptions into TSan (call __tsan_func_exit during unwinding), LLVM part
This adds support for TSan C++ exception handling, where we need to add extra calls to __tsan_func_exit when a function is exitted via exception mechanisms. Otherwise the shadow stack gets corrupted (leaked). This patch moves and enhances the existing implementation of EscapeEnumerator that finds all possible function exit points, and adds extra EH cleanup blocks where needed.

Differential Revision: https://reviews.llvm.org/D26177

llvm-svn: 286893
2016-11-14 21:41:13 +00:00
Saleem Abdulrasool f10a871419 Revert "Revert "llvm-strings: support printing the filename""
Change the dynamic files to static in the hope that it will actually fix the
transient errors that Ive been unable to reproduce.

llvm-svn: 286891
2016-11-14 21:10:41 +00:00
Kevin Enderby 22fc007809 Add a checkSymbolTable() method to the MachOObjectFile class.
The philosophy of the error checking in libObject for Mach-O files
is that the constructor will check the load commands so for their
tables the offsets and sizes are properly contained in the file.
But there is no checking of the entries of any of the tables.

For the contents of the tables themselves the methods accessing
the contents of the entries return errors as needed.  In some
cases this however makes it difficult or cumbersome to produce
a good error message which would include the tool name, file name,
archive member, and name of the architecture of a slice of a universal file
the error occurred in.

So idea is that there will be a method to check a table which can
be called up front before using it allowing a good error message
to be produced before a table is used.  And if only verification of
the Mach-O file and its tables are wanted a new possible method
checkAllTables() could be added to call all of the methods to
check all the tables at some time when such methods exist.

The checkSymbolTable() is the first of such methods to check
one of the Mach-O file tables.  This method initially will used in
llvm-objdump’s DisassembleMachO() routine before it gets the
section and symbol information.  As if there are problems with
the symbol table currently the error is first encountered by the
bool operator() in the SymbolSorter() struct which passed to
std::sort().  In this case there is no context as to the file name
the symbol which results a poor error message:

LLVM ERROR: truncated or malformed object (bad string index: 22 for symbol at index 1)

with the added call to the checkSymbolTable() method the
error message includes the tool name and file name:

llvm-objdump: 'macho-invalid-symbol-strx': truncated or malformed object (bad string table index: 22 past the end of string table, for symbol at index 1)
llvm-svn: 286887
2016-11-14 20:57:04 +00:00
Tim Northover 46a6f0fbf0 Recommit: ARM: sort register lists by encoding in push/pop instructions.
For example we were producing

    push {r8, r10, r11, r4, r5, r7, lr}

This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.

Fixed usage of std::sort so that we (hopefully) use instantiations that
actually exist in GCC 4.8.

llvm-svn: 286881
2016-11-14 20:28:24 +00:00
Michael Kuperstein f221f13ccc [X86] Tests exhibiting bad parial reloading behavior. NFC.
llvm-svn: 286878
2016-11-14 19:58:11 +00:00
Geoff Berry 526c50588d [AArch64] Split 0 vector stores into scalar store pairs.
Summary:
Replace a splat of zeros to a vector store by scalar stores of WZR/XZR.
The load store optimizer pass will merge them to store pair stores.
This should be better than a movi to create the vector zero followed by
a vector store if the zero constant is not re-used, since one
instructions and one register live range will be removed.

For example, the final generated code should be:

  stp xzr, xzr, [x0]

instead of:

  movi v0.2d, #0
  str q0, [x0]

Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26561

llvm-svn: 286875
2016-11-14 19:39:04 +00:00
Teresa Johnson 4fef68cb8d [ThinLTO] Only promote exported locals as marked in index
Summary:
We have always speculatively promoted all renamable local values
(except const non-address taken variables) for both the exporting
and importing module. We would then internalize them back based on
the ThinLink results if they weren't actually exported. This is
inefficient, and results in unnecessary renames. It also meant we
had to check the non-renamability of a value in the summary, which
was already checked during function importing analysis in the ThinLink.

Made renameModuleForThinLTO (which does the promotion/renaming) instead
use the index when exporting, to avoid unnecessary renames/promotions.
For importing modules, we can simply promoted all values as any local
we import by definition is exported and needs promotion.

This required changes to the method used by the FunctionImport pass
(only invoked from 'opt' for testing) and when invoked from llvm-link,
since neither does a ThinLink. We simply conservatively mark all locals
in the index as promoted, which preserves the current aggressive
promotion behavior.

I also needed to change an llvm-lto based test where we had previously
been aggressively promoting values that weren't importable (aliasees),
but now will not promote.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26467

llvm-svn: 286871
2016-11-14 19:21:41 +00:00
Tim Northover 1b66f39cf2 Revert "ARM: sort register lists by encoding in push/pop instructions."
This reverts commit 286866. It broke a bot, something to do with exactly which
templates std::sort accepts.

llvm-svn: 286867
2016-11-14 19:05:28 +00:00
Tim Northover e908ea844c ARM: sort register lists by encoding in push/pop instructions.
For example we were producing

    push {r8, r10, r11, r4, r5, r7, lr}

This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.

llvm-svn: 286866
2016-11-14 19:02:17 +00:00
Sean Fertile a435e07de8 [PPC] Add intrinsic mapping to the xscvhpsp instruction
add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to
Single-Precision' instruction.

Differential review: https://reviews.llvm.org/D26536

llvm-svn: 286862
2016-11-14 18:43:59 +00:00
Changpeng Fang 8236fe103f AMDGPU/SI: Support data types other than V4f32 in image intrinsics
Summary:
  Extend image intrinsics to support data types of V1F32 and V2F32.

  TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active,
  even though such case should be very rare.

Reviewers:
  tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D26472

llvm-svn: 286860
2016-11-14 18:33:18 +00:00
Zvi Rackover 35bb7fdadc [X86] Adding reproducer for pr30981
llvm-svn: 286855
2016-11-14 18:10:44 +00:00
Teresa Johnson 3624bdf60a Restore "[ThinLTO] Prevent exporting of locals used/defined in module level asm"
This restores the rest of r286297 (part was restored in r286475).
Specifically, it restores the part requiring adding a dependency from
the Analysis to Object library (downstream use changed to correctly
model split BitReader vs BitWriter libraries).

Original description of this part of patch follows:

Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.

This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).

Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.

Fixes PR30610.

llvm-svn: 286844
2016-11-14 17:12:32 +00:00
Sumanth Gundapaneni d428cf8b5f [Hexagon] Remove unsafe load instructions that affect Stack Slot Coloring
The Stack slot coloring pass removes a store that is followed by a load
that deal with the same stack slot. The function isLoadFromStackSlot
is supposed to consider the loads that have no side-effects. This
patch fixed the issue by removing the unsafe loads from this function
Eg:
%vreg0<def> = L2_loadruh_io <fi#15>, 0
S2_storeri_io <fi#15>, 0, %vreg0

In this case, we load an unsigned extended half word and store this in to
the same stack slot. The Stack slot coloring pass considers safe to remove
the store. This patch marked all the non-vector byte and half word loads as
unsafe.

llvm-svn: 286843
2016-11-14 17:11:00 +00:00
Simon Pilgrim 779da8e5ea [CostModel][X86] Added mul costs for vXi8 vectors
More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result

llvm-svn: 286838
2016-11-14 15:54:24 +00:00
Simon Pilgrim 27fed8e5d6 [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.

This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)

llvm-svn: 286832
2016-11-14 14:45:16 +00:00
Sean Fertile adda5b2d2b [PPC] add intrinsics for vec extract exp/significand and vec test data class.
Differential Revision: https://reviews.llvm.org/D26272

llvm-svn: 286829
2016-11-14 14:42:37 +00:00
Renato Golin 199b6b941d Revert "llvm-strings: support printing the filename"
Also,

Revert "test: remove the archive before modifying it"
Revert "test: explicitly use gnu format"

This reverts commits r286778, r286729 and r286767, as they are randomly failing
on many bots (AArch64, x86_64).

llvm-svn: 286820
2016-11-14 13:09:24 +00:00
James Molloy 6df8f27c95 [InlineCost] Remove skew when calculating call costs
When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself.

However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case:

  int g() {
    h();
  }
  int f() {
    g();
  }

llvm-svn: 286814
2016-11-14 11:14:41 +00:00
Craig Topper 8f85ad1755 [AVX-512] Add suffixless aliases for EVEX encoded vcvtsi2ss/vcvtsi2sd/vcvtusi2ss/vcvtusi2sd. This matches the VEX behavior.
Fixes another problem from PR28850.

llvm-svn: 286790
2016-11-14 02:46:58 +00:00
Craig Topper b8596e4d1d [X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions.
-Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
-Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
-Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
-Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.

This should fix at least some of PR28850.

llvm-svn: 286787
2016-11-14 01:53:29 +00:00
Craig Topper 353e59b6d6 [AVX-512] Remove and autoupgrade masked dword/qword variable shift intrinsics to the new unmasked versions and selects.
llvm-svn: 286786
2016-11-14 01:53:22 +00:00
Saleem Abdulrasool 25d7683fe7 test: remove the archive before modifying it
The archive may already exist when not doing a clean test run.  The dirty state
can cause a test failure.  Remove the archive first.

llvm-svn: 286778
2016-11-13 20:43:41 +00:00
Saleem Abdulrasool 7091820a96 llvm-cxxfilt: support reading from stdin
`c++filt` when given no arguments runs as a REPL, decoding each line as a
decorated name.  Unify the test structure to be more uniform, with the tests for
llvm-cxxfilt living under test/tools/llvm-cxxfilt.

llvm-svn: 286777
2016-11-13 20:43:38 +00:00
Sanjay Patel cfcc42bdc2 [ValueTracking] recognize even more variants of smin/smax
Similar to:
https://reviews.llvm.org/rL285499
https://reviews.llvm.org/rL286318

We can't minimally expose this in IR tests because we don't have min/max intrinsics,
but the difference is visible in codegen because SelectionDAGBuilder::visitSelect() 
uses matchSelectPattern().

We're not canonicalizing these patterns in IR (yet), so I don't expect there to be any
regressions as noted here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html

llvm-svn: 286776
2016-11-13 20:04:52 +00:00
Craig Topper ba13703bb3 [AVX-512] Fix a disassembler failure for AVX-512 vcmpss/vcmpsd with an immediate larger than 32. Fix the same bug with VLX vcmpps/vcmppd.
Fixes PR24941.

llvm-svn: 286775
2016-11-13 19:58:18 +00:00
Saleem Abdulrasool 6ac46b42ac test: synchronise lit substitutions
llvm-strings was added to the test dependencies without updating the lit
substitutions.  Synchronise the list.

llvm-svn: 286773
2016-11-13 19:37:00 +00:00
Saleem Abdulrasool 8b9be8fd41 llvm-strings: support printing the filename
This adds support for the `-f` or `--print-file-name` option for strings.

llvm-svn: 286767
2016-11-13 19:07:48 +00:00
Matt Arsenault dc45274d54 AMDGPU: Implement SGPR spilling with scalar stores
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.

This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.

The cache also needs to be flushed before wave termination
if a scalar store is used.

llvm-svn: 286766
2016-11-13 18:20:54 +00:00
Igor Breger e2399f9e0e revert commit r286761, some builds failed on Win platforms
llvm-svn: 286765
2016-11-13 15:48:11 +00:00
Simon Pilgrim 055c09c1c0 [X86][SSE] Add zero lower 32-bits test case for PR30845
llvm-svn: 286764
2016-11-13 15:32:11 +00:00
Simon Pilgrim 8f7c56125e [X86][AVX512] Add masked VPMOZX test case for PR26762
llvm-svn: 286763
2016-11-13 15:16:43 +00:00
Simon Pilgrim ce59a536f7 [X86][SSE] Add additional test case for PR30845
llvm-svn: 286762
2016-11-13 14:57:52 +00:00
Ayman Musa c09b3769ae [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.
Differential Revision: https://reviews.llvm.org/D26128

llvm-svn: 286761
2016-11-13 14:51:25 +00:00
Ayman Musa 46af8f9c6f [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions.
Differential Revision: https://reviews.llvm.org/D26022

llvm-svn: 286758
2016-11-13 14:29:32 +00:00
Craig Topper b4173a5a70 [InstCombine][AVX-512] Teach InstCombineCalls to handle the new unmasked AVX-512 variable shift intrinsics.
llvm-svn: 286755
2016-11-13 07:26:19 +00:00
Craig Topper 43e97649a1 [AVX-512] Add unmasked intrinsics for variable shifts of dwords and qwords.
These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.

llvm-svn: 286754
2016-11-13 07:26:15 +00:00
Konstantin Zhuravlyov f86e4b7266 [AMDGPU] Add f16 support (VI+)
Differential Revision: https://reviews.llvm.org/D25975

llvm-svn: 286753
2016-11-13 07:01:11 +00:00
Craig Topper 706d897d8a [AVX-512] Move masked shift intrinsics tests to the autoupgrade test file. These missed being moved in r286725.
llvm-svn: 286746
2016-11-13 03:42:27 +00:00
Craig Topper 8b831cbb2a [InstCombine][AVX-512] Expand vector shift handling to work on the AVX-512 shift by immediate and shift by single value.
This does not include support for the AVX-512 variable shifts. That will be coming in a future patch.

llvm-svn: 286739
2016-11-13 01:51:55 +00:00
Sanjay Patel a1b8c10bf6 [x86] add smin/smax with zero tests
These are vector tests corresponding to the discussion at:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html

Apart from the lack of min/max matching, the and/andn difference 
shows a lack of DAG-level canonicalization.

llvm-svn: 286737
2016-11-13 00:32:39 +00:00
Simon Pilgrim 6e09afa9d0 [X86][SSE] Add test case for PR30845
llvm-svn: 286734
2016-11-12 23:44:58 +00:00
Saleem Abdulrasool 43400b4c06 test: explicitly use gnu format
This should fix the Darwin buildbots.

llvm-svn: 286729
2016-11-12 19:03:08 +00:00
Saleem Abdulrasool be3a2919f4 llvm-strings: trivialise logic until we support more options
Until we have handling for ignoring unloaded sections, simplify the logic to
the point of triviality.  This fixes the scanning of archives, particularly when
embedded in archives.

llvm-svn: 286727
2016-11-12 18:37:04 +00:00
Craig Topper da6a63db1c [AVX-512] Remove the remaining masked shift by immediate or by single value. Autoupgrade them to recently introduced unmasked versions and a select.
After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics.

llvm-svn: 286725
2016-11-12 18:04:46 +00:00
Michal Gorny a583be4e52 [OCaml] Clear cross-target test deps when building out-of-tree
Clear cross-target test dependencies when using LLVM_OCAML_OUT_OF_TREE,
in order to make it possible to run check-llvm-bindings-ocaml without
rebuilding the whole LLVM.

Differential Revision: https://reviews.llvm.org/D26580

llvm-svn: 286720
2016-11-12 14:58:30 +00:00
Craig Topper 9d25c5e2fa [AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM.
Summary:
This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we aleady support the sse2 and avx2 intrinsics. We need to the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new instrinsics in the frontend.

This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.

Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26333

llvm-svn: 286711
2016-11-12 05:28:24 +00:00
Craig Topper 5cb13062d2 [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ
Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.

Reviewers: delena, RKSimon

Subscribers: Farhana, llvm-commits

Differential Revision: https://reviews.llvm.org/D26297

llvm-svn: 286709
2016-11-12 05:05:27 +00:00