Commit Graph

Matthias Braun 181983055f BranchRelaxation: Recompute live-ins when splitting a block
Factors out and reuses live-in computation code from BranchFolding.

Differential Revision: https://reviews.llvm.org/D27558

llvm-svn: 290013
2016-12-16 23:55:37 +00:00
Paul Robinson 2dfb688214 Allow "line 0" to be the first explicit debug location in a function.
Feedback on r289468 from Adrian Prantl.

llvm-svn: 290012
2016-12-16 23:54:33 +00:00
Peter Collingbourne 3b8011f108 ModuleSummaryAnalysis: Remove some duplicate code. NFCI.
llvm-svn: 290003
2016-12-16 23:19:02 +00:00
Kevin Enderby 59343a9429 Fix a bug with using some Mach-O command line flags like "-arch armv7m".
A Mach-O command line flag like "-arch armv7m" does not match the
arch name part of its llvm Triple, which is "thumbv7m-apple-darwin".

I think the best way to fix this is to have
llvm::object::MachOObjectFile::getArchTriple() optionally return the
name of the Mach-O arch flag that would be used with -arch that
matches the CPUType and CPUSubType.  Then change
llvm::object::MachOUniversalBinary::ObjectForArch::getArchTypeName()
to use that and change it to getArchFlagName() as the type name is
really part of the Triple and the -arch flag name is a Mach-O thing
for a specific Triple with a specific Mcpu value.

rdar://29663637

llvm-svn: 290001
2016-12-16 22:54:02 +00:00
Zachary Turner 46225b193f Resubmit "[CodeView] Hook CodeViewRecordIO for reading/writing symbols."
The original patch was broken due to some undefined behavior
as well as warnings that were triggering -Werror.

llvm-svn: 290000
2016-12-16 22:48:14 +00:00
Kostya Serebryany 3a4e2dd92f [libFuzzer] avoid msan false positives in more cases
llvm-svn: 289999
2016-12-16 22:45:25 +00:00
Kostya Serebryany be7003f99c [libFuzzer] add an experimental flag -experimental_len_control=1 that sets max_len to 1M and tries to increase the actual max sizes of mutations very gradually. Also remove a bit of dead code
llvm-svn: 289998
2016-12-16 22:42:05 +00:00
Teresa Johnson a61f5e3796 [ThinLTO] Import composite types as declarations
Summary:
When reading the metadata bitcode, create a type declaration when
possible for composite types when we are importing. Doing this in
the bitcode reader saves memory. Also it works naturally in the case
when the type ODR map contains a definition for the same composite type
because it was used in the importing module (buildODRType will
automatically use the existing definition and not create a type
declaration).

For Chromium built with -g2, this reduces the aggregate size of the
generated native object files by 66% (from 31G to 10G). It reduced
the time through the ThinLTO link and backend phases by about 20% on
my machine.

Reviewers: mehdi_amini, dblaikie, aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27775

llvm-svn: 289993
2016-12-16 21:25:01 +00:00
Michael Kuperstein 3ca147ea3d Preserve loop metadata when folding branches to a common destination.
Differential Revision: https://reviews.llvm.org/D27830

llvm-svn: 289992
2016-12-16 21:23:59 +00:00
Jun Bum Lim 90b6b5074a [CodeGenPrep] Skip merging empty case blocks
This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly:

Summary: Merging an empty case block into the header block of a switch could cause ISel to add COPY instructions in the header of the switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289988
2016-12-16 20:38:39 +00:00
Sanjoy Das 007572706b Inline stripInvariantGroupMetadata out of existence
As a one liner function, I don't think it is pulling its weight in terms
of helping readability.

llvm-svn: 289987
2016-12-16 20:29:39 +00:00
Adrian Prantl 73ec065604 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has no DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recommitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Zachary Turner d0fffd1d14 Revert "[CodeView] Hook CodeViewRecordIO for reading/writing symbols."
This reverts commit r289978, which is failing due to some rebase/merge
issues.

llvm-svn: 289981
2016-12-16 19:25:23 +00:00
Zachary Turner a4e7dfbc16 [CodeView] Hook CodeViewRecordIO for reading/writing symbols.
This is the 3rd of 3 patches to get reading and writing of
CodeView symbol and type records to use a single codepath.

Differential Revision: https://reviews.llvm.org/D26427

llvm-svn: 289978
2016-12-16 19:20:35 +00:00
Mehdi Amini 8662305bae Strip invalid TBAA when reading bitcode
This ensures backward compatibility on bitcode loading.

Differential Revision: https://reviews.llvm.org/D27839

llvm-svn: 289977
2016-12-16 19:16:29 +00:00
Matthew Simpson a4964f291a Reapply "[LV] Enable vectorization of loops with conditional stores by default"
This patch reapplies r289863. The original patch was reverted because it
exposed a bug causing the loop vectorizer to crash in the Python runtime on
PPC. The underlying issue was fixed with r289958.

llvm-svn: 289975
2016-12-16 19:12:02 +00:00
Krzysztof Parzyszek ea9f8ce03c Implement LaneBitmask::any(), use it to replace !none(), NFCI
llvm-svn: 289974
2016-12-16 19:11:56 +00:00
Sanjoy Das 089c699743 Fix CodeGenPrepare::stripInvariantGroupMetadata
`dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs.  The
only reason it worked at all is that `getMetadataID` returns something
unrelated -- it returns the subclass ID of the receiver (which is used
in `dyn_cast` etc.).  That does not numerically match
`LLVMContext::MD_invariant_group`, so `invariant_group` ended up being dropped
along with every other kind of metadata that does not numerically match
`LLVMContext::MD_invariant_group`.

llvm-svn: 289973
2016-12-16 18:52:33 +00:00
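To make the mix-up concrete, here is a minimal sketch of the two ID spaces involved, written against the usual `Instruction`/`LLVMContext` APIs; it is illustrative only, not the code from this patch.

```cpp
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

// dropUnknownNonDebugMetadata() expects LLVMContext::MD_* metadata *kind*
// IDs, while Metadata::getMetadataID() returns the subclass ID used by
// dyn_cast<>; the two numberings are unrelated.
void stripAllButInvariantGroup(Instruction &I) {
  // Correct: name the metadata kind that should be kept.
  I.dropUnknownNonDebugMetadata(LLVMContext::MD_invariant_group);

  // Buggy pattern (kept as a comment): a subclass ID never names the
  // invariant.group kind, so everything, including invariant.group, is
  // dropped.
  // if (MDNode *MD = I.getMetadata(LLVMContext::MD_invariant_group))
  //   I.dropUnknownNonDebugMetadata(MD->getMetadataID());
}
```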
Eli Friedman f624ec27b7 [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.

Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

Recommiting with fix to avoid forming vld1dup if the type of the load
doesn't match the type of the vdup (see
https://llvm.org/bugs/show_bug.cgi?id=31404).

Differential Revision: https://reviews.llvm.org/D27694

llvm-svn: 289972
2016-12-16 18:44:08 +00:00
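For context, a small NEON-intrinsics example of the kind of dup-load the combine targets; the function is invented for illustration, and the exact instructions emitted depend on the target and optimization level.

```cpp
#include <arm_neon.h>

// Illustrative only: a dup-load in a loop.  Per the commit above, the aim is
// for each vld1_dup_u8 to become a vector dup-load (ideally a
// post-incrementing one) instead of a scalar load into a GPR plus a vdup.
uint8x8_t sum_dups(const uint8_t *P, int N) {
  uint8x8_t Acc = vdup_n_u8(0);
  for (int I = 0; I < N; ++I)
    Acc = vadd_u8(Acc, vld1_dup_u8(P + I));
  return Acc;
}
```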
Joel Jones 8980ba643e Fix name typo in SelectionDAG
llvm-svn: 289969
2016-12-16 18:22:54 +00:00
Matt Arsenault 55e7d65b12 AMDGPU: Fix name for v_ashrrev_i16
llvm-svn: 289967
2016-12-16 17:40:11 +00:00
Marcos Pividori 566cf67e7c [libFuzzer] Fix index error in SearchMemory() implementation for Windows.
Differential Revision: https://reviews.llvm.org/D27731

llvm-svn: 289966
2016-12-16 17:35:25 +00:00
Marcos Pividori 3b04af2420 [libFuzzer] Remove unnecessary includes of posix headers.
Remove includes of "unistd.h" header, which is missing in non posix
systems.

Differential Revision: https://reviews.llvm.org/D277300

llvm-svn: 289965
2016-12-16 17:35:21 +00:00
Marcos Pividori 6a7e4c2e20 [libFuzzer] Update tests to use more general functions instead of posix specific.
Replace the POSIX sleep() function with the more portable sleep_for() function
from std. Also, ignore memmem() and strcasestr() on Windows.

Differential Revision: https://reviews.llvm.org/D27729

llvm-svn: 289964
2016-12-16 17:35:13 +00:00
Hans Wennborg ef57755427 Fix -Wself-assign from r289955
llvm-svn: 289962
2016-12-16 17:16:46 +00:00
David Blaikie 7d4a5599da Revert "dwarfdump: Support/process relocations on a CU's abbrev_off"
Reverting because this breaks lld's gdb_index support - it's probably
double counting the abbrev relocation offset.

This reverts commit r289954.

llvm-svn: 289961
2016-12-16 17:10:17 +00:00
Jun Bum Lim f9416af191 Revert "[CodeGenPrep] Skip merging empty case blocks"
This reverts commit r289951.

llvm-svn: 289960
2016-12-16 17:06:14 +00:00
Matthew Simpson 099af810de [LV] Don't attempt to type-shrink scalarized instructions
After r288909, instructions feeding predicated instructions may be scalarized
if profitable. Since these instructions will remain scalar, we shouldn't
attempt to type-shrink them. We should only truncate vector types to their
minimal bit widths. This bug was exposed by enabling the vectorization of loops
containing conditional stores by default.

llvm-svn: 289958
2016-12-16 16:52:35 +00:00
Dehao Chen 2797800595 Pass sample pgo flags to thinlto.
Summary: ThinLTO needs to invoke the SampleProfileLoader pass at link time in order to annotate profiles correctly after module importing.

Reviewers: davidxl, mehdi_amini, tejohnson

Subscribers: pcc, davide, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27790

llvm-svn: 289957
2016-12-16 16:48:46 +00:00
Hans Wennborg 35f21cba13 [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367)
atomic_load_add returns the value before addition, but sets EFLAGS based on the
result of the addition. That means it's setting the flags based on effectively
subtracting C from the value at x, which is also what the outer cmp does.

This targets a pattern that occurs frequently with reference counting pointers:

  void decrement(long volatile *ptr) {
    if (_InterlockedDecrement(ptr) == 0)
      release();
  }

Clang would previously compile it (for 32-bit at -Os) as:

00000000 <?decrement@@YAXPCJ@Z>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   31 c9                   xor    %ecx,%ecx
   6:   49                      dec    %ecx
   7:   f0 0f c1 08             lock xadd %ecx,(%eax)
   b:   83 f9 01                cmp    $0x1,%ecx
   e:   0f 84 00 00 00 00       je     14 <?decrement@@YAXPCJ@Z+0x14>
  14:   c3                      ret

and with this patch it becomes:

00000000 <?decrement@@YAXPCJ@Z>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   f0 ff 08                lock decl (%eax)
   7:   0f 84 00 00 00 00       je     d <?decrement@@YAXPCJ@Z+0xd>
   d:   c3                      ret

(Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add
or pre-decrement operator generate the same code.)

Differential Revision: https://reviews.llvm.org/D27781

llvm-svn: 289955
2016-12-16 16:34:59 +00:00
David Blaikie e9fda9f201 dwarfdump: Support/process relocations on a CU's abbrev_off
Input can be produced by ld -r, for example (a normal LLVM workflow
never hits this - LLVM only ever produces a single abbrev table in an
object (shared by multiple CUs), so the reloc's always 0, and when it's
linked together the relocation's resolved so it doesn't need to be
handled)

llvm-svn: 289954
2016-12-16 16:31:10 +00:00
Jun Bum Lim 85347dde27 [CodeGenPrep] Skip merging empty case blocks
This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty block:

Summary: Merging an empty case block into the header block of a switch could cause ISel to add COPY instructions in the header of the switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289951
2016-12-16 16:03:31 +00:00
Simon Pilgrim 4b73c3de50 [X86][AVX] Call lowerVectorShuffleWithSHUFPS directly instead of calling DAG.getVectorShuffle (PR27885)
We've already done the hard work of ensuring the mask is safe for 'SHUFPS'.

llvm-svn: 289950
2016-12-16 15:23:32 +00:00
Simon Pilgrim 9519bd9232 [X86][AVX512] use a single shufps for 512-bit vectors when it can save instructions
This is the 512-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837

This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

llvm-svn: 289946
2016-12-16 14:30:04 +00:00
Nico Weber c4d695e25b Speculatively revert r289925, see PR31407
llvm-svn: 289944
2016-12-16 14:02:28 +00:00
Krzysztof Parzyszek 36ef5dc3df [MIRParser] Add parsing hex literals of arbitrary size as unsigned integers
The current code does not parse hex literals larger than 32-bit.

llvm-svn: 289943
2016-12-16 13:58:01 +00:00
Daniel Jasper a44fd3014d Move VerifierSupport into namespace llvm.
It is currently in an unnamed namespace, and so it shouldn't be used by
anything in the header file. This actually triggers a warning with
GCC:
../include/llvm/IR/Verifier.h:39:7: warning: ‘llvm::TBAAVerifier’ has a field ‘llvm::TBAAVerifier::Diagnostic’ whose type uses the anonymous namespace [enabled by default]

llvm-svn: 289942
2016-12-16 13:53:46 +00:00
Benjamin Kramer 24bf8689dd [GlobalISel] Silence unused variable warnings in Release builds.
llvm-svn: 289941
2016-12-16 13:13:03 +00:00
Diana Picus 812caee65a [ARM] GlobalISel: Select add i32, i32
Add the minimal support necessary to select a function that returns the sum of
two i32 values.

This includes some support for argument/return lowering of i32 values through
registers, as well as the handling of copy and add instructions throughout the
GlobalISel pipeline.

Differential Revision: https://reviews.llvm.org/D26677

llvm-svn: 289940
2016-12-16 12:54:46 +00:00
Simon Pilgrim f159a3414f [X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain.
We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead.

llvm-svn: 289937
2016-12-16 11:48:51 +00:00
Chandler Carruth 48b4e614d8 Revert r289863: [LV] Enable vectorization of loops with conditional
stores by default

This uncovers a crasher in the loop vectorizer on PPC when building the
Python runtime. I'll send the testcase to the review thread for the
original commit.

llvm-svn: 289934
2016-12-16 11:31:39 +00:00
Florian Hahn 3c8b8c98b0 [codegen] Add generic functions to skip debug values.
Summary:
This commit moves skipDebugInstructionsForward and
skipDebugInstructionsBackward from lib/CodeGen/IfConversion.cpp
to include/llvm/CodeGen/MachineBasicBlock.h and updates
some codegen files to use them.

This refactoring was suggested in https://reviews.llvm.org/D27688
and I thought it's best to do the refactoring in a separate
review, but I could also put both changes in a single review
if that's preferred.

Also, the names for the functions aren't the snappiest and
I would be happy to rename them if anybody has suggestions. 

Reviewers: eli.friedman, iteratee, aprantl, MatzeB

Subscribers: MatzeB, llvm-commits

Differential Revision: https://reviews.llvm.org/D27782

llvm-svn: 289933
2016-12-16 11:10:26 +00:00
Diana Picus 2af9c389bf [ARM] Expose methods to get the CCAssignFn. NFCI
Add two public methods to ARMTargetLowering: CCAssignFnForCall and
CCAssignFnForReturn, which are just calling the already existing private method
CCAssignFnForNode. These will come in handy for GlobalISel on ARM.

We also replace all calls to CCAssignFnForNode in ARMISelLowering.cpp, because
the new methods are friendlier to the reader.

llvm-svn: 289932
2016-12-16 10:35:20 +00:00
Chandler Carruth 05e80d31bd Revert r289638: [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This patch appears to result in trampolines in vtables being miscompiled
when they in turn tail call a method.

I've posted some preliminary details about the failure on the thread for
this commit and talked to Hal. He was comfortable going ahead and
reverting until we sort out what is wrong.

llvm-svn: 289928
2016-12-16 07:31:20 +00:00
Mehdi Amini a84a840a56 Extract a TBAAVerifier out of the verifier (NFC)
This is intended to be used (in a later patch) by the BitcodeReader
to detect invalid TBAA and drop them when loading bitcode, so that
we don't break client that have legacy bitcode with possible invalid
TBAA.

Differential Revision: https://reviews.llvm.org/D27838

llvm-svn: 289927
2016-12-16 06:29:14 +00:00
Nico Weber 7aac3fdc2c attempt to fix windows build
llvm-svn: 289926
2016-12-16 05:13:02 +00:00
Ekaterina Romanova 25da8a9b53 Update .debug_line section version information to match DWARF version.
One more attempt to re-commit the patch r285355, which I had to revert in r285362 because some tests were failing (the reason is that the size of the line table varied depending on the full file name).

In the past the compiler always emitted .debug_line version 2, though some opcodes from DWARF 3 (e.g. DW_LNS_set_prologue_end, DW_LNS_set_epilogue_begin or DW_LNS_set_isa) and from DWARF 4 could be emitted by the compiler.

This patch changes version information of .debug_line to exactly match the DWARF version. For .debug_line version 4, a new field maximum_operations_per_instruction is emitted.

Differential Revision: https://reviews.llvm.org/D16697

llvm-svn: 289925
2016-12-16 05:10:11 +00:00
Nico Weber 11c2e6ee3a Revert 279703, it caused PR31404.
llvm-svn: 289923
2016-12-16 04:51:25 +00:00
Adrian Prantl 74a835cda0 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariable holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGlobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Teresa Johnson edddca22c9 [ThinLTO] Thin link efficiency: More efficient export list computation
Summary:
Instead of checking whether a global referenced by a function being
imported is defined in the same module, speculatively always add the
referenced globals to the module's export list. After all imports are
computed, for each module prune any not in its defined set from its
export list.

For a huge C++ app with aggressive importing thresholds, even with
D27687 we spent a lot of time invoking modulePath() from
exportGlobalInModule (modulePath() was still the 2nd hottest routine in
profile). The reason is that with comdat/linkonce the summary lists for
each GUID can be long. For the app in question, for example, we were
invoking exportGlobalInModule almost 2 million times, and we traversed
an average of 63 entries in the summary list each time.

This patch reduced the thin link time for the app by about 10% (on top
of D27687) when using aggressive importing thresholds, and about 3.5% on
average with default importing thresholds.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27755

llvm-svn: 289918
2016-12-16 04:11:51 +00:00
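A rough sketch of the prune step described above, with invented container types standing in for the real summary-index data structures.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

using GUID = uint64_t;
using GVSet = std::set<GUID>;

// Export lists are filled speculatively while imports are computed; once all
// imports are known, drop anything a module does not actually define.
void pruneExportLists(std::map<std::string, GVSet> &ExportLists,
                      const std::map<std::string, GVSet> &DefinedInModule) {
  for (auto &Entry : ExportLists) {
    const GVSet &Defined = DefinedInModule.at(Entry.first);
    GVSet Pruned;
    for (GUID G : Entry.second)
      if (Defined.count(G)) // keep only globals this module defines
        Pruned.insert(G);
    Entry.second = std::move(Pruned);
  }
}
```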
Chandler Carruth ba5de63bc3 Add extra headers that got deleted by my revert in r289916 but for which
new usage had already grown in the file.

llvm-svn: 289917
2016-12-16 04:08:31 +00:00
Chandler Carruth 4154062b69 Revert patch series introducing the DAG combine to match a load-by-bytes
idiom.

r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...

This DAG combine has a bad crash in it that is quite hard to trigger
sadly -- it relies on sneaking code with UB through the SDAG build and
into this particular combine. I've responded to the original commit with
a test case that reproduces it.

However, the code also has other problems that will require substantial
changes to address and so I'm going ahead and reverting it for now. This
should unblock us and perhaps others that are hitting the crash in the
wild and will let a fresh patch with updated approach come in cleanly
afterward.

Sorry for any trouble or disruption!

llvm-svn: 289916
2016-12-16 04:05:22 +00:00
Davide Italiano f024a56cb8 [SimplifyLibCalls] Use a lambda. NFCI.
llvm-svn: 289911
2016-12-16 02:28:38 +00:00
Eugene Zelenko 26e8c7df3a [Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289907
2016-12-16 01:00:40 +00:00
Adrian Prantl 03c6d31a3b Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot breakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Peter Collingbourne 7a4be21d46 Add missing library dep.
llvm-svn: 289903
2016-12-16 00:43:00 +00:00
Adrian Prantl ce13935776 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariable holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGlobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Peter Collingbourne 1398a32e28 IPO: Introduce ThinLTOBitcodeWriter pass.
This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.

All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.

Differential Revision: https://reviews.llvm.org/D27324

llvm-svn: 289899
2016-12-16 00:26:30 +00:00
Evandro Menezes 1b48bac330 [AArch64] Add FeatureSlowMisaligned128Store to Exynos M1 and M2
This feature now gates such stores after r289845.  Thus the Exynos
processors now need this feature.

llvm-svn: 289898
2016-12-16 00:18:00 +00:00
Teresa Johnson 19f2aa7891 [ThinLTO] Thin link efficiency improvement: don't re-export globals (NFC)
Summary:
We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
imported from its module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27687

llvm-svn: 289896
2016-12-15 23:50:06 +00:00
Davide Italiano 85ad36b0e0 [SimplifyLibCalls] Lower fls() to llvm.ctlz().
Differential Revision:  https://reviews.llvm.org/D14590

llvm-svn: 289894
2016-12-15 23:45:11 +00:00
David Blaikie 38b74bf249 DebugInfo: Address non-deterministic output (iterating a SmallPtrSet) in 289697
Post-commit review feedback from Adrian Prantl.

Hopefully this fixes that up :)

llvm-svn: 289892
2016-12-15 23:37:38 +00:00
Quentin Colombet 327f942876 [IRTranslator] Merge the entry and ABI lowering blocks.
The IRTranslator uses an additional block before the LLVM-IR entry block
to perform all the ABI lowering and the constant hoisting. Thus, this
block is the actual entry block and it falls through the LLVM-IR entry
block. However, with such representation, we end up with two basic
blocks that are not maximal.

Therefore, this patch adds a bit of canonicalization by merging both the
LLVM-IR entry block and the ABI lowering/constants hoisting into one
block, making the resulting block more likely to be maximal (indeed the
LLVM-IR entry block might not have been maximal).

llvm-svn: 289891
2016-12-15 23:32:25 +00:00
David Blaikie 3e3eb33ed7 DebugInfo: Emit ranges for functions with DISubprograms but lacking locations on any instructions
This seems more consistent, and helps tidy up/simplify some other code
in this change.

llvm-svn: 289889
2016-12-15 23:17:52 +00:00
Davide Italiano 890e850348 [SimplifyLibCalls] Remove redundant folding logic for ffs().
Lowering to llvm.cttz() will result in constant folding anyway
if the argument to ffs is a constant. Pointed out by Eli for
fls() in D14590.

llvm-svn: 289888
2016-12-15 23:11:00 +00:00
Eli Friedman 379294676d Don't combine splats with other shuffles.
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .

Differential Revision: https://reviews.llvm.org/D27793

llvm-svn: 289882
2016-12-15 22:41:40 +00:00
Yichao Yu 8f8cdd00da Fix R_AARCH64_MOVW_UABS_G3 relocation
Summary: The relocation is missing a mask, so an address that has non-zero bits in 47:43 may overwrite the register number. (This frequently shows up as the target register being changed to `xzr`...)

Reviewers: t.p.northover, lhames

Subscribers: davide, aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D27609

llvm-svn: 289880
2016-12-15 22:36:53 +00:00
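A hedged sketch of a correctly masked G3 fixup (the function name is invented, and this is not the actual RuntimeDyld code): R_AARCH64_MOVW_UABS_G3 places bits [63:48] of the target address into the MOVZ/MOVK imm16 field, which occupies instruction bits [20:5], directly above the register number in bits [4:0].

```cpp
#include <cstdint>

// Skipping the 16-bit mask is what lets lower address bits leak into bits
// [4:0] of the instruction, i.e. the destination register number.
uint32_t applyMovwUAbsG3(uint32_t Inst, uint64_t Addr) {
  uint32_t Imm16 = static_cast<uint32_t>((Addr >> 48) & 0xFFFF);
  return (Inst & ~(0xFFFFu << 5)) | (Imm16 << 5);
}
```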
Matt Arsenault 327188aa15 AMDGPU: Select branch on undef to uniform scc branch
llvm-svn: 289877
2016-12-15 21:57:11 +00:00
Teresa Johnson eb0ac24172 [ThinLTO] Revert part of r289843 that belonged to another patch.
The code change for D27687 accidentally got committed along with the
main change in r289843. Revert it temporarily, so that I can recommit it
along with its test as intended.

llvm-svn: 289875
2016-12-15 21:39:42 +00:00
Eli Friedman 34505083c6 Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.

Differential Revision: https://reviews.llvm.org/D27787

llvm-svn: 289874
2016-12-15 21:36:59 +00:00
Sanjoy Das 6698c15cb6 [Verifier] Allow TBAA metadata on atomicrmw and atomiccmpxchg
This used to be allowed before r289402 by default (before r289402 you
could have TBAA metadata on any instruction), and while I'm not sure
that it helps, it does sound reasonable enough to not fail the verifier
and we have out-of-tree users who use this.

llvm-svn: 289872
2016-12-15 21:23:44 +00:00
Teresa Johnson 0c3f57b133 [ThinLTO] Remove stale comment (NFC)
This should have been removed with r288446.

llvm-svn: 289871
2016-12-15 20:53:31 +00:00
Matt Arsenault 0b386360c5 AMDGPU: Fix asserting on returned tail calls
llvm-svn: 289868
2016-12-15 20:50:12 +00:00
Teresa Johnson 475b51a700 [ThinLTO] Thin link efficiency: skip candidate added later with higher threshold (NFC)
Summary:
Thin link efficiency improvement. After adding an importing candidate to
the worklist we might have later added it again with a higher threshold.
Skip it when popped from the worklist if we recorded a higher threshold
than the current worklist entry, it will get processed again at the
higher threshold when that entry is popped.

This required adding the summary's GUID to the worklist, so that it can
be used to query the recorded highest threshold for it when we pop from the
worklist.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27696

llvm-svn: 289867
2016-12-15 20:48:19 +00:00
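A simplified sketch of the stale-entry check described above, using invented types in place of the ThinLTO summary index.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using GUID = uint64_t;

// Each worklist entry remembers the threshold it was pushed with; a side
// table records the highest threshold seen per GUID.  A popped entry whose
// threshold is lower than the recorded one is stale and can be skipped.
void drainWorklist(std::vector<std::pair<GUID, unsigned>> &Worklist,
                   std::map<GUID, unsigned> &HighestThreshold) {
  while (!Worklist.empty()) {
    GUID G = Worklist.back().first;
    unsigned Threshold = Worklist.back().second;
    Worklist.pop_back();
    if (HighestThreshold[G] > Threshold)
      continue; // superseded by a later push at a higher threshold
    // ... process the candidate for G at `Threshold`, pushing its callees ...
  }
}
```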
Matt Arsenault 0e8a299f19 AMDGPU: Assembler support for vintrp instructions
llvm-svn: 289866
2016-12-15 20:40:20 +00:00
Matthew Simpson 6a98bcfe33 [LV] Enable vectorization of loops with conditional stores by default
This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".

Differential Revision: https://reviews.llvm.org/D27814

llvm-svn: 289863
2016-12-15 20:11:05 +00:00
Andrea Di Biagio f20c57eca9 [SimplifyCFG] Merge debug locations when hoisting an instruction from a then/else branch. NFC.
Now that a new API to merge debug locations has been committed at r289661 (see
review D26256 for more details), we can use it to "improve" the code added by
revision r280995.

Instead of nulling the debugloc of a commoned instruction, we use the 'merged'
debug location. At the moment, this is just a no functional change since
function `DILocation::getMergedLocation()` is just a stub and would always
return a null location.

Differential Revision: https://reviews.llvm.org/D27804

llvm-svn: 289862
2016-12-15 20:01:26 +00:00
Geoff Berry 66d1f0ff1f [LiveRangeEdit] Change eliminateDeadDef assert to if condition.
The assert could potentially fire (though no cases have been
encountered), so just check that the instruction we're handling
specially for rematerialization only has one def to begin with.

Reviewed by Wei Mi over email.

llvm-svn: 289861
2016-12-15 19:55:19 +00:00
Peter Collingbourne e089554c8f LibDriver: Allow resource files to be archive members.
It seems pointless to add a resource to an archive because it won't have
any symbols to link against (and link.exe doesn't have an equivalent of
--whole-archive), but lib.exe allows it for some reason.

llvm-svn: 289859
2016-12-15 19:37:46 +00:00
Sanjay Patel d640641a61 [InstCombine] add folds for icmp (smin X, Y), X
Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns. 
This patch won't solve the example that was attached to that thread, so something else still needs fixing.

The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.

Corresponding changes for smax/umin/umax to follow.

Differential Revision: https://reviews.llvm.org/D27531

llvm-svn: 289855
2016-12-15 19:13:37 +00:00
Ahmed Bougacha 5228603387 [GlobalISel] Drop workaround for Legalizer member/class sharing a name. NFC.
MachineLegalizer used to be the name of both the class and the member,
causing GCC errors. r276522 fixed that by renaming the member to just
'Legalizer'.  The 'class' workaround isn't necessary anymore; drop it.

llvm-svn: 289848
2016-12-15 18:45:30 +00:00
Sanjay Patel a97358bc8e [x86] use a single shufps for 256-bit vectors when it can save instructions
This is the 256-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837

This patch is based on the draft by @sroland (Roland Scheidegger) that is
attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

llvm-svn: 289846
2016-12-15 18:43:46 +00:00
Matthew Simpson 2c8de192a1 [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature
This patch checks that the SlowMisaligned128Store subtarget feature is set
when penalizing such stores in getMemoryOpCost.

Differential Revision: https://reviews.llvm.org/D27677

llvm-svn: 289845
2016-12-15 18:36:59 +00:00
Ahmed Bougacha 2a26a5f1f0 [AArch64][GlobalISel] Remove redundant RBI comments. NFC.
It's brittle, and Doxygen already picks up the overridden method's comment
anyway.

llvm-svn: 289844
2016-12-15 18:22:15 +00:00
Teresa Johnson 1b859a2306 [ThinLTO] Ensure callees get hot threshold when first seen on cold path
This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.

Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.

llvm-svn: 289843
2016-12-15 18:21:01 +00:00
Sanjay Patel a0d8a278a7 [x86] use a single shufps when it can save instructions
This is a tiny patch with a big pile of test changes.
This partially fixes PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

My motivating case looks like this:

  - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
  - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
  - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]

  + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]

And this happens several times in the diffs. For chips with domain-crossing penalties,
the instruction count and size reduction should usually overcome any potential 
domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such
as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so
using shufps is a pure win.

So the test case diffs all appear to be improvements except one test in 
vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate 
zero elements and one test in combine-sra.ll where multiple uses prevent the expected
shuffle combining.

Differential Revision: https://reviews.llvm.org/D27692

llvm-svn: 289837
2016-12-15 18:03:38 +00:00
Simon Pilgrim 7522f54feb [X86][SSE] Fix domains for scalar store instructions
As discussed on D27692

llvm-svn: 289834
2016-12-15 17:09:24 +00:00
Robert Lougher 6ea759a83e Revert "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of common inst"
Reverting as it is causing buildbot failures (address sanitizer).

llvm-svn: 289833
2016-12-15 16:59:13 +00:00
Jacques Pienaar ccffe38352 [lanai] Simplify small section check in LowerGlobalAddress and treat ldata sections specially.
Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation).

llvm-svn: 289832
2016-12-15 16:56:16 +00:00
Simon Pilgrim ba46422694 [X86][AVX512] Moved instruction domain lookups to the right table. NFCI.
Avoid duplicating instructions in the int32/int64 domains.

llvm-svn: 289830
2016-12-15 16:38:51 +00:00
Robert Lougher cf17674211 [SimplifyCFG] In sinkLastInstruction correctly set debugloc of "common" inst
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.

Differential Revision: https://reviews.llvm.org/D27590

llvm-svn: 289828
2016-12-15 16:17:53 +00:00
Simon Pilgrim d7518896ff [X86][SSE] Fix domains for VZEXT_LOAD type instructions
Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.

Differential Revision: https://reviews.llvm.org/D27684

llvm-svn: 289825
2016-12-15 16:05:29 +00:00
Alexander Timofeev a57511c451 Fix for regression after Global Load Scalarization patch
llvm-svn: 289822
2016-12-15 15:17:19 +00:00
Krzysztof Parzyszek 91b5cf8412 Extract LaneBitmask into a separate type
Specifically avoid implicit conversions from/to integral types to
avoid potential errors when changing the underlying type. For example,
a typical initialization of a "full" mask was "LaneMask = ~0u", which
would result in a value of 0x00000000FFFFFFFF if the type was extended
to uint64_t.

Differential Revision: https://reviews.llvm.org/D27454

llvm-svn: 289820
2016-12-15 14:36:06 +00:00
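The pitfall motivating a dedicated type is easy to reproduce standalone:

```cpp
#include <cstdint>
#include <iostream>

int main() {
  uint32_t Mask32 = ~0u;          // 0xFFFFFFFF: a full 32-bit mask
  uint64_t Mask64 = ~0u;          // 0x00000000FFFFFFFF: NOT a full 64-bit mask
  uint64_t Full64 = ~uint64_t(0); // what a full 64-bit mask should be
  std::cout << std::hex << Mask32 << ' ' << Mask64 << ' ' << Full64 << '\n';
}
```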
Simon Pilgrim 2f7f0e7a48 [CostModel][X86] Updated reverse shuffle costs
llvm-svn: 289819
2016-12-15 14:24:07 +00:00
Ehsan Amiri 795b0671c5 [InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp
A number of new patterns for simplifying and/xor of icmp:

(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.

(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
   %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.

llvm-svn: 289813
2016-12-15 12:25:13 +00:00
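The first identity can be sanity-checked exhaustively over a small domain; this standalone snippet assumes a single-bit (power-of-two) mask, as condition (2) requires.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const uint8_t Mask = 8; // any power-of-two mask works
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      uint8_t X = A & Mask, Y = B & Mask;
      // (icmp ne %x, 0) ^ (icmp ne %y, 0)  ==  icmp ne %x, %y
      assert(((X != 0) ^ (Y != 0)) == (X != Y));
    }
  return 0;
}
```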
Simon Pilgrim 9876ed07f6 [CostModel] Fix long standing bug with reverse shuffle mask detection
Incorrect 'undef' mask index matching meant that broadcast shuffles could be detected as reverse shuffles

llvm-svn: 289811
2016-12-15 12:12:45 +00:00
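For reference, a standalone sketch of an undef-tolerant reverse-mask check (assuming the usual convention that -1 marks an undef lane); the actual cost-model code differs, but the point is that undef lanes must be skipped rather than treated as matching an arbitrary index.

```cpp
#include <vector>

// A reverse shuffle of N elements takes lane i from element N-1-i wherever
// the lane is defined; undef (-1) lanes impose no constraint.
bool isReverseVectorMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int I = 0; I < N; ++I)
    if (Mask[I] != -1 && Mask[I] != N - 1 - I)
      return false; // e.g. the broadcast mask <0,0,0,0> is rejected here
  return true;
}
```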
Nemanja Ivanovic 552c8e960e [Power9] Allow AnyExt immediates for XXSPLTIB
In some situations, the BUILD_VECTOR node that builds a v16i8 vector by
a splat of an i8 constant will end up with signed 8-bit values, and in other
situations it'll end up with unsigned ones. Handle both situations.

Fixes PR31340.

llvm-svn: 289804
2016-12-15 11:16:20 +00:00
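A standalone illustration of the signed/unsigned ambiguity for a splatted byte (values shown assume the usual two's-complement targets):

```cpp
#include <cstdint>
#include <iostream>

int main() {
  uint8_t Byte = 0x90;
  int64_t SignExt = static_cast<int8_t>(Byte); // -112
  int64_t ZeroExt = Byte;                      // 144
  std::cout << SignExt << " and " << ZeroExt
            << " describe the same splat byte 0x90\n";
}
```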
Dylan McKay 4f590f28e7 [AVR] Support floats in the instrumentation pass
This also refactors some common code into the 'GetTypeName' method.

llvm-svn: 289803
2016-12-15 11:02:41 +00:00
Sjoerd Meijer 96e10b5a9e [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
This is essentially a recommit of r285893, but with a correctness fix. The
problem of the original commit was that this:

bic r5, r7, #31
cbz r5, .LBB2_10

got rewritten into:

lsrs  r5, r7, #5
beq .LBB2_10

The result in destination register r5 is not the same and this is incorrect
when r5 is not dead. So this fix includes checking the uses of the AND
destination register. And also, compared to the original commit, some regression
tests didn't need changing anymore because of this extra check.

For completeness, this was the original commit message:

For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more
efficient instruction selection if the bitmask is one consecutive sequence of
set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and
set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and
set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit
into the sign bit with one LSLS and change the condition query from NE/EQ to
MI/PL (we could also implement this by shifting into the carry bit and
branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower
zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two
16-bit instructions but can elide the CMP and doesn't require materializing a
complex immediate, so is also a win.

Differential Revision: https://reviews.llvm.org/D27761

llvm-svn: 289794
2016-12-15 09:38:59 +00:00
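The contiguity test quoted above can be checked with a standalone snippet (GCC/Clang builtins; bm == 0 is excluded because clz/ctz are undefined there):

```cpp
#include <cstdint>
#include <iostream>

// 32 - clz(bm) - ctz(bm) == popcount(bm) holds exactly when the set bits of
// a non-zero 32-bit mask form one consecutive run with no holes.
static bool isContiguousMask(uint32_t BM) {
  if (BM == 0)
    return false;
  return 32 - __builtin_clz(BM) - __builtin_ctz(BM) == __builtin_popcount(BM);
}

int main() {
  std::cout << isContiguousMask(0x0000001F) << '\n'; // 1: touches the LSB (case 1)
  std::cout << isContiguousMask(0xFFE00000) << '\n'; // 1: touches the MSB (case 2)
  std::cout << isContiguousMask(0x00000800) << '\n'; // 1: popcount == 1 (case 3)
  std::cout << isContiguousMask(0x00012000) << '\n'; // 0: the run has a hole
  return 0;
}
```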
Dylan McKay 4b028e2ee1 [AVR] Add argument indices to the instrumentation hook functions
This allows the instrumentation hook functions to do better
pretty-printing.

llvm-svn: 289793
2016-12-15 09:38:09 +00:00
Prakhar Bahuguna 13e9921ccc Fix for build warning in execute-only support
llvm-svn: 289788
2016-12-15 08:42:04 +00:00
Prakhar Bahuguna e640c6f765 Allow ELF section flags to be specified numerically
Summary:
GAS already allows flags for sections to be specified directly as a
numeric value. This functionality is particularly useful for setting
processor or application-specific values that may not be directly
supported or understood by LLVM. This patch allows LLVM to use numeric
section flag values verbatim if specified by the assembly file.

Reviewers: grosbach, rafael, t.p.northover, rengolin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27451

llvm-svn: 289785
2016-12-15 07:59:15 +00:00
Prakhar Bahuguna 52a7dd7d78 [ARM] Implement execute-only support in CodeGen
This implements execute-only support for ARM code generation, which
prevents the compiler from generating data accesses to code sections.
The following changes are involved:

* Add the CodeGen option "-arm-execute-only" to the ARM code generator.
* Add the clang flag "-mexecute-only" as well as the GCC-compatible
  alias "-mpure-code" to enable this option.
* When enabled, literal pools are replaced with MOVW/MOVT instructions,
  with VMOV used in addition for floating-point literals. As the MOVT
  instruction is required, execute-only support is only available in
  Thumb mode for targets supporting ARMv8-M baseline or Thumb2.
* Jump tables are placed in data sections when in execute-only mode.
* The execute-only text section is assigned section ID 0, and is
  marked as unreadable with the SHF_ARM_PURECODE flag with symbol 'y'.
  This also overrides selection of ELF sections for globals.

llvm-svn: 289784
2016-12-15 07:59:08 +00:00
Kostya Serebryany 628b43aab6 [libFuzzer] enable the failure-resistant merge by default (with trace-pc-guard only)
llvm-svn: 289772
2016-12-15 06:21:21 +00:00
Hal Finkel f19e114237 Revert part of r289765 that is not necessary
CS.doesNotAccessMemory(ArgNo) and CS.onlyReadsMemory(ArgNo) call
dataOperandHasImpliedAttr, so revert this part of r289765 because
it should not be necessary.

llvm-svn: 289768
2016-12-15 05:50:45 +00:00
Hal Finkel 34f9d6ac11 Trying to fix NDEBUG build after r289764
llvm-svn: 289766
2016-12-15 05:33:19 +00:00
Hal Finkel 39fed399e1 Fix argument attribute queries with bundle operands
When iterating over data operands in AA, don't make argument-attribute-specific
queries on bundle operands. Trying to fix self hosting...

llvm-svn: 289765
2016-12-15 05:09:15 +00:00
Sanjoy Das d7389d6261 [MachineBlockPlacement] Don't make blocks "uneditable"
Summary:
This fixes an issue with MachineBlockPlacement due to a badly timed call
to `analyzeBranch` with `AllowModify` set to true.  The timeline is as
follows:

 1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls
    `TailDup.shouldTailDuplicate` on its argument, which in turn calls
    `analyzeBranch` with `AllowModify` set to true.

 2. This `analyzeBranch` call edits the terminator sequence of the block
    based on the physical layout of the machine function, turning an
    unanalyzable non-fallthrough block into an unanalyzable fallthrough
    block.  Normally MBP bails out of rearranging such blocks, but this
    block was unanalyzable non-fallthrough (and thus rearrangeable) the
    first time MBP looked at it, and so it goes ahead and decides where
    it should be placed in the function.

 3. When placing this block MBP fails to analyze and thus update the
    block in keeping with the new physical layout.

Concretely, before (1) we have something like:

```
LBL0:
  < unknown terminator op that may branch to LBL1 >
  jmp LBL1

LBL1:
  ... A

LBL2:
  ... B
```

In (2), analyze branch simplifies this to

```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump removed

LBL1:
  ... A

LBL2:
  ... B
```

In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2
after the first block since that is profitable.

```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump

LBL2:
  ... B

LBL1:
  ... A
```

and the program now has incorrect behavior (we no longer fall-through
from `LBL0` to `LBL1`) because MBP can no longer edit LBL0.

There are several possible solutions, but I went with removing the teeth
off of the `analyzeBranch` calls in TailDuplicator.  That makes thinking
about the result of these calls easier, and breaks nothing in the lit
test suite.

I've also added some bookkeeping to the MachineBlockPlacement pass and
used that to write an assert that would have caught this.

Reviewers: chandlerc, gberry, MatzeB, iteratee

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27783

llvm-svn: 289764
2016-12-15 05:08:57 +00:00
Craig Topper ab5f355d8c [AVX-512][InstCombine] Add masked scalar FMA intrinsics to SimplifyDemandedVectorElts.
llvm-svn: 289759
2016-12-15 03:49:45 +00:00
Hal Finkel 321053a7ca Fix iterator-invalidation issue
Inserting a new key into a DenseMap potentially invalidates iterators into that
map. Trying to fix an issue from r289755 triggering this assertion:

  Assertion `isHandleInSync() && "invalid iterator access!"' failed.

llvm-svn: 289757
2016-12-15 03:30:40 +00:00
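A minimal sketch of the hazard, assuming nothing about the code in question beyond its use of `llvm::DenseMap`:

```cpp
#include "llvm/ADT/DenseMap.h"

// Inserting into a DenseMap may rehash, so iterators and references obtained
// beforehand must not be used afterwards; copy the value out first.
void updateNeighbor(llvm::DenseMap<int, int> &Map, int Key) {
  auto It = Map.find(Key);
  int Cached = (It != Map.end()) ? It->second : 0; // copy before inserting
  Map[Key + 1] = Cached; // may grow the map and invalidate `It`
  // Dereferencing `It` past this point is exactly what the quoted assertion
  // catches.
}
```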
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Hal Finkel cb9f78e1c3 Make processing @llvm.assume more efficient by using operand bundles
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).

Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.

At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.

Differential Revision: https://reviews.llvm.org/D27259

llvm-svn: 289755
2016-12-15 02:53:42 +00:00
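A hedged sketch of the cheaper lookup this enables (illustrative only, not the actual ValueTracking implementation): with affected values listed as operands of the assume, a pass only walks the users of the value in question instead of every @llvm.assume in the function.

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// Visit every llvm.assume that has V as an operand (its condition or one of
// the new bundle operands).  Cost is proportional to V's users, not to the
// number of assumes in the function.
static void forEachAssumeUsing(Value *V,
                               function_ref<void(IntrinsicInst *)> Fn) {
  for (User *U : V->users())
    if (auto *II = dyn_cast<IntrinsicInst>(U))
      if (II->getIntrinsicID() == Intrinsic::assume)
        Fn(II);
}
```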
Justin Lebar 7853d3b9dd [NVPTX] Remove dead #defines from NVPTXUtilities.h.
llvm-svn: 289747
2016-12-15 00:45:06 +00:00
Joerg Sonnenberger 400e7b7811 Use PIC relocation model as default for PowerPC64 ELF.
Most of the PowerPC64 code generation for the ELF ABI is already PIC.
There are four main exceptions:
(1) Constant pointer arrays etc. should be in writable sections.
(2) The TOC restoration NOP after a call is needed for all global
symbols. While GNU ld has a workaround for questionable GCC self-calls,
we trigger the checks for calls from COMDAT sections as they cross input
sections and are therefore not considered self-calls. The current
decision is questionable and suboptimal, but outside the scope of the
change.
(3) TLS access can not use the initial-exec model.
(4) Jump tables should use relative addresses. Note that the current
encoding doesn't work for the large code model, but it is more compact
than the default for any non-trivial jump table. Improving this is again
beyond the scope of this change.

At least (1) and (3) are assumptions made in target-independent code and
introducing additional hooks is a bit messy. Testing with clang shows
that a -fPIC binary is 600KB smaller than the corresponding -fno-pic
build. Separate testing from improved jump table encodings would explain
only about 100KB or so. The rest is expected to be a result of more
aggressive immediate forming for -fno-pic, where the -fPIC binary just
uses TOC entries.

This change brings the LLVM output in line with the GCC output, other
PPC64 compilers like XLC on AIX are known to produce PIC by default
as well. The relocation model can still be provided explicitly, i.e.
when using MCJIT.

One test case for case (1) is included, other test cases with relocation
mode sensitive behavior are wired to static for now. They will be
reviewed and adjusted separately.

Differential Revision: https://reviews.llvm.org/D26566

llvm-svn: 289743
2016-12-15 00:01:53 +00:00
Justin Lebar a54f4d7052 [NVPTX] Remove dead code.
I've chosen to remove NVPTXInstrInfo::CanTailMerge but not
NVPTXInstrInfo::isLoadInstr and isStoreInstr (which are also dead)
because while the latter two are reasonably useful utilities, the former
cannot be used safely: It relies on successful address space inference
to identify writes to shared memory, but addrspace inference is a
best-effort thing.

llvm-svn: 289740
2016-12-14 23:20:40 +00:00
Sanjay Patel afee21a5b2 [DAG] allow more select folding for targets that have 'and not' (PR31175)
The original motivation for this patch comes from wanting to canonicalize 
more IR to selects and also canonicalizing min/max.

If we're going to do that, we need more backend fixups to undo select codegen 
when simpler ops will do. I chose AArch64 for the tests because that shows the
difference in the simplest way. This should fix:
https://llvm.org/bugs/show_bug.cgi?id=31175

Differential Revision: https://reviews.llvm.org/D27489

llvm-svn: 289738
2016-12-14 22:59:14 +00:00
Eugene Zelenko f9f8c68290 [Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289736
2016-12-14 22:50:46 +00:00
Greg Clayton 52fe1f68c8 Add the ability to get attribute values as Optional<T>
When getting attributes, it is sometimes nicer to use Optional<T> instead of magic values. I tried to cut over to only using the Optional values, but it made many of the call sites very messy, so it makes sense to leave in the calls that can return a default value. Otherwise code that looks like this:

uint64_t CallColumn = Die.getAttributeValueAsAddress(DW_AT_call_line, 0);

Has to be turned into:

uint64_t CallColumn = 0;
if (auto CallColumnValue = Die.getAttributeValueAsAddress(DW_AT_call_line))
    CallColumn = *CallColumnValue;

The first snippet of code looks much better. But in cases where you want an offset that may or may not be there, the following code looks better:

if (auto StmtOffset = Die.getAttributeValueAsSectionOffset(DW_AT_stmt_list)) {
  // Use StmtOffset
}

Differential Revision: https://reviews.llvm.org/D27772

llvm-svn: 289731
2016-12-14 22:38:08 +00:00
Justin Lebar 19bf9d2b6d [NVPTX] Support .maxnreg annotation.
Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D27638

llvm-svn: 289729
2016-12-14 22:32:50 +00:00
Justin Lebar e6867085fa [NVPTX] Remove string constants from NVPTXBaseInfo.h.
Summary:
Previously they were defined as a 2D char array in a header file.  This
is kind of overkill -- we can let the linker lay out these strings
however it pleases.  While we're at it, we might as well just inline
these constants where they're used, as each of them is used only once.

Also move NVPTXUtilities.{h,cpp} into namespace llvm.

Reviewers: tra

Subscribers: jholewinski, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D27636

llvm-svn: 289728
2016-12-14 22:32:44 +00:00
Peter Collingbourne b677fe00fb LibDriver: Reject inputs that are not COFF objects or bitcode files.
Fixes PR31372.

Differential Revision: https://reviews.llvm.org/D27776

llvm-svn: 289726
2016-12-14 22:19:22 +00:00
Dehao Chen 40dd8c5109 Only sets profile summary when it was not preset.
Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass.

Reviewers: eraman, davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D27733

llvm-svn: 289725
2016-12-14 22:06:49 +00:00
Dehao Chen fb699619a0 Fix the bug in r289714 (NFC).
llvm-svn: 289724
2016-12-14 22:03:08 +00:00
Davide Italiano 2ceb628f36 [LTO] Reject modules without datalayout.
Also, update the ~60 failing tests in the tree which did
not contain a valid datalayout.
This fixes PR31123. lld will be updated in a following patch,
immediately after this is committed.

Differential Revision:  https://reviews.llvm.org/D27082

llvm-svn: 289719
2016-12-14 21:57:04 +00:00
Filipe Cabecinhas dd9688703c [asan] Don't skip instrumentation of masked load/store unless we've seen a full load/store on that pointer.
Reviewers: kcc, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27625

llvm-svn: 289718
2016-12-14 21:57:04 +00:00
Filipe Cabecinhas 1e69017a6d [asan] Hook ClInstrumentWrites and ClInstrumentReads to masked operation instrumentation.
Reviewers: kcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27548

llvm-svn: 289717
2016-12-14 21:56:59 +00:00
Dehao Chen a99e082e15 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289714
2016-12-14 21:40:47 +00:00
Eli Friedman cbed30c501 [ARM] Split 128-bit vectors in BUILD_VECTOR lowering
Given that INSERT_VECTOR_ELT operates on D registers anyway, combining
64-bit vectors into a 128-bit vector is basically free. Therefore, try
to split BUILD_VECTOR nodes before giving up and lowering them to a series
of INSERT_VECTOR_ELT instructions. Sometimes this allows dramatically
better lowerings; see testcases for examples. Inspired by similar code
in the x86 backend for AVX.

Differential Revision: https://reviews.llvm.org/D27624

llvm-svn: 289706
2016-12-14 20:44:38 +00:00
Nico Weber 53816d074d fix gcc warning about a superfluous ;
llvm-svn: 289705
2016-12-14 20:33:54 +00:00
Robert Lougher cfd7198698 [InstCombine] Folding of a compare with RHS const should merge debug locations
If all the operands to a phi node are compares that have a RHS constant,
instcombine will try to pull them through the phi node, combining them into
a single operation. When it does this, the debug location of the new op
should be the merged debug locations of the phi node arguments.

Patch 8 of 8 for D26256.  Folding of a compare that has a RHS constant.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289704
2016-12-14 20:27:22 +00:00
Eli Friedman 10576e73c9 [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.

Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

Differential Revision: https://reviews.llvm.org/D27694

llvm-svn: 289703
2016-12-14 20:25:26 +00:00
Amjad Aboud 43c8b6b7b2 [DebugInfo] Changed DIBuilder::createCompileUnit() to take DIFile instead of FileName and Directory.
This way it will be easier to expand DIFile (e.g., to contain a checksum) without the need to modify the createCompileUnit() API.

Reviewers: llvm-commits, rnk

Differential Revision: https://reviews.llvm.org/D27762

llvm-svn: 289702
2016-12-14 20:24:54 +00:00
Yaxun Liu 04334b527d Fix build failure due to r289674 on certain systems
Removed a useless include which caused a conflict.

llvm-svn: 289700
2016-12-14 20:17:47 +00:00
Robert Lougher c9f7354776 [InstCombine] Folding of a binop with RHS const should merge the debug locations
If all the operands to a phi node are a binop with a RHS constant, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the new op should be the
merged debug locations of the phi node arguments.

Patch 7 of 8 for D26256.  Folding of a binop with RHS constant.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289699
2016-12-14 20:07:49 +00:00
David Blaikie b461468958 DebugInfo: Improve type safety and simplify some subprogram finalization code
This probably ended up this way after the subprogram<>function link
inversion and debug info metadata schema changes.

llvm-svn: 289697
2016-12-14 19:38:39 +00:00
Geoff Berry ca11a1e147 [GVNHoist] Move GVNHoist to function simplification part of pipeline.
Summary:
Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline.  The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.

Performance results on AArch64 kryo:
Improvements:
  Benchmarks/CoyoteBench/fftbench  -24.952%
  spec2006/bzip2                    -4.071%
  internal bmark                    -3.177%
  Benchmarks/PAQ8p/paq8p            -1.754%
  spec2000/perlbmk                  -1.328%
  spec2006/h264ref                  -1.140%

Regressions:
  internal bmark                    +1.818%
  Benchmarks/mafft/pairlocalalign   +1.084%

Reviewers: sebpop, dberlin, hiraditya

Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27722

llvm-svn: 289696
2016-12-14 19:38:22 +00:00
Andrew Kaylor ce3bcae632 [WinEH] Avoid holding references to BlockColor (DenseMap) entries while inserting new elements
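
For illustration, a minimal standalone sketch of the general hazard (not the WinEH code; the names here are invented): references into an llvm::DenseMap live in a flat table, so an insertion that grows the table invalidates them.

  #include "llvm/ADT/DenseMap.h"

  void example(llvm::DenseMap<int, int> &Map) {
    int &Slot = Map[0];      // reference into the current table
    Map[12345] = 7;          // may trigger a grow/rehash of the table
    // Slot may now dangle; re-do the lookup instead of reusing the reference:
    Map[0] = 1;              // safe: fresh lookup after the insertion
  }
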
Differential Revision: https://reviews.llvm.org/D27693

llvm-svn: 289694
2016-12-14 19:30:18 +00:00
Robert Lougher f02d9b8325 [InstCombine] When folding casts through a phi node merge the debug locations
If all the operands to a phi node are a cast, instcombine will try to pull
them through the phi node, combining them into a single cast. When it does
this, the debug location of the new cast should be the merged debug locations
of the phi node arguments.

Patch 6 of 8 for D26256.  Folding of a cast operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289693
2016-12-14 19:24:01 +00:00
Sean Callanan 62204ad74a Include <cstdarg> in PrettyStackTrace.cpp, fixing the bots.
llvm-svn: 289691
2016-12-14 19:19:53 +00:00
Sean Callanan 032dbf9ee3 Prepare PrettyStackTrace for LLDB adoption
This patch fixes the linkage for __crashtracer_info__, making it have the proper mangling (extern "C") and linkage (private extern).
It also adds a new PrettyStackTrace type, allowing LLDB to adopt this instead of Host::SetCrashDescriptionWithFormat().

Without this patch, CrashTracer on macOS won't pick up pretty stack traces from any LLVM client. 
An LLDB commit adopting this API will follow shortly.

Differential Revision: https://reviews.llvm.org/D27683

llvm-svn: 289689
2016-12-14 19:09:43 +00:00
Robert Lougher 373e36a410 [InstCombine] Folding loads through a phi node should merge the debug locations
If all the operands to a phi node are a load, instcombine will try to pull
them through the phi node, combining them into a single load. When it does
this, the debug location of the new load should be the merged debug locations
of the phi node arguments.

Patch 5 of 8 for D26256.  Folding of a load operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289688
2016-12-14 19:02:14 +00:00
Robert Lougher 8fc1e89bbb [InstCombine] When folding GEP through a phi node merge the debug locations
If all the operands to a phi node are getelementptr, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the new getelementptr
should be the merged debug locations of the phi node arguments.

Patch 4 of 8 for D26256.  Folding of a getelementptr operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289684
2016-12-14 18:37:50 +00:00
Eric Christopher ba1024cfb8 This change does two things:
Adds a "Discriminator" field to struct DILineInfo, which defaults to 0.
Fills out the "Discriminator" field in DILineInfo in DWARFDebugLine::LineTable::getFileLineInfoForAddress().

in order to have a slightly nicer interface in getFileLineInfoForAddress.

Patch by Simon Que!

Differential Revision: https://reviews.llvm.org/D27649

llvm-svn: 289683
2016-12-14 18:29:39 +00:00
Robert Lougher 4b0790d488 [InstCombine] Merge debug locations when folding through a phi node
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.

Patch 3 of 8 for D26256.  Folding of a compare operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289681
2016-12-14 18:14:57 +00:00
Kostya Serebryany d9d9a54511 [libFuzzer] disable msan for one more hook that reads target's data that might be uninitialized
llvm-svn: 289680
2016-12-14 18:13:02 +00:00
Robert Lougher 2428a4050f [InstCombine] Merge debug locations when folding through a phi node
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.

Patch 2 of 8 for D26256.  Folding of a binary operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289679
2016-12-14 17:49:19 +00:00
Dehao Chen 23025f8483 revert r289669 which breaks bots
llvm-svn: 289676
2016-12-14 17:23:16 +00:00
Yaxun Liu 07d659bc76 AMDGPU: Emit runtime metadata version 2 as YAML
Differential Revision: https://reviews.llvm.org/D25046

llvm-svn: 289674
2016-12-14 17:16:52 +00:00
Matt Arsenault bdc0ac0a0e AMDGPU: Make AllocationPriority of SGPRs higher than VGPRs
Since SGPRs should spill to VGPRs, they should be allocated first.
I don't think this is sufficient for SGPRs to always spill to
VGPRs though.

llvm-svn: 289671
2016-12-14 16:52:06 +00:00
Dehao Chen cb61c94d87 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289669
2016-12-14 16:49:28 +00:00
Nirav Dave f5bf03c7ef Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667
2016-12-14 16:43:44 +00:00
Matt Arsenault ebfba7027e AMDGPU: Change vintrp printing
llvm-svn: 289664
2016-12-14 16:36:12 +00:00
Nirav Dave 8527ab0ad2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
Simon Pilgrim 05ab8ffc7e [DAGCombiner] Try to use SelectionDAG::isKnownToBeAPowerOfTwo instead of just APInt::isPowerOf2
Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases.

Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value.
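
As an illustrative sketch of the kind of case the value-tracking query catches but a constant check cannot (not taken from the tests):

  // The divisor below is a power of two for any in-range shift amount, but it
  // is not a constant, so APInt::isPowerOf2 cannot prove it; value tracking
  // can, allowing the udiv to become a shift.
  unsigned divByShiftedOne(unsigned X, unsigned S) {
    unsigned D = 1u << S;   // known power of two (assuming S < 32)
    return X / D;           // may be combined to X >> S
  }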

Differential Revision: https://reviews.llvm.org/D27714

llvm-svn: 289654
2016-12-14 15:08:13 +00:00
Michael Zuckerman 1ce2a23a1e Fix bug 30945- [AVX512] Failure to flip vector comparison to remove not mask instruction
Adds a new optimization opportunity by adding a new X86ISelLowering pattern. The test case was shown in https://llvm.org/bugs/show_bug.cgi?id=30945.

Test explanation:
Select gets three arguments: mask, op and op2. In this case, the mask is the result of an ICMP. The ICMP instruction compares, for equality, the zero-initializer vector and the result of the first ICMP.

In general, the result of "cmp eq, op1, zero initializers" is "not(op1)", where op1 is a mask. By rearranging the two arguments inside the Select instruction, we can get the same result, without the need for the middle phase ("cmp eq, op1, zero initializers").

Missed optimization opportunity: 
vpcmpled %zmm0, %zmm1, %k0
knotw %k0, %k1

can be combined to
vpcmpgtd %zmm0, %zmm2, %k1

Reviewers: delena, igorb

Committed after running check-all.
Differential Revision: https://reviews.llvm.org/D27160

llvm-svn: 289653
2016-12-14 14:57:10 +00:00
Stephan Bergmann 17c7f70362 Replace APFloatBase static fltSemantics data members with getter functions
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.
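
Call sites then go through a getter instead of a data member; roughly (an illustrative sketch, assuming the usual LLVM headers):

  #include "llvm/ADT/APFloat.h"

  llvm::APFloat makeOne() {
    // Previously this was the static data member APFloat::IEEEsingle.
    const llvm::fltSemantics &Sem = llvm::APFloat::IEEEsingle();
    return llvm::APFloat(Sem, "1.0");
  }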

Differential Revision: https://reviews.llvm.org/D26671

llvm-svn: 289647
2016-12-14 11:57:17 +00:00
Artur Pilipenko f3ee444010 Add a couple of assertions to the load combine code introduced by r289538
llvm-svn: 289646
2016-12-14 11:55:47 +00:00
Oliver Stannard 268f42f1ce [Assembler] Better error messages for .org directive
Currently, the error messages we emit for the .org directive when the
expression is not absolute or is out of range do not include the line
number of the directive, so it can be hard to track down the problem if
a file contains many .org directives.

This patch stores the source location in the MCOrgFragment, so that it
can be used for diagnostics emitted during layout.

Since layout is an iterative process, and the errors are detected during
each iteration, it would have been possible for errors to be reported
multiple times. To prevent this, I've made the assembler bail out after
each iteration if any errors have been reported. This will still allow
multiple unrelated errors to be reported in the common case where they
are all detected in the first round of layout.

Differential Revision: https://reviews.llvm.org/D27411

llvm-svn: 289643
2016-12-14 10:43:58 +00:00
Dylan McKay 3abd1d3e12 [AVR] Add a function instrumentation pass
This will be used for an on-chip test suite.

llvm-svn: 289641
2016-12-14 10:15:00 +00:00
Craig Topper aeaa52cc11 [X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar floating point to integer conversion intrinsics.
llvm-svn: 289639
2016-12-14 07:46:12 +00:00
Hal Finkel 065b756528 [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:

 1. When determining tail-call eligibility. If we make a tail call (i.e.
    directly branch to a function) then there is no place for the linker to add
    a TOC restoration.
 2. When determining when we need to add a nop instruction after a call.
    Likewise, if there is a possibility that the linker might need to add a
    TOC restoration after a call, then we need to put a nop after the call
    (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. This will yield subtle bugs if
interposition is attempted. As a result, regardless of whether we're in PIC
mode, we don't assume that we need to add the nop to support the possibility of
ELF interposition. However, the necessary check is in place (i.e. calling
GV->isInterposable and TM.shouldAssumeDSOLocal) so when we have functions for
which interposition is allowed at the IR level, we'll add the nop as necessary.
In the mean time, we'll generate more tail calls and fewer nops when compiling
position-independent code.

Differential Revision: https://reviews.llvm.org/D27231

llvm-svn: 289638
2016-12-14 07:24:50 +00:00
Craig Topper 268b3abe6d [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle masked scalar add/sub/mul/div/max/min intrinsics better.
Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking.

llvm-svn: 289636
2016-12-14 06:06:58 +00:00
Craig Topper dfd268d76b [X86][InstCombine] Handle scalar fmadd intrinsics correctly in SimplifyDemandedVectorElts.
Now we pass a modified version of DemandedElts to each operand and we calculate undef elts correctly.

llvm-svn: 289632
2016-12-14 05:43:05 +00:00
Mehdi Amini 8e13bc4562 [ThinLTO] Add an API to trigger file-based API for returning objects to the linker
Summary:
The motivation is to better support the -object_path_lto option on
Darwin. The linker needs to write down the generated object files on
disk for later use by lldb or dsymutil (debug info is not present
in the final binary). We're moving this into libLTO so that we can
be smarter when a cache is enabled and hard-link when possible
instead of duplicating the files.

Reviewers: tejohnson, deadalnix, pcc

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D27507

llvm-svn: 289631
2016-12-14 04:56:42 +00:00
Craig Topper eb6a20e79e [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar round intrinsics more correctly.
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0.

Also calculate UndefElts correctly.

Simplify InstCombineCalls for these intrinsics to just call SimplifyDemandedVectorElts for the call instruction to reuse this support.

llvm-svn: 289629
2016-12-14 03:17:30 +00:00
Craig Topper a0372dec26 [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar min/max/cmp intrinsics more correctly.
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused.

Also calculate UndefElts correctly.

Simplify InstCombineCalls for these intrinsics to just call SimplifyDemandedVectorElts for the call instruction to reuse this support.

llvm-svn: 289628
2016-12-14 03:17:27 +00:00
Mehdi Amini 76a00b51f0 Don't double-initialize cl::opt for iterating in reverse order to uncover non-determinism in codegen by default
Bots are broken and need to be fixed before having this on by default.
The feature was committed in r289619.

I tried to disable it in r289624 and failed because it was initialized in two places.

llvm-svn: 289626
2016-12-14 02:35:32 +00:00
Peter Collingbourne 1a0720e8c4 LTO: Add support for multi-module bitcode files.
Differential Revision: https://reviews.llvm.org/D27313

llvm-svn: 289621
2016-12-14 01:17:59 +00:00
Paul Robinson 8fec3da00c [DWARF] Preserve column number when emitting 'line 0' record
Follow-up to r289256, address a FIXME to avoid resetting the column
number. This reduced .debug_line by 2.6% in a RelWithDebInfo
self-build of clang.

llvm-svn: 289620
2016-12-14 00:27:35 +00:00
Mandeep Singh Grang f6b069c7db [llvm] Iterate SmallPtrSet in reverse order to uncover non-determinism in codegen
Summary:
Given a flag (-mllvm -reverse-iterate) this patch will enable iteration of SmallPtrSet in reverse order.
The idea is to compile the same source with and without this flag and expect the code to not change.
If there is a difference in codegen then it would mean that the codegen is sensitive to the iteration order of SmallPtrSet.
This is enabled only with LLVM_ENABLE_ABI_BREAKING_CHECKS.

Reviewers: chandlerc, dexonsmith, mehdi_amini

Subscribers: mgorny, emaste, llvm-commits

Differential Revision: https://reviews.llvm.org/D26718

llvm-svn: 289619
2016-12-14 00:15:57 +00:00
Evandro Menezes aeec780e42 Add support for Samsung Exynos M3 (NFC)
llvm-svn: 289613
2016-12-13 23:31:41 +00:00
Greg Clayton 1cbf3fa94a Switch functions that returned bool and filled in a DWARFFormValue arg with ones that return Optional<DWARFFormValue>
Differential Revision: https://reviews.llvm.org/D27737

llvm-svn: 289611
2016-12-13 23:20:56 +00:00
Kostya Serebryany f6f82c2cc8 [libFuzzer] fix a UB (invalid shift) spotted by ubsan. The code worked fine by luck, because of the way shifts actually work on clang+x86
llvm-svn: 289607
2016-12-13 22:49:14 +00:00
Eugene Zelenko 8208592707 [Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289604
2016-12-13 22:13:50 +00:00
Dehao Chen 0f35fa907d Change CoverageTracker from a global variable to member variable to avoid breaking thread-safety. (NFC)
llvm-svn: 289603
2016-12-13 22:13:18 +00:00
Anna Thomas 65ca8e91cc [IRCE] Avoid loop optimizations on pre and post loops
Summary:
This patch will add loop metadata on the pre and post loops generated by IRCE.
Currently, we have metadata for disabling optimizations such as vectorization,
unrolling, loop distribution and LICM versioning (and confirmed that these
optimizations check for the metadata before proceeding with the transformation).

The pre and post loops generated by IRCE need not go through loop opts (since
these are slow paths).

Added two test cases as well.

Reviewers: sanjoy, reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26806

llvm-svn: 289588
2016-12-13 21:05:21 +00:00
Michael Kuperstein 3d23d4a234 [LV] Don't vectorize when we have a small static bound on trip count
We currently check if the exact trip count is known and is smaller than the
"tiny loop" bound. We should be checking the maximum bound on the trip count
instead.
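
An illustrative shape of such a loop (a sketch, not one of the added tests):

  // The exact trip count is unknown here, but its maximum bound is provably
  // tiny (at most 3 iterations), so vectorizing it is not worthwhile either.
  void addSmall(int *A, int N) {
    for (int I = 0; I < (N & 3); ++I)
      A[I] += 1;
  }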

Differential Revision: https://reviews.llvm.org/D27690

llvm-svn: 289583
2016-12-13 20:38:18 +00:00
Peter Collingbourne 45102a24c7 Object: Make IRObjectFile own multiple modules and enumerate symbols from all modules.
This implements multi-module support in IRObjectFile.

Differential Revision: https://reviews.llvm.org/D26951

llvm-svn: 289578
2016-12-13 20:20:17 +00:00
Peter Collingbourne c5fecb4f1a Object: Remove module accessors from IRObjectFile, and hide its constructor.
Differential Revision: https://reviews.llvm.org/D27079

llvm-svn: 289577
2016-12-13 20:10:22 +00:00
Peter Collingbourne 77f4c30d6f LTO: Port the legacy LTO API to ModuleSymbolTable.
Differential Revision: https://reviews.llvm.org/D27078

llvm-svn: 289576
2016-12-13 20:01:58 +00:00
Peter Collingbourne ad90369a94 LTO: Port the new LTO API to ModuleSymbolTable.
Differential Revision: https://reviews.llvm.org/D27077

llvm-svn: 289574
2016-12-13 19:43:49 +00:00
Alina Sbirlea 77c5eaaeda Generalize strided store pattern in interleave access pass
Summary:
This patch aims to generalize matching of the strided store accesses to more general masks.
The more general rule is to have consecutive accesses based on the stride:
[x, y, ... z, x+1, y+1, ...z+1, x+2, y+2, ...z+2, ...]
All elements in the masks need not form a contiguous space; there may be gaps.
As before, undefs are allowed and filled in with adjacent element loads.

Reviewers: HaoLiu, mssimpso

Subscribers: mkuper, delena, llvm-commits

Differential Revision: https://reviews.llvm.org/D23646

llvm-svn: 289573
2016-12-13 19:32:36 +00:00
Matthias Braun fde00fc252 Revert "AArch64CollectLOH: Rewrite as block-local analysis."
This is not always behaving as expected, as it turns out block live-in
lists are only correct most of the time. Still waiting for reviews on
https://reviews.llvm.org/D27559 to have them correct all of the time.

See also http://llvm.org/PR31361, rdar://25117107

This reverts commit r288567.
This reverts commit r288561.

llvm-svn: 289570
2016-12-13 19:08:17 +00:00
Tim Northover fe7c59adb8 GlobalISel: fix GOT accesses on AArch64.
We were using the correct pseudo-instruction, but because the operand's flags
weren't set correctly we still ended up emitting incorrect relocations during
MC lowering.

llvm-svn: 289566
2016-12-13 18:25:38 +00:00
Greg Clayton c8c1032c0c Make a DWARFDIE class that can help avoid using the wrong DWARFUnit when extracting attributes
Many places pass around a DWARFDebugInfoEntryMinimal and a DWARFUnit. It is easy to get things wrong by using the wrong DWARFUnit with a DWARFDebugInfoEntryMinimal. This patch creates a DWARFDie class that contains the DWARFUnit and DWARFDebugInfoEntryMinimal objects so that they can't get out of sync. All attribute extraction has been moved out of DWARFDebugInfoEntryMinimal and into DWARFDie. DWARFDebugInfoEntryMinimal was also renamed to DWARFDebugInfoEntry.

DWARFDie objects are temporary objects that are used by clients and contain 2 pointers that you always need to have anyway. Keeping them grouped will avoid errors and simplify many of the attribute extracting APIs by not having to pass in a DWARFUnit.

Differential Revision: https://reviews.llvm.org/D27634

llvm-svn: 289565
2016-12-13 18:25:19 +00:00
Marcos Pividori c21b3c949d [libFuzzer] Add missing header needed for Windows.
llvm-svn: 289564
2016-12-13 17:46:48 +00:00
Marcos Pividori 7c1defd738 [libFuzzer] Avoid name collision with Windows API.
Windows uses some macros to replace DeleteFile() by DeleteFileA() or
DeleteFileW(). This was causing an error at link time.
DeleteFile was renamed to RemoveFile().

Differential Revision: https://reviews.llvm.org/D27577

llvm-svn: 289563
2016-12-13 17:46:40 +00:00
Marcos Pividori 67dfacdd80 [libFuzzer] Implement DirName() for Windows.
Implement DirName from scratch to avoid dependencies on external libraries.
It's based on MSDN documentation for Naming Files, Paths, and Namespaces.

The algorithm can't simply start from the end and look backwards for the
first separator, because we need to preserve the prefix that represents
the root location. We shouldn't remove anything there. In Windows we
have many different options, like:
 \\Server\Share\ , \ , C: , C:\ , \\?\C:\ , \\?\UNC\Server\Share\
We remove the last separator in the rest of the path, if it exists.

It was implemented to have a similar behaviour to dirname() in linux,
removing trailing separators, returning "." when the path doesn't
contain separators, etc.

Differential Revision: https://reviews.llvm.org/D27579

llvm-svn: 289562
2016-12-13 17:46:32 +00:00
Marcos Pividori 64d4147396 [libFuzzer] Fix bug in detecting timeouts when input string is empty.
I added a new flag RunningCB to know if the Fuzzer's main thread is
running the CB function, instead of using (!CurrentUnitSize).
(!CurrentUnitSize) doesn't work properly. For example, in FuzzerLoop.cpp,
inside the ShuffleAndMinimize() function, we execute the callback with an
empty string (size=0). The previous implementation failed to detect timeouts
in that execution.
Also, I add a regression test for that case.

Differential Revision: https://reviews.llvm.org/D27433

llvm-svn: 289561
2016-12-13 17:46:25 +00:00
Marcos Pividori 178fe58745 [libFuzzer] Clean up headers and file formatting of LibFuzzer files.
Reorganize #includes to follow LLVM Coding Standards.
Include some missing headers. Required to use `Printf()`.

Aside from that, this patch contains no functional change.
It is purely a re-organization.

Differential Revision: https://reviews.llvm.org/D27363

llvm-svn: 289560
2016-12-13 17:46:11 +00:00
Marcos Pividori 6e3d885c79 [libFuzzer] Properly use unsigned for workers, jobs and NumberOfCpuCores.
std::thread::hardware_concurrency() returns an unsigned, so I modify
NumberOfCpuCores() to return unsigned too.
The number of cpus is used to define the number of workers, so I decided
to update the worker and jobs flags to be declared as unsigned too.

Differential Revision: https://reviews.llvm.org/D27685

llvm-svn: 289559
2016-12-13 17:45:53 +00:00
Marcos Pividori 463f8bdd0b [libFuzzer] Properly use unsigned for Process ID.
Use unsigned for PID instead of signed int. GetCurrentProcessId() returns
an unsigned (DWORD) so we must be sure we can deal with all possible values.
I use an unsigned long to be sure it can hold a 32-bit unsigned (DWORD).

Differential Revision: https://reviews.llvm.org/D27281

llvm-svn: 289558
2016-12-13 17:45:44 +00:00
Marcos Pividori c59b692c85 [libFuzzer] Improve Signal Handler interface.
Add new flags to FuzzingOptions to represent the different conditions
on the signal handling. These options are passed when calling
SetSignalHandler().
These changes simplify the implementation of Windows's exception
handling. Now we can define a unique handler for all the exceptions.

Differential Revision: https://reviews.llvm.org/D27238

llvm-svn: 289557
2016-12-13 17:45:20 +00:00
David Callahan ebcf916c5a [ADCE] Add code to remove dead branches
Summary:
This is the last in a series of patches to evolve ADCE.cpp to support
removal of unnecessary control flow.

This patch adds the code to update the control and data flow graphs
to remove the dead control flow.

Also update unit tests to test the capability to remove dead,
may-be-infinite loops, which is enabled by the switch
-adce-remove-loops.

Previous patches:

D23824 [ADCE] Add handling of PHI nodes when removing control flow
D23559 [ADCE] Add control dependence computation
D23225 [ADCE] Modify data structures to support removing control flow
D23065 [ADCE] Refactor anticipating new functionality (NFC)
D23102 [ADCE] Refactoring for new functionality (NFC)

Reviewers: dberlin, majnemer, nadav, mehdi_amini

Subscribers: llvm-commits, david2050, freik, twoh

Differential Revision: https://reviews.llvm.org/D24918

llvm-svn: 289548
2016-12-13 16:42:18 +00:00
Artur Pilipenko 469fcd2afd Use more detailed assertion messages in the code introduced by r289538
llvm-svn: 289545
2016-12-13 16:26:15 +00:00
Artur Pilipenko 79d1255e26 Fix a buildbot failure introduced by r289538
The build failed because of an unused variable in product mode.

llvm-svn: 289540
2016-12-13 14:55:31 +00:00
Artur Pilipenko c93cc5955f [DAGCombiner] Match load by bytes idiom and fold it into a single load
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))
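
The same idiom at the source level, as an illustrative C++ sketch (assuming a little-endian target, both functions can become a single 32-bit load, the second one plus a byte swap):

  #include <cstdint>

  std::uint32_t load_le32(const std::uint8_t *A) {
    return std::uint32_t(A[0]) | (std::uint32_t(A[1]) << 8) |
           (std::uint32_t(A[2]) << 16) | (std::uint32_t(A[3]) << 24);
  }

  std::uint32_t load_be32(const std::uint8_t *A) {
    return (std::uint32_t(A[0]) << 24) | (std::uint32_t(A[1]) << 16) |
           (std::uint32_t(A[2]) << 8) | std::uint32_t(A[3]);
  }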

This optimization was discussed on llvm-dev some time ago in the "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because, in the presence of atomic loads, load widening is an irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of the individual bits which constitute the resulting OR value. If all the OR bits come from memory, verify that they are adjacent and match the little- or big-endian encoding of a wider value. If so, and the load of the wider type (and bswap if needed) is allowed by the target, generate a load and a bswap if needed.

Reviewed By: hfinkel, RKSimon, filcab

Differential Revision: https://reviews.llvm.org/D26149

llvm-svn: 289538
2016-12-13 14:21:14 +00:00
Artur Pilipenko 01e86444a0 Move BaseIndexOffset in DAGCombiner.cpp so it will be available for the upcoming user
llvm-svn: 289537
2016-12-13 14:16:02 +00:00
Simon Pilgrim 9dc67c0101 [SelectionDAG] computeKnownBits - simplified knownbits sign extension. NFCI.
We don't need to extract+test the sign bit of the known ones/zeros, we can use sext which will handle all of this.

llvm-svn: 289534
2016-12-13 13:36:27 +00:00
Simon Dardis c97cfb69ba [mips][rtdyld] Move MIPS relocation resolution to a subclass and implement N32 relocations
N32 relocations are only correct for individual relocations at the moment.
Support for relocation composition will follow in a later patch.

Patch By: Daniel Sanders

Reviewers: vkalintiris, atanasyan

Differential Revision: https://reviews.llvm.org/D27467

llvm-svn: 289532
2016-12-13 11:39:18 +00:00
Simon Dardis e8af792439 [mips] Fix comment to respect 80 chars per line; NFC
llvm-svn: 289530
2016-12-13 11:10:53 +00:00
Simon Dardis 43b5ce492d [mips] Fix compact branch hazard detection
In certain cases it is possible for transient instructions such as
%reg = IMPLICIT_DEF, appearing as the single instruction in a basic block, to reach
the MipsHazardSchedule pass. This patch teaches MipsHazardSchedule to
properly look through such cases.

Reviewers: vkalintiris, zoran.jovanovic

Differential Revision: https://reviews.llvm.org/D27209

llvm-svn: 289529
2016-12-13 11:07:51 +00:00
Diana Picus 2d9adbf524 [GlobalISel] Move extendRegister where it belongs. NFCI
Apparently I missed this one when I moved ValueHandler back in r288658. Sorry!

llvm-svn: 289528
2016-12-13 10:46:12 +00:00
Craig Topper ac75bca1eb [X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar intrinsics correctly.
Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed.

Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support.

llvm-svn: 289523
2016-12-13 07:45:45 +00:00
Rong Xu 51a1e3c430 [PGO] Fix insane counts due to nonreturn calls
Summary:
Since we don't break BBs for function calls, we might get some insane counts
(wrap of unsigned) in the presence of noreturn calls.

This patch sets these counts to zero instead of the wrapped number.

Reviewers: davidxl

Subscribers: xur, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D27602

llvm-svn: 289521
2016-12-13 06:41:14 +00:00
Davide Italiano 463bebc319 [SCCP] Debug diagnostic goes under DEBUG(). NFCI.
llvm-svn: 289519
2016-12-13 05:56:04 +00:00
Dylan McKay 1e57fa487b [AVR] Add an 'relax memory operation' pass
Summary:
This pass will be used to relax instructions which use out of bounds
memory accesses to equivalent operations that can work with the
addresses.

The pass currently implements relaxation for the STDWPtrQRr instruction.

Without this pass, an assertion error would be hit in the pseudo expansion pass.

In the future, we will need to add more instructions to this pass. We can do
that on a case-by-case basic.

Reviewers: arsenm, kparzysz

Subscribers: wdng, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D27650

llvm-svn: 289517
2016-12-13 05:53:14 +00:00
Philip Reames 1f1bbac8da [peephole] Enhance folding logic to work for STATEPOINTs
The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are:

    Support for folding multiple operands at once which reference the same load
    Support for folding multiple loads into a single instruction
    Walk all the operands of the instruction for variadic instructions (this is a bug fix!)

Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here.

Differential Revision: https://reviews.llvm.org/D24103

llvm-svn: 289510
2016-12-13 01:38:41 +00:00
Philip Reames 51387a8c28 [Statepoints] Reuse stack slots more than once within a basic block
The stack slot reuse code had a really amusing bug. We ended up only reusing a stack slot exactly once (initial use + reuse) within a basic block. If we had a third statepoint to process, we ended up allocating a new set of stack slots. If we crossed a basic block boundary, the set got cleared. As a result, code which is invoke heavy doesn't see the problem, but multiple calls within a basic block does. Net result: as we optimize invokes into calls, lowering gets worse.

The root error here is that the bitmap used by the custom allocator wasn't kept in sync. The result was that we ended up resizing the bitmap on the next statepoint (to handle the cross-block case), resetting the bit once, but then never resetting it again.

Differential Revision: https://reviews.llvm.org/D25243

llvm-svn: 289509
2016-12-13 01:21:15 +00:00
Kostya Serebryany a31300e789 [libFuzzer] don't require extra flags with -minimize_crash=1 (default to -max_total_time=600). Also respect exact_artifact_path when outputting the end result
llvm-svn: 289506
2016-12-13 00:40:47 +00:00
Marcos Pividori 681e904419 [libFuzzer] Implement Timers for Windows.
Implemented timeouts for Windows using TimerQueueTimers.
Timers are used to supervise the time of execution of the
callback function that is being fuzzed.

Differential Revision: https://reviews.llvm.org/D27237

llvm-svn: 289495
2016-12-12 23:25:11 +00:00
Andrew Kaylor ff6a1edfa8 Avoid infinite loops in branch folding
Differential Revision: https://reviews.llvm.org/D27582

llvm-svn: 289486
2016-12-12 23:05:38 +00:00
Kostya Serebryany 092d5764a1 [libFuzzer] split one slow test into several, for more parallel testing
llvm-svn: 289481
2016-12-12 22:55:25 +00:00
Nico Weber b3901bdde8 Fix MSVC build after 289461; MSVC isn't sure if this is std:: or llvm::
llvm-svn: 289480
2016-12-12 22:46:40 +00:00
Kostya Serebryany a4b43bf8e8 [libFuzzer] make SimpleCmpTest a bit simpler to crack and more verbose
llvm-svn: 289477
2016-12-12 22:39:33 +00:00
Sanjay Patel 62104ee6d9 [x86] fix formatting; NFC
llvm-svn: 289476
2016-12-12 22:31:01 +00:00
Eugene Zelenko 6a9226d9b8 [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289475
2016-12-12 22:23:53 +00:00
Guozhi Wei 1fd553c934 [PPC] Prefer direct move on power8 if load 1 or 2 bytes to VSR
Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so moving 1 or 2 bytes to a VSR through MTVSRWZ is much faster than storing the extended value to the stack and loading it with LXSIWZX.
This patch fixes pr31144.

Differential Revision: https://reviews.llvm.org/D27287

llvm-svn: 289473
2016-12-12 22:09:02 +00:00
Tim Shen 44bde896a5 [APFloat] Implement PPCDoubleDouble add and subtract.
Summary:
I looked at libgcc's implementation (which is based on the paper,
Software for Doubled-Precision Floating-Point Computations", by Seppo Linnainmaa,
ACM TOMS vol 7 no 3, September 1981, pages 272-283.) and made it generic to
arbitrary IEEE floats.
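
For context, the core building block of such doubled-precision arithmetic is an error-free addition; a minimal sketch (illustrative only, not the APFloat implementation):

  // Knuth's TwoSum: S + Err == A + B exactly, with S = fl(A + B).
  struct DD { double Hi, Lo; };

  DD twoSum(double A, double B) {
    double S = A + B;
    double BVirtual = S - A;
    double Err = (A - (S - BVirtual)) + (B - BVirtual);
    return {S, Err};
  }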

Differential Revision: https://reviews.llvm.org/D26817

llvm-svn: 289472
2016-12-12 21:59:30 +00:00
Matthew Simpson 92ce0230b5 [SLP] Fix sign-extends for type-shrinking
This patch ensures the correct minimum bit width during type-shrinking.
Previously when type-shrinking, we always sign-extended values back to their
original width. However, if we are going to sign-extend, and the sign bit is
unknown, we have to increase the minimum bit width by one bit so the
sign-extend will fill the upper bits correctly. If the sign bit is known to be
zero, we can perform a zero-extend instead. This should fix PR31243.
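
A tiny arithmetic example of the hazard (illustrative, not one of the regression tests):

  #include <cassert>
  #include <cstdint>

  int main() {
    std::uint32_t V = 255;               // fits in 8 bits, but bit 7 is set
    std::int32_t SExt = std::int8_t(V);  // shrink to i8, sign-extend: -1 (wrong)
    std::int32_t ZExt = std::uint8_t(V); // zero-extend instead: 255 (right)
    assert(SExt == -1 && ZExt == 255);   // on the usual two's-complement targets
    return 0;
  }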

Reference: https://llvm.org/bugs/show_bug.cgi?id=31243
Differential Revision: https://reviews.llvm.org/D27466

llvm-svn: 289470
2016-12-12 21:11:04 +00:00
Kostya Serebryany 035af9b346 [libFuzzer] build libFuzzer itself with asan
llvm-svn: 289469
2016-12-12 20:58:10 +00:00
Paul Robinson ac7fe5e0c4 Recommit r288212: Emit 'no line' information for interesting 'orphan' instructions.
DWARF specifies that "line 0" really means "no appropriate source
location" in the line table.  By default, use this for branch targets
and some other cases that have no specified source location, to
prevent inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).

Updated patch allows enabling or suppressing this behavior for all
unspecified source locations.

Differential Revision: http://reviews.llvm.org/D24180

llvm-svn: 289468
2016-12-12 20:49:11 +00:00
Kostya Serebryany d4be88913e [libFuzzer] respect -max_len during merge
llvm-svn: 289467
2016-12-12 20:39:35 +00:00
Teresa Johnson a29bd6ffcc [ThinLTO] Remove useless code (NFC)
Should have been removed in r288446.

llvm-svn: 289466
2016-12-12 20:34:28 +00:00
Mehdi Amini ef27db879c Refactor BitcodeReader: move Metadata and ValueId handling in their own class/file
Summary:
I'm planning on changing the way we load metadata to enable laziness.
I'm getting lost in this gigantic files, and gigantic class that is the bitcode
reader. This is a first toward splitting it in a few coarse components that
are more easily understandable.

Reviewers: pcc, tejohnson

Subscribers: mgorny, llvm-commits, dexonsmith

Differential Revision: https://reviews.llvm.org/D27646

llvm-svn: 289461
2016-12-12 19:34:26 +00:00
Mehdi Amini bf2090e31a Remove IsMetadataMaterialized from BitcodeReader (NFC)
Summary: It does not seem useful.

Reviewers: pcc, dexonsmith

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27668

llvm-svn: 289457
2016-12-12 19:23:39 +00:00
Geoff Berry d73420d591 [LiveRangeEdit] Add assert string and descriptive comment.
llvm-svn: 289456
2016-12-12 19:12:41 +00:00
Dimitry Andric 59e5cb4342 Fix compile with GCC 5 or later
Summary:

Compiling with GCC 5 or later can fail with a bogus error "constructor
required before non-static data member for
llvm::ValueEnumerator::MDRange::First has been parsed".

This was originally fixed upstream in GCC PR 70528, but later this fix
was reverted, and released versions of GCC still show the bogus error.

To work around this, replace MDRange's declaration of a default
constructor with a definition.

Reviewers: dexonsmith, rsmith, rivanvx

Subscribers: llvm-commits, dim, dexonsmith

Differential Revision: https://reviews.llvm.org/D18730

llvm-svn: 289454
2016-12-12 19:05:52 +00:00
Reid Kleckner 30422eea0f Revert "[SCEVExpand] do not hoist divisions by zero (PR30935)"
Reverts r289412. It caused an OOB PHI operand access in instcombine when
ASan is enabled. Reduction in progress.

Also reverts "[SCEVExpander] Add a test case related to r289412"

llvm-svn: 289453
2016-12-12 18:52:32 +00:00
Simon Atanasyan 5048514c20 [mips] For PIC code convert unconditional jump to unconditional branch
Unconditional branch uses relative addressing which is the right choice
in case of position independent code.

This is a fix for the bug:
https://dmz-portal.mips.com/bugz/show_bug.cgi?id=2445

Differential revision: https://reviews.llvm.org/D27483

llvm-svn: 289448
2016-12-12 17:40:26 +00:00
Nicolai Haehnle f45ea4bbc5 AMDGPU: llvm.amdgcn.interp.mov is a source of divergence
Summary:
While the result is constant across a single primitive, each pixel
shader wave can have pixels from multiple primitives.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27572

llvm-svn: 289447
2016-12-12 16:52:19 +00:00
Sanjay Patel e730ce87a5 [InstCombine] fix bug when offsetting case values of a switch (PR31260)
We could truncate the condition and then try to fold the add into the
original condition value causing wrong case constants to be used.

Move the offset transform ahead of the truncate transform and return
after each transform, so there's no chance of getting confused values.

Fix for:
https://llvm.org/bugs/show_bug.cgi?id=31260

llvm-svn: 289442
2016-12-12 16:13:52 +00:00
Teresa Johnson 040cc16835 [ThinLTO] Import only necessary DICompileUnit fields
Summary:
As discussed on mailing list, for ThinLTO importing we don't need
to import all the fields of the DICompileUnit. Don't import enums,
macros, retained types lists. Also only import local scoped imported
entities. Since we don't currently import any global variables,
we also don't need to import the list of global variables (added an
assert to verify none are being imported).

This is being done by pre-populating the value map entries to map
the unneeded metadata to nullptr. For the imported entities, we can
simply replace the source module's list with a new list containing
only those needed imported entities. This is done in the IRLinker
constructor so that value mapping automatically does the desired
mapping.

Reviewers: mehdi_amini, dexonsmith, dblaikie, aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27635

llvm-svn: 289441
2016-12-12 16:09:30 +00:00
Sanjay Patel 87e2f677d7 [InstCombine] clean up range-for-loops in visitSwitchInst(); NFCI
llvm-svn: 289439
2016-12-12 15:52:56 +00:00
Simon Pilgrim 4cbe1834e4 Update inline argument comment. NFCI.
The combineX86ShufflesRecursively 'HasPSHUFB' flag has been the more generic 'HasVariableMask' flag for some time.

llvm-svn: 289430
2016-12-12 13:43:15 +00:00
Simon Pilgrim 5ebd2b542b [X86][SSE] Add support for combining SSE VSHLI/VSRLI uniform constant shifts.
Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift.

llvm-svn: 289429
2016-12-12 13:33:58 +00:00
Simon Pilgrim 369cd349b9 [X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQ
PMULDQ returns the 64-bit result of the signed multiplication of the lower 32 bits of vXi64 vector inputs; we can lower with this if the sign bits stretch that far.
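
A scalar model of what one PMULDQ lane computes (an illustration, not compiler code):

  #include <cstdint>

  // Each lane takes the low 32 bits of both 64-bit inputs, sign-extends them,
  // and produces their full 64-bit product.
  std::int64_t pmuldqLane(std::int64_t A, std::int64_t B) {
    return std::int64_t(std::int32_t(A)) * std::int64_t(std::int32_t(B));
  }
  // If A and B are already sign-extended from 32 bits, this equals A * B,
  // which is what allows the full vXi64 multiply to lower to PMULDQ.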

Differential Revision: https://reviews.llvm.org/D27657

llvm-svn: 289426
2016-12-12 10:49:15 +00:00
Simon Pilgrim 040a36c176 [SelectionDAG] Add support for EXTRACT_SUBVECTOR to ComputeNumSignBits
Pre-commit as discussed on D27657

llvm-svn: 289425
2016-12-12 10:29:43 +00:00
Craig Topper 36ecce9bed [X86] Teach selectScalarSSELoad to accept full 128-bit vector loads and the X86ISD::VZEXT_LOAD opcode.
Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics.

llvm-svn: 289424
2016-12-12 07:57:24 +00:00
Craig Topper f2c6f7abf3 [X86] Change CMPSS/CMPSD intrinsic instructions to use sse_load_f32/f64 as its memory pattern instead of full vector load.
These intrinsics only load a single element. We should use sse_load_f32/f64 to give more options for what loads they can match.

Currently these instructions are often only getting their load folded thanks to the load folding in the peephole pass. I plan to add more types of loads to sse_load_f32/64 so we can match without the peephole.

llvm-svn: 289423
2016-12-12 07:57:21 +00:00
Craig Topper 081c0e2864 [X86] Remove some intrinsic instructions from hasPartialRegUpdate
Summary:
These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits.

As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded.

Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too.

Reviewers: spatel, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27611

llvm-svn: 289419
2016-12-12 05:07:17 +00:00
Sebastian Pop 8c9cc8c86b [SCEVExpand] do not hoist divisions by zero (PR30935)
SCEVExpand computes the insertion point for the components of a SCEV to be code
generated.  When it comes to generating code for a division, SCEVexpand would
not be able to check (at compilation time) all the conditions necessary to avoid
a division by zero.  The patch disables hoisting of expressions containing
divisions by anything other than non-zero constants in order to avoid hoisting
these expressions past conditions that should hold before doing the division.
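
The shape of the hazard, sketched at the source level (illustrative only):

  // Hoisting the division above the guard would introduce a divide-by-zero
  // that the original program never executes.
  void scale(int *A, int N, int X, int D) {
    if (D != 0) {
      for (int I = 0; I < N; ++I)
        A[I] = X / D;   // must not be speculated ahead of the D != 0 check
    }
  }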

The patch passes check-all on x86_64-linux.

Differential Revision: https://reviews.llvm.org/D27216

llvm-svn: 289412
2016-12-12 02:52:51 +00:00
Craig Topper 7fc6d34ed1 [InstCombine][XOP] The instructions for the scalar frcz intrinsics are defined to put 0 in the upper bits, not pass bits through like other intrinsics. So we should return a zero vector instead.
llvm-svn: 289411
2016-12-11 22:32:38 +00:00
Simon Pilgrim 831435cb14 [X86][SSE] Add support for combining target shuffles to SHUFPD.
llvm-svn: 289407
2016-12-11 21:26:25 +00:00
Davide Italiano 0a1476c756 [SCCP] Use the appropriate helper function. NFCI.
llvm-svn: 289406
2016-12-11 21:19:03 +00:00
Ayman Musa 7ec4ed55d3 [X86][AVX512] Add missing patterns for broadcast fallback in case load node has multiple uses (for v4i64 and v4f64).
When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded.
A fallback pattern is added to catch these cases and provide another solution.

Differential Revision: https://reviews.llvm.org/D27661

llvm-svn: 289404
2016-12-11 20:11:17 +00:00
Sanjoy Das 6de678815c [TBAA] Don't generate invalid TBAA when merging nodes
Summary:
Fix a corner case in `MDNode::getMostGenericTBAA` where we can sometimes
generate invalid TBAA metadata.

Reviewers: chandlerc, hfinkel, mehdi_amini, manmanren

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D26635

llvm-svn: 289403
2016-12-11 20:07:25 +00:00
Sanjoy Das 3336f681e3 [Verifier] Add verification for TBAA metadata
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.

Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:

 - That by the time a struct access tuple `(base-type, offset)` is
   "reduced" to a scalar base type, the offset is `0`.  For instance, in
   C++ you can't start from, say `("struct-a", 16)`, and end up with
   `("int", 4)` -- by the time the base type is `"int"`, the offset
   better be zero.  In particular, a variant of this invariant is needed
   for `llvm::getMostGenericTBAA` to be correct.

 - That there are no cycles in a struct path.

 - That struct type nodes have their offsets listed in ascending
   order.

 - That when generating the struct access path, you eventually reach the
   access type listed in the tbaa tag node.

Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D26438

llvm-svn: 289402
2016-12-11 20:07:15 +00:00
Sanjay Patel 81ed3499cd [Constants] don't die processing non-ConstantInt GEP indices in isGEPWithNoNotionalOverIndexing() (PR31262)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=31262

llvm-svn: 289401
2016-12-11 20:07:02 +00:00
Sebastian Pop e08d9c7c87 instr-combiner: sum up all latencies of the transformed instructions
We have found that -- when the selected subarchitecture has a scheduling model
and we are not optimizing for size -- the machine-instruction combiner uses a
too-simple algorithm to compute the cost of one of the two alternatives [before
and after running a combining pass on a section of code], and therefore it throws
away the combination results too often.

This fix has the potential to help any ISA with the potential to combine
instructions and for which at least one subarchitecture has a scheduling model.
As of now, this is only known to definitely affect AArch64 subarchitectures with
a scheduling model.

Regression tested on AMD64/GNU-Linux, new test case tested to fail on an
unpatched compiler and pass on a patched compiler.

Patch by Abe Skolnik and Sebastian Pop.

llvm-svn: 289399
2016-12-11 19:39:32 +00:00
Sanjoy Das ba1bf87586 [SCEVExpander] Explicitly expand AddRec starts into loop preheader
This is NFC today, but won't be once D27216 (or an equivalent patch) is
in.

This change fixes a design problem in SCEVExpander -- it relied on a
hoisting optimization to generate correct code for add recurrences.
This meant that changing the hoisting optimization to not kick in under
certain circumstances (to avoid speculating faulting instructions, say)
would break correctness.

The fix is to make the correctness requirements explicit, and have it
not rely on the hoisting optimization for correctness.

llvm-svn: 289397
2016-12-11 19:02:21 +00:00
Oren Ben Simhon 9683ecbff6 [X86] Regcall - Adding support for mask types
The regcall calling convention passes mask-type arguments in x86 GPR registers.
The review includes the changes required in order to support v32i1, v16i1 and v8i1.

Differential Revision: https://reviews.llvm.org/D27148

llvm-svn: 289383
2016-12-11 14:10:52 +00:00
Craig Topper 23ebd9564f [X86][InstCombine] Add support for scalar FMA intrinsics to SimplifyDemandedVectorElts.
This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them.

llvm-svn: 289377
2016-12-11 08:54:52 +00:00
Chandler Carruth ecbe61966f Tweak the core loop in StringRef::find to avoid calling memcmp on every
iteration.

Instead, load the byte at the needle length, compare it directly, and
save it to use in the lookup table of lengths we can skip forward.

I also added an annotation to expect that the comparison fails so that
the loop gets laid out contiguously without the call to memcmp (and the
substantial register shuffling that the ABI requires of that call).

Finally, because this behaves especially badly with a needle length of
one (by calling memcmp with a zero length), special-case that to directly
call memchr, which is what we should have been doing anyways.
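
The overall shape of the search, as a standalone sketch (illustrative only; this is not the StringRef::find code and the names are invented):

  #include <cstddef>
  #include <cstring>
  #include <string>

  std::size_t skipFind(const std::string &Haystack, const std::string &Needle) {
    const std::size_t N = Needle.size(), H = Haystack.size();
    if (N == 0)
      return 0;
    if (N == 1) {           // the special case mentioned above: just memchr
      const void *P = std::memchr(Haystack.data(), Needle[0], H);
      return P ? static_cast<const char *>(P) - Haystack.data()
               : std::string::npos;
    }
    std::size_t Skip[256];
    for (std::size_t &S : Skip)
      S = N;                                  // default: skip the whole needle
    for (std::size_t I = 0; I + 1 < N; ++I)   // last needle byte keeps the default
      Skip[static_cast<unsigned char>(Needle[I])] = N - 1 - I;
    for (std::size_t Pos = 0; Pos + N <= H;) {
      unsigned char Last = Haystack[Pos + N - 1];   // probe the last byte first
      if (Last == static_cast<unsigned char>(Needle[N - 1]) &&
          std::memcmp(Haystack.data() + Pos, Needle.data(), N - 1) == 0)
        return Pos;
      Pos += Skip[Last];    // reuse the probed byte to decide how far to jump
    }
    return std::string::npos;
  }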

This was motivated by the fact that there are a large number of test
cases in 'check-llvm' where FileCheck's performance is dominated by
calls to StringRef::find (in a release, no-asserts build). I'm working
on patches to generally improve matters there, but this alone was worth
a 12.5% improvement in one test case where FileCheck spent 92% of its
time in this routine.

I experimented a bunch with different minor variations on this theme,
for example setting the pointer *at* the last byte and indexing
backwards for the call to memcmp. That didn't improve anything on this
version and seemed more complex. I also tried other things to make the
loop flow more nicely and none worked. =/ It is a bit unfortunate; the
generated code here remains pretty gross, but I don't see any obvious
ways to improve it. At this point, most of my ideas would be really
elaborate:

1) While the remainder of the string is long enough, we could load
   a 16-byte or 32-byte vector at the address of the last byte and use
   palignr to rotate that and check the first 15 or 31 bytes at the
   front of the next segment, essentially pre-loading the first several
   bytes of the next iteration so we could quickly detect a mismatch in
   those bytes without an additional memory access. Down side would be
   the code complexity, having a fallback loop, and likely misaligned
   vector load. Plus it would make the common case of the last byte not
   matching somewhat slower (need some extraction from a vector).
2) While we have space, we could do an aligned load of a 16- or 32-byte
   vector that *contains* the end byte, and use any preceding bytes to
   have a more precise "no" test, and any subsequent bytes could be
   saved for the next iteration. This removes any unaligned load penalty,
   but still requires us to pay the overhead of vector extraction for
   the cases where we didn't need to do anything other than load and
   compare the last byte.
3) Try to walk from the last byte in a way that is more friendly to
   cache and/or memory pre-fetcher considering we have to poke the last
   byte anyways.

No idea if any of these are really worth pursuing though. They all seem
somewhat unlikely to yield big wins in practice and to be a lot of work
and complexity. So I settled here, which at least seems like a strict
improvement over the previous version.

llvm-svn: 289373
2016-12-11 07:46:21 +00:00
Craig Topper 61b280e7b0 [X86][InstCombine] Teach InstCombineCalls to simplify demanded elements for scalar FMA intrinsics.
These intrinsics don't read the upper bits of their second and third inputs so we can try to simplify them.

llvm-svn: 289372
2016-12-11 07:42:06 +00:00
Craig Topper d96395365a [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded for scalar cmp intrinsics with masking and rounding.
These intrinsics don't read the upper elements of their first and second input. These are slightly different from the SSE version, which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead.

llvm-svn: 289371
2016-12-11 07:42:04 +00:00
Craig Topper 790d0fa569 [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded elements for scalar add,div,mul,sub,max,min intrinsics with masking and rounding.
These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well.

llvm-svn: 289370
2016-12-11 07:42:01 +00:00
Kostya Serebryany 441e6310ae [libFuzzer] don't depend on time in a test
llvm-svn: 289368
2016-12-11 06:28:09 +00:00
Craig Topper 58917f3508 [AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls to match 128 and 256-bit.
llvm-svn: 289354
2016-12-11 01:59:36 +00:00
Craig Topper e7166ce237 [X86] Fix a comment to say 'an FMA' instead of 'a FMA'. NFC
llvm-svn: 289352
2016-12-11 01:28:08 +00:00
Craig Topper 1f1b441267 [X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for being able to constant fold them in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289350
2016-12-11 01:26:44 +00:00
Dylan McKay 139c0c7c37 [AVR] Fix a signed vs unsigned compiler warning
llvm-svn: 289349
2016-12-11 00:24:13 +00:00
Craig Topper 9a63d7ade5 [X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a shufflevector if the indices are constant.
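
As a standalone illustration of why this rewrite is valid in the 128-bit
case (a sketch of the documented pshufb byte semantics, not the InstCombine
change itself; pshufb128 is an invented helper name): with a constant
control vector every output byte is either a fixed zero or a fixed source
byte, which is exactly what a shufflevector mask can express.

#include <array>
#include <cstdint>

// 128-bit pshufb with a known control vector: a set high bit selects zero,
// otherwise the low four bits pick a source byte within the 16-byte lane.
static std::array<uint8_t, 16> pshufb128(const std::array<uint8_t, 16> &Src,
                                         const std::array<uint8_t, 16> &Ctrl) {
  std::array<uint8_t, 16> Result{};
  for (int I = 0; I < 16; ++I)
    Result[I] = (Ctrl[I] & 0x80) ? 0 : Src[Ctrl[I] & 0x0F];
  return Result;
}
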
llvm-svn: 289348
2016-12-11 00:23:50 +00:00
Dylan McKay 658bb0964a [AVR] Remove incorrect comment
This should've been removed in r289323.

llvm-svn: 289346
2016-12-10 23:50:30 +00:00
Craig Topper edab02b50b [X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being able to constant fold it in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289344
2016-12-10 23:09:43 +00:00
Sanjay Patel 4c48bbe94d [InstCombine] add helper for shift-by-shift folds; NFCI
These are currently limited to integer types, but we should
be able to extend to splat vectors and possibly general vectors.

llvm-svn: 289343
2016-12-10 22:16:29 +00:00
Simon Pilgrim a03e350e69 [X86][SSE] Ensure UNPCK inputs are a consistent value type in LowerHorizontalByteSum
llvm-svn: 289341
2016-12-10 21:16:45 +00:00
Craig Topper abe7c5b5e9 [AVX-512] Remove 128/256 masked vpermil intrinsics and autoupgrade to a select around the unmasked avx1 intrinsics.
llvm-svn: 289340
2016-12-10 21:15:52 +00:00
Craig Topper a4744d170e [X86][IR] Move the autoupgrading of store intrinsics out of the main nested if/else chain. This should buy a little more time against the MSVC limit mentioned in PR31034.
The handlers for stores all return at the end of their block so they can be picked off early.

llvm-svn: 289339
2016-12-10 21:15:48 +00:00
Matt Arsenault fbc728853f AMDGPU: Fix asan errors when folding operands
This was failing when trying to fold immediates into operand 1 of a
phi, which only has one statically known operand.

llvm-svn: 289337
2016-12-10 19:58:00 +00:00
Simon Pilgrim fb58550d73 [X86][SSE] Move ZeroVector creation into the shuffle pattern case where its actually used.
Also fix the ZeroVector's type - I've no idea how this hasn't caused problems........

llvm-svn: 289336
2016-12-10 19:49:55 +00:00
Craig Topper 18b57da491 [AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to vcvttps2uqq when AVX512DQ and AVX512VL are available.
llvm-svn: 289335
2016-12-10 19:35:39 +00:00
Craig Topper 8e288e0b68 [X86] Clarify indentation. NFC
llvm-svn: 289334
2016-12-10 19:35:36 +00:00
Craig Topper 85f0e57c33 [X86] Combine LowerFP_TO_SINT and LowerFP_TO_UINT. They only differ by a single boolean flag passed to a helper function. Just check the opcode and create the flag.
llvm-svn: 289333
2016-12-10 19:35:33 +00:00
Sanjay Patel 35289c62a8 [InstSimplify] improve function name; NFC
llvm-svn: 289332
2016-12-10 17:40:47 +00:00
Simon Atanasyan edd7a7bb40 [mips] Eliminate else-after-return. NFC
llvm-svn: 289331
2016-12-10 17:30:09 +00:00
Simon Pilgrim 54945a12ec [SelectionDAG] Add ability for computeKnownBits to peek through bitcasts from 'large element' scalar/vector to 'small element' vector.
Extension to D27129 which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types.

llvm-svn: 289329
2016-12-10 17:00:00 +00:00
Dylan McKay 41258cf07d [AVR] Add a stub README file
llvm-svn: 289326
2016-12-10 12:08:19 +00:00
Dylan McKay d8a603c23b [AVR] Fix and clean up the inline assembly tests
There was a bug where we would hit an assertion if 'Q' was used as a
constraint.

I also removed hardcoded register names to prefer regexes so the tests
don't break when the register allocator changes.

llvm-svn: 289325
2016-12-10 11:49:07 +00:00
Dylan McKay 801a4bd4ed [AVR] Fix an inline asm assertion which would always trigger
It looks like some time in the past, constraint codes were changed from
chars being passed around to enums.

llvm-svn: 289323
2016-12-10 11:18:37 +00:00
Dylan McKay 5c90b8cb4f [AVR] Use the register scavenger when expanding 'LDDW' instructions
Summary: This gets rid of the hardcoded 'r0' that was used previously.

Reviewers: asl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27567

llvm-svn: 289322
2016-12-10 10:51:55 +00:00
Dylan McKay 5d0233bea2 [AVR] Support stores to undefined pointers
This would previously trigger an assertion error in AVRISelDAGToDAG.

llvm-svn: 289321
2016-12-10 10:16:13 +00:00
Chandler Carruth 6b9816477b [PM] Support invalidation of inner analysis managers from a pass over the outer IR unit.
Summary:
This never really got implemented, and was very hard to test before
a lot of the refactoring changes to make things more robust. But now we
can test it thoroughly and cleanly, especially at the CGSCC level.

The core idea is that when an inner analysis manager proxy receives the
invalidation event for the outer IR unit, it needs to walk the inner IR
units and propagate it to the inner analysis manager for each of those
units. For example, each function in the SCC needs to get an
invalidation event when the SCC gets one.
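
A very small standalone sketch of that propagation (not the actual proxy
code; the class and member names here are stand-ins): the proxy's reaction
to an outer-unit invalidation is just a walk over the inner units,
forwarding the event to the inner analysis manager.

#include <map>
#include <string>
#include <vector>

struct InnerAnalysisManager {
  // Cached per-inner-unit results, keyed by unit name for simplicity.
  std::map<std::string, int> Cached;
  void invalidate(const std::string &InnerUnit) { Cached.erase(InnerUnit); }
};

struct OuterUnit {
  std::vector<std::string> InnerUnits; // e.g. the functions of an SCC
};

struct InnerAnalysisManagerProxy {
  InnerAnalysisManager &InnerAM;
  // Invalidation event for the outer unit: propagate to every inner unit.
  void invalidate(const OuterUnit &Outer) {
    for (const std::string &F : Outer.InnerUnits)
      InnerAM.invalidate(F);
  }
};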

The function / module interaction is somewhat boring here. This really
becomes interesting in the face of analysis-backed IR units. This patch
effectively handles all of the CGSCC layer's needs -- both invalidating
SCC analysis and invalidating function analysis when an SCC gets
invalidated.

However, this second aspect doesn't really handle the
LoopAnalysisManager well at this point. That one will need some change
of design in order to fully integrate, because unlike the call graph,
the entire function behind a LoopAnalysis's results can vanish out from
under us, and we won't even have a cached API to access. I'd like to try
to separate solving the loop problems into a subsequent patch though in
order to keep this more focused so I've adapted them to the API and
updated the tests that immediately fail, but I've not added the level of
testing and validation at that layer that I have at the CGSCC layer.

An important aspect of this change is that the proxy for the
FunctionAnalysisManager at the SCC pass layer doesn't work like the
other proxies for an inner IR unit as it doesn't directly manage the
FunctionAnalysisManager and invalidation or clearing of it. This would
create an ever worsening problem of dual ownership of this
responsibility, split between the module-level FAM proxy and this
SCC-level FAM proxy. Instead, this patch changes the SCC-level FAM proxy
to work in terms of the module-level proxy and defer to it to handle
much of the updates. It only does SCC-specific invalidation. This will
become more important in subsequent patches that support more complex
invalidation scenarios.

Reviewers: jlebar

Subscribers: mehdi_amini, mcrosier, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D27197

llvm-svn: 289317
2016-12-10 06:34:44 +00:00
Craig Topper a39b650d72 [X86] Use X86ISD::CVTTP2SI and X86ISD::CVTTP2UI for lowering 128-bit cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements.
Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements.

Similar things have already been done for other convert intrinsics.

llvm-svn: 289316
2016-12-10 06:02:48 +00:00
Dylan McKay f368509543 [AVR] Fix a bunch of incorrect assertion messages
These should've been checking whether the immediate is a 6-bit unsigned
integer.

If the immediate was '63', this would cause an assertion error which
shouldn't have occurred.

llvm-svn: 289315
2016-12-10 05:48:48 +00:00
Kostya Serebryany c05cb60369 [libFuzzer] test cleanup (3)
llvm-svn: 289314
2016-12-10 02:48:42 +00:00
Kostya Serebryany 832d39e9cc [libFuzzer] test cleanup (2)
llvm-svn: 289313
2016-12-10 02:47:00 +00:00
Kostya Serebryany 2f962fe5f7 [libFuzzer] test cleanup
llvm-svn: 289312
2016-12-10 02:45:56 +00:00
Kostya Serebryany 61be0f947d [libFuzzer] switch all libFuzzer tests to use -fsanitize-coverage=trace-pc-guard. Support for the previously used instrumentation will be removed in the following changes
llvm-svn: 289311
2016-12-10 02:26:23 +00:00
Kostya Serebryany 1394ce2aa2 [libFuzzer] use __sanitizer_get_module_and_offset_for_pc to get the module name while printing the coverage
llvm-svn: 289310
2016-12-10 01:19:35 +00:00
Matt Arsenault 2402b95db0 AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts
The users of the addrspacecast were having their types incorrectly
changed, producing invalid bitcasts between address spaces.

llvm-svn: 289307
2016-12-10 00:52:50 +00:00
Matt Arsenault 4bd7236193 AMDGPU: Fix handling of 16-bit immediates
Since 32-bit instructions with 32-bit input immediate behavior
are used to materialize 16-bit constants in 32-bit registers
for 16-bit instructions, determining the legality based
on the size is incorrect. Change operands to have the size
specified in the type.

Also adds a workaround for a disassembler bug that
produces an immediate MCOperand for an operand that
is supposed to be OPERAND_REGISTER.

The assembler appears to accept out of bounds immediates and
truncates them, but this seems to be an issue for 32-bit
already.

llvm-svn: 289306
2016-12-10 00:39:12 +00:00
Matt Arsenault f0c862594b AMDGPU: Fix vintrp disassembly
llvm-svn: 289292
2016-12-10 00:29:55 +00:00
Matt Arsenault 618b330dd0 AMDGPU: Change vintrp printing to better match sc
Some of the immediates need to be printed differently
eventually.

llvm-svn: 289291
2016-12-10 00:23:12 +00:00
Eugene Zelenko 2bc2f33ba2 [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289282
2016-12-09 22:06:55 +00:00
Adrian Prantl 8fafb8d378 Fix LLVM's use of DW_OP_bit_piece in DWARF expressions.
LLVM's use of DW_OP_bit_piece is incorrect and based on a
misunderstanding of the wording in the DWARF specification. The offset
argument of DW_OP_bit_piece refers to the offset into the location
that is on the top of the DWARF expression stack, and not an offset
into the source variable. This has since also been clarified in the
DWARF specification.

This patch fixes all uses of DW_OP_bit_piece to emit the correct
offset and simplifies the DwarfExpression class to semi-automatically
emit empty DW_OP_pieces to adjust the offset of the source variable,
thus simplifying the code using DwarfExpression.

While this is an incompatible bugfix, in practice I don't expect this
to be much of a problem since LLVM's old interpretation and the
correct interpretation of DW_OP_bit_piece differ only when there are
gaps in the fragmented locations of the described variables or if
individual fragments are smaller than a byte. LLDB at least won't
interpret locations with gaps in them because it has no way to present
undefined bits in a variable, and there is a high probability that an
old-form expression will be malformed when interpreted correctly,
because the DW_OP_bit_piece offset will be outside of the location at
the top of the stack.

As a nice side-effect, this patch enables us to use a more efficient
encoding for subregisters: In order to express a sub-register at a
non-zero offset we now use a DW_OP_bit_piece instead of shifting the
value into place manually.

This patch also adds missing test coverage for code paths that weren't
exercised before.

<rdar://problem/29335809>
Differential Revision: https://reviews.llvm.org/D27550

llvm-svn: 289266
2016-12-09 20:43:40 +00:00
Marek Olsak 23ae31cca0 AMDGPU/SI: Remove XNACK feature from CI
Summary: CI doesn't have XNACK.

Reviewers: tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27175

llvm-svn: 289263
2016-12-09 19:49:58 +00:00
Marek Olsak 0f55fbae6c AMDGPU/SI: Don't reserve XNACK when it's disabled
Summary:
This frees 2 additional scalar registers.

These are results from all of my 3 patches combined:

  Polaris:
    Spilled SGPRs: 2231 -> 1517 (-32.00 %)

  Tonga:
    Spilled SGPRs: 3829 -> 2608 (-31.89 %)
    Spilled VGPRs: 100 -> 84 (-16.00 %)

  Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader
  limited to 64 VGPRs.

Reviewers: tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27151

llvm-svn: 289262
2016-12-09 19:49:54 +00:00
Marek Olsak 693e9be918 AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects
Summary: This frees 2 scalar registers.

Reviewers: tstellarAMD

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27150

llvm-svn: 289261
2016-12-09 19:49:48 +00:00
Marek Olsak 91f22fbf4f AMDGPU/SI: Allow using SGPRs 96-101 on VI
Summary:
There is no point in setting SGPRS=104, because VI allocates SGPRs
in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs
for general purposes.

Reviewers: tstellarAMD

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27149

llvm-svn: 289260
2016-12-09 19:49:40 +00:00
Paul Robinson 4fa7b57a1f [DWARF] Suppress .loc directives from CFI instructions
Like DBG_VALUE, these emit nothing to the .text section, and sometimes
have no source location specified.  Just ignore them.

Differential Revision: http://reviews.llvm.org/D27492

llvm-svn: 289256
2016-12-09 19:15:32 +00:00
Matt Arsenault 7b00cf4706 AMDGPU: Fix isTypeDesirableForOp for i16
This should do nothing for targets without i16.

llvm-svn: 289235
2016-12-09 17:57:43 +00:00
Simon Pilgrim 017b7a71d8 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED)
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.

llvm-svn: 289232
2016-12-09 17:53:11 +00:00
Matt Arsenault 38d8ed2b75 AMDGPU: Fix i128 mul
llvm-svn: 289231
2016-12-09 17:49:14 +00:00
Matt Arsenault 52facf0195 AMDGPU: Allow TBA, TMA, TTMP* registers with SMEM instructions
Fixes assembler regressions.

llvm-svn: 289230
2016-12-09 17:49:11 +00:00
Matt Arsenault eb4a55e066 AMDGPU: Clean up instruction bits
Sort the instruction bits by type and make sure there is one
for each format.

Also cleanup namespaces.

llvm-svn: 289229
2016-12-09 17:49:08 +00:00
Sean Fertile 1c4109b4c2 [PPC] Add intrinsics for vector extract word and vector insert word.
Revision: https://reviews.llvm.org/D26547
llvm-svn: 289227
2016-12-09 17:21:42 +00:00
Nirav Dave bedb5d906c Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

llvm-svn: 289226
2016-12-09 17:18:24 +00:00
Nirav Dave fd51ff4fd8 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates the handling
of non-interfering loads/stores from the store-merging logic.

When merging stores, we search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.ll -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we don't do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221
2016-12-09 16:15:12 +00:00
Simon Pilgrim b9eb99f570 Use SelectionDAG.getSplatBuildVector helper. NFCI.
llvm-svn: 289220
2016-12-09 16:01:50 +00:00
Tom Stellard 2a48433fcf AMDGPU/SI: Don't mark VINTRP instructions as mayLoad
Summary:
These instructions technically do read from memory, but the memory
is considered to be out of bounds for normal load/store instructions.

shader-db stats:

SGPRS: 1416075 -> 1413323 (-0.19 %)
VGPRS: 867413 -> 863935 (-0.40 %)
Spilled SGPRs: 1409 -> 1354 (-3.90 %)
Spilled VGPRs: 63 -> 63 (0.00 %)
Private memory VGPRs: 880 -> 880 (0.00 %)
Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread
Code Size: 37889052 -> 37897340 (0.02 %) bytes
LDS: 2147 -> 2147 (0.00 %) blocks
Max Waves: 279243 -> 280369 (0.40 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, mareko, arsenm

Subscribers: kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27593

llvm-svn: 289219
2016-12-09 15:57:15 +00:00
Simon Pilgrim bf9c0e7434 [SelectionDAG] Use SelectionDAG.getBuildVector helper. NFCI.
Makes interception of BUILD_VECTOR creation easier for debugging.

llvm-svn: 289218
2016-12-09 15:23:41 +00:00
Simon Pilgrim 15f1f828b5 [SelectionDAG] Add additional checks to CONCAT_VECTORS creation
Part of the work for PR31323 - add extra asserts checking that the input vectors are of consistent type and result in the correct number of vector elements.

llvm-svn: 289214
2016-12-09 14:27:52 +00:00
Benjamin Kramer eedc4059c3 Plug another leak in the DWARF unittests: DIEInlineStrings are never destroyed.
llvm-svn: 289208
2016-12-09 13:33:41 +00:00
Simon Pilgrim e4050a2961 [SelectionDAG] Add partial BITCAST support to computeKnownBits
Adds support for bitcasting a little endian 'small element' vector to 'large element' scalar/vector (e.g. v16i8 to v4i32 or v2i32 to i64), which is required for PR30845. We extract the knownbits for each 'small element' part and concatenate the results together.

We can add support for big endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them.
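
A tiny standalone illustration of the concatenation step (made-up known-bit
values, not the SelectionDAG code): for a little-endian v2i32 bitcast to
i64, element 0 supplies the low 32 bits of the combined known-zero and
known-one masks and element 1 supplies the high 32 bits.

#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical per-element known bits computed for the v2i32 source.
  uint32_t KnownZeroElt[2] = {0xFFFF0000u, 0x000000FFu};
  uint32_t KnownOneElt[2]  = {0x00000001u, 0x00000000u};

  // Little endian: element 0 becomes the low half of the i64 result.
  uint64_t KnownZero = ((uint64_t)KnownZeroElt[1] << 32) | KnownZeroElt[0];
  uint64_t KnownOne  = ((uint64_t)KnownOneElt[1] << 32) | KnownOneElt[0];

  printf("KnownZero = 0x%016llx\n", (unsigned long long)KnownZero);
  printf("KnownOne  = 0x%016llx\n", (unsigned long long)KnownOne);
  return 0;
}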

Differential Revision: https://reviews.llvm.org/D27129

llvm-svn: 289200
2016-12-09 10:13:45 +00:00
Daniel Jasper f51e05ffbc Revert "[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes"
This reverts commit r288916 as it is currently causing a crasher in
Halide. Reproducer on llvm.org/PR31323. While it might be that Halide is
generating invalid IR, llc shouldn't crash.

llvm-svn: 289194
2016-12-09 09:04:51 +00:00
Craig Topper 38b1b5d44f [X86] Modify patterns from memory form of RCP/RSQRT/SQRT intrinsics to only allow (scalar_to_vector (loadf32/load64)) instead of anything that sse_load_f32/f64 can match.
sse_load_f32/f64 can also match loads that are zero extended to vectors. We shouldn't match that because we wouldn't be able to get the instruction to zero the upper bits like the intrinsic semantics would require for such a case.

There is a test case that does depend on this behavior.

llvm-svn: 289193
2016-12-09 07:57:21 +00:00
Dylan McKay 18ae0f68f8 [AVR] Use a more appropriate integer type for wide IN/OUT instructions
We could previously select an integer which would hit an assertion error
in pseudo expansion.

The new type will also generate the appropriate fixups if needed, which
wasn't done beforehand.

llvm-svn: 289192
2016-12-09 07:49:14 +00:00
Dylan McKay a5d49dfbb3 [AVR] Add tests for a large number of pseudo instructions
This adds MIR tests for 24 pseudo instructions.

llvm-svn: 289191
2016-12-09 07:49:04 +00:00
Craig Topper a55b483bb5 [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics
Summary:
Scalar intrinsics have specific semantics about which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection.
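
A standalone illustration of those semantics (not the X86 lowering code;
the V4f type and maskedScalarFMA helper are invented for this sketch): only
lane 0 is computed, the upper lanes always come from the designated
passthru operand, and when the mask bit is clear lane 0 is taken from that
same operand as well.

#include <array>

using V4f = std::array<float, 4>;

// Masked scalar FMA semantics, with an explicit Passthru operand standing
// in for whichever input the intrinsic designates.
static V4f maskedScalarFMA(const V4f &A, const V4f &B, const V4f &C,
                           bool MaskBit, const V4f &Passthru) {
  V4f R = Passthru;              // upper lanes: always copied from Passthru
  if (MaskBit)
    R[0] = A[0] * B[0] + C[0];   // lane 0 is computed when the mask bit is 1
  // When the mask bit is 0, lane 0 also stays the Passthru value.
  return R;
}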

This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1 (one of the multiply inputs) or operand 3 (the addition/subtraction input) should pass thru its upper bits.

We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version.

We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that.

This fixes PR30913.

Reviewers: delena, zvi, v_klochkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27144

llvm-svn: 289190
2016-12-09 06:42:28 +00:00
Matt Arsenault 27c062932a AMDGPU: Select i16 instructions to VOP3 forms
These were selecting directly to the VOP2 form instead
of VOP3 like the i32 instructions. Fixes regressions in
future commits where an immediate isn't folded because it was
initially used for the second operand.

Because uniform 16-bit operations are promoted to i32, it's
difficult to get a simple testcase where this matters. Fold
failures in SIFoldOperands here tend to be hidden by commute
and fold in SIShrinkInstructions.

llvm-svn: 289189
2016-12-09 06:19:12 +00:00
Peter Collingbourne 8db7e5e4ee Re-commit r289184, "Support: Use a 64-bit seek in raw_fd_ostream::seek()." with a configure-time check for lseek64.
llvm-svn: 289187
2016-12-09 05:20:43 +00:00
Craig Topper c4f2b0996d [X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables.
llvm-svn: 289186
2016-12-09 05:20:11 +00:00
Peter Collingbourne f74fcdd30c Revert r289184, we need more configury for Darwin and *BSD.
llvm-svn: 289185
2016-12-09 05:04:30 +00:00
Peter Collingbourne 08ba509266 Support: Use a 64-bit seek in raw_fd_ostream::seek().
llvm-svn: 289184
2016-12-09 04:57:19 +00:00
Davide Italiano 824d695231 [SCCP] Teach the pass about `mul %x 0` even if %x is overdefined.
The motivating example is:

extern int patatino;
int goo() {
    int x = 0;
    for (int i = 0; i < 1000000; ++i) {
        x *= patatino;
    }
    return x;
}

Currently SCCP will not realize that this function always returns zero, and
will therefore try to unroll and vectorize the loop at -O3, producing an
awful lot of (useless) code. With this change, it will just produce:

0000000000000000 <g>:
   xor    %eax,%eax
   retq

llvm-svn: 289175
2016-12-09 03:08:42 +00:00
Craig Topper 2aeb456425 [AVX-512] Add vpermilps/pd to load folding tables.
llvm-svn: 289173
2016-12-09 02:18:11 +00:00
Craig Topper 107b187d2a [Analysis] Fix typo in comment. NFC
llvm-svn: 289171
2016-12-09 02:18:04 +00:00
Kostya Serebryany 111e1d69e3 [libFuzzer] implement crash-resistant merge (https://github.com/google/sanitizers/issues/722). This is a first experimental variant that needs some more testing, thus not yet adding a lit test (but there are unit tests).
llvm-svn: 289166
2016-12-09 01:17:24 +00:00
Peter Collingbourne 8786754cc3 WholeProgramDevirt: Teach the pass to handle structs of arrays.
This will become necessary in some cases once D22296 lands.

llvm-svn: 289165
2016-12-09 01:10:11 +00:00
Chandler Carruth 86f0bdf832 [LCG] Minor cleanup to the LCG walk over a function, NFC.
This just hoists the check for declarations up a layer which allows
various sets used in the walk to be smaller. Also moves the relevant
comments to match, and catches a few other cleanups in this code.

llvm-svn: 289163
2016-12-09 00:46:44 +00:00
Peter Collingbourne 7a1e5bbe4e Make WholeProgramDevirt understand ConstStruct vtables.
Based on a patch by LemonBoy!

Differential Revision: https://reviews.llvm.org/D26581

llvm-svn: 289162
2016-12-09 00:33:27 +00:00
Chris Bieneman 313b326bb6 [ObjectYAML] Support for DWARF debug_aranges
This patch adds support for round tripping DWARF debug_aranges in and out of YAML.

llvm-svn: 289161
2016-12-09 00:26:44 +00:00
Zia Ansari 394cef803a [InstSimplify] Add "X / 1.0" to SimplifyFDivInst.
Differential Revision: https://reviews.llvm.org/D27587

llvm-svn: 289153
2016-12-08 23:27:40 +00:00
Tim Northover b58346f2f2 GlobalISel: fall back gracefully for debug intrinsics.
Supporting them properly is a reasonably complex chunk of work, so to allow bot
testing before then we should at least be able to fall back to DAG ISel.

llvm-svn: 289150
2016-12-08 22:44:13 +00:00
Tim Northover 1e656ec137 GlobalISel: factor overflow handling into separate function. NFC.
llvm-svn: 289149
2016-12-08 22:44:00 +00:00
Davide Italiano 54c683f9e7 [SCCP] Make sure SCCP and ConstantFolding agree on undef >> a.
Currently SCCP folds the value to -1, while ConstantProp folds to
0. This changes SCCP to do what ConstantFolding does.

llvm-svn: 289147
2016-12-08 22:28:53 +00:00
Reid Kleckner 785e7d282c Don't emit .seh_handler directives for any cleanup funclets
We were falsely claiming that we had an LSDA for the relevant EH
personality before this change, which could lead to the EH machinery
interpreting random adjacent data as an LSDA.

Fixes PR31317

This change is safe because cleanups can't contain exception handlers
today. We do these things to maintain that invariant:
- C++ destructors are naturally out-of-line
- __finally blocks are outlined in clang
- LLVM's inliner will not inline EH constructs into cleanups

llvm-svn: 289101
2016-12-08 20:38:46 +00:00
Krzysztof Parzyszek 77a45576ef [RDF] Fix incorrect lane mask calculation
This was exposed by some code that used more than one level of sub-
registers. There is no testcase, because there is no such code in the
Hexagon backend.

llvm-svn: 289099
2016-12-08 20:33:45 +00:00
Matt Arsenault e96d03745d AMDGPU: Make f16 ConstantFP legal
Not having this legal led to combine failures, resulting
in dumb things like bitcasts of constants not being folded
away.

The only reason I'm leaving the v_mov_b32 hack that f32
already uses is to avoid madak formation test regressions.
PeepholeOptimizer has an ordering issue where the immediate
fold attempt is into the sgpr->vgpr copy instead of the actual
use. Running it twice avoids that problem.

llvm-svn: 289096
2016-12-08 20:14:46 +00:00
Stanislav Mekhanoshin 73b54f4134 [AMDGPU] Fix number of reserved SGPRs on CI to reflect flat scratch use
Differential Revision: https://reviews.llvm.org/D27225

llvm-svn: 289095
2016-12-08 20:07:23 +00:00
Matt Arsenault 6c06a6f48a AMDGPU: Fix commuting v_sub_u16
The correct commutable opcode was set to itself, so this
was simply swapping the operands to commute instead of also
changing the opcode to v_subrev_u16.

llvm-svn: 289093
2016-12-08 19:52:38 +00:00
Stanislav Mekhanoshin 50ea93a2bd [AMDGPU] Add amdgpu-unify-metadata pass
Multiple metadata values for records such as opencl.ocl.version, llvm.ident
and similar are created after linking several modules. For some of them, notably
opencl.ocl.version, this creates semantic problem because we cannot tell which
version of OpenCL the composite module conforms.

Moreover, such repetitions of identical values often create a huge list of
unneeded metadata, which grows bitcode size both in memory and on disk.
It can go up to several Mb when linked against our OpenCL library. Lastly, such
long lists obscure reading of dumped IR.

The pass unifies metadata after linking.
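
A minimal standalone sketch of the unification step (not the actual pass,
which operates on LLVM named metadata; unifyOperands is an invented
helper): after linking, a record such as opencl.ocl.version may carry one
identical operand per input module, and keeping only the first occurrence
of each distinct operand collapses the list.

#include <set>
#include <string>
#include <vector>

// Collapse repeated identical operands, preserving first-seen order.
static std::vector<std::string>
unifyOperands(const std::vector<std::string> &Ops) {
  std::set<std::string> Seen;
  std::vector<std::string> Unified;
  for (const std::string &Op : Ops)
    if (Seen.insert(Op).second)
      Unified.push_back(Op);
  return Unified;
}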

Differential Revision: https://reviews.llvm.org/D25381

llvm-svn: 289092
2016-12-08 19:46:04 +00:00
Peter Collingbourne 235c275b20 IR, X86: Understand !absolute_symbol metadata on global variables.
Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.

As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html

Differential Revision: https://reviews.llvm.org/D25878

llvm-svn: 289087
2016-12-08 19:01:00 +00:00
Chris Bieneman fbf7dfe1ba [ObjectYAML] Remove DWARF from class names
Since all the DWARF classes are in a DWARFYAML namespace, having every class start with DWARF seems like a bit of overkill.

llvm-svn: 289080
2016-12-08 17:46:57 +00:00
Alexander Timofeev 18009560c5 [AMDGPU] Scalarization of global uniform loads.
Summary:
LC can currently select a scalar load for uniform memory access
based only on the readonly memory address space. This restriction
originated from the fact that in HW prior to VI vector and scalar caches
are not coherent. With MemoryDependenceAnalysis we can check that the
memory location corresponding to the memory operand of the LOAD is not
clobbered along all paths from the function entry.
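
A rough standalone sketch of that check at basic-block granularity
(illustrative only; the real pass uses MemoryDependenceAnalysis on LLVM IR,
and the Block type here is invented): walk backwards from the load's block
towards the entry and reject the load if any block on the way may clobber
the location.

#include <set>
#include <vector>

struct Block {
  bool MayClobberLoc = false;  // may this block write the loaded location?
  std::vector<Block *> Preds;  // CFG predecessors
};

// True if no block on any path from Entry to LoadBlock may clobber the
// location (block granularity; ordering inside LoadBlock is ignored here).
static bool notClobberedFromEntry(Block *Entry, Block *LoadBlock) {
  std::set<Block *> Visited;
  std::vector<Block *> Worklist{LoadBlock};
  while (!Worklist.empty()) {
    Block *B = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(B).second)
      continue;
    if (B->MayClobberLoc)
      return false;
    if (B == Entry)
      continue;
    for (Block *P : B->Preds)
      Worklist.push_back(P);
  }
  return true;
}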

Reviewers: rampitec, tstellarAMD, arsenm

Subscribers: wdng, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D26917

llvm-svn: 289076
2016-12-08 17:28:47 +00:00
Keno Fischer dc09119776 ConstantFolding: Don't crash when encountering vector GEP
ConstantFolding tried to cast one of the scalar indices to a vector
type. Instead, use the vector type only for the first index (which
is the only one allowed to be a vector) and use its scalar type
otherwise.

Fixes PR31250.

Reviewers: majnemer
Differential Revision: https://reviews.llvm.org/D27389

llvm-svn: 289073
2016-12-08 17:22:35 +00:00
NAKAMURA Takumi 689493bb12 Prune unused libdeps.
llvm-svn: 289060
2016-12-08 15:28:02 +00:00
NAKAMURA Takumi 9ccd966612 LanaiInstPrinter: Prune unused libdeps.
llvm-svn: 289054
2016-12-08 14:26:30 +00:00