Commit Graph

106482 Commits

Author SHA1 Message Date
Matt Arsenault defe371771 AMDGPU: Stop modifying SP in call sequences
Because the stack growth direction and addressing is done
in the same direction, modifying SP at the beginning of the
call sequence was incorrect. If we had a stack passed argument,
we would end up skipping that number of bytes before pushing
arguments, leaving unused/inconsistent space.

The callee creates fixed stack objects in its frame, so
the space necessary for these is already logically allocated
in the callee, so we just let the callee increment SP if
it really requires it.

llvm-svn: 313279
2017-09-14 17:37:40 +00:00
Dehao Chen 3a81f84d9a Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader.
Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues.

Reviewers: eraman, davidxl

Reviewed By: eraman

Subscribers: vitalybuka, sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D37779

llvm-svn: 313277
2017-09-14 17:29:56 +00:00
Simon Dardis 55e446737f [mips] Implement the 'dext' aliases and it's disassembly alias.
The other members of the dext family of instructions (dextm, dextu) are
traditionally handled by the assembler selecting the right variant of
'dext' depending on the values of the position and size operands.

When these instructions are disassembled, rather than reporting the
actual instruction, an equivalent aliased form of 'dext' is generated
and is reported. This is to mimic the behaviour of binutils.

Reviewers: slthakur, nitesh.jain, atanasyan

Differential Revision: https://reviews.llvm.org/D34887

llvm-svn: 313276
2017-09-14 17:27:53 +00:00
Matt Arsenault 6efd082c01 AMDGPU: Make frame register caller preserved
Using SplitCSR for the frame register was very broken. Often
the copies in the prolog and epilog were optimized out, in addition
to them being inserted after the true prolog where the FP
was clobbered.

I have a hacky solution which works that continues to use
split CSR, but for now this is simpler and will get to working
programs.

llvm-svn: 313274
2017-09-14 17:14:57 +00:00
Adrian Prantl 5fd3d49bc4 llvm-dwarfdump: support dumping static archives.
llvm-svn: 313272
2017-09-14 17:01:53 +00:00
Krzysztof Parzyszek 779d98e1c0 TableGen support for parameterized register class information
This replaces TableGen's type inference to operate on parameterized
types instead of MVTs, and as a consequence, some interfaces have
changed:
- Uses of MVTs are replaced by ValueTypeByHwMode.
- EEVT::TypeSet is replaced by TypeSetByHwMode.

This affects the way that types and type sets are printed, and the
tests relying on that have been updated.

There are certain users of the inferred types outside of TableGen
itself, namely FastISel and GlobalISel. For those users, the way
that the types are accessed have changed. For typical scenarios,
these replacements can be used:
- TreePatternNode::getType(ResNo) -> getSimpleType(ResNo)
- TreePatternNode::hasTypeSet(ResNo) -> hasConcreteType(ResNo)
- TypeSet::isConcrete -> TypeSetByHwMode::isValueTypeByHwMode(false)

For more information, please refer to the review page.

Differential Revision: https://reviews.llvm.org/D31951

llvm-svn: 313271
2017-09-14 16:56:21 +00:00
Krzysztof Parzyszek 6ca02b25a7 [IfConversion] More simple, correct dead/kill liveness handling
Patch by Jesper Antonsson.

Differential Revision: https://reviews.llvm.org/D37611

llvm-svn: 313268
2017-09-14 15:53:11 +00:00
Simon Dardis 6f83ae38a3 [mips] Implement the 'dins' aliases.
Traditionally GAS has provided automatic selection between dins, dinsm and
dinsu. Binutils also disassembles all instructions in that family as 'dins'
rather than the actual instruction.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D34877

llvm-svn: 313267
2017-09-14 15:17:50 +00:00
Sanjay Patel 0d4fd5b668 [InstSimplify] fold sdiv/srem based on compare of dividend and divisor
This should bring signed div/rem analysis up to the same level as unsigned. 
We use icmp simplification to determine when the divisor is known greater than the dividend.

Each positive test is followed by a negative test to show that we're not overstepping the boundaries of the known bits.
There are extra tests for the signed-min-value special cases.

Alive proofs:
http://rise4fun.com/Alive/WI5

Differential Revision: https://reviews.llvm.org/D37713

llvm-svn: 313264
2017-09-14 14:59:07 +00:00
Aleksandar Beserminji 7d610f4d06 Test commit.
llvm-svn: 313262
2017-09-14 14:34:04 +00:00
Sanjay Patel cca8f7853f [InstSimplify] clean up div/rem handling; NFCI
The idea to make an 'isDivZero' helper was suggested for the signed case in D37713:
https://reviews.llvm.org/D37713

This clean-up makes it clear that D37713 is just filling the gap for signed div/rem,
removes unnecessary code, and allows us to remove a bit of duplicated code from the
planned improvement in D37713.

llvm-svn: 313261
2017-09-14 14:09:11 +00:00
Krzysztof Parzyszek 473d02dbac [Hexagon] Make getMemAccessSize return size in bytes
It used to return the actual field value from the instruction descriptor.
There is no reason for that, that value is not interesting in any way and
the specifics of its encoding in the descriptor should not be exposed.

llvm-svn: 313257
2017-09-14 12:06:40 +00:00
Ayman Musa ab68449c53 [X86] When applying the shuffle-to-zero-extend transformation on floating point, bitcast to integer first.
Fix issue described in PR34577.

Differential Revision: https://reviews.llvm.org/D37803

llvm-svn: 313256
2017-09-14 12:06:38 +00:00
Jonas Devlieghere 5891060ff8 [dwarfdump] Add DWARF verifiers for address ranges
This patch started as an attempt to rebase Greg's differential (D32821).
The result is both quite similar and different at the same time. It adds
the following checks:

 - Verify that all address ranges in a DIE are valid.
 - Verify that no ranges within the DIE overlap.
 - Verify that no ranges overlap with the ranges of a sibling.
 - Verify that children are completely contained in its (direct)
   parent's address range. (unless both are subprograms)

Differential revision: https://reviews.llvm.org/D37696

llvm-svn: 313255
2017-09-14 11:33:42 +00:00
Simon Dardis 28365b33ad [mips] Pick the right variant of DINS upfront and enable target instruction verification
This patch complements D16810 "[mips] Make isel select the correct DEXT variant
up front.". Now ISel picks the right variant of DINS, so now there is no need
to replace DINS with the appropriate variant during
MipsMCCodeEmitter::encodeInstruction().

This patch also enables target specific instruction verification for ins, dins,
dinsm, dinsu, ext, dext, dextm, dextu. These instructions have constraints that
are checked when generating MipsISD::Ins and MipsISD::Ext nodes, but these
constraints are not checked during instruction selection. Adding machine
verification should catch outstanding cases.

Finally, correct a bug that instruction verification uncovered, where the
position operand of a DINSU generated during lowering was being silently
and accidently corrected to the correct value.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D34809

llvm-svn: 313254
2017-09-14 10:58:00 +00:00
Jonas Devlieghere a9f55bed8a Revert "[dwarfdump] Add DWARF verifiers for address ranges"
This reverts commit r313250.

llvm-svn: 313253
2017-09-14 10:49:15 +00:00
Simon Pilgrim 8bd2d8780a [DAGCombine] (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
We already have a combine for this pattern when the input to shl is add, so we just need to enable the transformation when the input is or.

Original patch by @tstellar

Differential Revision: https://reviews.llvm.org/D19325

llvm-svn: 313251
2017-09-14 10:38:30 +00:00
Jonas Devlieghere d7201b3a36 [dwarfdump] Add DWARF verifiers for address ranges
This patch started as an attempt to rebase Greg's differential (D32821).
The result is both quite similar and different at the same time. It adds
the following checks:

 - Verify that all address ranges in a DIE are valid.
 - Verify that no ranges within the DIE overlap.
 - Verify that no ranges overlap with the ranges of a sibling.
 - Verify that children are completely contained in its (direct)
   parent's address range. (unless both are subprograms)

Differential revision: https://reviews.llvm.org/D37696

llvm-svn: 313250
2017-09-14 10:38:18 +00:00
Simon Pilgrim 523483e0bd [SelectionDAG] ComputeNumSignBits - cleanup ROTL/ROTR wrapping to match DAGCombine etc.
Use RotAmt.urem(VTBits) instead of AND(RotAmt, VTBits - 1)

TBH I don't expect non-power-of-2 types to be created, but it makes the logic clearer and matches what we do in other rotation combines.

llvm-svn: 313245
2017-09-14 10:28:01 +00:00
Chandler Carruth 7376ae88eb [PM/CGSCC] Teach the CGSCC pass manager components to gracefully handle
invalidated SCCs even when we do not have an updated SCC to redirect
towards.

This comes up in a fairly subtle and surprising circumstance: we need to
have a connected but internal node in the call graph which later becomes
a disconnected island, and then gets deleted. All of this needs to
happen mid-CGSCC walk. Because it is disconnected, we have no way of
computing a new "current" SCC when it gets deleted. Instead, we need to
explicitly check for a deleted "current" SCC and bail out of the current
CGSCC step. This will bubble all the way up to the post-order walk and
then resume correctly.

I've included minimal tests for this bug. The specific behavior
matches something we've seen in the wild with the new PM combined with
ThinLTO and sample PGO, but I've not yet confirmed whether this is the
only issue there.

llvm-svn: 313242
2017-09-14 08:33:57 +00:00
Alon Kom 682cfc1d4c [LV] Fix maximum legal VF calculation
This patch fixes pr34283, which exposed that the computation of
maximum legal width for vectorization was wrong, because it relied
on MaxInterleaveFactor to obtain the maximum stride used in the loop,
however not all strided accesses in the loop have an interleave-group
associated with them.
Instead of recording the maximum stride in the loop, which can be over
conservative (e.g. if the access with the maximum stride is not involved
in the dependence limitation), this patch tracks the actual maximum legal
width imposed by accesses that are involved in dependencies.

Differential Revision: https://reviews.llvm.org/D37507

llvm-svn: 313237
2017-09-14 07:40:02 +00:00
Dean Michael Berris 01fd7c8bd4 [XRay][CodeGen] Use the current function symbol as the associated symbol for the instrumentation map
Summary:
XRay had been assuming that the previous section is the "text" section
of the function when lowering the instrumentation map. Unfortunately
this is not a safe assumption, because we may be coming from lowering
debug type information for the function being lowered.

This fixes an issue with combining -gsplit-dwarf, -generate-type-units,
-debug-compile and -fxray-instrument for sole member functions. When the
split dwarf section is stripped, we're left with references from the
xray_instr_map to the debug section. The change now uses the function's
symbol instead of the previous section's start symbol.

We found the bug while attempting to strip the split debug sections off
an XRay-instrumented object file, which had a peculiar edge-case for
single-function classes where the single function is being lowered.
Because XRay had assocaited the instrumentation map for a function to
the debug types section instead of the function's section, the objcopy
call will fail due to the misplaced reference from the xray_instr_map
section.

Reviewers: pcc, dblaikie, echristo

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D37791

llvm-svn: 313233
2017-09-14 07:08:23 +00:00
Simon Atanasyan b35dd1c908 [mips] Recognise the triple used by Debian for MIPS n32 ABI
Triples like mips64-linux-gnuabin32 are documented in this article:
https://wiki.debian.org/Multiarch/Tuples

llvm-svn: 313231
2017-09-14 06:50:05 +00:00
Vitaly Buka 48624d327a Revert "Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader."
Patch introduced uninitialized value.

This reverts commit r313195.

llvm-svn: 313230
2017-09-14 05:40:33 +00:00
Peter Collingbourne cfbd089237 Reland r313157, "ThinLTO: Correctly follow aliasee references when dead stripping." which was reverted in r313222.
This reland includes a fix for the LowerTypeTests pass so that it
looks past aliases when determining which type identifiers are live.

Differential Revision: https://reviews.llvm.org/D37842

llvm-svn: 313229
2017-09-14 05:02:59 +00:00
Dinar Temirbulatov df0b843875 [SLPVectorizer] Prefer auto over explicit type for VL0, NFCI.
llvm-svn: 313228
2017-09-14 04:28:35 +00:00
Hans Wennborg ae050afeb9 Revert r313157 "ThinLTO: Correctly follow aliasee references when dead stripping."
This broke Chromium's CFI build; see crbug.com/765004.

> We were previously handling aliases during dead stripping by adding
> the aliased global's "original name" GUID to the worklist. This will
> lead to incorrect behaviour if the global has local linkage because
> the original name GUID will not correspond to the global's GUID in
> the summary.
>
> Because an alias is just another name for the global that it
> references, there is no need to mark the referenced global as used,
> or to follow references from any other copies of the global. So all
> we need to do is to follow references from the aliasee's summary
> instead of the alias.
>
> Differential Revision: https://reviews.llvm.org/D37789

llvm-svn: 313222
2017-09-14 00:40:14 +00:00
Matt Arsenault ecb43ef1bc AMDGPU: Don't spill SP reg like a normal CSR
llvm-svn: 313217
2017-09-13 23:47:01 +00:00
Reid Kleckner cd7bba0264 [codeview] Fold FIXME into comment, there's nothing to do. NFC
llvm-svn: 313214
2017-09-13 23:30:01 +00:00
Hans Wennborg 06e2a384c2 Revert r312719 "[MachineCombiner] Update instruction depths incrementally for large BBs."
This caused PR34596.

> [MachineCombiner] Update instruction depths incrementally for large BBs.
>
> Summary:
> For large basic blocks with lots of combinable instructions, the
> MachineTraceMetrics computations in MachineCombiner can dominate the compile
> time, as computing the trace information is quadratic in the number of
> instructions in a BB and it's relevant successors/predecessors.
>
> In most cases, knowing the instruction depth should be enough to make
> combination decisions. As we already iterate over all instructions in a basic
> block, the instruction depth can be computed incrementally. This reduces the
> cost of machine-combine drastically in cases where lots of instructions
> are combined. The major drawback is that AFAIK, computing the critical path
> length cannot be done incrementally. Therefore we only compute
> instruction depths incrementally, for basic blocks with more
> instructions than inc_threshold. The -machine-combiner-inc-threshold
> option can be used to set the threshold and allows for easier
> experimenting and checking if using incremental updates for all basic
> blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 313213
2017-09-13 23:23:09 +00:00
Stanislav Mekhanoshin 7fe9a5d9b4 Allow target to decide when to cluster loads/stores in misched
MachineScheduler when clustering loads or stores checks if base
pointers point to the same memory. This check is done through
comparison of base registers of two memory instructions. This
works fine when instructions have separate offset operand. If
they require a full calculated pointer such instructions can
never be clustered according to such logic.

Changed shouldClusterMemOps to accept base registers as well and
let it decide what to do about it.

Differential Revision: https://reviews.llvm.org/D37698

llvm-svn: 313208
2017-09-13 22:20:47 +00:00
Adrian Prantl 3ae35eb56b llvm-dwarfdump: automatically dump both regular and .dwo variant of sections
Since users typically don't really care about the .dwo / non.dwo
distinction, this patch makes it so dwarfdump --debug-<info,...> dumps
.debug_info and (if available) also .debug_info.dwo. This simplifies
the command line interface (I've removed all dwo-specific dump
options) and makes the tool friendlier to use.

Differential Revision: https://reviews.llvm.org/D37771

llvm-svn: 313207
2017-09-13 22:09:01 +00:00
Matt Arsenault fb017ae155 AMDGPU: Handle coldcc in more places
Missed in r312936

llvm-svn: 313205
2017-09-13 21:55:52 +00:00
Reid Kleckner 89af112cf5 [codeview] VLAs and unsized arrays should use a size of zero
Previously we used a size of '1' for VLAs because we weren't sure what
MSVC did. However, MSVC does support declaring an array without a size,
for which it emits an array type with a size of zero. Clang emits the
same DI metadata for VLAs and arrays without bound, so we would describe
arrays without bound as having one element. This lead to Microsoft
debuggers only printing a single element.

Emitting a size of zero appears to cause these debuggers to search the
symbol information to find a definition of the variable with accurate
array bounds.

Fixes http://crbug.com/763580

llvm-svn: 313203
2017-09-13 21:54:20 +00:00
Eli Friedman bde9fc76dd [ARM] Add more CPUs to host detection
This returns "cortex-a73" for second-generation Kryo; not precisely
correct, but close enough.

Differential Revision: https://reviews.llvm.org/D37724

llvm-svn: 313200
2017-09-13 21:48:00 +00:00
Eugene Zelenko 8002c504cd [Transforms] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 313198
2017-09-13 21:43:53 +00:00
Wei Mi c0d066468e [RegAlloc] Keep a copy of live interval for the spilled vregs in HoistSpillHelper.
This is to fix PR34502. After rL311401, the live range of spilled vreg will be
cleared. HoistSpill need to use the live range of the original vreg before splitting
to know the moving range of the spills. The patch saves a copy of live interval for
the spilled vreg inside of HoistSpillHelper.

Differential Revision: https://reviews.llvm.org/D37578

llvm-svn: 313197
2017-09-13 21:41:30 +00:00
Dehao Chen 15c86ef970 Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader.
Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues.

Reviewers: eraman, davidxl

Reviewed By: eraman

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D37779

llvm-svn: 313195
2017-09-13 21:22:55 +00:00
Eugene Zelenko 618c555bbe [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 313194
2017-09-13 21:15:20 +00:00
Adrian McCarthy d91bf3998f Mark static member functions as static in CodeViewDebug
Summary:
To improve CodeView quality for static member functions, we need to make the
static explicit.  In addition to a small change in LLVM's CodeViewDebug to
return the appropriate MethodKind, this requires a small change in Clang to
note the staticness in the debug info metadata.

Subscribers: aprantl, hiraditya

Differential Revision: https://reviews.llvm.org/D37715

llvm-svn: 313192
2017-09-13 20:53:55 +00:00
Easwaran Raman 4924bb002d [Inliner] Add another way to compute full inline cost.
Summary:
Full inline cost is computed when -inline-cost-full is true or ORE is
non-null. This patch adds another way to compute full inline cost by
adding a field to InlineParams. This will be used by SampleProfileLoader
to check legality of inlining a callee that it wants to inline.

Reviewers: danielcdh, haicheng

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37819

llvm-svn: 313185
2017-09-13 20:16:02 +00:00
Anna Thomas 19529f75b9 [LV] Avoid computing the register usage for default VF. NFC
These are changes to reduce redundant computations when calculating a
feasible vectorization factor:
1. early return when target has no vector registers
2. don't compute register usage for the default VF.

Suggested during review for D37702.

llvm-svn: 313176
2017-09-13 19:35:45 +00:00
Michael Zuckerman 80d3649f23 Refactoring the stride 4 code in the X86interleavedaccess NFC
llvm-svn: 313166
2017-09-13 18:28:09 +00:00
Adrian Prantl 3dcd122151 llvm-dwarfdump: support dumping UUIDs of Mach-O binaries.
This is a feature supported by Darwin dwarfdump. UUIDs are used to
associate executables with their .dSYM bundles.

llvm-svn: 313165
2017-09-13 18:22:59 +00:00
Hiroshi Yamauchi a43913cfaf Add options to dump PGO counts in text.
Summary:
Added text options to -pgo-view-counts and -pgo-view-raw-counts that dump block frequency and branch probability info in text.

This is useful when the graph is very large and complex (the dot command crashes, lines/edges too close to tell apart, hard to navigate without textual search) or simply when text is preferred.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37776

llvm-svn: 313159
2017-09-13 17:20:38 +00:00
Teresa Johnson cbdc5ff628 [ThinLTO] AliasSummary should not have any references
Summary: References should only be on the aliasee.

Reviewers: pcc

Subscribers: llvm-commits, inglorion

Differential Revision: https://reviews.llvm.org/D37814

llvm-svn: 313158
2017-09-13 17:10:24 +00:00
Peter Collingbourne d067c8ed59 ThinLTO: Correctly follow aliasee references when dead stripping.
We were previously handling aliases during dead stripping by adding
the aliased global's "original name" GUID to the worklist. This will
lead to incorrect behaviour if the global has local linkage because
the original name GUID will not correspond to the global's GUID in
the summary.

Because an alias is just another name for the global that it
references, there is no need to mark the referenced global as used,
or to follow references from any other copies of the global. So all
we need to do is to follow references from the aliasee's summary
instead of the alias.

Differential Revision: https://reviews.llvm.org/D37789

llvm-svn: 313157
2017-09-13 17:09:20 +00:00
Alexander Kornienko 208eecd57f Convenience/safety fix for llvm::sys::Execute(And|No)Wait
Summary:
Change the type of the Redirects parameter of llvm::sys::ExecuteAndWait,
ExecuteNoWait and other APIs that wrap them from `const StringRef **` to
`ArrayRef<Optional<StringRef>>`, which is safer and simplifies the use of these
APIs (no more local StringRef variables just to get a pointer to).

Corresponding clang changes will be posted as a separate patch.

Reviewers: bkramer

Reviewed By: bkramer

Subscribers: vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D37563

llvm-svn: 313155
2017-09-13 17:03:37 +00:00
Teresa Johnson 1958083d35 [ThinLTO] For SamplePGO, need to handle ICP targets consistently in thin link
Summary:
SamplePGO indirect call profiles record the target as the original GUID
for statics. The importer had special handling to map to the normal GUID
in that case. The dead global analysis needs the same treatment or
inconsistencies arise, resulting in linker unsats due to some dead
symbols being exported and kept, leaving in references to other dead
symbols that are removed.

This can happen when a SamplePGO profile collected by one binary is used
for a different binary, so the indirect call profiles may not accurately
reflect live targets.

Reviewers: danielcdh

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D37783

llvm-svn: 313151
2017-09-13 15:16:38 +00:00
Petar Jovanovic 50e068158b [mips] correct operand range for DINSM instruction
This patch corrects the definition of the DINSM instruction.
Specification for DINSM instruction for Mips64 says that size operand should
be 2 <= size <= 64, but it is defined as uimm5_inssize_plus1 which gives
range of 1 .. 32.

Patch by Aleksandar Beserminji.

Differential Revision: https://reviews.llvm.org/D37683

llvm-svn: 313149
2017-09-13 14:09:13 +00:00
Mikael Holmen 4eb2a96e7f [MachineScheduler] Put SchedRegion in an anonymous namespace.
Summary: It pollutes the global namespace otherwise.

Patch by: Bevin Hansson

Reviewers: jonpa

Reviewed By: jonpa

Subscribers: MatzeB, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D37555

llvm-svn: 313148
2017-09-13 14:07:47 +00:00
Stefan Pintilie dff606ec3e [Power9] Add missing instructions: extswsli, popcntb
Added the following P9 instructions: extswsli, extswsli., popcntb

Differential Revision: https://reviews.llvm.org/D37342

llvm-svn: 313147
2017-09-13 14:05:27 +00:00
Jonas Devlieghere 81f5abe1ad [MachO] Prevent heap overflow when load command extends past EOF
This patch fixes a heap-buffer-overflow when a malformed Mach-O has a
load command who's size extends past the end of the binary.

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3225

Differential revision: https://reviews.llvm.org/D37439

llvm-svn: 313145
2017-09-13 13:43:01 +00:00
Jonas Devlieghere 27476ce24b [dwarfdump] Rename Brief to Verbose in DIDumpOptions
This patches renames "brief" to "verbose" in de DIDumpOptions and
inverts the logic to match the new behavior where brief is the default.
Changing the default value uncovered some bugs related to the
DIDumpOptions not being propagated and have been fixed as well.

Differential revision: https://reviews.llvm.org/D37745

llvm-svn: 313139
2017-09-13 09:43:05 +00:00
Igor Breger 5c721199dd [GlobalISel][X86] support G_FPEXT operation.
Summary: Support G_FPEXT operation. Selection done via TableGen'erated code.

Reviewers: zvi, guyblank, aymanmus, m_zuckerman

Reviewed By: zvi

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34816

llvm-svn: 313135
2017-09-13 09:05:23 +00:00
Uriel Korach 5d5da5f531 [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm)
This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR.

differential revision: https://reviews.llvm.org/D37693.

llvm-svn: 313134
2017-09-13 09:02:36 +00:00
Mohammed Agabaria e9aebf26af [X86] Adding X86 Processor Families
Adding x86 Processor families to initialize several uArch properties (based on the family)
This patch shows how gather cost can be initialized based on the proc. family

Differential Revision: https://reviews.llvm.org/D35348

llvm-svn: 313132
2017-09-13 09:00:27 +00:00
Craig Topper 2b6bfda561 [X86] Make sure we emit a SUBREG_TO_REG after the MOV32ri when creating a BEXTR64rr instruction from a shift/and pair.
Fixes PR34589.

llvm-svn: 313126
2017-09-13 07:53:21 +00:00
Elena Demikhovsky 6cab129464 [X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector
Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using PMOVZXBD , PMOVSXBD instructions.

llvm-svn: 313121
2017-09-13 06:40:26 +00:00
Ayal Zaks e2a8c0758f [LV] Fix PR34523 - avoid generating redundant selects
When converting a PHI into a series of 'select' instructions to combine the
incoming values together according their edge masks, initialize the first
value to the incoming value In0 of the first predecessor, instead of
generating a redundant assignment 'select(Cond[0], In0, In0)'. The latter
fails when the Cond[0] mask is null, representing a full mask, which can
happen only when there's a single incoming value.

No functional changes intended nor expected other than surviving null Cond[0]'s.

This fix follows D35725, which introduced using null to represent full masks.

Differential Revision: https://reviews.llvm.org/D37619

llvm-svn: 313119
2017-09-13 06:28:37 +00:00
Aditya Kumar dfa8741c96 [GVNHoist] Factor out reachability to search for anticipable instructions quickly
Factor out the reachability such that multiple queries to find reachability of values are fast. This is based on finding
the ANTIC points
in the CFG which do not change during hoisting. The ANTIC points are basically the dominance-frontiers in the inverse
graph. So we introduce a data structure (CHI nodes)
to keep track of values flowing out of a basic block. We only do this for values with multiple occurrences in the
function as they are the potential hoistable candidates.

This patch allows us to hoist instructions to a basic block with >2 successors, as well as deal with infinite loops in a
trivial way.
Relevant test cases are added to show the functionality as well as regression fixes from PR32821.

Regression from previous GVNHoist:
We do not hoist fully redundant expressions because fully redundant expressions are already handled by NewGVN

Differential Revision: https://reviews.llvm.org/D35918
Reviewers: dberlin, sebpop, gberry,

llvm-svn: 313116
2017-09-13 05:28:03 +00:00
Craig Topper 0a3bcebcc2 [X86] Use isUInt<32> to simplify some code. NFC
llvm-svn: 313112
2017-09-13 02:29:59 +00:00
Leslie Zhai 49277d1fea [ARC] Prepare the implementation of relocation for LLD
Reviewers: ruiu, kparzysz, petecoup, rafael

Reviewed By: kparzysz

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37556

llvm-svn: 313109
2017-09-13 01:49:49 +00:00
Reid Kleckner 8a1cd91016 [InstCombine] Add a flag to disable LowerDbgDeclare
Summary:
This should improve optimized debug info for address-taken variables at
the cost of inaccurate debug info in some situations.

We patched this into clang and deployed this change to Chromium
developers, and this significantly improved debuggability of optimized
code. The long-term solution to PR34136 seems more and more like it's
going to take a while, so I would like to commit this change under a
flag so that it can be used as a stop-gap measure.

This flag should really help so for C++ aggregates like std::string and
std::vector, which are typically address-taken, even after inlining, and
cannot be SROA-ed.

Reviewers: aprantl, dblaikie, probinson, dberlin

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D36596

llvm-svn: 313108
2017-09-13 01:43:25 +00:00
Petr Hosek c35fe2b70b [Fuchsia] Magenta -> Zircon
Fuchsia's lowest API layer has been renamed from Magenta to Zircon.
In LLVM proper, this is only mentioned in comments.

Patch by Roland McGrath

Differential Revision: https://reviews.llvm.org/D37763

llvm-svn: 313105
2017-09-13 01:18:06 +00:00
Derek Schuff a519fe5a37 [WebAssembly] Add sign extend instructions from atomics proposal
Select them from ISD::SIGN_EXTEND_INREG

Differential Revision: https://reviews.llvm.org/D37603

remove spurious change

llvm-svn: 313101
2017-09-13 00:29:06 +00:00
Sanjay Patel 659279450e [x86] eliminate unnecessary vector compare for AVX masked store
The masked store instruction only cares about the sign-bit of each mask element,
so the compare s<0 isn't needed.

As noted in PR11210:
https://bugs.llvm.org/show_bug.cgi?id=11210
...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR.
(Although more testing will be needed to confirm that.)

I filed a bug to track improvements for AVX512:
https://bugs.llvm.org/show_bug.cgi?id=34584

Differential Revision: https://reviews.llvm.org/D37446

llvm-svn: 313089
2017-09-12 23:24:05 +00:00
Dehao Chen f3ed14d323 Refactor the code to pass down ACT to SampleProfileLoader correctly.
Summary: This change passes down ACT to SampleProfileLoader for the new PM. Also remove the default value for SampleProfileLoader class as it is not used.

Reviewers: eraman, davidxl

Reviewed By: eraman

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D37773

llvm-svn: 313080
2017-09-12 21:55:55 +00:00
Peter Collingbourne 876da0294a Remove -generate-dwarf-pub-sections flag.
This flag is unnecessary for testing because we can get the coverage
we need by adjusting CU attributes.

Differential Revision: https://reviews.llvm.org/D37725

llvm-svn: 313079
2017-09-12 21:50:55 +00:00
Peter Collingbourne b52e23669c IR: Represent -ggnu-pubnames with a flag on the DICompileUnit.
This allows the flag to be persisted through to LTO.

Differential Revision: https://reviews.llvm.org/D37655

llvm-svn: 313078
2017-09-12 21:50:41 +00:00
Petar Jovanovic e4dacb750d [mips] handle UImm16_AltRelaxed match type
Currently, UImm16_AltRelaxed match type is not handled in
MatchAndEmitInstruction() function, which may result in
llvm_unreachable() behavior.
This patch adds necessary case for this match type.

Patch by Aleksandar Beserminji.

Differential Revision: https://reviews.llvm.org/D37682

llvm-svn: 313077
2017-09-12 21:43:33 +00:00
Alina Sbirlea 80b806bf30 Make promoteLoopAccessesToScalars independent of AliasSet [NFC]
Summary:
The current promoteLoopAccessesToScalars method receives an AliasSet, but
the information used is in fact a list of Value*, known to must alias.
Create the list ahead of time to make this method independent of the AliasSet class.

While there is no functionality change, this adds overhead for creating
a set of Value*, when promotion would normally exit earlier.
This is meant to be as a first refactoring step in order to start replacing
AliasSetTracker with MemorySSA.
And while the end goal is to redesign LICM, the first few steps will focus on
adding MemorySSA as an alternative to the AliasSetTracker using most of the
existing functionality.

Reviewers: mkuper, danielcdh, dberlin

Subscribers: sanjoy, chandlerc, gberry, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D35439

llvm-svn: 313075
2017-09-12 21:18:44 +00:00
Ahmed Bougacha 106dd035a8 [AArch64][GlobalISel] Select all fpexts.
Tablegen already can select these: mark them as legal, remove the
c++ code, and add tests for all types.

llvm-svn: 313074
2017-09-12 21:04:11 +00:00
Ahmed Bougacha a7aa2a9fb1 [AArch64][GlobalISel] Select all fptruncs.
We already support these in tablegen, but we're matching the wrong
operator (libm ftrunc).  Fix that.

While there, drop the c++ code, support COPYs of FPR16, and add tests
for the other types.

llvm-svn: 313073
2017-09-12 21:04:10 +00:00
Lei Huang 34e6621724 Update branch coalescing to be a PowerPC specific pass
Implementing this pass as a PowerPC specific pass.  Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.

Pass is currently off by default. Enabled via -enable-ppc-branch-coalesce.

Differential Revision : https: // reviews.llvm.org/D32776

llvm-svn: 313061
2017-09-12 18:39:11 +00:00
Sam Clegg 2176a9f2a3 [WebAssembly] Remove flags from MCSectionWasm
Looks like these were copied from the ELF sections but
don't apply to Wasm and were not used anywhere.

Also remove unused Wasm methods in MCContext.

Differential Revision: https://reviews.llvm.org/D37633

llvm-svn: 313058
2017-09-12 18:31:24 +00:00
Robert Lougher 51529eb0c2 Revert "[DWARF] Incorrect prologue end line record."
This reverts commit r313047 as it is causing buildbot failure (lldb inline
stepping tests).

llvm-svn: 313057
2017-09-12 18:23:15 +00:00
Yonghong Song 06ff655e59 bpf: Add BPF AsmParser support in LLVM
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 313055
2017-09-12 17:55:23 +00:00
Craig Topper 958106d0f1 [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel
Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so its as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp.

This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we dont' recognize yet. We already had this problem for several other TBM patterns so I think this fine and we can address of them together.

I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before got to isel.

I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch.

Differential Revision: https://reviews.llvm.org/D37592

llvm-svn: 313054
2017-09-12 17:40:25 +00:00
Robert Lougher f696a22d3c [DWARF] Incorrect prologue end line record.
A prologue-end line record is emitted with an incorrect associated address,
which causes a debugger to show the beginning of function body to be inside
the prologue.

Patch written by Carlos Alberto Enciso.

Differential Revision: https://reviews.llvm.org/D37625

llvm-svn: 313047
2017-09-12 16:35:25 +00:00
Anna Thomas 9f1be02fa3 [LV] Clamp the VF to the trip count
Summary:
When the MaxVectorSize > ConstantTripCount, we should just clamp the
vectorization factor to be the ConstantTripCount.
This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF.

Earlier we were finding the maximum vector width, which could be greater than
the trip count itself. The Loop vectorizer does all the work for generating a
vectorizable loop, but in the end we would always choose the scalar loop (since
the VF > trip count). This allows us to choose the VF keeping in mind the trip
count if available.

This is a fix on top of rL312472.

Reviewers: Ayal, zvi, hfinkel, dneilson

Reviewed by: Ayal

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37702

llvm-svn: 313046
2017-09-12 16:32:45 +00:00
Hans Wennborg 8c1eb106bd Revert r313009 "[ARM] Use ADDCARRY / SUBCARRY"
This was causing PR34045 to fire again.

> This is a preparatory step for D34515 and also is being recommitted as its
> first version caused PR34045.
>
> This change:
>  - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
>  - lowering is done by first converting the boolean value into the carry flag
>    using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
>    using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
>    operations does the actual addition.
>  - for subtraction, given that ISD::SUBCARRY second result is actually a
>    borrow, we need to invert the value of the second operand and result before
>    and after using ARMISD::SUBE. We need to invert the carry result of
>    ARMISD::SUBE to preserve the semantics.
>  - given that the generic combiner may lower ISD::ADDCARRY and
>    ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
>    as well otherwise i64 operations now would require branches. This implies
>    updating the corresponding test for unsigned.
>  - add new combiner to remove the redundant conversions from/to carry flags
>    to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
>  - fixes PR34045
>
> Differential Revision: https://reviews.llvm.org/D35192

Also revert follow-up r313010:

> [ARM] Fix typo when creating ISD::SUB nodes
>
> In D35192, I accidentally introduced a typo when creating ISD::SUB nodes,
> giving them two values instead of one.
>
> This fails when the merge_values combiner finds one of these nodes.
>
> This change fixes PR34564.
>
> Differential Revision: https://reviews.llvm.org/D37690

llvm-svn: 313044
2017-09-12 16:24:17 +00:00
Alexey Bataev a26d3e834d [SLP] Fix for PHINode during horizontal reduction scanning, NFC.
Reduces number of loops during instructions analysis.

llvm-svn: 313035
2017-09-12 15:13:50 +00:00
Jonas Paulsson fc4f323ac1 [SystemZ] Add the CoveredBySubRegs bit to GPR64, GPR128 and FPR128 registers.
This bit is needed in order for the CalleeSavedRegs list to automatically
include the super registers if all of their subregs are present.

Thanks to Wei Mi for initially indicating this deficiency in the SystemZ
backend.

Review: Ulrich Weigand.
https://bugs.llvm.org/show_bug.cgi?id=34550

llvm-svn: 313023
2017-09-12 12:11:29 +00:00
Sjoerd Meijer bafde8f3e3 [AArch64] ISel: Add some debug messages to LowerBUILDVECTOR. NFC.
Differential Revision: https://reviews.llvm.org/D37676

llvm-svn: 313017
2017-09-12 10:24:12 +00:00
Yael Tsafrir 47668b5e03 [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR
Differential Revision: https://reviews.llvm.org/D37560

llvm-svn: 313013
2017-09-12 07:50:35 +00:00
Silviu Baranga ac920f7716 [LAA] Allow more run-time alias checks by coercing pointer expressions to AddRecExprs
Summary:
LAA can only emit run-time alias checks for pointers with affine AddRec
SCEV expressions. However, non-AddRecExprs can be now be converted to
affine AddRecExprs using SCEV predicates.

This change tries to add the minimal set of SCEV predicates in order
to enable run-time alias checking.

Reviewers: anemet, mzolotukhin, mkuper, sanjoy, hfinkel

Reviewed By: hfinkel

Subscribers: mssimpso, Ayal, dorit, roman.shirokiy, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D17080

llvm-svn: 313012
2017-09-12 07:48:22 +00:00
Roger Ferrer Ibanez 9df2527b0b [ARM] Fix typo when creating ISD::SUB nodes
In D35192, I accidentally introduced a typo when creating ISD::SUB nodes,
giving them two values instead of one.

This fails when the merge_values combiner finds one of these nodes.

This change fixes PR34564.

Differential Revision: https://reviews.llvm.org/D37690

llvm-svn: 313010
2017-09-12 07:42:28 +00:00
Roger Ferrer Ibanez 4f92b4162f [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515 and also is being recommitted as its
first version caused PR34045.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 313009
2017-09-12 07:40:09 +00:00
Craig Topper fd6be2868e [X86] Fix typo in comment. NFC
llvm-svn: 312990
2017-09-12 01:30:09 +00:00
Hans Wennborg 075e5a2e2b Revert r312898 "[ARM] Use ADDCARRY / SUBCARRY"
It caused PR34564.

> This is a preparatory step for D34515 and also is being recommitted as its
> first version caused PR34045.
>
> This change:
>  - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
>  - lowering is done by first converting the boolean value into the carry flag
>    using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
>    using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
>    operations does the actual addition.
>  - for subtraction, given that ISD::SUBCARRY second result is actually a
>    borrow, we need to invert the value of the second operand and result before
>    and after using ARMISD::SUBE. We need to invert the carry result of
>    ARMISD::SUBE to preserve the semantics.
>  - given that the generic combiner may lower ISD::ADDCARRY and
>    ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
>    as well otherwise i64 operations now would require branches. This implies
>    updating the corresponding test for unsigned.
>  - add new combiner to remove the redundant conversions from/to carry flags
>    to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
>  - fixes PR34045
>
> Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 312980
2017-09-11 23:52:02 +00:00
Yonghong Song be9c00347f bpf: add " ll" in the LD_IMM64 asmstring
This partially revert previous fix in commit f5858045aa0b
("bpf: proper print imm64 expression in inst printer").

In that commit, the original suffix "ll" is removed from
LD_IMM64 asmstring. In the customer print method, the "ll"
suffix is printed if the rhs is an immediate. For example,
"r2 = 5ll" => "r2 = 5ll", and "r3 = varll" => "r3 = var".

This has an issue though for assembler. Since assembler
relies on asmstring to do pattern matching, it will not
be able to distiguish between "mov r2, 5" and
"ld_imm64 r2, 5" since both asmstring is "r2 = 5".
In such cases, the assembler uses 64bit load for all
"r = <val>" asm insts.

This patch adds back " ll" suffix for ld_imm64 with one
additional space for "#reg = #global_var" case.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 312978
2017-09-11 23:43:35 +00:00
Eugene Zelenko 32a4056438 [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 312971
2017-09-11 23:00:48 +00:00
Adrian Prantl 7bc1b28291 llvm-dwarfdump: Replace -debug-dump=sect option with individual options.
As discussed on llvm-dev in
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117301.html
this changes the command line interface of llvm-dwarfdump to match the
one used by the dwarfdump utility shipping on macOS. In addition to
being shorter to type this format also has the advantage of allowing
more than one section to be specified at the same time.

In a nutshell, with this change

  $ llvm-dwarfdump --debug-dump=info
  $ llvm-dwarfdump --debug-dump=apple-objc

becomes

  $ dwarfdump --debug-info --apple-objc

Differential Revision: https://reviews.llvm.org/D37714

llvm-svn: 312970
2017-09-11 22:59:45 +00:00
Peter Collingbourne b9b6025328 LowerTypeTests: Add import/export support for targets without absolute symbol constants.
The rationale is the same as for r312967.

Differential Revision: https://reviews.llvm.org/D37408

llvm-svn: 312968
2017-09-11 22:49:10 +00:00
Peter Collingbourne b15a35e604 WholeProgramDevirt: Add import/export support for targets without absolute symbol constants.
Not all targets support the use of absolute symbols to export
constants. In particular, ARM has a wide variety of constant encodings
that cannot currently be relocated by linkers. So instead of exporting
the constants using symbols, export them directly in the summary.
The values of the constants are left as zeroes on targets that support
symbolic exports.

This may result in more cache misses when targeting those architectures
as a result of arbitrary changes in constant values, but this seems
somewhat unavoidable for now.

Differential Revision: https://reviews.llvm.org/D37407

llvm-svn: 312967
2017-09-11 22:34:42 +00:00
Matt Arsenault 537bd3b906 AMDGPU: Allow coldcc calls
llvm-svn: 312936
2017-09-11 18:54:20 +00:00
Petar Jovanovic d4f3723c56 [mips][microMIPS] add lapc instruction
Implement LAPC instruction for mips32r6, mips64r6 and micromips32r6.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D35984

llvm-svn: 312934
2017-09-11 18:34:04 +00:00
Hiroshi Yamauchi 9364432cec Unmerge GEPs to reduce register pressure on IndirectBr edges.
Summary:
GEP merging can sometimes increase the number of live values and register
pressure across control edges and cause performance problems particularly if the
increased register pressure results in spills.

This change implements GEP unmerging around an IndirectBr in certain cases to
mitigate the issue. This is in the CodeGenPrepare pass (after all the GEP
merging has happened.)

With this patch, the Python interpreter loop runs faster by ~5%.

Reviewers: sanjoy, hfinkel

Reviewed By: hfinkel

Subscribers: eastig, junbuml, llvm-commits

Differential Revision: https://reviews.llvm.org/D36772

llvm-svn: 312930
2017-09-11 17:52:08 +00:00
Stanislav Mekhanoshin 710da42b86 [AMDGPU] Produce madak and madmk from the two-address pass
These two instructions are normally selected, but when the
two address pass converts mac into mad we end up with the
mad where we could have one of these.

Differential Revision: https://reviews.llvm.org/D37389

llvm-svn: 312928
2017-09-11 17:13:57 +00:00
Craig Topper 7b02020c7f [X86] Remove portions of r275950 that are no longer needed with i1 not being a legal type
Summary:
r275950 added support for turning (trunc (X >> N) to i1) into BT(X, N). But that's no longer necessary now that i1 isn't legal.

This patch removes the support for that, but preserves some of the refactorings done in that commit.

Reviewers: guyblank, RKSimon, spatel, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37673

llvm-svn: 312925
2017-09-11 16:16:48 +00:00
Craig Topper 8dff57a0ed [SelectionDAG] Remove a check for type being a vector type after calling getShiftAmountTy. NFCI
getShiftAmountTy already returns the vector type when called for vectors.

llvm-svn: 312924
2017-09-11 16:15:39 +00:00
Marcello Maggioni ce90060d1c [ScalarEvolution] Refactor forgetLoop() to improve performance
forgetLoop() has pretty bad performance because it goes over
the same instructions over and over again in particular when
nested loop are involved.
The refactoring changes the function to a not-recursive function
and reusing the allocation for data-structures and the Visited
set.

NFCI

Differential Revision: https://reviews.llvm.org/D37659

llvm-svn: 312920
2017-09-11 15:44:20 +00:00
Matt Arsenault eb4474cf20 Fix typo
llvm-svn: 312919
2017-09-11 15:23:22 +00:00
Simon Pilgrim b092bd321a [X86][SSE] Add support for X86ISD::PACKSS to ComputeNumSignBitsForTargetNode
Helps improve combineLogicBlendIntoPBLENDV support by allowing us to peek into through PACKSS truncations of vector comparison results.

Differential Revision: https://reviews.llvm.org/D37680

llvm-svn: 312916
2017-09-11 14:03:47 +00:00
Tim Renouf 660ba2b8af [AMDGPU] exp should not be in WQM mode
A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports
the exec mask as the valid mask to determine which pixels to render.

This commit marks any exp as needing to be in exact mode.

Actually, if there are multiple mrt exps, only one needs to have vm=1,
and only that one needs to be in exact mode. But that is an optimization
for another day.

Differential Revision: https://reviews.llvm.org/D36305

llvm-svn: 312915
2017-09-11 13:55:39 +00:00
Sanjay Patel fa877fd464 [InstSimplify] reorder methods; NFC
I'm trying to refactor some shared code for integer div/rem,
but I keep having to scroll through fdiv. The FP ops have
nothing in common with the integer ops, so I'm moving FP
below everything else. 

While here, improve a couple of comments and fix some formatting.

llvm-svn: 312913
2017-09-11 13:34:27 +00:00
Andre Vieira c429aabb91 [ARM] Enable the use of SVC anywhere in an IT block
Differential Revision: https://reviews.llvm.org/D37374

llvm-svn: 312908
2017-09-11 11:11:17 +00:00
Dylan McKay 0fc5fe0a58 [AVR] Enable the '__do_copy_data' function
Also enables '__do_clear_bss'.

These functions are automaticalled called by the CRT if they are
declared.

We need these to be called otherwise RAM will start completely
uninitialised, even though we need to copy RAM variables from progmem to
RAM.

llvm-svn: 312905
2017-09-11 10:32:51 +00:00
Igor Breger 1f14364d64 [GlobalISel][X86] G_ANYEXT support.
Summary: G_ANYEXT support

Reviewers: zvi, delena

Reviewed By: delena

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D37675

llvm-svn: 312903
2017-09-11 09:41:13 +00:00
Tim Renouf 6cb007fc72 AMDGPU: trivial comment change
... to check commit access for new committer.

llvm-svn: 312900
2017-09-11 08:31:32 +00:00
Roger Ferrer Ibanez 12b20f2307 [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515 and also is being recommitted as its
first version caused PR34045.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 312898
2017-09-11 07:38:05 +00:00
Elena Demikhovsky cc477bbcea Fixed a bug in splitting Scatter operation in the Type Legalizer.
After the split of the Scatter operation, the order of the new instructions is well defined - Lo goes before Hi. Otherwise the semantic of Scatter (from LSB to MSB) is broken.
I'm chaining 2 nodes to prevent reordering.

Differential Revision https://reviews.llvm.org/D37670

llvm-svn: 312894
2017-09-11 06:18:15 +00:00
Simon Pilgrim 5e2ed8beb1 [X86][SSE] Tidyup + clang-format combineX86ShuffleChain call. NFCI.
llvm-svn: 312887
2017-09-10 18:18:45 +00:00
Simon Pilgrim ff347d3ea4 [X86][SSE] Move combineTo call out of combineX86ShufflesConstants. NFCI.
Move towards making it possible to use the shuffle combines for cases where we don't want to call DCI.CombineTo() with the result.

llvm-svn: 312886
2017-09-10 18:10:49 +00:00
Sanjay Patel 5876189ff1 [InstSimplify] refactor udiv/urem code and add tests; NFCI
This removes some duplicated code and makes it easier to support signed div/rem
in a similar way if we want to do that. Note that the existing comments were not
accurate - we don't need a constant divisor to simplify; icmp simplification does
more than that. But as the added tests show, it could go even further.

llvm-svn: 312885
2017-09-10 17:55:08 +00:00
Simon Pilgrim 9a95e1afd0 [X86][SSE] Move combineTo call out of combineX86ShuffleChain. NFCI.
First step towards making it possible to use the shuffle combines for cases where we don't want to call DCI.CombineTo() with the result.

llvm-svn: 312884
2017-09-10 14:06:41 +00:00
Coby Tayree ef66b3bbab [X86][X86AsmParser] adding const on InlineAsmIdentifierInfo in CreateMemForInlineAsm. NFC.
llvm-svn: 312881
2017-09-10 12:21:24 +00:00
Uriel Korach 01dfd3d1e3 Revert "adding autoUpgrade support to broadcast[f|i]32x2 intrinsics"
This reverts commit r312879 - An accidental partial commit.

llvm-svn: 312880
2017-09-10 09:07:21 +00:00
Uriel Korach 3eb10a79e5 adding autoUpgrade support to broadcast[f|i]32x2 intrinsics
llvm-svn: 312879
2017-09-10 08:40:13 +00:00
Uriel Korach 18972237a2 Test commit
llvm-svn: 312878
2017-09-10 08:31:22 +00:00
Nuno Lopes 404f106d71 Merge isKnownNonNull into isKnownNonZero
It now knows the tricks of both functions.
Also, fix a bug that considered allocas of non-zero address space to be always non null

Differential Revision: https://reviews.llvm.org/D37628

llvm-svn: 312869
2017-09-09 18:23:11 +00:00
Craig Topper 3be1db82b6 [X86] Don't disable slow INC/DEC if optimizing for size
Summary:
Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size.

This appears to match gcc behavior.

Reviewers: chandlerc, zvi, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37177

llvm-svn: 312866
2017-09-09 17:11:59 +00:00
Sanjay Patel 6fd4391ddd [DivRempairs] add a pass to optimize div/rem pairs (PR31028)
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented 
as an independent pass, so there's no stretching of scope and feature creep for an existing pass. 
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost 
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028

The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow 
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.

Decomposing remainder may allow removing some code from the backend (PPC and possibly others).

Differential Revision: https://reviews.llvm.org/D37121 

llvm-svn: 312862
2017-09-09 13:38:18 +00:00
Craig Topper 6bed9de3d5 [X86] Call removeDeadNode when we're done doing custom isel for mul, div and test
Summary:
Once we've done our custom isel for these nodes, I think we should be calling removeDeadNode to prune them out of the DAG. Table driven isel ultimately either calls morphNodeTo which modifies a node and doesn't leave dead nodes. Or it emits new nodes and then calls removeDeadNode as part of Opc_CompleteMatch.

If you run a simple multiply test case like this through llc with -debug you'll see a umul_lohi node get printed as part of the dump for Instruction Selection ends.

```
define i64 @foo(i64 %a, i64 %b) local_unnamed_addr #0 {
entry:
  %conv = zext i64 %a to i128
  %conv1 = zext i64 %b to i128
  %mul = mul nuw nsw i128 %conv1, %conv
  %shr = lshr i128 %mul, 64
  %conv2 = trunc i128 %shr to i64
  ret i64 %conv2
}
```

Reviewers: RKSimon, spatel, zvi, guyblank, niravd

Reviewed By: niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37547

llvm-svn: 312857
2017-09-09 05:57:20 +00:00
Craig Topper 63c5047a4e [X86] Use ReplaceNode instead of ReplaceUses when converting X86ISD::SHRUNKBLEND to ISD::VSELECT during isel.
This ensures that the SHRUNKBLEND node gets erased immediately.

llvm-svn: 312856
2017-09-09 05:57:19 +00:00
Kostya Serebryany 4192b96313 [sanitizer-coverage] call appendToUsed once per module, not once per function (which is too slow)
llvm-svn: 312855
2017-09-09 05:30:13 +00:00
Alexey Bataev 628fbcae4c [SLP] Fix buildbots, NFC.
llvm-svn: 312853
2017-09-09 02:08:45 +00:00
Matthias Braun 6b2b88b071 RegAllocFast: Fix warning; NFC
llvm-svn: 312852
2017-09-09 01:16:59 +00:00
Matthias Braun 864cf585ff RegAllocFast: Cleanup; NFC
- Use range based for
- Variable names should start with upper case
- Add `const`
- Change class name to match filename
- Fix doxygen comments
- Use MCPhysReg instead of unsigned
- Use references instead of pointers where things cannot be nullptr
- Misc coding style improvements

llvm-svn: 312846
2017-09-09 00:52:46 +00:00
Matthias Braun a09d18deb0 RegAllocFast: Move vector to class level to avoid reallocation; NFC
llvm-svn: 312845
2017-09-09 00:52:45 +00:00
Matthias Braun a5225e8cb0 RegAllocFast: Remove write-only set; NFC
llvm-svn: 312844
2017-09-09 00:52:42 +00:00
Kyle Butt 8c0314c3ed PPC: Don't select lxv/stxv for insufficiently aligned stack slots.
The lxv/stxv instructions require an offset that is 0 % 16. Previously we were
selecting lxv/stxv for loads and stores to the stack where the offset from the
slot was a multiple of 16, but the stack slot was not 16 or more byte aligned.
When the frame gets lowered these transform to r(1|31) + slot + offset.
If slot is not aligned, slot + offset may not be 0 % 16.
Now we require 16 byte or more alignment for select lxv/stxv to stack slots.

Includes a testcase that shows both sufficiently and insufficiently aligned
stack slots.

llvm-svn: 312843
2017-09-09 00:37:56 +00:00
Davide Italiano 0731a4f52a [AMDGPU] Remove unused function. NFCI.
llvm-svn: 312836
2017-09-08 23:54:11 +00:00
Yonghong Song 093420f929 bpf: proper print imm64 expression in inst printer
Fixed an issue in printImm64Operand where if the value is
an expression, print out the expression properly. Currently,
it will print
  r1 = <MCOperand Expr:(tx_port)>ll
With the patch, the printout will be
  r1 = tx_port

Suggested-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 312833
2017-09-08 23:32:38 +00:00
Guozhi Wei 62d6414465 [TargetTransformInfo] Add a new public interface getInstructionCost
Current TargetTransformInfo can support throughput cost model and code size model, but sometimes we also need instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different cost of an instruction. So I proposed following interface:

  enum TargetCostKind {
    TCK_RecipThroughput, ///< Reciprocal throughput.
    TCK_Latency,         ///< The latency of instruction.
    TCK_CodeSize         ///< Instruction code size.
  };

  int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const;

All clients should mainly use this function to query the cost of an instruction, parameter <kind> specifies the desired cost model.

This patch also provides a simple default implementation of getInstructionLatency.

The default getInstructionLatency provides latency numbers for only small number of instruction classes, those latency numbers are only reasonable for modern OOO processors. It can be extended in following ways:

   Add more detail into this function.
   Add getXXXLatency function and call it from here.
   Implement target specific getInstructionLatency function.

Differential Revision: https://reviews.llvm.org/D37170

llvm-svn: 312832
2017-09-08 22:29:17 +00:00
Matt Arsenault 461ed08fbd AMDGPU: Start using !con operator
We have a lot of operand definition work essentially producing
every valid permutation of operands to workaround builiding
operand lists based on the instruction features. Apparently tablegen
already has a mostly undocumented operator to concat dags which
simplies this.

Convert one simple place to use this. The BUF instruction definitions
have much more complicated logic that can be totally rewritten now.

llvm-svn: 312822
2017-09-08 19:09:13 +00:00
Matt Arsenault 2f4df7ec41 AMDGPU: Recompute scc liveness
The various scalar bit operations set SCC,
so one is erased or moved it needs to be recomputed.
Not sure why the existing tests don't fail on this.

llvm-svn: 312819
2017-09-08 18:51:26 +00:00
Vedant Kumar 79a1b5ee5a [Coverage] Build sorted and unique segments
A coverage segment contains a starting line and column, an execution
count, and some other metadata. Clients of the coverage library use
segments to prepare line-oriented reports.

Users of the coverage library depend on segments being unique and sorted
in source order. Currently this is not guaranteed (this is why the clang
change which introduced deferred regions was reverted).

This commit documents the "unique and sorted" condition and asserts that
it holds. It also fixes the SegmentBuilder so that it produces correct
output in some edge cases.

Testing: I've added unit tests for some edge cases. I've also checked
that the new SegmentBuilder implementation is fully covered. Apart from
running check-profile and the llvm-cov tests, I've successfully used a
stage1 llvm-cov to prepare a coverage report for an instrumented clang
binary.

Differential Revision: https://reviews.llvm.org/D36813

llvm-svn: 312817
2017-09-08 18:44:50 +00:00
Vedant Kumar efcf41b528 [Coverage] Define LineColPair for convenience. NFC.
llvm-svn: 312815
2017-09-08 18:44:48 +00:00
Vedant Kumar bae8397006 [Coverage] Report errors when reading malformed source regions
Each source region has a start and end location. Report an error when
the end location does not precede the begin location.

The old lineExecutionCounts.covmapping test actually had a buggy source
region in it. This commit introduces a regenerated copy of the coverage
and moves the old copy to malformedRegions.covmapping, for a test.

Differential Revision: https://reviews.llvm.org/D37387

llvm-svn: 312814
2017-09-08 18:44:47 +00:00
Chandler Carruth 38e2b506db [x86] Fix GCC pedantic warnings about default arguments for lambdas.
llvm-svn: 312809
2017-09-08 18:23:42 +00:00
Dinar Temirbulatov 0d31f0af43 [SLPVectorizer] Add struct InstructionsState that holds information about analysis of vector to be vectorized.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev, davide

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D37212

llvm-svn: 312802
2017-09-08 17:08:17 +00:00
Wei Mi 5d84d9b35c Fix a bug for rL312641.
rL312641 Allowed llvm.memcpy/memset/memmove to be tail calls when parent
function return the intrinsics's first argument. However on arm-none-eabi
platform, llvm.memcpy will be expanded to __aeabi_memcpy which doesn't
have return value. The fix is to check the libcall name after expansion
to match "memcpy/memset/memmove" before allowing those intrinsic to be
tail calls.

llvm-svn: 312799
2017-09-08 16:44:52 +00:00
Krzysztof Parzyszek f78eca8fb5 Preserve existing regs when adding pristines to LivePhysRegs/LiveRegUnits
Differential Revision: https://reviews.llvm.org/D37600

llvm-svn: 312797
2017-09-08 16:29:50 +00:00
Alexey Bataev bd4a361739 [SLP] Fix the warning about paths not returning the value, NFC.
llvm-svn: 312793
2017-09-08 14:32:20 +00:00
Alexey Bataev 6dd29fccb8 [SLP] Support for horizontal min/max reduction.
SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
Patch fixes PR26956.

Differential revision: https://reviews.llvm.org/D27846

llvm-svn: 312791
2017-09-08 13:49:36 +00:00
Max Kazantsev d7b0f74c64 Re-enable "[IRCE] Identify loops with latch comparison against current IV value"
Re-applying after the found bug was fixed.

Differential Revision: https://reviews.llvm.org/D36215

llvm-svn: 312783
2017-09-08 10:15:05 +00:00
Jonas Devlieghere f4ed65da04 [dwarfdump] Verify line table prologue
This patch adds prologue verification, which is already present in
Apple's dwarfdump. It checks for invalid directory indices and warns
about duplicate file paths.

Differential revision: https://reviews.llvm.org/D37511

llvm-svn: 312782
2017-09-08 09:48:51 +00:00
Martin Storsjo d085200a11 [llvm-dlltool] Mention arm64 in the lists of architecture alternatives
This was missed in SVN r310223 when arm64 support was added.

Differential Revision: https://reviews.llvm.org/D37588

llvm-svn: 312776
2017-09-08 06:49:46 +00:00
Max Kazantsev 57db44838d diff --git a/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp b/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
index f72a808..9fa49fd 100644
--- a/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
+++ b/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
@@ -450,20 +450,10 @@ struct LoopStructure {
   // equivalent to:
   //
   // intN_ty inc = IndVarIncreasing ? 1 : -1;
-  // pred_ty predicate = IndVarIncreasing
-  //                         ? IsSignedPredicate ? ICMP_SLT : ICMP_ULT
-  //                         : IsSignedPredicate ? ICMP_SGT : ICMP_UGT;
+  // pred_ty predicate = IndVarIncreasing ? ICMP_SLT : ICMP_SGT;
   //
-  //
-  // for (intN_ty iv = IndVarStart; predicate(IndVarBase, LoopExitAt);
-  //      iv = IndVarNext)
+  // for (intN_ty iv = IndVarStart; predicate(iv, LoopExitAt); iv = IndVarBase)
   //   ... body ...
-  //
-  // Here IndVarBase is either current or next value of the induction variable.
-  // in the former case, IsIndVarNext = false and IndVarBase points to the
-  // Phi node of the induction variable. Otherwise, IsIndVarNext = true and
-  // IndVarBase points to IV increment instruction.
-  //
 
   Value *IndVarBase;
   Value *IndVarStart;
@@ -471,13 +461,12 @@ struct LoopStructure {
   Value *LoopExitAt;
   bool IndVarIncreasing;
   bool IsSignedPredicate;
-  bool IsIndVarNext;
 
   LoopStructure()
       : Tag(""), Header(nullptr), Latch(nullptr), LatchBr(nullptr),
         LatchExit(nullptr), LatchBrExitIdx(-1), IndVarBase(nullptr),
         IndVarStart(nullptr), IndVarStep(nullptr), LoopExitAt(nullptr),
-        IndVarIncreasing(false), IsSignedPredicate(true), IsIndVarNext(false) {}
+        IndVarIncreasing(false), IsSignedPredicate(true) {}
 
   template <typename M> LoopStructure map(M Map) const {
     LoopStructure Result;
@@ -493,7 +482,6 @@ struct LoopStructure {
     Result.LoopExitAt = Map(LoopExitAt);
     Result.IndVarIncreasing = IndVarIncreasing;
     Result.IsSignedPredicate = IsSignedPredicate;
-    Result.IsIndVarNext = IsIndVarNext;
     return Result;
   }
 
@@ -841,42 +829,21 @@ LoopStructure::parseLoopStructure(ScalarEvolution &SE,
     return false;
   };
 
-  // `ICI` can either be a comparison against IV or a comparison of IV.next.
-  // Depending on the interpretation, we calculate the start value differently.
+  // `ICI` is interpreted as taking the backedge if the *next* value of the
+  // induction variable satisfies some constraint.
 
-  // Pair {IndVarBase; IsIndVarNext} semantically designates whether the latch
-  // comparisons happens against the IV before or after its value is
-  // incremented. Two valid combinations for them are:
-  //
-  // 1) { phi [ iv.start, preheader ], [ iv.next, latch ]; false },
-  // 2) { iv.next; true }.
-  //
-  // The latch comparison happens against IndVarBase which can be either current
-  // or next value of the induction variable.
   const SCEVAddRecExpr *IndVarBase = cast<SCEVAddRecExpr>(LeftSCEV);
   bool IsIncreasing = false;
   bool IsSignedPredicate = true;
-  bool IsIndVarNext = false;
   ConstantInt *StepCI;
   if (!IsInductionVar(IndVarBase, IsIncreasing, StepCI)) {
     FailureReason = "LHS in icmp not induction variable";
     return None;
   }
 
-  const SCEV *IndVarStart = nullptr;
-  // TODO: Currently we only handle comparison against IV, but we can extend
-  // this analysis to be able to deal with comparison against sext(iv) and such.
-  if (isa<PHINode>(LeftValue) &&
-      cast<PHINode>(LeftValue)->getParent() == Header)
-    // The comparison is made against current IV value.
-    IndVarStart = IndVarBase->getStart();
-  else {
-    // Assume that the comparison is made against next IV value.
-    const SCEV *StartNext = IndVarBase->getStart();
-    const SCEV *Addend = SE.getNegativeSCEV(IndVarBase->getStepRecurrence(SE));
-    IndVarStart = SE.getAddExpr(StartNext, Addend);
-    IsIndVarNext = true;
-  }
+  const SCEV *StartNext = IndVarBase->getStart();
+  const SCEV *Addend = SE.getNegativeSCEV(IndVarBase->getStepRecurrence(SE));
+  const SCEV *IndVarStart = SE.getAddExpr(StartNext, Addend);
   const SCEV *Step = SE.getSCEV(StepCI);
 
   ConstantInt *One = ConstantInt::get(IndVarTy, 1);
@@ -1060,7 +1027,6 @@ LoopStructure::parseLoopStructure(ScalarEvolution &SE,
   Result.IndVarIncreasing = IsIncreasing;
   Result.LoopExitAt = RightValue;
   Result.IsSignedPredicate = IsSignedPredicate;
-  Result.IsIndVarNext = IsIndVarNext;
 
   FailureReason = nullptr;
 
@@ -1350,9 +1316,8 @@ LoopConstrainer::RewrittenRangeInfo LoopConstrainer::changeIterationSpaceEnd(
                                       BranchToContinuation);
 
     NewPHI->addIncoming(PN->getIncomingValueForBlock(Preheader), Preheader);
-    auto *FixupValue =
-        LS.IsIndVarNext ? PN->getIncomingValueForBlock(LS.Latch) : PN;
-    NewPHI->addIncoming(FixupValue, RRI.ExitSelector);
+    NewPHI->addIncoming(PN->getIncomingValueForBlock(LS.Latch),
+                        RRI.ExitSelector);
     RRI.PHIValuesAtPseudoExit.push_back(NewPHI);
   }
 
@@ -1735,10 +1700,7 @@ bool InductiveRangeCheckElimination::runOnLoop(Loop *L, LPPassManager &LPM) {
   }
   LoopStructure LS = MaybeLoopStructure.getValue();
   const SCEVAddRecExpr *IndVar =
-      cast<SCEVAddRecExpr>(SE.getSCEV(LS.IndVarBase));
-  if (LS.IsIndVarNext)
-    IndVar = cast<SCEVAddRecExpr>(SE.getMinusSCEV(IndVar,
-                                                  SE.getSCEV(LS.IndVarStep)));
+      cast<SCEVAddRecExpr>(SE.getMinusSCEV(SE.getSCEV(LS.IndVarBase), SE.getSCEV(LS.IndVarStep)));
 
   Optional<InductiveRangeCheck::Range> SafeIterRange;
   Instruction *ExprInsertPt = Preheader->getTerminator();
diff --git a/test/Transforms/IRCE/latch-comparison-against-current-value.ll b/test/Transforms/IRCE/latch-comparison-against-current-value.ll
deleted file mode 100644
index afea0e6..0000000
--- a/test/Transforms/IRCE/latch-comparison-against-current-value.ll
+++ /dev/null
@@ -1,182 +0,0 @@
-; RUN: opt -verify-loop-info -irce-print-changed-loops -irce -S < %s 2>&1 | FileCheck %s
-
-; Check that IRCE is able to deal with loops where the latch comparison is
-; done against current value of the IV, not the IV.next.
-
-; CHECK: irce: in function test_01: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-; CHECK: irce: in function test_02: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-; CHECK-NOT: irce: in function test_03: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-; CHECK-NOT: irce: in function test_04: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-
-; SLT condition for increasing loop from 0 to 100.
-define void @test_01(i32* %arr, i32* %a_len_ptr) #0 {
-
-; CHECK:      test_01
-; CHECK:        entry:
-; CHECK-NEXT:     %exit.mainloop.at = load i32, i32* %a_len_ptr, !range !0
-; CHECK-NEXT:     [[COND2:%[^ ]+]] = icmp slt i32 0, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND2]], label %loop.preheader, label %main.pseudo.exit
-; CHECK:        loop:
-; CHECK-NEXT:     %idx = phi i32 [ %idx.next, %in.bounds ], [ 0, %loop.preheader ]
-; CHECK-NEXT:     %idx.next = add nuw nsw i32 %idx, 1
-; CHECK-NEXT:     %abc = icmp slt i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 true, label %in.bounds, label %out.of.bounds.loopexit1
-; CHECK:        in.bounds:
-; CHECK-NEXT:     %addr = getelementptr i32, i32* %arr, i32 %idx
-; CHECK-NEXT:     store i32 0, i32* %addr
-; CHECK-NEXT:     %next = icmp slt i32 %idx, 100
-; CHECK-NEXT:     [[COND3:%[^ ]+]] = icmp slt i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND3]], label %loop, label %main.exit.selector
-; CHECK:        main.exit.selector:
-; CHECK-NEXT:     %idx.lcssa = phi i32 [ %idx, %in.bounds ]
-; CHECK-NEXT:     [[COND4:%[^ ]+]] = icmp slt i32 %idx.lcssa, 100
-; CHECK-NEXT:     br i1 [[COND4]], label %main.pseudo.exit, label %exit
-; CHECK-NOT: loop.preloop:
-; CHECK:        loop.postloop:
-; CHECK-NEXT:    %idx.postloop = phi i32 [ %idx.copy, %postloop ], [ %idx.next.postloop, %in.bounds.postloop ]
-; CHECK-NEXT:     %idx.next.postloop = add nuw nsw i32 %idx.postloop, 1
-; CHECK-NEXT:     %abc.postloop = icmp slt i32 %idx.postloop, %exit.mainloop.at
-; CHECK-NEXT:     br i1 %abc.postloop, label %in.bounds.postloop, label %out.of.bounds.loopexit
-
-entry:
-  %len = load i32, i32* %a_len_ptr, !range !0
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %abc = icmp slt i32 %idx, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp slt i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-; ULT condition for increasing loop from 0 to 100.
-define void @test_02(i32* %arr, i32* %a_len_ptr) #0 {
-
-; CHECK:      test_02
-; CHECK:        entry:
-; CHECK-NEXT:     %exit.mainloop.at = load i32, i32* %a_len_ptr, !range !0
-; CHECK-NEXT:     [[COND2:%[^ ]+]] = icmp ult i32 0, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND2]], label %loop.preheader, label %main.pseudo.exit
-; CHECK:        loop:
-; CHECK-NEXT:     %idx = phi i32 [ %idx.next, %in.bounds ], [ 0, %loop.preheader ]
-; CHECK-NEXT:     %idx.next = add nuw nsw i32 %idx, 1
-; CHECK-NEXT:     %abc = icmp ult i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 true, label %in.bounds, label %out.of.bounds.loopexit1
-; CHECK:        in.bounds:
-; CHECK-NEXT:     %addr = getelementptr i32, i32* %arr, i32 %idx
-; CHECK-NEXT:     store i32 0, i32* %addr
-; CHECK-NEXT:     %next = icmp ult i32 %idx, 100
-; CHECK-NEXT:     [[COND3:%[^ ]+]] = icmp ult i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND3]], label %loop, label %main.exit.selector
-; CHECK:        main.exit.selector:
-; CHECK-NEXT:     %idx.lcssa = phi i32 [ %idx, %in.bounds ]
-; CHECK-NEXT:     [[COND4:%[^ ]+]] = icmp ult i32 %idx.lcssa, 100
-; CHECK-NEXT:     br i1 [[COND4]], label %main.pseudo.exit, label %exit
-; CHECK-NOT: loop.preloop:
-; CHECK:        loop.postloop:
-; CHECK-NEXT:    %idx.postloop = phi i32 [ %idx.copy, %postloop ], [ %idx.next.postloop, %in.bounds.postloop ]
-; CHECK-NEXT:     %idx.next.postloop = add nuw nsw i32 %idx.postloop, 1
-; CHECK-NEXT:     %abc.postloop = icmp ult i32 %idx.postloop, %exit.mainloop.at
-; CHECK-NEXT:     br i1 %abc.postloop, label %in.bounds.postloop, label %out.of.bounds.loopexit
-
-entry:
-  %len = load i32, i32* %a_len_ptr, !range !0
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %abc = icmp ult i32 %idx, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp ult i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-; Same as test_01, but comparison happens against IV extended to a wider type.
-; This test ensures that IRCE rejects it and does not falsely assume that it was
-; a comparison against iv.next.
-; TODO: We can actually extend the recognition to cover this case.
-define void @test_03(i32* %arr, i64* %a_len_ptr) #0 {
-
-; CHECK:      test_03
-
-entry:
-  %len = load i64, i64* %a_len_ptr, !range !1
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %idx.ext = sext i32 %idx to i64
-  %abc = icmp slt i64 %idx.ext, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp slt i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-; Same as test_02, but comparison happens against IV extended to a wider type.
-; This test ensures that IRCE rejects it and does not falsely assume that it was
-; a comparison against iv.next.
-; TODO: We can actually extend the recognition to cover this case.
-define void @test_04(i32* %arr, i64* %a_len_ptr) #0 {
-
-; CHECK:      test_04
-
-entry:
-  %len = load i64, i64* %a_len_ptr, !range !1
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %idx.ext = sext i32 %idx to i64
-  %abc = icmp ult i64 %idx.ext, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp ult i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-!0 = !{i32 0, i32 50}
-!1 = !{i64 0, i64 50}

llvm-svn: 312775
2017-09-08 04:26:41 +00:00
Adrian Prantl 99ba97726a Fix a crash when emitting debug info for multi-reg function arguments
by reusing more of the existing machinery

This is a follow-up to r312169.
Thanks to Björn Pettersson for the testcase!

llvm-svn: 312773
2017-09-08 02:31:37 +00:00
Dean Michael Berris 711dec260f [XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC
Summary:
This fixes code-gen for XRay in PPC. The regression wasn't caught by
codegen tests  which we add in this change.

What happened was the following:

- For tail exits, we used to unconditionally prepend the returns/exits
  with a pseudo-instruction that gets lowered to the instrumentation
  sled (and leave the actual return/exit instruction as-is).
- Changes to the XRay instrumentation pass caused the tail exits to
  suddenly also emit the tail exit pseudo-instruction, since the check
  for whether a return instruction was also a call instruction meant it
  was a tail exit instruction.
- None of the tests caught the regression either due to non-existent
  tests, or the tests being disabled/removed for continuous breakage.

This change re-introduces some of the basic tests and verifies that
we're back to a state that allows the back-end to generate appropriate
XRay instrumented binaries for PPC in the presence of tail exits.

Reviewers: echristo, timshen

Subscribers: nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D37570

llvm-svn: 312772
2017-09-08 01:47:56 +00:00
Chandler Carruth acbcf06f03 [x86] Flesh out the custom ISel for RMW aritmetic ops with used flags to
cover the bitwise operators.

Nothing really exciting here, this just stamps out the rest of the core
operations that can RMW memory and set flags.

Still not implemented here: ADC, SBB. Those will require more
interesting logic to channel the flags *in*, and I'm not currently
planning to try to tackle that. It might be interesting for someone who
wants to improve our code generation for bignum implementations.

Differential Revision: https://reviews.llvm.org/D37141

llvm-svn: 312768
2017-09-08 00:17:12 +00:00
Peter Collingbourne 88a58cf9e7 WholeProgramDevirt: When promoting for single-impl devirt, also rename the comdat.
This is required when targeting COFF, as the comdat name must match
one of the names of the symbols in the comdat.

Differential Revision: https://reviews.llvm.org/D37550

llvm-svn: 312767
2017-09-08 00:10:53 +00:00
Chandler Carruth 52a31bf268 [x86] Extend the manual ISel of `add` and `sub` with both RMW memory
operands and used flags to support matching immediate operands.

This is a bit trickier than register operands, and we still want to fall
back on a register operands even for things that appear to be
"immediates" when they won't actually select into the operation's
immediate operand. This also requires us to handle things like selecting
`sub` vs. `add` to minimize the number of bits needed to represent the
immediate, and picking the shortest immediate encoding. In order to
that, we in turn need to scan to make sure that CF isn't used as it will
get inverted.

The end result seems very nice though, and we're now generating
optimal instruction sequences for these patterns IMO.

A follow-up patch will further expand this to other operations with RMW
memory operands. But handing `add` and `sub` are useful starting points
to flesh out the machinery and make sure interesting and complex cases
can be handled.

Thanks to Craig Topper who provided a few fixes and improvements to this
patch in addition to the review!

Differential Revision: https://reviews.llvm.org/D37139

llvm-svn: 312764
2017-09-07 23:54:24 +00:00
Rafael Espindola 39c150eecb Don't call exit from cl::PrintHelpMessage.
Most callers were not expecting the exit(0) and trying to exit with a
different value.

This also adds back the call to cl::PrintHelpMessage in llvm-ar.

llvm-svn: 312761
2017-09-07 23:30:48 +00:00
Eugene Zelenko 975293f0e5 [Bitcode] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 312760
2017-09-07 23:28:24 +00:00
Reid Kleckner 0e8c4bb055 Sink some IntrinsicInst.h and Intrinsics.h out of llvm/include
Many of these uses can get by with forward declarations. Hopefully this
speeds up compilation after adding a single intrinsic.

llvm-svn: 312759
2017-09-07 23:27:44 +00:00
Richard Trieu c7828ebea4 Revert r312318, r312325, r312424, r312489
r312318 - Debug info for variables whose type is shrinked to bool
r312325, r312424, r312489 - Test case for r312318

Revision 312318 introduced a null dereference bug.
Details in https://bugs.llvm.org/show_bug.cgi?id=34490

llvm-svn: 312758
2017-09-07 23:20:35 +00:00
Reid Kleckner 3cdf713fd2 Move duplicate helpers from DbgValueInst / DbgDeclareInst to DbgInfoIntrinsic
NFC

llvm-svn: 312754
2017-09-07 22:46:24 +00:00
Paul Robinson bb92137080 [DWARF] Line 0 should not have a discriminator.
It's meaningless and takes up extra space in the line table.

Differential Revision: https://reviews.llvm.org/D37364

llvm-svn: 312751
2017-09-07 22:15:44 +00:00
Petr Hosek 5c469a3daa [yaml2obj][ELF] Add support for symbol indexes greater than SHN_LORESERVE
Right now Symbols must be either undefined or defined in a specific
section. Some symbols have section indexes like SHN_ABS however. This
change adds support for outputting symbols that have such section
indexes.

Patch by Jake Ehrlich

Differential Revision: https://reviews.llvm.org/D37391

llvm-svn: 312745
2017-09-07 20:44:16 +00:00
Peter Collingbourne 9e26e97955 COFF: PDB: Allow multiple modules with the same name.
It is possible for two modules to have the same name if they are
archive members with the same name, or if we are doing LTO (in which
case all modules will have the name "lto.tmp").

Differential Revision: https://reviews.llvm.org/D37589

llvm-svn: 312744
2017-09-07 20:39:46 +00:00
Peter Collingbourne 8ad3aab4e5 Remove dead code. NFCI.
llvm-svn: 312740
2017-09-07 19:17:30 +00:00
Artem Belevich 8af4e23d1e [CUDA] Added rudimentary support for CUDA-9 and sm_70.
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.

On LLVM side NVPTX added sm_70 GPU type which bumps required
PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment.

Differential Revision: https://reviews.llvm.org/D37576

llvm-svn: 312734
2017-09-07 18:14:32 +00:00
Matt Arsenault d7e2303df2 AMDGPU: Start selecting v_mad_mix_f32
llvm-svn: 312732
2017-09-07 18:05:07 +00:00
Matt Arsenault 61ec738b60 DAG: Allow creating extract_vector_elt post-legalize
Fixes some combine issues for AMDGPU where we weren't
getting the many extract_vector_elt combines expected
in a future patch.

This should really be checking isOperationLegalOrCustom on
the extract. That improves a number of x86 lit tests, but
a few get stuck in an infinite loop from one place
where a similar looking extract is created. I have a
different workaround in the backend for that which
keeps many of those improvements, but also adds a few
regressions.

llvm-svn: 312730
2017-09-07 17:24:43 +00:00
Konstantin Zhuravlyov 5f5b586c99 AMDGPU: Handle non-temporal loads and stores
Differential Revision: https://reviews.llvm.org/D36862

llvm-svn: 312729
2017-09-07 17:14:54 +00:00
Konstantin Zhuravlyov c8c9d4a0a6 AMDGPU: Handle more than one memory operand in SIMemoryLegalizer
Differential Revision: https://reviews.llvm.org/D37397

llvm-svn: 312725
2017-09-07 16:14:21 +00:00
Benjamin Kramer 6ef976d5e1 [ARM] Remove redundant vcvt patterns.
These don't add any value as they're just compositions of existing
patterns. However, they can confuse the cost logic in ISel, leading to
duplicated vcvt instructions like in PR33199.

llvm-svn: 312724
2017-09-07 14:52:26 +00:00
Michael Zuckerman 5a385940d3 [X86][LLVM]Expanding Supports lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3).
This patch expands the support of lowerInterleavedload to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) and we plan to include the store (deinterleved side).

The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

into

a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7

Reviewers
1. zvi
2. igor
3. guyblank
4. dorit
5. Ayal

llvm-svn: 312722
2017-09-07 14:02:13 +00:00
Simon Atanasyan 6d7958684b [mips] Use RegisterMCAsmBackend to register all MIPS asm backends. NFC
This change converts the `MipsAsmBackend` constructor to the "standard"
form. It makes possible to use `RegisterMCAsmBackend` for the backends
registrations. Now we pass `Triple` instance to the `MipsAsmBackend`
ctor and deduce all required options like endianness and bitness from
the triple. We still need to implement explicit ABI checking for
providing correct options to backends.

Differential revision: https://reviews.llvm.org/D37519

llvm-svn: 312720
2017-09-07 12:54:26 +00:00
Florian Hahn d39b8a3533 [MachineCombiner] Update instruction depths incrementally for large BBs.
Summary:
For large basic blocks with lots of combinable instructions, the
MachineTraceMetrics computations in MachineCombiner can dominate the compile
time, as computing the trace information is quadratic in the number of
instructions in a BB and it's relevant successors/predecessors.

In most cases, knowing the instruction depth should be enough to make
combination decisions. As we already iterate over all instructions in a basic
block, the instruction depth can be computed incrementally. This reduces the
cost of machine-combine drastically in cases where lots of instructions
are combined. The major drawback is that AFAIK, computing the critical path
length cannot be done incrementally. Therefore we only compute
instruction depths incrementally, for basic blocks with more
instructions than inc_threshold. The -machine-combiner-inc-threshold
option can be used to set the threshold and allows for easier
experimenting and checking if using incremental updates for all basic
blocks has any impact on the performance.

Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn

Reviewed By: fhahn

Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 312719
2017-09-07 12:49:39 +00:00
Florian Hahn cf0cdd4c02 [MachineTraceMetrics] Add computeDepth function (NFCI).
Summary:
This function is used in D36619 to update the instruction depths
incrementally.

Reviewers: efriedma, Gerolf, MatzeB, fhahn

Reviewed By: fhahn

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36696

llvm-svn: 312714
2017-09-07 11:51:30 +00:00
Alex Bradbury c09d5611c4 [Sparc][NFC] Clean up SelectCC lowering
The ARM, BPF, MSP430, Sparc and Mips backends all use a similar code sequence 
for lowering SelectCC. As pointed out by @reames in D29937, this code isn't 
particularly clear and in most of these backends doesn't actually match the 
comments. This patch makes the code sequence clearer for the Sparc backend 
through better variable naming and more accurate comments (e.g. we are 
inserting triangle control flow, _not_ diamond). There is no functional 
change.

Differential Revision: https://reviews.llvm.org/D37194

llvm-svn: 312713
2017-09-07 11:30:55 +00:00
Jonas Paulsson 0f056352a8 Revert "[RegAlloc] Make sure live-ranges reflect the state of the IR when removing them"
This temporarily reverts commit 463fa38 (r311401).

See https://bugs.llvm.org/show_bug.cgi?id=34502

llvm-svn: 312708
2017-09-07 09:13:17 +00:00
Zvi Rackover 25799d93f0 X86: Improve AVX512 fptoui lowering
Summary:
Add patterns for
  fptoui <16 x float> to <16 x i8>
  fptoui <16 x float> to <16 x i16>

Reviewers: igorb, delena, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37505

llvm-svn: 312704
2017-09-07 07:40:34 +00:00
Craig Topper 7bc65e220c [X86] Force shuffle lowering to only create X86ISD::VPERM2X128 with 64-bit element types so we can remove some patterns from isel.
Intrinsic handling is still creating these nodes with 32-bit elements as well. But at least this gets rid of 8 and 16.

Ideally, someday we'll convert the intrinsics to generic vector shuffles and remove the intrinsics.

llvm-svn: 312702
2017-09-07 06:11:10 +00:00
Matt Arsenault 65ca292a8d AMDGPU: Don't legalize i16 extloads to i32 with legal i16
Keeping non-i16 extloads makes it easier to match some new
gfx9 load instructions.

llvm-svn: 312699
2017-09-07 05:37:34 +00:00
Peter Collingbourne 681fbb64a4 ModuleSummaryAnalysis: Correctly handle all function operand references.
The current code that handles personality functions when creating a
module summary does not correctly handle the case where a function's
personality function operand refers to the function indirectly
(e.g. via a bitcast). This patch handles such cases by treating
personality function references like any other reference, i.e. by
adding them to the function's reference list. This has the minor side
benefit of allowing personality functions to participate in early
dead stripping.

We do this by calling findRefEdges on the function itself. This way
we also end up handling other function operands (specifically prefix
data and prologue data) for free.

Differential Revision: https://reviews.llvm.org/D37553

llvm-svn: 312698
2017-09-07 05:35:35 +00:00
Craig Topper 9228aee711 [X86] Remove patterns for selecting a v8f32 X86ISD::MOVSS or v4f64 X86ISD::MOVSD.
I don't think we ever generate these. If we did, I would expect we would also be able to generate v16f32 and v8f64, but we don't have those patterns.

llvm-svn: 312694
2017-09-07 05:08:16 +00:00
Saleem Abdulrasool 5fba8ba9cc ARM: track globals promoted to coalesced const pool entries
Globals that are promoted to an ARM constant pool may alias with another
existing constant pool entry. We need to keep a reference to all globals
that were promoted to each constant pool value so that we can emit a
distinct label for each promoted global. These labels are necessary so
that debug info can refer to the promoted global without an undefined
reference during linking.

Patch by Stephen Crane!

llvm-svn: 312692
2017-09-07 04:00:13 +00:00
Peter Collingbourne 4899a923db Object: Downgrade invalid weak externals from an assert fail to an llvm::Error when creating an irsymtab.
This fixes bitcode emission for modules containing invalid weak externals.

llvm-svn: 312686
2017-09-07 01:33:52 +00:00
Matt Arsenault 3ced3d90c3 InstSimplify: canonicalize is idempotent
llvm-svn: 312685
2017-09-07 01:21:43 +00:00
Peter Collingbourne 264d7c9021 LTO: Remove unnecessary Windows support code.
I empirically verified that open files can in fact be renamed on
Windows with sys::fs::rename, so remove the incorrect code and comment.

llvm-svn: 312683
2017-09-07 00:55:00 +00:00
Eugene Zelenko 92334e07ca [Pass] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 312679
2017-09-06 23:05:38 +00:00
Stanislav Mekhanoshin 442e28dd42 [AMDGPU] Use v_pk_max_f16 for fcanonicalize
Differential Revision: https://reviews.llvm.org/D37325

llvm-svn: 312676
2017-09-06 22:27:29 +00:00
Sam Clegg e7a60708ff [WebAssembly] Only treat imports/exports as symbols when reading relocatable object files
This change only treats imported and exports functions and globals
as symbol table entries the object has a "linking" section (i.e. it is
relocatable object file).

In this case all globals must be of type I32 and initialized with
i32.const.  This was previously being assumed but not checked for and
was causing a failure on big endian machines due to using the wrong
value of then union.

See: https://bugs.llvm.org/show_bug.cgi?id=34487

Differential Revision: https://reviews.llvm.org/D37497

llvm-svn: 312674
2017-09-06 22:05:41 +00:00
Rui Ueyama f2f4e033b4 Removes redundant `llvm::`, add comments and simplify a return type of a function.
No functional change intended.

llvm-svn: 312673
2017-09-06 22:05:32 +00:00
Matthias Braun c9056b834d Insert IMPLICIT_DEFS for undef uses in tail merging
Tail merging can convert an undef use into a normal one when creating a
common tail. Doing so can make the register live out from a block which
previously contained the undef use. To keep the liveness up-to-date,
insert IMPLICIT_DEFs in such blocks when necessary.

To enable this patch the computeLiveIns() function which used to
compute live-ins for a block and set them immediately is split into new
functions:
- computeLiveIns() just computes the live-ins in a LivePhysRegs set.
- addLiveIns() applies the live-ins to a block live-in list.
- computeAndAddLiveIns() is a convenience function combining the other
  two functions and behaving like computeLiveIns() before this patch.

Based on a patch by Krzysztof Parzyszek <kparzysz@codeaurora.org>

Differential Revision: https://reviews.llvm.org/D37034

llvm-svn: 312668
2017-09-06 20:45:24 +00:00
Krzysztof Parzyszek 1dc313727e Disable jump threading into loop headers
Consider this type of a loop:
    for (...) {
      ...
      if (...) continue;
      ...
    }
Normally, the "continue" would branch to the loop control code that
checks whether the loop should continue iterating and which contains
the (often) unique loop latch branch. In certain cases jump threading
can "thread" the inner branch directly to the loop header, creating
a second loop latch. Loop canonicalization would then transform this
loop into a loop nest. The problem with this is that in such a loop
nest neither loop is countable even if the original loop was. This
may inhibit subsequent loop optimizations and be detrimental to
performance.

Differential Revision: https://reviews.llvm.org/D36404

llvm-svn: 312664
2017-09-06 19:36:58 +00:00
Craig Topper 7391786175 [X86] Move more isel patterns to X86InstrVecCompiler.td. NFC
This moves more of our subvector insert/extract tricks to X86InstrVecCompiler.td and refactors them into multiclasses.

llvm-svn: 312661
2017-09-06 19:03:55 +00:00
Stanislav Mekhanoshin ea134bcb13 [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize
Differential Revision: https://reviews.llvm.org/D37522

llvm-svn: 312660
2017-09-06 18:29:51 +00:00
Krzysztof Parzyszek a3017aa2ab [IfConversion] Remove kill flags from common instructions as well
When if-converting a diamond, two separate blocks will be placed back
to back to form a straight line code. To ensure correctness of the
liveness information, any registers that are live in the second block
should not be killed in the first block, even if they were in the
original code.
Additionally, when the two blocks share common instructions at the
beginning, these instructions will not be duplicated, but only placed
once, before both of the blocks. Since the function "isIdenticalTo"
(as used here) ignores kill flags, the common initial code in one
block may have a kill flag for a register that is live in the other
block.
Because the code that removes kill flags only runs for the non-common
parts of the predicated blocks, a kill flag mismatch in the common
code could still lead to a live register being killed prematurely.

llvm-svn: 312654
2017-09-06 17:57:13 +00:00
Craig Topper d548bb9d37 [X86] Actually add the new file that was supposed to go with r312649.
llvm-svn: 312650
2017-09-06 17:06:40 +00:00
Craig Topper cf1d8a55f2 [X86] Introduce a new td file to hold patterns some of the non instruction patterns from SSE and AVX512
This patch moves some of similar non-instruction patterns from X86InstrSSE.td and X86InstrAVX512.td to a common file.

This is intended as a starting point. There are many other optimization patterns that exist in both files that we could move here.

Differential Revision: https://reviews.llvm.org/D37455

llvm-svn: 312649
2017-09-06 16:56:52 +00:00
Nuno Lopes ba1c9f7aee Fix PR33878: BasicAA incorrectly assumes different address spaces don't alias
Remove code that assumed that a nullptr of address space != 0 couldnt alias with a non-null pointer. This is incorrect, since nothing can be concluded about a null pointer in an address space != 0.
This code was written before address spaces were introduced

Differential Revision: https://reviews.llvm.org/D37518

llvm-svn: 312648
2017-09-06 16:55:31 +00:00
Alexander Kornienko 3ad84ee009 Minor style fixes in lib/Support/**/Program.(inc|cpp).
No functional changes intended.

llvm-svn: 312646
2017-09-06 16:28:33 +00:00
Krzysztof Parzyszek daf1a5f94e [Hexagon] Add option to generate calls to "abort" for "unreachable"
llvm-svn: 312644
2017-09-06 16:22:55 +00:00