Commit Graph

110057 Commits

Author SHA1 Message Date
Martin Storsjo e8248f2e10 [GlobalMerge] Don't merge dllexport globals
Merging such globals loses the dllexport attribute. Add a test
to check that normal globals still are merged.

Differential Revision: https://reviews.llvm.org/D42127

llvm-svn: 323307
2018-01-24 06:40:04 +00:00
Craig Topper 069e1dd861 [X86] Move 'Y' to correct place in FMA4 regular expression in Znver1 scheduler model.
I think these instructions used to be named differently and the regular expression reflected that. I guess we must have correct itinerary information that made this not matter for the scheduler test?

llvm-svn: 323305
2018-01-24 05:32:51 +00:00
Craig Topper a55ac7b790 [X86] Rename 256-bit VFRCZ instructions to have the Y before the rr/rm to match other instructions. NFC
llvm-svn: 323304
2018-01-24 05:14:39 +00:00
Craig Topper fd68c2d0ae [X86] Remove redundant regular expression from the Znver1 scheduler model. NFC
llvm-svn: 323303
2018-01-24 05:14:33 +00:00
Hiroshi Inoue 501931b117 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323302
2018-01-24 05:04:35 +00:00
Craig Topper 0321ebc054 [X86] Use ISD::SIGN_EXTEND instead of X86ISD::VSEXT for mask to xmm/ymm/zmm conversion
There are a couple tricky things with this patch.

I had to add an override of isVectorLoadExtDesirable to stop DAG combine from combining sign_extend with loads after legalization since we legalize sextload using a load+sign_extend. Overriding this hook actually prevents a lot sextloads from being created in the first place.

I also had to add isel patterns because DAG combine blindly combines sign_extend+truncate to a smaller sign_extend which defeats what legalization was trying to do.

Differential Revision: https://reviews.llvm.org/D42407

llvm-svn: 323301
2018-01-24 04:51:17 +00:00
Jakub Kuderski ffb4fb7f6f [Dominators] Introduce DomTree verification levels
Summary:
Currently, there are 2 ways to verify a DomTree:
* `DT.verify()` -- runs full tree verification and checks all the properties and gives a reason why the tree is incorrect. This is run by when EXPENSIVE_CHECKS are enabled or when `-verify-dom-info` flag is set.
* `DT.verifyDominatorTree()` -- constructs a fresh tree and compares it against the old one. This does not check any other tree properties (DFS number, levels), nor ensures that the construction algorithm is correct. Used by some passes inside assertions.

This patch introduces DomTree verification levels, that try to close the gape between the two ways of checking trees by introducing 3 verification levels:
- Full -- checks all properties, but can be slow (O(N^3)). Used when manually requested (e.g. `assert(DT.verify())`) or when  `-verify-dom-info` is set.
- Basic -- checks all properties except the sibling property, and compares the current tree with a freshly constructed one instead. This should catch almost all errors, but does not guarantee that the construction algorithm is correct. Used when EXPENSIVE checks are enabled.
- Fast -- checks only basic properties (reachablility, dfs numbers, levels, roots), and compares with a fresh tree. This is meant to replace the legacy `DT.verifyDominatorTree()` and in my tests doesn't cause any noticeable performance impact even in the most pessimistic examples.

When used to verify dom tree wrapper pass analysis on sqlite3, the 3 new levels make `opt -O3` take the following amount of time on my machine:
- no verification: 8.3s
- `DT.verify(VerificationLevel::Fast)`: 10.1s
- `DT.verify(VerificationLevel::Basic)`: 44.8s
- `DT.verify(VerificationLevel::Full)`: 1m 46.2s
(and the previous `DT.verifyDominatorTree()` is within the noise of the Fast level)

This patch makes `DT.verifyDominatorTree()` pick between the 3 verification levels depending on EXPENSIVE_CHECKS and `-verify-dom-info`.

Reviewers: dberlin, brzycki, davide, grosser, dmgreen

Reviewed By: dberlin, brzycki

Subscribers: MatzeB, llvm-commits

Differential Revision: https://reviews.llvm.org/D42337

llvm-svn: 323298
2018-01-24 02:40:35 +00:00
Rafael Espindola 432a587cf0 Don't assume a null GV is local for ELF and MachO.
This is already a simplification, and should help with avoiding a plt
reference when calling an intrinsic with -fno-plt.

With this change we return false for null GVs, so the caller only
needs to check the new metadata to decide if it should use foo@plt or
*foo@got.

llvm-svn: 323297
2018-01-24 02:11:18 +00:00
Eric Christopher a8bdf5328d Remove set but unused variable IsUndef.
llvm-svn: 323295
2018-01-24 01:51:57 +00:00
Zvi Rackover b5447b1e7c X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW
Summary:
AVX512BW adds support for variable shift amount for 16-bit element
vectors.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: rengolin, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D42437

llvm-svn: 323292
2018-01-24 01:36:40 +00:00
Aditya Nandakumar f2aa2af24e [GISel]: Remove redundant copies at the end of ISel
https://reviews.llvm.org/D42402

A lot of these copies are useless (copies b/w VRegs having the same
regclass) and should be cleaned up.

llvm-svn: 323291
2018-01-24 01:35:26 +00:00
Sam Clegg 23012e98c9 [WebAssembly] Add minor helper functions to WasmObjectFile
Also, fix crash when exporting an imported function.

Differential Revision: https://reviews.llvm.org/D42454

llvm-svn: 323290
2018-01-24 01:27:17 +00:00
Matthias Braun 70fd374d1e AArch64: Cyclone: Remove SlowMisaligned128Store tuning flag
Remove FeatureSlowMisaligned128Store from cyclone flags.
This flag causes splitting of 16 byte wide stores into 2 stored of 8
bytes. This was useful on older apple CPUs which were slow for 16byte
stores that were not aligned on 16byte. As the compiler often cannot
predict the actual alignment, the splitting was choosen.

This has been a topic for a lot of debate as the splitting also
decreases performance for some benchmarks. Measuring the effects on
newer apple chips (rdar://35525421) shows that it harms more cases than
it helps. So it is time to retire this workaround.

llvm-svn: 323289
2018-01-24 00:39:53 +00:00
Benjamin Kramer 0627a33ed4 [TblGen] Inline an (almost) trivial accessor. No functionality change.
llvm-svn: 323276
2018-01-23 23:03:50 +00:00
Volkan Keles ebf34ea316 BlockExtractor: Remove unused variable. NFC.
llvm-svn: 323271
2018-01-23 22:24:34 +00:00
Tim Shen 7abe9887b0 [PPC] Avoid incorrect fp-i128-fp lowering.
Summary:
Fix an issue that's similar to what D41411 fixed:
  float(__int128(float_var)) shouldn't be optimized to xscvdpsxds +
  xscvsxdsp, as they mean (float)(int64_t)float_var.

Reviewers: jtony, hfinkel, echristo

Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D42400

llvm-svn: 323270
2018-01-23 22:06:57 +00:00
Volkan Keles dc40be75f8 [llvm-extract] Support extracting basic blocks
Summary:
Currently, there is no way to extract a basic block from a function easily. This patch
extends llvm-extract to extract the specified basic block(s).

Reviewers: loladiro, rafael, bogner

Reviewed By: bogner

Subscribers: hintonda, mgorny, qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D41638

llvm-svn: 323266
2018-01-23 21:51:34 +00:00
Craig Topper 1e42a4a735 [X86] Merge some regular expressions in Zen scheduler model and remove 2 unused classes.
I don't know if the unused classes were intended to be used and that the VEX version is really different than the legacy SSE version. Agner's tables don't show any differences. I'm just cleaning up assuming the current behavior is correct.

llvm-svn: 323263
2018-01-23 21:37:56 +00:00
Craig Topper 3067f4ddca [X86] Remove 'Int_' from instregexs in Zen scheduler model.
No instructions have Int_ at the beginning. It's always at the end now. So it should be picked up as a prefix match

llvm-svn: 323262
2018-01-23 21:37:54 +00:00
Craig Topper 002657731b [X86] Move 'Int_' to the end of the name of the VCOMISS/VUCOMISS and instructions to get them picked up by the scheduler model regexs.
All other intrinsic instructions put the _Int on the end. This make these instructions consistent and gets the prefix instregexs in the scheduler models to pick them up.

llvm-svn: 323261
2018-01-23 21:37:51 +00:00
Simon Pilgrim 2cc74ed2be [X86][AVX] LowerBUILD_VECTORAsVariablePermute - add support for VPERMILPV to v2i64/v2f64
Minor refactor to make it possible for LowerBUILD_VECTORAsVariablePermute to be used with a wider variety of shuffles op and types.

I'd have liked to add v4i32/v4f32 support as well but we don't see v4i32 index extractions at the moment (which is why I created D42308)

After this I intend to begin adding scaling support for PSHUFB (v8i16, v4i32, v2i64)) and VPERMPS (v4f64, v4i64).

Differential Revision: https://reviews.llvm.org/D42431

llvm-svn: 323260
2018-01-23 21:33:24 +00:00
Evgeniy Stepanov d5a6fdbe95 [safestack] Inline safestack pointer access when possible.
Summary:
This adds an -mllvm flag that forces the use of a runtime function call to
get the unsafe stack pointer, the same that is currently used on non-x86, non-aarch64 android.
The call may be inlined.

Reviewers: pcc

Subscribers: aemerson, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37405

llvm-svn: 323259
2018-01-23 21:27:07 +00:00
Simon Pilgrim c1e2290d37 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 323258
2018-01-23 21:22:16 +00:00
Alexey Bataev 4f74a31c0e Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323246 because of the broken buildbots.

llvm-svn: 323252
2018-01-23 20:11:27 +00:00
Krzysztof Parzyszek d5e8a260bb [Hexagon] Add patterns for sext_inreg of HVX vector types
llvm-svn: 323250
2018-01-23 19:56:16 +00:00
Alexey Bataev 6719e2418c [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323246
2018-01-23 19:30:26 +00:00
Krzysztof Parzyszek 275ffa4679 [Hexagon] Implement hasLoadFromStackSlot and hasStoreToStackSlot
If the instruction is a bundle, check the instructions inside of it.

Patch by Suyog Sarda.

llvm-svn: 323240
2018-01-23 19:08:40 +00:00
Nico Weber 1c7c45688c Introduce errorToBool() helper and use it.
errorToBool() converts an Error to a bool and puts the Error in a checked
state.  No behavior change.

https://reviews.llvm.org/D42422

llvm-svn: 323238
2018-01-23 19:03:13 +00:00
Sam Clegg 68b425f0bf [WebAssembly] Remove "name" section of object wasm object files
LLD is unaffected, no changes needed there. LLD continues to
write out a name section, using the symbol names.

Fixes: https://github.com/WebAssembly/tool-conventions/issues/37

Patch by Nicholas Wilson!

Differential Revision: https://reviews.llvm.org/D42425

llvm-svn: 323234
2018-01-23 18:30:04 +00:00
Krzysztof Parzyszek ae3e934bd6 [Hexagon] Fix unused variable warning in release build
llvm-svn: 323233
2018-01-23 18:16:52 +00:00
Krzysztof Parzyszek 3780a0e1fa [Hexagon] Implement basic vector operations on vectors vNi1
In addition to that, make sure that there are no boolean vector types that
are associated with multiple register classes. Specifically, remove v32i1
and v64i1 from integer register classes. These types will correspond to
results of vector comparisons, and as such should belong to the vector
predicate class. Having them in scalar registers as well makes legalization
ambiguous.

llvm-svn: 323229
2018-01-23 17:53:59 +00:00
Simon Pilgrim 6ff241fc99 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - extract subvector from oversized index vectors
llvm-svn: 323223
2018-01-23 17:02:15 +00:00
Dan Gohman 5464941a6a [WebAssembly] Add mem.* intrinsics.
The grow_memory and current_memory instructions are expected to be
officially renamed to mem.grow and mem.size. Introduce new intrinsics
with the new names. These new names aren't yet official, so for now,
use them at your own risk.

Also, take this opportunity to add arguments for the currently unused
immediate field in those instructions.

llvm-svn: 323222
2018-01-23 17:02:02 +00:00
Dan Gohman f2c1cae5cb [WebAssembly] Switch to *-wasm as the default target triple.
This makes wasm32-unknown-unknown-wasm the default, which supports
the .o file writer and the new linking ABI. To enable s2wasm-compatible
output, use the wasm32-unknown-unknown-elf triple.

llvm-svn: 323220
2018-01-23 16:55:44 +00:00
Yaxun Liu d0820433b1 Verifier: fix bug treating debug info issue as non-debug info issue
Normally when llvm-as sees only debug info errors in LLVM assembly, it simply
drops the debug info and outputs a valid LLVM bitcode and returns 0.

There is a bug in LLVM verifier which incorrectly treats a debug info error
as non-debug info error, which causes llvm-as returns 1 even though llvm-as
can drop the invalid debug info and outputs a valid LLVM bitcode.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D42391

llvm-svn: 323216
2018-01-23 16:11:15 +00:00
Yaxun Liu 8b7454a8dd CodeGen: Fix assertion in ScheduleDAGMILive::scheduleMI due to llvm.dbg.value
Fix a bug in ScheduleDAGMILive::scheduleMI which causes BotRPTracker not tracking CurrentBottom in some rare cases involving llvm.dbg.value.

This issues causes amdgcn target to assert when compiling some user codes with -g.

Differential Revision: https://reviews.llvm.org/D42394

llvm-svn: 323214
2018-01-23 16:04:53 +00:00
Craig Topper c58c2b5c9b [X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and inserting into a vXi1 vector.
The existing code was already doing something very similar to subvector insertion so this allows us to remove the nearly duplicate code.

This patch is a little larger than it should be due to differences between the DQI handling between the two today.

llvm-svn: 323212
2018-01-23 15:56:36 +00:00
Simon Pilgrim 0c9f77a9f9 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the source vector is not larger than the destination
We might be able to support this in the future with VPERMV3, OR(PSHUFB, PSHUFB) etc.

llvm-svn: 323210
2018-01-23 15:51:03 +00:00
Simon Pilgrim 9b4a097f94 Use EVT::changeVectorElementTypeToInteger() to convert index type to integer
llvm-svn: 323207
2018-01-23 15:30:07 +00:00
Simon Pilgrim e2905c8a0c [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the index vector has the correct number of elements
llvm-svn: 323206
2018-01-23 15:13:37 +00:00
Tim Northover f9b560aa8e AArch64: get type from correct result when forming BFX
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one. Hopefully that's all of them.

llvm-svn: 323205
2018-01-23 15:11:27 +00:00
Tim Northover 9f3003d08f AArch64: get type from correct result when forming BFI/BFM
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one.

llvm-svn: 323202
2018-01-23 14:37:03 +00:00
Craig Topper 76adcc86cd [X86] Legalize v32i1 without BWI via splitting to v16i1 rather than the default of promoting to v32i8.
Summary:
For the most part its better to keep v32i1 as a mask type of a narrower width than trying to promote it to a ymm register.

I had to add some overrides to the methods that get the types for the calling convention so that we still use v32i8 for argument/return purposes.

There are still some regressions in here. I definitely saw some around shuffles. I think we probably should move vXi1 shuffle from lowering to a DAG combine where I think the extend and truncate we have to emit would be better combined.

I think we also need a DAG combine to remove trunc from (extract_vector_elt (trunc))

Overall this removes something like 13000 CHECK lines from lit tests.

Reviewers: zvi, RKSimon, delena, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42031

llvm-svn: 323201
2018-01-23 14:25:39 +00:00
Craig Topper c2df6409c7 [X86] Add missing MOVSX/MOVZX instructions to load folding tables.
I'm not sure there's any way to generate these folding cases especially the movzx ones since even the register form is never emitted by codegen.

I'm just adding them to remove the difference with the autogenerated version of the folding table.

llvm-svn: 323200
2018-01-23 14:09:22 +00:00
Serguei Katkov 17e5794f11 [CGP] Fix the GV handling in complex addressing mode
If in complex addressing mode the difference is in GV then
base reg should not be installed because we plan to use
base reg as a merge point of different GVs.

This is a fix for PR35980.

Reviewers: reames, john.brawn, santosh
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42230

llvm-svn: 323192
2018-01-23 12:07:49 +00:00
Simon Pilgrim 8ea1a0c690 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering
As detailed in rL317463, PSHUFB (like most variable shuffle instructions) uses Op[0] for the source vector and Op[1] for the shuffle index vector, VPERMV works in reverse which is probably where the confusion comes from.

Differential Revision: https://reviews.llvm.org/D42380

llvm-svn: 323190
2018-01-23 11:39:06 +00:00
MinSeong Kim 27f77b4300 [Analysis] Disable exp/exp2/pow finite lib calls on Android with -ffast-math.
Summary:
Since r322087, glibc's finite lib calls are generated when possible.
However, glibc is not supported on Android. Therefore this change
enables llvm to finely distinguish between linux and Android for
unsupported library calls. The change also include some regression
tests.

Reviewers: srhines, pirama

Reviewed By: srhines

Subscribers: kongyi, chh, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42288

llvm-svn: 323187
2018-01-23 11:11:36 +00:00
Stefan Maksimovic 98749e0249 [mips] Properly select abs and sqrt instructions
- Alter abs for micromips to have both AFGR64 and FGR64
  variants, same as sqrt
- Remove sqrt and abs from MicroMips32r6InstrInfo.td,
  use micromips FGR64 variants
- Restrict non-micromips abs/sqrt with NotInMicroMips
  predicate

Differential revision: https://reviews.llvm.org/D41439

llvm-svn: 323184
2018-01-23 10:09:39 +00:00
Ashutosh Nema 007b425b77 This change add's optimization remark in LoopVersioning LICM pass.
Summary:
This patch is adding remark messages to the LoopVersioning LICM pass, 
which will be useful for optimization remark emitter (ORE) infrastructure.

Patch by: Deepak Porwal

Reviewers: anemet, ashutosh.nema, eastig

Subscribers: eastig, vivekvpandya, fhahn, llvm-commits
llvm-svn: 323183
2018-01-23 09:47:28 +00:00
Anton Bikineev 82f61151b3 [InstSimplify] (X << Y) % X -> 0
llvm-svn: 323182
2018-01-23 09:27:47 +00:00
Craig Topper c92edd994e [X86] Don't reorder (srl (and X, C1), C2) if (and X, C1) can be matched as a movzx
Summary:
If we can match as a zero extend there's no need to flip the order to get an encoding benefit. As movzx is 3 bytes with independent source/dest registers. The shortest 'and' we could make is also 3 bytes unless we get lucky in the register allocator and its on AL/AX/EAX which have a 2 byte encoding.

This patch was more impressive before r322957 went in. It removed some of the same Ands that got deleted by that patch.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42313

llvm-svn: 323175
2018-01-23 05:45:52 +00:00
Craig Topper e5aea25980 [X86] Remove 'NOREX' comment from the printing of _NOREX instructions.
Some of the NOREX instructions are used in 32-bit mode making this printing confusing. It also doesn't provide a lot of value since you can see the h-register being used by the instruction.

llvm-svn: 323174
2018-01-23 05:37:00 +00:00
Craig Topper 26a701f24f [X86] Various vXi1 insertion improvements.
Add missing patterns for inserting v1i1 into a zero vector. Use insert_subvector to zero upper bits before inserting an element into a vXi1 vector. Replace kshift based isel pattern with insert_subvector based pattern now that code that caused the pattern has been fixed to emit insert_subvector.

llvm-svn: 323173
2018-01-23 05:36:53 +00:00
David Blaikie 0c64f5a2eb NewPM: Add an extension point for the start of the pipeline.
This applies to most pipelines except the LTO and ThinLTO backend
actions - it is for use at the beginning of the overall pipeline.

This extension point will be used to add the GCOV pass when enabled in
Clang.

llvm-svn: 323166
2018-01-23 01:25:20 +00:00
Sam Clegg 60ec30340f [WebAssembly] Store function index rather than table index in TABLE_INDEX relocations
Relocations of type R_WEBASSEMBLY_TABLE_INDEX represent places
where the table index for a given function is needed.  While the
value stored in this location is a table index, the index in
the relocation entry itself is a function index (the index of
the function which is to be called indirectly).

This is how is was spec'd originally but the LLVM implementation
didn't do this.  This makes things a little simpler in the linker
since the table in the input file can essentially be ignored that
the output table can be created purely based on these relocations.

Patch by Nicholas Wilson!

Differential Revision: https://reviews.llvm.org/D42080

llvm-svn: 323165
2018-01-23 01:23:17 +00:00
Rui Ueyama 322fcfee34 Revert r322595: Specify inline for isWhitespace in CommandLine.cpp
The original change was made based on a misunderstanding that
-DCMAKE_BUILD_TYPE=RelWithDebugInfo would produce the same executable
as -DCMAKE_BUILD_TYPE=Release modulo debug info. Turned out that's not
true -- it at least disables some optimizations such as function inlining.

llvm-svn: 323161
2018-01-22 23:27:50 +00:00
Chandler Carruth c58f2166ab Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre..
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
  __llvm_external_retpoline_r11
```
or on 32-bit:
```
  __llvm_external_retpoline_eax
  __llvm_external_retpoline_ecx
  __llvm_external_retpoline_edx
  __llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.

There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.

The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.

When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.

When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.

However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.

We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723

llvm-svn: 323155
2018-01-22 22:05:25 +00:00
Mark Searles 7687d42052 [AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64
- Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register.
- Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.

Differential Revision: https://reviews.llvm.org/D42124

llvm-svn: 323153
2018-01-22 21:46:43 +00:00
Dmitry Vyukov 68aab34f2d asan: allow inline instrumentation for the kernel
Currently ASan instrumentation pass forces callback
instrumentation when applied to the kernel.
This patch changes the current behavior to allow
using inline instrumentation in this case.

Authored by andreyknvl. Reviewed in:
https://reviews.llvm.org/D42384

llvm-svn: 323140
2018-01-22 19:07:11 +00:00
Evandro Menezes 312443fd83 [AArch64] Create a separate feature set for Exynos M3
Distinguish the features from Exynos M2.

llvm-svn: 323139
2018-01-22 19:03:26 +00:00
Joel Galenson 1d89cd2bb4 [ARM] Cleanup part of ARMBaseInstrInfo::optimizeCompareInstr (NFCI).
As noted in another review, this loop is confusing.  This commit cleans it up
somewhat.

Differential Revision: https://reviews.llvm.org/D42312

llvm-svn: 323136
2018-01-22 17:53:47 +00:00
Petar Jovanovic 29aced1bae [mips] add warnings for using dsp and msa flags with inappropriate revisions
Dsp and dspr2 require MIPS revision 2, while msa requires revision 5. Adding
warnings for cases when these flags are used with earlier revision.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D40490

llvm-svn: 323131
2018-01-22 16:43:30 +00:00
Ulrich Weigand 145d63f1ad [SystemZ] Fix bootstrap failure due to invalid DAG loop
The change in r322988 caused a failure in the bootstrap build bot.
The problem was that directly gluing a BR_CCMASK node to a
compare-and-swap could lead to issues if other nodes were
chained in between.  There is then no way to create a topological
sort that respects both the chain sequence and the glue property.

Fixed for now by rejecting the optimization in this case.  As a
future enhancement, we may be able to handle additional cases
by swapping chain links around.

llvm-svn: 323129
2018-01-22 15:41:49 +00:00
Marina Yatsina 811523cc08 Fix bug in commit 323096 exposed by test in test-suite-verify-machineinstrs-x86_64h-O3
Change-Id: I0a4b10d0d6c8de606d989c567ec07944ae283a87
llvm-svn: 323126
2018-01-22 15:31:05 +00:00
Sander de Smalen 7ab96f534c [AArch64][SVE] Asm: PTRUE and PTRUES instructions
Summary: These instructions initialize a predicate vector from a pattern/immediate.

Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover, samparker, olista01

Reviewed By: samparker

Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41819

llvm-svn: 323124
2018-01-22 15:29:19 +00:00
Carey Williams da15b5b116 [AArch64] optimise v4f16 fcmps to utilise vector instructions
Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported.
Generating FCTVL(s) rather than a longer series of FCVTs.

Differential Revision: https://reviews.llvm.org/D41772

llvm-svn: 323118
2018-01-22 14:16:11 +00:00
Eugene Leviant 28d8a49f42 [ThinLTO] Re-commit of dot dumper after test fix
llvm-svn: 323116
2018-01-22 13:35:40 +00:00
Marina Yatsina e4d63a499d Fixing warnings caused by commit 323095
Change-Id: I4e1f81db2f5382a820f4016c23b243e4d5aebf51
llvm-svn: 323114
2018-01-22 13:24:10 +00:00
Pavel Labath 9b36fd2541 Rename DwarfAcceleratorTable to AppleAcceleratorTable. NFC
This frees up the first name to be used as an base class for the
apple table and the dwarf5 .debug_names accel table. The rename  was
split off from D42297 (adding of debug_names support), which is still
under review.

llvm-svn: 323113
2018-01-22 13:17:23 +00:00
Simon Pilgrim 17682a86da [X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding (Reapplied)
Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.

Reapplied after rL322279 was reverted at rL322335 due to PR35918, underlying issue was fixed at rL322644.

llvm-svn: 323104
2018-01-22 12:05:17 +00:00
Sander de Smalen 245e0e67f3 [AArch64][SVE] Asm: Predicate patterns
Summary:
This patch adds support for parsing/printing of named or unnamed
patterns that are used in SVE's PTRUE instruction, amongst others.

The pattern can be specified as a named pattern to initialize the predicate
vector or it can be specified as an immediate in the range 0-31.

Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41818

llvm-svn: 323098
2018-01-22 10:46:00 +00:00
Marina Yatsina 77a21dbad4 Break false dependencies for POPCNT, LZCNT, TZCNT
Add POPCNT, LZCNT, TZCNT to the list of instructions that have false dependency.
Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions.
Update affected tests.

This fixes bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869

This is the final of multiple patches that fix this bugzilla.
Most of the patches are intended at refactoring the existent code.

Reviews of the refactoring done to enable this change:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333

Differential Revision: https://reviews.llvm.org/D40334

Change-Id: If95cbf1a3f5c7dccff8f1b22ecb397542147303d
llvm-svn: 323096
2018-01-22 10:07:01 +00:00
Marina Yatsina 0bf841ac2a Separate LoopTraversal, ReachingDefAnalysis and BreakFalseDeps into their own files.
This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40333

Change-Id: Ie5f8eb34d98cfdfae23a3072eb69b5794f0e2d56
llvm-svn: 323095
2018-01-22 10:06:50 +00:00
Marina Yatsina 3d8efa4f0c Rename ExecutionDepsFix files to ExecutionDomainFix
This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40332

Change-Id: I6a048cca7fdafbfc42fb1bac94343e483befded8
llvm-svn: 323094
2018-01-22 10:06:33 +00:00
Marina Yatsina c757c0b6ba ExecutionDepsFix refactoring:
- clang-format

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: I131b126af13bc743bc5d69d83699e52b9b720979
llvm-svn: 323093
2018-01-22 10:06:18 +00:00
Marina Yatsina 30234e866a ExecutionDepsFix refactoring:
- Moving comments to class definition in header file
- Changing comments to doxygen style
- Rephrase loop traversal explaining comment

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: I9a12618db5b66128611fa71b54a233414f6012ac
llvm-svn: 323092
2018-01-22 10:06:10 +00:00
Marina Yatsina 273d35db4d ExecutionDepsFix refactoring:
- Removing LiveRegs

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: I8ab56d99951a6d6981542f68d94c1f624f3c9fbf
llvm-svn: 323091
2018-01-22 10:06:01 +00:00
Marina Yatsina 6daa2d25e8 ExecutionDepsFix refactoring:
- Changing LiveRegs to be a vector

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: I9cdd364bd7bf2a0bf61ea41a48d4bd310ec3bce4
llvm-svn: 323090
2018-01-22 10:05:53 +00:00
Marina Yatsina 0f9cda52f5 ExecutionDepsFix refactoring:
- Changing DenseMap<MBB*, LiveReg*> to SmallVector<LiveReg*>
- Now the MBB number will be the index of LiveReg in the vector.
- Adding asserts

This patch is NFC.

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: If4a3f141693d0361ddb292432337dbb63a1e69ee
llvm-svn: 323089
2018-01-22 10:05:45 +00:00
Marina Yatsina 6b34e63ebd ExecutionDepsFix refactoring:
- Remove unneeded includes and unneeded members
- Use range iterators
- Variable renaming, typedefs, extracting constants
- Removing {} from one line ifs

This patch is NFC.

This is the one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40331

Change-Id: Ib59060ab3fa5bee3bf2ca2045c24e572635ee7f6
llvm-svn: 323088
2018-01-22 10:05:37 +00:00
Marina Yatsina 6fc2aaae8d Separate ExecutionDepsFix into 4 parts:
1. ReachingDefsAnalysis - Allows to identify for each instruction what is the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creatws a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.

This also included the following changes to ExcecutionDepsFix original logic:
1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.

Additional changes in affected files:
1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.

Additional refactoring changes will follow.

This commit is (almost) NFC.
The only functional change is that now BreakFalseDeps will break dependency for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.

This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.

Additional relevant reviews:
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40330

Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
llvm-svn: 323087
2018-01-22 10:05:23 +00:00
Pavel Labath 44197df7a3 [BinaryFormat] Add .debug_names support
Summary:
This adds a definition of the .debug_names section and the new constants
(DW_IDX_???) which are used in it.

Reviewers: JDevlieghere, aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42296

llvm-svn: 323084
2018-01-22 09:41:36 +00:00
Serguei Katkov f38041dc3e Revert [SCEV] Fix isLoopEntryGuardedByCond usage
It causes buildbot failures. New added assert is fired.
It seems not all usages of isLoopEntryGuardedByCond are fixed.

llvm-svn: 323079
2018-01-22 07:47:02 +00:00
Serguei Katkov 50714a1cbc [SCEV] Fix isLoopEntryGuardedByCond usage
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.

Reviewers: sanjoy, mkazantsev, anna, dorit
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42165

llvm-svn: 323077
2018-01-22 07:31:41 +00:00
Hiroshi Inoue 290adb3184 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323074
2018-01-22 05:54:46 +00:00
Lang Hames 635fd9092b [ORC] Add orc::SymbolResolver, a Orc/Legacy API interop header, and an
orc::SymbolResolver to JITSymbolResolver adapter.

The new orc::SymbolResolver interface uses asynchronous queries for better
performance. (Asynchronous queries with bulk lookup minimize RPC/IPC overhead,
support parallel incoming queries, and expose more available work for
distribution). Existing ORC layers will soon be updated to use the
orc::SymbolResolver API rather than the legacy llvm::JITSymbolResolver API.

Because RuntimeDyld still uses JITSymbolResolver, this patch also includes an
adapter that wraps an orc::SymbolResolver with a JITSymbolResolver API.

llvm-svn: 323073
2018-01-22 03:00:31 +00:00
Sanjay Patel 9530f18864 [InstCombine] (X << Y) / X -> 1 << Y
...when the shift is known to not overflow with the matching
signed-ness of the division.

This closes an optimization gap caused by canonicalizing mul
by power-of-2 to shl as shown in PR35709:
https://bugs.llvm.org/show_bug.cgi?id=35709

Patch by Anton Bikineev!

Differential Revision: https://reviews.llvm.org/D42032

llvm-svn: 323068
2018-01-21 16:14:51 +00:00
Eugene Leviant 72b9bdb71a Temporarily revert r323062 to investigate buildbot failures
llvm-svn: 323065
2018-01-21 10:22:19 +00:00
Eugene Leviant 453c976a63 [ThinLTO] Implement summary visualizer
Differential revision: https://reviews.llvm.org/D41297

llvm-svn: 323062
2018-01-21 07:27:32 +00:00
Lang Hames 5ff5a30bee [ORC] Add a lookupFlags method to VSO.
lookupFlags returns a SymbolFlagsMap for the requested symbols, along with a
set containing the SymbolStringPtr for any symbol not found in the VSO.

The JITSymbolFlags for each symbol will have been stripped of its transient
JIT-state flags (i.e. NotMaterialized, Materializing).

Calling lookupFlags does not trigger symbol materialization.

llvm-svn: 323060
2018-01-21 03:20:39 +00:00
Lang Hames 86edf53664 [ORC] More cleanup. NFC.
llvm-svn: 323059
2018-01-21 03:20:36 +00:00
Lang Hames 0d7d442699 [ORC] Cleanup. NFC.
llvm-svn: 323057
2018-01-21 02:24:45 +00:00
Philip Reames f57714c3c7 [DSE] Factor out common code [NFC]
We already had the pointer being stored to in the MemLoc, reuse that code.  In merging cases, it turned out the interface of the getLocForWrite had become inconsitent with other related utilities.  Fix that by making sure the input passes hasAnalyzableWrite as well.

llvm-svn: 323056
2018-01-21 02:10:54 +00:00
Philip Reames 424e7a1174 [DSE] Minor rename for clarity sake [NFC]
llvm-svn: 323055
2018-01-21 01:44:33 +00:00
Craig Topper 7fddf2bfef [X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations
Summary:
This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx.

We still need a follow patch to stop moving ands across srl if the and could be represented as a movzx before the shift but not after. I think this should help with some of the cases that D42088 ended up removing during isel.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42265

llvm-svn: 323048
2018-01-20 18:50:09 +00:00
Simon Pilgrim 89540d9665 [X86][SSE] Check for out of bounds PEXTR/PINSR indices during faux shuffle combining.
llvm-svn: 323045
2018-01-20 17:16:01 +00:00
Jonas Paulsson 7ad28863fb [SelectionDAG] Fix codegen of vector stores with non byte-sized elements.
This was completely broken, but hopefully fixed by this patch.

In cases where it is needed, a vector with non byte-sized elements is stored
by extracting, zero-extending, shift:ing and or:ing the elements into an
integer of the same width as the vector, which is then stored.

Review: Eli Friedman, Ulrich Weigand
https://reviews.llvm.org/D42100#inline-369520
https://bugs.llvm.org/show_bug.cgi?id=35520

llvm-svn: 323042
2018-01-20 16:05:10 +00:00
Martin Storsjo f641d0d4f2 [COFF] Keep the underscore on exported decorated stdcall functions in MSVC mode
This (together with the corresponding LLD commit, that contains the
testcase updates) fixes PR35733.

Differential Revision: https://reviews.llvm.org/D41631

llvm-svn: 323035
2018-01-20 11:44:32 +00:00
Saleem Abdulrasool 99f479abcf CodeGen: handle llvm.used properly for COFF
`llvm.used` contains a list of pointers to named values which the
compiler, assembler, and linker are required to treat as if there is a
reference that they cannot see.  Ensure that the symbols are preserved
by adding an explicit `-include` reference to the linker command.

llvm-svn: 323017
2018-01-20 00:28:02 +00:00
Craig Topper 08bd14803c [X86] Teach X86 codegen to use vector width preference to avoid promoting to 512-bit types when VLX is enabled and the preference is for a smaller size.
This change applies to places where we would turn 128/256-bit code into 512-bit in order to get a wider element type through sext/zext. Any 512-bit types that already existed in the IR/DAG will be left that way.

The width preference has no effect on codegen behavior when the target does not have AVX512 enabled. So AVX/AVX2 codegen cannot be limited via this mechanism yet.

If the preference is lower than 256 we may still use a 256 bit type to do the operation. Constraining to 128 bits makes it much more difficult to support some operations. For many of these cases we need to change element width while keeping element count constant which is easiest done by switching between 256 and 128 bit.

The preference is only obeyed when AVX512 and VLX are available. This means the preference is not obeyed for KNL, but is obeyed for SKX, Cannonlake, and Icelake. For KNL, the only way to do masked operation is on 512-bit registers so we would have to completely disable masking to obey the preference. We would also lose support for gather, scatter, ctlz, vXi64 multiplies, etc. This may change in the future, but this simplifies the initial implementation.

Differential Revision: https://reviews.llvm.org/D41895

llvm-svn: 323016
2018-01-20 00:26:12 +00:00
Craig Topper 0d797a34d8 [X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface.
This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for.

I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10.

This has been split from D41895 with the remainder in a follow up commit.

llvm-svn: 323015
2018-01-20 00:26:08 +00:00
Derek Schuff a83a665cd4 [WebAssembly] Fix MSVC build
nullptr_t can't be used left of boolean &&

llvm-svn: 323012
2018-01-20 00:01:18 +00:00
Akira Hatanaka 73ceb50d85 [ObjCARC] Do not turn a call to @objc_autoreleaseReturnValue into a call
to @objc_autorelease if its operand is a PHI and the PHI has an
equivalent value that is used by a return instruction.

For example, ARC optimizer shouldn't replace the call in the following
example, as doing so breaks the AutoreleaseRV/RetainRV optimization:

  %v1 = bitcast i32* %v0 to i8*
  br label %bb3
bb2:
  %v3 = bitcast i32* %v2 to i8*
  br label %bb3
bb3:
  %p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ]
  %retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p
  %v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p)
  ret i32* %retval

Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's
operand uses with its value so that the call gets tail-called.

rdar://problem/15894705

llvm-svn: 323009
2018-01-19 23:51:13 +00:00
Rui Ueyama 20b49240b8 Fix -Wunused-variable.
llvm-svn: 323004
2018-01-19 22:56:04 +00:00
Lang Hames b72f48452c [ORC] Re-apply r322913 with a fix for a read-after-free error.
ExternalSymbolMap now stores the string key (rather than using a StringRef),
as the object file backing the key may be removed at any time.

llvm-svn: 323001
2018-01-19 22:24:13 +00:00
Jessica Paquette a499c3c29d Add optional DICompileUnit to DIBuilder + make outliner debug info use it
Previously, the DIBuilder didn't expose functionality to set its compile unit
in any other way than calling createCompileUnit. This meant that the outliner,
which creates new functions, had to create a new compile unit for its debug
info.

This commit adds an optional parameter in the DIBuilder's constructor which
lets you set its CU at construction.

It also changes the MachineOutliner so that it keeps track of the DISubprograms
for each outlined sequence. If debugging information is requested, then it
uses one of the outlined sequence's DISubprograms to grab a CU. It then uses
that CU to construct the DISubprogram for the new outlined function.

The test has also been updated to reflect this change.

See https://reviews.llvm.org/D42254 for more information. Also see the e-mail
discussion on D42254 in llvm-commits for more context.

llvm-svn: 322992
2018-01-19 21:21:49 +00:00
Ulrich Weigand 426f6bef44 [SystemZ] Prefer LOCHI over generating IPM sequences
On current machines we have load-on-condition instructions that can be
used to directly implement the SETCC semantics.  If we have those, it is
always preferable to use them instead of generating the IPM sequence.

llvm-svn: 322989
2018-01-19 20:56:04 +00:00
Ulrich Weigand 31112895d9 [SystemZ] Directly use CC result of compare-and-swap
In order to implement a test whether a compare-and-swap succeeded, the
SystemZ back-end currently emits a rather inefficient sequence of first
converting the CC result into an integer, and then testing that integer
against zero.  This commit changes the back-end to simply directly test
the CC value set by the compare-and-swap instruction.

llvm-svn: 322988
2018-01-19 20:54:18 +00:00
Ulrich Weigand 849a59fd4b [SystemZ] Rework IPM sequence generation
The SystemZ back-end uses a sequence of IPM followed by arithmetic
operations to implement the SETCC primitive.  This is currently done
early during SelectionDAG.  This patch moves generating those sequences
to much later in SelectionDAG (during PreprocessISelDAG).

This doesn't change much in generated code by itself, but it allows
further enhancements that will be checked-in as follow-on commits.

llvm-svn: 322987
2018-01-19 20:52:04 +00:00
Ulrich Weigand 9eb858c92f [SystemZ] Implement computeKnownBitsForTargetNode
This provides a computeKnownBits implementation for SystemZ target
nodes.  Currently only SystemZISD::SELECT_CCMASK is supported.

llvm-svn: 322986
2018-01-19 20:49:05 +00:00
Ulrich Weigand 5605be9e50 [SelectionDAG] Teach computeKnownBits about ATOMIC_CMP_SWAP_WITH_SUCCESS boolean return value
The second return value of ATOMIC_CMP_SWAP_WITH_SUCCESS is known to be a
boolean, and should therefore be treated by computeKnownBits just like
the second return values of SMULO / UMULO.

Differential Revision: https://reviews.llvm.org/D42067

llvm-svn: 322985
2018-01-19 20:47:14 +00:00
Sam Clegg 30e1bbc106 [WebAssembly] MC: Start table at offset 1 rather than 0
Summary:
For consistency with the output of lld.

This is useful in runnable binaries as can them be sure the
null function pointer will never be a valid argument
call_indirect.

Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D42284

llvm-svn: 322978
2018-01-19 18:57:01 +00:00
Joel Galenson dbc724f764 [ARM] Fix perf regression in compare optimization.
Fix a performance regression caused by r322737.

While trying to make it easier to replace compares with existing adds and
subtracts, I accidentally stopped it from doing so in some cases.  This should
fix that.  I'm also fixing another potential bug in that commit.

Differential Revision: https://reviews.llvm.org/D42263

llvm-svn: 322972
2018-01-19 17:46:27 +00:00
Derek Schuff bfb02aec5a [WebAssembly] Fix libcall signature lookup
RuntimeLibcallSignatures previously manually initialized all the libcall
names into an array and searched it linearly for the first match to lookup
the corresponding index.
r322802 switched that to initializing a map keyed by the libcall name.
Neither of these approaches works correctly because some libcall numbers use
the same name on different platforms (e.g. the "l" suffixed functions
use f80 or f128 or ppcf128).

This change fixes that by ensuring that each name only goes into the map
once. It also adds tests.

Differential Revision: https://reviews.llvm.org/D42271

llvm-svn: 322971
2018-01-19 17:45:54 +00:00
Dan Gohman 5d2b9354b1 [WebAssembly] Make sign-extension opcodes a distinct feature.
Sign-extension opcodes have been split into a separate proposal from
the main threads proposal, so switch them to their own target
feature. See:

https://github.com/WebAssembly/sign-extension-ops

llvm-svn: 322966
2018-01-19 17:16:24 +00:00
Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
Petr Hosek cc7a8f14bd Fallback option for colorized output when terminfo isn't available
Try to detect the terminal color support by checking the value of the
TERM environment variable. This is not great, but it's better than
nothing when terminfo library isn't available, which may still be the
case on some Linux distributions.

Differential Revision: https://reviews.llvm.org/D42055

llvm-svn: 322962
2018-01-19 17:10:55 +00:00
Carey Williams 22c49c6470 Test commit
llvm-svn: 322958
2018-01-19 16:55:23 +00:00
Sanjay Patel 74a1eef7c4 [x86] shrink 'and' immediate values by setting the high bits (PR35907)
Try to reverse the constant-shrinking that happens in SimplifyDemandedBits()
for 'and' masks when it results in a smaller sign-extended immediate.

We are also able to detect dead 'and' ops here (the mask is all ones). In
that case, we replace and return without selecting the 'and'.

Other targets might want to share some of this logic by enabling this under a
target hook, but I didn't see diffs for simple cases with PowerPC or AArch64,
so they may already have some specialized logic for this kind of thing or have
different needs.

This should solve PR35907:
https://bugs.llvm.org/show_bug.cgi?id=35907

Differential Revision: https://reviews.llvm.org/D42088

llvm-svn: 322957
2018-01-19 16:37:25 +00:00
Sanjay Patel 33cb84571f [InstSimplify] use m_Specific and commutative matcher to reduce code; NFCI
llvm-svn: 322955
2018-01-19 16:12:55 +00:00
Nirav Dave 72d32f24f5 [X86] Extend load-op-store fusion merge to ADC/SBB.
Summary: Add handling of EFLAG input to X86 Load-op-store fusion checking.

Reviewers: craig.topper, RKSimon

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D42128

llvm-svn: 322952
2018-01-19 15:37:57 +00:00
Sander de Smalen 909cf956a1 [AArch64][SVE] Asm: Add support for RDVL/ADDVL/ADDPL instructions
Reviewers: fhahn, rengolin, t.p.northover, echristo, olista01, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: SjoerdMeijer, aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41900

llvm-svn: 322951
2018-01-19 15:22:00 +00:00
Alexey Bataev fa80c47c6a [SLP] Fix vectorization for tree with trunc to minimum required bit width.
Summary:
If the vectorized tree has truncate to minimum required bit width and
the vector type of the cast operation after the truncation is the same
as the vector type of the cast operands, count cost of the vector cast
operation as 0, because this cast will be later removed.
Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner.

Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41948

llvm-svn: 322946
2018-01-19 14:40:13 +00:00
Dmitry Preobrazhensky 0e074e349d [AMDGPU][MC] Corrected parsing of image modifiers and encoding of image atomics
See bugs
    35962: https://bugs.llvm.org/show_bug.cgi?id=35962
    35963: https://bugs.llvm.org/show_bug.cgi?id=35963

Differential Revision: https://reviews.llvm.org/D42184

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322942
2018-01-19 13:49:53 +00:00
Francis Visoiu Mistrih 548add992e [CodeGen] Unify printing format of debug-location in both MIR and -debug
Use "debug-location" instead of "; dbg:" in MI::print.

llvm-svn: 322936
2018-01-19 11:44:42 +00:00
Hiroshi Inoue d24ddcd6c4 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 322934
2018-01-19 10:55:29 +00:00
Alina Sbirlea df26cf8117 [ModRefInfo] Return NoModRef for Must and NoModRef.
Summary:
In ModRefInfo "Must" was introduced to track presence of MustAlias, but we still want to return NoModRef when there is neither Mod or Ref, even when MustAlias is found. Patch has small fixes to ensure this happens.
Minor cleanup to remove nesting for 2 if statements when calling getModRefInfo for 2 ImmutableCallSites.

Reviewers: sanjoy

Subscribers: jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42209

llvm-svn: 322932
2018-01-19 10:26:40 +00:00
John Brawn 2867bd72c0 [InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr
Three (or more) operand getelementptrs could plausibly also be handled, but
handling only two-operand fits in easily with the existing BinaryOperator
handling.

Differential Revision: https://reviews.llvm.org/D39958

llvm-svn: 322930
2018-01-19 10:05:15 +00:00
Matthias Braun 4a7c8e7aa2 Split MachineLICM into EarlyMachineLICM and MachineLICM; NFC
This avoids playing games with pseudo pass IDs and avoids using an
unreliable MRI::isSSA() check to determine whether register allocation
has happened.

Note that this renames:
- MachineLICMID -> EarlyMachineLICM
- PostRAMachineLICMID -> MachineLICMID
to be consistent with the EarlyTailDuplicate/TailDuplicate naming.

llvm-svn: 322927
2018-01-19 06:46:10 +00:00
Matthias Braun 3ab9fcb98e Split TailDuplicatePass into pre- and post-RA variant; NFC
Split TailDuplicatePass into EarlyTailDuplicate and TailDuplicate. This
avoids playing games with fake pass IDs and using MRI::isSSA() to
determine pre-/post-RA state.

llvm-svn: 322926
2018-01-19 06:08:17 +00:00
Craig Topper f4cd9083ac [X86] Make better use of instregex for cmovcc/setcc/jcc instructions in the Intel scheduler models.
Combine all the separate condition codes into a singular expression when possible.

llvm-svn: 322924
2018-01-19 05:47:32 +00:00
Serguei Katkov 22bb1c0e17 Revert [CGP] Re-enable Select in complex addressing mode
One of buildbots failed. Revert for now till fix the issue.

llvm-svn: 322923
2018-01-19 04:52:39 +00:00
Matthias Braun 5c290dc206 AArch64: Fix emergency spillslot being out of reach for large callframes
Re-commit of r322200: The testcase shouldn't hit machineverifiers
anymore with r322917 in place.

Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.

This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
  after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
  callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
  pointer.
- Note that `useFPForScavengingIndex()` would previously return false
  when a base pointer was available leading to the emergency spillslot
  getting allocated late (that's the whole effect of this callback).
  Which made no sense to me so I took this case out: Even though the
  emergency spillslot is technically not referenced by FP in this case
  we still want it allocated early.

Differential Revision: https://reviews.llvm.org/D40876

llvm-svn: 322919
2018-01-19 03:16:36 +00:00
Matthias Braun dc4b3e87f4 AArch64: Omit callframe setup/destroy when not necessary
Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to
setup and the callframe size is 0.

- Fixes an invalid callframe nesting for byval arguments, which would
  look like this before this patch (as in `big-byval.ll`):
    ...
    ADJCALLSTACKDOWN 32768, 0, ...   # Setup for extfunc
    ...
    ADJCALLSTACKDOWN 0, 0, ...  # setup for memcpy
    ...
    BL &memcpy ...
    ADJCALLSTACKUP 0, 0, ...    # destroy for memcpy
    ...
    BL &extfunc
    ADJCALLSTACKUP 32768, 0, ...   # destroy for extfunc

- Saves us two instructions in the common case of zero-sized stackframes.
- Remove an unnecessary scheduling barrier (hence the small unittest
  changes).

Differential Revision: https://reviews.llvm.org/D42006

llvm-svn: 322917
2018-01-19 02:45:38 +00:00
Sam Clegg b6c5bc27c4 [WebAssembly] Add test expectations for gcc C++ tests (gcc/testsuite/g++.dg)
Differential Revision: https://reviews.llvm.org/D42226

llvm-svn: 322915
2018-01-19 01:40:52 +00:00
Lang Hames 44efd042a2 [ORC] Revert r322913 while I investigate an ASan failure.
llvm-svn: 322914
2018-01-19 01:40:26 +00:00
Lang Hames 817df9fa0c [ORC] Redesign the JITSymbolResolver interface to support bulk queries.
Bulk queries reduce IPC/RPC overhead for cross-process JITing and expose
opportunities for parallel compilation.

The two new query methods are lookupFlags, which finds the flags for each of a
set of symbols; and lookup, which finds the address and flags for each of a
set of symbols. (See doxygen comments for more details.)

The existing JITSymbolResolver class is renamed LegacyJITSymbolResolver, and
modified to extend the new JITSymbolResolver class using the following scheme:

- lookupFlags is implemented by calling findSymbolInLogicalDylib for each of the
symbols, then returning the result of calling getFlags() on each of these
symbols. (Importantly: lookupFlags does NOT call getAddress on the returned
symbols, so lookupFlags will never trigger materialization, and lookupFlags will
never call findSymbol, so only symbols that are part of the logical dylib will
return results.)

- lookup is implemented by calling findSymbolInLogicalDylib for each symbol and
falling back to findSymbol if findSymbolInLogicalDylib returns a null result.
Assuming a symbol is found its getAddress method is called to materialize it and
the result (if getAddress succeeds) is stored in the result map, or the error
(if getAddress fails) is returned immediately from lookup. If any symbol is not
found then lookup returns immediately with an error.

This change will break any out-of-tree derivatives of JITSymbolResolver. This
can be fixed by updating those classes to derive from LegacyJITSymbolResolver
instead.

llvm-svn: 322913
2018-01-19 01:12:40 +00:00
Craig Topper 84b26b90d1 [X86] Add intrinsic support for the RDPID instruction
This adds a new instrinsic to support the rdpid instruction. The implementation is a bit weird because the intrinsic is defined as always returning 32-bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode. But really it zeros the upper 32 bits. So I had to add separate patterns where 64-bit mode uses an extract_subreg.

Differential Revision: https://reviews.llvm.org/D42205

llvm-svn: 322910
2018-01-18 23:52:31 +00:00
Changpeng Fang ba6240cc71 AMDGPU/SI: Fix typos in d16 support patch the buffer intrinsics.
llvm-svn: 322906
2018-01-18 22:57:57 +00:00
Reid Kleckner 7897a789e7 [CodeView] Add line numbers for inlined call sites
We did this for inline call site line tables, but we hadn't done it for
regular function line tables yet. This patch copies that logic from
encodeInlineLineTable.

llvm-svn: 322905
2018-01-18 22:55:43 +00:00
Reid Kleckner b525872288 [CodeView] Sink complex inline functions to .cpp file, NFC
I'm cleaning up this code before I attempt to fix a line table bug.

llvm-svn: 322904
2018-01-18 22:55:14 +00:00
Changpeng Fang 4737e892de AMDGPU/SI: Add d16 support for image intrinsics.
Summary:
  This patch implements d16 support for image load, image store and image sample intrinsics.

Reviewers:
  Matt, Brian.

Differential Revision:
  https://reviews.llvm.org/D3991

llvm-svn: 322903
2018-01-18 22:08:53 +00:00
Eric Christopher 668e6b4b05 Typo fix SIBABRT -> SIGABRT.
Based on a patch by Henry Wong!

llvm-svn: 322902
2018-01-18 21:45:51 +00:00
Peter Collingbourne 719c1f7405 Support: Add missing #include.
This #include is necessary to provide the definitions of _fpclass
and _FPCLASS_NZ when building with libc++.

llvm-svn: 322885
2018-01-18 20:49:33 +00:00
Paul Robinson 8181d23b3d [DWARFv5] Number the line-table's directory array correctly.
The compilation directory has always been #0, but as of DWARF v5 it is
explicitly listed in the line-table section instead of implicitly
being a reference to the compile_unit DIE's DW_AT_comp_dir attribute.
This means the dumper should number the dumped array starting with 0
or 1 depending on the DWARF version of the line table.

References in the generated DWARF are correct, it's just the dumper
that was wrong.  Also some assembler-coded tests were similarly
confused about directory numbers.

llvm-svn: 322884
2018-01-18 20:33:35 +00:00
Amara Emerson d5785775f8 [AArch64][GlobalISel] Add isel support for global values in the large code model.
Fixes PR35958.

Differential Revision: https://reviews.llvm.org/D42175

llvm-svn: 322878
2018-01-18 19:21:27 +00:00
Ana Pazos 1b57c7a0f4 [RISCV] Fixed setting predicates for compressed instructions.
Summary:
Fixed setting predicates for compressed instructions.
Some instructions were being generated with C extension
enabled only, without proper checks for the other
required extensions like F, D and 32 and 64-bit target checks.
Affected instructions:
C_FLD, C_FLW, C_LD, C_FSD, C_FSW, C_SD,
C_JAL, C_ADDIW, C_SUBW, C_ADDW,
C_FLDSP, C_FLWSP, C_LDSP, C_FSDSP, C_FSWSP, C_SDSP

Reviewers: asb, shiva0217

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, llvm-commits

Differential Revision: https://reviews.llvm.org/D42132

llvm-svn: 322876
2018-01-18 18:54:05 +00:00
Zachary Turner 1bc2ce6b9b Speed up iteration of CodeView record streams.
There's some abstraction overhead in the underlying
mechanisms that were being used, and it was leading to an
abundance of small but not-free copies being made.  This
showed up on a profile.  Eliminating this and going back to
a low-level byte-based implementation speeds up lld with
/DEBUG between 10 and 15%.

Differential Revision: https://reviews.llvm.org/D42148

llvm-svn: 322871
2018-01-18 18:35:01 +00:00
Francis Visoiu Mistrih eb3f76fc83 [CodeGen][NFC] Rename IsVerbose to IsStandalone in Machine*::print
Committed r322867 too soon.

Differential Revision: https://reviews.llvm.org/D42239

llvm-svn: 322868
2018-01-18 18:05:15 +00:00
Francis Visoiu Mistrih 378b5f3de6 [CodeGen] Print RegClasses on MI in verbose mode
r322086 removed the trailing information describing reg classes for each
register.

This patch adds printing reg classes next to every register when
individual operands/instructions/basic blocks are printed. In the case
of dumping MIR or printing a full function, by default don't print it.

Differential Revision: https://reviews.llvm.org/D42239

llvm-svn: 322867
2018-01-18 17:59:06 +00:00
Sanjay Patel 0e690b05d1 [TargetLowering] add punctuation for readability; NFC
llvm-svn: 322855
2018-01-18 15:25:32 +00:00
Francis Visoiu Mistrih 586444e42b [CodeGen][NFC] Refactor MachineInstr::print
* Handle more cases where the MI is not attached yet
* Add similar asserts like in MIRPrinter::print

llvm-svn: 322848
2018-01-18 14:52:14 +00:00
Benjamin Kramer bfc1d976ca [HWAsan] Fix uninitialized variable.
Found by msan.

llvm-svn: 322847
2018-01-18 14:19:04 +00:00
Alex Bradbury 921383828e [RISCV] Codegen support for the standard RV32M instruction set extension
llvm-svn: 322843
2018-01-18 12:36:38 +00:00
Alex Bradbury 7d6aa1f7ae [RISCV] Implement frame pointer elimination
llvm-svn: 322839
2018-01-18 11:34:02 +00:00
Sam Parker 1f8c035d6b [SelectionDAG] Convert assert to condtion
Follow-up to r322120 which can cause assertions for AArch64 because
v1f64 and v1i64 are legal types.

Differential Revision: https://reviews.llvm.org/D42097

llvm-svn: 322823
2018-01-18 09:22:24 +00:00
Craig Topper 83b0a98902 [X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads.
Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything just like do for loads.

llvm-svn: 322820
2018-01-18 07:44:09 +00:00
Craig Topper 21c8a8fa49 [X86] Remove isel patterns for using unmasked vmovdqa32/vmovdqu32 for integer vector loads.
These patterns were just looking for a vXi64 bitcasted to vXi32, but there is no advantage to using vmovdqa32 over vmovdqa64.

llvm-svn: 322819
2018-01-18 07:44:06 +00:00
Rafael Espindola 8a1ca67461 Don't drop dso_local in LTO.
LTO sets dso_local as an optimization, so don't clear it.

This avoid clearing it from undefined hidden symbols, which would then
fail the verifier.

llvm-svn: 322814
2018-01-18 05:38:43 +00:00
Craig Topper 7f0d85ec1e [DAGCombiner] Add a DAG combine to turn a splat build_vector where the splat elemnt is a bitcast from a vector type into a concat_vector
For example, a build_vector of i64 bitcasted from v2i32 can be turned into a concat_vectors of the v2i32 vectors with a bitcast to a vXi64 type

Differential Revision: https://reviews.llvm.org/D42090

llvm-svn: 322811
2018-01-18 04:17:06 +00:00
Rafael Espindola 9fbc040599 Make GlobalValues with non-default visibilility dso_local.
This is similar to r322317, but for visibility. It is not as neat
because we have to special case extern_weak.

The idea is the same as the previous change, make the transition to
explicit dso_local easier for the frontends. With this they only have
to add dso_local to symbols where we need some external information to
decide if it is dso_local (like it being part of an ELF executable).

llvm-svn: 322806
2018-01-18 02:08:23 +00:00
Justin Bogner a9346e050f GlobalISel: Make MachineCSE runnable in the middle of the GlobalISel
Right now, it is not possible to run MachineCSE in the middle of the
GlobalISel pipeline. Being able to run generic optimizations between the
core passes of GlobalISel was one of the goals of the new ISel framework.
This is the first attempt to do it.

The problem is that MachineCSE pass assumes all register operands have a
register class, which, in GlobalISel context, won't be true until after the
InstructionSelect pass. The reason for this behaviour is that before
replacing one virtual register with another, MachineCSE pass (and most of
the other optimization machine passes) must check if the virtual registers'
constraints have a (sufficiently large) intersection, and constrain the
resulting register appropriately if such intersection exists.

GlobalISel extends the representation of such constraints from just a
register class to a triple (low-level type, register bank, register
class).

This commit adds MachineRegisterInfo::constrainRegAttrs method that extends
MachineRegisterInfo::constrainRegClass to such a triple.

The idea is that going forward we should use:

- RegisterBankInfo::constrainGenericRegister within GlobalISel's
  InstructionSelect pass
- MachineRegisterInfo::constrainRegClass within SelectionDAG ISel
- MachineRegisterInfo::constrainRegAttrs everywhere else regardless
  the target and instruction selector it uses.

Patch by Roman Tereshin. Thanks!

llvm-svn: 322805
2018-01-18 02:06:56 +00:00
Derek Schuff 53b3855b2b [WebAssembly] Remove duplicated RTLIB names
Remove the tight coupling between llvm/CodeGenRuntimeLibcalls.def and
the table of supported singatures for wasm. This will allow adding new libcalls
without changing wasm's signature table.

Also, some cleanup:
Use ManagedStatics instead of const tables to avoid memory/binary bloat.
Use a StringMap instead of a linear search for name lookup.

Differential Revision: https://reviews.llvm.org/D35592

llvm-svn: 322802
2018-01-18 01:15:45 +00:00
Volkan Keles 4aa73a649a Fix the failure caused by r322773
Do not run GlobalISel if `-fast-isel=0 -global-isel=false`.

llvm-svn: 322800
2018-01-18 01:10:30 +00:00
Jessica Paquette 729e68693f [MachineOutliner] Add DISubprograms to outlined functions.
Before, it wasn't possible to get backtraces inside outlined functions. This
commit adds DISubprograms to the IR functions created by the outliner which
makes this possible. Also attached a test that ensures that the produced
debug information is correct. This is useful to users that want to debug
outlined code.

llvm-svn: 322789
2018-01-18 00:00:58 +00:00
Reid Kleckner 1aa9061c5f [CodeGen] Hoist common AsmPrinter code out of X86, ARM, and AArch64
Every known PE COFF target emits /EXPORT: linker flags into a .drective
section. The AsmPrinter should handle this.

While we're at it, use global_values() and emit each export flag with
its own .ascii directive. This should make the .s file output more
readable.

llvm-svn: 322788
2018-01-17 23:55:23 +00:00
Evgeniy Stepanov 5bd669dc8f [hwasan] LLVM-level flags for linux kernel-compatible hwasan instrumentation.
Summary:
-hwasan-mapping-offset defines the non-zero shadow base address.
-hwasan-kernel disables calls to __hwasan_init in module constructors.
Unlike ASan, -hwasan-kernel does not force callback instrumentation.
This is controlled separately with -hwasan-instrument-with-calls.

Reviewers: kcc

Subscribers: srhines, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42141

llvm-svn: 322785
2018-01-17 23:24:38 +00:00
Volkan Keles a79b0620a0 Add a TargetOption to enable/disable GlobalISel
Summary:
This patch adds a new target option in order to control GlobalISel.
This will allow the users to enable/disable GlobalISel prior to the
backend by calling `TargetMachine::setGlobalISel(bool Enable)`.

No test case as there is already a test to check GlobalISel
command line options.
See: CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll.

Reviewers: qcolombet, aemerson, ab, dsanders

Reviewed By: qcolombet

Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42137

llvm-svn: 322773
2018-01-17 22:34:21 +00:00
Benjamin Kramer 8b1986b5cb Add support for emitting libcalls for x86_fp80 -> fp128 and vice-versa
compiler_rt doesn't provide them (yet), but libgcc does. PR34076.

llvm-svn: 322772
2018-01-17 22:29:16 +00:00
Easwaran Raman e5b8de2f1f Add a ProfileCount class to represent entry counts.
Summary:
The class wraps a uint64_t and an enum to represent the type of profile
count (real and synthetic) with some helper methods.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41883

llvm-svn: 322771
2018-01-17 22:24:23 +00:00
Eli Friedman c60a23a6af [LegalizeDAG] Fix ATOMIC_CMP_SWAP_WITH_SUCCESS legalization.
The code wasn't zero-extending correctly, so the comparison could
spuriously fail.

Adds some AArch64 tests to cover this case.

Inspired by D41791.

Differential Revision: https://reviews.llvm.org/D41798

llvm-svn: 322767
2018-01-17 22:04:36 +00:00
Javed Absar 1e28194a40 [SCEV] Fix typo. NFC.
Fix confusing typo in comment.

llvm-svn: 322765
2018-01-17 21:58:35 +00:00
Zaara Syeda c9dc7b451b Revert [PowerPC] This reverts commit rL322721
Failing build bots. Revert the commit now.

llvm-svn: 322748
2018-01-17 20:00:15 +00:00
Philip Reames f5ff5d584e [MDA] Use common code instead of reimplementing same. [NFC]
llvm-svn: 322747
2018-01-17 19:57:19 +00:00
Aditya Nandakumar 18b3f9d384 [GISel] Make constrainSelectedInstRegOperands() available to the legalizer. NFC
https://reviews.llvm.org/D42149

llvm-svn: 322743
2018-01-17 19:31:33 +00:00
Sam Clegg 9f3fe42e19 [WebAssembly] Remove debug names from symbol table
Get rid of DEBUG_FUNCTION_NAME symbols. When we actually debug
data, maybe we'll want somewhere to put it... but having a symbol
that just stores the name of another symbol seems odd.
It means you have multiple Symbols with the same name, one
containing the actual function and another containing the name!

Store the names in a vector on the WasmObjectFile when reading
them in. Also stash them on the WasmFunctions themselves.
The names are //not// "symbol names" or aliases or anything,
they're just the name that a debugger should show against the
function body itself. NB. The WasmObjectFile stores them so that
they can be exported in the YAML losslessly, and hence the tests
can be precise.

Enforce that the CODE section has been read in before reading
the "names" section. Requires minor adjustment to some tests.

Patch by Nicholas Wilson!

Differential Revision: https://reviews.llvm.org/D42075

llvm-svn: 322741
2018-01-17 19:28:43 +00:00
Rafael Espindola d700869235 Use a got to access a hidden weak undefined on MachO.
Trying to link

__attribute__((weak, visibility("hidden"))) extern int foo;
int *main(void) {
  return &foo;
}

on OS X fails with

ld: 32-bit RIP relative reference out of range (-4294971318 max is +/-2GB): from _main (0x100000FAB) to _foo@0x00001000 (0x00000000) in '_main' from test.o for architecture x86_64

The problem being that 0 cannot be computed as a fixed difference from
%rip. Exactly the same issue exists on ELF and we can use the same
solution.

llvm-svn: 322739
2018-01-17 19:19:55 +00:00
Joel Galenson bbcaf4ac5c [ARM] Optimize {s,u}mul.with.overflow.
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG.

Differential revision: https://reviews.llvm.org/D40922

llvm-svn: 322738
2018-01-17 19:19:05 +00:00
Joel Galenson fe7fa40869 [ARM] Optimize {s,u}{add,sub}.with.overflow.
The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with a "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other).

Differential revision: https://reviews.llvm.org/D38378

llvm-svn: 322737
2018-01-17 19:19:05 +00:00
Daniel Neilson 88dddb8948 [Attributes] Fix crash when attempting to remove alignment from an attribute list/set
Summary:
 Discovered while working on a patch to move alignment in
@llvm.memcpy/move/set from an arg into parameter attributes.

 The current implementations of AttributeSet::removeAttribute() and
AttributeList::removeAttribute crash when attempting to remove the
alignment attribute. Currently, these implementations add the
to-be-removed attributes to an AttrBuilder and then remove
the builder from the list/set. Alignment is special in that it
must be added to a builder with an integer value for the alignment;
attempts to add alignment to a builder without a value is an error.

 This change fixes the removeAttribute implementations for AttributeSet and
AttributeList to make them able to remove the alignment, and other similar,
attributes.

Reviewers: rnk, chandlerc, pete, javed.absar, reames

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41951

llvm-svn: 322735
2018-01-17 19:15:21 +00:00
Simon Pilgrim 8c87a2e7bd [X86][BTVER2] Reduce instregex usage (PR35955)
Most are just replaced with instrs lists, but a few regexps have been further generalized to match more instructions with a single pattern.

llvm-svn: 322734
2018-01-17 19:12:48 +00:00
Craig Topper b70ca5060f [X86] Teach LowerBUILD_VECTOR to recognize pair-wise splats of 32-bit elements and use a 64-bit broadcast
If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done.

We could probably could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement.

I've also restricted this to AVX2 only because we can only broadcast loads under AVX.

Differential Revision: https://reviews.llvm.org/D42086

llvm-svn: 322730
2018-01-17 18:58:22 +00:00
Craig Topper 279ace187a [X86] When legalizing (v64i1 select i8, v64i1, v64i1) make sure not to introduce bitcasts to i64 in 32-bit mode
We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32 which is fine.

The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to cleanup the bitcast.

This fixes pr35972.

llvm-svn: 322724
2018-01-17 18:46:01 +00:00
Zaara Syeda 8e951fd2f6 [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 322721
2018-01-17 18:22:55 +00:00
Tatyana Krasnukha 8979eea04e [ARC] Add missing condition codes.
Summary: Added VS and VC, required for disassembling.

Reviewers: petecoup

Reviewed By: petecoup

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42172

llvm-svn: 322718
2018-01-17 17:58:28 +00:00
Jonas Paulsson ef785694f2 [SystemZ] Handle BRCTH branches correctly in SystemZLongBranch.cpp.
BRCTH is capable of a long branch which needs to be recognized during branch
relaxation. This is done by checking for ExtraRelaxSize == 0.

Review: Ulrich Weigand
llvm-svn: 322688
2018-01-17 17:16:07 +00:00
Matt Arsenault 1491ca8911 AMDGPU: Error in SIAnnotateControlFlow instead of assert
This assert typically happens if an unstructured CFG is passed
to the pass. This can happen if the pass is run independently
without the structurizer.

llvm-svn: 322685
2018-01-17 16:30:01 +00:00
Diana Picus 01bcfd2112 [ARM GlobalISel] Rename local variable. NFC
llvm-svn: 322667
2018-01-17 15:25:37 +00:00
Pablo Barrio f2c29571da [AArch64] Fix incorrect LD1 of 16-bit FP vectors in big endian
Summary:
Loading a vector of 4 half-precision FP sometimes results in an LD1
of 2 single-precision FP + a reversal. This results in an incorrect
byte swap due to the conversion from little endian to big endian.

In order to generate the correct byte swap, it is easier to
generate the correct LD1 of 4 half-precision FP, thus avoiding the
subsequent reversal.

Reviewers: craig.topper, jmolloy, olista01

Reviewed By: olista01

Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41863

llvm-svn: 322663
2018-01-17 14:39:29 +00:00
Sanjay Patel aa766efd09 [InstCombine] fix demanded-bits propagation for zext/trunc
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.

llvm-svn: 322662
2018-01-17 14:39:28 +00:00
Alex Bradbury d93f889d89 [RISCV] Allow RISCVAsmBackend::writeNopData to generate c.nop when supported
When the compressed instruction set is enabled, the 16-bit c.nop can be
generated if necessary.

Differential Revision: https://reviews.llvm.org/D41221
Patch by Shiva Chen.

llvm-svn: 322658
2018-01-17 14:17:12 +00:00
Diana Picus c62a16234b [ARM GlobalISel] Map G_FPEXT and G_FPTRUNC to FPR
llvm-svn: 322657
2018-01-17 14:14:14 +00:00
Daniil Fukalov d5fca554e2 [AMDGPU] add LDS f32 intrinsics
added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D37985

llvm-svn: 322656
2018-01-17 14:05:05 +00:00
Dmitry Preobrazhensky 6b65f7c380 [AMDGPU][MC][GFX9] Enable inline constants for SDWA operands
See bug 35771: https://bugs.llvm.org/show_bug.cgi?id=35771

Differential Revision: https://reviews.llvm.org/D42058

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322655
2018-01-17 14:00:48 +00:00
Diana Picus 65ed364fac [ARM GlobalISel] Legalize G_FPEXT and G_FPTRUNC
Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware
support, but only for conversions between float and double.

Also add the necessary boilerplate so that the LegalizerHelper can
introduce the required libcalls. This also works only for float and
double, but isn't too difficult to extend when the need arises.

llvm-svn: 322651
2018-01-17 13:34:10 +00:00
Ivan A. Kosarev 4d0ff0c74d [Transforms] Support making mutable versions of new-format TBAA access tags
Differential Revision: https://reviews.llvm.org/D41565

llvm-svn: 322650
2018-01-17 13:29:54 +00:00
Benjamin Kramer 8d073a2c2d [X86] Don't mutate shuffle arguments after early-out for AVX512
The match* functions have the annoying behavior of modifying its inputs.
Save and restore the inputs, just in case the early out for AVX512 is
hit. This is still not great and its only a matter of time this kind of
bug happens again, but I couldn't come up with a better pattern without
rewriting significant chunks of this code. Fixes PR35977.

llvm-svn: 322644
2018-01-17 13:01:06 +00:00
Benjamin Kramer 05dc3527de [X86] Constify DebugLoc parameters. No functionality change.
llvm-svn: 322643
2018-01-17 13:00:58 +00:00
Hiroshi Inoue 8f976ba0bf [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 322636
2018-01-17 12:29:38 +00:00
Pavel Labath 67530e478b Don't emit apple accelerator tables on non-darwin targets
Summary:
Currently -glldb turns on emission of apple tables on all targets, but
lldb is only really capable of consuming them on darwin. Furthermore,
making lldb consume these tables is not straight-forward because of the
differences in how the debug info is distributed on darwin vs. elf
targets.

The darwin debug model assumes that the debug info (along with
accelerator tables) will either remain in the .o files or it will be
linked into a dsym bundle by a linker that knows how to merge these
tables. In the elf world, all present linkers will simply concatenate
these accelerator tables into the shared object. Since the tables are
not self-terminating, this renders the tables unusable, as the debugger
cannot pry the individual tables apart anymore.

It might theoretically be possible to make the tables work with split
dwarf, as that is somewhat similar to the apple .o model, but
unfortunately right now the combination of -glldb and -gsplit-dwarf
produces broken object files.

Until these issues are resolved there is no point in emitting the apple
tables for these targets. At best, it wastes space; at worst, it breaks
compilation and prevents the user from getting other benefits of -glldb.

Reviewers: probinson, aprantl, dblaikie

Subscribers: emaste, dim, llvm-commits, JDevlieghere

Differential Revision: https://reviews.llvm.org/D41986

llvm-svn: 322633
2018-01-17 11:52:13 +00:00