Commit Graph

112020 Commits

Author SHA1 Message Date
Sanjay Patel cbb0450540 [InstCombine] add folds for icmp + sub (PR36969)
(A - B) >u A --> A <u B
C <u (C - D) --> C <u D

https://rise4fun.com/Alive/e7j

Name: ugt
  %sub = sub i8 %x, %y
  %cmp = icmp ugt i8 %sub, %x
=>
  %cmp = icmp ult i8 %x, %y
  
Name: ult
  %sub = sub i8 %x, %y
  %cmp = icmp ult i8 %x, %sub
=>
  %cmp = icmp ult i8 %x, %y
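
As a cross-check independent of the Alive link, both folds can be verified by brute force at 8 bits. The sketch below is not part of the patch; `sub_wrap` and `check_folds` are hypothetical helper names, and the unsigned comparisons are modelled with plain Python integers masked to 8 bits.

```python
BITS = 8
MASK = (1 << BITS) - 1

def sub_wrap(a, b):
    """Subtraction truncated to BITS bits, modelling `sub i8`."""
    return (a - b) & MASK

def check_folds():
    """Return True if both folds hold for every pair of 8-bit values."""
    for x in range(MASK + 1):
        for y in range(MASK + 1):
            sub = sub_wrap(x, y)
            # (A - B) >u A  -->  A <u B
            if (sub > x) != (x < y):
                return False
            # C <u (C - D)  -->  C <u D
            if (x < sub) != (x < y):
                return False
    return True

print(check_folds())
```

The check only covers i8; the Alive proofs above are what establish the folds in general.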

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=36969

llvm-svn: 329011
2018-04-02 20:37:40 +00:00
Harlan Haskins bee4b5894a Fix header mismatch in DIBuilder Type APIs
Some of the headers changed slightly, and the accompanying
implementation didn't change. This caused a silent failure.

llvm-svn: 329003
2018-04-02 19:11:44 +00:00
Zachary Turner d11328a1bb [llvm-pdbutil] Add an export subcommand.
This command can dump the binary contents of a stream to a file.
This is useful when you want to do side-by-side comparisons of
a specific stream from two PDBs to examine the differences between
them.  You can export both of them to a file, then open them up
side by side in a hex editor (for example), so as to eliminate any
differences that might arise from the contents being on different
blocks in the PDB.

In subsequent patches I plan to improve the "explain" subcommand
so that you can explain the contents of a binary file that isn't
necessarily a full PDB, but one of these dumped streams, by telling
the subcommand how to interpret the contents.

llvm-svn: 329002
2018-04-02 18:35:21 +00:00
Nico Weber 868112181b Remove HAVE_LIBPSAPI, HAVE_SHELL32.
These used to be set in the old autoconf build, but the cmake build has had a
"TODO: actually check for these" comment since it was checked in, and they
were set to 1 on mingw unconditionally.  It seems safe to say that they always
exist under mingw, so just remove them and assume they're set exactly when on
mingw (with msvc, we use `pragma comment` instead of linking these via flags).

llvm-svn: 328992
2018-04-02 17:32:48 +00:00
Rong Xu 5a8d4c3357 [DeadArgumentElim] Clone function level metadatas
Some Function level metadatas, such as function entry count, are not cloned in
DeadArgumentElim. This happens a lot in lto/thinlto because of DeadArgumentElim
after internalization.

This patch clones the metadatas in the original function to the new function.

Differential Revision: https://reviews.llvm.org/D44127

llvm-svn: 328991
2018-04-02 17:27:38 +00:00
Nico Weber f3db8e3c70 Remove HAVE_DIRENT_H.
The autoconf manual: "This macro is obsolescent, as all current systems with
directory libraries have <dirent.h>. New programs need not use this macro."

llvm-svn: 328989
2018-04-02 17:17:29 +00:00
Dmitry Preobrazhensky b181c7312e [AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16
See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847

Differential Revision: https://reviews.llvm.org/D45097

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328988
2018-04-02 17:09:20 +00:00
Gor Nishanov b0316d96ae [coroutines] Add support for llvm.coro.noop intrinsics
Summary:
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing.
To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed.

Reviewers: EricWF, modocache, rnk, lewissbaker

Reviewed By: modocache

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45114

llvm-svn: 328986
2018-04-02 16:55:12 +00:00
Dmitry Preobrazhensky 6bad04ecf5 [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions
Fixed a bug which caused Tablegen crash.

See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837

Differential Revision: https://reviews.llvm.org/D45085

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328983
2018-04-02 16:10:25 +00:00
Krzysztof Parzyszek 0831f57afe [Hexagon] Clean up some code in HexagonAsmPrinter, NFC
llvm-svn: 328981
2018-04-02 15:06:55 +00:00
Alexey Bataev 3decaf4275 [SLP] Fix PR36481: vectorize reassociated instructions.
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.

Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43776

llvm-svn: 328980
2018-04-02 14:51:37 +00:00
Nico Weber f492f58182 Revert r328975, it makes TableGen assert on the bots.
llvm-svn: 328978
2018-04-02 14:20:23 +00:00
Nico Weber 9f03e9de77 Remove HAVE_WRITEV that's unused after r255837.
llvm-svn: 328977
2018-04-02 14:18:13 +00:00
Dmitry Preobrazhensky 32c450ae6a [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837

Differential Revision: https://reviews.llvm.org/D45085

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328975
2018-04-02 13:52:23 +00:00
Nico Weber 2eada78a50 Attempt to heal bots after r328970.
llvm-svn: 328974
2018-04-02 13:49:35 +00:00
Lama Saba 927468309f [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs when a store cannot be forwarded to the load. The most typical case of a store forward block on the Intel Core microarchitecture is that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.

This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.

The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies.
Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
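
The idea of breaking one wide copy into smaller ones can be illustrated with a toy model. This is only a sketch under stated assumptions, not the pass's actual algorithm: `split_copy` is a hypothetical helper, offsets are bytes relative to the start of the copied range, and the candidate widths are an arbitrary illustrative choice.

```python
def split_copy(copy_size, store_off, store_size, widths=(16, 8, 4, 2, 1)):
    """Return (offset, width) pieces covering [0, copy_size).

    The blocking store's byte range becomes its own piece, so the load that
    re-reads it is no wider than the store. The remaining bytes are covered
    greedily with the largest width that still fits.
    """
    def cover(start, end):
        pieces = []
        while start < end:
            width = next(w for w in widths if w <= end - start)
            pieces.append((start, width))
            start += width
        return pieces

    store_end = store_off + store_size
    if store_off < 0 or store_end > copy_size:
        raise ValueError("store must lie inside the copied range")
    return (cover(0, store_off)
            + [(store_off, store_size)]
            + cover(store_end, copy_size))

# A 16-byte copy preceded by a 4-byte store at offset 4.
print(split_copy(16, 4, 4))
```

In this model the 16-byte load that would have been blocked is replaced by three loads, one of which matches the earlier store exactly.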

Differential revision: https://reviews.llvm.org/D41330

Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
llvm-svn: 328973
2018-04-02 13:48:28 +00:00
Hiroshi Inoue 6d48493817 [PowerPC] fix assertion failure due to missing instruction in P9InstrResources.td
This patch adds L(D|W|H|B)XTLS instructions introduced by https://reviews.llvm.org/rL327635 in P9InstrResources.td.

llvm-svn: 328969
2018-04-02 12:18:21 +00:00
Jonas Devlieghere 9e3e7a99e8 [dsymutil] Upstream emitting of papertrail warnings.
When running dsymutil as part of your build system, it can be desirable
for warnings to be part of the end product, rather than just being
emitted to the output stream. This patch upstreams that functionality.

Differential revision: https://reviews.llvm.org/D44639

llvm-svn: 328965
2018-04-02 10:40:43 +00:00
Craig Topper 96729cd64b [X86][Silvermont] Use correct latency and throughput information for divide and square root in the scheduler model.
Data taken from Table 16-17 in the Intel Optimization Manual.

llvm-svn: 328962
2018-04-02 06:34:16 +00:00
Craig Topper 6a814904da [X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide.
Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/

llvm-svn: 328961
2018-04-02 05:54:34 +00:00
Craig Topper 8104f266a4 [X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models.
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.

llvm-svn: 328960
2018-04-02 05:33:28 +00:00
Craig Topper dc74094398 [X86] Fix the SchedRW for AVX512 shift instructions.
It was being inadvertently defaulted to an FADD scheduler class.

llvm-svn: 328959
2018-04-02 03:15:02 +00:00
Craig Topper 5fb1dc2d22 [X86] Give the AVX512 VEXTRACT instructions the same SchedRWs as the SSE/AVX versions.
llvm-svn: 328958
2018-04-02 02:44:55 +00:00
Craig Topper caec723a1a [X86] Add an itinerary to BTR64rr.
llvm-svn: 328956
2018-04-02 01:12:34 +00:00
Craig Topper 02daec00a2 [X86] Make sure all the classes declared in the Haswell scheduler model are prefixed with HW.
The tablegen files all share a namespace so we shouldn't use generic names in a specific scheduler model.

llvm-svn: 328955
2018-04-02 01:12:32 +00:00
Craig Topper c90d906b16 [X86] Give VINSERTPS the same itinerary as INSERTPS.
llvm-svn: 328954
2018-04-02 00:48:11 +00:00
Harlan Haskins b7881bbfa2 Add C API bindings for DIBuilder 'Type' APIs
This patch adds a set of unstable C API bindings to the DIBuilder interface for
creating structure, function, and aggregate types.

This patch also removes the existing implementations of these functions from
the Go bindings and updates the Go API to fit the new C APIs.

llvm-svn: 328953
2018-04-02 00:17:40 +00:00
Craig Topper dc4a6d1ef6 [X86] Cleanup ADCX/ADOX instruction definitions.
Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW.

llvm-svn: 328952
2018-04-01 23:58:50 +00:00
Petr Hosek 934e5d5436 [AArch64] Reserve x18 register on Fuchsia
This register is reserved as a platform register on Fuchsia.

Differential Revision: https://reviews.llvm.org/D45105

llvm-svn: 328950
2018-04-01 23:44:04 +00:00
Craig Topper 8a1787ae22 [DebugCounter] Make -debug-counter cl::Hidden.
llvm-svn: 328948
2018-04-01 22:16:52 +00:00
Craig Topper f5730c38e9 [LegacyPassManager] Make 'print-module-scope' cl::Hidden like the rest of the printing options.
llvm-svn: 328947
2018-04-01 21:54:26 +00:00
Craig Topper 9f834810ea [X86] Give ADC8/16/32/64mi the same scheduling information as ADC8/16/32/64mr and SBB8/16/32/64mi.
It doesn't make a lot of sense that it would be different.

llvm-svn: 328946
2018-04-01 21:54:24 +00:00
Chandler Carruth 4244625c51 [x86] Correct the operand structure of the ADOX instruction.
This also moves to define it in the same way as ADCX which seems to use
constraints a bit better.

This is pulled out of the review for reducing the use of popf for
restoring EFLAGS, but is independent. There are still more problems with
our definitions for these instructions that Craig is going to look at
but this is at least less broken and he can start from this to improve
them more fully.

Thanks to Craig for the review here.

llvm-svn: 328945
2018-04-01 21:53:18 +00:00
Chandler Carruth 06b343c6ed [x86] Expose more of the condition conversion routines in the public API
for X86's instruction information. I've now got a second patch under
review that needs these same APIs. This bit is nicely orthogonal and
obvious, so landing it. NFC.

llvm-svn: 328944
2018-04-01 21:47:55 +00:00
Nicolai Haehnle 4254d45a79 AMDGPU: Make isIntrinsicSourceOfDivergence table-driven
Summary:
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.

Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44938

llvm-svn: 328939
2018-04-01 17:09:14 +00:00
Nicolai Haehnle 5d0d30304c AMDGPU: Make getTgtMemIntrinsic table-driven for resource-based intrinsics
Summary:
Avoids having to list all intrinsics manually.

This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.

Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44937

llvm-svn: 328938
2018-04-01 17:09:07 +00:00
Mandeep Singh Grang fe1d28e83d [DebugInfo] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer to the comments section in D44363 for a list of all the required patches.
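
The effect of shuffling before sorting can be sketched outside LLVM. The example below is an illustration in Python rather than the C++ wrapper from r327219; `shuffled_sort` and `order_is_deterministic` are hypothetical names. Python's `list.sort` is stable, so without the shuffle equal-key elements would always keep their input order and the dependence would stay hidden.

```python
import random

def shuffled_sort(items, key, rng):
    """Sort a copy of `items` by `key` after shuffling it first.

    Shuffling makes the relative order of equal-key elements arbitrary, so
    code that silently relies on that order fails visibly.
    """
    items = list(items)
    rng.shuffle(items)
    items.sort(key=key)
    return items

def order_is_deterministic(items, key, trials=50, seed=0):
    """Report whether repeated shuffled sorts always give the same sequence."""
    rng = random.Random(seed)
    first = shuffled_sort(items, key, rng)
    return all(shuffled_sort(items, key, rng) == first for _ in range(trials))

records = [("a", 1), ("b", 1), ("c", 2)]
# Sorting on the number alone leaves "a" and "b" tied.
print(order_is_deterministic(records, key=lambda r: r[1]))
# Sorting on the whole record breaks every tie.
print(order_is_deterministic(records, key=lambda r: r))
```

A comparator that orders every pair of distinct elements is unaffected by the shuffle, which is the property the series of patches relies on.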

Reviewers: echristo, zturner, samsonov

Reviewed By: echristo

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D45134

llvm-svn: 328935
2018-04-01 16:18:49 +00:00
Teresa Johnson 974706ebf7 [ThinLTO] Add an import cutoff for debugging/triaging
Summary:
Adds -import-cutoff=N which will stop importing during the thin link
after N imports. Default is -1 (no limit).
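
A cutoff of this kind lends itself to bisection, which is one plausible triaging use and an assumption here rather than something the patch states. The sketch below models only the counting behaviour; `import_with_cutoff`, `first_bad_cutoff` and `is_broken` are hypothetical names, not LLVM APIs.

```python
def import_with_cutoff(candidates, cutoff=-1):
    """Return the candidates that would be imported under `cutoff`.

    A negative cutoff means no limit, matching the default of -1.
    """
    imported = []
    for candidate in candidates:
        if cutoff >= 0 and len(imported) >= cutoff:
            break
        imported.append(candidate)
    return imported

def first_bad_cutoff(candidates, is_broken):
    """Binary-search the smallest cutoff N for which `is_broken` holds.

    Assumes the result is fine at cutoff 0, broken with every import
    applied, and that adding imports never un-breaks it.
    """
    lo, hi = 0, len(candidates)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(import_with_cutoff(candidates, mid)):
            hi = mid
        else:
            lo = mid
    return hi

functions = ["f", "g", "h", "i"]
# Pretend that importing "h" is what triggers the problem.
print(first_bad_cutoff(functions, lambda imported: "h" in imported))
```

The import that lands at the reported cutoff is the first one whose presence makes the check fail.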

Reviewers: wmi

Subscribers: inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D45127

llvm-svn: 328934
2018-04-01 15:54:40 +00:00
David Green f80ebc8d21 [LoopRotate] Rotate loops with loop exiting latches
If a loop has a loop exiting latch, it can be profitable
to rotate the loop if it leads to the simplification of
a phi node. Perform rotation in these cases even if loop
rotate itself didn't simplify the loop to get there.

Differential Revision: https://reviews.llvm.org/D44199

llvm-svn: 328933
2018-04-01 12:48:24 +00:00
Craig Topper 9b8cd5fe55 [X86] Don't check for folding into a store when deciding if we can promote an i16 mul.
There's no RMW mul operation.

llvm-svn: 328931
2018-04-01 06:29:32 +00:00
Craig Topper db6caabccc [X86] Check if the load and store are to the same pointer before preventing i16 RMW shifts and subtracts from being promoted.
llvm-svn: 328930
2018-04-01 06:29:28 +00:00
Craig Topper ae2de57db0 [X86] Allow i16 subtracts to be promoted if the load is on the LHS and it's not being stored.
llvm-svn: 328928
2018-04-01 06:29:25 +00:00
Craig Topper 9bc0d881a3 [X86] Remove unneeded temporary variable. NFC
This Promote flag was always set to true except in the default case. But in the default case we don't need to set PVT and can just return false.

llvm-svn: 328926
2018-04-01 06:29:21 +00:00
Mandeep Singh Grang 97bcade70f [Analysis] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer to D44363 for a list of all the required patches.

Reviewers: sanjoy, dexonsmith, hfinkel, RKSimon

Reviewed By: dexonsmith

Subscribers: david2050, llvm-commits

Differential Revision: https://reviews.llvm.org/D44944

llvm-svn: 328925
2018-04-01 01:46:51 +00:00
Sanjay Patel 6124cae8f7 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, 
so replace a pair of casts with the equivalent node. We don't have to account for 
special cases (NaN, INF) because out-of-range casts are undefined.
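
The equivalence for in-range values can be illustrated numerically. This is a sketch in Python, not the DAG combine itself: `cast_round_trip` and `trunc_reference` are hypothetical names, Python integers are unbounded so the out-of-range case is not modelled, and the comparison is by value, so it does not distinguish -0.0 from 0.0.

```python
import math

def cast_round_trip(f):
    """Model (float)((int) f): convert to an integer, then back to float."""
    return float(int(f))

def trunc_reference(f):
    """Round toward zero, written out with floor and ceil."""
    return float(math.floor(f) if f >= 0 else math.ceil(f))

def matches_trunc(values):
    """Check that the cast pair agrees with round-toward-zero on `values`."""
    return all(cast_round_trip(v) == trunc_reference(v) for v in values)

samples = [0.0, 1.9, -1.9, 2.5, -2.5, 1000000.75, -123.999]
print(matches_trunc(samples))
```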

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 328921
2018-03-31 17:55:44 +00:00
Simon Pilgrim 3b8ad346f9 [X86][Btver2] Add MMX_PSHUFB to the JWritePSHUFB InstRW entries
llvm-svn: 328918
2018-03-31 09:15:54 +00:00
Simon Pilgrim 8c8ebd7945 Fix trailing whitespace. NFCI.
llvm-svn: 328917
2018-03-31 09:14:14 +00:00
Puyan Lotfi 57c4f38c35 [MIR-Canon] Adding support for local idempotent instruction hoisting.
llvm-svn: 328915
2018-03-31 05:48:51 +00:00
Craig Topper 13a0f83a05 [X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies, often using a multi-uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput, but Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.

This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.

I also left a FIXME in the SandyBridge model because using it as the "generic" model is too optimistic for the 256/512-bit versions, since those are multiple uops on all known CPUs.

Reviewers: RKSimon, GGanesh, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, gbedwell, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D44972

llvm-svn: 328914
2018-03-31 04:54:32 +00:00
Teresa Johnson db83aceb06 [ThinLTO] Add an option to force summary call edges cold for debugging
Summary:
Useful to selectively disable importing into specific modules for
debugging/triaging/workarounds.

Reviewers: eraman

Subscribers: inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D45062

llvm-svn: 328909
2018-03-31 00:18:08 +00:00