
55295 Commits

Author SHA1 Message Date
Michael Berg ed89d069f4 add a missed case for binary op FMF propagation under select folds
llvm-svn: 339938
2018-08-16 20:59:45 +00:00
Philip Reames 684fa57ef7 [MemLoc] Fix a bug causing any use of invariant.end to crash in LICM
The fix is fairly simple, but it says something unpleasant about the usage and testing of invariant.start/end scopes that this went undetected. To put this in perspective, *any* invariant.end in a loop flowing through LICM crashed. I haven't bothered to figure out just how far back this goes, but it's not caused by any of the recent changes. We're probably talking months if not years.

llvm-svn: 339936
2018-08-16 20:48:55 +00:00
Krzysztof Parzyszek bb1aede865 [SystemZ] Require asserts in subregliveness-06.mir
The option -misched=shuffle is only available with !NDEBUG builds.

llvm-svn: 339931
2018-08-16 20:12:15 +00:00
Peter Collingbourne 3da2ffb826 Add missing test file from r339799.
llvm-svn: 339927
2018-08-16 19:29:01 +00:00
Craig Topper 3dfc5af178 [X86] Pre-commit test case for D50827.
llvm-svn: 339926
2018-08-16 19:27:43 +00:00
Krzysztof Parzyszek 9af86a5e01 [MachineVerifier] Check if predecessor is jointly dominated by undefs
Each use of a value should be jointly dominated by the union of defs and
undefs. It can happen that it will only be jointly dominated by undefs,
and that is still legal. Make sure that the verifier is aware of that.

llvm-svn: 339924
2018-08-16 19:13:28 +00:00
Eli Friedman 73e8a784e6 [SelectionDAG] Improve the legalisation lowering of UMULO.
There is no way in the universe that doing a full-width division in
software will be faster than doing overflowing multiplication in
software in the first place, especially given that this same full-width
multiplication needs to be done anyway.

This patch replaces the previous implementation with a direct lowering
into an overflowing multiplication algorithm based on half-width
operations.

Correctness of the algorithm was verified by exhaustively checking the
output of this algorithm for overflowing multiplication of 16-bit
integers against an obviously correct widening multiplication. Barring
any oversights introduced by porting the algorithm to DAG, confidence in
correctness of this algorithm is extremely high.
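The message does not list the steps of the half-width algorithm. As a hedged illustration, the C++ sketch below implements one common scheme of that kind for 16-bit operands split into 8-bit halves, alongside a widening-multiply reference like the one the commit used for verification. The scheme and the names `umulo16` and `matchesWidening` are assumptions for illustration, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumed half-width scheme (illustrative, not taken from the patch).
// Returns {a * b mod 2^16, true if the exact product needs more than 16 bits},
// using only 8-bit halves and products of two 8-bit values.
std::pair<uint16_t, bool> umulo16(uint16_t a, uint16_t b) {
  const uint32_t aH = a >> 8, aL = a & 0xFFu;
  const uint32_t bH = b >> 8, bL = b & 0xFFu;

  // Both high halves non-zero: the product is at least 2^16.
  bool overflow = aH != 0 && bH != 0;

  // Cross products (each fits in 16 bits). If 'overflow' is still false,
  // at least one of them is zero, and anything wider than 8 bits would be
  // shifted past bit 15.
  const uint32_t c1 = aH * bL;
  const uint32_t c2 = aL * bH;
  overflow = overflow || (c1 >> 8) != 0 || (c2 >> 8) != 0;

  // Low 16 bits of ((c1 + c2) << 8) + aL * bL; a carry out of the final
  // 16-bit addition also means overflow.
  const uint16_t mid = static_cast<uint16_t>((c1 + c2) << 8);
  const uint16_t low = static_cast<uint16_t>(aL * bL);
  const uint16_t result = static_cast<uint16_t>(mid + low);
  overflow = overflow || result < low;
  return {result, overflow};
}

// Reference check: compare against an obviously correct widening multiply.
bool matchesWidening(uint16_t a, uint16_t b) {
  const uint32_t wide = static_cast<uint32_t>(a) * b;
  const std::pair<uint16_t, bool> r = umulo16(a, b);
  return r.first == static_cast<uint16_t>(wide) && r.second == (wide > 0xFFFFu);
}
```

Looping `matchesWidening` over all 2^32 operand pairs reproduces the kind of exhaustive 16-bit check the message describes.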

The following table shows the change in both t = runtime and s = space. The
change is expressed as a multiplier of the original, so anything under 1 is
“better” and anything above 1 is worse.

+-------+-----------+-----------+-------------+-------------+
| Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
+-------+-----------+-----------+-------------+-------------+
|   X64 |     -     |     -     |    ~0.5     |    ~0.64    |
|  i686 |   ~0.5    |   ~0.6666 |    ~0.05    |    ~0.9     |
| armv7 |     -     |   ~0.75   |      -      |    ~1.4     |
+-------+-----------+-----------+-------------+-------------+

Performance numbers have been collected by running overflowing
multiplication in a loop under `perf` on two x86_64-based machines (one
Intel Haswell, the other AMD Ryzen). Size numbers have been collected by
looking at the size of a function containing an overflowing multiply in
a loop.

All in all, it can be seen that both performance and size have improved,
except in the case of armv7, where code size has regressed for 128-bit
multiply. u128*u128 overflowing multiply on 32-bit platforms seems to
benefit from this change a lot, taking only 5% of the time compared to the
original algorithm to calculate the same thing.

The final benefit of this change is that LLVM is now capable of lowering
the overflowing unsigned multiply for integers of any bit-width as long
as the target is capable of lowering regular multiplication for the same
bit-width. Previously, 128-bit overflowing multiply was the widest
possible.

Patch by Simonas Kazlauskas!

Differential Revision: https://reviews.llvm.org/D50310

llvm-svn: 339922
2018-08-16 18:39:39 +00:00
Jordan Rupprecht d1767dc56f [llvm-strip] Add support for -p/--preserve-dates
Summary: [llvm-strip] Preserve access/modification timestamps when -p is used.

Reviewers: jakehehrlich, jhenderson, alexshap

Reviewed By: jhenderson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50744

llvm-svn: 339921
2018-08-16 18:29:40 +00:00
Krzysztof Parzyszek 17143f6111 [RegisterCoalescer] Shrink to uses if needed after removeCopyByCommutingDef
llvm-svn: 339912
2018-08-16 18:02:59 +00:00
Simon Pilgrim 87d0039a45 [TargetLowering] Add support for non-uniform vectors to BuildSDIV
This patch refactors the existing TargetLowering::BuildSDIV base implementation to support non-uniform constant vector denominators.

This is the last patch necessary to close PR36545
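To illustrate what a non-uniform constant divisor means (a scalar C++ model, not the SelectionDAG code from the patch), each lane below gets its own magic multiplier and shift, computed with the classic Hacker's Delight construction. Lanes dividing by +1 or -1 are special-cased because that construction needs |d| >= 2. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Magic multiplier and post-shift for signed 32-bit division by a constant
// (Hacker's Delight construction; illustrative, not LLVM's code).
struct SignedMagic {
  int32_t multiplier;
  int shift;
};

SignedMagic computeSignedMagic(int32_t d) {
  assert(d >= 2 || d <= -2);  // the construction requires |d| >= 2
  const uint32_t two31 = 0x80000000u;
  const uint32_t ud = static_cast<uint32_t>(d);
  const uint32_t ad = d < 0 ? 0u - ud : ud;          // |d|
  const uint32_t t = two31 + (ud >> 31);
  const uint32_t anc = t - 1 - t % ad;               // |nc|
  int p = 31;
  uint32_t q1 = two31 / anc, r1 = two31 - q1 * anc;  // 2^p / |nc|
  uint32_t q2 = two31 / ad, r2 = two31 - q2 * ad;    // 2^p / |d|
  uint32_t delta;
  do {
    ++p;
    q1 *= 2; r1 *= 2;
    if (r1 >= anc) { ++q1; r1 -= anc; }
    q2 *= 2; r2 *= 2;
    if (r2 >= ad) { ++q2; r2 -= ad; }
    delta = ad - r2;
  } while (q1 < delta || (q1 == delta && r1 == 0));
  uint32_t m = q2 + 1;
  if (d < 0) m = 0u - m;
  return {static_cast<int32_t>(m), p - 32};
}

// n / d via multiply-high, correction, shift, and round toward zero.
// Assumes arithmetic right shifts of negative values (true on mainstream compilers).
int32_t sdivByMagic(int32_t n, int32_t d, SignedMagic mag) {
  int32_t q = static_cast<int32_t>((static_cast<int64_t>(mag.multiplier) * n) >> 32);
  if (d > 0 && mag.multiplier < 0) q += n;
  if (d < 0 && mag.multiplier > 0) q -= n;
  q >>= mag.shift;
  q += static_cast<int32_t>(static_cast<uint32_t>(q) >> 31);  // add 1 if q < 0
  return q;
}

// A "vector" with a different constant divisor in each lane.
// INT32_MIN / -1 overflows (as in C++) and is not handled.
std::vector<int32_t> sdivByConstants(const std::vector<int32_t>& n,
                                     const std::vector<int32_t>& d) {
  std::vector<int32_t> out(n.size());
  for (std::size_t i = 0; i < n.size(); ++i) {
    if (d[i] == 1)
      out[i] = n[i];
    else if (d[i] == -1)
      out[i] = -n[i];
    else
      out[i] = sdivByMagic(n[i], d[i], computeSignedMagic(d[i]));
  }
  return out;
}
```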

Differential Revision: https://reviews.llvm.org/D50765

llvm-svn: 339908
2018-08-16 17:44:33 +00:00
Reid Kleckner bd5d71229d [codeview] Use push_macro to avoid conflicts instead of a prefix
Summary:
This prefix was added in r333421, and it changed our dumper output to
say things like "CVRegEAX" instead of just "EAX". That's a functional
change that I'd rather avoid.

I tested GCC, Clang, and MSVC, and all of them support #pragma
push_macro. They don't issue warnings when the macro is not defined
either.

I don't have a Mac so I can't test the real termios.h header, but I
looked at the termios.h sources online and looked for other conflicts.
I saw only the CR* macros, so those are the ones we work around.
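A minimal sketch of the push_macro technique, assuming a header that defines `CR1` as a macro. The value 0x42, the `RegisterId` enum, and `registerIdCR1` are made up for illustration and do not come from the codeview sources.

```cpp
// Stand-in for a platform header (such as termios.h) that defines CR1 as a
// macro; the value here is an assumption for illustration.
#define CR1 0x42

// Save the macro, remove it while our own declarations use the plain name,
// then restore it so code relying on the header's definition keeps working.
#pragma push_macro("CR1")
#undef CR1
enum class RegisterId { CR0 = 0, CR1 = 1 };  // hypothetical enum
constexpr int registerIdCR1() { return static_cast<int>(RegisterId::CR1); }
#pragma pop_macro("CR1")

// After pop_macro, CR1 expands to the header's value again.
static_assert(CR1 == 0x42, "CR1 macro restored");
```

Unlike renaming the enumerators with a prefix, this keeps names like `CR1` unchanged in the enum itself.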

Reviewers: zturner, JDevlieghere

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D50851

llvm-svn: 339907
2018-08-16 17:34:31 +00:00
Matt Arsenault 7121bed210 AMDGPU: Custom lower fexp
This will allow the library to just use __builtin_expf directly
without expanding this itself. Note f64 still won't work because
there is no exp instruction for it.
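The commit does not show the expansion. A common way to lower exp when the hardware only provides a base-2 exponential is the identity exp(x) = exp2(x * log2(e)). The C++ sketch below illustrates that identity only; it is an assumption about the general technique, ignores any extra precision or range handling a real lowering may need, and `expViaExp2` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// exp(x) expressed through a base-2 exponential (assumed technique):
//   exp(x) = exp2(x * log2(e))
float expViaExp2(float x) {
  const float kLog2E = 1.44269504088896340736f;  // log2(e)
  return std::exp2(x * kLog2E);
}
```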

llvm-svn: 339902
2018-08-16 17:07:52 +00:00
Simon Pilgrim 8b9e545477 [X86][SSE] Add sdiv by nonuniform constant vector test containing -1/+1 and all-bits style constants
llvm-svn: 339901
2018-08-16 17:07:41 +00:00
Evandro Menezes 42422b33cf [NFC] Fix typo in test cases
llvm-svn: 339900
2018-08-16 17:03:22 +00:00
Nirav Dave 7fd992a755 [MC][X86] Enhance X86 Register expression handling to more closely match GCC.
Allow the comparison of x86 registers in the evaluation of assembler
directives. This generalizes and simplifies the extension from r334022
to catch another case found in the Linux kernel.

Reviewers: rnk, void

Reviewed By: rnk

Subscribers: hiraditya, nickdesaulniers, llvm-commits

Differential Revision: https://reviews.llvm.org/D50795

llvm-svn: 339895
2018-08-16 16:31:14 +00:00
Zachary Turner 970fdc3236 [MS Demangler] Demangle string literals.
When demangling string literals, Microsoft's undname
simply prints 'string'.  This patch implements string
literal demangling while doing a bit better than this
by decoding as much of the string as possible and
trying to faithfully reproduce the original string
literal definition.

This is a bit tricky because the different character
types char, char16_t, and char32_t are not uniquely
identified by the mangling, so we have to use a
heuristic to try to guess the character type.  But
it works pretty well, and many tests are added to
illustrate the behavior.

Differential Revision: https://reviews.llvm.org/D50806

llvm-svn: 339892
2018-08-16 16:17:36 +00:00
Zachary Turner 83313f8f54 [MS Demangler] Don't fail on MD5-mangled names.
When we have an MD5 mangled name, we shouldn't choke and say
that it's an invalid name.  Even though it's impossible to demangle,
we should just output the original name.
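For illustration, a pass-through check might look like the sketch below. It assumes MD5-mangled names have the form `??@` followed by 32 hex digits and a closing `@`; both function names and the callback parameter are hypothetical, not the demangler's API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Assumed shape of an MD5-mangled name: "??@" + 32 hex digits + "@".
bool looksLikeMD5Name(const std::string& name) {
  if (name.size() != 36 || name.compare(0, 3, "??@") != 0 || name.back() != '@')
    return false;
  for (std::size_t i = 3; i < 35; ++i)
    if (!std::isxdigit(static_cast<unsigned char>(name[i])))
      return false;
  return true;
}

// Hypothetical entry point: MD5 names cannot be demangled, so return them
// unchanged instead of reporting an invalid name. 'demangleRest' stands in
// for the actual demangling logic.
std::string demangleOrPassThrough(const std::string& name,
                                  std::string (*demangleRest)(const std::string&)) {
  return looksLikeMD5Name(name) ? name : demangleRest(name);
}
```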

llvm-svn: 339891
2018-08-16 16:17:17 +00:00
Sanjay Patel 0ea8d8b951 [ConstantFolding] add tests for funnel shift intrinsics; NFC
No functionality for this yet.

llvm-svn: 339889
2018-08-16 16:10:42 +00:00
Evandro Menezes c05c7e11bb [InstCombine] Expand the simplification of pow(x, 0.5) to sqrt(x)
Expand the number of cases when `pow(x, 0.5)` is simplified into `sqrt(x)`
by considering the math semantics with more granularity.
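As background (from IEEE 754 and C `pow`/`sqrt` semantics, not from the patch), `pow(x, 0.5)` and `sqrt(x)` disagree on `-0.0` and `-inf`, which is why the rewrite needs care when no fast-math flags apply. The sketch below shows one expansion that agrees with `pow` on those inputs; the function name is hypothetical and the exact cases the patch handles may differ.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// pow(x, 0.5) and sqrt(x) differ on two inputs:
//   pow(-0.0, 0.5) is +0.0, but sqrt(-0.0) is -0.0
//   pow(-inf, 0.5) is +inf, but sqrt(-inf) is NaN
// A sqrt-based replacement that must keep pow's results patches both up.
double powHalfViaSqrt(double x) {
  const double inf = std::numeric_limits<double>::infinity();
  if (x == -inf)
    return inf;
  return std::fabs(std::sqrt(x));  // fabs maps -0.0 to +0.0
}
```

With flags such as `nsz` and `ninf`, which allow ignoring signed zeros and infinities, the fix-ups would not be needed.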

Differential revision: https://reviews.llvm.org/D50036

llvm-svn: 339887
2018-08-16 15:58:08 +00:00
Sanjay Patel 039f556f44 [InstCombine] move vector compare before same-shuffled ops
This is a step towards fixing PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463

llvm-svn: 339875
2018-08-16 12:52:17 +00:00
George Rimar d2f90ea337 [yaml2obj] - Allow to use numeric sh_link (Link) value for sections.
This change allows using numeric values for the Link field.
It is consistent with the code for other fields in this method.

llvm-svn: 339873
2018-08-16 12:44:17 +00:00
George Rimar 17257bb0b5 [yaml2elf] - Use check-next in test.
It's a follow-up for rL339870.

llvm-svn: 339872
2018-08-16 12:40:27 +00:00
Sam Parker 0d51197051 [ARM] Ignore GEPs in ARMCodeGenPrepare
While searching through the use-def tree, ignore GetElementPtrInst
instructions because they don't need promoting and neither do their
indices. Otherwise, the wide indices prevent the transformation from
happening.

Differential Revision: https://reviews.llvm.org/D50762

llvm-svn: 339871
2018-08-16 12:24:40 +00:00
George Rimar 7f2df7df45 [yaml2elf] - Simplify code, add a test. NFC.
This simplifies the code that allows setting sh_info
for relocation sections, and adds a missing test.

llvm-svn: 339870
2018-08-16 12:23:22 +00:00
Sam Parker 0e2f0bd48e [ARM] Allow zext in ARMCodeGenPrepare
Treat zext instructions as roots, like we do for truncs.

Differential Revision: https://reviews.llvm.org/D50759

llvm-svn: 339868
2018-08-16 11:54:09 +00:00
Alex Bradbury fdc4647ca3 [RISCV][MC] Don't fold symbol differences if requiresDiffExpressionRelocations is true
When emitting the difference between two symbols, the standard behavior is 
that the difference will be resolved to an absolute value if both of the 
symbols are offsets from the same data fragment. This is undesirable on 
architectures such as RISC-V where relaxation in the linker may cause the 
computed difference to become invalid. This caused an issue when compiling to 
object code, where the size of a function in the debug information was already 
calculated even though it could change as a consequence of relaxation in the 
subsequent linking stage.

This patch inhibits the resolution of symbol differences to absolute values 
where the target's AsmBackend has declared that it does not want these to be 
folded.
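A hypothetical, simplified model of the folding decision described above. The `SymbolLoc` type and `tryFoldDifference` function are illustrative only and are not MC-layer API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified model: a symbol is an offset within a fragment.
struct SymbolLoc {
  int fragmentId;
  int64_t offset;
};

// Fold A - B to a constant only when both symbols are in the same fragment
// and the target has not asked to keep such differences as relocations
// (for example because linker relaxation may change the distance).
std::optional<int64_t> tryFoldDifference(SymbolLoc a, SymbolLoc b,
                                         bool requiresDiffExpressionRelocations) {
  if (requiresDiffExpressionRelocations || a.fragmentId != b.fragmentId)
    return std::nullopt;  // leave the difference for the linker to resolve
  return a.offset - b.offset;
}
```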

Differential Revision: https://reviews.llvm.org/D45773
Patch by Edward Jones.

llvm-svn: 339864
2018-08-16 11:26:37 +00:00
Sam Parker 13567dbbd8 [ARM] Allow signed icmps in ARMCodeGenPrepare
Originally committed in r339755 which was reverted in r339806 due to
an asan issue. The issue was caused by my assumption that operands to
a CallInst mapped to the FunctionType Params. CallInsts are now
handled by iterating over their ArgOperands instead of Operands.
    
Original Message:
  Treat signed icmps as 'sinks', allowing them to be in the use-def
  tree, enabling more promotions to be performed. As a sink, any
  promoted incoming values need to be truncated before being used by
  the signed icmp.
    
  Differential Revision: https://reviews.llvm.org/D50067

llvm-svn: 339858
2018-08-16 10:05:39 +00:00
Craig Topper 9c1d9fdeaa [X86] Remove masking from the 512-bit padds and psubs intrinsics. Use select in IR instead.
llvm-svn: 339842
2018-08-16 06:20:24 +00:00
Craig Topper 9d6983c9fd [X86] Remove the unused masked 128 and 256-bit masked padds/psubs intrinsics.
Still need to remove masking from the 512-bit versions.

llvm-svn: 339841
2018-08-16 06:20:22 +00:00
Craig Topper 054b8cce2d [X86] Correct some bad FileCheck prefixes in tests. Add test cases for v64i8 padd/psub saturation intrinsics.
For some reason we had the 128/256-bit tests, but not the 512-bit tests.

llvm-svn: 339840
2018-08-16 06:20:19 +00:00
Chandler Carruth 00c35c7794 [x86] Actually initialize the SLH pass with the x86 backend and use
a shorter name ('x86-slh') for the internal flags and pass name.

Without this, you can't use the -stop-after or -stop-before
infrastructure. I seem to have just missed this when originally adding
the pass.

The shorter name solves two problems. First, the flag names were ...
really long and hard to type/manage. Second, the pass name can't be the
exact same as the flag name used to enable this, and there are already
some users of that flag name so I'm avoiding changing it unnecessarily.

llvm-svn: 339836
2018-08-16 01:22:19 +00:00
Easwaran Raman aca738b742 [BFI] Use rounding while computing profile counts.
Summary:
Profile count of a block is computed by multiplying its block frequency
by entry count and dividing the result by entry block frequency. Do
rounded division in the last step and update test cases appropriately.
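A minimal sketch of the computation described above, assuming 64-bit integers are wide enough for the intermediate product; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// count = blockFreq * entryCount / entryFreq, with round-to-nearest division
// (add half the divisor before dividing). Assumes the product fits in 64 bits;
// a production version would need wider arithmetic.
uint64_t profileCount(uint64_t blockFreq, uint64_t entryCount, uint64_t entryFreq) {
  const uint64_t scaled = blockFreq * entryCount;
  return (scaled + entryFreq / 2) / entryFreq;
}

// The truncating variant, for comparison.
uint64_t profileCountTruncating(uint64_t blockFreq, uint64_t entryCount, uint64_t entryFreq) {
  return blockFreq * entryCount / entryFreq;
}
```

For example, with a block frequency of 3, an entry count of 10, and an entry frequency of 4, the exact value is 7.5: truncation gives 7, rounding gives 8.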

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50822

llvm-svn: 339835
2018-08-16 00:26:59 +00:00
Guozhi Wei 8c17f9a77d [CodeGenPrepare] Add BothExtension type to PromotedInsts
This patch fixes PR38125.

Instruction extension types are recorded in PromotedInsts and can be used later in function canGetThrough. If an instruction has two users with different extension types, it will be inserted into PromotedInsts twice in function promoteOperandForOther. The second insertion overwrites the first, so the final extension type is wrong, which later causes problems in canGetThrough.

This patch changes the simple bool extension type to a 2-bit enum type, adding a BothExtension type in addition to zero/sign extension. When a user sees BothExtension for an instruction, it actually knows nothing about how that instruction is extended.
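A hypothetical model of the fix, with a string key standing in for an `Instruction *`; `recordPromotion` and `knownExtension` are illustrative names, not the functions in CodeGenPrepare.

```cpp
#include <cassert>
#include <map>
#include <string>

// Three states (fits in 2 bits) instead of a bool, so that conflicting
// users are merged rather than silently overwritten.
enum class ExtType { ZeroExtension, SignExtension, BothExtension };

// The string key stands in for an Instruction*.
using PromotedInstsMap = std::map<std::string, ExtType>;

void recordPromotion(PromotedInstsMap& promoted, const std::string& inst, ExtType ext) {
  auto result = promoted.emplace(inst, ext);
  if (!result.second && result.first->second != ext)
    result.first->second = ExtType::BothExtension;  // users disagree
}

// A consumer can only rely on a single, known extension kind.
bool knownExtension(const PromotedInstsMap& promoted, const std::string& inst, ExtType& out) {
  auto it = promoted.find(inst);
  if (it == promoted.end() || it->second == ExtType::BothExtension)
    return false;
  out = it->second;
  return true;
}
```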

Differential Revision: https://reviews.llvm.org/D49512

llvm-svn: 339822
2018-08-15 22:08:26 +00:00
Matt Arsenault f533e6b0ed AMDGPU: Fold fneg into fmed3
llvm-svn: 339821
2018-08-15 21:46:27 +00:00
Matt Arsenault a816073764 AMDGPU: Improve extract_vector_elt reduction combine
Handle fmul, fsub and preserve flags.

Also really test minnum/maxnum reductions.
The existing tests were only checking
minnum/maxnum matched from a fast-math compare
and select, which is not the same thing.

llvm-svn: 339820
2018-08-15 21:34:06 +00:00
Matt Arsenault b3a80e5397 AMDGPU: Implement llvm.amdgcn.icmp/fcmp for i16/f16
Also support these on targets without support for these types,
since that will allow us to freely create these intrinsics in instcombine.

llvm-svn: 339819
2018-08-15 21:25:20 +00:00
Craig Topper 08e082619a [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation
To lower this, we now create a new V1 containing the low half of both sources and a new V2 containing the upper half of both sources. Then we create a repeated lane shuffle of those new sources to produce the final result.
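A scalar C++ model of why this lowering works (not LLVM code; the names are hypothetical). It models the reference semantics of such a shuffle and the two-step lowering, which should agree for any 4-element pattern `m`.

```cpp
#include <array>
#include <cassert>

using V8 = std::array<float, 8>;

// Reference: low half of the result comes from V1, high half from V2, and
// both halves apply the same pattern m (each m[i] in 0..7 indexes its source).
V8 referenceShuffle(const V8& v1, const V8& v2, const std::array<int, 4>& m) {
  V8 r{};
  for (int i = 0; i < 4; ++i) {
    r[i] = v1[m[i]];
    r[4 + i] = v2[m[i]];
  }
  return r;
}

// Lowering: NewV1 = {V1.lo, V2.lo}, NewV2 = {V1.hi, V2.hi} (one 128-bit lane
// permute each), then the same two-input shuffle applied within each lane.
V8 loweredShuffle(const V8& v1, const V8& v2, const std::array<int, 4>& m) {
  V8 newV1{}, newV2{};
  for (int i = 0; i < 4; ++i) {
    newV1[i] = v1[i];
    newV1[4 + i] = v2[i];      // low halves of both sources
    newV2[i] = v1[4 + i];
    newV2[4 + i] = v2[4 + i];  // high halves of both sources
  }
  V8 r{};
  for (int lane = 0; lane < 2; ++lane)
    for (int i = 0; i < 4; ++i) {
      const int idx = m[i];
      // Indices 0..3 select from NewV1's lane, 4..7 from NewV2's lane.
      r[lane * 4 + i] = idx < 4 ? newV1[lane * 4 + idx] : newV2[lane * 4 + idx - 4];
    }
  return r;
}
```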

This fixes PR35833

Differential Revision: https://reviews.llvm.org/D41794

llvm-svn: 339818
2018-08-15 21:21:52 +00:00
Matt Arsenault 9a389fbd79 AMDGPU: Stop producing icmp/fcmp intrinsics with invalid types
llvm-svn: 339815
2018-08-15 21:14:25 +00:00
Matt Arsenault 6c7ba82900 AMDGPU: Address todo for handling 1/(2 pi)
llvm-svn: 339814
2018-08-15 21:03:55 +00:00
Vitaly Buka ed4239f482 Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare"
use-after-poison in check-llvm under asan

This reverts commit r339755.

llvm-svn: 339806
2018-08-15 20:09:35 +00:00
Peter Collingbourne 62e4fc48a5 llvm-readobj: Fix addend in relocations for android packed format
If a relocation group doesn't have the RELOCATION_GROUP_HAS_ADDEND_FLAG set, this implies that the group's addend equals zero.
In this case, the Android packed format won't encode an explicit addend delta; instead, we need to set Addend, the "previous addend" variable, to zero ourselves.
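A simplified, hypothetical decoder loop showing where the reset belongs. It assumes one addend delta per relocation when the flag is set, and the flag value 8 is an assumption for illustration; the real format has more grouping variants than this model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed value standing in for RELOCATION_GROUP_HAS_ADDEND_FLAG.
constexpr uint64_t kGroupHasAddendFlag = 8;

// Hypothetical group: one addend delta per relocation when the flag is set.
struct RelocGroup {
  uint64_t flags;
  std::size_t count;
  std::vector<int64_t> addendDeltas;
};

std::vector<int64_t> decodeAddends(const std::vector<RelocGroup>& groups) {
  std::vector<int64_t> addends;
  int64_t addend = 0;  // the running "previous addend"
  for (const RelocGroup& g : groups) {
    const bool hasAddend = (g.flags & kGroupHasAddendFlag) != 0;
    if (!hasAddend)
      addend = 0;  // no delta is encoded: the group's addend is zero
    for (std::size_t i = 0; i < g.count; ++i) {
      if (hasAddend)
        addend += g.addendDeltas[i];
      addends.push_back(addend);
    }
  }
  return addends;
}
```

Without the reset, relocations in a group lacking the flag would wrongly inherit the last addend of the previous group.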

Patch by Yi-Yo Chiang!

Differential Revision: https://reviews.llvm.org/D50601

llvm-svn: 339799
2018-08-15 17:58:22 +00:00
Amara Emerson 070ac768ff [InstCombine] Fix IC trying to create a xor of pointer types.
rdar://42473741

Differential Revision: https://reviews.llvm.org/D50775

llvm-svn: 339796
2018-08-15 17:46:22 +00:00
Sanjay Patel 49a8280f43 [AArch64] add tests for poor vector intrinsic lowering via legalization (PR38527); NFC
These correspond to the x86 tests added with rL339790 / rL339791, but I widened
the non-fsin tests to v3f32 to show the problem because AArch64 supports v2f32 ops.

llvm-svn: 339793
2018-08-15 17:06:21 +00:00
Krzysztof Parzyszek 3b097b4d3e [RegisterCoalescer] Ensure that both registers have subranges if one does
llvm-svn: 339792
2018-08-15 17:04:58 +00:00
Sanjay Patel 712d42f53d [x86] add fabs test for vector intrinsic to potential libcall bug; NFC
This is a negative test for x86 because it has custom lowering for fabs.

llvm-svn: 339791
2018-08-15 16:56:09 +00:00
Sanjay Patel f9afee479f [x86] add tests for poor vector intrinsic lowering via legalization (PR38527); NFC
llvm-svn: 339790
2018-08-15 16:35:50 +00:00
Krzysztof Parzyszek 88d267d094 [RegisterCoalescer] Reset VNInfo def when copying segments over
llvm-svn: 339788
2018-08-15 16:21:53 +00:00
Derek Schuff 82812fb986 [WebAssembly] SIMD replace_lane
Implement and test replace_lane instructions.

Patch by Thomas Lively

Differential Revision: https://reviews.llvm.org/D50750

llvm-svn: 339786
2018-08-15 16:18:51 +00:00
Krzysztof Parzyszek 46ce441df6 [RegAlloc] Check that subreg liveness tracking applies to given virtual reg
Subregister liveness applies selectively to register classes with certain
properties. Make sure that when it's enabled, it applies to a given virtual
register (in virtual register rewriter).

llvm-svn: 339784
2018-08-15 16:07:47 +00:00
Krzysztof Parzyszek 4e06beb820 [SystemZ] Add testcase for r339778
llvm-svn: 339780
2018-08-15 15:43:13 +00:00