Commit Graph

279028 Commits

Author SHA1 Message Date
Sanjay Patel 5a0cdac174 [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel
We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.

More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs 
   into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs 
   into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
   into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node 
   when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a 
   variety of ways.
   a. For #2, the vector path is missing the case for setlt with a '1' constant.
   b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel 
   produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the 
   shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift 
   produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific 
   decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.

define i32 @abs_shifty(i32 %x) {
  %signbit = ashr i32 %x, 31 
  %add = add i32 %signbit, %x  
  %abs = xor i32 %signbit, %add 
  ret i32 %abs
}

define i32 @abs_cmpsubsel(i32 %x) {
  %cmp = icmp slt i32 %x, zeroinitializer
  %sub = sub i32 zeroinitializer, %x
  %abs = select i1 %cmp, i32 %sub, i32 %x
  ret i32 %abs
}

define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
  %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> 
  %add = add <4 x i32> %signbit, %x  
  %abs = xor <4 x i32> %signbit, %add 
  ret <4 x i32> %abs
}

define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
  %cmp = icmp slt <4 x i32> %x, zeroinitializer
  %sub = sub <4 x i32> zeroinitializer, %x
  %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
  ret <4 x i32> %abs
}

> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx 
> abs_shifty:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_cmpsubsel:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_shifty_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> abs_cmpsubsel_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
> 
> abs_cmpsubsel: 
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
>                                        
> abs_shifty_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> abs_cmpsubsel_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le 
> abs_shifty:  
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_cmpsubsel:
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_shifty_vec:   
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
> 
> abs_cmpsubsel_vec: 
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
>

Differential Revision: https://reviews.llvm.org/D40984

llvm-svn: 320921
2017-12-16 16:41:17 +00:00
Sanjay Patel cb8c009801 [Driver, CodeGen] pass through and apply -fassociative-math
There are 2 parts to getting the -fassociative-math command-line flag translated to LLVM FMF:

1. In the driver/frontend, we accept the flag and its 'no' inverse and deal with the 
   interactions with other flags like -ffast-math -fno-signed-zeros -fno-trapping-math. 
   This was mostly already done - we just need to translate the flag as a codegen option. 
   The test file is complicated because there are many potential combinations of flags here.
   Note that we are matching gcc's behavior that requires 'nsz' and no-trapping-math.

2. In codegen, we map the codegen option to FMF in the IR builder. This is simple code and 
   corresponding test.

For the motivating example from PR27372:

float foo(float a, float x) { return ((a + x) - x); }

$ ./clang -O2 27372.c -S -o - -ffast-math  -fno-associative-math -emit-llvm  | egrep 'fadd|fsub'
  %add = fadd nnan ninf nsz arcp contract float %0, %1
  %sub = fsub nnan ninf nsz arcp contract float %add, %2

So 'reassoc' is off as expected (and so is the new 'afn' but that's a different patch). 
This case now works as expected end-to-end although the underlying logic is still wrong:

$ ./clang  -O2 27372.c -S -o - -ffast-math  -fno-associative-math | grep xmm
	addss	%xmm1, %xmm0
	subss	%xmm1, %xmm0

We're not done because the case where 'reassoc' is set is ignored by optimizer passes. Example:

$ ./clang  -O2 27372.c -S -o - -fassociative-math -fno-signed-zeros -fno-trapping-math -emit-llvm  | grep fadd
  %add = fadd reassoc float %0, %1

$ ./clang -O2  27372.c -S -o - -fassociative-math -fno-signed-zeros -fno-trapping-math | grep xmm
	addss	%xmm1, %xmm0
	subss	%xmm1, %xmm0

Differential Revision: https://reviews.llvm.org/D39812

llvm-svn: 320920
2017-12-16 16:11:17 +00:00
Craig Topper 5028ace602 [X86] Implement kand/kandn/kor/kxor/kxnor/knot intrinsics using native IR.
llvm-svn: 320919
2017-12-16 08:26:22 +00:00
Craig Topper d2a2a39c93 [X86] Remove GCCBuiltin from kand/kandn/kor/kxor/kxnor/knot intrinsics so clang can implement with native IR.
llvm-svn: 320918
2017-12-16 08:25:30 +00:00
Craig Topper 1c7d07c601 [X86] Remove unneeded code for handling the old kunpck intrinsics.
llvm-svn: 320917
2017-12-16 06:58:30 +00:00
Craig Topper 798f2c037c [X86] Add the two files I forgot to commit in r320915.
llvm-svn: 320916
2017-12-16 06:10:24 +00:00
Craig Topper b846d1ff76 [X86] Add builtins and tests for 128 and 256 bit vpopcntdq.
llvm-svn: 320915
2017-12-16 06:02:31 +00:00
Hal Finkel 92ea8acbcd Move Transforms/LoopVectorize/consecutive-ptr-cg-bug.ll into the X86 subdirectory
This test depends on X86's TTI; move into the X86 subdirectory.

llvm-svn: 320914
2017-12-16 05:10:20 +00:00
Hal Finkel 5444f40965 [LV] Extend InstWidening with CM_Widen_Recursive
Changes to the original scalar loop during LV code gen cause the return value
of Legal->isConsecutivePtr() to be inconsistent with the return value during
legal/cost phases (further analysis and information of the bug is in D39346).
This patch is an alternative fix to PR34965 following the CM_Widen approach
proposed by Ayal and Gil in D39346. It extends InstWidening enum with
CM_Widen_Reverse to properly record the widening decision for consecutive
reverse memory accesses and, consequently, get rid of the
Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner
solution to PR34965 than the one in D39346.

Fixes PR34965.

Patch by Diego Caballero, thanks!

Differential Revision: https://reviews.llvm.org/D40742

llvm-svn: 320913
2017-12-16 02:55:24 +00:00
Galina Kistanova 5f8c84c5be Fixed warning 'function declaration isn’t a prototype [-Werror=strict-prototypes]'
llvm-svn: 320912
2017-12-16 02:54:17 +00:00
Hal Finkel e86a8b79b5 [PowerPC, AsmParser] Enable the mnemonic spell corrector
r307148 added an assembly mnemonic spelling correction support and enabled it
on ARM. This enables that support on PowerPC as well.

Patch by Dmitry Venikov, thanks!

Differential Revision: https://reviews.llvm.org/D40552

llvm-svn: 320911
2017-12-16 02:42:18 +00:00
Craig Topper c08960597c [X86] Add 128 and 256-bit VPOPCNTDQ instructions. Adjust some tablegen classes LZCNT/POPCNT.
I think when this instruction was first published it was only for a Knights CPU and thus VLX version was missing.

llvm-svn: 320910
2017-12-16 02:40:28 +00:00
Vitaly Buka 12f9b8cf24 [LTO] Update tests for r320905
llvm-svn: 320909
2017-12-16 02:40:20 +00:00
Hal Finkel 05e4648482 [VerifyDiagnosticConsumer] support -verify=<prefixes>
This mimics FileCheck's --check-prefixes option.

The default prefix is "expected". That is, "-verify" is equivalent to
"-verify=expected".

The goal is to permit exercising a single test suite source file with different
compiler options producing different sets of diagnostics.  While cpp can be
combined with the existing -verify to accomplish the same goal, source is often
easier to maintain when it's not cluttered with preprocessor directives or
duplicate passages of code. For example, this patch also rewrites some existing
clang tests to demonstrate the benefit of this feature.

Patch by Joel E. Denny, thanks!

Differential Revision: https://reviews.llvm.org/D39694

llvm-svn: 320908
2017-12-16 02:23:22 +00:00
Vitaly Buka fd563a0352 Remove trailing whitespace
llvm-svn: 320907
2017-12-16 02:12:35 +00:00
Sam Clegg 731a76646f [WebAssembly] Return ArrayRef's rather than const std::vector&
From working on lld I've learned this is generally the
preferred way for several reasons (e.g. more concise, improves
encapsulation).

Differential Revision: https://reviews.llvm.org/D41265

llvm-svn: 320906
2017-12-16 02:10:16 +00:00
Vitaly Buka a5376f393e [LTO] Make processing of combined module more consistent
Summary:
1. Use stream 0 only for combined module. Previously if combined module was not
processes ThinLTO used the stream for own output. However small changes in input,
could trigger combined module  and shuffle outputs making life of llvm::LTO harder.

2. Always process combined module and write output to stream 0. Processing empty
combined module is cheap and allows llvm::LTO users to avoid implementing processing
which is already done in llvm::LTO.

Subscribers: mehdi_amini, inglorion, eraman, hiraditya

Differential Revision: https://reviews.llvm.org/D41267

llvm-svn: 320905
2017-12-16 02:10:00 +00:00
Hal Finkel cc15219fb1 [TextDiagnosticBuffer] Fix diagnostic note emission order
The frontend currently groups diagnostics from the command line according to
diagnostic level, but that places all notes last. Fix that by emitting such
diagnostics in the order they were generated.

Patch by Joel E. Denny, thanks!

Differential Revision: https://reviews.llvm.org/D40995

llvm-svn: 320904
2017-12-16 01:40:19 +00:00
Teresa Johnson 160f4bb803 Add another missing -enable-import-metadata to test
r320895 modified a test so that it needs -enable-import-metadata which
is false by default for NDEBUG, found another place that needs this
added.

llvm-svn: 320903
2017-12-16 01:35:36 +00:00
Vedant Kumar fa5a0e59f0 [CodeGen] Specialize mixed-sign mul-with-overflow (fix PR34920)
This patch introduces a specialized way to lower overflow-checked
multiplications with mixed-sign operands. This fixes link failures and
ICEs on code like this:

  void mul(int64_t a, uint64_t b) {
    int64_t res;
    __builtin_mul_overflow(a, b, &res);
  }

The generic checked-binop irgen would use a 65-bit multiplication
intrinsic here, which requires runtime support for _muloti4 (128-bit
multiplication), and therefore fails to link on i386. To get an ICE
on x86_64, change the example to use __int128_t / __uint128_t.

Adding runtime and backend support for 65-bit or 129-bit checked
multiplication on all of our supported targets is infeasible.

This patch solves the problem by using simpler, specialized irgen for
the mixed-sign case.

llvm.org/PR34920, rdar://34963321

Testing: Apart from check-clang, I compared the output from this fairly
comprehensive test driver using unpatched & patched clangs:
https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081

Differential Revision: https://reviews.llvm.org/D41149

llvm-svn: 320902
2017-12-16 01:28:25 +00:00
Hal Finkel 2ff24731bb [SimplifyLibCalls] Inline calls to cabs when it's safe to do so
When unsafe algerbra is allowed calls to cabs(r) can be replaced by:

  sqrt(creal(r)*creal(r) + cimag(r)*cimag(r))

Patch by Paul Walker, thanks!

Differential Revision: https://reviews.llvm.org/D40069

llvm-svn: 320901
2017-12-16 01:26:25 +00:00
Hal Finkel 7333aa9f16 [LV] NFC patch for moving VP*Recipe class definitions from LoopVectorize.cpp to VPlan.h
This is a small step forward to move VPlan stuff to where it should belong (i.e., VPlan.*):

  1. VP*Recipe classes in LoopVectorize.cpp are moved to VPlan.h.
  2. Many of VP*Recipe::print() and execute() definitions are still left in
     LoopVectorize.cpp since they refer to things declared in LoopVectorize.cpp. To
     be moved to VPlan.cpp at a later time.
  3. InterleaveGroup class is moved from anonymous namespace to llvm namespace.
     Referencing it in anonymous namespace from VPlan.h ended up in warning.

Patch by Hideki Saito, thanks!

Differential Revision: https://reviews.llvm.org/D41045

llvm-svn: 320900
2017-12-16 01:12:50 +00:00
Teresa Johnson 4358a40345 Add -enable-import-metadata to test
r320895 modified a test so that it needs -enable-import-metadata which
is false by default for NDEBUG.

llvm-svn: 320899
2017-12-16 01:00:48 +00:00
Craig Topper 6b129fde5a [X86] Add back the assert from r320830 that was reverted in r320850
Hopefully r320864 has fixed the offending case that failed the assert.

llvm-svn: 320898
2017-12-16 00:33:16 +00:00
Teresa Johnson 69b2de8466 Fix NDEBUG build problem in r320895
Fix incorrect placement of #endif causing NDEBUG build failures.

llvm-svn: 320897
2017-12-16 00:29:31 +00:00
Shoaib Meenai a1f6fba4d1 [COFF] Clean up debug option handling
/debug and /debug:dwarf are orthogonal. An object file can contain both
CodeView and DWARF debug info, so the combination of /debug:dwarf and
/debug should generate both DWARF and a PDB, rather than /debug:dwarf
always suppressing PDB creation.

/nopdb is now redundant and can be removed. /debug /nopdb was previously
used to support DWARF, but specifying /debug:dwarf is entirely
equivalent to that combination now.

Differential Revision: https://reviews.llvm.org/D41310

llvm-svn: 320896
2017-12-16 00:23:24 +00:00
Teresa Johnson 81bbf74265 [ThinLTO] Enable importing of aliases as copy of aliasee
Summary:
This implements a missing feature to allow importing of aliases, which
was previously disabled because alias cannot be available_externally.
We instead import an alias as a copy of its aliasee.

Some additional work was required in the IndexBitcodeWriter for the
distributed build case, to ensure that the aliasee has a value id
in the distributed index file (i.e. even when it is not being
imported directly).

This is a performance win in codes that have many aliases, e.g. C++
applications that have many constructor and destructor aliases.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D40747

llvm-svn: 320895
2017-12-16 00:18:12 +00:00
Shoaib Meenai c4fdbca604 [COFF] Update an outdated comment. NFC
This comment dates from when LLD didn't produce actual PDBs, and is very
outdated now.

llvm-svn: 320894
2017-12-15 23:52:46 +00:00
David Blaikie 2110924909 Fix WebAssembly backend for some LLVM API changes
llvm-svn: 320893
2017-12-15 23:52:06 +00:00
Shoaib Meenai 111db7945d [COFF] Simplify hasArgs calls. NFC
We can just pass multiple options to hasArgs (which will check for any
of those options being present) instead of calling it multiple times.

llvm-svn: 320892
2017-12-15 23:51:14 +00:00
Davide Italiano 864a6edf8b [CMake] darwin-debug is an hard dependency for tests on macOS.
Fixes a few failured on the testsuite with CMake.

llvm-svn: 320891
2017-12-15 23:27:10 +00:00
Quentin Colombet 893e0f15e2 [TableGen][GlobalISel] Make the different Matcher comparable
This opens refactoring opportunities in the match table now that we can
check that two predicates are the same.

NFC.

llvm-svn: 320890
2017-12-15 23:24:39 +00:00
Quentin Colombet a646ef08e8 [TableGen][GlobalISel] Fix unused variable warning in release mode
Introduced in r320887.

NFC.

llvm-svn: 320889
2017-12-15 23:24:36 +00:00
Paul Robinson 6d0484f2b6 Revert "Recommit "[DWARFv5] Dump an MD5 checksum in the line-table header.""
This reverts commit 0afef672f63f0e4e91938656bc73424a8c058bfc.
Still failing at runtime on bots.

llvm-svn: 320888
2017-12-15 23:21:52 +00:00
Quentin Colombet aad20be6ca [TableGen][GlobalISel] Have the predicate directly know which data they are dealing with
Prior to this patch, a predicate wouldn't make sense outside of its
rule. Indeed, it was only during emitting a rule that a predicate would
be made aware of the IDs of the data it is checking. Because of that,
predicates could not be moved around or compared between each other.

NFC.

llvm-svn: 320887
2017-12-15 23:07:42 +00:00
Paul Robinson 5c8f7d7de4 Recommit "[DWARFv5] Dump an MD5 checksum in the line-table header."
Adds missing support for DW_FORM_data16.

Update of r320852, fixing the unittest to use a hand-coded struct
instead of std::array to guarantee data layout.

Differential Revision: https://reviews.llvm.org/D41090

llvm-svn: 320886
2017-12-15 22:57:17 +00:00
Matthias Braun 042fed54fb Fix unused variable in non-assert builds
llvm-svn: 320885
2017-12-15 22:53:33 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Davide Italiano 8539edb0f3 [MacOSX/Queues] Relax an overly aggressive assertion in a test.
"Default" is a valid QoS for a thread on older versions of macOS,
like the one installed in the bot.
Thanks to Jason Molenda for helping me figuring out the problem.

<rdar://problem/28346273>

llvm-svn: 320883
2017-12-15 22:22:51 +00:00
Matthias Braun 4684033a2f MachineFunction: Slight refactoring; NFC
Slight cleanup/refactor in preparation for upcoming commit.

llvm-svn: 320882
2017-12-15 22:22:46 +00:00
Matthias Braun 89488fffdd MachineModuleInfo: Remove unused function; NFC
Remove the unused setModule() function; it would be dangerous if someone
actually used it as it wouldn't reset/recompute various other module
related data.

llvm-svn: 320881
2017-12-15 22:22:42 +00:00
Sam Clegg c018115480 [WebAssembly] Don't include lazy symbols in import table
This bug was introduced in: https://reviews.llvm.org/D41304.
Add a test for this case.

Differential Revision: https://reviews.llvm.org/D41309

llvm-svn: 320872
2017-12-15 22:17:15 +00:00
Galina Kistanova 6532b3b9d2 Fixed the gcc 'enumeral and non-enumeral type in conditional expression [-Werror=extra]' warning introduced by r320750
llvm-svn: 320868
2017-12-15 22:15:29 +00:00
Krzysztof Parzyszek 058d3cec15 [Hexagon] Remove recursion in visitUsesOf, replace with use queue
This is primarily to reduce stack usage, but ordering the use queue
according to the position in the code (earlier instructions visited
before later ones) reduces the number of unnecessary bottoms due to
visiting instructions out of order, e.g.
  %reg1 = copy %reg0
  %reg2 = copy %reg0
  %reg3 = and %reg1, %reg2
Here, reg3 should be known to be same as reg0-2, but if reg3 is
evaluated after reg1 is updated, but before reg2 is updated, the two
inputs to the and will appear different, causing reg3 to become
bottom.

llvm-svn: 320866
2017-12-15 21:34:05 +00:00
Krzysztof Parzyszek 266d6f03a1 [Hexagon] Handle concat_vectors of all allowed HVX types
llvm-svn: 320865
2017-12-15 21:23:12 +00:00
Craig Topper 6b8ac481f1 [X86] Use AND32ri8 instead of AND64ri8 in Asan code in EmitCallAsanReport for 32-bit mode.
This seemed to work due to a quirk in the X86 MC encoder that didn't emit a REX byte that the AND64ri8 implies when in 32-bit mode. This made the encoding the same as AND32ri8. I tried to add an assert to catch the dropped REX prefix that caught this.

llvm-svn: 320864
2017-12-15 21:18:06 +00:00
Craig Topper 422ed23298 [X86] In LowerVectorCTPOP use ISD::ZERO_EXTEND/ISD::TRUNCATE instead of the target specific nodes.
The target independent nodes will get legalized to the target specific nodes by their own legalization process. Someday I'd like to stop using a target specific for zero extends and truncates of legal types so the less places we reference the target specific opcode the better.

llvm-svn: 320863
2017-12-15 21:18:05 +00:00
Craig Topper f08ab74ae3 [X86] Remove unnecessary TODO.
When I wrote it I thought we were missing a potential optimization for KNL. But investigating further shows that for KNL we still do the optimal thing by widening to v4f32 and then using special isel patterns to widen again to zmm a register.

llvm-svn: 320862
2017-12-15 20:57:18 +00:00
Martin Storsjo cf29eb8c22 [MinGW] Ignore the --no-seh flag
The COFF linker automatically sets the IMAGE_DLL_CHARACTERISTICS_NO_SEH
when suitable, similarly to link.exe.

Differential Revision: https://reviews.llvm.org/D41275

llvm-svn: 320861
2017-12-15 20:53:10 +00:00
Martin Storsjo a1e9b6e3d2 [COFF] Set the IMAGE_DLL_CHARACTERISTICS_NO_SEH flag automatically
This seems to match how link.exe sets it.

Differential Revision: https://reviews.llvm.org/D41252

llvm-svn: 320860
2017-12-15 20:53:03 +00:00