Commit Graph

54654 Commits

Author SHA1 Message Date
Roman Lebedev 859e14aeaa [InstCombine] Fold x s> x & (-1 >> y) to x s> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337105
2018-07-14 20:08:16 +00:00
Roman Lebedev 62d050d2c4 [NFC][InstCombine] Tests for x s> x & (-1 >> y) to x s> (-1 >> y) fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337104
2018-07-14 20:08:09 +00:00
Roman Lebedev 0f5ec8921b [InstCombine] Fold x u<= x & C to x u<= C
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Fqp

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337102
2018-07-14 16:44:54 +00:00
Roman Lebedev 79ee3211ba [NFC][InstCombine] Tests for x u<= x & C to x u<= C fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Fqp

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337101
2018-07-14 16:44:48 +00:00
Roman Lebedev 74f611a1f5 [InstCombine] Fold x u> x & C to x u> C
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/JvS

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337100
2018-07-14 16:44:43 +00:00
Roman Lebedev 60ea5df7c2 [NFC][InstCombine] Tests for x u> x & C to x u> C fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/JvS

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337099
2018-07-14 16:44:37 +00:00
Roman Lebedev e3dc587ae0 [InstCombine] Fold x & (-1 >> y) u< x to x u> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/ocb

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337098
2018-07-14 12:20:16 +00:00
Roman Lebedev cad4e3d741 [NFC][InstCombine] Tests for x & (-1 >> y) u< x to x u> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/ocb

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337097
2018-07-14 12:20:11 +00:00
Roman Lebedev fac48474ce [InstCombine] Fold x & (-1 >> y) u>= x to x u<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/azI

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337096
2018-07-14 12:20:06 +00:00
Roman Lebedev 3591eb9296 [NFC][InstCombine] Tests for x & (-1 >> y) u>= x to x u<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/azI

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337095
2018-07-14 12:20:01 +00:00
Roman Lebedev ea2ffc3fcf [NFC][InstCombine] Add forgotten variable tests for foldICmpWithLowBitMaskedVal()
llvm-svn: 337094
2018-07-14 12:19:56 +00:00
Nico Weber 337e241d58 Attempt to get test/tools/llvm-lib/help.test passing on sanitizer-x86_64-linux-fast
The bot has a /b directory, so /? matches against that and gets expanded to it.

(Thanks to Hans's r187366, which solved the same problem for clang-cl a while
ago and which saved me much head scratching.)

llvm-svn: 337092
2018-07-14 11:33:33 +00:00
Francis Visoiu Mistrih f905bf1408 [MachineOutliner] Check the last instruction from the sequence when updating liveness
The MachineOutliner was doing an std::for_each from the call (inserted
before the outlined sequence) to the iterator at the end of the
sequence.

std::for_each needs the iterator past the end, so the last instruction
was not taken into account when propagating the liveness information.

This fixes the machine verifier issue in machine-outliner-disubprogram.ll.

Differential Revision: https://reviews.llvm.org/D49295

llvm-svn: 337090
2018-07-14 09:40:01 +00:00
Chandler Carruth fb503ac027 [x86/SLH] Fix an issue where we wouldn't harden any loads if we found
no conditions.

This is only valid to do if we're hardening calls and rets with LFENCE
which results in an LFENCE guarding the entire entry block for us.

llvm-svn: 337089
2018-07-14 09:32:37 +00:00
Craig Topper 7426cf6717 [X86] Fix a subtle bug in the custom execution domain fixing for blends.
The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted.

Instead use getNumOperands() from the instruction description which will only count the operands that are defined in the td file.

llvm-svn: 337088
2018-07-14 06:30:30 +00:00
Nico Weber 17172c6b80 Give llvm-lib rudimentary help output.
https://reviews.llvm.org/D49318

llvm-svn: 337084
2018-07-14 02:29:44 +00:00
Craig Topper f0b164415c [X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size.
AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX.

This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure makes up for that.

llvm-svn: 337083
2018-07-14 02:05:08 +00:00
Teresa Johnson b78c5d0602 Revert "[ThinLTO] Ensure we always select the same function copy to import"
This reverts commits r337050 and r337059. Caused failure in
reverse-iteration bot that needs more investigation.

llvm-svn: 337081
2018-07-14 01:45:49 +00:00
Teresa Johnson 8e739f98fe Revert "[ThinLTO] Add debug output to test"
This reverts commit r337076. Not needed any more.

llvm-svn: 337080
2018-07-14 01:34:06 +00:00
Evgeniy Stepanov 1971ba097d Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
This reverts commit r337021.

WARNING: MemorySanitizer: use-of-uninitialized-value
     0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
     0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
     0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
     0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
     0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
     0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
     0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
     0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
     0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26

[...]

Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
     0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192

llvm-svn: 337079
2018-07-14 01:20:53 +00:00
Teresa Johnson 05d28c8002 [ThinLTO] Add debug output to test
Add -debug-only=function-import to get more information for debugging
reverse-iteration bot failure from r337050.

llvm-svn: 337076
2018-07-14 00:08:48 +00:00
Tim Shen a064622bd3 Re-apply "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)."
llvm-svn: 337075
2018-07-13 23:58:46 +00:00
Tim Shen e78bd2815e Add a CHECK line for r337072.
llvm-svn: 337074
2018-07-13 23:48:59 +00:00
Krzysztof Parzyszek 7ced04c0fd [Hexagon] Avoid introducing calls into coalesced range of HVX vector pairs
If an HVX vector register is to be coalesced into a vector pair, make
sure that the vector pair will not have a function call in its live range,
unless it already had one. All HVX vector registers are volatile, so
any vector register live across a function call will have to be spilled.

If a vector needs to be spilled, and it's coalesced into a vector pair
then the whole pair will need to be spilled (even if only a part of it is
live), taking extra stack space.

llvm-svn: 337073
2018-07-13 23:42:29 +00:00
Tim Shen 9e25d5d2ce [LSR] If no Use is interesting, early return.
Summary:
By looking at the callers of getUse(), we can see that even though
IVUsers may offer uses, but they may not be interesting to
LSR. It's possible that none of them is interesting.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D49049

llvm-svn: 337072
2018-07-13 23:40:00 +00:00
Teresa Johnson 4f8d704cd0 [ThinLTO] Require x86 target for new test
Should fix non-x86 bot failures for new test from r337050.

llvm-svn: 337059
2018-07-13 22:36:22 +00:00
Tom Stellard ac68471326 AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnun and @llvm.maxnum
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46172

llvm-svn: 337056
2018-07-13 22:16:03 +00:00
Craig Topper da424ba1c5 [X86][FastISel] Support uitofp with avx512.
llvm-svn: 337055
2018-07-13 22:09:30 +00:00
Eli Friedman 835297a951 [LTO] Fix linking with an alias defined using another alias.
When we're linking an alias which will be defined later, we neeed to
build a GlobalAlias, or else we'll crash later in
IRLinker::linkGlobalValueBody.

clang sometimes constructs aliases like this for C++ destructors.

Differential Revision: https://reviews.llvm.org/D49316

llvm-svn: 337053
2018-07-13 21:58:55 +00:00
Teresa Johnson d94c0594d9 [ThinLTO] Ensure we always select the same function copy to import
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.

Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).

There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.

I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.

(I tried a few other variations of the change but they also didn't
improve time or peak memory).

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D48670

llvm-svn: 337050
2018-07-13 21:35:51 +00:00
Tom Stellard 390a5f4774 AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45882

llvm-svn: 337046
2018-07-13 21:05:14 +00:00
Craig Topper f0831eef0b [X86][FastISel] Add EVEX support to sitofp handling.
llvm-svn: 337045
2018-07-13 21:03:43 +00:00
Fangrui Song 90d9c201dc [X86] Try fixing r336768
llvm-svn: 337043
2018-07-13 20:54:24 +00:00
Roman Lebedev 4d22f15a2f [NFC][InstCombine] Tests for 'check for [no] signed truncation' pattern
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

llvm-svn: 337042
2018-07-13 20:33:34 +00:00
Vlad Tsyrklevich cd1559366d [LowerTypeTests] Limit when icall jumptable entries are emitted
Summary:
Currently LowerTypeTests emits jumptable entries for all live external
and address-taken functions; however, we could limit the number of
functions that we emit entries for significantly.

For Cross-DSO CFI, we continue to emit jumptable entries for all
exported definitions.  In the non-Cross-DSO CFI case, we only need to
emit jumptable entries for live functions that are address-taken in live
functions. This ignores exported functions and functions that are only
address taken in dead functions. This change uses ThinLTO summary data
(now emitted for all modules during ThinLTO builds) to determine
address-taken and liveness info.

The logic for emitting jumptable entries is more conservative in the
regular LTO case because we don't have summary data in the case of
monolithic LTO builds; however, once summaries are emitted for all LTO
builds we can unify the Thin/monolithic LTO logic to only use summaries
to determine the liveness of address taking functions.

This change is a partial fix for PR37474. It reduces the build size for
nacl_helper by ~2-3%, the reduction is due to nacl_helper compiling in
lots of unused code and unused functions that are address taken in dead
functions no longer being being considered live due to emitted jumptable
references. The reduction for chromium is ~0.1-0.2%.

Reviewers: pcc, eugenis, javed.absar

Reviewed By: pcc

Subscribers: aheejin, dexonsmith, dschuff, mehdi_amini, eraman, steven_wu, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D47652

llvm-svn: 337038
2018-07-13 19:57:39 +00:00
Jonas Devlieghere 327e7a1608 [dwarfdump] Add pretty printer for accelerator table based on Atom.
For instance, When dumping .apple_types, the second atom represents the
DW_TAG. In addition to printing the raw value, we now also pretty print
the value if the ATOM tells us how.

llvm-svn: 337026
2018-07-13 17:21:51 +00:00
Andrea Di Biagio e86e6efea1 [llvm-mca][BtVer2] Add tests for dependency breaking instructions.
llvm-svn: 337024
2018-07-13 16:46:51 +00:00
Matt Arsenault de95077780 AMDGPU: Fix handling of alignment padding in DAG argument lowering
This was completely broken if there was ever a struct argument, as
this information is thrown away during the argument analysis.

The offsets as passed in to LowerFormalArguments are not useful,
as they partially depend on the legalized result register type,
and they don't consider the alignment in the first place.

Ignore the Ins array, and instead figure out from the raw IR type
what we need to do. This seems to fix the padding computation
if the DAG lowering is forced (and stops breaking arguments
following padded arguments if the arguments were only partially
lowered in the IR)

llvm-svn: 337021
2018-07-13 16:40:25 +00:00
Evgeniy Stepanov ca8c2f7638 Revert "CallGraphSCCPass: iterate over all functions."
This reverts commit r336419: use-after-free on CallGraph::FunctionMap elements
due to the use of a stale iterator in CGPassManager::runOnModule.

The iterator may be invalidated if a pass removes a function, ex.:
  llvm::LegacyInlinerBase::inlineCalls
  inlineCallsImpl
  llvm::CallGraph::removeFunctionFromModule

llvm-svn: 337018
2018-07-13 16:32:31 +00:00
Roman Lebedev b64e74feed [NFC][X86][AArch64] Negative tests for 'check for [no] signed truncation' pattern
See D49247, D49266

I'm only adding the sane negative tests, and not
adding the one-use tests yet. Also, not adding
negative tests for the second pattern with inverted operands yet,
since it's handling will be added in later differential.

llvm-svn: 337014
2018-07-13 16:14:37 +00:00
Joel Galenson 667eac80da [cfi-verify] Only run AArch64 tests when it is a supported target
This stops the tests I added in r337007 from running when AArch64 is not a supported target.

llvm-svn: 337012
2018-07-13 16:09:19 +00:00
Jonas Devlieghere 8afd926077 [dwarfdump] Pretty print DW_AT_APPLE_runtime_class
Instead of printing

  DW_AT_APPLE_runtime_class       (0x10)

we now print

  DW_AT_APPLE_runtime_class       (DW_LANG_ObjC)

llvm-svn: 337011
2018-07-13 16:06:17 +00:00
Nemanja Ivanovic 080c35050e [PowerPC] Materialize more constants with CR-field set in late peephole
Revision r322373 fixed a bug in how we materialize constants when the CR-field
needs to be set.

However the fix is overly conservative. It will only do the transform if
AND-ing the input with the new constant produces the same new constant.
This is of course correct, but not necessarily required.

If there are no futher uses of the constant, the constant can be changed.
If there are no uses of the GPR result, the final result of the materialization
isn't important other than it needs to compare to zero correctly (lt, gt, eq).

Differential revision: https://reviews.llvm.org/D42109

llvm-svn: 337008
2018-07-13 15:21:03 +00:00
Joel Galenson 06e7e5798f [cfi-verify] Support AArch64.
This patch adds support for AArch64 to cfi-verify.

This required three changes to cfi-verify.  First, it generalizes checking if an instruction is a trap by adding a new isTrap flag to TableGen (and defining it for x86 and AArch64).  Second, the code that ensures that the operand register is not clobbered between the CFI check and the indirect call needs to allow a single dereference (in x86 this happens as part of the jump instruction).  Third, we needed to ensure that return instructions are not counted as indirect branches.  Technically, returns are indirect branches and can be covered by CFI, but LLVM's forward-edge CFI does not protect them, and x86 does not consider them, so we keep that behavior.

In addition, we had to improve AArch64's code to evaluate the branch target of a MCInst to handle calls where the destination is not the first operand (which it often is not).

Differential Revision: https://reviews.llvm.org/D48836

llvm-svn: 337007
2018-07-13 15:19:33 +00:00
Stella Stamenova 02695fa8a7 [json, test] Fix the json.td test - the path to python could contain spaces
Summary: The path to the python executable can contain spaces, so it should be specified with quotes.

Reviewers: asmith, simon_tatham

Reviewed By: simon_tatham

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49258

llvm-svn: 337006
2018-07-13 15:10:53 +00:00
Simon Atanasyan fb0d4e4432 [mips] Add microMIPS case to the tests and regenerate assertions using update_llc_test_checks.py. NFC
llvm-svn: 337004
2018-07-13 15:03:24 +00:00
Tim Renouf f3d8295105 DivergenceAnalysis: added debug output
Summary:
This commit does two things:

1. modified the existing DivergenceAnalysis::dump() so it dumps the
   whole function with added DIVERGENT: annotations;

2. added code to do that dump if the appropriate -debug-only option is
   on.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47700

Change-Id: Id97b605aab1fc6f5a11a20c58a99bbe8c565bf83
llvm-svn: 336998
2018-07-13 13:13:30 +00:00
Chandler Carruth 90358e1ef1 [SLH] Introduce a new pass to do Speculative Load Hardening to mitigate
Spectre variant  for x86.

There is a lengthy, detailed RFC thread on llvm-dev which discusses the
high level issues. High level discussion is probably best there.

I've split the design document out of this patch and will land it
separately once I update it to reflect the latest edits and updates to
the Google doc used in the RFC thread.

This patch is really just an initial step. It isn't quite ready for
prime time and is only exposed via debugging flags. It has two major
limitations currently:
1) It only supports x86-64, and only certain ABIs. Many assumptions are
   currently hard-coded and need to be factored out of the code here.
2) It doesn't include any options for more fine-grained control, either
   of which control flow edges are significant or which loads are
   important to be hardened.
3) The code is still quite rough and the testing lighter than I'd like.

However, this is enough for people to begin using. I have had numerous
requests from people to be able to experiment with this patch to
understand the trade-offs it presents and how to use it. We would also
like to encourage work to similar effect in other toolchains.

The ARM folks are actively developing a system based on this for
AArch64. We hope to merge this with their efforts when both are far
enough along. But we also don't want to block making this available on
that effort.

Many thanks to the *numerous* people who helped along the way here. For
this patch in particular, both Eric and Craig did a ton of review to
even have confidence in it as an early, rough cut at this functionality.

Differential Revision: https://reviews.llvm.org/D44824

llvm-svn: 336990
2018-07-13 11:13:58 +00:00
Simon Pilgrim 36b944e778 [SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED-2)
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Reapplied with fix to only accept 2 different casts if they come from the same source type (PR38154).

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336989
2018-07-13 11:09:52 +00:00
Chandler Carruth 1151692ec5 [x86] Fix a capitalization that I failed to save in my editor before
landing the patch. =/

llvm-svn: 336986
2018-07-13 09:48:04 +00:00
Chandler Carruth caa7b03a50 [x86] Teach the EFLAGS copy lowering to handle much more complex control
flow patterns including forks, merges, and even cyles.

This tries to cover a reasonably comprehensive set of patterns that
still don't require PHIs or PHI placement. The coverage was inspired by
the amazing variety of patterns produced when copy EFLAGS and restoring
it to implement Speculative Load Hardening. Without this patch, we
simply cannot make such complex and invasive changes to x86 instruction
sequences due to EFLAGS.

I've added "just" one test, but this test covers many different
complexities and corner cases of this approach. It is actually more
comprehensive, as far as I can tell, than anything that I have
encountered in the wild on SLH.

Because the test is so complex, I've tried to give somewhat thorough
comments and an ASCII-art diagram of the control flows to make it a bit
easier to read and maintain long-term.

Differential Revision: https://reviews.llvm.org/D49220

llvm-svn: 336985
2018-07-13 09:39:10 +00:00
Sander de Smalen 7c3c0f24a3 [AArch64][SVE] Asm: Vector Unpack Low/High instructions.
This patch adds support for the following unpack instructions:
  
- PUNPKLO, PUNPKHI   Unpack elements from low/high half and
                     place into elements of twice their size.

  e.g. punpklo p0.h, p0.b

- UUNPKLO, UUNPKHI   Unpack elements from low/high half and 
  SUNPKLO, SUNPKHI   place into elements of twice their size
                     after zero- or sign-extending the values.

  e.g. uunpklo z0.h, z0.b

llvm-svn: 336982
2018-07-13 09:25:43 +00:00
Simon Pilgrim 9fe0bf3be7 [AArch64] Updated bigendian buildvector tests
As suggested by @efriedma on D49262 - changed the extractelement to a store to prevent SimplifyDemandedVectorElts from simplifying the build vectors - this keeps the immediate generation which was the point of the tests.

llvm-svn: 336981
2018-07-13 09:25:32 +00:00
Simon Pilgrim a39389ebff [ARM] Regenerated arg endian test
As requested on D49262

llvm-svn: 336980
2018-07-13 09:16:56 +00:00
Sander de Smalen faee91a52b [AArch64][SVE] Asm: Support for insert element (INSR) instructions.
Insert general purpose register into shifted vector, e.g.
  insr    z0.s, w0
  insr    z0.d, x0

Insert SIMD&FP scalar register into shifted vector, e.g.
  insr    z0.b, b0
  insr    z0.h, h0
  insr    z0.s, s0
  insr    z0.d, d0

llvm-svn: 336979
2018-07-13 08:51:57 +00:00
Petar Jovanovic be2e80af12 [LiveDebugValues] Tracking copying value between registers
During the execution of long functions or functions that have a lot of
inlined code it could come to the situation where tracked value could be
transferred from one register to another. The transfer is recognized only if
destination register is a callee saved register and if source register is
killed. We do not salvage caller-saved registers since there is a great
chance that killed register would outlive it.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D44016

llvm-svn: 336978
2018-07-13 08:24:26 +00:00
Dean Michael Berris 10141261e1 [XRay][compiler-rt] Add PID field to llvm-xray tool and add PID metadata record entry in FDR mode
Summary:
llvm-xray changes:
- account-mode - process-id  {...} shows after thread-id
- convert-mode - process {...} shows after thread
- parses FDR and basic mode pid entries
- Checks version number for FDR log parsing.

Basic logging changes:
- Update header version from 2 -> 3

FDR logging changes:
- Update header version from 2 -> 3
- in writeBufferPreamble, there is an additional PID Metadata record (after thread id record and tsc record)

Test cases changes:
- fdr-mode.cc, fdr-single-thread.cc, fdr-thread-order.cc modified to catch process id output in the log.

Reviewers: dberris

Reviewed By: dberris

Subscribers: hiraditya, llvm-commits, #sanitizers

Differential Revision: https://reviews.llvm.org/D49153

llvm-svn: 336974
2018-07-13 05:38:22 +00:00
Craig Topper 2ab325ba23 [X86] Remove isel patterns that turns packed add/sub/mul/div+movss/sd into scalar intrinsic instructions.
This is not an optimization we should be doing in isel. This is more suitable for a DAG combine.

My main concern is a future time when we support more FPENV. Changing a packed op to a scalar op could cause us to miss some exceptions that should have occured if we had done a packed op. A DAG combine would be better able to manage this.

llvm-svn: 336971
2018-07-13 04:50:39 +00:00
Joel E. Denny dc5ba317b1 [FileCheck] Implement -v and -vv for tracing matches
-v prints all directive pattern matches.

-vv additionally prints info that might be noise to users but that can
be helpful to FileCheck developers.

To maximize code reuse and to make diagnostics more consistent, this
patch also adjusts and extends some of the existing diagnostics.
CHECK-NOT failures now report variables uses.  Many more diagnostics
now report the check prefix and kind of directive.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D47114

llvm-svn: 336967
2018-07-13 03:08:23 +00:00
Sanjay Patel 70043b7e9a [InstCombine] return when SimplifyAssociativeOrCommutative makes a change
This bug was created by rL335258 because we used to always call instsimplify
after trying the associative folds. After that change it became possible
for subsequent folds to encounter unsimplified code (and potentially assert
because of it). 

Instead of carrying changed state through instcombine, we can just return 
immediately. This allows instsimplify to run, so we can continue assuming
that easy folds have already occurred.

llvm-svn: 336965
2018-07-13 01:18:07 +00:00
Piotr Padlewski c63b492bcd Simplify recursive launder.invariant.group and strip
Summary:
This patch is crucial for proving equality laundered/stripped
pointers. eg:

  bool foo(A *a) {
    return a == std::launder(a);
  }

Clang with -fstrict-vtable-pointers will emit something like:

    define dso_local zeroext i1 @_Z3fooP1A(%struct.A* %a) {
    entry:
      %c = bitcast %struct.A* %a to i8*
      %call = tail call i8* @llvm.launder.invariant.group.p0i8(i8* %c)
      %0 = bitcast %struct.A* %a to i8*
      %1 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %0)
      %2 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %call)
      %cmp = icmp eq i8* %1, %2
      ret i1 %cmp
    }

and because %2 can be replaced with @llvm.strip.invariant.group(%0)
and that %2 and %1 will produce the same value (because strip is readnone)
we can replace compare with true.

Reviewers: rsmith, hfinkel, majnemer, amharc, kuhar

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D47423

llvm-svn: 336963
2018-07-12 23:55:20 +00:00
Craig Topper cbdf571afd [X86] Regenerate checks in sse-scalar-fp-arith.ll.
llvm-svn: 336958
2018-07-12 22:59:03 +00:00
Craig Topper 3a13477214 [X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions.
These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD.

llvm-svn: 336954
2018-07-12 22:14:10 +00:00
Craig Topper b0053b79d6 Revert r336950 and r336951 "[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions." and "foo"
One of them had a bad title and they should have been squashed.

llvm-svn: 336953
2018-07-12 21:58:03 +00:00
Craig Topper 3b837b6b63 [X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions.
These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD.

llvm-svn: 336951
2018-07-12 21:53:23 +00:00
Martin Storsjo 86b95489fd Revert "[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)"
This reverts commit r336812, which broke compilation of a number
of projects, see PR38154.

llvm-svn: 336949
2018-07-12 21:33:42 +00:00
Bill Wendling 7bd9e94e38 [gold-plugin] Disable section ordering for relocatable links
Not all programs want section ordering when compiled with LTO.
In particular, the Linux kernel is very sensitive when it comes to linking, and
doesn't boot when each function is placed in its own sections.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D48756

llvm-svn: 336943
2018-07-12 20:35:58 +00:00
Stefan Pintilie 94259ba13a [PowerPC] [NFC] Update __float128 tests
Add the two options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to
the __float128 tests. Then modify the tests as required.

llvm-svn: 336940
2018-07-12 20:18:57 +00:00
Craig Topper 57c4585bab [X86][FastISel] Support EVEX version of sqrt.
llvm-svn: 336939
2018-07-12 19:58:06 +00:00
Craig Topper 1880a3f0d8 [X86] Add show-mc-encoding to some fast-isel intrinsic tests so we can observe a failure to select EVEX instructions.
The one I noticed is sqrtsss/sd, but there could be others.

I had to add a couple new tests that don't have insertelement in there to catch this on the fast-isel path. Otherwise we trigger an abort and use SelectionDAG.

llvm-svn: 336938
2018-07-12 19:58:01 +00:00
Matt Arsenault 4dca0a9904 AMDGPU: Fix assert in truncate combine with vectors
The piece above probably has the same problem, but I need
to try to come up with a test for it.

llvm-svn: 336935
2018-07-12 19:40:16 +00:00
Wolfgang Pieb fcf3810cf7 [DWARF v5] Generate range list tables into the .debug_rnglists section. No support for split DWARF
and no use of DW_FORM_rnglistx with the DW_AT_ranges attribute.

Reviewer: aprantl

Differential Revision: https://reviews.llvm.org/D49214

llvm-svn: 336927
2018-07-12 18:18:21 +00:00
Stephen Hines e8c3c5fe5d Add --strip-all option back to llvm-strip.
Summary:
This option appears to have been dropped as part of the refactoring in
r331663. Unfortunately, if we want to use llvm-strip as a drop-in
replacement for strip, this option should still be available.

Reviewers: alexshap

Reviewed By: alexshap

Subscribers: meikeb, kongyi, chh, jakehehrlich, llvm-commits, pirama

Differential Revision: https://reviews.llvm.org/D49226

llvm-svn: 336921
2018-07-12 17:42:17 +00:00
Roman Lebedev 1574e49792 [NFC][X86][AArch64] Add tests for the 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

But as it looks from these tests,
i think we want to revert at least some cases in DAGCombine.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49247

llvm-svn: 336917
2018-07-12 17:00:11 +00:00
Sjoerd Meijer 88757ba3bd Follow up of r336913: forgot to add the new test files.
llvm-svn: 336914
2018-07-12 14:59:02 +00:00
Roman Lebedev 733a8d6886 [InstCombine] icmp-logical.ll: restore the original intention of the test.
While that fold is clearly not happening [anymore],
we do now have separate test cases for these cases,
so we should be ok to slightly adjust these tests
to not potentially loose test coverage.

As suggested by Hiroshi Yamauchi in
https://reviews.llvm.org/D49179#1159345

llvm-svn: 336912
2018-07-12 14:56:17 +00:00
Roman Lebedev 74f899f0f4 [InstCombine] Fold x & (-1 >> y) != x to x u> (-1 >> y)
Summary:
A complementary fold to D49179.

https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Rny

Caveat: one more thing in `test/Transforms/InstCombine/icmp-logical.ll` breaks.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49205

llvm-svn: 336911
2018-07-12 14:56:12 +00:00
Andrew Ng ac305e18a4 [ThinLTO] Escape module paths when printing
We have located a bug in AssemblyWriter::printModuleSummaryIndex(). This
function outputs path strings incorrectly. Backslashes in the strings
are not correctly escaped.

Consequently, if a path name contains a backslash followed by two
hexadecimal characters, the sequence is incorrectly interpreted when the
output is read by another component. This mangles the path and results
in error.

This patch fixes this issue by calling printEscapedString() to output
the module paths.

Patch by Chris Jackson.

Differential Revision: https://reviews.llvm.org/D49090

llvm-svn: 336908
2018-07-12 14:40:21 +00:00
Francis Visoiu Mistrih 8b2ab91a5d [DebugInfo][X86] Add start-after flags to MIR tests
These tests would fail with -verify-machineinstrs because the MI
generated from the IR would be merged with the one already in the MIR
files, and we get the following error:

```
*** Bad machine code: Function has NoVRegs property but there are VReg operands ***
- function:    f
```

Differential Revision: https://reviews.llvm.org/D49191

llvm-svn: 336907
2018-07-12 14:36:48 +00:00
Francis Visoiu Mistrih 68e9ed75d5 [XRay] Fix machine verifier issues in X86
I'm not sure if this fix is the right thing to do, but it seemed to me
that PATCHABLE_RET and PATCHABLE_TAIL_CALL don't have any defs.

Running the following:

```
LLVM_ENABLE_MACHINE_VERIFIER=1 ./build/bin/llvm-lit -v -a test/CodeGen/X86/xray-*
```

results in the following tests to fail (along others):

```
LLVM :: CodeGen/X86/xray-attribute-instrumentation.ll
LLVM :: CodeGen/X86/xray-custom-log.ll
LLVM :: CodeGen/X86/xray-log-args.ll
LLVM :: CodeGen/X86/xray-loop-detection.ll
LLVM :: CodeGen/X86/xray-multiplerets-in-blocks.mir
LLVM :: CodeGen/X86/xray-section-group.ll
LLVM :: CodeGen/X86/xray-selective-instrumentation.ll
LLVM :: CodeGen/X86/xray-tail-call-sled.ll
LLVM :: CodeGen/X86/xray-typed-event-log.ll
```

The errors are:

```
*** Bad machine code: Explicit definition must be a register ***
- function:    fn
- basic block: %bb.0  (0x7fa31a84d908)
- instruction: PATCHABLE_RET 2560, $eax
- operand 0:   2560
```

and

```
*** Bad machine code: Explicit definition must be a register ***
- function:    caller
- basic block: %bb.0  (0x7fbff3044108)
- instruction: PATCHABLE_TAIL_CALL 3009, @callee, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rsp, implicit $ssp, implicit $edi
- operand 0:   3009
```

Differential Revision: https://reviews.llvm.org/D49187

llvm-svn: 336906
2018-07-12 14:36:43 +00:00
Simon Pilgrim 44b89fa900 [X86][SSE] Utilize ZeroableElements for canWidenShuffleElements
canWidenShuffleElements can do a better job if given a mask with ZeroableElements info. Apparently, ZeroableElements was being only used to identify AllZero candidates, but possibly we could plug it into more shuffle matchers.

Original Patch by Zvi Rackover @zvi

Differential Revision: https://reviews.llvm.org/D42044

llvm-svn: 336903
2018-07-12 13:29:41 +00:00
Chen Zheng 9b00a8e9d7 [InstCombine]add testcases for folding more SPFofSPF pattern
Differential Revision: https://reviews.llvm.org/D49222

llvm-svn: 336902
2018-07-12 13:28:20 +00:00
Simon Pilgrim 8a463897e9 [X86][AVX] Use Zeroable mask to improve shuffle mask widening
Noticed while updating D42044, lowerV2X128VectorShuffle can improve the shuffle mask with the zeroable data to create a target shuffle mask to recognise more 'zero upper 128' patterns.

NOTE: lowerV4X128VectorShuffle could benefit as well but the code needs refactoring first to discriminate between SM_SentinelUndef and SM_SentinelZero for negative shuffle indices.

Differential Revision: https://reviews.llvm.org/D49092

llvm-svn: 336900
2018-07-12 13:03:58 +00:00
Simon Pilgrim 8868cda9ca [X86] Add UDIV by uniform/non-uniform constant tests
llvm-svn: 336894
2018-07-12 09:04:28 +00:00
Chen Zheng fdf13ef342 [InstSimplify] simplify add instruction if two operands are negative
Differential Revision: https://reviews.llvm.org/D49216

llvm-svn: 336881
2018-07-12 03:06:04 +00:00
Eric Christopher 7289ac8a19 Temporarily revert "Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters." as it's causing miscompiles.
A testcase was provided in the original review thread.

This reverts commit r336098.

llvm-svn: 336877
2018-07-12 01:53:21 +00:00
Chandler Carruth b4faf4ce08 [x86] Fix another trivial bug in x86 flags copy lowering that has been
there for a long time.

The boolean tracking whether we saw a kill of the flags was supposed to
be per-block we are scanning and instead was outside that loop and never
cleared. It requires a quite contrived test case to hit this as you have
to have multiple levels of successors and interleave them with kills.
I've included such a test case here.

This is another bug found testing SLH and extracted to its own focused
patch.

llvm-svn: 336876
2018-07-12 01:43:21 +00:00
Craig Topper be996bd2d9 [X86] Add patterns to use VMOVSS/SD zero masking for scalar f32/f64 select with zero.
These showed up in some of the upgraded FMA code. We really need to improve these test cases more, but this helps for now.

llvm-svn: 336875
2018-07-12 00:54:40 +00:00
Chandler Carruth 1c8234f639 [x86] Fix EFLAGS copy lowering to correctly handle walking past uses in
multiple successors where some of the uses end up killing the EFLAGS
register.

There was a bug where rather than skipping to the next basic block
queued up with uses once we saw a kill, we stopped processing the blocks
entirely. =/

Test case produces completely nonsensical code w/o this tiny fix.

This was found testing Speculative Load Hardening and split out of that
work.

Differential Revision: https://reviews.llvm.org/D49211

llvm-svn: 336874
2018-07-12 00:52:50 +00:00
Craig Topper 034adf2683 [X86] Remove and autoupgrade the scalar fma intrinsics with masking.
This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches.

llvm-svn: 336871
2018-07-12 00:29:56 +00:00
Eric Christopher ba4a090a24 Add -allow-deprecated-dag-overlap to one of the experimental webassembly target tests.
llvm-svn: 336870
2018-07-12 00:01:51 +00:00
Duncan P. N. Exon Smith 9f22752c4b IR: Skip -print-*-all after -print-*
This changes `-print-*` from transformation passes to analysis passes so
that `-print-after-all` and `-print-before-all` don't trigger.  This
avoids some redundant output.

Patch by Son Tuan Vu!

llvm-svn: 336869
2018-07-11 23:30:25 +00:00
Eli Friedman 0319c28459 [CodeGen] Emit more precise AssertZext/AssertSext nodes.
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.

Differential Revision: https://reviews.llvm.org/D49004

llvm-svn: 336868
2018-07-11 23:26:35 +00:00
Craig Topper ed6acde8cf [LoopIdiomRecognize] Don't convert a do while loop to ctlz.
This commit suppresses turning loops like this into "(bitwidth - ctlz(input))".

unsigned foo(unsigned input) {
  unsigned num = 0;
  do {
    ++num;
    input >>= 1;
  } while (input != 0);
  return num;
}

The loop version returns a value of 1 for both an input of 0 and an input of 1. Converting to a naive ctlz does not preserve that.

Theoretically we could do better if we checked isKnownNonZero or we could insert a select to handle the divergence. But until we have motivating cases for that, this is the easiest solution.

llvm-svn: 336864
2018-07-11 22:35:28 +00:00
Craig Topper ef08aec935 [LoopIdiomRecognize] Add a test case showing a loop we turn into ctlz that we shouldn't.
This loop executes one iteration without checking the input value. This produces a count of 1 for an input of 0 and 1. We are turning this into 32 - ctlz(n), but that returns 0 if n is 0.

llvm-svn: 336862
2018-07-11 22:17:26 +00:00
Roman Lebedev 2c19fe52e9 [NFC][InstCombine] Tests for x & (-1 >> y) != x -> x u> (-1 >> y) fold
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Rny

llvm-svn: 336857
2018-07-11 21:28:42 +00:00
Joel E. Denny e08566fcf9 finish: [FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests
Differential Revision: https://reviews.llvm.org/D47171

This contains the portions of that patch that could not be committed
using the git monorepo because of dos line ending problems.

llvm-svn: 336848
2018-07-11 20:31:51 +00:00
Joel E. Denny bcf5b441d8 [FileCheck] Don't permit overlapping CHECK-DAG
That is, make CHECK-DAG skip matches that overlap the matches of any
preceding consecutive CHECK-DAG directives.  This change makes
CHECK-DAG more consistent with other directives, and there is evidence
it makes CHECK-DAG more intuitive and less error-prone.  See the RFC
discussion starting at:

  http://lists.llvm.org/pipermail/llvm-dev/2018-May/123010.html

Moreover, this behavior enables CHECK-DAG groups for unordered,
non-unique strings or patterns.  For example, it is useful for
verifying output or logs from a parallel program, such as the OpenMP
runtime.

This patch also implements the command-line option
-allow-deprecated-dag-overlap, which reverts CHECK-DAG to the old
overlapping behavior.  This option should not be used in new tests.
It is meant only for the existing tests that are broken by this change
and that need time to update.

See the following bugzilla issue for tracking of such tests:

  https://bugs.llvm.org/show_bug.cgi?id=37532

Patches to add -allow-deprecated-dag-overlap to those tests will
follow immediately.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D47106

llvm-svn: 336847
2018-07-11 20:27:27 +00:00
Joel E. Denny 9fa9c9368d [FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests
See https://reviews.llvm.org/D47106 for details.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D47171

This commit drops that patch's changes to:

  llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
  llvm/test/CodeGen/NVPTX/param-load-store.ll

For some reason, the dos line endings there prevent me from commiting
via the monorepo.  A follow-up commit (not via the monorepo) will
finish the patch.

llvm-svn: 336843
2018-07-11 20:25:49 +00:00
Roman Lebedev 68d54cf5b3 [InstCombine] Fold x & (-1 >> y) == x to x u<= (-1 >> y)
Summary:
https://bugs.llvm.org/show_bug.cgi?id=38123

This pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in unsigned case, therefore it is probably a good idea to improve it.

https://rise4fun.com/Alive/Rny
^ there are more opportunities for folds, i will follow up with them afterwards.

Caveat: this somehow exposes a missing opportunities
in `test/Transforms/InstCombine/icmp-logical.ll`
It seems, the problem is in `foldLogOpOfMaskedICmps()` in `InstCombineAndOrXor.cpp`.
But i'm not quite sure what is wrong, because it calls `getMaskedTypeForICmpPair()`,
which calls `decomposeBitTestICmp()` which should already work for these cases...
As @spatel notes in https://reviews.llvm.org/D49179#1158760,
that code is a rather complex mess, so we'll let it slide.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: yamauchi, majnemer, t.p.northover, llvm-commits

Differential Revision: https://reviews.llvm.org/D49179

llvm-svn: 336834
2018-07-11 19:05:04 +00:00
Joel E. Denny 614c986175 Revert r336830: [FileCheck] Don't permit overlapping CHECK-DAG
Companion patches are failing to commit, and this patch alone breaks
many tests.

llvm-svn: 336833
2018-07-11 19:03:00 +00:00
Paul Robinson 72b014051c Quick fix for some Windows bots
llvm-svn: 336832
2018-07-11 18:51:15 +00:00
Joel E. Denny edf338856c [FileCheck] Don't permit overlapping CHECK-DAG
That is, make CHECK-DAG skip matches that overlap the matches of any
preceding consecutive CHECK-DAG directives.  This change makes
CHECK-DAG more consistent with other directives, and there is evidence
it makes CHECK-DAG more intuitive and less error-prone.  See the RFC
discussion starting at:

  http://lists.llvm.org/pipermail/llvm-dev/2018-May/123010.html

Moreover, this behavior enables CHECK-DAG groups for unordered,
non-unique strings or patterns.  For example, it is useful for
verifying output or logs from a parallel program, such as the OpenMP
runtime.

This patch also implements the command-line option
-allow-deprecated-dag-overlap, which reverts CHECK-DAG to the old
overlapping behavior.  This option should not be used in new tests.
It is meant only for the existing tests that are broken by this change
and that need time to update.

See the following bugzilla issue for tracking of such tests:

  https://bugs.llvm.org/show_bug.cgi?id=37532

Patches to add -allow-deprecated-dag-overlap to those tests will
follow immediately.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D47106

llvm-svn: 336830
2018-07-11 18:42:58 +00:00
Paul Semel 0f9ca2d960 Revert "[llvm-objdump] Add -demangle (-C) option"
This reverts commit 3a44ccd156e0edd2e89226f8ed63928e227900bb.
This reverts commit d5cfc836bb5552e20507d3612d13ff66ff9e36a0.

llvm-svn: 336829
2018-07-11 18:09:52 +00:00
Craig Topper 38b290f7d7 [X86] Remove patterns for inserting a load into a zero vector.
We can instead block the load folding isProfitableToFold. Then isel will emit a register->register move for the zeroing part and a separate load. The PostProcessISelDAG should be able to remove the register->register move.

This saves us patterns and fixes the fact that we only had unaligned load patterns. The test changes show places where we should have been using an aligned load.

llvm-svn: 336828
2018-07-11 18:09:04 +00:00
Simon Pilgrim 667a5b541f [TargetTransformInfo] Add pow2 analysis for scalar constants
Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs.

llvm-svn: 336827
2018-07-11 17:51:27 +00:00
Sanjay Patel 14eeb5a5c0 [InstSimplify] add/move tests for add folds; NFC
isKnownNegation() is currently proposed as part of D48754,
but it could be used to make InstSimplify stronger independently
of any abs() improvements.

llvm-svn: 336822
2018-07-11 16:52:18 +00:00
Paul Semel 569200aab8 Fix llvm-objdump demangle test (added triple option)
llvm-svn: 336821
2018-07-11 16:31:33 +00:00
Andrea Di Biagio 483db141e3 [X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions.
Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was
inferred directly from the "default" pattern associated with the instruction
definition.

r336728 removed special node X86Movlps, and all the patterns associated to it.
Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the
'mayLoad/hasSideEffects' flags are left unset.

When the instruction info is emitted by tablegen, method
CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a
pattern, and flags are undefined. So, it conservatively sets the
"hasSideEffects" flag for it.

As a consequence, we were losing the 'mayLoad' flag, and we were gaining a
'hasSideEffect' flag in its place.
This patch fixes the issue (originally reported by Michael Holmen).

The mca tests show the differences in the instruction info flags.  Instructions
that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.

Differential Revision: https://reviews.llvm.org/D49182

llvm-svn: 336818
2018-07-11 15:27:50 +00:00
Paul Semel bcf55ab95a [llvm-objdump] Add -demangle (-C) option
Differential Revision: https://reviews.llvm.org/D49043

llvm-svn: 336816
2018-07-11 15:25:39 +00:00
Simon Pilgrim 876e99bf2c [SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Reapplied with fix to only accept 2 different casts if they come from the same source type.

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336812
2018-07-11 15:05:10 +00:00
Simon Pilgrim 09832e33b4 [SLPVectorizer] Ensure alternate/passthrough doesn't vectorize sdiv with undef elts
llvm-svn: 336809
2018-07-11 14:34:43 +00:00
Simon Pilgrim 0c7bea0dc0 [SLPVectorizer] Add some additional alternate cast tests
Initial attempt at D49135 failed as we weren't correctly handling casts with different source types.

llvm-svn: 336808
2018-07-11 14:29:13 +00:00
Simon Pilgrim 1edde95abd Revert rL336804: [SLPVectorizer] Add initial alternate opcode support for cast instructions.
Reverting due to buildbot failures

llvm-svn: 336806
2018-07-11 14:08:16 +00:00
Simon Pilgrim 2f963a7e83 [SLPVectorizer] Add initial alternate opcode support for cast instructions.
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336804
2018-07-11 13:34:09 +00:00
Krzysztof Parzyszek 0b492f7fb8 [CodeGen] Ignore debug uses in MachineCopyPropagation
Debug uses should not count as real uses, since the presence of debug
information could affect the generated code.

llvm-svn: 336803
2018-07-11 13:30:27 +00:00
Andrea Di Biagio d2e2c053cf [llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC
This makes easier to identify changes in the instruction info flags.  It also
helps spotting potential regressions similar to the one recently introduced at
r336728.

Using the same character to mark MayLoad/MayStore/HasSideEffects is problematic
for llvm-lit. When pattern matching substrings, llvm-lit consumes tabs and
spaces. A change in position of the flag marker may not trigger a test failure.

This patch only changes the character used for flag `hasSideEffects`. The reason
why I didn't touch other flags is because I want to avoid spamming the mailing
because of the massive diff due to the numerous tests affected by this change.

In future, each instruction flag should be associated with a different character
in the Instruction Info View.

llvm-svn: 336797
2018-07-11 12:44:44 +00:00
Roman Lebedev 76f195cdbd [NFC][InstCombine] Tests for x & (-1 >> y) == x -> x u<= (-1 >> y) fold
https://bugs.llvm.org/show_bug.cgi?id=38123

This pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in unsigned case, therefore it is probably a good idea to improve it.

https://rise4fun.com/Alive/Rny

llvm-svn: 336796
2018-07-11 12:37:12 +00:00
Sjoerd Meijer 53449daa96 [ARM] ParallelDSP: multiple reduction stmts in loop
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.

Differential revision: https://reviews.llvm.org/D49125

llvm-svn: 336795
2018-07-11 12:36:25 +00:00
Jonas Devlieghere 26ddf274d7 Use debug-prefix-map for AT_NAME
AT_NAME was being emitted before the directory paths were remapped. This
ensures that all paths are remapped before anything is emitted.

An additional test case has been added.

Note that this only works if the replacement string is an absolute path.
If not, then AT_decl_file believes the new path is a relative path, and
joins that path with the compilation directory. I do not know of a good
way to resolve this.

Patch by: Siddhartha Bagaria (starsid)

Differential revision: https://reviews.llvm.org/D49169

llvm-svn: 336793
2018-07-11 12:30:35 +00:00
Sander de Smalen ea45a89e5c [AArch64][SVE] Asm: Support for COMPACT instruction.
The compact instruction shuffles active elements of vector
into lowest numbered elements and sets remaining elements
to zero. 

e.g.
  compact z0.s, p0, z1.s

llvm-svn: 336789
2018-07-11 11:22:26 +00:00
Simon Pilgrim f6ff75c4c2 Fix check-prefix vs check-prefixes typo in updated test
llvm-svn: 336787
2018-07-11 10:42:51 +00:00
Simon Pilgrim 1975efe555 [AArch64] Regenerate SDIV tests
Will make codegen diffs much easier to grok in a future patch

llvm-svn: 336786
2018-07-11 10:39:50 +00:00
Roman Lebedev a042fae6e0 [NFC][InstCombine] icmp-logical.ll: add a few more tests.
The @masked_and_notA_slightly_optimized and @masked_or_A
will break when PR38123 will be fixed:
https://rise4fun.com/Alive/Rny
Clearly, they aren't optimized currently.

https://rise4fun.com/Alive/ERo

llvm-svn: 336784
2018-07-11 10:31:12 +00:00
Sander de Smalen a90530f7c1 [AArch64][SVE] Asm: Support for LAST(A|B) and CLAST(A|B) instructions.
The LASTB and LASTA instructions extract the last active element,
or element after the last active, from the source vector.

The added variants are:

  Scalar:
  last(a|b)  w0, p0, z0.b
  last(a|b)  w0, p0, z0.h
  last(a|b)  w0, p0, z0.s
  last(a|b)  x0, p0, z0.d

  SIMD & FP Scalar:
  last(a|b)  b0, p0, z0.b
  last(a|b)  h0, p0, z0.h
  last(a|b)  s0, p0, z0.s
  last(a|b)  d0, p0, z0.d

The CLASTB and CLASTA conditionally extract the last or element after
the last active element from the source vector.

The added variants are:

  Scalar:
  clast(a|b)  w0, p0, w0, z0.b
  clast(a|b)  w0, p0, w0, z0.h
  clast(a|b)  w0, p0, w0, z0.s
  clast(a|b)  x0, p0, x0, z0.d

  SIMD & FP Scalar:
  clast(a|b)  b0, p0, b0, z0.b
  clast(a|b)  h0, p0, h0, z0.h
  clast(a|b)  s0, p0, s0, z0.s
  clast(a|b)  d0, p0, d0, z0.d

  Vector:
  clast(a|b)  z0.b, p0, z0.b, z1.b
  clast(a|b)  z0.h, p0, z0.h, z1.h
  clast(a|b)  z0.s, p0, z0.s, z1.s
  clast(a|b)  z0.d, p0, z0.d, z1.d

Please refer to the architecture specification for more details on
the semantics of the added instructions.

llvm-svn: 336783
2018-07-11 10:08:00 +00:00
Paul Semel b98f504850 [llvm-readobj] Add -hex-dump (-x) option
Differential Revision: https://reviews.llvm.org/D48281

llvm-svn: 336782
2018-07-11 10:00:29 +00:00
Roman Lebedev 5260c9efc8 [NFC][InstCombine] Fix extra space padding in icmp-mul-zext.ll test
update_test_checks will drop it anyway, creating noise..

llvm-svn: 336781
2018-07-11 09:57:53 +00:00
Roman Lebedev c5e437e5eb [NFC][InstCombine] Add variable names and regenerate icmp-logical.ll test.
llvm-svn: 336780
2018-07-11 09:57:46 +00:00
Andrea Di Biagio 2b3a4f9c9b [llvm-mca] Add tests for partial register writes.
llvm-mca doesn't know that on modern AMD processors, portions of a general
purpose register are not treated independently. So, a partial register write has
a false dependency on the super-register.

The issue with partial register writes will be addressed by a follow-up patch.

llvm-svn: 336778
2018-07-11 09:50:00 +00:00
Simon Pilgrim df9d59771b [DAGCombiner] Support non-uniform X%C -> X-(X/C)*C folds
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.

We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.

Differential Revision: https://reviews.llvm.org/D48975

llvm-svn: 336774
2018-07-11 09:22:42 +00:00
Simon Pilgrim 97cf111689 [DAGCombiner] Add (urem X, -1) -> select(X == -1, 0, x) fold
llvm-svn: 336773
2018-07-11 09:14:37 +00:00
Simon Tatham 6a8c6cadf1 [TableGen] Add a general-purpose JSON backend.
The aim of this backend is to output everything TableGen knows about
the record set, similarly to the default -print-records backend. But
where -print-records produces output in TableGen's input syntax
(convenient for humans to read), this backend produces it as
structured JSON data, which is convenient for loading into standard
scripting languages such as Python, in order to extract information
from the data set in an automated way.

The output data contains a JSON representation of the variable
definitions in output 'def' records, and a few pieces of metadata such
as which of those definitions are tagged with the 'field' prefix and
which defs are derived from which classes. It doesn't dump out
absolutely every piece of knowledge it _could_ produce, such as type
information and complicated arithmetic operator nodes in abstract
superclasses; the main aim is to allow consumers of this JSON dump to
essentially act as new backends, and backends don't generally need to
depend on that kind of data.

The new backend is implemented as an EmitJSON() function similar to
all of llvm-tblgen's other EmitFoo functions, except that it lives in
lib/TableGen instead of utils/TableGen on the basis that I'm expecting
to add it to clang-tblgen too in a future patch.

To test it, I've written a Python script that loads the JSON output
and tests properties of it based on comments in the .td source - more
or less like FileCheck, except that the CHECK: lines have Python
expressions after them instead of textual pattern matches.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arichardson, labath, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D46054

llvm-svn: 336771
2018-07-11 08:40:19 +00:00
Craig Topper 02867f0fa3 [X86] The TEST instruction is eliminated when BSF/TZCNT is used
Summary:
These changes cover the PR#31399.
Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0
and it corresponds to the following llvm code:
  %cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
  %tobool = icmp eq i32 %v, 0
  %.op = add nuw nsw i32 %cnt, 1
  %add = select i1 %tobool, i32 0, i32 %.op
and x86 asm code:
  bsfl     %edi, %ecx
  addl     $1, %ecx
  testl    %edi, %edi
  movl     $0, %eax
  cmovnel  %ecx, %eax
In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies the EFLAGS, namely, ZF flag
that is set by the 'bsf' instruction when 'x' is zero.

We now produce the following code:
  bsfl     %edi, %ecx
  movl     $-1, %eax
  cmovnel  %ecx, %eax
  addl     $1, %eax

Patch by Ivan Kulagin

Reviewers: davide, craig.topper, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48765

llvm-svn: 336768
2018-07-11 06:57:42 +00:00
Craig Topper 1d6a80cd95 [X86] Remove some composite MOVSS/MOVSD isel patterns.
These patterns looked for a MOVSS/SD followed by a scalar_to_vector. Or a scalar_to_vector followed by a load.

In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load.

But we have patterns that do each of those 3 things individually so there's no reason to build large patterns.

Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll and it appears to be a regression. But its doing -O0 so I think it missed a lot of opportunities and was just getting lucky before.

llvm-svn: 336762
2018-07-11 04:51:40 +00:00
Sam Clegg 92617559bb [WebAssembly] Add pass to infer prototypes for prototype-less functions
See https://bugs.llvm.org/show_bug.cgi?id=35385

Differential Revision: https://reviews.llvm.org/D48471

llvm-svn: 336759
2018-07-11 04:29:36 +00:00
Stefan Pintilie b9d01aa29e [Power9] Add remaining __flaot128 builtin support for FMA round to odd
Implement this as it is done on GCC:

__float128 a, b, c, d;
a = __builtin_fmaf128_round_to_odd (b, c, d);         // generates xsmaddqpo
a = __builtin_fmaf128_round_to_odd (b, c, -d);        // generates xsmsubqpo
a = - __builtin_fmaf128_round_to_odd (b, c, d);       // generates xsnmaddqpo
a = - __builtin_fmaf128_round_to_odd (b, c, -d);      // generates xsnmsubpqp

Differential Revision: https://reviews.llvm.org/D48218

llvm-svn: 336754
2018-07-11 01:42:22 +00:00
Chen Zheng fb361d2525 [test cases] add test cases for find more abs pattern
Differential Revision: https://reviews.llvm.org/D49123

llvm-svn: 336752
2018-07-11 01:07:21 +00:00
Eli Friedman d2c739230c [ARM] Treat cmn immediates as legal in isLegalICmpImmediate.
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions.  Fix the type
conversions, and perform the correct check for negative immediates.

This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.

Differential Revision: https://reviews.llvm.org/D48907

llvm-svn: 336742
2018-07-10 23:44:37 +00:00
Craig Topper 860ab496d3 [X86] Teach X86InstrInfo::commuteInstructionImpl to use MOVSD/MOVSS for BLEND under optsize when the immediate allows it.
Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend.

Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend.

llvm-svn: 336731
2018-07-10 22:02:23 +00:00
Teresa Johnson c0320ef47b [ThinLTO] Use std::map to get determistic imports files
Summary:
I noticed that the .imports files emitted for distributed ThinLTO
backends do not have consistent ordering. This is because StringMap
iteration order is not guaranteed to be deterministic. Since we already
have a std::map with this information, used when emitting the individual
index files (ModuleToSummariesForIndex), use it for the imports files as
well.

This issue is likely causing some unnecessary rebuilds of the ThinLTO
backends in our distributed build system as the imports files are inputs
to those backends.

Reviewers: pcc, steven_wu, mehdi_amini

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D48783

llvm-svn: 336721
2018-07-10 20:06:04 +00:00
Alexander Ivchenko 48ca0550dd [GlobalISel][X86_64] Support for G_SITOFP
The instruction selection is automatically handled by tablegen

llvm-svn: 336703
2018-07-10 16:38:35 +00:00
Eugene Leviant 6a572b8e79 [Evaluator] Examine alias when evaluating function call
This fixes PR38120

llvm-svn: 336702
2018-07-10 16:34:23 +00:00
Simon Pilgrim 4cb4609392 [DAGCombiner] Add special case fast paths for udiv x,1 and udiv x,-1
udiv x,-1 was going down the (slow) BuildUDIV route resulting in unnecessary shifts.

llvm-svn: 336701
2018-07-10 16:33:07 +00:00
Konstantin Zhuravlyov f0badd5ac1 AMDGPU: Make hidden argument metadata consistent with
amdgpu-implicitarg-num-bytes attribute

Differential Revision: https://reviews.llvm.org/D49096

llvm-svn: 336697
2018-07-10 16:12:51 +00:00
Sanjay Patel c8d9d812ec [InstCombine] allow flag propagation when using safe constant
This corresponds with the code for the single binop pattern
added in rL336684.

llvm-svn: 336696
2018-07-10 16:09:49 +00:00
Simon Pilgrim 9bd9fef446 [X86] Add srem/udiv/urem by constant tests
Match the tests in combine-sdiv.ll

llvm-svn: 336694
2018-07-10 16:08:28 +00:00
Heejin Ahn 9ef850b844 [WebAssembly] Add missing a few {{$}}s to a test
llvm-svn: 336691
2018-07-10 16:00:43 +00:00
Konstantin Zhuravlyov 75024cf4a7 AMDGPU/NFC: Fix typo in test name
hsa-metadata-enqueu-kernel.ll ->
hsa-metadata-enqueue-kernel.ll

llvm-svn: 336689
2018-07-10 15:54:46 +00:00
Paul Robinson 88ef98b02b Update test to work on Windows
llvm-svn: 336687
2018-07-10 15:23:10 +00:00
Sanjay Patel 509a1e7a9b [InstCombine] safely allow non-commutative binop identity constant folds
This was originally intended with D48893, but as discussed there, we
have to make the folds safe from producing extra poison. This should
give the single binop folds the same capabilities as the existing
folds for 2-binops+shuffle.

LLVM binary opcode review: there are a total of 18 binops. There are 7 
commutative binops (add, mul, and, or, xor, fadd, fmul) which we already 
fold. We're able to fold 6 more opcodes with this patch (shl, lshr, ashr,
fdiv, udiv, sdiv). There are no folds for srem/urem/frem AFAIK. We don't 
bother with sub/fsub with constant operand 1 because those are 
canonicalized to add/fadd. 7 + 6 + 3 + 2 = 18.

llvm-svn: 336684
2018-07-10 15:12:31 +00:00
Krzysztof Parzyszek c87ecf2526 [Hexagon] Change .mir testcase to make sure function is not in SSA form
If a machine function satisfies SSA, the IsSSA property is assumed even
if the pass to be executed runs after existing from SSA. If the pass
output then does not conform to SSA, a verifier error will be flagged
(with expensive checks enabled).

llvm-svn: 336682
2018-07-10 14:49:54 +00:00
Paul Robinson c17c8bf749 Support -fdebug-prefix-map in llvm-mc. This is useful to omit the
debug compilation dir when compiling assembly files with -g.
Part of PR38050.

Patch by Siddhartha Bagaria!

Differential Revision: https://reviews.llvm.org/D48988

llvm-svn: 336680
2018-07-10 14:41:54 +00:00
Sanjay Patel 3333106a62 [InstCombine] drop poison flags when shuffle mask undef propagates to constant
llvm-svn: 336679
2018-07-10 14:27:55 +00:00
Sander de Smalen 53108d48f7 [AArch64][SVE] Asm: Support for predicated unary operations.
This patch adds support for the following instructions:
  CLS  (Count Leading Sign bits)
  CLZ  (Count Leading Zeros)
  CNT  (Count non-zero bits)
  CNOT (Logically invert boolean condition in vector)
  NOT  (Bitwise invert vector)
  FABS (Floating-point absolute value)
  FNEG (Floating-point negate)

All operations are predicated and unary, e.g.
  clz  z0.s, p0/m, z1.s

- CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32
  and 64 bit elements.

- FABS and FNEG have variants for 16, 32 and 64 bit elements.

llvm-svn: 336677
2018-07-10 14:05:55 +00:00
Matt Arsenault a680199a96 Reapply "AMDGPU: Force inlining if LDS global address is used"
This reverts commit r336623

llvm-svn: 336675
2018-07-10 14:03:41 +00:00
Sanjay Patel 06ea4206ad [InstCombine] allow more shuffle-binop folds with safe constants
The case with 2 variables is more complicated than the case where
we eliminate the shuffle entirely because a shuffle with an undef 
mask element creates an undef result. 

I'm not aware of any current analysis/transform that recognizes that 
undef propagating to a div/rem/shift, but we have to guard against 
the possibility.

llvm-svn: 336668
2018-07-10 13:33:26 +00:00
Anastasis Grammenos 612bf7cac5 [DebugInfo][LoopVectorize] Preserve DL in induction PHI and Add
Differential Revision: https://reviews.llvm.org/D48968

llvm-svn: 336667
2018-07-10 13:29:50 +00:00
Krzysztof Parzyszek c052451a02 [Hexagon] Add implicit uses even when untied explicit uses are present
An explicit untied use is not sufficient to maintain liveness of a
register redefined in a predicated instruction. For example
  %1 = COPY %0
  ...
  %1 = A2_paddif %2, %1, 1
could become
  $r1 = COPY $r0
  ...
  $r1 = A2_paddif $p0, $r1, 1
and later
  $r1 = COPY $r0                ;; this is not really dead!
  ...
  $r1 = A2_paddif $p0, $r0, 1

llvm-svn: 336662
2018-07-10 12:57:49 +00:00
Karl-Johan Karlsson 1ffeb5d7f0 [LowerSwitch] Fixed faulty PHI nodes
Summary:
Fixed two cases of where PHI nodes need to be updated by lowerswitch.

When lowerswitch find out that the switch default branch is not
reachable it remove the old default and replace it with the most
popular block from the cases, but it forget to update the PHI
nodes in the default block.

The PHI nodes also need to be updated when the switch is replaced
with a single branch.

Reviewers: hans, reames, arsenm

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D47203

llvm-svn: 336659
2018-07-10 12:06:16 +00:00
Chandler Carruth 47dc3a346e [PM/Unswitch] Fix a collection of closely related issues with trivial
switch unswitching.

The core problem was that the way we handled unswitching trivial exit
edges through the default successor of a switch. For some reason
I thought the right way to do this was to add a block containing
unreachable and point the default successor at this block. In
retrospect, this has an amazing number of problems.

The first issue is the one that this pass has always worked around -- we
have to *detect* such edges and avoid unswitching them again. This
seemed pretty easy really. You juts look for an edge to a block
containing unreachable. However, this pattern is woefully unsound. So
many things can break it. The amazing thing is that I found a test case
where *simple-loop-unswitch itself* breaks this! When we do
a *non-trivial* unswitch of a switch we will end up splitting this exit
edge. The result will be a default successor that is an exit and
terminates in ... a perfectly normal branch. So the first test case that
I started trying to fix is added to the nontrivial test cases. This is
a ridiculous example that did just amazing things previously. With just
unswitch, it would create 10+ copies of this stuff stamped out. But if
you combine it *just right* with a bunch of other passes (like
simplify-cfg, loop rotate, and some LICM) you can get it to do this
infinitely. Or at least, I never got it to finish. =[

This, in turn, uncovered another related issue. When we are manipulating
these switches after doing a trivial unswitch we never correctly updated
PHI nodes to reflect our edits. As soon as I started changing how these
edges were managed, it became obvious there were more issues that
I couldn't realistically leave unaddressed, so I wrote more test cases
around PHI updates here and ensured all of that works now.

And this, in turn, required some adjustment to how we collect and manage
the exit successor when it is the default successor. That showed a clear
bug where we failed to include it in our search for the outer-most loop
reached by an unswitched exit edge. This was actually already tested and
the test case didn't work. I (wrongly) thought that was due to SCEV
failing to analyze the switch. In fact, it was just a simple bug in the
code that skipped the default successor. While changing this, I handled
it correctly and have updated the test to reflect that we now get
precise SCEV analysis of trip counts for the outer loop in one of these
cases.

llvm-svn: 336646
2018-07-10 08:36:05 +00:00
Mikhail Dvoretckii 89c919c20b [X86] Fast-isel tests for lowered truncation intrinsics
This patch adds fast-isel tests for the IR patterns produced for truncation
intrinsics in rC336643.

Differential Revision: https://reviews.llvm.org/D48822

llvm-svn: 336645
2018-07-10 08:26:54 +00:00
Simon Pilgrim d32ca2c0b7 [X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3)
Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies), avoid loads and constant pool usage. It does mean however that we increase register pressure. The code size will go up a little but by less than what we save on the constant pool data.

This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage.

Differential Revision: https://reviews.llvm.org/D48936

llvm-svn: 336642
2018-07-10 07:58:33 +00:00
Craig Topper 5fd020c082 [X86] Regenerate vector-shuffle-512-v8.ll so the script will merge the 32 and 64 bit checks together. NFC
llvm-svn: 336641
2018-07-10 07:17:41 +00:00
Craig Topper 866a377e91 [X86] Correct vfixupimm load patterns to look for an integer load, not a floating point load bitcasted to integer.
DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load.

llvm-svn: 336626
2018-07-10 00:49:49 +00:00
Craig Topper 59fd2f4c52 [X86] Add test cases that show failure to fold load into vfixupimm instructions due to bad isel pattern.
llvm-svn: 336625
2018-07-10 00:49:47 +00:00
Vlad Tsyrklevich 688e752207 Revert "AMDGPU: Force inlining if LDS global address is used"
This reverts commit r336587, it was causing test failures on the
sanitizer bots.

llvm-svn: 336623
2018-07-10 00:46:07 +00:00
Sanjay Patel 69faf464ed [InstCombine] allow more shuffle folds using safe constants
getSafeVectorConstantForBinop() was calling getBinOpIdentity() assuming
that the constant we wanted was operand 1 (RHS). That's wrong, but I
don't think we could expose a bug or even a suboptimal fold from that
because the callers have other guards for any binop that would have
been affected.

llvm-svn: 336617
2018-07-09 23:22:47 +00:00
Heejin Ahn fed7382ef6 [WebAssembly] Support for binary atomic RMW instructions
Summary:
This adds support for binary atomic read-modify-write instructions:
add, sub, and, or, xor, and xchg.

This does not yet support translations of some of LLVM IR atomicrmw
instructions (nand, max, min, umax, and umin) that do not have a direct
counterpart in wasm instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49088

llvm-svn: 336615
2018-07-09 22:30:51 +00:00
Manoj Gupta 77eeac3d9e llvm: Add support for "-fno-delete-null-pointer-checks"
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.

More details : https://lkml.org/lkml/2018/4/4/601

GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.

-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.

This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.

Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv

Reviewed By: efriedma, george.burgess.iv

Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47895

llvm-svn: 336613
2018-07-09 22:27:23 +00:00
George Burgess IV 3fbfa9c403 Make llvm.objectsize more conservative with null
In non-zero address spaces, we were reporting that an object at `null`
always occupies zero bytes. This is incorrect in many cases, so just
return `unknown` in those cases for now.

Differential Revision: https://reviews.llvm.org/D48860

llvm-svn: 336611
2018-07-09 22:21:16 +00:00
Simon Pilgrim 017c68c122 Fix line endings. NFCI.
llvm-svn: 336602
2018-07-09 20:52:07 +00:00
Stefan Pintilie 133acb22bb [Power9] Add __float128 builtins for Rounding Operations
Added __float128 support for a number of rounding operations:

trunc
rint
nearbyint
round
floor
ceil

Differential Revision: https://reviews.llvm.org/D48415

llvm-svn: 336601
2018-07-09 20:38:40 +00:00
Heejin Ahn d31bc9866b [WebAssembly] Improve readability of load/stores and tests. NFC.
Summary:
- Changed variable/function names to be more consistent
- Improved comments in test files
- Added more tests
- Fixed a few typos
- Misc. cosmetic changes

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49087

llvm-svn: 336598
2018-07-09 20:18:21 +00:00
Stefan Pintilie 58e3e0a827 [Power9] [LLVM] Add __float128 support for trunc to double round to odd
Add support for this builtin:
double builtin_truncf128_round_to_odd(float128)

Differential Revision: https://reviews.llvm.org/D48483

llvm-svn: 336595
2018-07-09 20:09:22 +00:00
Mark Searles 7139dea6d9 RenameIndependentSubregs: Fix handling of undef tied operands
Ensure that, if updating a tied operand pair, to only update
that pair.

Differential Revision: https://reviews.llvm.org/D49052

llvm-svn: 336593
2018-07-09 20:07:03 +00:00
Daniel Sanders 9481399c0f [globalisel][irtranslator] Add support for atomicrmw and (strong) cmpxchg
Summary:
This patch adds support for the atomicrmw instructions and the strong
cmpxchg instruction to the IRTranslator.

I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what
difference it makes to the backend. As far as I can tell from the code, it
only matters to AtomicExpandPass which is run at the LLVM-IR level.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D40092

llvm-svn: 336589
2018-07-09 19:33:40 +00:00
Matt Arsenault 40cb6cab56 AMDGPU: Force inlining if LDS global address is used
These won't work for the forseeable future. These aren't allowed
from OpenCL, but IPO optimizations can make them appear.

Also directly set the attributes on functions, regardless
of the linkage rather than cloning functions like before.

llvm-svn: 336587
2018-07-09 19:22:22 +00:00
Roman Lebedev 5ccae1750b [X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts.
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.

As discussed later, that was worse at least for the code size,
and potentially for the performance, too.

https://rise4fun.com/Alive/Zmpl

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: spatel

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D48768

llvm-svn: 336585
2018-07-09 19:06:42 +00:00
Stefan Pintilie 83a5fe146e [Power9] Add __float128 builtins for Round To Odd
GCC has builtins for these round to odd instructions:

__float128 __builtin_sqrtf128_round_to_odd (__float128)
__float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128)
__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)

Differential Revision: https://reviews.llvm.org/D47550

llvm-svn: 336578
2018-07-09 18:50:06 +00:00
Steven Wu a1a8e66a11 Add bitcode compatibility test for 6.0
Summary:
Add bitcode compatibility test for 6.0. On top of the normal disassemble
test, also runs the verifier to make sure simple 6.0 bitcode can pass
the current IR verifier.

Reviewers: vsk

Reviewed By: vsk

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49086

llvm-svn: 336574
2018-07-09 17:57:48 +00:00
Sanjay Patel 651438c2f6 [InstCombine] correct test comments; NFC
llvm-svn: 336570
2018-07-09 17:48:08 +00:00
Craig Topper 47170b3153 [X86] In combineFMA, make sure we bitcast the result of isFNEG back the expected type before creating the new FMA node.
Previously, we were creating malformed SDNodes, but nothing noticed because the type constraints prevented isel from noticing.

llvm-svn: 336566
2018-07-09 17:43:24 +00:00
Simon Pilgrim d070659280 [X86][AVX] Regenerate AVX1 fast-isel tests.
Let the update script merge 32/64 tests where possible

llvm-svn: 336565
2018-07-09 17:38:00 +00:00
Sanjay Patel 7cd32419ab [InstCombine] avoid extra poison when moving shift above shuffle
As discussed in D49047 / D48987, shift-by-undef produces poison,
so we can't use undef vector elements in that case..

Note that we need to extend this for poison-generating flags,
and there's a proposal to create poison from FMF in D47963,

llvm-svn: 336562
2018-07-09 17:20:20 +00:00
Jonas Devlieghere 82dee6aca8 [dsymutil] Add support for outputting assembly
When implementing the DWARF accelerator tables in dsymutil I ran into an
assertion in the assembler. Debugging these kind of issues is a lot
easier when looking at the assembly instead of debugging the assembler
itself. Since it's only a matter of creating an AsmStreamer instead of a
MCObjectStreamer it made sense to turn this into a (hidden) dsymutil
feature.

Differential revision: https://reviews.llvm.org/D49079

llvm-svn: 336561
2018-07-09 16:58:48 +00:00
Steven Wu e1f7c5f8c7 [BitcodeReader] Infer the correct runtime preemption for GlobalValue
Summary:
To allow bitcode built by old compiler to pass the current verifer,
BitcodeReader needs to auto infer the correct runtime preemption from
linkage and visibility for GlobalValues.

Since llvm-6.0 bitcode already contains the new field but can be
incorrect in some cases, the attribute needs to be recomputed all the
time in BitcodeReader. This will make all the GVs has dso_local marked
correctly if read from bitcode, and it should still allow the verifier
to catch mistakes in optimization passes.

This should fix PR38009.

Reviewers: sfertile, vsk

Reviewed By: vsk

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49039

llvm-svn: 336560
2018-07-09 16:52:05 +00:00
Sander de Smalen d3efb59f29 [AArch64][SVE] Asm: Support for CNT(B|H|W|D) and CNTP instructions.
This patch adds support for the following instructions:

  CNTB CNTH - Determine the number of active elements implied by
  CNTW CNTD   the named predicate constant, multiplied by an
              immediate, e.g.

                cnth x0, vl8, 

  CNTP      - Count active predicate elements, e.g.
                cntp  x0, p0, p1.b

              counts the number of active elements in p1, predicated
              by p0, and stores the result in x0.

llvm-svn: 336552
2018-07-09 15:22:08 +00:00
Stefan Pintilie 3d76326d24 [Power9] Add __float128 support for compare operations
Added handling for the select f128.

Differential Revision: https://reviews.llvm.org/D48294

llvm-svn: 336548
2018-07-09 13:36:14 +00:00
Sander de Smalen 813b21e33a [AArch64][SVE] Asm: Support for remaining shift instructions.
This patch completes support for shifts, which include:
- LSL   - Logical Shift Left
- LSLR  - Logical Shift Left, Reversed form
- LSR   - Logical Shift Right
- LSRR  - Logical Shift Right, Reversed form
- ASR   - Arithmetic Shift Right
- ASRR  - Arithmetic Shift Right, Reversed form
- ASRD  - Arithmetic Shift Right for Divide

In the following variants:

- Predicated shift by immediate - ASR, LSL, LSR, ASRD
  e.g.
    asr z0.h, p0/m, z0.h, 

  (active lanes of z0 shifted by )

- Unpredicated shift by immediate - ASR, LSL*, LSR*
  e.g.
    asr z0.h, z1.h, 

  (all lanes of z1 shifted by , stored in z0)

- Predicated shift by vector - ASR, LSL*, LSR*
  e.g.
    asr z0.h, p0/m, z0.h, z1.h

  (active lanes of z0 shifted by z1, stored in z0)

- Predicated shift by vector, reversed form - ASRR, LSLR, LSRR
  e.g.
    lslr z0.h, p0/m, z0.h, z1.h

  (active lanes of z1 shifted by z0, stored in z0)

- Predicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, p0/m, z0.h, z1.d

  (active lanes of z0 shifted by wide elements of vector z1)

- Unpredicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, z1.h, z2.d

  (all lanes of z1 shifted by wide elements of z2, stored in z0)

*Variants added in previous patches.

llvm-svn: 336547
2018-07-09 13:23:41 +00:00
Sanjay Patel 5bd36644c8 [InstCombine] fix shuffle-of-binops transform to avoid poison/undef
As noted in D48987, there are many different ways for this transform to go wrong. 
In particular, the poison potential for shifts means we have to more careful with those ops. 
I added tests to make that behavior visible for all of the different cases that I could find.

This is a partial fix. To make this review easier, I did not make changes for the single binop 
pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations 
noted with TODO comments. I'll follow-up once we're confident that things are correct here.

The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely.

Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows 
for some improvements to div/rem patterns, so there are wins along with the missed opportunities 
and fixes.

Differential Revision: https://reviews.llvm.org/D49047

llvm-svn: 336546
2018-07-09 13:21:46 +00:00
Stefan Maksimovic 0a23998fe7 [mips] Addition of the [d]rem and [d]remu instructions
Related to http://reviews.llvm.org/D15772
Depends on http://reviews.llvm.org/D16889
Adds [D]REM[U] instructions.

Patch By: Srdjan Obucina
Contributions from: Simon Dardis

Differential Revision: https://reviews.llvm.org/D17036

llvm-svn: 336545
2018-07-09 13:06:44 +00:00
Sander de Smalen 54077dcfcb [AArch64][SVE] Asm: Support for TBL instruction.
Support for SVE's TBL instruction for programmable table
lookup/permute using vector of element indices, e.g.

  tbl  z0.d, { z1.d }, z2.d

stores elements from z1, indexed by elements from z2, into z0.

llvm-svn: 336544
2018-07-09 12:32:56 +00:00
Andrea Di Biagio 8834779644 [llvm-mca] report an error if the assembly sequence contains an unsupported instruction.
This is a short-term fix for PR38093.
For now, we llvm::report_fatal_error if the instruction builder finds an
unsupported instruction in the instruction stream.

We need to revisit this fix once we start addressing PR38101.
Essentially, we need a better framework for error handling.

llvm-svn: 336543
2018-07-09 12:30:55 +00:00
Chandler Carruth ed2965438e [PM/Unswitch] Fix a nasty bug in the new PM's unswitch introduced in
r335553 with the non-trivial unswitching of switches.

The code correctly updated most aspects of the CFG and analyses, but
missed some crucial aspects:
1) When multiple cases have the same successor, we unswitch that
   a single time and replace the switch with a direct branch. The CFG
   here is correct, but the target of this direct branch may have had
   a PHI node with multiple entries in it.
2) When we still have to clone a successor of the switch into an
   unswitched copy of the loop, we'll delete potentially multiple edges
   entering this successor, not just one.
3) We also have to delete multiple edges entering the successors in the
   original loop when they have to be retained.
4) When the "retained successor" *also* occurs as a case successor, we
   just assert failed everywhere. This doesn't happen very easily
   because its always valid to simply drop the case -- the retained
   successor for switches is always the default successor. However, it
   is likely possible through some contrivance of different loop passes,
   unrolling, and simplifying for this to occur in practice and
   certainly there is nothing "invalid" about the IR so this pass needs
   to handle it.
5) In the case of , we also will replace these multiple edges with
   a direct branch much like in  and need to collapse the entries in
   any PHI nodes to a single enrty.

All of this stems from the delightful fact that the same successor can
show up in multiple parts of the switch terminator, and each of these
are considered a distinct edge for the purpose of PHI nodes (and
iterating the successors and predecessors) but not for unswitching
itself, the dominator tree, or many other things. For the record,
I intensely dislike this "feature" of the IR in large part because of
the complexity it causes in passes like this. We already have a ton of
logic building sets and handling duplicates, and we just had to add
a bunch more.

I've added a complex test case that covers all five of the above failure
modes. I've also added a variation on it where  and  occur in loop
exit, adding fun where we have an LCSSA PHI node with "multiple entries"
despite have dedicated exits. There were no additional issues found by
this, but it seems a useful corner case to cover with testing.

One thing that working on all of this code has made painfully clear for
me as well is how amazingly inefficient our PHI node representation is
(in terms of the in-memory data structures and the APIs used to update
them). This code has truly marvelous complexity bounds because every
time we remove an entry from a PHI node we do a linear scan to find it
and then a linear update to the data structure to remove it. We could in
theory batch all of the PHI node updates into a single linear walk of
the operands making this much more efficient, but the APIs fight hard
against this and the fact that we have to handle duplicates in the
peculiar manner we do (removing all but one in some cases) makes even
implementing that very tedious and annoying. Anyways, none of this is
new here or specific to loop unswitching. All code in LLVM that updates
PHI node operands suffers from these problems.

llvm-svn: 336536
2018-07-09 10:30:48 +00:00
Sander de Smalen c69944c6b0 [AArch64][SVE] Asm: Support for ADR instruction.
Supporting various addressing modes:
- adr z0.s, [z0.s, z0.s]
- adr z0.s, [z0.s, z0.s, lsl #<shift>]
- adr z0.d, [z0.d, z0.d]
- adr z0.d, [z0.d, z0.d, lsl #<shift>]
- adr z0.d, [z0.d, z0.d, uxtw #<shift>]
- adr z0.d, [z0.d, z0.d, sxtw #<shift>]

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D48870

llvm-svn: 336533
2018-07-09 09:58:24 +00:00
Sander de Smalen bd513b42a1 [AArch64][SVE] Asm: Support for UZP and TRN instructions.
This patch adds support for:
  UZP1  Concatenate even elements from two vectors
  UZP2  Concatenate  odd elements from two vectors
  TRN1  Interleave  even elements from two vectors
  TRN2  Interleave   odd elements from two vectors

With variants for both data and predicate vectors, e.g.
  uzp1    z0.b, z1.b, z2.b
  trn2    p0.s, p1.s, p2.s

llvm-svn: 336531
2018-07-09 09:12:17 +00:00
Chijun Sima 9e1e0c7b2a [PGOMemOPSize] Preserve the DominatorTree
Summary:
PGOMemOPSize only modifies CFG in a couple of places; thus we can preserve the DominatorTree with little effort.
When optimizing SQLite with -O3, this patch can decrease 3.8% of the numbers of nodes traversed by DFS and 5.7% of the times DominatorTreeBase::recalculation is called.

Reviewers: kuhar, davide, dmgreen

Reviewed By: dmgreen

Subscribers: mzolotukhin, vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D48914

llvm-svn: 336522
2018-07-09 08:07:21 +00:00
Craig Topper 9e17073c21 [X86] Enhance combineFMA to look for FNEG behind an EXTRACT_VECTOR_ELT.
llvm-svn: 336514
2018-07-08 18:04:00 +00:00
Simon Pilgrim 2eced71ecf [X86][SSE] Combine v16i8 SHL by constants to multiplies
Pre-AVX512 (which can perform a quick extend/shift/truncate), extending to 2 v8i16 for the PMULLW and then truncating is more performant than relying on the generic PBLENDVB vXi8 shift path and uses a similar amount of mask constant pool data.

Differential Revision: https://reviews.llvm.org/D48963

llvm-svn: 336513
2018-07-08 12:47:50 +00:00
Roman Lebedev 0e58dee284 [MCA][X86][NFC] Add BSF/BSR resource tests
Reviewers: RKSimon, andreadb, courbet

Reviewed By: RKSimon

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D48997

llvm-svn: 336510
2018-07-08 09:50:14 +00:00