llvm-project/llvm/test
Simon Pilgrim cb780b32a3 [X86][SSE] Optimize the truncation of vector comparison results with PACKSS
We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that every element of the result is either all-ones or all-zeros, so it can be truncated efficiently by using PACKSS directly to repeatedly halve the size of each element.
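
As a rough illustration of why PACKSS alone is enough for comparison results, here is a small Python model (not LLVM code). The helper `packsswb` is a hypothetical name that mimics the 128-bit PACKSSWB semantics on raw bytes, and the mask values are made up for the example.

```python
import struct

def packsswb(a: bytes, b: bytes) -> bytes:
    """Model 128-bit PACKSSWB: 8 + 8 signed words -> 16 signed-saturated bytes."""
    words = struct.unpack("<8h", a) + struct.unpack("<8h", b)
    return struct.pack("<16b", *(max(-128, min(127, w)) for w in words))

# Hypothetical v8i32 comparison result, split across two 128-bit registers.
mask = [-1, 0, 0, -1, -1, -1, 0, 0]
lo = struct.pack("<4i", *mask[:4])
hi = struct.pack("<4i", *mask[4:])

# First PACKSS: each i32 lane is reinterpreted as a pair of identical i16 values.
v8i16 = packsswb(lo, hi)
assert struct.unpack("<8h", v8i16) == tuple(mask)

# Second PACKSS: the low 8 bytes now hold the v8i8 result.
v8i8 = packsswb(v8i16, v8i16)[:8]
assert struct.unpack("<8b", v8i8) == tuple(mask)
```

Signed saturation maps -1 to -1 and 0 to 0 at every width, which is why the same word-to-byte pack can be reused regardless of the source element size in this model.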

Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally, for AVX2 PACKSS of 256-bit data we must perform a PERMQ shuffle to put the data back into the correct order. I did investigate performing a single shuffle after all the PACKSS calls, but the need to cross 128-bit lanes makes this difficult to achieve efficiently.
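
The lane issue can be sketched with a small Python model (again, not LLVM code). It assumes the usual AVX2 behaviour that a 256-bit PACKSSWB packs each 128-bit lane independently, and that the reordering is a 64-bit element permute with indices [0,2,1,3]; the helper names and the mask pattern are hypothetical.

```python
import struct

def packsswb128(a: bytes, b: bytes) -> bytes:
    """Model 128-bit PACKSSWB: 8 + 8 signed words -> 16 signed-saturated bytes."""
    words = struct.unpack("<8h", a) + struct.unpack("<8h", b)
    return struct.pack("<16b", *(max(-128, min(127, w)) for w in words))

def vpacksswb256(a: bytes, b: bytes) -> bytes:
    """Model 256-bit PACKSSWB: each 128-bit lane is packed on its own."""
    return packsswb128(a[:16], b[:16]) + packsswb128(a[16:], b[16:])

def permq(v: bytes, idx) -> bytes:
    """Model a 64-bit element permute of a 256-bit vector."""
    q = [v[i * 8:(i + 1) * 8] for i in range(4)]
    return b"".join(q[i] for i in idx)

# Hypothetical v32i16 comparison result held in two 256-bit registers.
mask = [-1 if i % 3 == 0 else 0 for i in range(32)]
a = struct.pack("<16h", *mask[:16])
b = struct.pack("<16h", *mask[16:])

packed = vpacksswb256(a, b)
# Lane-wise packing interleaves the halves: a.lo, b.lo, a.hi, b.hi.
assert struct.unpack("<32b", packed) != tuple(mask)

# Swapping the two middle quadwords restores element order.
fixed = permq(packed, [0, 2, 1, 3])
assert struct.unpack("<32b", fixed) == tuple(mask)
```

In this model every 256-bit pack leaves its output interleaved by lane, which gives a feel for why deferring the fix-up to a single shuffle after several packs is not straightforward.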

We avoid performing this on AVX512 as it should have better alternative truncation instructions.

Differential Revision: https://reviews.llvm.org/D22814

llvm-svn: 277132
2016-07-29 10:23:10 +00:00
..
Analysis [BPI] Add new LazyBPI analysis 2016-07-28 23:31:12 +00:00
Assembler Invariant start/end intrinsics overloaded for address space 2016-07-22 17:49:40 +00:00
Bindings Add writeonly IR attribute 2016-07-04 08:01:29 +00:00
Bitcode Add writeonly IR attribute 2016-07-04 08:01:29 +00:00
BugPoint
CodeGen [X86][SSE] Optimize the truncation of vector comparison results with PACKSS 2016-07-29 10:23:10 +00:00
DebugInfo [CodeView] Don't crash on functions without subprograms 2016-07-28 05:03:22 +00:00
Examples
ExecutionEngine X86: handle external tail calls in Windows JIT 2016-07-14 17:27:06 +00:00
Feature Add flag to PassManagerBuilder to disable GVN Hoist Pass. 2016-07-22 22:02:19 +00:00
FileCheck Make check lines not match themselves. 2016-06-16 19:38:48 +00:00
Instrumentation Unpoison stack before resume instruction 2016-07-22 22:04:38 +00:00
Integer
JitListener
LTO Add a libLTO API to query a memory buffer and check if it contains ObjC categories 2016-07-11 23:10:18 +00:00
LibDriver
Linker Don't verify inputs to the Linker if ODR merging. 2016-06-29 18:31:48 +00:00
MC [MC][X86] Fix Intel Operand assembly parsing for .set ids 2016-07-27 17:39:41 +00:00
Object Add checks to the MachOObjectFile() constructor to make sure load commands sizes 2016-07-07 22:11:42 +00:00
ObjectYAML [YAML] Fix YAML tags appearing before the start of sequence elements 2016-06-28 21:10:26 +00:00
Other Temporarily remove one test run line to unblock PPC bots. 2016-07-08 00:32:58 +00:00
SymbolRewriter [PM] Port SymbolRewriter to the new PM 2016-07-25 20:52:00 +00:00
TableGen tests: accept different TargetOpcode values. 2016-07-07 17:51:42 +00:00
ThinLTO/X86 ThinLTO: Do not take into account whether a definition has multiple copies when promoting. 2016-07-07 18:31:51 +00:00
Transforms [EarlyCSE] Correctly handle simplified, but live, instructions 2016-07-29 05:39:21 +00:00
Unit
Verifier [IR] Introduce a non-integral pointer type 2016-07-28 23:43:38 +00:00
YAMLParser
tools Capture stderr when checking for gold version 2016-07-29 00:39:56 +00:00
.clang-format
CMakeLists.txt
TestRunner.sh
lit.cfg [lit] Don't match tool names within new PM's <> markers 2016-07-25 23:09:10 +00:00
lit.site.cfg.in