Commit Graph

22 Commits

Author SHA1 Message Date
Craig Topper eaa1cf5b57 [X86] Fix typo in comment. NFC
llvm-svn: 345236
2018-10-25 05:00:20 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
Craig Topper 2153114227 [X86] Fix typo in comment. NFC
llvm-svn: 318156
2017-11-14 16:14:00 +00:00
Michael Zuckerman 72a6f893cb Fixing bug issue https://bugs.llvm.org/show_bug.cgi?id=34978
Change-Id: I7f13d5bcb181be2860377df7b40e1579a8ad4add
llvm-svn: 316067
2017-10-18 08:04:31 +00:00
Eugene Zelenko 60433b682f [X86] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 314953
2017-10-05 00:33:50 +00:00
Michael Zuckerman e4084f6bdb [X86][LLVM]Expanding Supports lowerInterleaved{store|load}() in X86InterleavedAccess (VF64 stride 3-4)
I continue to support different VF interleaved and in this pass for this patch,
I added the vf64 stride3 support for both load and store.
I also added support fot the stride4 store.

Reviewers:
1. zvi
2. dorit
3. igorb
4. guyblank

Differential Revision: https://reviews.llvm.org/D37687

Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b
llvm-svn: 314651
2017-10-02 07:35:25 +00:00
Michael Zuckerman b92b6d424f Code refactoring for the interleaved code <NFC>
Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1
llvm-svn: 314596
2017-09-30 14:55:03 +00:00
Michael Zuckerman 0b5db55b96 Small modification <NFC>
Change-Id: I360abccee12cae29bd2ac4f8399c9ecc92eb7f13
llvm-svn: 314510
2017-09-29 12:45:54 +00:00
Michael Zuckerman 645f777e40 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF{8|16|32} stride 3)
This patch expands the support of lowerInterleavedStore to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) .
This patch is part two of two patches and it covers the store (interlevaed) side.

The patch goal is to optimize the following sequence:
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7

into
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

Reviewers:
zvi
guyblank
dorit
Ayal

Differential Revision: https://reviews.llvm.org/D37117

Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
llvm-svn: 314234
2017-09-26 18:49:11 +00:00
Michael Zuckerman 4a97df01c4 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF8 stride 4):
This patch expands the support of lowerInterleavedStore to 8x8i stride 4.

LLVM creates suboptimal shuffle code-gen for AVX2.
In overall, this patch is a specific fix for the pattern (Strid=4 VF=8) and we plan to include more patterns in the future.

The patch goal is to optimize the following sequence:
At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3 holding
each 8 chars:

c0, c1, , c7
m0, m1, , m7
y0, y1, , y7
k0, k1, ., k7

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....

Reviewers
DavidKreitzer
Farhana
zvi
igorb
guyblank
RKSimon
Ayal

Differential Revision: https://reviews.llvm.org/D36058

Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
llvm-svn: 314109
2017-09-25 14:50:38 +00:00
Michael Zuckerman 80d3649f23 Refactoring the stride 4 code in the X86interleavedaccess NFC
llvm-svn: 313166
2017-09-13 18:28:09 +00:00
Michael Zuckerman 5a385940d3 [X86][LLVM]Expanding Supports lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3).
This patch expands the support of lowerInterleavedload to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) and we plan to include the store (deinterleved side).

The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

into

a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7

Reviewers
1. zvi
2. igor
3. guyblank
4. dorit
5. Ayal

llvm-svn: 312722
2017-09-07 14:02:13 +00:00
Michael Zuckerman 680ac10aa7 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4).
This patch expands the support of lowerInterleavedStore to 16x8i stride 4.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=16) and we plan to include more patterns in the future.

The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 16 chars:

c0, c1, , c16
m0, m1, , m16
y0, y1, , y16
k0, k1, ., k16

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....

Differential Revision: https://reviews.llvm.org/D35829

llvm-svn: 310252
2017-08-07 13:22:39 +00:00
Michael Zuckerman c1918ad571 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
This patch expands the support of lowerInterleavedStore to 32x8i stride 4.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=32) and we plan to include more patterns in the future. To reach our goal of "more patterns". We include two mask creators. The first function creates shuffle's mask equivalent to unpacklo/unpackhi instructions. The other creator creates mask equivalent to a concat of two half vectors(high/low).

The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 32 chars:

c0, c1, , c31
m0, m1, , m31
y0, y1, , y31
k0, k1, ., k31

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....

Reviewers:
dorit
Farhana
RKSimon
guyblank
DavidKreitzer

Differential Revision: https://reviews.llvm.org/D34601

llvm-svn: 309086
2017-07-26 08:10:14 +00:00
Farhana Aleen e4a89a6462 X86InterleaveAccess: A fix for bug33826
Reviewers: DavidKreitzer

Differential Revision: https://reviews.llvm.org/D35638

llvm-svn: 308784
2017-07-21 21:35:00 +00:00
Farhana Aleen 9bd593e0d7 Fixed a (product) build error that was due to an unused variable
Details: There was a use but it was in the assert which was not
         exercised during product build.

Reviewers: Andrew Kaylor

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32658

llvm-svn: 306073
2017-06-22 23:56:31 +00:00
Farhana Aleen 4b652a5335 Supported lowerInterleavedStore() in X86InterleavedAccess.
Reviewers: RKSimon, DavidKreitzer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32658

llvm-svn: 306068
2017-06-22 22:59:04 +00:00
Benjamin Kramer efcf06f5f2 Move symbols from the global namespace into (anonymous) namespaces. NFC.
llvm-svn: 294837
2017-02-11 11:06:55 +00:00
Benjamin Kramer 215b22e612 Fix unused variable warning in Release builds. NFC.
llvm-svn: 288416
2016-12-01 20:49:34 +00:00
David L Kreitzer 0e3ae305b6 Refactored X86InterleavedAccess into a class. NFCI.
Patch by Farhana Aleen

Differential Revision: https://reviews.llvm.org/D25986

llvm-svn: 288410
2016-12-01 19:56:39 +00:00
David L Kreitzer 01a057a0c4 Add a pass to optimize patterns of vectorized interleaved memory accesses for
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.

Patch by Farhana Aleen

Differential revision: http://reviews.llvm.org/D24681

llvm-svn: 284260
2016-10-14 18:20:41 +00:00