7 Commits

Author SHA1 Message Date
Jordan Rupprecht 351b7e7b24 Revert Recommit [PowerPC] Update P9 vector costs for insert/extract element
This reverts r364557 (git commit 9f7f5858fe)

This crashes as reported on the commit thread. Repro instructions TBD.

llvm-svn: 364876
2019-07-01 23:29:46 +00:00
Roland Froese 9f7f5858fe Recommit [PowerPC] Update P9 vector costs for insert/extract element
Recommit patch D60160 after regression fix patch D63463.

llvm-svn: 364557
2019-06-27 16:20:24 +00:00
David L. Jones fccb505f0f Revert "[llvm] r359313 - [PowerPC] Update P9 vector costs for insert/extract element"
This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313.

llvm-svn: 359648
2019-05-01 05:01:03 +00:00
Roland Froese 4b17772b9e [PowerPC] Update P9 vector costs for insert/extract element
The PPC vector cost model values for insert/extract element reflect older
processors that lacked vector insert/extract and move-to/move-from VSR
instructions.  Update getVectorInstrCost to give appropriate values for when
the newer instructions are present.

Differential Revision: https://reviews.llvm.org/D60160

llvm-svn: 359313
2019-04-26 16:14:17 +00:00
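
The idea described in r359313 above, choosing insert/extract element costs based on which instructions the subtarget provides, can be illustrated with a minimal sketch. This is not the code from D60160: the type names, the feature flags, and every cost value below are hypothetical and chosen only to show the shape of such a cost function (C++14 or later).

```cpp
// Hypothetical sketch only. None of these names or numbers come from
// LLVM's PPCTTIImpl; they are assumptions made for illustration.
enum class VecOp { InsertElement, ExtractElement, Other };

// Assumed stand-in for the subtarget feature queries.
struct SubtargetFeatures {
  bool HasDirectMove; // move-to/move-from VSR instructions are available
  bool HasP9Vector;   // vector insert/extract instructions are available
};

constexpr int kBaseCost = 1;        // assumed cost of an ordinary vector op
constexpr int kStackRoundTrip = 10; // assumed penalty for going through memory

// Returns an illustrative cost for a single vector instruction.
constexpr int vectorInstrCost(VecOp Op, SubtargetFeatures ST) {
  if (Op != VecOp::InsertElement && Op != VecOp::ExtractElement)
    return kBaseCost;
  if (ST.HasP9Vector)
    return kBaseCost; // assumed: a direct insert/extract instruction exists
  if (ST.HasDirectMove)
    return 2 * kBaseCost; // assumed: a register move plus one more op
  // Older processors: the element travels through a stack slot.
  return kBaseCost + kStackRoundTrip;
}
```

With these assumed values, an extract costs 11 on a subtarget with neither feature and 1 when the newer vector insert/extract instructions are present, which is the kind of difference the commit message says the old model failed to reflect.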
Roland Froese a5dd08cac2 [PowerPC] Add some PPC vec cost tests to prep for D60160 NFC
llvm-svn: 358699
2019-04-18 18:12:09 +00:00
Hal Finkel de0b413ec0 [PowerPC] Adjust load/store costs in PPCTTI
This provides more realistic costs for the insert/extractelement instructions
(which are load/store pairs), accounts for the cheap unaligned Altivec load
sequence, and for unaligned VSX load/stores.

Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown

Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup

Unfortunately, estimating the costs of the stack-based scalarization sequences
is hard, and adjusting these costs is like a game of whac-a-mole :( I'll
revisit this again after we have better codegen for vector extloads and
truncstores and unaligned load/stores.

llvm-svn: 205658
2014-04-04 23:51:18 +00:00
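
As a rough illustration of r205658 above, the sketch below models an insert/extract element as a store/load pair and gives unaligned accesses separate costs for the Altivec and VSX cases. It is not the PPCTTI code: the function names and all numeric costs are assumptions invented for this example (C++14 or later).

```cpp
// Hypothetical sketch only. Names and cost values are illustrative
// assumptions, not the ones used by PPCTTI.
enum class MemOp { Load, Store };

constexpr int kAlignedCost = 1; // assumed cost of an aligned vector access

// Illustrative cost of one vector load or store.
constexpr int vectorMemOpCost(MemOp Op, bool Aligned, bool HasVSX) {
  if (Aligned)
    return kAlignedCost;
  if (HasVSX)
    return 2 * kAlignedCost; // assumed: unaligned VSX access is modestly slower
  if (Op == MemOp::Load)
    return 3 * kAlignedCost; // assumed: the unaligned Altivec load sequence is cheap
  return 8 * kAlignedCost;   // assumed: an unaligned Altivec store is expensive
}

// Insert/extract element modeled as a load/store pair through the stack.
constexpr int stackInsertExtractCost(bool HasVSX) {
  return vectorMemOpCost(MemOp::Store, /*Aligned=*/true, HasVSX) +
         vectorMemOpCost(MemOp::Load, /*Aligned=*/true, HasVSX);
}
```

Composing the insert/extract cost from the memory-op cost keeps the two in step when the load/store numbers are retuned. The commit message itself notes that estimating these stack-based sequences is hard, so the values here should be read as placeholders.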
Bill Schmidt 62fe7a5b17 Refine fix to bug 15041.
Thanks to help from Nadav and Hal, I have a more reasonable (and even
correct!) approach.  This specifically penalizes the insertelement
and extractelement operations for the performance hit that will occur
on PowerPC processors.

llvm-svn: 174725
2013-02-08 18:19:17 +00:00