This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.
llvm-svn: 303073
Doing this means that if an LEApcrel is used in two places we will rematerialize
instead of generating two MOVs. This is particularly useful for printfs using
the same format string, where we want to generate an address into a register
that's going to get corrupted by the call.
Differential Revision: https://reviews.llvm.org/D32858
llvm-svn: 303054
Doing this lets us hoist it out of loops, and I've also marked it as
rematerializable the same as the thumb1 and thumb2 counterparts.
It looks like it being marked as such was just a mistake, as the commit that
made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the
LEApcrelJT instructions were marked as having side-effects, so it looks like
the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was
accidentally marked as such also.
Differential Revision: https://reviews.llvm.org/D32857
llvm-svn: 303053
Currently, when masked load, store, gather or scatter intrinsics are used, we check in CodeGenPrepare pass if the subtarget support these intrinsics, if not we replace them with scalar code - this is a functional transformation not an optimization (not optional).
CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0).
Functional transformation should run with all optimization levels, so here I created a new pass which runs on all optimization levels and does no more than this transformation.
Differential Revision: https://reviews.llvm.org/D32487
llvm-svn: 303050
We were previously silently emitting bogus data in release mode,
making it very hard to diagnose the error, or crashing with an
assert in debug mode. A proper diagnostic is now always emitted
when the value to be emitted is out of range.
llvm-svn: 303041
I noticed the 512-bit lzcnts don't use the X86 specific lookup table code and instead use the EXPAND case in LegalizeDAG. I was toying around with fixing this and noticed it would require compare instructions that generate i1 masks and then converting from mask to vector. Then I noticed that we don't test which compares are used with avx512vl and no avx512cd.
llvm-svn: 303020
Remove an unneeded prefix from the 32-bit command line. Make all the 64-bit triples match. Replace ALL with X64 and remove it from the 32-bit test.
llvm-svn: 303019
Running `llvm-readobj -coff-directives msvcrt.lib` resulted in this error:
Invalid data was encountered while parsing the file
This happened because some of the object files in the archive have empty
`.drectve` sections. These empty sections result in a `parse_failed` error being
returned from `COFFObjectFile::getSectionContents()`, which in turn caused
`llvm-readobj` to stop. With this change, `getSectionContents` now returns
success, and like before the resulting array is empty.
Patch by Dave Lee.
Differential Revision: https://reviews.llvm.org/D32652
llvm-svn: 303014
Summary:
If the Worklist build causes an IR change this change flag currently factors into the flag for running another iteration of the iteration loop. But only changes during processing should trigger another loop.
This patch captures the worklist creation change flag into the outside the loop flag currently used for DbgDeclares and only sends that flag up to the caller. Rerunning the loop only depends on IC.run() now.
This uses the debug output of InstCombine to determine if one or two iterations run. I couldn't think of a better way to detect it since the second spurious iteration shoudn't make any visible changes. Just wasted computation.
I can do a pre-commit of the test case with the CHECK-NOT as a CHECK if this is an ok way to check this.
This is a subset of D31678 as I'm still not sure how to verify the analysis behavior for that.
Reviewers: davide, majnemer, spatel, chandlerc
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32453
llvm-svn: 302982
Tests with target intrinsics are inherently target specific, so it
doesn't actually make sense to run them if we've excluded their
target.
llvm-svn: 302979
I bet the change is correct but this test seems to expose some underlying
problem that manifest only on some buildbots, and I'm not able to reproduce
locally. Unfortunately I can't debug right now but I don't want to annoy
people with spurious failures, so I'll XFAIL until I can take a look (over
the weekend).
llvm-svn: 302976
Update a few tests to use llvm.masked.load/store instead of arm neon
vector loads and stores, and move the tests that are actually specific
to those arm intrinsics to their own files. This lets us mark the
tests that use target specific intrinsics as requiring those targets.
llvm-svn: 302972
Implemented frequency based cost/saving analysis
and related options.
The pass is now in a state ready to be turne on
in the pipeline (in follow up).
Differential Revision: http://reviews.llvm.org/D32783
llvm-svn: 302967