Commit r206510 falsely advertised to fix the load cases, even though it only
fixed the store case. This commit adds the same fix for the load case including
the missing test coverage.
llvm-svn: 206577
This commit was attributed to a different person from the person who
posted the patch to the list, and the person who posted it the list
claimed when they did that they were not the author, but that the author
was yet a third person. I don't know what is going on here, but
reverting until the attribution is clear and the author has explicitly
contributed the patch.
Also, the review hasn't really involved any of the MC maintainers and
that seems questionable too.
llvm-svn: 206576
Code mostly copied from AArch64, just tidied up a trifle and plumbed
into the ARM64 way of doing things.
This also enables the AArch64 tests which inspired the previous
untested commits.
llvm-svn: 206574
A vector extract followed by a dup can become a single instruction even if the
types don't match. AArch64 handled this in ISelLowering, but a few reasonably
simple patterns can take care of it in TableGen, so that's where I've put it.
llvm-svn: 206573
ARM64 was scalarizing some vector comparisons which don't quite map to
AArch64's compare and mask instructions. AArch64's approach of sacrificing a
little efficiency to emulate them with the limited set available was better, so
I ported it across.
More "inspired by" than copy/paste since the backend's internal expectations
were a bit different, but the tests were invaluable.
llvm-svn: 206570
I enhanced it a little in the process. The decision shouldn't really be beased
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.
Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.
llvm-svn: 206569
It's not actually used to handle C or C++ ABI rules on ARM64, but could well be
emitted by other language front-ends, so it's as well to have a sensible
implementation.
llvm-svn: 206568
Previously module verification was always enabled, with no way to turn it off.
As of this commit, module verification is on by default in Debug builds, and off
by default in release builds. The default behaviour can be overridden by calling
setVerifyModules(bool) on the JIT instance (this works for both the old JIT, and
MCJIT).
<rdar://problem/16150008>
llvm-svn: 206561
Use scalar BFE with constant shift and offset when possible.
This is complicated by the fact that the scalar version packs
the two operands of the vector version into one.
llvm-svn: 206558
Rewrite the shared implementation of BlockFrequencyInfo and
MachineBlockFrequencyInfo entirely.
The old implementation had a fundamental flaw: precision losses from
nested loops (or very wide branches) compounded past loop exits (and
convergence points).
The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue). The old analysis gives non-sensical results:
Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
---- Block Freqs ----
entry = 1.0
for.cond1.preheader = 1.00103
for.cond4.preheader = 5.5222
for.body6 = 18095.19995
for.inc8 = 4.52264
for.inc11 = 0.00109
for.end13 = 0.0
The new analysis gives correct results:
Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
block-frequency-info: nested_loops
- entry: float = 1.0, int = 8
- for.cond1.preheader: float = 4001.0, int = 32007
- for.cond4.preheader: float = 16008001.0, int = 128064007
- for.body6: float = 64048012001.0, int = 512384096007
- for.inc8: float = 16008001.0, int = 128064007
- for.inc11: float = 4001.0, int = 32007
- for.end13: float = 1.0, int = 8
Most importantly, the frequency leaving each loop matches the frequency
entering it.
The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss. I have
unit tests for these types out of tree, but it was decided in the review
to make the classes private to BlockFrequencyInfoImpl, and try to shrink
them (or remove them entirely) in follow-up commits.
The new algorithm should generally have a complexity advantage over the
old. The previous algorithm was quadratic in the worst case. The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear without it.
The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution. Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes. Mass is then distributed through the function, which is
now a DAG. Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.
There are some remaining flaws.
- Irreducible control flow isn't modelled correctly. LoopInfo and
MachineLoopInfo ignore irreducible edges, so this algorithm will
fail to scale accordingly. There's a note in the class
documentation about how to get closer. See also the comments in
test/Analysis/BlockFrequencyInfo/irreducible.ll.
- Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
the 64-bit integer precision used downstream.
- The "bias" calculation proposed on llvmdev is *not* incorporated
here. This will be added in a follow-up commit, once comments from
this review have been handled.
llvm-svn: 206548
The frontend option -fno-optimize-sibling-calls resolves to -cc1's
-mdisable-tail-calls, which is passed to the TargetMachine in the
backend. PassManagerBuilder was adding the -tailcallelim pass anyway.
Use a new DisableTailCalls option in PassManagerBuilder to disable tail
calls harder.
Requires the matching commit in LLVM that adds DisableTailCalls.
<rdar://problem/16050591>
llvm-svn: 206543
Even tough we may want to generate a vector load, the address from which to load
still is a scalar. Make sure even if previous address computations may have been
vectorized, that the addresses are also available as scalars.
This fixes http://llvm.org/PR19469
Reported-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 206510
In preparation for using a binary format for instrumentation based
profiling, explicitly treat the test inputs as text and transform them
before running. This will allow us to leave the checked in files in
human readable format once the instrumentation format is binary.
No functional change.
llvm-svn: 206509
Summary:
This prevents the discriminator generation pass from triggering if
the DWARF version being used in the module is prior to 4.
Reviewers: echristo, dblaikie
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3413
llvm-svn: 206507