Summary:
These instructions were added in MIPS-I, and MIPS-II but were removed in
MIPS-III. Interestingly, GAS continues to accept them when assembling for
MIPS-III.
For the moment, these instructions will follow GAS and accept them for
MIPS-III and newer but this will be tightened up when the invalid-*.s
tests are added.
Depends on D3647
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3648
llvm-svn: 208311
Summary:
A small number of instructions are rejected with the wrong error message.
These have been placed in a separate test for now. There seems to be some
parsing quirk that triggers when these instructions are disabled.
Depends on D3571
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3647
llvm-svn: 208305
Also removed an inaccurate comment that stated that a DenseMap was used as
storage for the ListInit*'s. It's currently using a FoldingSet.
I expect there's a better way to fix this but I haven't found it yet. FoldingSet
is incompatible with the Pool template and I'm not sure if FoldingSet can be
safely replaced with a DenseMap of computed FoldingSetID's to ListInit*'s.
llvm-svn: 208293
The old method used by X86TTI to determine partial-unrolling thresholds was
messy (because it worked by testing target features), and also would not
correctly identify the target CPU if certain target features were disabled.
After some discussions on IRC with Chandler et al., it was decided that the
processor scheduling models were the right containers for this information
(because it is often tied to special uop dispatch-buffer sizes).
This does represent a small functionality change:
- For generic x86-64 (which uses the SB model and, thus, will get some
unrolling).
- For AMD cores (because they still currently use the SB scheduling model)
- For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump
the default threshold to 50; we're working on a test case for this).
Otherwise, nothing has changed for any other targets. The logic, however, has
been moved into BasicTTI, so other targets may now also opt-in to this
functionality simply by setting LoopMicroOpBufferSize in their processor
model definitions.
llvm-svn: 208289
This adds FK_SecRel_2 relocation support to ARM. This enables the building of
object files for armv7-windows-msvc which enables CodeView line tables for
debugging as opposed to armv7-windows-itanium which currently uses DWARF.
llvm-svn: 208273
Summary:
Vectors built with zeros and elements in the same order as another
(source) vector are optimized to be built using a single insertps
instruction.
Also optimize when we move one element in a vector to a different place
in that vector while zeroing out some of the other elements.
Further optimizations are possible, described in TODO comments.
I will be implementing at least some of them in the near future.
Added some tests for different cases where this optimization triggers.
Reviewers: nadav, delena, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3521
llvm-svn: 208271
The change to ExtractGV.cpp has no functionality change except to avoid
the asserts. Existing testcases already cover this, so I didn't add a
new one.
llvm-svn: 208264
Visibilities of `hidden` and `protected` are meaningless for symbols
with local linkage.
- Change the assembler to reject non-default visibility on symbols
with local linkage.
- Change the bitcode reader to auto-upgrade `hidden` and `protected`
to `default` when the linkage is local.
- Update LangRef.
<rdar://problem/16141113>
llvm-svn: 208263
`ModuleLinker::getLinkageResult()` shouldn't create symbols with local
linkage and non-default visibility -- in fact, symbols with local
linkage shouldn't be merged at all. Assert to that effect.
llvm-svn: 208262
Since visibility is meaningless for symbols with local linkage, check
local linkage before visibility when setting symbol attributes.
When linkage is `internal` and the visibility is `hidden`, the exposed
attribute is now `LTO_SYMBOL_SCOPE_INTERNAL` instead of
`LTO_SYMBOL_SCOPE_HIDDEN`. Although the bitfield allows *both* to be
specified, the combination is nonsense anyway.
Given changes (in progress) to drop visibility when a symbol has local
linkage, this almost has no functionality change: it's mostly a cleanup
to clarify the logic.
The exception is when something has `appending` linkage. Before this
change, such symbols would be advertised as `LTO_SYMBOL_SCOPE_INTERNAL`;
now, they'll be given `LTO_SYMBOL_SCOPE_COMMON`.
Unfortunately this is really awkward to test. This only changes what we
advertise to linkers (before running LTO), not what the final object
looks like. In theory I could add `DEBUG` output to `llvm-lto` (and
test with "REQUIRES: asserts"), but follow-up commits to disallow
`internal hidden` simplify this anyway.
<rdar://problem/16141113>
llvm-svn: 208261
relocation entries it applies.
Prior to this patch, RuntimeDyldImpl::resolveExternalSymbols discarded
relocations for external symbols once they had been applied. This causes issues
if the client calls MCJIT::finalizeLoadedModules more than once, and updates the
location of any symbols in between (e.g. by calling MCJIT::mapSectionAddress).
No test case yet: None of our in-tree memory managers support moving sections
around. I'll have to hack up a dummy memory manager before I can write a unit
test.
Fixes <rdar://problem/16764378>
llvm-svn: 208257
The loop stream detector (LSD) on modern Intel cores, which optimizes the
execution of small loops, has limits on the number of taken branches in
addition to uop-count limits (modern AMD cores have similar limits).
Unfortunately, at the IR level, estimating the number of branches that will be
taken is difficult. For one thing, it strongly depends on later passes (block
placement, etc.). The original implementation took a conservative approach and
limited the maximal BB DFS depth of the loop. However, fairly-extensive
benchmarking by several of us has revealed that this is the wrong approach. In
fact, there are zero known cases where the branch limit prevents a detrimental
unrolling (but plenty of cases where it does prevent beneficial unrolling).
While we could improve the current branch counting logic by incorporating
branch probabilities, this further complication seems unjustified without a
motivating regression. Instead, unless and until a regression appears, the
branch counting will be removed.
llvm-svn: 208255
Given a FMA family (e.g., 213, 231), not all the variants (i.e., register or
memory) are commutable.
E.g., for the 213 family (with the syntax src1, src2, src3):
fmaXXX213 A, B, reg3/mem3 == fmaXXX213 B, A, reg3/mem3
Now consider the 231 family:
fmaXXX231 A, B, reg3 == fmaXXX231 A, reg3, B
But
fmaXXX231 A, B, mem3 != fmaXXX231 A, mem3, B
Indeed, mem3 cannot be the second argument of the memory variant of fmaXXX231.
Working on a reduced test case!
<rdar://problem/16800495>
llvm-svn: 208252
When reducing the bitwidth of a comparison against a constant, the
original setcc's result type was used, which was incorrect.
No test since I don't think any other in tree targets change the
bitwidth of the setcc type depending on the bitwidth of the compared
type.
llvm-svn: 208236
To compute the dimensions of the array in a unique way, we split the
delinearization analysis in three steps:
- find parametric terms in all memory access functions
- compute the array dimensions from the set of terms
- compute the delinearized access functions for each dimension
The first step is executed on all the memory access functions such that we
gather all the patterns in which an array is accessed. The second step reduces
all this information in a unique description of the sizes of the array. The
third step is delinearizing each memory access function following the common
description of the shape of the array computed in step 2.
This rewrite of the delinearization pass also solves a problem we had with the
previous implementation: because the previous algorithm was by induction on the
structure of the SCEV, it would not correctly recognize the shape of the array
when the memory access was not following the nesting of the loops: for example,
see polly/test/ScopInfo/multidim_only_ivs_3d_reverse.ll
; void foo(long n, long m, long o, double A[n][m][o]) {
;
; for (long i = 0; i < n; i++)
; for (long j = 0; j < m; j++)
; for (long k = 0; k < o; k++)
; A[i][k][j] = 1.0;
Starting with this patch we no longer delinearize access functions that do not
contain parameters, for example in test/Analysis/DependenceAnalysis/GCD.ll
;; for (long int i = 0; i < 100; i++)
;; for (long int j = 0; j < 100; j++) {
;; A[2*i - 4*j] = i;
;; *B++ = A[6*i + 8*j];
these accesses will not be delinearized as the upper bound of the loops are
constants, and their access functions do not contain SCEVUnknown parameters.
llvm-svn: 208232
default architecture for reasonable modern x86 processors, actually be
modern. This processor model should essentially be "tuned" for modern
x86 chips as much as possible without undue penalties on any specific
architecture. Previously we weren't even using the nice scheduling
models. There are a few other tweaks needed here, but this change at
least I have benchmarked across a decent swatch of chips (intel's
clovertown, westmere, and sandybridge; amd's istanbul) and seen no
significant regressions.
If anyone has suggested ways to test this, just let me know. Somewhat
alarmingly, no existing tests failed.
llvm-svn: 208230
this patch disables the dead register elimination pass and the load/store pair
optimization pass at -O0. The ILP optimizations don't require the optimization
level to be checked because the call to addILPOpts is predicated with the
necessary check. The AdvSIMDScalar pass is disabled by default at all
optimization levels. This patch leaves that pass disabled by default.
Also, move command-line options into ARM64TargetMachine.cpp and add a few
additional flags to aid in debugging. This fixes an issue with the
-debug-pass=Structure flag where passes were printed, but not actually run
(i.e., AdvSIMDScalar pass).
llvm-svn: 208223
Summary:
These processors will only be available for the integrated assembler at
first (CodeGen will emit a fatal error saying they are not implemented).
The intention is to work through the existing instructions and correctly
annotate the ISA they were added in so that we have a sufficiently good
base to start MIPS64r6 development. MIPS64r6 removes/re-encodes certain
instructions and I believe it is best to define ISA's using set-union's
as far as possible rather than using set-subtraction.
Reviewers: vmedic
Subscribers: emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D3569
llvm-svn: 208221
This is a followup to r208171, where a call to make_unique was
disambiguated for MSVC. Disambiguate two more calls, and remove the
comment about it since this is what we do everywhere.
llvm-svn: 208219
When performing a scalar comparison that feeds into a vector select,
it's actually better to do the comparison on the vector side: the
scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP"
since the vector comparisons are all mask based.
llvm-svn: 208210
Summary:
One small functional change. The recently added PAUSE instruction now has
the HasStdEnc predicate which was accidentally removed by a Requires<>.
Depends on D3640
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3641
llvm-svn: 208209
Summary:
Move IsGP64bit into GPRPredicates, and IsFP64bit/NotFP64bit into FGRPredicates
No functional change (confirmed by diffing tablegen-erated files).
Depends on D3639
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3640
llvm-svn: 208201
The AAPCS states that values passed in registers must have a value as though
they had been loaded with "LDR". LDR is equivalent to "LD1.64 vX.1D" - that is,
loading scalars to vector registers and loading 1-element vectors is equivalent.
The logic implemented here is to ensure that at all call boundaries and during
formal argument lowering all vectors are treated as their bitwidth-based floating
point scalar counterpart, which is always one of f64 or f128 (v2i32 -> f64,
v4i32 -> f128 etc). A BITCAST is inserted so that the appropriate REV will be
generated during code generation.
llvm-svn: 208198
Summary:
This makes it easier to prove a more complicated change in the next commit
is non-functional.
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3639
llvm-svn: 208197
Because we've canonicalised on using LD1/ST1, every time we do a bitcast
between vector types we must do an equivalent lane reversal.
Consider a simple memory load followed by a bitconvert then a store.
v0 = load v2i32
v1 = BITCAST v2i32 v0 to v4i16
store v4i16 v2
In big endian mode every memory access has an implicit byte swap. LDR and
STR do a 64-bit byte swap, whereas LD1/ST1 do a byte swap per lane - that
is, they treat the vector as a sequence of elements to be byte-swapped.
The two pairs of instructions are fundamentally incompatible. We've decided
to use LD1/ST1 only to simplify compiler implementation.
LD1/ST1 perform the equivalent of a sequence of LDR/STR + REV. This makes
the original code sequence: v0 = load v2i32
v1 = REV v2i32 (implicit)
v2 = BITCAST v2i32 v1 to v4i16
v3 = REV v4i16 v2 (implicit)
store v4i16 v3
But this is now broken - the value stored is different to the value loaded
due to lane reordering. To fix this, on every BITCAST we must perform two
other REVs:
v0 = load v2i32
v1 = REV v2i32 (implicit)
v2 = REV v2i32
v3 = BITCAST v2i32 v2 to v4i16
v4 = REV v4i16
v5 = REV v4i16 v4 (implicit)
store v4i16 v5
This means an extra two instructions, but actually in most cases the two REV
instructions can be combined into one. For example:
(REV64_2s (REV64_4h X)) === (REV32_4h X)
There is also no 128-bit REV instruction. This must be synthesized with an
EXT instruction.
Most bitconverts require some sort of conversion. The only exceptions are:
a) Identity conversions - vNfX <-> vNiX
b) Single-lane-to-scalar - v1fX <-> fX or v1iX <-> iX
Even though there are hundreds of changed lines, I have a fairly high confidence
that they are somewhat correct. The changes to add two REV instructions per
bitcast were pretty mechanical, and once I'd done that I threw the resulting
.td at a script I wrote which combined the two REVs together (and added
an EXT instruction, for f128) based on an instruction description I gave it.
This was much less prone to error than doing it all manually, plus my brain
would not just have melted but would have vapourised.
llvm-svn: 208194
This completes the port of r204814 (cpirker "AArch64_BE function argument
passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues
found during later development along the way. The biggest of these was
that the alignment fixup logic wasn't replicated into all the places it
should have been.
llvm-svn: 208192
This is a third patch of patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
This patch description:
Being comparing functions we need to compare values we meet at left and
right sides.
Its easy to sort things out for external values. It just should be
the same value at left and right.
But for local values (those were introduced inside function body)
we have to ensure they were introduced at exactly the same place,
and plays the same role.
In short, patch introduces values serial numbering and comparison routine.
The last one compares two values by their serial numbers.
llvm-svn: 208189
Summary:
The overall idea is to chop the Predicates list into subsets that are
usually overridden independently. This allows subclasses to partially
override the predicates of their superclasses without having to re-add all
the existing predicates.
This patch starts the process by moving HasStdEnc into a new
EncodingPredicates list and almost everything else into
AdditionalPredicates.
It has revealed a couple likely bugs where 'let Predicates' has removed
the HasStdEnc predicate.
No functional change (confirmed by diffing tablegen-erated files).
Depends on D3549, D3506
Reviewers: vmedic
Differential Revision: http://reviews.llvm.org/D3550
llvm-svn: 208184
Summary:
It concatenates two or more lists. In addition to the !strconcat semantics
the lists must have the same element type.
My overall aim is to make it easy to append to Instruction.Predicates
rather than override it. This can be done by concatenating lists passed as
arguments, or by concatenating lists passed in additional fields.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D3506
llvm-svn: 208183
Summary:
This will make it easier to prove that a more complicated change in the
following commit is non-functional.
No functional change.
Depends on D3506
Reviewers: vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3549
llvm-svn: 208179
1) Fix for printing debug locations for absolute paths.
2) Location printing is moved into public method DebugLoc::print() to avoid re-inventing the wheel.
Differential Revision: http://reviews.llvm.org/D3513
llvm-svn: 208177
O(N*log(N)). The idea is to introduce total ordering among functions set.
It allows to build binary tree and perform function look-up procedure in O(log(N)) time.
This patch description:
Introduced total ordering among constants implemented in cmpConstants method.
Method performs lexicographical comparison between constants represented as
hypothetical numbers of next format:
<bitcastability-trait><raw-bit-contents>
Please, read cmpConstants declaration comments for more details.
llvm-svn: 208173
This field is used for a list of variables to ensure they are not lost
during optimization (they're only included when optimizations are
enabled).
llvm-svn: 208159
Mark up additional instructions which are part of the function prologue as
MachineFrameSetup. These instructions are part of the function prologue,
emitted by the PEI pass to setup the stack for use in the activating frame.
llvm-svn: 208153
The ARM::BLX instruction is an ARM mode instruction. The Windows on ARM target
is limited to Thumb instructions. Correctly use the thumb mode tBLXr
instruction. This would manifest as an errant write into the object file as the
instruction is 4-bytes in length rather than 2. The result would be a corrupted
object file that would eventually result in an executable that would crash at
runtime.
llvm-svn: 208152
If the source files referenced by a gcno file are missing, gcov
outputs a coverage file where every line is simply /*EOF*/. This also
occurs for lines in the coverage that are past the end of a file that
is found.
This change mimics gcov.
llvm-svn: 208149
In gcov, there's a -n/--no-output option, which disables the writing
of any .gcov files, so that it emits only the summary info on stdout.
This implements the same behaviour in llvm-cov.
llvm-svn: 208148
This is similar to the getAlignment patch, but is done just for
completeness. It looks like we never call getSection on an alias. All the
tests still pass if the if is replaced with an assert.
llvm-svn: 208139
Speculatively reverting due to a suspicious failure on a Windows
buildbot.
This reverts commit 10c37a012ea11596d44cd9059fe09c959caf30c8.
llvm-svn: 208131
remove it from the list of unspilled registers. Otherwise the following
attempt to keep the stack aligned by picking an extra GPR register to
spill will not work as it picks up r11.
llvm-svn: 208129
Summary:
When I initially introduced -pass-remarks, I thought it would be a
neat idea to make it additive. So, if one used it as:
$ llc -pass-remarks=inliner --pass-remarks=loop.*
the compiler would build the regular expression '(inliner)|(loop.*)'.
The more I think about it, the more I regret it. This is not how
other flags work. The standard semantics are right-to-left overrides.
This is how clang interprets -Rpass. And I think the two should be
compatible in this respect.
Reviewers: qcolombet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3614
llvm-svn: 208122
Before this patch, the backend always emitted a store+load sequence to
bitconvert from f64 to i64 the input operand of a ISD::BITCAST dag node that
performed a bitconvert from type MVT::f64 to type MVT::v2i32. The resulting
i64 node was then used to build a v2i32 vector.
With this patch, the backend now produces a cheaper SCALAR_TO_VECTOR from
MVT::f64 to MVT::v2f64. That SCALAR_TO_VECTOR is then followed by a "free"
bitcast to type MVT::v4i32. The elements of the resulting
v4i32 are then extracted to build a v2i32 vector (which is illegal and
therefore promoted to MVT::v2i64).
This is in general cheaper than emitting a stack store+load sequence
to bitconvert the operand from type f64 to type i64.
llvm-svn: 208107
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
llvm-svn: 208104
An alias has the address of what it points to, so it also has the same
alignment.
This allows a few optimizations to see past aliases for free.
llvm-svn: 208103
fall back to the normal path without a cpu. While doing this fix
llc to just exit when we don't have a module to process instead of
asserting.
llvm-svn: 208102
Also, provide the ability to create temporary and non-temporary
declarations, as not all declarations may be replaced by definitions
later on.
This provides the necessary infrastructure for Clang to fix PR19598,
leaking temporary MDNodes in Clang's debug info generation.
llvm-svn: 208054
The Win64 docs are very clear that anything larger than 8 bytes is
passed by reference, and GCC MinGW64 honors that for __modti3 and
friends.
Patch by Jameson Nash!
llvm-svn: 208029