O(N*log(N)). The idea is to introduce a total ordering over the set of
functions. That allows us to build a binary tree and perform function
look-up in O(log(N)) time.
What this patch does:
It introduces a total ordering among Type instances; effectively, this is
an improvement of the existing isEquivalentType.
0. Coerce pointers in address space 0 to integers.
1. If the left and right types are equal (the same Type* value), return 0
(meaning equal).
2. If the types are of different kinds (different type IDs), return the
result of comparing the type IDs as numbers.
3. If the types are vectors or integers, return the result of comparing
their pointers (cast to numbers).
4. Check whether the type ID belongs to the following group:
* Void
* Float
* Double
* X86_FP80
* FP128
* PPC_FP128
* Label
* Metadata
If so, return 0.
5. If the left and right types are pointers, return the result of comparing
their address spaces (as numbers).
6. If the types are complex, both LEFT and RIGHT are expanded and their
element types are compared in the same way. If we get Res != 0 at some
stage, return it; otherwise return 0.
7. For all other cases, use llvm_unreachable. A condensed sketch of the
whole routine follows.
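A condensed C++ sketch of this comparison routine, against the LLVM
3.4-era Type API (helper names and the exact case structure are
illustrative rather than lifted from the patch, and rule 0's
pointer-to-integer coercion is omitted for brevity):

  #include "llvm/IR/DerivedTypes.h"
  #include "llvm/IR/Type.h"
  #include "llvm/Support/ErrorHandling.h"
  #include <cstdint>
  using namespace llvm;

  static int cmpNumbers(uint64_t L, uint64_t R) {
    if (L < R) return -1;
    return L == R ? 0 : 1;
  }

  static int cmpTypes(Type *TyL, Type *TyR) {
    if (TyL == TyR)                                   // 1. same Type* value
      return 0;
    if (TyL->getTypeID() != TyR->getTypeID())         // 2. different kinds
      return cmpNumbers(TyL->getTypeID(), TyR->getTypeID());
    switch (TyL->getTypeID()) {
    case Type::IntegerTyID:
    case Type::VectorTyID:                            // 3. compare pointers
      return cmpNumbers((uintptr_t)TyL, (uintptr_t)TyR);
    case Type::VoidTyID: case Type::FloatTyID: case Type::DoubleTyID:
    case Type::X86_FP80TyID: case Type::FP128TyID: case Type::PPC_FP128TyID:
    case Type::LabelTyID: case Type::MetadataTyID:
      return 0;                                       // 4. singleton types
    case Type::PointerTyID:                           // 5. address spaces
      return cmpNumbers(cast<PointerType>(TyL)->getAddressSpace(),
                        cast<PointerType>(TyR)->getAddressSpace());
    case Type::StructTyID: {                          // 6. expand elementwise
      StructType *SL = cast<StructType>(TyL), *SR = cast<StructType>(TyR);
      if (int Res = cmpNumbers(SL->getNumElements(), SR->getNumElements()))
        return Res;
      for (unsigned I = 0, E = SL->getNumElements(); I != E; ++I)
        if (int Res = cmpTypes(SL->getElementType(I), SR->getElementType(I)))
          return Res;
      return 0;                 // arrays and functions are handled similarly
    }
    default:
      llvm_unreachable("Unknown type");               // 7. everything else
    }
  }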
llvm-svn: 203788
order to use the single assignment. That's probably worth doing for
a lot of these types anyway, as they may have non-trivial moves, and so
getting copy elision in more places seems worthwhile.
I've tried to add some tests that actually catch this mistake, and one
of the types is now well tested but the others' tests still fail to
catch this. I'll keep working on tests, but this gets the core pattern
right.
llvm-svn: 203780
convenient it is to imagine a world where this works, that is not C++ as
was pointed out in review. The standard even goes to some lengths to
preclude any attempt at this, for better or worse. Maybe better. =]
llvm-svn: 203775
Only one instruction pair needed changing: SMULH & UMULH. The previous
code worked, but MC was doing extra work treating Ra as a valid
operand (which then got completely overwritten in MCCodeEmitter).
No behaviour change, so no tests.
llvm-svn: 203772
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.
The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the first 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).
Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that. The assembler and disassembler
are fully implemented and tested. However:
- CodeGen support causes miscompiles; test-suite runtime failures:
MultiSource/Benchmarks/FreeBench/distray/distray
MultiSource/Benchmarks/McCat/08-main/main
MultiSource/Benchmarks/Olden/voronoi/voronoi
MultiSource/Benchmarks/mafft/pairlocalalign
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
SingleSource/Benchmarks/CoyoteBench/almabench
SingleSource/Benchmarks/Misc/matmul_f64_4x4
- The lowering currently falls back to using Altivec instructions far more
than it should. Worse, there are some things that are scalarized through the
stack that shouldn't be.
- A lot of unnecessary copies make it past the optimizers, and this needs to
be fixed.
- Many more regression tests are needed.
Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work on this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.
llvm-svn: 203768
There are currently two schemes for mapping instruction operands to
instruction-format variables for generating the instruction encoders and
decoders for the assembler and disassembler respectively: a) to map by name and
b) to map by position.
In the long run, we'd like to remove the position-based scheme and use only
name-based mapping. Unfortunately, the name-based scheme currently cannot deal
with complex operands (those with suboperands), and so we currently must use
the position-based scheme for those. On the other hand, the position-based
scheme cannot deal with (register) variables that are split into multiple
ranges. An upcoming commit to the PowerPC backend (adding VSX support) will
require this capability. While we could teach the position-based scheme to
handle that, since we'd like to move away from the position-based mapping
generally, it seems silly to teach it new tricks now. What makes more sense is
to allow for partial transitioning: use the name-based mapping when possible,
and only use the position-based scheme when necessary.
Now the problem is that mixing the two sensibly was not possible: the
position-based mapping would map based on position, but would not skip those
variables that were mapped by name. Instead, the two sets of assignments would
overlap. However, I can't simply change the current behavior, because there
are some backends that rely on it [I think mistakenly, but I'll send a message
to llvmdev about that]. So I've added a new TableGen bit variable:
noNamedPositionallyEncodedOperands, that can be used to cause the
position-based mapping to skip variables mapped by name.
llvm-svn: 203767
Support was added to the IAS to actually parse and handle the complex SO
expressions. However, the object file lowering was not updated to compensate
for the fact that the shift operand may be an absolute expression.
When assembling to an object file, the lowering would fail, even though
emitting plain assembly succeeded. Add an appropriate test.
The test case is inspired by one provided by Jiangning Liu, who also
brought the issue to light.
llvm-svn: 203762
I personally build with these settings enabled all the time, and it
is clearer to see the actual warning flags (e.g., -Wuninitialized)
get passed by Xcode rather than seeing -Wno-uninitialized followed
by -Wall (the latter canceling out the former) and figuring out
what is going on.
Xcode will ignore build settings it doesn't understand, so this will still
work on older versions of Xcode that don't support all of these settings.
llvm-svn: 203760
for use with C++11 range-based for-loops.
The gist of phase 1 is to remove the skipInstruction() and skipBundle()
methods from these iterators, instead splitting each iterator into a version
that walks operands, a version that walks instructions, and a version that
walks bundles. This has the result of making some "clever" loops in lib/CodeGen
more verbose, but also makes their iterator invalidation characteristics much
more obvious to the casual reader. (Making them concise again in the future is a
good motivating case for a pre-incrementing range adapter!)
Phase 2 of this undertaking will consist of removing the getOperand() method,
and changing operator*() of the operand-walker to return a MachineOperand&. At
that point, it should be possible to add range views for them that work as one
might expect.
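As a rough sketch, the kind of loop this should eventually permit looks
something like the following, where mi_bundle_operands is a hypothetical
range adapter, not something added by this commit:

  // Hypothetical: once operator*() returns MachineOperand&, range views
  // like this become possible.
  for (MachineOperand &MO : mi_bundle_operands(MI))
    if (MO.isReg() && MO.readsReg())
      MO.setIsKill(false);  // e.g. clear kill flags across the bundle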
llvm-svn: 203757
"ProcResource def is not included in the ProcResources".
Some of the machine model definitions were not added to the
processor's list used for diagnostics and error checking.
llvm-svn: 203749
As an example that was not actually being used, it suffered from slow bitrot.
The two main issues with it were that it had no cmake support and
included a copy of the autoconf directory. The reality is that
autoconf is not easily composable. The lack of composability is why we
have clang options in llvm's configure. Suggesting that users include
a copy of autoconf/ in their projects seems a bad idea.
We are also in the process of switching to cmake, so pushing autoconf
to new projects is probably not what we want.
llvm-svn: 203728
This makes the mapping consistent with other CU->X mappings in the
MCContext, helping pave the way to refactor all these values into a
single data structure per CU and thus a single map.
I haven't renamed the data structure as that would make the patch churn
even higher (the MCLineSection name no longer makes sense, as this
structure now contains lines for multiple sections covered by a single
CU, rather than lines for a single section in multiple CUs) and further
refactorings will follow that may remove this type entirely.
For convenience, I also gave the MCLineSection value semantics so we
didn't have to do the lazy construction, manual delete, etc.
(& for those playing at home, refactoring the line printing into a
single data structure will eventually allow that data structure to be
reused to own the debug_line.dwo line table used for type unit file name
resolution)
llvm-svn: 203726
Chandler voiced some concern with checking this in without some
discussion first. Reverting for now.
This reverts r203703, r203704, r203708, and r203709.
llvm-svn: 203723
Extend what's currently done for shifts, because the HW performs this masking
implicitly:
(rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y)
I use the newly factored out multiclass that was only supporting shifts so
far.
For testing I extended my testcase for the new rotation idiom.
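For reference, a C-level rotation idiom of the kind that produces this
pattern (a sketch, not the actual testcase):

  unsigned rotl32(unsigned x, unsigned y) {
    unsigned n = y & 31;  // the mask the combine now proves redundant
    return (x << n) | (x >> ((32 - n) & 31));
  }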
<rdar://problem/15295856>
llvm-svn: 203718
On ELF and COFF an alias is just another name for a position in the file.
There is no way to refer to a position in another file, so an alias to
undefined is meaningless.
MachO currently doesn't support aliases. The spec has a N_INDR, which when
implemented will have a different set of restrictions. Adding support for
it shouldn't be harder than any other IR extension.
For now, having the IR represent what is actually possible with current
tools makes it easier to fix the design of GlobalAlias.
llvm-svn: 203705
This replaces the llvm-profdata tool with a version that uses the
recently introduced Profile library. The new tool has the ability to
generate and summarize profdata files, as well as merge them.
llvm-svn: 203704
This provides a library to work with the instrumentation based
profiling format that is used by clang's -fprofile-instr-* options and
by the llvm-profdata tool. This is a binary format, rather than the
textual one that's currently in use.
The tests are in the subsequent commits that use this.
llvm-svn: 203703
This allows us to generate table lookups for code such as:
  unsigned test(unsigned x) {
    switch (x) {
    case 100: return 0;
    case 101: return 1;
    case 103: return 2;
    case 105: return 3;
    case 107: return 4;
    case 109: return 5;
    case 110: return 6;
    default: return f(x);
    }
  }
Since cases 102, 104, etc. are absent from the switch, the lookup table has holes
in those positions. We therefore guard the table lookup with a bitmask check,
sketched below.
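Conceptually, the guarded lookup behaves like the following hand-written
illustration (not actual compiler output):

  unsigned f(unsigned x);  // the default handler from the example above

  static const unsigned table[11] = {0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 6};

  unsigned test(unsigned x) {
    unsigned idx = x - 100;
    // 0x6AB has one bit set per real case (100, 101, 103, 105, 107, 109,
    // 110); the holes at 102, 104, 106 and 108 have zero bits.
    if (idx <= 10 && ((1u << idx) & 0x6ABu))
      return table[idx];
    return f(x);
  }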
Patch by Jasper Neumann!
llvm-svn: 203694
The peephole (shift x, (and y, 31)) -> (shift x, y) is repeated for each
integer type and each shift variant.
To improve this a new multiclass is added that covers all integer types. The
shift patterns are now instantiated from this. I am planning to add new
instances for rotates as well.
No functional change intended:
* test/CodeGen/X86/shift-and.ll provides coverage
* Compared the expanded tablegen output and matched up the defs for these
Pat<>s before and after
llvm-svn: 203685
There's a bit of duplicated "magic" code in opt.cpp and Clang's CodeGen that
computes the inliner threshold from opt level and size opt level.
This patch moves the code to a function that lives alongside the inliner itself,
providing a convenient overload to the inliner creation.
A separate patch can be committed to Clang to use this once it's committed to
LLVM. Standalone tools that use the inlining pass can also avoid duplicating
this code and fearing it will go out of sync.
Note: this patch also restructures the conditional logic of the computation to
be cleaner.
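A sketch of the kind of helper being described (the name and exact
thresholds are assumptions based on the historical defaults, not quoted
from this patch):

  // Map the optimization level and size optimization level to an inlining
  // threshold, as opt.cpp and clang's CodeGen used to do separately.
  static int computeInlineThreshold(unsigned OptLevel, unsigned SizeOptLevel) {
    if (SizeOptLevel == 1) return 75;  // -Os
    if (SizeOptLevel == 2) return 25;  // -Oz
    if (OptLevel > 2)      return 275; // -O3
    return 225;                        // default
  }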
llvm-svn: 203669
Summary:
This is a white lie to work around a widespread bug in the -mfp64
implementation.
The problem is that none of the 32-bit fpu ops mention the fact that they
clobber the upper 32 bits of the 64-bit FPR. This allows MTHC1 to be
scheduled on the wrong side of most 32-bit FPU ops, particularly MTC1.
Fixing that requires a major overhaul of the FPU implementation which can't
be done right now due to time constraints.
The testcase is SingleSource/Benchmarks/Misc/oourafft.c when given
TARGET_CFLAGS='-mips32r2 -mfp64 -mmsa'.
Also correct the comment added in r203464 to indicate that two
instructions were affected.
Reviewers: matheusalmeida, jacksprat
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3029
llvm-svn: 203659
Summary:
Correct the match patterns and the lowerings that made the CodeGen tests pass despite the mistakes.
The original testcase that discovered the problem was SingleSource/UnitTests/SignlessType/factor.c in test-suite.
During review, we also found that some of the existing CodeGen tests were incorrect and fixed them:
* bitwise.ll: In bsel_v16i8 the IfSet/IfClear were reversed because bsel and bmnz have different operand orders and the test didn't correctly account for this. bmnz goes 'IfClear, IfSet, CondMask', while bsel goes 'CondMask, IfClear, IfSet'.
* vec.ll: In the cases where a bsel is emitted as a bmnz (they are the same operation with a different input tied to the result) the operands were in the wrong order.
* compare.ll and compare_float.ll: The bsel operand order was correct for a greater-than comparison, but a greater-than comparison instruction doesn't exist. Lowering this operation inverts the condition so the IfSet/IfClear need to be swapped to match.
The differences between BSEL, BMNZ, and BMZ and how they map to/from vselect are rather confusing. I've therefore added a note to MSA.txt to explain this in a single place in addition to the comments that explain each case.
Reviewers: matheusalmeida, jacksprat
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3028
llvm-svn: 203657
When the list of VFP registers to be saved was non-contiguous (so multiple
vpush/vpop instructions were needed) these were being ordered oddly, as in:
vpush {d8, d9}
vpush {d11}
This led to the layout in memory being [d11, d8, d9] which is ugly and doesn't
match the CFI_INSTRUCTIONs we're generating either (so Dwarf info would be
broken).
This switches the order of vpush/vpop (in both prologue and epilogue,
obviously) so that the Dwarf locations are correct again.
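With this change, the prologue for the example above should instead emit:
vpush {d11}
vpush {d8, d9}
leaving [d8, d9, d11] in memory, consistent with the CFI_INSTRUCTIONs.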
rdar://problem/16264856
llvm-svn: 203655
It seems gas can't handle CFI directives with VFP register names ("d12", etc.).
This broke us trying to build Chromium for Android after r201423.
A gas bug has been filed: https://sourceware.org/bugzilla/show_bug.cgi?id=16694
compnerd suggested making this conditional on whether we're using the integrated
assembler or not. I'll look into that in a follow-up patch.
Differential Revision: http://llvm-reviews.chandlerc.com/D3049
llvm-svn: 203635
I could fold the callers into their one call site, but the indirection
(given how verbose choosing the section is) seemed helpful.
The use of a member function pointer's a bit "tricky", but seems limited
enough, the call sites are simple/clean/clear, and there's only one use.
llvm-svn: 203619
the first run of the polly buildbot failed, and then it started passing.
This is due to the fact that the buildbot re-builds in an existing directory,
and the first run does not have WITH_POLLY set when it enters tools/.
Thus, cmake ignores the tools/polly dir in the first run, and then because
it reuses the CMakeCache.txt of the previous run, it has the WITH_POLLY set
by the previous run, and so it passes the second time.
llvm-svn: 203615
Add a utility function to convert the Windows path separator to Unix style path
separators. This is used by a subsequent change in clang to enable the use of
Windows SDK headers on Linux.
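A minimal sketch of such a utility (the name and in-place signature are
assumptions for illustration, not the actual interface):

  #include <algorithm>
  #include <string>

  // Rewrite Windows '\' separators to Unix-style '/' in place.
  inline void convertPathToUnix(std::string &Path) {
    std::replace(Path.begin(), Path.end(), '\\', '/');
  }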
llvm-svn: 203611
* Add masking instructions before indirect calls (in MC layer).
* Align call + branch delay to the bundle end (in MC layer).
Differential Revision: http://llvm-reviews.chandlerc.com/D3032
llvm-svn: 203606
The function hasReliableSymbolDifference had exactly one use in the MachO
writer. It is also only true for X86_64. In fact, the comment refers to
"Darwin x86_64" and everything else, so this makes the code match the
comment.
If this is to be abstracted again, it should be a property of
TargetObjectWriter, like useAggressiveSymbolFolding.
llvm-svn: 203605
Before this patch the unix code for creating hard links was unused. The code
for creating symbolic links was implemented in lib/Support/LockFileManager.cpp
and the code for creating hard links in lib/Support/*/Path.inc.
The only use we have for these is in LockFileManager.cpp and it can use both
soft and hard links. Just have a create_link function that creates one or the
other depending on the platform.
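A sketch of the resulting interface, with the signature inferred from the
description rather than quoted from the patch:

  #include "llvm/ADT/Twine.h"
  #include "llvm/Support/system_error.h"  // era-appropriate error_code

  namespace llvm { namespace sys { namespace fs {
  // Create a link from "from" to "to": hard or symbolic, whichever the
  // platform supports.
  error_code create_link(const Twine &to, const Twine &from);
  } } }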
llvm-svn: 203596
GuardMalloc can print info to stderr, causing these tests to fail.
Since FileCheck errors on empty inputs, just add a bit of dummy
data to make it happy.
llvm-svn: 203595
This fixes the bug where we would bitcast the 64-bit floating point result
of cmpneqsd to a 64-bit integer even on 32-bit targets.
Differential Revision: http://llvm-reviews.chandlerc.com/D3009
llvm-svn: 203581
When resolving a function call to an external routine, the dynamic
loader must patch the "nop" after the branch instruction to a load
that restores the TOC register.
Current code does that, but only with the *first* instance of a call
to any particular external routine, i.e. at the point where it also
allocates the call stub. With subsequent calls to the same routine,
current code neglects to patch in the TOC restore code. This is a
bug, and leads to corrupt TOC pointers in those cases.
Fixed by patching in restore code every time.
llvm-svn: 203580
Use the options in ARMISelLowering to control whether tail calls are
optimised or not. Previously, this option was entirely ignored on the ARM
target and only honoured on x86.
This option is mostly useful in profiling scenarios. The default remains that
tail call optimisations will be applied.
llvm-svn: 203577
This option is from 2010, designed to work around a linker issue on Darwin for
ARM. According to grosbach this is no longer an issue and this option can
safely be removed.
llvm-svn: 203576
Tail call optimisation was previously disabled on all targets other than
iOS5.0+. This enables the tail call optimisation on all Thumb 2 capable
platforms.
The test adjustments remove the IR hint "tail" from function invocations.
The tests were designed assuming that tail call optimisations would not kick
in, which no longer holds true.
llvm-svn: 203575
After r203553, overflow intrinsics and their non-intrinsic (normal)
counterparts get hashed to the same value. This patch prevents PRE from
moving an instruction into a predecessor block, and trying to add a phi
node that gets two different types (the intrinsic result and the
non-intrinsic result), resulting in a failing assert.
llvm-svn: 203574
ATOMIC_STORE operations always get here as a lowered ATOMIC_SWAP, so there's no
need for any code to handle them specially.
There should be no functionality change so no tests.
llvm-svn: 203567
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
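For comparison, this mirrors the two-ordering form of C++11
compare-exchange, where LLVM's "monotonic" corresponds to
std::memory_order_relaxed:

  #include <atomic>

  bool try_update(std::atomic<int> &addr, int &expected) {
    // Success ordering acquire, failure ordering relaxed ("monotonic"),
    // paralleling the cmpxchg example above.
    return addr.compare_exchange_strong(expected, 3,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed);
  }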
rdar://problem/15996804
llvm-svn: 203559
When an overflow intrinsic is followed by a non-overflow instruction,
replace the latter with an extract. For example:
%sadd = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
%sadd3 = add i32 %a, %b
Here the add statement will be replaced by an extract:
%sadd3 = extractvalue { i32, i1 } %sadd, 0
When an overflow intrinsic follows a non-overflow instruction, a clone
of the intrinsic is inserted before the normal instruction, which makes
it the same as the previous case. Subsequent runs of GVN can then clean
up the duplicate instructions and insert the extract.
This fixes PR8817.
llvm-svn: 203553
The official specifications state the name to be ARMNT (as per the Microsoft
Portable Executable and Common Object Format Specification v8.3).
llvm-svn: 203530