With this patch MCDisassembler::getInstruction takes an ArrayRef<uint8_t>
instead of a MemoryObject.
Even on X86 there is a maximum size an instruction can have. Given
that, it seems way simpler and more efficient to just pass an ArrayRef
to the disassembler instead of a MemoryObject and have it do a virtual
call every time it wants some extra bytes.
llvm-svn: 221751
For historical reasons archives on mach-o have two possible names for the
file containing the table of contents for the archive: "__.SYMDEF SORTED"
and "__.SYMDEF". But the libObject archive reader only supported the former.
This patch fixes llvm::object::Archive to support both names.
llvm-svn: 221747
Currently, we have a type parameter mechanism for intrinsics. Rather than having to specify a separate intrinsic for each combination of argument and return types, we can specify a single intrinsic with one or more type parameters. These type parameters are passed explicitly to Intrinsic::getDeclaration or can be specified implicitly in the naming of the intrinsic function in an LL file.
Today, the types are limited to integer, floating point, and pointer types. With a goal of supporting symbolic targets for patchpoints and statepoints, this change adds support for function types. The change also includes support for first class aggregate types (named structures and arrays) since these appear in function types we've encountered.
Reviewed by: atrick, ributzka
Differential Revision: http://reviews.llvm.org/D4608
llvm-svn: 221742
We currently have two ways of informing the optimizer that the result of a load is never null: metadata and assume. This change converts the second in to the former. This avoids a need to implement optimizations using both forms.
We should probably extend this basic idea to metadata of other forms; in particular, range metadata. We view is that assumes should be considered a "last resort" for when there isn't a more canonical way to represent something.
Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5951
llvm-svn: 221737
This is a reapplication of r221171, but we only perform the transformation
on expressions which include a multiplication. We do not transform rem/div
operations as this doesn't appear to be safe in all cases.
llvm-svn: 221721
Summary:
This change moves asan-coverage instrumentation
into a separate Module pass.
The other part of the change in clang introduces a new flag
-fsanitize-coverage=N.
Another small patch will update tests in compiler-rt.
With this patch no functionality change is expected except for the flag name.
The following changes will make the coverage instrumentation work with tsan/msan
Test Plan: Run regression tests, chromium.
Reviewers: nlewycky, samsonov
Reviewed By: nlewycky, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6152
llvm-svn: 221718
Instead, we're going to separate metadata from the Value hierarchy. See
PR21532.
This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.
llvm-svn: 221711
What would happen before that commit is that the SDDbgValues associated with
a deallocated SDNode would be marked Invalidated, but SDDbgInfo would keep
a map entry keyed by the SDNode pointer pointing to this list of invalidated
SDDbgNodes. As the memory gets reused, the list might get wrongly associated
with another new SDNode. As the SDDbgValues are cloned when they are transfered,
this can lead to an exponential number of SDDbgValues being produced during
DAGCombine like in http://llvm.org/bugs/show_bug.cgi?id=20893
Note that the previous behavior wasn't really buggy as the invalidation made
sure that the SDDbgValues won't be used. This commit can be considered a
memory optimization and as such is really hard to validate in a unit-test.
llvm-svn: 221709
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.
This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.
Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.
Review: http://reviews.llvm.org/D4167
llvm-svn: 221708
This is a first step for generating SSE rcp instructions for reciprocal
calcs when fast-math allows it. This is very similar to the rsqrt optimization
enabled in D5658 ( http://reviews.llvm.org/rL220570 ).
For now, be conservative and only enable this for AMD btver2 where performance
improves significantly both in terms of latency and throughput.
We may never enable this codegen for Intel Core* chips because the divider circuits
are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21
cycle critical path for the rcp + mul + sub + mul + add estimate.
Follow-on patches may allow configuration of the number of Newton-Raphson refinement
steps, add AVX512 support, and enable the optimization for more chips.
More background here: http://llvm.org/bugs/show_bug.cgi?id=21385
Differential Revision: http://reviews.llvm.org/D6175
llvm-svn: 221706
My original support for the general dynamic and local dynamic TLS
models contained some fairly obtuse hacks to generate calls to
__tls_get_addr when lowering a TargetGlobalAddress. Rather than
generating real calls, special GET_TLS_ADDR nodes were used to wrap
the calls and only reveal them at assembly time. I attempted to
provide correct parameter and return values by chaining CopyToReg and
CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not
fully correct. Problems were seen with two back-to-back stores to TLS
variables, where the call sequences ended up overlapping with unhappy
results. Additionally, since these weren't real calls, the proper
register side effects of a call were not recorded, so clobbered values
were kept live across the calls.
The proper thing to do is to lower these into calls in the first
place. This is relatively straightforward; see the changes to
PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp.
The changes here are standard call lowering, except that we need to
track the fact that these calls will require a relocation. This is
done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the
TargetGlobalAddress operand that appears earlier in the sequence.
The calls to LowerCallTo() eventually find their way to
LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(),
which calls PrepareCall(). In PrepareCall(), we detect the calls to
__tls_get_addr and immediately snag the TargetGlobalTLSAddress with
the annotated relocation information. This becomes an extra operand
on the call following the callee, which is expected for nodes of type
tlscall. We change the call opcode to CALL_TLS for this case. Back
in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only,
since we require a TOC-restore nop following the call for the 64-bit
ABIs.
During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td
convert the CALL_TLS nodes into BL_TLS nodes, and convert the
CALL_NOP_TLS nodes into BL8_NOP_TLS nodes. This replaces the code
removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS
nodes can now be emitted normally using their patterns and the
associated printTLSCall print method.
Finally, as a result of these changes, all references to get-tls-addr
in its various guises are no longer used, so they have been removed.
There are existing TLS tests to verify the changes haven't messed
anything up). I've added one new test that verifies that the problem
with the original code has been fixed.
llvm-svn: 221703
The ISel lowering for global TLS access in PIC mode was creating a pseudo
instruction that is later expanded to a call, but the code was not
setting the hasCalls flag in the MachineFrameInfo alongside the adjustsStack
flag. This caused some functions to be mistakenly recognized as leaf functions,
and this in turn affected the decision to eliminate the frame pointer.
With the fix, hasCalls is properly set and the leaf frame pointer is correctly
preserved.
llvm-svn: 221695
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.
llvm-svn: 221693
Summary:
This patch enables code generation for the MIPS II target. Pre-Mips32
targets don't have the MUL instruction, so we add the correspondent
pattern that uses the MULT/MFLO combination in order to retrieve the
product.
This is WIP as we don't support code generation for select nodes due to
the lack of conditional-move instructions.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6150
llvm-svn: 221686
The canonical name when printing assembly is still $29. The reason is that
GAS does not accept "$hwr_ulr" at the moment.
This addresses the comments from r221307, which reverted the original
commit r221299.
llvm-svn: 221685
The original commit r221299 was reverted in r221307. I removed the name
"hrw_ulr" ($29) from the original commit because two tests were failing.
llvm-svn: 221681
Referencing one symbol from another in the same section does not
generally require a relocation. However, the MS linker has a feature
called /INCREMENTAL which enables incremental links. It achieves this
by creating thunks to the actual function and redirecting all
relocations to point to the thunk.
This breaks down with the old scheme if you have a function which
references, say, itself. On x86_64, we would use %rip relative
addressing to reference the start of the function from out current
position. This would lead to miscompiles because other references might
reference the thunk instead, breaking function pointer equality.
This fixes PR21520.
llvm-svn: 221678
This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details.
Recommitting - This time, with a hopefully working test.
Differential Revision: http://reviews.llvm.org/D6128
llvm-svn: 221672
This adds const to a few methods that already return const references or
creates a const version when they reterun non-const references.
llvm-svn: 221666
AVX2 is available.
According to IACA, the new lowering has a throughput of 8 cycles instead of 13
with the previous one.
Althought this lowering kicks in some SPECs benchmarks, the performance
improvement was within the noise.
Correctness testing has been done for the whole range of uint32_t with the
following program:
uint4 v = (uint4) {0,1,2,3};
uint32_t i;
//Check correctness over entire range for uint4 -> float4 conversion
for( i = 0; i < 1U << (32-2); i++ )
{
float4 t = test(v);
float4 c = correct(v);
if( 0xf != _mm_movemask_ps( t == c ))
{
printf( "Error @ %vx: %vf vs. %vf\n", v, c, t);
return -1;
}
v += 4;
}
Where "correct" is the old lowering and "test" the new one.
The patch adds a test case for the two custom lowering instruction.
It also modifies the vector cost model, which is why cast.ll and uitofp.ll are
modified.
2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7
instructions instead of 4 (3 more constant loads).
rdar://problem/18153096>
llvm-svn: 221657
In the case we optimize an integer extend away and replace it directly with the
source register, we also have to clear all kill flags at all its uses.
This is necessary, because the orignal IR instruction might be trivially dead,
but we replaced it with a nop at MI level.
llvm-svn: 221628
Switch statements may have more than one incoming edge into the same BB if they
all have the same value. When the switch statement is converted these incoming
edges are now coming from multiple BBs. Updating all incoming values to be from
a single BB is incorrect and would generate invalid LLVM IR.
The fix is to only update the first occurrence of an incoming value. Switch
lowering will perform subsequent calls to this helper function for each incoming
edge with a new basic block - updating all edges in the process.
This fixes rdar://problem/18916275.
llvm-svn: 221627
This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits.
See PR20494 for details.
Differential Revision: http://reviews.llvm.org/D6128
llvm-svn: 221626
Summary:
It currently only implements hasBranchDivergence, and will be extended
in later diffs.
Split from D6188.
Test Plan: make check-all
Reviewers: jholewinski
Reviewed By: jholewinski
Subscribers: llvm-commits, meheff, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D6195
llvm-svn: 221619
This fixes a few cases of:
* Wrong variable name style.
* Lines longer than 80 columns.
* Repeated names in comments.
* clang-format of the above.
This make the next patch a lot easier to read.
llvm-svn: 221615
We already use the llvm namespace. Remove the unnecessary prefix. Use the
StringRef::equals method to compare with C strings rather than instantiating
std::strings.
Addresses late review comments from David Majnemer.
llvm-svn: 221564
Visual Studio 2012 apparently does not support using alias declarations. Use
the more traditional typedef approach. This should let the Windows buildbots
pass. NFC.
llvm-svn: 221554
This introduces the symbol rewriter. This is an IR->IR transformation that is
implemented as a CodeGenPrepare pass. This allows for the transparent
adjustment of the symbols during compilation.
It provides a clean, simple, elegant solution for symbol inter-positioning. This
technique is often used, such as in the various sanitizers and performance
analysis.
The control of this is via a custom YAML syntax map file that indicates source
to destination mapping, so as to avoid having the compiler to know the exact
details of the source to destination transformations.
llvm-svn: 221548
Summary:
... and after all that refactoring, it's possible to distinguish softfloat
floating point values from integers so this patch no longer breaks softfloat to
do it.
Remove direct handling of i32's in the N32/N64 ABI by promoting them to
i64. This more closely reflects the ABI documentation and also fixes
problems with stack arguments on big-endian targets.
We now rely on signext/zeroext annotations (already generated by clang) and
the Assert[SZ]ext nodes to avoid the introduction of unnecessary sign/zero
extends.
It was not possible to convert three tests to use signext/zeroext. These tests
are bswap.ll, ctlz-v.ll, ctlz-v.ll. It's not possible to put signext on a
vector type so we just accept the sign extends here for now. These tests don't
pass the vectors the same way clang does (clang puts multiple elements in the
same argument, these map 1 element to 1 argument) so we don't need to worry too
much about it.
With this patch, all known N32/N64 bugs should be fixed and we now pass the
first 10,000 tests generated by ABITest.py.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6117
llvm-svn: 221534
Summary:
One of the calls to AllocateStack (the one in LowerCall) doesn't look like
it should be there but it was there before and removing it breaks the
frame size calculation.
Reviewers: vmedic, theraven
Reviewed By: theraven
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6116
llvm-svn: 221529
Summary:
In addition to the usual f128 workaround, it was also necessary to provide
a means of accessing ArgListEntry::IsFixed.
Reviewers: theraven, vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6111
llvm-svn: 221518
Summary:
In the long run, it should probably become a calling convention in its own
right but for now just move it out of
MipsISelLowering::analyzeCallOperands() so that we can drop this function
in favour of CCState::AnalyzeCallOperands().
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6085
llvm-svn: 221517
Summary:
CCState objects already carry this information in their isVarArg() method.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6084
llvm-svn: 221516
We would attempt to fold away a call instruction which had been marked
overdefined. However, it's not valid to transition to constant from
overdefined.
This fixes PR21512.
llvm-svn: 221513
Summary:
This makes PIC levels a Module flag attribute, which can be queried by the
backend. The flag is named `PIC Level`, and can have a value of:
0 - Backend-default
1 - Small-model (-fpic)
2 - Large-model (-fPIC)
These match the `-pic-level' command line argument for clang, and the value of the
preprocessor macro `__PIC__'.
Test Plan:
New flags tests specific for the 'PIC Level' module flag.
Tests to be added as part of a future commit for PowerPC, which will use this new API.
Reviewers: rafael, echristo
Reviewed By: rafael, echristo
Subscribers: rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D5882
llvm-svn: 221510
Reversing a CB* instruction used to drop the flags on the condition. On the
included testcase, this lead to a read from an undefined vreg.
Using addOperand keeps the flags, here <undef>.
Differential Revision: http://reviews.llvm.org/D6159
llvm-svn: 221507
A pointer's pointee might not be sized: the pointee could be a function.
Report this as IK_NoInduction when calculating isInductionVariable.
This fixes PR21508.
llvm-svn: 221501
The ELF symbol `st_other` field might contain additional flags besides
visibility ones. This patch implements support for some MIPS specific
flags.
llvm-svn: 221491
Fixed an issue with the (v)cvttps2dq and (v)cvttpd2dq instructions being incorrectly put in the 2 source operand folding tables instead of the 1 source operand and added the missing SSE/AVX versions.
Also added missing (v)cvtps2dq and (v)cvtpd2dq instructions to the folding tables.
Differential Revision: http://reviews.llvm.org/D6001
llvm-svn: 221489
mingw lies about the size of a function's AuxFunctionDefinition. Ignore
the field and rely on our heuristic to determine the symbol's size.
llvm-svn: 221485
The variable is private, so the name should not be relied on. Also, the
linker uses the sections, so asan should too when trying to avoid causing
the linker problems.
llvm-svn: 221480
Imported declarations can be DIGlobalVariables which aren't a DIScope. Today
clang (unknowingly I believe) shoehorns these into a DIScope and it all works
just because we never access the fields.
llvm-svn: 221466
instructions. Inlining might cause such cases and it's not valid to
reassociate floating-point instructions without the unsafe algebra flag.
Patch by Mehdi Amini <mehdi_amini@apple.com>!
llvm-svn: 221462
Summary:
As with returns, we must be able to identify f128 arguments despite them
being lowered away. We do this with a pre-analyze step that builds a
vector and then we use this vector from the tablegen-erated code.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6081
llvm-svn: 221461
On 32 bit windows we use label differences and .set does not suppress
rolocations, a combination that was not used before r220256.
This fixes PR21497.
llvm-svn: 221456
Example:
define <4 x i32> @test(<4 x i32> %a, <4 x i32> %b) {
%shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 6, i32 3>
ret <4 x i32> %shuffle
}
Before llc (-mattr=+sse4.1), produced the following assembly instruction:
pblendw $4294967103, %xmm1, %xmm0
After
pblendw $63, %xmm1, %xmm0
llvm-svn: 221455
Summary:
Currently, we give an error if %z is used with non-immediates, instead of continuing as if the %z isn't there.
For example, you use the %z operand modifier along with the "Jr" constraints ("r" makes the operand a register, and "J" makes it an immediate, but only if its value is 0).
In this case, you want the compiler to print "$0" if the inline asm input operand turns out to be an immediate zero and you want it to print the register containing the operand, if it's not.
We give an error in the latter case, and we shouldn't (GCC also doesn't).
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6023
llvm-svn: 221453
Summary:
Improved warning message when using .cpload inside a reorder section and added an error message for using .cpload with Mips16 enabled.
Modified the tests to fit with the changes mentioned above, added a test-case for the N32 ABI in cpload.s and did some reformatting to make the tests easier to read.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5465
llvm-svn: 221447
Summary:
Fixed all of the missing endian conversions that Lang Hames and I identified in
RuntimeDyldMachOARM.h.
Fixed the opcode emission in RuntimeDyldImpl::createStubFunction() for AArch64,
ARM, Mips when the host endian doesn't match the target endian.
PowerPC will need changing if it's opcodes are affected by endianness but I've
left this for now since I'm unsure if this is the case and it's the only path
that specifies the target endian.
This patch fixes MachO_ARM_PIC_relocations.s on a big-endian Mips host. This
is the last of the known issues on this host.
Reviewers: lhames
Reviewed By: lhames
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D6130
llvm-svn: 221446
Use the position of the subsequent symbol in the object file to infer
the size of it's predecessor. I hope to eventually remove whatever COFF
specific details from this little algorithm so that we can unify this
logic with what Mach-O does.
llvm-svn: 221444
When generating gcov compatible profiling, we sometimes skip emitting
data for functions for one reason or another. However, this was
emitting different function IDs in the .gcno and .gcda files, because
the .gcno case was using the loop index before skipping functions and
the .gcda the array index after. This resulted in completely invalid
gcov data.
This fixes the problem by making the .gcno loop track the ID
separately from the loop index.
llvm-svn: 221441
condition to match a blend.
This prevents optimizations that work on VSELECT to perform invalid
transformations. Indeed, the optimized condition does not match the vector
boolean content that is expected and bad things may happen.
This patch yields the exact same code on the whole test-suite + specs (-O3 and
-O3 -march=core-avx2), it improves one test case (vector-blend.ll) and fixes a
bug reduced in vselect-avx.ll.
<rdar://problem/18819506>
llvm-svn: 221429
Remove dynamic relocations of __gxx_personality_v0 from the .eh_frame.
The MIPS64 follow-up of the MIPS32 fix (rL209907).
Patch by Vladimir Stefanovic.
Differential Revision: http://reviews.llvm.org/D6141
llvm-svn: 221408
Added missing memory folding for the (V)CVTDQ2PS instructions - we can safely fold these (but not the (V)CVTDQ2PD versions which have a register/memory size discrepancy in the source operand). I've added a test case demonstrating that stack folding now works.
Differential Revision: http://reviews.llvm.org/D5981
llvm-svn: 221407
Summary:
X86FastISel::fastMaterializeAlloca was incorrectly conditioning its
opcode selection on subtarget bitness rather than pointer size.
Differential Revision: http://reviews.llvm.org/D6136
llvm-svn: 221386
This works around the limitation that PTX does not allow .param space
loads/stores with arbitrary pointers.
If a function has a by-val struct ptr arg, say foo(%struct.x *byval %d), then
add the following instructions to the first basic block :
%temp = alloca %struct.x, align 8
%tt1 = bitcast %struct.x * %d to i8 *
%tt2 = llvm.nvvm.cvt.gen.to.param %tt2
%tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) *
%tv = load %struct.x addrspace(101) * %tempd
store %struct.x %tv, %struct.x * %temp, align 8
The above code allocates some space in the stack and copies the incoming
struct from param space to local space. Then replace all occurences of %d
by %temp.
Fixes PR21465.
llvm-svn: 221377
Change `NamedMDNode::getOperator()` from returning `MDNode *` to
returning `Value *`. To reduce boilerplate at some call sites, add a
`getOperatorAsMDNode()` for named metadata that's expected to only
return `MDNode` -- for now, that's everything, but debug node named
metadata (such as llvm.dbg.cu and llvm.dbg.sp) will soon change. This
is part of PR21433.
Note that there's a follow-up patch to clang for the API change.
llvm-svn: 221375
We currently have no infrastructure to support these correctly.
This is accomplished by generating a call to a runtime library function that
aborts at runtime in place of the regular wrapper for such functions. Direct
calls are rewritten in the usual way during traversal of the caller's IR.
We also remove the "split-stack" attribute from such wrappers, as the code
generator cannot currently handle split-stack vararg functions.
llvm-svn: 221360
This matches the format produced by the AMD proprietary driver.
//==================================================================//
// Shell script for converting .ll test cases: (Pass the .ll files
you want to convert to this script as arguments).
//==================================================================//
; This was necessary on my system so that A-Z in sed would match only
; upper case. I'm not sure why.
export LC_ALL='C'
TEST_FILES="$*"
MATCHES=`grep -v Patterns SIInstructions.td | grep -o '"[A-Z0-9_]\+["e]' | grep -o '[A-Z0-9_]\+' | sort -r`
for f in $TEST_FILES; do
# Check that there are SI tests:
grep -q -e 'verde' -e 'bonaire' -e 'SI' -e 'tahiti' $f
if [ $? -eq 0 ]; then
for match in $MATCHES; do
sed -i -e "s/\([ :]$match\)/\L\1/" $f
done
# Try to get check lines with partial instruction names
sed -i 's/\(;[ ]*SI[A-Z\\-]*: \)\([A-Z_0-9]\+\)/\1\L\2/' $f
fi
done
sed -i -e 's/bb0_1/BB0_1/g' ../../../test/CodeGen/R600/infinite-loop.ll
sed -i -e 's/SI-NOT: bfe/SI-NOT: {{[^@]}}bfe/g'../../../test/CodeGen/R600/llvm.AMDGPU.bfe.*32.ll ../../../test/CodeGen/R600/sext-in-reg.ll
sed -i -e 's/exp_IEEE/EXP_IEEE/g' ../../../test/CodeGen/R600/llvm.exp2.ll
sed -i -e 's/numVgprs/NumVgprs/g' ../../../test/CodeGen/R600/register-count-comments.ll
sed -i 's/\(; CHECK[-NOT]*: \)\([A-Z_0-9]\+\)/\1\L\2/' ../../../test/CodeGen/R600/select64.ll ../../../test/CodeGen/R600/sgpr-copy.ll
//==================================================================//
// Shell script for converting .td files (run this last)
//==================================================================//
export LC_ALL='C'
sed -i -e '/Patterns/!s/\("[A-Z0-9_]\+[ "e]\)/\L\1/g' SIInstructions.td
sed -i -e 's/"EXP/"exp/g' SIInstrInfo.td
llvm-svn: 221350
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.
This allows for example to simplify the following code:
define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
%1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
%2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
%3 = or <4 x i32> %1, %2
ret <4 x i32> %3
}
Before this patch llc (-mcpu=corei7) generated:
andps LCPI1_0(%rip), %xmm0, %xmm0
andps LCPI1_1(%rip), %xmm1, %xmm1
orps %xmm1, %xmm0, %xmm0
retq
With this patch we generate a single 'vpblendw'.
llvm-svn: 221343
Some ARM FPUs only have 16 double-precision registers, rather than the
normal 32. LLVM represents this with the D16 target feature. This is
currently used by CodeGen to avoid using high registers when they are
not available, but the assembler and disassembler do not.
I fix this in the assmebler and disassembler rather than the
InstrInfo.td files, as the latter would require a large number of
changes everywhere one of the floating-point instructions is referenced
in the backend. This solution is similar to the one used for
co-processor numbers and MSR masks.
llvm-svn: 221341
Commit 220932 caused crash when building clang-tblgen on aarch64 debian target,
so it's blocking all daily tests.
The std::call_once implementation in pthread has bug for aarch64 debian.
llvm-svn: 221331
Exact shifts may not shift out any non-zero bits. Use computeKnownBits
to determine when this occurs and just return the left hand side.
This fixes PR21477.
llvm-svn: 221325
We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
prcisely 8. Many of the reasons for doing it apply also when that alignment > 8
(the extra store is often free, and can save another stack adjustment, though
less frequently for 16-byte stack alignment).
llvm-svn: 221321
We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.
This had a couple of potential issues:
+ The .cfi directives we emit misplaced dN because they were based on
PrologEpilogInserter's calculation.
+ Unaligned stores can be less efficient.
+ Unaligned stores can actually fault (likely only an issue in niche cases,
but possible).
This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.
llvm-svn: 221320
Divides and remainder operations do not behave like other operations
when they are given poison: they turn into undefined behavior.
It's really hard to know if the operands going into a div are or are not
poison. Because of this, we should only choose to speculate if there
are constant operands which we can easily reason about.
This fixes PR21412.
llvm-svn: 221318
Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask.
This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests).
Differential Revision: http://reviews.llvm.org/D6015
llvm-svn: 221313
change LoopSimplifyPass to be !isCFGOnly. The motivation for the earlier patch
(r221223) was that LoopSimplify is not preserved by instcombine though
setPreservesCFG indicates that it is. This change fixes the issue
by making setPreservesCFG no longer imply LoopSimplifyPass, and is therefore less
invasive.
llvm-svn: 221311
While fixing up the register classes in the machine combiner in a previous
commit I missed one.
This fixes the last one and adds a test case.
llvm-svn: 221308
Clang -gsplit-dwarf self-host -O0, binary increases by 0.0005%, -O2,
binary increases by 25%.
A large binary inside Google, split-dwarf, -O0, and other internal flags
(GDB index, etc) increases by 1.8%, optimized build is 35%.
The size impact may be somewhat greater in .o files (I haven't measured
that much - since the linked executable -O0 numbers seemed low enough)
due to relocations. These relocations could be removed if we taught the
llvm-symbolizer to handle indexed addressing in the .o file (GDB can't
cope with this just yet, but GDB won't be reading this info anyway).
Also debug_ranges could be shared between .o and .dwo, though ideally
debug_ranges would get a schema that could used index(+offset)
addressing, and move to the .dwo file, then we'd be back to sharing
addresses in the address pool again.
But for now, these sizes seem small enough to go ahead with this.
Verified that no other DW_TAGs are produced into the .o file other than
subprograms and inlined_subroutines.
llvm-svn: 221306
We were producing a relocation for
----------------
.section foo,bar
La:
Lb:
.long La-Lb
--------------
but not for
---------------------
.section foo,bar
zed:
La:
Lb:
.long La-Lb
----------------
This patch handles the case where both fragments are part of the first atom
in a section and there is no corresponding symbol to that atom.
This fixes pr21328.
llvm-svn: 221304
This patch adds 'FeatureSlowSHLD' to 'bdver3'.
According to the official AMD optimization guide for amdfam15: "Using
alternative code in place of SHLD achieves lower overall latency and
requires fewer execution resources. The 32-bit and 64-bit forms of
ADD, ADC, SHR, and LEA (except 16-bit form) are DirectPath
instructions, while SHLD is a VectorPath instruction."
This patch also explicitly sets feature AVX and SSE4A for all the bdver*
cpus. This part of the patch is a non-functional change and it is mainly
done for clarity reasons (Both XOP and FMA4 already imply AVX and SSE4A).
llvm-svn: 221296
Registers are not all equal. Some are not allocatable (infinite cost),
some have to be preserved but can be used, and some others are just free
to use.
Ensure there is a cost hierarchy reflecting this fact, so that the
allocator will favor scratch registers over callee-saved registers.
llvm-svn: 221293
This patch improves how the different costs (register, interference, spill
and coalescing) relates together. The assumption is now that:
- coalescing (or any other "side effect" of reg alloc) is negative, and
instead of being derived from a spill cost, they use the block
frequency info.
- spill costs are in the [MinSpillCost:+inf( range
- register or interference costs are in [0.0:MinSpillCost( or +inf
The current MinSpillCost is set to 10.0, which is a random value high
enough that the current constraint builders do not need to worry about
when settings costs. It would however be worth adding a normalization
step for register and interference costs as the last step in the
constraint builder chain to ensure they are not greater than SpillMinCost
(unless this has some sense for some architectures). This would work well
with the current builder pipeline, where all costs are tweaked relatively
to each others, but could grow above MinSpillCost if the pipeline is
deep enough.
The current heuristic is tuned to depend rather on the number of uses of
a live interval rather than a density of uses, as used by the greedy
allocator. This heuristic provides a few percent improvement on a number
of benchmarks (eembc, spec, ...) and will definitely need to change once
spill placement is implemented: the current spill placement is really
ineficient, so making the cost proportionnal to the number of use is a
clear win.
llvm-svn: 221292
Summary:
Appropriately set/clear the FeatureBit for Mips16 when these assembler directives are used and also emit ".set nomips16" (previously, only ".set mips16" was being emitted).
These improvements allow for better testing of the .cpload/.cprestore assembler directives (which are not supposed to work when Mips16 is enabled).
Test Plan: The test is bare-bones because there are no MC tests for Mips16 instructions (there's only one, which checks that the Mips16 ELF header flag gets set), and that suggests to me that it has not been implemented yet in the IAS.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5462
llvm-svn: 221277
This is experimental, just barely enough to get things to not
immediately combust.
A note for those who are curious:
Only lld can successfully link the object files, other linkers truncate
the section names making the debug sections illegible to debuggers.
Even with this in mind, we believe we are having trouble with SECREL
relocations.
llvm-svn: 221245
1>C:\Program Files (x86)\Windows Kits\8.1\Include\um\minwinbase.h(46):
error C2146: syntax error : missing ';' before identifier 'nLength'
1>C:\Program Files (x86)\Windows Kits\8.1\Include\um\minwinbase.h(46):
error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
...
including <windows.h> is actually required.
llvm-svn: 221244
preserve LoopSimplify because instcombine may replace branch predicates
with undef which loop simplify then replaces with always exit. Replace
setPreservesCFG with the more constrained preservation of DomTree and
LoopInfo.
llvm-svn: 221223
LoadCombine can be smarter about aborting when a writing instruction is
encountered, instead of aborting upon encountering any writing instruction, use
an AliasSetTracker, and only abort when encountering some write that might
alias with the loads that could potentially be combined.
This was originally motivated by comments made (and a test case provided) by
David Majnemer in response to PR21448. It turned out that LoadCombine was not
responsible for that PR, but LoadCombine should also be improved so that
unrelated stores (and @llvm.assume) don't interrupt load combining.
llvm-svn: 221203
This generalizes the range handling for ranges in both the skeleton and
full unit, laying the foundation for the addition of more ranges (rather
than just the CU's special case) in the skeleton CU with fission+gmlt.
llvm-svn: 221202
FoldOpIntoPhi could create an infinite loop if the PHI could potentially
reach a BB it was considering inserting instructions into. The
instructions it would insert would eventually lead to other combines
firing which would, again, lead to FoldOpIntoPhi firing.
The solution is to handicap FoldOpIntoPhi so that it doesn't attempt to
insert instructions that the PHI might reach.
This fixes PR21377.
llvm-svn: 221187
So that it may be shared between skeleton/full compile unit, for CU
ranges and other ranges to be added for fission+gmlt.
(at some point we might want some kind of object shared between the
skeleton and full compile units for all those things we only want one of
in that scope, rather than having the full unit always look through to
the skeleton... - alternatively, we might be able to have the skeleton
pointer (or another, separate pointer) point to the skeleton or to the
unit itself in non-fission, so we don't have to special case its
absence)
llvm-svn: 221186
This is one of a few steps to generalize range handling to include the
CU range (thus the CU's range list will be moved into the range list
list, losing track of the base address in the process), which means
generalizing ranges from both the skeleton and full unit under fission.
And... then I can used that generalized support for ranges in
fission+gmlt where there'll be a bunch more ranges in the skeleton.
llvm-svn: 221182
register class tGPRRegClass if the target is thumb1.
This commit fixes a crash that occurs during register allocation which was
triggered when a virtual register defined by an inline-asm instruction had to
be spilled.
rdar://problem/18740489
llvm-svn: 221178
For 8-bit divrems where the remainder is used, we used to generate:
divb %sil
shrw $8, %ax
movzbl %al, %eax
That was to avoid an H-reg access, which is problematic mainly because
it isn't possible in REX-prefixed instructions.
This patch optimizes that to:
divb %sil
movzbl %ah, %eax
To do that, we explicitly extend AH, and extract the L-subreg in the
resulting register. The extension is done using the NOREX variants of
MOVZX. To support signed operations, MOVSX_NOREX is also added.
Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is
then lowered to a sequence containing a single zext (rather than 2).
Differential Revision: http://reviews.llvm.org/D6064
llvm-svn: 221176
EarlyCSE uses a simple generation scheme for handling memory-based
dependencies, and calls to @llvm.assume (which are marked as writing to memory
to ensure the preservation of control dependencies) disturb that scheme
unnecessarily. Skipping calls to @llvm.assume is legal, and the alternative
(adding AA calls in EarlyCSE) is likely undesirable (we have GVN for that).
Fixes PR21448.
llvm-svn: 221175
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.
Commit includes the test that found this, plus another one that got
missed in the original optnone work.
llvm-svn: 221168
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.
LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.
This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.
Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
llvm-svn: 221166
Some literals in the AArch64 backend had 15 'f's rather than 16, causing
comparisons with a constant 0xffffffffffffffff to be miscompiled.
llvm-svn: 221157
Hexagon was not calling InitializeELF and could not select between
ctors and init_array.
Phabricator revision: http://reviews.llvm.org/D6061
llvm-svn: 221156
The MRI scripts have to work with CRLF, and in general it is probably
a good idea to support this in a core utility like LineIterator.
llvm-svn: 221153
When LLVM emits DWARF call frame information, it currently creates a local,
section-relative symbol in the code section, which is pointed to by a
relocation on the .eh_frame section. However, for C++ we emit some functions in
section groups, and the SysV ABI has some rules to make it easier to remove
these sections
(http://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_group_rules):
A symbol table entry with STB_LOCAL binding that is defined relative to one
of a group's sections, and that is contained in a symbol table section that is
not part of the group, must be discarded if the group members are discarded.
References to this symbol table entry from outside the group are not allowed.
This means that we need to use the function symbol for the relocation, not a
temporary symbol.
There was a comment in the code claiming that the local symbol was used to
avoid creating a relocation, but a relocation must be created anyway as the
code and CFI are in different sections.
llvm-svn: 221150
The problem is mostly that variadic output instruction
aren't handled, so it is rejected for having an inconsistent
number of operands, and then the right number of operands
isn't emitted.
llvm-svn: 221117
The issue was that linkAppendingVarProto does the full linking job, including
deleting the old dst variable. The fix is just to call it and return early
if we have a GV with appending linkage.
original message:
Refactor duplicated code in liking GlobalValues.
There is quiet a bit of logic that is common to any GlobalValue but was
duplicated for Functions, GlobalVariables and GlobalAliases.
While at it, merge visibility even when comdats are used, fixing pr21415.
llvm-svn: 221098
This commit introduces heap-use-after-free detected by ASan. Here is the output
for one of several tests that detect it:
******************** TEST 'LLVM :: Linker/AppendingLinkage.ll' FAILED ********************
Command Output (stderr):
--
=================================================================
==2122==ERROR: AddressSanitizer: heap-use-after-free on address 0x60c00000b9c8 at pc 0x0000005d05d1 bp 0x7fff64ed27c0 sp 0x7fff64ed27b8
READ of size 4 at 0x60c00000b9c8 thread T0
#0 0x5d05d0 in llvm::GlobalValue::setUnnamedAddr(bool) /usr/local/google/home/chandlerc/src/llvm/build/../include/llvm/IR/GlobalValue.h:115:35
#1 0x69fff1 in (anonymous namespace)::ModuleLinker::linkGlobalValueProto(llvm::GlobalValue*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1041:5
#2 0x697229 in (anonymous namespace)::ModuleLinker::run() /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1485:9
#3 0x696542 in llvm::Linker::linkInModule(llvm::Module*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1621:10
#4 0x4a2db7 in main /usr/local/google/home/chandlerc/src/llvm/build/../tools/llvm-link/llvm-link.cpp:116:9
#5 0x7f4ae61e5ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
#6 0x41eb71 in _start (/usr/local/google/home/chandlerc/src/llvm/build/bin/llvm-link+0x41eb71)
0x60c00000b9c8 is located 72 bytes inside of 128-byte region [0x60c00000b980,0x60c00000ba00)
freed by thread T0 here:
#0 0x4a1e6b in operator delete(void*) /usr/local/google/home/chandlerc/src/llvm/opt-build/../projects/compiler-rt/lib/asan/asan_new_delete.cc:94:3
#1 0x5d1a7a in llvm::iplist<llvm::GlobalVariable, llvm::ilist_traits<llvm::GlobalVariable> >::erase(llvm::ilist_iterator<llvm::GlobalVariable>) /usr/local/google/home/chandlerc/src/llvm/build/../inclu
de/llvm/ADT/ilist.h:466:5
#2 0x5d1980 in llvm::GlobalVariable::eraseFromParent() /usr/local/google/home/chandlerc/src/llvm/build/../lib/IR/Globals.cpp:204:3
#3 0x6a8a4d in (anonymous namespace)::ModuleLinker::linkAppendingVarProto(llvm::GlobalVariable*, llvm::GlobalVariable const*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.
cpp:980:3
#4 0x6a7403 in (anonymous namespace)::ModuleLinker::linkGlobalVariableProto(llvm::GlobalVariable const*, llvm::GlobalValue*, bool) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkMod
ules.cpp:1074:11
#5 0x69ff4e in (anonymous namespace)::ModuleLinker::linkGlobalValueProto(llvm::GlobalValue*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1028:13
#6 0x697229 in (anonymous namespace)::ModuleLinker::run() /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1485:9
#7 0x696542 in llvm::Linker::linkInModule(llvm::Module*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1621:10
#8 0x4a2db7 in main /usr/local/google/home/chandlerc/src/llvm/build/../tools/llvm-link/llvm-link.cpp:116:9
#9 0x7f4ae61e5ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
previously allocated by thread T0 here:
#0 0x4a192b in operator new(unsigned long) /usr/local/google/home/chandlerc/src/llvm/opt-build/../projects/compiler-rt/lib/asan/asan_new_delete.cc:62:35
#1 0x61d85c in llvm::User::operator new(unsigned long, unsigned int) /usr/local/google/home/chandlerc/src/llvm/build/../lib/IR/User.cpp:57:19
#2 0x6a7525 in (anonymous namespace)::ModuleLinker::linkGlobalVariableProto(llvm::GlobalVariable const*, llvm::GlobalValue*, bool) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkMod
ules.cpp:1100:3
#3 0x69ff4e in (anonymous namespace)::ModuleLinker::linkGlobalValueProto(llvm::GlobalValue*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1028:13
#4 0x697229 in (anonymous namespace)::ModuleLinker::run() /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1485:9
#5 0x696542 in llvm::Linker::linkInModule(llvm::Module*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1621:10
#6 0x4a2db7 in main /usr/local/google/home/chandlerc/src/llvm/build/../tools/llvm-link/llvm-link.cpp:116:9
#7 0x7f4ae61e5ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
SUMMARY: AddressSanitizer: heap-use-after-free /usr/local/google/home/chandlerc/src/llvm/build/../include/llvm/IR/GlobalValue.h:115 llvm::GlobalValue::setUnnamedAddr(bool)
Shadow bytes around the buggy address:
0x0c187fff96e0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
0x0c187fff96f0: 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa fa
0x0c187fff9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
0x0c187fff9710: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
0x0c187fff9720: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
=>0x0c187fff9730: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
0x0c187fff9740: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c187fff9750: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
0x0c187fff9760: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c187fff9770: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c187fff9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
ASan internal: fe
==2122==ABORTING
llvm-svn: 221096
Currently we only need to emit skeleton strings into the CU header and
we do this by explicitly calling "addLocalString". With gmlt-in-fission,
we'll be emitting a bunch of other strings from other codepaths where
it's not statically known that these strings will be local or not.
Introduce a virtual function to indicate whether this unit is a DWO unit
or not (I'm not sure if we have a good term for this, the
opposite/alternative to 'skeleton' unit) and use that to generalize the
string emission logic so that strings can be correctly emitted in both
the skeleton and dwo unit when in split dwarf mode.
And to demonstrate that this works, switch the existing special callers
of addLocalString in the skeleton builder to addString - and they still
work. Yay.
llvm-svn: 221094
This is a useful distinction/invariant/delination to make because
LineTablesOnly mode is never relevant to type units, so it's clear that
we're not doing weird line-tables-only-with-types by making this API
choice.
It also lays the foundations nicely for adding gmlt-like data to fission
skeleton CUs while limiting the effects to CUs and not TUs.
llvm-svn: 221093
(these will shortly become virtual, with a null implementation in
DwarfUnit (since type units don't have accelerator tables in the current
schema) and the current implementation down in DwarfCompileUnit, moving
the actual maps there too)
llvm-svn: 221082
r221056 "[mips] Move F128 argument handling into MipsCCState as we did for returns. NFC."
r221058 "[mips] Fix unused variable warning introduced in r221056"
r221059 "[mips] Move all ByVal handling into CCState and tablegen-erated code. NFC."
r221061 "Renamed CCState members that appear to misspell 'Processed' as 'Proceed'. NFC."
It cuased an undefined behavior in LLVM :: CodeGen/Mips/return-vector.ll.
llvm-svn: 221081
This would help catch cases where we might otherwise try to reference a
dwo CU label, which would be weird - because without relocations in the
dwo file it's not generally meaningful to talk about the CU offsets
there (or, if it is, we can do so in absolute terms without using a
relocation to compute it).
llvm-svn: 221078
This allows the CU label to be emitted only for compile units, as
they're the only ones that need it (so they can be referenced from
pubnames)
llvm-svn: 221072
m_ZExt might bind against a ConstantExpr instead of an Instruction.
Assuming this, using cast<Instruction>, results in InstCombine crashing.
Instead, introduce ZExtOperator to bridge both Instruction and
ConstantExpr ZExts.
This fixes PR21445.
llvm-svn: 221069
This was a compile-unit specific label (unused in type units) and seems
unnecessary anyway when we can more easily directly compute the size of
the compile unit.
llvm-svn: 221067
Summary:
CCState already contains a byval implementation that is very similar to the
Mips custom code. This patch merges the custom code into the existing
common code and tablegen-erated code.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D5977
llvm-svn: 221059
Summary:
There are a couple more changes to make before analyzeFormalArguments can
be merged into the standard AnalyzeFormalArguments. I've had to temporarily
poke a couple holes in MipsCCState's encapsulation to save having to make
all the required changes for this merge all at once*. These will be removed
shortly.
* We must merge our ByVal argument handling with the implementation in CCState.
This will be done over the next three patches, then the fourth will merge
analyzeFormalArguments with AnalyzeFormalArguments.
Depends on D5967
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5969
llvm-svn: 221056
Type units no longer have skeletons and it's misleading to be able to
query for a type unit's skeleton (it might incorrectly lead one to
conclude that if a unit doesn't have a skeleton it's not in a .dwo
file... ).
llvm-svn: 221055
Summary:
It's now passed in as an argument to functions that need it. Eventually
this argument will be replaced by the 'this' pointer for a MipsCCState
object.
Depends on D5966
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5967
llvm-svn: 221054
Summary:
There is one remaining trace of it in MipsCC::analyzeCallOperands() where
Mips16 might override the calling convention. This will moved into
tablegen-erated code later.
Depends on D5965
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5966
llvm-svn: 221053
Summary:
CustomCallingConv is simply a CallingConv that tablegen should not generate the
implementation for. It allows regular CallingConv's to delegate to these custom
functions. This is (currently) necessary for Mips and we cannot use CCCustom
without having to adapt to the different API that CCCustom uses.
This brings us a bit closer to being able to remove
MipsCC::analyzeCallOperands and MipsCC::analyzeFormalArguments in favour of
the common implementation.
No functional change to the targets.
Depends on D3341
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: vmedic, llvm-commits
Differential Revision: http://reviews.llvm.org/D5965
llvm-svn: 221052
This is the first big step to allowing gmlt-like inline scope
information in the skeleton CU. While this commit doesn't change the
functionality, it's only a small step to call
"constructAbstractSubprogramDIE" on both the InfoHolder and the
SkeletonHolder (when in use) and that will at least create the abstract
SP dies in that case, though still not creating the other subprograms.
llvm-svn: 221051
This removes calls to isMaterializable in the following cases:
* It was redundant with a call to isDeclaration now that isDeclaration returns
the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.
llvm-svn: 221050
This can happen pretty often in code that looks like:
int foo = bar - 1;
if (foo < 0)
do stuff
In this case, bar < 1 is an equivalent condition.
This transform requires that the add instruction be annotated with nsw.
llvm-svn: 221045
getDISubprogram was mistakenly thought to contain a bug: we thought we
might need to try harder if we found a DebugLoc we didn't find.
llvm-svn: 221044
Summary:
This patch extends the 'show' and 'merge' commands in llvm-profdata to handle
sample PGO formats. Using the 'merge' command it is now possible to convert
one sample PGO format to another.
The only format that is currently not working is 'gcc'. I still need to
implement support for it in lib/ProfileData.
The changes in the sample profile support classes are needed for the
merge operation.
Reviewers: bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6065
llvm-svn: 221032
"[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros."
llvm-svn: 221028
Change `Instruction::getAllMetadata()` to modify a vector of `Value`
instead of `MDNode` and update call sites. This is part of PR21433.
llvm-svn: 221027
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.
Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.
llvm-svn: 221024
Add `Instruction::getMDNode()` that casts to `MDNode` before changing
`Instruction::getMetadata()` to return `Value`. This avoids adding
`cast_or_null<MDNode>` boiler-plate throughout the code.
Part of PR21433.
llvm-svn: 221023
It appears to ignore or find ambiguous MachineInstrBuilder's conversion
operators that allow conversion to MachineInstr* and
MachineBasicBlock::bundle_iterator.
As a workaround, add an explicit way to get the MachineInstr.
llvm-svn: 221017
There is quiet a bit of logic that is common to any GlobalValue but was
duplicated for Functions, GlobalVariables and GlobalAliases.
While at it, merge visibility even when comdats are used, fixing pr21415.
llvm-svn: 221014
The getBinary and getBuffer method now return ordinary pointers of appropriate
const-ness. Ownership is transferred by calling takeBinary(), which returns a
pair of the Binary and a MemoryBuffer.
llvm-svn: 221003
We need to figure out how to track ptrtoint values all the
way until result is converted back to a pointer in order
to correctly rewrite the pointer type.
llvm-svn: 220997
Now that we have initial support for VSX, we can begin adding
intrinsics for programmer access to VSX instructions. This patch adds
basic support for VSX intrinsics in general, and tests it by
implementing intrinsics for minimum and maximum for the vector double
data type.
The LLVM portion of this is quite straightforward. There is a
companion patch for Clang.
llvm-svn: 220988
Our internal test reveals such case should not be transformed:
cmp x17, #3
b.lt .LBB10_15
...
subs x12, x12, #1
b.gt .LBB10_1
where x12 is a liveout, becomes:
cmp x17, #2
b.le .LBB10_15
...
subs x12, x12, #2
b.ge .LBB10_1
Unable to provide test case as it's difficult to reproduce on community branch.
http://reviews.llvm.org/D6048
Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>!
llvm-svn: 220987
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.
** Context **
Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.
** Motivating Example **
Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
%in1 = load <2 x i32>* %addr1, align 8
%extract = extractelement <2 x i32> %in1, i32 1
%out = or i32 %extract, 1
store i32 %out, i32* %dest, align 4
ret void
}
As it is, this IR generates the following assembly on armv7:
vldr d16, [r0] @vector load
vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles
orr r0, r0, #1 @ scalar bitwise or
str r0, [r1] @ scalar store
bx lr
Whereas we could generate much faster code:
vldr d16, [r0] @ vector load
vorr.i32 d16, #0x1 @ vector bitwise or
vst1.32 {d16[1]}, [r1:32] @ vector extract + store
bx lr
Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.
** Proposed Solution **
To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.
Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).
The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.
For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.
[1] We can imagine targets that can combine an extractelement with other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.
Differential Revision: http://reviews.llvm.org/D5921
<rdar://problem/14170854>
llvm-svn: 220978
Tested this by #if 0'ing out the pthreads implementation, which
indicated that this fallback was not currently compiling successfully
and applying this patch resolves that.
Patch by Andy Chien.
llvm-svn: 220969
In a case where we have a no {un,}signed wrap flag on the increment, if
RHS - Start is constant then we can avoid inserting a max operation bewteen
the two, since we can statically determine which is greater.
This allows us to unroll loops such as:
void testcase3(int v) {
for (int i=v; i<=v+1; ++i)
f(i);
}
llvm-svn: 220960
Since block address values can be larger than 2GB in 64-bit code, they
cannot be loaded simply using an @l / @ha pair, but instead must be
loaded from the TOC, just like GlobalAddress, ConstantPool, and
JumpTable values are.
The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where
temporary labels could not be used as TOC values, since code would
attempt (and fail) to use GetOrCreateSymbol to create a symbol of the
same name as the temporary label.
llvm-svn: 220959
r212242 introduced a legalizer hook, originally to let AArch64 widen
v1i{32,16,8} rather than scalarize, because the legalizer expected, when
scalarizing the result of a conversion operation, to already have
scalarized the operands. On AArch64, v1i64 is legal, so that commit
ensured operations such as v1i32 = trunc v1i64 wouldn't assert.
It did that by choosing to widen v1 types whenever possible. However,
v1i1 types, for which there's no legal widened type, would still trigger
the assert.
This commit fixes that, by only scalarizing a trunc's result when the
operand has already been scalarized, and introducing an extract_elt
otherwise.
This is similar to r205625.
Fixes PR20777.
llvm-svn: 220937
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.
Tests included.
rdar://18816719
llvm-svn: 220933
Summary:
This patch adds an llvm_call_once which is a wrapper around std::call_once on platforms where it is available and devoid of bugs. The patch also migrates the ManagedStatic mutex to be allocated using llvm_call_once.
These changes are philosophically equivalent to the changes added in r219638, which were reverted due to a hang on Win32 which was the result of a bug in the Windows implementation of std::call_once.
Reviewers: aaron.ballman, chapuni, chandlerc, rnk
Reviewed By: rnk
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D5922
llvm-svn: 220932
The langref says:
LLVM explicitly allows declarations of global variables to be marked
constant, even if the final definition of the global is not. This
capability can be used to enable slightly better optimization of the
program, but requires the language definition to guarantee that
optimizations based on the ‘constantness’ are valid for the
translation units that do not include the definition.
Given that definition, when merging two declarations, we have to drop
constantness if of of them is not marked contant, since the Module
without the constant marker might not have the necessary guarantees.
llvm-svn: 220927
If we load from a location with range metadata, we can use information about the ranges of the loaded value for optimization purposes. This helps to remove redundant checks and canonicalize checks for other optimization passes. This particular patch checks whether a value is known to be non-zero from the range metadata.
Currently, these tests are against InstCombine. In theory, all of these should be InstSimplify since we're not inserting any new instructions. Moving the code may follow in a separate change.
Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5947
llvm-svn: 220925