Doing this with a hash map doesn't change behavior and avoids calling
isIdenticalTo O(n^2) times. This should probably eventually move into a utility
class shared with EarlyCSE and the limited CSE in the SLPVectorizer.
llvm-svn: 193926
When the loop vectorizer was part of the SCC inliner pass manager gvn would
run after the loop vectorizer followed by instcombine. This way redundancy
(multiple uses) were removed and instcombine could perform scalarization on the
induction variables. Having moved the loop vectorizer to later we no longer run
any form of redundancy elimination before we perform instcombine. This caused
vectorized induction variables to survive that did not before.
On a recent iMac this helps linpack back from 6000Mflops to 7000Mflops.
This should also help lpbench and paq8p.
I ran a Release (without Asserts) build over the test-suite and did not see any
negative impact on compile time.
radar://15339680
llvm-svn: 193891
In a failed attempt to allow the gnu-public-names.ll test case to not
hardcode the size of the unit that the pubnames section referred to I've
at least managed to have unit headers and pubnames headers print out in
a similar style.
This failed to achieve the desired goal because the header in a unit
specifies the length of the unit without the length element of the
header whereas the length in the pubnames includes this element, so the
numbers are off by 4 bytes. I don't know of any arithmetic powers in
FileCheck so the test case can't simply say "CU_LENGTH + 4".
llvm-svn: 193872
linkonce_odr_auto_hide was in incomplete attempt to implement a way
for the linker to hide symbols that are known to be available in every
TU and whose addresses are not relevant for a particular DSO.
It was redundant in that it all its uses are equivalent to
linkonce_odr+unnamed_addr. Unlike those, it has never been connected
to clang or llvm's optimizers, so it was effectively dead.
Given that nothing produces it, this patch just nukes it
(other than the llvm-c enum value).
llvm-svn: 193865
Add a Virtualization ARM subtarget feature along with adding proper build
attribute emission for Tag_Virtualization_use (encodes Virtualization and
TrustZone) and Tag_MPextension_use.
Also rework test/CodeGen/ARM/2010-10-19-mc-elf-objheader.ll testcase to
something that is more maintainable. This changes the focus of this
testcase away from testing CPU defaults (which is tested elsewhere), onto
specifically testing that attributes are encoded correctly.
llvm-svn: 193859
Fix Tag_ABI_HardFP_use build attribute to handle single precision FP,
replace deprecated Tag_ABI_HardFP_use value of 3 with 0 and also add
some tests for Tag_ABI_VFP_args.
llvm-svn: 193856
This adds another heuristic to BPI, similar to the existing heuristic that
considers (x == 0) unlikely to be true. As suggested in the PACT'98 paper by
Deitrich, Cheng, and Hwu, -1 is often used to indicate an invalid index, and
equality comparisons with -1 are also unlikely to succeed. Local
experimentation supports this hypothesis: This yields a 1-2% speedup in the
test-suite sqlite benchmark on the PPC A2 core, with no significant
regressions.
llvm-svn: 193855
When a dependence check fails we can still try to vectorize loops with runtime
array bounds checks.
This helps linpack to vectorize a loop in dgefa. And we are back to 2x of the
scalar performance on a corei7-avx.
radar://15339680
llvm-svn: 193853
Objective-C data structures.
This is allows tools such as darwin's otool(1) that uses the
LLVM disassembler take a pointer value being loaded by
an instruction and add a comment to what it is being referenced
to make following disassembly of Objective-C programs
more readable.
For example disassembling the Mac OS X TextEdit app one
will see comments like the following:
movq 0x20684(%rip), %rsi ## Objc selector ref: standardUserDefaults
movq 0x21985(%rip), %rdi ## Objc class ref: _OBJC_CLASS_$_NSUserDefaults
movq 0x1d156(%rip), %r14 ## Objc message: +[NSUserDefaults standardUserDefaults]
leaq 0x23615(%rip), %rdx ## Objc cfstring ref: @"SelectLinePanel"
callq 0x10001386c ## Objc message: -[[%rdi super] initWithWindowNibName:]
These diffs also include putting quotes around C strings
in literal pools and uses "symbol address" in the comment
when adding a symbol name to the comment to tell these
types of references apart:
leaq 0x4f(%rip), %rax ## literal pool for: "Hello world"
movq 0x1c3ea(%rip), %rax ## literal pool symbol address: ___stack_chk_guard
Of course the easy changes are in the LLVM disassembler and
the hard work is up to the implementer of the SymbolLookUp()
call back.
rdar://10602439
llvm-svn: 193833
Given that backend does not handle "invoke asm" correctly ("invoke asm" will be
handled by SelectionDAGBuilder::visitInlineAsm, which does not have the right
setup for LPadToCallSiteMap) and we already made the assumption that inline asm
does not throw in InstCombiner::visitCallSite, we are going to make the same
assumption in Inliner to make sure we don't convert "call asm" to "invoke asm".
If it becomes necessary to add support for "invoke asm" later on, we will need
to modify the backend as well as remove the assumptions that inline asm does
not throw.
Fix rdar://15317907
llvm-svn: 193808
There are two ways one could implement hiding of linkonce_odr symbols in LTO:
* LLVM tells the linker which symbols can be hidden if not used from native
files.
* The linker tells LLVM which symbols are not used from other object files,
but will be put in the dso symbol table if present.
GOLD's API is the second option. It was implemented almost 1:1 in llvm by
passing the list down to internalize.
LLVM already had partial support for the first option. It is also very similar
to how ld64 handles hiding these symbols when *not* doing LTO.
This patch then
* removes the APIs for the DSO list.
* marks LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN all linkonce_odr unnamed_addr
global values and other linkonce_odr whose address is not used.
* makes the gold plugin responsible for handling the API mismatch.
llvm-svn: 193800
We add a map in DwarfDebug to map MDNodes that are shareable across CUs to the
corresponding DIEs: MDTypeNodeToDieMap. These DIEs can be shared across CUs,
that is why we keep the maps in DwarfDebug instead of CompileUnit.
We make the assumption that if a DIE is not added to an owner yet, we assume
it belongs to the current CU. Since DIEs for the type system are added to
their owners immediately after creation, and other DIEs belong to the current
CU, the assumption should be true.
A testing case is added to show that we only create a single DIE for a type
MDNode and we use ref_addr to refer to the type DIE.
We also add a testing case to show ref_addr relocations for non-darwin
platforms.
llvm-svn: 193779
As on other hosts, the CPU identification instruction is priveleged,
so we need to look through /proc/cpuinfo. I copied the PowerPC way of
handling "generic".
Several tests were implicitly assuming z10 and so failed on z196.
llvm-svn: 193742
This adds a new subtarget feature called FPARMv8 (implied by NEON), and
predicates the support of the FP instructions and registers on this feature.
llvm-svn: 193739
When an extend more than doubles the size of the elements (e.g., a zext
from v16i8 to v16i32), the normal legalization method of splitting the
vectors will run into problems as by the time the destination vector is
legal, the source vector is illegal. The end result is the operation
often becoming scalarized, with the typical horrible performance. For
example, on x86_64, the simple input of:
define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
%tmp = zext <16 x i8> %a to <16 x i32>
store <16 x i32> %tmp, <16 x i32>*%p
ret void
}
Generates:
.section __TEXT,__text,regular,pure_instructions
.section __TEXT,__const
.align 5
LCPI0_0:
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.section __TEXT,__text,regular,pure_instructions
.globl _bar
.align 4, 0x90
_bar:
vpunpckhbw %xmm0, %xmm0, %xmm1
vpunpckhwd %xmm0, %xmm1, %xmm2
vpmovzxwd %xmm1, %xmm1
vinsertf128 $1, %xmm2, %ymm1, %ymm1
vmovaps LCPI0_0(%rip), %ymm2
vandps %ymm2, %ymm1, %ymm1
vpmovzxbw %xmm0, %xmm3
vpunpckhwd %xmm0, %xmm3, %xmm3
vpmovzxbd %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vandps %ymm2, %ymm0, %ymm0
vmovaps %ymm0, (%rdi)
vmovaps %ymm1, 32(%rdi)
vzeroupper
ret
So instead we can check if there are legal types that enable us to split
more cleverly when the input vector is already legal such that we don't
turn it into an illegal type. If the extend is such that it's more than
doubling the size of the input we check if
- the number of vector elements is even,
- the source type is legal,
- the type of a split source is illegal,
- the type of an extended (by doubling element size) source is legal, and
- the type of that extended source when split is legal.
If the conditions are met, instead of just splitting both the
destination and the source types, we create an extend that only goes up
one "step" (doubling the element width), and the continue legalizing the
rest of the operation normally. The result is that this operates as a
new, more effecient, termination condition for the loop of "split the
operation until the destination type is legal."
With this change, the above example now compiles to:
_bar:
vpxor %xmm1, %xmm1, %xmm1
vpunpcklbw %xmm1, %xmm0, %xmm2
vpunpckhwd %xmm1, %xmm2, %xmm3
vpunpcklwd %xmm1, %xmm2, %xmm2
vinsertf128 $1, %xmm3, %ymm2, %ymm2
vpunpckhbw %xmm1, %xmm0, %xmm0
vpunpckhwd %xmm1, %xmm0, %xmm3
vpunpcklwd %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vmovaps %ymm0, 32(%rdi)
vmovaps %ymm2, (%rdi)
vzeroupper
ret
This generalizes a custom lowering that was added a while back to the
ARM backend. That lowering is no longer necessary, and is removed. The
testcases for it, however, provide excellent ARM tests for this change
and so remain.
rdar://14735100
llvm-svn: 193727
The function verifyFunction() in lib/IR/Verifier.cpp misses some
calls. It creates a temporary FunctionPassManager that will run a
single Verifier pass. Unfortunately, FunctionPassManager is no
PassManager and does not call doInitialization() and doFinalization()
by itself. Verifier does important tasks in doInitialization() such as
collecting type information used to check DebugInfo metadata and
doFinalization() does some additional checks. Therefore these checks
were missed and debug info couldn't be verified at all, it just
crashed if the function had some.
verifyFunction() is currently not used in llvm unless -debug option is
enabled, and in unittests/IR/VerifierTest.cpp
VerifierTest had to be changed to create the function in a module from
which the type debug info can be collected.
Patch by Michael Kruse.
llvm-svn: 193719
With this patch llvm produces a weak_def_can_be_hidden for linkonce_odr
if they are also unnamed_addr or don't have their address taken.
There is not a lot of documentation about .weak_def_can_be_hidden, but
from the old discussion about linkonce_odr_auto_hide and the name of
the directive this looks correct: these symbols can be hidden.
Testing this with the ld64 in Xcode 5 linking clang reduces the number of
exported symbols from 21053 to 19049.
llvm-svn: 193718
startswith_lower is ocassionally useful and I think worth adding.
endwith_lower is added for completeness.
Differential Revision: http://llvm-reviews.chandlerc.com/D2041
llvm-svn: 193706
Fixing this Windows build error:
..\lib\Target\Mips\MipsSEISelLowering.cpp(997) : error C2027: use of undefined type 'llvm::raw_ostream'
llvm-svn: 193696
Also corrected the definition of the intrinsics for these instructions (the
result register is also the first operand), and added intrinsics for bsel and
bseli to clang (they already existed in the backend).
These four operations are mostly equivalent to bsel, and bseli (the difference
is which operand is tied to the result). As a result some of the tests changed
as described below.
bitwise.ll:
- bsel.v test adapted so that the mask is unknown at compile-time. This stops
it emitting bmnzi.b instead of the intended bsel.v.
- The bseli.b test now tests the right thing. Namely the case when one of the
values is an uimm8, rather than when the condition is a uimm8 (which is
covered by bmnzi.b)
compare.ll:
- bsel.v tests now (correctly) emits bmnz.v instead of bsel.v because this
is the same operation (see MSA.txt).
i8.ll
- CHECK-DAG-ized test.
- bmzi.b test now (correctly) emits equivalent bmnzi.b with swapped operands
because this is the same operation (see MSA.txt).
- bseli.b still emits bseli.b though because the immediate makes it
distinguishable from bmnzi.b.
vec.ll:
- CHECK-DAG-ized test.
- bmz.v tests now (correctly) emits bmnz.v with swapped operands (see
MSA.txt).
- bsel.v tests now (correctly) emits bmnz.v with swapped operands (see
MSA.txt).
llvm-svn: 193693
This required correcting the definition of the bins[lr]i intrinsics because
the result is also the first operand.
It also required removing the (arbitrary) check for 32-bit immediates in
MipsSEDAGToDAGISel::selectVSplat().
Currently using binsli.d with 2 bits set in the mask doesn't select binsli.d
because the constant is legalized into a ConstantPool. Similar things can
happen with binsri.d with more than 10 bits set in the mask. The resulting
code when this happens is correct but not optimal.
llvm-svn: 193687
(or (and $a, $mask), (and $b, $inverse_mask)) => (vselect $mask, $a, $b).
where $mask is a constant splat. This allows bitwise operations to make use
of bsel.
It's also a stepping stone towards matching bins[lr], and bins[lr]i from
normal IR.
Two sets of similar tests have been added in this commit. The bsel_* functions
test the case where binsri cannot be used. The binsr_*_i functions will
start to use the binsri instruction in the next commit.
llvm-svn: 193682
splat.d is implemented but this subtest is currently disabled. This is because
it is difficult to match the appropriate IR on MIPS32. There is a patch under
review that should help with this so I hope to enable the subtest soon.
llvm-svn: 193680
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. This mask has
usually the same size as the VSELECT return type (except for Intel KNL). Now the
type legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
Reviewed by Nadav
llvm-svn: 193676
Use EmitLabelOffsetDifference for handling on darwin platform when
non-darwin platforms use EmitLabelPlusOffset.
Also fix a bug in EmitLabelOffsetDifference where the size is hard-coded
to 4 even though Size is passed in as an argument.
llvm-svn: 193660
To support ref_addr, we calculate the section offset of a DIE (i.e. offset
of a DIE from beginning of the debug info section). The Offset field in DIE
is currently CU-relative. To calculate the section offset, we add a
DebugInfoOffset field in CompileUnit to store the offset of a CU from beginning
of the debug info section. We set the value in DwarfUnits::computeSizeAndOffset
for each CompileUnit.
A helper function DIE::getCompileUnit is added to return the CU DIE that
the input DIE belongs to. We also add a map CUDieMap in DwarfDebug to help
finding the CU for a given CU DIE.
For a cross-referenced DIE, we first find the CU DIE it belongs to with
getCompileUnit, then we use CUDieMap to get the corresponding CU for the CU DIE.
Adding the section offset of the CU with the CU-relative offset of a DIE gives
us the seciton offset of the DIE.
We correctly emit ref_addr with relocation using EmitLabelPlusOffset when
doesDwarfUseRelocationsAcrossSections is true.
This commit handles the emission of DW_FORM_ref_addr when we have an attribute
with FORM_ref_addr. A follow-on patch will start using ref_addr when adding a
DIEEntry. This commit will be tested and verified in the follow-on patch.
Reviewed off-list by Eric, Thanks.
llvm-svn: 193658
after the DIE creation, we construct the context first.
Ensure that we create the context before we create a type so that we can add
the newly created type to the parent. Remove last use of addToContextOwner
now that it's not needed.
We use createAndAddDIE to wrap around "new DIE(". Now all shareable DIEs
should be added to their parents right after the creation.
Reviewed off-list by Eric, Thanks.
llvm-svn: 193657
Helper functions are added:
emitPostLd: emit a post-increment load operation with given size.
emitPostSt: emit a post-increment store operation with given size.
No functionality change.
llvm-svn: 193656
This modifies the pass to classify every SSP-triggering AllocaInst according to
an SSPLayoutKind (LargeArray, SmallArray, AddrOf). This analysis is collected
by the pass and made available for use, but no other pass uses it yet.
The next patch will make use of this analysis in PEI and StackSlot
passes. The end goal is to support ssp-strong stack layout rules.
WIP.
Differential Revision: http://llvm-reviews.chandlerc.com/D1789
llvm-svn: 193653
MSVC can't comprehend
template<typename T, size_t N>
ArrayRef<T> makeArrayRef(const T (&Arr)[N]) {
return ArrayRef<T>(Arr);
}
if Arr is
static const uint8_t sizes[];
declared in a templated and defined a few lines later.
I'll send a proper fix (i.e. get rid of unnecessary templates) for review soon.
llvm-svn: 193604
Adds a subtarget feature for the CRC instructions (optional in v8-A) to the ARM (32-bit) backend.
Differential Revision: http://llvm-reviews.chandlerc.com/D2036
llvm-svn: 193599
after the DIE creation, we construct the context first.
This touches creation of namespaces and global variables. The purpose is to
handle all DIE creations similarly: constructs the context first, then creates
the DIE and immediately adds the DIE to its parent.
We use createAndAddDIE to wrap around "new DIE(".
llvm-svn: 193589
By vectorizing a series of srl, or, ... instructions we have obfuscated the
intention so much that the backend does not know how to fold this code away.
radar://15336950
llvm-svn: 193573
No test case, because with the current cost model we don't see a difference.
An upcoming ARM memory cost model change will expose and test this bug.
radar://15332579
llvm-svn: 193572
More patches will be submitted to convert "new DIE(" to use createAddAndDIE in
DwarfCompileUnit.cpp. This will simplify implementation of addDIEEntry where
we have to decide between ref4 and ref_addr, because DIEs that can be shared
across CU will be added to a CU already.
Reviewed off-list by Eric.
llvm-svn: 193567
It wraps around "new DIE(" and handles the bookkeeping part of the newly-created
DIE. It adds the DIE to its parent, and calls insertDIE if necessary. It makes
sure that bookkeeping is done at the earliest time and we should not see
parentless DIEs if all constructions of DIEs go through this helper function.
Later on, we can use an allocator for DIE allocation, and will only need to
change createAndAddDIE instead of modifying all the "new DIE(".
Reviewed off-list by Eric.
llvm-svn: 193566
Complicated CU-DIE-specific logic in the latter was never used,
and it makes sense to have safety checks for broken dwarf in the former.
llvm-svn: 193563
Summary:
Use DWARF4 table of form classes to fetch attributes from DIE
in a more consistent way. This shouldn't change the functionality and
serves as a refactoring for upcoming change: DW_AT_high_pc has different
semantics depending on its form class.
Reviewers: dblaikie, echristo
Reviewed By: echristo
CC: echristo, llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1961
llvm-svn: 193553
an MCExpr, in order to avoid writing an encoded zero value in the immediate
field.
When getUnconditionalBranchTargetOpValue is called with an MCExpr target, we
don't know what the final immediate field value should be. We shouldn't
explicitly set the immediate field to an encoded zero value as zero is encoded
with a non-zero bit pattern. This leads to bits being set that pollute the
final immediate value. The nature of the encoding is such that the polluted
bits only affect very large immediate values, explaining why this hasn't
caused problems earlier.
Fixes <rdar://problem/15155975>.
llvm-svn: 193535
This commit allows the ARM integrated assembler to parse
and assemble the code with .eabi_attribute, .cpu, and
.fpu directives.
To implement the feature, this commit moves the code from
AttrEmitter to ARMTargetStreamers, and several new test
cases related to cortex-m4, cortex-r5, and cortex-a15 are
added.
Besides, this commit also change the Subtarget->isFPOnlySP()
to Subtarget->hasD16() to match the usage of .fpu directive.
This commit changes the test cases:
* Several .eabi_attribute directives in
2010-09-29-mc-asm-header-test.ll are removed because the .fpu
directive already cover the functionality.
* In the Cortex-A15 test case, the value for
Tag_Advanced_SIMD_arch has be changed from 1 to 2,
which is more precise.
llvm-svn: 193524
useAA significantly improves the handling of vector code that has TBAA
information attached. It also helps other cases, as shown by the testsuite
changes here. The only real downside I've seen is that it interferes with
MergeConsecutiveStores. The problem is that that optimization works top
down, starting at the first store in the chain, and looks for cases where
the chain result is only used by a single related store. These related
stores don't alias, so useAA will have rewritten all the later stores to
use a different chain input (typically the same one as the first store).
I think the advantages outweigh the disadvantages though, so for now I've
just disabled alias analysis for the unaligned-01.ll test.
llvm-svn: 193521
Making useAA() default to true for SystemZ showed that the combiner alias
analysis wasn't handling volatile accesses. This hit many of the SystemZ
tests, but I arbitrarily picked one for the purpose of this patch.
llvm-svn: 193518
Most SelectionDAG code drops the TBAA info when creating a new form of a
load and store (e.g. during legalization, or when converting a plain
load to an extending one). This patch tries to catch all cases where
the TBAA information can legitimately be carried over.
The patch adds alternative forms of getLoad() and getExtLoad() that take
a MachineMemOperand instead of individual fields. (The corresponding
getTruncStore() already exists.) The idea is to use the MachineMemOperand
forms when all fields are carried over (size, pointer info, isVolatile,
isNonTemporal, alignment and TBAA info). If some adjustment is being
made, e.g. to narrow the load, then we still pass the individual fields
but also pass the TBAA info.
llvm-svn: 193517
We can't do this for the general case as saying a GEP with a negative index
doesn't have unsigned wrap isn't valid for negative indices.
%gep = getelementptr inbounds i32* %p, i64 -1
But an inbounds GEP cannot run past the end of address space. So we check for
the very common case of a positive index and make GEPs derived from that NUW.
Together with Andy's recent non-unit stride work this lets us analyze loops
like
void foo3(int *a, int *b) {
for (; a < b; a++) {}
}
PR12375, PR12376.
Differential Revision: http://llvm-reviews.chandlerc.com/D2033
llvm-svn: 193514
Before I just ported the shell of the pass. I've tried to keep everything
nearly identical to the ARM version. I think it will be very easy to eventually
merge these two and create a new more general pass that other targets can
use. I have some improvements I would like to make to allow pools to
be shared across functions and some other things. When I'm all done we
can think about making a more general pass. More to be ported but the
basic mechanism works now almost as good as gcc mips16.
llvm-svn: 193509
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)
When SCEV expands a recurrence outside of a loop it attempts to scale
by the stride of the recurrence. Chained recurrences don't work that
way. We could compute binomial coefficients, but would hve to
guarantee that the chained AddRec's are in a perfectly reduced form.
llvm-svn: 193438
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)
ScalarEvolutionNormalization was attempting to normalize by adding and
subtracting strides. Chained recurrences don't work that way.
llvm-svn: 193437
This patch teaches GlobalStatus to analyze a call that uses the global value as
a callee, not as an argument.
With this change internalize call handle the common use of linkonce_odr
functions. This reduces the number of linkonce_odr functions in a LTO build of
clang (checked with the emit-llvm gold plugin option) from 1730 to 60.
llvm-svn: 193436
The loop vectorizer does not currently understand how to vectorize
extractelement instructions. The existing check, which excluded all
vector-valued instructions, did not catch extractelement instructions because
it checked only the return value. As a result, vectorization would proceed,
producing illegal instructions like this:
%58 = extractelement <2 x i32> %15, i32 0
%59 = extractelement i32 %58, i32 0
where the second extractelement is illegal because its first operand is not a vector.
llvm-svn: 193434
This fix a memory leak found by valgrind.
Calling it from the base class destructor would not destroy the BasicCallGraph
bits.
FIXME: BasicCallGraph is the only thing that inherits from CallGraph. Can
we merge the two?
llvm-svn: 193412
When assembling, a .thumb_func directive is supposed to be applicable to the
next symbol definition, even if there are intervening directives. We were
racing ahead to try and find it, and this commit should fix the issue.
Patch by Gabor Ballabas
llvm-svn: 193403
There's a barrier instruction so that should still be used, but most actual
atomic operations are going to need a platform decision on the correct
behaviour (either nop if single-threaded or OS-support otherwise).
rdar://problem/15287210
llvm-svn: 193399
ARM processors without ldrex/strex need to be able to make libcalls for all
atomic operations, including the newer min/max versions.
The alternative would probably be expanding these operations in terms of
cmpxchg (as x86 does always), but in the configurations where this matters
code-size tends to be paramount so the libcall is more desirable.
llvm-svn: 193398
This optimization is not SSE specific so I am moving it to DAGco.
The new scalar_to_vector dag node exposed a missing pattern in the AArch64 target that I needed to add.
llvm-svn: 193393
llvm-cov will now be able to read program counts from the GCDA file and
output it in the same format as gcov. The program summary tag was
identified from gcov-io.h as "\0\0\0\a3".
There is currently a bug in GCOVProfiling.cpp which does not generate
the
run- or program-counting IR, so this change was tested manually by
modifying the GCDA file and comparing the gcov and llvm-cov outputs.
llvm-svn: 193389
Also improve the implementation of EmitRawText(Twine) so it doesn't
bother using the SmallString buffer if the Twine is a simple StringRef
anyway.
llvm-svn: 193378
This commit changes the struct byval lowering for arm to use inline
checks for the subtarget instead of a class abstraction to represent
the differences. The class abstraction was judged to be too much
code for this task.
No intended functionality change.
llvm-svn: 193357
This prevents us from silently accepting invalid instructions on (for example)
Cortex-M4 with just single-precision VFP support.
No tests for the extra Pat Requires because they're essentially assertions: the
affected code should have been lowered to libcalls before ISel.
rdar://problem/15302004
llvm-svn: 193354
Make sure we mark all loops (scalar and vector) when vectorizing,
so that we don't try to vectorize them anymore. Also, set unroll
to 1, since this is what we check for on early exit.
llvm-svn: 193349
The fused multiply instructions were added in VFPv4 but are still NEON
instructions, in particular they shouldn't be available on a Cortex-M4 not
matter how floaty it is.
llvm-svn: 193342
If an alias inherits directly from InstAlias then it doesn't get any default
"Requires" values, so llvm-mc will allow it even on architectures that don't
support the underlying instruction.
This tidies up the obvious VFP and NEON cases I found.
llvm-svn: 193340
The compiler-rt functions __adddf3vfp and so on exist purely to allow Thumb1
code to make use of VFP instructions by switching back to ARM mode, they make
no sense for M-class processors which don't even have an ARM mode.
Given that justification, in practice this is a platform ABI decision so the
actual check is based on that rather than CPU features.
rdar://problem/15302004
llvm-svn: 193327
Without this, customers of the MCJIT were leaking memory like crazy.
It's not really clear what the *right* memory management is here, so I'm
not trying to add lots of tests or other logic, just trying to get us
back to a better baseline. I'll follow up on the original commit to
figure out the right path forward.
llvm-svn: 193323
POP instructions are aliased to the ARM LDM variants but have different syntax.
This caused two problems: we tried to access a non-existent operand to annotate
the '!', and the error message didn't make much sense.
With some vigorous hand-waving in the error message both problems can be
fixed.
llvm-svn: 193322
LLVM optimizers may widen accesses to packed structures that overflow the structure itself, but should be in bounds up to the alignment of the object
llvm-svn: 193317
When generating the IfTrue basic block during the F128CSEL pseudo-instruction
handling, the NZCV live-in for the newly created BB wasn't being added. This
caused a fault during MI-sched/live range calculation when the predecessor
for the fall-through BB didn't have a live-in for phys-reg as expected.
llvm-svn: 193316
This was a fundamental flaw in llvm-cov where it treated the values in
the GCDA files as block counts instead of edge counts. This created
incorrect line counts when branching was present. Instead, the edge
counts should be summed to obtain the correct block count.
The fix was tested using custom test files as well as single source
files from the test-suite directory. The behaviour can be verified by
reading the GCOV documentation that describes the GCDA spec ("ARC_COUNTS
gives the counter values for those arcs that are instrumented") and the
header description provided by GCOVProfiling.cpp ("instruments the code
that runs to records (sic) the edges between blocks that run and emit a
complementary "gcda" file on exit").
llvm-svn: 193299
Calling _chkstk is required on ELF as well as COFF on Windows. Without
_chkstk, functions requiring large stack crash in initialization code.
Previous code tested for COFF format but not Mach-O and this patch modifies
the code to test for Windows OS (both Windows target and MingW target)
but not Mach-O object format: Looks like macho environment was used to
build some EFI code.
Credits to Andrew MacPherson.
llvm-svn: 193289
Since we never insert DIE for DITemplateTypeParameter to a map, there is no need
to call getDIE in getOrCreateTemplateTypeParameterDIE. It is also renamed to
constructTemplateTypeParameterDIE to match with other construct functions
in CompileUnit.
Same applies to getOrCreateTemplateValueParameterDIE.
llvm-svn: 193287
Remove the unneeded return values from createMemberDIE, constructEnumTypeDIE,
getOrCreateTemplateTypeParameterDIE, and getOrCreateTemplateValueParameterDIE.
llvm-svn: 193285
There are a few motivations for this:
- Using a map allows for checking if line is in map. This differentiates
unexecutable lines (such as comments) from unexecuted logical lines of
code. "#####" is now outputted in this case, in line with gcov.
- Source files are no longer read in twice: once when storing the line
counts, and once when outputting the data.
- Greatly simplifies the function FileInfo::addLineCount().
llvm-svn: 193264