In order to make comdats always explicit in the IR, we decided to make
the syntax a bit more compact for the case of a GlobalObject in a
comdat with the same name.
Just dropping the $name causes problems for
@foo = globabl i32 0, comdat
$bar = comdat ...
and
declare void @foo() comdat
$bar = comdat ...
So the syntax is changed to
@g1 = globabl i32 0, comdat($c1)
@g2 = globabl i32 0, comdat
and
declare void @foo() comdat($c1)
declare void @foo() comdat
llvm-svn: 225302
int->fp conversions on PPC must be done through memory loads and stores. On a
modern core, this process begins by storing the int value to memory, then
loading it using a (sometimes special) FP load instruction. Unfortunately, we
would do this even when the value to be converted was itself a load, and we can
just use that same memory location instead of copying it to another first.
There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs,
because the fp_to_int operand has not been lowered when the int_to_fp is being
lowered. We handle this specially by invoking fp_to_int's lowering logic
(partially) and getting the necessary memory location (some trivial refactoring
was done to make this possible).
This is all somewhat ugly, and it would be nice if some later CodeGen stage
could just clean this stuff up, but because doing so would involve modifying
target-specific nodes (or instructions), it is not immediately clear how that
would work.
Also, remove a related entry from the README.txt for which we now generate
reasonable code.
llvm-svn: 225301
This is equivalent to the AMDGPUTargetMachine now, but it is the
starting point for separating R600 and GCN functionality into separate
targets.
It is recommened that users start using the gcn triple for GCN-based
GPUs, because using the r600 triple for these GPUs will be deprecated in
the future.
llvm-svn: 225277
This patch improves the logic added at revision 224899 (see review D6728) that
teaches the backend when it is profitable to speculate calls to cttz/ctlz.
The original algorithm conservatively avoided speculating more than one
instruction from a basic block in a control flow grap modelling an if-statement.
In particular, the only allowed instruction (excluding the terminator) was a
call to cttz/ctlz. However, there are cases where we could be less conservative
and still be able to speculate a call to cttz/ctlz.
With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the
result is zero extended/truncated in the same basic block, and the zext/trunc
instruction is "free" for the target.
Added new test cases to CodeGen/X86/cttz-ctlz.ll
Differential Revision: http://reviews.llvm.org/D6853
llvm-svn: 225274
This also rolls in the changes discussed in http://reviews.llvm.org/D6766.
Defers migrating the debug info for new allocas until after all partitions
are created.
Thanks to Chandler for reviewing!
llvm-svn: 225272
In r225251, I removed an old entry from the README.txt file. While there are
several contributing factors (including pieces in Clang's ABI code), upon
further reflection, the backend part deserves a regression test.
llvm-svn: 225268
This is already handled in general when it is known the
conversion can't lose bits with smaller integer types
casted into wider floating point types.
This pattern happens somewhat often in GPU programs that cast
workitem intrinsics to float, which are often compared with 0.
Specifically handle the special case of compares with zero which
should also be known to not lose information. I had a more general
version of this which allows equality compares if the casted float is
exactly representable in the integer, but I'm not 100% confident that
is always correct.
Also fold cases that aren't integers to true / false.
llvm-svn: 225265
Use this to test that path of invalidation. This test actually shows
redundant invalidation here that is really bad. I'm going to work on
fixing that next, but wanted to commit the test harness now that its all
working.
llvm-svn: 225257
Requires new AsmParserOperand types that detect 16-bit and 32/64-bit mode so that we choose the right instruction based on default sizing without predicates. This is necessary since predicates mess up the disassembler table building.
llvm-svn: 225256
Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing
arguments and return values when DataLayout says it is safe to do so.
llvm-svn: 225254
remove an extra, redundant pass manager wrapping every run.
I had kept seeing these when manually testing, but it was getting really
annoying and was going to cause problems with overly eager invalidation.
The root cause was an overly complex and unnecessary pile of code for
parsing the outer layer of the pass pipeline. We can instead delegate
most of this to the recursive pipeline parsing.
I've added some somewhat more basic and precise tests to catch this.
llvm-svn: 225253
"ELF Handling for Thread-Local Storage" specifies that R_X86_64_GOTTPOFF
relocation target a movq or addq instruction.
Prohibit the truncation of such loads to movl or addl.
This fixes PR22083.
Differential Revision: http://reviews.llvm.org/D6839
llvm-svn: 225250
The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x))
without a load/store pair is updated here with support for unsigned integers,
and to support single-precision values without a third rounding step, on newer
cores with the appropriate instructions.
llvm-svn: 225248
a specific analysis result.
This is quite handy to test things, and will also likely be very useful
for debugging issues. You could narrow down pass validation failures by
walking these invalidate pass runs up and down the pass pipeline, etc.
I've added support to the pass pipeline parsing to be able to create one
of these for any analysis pass desired.
Just adding this class uncovered one latent bug where the
AnalysisManager CRTP base class had a hard-coded Module type rather than
using IRUnitT.
I've also added tests for invalidation and caching of analyses in
a basic way across all the pass managers. These in turn uncovered two
more bugs where we failed to correctly invalidate an analysis -- its
results were invalidated but the key for re-running the pass was never
cleared and so it was never re-run. Quite nasty. I'm very glad to debug
this here rather than with a full system.
Also, yes, the naming here is horrid. I'm going to update some of the
names to be slightly less awful shortly. But really, I've no "good"
ideas for naming. I'll be satisfied if I can get it to "not bad".
llvm-svn: 225246
manager tests to use them and be significantly more comprehensive.
This, naturally, uncovered a bug where the CGSCC pass manager wasn't
printing analyses when they were run.
The only remaining core manipulator is I think an invalidate pass
similar to the require pass. That'll be next. =]
llvm-svn: 225240
is a no-op other than requiring some analysis results be available.
This can be used in real pass pipelines to force the usually lazy
analysis running to eagerly compute something at a specific point, and
it can be used to test the pass manager infrastructure (my primary use
at the moment).
I've also added bit of pipeline parsing magic to support generating
these directly from the opt command so that you can directly use these
when debugging your analysis. The syntax is:
require<analysis-name>
This can be used at any level of the pass manager. For example:
cgscc(function(require<my-analysis>,no-op-function))
This would produce a no-op function pass requiring my-analysis, followed
by a fully no-op function pass, both of these in a function pass manager
which is nested inside of a bottom-up CGSCC pass manager which is in the
top-level (implicit) module pass manager.
I have zero attachment to the particular syntax I'm using here. Consider
it a straw man for use while I'm testing and fleshing things out.
Suggestions for better syntax welcome, and I'll update everything based
on any consensus that develops.
I've used this new functionality to more directly test the analysis
printing rather than relying on the cgscc pass manager running an
analysis for me. This is still minimally tested because I need to have
analyses to run first! ;] That patch is next, but wanted to keep this
one separate for easier review and discussion.
llvm-svn: 225236
We now produce the desired code as noted in the README.txt file (no spurious
or). Remove the README entry and improve the regression test.
llvm-svn: 225214
This object is meant to own the ObjectFiles and their underlying
MemoryBuffer. It is basically the equivalent of an OwningBinary
except that it efficiently handles Archives. It is optimized for
efficiently providing mappings of members of the same archive when
they are opened successively (which is standard in Darwin debug
maps, objects from the same archive will be contiguous).
Of course, the BinaryHolder will also be used by the DWARF linker
once it is commited, but for now only the debug map parser uses it.
With this change, you can run llvm-dsymutil on your Darwin debug build
of clang and get a complete debug map for it.
Differential Revision: http://reviews.llvm.org/D6690
llvm-svn: 225207
Consider this function from our README.txt file:
int foo(int a, int b) { return (a < b) << 4; }
We now explicitly track CR bits by default, so the comment in the README.txt
about not really having a SETCC is no longer accurate, but we did generate this
somewhat silly code:
cmpw 0, 3, 4
li 3, 0
li 12, 1
isel 3, 12, 3, 0
sldi 3, 3, 4
blr
which generates the zext as a select between 0 and 1, and then shifts the
result by a constant amount. Here we preprocess the DAG in order to fold the
results of operations on an extension of an i1 value into the SELECT_I[48]
pseudo instruction when the resulting constant can be materialized using one
instruction (just like the 0 and 1). This was not implemented as a DAGCombine
because the resulting code would have been anti-canonical and depends on
replacing chained user nodes, which does not fit well into the lowering
paradigm. Now we generate:
cmpw 0, 3, 4
li 3, 0
li 12, 16
isel 3, 12, 3, 0
blr
which is less silly.
llvm-svn: 225203
The 64-bit semantics of cntlzw are not special, the 32-bit population count is
stored as a 64-bit value in the range [0,32]. As a result, it is always zero
extended, and it can be added to the PPCISelDAGToDAG peephole optimization as a
frontier instruction for the removal of unnecessary zero extensions.
llvm-svn: 225192
lhbrx and lwbrx not only load their data with byte swapping, but also clear the
upper 32 bits (at least). As a result, they can be added to the PPCISelDAGToDAG
peephole optimization as frontier instructions for the removal of unnecessary
zero extensions.
llvm-svn: 225189
We used to generate code similar to:
umov.b w8, v0[2]
strb w8, [x0, x1]
because the STR*ro* patterns were preferred to ST1*.
Instead, we can avoid going through GPRs, and generate:
add x8, x0, x1
st1.b { v0 }[2], [x8]
This patch increases the ST1* AddedComplexity to achieve that.
rdar://16372710
Differential Revision: http://reviews.llvm.org/D6202
llvm-svn: 225183
For 0-lane stores, we used to generate code similar to:
fmov w8, s0
str w8, [x0, x1, lsl #2]
instead of:
str s0, [x0, x1, lsl #2]
To correct that: for store lane 0 patterns, directly match to STR <subreg>0.
Byte-sized instructions don't have the special case for a 0 index,
because FPR8s are defined to have untyped content.
rdar://16372710
Differential Revision: http://reviews.llvm.org/D6772
llvm-svn: 225181
Tag_compatibility takes two arguments, but before this patch it would
erroneously accept just one, it now produces an error in that case.
Change-Id: I530f918587620d0d5dfebf639944d6083871ef7d
llvm-svn: 225167