int->fp conversions on PPC must be done through memory loads and stores. On a
modern core, this process begins by storing the int value to memory, then
loading it using a (sometimes special) FP load instruction. Unfortunately, we
would do this even when the value to be converted was itself a load, and we can
just use that same memory location instead of copying it to another first.
There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs,
because the fp_to_int operand has not been lowered when the int_to_fp is being
lowered. We handle this specially by invoking fp_to_int's lowering logic
(partially) and getting the necessary memory location (some trivial refactoring
was done to make this possible).
This is all somewhat ugly, and it would be nice if some later CodeGen stage
could just clean this stuff up, but because doing so would involve modifying
target-specific nodes (or instructions), it is not immediately clear how that
would work.
Also, remove a related entry from the README.txt for which we now generate
reasonable code.
llvm-svn: 225301
This was causing a race condition where DoDestroy() would acquire
the lock and then initiate a shutdown and then wait for it to
complete. But part of the shutdown involved acquiring the same
lock from a different thread. So the main thread would timeout
waiting for the shutdown to complete and return too soon.
The end result of this is that SBProcess::Kill() was broken on
Windows.
llvm-svn: 225297
The goal is to allows MachineFunctionInfo to override this create
function to customize the creation.
No change intended in existing backend in this patch.
llvm-svn: 225292
In DS write instructions, the address operand comes before the value
operand(s) which is reversed from every other instruction type.
The SIInsertWait assumed that the first use for each instruction
was the value, so for DS write it was protecting the address
operand with s_waitcnt instructions when it should have been
protecting the value operand.
llvm-svn: 225289
A recent POSIX host thread issue where HostThreadPosix::Join() wasn't returning the thread result was responsible for this regression, yet we had no test case covering this so it wasn't discovered.
llvm-svn: 225284
Summary:
Excerpt from [atomics.types.operations.req]/21:
> When only one memory_order argument is supplied, the value of
> success is order, and the value of failure is order except that a
> value of memory_order_acq_rel shall be replaced by the value
> memory_order_acquire and a value of memory_order_release shall be
> replaced by the value memory_order_relaxed.
Clean up some copy pasta while I'm here (someone added a return
statement to a void function).
Reviewers: EricWF, jroelofs, mclow.lists
Reviewed By: mclow.lists
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D6632
llvm-svn: 225280
This is equivalent to the AMDGPUTargetMachine now, but it is the
starting point for separating R600 and GCN functionality into separate
targets.
It is recommened that users start using the gcn triple for GCN-based
GPUs, because using the r600 triple for these GPUs will be deprecated in
the future.
llvm-svn: 225277
This patch improves the logic added at revision 224899 (see review D6728) that
teaches the backend when it is profitable to speculate calls to cttz/ctlz.
The original algorithm conservatively avoided speculating more than one
instruction from a basic block in a control flow grap modelling an if-statement.
In particular, the only allowed instruction (excluding the terminator) was a
call to cttz/ctlz. However, there are cases where we could be less conservative
and still be able to speculate a call to cttz/ctlz.
With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the
result is zero extended/truncated in the same basic block, and the zext/trunc
instruction is "free" for the target.
Added new test cases to CodeGen/X86/cttz-ctlz.ll
Differential Revision: http://reviews.llvm.org/D6853
llvm-svn: 225274
This also rolls in the changes discussed in http://reviews.llvm.org/D6766.
Defers migrating the debug info for new allocas until after all partitions
are created.
Thanks to Chandler for reviewing!
llvm-svn: 225272
The color scheme is the same as the one used by the colorize dwarfdump
script on Darwin.
A new --color option can be used to forcibly turn color on or off.
http://reviews.llvm.org/D6852
llvm-svn: 225269
In r225251, I removed an old entry from the README.txt file. While there are
several contributing factors (including pieces in Clang's ABI code), upon
further reflection, the backend part deserves a regression test.
llvm-svn: 225268
No functional changes. Support for ARM's modified immediate syntax was added
in r223113 and r223115 (review: D6408). That patch introduced the mod_imm*
tblegen definitions which renders the existing so_imm* definitions redundant.
This patch gets rid of them completely.
Reviewed as: D6722
llvm-svn: 225266
This is already handled in general when it is known the
conversion can't lose bits with smaller integer types
casted into wider floating point types.
This pattern happens somewhat often in GPU programs that cast
workitem intrinsics to float, which are often compared with 0.
Specifically handle the special case of compares with zero which
should also be known to not lose information. I had a more general
version of this which allows equality compares if the casted float is
exactly representable in the integer, but I'm not 100% confident that
is always correct.
Also fold cases that aren't integers to true / false.
llvm-svn: 225265
We should reconsider this after having switched to imath (instead of gmp)
as the default isl backend, as this would allow us to keep a copy of isl
in the polly svn and to consequently make it easier to distribute Polly.
llvm-svn: 225262
Linux has 64k pages, so the old limit was only two pages. With ASLR the
initial sp might be right at the start of the second page, so the stack
will immediately grow down into the first page; and if you use all pages
of a limited stack then asan hits a kernel bug to do with how stack
guard pages are reported in /proc/self/maps:
http://lkml.iu.edu//hypermail/linux/kernel/1501.0/01025.html
We should still fix the underlying problems, but in the mean time this
patch makes the test work with 64k pages as well as it does with 4k
pages.
llvm-svn: 225261
Use this to test that path of invalidation. This test actually shows
redundant invalidation here that is really bad. I'm going to work on
fixing that next, but wanted to commit the test harness now that its all
working.
llvm-svn: 225257
Requires new AsmParserOperand types that detect 16-bit and 32/64-bit mode so that we choose the right instruction based on default sizing without predicates. This is necessary since predicates mess up the disassembler table building.
llvm-svn: 225256
Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing
arguments and return values when DataLayout says it is safe to do so.
llvm-svn: 225254
remove an extra, redundant pass manager wrapping every run.
I had kept seeing these when manually testing, but it was getting really
annoying and was going to cause problems with overly eager invalidation.
The root cause was an overly complex and unnecessary pile of code for
parsing the outer layer of the pass pipeline. We can instead delegate
most of this to the recursive pipeline parsing.
I've added some somewhat more basic and precise tests to catch this.
llvm-svn: 225253
Overall this seems simpler. It reduces duplication of patterns between both modes and it simplifies the memory folding/unfolding tables as they don't need to create fake instructions just to keep track of 64-bitness.
llvm-svn: 225252
Because of how Clang represents structs as arrays (at least on non-Darwin
platforms), and what SROA does, etc. this is no longer a problem.
llvm-svn: 225251