This patch restores TwoAddressInstructionPass's pre-r153892 behaviour when
rescheduling instructions in TryInstructionTransform. Hopefully this will fix
PR12493. To refix PR11861, lowering of INSERT_SUBREGS is deferred until after
the copy that unties the operands is emitted (this seems to be a more
appropriate fix for that issue anyway).
llvm-svn: 154338
We currently want to look whether PWD is available - if PWD is available it will
get us the non-resolved current path, while fs::current_path will resolve
symlinks. The long term fix is to not rely on that behavior any more.
llvm-svn: 154330
Specifically, using a an integer outside [0, 1] as a boolean constant seems to
be an easy mistake to make with things like "x == a || b" where the author
intended "x == a || x == b".
The bug caused by calling SkipUntil with three token kinds was also identified
by a VC diagnostic & reported by Francois Pichet as review feedback for my
commit r154163. I've included test cases to verify the error recovery that was
broken/poorly implemented due to this bug.
The other fix (lib/Sema/SemaExpr.cpp) seems like that code was never actually
reached in any of Clang's tests & is related to Objective C features I'm not
familiar with, so I've not been able to construct a test case for it. Perhaps
someone else can.
llvm-svn: 154325
A couple of cases where we were accidentally creating constant conditions by
something like "x == a || b" instead of "x == a || x == b". In one case a
conditional & then unreachable was used - I transformed this into a direct
assert instead.
llvm-svn: 154324
x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.
To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
the match to try to fit RIP into the base register. If it fails, it
now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
as it did before. The reason these immediates are safe is because the
ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
This is the only case changed by the patch, and the primary place you
see it is in TLS, either the win64 section offset TLS or Linux
local-exec TLS model in a PIC compilation. Here the ABI again ensures
that the immediates fit because we are in small mode, and any other
operations required due to the PIC relocation model have been handled
externally to the Wrapper node (extra loads etc are made around the
wrapper node in ISelLowering).
I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.
llvm-svn: 154304
comprehensive testing of TLS codegen for x86. Convert all of the ones
that were still using grep to use FileCheck. Remove some redundancies
between them.
Perhaps most interestingly expand the test cases so that they actually
fully list the instruction snippet being tested. TLS operations are
*very* narrowly defined, and so these seem reasonably stable. More
importantly, the existing test cases already were crazy fine grained,
expecting specific registers to be allocated. This just clarifies that
no *other* instructions are expected, and fills in some crucial gaps
that weren't being tested at all.
This will make any subsequent changes to TLS much more clear during
review.
llvm-svn: 154303
case as we don't currently have any way of dumping target options or
otherwise observing this. Another small step toward fixing PR12380. With
this we generate TLS accesses using the static model instead of the
dynamic model, but we're still generating suboptimal code under the
mistaken assumption that the TLS offset might be greater than 2^32, and
therefor not viable as an immediate offset of a segment register.
llvm-svn: 154298
when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly. There is already an IR level transform that does
that, and it does it more carefully.
llvm-svn: 154296
optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.
I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, its just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]
I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.
llvm-svn: 154294
in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.
llvm-svn: 154292
First, this patch cleans up the parsing of the PIC and PIE family of
options in the driver. The existing logic failed to claim arguments all
over the place resulting in kludges that marked the options as unused.
Instead actually walk all of the arguments and claim them properly.
We now treat -f{,no-}{pic,PIC,pie,PIE} as a single set, accepting the
last one on the commandline. Previously there were lots of ordering bugs
that could creep in due to the nature of the parsing. Let me know if
folks would like weird things such as "-fPIE -fno-pic" to turn on PIE,
but disable full PIC. This doesn't make any sense to me, but we could in
theory support it.
Options that seem to have intentional "trump" status (-static, -mkernel,
etc) continue to do so and are commented as such.
Next, a -pie-level flag is threaded into the frontend, rigged to
a language option, and handled preprocessor, setting up the appropriate
defines. We'll now have the correct defines when compiling with -fpie.
The one place outside of the preprocessor that was inspecting the PIC
level (as opposed to the relocation model, which is set and handled
separately, yay!) is in the GNU ObjC runtime. I changed it to exactly
preserve existing behavior. If folks want to change its behavior in the
face of PIE, they can do that in a separate patch.
Essentially the only functionality changed here is the preprocessor
defines and bug-fixes to the argument management.
Tests have been updated and extended to test all of this a bit more
thoroughly.
llvm-svn: 154291
testing any of the strange driver behavior. We already have some tiny
tests for the driver behavior, and I'm going to expand them greatly in
the next commit.
llvm-svn: 154290
where a chain outside of the loop block-set ended up in the worklist for
scheduling as part of the contiguous loop. However, asserting the first
block in the chain is in the loop-set isn't a valid check -- we may be
forced to drag a chain into the worklist due to one block in the chain
being part of the loop even though the first block is *not* in the loop.
This occurs when we have been forced to form a chain early due to
un-analyzable branches.
No test case here as I have no idea how to even begin reducing one, and
it will be hopelessly fragile. We have to somehow end up with a loop
header of an inner loop which is a successor of a basic block with an
unanalyzable pair of branch instructions. Ow. Self-host triggers it so
it is unlikely it will regress.
This at least gets block placement back to passing selfhost and the test
suite. There are still a lot of slowdown that I don't like coming out of
block placement, although there are now also a lot of speedups. =[ I'm
seeing swings in both directions up to 10%. I'm going to try to find
time to dig into this and see if we can turn this on for 3.1 as it does
a really good job of cleaning up after some loops that degraded with the
inliner changes.
llvm-svn: 154287
GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).
llvm-svn: 154285
Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.
llvm-svn: 154284
An MDNode has a list of MDNodeOperands allocated directly after it as part of
its allocation. Therefore, the Parent of the MDNodeOperands can be found by
walking back through the operands to the beginning of that list. Mark the first
operand's value pointer as being the 'first' operand so that we know where the
beginning of said list is.
This saves a *lot* of space during LTO with -O0 -g flags.
llvm-svn: 154280