Commit Graph

125552 Commits

Author SHA1 Message Date
Chandler Carruth 3779ac10b4 Cleanup and relax a restriction on the matching of global offsets into
x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.

To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
   the match to try to fit RIP into the base register. If it fails, it
   now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
   as it did before. The reason these immediates are safe is because the
   ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
   This is the only case changed by the patch, and the primary place you
   see it is in TLS, either the win64 section offset TLS or Linux
   local-exec TLS model in a PIC compilation. Here the ABI again ensures
   that the immediates fit because we are in small mode, and any other
   operations required due to the PIC relocation model have been handled
   externally to the Wrapper node (extra loads etc are made around the
   wrapper node in ISelLowering).

I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.

llvm-svn: 154304
2012-04-09 02:13:06 +00:00
Chandler Carruth 84b834267e Fold 15 tiny test cases into a single file that implements the
comprehensive testing of TLS codegen for x86. Convert all of the ones
that were still using grep to use FileCheck. Remove some redundancies
between them.

Perhaps most interestingly expand the test cases so that they actually
fully list the instruction snippet being tested. TLS operations are
*very* narrowly defined, and so these seem reasonably stable. More
importantly, the existing test cases already were crazy fine grained,
expecting specific registers to be allocated. This just clarifies that
no *other* instructions are expected, and fills in some crucial gaps
that weren't being tested at all.

This will make any subsequent changes to TLS much more clear during
review.

llvm-svn: 154303
2012-04-09 01:43:17 +00:00
Nick Kledzik 467209b1d4 Remove definedAtomsBegin() and co. so that C++11 range based for loops can be used
llvm-svn: 154302
2012-04-09 00:58:21 +00:00
Nick Kledzik 062a98cff0 Rename referencesBegin() to begin() so that C++11 range based for loops can be used
llvm-svn: 154301
2012-04-08 23:52:13 +00:00
Craig Topper 6148fe65e8 Optimize code a bit. No functional change intended.
llvm-svn: 154299
2012-04-08 23:15:04 +00:00
Chandler Carruth 097d019c6c Wire up -fpie and -fPIE to LLVM's newly added TargetOptions. No test
case as we don't currently have any way of dumping target options or
otherwise observing this. Another small step toward fixing PR12380. With
this we generate TLS accesses using the static model instead of the
dynamic model, but we're still generating suboptimal code under the
mistaken assumption that the TLS offset might be greater than 2^32, and
therefor not viable as an immediate offset of a segment register.

llvm-svn: 154298
2012-04-08 21:09:51 +00:00
Benjamin Kramer bb6ff08766 Silence sign-compare warning.
llvm-svn: 154297
2012-04-08 19:04:45 +00:00
Duncan Sands 2f1dc3814b Only have codegen turn fdiv by a constant into fmul by the reciprocal
when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly.  There is already an IR level transform that does
that, and it does it more carefully.

llvm-svn: 154296
2012-04-08 18:08:12 +00:00
Craig Topper c8e2d91a58 Simplify code that tries to do vector extracts for shuffles when the mask width and the input vector widths don't match. No need to check the min and max are in range before calculating the start index. The range check after having the start index is sufficient. Also no need to check for an extract from the beginning differently.
llvm-svn: 154295
2012-04-08 17:53:33 +00:00
Chandler Carruth ede4a8aa2b Teach LLVM about a PIE option which, when enabled on top of PIC, makes
optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.

I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, its just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]

I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.

llvm-svn: 154294
2012-04-08 17:51:45 +00:00
Chandler Carruth 16f0ebcbb5 Move the TLSModel information into the TargetMachine rather than hiding
in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.

llvm-svn: 154292
2012-04-08 17:20:55 +00:00
Chandler Carruth c0c0455f55 Teach Clang about PIE compilations. This is the first step of PR12380.
First, this patch cleans up the parsing of the PIC and PIE family of
options in the driver. The existing logic failed to claim arguments all
over the place resulting in kludges that marked the options as unused.
Instead actually walk all of the arguments and claim them properly.

We now treat -f{,no-}{pic,PIC,pie,PIE} as a single set, accepting the
last one on the commandline. Previously there were lots of ordering bugs
that could creep in due to the nature of the parsing. Let me know if
folks would like weird things such as "-fPIE -fno-pic" to turn on PIE,
but disable full PIC. This doesn't make any sense to me, but we could in
theory support it.

Options that seem to have intentional "trump" status (-static, -mkernel,
etc) continue to do so and are commented as such.

Next, a -pie-level flag is threaded into the frontend, rigged to
a language option, and handled preprocessor, setting up the appropriate
defines. We'll now have the correct defines when compiling with -fpie.

The one place outside of the preprocessor that was inspecting the PIC
level (as opposed to the relocation model, which is set and handled
separately, yay!) is in the GNU ObjC runtime. I changed it to exactly
preserve existing behavior. If folks want to change its behavior in the
face of PIE, they can do that in a separate patch.

Essentially the only functionality changed here is the preprocessor
defines and bug-fixes to the argument management.

Tests have been updated and extended to test all of this a bit more
thoroughly.

llvm-svn: 154291
2012-04-08 16:40:35 +00:00
Chandler Carruth 4e97337914 Rephrase the preprocessor test to directly use CC1 and not bother
testing any of the strange driver behavior. We already have some tiny
tests for the driver behavior, and I'm going to expand them greatly in
the next commit.

llvm-svn: 154290
2012-04-08 16:40:31 +00:00
Chandler Carruth f4725b468e FileCheck-ize this test.
llvm-svn: 154289
2012-04-08 16:40:30 +00:00
Benjamin Kramer 25a3d816a6 EngineBuilder::create is expected to take ownership of the TargetMachine passed to it. Delete it on error or when we create an interpreter that doesn't need it.
llvm-svn: 154288
2012-04-08 14:53:14 +00:00
Chandler Carruth bed1abf9ca Remove an over zealous assert. The assert was trying to catch places
where a chain outside of the loop block-set ended up in the worklist for
scheduling as part of the contiguous loop. However, asserting the first
block in the chain is in the loop-set isn't a valid check -- we may be
forced to drag a chain into the worklist due to one block in the chain
being part of the loop even though the first block is *not* in the loop.
This occurs when we have been forced to form a chain early due to
un-analyzable branches.

No test case here as I have no idea how to even begin reducing one, and
it will be hopelessly fragile. We have to somehow end up with a loop
header of an inner loop which is a successor of a basic block with an
unanalyzable pair of branch instructions. Ow. Self-host triggers it so
it is unlikely it will regress.

This at least gets block placement back to passing selfhost and the test
suite. There are still a lot of slowdown that I don't like coming out of
block placement, although there are now also a lot of speedups. =[ I'm
seeing swings in both directions up to 10%. I'm going to try to find
time to dig into this and see if we can turn this on for 3.1 as it does
a really good job of cleaning up after some loops that degraded with the
inliner changes.

llvm-svn: 154287
2012-04-08 14:37:02 +00:00
Chandler Carruth 49158908dc Add a debug-only 'dump' method to the BlockChain structure to ease
debugging.

llvm-svn: 154286
2012-04-08 14:37:01 +00:00
Chandler Carruth f82b0e2d29 Teach InstCombine to nuke a common alloca pattern -- an alloca which has
GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).

llvm-svn: 154285
2012-04-08 14:36:56 +00:00
Nadav Rotem 82609df647 AVX2: Build splat vectors by broadcasting a scalar from the constant pool.
Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284
2012-04-08 12:54:54 +00:00
Bill Wendling 8c783d4122 Remove old 'grep' lines.
llvm-svn: 154283
2012-04-08 11:53:54 +00:00
Bill Wendling ccf1109040 Formatting changes. Don't put spaces in front of some code, which only makes it look 'off'.
llvm-svn: 154282
2012-04-08 11:52:52 +00:00
Bill Wendling 57f8e5ebe4 FileCheckize these testcases.
llvm-svn: 154281
2012-04-08 11:00:38 +00:00
Bill Wendling 5c0068f807 Remove the 'Parent' pointer from the MDNodeOperand class.
An MDNode has a list of MDNodeOperands allocated directly after it as part of
its allocation. Therefore, the Parent of the MDNodeOperands can be found by
walking back through the operands to the beginning of that list. Mark the first
operand's value pointer as being the 'first' operand so that we know where the
beginning of said list is.

This saves a *lot* of space during LTO with -O0 -g flags.

llvm-svn: 154280
2012-04-08 10:20:49 +00:00
Bill Wendling 9b2503a006 Allow subclasses of the ValueHandleBase to store information as part of the
value pointer by making the value pointer into a pointer-int pair with 2 bits
available for flags.

llvm-svn: 154279
2012-04-08 10:16:43 +00:00
Richard Smith 4051ff7650 Don't forget to evaluate the subexpression in a null pointer cast. If we're
converting from std::nullptr_t, the subexpression might have side-effects.

llvm-svn: 154278
2012-04-08 08:02:07 +00:00
Michael J. Spencer d73a53f158 [docs] Add more open projects.
llvm-svn: 154277
2012-04-08 03:47:49 +00:00
Michael J. Spencer 00d9e87cac [docs] Add documentation todos.
llvm-svn: 154276
2012-04-08 02:06:15 +00:00
Michael J. Spencer d01c8fe7a5 [docs] Make the index page ReST based instead of html based.
llvm-svn: 154275
2012-04-08 02:06:04 +00:00
Michael J. Spencer f9bc125c5a [docs] Add open projects page that includes the TODO.txt files.
llvm-svn: 154274
2012-04-07 23:10:01 +00:00
Francois Pichet 7ebc4c1910 ext_reserved_user_defined_literal must not default to Error in MicrosoftMode. Hence create ext_ms_reserved_user_defined_literal that doesn't default to Error; otherwise MSVC headers won't parse.
Fixes PR12383.

llvm-svn: 154273
2012-04-07 23:09:23 +00:00
Craig Topper d024cef233 Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1.
llvm-svn: 154272
2012-04-07 22:32:29 +00:00
Simon Atanasyan 571d7bde3c MIPS: Pass -mabi option to the assmbler when compile MIPS targets.
llvm-svn: 154270
2012-04-07 22:31:29 +00:00
Simon Atanasyan 3b7589a6c3 MIPS: Move code calculates CPU and ABI names to the separate function to reuse this function later.
llvm-svn: 154269
2012-04-07 22:09:23 +00:00
Craig Topper aa9aab5ad2 Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.
llvm-svn: 154268
2012-04-07 21:57:43 +00:00
Craig Topper e09d1c5c48 Remove 'else' after 'if' that ends in return.
llvm-svn: 154267
2012-04-07 21:23:41 +00:00
Nadav Rotem 71d07ae5cb 1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new
shuffle node because it could introduce new shuffle nodes that were not
   supported efficiently by the target.

2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
   second shuffle reverses the transformation of the first shuffle.

llvm-svn: 154266
2012-04-07 21:19:08 +00:00
Duncan Sands 5f8397a934 Convert floating point division by a constant into multiplication by the
reciprocal if converting to the reciprocal is exact.  Do it even if inexact
if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
benchmarks.

llvm-svn: 154265
2012-04-07 20:04:00 +00:00
Chandler Carruth 75a1cf327a Perform partial SROA on the helper hashing structure. I really wish the
optimizers could do this for us, but expecting partial SROA of classes
with template methods through cloning is probably expecting too much
heroics. With this change, the begin/end pointer pairs which indicate
the status of each loop iteration are actually passed directly into each
layer of the combine_data calls, and the inliner has a chance to see
when most of the combine_data function could be deleted by inlining.
Similarly for 'length'.

We have to be careful to limit the places where in/out reference
parameters are used as those will also defeat the inliner / optimizers
from properly propagating constants.

With this change, LLVM is able to fully inline and unroll the hash
computation of small sets of values, such as two or three pointers.
These now decompose into essentially straight-line code with no loops or
function calls.

There is still one code quality problem to be solved with the hashing --
LLVM is failing to nuke the alloca. It removes all loads from the
alloca, leaving only lifetime intrinsics and dead(!!) stores to the
alloca. =/ Very unfortunate.

llvm-svn: 154264
2012-04-07 20:01:31 +00:00
Chandler Carruth 28192c9398 Fix ValueTracking to conclude that debug intrinsics are safe to
speculate. Without this, loop rotate (among many other places) would
suddenly stop working in the presence of debug info. I found this
looking at loop rotate, and have augmented its tests with a reduction
out of a very hot loop in yacr2 where failing to do this rotation costs
sometimes more than 10% in runtime performance, perturbing numerous
downstream optimizations.

This should have no impact on performance without debug info, but the
change in performance when debug info is enabled can be extreme. As
a consequence (and this how I got to this yak) any profiling of
performance problems should be treated with deep suspicion -- they may
have been wildly innacurate of debug info was enabled for profiling. =/
Just a heads up.

llvm-svn: 154263
2012-04-07 19:22:18 +00:00
Benjamin Kramer e1f4ca1b0f SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW.
Found by inspection.

llvm-svn: 154262
2012-04-07 17:19:26 +00:00
Bob Wilson 6f9be7e2c6 Fix Thumb __builtin_longjmp with integrated assembler. <rdar://problem/11203543>
The tLDRr instruction with the last register operand set to the zero register
prints in assembly as if no register was specified, and the assembler encodes
it as a tLDRi instruction with a zero immediate.  With the integrated assembler,
that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which
is broken.  Emit the instruction as tLDRi with a zero immediate.  I don't
know if there's a good way to write a testcase for this.  Suggestions welcome.

Opportunities for follow-up work:
1) The asm printer should complain if a non-optional register operand is set
   to the zero register, instead of silently dropping it.
2) The integrated assembler should complain in the same situation, instead of
   silently emitting the operand as "r0".

llvm-svn: 154261
2012-04-07 16:51:59 +00:00
Hongbin Zheng ed986ab6a4 Rewritten expandRegion to clarify the intention and improve
performance, patched by Johannes Doerfert <johannes@jdoerfert.de>.

llvm-svn: 154260
2012-04-07 15:14:28 +00:00
Hongbin Zheng 3a2d6035d2 ScopDetection: Add some comments to function "expandRegion".
llvm-svn: 154259
2012-04-07 12:29:27 +00:00
Hongbin Zheng 94868e6cc6 Speed up SCoP detection time by checking the exit of the region first,
patched by Johannes Doerfert <johannes@jdoerfert.de>.

llvm-svn: 154258
2012-04-07 12:29:17 +00:00
Benjamin Kramer c2b5c67df3 Linux/ProcessMonitor: include sys/user.h for user_regs_struct and user_fpregs_struct.
llvm-svn: 154255
2012-04-07 09:13:49 +00:00
NAKAMURA Takumi a7d49883bb [Cygwin] Work around to flush stdout in a thread, or stdout in threads won't be flushed at exit.
llvm-svn: 154254
2012-04-07 06:59:28 +00:00
Jason Molenda 7a2f433367 Version bump to lldb-138.
llvm-svn: 154252
2012-04-07 06:18:20 +00:00
Tobias Grosser 84ecc47e1c CodeGen: Allow Polly to do 'grouped unrolling', but no vector generation.
Grouped unrolling means that we unroll a loop such that the different instances
of a certain statement are scheduled right after each other, but we do
not generate any vector code. The idea here is that we can schedule the
bb vectorizer right afterwards and use it heuristics to decide when
vectorization should be performed.

llvm-svn: 154251
2012-04-07 06:16:08 +00:00
Jason Molenda b9e88d4186 Fix a integer trauction issue - calculating the current time in
nanoseconds in 32-bit expression would cause pthread_cond_timedwait
to time out immediately.  Add explicit casts to the TimeValue::TimeValue
ctor that takes a struct timeval and change the NanoSecsPerSec etc
constants defined in TimeValue to be uint64_t so any other calculations
involving these should be promoted to 64-bit even when lldb is built
for 32-bit.

<rdar://problem/11204073>, <rdar://problem/11179821>, <rdar://problem/11194705>.

llvm-svn: 154250
2012-04-07 04:55:02 +00:00
Hongbin Zheng 5758f495da Refactor: Use positive field names in VectorizeConfig.
llvm-svn: 154249
2012-04-07 03:56:23 +00:00