llvm-project

Commit Graph

Author	SHA1	Message	Date
Chandler Carruth	3779ac10b4	Cleanup and relax a restriction on the matching of global offsets into x86 addressing modes. This allows PIE-based TLS offsets to fit directly into an addressing mode immediate offset, which is the last remaining code quality issue from PR12380. With this patch, that PR is completely fixed. To understand why this patch is correct to match these offsets into addressing mode immediates, break it down by cases: 1) 32-bit is trivially correct, and unmodified here. 2) 64-bit non-small mode is unchanged and never matches. 3) 64-bit small PIC code which is RIP-relative is handled specially in the match to try to fit RIP into the base register. If it fails, it now early exits. This behavior is unchanged by the patch. 4) 64-bit small non-PIC code which is not RIP-relative continues to work as it did before. The reason these immediates are safe is because the ABI ensures they fit in small mode. This behavior is unchanged. 5) 64-bit small PIC code which is not using RIP-relative addressing. This is the only case changed by the patch, and the primary place you see it is in TLS, either the win64 section offset TLS or Linux local-exec TLS model in a PIC compilation. Here the ABI again ensures that the immediates fit because we are in small mode, and any other operations required due to the PIC relocation model have been handled externally to the Wrapper node (extra loads etc are made around the wrapper node in ISelLowering). I've tested this as much as I can comparing it with GCC's output, and everything appears safe. I discussed this with Anton and it made sense to him at least at face value. That said, if there are issues with PIC code after this patch, yell and we can revert it. llvm-svn: 154304	2012-04-09 02:13:06 +00:00
Chandler Carruth	84b834267e	Fold 15 tiny test cases into a single file that implements the comprehensive testing of TLS codegen for x86. Convert all of the ones that were still using grep to use FileCheck. Remove some redundancies between them. Perhaps most interestingly expand the test cases so that they actually fully list the instruction snippet being tested. TLS operations are very narrowly defined, and so these seem reasonably stable. More importantly, the existing test cases already were crazy fine grained, expecting specific registers to be allocated. This just clarifies that no other instructions are expected, and fills in some crucial gaps that weren't being tested at all. This will make any subsequent changes to TLS much more clear during review. llvm-svn: 154303	2012-04-09 01:43:17 +00:00
Nick Kledzik	467209b1d4	Remove definedAtomsBegin() and co. so that C++11 range based for loops can be used llvm-svn: 154302	2012-04-09 00:58:21 +00:00
Nick Kledzik	062a98cff0	Rename referencesBegin() to begin() so that C++11 range based for loops can be used llvm-svn: 154301	2012-04-08 23:52:13 +00:00
Craig Topper	6148fe65e8	Optimize code a bit. No functional change intended. llvm-svn: 154299	2012-04-08 23:15:04 +00:00
Chandler Carruth	097d019c6c	Wire up -fpie and -fPIE to LLVM's newly added TargetOptions. No test case as we don't currently have any way of dumping target options or otherwise observing this. Another small step toward fixing PR12380. With this we generate TLS accesses using the static model instead of the dynamic model, but we're still generating suboptimal code under the mistaken assumption that the TLS offset might be greater than 2^32, and therefor not viable as an immediate offset of a segment register. llvm-svn: 154298	2012-04-08 21:09:51 +00:00
Benjamin Kramer	bb6ff08766	Silence sign-compare warning. llvm-svn: 154297	2012-04-08 19:04:45 +00:00
Duncan Sands	2f1dc3814b	Only have codegen turn fdiv by a constant into fmul by the reciprocal when -ffast-math, i.e. don't just always do it if the reciprocal can be formed exactly. There is already an IR level transform that does that, and it does it more carefully. llvm-svn: 154296	2012-04-08 18:08:12 +00:00
Craig Topper	c8e2d91a58	Simplify code that tries to do vector extracts for shuffles when the mask width and the input vector widths don't match. No need to check the min and max are in range before calculating the start index. The range check after having the start index is sufficient. Also no need to check for an extract from the beginning differently. llvm-svn: 154295	2012-04-08 17:53:33 +00:00
Chandler Carruth	ede4a8aa2b	Teach LLVM about a PIE option which, when enabled on top of PIC, makes optimizations which are valid for position independent code being linked into a single executable, but not for such code being linked into a shared library. I discussed the design of this with Eric Christopher, and the decision was to support an optional bit rather than a completely separate relocation model. Fundamentally, this is still PIC relocation, its just that certain optimizations are only valid under a PIC relocation model when the resulting code won't be in a shared library. The simplest path to here is to expose a single bit option in the TargetOptions. If folks have different/better designs, I'm all ears. =] I've included the first optimization based upon this: changing TLS models to the *Exec models when PIE is enabled. This is the LLVM component of PR12380 and is all of the hard work. llvm-svn: 154294	2012-04-08 17:51:45 +00:00
Chandler Carruth	16f0ebcbb5	Move the TLSModel information into the TargetMachine rather than hiding in TargetLowering. There was already a FIXME about this location being odd. The interface is simplified as a consequence. This will also make it easier to change TLS models when compiling with PIE. llvm-svn: 154292	2012-04-08 17:20:55 +00:00
Chandler Carruth	c0c0455f55	Teach Clang about PIE compilations. This is the first step of PR12380. First, this patch cleans up the parsing of the PIC and PIE family of options in the driver. The existing logic failed to claim arguments all over the place resulting in kludges that marked the options as unused. Instead actually walk all of the arguments and claim them properly. We now treat -f{,no-}{pic,PIC,pie,PIE} as a single set, accepting the last one on the commandline. Previously there were lots of ordering bugs that could creep in due to the nature of the parsing. Let me know if folks would like weird things such as "-fPIE -fno-pic" to turn on PIE, but disable full PIC. This doesn't make any sense to me, but we could in theory support it. Options that seem to have intentional "trump" status (-static, -mkernel, etc) continue to do so and are commented as such. Next, a -pie-level flag is threaded into the frontend, rigged to a language option, and handled preprocessor, setting up the appropriate defines. We'll now have the correct defines when compiling with -fpie. The one place outside of the preprocessor that was inspecting the PIC level (as opposed to the relocation model, which is set and handled separately, yay!) is in the GNU ObjC runtime. I changed it to exactly preserve existing behavior. If folks want to change its behavior in the face of PIE, they can do that in a separate patch. Essentially the only functionality changed here is the preprocessor defines and bug-fixes to the argument management. Tests have been updated and extended to test all of this a bit more thoroughly. llvm-svn: 154291	2012-04-08 16:40:35 +00:00
Chandler Carruth	4e97337914	Rephrase the preprocessor test to directly use CC1 and not bother testing any of the strange driver behavior. We already have some tiny tests for the driver behavior, and I'm going to expand them greatly in the next commit. llvm-svn: 154290	2012-04-08 16:40:31 +00:00
Chandler Carruth	f4725b468e	FileCheck-ize this test. llvm-svn: 154289	2012-04-08 16:40:30 +00:00
Benjamin Kramer	25a3d816a6	EngineBuilder::create is expected to take ownership of the TargetMachine passed to it. Delete it on error or when we create an interpreter that doesn't need it. llvm-svn: 154288	2012-04-08 14:53:14 +00:00
Chandler Carruth	bed1abf9ca	Remove an over zealous assert. The assert was trying to catch places where a chain outside of the loop block-set ended up in the worklist for scheduling as part of the contiguous loop. However, asserting the first block in the chain is in the loop-set isn't a valid check -- we may be forced to drag a chain into the worklist due to one block in the chain being part of the loop even though the first block is not in the loop. This occurs when we have been forced to form a chain early due to un-analyzable branches. No test case here as I have no idea how to even begin reducing one, and it will be hopelessly fragile. We have to somehow end up with a loop header of an inner loop which is a successor of a basic block with an unanalyzable pair of branch instructions. Ow. Self-host triggers it so it is unlikely it will regress. This at least gets block placement back to passing selfhost and the test suite. There are still a lot of slowdown that I don't like coming out of block placement, although there are now also a lot of speedups. =[ I'm seeing swings in both directions up to 10%. I'm going to try to find time to dig into this and see if we can turn this on for 3.1 as it does a really good job of cleaning up after some loops that degraded with the inliner changes. llvm-svn: 154287	2012-04-08 14:37:02 +00:00
Chandler Carruth	49158908dc	Add a debug-only 'dump' method to the BlockChain structure to ease debugging. llvm-svn: 154286	2012-04-08 14:37:01 +00:00
Chandler Carruth	f82b0e2d29	Teach InstCombine to nuke a common alloca pattern -- an alloca which has GEPs, bit casts, and stores reaching it but no other instructions. These often show up during the iterative processing of the inliner, SROA, and DCE. Once we hit this point, we can completely remove the alloca. These were actually showing up in the final, fully optimized code in a bunch of inliner tests I've been working on, and notably they show up after LLVM finishes optimizing away all function calls involved in hash_combine(a, b). llvm-svn: 154285	2012-04-08 14:36:56 +00:00
Nadav Rotem	82609df647	AVX2: Build splat vectors by broadcasting a scalar from the constant pool. Previously we used three instructions to broadcast an immediate value into a vector register. On Sandybridge we continue to load the broadcasted value from the constant pool. llvm-svn: 154284	2012-04-08 12:54:54 +00:00
Bill Wendling	8c783d4122	Remove old 'grep' lines. llvm-svn: 154283	2012-04-08 11:53:54 +00:00
Bill Wendling	ccf1109040	Formatting changes. Don't put spaces in front of some code, which only makes it look 'off'. llvm-svn: 154282	2012-04-08 11:52:52 +00:00
Bill Wendling	57f8e5ebe4	FileCheckize these testcases. llvm-svn: 154281	2012-04-08 11:00:38 +00:00
Bill Wendling	5c0068f807	Remove the 'Parent' pointer from the MDNodeOperand class. An MDNode has a list of MDNodeOperands allocated directly after it as part of its allocation. Therefore, the Parent of the MDNodeOperands can be found by walking back through the operands to the beginning of that list. Mark the first operand's value pointer as being the 'first' operand so that we know where the beginning of said list is. This saves a lot of space during LTO with -O0 -g flags. llvm-svn: 154280	2012-04-08 10:20:49 +00:00
Bill Wendling	9b2503a006	Allow subclasses of the ValueHandleBase to store information as part of the value pointer by making the value pointer into a pointer-int pair with 2 bits available for flags. llvm-svn: 154279	2012-04-08 10:16:43 +00:00
Richard Smith	4051ff7650	Don't forget to evaluate the subexpression in a null pointer cast. If we're converting from std::nullptr_t, the subexpression might have side-effects. llvm-svn: 154278	2012-04-08 08:02:07 +00:00
Michael J. Spencer	d73a53f158	[docs] Add more open projects. llvm-svn: 154277	2012-04-08 03:47:49 +00:00
Michael J. Spencer	00d9e87cac	[docs] Add documentation todos. llvm-svn: 154276	2012-04-08 02:06:15 +00:00
Michael J. Spencer	d01c8fe7a5	[docs] Make the index page ReST based instead of html based. llvm-svn: 154275	2012-04-08 02:06:04 +00:00
Michael J. Spencer	f9bc125c5a	[docs] Add open projects page that includes the TODO.txt files. llvm-svn: 154274	2012-04-07 23:10:01 +00:00
Francois Pichet	7ebc4c1910	ext_reserved_user_defined_literal must not default to Error in MicrosoftMode. Hence create ext_ms_reserved_user_defined_literal that doesn't default to Error; otherwise MSVC headers won't parse. Fixes PR12383. llvm-svn: 154273	2012-04-07 23:09:23 +00:00
Craig Topper	d024cef233	Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1. llvm-svn: 154272	2012-04-07 22:32:29 +00:00
Simon Atanasyan	571d7bde3c	MIPS: Pass -mabi option to the assmbler when compile MIPS targets. llvm-svn: 154270	2012-04-07 22:31:29 +00:00
Simon Atanasyan	3b7589a6c3	MIPS: Move code calculates CPU and ABI names to the separate function to reuse this function later. llvm-svn: 154269	2012-04-07 22:09:23 +00:00
Craig Topper	aa9aab5ad2	Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns. llvm-svn: 154268	2012-04-07 21:57:43 +00:00
Craig Topper	e09d1c5c48	Remove 'else' after 'if' that ends in return. llvm-svn: 154267	2012-04-07 21:23:41 +00:00
Nadav Rotem	71d07ae5cb	1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new shuffle node because it could introduce new shuffle nodes that were not supported efficiently by the target. 2. Add a more restrictive shuffle-of-shuffle optimization for cases where the second shuffle reverses the transformation of the first shuffle. llvm-svn: 154266	2012-04-07 21:19:08 +00:00
Duncan Sands	5f8397a934	Convert floating point division by a constant into multiplication by the reciprocal if converting to the reciprocal is exact. Do it even if inexact if -ffast-math. This substantially speeds up ac.f90 from the polyhedron benchmarks. llvm-svn: 154265	2012-04-07 20:04:00 +00:00
Chandler Carruth	75a1cf327a	Perform partial SROA on the helper hashing structure. I really wish the optimizers could do this for us, but expecting partial SROA of classes with template methods through cloning is probably expecting too much heroics. With this change, the begin/end pointer pairs which indicate the status of each loop iteration are actually passed directly into each layer of the combine_data calls, and the inliner has a chance to see when most of the combine_data function could be deleted by inlining. Similarly for 'length'. We have to be careful to limit the places where in/out reference parameters are used as those will also defeat the inliner / optimizers from properly propagating constants. With this change, LLVM is able to fully inline and unroll the hash computation of small sets of values, such as two or three pointers. These now decompose into essentially straight-line code with no loops or function calls. There is still one code quality problem to be solved with the hashing -- LLVM is failing to nuke the alloca. It removes all loads from the alloca, leaving only lifetime intrinsics and dead(!!) stores to the alloca. =/ Very unfortunate. llvm-svn: 154264	2012-04-07 20:01:31 +00:00
Chandler Carruth	28192c9398	Fix ValueTracking to conclude that debug intrinsics are safe to speculate. Without this, loop rotate (among many other places) would suddenly stop working in the presence of debug info. I found this looking at loop rotate, and have augmented its tests with a reduction out of a very hot loop in yacr2 where failing to do this rotation costs sometimes more than 10% in runtime performance, perturbing numerous downstream optimizations. This should have no impact on performance without debug info, but the change in performance when debug info is enabled can be extreme. As a consequence (and this how I got to this yak) any profiling of performance problems should be treated with deep suspicion -- they may have been wildly innacurate of debug info was enabled for profiling. =/ Just a heads up. llvm-svn: 154263	2012-04-07 19:22:18 +00:00
Benjamin Kramer	e1f4ca1b0f	SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW. Found by inspection. llvm-svn: 154262	2012-04-07 17:19:26 +00:00
Bob Wilson	6f9be7e2c6	Fix Thumb __builtin_longjmp with integrated assembler. <rdar://problem/11203543> The tLDRr instruction with the last register operand set to the zero register prints in assembly as if no register was specified, and the assembler encodes it as a tLDRi instruction with a zero immediate. With the integrated assembler, that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which is broken. Emit the instruction as tLDRi with a zero immediate. I don't know if there's a good way to write a testcase for this. Suggestions welcome. Opportunities for follow-up work: 1) The asm printer should complain if a non-optional register operand is set to the zero register, instead of silently dropping it. 2) The integrated assembler should complain in the same situation, instead of silently emitting the operand as "r0". llvm-svn: 154261	2012-04-07 16:51:59 +00:00
Hongbin Zheng	ed986ab6a4	Rewritten expandRegion to clarify the intention and improve performance, patched by Johannes Doerfert <johannes@jdoerfert.de>. llvm-svn: 154260	2012-04-07 15:14:28 +00:00
Hongbin Zheng	3a2d6035d2	ScopDetection: Add some comments to function "expandRegion". llvm-svn: 154259	2012-04-07 12:29:27 +00:00
Hongbin Zheng	94868e6cc6	Speed up SCoP detection time by checking the exit of the region first, patched by Johannes Doerfert <johannes@jdoerfert.de>. llvm-svn: 154258	2012-04-07 12:29:17 +00:00
Benjamin Kramer	c2b5c67df3	Linux/ProcessMonitor: include sys/user.h for user_regs_struct and user_fpregs_struct. llvm-svn: 154255	2012-04-07 09:13:49 +00:00
NAKAMURA Takumi	a7d49883bb	[Cygwin] Work around to flush stdout in a thread, or stdout in threads won't be flushed at exit. llvm-svn: 154254	2012-04-07 06:59:28 +00:00
Jason Molenda	7a2f433367	Version bump to lldb-138. llvm-svn: 154252	2012-04-07 06:18:20 +00:00
Tobias Grosser	84ecc47e1c	CodeGen: Allow Polly to do 'grouped unrolling', but no vector generation. Grouped unrolling means that we unroll a loop such that the different instances of a certain statement are scheduled right after each other, but we do not generate any vector code. The idea here is that we can schedule the bb vectorizer right afterwards and use it heuristics to decide when vectorization should be performed. llvm-svn: 154251	2012-04-07 06:16:08 +00:00
Jason Molenda	b9e88d4186	Fix a integer trauction issue - calculating the current time in nanoseconds in 32-bit expression would cause pthread_cond_timedwait to time out immediately. Add explicit casts to the TimeValue::TimeValue ctor that takes a struct timeval and change the NanoSecsPerSec etc constants defined in TimeValue to be uint64_t so any other calculations involving these should be promoted to 64-bit even when lldb is built for 32-bit. <rdar://problem/11204073>, <rdar://problem/11179821>, <rdar://problem/11194705>. llvm-svn: 154250	2012-04-07 04:55:02 +00:00
Hongbin Zheng	5758f495da	Refactor: Use positive field names in VectorizeConfig. llvm-svn: 154249	2012-04-07 03:56:23 +00:00

1 2 3 4 5 ...

125552 Commits All Branches Search

125552 Commits

All Branches