llvm-project

Commit Graph

Author	SHA1	Message	Date
Chandler Carruth	10f28f26fd	[PM] Replace the Pass argument in MergeBasicBlockIntoOnlyPred with a DominatorTree argument as that is the analysis that it wants to update. This removes the last non-loop utility function in Utils/ which accepts a raw Pass argument. llvm-svn: 226537	2015-01-20 01:37:09 +00:00
Adrian Prantl	5883af3faa	Remove support for DIVariable's FlagIndirectVariable and expect frontends to use a DIExpression with a DW_OP_deref instead. This is not only a much more natural place for this information; there is also a technical reason: The FlagIndirectVariable is used to mark a variable that is turned into a reference by virtue of the calling convention; this happens for example to aggregate return values. The inliner, for example, may actually need to undo this indirection to correctly represent the value in its new context. This is impossible to implement because the DIVariable can't be safely modified. We can however safely construct a new DIExpression on the fly. llvm-svn: 226476	2015-01-19 17:57:29 +00:00
Rafael Espindola	12ca34f53f	Bring r226038 back. No change in this commit, but clang was changed to also produce trivial comdats when needed. Original message: Don't create new comdats in CodeGen. This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848. llvm-svn: 226467	2015-01-19 15:16:06 +00:00
Chandler Carruth	37df2cfbf8	[PM] Remove the Pass argument from all of the critical edge splitting APIs and replace it and numerous booleans with an option struct. The critical edge splitting API has a really large surface of flags and so it seems worth burning a small option struct / builder. This struct can be constructed with the various preserved analyses and then flags can be flipped in a builder style. The various users are now responsible for directly passing along their analysis information. This should be enough for the critical edge splitting to work cleanly with the new pass manager as well. This API is still pretty crufty and could be cleaned up a lot, but I've focused on this change just threading an option struct rather than a pass through the API. llvm-svn: 226456	2015-01-19 12:09:11 +00:00
Michael Kuperstein	54c61edee7	[MIScheduler] Slightly better handling of constrainLocalCopy when both source and dest are local This fixes PR21792. Differential Revision: http://reviews.llvm.org/D6823 llvm-svn: 226433	2015-01-19 07:30:47 +00:00
David Blaikie	9459832ebd	std::unique_ptrify the MCStreamer argument to createAsmPrinter llvm-svn: 226414	2015-01-18 20:29:04 +00:00
Mehdi Amini	37f316afaf	Improve DAG combine pass on certain IR vector patterns Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector produced suboptimal code in AVX2 mode with certain IR combinations. In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32 (undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical, but then mysteriously generated rather bad code; the movq/movhpd combination didn't match. The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs would get promoted to 4f32 by the type legalizer, eventually resulting in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing these were both half the output size, concatted them and then produced a shuffle. However, the resulting concat + shuffle was more complex than it should be; in the case where the upper half of the output is undef, we probably want to generate shuffle + concat instead. This enhancement causes the vector_shuffle combine step to recognize this suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR in case the same suboptimal pattern occurs for other reasons. This results in the optimizer correctly producing the optimal movq + movhpd sequence for all three variations on this IR, even with AVX2. I've included a test case. Radar link: rdar://problem/19287012 Fix for PR 21943. From: Fiona Glaser <fglaser@apple.com> llvm-svn: 226360	2015-01-17 01:35:56 +00:00
Matthias Braun	7618b2b23d	RegisterCoalescer: Cleanup and improved comment for a subtle detail. llvm-svn: 226353	2015-01-17 00:33:13 +00:00
Matthias Braun	0eb940aed0	RegisterCoalescer: Cleanup by factoring out a common expression llvm-svn: 226352	2015-01-17 00:33:11 +00:00
Matthias Braun	e2fa081615	RegisterCoalescer: Cleanup comment style - Consistently put comments above the function declaration, not the definition. To achieve this some duplicate comments got merged and some comment parts describing implementation details got moved into their functions. - Consistently use doxygen comments above functions. - Do not use doxygen comments inside functions. llvm-svn: 226351	2015-01-17 00:33:09 +00:00
Matthias Braun	fc6ef3a270	RegisterCoalescer: Drive-by typo + whitespace fix llvm-svn: 226350	2015-01-17 00:33:06 +00:00
Philip Reames	287987ca13	Update a comment Be a bit more explicit about the fact that addrspace(1) is not reserved. llvm-svn: 226344	2015-01-16 23:21:07 +00:00
Philip Reames	36319538d0	clang-format all the GC related files (NFC) Nothing interesting here... llvm-svn: 226342	2015-01-16 23:16:12 +00:00
Philip Reames	2b45395876	Move ownership of GCStrategy objects to LLVMContext Note: This change ended up being slightly more controversial than expected. Chandler has tentatively okayed this for the moment, but I may be revisiting this in the near future after we settle some high level questions. Rather than have the GCStrategy object owned by the GCModuleInfo - which is an immutable analysis pass used mainly by gc.root - have it be owned by the LLVMContext. This simplifies the ownership logic (i.e. can you have two instances of the same strategy at once?), but more importantly, allows us to access the GCStrategy in the middle end optimizer. To this end, I add an accessor through Function which becomes the canonical way to get at a GCStrategy instance. In the near future, this will allow me to move some of the checks from http://reviews.llvm.org/D6808 into the Verifier itself, and to introduce optimization legality predicates for some of the recent additions to InstCombine. (These will follow as separate changes.) Differential Revision: http://reviews.llvm.org/D6811 llvm-svn: 226311	2015-01-16 20:07:33 +00:00
Philip Reames	7de640a876	Remove gc.root's findCustomSafePoints mechanism Searching all of the existing gc.root implementations I'm aware of (all three of them), there was exactly one use of this mechanism, and that was to implement a performance improvement that should have been applied to the default lowering. Having this function requires a dependency on a CodeGen class (MachineFunction), in a class which is otherwise completely independent of CodeGen. I could solve this differently, but given that I see absolutely no value in preserving this mechanism, I'm going to just get rid of it. Note: This is the first time I'm intentionally breaking previously supported gc.root functionality. Given 3.6 has branched, I believe this is a good time to do this. Differential Revision: http://reviews.llvm.org/D7004 llvm-svn: 226305	2015-01-16 19:33:28 +00:00
Timur Iskhodzhanov	60b721363c	Revert r226242 - Revert Revert Don't create new comdats in CodeGen This breaks AddressSanitizer (ninja check-asan) on Windows llvm-svn: 226251	2015-01-16 08:38:45 +00:00
Rafael Espindola	67a79e72f5	Revert "Revert Don't create new comdats in CodeGen" This reverts commit r226173, adding r226038 back. No change in this commit, but clang was changed to also produce trivial comdats for constructors, destructors and vtables when needed. Original message: Don't create new comdats in CodeGen. This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848. llvm-svn: 226242	2015-01-16 02:22:55 +00:00
Hal Finkel	5ef58eb86d	Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"" Reapply r226071 with fixes. Two fixes: 1. We need to manually remove the old and create the new 'dead defs' associated with physical register definitions when we move the definition of the physical register from the copy point to the point of the original vreg def. This problem was picked up by the machinstr verifier, and could trigger a verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've turned on the verifier in the tests. 2. When moving the def point of the phys reg up, we need to make sure that it is neither defined nor read in between the two instructions. We don't, however, extend the live ranges of phys reg defs to cover uses, so just checking for live-range overlap between the pair interval and the phys reg aliases won't pick up reads. As a result, we manually iterate over the range and check for reads. A test soon to be committed to the PowerPC backend will test this change. Original commit message: [RegisterCoalescer] Remove copies to reserved registers This allows the RegisterCoalescer to join "non-flipped" range pairs with a physical destination register -- which allows the RegisterCoalescer to remove copies like this: <vreg> = something (maybe a load, for example) ... (things that don't use PHYSREG) PHYSREG = COPY <vreg> (with all of the restrictions normally applied by the RegisterCoalescer: having compatible register classes, etc. ) Previously, the RegisterCoalescer handled only the opposite case (copying from a physical register). I don't handle the problem fully here, but try to get the common case where there is only one use of <vreg> (the COPY). An upcoming commit to the PowerPC backend will make this pattern much more common on PPC64/ELF systems. llvm-svn: 226200	2015-01-15 20:32:09 +00:00
Philip Reames	66c9fb0d52	Style cleanup of old gc.root lowering code Use static functions for helpers rather than static member functions. a) this changes the linking (minor at best), and b) this makes it obvious no object state is involved. llvm-svn: 226198	2015-01-15 19:49:25 +00:00
Philip Reames	b87144160e	clang-format GCStrategy.cpp & GCRootLowering.cpp (NFC) llvm-svn: 226196	2015-01-15 19:39:17 +00:00
Philip Reames	f27f373895	Split GCStrategy.cpp into two files (NFC) This is preparation for an update to http://reviews.llvm.org/D6811. GCStrategy.cpp will hopefully be moving into IR/, whereas the lowering logic needs to stay in CodeGen/ llvm-svn: 226195	2015-01-15 19:29:42 +00:00
Timur Iskhodzhanov	f5adf13fac	Revert Don't create new comdats in CodeGen It breaks AddressSanitizer on Windows. llvm-svn: 226173	2015-01-15 16:14:34 +00:00
Mehdi Amini	fa546b29a0	Fix SelectionDAG -view-*-dags filtering llvm-svn: 226163	2015-01-15 12:03:32 +00:00
Alexander Kornienko	8c0809c7f8	Replace size method call of containers to empty method where appropriate This patch was generated by a clang tidy checker that is being open sourced. The documentation of that checker is the following: /// The emptiness of a container should be checked using the empty method /// instead of the size method. It is not guaranteed that size is a /// constant-time function, and it is generally more efficient and also shows /// clearer intent to use empty. Furthermore some containers may implement the /// empty method but not implement the size method. Using empty whenever /// possible makes it easier to switch to another container in the future. Patch by Gábor Horváth! llvm-svn: 226161	2015-01-15 11:41:30 +00:00
Chandler Carruth	b98f63dbdb	[PM] Separate the TargetLibraryInfo object from the immutable pass. The pass is really just a means of accessing a cached instance of the TargetLibraryInfo object, and this way we can re-use that object for the new pass manager as its result. Lots of delta, but nothing interesting happening here. This is the common pattern that is developing to allow analyses to live in both the old and new pass manager -- a wrapper pass in the old pass manager emulates the separation intrinsic to the new pass manager between the result and pass for analyses. llvm-svn: 226157	2015-01-15 10:41:28 +00:00
Hal Finkel	dd669615dd	Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers" Reverting this while I investigate some bad behavior this is causing. As a possibly-related issue, adding -verify-machineinstrs to one of the test cases now fails because of this change: llc test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll -march=x86-64 -o - -verify-machineinstrs * Bad machine code: No instruction at def index * - function: foo - basic block: BB#0 return (0x10007e21f10) [0B;736B) - liverange: [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0) 0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r - register: %DS Valno #3 is defined at 624r * Bad machine code: Live segment doesn't end at a valid instruction * - function: foo - basic block: BB#0 return (0x10007e21f10) [0B;736B) - liverange: [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0) 0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r - register: %DS [624r,624d:3) LLVM ERROR: Found 2 machine code errors. where 624r corresponds exactly to the interval combining change: 624B %RSP<def> = COPY %vreg16; GR64:%vreg16 Considering merging %vreg16 with %RSP RHS = %vreg16 [608r,624r:0) 0@608r updated: 608B %RSP<def> = MOV64rm <fi#3>, 1, %noreg, 0, %noreg; mem:LD8[%saved_stack.1] Success: %vreg16 -> %RSP Result = %RSP llvm-svn: 226086	2015-01-15 03:08:59 +00:00
Chandler Carruth	62d4215baa	[PM] Move TargetLibraryInfo into the Analysis library. While the term "Target" is in the name, it doesn't really have to do with the LLVM Target library -- this isn't an abstraction which LLVM targets generally need to implement or extend. It has much more to do with modeling the various runtime libraries on different OSes and with different runtime environments. The "target" in this sense is the more general sense of a target of cross compilation. This is in preparation for porting this analysis to the new pass manager. No functionality changed, and updates inbound for Clang and Polly. llvm-svn: 226078	2015-01-15 02:16:27 +00:00
NAKAMURA Takumi	95b3880dd0	Win64Exception.cpp: Try to fix crash for x64 EH. "Per" might be null there. llvm-svn: 226077	2015-01-15 02:15:21 +00:00
Hal Finkel	8299646236	[RegisterCoalescer] Remove copies to reserved registers This allows the RegisterCoalescer to join "non-flipped" range pairs with a physical destination register -- which allows the RegisterCoalescer to remove copies like this: <vreg> = something (maybe a load, for example) ... (things that don't use PHYSREG) PHYSREG = COPY <vreg> (with all of the restrictions normally applied by the RegisterCoalescer: having compatible register classes, etc. ) Previously, the RegisterCoalescer handled only the opposite case (copying from a physical register). I don't handle the problem fully here, but try to get the common case where there is only one use of <vreg> (the COPY). An upcoming commit to the PowerPC backend will make this pattern much more common on PPC64/ELF systems. llvm-svn: 226071	2015-01-15 01:25:28 +00:00
Ramkumar Ramachandra	dba7329ebb	[GC] CodeGenPrep transform: simplify offsetable relocate The transform is somewhat involved, but the basic idea is simple: find derived pointers that have been offset from the base pointer using gep and replace the relocate of the derived pointer with a gep to the relocated base pointer (with the same offset). llvm-svn: 226060	2015-01-14 23:27:07 +00:00
Reid Kleckner	e80a0a7572	Use MMI->getPersonality() instead of MMI->getPersonalities()[MMI->getPersonalityIndex()] Also nuke the comment about supporting multiple personalities in a single function, aka PR1414. That's just crazy. llvm-svn: 226052	2015-01-14 22:47:54 +00:00
Matthias Braun	96a319588a	MachineVerifier: Allow undef reads if a matching superreg is defined. Summary: Some pseudo instruction expansions break down a wide register use into multiple uses of smaller sub registers. If the super register was partially undefined the broken down sub registers may be completely undefined now leading to MachineVerifier complaints. Unfortunately liveness information to add the required dead flags is not easily (cheaply) available when expanding pseudo instructions. This commit changes the verifier to be quiet if there is an additional implicit use of a super register. Pseudo instruction expanders can use this to mark cases where partially defined values get potentially broken into completely undefined ones. Differential Revision: http://reviews.llvm.org/D6973 llvm-svn: 226047	2015-01-14 22:25:14 +00:00
Rafael Espindola	fad1639a12	Don't create new comdats in CodeGen. This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848. llvm-svn: 226038	2015-01-14 20:55:48 +00:00
Chandler Carruth	e3288147f0	[MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement. Some benchmarks have shown that this could lead to a potential performance benefit, and so adding some flags to try to help measure the difference. A possible explanation. In diamond-shaped CFGs (A followed by either B or C both followed by D), putting B and C both in between A and D leads to the code being less dense than it could be. Always either B or C have to be skipped increasing the chance of cache misses etc. Moving either B or C to after D might be beneficial on average. In the long run, we should probably do a better job of analyzing the basic block and branch probabilities to move the correct one of B or C to after D. But even if we don't use this in the long run, it is a good baseline for benchmarking. Original patch authored by Daniel Jasper with test tweaks and a second flag added by me. Differential Revision: http://reviews.llvm.org/D6969 llvm-svn: 226034	2015-01-14 20:19:29 +00:00
Reid Kleckner	9b5eaf0d5a	Emit the Itanium LSDA for unknown EH personalities on Win64 This fixes lots of generic CodeGen tests that use __gcc_personality_v0. This suggests that using ExceptionHandling::MSVC was a mistake, and we should instead classify each function by personality function. This would, for example, allow us to LTO a binary containing uses of SEH and Itanium EH. llvm-svn: 226019	2015-01-14 18:50:10 +00:00
Reid Kleckner	b57c1dc0f7	Remove dead code for llvm.eh.selector in the old EH model llvm-svn: 226018	2015-01-14 18:49:39 +00:00
Chandler Carruth	d9903888d9	[cleanup] Re-sort all the #include lines in LLVM using utils/sort_includes.py. I clearly haven't done this in a while, so more changed than usual. This even uncovered a missing include from the InstrProf library that I've added. No functionality changed here, just mechanical cleanup of the include order. llvm-svn: 225974	2015-01-14 11:23:27 +00:00
Mehdi Amini	d8976b8ed3	SelectionDAG: add a -filter-view-dags option to llc This option takes the name of the basic block you want to visualize with -view-*-dags Differential Revision: http://reviews.llvm.org/D6948 llvm-svn: 225953	2015-01-14 06:03:18 +00:00
Mehdi Amini	648eff1695	DAG Combiner: Fold SelectCC When Cond is UNDEF In case folding a node ends up with a NaN as operand for the select, the folding of the condition of the selectcc node returns "UNDEF". Differential Revision: http://reviews.llvm.org/D6889 llvm-svn: 225952	2015-01-14 05:45:24 +00:00
Mehdi Amini	7b068f6ba4	Add assertions for out of bound index in ComputeLinearIndex llvm-svn: 225951	2015-01-14 05:38:48 +00:00
Mehdi Amini	8923cc5470	Fold a loop for array processing in ComputeLinearIndex When processing an array, every Elt has the same layout, it is useless to recursively call each ComputeLinearIndex on each element. Just do it once and multiply by the number of elements. Differential Revision: http://reviews.llvm.org/D6832 llvm-svn: 225949	2015-01-14 05:33:01 +00:00
JF Bastien	eeea8970b4	Revert "Insert random noops to increase security against ROP attacks (llvm)" This reverts commit: http://reviews.llvm.org/D3392 llvm-svn: 225948	2015-01-14 05:24:33 +00:00
Matt Arsenault	bd22342322	Implement new way of expanding extloads. Now that the source and destination types can be specified, allow doing an expansion that doesn't use an EXTLOAD of the result type. Try to do a legal extload to an intermediate type and extend that if possible. This generalizes the special case custom lowering of extloads R600 has been using to work around this problem. This also happens to fix a bug that would incorrectly use more aligned loads than should be used. llvm-svn: 225925	2015-01-14 01:35:17 +00:00
JF Bastien	dcdd5ad252	Insert random noops to increase security against ROP attacks (llvm) A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks. Command line options: -noop-insertion // Enable noop insertion. -noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion) -max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions. In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future). -fdiversify This is the llvm part of the patch. clang part: D3393 http://reviews.llvm.org/D3392 Patch by Stephen Crane (@rinon) llvm-svn: 225908	2015-01-14 01:07:26 +00:00
Hal Finkel	665026838b	Adjust ScheduleDAGSDNodes::RegDefIter for patchpoints PATCHPOINT is a strange pseudo-instruction. Depending on how it is used, and whether or not the AnyReg calling convention is being used, it might or might not define a value. However, its TableGen definition says that it defines one value, and so when it doesn't, the code in ScheduleDAGSDNodes::RegDefIter becomes confused and the code that uses the RegDefIter will try to get the register class of the MVT::Other type associated with the PATCHPOINT's chain result (under certain circumstances). This will be covered by the PPC64 PatchPoint test cases once that support is re-committed. llvm-svn: 225907	2015-01-14 01:07:03 +00:00
Reid Kleckner	0a57f65514	CodeGen support for x86_64 SEH catch handlers in LLVM This adds handling for ExceptionHandling::MSVC, used by the x86_64-pc-windows-msvc triple. It assumes that filter functions have already been outlined in either the frontend or the backend. Filter functions are used in place of the landingpad catch clause type info operands. In catch clause order, the first filter to return true will catch the exception. The C specific handler table expects the landing pad to be split into one block per handler, but LLVM IR uses a single landing pad for all possible unwind actions. This patch papers over the mismatch by synthesizing single instruction BBs for every catch clause to fill in the EH selector that the landing pad block expects. Missing functionality: - Accessing data in the parent frame from outlined filters - Cleanups (from __finally) are unsupported, as they will require outlining and parent frame access - Filter clauses are unsupported, as there's no clear analogue in SEH In other words, this is the minimal set of changes needed to write IR to catch arbitrary exceptions and resume normal execution. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6300 llvm-svn: 225904	2015-01-14 01:05:27 +00:00
Adrian Prantl	7813d9c979	Debug Info: Implement DwarfCompileUnit::addComplexAddress() using DIEDwarfExpression (and get rid of a bunch of redundant code). NFC llvm-svn: 225900	2015-01-14 01:01:30 +00:00
Adrian Prantl	ad768c3719	Debug Info: Emitting a register in DwarfExpression may fail. Report the status in a bool and let the users deal with the error. NFC. llvm-svn: 225899	2015-01-14 01:01:28 +00:00
Adrian Prantl	658676c3ea	Debug Info: Move DIEDwarfExpression into DwarfExpression.h because it needs to be accessed from both DwarfCompileUnit.cpp and DwarfUnit.cpp. NFC. llvm-svn: 225898	2015-01-14 01:01:22 +00:00
Eric Christopher	6e30cd95cb	Migrate ABIName to MCTargetOptions so that it can be shared between the TargetMachine level and the MC level. llvm-svn: 225891	2015-01-14 00:50:31 +00:00
Adrian Prantl	8efadbf868	Debug Info: Don't bother emitting DW_AT_frame_base if the function has no frame register. "Tested" via an assertion triggered by DwarfExpression. llvm-svn: 225858	2015-01-14 00:15:16 +00:00
Adrian Prantl	1411577ad9	Revert "Debug Info: Bail out of AddMachineRegPiece() if MachineReg is not a" This reverts commit r225852, it was a bad idea. MachineReg should always be a physical register. If it isn't this DebugLoc shouldn't have been created in the first place. llvm-svn: 225857	2015-01-14 00:15:12 +00:00
Adrian Prantl	e8e0bac270	Debug Info: Bail out of AddMachineRegPiece() if MachineReg is not a physical register. The call to getMinimalPhysRegClass() later on asserts on this condition. llvm-svn: 225852	2015-01-13 23:39:15 +00:00
Adrian Prantl	092d9489ed	Debug Info: Move the complex expression handling (=the remainder) of emitDebugLocValue() into DwarfExpression. Ought to be NFC, but it actually uncovered a bug in the debug-loc-asan.ll testcase. The testcase checks that the address of variable "y" is stored at [RSP+16], which also lines up with the comment. It also check(ed) that the value of "y" is stored in RDI before that, but that is actually incorrect, since RDI is the very value that is stored in [RSP+16]. Here's the assembler output: movb 2147450880(%rcx), %r8b #DEBUG_VALUE: bar:y <- RDI cmpb $0, %r8b movq %rax, 32(%rsp) # 8-byte Spill movq %rsi, 24(%rsp) # 8-byte Spill movq %rdi, 16(%rsp) # 8-byte Spill .Ltmp3: #DEBUG_VALUE: bar:y <- [RSP+16] Fixed the comment to spell out the correct register and the check to expect an address rather than a value. Note that the range that is emitted for the RDI location was and is still wrong, it claims to begin at the function prologue, but really it should start where RDI is first assigned. llvm-svn: 225851	2015-01-13 23:39:11 +00:00
Adrian Prantl	0a3bfdbd37	cleanup. llvm-svn: 225848	2015-01-13 23:11:51 +00:00
Adrian Prantl	172ab66a11	Document, cleanup, and clang-format DwarfExpression.h llvm-svn: 225847	2015-01-13 23:11:07 +00:00
Adrian Prantl	8995f5c92f	Debug Info: Turn DIExpression::getFrameRegister() into an isFrameRegister() function. NFC. llvm-svn: 225846	2015-01-13 23:10:43 +00:00
Matthias Braun	f50ab43214	DAGCombiner: simplify by using condition variables; NFC llvm-svn: 225836	2015-01-13 22:17:46 +00:00
Matt Arsenault	bf0db918b2	R600: Implement getRecipEstimate This requires a new hook to prevent expanding sqrt in terms of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are all the same rate, so this expansion would just double the number of instructions and cycles. llvm-svn: 225828	2015-01-13 20:53:23 +00:00
Hal Finkel	c4ee2c5188	[StackMaps] Use CurrentFnSymForSize When computing the call-site offset, use AP.CurrentFnSymForSize instead of AP.CurrentFnSym. There should be no change for other targets, but this is necessary for generating valid expressions for PPC64/ELF. llvm-svn: 225807	2015-01-13 17:48:07 +00:00
Hal Finkel	0ad96c818c	[StackMaps] Mark in CallLoweringInfo when lowering a patchpoint While, generally speaking, the process of lowering arguments for a patchpoint is the same as lowering a regular indirect call, on some targets it may not be exactly the same. Targets may not, for example, want to add additional register dependencies that apply only to making cross-DSO calls through linker stubs, may not want to load additional registers out of function descriptors, and may not want to add additional side-effect-causing instructions that cannot be removed later with the call itself being generated. The PowerPC target will use this in a future commit (for all of the reasons stated above). llvm-svn: 225806	2015-01-13 17:48:04 +00:00
Hal Finkel	df87f9383b	[StackMaps] Allow the target to pre-process the live-out mask Some targets, PowerPC for example, have pseudo-registers (such as that used to represent the rounding mode), that don't have DWARF register numbers or a register class. These are used only for internal dependency tracking, and should not appear in the recorded live-outs. This adds a callback allowing the target to pre-process the live-out mask in order to remove these kinds of registers so that the StackMaps code does not complain about them and/or attempt to include them in the output. This will be used by the PowerPC target in a future commit. llvm-svn: 225805	2015-01-13 17:47:59 +00:00
Olivier Sallenave	325096980b	Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook. llvm-svn: 225795	2015-01-13 15:06:36 +00:00
Mehdi Amini	22e59748ef	Peephole opt needs optimizeSelect() to keep track of newly created MIs Peephole optimizer is scanning a basic block forward. At some point it needs to answer the question "given a pointer to an MI in the current BB, is it located before or after the current instruction". To perform this, it keeps a set of the MIs already seen during the scan, if a MI is not in the set, it is assumed to be after. It means that newly created MIs have to be inserted in the set as well. This commit passes the set as an argument to the target-dependent optimizeSelect() so that it can properly update the set with the (potentially) newly created MIs. llvm-svn: 225772	2015-01-13 07:07:13 +00:00
Reid Kleckner	3542ace6ef	Rename llvm.recoverframeallocation to llvm.framerecover This name is less descriptive, but it sort of puts things in the 'llvm.frame...' namespace, relating it to frameallocate and frameaddress. It also avoids using "allocate" and "allocation" together. llvm-svn: 225752	2015-01-13 01:51:34 +00:00
Reid Kleckner	e9b8931873	Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics These intrinsics allow multiple functions to share a single stack allocation from one function's call frame. The function with the allocation may only perform one allocation, and it must be in the entry block. Functions accessing the allocation call llvm.recoverframeallocation with the function whose frame they are accessing and a frame pointer from an active call frame of that function. These intrinsics are very difficult to inline correctly, so the intention is that they be introduced rarely, or at least very late during EH preparation. Reviewers: echristo, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D6493 llvm-svn: 225746	2015-01-13 00:48:10 +00:00
Matt Arsenault	a982e4f82b	Combine fcmp + select to fminnum / fmaxnum if no nans and legal Also require unsafe FP math for now since there isn't a way to test for signed zeros. llvm-svn: 225744	2015-01-13 00:43:00 +00:00
Adrian Prantl	66f2595845	Debug Info: Move support for constants into DwarfExpression. Move the declaration of DebugLocDwarfExpression into DwarfExpression.h because it needs to be accessed from AsmPrinterDwarf.cpp and DwarfDebug.cpp NFC. llvm-svn: 225734	2015-01-13 00:04:06 +00:00
Adrian Prantl	a4c30d6509	Make DwarfExpression store the AsmPrinter instead of the TargetMachine. NFC. llvm-svn: 225731	2015-01-12 23:36:56 +00:00
Adrian Prantl	9cffbd8daa	remove extra semicolon llvm-svn: 225730	2015-01-12 23:36:50 +00:00
Reid Kleckner	bba20f06de	musttail: Only set the inreg flag for fastcall and vectorcall Otherwise we'll attempt to forward ECX, EDX, and EAX for cdecl and stdcall thunks, leaving us with no scratch registers for indirect call targets. Fixes PR22052. llvm-svn: 225729	2015-01-12 23:28:23 +00:00
Adrian Prantl	337e360279	Run clang-format on the parts of AsmPrinterDwarf where it improves the readability. llvm-svn: 225726	2015-01-12 23:03:23 +00:00
Adrian Prantl	0fec811d7b	Debug Info: Add a virtual destructor to DwarfExpression. Thanks Chandler for noticing! llvm-svn: 225724	2015-01-12 22:59:28 +00:00
Adrian Prantl	0d5df0ac1c	Untwine this expression. Thanks to David for noticing! llvm-svn: 225720	2015-01-12 22:39:14 +00:00
Adrian Prantl	0e6ffb9d0d	Debug Info: Implement DwarfUnit::addRegisterOpPiece() using DwarfExpression. NFC. llvm-svn: 225717	2015-01-12 22:37:16 +00:00
Adrian Prantl	00dbc2a7d3	Debug Info: Implement DwarfUnit::addRegisterOffset using DwarfExpression. No functional change. llvm-svn: 225707	2015-01-12 22:19:26 +00:00
Adrian Prantl	b16d9ebb0c	Debug info: Factor out the creation of DWARF expressions from AsmPrinter into a new class DwarfExpression that can be shared between AsmPrinter and DwarfUnit. This is the first step towards unifying the two entirely redundant implementations of dwarf expression emission in DwarfUnit and AsmPrinter. Almost no functional change — Testcases were updated because asm comments that used to be on two lines now appear on the same line, which is actually preferable. llvm-svn: 225706	2015-01-12 22:19:22 +00:00
Matthias Braun	f5d931f716	RegisterCoalescer: Turn some impossible conditions into asserts This is a fixed version of reverted r225500. It fixes the too early if() continue; of the last patch and adds a comment to the unorthodox loop. llvm-svn: 225652	2015-01-12 19:10:17 +00:00
Ahmed Bougacha	e03bef7543	[SimplifyLibCalls] Factor out fortified libcall handling. This lets us remove CGP duplicate. Differential Revision: http://reviews.llvm.org/D6541 llvm-svn: 225640	2015-01-12 17:22:43 +00:00
Joerg Sonnenberger	8a36a8e5d4	Revert r225500, it leads to infinite loops. llvm-svn: 225590	2015-01-10 21:49:36 +00:00
Lang Hames	1e923ec122	Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revision introduced. A test case for the bug was already committed in r225385. Patch by Rafael Espindola. llvm-svn: 225534	2015-01-09 18:55:42 +00:00
Matthias Braun	7e87384592	RegisterCoalescer: Fix removeCopyByCommutingDef with subreg liveness The code that eliminated additional coalescable copies in removeCopyByCommutingDef() used MergeValueNumberInto() which internally may merge A into B or B into A. In this case A and B had different Def points, so we have to reset ValNo.Def to the intended one after merging. llvm-svn: 225503	2015-01-09 03:01:31 +00:00
Matthias Braun	ea399e59cf	RegisterCoalescer: Some cleanup in removeCopyByCommutingDef(), NFC llvm-svn: 225502	2015-01-09 03:01:28 +00:00
Matthias Braun	55586a2f2d	RegisterCoalescer: No need to set kill flags, they are recomputed later anyway llvm-svn: 225501	2015-01-09 03:01:26 +00:00
Matthias Braun	6588b145fc	RegisterCoalescer: Turn some impossible conditions into asserts llvm-svn: 225500	2015-01-09 03:01:23 +00:00
Hal Finkel	0ce7f372e5	[DAGCombine] Remainder of fix to r225380 (More FMA folding opportunities) As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA, we need to extend the incoming operands so that the resulting node will really be legal. This is currently enabled only for PowerPC, and it happens to work there regardless, but this should fix the functionality for everyone else should anyone else wish to use it. llvm-svn: 225492	2015-01-09 01:29:29 +00:00
Hal Finkel	33ead6f901	Partial fix to r225380 (More FMA folding opportunities) As pointed out by Aditya (and Owen), there are two things wrong with this code. First, it adds patterns which elide FP extends when forming FMAs, and that might not be profitable on all targets (it belongs behind the pre-existing aggressive-FMA-formation flag). This is fixed by this change. Second, the resulting nodes might have operands of different types (the extensions need to be re-added). That will be fixed in the follow-up commit. llvm-svn: 225485	2015-01-09 00:45:54 +00:00
Hal Finkel	0709f5160f	[MachineLICM] A command-line option to hoist even cheap instructions Add a command-line option to enable hoisting even cheap instructions (in low-register-pressure situations). This is turned off by default, but has proved useful for testing purposes. llvm-svn: 225470	2015-01-08 22:10:48 +00:00
Duncan P. N. Exon Smith	e90f1165d8	CodeGen: Use handy new-fangled post-increment, NFC Drive-by cleanup; I noticed this when reviewing the patch that became r225466. llvm-svn: 225468	2015-01-08 21:07:55 +00:00
Duncan P. N. Exon Smith	5914a97af8	CodeGen: Use range-based for loops, NFC Patch by Ramkumar Ramachandra! llvm-svn: 225466	2015-01-08 20:44:33 +00:00
Elena Demikhovsky	285fbd551a	Masked Load/Store - fixed a bug in type legalization. llvm-svn: 225441	2015-01-08 12:29:19 +00:00
Michael Kuperstein	698ea3b488	Fix include ordering, NFC. llvm-svn: 225439	2015-01-08 11:59:43 +00:00
Michael Kuperstein	8c65e31a5a	Move SPAdj logic from PEI into the targets (NFC) PEI tries to keep track of how much starting or ending a call sequence adjusts the stack pointer by, so that it can resolve frame-index references. Currently, it takes a very simplistic view of how SP adjustments are done - both FrameStartOpcode and FrameDestroyOpcode adjust it exactly by the amount written in its first argument. This view is in fact incorrect for some targets (e.g. due to stack re-alignment, or because it may want to adjust the stack pointer in multiple steps). However, that doesn't cause breakage, because most targets (the only in-tree exception appears to be 32-bit ARM) rely on being able to simplify the call frame pseudo-instructions earlier, so this code is never hit. Moving the computation into TargetInstrInfo allows targets to override the way the adjustment is computed if they need to have a non-zero SPAdj. Differential Revision: http://reviews.llvm.org/D6863 llvm-svn: 225437	2015-01-08 11:04:38 +00:00
Quentin Colombet	a799e2e014	[RegAllocGreedy] Introduce a late pass to repair broken hints. A broken hint is a copy where both ends are assigned different colors. When a variable gets evicted in the neighborhood of such copies, it is likely we can reconcile some of them. Context Copies are inserted during the register allocation via splitting. These split points are required to relax the constraints on the allocation problem. When such a point is inserted, both ends of the copy would not share the same color with respect to the current allocation problem. When variables get evicted, the allocation problem becomes different and some split point may not be required anymore. However, the related variables may already have been colored. This usually shows up in the assembly with pattern like this: def A ... save A to B def A use A restore A from B ... use B Whereas we could simply have done: def B ... def A use A ... use B Proposed Solution A variable having a broken hint is marked for late recoloring if and only if selecting a register for it evict another variable. Indeed, if no eviction happens this is pointless to look for recoloring opportunities as it means the situation was the same as the initial allocation problem where we had to break the hint. Finally, when everything has been allocated, we look for recoloring opportunities for all the identified candidates. The recoloring is performed very late to rely on accurate copy cost (all involved variables are allocated). The recoloring is simple unlike the last change recoloring. It propagates the color of the broken hint to all its copy-related variables. If the color is available for them, the recoloring uses it, otherwise it gives up on that hint even if a more complex coloring would have worked. The recoloring happens only if it is profitable. The profitability is evaluated using the expected frequency of the copies of the currently recolored variable with a) its current color and b) with the target color. If a) is greater or equal than b), then it is profitable and the recoloring happen. Example Consider the following example: BB1: a = b = BB2: ... = b = a Let us assume b gets split: BB1: a = b = BB2: c = b ... d = c = d = a Because of how the allocation work, b, c, and d may be assigned different colors. Now, if a gets evicted to make room for c, assuming b and d were assigned to something different than a. We end up with: BB1: a = st a, SpillSlot b = BB2: c = b ... d = c = d e = ld SpillSlot = e This is likely that we can assign the same register for b, c, and d, getting rid of 2 copies. Performances Both ARM64 and x86_64 show performance improvements of up to 3% for the llvm-testsuite + externals with Os and O3. There are a few regressions too that comes from the (in)accuracy of the block frequency estimate. <rdar://problem/18312047> llvm-svn: 225422	2015-01-08 01:16:39 +00:00
Ahmed Bougacha	2b6917b020	[SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421	2015-01-08 00:51:32 +00:00
Matthias Braun	9d7bc0874c	RegisterCoalescer: Do not remove IMPLICIT_DEFS if they are required for subranges. The register coalescer used to remove implicit_defs when they are covered by the main range anyway. With subreg liveness tracking we can't do that anymore in places where the IMPLICIT_DEF is required as begin of a subregister liverange. llvm-svn: 225416	2015-01-08 00:21:23 +00:00
Matthias Braun	d55e6ddacf	RegisterCoalescer: Fix valuesIdentical() in some subrange merge cases. I got confused and assumed SrcIdx/DstIdx of the CoalescerPair is a subregister index in SrcReg/DstReg, but they are actually subregister indices of the coalesced register that get you back to SrcReg/DstReg when applied. Fixed the bug, improved comments and simplified code accordingly. Testcase by Tom Stellard! llvm-svn: 225415	2015-01-07 23:58:38 +00:00
Matthias Braun	4fe686af00	LiveInterval: Implement feedback by Quentin Colombet. llvm-svn: 225413	2015-01-07 23:35:11 +00:00
Adrian Prantl	d88af278b9	Update a comment. llvm-svn: 225399	2015-01-07 21:35:13 +00:00
Ahmed Bougacha	67dd2d25a3	[CodeGen] Use MVT iterator_ranges in legality loops. NFC intended. A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. llvm-svn: 225392	2015-01-07 21:27:10 +00:00
Olivier Sallenave	0451532996	More FMA folding opportunities. llvm-svn: 225380	2015-01-07 20:54:17 +00:00
Adrian Prantl	3dd48c6fde	Debug info: Allow aggregate types to be described by constants. llvm-svn: 225378	2015-01-07 20:48:58 +00:00
Olivier Sallenave	e64ad7cedd	Test commit llvm-svn: 225368	2015-01-07 19:45:17 +00:00
Philip Reames	352fb93773	Add a missing file from 225365 llvm-svn: 225366	2015-01-07 19:13:28 +00:00
Philip Reames	4ac17a3026	Introduce an example statepoint GC strategy This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generate correct stackmaps that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.) Most of the validation logic added here as proof of concept will soon move in to the Verifier. That move is dependent on http://reviews.llvm.org/D6811 There was discussion in the review thread about addrspace(1) being reserved for something. I'm going to follow up on a separate llvmdev thread. If needed, I'll update all the code at once. Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others. Differential Revision: http://reviews.llvm.org/D6808 llvm-svn: 225365	2015-01-07 19:07:50 +00:00
Jonas Paulsson	fcf0cba88c	New method SDep::isNormalMemoryOrBarrier() in ScheduleDAGInstrs.cpp. Used to iterate over previously added memory dependencies in adjustChainDeps() and iterateChainSucc(). SDep::isCtrl() was previously used in these places, that also gave anti and output edges. The code may be worse if these are followed, because MisNeedChainEdge() will conservatively return true since a non-memory instruction has no memory operands, and a false chain dep will be added. It is also unnecessary since all memory accesses of interest will be reached by memory dependencies, and there is a budget limit for the number of edges traversed. This problem was found on an out-of-tree target with enabled alias analysis. No test case for an in-tree target has been found. Reviewed by Hal Finkel. llvm-svn: 225351	2015-01-07 13:38:29 +00:00
Jonas Paulsson	bf408bbe38	Fix typos in comment and option help texts. For -enable-aa-sched-mi and -use-tbaa-in-sched-mi. llvm-svn: 225350	2015-01-07 13:20:57 +00:00
Lang Hames	66f755f84f	Revert r224935 "Refactor duplicated code. No intended functionality change." This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin. Reverting to get the bots green while I track down the source of the changed behavior. llvm-svn: 225311	2015-01-06 23:04:36 +00:00
Mehdi Amini	f3721bf619	SelectionDAGBuilder: move constant initialization out of loop No semantic change intended. Reviewers: resistor Differential Revision: http://reviews.llvm.org/D6834 llvm-svn: 225278	2015-01-06 18:20:04 +00:00
Andrea Di Biagio	f807a6f297	[CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz. This patch improves the logic added at revision 224899 (see review D6728) that teaches the backend when it is profitable to speculate calls to cttz/ctlz. The original algorithm conservatively avoided speculating more than one instruction from a basic block in a control flow graph modelling an if-statement. In particular, the only allowed instruction (excluding the terminator) was a call to cttz/ctlz. However, there are cases where we could be less conservative and still be able to speculate a call to cttz/ctlz. With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the result is zero extended/truncated in the same basic block, and the zext/trunc instruction is "free" for the target. Added new test cases to CodeGen/X86/cttz-ctlz.ll Differential Revision: http://reviews.llvm.org/D6853 llvm-svn: 225274	2015-01-06 17:41:18 +00:00
Frederic Riss	e541e0b327	Make DIE.h a public CodeGen header. dsymutil would like to use all the AsmPrinter/MCStreamer infrastructure to stream out the DWARF. In order to do so, it will reuse the DIE object and so this header needs to be public. The interface exposed here has some corners that cannot be used without a DwarfDebug object, but clients that want to stream Dwarf can just avoid these. Differential Revision: http://reviews.llvm.org/D6695 llvm-svn: 225208	2015-01-05 21:29:41 +00:00
Craig Topper	d3c02f177a	Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert. llvm-svn: 225160	2015-01-05 10:15:49 +00:00
Hal Finkel	5772566ed6	[PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference The existing code provided for specifying a global loop alignment preference. However, the preferred loop alignment might depend on the loop itself. For recent POWER cores, loops between 5 and 8 instructions should have 32-byte alignment (while the others are better with 16-byte alignment) so that the entire loop will fit in one i-cache line. To support this, getPrefLoopAlignment has been made virtual, and can be provided with an optional MachineLoop* so the target can inspect the loop before answering the query. The default behavior, as before, is to return the value set with setPrefLoopAlignment. MachineBlockPlacement now queries the target for each loop instead of only once per function. There should be no functional change for other targets. llvm-svn: 225117	2015-01-03 17:58:24 +00:00
Alexey Samsonov	553185ee4b	Revert "merge consecutive stores of extracted vector elements" This reverts commit r224611. This change causes crashes in X86 DAG->DAG Instruction Selection. llvm-svn: 225031	2014-12-31 00:40:28 +00:00
David Blaikie	aeaa5bf55e	DebugInfo: Omit is_stmt from line table entries on the same line. GCC does this for non-zero discriminators and since GCC doesn't produce column info, that was the only place it comes up there. For LLVM, since we can emit discriminators and/or column info, it makes more sense to invert the condition and just test for changes in line number. This should resolve at least some of the GDB 7.5 test suite failures created by recent Clang changes that increase the location fidelity (which, since Clang defaults to including column info on Linux by default created a bunch of cases that confused GDB). In theory we could do this better/differently by grouping actual source statements together in a similar manner to the way lexical scopes are handled but given that GDB isn't really in a position to consume that (& users are probably somewhat used to different lines being different 'statements') this seems the safest and cheapest change. (I'm concerned that doing this 'right' would bloat the debugloc data even further - something Duncan's working hard to address) llvm-svn: 225011	2014-12-30 22:47:13 +00:00
Peter Collingbourne	7ef497b1f5	x86_64: Fix calls to __morestack under the large code model. Under the large code model, we cannot assume that __morestack lives within 2^31 bytes of the call site, so we cannot use pc-relative addressing. We cannot perform the call via a temporary register, as the rax register may be used to store the static chain, and all other suitable registers may be either callee-save or used for parameter passing. We cannot use the stack at this point either because __morestack manipulates the stack directly. To avoid these issues, perform an indirect call via a read-only memory location containing the address. This solution is not perfect, as it assumes that the .rodata section is laid out within 2^31 bytes of each function body, but this seems to be sufficient for JIT. Differential Revision: http://reviews.llvm.org/D6787 llvm-svn: 225003	2014-12-30 20:05:19 +00:00
Michael Kuperstein	c43b063358	[COFF] Don't try to add quotes to already quoted linker directives If a linker directive is already quoted, don't try to quote it again, otherwise it creates a mess. This pops up in places like: #pragma comment(linker,"\"/foo bar'\"") Differential Revision: http://reviews.llvm.org/D6792 llvm-svn: 224998	2014-12-30 19:23:48 +00:00
Rafael Espindola	bed67f3adc	Refactor duplicated code. No intended functionality change. llvm-svn: 224935	2014-12-29 15:18:31 +00:00
Andrea Di Biagio	22ee3f63b9	[CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz. If the control flow is modelling an if-statement where the only instruction in the 'then' basic block (excluding the terminator) is a call to cttz/ctlz, CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control flow graph. Example: \code entry: %cmp = icmp eq i64 %val, 0 br i1 %cmp, label %end.bb, label %then.bb then.bb: %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) br label %end.bb end.bb: %cond = phi i64 [ %c, %then.bb ], [ 64, %entry] \code In this example, basic block %then.bb is taken if value %val is not zero. Also, the phi node in %end.bb would propagate the size-of in bits of %val only if %val is equal to zero. With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb into basic block %entry only if cttz is cheap to speculate for the target. Added two new hooks in TargetLowering.h to let targets customize the behavior (i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'. By default, both methods return 'false'. On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI. Differential Revision: http://reviews.llvm.org/D6728 llvm-svn: 224899	2014-12-28 11:07:35 +00:00
Elena Demikhovsky	87700a734d	Scalarizer for masked load and store intrinsics. Masked vector intrinsics are a part of common LLVM IR, but they are really supported on AVX2 and AVX-512 targets. I added a code that translates masked intrinsic for all other targets. The masked vector intrinsic is converted to a chain of scalar operations inside conditional basic blocks. http://reviews.llvm.org/D6436 llvm-svn: 224897	2014-12-28 08:54:45 +00:00
Timur Iskhodzhanov	b6fa52f274	Band-aid fix for PR22032: don't emit DWARF debug info if AddressSanitizer is enabled on Windows llvm-svn: 224860	2014-12-26 17:00:51 +00:00
David Majnemer	25b383ac66	Silence GCC's -Wparentheses warning No functionality change intended. llvm-svn: 224833	2014-12-25 10:03:23 +00:00
Elena Demikhovsky	fb81b93e17	Masked Load/Store - Changed the order of parameters in intrinsics. No functional changes. The documentation is coming. llvm-svn: 224829	2014-12-25 07:49:20 +00:00
David Majnemer	2913eca4e2	CodeGen: Error on redefinitions instead of asserting It's possible to have a prior definition of a symbol in module asm. Raise an error instead of crashing. llvm-svn: 224828	2014-12-24 23:06:55 +00:00
David Majnemer	8e92dfee20	CodeGen: Allow aliases to be overridden by variables llvm-svn: 224827	2014-12-24 22:44:29 +00:00
David Majnemer	58cb80c940	MC: Label definitions are permitted after .set directives .set directives may be overridden by other .set directives as well as label definitions. This fixes PR22019. llvm-svn: 224811	2014-12-24 10:27:50 +00:00
Matthias Braun	51ca510094	LiveInterval: Remove accidentally committed debug code. llvm-svn: 224807	2014-12-24 02:35:07 +00:00
Matthias Braun	dbcca0dbb4	LiveInterval: Introduce createMainRangeFromSubranges(). This function constructs the main liverange by merging all subranges if subregister liveness tracking is available. This should be slightly faster to compute instead of performing the liveness calculation again for the main range. More importantly it avoids cases where the main liverange would cover positions where no subrange was live. These cases happened for partial definitions where the actual defined part was dead and only the undefined parts used later. The register coalescing requires that every part covered by the main live range has at least one subrange live. I also expect this function to become useful later for places where the subranges are modified in a way that it is hard to correctly fix the main liverange in the machine scheduler, we can simply reconstruct it from subranges then. llvm-svn: 224806	2014-12-24 02:11:51 +00:00
Matthias Braun	7030dda8d5	RegisterCoalescer: With subrange liveness there may be no RedefVNI for unused lanes. llvm-svn: 224805	2014-12-24 02:11:48 +00:00
Matthias Braun	36768c684f	LiveRangeEdit: Check for completely empty subranges after removing ValNos. Completely empty subranges are not allowed and must be removed when subreg liveness is enabled. llvm-svn: 224804	2014-12-24 02:11:46 +00:00
Matthias Braun	f603c88d13	LiveIntervalAnalysis: Fix performance bug that I introduced in r224663. Without a reference the code did not remember when moving the iterators of the subranges/registerunit ranges forward and instead would scan from the beginning again at the next position. llvm-svn: 224803	2014-12-24 02:11:43 +00:00
Adrian Prantl	3026a54aa2	Debug Info: In symmetry to DW_TAG_pointer_type, do not emit the byte size of a DW_TAG_ptr_to_member_type. This restores the behavior from before r224780-r224781. llvm-svn: 224799	2014-12-24 01:17:51 +00:00
Mehdi Amini	d38920891e	Always assert in DAGCombine and not only when -debug is enabled Right now in DAG Combine check the validity of the returned type only when -debug is given on the command line. However usually the test cases in the validation does not use -debug. An Assert build should always check this. llvm-svn: 224779	2014-12-23 18:59:02 +00:00
Michael Kuperstein	f4536ea6e8	[DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements This partially fixes PR21943. For AVX, we go from: vmovq (%rsi), %xmm0 vmovq (%rdi), %xmm1 vpermilps $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3] vinsertps $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3] vinsertps $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3] vpermilps $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3] vinsertps $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0] To the expected: vmovq (%rdi), %xmm0 vmovhpd (%rsi), %xmm0, %xmm0 retq Fixing this for AVX2 is still open. Differential Revision: http://reviews.llvm.org/D6749 llvm-svn: 224759	2014-12-23 08:59:45 +00:00
Reid Kleckner	ce0093344f	Make musttail more robust for vector types on x86 Previously I tried to plug musttail into the existing vararg lowering code. That turned out to be a mistake, because non-vararg calls use significantly different register lowering, even on x86. For example, AVX vectors are usually passed in registers to normal functions and memory to vararg functions. Now musttail uses a completely separate lowering. Hopefully this can be used as the basis for non-x86 perfect forwarding. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6156 llvm-svn: 224745	2014-12-22 23:58:37 +00:00
Quentin Colombet	84f89ccd45	[CodeGenPrepare] Handle properly the promotion of operands when this does not generate instructions. Fixes PR21978. Related to <rdar://problem/18310086> llvm-svn: 224717	2014-12-22 18:11:52 +00:00
Rafael Espindola	366e5c1bf1	The leak detector is dead, long live asan and valgrind. In recent times asan and valgrind have found way more memory management bugs in llvm than the special purpose leak detector. llvm-svn: 224703	2014-12-22 13:00:36 +00:00
Saleem Abdulrasool	90c224a143	CodeGen: minor style tweaks to SSP Clean up some style related things in the StackProtector CodeGen. NFC. llvm-svn: 224693	2014-12-21 21:52:38 +00:00
Matt Arsenault	22b4c256e1	Enable (sext x) == C --> x == (trunc C) combine Extend the existing code which handles this for zext. This makes this more useful for targets with ZeroOrNegativeOne BooleanContent and obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne) since the constant will now be shrunk to i1. llvm-svn: 224691	2014-12-21 16:48:42 +00:00
Saleem Abdulrasool	57b5fe57e3	CodeGen: constify and use range loop for SSP Use range-based for loop and constify the iterators. NFC. llvm-svn: 224683	2014-12-20 21:37:51 +00:00
Matthias Braun	714c494ca1	LiveIntervalAnalysis: No kill flags for partially undefined uses. We must not add kill flags when reading a vreg with some undefined subregisters, if subreg liveness tracking is enabled. This is because the register allocator may reuse these undefined subregisters for other values which are not killed. llvm-svn: 224664	2014-12-20 01:54:50 +00:00
Matthias Braun	7f8dece1d7	LiveIntervalAnalysis: cleanup addKills(), NFC - Use more const modifiers - Use references for things that can't be nullptr - Improve some variable names llvm-svn: 224663	2014-12-20 01:54:48 +00:00
Reid Kleckner	f2acbbaf22	EH: Sink computation of local PadMap variable into function that uses it No functionality change. llvm-svn: 224635	2014-12-19 22:30:08 +00:00
Reid Kleckner	93acac6cfc	Add the ExceptionHandling::MSVC enumeration It is intended to be used for a family of personality functions that have similar IR preparation requirements. Typically when interoperating with MSVC personality functions, bits of functionality need to be outlined from the main function into helper functions. There is also usually more than one landing pad per invoke, which does not match the LLVM IR landingpad representation. None of this is implemented yet. This change just adds a new enum that is active for *-windows-msvc and delegates to the EH removal preparation pass. No functionality change for other targets. llvm-svn: 224625	2014-12-19 22:19:48 +00:00
Sanjay Patel	0428a5786e	merge consecutive stores of extracted vector elements Add a path to DAGCombiner::MergeConsecutiveStores() to combine multiple scalar stores when the store operands are extracted vector elements. This is a partial fix for PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). For the new test case, codegen improves from: vmovss %xmm0, (%rdi) vextractps $1, %xmm0, 4(%rdi) vextractps $2, %xmm0, 8(%rdi) vextractps $3, %xmm0, 12(%rdi) vextractf128 $1, %ymm0, %xmm0 vmovss %xmm0, 16(%rdi) vextractps $1, %xmm0, 20(%rdi) vextractps $2, %xmm0, 24(%rdi) vextractps $3, %xmm0, 28(%rdi) vzeroupper retq To: vmovups %ymm0, (%rdi) vzeroupper retq Patch reviewed by Nadav Rotem. Differential Revision: http://reviews.llvm.org/D6698 llvm-svn: 224611	2014-12-19 20:23:41 +00:00
Matthias Braun	aeb50b3805	RegisterCoalescer: rewrite eliminateUndefCopy(). This also fixes problems with undef copies of subregisters. I can't attach a testcase for that as none of the targets in trunk has subregister liveness tracking enabled. llvm-svn: 224560	2014-12-19 01:39:46 +00:00
Adrian Prantl	cf44e7870b	Explain why LLVM is emitting a DW_AT_containing_type inside of a class. llvm-svn: 224555	2014-12-19 00:01:20 +00:00
Matthias Braun	15abf3743c	LiveIntervalAnalysis: Cleanup computeDeadValues - This also fixes a bug introduced in r223880 where values were not correctly marked as Dead anymore. - Cleanup computeDeadValues(): split up SubRange code variant, simplify arguments. llvm-svn: 224538	2014-12-18 19:58:52 +00:00
Eric Christopher	661f2d1ca1	Add a new string member to the TargetOptions struct for the name of the abi we should be using. For targets that don't use the option there's no change, otherwise this allows external users to set the ABI via string and avoid some of the -backend-option pain in clang. Use this option to move the ABI for the ARM port from the Subtarget to the TargetMachine and update the testcases accordingly since it's no longer valid to set via -mattr. llvm-svn: 224492	2014-12-18 02:20:58 +00:00
Matthias Braun	0a410f6243	RegisterCoalescer: Fix stripCopies() picking up main range instead of subregister range This fixes a problem where stripCopies() would switch to values in the main liverange when it crossed a copy instruction. However when joining subranges we need to stay in the respective subregister ranges. llvm-svn: 224461	2014-12-17 21:25:20 +00:00
Matthias Braun	8142efa8ea	ExecutionDepsFix: Correctly handle wide registers. The ExecutionDepsFix previously mapped each register to 1 or zero registers of the register class it was called with and therefore simulating liveness for. This was problematic for cases involving wider registers like Q0 on ARM where ExecutionDepsFix gets invoked for the Dxx registers. In these cases the wide register would get mapped to the last matching D register, while it should have been all matching D registers. This commit changes the AliasMap to use a SmallVector to map registers to potentially multiple destination regclass registers. This is required to avoid regressions with subregister liveness tracking enabled. llvm-svn: 224447	2014-12-17 19:13:47 +00:00
Michael Kuperstein	047b1a0400	[DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle. This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors. This fixes PR15872 and provides a partial fix for PR21711. Differential Revision: http://reviews.llvm.org/D6678 llvm-svn: 224429	2014-12-17 12:32:17 +00:00
Toma Tabacu	a23f13c3b0	[mips] Set GCC-compatible MIPS assembler options before inline asm blocks. Summary: When generating MIPS assembly, LLVM always overrides the default assembler options by emitting the '.set noreorder', '.set nomacro' and '.set noat' directives, while GCC uses the default options if an assembly-level function contains inline assembly code. This becomes a problem when the code generated by LLVM is interleaved with inline assembly which assumes GCC-like assembler options (from Linux, for example). This patch fixes these conflicts by setting the appropriate assembler options at the beginning of an inline asm block and popping them at the end. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6637 llvm-svn: 224425	2014-12-17 10:56:16 +00:00
Matthias Braun	f4a72cd06e	RegisterCoalescer: Sprinkle some const modifiers. llvm-svn: 224409	2014-12-17 02:18:13 +00:00
Quentin Colombet	fc2201e922	[CodeGenPrepare] Reapply r224351 with a fix for the assertion failure: The type promotion helper does not support vector type, so make sure it does not kick in in such cases. Original commit message: [CodeGenPrepare] Move sign/zero extensions near loads using type promotion. This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible. Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet. Context Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern. Motivating Example Let us consider an example: define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) { %ld = load i8* %addr1 %zextld = zext i8 %ld to i32 %ld2 = load i32* %addr2 %add = add nsw i32 %ld2, %zextld %sextadd = sext i32 %add to i64 %zexta = zext i8 %a to i32 %addza = add nsw i32 %zexta, %zextld %sextaddza = sext i32 %addza to i64 %addb = add nsw i32 %b, %zextld %sextaddb = sext i32 %addb to i64 call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb) ret void } As it is, this IR generates the following assembly on x86_64: [...] 
movzbl (%rdi), %eax # zero-extended load movl (%rsi), %esi # plain load addl %eax, %esi # 32-bit add movslq %esi, %rdi # sign extend the result of add movzbl %dl, %edx # zero extend the first argument addl %eax, %edx # 32-bit add movslq %edx, %rsi # sign extend the result of add addl %eax, %ecx # 32-bit add movslq %ecx, %rdx # sign extend the result of add [...] The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA. Now, by promoting the additions to form more extended loads we would generate: [...] movzbl (%rdi), %eax # zero-extended load movslq (%rsi), %rdi # sign-extended load addq %rax, %rdi # 64-bit add movzbl %dl, %esi # zero extend the first argument addq %rax, %rsi # 64-bit add movslq %ecx, %rdx # sign extend the second argument addq %rax, %rdx # 64-bit add [...] The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA. This kind of sequence happens a lot in code using 32-bit indexes on 64-bit architectures. Note: The throughput numbers are similar on Sandy Bridge and Haswell. Proposed Solution To avoid the penalty of all these sign/zero extensions, we merge them in the loads at the beginning of the chain of computation by promoting all the chain of computation on the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation: ext(promotableInst1(...(promotableInstN(load)))) => promotedInst1(...(promotedInstN(ext(load)))) The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it was before this patch. 
Performance Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10. Tested Optimization Levels: O3/Os Tests: llvm-testsuite + externals. Results: - No regression beside noise. - Improvements: CINT2006/473.astar: ~2% Benchmarks/PAQ8p: ~2% Misc/perlin: ~3% The results are consistent for both O3 and Os. <rdar://problem/18310086> llvm-svn: 224402	2014-12-17 01:36:17 +00:00
David Blaikie	8b979f01c6	PR21875: codegen for non-type template parameters of nullptr_t type llvm-svn: 224399	2014-12-17 00:43:22 +00:00
Reid Kleckner	04b69f89aa	Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion." This reverts commit r224351. It causes assertion failures when building ICU. llvm-svn: 224397	2014-12-17 00:29:23 +00:00
Hans Wennborg	224cb82a39	SelectionDAG switch lowering: use 'unsigned' to count destination popularity SwitchInst::getNumCases() returns unsigned, so using uint64_t to count cases seems unnecessary. Also fix a missing CHECK in the test case. llvm-svn: 224393	2014-12-16 23:41:59 +00:00
Sanjay Patel	7129c10cae	merge consecutive loads that are offset from a base address SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads when the first load was offset from a base address. This patch recognizes that pattern and subtracts the offset before comparing the second load to see if it is consecutive. The codegen change in the new test case improves from: vmovsd 32(%rdi), %xmm0 vmovsd 48(%rdi), %xmm1 vmovhpd 56(%rdi), %xmm1, %xmm1 vmovhpd 40(%rdi), %xmm0, %xmm0 vinsertf128 $1, %xmm1, %ymm0, %ymm0 To: vmovups 32(%rdi), %ymm0 An existing test case is also improved from: vmovsd (%rdi), %xmm0 vmovsd 16(%rdi), %xmm1 vmovsd 24(%rdi), %xmm2 vunpcklpd %xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0] vmovhpd 8(%rdi), %xmm1, %xmm3 To: vmovsd (%rdi), %xmm0 vmovsd 16(%rdi), %xmm1 vmovhpd 24(%rdi), %xmm0, %xmm0 vmovhpd 8(%rdi), %xmm1, %xmm1 This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ). Differential Revision: http://reviews.llvm.org/D6642 llvm-svn: 224379	2014-12-16 21:57:18 +00:00
Matt Arsenault	dd3b77d64c	Move lowerConstant to AsmPrinter This was a static function before, and NVPTX duplicated it because it wasn't exposed. llvm-svn: 224354	2014-12-16 19:16:14 +00:00
Quentin Colombet	d5e57b731f	[CodeGenPrepare] Move sign/zero extensions near loads using type promotion. This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible. Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet. Context Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern. Motivating Example Let us consider an example: define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) { %ld = load i8* %addr1 %zextld = zext i8 %ld to i32 %ld2 = load i32* %addr2 %add = add nsw i32 %ld2, %zextld %sextadd = sext i32 %add to i64 %zexta = zext i8 %a to i32 %addza = add nsw i32 %zexta, %zextld %sextaddza = sext i32 %addza to i64 %addb = add nsw i32 %b, %zextld %sextaddb = sext i32 %addb to i64 call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb) ret void } As it is, this IR generates the following assembly on x86_64: [...] movzbl (%rdi), %eax # zero-extended load movl (%rsi), %esi # plain load addl %eax, %esi # 32-bit add movslq %esi, %rdi # sign extend the result of add movzbl %dl, %edx # zero extend the first argument addl %eax, %edx # 32-bit add movslq %edx, %rsi # sign extend the result of add addl %eax, %ecx # 32-bit add movslq %ecx, %rdx # sign extend the result of add [...] The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA. 
Now, by promoting the additions to form more extended loads we would generate: [...] movzbl (%rdi), %eax # zero-extended load movslq (%rsi), %rdi # sign-extended load addq %rax, %rdi # 64-bit add movzbl %dl, %esi # zero extend the first argument addq %rax, %rsi # 64-bit add movslq %ecx, %rdx # sign extend the second argument addq %rax, %rdx # 64-bit add [...] The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA. This kind of sequence happens a lot in code using 32-bit indexes on 64-bit architectures. Note: The throughput numbers are similar on Sandy Bridge and Haswell. Proposed Solution To avoid the penalty of all these sign/zero extensions, we merge them in the loads at the beginning of the chain of computation by promoting all the chain of computation on the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation: ext(promotableInst1(...(promotableInstN(load)))) => promotedInst1(...(promotedInstN(ext(load)))) The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it was before this patch. Performance Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10. Tested Optimization Levels: O3/Os Tests: llvm-testsuite + externals. Results: - No regression beside noise. - Improvements: CINT2006/473.astar: ~2% Benchmarks/PAQ8p: ~2% Misc/perlin: ~3% The results are consistent for both O3 and Os. <rdar://problem/18310086> llvm-svn: 224351	2014-12-16 19:09:03 +00:00
Aaron Ballman	0d6a010c13	Fixing -Wsign-compare warnings; NFC. llvm-svn: 224337	2014-12-16 14:04:11 +00:00
Matthias Braun	1aed6ffa35	LiveRangeCalc: Rewrite subrange calculation This changes subrange calculation to calculate subranges sequentially instead of in parallel. The code is easier to understand that way and addresses the code review issues raised about LiveOutData being hard to understand/needing more comments by removing them :) llvm-svn: 224313	2014-12-16 04:03:38 +00:00
Adrian Prantl	b9fa945d51	ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions. Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in the middle of the function prologue would cause debug info to end the prologue too early and worse, attach the line number of the CFI instruction, which incidentally is often 0. llvm-svn: 224294	2014-12-16 00:20:49 +00:00
Matthias Braun	c3a72c2e5f	Revert "LiveRangeCalc: Rewrite subrange calculation" Revert until I find out why non-subreg enabled targets break. This reverts commit 6097277eefb9c5fb35a7f493c783ee1fd1b9d6a7. llvm-svn: 224278	2014-12-15 21:36:35 +00:00
Matthias Braun	0352201e15	LiveRangeCalc: Rewrite subrange calculation This changes subrange calculation to calculate subranges sequentially instead of in parallel. The code is easier to understand that way and addresses the code review issues raised about LiveOutData being hard to understand/needing more comments by removing them :) llvm-svn: 224272	2014-12-15 21:16:21 +00:00
Matthias Braun	42fab34ffb	LiveRangeCalc: use more range based for loops; NFC llvm-svn: 224263	2014-12-15 19:40:46 +00:00
Michael Ilseman	addddc441f	Silence more static analyzer warnings. Add in definedness checks for shift operators, null checks when pointers are assumed by the code to be non-null, and explicit unreachables. llvm-svn: 224255	2014-12-15 18:48:43 +00:00
Akira Hatanaka	7ba78302b5	Rename argument strings of codegen passes to avoid collisions with command line options. This commit changes the command line arguments (PassInfo::PassArgument) of two passes, MachineFunctionPrinter and MachineScheduler, to avoid collisions with command line options that have the same argument strings. This bug manifests when the PassList construct (defined in opt.cpp) is used in a tool that links with codegen passes. To reproduce the bug, paste the following lines into llc.cpp and run llc. #include "llvm/IR/LegacyPassNameParser.h" static llvm::cl::list<const llvm::PassInfo*, bool, llvm::PassNameParser> PassList(llvm::cl::desc("Optimizations available:")); rdar://problem/19212448 llvm-svn: 224186	2014-12-13 04:52:04 +00:00
Andrea Di Biagio	d65fd9facd	Reapply "[MachineScheduler] Fix for PR21807: minor code difference building with/without -g." This reapplies r224118 with a fix for test 'misched-code-difference-with-debug.ll'. That test was failing on some buildbots because it was x86 specific but it was missing a target triple. Added an explicit triple to test misched-code-difference-with-debug.ll. llvm-svn: 224126	2014-12-12 15:09:58 +00:00
Andrea Di Biagio	5634a54efc	Revert: [MachineScheduler] Fix for PR21807: minor code difference building with/without -g. Test 'misched-code-difference-with-debug.ll' was failing on some buildbots. llvm-svn: 224121	2014-12-12 13:34:03 +00:00
Andrea Di Biagio	01236e3eca	[MachineScheduler] Fix for PR21807: minor code difference building with/without -g. This patch fixes the issue reported as PR21807. There was a minor difference in the generated code depending on the -g flag. The cause was that with -g the machine scheduler used a different scheduling strategy. This decision was based on the number of instructions in a schedule region and included debug instructions in that count. This patch fixes the issue in MISched and provides a test. Patch by Russell Gallop! llvm-svn: 224118	2014-12-12 12:41:22 +00:00
Ekaterina Romanova	90ff20d8f5	A fix for PR21176. DW_OP_const <const> doesn't describe a constant value, but a value at a constant address. The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value. Added DW_OP_stack_value to the stack. Marked incorrect-variable-debugloc1.ll to xfail for PowerPC64, while the failure (PR21881) is being investigated. llvm-svn: 224098	2014-12-12 05:11:47 +00:00
Philip Reames	60de8b29f7	Comment and minor code cleanup for GCStrategy (NFC) Updating comments to reflect the current state of the world after my recent changes to ownership structure and generally better describe what a GCStrategy is and how it works. llvm-svn: 224086	2014-12-12 00:49:03 +00:00
Matt Arsenault	810cb62962	Add target hook for whether it is profitable to reduce load widths Add an option to disable optimization to shrink truncated larger type loads to smaller type loads. On SI this prevents using scalar load instructions in some cases, since there are no scalar extloads. llvm-svn: 224084	2014-12-12 00:00:24 +00:00
Duncan P. N. Exon Smith	d6f8e4b03c	CodeGen: Stop using LeakDetector for MachineInstr Since `MachineInstr` is required to have a trivial destructor, it cannot remove itself from `LeakDetection`. Remove the calls. As it happens, this requirement is because `MachineFunction` allocates all `MachineInstr`s in a custom allocator; when the `MachineFunction` is destroyed they're dropped off the edge. There's no benefit to detecting leaks. llvm-svn: 224061	2014-12-11 21:51:37 +00:00
Matthias Braun	7e37a5f523	[CodeGen] Add print and verify pass after each MachineFunctionPass by default Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnoticed when no verification is performed. To make this change practical I added the possibility to explicitly disable verification. I used this option on all places where no verification was performed previously (because a lot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. This is the 2nd attempt at this after realizing that PassManager::add() may actually delete the pass. llvm-svn: 224059	2014-12-11 21:26:47 +00:00
Rafael Espindola	01c73610d0	This reverts commit r224043 and r224042. check-llvm was failing. llvm-svn: 224045	2014-12-11 20:03:57 +00:00
Matthias Braun	a7c82a9f1d	[CodeGen] Add print and verify pass after each MachineFunctionPass by default Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnoticed when no verification is performed. To make this change practical I added the possibility to explicitly disable verification. I used this option on all places where no verification was performed previously (because a lot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. llvm-svn: 224042	2014-12-11 19:42:05 +00:00
Matthias Braun	a4e932db16	[CodeGen] Let MachineVerifierPass own its banner string llvm-svn: 224041	2014-12-11 19:41:51 +00:00
Patrik Hagglund	cb06a36c9a	Bugfix in InlineSpiller::traceSiblingValue(). Properly determine whether or not a phi was added by splitting. Check against the current VNInfo of OrigLI instead of against the OrigVNI argument. Patch provided by Jonas Paulsson. Reviewed by Quentin Colombet. llvm-svn: 224009	2014-12-11 10:40:17 +00:00
Ekaterina Romanova	75fd123967	Reverting commit 223981, because the test that I added (incorrect-variable-debugloc1.ll) failed for llvm-ppc64. The test is failing for llvm-ppc64 because for this platform the location list is not being generated at all (most likely because of a bug in PPC code optimization or generation). I will file a bug against the PPC compiler, but meanwhile, until the PPC bug is fixed, I will have to revert my change. llvm-svn: 224000	2014-12-11 06:22:35 +00:00
Philip Reames	1e30897497	GCStrategy should not own GCFunctionInfo This change moves the ownership and access of GCFunctionInfo (the object which describes the safepoints associated with a safepoint under GCRoot) to GCModuleInfo. Previously, this was owned by GCStrategy which was in turn owned by GCModuleInfo. This made GCStrategy module specific which is 'surprising' given its name and other purposes. There are a few more changes needed, but we're getting towards the point we can reuse GCStrategy for gc.statepoint as well. p.s. The style of this code ends up being a mess. I was trying to move code around without otherwise changing much. Once I get the ownership structure rearranged, I will go through and fixup spacing, naming, comments etc. Differential Revision: http://reviews.llvm.org/D6587 llvm-svn: 223994	2014-12-11 01:47:23 +00:00
Matthias Braun	09afa1ea74	LiveInterval: Use range based for loops for subregister ranges. llvm-svn: 223991	2014-12-11 00:59:06 +00:00
Ekaterina Romanova	ceeaba7932	A fix for PR21176. DW_OP_const <const> doesn't describe a constant value, but a value at a constant address. The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value. Added DW_OP_stack_value to the stack. -This line, and those below, will be ignored-- M lib/CodeGen/AsmPrinter/DwarfDebug.cpp A test/DebugInfo/incorrect-variable-debugloc1.ll llvm-svn: 223981	2014-12-10 23:19:56 +00:00
Matthias Braun	96761959d4	LiveInterval: Use more range based for loops for value numbers and segments. llvm-svn: 223978	2014-12-10 23:07:54 +00:00
Aaron Ballman	e5a2a0c9a8	Silencing a -Wsequence-point warning, and the resulting undefined behavior. NFC. llvm-svn: 223926	2014-12-10 14:14:54 +00:00
Matthias Braun	96d7732b08	MachineVerifier: Allow physreg use if just a subreg is defined. We can't mark partially undefined registers, so we have to allow reading a register in the machine verifier if just parts of a register are defined. llvm-svn: 223896	2014-12-10 01:13:13 +00:00
Matthias Braun	21554d9b30	MachineVerifier: Allow LiveInterval segments to end at a partial write. In the subregister liveness tracking case we do not create implicit reads on partial register writes anymore, still we need to produce a new SSA value for partial writes so the live segment has to end. llvm-svn: 223895	2014-12-10 01:13:11 +00:00
Matthias Braun	279f83645c	VirtRegMap: Improve block live-in info if subregister liveness is available. llvm-svn: 223894	2014-12-10 01:13:08 +00:00
Matthias Braun	d70caaf5a5	VirtRegMap: No implicit defs/uses for super registers with subreg liveness tracking. Adding the implicit defs/uses to the superregisters is semantically questionable but was not dangerous before as the register allocator never assigned the same register to two overlapping LiveIntervals even when the actually live subregisters do not overlap. With subregister liveness tracking enabled this does actually happen and leads to subsequent bugs if we don't stop adding the superregister defs/uses. llvm-svn: 223892	2014-12-10 01:13:04 +00:00
Matthias Braun	587e27415d	LiveRegMatrix: Respect subregister liveness when allocating registers. llvm-svn: 223891	2014-12-10 01:13:01 +00:00
Matthias Braun	a0f0c1f013	LiveIntervalUnion: Allow specification of liverange when unifying/extracting. This allows it to add subregister ranges into the union. llvm-svn: 223890	2014-12-10 01:12:59 +00:00
Matthias Braun	14f764c872	RegisterCoalescer: Preserve subregister liveranges. llvm-svn: 223888	2014-12-10 01:12:52 +00:00
Matthias Braun	2079aa9140	LiveInterval: Add removeEmptySubRanges(). llvm-svn: 223887	2014-12-10 01:12:40 +00:00
Matthias Braun	8970d847c4	LiveIntervalAnalysis: Add subregister aware variants pruneValue(). llvm-svn: 223886	2014-12-10 01:12:36 +00:00
Matthias Braun	e3d3b88cb9	Add a flag to enable/disable subregister liveness. llvm-svn: 223884	2014-12-10 01:12:30 +00:00
Matthias Braun	e5f861b781	LiveIntervalAnalysis: Adapt repairIntervalsInRange() to subregister liveness. llvm-svn: 223883	2014-12-10 01:12:26 +00:00
Matthias Braun	fe896c703c	LiveRangeEdit: Adapt eliminateDeadDef() to subregister liveness. llvm-svn: 223882	2014-12-10 01:12:23 +00:00
Matthias Braun	7044d69e87	LiveIntervalAnalysis: Adapt handleMove() to subregister ranges. llvm-svn: 223881	2014-12-10 01:12:20 +00:00
Matthias Braun	20e1f38a41	LiveIntervalAnalysis: Update SubRanges in shrinkToUses(). llvm-svn: 223880	2014-12-10 01:12:18 +00:00
Matthias Braun	2f66232bde	LiveIntervalAnalysis: Compute subregister ranges. llvm-svn: 223878	2014-12-10 01:12:12 +00:00
Matthias Braun	3f1d8fdd33	LiveInterval: Add support to track liveness of subregisters. This code adds the required data structures. Algorithms to compute it follow. llvm-svn: 223877	2014-12-10 01:12:10 +00:00
Matthias Braun	e62c207092	LiveInterval: Add a 'covers' operation to LiveRange. llvm-svn: 223876	2014-12-10 01:12:06 +00:00
Philip Reames	de226055ca	Remove the Module pointer from GCStrategy and GCMetadataPrinter In the current implementation, GCStrategy is a part of the ownership structure for the gc metadata which describes a Module. It also contains a reference to the module in question. As a result, GCStrategy instances are essentially Module specific. I plan to transition away from this design. Instead, a GCStrategy will be owned by the LLVMContext. It will be a lightweight policy object which contains no information about the Modules or Functions involved, but can be easily reached given a Function. The first step in this transition is to remove the direct Module reference from GCStrategy. This also requires removing the single user of this reference, the GCMetadataPrinter hierarchy. In theory, this will allow the lifetime of the printers to be scoped to the LLVMContext as well, but in practice, I'm not actually changing that. (Yet?) An alternate design would have been to move the direct Module reference into the GCMetadataPrinter and change the keying of the owning maps to explicitly key off both GCStrategy and Module. I'm open to doing it that way instead, but didn't see much value in preserving the per Module association for GCMetadataPrinters. The next change in this sequence will be to start unwinding the intertwined ownership between GCStrategy, GCModuleInfo, and GCFunctionInfo. Differential Revision: http://reviews.llvm.org/D6566 llvm-svn: 223859	2014-12-09 23:57:54 +00:00
Duncan P. N. Exon Smith	5bf8fef580	IR: Split Metadata from Value Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata *` (instead of `Value *`). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. 
Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode *N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode *N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode *N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. 
`MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802	2014-12-09 18:38:53 +00:00
Juergen Ributzka	8bda738221	[CGP] Rewrite pattern match for splitBranchCondition to work with Values instead. Rewrite the pattern match code to work also with Values instead of with Instructions only. Also remove the no longer needed matcher (m_Instruction). llvm-svn: 223797	2014-12-09 17:50:10 +00:00
Juergen Ributzka	194350a936	Revert "Move function to obtain branch weights into the BranchInst class. NFC." This reverts commit r223784 and copies the 'ExtractBranchMetadata' to CodeGenPrepare. llvm-svn: 223795	2014-12-09 17:32:12 +00:00
Juergen Ributzka	c1bbcbbd32	[CodeGenPrepare] Split branch conditions into multiple conditional branches. This optimization transforms code like: bb1: %0 = icmp ne i32 %a, 0 %1 = icmp ne i32 %b, 0 %or.cond = or i1 %0, %1 br i1 %or.cond, label %TrueBB, label %FalseBB into multiple branch instructions like: bb1: %0 = icmp ne i32 %a, 0 br i1 %0, label %TrueBB, label %bb2 bb2: %1 = icmp ne i32 %b, 0 br i1 %1, label %TrueBB, label %FalseBB This optimization is already performed by SelectionDAG, but not by FastISel. FastISel cannot perform this optimization, because it cannot generate new MachineBasicBlocks. Performing this optimization at CodeGenPrepare time makes it available to both - SelectionDAG and FastISel - and the implementation in SelectionDAG could be removed. There are currently a few differences in codegen for X86 and PPC, so this commit only enables it for FastISel. Reviewed by Jim Grosbach This fixes rdar://problem/19034919. llvm-svn: 223786	2014-12-09 16:36:13 +00:00
Owen Anderson	558012a3fc	Fix a few instances found in SelectionDAG where we were not handling F16 at parity with F32 and F64. llvm-svn: 223760	2014-12-09 06:50:39 +00:00
Hal Finkel	c8cf2b88bc	Handle early-clobber registers in the aggressive anti-dep breaker The aggressive anti-dep breaker, used by the PowerPC backend during post-RA scheduling (but is available to all targets), did not handle early-clobber MI operands (at all). When constructing the list of available registers for the replacement of some def operand, check the using instructions, and remove registers assigned to early-clobbered defs from the set. Fixes PR21452. llvm-svn: 223727	2014-12-09 01:00:59 +00:00
Tom Stellard	3e01d47d98	MISched: Fix moving stores across barriers This fixes an issue with ScheduleDAGInstrs::buildSchedGraph where stores without an underlying object would not be added as a predecessor to the current BarrierChain. llvm-svn: 223717	2014-12-08 23:36:48 +00:00
Justin Bogner	61ba2e3996	InstrProf: An intrinsic and lowering for instrumentation based profiling Introduce the ``llvm.instrprof_increment`` intrinsic and the ``-instrprof`` pass. These provide the infrastructure for writing counters for profiling, as in clang's ``-fprofile-instr-generate``. The implementation of the instrprof pass is ported directly out of the CodeGenPGO classes in clang, and with the followup in clang that rips that code out to use these new intrinsics this ends up being NFC. Doing the instrumentation this way opens some doors in terms of improving the counter performance. For example, this will make it simple to experiment with alternate lowering strategies, and allows us to try handling profiling specially in some optimizations if we want to. Finally, this drastically simplifies the frontend and puts all of the lowering logic in one place. llvm-svn: 223672	2014-12-08 18:02:35 +00:00
Hans Wennborg	08de833c1c	SelectionDAG switch lowering: Replace unreachable default with most popular case. This can significantly reduce the size of the switch, allowing for more efficient lowering. I also worked with the idea of exploiting unreachable defaults by omitting the range check for jump tables, but always ended up with a non-negligible binary size increase. It might be worth looking into some more. SimplifyCFG currently does this transformation, but I'm working towards changing that so we can optimize harder based on unreachable defaults. Differential Revision: http://reviews.llvm.org/D6510 llvm-svn: 223566	2014-12-06 01:28:50 +00:00
Eric Christopher	d1fb7e4590	These two calls were grabbing the same register info. Unify them. llvm-svn: 223502	2014-12-05 19:23:55 +00:00
Ahmed Bougacha	55e3c2d9cf	[CodeGenPrepare] Use variables for reused values. NFC. llvm-svn: 223491	2014-12-05 18:04:40 +00:00
Hal Finkel	66d7791176	Revert "r223440 - Consider subregs when calling MI::registerDefIsDead for phys deps" Reverting this because, while it fixes the problem in the reduced test case, it does not fix the problem in the full test case from the bug report. llvm-svn: 223442	2014-12-05 02:07:35 +00:00
Hal Finkel	d013d99fe0	Consider subregs when calling MI::registerDefIsDead for phys deps The scheduling dependency graph is built bottom-up within each scheduling region, and ScheduleDAGInstrs::addPhysRegDeps is called to add output/anti dependencies, based on physical registers, to the SUs for instructions based on those that come before them. In the test case, we start before post-RA scheduling with a block that looks like this: ... INLINEASM <... andc $0,$0,$2 stdcx. $0,0,$3 bne- 1b > [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:G8RC], %X6<earlyclobber,def,dead>, $1:[mem], %X3<kill>, $2:[reguse:G8RC], %X5<kill>, $3:[reguse:G8RC], %X3, $4:[mem], %X3, $5:[clobber], %CC<earlyclobber,imp-def,dead>, <<badref>> ... %X4<def,dead> = ANDIo8 %X4<kill>, 1, %CR0<imp-def,dead>, %CR0GT<imp-def> ... %R29<def> = ISEL %R3<undef>, %R4<kill>, %CR0GT<kill> where it is relevant that %CC is an alias to %CR0, and that %CR0GT is a subregister of %CR0. However, for post-RA scheduling, no dependency was added to prevent the INLINEASM from being scheduled in between the ANDIo8 and the ISEL (which communicate via the %CR0GT register). In ScheduleDAGInstrs::addPhysRegDeps, when called for the %CC operand, we'd iterate over all of its aliases (which include %CC itself and also %CR0), and look for previously-encountered defs of those registers. We'd find the ANDIo8, but decide not to add a dependency between the INLINEASM and the ANDIo8 because both the INLINEASM's def of %CC is dead, and also the ANDIo8 def of %CR0 is dead. This ignores, however, that ANDIo8 has a non-dead def of %CR0GT, a subregister of %CR0, and thus a dependency still must exist. To fix this problem, when calling registerDefIsDead on the SU with the def, we also check all subregisters for possible non-dead defs, and add the dependency if any are found. Fixes PR21742. llvm-svn: 223440	2014-12-05 01:57:22 +00:00
Adrian Prantl	ab255fcd09	Cleanup: Calls to getDwarfRegNum() may actually fail, if there is no DWARF register number mapping, or if the register was a virtual register that was never materialized. Previously, we would just emit a bogus location, after this patch we don't emit a location at all by doing an early exit. After my bugfix in r223401 today, this doesn't actually happen on any target that I tested this with, but it's still preferable to make the possibility of a failure explicit. llvm-svn: 223428	2014-12-05 01:02:46 +00:00
Adrian Prantl	da7e03f1bf	Simplify implementation and testcase of r223401 based on feedback from dblaikie. llvm-svn: 223405	2014-12-04 22:58:41 +00:00
Adrian Prantl	a3ae0b3b5b	Debug info: If the RegisterCoalescer::reMaterializeTrivialDef() is eliminating all uses of a vreg, update any DBG_VALUE describing that vreg to point to the rematerialized register instead. llvm-svn: 223401	2014-12-04 22:29:04 +00:00
Patrik Hagglund	d06de4b954	Use DomTree in MachineSink to sink over diamonds. According to a previous FIXME comment we now not only look at MBB successors, but also handle code sinking past them: x = computation if () {} else {} use x The instruction could be sunk over the whole diamond for the if/then/else (or loop, etc), allowing it to be sunk into other blocks after that. Modified test added in r204522, due to one spill less present. Minor fixes in comments. Patch provided by Jonas Paulsson. Reviewed by Hal Finkel. llvm-svn: 223350	2014-12-04 10:36:42 +00:00
Simon Pilgrim	be24ab367b	[InstCombine] Minor optimization for bswap with binary ops Added instcombine optimizations for BSWAP with AND/OR/XOR ops: OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) ) OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) ) Since it's just a one-liner, I've also added BSWAP to the DAGCombiner equivalent as well: fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y)) Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone. Differential Revision: http://reviews.llvm.org/D6407 llvm-svn: 223349	2014-12-04 09:44:01 +00:00
Elena Demikhovsky	f1de34b84d	Masked Load / Store Intrinsics - the CodeGen part. I'm recommitting the codegen part of the patch. The vectorizer part will be sent for review again. Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 223348	2014-12-04 09:40:44 +00:00
Matt Arsenault	4e27343eec	Allow target to specify prefix for labels Use the MCAsmInfo instead of the DataLayout, and allow specifying a custom prefix for labels specifically. HSAIL requires that labels begin with @, but global symbols with &. llvm-svn: 223323	2014-12-04 00:06:57 +00:00
Quentin Colombet	079aba733a	[RegAllocFast] Handle implicit definitions conservatively. Prior to this commit, physical registers defined implicitly were considered free right after their definition, i.e., like dead definitions. Therefore, their uses had to immediately follow their definitions, otherwise the related register may be reused to allocate a virtual register. This commit fixes this assumption by keeping implicit definitions alive until they are actually used. The downside is that if the implicit definition was dead (and not marked as such), we block an otherwise available register. This is however conservatively correct and makes the fast register allocator much more robust in particular regarding the scheduling of the instructions. Fixes PR21700. llvm-svn: 223317	2014-12-03 23:38:08 +00:00
Peter Collingbourne	51d2de7b9e	Prologue support Patch by Ben Gamari! This redefines the `prefix` attribute introduced previously and introduces a `prologue` attribute. There are three primary use cases that these attributes aim to serve, 1. Function prologue sigils 2. Function hot-patching: Enable the user to insert `nop` operations at the beginning of the function which can later be safely replaced with a call to some instrumentation facility 3. Runtime metadata: Allow a compiler to insert data for use by the runtime during execution. GHC is one example of a compiler that needs this functionality for its tables-next-to-code functionality. Previously `prefix` served cases (1) and (2) quite well by allowing the user to introduce arbitrary data at the entrypoint but before the function body. Case (3), however, was poorly handled by this approach as it required that prefix data was valid executable code. Here we redefine the notion of prefix data to instead be data which occurs immediately before the function entrypoint (i.e. the symbol address). Since prefix data now occurs before the function entrypoint, there is no need for the data to be valid code. The previous notion of prefix data now goes under the name "prologue data" to emphasize its duality with the function epilogue. The intention here is to handle cases (1) and (2) with prologue data and case (3) with prefix data. References ---------- This idea arose out of discussions[1] with Reid Kleckner in response to a proposal to introduce the notion of symbol offsets to enable handling of case (3). [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html Test Plan: testsuite Differential Revision: http://reviews.llvm.org/D6454 llvm-svn: 223189	2014-12-03 02:08:38 +00:00
Hal Finkel	bbdee93638	[PowerPC] Implement readcyclecounter for PPC32 We've long supported readcyclecounter on PPC64, but it is easier there (the read of the 64-bit time-base register can be accomplished via a single instruction). This now provides an implementation for PPC32 as well. On PPC32, the time-base register is still 64 bits, but can only be read 32 bits at a time via two separate SPRs. The ISA manual explains how to do this properly (it involves re-reading the upper bits and looping if the counter has wrapped while being read). This requires PPC to implement a custom integer splitting legalization for the READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then gets turned into a pseudo-instruction, which is then expanded to the necessary sequence (which has three SPR reads, the comparison and the branch). Thanks to Paul Hargrove for pointing out to me that this was still unimplemented. llvm-svn: 223161	2014-12-02 22:01:00 +00:00
Philip Reames	72fbe7a6f0	Restructure some assertion checking based on post commit feedback by Aaron and Tom. llvm-svn: 223150	2014-12-02 21:01:48 +00:00
Philip Reames	f814a511da	Appease a build bot complaining about an unused variable that's used in an assertion. llvm-svn: 223142	2014-12-02 19:28:57 +00:00
Philip Reames	1a1bdb22bf	[Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder This is the third patch in a small series. It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085). The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them. With this change, gc.statepoints should be functionally complete. The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now. I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated. The statepoint lowering code is split into its own files and anyone not working on the statepoint support itself should be able to ignore it. During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straightforward, and matches closely the implementations of the patchpoint intrinsics. Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints. Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack. The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases. In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator. In principle, we shouldn't need to eagerly spill at all. The register allocator should do any spilling required and the statepoint should simply record that fact. Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure.
Reviewed by: atrick, ributzka llvm-svn: 223137	2014-12-02 18:50:36 +00:00
Ahmed Bougacha	54b7d334c7	[MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction. Go through implicit defs of CSMI and MI, and clear the kill flags on their uses in all the instructions between CSMI and MI. We might have made some of the kill flags redundant, consider: subs ... %NZCV<imp-def> <- CSMI csinc ... %NZCV<imp-use,kill> <- this kill flag isn't valid anymore subs ... %NZCV<imp-def> <- MI, to be eliminated csinc ... %NZCV<imp-use,kill> Since we eliminated MI, and reused a register imp-def'd by CSMI (here %NZCV), that register, if it was killed before MI, should have that kill flag removed, because its lifetime was extended. Also, add an exhaustive testcase for the motivating example. Reviewed by: Juergen Ributzka <juergen@apple.com> llvm-svn: 223133	2014-12-02 18:09:51 +00:00
Philip Reames	0365f1a376	[Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 Backend This is the second patch in a small series. This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints. It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code. I will be submitting the SelectionDAG parts within the next 24-48 hours. Since those pieces are by far the most complicated, I wanted to minimize the size of that patch. That patch will include the tests which exercise the functionality in this patch. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683. The STATEPOINT pseudo node is generated after all gc values are explicitly spilled to stack slots. The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes. The STATEPOINT is modeled as modifying all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT. (Doing so would break relocation semantics for collectors which wish to relocate roots.) The implementation of STATEPOINT is closely modeled on PATCHPOINT. Eventually, much of the code in this patch will be removed. The long term plan is to merge the functionality provided by statepoints and patchpoints. Merging their implementations in the backend is likely to be a good starting point. Reviewed by: atrick, ributzka llvm-svn: 223085	2014-12-01 22:52:56 +00:00
Ahmed Bougacha	fb6eeb74c5	[MachineVerifier] Accept a MBB with a single landing pad successor. The MachineVerifier used to check that there was always exactly one unconditional branch to a non-landingpad (normal) successor. If that normal successor to an invoke BB is unreachable, it seems reasonable to only have one successor, the landing pad. On targets other than AArch64 (and on AArch64 with a different testcase), the branch folder turns the branch to the landing pad into a fallthrough. The MachineVerifier, which relies on AnalyzeBranch, is unable to check the condition, and doesn't complain. However, it does in this specific testcase, where the branch to the landing pad remained. Make the MachineVerifier accept it. llvm-svn: 223059	2014-12-01 18:43:53 +00:00
Hans Wennborg	5bef5b522b	Revert r223049, r223050 and r223051 while investigating test failures. I didn't foresee affecting the Clang test suite :/ llvm-svn: 223054	2014-12-01 17:36:43 +00:00
Hans Wennborg	1571336fb2	SelectionDAG switch lowering: Replace unreachable default with most popular case. This can significantly reduce the size of the switch, allowing for more efficient lowering. I also worked with the idea of exploiting unreachable defaults by omitting the range check for jump tables, but always ended up with a non-negligible binary size increase. It might be worth looking into some more. llvm-svn: 223049	2014-12-01 17:08:32 +00:00
Akira Hatanaka	b9991a2656	[stack protector] Set edge weights for newly created basic blocks. This commit fixes a bug in stack protector pass where edge weights were not set when new basic blocks were added to lists of successor basic blocks. Differential Revision: http://reviews.llvm.org/D5766 llvm-svn: 222987	2014-12-01 04:27:03 +00:00
Hans Wennborg	6dfb041ffc	Switch lowering: reformat some for loops etc. NFC llvm-svn: 222962	2014-11-29 21:24:12 +00:00
Hans Wennborg	6c42d1a5de	Switch lowering: Fix broken 'Figure out which block is next' code This doesn't seem to have worked in a long time, but other optimizations would clean it up. llvm-svn: 222961	2014-11-29 21:17:05 +00:00
Simon Pilgrim	2bfd9129f4	Target triple OS detection tidyup. NFC Use Triple::isOS*() helpers where possible. llvm-svn: 222960	2014-11-29 19:18:21 +00:00
Duncan P. N. Exon Smith	9bc81fbe92	Revert "Masked Vector Load and Store Intrinsics." This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936	2014-11-28 21:29:14 +00:00
Elena Demikhovsky	6de72e5b0c	Converted back to Unix format (after my last commit 222632) llvm-svn: 222636	2014-11-23 15:21:53 +00:00
Elena Demikhovsky	9e5089a938	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632	2014-11-23 08:07:43 +00:00
Manman Ren	f0a582bada	Debug Info: revert r222195, r222210 and r222239. This is no longer needed after David's fix at r222377 + r222485. rdar://18958417 llvm-svn: 222563	2014-11-21 19:55:23 +00:00
Manman Ren	c98ec0e70a	[Objective-C] Support a new special module flag that will be put into the objc_imageinfo struct. rdar://17954668 llvm-svn: 222558	2014-11-21 19:24:55 +00:00
Sanjay Patel	eb4a4d5aeb	Don't repeat class/function/variable names in comments. NFC. llvm-svn: 222555	2014-11-21 18:58:38 +00:00
Sanjay Patel	b06441aded	Less space; NFC llvm-svn: 222546	2014-11-21 18:05:59 +00:00
Andrea Di Biagio	0225b5bf6f	[DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero. Before this patch, the DAGCombiner only tried to convert build_vector dag nodes into shuffles if all operands were either extract_vector_elt or undef. This patch improves that logic and teaches the DAGCombiner how to deal with build_vector dag nodes where one or more operands are zero. A build_vector dag node with some zero operands is turned into a shuffle only if the resulting shuffle mask is legal for the target. llvm-svn: 222536	2014-11-21 14:32:06 +00:00
Andrea Di Biagio	26e8f4d166	[DAG] Refactor the shuffle combining logic in DAGCombiner. NFC. This patch simplifies the logic that combines a pair of shuffle nodes into a single shuffle if there is a legal mask. Also added comments to better describe the algorithm. No functional change intended. llvm-svn: 222522	2014-11-21 11:33:07 +00:00
Hao Liu	44e5d7a131	DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor into FMULs by the reciprocal. E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip) A hook is added to allow the target to control whether it needs to do such combine. Reviewed in http://reviews.llvm.org/D6334 llvm-svn: 222510	2014-11-21 06:39:58 +00:00
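
The rewrite described in the last entry (r222510) replaces several divisions by a shared divisor with one division and several multiplications. A minimal sketch of the arithmetic in plain Python follows; the function names `divide_each` and `multiply_by_reciprocal` are hypothetical and illustrate only the shape of the transformation, not the DAGCombiner code or its target hook.

```python
def divide_each(values, divisor):
    """Baseline form: one floating-point division per value (a / D; b / D)."""
    return [v / divisor for v in values]


def multiply_by_reciprocal(values, divisor):
    """Rewritten form: a single division, then one multiply per value."""
    recip = 1.0 / divisor          # recip = 1.0 / D
    return [v * recip for v in values]  # a * recip; b * recip


# Illustrative inputs chosen so that the reciprocal (0.25) is exactly
# representable in binary floating point.
a, b, d = 10.0, 6.0, 4.0
baseline = divide_each([a, b], d)
combined = multiply_by_reciprocal([a, b], d)
```

With a power-of-two divisor, as here, both forms give identical results. For general divisors the reciprocal is rounded before the multiply, so the two forms are not guaranteed to agree in the last bit.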

17908 Commits