As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix the remaining references by hand
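For anyone repeating the migration without BSD sed, the last textual rewrite is simple to express in Python as well (an illustrative sketch only, not part of the commit):
import re
bbref = re.compile(r"BB#(\d+)")
# e.g. "Successors according to CFG: BB#0 BB#5" becomes "... %bb.0 %bb.5"
def rename(line):
  return bbref.sub(r"%bb.\1", line)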
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
Combine CMOV[i16]<-[SIGN,ZERO,ANY]_EXTEND to [i32,i64] into CMOV[i32,i64].
One example of where it is useful is:
before (20 bytes)
<foo>:
test $0x1,%dil
mov $0x307e,%ax
mov $0xffff,%cx
cmovne %ax,%cx
movzwl %cx,%eax
retq
after (18 bytes)
<foo>:
test $0x1,%dil
mov $0x307e,%ecx
mov $0xffff,%eax
cmovne %ecx,%eax
retq
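The combine is legal because the extension can be hoisted through the select; a quick Python check of the zero-extend case shown above (illustration only, 16-bit inputs assumed):
def zext32(x16):
  return x16 & 0xFFFF   # zero-extend an i16 value to i32
for cond in (False, True):
  for a16, b16 in [(0x307e, 0xffff), (0, 1)]:
    assert zext32(a16 if cond else b16) == (zext32(a16) if cond else zext32(b16))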
Reviewers: craig.topper, aaboud, spatel, RKSimon, zvi
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36711
llvm-svn: 313982
Summary:
Subregister liveness tracking is not implemented for X86 backend, so
sometimes the whole super register is said to be live, when only a
subregister is really live. That might happen if the def and the use
are located in different MBBs; see the added fixup-bw-inst.mir test.
However, using knowledge of the specific instructions handled by the
fixup-bw pass, we can compute more precise liveness information, which
is what this change does.
Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper
Reviewed By: craig.topper
Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D37559
llvm-svn: 313524
Summary:
Reverts r311008 to reinstate r310825 with a fix.
Refine alias checking for pseudo vs value to be conservative.
This fixes the original failure in the buildbot unit test SingleSource/UnitTests/2003-07-09-SignedArgs.
Reviewers: hfinkel, nemanjai, efriedma
Reviewed By: efriedma
Subscribers: bjope, mcrosier, nhaehnle, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D36900
llvm-svn: 312126
The comment for this code indicated that it should work similarly to our
handling of add lowering above: if we see uses of an instruction other
than flag usage and store usage, it tries to avoid the specialized
X86ISD::* nodes that are designed for flag+op modeling and emits an
explicit test.
Problem is, only the add case actually did this. In all the other cases,
the logic was incomplete and inverted. Any time the value was used by
a store, we bailed on the specialized X86ISD node. All of this appears
to be historical, left over from when we had different logic here. =/
Turns out, we have quite a few patterns designed around these nodes. We
should actually form them. I fixed the code to match what we do for add,
and it has quite a positive effect just within some of our test cases.
The only thing close to a regression I see is using:
notl %r
testl %r, %r
instead of:
xorl $-1, %r
But we can add a pattern or something to fold that back out. The
improvements seem more than worth this.
I've also worked with Craig to update the comments to no longer be
actively contradicted by the code. =[ Some of this still remains
a mystery to both Craig and myself, but this seems like a large step in
the direction of consistency and slightly more accurate comments.
Many thanks to Craig for help figuring out this nasty stuff.
Differential Revision: https://reviews.llvm.org/D37096
llvm-svn: 311737
This reverts commit r311038.
Several buildbots are breaking, and at least one appears to be due to
the forwarding of physical regs enabled by this change. Reverting while
I investigate further.
llvm-svn: 311062
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass so that it can
be run during register allocation, after physical registers have been
assigned but before the virtual registers have been rewritten, which
allows it to remove virtual register COPYs whose LiveIntervals become
dead through the forwarding of all of their uses.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
llvm-svn: 311038
The previous rev (r310208) failed to account for overflow when subtracting the
constants to see if they're suitable for shift/lea. This version adds a check
for that, and more tests were added in r310490.
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d
For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.
The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.
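The underlying identity behind the fake-multiply form is easy to sanity-check with plain integers (a sketch of the math only, not of the DAG code; the DAG version additionally has to guard against overflow when computing C1 - C2, which is the fix described above):
# select cond, C1, C2  ==  C2 + cond * (C1 - C2), where cond is 0 or 1
def select_as_math(cond, c1, c2):
  return c2 + cond * (c1 - c2)
for c1, c2 in [(-1, 1), (1, -1), (42, -7), (0, 5)]:
  for cond in (0, 1):
    assert select_as_math(cond, c1, c2) == (c1 if cond else c2)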
Some notes about the test diffs:
1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choosing -1 or 1 is the case that got me looking at this again. We could avoid the
push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a
post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
that's a regression, but those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so we don't actually care about these diffs.
5. sbb.ll - This shows a win for what is likely a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.
Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.
Differential Revision: https://reviews.llvm.org/D35340
llvm-svn: 310717
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d
For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.
The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.
Some notes about the test diffs:
1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choosing -1 or 1 is the case that got me looking at this again. I
think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on
a different reg? That's a post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
that's a regression, but I think those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs.
5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.
Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.
Differential Revision: https://reviews.llvm.org/D35340
llvm-svn: 310208
Convert vector increment or decrement to sub/add with an all-ones constant:
add X, <1, 1...> --> sub X, <-1, -1...>
sub X, <1, 1...> --> add X, <-1, -1...>
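The rewrite is plain two's-complement wraparound; a quick scalar sanity check with 8-bit lanes (illustrative only):
lanes = [0, 1, 127, 254, 255]
assert [(x + 1) & 0xFF for x in lanes] == [(x - 0xFF) & 0xFF for x in lanes]  # add 1 == sub all-ones
assert [(x - 1) & 0xFF for x in lanes] == [(x + 0xFF) & 0xFF for x in lanes]  # sub 1 == add all-ones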
The all-ones vector constant can be materialized using a pcmpeq instruction that is
commonly recognized as an idiom (has no register dependency), so that's better than
loading a splat 1 constant.
AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better
way to produce 512 one-bits.
The general advantages of this lowering are:
1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables,
so in theory, this could be better for perf, but...
2. That seems unlikely to affect any OOO implementation, and I can't measure any real
perf difference from this transform on Haswell or Jaguar, but...
3. It doesn't look like it from the diffs, but this is an overall size win because we
eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting
a scalar load (which might itself be a bug), then we're replacing a scalar constant
load + broadcast with a single cheap op, so that should always be smaller/better too.
4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1
and psub x, -1, so we should use that form for +1 too because we can. If there's some
reason to favor a constant load on some CPU, let's make the reverse transform for all
of these cases (either here in the DAG or in a later machine pass).
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=33483
Differential Revision: https://reviews.llvm.org/D34336
llvm-svn: 306289
We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type:
e.g. for v4f32:
Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0>
: unpcklps 1, 3 ==> Y: <?, ?, 3, 1>
Step 2: unpcklps X, Y ==> <3, 2, 1, 0>
The issue is that, because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns - consecutive scalar loads, extractions etc.
Instead, this patch unpacks progressively larger sequential vector elements together:
e.g. for v4f32:
Step 1: unpcklps 0, 1 ==> X: <?, ?, 1, 0>
: unpcklps 2, 3 ==> Y: <?, ?, 3, 2>
Step 2: unpcklpd X, Y ==> <3, 2, 1, 0>
This does mean that we are creating UNPCKL shuffle of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree.
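A small Python model of the two interleave orders (lanes are listed low-to-high here, i.e. the reverse of the <...> notation above; sketch only):
def unpcklps(a, b):   # interleave the low 32-bit lanes of a and b
  return [a[0], b[0], a[1], b[1]]
def unpcklpd(a, b):   # interleave the low 64-bit lanes of a and b
  return [a[0], a[1], b[0], b[1]]
e = [[i, None, None, None] for i in range(4)]   # scalar i sits in lane 0 of vector e[i]
old = unpcklps(unpcklps(e[0], e[2]), unpcklps(e[1], e[3]))   # same-type unpack tree
new = unpcklpd(unpcklps(e[0], e[1]), unpcklps(e[2], e[3]))   # progressively wider unpack tree
assert old == new == [0, 1, 2, 3]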
Differential Revision: https://reviews.llvm.org/D33864
llvm-svn: 304688
Summary: The LiveRangeShrink pass moves an instruction to right after the definition within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees an optimal live range within the BB.
Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb
Reviewed By: MatzeB, andreadb
Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32563
llvm-svn: 304371
This also reverts follow-ups r303292 and r303298.
It broke some Chromium tests under MSan, and apparently also internal
tests at Google.
llvm-svn: 303369
Summary: The LiveRangeShrink pass moves an instruction to right after the definition within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees an optimal live range within the BB.
Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb
Reviewed By: MatzeB, andreadb
Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32563
llvm-svn: 302938
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.
We don't take into account that in some cases we will already have combined all the users while recursively walking up the shuffle tree.
This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all their users are in that list.
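Roughly, the new rule looks like this (a sketch with hypothetical names, not the actual DAG combiner code):
combined = set()   # shuffle nodes already combined during this walk up the tree
def may_combine(shuffle):
  # old rule: only combine single-use shuffles to avoid duplicating work;
  # new rule: also allow it when every user has already been combined.
  users = shuffle.users   # hypothetical accessor
  return len(users) == 1 or all(u in combined for u in users)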
Differential Revision: https://reviews.llvm.org/D29399
llvm-svn: 294183
In r260133, LLVM was changed to no longer extend i8/i16 return values,
as it's not required by the ABI. However, code was found in the wild
that relies on the old behaviour on Darwin, so this commit reverts
back to that old behaviour for Darwin.
On other platforms, it's less likely that code would be depending on
the old behaviour, as GCC and MSVC haven't been extending such return
values.
llvm-svn: 261235
This matches GCC and MSVC's behaviour, and saves on code size.
We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to the x86 target as well, and also for i8
and i16.
The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
values do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].
Differential Revision: http://reviews.llvm.org/D16907
[1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
[2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ
llvm-svn: 260133
Essentially the same as the GEP change in r230786.
A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)
import fileinput
import sys
import re
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
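For illustration, on a made-up input line (not taken from any particular test) the rewrite produces:
re.sub(pat, r"\1, \2\3*\4", "  %v = load i32* %p")   # ->  "  %v = load i32, i32* %p"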
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7649
llvm-svn: 230794
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.
This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.
* This doesn't modify gep operators, only instructions (operators will be
handled separately)
* Textual IR changes only. Bitcode (including upgrade) and changing the
in-memory representation will be in separate changes.
* geps of vectors are transformed as:
getelementptr <4 x float*> %x, ...
->getelementptr float, <4 x float*> %x, ...
Then, once the opaque pointer type is introduced, this will ultimately look
like:
getelementptr float, <4 x ptr> %x
with the unambiguous interpretation that it is a vector of pointers to float.
* address spaces remain on the pointer, not the type:
getelementptr float addrspace(1)* %x
->getelementptr float, float addrspace(1)* %x
Then, eventually:
getelementptr float, ptr addrspace(1) %x
Importantly, the massive amount of test case churn has been automated by
the same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout; I
then wrapped that in a shell script to handle replacing files, and then
used the usual find+xargs to migrate all the files.
update.py:
import fileinput
import sys
import re
ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line
for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)
apply.sh:
for name in "$@"
do
python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
rm -f "$name.tmp"
done
The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh
After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).
The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7636
llvm-svn: 230786
This code is completely wrong. It is also dead, as if it were to *ever*
run, it would crash. Fortunately, after my work to the combiner, it is
at least *possible* to reach the code, and llvm-stress has found a test
case. Thanks to Patrick for reporting.
It would be really good if anyone who remembers how this code works and
what it was intended to do could add some more obvious test coverage
instead of my completely contrived and reduced test case. My test case
was so brittle I left a bread crumb comment in it to help the next
person to stumble on it and not know what it was actually testing for.
llvm-svn: 214785
This changes the SelectionDAG scheduling preference to source
order. Soon, the SelectionDAG scheduler can be bypassed, saving
a nice chunk of compile time.
Performance differences that result from this change are often a
consequence of register coalescing. The register coalescer is far from
perfect. Bugs can be filed for deficiencies.
On x86 SandyBridge/Haswell, the source order schedule is often
preserved, particularly for small blocks.
Register pressure is generally improved over the SD scheduler's ILP
mode. However, we are still able to handle large blocks that require
latency hiding, unlike the SD scheduler's BURR mode. MI scheduler also
attempts to discover the critical path in single-block loops and
adjust heuristics accordingly.
The MI scheduler relies on the new machine model. This is currently
unimplemented for AVX, so we may not be generating the best code yet.
Unit tests are updated so they don't depend on SD scheduling heuristics.
llvm-svn: 192750
Also avoid locals evicting locals just because they want a cheaper register.
Problem: MI Sched knows exactly how many registers we have and assumes
they can be colored. In cases where we have large blocks, usually from
unrolled loops, greedy coloring fails. This is a source of
"regressions" from the MI Scheduler on x86. I noticed this issue on
x86 where we have long chains of two-address defs in the same live
range. It's easy to see this in matrix multiplication benchmarks like
IRSmk and even the unit test misched-matmul.ll.
A fundamental difference between the LLVM register allocator and
conventional graph coloring is that in our model a live range can't
discover its neighbors; it can only verify its neighbors. That's why
we initially went for greedy coloring and added eviction to deal with
the hard cases. However, for singly defined and two-address live
ranges, we can optimally color without visiting neighbors simply by
processing the live ranges in instruction order.
Other beneficial side effects:
It is much easier to understand and debug regalloc for large blocks
when the live ranges are allocated in order. Yes, global allocation is
still very confusing, but it's nice to be able to comprehend what
happened locally.
Heuristics could be added to bias register assignment based on
instruction locality (think late register pairing, banks...).
Intuitively, this will make some test cases that are on the threshold
of register pressure more stable.
llvm-svn: 187139
This was done with the following sed invocation to catch label lines demarking function boundaries:
sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.
llvm-svn: 186258
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.
When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.
It includes tests.
This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces
Patch by Andy Zhang.
llvm-svn: 171879
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.
When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.
It includes tests.
Patch by Andy Zhang.
llvm-svn: 171603
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.
When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.
It includes tests.
Patch by Andy Zhang.
llvm-svn: 171524
X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this
level is often not a good idea because it's too late for many optimizations to
kick in. This solution doesn't add any extensions (truncs are free) and tries
to avoid introducing partial register stalls by filtering direct copyfromregs.
I'm seeing a ~10% speedup on reading a random .png file with libpng15 via
graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture.
llvm-svn: 165868
It is redundant; RegisterCoalescer will do the remat if it can't eliminate
the copy. Collected instruction counts before and after this. A few extra
instructions are generated due to spilling but it is normal to see these kinds
of changes with almost any small codegen change, according to Jakob.
This also fixed rdar://11830760 where xor is expected instead of movi0.
llvm-svn: 160749
This patch will optimize -(x != 0) on X86
FROM
cmpl $0x01,%edi
sbbl %eax,%eax
notl %eax
TO
negl %edi
sbbl %eax, %eax
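Both sequences compute -(x != 0); a small Python model of the flag behaviour (32-bit wraparound assumed, illustration only):
M = 0xFFFFFFFF
def old_seq(x):   # cmpl $0x01,%edi ; sbbl %eax,%eax ; notl %eax
  cf = 1 if (x & M) < 1 else 0    # cmp sets CF when x < 1 unsigned, i.e. x == 0
  eax = (-cf) & M                 # sbb %eax,%eax materializes -CF
  return ~eax & M                 # notl
def new_seq(x):   # negl %edi ; sbbl %eax,%eax
  cf = 1 if (x & M) != 0 else 0   # neg sets CF for any non-zero operand
  return (-cf) & M                # sbb %eax,%eax materializes -CF
for x in (0, 1, 0x1234, M):
  assert old_seq(x) == new_seq(x) == (-(1 if x != 0 else 0) & M)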
In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
rdar: 10961709
llvm-svn: 156312
This is mostly to test the waters. I'd like to get results from FNT
build bots and other bots running on non-x86 platforms.
This feature has been pretty heavily tested over the last few months by
me, and it fixes several of the execution time regressions caused by the
inlining work by preventing inlining decisions from radically impacting
block layout.
I've seen very large improvements in yacr2 and ackermann benchmarks,
along with the expected noise across all of the benchmark suite whenever
code layout changes. I've analyzed all of the regressions and fixed
them, or found them to be impossible to fix. See my email to llvmdev for
more details.
I'd like for this to be in 3.1 as it complements the inliner changes,
but if any failures are showing up or anyone has concerns, it is just
a flag flip and so can be easily turned off.
I'm switching it on tonight to try and get at least one run through
various folks' performance suites in case SPEC or something else has
serious issues with it. I'll watch bots and revert if anything shows up.
llvm-svn: 154816