llvm-project

Commit Graph

Author	SHA1	Message	Date
Vlad Tsyrklevich	ab016e00ec	[X86] FastISel fall back on !absolute_symbol GVs Summary: D25878, which added support for !absolute_symbol for normal X86 ISel, did not add support for materializing references to absolute symbols for X86 FastISel. This causes build failures because FastISel generates PC-relative relocations for absolute symbols. Fall back to normal ISel for references to !absolute_symbol GVs. Fix for PR38200. Reviewers: pcc, craig.topper Reviewed By: pcc Subscribers: hiraditya, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D50116 llvm-svn: 338599	2018-08-01 17:44:37 +00:00
Fangrui Song	f78650a8de	Remove trailing space sed -Ei 's/[[:space:]]+$//' include/*/.{def,h,td} lib/*/.{cpp,h} llvm-svn: 338293	2018-07-30 19:41:25 +00:00
Matt Arsenault	81920b0a25	DAG: Add calling convention argument to calling convention funcs This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194	2018-07-28 13:25:19 +00:00
Craig Topper	da424ba1c5	[X86][FastISel] Support uitofp with avx512. llvm-svn: 337055	2018-07-13 22:09:30 +00:00
Craig Topper	f0831eef0b	[X86][FastISel] Add EVEX support to sitofp handling. llvm-svn: 337045	2018-07-13 21:03:43 +00:00
Craig Topper	57c4585bab	[X86][FastISel] Support EVEX version of sqrt. llvm-svn: 336939	2018-07-12 19:58:06 +00:00
Craig Topper	d43f58231c	[X86][FastISel] Choose EVEX instructions when possible when lowering x86_sse_cvttss2si and similar intrinsics. This should fix a machine verifier error. llvm-svn: 336924	2018-07-12 18:03:56 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Oren Ben Simhon	fdd72fd522	[X86] Added support for nocf_check attribute for indirect Branch Tracking X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET). IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp. The `nocf_check` attribute has two roles in the context of X86 IBT technology: 1. Appertains to a function - do not add ENDBR instruction at the beginning of the function. 2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction. This patch implements `nocf_check` context for Indirect Branch Tracking. It also auto generates `nocf_check` prefixes before indirect branches to jump tables that are guarded by range checks. Differential Revision: https://reviews.llvm.org/D41879 llvm-svn: 327767	2018-03-17 13:29:46 +00:00
Craig Topper	9c098ed819	[X86] Add back fast-isel code for handling i8 shifts. I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds. This adds back the code. We'll codegen out of bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a strange quirk of x86 that shift amounts are always masked to 5 bits (except for 64-bit shifts). So if the masked value is still out of bounds the result will be 0. Fixes PR36731. llvm-svn: 327540	2018-03-14 17:57:19 +00:00
Hiroshi Inoue	c8e9245816	[NFC] fix trivial typos in comments and documents "to to" -> "to" llvm-svn: 323628	2018-01-29 05:17:03 +00:00
Chandler Carruth	c58f2166ab	Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took in the speculative execution, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. 
It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` in addition to `-mretpoline`. In this case, on x86-64 the thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. 
These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile all libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually applying similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. 
If you need to deploy these techniques in C++ applications, we strongly recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155	2018-01-22 22:05:25 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. 
s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void 
@llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. 
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Saleem Abdulrasool	05e285bcc5	FastISel: support no-PLT PIC calls on ELF x86_64 Add support for properly handling PIC code with no-PLT. This equates to `-fpic -fno-plt -O0` with the clang frontend. External functions are marked with nonlazybind, which must then be indirected through the GOT. This allows code to be built without optimizations in PIC mode without going through the PLT. Addresses PR35653! llvm-svn: 320776	2017-12-15 00:32:09 +00:00
Craig Topper	a0be5a06c1	[X86] Rename some instructions that start with Int_ to have the _Int at the end. This matches AVX512 version and is more consistent overall. And improves our scheduler models. In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses. llvm-svn: 320325	2017-12-10 19:47:56 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Craig Topper	5ae677e102	[X86] Add 64-bit int to float/double conversion with AVX to X86FastISel::X86SelectSIToFP Summary: [X86] Teach fast isel to handle i64 sitofp with AVX. For some reason we only handled i32 sitofp with AVX. But with SSE only we support i64 so we should do the same with AVX. Also add i686 command lines for the 32-bit tests. 64-bit tests are in a separate file to avoid a fast-isel abort failure in 32-bit mode. Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39450 llvm-svn: 317102	2017-11-01 16:23:06 +00:00
Craig Topper	a827f84dcc	[X86] Add AVX512 support to X86FastISel::fastMaterializeFloatZero. llvm-svn: 317059	2017-11-01 00:47:45 +00:00
Craig Topper	668b1ab6f1	[X86] Clang-format some code. NFC llvm-svn: 316973	2017-10-31 02:34:29 +00:00
Craig Topper	9f01f6093c	[X86] Add AVX512 support to fast isel's X86ChooseCmpOpcode. llvm-svn: 316955	2017-10-30 21:09:19 +00:00
Craig Topper	367cc12fa9	[X86] Remove AVX512 early out from X86FastISel::X86SelectCmp. This shouldn't be needed anymore since i1 isn't a legal type. llvm-svn: 316912	2017-10-30 14:50:11 +00:00
Craig Topper	912f3b8e4b	[X86] Use the extended vector register classes in fast isel with AVX512F/VL. llvm-svn: 316857	2017-10-29 05:14:26 +00:00
Craig Topper	5f2289a13c	[X86] Add AVX512 support to X86FastISel::X86SelectFPExt and X86FastISel::X86SelectFPTrunc. llvm-svn: 316856	2017-10-29 02:50:31 +00:00
Craig Topper	1e30d783dd	[X86] Add AVX512 support to X86FastISel::X86MaterializeFP llvm-svn: 316853	2017-10-29 02:18:41 +00:00
Craig Topper	202b559ae0	[X86] Replace some default cases in X86SelectShift with llvm_unreachable. llvm-svn: 316839	2017-10-28 19:56:56 +00:00
Craig Topper	f8b92661b8	[X86] Remove unneeded MVT::i1 related code from fast isel. llvm-svn: 316825	2017-10-28 05:52:23 +00:00
Craig Topper	d69453290e	[X86] Remove fast-isel code for handling i8 shifts. This is handled by auto generated code. llvm-svn: 316797	2017-10-27 21:00:59 +00:00
Craig Topper	728fa7b4e2	[X86] Teach fastisel to use VLX VMOVNTDQA for v4f64 and 256-bit integers when available. This looks to have been missed from r280682. llvm-svn: 316790	2017-10-27 20:13:10 +00:00
Craig Topper	4f8656a7af	[X86] Enable extended comparison predicate support for SETUEQ/SETONE when targeting AVX instructions. We believe that, despite AMD's documentation, they really do support all 32 comparison predicates under AVX. Differential Revision: https://reviews.llvm.org/D38609 llvm-svn: 315201	2017-10-09 01:05:15 +00:00
Craig Topper	d37625859a	[X86] Fix copy pasto in X86FastISel::fastEmitInst_rrrr. The 4th operand was not being constrained and the third operand was being constrained twice. llvm-svn: 314648	2017-10-02 05:46:53 +00:00
Craig Topper	b7e4c94c6c	[X86] Fix register class name in a comment. NFC llvm-svn: 314250	2017-09-26 21:35:11 +00:00
Craig Topper	e92327e236	[X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bit This is similar to D37843, but for sub_8bit. This fixes all of the patterns except for the 2 that emit only an EXTRACT_SUBREG. That causes a verifier error with global isel because global isel doesn't know to issue the ABCD when doing this extract on 32-bits targets. Differential Revision: https://reviews.llvm.org/D37890 llvm-svn: 313558	2017-09-18 19:21:21 +00:00
Craig Topper	619b759a57	[X86] Teach fastisel to handle zext/sext i8->i16 and sext i1->i8/i16/i32/i64 Summary: ZExt and SExt from i8 to i16 aren't implemented in the autogenerated fast isel table because normal isel does a zext/sext to 32-bits and a subreg extract to avoid a partial register write or false dependency on the upper bits of the destination. This means without handling in fast isel we end up triggering a fast isel abort. We had no custom sign extend handling at all so while I was there I went ahead and implemented sext i1->i8/i16/i32/i64 which was also missing. This generates an i1->i8 sign extend using a mask with 1, then an 8-bit negate, then continues with a sext from i8. A better sequence would be a wider and/negate, but would require more custom code. Fast isel tests are a mess and I couldn't find a good home for the tests so I created a new one. The test pr34381.ll had to have fast-isel removed because it was relying on a fast isel abort to hit the bug. The test case still seems valid with fast-isel disabled though some of the instructions changed. Reviewers: spatel, zvi, igorb, guyblank, RKSimon Reviewed By: guyblank Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37320 llvm-svn: 312422	2017-09-02 18:53:46 +00:00
Craig Topper	ead876b8e8	[X86] Remove some code from fast isel that is no longer needed with i1 being an illegal type. llvm-svn: 312190	2017-08-30 23:05:54 +00:00
Craig Topper	f7ae101d95	[X86] Remove unneeded AVX512 check from fast isel. This is no longer necessary now that i1 is illegal. llvm-svn: 312146	2017-08-30 18:08:58 +00:00
Craig Topper	2374de420b	[AVX512] Remove leftover code for when i1 was a legal type from the fast isel load/store code. Summary: I don't think we need this code anymore. It only existed because i1 used to be legal. There's probably more unneeded code in fast isel still. Reviewers: guyblank, zvi Reviewed By: guyblank Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36652 llvm-svn: 310843	2017-08-14 15:28:47 +00:00
Reid Kleckner	7662d50d10	[X86] Teach fastisel to select calls to dllimport functions Summary: Direct calls to dllimport functions are very common on Windows. We should add them to the -O0 fast path. Reviewers: rafael Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D36197 llvm-svn: 310152	2017-08-05 00:10:43 +00:00
Martin Storsjo	2f24e93481	[AArch64] Extend CallingConv::X86_64_Win64 to AArch64 as well Rename the enum value from X86_64_Win64 to plain Win64. The symbol exposed in the textual IR is changed from 'x86_64_win64cc' to 'win64cc', but the numeric value is kept, keeping support for old bitcode. Differential Revision: https://reviews.llvm.org/D34474 llvm-svn: 308208	2017-07-17 20:05:19 +00:00
Davide Italiano	a63981aaa9	[X86/FastIsel] Fall-back to SelectionDAG when lowering soft-floats. FastIsel can't handle them, so we would end up crashing during register class selection. Fixes PR26522. Differential Revision: https://reviews.llvm.org/D35272 llvm-svn: 307797	2017-07-12 15:26:06 +00:00
Simon Pilgrim	f7113fd270	[X86][AVX1] Split 256-bit vector non-temporal FastISel loads to keep it non-temporal (PR32744) Extension to D33728 llvm-svn: 304798	2017-06-06 14:18:39 +00:00
Guy Blank	548e22a1a7	[X86][AVX512] Make i1 illegal in the CodeGen This patch defines the i1 type as illegal in the X86 backend for AVX512. For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended. This should produce better scalar code for i1 types since GPRs will be used instead of mask registers. Differential Revision: https://reviews.llvm.org/D32273 llvm-svn: 303421	2017-05-19 12:35:15 +00:00
Igor Breger	db75455990	[X86] Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp. NFC Summary: Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp so it can be used by the GlobalISel instruction selector. This is a pre-commit for a patch I'm working on to support G_ICMP. NFC. Reviewers: zvi, guyblank, delena Reviewed By: guyblank, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33038 llvm-svn: 302767	2017-05-11 06:36:37 +00:00
Serge Pavlov	d526b13e61	Add extra operand to CALLSEQ_START to keep frame part set up previously Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. 
Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527	2017-05-09 13:35:13 +00:00
Oren Ben Simhon	dbd4bba1ec	[X86] Support of no_caller_saved_registers attribute This patch implements the LLVM part for no_caller_saved_registers attribute as appears here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5ed3cc7b66af4758f7849ed6f65f4365be8223be. In order to implement the attribute, we use the dynamic CSR mechanism to remove returned/passed arguments from the function regmask/CSR list. Differential Revision: https://reviews.llvm.org/D31876 llvm-svn: 302020	2017-05-03 13:07:19 +00:00
Reid Kleckner	6652a52e2b	Use Argument::hasAttribute and AttributeList::ReturnIndex more This eliminates many extra 'Idx' induction variables in loops over arguments in CodeGen/ and Target/. It also reduces the number of places where we assume that ReturnIndex is 0 and that we should add one to argument numbers to get the corresponding attribute list index. NFC llvm-svn: 301666	2017-04-28 18:37:16 +00:00
Krzysztof Parzyszek	44e25f37ae	Move size and alignment information of regclass to TargetRegisterInfo 1. RegisterClass::getSize() is split into two functions: - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const; - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const; 2. RegisterClass::getAlignment() is replaced by: - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const; This will allow making those values depend on subtarget features in the future. Differential Revision: https://reviews.llvm.org/D31783 llvm-svn: 301221	2017-04-24 18:55:33 +00:00
Reid Kleckner	fb502d2f5e	[IR] Make paramHasAttr to use arg indices instead of attr indices This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern. Previously we were testing return value attributes with index 0, so I introduced hasReturnAttr() for that use case. llvm-svn: 300367	2017-04-14 20:19:02 +00:00
Simon Pilgrim	b6702eaec3	[X86][MMX] Add fast-isel support for MMX non-temporal writes Differential Revision: https://reviews.llvm.org/D31754 llvm-svn: 299852	2017-04-10 16:58:07 +00:00
Craig Topper	3001b35189	[AVX-512] Fix bad comment from r299112. NFC llvm-svn: 299114	2017-03-30 21:05:33 +00:00
Craig Topper	533b1bde1b	[AVX-512] Fix another case where fastisel was generating a GR8 to VK1 copy. This time after calls returning i1. Fixes PR32472. llvm-svn: 299112	2017-03-30 21:02:52 +00:00
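
The i8 shift commit in the table above (9c098ed819) notes that x86 masks shift amounts to 5 bits for non-64-bit shifts, so an out-of-bounds i8 shift behaves like a shift by `(amount & 0x1f)` and produces 0 whenever the masked amount is still 8 or more. The following is only a rough C++ model of that arithmetic, not the FastISel code itself; the function name `shl_i8_x86` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an 8-bit left shift as x86 hardware performs it.
// This is an illustration of the masking quirk, not LLVM code.
inline uint8_t shl_i8_x86(uint8_t value, uint8_t amount) {
    // The hardware keeps only the low 5 bits of the count for
    // 8-, 16- and 32-bit operands.
    unsigned count = amount & 0x1fu;
    // Shift in a 32-bit type so that counts 8..31 are well defined in C++,
    // then truncate back to 8 bits. Counts of 8 or more push every bit out.
    return static_cast<uint8_t>(static_cast<uint32_t>(value) << count);
}
```

Under this model a shift by 8 through 31 yields 0, while a shift by 32 wraps to a count of 0 and returns the value unchanged, which is why the masked amount rather than the original amount decides the result.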
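
The zext/sext commit in the table above (619b759a57) describes sign-extending an i1 by masking with 1, doing an 8-bit negate, and then continuing with a sign extension from i8. The sketch below models that sequence with ordinary integer operations so the bit patterns are easy to follow; it is an assumption-laden illustration, and the helper name `sext_i1_to_i32` is hypothetical rather than anything in X86FastISel.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the described sequence: and with 1, 8-bit negate,
// then sign-extend the i8 result to the wider type.
inline int32_t sext_i1_to_i32(uint8_t reg) {
    // Only bit 0 carries the i1 value; the upper bits may hold garbage.
    uint8_t masked = static_cast<uint8_t>(reg & 1u);
    // 8-bit negate: 0 stays 0x00, 1 becomes 0xFF.
    uint8_t negated = static_cast<uint8_t>(0u - masked);
    // Reinterpret as signed i8 (two's complement), then widen to i32.
    return static_cast<int32_t>(static_cast<int8_t>(negated));
}
```

A true i1 therefore becomes -1 (all bits set) and a false i1 becomes 0, regardless of what the upper seven bits of the source register contained.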

604 Commits