The PATCHPOINT instructions have a single optional defined register operand,
but the machine verifier can't verify the optional defined register operands.
This commit makes sure that the machine verifier won't report an error when a
PATCHPOINT instruction doesn't have its optional defined register operand.
This change will allow us to enable the machine verifier for the code
generation tests for the patchpoint intrinsics.
Reviewers: Juergen Ributzka
llvm-svn: 244513
Summary:
This makes it so that reports symbolized after the fact with
llvm-symbolizer are more similar to the ones we generate at runtime with
in-process dbghelp.
Reviewers: samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11785
llvm-svn: 244512
With this we finally have an ELFFile that is O(1) to construct. This is helpful
for programs like lld which have to do their own section walk.
llvm-svn: 244510
frame setup instruction.
This commit ensures that the stack map lowering code in FastISel adds an
appropriate number of immediate operands to the frame setup instruction.
The previous code added just one immediate operand, which was fine for a target
like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit
operands. This caused the machine verifier to report an error when the old code
added just one.
Reviewers: Juergen Ributzka
Differential Revision: http://reviews.llvm.org/D11853
llvm-svn: 244508
NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (priviledged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS, this commit now generated LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF.
As with the previous patch this code generation is pretty bad because it occurs very later, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire.
I did [[ https://github.com/jfbastien/benchmark-x86-flags | a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:
| Time per call (ms) | Runtime (ms) | Benchmark |
| 0.000012514 | 6257 | sete.i386 |
| 0.000012810 | 6405 | sete.i386-fast |
| 0.000010456 | 5228 | sete.x86-64 |
| 0.000010496 | 5248 | sete.x86-64-fast |
| 0.000012906 | 6453 | lahf-sahf.i386 |
| 0.000013236 | 6618 | lahf-sahf.i386-fast |
| 0.000010580 | 5290 | lahf-sahf.x86-64 |
| 0.000010304 | 5152 | lahf-sahf.x86-64-fast |
| 0.000028056 | 14028 | pushf-popf.i386 |
| 0.000027160 | 13580 | pushf-popf.i386-fast |
| 0.000023810 | 11905 | pushf-popf.x86-64 |
| 0.000026468 | 13234 | pushf-popf.x86-64-fast |
Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seems to be worth teaching LLVM about individual flags, at least not for this purpose.
Reviewers: rnk, jvoung, t.p.northover
Subscribers: llvm-commits
Differential revision: http://reviews.llvm.org/D6629
llvm-svn: 244503
When clang is built with -DLLVM_ENABLE_ASSERTIONS=Off,
it does not create names for IR values.
Differential Revision: http://reviews.llvm.org/D11437
llvm-svn: 244502
This allows emitting kernels that were instantiated from the host code
and which would never be explicitly referenced otherwise.
Differential Revision: http://reviews.llvm.org/D11666
llvm-svn: 244501
The main purpose is to avoid errors and warnings while parsing CUDA
header files. The attributes are currently unused otherwise.
Differential version: http://reviews.llvm.org/D11690
llvm-svn: 244497
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.
Differential Revision: http://reviews.llvm.org/D11886
llvm-svn: 244495
With this patch clang appends the command line options that would allow vectorization when floating-point commutativity is required. Specifically those are enabling fast-math or specifying a loop hint.
llvm-svn: 244492
This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options would that allow floating-point commutativity. Specifically those are enableing fast-math or specifying a loop hint.
llvm-svn: 244489
Summary:
The vtable takes its DLL storage class from the class, not the key
function. When they disagree, the vtable won't be exported by the DLL
that defines the key function. The easiest way to ensure that importers
of the class emit their own vtable is to say that the class has no key
function.
Reviewers: hans, majnemer
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D11913
llvm-svn: 244488
Sometimes interleaving is not beneficial, as determined by the cost-model and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done.
llvm-svn: 244485
The LDD/STD instructions can load/store a 64bit quantity from/to
memory to/from a consecutive even/odd pair of (32-bit) registers. They
are part of SparcV8, and also present in SparcV9. (Although deprecated
there, as you can store 64bits in one register).
As recommended on llvmdev in the thread "How to enable use of 64bit
load/store for 32bit architecture" from Apr 2015, I've modeled the
64-bit load/store operations as working on a v2i32 type, rather than
making i64 a legal type, but with few legal operations. The latter
does not (currently) work, as there is much code in llvm which assumes
that if i64 is legal, operations like "add" will actually work on it.
The same assumption does not hold for v2i32 -- for vector types, it is
workable to support only load/store, and expand everything else.
This patch:
- Adds a new register class, IntPair, for even/odd pairs of registers.
- Modifies the list of reserved registers, the stack spilling code,
and register copying code to support the IntPair register class.
- Adds support in AsmParser. (note that in asm text, you write the
name of the first register of the pair only. So the parser has to
morph the single register into the equivalent paired register).
- Adds the new instructions themselves (LDD/STD/LDDA/STDA).
- Hooks up the instructions and registers as a vector type v2i32. Adds
custom legalizer to transform i64 load/stores into v2i32 load/stores
and bitcasts, so that the new instructions can actually be
generated, and marks all operations other than load/store on v2i32
as needing to be expanded.
- Copies the unfortunate SelectInlineAsm hack from ARMISelDAGToDAG.
This hack undoes the transformation of i64 operands into two
arbitrarily-allocated separate i32 registers in
SelectionDAGBuilder. and instead passes them in a single
IntPair. (Arbitrarily allocated registers are not useful, asm code
expects to be receiving a pair, which can be passed to ldd/std.)
Also adds a bunch of test cases covering all the bugs I've added along
the way.
Differential Revision: http://reviews.llvm.org/D8713
llvm-svn: 244484
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.
llvm-svn: 244481
cl uses 'CL' and '_CL_' to prepend and append command line options to
the given argument vector. There is an additional quirk whereby '#' is
transformed into '='.
Differential Revision: http://reviews.llvm.org/D11896
llvm-svn: 244473
Previously all test output was reported by each individual
instance of dotest.py. After a recent patch, dosep gets dotest
outptu via a pipe, and selectively decides which output to
print.
This breaks certain scripts which rely on having full output
of each dotest instance to do various parsing and/or log-scraping.
While we make no promises about the format of dotest output, it's
easy to restore this to the old behavior for now, although it is
behind a flag. To re-enable full output, run dosep.py with the -s
option.
Differential Revision: http://reviews.llvm.org/D11816
Reviewed By: Chaoren Lin
llvm-svn: 244469
These changes are for Android x86_64 targets to be compatible
with current Android g++ and conform to AMD64 ABI.
https://llvm.org/bugs/show_bug.cgi?id=23897
* Return type of long double (fp128) should be fp128, not x86_fp80.
* Vararg of long double (fp128) could be in register and overflowed to memory.
https://llvm.org/bugs/show_bug.cgi?id=24111
* Return value of long double (fp128) _Complex should be in memory like a structure of {fp128,fp128}.
Differential Revision: http://reviews.llvm.org/D11437
llvm-svn: 244468
This change adds the new unroll metadata "llvm.loop.unroll.enable" which directs
the optimizer to unroll a loop fully if the trip count is known at compile time, and
unroll partially if the trip count is not known at compile time. This differs from
"llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not
known at compile time
With this change "#pragma unroll" generates "llvm.loop.unroll.enable" rather than
"llvm.loop.unroll.full" metadata. This changes the semantics of "#pragma unroll" slightly
to mean "unroll aggressively (fully or partially)" rather than "unroll fully or not at all".
The motivating example for this change was some internal code with a loop marked
with "#pragma unroll" which only sometimes had a compile-time trip count depending
on template magic. When the trip count was a compile-time constant, everything works
as expected and the loop is fully unrolled. However, when the trip count was not a
compile-time constant the "#pragma unroll" explicitly disabled unrolling of the loop(!).
Removing "#pragma unroll" caused the loop to be unrolled partially which was desirable
from a performance perspective.
llvm-svn: 244467
This change adds the unroll metadata "llvm.loop.unroll.enable" which directs
the optimizer to unroll a loop fully if the trip count is known at compile time, and
unroll partially if the trip count is not known at compile time. This differs from
"llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not
known at compile time.
The "llvm.loop.unroll.enable" is intended to be added for loops annotated with
"#pragma unroll".
llvm-svn: 244466