This patch moves checking the threshold of runtime pointer checks to the vectorization requirements (late diagnostics) and emits a diagnostic that infroms the user the loop would be vectorized if not for exceeding the pointer-check threshold. Clang will also append the options that can be used to allow vectorization.
llvm-svn: 244523
Summary:
For now output using C99's hexadecimal floating-point representation.
This patch also cleans up how machine operands are printed: instead of special-casing per type of machine instruction, the code now handles operands generically.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11914
llvm-svn: 244520
The PATCHPOINT instructions have a single optional defined register operand,
but the machine verifier can't verify the optional defined register operands.
This commit makes sure that the machine verifier won't report an error when a
PATCHPOINT instruction doesn't have its optional defined register operand.
This change will allow us to enable the machine verifier for the code
generation tests for the patchpoint intrinsics.
Reviewers: Juergen Ributzka
llvm-svn: 244513
Summary:
This makes it so that reports symbolized after the fact with
llvm-symbolizer are more similar to the ones we generate at runtime with
in-process dbghelp.
Reviewers: samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11785
llvm-svn: 244512
frame setup instruction.
This commit ensures that the stack map lowering code in FastISel adds an
appropriate number of immediate operands to the frame setup instruction.
The previous code added just one immediate operand, which was fine for a target
like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit
operands. This caused the machine verifier to report an error when the old code
added just one.
Reviewers: Juergen Ributzka
Differential Revision: http://reviews.llvm.org/D11853
llvm-svn: 244508
NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (priviledged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS, this commit now generated LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF.
As with the previous patch this code generation is pretty bad because it occurs very later, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire.
I did [[ https://github.com/jfbastien/benchmark-x86-flags | a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:
| Time per call (ms) | Runtime (ms) | Benchmark |
| 0.000012514 | 6257 | sete.i386 |
| 0.000012810 | 6405 | sete.i386-fast |
| 0.000010456 | 5228 | sete.x86-64 |
| 0.000010496 | 5248 | sete.x86-64-fast |
| 0.000012906 | 6453 | lahf-sahf.i386 |
| 0.000013236 | 6618 | lahf-sahf.i386-fast |
| 0.000010580 | 5290 | lahf-sahf.x86-64 |
| 0.000010304 | 5152 | lahf-sahf.x86-64-fast |
| 0.000028056 | 14028 | pushf-popf.i386 |
| 0.000027160 | 13580 | pushf-popf.i386-fast |
| 0.000023810 | 11905 | pushf-popf.x86-64 |
| 0.000026468 | 13234 | pushf-popf.x86-64-fast |
Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seems to be worth teaching LLVM about individual flags, at least not for this purpose.
Reviewers: rnk, jvoung, t.p.northover
Subscribers: llvm-commits
Differential revision: http://reviews.llvm.org/D6629
llvm-svn: 244503
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.
Differential Revision: http://reviews.llvm.org/D11886
llvm-svn: 244495
This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options would that allow floating-point commutativity. Specifically those are enableing fast-math or specifying a loop hint.
llvm-svn: 244489
Sometimes interleaving is not beneficial, as determined by the cost-model and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done.
llvm-svn: 244485
The LDD/STD instructions can load/store a 64bit quantity from/to
memory to/from a consecutive even/odd pair of (32-bit) registers. They
are part of SparcV8, and also present in SparcV9. (Although deprecated
there, as you can store 64bits in one register).
As recommended on llvmdev in the thread "How to enable use of 64bit
load/store for 32bit architecture" from Apr 2015, I've modeled the
64-bit load/store operations as working on a v2i32 type, rather than
making i64 a legal type, but with few legal operations. The latter
does not (currently) work, as there is much code in llvm which assumes
that if i64 is legal, operations like "add" will actually work on it.
The same assumption does not hold for v2i32 -- for vector types, it is
workable to support only load/store, and expand everything else.
This patch:
- Adds a new register class, IntPair, for even/odd pairs of registers.
- Modifies the list of reserved registers, the stack spilling code,
and register copying code to support the IntPair register class.
- Adds support in AsmParser. (note that in asm text, you write the
name of the first register of the pair only. So the parser has to
morph the single register into the equivalent paired register).
- Adds the new instructions themselves (LDD/STD/LDDA/STDA).
- Hooks up the instructions and registers as a vector type v2i32. Adds
custom legalizer to transform i64 load/stores into v2i32 load/stores
and bitcasts, so that the new instructions can actually be
generated, and marks all operations other than load/store on v2i32
as needing to be expanded.
- Copies the unfortunate SelectInlineAsm hack from ARMISelDAGToDAG.
This hack undoes the transformation of i64 operands into two
arbitrarily-allocated separate i32 registers in
SelectionDAGBuilder. and instead passes them in a single
IntPair. (Arbitrarily allocated registers are not useful, asm code
expects to be receiving a pair, which can be passed to ldd/std.)
Also adds a bunch of test cases covering all the bugs I've added along
the way.
Differential Revision: http://reviews.llvm.org/D8713
llvm-svn: 244484
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.
llvm-svn: 244481
This change adds the unroll metadata "llvm.loop.unroll.enable" which directs
the optimizer to unroll a loop fully if the trip count is known at compile time, and
unroll partially if the trip count is not known at compile time. This differs from
"llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not
known at compile time.
The "llvm.loop.unroll.enable" is intended to be added for loops annotated with
"#pragma unroll".
llvm-svn: 244466
The scalarizer can cache incorrect entries when walking up a chain of
insertelement instructions. This occurs when it encounters more than one
instruction that it is not actively searching for, as it unconditionally caches
every element it finds. The fix is to only cache the first element that it
isn't searching for so we don't overwrite correct entries.
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D11559
llvm-svn: 244448
PR24139 contains an analysis of poor register allocation. One of the findings
was that when calculating the spill weight, a rematerializable interval once
split is no longer rematerializable. This is because the isRematerializable
check in CalcSpillWeights.cpp does not follow the copies introduced by live
range splitting (after splitting, the live interval register definition is a
copy which is not rematerializable).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D11686
llvm-svn: 244439
We can only PHI translate instructions. In our attempt to PHI translate
a bitcast, we attempt to translate its operand; however, the operand
might be an argument or a global instead of an instruction. Benignly
bail out when this happens.
This fixes PR24397.
Differential Revision: http://reviews.llvm.org/D11879
llvm-svn: 244418
The pass adds new kernel arguments for image attributes, and
resolves calls to dummy attribute and resource id getter functions.
Patch by: Zoltan Gilian
llvm-svn: 244372
Summary:
With InstAlias, we don't need to print the _e32 portion of the mnemonic
when we print the $dst operand. This change makes it possible to
include vcc in the asm string when we switch VOPC over to having
implicit vcc defs.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11813
llvm-svn: 244362
Summary: llvm::ConstantFoldTerminator function can convert SwitchInst with single case (and default) to a conditional BranchInst. This patch adds support to preserve make.implicit metadata on this conversion.
Reviewers: sanjoy, weimingz, chenli
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D11841
llvm-svn: 244348
This patch fixes the sse2/avx2 vector shift by constant instcombine call to correctly deal with the fact that the shift amount is formed from the entire lower 64-bit and not just the lowest element as it currently assumes.
e.g.
%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)
In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15) - giving a zero result.
In addition, this review also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821).
Differential Revision: http://reviews.llvm.org/D11760
llvm-svn: 244341
Summary: We were using the SI encoding for VI.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11812
llvm-svn: 244332
In tree they are only used by llvm-readobj, but it is also used by
https://github.com/mono/CppSharp.
While at it, add some missing error checking.
llvm-svn: 244320
llvm-dsymutil has to be able to process debug info produced by other compilers
which use different line table settings. The testcase wasn't generated by
another compiler, but by a modified clang.
llvm-svn: 244319
Summary:
Port the ReconstructShuffle function from AArch64 to ARM
to handle mismatched incoming types in the BUILD_VECTOR
node.
This fixes an outstanding FIXME in the ReconstructShuffle
code.
Reviewers: t.p.northover, rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D11720
llvm-svn: 244314
Summary: WebAssembly's tablegen instructions have the names WebAssembly expects, but by LLVM convention they're uppercase and suffixed with their type after an underscore. Leave the C++ code that way, but print outt he names WebAssembly expects (lowercase, no type). We could teach tablegen to do this later, maybe by using `!cast<string>(node)` in the .td files.
Reviewers: sunfish
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D11776
llvm-svn: 244305
The block address machine operands can reference IR blocks in other functions.
This commit fixes a bug where the references to unnamed IR blocks in other
functions weren't serialized correctly.
llvm-svn: 244299
When we are not emitting the condition for the branch, because the condition is
in another BB or SDAG did the selection for us, then we have to mask the flag in
the register with AND.
This is required when the condition comes from a truncate, because SDAG only
truncates down to a legal size of i32.
This fixes rdar://problem/22161062.
llvm-svn: 244291
This reverts commit r243198 and 243304.
Turns out this wasn't the correct fix for this problem. It works only within
FastISel, but fails when the truncate is selected by SDAG.
llvm-svn: 244287
lld might end up using a small part of this, but it will be in a much
refactored form. For now this unblocks avoiding the full section scan in the
ELFFile constructor.
This also has a (very small) error handling improvement.
llvm-svn: 244282
A dSYM bundle is a file hierarchy that looks slike this:
<bundle name>.dSYM/
Contents/
Info.plist
Resources/
DWARF/
<DWARF file(s)>
This is the default output mode of dsymutil.
llvm-svn: 244270
dsymutil should by default generate dSYM bundles which are filesystem
hierarchies containing the debug info and an additional Info.plist.
Currently llvm-dsymutil emits raw binaries containing the debug info.
This is what we call the 'flat mode'. Add a -f/-flat option that is
supposed to enable that flat mode, but don't wire it for now, only
pass it to the tests that will need it to stay functional once we
do bundle generation by default.
This basically makes this commit NFC and removes the noise from the
actual commit that adds support for bundle generation.
llvm-svn: 244269
Summary: This allows us to consolidate several of the TableGen patterns.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11602
llvm-svn: 244253
points.
There is an infinite loop that can occur in Shrink Wrapping while searching
for the Save/Restore points.
Part of this search checks whether the save/restore points are located in
different loop nests and if so, uses the (post) dominator trees to find the
immediate (post) dominator blocks. However, if the current block does not have
any immediate (post) dominators then this search will result in an infinite
loop. This can occur in code containing an infinite loop.
The modification checks whether the immediate (post) dominator is different from
the current save/restore block. If it is not, then the search terminates and the
current location is not considered as a valid save/restore point for shrink wrapping.
Phabricator: http://reviews.llvm.org/D11607
llvm-svn: 244247
iisUnmovableInstruction() had a list of instructions hardcoded which are
considered unmovable. The list lacked (at least) an entry for the va_arg
and cmpxchg instructions.
Fix this by introducing a new Instruction::mayBeMemoryDependent()
instead of maintaining another instruction list.
Patch by Matthias Braun <matze@braunis.de>.
Differential Revision: http://reviews.llvm.org/D11577
rdar://problem/22118647
llvm-svn: 244244
Summary: Divide the primitive size in bits by eight so the initial load's alignment is in bytes as expected. Tested with the included unit test.
Reviewers: rengolin, jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11804
llvm-svn: 244229
This change improves EmitLoweredSelect() so that multiple contiguous CMOV pseudo
instructions with the same (or exactly opposite) conditions get lowered using a single
new basic-block. This eliminates unnecessary extra basic-blocks (and CFG merge points)
when contiguous CMOVs are being lowered.
Patch by: kevin.b.smith@intel.com
Differential Revision: http://reviews.llvm.org/D11428
llvm-svn: 244202
The COFFSymbolRef::isFunctionDefinition() function tests for several conditions
that are not related to whether a symbol is a function, but rather whether
the symbol meets the requirements for a function definition auxiliary record,
which excludes certain symbols such as internal functions and undefined
references. The test we need to determine the symbol type is much simpler:
we only need to compare the complex type against IMAGE_SYM_DTYPE_FUNCTION.
llvm-svn: 244195
This commit implements the initial serialization of the machine operand target
flags. It extends the 'TargetInstrInfo' class to add two new methods that help
to provide text based serialization for the target flags.
This commit can serialize only the X86 target flags, and the target flags for
the other targets will be serialized in the follow-up commits.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 244185
This reverts commit r244163. The workaround shouldn't be necessary
after r244172, and moreover the commit was slightly buggy as it
dis a simple mkdir without removing the directory first, which could
cause 'File exists' errors.
llvm-svn: 244182
More specifically, make NVPTXISelDAGToDAG able to emit cached loads (LDG) for pointer induction variables.
Also fix latent bug where LDG was not restricted to kernel functions. I believe that this could not be triggered so far since we do not currently infer that a pointer is global outside a kernel function, and only loads of global pointers are considered for cached loads.
llvm-svn: 244166
This option allows to select a subset of the architectures when
performing a universal binary link. The filter is done completely
in the mach-o specific part of the code.
llvm-svn: 244160
Summary:
Emit both DWARF and CodeView if "CodeView" and "Dwarf Version" module
flags are set.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11756
llvm-svn: 244158
This commit serializes the offset for the following operands: target index,
global address, external symbol, constant pool index, and block address.
llvm-svn: 244157
Summary: PR24191 finds that the expected memory-register operations aren't generated when relaxed { load ; modify ; store } is used. This is similar to PR17281 which was addressed in D4796, but only for memory-immediate operations (and for memory orderings up to acquire and release). This patch also handles some floating-point operations.
Reviewers: reames, kcc, dvyukov, nadav, morisset, chandlerc, t.p.northover, pete
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11382
llvm-svn: 244128
The DWARF linker isn't touched by this, the implementation links
individual files and merges them together into a fat binary by
calling out to the 'lipo' utility.
The main change is that the MachODebugMapParser can now return
multiple debug maps for a single binary.
The test just verifies that lipo would be invoked correctly, but
doesn't actually generate a binary. This mimics the way clang
tests its external iplatform tools integration.
llvm-svn: 244087
In PR24288 it was pointed out that the easy case of a non-escaping
global and something that *obviously* required an escape sometimes is
hidden behind PHIs (or selects in theory). Because we have this binary
test, we can easily just check that all possible input values satisfy
the requirement. This is done with a (very small) recursion through PHIs
and selects. With this, the specific example from the PR is correctly
folded by GVN.
Differential Revision: http://reviews.llvm.org/D11707
llvm-svn: 244078