The XL implementation of vec_round for vector double uses
"round-to-nearest, ties to even" just as the vector float
version does. However, clang and gcc use "round-to-nearest-away"
for vector double and "round-to-nearest, ties to even"
for vector float.
The XL behaviour is implemented under the __XL_COMPAT_ALTIVEC__
macro similarly to other instances of incompatibility.
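To make the behavioural difference concrete, here is a small standalone C++ illustration (not using the altivec intrinsics themselves) of the two rounding rules on a value exactly halfway between two integers; std::nearbyint under the default rounding mode gives "ties to even", while std::round rounds halfway cases away from zero:

  #include <cfenv>
  #include <cmath>
  #include <cstdio>

  int main() {
    std::fesetround(FE_TONEAREST);            // default IEEE mode: ties to even
    std::printf("%g\n", std::nearbyint(2.5)); // prints 2 ("round-to-nearest, ties to even")
    std::printf("%g\n", std::round(2.5));     // prints 3 ("round-to-nearest-away")
  }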
Differential revision: https://reviews.llvm.org/D113642
This patch updates location lists in various x86 tests to reflect what
instruction referencing produces. There are two flavours of change:
* Not following a register copy immediately, because instruction
referencing can make some slightly smarter decisions,
* Extended ranges, due to having additional information.
The register changes aren't that interesting; it's just a choice between
equally legitimate registers that instr-ref makes differently. The extended
ranges are largely due to following stack restores better.
Differential Revision: https://reviews.llvm.org/D114362
The test tries to provoke the internal allocator into being locked during fork
and then forces the child process to use the internal allocator.
This test sometimes deadlocks with the new tsan runtime.
Depends on D114514.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D114515
It was introduced in:
9cffc9550b tsan: allow to force use of __libc_malloc in sanitizer_common
and used in:
512a18e518 tsan: add standalone deadlock detector
and later used for Go support.
But now both uses are gone. Nothing defines SANITIZER_USE_MALLOC.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D114514
Make all code go through FormatTokenSource instead of going around it;
going around it makes changes to the TokenSource brittle.
Add LLVM_DEBUG in FormatTokenSource to be able to follow the token stream.
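As a rough standalone sketch of the kind of tracing LLVM_DEBUG enables (this is not the actual FormatTokenSource code; the helper and debug-type name below are made up), the output only appears in +Asserts builds when -debug or -debug-only=token-trace is passed:

  #include "llvm/ADT/StringRef.h"
  #include "llvm/Support/Debug.h"
  #include "llvm/Support/raw_ostream.h"

  #define DEBUG_TYPE "token-trace"

  // Hypothetical helper: log each token as it is handed out by the source.
  static void traceToken(llvm::StringRef Text) {
    LLVM_DEBUG(llvm::dbgs() << "Next token: " << Text << "\n");
  }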
In these test updates for instruction referencing, I've added specific
instr-ref RUN lines, and kept the DBG_VALUE-based variable location check
lines too. This is because argument handling is really fiddly, and I figure
it's worth duplicating the testing to ensure it's definitely correct.
There's also dbg-value-superreg-copy2.mir, a test for where variable
locations go when virtual registers are coalesced together. I don't think
there's an instruction-referencing-specific test for this, so I have
duplicated that test for instruction referencing.
Differential Revision: https://reviews.llvm.org/D114262
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.
For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.
https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
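As a standalone sanity check (not the DAG-combine code) of the first identity above, using C++20 std::rotl: when the demanded mask has at least as many trailing zeros as the rotate amount, the rotate-left can be replaced by a plain shift-left under that mask:

  #include <bit>
  #include <cassert>
  #include <cstdint>

  int main() {
    const unsigned Amt = 8;
    const uint32_t Demanded = 0xFFFFFF00u; // 8 trailing zeros >= Amt
    for (uint32_t X : {0u, 1u, 0xDEADBEEFu, 0xFFFFFFFFu})
      assert((std::rotl(X, Amt) & Demanded) == ((X << Amt) & Demanded));
  }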
Differential Revision: https://reviews.llvm.org/D114354
Select SelectionDAG ops smul_lohi/umul_lohi to
v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0.
v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions
so it is better to use one instruction than two.
Further improvements are possible to make better use of the addend
operand, but this is already a strict improvement over what we have
now.
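For reference, a scalar C++ model (not GPU code) of the equivalence being exploited: a 32x32->64 multiply-add with a zero addend yields both halves of the full product, which is exactly what smul_lohi/umul_lohi ask for:

  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t A = 0xDEADBEEFu, B = 0x12345678u;
    uint64_t Mad = (uint64_t)A * B + 0;  // v_mad_u64_u32-style: full product plus addend
    uint32_t Lo = (uint32_t)Mad;         // umul_lohi low half
    uint32_t Hi = (uint32_t)(Mad >> 32); // umul_lohi high half
    assert(Lo == (uint32_t)((uint64_t)A * B));
    assert(Hi == (uint32_t)(((uint64_t)A * B) >> 32));
  }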
Differential Revision: https://reviews.llvm.org/D113986
Change VOP_PAT_GEN to default to not generating an instruction selection
pattern for the VOP2 (e32) form of an instruction, only for the VOP3
(e64) form. This allows SIFoldOperands maximum freedom to fold copies
into the operands of an instruction, before SIShrinkInstructions tries
to shrink it back to the smaller encoding.
This affects the following VOP2 instructions:
v_min_i32
v_max_i32
v_min_u32
v_max_u32
v_and_b32
v_or_b32
v_xor_b32
v_lshr_b32
v_ashr_i32
v_lshl_b32
A further cleanup could simplify or remove VOP_PAT_GEN, since its
optional second argument is never used.
Differential Revision: https://reviews.llvm.org/D114252
Implicit derivatives are only valid in pixel shaders, hence WQM is only
implicitly enabled for pixel shaders.
This avoids unintended WQM in other shader types (e.g. compute)
when image sampling instructions are used.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114414
This is similar to D113574, but as a DAG combine, not tablegen patterns.
Doing the fold as a DAG combine allows the fadd to be folded with a
fmul, finally producing a predicated vfma. It performs the same fold of
fadd(x, vselect(p, y, -0.0)) to vselect(p, (fadd x, y), x), using -0.0 as
the identity value of a fadd.
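A standalone illustration (not the combine itself) of why -0.0, and not +0.0, is the identity value of a fadd: adding -0.0 preserves every input, including the sign of a zero, whereas adding +0.0 would turn -0.0 into +0.0:

  #include <cassert>
  #include <cmath>

  int main() {
    assert(!std::signbit(0.0f + -0.0f)); // +0.0 + -0.0 == +0.0, unchanged
    assert(std::signbit(-0.0f + -0.0f)); // -0.0 + -0.0 == -0.0, unchanged
    assert(!std::signbit(-0.0f + 0.0f)); // +0.0 would NOT work: the sign of -0.0 flips
  }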
Differential Revision: https://reviews.llvm.org/D113584
In quite a few places we were calling getCurSDLoc() to get the debug
location, but it is already available in the local variable `sdl`.
Differential Revision: https://reviews.llvm.org/D114447
It appears that we can emit all the instructions for a function, including
debug instructions, and then optimise some of the values out late.
Specifically, in the attached test case, an argument gets optimised out
after DBG_VALUE / DBG_INSTR_REFs are created. This confuses
MachineFunction::finalizeDebugInstrRefs, which expects to be able to find a
defining instruction, and crashes instead.
Fix this by identifying when there's no defining instruction, and
translating that instead into a DBG_VALUE $noreg.
Differential Revision: https://reviews.llvm.org/D114476
Similar to D84091 which added extra predicated folds for integer operations
using the identity element of the operation, this adds them for floating
point operations for the form `BinOp(x, select(p, y, Identity))`. They are
folded back to predicated versions of the operator, with fadd having the
identity -0.0, fsub using the identity 0.0 and fmul using 1.0.
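As a quick standalone check (not the fold itself) that these really are identities even for signed zeros:

  #include <cassert>
  #include <cmath>

  int main() {
    assert(std::signbit(-0.0f - 0.0f)); // fsub identity 0.0: -0.0 - 0.0 == -0.0
    assert(1.0f - 0.0f == 1.0f);
    assert(std::signbit(-0.0f * 1.0f)); // fmul identity 1.0: -0.0 * 1.0 == -0.0
    assert(42.0f * 1.0f == 42.0f);
  }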
Differential Revision: https://reviews.llvm.org/D113574
Test a size larger than clear_shadow_mmap_threshold,
which is handled differently.
Depends on D114348.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D114366
This patch adds parallel processing of chunks. When reducing very large
inputs, e.g. functions with 500k basic blocks, processing chunks in
parallel can significantly speed up the reduction.
To allow modifying clones of the original module in parallel, each clone
needs its own LLVMContext object. To achieve this, each job parses the
input module with its own LLVMContext. If a job successfully
reduces the input, it serializes the result module as bitcode into a
result array.
To ensure parallel reduction produces the same results as serial
reduction, only the first successfully reduced result is used, and
results of other successful jobs are dropped. Processing resumes after
the chunk that was successfully reduced.
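A rough sketch (not the llvm-reduce code; `tryReduceChunk` and the exact flow are assumptions) of the per-job pattern described above: every worker owns a private LLVMContext, re-parses the input into it, mutates its own clone, and hands back bitcode:

  #include "llvm/Bitcode/BitcodeWriter.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Module.h"
  #include "llvm/IRReader/IRReader.h"
  #include "llvm/Support/SourceMgr.h"
  #include "llvm/Support/raw_ostream.h"
  #include <memory>
  #include <string>

  static bool tryReduceChunk(llvm::Module &M); // hypothetical reduction step

  static std::string runOneJob(llvm::StringRef InputPath) {
    llvm::LLVMContext Ctx; // private context, safe to use concurrently with other jobs
    llvm::SMDiagnostic Err;
    std::unique_ptr<llvm::Module> M = llvm::parseIRFile(InputPath, Err, Ctx);
    if (!M || !tryReduceChunk(*M))
      return std::string(); // unsuccessful jobs produce no result and are dropped
    std::string Bitcode;
    llvm::raw_string_ostream OS(Bitcode);
    llvm::WriteBitcodeToFile(*M, OS); // serialize the reduced clone for the caller
    OS.flush();
    return Bitcode;
  }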
The number of threads to use can be configured using the -j option.
It defaults to 1, which means serial processing.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D113857
This patch updates the cost model for ordered reductions so that a call
to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction
plus the cost of an ordered fadd reduction.
Differential Revision: https://reviews.llvm.org/D111630
In-loop vector reductions which use the llvm.fmuladd intrinsic involve
the creation of two recipes: a VPReductionRecipe for the fadd and a
VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags,
these should be propagated through to the fmul instruction, so an
interface setFastMathFlags has been added to the VPInstruction class to
enable this.
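A minimal sketch of what such an interface could look like (member names and the surrounding class shape are assumptions, not the exact VPlan code):

  #include "llvm/IR/Operator.h" // llvm::FastMathFlags

  class VPInstruction /* : public VPRecipeBase, ... */ {
    llvm::FastMathFlags FMF;

  public:
    // Record the fast-math flags of the llvm.fmuladd call so they can be
    // attached to the fmul generated for the in-loop reduction.
    void setFastMathFlags(llvm::FastMathFlags NewFMF) { FMF = NewFMF; }
  };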
Differential Revision: https://reviews.llvm.org/D113125
`isConstRefReturningMethodCall` should consider
`CXXOperatorCallExpr` in addition to `CXXMemberCallExpr`. Clang considers
these to be distinct (`CXXOperatorCallExpr` derives from `CallExpr`, not
`CXXMemberCallExpr`), but we don't care in the context of this
check.
This is important because of
`std::vector<Expensive>::operator[](size_t) const`.
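A user-level illustration of why this matters; assuming the check involved wants to flag unnecessary copies from const-ref-returning calls, the subscript below is such a call, only spelled as a CXXOperatorCallExpr rather than a CXXMemberCallExpr:

  #include <vector>

  struct Expensive { int Data[1024]; }; // costly to copy

  int use(const std::vector<Expensive> &V) {
    // V[0] calls std::vector<Expensive>::operator[](size_t) const, which
    // returns a const Expensive&; initializing E from it makes a copy.
    const auto E = V[0];
    return E.Data[0];
  }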
Differential Revision: https://reviews.llvm.org/D114249
The existing constrained shift PatFrags only dealt with masked shifts
from OpenCL front-ends. This change copies the
X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in
the shift PatFrag predicates.
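A standalone illustration (not the ISel code) of the "unneeded shift mask" idea: when the mask covers every legal shift amount, ANDing it in does not change the result for in-range amounts, so a target whose shift instructions already mask the amount can ignore the AND:

  #include <cassert>
  #include <cstdint>

  static uint32_t shl_masked(uint32_t X, uint32_t Amt) { return X << (Amt & 31); }
  static uint32_t shl_plain(uint32_t X, uint32_t Amt) { return X << Amt; }

  int main() {
    for (uint32_t Amt = 0; Amt < 32; ++Amt)
      assert(shl_masked(0xDEADBEEFu, Amt) == shl_plain(0xDEADBEEFu, Amt));
  }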
Differential Revision: https://reviews.llvm.org/D113448
The attribute 'r' allows (or disallows for the negative case) read-only
sections, i.e. ones without the SHF_WRITE flag, to be assigned to the
memory region. Before the patch, lld could put a section in the wrong
region or fail with "error: no memory region specified for section".
Differential Revision: https://reviews.llvm.org/D113771
Remove duplicate `Pass` suffix from view-op-graph pass class name. The
extra suffix would lead to methods like registerViewOpGraphPassPass
being generated.
Differential Revision: https://reviews.llvm.org/D114459
Also replace ArrayAttr with IndexElementsAttr to model subscript dimensions.
An array of attributes is a sparse, inefficient storage with an API that
requires unpacking/repacking integers at every call site.
Instead, we can store a dense array of integers as an IndexElementsAttr.
Reviewed By: clementval, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D112899
I don't see a reason not to. If we allow looking up functions by full name,
I can change the test case in D113930 to use `lldb-test symbols --find=function --name=full::name --function-flags=full ...`,
though the duplicate method decl problem is still there for `lldb-test symbols --dump-ast`.
That's a separate bug; we can fix it later.
Differential Revision: https://reviews.llvm.org/D114467
Nothing uses more than 8 bits now, so the rest of the headers can store other data.
kStackTraceMax is 256 now, but all sanitizers by default store just 20-30 frames here.
Verified that no diff exists between the previous version, the new version under Python 2, and under Python 3 for an example stack.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D114404
Similarly to what GCC does, we should allow scalars with
the "v" constraint rather than introducing unnecessary
new constraints for scalars in Altivec registers.
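A hypothetical inline-asm sketch of what this enables (the exact instruction and operand modifier are assumptions used for illustration, not from the patch): a scalar double operand can be tied directly to an Altivec register with the existing "v" constraint:

  double vsx_max(double A, double B) {
    double R;
    // xsmaxdp is a VSX scalar instruction; "%x" asks for the full VSX
    // register number of the operand held in an Altivec register.
    __asm__("xsmaxdp %x0, %x1, %x2" : "=v"(R) : "v"(A), "v"(B));
    return R;
  }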
Differential revision: https://reviews.llvm.org/D113635
AMDGPU is unusual in that the stack is indexed in the same
direction as stack growth (up). We therefore always need the emergency
stack slots placed as low as possible to ensure they are in range of
load/store instruction immediate offsets. The existing logic is mostly
OK, but failed if we required stack realignment.
I don't understand what the existing control isFPCloseToIncomingSP is
supposed to mean, but it can only be used to stop placing the scavenge
slots earlier. Make this explicit so that targets can opt in rather
than only opt out.