Summary:
Based on Simon's D52965, but improved to handle strict fp and improve some of the shuffling.
Rather than use v2i1/v4i1 and let type legalization continue, just generate all the code with legal types and use an explicit shuffle.
I also added an explicit setcc to the v4i64 code to match the semantics of vselect which doesn't just use the sign bit. I'm also using a v4i64->v4i32 truncate instead of the shuffle in Simon's original code. With the setcc this will become a pack.
Future work can look into using X86ISD::BLENDV and a different shuffle that only moves the sign bit.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71956
Summary:
In addMustTailToCoroResumes, we set musttail on those resume instructions that are followed by a ret instruction. This is done by simplifyTerminatorLeadingToRet which replace a sequence of branches leading to a ret with a clone of the ret.
However it forgets to remove corresponding PHI values that come from basic block of replaced branch, and may cause jumpthreading pass hangs (https://bugs.llvm.org/show_bug.cgi?id=43720)
This patch fix this issue
Test Plan:
cppcoro library with O3+flto
check-llvm
Reviewers: modocache, GorNishanov, lewissbaker
Reviewed By: modocache
Subscribers: mehdi_amini, EricWF, hiraditya, dexonsmith, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71826
Patch by junparser (JunMa)!
This simplifies the generic interface and also makes SHF_ARM_PURECODE
more robust (fixes a TODO). Inspecting MCDataFragment contents covers
more cases than MCObjectStreamer::EmitBytes.
Linux' current addLibCxxIncludePaths and addLibStdCxxIncludePaths
are actually almost non-Linux-specific at all, and can be reused
almost as such for all gcc toolchains. Only keep
Android/Freescale/Cray hacks in Linux's version.
Patch by sthibaul (Samuel Thibault)
Differential Revision: https://reviews.llvm.org/D69758
Attempt to use combineLogicBlendIntoConditionalNegate for (select M, (sub 0, X), X) -> (sub (xor X, M), M)
We limit this to cases that can't easily replace the VSELECT with a shuffle (non-constant masks) or where a BLENDV is likely to occur (which tends to result in slower codegen).
This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing
the target independent code to handle more folds in more situations (for
example if the fast math flags are present, but the global
AllowFPOpFusion option isnt). It also splits apart the HasSlowFPVMLx
into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately
if needed.
Differential Revision: https://reviews.llvm.org/D72139
The code here isn't great in all caess. Particularly v4f64->v4i32
on 64-bit AVX targets. But there is some improvement in some
configurations.
There's definitely some issues with computeNumSignBits with
X86ISD::STRICT_FCMP. As well as not being able to propagate sign
bits through merge_values nodes that get created during custom
legalization.
Previously, for vectors we created a vselect with a condition that
didn't match what the target wanted according to getSetCCResultType.
To make up for this, X86 had a special DAG combine to detect if
the condition was all sign bits and then insert its own truncate
or extend. By adding the extend/truncate here explicitly we can
avoid that.
ExpandStrictFPOp calls ExpandUINT_TO_FLOAT. Previously, ExpandUINT_TO_FLOAT
returned SDValue() if it wasn't able to handle and needed to unroll.
Then ExpandStrictFPOp would detect his SDValue() and do the unroll.
After this change, ExpandUINT_TO_FLOAT will directly call
UnrollStrictFPOp and return the unrolled result.
```
lld/ELF/Relocations.cpp:1622:56: warning: loop variable 'ts' of type 'const std::pair<ThunkSection *, uint32_t>' (aka 'const pair<lld:🧝:ThunkSection *, unsigned int>') creates a copy from type 'const std::pair<ThunkSection *, uint32_t>' [-Wrange-loop-analysis]
for (const std::pair<ThunkSection *, uint32_t> ts : isd->thunkSections)
```
Drop const qualifier to fix -Wrange-loop-analysis.
We can make -Wrange-loop-analysis warnings (DiagnoseForRangeConstVariableCopies) on `const A` more
permissive on more types (e.g. POD -> trivially copyable), unfortunately it will not make std::pair
good, because `constexpr pair& operator=(const pair& p);` is unfortunately user-defined.
Reviewed By: Mordante
Differential Revision: https://reviews.llvm.org/D72211
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
This produces more intelligible looking results, more comparabble to
the DAG output in the simplest cases. This is probably wrong in
complex control flow, but RegBankSelect doesn't attempt analyzing if
this is on a masked path for selecting the bank yet.
We're checking the current register bank of the registers in the
instruction, but the mapping may have inserted cross bank copies and
is expecting to replace the registers.
We mostly get away with this currently, because VGPR->SGPR copies are
illegal, and we assume this won't happen. In a future change, we'll
start relying on more cross register bank copies being inserted, and
this starts to break down.
I would think it's better than having two practically identical folds
next to eachother, but then generalization isn't all that pretty
due to the fact that we need to produce different `sub` each time..
This change is no-functional-changes-intended refactoring.
Summary:
For artificial cases (huge array, few usages), Global SRA optimization creates
a lot of redundant data. It creates an instance of GlobalVariable for each array
element. For huge array, that means huge compilation time and huge memory usage.
Following example compiles for 10 minutes and requires 40GB of memory.
namespace {
char LargeBuffer[64 * 1024 * 1024];
}
int main ( void ) {
LargeBuffer[0] = 0;
printf("\n ");
return LargeBuffer[0] == 0;
}
The fix is to avoid Global SRA for large arrays.
Reviewers: craig.topper, rnk, efriedma, fhahn
Reviewed By: rnk
Subscribers: xbolva00, lebedev.ri, lkail, merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71993
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
In particular this helps remove some unnecessary scalar->vector->scalar patterns.
The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
Reapplied after reversion at rL368660 due to PR42982 which was fixed at rGca7fdd41bda0.
Differential Revision: https://reviews.llvm.org/D65887
Both MS link.exe and GNU ld.bfd handle it this way; one can have
multiple object files defining the same absolute symbols, as long
as it defines it to the same value. But if there are multiple absolute
symbols with differing values, it is treated as an error.
Differential Revision: https://reviews.llvm.org/D71981
This represents a more realistic version of the code being tested.
The cmov converter doesn't look at the code after the loop so
it doesn't matter for what's being tested.
But as noted in this twitter thread https://twitter.com/trav_downs/status/1213311159413161987
gcc can turn the previous MaxIndex code into the MaxValue code. So
returning the index makes it a distinct case.
All the windows bots are failing match-tree.td and there's no obvious cause that
I can see. It's not just the %p formatting problem. My best guess is that
there's an ordering issue too but I'll need further information to figure that
out. Revert while I'm investigating.
This reverts commit 64f1bb5cd2 and 77d4b5f5fe
Currently, there is no option to delete all the watchpoint without LLDB
asking for a confirmation. Besides making the watchpoint delete command
homogeneous with the breakpoint delete command, this option could also
become handy to trigger automated watchpoint deletion i.e. using
breakpoint actions.
rdar://42560586
Differential Revision: https://reviews.llvm.org/D72096
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>