When demangling string literals, Microsoft's undname
simply prints 'string'. This patch implements string
literal demangling while doing a bit better than this
by decoding as much of the string as possible and
trying to faithfully reproduce the original string
literal definition.
This is a bit tricky because the different character
types char, char16_t, and char32_t are not uniquely
identified by the mangling, so we have to use a
heuristic to try to guess the character type. But
it works pretty well, and many tests are added to
illustrate the behavior.
Differential Revision: https://reviews.llvm.org/D50806
llvm-svn: 339892
When we have an MD5 mangled name, we shouldn't choke and say
that it's an invalid name. Even though it's impossible to demangle,
we should just output the original name.
llvm-svn: 339891
Expand the number of cases when `pow(x, 0.5)` is simplified into `sqrt(x)`
by considering the math semantics with more granularity.
Differential revision: https://reviews.llvm.org/D50036
llvm-svn: 339887
While searching through the use-def tree, ignore GetElementPtrInst
instructions because they don't need promoting and neither do their
indices. Otherwise, the wide indices prevent the transformation from
happening.
Differential Revision: https://reviews.llvm.org/D50762
llvm-svn: 339871
When emitting the difference between two symbols, the standard behavior is
that the difference will be resolved to an absolute value if both of the
symbols are offsets from the same data fragment. This is undesirable on
architectures such as RISC-V where relaxation in the linker may cause the
computed difference to become invalid. This caused an issue when compiling to
object code, where the size of a function in the debug information was already
calculated even though it could change as a consequence of relaxation in the
subsequent linking stage.
This patch inhibits the resolution of symbol differences to absolute values
where the target's AsmBackend has declared that it does not want these to be
folded.
Differential Revision: https://reviews.llvm.org/D45773
Patch by Edward Jones.
llvm-svn: 339864
Originally committed in r339755 which was reverted in r339806 due to
an asan issue. The issue was caused by my assumption that operands to
a CallInst mapped to the FunctionType Params. CallInsts are now
handled by iterating over their ArgOperands instead of Operands.
Original Message:
Treat signed icmps as 'sinks', allowing them to be in the use-def
tree, enabling more promotions to be performed. As a sink, any
promoted incoming values need to be truncated before being used by
the signed icmp.
Differential Revision: https://reviews.llvm.org/D50067
llvm-svn: 339858
a shorter name ('x86-slh') for the internal flags and pass name.
Without this, you can't use the -stop-after or -stop-before
infrastructure. I seem to have just missed this when originally adding
the pass.
The shorter name solves two problems. First, the flag names were ...
really long and hard to type/manage. Second, the pass name can't be the
exact same as the flag name used to enable this, and there are already
some users of that flag name so I'm avoiding changing it unnecessarily.
llvm-svn: 339836
Summary:
Profile count of a block is computed by multiplying its block frequency
by entry count and dividing the result by entry block frequency. Do
rounded division in the last step and update test cases appropriately.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50822
llvm-svn: 339835
This patch fixes PR38125.
Instruction extension types are recorded in PromotedInsts, it can be used later in function canGetThrough. If an instruction has two users with different extension types, it will be inserted into PromotedInsts two times in function promoteOperandForOther. The second one overwrites the first one, and the final extension type is wrong, later causes problem in canGetThrough.
This patch changes the simple bool extension type to 2-bit enum type, add a BothExtension type in addition to zero/sign extension. When an user sees BothExtension for an instruction, it actually knows nothing about how that instruction is extended.
Differential Revision: https://reviews.llvm.org/D49512
llvm-svn: 339822
Handle fmul, fsub and preserve flags.
Also really test minnum/maxnum reductions.
The existing tests were only checking from
minnum/maxnum matched from a fast math compare
and select which is not the same.
llvm-svn: 339820
To lower this we now create a new V1 containing the low half of both sources and a new V2 containing the upper half of both sources. Then we created a repeated lane shuffle of those new sources to create the final result.
This fixes PR35833
Differential Revison: https://reviews.llvm.org/D41794
llvm-svn: 339818
If a relocation group doesn't have the RELOCATION_GROUP_HAS_ADDEND_FLAG set, then this implies the group's addend equals zero.
In this case android packed format won't encode an explicit addend delta, instead we need to set Addend, the "previous addend" variable, to zero by ourself.
Patch by Yi-Yo Chiang!
Differential Revision: https://reviews.llvm.org/D50601
llvm-svn: 339799
These correspond to the x86 tests added with rL339790 / rL339791, but I widened
the non-fsin tests to v3f32 to show the problem because AArch supports v2f32 ops.
llvm-svn: 339793
Subregister liveness applies selectively to register classes with certain
properties. Make sure that when it's enabled, it applies to a given virtual
register (in virtual register rewriter).
llvm-svn: 339784
To make ISD::VSELECT available(legal) so long as there are altivec instruction,
otherwise it's default behavior is expanding.
Use xxsel to match vselect if vsx is open, or use vsel.
In order to do not write many patterns in td file, promote (for vector it's
bitcast) all other type into v4i32 and only pattern match vselect of v4i32 into
vsel or xxsel.
Patch by wuzish
Differential revision: https://reviews.llvm.org/D49531
llvm-svn: 339779
This allows to set custom Info field value for SHT_GROUP sections.
It is useful to allow this because we would be able to replace at least one binary
object committed in LLD and replace it with the yaml2obj based test.
Differential revision: https://reviews.llvm.org/D50776
llvm-svn: 339772
We only try to promote types with are smaller than 16-bits, but we
also need to check that the type is not less than 8-bits.
Differential Revision: https://reviews.llvm.org/D50769
llvm-svn: 339770
When trying to combine a DAG that builds a vector out of sign-extensions of
vector extracts, the code assumes legal input types. Due to that, we have to
disable this combine prior to legalization.
In some cases, the DAG will look slightly different after legalization so
account for that in the matching code.
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087
Differential Revision: https://reviews.llvm.org/D49080
llvm-svn: 339769
This patch fixes a regression introduced at revision 338702.
A processor resource mask was incorrectly implicitly truncated to an unsigned
quantity. Later on, the truncated mask was used to initialize an element of a
vector of processor resource descriptors.
On targets with more than 32 processor resources, some elements of the vector
are left uninitialized. As a consequence, this bug might have eventually caused
a crash due to null dereference in the Scheduler.
This patch fixes PR38575, and adds a test for it.
llvm-svn: 339768
Currently, it is possible to use yaml2obj for producing SHT_GROUP sections
of type GRP_COMDAT. For LLD test case I need to produce an object with
a broken (different from GRP_COMDAT) type.
The patch teaches tool to do such things.
Differential revision: https://reviews.llvm.org/D50761
llvm-svn: 339764
This commit adds new sibling-call test cases, so it will be possible to see
how these test cases will be changed after applying D45653.
See D45653 for details.
llvm-svn: 339760
This patch refactors the existing BuildExactSDIV implementation to support non-uniform constant vector denominators.
Differential Revision: https://reviews.llvm.org/D50392
llvm-svn: 339756
Treat signed icmps as 'sinks', allowing them to be in the use-def
tree, enabling more promotions to be performed. As a sink, any
promoted incoming values need to be truncated before being used by
the signed icmp.
Differential Revision: https://reviews.llvm.org/D50067
llvm-svn: 339755
Add pointers to the list of allowed types, but don't try to promote
them. Also fixed a bug with the promotion of undef values, so a new
value is now created instead of mutating in place. We also now only
promote if there's an instruction in the use-def chains other than
the icmp, sinks and sources.
Differential Revision: https://reviews.llvm.org/D50054
llvm-svn: 339754
The `experimental_guard` intrinsic has memory write semantics to model the thread-exiting
logic, but does not do any actual writes to memory. Currently, `AliasSetTracker` treats it as a
normal memory write. As result, a loop-invariant load cannot be hoisted out of loop because
the guard may possibly alias with it.
This patch makes `AliasSetTracker` so that it doesn't treat guards as memory writes.
Differential Revision: https://reviews.llvm.org/D50497
Reviewed By: reames
llvm-svn: 339753
Summary:
This shouldn't have been a specific number but rather a regex. This was
a part of rL339474 which got reverted.
Reviewers: aardappel
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50728
llvm-svn: 339736
Flags in DIBasicType will be used to pass attributes used in
DW_TAG_base_type, such as DW_AT_endianity.
Patch by Chirag Patel!
Differential Revision: https://reviews.llvm.org/D49610
llvm-svn: 339714
Modifies existing SIMD tests to also check that SIMD instructions are
lowered to the expected bytes. This CL depends on D50597.
Reviewers: aheejin
Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D50660
Patch by Thomas Lively (tlively)
llvm-svn: 339712
1) We print __restrict twice on member pointers. This is fixed
and relevant tests are re-enabled.
2) Several tests were disabled because of printing slightly
different output than undname. These were confirmed to be
bugs in undname, so we just re-enable the tests.
3) The test for printing reference temporaries is re-enabled. This
is a clang mangling extension, so we have some flexibility with
how we demangle it. The output currently looks fine, so we just
re-enable the test with no fixes.
llvm-svn: 339708
Implement instruction selection for all versions of the extract_lane
instruction. Use explicit sext/zext to differentiate between
extract_lane_s and extract_lane_u for applicable types, otherwise
default to extract_lane_u.
Reviewers: aheejin
Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D50597
Patch by Thomas Lively (tlively)
llvm-svn: 339707
Summary:
This patch teaches the loop vectorizer to vectorize loops with non
header phis that have have outside uses. This is because the iteration
dependence distance for these phis can be widened upto VF (similar to
how we do for induction/reduction) if they do not have a cyclic
dependence with header phis. When identifying reduction/induction/first
order recurrence header phis, we already identify if there are any cyclic
dependencies that prevents vectorization.
The vectorizer is taught to extract the last element from the vectorized
phi and update the scalar loop exit block phi to contain this extracted
element from the vector loop.
This patch can be extended to vectorize loops where instructions other
than phis have outside uses.
Reviewers: Ayal, mkuper, mssimpso, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50579
llvm-svn: 339703
rL339686 added the case where a faux shuffle might have repeated shuffle inputs coming from either side of the OR().
This patch improves the insertion of the inputs into the source ops lists to account for this, as well as making it trivial to add support for shuffles with more than 2 inputs in the future.
llvm-svn: 339696
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
DW_TAG_label
DW_AT_name
DW_AT_decl_file
DW_AT_decl_line
DW_AT_low_pc
2. Labels in an inlined function:
DW_TAG_label
DW_AT_abstract_origin
DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
Differential Revision: https://reviews.llvm.org/D45556
llvm-svn: 339676
Summary: This revision improves previous version (rL330322) which has been reverted due to crashes.
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations. The patch also includes folding
of previously missing saturation patterns so that IR emits the same
machine instructions as the intrinsics.
Reviewers: craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: mike.dvoretsky, DavidKreitzer, sroland, llvm-commits
Differential Revision: https://reviews.llvm.org/D46179
llvm-svn: 339650
Summary:
When WPD is performed in a ThinLTO backend, the function may be created
if it isn't already in that module. Module::getOrInsertFunction may
add a bitcast, in which case the returned Constant is not a Function and
doesn't have a name. Invoke stripPointerCasts() on the returned value
where we access its name.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49959
llvm-svn: 339640
Summary:
The AsmWriter was only writing the Args for a ConstVCall if it was
non-empty, however, the LLParser was always expecting it. To aid
in making it optional, surround the ConstVCall VFuncId and Args in
parentheses when writing, then make the Args optional when reading.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49960
llvm-svn: 339637
Summary:
Calls marked 'tail' cannot read or write allocas from the current frame
because the current frame might be destroyed by the time they run.
However, a tail call may use an alloca with byval. Calling with byval
copies the contents of the alloca into argument registers or stack
slots, so there is no lifetime issue. Tail calls never modify allocas,
so we can return just ModRefInfo::Ref.
Fixes PR38466, a longstanding bug.
Reviewers: hfinkel, nlewycky, gbiv, george.burgess.iv
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D50679
llvm-svn: 339636
The behavior in 64-bit mode is different between Intel and AMD CPUs. Intel ignores the 0x66 prefix. AMD does not. objump doesn't ignore the 0x66 prefix. Since LLVM aims to match objdump behavior, we should do the same.
While I was trying to fix this I had change brtarget16/32 to use ENCODING_IW/ID instead of ENCODING_Iv to get the 0x66+REX.W case to act sort of sanely. It's still wrong, but that's a problem for another day.
The change in encoding exposed the fact that 16-bit mode disassembly of relative jumps was creating JMP_4 with a 2 byte immediate. It should have been JMP_2. From just printing you can't tell the difference, but if you dumped the encoding it wouldn't have matched what we started with.
While fixing that, it exposed that jo/jno opcodes were missing from the switch that this patch deleted and there were no test cases for them.
Fixes PR38537.
llvm-svn: 339622
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}
General pattern:
X & Y
Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e. `%arg & 4294967168` can be either `4294967168` or `0`
Pattern can be one of:
%t = add i32 %arg, 128
%r = icmp ult i32 %t, 256
Or
%t0 = shl i32 %arg, 24
%t1 = ashr i32 %t0, 24
%r = icmp eq i32 %t1, %arg
Or
%t0 = trunc i32 %arg to i8
%t1 = sext i8 %t0 to i32
%r = icmp eq i32 %t1, %arg
This pattern is a signed truncation check.
And `X` is checking that some bit in that same mask is zero.
I.e. can be one of:
%r = icmp sgt i32 %arg, -1
Or
%t = and i32 %arg, 2147483648
%r = icmp eq i32 %t, 0
Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
%r = icmp ult i32 %arg, 128
The transform itself ended up being rather horrible, even though i omitted some cases.
Surely there is some infrastructure that can help clean this up that i missed?
https://rise4fun.com/Alive/3Ou
The initial commit (rL339610)
was reverted, since the first assert was being triggered.
The @positive_with_extra_and test now has coverage for that case.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: RKSimon, erichkeane, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D50465
llvm-svn: 339621
Even though this code is below a function called optimizeFloatingPointLibCall(),
we apparently can't guarantee that we're dealing with FPMathOperators, so bail
out immediately if that's not true.
llvm-svn: 339618
At least one buildbot was able to actually trigger that assert
on the top of the function. Will investigate.
This reverts commit r339610.
llvm-svn: 339612
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}
General pattern:
X & Y
Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e. `%arg & 4294967168` can be either `4294967168` or `0`
Pattern can be one of:
%t = add i32 %arg, 128
%r = icmp ult i32 %t, 256
Or
%t0 = shl i32 %arg, 24
%t1 = ashr i32 %t0, 24
%r = icmp eq i32 %t1, %arg
Or
%t0 = trunc i32 %arg to i8
%t1 = sext i8 %t0 to i32
%r = icmp eq i32 %t1, %arg
This pattern is a signed truncation check.
And `X` is checking that some bit in that same mask is zero.
I.e. can be one of:
%r = icmp sgt i32 %arg, -1
Or
%t = and i32 %arg, 2147483648
%r = icmp eq i32 %t, 0
Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
%r = icmp ult i32 %arg, 128
https://rise4fun.com/Alive/3Ou
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: RKSimon, erichkeane, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D50465
llvm-svn: 339610
Added a test case to reduction showing where it's illegal to identify
vectorize a loop.
Resetting the reduction var during loop iterations disallows us from
widening the dependency cycle to VF, thereby making it illegal to
vectorize the loop.
llvm-svn: 339605
This is a very partial fix for the reported problem. I suspect
we do not get this fold in most motivating cases because most of
the time, the libcall would have been replaced by an intrinsic,
and that optimization is handled elsewhere...but maybe it should
be handled here?
llvm-svn: 339604
Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR
when zero extending the demanded elements mask if it is already as long as the
source vector.
Differential Revision: https://reviews.llvm.org/D49574
llvm-svn: 339600
Summary: The GR740 provides an up cycle counter in the
registers ASR22 and ASR23. As these registers can not be
read together atomically we only use the value of ASR23
for llvm.readcyclecounter(). The ASR23 register holds the
32 LSBs of the up-counter.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48638
llvm-svn: 339551
This is a second part of D49974 that handles widening of conditional branches that
have very likely `false` branch.
Differential Revision: https://reviews.llvm.org/D50040
Reviewed By: reames
llvm-svn: 339537
This is another variation of PR38533. In this case, the result type of the bitcast is legal and 16-bits wide, but not a scalar integer. So we need to emit the convert to i16 and then bitcast it to the true result type. This new bitcast will be further type legalized if necessary.
llvm-svn: 339536
Previously if the result type was a vector, we emitted a FP_TO_FP16 with a vector result type which isn't valid.
This is basically the opposite case of the root cause of PR38533.
llvm-svn: 339535
Fixes PR37524.
The exception handling encodings for x86_64 in kernel code model
has been changed with r309884. Restore it to correct ones. These
encodings include PersonalityEncoding, LSDAEncoding and
TTypeEncoding.
Differential Revision: https://reviews.llvm.org/D50490
llvm-svn: 339534
Summary:
We've supported constant folding for sse versions for many years. This patch adds support for the avx512 versions including unsigned with the default rounding mode. We could probably do more with other roundings modes and SAE in the future.
The test cases are largely based on the sse.ll test cases. But I did add some test cases to ensure the unsigned versions don't accept negative values. Also checked the bounds of f64->i32 conversions to make sure unsigned has a larger positive range than signed.
Reviewers: RKSimon, spatel, chandlerc
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50553
llvm-svn: 339529
I'm not sure the exact nsz flag combination that
is OK. I think as long as it's on either, this is OK.
For now just check it on the omod multiply.
llvm-svn: 339513
If one of the elements is undef, use the canonicalized constant
from the other element instead of 0.
Splat vectors are more useful for other optimizations, such
as matching vector clamps. This was breaking on clamps
of half3 from the undef 4th component.
llvm-svn: 339512
Try to improve the computed counts when it has been explicitly set by a pragma
or command line option. This moves the code around, so that first call to
computeUnrollCount to get a sensible count and override that if explicit unroll
and jam counts are specified.
Also added some extra debug messages for when unroll and jamming is disabled.
Differential Revision: https://reviews.llvm.org/D50075
llvm-svn: 339501
Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode.
This simplifies a few isel patterns that used the pseudo directly. And magically seems to have improved our ability to CSE it in the undef-label.ll test.
llvm-svn: 339496
If we have an assume which is known to execute and whose operand is invariant, we can lift that into the pre-header. So long as we don't change which paths the assume executes on, this is a legal transformation. It's likely to be a useful canonicalization as other transforms only look for dominating assumes.
Differential Revision: https://reviews.llvm.org/D50364
llvm-svn: 339481
Summary:
Moved Explicit Locals pass to last.
Made that pass obligatory.
Made it convert from register to stack based instructions, and removed the registers.
Fixes to related code that was expecting register based instructions.
Added the correct testing flag to all tests, depending on what the
format they were expecting so far.
Translated one test to stack format as example: reg-stackify-stack.ll
tested:
llvm-lit -v `find test -name WebAssembly`
unittests/MC/*
Reviewers: dschuff, sunfish
Subscribers: jfb, llvm-commits, aheejin, eraman, jgravelle-google, sbc100
Differential Revision: https://reviews.llvm.org/D50568
llvm-svn: 339474
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.
Some potential improvements outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but seems to work
pretty well already.
The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)
According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.
Differential Revision: https://reviews.llvm.org/D50030
llvm-svn: 339472
There are two cases we need to support with extern "C"
functions. The first is the case of a '9' indicating that
the function has no prototype. This occurs when we mangle
a symbol inside of an extern "C" function, but not the
function itself.
The second case is when we have an overloaded extern "C"
functions. In this case we emit $$J0 to indicate this.
This patch adds support for both of these cases.
llvm-svn: 339471
This was broken because of a malformed check line. Incidentally,
this exposed a case where we crash when we should just be returning
an error, so we should fix that. The demangler shouldn't crash due
to user input.
llvm-svn: 339466
Before we wouldn't properly demangle something like
Foo<const int>. Template args have a special escape sequence
'$$C' that is optional, but if it is present contains
qualifiers. So we need to check for this and only if it
present, demangle qualifiers before demangling the type.
With this fix, we re-enable some tests that were previously
marked FIXME.
llvm-svn: 339465
This includes a test that would have exposed the bug in rL339439
which was reverted at rL339446. The compare can be integer while
the binop is FP or vice-versa, so we need to use the binop type
when we ask for the identity constant.
llvm-svn: 339453
Summary: Similar to asan's flag, it can be used to disable the use of ifunc to access hwasan shadow address.
Reviewers: vitalybuka, kcc
Subscribers: srhines, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D50544
llvm-svn: 339447
These were uncovered when porting the mangling tests in
ms-templates.cpp from clang/CodeGenCXX over to demangling
tests. The main issues fixed here are surrounding integer
literal signed and unsignedness, empty array dimensions,
and pointer and reference non-type template parameters.
Differential Revision: https://reviews.llvm.org/D50512
llvm-svn: 339434
Enabling ARMCodeGenPrepare by default caused a whole load of
failures. This is due to zexts and truncs not being handled properly.
ZExts are messy so it's just easier to disable for now and truncs
are allowed only as 'sinks'. I still need to figure out why allowing
them as 'sources' causes so many failures. The other main changes are
that we are explicit in the types that we converting to, it's now
always 'TypeSize'. Type support is also now performed while checking
for valid opcodes as it unnecessarily complicated having the checks
are different stages.
I've moved the tests around too, so we have the zext and truncs in
their own file as well as the overflowing opcode tests.
Differential Revision: https://reviews.llvm.org/D50518
llvm-svn: 339432
The previous name sounds like it inserts cfguard implementation, but it
really just emits the table of address-taken functions. Change the name
to better reflect that.
Clang will be updated in the next commit.
llvm-svn: 339419
If code is compiled for X86 without SSE support, the register save area
doesn't contain FPU registers, so `AMD64FpEndOffset` should be equal to
`AMD64GpEndOffset`.
llvm-svn: 339414
MemorySSA currently creates MemoryAccesses for lifetime intrinsics, and
sometimes treats them as clobbers. This may/may not be the best way
forward, but while we're doing it, we should consider
MayAlias/PartialAlias to be clobbers.
The ideal fix here is probably to remove all of this reasoning about
lifetimes from MemorySSA + put it into the passes that need to care. But
that's a wayyy broader fix that needs some consensus, and we have
miscompiles + a release branch today, and this should solve the
miscompiles just as well.
differential revision is D43269. Landing without an explicit LGTM (and
without using the special please-autoclose-this syntax) so we can still
use that revision as a place to decide what the right fix here is.
llvm-svn: 339411
Summary:
i64x2 and f64x2 operations are not implemented in V8, so we normally
do not want to emit them. However, they are in the SIMD spec proposal,
so we still want to be able to test them in the toolchain. This patch
adds a flag to enable their emission.
Reviewers: aheejin, dschuff
Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D50423
Patch by Thomas Lively (tlively)
llvm-svn: 339407
The motivating case is an otherwise dead loop with a fence in it. At the moment, this goes all the way through the optimizer and we end up emitting an entirely pointless loop on x86. This case may seem a bit contrived, but we've seen it in real code as the result of otherwise reasonable lowering strategies combined w/thread local memory optimizations (such as escape analysis).
To handle this simple case, we can teach LICM to hoist must execute fences when there is no other memory operation within the loop.
Differential Revision: https://reviews.llvm.org/D50489
llvm-svn: 339378
Summary:
LoopSimplifyCFG should update ScEv for all loops after a block is deleted.
If the deleted block "Succ" is part of L, then it is part of all parent loops, so forget topmost loop.
Reviewers: greened, mkazantsev, sanjoy
Subscribers: jlebar, javed.absar, uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D50422
llvm-svn: 339363
The inalloca parameter has to be the only parameter passed in memory.
Changing the convention to fastcc can break that.
At some point we should teach global opt how to optimize ABI attributes
like inalloca and maybe byval. These attributes are mainly used to match
C ABIs. They are harder for LLVM to optimize and they don't always
generate the best code.
Fixes PR38487
llvm-svn: 339360
Similar to rL337966 - if the DAGCombiner's rotate matching was
working as expected, I don't think we'd see any test diffs here.
AArch only goes right, and PPC only goes left.
x86 has both, so no diffs there.
Differential Revision: https://reviews.llvm.org/D50091
llvm-svn: 339359
Summary: This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold adding reassociation, as it the flag that most closely represents the translation
Reviewers: spatel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D50195
llvm-svn: 339357
As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses.
This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool.
However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move.
This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit.
Differential Revision: https://reviews.llvm.org/D50328
llvm-svn: 339335
According to PTX ISA .volatile has the same memory synchronization
semantics as .relaxed.sys, so it can be used to implement monotonic
atomic loads and stores. This is important for OpenMP's atomic
construct where
- 'read's and 'write's are lowered to atomic loads and stores, and
- an update of float or double types are lowered into a cmpxchg loop.
(Note that PTX could do better because it has atom.add.f{32,64} but
LLVM's atomicrmw instruction only allows integer types.)
Higher levels of atomicity (like acquire and release) need additional
synchronization properties which were added with PTX ISA 6.0 / sm_70.
So using these instructions still results in an error.
Differential Revision: https://reviews.llvm.org/D50391
llvm-svn: 339316
This pseudo-instruction is similar to la but uses PC-relative addressing
unconditionally. This is, la is only different to lla when using -fPIC. This
pseudo-instruction seems often forgotten in several specs but it is definitely
mentioned in binutils opcodes/riscv-opc.c. The semantics are defined both in
page 37 of the "RISC-V Reader" book but also in function macro found in
gas/config/tc-riscv.c.
This is a very first step towards adding PIC support for Linux in the RISC-V
backend.
The lla pseudo-instruction expands to a sequence of auipc + addi with a couple
of pc-rel relocations where the second points to the first one. This is
described in
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#pc-relative-symbol-addresses
For now, this patch only introduces support of that pseudo instruction at the
assembler parser.
Differential Revision: https://reviews.llvm.org/D49661
llvm-svn: 339314
The main interesting case is a fence in an otherwise dead loop or one containing only arithmetic. This can happen as a result of DSE or other transforms from seemingly reasonable initial IR.
llvm-svn: 339310
Changes the default Windows target triple returned by
GetHostTriple.cmake from the old environment names (which we wanted to
move away from) to newer, normalized ones. This also requires updating
all tests to use the new systems names in constraints.
Differential Revision: https://reviews.llvm.org/D47381
llvm-svn: 339307
isNegatibleForFree() should not matter here (as the test diffs show)
because it's always a win to replace an fsub+fadd with fneg. The
problem in D50195 persists because either (1) we are doing these
folds in the wrong order or (2) we're missing another fold for fadd.
llvm-svn: 339299
LLVM triple normalization is handling "unknown" and empty components
differently; for example given "x86_64-unknown-linux-gnu" and
"x86_64-linux-gnu" which should be equivalent, triple normalization
returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's
config.sub returns "x86_64-unknown-linux-gnu" for both
"x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the
triple normalization to behave the same way, replacing empty triple
components with "unknown".
This addresses PR37129.
Differential Revision: https://reviews.llvm.org/D50219
llvm-svn: 339294
On Darwin we pin the DWARF line tables to version 2. Stop doing so for
DWARF v5 and later.
Differential revision: https://reviews.llvm.org/D49381
llvm-svn: 339288
Normally, if any registers are spilled, we prefer to spill lr on Thumb1
so we can fold the "bx lr" into the "pop". However, if there are tail
calls involved, restoring lr is expensive, so skip the optimization in
that case.
The spill of r7 in the new test also isn't necessary, but that's
mostly orthogonal to this patch. (It's the same code in
ARMFrameLowering, but it's not related to tail calls.)
Differential Revision: https://reviews.llvm.org/D49459
llvm-svn: 339283
Template manglings use a fresh back-referencing context, so we
need to do the same. This fixes several existing tests which are
marked as FIXME, so those are now actually run.
llvm-svn: 339275
When reading a custom WASM section, it was possible that its name
extended beyond the size of the section. This resulted in a bogus value
for the section size due to the size overflowing.
Fixes heap buffer overflow detected by OSS-fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=8190
Differential revision: https://reviews.llvm.org/D50387
llvm-svn: 339269
When using APPLE extensions, don't duplicate the compiler invocation's
flags both in AT_producer and AT_APPLE_flags.
Differential revision: https://reviews.llvm.org/D50453
llvm-svn: 339268
The scalar cases are handled in instcombine's internal
reassociation pass for FP ops, but it misses the vector types.
These patterns are similar to what was handled in InstSimplify in:
https://reviews.llvm.org/rL339171https://reviews.llvm.org/rL339174https://reviews.llvm.org/rL339176
...but we can't use instsimplify on these because we require negation
of the original operand.
llvm-svn: 339263
This patch aims to improve the codegen for vector loads involving the
scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used
for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X)
to utilize:
LXSD and LXSDX for i64 and f64
LXSIWAX for i32 (sign extension to i64)
LXSIWZX for i32 and f64
Committing on behalf of Amy Kwan.
Differential Revision: https://reviews.llvm.org/D48950
llvm-svn: 339260
Summary:
Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.
For example:
```
extern int bar(int[]);
int foo(int i) {
int a[i]; // VLA
asm volatile(
"mov r7, #1"
:
:
: "r7"
);
return 1 + bar(a);
}
```
Compiled for thumb, this gives:
```
$ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
...
foo:
.fnstart
@ %bb.0: @ %entry
.save {r4, r5, r6, r7, lr}
push {r4, r5, r6, r7, lr}
.setfp r7, sp, #12
add r7, sp, #12
.pad #4
sub sp, #4
movs r1, #7
add.w r0, r1, r0, lsl #2
bic r0, r0, #7
sub.w r0, sp, r0
mov sp, r0
@APP
mov.w r7, #1
@NO_APP
bl bar
adds r0, #1
sub.w r4, r7, #12
mov sp, r4
pop {r4, r5, r6, r7, pc}
...
```
r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.
This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.
The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.
If we find a reserved register, we print a warning:
```
repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
"mov r7, #1"
^
```
Reviewers: eli.friedman, olista01, javed.absar, efriedma
Reviewed By: efriedma
Subscribers: efriedma, eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D49727
llvm-svn: 339257
Provide a pass-through of the numerator for divide by one cases - this is the same approach we take in DAGCombiner::visitSDIVLike.
I investigated whether we could achieve this by magic MULHU/SRL values but nothing appeared to work as we don't have a way for MULHU(x,c) -> x
llvm-svn: 339254
Match the GNU assembler in supporting immediate operands for these
instructions even when the reg-reg mnemonic is used.
Differential Revision: https://reviews.llvm.org/D50046
Patch by Kito Cheng.
llvm-svn: 339252
This accounts for the missing IR fold noted in D50195. We don't need any fast-math to enable the negation transform.
FP negation can always be folded into an fmul/fdiv constant to eliminate the fneg.
I've limited this to one-use to ensure that we are eliminating an instruction rather than replacing fneg by a
potentially expensive fdiv or fmul.
Differential Revision: https://reviews.llvm.org/D50417
llvm-svn: 339248
Making the test use urem relies on it calling udiv-like combines, but the real issue is with the udiv so we're better off using that directly.
llvm-svn: 339247
Summary:
https://rise4fun.com/Alive/IT3
Comes up in the [most ugliest] `signed int` -> `signed char` case of
`-fsanitize=implicit-conversion` (https://reviews.llvm.org/D50250)
Previously, we were stuck with `not`: {F6867736}
But now we are able to completely get rid of it: {F6867737}
(FIXME: why are we loosing the metadata? that seems wrong/strange.)
Here, we only want to do that it we will be able to completely
get rid of that 'not'.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: vsk, erichkeane, llvm-commits
Differential Revision: https://reviews.llvm.org/D50301
llvm-svn: 339243
Summary: Extend fix for PR34170 to support inline assembly with multiple output operands that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR as in the PR).
Reviewers: bogner, t.p.northover, lattner, javed.absar, efriedma
Reviewed By: efriedma
Subscribers: efriedma, tra, eraman, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45437
llvm-svn: 339225