lookupFlags returns a SymbolFlagsMap for the requested symbols, along with a
set containing the SymbolStringPtr for any symbol not found in the VSO.
The JITSymbolFlags for each symbol will have been stripped of its transient
JIT-state flags (i.e. NotMaterialized, Materializing).
Calling lookupFlags does not trigger symbol materialization.
llvm-svn: 323060
We already had the pointer being stored to in the MemLoc, reuse that code. In merging cases, it turned out the interface of the getLocForWrite had become inconsitent with other related utilities. Fix that by making sure the input passes hasAnalyzableWrite as well.
llvm-svn: 323056
Summary:
This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx.
We still need a follow patch to stop moving ands across srl if the and could be represented as a movzx before the shift but not after. I think this should help with some of the cases that D42088 ended up removing during isel.
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42265
llvm-svn: 323048
This was completely broken, but hopefully fixed by this patch.
In cases where it is needed, a vector with non byte-sized elements is stored
by extracting, zero-extending, shift:ing and or:ing the elements into an
integer of the same width as the vector, which is then stored.
Review: Eli Friedman, Ulrich Weigand
https://reviews.llvm.org/D42100#inline-369520https://bugs.llvm.org/show_bug.cgi?id=35520
llvm-svn: 323042
This (together with the corresponding LLD commit, that contains the
testcase updates) fixes PR35733.
Differential Revision: https://reviews.llvm.org/D41631
llvm-svn: 323035
`llvm.used` contains a list of pointers to named values which the
compiler, assembler, and linker are required to treat as if there is a
reference that they cannot see. Ensure that the symbols are preserved
by adding an explicit `-include` reference to the linker command.
llvm-svn: 323017
This change applies to places where we would turn 128/256-bit code into 512-bit in order to get a wider element type through sext/zext. Any 512-bit types that already existed in the IR/DAG will be left that way.
The width preference has no effect on codegen behavior when the target does not have AVX512 enabled. So AVX/AVX2 codegen cannot be limited via this mechanism yet.
If the preference is lower than 256 we may still use a 256 bit type to do the operation. Constraining to 128 bits makes it much more difficult to support some operations. For many of these cases we need to change element width while keeping element count constant which is easiest done by switching between 256 and 128 bit.
The preference is only obeyed when AVX512 and VLX are available. This means the preference is not obeyed for KNL, but is obeyed for SKX, Cannonlake, and Icelake. For KNL, the only way to do masked operation is on 512-bit registers so we would have to completely disable masking to obey the preference. We would also lose support for gather, scatter, ctlz, vXi64 multiplies, etc. This may change in the future, but this simplifies the initial implementation.
Differential Revision: https://reviews.llvm.org/D41895
llvm-svn: 323016
This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for.
I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10.
This has been split from D41895 with the remainder in a follow up commit.
llvm-svn: 323015
to @objc_autorelease if its operand is a PHI and the PHI has an
equivalent value that is used by a return instruction.
For example, ARC optimizer shouldn't replace the call in the following
example, as doing so breaks the AutoreleaseRV/RetainRV optimization:
%v1 = bitcast i32* %v0 to i8*
br label %bb3
bb2:
%v3 = bitcast i32* %v2 to i8*
br label %bb3
bb3:
%p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ]
%retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p
%v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p)
ret i32* %retval
Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's
operand uses with its value so that the call gets tail-called.
rdar://problem/15894705
llvm-svn: 323009
ExternalSymbolMap now stores the string key (rather than using a StringRef),
as the object file backing the key may be removed at any time.
llvm-svn: 323001
Previously, the DIBuilder didn't expose functionality to set its compile unit
in any other way than calling createCompileUnit. This meant that the outliner,
which creates new functions, had to create a new compile unit for its debug
info.
This commit adds an optional parameter in the DIBuilder's constructor which
lets you set its CU at construction.
It also changes the MachineOutliner so that it keeps track of the DISubprograms
for each outlined sequence. If debugging information is requested, then it
uses one of the outlined sequence's DISubprograms to grab a CU. It then uses
that CU to construct the DISubprogram for the new outlined function.
The test has also been updated to reflect this change.
See https://reviews.llvm.org/D42254 for more information. Also see the e-mail
discussion on D42254 in llvm-commits for more context.
llvm-svn: 322992
On current machines we have load-on-condition instructions that can be
used to directly implement the SETCC semantics. If we have those, it is
always preferable to use them instead of generating the IPM sequence.
llvm-svn: 322989
In order to implement a test whether a compare-and-swap succeeded, the
SystemZ back-end currently emits a rather inefficient sequence of first
converting the CC result into an integer, and then testing that integer
against zero. This commit changes the back-end to simply directly test
the CC value set by the compare-and-swap instruction.
llvm-svn: 322988
The SystemZ back-end uses a sequence of IPM followed by arithmetic
operations to implement the SETCC primitive. This is currently done
early during SelectionDAG. This patch moves generating those sequences
to much later in SelectionDAG (during PreprocessISelDAG).
This doesn't change much in generated code by itself, but it allows
further enhancements that will be checked-in as follow-on commits.
llvm-svn: 322987
The second return value of ATOMIC_CMP_SWAP_WITH_SUCCESS is known to be a
boolean, and should therefore be treated by computeKnownBits just like
the second return values of SMULO / UMULO.
Differential Revision: https://reviews.llvm.org/D42067
llvm-svn: 322985
Summary:
For consistency with the output of lld.
This is useful in runnable binaries as can them be sure the
null function pointer will never be a valid argument
call_indirect.
Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D42284
llvm-svn: 322978
Fix a performance regression caused by r322737.
While trying to make it easier to replace compares with existing adds and
subtracts, I accidentally stopped it from doing so in some cases. This should
fix that. I'm also fixing another potential bug in that commit.
Differential Revision: https://reviews.llvm.org/D42263
llvm-svn: 322972
RuntimeLibcallSignatures previously manually initialized all the libcall
names into an array and searched it linearly for the first match to lookup
the corresponding index.
r322802 switched that to initializing a map keyed by the libcall name.
Neither of these approaches works correctly because some libcall numbers use
the same name on different platforms (e.g. the "l" suffixed functions
use f80 or f128 or ppcf128).
This change fixes that by ensuring that each name only goes into the map
once. It also adds tests.
Differential Revision: https://reviews.llvm.org/D42271
llvm-svn: 322971
Sign-extension opcodes have been split into a separate proposal from
the main threads proposal, so switch them to their own target
feature. See:
https://github.com/WebAssembly/sign-extension-ops
llvm-svn: 322966
Try to detect the terminal color support by checking the value of the
TERM environment variable. This is not great, but it's better than
nothing when terminfo library isn't available, which may still be the
case on some Linux distributions.
Differential Revision: https://reviews.llvm.org/D42055
llvm-svn: 322962
Try to reverse the constant-shrinking that happens in SimplifyDemandedBits()
for 'and' masks when it results in a smaller sign-extended immediate.
We are also able to detect dead 'and' ops here (the mask is all ones). In
that case, we replace and return without selecting the 'and'.
Other targets might want to share some of this logic by enabling this under a
target hook, but I didn't see diffs for simple cases with PowerPC or AArch64,
so they may already have some specialized logic for this kind of thing or have
different needs.
This should solve PR35907:
https://bugs.llvm.org/show_bug.cgi?id=35907
Differential Revision: https://reviews.llvm.org/D42088
llvm-svn: 322957
Summary:
If the vectorized tree has truncate to minimum required bit width and
the vector type of the cast operation after the truncation is the same
as the vector type of the cast operands, count cost of the vector cast
operation as 0, because this cast will be later removed.
Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner.
Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41948
llvm-svn: 322946
Summary:
In ModRefInfo "Must" was introduced to track presence of MustAlias, but we still want to return NoModRef when there is neither Mod or Ref, even when MustAlias is found. Patch has small fixes to ensure this happens.
Minor cleanup to remove nesting for 2 if statements when calling getModRefInfo for 2 ImmutableCallSites.
Reviewers: sanjoy
Subscribers: jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42209
llvm-svn: 322932
Three (or more) operand getelementptrs could plausibly also be handled, but
handling only two-operand fits in easily with the existing BinaryOperator
handling.
Differential Revision: https://reviews.llvm.org/D39958
llvm-svn: 322930
This avoids playing games with pseudo pass IDs and avoids using an
unreliable MRI::isSSA() check to determine whether register allocation
has happened.
Note that this renames:
- MachineLICMID -> EarlyMachineLICM
- PostRAMachineLICMID -> MachineLICMID
to be consistent with the EarlyTailDuplicate/TailDuplicate naming.
llvm-svn: 322927
Split TailDuplicatePass into EarlyTailDuplicate and TailDuplicate. This
avoids playing games with fake pass IDs and using MRI::isSSA() to
determine pre-/post-RA state.
llvm-svn: 322926
Re-commit of r322200: The testcase shouldn't hit machineverifiers
anymore with r322917 in place.
Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.
This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
pointer.
- Note that `useFPForScavengingIndex()` would previously return false
when a base pointer was available leading to the emergency spillslot
getting allocated late (that's the whole effect of this callback).
Which made no sense to me so I took this case out: Even though the
emergency spillslot is technically not referenced by FP in this case
we still want it allocated early.
Differential Revision: https://reviews.llvm.org/D40876
llvm-svn: 322919
Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to
setup and the callframe size is 0.
- Fixes an invalid callframe nesting for byval arguments, which would
look like this before this patch (as in `big-byval.ll`):
...
ADJCALLSTACKDOWN 32768, 0, ... # Setup for extfunc
...
ADJCALLSTACKDOWN 0, 0, ... # setup for memcpy
...
BL &memcpy ...
ADJCALLSTACKUP 0, 0, ... # destroy for memcpy
...
BL &extfunc
ADJCALLSTACKUP 32768, 0, ... # destroy for extfunc
- Saves us two instructions in the common case of zero-sized stackframes.
- Remove an unnecessary scheduling barrier (hence the small unittest
changes).
Differential Revision: https://reviews.llvm.org/D42006
llvm-svn: 322917
Bulk queries reduce IPC/RPC overhead for cross-process JITing and expose
opportunities for parallel compilation.
The two new query methods are lookupFlags, which finds the flags for each of a
set of symbols; and lookup, which finds the address and flags for each of a
set of symbols. (See doxygen comments for more details.)
The existing JITSymbolResolver class is renamed LegacyJITSymbolResolver, and
modified to extend the new JITSymbolResolver class using the following scheme:
- lookupFlags is implemented by calling findSymbolInLogicalDylib for each of the
symbols, then returning the result of calling getFlags() on each of these
symbols. (Importantly: lookupFlags does NOT call getAddress on the returned
symbols, so lookupFlags will never trigger materialization, and lookupFlags will
never call findSymbol, so only symbols that are part of the logical dylib will
return results.)
- lookup is implemented by calling findSymbolInLogicalDylib for each symbol and
falling back to findSymbol if findSymbolInLogicalDylib returns a null result.
Assuming a symbol is found its getAddress method is called to materialize it and
the result (if getAddress succeeds) is stored in the result map, or the error
(if getAddress fails) is returned immediately from lookup. If any symbol is not
found then lookup returns immediately with an error.
This change will break any out-of-tree derivatives of JITSymbolResolver. This
can be fixed by updating those classes to derive from LegacyJITSymbolResolver
instead.
llvm-svn: 322913
This adds a new instrinsic to support the rdpid instruction. The implementation is a bit weird because the intrinsic is defined as always returning 32-bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode. But really it zeros the upper 32 bits. So I had to add separate patterns where 64-bit mode uses an extract_subreg.
Differential Revision: https://reviews.llvm.org/D42205
llvm-svn: 322910
We did this for inline call site line tables, but we hadn't done it for
regular function line tables yet. This patch copies that logic from
encodeInlineLineTable.
llvm-svn: 322905
Summary:
This patch implements d16 support for image load, image store and image sample intrinsics.
Reviewers:
Matt, Brian.
Differential Revision:
https://reviews.llvm.org/D3991
llvm-svn: 322903
The compilation directory has always been #0, but as of DWARF v5 it is
explicitly listed in the line-table section instead of implicitly
being a reference to the compile_unit DIE's DW_AT_comp_dir attribute.
This means the dumper should number the dumped array starting with 0
or 1 depending on the DWARF version of the line table.
References in the generated DWARF are correct, it's just the dumper
that was wrong. Also some assembler-coded tests were similarly
confused about directory numbers.
llvm-svn: 322884
There's some abstraction overhead in the underlying
mechanisms that were being used, and it was leading to an
abundance of small but not-free copies being made. This
showed up on a profile. Eliminating this and going back to
a low-level byte-based implementation speeds up lld with
/DEBUG between 10 and 15%.
Differential Revision: https://reviews.llvm.org/D42148
llvm-svn: 322871
r322086 removed the trailing information describing reg classes for each
register.
This patch adds printing reg classes next to every register when
individual operands/instructions/basic blocks are printed. In the case
of dumping MIR or printing a full function, by default don't print it.
Differential Revision: https://reviews.llvm.org/D42239
llvm-svn: 322867
Follow-up to r322120 which can cause assertions for AArch64 because
v1f64 and v1i64 are legal types.
Differential Revision: https://reviews.llvm.org/D42097
llvm-svn: 322823
LTO sets dso_local as an optimization, so don't clear it.
This avoid clearing it from undefined hidden symbols, which would then
fail the verifier.
llvm-svn: 322814
For example, a build_vector of i64 bitcasted from v2i32 can be turned into a concat_vectors of the v2i32 vectors with a bitcast to a vXi64 type
Differential Revision: https://reviews.llvm.org/D42090
llvm-svn: 322811
This is similar to r322317, but for visibility. It is not as neat
because we have to special case extern_weak.
The idea is the same as the previous change, make the transition to
explicit dso_local easier for the frontends. With this they only have
to add dso_local to symbols where we need some external information to
decide if it is dso_local (like it being part of an ELF executable).
llvm-svn: 322806
Right now, it is not possible to run MachineCSE in the middle of the
GlobalISel pipeline. Being able to run generic optimizations between the
core passes of GlobalISel was one of the goals of the new ISel framework.
This is the first attempt to do it.
The problem is that MachineCSE pass assumes all register operands have a
register class, which, in GlobalISel context, won't be true until after the
InstructionSelect pass. The reason for this behaviour is that before
replacing one virtual register with another, MachineCSE pass (and most of
the other optimization machine passes) must check if the virtual registers'
constraints have a (sufficiently large) intersection, and constrain the
resulting register appropriately if such intersection exists.
GlobalISel extends the representation of such constraints from just a
register class to a triple (low-level type, register bank, register
class).
This commit adds MachineRegisterInfo::constrainRegAttrs method that extends
MachineRegisterInfo::constrainRegClass to such a triple.
The idea is that going forward we should use:
- RegisterBankInfo::constrainGenericRegister within GlobalISel's
InstructionSelect pass
- MachineRegisterInfo::constrainRegClass within SelectionDAG ISel
- MachineRegisterInfo::constrainRegAttrs everywhere else regardless
the target and instruction selector it uses.
Patch by Roman Tereshin. Thanks!
llvm-svn: 322805
Remove the tight coupling between llvm/CodeGenRuntimeLibcalls.def and
the table of supported singatures for wasm. This will allow adding new libcalls
without changing wasm's signature table.
Also, some cleanup:
Use ManagedStatics instead of const tables to avoid memory/binary bloat.
Use a StringMap instead of a linear search for name lookup.
Differential Revision: https://reviews.llvm.org/D35592
llvm-svn: 322802
Before, it wasn't possible to get backtraces inside outlined functions. This
commit adds DISubprograms to the IR functions created by the outliner which
makes this possible. Also attached a test that ensures that the produced
debug information is correct. This is useful to users that want to debug
outlined code.
llvm-svn: 322789
Every known PE COFF target emits /EXPORT: linker flags into a .drective
section. The AsmPrinter should handle this.
While we're at it, use global_values() and emit each export flag with
its own .ascii directive. This should make the .s file output more
readable.
llvm-svn: 322788
Summary:
-hwasan-mapping-offset defines the non-zero shadow base address.
-hwasan-kernel disables calls to __hwasan_init in module constructors.
Unlike ASan, -hwasan-kernel does not force callback instrumentation.
This is controlled separately with -hwasan-instrument-with-calls.
Reviewers: kcc
Subscribers: srhines, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42141
llvm-svn: 322785
Summary:
This patch adds a new target option in order to control GlobalISel.
This will allow the users to enable/disable GlobalISel prior to the
backend by calling `TargetMachine::setGlobalISel(bool Enable)`.
No test case as there is already a test to check GlobalISel
command line options.
See: CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll.
Reviewers: qcolombet, aemerson, ab, dsanders
Reviewed By: qcolombet
Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42137
llvm-svn: 322773
Summary:
The class wraps a uint64_t and an enum to represent the type of profile
count (real and synthetic) with some helper methods.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41883
llvm-svn: 322771
The code wasn't zero-extending correctly, so the comparison could
spuriously fail.
Adds some AArch64 tests to cover this case.
Inspired by D41791.
Differential Revision: https://reviews.llvm.org/D41798
llvm-svn: 322767
Get rid of DEBUG_FUNCTION_NAME symbols. When we actually debug
data, maybe we'll want somewhere to put it... but having a symbol
that just stores the name of another symbol seems odd.
It means you have multiple Symbols with the same name, one
containing the actual function and another containing the name!
Store the names in a vector on the WasmObjectFile when reading
them in. Also stash them on the WasmFunctions themselves.
The names are //not// "symbol names" or aliases or anything,
they're just the name that a debugger should show against the
function body itself. NB. The WasmObjectFile stores them so that
they can be exported in the YAML losslessly, and hence the tests
can be precise.
Enforce that the CODE section has been read in before reading
the "names" section. Requires minor adjustment to some tests.
Patch by Nicholas Wilson!
Differential Revision: https://reviews.llvm.org/D42075
llvm-svn: 322741
Trying to link
__attribute__((weak, visibility("hidden"))) extern int foo;
int *main(void) {
return &foo;
}
on OS X fails with
ld: 32-bit RIP relative reference out of range (-4294971318 max is +/-2GB): from _main (0x100000FAB) to _foo@0x00001000 (0x00000000) in '_main' from test.o for architecture x86_64
The problem being that 0 cannot be computed as a fixed difference from
%rip. Exactly the same issue exists on ELF and we can use the same
solution.
llvm-svn: 322739
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG.
Differential revision: https://reviews.llvm.org/D40922
llvm-svn: 322738
The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with a "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other).
Differential revision: https://reviews.llvm.org/D38378
llvm-svn: 322737
Summary:
Discovered while working on a patch to move alignment in
@llvm.memcpy/move/set from an arg into parameter attributes.
The current implementations of AttributeSet::removeAttribute() and
AttributeList::removeAttribute crash when attempting to remove the
alignment attribute. Currently, these implementations add the
to-be-removed attributes to an AttrBuilder and then remove
the builder from the list/set. Alignment is special in that it
must be added to a builder with an integer value for the alignment;
attempts to add alignment to a builder without a value is an error.
This change fixes the removeAttribute implementations for AttributeSet and
AttributeList to make them able to remove the alignment, and other similar,
attributes.
Reviewers: rnk, chandlerc, pete, javed.absar, reames
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41951
llvm-svn: 322735
Most are just replaced with instrs lists, but a few regexps have been further generalized to match more instructions with a single pattern.
llvm-svn: 322734
If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done.
We could probably could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement.
I've also restricted this to AVX2 only because we can only broadcast loads under AVX.
Differential Revision: https://reviews.llvm.org/D42086
llvm-svn: 322730
We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32 which is fine.
The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to cleanup the bitcast.
This fixes pr35972.
llvm-svn: 322724
candidates with coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
BRCTH is capable of a long branch which needs to be recognized during branch
relaxation. This is done by checking for ExtraRelaxSize == 0.
Review: Ulrich Weigand
llvm-svn: 322688
This assert typically happens if an unstructured CFG is passed
to the pass. This can happen if the pass is run independently
without the structurizer.
llvm-svn: 322685
Summary:
Loading a vector of 4 half-precision FP sometimes results in an LD1
of 2 single-precision FP + a reversal. This results in an incorrect
byte swap due to the conversion from little endian to big endian.
In order to generate the correct byte swap, it is easier to
generate the correct LD1 of 4 half-precision FP, thus avoiding the
subsequent reversal.
Reviewers: craig.topper, jmolloy, olista01
Reviewed By: olista01
Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D41863
llvm-svn: 322663
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.
llvm-svn: 322662
When the compressed instruction set is enabled, the 16-bit c.nop can be
generated if necessary.
Differential Revision: https://reviews.llvm.org/D41221
Patch by Shiva Chen.
llvm-svn: 322658
Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware
support, but only for conversions between float and double.
Also add the necessary boilerplate so that the LegalizerHelper can
introduce the required libcalls. This also works only for float and
double, but isn't too difficult to extend when the need arises.
llvm-svn: 322651
The match* functions have the annoying behavior of modifying its inputs.
Save and restore the inputs, just in case the early out for AVX512 is
hit. This is still not great and its only a matter of time this kind of
bug happens again, but I couldn't come up with a better pattern without
rewriting significant chunks of this code. Fixes PR35977.
llvm-svn: 322644
Summary:
Currently -glldb turns on emission of apple tables on all targets, but
lldb is only really capable of consuming them on darwin. Furthermore,
making lldb consume these tables is not straight-forward because of the
differences in how the debug info is distributed on darwin vs. elf
targets.
The darwin debug model assumes that the debug info (along with
accelerator tables) will either remain in the .o files or it will be
linked into a dsym bundle by a linker that knows how to merge these
tables. In the elf world, all present linkers will simply concatenate
these accelerator tables into the shared object. Since the tables are
not self-terminating, this renders the tables unusable, as the debugger
cannot pry the individual tables apart anymore.
It might theoretically be possible to make the tables work with split
dwarf, as that is somewhat similar to the apple .o model, but
unfortunately right now the combination of -glldb and -gsplit-dwarf
produces broken object files.
Until these issues are resolved there is no point in emitting the apple
tables for these targets. At best, it wastes space; at worst, it breaks
compilation and prevents the user from getting other benefits of -glldb.
Reviewers: probinson, aprantl, dblaikie
Subscribers: emaste, dim, llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D41986
llvm-svn: 322633
Change symbol values in the stack_size section from being 8 bytes, to being a target dependent size.
Differential Revision: https://reviews.llvm.org/D42108
llvm-svn: 322619
We seem to be (logically) returning ArchExtKinds here in all cases, so
the return type should reflect that.
The static_cast is necessary because `A.ID` is actually an `unsigned`,
presumably since we use `decltype(A)` to represent extended attributes
for both ARM and AArch64, which use distinct `ArchExtKinds`.
We can't trivially make the same change for ARM, because one of the
values it returns is the bitwise-or of two `ARM::ArchExtKind`s.
llvm-svn: 322613
Summary:
- Fix a bug in PrettyBuiltinDumper that returns "void" as the name for
an unspecified builtin type. Since the unspecified param of a variadic
function is considered a builtin of unspecified type in PDBs, we set
"..." for its name.
- Provide a method to determine if a PDBSymbolFunc is variadic in
PrettyFunctionDumper since PDBSymbolFunc::getArgument() doesn't return the
last unspecified-type param.
- Add a pretty-func-dumper.test to test pretty dumping of variadic
functions.
Reviewers: zturner, llvm-commits
Reviewed By: zturner
Differential Revision: https://reviews.llvm.org/D41801
llvm-svn: 322608
This removes some duplication from splitCallSite and makes it easier to
add additional code dealing with each predecessor. It also allows us to
split for more than 2 predecessors, although that is not enabled for
now.
Reviewers: junbuml, mcrosier, davidxl, davide
Reviewed By: junbuml
Differential Revision: https://reviews.llvm.org/D41858
llvm-svn: 322599
Patch by Takuto Ikuta.
In chromium's component build, there are many directive sections and
commandline parsing takes much time.
This patch is for speed up of lld in RelWithDebInfo build by forcing
inline heavily called isWhitespace function.
10 times link perf stats of blink_core.dll changed like below.
master:
TotalSeconds: 9.8764878
TotalSeconds: 10.1455242
TotalSeconds: 10.075279
TotalSeconds: 10.3397347
TotalSeconds: 9.8361665
TotalSeconds: 9.9544441
TotalSeconds: 9.8960686
TotalSeconds: 9.8877865
TotalSeconds: 10.0551879
TotalSeconds: 10.0492254
Avg: 10.01159047
with this patch:
TotalSeconds: 8.8696762
TotalSeconds: 9.1021585
TotalSeconds: 9.0233893
TotalSeconds: 9.1886175
TotalSeconds: 9.156954
TotalSeconds: 9.0978564
TotalSeconds: 9.1316824
TotalSeconds: 8.8354606
TotalSeconds: 9.2549431
TotalSeconds: 9.4473085
Avg: 9.11080465
llvm-svn: 322595
When "xer" is specified as clobbered register in inline assembler, clang can accept it, but llvm simply ignore it when lowered to machine instructions. It may cause problems later in scheduler.
This patch adds a new register XER aliased to CARRY, and adds it to register class CARRYRC. Now PPCTargetLowering::getRegForInlineAsmConstraint can return correct register number for inline asm constraint "{xer}", and scheduler behave correctly.
Differential Revision: https://reviews.llvm.org/D41967
llvm-svn: 322591
r320606 checked for MI.isMetaInstruction which skips all DBG_VALUEs.
This also skips IMPLICIT_DEFs and other instructions that may def / read
a register.
Differential Revision: https://reviews.llvm.org/D42119
llvm-svn: 322584
Summary:
This patch adds CustomRenderer which renders the matched
operands to the specified instruction.
Targets can enable the matching of SDNodeXForm by adding
a definition that inherits from GICustomOperandRenderer and
GISDNodeXFormEquiv as follows.
def gi_imm8 : GICustomOperandRenderer<"renderImm8”>,
GISDNodeXFormEquiv<imm8_xform>;
Custom renderer functions should be of the form:
void render(MachineInstrBuilder &MIB, const MachineInstr &I);
Reviewers: dsanders, ab, rovka
Reviewed By: dsanders
Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet
Differential Revision: https://reviews.llvm.org/D42012
llvm-svn: 322582
Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation, if these operands are in the reverse order. Patch tries to improve this situation by the reordering of the operands to remove this extra shuffle operation.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D33954
llvm-svn: 322579
Current condition for spill instruction recognition in LiveDebugValues does
not recognize case when register is spilled and killed in next instruction.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D41226
llvm-svn: 322554
These pseudos are not supposed to be visible to user.
This patch reduced the auto-generated instruction matcher. For example,
the following words are removed from keyword list of LLVM BPF assembler.
- MCK__35_, // '#'
- MCK__COLON_, // ':'
- MCK__63_, // '?'
- MCK_ADJCALLSTACKDOWN, // 'ADJCALLSTACKDOWN'
- MCK_ADJCALLSTACKUP, // 'ADJCALLSTACKUP'
- MCK_PSEUDO, // 'PSEUDO'
- MCK_Select, // 'Select'
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 322535
As commented on the existing code:
// The Reg operand should be a virtual register, which is defined
// outside the current basic block. DAG combiner has done a pretty
// good job in removing truncating inside a single basic block.
However, when the Reg operand comes from bpf_load_[byte | half | word]
intrinsics, the generic optimizer doesn't understand their results are
zero extended, so these single basic block elimination opportunities were
missed.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 322534
This patch fixes the assertion failure in SROA reported in PR35657.
PR35657 reports the assertion failure due to r319522 (splitting for non-whole-alloca slices), but this problem can happen even without r319522.
The problem exists in a check for reusing an existing alloca when rewriting partitions. As the original comment said, we can reuse the existing alloca if the new alloca has the same type and offset with the existing one. But the code checks only type of the alloca and then check the offset using an assert.
In a corner case with out-of-bounds access (e.g. @PR35657 function added in unit test), it is possible that the two allocas have the same type but different offsets.
This patch makes the check of the offset in the if condition, and re-enables the splitting for non-whole-alloca slices.
Differential Revision: https://reviews.llvm.org/D41981
llvm-svn: 322533
Prior to this we had a separate instruction and register class that excluded eax to prevent matching the instruction that would encode with 0x90.
This patch changes this to just use an InstAlias to force xchgl %eax, %eax to use XCHG32rr instruction in 64-bit mode. This gets rid of the separate instruction and register class.
llvm-svn: 322532
As mentioned on PR35869, (and came up recently on D41517) we don't create a MMX zero register via the PXOR but instead perform a spill to stack from a XMM zero register.
This patch adds support for direct MMX zero vector creation and should make it easier to add better constant vector creation in the future as well.
Differential Revision: https://reviews.llvm.org/D41908
llvm-svn: 322525
Add support for custom execution domain fixing and implement support for BLENDPD/BLENDPS/PBLENDD/PBLENDW.
Differential Revision: https://reviews.llvm.org/D42042
llvm-svn: 322524