Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.
llvm-svn: 322279
The code that is supposed to "Round odd types to the next pow of two" seems
broken and as well completely unused (untested). It also seems that
ExpandStore really shouldn't ever change the memory VT, which this in fact
does.
As a first step in fixing the broken handling of vector stores (of irregular
types, e.g. an i1 vector), this code is removed. For discussion, see
https://bugs.llvm.org/show_bug.cgi?id=35520.
Review: Eli Friedman
llvm-svn: 322275
Summary:
As RKSimon suggested in pr35820, in the case that Src is smaller in
bit-size than Indices, need to widen Src to avoid type mismatch.
Fixes pr35820
Reviewers: RKSimon, craig.topper
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41865
llvm-svn: 322272
Although the register scavenger can often find a spare register, an emergency
spill slot is needed to guarantee success. Reserve this slot in cases where
the function is known to have a large stack (meaning the scavenger may be
needed when forming stack addresses).
llvm-svn: 322269
Summary:
- MSVC uses the none type for a variadic argument in CodeView
- Add a unit test
Reviewers: zturner, llvm-commits
Reviewed By: zturner
Differential Revision: https://reviews.llvm.org/D41931
llvm-svn: 322257
If the index is v2i64 we can use the scatter instruction that has v4i32/v4f32 data register, v2i64 index, and v2i1 mask. Similar was already done for gather.
Implement custom widening for v2i32 data to remove the code that reverses type legalization during lowering.
llvm-svn: 322254
The function can take a significant amount of time on some
complicated test cases, but for the currently only use of
the function we can stop the initialization much earlier
when we find out we are going to discard the result anyway
in the caller of the function.
Adding configurable cut-off points so that we avoid wasting time.
NFCI.
llvm-svn: 322248
Summary:
Addresses issue: https://bugs.llvm.org/show_bug.cgi?id=34595
The topmost class, `SmallVector`, has internal storage for some
elements; `N - 1` elements' bytes worth of space. Meanwhile a base
class `SmallVectorTemplateCommon` has room for one element as well,
totaling `N` elements' worth of space.
The space for the N elements is contiguous and straddles
`SmallVectorTemplateCommon` and `SmallVector`.
A class "between" those two owning the storage, `SmallVectorImpl`, in
its destructor, calls the destructor for elements contained in the
vector, if any. It uses `destroy_range(begin, end)` and deletes all
items in sequence, starting from the end.
By the time the destructor for `SmallVectorImpl` is running, though, the
memory for elements `[1, N)` is already poisoned, due to `SmallVector`'s
destructor having done its thing already.
So if the element type `T` has a nontrivial destructor that accesses any
members of the `T` instance being destroyed, we'll run into a
user-after-poison bug.
This patch moves the destruction loop into `SmallVector`'s destructor,
so any memory being accessed while dtors are running is not yet
poisoned.
Confirmed this broke before (and now works with this patch) with these
compiler flags:
-fsanitize=memory
-fsanitize-memory-use-after-dtor
-fsanitize-memory-track-origins
and with the cmake flag
`-DLLVM_USE_SANITIZER='MemoryWithOrigins;Undefined'` as well as
`MSAN_OPTIONS=poison_in_dtor=1`.
Patch By: elsteveogrande
Reviewers: eugenis, morehouse, dblaikie
Reviewed By: eugenis, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41916
llvm-svn: 322241
Like rL321668 / rL321672, the planned optimizer change to
fix these will be in ValueTracking, but we can test the
changes cleanly here with AArch64 codegen.
llvm-svn: 322238
Revert for now as the testcase is hitting a pre-existing verifier error
that manifest as a failure when expensive checks are enabled (or
-verify-machineinstrs) is used.
This reverts commit r322200.
llvm-svn: 322231
After D41349, we can no get a MCSubtargetInfo into the MCAsmBackend constructor. This allows us to get NOPL from a subtarget feature rather than a CPU name blacklist.
Differential Revision: https://reviews.llvm.org/D41721
llvm-svn: 322227
Simplify the code slightly: Instead of creating empty subranges in one
case and immediately removing them, do not create them in the first
place.
llvm-svn: 322226
Branch relaxation is needed to support branch displacements that overflow the
instruction's immediate field.
Differential Revision: https://reviews.llvm.org/D40830
llvm-svn: 322224
Make sure I really get back to the beahvior before my rewrite in r321035
which turned out not to be completely NFC as I changed the behavior for
the ios simulator environment.
llvm-svn: 322223
This is a prerequisite for the branch relaxation pass, and allows a number of
optimisation passes (e.g. BranchFolding and MachineBlockPlacement) to work.
Differential Revision: https://reviews.llvm.org/D40808
llvm-svn: 322222
Includes support for expanding va_copy. Also adds support for using 'aligned'
registers when necessary for vararg calls, and ensure the frame pointer always
points to the bottom of the vararg spill region. This is necessary to ensure
that the saved return address and stack pointer are always available at fixed
known offsets of the frame pointer.
Differential Revision: https://reviews.llvm.org/D40805
llvm-svn: 322215
Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is we assume scale is 1, else we take it from the element size of the pass thru or stored value. This seems a little weird and I think it makes more sense to make it explicit in the DAG rather than doing tricky things in the backend.
Most of this patch is just making sure we copy the scale around everywhere.
Differential Revision: https://reviews.llvm.org/D40055
llvm-svn: 322210
ADRP instructions weren't being outlined because they're PC-relative and thus
fail the LR checks. This patch adds a special case for ADRPs to
getOutliningType to make sure that ADRPs can be outlined and updates the MIR
test.
llvm-svn: 322207
D41353 / D41233 are proposing to alter the shl/and canonicalization,
but I think that would just move an existing pattern-matching hole
to a different place.
llvm-svn: 322206
Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.
This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
pointer.
- Note that `useFPForScavengingIndex()` would previously return false
when a base pointer was available leading to the emergency spillslot
getting allocated late (that's the whole effect of this callback).
Which made no sense to me so I took this case out: Even though the
emergency spillslot is technically not referenced by FP in this case
we still want it allocated early.
Differential Revision: https://reviews.llvm.org/D40876
llvm-svn: 322200
Summary:
After teaching InlineCost more about address spaces ()
another fault was detected in the inliner. If an argument has
the byval attribute the parameter might be copied to an alloca.
That part seems to work fine even if the argument has a different
address space than the alloca address space. However, if the
address spaces differ, then the inlined function still might
refer to the parameter using the original address space (the
inliner does not handle that situation very well).
This patch avoids the problem by simply disallowing inlining
when there are byval arguments with address space that differs
from the alloca address space.
I'm not really sure how to transform the code if we want to
get inlining for this situation. I assume that it never has
been working, and that the fixes in r321809 just exposed an
old problem.
Fault found by skatkov (Serguei Katkov). It is mentioned in
follow up comments to https://reviews.llvm.org/D40455.
Reviewers: skatkov
Reviewed By: skatkov
Subscribers: uabelho, eraman, llvm-commits, haicheng
Differential Revision: https://reviews.llvm.org/D41898
llvm-svn: 322181
Adds missing coverage for SHUFPD undef argument lowering, and also shows a missed opportunity to remove a unnecessary move compared to 02 shuffle mask.
llvm-svn: 322175
For hard float, it is legal.
For soft float, we need to lower to 0 - x first, and then we can use the
libcall for G_FSUB. This is undoing some of the canonicalization
performed by the IRTranslator (which introduces G_FNEG when it sees a
0 - x). Ideally, that canonicalization would be performed by a
pre-legalizer pass that would allow targets to opt out of this behaviour
rather than dance around it in the legalizer.
llvm-svn: 322168
Summary:
This extends TableGen's AsmMatcherEmitter with code that generates
a table with tied-operand constraints. The constraints are checked
when parsing the instruction. If an operand is not equal to its tied operand,
the assembler will give an error.
Patch [2/3] in a series to add operand constraint checks for SVE's predicated ADD/SUB.
Reviewers: olista01, rengolin, mcrosier, fhahn, craig.topper, evandro, echristo
Reviewed By: fhahn
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D41446
llvm-svn: 322166
Prefetches used to always be chained between any previous and following
memory accesses. The problem with this was that later optimizations, such as
folding of a load into the user instruction, got disrupted.
This patch relaxes the chaining of prefetches in order to remedy this.
Reveiw: Hal Finkel
https://reviews.llvm.org/D38886
llvm-svn: 322163
Since a load and test instruction treat its operands as signed, it can only
replace a logical compare for EQ/NE uses.
Review: Ulrich Weigand
https://bugs.llvm.org/show_bug.cgi?id=35662
llvm-svn: 322161
Planning to add support for named vregs. This puts is in a conundrum since
physregs are named as well. To rectify this we need to use a sigil other than
'%' for physregs in MIR. We've settled on using '$' for physregs but first we
must repurpose it from external symbols using it, which is what this commit is
all about. We think '&' will have familiar semantics for C/C++ users.
llvm-svn: 322146
There were a few places where outs() was being used
directly rather than the ScopedPrinter object.
Differential Revision: https://reviews.llvm.org/D41370
llvm-svn: 322141
version being used on some of the green dragon builders (plus a clang-format).
Workaround: AsynchronousSymbolQuery and VSO want to work with
JITEvaluatedSymbols anyway, so just use them (instead of JITSymbol, which
happens to tickle the bug).
The libcxx bug being worked around was fixed in r276003, and there are plans to
update the offending builders.
llvm-svn: 322140
Summary:
LowerTypeTests moves some function definitions from individual object
files to the merged module, leaving a stub to be called in the merged
module's jump table. If an alias was pointing to such a function
definition LowerTypeTests would fail because the alias would be left
without a definition to point to.
This change 1) emits information about aliases to the ThinLTO summary,
2) replaces aliases pointing to function definitions that are moved to
the merged module with function declarations, and 3) re-emits those
aliases in the merged module pointing to the correct function
definitions.
The patch does not correctly fix all possible mis-uses of aliases in
LowerTypeTests. For example, it does not handle aliases with a different
type from the pointed to function.
The addition of alias data increases the size of Chrome build artifacts
by less than 1%.
Reviewers: pcc
Reviewed By: pcc
Subscribers: mehdi_amini, eraman, mgrang, llvm-commits, eugenis, kcc
Differential Revision: https://reviews.llvm.org/D41741
llvm-svn: 322139
Adds option /guard:cf to clang-cl and -cfguard to cc1 to emit function IDs
of functions that have their address taken into a section named .gfids$y for
compatibility with Microsoft's Control Flow Guard feature.
The original patch didn't have the lit.local.cfg file that restricts the new
test to x86, thus the new test was failing on the non-x86 bots.
Differential Revision: https://reviews.llvm.org/D40531
The reverts r322008, which was a revert of r322005.
This reverts commit a05b89f9aca70597dc79fe97bc49b50b51f525ba.
llvm-svn: 322136
This is more in line with what happens in the final
executable when symbols are undefined (i.e. weak
references).
Differential Revision: https://reviews.llvm.org/D41840
llvm-svn: 322130
Summary:
When performing constant propagation for call instructions we have historically replaced all uses of the return from a call, but not removed the call itself. This is required for correctness if the calls have side effects, however the compiler should be able to safely remove calls that don't have side effects.
This allows the compiler to completely fold away calls to functions that have no side effects if the inputs are constant and the output can be determined at compile time.
Reviewers: davide, sanjoy, bruno, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38856
llvm-svn: 322125
This patch makes the following changes to the schedule of instructions in the
prologue and epilogue.
The stack pointer update is moved down in the prologue so that the callee saves
do not have to wait for the update to happen.
Saving the lr is moved down in the prologue to hide the latency of the mflr.
The stack pointer is moved up in the epilogue so that restoring of the lr can
happen sooner.
The mtlr is moved up in the epilogue so that it is away form the blr at the end
of the epilogue. The latency of the mtlr can now be hidden by the loads of the
callee saved registers.
This commit is almost identical to this one: r322036 except that two warnings
that broke build bots have been fixed.
The revision number is D41737 as before.
llvm-svn: 322124
These indexes are useful because they are not always zero based and
functions and globals are referenced elsewhere by their index.
This matches what we already do for the type index space.
Differential Revision: https://reviews.llvm.org/D41877
llvm-svn: 322121
Summary:
In the case of an fp_extend of v1f16 to v1f32 where the v1f16 is the
result of a bitcast from i16, avoid creating an illegal fp16_to_fp where
the input is not a vector and the result is a v1f32.
V2: The fix is now to avoid vector scalarization creating a v1->scalar
bitcast.
Reviewers: srhines, t.p.northover
Subscribers: nhaehnle, llvm-commits, dstuttard, t-tye, yaxunl, wdng, kzhuravl, arsenm
Differential Revision: https://reviews.llvm.org/D41126
llvm-svn: 322120
Summary:
I had a case where multiple nested uniform ifs resulted in code that did
v_cmp comparisons, combining the results with s_and_b64, s_or_b64 and
s_xor_b64 and using the resulting mask in s_cbranch_vccnz, without first
ensuring that bits for inactive lanes were clear.
There was already code for inserting an "s_and_b64 vcc, exec, vcc" to
clear bits for inactive lanes in the case that the branch is instruction
selected as s_cbranch_scc1 and is then changed to s_cbranch_vccnz in
SIFixSGPRCopies. I have added the same code into SILowerControlFlow for
the case that the branch is instruction selected as s_cbranch_vccnz.
This de-optimizes the code in some cases where the s_and is not needed,
because vcc is the result of a v_cmp, or multiple v_cmp instructions
combined by s_and/s_or. We should add a pass to re-optimize those cases.
Reviewers: arsenm, kzhuravl
Subscribers: wdng, yaxunl, t-tye, llvm-commits, dstuttard, timcorringham, nhaehnle
Differential Revision: https://reviews.llvm.org/D41292
llvm-svn: 322119
Patch by Takuto Ikuta.
This patch reduces lld link time of chromium's blink_core.dll in
component build.
Total size of input argument in .directives become nearly 300MB in the
build and almost all its content are /EXPORT.
To reduce time of parsing too many /EXPORT option in the build, I
introduce fastpath for /EXPORT in ArgParser::parseDirectives.
On my desktop machine, 4 times stats of the link time are like below.
Improved around 20%.
This patch
TotalSeconds : 8.6217627
TotalSeconds : 8.5402175
TotalSeconds : 8.6855853
TotalSeconds : 8.3624441
Ave : 8.5525024
master
TotalSeconds : 10.9975031
TotalSeconds : 11.3409428
TotalSeconds : 10.6332897
TotalSeconds : 10.7650687
Ave : 10.934201075
llvm-svn: 322117
Add powerpc- (32-bit) as XFAIL for tests that are documented either in-
line or via commit messages as expected to fail on big-endian systems.
Tests not documented in-line are documented in commit messages as
follows:
r211172 - test/tools/llvm-cov/llvm-cov.test
r247920 - test/Transforms/SampleProfile/gcc-simple.ll
llvm-svn: 322114
Summary:
This pass synthesizes function entry counts by traversing the callgraph
and using the relative block frequencies of the callsites. The intended
use of these counts is in inlining to determine hot/cold callsites in
the absence of profile information.
The pass is split into two files with the code that propagates the
counts in a callgraph in a Utils file. I plan to add support for
propagation in the thinlto link phase and the propagation code will be
shared and hence this split. I did not add support to the old PM since
hot callsite determination in inlining is not possible in old PM
(although we could use hot callee heuristic with synthetic counts in the
old PM it is not worth the effort tuning it)
Reviewers: davidxl, silvas
Subscribers: mgorny, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D41604
llvm-svn: 322110
Summary:
https://reviews.llvm.org/rL321877 introduced the `OptTable::findNearest`
method, to find the closest edit distance option for a given string.
However, the implementation contained a bug: for a typo `-foo` with an
edit distance of 1 away from a valid option `--foo`, `findNearest`
would suggest a nearby option of `foo`. That is, the result would not
include the `--` prefix, and so was not a valid option.
Fix the bug by ensuring that the prefix string is initialized to one of
the valid prefixes for the option.
Test Plan: `check-llvm-unit`
Reviewers: v.g.vassilev, teemperor, ruiu, jroelofs, yamaguchi
Reviewed By: jroelofs
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41873
llvm-svn: 322109
Summary:
If the vector type is transformed to non-vector single type, the compile
may crash trying to get vector information about non-vector type.
Reviewers: RKSimon, spatel, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41862
llvm-svn: 322106
Because of potential UB (known bits conflicts with an llvm.assume),
we have to check rather than assert here because InstSimplify doesn't
kill the compare:
https://bugs.llvm.org/show_bug.cgi?id=35846
llvm-svn: 322104
Summary:
With DebugTypeODRUniquing enabled, during IR linking debug metadata
in the destination module may be reached from the source module.
This means that ConstantAsMetadata nodes (e.g. on DITemplateValueParameter)
may contain a value the destination module. When trying to map such
metadata nodes, we will attempt to map a GV already in the dest module.
linkGlobalValueProto will end up with a source GV that is the same as
the dest GV as well as the new GV. Trying to access the TypeMap for the
source GV type, which is actually a dest GV type, hits an assertion
since it appears that we have mapped into the source module (because the
type is the value not a key into the map).
Detect that we don't need to access the TypeMap in this case, since
there is no need to create a bitcast from the new GV to the source GV
type as they GV are the same.
Fixes PR35722.
Reviewers: mehdi_amini, pcc
Subscribers: probinson, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D41624
llvm-svn: 322103
Summary:
That would allow to recursively compare directories in tests using
"diff -r" on Windows in a similar way as it can be done on Linux or Mac.
Reviewers: zturner, morehouse, vsk
Reviewed By: zturner
Subscribers: kcc, llvm-commits
Differential Revision: https://reviews.llvm.org/D41776
llvm-svn: 322102
Normally target independent DAG combine would do this combine based on getSetCCResultType, but with VLX getSetCCResultType returns a vXi1 type preventing the DAG combining from kicking in.
But doing this combine can allow us to remove the explicit sign extend that would otherwise be emitted.
This patch adds a target specific DAG combine to combine the sext+setcc when the result type is the same size as the input to the setcc. I've restricted this to FP compares and things that can be represented with PCMPEQ and PCMPGT since we don't have full integer compare support on the older ISAs.
Differential Revision: https://reviews.llvm.org/D41850
llvm-svn: 322101
llc, opt, and clang can all autodetect the CPU and supported features. lli cannot as far as I could tell.
This patch uses the getCPUStr() and introduces a new getCPUFeatureList() and uses those in lli in place of MCPU and MAttrs.
Ideally, we would merge getCPUFeatureList and getCPUFeatureStr, but opt and llc need a string and lli wanted a list. Maybe we should just return the SubtargetFeature object and let the caller decide what it needs?
Differential Revision: https://reviews.llvm.org/D41833
llvm-svn: 322100
This change adds the missing armv8l variant as an alias of armv8 architecture.
The issue was observed with several regressions in validation on armv8l
hardware (for instance ExecutionEngine/frem.ll failed due to lack of neon fpu).
Tested with regression testsuite passed without regression on ARM and x86_64.
Patch by Yvan Roux.
Reviewers: rengolin, rogfer01, olista01, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D41859
llvm-svn: 322098
In -debug output we print "pred:" whenever a MachineOperand is a
predicate operand in the instruction descriptor, and "opt:" whenever a
MachineOperand is an optional def in the instruction descriptor.
Differential Revision: https://reviews.llvm.org/D41870
llvm-svn: 322096
Summary:
The idea is that it would replace
(non-Writable)MemoryBuffer::getNewMemBuffer, which is quite useless
unless you const_cast its contents to write to it (which all (both)
callers of this function were doing). This patch also fixes one of the usages in
COFFWriter. After fixing the other usage in clang, I plan to delete the old
function.
Reviewers: dblaikie, Bigcheese
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41540
llvm-svn: 322094
Fixed issue that was found on sanitizer-x86_64-linux-fast.
I changed the result type of 'Parser.getTok().getString().lower()'
in AArch64AsmParser::tryParseSVEPredicateVector() from 'StringRef' to
'auto', since StringRef::lower() returns a std::string.
llvm-svn: 322092
Summary:
Added the FastVariableShuffle feature to cases that resembled processors
for which this fearure is on.
For AVX2 there are processors with and w/o this fearue enable.
For AVX512 only KNL does enable this feature so cases which only have
+avx512f were left without the FastVariableShuffle enabled.
Reviewers: RKSimon, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41851
llvm-svn: 322090
Currently the MachineInstr::print function prints the
frame-setup/frame-destroy differently than it does in MIR.
Instead of:
%x21 = LDR %sp, -16; flags: FrameDestroy
print:
%x21 = frame-destroy LDR %sp, -16
llvm-svn: 322088
Ingredients in this patch:
1. Add HANDLE_LIBCALL defs for finite mathlib functions that correspond to LLVM intrinsics.
2. Plumbing to send TargetLibraryInfo down to SelectionDAGLegalize.
3. Relaxed math and library checking in SelectionDAGLegalize::ConvertNodeToLibcall() to choose finite libcalls.
There was a bug about determining the availability of the finite calls that should be fixed with:
rL322010
Not in this patch:
This doesn't resolve the question/bug of clang creating the intrinsic IR in the first place.
There's likely follow-up work needed to support the long double variants better.
There's room for improvement to reduce the code duplication.
Create finite calls that don't originate from a corresponding intrinsic or DAG node?
Differential Revision: https://reviews.llvm.org/D41338
llvm-svn: 322087
Since register classes and banks are already printed with the register
definition, don't print it at the end of every instruction anymore.
This follows MIR in this regard and is another step to the unification
of the two formats.
llvm-svn: 322086
EarlyCSE did not try to salvage debug info during erasing of instructions.
This change fixes it.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D41496
llvm-svn: 322083
We are printing / parsing the `frame-setup` MachineInstr flag but not
the `frame-destroy` one.
Differential Revision: https://reviews.llvm.org/D41509
llvm-svn: 322071
Summary:
Parsing of the '/m' (merging) or '/z' (zeroing) suffix of a predicate operand.
Patch [2/3] in a series to add predicated ADD/SUB instructions for SVE.
Reviewers: rengolin, mcrosier, evandro, fhahn, echristo, MatzeB, t.p.northover
Reviewed By: fhahn
Subscribers: t.p.northover, MatzeB, aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D41442
llvm-svn: 322070
Summary:
This commit enables some of the arithmetic instructions for Nios2 ISA (for both
R1 and R2 revisions), implements facilities required to emit those instructions
and provides LIT tests for added instructions.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D41236
Author: belickim <mateusz.belicki@intel.com>
llvm-svn: 322069
CET (Control-Flow Enforcement Technology) introduces a new mechanism called IBT (Indirect Branch Tracking).
According to IBT, each Indirect branch should land on dedicated ENDBR instruction (End Branch).
The new pass adds ENDBR instructions for every indirect jmp/call (including jumps using jump tables / switches).
For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
Differential Revision: https://reviews.llvm.org/D40482
Change-Id: Icb754489faf483a95248f96982a4e8b1009eb709
llvm-svn: 322062
When cross-compiling for Windows on Unix, the built toolchain will need
to be transferred to Windows to actually run. My opinion is that the
Unix build should use symlinks, and the transfer to Windows should take
care of making those symlinks usable. E.g., I envision tarballs to be a
common form of transfer from Unix to Windows, in which case the tarball
can be created using --dereference to follow the symlinks.
The motivation here is that, when cross-compiling for Windows on Unix,
the installation will *already* create symlinks. The reason is that the
installation script will be invoked without knowing the host system, so
the `if(UNIX)` check in the installation symlink creation script will
reflect the build system rather than the host system. We could either
make the build and install trees both contain copies or both contain
symlinks, and using symlinks is a significant space saving without (in
my opinion) having any detrimental effect on the usage of the cross-
compiled toolchain on Windows.
A secondary motivation is that Windows 10 version 1703 and later finally
lift the administrator rights requirement for creating symbolic links
(if the system is in Developer Mode), which makes symlinks a lot more
practical even on Windows. Of course Unix and Windows symlinks aren't
interoperable, but symlinks for Windows toolchains is a reasonable
future direction to be going in anyway.
Differential Revision: https://reviews.llvm.org/D41314
llvm-svn: 322061
The code that checks the immediate wasn't masking to the lower 3-bits like the code in X86InstrInfo.cpp that's used by the peephole pass does.
llvm-svn: 322060
SCEV tracks the correspondence of created SCEV to original instruction.
However during creation of SCEV it is possible that nuw/nsw/exact flags are
lost.
As a result during expansion of the SCEV the instruction with nuw/nsw/exact
will be used where it was expected and we produce poison incorreclty.
Reviewers: sanjoy, mkazantsev, sebpop, jbhateja
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41578
llvm-svn: 322058
If the offset is differ in two addressing mode we can continue only if
ScaleReg is not set due to we will use it as merge of different offsets.
It should fix PR35799 and PR35805.
Reviewers: john.brawn, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41227
llvm-svn: 322056
The CTRLoop pass performs checks on the argument of certain libcalls/intrinsics,
and assumes the arguments must be of a simple type. This isn't always the case
though. For example if we unroll and vectorize a loop we may end up with vectors
larger then the largest legal type, along with intrinsics that operate on those
wider types. This happened in the ffmpeg build, where we unrolled a loop and
ended up with a sqrt intrinsic that operated on V16f64, triggering an assertion.
Differential Revision: https://reviews.llvm.org/D41758
llvm-svn: 322055
I had to drop fast-isel-abort from a test because we can't fast isel some of the mask stuff. When we used intrinsics we implicitly fell back to SelectionDAG for the intrinsic call without triggering the abort error. But with native IR that doesn't happen the same way.
llvm-svn: 322050
The pattern was this
def : Pat<(i32 (zext (i8 (bitconvert (v8i1 VK8:$src))))),
(MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit))>, Requires<[NoDQI]>;
but if you just let (i32 (zext X)) match byte itself you'll get MOVZX32rr8. And if you let (i8 (bitconvert (v8i1 VK8:$src))) match by itself you'll get (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit).
So we can just let isel do the two patterns naturally.
llvm-svn: 322049
This commit does two things. Firstly, it adds a collection of flags which can
be passed along to the target to encode information about the MBB that an
instruction lives in to the outliner.
Second, it adds some of those flags to the AArch64 outliner in order to add
more stack instructions to the list of legal instructions that are handled
by the outliner. The two flags added check if
- There are calls in the MachineBasicBlock containing the instruction
- The link register is available in the entire block
If the link register is available and there are no calls, then a stack
instruction can always be outlined without fixups, regardless of what it is,
since in this case, the outliner will never modify the stack to create a
call or outlined frame.
The motivation for doing this was checking which instructions are most often
missed by the outliner. Instructions like, say
%sp<def> = ADDXri %sp, 32, 0; flags: FrameDestroy
are very common, but cannot be outlined in the case that the outliner might
modify the stack. This commit allows us to outline instructions like this.
llvm-svn: 322048
When cross-compiling, we cannot use the just built toolchain, instead
we need to use the host toolchain which we assume has a support for
targeting the selected target platform. We also need to pass the path
to the native version of llvm-config to external projects.
Differential Revision: https://reviews.llvm.org/D41678
llvm-svn: 322046
I'm going to convert these to 'icmp slt X, zeroinitializer' in clang's CGBuiltin.cpp, but the GCCBuiltin names need to be removed to do that.
llvm-svn: 322037
This patch makes the following changes to the schedule of instructions in the
prologue and epilogue.
The stack pointer update is moved down in the prologue so that the callee saves
do not have to wait for the update to happen.
Saving the lr is moved down in the prologue to hide the latency of the mflr.
The stack pointer is moved up in the epilogue so that restoring of the lr can
happen sooner.
The mtlr is moved up in the epilogue so that it is away form the blr at the end
of the epilogue. The latency of the mtlr can now be hidden by the loads of the
callee saved registers.
Differential Revision: https://reviews.llvm.org/D41737
llvm-svn: 322036
If the make program isn't in the path, the native configure will fail.
Pass CMAKE_MAKE_PROGRAM to the native configure explicitly to remedy
this, similar to what's already done for external project configuration.
Explicitly set CMAKE_MAKE_PROGRAM before the user flags so that they can
override it for the native build if they desire (though I can't fathom
why that would be useful).
llvm-svn: 322032
The problem was that our Obj -> Yaml dumper had not been taught
to handle certain types of records. This meant that when I
generated the test input files, the records were still there but
none of its fields were filled out. So when it did the
Yaml -> Obj conversion as part of the test, it generated records
with garbage in them.
The patch here fixes the Obj <-> Yaml converter, and additionally
updates the test file with fresh Yaml generated by the fixed
converter.
llvm-svn: 322029
Add test/DebugInfo/MIR/Mips/lit.local.cfg so no tests are run if Mips is
not a supported target.
This should resolve buildbot failures seen after r322015.
llvm-svn: 322020
The last iterator of MBB should be recognized as MBB.end() not as
MBB.instr_end() which could return bundled instruction that is not iterable
with basic iterator.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D41626
llvm-svn: 322015
This patch was part of:
https://reviews.llvm.org/D41338
...but we can expose the bug in IR via constant propagation
as shown in the test. Unless the triple includes 'linux', we
should not fold these because the functions don't exist on
other platforms (yet?).
llvm-svn: 322010
The new test fails on the Hexagon bot. Reverting while I investigate.
This reverts https://reviews.llvm.org/rL322005
This reverts commit b7e0026b4385180c378edc658ec91a39566f2942.
llvm-svn: 322008
This is an attempt of fixing PR35807.
Due to the non-standard definition of dominance in LLVM, where uses in
unreachable blocks are dominated by anything, you can have, in an
unreachable block:
%patatino = OP1 %patatino, CONSTANT
When `SimplifyInstruction` receives a PHI where an incoming value is of
the aforementioned form, in some cases, loops indefinitely.
What I propose here instead is keeping track of the incoming values
from unreachable blocks, and replacing them with undef. It fixes this
case, and it seems to be good regardless (even if we can't prove that
the value is constant, as it's coming from an unreachable block, we
can ignore it).
Differential Revision: https://reviews.llvm.org/D41812
llvm-svn: 322006
Adds option /guard:cf to clang-cl and -cfguard to cc1 to emit function IDs
of functions that have their address taken into a section named .gfids$y for
compatibility with Microsoft's Control Flow Guard feature.
Differential Revision: https://reviews.llvm.org/D40531
llvm-svn: 322005
This patch improves diagnostic for case when mapped instruction
does not contain a field listed under RowFields.
Differential Revision: https://reviews.llvm.org/D41778
llvm-svn: 322004
There is precedence for factorization transforms in instcombine for FP ops with fast-math.
We also have similar logic in foldSPFofSPF().
It would take more work to add this to reassociate because that's specialized for binops,
and min/max are not binops (or even single instructions). Also, I don't have evidence that
larger min/max trees than this exist in real code, but if we find that's true, we might
want to reorganize where/how we do this optimization.
In the motivating example from https://bugs.llvm.org/show_bug.cgi?id=35717 , we have:
int test(int xc, int xm, int xy) {
int xk;
if (xc < xm)
xk = xc < xy ? xc : xy;
else
xk = xm < xy ? xm : xy;
return xk;
}
This patch solves that problem because we recognize more min/max patterns after rL321672
https://rise4fun.com/Alive/Qjnehttps://rise4fun.com/Alive/3yg
Differential Revision: https://reviews.llvm.org/D41603
llvm-svn: 321998
Summary:
Fixes the bug with incorrect handling of InsertValue|InsertElement
instrucions in SLP vectorizer. Currently, we may use incorrect
ExtractElement instructions as the operands of the original
InsertValue|InsertElement instructions.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41767
llvm-svn: 321994
Summary:
If the vectorized value is marked as extra reduction argument, its users
are not considered as external users. Patch fixes this.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41786
llvm-svn: 321993
This patch allows `r7` to be used, regardless of its use as a frame pointer, as
a temporary register when popping `lr`, and also falls back to using a high
temporary register if, for some reason, we weren't able to find a suitable low
one.
Differential revision: https://reviews.llvm.org/D40961
Fixes https://bugs.llvm.org/show_bug.cgi?id=35481
llvm-svn: 321989
(Target)FrameLowering::determineCalleeSaves can be called multiple
times. I don't think it should have side-effects as creating stack
objects and setting global MachineFunctionInfo state as it is doing
today (in other back-ends as well).
This moves the creation of stack objects from determineCalleeSaves to
assignCalleeSavedSpillSlots.
Differential Revision: https://reviews.llvm.org/D41703
llvm-svn: 321987
These tests assumes availability of external symbols provided by the
C++ library, but those won't be available in case when the C++ library
is statically linked because lli itself doesn't need these.
This uses llvm-readobj -needed-libs to check if C++ library is linked as
shared library and exposes that information as a feature to lit.
Differential Revision: https://reviews.llvm.org/D41272
llvm-svn: 321981
The approach was never discussed, I wasn't able to reproduce this
non-determinism, and the original author went AWOL.
After a discussion on the ML, Philip suggested to revert this.
llvm-svn: 321974
I had removed the qualifiers around the autogenerated folding table so I could compare with the manual table, but didn't intend to commit the change.
llvm-svn: 321971
Allow SimplifyDemandedBits to use TargetLoweringOpt::computeKnownBits to look through bitcasts. This can help simplifying in some cases where bitcasts of constants generated during or after legalization can't be folded away, and thus didn't get picked up by SimplifyDemandedBits. This fixes PR34620, where a redundant pand created during legalization from lowering and lshr <16xi8> wasn't being simplified due to the presence of a bitcasted build_vector as an operand.
Committed on the behalf of @sameconrad (Sam Conrad)
Differential Revision: https://reviews.llvm.org/D41643
llvm-svn: 321969
Summary:
There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type.
It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway.
This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.
We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.
I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.
There's definitely room for improvement with some follow up patches.
Reviewers: RKSimon, zvi, guyblank
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41560
llvm-svn: 321967
Another small step forward to move VPlan stuff outside of LoopVectorize.cpp.
VPlanBuilder.h is renamed to LoopVectorizationPlanner.h
LoopVectorizationPlanner class is moved from LoopVectorize.cpp to
LoopVectorizationPlanner.h LoopVectorizationCostModel::VectorizationFactor
class is moved to LoopVectorizationPlanner.h (used by the planner class) ---
this needs further streamlining work in later patches and thus all I did was
take it out of the CostModel class and moved to the header file. The callback
function had to stay inside LoopVectorize.cpp since it calls an
InnerLoopVectorizer member function declared in it. Next Steps: Make
InnerLoopVectorizer, LoopVectorizationCostModel, and other classes more modular
and more aligned with VPlan direction, in small increments.
Previous step was: r320900 (https://reviews.llvm.org/D41045)
Patch by Hideki Saito, thanks!
Differential Revision: https://reviews.llvm.org/D41420
llvm-svn: 321962
In addition to target-dependent attributes, we can also preserve a
white-listed subset of target independent function attributes. The white-list
excludes problematic attributes, most prominently:
* attributes related to memory accesses, as alloca instructions
could be moved in/out of the extracted block
* control-flow dependent attributes, like no_return or thunk, as the
relerelevant instructions might or might not get extracted.
Thanks @efriedma and @aemerson for providing a set of attributes that cannot be
propagated.
Reviewers: efriedma, davidxl, davide, silvas
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41334
llvm-svn: 321961
Summary:
I believe legalization is really expecting that ReplaceNodeResults will return something with the same type as the thing that's being legalized. Ultimately, it uses the output to replace the uses in the DAG so the type should match to make that work.
There are two relevant cases here. When crbits are enabled, then i1 is a legal type and getSetCCResultType should return i1. In this case, the truncate will be between i1 and i1 and should be removed (SelectionDAG::getNode does this). Otherwise, getSetCCResultType will be i32 and the legalizer will promote the truncate to be i32 -> i32 which will be similarly removed.
With this fixed we can remove some code from PromoteIntRes_SETCC that seemed to only exist to deal with the intrinsic being replaced with a larger type without changing the other operand. With the truncate being used for connectivity this doesn't happen anymore.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: nemanjai, llvm-commits, kbarton
Differential Revision: https://reviews.llvm.org/D41654
llvm-svn: 321959
The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment.
llvm-svn: 321956
The memory form of the xmm->xmm version only writes 64-bits. If we use it in the folding tables and its get used for a stack spill, only half the slot will be written. Then a reload may read all 128-bits which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros.
llvm-svn: 321950
We don't do fine grained feature control like this on features prior to AVX512.
We do still have checks in place in the assembly parser itself that prevents %zmm references or %xmm16-31 from being parsed without at least -mattr=avx512f. Same for rounding control and mask operands. That will prevent the table matcher from matching for any instructions that need those features and that's probably good enough.
llvm-svn: 321947
If the varargs are not accessed by a function, we can inline the
function.
Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41335
llvm-svn: 321940
In the minimal case, this won't remove instructions, but it still improves
uses of existing values.
In the motivating example from PR35834, it does remove instructions, and
sets that case up to be optimized by something like D41603:
https://reviews.llvm.org/D41603
llvm-svn: 321936
This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325
We're trading branch and compares for loads and logic ops.
This makes the code smaller and hopefully faster in most cases.
The 24-byte test shows an interesting construct: we load the trailing scalar
elements into vector registers and generate the same pcmpeq+movmsk code that
we expected for a pair of full vector elements (see the 32- and 64-byte tests).
Differential Revision: https://reviews.llvm.org/D41714
llvm-svn: 321934
Having a single call to findDbgUsers() allows salvageDebugInfo() to
return earlier.
Differential Revision: https://reviews.llvm.org/D41787
llvm-svn: 321915
This had been reverted because the new test failed on non-X86 bots. I moved
the new test to the appropriate subdirectory to correct this.
Differential Revision: https://reviews.llvm.org/D41264
Original submission: r321122 (which was reverted by r321125)
This reverts commit 3c1639b5703c387a0d8cba2862803b4e68dff436.
llvm-svn: 321911
Summary:
This commit updates the BufferByteStreamer, used by DebugLocStream
to buffer bytes/comments to put in the debug_loc section, to
make sure that the Buffer and Comments vectors are synced.
Previously, when an SLEB128 or ULEB128 was emitted together with
a comment, the vectors could be out-of-sync if the LEB encoding
added several entries to the Buffer vectors, while we only added
a single entry to the Comments vector.
The goal with this is to get the comments in the debug_loc
section in the .s file correctly aligned.
Example (using ARM as target):
Instead of
.byte 144 @ sub-register DW_OP_regx
.byte 128 @ 256
.byte 2 @ DW_OP_piece
.byte 147 @ 8
.byte 8 @ sub-register DW_OP_regx
.byte 144 @ 257
.byte 129 @ DW_OP_piece
.byte 2 @ 8
.byte 147 @
.byte 8 @
we now get
.byte 144 @ sub-register DW_OP_regx
.byte 128 @ 256
.byte 2 @
.byte 147 @ DW_OP_piece
.byte 8 @ 8
.byte 144 @ sub-register DW_OP_regx
.byte 129 @ 257
.byte 2 @
.byte 147 @ DW_OP_piece
.byte 8 @ 8
Reviewers: JDevlieghere, rnk, aprantl
Reviewed By: aprantl
Subscribers: davide, Ka-Ka, uabelho, aemerson, javed.absar, kristof.beyls, llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D41763
llvm-svn: 321907
Without this we allow "vmovd %rax, %xmm0", but not "vmovd %rax, %xmm16"
This exists due to continue a silly bug where really old versions of the GNU assembler required movd instead of movq on these instructions. This compatibility hack then crept forward to avx version too, but we didn't propagate it to avx512.
llvm-svn: 321903
This behavior existed to work with an old version of the gnu assembler on MacOS that only accepted this form. Newer versions of GNU assembler and the current LLVM derived version of the assembler on MacOS support movq as well.
llvm-svn: 321898
Otherwise, in some extreme test case, very long names are created and the
compiler consumes large amount of memory. Size limit is set to a relatively
high value not to disturb debugging.
Compiler flag -non-global-value-max-name-size=<value> can be used to customize
the size.
Differential Revision: https://reviews.llvm.org/D41296
llvm-svn: 321886
This change adds support in llvm-objcopy for GNU objcopy's --localize-hidden
option. This option changes every hidden or internal symbol into a local symbol.
llvm-svn: 321884
This is not a record type that clang currently generates,
but it is a record that is encountered in object files generated
by cl. This record is unusual in that it refers directly to
the string table instead of indirectly to the string table via
the FileChecksums table. Because of this, it was previously
overlooked and we weren't remapping the string indices at all.
This would lead to crashes in MSVC when trying to display a
variable whose debug info involved an S_FILESTATIC.
Original bug report by Alexander Ganea
Differential Revision: https://reviews.llvm.org/D41718
llvm-svn: 321883
Besides the bug of omitting the inverse transform of max(~a, ~b) --> ~min(a, b),
the use checking and operand creation were off. We were potentially creating
repeated identical instructions of existing values. This led to infinite
looping after I added the extra folds.
By using the simpler m_Not matcher and not creating new 'not' ops for a and b,
we avoid that problem. It's possible that not using IsFreeToInvert() here is
more limiting than the simpler matcher, but there are no tests for anything
more exotic. It's also possible that we should relax the use checking further
to handle a case like PR35834:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but we can make that a follow-up if it is needed.
llvm-svn: 321882
Summary:
Remove a platform-specific path separator added to the llvm-mt help text test
in https://reviews.llvm.org/D41732.
Test Plan: `check-llvm`
llvm-svn: 321881
We have some code to try to determine how many pieces an MSF
Free Page Map is split into, and this code had an off by one
error which would cause the calculation to be incorrect when
there were exactly 4096*k + 1 blocks in an MSF file.
Original investigation and patch outline by Colden Cullen.
Differential Revision: https://reviews.llvm.org/D41742
llvm-svn: 321880
Summary:
Add a method `OptTable::findNearest`, which allows users of OptTable to
check user input for misspelled options. In addition, have llvm-mt
check for misspelled options. For example, if a user invokes
`llvm-mt /oyt:foo`, the error message will indicate that while an
option named `/oyt:` does not exist, `/out:` does.
The method ports the functionality of the `LookupNearestOption` method
from LLVM CommandLine to libLLVMOption. This allows tools like Clang
and Swift, which do not use CommandLine, to use this functionality to
suggest similarly spelled options.
As room for future improvement, the new method as-is cannot yet properly suggest
nearby "joined" options -- that is, for an option string "-FozBar", where
"-Foo" is the correct option name and "Bar" is the value being passed along
with the misspelled option, this method will calculate an edit distance of 4,
by deleting "Bar" and changing "z" to "o". It should instead calculate an edit
distance of just 1, by changing "z" to "o" and recognizing "Bar" as a
value. This commit includes a disabled test that expresses this limitation.
Test Plan: `check-llvm`
Reviewers: yamaguchi, v.g.vassilev, teemperor, ruiu, jroelofs
Reviewed By: jroelofs
Subscribers: jroelofs, llvm-commits
Differential Revision: https://reviews.llvm.org/D41732
llvm-svn: 321877
Summary: The test is failing because Windows do not support "diff -r".
Reviewers: Dor1s
Reviewed By: Dor1s
Differential Revision: https://reviews.llvm.org/D41768
llvm-svn: 321876
Summary:
Local testing has demonstrated a great speed improvement, compare the following:
1) Existing version:
```
$ time llvm-cov show -format=html -output-dir=report -instr-profile=... ...
The tool has been launched: 00:00:00
Loading coverage data: 00:00:00
Get unique source files: 00:00:33
Creating an index out of the source files: 00:00:34
Going into prepareFileReports: 00:00:34
Going to emit summary information for each file: 00:28:55 <-- 28:21 min!
Going to emit links to files with no function: 00:28:55
Launching 32 threads for generating HTML files: 00:28:55
real 37m43.651s
user 112m5.540s
sys 7m39.872s
```
2) Multi-threaded version with 32 CPUs:
```
$ time llvm-cov show -format=html -output-dir=report -instr-profile=... ...
The tool has been launched: 00:00:00
Loading coverage data: 00:00:00
Get unique source files: 00:00:38
Creating an index out of the source files: 00:00:40
Going into prepareFileReports: 00:00:40
Preparing file reports using 32 threads: 00:00:40
# Creating thread tasks for the following number of files: 16422
Going to emit summary information for each file: 00:01:57 <-- 1:17 min!
Going to emit links to files with no function: 00:01:58
Launching 32 threads for generating HTML files: 00:01:58
real 11m2.044s
user 134m48.124s
sys 7m53.388s
```
Reviewers: vsk, morehouse
Reviewed By: vsk
Subscribers: Dor1s, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D41206
llvm-svn: 321871
Currently the assembler would accept, e.g. `ldr r0, [s0, #12]` and similar.
This patch add checks that only general-purpose registers are used in address
operands, shifted registers, and shift amounts.
Differential revision: https://reviews.llvm.org/D39910
llvm-svn: 321866
The original commit broke the builders due to a think-o in an assertion:
AsynchronousSymbolQuery's constructor needs to check the callback member
variables, not the constructor arguments.
llvm-svn: 321853
This implements the DWARF 5 feature described at
http://www.dwarfstd.org/ShowIssue.php?issue=141215.1
This allows a consumer to understand whether a composite data type is
trivially copyable and thus should be passed by value instead of by
reference. The canonical example is being able to distinguish the
following two types:
// S is not trivially copyable because of the explicit destructor.
struct S {
~S() {}
};
// T is a POD type.
struct T {
~T() = default;
};
This patch adds two new (DI)flags to LLVM metadata: TypePassByValue
and TypePassByReference.
<rdar://problem/36034922>
Differential Revision: https://reviews.llvm.org/D41743
llvm-svn: 321844
SymbolSource.
These new APIs are a first stab at tackling some current shortcomings of ORC,
especially in performance and threading support.
VSO (Virtual Shared Object) is a symbol table representing the symbol
definitions of a set of modules that behave as if they had been statically
linked together into a shared object or dylib. Symbol definitions, either
pre-defined addresses or lazy definitions, can be added and queries for symbol
addresses made. The table applies the same linkage strength rules that static
linkers do when constructing a dylib or shared object: duplicate definitions
result in errors, strong definitions override weak or common ones. This class
should improve symbol lookup speed by providing centralized symbol tables (as
compared to the findSymbol implementation in the in-tree ORC layers, which
maintain one symbol table per object file / module added).
AsynchronousSymbolQuery is a query for the addresses of a set of symbols.
Query results are returned via a callback once they become available. Querying
for a set of symbols, rather than one symbol at a time (as the current lookup
scheme does) the JIT has the opportunity to make better use of available
resources (e.g. by spawning multiple jobs to materialize the requested symbols
if possible). Returning results via a callback makes queries asynchronous, so
queries from multiple threads of JIT'd code can proceed simultaneously.
SymbolSource represents a source of symbol definitions. It is used when
adding lazy symbol definitions to a VSO. Symbol definitions can be materialized
when needed or discarded if a stronger definition is found. Materializing on
demand via SymbolSources should (eventually) allow us to remove the lazy
materializers from JITSymbol, which will in turn allow the removal of many
current error checks and reduce the number of RPC round-trips involved in
materializing remote symbols. Adding a discard function allows sources to
discard symbol definitions (or mark them as available_externally), reducing the
amount of redundant code generated by the JIT for ODR symbols.
llvm-svn: 321838