Summary:
Previously, ParseCommandLineOptions returns false and ignores error messages
when IgnoreErrors. It would be useful to also return error messages if users
decide to check parsing result instead of having the program exit on error.
Reviewers: chandlerc, mehdi_amini, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30893
llvm-svn: 297810
Enable the selection of the 64-bit signed multiply accumulate
instructions which operate on 16-bit operands. These are enabled for
ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb
architectures.
Differential Revision: https://reviews.llvm.org/D30044
llvm-svn: 297809
Summary: D25742 improved the precision of debug locations for PGO by removing debug locations from common tail when tail-merging. However, if identical insturctions that are merged into a common tail have the same debug locations, there's no need to remove them. This patch creates a merged debug location of identical instructions across SameTails and assign it to the instruction in the common tail, so that the debug locations are maintained if they are same across identical instructions.
Reviewers: aprantl, probinson, MatzeB, rob.lougher
Reviewed By: aprantl
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D30226
llvm-svn: 297805
On MachO platforms that use subsections-via-symbols dead code stripping will
drop prefix data. Unfortunately there is no great way to convey the relationship
between a function and its prefix data to the linker. We are forced to use a bit
of a hack: we give the prefix data it’s own symbol, and mark the actual function
entry an .alt_entry.
Patch by Moritz Angermann!
Differential Revision: https://reviews.llvm.org/D30770
llvm-svn: 297804
Summary:
<1 x Ty> is not a legal vector type in LLT, we shouldn’t build G_MERGE_VALUES
instruction for them.
Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab, javed.absar
Reviewed By: qcolombet
Subscribers: dberris, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D30948
llvm-svn: 297792
Summary:
Adds a new kind of MachineOperand: MO_Placeholder.
This operand must not appear in the MIR and only exists as a way of
creating an 'uninitialized' operand until a matcher function overwrites it.
Depends on D30046, D29712
Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet
Reviewed By: qcolombet
Subscribers: dberris, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D30089
llvm-svn: 297782
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.
ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.
At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom.
Differential Revision: https://reviews.llvm.org/D29639
llvm-svn: 297780
Previously we were using the encoded LEB hex values
for the value types. This change uses the decoded
negative value and the LEB encoder to write them out.
Differential Revision: https://reviews.llvm.org/D30847
Patch by Sam Clegg
llvm-svn: 297777
On Solaris ld (and some other tools that use the underlying utility
libraries, such as elfdump) chokes on an archive library that has no
symbol table. The Solaris tools always create one, even if it's empty.
That bug has been fixed in the latest development line, and can
probably be backported to a supported release, but it would be nice if
LLVM's archiver could emit the empty symbol table, too.
Patch by Danek Duvall!
llvm-svn: 297773
Make MCSectionELF::AssociatedSection be a link to a symbol, because
that's how it works in the assembly, and use it in the asm printer.
llvm-svn: 297769
Summary:
In SamplePGO, if the profile is collected from non-LTO binary, and used to drive ThinLTO, the indirect call promotion may fail because ThinLTO adjusts local function names to avoid conflicts. There are two places of where the mismatch can happen:
1. thin-link prepends SourceFileName to front of FuncName to build the GUID (GlobalValue::getGlobalIdentifier). Unlike instrumentation FDO, SamplePGO does not use the PGOFuncName scheme and therefore the indirect call target profile data contains a hash of the OriginalName.
2. backend compiler promotes some local functions to global and appends .llvm.{$ModuleHash} to the end of the FuncName to derive PromotedFunctionName
This patch tries at the best effort to find the GUID from the original local function name (in profile), and use that in ICP promotion, and in SamplePGO matching that happens in the backend after importing/inlining:
1. in thin-link, it builds the map from OriginalName to GUID so that when thin-link reads in indirect call target profile (represented by OriginalName), it knows which GUID to import.
2. in backend compiler, if sample profile reader cannot find a profile match for PromotedFunctionName, it will try to find if there is a match for OriginalFunctionName.
3. in backend compiler, we build symbol table entry for OriginalFunctionName and pointer to the same symbol of PromotedFunctionName, so that ICP can find the correct target to promote.
Reviewers: mehdi_amini, tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30754
llvm-svn: 297757
This patch refactors the PHisToFix loop as follows:
- The loop itself now resides in its own method.
- The new method iterates on scalar-loop's header; the PHIsToFix map formerly
propagated as an output parameter and filled during phi widening is removed.
- The code handling reductions is moved into its own method, similar to the
existing fixFirstOrderRecurrence().
Differential Revision: https://reviews.llvm.org/D30755
llvm-svn: 297740
This instruction was missing from the list of opcodes that we check, so we were
hitting an llvm_unreachable in ARMMCCodeEmitter.cpp for the ARM MOVT
instruction, rather than the diagnostic that is emitted for the other MOVW/MOVT
instructions.
Differential revision: https://reviews.llvm.org/D30936
llvm-svn: 297739
Refactoring Cost Model's selectVectorizationFactor() so that it handles only the
selection of the best VF from a pre-computed range of candidate VF's, extracting
early-exit criteria and the computation of a MaxVF upper-bound to other methods,
all driven by a newly introduced LoopVectorizationPlanner.
Differential Revision: https://reviews.llvm.org/D30653
llvm-svn: 297737
If it is possible for the RHS of a shift operation to be greater than or equal
to the bit-width, then the result might be undef, and we can't report any known
bits.
In some cases, this was allowing a transformation in instcombine which widened
an undef value from i1 to i32, increasing the range of values that a function
could return.
Differential revision: https://reviews.llvm.org/D30781
llvm-svn: 297724
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.
Differential Revision: https://reviews.llvm.org/D30708
llvm-svn: 297716
Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller.
Some CCs use additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list.
The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee.
The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee.
Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span).
The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments.
The patch will also assist to implement future no_caller_saved_regsiters attribute intended for interrupt handler CC.
Differential Revision: https://reviews.llvm.org/D28566
llvm-svn: 297715
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.
Tests updates:
Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll
The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.
Review: Hal Finkel
https://reviews.llvm.org/D29540
llvm-svn: 297705
When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.
llvm-svn: 297698
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
This reverts commit r242302. External type refs of this form were
never used by any LLVM frontend so this is effectively dead code.
(They were introduced to support clang module debug info, but in the
end we came up with a better design that doesn't use this feature at
all.)
rdar://problem/25897929
Differential Revision: https://reviews.llvm.org/D30917
llvm-svn: 297684
Previously, it created a temporary directory and then failed when
FileOutputBuffer tried to rename that file to the destination file
(which is actually a directory name).
Differential Revision: https://reviews.llvm.org/D30912
llvm-svn: 297679
The previous algorithm for RegUsageInfoCollector had pretty bad
performance on architectures with a lot of registers that alias
a lot one another, because we potentially iterate for every register
over all the aliasing registers. This costs even more if the function
is small and doesn't define a lot of registers.
This patch changes the algorithm to one that while iterating over
all the registers it will iterate over the aliasing registers only
if the register itself is defined.
This should be faster based on the assumption that only a subset
of the whole LLVM registers set is actually defined in the function.
Differential Revision: https://reviews.llvm.org/D30880
llvm-svn: 297673
rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value.
This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source.
Found during fuzz testing.
Differential Revision: https://reviews.llvm.org/D30833
llvm-svn: 297667
I am leaving the code in clang which filters mxcsr from the clobber list because that is still technically correct and will be useful again when the MXCSR register is reintroduced.
llvm-svn: 297664
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.
Tail call support allows the outliner to take returns and terminators into
consideration while finding candidates to outline. It also allows the outliner
to save more instructions. For example, in the X86-64 outliner, a tail called
outlined function saves one instruction since no return has to be inserted.
llvm-svn: 297653
For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2.
llvm-svn: 297652
We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too.
Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today.
llvm-svn: 297651
Previously we could round-trip type records from PDB -> Yaml ->
PDB, but for symbols we could only go from PDB -> Yaml. This
completes the round-tripping for symbols as well.
llvm-svn: 297625
If raw_fd_ostream is constructed with the path of "-", it claims
ownership of the stdout file descriptor. This means that it closes
stdout when it is destroyed. If there are multiple users of
raw_fd_ostream wrapped around stdout, then a crash can occur because
of operations on a closed stream.
An example of this would be running something like "clang -S -o - -MD
-MF - test.cpp". Alternatively, using outs() (which creates a local
version of raw_fd_stream to stdout) anywhere combined with such a
stream usage would cause the crash.
The fix duplicates the stdout file descriptor when used within
raw_fd_ostream, so that only that particular descriptor is closed when
the stream is destroyed.
Patch by James Henderson!
llvm-svn: 297624
We used to hit an unreachable in getRegBankFromRegClass when dealing with the
stack pointer. This commit adds support for the GPRsp reg class.
llvm-svn: 297621
This commit is a follow-up on r297580. It fixes the FIXME added temporarily
by that commit to keep the removal of Unroller's specialized version of
scalarizeInstruction() an NFC. See https://reviews.llvm.org/D30715 for details.
llvm-svn: 297610
Loop over the ARM decode tables; this is a clean-up to reduce some code
duplication.
Differential Revision: https://reviews.llvm.org/D30814
llvm-svn: 297608
The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly.
llvm-svn: 297591
I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently.
This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.
This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it
is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that
existing behavior in this patch.
Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files
where this transform still fires.
The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register
stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate
issue.
Differential Revision: https://reviews.llvm.org/D30611
llvm-svn: 297586
Summary:
A53 scheduler causes an assertion failure on all CRC instructions:
include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand
&llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i <
getNumOperands() && "getOperand() out of range!"' failed.
The case statements corresponding to CRC instructions are incorrect and should
be removed.
Also adding a testcase while on this.
Reviewers: t.p.northover, javed.absar, apazos, rengolin
Reviewed By: rengolin
Subscribers: evandro, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30274
llvm-svn: 297582
Unroller's specialized scalarizeInstruction() is mostly duplicating Vectorizer's
variant. OTOH Vectorizer's scalarizeInstruction() already supports the special
case of VF==1 except for avoiding mask-bit extraction in that case. This patch
removes Unroller's specialized version in favor of a unified method.
The only functional difference between the two variants seems to be setting
memcheck metadata for loads and stores only in Vectorizer's variant, which is a
bug in Unroller. To keep this patch an NFC the unified method doesn't set
memcheck metadata for VF==1.
Differential Revision: https://reviews.llvm.org/D30715
llvm-svn: 297580
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.
I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again.
llvm-svn: 297574
Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte.
This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte.
Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate?
Differential Revision: https://reviews.llvm.org/D29841
llvm-svn: 297568
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.
This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.
Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")
Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 297556
Summary:
Ths "cases" support was not quite finished, is unused, and is really just debug counters.
(well, almost, debug counters are slightly more powerful, in that they can skip things at the start, too).
Note, opt-bisect itself could also be implemented as a wrapper around
debug counters, but not sure it's worth it ATM.
I'll shove it on a todo list if we think it is.
Reviewers: MatzeB, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30856
llvm-svn: 297542
Summary:
This change solves the same problem as D30726, except that this only
throws out the bathwater.
AST was not correctly tracking and deleting UnknownInstructions via
handles. The existing code only tracks "pointers" in its
`ASTCallbackVH`, so an UnknownInstruction (that isn't also def'ing a
pointer used by another memory instruction) never gets a
`ASTCallbackVH`.
There are two other ways to solve this problem:
- Use the `PointerRec` scheme for both known and unknown instructions.
- Use a `CallbackVH` that erases the offending Instruction from the
UnknownInstruction list.
Both of the above changes seemed to be significantly (and unnecessarily
IMO) more complex than this.
Reviewers: chandlerc, dberlin, hfinkel, reames
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D30849
llvm-svn: 297539
This method inverts the Reason field of a scheduling candidate.
It does right comparison between RegCritical and RegExcess, but
everything else is broken. In fact it can prefer less strong reason
such as Weak over RegCritical because Weak > -RegCritical.
The CandReason enum is properly sorted, so just remove artificial
ranking.
Differential Revision: https://reviews.llvm.org/D30557
llvm-svn: 297536
We don't need to check whether the fallback path is enabled to return
false. Just do that all the time on error cases, the caller knows (or
at least should know!) how to handle the failing case.
llvm-svn: 297535
The problem can occur in presence of subregs. If we are swapping two
instructions defining different subregs of the same register we will
get a new liveout from a block. We need to preserve value number for
block's liveout for successor block's livein to match.
Differential Revision: https://reviews.llvm.org/D30558
llvm-svn: 297534
This function will find the closest ref node aliased to Reg that is
in an instruction preceding Inst. This could be used to identify the
hypothetical reaching def of Reg, if Reg was a member of Inst.
llvm-svn: 297524
This only requires a 64-bit memory source, not the whole 128-bits. But the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl
llvm-svn: 297523
Summary: No test case as none of the in-tree targets with GlobalISel support has this condition.
Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab
Reviewed By: qcolombet
Subscribers: dberris, rovka, kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D30786
llvm-svn: 297512
In order to make it easier to parse information about the performance of
MacroFusion, this patch adds the function and the instruction names to the
debug output of this pass.
llvm-svn: 297504
Summary: There is no need to check profile count as only CallInst will have metadata attached.
Reviewers: eraman
Reviewed By: eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30799
llvm-svn: 297500
This reverts r293386, r294027, r294029 and r296411.
Turns out the SLP tree isn't actually a "tree" and we don't handle
accessing the same packet of loads in several different orders well,
causing miscompiles.
Revert until we can fix this properly.
llvm-svn: 297493
Summary:
We don’t actually use LegalizerInfo in Legalizer pass, it’s just passed
as an argument.
In order to check if an instruction is legal or not, we need to get LegalizerInfo
by calling `MI.getParent()->getParent()->getSubtarget().getLegalizerInfo()`.
Instead, make LegalizerInfo accessible in LegalizerHelper.
Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, kristof.beyls
Reviewed By: qcolombet
Subscribers: dberris, llvm-commits, rovka
Differential Revision: https://reviews.llvm.org/D30838
llvm-svn: 297491
In openFileForRead, we would not previously return an error
if real_path resolution failed. After a recent patch, we
started propagating this error up. This caused a failure
in clang when trying to call openFileForRead("nul"). This
patch restores the previous behavior of not propagating this
error up.
llvm-svn: 297488
LLVM already has real_path like functionality, but it is
cumbersome to use and involves clean up after (e.g. you have
to call openFileForRead, then close the resulting FD).
Furthermore, on Windows it doesn't work for directories since
opening a directory and opening a file require slightly
different flags.
So I add a simple function `real_path` which works for all
paths on all platforms and has a simple to use interface.
In doing so, I add the ability to opt in to resolving tilde
expressions (e.g. ~/foo), which are normally handled by
the shell.
Differential Revision: https://reviews.llvm.org/D30668
llvm-svn: 297483
Summary:
Depends on D30379
This improves the state of things for the sub class of operation.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30436
llvm-svn: 297482
Summary: As per title. This is extracted from D29872 and I threw SADDO in.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30379
llvm-svn: 297479
We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64).
This is another of the compile time issues identified in PR32037 (see also D30265).
This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target.
Differential Revision: https://reviews.llvm.org/D30780
llvm-svn: 297458
This patches teaches the MIPS backend to accept more values for constant
splats. Previously, only 10 bit signed immediates or values that could be
loaded using an ldi.[bhwd] instruction would be acceptted. This patch relaxes
that constraint so that any constant value that be splatted is accepted.
As a result, the constant pool is used less for vector operations, and the
suite of bit manipulation instructions b(clr|set|neg)i can now be used with
the full range of their immediate operand.
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D30640
llvm-svn: 297457
Summary:
This is a continuation of D28861. Add an SMLoc to MCUnaryExpr such that
a better diagnostic can be given in case of an error in later stages of
assembling.
Reviewers: rengolin, grosbach, javed.absar, olista01
Reviewed By: olista01
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30581
llvm-svn: 297454
ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE].
Summary:
This allows for some simplification because the combines
are no longer limited to just one go at the node before
it gets legalized into an ARM target-specific one.
Reviewers: jmolloy, rogfer01
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30401
llvm-svn: 297453
It was introduced in:
r296945
WholeProgramDevirt: Implement exporting for single-impl devirtualization.
---------------------
r296939
WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users.
---------------------
Microsoft Visual Studio Community 2015
Version 14.0.23107.0 D14REL
Does not compile that code without additional brackets, showing multiple error like below:
WholeProgramDevirt.cpp(1216): error C2958: the left bracket '[' found at 'c:\access_softek\llvm\lib\transforms\ipo\wholeprogramdevirt.cpp(1216)' was not matched correctly
WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ']' before '}'
WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ';' before '}'
WholeProgramDevirt.cpp(1216): error C2059: syntax error: ']'
llvm-svn: 297451
All MIPS .debug_* sections should be marked with ELF type SHT_MIPS_DWARF
accordingly the specification [1]. Also the same section type is assigned
to these sections by GNU tools.
[1] ftp.software.ibm.com/software/os390/czos/dwarf/mips_extensions.pdf
Differential Revision: https://reviews.llvm.org/D29789
llvm-svn: 297447
GAS supports specification of section header's type using a numeric
value [1]. This patch brings the same functionality to LLVM. That allows
to setup some target-specific section types belong to the SHT_LOPROC -
SHT_HIPROC range. If we attempt to print unknown section type, MCSectionELF
class shows an error message. It's better than print sole '@' sign
without any section type name.
In case of MIPS, example of such section's type is SHT_MIPS_DWARF.
Without the patch we will have to implement some workarounds
in probably not-MIPS-specific part of code base to convert SHT_MIPS_DWARF
to the @progbits while printing assembly and to assign SHT_MIPS_DWARF for
@progbits sections named .debug_* if we encounter such section in
an input assembly.
[1] https://sourceware.org/binutils/docs/as/Section.html
Differential Revision: https://reviews.llvm.org/D29719
llvm-svn: 297446
same as already done for ARM and Thumb2.
Reviewers: jmolloy, rogfer01, efriedma
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30400
llvm-svn: 297443
Summary:
These are the functions used to determine when values of loads can be
extracted from stores, etc, and to perform the necessary insertions to
do this. There are no changes to the functions themselves except
reformatting, and one case where memdep was informed of a removed load
(which was pushed into the caller).
Reviewers: davide
Subscribers: mgorny, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30478
llvm-svn: 297438
Summary: We should not use that to check basic block hotness as optimization may mess it up.
Reviewers: eraman
Reviewed By: eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30800
llvm-svn: 297437
Summary:
Similar to SmallPtrSet, this makes find and count work with both const
referneces and const pointers.
Reviewers: dblaikie
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30713
llvm-svn: 297424
Summary: This essentially does the same transform as for ADC.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30417
llvm-svn: 297416
- Fix the insertion point, which occasionally could have been incorrect.
- Avoid creating multiple bitsplits with the same operands, if an old one
could be reused.
llvm-svn: 297414
The good reason to do this is that static allocas are pretty simple to handle
(especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline
should give some kind of performance benefit.
The bad reason is that the debug pipeline is an unholy mess of implicit
contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a
load or not involves the services of at least 3 soothsayers and the sacrifice
of at least one chicken. And it still gets it wrong if the variable is at SP
directly.
llvm-svn: 297410
Summary: This essentially does the same transform as for SUBC.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30437
llvm-svn: 297404
As discussed in the review thread for rL297026, this is actually 2 changes that
would independently fix all of the test cases in the patch:
1. Return undef in FoldConstantArithmetic for div/rem by 0.
2. Move basic undef simplifications for div/rem (simplifyDivRem()) before
foldBinopIntoSelect() as a matter of efficiency.
I will handle the case of vectors with any zero element as a follow-up. That change
is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic().
I'm deleting the test for PR30693 because it does not test for the actual bug any more
(dangers of using bugpoint).
Differential Revision:
https://reviews.llvm.org/D30741
llvm-svn: 297384
The fix introduces segfaults and clobbers the value to be stored when
the atomic sequence loops.
Revert "[Target/MIPS] Kill dead code, no functional change intended."
This reverts commit r296153.
Revert "Recommit "[mips] Fix atomic compare and swap at O0.""
This reverts commit r296134.
llvm-svn: 297380
Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added a MC test for
vcmp that was actually missing.
Differential Revision: https://reviews.llvm.org/D30745
llvm-svn: 297376
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.
This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.
Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.
Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;
void g(const char *);
template <bool K, int N> void f(bool *B, bool *E) {
if (K)
g(str<N>);
if (B == E)
return;
if (*B)
f<true, N + 1>(B + 1, E);
else
f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }
extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```
When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.
What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.
The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.
We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.
I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.
The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
we consider for inlining.
llvm-svn: 297374
Fix a machine verifier issue where a instruction was using a invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
mark as killed earlier.
This partially resolves PR/27458
Thanks to Quentin Colombet for reporting the issue!
llvm-svn: 297372
Summary:
In a .symver assembler directive like:
.symver name, name2@@nodename
"name2@@nodename" should get the same symbol binding as "name".
While the ELF object writer is updating the symbol binding for .symver
aliases before emitting the object file, not doing so when the module
inline assembly is handled by the RecordStreamer is causing the wrong
behavior in *LTO mode.
E.g. when "name" is global, "name2@@nodename" must also be marked as
global. Otherwise, the symbol is skipped when iterating over the LTO
InputFile symbols (InputFile::Symbol::shouldSkip). So, for example,
when performing any *LTO via the gold-plugin, the versioned symbol
definition is not recorded by the plugin and passed back to the
linker. If the object was in an archive, and there were no other symbols
needed from that object, the object would not be included in the final
link and references to the versioned symbol are undefined.
The llvm-lto2 tests added will give an error about an unused symbol
resolution without the fix.
Reviewers: rafael, pcc
Reviewed By: pcc
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D30485
llvm-svn: 297332
!type metadata can not be dropped. An alternative to this is adding
!type metadata from the replaced globals to the replacement, but that
may weaken type tests and make them slower at the same time.
The merged global gets !dbg metadata from replaced globals, and can
end up with multiple debug locations.
llvm-svn: 297327
This commit changes the BumpPtrAllocator for suffix tree nodes to a SpecificBumpPtrAllocator.
Before, node construction was leaking memory because of the DenseMap in SuffixTreeNodes.
Changing this to a SpecificBumpPtrAllocator allows this memory to properly be released.
llvm-svn: 297319
When the array indexes are all determined by GVN to be constants,
a call is made to constant-folding to optimize/simplify the address
computation.
The constant-folding, however, makes a mistake in that it sometimes reads
back stale Idxs instead of NewIdxs, that it re-computed in previous iteration.
This leads to incorrect addresses coming out of constant-folding to GEP.
A test case is included. The error is only triggered when indexes have particular
patterns that the stale/new index updates interplay matters.
Reviewers: Daniel Berlin
Differential Revision: https://reviews.llvm.org/D30642
llvm-svn: 297317
We already have a function create_directories() which can create
an entire tree, and remove() which can remove an empty directory,
but we do not have remove_directories() which can remove an entire
tree. This patch adds such a function.
Because removing a directory tree can have dangerous consequences
when the tree contains a directory symlink, the patch here updates
the existing directory_iterator construct to optionally not follow
symlinks (previously it would always follow symlinks). The delete
algorithm uses this flag so that for symlinks, only the links are
removed, and not the targets.
On Windows this is implemented with SHFileOperation, which also
does not recurse into symbolic links or junctions.
Differential Revision: https://reviews.llvm.org/D30676
llvm-svn: 297314
Analyzing larger trees is extremely difficult with the current debug output so
this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data
structure. We can now display the SLP trees with Graphviz as in
https://reviews.llvm.org/F3132765.
I decorated the graph where a value needs to be gathered for one reason or
another. These are the red nodes.
There are other improvement I am planning to make as I work through my case
here. For example, I would also like to mark nodes that need to be extracted.
Differential Revision: https://reviews.llvm.org/D30731
llvm-svn: 297303
Because IRBuilder performs constant-folding, it's not guaranteed that an
instruction in the original loop map to an instruction in the vector loop. It
could map to a constant vector instead. The handling of first-order recurrences
was incorrectly making this assumption when setting the IRBuilder's insert
point.
llvm-svn: 297302
This was originall reverted due to some test failures in
ModuleCache and TestCompDirSymlink. These issues have all
been resolved and the code now passes all tests.
Differential Revision: https://reviews.llvm.org/D30698
llvm-svn: 297300
When extracting a bitfield from the high register in a register pair,
the final offset should be relative to the high register (for 32-bit
extracts).
llvm-svn: 297288
Summary: By using reg_nodbg_empty() to determine if a function can be
treated as a leaf function or not, we miss the case when the register
pair L0_L1 is used but not L0 by itself. This has the effect that
use_all_i32_regs(), a test in reserved-regs.ll which tries to use all
registers, gets treated as a leaf function.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: davide, RKSimon, sepavloff, llvm-commits
Differential Revision: https://reviews.llvm.org/D27089
llvm-svn: 297285
Summary: Use AA when scanning to find an available load value.
Reviewers: rengolin, mcrosier, hfinkel, trentxintong, dberlin
Reviewed By: rengolin, dberlin
Subscribers: aemerson, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D30352
llvm-svn: 297284
Recommitting patch which was previously reverted in r297159. These
changes should address the casting issues.
The original patch enables dbg.value intrinsics to be attached to
newly inserted PHI nodes.
Differential Review: https://reviews.llvm.org/D30701
llvm-svn: 297269
After inspection, it's an UB in our code base. Someone cast a var-arg
function pointer to a non-var-arg one. :/
Re-commit r296771 to continue testing on the patch.
Sorry for the trouble!
llvm-svn: 297256
A block with an UnreachableInst does not transfer execution to a successor.
The problem was exposed by GVN-hoist. This patch fixes bug 32153.
Patch by Aditya Kumar.
Differential Revision: https://reviews.llvm.org/D30667
llvm-svn: 297254
This is repetition of isImage() function in NVPTXUtilities.cpp.
Patch by Briana Grace!
Differential Revision: https://reviews.llvm.org/D30706
llvm-svn: 297252
If there is only one successor, and that successor only
has one predecessor the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.
llvm-svn: 297251
This helps in cases involving bitfields where an AND is exposed by
legalization.
Differential Revision: https://reviews.llvm.org/D30472
llvm-svn: 297249
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h.
Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30046
llvm-svn: 297241
Itanium ABI may have an address point one byte after the end of a
vtable. When such vtable global is split, the !type metadata needs to
follow the right vtable.
Differential Revision: https://reviews.llvm.org/D30716
llvm-svn: 297236
This was committed at r297155 and reverted at r297166 because of an
over-reaching clang test. That should be fixed with r297189.
This is one part of solving a recent bug report:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html
This keeps with our general approach: changing arbitrary shuffles is off-limts,
but changing splat is ok. The transform is very similar to the existing
shrinkBitwiseLogic() canonicalization.
Differential Revision: https://reviews.llvm.org/D30123
llvm-svn: 297232
In my refactoring I introduced a bug where we were using the reference size instead of the offset size for DW_FORM_strp and similar forms.
This patch resolves the error and adds a test case testing all the DWARF forms for DWARF2 AddrSize 8. There is similar coverage already in the DWARFDebugInfoTest sources that covers the parser. Once I migrate the DWARFGenerator APIs to be built on the YAML tools they will be fully covered under the same tests.
llvm-svn: 297230
We were calculating incorrect extract/insert offsets by trying to be too
tricksy with min/max. It's clearer to just split the logic up into "register
starts before this segment" vs "after".
llvm-svn: 297226
Summary:
The purpose of coro.end intrinsic is to allow frontends to mark the cleanup and
other code that is only relevant during the initial invocation of the coroutine
and should not be present in resume and destroy parts.
In landing pads coro.end is replaced with an appropriate instruction to unwind to
caller. The handling of coro.end differs depending on whether the target is
using landingpad or WinEH exception model.
For landingpad based exception model, it is expected that frontend uses the
`coro.end`_ intrinsic as follows:
```
ehcleanup:
%InResumePart = call i1 @llvm.coro.end(i8* null, i1 true)
br i1 %InResumePart, label %eh.resume, label %cleanup.cont
cleanup.cont:
; rest of the cleanup
eh.resume:
%exn = load i8*, i8** %exn.slot, align 8
%sel = load i32, i32* %ehselector.slot, align 4
%lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
%lpad.val29 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
resume { i8*, i32 } %lpad.val29
```
The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions,
thus leading to immediate unwind to the caller, whereas in start function it
is replaced with ``False``, thus allowing to proceed to the rest of the cleanup
code that is only needed during initial invocation of the coroutine.
For Windows Exception handling model, a frontend should attach a funclet bundle
referring to an enclosing cleanuppad as follows:
```
ehcleanup:
%tok = cleanuppad within none []
%unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ]
cleanupret from %tok unwind label %RestOfTheCleanup
```
The `CoroSplit` pass, if the funclet bundle is present, will insert
``cleanupret from %tok unwind to caller`` before
the `coro.end`_ intrinsic and will remove the rest of the block.
Reviewers: majnemer
Reviewed By: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25543
llvm-svn: 297223
Some intrinsics take metadata parameters. These all need custom
handling of some form, and cannot possibly be lowered generically to
G_INTRINSIC calls with vreg operands.
Reject them, instead of hitting an assert later in getOrCreateVReg.
llvm-svn: 297209
When we translate a no-op (same type) bitcast, we try to be clever and
only emit a COPY if we already assigned a vreg to the defined value.
However, when we didn't, we tried to assign to a reference into the
ValToVReg DenseMap, even though the RHS of the assignment
(getOrCreateVReg) could potentially grow that DenseMap, invalidating the
reference.
Avoid that by getting the source vreg first.
I audited the rest of the translator; this is the only tricky case.
The test is quite unwieldy, as the problem is caused by the DenseMap
growing, which happens after the 47th mapped value.
llvm-svn: 297208
For vector operands, the `select` instruction supports both vector and
non-vector conditions. The MIR builder had an overly restrictive
assertion, that only accepted vector conditions for vector selects
(in effect implementing ISD::VSELECT).
Make it possible to express the full range of G_SELECTs.
llvm-svn: 297207
When computing the mapping for non-generic instructions, we skipped
%noreg operands, because we can't always reason about their banks.
Also skip them when applying the mapping. Otherwise, we could end
up with mappings that we can't apply.
While there, duplicate an assert to distinguish between the two
error conditions.
llvm-svn: 297201
When a dbg_value has a constant operand that isn't representable in MI,
there isn't much we can do. Use %noreg (0) for those situations.
This matches the SelectionDAG behavior.
llvm-svn: 297200
We cannot leave the identity copies 'select true, arg, undef' that this pass
inserts for arguments to simplify handling of values on swifterror arguments.
swifterror arguments have restrictions on their uses.
rdar://30839288
llvm-svn: 297197
Move the __try/__except block outside of the set_thread_name function to avoid a conflict with object unwinding due to the use of the llvm::Storage.
Differential Revision: https://reviews.llvm.org/D30707
llvm-svn: 297192
Broadcom Vulcan is now Cavium ThunderX2T99.
LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113
Minor fixes for the alignments of loops and functions for
ThunderX T81/T83/T88 (better performance).
Patch was tested with SpecCPU2006.
Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D30510
llvm-svn: 297190
More module problems. This time it only showed up in the stage 2 compile of
clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile.
Somehow, this change causes the build to need Attributes.gen before it's been
generated.
llvm-svn: 297188
When expanding the set of uniform instructions beyond the seed instructions
(e.g., consecutive pointers), we mark a new instruction uniform if all its
loop-varying users are uniform. We should also allow users that are consecutive
or interleaved memory accesses. This fixes cases where we have an instruction
that is used as the pointer operand of a consecutive access but also used by a
non-memory instruction that later becomes uniform as part of the expansion.
llvm-svn: 297179
Summary:
Loop alignment can cause a significant change of
the perfromance for short loops.
To be able to evaluate the impact of loop alignment this change
introduces the new option x86-experimental-pref-loop-alignment.
The alignment will be 2^Value bytes, the default value is 4.
Patch by Serguei Katkov!
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D30391
llvm-svn: 297178
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30046
llvm-svn: 297177
This reverts commit r296488.
As noted by David Blaikie on llvm-commits, I overlooked the case of a
debug function being inlined into a nodebug function being inlined
into a debug function.
llvm-svn: 297163
The check for LSL #0 in an IT block was checking if operand 4 was zero, but
operand 4 is the condition code operand so it was actually checking for LSLEQ.
Fix this by checking operand 3, which really is the immediate operand, and add
some tests.
Differential Revision: https://reviews.llvm.org/D30692
llvm-svn: 297142
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.
The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.
Differential Revision: https://reviews.llvm.org/D30645
llvm-svn: 297137
Summary: Previously, it had always been materialized as a push/pop sequence.
Reviewers: labrinea, jroelofs
Reviewed By: jroelofs
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30648
llvm-svn: 297134
X86EvexToVex machine instruction pass compresses EVEX encoded instructions by replacing them with their identical VEX encoded instructions when possible.
It uses manually supported 2 large tables that map the EVEX instructions to their VEX ideticals.
This TableGen backend replaces the tables by automatically generating them.
Differential Revision: https://reviews.llvm.org/D30451
llvm-svn: 297127
evex2vex pass defines 2 tables which maps EVEX instructions to their VEX identical when possible. Adding all missing entries.
Differential Revision: https://reviews.llvm.org/D30501
llvm-svn: 297126
This reverts commit r296771.
We found some wide spread test failures internally. I'm working on a
testcase. Politely revert the patch in the mean time. :)
llvm-svn: 297124