The register scavenger maintains a DistanceMap that maps MI pointers to their
distance from the top of the current MBB. The DistanceMap is built
incrementally in forward() and in bulk in findFirstUse(). It is used by
scavengeRegister() to determine which candidate register has the longest
unused interval.
Unfortunately the DistanceMap contents can become outdated. The first time
scavengeRegister() is called, the DistanceMap is filled to cover the MBB. If
instructions are then inserted in the MBB (as they always are following
scavengeRegister()), the recorded distances are too short. This causes bad
behaviour in the included test case where a register use /after/ the current
position is ignored because findFirstUse() thinks it is /before/ the current
position. A "using an undefined register" assertion follows promptly.
The fix is to build a fresh DistanceMap at the top of scavengeRegister(), and
discard it after use. This means that DistanceMap is no longer needed as a
RegScavenger member variable, and forward() doesn't need to update it.
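
A minimal sketch of the idea, not the actual RegScavenger code: the distance
map is a local that is rebuilt on every query, so instructions inserted since
the last call cannot leave stale distances behind. The Instr struct and the
pickLongestUnused() helper are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a machine instruction: only the registers it reads.
struct Instr {
  std::vector<int> uses;
};

// Return the candidate register whose first use after `pos` is farthest away.
// A register that is never used again counts as distance block.size().
int pickLongestUnused(const std::vector<Instr> &block, std::size_t pos,
                      const std::vector<int> &candidates) {
  // Fresh map on every call: instruction pointer -> distance from block top.
  std::map<const Instr *, std::size_t> distance;
  for (std::size_t i = 0; i < block.size(); ++i)
    distance[&block[i]] = i;

  int best = candidates.empty() ? -1 : candidates.front();
  std::size_t bestDist = 0;
  for (int reg : candidates) {
    std::size_t firstUse = block.size();
    for (std::size_t i = pos + 1; i < block.size(); ++i) {
      const Instr &mi = block[i];
      if (std::find(mi.uses.begin(), mi.uses.end(), reg) != mi.uses.end()) {
        firstUse = distance.at(&mi);
        break;
      }
    }
    if (firstUse > bestDist) {
      bestDist = firstUse;
      best = reg;
    }
  }
  return best;
}
```

Because the map is discarded after each call, inserting an instruction between
two queries simply changes the answer of the second query instead of
corrupting it.
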
The fix then discloses issue number two in the same test case: The candidate
search in scavengeRegister() finds a CSR that has been saved in the prologue,
but is currently unused. It would be both inefficient and wrong to spill such
a register in the emergency spill slot. In the present case, the emergency
slot restore is placed immediately before the normal epilogue restore, leading
to a "Redefining a live register" assertion.
Fix number two: When scavengeRegister() stumbles upon an unused register that
is overwritten later in the MBB, return that register early. It is important
to verify that the register is defined later in the MBB, otherwise it might be
an unspilled CSR.
llvm-svn: 78650
the overloaded vector types allowed floating-point or integer vector elements.
Most of these operations actually depend on the element type, so bitcasting
was not an option.
If you include the vpadd intrinsics that I updated earlier, this gets rid
of 20 intrinsics.
llvm-svn: 78646
MERGE_VALUES nodes. Replacing the result values with the
operands in one MERGE_VALUES node may cause another
MERGE_VALUES node to be CSE'd with the first one, and bring
its uses along, so that the first one isn't dead, as this
code expects. Fix this by iterating until the node is
really dead. This fixes PR4699.
llvm-svn: 78619
instead of syntactically as a string. This means that it keeps track of the
segment, section, flags, etc directly and asmprints them in the right format.
This also includes parsing and validation support for llvm-mc and
"attribute(section)", so we should now start getting errors about invalid
section attributes from the compiler instead of the assembler on darwin.
Still todo:
1) Uniquing of darwin mcsections
2) Move all the Darwin stuff out to MCSectionMachO.[cpp|h]
3) there are a few FIXMEs, for example what is the syntax to get the
S_GB_ZEROFILL segment type?
llvm-svn: 78547
Blackfin supports and/or/xor on i32 but not on i16. Teach
DAGCombiner::SimplifyBinOpWithSameOpcodeHands to not produce illegal nodes
after legalize ops.
llvm-svn: 78497
Verify that early clobber registers and their aliases are not used.
All changes to RegsAvailable are now done as a transaction so the order of
operands makes no difference.
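
A minimal sketch of what "as a transaction" can look like, assuming a small
hypothetical register file and a simplified Operand struct (neither is the
real MachineOperand API): kills and defs are first collected into temporary
sets and only then applied to the availability set.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t NumRegs = 8; // hypothetical register file size
using RegSet = std::bitset<NumRegs>;

// Simplified operand: a register number plus def/kill flags.
struct Operand {
  unsigned reg;
  bool isDef;
  bool isKill;
};

// Collect every change first, then commit them in a fixed order, so the
// result does not depend on the order in which the operands are listed.
RegSet applyInstr(RegSet available, const std::vector<Operand> &ops) {
  RegSet killed, defined;
  for (const Operand &op : ops) {
    if (op.isDef)
      defined.set(op.reg);
    else if (op.isKill)
      killed.set(op.reg);
  }
  available |= killed;   // killed registers become available
  available &= ~defined; // defined registers are in use afterwards
  return available;
}
```

With this shape, an instruction that both kills and redefines the same
register leaves it unavailable whichever operand comes first.
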
The included test case is from PR4686. It has behaviour that was dependent on the order of operands.
llvm-svn: 78465
- This doesn't actually improve the algorithm (it's still linear), but the
generated (match) code is now fairly compact and table driven. Still need a
generic string matcher.
- The table still needs to be compressed; this is quite simple to do and should
shrink it to under 16k.
- This also simplifies and restructures the code to make the match classes more
explicit, in anticipation of resolving ambiguities.
llvm-svn: 78461
I can clean this up a bit more and do away with TheCondState and just use
the top element of TheCondStack if not empty. Also may tweak the code
around ParseConditionalAssemblyDirectives() to simplify the AsmParser code.
llvm-svn: 78423
- Still not very sane, but at least it's not 60k lines on X86. :)
- In terms of correctness, currently some things are hard wired for X86, and we
still don't properly resolve ambiguities (this is ignoring the instructions
we don't even match due to funny .td stuff or other corner cases).
The high level changes:
1. Represent tokens which are significant for matching explicitly as separate
operands. This uniformly handles not only the instruction mnemonic, but
also 'significant' syntax like the '*' in "call * ...".
2. Separate the matching of operands to an instruction from the construction of
the MCInst. In theory this can be done during matching, but since the number
of variations is small I think it makes sense to decompose the problems.
3. Improved a few of the mechanisms to at least successfully flatten / tokenize
the assembly strings for PowerPC and ARM.
4. The comment at the top of AsmMatcherEmitter.cpp explains the approach I'm
moving towards for handling ambiguous instructions. The high-bit is to infer
a partial ordering of the operand classes (and force the user to specify one
if we can't) and use that to resolve ambiguities.
llvm-svn: 78378
This patch takes pains to ensure all the PEI lowering code does the right thing when lowering frame indices, inserting code to manipulate stack pointers, etc. It also custom lowers dynamic stack alloc into pseudo instructions so we can insert the right instructions at scheduling time.
This fixes PR4659 and PR4682.
llvm-svn: 78361
by aggressive chain operand optimization. UpdateNodeOperands
does not modify the node in place if it would result in
a node identical to an existing node.
llvm-svn: 78297
and high-bits values in ways that weren't correct for integer
types wider than 64 bits. This fixes a miscompile in
PPMacroExpansion.cpp in clang on x86-64.
llvm-svn: 78295
Instead of awkwardly encoding calling-convention information with ISD::CALL,
ISD::FORMAL_ARGUMENTS, ISD::RET, and ISD::ARG_FLAGS nodes, TargetLowering
provides three virtual functions for targets to override:
LowerFormalArguments, LowerCall, and LowerRet, which replace the custom
lowering done on the special nodes. They provide the same information, but
in a more immediately usable format.
This also reworks much of the target-independent tail call logic. The
decision of whether or not to perform a tail call is now cleanly split
between target-independent portions, and the target dependent portion
in IsEligibleForTailCallOptimization.
This also synchronizes all in-tree targets, to help enable future
refactoring and feature work.
llvm-svn: 78142
When LowerExtract eliminates an EXTRACT_SUBREG with a kill flag, it moves the
kill flag to the place where the sub-register is killed. This can accidentally
overlap with the use of a sibling sub-register, and we have trouble.
In the test case we have this code:
Live Ins: %R0 %R1 %R2
%R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
%R2H<def> = LOAD16fi <fi#-1>, 0, Mem:LD(2,4) [FixedStack-1 + 0]
%R1L<def> = EXTRACT_SUBREG %R1<kill>, 1
%R0L<def> = EXTRACT_SUBREG %R0<kill>, 1
%R0H<def> = ADD16 %R2H<kill>, %R2L<kill>, %AZ<imp-def>, %AN<imp-def>, %AC0<imp-def>, %V<imp-def>, %VS<imp-def>
subreg: CONVERTING: %R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
subreg: eliminated!
subreg: killed here: %R0H<def> = ADD16 %R2H, %R2L, %R2<imp-use,kill>, %AZ<imp-def>, %AN<imp-def>, %AC0<imp-def>, %V<imp-def>, %VS<imp-def>
The kill flag on %R2 is moved to the last instruction, and the live range overlaps with the definition of %R2H:
*** Bad machine code: Redefining a live physical register ***
- function: f
- basic block: 0x18358c0 (#0)
- instruction: %R2H<def> = LOAD16fi <fi#-1>, 0, Mem:LD(2,4) [FixedStack-1 + 0]
Register R2H was defined but already live.
The fix is to replace EXTRACT_SUBREG with IMPLICIT_DEF instead of eliminating
it completely:
subreg: CONVERTING: %R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
subreg: replace by: %R2L<def> = IMPLICIT_DEF %R2<kill>
Note that these IMPLICIT_DEF instructions survive to the asm output. It is
necessary to fix the stack-color-with-reg test case because of that.
llvm-svn: 78093
killed by another operand.
There is probably a better fix. Either 1) scavenger can look at other operands, or
2) livevariables can be smarter about kill markers. Patches welcome.
llvm-svn: 78072
When LowerSubregsInstructionPass::LowerInsert eliminates an INSERT_SUBREG
instruction because it is an identity copy, make sure that the same registers
are alive before and after the elimination.
When the super-register is marked <undef> this requires inserting an
IMPLICIT_DEF instruction to make sure the super register is live.
Fix a related bug where a kill flag on the inserted sub-register was not transferred properly.
Finally, clear the undef flag in MachineInstr::addRegisterKilled. Undef implies dead and kill implies live, so they can't both be valid.
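
The invariant can be sketched with a hypothetical two-flag operand (this is
not the real MachineOperand interface): setting the kill flag always clears
the undef flag, so the two are never set together.

```cpp
// Hypothetical operand carrying only the two flags discussed above.
struct RegOperand {
  bool isUndef = false;
  bool isKill = false;
};

// Marking a register as killed means it was live up to this point,
// which contradicts <undef>; clear the undef flag when setting kill.
void markKilled(RegOperand &op) {
  op.isKill = true;
  op.isUndef = false;
}
```
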
llvm-svn: 77989
This is not just a matter of passing in the target triple from the module;
currently backends are making decisions based on the build and host
architecture. The goal is to migrate to making these decisions based off of the
triple (in conjunction with the feature string). Thus most clients pass in the
target triple, or the host triple if that is empty.
This has one important change in the behavior of the JIT and llc.
For the JIT, it was previously selecting the Target based on the host
(naturally), but it was setting the target machine features based on the triple
from the module. Now it is setting the target machine features based on the
triple of the host.
For LLC, -march was previously only used to select the target, the target
machine features were initialized from the module's triple (which may have been
empty). Now the target triple is taken from the module, or the host's triple is
used if that is empty. Then the triple is adjusted to match -march.
The takeaway is that -march for llc is now used in conjunction with the host
triple to initialize the subtarget. If users want more deterministic behavior
from llc, they should use -mtriple, or set the triple in the input module.
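
The llc selection order described above can be sketched as follows. This is
an illustration only: chooseTriple() is a hypothetical helper, and "adjusting
the triple to match -march" is assumed here to mean replacing the architecture
component in front of the first dash, which may differ from what llc really
does.

```cpp
#include <string>

// Pick the module's triple, fall back to the host's triple when it is empty,
// then (assumption) overwrite the architecture component with -march.
std::string chooseTriple(const std::string &moduleTriple,
                         const std::string &hostTriple,
                         const std::string &march) {
  std::string triple = moduleTriple.empty() ? hostTriple : moduleTriple;
  if (march.empty())
    return triple;
  std::string::size_type dash = triple.find('-');
  if (dash == std::string::npos)
    return march; // triple had no vendor/OS part to keep
  return march + triple.substr(dash);
}
```

An empty module triple combined with -march therefore inherits the vendor and
OS of the host, which is why the result depends on the machine llc runs on.
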
llvm-svn: 77946
__builtin_bfin_ones does the same as ctpop, so it can be implemented in the front-end.
__builtin_bfin_loadbytes loads from an unaligned pointer with the disalignexcpt instruction. It does the same as loading from a pointer with the low bits masked. It is better if the front-end creates a masked load. We can always instruction select the masked load to disalignexcpt+load.
We keep csync/ssync/idle. These intrinsics represent instructions that need workarounds for some silicon revisions. We may even want to convert inline assembler to intrinsics to enable the workarounds.
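
A rough sketch of the two front-end equivalents in portable C++. The function
names are hypothetical, and the sketch assumes a 32-bit load where the low two
address bits are the ones masked off.

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>

// Equivalent of the "ones" builtin: a plain population count.
unsigned onesViaCtpop(std::uint32_t x) {
  return static_cast<unsigned>(std::bitset<32>(x).count());
}

// Equivalent of the "loadbytes" builtin: clear the low address bits
// (assumed: two bits, for a 4-byte word) and do an ordinary aligned load.
std::uint32_t loadMasked(const unsigned char *p) {
  std::uintptr_t addr =
      reinterpret_cast<std::uintptr_t>(p) & ~static_cast<std::uintptr_t>(3);
  std::uint32_t value;
  std::memcpy(&value, reinterpret_cast<const void *>(addr), sizeof value);
  return value;
}
```
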
llvm-svn: 77917
Allow imp-def and imp-use of anything in the scavenger asserts, just like the machine code verifier.
Allow redefinition of a sub-register of a live register.
llvm-svn: 77904
to:
.quad X
even on a 32-bit system, where X is not 64-bits. There isn't much that
we can do here, so we just print:
.quad ((X) & 4294967295)
instead.
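
For reference, 4294967295 is 2^32 - 1, so the printed expression keeps only
the low 32 bits of X. A small sketch of the string being produced; the
emitMaskedQuad() helper is hypothetical and not the asm printer's real code.

```cpp
#include <cstdint>
#include <string>

// 4294967295 == 0xFFFFFFFF: mask that keeps the low 32 bits.
constexpr std::uint64_t Low32Mask = 4294967295ULL;
static_assert(Low32Mask == 0xFFFFFFFFULL, "mask must be 2^32 - 1");

// Wrap an arbitrary expression so the assembler truncates it to 32 bits.
std::string emitMaskedQuad(const std::string &expr) {
  return ".quad ((" + expr + ") & " + std::to_string(Low32Mask) + ")";
}
```
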
llvm-svn: 77818
myself because I'm getting tired of seeing the red buildbots, which have
been red since 5:30PM PDT last night.
Proposed supplement to developer policy: committers should make sure to
be around to watch for buildbot failures after committing.
llvm-svn: 77785
instructions for calls since BL and BLX are always 32-bit long and BX is always
16-bit long.
Also, we should be using BLX to call external function stubs.
llvm-svn: 77756
- Operands which are just a label should be parsed as immediates, not memory
operands (from the assembler perspective).
- Match a few more flavors of immediates.
- Distinguish match functions for memory operands which don't take a segment
register.
- We match the .s for "hello world" now!
llvm-svn: 77745
padding is disabled, tabs get replaced by spaces except in the case of
the first operand, where the tab is output to line up the operands after
the mnemonics.
Add some better comments and eliminate redundant code.
Fix some testcases to not assume tabs.
llvm-svn: 77740
- Uses MCAsmToken::getIdentifier which returns the (sub)string representing the
meaningful contents of a string or identifier token.
- Directives aren't done yet.
llvm-svn: 77739
into the mergable section if it is one of our special cases. This could
obviously be improved, but this is the minimal fix and restores us to the
previous behavior.
llvm-svn: 77679
- This is "experimental" code, I am feeling my way around and working out the
best way to do things (and learning tblgen in the process). Comments welcome,
but keep in mind this stuff will change radically.
- This is enough to match "subb" and friends, but not much else. The next step is to
automatically generate the matchers for individual operands.
llvm-svn: 77657
When the return value is not used (i.e. only care about the value in the memory), x86 does not have to use xadd to implement these. Instead, it can use add, sub, inc, dec instructions with the "lock" prefix.
This is currently implemented using a bit of an instruction selection trick. The issue is that the target independent pattern produces one output and a chain, and we want to map it into one that just outputs a chain. The current trick is to select it into a merge_values with the first definition being an implicit_def. The proper solution is to add new ISD opcodes for the no-output variant. DAG combiner can then transform the node before it gets to target node selection.
Problem #2 is we are adding a whole bunch of x86 atomic instructions when in fact these instructions are identical to the non-lock versions. We need a way to add target specific information to target nodes and have this information carried over to machine instructions. Asm printer (or JIT) can use this information to add the "lock" prefix.
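
The two source-level situations can be illustrated with std::atomic, which is
used here only as a convenient stand-in for the atomic builtins. Which
instruction a given compiler actually emits for each function is not
guaranteed by this sketch.

```cpp
#include <atomic>

// Result discarded: only the value in memory matters, so a locked
// add/inc on the memory operand is sufficient.
void bumpCounter(std::atomic<int> &counter) {
  counter.fetch_add(1);
}

// Result used: the old value has to be returned, so a fetch-and-add
// style instruction is still needed.
int bumpAndRead(std::atomic<int> &counter) {
  return counter.fetch_add(1) + 1;
}
```
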
llvm-svn: 77582
wide vectors. Likewise, change VSTn intrinsics to take separate arguments
for each vector in a multi-vector struct. Adjust tests accordingly.
llvm-svn: 77468