This SelectionDAG node will be attached to call nodes by LowerCall(),
and eventually becomes a MO_RegisterMask MachineOperand on the
MachineInstr representing the call instruction.
LowerCall() will attach a register mask that depends on the calling
convention.
llvm-svn: 148436
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used in LLVM IR for the condition.
Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.
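A standalone C sanity check of the underlying equivalence (illustrative, not
part of the commit): a blend reads only the MSB of its mask, and a
sign-extended i1 and an i1 shifted to the MSB always agree on that bit.

#include <assert.h>
#include <stdint.h>

int main(void) {
  for (uint32_t b = 0; b <= 1; ++b) {
    uint32_t sext = b ? 0xFFFFFFFFu : 0u; /* sign-extended i1 mask */
    uint32_t shl  = b << 31;              /* i1 shifted to the MSB */
    assert((sext >> 31) == (shl >> 31));  /* the blend sees the same MSB */
  }
  return 0;
}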
llvm-svn: 148225
overly conservative. It was concerned about cases where it would prohibit
folding simple [r, c] addressing modes. e.g.
ldr r0, [r2]
ldr r1, [r2, #4]
=>
ldr r0, [r2], #4
ldr r1, [r2]
Change the logic to look for such cases which allows it to form indexed memory
ops more aggressively.
rdar://10674430
llvm-svn: 148086
When we load the v12i32 type, the GenWidenVectorLoads method generates two loads: v8i32 and v4i32
and attempts to use CONCAT_VECTORS to join them. In this fix I concat undef values to widen
the smaller value. The test "widen_load-2.ll" also exposes this bug on AVX.
llvm-svn: 147964
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:
unsigned x = my_accelerator_table[input >> 11];
Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4 bytes, which means this
implicitly gets turned into (once lowered out of a GEP):
*(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));
The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.
In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.
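As a sanity check on the algebra (a standalone C demo, not part of the
patch): the canonical shift-plus-mask form and the original shift pair
compute the same scaled offset.

#include <assert.h>
#include <stdint.h>

int main(void) {
  for (uint32_t input = 0; input < (1u << 20); ++input) {
    uint32_t shifts = (input >> 11) << 2; /* shr then shl, scale of 4 */
    uint32_t masked = (input >> 9) & ~3u; /* canonical: shr then mask */
    assert(shifts == masked);
  }
  return 0;
}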
llvm-svn: 147936
of several newly un-defaulted switches. This also helps optimizers
(including LLVM's) recognize that every case is covered, and we should
assume as much.
llvm-svn: 147861
a combined-away node and the result of the combine isn't substantially
smaller than the input, it's just canonicalized. This is the first part
of a significant (7%) performance gain for Snappy's hot decompression
loop.
llvm-svn: 147604
Before we'd get:
$ clang t.c
fatal error: error in backend: Invalid operand for inline asm constraint 'i'!
Now we get:
$ clang t.c
t.c:16:5: error: invalid operand for inline asm constraint 'i'!
"movq (%4), %%mm0\n"
^
Which at least gets us the inline asm that is the problem.
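A hypothetical reproducer along these lines (the actual t.c isn't in the
commit): the 'i' constraint demands a compile-time immediate, but only a
run-time value is supplied.

void f(int *p) {
  /* 'i' requires an immediate; a run-time pointer can't satisfy it */
  __asm__("movq (%0), %%mm0" : : "i"(p));
}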
llvm-svn: 147502
The failure is seen on win32, where the i64 type is illegal.
It happens during the conversion of VECTOR_SHUFFLE to BUILD_VECTOR.
The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.
I added a special test that checks vector shuffle on win32.
llvm-svn: 147445
The failure is seen on win32, where the i64 type is illegal.
It happens during the conversion of VECTOR_SHUFFLE to BUILD_VECTOR.
The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.
I added a special test that checks vector shuffle on win32.
llvm-svn: 147399
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.
Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.
Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.
llvm-svn: 146466
We must not issue a bitcast operation for integer-promotion of vector types, because the
location of the values in the vector may be different.
llvm-svn: 146150
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
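A small C sketch of the two aggregation rules (the types here are
hypothetical; the real logic lives in the MachineInstr bundle API):

#include <stdbool.h>

struct MI { bool mayLoad; bool isPredicable; };

/* any-of: true if any bundled instruction has the property */
bool bundleMayLoad(const struct MI *bundle, int n) {
  for (int i = 0; i < n; ++i)
    if (bundle[i].mayLoad) return true;
  return false;
}

/* all-of: true only if every bundled instruction has the property */
bool bundleIsPredicable(const struct MI *bundle, int n) {
  for (int i = 0; i < n; ++i)
    if (!bundle[i].isPredicable) return false;
  return true;
}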
llvm-svn: 146026
1. Added opcode BUNDLE
2. Taught MachineInstr class to deal with bundled MIs
3. Changed MachineBasicBlock iterator to skip over bundled MIs; added an iterator to walk all the MIs
4. Taught MachineBasicBlock methods about bundled MIs
llvm-svn: 145975
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.
One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.
llvm-svn: 145714
Conservatively returns zero when the GV neither specifies an alignment nor is
initialized. Previously it returned the ABI alignment for the type of the GV.
However, if the type is a "packed" type, then the under-specified alignment is
attached to
the load / store instructions. In that case, the alignment of the type cannot be
trusted.
rdar://10464621
llvm-svn: 145300
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.
rdar://10301431
llvm-svn: 145273
dropping weights on the floor for invokes. This was impeding my writing
further test cases for invoke when interacting with probabilities and
block placement.
No test case as there doesn't appear to be a way to test this stuff. =/
Suggestions for a test case of course welcome. I hope to be able to add
test cases that indirectly cover this eventually by adding probabilities
to the exceptional edge and reordering blocks as a result.
llvm-svn: 145060
ADDs. MaxOffs is used as a threshold to limit the size of the offset. The
tradeoffs are: (1) If we can't materialize the large constant then we'll cause
fast-isel to bail. (2) Too large an offset can't be directly encoded in the
ADD, resulting in a MOV+ADD. That's generally not a bad thing, because
otherwise we would have had ADD+ADD, but on Thumb this turns into a
MOVS+MOVT+ADD; I'm working on a fix for that. (3) Conversely, with too low a
threshold we'll miss opportunities to coalesce ADDs.
rdar://10412592
llvm-svn: 144886
for a single miss and not all predecessor instructions that get selected by
the selection DAG instruction selector. This is still not exact (e.g., it
overstates misses when folded/dead instructions are present), but it is a step
in the right direction.
llvm-svn: 144832
%arrayidx135 = getelementptr inbounds [4 x [4 x [4 x [4 x i32]]]]* %M0, i32 0, i64 0
%arrayidx136 = getelementptr inbounds [4 x [4 x [4 x i32]]]* %arrayidx135, i32 0, i64 %idxprom134
Prior to this commit, the GEP instruction that defines %arrayidx136 thought that
%arrayidx135 was a trivial kill. The GEP that defines %arrayidx135 doesn't
generate any code and thus %M0 gets folded into the second GEP. Thus, we need
to look through GEPs with all zero indices.
rdar://10443319
llvm-svn: 144730
instruction lower optimization" in the pre-RA scheduler.
The optimization, or rather the hack, was done before the MI use-list was
available. Now we should be able to implement it in a better way, perhaps in
the two-address pass, until a MI scheduler is available.
Now that the scheduler has to backtrack to handle call sequences, adding
artificial scheduling constraints is just not safe. Furthermore, the hack
doesn't take all the other scheduling decisions into consideration, so it's
just as likely to pessimize code. So I view disabling this optimization as
goodness regardless of PR11314.
llvm-svn: 144267
dragonegg self-host buildbot will recover (it is complaining about object
files differing between different build stages). Original commit message:
Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling sequence
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.
llvm-svn: 144188
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling sequence
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.
llvm-svn: 144124
Add support for trimming constants to GetDemandedBits. This fixes some funky
constant generation that occurs when stores are expanded for targets that don't
support unaligned stores natively.
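Conceptually (standalone C demo, values illustrative): when an unaligned
32-bit store is expanded into two 16-bit stores, each half demands only 16
bits of a constant operand, so the constant can be trimmed per half.

#include <assert.h>
#include <stdint.h>

int main(void) {
  uint32_t c = 0xDEADBEEFu;
  uint16_t lo = (uint16_t)(c & 0xFFFFu); /* bits demanded by the low half */
  uint16_t hi = (uint16_t)(c >> 16);     /* bits demanded by the high half */
  assert((((uint32_t)hi << 16) | lo) == c); /* halves reassemble c */
  return 0;
}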
llvm-svn: 144102
When this field is true it means that the load is from constant memory
(run-time or compile-time constant), and so it can be hoisted out of loops or
moved around other memory accesses.
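A C-level picture of what the flag enables (illustrative only): the load from
the constant table below cannot be clobbered by the stores, so it can be
hoisted out of the loop.

static const int table[4] = {1, 2, 3, 4}; /* constant memory */

int sum(int *out, int n, int k) {
  int s = 0;
  for (int i = 0; i < n; ++i) {
    out[i] = i;        /* stores can't touch constant memory */
    s += table[k & 3]; /* invariant load: hoistable out of the loop */
  }
  return s;
}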
llvm-svn: 144100
fixes: Use a separate register, instead of SP, as the
calling-convention resource, to avoid spurious conflicts with
actual uses of SP. Also, fix unscheduling of calling sequences,
which can be triggered by pseudo-two-address dependencies.
llvm-svn: 143206
it fixes the dragonegg self-host (it looks like gcc is miscompiled).
Original commit messages:
Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.
Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.
Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.
Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.
This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.
Delete #if 0 code accidentally left in.
llvm-svn: 143188
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.
Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.
Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.
Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.
This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.
llvm-svn: 143177
trying to legalize the operand types when only the result type
is required to be legalized - the type legalization machinery
will get round to the operands later if they need legalizing.
There can be a point to legalizing operands in parallel with
the result: when this saves compile time or results in better
code. There was only one case in which this was true: when
the operand is also split, so keep the logic for that bit.
As a result of this change, additional operand legalization
methods may need to be introduced to handle nodes where the
result and operand types can differ, like SIGN_EXTEND, but
the testsuite doesn't contain any tests where this is the case.
In any case, it seems better to require such methods (and die
with an assert if they don't exist) than to quietly produce
wrong code if we forgot to special case the node in
SplitVecRes_UnaryOp.
llvm-svn: 143026
This code makes different decisions when compiled into x87 instructions
because of different rounding behavior. That caused phase 2/3
miscompares on 32-bit Linux when the phase 1 compiler was built with gcc
(using x87), and the phase 2 compiler was built with clang (using SSE).
This fixes PR11200.
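The classic shape of the problem (illustrative C; behavior depends on the
target FP model): under x87 an intermediate product may be held at 80-bit
precision while the stored copy has been rounded to 64 bits.

int same_product(double a, double b) {
  volatile double p = a * b; /* forced through memory: rounded to 64 bits */
  return p == a * b;         /* may be 0 under x87 excess precision;
                                1 (barring NaNs) when every step rounds
                                to 64 bits, as with SSE */
}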
llvm-svn: 143006
ZExtPromotedInteger and SExtPromotedInteger based on the operation we legalize.
SetCC return type needs to be legalized via PromoteTargetBoolean.
llvm-svn: 142660
When checking the availability of instructions using the TLI, a 'promoted'
instruction IS available. It means that the value is bitcasted to another type
for which there is an operation. The correct check for the availability of an
instruction is to check if it should be expanded.
llvm-svn: 142542
svn r139159 caused SelectionDAG::getConstant() to promote BUILD_VECTOR operands
with illegal types, even before type legalization. For this testcase, that led
to one BUILD_VECTOR with i16 operands and another with promoted i32 operands,
which triggered the assertion.
llvm-svn: 142370
Some code wants to check that *any* call within a function has the 'returns
twice' attribute, not just that the current function has one.
llvm-svn: 142221
This isn't put into the 'clear()' method because the information needs to stick
around (at least for a little bit) after the selection DAG is built.
llvm-svn: 142032
The inline asm operand constraint is initially encoded in the virtual
register for the operand, but that register class may change during
coalescing, and the original constraint is lost.
Encode the original register class as part of the flag word for each
inline asm operand. This makes it possible to recover the actual
constraint required by inline asm, just like we can for normal
instructions.
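A sketch of the encoding idea in C (the bit layout below is invented for
illustration and is not LLVM's actual flag-word format):

#include <stdint.h>

enum { KIND_BITS = 3, NUMOPS_BITS = 13, REGCLASS_SHIFT = 16 };
#define HAS_REGCLASS (1u << 31)

/* pack the original register class ID into spare high bits of the flag */
static uint32_t encodeFlag(uint32_t kind, uint32_t numOps, uint32_t rc) {
  return kind | (numOps << KIND_BITS) | (rc << REGCLASS_SHIFT) | HAS_REGCLASS;
}

/* recover the constraint's register class after coalescing */
static uint32_t flagRegClass(uint32_t flag) {
  return (flag & ~HAS_REGCLASS) >> REGCLASS_SHIFT;
}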
llvm-svn: 141833
EXTRACT_SUBREG is emitted as %dst = COPY %src:sub, so there is no need to
constrain the %dst register class. RegisterCoalescer will apply the
necessary constraints if it decides to eliminate the COPY.
The %src register class does need to be constrained to something with
the right sub-registers, though. This is currently done manually with
COPY_TO_REGCLASS nodes. They can possibly be removed after this patch.
llvm-svn: 141207
The register class created by INSERT_SUBREG and SUBREG_TO_REG must be
legal and support the SubIdx sub-registers.
The new getSubClassWithSubReg() hook can compute that.
This may create INSERT_SUBREG instructions defining a larger register
class than the sub-register being inserted. That is OK,
RegisterCoalescer will constrain the register class as needed when it
eliminates the INSERT_SUBREG instructions.
llvm-svn: 141198
and the alignment is 0 (i.e., it's defined globally in one file and declared in
another file) it could get an alignment which is larger than the ABI allows for
that type, resulting in aligned moves being used for unaligned loads.
For instance, in file A.c:
struct S s;
In file B.c:
struct S {
// something long
};
extern struct S s;
void foo() {
struct S p = s;
// ...
}
this copy is a 'memcpy' which is turned into a series of 'movaps' instructions
on X86. But this is wrong, because 'struct S' has alignment of 4, not 16.
llvm-svn: 140902
This intrinsic is used to pass the index of the function context to the back-end
for further processing. The back-end is in charge of filling in the rest of the
entries.
llvm-svn: 140676
SDNodes may return values which are wider than the incoming element types. In
this patch we fix the integer promotion of these nodes.
Fixes spill-q.ll when running -promote-elements.
llvm-svn: 140471
(this is always the case for scalars), otherwise use the promoted result type.
Fix test/CodeGen/X86/vsplit-and.ll when promote-elements is enabled.
llvm-svn: 140464
When generating the trunc-store of i1's, we need to use the vector type and not
the scalar type.
This patch fixes the assertion in CodeGen/Generic/bool-vector.ll when
running with -promote-elements.
llvm-svn: 140463
DecomposeMERGE_VALUES to "know" that results are legalized in
a particular order, by passing it the number of the result
being legalized (the type legalization core provides this, it
just needs to be passed on).
llvm-svn: 140373
integer-promotion of CONCAT_VECTORS.
Test: test/CodeGen/X86/widen_shuffle-1.ll
This patch fixes the above test (when running with -promote-elements).
llvm-svn: 140372
Sometimes register class constraints are trivial, like GR32->GR32_NOSP,
or GPR->rGPR. Teach InstrEmitter to simply constrain the virtual
register instead of emitting a copy in these cases.
Normally, these copies are handled by the coalescer. This saves some
coalescer work.
llvm-svn: 140340
This is still a hack until we can teach tblgen to generate the
optional CPSR operand rather than an implicit CPSR def. But the
strangeness is now limited to the selection DAG. ADD/SUB MI's no
longer have implicit CPSR defs, nor do we allow flag setting variants
of these opcodes in machine code. There are several corner cases to
consider, and getting one wrong would previously lead to nasty
miscompilation. It's not the first time I've debugged one, so this
time I added enough verification to ensure it won't happen again.
llvm-svn: 140228
No functionality change. The hook makes it explicit which patterns
require "special" handling. i.e. it self-documents tblgen
deficiencies. I plan to add verification in ExpandISelPseudos and
Thumb2SizeReduce to catch any missing hasPostISelHooks. Otherwise it's
too fragile.
llvm-svn: 140160
Modified ARMISelLowering::AdjustInstrPostInstrSelection to handle the
full gamut of CPSR defs/uses including instructions whose "optional"
cc_out operand is not really optional. This allowed removal of the
hasPostISelHook to simplify the .td files and make the implementation
more robust.
Fixes rdar://10137436: sqlite3 miscompile
llvm-svn: 140134
(The fix for the related failures on x86 is going to be nastier because we actually need Acquire memoperands attached to the atomic load instrs, etc.)
llvm-svn: 139221
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
llvm-svn: 139159
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust.trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
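For context, the kind of code that needs trampolines at all (GNU C
nested-function extension; names here are illustrative): a nested function
whose address escapes must capture its parent's frame.

int apply(int (*f)(int), int x) { return f(x); }

int parent(int k) {
  /* taking child's address forces a trampoline so the plain function
     pointer can carry the captured 'k' */
  int child(int x) { return x + k; }
  return apply(child, 41); /* trampoline initialized in parent's frame */
}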
llvm-svn: 139140
If we have a chain of zext -> assert_zext -> zext -> use, the first zext would
get simplified away because of the later zext, and then the later zext would
get simplified away because of the assert. The solution is to teach
SimplifyDemandedBits that assert_zext demands all of the high bits of its
input, rather than only those demanded by its users.
No testcase because the only example I have manifests as llvm-gcc miscompiling
LLVM, and I haven't found a smaller case that reproduces this problem.
Fixes <rdar://problem/10063365>.
llvm-svn: 139059
to be unreliable on platforms which require memcpy calls, and it is
complicating broader legalize cleanups. It is hoped that these cleanups
will make memcpy byval easier to implement in the future.
llvm-svn: 138977
Added canClobberReachingPhysRegUse() to handle a particular pattern in
which a two-address instruction could be forced to interfere with
EFLAGS, causing a compare to be unnecessarily cloned.
Fixes rdar://problem/5875261
llvm-svn: 138924
Add an instruction flag: hasPostISelHook, which tells the pre-RA scheduler to
call a target hook to adjust the instruction. For ARM, this is used to
adjust instructions which may be setting the 's' flag. ADC, SBC, RSB, and RSC
instructions have an implicit def of CPSR (required since they now use a CPSR physical
register dependency rather than "glue"). If the carry flag is used, then the
target hook will *fill in* the optional operand with CPSR. Otherwise, the hook
will remove the CPSR implicit def from the MachineInstr.
llvm-svn: 138810
I don't really like the patterns, but I'm having trouble coming up with a
better way to handle them.
I plan on making other targets use the same legalization
ARM-without-memory-barriers is using... it's not especially efficient, but
if anyone cares, it's not that hard to fix for a given target if there's
some better lowering.
llvm-svn: 138621
the intent seems to be to terminate even in Release builds, just use abort()
directly.
If program flow ever reaches a __builtin_unreachable (which llvm_unreachable is
#define'd to on newer GCCs) then the program is undefined.
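A small C sketch of the distinction (the macro name is made up):

#include <stdlib.h>

void handle_impossible_case(void) {
#ifdef TRUST_UNREACHABLE
  __builtin_unreachable(); /* UB if ever reached; no guaranteed stop */
#else
  abort();                 /* guaranteed termination, even in Release */
#endif
}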
llvm-svn: 138068
The landingpad instruction is lowered into the EXCEPTIONADDR and EHSELECTION
SDNodes. The information from the landingpad instruction is harvested by the
'AddLandingPadInfo' function. The new EH uses the current EH scheme in the
back-end. This will change once we switch over to the new scheme. (Reviewed by
Jakob!)
llvm-svn: 137880
This generates the SDNodes for the new exception handling scheme. It takes the
two values coming from the landingpad instruction and assigns them to the
EXCEPTIONADDR and EHSELECTION nodes.
llvm-svn: 137873
This implements the 'landingpad' instruction. It's used to indicate that a basic
block is a landing pad. There are several restrictions on its use (see
LangRef.html for more detail). These restrictions allow the exception handling
code to gather the information it needs in a much more sane way.
This patch has the definition, implementation, C interface, parsing, and bitcode
support in it.
llvm-svn: 137501
This adds the 'resume' instruction class, IR parsing, and bitcode reading and
writing. The 'resume' instruction resumes propagation of an existing (in-flight)
exception whose unwinding was interrupted with a 'landingpad' instruction (to be
added later).
llvm-svn: 136589
working on x86 (at least for trivial testcases); other architectures will
need more work so that they actually emit the appropriate instructions for
orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC,
Mips, and Alpha backends need such changes.)
llvm-svn: 136457
specified in the same file where the library itself is created. This is
more idiomatic for CMake builds, and also allows us to correctly specify
dependencies that are missed due to bugs in the GenLibDeps perl script,
or change from compiler to compiler. On Linux, this returns CMake to
a place where it can reliably rebuild several targets of LLVM.
I have tried not to change the dependencies from the ones in the current
auto-generated file. The only places I've really diverged are in places
where I was seeing link failures, and added a dependency. The goal of
this patch is not to start changing the dependencies, merely to move
them into the correct location, and an explicit form that we can control
and change when necessary.
This also removes a serialization point in the build because we don't
have to scan all the libraries before we begin building various tools.
We no longer have a step of the build that regenerates a file inside the
source tree. A few other associated cleanups fall out of this.
This isn't really finished yet though. After talking to dgregor he urged
switching to a single CMake macro to construct libraries with both
sources and dependencies in the arguments. Migrating from the two macros
to that style will be a follow-up patch.
Also, llvm-config is still generated with GenLibDeps.pl, which means it
still has slightly buggy dependencies. The internal CMake
'llvm-config-like' macro uses the correct explicitly specified
dependencies however. A future patch will switch llvm-config generation
(when using CMake) to be based on these deps as well.
This may well break Windows. I'm getting a machine set up now to dig
into any failures there. If anyone can chime in with problems they see
or ideas of how to solve them for Windows, much appreciated.
llvm-svn: 136433
This generates the correct SDNodes for the landingpad instruction. It makes an
assumption that the result of the landingpad instruction has at least two
values, and that the first value is a pointer to the exception object and the
second value is the "selector."
llvm-svn: 136430
'atomicrmw' instructions, which allow representing all the current atomic
rmw intrinsics.
The allowed operands for these instructions are heavily restricted at the
moment; we can probably loosen it a bit, but supporting general
first-class types (where it makes sense) might get a bit complicated,
given how SelectionDAG works.
As an initial cut, these operations do not support specifying an alignment,
but it would be possible to add if we think it's useful. Specifying an
alignment lower than the natural alignment would be essentially
impossible to support on anything other than x86, but specifying a greater
alignment would be possible. I can't think of any useful optimizations which
would use that information, but maybe someone else has ideas.
Optimizer/codegen support coming soon.
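As a rough C11 analogue of what these instructions express (illustrative
only, not how they are implemented):

#include <stdatomic.h>

int demo(_Atomic int *p) {
  int old = atomic_fetch_add(p, 1); /* like: atomicrmw add */
  atomic_fetch_or(p, 0x4);          /* like: atomicrmw or  */
  return old;
}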
llvm-svn: 136404