The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
llvm-svn: 203559
the legalization cost must be included to get an accurate
estimation of the total cost of the scalarized vector.
The inaccurate cost triggered unprofitable SLP vectorization on
32-bit X86.
Summary:
Include legalization overhead when computing scalarization cost
Reviewers: hfinkel, nadav
CC: chandlerc, rnk, llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2992
llvm-svn: 203509
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.
This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.
llvm-svn: 203437
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
detail
2) Change it to actually be a *Use* iterator rather than a *User*
iterator.
3) Add an adaptor which is a User iterator that always looks through the
Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
they wanted a use_iterator (and to explicitly dig out the User when
needed), or a user_iterator which makes the Use itself totally
opaque.
Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.
The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.
However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
llvm-svn: 203364
This is already done for shifts. Allow it for rotations as well. E.g.:
(rotl:i32 x, (trunc (and y, 31))) -> (rotl:i32 x, (and (trunc y), 31))
Use the newly factored-out distributeTruncateThroughAnd.
With this patch and some X86.td tweaks we should be able to remove redundant
masking of the rotation amount like in the example above. HW implicitly
performs this masking.
The testcase will be added as part of the X86 patch.
llvm-svn: 203316
This is the new idiom:
x<<(y&31) | x>>((0-y)&31)
which is recognized as:
x ROTL (y&31)
The change refines matchRotateSub. In
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is
Pos' & (OpSize - 1) we can just use Pos' instead of Pos.
llvm-svn: 203315
Slightly change the wording in the function comment. Originally, it can be
misunderstood as we turned the input into two subsequent rotates.
Better connect the comment which talks about Mask and the code which used
LoBits. Renamed variable to MaskLoBits.
llvm-svn: 203314
be split and the result type widened.
When the condition of a vselect has to be split it makes no sense widening the
vselect and thereby widening the condition. We end up in an endless loop of
widening (vselect result type) and splitting (condition mask type) doing this.
Instead, split both the condition and the vselect and widen the result.
I ran this over the test suite with i686 and mattr=+sse and saw no regressions.
Fixes PR18036.
llvm-svn: 203311
First: refactor out the emission of entries into the .debug_loc section
into its own routine.
Second: add a new class ByteStreamer that can be used to either emit
using an AsmPrinter or hash using DIEHash the series of bytes that
would be emitted. Use this in all of the location emission routines
for the .debug_loc section.
No functional change intended outside of a few additional comments
in verbose assembly.
llvm-svn: 203304
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.
Patch by Manuel Jacob.
llvm-svn: 203230
Summary:
llvm/MC/MCSectionMachO.h and llvm/Support/MachO.h both had the same
definitions for the section flags. Instead, grab the definitions out of
support.
No functionality change.
Reviewers: grosbach, Bigcheese, rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2998
llvm-svn: 203211
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.
The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.
The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.
I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.
The net result is that we don't create temporary labels that are never used.
llvm-svn: 203204
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector into a single shuffle vector when possible.
The rules are:
1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)
The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.
Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
DAGCombiner would firstly try to fold according to 1.; If not possible
then it will try to fold according to 2.
If both Mask1 and Mask2 are illegal then we conservatively don't fold
the OR instruction.
llvm-svn: 203156
This compiles with no changes to clang/lld/lldb with MSVC and includes
overloads to various functions which are used by those projects and llvm
which have OwningPtr's as parameters. This should allow out of tree
projects some time to move. There are also no changes to libs/Target,
which should help out of tree targets have time to move, if necessary.
llvm-svn: 203083
This is consistent with GDB ToT and reduces the number of relocations in
(type and compile) units, substantially reducing relocations and debug
size in fission + type units builds.
llvm-svn: 203082
pointed to by the attribute, rather than the form as a first
step to determining how to hash the values. No functional change
intended.
llvm-svn: 203044
This works by moving the existing code into the DIEValue hierarchy
and using the DwarfDebug pointer off of the AsmPrinter to access
any global information we need.
llvm-svn: 203033
This enables us to figure out where in the debug_loc section our
locations are so that we can eventually hash them. It also helps
remove some special case code in emission. No functional change.
llvm-svn: 203018
Before llvm-mc would print it, but llc was assuming that it would produce
another section changing directive before one was needed. That assumption is
false with inline asm.
Fixes PR19049.
Another option would be to always create the section, but in the asm printer
avoid printing sections changes during initialization. That would work, but
* We do use the fact that llvm-mc prints it in testing. The tests can be changed
if needed.
* A quick poll on IRC suggest that most developers prefer the implicit .text to
be printed.
llvm-svn: 203001
already lived there and it is where it belongs -- this is the in-memory
debug location representation.
This is just cleanup -- Modules can actually cope with this, but that
doesn't make it right. After chatting with folks that have out-of-tree
stuff, going ahead and moving the rest of the headers seems preferable.
llvm-svn: 202960
Patchpoints already did this. Doing it for stackmaps is a convenience
for the runtime in the event that it needs to scratch register to
patch or perform a runtime call thunk.
Unlike patchpoints, we just assume the AnyRegCC calling
convention. This is the only language and target independent calling
convention specific to stackmaps so makes sense. Although the calling
convention is not currently used to select the scratch registers.
llvm-svn: 202943
selection dag (PR19012)
In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers esi).
The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.
This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.
Differential Revision: http://llvm-reviews.chandlerc.com/D2954
llvm-svn: 202930
using a full uint16_t with the flag value... which happens to be
0 or 1. Update the class for bool values and rename functions slightly.
llvm-svn: 202921
Currently this code is duplicated across visitSHL, visitSRA and visitSRL. The
plan is to add rotates as clients to this new function.
There is no functional change intended here.
llvm-svn: 202908
source file had already been moved. Also move the unittest into the IR
unittest library.
This may seem an odd thing to put in the IR library but we only really
use this with instructions and it needs the LLVM context to work, so it
is intrinsically tied to the IR library.
llvm-svn: 202842
directly care about the Value class (it is templated so that the key can
be any arbitrary Value subclass), it is in fact concretely tied to the
Value class through the ValueHandle's CallbackVH interface which relies
on the key type being some Value subclass to establish the value handle
chain.
Ironically, the unittest is already in the right library.
llvm-svn: 202824
Move the test for this class into the IR unittests as well.
This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.
llvm-svn: 202821
name might indicate, it is an iterator over the types in an instruction
in the IR.... You see where this is going.
Another step of modularizing the support library.
llvm-svn: 202815
Inside iterate, we scan backwards then scan forwards in a loop. When iteration
is not zero, the last node was just updated so we can skip it. But when
iteration is zero, we can't skip the last node.
For the testing case, fixing this will save a spill and move register copies
from hot path to cold path.
llvm-svn: 202557
The previous PBQP solver was very robust but consumed a lot of memory,
performed a lot of redundant computation, and contained some unnecessarily tight
coupling that prevented experimentation with novel solution techniques. This new
solver is an attempt to address these shortcomings.
Important/interesting changes:
1) The domain-independent PBQP solver class, HeuristicSolverImpl, is gone.
It is replaced by a register allocation specific solver, PBQP::RegAlloc::Solver
(see RegAllocSolver.h).
The optimal reduction rules and the backpropagation algorithm have been extracted
into stand-alone functions (see ReductionRules.h), which can be used to build
domain specific PBQP solvers. This provides many more opportunities for
domain-specific knowledge to inform the PBQP solvers' decisions. In theory this
should allow us to generate better solutions. In practice, we can at least test
out ideas now.
As a side benefit, I believe the new solver is more readable than the old one.
2) The solver type is now a template parameter of the PBQP graph.
This allows the graph to notify the solver of any modifications made (e.g. by
domain independent rules) without the overhead of a virtual call. It also allows
the solver to supply policy information to the graph (see below).
3) Significantly reduced memory overhead.
Memory management policy is now an explicit property of the PBQP graph (via
the CostAllocator typedef on the graph's solver template argument). Because PBQP
graphs for register allocation tend to contain many redundant instances of
single values (E.g. the value representing an interference constraint between
GPRs), the new RASolver class uses a uniquing scheme. This massively reduces
memory consumption for large register allocation problems. For example, looking
at the largest interference graph in each of the SPEC2006 benchmarks (the
largest graph will always set the memory consumption high-water mark for PBQP),
the average memory reduction for the PBQP costs was 400x. That's times, not
percent. The highest was 1400x. Yikes. So - this is fixed.
"PBQP: No longer feasting upon every last byte of your RAM".
Minor details:
- Fully C++11'd. Never copy-construct another vector/matrix!
- Cute tricks with cost metadata: Metadata that is derived solely from cost
matrices/vectors is attached directly to the cost instances themselves. That way
if you unique the costs you never have to recompute the metadata. 400x less
memory means 400x less cost metadata (re)computation.
Special thanks to Arnaud de Grandmaison, who has been the source of much
encouragement, and of many very useful test cases.
This new solver forms the basis for future work, of which there's plenty to do.
I will be adding TODO notes shortly.
- Lang.
llvm-svn: 202551
This extract-and-trunc vector optimization cannot work for i1 values as
currently implemented, and so I'm disabling this for now for i1 values. In the
future, this can be fixed properly.
Soon I'll commit support for i1 CR bit tracking in the PowerPC backend, and
this will be covered by one of the existing regression tests.
llvm-svn: 202449
This is a temporary workaround for native arm linux builds:
PR18996: Changing regalloc order breaks "lencod" on native arm linux builds.
llvm-svn: 202433
scan the register file for sub- and super-registers.
No functionality change intended.
(Tests are updated because the comments in the assembler output are
different.)
llvm-svn: 202416
any ranges - this includes CU ranges where we were previously emitting an
end list marker even if we didn't have a list.
Testcase includes a test for line table only code emission as the problem
was noticed while writing this test.
llvm-svn: 202357
any ranges to the list of ranges for the CU as we don't want to emit
them anyway. This ensures that we will still emit ranges if we have
a compile unit compiled with only line tables and one compiled with
full debug info requested (we'll emit for the one with full debug info).
Update testcase metadata accordingly to continue emitting ranges.
llvm-svn: 202333
This handles pathological cases in which we see 2x increase in spill
code for large blocks (~50k instructions). I don't have a unit test
for this behavior.
Fixes rdar://16072279.
llvm-svn: 202304
The aggressive anti-dependency breaker scans instructions, bottom-up, within the
scheduling region in order to find opportunities where register renaming can
be used to break anti-dependencies.
Unfortunately, the aggressive anti-dep breaker was treating a register definition
as defining all of that register's aliases (including super registers). This behavior
is incorrect when the super register is live and there are other definitions of
subregisters of the super register.
For example, given the following sequence:
%CR2EQ<def> = CROR %CR3UN, %CR3UN<kill>
%CR2GT<def> = IMPLICIT_DEF
%X4<def> = MFOCRF8 %CR2
the analysis of the first subregister definition would work as expected:
Anti: %CR2GT<def> = IMPLICIT_DEF
Def Groups: CR2GT=g194->g0(via CR2)
Antidep reg: CR2GT (zero group)
Use Groups:
but the analysis of the second one would not:
Anti: %CR2EQ<def> = CROR %CR3UN, %CR3UN<kill>
Def Groups: CR2EQ=g195
Antidep reg: CR2EQ
Rename Candidates for Group g195: ...
because, when processing the %CR2GT<def>, we'd mark all super registers of
%CR2GT (%CR2 in this case) as defined. As a result, when processing
%CR2EQ<def>, %CR2 no longer appears to be live, and %CR2EQ<def>'s group is not
%unioned with the %CR2 group.
I don't have an in-tree test case for this yet (and even if I did, I don't have
a small one).
llvm-svn: 202294
Variadic functions have an unspecified parameter tag after the last
argument. In IR this is represented as an unspecified parameter in the
subroutine type.
Paired commit with CFE r202185.
rdar://problem/13690847
This re-applies r202184 + a bugfix in DwarfDebug's argument handling.
llvm-svn: 202188
Variadic functions have an unspecified parameter tag after the last
argument. In IR this is represented as an unspecified parameter in the
subroutine type.
Paired commit with CFE.
rdar://problem/13690847
llvm-svn: 202184
The function with uwtable attribute might be visited by the
stack unwinder, thus the link register should be considered
as clobbered after the execution of the branch and link
instruction (i.e. the definition of the machine instruction
can't be ignored) even when the callee function are marked
with noreturn.
llvm-svn: 202165
After this I will set the default back to F_None. The advantage is that
before this patch forgetting to set F_Binary would corrupt a file on windows.
Forgetting to set F_Text produces one that cannot be read in notepad, which
is a better failure mode :-)
llvm-svn: 202052
This commit moves getSLEB128Size() and getULEB128Size() from
MCAsmInfo to LEB128.h and removes some copy-and-paste code.
Besides, this commit also adds some unit tests for the LEB128
functions.
llvm-svn: 201937
CodeGenPrepare uses extensively TargetLowering which is part of libLLVMCodeGen.
This is a layer violation which would introduce eventually a dependence on
CodeGen in ScalarOpts.
Move CodeGenPrepare into libLLVMCodeGen to avoid that.
Follow-up of <rdar://problem/15519855>
llvm-svn: 201912
shifted mask rather than masking and shifting separately.
The patch adds this transformation to the DAGCombiner:
(shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C)
<rdar://problem/16054492>
Patch by Adam Nemet <anemet@apple.com>
llvm-svn: 201906
The lowering of the frame index for stackmaps and patchpoints requires some
target-specific magic and should therefore be handled in the target-specific
eliminateFrameIndex method.
This is related to <rdar://problem/16106219>
llvm-svn: 201904
We were just emitting a label for this section for no real reason - this
caused us to emit the section even though we never put anything in it.
Not bothering with a test (though not adamantly anti-test) because it
seems somewhat arbitrary to test for the absence of this section anymore
than the absence of any other section.
llvm-svn: 201876
This replaces the old NoIntegratedAssembler with at TargetOption. This is
more flexible and will be used to forward clang's -no-integrated-as option.
llvm-svn: 201836
passing down an AsmPrinter instance so we could compute the size of
the block which could be target specific. All of the test cases in
the unittest don't have any target specific data so we can use a NULL
AsmPrinter there. This also depends upon block data being added as
integers.
We can now hash the entire fission-cu.ll compile unit so turn the
flag on there with the hash value.
llvm-svn: 201752
TargetLoweringBase is implemented in CodeGen, so before this patch we had
a dependency fom Target to CodeGen. This would show up as a link failure of
llvm-stress when building with -DBUILD_SHARED_LIBS=ON.
This fixes pr18900.
llvm-svn: 201711
r201608 made llvm corretly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.
They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.
llvm-svn: 201700
When outputting an object we check its section to find its name, but when
looking for the section with -ffunction-section we look for the symbol name.
Break the loop by requesting a name with the private prefix when constructing
the section name. This matches the behavior before r201608.
llvm-svn: 201622
The IR
@foo = private constant i32 42
is valid, but before this patch we would produce an invalid MachO from it. It
was invalid because it would use an L label in a section where the liker needs
the labels in order to atomize it.
One way of fixing it would be to just reject this IR in the backend, but that
would not be very front end friendly.
What this patch does is use an 'l' prefix in sections that we know the linker
requires symbols for atomizing them. This allows frontends to just use
private and not worry about which sections they go to or how the linker handles
them.
One small issue with this strategy is that now a symbol name depends on the
section, which is not available before codegen. This is not a problem in
practice. The reason is that it only happens with private linkage, which will
be ignored by the non codegen users (llvm-nm and llvm-ar).
llvm-svn: 201608
alongside DIEBlock and replace uses accordingly. Use DW_FORM_exprloc
in DWARF4 and later code. Update testcases.
Adding a DIELoc instead of using extra forms inside DIEBlock so
that we can keep location expressions separate from other uses. No
direct use at the moment, however, it's not a lot of code and
using a separately named class keeps it somewhat more obvious
what's going on in various locations.
llvm-svn: 201481
This broke in r185459 while TLS support was being generalized to handle
non-symbol TLS representations.
I thought about/tried having an enum rather than a bool to track the
TLS-ness of the address table entry, but namespaces and naming seemed
more hassle than it was worth for only one caller that needed to specify
this.
llvm-svn: 201469
Type units will share the statement list of their defining compile unit.
This is a tradeoff that reduces .o debug info size at the cost of some
linked debug info size (since the contents of those string tables won't
be deduplicated along with the type unit) which seems right for now.
llvm-svn: 201445
These types have an out of line virtual function each (emitHeader at
least) so they won't have weak vtables - no need for more than that.
llvm-svn: 201444
This probably also addresses the FIXME in the fission case regarding
multiple compile units, though I haven't tested that.
This code still confuses me (the literal zero offset makes little sense,
the limitations surrounding asm output I'm not sure about either - but
perhaps we should just always emit one line table? Or should we not rely
on .loc/.file even in assembly so we can produce the same output between
asm and object output?) but this maintains the existing functionality.
llvm-svn: 201441
Recommitting r201380 (reverted in r201389)
Recommitting r201351 and r201355 (reverted in r201351 and r201355)
We weren't emitting the an empty (header only) line table when the line
table was empty - this made the DWARF invalid (the compile unit would
point to the zero-size debug_lines section where there should've been an
empty line table but there was nothing at all). Fix that, and as a
consequence this works around/addresses PR18809.
Also, we emit a non-empty line table to workaround a darwin linker bug,
so XFAILing on darwin too.
Also, mark the test as 'REQUIRES: object-emission' because it does.
llvm-svn: 201429
Recommitting r201351 and r201355 (reverted in r201351 and r201355)
We weren't emitting the an empty (header only) line table when the line
table was empty - this made the DWARF invalid (the compile unit would
point to the zero-size debug_lines section where there should've been an
empty line table but there was nothing at all). Fix that, and as a
consequence this works around/addresses PR18809.
llvm-svn: 201380
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.
Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
(fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
(should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
(should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
to be enabled regardless of default setting or -no-integrated-as.
(should fix SystemZ buildbots)
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201333
This fix checks the original LLVM IR node to identify opaque constants by
looking for the bitcast-constant pattern. Originally we looked at the generated
SDNode, but this might lead to incorrect results. The SDNode could have been
generated by an constant expression that was folded to a constant.
This fixes <rdar://problem/16050719>
llvm-svn: 201291
We are now no longer relying on the target-specific call lowering implementation
to lower a stackmap intrinsic call. Instead we perform the call lowering in a
target-independent way directly in the stackmap lowering code. This simplifies
the code and removes the need to fixup the code after the target-specific call
lowering.
llvm-svn: 201263
The ID type for the stackmap and patchpoint intrinsics are in both cases i64.
This fixes an zero extend in the SelectionDAGBuilder that still used i32. This
also updates the target independent instructions STACKMAP and PATCHPOINT to use
the correct type.
llvm-svn: 201262
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201237
There's still one piece missing here, which is adding the
DW_AT_stmt_list to the type unit that refer's to the compile unit's line
table. Working on that.
llvm-svn: 201198
This is preliminary work to fix type unit file strings so they appear in
their originating CU's line table - but it's also just good/simple
cleanup, so I'm committing it ahead of time.
llvm-svn: 201195
We used to be pretty vague about what debug entities were what, with
many conditionals to silently drop/skip/accept things. These don't seem
to be relevant anymore.
llvm-svn: 201194
Debug info: Emit values in subregisters that do not have a separate
DWARF register number by emitting a super-register + DW_OP_bit_piece.
This is necessary because on x86_64, there are no DWARF register numbers
for i386-style subregisters.
Fixes a bunch of FIXMEs.
rdar://problem/16015314
llvm-svn: 201190
This comes up in empty files or files containing #file directives that
never reference the actual source file name. Came up in a small test of
line tables I was playing with.
llvm-svn: 201187
DWARF register number by emitting a super-register + DW_OP_bit_piece.
This is necessary because on x86_64, there are no DWARF register numbers
for i386-style subregisters.
Fixes a bunch of FIXMEs.
rdar://problem/16015314
llvm-svn: 201180
BUILD_VECTOR nodes, e.g.:
(concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4))
->
(BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4)
This fixes an issue with AVX, where a sequence was not recognized as a 256-bit
vbroadcast due to the concat_vectors.
llvm-svn: 201158
These methods normally call each other and it is really annoying if the
arguments are in different order. The more common rule was that the arguments
specific to call are first (GV, Encoding, Suffix) and the auxiliary objects
(Mang, TM) come after. This patch changes the exceptions.
llvm-svn: 201044
This solves a problem where a def machine operand has no uses but has
not been marked dead. In this case, the initial RP analysis was being
extra precise and determining from LiveIntervals the the register was
actually dead. This caused us to omit the register from the RP
tracker's block live out. That's all good, but the per-instruction
summary still accounted for it as a valid def. This could cause an
assertion in the tracker later when we underflow pressure.
This is from a bug report on an out-of-tree target. It is not
reproducible on well-behaved targets. I'm just making an obvious fix
without unit test.
llvm-svn: 200941
In a previous commit (r199818) we added a const_cast to an existing
subtarget info instead of creating a new one so that we could reuse
it when creating the TargetAsmParser for parsing inline assembly.
This cast was necessary because we needed to reuse the existing STI
to avoid generating incorrect code when the inline asm contained
mode-switching directives (e.g. .code 16).
The root cause of the failure was that there was an implicit sharing
of the STI between the parser and the MCCodeEmitter. To fix a
different but related issue, we now explicitly pass the STI to the
MCCodeEmitter (see commits r200345-r200351).
The const_cast is no longer necessary and we can now create a fresh
STI for the inline asm parser to use.
Differential Revision: http://llvm-reviews.chandlerc.com/D2709
llvm-svn: 200929
The aim in this patch is to reduce work that VirtRegRewriter needs to do when
telling MachineRegisterInfo which physregs are in use. Up until now
VirtRegRewriter::rewrite has been doing rewriting and populating def info and
then proceeding to set whether a physreg is used based this info for every
physreg that the target provides. This can be expensive when a target has an
unusually high number of supported physregs, and is a noticeable chunk of
compile time for small programs on such targets.
So to reduce compile time, this patch simply adds the use of a SparseSet to the
rewrite function that is used to flag each physreg that is encountered in a
MachineFunction. Afterward, rather than iterating over the set of all physregs
for a given target to set the physregs used in MachineRegisterInfo, the new way
is to iterate over the set of physregs that were actually encountered and set
in the SparseSet. This improves compile time because the existing rewrite
function was iterating over all MachineOperands already, and because the
iterations afterward to setPhysRegUsed is reduced by use of the SparseSet data.
llvm-svn: 200919
programs on targets with large register files. The root of the compile time
overhead was in the use of llvm::SmallVector to hold PhysRegEntries, which
resulted in slow-down from calling llvm::SmallVector::assign(N, 0). In contrast
std::vector uses the faster __platform_bzero to zero out primitive buffers when
assign is called, while SmallVector uses an iterator.
The fix for this was simply to replace the SmallVector with a dynamically
allocated buffer and to initialize or reinitialize the buffer based on the
total registers that the target architecture requires. The changes support
cases where a pass manager may be reused for different targets, and note that
the PhysRegEntries is allocated using calloc mainly for good for, and also to
quite tools like Valgrind (see comments for more info on this).
There is an rdar to track the fact that SmallVector doesn't have platform
specific speedup optimizations inside of it for things like this, and I'll
create a bugzilla entry at some point soon as well.
TL;DR: This fix replaces the expensive llvm::SmallVector<unsigned
char>::assign(N, 0) with a call to calloc for N bytes which is much faster
because SmallVector's assign uses iterators.
llvm-svn: 200917
large register files. The omission of Queries.clear() is perfectly safe because
LiveIntervalUnion::Query doesn't contain any data that needs freeing and
because LiveRegMatrix::runOnFunction happens to reset the OwningArrayPtr
holding Queries every time it is run, so there's no need to zero out the
queries either. Not having to do this for very large numbers of physregs
is a noticeable constant cost reduction in compilation of small programs.
llvm-svn: 200913
During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.
llvm-svn: 200900
find a register.
The idea is to choose a color for the variable that cannot be allocated and
recolor its interferences around. Unlike the current register allocation scheme,
it is allowed to change the color of an already assigned (but maybe not
splittable or spillable) live interval while propagating this change to its
neighbors.
In other word, there are two things that may help finding an available color:
- Already assigned variables (RS_Done) can be recolored to different color.
- The recoloring allows to catch solutions that needs to touch more that just
the neighbors of the current allocated variable.
E.g.,
vA can use {R1, R2 }
vB can use { R2, R3}
vC can use {R1 }
Where vA, vB, and vC cannot be split anymore (they are reloads for instance) and
they all interfere.
vA is assigned R1
vB is assigned R2
vC tries to evict vA but vA is already done.
=> Regular register allocation heuristic fails.
Last chance recoloring kicks in:
vC does as if vA was evicted => vC uses R1.
vC is marked as fixed.
vA needs to find a color.
None are available.
vA cannot evict vC: vC is a fixed virtual register now.
vA does as if vB was evicted => vA uses R2.
vB needs to find a color.
R3 is available.
Recoloring => vC = R1, vA = R2, vB = R3.
<rdar://problem/15947839>
llvm-svn: 200883