This fixes va_start/va_copy of a va_list field which happens to not
be laid out at a 16-byte boundary.
Differential Revision: http://llvm-reviews.chandlerc.com/D276
llvm-svn: 172128
Like Clang's FixItHint, SMFixIt represents an insertion, replacement, or
removal of source text. One or more fix-its can be emitted as part of
a diagnostic, and will be printed below the source range line to show the
user how they can fix their code.
Currently, the only client of SMFixIt is clang-tblgen; thus, the tests for
this behavior live in clang/test/TableGen/tg-fixits.td. If/when SMFixIt is
adopted within LLVM itself, those tests should be moved to the LLVM suite.
llvm-svn: 172086
When calling hasProperty() on an instruction inside a bundle, it should
always behave as if IgnoreBundle was passed, and just return properties
for the current instruction.
Only attempt to aggregate bundle properties whan asked about the bundle
header.
The assertion fires on existing ARM test cases without this fix.
llvm-svn: 172082
requirement when creating stack objects in MachineFrameInfo.
Add CreateStackObjectWithMinAlign to throw error when the minimal alignment
can't be achieved and to clamp the alignment when the preferred alignment
can't be achieved. Same is true for CreateVariableSizedObject.
Will not emit error in CreateSpillStackObject or CreateStackObject.
As long as callers of CreateStackObject do not assume the object will be
aligned at the requested alignment, we should not have miscompile since
later optimizations which look at the object's alignment will have the correct
information.
rdar://12713765
llvm-svn: 172027
It cahced XOR's operands before calling visitXOR() but failed to update the
operands when visitXOR changed the XOR node.
rdar://12968664
llvm-svn: 171999
1. Added debug messages when in OptimizeIndividualCalls we move calls into predecessors and then erase the original call.
2. Added debug messages when in the process of moving calls in ObjCARCOpt::MoveCalls we create new RR and delete old RR.
3. Added a debug message when we visit a specific retain instruction in ObjCARCOpt::PerformCodePlacement.
llvm-svn: 171988
It is possible to build MI bundles that don't begin with a BUNDLE
header. Add support for such bundles, counting all instructions inside
the bundle.
llvm-svn: 171985
This patch adjust the r171506 to make all DWARF enconding pc-relative
for PPC64. It also adds the R_PPC64_REL32 relocation handling in MCJIT
(since the eh_frame will not generate PIC-relative relocation) and also
adds the emission of stubs created by the TTypeEncoding.
llvm-svn: 171979
fp128 is almost but not quite completely illegal as a type on AArch64. As a
result it needs to have a register class (for argument passing mainly), but all
operations need to be lowered to runtime calls. Currently there's no way for
targets to do this (without duplicating code), as the relevant functions are
hidden in SelectionDAG. This patch changes that.
llvm-svn: 171971
PR 14848. The lowered sequence is based on the existing sequence the target-independent
DAG Combiner creates for the scalar case.
Patch by Zvi Rackover.
llvm-svn: 171953
This was an experimental option, but needs to be defined
per-target. e.g. PPC A2 needs to aggressively hide latency.
I converted some in-order scheduling tests to A2. Hal is working on
more test cases.
llvm-svn: 171946
It's clearer and additionally this gets rid of the usage of `DefmID`,
which doesn't really correspond to anything in the language (it was just
used in the name of this parsing function which parsed a `MultiClassID`
and returned that multiclass's record).
This area of the code still needs a lot of work.
llvm-svn: 171938
method because getContents().size() already covers it. So computeFragmentSize
can use the generic MCEncodedFragment interface when querying both Data and
Relaxable fragments for contents sizes.
No change in functionality
llvm-svn: 171903
value in the 64 bit .eh_frame section.
It doesn't however allow exception handling to work
yet since it depends on the correct relocation model
being set in the ELF header flags.
Contributer: Jack Carter
llvm-svn: 171881
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.
When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.
It includes tests.
This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces
Patch by Andy Zhang.
llvm-svn: 171879
one file where it is called as a static function. Nuke the declaration
and the definition in lib/CodeGen, along with the include of
SelectionDAG.h from this file.
There is no dependency edge from lib/CodeGen to
lib/CodeGen/SelectionDAG, so it isn't valid for a routine in lib/CodeGen
to reference the DAG. There is a dependency from
lib/CodeGen/SelectionDAG on lib/CodeGen. This breaks one violation of
this layering.
llvm-svn: 171842
small loops. On small loops post-loop that handles scalars (and runs slower) can take more time to execute than the
rest of the loop. This patch disables widening of loops with a small static trip count.
llvm-svn: 171798
o. X/C1 * C2 => X * (C2/C1) (if C2/C1 is neither special FP nor denormal)
o. X/C1 * C2 -> X/(C1/C2) (if C2/C1 is either specical FP or denormal, but C1/C2 is a normal Fp)
Let MDC denote multiplication or dividion with one & only one operand being a constant
o. (MDC ± C1) * C2 => (MDC * C2) ± (C1 * C2)
(so long as the constant-folding doesn't yield any denormal or special value)
llvm-svn: 171793
proposal. This leaves the strings in the skeleton die as strp,
but in all dwo files they're accessed now via DW_FORM_GNU_str_index.
Add support for dumping these sections and modify the fission-cu.ll
testcase to have the correct strings and form. Fix a small bug
in the fixed form sizes routine that involved out of array accesses
for the table and add a FIXME in the extractFast routine to fix
this up.
llvm-svn: 171779
code generation. Variables addressed through a GlobalAlias were not being
handled, and variables with available_externally linkage were treated
incorrectly. The patch contains two new tests to verify the correct code
generation for these cases.
llvm-svn: 171778
This is necessary not only for representing empty ranges, but for handling
multibyte characters in the input. (If the end pointer in a range refers to
a multibyte character, should it point to the beginning or the end of the
character in a char array?) Some of the code in the asm parsers was already
assuming this anyway.
llvm-svn: 171765
turning a code like this:
if (foo)
free(foo)
into that:
free(foo)
Move a call to free from basic block FB into FB's predecessor, P,
when the path from P to FB is taken only if the argument of free is
not equal to NULL.
Some restrictions apply on P and FB to be sure that this code motion
is profitable. Namely:
1. FB must have only one predecessor P.
2. FB must contain only the call to free plus an unconditional
branch to S.
3. P's successors are FB and S.
Because of 1., we will not increase the code size when moving the call
to free from FB to P.
Because of 2., FB will be empty after the move.
Because of 2. and 3., P's branch instruction becomes useless, so as FB
(simplifycfg will do the job).
llvm-svn: 171762
peculiar headers under include/llvm.
This struct still doesn't make a lot of sense, but it makes more sense
down in TargetLowering than it did before.
llvm-svn: 171739
already in a class, just inline the four of them. I suspect that this
class could be simplified some to not always keep distinct variables for
these things, but it wasn't clear to me how given the usage so I opted
for a trivial and mechanical translation.
This removes one of the two remaining users of a header in include/llvm
which does nothing more than define a 4 member struct.
llvm-svn: 171738
TargetTransformInfo rather than TargetLowering, removing one of the
primary instances of the layering violation of Transforms depending
directly on Target.
This is a really big deal because LSR used to be a "special" pass that
could only be tested fully using llc and by looking at the full output
of it. It also couldn't run with any other loop passes because it had to
be created by the backend. No longer is this true. LSR is now just
a normal pass and we should probably lift the creation of LSR out of
lib/CodeGen/Passes.cpp and into the PassManagerBuilder. =] I've not done
this, or updated all of the tests to use opt and a triple, because
I suspect someone more familiar with LSR would do a better job. This
change should be essentially without functional impact for normal
compilations, and only change behvaior of targetless compilations.
The conversion required changing all of the LSR code to refer to the TTI
interfaces, which fortunately are very similar to TargetLowering's
interfaces. However, it also allowed us to *always* expect to have some
implementation around. I've pushed that simplification through the pass,
and leveraged it to simplify code somewhat. It required some test
updates for one of two things: either we used to skip some checks
altogether but now we get the default "no" answer for them, or we used
to have no information about the target and now we do have some.
I've also started the process of removing AddrMode, as the TTI interface
doesn't use it any longer. In some cases this simplifies code, and in
others it adds some complexity, but I think it's not a bad tradeoff even
there. Subsequent patches will try to clean this up even further and use
other (more appropriate) abstractions.
Yet again, almost all of the formatting changes brought to you by
clang-format. =]
llvm-svn: 171735
bogus comparison operands to default to eq/oeq. Fix that, fix a couple of
tests that accidentally passed and test for bogus comparison opeartors
explicitly.
llvm-svn: 171733
being present. Make a member of one of the helper classes a reference as
part of this.
Reformatting goodness brought to you by clang-format.
llvm-svn: 171726
This makes the loop vectorizer match the pattern followed by roughly all
other passses. =]
Notably, this header file was braken in several regards: it contained
a using namespace directive, global #define's that aren't globaly
appropriate, and global constants defined directly in the header file.
As a side benefit, lots of the types in this file become internal, which
will cause the optimizer to chew on this pass more effectively.
llvm-svn: 171723
This could be simplified further, but Hal has a specific feature for
ignoring TTI, and so I preserved that.
Also, I needed to use it because a number of tests fail when switching
from a null TTI to the NoTTI nonce implementation. That seems suspicious
to me and so may be something that you need to look into Hal. I worked
it by preserving the old behavior for these tests with the flag that
ignores all target info.
llvm-svn: 171722
Absent a Contributor's License Agreement (CLA) with an LLVM legal entity and as
reviewed and agreed with Chris Lattner, add a patent license covering future
contributions from ARM until there is a CLA. This is to make explicit ARM's
grant of patent rights to recipients of LLVM containing ARM-contributed
material.
llvm-svn: 171721
this patch brought to you by the tool clang-format.
I wanted to fix up the names of constructor parameters because they
followed a bit of an anti-pattern by naming initialisms with CamelCase:
'Tti', 'Se', etc. This appears to have been in an attempt to not overlap
with the names of member variables 'TTI', 'SE', etc. However,
constructor arguments can very safely alias members, and in fact that's
the conventional way to pass in members. I've fixed all of these I saw,
along with making some strang abbreviations such as 'Lp' be simpler 'L',
or 'Lgl' be the word 'Legal'.
However, the code I was touching had indentation and formatting somewhat
all over the map. So I ran clang-format and fixed them.
I also fixed a few other formatting or doxygen formatting issues such as
using ///< on trailing comments so they are associated with the correct
entry.
There is still a lot of room for improvement of the formating and
cleanliness of this code. ;] At least a few parts of the coding
standards or common practices in LLVM's code aren't followed, the enum
naming rules jumped out at me. I may mix some of these while I'm here,
but not all of them.
llvm-svn: 171719
I'm sorry for duplicating bad style here, but I wanted to keep
consistency. I've pinged the code review thread where this style was
reviewed and changes were requested.
llvm-svn: 171714
This c'tor takes the AttributeSet class as the parameter. It will eventually
grab the attributes from the specified index and create a new attribute builder
with those attributes.
llvm-svn: 171712
This works fine with GDB for member variable pointers, but GDB's support for
member function pointers seems to be quite unrelated to
DW_TAG_ptr_to_member_type. (see GDB bug 14998 for details)
llvm-svn: 171698
through as a reference rather than a pointer. There is always *some*
implementation of this available, so this simplifies code by not having
to test for whether it is available or not.
Further, it turns out there were piles of places where SimplifyCFG was
recursing and not passing down either TD or TTI. These are fixed to be
more pedantically consistent even though I don't have any particular
cases where it would matter.
llvm-svn: 171691
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
a helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix.
cvtt*2si/cvt*2si should parse with an 'l' or 'q' suffix or not suffix at all. No suffix should use the destination register size to choose encoding. Printing should not print a suffix.
Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein.
llvm-svn: 171668
pass into the SelectionDAG itself rather than snooping on the
implementation of that pass as exposed by the TargetMachine. This
removes the last direct client of the ScalarTargetTransformInfo class
outside of the TTI pass implementation.
llvm-svn: 171625
interfaces which could be extracted from it, and must be provided on
construction, to a chained analysis group.
The end goal here is that TTI works much like AA -- there is a baseline
"no-op" and target independent pass which is in the group, and each
target can expose a target-specific pass in the group. These passes will
naturally chain allowing each target-specific pass to delegate to the
generic pass as needed.
In particular, this will allow a much simpler interface for passes that
would like to use TTI -- they can have a hard dependency on TTI and it
will just be satisfied by the stub implementation when that is all that
is available.
This patch is a WIP however. In particular, the "stub" pass is actually
the one and only pass, and everything there is implemented by delegating
to the target-provided interfaces. As a consequence the tools still have
to explicitly construct the pass. Switching targets to provide custom
passes and sinking the stub behavior into the NoTTI pass is the next
step.
llvm-svn: 171621
values -- that's not required to fix the bug that was cropping up, and
the values selected made the enumeration's underlying type signed and
introduced some warnings. This fixes the -Werror build.
The underlying issue here was that the DenseMapInfo was casting values
completely outside the range of the underlying storage of the
enumeration to the enumeration's type. GCC went and "optimized" that
into infloops and other misbehavior. By providing designated special
values for these keys in the dense map, we ensure they are indeed
representable and that they won't be used for anything else.
It might be better to reuse None for the empty key and have the
tombstone share the value of the sentinel enumerator, but honestly
having 2 extra enumerators seemed not to matter and this seems a bit
simpler. I'll let Bill shuffle this around (or ask me to shuffle it
around) if he prefers it to look a different way.
I also made the switch a bit more clear (and produce a better assert)
that the enumerators are *never* going to show up and are errors if they
do.
llvm-svn: 171614
This change essentially reverts r87069 which came without a test case. It
causes no regressions in the GDB 7.5 test suite & fixes 25 xfails (commit
to the test suite to follow). If anyone can present a test case that
demonstrates why this check is necessary I'd be happy to account for it in one
way or another.
llvm-svn: 171609
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.
When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.
It includes tests.
Patch by Andy Zhang.
llvm-svn: 171603
The series of patches leading up to this one makes llc -O0 run 8% faster.
When deallocating a MachineFunction, there is no need to visit all
MachineInstr and MachineOperand objects to deallocate them. All their
memory come from a BumpPtrAllocator that is about to be purged, and they
have empty destructors anyway.
This only applies when deallocating the MachineFunction.
DeleteMachineInstr() should still be used to recycle MI memory during
the codegen passes.
Remove the LeakDetector support for MachineInstr. I've never seen it
used before, and now it definitely doesn't work. With this patch, leaked
MachineInstrs would be much less of a problem since all of their memory
will be reclaimed by ~MachineFunction().
llvm-svn: 171599
Instead of an std::vector<MachineOperand>, use MachineOperand arrays
from an ArrayRecycler living in MachineFunction.
This has several advantages:
- MachineInstr now has a trivial destructor, making it possible to
delete them in batches when destroying MachineFunction. This will be
enabled in a later patch.
- Bypassing malloc() and free() can be faster, depending on the system
library.
- MachineInstr objects and their operands are allocated from the same
BumpPtrAllocator, so they will usually be next to each other in
memory, providing better locality of reference.
- Reduce MachineInstr footprint. A std::vector is 24 bytes, the new
operand array representation only uses 8+4+1 bytes in MachineInstr.
- Better control over operand array reallocations. In the old
representation, the use-def chains would be reordered whenever a
std::vector reached its capacity. The new implementation never changes
the use-def chain order.
Note that some decisions in the code generator depend on the use-def
chain orders, so this patch may cause different assembly to be produced
in a few cases.
llvm-svn: 171598
This function works like memmove() for MachineOperands, except it also
updates any use-def chains containing the moved operands.
The use-def chains are updated without affecting the order of operands
in the list. That isn't possible when using the
removeRegOperandFromUseList() and addRegOperandToUseList() functions.
Callers to follow soon.
llvm-svn: 171597
legality of an address mode to not use a struct of four values and
instead to accept them as parameters. I'd love to have named parameters
here as most callers only care about one or two of these, but the
defaults aren't terribly scary to write out.
That said, there is no real impact of this as the passes aren't yet
using STTI for this and are still relying upon TargetLowering.
llvm-svn: 171595
next to its only user. This helper relies on TargetLowering information
that shouldn't be generally used throughout the Transfoms library, and
so it made little sense as a generic utility.
This also consolidates the file where we need to remove the remaining
uses of TargetLowering in favor of the IR-layer abstract interface in
TargetTransformInfo.
llvm-svn: 171590
The Attribute class is eventually going to represent one attribute. So we need
this class to create the set of attributes. Add some iterator methods to the
builder to access its internal bits in a nice way.
llvm-svn: 171586
leaving this undefined, and despite the sentence in the standard that
seems to require it, I'll cede the point and assume its a bug in the
wording. Other parts of POSIX regularly allow for things to be -1
instead of undefined, this should too. Makes things more consistent too.
This should have to real impact for folks though.
llvm-svn: 171574
defines _POSIX_CPUTIME but doesn't support the clock_* functions.
I don't test the value of _POSIX_CPUTIME because the spec merely says
that if it is defined, the CPU-specific timers are available, whereas it
says that _POSIX_TIMERS must be defined and defined to a value greater
than zero. However, this may not work, as the POSIX spec clearly states:
"If the symbolic constant _POSIX_CPUTIME is defined, then the symbolic
constant _POSIX_TIMERS shall also be defined by the implementation to
have the value 200112L."
If this doesn't work, I'll add more hacks for Darwin.
llvm-svn: 171565
The bit mask thing will be a thing of the past. It's not extensible enough. Get
rid of its use here. Opt instead for using a vector to hold the attributes.
Note: Some of this code will become obsolete once the rewrite is further along.
llvm-svn: 171553
wall time, user time, and system time since a process started.
For walltime, we currently use TimeValue's interface and a global
initializer to compute a close approximation of total process runtime.
For user time, this adds support for an somewhat more precise timing
mechanism -- clock_gettime with the CLOCK_PROCESS_CPUTIME_ID clock
selected.
For system time, we have to do a full getrusage call to extract the
system time from the OS. This is expensive but unavoidable.
In passing, clean up the implementation of the old APIs and fix some
latent bugs in the Windows code. This might have manifested on Windows
ARM systems or other systems with strange 64-bit integer behavior.
The old API for this both user time and system time simultaneously from
a single getrusage call. While this results in fewer system calls, it
also results in a lower precision user time and if only user time is
desired, it introduces a higher overhead. It may be worthwhile to switch
some of the pass timers to not track system time and directly track user
and wall time. The old API also tracked walltime in a confusing way --
it just set it to the current walltime rather than providing any measure
of wall time since the process started the way buth user and system time
are tracked. The new API is more consistent here.
The plan is to eventually implement these methods for a *child* process
by using the wait3(2) system call to populate an rusage struct
representing the whole subprocess execution. That way, after waiting on
a child process its stats will become accurate and cheap to query.
llvm-svn: 171551
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.
When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.
It includes tests.
Patch by Andy Zhang.
llvm-svn: 171524
* Remove dead methods.
* Use the 'operator==' method instead of 'contains', which isn't needed.
* Fix some comments.
No functionality change.
llvm-svn: 171523
reachablity.
We conservatively approximate the reachability analysis by saying it is not
reachable if there is a single path starting from "From" and the path does not
reach "To".
rdar://12801584
llvm-svn: 171512
This patch fixes the PPC eh_frame definitions for the personality and
frame unwinding for PIC objects. It makes PIC build correctly creates
relative relocations in the '.rela.eh_frame' segments and thus avoiding
a text relocation that generates a DT_TEXTREL segments in link phase.
llvm-svn: 171506
1. Add code to estimate register pressure.
2. Add code to select the unroll factor based on register pressure.
3. Add bits to TargetTransformInfo to provide the number of registers.
llvm-svn: 171469