(defined by the x32 ABI) mode, in which case its pointers are 32-bits
in size. This knowledge is also added to X86RegisterInfo that now
returns the appropriate registers in getPointerRegClass.
There are many outcomes to this change. In order to keep the patches
separate and manageable, we start by focusing on some simple testable
cases. The patch adds a test with passing a pointer to a function -
focusing on the difference between the two data models for x86-64.
Another test is added for handling of 'sret' arguments (and
functionality is added in X86ISelLowering to make it work).
A note on naming: the "x32 ABI" document refers to the AMD64
architecture (in LLVM it's distinguished by being is64Bits() in the
x86 subtarget) with two variations: the LP64 (default) data model, and
the ILP32 data model. This patch adds predicates to the subtarget
which are consistent with this naming scheme.
llvm-svn: 173503
The requirements of the strong heuristic are:
* A Protector is required for functions which contain an array, regardless of
type or length.
* A Protector is required for functions which contain a structure/union which
contains an array, regardless of type or length. Note, there is no limit to
the depth of nesting.
* A protector is required when the address of a local variable (i.e., stack
based variable) is exposed. (E.g., such as through a local whose address is
taken as part of the RHS of an assignment or a local whose address is taken as
part of a function argument.)
llvm-svn: 173231
SSPStrong applies a heuristic to insert stack protectors in these situations:
* A Protector is required for functions which contain an array, regardless of
type or length.
* A Protector is required for functions which contain a structure/union which
contains an array, regardless of type or length. Note, there is no limit to
the depth of nesting.
* A protector is required when the address of a local variable (i.e., stack
based variable) is exposed. (E.g., such as through a local whose address is
taken as part of the RHS of an assignment or a local whose address is taken as
part of a function argument.)
This patch implements the SSPString attribute to be equivalent to
SSPRequired. This will change in a subsequent patch.
llvm-svn: 173230
- Add list of physical registers clobbered in pseudo atomic insts
Physical registers are clobbered when pseudo atomic instructions are
expanded. Add them in clobber list to prevent DAG scheduler to
mis-schedule them after these insns are declared side-effect free.
- Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com>
llvm-svn: 173200
The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends.
This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical.
Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume
that there is only one SEXT node. The AVX mask optimizations is one example. Additionally this optimization does not update the cost model.
llvm-svn: 172968
It cahced XOR's operands before calling visitXOR() but failed to update the
operands when visitXOR changed the XOR node.
rdar://12968664
llvm-svn: 171999
PR 14848. The lowered sequence is based on the existing sequence the target-independent
DAG Combiner creates for the scalar case.
Patch by Zvi Rackover.
llvm-svn: 171953
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.
When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.
It includes tests.
This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces
Patch by Andy Zhang.
llvm-svn: 171879
cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix.
cvtt*2si/cvt*2si should parse with an 'l' or 'q' suffix or not suffix at all. No suffix should use the destination register size to choose encoding. Printing should not print a suffix.
Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein.
llvm-svn: 171668
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.
When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.
It includes tests.
Patch by Andy Zhang.
llvm-svn: 171603
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.
When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.
It includes tests.
Patch by Andy Zhang.
llvm-svn: 171524
Most IMPLICIT_DEF instructions are removed by the ProcessImplicitDefs
pass, and a few are reinserted by PHIElimination when a PHI argument is
<undef>.
RegisterCoalescer was assuming that all IMPLICIT_DEF live ranges look
like those created by PHIElimination, and that their live range never
leaves the basic block.
The PR14732 test case does tricks with PHI nodes that causes a longer
IMPLICIT_DEF live range to appear. This happens very rarely, but
RegisterCoalescer should be able to handle it.
llvm-svn: 171435
DAGCombiner::reduceBuildVecConvertToConvertBuildVec() was making two
mistakes:
1. It was checking the legality of scalar INT_TO_FP nodes and then generating
vector nodes.
2. It was passing the result value type to
TargetLoweringInfo::getOperationAction() when it should have been
passing the value type of the first operand.
llvm-svn: 171420
register. In most cases we actually compare or select YMM-sized registers
and mixing the two types creates horrible code. This commit optimizes
some of the transition sequences.
PR14657.
llvm-svn: 171148
When these instructions are encoded in VEX (on AVX) there is no such requirement. This changes the folding
tables and removes the alignment restrictions from VEX-encoded instructions.
llvm-svn: 171024
pmuludq is slow, but it turns out that all the unpacking and packing of the
scalarized mul is even slower. 10% speedup on loop-vectorized paq8p.
llvm-svn: 170985
The only way to read the eflags is using push and pop. If we don't
adjust the stack then we run over the first frame index. This is
not something that we want to do, so we have to make sure that
our machine function does not copy the flags. If it does then
we have to emit the prolog that adjusts the stack.
rdar://12896831
llvm-svn: 170961