Add basic assembly/disassembly support for the first Intel SHA
instruction 'sha1rnds4'. Also includes feature flag, and test cases.
Support for the remaining instructions will follow in a separate patch.
llvm-svn: 190611
latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.
llvm-svn: 180573
variant/dialect. Addresses a FIXME in the emitMnemonicAliases function.
Use and test case to come shortly.
rdar://13688439 and part of PR13340.
llvm-svn: 179804
indirect through a memory address is to load the memory address into
a register and then call indirect through the register.
This patch implements this improvement by modifying SelectionDAG to
force a function address which is a memory reference to be loaded
into a virtual register.
Patch by Sriram Murali.
llvm-svn: 178171
All Intel CPUs since Yonah look a lot alike, at least at the granularity
of the scheduling models. We can add more accurate models for
processors that aren't Sandy Bridge if required. Haswell will probably
need its own.
The Atom processor and anything based on NetBurst is completely
different. So are the non-Intel chips.
llvm-svn: 178080
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.
When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.
It includes tests.
This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces
Patch by Andy Zhang.
llvm-svn: 171879
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.
When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.
It includes tests.
Patch by Andy Zhang.
llvm-svn: 171603
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.
When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.
It includes tests.
Patch by Andy Zhang.
llvm-svn: 171524
Not all chips targeted by x86_64 have this feature, but a dramatically
increasing number do. Specifying a chip-specific tuning parameter will
continue to turn the feature on or off as appropriate for that
particular chip, but the generic flag should try to achieve the best
performance on the most widely available hardware. Today, the number of
chips with fast UA access dwarfs those without in the x86-64 space.
Note that this also brings LLVM's code generation for this '-march' flag
more in line with that of modern GCCs. Reviewed by Dan Gohman.
llvm-svn: 170269
Summary:
Not all chips targeted by x86_64 have this feature, but a dramatically
increasing number do. Specifying a chip-specific tuning parameter will
continue to turn the feature on or off as appropriate for that
particular chip, but the generic flag should try to achieve the best
performance on the most widely available hardware. Today, the number of
chips with fast UA access dwarfs those without in the x86-64 space.
Note that this also brings LLVM's code generation for this '-march' flag
more in line with that of modern GCCs.
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D195
llvm-svn: 169740
Intel chips.
The model number rules were determined by inspecting Intel's
documentation for their newer chip model numbers. My understanding is
that all of the newer Intel chips have fast unaligned memory access, but
if anyone is concerned about a particular chip, just shout.
No tests updated; it's not clear we have dedicated tests for the chips'
various features, but if anyone would like tests (or can point me at
some existing ones), I'm happy to oblige.
llvm-svn: 169730
- Add RTM code generation support throught 3 X86 intrinsics:
xbegin()/xend() to start/end a transaction region, and xabort() to abort a
tranaction region
llvm-svn: 167573
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder, or both.
Patch by Tyler Nowicki!
llvm-svn: 163150
subtarget CPU descriptions and support new features of
MachineScheduler.
MachineModel has three categories of data:
1) Basic properties for coarse grained instruction cost model.
2) Scheduler Read/Write resources for simple per-opcode and operand cost model (TBD).
3) Instruction itineraties for detailed per-cycle reservation tables.
These will all live side-by-side. Any subtarget can use any
combination of them. Instruction itineraries will not change in the
near term. In the long run, I expect them to only be relevant for
in-order VLIW machines that have complex contraints and require a
precise scheduling/bundling model. Once itineraries are only actively
used by VLIW-ish targets, they could be replaced by something more
appropriate for those targets.
This tablegen backend rewrite sets things up for introducing
MachineModel type #2: per opcode/operand cost model.
llvm-svn: 159891
Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT.
Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches.
Adds a test to verify that the scheduler is working.
Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP.
Patch by Preston Gurd!
llvm-svn: 149558
http://lab.llvm.org:8011/builders/llvm-x86_64-linux/builds/101
--- Reverse-merging r141854 into '.':
U test/MC/Disassembler/X86/x86-32.txt
U test/MC/Disassembler/X86/simple-tests.txt
D test/CodeGen/X86/bmi.ll
U lib/Target/X86/X86InstrInfo.td
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86.td
U lib/Target/X86/X86Subtarget.h
llvm-svn: 141857
instructions are more aligned than the CPU requires, and adds some additional
directives, to follow in future patches. Patch by David Meyer!
llvm-svn: 139125
use MC instructions in the printInstruction() method via the tablegen flag
for it rather than a #define prior to including the autogenerated bits.
llvm-svn: 115238
- The idea is that when a match fails, we just try to match each of +'b', +'w',
+'l'. If exactly one matches, we assume this is a mnemonic prefix and accept
it. If all match, we assume it is width generic, and take the 'l' form.
- This would be a horrible hack, if it weren't so simple. Therefore it is an
elegant solution! Chris gets the credit for this particular elegant
solution. :)
- Next step to making this more robust is to have the X86 matcher generate the
mnemonic prefix information. Ideally we would also compute up-front exactly
which mnemonic to attempt to match, but this may require more custom code in
the matcher than is really worth it.
llvm-svn: 103012
When a target instruction wants to set target-specific flags, it should simply
set bits in the TSFlags bit vector defined in the Instruction TableGen class.
This works well because TableGen resolves member references late:
class I : Instruction {
AddrMode AM = AddrModeNone;
let TSFlags{3-0} = AM.Value;
}
let AM = AddrMode4 in
def ADD : I;
TSFlags gets the expected bits from AddrMode4 in this example.
llvm-svn: 100384
a new subtarget option for AES and check for the support. Add "westmere"
line of processors and add AES-NI support to the core i7.
Add a couple of TODOs for information I couldn't verify.
llvm-svn: 100231
On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a register
in a different domain than where it was defined. Some instructions have
equvivalents for different domains, like por/orps/orpd.
The SSEDomainFix pass tries to minimize the number of domain crossings by
changing between equvivalent opcodes where possible.
This is a work in progress, in particular the pass doesn't do anything yet. SSE
instructions are tagged with their execution domain in TableGen using the last
two bits of TSFlags. Note that not all instructions are tagged correctly. Life
just isn't that simple.
The SSE execution domain issue is very similar to the ARM NEON/VFP pipeline
issue handled by NEONMoveFixPass. This pass may become target independent to
handle both.
llvm-svn: 99524
temporary workaround for matching inc/dec on x86_64 to the correct instruction.
- This hack will eventually be replaced with a robust mechanism for handling
matching instructions based on the available target features.
llvm-svn: 98858
ignore alignment requirements for SIMD memory operands. This
is useful on architectures like the AMD 10h that do not trap on
unaligned references if a status bit is twiddled at startup time.
llvm-svn: 93151
be non-optimal. To be precise, we should avoid folding loads if the instructions
only update part of the destination register, and the non-updated part is not
needed. e.g. cvtss2sd, sqrtss. Unfolding the load from these instructions breaks
the partial register dependency and it can improve performance. e.g.
movss (%rdi), %xmm0
cvtss2sd %xmm0, %xmm0
instead of
cvtss2sd (%rdi), %xmm0
An alternative method to break dependency is to clear the register first. e.g.
xorps %xmm0, %xmm0
cvtss2sd (%rdi), %xmm0
llvm-svn: 91672