Call instructions no longer have a list of 43 call-clobbered registers.
Instead, they get a single register mask operand with a bit vector of
call-preserved registers.
This saves a lot of memory, 42 x 32 bytes = 1344 bytes per call
instruction, and it speeds up building call instructions because those
43 imp-def operands no longer need to be added to use-def lists. (And
removed and shifted and re-added for every explicit call operand).
Passes like LiveVariables, LiveIntervals, RAGreedy, PEI, and
BranchFolding are significantly faster because they can deal with call
clobbers in bulk.
Overall, clang -O2 is between 0% and 8% faster, uniformly distributed
depending on call density in the compiled code. Debug builds using
clang -O0 are 0% - 3% faster.
I have verified that this patch doesn't change the assembly generated
for the LLVM nightly test suite when building with -disable-copyprop
and -disable-branch-fold.
Branch folding behaves slightly differently in a few cases because call
instructions have different hash values now.
Copy propagation flushes its data structures when it crosses a register
mask operand. This causes it to leave a few dead copies behind, on the
order of 20 instruction across the entire nightly test suite, including
SPEC. Fixing this properly would require the pass to use different data
structures.
llvm-svn: 150638
The c'tor list is stored as a list of 'void ()*'s, so all of the functions are
bitcast to that. However, the dyn_cast doesn't automagically look through
bitcasts. Do that for it.
<rdar://problem/10813350>
llvm-svn: 150572
- Use unsigned literals when the desired result is unsigned. This mostly allows unsigned/signed mismatch warnings to be less noisy even if they aren't on by default.
- Remove misplaced llvm_unreachable.
- Add static to a declaration of a function on MSVC x86 only.
- Change some instances of calling a static function through a variable to simply calling that function while removing the unused variable.
llvm-svn: 150364
If the DEC node had more than one user, it was doing this lowering but
leaving the original DEC node around and so decrementing twice.
Fixes PR11964.
llvm-svn: 150356
This requires some gymnastics to make it available for C code. Remove the names
from the disassembler tables, making them relocation free.
llvm-svn: 150303
Now that the clang driver passes the CPU and feature information to
the backend when processing assembly files (150273), this isn't necessary.
llvm-svn: 150274
Creates a configurable regalloc pipeline.
Ensure specific llc options do what they say and nothing more: -reglloc=... has no effect other than selecting the allocator pass itself. This patch introduces a new umbrella flag, "-optimize-regalloc", to enable/disable the optimizing regalloc "superpass". This allows for example testing coalscing and scheduling under -O0 or vice-versa.
When a CodeGen pass requires the MachineFunction to have a particular property, we need to explicitly define that property so it can be directly queried rather than naming a specific Pass. For example, to check for SSA, use MRI->isSSA, not addRequired<PHIElimination>.
CodeGen transformation passes are never "required" as an analysis
ProcessImplicitDefs does not require LiveVariables.
We have a plan to massively simplify some of the early passes within the regalloc superpass.
llvm-svn: 150226
Moving toward a uniform style of pass definition to allow easier target configuration.
Globally declare Pass ID.
Globally declare pass initializer.
Use INITIALIZE_PASS consistently.
Add a call to the initializer from CodeGen.cpp.
Remove redundant "createPass" functions and "getPassName" methods.
While cleaning up declarations, cleaned up comments (sorry for large diff).
llvm-svn: 150100
Creating the isPredicated TSFlag enables the code
to use the property defined in the instruction format
instead of using a large switch statement.
llvm-svn: 150078
This CL delays reading of function bodies from initial parse until
materialization, allowing overlap of compilation with bitcode download.
llvm-svn: 149918
convert at least one client over to use them. Subsequent patches both to
LLVM and Clang will try to convert more people over to a common set of
predicates.
This round of predicates is focused on OS-categorization predicates.
llvm-svn: 149815
but with a critical fix to the SelectionDAG code that optimizes copies
from strings into immediate stores: the previous code was stopping reading
string data at the first nul. Address this by adding a new argument to
llvm::getConstantStringInfo, preserving the behavior before the patch.
llvm-svn: 149800
Passes prior to instructon selection are now split into separate configurable stages.
Header dependencies are simplified.
The bulk of this diff is simply removal of the silly DisableVerify flags.
Sorry for the target header churn. Attempting to stabilize them.
llvm-svn: 149754
Allows command line overrides to be centralized in LLVMTargetMachine.cpp.
LLVMTargetMachine can intercept common passes and give precedence to command line overrides.
Allows adding "internal" target configuration options without touching TargetOptions.
Encapsulates the PassManager.
Provides a good point to initialize all CodeGen passes so that Pass ID's can be used in APIs.
Allows modifying the target configuration hooks without rebuilding the world.
llvm-svn: 149672
needed to emit a 64-bit gp-relative relocation entry. Make changes necessary
for emitting jump tables which have entries with directive .gpdword. This patch
does not implement the parts needed for direct object emission or JIT.
llvm-svn: 149668
NEON loads and stores accept single and double spaced pairs, triples,
and quads of D registers. This patch adds new register classes to
accurately model those constraints:
Dn, Dn+1 Dn, Dn+2
----------------------
DPair DPairSpc
DTriple DTripleSpc
DQuad DQuadSpc
Also extend the existing QQ and QQQQ register classes to contains all Q
pairs and quads instead of just the aligned ones.
These new register classes will make it possible to accurately model
constraints on NEON loads and stores, and we can get rid of all the NEON
pseudo-instructions. The late scheduler will be able to accurately
model instruction dependencies from the explicit operands.
This more than doubles the number of ARM registers, but the backend
passes are quite good at handling this. The llc -O0 compile time only
regresses by 1.5%. Future work on register mask operands will recover
this regression.
llvm-svn: 149640