In some cases, a significant delay may be observed between the moment
a request for a CPU to come up is made and the moment it is ready to
start executing kernel code. This is especially true when a whole
cluster has to be powered up which may take in the order of miliseconds.
It is therefore a good idea to let the outbound CPU continue to execute
code in the mean time, and be notified when the inbound is ready before
performing the actual switch.
This is achieved by registering a completion block with the appropriate
IPI callback, and programming the sending of an IPI by the early assembly
code prior to entering the main kernel code. Once the IPI is delivered
to the outbound CPU, the completion block is "completed" and the switcher
thread is resumed.
Signed-off-by: Nicolas Pitre <nico@linaro.org>