For better explanation, I break down the page fault handling into steps:
1) There is a page fault caused by DMA operation initiated by SPU and
DMA is suspended.
2) The interrupt handler 'spu_irq_class_1()/__spu_trap_data_map()' is
called and it just wakes up the sleeping spe-manager thread.
3) by PPE scheduler, the corresponding bottom half,
spu_irq_class_1_bottom() is called in process context and DMA is
restarted.
There can be a quite large time gap between 2) and 3) and I found
the following problem:
Between 2) and 3) If the context becomes unbound, 3) is not executed
because when the spe-manager thread is awaken, the context is already
saved. (This situation can happen, for example, when a high priority spe
thread newly started in that time gap)
But the actual problem is that the corresponding SPU context does not
work even if it is bound again to a SPU.
Besides I can see the following warning in mambo simulator when the
context becomes
unbound(in save_mfc_cmd()), i.e. when unbind() is called for the
context after step 2) before 3) :
'WARNING: 61392752237: SPE2: MFC_CMD_QUEUE channel count of 15 is
inconsistent with number of available DMA queue entries of 16'
After I go through available documents, I found that the problem is
because the suspended DMA is not restarted when it is bound again.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>