<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Mozilla/4.72 [en] (X11; I; Linux 2.2.14 i686) [Netscape]">
<meta name="Author" content="Uwe Dannowski">
<meta name="Description" content="This file should record some kind of daily report. Here we note down the problems we stumbled over, our solutions, and what we learned about how to do, or better not do, certain things.">
<title>L4/KA - A diary</title>
</head>
<body>

<h2>
Pending problems</h2>

<h4>
on a write, ARM reports a read pagefault first if no mapping was present
(u)</h4>

<blockquote>I fear we have to do this evil code analysis thing on a <i>translation
abort</i> to figure out whether it was a read or a write access.
<br>Write accesses to read-only mapped pages are fine: these cause a <i>permission
abort</i>.</blockquote>
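<p>A minimal sketch of what such a code analysis could look like, assuming a 32-bit ARM (non-Thumb, ARMv4-style) instruction word fetched from the faulting PC. The names <tt>access_t</tt> and <tt>arm_access_type</tt> are made up for this illustration and are not taken from the kernel. Only LDR/STR and LDM/STM are decoded, using the L bit (bit 20); everything else, including SWP and halfword transfers, is reported as unknown.</p>

```c
#include <stdint.h>

/* Hypothetical result type; not taken from the kernel sources. */
typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_UNKNOWN } access_t;

/*
 * Classify a faulting 32-bit ARM instruction as a load or a store.
 * Assumption: ARMv4-style encodings. Only single data transfers
 * (LDR/STR, LDRB/STRB) and block transfers (LDM/STM) are handled.
 */
access_t arm_access_type(uint32_t insn)
{
    uint32_t is_load = (insn >> 20) & 1u;   /* L bit: 1 = load, 0 = store */

    /* Bits 27:26 = 01: single data transfer (LDR/STR). */
    if (((insn >> 26) & 3u) == 1u) {
        /* Register offset (bit 25) with bit 4 set is not a load/store. */
        if (((insn >> 25) & 1u) && ((insn >> 4) & 1u))
            return ACCESS_UNKNOWN;
        return is_load ? ACCESS_READ : ACCESS_WRITE;
    }

    /* Bits 27:25 = 100: block data transfer (LDM/STM). */
    if (((insn >> 25) & 7u) == 4u)
        return is_load ? ACCESS_READ : ACCESS_WRITE;

    /* SWP, halfword and coprocessor transfers are not decoded here. */
    return ACCESS_UNKNOWN;
}
```

<p>For example, <tt>0xE5810000</tt> (<tt>str r0, [r1]</tt>) classifies as a write and <tt>0xE5910000</tt> (<tt>ldr r0, [r1]</tt>) as a read. A real handler would also have to treat SWP as both a read and a write.</p>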

<h4>
L4/Linux enter_kdebugs "non linux task" (u)</h4>

<blockquote>The Linux kernel received an IPC with no error code, but the
sender's thread_id was not valid.
<br>Non-solution: Some exits in sys_ipc that return an error code didn't
keep the dest_id as specified.
<p>Note: Not seen anymore since 2000-04-10.</blockquote>

<h2>

<hr WIDTH="100%">2000-01-18</h2>

<h4>
IPC via sysenter/sysexit works (u)</h4>

<blockquote>We use a modified version of Jochen's proposal for the int
0x30 replacement.
<br>The kernel esp is restored from the tss.esp0 location directly after
entering the kernel. This saves the expensive (serializing) wrmsr that would
otherwise be needed to update MSR_0x175.</blockquote>

<h2>

<hr WIDTH="100%">2000-01-17</h2>

<h4>
First steps towards the sysenter/sysexit IPC mechanism</h4>

<blockquote><b>sysenter</b>: MSR_0x174 -> cs; MSR_0x174 + 8 -> ss; MSR_0x175
-> esp; MSR_0x176 -> eip
<br>The esp register is overwritten on sysenter; no return information
is saved implicitly.
<br><b>sysexit</b>: MSR_0x174 + 16 -> cs; MSR_0x174 + 16 + 8 -> ss; ecx
-> esp; edx -> eip
<br>The registers ecx and edx must contain esp and eip and thus are not
free for use.
<p>The kernel esp (MSR_0x175) must be updated on every thread switch.
<br>Jochen proposes to store the return address on the user's stack and
to keep the user's esp in ecx.</blockquote>
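<p>The selector arithmetic above can be sketched in C as follows. The struct and function names are made up for this illustration. The RPL handling (cleared on sysenter, forced to 3 on sysexit) is not stated in the notes above; it reflects our reading of the Intel documentation and should be treated as an assumption here.</p>

```c
#include <stdint.h>

/* Hypothetical helper type: a code/stack selector pair. */
struct seg_pair {
    uint16_t cs;
    uint16_t ss;
};

/* Selectors loaded by sysenter, derived from the value in MSR 0x174. */
struct seg_pair sysenter_selectors(uint16_t msr_174)
{
    struct seg_pair p;
    p.cs = (uint16_t)(msr_174 & 0xFFFCu);   /* RPL bits cleared (assumption) */
    p.ss = (uint16_t)(p.cs + 8);            /* next GDT entry */
    return p;
}

/* Selectors loaded by sysexit: two GDT entries further on, RPL = 3. */
struct seg_pair sysexit_selectors(uint16_t msr_174)
{
    struct seg_pair p;
    p.cs = (uint16_t)((msr_174 + 16) | 3u); /* user code segment */
    p.ss = (uint16_t)(p.cs + 8);            /* user stack segment */
    return p;
}
```

<p>With MSR 0x174 set to 0x08 this yields kernel selectors 0x08/0x10 and user selectors 0x1B/0x23, which shows why the four descriptors have to sit next to each other in the GDT.</p>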

<h2>

<hr WIDTH="100%">2000-01-16</h2>

<h4>
An IPC with timeout NEVER returns with error code 0x20 (timeout) (u)</h4>

<blockquote>Problem: This happened because threads were not dequeued from the
wakeup queue. We relied on the scheduler to dequeue threads from the wakeup
queue, but that happens only if the timeout actually fires. If the
IPC completed before the timeout expired, the thread remained in the wakeup
queue.
<br>Scenario: L4/Linux' bottom half thread does IPCs with timeout and with
timeout NEVER. If an IPC with a timeout succeeded before the timeout
fired, the thread remained in the wakeup queue. Later on, when the thread
did an IPC with timeout NEVER, the scheduler found the thread in the wakeup
queue with an expired timeout and activated the thread. The thread then
concluded that it woke up due to a timeout and returned with an error
code indicating a receive timeout (0x20).
<br>Solution: put calls to thread_dequeue_wakeup wherever they're required.</blockquote>
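<p>A minimal user-space sketch of the fix, assuming a singly linked wakeup queue. Only the name <tt>thread_dequeue_wakeup</tt> comes from the notes above; the TCB layout and the other function names are invented for this illustration. The point is that the IPC completion path dequeues the thread itself instead of leaving that to the scheduler.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced TCB. */
typedef struct tcb {
    uint64_t    wakeup_time;   /* absolute expiry time of the timeout */
    struct tcb *wakeup_next;   /* link in the wakeup queue */
    int         queued;        /* nonzero while on the wakeup queue */
} tcb_t;

static tcb_t *wakeup_queue = NULL;

/* Put a thread with a finite timeout on the wakeup queue. */
void thread_enqueue_wakeup(tcb_t *t, uint64_t when)
{
    if (t->queued)
        return;
    t->wakeup_time = when;
    t->wakeup_next = wakeup_queue;
    t->queued = 1;
    wakeup_queue = t;
}

/* Remove a thread from the wakeup queue; harmless if it is not queued. */
void thread_dequeue_wakeup(tcb_t *t)
{
    tcb_t **link;
    for (link = &wakeup_queue; *link != NULL; link = &(*link)->wakeup_next) {
        if (*link == t) {
            *link = t->wakeup_next;
            t->wakeup_next = NULL;
            t->queued = 0;
            return;
        }
    }
}

/* Scheduler side: remove and return one expired thread, or NULL. */
tcb_t *wakeup_pop_expired(uint64_t now)
{
    tcb_t *t;
    for (t = wakeup_queue; t != NULL; t = t->wakeup_next) {
        if (t->wakeup_time <= now) {
            thread_dequeue_wakeup(t);
            return t;
        }
    }
    return NULL;
}

/* The IPC succeeded before the timeout fired: dequeue right here. */
void ipc_completed(tcb_t *t)
{
    thread_dequeue_wakeup(t);
}
```

<p>Without the call in <tt>ipc_completed</tt>, a later <tt>wakeup_pop_expired</tt> would still find the stale entry and wake a thread that is waiting with timeout NEVER.</p>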

<h4>
L4/Linux enter_kdebugs "wrong access" after some l4_thread_ex_regs from
the signal thread (u)</h4>

<blockquote>Scenario: The signal thread changes the thread's IP to an endless
loop in the emulib. Then it installs a faked exception 20 and kicks the
thread to the exception handler ... which then returns to the thread.
<br>Problem: Sometimes the worker thread is in a pagefault IPC. L4/KA does
an unwind_ipc by kicking the thread back to user land with trashed
registers, and the worker thread doesn't know that its registers were trashed.
<br>Solution: unwind_ipc must not unwind the IPC by placing the thread
in userland. It must abort only the IPC operation (i.e., return to the
place where do_pagefault_ipc was called), which means that we have to save
more state information. :-(
<p>Would it be enough to save the last_ipc_ksp to the TCB and restore that
on an aborted IPC (if it was valid)?
<br>Late night note: Yes it is. Currently, a return-to-able stack frame
is created in do_pagefault_ipc and its location is stored in the TCB (unwind_ipc_sp).</blockquote>
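<p>The idea of a return-to-able frame can be illustrated with a single-threaded user-space analogy based on <tt>setjmp</tt>/<tt>longjmp</tt>. This is only a sketch and not how the kernel does it: the <tt>jmp_buf</tt> merely stands in for <tt>unwind_ipc_sp</tt>, the wait hooks are invented, and in the real system the abort is triggered by another thread rather than from within the wait itself.</p>

```c
#include <setjmp.h>

enum { IPC_OK = 0, IPC_ABORTED = 1 };

/* Hypothetical, heavily reduced TCB. */
typedef struct tcb {
    jmp_buf unwind_ctx;     /* stands in for unwind_ipc_sp */
    int     unwind_valid;   /* nonzero while a pagefault IPC is in flight */
} tcb_t;

/* Hypothetical hook describing what happens while the thread waits. */
typedef void (*wait_fn)(tcb_t *);

/* Abort only the IPC: resume at the point where it was started. */
void unwind_ipc(tcb_t *t)
{
    if (t->unwind_valid) {
        t->unwind_valid = 0;
        longjmp(t->unwind_ctx, 1);
    }
}

/* Record a return-to-able frame, then wait for the pager. */
int do_pagefault_ipc(tcb_t *t, wait_fn wait_for_pager)
{
    if (setjmp(t->unwind_ctx) != 0)
        return IPC_ABORTED;         /* we get here via unwind_ipc */
    t->unwind_valid = 1;
    wait_for_pager(t);
    t->unwind_valid = 0;
    return IPC_OK;
}

/* Demo stand-in: the pager replies normally. */
void wait_reply_arrives(tcb_t *t)
{
    (void)t;
}

/* Demo stand-in: the wait is interrupted, as by an ex_regs. */
void wait_gets_aborted(tcb_t *t)
{
    unwind_ipc(t);
}
```

<p>Calling <tt>do_pagefault_ipc</tt> with <tt>wait_gets_aborted</tt> returns <tt>IPC_ABORTED</tt> to the caller instead of dropping the thread somewhere else with trashed state, which is the property the entry above asks for.</p>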

<h2>

<hr WIDTH="100%">2000-04-12</h2>

<h4>
l4_thread_schedule doesn't work in RMGR</h4>

<blockquote>Somebody cut the RMGR-local C binding for l4_thread_schedule short.
<br>Having more than <b>one</b> set of syscall bindings is not really healthy!!!</blockquote>

<h2>

<hr WIDTH="100%">2000-04-11</h2>

<h4>
L4/Linux spuriously enter_kdebugs "bh error"</h4>

<blockquote>Still looking for the reason why wait(..., IPC_NEVER) returns
with a timeout.</blockquote>

<h2>

<hr WIDTH="100%">2000-04-10</h2>

<h4>
L4/Linux sometimes enter_kdebugs "wrong stack" when processes are killed</h4>

<blockquote>Scenario: An L4/Linux task is hit by a signal -> its signal
thread receives a message from the L4/Linux kernel. Then it does an lthread_ex_regs
on its worker thread (which might be in a pagefault, i.e. effectively in an
IPC which has to be aborted).
<br>In unwind_ipc we called thread_dequeue_send(tcb,tcb) with exchanged
arguments. This led to a kernel pagefault by dereferencing a NULL pointer
(maybe we should add a check for that?), which was turned into an IPC
to the corresponding pager -> the L4/Linux kernel. To the L4/Linux kernel
this IPC looked like a syscall, but the signal thread must not do Linux
syscalls. Finally, the thread_id of the signal thread didn't match the
thread_id of the worker thread (stored in L4/Linux' TSS structure) -> L4/Linux
complains about the signal thread using the wrong stack (although there
is not even a proper stack for that thread anyway :-)</blockquote>

<h2>

<hr WIDTH="100%">2000-01-xx</h2>

<h4>
???</h4>

<blockquote>...</blockquote>

<h2>

<hr WIDTH="100%">2000-01-18</h2>

<h4>
total silence after thread switch</h4>

<blockquote>Switching from one thread to another can happen in two ways:
<p>- switch inside the same address space
<br>- switch to a different address space
<p>The latter is the more interesting and complicated case.
<br>Consider the following situation:
<p>Thread A, address space 1 switches to Thread B, address space 2
<p>Assume that the tcb of thread A is not mapped in address space 2. Thus,
a switch to AS2 *BEFORE* the stack is switched to thread B will cause
a pagefault on the stack of thread A (since it is not mapped in AS 2).
But the pagefault handler usually operates on the stack. Therefore, the system
hangs in an infinite pagefault loop.
<p>Solution:
<br>To switch from one address space to another, stacks must be switched
BEFORE the address space is switched. If a pagefault is raised in the destination
AS on the source tcb, we already have a valid stack and can handle
the fault.
<br>(v)</blockquote>
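<p>The ordering argument can be checked with a small model. This is a toy sketch, not kernel code: all names are invented, and it assumes that thread B's TCB is mapped in both address spaces while thread A's TCB is mapped only in its own. The model tracks which page table is active and whose kernel stack the CPU is on, and reports whether the CPU ever ends up on an unmapped stack.</p>

```c
#include <stdbool.h>

enum { NUM_SPACES = 2, NUM_THREADS = 2 };

/* Hypothetical model of the state relevant to a thread switch. */
typedef struct {
    /* tcb_mapped[s][t]: TCB (and kernel stack) of thread t is mapped in space s */
    bool tcb_mapped[NUM_SPACES][NUM_THREADS];
    int  current_space;   /* index of the active page table */
    int  stack_owner;     /* thread whose kernel stack is in use */
} cpu_model_t;

/* The current stack is usable only if its TCB is mapped right now. */
static bool stack_usable(const cpu_model_t *c)
{
    return c->tcb_mapped[c->current_space][c->stack_owner];
}

/*
 * Perform the switch in one of the two possible orders.
 * Returns false as soon as the CPU would run on an unmapped stack,
 * which corresponds to the infinite pagefault loop described above.
 */
bool switch_thread(cpu_model_t *c, int to_thread, int to_space, bool stack_first)
{
    if (stack_first) {
        c->stack_owner = to_thread;     /* 1. switch stacks */
        if (!stack_usable(c))
            return false;
        c->current_space = to_space;    /* 2. switch address space */
    } else {
        c->current_space = to_space;    /* 1. switch address space */
        if (!stack_usable(c))
            return false;
        c->stack_owner = to_thread;     /* 2. switch stacks */
    }
    return stack_usable(c);
}
```

<p>With A's TCB mapped only in space 0 and B's TCB mapped in both spaces, the address-space-first order fails while the stack-first order succeeds.</p>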

<h4>
crazy unpredictable behavior when changing small sections in the code (x86)</h4>

<blockquote>Probably GRUB used the memory where the x86 kernel was loaded
for its own purposes after the image was loaded.
<br>Short term solution: relocate the .init section.
<br>Long term solution: sign your code and check the signature before execution.</blockquote>

<h2>

<hr WIDTH="100%">2000-01-17</h2>

<h4>
???</h4>

<blockquote>...</blockquote>

<h2>

<hr WIDTH="100%">2000-01-14</h2>

<h4>
???</h4>

<blockquote>...</blockquote>

<h2>

<hr WIDTH="100%">2000-01-13</h2>

<h4>
TCB layout (v)</h4>

<blockquote>Fixed stack size removed</blockquote>

<h2>

<hr WIDTH="100%">2000-01-12</h2>

<h4>
IPC (v)</h4>

<h4>
Scheduler fixed (v)</h4>

<h4>
Flush TLB + Cache (u)</h4>

<h4>
ARM Mode switching, used wrong SP (u)</h4>

<h4>
Code moved to another section doesn't work any more, but did fine before</h4>

<blockquote>At first, even symbols from section <tt>.kdebug</tt> were placed
in the <tt>.text</tt> section during the link process.
<br>When I create a separate <tt>.kdebug</tt> section directly behind the
<tt>.text</tt>
section, the virtual address of that section is <tt>end(.text)</tt>,
as expected.
<br>I forgot to set the load address of that section, thus it was loaded
somewhere, but nowhere near the <tt>.text</tt> section. (u)</blockquote>

<hr WIDTH="100%">
<h2>
2000-01-11</h2>

<h4>
???</h4>

<blockquote>...</blockquote>

<h2>

<hr WIDTH="100%">2000-01-10</h2>

<h4>
Cannot raise any exception at all on ARM</h4>

<blockquote>To raise an exception, the exception vector must be mapped in
the current pagetable. Otherwise, the machine simply locks up. (u)</blockquote>

<h4>
Implausible values</h4>

<blockquote>When stealing code, always rererererecheck the semantics of
type names!!!
<br><tt>word_t(source system) != word_t(dest system)</tt>
<br>(u)</blockquote>
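<p>One way to catch such a mismatch at compile time is to pin the width of the type and assert it. This is only a sketch: it assumes a 32-bit destination system and a C11 compiler (for <tt>_Static_assert</tt>), and the helper <tt>word_high_bit</tt> is invented to show code that silently depends on the width.</p>

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: the destination system uses 32-bit machine words. */
typedef uint32_t word_t;

/* Fails the build if imported code brings a differently sized word_t. */
_Static_assert(sizeof(word_t) * CHAR_BIT == 32,
               "word_t must be 32 bits wide on this target");

/* Example of code whose result depends on the width of word_t. */
word_t word_high_bit(void)
{
    return (word_t)1 << (sizeof(word_t) * CHAR_BIT - 1);
}
```

<p>If the stolen code assumed a 64-bit <tt>word_t</tt>, the assertion turns the implausible run-time values into a build error.</p>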

<h2>

<hr WIDTH="100%">2000-01-09</h2>

<h4>
Make compatibility</h4>

<blockquote>It appears that make 3.78.1 works better than 3.77... The
older version seemed unable to include the Makeconf file, probably because
of a parsing error. Things broke when I started using $(error ... ), I
guess. (u)</blockquote>

<h4>
When cpp goes crazy</h4>

<blockquote>If something like
<br><b><tt>#define blah(x) \</tt></b>
<br><b><tt>{ \</tt></b>
<br><b><tt> foo; \</tt></b>
<br><b><tt>} \</tt></b>
<br>is preprocessed as
<br><b><tt>{</tt></b>
<br><b><tt> foo;</tt></b>
<br><b><tt>}</tt></b>
<br>you might suffer from DOSified files. Emacs not showing <b><tt>^M</tt></b>
doesn't tell you anything.
<br>Using cvs under NT with a working directory on a samba share and the
repository accessed via rsh does unix2dos and dos2unix translations when
checking in/out. Files in the working directory contain <b><tt>^M</tt></b>.
(u)</blockquote>
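<p>Since the editor may hide the carriage returns, a small check over the raw file contents can help. The following sketch counts backslashes that are followed by CR LF instead of a plain LF, which is the pattern that breaks the line continuation above. The function name is invented for this illustration; reading the file into a buffer is left to the caller.</p>

```c
#include <stddef.h>

/*
 * Count backslash line continuations that are followed by "\r\n"
 * rather than "\n". buf need not be NUL-terminated; len is its size.
 */
size_t count_broken_continuations(const char *buf, size_t len)
{
    size_t i;
    size_t count = 0;

    /* i + 2 < len keeps buf[i + 2] inside the buffer. */
    for (i = 0; i + 2 < len; i++) {
        if (buf[i] == '\\' && buf[i + 1] == '\r' && buf[i + 2] == '\n')
            count++;
    }
    return count;
}
```

<p>A nonzero result for a header or source file is a strong hint that it was DOSified somewhere between checkout and compile.</p>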

<h4>
x86-Pagetables</h4>

<blockquote>Usermode pagedir entries must have the usermode bit set. (v)</blockquote>

<h4>
VMWare - accessing nonexistent phys mem -> pgfault</h4>

<blockquote>VMWare does not cleanly emulate pagetable entries that point to non-existing
memory. Instead of delivering "undefined" data, VMWare raises a pagefault.
(v)</blockquote>

<h4>
VMWare uses parts of the virtual mem</h4>

<blockquote>VMWare does not allow full usage of memory beyond 0xFFFF0000.
VMWare stops with errors like "muli not emulated". It seems that memory
used above that point cannot hold stuff like gdt and idt. (v)</blockquote>

<h4>
TODO</h4>

<ul>
<li>
arch/config.h contains KERNEL_PHYS. This should go into the linker scripts.
(u)</li>

<li>
keep in mind: Could we make use of architecturally private TCB members?
(u)</li>
</ul>

<hr WIDTH="100%">
<br>
</body>
</html>