Conversation

@greatbridf

Check #54 for more details.

SMS-Derfflinger and others added 30 commits July 23, 2025 23:40
Fix a page cache bug; add a size check in the read function.

Add basic page cache operations for ext4. Cached pages will not be
dropped until the kernel stops, so we need to call the fsync function
manually; consider using some strategy such as LRU.
Temporary timer-based write-back: when the write function is called,
check whether more than 10 seconds have passed since the last
write-back. If so, write back.
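
A minimal sketch of the timer-gated write-back described above. The names
(`PageCache`, `sync_dirty_pages`, the tick source) are assumptions, not
the actual code:

```rust
use core::sync::atomic::{AtomicU64, Ordering};

const WRITEBACK_INTERVAL_SECS: u64 = 10;

struct PageCache {
    last_writeback: AtomicU64, // seconds since boot, name assumed
}

impl PageCache {
    /// Called from the write path: write back at most once per interval.
    fn maybe_writeback(&self, now_secs: u64) {
        let last = self.last_writeback.load(Ordering::Relaxed);
        if now_secs.saturating_sub(last) >= WRITEBACK_INTERVAL_SECS
            && self
                .last_writeback
                .compare_exchange(last, now_secs, Ordering::Relaxed, Ordering::Relaxed)
                .is_ok()
        {
            self.sync_dirty_pages();
        }
    }

    fn sync_dirty_pages(&self) { /* flush dirty pages to the device */ }
}
```
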
Remove old Scheduler. Add Runtime as replacement.

Use stackless coroutines as the low-level tasking mechanism and build
the stackful tasks on top of them.

Redesign the task state system. Rework the executor.

Remove the Run trait and everything related to it.

Signed-off-by: greatbridf <greatbridf@icloud.com>
We use RUNNING to indicate that the task is on the CPU, and READY to
indicate that the task can be run again and should therefore be put into
the ready queue after one poll() call.

When the task is taken from the ready queue and put onto the CPU, it's
marked as RUNNING only, so it gets suspended once the poll() call returns
Poll::Pending. If we (or others) call Waker::wake() during the run, the
READY flag is set; when we return from the poll call, we can detect this
with a CAS and put the task back onto the ready queue.
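
A sketch of the state transitions described above, with assumed types
(`Task`, `ReadyQueue`, `poll_task`) and a two-bit state word:

```rust
use alloc::sync::Arc;
use core::sync::atomic::{AtomicU32, Ordering};

const RUNNING: u32 = 1 << 0;
const READY: u32 = 1 << 1;

struct Task {
    state: AtomicU32,
}

struct ReadyQueue;
impl ReadyQueue {
    fn push(&self, _task: Arc<Task>) { /* enqueue */ }
}

/// Executor side: one poll() call for a task taken off the ready queue.
fn run_once(task: Arc<Task>, queue: &ReadyQueue) {
    // On the CPU: RUNNING only. A wake during the poll just sets READY.
    task.state.store(RUNNING, Ordering::Release);
    if poll_task(&task).is_ready() {
        return;
    }
    // Pending: try to suspend. The CAS fails iff READY was set mid-poll.
    if task
        .state
        .compare_exchange(RUNNING, 0, Ordering::AcqRel, Ordering::Acquire)
        .is_err()
    {
        task.state.store(READY, Ordering::Release);
        queue.push(task);
    }
}

/// Waker side: only the transition from fully-suspended enqueues the task.
fn wake(task: &Arc<Task>, queue: &ReadyQueue) {
    if task.state.fetch_or(READY, Ordering::AcqRel) == 0 {
        queue.push(task.clone());
    }
}

fn poll_task(_task: &Task) -> core::task::Poll<()> {
    core::task::Poll::Pending // stand-in for the real poll() call
}
```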

We've also done some adaptation work on the rest of the kernel, mainly to
remove *SOME* of the Task::block_on calls. Removing them completely is
not possible for now; we should solve that in the next few commits.

Signed-off-by: greatbridf <greatbridf@icloud.com>
Add tracing logs in Runtime::enter and other critical points.

Pass the trace_scheduler feature down to the eonix_runtime crate, fixing
the problem that the feature had no effect.

When the task is blocked, we set CURRENT_TASK to None as well.

In the early initialization stage, the stack is placed at identity-mapped
physical addresses. The VirtIO driver might try to convert the given
buffer addresses back to physical ones, which generates errors. So the
BSP and APs should allocate another stack and switch to it. We use
TaskContext for the fix.

Signed-off-by: greatbridf <greatbridf@icloud.com>
This is used only by Thread when we enter the user execution context,
where we need to save the "interrupt stack" to the local CPU so we can
get the information needed to capture the trap.

We need to support nested captured trap returns. So instead of setting
that up manually, we save the needed information when trap_return() is
called (since we have precisely the trap context needed at that point)
and restore it after the trap is captured.

Signed-off-by: greatbridf <greatbridf@icloud.com>
On riscv64 platforms, we load the kernel tp only if we've come from U
mode, to reduce overhead. But we would restore the tp saved in the
TrapContext even when returning to kernel space, which causes problems
because the default tp is zero.

We should save the kernel tp register to the corresponding field in the
TrapContext struct when we set the privilege mode to kernel.
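
A riscv64-only sketch of the fix; the struct and field names are assumed:

```rust
/// Saved register state for a trap (only the relevant field shown).
pub struct TrapContext {
    kernel_tp: usize, // field name assumed
    // ... other saved registers ...
}

impl TrapContext {
    /// When marking this context as kernel-mode, record the live kernel tp
    /// so that restoring tp on the way back into kernel space is benign
    /// rather than a write of the default zero.
    pub fn set_privilege_kernel(&mut self) {
        let tp: usize;
        unsafe { core::arch::asm!("mv {}, tp", out(reg) tp) };
        self.kernel_tp = tp;
    }
}
```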

Signed-off-by: greatbridf <greatbridf@icloud.com>
We provide a simple block_on that constantly polls the given future and
blocks the current execution thread as before.

We also introduce a new future wrapper named `stackful` to convert any
future into a stackful one. We allocate a stack and keep polling the
future on that stack by constructing a TrapContext and calling
trap_return() to get into the stackful environment. Then we capture the
timer interrupt to make preemption work.
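
A minimal busy-polling sketch of such a block_on; the real one presumably
suspends the hart instead of spinning:

```rust
use core::future::Future;
use core::pin::pin;
use core::task::{Context, Poll, Waker};

/// Constantly poll `fut` to completion on the current execution thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let mut cx = Context::from_waker(Waker::noop());
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
        core::hint::spin_loop();
    }
}
```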

Signed-off-by: greatbridf <greatbridf@icloud.com>
If we don't pass in FEATURES or SMP, no features are enabled. In this
scenario, the dangling --features argument will cause cargo to panic.

We pass the feature list and the --features flag together to avoid
this...

Signed-off-by: greatbridf <greatbridf@icloud.com>
We can pass a function to be called after a successful rcu_sync call.
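
A hypothetical sketch of the shape such an API could take (the actual
signature may differ):

```rust
use alloc::boxed::Box;

/// Run `callback` once a grace period has elapsed, i.e. after every reader
/// that could still observe the old data has finished (signature assumed).
pub fn call_rcu<F>(callback: F)
where
    F: FnOnce() + Send + 'static,
{
    // Hand the boxed callback to the RCU machinery; it will be invoked
    // after the next rcu_sync completes.
    enqueue_rcu_callback(Box::new(callback));
}

fn enqueue_rcu_callback(_cb: Box<dyn FnOnce() + Send>) { /* runtime detail */ }
```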

Signed-off-by: greatbridf <greatbridf@icloud.com>
Simple renamings... Further work is needed to make the system work.

Signed-off-by: greatbridf <greatbridf@icloud.com>
The previous implementation has bugs that cause nested kernel-space traps
to lose required information:

- In kernel mode, trap contexts are always saved above the current stack
  frame, which is not what we want. We expect to read the trap data in
  the CAPTURED context.
- The capturer's task context is not saved either, which completely
  messes up nested traps.
- We read page fault virtual addresses in TrapContext::trap_type, which
  won't work: if the inner trap is captured and an outer trap interleaves
  with the trap_type() call, we lose the inner trap's stval data.

The solution is to separate our "normal" trap handling procedure from the
captured trap handling procedure. We swap the stvec CSR when we set up
captured traps and restore it afterwards, so the two entry points don't
have to tell the cases apart. Then we can store the TrapContext pointer
in sscratch without distinguishing between trap handling types. This
way, the procedure stays simple.

The stval register is saved together with the other registers, to be used
in page fault handling.
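
A minimal sketch of the stvec swap described above (riscv64 inline
assembly; the entry symbols and the setup code around it are assumed):

```rust
/// Install a dedicated trap vector for the captured-trap window and return
/// the previous one, to be restored when the capture is torn down.
unsafe fn swap_stvec(new_vector: usize) -> usize {
    let old: usize;
    core::arch::asm!(
        "csrrw {old}, stvec, {new}",
        old = out(reg) old,
        new = in(reg) new_vector,
    );
    old
}

// Usage sketch (symbol name assumed):
// let old = unsafe { swap_stvec(captured_trap_entry as usize) };
// ... code that may take a captured trap ...
// unsafe { swap_stvec(old) };
```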

Signed-off-by: greatbridf <greatbridf@icloud.com>
We've got everything in place to make the system run.

Add Thread::contexted to load the context needed for the thread to run.
Wrap Thread::real_run() with contexted(stackful(...)) in Thread::run().

We'll use this for now. Later, we will make the thread completely
asynchronous; this way, we won't have to change its interface then.

Signed-off-by: greatbridf <greatbridf@icloud.com>
Similar to 661a159:
- Save the previous {trap, task}_ctx and restore them afterwards.
- Set the kernel tp when setting the trap context to user mode.
- Advance the program counter by 4 bytes on breakpoints.

Signed-off-by: greatbridf <greatbridf@icloud.com>
TODO: hide changes to the program counter in the HAL crate.

Signed-off-by: greatbridf <greatbridf@icloud.com>
The current implementation uses the WokenUp object to detect whether the
stackful task has been woken up somewhere. This is WRONG, since we might
lose wakeups: the runtime has no idea what we have done, so if someone
wakes us up, the task won't be enqueued and we will never get a second
chance to run.

The fix is to use Arc<Task> to create a waker and to check whether the
task is ready each time we get back to the stackful poll loop.
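
A sketch of the fixed poll loop. `take_ready()` and `suspend_current()`
are hypothetical helpers standing in for the real runtime calls:

```rust
use core::future::Future;
use core::pin::Pin;
use core::task::{Context, Poll};

struct Task { /* shared state; the Arc<Task> waker sets READY in here */ }

impl Task {
    /// Atomically consume the READY flag set by the waker (stub).
    fn take_ready(&self) -> bool {
        false // real impl would CAS the READY bit away
    }
}

fn suspend_current() { /* yield to the runtime until woken */ }

fn poll_to_completion<F: Future>(
    task: &Task,
    mut future: Pin<&mut F>,
    cx: &mut Context<'_>,
) -> F::Output {
    loop {
        match future.as_mut().poll(cx) {
            Poll::Ready(output) => break output,
            Poll::Pending => {
                // Re-check instead of sleeping unconditionally: a wake may
                // have arrived between poll() returning and this point.
                if !task.take_ready() {
                    suspend_current();
                }
            }
        }
    }
}
```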

Signed-off-by: greatbridf <greatbridf@icloud.com>
We introduce a per-thread allocator inside the future object to allocate
space for syscall futures. This improves performance and saves memory.
The allocator takes up 8 KiB for now, which is enough for current use.
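
One plausible shape for such an allocator is a bump arena reset between
syscalls; a minimal sketch with assumed names:

```rust
use core::alloc::Layout;
use core::cell::{Cell, UnsafeCell};
use core::ptr::NonNull;

const ARENA_SIZE: usize = 8 * 1024;

/// Per-thread bump arena for syscall futures (layout and names assumed).
struct SyscallArena {
    buf: UnsafeCell<[u8; ARENA_SIZE]>,
    used: Cell<usize>,
}

impl SyscallArena {
    const fn new() -> Self {
        Self { buf: UnsafeCell::new([0; ARENA_SIZE]), used: Cell::new(0) }
    }

    /// Bump-allocate `layout` bytes, or None if the arena is exhausted.
    fn alloc(&self, layout: Layout) -> Option<NonNull<u8>> {
        let start = self.used.get().next_multiple_of(layout.align());
        let end = start.checked_add(layout.size())?;
        if end > ARENA_SIZE {
            return None;
        }
        self.used.set(end);
        NonNull::new(unsafe { (self.buf.get() as *mut u8).add(start) })
    }

    /// Reclaim everything at once, e.g. when the syscall completes.
    fn reset(&self) {
        self.used.set(0);
    }
}
```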

Signed-off-by: greatbridf <greatbridf@icloud.com>
Signed-off-by: greatbridf <greatbridf@icloud.com>
Signed-off-by: greatbridf <greatbridf@icloud.com>
Use the unwinding crate to unwind the stack and print a stack trace.

Slightly adjust the linker script and move eh_frame into the rodata
section.

Due to the limited kernel image size, there might be some problems on
x86_64 platforms. Further fixes are needed but won't be done for now.

Signed-off-by: greatbridf <greatbridf@icloud.com>
(cherry picked from commit 6bb54d9eae13b76768f011c44222b25b785b83e0)
Signed-off-by: greatbridf <greatbridf@icloud.com>
The stackful tasks might be woken up before actually being put to sleep
by returning Poll::Pending. Infinite sleep then occurs, since we end up
on neither the wait list nor the ready queue.

The solution is to remember in the stackful wakers that we've been woken
up, and to check that before going to sleep in wait_for_wakeups().

Also, implement Drop for RCUPointer by using call_rcu to drop the
underlying data. We must require T: Send + Sync + 'static in order to
send the Arc to the runtime...
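
A sketch of what that Drop impl might look like, assuming the pointer
holds an optional Arc<T> and the call_rcu helper described earlier:

```rust
use alloc::sync::Arc;
use core::sync::atomic::{AtomicPtr, Ordering};

pub struct RCUPointer<T>(AtomicPtr<T>);

impl<T: Send + Sync + 'static> Drop for RCUPointer<T> {
    fn drop(&mut self) {
        let ptr = self.0.swap(core::ptr::null_mut(), Ordering::AcqRel);
        if !ptr.is_null() {
            // SAFETY: this pointer came from Arc::into_raw and we hold the
            // last RCUPointer referring to it (assumed invariant).
            let arc = unsafe { Arc::from_raw(ptr) };
            // Defer the release until every in-flight reader is done.
            call_rcu(move || drop(arc));
        }
    }
}

fn call_rcu<F: FnOnce() + Send + 'static>(_f: F) { /* see earlier sketch */ }
```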

Signed-off-by: greatbridf <greatbridf@icloud.com>
The current implementation ignores the given argument and uses the
default arch. Fix the wrong behavior...

Signed-off-by: greatbridf <greatbridf@icloud.com>
Inode and superblock rework:

Remove the old Inode trait, as it used to take on too much
responsibility. The new design uses three new traits: InodeOps is used to
acquire generic inode attributes, while InodeFileOps and InodeDirOps
handle file and directory requests respectively. All three have async fn
trait methods and don't need to be ?Sized. Then we implement Inode,
InodeFile and InodeDir for the implementors of the three "Ops" traits,
erasing their actual types and providing a generic dyn interface by
wrapping the futures in boxes. Should we provide an IO worker, or some IO
context with an allocator for futures, to reduce the overhead of IO
requests, or come up with some better idea?
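
A condensed sketch of the erasure pattern just described; Errno and the
method set are placeholders:

```rust
use alloc::boxed::Box;
use core::future::Future;
use core::pin::Pin;

type Errno = i32; // placeholder

/// Implementor-facing trait with native async methods (signatures assumed).
trait InodeFileOps: Send + Sync {
    async fn read(&self, buf: &mut [u8], offset: usize) -> Result<usize, Errno>;
}

/// Dyn-compatible facade: the async method returns a boxed future instead.
trait Inode: Send + Sync {
    fn read<'a>(
        &'a self,
        buf: &'a mut [u8],
        offset: usize,
    ) -> Pin<Box<dyn Future<Output = Result<usize, Errno>> + 'a>>;
}

/// Blanket impl erases the concrete type behind `dyn Inode`.
impl<T: InodeFileOps> Inode for T {
    fn read<'a>(
        &'a self,
        buf: &'a mut [u8],
        offset: usize,
    ) -> Pin<Box<dyn Future<Output = Result<usize, Errno>> + 'a>> {
        Box::pin(InodeFileOps::read(self, buf, offset))
    }
}
```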

For inode usage, we introduce InodeRef and InodeUse. InodeRef is a
simple wrapper around Weak<impl Inode>, and InodeUse around Arc<impl
Inode>. This helps us use them better, since we can't define impls for
Arc<dyn Inode> as it is a foreign type. We also provide some more helper
methods for them.

After the change, we don't impose ANY structural restriction except for
the spinlock-wrapped InodeInfo. The InodeInfo struct design might need
rethinking, but the current implementation seems fine aside from
unnecessary locking when accessing some of its fields; this shouldn't be
a VERY big or urgent problem...

Similar changes are also made to the superblock traits and types. For the
superblock objects, we use a SuperBlockComplex struct to store common
fields such as whether the superblock is read-only, the device id and so
on. The structs also have a superblock rwsem inside, but we haven't
decided how to use it (e.g. whether we should acquire the lock and pass
it to the inode methods), or even whether it should exist at all. This
needs further thinking, so we put it off for now...

Filesystem rework:

Rework tmpfs, fatfs and procfs with the new design described above,
leaving the old ext4 unchanged. The current implementation of ext4 uses
some "random" library from the "camp". Its code hasn't been fully
reviewed for time reasons but seems rather "problematic"... We might
rewrite the whole module later, and the page cache subsystem requires a
full rework as well, so we put this work off too.

Block device and other parts rework:

Wrap PageCacheBackend, MountCreator and BlockRequestQueue with
async_trait to provide dyn-compatible async functions. Dentry walking
functions are also moved to the heap, since they are recursive... This
has problems similar to the inode traits; an ugly solution. Further
optimization is required.

Signed-off-by: greatbridf <greatbridf@icloud.com>
The old path walking algorithm requires recursion, which is not
supported in async Rust. So we boxed them all as a temporary solution in
previous commits. This introduces massive overhead even for fast path
walks, just because we might sleep in `readlink()` and `lookup()` calls.

The newly proposed method breaks the walk into several phases, similar to
Linux: RCU walk and REF walk. The RCU walk holds the RCU lock and never
blocks, so the function itself can be non-async. If we hit non-present
dentries, we fall back to REF walk. In REF walks, we clone the Arcs and
consult the VFS layer for an accurate answer.

Note that in both methods above, symlinks are not handled and are
returned directly with all path components left untouched. We have a
dedicated async function that follows symlinks by recursively calling
the walk function. This can be slow, but it won't be called frequently,
so we wrapped the function with `Box::pin()` to break the recursion
chain. After the symlink resolution is done, we return to the original
position and continue the walk.
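
A sketch of breaking async recursion with `Box::pin()`; the types and
helpers (`Dentry`, `walk`, the symlink limit) are stand-ins:

```rust
use alloc::boxed::Box;
use alloc::sync::Arc;
use core::future::Future;
use core::pin::Pin;

const MAX_NESTED_LINKS: usize = 8; // assumed limit

type Errno = i32; // placeholder
const ELOOP: Errno = 40;

// Stubs standing in for the real VFS types and walk function.
struct Dentry;
impl Dentry {
    async fn readlink(&self) -> Result<alloc::string::String, Errno> { todo!() }
    fn is_symlink(&self) -> bool { false }
}
async fn walk(_path: &str) -> Result<Arc<Dentry>, Errno> { todo!() }

/// Boxing the returned future turns the infinitely-sized recursive async
/// fn into a heap-allocated, finitely-sized one.
fn follow_symlink(
    dentry: Arc<Dentry>,
    depth: usize,
) -> Pin<Box<dyn Future<Output = Result<Arc<Dentry>, Errno>>>> {
    Box::pin(async move {
        if depth >= MAX_NESTED_LINKS {
            return Err(ELOOP);
        }
        let target = dentry.readlink().await?;
        let next = walk(&target).await?;
        if next.is_symlink() {
            follow_symlink(next, depth + 1).await
        } else {
            Ok(next)
        }
    })
}
```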

We found that the association between an inode and a dentry is one-way,
so the `data` RCUPointer field is actually unnecessary and we can use the
atomic dentry type to sync readers with the writer. This way we can
eliminate `DentryData` allocations and improve performance.

We also introduce a new RCU read lock syntax. In the RCU walk mentioned
above, we need to store dentry references protected by some RCU read
lock. With the old syntax, we couldn't express the lifetime associated
with the common RCU read lock. The new syntax provides an
`rcu_read_lock()` method to acquire the RCU read lock. The lock returned
has an associated lifetime, so we can use it throughout the RCU session.
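
A sketch of how such a guard can tie loaded references to the read-side
critical section (names and internals assumed):

```rust
use core::marker::PhantomData;
use core::sync::atomic::{AtomicPtr, Ordering};

/// Read-side critical section; dropping it ends the session.
pub struct RcuReadGuard(PhantomData<*mut ()>); // !Send: stays on one CPU

pub fn rcu_read_lock() -> RcuReadGuard {
    // e.g. disable preemption here (implementation assumed)
    RcuReadGuard(PhantomData)
}

impl Drop for RcuReadGuard {
    fn drop(&mut self) { /* re-enable preemption */ }
}

pub struct RcuPointer<T>(AtomicPtr<T>);

impl<T> RcuPointer<T> {
    /// The returned reference borrows the guard, so it cannot outlive the
    /// RCU session it was loaded under.
    pub fn load<'rcu>(&self, _lock: &'rcu RcuReadGuard) -> Option<&'rcu T> {
        let ptr = self.0.load(Ordering::Acquire);
        // SAFETY: writers defer frees past every active read session.
        unsafe { ptr.as_ref() }
    }
}
```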

Signed-off-by: greatbridf <greatbridf@icloud.com>
Signed-off-by: greatbridf <greatbridf@icloud.com>
If the format is invalid, we should print None instead of panicking.

Signed-off-by: greatbridf <greatbridf@icloud.com>
Introduce the new page locking mechanism to ensure exclusive access to
pages. The underlying locks are not implemented yet, because we will
change the paging structs in the next few patches.

Introduce a new `PageExcl` struct representing a page that conforms to
Rust's ownership rules. A page owned exclusively can be accessed without
taking page locks.
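
A minimal sketch of the idea: exclusive ownership lets `&mut self` stand
in for the page lock (representation assumed):

```rust
use core::ptr::NonNull;

const PAGE_SIZE: usize = 4096;

/// Unique ownership of one page frame (representation assumed).
pub struct PageExcl {
    frame: NonNull<[u8; PAGE_SIZE]>,
}

// SAFETY: the owner is the only one touching the frame.
unsafe impl Send for PageExcl {}

impl PageExcl {
    /// No page lock needed: `&mut self` proves exclusiveness statically.
    pub fn as_bytes_mut(&mut self) -> &mut [u8; PAGE_SIZE] {
        // SAFETY: we own the frame exclusively for our whole lifetime.
        unsafe { self.frame.as_mut() }
    }
}
```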

Remove the `MemoryBlock` structs, as they are not easy to use and carry
barely any semantic meaning.

Signed-off-by: greatbridf <greatbridf@icloud.com>
Separate the old UserTLS into UserTLS and UserTLSDescriptor.

UserTLS is held by threads to keep information about their storage;
descriptors are used in clone syscalls.

Signed-off-by: greatbridf <greatbridf@icloud.com>
Signed-off-by: greatbridf <greatbridf@icloud.com>
Introduce `Zone`s: a Zone is a region of physical memory that belongs
entirely to the same NUMA node. The Zone holds all the RawPage structs.
The buddy allocator now stores a reference to the zone that holds all its
pages. Thus, the buddy allocator becomes independent of the underlying
physical page frame management framework.
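
A structural sketch of that relationship, with assumed fields:

```rust
/// Per-frame metadata (contents assumed).
pub struct RawPage {
    refcount: core::sync::atomic::AtomicUsize,
    order: u8,
}

/// All frames in a Zone live on the same NUMA node.
pub struct Zone {
    base_pfn: usize,
    pages: &'static [RawPage], // one entry per frame in the zone
}

/// The allocator only knows its zone, not how frames are mapped or
/// tracked globally, keeping it independent of the frame framework.
pub struct BuddyAllocator {
    zone: &'static Zone,
    // free lists per order, locking, etc. omitted
}

impl BuddyAllocator {
    /// Translate a page frame number into this zone's metadata slot.
    pub fn page_of(&self, pfn: usize) -> &RawPage {
        &self.zone.pages[pfn - self.zone.base_pfn]
    }
}
```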

Remove unnecessary page flags and structs.

Signed-off-by: greatbridf <greatbridf@icloud.com>
- Bump rust compiler version to nightly-2026-01-09.
- Inode rework: add a generic Inode struct.
- Add a macro to help function tweaks.
- PageCache rework: reduce complexity and try to decouple.
- Adapt fat32, tmpfs to the new page cache system.
- Change the way we process mapped pages and load ELF executables.
- Refine handling flags in `MMArea::handle_mmap`.

Signed-off-by: greatbridf <greatbridf@icloud.com>
- Remove struct `Page` and add `Folio`s to represent adjacent pages.
- Introduce `Zone`s similar to those in Linux. Looking forward to
  removing all occurrences of `ArchPhysAccess` and the like.
- Adapt existing code to the new `Folio` interface in a dirty and rough
  way.

Signed-off-by: greatbridf <greatbridf@icloud.com>
- Add script/backtrace to translate backtraces.
- Add a cut marker in the kernel panic routine to indicate the start of
  the stack backtrace.

Signed-off-by: greatbridf <greatbridf@icloud.com>
No functional changes

- Add `{extern_,}symbol_addr` macros to retrieve symbol addresses (see
  the sketch below).
- Remove the manual impl of Send and Sync for RawPage.
- Make elided '_ lifetimes in return types explicit.
- Suppress unused warnings by allowing them.
- Remove genuinely unused functions.
- Refactor the `println_trace` macro to suppress unused variable
  warnings.
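
A hypothetical sketch of how such a macro can be written; the real one
may differ:

```rust
/// Take a linker-script symbol and return its address as usize. The symbol
/// is declared as an opaque extern static; only its address is meaningful.
macro_rules! symbol_addr {
    ($sym:ident) => {{
        extern "C" {
            static $sym: core::ffi::c_void;
        }
        // SAFETY: we only take the address, never read the value.
        unsafe { core::ptr::addr_of!($sym) as usize }
    }};
}

// Usage sketch: let kernel_end = symbol_addr!(__kernel_end);
```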

Signed-off-by: greatbridf <greatbridf@icloud.com>
Make the print messages prettier. Also panic when we hit an error.

Signed-off-by: greatbridf <greatbridf@icloud.com>
Signed-off-by: greatbridf <greatbridf@icloud.com>
- thd.exit() now sets thd.dead and sends the thread to the reaper.
- Delay the release of the process mm until we reap it.
- Extract the futex logic out of the exit and exec routines.

Signed-off-by: greatbridf <greatbridf@icloud.com>
- Use intrusive lists to store and organize the process hierarchy.
- Remove `FileArray::open_console()`. Do it in the init script instead.
- Fix open logic: acquire controlling terminals only if O_NOCTTY is not
  set. Put this into TerminalFile::open().
- Send SIGHUP and then SIGCONT to foreground pgroup procs when the
  controlling terminal is dropped.
- Set the controlling terminal of sessions in Terminal.
- Limit the max line width to 80. Format some code.

Signed-off-by: greatbridf <greatbridf@icloud.com>
Reformat the files with the new format style to make the real changes
clearer.

Signed-off-by: greatbridf <greatbridf@icloud.com>
We already have `FDT.present_ram().free_ram()`. Remove the impl in
`ArchMemory` to avoid confusion.

Signed-off-by: greatbridf <greatbridf@icloud.com>
- Add `extern_symbol_value` to retrieve far relative symbol values.
- Get `BSS_LENGTH` and `__kernel_end` using `extern_symbol_addr`.
- Get `_ap_start` using `extern_symbol_value`.

Signed-off-by: greatbridf <greatbridf@icloud.com>
Signed-off-by: greatbridf <greatbridf@icloud.com>
With the current ldscript, linkers put the vdso data after
`__kernel_end`, which is buggy, since we use that symbol to indicate the
end of our kernel image and newly allocated pages may overwrite those
positions.

Fix this by placing the vdso inside REGION_DATA. Remove the old VDSO
memory region. Align the end of the .data section to a page-size
boundary.

Add a helper macro to retrieve .vdso section symbol addresses.

Signed-off-by: greatbridf <greatbridf@icloud.com>
Strip the memory used by the kernel and the FDT data out of the free
memory blocks returned by the FDT.

Closes: #54 ("Random kernel freezing on process creation / exiting")
Fixes: 4351cf5 ("partial work: fix riscv64 bootstrap")
Signed-off-by: greatbridf <greatbridf@icloud.com>
Using *const (), no bytes are written to the location, which might
result in uninitialized memory being accessed.
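
An illustration of the pitfall (not the original code): a write through a
unit-type pointer stores zero bytes, so the destination stays
uninitialized.

```rust
use core::mem::MaybeUninit;

fn zero_sized_write_pitfall() {
    let mut slot = MaybeUninit::<u64>::uninit();

    // Writing `()` through a *mut () is a no-op: () is zero-sized, so no
    // bytes reach the destination and `slot` stays uninitialized.
    unsafe { slot.as_mut_ptr().cast::<()>().write(()) };

    // Reading `slot` now would be undefined behavior. Write a real value
    // through a correctly typed pointer instead:
    unsafe { slot.as_mut_ptr().write(0u64) };
    let value = unsafe { slot.assume_init() };
    assert_eq!(value, 0);
}
```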

Fixes: ebd3d12 ("change(x86): optimize bootstrap code, remove kinit.cpp")
Fixes: 191877a ("feat(hal): impl basic single hart bootstrap for riscv64")
Signed-off-by: greatbridf <greatbridf@icloud.com>
Signed-off-by: greatbridf <greatbridf@icloud.com>
This shortens the qemu memory map.

Signed-off-by: greatbridf <greatbridf@icloud.com>