This function is getting quite involved and it also gained more callers
lately. This is not performance critical so Uninline it to save on
binary size.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Define the generic _current directly and get rid of the generic
arch_current_get().
The SMP default implementation is now known as z_smp_current_get().
It is no longer inlined which saves significant binary size (about 10%
for some random test case I checked).
Introduce z_current_thread_set() and use it in place of
arch_current_thread_set() for updating the current thread pointer
given this is not necessarily an architecture specific operation.
The architecture specific optimization, when enabled, should only care
about its own things and not have to also update the generic
_current_cpu->current copy.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Mostly a revert of commit b1def7145f ("arch: deprecate `_current`").
This commit was part of PR #80716 whose initial purpose was about providing
an architecture specific optimization for _current. The actual deprecation
was sneaked in later on without proper discussion.
The Zephyr core always used _current before and that was fine. It is quite
prevalent as well and the alternative is proving rather verbose.
Furthermore, as a concept, the "current thread" is not something that is
necessarily architecture specific. Therefore the primary abstraction
should not carry the arch_ prefix.
Hence this revert.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Even though calculating the priority queue index in the priority
multiq is quick, caching it allows us to extract an extra 2% in
terms of performance as measured by the thread_metric cooperative
benchmark.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Adds customized yield implementations based upon the selected
scheduler (dumb, multiq or scalable). Although each follows the
same broad outline, some of them allow for additional tweaking
to extract maximal performance. For example, the multiq variant
improves the performance of k_yield() by about 20%.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Dequeuing from a doubly linked list is similar to removing an item
except that it does not re-initialize the dequeued node.
This comes in handy when sorting a doubly linked list (where the
node gets removed and re-added). In that circumstance, re-initializing
the node is required. Furthermore, the compiler does not always
'understand' this. Thus, when performance is critical, dequeuing
may be preferred to removing.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Minor cleanups include ...
1. Eliminating unnecessary if-defs and forward declarations
2. Co-locating routines of the same queue type
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
The `lock` arg is used multiple times in the function, making the
`ARG_UNUSED(lock);` redundant, remove it.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
Sleeping and suspended are now orthogonal states. That is, a thread
may be both sleeping and suspended and the two do not interact. One
repercussion of this is that suspending a thread will no longer
abort its timeout.
Threads are now created in the 'sleeping' state instead of a
'suspended' state. This dovetails nicely with the start delay that
can be given to a newly created thread--it is as though the very
first operation that a thread with a start delay is a sleep.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
At the present time, Zephyr does has overlap between sleeping and
suspending. Not only should sleeping and suspended be orthogonal
states, but we should ensure users always employ the correct API.
For example, to wake a sleeping thread, k_wakeup() should be used,
and to resume a suspended thread, k_thread_resume() should be used.
However, at the present time k_thread_resume() can be used on a
thread that called k_sleep(K_FOREVER). Sleeping should have nothing
to do with suspension.
This commit introduces the new _THREAD_SLEEPING thread state along
with some prep-work to facilitate the decoupling of the sleeping and
suspended thread states.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Moves the arch_swap() declaration out of kernel_arch_interface.h
and into the various architectures' kernel_arch_func.h. This
permits the arch_swap() to be inlined on ARM, but extern'd on
the other architectures that still implement arch_swap().
Inlining this function on ARM has shown at least a +5% performance
boost according to the thread_metric benchmark on the disco_l475_iot1
board.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Traditionally threads have been initialized with a PRESTART flag set,
which gets cleared when the thread runs for the first time via either
its timeout or the k_thread_start() API.
But if you think about it, this is no different, semantically, than
SUSPENDED: the thread is prevented from running until the flag is
cleared.
So unify the two. Start threads in the SUSPENDED state, point
everyone looking at the PRESTART bit to the SUSPENDED flag, and make
k_thread_start() be a synonym for k_thread_resume().
There is some mild code size savings from the eliminated duplication,
but the real win here is that we make space in the thread flags byte,
which had run out.
Signed-off-by: Andy Ross <andyross@google.com>
`_current` is now functionally equals to `arch_curr_thread()`, remove
its usage in-tree and deprecate it instead of removing it outright,
as it has been with us since forever.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
Add the following arch-specific APIs:
- arch_curr_thread()
- arch_set_curr_thread()
which allow SMP architectures to implement a faster "get current
thread pointer" than the default provided by the kernel. The 'set'
function is required for the 'get' to work, more on that later.
When `CONFIG_ARCH_HAS_CUSTOM_CURRENT_IMPL` is selected, calls to
`_current` & `k_sched_current_thread_query()` will be redirected to
`arch_curr_thread()`, which ideally should translate into a single
instruction read, avoiding the current
"lock > read CPU > read current thread > unlock" path in SMP
architectures and thus greatly improves the read performance.
However, since the kernel relies on a copy of the "current thread"s on
every CPU for certain operations (i.e. to compare the priority of the
currently scheduled thread on another CPU to determine if IPI should be
sent), we can't eliminate the copy of "current thread" (`current`) from
the `struct _cpu` and therefore the kernel now has to invoke
`arch_set_curr_thread()` in addition to what it has been doing. This
means that it will take slightly longer (most likely one instruction
write) to change the current thread pointer on the current
CPU.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
Without multithreading only two stacks present: ISR and main.
As any stack they also could overflow, so it make sense to add stack
guard for them also.
Remove stack guard dependency on multithreading and mark
`Z_RISCV_STACK_GUARD_SIZE` bytes at the beginning of stack as read-only
region with PMP entry.
Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Move the initialization of the priority q for running out of sched.c to
remove one more ifdef from sched.c. No change in functionality but
better matches the rest of sched.c and priority_q.h such that the
ifdefry needed is done in in priority_q.h.
Signed-off-by: Tom Burdick <thomas.burdick@intel.com>
In a uniprocessor system, _sched_spinlock may not need to be
held in all the same cases that it does in a multiprocessor
system. Removing those unnecessary usages can lead to better
performance on UP systems. In the case of uncontested taking
and giving of a semaphore, this can be as much as a +14%
performance gain.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Inlining z_unpend_first_thread() has been observed to give a
+8% and +16% performance boost to the thread_metric benchmark's
message processing and synchronization tests respectively.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Re-orders the checks in should_preempt() tests so that the
z_is_thread_timeout_active() check is done last.
This change has been observed to give a +7% performance boost on
the thread_metric benchmark's preemptive scheduling test.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Removing the routine z_ready_thread_locked() as it is not
used anywhere. It was a leftover artefact from development
that previously escaped cleanup.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
This adds a interface to allow coredump to dump privileged
stack which is defined in architecture specific way.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Fill the memory of all CPU's IRQ stack with 0xAA on init, so
that `z_stack_space_get` can calculate the remaining space
correctly.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Signed-off-by: Yong Cong Sin <yongcong.sin@gmail.com>
This adds a new arch_thread_priv_stack_space_get() interface for
each architecture to report privileged stack space usage. Each
architecture will need to implement this function as each arch
has their own way of defining privileged stacks.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The intention of this API is to allow setting the posix thread
name equal to the zephyr thread name.
By defining it as an arch interface, the implementation becomes
generic.
Signed-off-by: Rubin Gerritsen <rubin.gerritsen@nordicsemi.no>
Utilize a code spell-checking tool to scan for and correct spelling errors
in all files within the `kernel` directory.
Signed-off-by: Pisit Sawangvonganan <pisit@ndrsolution.com>
This is part of a series of moving memory management related
stuff out of the Z_ namespace and into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This is part of a series of move memory management related
stuff out of Z_ namespace into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Also any demand paging and page frame related bits are
renamed.
This is part of a series to move memory management related
stuff out of the Z_ namespace into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This is part of a series to move memory management related
stuff from Z_ namespace into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This is part of a series to move memory management related
stuff from Z_ namespace into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This is part of a series to move memory management related
stuff from Z_ namespace into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Renames:
Z_KERNEL_VIRT_START to K_MEM_KERNEL_VIRT_START
Z_KERNEL_VIRT_SIZE to K_MEM_KERNEL_VIRT_SIZE
Z_KERNEL_VIRT_END to K_MEM_KERNEL_VIRT_END
This is part of a series to move memory management related
stuff from Z_ namespace into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Renames:
Z_VIRT_RAM_START to K_MEM_VIRT_RAM_START
Z_VIRT_RAM_SIZE to K_MEM_VIRT_RAM_SIZE
Z_VIRT_RAM_END to K_MEM_VIRT_RAM_END
This is part of a series to move memory management related
stuff from Z_ namespace into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Renames:
Z_PHYS_RAM_START to K_MEM_PHYS_RAM_START
Z_PHYS_RAM_SIZE to K_MEM_PHYS_RAM_SIZE
Z_PHYS_RAM_END to K_MEM_PHYS_RAM_END
This is part of a series to move memory management related
stuff from Z_ namespace into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This is part of a series to move memory management related
stuff from the Z_ namespace into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Rename Z_BOOT_VIRT_TO_PHYS() and Z_BOOT_PHYS_TO_VIRT() to
K_MEM_BOOT_VIRT_TO_PHYS() and K_MEM_BOOT_PHYS_TO_VIRT()
respectively. This is part of a series to move memory management
functions away from the Z_ namespace and into its own namespace.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The CONFIG_IPI_OPTIMIZE configuration option allows for the flagging
and subsequent signaling of IPIs to be optimized.
It does this by making each bit in the kernel's pending_ipi field
a flag that indicates whether the corresponding CPU might need an IPI
to trigger the scheduling of a new thread on that CPU.
When a new thread is made ready, we compare that thread against each
of the threads currently executing on the other CPUs. If there is a
chance that that thread should preempt the thread on the other CPU
then we flag that an IPI is needed for that CPU. That is, a clear bit
indicates that the CPU absolutely will not need to reschedule, while a
set bit indicates that the target CPU must make that determination for
itself.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Created `GEN_OFFSET_STRUCT` & `GEN_NAMED_OFFSET_STRUCT` that
works for `struct`, and remove the use of `z_arch_esf_t`
completely.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Make `struct arch_esf` compulsory for all architectures by
declaring it in the `arch_interface.h` header.
After this commit, the named struct `z_arch_esf_t` is only used
internally to generate offsets, and is slated to be removed
from the `arch_interface.h` header in the future.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
Namespaced the generated headers with `zephyr` to prevent
potential conflict with other headers.
Introduce a temporary Kconfig `LEGACY_GENERATED_INCLUDE_PATH`
that is enabled by default. This allows the developers to
continue the use of the old include paths for the time being
until it is deprecated and eventually removed. The Kconfig will
generate a build-time warning message, similar to the
`CONFIG_TIMER_RANDOM_GENERATOR`.
Updated the includes path of in-tree sources accordingly.
Most of the changes here are scripted, check the PR for more
info.
Signed-off-by: Yong Cong Sin <ycsin@meta.com>
The struct z_page_frame is marked __packed to avoid extra padding as
such padding may represent significant memory waste when lots of page
frames are used. However this is a bad strategy.
The code contained this somewhat dubious comment and code in
free_page_frame_list_put():
/* The structure is packed, which ensures that this is true */
void *node = pf;
sys_slist_append(&free_page_frame_list, node);
This is bad for many reasons:
- type checking is completely bypassed;
- if the sys_snode_t node member is no longer located at the front of
struct z_page_frame then the code will still compile and possibly run
but be broken with memory corruption as a likely outcome;
- the sys_slist_append() code is completely unaware of the packed
attribute which breaks architectures with alignment restrictions.
Let's improve code efficiency as well as memory usage by removing the
packed attribute and manually packing the flags in the unused virtual
address bits. This way the page frame array remains naturally aligned,
data access becomes optimal and the actual array size gets even smaller.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>