kernel/sched: Fix SMP must-wait-for-switch conditions in abort/join

As discovered by Carlo Caione, the k_thread_join code had a case where
it detected it had been called on a thread already marked _THREAD_DEAD
and exited early.  That's not sufficient.  The thread state is mutated
from the thread itself on its exit path.  It may still be running!

Just like the code in z_swap(), we need to spin waiting on the other
CPU to write the switch handle before knowing it's safe to return,
otherwise the calling context might (and did) do something like
immediately k_thread_create() a new thread in the "dead" thread's
struct while it was still running on the other core.

There was also a similar case in k_thread_abort() which had the same
issue: it needs to spin waiting on the other CPU to kill the thread
via the same mechanism.
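
The wait described above can be modeled in user space. This is a minimal
sketch, not the kernel code: it assumes POSIX threads and C11 atomics stand
in for the two CPUs, and struct model_thread, model_exit_path(),
model_switch_spin(), model_join() and model_run() are hypothetical names
made up for illustration. The exiting thread marks itself dead first and
publishes its switch handle last; the joiner must not trust the dead flag
alone.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct k_thread; not the Zephyr layout. */
struct model_thread {
	atomic_bool dead;              /* models the _THREAD_DEAD state bit */
	_Atomic(void *) switch_handle; /* NULL until the context is saved */
	atomic_int on_cpu;             /* 1 while the exit path still runs */
};

/* Exit path, executed by the dying thread itself on "the other CPU". */
static void *model_exit_path(void *arg)
{
	struct model_thread *t = arg;

	/* Marked dead, but this code is still running on the thread. */
	atomic_store(&t->dead, true);

	for (volatile int i = 0; i < 100000; i++) {
		/* Remaining exit work that still uses the thread's memory. */
	}

	atomic_store(&t->on_cpu, 0);

	/* Publishing the handle is the very last step (release ordering). */
	atomic_store_explicit(&t->switch_handle, (void *)t,
			      memory_order_release);
	return NULL;
}

/* Spin until the other side has written the switch handle. */
static void model_switch_spin(struct model_thread *t)
{
	while (atomic_load_explicit(&t->switch_handle,
				    memory_order_acquire) == NULL) {
	}
}

/* Returns on_cpu as observed once the join considers the thread gone. */
static int model_join(struct model_thread *t)
{
	while (!atomic_load(&t->dead)) {
		/* The model simply polls until the thread is marked dead. */
	}

	/* Seeing "dead" is not enough: wait for the switch to complete. */
	model_switch_spin(t);
	return atomic_load(&t->on_cpu);
}

/* Runs one exit/join race; 0 means the joiner never saw a live thread. */
static int model_run(void)
{
	struct model_thread t;
	pthread_t tid;
	int rc;

	atomic_init(&t.dead, false);
	atomic_init(&t.switch_handle, NULL);
	atomic_init(&t.on_cpu, 1);

	rc = pthread_create(&tid, NULL, model_exit_path, &t);
	assert(rc == 0);

	int still_running = model_join(&t);

	pthread_join(tid, NULL);
	return still_running;
}
```

Calling model_run() from a test harness should return 0, because the
acquire load in model_switch_spin() pairs with the release store of the
handle. Dropping the model_switch_spin() call from model_join() allows the
joiner to return while on_cpu is still 1, which is the window this commit
closes.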

Fixes #58116

Originally-by: Carlo Caione <ccaione@baylibre.com>
Signed-off-by: Andy Ross <andyross@google.com>
Andy Ross 2023-05-26 09:39:16 -07:00 committed by jgl-meta
parent c3046f417a
commit a08e23f68e

@@ -1771,6 +1771,13 @@ void z_thread_abort(struct k_thread *thread)
 			k_spin_unlock(&sched_spinlock, key);
 			while (is_aborting(thread)) {
 			}
+
+			/* Now we know it's dying, but not necessarily
+			 * dead. Wait for the switch to happen!
+			 */
+			key = k_spin_lock(&sched_spinlock);
+			z_sched_switch_spin(thread);
+			k_spin_unlock(&sched_spinlock, key);
 		} else if (active) {
 			/* Threads can join */
 			add_to_waitq_locked(_current, &thread->join_queue);
@@ -1806,6 +1813,7 @@ int z_impl_k_thread_join(struct k_thread *thread, k_timeout_t timeout)
 	SYS_PORT_TRACING_OBJ_FUNC_ENTER(k_thread, join, thread, timeout);
 
 	if ((thread->base.thread_state & _THREAD_DEAD) != 0U) {
+		z_sched_switch_spin(thread);
 		ret = 0;
 	} else if (K_TIMEOUT_EQ(timeout, K_NO_WAIT)) {
 		ret = -EBUSY;