Commit 51b2bba3 authored by John Grossman's avatar John Grossman Committed by CQ bot account: commit-bot@chromium.org

[kernel][futex] Refactor futex state tracking.

Refactor the way that we track futex state in the kernel.

Previously, the way that we tracked state was to have a FutexNode
object which was held on the stack for each waiting thread.  The
FutexNode was an object which played two roles.  When it was the head
of a list of waiters, it maintained any per-futex-state in addition to
maintaining the per-waiter state.  When it was just another waiter in
the queue, it had storage for the per-futex-state, but really was only
holding on to the per-waiter state.  Whenever the head of the waiters
for a particular futex changed, bookkeeping code needed to transfer
the per-futex-state from the old head to the new head.

This approach was pretty neat as it resulted in lockless O(1)
allocation for futex state which could not fail (barring a kernel
stack overflow).  Still, there were some downsides to the approach.
In particular...

++ Part of the per-waiter state was a wait_queue_t; basically, every
   futex waiter in a queue had its own dedicated wait_queue object.
   While not incorrect, this approach makes things difficult when we
   want priority inheritance to become involved.  It is easier to use
   a single wait_queue_t for all of the futex waiters, as this allows
   the scheduler code to become involved in the decision of who to
   release from a futex during a wake or a requeue operation.  Also,
   it aligns the data structure used to hold waiters between futexes
   and the rest of the kernel.  Now, as the scheduler evolves and
   changes are made to how waiters wait, a set of futex waiters is
   guaranteed to have the same behavior as any other set of waiting
   threads.
++ Because of the slightly tricky way in which the FutexNode storage
   was being held, custom container code was needed.  Again, there is
   nothing inherently wrong with this, but the custom container code
   had no tests, and raised the readability bar for someone new to
   the code.
++ As the per-futex-state storage needs start to grow in order to
   implement PI, maintaining the code which moves the futex state from
   old-head to new-head starts to have new requirements which need to
   be met.  Remembering to do this in all of the proper places can
   become burdensome and prone to failure.  While refactoring the code
   to centralize the logic would be possible, the logic is not needed
   if the bookkeeping is centralized.

So; enter this change.  Instead of allocating this state on the stack
any time there is a waiter, we shift to the following approach.  Start
by recognizing that a futex only has state while it has waiters.
Therefore, the absolute maximum number of futex state structures we
need in a process is limited to the number of threads in the process.
When a thread is created, we can dynamically allocate a structure for
tracking per-futex-state and contribute it to our process's
FutexContext state pool.  When a thread exits, it may remove a state
structure from the free pool and let its reference go out of scope.
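The thread create/exit bookkeeping described above might be sketched
roughly as follows.  This is a simplified illustration, not the actual
Zircon code; FutexState, FutexContext, and the method names here are
hypothetical stand-ins for the real kernel types.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the real per-futex-state structure.
struct FutexState {
    uintptr_t futex_id = 0;  // which futex this state currently tracks
};

// Hypothetical stand-in for the per-process FutexContext pool.
class FutexContext {
public:
    // Called during thread creation: contribute one state object to
    // the process's free pool.
    void ContributeState() {
        free_pool_.push_back(std::make_unique<FutexState>());
    }

    // Called during thread exit: remove one state object from the
    // free pool and let its reference go out of scope.
    void WithdrawState() {
        assert(!free_pool_.empty());
        free_pool_.pop_back();  // unique_ptr destroys the state
    }

    size_t free_count() const { return free_pool_.size(); }

private:
    std::vector<std::unique_ptr<FutexState>> free_pool_;
};
```

Since every thread contributes exactly one structure, the pool always
holds at least as many structures as there are threads not currently
waiting, which is what makes allocation infallible.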

In the meantime, code actually waiting on a futex can simply look up
the futex state for the futex in question from the associative
container (currently a hashtable backed by DLL buckets) which maps
from futex id to futex state structure, or just grab a new one from
the futex free pool if there are no current waiters for the futex in
question.
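The lookup path might look something like the sketch below.  Again
this is illustrative only: the kernel uses a hashtable backed by DLL
buckets, while the sketch substitutes std::unordered_map, and all
names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the real per-futex-state structure.
struct FutexState {
    uintptr_t futex_id = 0;
};

// Hypothetical stand-in for the per-process FutexContext.
class FutexContext {
public:
    // Pre-populate the free pool; in the real design each thread
    // contributes one structure at creation time.
    explicit FutexContext(size_t num_threads) {
        for (size_t i = 0; i < num_threads; ++i)
            free_pool_.push_back(std::make_unique<FutexState>());
    }

    // Find the state for |futex_id| among the active futexes, or grab
    // a fresh structure from the free pool if there are no waiters yet.
    FutexState* GetOrAllocate(uintptr_t futex_id) {
        auto it = active_.find(futex_id);
        if (it != active_.end())
            return it->second.get();    // futex already has waiters
        assert(!free_pool_.empty());    // bounded by the thread count
        std::unique_ptr<FutexState> state = std::move(free_pool_.back());
        free_pool_.pop_back();
        state->futex_id = futex_id;
        FutexState* raw = state.get();
        active_.emplace(futex_id, std::move(state));
        return raw;
    }

    // When the last waiter leaves, return the state to the free pool.
    void Release(uintptr_t futex_id) {
        auto it = active_.find(futex_id);
        assert(it != active_.end());
        free_pool_.push_back(std::move(it->second));
        active_.erase(it);
    }

    size_t free_count() const { return free_pool_.size(); }

private:
    std::unordered_map<uintptr_t, std::unique_ptr<FutexState>> active_;
    std::vector<std::unique_ptr<FutexState>> free_pool_;
};
```

Both the map lookup and the free-pool pop are constant time on
average, which is how the O(1) allocation property discussed below is
preserved.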

The free pool in this situation is a simple list, so our O(1)
allocation property is preserved.  We now need a lock in order to
protect our free and active collections, but this can be the same lock
which was already needed to protect the free/active state of a given
futex's state, so no additional cost is being paid.  Finally, because
of the argument given above that the maximum number of active futex
states in a process equals the number of active threads in the
process, we are guaranteed never to fail to allocate futex state.

ZX-1798  #comment Refactor futex state to prepare for moving ownership down into the wait queue level.

Tests: Build and overnight unit tests on QEMU, NUC and VIM2
Change-Id: I1451929e24a1bcb504be3f94ce858dbb6c7875f5
parent 2fa83bc5