From 85ad1e57f4b57649331cca8a03b24e430a453f75 Mon Sep 17 00:00:00 2001
From: Nathan Goldbaum
Date: Thu, 21 Aug 2025 14:39:32 -0600
Subject: [PATCH 1/5] Add CFFI thread safety docs

---
 doc/source/overview.rst | 152 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)

diff --git a/doc/source/overview.rst b/doc/source/overview.rst
index ee8b0df3..81ebca8b 100644
--- a/doc/source/overview.rst
+++ b/doc/source/overview.rst
@@ -595,6 +595,158 @@ with C code to initialize global variables.
 
 The actual ``lib.*()`` function calls should be obvious: it's like C.
 
+.. _thread-safety:
+
+Thread Safety
+-------------
+
+Multithreading can be a powerful but tricky way to exploit the many cores on
+modern CPUs. Combining CFFI with the Python `threading` module is a convenient
+way to use multithreaded parallelism with a C library.
+
+On the GIL-enabled build, CFFI will release the GIL before calling into a C
+library. That means that it is possible to get multithreaded speedups using CFFI
+on both the free-threaded and GIL-enabled builds of Python. However, that also
+means that the GIL does not protect multithreaded shared use of C data
+structures exposed via FFI.
+
+If the C library you are wrapping is not thread-safe, then it is not thread-safe
+to use the library via Python without adding some kind of locking. If the
+library *is* thread-safe, then no additional locking is necessary to ensure the
+thread safety of CFFI itself. As of version 2.0, CFFI generates thread-safe
+bindings.
+
+Let's make that concrete by wrapping some code that is not thread-safe due to
+use of a C global variable:
+
+.. code-block:: python
+
+    from cffi import FFI
+    ffibuilder = FFI()
+
+    ffibuilder.set_source("_thread_safety_example",
+        r"""
+        #include
+
+        static int64_t value = 0;
+        static int64_t increment(void) {
+            value++;
+            return value;
+        }
+        """,
+        libraries=[]
+    )
+
+    ffibuilder.cdef(r"""
+        int64_t increment(void);
+        """
+    )
+
+    if __name__ == "__main__":
+        ffibuilder.compile(verbose=True)
+
+The way that the ``increment`` uses the ``value`` global variable is not
+thread-safe. `Data races
+`_ are possible if two
+threads simultaneously call ``increment``. We can engineer that situation with a
+Python script that calls into the wrapper like so:
+
+.. code-block:: python
+
+    import sys
+
+    from concurrent.futures import ThreadPoolExecutor, wait
+    import threading
+
+    from _thread_safety_example import ffi, lib
+
+    # Make races more likely by switching threads more often
+    # on the GIL-enabled build. This has no effect on the
+    # free-threaded build.
+    sys.setswitchinterval(.0000001)
+
+    N_WORKERS = 4
+
+    l = threading.Lock()
+
+    def work():
+        lib.increment()
+
+        def run_thread_pool():
+            with ThreadPoolExecutor(max_workers=N_WORKERS) as tpe:
+                try:
+                    futures = [tpe.submit(work) for _ in range(100000)]
+                    # block until all work finishes
+                    wait(futures)
+                finally:
+                    # check for exceptions in worker threads
+                    [f.result() for f in futures]
+
+
+        run_thread_pool()
+
+        print(lib.increment())
+
+On the system used to run this example by the author, this script prints random
+results, with possible result values ranging from 99960 to 99980, indicating
+that, on average, races happen a few dozen times over the hundred thousand loop
+iterations. The results you get will depend on your hardware, system
+configuration, and Python interpreter version.
+
+Note that races are relatively rare. The CFFI bindings and Python interpreter
+add enough overhead that it is not very likely for two threads to simultaneously
+increment the static integer. This can make code *appear* to be sequentially
+consistent for small sample sizes, when it is in fact not consistent. See `this
+tutorial
+`_
+for more examples of how the GIL and Python overhead can mask thread safety
+issues that only manifest under production load.
+
+We can make the above example script thread-safe by using a lock:
+
+.. code-block:: python
+
+    l = threading.Lock()
+
+    def work():
+        l.acquire()
+        lib.increment()
+        l.release()
+
+The `threading.Lock` ensures only one thread can call into the wrapped C library
+at a time. Any thread that calls ``l.acquire()`` while another thread has
+already acquired the lock will block until the lock is released.
+
+Using a global lock like this is necessary it is not safe for more than one
+thread to simultaneously call into any part of the library. This is the case if
+the library relies on global state its implementation that does not have any
+explicit synchronization. Libraries like this are not re-entrant.
+
+For re-entrant libraries, where two threads can simultaneously use the library
+so long as the threads do not share references to an object, generally you will
+want to use a per-object lock instead of a global lock. Keep in mind in this
+case that any program with more than one lock can lead to a deadlock and care
+must be taken to avoid situations where two threads can deadlock.
+
+If you do not expect to use the bindings for a thread-unsafe library in a
+multithreaded program, locking is not necessary. Similarly if you know that you
+are using the library in a thread-safe manner by construction, it is not
+necessary to add locking. Also, if you know that the C library you are wrapping
+is thread-safe, no additional locking is necessary to make the CFFI bindings
+thread-safe. As of version 2.0, CFFI generates thread-safe bindings to C
+libraries.
+
+If you publish CFFI bindings for a library, you should document the thread
+safety guarantees of your bindings. It may make sense to add locking into the
+bindings but it might also make sense to clearly document the bindings are not
+thread-safe and it is up to users to ensure appropriate synchronization or
+exclusive access if users do want to use the bindings in a thread pool.
+
+See the Python free-threading guide page on `improving the thread safety of
+Python code
+`_
+for more information about updating a Python library with thread safety in mind.
+
 .. _abi-versus-api:
 
 ABI versus API

From 37bcd7809c2588ad7d245684ac7ef6c3aba415ca Mon Sep 17 00:00:00 2001
From: Nathan Goldbaum
Date: Fri, 22 Aug 2025 08:19:20 -0600
Subject: [PATCH 2/5] Delete incorrect statements

---
 doc/source/overview.rst | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/doc/source/overview.rst b/doc/source/overview.rst
index 81ebca8b..8eb38a0a 100644
--- a/doc/source/overview.rst
+++ b/doc/source/overview.rst
@@ -613,8 +613,7 @@ structures exposed via FFI.
 If the C library you are wrapping is not thread-safe, then it is not thread-safe
 to use the library via Python without adding some kind of locking. If the
 library *is* thread-safe, then no additional locking is necessary to ensure the
-thread safety of CFFI itself. As of version 2.0, CFFI generates thread-safe
-bindings.
+thread safety of CFFI itself.
 
 Let's make that concrete by wrapping some code that is not thread-safe due to
 use of a C global variable:
@@ -733,8 +732,7 @@ multithreaded program, locking is not necessary. Similarly if you know that you
 are using the library in a thread-safe manner by construction, it is not
 necessary to add locking. Also, if you know that the C library you are wrapping
 is thread-safe, no additional locking is necessary to make the CFFI bindings
-thread-safe. As of version 2.0, CFFI generates thread-safe bindings to C
-libraries.
+thread-safe.
 
 If you publish CFFI bindings for a library, you should document the thread
 safety guarantees of your bindings. It may make sense to add locking into the

From 825ef942f0fc06cae92d71df152b98f6fc7bfb4c Mon Sep 17 00:00:00 2001
From: Nathan Goldbaum
Date: Fri, 22 Aug 2025 08:20:38 -0600
Subject: [PATCH 3/5] Add more links, examples, and suggestions about TSan

---
 doc/source/overview.rst | 48 ++++++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/doc/source/overview.rst b/doc/source/overview.rst
index 8eb38a0a..b990f00d 100644
--- a/doc/source/overview.rst
+++ b/doc/source/overview.rst
@@ -718,21 +718,36 @@ already acquired the lock will block until the lock is released.
 
 Using a global lock like this is necessary it is not safe for more than one
 thread to simultaneously call into any part of the library. This is the case if
-the library relies on global state its implementation that does not have any
-explicit synchronization. Libraries like this are not re-entrant.
-
-For re-entrant libraries, where two threads can simultaneously use the library
-so long as the threads do not share references to an object, generally you will
-want to use a per-object lock instead of a global lock. Keep in mind in this
-case that any program with more than one lock can lead to a deadlock and care
+the library relies on global state that does not have any
+explicit synchronization. Libraries like this are not `re-entrant
+`_.
+
+Libraries that are re-entrant but not thread-safe are usually structured such
+that two threads can simultaneously use the library so long as the threads do
+not simultaneously mutate shared references to an object. For libraries like
+this you will want to use a per-object lock instead of a global lock. Keep in
+mind in this case that any program with more than one lock can lead to a
+`deadlock `_ and care
 must be taken to avoid situations where two threads can deadlock.
 
-If you do not expect to use the bindings for a thread-unsafe library in a
-multithreaded program, locking is not necessary. Similarly if you know that you
-are using the library in a thread-safe manner by construction, it is not
-necessary to add locking. Also, if you know that the C library you are wrapping
-is thread-safe, no additional locking is necessary to make the CFFI bindings
-thread-safe.
+If it is a programming error for two threads to simultaneously share an object,
+you might acquire a `threading.Lock` object named ``l`` like this:
+
+.. code-block:: python
+
+    if not l.acquire(blocking=False):
+        raise RuntimeError("Multithreaded use is not supported")
+
+    # call into the unsafe library or use an unsafe object
+
+    l.release()
+
+This prevents deadlocks, since `l.acquire(blocking=False)` returns `False`
+immediately if the lock is already acquired by another thread.
+
+If you know that the C library you are wrapping is thread-safe, no additional
+locking is necessary to make the CFFI bindings thread-safe. Please report thread
+safety bugs that you suspect are due to issues in the generated CFFI bindings.
 
 If you publish CFFI bindings for a library, you should document the thread
 safety guarantees of your bindings. It may make sense to add locking into the
@@ -745,6 +760,13 @@ Python code
 `_
 for more information about updating a Python library with thread safety in mind.
 
+You can validate the thread safety of your library by running multithreaded
+tests using `Thread Sanitizer
+`_. See the Python
+free-threading guide page on `using Thread Sanitizer to detect thread safety
+issues `_ for more
+details.
+
 .. _abi-versus-api:
 
 ABI versus API

From ed167f8d4e9e2e4a428171c47693180777d186ea Mon Sep 17 00:00:00 2001
From: Nathan Goldbaum
Date: Fri, 22 Aug 2025 09:52:33 -0600
Subject: [PATCH 4/5] fix indentation in code example

---
 doc/source/overview.rst | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/doc/source/overview.rst b/doc/source/overview.rst
index b990f00d..a5a50b30 100644
--- a/doc/source/overview.rst
+++ b/doc/source/overview.rst
@@ -671,20 +671,20 @@ Python script that calls into the wrapper like so:
     def work():
         lib.increment()
 
-        def run_thread_pool():
-            with ThreadPoolExecutor(max_workers=N_WORKERS) as tpe:
-                try:
-                    futures = [tpe.submit(work) for _ in range(100000)]
-                    # block until all work finishes
-                    wait(futures)
-                finally:
-                    # check for exceptions in worker threads
-                    [f.result() for f in futures]
+    def run_thread_pool():
+        with ThreadPoolExecutor(max_workers=N_WORKERS) as tpe:
+            try:
+                futures = [tpe.submit(work) for _ in range(100000)]
+                # block until all work finishes
+                wait(futures)
+            finally:
+                # check for exceptions in worker threads
+                [f.result() for f in futures]
 
 
-        run_thread_pool()
+    run_thread_pool()
 
-        print(lib.increment())
+    print(lib.increment())
 
 On the system used to run this example by the author, this script prints random
 results, with possible result values ranging from 99960 to 99980, indicating

From 89e2e63a034c0459ded29bb9cf60a482a00bf9e0 Mon Sep 17 00:00:00 2001
From: Nathan Goldbaum
Date: Mon, 25 Aug 2025 10:24:13 -0600
Subject: [PATCH 5/5] Update doc/source/overview.rst

Co-authored-by: Matti Picus
---
 doc/source/overview.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/overview.rst b/doc/source/overview.rst
index a5a50b30..fb78249f 100644
--- a/doc/source/overview.rst
+++ b/doc/source/overview.rst
@@ -716,7 +716,7 @@ The `threading.Lock` ensures only one thread can call into the wrapped C library
 at a time. Any thread that calls ``l.acquire()`` while another thread has
 already acquired the lock will block until the lock is released.
 
-Using a global lock like this is necessary it is not safe for more than one
+Using a global lock like this is necessary if it is not safe for more than one
 thread to simultaneously call into any part of the library. This is the case if
 the library relies on global state that does not have any
 explicit synchronization. Libraries like this are not `re-entrant