Skip to content

Commit 5cc5ca2

Browse files
committed
Fix possible deadlock with async_socket_for_remote/use_hedged_requests and parallel KILL
Right now it is possible to call QueryStatus::addPipelineExecutor() when the executors_mutex already acquired, it is possible when the query was cancelled via KILL QUERY. Here I will show some traces from debugger from a real example, where tons of ProcessList::insert() got deadlocked. Let's look at the lock owner for one of the threads that was deadlocked in ProcessList::insert(): (gdb) p *mutex $2 = { __data = { __owner = 46899, }, } And now let's see the stack trace of the 46899: #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103 #1 0x00007fb65569b714 in __GI___pthread_mutex_lock (mutex=0x7fb4a9d15298) at ../nptl/pthread_mutex_lock.c:80 #2 0x000000001b6edd91 in pthread_mutex_lock (arg=0x7fb4a9d15298) at ../src/Common/ThreadFuzzer.cpp:317 #3 std::__1::__libcpp_mutex_lock (__m=0x7fb4a9d15298) at ../contrib/libcxx/include/__threading_support:303 #4 std::__1::mutex::lock (this=0x7fb4a9d15298) at ../contrib/libcxx/src/mutex.cpp:33 #5 0x0000000014c7ae63 in std::__1::lock_guard<std::__1::mutex>::lock_guard (__m=..., this=<optimized out>) at ../contrib/libcxx/include/__mutex_base:91 #6 DB::QueryStatus::addPipelineExecutor (this=0x7fb4a9d14f90, e=0x80) at ../src/Interpreters/ProcessList.cpp:372 #7 0x0000000015bee4a7 in DB::PipelineExecutor::PipelineExecutor (this=0x7fb4b1e53618, processors=..., elem=<optimized out>) at ../src/Processors/Executors/PipelineExecutor.cpp:54 #12 std::__1::make_shared<DB::PipelineExecutor, std::__1::vector<std::__1::shared_ptr<DB::IProcessor>, std::__1::allocator<std::__1::shared_ptr<DB::IProcessor> > >&, DB::QueryStatus*&, void> (__args=@0x7fb63095b9b0: 0x7fb4a9d14f90, __args=@0x7fb63095b9b0: 0x7fb4a9d14f90) at ../contrib/libcxx/include/__memory/shared_ptr.h:963 #13 DB::QueryPipelineBuilder::execute (this=0x7fb63095b8b0) at ../src/QueryPipeline/QueryPipelineBuilder.cpp:552 #14 0x00000000158c6c27 in DB::Connection::sendExternalTablesData (this=0x7fb6545e9d98, data=...) at ../src/Client/Connection.cpp:797 #27 0x0000000014043a81 in DB::RemoteQueryExecutorRoutine::operator() (this=0x7fb63095bf20, sink=...) at ../src/QueryPipeline/RemoteQueryExecutorReadContext.cpp:46 #32 0x000000000a16dd4f in make_fcontext () at ../contrib/boost/libs/context/src/asm/make_x86_64_sysv_elf_gas.S:71 And also in the logs you can see very strange things for this thread: 2022.09.13 14:14:51.228979 [ 51145 ] {1712D4E914EC7C99} <Debug> Connection (localhost:9000): Sent data for 1 external tables, total 11 rows in 0.00046389 sec., 23688 rows/sec., 3.84 KiB (8.07 MiB/sec.), compressed 1.1070121092649958 times to 3.47 KiB (7.29 MiB/sec.) ... 2022.09.13 14:14:51.719402 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Debug> executeQuery: (from 10.101.15.181:42478) KILL QUERY WHERE query_id = '1712D4E914EC7C99' (stage: Complete) 2022.09.13 14:14:51.719488 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Debug> executeQuery: (internal) SELECT query_id, user, query FROM system.processes WHERE query_id = '1712D4E914EC7C99' (stage: Complete) 2022.09.13 14:14:51.719754 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Trace> ContextAccess (default): Access granted: SELECT(user, query_id, query) ON system.processes 2022.09.13 14:14:51.720544 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Trace> InterpreterSelectQuery: FetchColumns -> Complete 2022.09.13 14:14:53.228964 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Debug> Connection (localhost:9000): Sent data for 2 scalars, total 2 rows in 2.6838e-05 sec., 73461 rows/sec., 68.00 B (2.38 MiB/sec.), compressed 0.4594594594594595 times to 148.00 B (5.16 MiB/sec.) How is this possible? The answer is fibers and query cancellation routine. During cancellation of async queries it going into fibers again and try to do this gracefully. However because of this during canceling query it may call QueryStatus::addPipelineExecutor() from QueryStatus::cancelQuery(). Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
1 parent c511c3f commit 5cc5ca2

File tree

1 file changed

+6
-0
lines changed

1 file changed

+6
-0
lines changed

src/Interpreters/ProcessList.cpp

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -369,6 +369,12 @@ CancellationCode QueryStatus::cancelQuery(bool)
369369

370370
void QueryStatus::addPipelineExecutor(PipelineExecutor * e)
371371
{
372+
/// In case of asynchronous distributed queries it is possible to call
373+
/// addPipelineExecutor() from the cancelQuery() context, and this will
374+
/// lead to deadlock.
375+
if (is_killed.load())
376+
throw Exception("Query was cancelled", ErrorCodes::QUERY_WAS_CANCELLED);
377+
372378
std::lock_guard lock(executors_mutex);
373379
assert(std::find(executors.begin(), executors.end(), e) == executors.end());
374380
executors.push_back(e);

0 commit comments

Comments
 (0)