Commit 49e7bfc

feat(FIR-49930): add bulk_insert parameter to executemany for improved INSERT performance (#463)
Co-authored-by: ptiurin <petro.tiurin@firebolt.io>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Petro Tiurin <93913847+ptiurin@users.noreply.github.com>
1 parent e380de2 commit 49e7bfc

11 files changed: 794 additions & 36 deletions


docsrc/Connecting_and_queries.rst

Lines changed: 61 additions & 9 deletions
@@ -196,14 +196,14 @@ To get started, follow the steps below:
     ) as connection:
         # Create a cursor
         cursor = connection.cursor()
-
+
         # Execute a simple test query
         cursor.execute("SELECT 1")
 
 .. note::
 
-    Firebolt Core is assumed to be running locally on the default port (3473). For instructions
-    on how to run Firebolt Core locally using Docker, refer to the
+    Firebolt Core is assumed to be running locally on the default port (3473). For instructions
+    on how to run Firebolt Core locally using Docker, refer to the
     `official docs <https://docs.firebolt.io/firebolt-core/firebolt-core-get-started>`_.
 
 
@@ -404,7 +404,7 @@ parameters equal in length to the number of placeholders in the statement.
         "INSERT INTO test_table2 VALUES ($1, $2, $3)",
         (2, "world", "2018-01-02"),
     )
-
+
     # paramstyle only needs to be set once, it will be used for all subsequent queries
 
     cursor.execute(
@@ -437,6 +437,58 @@ as values in the second argument.
     cursor.close()
 
 
+Bulk insert for improved performance
+--------------------------------------
+
+For inserting large amounts of data more efficiently, you can use the ``bulk_insert`` parameter
+with ``executemany()``. This concatenates multiple INSERT statements into a single batch request,
+which can significantly improve performance when inserting many rows.
+
+**Note:** The ``bulk_insert`` parameter only works with INSERT statements and supports both
+``fb_numeric`` and ``qmark`` parameter styles. Using it with other statement types will
+raise an error.
+
+**Example with QMARK parameter style (default):**
+
+::
+
+    # Using the default qmark parameter style
+    cursor.executemany(
+        "INSERT INTO test_table VALUES (?, ?, ?)",
+        (
+            (1, "apple", "2019-01-01"),
+            (2, "banana", "2020-01-01"),
+            (3, "carrot", "2021-01-01"),
+            (4, "donut", "2022-01-01"),
+            (5, "eggplant", "2023-01-01")
+        ),
+        bulk_insert=True # Enable bulk insert for better performance
+    )
+
+**Example with FB_NUMERIC parameter style:**
+
+::
+
+    import firebolt.db
+    # Set paramstyle to "fb_numeric" for server-side parameter substitution
+    firebolt.db.paramstyle = "fb_numeric"
+
+    cursor.executemany(
+        "INSERT INTO test_table VALUES ($1, $2, $3)",
+        (
+            (1, "apple", "2019-01-01"),
+            (2, "banana", "2020-01-01"),
+            (3, "carrot", "2021-01-01"),
+            (4, "donut", "2022-01-01"),
+            (5, "eggplant", "2023-01-01")
+        ),
+        bulk_insert=True # Enable bulk insert for better performance
+    )
+
+When ``bulk_insert=True``, the SDK concatenates all INSERT statements into a single batch
+and sends them to the server for optimized batch processing.
+
+
 Setting session parameters
 --------------------------------------
 
@@ -731,7 +783,7 @@ of execute_async is -1, which is the rowcount for queries where it's not applica
     cursor.execute_async("INSERT INTO my_table VALUES (5, 'egg', '2022-01-01')")
     token = cursor.async_query_token
 
-Trying to access `async_query_token` before calling `execute_async` will raise an exception.
+Trying to access `async_query_token` before calling `execute_async` will raise an exception.
 
 .. note::
     Multiple-statement queries are not supported for asynchronous queries. However, you can run each statement
@@ -746,9 +798,9 @@ Monitoring the query status
 To check the async query status you need to retrieve the token of the query. The token is a unique
 identifier for the query and can be used to fetch the query status. You can store this token
 outside of the current process and use it later to check the query status. :ref:`Connection <firebolt.db:Connection>` object
-has two methods to check the query status: :py:meth:`firebolt.db.connection.Connection.is_async_query_running` and
-:py:meth:`firebolt.db.connection.Connection.is_async_query_successful`.`is_async_query_running` will return True
-if the query is still running, and False otherwise. `is_async_query_successful` will return True if the query
+has two methods to check the query status: :py:meth:`firebolt.db.connection.Connection.is_async_query_running` and
+:py:meth:`firebolt.db.connection.Connection.is_async_query_successful`.`is_async_query_running` will return True
+if the query is still running, and False otherwise. `is_async_query_successful` will return True if the query
 has finished successfully, None if query is still running and False if the query has failed.
 
 ::
@@ -779,7 +831,7 @@ will send a cancel request to the server and the query will be stopped.
 
     token = cursor.async_query_token
     connection.cancel_async_query(token)
-
+
     # Verify that the query was cancelled
     running = connection.is_async_query_running(token)
     print(running) # False
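
The new docs describe `bulk_insert=True` as concatenating the per-row INSERT statements into a single batch request. The sketch below illustrates that idea for the qmark style on the client side. It is not the SDK's code: the real batching is handled by the statement planner, which receives `bulk_insert` but is not part of this diff. The helper names `to_sql_literal` and `build_bulk_insert_sql`, the `;` separator, and the literal-escaping rules are assumptions made for illustration only, and the naive split on `?` would break on a query that contains a literal question mark.

```python
from typing import Optional, Sequence, Union

# Hypothetical parameter type for this sketch only.
Param = Optional[Union[bool, int, float, str]]


def to_sql_literal(value: Param) -> str:
    """Render a Python value as a SQL literal (simplified, illustrative only)."""
    if value is None:
        return "NULL"
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Standard SQL escapes a single quote inside a string by doubling it.
    return "'" + value.replace("'", "''") + "'"


def build_bulk_insert_sql(query: str, rows: Sequence[Sequence[Param]]) -> str:
    """Hypothetical helper: expand a qmark INSERT once per row, then join the
    statements into one semicolon-separated batch string."""
    if not query.lstrip().upper().startswith("INSERT"):
        # Mirrors the documented rule that bulk_insert only accepts INSERT.
        raise ValueError("bulk insert is only supported for INSERT statements")
    if not rows:
        raise ValueError("no rows to insert")
    # Naive placeholder split: assumes no literal '?' appears in the query text.
    parts = query.split("?")
    statements = []
    for row in rows:
        if len(row) != len(parts) - 1:
            raise ValueError("parameter count does not match placeholders")
        rendered = parts[0]
        for value, tail in zip(row, parts[1:]):
            rendered += to_sql_literal(value) + tail
        statements.append(rendered)
    return ";\n".join(statements) + ";"


if __name__ == "__main__":
    print(
        build_bulk_insert_sql(
            "INSERT INTO test_table VALUES (?, ?, ?)",
            [(1, "apple", "2019-01-01"), (2, "banana", "2020-01-01")],
        )
    )
```

Run as a script, this prints one INSERT per row joined into a single string, so one request can carry what would otherwise be two separate statements.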

src/firebolt/async_db/cursor.py

Lines changed: 17 additions & 2 deletions
@@ -218,9 +218,11 @@ async def _do_execute(
         timeout: Optional[float] = None,
         async_execution: bool = False,
         streaming: bool = False,
+        bulk_insert: bool = False,
     ) -> None:
         await self._close_rowset_and_reset()
         self._row_set = StreamingAsyncRowSet() if streaming else InMemoryAsyncRowSet()
+
         # Import paramstyle from module level
         from firebolt.async_db import paramstyle
 
@@ -230,7 +232,12 @@ async def _do_execute(
         )
 
         plan = statement_planner.create_execution_plan(
-            raw_query, parameters, skip_parsing, async_execution, streaming
+            raw_query,
+            parameters,
+            skip_parsing,
+            async_execution,
+            streaming,
+            bulk_insert,
         )
         await self._execute_plan(plan, timeout)
         self._state = CursorState.DONE
@@ -385,6 +392,7 @@ async def executemany(
         query: str,
         parameters_seq: Sequence[Sequence[ParameterType]],
         timeout_seconds: Optional[float] = None,
+        bulk_insert: bool = False,
     ) -> Union[int, str]:
         """Prepare and execute a database query.
 
@@ -402,6 +410,9 @@ async def executemany(
         `SET param=value` statement before it. All parameters are stored in
         cursor object until it's closed. They can also be removed with
         `flush_parameters` method call.
+        Bulk insert: When bulk_insert=True, multiple INSERT queries are
+        concatenated and sent as a single batch for improved performance.
+        Only supported for INSERT statements.
 
         Args:
             query (str): SQL query to execute.
@@ -410,11 +421,15 @@ async def executemany(
                 query with actual values from each set in a sequence. Resulting queries
                 for each subset are executed sequentially.
             timeout_seconds (Optional[float]): Query execution timeout in seconds.
+            bulk_insert (bool): When True, concatenates multiple INSERT queries
+                into a single batch request. Only supported for INSERT statements.
 
         Returns:
             int: Query row count.
         """
-        await self._do_execute(query, parameters_seq, timeout=timeout_seconds)
+        await self._do_execute(
+            query, parameters_seq, timeout=timeout_seconds, bulk_insert=bulk_insert
+        )
         return self.rowcount
 
     @check_not_closed
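
On the async cursor, the change is a single new keyword argument, `bulk_insert`, which `executemany` forwards to `_do_execute` and from there to `create_execution_plan`. The sketch below shows the calling pattern. `RecordingCursor` and `insert_in_chunks` are hypothetical: the former only mimics the `executemany` signature from this diff so the example runs without a Firebolt server, and splitting a large load into fixed-size chunks is an assumption of this sketch, not something the commit requires or documents. With the real SDK you would await the same `executemany(..., bulk_insert=True)` call on a cursor obtained from a `firebolt.async_db` connection.

```python
import asyncio
from typing import Any, List, Optional, Sequence, Tuple


class RecordingCursor:
    """Hypothetical stand-in that copies the async executemany signature from
    this commit and records each call instead of contacting a server."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, int, bool]] = []

    async def executemany(
        self,
        query: str,
        parameters_seq: Sequence[Sequence[Any]],
        timeout_seconds: Optional[float] = None,
        bulk_insert: bool = False,
    ) -> int:
        # Record the query, how many rows it carried, and the bulk_insert flag.
        self.calls.append((query, len(parameters_seq), bulk_insert))
        return len(parameters_seq)


async def insert_in_chunks(
    cursor: Any,
    query: str,
    rows: Sequence[Sequence[Any]],
    chunk_size: int = 1000,
) -> int:
    """Send rows in fixed-size chunks, each chunk as one bulk INSERT batch.

    Returns the number of rows sent, not a server-reported row count."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    sent = 0
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start : start + chunk_size]
        await cursor.executemany(query, chunk, bulk_insert=True)
        sent += len(chunk)
    return sent


async def main() -> None:
    cursor = RecordingCursor()
    rows = [(i, f"item-{i}", "2024-01-01") for i in range(5)]
    sent = await insert_in_chunks(
        cursor, "INSERT INTO test_table VALUES (?, ?, ?)", rows, chunk_size=2
    )
    print(sent, [(n, flag) for _, n, flag in cursor.calls])


if __name__ == "__main__":
    asyncio.run(main())
```

Running the script prints `5 [(2, True), (2, True), (1, True)]`: three batched calls, each with `bulk_insert=True`. Counting rows locally instead of summing the return values avoids depending on what `rowcount` reports for a batched insert, since the real method is typed `Union[int, str]`.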
