Skip to content

Conversation

@mrForza
Copy link
Contributor

@mrForza mrForza commented Dec 15, 2025

This patch introduces a new way of map_callrw execution by which we can pass some arguments to all storages and split buckets' arguments to those storages that have at least one bucket of bucket_ids. To achieve this we introduce a new string option - mode to map_callrw api.

@TarantoolBot document

Title: vshard: mode option for router.map_callrw()

This string option regulates on which storages the user function will be executed via map_callrw. Possible values:

  1. full - the user function will be executed on all storages in
    cluster.
  2. partial - the user function will be executed on certain sotrages on
    which buckets from bucket_ids were found.

After that changes map_callrw works in 4 different scenarios depending on mode and bucket_ids options:

  1. <mode = 'full', bucket_ids = nil> - user function will be executed
    with args on all storages.
  2. <mode = 'full', bucket_ids = {[bid_1] = {b_arg_1}, ...}> - storages
    that have at least one bucket of bucket_ids will execute user
    function with args and additional buckets' arguments. Other storages
    will execute user function only with args.
  3. <mode = 'partial', bucket_ids = {bid_1, ...}> - user function will be
    executed with args on storages that have at least one bucket of
    bucket_ids.
  4. <mode = 'partial', bucket_id = {[bid_1] = {b_arg_1}, ...}> - the same
    as the 3rd scenario but buckets' arguments (b_arg_1, ..., b_arg_n) will
    be added as additional arguments to user function.

Also now map_callrw ends with error in cases of <mode = 'full', bucket_ids = {1, 2, ...}> and <mode = 'partial', bucket_ids = nil>.

Closes #559

Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 🔥! I sure hope it will help the customers 😁.

Comment on lines 983 to 996
local grouped_args_added = false
for id, rs in pairs(replicasets) do
if grouped_args ~= nil then
-- It's cheaper to push and then pop, rather then deepcopy
-- arguments table for every call.
table.insert(args, grouped_args[id])
local rs_grouped_args = grouped_args[id]
table.insert(args, rs_grouped_args)
grouped_args_added = rs_grouped_args ~= nil
end
local res, err = rs:callrw('vshard.storage._call', func_args, opts_map)
if grouped_args ~= nil then
if grouped_args_added then
table.remove(args)
grouped_args_added = false
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You won't need this flag I think if you would change it to something as:

for ... do
    local rs_args = grouped_args and grouped_args[id] or nil
    if rs_args then table.insert(args, grouped_args[id]) end
    -- do the call
    if rs_args then table.remove(args) end
end

The code style is a bit relaxed, but you get the idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 1099 to 1100
local mode = bucket_ids and MAP_CALLRW_PARTIAL or MAP_CALLRW_FULL
return mode, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to return trailing nils in Lua. Can be simplified to return bucket_ids and MAP_CALLRW_PARTIAL or MAP_CALLRW_FULL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if opts then
mode, err = router_set_map_callrw_mode(opts.mode, opts.bucket_ids)
if err then
return nil, err, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. No need for a trailing nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 1180 to 1187
local mode = MAP_CALLRW_FULL
if opts then
mode, err = router_set_map_callrw_mode(opts.mode, opts.bucket_ids)
if err then
return nil, err, nil
end
timeout = opts.timeout or consts.CALL_TIMEOUT_MIN
do_return_raw = opts.return_raw
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you assign mode and then overwrite it when opts are provided. Lets not do an overwrite. And instead simply declare the empty mode variable, and with given opts do what you do now, and in the else branch assign it to the "full" default. Then there won't be an overwrite, which in turn might save us a tiny bit of cpu.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 1129 to 1131
-- 4) <mode = 'partial', bucket_id = {[bid_1] = {b_arg_1}, ...}> - the same
-- as the 3rd scenario but buckets' arguments (b_arg_1, ..., b_arg_n)
-- will be added as additional arguments to user function.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This here and in the commit message might be slightly misleading. It looks like the user should expect the bucket arguments to be passed as positional arguments like func(all_args, bucket1_arg, bucket2_arg, ...). But instead it is more like func(all_args, bucket_args_map). I suggest to clarify this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@Gerold103
Copy link
Collaborator

The commit message doesn't mention the ticket. You need to write Closes #559 right before the TarantoolBot doc request.

@Gerold103 Gerold103 assigned mrForza and unassigned Gerold103 Dec 18, 2025
This patch introduces a new way of `map_callrw` execution by which we can
pass some arguments to all storages and split buckets' arguments to those
storages that have at least one bucket of `bucket_ids`. To achieve this we
introduce a new string option - `mode` to `map_callrw` api.

Closes tarantool#559

@TarantoolBot document

Title: vshard: `mode` option for `router.map_callrw()`

This string option regulates on which storages the user function will be
executed via `map_callrw`. Possible values:
1) `full` - the user function will be executed on all storages in
   cluster.
2) `partial` - the user function will be executed on certain sotrages on
   which buckets from `bucket_ids` were found.

After that changes `map_callrw` works in 4 different scenarios depending
on `mode` and `bucket_ids` options:
1) `<mode = 'full', bucket_ids = nil>` - user function will be executed
   with args on all storages.
2) `<mode = 'full', bucket_ids = bucket_args_map>` - storages that have at
   least one bucket of bucket_ids will execute user function with args
   and additional buckets' arguments. Other storages will execute user
   function only with args.
3) `<mode = 'partial', bucket_ids = numeric_buckets_map>` - user function
   will be executed with args on storages that have at least one bucket
   of bucket_ids.
4) `<mode = 'partial', bucket_ids = bucket_args_map>` - the same as the
   3rd scenario but buckets' arguments will be added as additional
   arguments to user function.

Annotations:
   bucket_args_map = `{[bid_1] = {b_arg_1}, ..., [bid_n] = {b_arg_n}}`
   numeric_buckets_map = `{bucket_id_1, ..., bucket_id_n}`

Also now `map_callrw` ends with error in cases of `<mode = 'full',
bucket_ids = {1, 2, ...}>` and `<mode = 'partial', bucket_ids = nil>`.
@mrForza mrForza force-pushed the mrforza/gh-559-full-map-call-rw-with-split-args branch from 1f4e9b4 to 11e76dc Compare December 22, 2025 12:37
@mrForza mrForza assigned Serpentian and unassigned Serpentian and mrForza Dec 23, 2025
Copy link
Collaborator

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the patch! It looks great, but I'm afraid, it's a little bit broken now, let's fix and test the new mode with broken cache on the router

res, err = ivshard.router.map_callrw('assert', {'a'},
{bucket_ids = {5ULL, 6ULL}})
ilt.assert(res)
ilt.assert_not(err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the doc request:

the user function will be executed on certain sotrages on

on certain storages

-- of bucket_ids.
-- 4) <mode = 'partial', bucket_ids = bucket_args_map> - the same as the 3rd
-- scenario but buckets' arguments will be added as additional arguments
-- to user function.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment and doc request:

Please, let's describe all available modes in the doc request and the comment above, the API is very difficult, it's better to explicitly mention every available behavior. What we're missing now is the description, when mode is not provided. It's default value is a little bit tricky (full, when no bucket_ids and partial otherwise).

return nil, lerror.make('Router can\'t execute map_callrw with ' ..
'\'full\' mode and numeric bucket_ids')
end
return opts_mode, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nil in the end is not needed

local rs_args = grouped_args and grouped_args[id] or nil
-- It's cheaper to push and then pop, rather then deepcopy
-- arguments table for every call.
if rs_args then table.insert(args, rs_args) end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: let's preserve the old style with proper indentation of the conditions. It's easier to look at the code and minimizes the diff, which will look as follows:

diff --git a/vshard/router/init.lua b/vshard/router/init.lua
index b6baeb3..a0be400 100644
--- a/vshard/router/init.lua
+++ b/vshard/router/init.lua
@@ -981,13 +981,14 @@ local function replicasets_map_reduce(replicasets, rid, func, args,
     --
     local func_args = {'storage_map', rid, func, args}
     for id, rs in pairs(replicasets) do
-        if grouped_args ~= nil then
+        local rs_args = grouped_args and grouped_args[id]
+        if rs_args ~= nil then
             -- It's cheaper to push and then pop, rather then deepcopy
             -- arguments table for every call.
             table.insert(args, grouped_args[id])
         end
         local res, err = rs:callrw('vshard.storage._call', func_args, opts_map)
-        if grouped_args ~= nil then
+        if rs_args ~= nil then
             table.remove(args)
         end
         if res == nil then

-- arguments table for every call.
table.insert(args, grouped_args[id])
end
local rs_args = grouped_args and grouped_args[id] or nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or nil is not needed in the end

router_ref_storage_all(router, timeout)
end
if timeout then
if plain_bucket_ids then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work unfortunately. Previously the router_group_map_callrw_args was invoked right after router_ref_storage_by_buckets, now it can be invoked also after router_ref_storage_all. The first one finds the buckets in the cluster, updates the router map cache and only because of that we could use the router_group_map_callrw_args function. But the router_ref_storage_all doesn't do that.

Currently, the newly added mode (mode = 'full' + split args) doesn't work, when the cache on a router is broken, the arguments will be sent to the wrong instances. Let's won't forget to add the test case for that

@Serpentian Serpentian assigned mrForza and unassigned Serpentian Dec 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Full map_callrw with split args

3 participants