Unlimited RQ / remote dynamic code executor #2

@pocesar

Description

Brainstorm time. An Express server with two endpoints:

  • /register

    • Registers a callback and returns an id. The server evals the payload and keeps the resulting function in memory, so the whole function doesn't have to be sent on every /request call. Later /request calls reference the same evaled function by the id received here.
  • /request

    • Queues the request (in the same RequestOptions format) in an in-memory FIFO.
    • Would run on a 32GB instance with worker_threads (a pool of 7-8 threads) pulling work in a work-stealing manner.
    • Use a SharedArrayBuffer holding the JSON as a string, with JSON.parse inside the worker?
    • Bilateral communication can work with MessageChannel. Maybe call Apify.pushData from the received message, or inside the worker? (The latter might overload the dataset endpoint.)
    • Persists remaining requests to the RQ on migration, using the same id from /register, so we know which request belongs to which function and they can be loaded back from the KV store.
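A rough sketch of the /request side described above: an in-memory FIFO drained by a fixed pool of worker slots. Plain handler functions stand in for worker_threads here, and all names (RequestPool, push, drain, results) are illustrative, not from the issue:

```javascript
// Illustrative sketch: an in-memory FIFO of RequestOptions-like objects,
// drained by a fixed number of worker slots. A real implementation would
// postMessage each request to a worker_threads pool (or hand it over via
// a SharedArrayBuffer) instead of calling a local handler function.
class RequestPool {
  constructor(size, handler) {
    this.queue = [];     // FIFO: push at the tail, shift from the head
    this.handler = handler;
    this.idle = size;    // number of free worker slots
    this.results = [];   // stands in for pushing to the dataset
  }

  // /request would land here: enqueue, then try to dispatch.
  push(request) {
    this.queue.push(request);
    this.drain();
  }

  // Each idle slot takes the next request from the head of the queue;
  // when a slot finishes it immediately tries to take more work, which
  // is the work-stealing behaviour in miniature.
  drain() {
    while (this.idle > 0 && this.queue.length > 0) {
      const req = this.queue.shift();
      this.idle--;
      Promise.resolve(this.handler(req)).then((result) => {
        this.results.push(result);
        this.idle++;
        this.drain();
      });
    }
  }
}
```

Retries (question 1) would slot into this drain loop: on failure, either push the request back onto the queue tail or retry in place before freeing the slot.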
  1. How are retries done? Do failed requests come back to the queue, or stay retrying until giving up?
  2. Registering the code from the "main" process (i.e. another scraper/task) can serialize the function, as in:
    await fetch(/* container url register */, {
      method: 'POST',
      body: myWorkerCode.toString(),
    })
  3. Since it's a fire-and-forget mechanism (and the data will be pushed to the designated dataset), how could de-duping occur?
