Feat/add caching to data sources #1015
Open
+212
−31
Description
In the start_task / stop_task lifecycle, some operations slow down the global execution, which can be a non-negligible overhead in the context of real-time inference, for example.

In particular, the get_global_energy_mix_data() method is called at each invocation to update data that should not change during the whole execution lifetime. Similarly, at tracker init the CPU detection runs multiple times on Apple silicon, as it is used in the resource tracker (ResourceTracker._get_install_instructions) and in the PowerMetrics CLI setup (ApplePowermetrics._setup_cli), in addition to the CPU hardware detection in cpu.py.

As a first step towards better internal data management, a quick way to reduce the I/O and the costly cpuinfo.get_cpu_info() execution is to cache the extracted data. Start / stop task operations gained 0.4-0.5 ms at p50, and the results are even better for slower requests, and for get_global_energy_mix_data() post-cache:

- start_task() p95: 3.45ms → 1.65ms (-52%)
- stop_task() p95: 5.02ms → 3.33ms (-34%)
- get_global_energy_mix_data(): 2-3ms → 0.0007ms

Motivation and Context
In preparation for the integration of a real-time inference framework based on the start / stop task API, blocking operations can be problematic when performance is optimized server-side. At a 1-2 ms average execution time per start / stop operation, the overhead seems limited for the moment. An asynchronous implementation could be considered, but with caution: when wrapping a single inference we need to stay synchronous when reading the data at start & stop time, otherwise the delta would measure an incorrect computation time if run in a background thread.
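A minimal sketch of the caching pattern described above, using functools.lru_cache. The loader body below is a placeholder standing in for the real I/O, not codecarbon's actual implementation; the point is that a process-lifetime cache is safe because the data does not change during a run:

```python
from functools import lru_cache


def _load_energy_mix_json():
    # Placeholder for the real, expensive I/O (e.g. json.load on a
    # bundled data file); values here are illustrative only.
    return {"FRA": {"carbon_intensity": 51.1}}


@lru_cache(maxsize=1)
def get_global_energy_mix_data():
    # The expensive load runs only on the first call; every subsequent
    # invocation (e.g. each start_task / stop_task) returns the cached
    # result in sub-microsecond time.
    return _load_energy_mix_json()
```

The same one-line decorator approach works for a wrapper around cpuinfo.get_cpu_info(), so the Apple-silicon detection paths all hit a single cached result instead of re-running detection.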
How Has This Been Tested?
A unit test has been added, and benchmarks have been run post-implementation.
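This is not the actual benchmark harness used for the numbers above, but p50/p95 latencies of this kind could be collected with a small helper along these lines (the measure_latencies name and its parameters are illustrative):

```python
import time
import statistics


def measure_latencies(fn, n=1000):
    """Run fn() n times and return (p50, p95) latencies in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    p50 = statistics.median(samples)
    # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20)[18]
    return p50, p95
```

Wrapping tracker.start_task() and tracker.stop_task() in such a helper before and after the change would reproduce comparisons like the p95 figures reported in the description.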
Screenshots (if appropriate):
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:

Checklist:

Go over all the following points, and put an x in all the boxes that apply.