
Conversation


@rairatne commented on Jun 1, 2023

Modified nn-hal to improve memory utilization and scores in the AI Benchmark app.
Improvements include:
- better memory usage during parallel inference
- more operations enabled/added with float16 support
- offload to remote inference if available
- offload to remote only if the model is non-quantized
- for now, remote inference is only supported when NNAPI calls execute synchronously
- parallel remote inference enabled
- support for dynamic input shapes and data types for remote inference

Tracked-On: OAM-109729
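
A minimal sketch of the offload decision described above, using illustrative type and function names (not the actual nn-hal interfaces): offload only when a remote service is available, the model is non-quantized, and the call is synchronous.

```cpp
// Illustrative stand-ins for nn-hal types; names are hypothetical.
enum class ExecutionMode { SYNC, ASYNC, FENCED };
struct Model { bool hasQuantizedOperands; };

// Offload only when the remote service is reachable, the model is
// non-quantized, and the caller is on the synchronous NNAPI path.
bool shouldOffloadToRemote(const Model& model, ExecutionMode mode,
                           bool remoteAvailable) {
    return remoteAvailable &&
           !model.hasQuantizedOperands &&
           mode == ExecutionMode::SYNC;
}
```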

- loadmodel RPC call added after sending IR files
- included a data_type parameter for remote input data
- added a check on the remote output length

Tracked-On: OAM-110555
Signed-off-by: Ratnesh Kumar Rai <ratnesh.kumar.rai@intel.com>
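
A hedged sketch of the client-side sequence implied by these notes: the IR files are transferred first, and the load-model RPC is issued only afterwards. RemoteStub, sendIrFile, and loadModel are illustrative names, not the actual proto-generated interface.

```cpp
#include <string>

// Illustrative stub; the real gRPC service and messages differ.
struct RemoteStub {
    bool sendIrFile(const std::string& /*path*/) { return true; }  // placeholder
    bool loadModel() { return true; }                               // placeholder
};

// Transfer both IR files, then ask the remote side to load the model.
bool prepareRemoteModel(RemoteStub& stub, const std::string& xmlPath,
                        const std::string& binPath) {
    if (!stub.sendIrFile(xmlPath) || !stub.sendIrFile(binPath)) return false;
    return stub.loadModel();  // issued only after both IR files have arrived
}
```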
rairatne and others added 7 commits June 2, 2023 14:24
- increased the gRPC message size limit to INT_MAX
- increased the deadline for remote model load to 3 minutes
- changed mRemoteCheck from a global to a class member
- improved remote checks
- increased the chunk size from 1 MB to 10 MB

Tracked-On: OAM-110557
Signed-off-by: Ratnesh Kumar Rai <ratnesh.kumar.rai@intel.com>
Signed-off-by: Anoob Anto K <anoob.anto.kodankandath@intel.com>
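
A sketch of the gRPC settings this commit describes, assuming the standard C++ gRPC API (the target address and insecure credentials are assumptions; nn-hal's actual channel setup may differ):

```cpp
#include <chrono>
#include <climits>
#include <cstddef>
#include <memory>
#include <string>
#include <grpcpp/grpcpp.h>

// 10 MB chunks for IR file transfer, up from 1 MB.
constexpr size_t kChunkSize = 10 * 1024 * 1024;

// Raise gRPC's default 4 MB message size limit to INT_MAX so large
// IR payloads are not rejected by the transport.
std::shared_ptr<grpc::Channel> makeChannel(const std::string& target) {
    grpc::ChannelArguments args;
    args.SetMaxReceiveMessageSize(INT_MAX);
    args.SetMaxSendMessageSize(INT_MAX);
    return grpc::CreateCustomChannel(target,
                                     grpc::InsecureChannelCredentials(), args);
}

// 3-minute per-call deadline for the remote model-load RPC.
void setLoadDeadline(grpc::ClientContext& ctx) {
    ctx.set_deadline(std::chrono::system_clock::now() +
                     std::chrono::minutes(3));
}
```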
- Added Hard Swish
- Enabled Resize Bilinear for float16 and quant types
- Enabled Resize Nearest Neighbor for float16 and quant types
- Resolved quant type conversion for Quant Asymm and
  Quant Asymm Signed for Split

Tracked-On: OAM-110564
Signed-off-by: Anoob Anto K <anoob.anto.kodankandath@intel.com>
Signed-off-by: Ratnesh Kumar Rai <ratnesh.kumar.rai@intel.com>
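
For context on the Split fix: TENSOR_QUANT8_ASYMM (uint8) and TENSOR_QUANT8_ASYMM_SIGNED (int8) share the same scale and differ only by a fixed offset of 128 in both data and zero point. A minimal sketch of that conversion (illustrative helper, not the in-tree code):

```cpp
#include <cstddef>
#include <cstdint>

// Shift uint8 asymm data into the int8 asymm-signed representation.
// The scale is unchanged; data and zero point both move down by 128.
void asymmToAsymmSigned(const uint8_t* src, int8_t* dst, size_t n,
                        int32_t zeroPointU8, int32_t* zeroPointI8) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = static_cast<int8_t>(static_cast<int32_t>(src[i]) - 128);
    *zeroPointI8 = zeroPointU8 - 128;
}
```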
This fixes the following errors:

- Upcasting non-compliant model
- Upcasting non-compliant operand type TENSOR_QUANT8_ASYMM_SIGNED from V1_3::OperandType to V1_2::OperandType

Tracked-On: OAM-110572
Signed-off-by: Anoob Anto K <anoob.anto.kodankandath@intel.com>
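
The errors come from narrowing a V1_3 model to V1_2 structures when it contains operand types V1_2 cannot represent. A sketch of the guard implied by the fix (the enum and helper are illustrative; the real check lives in the NNAPI conversion utilities):

```cpp
// TENSOR_QUANT8_ASYMM_SIGNED exists only from V1_3 onward, so a model
// using it must not be converted down to V1_2 structures.
enum class OperandTypeV1_3 {
    TENSOR_FLOAT32,
    TENSOR_QUANT8_ASYMM,
    TENSOR_QUANT8_ASYMM_SIGNED,  // no V1_2 equivalent
};

bool representableInV1_2(OperandTypeV1_3 type) {
    return type != OperandTypeV1_3::TENSOR_QUANT8_ASYMM_SIGNED;
}
```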
- changed mDetectionClient from a global to a class member
- added tokens to identify a specific model request over
  gRPC
- added a release RPC call to perform cleanup on the remote side
- [To be fixed] removed remote inference for asyncExecute and
  fencedExecute, as the implementation was not correct

Tracked-On: OAM-110559
Signed-off-by: Anoob Anto K <anoob.anto.kodankandath@intel.com>
Signed-off-by: Ratnesh Kumar Rai <ratnesh.kumar.rai@intel.com>
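
A hypothetical sketch of the token scheme: each prepared model gets a unique token that travels with every gRPC request so the server can multiplex concurrent models, and a release RPC drops the per-model state when the handle goes away. All names here are illustrative.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

class RemoteModelHandle {
  public:
    RemoteModelHandle() : mToken(nextToken()) {}
    const std::string& token() const { return mToken; }  // attach to each RPC
    ~RemoteModelHandle() {
        // The real code would issue the release RPC with mToken here so the
        // remote side can free the model's resources.
    }

  private:
    static std::string nextToken() {
        static std::atomic<uint64_t> counter{0};
        return "model-" + std::to_string(counter.fetch_add(1));
    }
    std::string mToken;
};
```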
Remove static variables
- separate ModelInfo objects for each operation
- unmap the runtime memory pool at the end of each execute call
- optimised the network graph creator so that it can be released once
  the graph is created and loaded

Tracked-On: OAM-110558
Signed-off-by: Anoob Anto K <anoob.anto.kodankandath@intel.com>
Signed-off-by: Ratnesh Kumar Rai <ratnesh.kumar.rai@intel.com>
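
A sketch, in the spirit of "unmap the runtime memory pool at the end of each execute call": a plain RAII scope guard around munmap (not the actual nn-hal code) that releases the mapping when execution returns instead of letting it linger for the lifetime of the prepared model.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Unmaps the pool when the guard leaves scope, e.g. at the end of execute().
class ScopedPoolMapping {
  public:
    ScopedPoolMapping(void* addr, size_t len) : mAddr(addr), mLen(len) {}
    ~ScopedPoolMapping() {
        if (mAddr) munmap(mAddr, mLen);
    }
    ScopedPoolMapping(const ScopedPoolMapping&) = delete;
    ScopedPoolMapping& operator=(const ScopedPoolMapping&) = delete;

  private:
    void* mAddr;
    size_t mLen;
};
```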
- if remote inference fails, disable parallel attempts at
  remote inference

- disable remote inference for quantized models

Tracked-On: OAM-110563
Signed-off-by: Ratnesh Kumar Rai <ratnesh.kumar.rai@intel.com>
Signed-off-by: Anoob Anto K <anoob.anto.kodankandath@intel.com>
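
A minimal sketch of the failure latch this commit describes: the first remote failure flips a flag that disables further parallel remote attempts (illustrative class, assumed to be per driver instance).

```cpp
#include <atomic>

// One-way switch: once a remote inference fails, all subsequent
// parallel remote attempts are skipped.
class RemoteInferGate {
  public:
    bool allowed() const { return mEnabled.load(std::memory_order_relaxed); }
    void reportFailure() { mEnabled.store(false, std::memory_order_relaxed); }

  private:
    std::atomic<bool> mEnabled{true};
};
```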
- Split the previous loadNetwork into two parts:
    - createNetwork: loads the generated network graph
      and dumps it as XML and bin
    - loadNetwork: reads the XML and bin back and
      creates the infer request

- Fall back to native inference if remote inference fails.

Note: the fallback causes loadNetwork to trigger a load for
native inference, which increases inference time in the fallback
scenario; when only native inference is used (no remote inference),
compile_model is called twice, resulting in longer model load
time.
Sub-Task JIRA: OAM-110562

Tracked-On: OAM-109729
Signed-off-by: Ratnesh Kumar Rai <ratnesh.kumar.rai@intel.com>
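
A sketch of the createNetwork/loadNetwork split, assuming the OpenVINO 2.x C++ API (the in-tree code may use the older InferenceEngine API and different signatures):

```cpp
#include <memory>
#include <string>
#include <openvino/openvino.hpp>

// createNetwork: serialize the generated graph to disk as XML + bin.
void createNetwork(const std::shared_ptr<ov::Model>& model,
                   const std::string& xmlPath, const std::string& binPath) {
    ov::serialize(model, xmlPath, binPath);
}

// loadNetwork: read the IR back, compile it, and create the infer request.
ov::InferRequest loadNetwork(ov::Core& core, const std::string& xmlPath,
                             const std::string& device = "CPU") {
    auto model = core.read_model(xmlPath);  // picks up the matching .bin
    auto compiled = core.compile_model(model, device);
    return compiled.create_infer_request();
}
```

Serializing the graph between the two steps is what lets the graph-creation objects be released early, and lets loadNetwork be re-run for the native fallback, at the cost of the double compile_model noted above.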
@rairatne force-pushed the ai_benchmark_improvements branch from 6ead7e3 to ac80d67 on June 2, 2023 09:14
@sysopenci added the Stale label (for inactive open PRs) on Sep 5, 2024

Labels

Stale (label for inactive open PRs), Valid commit message
