[step 1] support variable block input shapes for gemma4 #1656
wenhuach21 wants to merge 16 commits into main
for more information, see https://pre-commit.ci
Pull request overview
This PR aims to add initial support for the gemma4 model family by adapting block input caching/quantization flow to handle variable block shapes and extra cached inputs.
Changes:
- Allow wrapper blocks to forward positional args through to decoder layers.
- Add a predefined fixed-attribute lookup for special model types (gemma4).
- Extend caching/quantization to support variable-shaped block groupings and extra per-block cached inputs.
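The first change, forwarding positional args through a wrapper block, can be illustrated with a minimal pure-Python sketch. `ToyLayer` and `ToyMultiblock` are hypothetical stand-ins, not the actual `WrapperMultiblock` code from this PR; the sketch only assumes that some decoder layers expect extra positional inputs after the hidden states.

```python
class ToyLayer:
    """Hypothetical stand-in for a decoder layer that accepts extra positional inputs."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, hidden, *args, **kwargs):
        # Use the first extra positional argument if one was passed, else 0.
        extra = args[0] if args else 0
        return hidden * self.scale + extra


class ToyMultiblock:
    """Hypothetical wrapper that runs several layers in sequence."""

    def __init__(self, layers):
        self.layers = layers

    def forward(self, hidden, *args, **kwargs):
        # Forward both positional and keyword arguments to every wrapped layer,
        # so layers that rely on positional inputs keep working.
        for layer in self.layers:
            hidden = layer(hidden, *args, **kwargs)
        return hidden


block = ToyMultiblock([ToyLayer(2), ToyLayer(3)])
without_extra = block.forward(1)
with_extra = block.forward(1, 10)
```

If the wrapper accepted only keyword arguments, the call `block.forward(1, 10)` would fail before reaching the layers, which is the compatibility problem the change addresses.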
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| auto_round/wrapper.py | Forwards positional args through WrapperMultiblock.forward to improve model compatibility. |
| auto_round/special_model_handler.py | Introduces predefined fixed attributes (e.g., gemma4) retrievable from model.config.model_type. |
| auto_round/compressors/base.py | Uses fixed attributes to alter block caching/quantization for variable block shapes and additional cached inputs. |
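A lookup keyed on `model.config.model_type`, as described for `auto_round/special_model_handler.py`, might look roughly like the sketch below. The table name, the attribute key `has_variable_block_shape`, and the helper `get_fixed_attrs` are assumptions made for illustration; the PR's actual names and attributes are not shown on this page.

```python
from types import SimpleNamespace

# Hypothetical table: model_type -> fixed attributes (names are illustrative only).
PREDEFINED_FIXED_ATTRS = {
    "gemma4": {"has_variable_block_shape": True},
}


def get_fixed_attrs(model):
    """Return a copy of the fixed attributes for this model's type, or {} if none."""
    config = getattr(model, "config", None)
    model_type = getattr(config, "model_type", None)
    return dict(PREDEFINED_FIXED_ATTRS.get(model_type, {}))


# Usage with toy objects standing in for real models.
gemma_like = SimpleNamespace(config=SimpleNamespace(model_type="gemma4"))
other = SimpleNamespace(config=SimpleNamespace(model_type="llama"))
gemma_attrs = get_fixed_attrs(gemma_like)
other_attrs = get_fixed_attrs(other)
```

Returning a copy keeps callers from mutating the shared table, and an empty dict for unknown model types lets the default caching path run unchanged.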
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
TODO @n1ck-guo I'll leave it to you:
1. Consolidate this with your PR. This PR is more general, but it uses a large amount of VRAM during calibration.
2. Add an argument to the API so users can configure this, since it's not easy to determine whether a model has variable block inputs. One possible approach is to probe with sample data, but that would require loading all the blocks, which is costly.
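The second TODO item could be sketched as follows: an explicit user setting takes priority, and probing is only a fallback. Both function names and the `user_flag` parameter are hypothetical, and the sketch assumes the per-block input shapes have already been recorded, which in practice is the costly part mentioned above.

```python
def has_variable_block_inputs(input_shapes_by_block):
    """Return True if any block's recorded input shape differs from the first block's."""
    shapes = list(input_shapes_by_block.values())
    return any(shape != shapes[0] for shape in shapes[1:])


def resolve_variable_block_inputs(user_flag, input_shapes_by_block):
    """Prefer the explicit user setting; probe recorded shapes only when it is None."""
    if user_flag is not None:
        return user_flag
    return has_variable_block_inputs(input_shapes_by_block)


# Toy recorded shapes (batch, sequence, hidden); the values are made up.
uniform = {"layers.0": (1, 8, 64), "layers.1": (1, 8, 64)}
varying = {"layers.0": (1, 8, 64), "layers.1": (1, 8, 128)}
probed_uniform = resolve_variable_block_inputs(None, uniform)
probed_varying = resolve_variable_block_inputs(None, varying)
forced = resolve_variable_block_inputs(True, uniform)
```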
…into support_gemma4
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
Please have a review, thanks.
pbar = tqdm(range(0, len(block_names), nblocks))

for i in range(0, len(block_names), nblocks):
    if block_names[i] in input_others_extra_blocks:
Better to change to "if input_others_extra_blocks and block_names[i] in input_others_extra_blocks:"
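The reason for the suggested guard can be shown with a small sketch: if `input_others_extra_blocks` is `None`, a bare `in` test raises `TypeError`, while the guarded form short-circuits. The helper `blocks_with_extra_inputs` is hypothetical and only mirrors the loop shape quoted above; it is not code from the PR.

```python
def blocks_with_extra_inputs(block_names, nblocks, input_others_extra_blocks):
    """Collect the first block of each group that has extra cached inputs."""
    selected = []
    for i in range(0, len(block_names), nblocks):
        # The truthiness check short-circuits when the mapping is None or empty,
        # so the membership test never runs against None.
        if input_others_extra_blocks and block_names[i] in input_others_extra_blocks:
            selected.append(block_names[i])
    return selected


names = ["layers.0", "layers.1", "layers.2"]
none_case = blocks_with_extra_inputs(names, 1, None)
dict_case = blocks_with_extra_inputs(names, 1, {"layers.1": {"extra": 1}})

# Without the guard, a membership test against None fails.
try:
    "layers.0" in None
    unguarded_failed = False
except TypeError:
    unguarded_failed = True
```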
        return [normalize_tree(item) for item in value]
    return normalize_scalar(value)


def escape_invalid_json_backslashes(text: str) -> str:
no return in this interface?