Add 4 bit inference capability #280

snowclipsed · 2025-05-19T21:47:54Z

No description provided.

moondream/torch/sample.py

moondream/torch/text.py

moondream/torch/weights.py

snowclipsed · 2025-05-20T00:06:07Z

moondream/torch/moondream.py

-        self._decode_one_tok = torch.compile(
-            self._decode_one_tok, fullgraph=True, mode="reduce-overhead"
-        )
+        self._vis_enc = torch.compile(self._vis_enc, fullgraph=False, mode="reduce-overhead")


@EthanReid what do you think about this? The prefill/decode won't compile for me (even at fp16)

It works for me. Since compile is optional, it should be kept.
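One way to keep compilation optional while still running on setups where prefill/decode fails to compile is a wrapper that falls back to the eager function. The sketch below is only an illustration, not what this PR implements: `optionally_compiled` and its `compiler` parameter are hypothetical names. It assumes the caller passes `torch.compile` (or `None` to disable compilation), and that compile failures surface on the first call, since `torch.compile` traces lazily.

```python
from typing import Any, Callable, Optional


def optionally_compiled(
    fn: Callable[..., Any],
    compiler: Optional[Callable[..., Callable[..., Any]]] = None,
    **compile_kwargs: Any,
) -> Callable[..., Any]:
    """Hypothetical helper: wrap fn with compiler, fall back to eager fn on failure."""
    if compiler is None:
        # Compilation disabled: return the eager function unchanged.
        return fn

    compiled = compiler(fn, **compile_kwargs)
    state = {"use_compiled": True}

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if state["use_compiled"]:
            try:
                return compiled(*args, **kwargs)
            except Exception:
                # Assumption: any exception here means compilation failed.
                # Stop using the compiled path for all later calls.
                state["use_compiled"] = False
        # Eager path; a genuine bug in fn is re-raised from here.
        return fn(*args, **kwargs)

    return wrapper
```

With PyTorch installed, the call site in `moondream.py` might then read `self._decode_one_tok = optionally_compiled(self._decode_one_tok, torch.compile, fullgraph=True, mode="reduce-overhead")`. The trade-off is that a failed attempt runs the function a second time eagerly, so this only suits functions that are safe to retry.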

moondream/torch/sample.py

EthanReid · 2025-05-20T05:42:27Z

Please clean up print statements

moondream/torch/sample.py

Fixed fp16 naming

Add 4 bit inference capability

d493e10

snowclipsed closed this May 19, 2025

snowclipsed reopened this May 19, 2025

EthanReid reviewed May 19, 2025

snowclipsed commented May 20, 2025

snowclipsed added 4 commits May 19, 2025 18:37

update weights.py

e341dd9

clean sample.py and text.py

1c7f9e9

allow setting layernorm and linear dtypes independently and dynamically

dc0251c

remove unnecessary debug statement

6b3caeb

EthanReid reviewed May 20, 2025

moondream/torch/sample.py (outdated, resolved)

snowclipsed added 2 commits May 19, 2025 22:15

run through black formatter

2eb349b

changed logger level and max tokens default

78d2b88

EthanReid reviewed May 20, 2025

moondream/torch/sample.py (resolved)

EthanReid and others added 4 commits May 20, 2025 09:55

Fixed fp16 naming

bd9740d

Merge pull request #3 from EthanReid/int4

dbdecd6

Fixed fp16 naming

remove bitblas logging suppression from sample

c92221c

add back max query speeds

3144167
