Diff with StreamingLLM #20
pfeatherstone started this conversation in General
Replies: 1 comment · 2 replies
I don't have experiments to back it up, but my intuition is that RMT is closer to a solution, whereas attention sinks are a hack to make standard decoders work.
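For what it's worth, here is a minimal sketch of the contrast as I read it (not code from either paper or repo): StreamingLLM keeps a few initial "attention sink" tokens plus a sliding window of recent KV entries and evicts the middle, while RMT carries a small set of memory tokens across segments instead of growing a cache. The function names, the 4-sink/1020-token split, and the toy echo model are all illustrative assumptions.

```python
# Schematic sketch only: names and numbers are illustrative, not taken from either codebase.

def streamingllm_keep(cache_len, n_sinks=4, window=1020):
    """KV-cache positions retained under a StreamingLLM-style policy:
    the first `n_sinks` positions ("attention sinks") plus the most
    recent `window` positions; everything in between is evicted."""
    recent_start = max(n_sinks, cache_len - window)
    return list(range(min(n_sinks, cache_len))) + list(range(recent_start, cache_len))


def rmt_step(segment_model, memory, segment):
    """One RMT-style segment step: the model reads the carried memory
    tokens together with the current segment, and the tokens written
    back into the memory slots become the state for the next segment."""
    outputs = segment_model(memory + segment)
    return outputs[-len(memory):]


# Toy usage with an echo "model" that just returns its input tokens.
if __name__ == "__main__":
    print(streamingllm_keep(cache_len=3000))   # [0, 1, 2, 3] plus [1980, ..., 2999]
    echo = lambda tokens: tokens
    mem = ["<m0>", "<m1>"]
    for seg in (["a", "b", "c"], ["d", "e"]):
        mem = rmt_step(echo, mem, seg)
    print(mem)                                  # ['d', 'e'] for the echo model
```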
Meta AI has just released https://arxiv.org/pdf/2309.17453.pdf (the StreamingLLM paper).
It looks related to RMT, or is at least trying to solve the same problem.
Any thoughts?