Diff with StreamingLLM #20
pfeatherstone started this conversation in General
Replies: 1 comment · 2 replies
I don't have experiments to back it up, but my intuition is that RMT is closer to a solution, whereas attention sinks are a hack to make standard decoders work.
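For what it's worth, here is a minimal sketch of the contrast as I read it (not code from either paper or repo): StreamingLLM keeps a few initial "attention sink" tokens plus a sliding window of recent KV entries and evicts the middle, while RMT carries a small set of memory tokens across segments instead of growing a cache. The function names, the 4-sink/1020-token split, and the toy echo model are all illustrative assumptions.

```python
# Schematic sketch only: names and numbers are illustrative, not taken from either codebase.

def streamingllm_keep(cache_len, n_sinks=4, window=1020):
    """KV-cache positions retained under a StreamingLLM-style policy:
    the first `n_sinks` positions ("attention sinks") plus the most
    recent `window` positions; everything in between is evicted."""
    recent_start = max(n_sinks, cache_len - window)
    return list(range(min(n_sinks, cache_len))) + list(range(recent_start, cache_len))


def rmt_step(segment_model, memory, segment):
    """One RMT-style segment step: the model reads the carried memory
    tokens together with the current segment, and the tokens written
    back into the memory slots become the state for the next segment."""
    outputs = segment_model(memory + segment)
    return outputs[-len(memory):]


# Toy usage with an echo "model" that just returns its input tokens.
if __name__ == "__main__":
    print(streamingllm_keep(cache_len=3000))   # [0, 1, 2, 3] plus [1980, ..., 2999]
    echo = lambda tokens: tokens
    mem = ["<m0>", "<m1>"]
    for seg in (["a", "b", "c"], ["d", "e"]):
        mem = rmt_step(echo, mem, seg)
    print(mem)                                  # ['d', 'e'] for the echo model
```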
Meta AI has just released https://arxiv.org/pdf/2309.17453.pdf (the StreamingLLM paper).
It looks related to RMT, or is at least trying to solve the same problem.
Any thoughts?