youtube.com/watch?v=WVPE62Gk3E

"The quadratic resource requirements of the attention mechanism are the main roadblock in scaling up transformers to long sequences. This paper replaces the full quadratic attention mechanism by a combination of random attention, window attention, and global attention."
