arxiv.org/abs/1911.03584

"Specifically, we prove that a multi-head self-attention layer with sufficient number of heads is at least as expressive as any convolutional layer. Our numerical experiments then show that self-attention layers attend to pixel-grid patterns similarly to CNN layers, corroborating our analysis."
