DeepMind Proposes Novel Vision Transformer for Arbitrary Size & Resolution
In a new paper, Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution, a Google DeepMind research team further improves ViT with Native Resolution ViT (NaViT), which is able to process inputs of arbitrary resolution and aspect ratio.
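The "Patch n' Pack" in the title points at the general idea: each image is cut into patches at its own resolution, and the resulting token lists, which differ in length, are packed together into one sequence. The following is only a rough sketch of that idea in plain Python, not the paper's implementation. The function names `patchify`, `pack`, and `make_image` are hypothetical, images are assumed to be single-channel grids whose sides are multiples of the patch size, and the `image_ids` list stands in for whatever bookkeeping a real model would use to keep attention within each image.

```python
def patchify(image, patch):
    """Split an H x W image (a list of rows) into flattened patch x patch tokens.

    Assumption: H and W are multiples of `patch`. The image is not resized,
    so its aspect ratio is preserved and the token count depends on its size.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def pack(images, patch):
    """Concatenate the tokens of several images into one sequence.

    Returns (tokens, image_ids), where image_ids[i] is the index of the image
    that token i came from.
    """
    tokens, image_ids = [], []
    for idx, image in enumerate(images):
        image_tokens = patchify(image, patch)
        tokens.extend(image_tokens)
        image_ids.extend([idx] * len(image_tokens))
    return tokens, image_ids


def make_image(height, width):
    """Build a toy image whose pixel values are their row-major index."""
    return [[r * width + c for c in range(width)] for r in range(height)]


wide = make_image(2, 6)  # wide image: 3 patches of size 2 x 2
tall = make_image(4, 2)  # tall image: 2 patches of size 2 x 2
tokens, image_ids = pack([wide, tall], patch=2)
print(len(tokens), image_ids)  # 5 [0, 0, 0, 1, 1]
```

Here a wide image and a tall image share one five-token sequence without either being resized or padded to a common square shape.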