Abstract: Large vision-language models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. However, recent research shows that LVLMs ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results