Aligning objects with corresponding textual descriptions is a fundamental challenge and a realistic requirement in vision-language understanding. While recent multimodal embedding models excel at ...
Abstract: Text-Pedestrian Image Retrieval employs a textual description of a pedestrian's appearance to identify the corresponding pedestrian image. This task involves modality discrepancy and the ...
Abstract: Contemporary advancements in Earth observation technologies have generated substantial data resources for remote sensing image retrieval applications. However, existing models exhibit ...