A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models

Julio Silva-Rodríguez · Sina Hajimiri · Ismail Ben Ayed · Jose Dolz

ÉTS Montreal

CVPR 2024

Highlights

  • Adapter-style efficient transfer learning allows black-box and fast few-shot transfer of VLMs.
  • Existing Adapters learn a combination of zero-shot prototypes and support embeddings to produce task-specific predictions.
  • Pitfalls: prior Adapters require a validation subset to fix key hyperparameters, which is unrealistic in the few-shot data regime.
  • Proposed: Few-shot adapters with a model selection strategy based only on the support set.
    • Zero-shot Linear Probe (ZS-LP): a surprisingly strong well-initialized Linear Probe.
    • Class-Adaptive Linear Probe (CLAP): constraining the learnt prototypes to remain close to zero-shot weights.
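
The idea behind ZS-LP, a Linear Probe whose class weights start from the zero-shot prototypes, can be sketched as below. This is a minimal illustration rather than the authors' implementation: the function names, the toy embeddings, and the temperature value of 100 are assumptions, and real text and image embeddings would come from the pre-trained VLM encoders.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def init_zero_shot_probe(text_embeddings):
    """Initialize the probe's class weights from zero-shot text prototypes.

    text_embeddings: (num_classes, dim) array, one embedding per class prompt.
    """
    return l2_normalize(np.asarray(text_embeddings, dtype=np.float64)).copy()

def probe_logits(image_features, prototypes, temperature=100.0):
    """Cosine-similarity logits; the temperature value is an assumption."""
    feats = l2_normalize(np.asarray(image_features, dtype=np.float64))
    protos = l2_normalize(prototypes)
    return temperature * feats @ protos.T

# Toy data standing in for encoder outputs (hypothetical values).
rng = np.random.default_rng(0)
text_embeddings = rng.normal(size=(3, 8))
image_features = text_embeddings + 0.01 * rng.normal(size=(3, 8))

# Before any training step, the probe reproduces zero-shot predictions.
prototypes = init_zero_shot_probe(text_embeddings)
zero_shot_predictions = probe_logits(image_features, prototypes).argmax(axis=1)
```

From this starting point, the prototypes would be updated by gradient descent on the support set; that training loop is omitted here.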

Few-shot VLMs Adaptation


The adaptation of Vision-Language Models using a few shots as supervision benefits from the efficient transfer of the pre-trained features. Two alternatives are currently popular: Prompt Learning and Adapters.



Pitfalls on Existing Adapters


Existing Adapters perform strongly only in narrowly defined experimental setups, and only when their hyperparameters are carefully adjusted on a large corpus of labeled samples. To outperform a carefully designed Linear Probing baseline (ZS-LP), these methods need their hyperparameters optimized for each target task, which is unrealistic in the few-shot regime.



Class-Adaptive Linear Probing (CLAP)


We propose a novel approach that meets the requirements of real-world scenarios. We introduce a CLass-Adaptive linear Probe (CLAP) objective that constrains the learned prototypes to retain prior zero-shot knowledge adaptively, based only on the few support shots, and that uses a homogeneous learning configuration across tasks.
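
One way to picture such a constraint is a cross-entropy loss on the support set plus a per-class penalty on how far each prototype drifts from its zero-shot initialization. The sketch below is an assumption-laden illustration, not the exact CLAP objective: the squared-distance penalty, the `class_weights` vector, and the function name `penalized_probe_loss` are hypothetical, and how the per-class weights are derived from the support shots is described in the paper and not reproduced here.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def penalized_probe_loss(prototypes, zero_shot_prototypes, features, labels,
                         class_weights, temperature=100.0):
    """Cross-entropy plus a class-wise penalty on deviation from zero-shot weights.

    class_weights: (num_classes,) hypothetical per-class penalty strengths.
    Returns (total_loss, cross_entropy, penalty).
    """
    logits = temperature * features @ prototypes.T
    probs = softmax(logits)
    n = features.shape[0]
    cross_entropy = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    # Squared distance between each learned prototype and its zero-shot anchor.
    deviation = ((prototypes - zero_shot_prototypes) ** 2).sum(axis=1)
    penalty = float((class_weights * deviation).sum())
    return cross_entropy + penalty, cross_entropy, penalty

# Toy support set: 3 classes, 4-dimensional features (hypothetical values).
zero_shot = np.eye(3, 4)
features = np.eye(3, 4)
labels = np.array([0, 1, 2])
class_weights = np.array([1.0, 0.5, 2.0])

loss_at_init, _, penalty_at_init = penalized_probe_loss(
    zero_shot, zero_shot, features, labels, class_weights)

# Moving one prototype away from its zero-shot anchor incurs a penalty
# scaled by that class's weight.
shifted = zero_shot.copy()
shifted[2, 3] += 0.5
loss_shifted, _, penalty_shifted = penalized_probe_loss(
    shifted, zero_shot, features, labels, class_weights)
```

With a larger weight, a class's prototype is held closer to its zero-shot value; with a smaller weight, it is freer to follow the support samples.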



Citation


Please cite our paper if it is helpful to your work:

@inproceedings{clap24,
    title={A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models},
    author={Julio Silva-Rodr\'iguez and Sina Hajimiri and Ismail Ben Ayed and Jose Dolz},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024}
    }

Contact


Please feel free to contact us: julio-jose.silva-rodriguez@etsmtl.ca.