Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
This is an official implementation of “Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images” with PyTorch, accepted by CVPR 2024 (Highlight).
If our work is helpful for your research, please consider citing:
@inproceedings{huang2024adapting,
title={Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images},
author={Huang, Chaoqin and Jiang, Aofan and Feng, Jinghao and Zhang, Ya and Wang, Xinchao and Wang, Yanfeng},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
Abstract: Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains. However, the substantial domain divergence between natural and medical images limits the effectiveness of these methodologies in medical anomaly detection. This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. This multi-level adaptation is guided by multi-level, pixel-wise visual-language feature alignment loss functions, which recalibrate the model’s focus from object semantics in natural imagery to anomaly identification in medical images. The adapted features exhibit improved generalization across various medical data types, even in zero-shot scenarios where the model encounters unseen medical modalities and anatomical regions during training. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models, with an average AUC improvement of 6.24% and 7.33% for anomaly classification, 2.03% and 2.37% for anomaly segmentation, under the zero-shot and few-shot settings, respectively.
Keywords: Anomaly Detection, Medical Images
Get Started
Environment
- python >= 3.8.5
- pytorch >= 1.10.0
- torchvision >= 0.11.1
- numpy >= 1.19.2
- scipy >= 1.5.2
- kornia >= 0.6.1
- pandas >= 1.1.3
- opencv-python >= 4.5.4
- pillow
- tqdm
- ftfy
- regex
Pretrained model
- CLIP (ViT-L-14-336px.pt): download the checkpoint and put it under the CLIP/ckpt folder.
- MVFA:
- few-shot: https://drive.google.com/file/d/1bV1yzPxJarTRfd8liMIwyHcGywTTEL2k/view?usp=sharing
- zero-shot: https://drive.google.com/file/d/1nGhcK32CrkgTR5Rav6rNfptUHaASfRnU/view?usp=sharing
Download both archives, put them under the ckpt folder, and unzip them:

unzip few-shot.zip
unzip zero-shot.zip
Medical Anomaly Detection Benchmark
- (Optional) Follow BMAD to apply for permission to download the relevant datasets. After extracting the data, reorganize the benchmark according to the guidelines provided in our Appendix A.
- We also provide the pre-processed benchmark. Please download the following datasets:
- Liver: https://drive.google.com/file/d/1xriF0uiwrgoPh01N6GlzE5zPi_OIJG1I/view?usp=sharing
- Brain: https://drive.google.com/file/d/1YxcjcQqsPdkDO0rqIVHR5IJbqS9EIyoK/view?usp=sharing
- HIS: https://drive.google.com/file/d/1hueVJZCFIZFHBLHFlv1OhqF8SFjUVHk6/view?usp=sharing
- RESC: https://drive.google.com/file/d/1BqDbK-7OP5fUha5zvS2XIQl-_t8jhTpX/view?usp=sharing
- OCT17: https://drive.google.com/file/d/1GqT0V3_3ivXPAuTn4WbMM6B9i0JQcSnM/view?usp=sharing
- ChestXray: https://drive.google.com/file/d/15DhnBAk-h6TGLTUbNLxP8jCCDtwOHAjb/view?usp=sharing
- Place the downloaded archives in the data directory and unzip them:

tar -xvf Liver.tar.gz
tar -xvf Brain.tar.gz
tar -xvf Histopathology_AD.tar.gz
tar -xvf Retina_RESC.tar.gz
tar -xvf Retina_OCT2017.tar.gz
tar -xvf Chest.tar.gz
File Structure
After the preparation work, the whole project should have the following structure:
code
├─ ckpt
│ ├─ few-shot
│ └─ zero-shot
├─ CLIP
│ ├─ bpe_simple_vocab_16e6.txt.gz
│ ├─ ckpt
│ │ └─ ViT-L-14-336px.pt
│ ├─ clip.py
│ ├─ model.py
│ ├─ models.py
│ ├─ model_configs
│ │ └─ ViT-L-14-336.json
│ ├─ modified_resnet.py
│ ├─ openai.py
│ ├─ tokenizer.py
│ └─ transformer.py
├─ data
│ ├─ Brain_AD
│ │ ├─ valid
│ │ └─ test
│ ├─ ...
│ └─ Retina_RESC_AD
│ ├─ valid
│ └─ test
├─ dataset
│ ├─ fewshot_seed
│ │ ├─ Brain
│ │ ├─ ...
│ │ └─ Retina_RESC
│ ├─ medical_few.py
│ └─ medical_zero.py
├─ loss.py
├─ prompt.py
├─ readme.md
├─ train_few.py
├─ train_zero.py
└─ utils.py
Quick Start
python test_few.py --obj $target-object --shot $few-shot-number
For example, to test on the Brain MRI with k=4, simply run:
python test_few.py --obj Brain --shot 4
Training
python train_few.py --obj $target-object --shot $few-shot-number
For example, to train on the Brain MRI with k=4, simply run:
python train_few.py --obj Brain --shot 4
Results
Results of zero-shot anomaly detection and localization:
| Zero-shot AUC (%) | Detection (Paper) | Detection (Implementation) | Localization (Paper) | Localization (Implementation) |
|---|---|---|---|---|
| HIS | 77.90 | 76.90 | - | - |
| ChestXray | 71.11 | 71.11 | - | - |
| OCT17 | 95.40 | 95.40 | - | - |
| BrainMRI | 78.63 | 79.80 | 90.27 | 89.68 |
| LiverCT | 76.24 | 81.18 | 97.85 | 97.93 |
| RESC | 83.31 | 88.99 | 92.05 | 90.44 |
| Average | 80.43 | 82.23 | 93.39 | 92.68 |
Results of few-shot anomaly detection and localization with k=4:
| 4-shot AUC (%) | Detection (Paper) | Detection (Implementation) | Localization (Paper) | Localization (Implementation) |
|---|---|---|---|---|
| HIS | 82.71 | 82.71 | - | - |
| ChestXray | 81.95 | 81.95 | - | - |
| OCT17 | 99.38 | 99.38 | - | - |
| BrainMRI | 92.44 | 92.31 | 97.30 | 97.30 |
| LiverCT | 81.18 | 81.18 | 99.73 | 99.69 |
| RESC | 96.18 | 96.18 | 98.97 | 98.97 |
| Average | 88.97 | 88.95 | 98.67 | 98.65 |
Training: Multi-Level Feature Adaptation
Applying CLIP adapters at multiple feature levels
Applying CLIP adapters at multiple feature levels of CLIP is the core design of the paper's MVFA (Multi-level Visual Feature Adapter) framework. In essence, lightweight learnable modules are inserted at different feature stages of the CLIP visual encoder, adapting the features without fine-tuning the backbone, so that a CLIP model pre-trained on natural images can be repurposed for medical-image anomaly detection.
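The residual-adapter idea described above can be sketched as follows. This is a minimal illustration, not the repository's exact module: the feature dimension (768) and bottleneck width (96) are assumptions, and the weights here are random stand-ins for learned parameters.

```python
import numpy as np

def residual_adapter(x, w_down, w_up):
    """Bottleneck adapter with a residual connection: the frozen
    backbone feature x is adjusted by a small learned correction
    instead of being replaced."""
    h = np.maximum(0.0, x @ w_down)   # down-project + ReLU
    return x + h @ w_up               # residual: x plus the correction

# Hypothetical dimensions: 768-d ViT tokens, 96-d bottleneck.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 768))               # 5 patch tokens
w_down = rng.standard_normal((768, 96)) * 0.02  # small init keeps the
w_up = rng.standard_normal((96, 768)) * 0.02    # adapter near identity
y = residual_adapter(x, w_down, w_up)
print(y.shape)  # (5, 768): same shape as the input feature
```

Because the adapter output is added to the input, a near-zero initialization leaves the pre-trained CLIP features almost untouched at the start of training, which is why the backbone's knowledge is preserved.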
What are the visual encoder's "feature levels"?
CLIP's visual encoder (ViT-L/14 in the paper) is a layered architecture that progressively transforms an image from raw pixels into abstract features. The paper divides it into four feature levels (stages S₁~S₄), each corresponding to a different degree of abstraction:
| Feature level (stage) | Position | Abstraction | Value for the task |
|---|---|---|---|
| S₁ (stage 1) | Output of the encoder's first 6 layers | Low (close to pixels) | Captures local details (e.g., edges of tiny lesions in medical images) |
| S₂ (stage 2) | Output of the middle 6 layers | Medium | Balances local detail and global structure (e.g., tumor contours) |
| S₃ (stage 3) | Output of the later 6 layers | High | Captures global semantics (e.g., overall structural anomalies in brain MRI) |
| S₄ (stage 4) | Final encoder output | Highest (global) | Image-level classification (is the image anomalous?) |
In short, the later the level (toward S₄), the more abstract the features and the more they encode natural-image object semantics (e.g., class features for "cat" or "car"); the earlier the level (toward S₁), the more concrete and pixel-oriented the features. Medical anomaly detection needs both: the detail features of S₁~S₃ to localize lesions (segmentation) and the global features of S₄ to decide whether an image is anomalous (classification). This dual need is the core motivation for multi-level adaptation.
Accordingly, the paper does not apply an adapter at a single feature level (e.g., only S₄) but covers S₁~S₃ (intermediate levels) plus S₄ (the final level).
Task gap: from object-semantic recognition to anomaly discrimination
CLIP is pre-trained to recognize object categories in natural images (e.g., telling "cat" from "car"), so it focuses on the semantic features of normal objects; medical anomaly detection must instead distinguish normal tissue from abnormal lesions (e.g., normal brain tissue vs. a tumor), which hinges on local detail deviations.
- Applying an adapter only at S₄ (global features) optimizes image-level classification alone and cannot capture pixel-level anomaly details (e.g., small lesions);
- Applying adapters across S₁~S₄ adapts local detail features at S₁~S₃ (for lesion localization) and global features at S₄ (for image-level anomaly decisions), serving both classification and segmentation.
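How multi-level features turn into a pixel-wise anomaly map can be sketched as follows: each level's patch features are compared against "normal" and "abnormal" text embeddings by cosine similarity, a softmax over the two similarities gives a per-patch anomaly probability, and the per-level maps are averaged. This is a hedged sketch of the general visual-language comparison idea; the array shapes, function names, and random inputs are illustrative, not the repository's API.

```python
import numpy as np

def level_anomaly_scores(patch_feats, text_normal, text_abnormal):
    """Per-patch anomaly probability at one feature level:
    softmax over cosine similarities to the two text embeddings."""
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    f = unit(patch_feats)
    s_n = f @ unit(text_normal)      # similarity to the "normal" prompt
    s_a = f @ unit(text_abnormal)    # similarity to the "abnormal" prompt
    e_n, e_a = np.exp(s_n), np.exp(s_a)
    return e_a / (e_n + e_a)         # anomaly probability per patch

rng = np.random.default_rng(0)
# Illustrative shapes: 4 feature levels, 49 patches, 768-d features.
levels = [rng.standard_normal((49, 768)) for _ in range(4)]
t_norm, t_abn = rng.standard_normal(768), rng.standard_normal(768)

# Average the per-level maps into one multi-level anomaly map.
anomaly_map = np.mean(
    [level_anomaly_scores(f, t_norm, t_abn) for f in levels], axis=0)
print(anomaly_map.shape)  # (49,): one anomaly score per patch
```

Averaging across levels is what lets fine-grained early-stage evidence (small lesions) and global late-stage evidence (overall structure) both contribute to the final localization map.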
Domain gap: from natural images to medical images
Natural images (e.g., landscapes, animals) and medical images (e.g., MRI, CT) differ drastically in style, texture, and semantics (severe domain shift):
- Single-level adaptation adjusts features at only one level of abstraction and cannot fully bridge the domain gap (e.g., adapting S₄ alone only corrects global semantics, while adapting S₁ alone only corrects pixel-level details);
- Multi-level adaptation aligns CLIP's features with medical-image normal/abnormal characteristics stage by stage (e.g., S₁ adaptation brings pixel details closer to CT intensity distributions, while S₄ adaptation shifts global semantics from "object category" to "tissue normality").
Better generalization: zero-shot ability across modalities and anatomical regions
The core goal is for the model to remain effective on medical modalities unseen during training (e.g., train on MRI, test on CT) and on unseen anatomical regions (e.g., train on brain, test on liver):
- Through layer-by-layer feature calibration, the multi-level adapters push each level to learn generalizable normal/abnormal feature patterns rather than features tied to one modality;
- Ablations show that the model using the multi-level adapters (rather than a global projector) improves zero-shot classification AUC by 13.57% on average (Table 4), confirming that multi-level adaptation is key to generalization.
Comparison with single-level adaptation: multi-level is better
An ablation study (Table 5) verifies the necessity of the multi-level design:
- Single-level adaptation: the best single level, S₂, reaches 88.84% classification AUC and 98.62% segmentation AUC;
- Multi-level adaptation (fusing S₁~S₄): classification AUC rises to 88.97% and segmentation AUC to 98.67%, with more stable performance across all datasets (avoiding a single level's bias toward one type of medical image).
In essence, applying CLIP adapters at multiple feature levels means using lightweight, residual, multi-level, dual-task adaptation modules to shift every layer of CLIP's features from "natural-image object semantics" toward "medical-image anomaly features" without destroying CLIP's pre-trained knowledge. This enables zero-/few-shot medical anomaly detection and localization across modalities and anatomical regions, and it is the core innovation that lets the MVFA framework surpass other CLIP-based methods (e.g., WinCLIP, April-GAN).